Reliability DOE for Life Tests
Reliability DOE
Reliability analysis is commonly thought of as an approach to modeling the failures of existing products. The usual reliability analysis involves characterizing product failures using distributions such as the exponential, Weibull and lognormal. Based on the fitted distribution, failures can be mitigated, warranty returns predicted, or maintenance actions planned. However, by adopting the methodology of Design for Reliability (DFR), reliability analysis can also be used as a powerful tool to design robust products that operate with minimal failures. In DFR, reliability analysis is carried out in conjunction with physics of failure and experiment design techniques. Under this approach, Design of Experiments (DOE) uses life data to "build" reliability into the products, not just to quantify the existing reliability. Such an approach, if properly implemented, can result in significant cost savings, especially in terms of fewer warranty returns or repair and maintenance actions. Although DOE techniques can be used both to improve product reliability and to make this reliability robust to noise factors, the discussion in this chapter is focused on reliability improvement.
Reliability DOE Analysis
Reliability DOE (R-DOE) analysis is fairly similar to the analysis of other designed experiments, except that the response is the life of the product in the respective units (e.g., for an automobile component the units of life may be miles, for a mechanical component they may be cycles, and for a pharmaceutical product they may be months or years). However, two important differences make R-DOE analysis unique. The first is that the life data of most products are typically well modeled by the lognormal, Weibull or exponential distribution, but usually do not follow the normal distribution. Traditional DOE techniques assume that the response values at any treatment level follow the normal distribution and, therefore, that the error terms, $\epsilon$, are normally and independently distributed. This assumption may not be valid for the response data used in most R-DOE analyses. The second difference is that the life data obtained may be censored (e.g., units that have not yet failed when the test ends), in which case the standard regression techniques applicable to the response data in traditional DOEs can no longer be used.
Stresses affecting the life of the product may also be investigated using R-DOE analysis. In this case, the primary purpose of the R-DOE analysis is to identify which of the investigated stresses affect the life of the product (by investigating whether a change in the level of any stress leads to a significant change in the life of the product). Once the important stresses have been identified, detailed analyses can be carried out using ReliaSoft's ALTA software. ALTA includes a number of life-stress relationships (LSRs) to model the relation between life and the stresses affecting the life of the product.
R-DOE Analysis of Lognormally Distributed Data
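Assume that the life, $T$, for a certain product has been found to be lognormally distributed. The probability density function for the lognormal distribution is:
$$f(T) = \frac{1}{T\,\sigma'\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln(T)-\mu'}{\sigma'}\right)^2\right] \qquad (1)$$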
where $\mu'$ represents the mean of the natural logarithm of the times-to-failure and $\sigma'$ represents the standard deviation of the natural logarithms of the times-to-failure [19]. If the analyst wants to investigate a single two level factor that may affect the life, $T$, then the following model may be used:
$$T_i = \mu_i + \xi_i \qquad (2)$$
where:
$T_i$ represents the times-to-failure at the $i$th treatment level of the factor.
$\mu_i$ represents the mean value of $T_i$ for the $i$th treatment.
$\xi_i$ is the random error term.
The subscript $i$ represents the treatment level of the factor, with $i = 1, 2$ for a two level factor.
The model of Eqn. (2) is analogous to the ANOVA model, $Y_i = \mu_i + \epsilon_i$, used in Chapter 6 for traditional DOE analyses. Note, however, that the random error term, $\xi_i$, is not normally distributed here because the response, $T$, is lognormally distributed. The natural logarithm of a lognormally distributed random variable follows the normal distribution. Therefore, if the logarithmic transformation of $T$, $\ln(T)$, is used in Eqn. (2), the model will be identical to the ANOVA model, $Y_i = \mu_i + \epsilon_i$, used in Chapter 6. Thus, using the logarithmic failure times, the model can be written as:
$$\ln(T_i) = \mu'_i + \epsilon_i \qquad (3)$$
where:
$\ln(T_i)$ represents the logarithmic times-to-failure at the $i$th treatment.
$\mu'_i$ represents the mean of the natural logarithm of the times-to-failure at the $i$th treatment.
$\sigma'$ represents the standard deviation of the natural logarithms of the times-to-failure.
The random error term, $\epsilon_i$, is normally distributed because the response, $\ln(T)$, is normally distributed. Since the model of Eqn. (3) is identical to the ANOVA model used in traditional DOE analysis, regression techniques can be applied here and the R-DOE analysis can be carried out in the same way as traditional DOE analyses. Recall from Chapter 7 that if the factor(s) affecting the response have only two levels, then the notation of the regression model can be applied to the ANOVA model. Therefore, the model of Eqn. (3) can be written using a single indicator variable, $x_1$, to represent the two level factor as:
$$\ln(T_i) = \beta_0 + \beta_1 x_{i1} + \epsilon_i \qquad (4)$$
where $\beta_0$ is the intercept term and $\beta_1$ is the effect coefficient for the investigated factor. Setting Eqns. (3) and (4) equal to each other returns:
$$\mu'_i = \beta_0 + \beta_1 x_{i1} \qquad (5)$$
The mean of the natural logarithm of the times-to-failure at any factor level, $\mu'_i$, is referred to as the life characteristic because it represents a characteristic point of the underlying life distribution. The life characteristic used in the R-DOE analysis changes with the underlying distribution assumed for the life data. If the analyst wants to investigate the effect of two factors (each at two levels) on the life of the product, then the life characteristic equation can be easily expanded as follows:
$$\mu'_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$$
where $\beta_2$ is the effect coefficient for the second factor and $x_2$ is the indicator variable representing the second factor. If the interaction effect is also to be investigated, then the following equation can be used:
$$\mu'_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2}$$
In general, the model to investigate a given number of factors can be expressed as:
$$\mu'_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} + \cdots \qquad (6)$$
Based on the model equations presented thus far, the analyst can easily conduct an R-DOE analysis of lognormally distributed life data using standard regression techniques. However, this is no longer true once the data include censored observations. In the case of censored data, the analysis has to be carried out using maximum likelihood estimation (MLE) techniques.
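As a minimal sketch (standard-library Python), the density of Eqn. (1) and the life characteristic of Eqn. (6) can be evaluated as follows. The function names, the tuple-based encoding of model terms and the coefficient values are illustrative assumptions, not DOE++ conventions.

```python
import math

def lognormal_pdf(t: float, mu_prime: float, sigma_prime: float) -> float:
    """Eqn. (1): lognormal density of life T with log-mean mu' and log-std sigma'."""
    z = (math.log(t) - mu_prime) / sigma_prime
    return math.exp(-0.5 * z * z) / (t * sigma_prime * math.sqrt(2.0 * math.pi))

def life_characteristic(coefs: dict, x: dict) -> float:
    """Eqn. (6): mu'_i = beta_0 + beta_1*x_i1 + beta_2*x_i2 + beta_12*x_i1*x_i2 + ...

    Hypothetical encoding: each key of `coefs` is a tuple of factor numbers,
    () for the intercept, (1,) for x1, (1, 2) for the x1*x2 interaction.
    `x` maps a factor number to its coded level (-1 or +1).
    """
    # math.prod of an empty term is 1, so the intercept needs no special case
    return sum(beta * math.prod(x[f] for f in term) for term, beta in coefs.items())

# Hypothetical coefficients for a 2^2 design with interaction
coefs = {(): 5.0, (1,): 0.3, (2,): -0.8, (1, 2): 0.1}
mu_i = life_characteristic(coefs, {1: 1, 2: -1})   # 5.0 + 0.3 + 0.8 - 0.1
print(mu_i, lognormal_pdf(math.exp(mu_i), mu_i, 0.5))
```

For complete data, ordinary least squares on $\ln(T)$ estimates the same coefficients; the density only becomes essential once censored observations enter the likelihood.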
Maximum Likelihood Estimation for the Lognormal Distribution
The maximum likelihood estimation method can be used to estimate the parameters in R-DOE analyses when censored data are present. A likelihood contribution is calculated for each observation (failure time, suspension or interval), and the parameters of the model are obtained by maximizing the resulting log-likelihood function. The likelihood function for complete data following the lognormal distribution is given as:
$$L_{failures} = \prod_{i=1}^{F_e}\frac{1}{\sigma' T_i\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln(T_i)-\mu'_i}{\sigma'}\right)^2\right]$$
where:
$F_e$ is the total number of observed times-to-failure.
$\mu'_i$ is the life characteristic and has been substituted based on Eqn. (6).
$T_i$ is the time of the $i$th failure.
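For right censored data the likelihood function is [19]:
$$L_{suspensions} = \prod_{j=1}^{S}\left[1-\Phi\left(\frac{\ln(T_j)-\mu'_j}{\sigma'}\right)\right]$$
where:
$S$ is the total number of observed suspensions.
$T_j$ is the time of the $j$th suspension.
$\Phi(\cdot)$ is the standard normal cumulative distribution function.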
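For interval data the likelihood function is [19]:
$$L_{interval} = \prod_{k=1}^{FI}\left[\Phi\left(\frac{\ln(T_k^U)-\mu'_k}{\sigma'}\right)-\Phi\left(\frac{\ln(T_k^L)-\mu'_k}{\sigma'}\right)\right]$$
where:
$FI$ is the total number of interval data.
$T_k^L$ is the beginning time of the $k$th interval.
$T_k^U$ is the end time of the $k$th interval.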
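The complete likelihood function when all types of data (complete, right censored and interval) are present is:
$$L = L_{failures}\cdot L_{suspensions}\cdot L_{interval} \qquad (7)$$
Then the log-likelihood function is:
$$\Lambda = \ln(L) = \ln(L_{failures}) + \ln(L_{suspensions}) + \ln(L_{interval}) \qquad (8)$$
The MLE estimates are obtained by solving for the parameters so that:
$$\frac{\partial\Lambda}{\partial\beta_0} = 0,\quad \frac{\partial\Lambda}{\partial\beta_1} = 0,\quad \ldots,\quad \frac{\partial\Lambda}{\partial\sigma'} = 0$$
Once the estimates are obtained, the significance of any parameter, $\beta_i$, can be assessed using the likelihood ratio test.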
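The contributions in Eqns. (7) and (8) can be coded directly. The sketch below is a minimal, hypothetical implementation in standard-library Python: each observation is passed together with its already-computed life characteristic $\mu'$ (which would come from Eqn. (6)); in practice a numerical optimizer would maximize this function over the coefficients and $\sigma'$.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def lognormal_loglik(sigma_prime, failures=(), suspensions=(), intervals=()):
    """Eqn. (8): Lambda = ln L_failures + ln L_suspensions + ln L_interval.

    failures:    (T_i, mu'_i) pairs
    suspensions: (T_j, mu'_j) pairs, right censored at T_j
    intervals:   (T_L, T_U, mu'_k) triples, failure known to lie in (T_L, T_U]
    """
    ll = 0.0
    for t, mu in failures:
        z = (math.log(t) - mu) / sigma_prime
        ll += -math.log(sigma_prime * t * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
    for t, mu in suspensions:
        # 1 - Phi(z) is written as Phi(-z) to keep precision in the upper tail
        ll += math.log(norm_cdf(-(math.log(t) - mu) / sigma_prime))
    for t_lo, t_hi, mu in intervals:
        upper = norm_cdf((math.log(t_hi) - mu) / sigma_prime)
        lower = norm_cdf((math.log(t_lo) - mu) / sigma_prime)
        ll += math.log(upper - lower)
    return ll

# Hypothetical mixed data set: one failure, one suspension, one interval
print(lognormal_loglik(0.5, failures=[(120.0, 4.9)],
                       suspensions=[(200.0, 5.1)],
                       intervals=[(80.0, 100.0, 4.6)]))
```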
Hypothesis Tests
Hypothesis testing in R-DOE analyses is carried out using the likelihood ratio test. To test the significance of a factor, the corresponding effect coefficient(s), $\beta_i$, are tested. The following statements are used:
$$H_0: \beta_i = 0$$
$$H_1: \beta_i \neq 0$$
The statistic used for the test is the likelihood ratio, $LR$. The likelihood ratio for the parameter $\beta_i$ is calculated as follows:
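$$LR = -2\ln\left[\frac{L(\hat{\theta}_{(-i)})}{L(\hat{\theta})}\right] \qquad (9)$$
where:
$\hat{\theta}$ is the vector of all parameter estimates obtained using MLE (i.e., $\hat{\theta} = [\hat{\beta}_0\ \hat{\beta}_1\ \ldots\ \hat{\sigma}']'$).
$\hat{\theta}_{(-i)}$ is the vector of all parameter estimates excluding the estimate of $\beta_i$.
$L(\hat{\theta})$ is the value of the likelihood function when all parameters are included in the model.
$L(\hat{\theta}_{(-i)})$ is the value of the likelihood function when all parameters except $\beta_i$ are included in the model.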
If the null hypothesis, $H_0$, is true, then the ratio, $LR$, approximately follows the Chi-Squared distribution with one degree of freedom. Therefore, $H_0$ is rejected at a significance level, $\alpha$, if $LR$ is greater than the critical value $\chi^2_{1,\alpha}$.
The likelihood ratio test can also be used to test the significance of a number of parameters, $r$, at the same time. In this case, $L(\hat{\theta}_{(-r)})$ represents the likelihood value when none of the $r$ parameters to be tested are included in the model. In other words, $L(\hat{\theta}_{(-r)})$ represents the likelihood value for the reduced model that does not contain the parameters under test. Here, if all $r$ parameters are insignificant, the ratio will follow the Chi-Squared distribution with $k-(k-r) = r$ degrees of freedom, where $k$ is the number of parameters in the full model and $k-r$ is the number in the reduced model. Thus, if $LR > \chi^2_{r,\alpha}$, the null hypothesis, $H_0$, is rejected and it can be concluded that at least one of the $r$ parameters is significant.
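The decision rule above needs only the two maximized log-likelihoods and a Chi-Squared tail probability. The following is a minimal standard-library sketch (no SciPy assumed): `chi2_sf` uses the closed-form series that exists for integer degrees of freedom, and the log-likelihood values in the usage line are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi^2_df > x) for a positive integer df."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + sum_{i=1}^{(df-1)/2} exp(-h) * h^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h + (i - 0.5) * math.log(h) - math.lgamma(i + 0.5))
    return total

def likelihood_ratio_test(loglik_full: float, loglik_reduced: float, df: int, alpha: float):
    """Eqn. (9) on the log scale: LR = -2[ln L(reduced) - ln L(full)]."""
    lr = -2.0 * (loglik_reduced - loglik_full)
    p_value = chi2_sf(lr, df)
    return lr, p_value, p_value < alpha   # last item: reject H0?

# Hypothetical log-likelihoods: dropping r = 2 coefficients lowers Lambda by 3.1
print(likelihood_ratio_test(-41.2, -44.3, df=2, alpha=0.1))
```

Comparing the $p$ value with $\alpha$ is equivalent to comparing $LR$ with $\chi^2_{r,\alpha}$, and avoids tabulating critical values.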
Example 11.1
To illustrate the use of MLE in R-DOE analysis, consider the case where the life of a product is thought to be affected by two factors, $A$ and $B$. The failure of the product has been found to follow the lognormal distribution. The analyst decides to run an R-DOE analysis using a single replicate of the $2^2$ design. Previous studies indicate that the interaction between $A$ and $B$ does not affect the life of the product. The design for this experiment can be set up in DOE++ as shown in Figure 11.1. The resulting experiment design and the corresponding times-to-failure data obtained are shown in Figure 11.2. Note that, although the life data shown in Figure 11.2 are complete and regression techniques are therefore applicable, the calculations are shown using MLE because DOE++ uses MLE for all R-DOE analysis calculations.
Figure 11.1: Design properties for the experiment in Example 11.1.
Figure 11.2: The $2^2$ experiment design and the corresponding life data for Example 11.1.
Because the purpose of the experiment is to study two factors without considering their interaction, the applicable model for the lognormally distributed response data is:
$$\mu'_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} \qquad (10)$$
where $\mu'_i$ is the mean of the natural logarithm of the times-to-failure at the $i$th treatment combination ($i = 1, 2, 3, 4$), $\beta_1$ is the effect coefficient for factor $A$ and $\beta_2$ is the effect coefficient for factor $B$. The analysis for this case is carried out in DOE++ by dropping the interaction $AB$ using the Select Effects icon in the Control Panel.
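The following hypotheses need to be tested in this example:
$$H_0: \beta_1 = 0 \quad\text{versus}\quad H_1: \beta_1 \neq 0$$
This test investigates the main effect of factor $A$. The statistic for this test is:
$$LR_A = -2\ln\left[\frac{L(\hat{\beta}_0,\hat{\beta}_2,\hat{\sigma}')}{L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\sigma}')}\right]$$
where $L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\sigma}')$ represents the value of the likelihood function when all coefficients are included in the model and $L(\hat{\beta}_0,\hat{\beta}_2,\hat{\sigma}')$ represents the value of the likelihood function when all coefficients except $\beta_1$ are included in the model.
$$H_0: \beta_2 = 0 \quad\text{versus}\quad H_1: \beta_2 \neq 0$$
This test investigates the main effect of factor $B$. The statistic for this test is:
$$LR_B = -2\ln\left[\frac{L(\hat{\beta}_0,\hat{\beta}_1,\hat{\sigma}')}{L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\sigma}')}\right]$$
where $L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\sigma}')$ represents the value of the likelihood function when all coefficients are included in the model and $L(\hat{\beta}_0,\hat{\beta}_1,\hat{\sigma}')$ represents the value of the likelihood function when all coefficients except $\beta_2$ are included in the model.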
To calculate the test statistics, the maximum likelihood estimates of the parameters must be known. The estimates are obtained next.
MLE Estimates
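Since the life data for the present experiment are complete and follow the lognormal distribution, the likelihood function can be written as:
$$L = \prod_{i=1}^{4}\frac{1}{\sigma' T_i\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln(T_i)-\mu'_i}{\sigma'}\right)^2\right]$$
Substituting $\mu'_i$ from Eqn. (10), the likelihood function is:
$$L = \prod_{i=1}^{4}\frac{1}{\sigma' T_i\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln(T_i)-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}}{\sigma'}\right)^2\right]$$
Then the log-likelihood function is:
$$\Lambda = -4\ln\left(\sigma'\sqrt{2\pi}\right) - \sum_{i=1}^{4}\ln(T_i) - \frac{1}{2\sigma'^2}\sum_{i=1}^{4}\left(\ln(T_i)-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\right)^2 \qquad (11)$$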
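To obtain the MLE estimates of the parameters $\beta_0$, $\beta_1$, $\beta_2$ and $\sigma'$, the log-likelihood function must be differentiated with respect to these parameters:
$$\frac{\partial\Lambda}{\partial\beta_0} = \frac{1}{\sigma'^2}\sum_{i=1}^{4}\left(\ln(T_i)-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\right)$$
$$\frac{\partial\Lambda}{\partial\beta_1} = \frac{1}{\sigma'^2}\sum_{i=1}^{4}\left(\ln(T_i)-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\right)x_{i1}$$
$$\frac{\partial\Lambda}{\partial\beta_2} = \frac{1}{\sigma'^2}\sum_{i=1}^{4}\left(\ln(T_i)-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\right)x_{i2}$$
$$\frac{\partial\Lambda}{\partial\sigma'} = -\frac{4}{\sigma'} + \frac{1}{\sigma'^3}\sum_{i=1}^{4}\left(\ln(T_i)-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\right)^2$$
Equating these terms to zero returns the required estimates. The coefficients $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are obtained first, as they are required to estimate $\hat{\sigma}'$. Setting $\partial\Lambda/\partial\beta_0 = 0$:
$$\sum_{i=1}^{4}\left(\ln(T_i)-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\right) = 0$$
Substituting the coded values of $x_{i1}$ and $x_{i2}$ from Figure 11.2 (each indicator column of the $2^2$ design contains two $-1$ and two $+1$ entries, so $\sum_i x_{i1} = \sum_i x_{i2} = \sum_i x_{i1}x_{i2} = 0$) and simplifying:
$$\sum_{i=1}^{4}\ln(T_i) - 4\beta_0 = 0$$
Thus:
$$\hat{\beta}_0 = \frac{1}{4}\sum_{i=1}^{4}\ln(T_i) \qquad (12)$$
Setting $\partial\Lambda/\partial\beta_1 = 0$ and using $x_{i1}^2 = 1$:
$$\sum_{i=1}^{4}x_{i1}\ln(T_i) - 4\beta_1 = 0$$
Thus:
$$\hat{\beta}_1 = \frac{1}{4}\sum_{i=1}^{4}x_{i1}\ln(T_i) \qquad (13)$$
Setting $\partial\Lambda/\partial\beta_2 = 0$ and simplifying in the same way:
$$\sum_{i=1}^{4}x_{i2}\ln(T_i) - 4\beta_2 = 0$$
Thus:
$$\hat{\beta}_2 = \frac{1}{4}\sum_{i=1}^{4}x_{i2}\ln(T_i) \qquad (14)$$
Knowing $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$, $\hat{\sigma}'$ can now be obtained. Setting $\partial\Lambda/\partial\sigma' = 0$:
$$-\frac{4}{\sigma'} + \frac{1}{\sigma'^3}\sum_{i=1}^{4}\left(\ln(T_i)-\hat{\beta}_0-\hat{\beta}_1 x_{i1}-\hat{\beta}_2 x_{i2}\right)^2 = 0$$
Thus:
$$\hat{\sigma}' = \sqrt{\frac{1}{4}\sum_{i=1}^{4}\left(\ln(T_i)-\hat{\beta}_0-\hat{\beta}_1 x_{i1}-\hat{\beta}_2 x_{i2}\right)^2} \qquad (15)$$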
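For a balanced, single-replicate $2^2$ design with complete data and $-1/+1$ coding, setting the derivatives of Eqn. (11) to zero gives closed-form estimates: coded-column averages of $\ln(T)$ for the coefficients and a root-mean-square residual (divided by $n$, not $n-p$) for $\sigma'$. The sketch below assumes hypothetical failure times, not the data of Figure 11.2, and checks that every partial derivative vanishes at the estimates.

```python
import math

# Coded levels of a single-replicate 2^2 design in standard order.
X1 = [-1, 1, -1, 1]
X2 = [-1, -1, 1, 1]
# Hypothetical times-to-failure (NOT the data of Figure 11.2).
TIMES = [150.0, 175.0, 310.0, 290.0]

def mle_2x2_lognormal(times, x1, x2):
    """Closed-form MLEs for complete lognormal data and balanced +/-1 columns."""
    y = [math.log(t) for t in times]
    n = len(y)
    b0 = sum(y) / n
    b1 = sum(a * v for a, v in zip(x1, y)) / n
    b2 = sum(b * v for b, v in zip(x2, y)) / n
    resid = [v - b0 - b1 * a - b2 * b for v, a, b in zip(y, x1, x2)]
    sigma = math.sqrt(sum(r * r for r in resid) / n)   # MLE divides by n
    return b0, b1, b2, sigma

def score(params, times, x1, x2):
    """Partial derivatives of the log-likelihood of Eqn. (11); all ~0 at the MLE."""
    b0, b1, b2, s = params
    r = [math.log(t) - b0 - b1 * a - b2 * b for t, a, b in zip(times, x1, x2)]
    n = len(r)
    return (sum(r) / s**2,
            sum(ri * a for ri, a in zip(r, x1)) / s**2,
            sum(ri * b for ri, b in zip(r, x2)) / s**2,
            -n / s + sum(ri * ri for ri in r) / s**3)

est = mle_2x2_lognormal(TIMES, X1, X2)
print(est, score(est, TIMES, X1, X2))
```

With censored observations these closed forms no longer hold and the likelihood has to be maximized numerically.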
Once the estimates have been calculated, the likelihood ratio test can be carried out for the two factors.
Likelihood Ratio Test
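The likelihood ratio test for factor $A$ is conducted by using the likelihood value corresponding to the full model and the likelihood value when $\beta_1$ is not included in the model. The likelihood value corresponding to the full model (in this case $L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\sigma}')$) is:
$$L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\sigma}') = \prod_{i=1}^{4}\frac{1}{\hat{\sigma}' T_i\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln(T_i)-\hat{\beta}_0-\hat{\beta}_1 x_{i1}-\hat{\beta}_2 x_{i2}}{\hat{\sigma}'}\right)^2\right]$$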
The corresponding logarithmic value is .
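The likelihood value for the reduced model that does not contain factor $A$ (in this case $L(\hat{\beta}_0,\hat{\beta}_2,\hat{\sigma}')$) is:
$$L(\hat{\beta}_0,\hat{\beta}_2,\hat{\sigma}') = \prod_{i=1}^{4}\frac{1}{\hat{\sigma}' T_i\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln(T_i)-\hat{\beta}_0-\hat{\beta}_2 x_{i2}}{\hat{\sigma}'}\right)^2\right]$$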
The corresponding logarithmic value is .
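Therefore, the likelihood ratio to test the significance of factor $A$ is:
$$LR_A = -2\ln\left[\frac{L(\hat{\beta}_0,\hat{\beta}_2,\hat{\sigma}')}{L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\sigma}')}\right] \qquad (16)$$
The $p$ value corresponding to $LR_A$ is:
$$p\ \text{value} = 1 - P\left(\chi^2_1 < LR_A\right)$$
Assuming that the desired significance level for the present experiment is 0.1, since $p\ \text{value} > 0.1$, $H_0: \beta_1 = 0$ cannot be rejected and it is concluded that factor $A$ does not significantly affect the life of the product.
The likelihood ratio to test factor $B$ can be calculated in a similar way, as shown next:
$$LR_B = -2\ln\left[\frac{L(\hat{\beta}_0,\hat{\beta}_1,\hat{\sigma}')}{L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\sigma}')}\right] \qquad (17)$$
The $p$ value corresponding to $LR_B$ is:
$$p\ \text{value} = 1 - P\left(\chi^2_1 < LR_B\right)$$
Since $p\ \text{value} < 0.1$, $H_0: \beta_2 = 0$ is rejected and it is concluded that factor $B$ affects the life of the product. The previous calculation results are displayed in the Likelihood Ratio Test table of the DOE++ results, as shown in Figure 11.3.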
Figure 11.3: Likelihood ratio test results from DOE++ for the experiment in Example 11.1.
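The two likelihood ratio tests can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the failure times are hypothetical (not Figure 11.2), the columns are coded $-1/+1$, and the reduced model is refitted by MLE, so $\hat{\sigma}'$ is re-estimated for each model while the orthogonal coefficients keep their full-model values.

```python
import math

# Coded columns of a single-replicate 2^2 design; times are hypothetical.
COLUMNS = {"A": [-1, 1, -1, 1], "B": [-1, -1, 1, 1]}
TIMES = [150.0, 175.0, 310.0, 290.0]

def max_loglik(times, terms):
    """Maximized lognormal log-likelihood for mu'_i = b0 + sum of b_j * x_ij over `terms`.

    The +/-1 columns are orthogonal, so each coefficient MLE is a simple projection;
    sigma' is re-estimated for whichever model is fitted.
    """
    y = [math.log(t) for t in times]
    n = len(y)
    b0 = sum(y) / n
    coefs = {c: sum(x * v for x, v in zip(COLUMNS[c], y)) / n for c in terms}
    resid = [v - b0 - sum(coefs[c] * COLUMNS[c][i] for c in terms) for i, v in enumerate(y)]
    s = math.sqrt(sum(r * r for r in resid) / n)
    return sum(-math.log(s * t * math.sqrt(2.0 * math.pi)) - 0.5 * (r / s) ** 2
               for t, r in zip(times, resid))

def lr_test(times, factor, alpha=0.1):
    """LR statistic of Eqn. (16) or (17) with its 1-df chi-square p value."""
    full = max_loglik(times, list(COLUMNS))
    reduced = max_loglik(times, [c for c in COLUMNS if c != factor])
    lr = -2.0 * (reduced - full)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))   # P(chi^2_1 > lr)
    return lr, p_value, p_value < alpha

for f in COLUMNS:
    print(f, lr_test(TIMES, f))
```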
Fisher Matrix Bounds on Parameters
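In general, the MLE estimates of the parameters are asymptotically normal. This means that for large sample sizes the distribution of the estimates from the same population would be very close to the normal distribution [12]. If $\hat{\theta}$ is the MLE estimate of any parameter, $\theta$, then the $(1-\alpha)\cdot 100\%$ two-sided confidence bounds on the parameter are:
$$\hat{\theta} - z_{\alpha/2}\sqrt{Var(\hat{\theta})} < \theta < \hat{\theta} + z_{\alpha/2}\sqrt{Var(\hat{\theta})} \qquad (18)$$
where $Var(\hat{\theta})$ represents the variance of $\hat{\theta}$ and $z_{\alpha/2}$ is the critical value corresponding to a significance level of $\alpha/2$ on the standard normal distribution. The variance of the parameter, $Var(\hat{\theta})$, is obtained using the Fisher information matrix. For $k$ parameters, the Fisher information matrix is obtained from the log-likelihood function, $\Lambda$, as follows (with the derivatives evaluated at the MLE estimates):
$$F = \begin{bmatrix} -\frac{\partial^2\Lambda}{\partial\theta_1^2} & -\frac{\partial^2\Lambda}{\partial\theta_1\partial\theta_2} & \cdots & -\frac{\partial^2\Lambda}{\partial\theta_1\partial\theta_k} \\ -\frac{\partial^2\Lambda}{\partial\theta_2\partial\theta_1} & -\frac{\partial^2\Lambda}{\partial\theta_2^2} & \cdots & -\frac{\partial^2\Lambda}{\partial\theta_2\partial\theta_k} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{\partial^2\Lambda}{\partial\theta_k\partial\theta_1} & -\frac{\partial^2\Lambda}{\partial\theta_k\partial\theta_2} & \cdots & -\frac{\partial^2\Lambda}{\partial\theta_k^2} \end{bmatrix} \qquad (19)$$
The variance-covariance matrix is obtained by inverting the Fisher matrix $F$:
$$\begin{bmatrix} Var(\hat{\theta}_1) & Cov(\hat{\theta}_1,\hat{\theta}_2) & \cdots & Cov(\hat{\theta}_1,\hat{\theta}_k) \\ Cov(\hat{\theta}_2,\hat{\theta}_1) & Var(\hat{\theta}_2) & \cdots & Cov(\hat{\theta}_2,\hat{\theta}_k) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(\hat{\theta}_k,\hat{\theta}_1) & Cov(\hat{\theta}_k,\hat{\theta}_2) & \cdots & Var(\hat{\theta}_k) \end{bmatrix} = F^{-1}$$
Once the variance-covariance matrix is known, the variance of any parameter can be obtained from the diagonal elements of the matrix. Note that if a parameter, $\theta$, can take only positive values, it is assumed that $\ln(\hat{\theta})$ follows the normal distribution [12]. The bounds on the parameter in this case are:
$$\ln(\hat{\theta}) - z_{\alpha/2}\sqrt{Var(\ln(\hat{\theta}))} < \ln(\theta) < \ln(\hat{\theta}) + z_{\alpha/2}\sqrt{Var(\ln(\hat{\theta}))}$$
Using the first-order approximation $Var(\ln(\hat{\theta})) = \left(\frac{1}{\hat{\theta}}\right)^2 Var(\hat{\theta})$, we get $\sqrt{Var(\ln(\hat{\theta}))} = \frac{\sqrt{Var(\hat{\theta})}}{\hat{\theta}}$. Substituting this value we have:
$$\hat{\theta}\cdot e^{-\frac{z_{\alpha/2}\sqrt{Var(\hat{\theta})}}{\hat{\theta}}} < \theta < \hat{\theta}\cdot e^{\frac{z_{\alpha/2}\sqrt{Var(\hat{\theta})}}{\hat{\theta}}} \qquad (20)$$
Knowing $Var(\hat{\theta})$ from the variance-covariance matrix, the confidence bounds on $\theta$ can then be determined.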
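Eqns. (18) and (20) translate into two small functions. This is a minimal sketch using Python's `statistics.NormalDist` for $z_{\alpha/2}$; the estimates and variances in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """z_{alpha/2}: upper alpha/2 point of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def normal_bounds(theta_hat: float, var: float, alpha: float):
    """Eqn. (18): two-sided (1 - alpha) bounds assuming theta_hat is approximately normal."""
    half_width = z_critical(alpha) * math.sqrt(var)
    return theta_hat - half_width, theta_hat + half_width

def positive_bounds(theta_hat: float, var: float, alpha: float):
    """Eqn. (20): bounds for a positive parameter, assuming ln(theta_hat) is approximately normal."""
    factor = math.exp(z_critical(alpha) * math.sqrt(var) / theta_hat)
    return theta_hat / factor, theta_hat * factor

# Hypothetical estimates and variances, 90% bounds (alpha = 0.1)
print(normal_bounds(0.31, 0.0008, 0.1), positive_bounds(0.055, 0.0004, 0.1))
```

The bounds from Eqn. (20) are asymmetric about $\hat{\theta}$ and can never fall below zero, which is the reason for using them for $\sigma'$.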
Example 11.2
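Continuing with Example 11.1, the confidence bounds on the MLE estimates of the parameters $\beta_0$, $\beta_1$, $\beta_2$ and $\sigma'$ can now be obtained. The Fisher information matrix for the example, obtained from the log-likelihood function of Eqn. (11) and evaluated at the MLE estimates, is:
$$F = \begin{bmatrix} -\frac{\partial^2\Lambda}{\partial\beta_0^2} & -\frac{\partial^2\Lambda}{\partial\beta_0\partial\beta_1} & -\frac{\partial^2\Lambda}{\partial\beta_0\partial\beta_2} & -\frac{\partial^2\Lambda}{\partial\beta_0\partial\sigma'} \\ -\frac{\partial^2\Lambda}{\partial\beta_1\partial\beta_0} & -\frac{\partial^2\Lambda}{\partial\beta_1^2} & -\frac{\partial^2\Lambda}{\partial\beta_1\partial\beta_2} & -\frac{\partial^2\Lambda}{\partial\beta_1\partial\sigma'} \\ -\frac{\partial^2\Lambda}{\partial\beta_2\partial\beta_0} & -\frac{\partial^2\Lambda}{\partial\beta_2\partial\beta_1} & -\frac{\partial^2\Lambda}{\partial\beta_2^2} & -\frac{\partial^2\Lambda}{\partial\beta_2\partial\sigma'} \\ -\frac{\partial^2\Lambda}{\partial\sigma'\partial\beta_0} & -\frac{\partial^2\Lambda}{\partial\sigma'\partial\beta_1} & -\frac{\partial^2\Lambda}{\partial\sigma'\partial\beta_2} & -\frac{\partial^2\Lambda}{\partial\sigma'^2} \end{bmatrix}$$
The variance-covariance matrix can be obtained by taking the inverse of the Fisher matrix $F$:
$$\begin{bmatrix} Var(\hat{\beta}_0) & Cov(\hat{\beta}_0,\hat{\beta}_1) & Cov(\hat{\beta}_0,\hat{\beta}_2) & Cov(\hat{\beta}_0,\hat{\sigma}') \\ Cov(\hat{\beta}_1,\hat{\beta}_0) & Var(\hat{\beta}_1) & Cov(\hat{\beta}_1,\hat{\beta}_2) & Cov(\hat{\beta}_1,\hat{\sigma}') \\ Cov(\hat{\beta}_2,\hat{\beta}_0) & Cov(\hat{\beta}_2,\hat{\beta}_1) & Var(\hat{\beta}_2) & Cov(\hat{\beta}_2,\hat{\sigma}') \\ Cov(\hat{\sigma}',\hat{\beta}_0) & Cov(\hat{\sigma}',\hat{\beta}_1) & Cov(\hat{\sigma}',\hat{\beta}_2) & Var(\hat{\sigma}') \end{bmatrix} = F^{-1}$$
Inverting $F$ returns the following matrix:
Therefore, the variances of the parameter estimates are: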
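Knowing the variances, the confidence bounds on the parameters can be calculated. For example, the 90% bounds ($\alpha = 0.1$) on $\beta_1$ can be calculated as shown next:
$$\hat{\beta}_1 - z_{0.05}\sqrt{Var(\hat{\beta}_1)} < \beta_1 < \hat{\beta}_1 + z_{0.05}\sqrt{Var(\hat{\beta}_1)}$$
The 90% bounds on $\sigma'$ are (considering that $\sigma'$ can only take positive values):
$$\hat{\sigma}'\cdot e^{-\frac{z_{0.05}\sqrt{Var(\hat{\sigma}')}}{\hat{\sigma}'}} < \sigma' < \hat{\sigma}'\cdot e^{\frac{z_{0.05}\sqrt{Var(\hat{\sigma}')}}{\hat{\sigma}'}}$$
The standard error for the parameters can be obtained by taking the positive square root of the variance. For example, the standard error for $\hat{\beta}_1$ is:
$$se(\hat{\beta}_1) = \sqrt{Var(\hat{\beta}_1)}$$
The $Z$ statistic for $\beta_1$ is:
$$z_0 = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)}$$
The $p$ value corresponding to this statistic, based on the standard normal distribution, is:
$$p\ \text{value} = 2\left[1-\Phi(|z_0|)\right]$$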
The previous calculation results are displayed as MLE Information in the results obtained from DOE++, as shown in Figure 11.4. In the figure, the Effect corresponding to each factor is simply twice the MLE estimate of the coefficient for that factor. Generally, the $p$ value corresponding to any coefficient in the MLE Information table should be close to the $p$ value obtained from the likelihood ratio test (displayed in the Likelihood Ratio Test table of Figure 11.3). If the sample size is not large enough, as in the case of the present example, a difference may be seen between the two values. In such cases, the $p$ value from the likelihood ratio test should be given preference. For the present example, the $p$ value of 0.8318 for $\beta_1$, obtained from the likelihood ratio test, would be preferred to the $p$ value of 0.8313 displayed under MLE Information. For details see [12].
Figure 11.4: MLE information from DOE++ for Example 11.2.
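The Fisher-matrix route can be checked numerically. The sketch below is a minimal standard-library illustration with hypothetical data (not Figure 11.2): it builds Eqn. (19) by central finite differences of the log-likelihood of Eqn. (11), inverts it with a small Gauss-Jordan routine, and forms the standard error, $Z$ statistic and $p$ value for $\beta_1$. For this balanced design the inverse works out analytically to $Var(\hat{\beta}_j) = \hat{\sigma}'^2/4$ and $Var(\hat{\sigma}') = \hat{\sigma}'^2/8$, which serves as a cross-check.

```python
import math

X1 = [-1, 1, -1, 1]
X2 = [-1, -1, 1, 1]
TIMES = [150.0, 175.0, 310.0, 290.0]   # hypothetical data

def loglik(p):
    """Eqn. (11) for parameters p = [beta0, beta1, beta2, sigma']."""
    b0, b1, b2, s = p
    total = 0.0
    for t, a, b in zip(TIMES, X1, X2):
        r = (math.log(t) - b0 - b1 * a - b2 * b) / s
        total += -math.log(s * t * math.sqrt(2.0 * math.pi)) - 0.5 * r * r
    return total

def fisher_matrix(f, p, h=1e-5):
    """Eqn. (19): F[j][m] = -d2 f / dp_j dp_m, by central finite differences."""
    k = len(p)
    def at(j, dj, m, dm):
        q = list(p)
        q[j] += dj * h
        q[m] += dm * h
        return f(q)
    return [[-(at(j, 1, m, 1) - at(j, 1, m, -1) - at(j, -1, m, 1) + at(j, -1, m, -1)) / (4 * h * h)
             for m in range(k)] for j in range(k)]

def invert(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                fac = m[r][c]
                m[r] = [vr - fac * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

# Closed-form MLEs for this balanced design
y = [math.log(t) for t in TIMES]
b0 = sum(y) / 4
b1 = sum(a * v for a, v in zip(X1, y)) / 4
b2 = sum(b * v for b, v in zip(X2, y)) / 4
s = math.sqrt(sum((v - b0 - b1 * a - b2 * b) ** 2 for v, a, b in zip(y, X1, X2)) / 4)
est = [b0, b1, b2, s]

cov = invert(fisher_matrix(loglik, est))   # variance-covariance matrix = F^-1
se_b1 = math.sqrt(cov[1][1])
z0 = b1 / se_b1
p_wald = math.erfc(abs(z0) / math.sqrt(2.0))   # 2[1 - Phi(|z0|)]
print(se_b1, z0, p_wald)
```

With only four observations, the Wald $p$ value computed this way will generally not coincide with the likelihood ratio $p$ value for the same data, which mirrors the small discrepancy noted for Figure 11.4.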
R-DOE Analysis of Data Following the Weibull Distribution
The Weibull distribution is, along with the lognormal and exponential distributions, one of the distributions commonly used in Reliability DOE analysis to investigate stresses that affect the life of a product.
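The probability density function for the two parameter Weibull distribution is:
$$f(T) = \frac{\beta}{\eta}\left(\frac{T}{\eta}\right)^{\beta-1} e^{-\left(\frac{T}{\eta}\right)^{\beta}} \qquad (21)$$
where $\eta$ is the scale parameter of the Weibull distribution and $\beta$ is the shape parameter [19]. To distinguish the Weibull shape parameter from the effect coefficients, the shape parameter is represented as $\delta$ instead of $\beta$ in the remainder of this chapter.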
For data following the two parameter Weibull distribution, the life characteristic used in R-DOE analysis is the scale parameter, $\eta$ [18]. Since $\eta$ represents life data that cannot take negative values, a logarithmic transformation is applied to it. The resulting model used in the R-DOE analysis for a two factor experiment with each factor at two levels can be written as follows:
$$\ln(\eta_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} \qquad (22)$$
where:
$\eta_i$ is the value of the scale parameter at the $i$th treatment combination of the two factors.
$x_1$ is the indicator variable representing the level of the first factor.
$x_2$ is the indicator variable representing the level of the second factor.
$\beta_0$ is the intercept term.
$\beta_1$ and $\beta_2$ are the effect coefficients for the two factors.
$\beta_{12}$ is the effect coefficient for the interaction of the two factors.
The model can be easily expanded to include other factors and their interactions. Note that when any data follow the Weibull distribution, the logarithmic transformation of the data follows the extreme-value distribution, whose probability density function is given as follows:
$$f(T') = \frac{1}{\sigma}\, e^{\frac{T'-u}{\sigma} - e^{\frac{T'-u}{\sigma}}} \qquad (23)$$
where $T' = \ln(T)$, the $T$s follow the Weibull distribution, $u$ is the location parameter of the extreme-value distribution and $\sigma$ is the scale parameter of the extreme-value distribution. The two sets of parameters are related by $u = \ln(\eta)$ and $\sigma = 1/\delta$. Eqns. (22) and (23) show that for R-DOE analysis of life data that follow the Weibull distribution, the random error terms, $\epsilon_i$, will follow the extreme-value distribution (and not the normal distribution). Hence, regression techniques are not applicable even if the data are complete, and maximum likelihood estimation has to be used.
Maximum Likelihood Estimation for the Weibull Distribution
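The likelihood function for complete data in R-DOE analysis of Weibull distributed life data is:
$$L_{failures} = \prod_{i=1}^{F_e}\frac{\delta}{\eta_i}\left(\frac{T_i}{\eta_i}\right)^{\delta-1} e^{-\left(\frac{T_i}{\eta_i}\right)^{\delta}}$$
where:
$F_e$ is the total number of observed times-to-failure
$\eta_i$ is the life characteristic at the $i$th treatment
$T_i$ is the time of the $i$th failure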
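For right censored data, the likelihood function is:
$$L_{suspensions} = \prod_{j=1}^{S} e^{-\left(\frac{T_j}{\eta_j}\right)^{\delta}}$$
where:
$S$ is the total number of observed suspensions
$T_j$ is the time of the $j$th suspension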
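For interval data, the likelihood function is:
$$L_{interval} = \prod_{k=1}^{FI}\left[e^{-\left(\frac{T_k^L}{\eta_k}\right)^{\delta}} - e^{-\left(\frac{T_k^U}{\eta_k}\right)^{\delta}}\right]$$
where:
$FI$ is the total number of interval data
$T_k^L$ is the beginning time of the $k$th interval
$T_k^U$ is the end time of the $k$th interval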
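In each of the likelihood functions, $\eta_i$ is substituted based on Eqn. (22) as:
$$\eta_i = e^{\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2}}$$
The complete likelihood function when all types of data (complete, right censored and interval) are present is:
$$L = L_{failures}\cdot L_{suspensions}\cdot L_{interval}$$
Then the log-likelihood function is:
$$\Lambda = \ln(L) = \ln(L_{failures}) + \ln(L_{suspensions}) + \ln(L_{interval})$$
The MLE estimates are obtained by solving for the parameters so that:
$$\frac{\partial\Lambda}{\partial\beta_0} = 0,\quad \frac{\partial\Lambda}{\partial\beta_1} = 0,\quad \ldots,\quad \frac{\partial\Lambda}{\partial\delta} = 0$$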
Once the estimates are obtained, the significance of any parameter, $\beta_i$, can be assessed using the likelihood ratio test. Other results can also be obtained as discussed in Chapter 11, Maximum Likelihood Estimation for the Lognormal Distribution and Chapter 11, Fisher Matrix Bounds on Parameters.
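The Weibull likelihood terms above can be sketched the same way as the lognormal ones. This minimal, hypothetical implementation takes each observation with its already-computed $\eta$ (for example from the log link of Eqn. (22)); a left-censored reading can be passed as an interval starting at 0.

```python
import math

def eta_from_model(coefs, x):
    """Eqn. (22)-style log link: eta_i = exp(beta_0 + sum_j beta_j * x_ij).

    Hypothetical convention: coefs = [beta_0, beta_1, ...]; x = [x_i1, x_i2, ...]
    (include products such as x_i1 * x_i2 in x if interactions are modeled).
    """
    return math.exp(coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x)))

def weibull_loglik(delta, failures=(), suspensions=(), intervals=()):
    """ln L_failures + ln L_suspensions + ln L_interval for Weibull life data.

    failures: (T_i, eta_i); suspensions: (T_j, eta_j); intervals: (T_L, T_U, eta_k).
    """
    ll = 0.0
    for t, eta in failures:
        ll += math.log(delta / eta) + (delta - 1.0) * math.log(t / eta) - (t / eta) ** delta
    for t, eta in suspensions:
        ll -= (t / eta) ** delta
    for t_lo, t_hi, eta in intervals:
        ll += math.log(math.exp(-((t_lo / eta) ** delta)) - math.exp(-((t_hi / eta) ** delta)))
    return ll

# Hypothetical example: a failure at level x1 = -1 and a suspension at x1 = +1
coefs = [6.0, 0.4]
print(weibull_loglik(1.8, failures=[(300.0, eta_from_model(coefs, [-1]))],
                     suspensions=[(500.0, eta_from_model(coefs, [1]))]))
```

Setting `delta = 1.0` turns the failure term into $\ln(\lambda) - \lambda T$ with $\lambda = 1/\eta$, which is the exponential case described next.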
R-DOE Analysis of Data Following the Exponential Distribution
The exponential distribution is a special case of the Weibull distribution when the shape parameter is equal to 1. Substituting $\delta = 1$ in the probability density function of Eqn. (21) gives:
$$f(T) = \frac{1}{\eta}e^{-\frac{T}{\eta}} = \lambda e^{-\lambda T} \qquad (24)$$
where $1/\eta$ of Eqn. (21) has been replaced by $\lambda$. Parameter $\lambda$ is called the failure rate [19]. Hence, R-DOE analysis for exponentially distributed data can be carried out by substituting $\delta = 1$ and replacing $1/\eta$ by $\lambda$ in the Weibull distribution.
Model Diagnostics
Residual plots can be used to check whether the model obtained, based on the MLE estimates, is a good fit to the data. DOE++ uses standardized residuals for R-DOE analyses. If the data follow the lognormal distribution, then the standardized residuals are calculated using the following equation:
$$\hat{e}_i = \frac{\ln(T_i) - \hat{\mu}'_i}{\hat{\sigma}'} \qquad (25)$$
For the probability plot, the standardized residuals are displayed on a normal probability plot. This is because under the assumed model for the lognormal distribution, the standardized residuals should follow a normal distribution with a mean of 0 and a standard deviation of 1.
For data that follow the Weibull distribution, the standardized residuals are calculated as shown next:
$$\hat{e}_i = \hat{\delta}\left[\ln(T_i) - \ln(\hat{\eta}_i)\right] \qquad (26)$$
The probability plot, in this case, is used to check whether the residuals follow the standard extreme-value distribution (location parameter of 0 and scale parameter of 1). Note that in all residual plots, when an observation, $T_i$, is censored, the corresponding residual is also censored.
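Both residual definitions are one-liners. The sketch below assumes the Weibull residual takes the form $\hat{\delta}[\ln(T_i) - \ln(\hat{\eta}_i)]$ and, as a sanity check rather than a diagnostic, simulates Weibull data with known hypothetical parameters: residuals computed with the true parameters follow the standard extreme-value distribution, so about $1 - e^{-1} \approx 63\%$ of them should be at or below zero.

```python
import math
import random

def lognormal_residual(t, mu_hat, sigma_hat):
    """Standardized residual of Eqn. (25): approximately standard normal under the model."""
    return (math.log(t) - mu_hat) / sigma_hat

def weibull_residual(t, eta_hat, delta_hat):
    """Assumed form of Eqn. (26): delta_hat * [ln(t) - ln(eta_hat)], standard extreme-value."""
    return delta_hat * (math.log(t) - math.log(eta_hat))

# Simulation check with hypothetical parameters (true values used in place of MLEs)
rng = random.Random(1)
eta, delta = 500.0, 2.0
res = [weibull_residual(rng.weibullvariate(eta, delta), eta, delta) for _ in range(20000)]
print(sum(e <= 0 for e in res) / len(res), 1 - math.exp(-1))
```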
Application Examples
Example 11.3
Figure 11.5: The 2 experiment design for Example 11.3 to study factors affecting the reliability of fluorescent lights.
Figure 11.6: Results of the R-DOE analysis for the experiment in Example 11.3.
This example illustrates the use of R-DOE analysis to design reliability into products. An experiment was carried out to investigate the effect of five factors (each at two levels) on the reliability of fluorescent lights (Taguchi, 1987, p. 930). The factors, $A$ through $E$, were studied using a 2 design (with the defining relations and ) under the assumption that all interaction effects, except , can be assumed to be inactive. For each treatment, two lights were tested (two replicates), with readings taken every two days. The experiment was run for 20 days and, if a light had not failed by the 20th day, it was treated as a suspension. The experimental design and the corresponding failure times are shown in Figure 11.5. The short duration of the experiment and the short failure times were probably due to the lights being tested under stresses higher than normal use conditions. The failure of the lights was assumed to follow the lognormal distribution.
The analysis results from DOE++ for this experiment are shown in Figure 11.6. The results are obtained by selecting the main effects of the five factors and the interaction using the Select Effects icon in the Control Panel. The results show that factors , and are active at a significance level of 0.05. The MLE estimates of the effect coefficients corresponding to these factors are , and , respectively. Based on these coefficients, the best settings for these effects to improve the reliability of the fluorescent lights (by maximizing the response, which in this case is the failure time) are:
- Factor should be set at the lower level of since its coefficient is negative
- Factor should be set at the higher level of since its coefficient is positive
- Factor should be set at the lower level of since its coefficient is negative
Note that, since actual factor levels are not disclosed (presumably for proprietary reasons), predictions beyond the test conditions cannot be carried out in this case.
Example 11.4
Consider a product whose reliability is thought to be affected by eight potential factors: $A$ (temperature), $B$ (humidity), $C$ (load), $D$ (fan-speed), $E$ (voltage), $F$ (material), $G$ (vibration) and $H$ (current). Assuming that all interaction effects are absent, a $2^{8-4}$ design is used to investigate the eight factors at two levels. The generators used to obtain the design are , , and . The design and the corresponding life data obtained are shown in Figure 11.7. Readings for the experiment are taken every 20 time units and the test is terminated at 200 time units. The life of the product is assumed to follow the Weibull distribution.
The results from DOE++ for this experiment are shown in Figure 11.8. The results show that only factors $A$ and $D$ are active at a significance level of 0.1. Assume that, in terms of the actual units, the low level ($-1$) of factor $A$ corresponds to a temperature of 333 K and the high level ($+1$) corresponds to a temperature of 383 K. Similarly, assume that the two levels of factor $D$ are 1000 and 2000, respectively. From the MLE estimates of the effect coefficients, it can be noted that to improve reliability (by maximizing the response) factors $A$ and $D$ should be set as follows:
- Factor $A$ should be set at the lower level of 333 K since its coefficient is negative
- Factor $D$ should be set at the higher level of 2000 since its coefficient is positive
Figure 11.7: The $2^{8-4}$ design to investigate the reliability of a product for Example 11.4.
Figure 11.8: Results for the experiment in Example 11.4.
Now assume that the use conditions for the product for the significant factors, $A$ and $D$, are a temperature of 298 K and a fan-speed of 3000, respectively. The analysis can be taken a step further to obtain an estimate of the reliability of the product at the use conditions using ReliaSoft's ALTA software. The data are entered into ALTA as shown in Figure 11.9. ALTA allows the nature of the relationship between life and stress to be modeled. It is assumed that the relation between the life of the product and temperature follows the Arrhenius relation [18], while the relation between life and fan-speed follows the inverse power law relation [18]. Using these relations, ALTA fits the following model for the data in Figure 11.9:
(27)
Figure 11.9: Additional reliability analysis for Example 11.4, conducted using ReliaSoft's ALTA software.
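Based on this model, the B10 life of the product at the use conditions (the time by which 10% of the units are expected to fail, i.e., the time at which reliability drops to 90%) is obtained as shown next. The Weibull reliability equation is:
$$R(T) = e^{-\left(\frac{T}{\eta}\right)^{\delta}} \qquad (28)$$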
Substituting the value of from Eqn. (27) and the value of as obtained from ALTA, the reliability equation becomes:
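Finally, substituting the use conditions (Temp = 298 K, Fan-Speed = 3000) and the desired reliability value of 90%, the B10 life is obtained by solving Eqn. (28) for $T$, with $\eta$ evaluated at the use conditions:
$$T_{B10} = \eta\left[-\ln(0.90)\right]^{1/\delta}$$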
Therefore, at the use conditions, the B10 life of the product is 225 time units. This result and other reliability metrics can be directly obtained from ALTA.
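Inverting Eqn. (28) gives the B-life directly. A minimal sketch; the values of $\eta$ and $\delta$ at the use conditions are hypothetical placeholders, not the ALTA estimates behind Figure 11.9.

```python
import math

def weibull_reliability(t, eta, delta):
    """Eqn. (28): R(T) = exp[-(T/eta)^delta]."""
    return math.exp(-((t / eta) ** delta))

def b_life(eta, delta, reliability=0.90):
    """Invert Eqn. (28): the time at which R(T) falls to `reliability` (B10 for 0.90)."""
    return eta * (-math.log(reliability)) ** (1.0 / delta)

# Hypothetical eta and delta at the use conditions (not the ALTA results of Figure 11.9)
eta_use, delta_hat = 900.0, 1.6
b10 = b_life(eta_use, delta_hat)
print(b10, weibull_reliability(b10, eta_use, delta_hat))
```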
Additional R-DOE Analyses on Single Factor Experiments
DOE++ also allows for the analysis of single factor R-DOE experiments. This analysis is similar to the analysis of single factor designed experiments discussed in Chapter 6. In single factor R-DOE analysis, the focus is on discovering whether a change in the level of a factor affects reliability and how each of the factor levels differs from the other levels. The analysis models and calculations are similar to those of multi-factor R-DOE analysis.
Example 11.5
To illustrate single factor R-DOE analysis, consider the data in Table 11.1, where life data readings for a product are taken at three levels of a certain factor, $A$. Factor $A$ may be a stress that is thought to affect life, three different designs of the same product, the same product manufactured by three different machines or operators, etc. The goal of the experiment is to see whether there is a change in life due to a change in the level of the factor. The design for this experiment is shown in Figure 11.10. The life of the product is assumed to follow the Weibull distribution. Therefore, the life characteristic to be used in the R-DOE analysis is the scale parameter, $\eta$. Since factor $A$ has three levels, the model for the life characteristic, $\eta$, is:
$$\ln(\eta_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} \qquad (29)$$
where $\beta_0$ is the intercept, $\beta_1$ is the effect coefficient for the first level of the factor ($\beta_1$ is represented as "A[1]" in DOE++) and $\beta_2$ is the effect coefficient for the second level of the factor ($\beta_2$ is represented as "A[2]" in DOE++). Two indicator variables, $x_1$ and $x_2$, are used to represent the three levels of factor $A$ such that:
$x_1 = 1,\ x_2 = 0$: Treatment Level 1
$x_1 = 0,\ x_2 = 1$: Treatment Level 2
$x_1 = -1,\ x_2 = -1$: Treatment Level 3
Table 11.1: Data obtained from a single factor R-DOE experiment.
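The following hypothesis test needs to be carried out in this example:
$$H_0: \beta_1 = \beta_2 = 0$$
$$H_1: \beta_j \neq 0 \text{ for at least one } j$$
where $j = 1, 2$. The statistic for this test is:
$$LR = -2\ln\left[\frac{L(\hat{\beta}_0,\hat{\delta})}{L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\delta})}\right]$$
where $L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\delta})$ is the value of the likelihood function corresponding to the full model, and $L(\hat{\beta}_0,\hat{\delta})$ is the likelihood value for the reduced model. To calculate the statistic for this test, the MLE estimates of the parameters must be obtained.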
Figure 11.10: Experiment design for Example 11.5.
MLE Estimates
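Following the procedure used in the analysis of multi-factor R-DOE experiments, MLE estimates of the parameters are obtained by differentiating the log-likelihood function $\Lambda$:
$$\frac{\partial\Lambda}{\partial\beta_0},\quad \frac{\partial\Lambda}{\partial\beta_1},\quad \frac{\partial\Lambda}{\partial\beta_2},\quad \frac{\partial\Lambda}{\partial\delta}$$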
Substituting from Eqn. (29) and setting the partial derivatives to zero, the parameter estimates are obtained as , , and . These parameters are shown in Figure 11.11 in the MLE Information table.
Figure 11.11: MLE results for the experiment in Example 11.5.
Likelihood Ratio Test
Knowing the MLE estimates, the likelihood ratio test for the significance of factor $A$ can be carried out. The likelihood value for the full model, $L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\delta})$, is the value of the likelihood function corresponding to the model $\ln(\eta_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$:
The likelihood value for the reduced model, $L(\hat{\beta}_0,\hat{\delta})$, is the value of the likelihood function corresponding to the model $\ln(\eta_i) = \beta_0$:
Then the likelihood ratio is:
$$LR = -2\ln\left[\frac{L(\hat{\beta}_0,\hat{\delta})}{L(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\delta})}\right]$$
If the null hypothesis, $H_0$, is true, then the likelihood ratio will approximately follow the Chi-Squared distribution. The number of degrees of freedom for this distribution is equal to the difference in the number of parameters between the full and the reduced model. In this case, this difference is 2. The $p$ value corresponding to the likelihood ratio on the Chi-Squared distribution with two degrees of freedom is:
$$p\ \text{value} = 1 - P\left(\chi^2_2 < LR\right)$$
Assuming that the desired significance is 0.1, since $p\ \text{value} < 0.1$, $H_0: \beta_1 = \beta_2 = 0$ is rejected and it is concluded that, at a significance of 0.1, at least one of the parameters, $\beta_1$ or $\beta_2$, is non-zero. Therefore, factor $A$ affects the life of the product. This result is shown in the Likelihood Ratio Test table in Figure 11.11.
Additional results for single factor R-DOE analysis obtained from DOE++ include information on the life characteristic and a comparison of the life characteristics at different levels of the factor.
Life Characteristic Summary Results
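Results in the Life Characteristic Summary table include information about the life characteristic corresponding to each treatment level of the factor. If $\ln(\eta_i)$ is represented as $\eta'_i$, then Eqn. (29) can be written as:
$$\eta'_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$$
The respective equations for all three treatment levels for a single replicate of the experiment can be expressed in matrix notation as:
$$\eta' = X\beta$$
where:
$$\eta' = \begin{bmatrix}\eta'_1\\ \eta'_2\\ \eta'_3\end{bmatrix},\quad X = \begin{bmatrix}1 & 1 & 0\\ 1 & 0 & 1\\ 1 & -1 & -1\end{bmatrix},\quad \beta = \begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\end{bmatrix}$$
Knowing $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$, the predicted value of the life characteristic at any level can be obtained. For example, for the second level ($x_1 = 0$, $x_2 = 1$):
$$\hat{\eta}'_2 = \hat{\beta}_0 + \hat{\beta}_2$$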
Thus:
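The variance for the predicted values of the life characteristic can be calculated using the following equation:
$$Var(\hat{\eta}') = X\,Var(\hat{\beta})\,X'$$
where $Var(\hat{\beta})$ is the variance-covariance matrix for $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$. Substituting the required values:
From the previous matrix, $Var(\hat{\eta}'_2)$ is the second diagonal element. Therefore, the 90% confidence interval ($\alpha = 0.1$) on $\eta'_2$ is:
$$\hat{\eta}'_2 - z_{0.05}\sqrt{Var(\hat{\eta}'_2)} < \eta'_2 < \hat{\eta}'_2 + z_{0.05}\sqrt{Var(\hat{\eta}'_2)}$$
Since $\eta'_2 = \ln(\eta_2)$, the 90% confidence interval on $\eta_2$ is:
$$e^{\hat{\eta}'_2 - z_{0.05}\sqrt{Var(\hat{\eta}'_2)}} < \eta_2 < e^{\hat{\eta}'_2 + z_{0.05}\sqrt{Var(\hat{\eta}'_2)}}$$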
Results for other levels can be calculated in a similar manner and are shown in Figure 11.12.
Figure 11.12: Life characteristic results for the experiment in Example 11.5.
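The Life Characteristic Summary calculation reduces to a dot product and a quadratic form per level. In the minimal sketch below, the coefficient estimates and their variance-covariance matrix are hypothetical placeholders (not the values in Figure 11.11), and the rows of `X` follow the three-level coding given for Eqn. (29).

```python
import math
from statistics import NormalDist

# Rows of X: coded (1, x1, x2) for levels 1, 2 and 3 of factor A.
X = [[1, 1, 0], [1, 0, 1], [1, -1, -1]]
# Hypothetical MLEs of beta_0, beta_1, beta_2 and their variance-covariance matrix.
BETA = [5.0, 0.4, -0.1]
COV = [[0.010, -0.002, 0.001],
       [-0.002, 0.020, -0.008],
       [0.001, -0.008, 0.020]]

def quad_form(u, c, v):
    """u' C v for small dense matrices."""
    return sum(u[i] * c[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def life_characteristic_summary(level, alpha=0.10):
    """Predicted eta'_i = ln(eta_i), its variance, and (1 - alpha) bounds on eta'_i and eta_i."""
    x = X[level - 1]
    eta_prime = sum(b * xi for b, xi in zip(BETA, x))
    var = quad_form(x, COV, x)          # diagonal element of X Var(beta) X'
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(var)
    lo, hi = eta_prime - half, eta_prime + half
    return eta_prime, var, (lo, hi), (math.exp(lo), math.exp(hi))

print(life_characteristic_summary(2))
```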
Life Comparisons Results
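Results under Life Comparisons include information on how life at one level differs from life at any other level of the factor. For example, the difference between the predicted values of life at levels 1 and 2 is (in terms of the logarithmic transformation):
$$\hat{\eta}'_1 - \hat{\eta}'_2 = (\hat{\beta}_0 + \hat{\beta}_1) - (\hat{\beta}_0 + \hat{\beta}_2) = \hat{\beta}_1 - \hat{\beta}_2$$
The pooled standard error for this difference can be obtained as:
$$se(\hat{\eta}'_1 - \hat{\eta}'_2) = \sqrt{Var(\hat{\eta}'_1) + Var(\hat{\eta}'_2)}$$
If the covariance between $\hat{\eta}'_1$ and $\hat{\eta}'_2$ is taken into account, then the pooled standard error is:
$$se(\hat{\eta}'_1 - \hat{\eta}'_2) = \sqrt{Var(\hat{\eta}'_1) + Var(\hat{\eta}'_2) - 2\,Cov(\hat{\eta}'_1, \hat{\eta}'_2)}$$
This is the value displayed by DOE++. Knowing the pooled standard error, the confidence interval on the difference can be calculated. The 90% confidence interval on the difference in (logarithmic) life between levels 1 and 2 of factor $A$ is:
$$(\hat{\eta}'_1 - \hat{\eta}'_2) \pm z_{0.05}\, se(\hat{\eta}'_1 - \hat{\eta}'_2)$$
Since the confidence interval does not include zero, it can be concluded that the two levels are significantly different at $\alpha = 0.1$. Another way to test for the significance of the difference in levels is to observe the $p$ value. The $Z$ statistic corresponding to this difference is:
$$z_0 = \frac{\hat{\eta}'_1 - \hat{\eta}'_2}{se(\hat{\eta}'_1 - \hat{\eta}'_2)}$$
The $p$ value corresponding to this statistic, based on the standard normal distribution, is:
$$p\ \text{value} = 2\left[1 - \Phi(|z_0|)\right]$$
Since $p\ \text{value} < 0.1$, it can be concluded that the levels are significantly different at $\alpha = 0.1$. The results for other levels can be calculated in a similar manner and are shown in Figure 11.12.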
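The Life Comparisons quantities follow from the same ingredients: the difference of two predicted life characteristics, its pooled standard error including the covariance term, the confidence interval, and the $Z$ test. This minimal sketch reuses the hypothetical estimates and variance-covariance matrix of the previous sketch; none of the numbers come from Figure 11.12.

```python
import math
from statistics import NormalDist

X = [[1, 1, 0], [1, 0, 1], [1, -1, -1]]   # coded rows for levels 1, 2, 3
BETA = [5.0, 0.4, -0.1]                   # hypothetical MLEs
COV = [[0.010, -0.002, 0.001],
       [-0.002, 0.020, -0.008],
       [0.001, -0.008, 0.020]]            # hypothetical Var(beta_hat)

def quad_form(u, c, v):
    """u' C v for small dense matrices."""
    return sum(u[i] * c[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def compare_levels(i, k, alpha=0.10):
    """Difference eta'_i - eta'_k, pooled standard error (with covariance), CI, z0 and p value."""
    xi, xk = X[i - 1], X[k - 1]
    diff = sum(b * (a - c) for b, a, c in zip(BETA, xi, xk))
    var = quad_form(xi, COV, xi) + quad_form(xk, COV, xk) - 2.0 * quad_form(xi, COV, xk)
    se = math.sqrt(var)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z0 = diff / se
    p_value = math.erfc(abs(z0) / math.sqrt(2.0))   # 2[1 - Phi(|z0|)]
    return diff, se, (diff - z_crit * se, diff + z_crit * se), z0, p_value

print(compare_levels(1, 2))
```

Dropping the `- 2.0 * quad_form(xi, COV, xk)` term gives the simpler pooled standard error that ignores the covariance.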