Parameter Estimation
Parameter estimation refers to the process of using sample data (in our case times-to-failure or success data) to estimate the parameters of the selected distribution. Several parameter estimation methods are available. This section presents an overview of the methods used in life data analysis. More specifically, we start with the relatively simple method of probability plotting and continue with the more sophisticated methods of rank regression (or least squares) and maximum likelihood.
MLE (Maximum Likelihood) Parameter Estimation for Complete Data
From a statistical point of view, the method of maximum likelihood estimation is, with some exceptions, considered to be the most robust of the parameter estimation techniques discussed here. This method is presented in this section for complete data, that is, data consisting only of single times-to-failure.
Background on Theory
The basic idea behind MLE is to obtain the most likely values of the parameters, for a given distribution, that will best describe the data. As an example, consider the following data (-3, 0, 4) and assume that you are trying to estimate the mean of the data. Now, if you have to choose the most likely value for the mean from -5, 1 and 10, which one would you choose? In this case, the most likely value is 1 (given your limit on choices). Similarly, under MLE, one determines the most likely values for the parameters of the assumed distribution.
It is mathematically formulated as follows:
If [math]\displaystyle{ x }[/math] is a continuous random variable with [math]\displaystyle{ pdf: }[/math]
- [math]\displaystyle{ f(x;{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}) }[/math]
where [math]\displaystyle{ {\theta _1},{\theta _2},...,{\theta _k} }[/math] are [math]\displaystyle{ k }[/math] unknown parameters that need to be estimated from [math]\displaystyle{ R }[/math] independent observations, [math]\displaystyle{ {{x}_{1}},{{x}_{2}},...,{{x}_{R}} }[/math], which in the case of life data analysis correspond to failure times. The likelihood function is given by:
- [math]\displaystyle{ L({{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}|{{x}_{1}},{{x}_{2}},...,{{x}_{R}})=L=\underset{i=1}{\overset{R}{\mathop \prod }}\,f({{x}_{i}};{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}) }[/math]
- [math]\displaystyle{ i=1,2,...,R }[/math]
The logarithmic likelihood function is given by:
- [math]\displaystyle{ \Lambda = \ln L =\sum_{i = 1}^R \ln f({x_i};{\theta _1},{\theta _2},...,{\theta _k}) }[/math]
The maximum likelihood estimators (or parameter values) of [math]\displaystyle{ {{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}, }[/math] are obtained by maximizing [math]\displaystyle{ L }[/math] or [math]\displaystyle{ \Lambda . }[/math]
By maximizing [math]\displaystyle{ \Lambda , }[/math] which is much easier to work with than [math]\displaystyle{ L }[/math], the maximum likelihood estimators (MLE) of [math]\displaystyle{ {{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}} }[/math] are the simultaneous solutions of [math]\displaystyle{ k }[/math] equations such that:
- [math]\displaystyle{ \frac{\partial{\Lambda}}{\partial{\theta_j}}=0, \text{ j=1,2...,k} }[/math]
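To make the maximization concrete, the following is a minimal numerical sketch, assuming Python with NumPy and SciPy, a hypothetical complete data set and an assumed exponential distribution with a single parameter [math]\displaystyle{ \lambda }[/math]; it maximizes [math]\displaystyle{ \Lambda }[/math] by minimizing its negative:
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical complete data set of failure times (illustrative values only).
failure_times = np.array([105.0, 320.0, 480.0, 610.0, 745.0])

def neg_log_likelihood(lam):
    # Negative log-likelihood for an assumed exponential pdf f(x; lam) = lam * exp(-lam * x).
    if lam <= 0:
        return np.inf
    return -np.sum(np.log(lam) - lam * failure_times)

# Maximize Lambda = ln L by minimizing -Lambda over a bounded interval for lam.
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1.0), method="bounded")
lam_hat = result.x

# For the exponential distribution the MLE also has a closed form, lam = 1 / mean(x),
# which serves as a check on the numerical answer.
print(lam_hat, 1.0 / failure_times.mean())
</syntaxhighlight>
Setting the partial derivative of [math]\displaystyle{ \Lambda }[/math] with respect to [math]\displaystyle{ \lambda }[/math] equal to zero gives the same closed-form solution for this distribution, in agreement with the equations above.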
Even though it is common practice to plot the MLE solutions using median ranks (points are plotted according to median ranks and the line according to the MLE solutions), this is not completely representative. As can be seen from the equations above, the MLE method is independent of any kind of ranks. For this reason, the MLE solution often appears not to track the data on the probability plot. This is perfectly acceptable since the two methods are independent of each other, and in no way suggests that the solution is wrong.
Comments on the MLE Method
The MLE method has many large sample properties that make it attractive for use. It is asymptotically consistent, which means that as the sample size gets larger, the estimates converge to the right values. It is asymptotically efficient, which means that for large samples, it produces the most precise estimates. It is asymptotically unbiased, which means that for large samples one expects to get the right value on average. The distribution of the estimates themselves is normal, if the sample is large enough, and this is the basis for the usual Fisher Matrix confidence bounds discussed later. These are all excellent large sample properties.
Unfortunately, the size of the sample necessary to achieve these properties can be quite large: from thirty to fifty, to more than a hundred exact failure times, depending on the application. With fewer points, the method can be badly biased. It is known, for example, that MLE estimates of the shape parameter for the Weibull distribution are badly biased for small sample sizes, and the effect can be increased depending on the amount of censoring. This bias can cause major discrepancies in analysis.
There are also pathological situations when the asymptotic properties of the MLE do not apply. One of these is estimating the location parameter for the three-parameter Weibull distribution when the shape parameter has a value close to 1. These problems, too, can cause major discrepancies.
However, MLE can handle suspensions and interval data better than rank regression, particularly when dealing with a heavily censored data set with few exact failure times or when the censoring times are unevenly distributed. It can also provide estimates with one or no observed failures, which rank regression cannot do. As a rule of thumb, our recommendation is to use rank regression techniques when the sample sizes are small and without heavy censoring (censoring is discussed in Chapter 4). When heavy or uneven censoring is present, when a high proportion of interval data is present and/or when the sample size is sufficient, MLE should be preferred.
Bayesian Methods
Up to this point, we have dealt exclusively with what is commonly referred to as classical statistics. In this section, another school of thought in statistical analysis will be introduced, namely Bayesian statistics. The premise of Bayesian statistics (within the context of life data analysis) is to incorporate prior knowledge, along with a given set of current observations, in order to make statistical inferences. The prior information could come from operational or observational data, from previous comparable experiments or from engineering knowledge. This type of analysis can be particularly useful when there is limited test data for a given design or failure mode but there is a strong prior understanding of the failure rate behavior for that design or mode. By incorporating prior information about the parameter(s), a posterior distribution for the parameter(s) can be obtained and inferences on the model parameters and their functions can be made. This section is intended to give a quick and elementary overview of Bayesian methods, focused primarily on the material necessary for understanding the Bayesian analysis methods available in Weibull++. Extensive coverage of the subject can be found in numerous books dealing with Bayesian statistics.
Bayes’s Rule
Bayes’s rule provides the framework for combining prior information with sample data. In this reference, we apply Bayes’s rule to combine prior information on the assumed distribution's parameter(s) with sample data in order to make inferences based on the model. The prior knowledge about the parameter(s) is expressed in terms of a distribution [math]\displaystyle{ \varphi (\theta ) }[/math], called the prior distribution. The posterior distribution of [math]\displaystyle{ \theta }[/math] given the sample data, obtained using Bayes’s rule, provides the updated information about the parameters [math]\displaystyle{ \theta }[/math]. This is expressed with the following posterior [math]\displaystyle{ pdf }[/math]:
- [math]\displaystyle{ f(\theta |Data) = \frac{L(Data|\theta )\varphi (\theta )}{\int_{\zeta} L(Data|\theta )\varphi(\theta )\,d\theta} }[/math]
- where:
- [math]\displaystyle{ \theta }[/math] is a vector of the parameters of the chosen distribution,
- [math]\displaystyle{ \zeta }[/math] is the range of [math]\displaystyle{ \theta }[/math] ,
- [math]\displaystyle{ L(Data|\theta) }[/math] is the likelihood function based on the chosen distribution and data, and
- [math]\displaystyle{ \varphi(\theta ) }[/math] is the prior distribution for each of the parameters.
The integral in Eqn. (BayesRuleGeneral) is often referred to as the marginal probability and can be interpreted as the probability of obtaining the sample data given the prior distribution; it is a constant. Generally, the integral in Eqn. (BayesRuleGeneral) does not have a closed-form solution and numerical methods are needed to evaluate it.
As can be seen from Eqn. (BayesRuleGeneral), there is a significant difference between classical and Bayesian statistics. First, the idea of prior information does not exist in classical statistics; all inferences in classical statistics are based on the sample data. In the Bayesian framework, on the other hand, prior information constitutes the basis of the theory. Another difference lies in the overall approach to making inferences and their interpretation. For example, in Bayesian analysis the parameters of the distribution to be fitted are themselves random variables; strictly speaking, no single distribution is fitted to the data in the Bayesian case.
For instance, consider the case where data are obtained from a reliability test. Based on prior experience with a similar product, the analyst believes that the shape parameter of the Weibull distribution has a value between [math]\displaystyle{ {\beta _1} }[/math] and [math]\displaystyle{ {{\beta }_{2}} }[/math] and wants to utilize this information. This can be achieved by using Bayes’s theorem. At this point, the analyst is automatically imposing the Weibull distribution as the model for the data, with a shape parameter between [math]\displaystyle{ {\beta _1} }[/math] and [math]\displaystyle{ {\beta _2} }[/math]. In this example, the range of values for the shape parameter constitutes the prior distribution, which in this case is uniform. By applying Eqn. (BayesRuleGeneral), the posterior distribution of the shape parameter is obtained. Thus, we end up with a distribution for the parameter rather than a point estimate, as in classical statistics.
To better illustrate the example, assume that a set of failure data was provided along with a distribution for the shape parameter of the Weibull (i.e., a uniform prior), thereby automatically assuming that the data are Weibull distributed. Based on that, a new distribution (the posterior) for that parameter is obtained using Eqn. (BayesRuleGeneral). This posterior distribution of the parameter may or may not resemble the assumed prior distribution in form. In other words, in this example the prior distribution of [math]\displaystyle{ \beta }[/math] was assumed to be uniform, but the posterior is most likely not a uniform distribution.
The question now becomes: what is the value of the shape parameter? What about the reliability and other results of interest? In order to answer these questions, we have to remember that in the Bayesian framework all of these metrics are random variables. Therefore, in order to obtain an estimate, a probability needs to be specified or we can use the expected value of the posterior distribution.
In order to demonstrate the procedure of obtaining results from the posterior distribution, we will rewrite Eqn. (BayesRuleGeneral) for a single parameter [math]\displaystyle{ {\theta _1} }[/math]:
- [math]\displaystyle{ f(\theta_1 |Data) = \frac{L(Data|\theta_1 )\varphi (\theta_1 )}{\int_{\zeta} L(Data|\theta_1 )\varphi(\theta_1 )\,d\theta_1} }[/math]
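As an illustration of Eqn. (BayesRuleSingle), the following minimal sketch, assuming Python with NumPy, a Weibull likelihood with a hypothetical known scale parameter eta, and a uniform prior on the shape parameter between assumed bounds beta_1 and beta_2 (as in the example above), evaluates the posterior [math]\displaystyle{ pdf }[/math] on a grid and normalizes it numerically, since the marginal probability in the denominator rarely has a closed form:
<syntaxhighlight lang="python">
import numpy as np

# Hypothetical complete failure times and an assumed, known Weibull scale parameter eta.
failure_times = np.array([105.0, 320.0, 480.0, 610.0, 745.0])
eta = 500.0

# Uniform prior for the shape parameter beta between assumed bounds beta_1 and beta_2.
beta_1, beta_2 = 0.5, 3.0
beta_grid = np.linspace(beta_1, beta_2, 2001)
d_beta = beta_grid[1] - beta_grid[0]
prior = np.ones_like(beta_grid) / (beta_2 - beta_1)

def log_likelihood(beta):
    # ln L(Data|beta) for the two-parameter Weibull pdf with eta held fixed:
    # f(t; beta, eta) = (beta/eta) * (t/eta)^(beta-1) * exp(-(t/eta)^beta)
    t = failure_times
    return np.sum(np.log(beta / eta) + (beta - 1.0) * np.log(t / eta) - (t / eta) ** beta)

# Unnormalized posterior L(Data|beta) * prior(beta), computed in log space to avoid underflow.
log_post = np.array([log_likelihood(b) for b in beta_grid]) + np.log(prior)
post = np.exp(log_post - log_post.max())

# Normalizing approximates the marginal probability in the denominator of Eqn. (BayesRuleSingle).
post /= post.sum() * d_beta
</syntaxhighlight>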
The expected value (or mean value) of the parameter [math]\displaystyle{ {{\theta }_{1}} }[/math] can be obtained using Eqns. (mean) and (BayesRuleSingle):
- [math]\displaystyle{ E({\theta _1}) = {m_{{\theta _1}}} = \int_{\zeta}^{}{\theta _1} \cdot f({\theta _1}|Data)d{\theta _1} }[/math]
An alternative result for [math]\displaystyle{ {\theta _1} }[/math] would be the median value. Using Eqns. (median) and (BayesRuleSingle):
- [math]\displaystyle{ \int_{-\infty ,0}^{{\theta }_{0.5}}f({{\theta }_{1}}|Data)d{{\theta }_{1}}=0.5 }[/math]
Eqn. (bayesMedian) is solved for [math]\displaystyle{ {\theta _{0.5}} }[/math], the median value of [math]\displaystyle{ {\theta _1} }[/math].
Similarly, any other percentile of the posterior [math]\displaystyle{ pdf }[/math] can be calculated and reported. For example, one could calculate the [math]\displaystyle{ 90th }[/math] percentile of [math]\displaystyle{ {\theta _1} }[/math]’s posterior [math]\displaystyle{ pdf }[/math]:
- [math]\displaystyle{ \int_{-\infty ,0}^{{{\theta }_{0.9}}}f({{\theta }_{1}}|Data)d{{\theta }_{1}}=0.9 }[/math]
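Continuing the sketch above (reusing the beta_grid, d_beta and post arrays), the expected value, median and 90th percentile of the posterior follow directly from Eqn. (mean), Eqn. (bayesMedian) and the percentile expression above:
<syntaxhighlight lang="python">
# Expected value of beta, Eqn. (mean): integral of beta * f(beta|Data) over the grid.
beta_mean = np.sum(beta_grid * post) * d_beta

# Cumulative posterior; the median and the 90th percentile are the values of beta at which
# the cumulative probability first reaches 0.5 and 0.9, respectively.
cdf = np.cumsum(post) * d_beta
beta_median = beta_grid[np.searchsorted(cdf, 0.5)]
beta_p90 = beta_grid[np.searchsorted(cdf, 0.9)]

print(beta_mean, beta_median, beta_p90)
</syntaxhighlight>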
This calculation will be used in Chapter 5 for obtaining confidence bounds on the parameter(s).
The next step is to make inferences on the reliability. Since the parameter [math]\displaystyle{ {\theta _1} }[/math] is a random variable described by the posterior [math]\displaystyle{ pdf }[/math], all functions of [math]\displaystyle{ {{\theta }_{1}} }[/math] are random variables as well, with distributions entirely determined by the posterior [math]\displaystyle{ pdf }[/math] of [math]\displaystyle{ {{\theta }_{1}} }[/math]. Therefore, their expected value, median or other percentile values also need to be calculated from that posterior. For example, the expected reliability at time [math]\displaystyle{ T }[/math] is:
- [math]\displaystyle{ E[R(T|Data)] = \int_{\zeta} R(T)f(\theta |Data)\,d\theta }[/math]
In other words, at a given time [math]\displaystyle{ T }[/math], there is a distribution that governs the reliability value at that time, [math]\displaystyle{ T }[/math], and by using Eqn. (BayesRel), the expected (or mean) value of the reliability is obtained. Other percentiles of this distribution can also be obtained. A similar procedure is followed for other functions of [math]\displaystyle{ {\theta _1} }[/math], such as failure rate, reliable life, etc.
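As a final continuation of the same sketch, Eqn. (BayesRel) can be evaluated by computing the Weibull reliability at a chosen time for every value of beta on the grid and averaging it under the posterior (again with eta assumed known, and with the time value purely illustrative):
<syntaxhighlight lang="python">
# Expected reliability at a chosen time T, Eqn. (BayesRel):
# E[R(T|Data)] = integral of R(T; beta) * f(beta|Data) d(beta), with R(T; beta) = exp(-(T/eta)^beta).
T = 400.0
reliability_per_beta = np.exp(-(T / eta) ** beta_grid)
expected_reliability = np.sum(reliability_per_beta * post) * d_beta
print(expected_reliability)
</syntaxhighlight>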
Prior Distributions
Prior distributions play a very important role in Bayesian statistics; they are essentially the basis of Bayesian analysis. Different types of prior distributions exist, namely informative and non-informative. Non-informative prior distributions (also known as vague, flat or diffuse priors) are distributions that have no population basis and play a minimal role in the posterior distribution. The idea behind using non-informative prior distributions is to make inferences that are not greatly affected by external information, or to handle cases where external information is not available. The uniform distribution is frequently used as a non-informative prior.
On the other hand, informative priors have a stronger influence on the posterior distribution. The influence of the prior distribution on the posterior is related to the sample size of the data and the form of the prior. Generally speaking, large sample sizes are required to modify strong priors, whereas weak priors are overwhelmed by even relatively small sample sizes. Informative priors are typically obtained from past data.