Proportional Hazards Model
Latest revision as of 16:26, 29 October 2021
This article also appears in the Accelerated Life Testing Data Analysis Reference book.
Introduced by D. R. Cox, the Proportional Hazards (PH) model was developed to estimate the effects of different covariates on the times-to-failure of a system. The model has been widely used in the biomedical field, as discussed in Leemis [22], and there has recently been increasing interest in its application in reliability engineering. In its original form, the model is non-parametric (i.e., no assumptions are made about the nature or shape of the underlying failure distribution). This reference considers both the original non-parametric formulation and a parametric form of the model that uses a Weibull life distribution. In ALTA, the proportional hazards model is included in its parametric form and can be used to analyze data with up to eight variables. The GLL-Weibull and GLL-exponential models are special cases of the proportional hazards model. However, when using the proportional hazards model in ALTA, no transformation of the covariates (or stresses) can be performed.
Non-Parametric Model Formulation
According to the PH model, the failure rate of a system is affected not only by its operation time, but also by the covariates under which it operates. For example, a unit may have been tested under a combination of different accelerated stresses such as humidity, temperature, voltage, etc. It is clear then that such factors affect the failure rate of a unit.
The instantaneous failure rate (or hazard rate) of a unit is given by:
- [math]\displaystyle{ \lambda (t)=\frac{f(t)}{R(t)}\,\! }[/math]
where:
- [math]\displaystyle{ f(t)\,\! }[/math] is the probability density function.
- [math]\displaystyle{ R(t)\,\! }[/math] is the reliability function.
When the failure rate of a unit depends not only on time but also on other covariates, the above equation must be modified to be a function of both time and the covariates. The proportional hazards model assumes that the failure rate (hazard rate) of a unit is the product of:
- an arbitrary and unspecified baseline failure rate, [math]\displaystyle{ {{\lambda }_{0}}(t),\,\! }[/math] which is a function of time only.
- a positive function [math]\displaystyle{ g(\underline{X},\underline{A})\,\! }[/math], independent of time, which incorporates the effects of a number of covariates such as humidity, temperature, pressure, voltage, etc.
The failure rate of a unit is then given by:
- [math]\displaystyle{ \lambda (t,\underline{X})={{\lambda }_{0}}(t)\cdot g(\underline{X},\underline{A})\,\! }[/math]
where:
- [math]\displaystyle{ \underline{X}\,\! }[/math] is a row vector consisting of the covariates:
- [math]\displaystyle{ \underline{X}=({{x}_{1}},{{x}_{2}},...,{{x}_{m}})\,\! }[/math]
- [math]\displaystyle{ \underline{A}\,\! }[/math] is a column vector consisting of the unknown parameters (also called regression parameters) of the model:
- [math]\displaystyle{ \underline{A}={{({{a}_{1}},{{a}_{2}},...{{a}_{m}})}^{T}}\,\! }[/math]
- where:
- [math]\displaystyle{ \quad \quad m\,\! }[/math] = number of stress-related covariates (time-independent).
It can be assumed that the form of [math]\displaystyle{ g(\underline{X},\underline{A})\,\! }[/math] is known and [math]\displaystyle{ {{\lambda }_{0}}(t)\,\! }[/math] is unspecified. Different forms of [math]\displaystyle{ g(\underline{X},\underline{A})\,\! }[/math] can be used.
However, the exponential form is the most commonly used because of its simplicity, and is given by:
- [math]\displaystyle{ g(\underline{X},\underline{A})={{e}^{{{\underline{A}}^{T}}{{\underline{X}}^{T}}}}={{e}^{\mathop{\sum}_{j=1}^{m}{{a}_{j}}{{x}_{j}}}}\,\! }[/math]
The failure rate can then be written as:
- [math]\displaystyle{ \lambda (t,\underline{X})={{\lambda }_{0}}(t)\cdot {{e}^{\mathop{\sum}_{j=1}^{m}{{a}_{j}}{{x}_{j}}}}\,\! }[/math]
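The defining property of this form is that the ratio of the failure rates of two units with different covariates does not depend on time or on the baseline. The following is a minimal Python sketch of that property. The function names, the coefficient values and the baseline function are hypothetical and chosen only for illustration.

```python
import math

def covariate_factor(x, a):
    """Exponential form g(X, A) = exp(sum_j a_j * x_j)."""
    if len(x) != len(a):
        raise ValueError("x and a must have the same length")
    return math.exp(sum(aj * xj for aj, xj in zip(a, x)))

def ph_hazard(t, x, a, baseline_hazard):
    """PH failure rate: baseline failure rate times the covariate factor."""
    return baseline_hazard(t) * covariate_factor(x, a)

def baseline(t):
    # Arbitrary illustrative baseline; the PH model leaves it unspecified.
    return 0.01 * t

# Hypothetical regression coefficients and two covariate settings.
a = [0.5, -0.2]
x_low = [1.0, 2.0]
x_high = [3.0, 2.0]

# The hazard ratio is the same at every time because the baseline cancels.
ratio_at_10 = ph_hazard(10.0, x_high, a, baseline) / ph_hazard(10.0, x_low, a, baseline)
ratio_at_500 = ph_hazard(500.0, x_high, a, baseline) / ph_hazard(500.0, x_low, a, baseline)
```

Both ratios equal exp(a_1 (3 - 1)), which is where the name "proportional hazards" comes from.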
Parametric Model Formulation
A parametric form of the proportional hazards model can be obtained by assuming an underlying distribution. In ALTA, the Weibull and exponential distributions are available. In this section we will consider the Weibull distribution to formulate the parametric proportional hazards model. In other words, it is assumed that the baseline failure rate is parametric and given by the Weibull distribution. In this case, the baseline failure rate is given by:
- [math]\displaystyle{ {{\lambda }_{0}}(t)=\frac{\beta }{\eta }{{\left( \frac{t}{\eta } \right)}^{\beta -1}}\,\! }[/math]
The PH failure rate then becomes:
- [math]\displaystyle{ \lambda (t,\underline{X})=\frac{\beta }{\eta }{{\left( \frac{t}{\eta } \right)}^{\beta -1}}\cdot {{e}^{\mathop{\sum}_{j=1}^{m}{{a}_{j}}{{x}_{j}}}}\,\! }[/math]
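It is often more convenient to define an additional covariate, [math]\displaystyle{ {{x}_{0}} = 1\,\! }[/math], so that the Weibull scale parameter raised to the power of the shape parameter can be absorbed into the vector of regression coefficients. Matching this form with the previous equation gives [math]\displaystyle{ {{e}^{{{a}_{0}}}}={{\eta }^{-\beta }}\,\! }[/math], that is, [math]\displaystyle{ {{a}_{0}}=-\beta \ln \eta \,\! }[/math]. The PH failure rate can then be written as: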
- [math]\displaystyle{ \lambda (t,\underline{X})=\beta \cdot {{t}^{\beta -1}}\cdot {{e}^{\mathop{\sum}_{j=0}^{m}{{a}_{j}}{{x}_{j}}}}\,\! }[/math]
The PH reliability function is given by:
- [math]\displaystyle{ \begin{align} R(t,\underline{X})=\ {{e}^{-\int_{0}^{t}\lambda (u)du}} =\ {{e}^{-\int_{0}^{t}\lambda (u,\underline{X})du}} =\ {{e}^{-{{t}^{\beta }}\cdot {{e}^{\mathop{\sum}_{j=0}^{m}{{a}_{j}}{{x}_{j}}}}}} \end{align}\,\! }[/math]
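As a sketch of the intermediate step, using only the failure rate defined above and assuming [math]\displaystyle{ \beta \gt 0\,\! }[/math], the covariate term does not depend on the integration variable and can be taken outside the integral:
- [math]\displaystyle{ \int_{0}^{t}\lambda (u,\underline{X})du={{e}^{\mathop{\sum}_{j=0}^{m}{{a}_{j}}{{x}_{j}}}}\int_{0}^{t}\beta \cdot {{u}^{\beta -1}}du={{t}^{\beta }}\cdot {{e}^{\mathop{\sum}_{j=0}^{m}{{a}_{j}}{{x}_{j}}}}\,\! }[/math]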
The pdf can be obtained by taking the negative of the partial derivative of the reliability function with respect to time, which is equivalent to multiplying the failure rate by the reliability function. The PH pdf is:
- [math]\displaystyle{ \begin{align} f(t,\underline{X})= & \lambda (t,\underline{X})\cdot R(t,\underline{X}) =\ \beta \cdot {{t}^{\beta -1}}{{e}^{\left[ \mathop{\sum}_{j=0}^{m}{{a}_{j}}{{x}_{j}}-{{t}^{\beta }}\cdot {{e}^{\mathop{\sum}_{j=0}^{m}{{a}_{j}}{{x}_{j}}}} \right]}} \end{align}\,\! }[/math]
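The three PH-Weibull functions above can be sketched in Python as follows. This is an illustration under stated assumptions, not ALTA's implementation. The function names and the parameter values are hypothetical, and the coefficient list is assumed to be ordered as a = [a_0, a_1, ..., a_m] with the covariate x_0 = 1 handled implicitly.

```python
import math

def linear_predictor(x, a):
    """sum_{j=0}^{m} a_j x_j, with x_0 = 1 and x = [x_1, ..., x_m]."""
    return a[0] + sum(aj * xj for aj, xj in zip(a[1:], x))

def phw_hazard(t, x, a, beta):
    """PH-Weibull failure rate: beta * t^(beta-1) * exp(sum a_j x_j)."""
    return beta * t ** (beta - 1.0) * math.exp(linear_predictor(x, a))

def phw_reliability(t, x, a, beta):
    """PH-Weibull reliability: exp(-t^beta * exp(sum a_j x_j))."""
    return math.exp(-(t ** beta) * math.exp(linear_predictor(x, a)))

def phw_pdf(t, x, a, beta):
    """PH-Weibull pdf as failure rate times reliability."""
    return phw_hazard(t, x, a, beta) * phw_reliability(t, x, a, beta)

# Hypothetical parameter values for one covariate (m = 1).
beta, a, x, t = 1.8, [-9.0, 0.4], [2.0], 50.0

# Numerical check that the pdf equals the negative time derivative of R.
h = 1e-4
numeric_pdf = -(phw_reliability(t + h, x, a, beta)
                - phw_reliability(t - h, x, a, beta)) / (2.0 * h)

# The implied Weibull scale parameter follows from a_0 = -beta * ln(eta).
eta = math.exp(-a[0] / beta)
```

Comparing `numeric_pdf` with `phw_pdf(t, x, a, beta)` is a quick way to confirm that the three functions are mutually consistent.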
The total number of unknowns to solve for in this model is [math]\displaystyle{ m+2\,\! }[/math] (i.e., [math]\displaystyle{ \beta ,{{a}_{0}},{{a}_{1}},...{{a}_{m}}\,\! }[/math]).
The maximum likelihood estimation method can be used to determine these parameters. The log-likelihood function for this case is given by:
- [math]\displaystyle{ \begin{align} \ln (L)=\Lambda = & \sum_{i=1}^{{{F}_{e}}}{{N}_{i}}\ln \left( \beta \cdot T_{i}^{\beta -1}{{e}^{-T_{i}^{\beta }\cdot {{e}^{\sum_{j=0}^{m}{{a}_{j}}{{x}_{i,j}}}}}}{{e}^{\sum_{j=0}^{m}{{a}_{j}}{{x}_{i,j}}}} \right) \\ & -\sum_{i=1}^{S}N_{i}^{\prime }{{\left( T_{i}^{\prime } \right)}^{\beta }}{{e}^{\sum_{j=0}^{m}{{a}_{j}}{{x}_{i,j}}}}+\sum_{i=1}^{FI}N_{i}^{\prime \prime }\ln [R_{Li}^{\prime \prime }-R_{Ri}^{\prime \prime }] \end{align}\,\! }[/math]
where:
- [math]\displaystyle{ \begin{align} R_{Li}^{\prime \prime }= & {{e}^{-{{\left( T_{Li}^{\prime \prime } \right)}^{\beta }}{{e}^{\sum_{j=0}^{m}{{a}_{j}}{{x}_{i,j}}}}}} \\ R_{Ri}^{\prime \prime }= & {{e}^{-{{\left( T_{Ri}^{\prime \prime } \right)}^{\beta }}{{e}^{\sum_{j=0}^{m}{{a}_{j}}{{x}_{i,j}}}}}} \end{align}\,\! }[/math]
Solving for the parameters that maximize the log-likelihood function will yield the parameters for the PH-Weibull model. Note that for [math]\displaystyle{ \beta =1 \,\! }[/math], the log-likelihood function becomes the log-likelihood function for the PH-exponential model, which is similar to the original form of the proportional hazards model proposed by Cox and Oakes [39].
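The log-likelihood above can be evaluated with a short Python function. The following is a minimal sketch, not ALTA's implementation. The record layouts (count, time, covariates) and (count, left time, right time, covariates), the function names and the data values are assumptions made for illustration. The first sum is expanded using ln(beta T^(beta-1) e^(-T^beta e^z) e^z) = ln(beta) + (beta - 1) ln(T) + z - T^beta e^z.

```python
import math

def linear_predictor(x, a):
    """sum_{j=0}^{m} a_j x_j, with x_0 = 1 and x = [x_1, ..., x_m]."""
    return a[0] + sum(aj * xj for aj, xj in zip(a[1:], x))

def phw_log_likelihood(beta, a, failures=(), suspensions=(), intervals=()):
    """PH-Weibull log-likelihood for exact, suspended and interval data."""
    total = 0.0
    for n, t, x in failures:            # exact times-to-failure
        z = linear_predictor(x, a)
        total += n * (math.log(beta) + (beta - 1.0) * math.log(t)
                      + z - t ** beta * math.exp(z))
    for n, t, x in suspensions:         # right-censored (suspended) units
        total -= n * t ** beta * math.exp(linear_predictor(x, a))
    for n, t_left, t_right, x in intervals:   # interval-censored units
        z = linear_predictor(x, a)
        r_left = math.exp(-(t_left ** beta) * math.exp(z))
        r_right = math.exp(-(t_right ** beta) * math.exp(z))
        total += n * math.log(r_left - r_right)
    return total

# Hypothetical data with no covariates (m = 0) and beta fixed at 1.
failures = [(1, 100.0, []), (1, 200.0, []), (1, 300.0, [])]
suspensions = [(1, 400.0, [])]

# For the exponential case the maximizing a_0 is known in closed form:
# exp(a_0) = (number of failures) / (total time on test) = 3 / 1000.
a0_hat = math.log(3.0 / 1000.0)
ll_hat = phw_log_likelihood(1.0, [a0_hat], failures, suspensions)
ll_lower = phw_log_likelihood(1.0, [a0_hat - 0.1], failures, suspensions)
ll_upper = phw_log_likelihood(1.0, [a0_hat + 0.1], failures, suspensions)
```

In this special case `ll_hat` exceeds both neighbouring values, which gives a simple sanity check. In the general case the function would be passed to a numerical optimizer over beta and the a_j.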
Note that the likelihood function of the GLL model is very similar to the likelihood function for the proportional hazards-Weibull model. In particular, the shape parameter of the Weibull distribution can be included in the regression coefficients as follows:
- [math]\displaystyle{ {{a}_{i,PH}}=-\beta \cdot {{a}_{i,GLL}}\,\! }[/math]
where:
- [math]\displaystyle{ {{a}_{i,PH}}\,\! }[/math] are the parameters of the PH model.
- [math]\displaystyle{ {{a}_{i,GLL}}\,\! }[/math] are the parameters of the general log-linear model.
In this case, the likelihood functions are identical. Therefore, if no transformation on the covariates is performed, the parameter values that maximize the likelihood function of the GLL model also maximize the likelihood function for the proportional hazards-Weibull (PHW) model. Note that for [math]\displaystyle{ \beta = 1\,\! }[/math] (exponential life distribution), the relationship reduces to [math]\displaystyle{ {{a}_{i,PH}}=-{{a}_{i,GLL}}.\,\! }[/math]
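The relationship between the two parameter sets can be checked numerically. The sketch below assumes that the GLL-Weibull model sets the Weibull scale parameter equal to the GLL life characteristic, eta = exp(alpha_0 + sum alpha_j x_j), with no transformation on the covariates. The function names and all numeric values are hypothetical.

```python
import math

def gll_weibull_hazard(t, x, alpha, beta):
    """Weibull failure rate with eta = exp(alpha_0 + sum alpha_j x_j) (assumed)."""
    eta = math.exp(alpha[0] + sum(aj * xj for aj, xj in zip(alpha[1:], x)))
    return (beta / eta) * (t / eta) ** (beta - 1.0)

def phw_hazard(t, x, a, beta):
    """PH-Weibull failure rate: beta * t^(beta-1) * exp(a_0 + sum a_j x_j)."""
    z = a[0] + sum(aj * xj for aj, xj in zip(a[1:], x))
    return beta * t ** (beta - 1.0) * math.exp(z)

# Hypothetical GLL parameters and covariate values.
beta = 2.5
alpha_gll = [6.0, -0.8, 0.3]
x = [1.5, 4.0]
t = 120.0

# Convert with a_PH = -beta * a_GLL, coefficient by coefficient.
a_ph = [-beta * alpha_j for alpha_j in alpha_gll]

hazard_gll = gll_weibull_hazard(t, x, alpha_gll, beta)
hazard_ph = phw_hazard(t, x, a_ph, beta)
```

The two failure rates agree up to floating-point error, so the two models describe the same distribution once the coefficients are converted.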
Indicator Variables
Another advantage of the multivariable relationships used in ALTA is that they allow for simultaneous analysis of continuous and categorical variables. Categorical variables are variables that take on discrete values such as the lot designation for products from different manufacturing lots. In this example, lot is a categorical variable, and it can be expressed in terms of indicator variables. Indicator variables only take a value of 1 or 0. For example, consider a sample of test units. A number of these units were obtained from Lot 1, others from Lot 2, and the rest from Lot 3. These three lots can be represented with the use of indicator variables, as follows:
- Define two indicator variables, [math]\displaystyle{ {{X}_{1}}\,\! }[/math] and [math]\displaystyle{ {{X}_{2}}.\,\! }[/math]
- For the units from Lot 1, [math]\displaystyle{ {{X}_{1}}=1,\,\! }[/math] and [math]\displaystyle{ {{X}_{2}}=0.\,\! }[/math]
- For the units from Lot 2, [math]\displaystyle{ {{X}_{1}}=0,\,\! }[/math] and [math]\displaystyle{ {{X}_{2}}=1.\,\! }[/math]
- For the units from Lot 3, [math]\displaystyle{ {{X}_{1}}=0,\,\! }[/math] and [math]\displaystyle{ {{X}_{2}}=0.\,\! }[/math]
Assume that an accelerated test was performed with these units, and temperature was the accelerated stress. In this case, the GLL relationship can be used to analyze the data. From this relationship we get:
- [math]\displaystyle{ L(\underline{X})={{e}^{{{\alpha }_{0}}+{{\alpha }_{1}}{{X}_{1}}+{{\alpha }_{2}}{{X}_{2}}+{{\alpha }_{3}}{{X}_{3}}}}\,\! }[/math]
where:
- [math]\displaystyle{ {{X}_{1}}\,\! }[/math] and [math]\displaystyle{ {{X}_{2}}\,\! }[/math] are the indicator variables, as defined above.
- [math]\displaystyle{ {{X}_{3}}=\tfrac{1}{T},\,\! }[/math] where [math]\displaystyle{ T\,\! }[/math] is the temperature.
The data can now be entered in ALTA and, with the assumption of an underlying life distribution and using MLE, the parameters of this model can be obtained.
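The indicator-variable encoding and the GLL relationship above can be sketched in Python as follows. The lot labels follow the example in the text. The function names, the coefficient values and the use of an absolute temperature scale are assumptions made for illustration.

```python
import math

# Indicator values (X1, X2) for each lot, as defined in the list above.
LOT_INDICATORS = {"Lot 1": (1, 0), "Lot 2": (0, 1), "Lot 3": (0, 0)}

def gll_covariates(lot, temperature):
    """Return (X1, X2, X3) with X3 = 1/T; T is assumed to be in absolute units."""
    x1, x2 = LOT_INDICATORS[lot]
    return (x1, x2, 1.0 / temperature)

def gll_life(x, alpha):
    """GLL relationship L(X) = exp(alpha_0 + alpha_1 X1 + alpha_2 X2 + alpha_3 X3)."""
    return math.exp(alpha[0] + sum(aj * xj for aj, xj in zip(alpha[1:], x)))

# Hypothetical coefficients; real values would come from the MLE fit.
alpha = [2.0, 0.3, -0.1, 1500.0]

life_lot1 = gll_life(gll_covariates("Lot 1", 350.0), alpha)
life_lot3 = gll_life(gll_covariates("Lot 3", 350.0), alpha)

# At a fixed temperature, Lot 3 acts as the reference level, so the
# Lot 1 to Lot 3 life ratio depends only on alpha_1.
lot1_vs_lot3 = life_lot1 / life_lot3
```

Two indicator variables are enough for three lots because the lot with both indicators equal to 0 serves as the reference level.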