Appendix A: Brief Statistical Background
This appendix provides a brief, elementary introduction to the most common and fundamental statistical equations and definitions used in reliability engineering and life data analysis. These equations and concepts are used extensively throughout this reference.
Basic Statistical Definitions
Random Variables
In general, most problems in reliability engineering deal with quantitative measures, such as the time-to-failure of a product, or whether the product fails or does not fail. In judging a product to be defective or non-defective, only two outcomes are possible. We can use a random variable to denote these possible outcomes (i.e. defective or non-defective). In this case, [math]\displaystyle{ X }[/math] is a random variable that can take on only these values.
In the case of times-to-failure, our random variable [math]\displaystyle{ X }[/math] can take on the time-to-failure of the product and can be in a range from [math]\displaystyle{ 0 }[/math] to infinity (since we do not know the exact time a priori).
In the first case, in which the random variable can take on only discrete values (let's say 0 for defective and 1 for non-defective), the variable is said to be a discrete random variable. In the second case, our product can be found failed at any time after time 0 (i.e. at 12 hr or at 100 hr and so forth), thus [math]\displaystyle{ X }[/math] can take on any value in this range. In this case, our random variable [math]\displaystyle{ X }[/math] is said to be a continuous random variable. In this reference, we will deal almost exclusively with continuous random variables.
The Probability Density and Cumulative Distribution Functions
Designations
From probability and statistics, given a continuous random variable [math]\displaystyle{ X, }[/math] we denote:
- The probability density function, [math]\displaystyle{ pdf }[/math] , as [math]\displaystyle{ f(x). }[/math]
- The cumulative distribution function, [math]\displaystyle{ cdf }[/math] , as [math]\displaystyle{ F(x). }[/math]
The [math]\displaystyle{ pdf }[/math] and [math]\displaystyle{ cdf }[/math] give a complete description of the probability distribution of a random variable.
Definitions
If [math]\displaystyle{ X }[/math] is a continuous random variable, then the probability density function, [math]\displaystyle{ pdf }[/math] , of [math]\displaystyle{ X }[/math] is a function [math]\displaystyle{ f(x) }[/math] such that for two numbers, [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] with [math]\displaystyle{ a\le b }[/math] :
- [math]\displaystyle{ P(a\le X\le b)=\int_{a}^{b}f(x)dx }[/math]
That is, the probability that [math]\displaystyle{ X }[/math] takes on a value in the interval [math]\displaystyle{ [a,b] }[/math] is the area under the density function from [math]\displaystyle{ a }[/math] to [math]\displaystyle{ b }[/math] .
The cumulative distribution function, [math]\displaystyle{ cdf }[/math] , is a function [math]\displaystyle{ F(x) }[/math] of a random variable [math]\displaystyle{ X }[/math] , and is defined for a number [math]\displaystyle{ x }[/math] by:
- [math]\displaystyle{ F(x)=P(X\le x)=\int_{0}^{x}f(s)ds }[/math]
That is, for a number [math]\displaystyle{ x }[/math] , [math]\displaystyle{ F(x) }[/math] is the probability that the observed value of [math]\displaystyle{ X }[/math] will be at most [math]\displaystyle{ x }[/math] .
Note that the limits of integration depend on the region over which the distribution denoted by [math]\displaystyle{ f(x) }[/math] is defined. For example, for all the life distributions considered in this reference, this range is [math]\displaystyle{ [0,+\infty ). }[/math]
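As a minimal sketch of these two definitions, the following Python example integrates a density numerically. The exponential density, the rate value `RATE`, and the helper names `integrate`, `interval_probability` and `cdf` are assumptions made only for illustration; they are not taken from this reference.

```python
import math

RATE = 0.01  # assumed failure rate per hour, chosen only for illustration


def pdf(x):
    """Assumed example density: f(x) = RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x) if x >= 0.0 else 0.0


def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def interval_probability(a, b):
    """P(a <= X <= b): the area under the pdf between a and b."""
    return integrate(pdf, a, b)


def cdf(x):
    """F(x) = P(X <= x), integrating from the lower limit 0 of this density."""
    return integrate(pdf, 0.0, x)


print(interval_probability(50.0, 150.0))  # area under the pdf on [50, 150]
print(cdf(100.0))                         # numerical F(100)
print(1.0 - math.exp(-RATE * 100.0))      # closed-form F(100) for comparison
```

For this assumed density the numerical and closed-form values of the cdf should agree closely, which is a quick check that the integration limits match the region over which the distribution is defined.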
Graphical representation of the [math]\displaystyle{ pdf }[/math] and [math]\displaystyle{ cdf }[/math]
Mathematical Relationship Between the [math]\displaystyle{ pdf }[/math] and [math]\displaystyle{ cdf }[/math]
The mathematical relationship between the [math]\displaystyle{ pdf }[/math] and [math]\displaystyle{ cdf }[/math] is given by:
- [math]\displaystyle{ F(x)=\int_{0}^{x}f(s)ds }[/math]
where [math]\displaystyle{ s }[/math] is a dummy integration variable.
Conversely:
- [math]\displaystyle{ f(x)=\frac{d(F(x))}{dx} }[/math]
In plain English, the [math]\displaystyle{ cdf }[/math] is the area under the probability density function up to a chosen value of [math]\displaystyle{ x }[/math] . The total area under the [math]\displaystyle{ pdf }[/math] is always equal to 1, or mathematically:
- [math]\displaystyle{ \int_{0}^{\infty }f(x)dx=1 }[/math]
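The relationship between the two functions can be checked numerically. The sketch below assumes an exponential distribution with a hypothetical rate `RATE`, for which both functions have simple closed forms, and compares a central-difference derivative of the [math]\displaystyle{ cdf }[/math] with the [math]\displaystyle{ pdf }[/math]. The function names are illustrative only.

```python
import math

RATE = 0.01  # assumed rate parameter, for illustration only


def pdf(x):
    """Assumed example density f(x) = RATE * exp(-RATE * x)."""
    return RATE * math.exp(-RATE * x)


def cdf(x):
    """Closed-form cdf of the assumed density: F(x) = 1 - exp(-RATE * x)."""
    return 1.0 - math.exp(-RATE * x)


def derivative(func, x, h=1e-4):
    """Central-difference approximation of d(func)/dx at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


# The derivative of the cdf should match the pdf at every point.
for x in (10.0, 50.0, 200.0):
    print(x, derivative(cdf, x), pdf(x))

# Far into the tail, the cdf approaches the total area under the pdf.
print(cdf(5000.0))
```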
An example of a probability density function is the well-known normal distribution, for which the [math]\displaystyle{ pdf }[/math] is given by:
- [math]\displaystyle{ f(t)=\frac{1}{\sigma \sqrt{2\pi }}{{e}^{-\tfrac{1}{2}{{\left( \tfrac{t-\mu }{\sigma } \right)}^{2}}}} }[/math]
where [math]\displaystyle{ \mu }[/math] is the mean and [math]\displaystyle{ \sigma }[/math] is the standard deviation. The normal distribution is a two-parameter distribution, with parameters [math]\displaystyle{ \mu }[/math] and [math]\displaystyle{ \sigma }[/math] .
Another is the lognormal distribution, whose [math]\displaystyle{ pdf }[/math] is given by:
- [math]\displaystyle{ f(t)=\frac{1}{t\cdot {{\sigma }^{\prime }}\sqrt{2\pi }}{{e}^{-\tfrac{1}{2}{{\left( \tfrac{{{t}^{\prime }}-{{\mu }^{\prime }}}{{{\sigma }^{\prime }}} \right)}^{2}}}} }[/math]
where [math]\displaystyle{ {t}'=\ln (t) }[/math] , [math]\displaystyle{ {\mu }' }[/math] is the mean of the natural logarithms of the times-to-failure, and [math]\displaystyle{ {\sigma }' }[/math] is the standard deviation of the natural logarithms of the times-to-failure. Again, this is a two-parameter distribution.
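The two densities above translate directly into code. The following sketch evaluates both and checks numerically that the area under each is close to 1. The parameter values, the finite integration limits and the helper `integrate` are assumptions chosen for illustration.

```python
import math


def normal_pdf(t, mu, sigma):
    """Normal pdf with mean mu and standard deviation sigma."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(t, mu_log, sigma_log):
    """Lognormal pdf; mu_log and sigma_log are the mean and standard
    deviation of ln(t). The density is taken as zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu_log) / sigma_log
    return math.exp(-0.5 * z * z) / (t * sigma_log * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Assumed illustrative parameters; the upper limits are finite stand-ins
# for infinity, chosen far enough out that the neglected tail is negligible.
normal_area = integrate(lambda t: normal_pdf(t, 100.0, 10.0), 0.0, 200.0)
lognormal_area = integrate(lambda t: lognormal_pdf(t, 4.0, 0.5), 0.0, 2000.0)
print(normal_area, lognormal_area)
```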
The Reliability Function
The reliability function can be derived using the previous definition of the cumulative distribution function. Note that the probability of an event occurring by time [math]\displaystyle{ t }[/math] (based on a continuous distribution given by [math]\displaystyle{ f(x), }[/math] or henceforth [math]\displaystyle{ f(t) }[/math] since our random variable of interest in life data analysis is time, or [math]\displaystyle{ t }[/math] ), is given by:
- [math]\displaystyle{ F(t)=\int_{0}^{t}f(s)ds }[/math]
One could equate this event to the probability of a unit failing by time [math]\displaystyle{ t }[/math] .
From this fact, the most commonly used function in reliability engineering, the reliability function, can then be obtained. The reliability function enables the determination of the probability of success of a unit, in undertaking a mission of a prescribed duration.
To show this mathematically, we first define the unreliability function, [math]\displaystyle{ Q(t) }[/math] , which is the probability of failure, or the probability that our time-to-failure lies between [math]\displaystyle{ 0 }[/math] and [math]\displaystyle{ t }[/math] . So from the expression for [math]\displaystyle{ F(t) }[/math] above:
- [math]\displaystyle{ F(t)=Q(t)=\int_{0}^{t}f(s)ds }[/math]
Reliability and unreliability are the probabilities of success and failure. These are the only two events being considered and they are mutually exclusive; hence, the sum of these probabilities is equal to unity. So then:
- [math]\displaystyle{ \begin{align} Q(t)+R(t)= & 1 \\ R(t)= & 1-Q(t) \\ R(t)= & 1-\int_{0}^{t}f(s)ds \\ R(t)= & \int_{t}^{\infty }f(s)ds \end{align} }[/math]
Conversely:
- [math]\displaystyle{ f(t)=-\frac{d(R(t))}{dt} }[/math]
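To illustrate that unreliability and reliability sum to one, the sketch below computes both for a lognormal life distribution by numerical integration. The parameter values `MU_LOG` and `SIGMA_LOG`, the finite upper limit standing in for infinity, and all function names are assumptions for illustration only.

```python
import math

MU_LOG = 4.0     # assumed mean of ln(time-to-failure)
SIGMA_LOG = 0.5  # assumed standard deviation of ln(time-to-failure)


def pdf(t):
    """Lognormal pdf with the assumed parameters; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - MU_LOG) / SIGMA_LOG
    return math.exp(-0.5 * z * z) / (t * SIGMA_LOG * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def unreliability(t):
    """Q(t): probability of failure by time t (area under the pdf on [0, t])."""
    return integrate(pdf, 0.0, t)


def reliability(t):
    """R(t) = 1 - Q(t): probability of surviving a mission of duration t."""
    return 1.0 - unreliability(t)


def reliability_from_tail(t, upper=5000.0):
    """R(t) as the area under the pdf from t to a large finite upper limit."""
    return integrate(pdf, t, upper)


mission_time = 60.0  # hours, hypothetical mission duration
print(unreliability(mission_time), reliability(mission_time))
print(reliability_from_tail(mission_time))  # should closely match R(t) above
```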
The Failure Rate Function
The failure rate function enables the determination of the number of failures occurring per unit time. Omitting the derivation (see [18; Ch. 4]), the failure rate is given mathematically as:
- [math]\displaystyle{ \lambda (t)=\frac{f(t)}{R(t)} }[/math]
Failure rate is expressed in failures per unit time.
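As an illustrative sketch, the ratio of the [math]\displaystyle{ pdf }[/math] to the reliability function can be evaluated for a two-parameter Weibull distribution. The Weibull distribution and the parameter values `BETA` and `ETA` are assumptions introduced here only as an example; they are not defined in this appendix.

```python
import math

BETA = 2.0   # assumed Weibull shape parameter
ETA = 100.0  # assumed Weibull scale parameter, in hours


def pdf(t):
    """Weibull pdf with the assumed shape BETA and scale ETA."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1.0) * math.exp(-((t / ETA) ** BETA))


def reliability(t):
    """Weibull reliability function R(t) = exp(-(t / ETA) ** BETA)."""
    return math.exp(-((t / ETA) ** BETA))


def failure_rate(t):
    """Failure rate as the ratio of the pdf to the reliability function."""
    return pdf(t) / reliability(t)


# With BETA > 1 the failure rate increases with time.
for t in (25.0, 50.0, 100.0):
    print(t, failure_rate(t))
```

With these assumed parameters the ratio simplifies to 2t/ETA^2 failures per hour, so the printed values should grow linearly with time.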
The Mean Life Function
The mean life function, which provides a measure of the average time of operation to failure, is given by:
- [math]\displaystyle{ \overline{T}=m=\int_{0}^{\infty }t\cdot f(t)dt }[/math]
This is the expected or average time-to-failure and is denoted as the [math]\displaystyle{ MTTF }[/math] (Mean Time-to-Failure) and synonymously called [math]\displaystyle{ MTBF }[/math] (Mean Time Before Failure) by many authors.
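The mean life integral can be approximated numerically. The sketch below assumes an exponential density with a hypothetical rate `RATE`, for which the mean life is known to be `1 / RATE`, and truncates the infinite upper limit at a large finite value. All names are illustrative.

```python
import math

RATE = 0.01  # assumed failure rate per hour, for illustration only


def pdf(t):
    """Assumed example density f(t) = RATE * exp(-RATE * t)."""
    return RATE * math.exp(-RATE * t)


def integrate(f, a, b, n=50_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def mean_life(upper=5000.0):
    """Integral of t * f(t) from 0 to a finite stand-in for infinity."""
    return integrate(lambda t: t * pdf(t), 0.0, upper)


mttf = mean_life()
print(mttf)        # numerical mean life
print(1.0 / RATE)  # closed-form mean of the assumed exponential density
```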
Median Life
Median life, [math]\displaystyle{ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{T} }[/math], is the value of the random variable that has exactly one-half of the area under the [math]\displaystyle{ pdf }[/math] to its left and one-half to its right. The median is obtained from:
- [math]\displaystyle{ \int_{0}^{{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{T}}}f(t)dt=0.5 }[/math]
(For individual data, e.g. 12, 20, 21, the median is the midpoint value, or 20 in this case.)
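One way to solve the median equation numerically is bisection on the [math]\displaystyle{ cdf }[/math] . The sketch below assumes an exponential distribution with a hypothetical rate `RATE`, whose median is `ln(2) / RATE`, and also reproduces the individual-data example using the standard library. The function `median_life` and its search bounds are illustrative assumptions.

```python
import math
import statistics

RATE = 0.01  # assumed failure rate per hour, for illustration only


def cdf(t):
    """Closed-form cdf of the assumed exponential density."""
    return 1.0 - math.exp(-RATE * t)


def median_life(cdf_func, lo=0.0, hi=10_000.0, tol=1e-9):
    """Find t with cdf_func(t) = 0.5 by bisection.

    Assumes cdf_func is nondecreasing with cdf_func(lo) <= 0.5 <= cdf_func(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_func(mid) < 0.5:
            lo = mid  # less than half the area lies to the left of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


median = median_life(cdf)
print(median)                # numerical median life
print(math.log(2.0) / RATE)  # closed-form median for comparison

# Median of individual data, as in the example in the text.
print(statistics.median([12, 20, 21]))
```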
Mode
The modal (or mode) life, [math]\displaystyle{ \widetilde{T} }[/math], is the value of [math]\displaystyle{ t }[/math] at which the [math]\displaystyle{ pdf }[/math] attains its maximum; it satisfies:
- [math]\displaystyle{ \frac{d\left[ f(t) \right]}{dt}=0 }[/math]
For a continuous distribution, the mode is that value of the variate which corresponds to the maximum probability density (the value at which the [math]\displaystyle{ pdf }[/math] has its maximum value).
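The mode can also be located numerically by searching for the maximum of the [math]\displaystyle{ pdf }[/math] . The sketch below uses a golden-section search on a lognormal density; the parameter values, the search interval and the function names are assumptions for illustration, and the search presumes the density has a single peak on that interval. For the lognormal distribution the result can be compared with the known closed form `exp(MU_LOG - SIGMA_LOG ** 2)`.

```python
import math

MU_LOG = 4.0     # assumed mean of ln(time-to-failure)
SIGMA_LOG = 0.5  # assumed standard deviation of ln(time-to-failure)


def pdf(t):
    """Lognormal pdf with the assumed parameters; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - MU_LOG) / SIGMA_LOG
    return math.exp(-0.5 * z * z) / (t * SIGMA_LOG * math.sqrt(2.0 * math.pi))


def golden_section_max(f, a, b, tol=1e-6):
    """Locate the maximizer of a single-peaked function f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the peak lies in [a, d]
        else:
            a = c  # the peak lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


mode = golden_section_max(pdf, 1.0, 200.0)
print(mode)                               # numerical mode
print(math.exp(MU_LOG - SIGMA_LOG ** 2))  # closed-form lognormal mode
```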