Revision as of 18:18, 23 February 2012
Reference Appendix A: Brief Statistical Background
In this appendix we attempt to provide a brief elementary introduction to the most common and fundamental statistical equations and definitions used in reliability engineering and life data analysis. The equations and concepts presented in this appendix are used extensively throughout this reference.
Basic Statistical Definitions
Random Variables
In general, most problems in reliability engineering deal with quantitative measures, such as the time-to-failure of a product, or qualitative measures, such as whether the product is defective or non-defective. In judging a product to be defective or non-defective, only two outcomes are possible. We can use a random variable, <math>X,</math> to denote these possible outcomes (i.e., defective or non-defective). In this case, <math>X</math> is a random variable that can take on only these two discrete values.
In the case of times-to-failure, our random variable <math>X</math> is the time-to-failure of the product and can take on an infinite number of possible values in a range from 0 to infinity.
In the first case, in which the random variable can take on only two discrete values (let's say defective = 0 and non-defective = 1), the variable is said to be a discrete random variable. In the second case, in which the random variable can take on any value in a continuous range (0 to infinity for the time-to-failure), it is said to be a continuous random variable.
The Probability Density and Cumulative Distribution Functions
Designations
From probability and statistics, given a continuous random variable <math>X,</math> we denote:
• The probability density function, pdf, as <math>f(x)</math>.
• The cumulative distribution function, cdf, as <math>F(x)</math>.
The pdf and cdf give a complete description of the probability distribution of a random variable.
Definitions
If <math>X</math> is a continuous random variable, then the pdf of <math>X</math> is a function, <math>f(x),</math> such that for any two numbers <math>a</math> and <math>b</math> with <math>a\le b</math>:

::<math>P\left( a\le X\le b \right)=\int_{a}^{b}f(x)dx</math>

and <math>f(x)\ge 0</math> for all <math>x</math>. That is, the probability that <math>X</math> takes on a value in the interval <math>\left[ a,b \right]</math> is the area under the density function from <math>a</math> to <math>b</math>. The cdf is a function, <math>F(x),</math> of a random variable <math>X,</math> defined for a number <math>x</math> by:

::<math>F(x)=P\left( X\le x \right)</math>

That is, for a number <math>x,</math> <math>F(x)</math> is the probability that the observed value of <math>X</math> will be at most <math>x</math>.
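These definitions can be checked numerically. The following sketch (an illustration added here, not part of the reference; the function names are arbitrary) integrates the standard normal pdf over [-1, 1] with the trapezoidal rule and recovers the familiar probability of roughly 0.683:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-0.5 * ((x - mu) / sigma)**2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def prob_between(a, b, pdf, n=100_000):
    # P(a <= X <= b) = integral of the pdf from a to b (trapezoidal rule)
    h = (b - a) / n
    total = 0.5 * (pdf(a) + pdf(b))
    total += sum(pdf(a + i * h) for i in range(1, n))
    return total * h

# P(-1 <= X <= 1) for the standard normal: the "one sigma" probability
p = prob_between(-1.0, 1.0, normal_pdf)
```

Any valid pdf could be substituted for `normal_pdf` here; only non-negativity and unit total area are required.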
Graphical representation of the pdf and cdf.
Mathematical Relationship Between the pdf and cdf
The mathematical relationship between the pdf and cdf is given by:

::<math>F(x)=\int_{-\infty }^{x}f(s)ds</math>

where <math>s</math> is a dummy integration variable. Conversely:

::<math>f(x)=\frac{d\left( F(x) \right)}{dx}</math>
In plain English, the value of the cdf at <math>x</math> is the area under the probability density function up to <math>x</math>.
An example of a probability density function is the well-known normal distribution, whose pdf is given by:

::<math>f(x)=\frac{1}{\sigma \sqrt{2\pi }}{{e}^{-\tfrac{1}{2}{{\left( \tfrac{x-\mu }{\sigma } \right)}^{2}}}}</math>

where <math>\mu </math> is the mean and <math>\sigma </math> is the standard deviation of the distribution.
Another is the lognormal distribution, whose pdf is given by:

::<math>f(t)=\frac{1}{t\cdot {\sigma }'\sqrt{2\pi }}{{e}^{-\tfrac{1}{2}{{\left( \tfrac{\ln t-{\mu }'}{{\sigma }'} \right)}^{2}}}}</math>

where <math>{\mu }'</math> is the mean of the natural logarithms of the times-to-failure and <math>{\sigma }'</math> is the standard deviation of the natural logarithms of the times-to-failure.
The Reliability Function
The reliability function can be derived using the previous definition of the cumulative distribution function. The probability of an event (failure) occurring by time <math>t</math> is given by:

::<math>F(t)=\int_{0}^{t}f(s)ds</math>

where <math>s</math> is a dummy integration variable. (The lower limit of the integral is 0 rather than <math>-\infty ,</math> since times-to-failure cannot be negative.) One could equate this event to the probability of a unit failing by time <math>t</math>.
From this fact, the most commonly used function in reliability engineering can then be obtained: the reliability function. The reliability function gives the probability of success of a unit in undertaking a mission of a prescribed duration.
To show this mathematically, we first define the unreliability function, <math>Q(t),</math> which is the probability of failure, or the probability that our time-to-failure is in the region of 0 and <math>t</math>. This is the same as the cdf, so:

::<math>Q(t)=F(t)=\int_{0}^{t}f(s)ds</math>
Reliability and unreliability are the success and failure probabilities; they are the only two events being considered and they are mutually exclusive, hence the sum of these probabilities is equal to unity. So then:

::<math>Q(t)+R(t)=1</math>
::<math>R(t)=1-Q(t)=1-\int_{0}^{t}f(s)ds</math>

Conversely:

::<math>f(t)=-\frac{d\left( R(t) \right)}{dt}</math>
The Failure Rate Function
The failure rate function enables the determination of the number of failures occurring per unit time, among the units still surviving. Omitting the derivation (see [18; Ch. 4]), the failure rate is mathematically given as:

::<math>\lambda (t)=\frac{f(t)}{R(t)}</math>

Failure rate is given in failures per unit time.
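As an illustration of this relationship (a sketch, not ALTA code), the failure rate can be computed directly as f(t)/R(t). Using the 2-parameter Weibull distribution, discussed later in this appendix, with an assumed shape parameter beta = 2 and scale eta = 1000, the computed rate increases with time, the classic wear-out pattern:

```python
import math

def weibull_pdf(t, beta, eta):
    # f(t) = (beta/eta) * (t/eta)^(beta-1) * exp(-(t/eta)^beta)
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))

def weibull_reliability(t, beta, eta):
    # R(t) = exp(-(t/eta)^beta)
    return math.exp(-((t / eta) ** beta))

def failure_rate(t, beta, eta):
    # lambda(t) = f(t) / R(t); for the Weibull this simplifies to
    # (beta/eta) * (t/eta)^(beta-1)
    return weibull_pdf(t, beta, eta) / weibull_reliability(t, beta, eta)

# With beta = 2 the failure rate grows linearly with time
beta, eta = 2.0, 1000.0
rates = [failure_rate(t, beta, eta) for t in (100.0, 500.0, 900.0)]
```

For beta = 1 the same code would return a constant rate, which is the exponential special case derived below.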
The Mean Life Function
The mean life function, which provides a measure of the average time of operation to failure, is given by:

::<math>\overline{T}=m=\int_{0}^{\infty }t\cdot f(t)dt</math>

This is the expected or average time-to-failure, and is denoted as the MTTF (mean time to failure) or MTBF (mean time before failure).
Median Life
Median life, <math>\widetilde{T},</math> is the value of the random variable that has exactly one-half of the area under the pdf to its left and one-half to its right. The median is obtained from:

::<math>\int_{0}^{\widetilde{T}}f(t)dt=0.5</math>

(For individual data, e.g., 12, 20, 21, the median is the midpoint value, or 20 in this case.)
Mode
The modal (or mode) life, <math>\widehat{T},</math> is the value of <math>T</math> that satisfies:

::<math>\frac{d\left( f(t) \right)}{dt}=0</math>

For a continuous distribution, the mode is that value of the variate which corresponds to the maximum probability density (the value at which the pdf has its maximum value).
Distributions
A statistical distribution is fully described by its pdf. In the previous sections, the definitions of the pdf and cdf were used to show how all of the other functions most commonly used in reliability engineering and life data analysis can be derived: namely, the reliability function, the failure rate function, the mean life and the median life. All of these can be obtained directly from the pdf definition, or <math>f(t)</math>.
The exponential distribution is a very commonly used distribution in reliability engineering. Due to its simplicity, it has been widely employed, even in cases to which it does not apply. The pdf of the exponential distribution is given by:

::<math>f(t)=\lambda {{e}^{-\lambda t}}</math>

In this definition, note that <math>t</math> is our random variable, which represents time, and the Greek letter <math>\lambda </math> (lambda) represents what is commonly referred to as the parameter of the distribution. The pdf can equivalently be written as:

::<math>f(t)=\frac{1}{m}{{e}^{-\tfrac{t}{m}}}</math>

where the mean, <math>m,</math> equals <math>\tfrac{1}{\lambda }</math>.
For example, we know that the exponential distribution pdf is given by <math>f(t)=\lambda {{e}^{-\lambda t}}</math>. Thus the reliability function can be derived:

::<math>R(t)=1-\int_{0}^{t}\lambda {{e}^{-\lambda s}}ds={{e}^{-\lambda t}}</math>

The failure rate function is given by:

::<math>\lambda (t)=\frac{f(t)}{R(t)}=\frac{\lambda {{e}^{-\lambda t}}}{{{e}^{-\lambda t}}}=\lambda </math>

The mean time to/before failure (MTTF/MTBF) is given by:

::<math>\overline{T}=\int_{0}^{\infty }t\cdot \lambda {{e}^{-\lambda t}}dt=\frac{1}{\lambda }</math>
Exactly the same methodology can be applied to any distribution, given its pdf <math>f(t)</math>.
Most Commonly Used Distributions
There are many different lifetime distributions that can be used. ReliaSoft [31] presents a thorough overview of lifetime distributions. Leemis [22] and others also present good overviews of many of these distributions. The three distributions used in ALTA, the 1-parameter exponential, 2-parameter Weibull, and the lognormal, are presented in greater detail here.
Confidence Intervals (or Bounds)
One of the most confusing concepts to an engineer new to the field is the concept of putting a probability on a probability. In life data analysis, this concept is referred to as confidence intervals or confidence bounds. In this section, we will try to present the concept briefly, in plain rather than strictly statistical terms, but based on solid common sense.
The Black and White Marbles
To illustrate, imagine a situation in which there are millions of black and white marbles in a rather large swimming pool, and our job is to estimate the percentage of black marbles. One way to do this (other than counting all the marbles!) is to estimate the percentage of black marbles by taking a sample and then counting the number of black marbles in the sample.
Taking a Small Sample of Marbles
First, let's pick out a small sample of marbles and count the black ones. Say you picked out 10 marbles and counted 4 black marbles. Based on this, your estimate would be that 40% of the marbles are black.
If you put the 10 marbles back into the pool and repeated this example, you might get 5 black marbles, changing your estimate to 50% black marbles.
Which of the two estimates is correct? Both estimates are correct! As you repeat this experiment over and over again, you might find that the estimate usually falls within a fairly wide range around the true percentage of black marbles.
Taking a Larger Sample of Marbles
If we now repeat the experiment and pick out 1,000 marbles, we might get results such as 545, 570, 530, etc. for the number of black marbles in each trial. Note that the range of the estimates in this case will be much narrower than before. For example, let's say that 90% of the time the number of black marbles will fall within a narrow band, such as the 530 to 570 spanned by the results above (i.e., 53% to 57%), a much tighter spread than the one obtained with only 10 marbles.
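This narrowing effect is easy to reproduce with a short simulation. In the sketch below, the 55% true fraction of black marbles and the trial counts are assumptions chosen for illustration only:

```python
import random

random.seed(42)

TRUE_BLACK = 0.55  # assumed true fraction of black marbles in the pool

def estimate(sample_size):
    # Draw a sample and return the observed fraction of black marbles
    black = sum(1 for _ in range(sample_size) if random.random() < TRUE_BLACK)
    return black / sample_size

def spread(sample_size, trials=2000):
    # Range (max - min) of the estimate across repeated experiments
    estimates = [estimate(sample_size) for _ in range(trials)]
    return max(estimates) - min(estimates)

# Estimates from 1,000-marble samples scatter far less than
# estimates from 10-marble samples
small_sample_spread = spread(10)
large_sample_spread = spread(1000)
```

The widths of these empirical ranges are exactly what a confidence interval formalizes: the larger the sample, the tighter the interval at a given confidence level.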
Back to Reliability
Returning to the subject at hand, our task is to determine the probability of failure or reliability of all of our units. However, until all units fail, we will never know the exact value. Our task is to estimate the reliability based on a sample, much like estimating the number of black marbles in the pool. If we perform 10 different reliability tests for our units, and estimate the parameters using ALTA, we will obtain slightly different parameters for the distribution each time, and thus slightly different reliability results. However, when employing confidence bounds, we obtain a range within which these values are likely to occur.
One-Sided and Two-Sided Confidence Bounds
Confidence bounds (or intervals) are generally described as one-sided or two-sided.
Two-Sided Bounds
When we use two-sided confidence bounds (or intervals), we are looking at where most of the population is likely to lie. For example, when using 90% two-sided confidence bounds, we are saying that 90% of the population lies between the lower and upper bounds, with 5% of the population below the lower bound and 5% above the upper bound.
One-Sided Bounds
When using one-sided intervals, we are looking at the percentage of units that are greater than or less than a certain point (a lower or an upper bound, respectively).
For example, a 95% lower one-sided confidence bound indicates that 95% of the population is greater than the bound, while a 95% upper one-sided bound indicates that 95% of the population is less than the bound.
In ALTA, we use upper to mean the higher limit and lower to mean the lower limit, based purely on the numeric value of the results. So, for example, when returning the confidence bounds on the reliability, we term the lower value of reliability as the lower limit and the higher value of reliability as the upper limit. When returning the confidence bounds on the probability of failure, we again term the lower numeric value as the lower limit and the higher value as the upper limit.
Confidence Limits Determination
This section presents an overview of the theory on obtaining approximate confidence bounds on suspended (multiply censored) data. The methodology used is the so-called Fisher Matrix Bounds, described in Nelson [27] and Lloyd and Lipow [24].
Suggested References
This section presents a brief introduction into how the confidence intervals are calculated by ALTA. By no means do we intend to cover the full theory behind this methodology. More complete details on confidence intervals can be found in the following books:
• Nelson, Wayne, Applied Life Data Analysis, 1982, John Wiley & Sons, New York, New York.
• Nelson, Wayne, Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, 1990, John Wiley & Sons, New York, New York.
• Lloyd, David K. and Lipow, Myron, Reliability: Management, Methods, and Mathematics, 1962, Prentice Hall, Englewood Cliffs, New Jersey.
• Cramér, Harald, Mathematical Methods of Statistics, 1946, Princeton University Press, Princeton, New Jersey.
Approximate Estimates of the Mean and Variance of a Function
Single Parameter Case
For simplicity, consider a one parameter distribution represented by a general function <math>G,</math> which is a function of one parameter estimator, say <math>G(\widehat{\theta }).</math> Then, in general, the expected value of <math>G\left( \widehat{\theta } \right)</math> can be found by:

::<math>E\left( G\left( \widehat{\theta } \right) \right)=G(\theta )+O\left( \frac{1}{n} \right)</math>

where <math>G(\theta )</math> is some function of <math>\theta ,</math> such as the reliability function, and <math>\theta </math> is the population moment, or parameter, such that <math>E\left( \widehat{\theta } \right)=\theta </math> as <math>n\to \infty </math>. The term <math>O\left( \tfrac{1}{n} \right)</math> is a function of <math>n,</math> the sample size, and tends to zero as fast as <math>\tfrac{1}{n}</math> as <math>n\to \infty .</math> For example, in the case of <math>\widehat{\theta }=\overline{x}</math> and <math>G(x)={{x}^{2}},</math> then <math>E(G(\overline{x}))={{\mu }^{2}}+O\left( \tfrac{1}{n} \right)</math> where <math>O\left( \tfrac{1}{n} \right)=\tfrac{{{\sigma }^{2}}}{n};</math> thus as <math>n\to \infty ,</math> <math>E(G(\overline{x}))={{\mu }^{2}}</math> ( <math>\mu </math> and <math>\sigma </math> are the mean and standard deviation, respectively). Using the same one parameter distribution, the variance of the function <math>G\left( \widehat{\theta } \right)</math> can then be estimated by:

::<math>Var\left( G\left( \widehat{\theta } \right) \right)=\left( \frac{\partial G}{\partial \widehat{\theta }} \right)_{\widehat{\theta }=\theta }^{2}Var\left( \widehat{\theta } \right)+O\left( \frac{1}{{{n}^{\tfrac{3}{2}}}} \right)</math>
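The single-parameter result above can be illustrated with the example already given, G(x̄) = x̄². Since Var(x̄) = σ²/n, the approximation yields Var(G) ≈ (2μ)²·σ²/n. The sketch below (with assumed values for μ, σ and n) compares this against a brute-force simulation:

```python
import random
import statistics

random.seed(7)

MU, SIGMA, N = 10.0, 2.0, 400  # assumed population mean, std dev, sample size

def sample_mean_squared():
    # Draw a sample of size N and return G(x_bar) = x_bar ** 2
    xbar = statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    return xbar ** 2

# Approximation: Var(G) ~= (dG/d theta)^2 * Var(theta_hat),
# with dG/d theta = 2*mu and Var(x_bar) = sigma^2 / N
delta_var = (2.0 * MU) ** 2 * SIGMA ** 2 / N

# Brute-force estimate of Var(G) over many repeated samples
sim_var = statistics.variance([sample_mean_squared() for _ in range(4000)])
```

The two values agree to within simulation noise, and the leftover discrepancy is exactly the O(1/n^(3/2)) term in the expression above.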
Two Parameter Case
Repeating the previous method for the case of a two parameter distribution, it is generally true that for a function <math>G,</math> which is a function of two parameter estimators, say <math>G\left( {{\widehat{\theta }}_{1}},{{\widehat{\theta }}_{2}} \right),</math> that:

::<math>E\left( G\left( {{\widehat{\theta }}_{1}},{{\widehat{\theta }}_{2}} \right) \right)=G\left( {{\theta }_{1}},{{\theta }_{2}} \right)+O\left( \frac{1}{n} \right)</math>

and:

::<math>Var\left( G\left( {{\widehat{\theta }}_{1}},{{\widehat{\theta }}_{2}} \right) \right)={{\left( \frac{\partial G}{\partial {{\widehat{\theta }}_{1}}} \right)}^{2}}Var\left( {{\widehat{\theta }}_{1}} \right)+{{\left( \frac{\partial G}{\partial {{\widehat{\theta }}_{2}}} \right)}^{2}}Var\left( {{\widehat{\theta }}_{2}} \right)+2\left( \frac{\partial G}{\partial {{\widehat{\theta }}_{1}}} \right)\left( \frac{\partial G}{\partial {{\widehat{\theta }}_{2}}} \right)Cov\left( {{\widehat{\theta }}_{1}},{{\widehat{\theta }}_{2}} \right)+O\left( \frac{1}{{{n}^{\tfrac{3}{2}}}} \right)</math>
Note that the derivatives in the variance expression are evaluated at <math>{{\widehat{\theta }}_{1}}={{\theta }_{1}}</math> and <math>{{\widehat{\theta }}_{2}}={{\theta }_{2}},</math> the true parameter values, which in practice are replaced by their estimates.
Variance and Covariance Determination of the Parameters
The determination of the variance and covariance of the parameters is accomplished via the use of the Fisher information matrix. For a two parameter distribution, and using maximum likelihood estimates, the log likelihood function for censored data with <math>R</math> observed failure times and <math>M</math> suspensions (without the constant coefficient) is given by:

::<math>\Lambda =\underset{i=1}{\overset{R}{\mathop \sum }}\,\ln \left[ f\left( {{t}_{i}};{{\theta }_{1}},{{\theta }_{2}} \right) \right]+\underset{j=1}{\overset{M}{\mathop \sum }}\,\ln \left[ 1-F\left( t_{j}^{\prime };{{\theta }_{1}},{{\theta }_{2}} \right) \right]</math>

where <math>{{t}_{i}}</math> are the failure times and <math>t_{j}^{\prime }</math> are the suspension times.
Then the Fisher information matrix is given by:

::<math>F=\left[ \begin{matrix}   E\left( -\frac{{{\partial }^{2}}\Lambda }{\partial \theta _{1}^{2}} \right) & E\left( -\frac{{{\partial }^{2}}\Lambda }{\partial {{\theta }_{1}}\partial {{\theta }_{2}}} \right)  \\   E\left( -\frac{{{\partial }^{2}}\Lambda }{\partial {{\theta }_{2}}\partial {{\theta }_{1}}} \right) & E\left( -\frac{{{\partial }^{2}}\Lambda }{\partial \theta _{2}^{2}} \right)  \\\end{matrix} \right]</math>

where the expectations are taken with respect to the assumed distribution.
So for a sample of <math>N</math> units, where <math>R</math> units have failed and <math>M</math> are suspended, the local estimate of the Fisher information matrix is obtained by evaluating the second partial derivatives at the parameter estimates, <math>{{\widehat{\theta }}_{1}}</math> and <math>{{\widehat{\theta }}_{2}}</math>. By substituting in the values of the estimated parameters and then inverting the matrix, the local estimate of the covariance matrix of the parameters is obtained:

::<math>\left[ \begin{matrix}   \widehat{Var}\left( {{\widehat{\theta }}_{1}} \right) & \widehat{Cov}\left( {{\widehat{\theta }}_{1}},{{\widehat{\theta }}_{2}} \right)  \\   \widehat{Cov}\left( {{\widehat{\theta }}_{1}},{{\widehat{\theta }}_{2}} \right) & \widehat{Var}\left( {{\widehat{\theta }}_{2}} \right)  \\\end{matrix} \right]={{\left[ F \right]}^{-1}}</math>
Then the variance of a function, <math>Var\left( G \right),</math> can be estimated using the variance equation for the two parameter case given earlier, with the values for the variance and covariance of the parameters taken from the inverse of the Fisher information matrix.
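Numerically, the final step is just the inversion of a symmetric 2×2 matrix. A minimal sketch, using an assumed local Fisher matrix rather than values from a real data set:

```python
# Assumed local Fisher information matrix for (theta_1, theta_2),
# i.e., the negative second partials of the log-likelihood at the MLEs
f11, f12, f22 = 250.0, 40.0, 90.0

# Invert the symmetric 2x2 matrix [[f11, f12], [f12, f22]] in closed form
det = f11 * f22 - f12 * f12
var_theta1 = f22 / det        # Var(theta_1_hat)
var_theta2 = f11 / det        # Var(theta_2_hat)
cov_theta12 = -f12 / det      # Cov(theta_1_hat, theta_2_hat)
```

These three numbers are exactly the inputs required by the two parameter variance expression for Var(G).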
Approximate Confidence Intervals on the Parameters
In general, MLE estimates of the parameters are asymptotically normal. Thus, if <math>\widehat{\theta }</math> is the MLE estimator for <math>\theta ,</math> in the case of a single parameter distribution estimated from a large sample of <math>n</math> units, then:

::<math>z=\frac{\widehat{\theta }-\theta }{\sqrt{Var\left( \widehat{\theta } \right)}}</math>

follows an approximately standard normal distribution for large <math>n</math>. If <math>{{K}_{\alpha }}</math> is defined such that:

::<math>P\left( z\ge {{K}_{\alpha }} \right)=\alpha </math>

then from the asymptotic normality of <math>z</math>:

::<math>P\left( -{{K}_{\tfrac{\alpha }{2}}}\le \frac{\widehat{\theta }-\theta }{\sqrt{Var\left( \widehat{\theta } \right)}}\le {{K}_{\tfrac{\alpha }{2}}} \right)\cong 1-\alpha </math>

where <math>1-\alpha </math> is the desired confidence level. Rearranging the inequality, one can obtain the approximate two-sided confidence bounds on the parameter <math>\theta ,</math> at a confidence level <math>\delta =1-\alpha </math>:

::<math>\widehat{\theta }-{{K}_{\tfrac{\alpha }{2}}}\sqrt{Var\left( \widehat{\theta } \right)}\ \le \ \theta \ \le \ \widehat{\theta }+{{K}_{\tfrac{\alpha }{2}}}\sqrt{Var\left( \widehat{\theta } \right)}</math>
If <math>\theta </math> must be positive (as is the case for most life distribution parameters), then <math>\ln \widehat{\theta }</math> is treated as approximately normally distributed, and the two-sided approximate confidence bounds become:

::<math>{{\theta }_{U}}=\widehat{\theta }\cdot {{e}^{\tfrac{{{K}_{\alpha /2}}\sqrt{Var\left( \widehat{\theta } \right)}}{\widehat{\theta }}}}\text{ (upper bound)}</math>
::<math>{{\theta }_{L}}=\widehat{\theta }\cdot {{e}^{-\tfrac{{{K}_{\alpha /2}}\sqrt{Var\left( \widehat{\theta } \right)}}{\widehat{\theta }}}}\text{ (lower bound)}</math>

The one-sided approximate confidence bounds on the parameter <math>\theta </math> can be found by replacing <math>{{K}_{\alpha /2}}</math> with <math>{{K}_{\alpha }}</math> in the expressions above.
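As a sketch of the bound computation for a positive parameter (the estimate and variance below are hypothetical values, not output from any data set), the two-sided 90% bounds follow from the logarithmic transformation, with K at the alpha/2 = 0.05 level being approximately 1.645:

```python
import math

lam_hat = 0.002      # assumed MLE of the parameter (must be positive)
var_lam = 1.0e-8     # assumed Var(lambda_hat), e.g., from the Fisher matrix
k = 1.645            # K_{alpha/2} for a 90% two-sided confidence level

# Treat ln(lambda_hat) as normal:
# bounds are lambda_hat * exp(+/- k * sqrt(Var) / lambda_hat)
half_width = k * math.sqrt(var_lam) / lam_hat
lam_upper = lam_hat * math.exp(half_width)
lam_lower = lam_hat * math.exp(-half_width)
```

Note that the resulting bounds are symmetric about the estimate on a logarithmic scale, not an arithmetic one, which guarantees the lower bound stays positive.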
The same procedure can be repeated for the case of a two or more parameter distribution. Lloyd and Lipow [24] elaborate on this procedure.
Percentile Confidence Bounds (Type 1 in ALTA)
Percentile confidence bounds are confidence bounds around time. For example, when using the 1-parameter exponential distribution, the corresponding time for a given exponential percentile (i.e., y-ordinate or unreliability, <math>Q=1-R</math>) is determined by solving the unreliability function for the time, <math>\widehat{T},</math> or:

::<math>\widehat{T}(Q)=-\frac{1}{\widehat{\lambda }}\ln \left( 1-Q \right)</math>

Percentile bounds (Type 1) return the confidence bounds by determining the confidence intervals around <math>\widehat{\lambda }</math> and substituting these values into the above equation. The bounds on <math>\widehat{\lambda }</math> are determined using the method for the confidence bounds on parameters described above, with the variance of <math>\widehat{\lambda }</math> obtained from the Fisher matrix.
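For the exponential case, the Type 1 computation is a direct substitution: the bounds on the parameter map to bounds on the time corresponding to a given unreliability. A sketch with hypothetical parameter bounds:

```python
import math

# Hypothetical estimate and two-sided confidence bounds on the
# exponential parameter (from the parameter-bound method above)
lam_hat, lam_lower, lam_upper = 0.002, 0.0018, 0.0022

def time_at_unreliability(q, lam):
    # Solve Q(T) = 1 - exp(-lam * T) for T:  T = -ln(1 - q) / lam
    return -math.log(1.0 - q) / lam

q = 0.10  # 10% unreliability (the 10th percentile)
t_hat = time_at_unreliability(q, lam_hat)
# A higher failure rate reaches the same unreliability sooner, so the
# upper bound on lambda yields the lower bound on time, and vice versa
t_lower = time_at_unreliability(q, lam_upper)
t_upper = time_at_unreliability(q, lam_lower)
```

The swap of upper and lower when passing from the parameter to the time is the reason ALTA labels bounds by the numeric value of the result rather than by the bound on the parameter.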
Reliability Confidence Bounds (Type 2 in ALTA)
Type 2 bounds in ALTA are confidence bounds around reliability. For example, when using the 1-parameter exponential distribution, the reliability function is:

::<math>R(t)={{e}^{-\widehat{\lambda }t}}</math>

Reliability bounds (Type 2) return the confidence bounds by determining the confidence intervals around <math>\widehat{\lambda }</math> and substituting these values into the above equation. The bounds on <math>\widehat{\lambda }</math> are determined using the method for the confidence bounds on parameters described above, with the variance of <math>\widehat{\lambda }</math> obtained from the Fisher matrix.