Repairable Systems Analysis
The previous chapters presented analysis methods for data obtained during developmental testing. However, data from systems in the field can also be analyzed in the RGA software. This type of data is called fielded systems data and is analogous to warranty data. Fielded systems can be categorized into two basic types: one-time or nonrepairable systems and reusable or repairable systems. In the latter case, under continuous operation, the system is repaired, but not replaced after each failure. For example, if a water pump in a vehicle fails, the water pump is replaced and the vehicle is repaired.
This chapter presents repairable systems analysis where the reliability of a system can be tracked and quantified based on data from multiple systems in the field. The next chapter will present fleet analysis where data from multiple systems in the field can be collected and analyzed so that reliability metrics for the fleet as a whole can be quantified.
Background
Most complex systems, such as automobiles, communication systems, aircraft, printers, medical diagnostics systems, helicopters, etc., are repaired and not replaced when they fail. When these systems are fielded or subjected to a customer use environment, it is often of considerable interest to determine the reliability and other performance characteristics under these conditions. Areas of interest may include assessing the expected number of failures during the warranty period, maintaining a minimum mission reliability, evaluating the rate of wearout, determining when to replace or overhaul a system and minimizing life cycle costs. In general, a lifetime distribution, such as the Weibull distribution, cannot be used to address these issues. In order to address the reliability characteristics of complex repairable systems, a process is often used instead of a distribution. The most popular process model is the Power Law model. This model is popular for several reasons. One is that it has a very practical foundation in terms of minimal repair. This is the situation when the repair of a failed system is just enough to get the system operational again. Second, if the time to first failure follows the Weibull distribution, then each succeeding failure is governed by the Power Law model in the case of minimal repair. From this point of view, the Power Law model is an extension of the Weibull distribution.
Sometimes, the Crow Extended model, which was introduced in a previous chapter for analyzing developmental data, is also applied to fielded repairable systems. Applying the Crow Extended model to repairable system data allows analysts to project the system MTBF after reliability-related issues are addressed during field operation. Projections are calculated based on the mode classifications (A, BC and BD). The calculation procedure is the same as the one for developmental data and is not repeated in this chapter.
Distribution Example
Visualize a socket into which a component is inserted at time
Each component life
A distribution, such as the Weibull, governs a single lifetime. There is only one event associated with a distribution. The distribution
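A distribution is also characterized by its density function, such that, writing $F(x)$ for the distribution function:
$$f(x) = \frac{dF(x)}{dx}$$
The density function for the Weibull distribution, in the $\lambda$, $\beta$ parameterization, is:
$$f(x) = \lambda \beta x^{\beta - 1} e^{-\lambda x^{\beta}}$$
In addition, an important reliability property of a distribution function is the failure rate, which is given by:
$$r(x) = \frac{f(x)}{1 - F(x)}$$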
The interpretation of the failure rate is that for a small interval of time
It is important to note the condition that the component has not failed by time
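The conditional interpretation of the failure rate can be checked numerically. The following is a minimal sketch, assuming the two-parameter Weibull written as $F(x) = 1 - e^{-\lambda x^{\beta}}$; the function names and the parameter values are illustrative and are not taken from this chapter. It compares the failure rate times a small interval with the exact probability of failing in that interval, given survival to the start of it.

```python
import math

def weibull_cdf(x, lam, beta):
    """Distribution function F(x) = 1 - exp(-lam * x**beta)."""
    return 1.0 - math.exp(-lam * x ** beta)

def weibull_pdf(x, lam, beta):
    """Density f(x) = dF/dx."""
    return lam * beta * x ** (beta - 1) * math.exp(-lam * x ** beta)

def weibull_failure_rate(x, lam, beta):
    """Failure rate r(x) = f(x) / (1 - F(x))."""
    return weibull_pdf(x, lam, beta) / (1.0 - weibull_cdf(x, lam, beta))

def conditional_failure_prob(x, dx, lam, beta):
    """P(fail in (x, x + dx) | not failed by x), computed exactly."""
    survived = 1.0 - weibull_cdf(x, lam, beta)
    return (weibull_cdf(x + dx, lam, beta) - weibull_cdf(x, lam, beta)) / survived

# Illustrative (assumed) parameter values.
lam, beta = 0.01, 1.5
x, dx = 50.0, 0.01

rate = weibull_failure_rate(x, lam, beta)
approx = rate * dx                                  # r(x) * dx
exact = conditional_failure_prob(x, dx, lam, beta)  # exact conditional probability
print(rate, approx, exact)
```

For a small interval the two quantities agree closely, and the agreement degrades as the interval grows, which is why the interpretation is stated only for small intervals.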
Process Example
Now suppose that a system consists of many components with each component in a socket. A failure in any socket constitutes a failure of the system. Each component in a socket is a renewal process governed by its respective distribution function. When the system fails due to a failure in a socket, the component is replaced and the socket is again as good as new. The system has been repaired. Because there are many other components still operating with various ages, the system is not typically put back into a like new condition after the replacement of a single component. For example, a car is not as good as new after the replacement of a failed water pump. Therefore, distribution theory does not apply to the failures of a complex system, such as a car. In general, the intervals between failures for a complex repairable system do not follow the same distribution. Distributions apply to the components that are replaced in the sockets but not at the system level. At the system level, a distribution applies to the very first failure. There is one failure associated with a distribution. For example, the very first system failure may follow a Weibull distribution.
For many systems in a real-world environment, a repair is only enough to get the system operational again. If the water pump fails on the car, the repair consists only of installing a new water pump. If a seal leaks, the seal is replaced but no additional maintenance is done. This is the concept of minimal repair. For a system with many failure modes, the repair of a single failure mode does not greatly improve the system reliability from what it was just before the failure. Under minimal repair for a complex system with many failure modes, the system reliability after a repair is the same as it was just before the failure. In this case, the sequence of failures at the system level follows a non-homogeneous Poisson process (NHPP).
The system age when the system is first put into service is time
Under minimal repair, the system intensity function is:
$$u(t) = \lambda \beta t^{\beta - 1}$$
This is the Power Law model. It can be viewed as an extension of the Weibull distribution. The Weibull distribution governs the first system failure and the Power Law model governs each succeeding system failure. If the system has a constant failure intensity
Therefore, the probability
This is referred to as a homogeneous Poisson process because there is no change in the intensity function. This is a special case of the Power Law model for $\beta = 1$.
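The relationship between the Power Law intensity and the homogeneous Poisson process can be sketched in a few lines. This is an illustrative sketch, assuming the usual Power Law form with intensity $\lambda \beta t^{\beta - 1}$ and expected number of failures $\lambda t^{\beta}$; the function names and numeric values are hypothetical.

```python
import math

def power_law_intensity(t, lam, beta):
    """Failure intensity of the Power Law NHPP at system age t > 0."""
    return lam * beta * t ** (beta - 1)

def expected_failures(t, lam, beta):
    """Expected cumulative number of failures by age t."""
    return lam * t ** beta

def prob_n_failures(n, t, lam, beta):
    """Poisson probability of exactly n failures in (0, t]."""
    mean = expected_failures(t, lam, beta)
    return math.exp(-mean) * mean ** n / math.factorial(n)

# With beta = 1 the intensity no longer depends on age (homogeneous case).
constant_early = power_law_intensity(10.0, 0.02, 1.0)
constant_late = power_law_intensity(500.0, 0.02, 1.0)

# With beta > 1 the intensity increases with age (wearout);
# with beta < 1 it decreases (reliability improves with age).
wearout = [power_law_intensity(t, 0.02, 1.5) for t in (10.0, 100.0, 500.0)]
improving = [power_law_intensity(t, 0.02, 0.5) for t in (10.0, 100.0, 500.0)]

# The Poisson probabilities over n sum to one.
total_prob = sum(prob_n_failures(n, 100.0, 0.02, 1.0) for n in range(60))
print(constant_early, constant_late, wearout, improving, total_prob)
```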
Power Law Model
The Power Law model is often used to analyze the reliability of complex repairable systems in the field. A system of interest may be the total system, such as a helicopter, or it may be subsystems, such as the helicopter transmission or rotor blades. When these systems are new and first put into operation, the start time is
Some system types may be overhauled and some may not, depending on the maintenance policy. For example, an automobile may not be overhauled but helicopter transmissions may be overhauled after a period of time. In practice, an overhaul may not restore the system reliability to what it was when the system was new. However, an overhaul will generally make the system more reliable. Data for the Power Law model are organized by cycles. If a system is not overhauled, then there is only one cycle and the zero time is when the system is first put into operation. If a system is overhauled, then the same serial number system may generate many cycles. Each cycle starts a new zero time at the beginning of the cycle, and the age of the system is measured from the beginning of the cycle. For systems that are not overhauled, the reliability characteristics of the system as it ages during its life are of interest. For systems that are overhauled, the reliability characteristics of the system as it ages during its cycle are of interest.
For the Power Law model, a data set for a system will consist of a starting time
There are many ways to generate a random sample of
In addition, the warranty period may be of particular interest. In this case, randomly choose
This is the mission reliability for a system of age
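As a minimal sketch of the mission reliability calculation, the following assumes the standard NHPP result that the probability of no failure over a mission of length d, for a system of age t, is $e^{-[\lambda (t+d)^{\beta} - \lambda t^{\beta}]}$. The function name and the parameter values are hypothetical.

```python
import math

def mission_reliability(age, mission, lam, beta):
    """Probability of completing a mission of the given length
    without failure, for a system of the given age, under the
    Power Law model."""
    expected = lam * ((age + mission) ** beta - age ** beta)
    return math.exp(-expected)

# Illustrative (assumed) parameters with beta > 1, i.e. a system that wears out.
lam, beta = 0.002, 1.3
young = mission_reliability(100.0, 50.0, lam, beta)
old = mission_reliability(2000.0, 50.0, lam, beta)

# With beta = 1 the result depends only on the mission length, not on age.
hpp_young = mission_reliability(100.0, 50.0, lam, 1.0)
hpp_old = mission_reliability(2000.0, 50.0, lam, 1.0)
print(young, old, hpp_young, hpp_old)
```

For a wearing-out system the same mission becomes less likely to succeed as the system ages, which is the reason the system age has to be stated along with the mission time.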
Parameter Estimation
Suppose that the number of systems under study is
where
If
The following example illustrates these estimation procedures.
Power Law Model Example
For the data in the following table, the starting time for each system is equal to
Repairable system failure data
System 1 | System 2 | System 3 |
---|---|---|
1.2 | 1.4 | 0.3 |
55.6 | 35.0 | 32.6 |
72.7 | 46.8 | 33.4 |
111.9 | 65.9 | 241.7 |
121.9 | 181.1 | 396.2 |
303.6 | 712.6 | 444.4 |
326.9 | 1005.7 | 480.8 |
1568.4 | 1029.9 | 588.9 |
1913.5 | 1675.7 | 1043.9 |
 | 1787.5 | 1136.1 |
 | 1867.0 | 1288.1 |
 | | 1408.1 |
 | | 1439.4 |
 | | 1604.8 |
Solution
Because the starting time for each system is equal to zero and each system has an equivalent ending time, the general equations for
The system failure intensity function is then estimated by:
The next figure is a plot of
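When every system starts at zero and shares the same end time, the maximum likelihood estimates have a closed form. The sketch below applies it to the failure times in the table above. The common end time of 2000 hours is an assumption made here for illustration, since the end time is not legible in the table header; the function name is also hypothetical.

```python
import math

# Failure times from the "Repairable system failure data" table.
SYSTEMS = [
    [1.2, 55.6, 72.7, 111.9, 121.9, 303.6, 326.9, 1568.4, 1913.5],
    [1.4, 35.0, 46.8, 65.9, 181.1, 712.6, 1005.7, 1029.9, 1675.7,
     1787.5, 1867.0],
    [0.3, 32.6, 33.4, 241.7, 396.2, 444.4, 480.8, 588.9, 1043.9,
     1136.1, 1288.1, 1408.1, 1439.4, 1604.8],
]
END_TIME = 2000.0  # ASSUMPTION: common end time for all three systems

def fit_power_law_common_end(systems, end_time):
    """Closed-form MLEs for start time 0 and a common end time:
    beta = N / sum(ln(T / x)),  lam = N / (K * T**beta)."""
    n = sum(len(times) for times in systems)   # total failures N
    k = len(systems)                           # number of systems K
    log_sum = sum(math.log(end_time / x) for times in systems for x in times)
    beta = n / log_sum
    lam = n / (k * end_time ** beta)
    return lam, beta

def intensity(t, lam, beta):
    """Estimated system failure intensity at age t."""
    return lam * beta * t ** (beta - 1)

lam_hat, beta_hat = fit_power_law_common_end(SYSTEMS, END_TIME)
print(f"beta = {beta_hat:.4f}, lambda = {lam_hat:.4f}")
print(f"intensity at end time = {intensity(END_TIME, lam_hat, beta_hat):.5f}")
```

Under that assumed end time the sketch gives a shape parameter of about 0.453 and a scale parameter of about 0.362, so the estimated intensity decreases with system age.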
Goodness-of-Fit Tests for Repairable System Analysis
It is generally desirable to test the compatibility of a model and data by a statistical goodness-of-fit test. A parametric Cramér-von Mises goodness-of-fit test is used for the multiple system and repairable system Power Law model, as proposed by Crow in [17]. This goodness-of-fit test is appropriate whenever the start time for each system is 0 and the failure data is complete over the continuous interval
Cramér-von Mises Test
To illustrate the application of the Cramér-von Mises statistic for multiple system data, suppose that
Step 1: If
Step 2: For each system divide each successive failure time by the corresponding end time
Step 3: Next calculate
Step 4: Treat the
Step 5: Calculate the parametric Cramér-von Mises statistic.
Critical values for the Cramér-von Mises test are presented in Table B.2 of Appendix B.
Step 6: If the calculated
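The steps above can be sketched as follows. The formulas for the unbiased shape estimate and for the statistic are the standard parametric Cramér-von Mises construction and are filled in here as assumptions, since the step equations are not legible above. The 2000-hour end time and all names are likewise illustrative. The critical value still has to be looked up in the table referenced in the text.

```python
import math

SYSTEMS = [
    [1.2, 55.6, 72.7, 111.9, 121.9, 303.6, 326.9, 1568.4, 1913.5],
    [1.4, 35.0, 46.8, 65.9, 181.1, 712.6, 1005.7, 1029.9, 1675.7,
     1787.5, 1867.0],
    [0.3, 32.6, 33.4, 241.7, 396.2, 444.4, 480.8, 588.9, 1043.9,
     1136.1, 1288.1, 1408.1, 1439.4, 1604.8],
]
END_TIMES = [2000.0, 2000.0, 2000.0]  # ASSUMPTION: time terminated at 2000

def cramer_von_mises(systems, end_times, failure_terminated=False):
    """Return (M, unbiased beta, statistic) for start time 0 data."""
    ratios = []
    for times, end in zip(systems, end_times):
        used = sorted(times)
        # Step 1: a failure-terminated system contributes one fewer point,
        # because its last failure time equals its end time.
        if failure_terminated:
            used = used[:-1]
        # Step 2: divide each failure time by the system's end time.
        ratios.extend(x / end for x in used)
    m = len(ratios)
    # Step 3: unbiased estimate of beta.
    beta_bar = (m - 1) / sum(math.log(1.0 / y) for y in ratios)
    # Step 4: pool all ratios and order them from smallest to largest.
    z = sorted(ratios)
    # Step 5: parametric Cramér-von Mises statistic.
    stat = 1.0 / (12 * m) + sum(
        (z_j ** beta_bar - (2 * j - 1) / (2 * m)) ** 2
        for j, z_j in enumerate(z, start=1)
    )
    return m, beta_bar, stat

m, beta_bar, c2m = cramer_von_mises(SYSTEMS, END_TIMES)
print(m, round(beta_bar, 4), round(c2m, 4))
# Step 6: compare c2m with the tabulated critical value for M = m at the
# chosen significance level; reject the Power Law model if c2m is larger.
```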
Cramér-von Mises Example
For the data from Example 1, use the Cramér-von Mises test to examine the compatibility of the model at a significance level
Solution
Step 1:
Step 2: Calculate
Step 3: Calculate
Step 4: Calculate
Step 5: Find the critical value (CV) from Table B.2 for
Step 6: Since
Chi-Squared Test
The parametric Cramér-von Mises test described above requires that the starting time,
where
Confidence Bounds for Repairable Systems Analysis
Bounds on
Fisher Matrix Bounds
The parameter
All variances can be calculated using the Fisher Information Matrix.
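A minimal sketch of how such a variance and the resulting bounds might be computed is shown below. It assumes time-terminated data with start time 0 and a common end time, builds the observed information matrix from the second derivatives of the Power Law log-likelihood, and then applies log-transformed normal-approximation bounds to keep the shape parameter positive. The two-sided normal quantile, the input values and the function name are assumptions for illustration, not the exact convention of the software.

```python
import math
from statistics import NormalDist

def fisher_beta_bounds(n_failures, n_systems, end_time, beta_hat, conf_level):
    """Approximate two-sided bounds on beta from the observed Fisher
    information (time terminated, start 0, common end time)."""
    # MLE of lambda implied by beta_hat for this data layout.
    lam_hat = n_failures / (n_systems * end_time ** beta_hat)
    ln_t = math.log(end_time)
    t_beta = end_time ** beta_hat

    # Negative second derivatives of the log-likelihood at the MLE.
    i_ll = n_failures / lam_hat ** 2
    i_bb = n_failures / beta_hat ** 2 + lam_hat * n_systems * t_beta * ln_t ** 2
    i_lb = n_systems * t_beta * ln_t

    # Invert the 2x2 information matrix; only Var(beta) is needed here.
    det = i_ll * i_bb - i_lb ** 2
    var_beta = i_ll / det

    # ASSUMPTION: two-sided bounds, so use the 1 - alpha/2 normal quantile.
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    factor = math.exp(z * math.sqrt(var_beta) / beta_hat)
    return beta_hat / factor, beta_hat * factor, var_beta

# Illustrative (assumed) inputs: 34 failures on 3 systems observed to 2000.
lower, upper, var_beta = fisher_beta_bounds(34, 3, 2000.0, 0.453, 0.90)
print(lower, upper, var_beta)
```

For this data layout the variance of the shape estimate reduces algebraically to the squared estimate divided by the number of failures, which gives a quick check on the matrix inversion.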
Crow Bounds
Calculate the conditional maximum likelihood estimate of
The Crow 2-sided
Bounds on
Fisher Matrix Bounds
The parameter
The approximate confidence bounds on
where
Crow Bounds
Time Terminated
The confidence bounds on
Failure Terminated
The confidence bounds on
Bounds on Growth Rate
Since the growth rate is equal to
If Fisher Matrix confidence bounds are used then
Bounds on Cumulative MTBF
Fisher Matrix Bounds
The cumulative MTBF,
The approximate confidence bounds on the cumulative MTBF are then estimated from:
- where:
The variance calculation is the same as Eqns. (var1), (var2) and (var3).
Crow Bounds
To calculate the Crow confidence bounds on cumulative MTBF, first calculate the Crow cumulative failure intensity confidence bounds:
- Then
Bounds on Instantaneous MTBF
Fisher Matrix Bounds
The instantaneous MTBF,
The approximate confidence bounds on the instantaneous MTBF are then estimated from:
- where:
The variance calculation is the same as (var1), (var2) and (var3).
Crow Bounds
Failure Terminated Data
To calculate the bounds for failure terminated data, consider the following equation:
Find the values
where
where
Time Terminated Data
To calculate the bounds for time terminated data, consider the following equation where
Find the values
Calculate
where
where
Bounds on Cumulative Failure Intensity
Fisher Matrix Bounds
The cumulative failure intensity,
The approximate confidence bounds on the cumulative failure intensity are then estimated using:
- where:
- and:
The variance calculation is the same as Eqns. (var1), (var2) and (var3):
Crow Bounds
The Crow cumulative failure intensity confidence bounds are given by:
Bounds on Instantaneous Failure Intensity
Fisher Matrix Bounds
The instantaneous failure intensity,
The approximate confidence bounds on the instantaneous failure intensity are then estimated from:
where
The variance calculation is the same as Eqns. (var1), (var2) and (var3):
Crow Bounds
The Crow instantaneous failure intensity confidence bounds are given as:
Bounds on Time Given Cumulative MTBF
Fisher Matrix Bounds
The time,
The confidence bounds on the time are given by:
- where:
The variance calculation is the same as Eqns. (var1), (var2) and (var3).
Crow Bounds
Step 1: Calculate:
Step 2: Estimate the number of failures:
Step 3: Obtain the confidence bounds on time given the cumulative failure intensity by solving for
Bounds on Time Given Instantaneous MTBF
Fisher Matrix Bounds
The time,
The confidence bounds on the time are given by:
- where:
The variance calculation is the same as Eqns. (var1), (var2) and (var3).
Crow Bounds
Step 1: Calculate the confidence bounds on the instantaneous MTBF as presented in the Bounds on Instantaneous MTBF section above.
Step 2: Calculate the bounds on time as follows.
Failure Terminated Data
So the lower and upper bounds on time are:
Time Terminated Data
So the lower and upper bounds on time are:
Bounds on Time Given Cumulative Failure Intensity
Fisher Matrix Bounds
The time,
The confidence bounds on the time are given by:
- where:
The variance calculation is the same as Eqns. (var1), (var2) and (var3):
Crow Bounds
Step 1: Calculate:
Step 2: Estimate the number of failures:
Step 3: Obtain the confidence bounds on time given the cumulative failure intensity by solving for
Bounds on Time Given Instantaneous Failure Intensity
Fisher Matrix Bounds
These bounds are based on:
The confidence bounds on the time are given by:
- where:
The variance calculation is the same as Eqns. (var1), (var2) and (var3).
Crow Bounds
Step 1: Calculate
Step 2: Use the equations from 13.1.7.9 to calculate the bounds on time given the instantaneous failure intensity.
Bounds on Reliability
Fisher Matrix Bounds
These bounds are based on:
The confidence bounds on reliability are given by:
The variance calculation is the same as Eqns. (var1), (var2) and (var3).
Crow Bounds
Failure Terminated Data
With failure terminated data, the 100(
- where
Time Terminated Data
With time terminated data, the 100(
- where:
Bounds on Time Given Reliability and Mission Time
Fisher Matrix Bounds
The time,
The confidence bounds on time are calculated by using:
- where:
is calculated numerically from:
The variance calculations are done by:
Crow Bounds
Failure Terminated Data
Step 1: Calculate
Step 2: Let
Step 3: Let
Step 4: If
Time Terminated Data
Step 1: Calculate
Step 2: Let
Step 3: Let
Step 4: If
Bounds on Mission Time Given Reliability and Time
Fisher Matrix Bounds
The mission time,
The confidence bounds on mission time are given by using:
- where:
Calculate
The variance calculations are done by:
Crow Bounds
Failure Terminated Data
Step 1: Calculate
Step 2: Let
Step 3: Let
Step 4: If
Time Terminated Data
Step 1: Calculate
Step 2: Let
Step 3: Let
Step 4: If
Bounds on Cumulative Number of Failures
Fisher Matrix Bounds
The cumulative number of failures,
- where:
The variance calculation is the same as Eqns. (var1), (var2) and (var3).
Crow Bounds
where
Confidence Bounds Example
Using the data from Example 1, calculate the mission reliability at
Solution
The maximum likelihood estimates of
From Eq. (reliability), the mission reliability at
At the 90% confidence level and
The Crow confidence bounds for the mission reliability are:
Figures ConfReliFish and ConfRelCrow show the Fisher Matrix and Crow confidence bounds on mission reliability for mission time
Economical Life Model
One consideration in reducing the cost to maintain repairable systems is to establish an overhaul policy that will minimize the total life cost of the system. However, an overhaul policy makes sense only if $\beta > 1$, that is, only if the systems are wearing out.
Denote
So the average system cost is:
The instantaneous maintenance cost at time
The following equation holds at optimum overhaul time
- Therefore:
When there is no scheduled maintenance, Eqn. (ecolm) becomes:
The optimum overhaul time,
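The optimum can be illustrated with a short sketch for the case without scheduled maintenance. It assumes the average cost per unit time is the expected repair cost plus one overhaul cost, divided by the overhaul interval, and minimizes that expression in closed form. The names `repair_cost` and `overhaul_cost` and all numeric values are hypothetical and stand in for the cost symbols defined above.

```python
def optimum_overhaul_time(lam, beta, repair_cost, overhaul_cost):
    """Overhaul interval that minimizes average cost per unit time.

    Setting the derivative of
        (lam * t**beta * repair_cost + overhaul_cost) / t
    to zero gives t**beta = overhaul_cost / (lam * (beta - 1) * repair_cost).
    """
    if beta <= 1.0:
        raise ValueError("an optimum overhaul time exists only for beta > 1")
    return (overhaul_cost / (lam * (beta - 1.0) * repair_cost)) ** (1.0 / beta)

def average_cost_rate(t, lam, beta, repair_cost, overhaul_cost):
    """Average cost per unit time when overhauling every t units."""
    return (lam * t ** beta * repair_cost + overhaul_cost) / t

# Illustrative (assumed) values: an overhaul costs four times a repair.
lam, beta = 0.001, 2.0
t0 = optimum_overhaul_time(lam, beta, repair_cost=1.0, overhaul_cost=4.0)
cost_at_t0 = average_cost_rate(t0, lam, beta, 1.0, 4.0)
print(t0, cost_at_t0)
```

Only the ratio of the two costs matters for the optimum interval, so the calculation can be run with relative costs when absolute figures are not available.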
More Examples
Example 6 (repairable system data)
This case study is based on the data given in the article Graphical Analysis of Repair Data by Dr. Wayne Nelson [23]. The data in Table 13.10 represents repair data on an automatic transmission from a sample of 34 cars. For each car, the data set shows mileage at the time of each transmission repair, along with the latest mileage. The + indicates the latest mileage observed without failure. Car 1, for example, had a repair at 7068 miles and was observed until 26,744 miles. Do the following:
- 1) Estimate the parameters of the Power Law model.
- 2) Estimate the number of warranty claims for a 36,000 mile warranty policy for an estimated fleet of 35,000 vehicles.
Table 13.10 - Automatic transmission data
Car | Mileage | Car | Mileage |
---|---|---|---|
1 | 7068, 26744+ | 18 | 17955+ |
2 | 28, 13809+ | 19 | 19507+ |
3 | 48, 1440, 29834+ | 20 | 24177+ |
4 | 530, 25660+ | 21 | 22854+ |
5 | 21762+ | 22 | 17844+ |
6 | 14235+ | 23 | 22637+ |
7 | 1388, 18228+ | 24 | 375, 19607+ |
8 | 21401+ | 25 | 19403+ |
9 | 21876+ | 26 | 20997+ |
10 | 5094, 18228+ | 27 | 19175+ |
11 | 21691+ | 28 | 20425+ |
12 | 20890+ | 29 | 22149+ |
13 | 22486+ | 30 | 21144+ |
14 | 19321+ | 31 | 21237+ |
15 | 21585+ | 32 | 14281+ |
16 | 18676+ | 33 | 8250, 21974+ |
17 | 23520+ | 34 | 19250, 21888+ |
Solution
- 1) The estimated Power Law parameters are shown in Figure Repair3.
- 2) The expected number of failures at 36,000 miles can be estimated using the QCP as shown in Figure Repair4. The model predicts that 0.3559 failures per system will occur by 36,000 miles. This means that for a fleet of 35,000 vehicles, the expected number of warranty claims is 0.3559 × 35,000 ≈ 12,456.
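For readers without the software, the two results can be approximated with a short script. This is a sketch under stated assumptions: each car is treated as time terminated at its latest observed mileage with start time 0, and the likelihood equation for the shape parameter is solved by bisection. The function name and the bracketing interval are illustrative choices, not the software's algorithm.

```python
import math

# (repair mileages, latest observed mileage) for each car in Table 13.10.
CARS = [
    ([7068], 26744), ([28], 13809), ([48, 1440], 29834), ([530], 25660),
    ([], 21762), ([], 14235), ([1388], 18228), ([], 21401), ([], 21876),
    ([5094], 18228), ([], 21691), ([], 20890), ([], 22486), ([], 19321),
    ([], 21585), ([], 18676), ([], 23520), ([], 17955), ([], 19507),
    ([], 24177), ([], 22854), ([], 17844), ([], 22637), ([375], 19607),
    ([], 19403), ([], 20997), ([], 19175), ([], 20425), ([], 22149),
    ([], 21144), ([], 21237), ([], 14281), ([8250], 21974), ([19250], 21888),
]

def fit_power_law(cars, lo=0.01, hi=5.0, tol=1e-10):
    """MLEs (lam, beta) for systems with start 0 and individual end times."""
    failures = [x for times, _ in cars for x in times]
    ends = [end for _, end in cars]
    n = len(failures)
    mean_log_x = sum(math.log(x) for x in failures) / n

    def score(beta):
        # Likelihood equation for beta after substituting
        # lam = n / sum(T**beta); it is increasing in beta.
        weights = [t ** beta for t in ends]
        weighted_log_t = sum(w * math.log(t) for w, t in zip(weights, ends)) / sum(weights)
        return weighted_log_t - mean_log_x - 1.0 / beta

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    lam = n / sum(t ** beta for t in ends)
    return lam, beta

lam_hat, beta_hat = fit_power_law(CARS)
failures_per_car = lam_hat * 36000 ** beta_hat   # expected repairs by 36,000 miles
expected_claims = failures_per_car * 35000       # scaled to the fleet size
print(beta_hat, lam_hat, failures_per_car, expected_claims)
```

The expected number of repairs per car from this sketch should land close to the 0.3559 figure quoted above; small differences can arise from rounding and from the numerical method used.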
Example 7 (repairable system data)
Field data have been collected for a system that begins its wearout phase at time zero. The start time for each system is equal to zero and the end time for each system is 10,000 miles. Each system is scheduled to undergo an overhaul after a certain number of miles. It has been determined that the cost of an overhaul is four times the cost of a repair. Table 13.11 presents the data. Do the following:
- 1) Estimate the parameters of the Power Law model.
- 2) Determine the optimum overhaul interval.
- 3) If $\beta < 1$, would it be cost-effective to implement an overhaul policy?
Table 13.11 - Field data
System 1 | System 2 | System 3 |
---|---|---|
1006.3 | 722.7 | 619.1 |
2261.2 | 1950.9 | 1519.1 |
2367 | 3259.6 | 2956.6 |
2615.5 | 4733.9 | 3114.8 |
2848.1 | 5105.1 | 3657.9 |
4073 | 5624.1 | 4268.9 |
5708.1 | 5806.3 | 6690.2 |
6464.1 | 5855.6 | 6803.1 |
6519.7 | 6325.2 | 7323.9 |
6799.1 | 6999.4 | 7501.4 |
7342.9 | 7084.4 | 7641.2 |
7736 | 7105.9 | 7851.6 |
8246.1 | 7290.9 | 8147.6 |
 | 7614.2 | 8221.9 |
 | 8332.1 | 9560.5 |
 | 8368.5 | 9575.4 |
 | 8947.9 | |
 | 9012.3 | |
 | 9135.9 | |
 | 9147.5 | |
 | 9601 | |
Solution
- 1) Figure Repair5 shows the estimated Power Law parameters.
- 2) The QCP can be used to calculate the optimum overhaul interval as shown in Figure Repair6.
- 3) Since $\beta < 1$, the systems are not wearing out and it would not be cost-effective to implement an overhaul policy. An overhaul policy makes sense only if the systems are wearing out. Otherwise, an overhauled unit would have the same probability of failing as a unit that was not overhauled.
Example 8 (repairable system data)
Failures and fixes of two repairable systems in the field are recorded. Both systems start from time 0. System 1 ends at time = 504 and system 2 ends at time = 541. All the BD modes are fixed at the end of the test. A fixed effectiveness factor equal to 0.6 is used. Answer the following questions:
- 1) Estimate the parameters of the Crow Extended model.
- 2) Calculate the projected MTBF after the delayed fixes.
- 3) What is the expected number of failures at time 1,000, if no fixes were performed for the future failures?
Solution
- 1) Figure CrowExtendedRepair shows the estimated Crow Extended parameters.
- 2) Figure CrowExtendedMTBF shows the projected MTBF at time = 541 (i.e. the age of the oldest system).
- 3) Figure CrowExtendedNumofFailure shows the expected number of failures at time = 1,000.