Revision as of 20:51, 3 February 2012

New format available! This reference is now available in a new format that offers faster page load, improved display for calculations and images, more targeted search and the latest content available as a PDF. As of September 2023, this Reliawiki page will not continue to be updated. Please update all links and bookmarks to the latest reference at help.reliasoft.com/reference/system_analysis

Chapter 8: Repairable Systems Analysis Through Simulation

Available Software:
BlockSim

More Resources:
BlockSim examples

Having introduced some of the basic theory and terminology for repairable systems in Chapter 7, we will now examine the steps involved in the analysis of such complex systems. We will begin by examining system behavior through a sequence of discrete deterministic events and expand the analysis using discrete event simulation.

Simple Repairs

Deterministic View, Simple Series

To first understand how component failures and simple repairs affect the system and to visualize the steps involved, let's begin with a very simple deterministic example with two components, [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math], in series.

Component [math]\displaystyle{ A }[/math] fails every 100 hours and component [math]\displaystyle{ B }[/math] fails every 120 hours. Each requires 10 hours to repair. Furthermore, assume that the surviving component stops operating when the system fails (and thus does not age). NOTE: When a failure occurs in certain systems, some or all of the system's components may or may not continue to accumulate operating time while the system is down. For example, consider a transmitter-satellite-receiver system. This is a series system, and the probability of failure for this system is the probability that any of the subsystems fails. If the receiver fails, the satellite continues to operate even though the receiver is down. In this case, the continued aging of the components while the system is inoperative must be taken into consideration, since this will affect their failure characteristics and have an impact on the overall system downtime and availability.
The system behavior during an operation from 0 to 300 hours would be as shown in Figure fig1.

Overview of system and components for a simple series system with two components. Component A fails every 100 hours and component B fails every 120 hours. Both require 10 hours to get repaired and do not age (operate through failure) when the system is in a failed state.

Specifically, component [math]\displaystyle{ A }[/math] would fail at 100 hours, causing the system to fail. After 10 hours, component [math]\displaystyle{ A }[/math] would be restored and so would the system. The next event would be the failure of component [math]\displaystyle{ B }[/math] . We know that component [math]\displaystyle{ B }[/math] fails every 120 hours (or after an age of 120 hours). Since a component does not age while the system is down, component [math]\displaystyle{ B }[/math] would have reached an age of 120 when the clock reaches 130 hours. Thus, component [math]\displaystyle{ B }[/math] would fail at 130 hours and be repaired by 140 and so forth. Overall, in this scenario, the system would be failed for a total of 40 hours due to four downing events (two due to [math]\displaystyle{ A }[/math] and two due to [math]\displaystyle{ B }[/math] ). The overall system availability (average or mean availability) would be [math]\displaystyle{ 260/300=0.86667 }[/math] . Point availability is the availability at a specific point in time. In this deterministic case, the point availability would always be equal to 1 if the system is up at that time and equal to zero if the system is down at that time.
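The event sequence described above can be reproduced with a short script. The following is a minimal sketch, not BlockSim's implementation: the function name `simulate_series` and its arguments are hypothetical, and it assumes that every repair restores the component to as good as new and that no component ages while the system is down.

```python
def simulate_series(intervals, repair_time, end_time):
    """Deterministic series system whose components stop aging while the system is down."""
    # Operating hours left before each component's next failure.
    remaining = dict(intervals)
    clock = 0.0
    downtime = 0.0
    events = []  # (clock time of failure, component name)
    while True:
        # The component with the least remaining operating time fails next.
        name = min(remaining, key=remaining.get)
        step = remaining[name]
        if clock + step >= end_time:
            break
        clock += step
        for key in remaining:
            remaining[key] -= step  # every component aged while the system ran
        events.append((clock, name))
        # Series system: it is down for the whole repair and nothing ages.
        repair = min(repair_time, end_time - clock)
        downtime += repair
        clock += repair
        remaining[name] = intervals[name]  # repaired component is as good as new
    return events, downtime, (end_time - downtime) / end_time


events, downtime, availability = simulate_series({"A": 100.0, "B": 120.0}, 10.0, 300.0)
print(events)                            # failures at 100 (A), 130 (B), 220 (A), 270 (B)
print(downtime, round(availability, 5))  # 40.0 0.86667
```

Because the components are in series, at most one of them can be under repair at any time, which is why a single loop over "next failure, then repair" is sufficient here.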

Operating Through System Failure

In the prior section we made the assumption that components do not age when the system is down. This assumption applies to most systems. However, under special circumstances, a unit may age even while the system is down. In such cases, the operating profile will be different from the one presented in the prior section. Figure fig2 illustrates the case where the components operate continuously, regardless of the system status.

Overview of up and down states for a simple series system with two components. Component A fails every 100 hours and component B fails every 120 hours. Both require 10 hours to get repaired and age when the system is in a failed state (operate through failure).

Effects of Operating Through Failure

Consider a component with an increasing failure rate, as shown in Figure fig2a. In the case that the component continues to operate through system failure, then when the system fails at [math]\displaystyle{ {{t}_{1}} }[/math] the surviving component's failure rate will be [math]\displaystyle{ {{\lambda }_{1}} }[/math] , as illustrated in Figure fig2a. When the system is restored at [math]\displaystyle{ {{t}_{2}} }[/math] , the component would have aged by [math]\displaystyle{ {{t}_{2}}-{{t}_{1}} }[/math] and its failure rate would now be [math]\displaystyle{ {{\lambda }_{2}} }[/math] .

In the case of a component that does not operate through failure, then the surviving component would be at the same failure rate, [math]\displaystyle{ {{\lambda }_{1}}, }[/math] when the system resumes operation.

Illustration of a component with a linearly increasing failure rate and the effect of operation through system failure.

Deterministic View, Simple Parallel

Consider the following system where [math]\displaystyle{ A }[/math] fails every 100, [math]\displaystyle{ B }[/math] every 120, [math]\displaystyle{ C }[/math] every 140 and [math]\displaystyle{ D }[/math] every 160 time units. Each takes 10 time units to restore. Furthermore, assume that components do not age when the system is down.

A deterministic system view is shown in Figure fig2a. The sequence of events is as follows:

At 100, [math]\displaystyle{ A }[/math] fails and is repaired by 110. The system is failed.
At 130, [math]\displaystyle{ B }[/math] fails and is repaired by 140. The system continues to operate.
At 150, [math]\displaystyle{ C }[/math] fails and is repaired by 160. The system continues to operate.
At 170, [math]\displaystyle{ D }[/math] fails and is repaired by 180. The system is failed.
At 220, [math]\displaystyle{ A }[/math] fails and is repaired by 230. The system is failed.
At 280, [math]\displaystyle{ B }[/math] fails and is repaired by 290. The system continues to operate.
End at 300.

Overview of simple redundant system with four components.
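The listed sequence can be checked with a small deterministic simulator that takes the system structure as a function. This is only a sketch under stated assumptions: the layout used here ([math]\displaystyle{ A }[/math] in series with the parallel pair [math]\displaystyle{ B }[/math], [math]\displaystyle{ C }[/math], followed by [math]\displaystyle{ D }[/math] in series) is inferred from which failures bring the system down in the list above, and the names `simulate` and `series_parallel` are hypothetical.

```python
def simulate(intervals, repair_time, end_time, system_up):
    """Deterministic run in which blocks age only while the system is up."""
    remaining = dict(intervals)   # operating time left before each block fails
    repair_done = {}              # clock time at which each failed block is restored
    clock, log = 0.0, []
    while True:
        up = {name: name not in repair_done for name in intervals}
        running = system_up(up)
        # Candidate next events: pending repairs, plus failures if the system runs.
        candidates = [(when, "repair", name) for name, when in repair_done.items()]
        if running:
            candidates += [(clock + remaining[name], "fail", name)
                           for name in intervals if up[name]]
        when, kind, name = min(candidates)
        if when >= end_time:
            break
        if running:
            for other in intervals:
                if up[other]:
                    remaining[other] -= when - clock
        clock = when
        if kind == "fail":
            repair_done[name] = clock + repair_time
            up[name] = False
            log.append((clock, name, system_up(up)))
        else:
            del repair_done[name]
            remaining[name] = intervals[name]   # restored as good as new
    return log


def series_parallel(up):
    # Assumed layout: A in series with the parallel pair (B, C), then D in series.
    return up["A"] and (up["B"] or up["C"]) and up["D"]


intervals = {"A": 100.0, "B": 120.0, "C": 140.0, "D": 160.0}
for clock, name, still_up in simulate(intervals, 10.0, 300.0, series_parallel):
    print(clock, name, "system up" if still_up else "system down")
```

Passing the structure as a function keeps the event loop unchanged when the reliability-wise configuration changes; only `series_parallel` would need to be replaced.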

Additional Notes

It should be noted that we are dealing with these events deterministically in order to better illustrate the methodology. When dealing with deterministic events, it is possible to create a sequence of events that one would not expect to encounter probabilistically. One such example consists of two units in series that do not operate through failure but both fail at exactly 100, which is highly unlikely in a real-world scenario. In this case, the assumption is that one of the events must occur at least an infinitesimal amount of time ([math]\displaystyle{ dt }[/math]) before the other. Probabilistically, this event is extremely rare, since both randomly generated times would have to be exactly equal to each other, to 15 decimal places. In the rare event that this happens, BlockSim would pick the unit with the lowest ID value as the first failure. BlockSim assigns a unique numerical ID when each component is created. These can be viewed by selecting the Show Block ID option in the Diagram Options window.

Deterministic Views of More Complex Systems

Even though the examples presented are fairly simplistic, the same approach can be repeated for larger and more complex systems. The reader can easily observe/visualize the behavior of more complex systems in BlockSim using the Up/Down plots. These are the same plots used in this chapter. It should be noted that BlockSim makes these plots available only when a single simulation run has been performed for the analysis (i.e. Number of Simulations = 1). These plots are meaningless when doing multiple simulations because each run will yield a different plot.

Probabilistic View, Simple Series

In a probabilistic case, the failures and repairs do not happen at a fixed time and for a fixed duration, but rather occur randomly and based on an underlying distribution, as shown in Figures Ch8fig3 and Ch8fig4.

A single component with a probabilistic failure time and repair duration.

We use discrete event simulation in order to analyze (understand) the system behavior. Discrete event simulation looks at each system/component event very similarly to the way we looked at these events in the deterministic example. However, instead of using deterministic (fixed) times for each event occurrence or duration, random times are used. These random times are obtained from the underlying distribution for each event. As an example, consider an event following a 2-parameter Weibull distribution. The [math]\displaystyle{ cdf }[/math] of the 2-parameter Weibull distribution is given by:

[math]\displaystyle{ F(T)=1-{{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}} }[/math]

The Weibull reliability function is given by:

[math]\displaystyle{ \begin{align} R(T)= & 1-F(T) \\ = & {{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}} \end{align} }[/math]

A system up/down plot illustrating a probabilistic failure time and repair duration for component B.

Then, to generate a random time from a Weibull distribution with a given [math]\displaystyle{ \eta }[/math] and [math]\displaystyle{ \beta }[/math] , a uniform random number from 0 to 1, [math]\displaystyle{ {{U}_{R}}[0,1] }[/math] , is first obtained. The random time from a Weibull distribution is then obtained from:

[math]\displaystyle{ {{T}_{R}}=\eta \cdot {{\left\{ -\ln \left[ {{U}_{R}}[0,1] \right] \right\}}^{\tfrac{1}{\beta }}} }[/math]
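The equation above is an inverse-transform draw: the uniform number is treated as a reliability value and the reliability function is solved for time. A minimal Python sketch follows. The function name `weibull_random_time` is hypothetical, and the parameter values ([math]\displaystyle{ \beta =1.5 }[/math], [math]\displaystyle{ \eta =1000 }[/math]) are used for illustration only.

```python
import math
import random


def weibull_random_time(eta, beta, u):
    """Solve R(T) = u for T, where R is the 2-parameter Weibull reliability."""
    return eta * (-math.log(u)) ** (1.0 / beta)


# A fixed uniform number maps to a fixed time; 0.7021885 gives roughly 500.
print(weibull_random_time(1000.0, 1.5, 0.7021885))

# Drawing several random times. 1 - random() lies in (0, 1], which avoids log(0).
rng = random.Random(42)
samples = [weibull_random_time(1000.0, 1.5, 1.0 - rng.random()) for _ in range(5)]
print(samples)
```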

To obtain a conditional time, the Weibull conditional reliability function is given by:

[math]\displaystyle{ R(T,t)=\frac{R(T+t)}{R(T)}=\frac{{{e}^{-{{\left( \tfrac{T+t}{\eta } \right)}^{\beta }}}}}{{{e}^{-{{\left( \tfrac{T}{\eta } \right)}^{\beta }}}}} }[/math]

Or:

[math]\displaystyle{ R(T,t)={{e}^{-\left[ {{\left( \tfrac{T+t}{\eta } \right)}^{\beta }}-{{\left( \tfrac{T}{\eta } \right)}^{\beta }} \right]}} }[/math]

The random time would be the solution for [math]\displaystyle{ t }[/math] for [math]\displaystyle{ R(T,t)={{U}_{R}}[0,1] }[/math] .
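For the Weibull case this solution has a closed form: setting the exponent of the conditional reliability equal to [math]\displaystyle{ \ln {{U}_{R}} }[/math] gives [math]\displaystyle{ t=\eta {{\left[ {{\left( \tfrac{T}{\eta } \right)}^{\beta }}-\ln {{U}_{R}} \right]}^{\tfrac{1}{\beta }}}-T }[/math]. The sketch below evaluates it. The function name `weibull_conditional_time` is hypothetical, and the ages and uniform numbers are illustrative inputs.

```python
import math


def weibull_conditional_time(eta, beta, age, u):
    """Solve R(age, t) = u for the additional operating time t."""
    # R(age, t) = exp(-[((age + t)/eta)**beta - (age/eta)**beta]) = u
    # => ((age + t)/eta)**beta = (age/eta)**beta - ln(u)
    total = eta * ((age / eta) ** beta - math.log(u)) ** (1.0 / beta)
    return total - age


# With an age of zero this reduces to the unconditional draw (about 500 here).
print(weibull_conditional_time(1000.0, 1.5, 0.0, 0.7021885))
# With an illustrative current age of 350, the remaining time is about 129.5.
print(weibull_conditional_time(1000.0, 1.5, 350.0, 0.8824969))
```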
To illustrate the sequence of events, assume a single block with a failure and a repair distribution. The first event, [math]\displaystyle{ {{E}_{{{F}_{1}}}} }[/math] , would be the failure of the component. Its first time-to-failure would be a random number drawn from its failure distribution, [math]\displaystyle{ {{T}_{{{F}_{1}}}} }[/math] . Thus, the first failure event, [math]\displaystyle{ {{E}_{{{F}_{1}}}} }[/math] , would be at [math]\displaystyle{ {{T}_{{{F}_{1}}}} }[/math] . Once failed, the next event would be the repair of the component, [math]\displaystyle{ {{E}_{{{R}_{1}}}} }[/math] . The time to repair the component would now be drawn from its repair distribution, [math]\displaystyle{ {{T}_{{{R}_{1}}}} }[/math] . The component would be restored by time [math]\displaystyle{ {{T}_{{{F}_{1}}}}+{{T}_{{{R}_{1}}}} }[/math] . The next event would now be the second failure of the component after the repair, [math]\displaystyle{ {{E}_{{{F}_{2}}}} }[/math] . This event would occur after a component operating time of [math]\displaystyle{ {{T}_{{{F}_{2}}}} }[/math] after the item is restored (again drawn from the failure distribution), or at [math]\displaystyle{ {{T}_{{{F}_{1}}}}+{{T}_{{{R}_{1}}}}+{{T}_{{{F}_{2}}}} }[/math] . This process is repeated until the end time. It is important to note that each run will yield a different sequence of events due to the probabilistic nature of the times. To arrive at the desired result, this process is repeated many times and the results from each run (simulation) are recorded. In other words, if we were to repeat this 1,000 times, we would obtain 1,000 different values for [math]\displaystyle{ {{E}_{{{F}_{1}}}} }[/math] , or [math]\displaystyle{ \left[ {{E}_{{{F}_{{{1}_{1}}}}}},{{E}_{{{F}_{{{1}_{2}}}}}},...,{{E}_{{{F}_{{{1}_{1,000}}}}}} \right] }[/math].

The average of these values, [math]\displaystyle{ \left( \tfrac{1}{1000}\underset{i=1}{\overset{1,000}{\mathop{\sum }}}\,{{E}_{{{F}_{{{1}_{i}}}}}} \right) }[/math] , would then be the average time to the first event, [math]\displaystyle{ {{E}_{{{F}_{1}}}} }[/math] , or the mean time to first failure (MTTFF) for the component. Obviously, if the component were to be 100% renewed after each repair, then this value would also be the same for the second failure, etc.

General Simulation Results

To further illustrate this, assume that both components in the prior example had normal failure and repair distributions with their means equal to the deterministic values used in the prior example and standard deviations of 10 and 1 respectively. That is, [math]\displaystyle{ {{F}_{A}}\sim N(100,10), }[/math] [math]\displaystyle{ {{F}_{B}}\sim N(120,10), }[/math] [math]\displaystyle{ {{R}_{A}}={{R}_{B}}\sim N(10,1) }[/math] . Obviously, given the probabilistic nature of the example, the times to each event will vary. If one were to repeat this [math]\displaystyle{ X }[/math] number of times, one would arrive at the results of interest for the system and its components. Some of the results for this system and this example, over 1,000 simulations, are given in Figure Ch8fig6 and explained in the next sections. The simulation settings are shown in Figure Ch8fig5a.
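A rough Monte Carlo sketch of this example is given below. It is not BlockSim's algorithm and will not reproduce its random number stream, so the figures it prints should only be close to the reported results. It makes three assumptions: the two components are in series and do not age while the system is down, negative normal draws are clamped to zero, and the first-failure average uses observed failures only. The names `simulate_run` and `summarize` are hypothetical.

```python
import random


def simulate_run(rng, end_time=300.0):
    """One run of the two-component series system (no aging while down)."""
    means = {"A": 100.0, "B": 120.0}
    # Operating time left to each component's next failure, drawn from N(mean, 10).
    remaining = {k: max(0.0, rng.gauss(m, 10.0)) for k, m in means.items()}
    clock, downtime, failures, first_failure = 0.0, 0.0, 0, None
    while True:
        name = min(remaining, key=remaining.get)
        step = remaining[name]
        if clock + step >= end_time:
            break
        clock += step
        for key in remaining:
            remaining[key] -= step
        failures += 1
        if first_failure is None:
            first_failure = clock
        repair = max(0.0, rng.gauss(10.0, 1.0))  # repair duration from N(10, 1)
        downtime += min(repair, end_time - clock)
        clock += repair
        remaining[name] = max(0.0, rng.gauss(means[name], 10.0))
    return (end_time - downtime) / end_time, failures, first_failure


def summarize(n_runs=1000, seed=1):
    """Average the per-run results over n_runs independent simulations."""
    rng = random.Random(seed)
    runs = [simulate_run(rng) for _ in range(n_runs)]
    mean_availability = sum(r[0] for r in runs) / n_runs
    mean_failures = sum(r[1] for r in runs) / n_runs
    firsts = [r[2] for r in runs if r[2] is not None]
    mttff = sum(firsts) / len(firsts)
    return mean_availability, mean_failures, mttff


print(summarize())  # (mean availability, expected number of failures, MTTFF)
```

Changing the seed or the number of runs changes the printed values slightly, which is the run-to-run variation that the standard deviation results quantify.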

BlockSim simulation window.

Summary of system results for 1,000 simulations.

General

Std Deviation (Mean Availability)

This is the standard deviation of the mean availability of all downing events for the system during the simulation.

Mean Availability (w/o PM & Inspection), [math]\displaystyle{ {{\overline{A}}_{CM}} }[/math]

This is the mean availability due to failure events only and it is 0.868 for this example. Note that for this case, the mean availability without preventive maintenance and inspection is identical to the mean availability for all events. This is because no preventive maintenance actions or inspections were defined for this system. We will discuss the inclusion of these actions in later sections.

Downtimes caused by PM and inspections are not included. However, if the PM or inspection action results in the discovery of a failure, then these times are included. As an example, consider a component that has failed but its failure is not discovered until the component is inspected. Then the downtime from the time failed to the time restored after the inspection is counted as failure downtime, since the original event that caused this was the component's failure.

Expected Number of Failures, [math]\displaystyle{ {{N}_{F}} }[/math]

This is the average number of system failures. The system failures (not downing events) for all simulations are counted and then averaged. For this case, this is 3.993, which implies that a total of 3,993 system failure events occurred over 1000 simulations. Thus, the expected number of system failures for one run is 3.993. This number includes all failures, even those that may have a duration of zero.

Std Deviation (Number of Failures)

This is the standard deviation of the number of failures for the system during the simulation.

MTTFF

MTTFF is the mean time to first failure for the system. This is computed by keeping track of the time at which the first system failure occurred for each simulation. MTTFF is then the average of these times. This may or may not be identical to the MTTF obtained in the analytical solution for the same reasons as those discussed in the Point Reliability section. For this case, this is 98.856. This is fairly obvious for this case since the mean of one of the components in series was 100 hours.

It is important to note that for each simulation run, if a first failure time is observed, then this is recorded as the system time to first failure. If no failure is observed in the system, then the simulation end time is used as a right censored (suspended) data point. MTTFF is then computed using the total operating time until the first failure divided by the number of observed failures (constant failure rate assumption). Furthermore, if the simulation end time is much less than the time to first failure for the system, it is also possible that all data points are right censored (i.e. no system failures were observed). In this case, the MTTFF is again computed using a constant failure rate assumption, or:

[math]\displaystyle{ MTTFF=\frac{2\cdot ({{T}_{S}})\cdot N}{\chi _{0.50;2}^{2}} }[/math]

Where [math]\displaystyle{ {{T}_{S}} }[/math] is the simulation end time and [math]\displaystyle{ N }[/math] is the number of simulations. One should be aware that this formulation may yield unrealistic (or erroneous) results if the system does not have a constant failure rate. If you are trying to obtain an accurate (realistic) estimate of this value, then your simulation end time should be set to a value that is well beyond the MTTF of the system (as computed analytically). As a general rule, the simulation end time should be at least three times larger than the MTTF of the system.
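The all-censored formula can be evaluated without statistical tables, because a chi-squared variable with 2 degrees of freedom is an exponential variable with mean 2, so its median is [math]\displaystyle{ \chi _{0.50;2}^{2}=2\ln 2\approx 1.386 }[/math]. The following sketch uses that identity. The function name `mttff_all_censored` and the input values are illustrative assumptions.

```python
import math


def mttff_all_censored(sim_end_time, n_simulations):
    """MTTFF estimate when no run saw a system failure (constant failure rate assumed)."""
    # Median of a chi-squared distribution with 2 degrees of freedom.
    chi2_median_2dof = 2.0 * math.log(2.0)
    return 2.0 * sim_end_time * n_simulations / chi2_median_2dof


# Example: 1,000 simulations of 300 hours each with no system failure observed.
print(mttff_all_censored(300.0, 1000))
```

The expression simplifies to [math]\displaystyle{ {{T}_{S}}\cdot N/\ln 2 }[/math], so the estimate grows linearly with the total simulated time, which is why it should not be read as an accurate MTTFF when the end time is short.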

System Downing Events

System downing events are events associated with downtime. If the duration of an event is zero, the event is not counted as a system downing event. However, the block properties CM brings system down, PM brings system down and Inspection brings system down take precedence, in which case an event with zero duration will be counted as a system downing event.

Number of Failures (System Downing), [math]\displaystyle{ {{N}_{{{F}_{Down}}}} }[/math]

This is the average number of system downing failures. Unlike the Expected Number of Failures, [math]\displaystyle{ {{N}_{F}}, }[/math] this number does not include failures with zero duration. For this example, this is 3.993.

Number of CMs (System Downing), [math]\displaystyle{ {{N}_{C{{M}_{Down}}}} }[/math]

This is the number of corrective maintenance actions that caused the system to fail. It is obtained by taking the sum of all CM actions that caused the system to fail divided by the number of simulations. It does not include CM events of zero duration. For this example, this is 3.993. Note that this may differ from the Number of Failures (System Downing), [math]\displaystyle{ {{N}_{{{F}_{Down}}}} }[/math] . An example would be a case where the system has failed, but due to other settings for the simulation, a CM is not initiated (e.g. an inspection is needed to initiate a CM).

Number of Inspections (System Downing), [math]\displaystyle{ {{N}_{{{I}_{Down}}}} }[/math]

This is the number of inspection actions that caused the system to fail. It is obtained by taking the sum of all inspection actions that caused the system to fail divided by the number of simulations. It does not include inspection events of zero duration. For this example, this is zero.

Number of PMs (System Downing), [math]\displaystyle{ {{N}_{P{{M}_{Down}}}} }[/math]

This is the number of PM actions that caused the system to fail. It is obtained by taking the sum of all PM actions that caused the system to fail divided by the number of simulations. It does not include PM events of zero duration. For this example, this is zero.

Total Events (System Downing), [math]\displaystyle{ {{N}_{AL{{L}_{Down}}}} }[/math]

This is the total number of system downing events. It also does not include events of zero duration. It is possible that this number may differ from the sum of the other listed events. As an example, consider the case where a failure does not get repaired until an inspection, but the inspection occurs after the simulation end time. In this case, the number of inspections, CMs and PMs will be zero while the number of total events will be one.

Costs and Throughput

Cost and throughput results are discussed in later sections.

Note About Overlapping Downing Events

It is important to note that two identical system downing events (that are continuous or overlapping) may be counted and viewed differently. As shown in Case 1 of Figure fig7, two overlapping failure events are counted as only one event from the system perspective because the system was never restored and remained in the same down state, even though that state was caused by two different components. Thus, the number of downing events in this case is one and the duration is as shown in CM system. In the case that the events are different, as shown in Case 2 of Figure fig7, two events are counted, the CM and the PM. However, the downtime attributed to each event is different from the actual time of each event. In this case, the system was first down due to a CM and remained in a down state due to the CM until that action was over. However, immediately upon completion of that action, the system remained down but now due to a PM action. In this case, only the PM action portion that kept the system down is counted.

Duration and count of different overlapping events.

System Point Result

The system point results, as shown in Figure fig8, show the Point Availability (All Events), [math]\displaystyle{ A\left( t \right) }[/math] , and Point Reliability, [math]\displaystyle{ R(t) }[/math] , as defined in the previous section. These are computed and returned at different points in time, based on the number of intervals selected by the user. Additionally, this window shows [math]\displaystyle{ (1-A\left( t \right)) }[/math] , [math]\displaystyle{ (1-R(t)) }[/math] , [math]\displaystyle{ Cost(t) }[/math] , [math]\displaystyle{ Mean }[/math] [math]\displaystyle{ A(t) }[/math] , [math]\displaystyle{ Mean }[/math] [math]\displaystyle{ A({{t}_{i}}-{{t}_{i-1}}) }[/math] , [math]\displaystyle{ System }[/math] [math]\displaystyle{ Failures(t) }[/math] , and [math]\displaystyle{ Throughput(t) }[/math] .

System point results. The number of intervals shown is based on the increments set (Figure 8.7). In this figure, the number of increments set was 300, which implies that the results should be shown every 1 tu. The results shown in this figure are for 10 increments, or shown every 30 tu.

Results by Component

Simulation results for each component can also be viewed. Figure fig9 shows the results for component A. These results are explained in the sections that follow.

The Block Details results for component A.

General Information

Number of Downing Events, [math]\displaystyle{ Componen{{t}_{NDE}} }[/math]

This is the number of times the component went down (failed). It includes all downing events.

Number of SD Events, [math]\displaystyle{ Componen{{t}_{NSDE}} }[/math]

This is the number of times that this component's downing caused the system to be down. For component [math]\displaystyle{ A }[/math] , this is 2.011. Note that this value is the same in this case as the number of component failures, since the two components are reliability-wise in series. If this were not the case (e.g. if they were in a parallel configuration), this value would be different.

Number of Failures, [math]\displaystyle{ Componen{{t}_{NF}} }[/math]

This is the number of times the component failed and does not include other downing events. Note that this could also be interpreted as the number of spare parts required for CM actions for this component. For component [math]\displaystyle{ A }[/math] , this is 2.011.

Number of SD Failures, [math]\displaystyle{ Componen{{t}_{NSDF}} }[/math]

This is the number of times that this component's failure caused the system to be down. Note that this may be different from the Number of SD Events. It only counts the failure events that downed the system and does not include zero duration system failures.

Mean Availability (All Events), [math]\displaystyle{ {{\overline{A}}_{AL{{L}_{Component}}}} }[/math]

This has the same definition as for the system with the exception that this accounts only for the component.

Mean Availability (w/o PM & Inspection), [math]\displaystyle{ {{\overline{A}}_{C{{M}_{Component}}}} }[/math]

This has the same definition as for the system with the exception that this accounts only for the component.

Metrics

MTBDE

This is the mean time between downing events of the component, which is computed from:

[math]\displaystyle{ MTBDE=\frac{{{T}_{Componen{{t}_{UP}}}}}{Componen{{t}_{NDE}}} }[/math]

For component [math]\displaystyle{ A }[/math] , this is 139.2168.

MTBF, [math]\displaystyle{ MTB{{F}_{C}} }[/math]

Mean time between failures is the mean (average) time between failures of this component, in real clock time. This is computed from:

[math]\displaystyle{ MTB{{F}_{C}}=\frac{{{T}_{S}}-CFDowntime}{Componen{{t}_{NF}}} }[/math]

[math]\displaystyle{ CFDowntime }[/math] is the downtime of the component due to failures only (without PM and inspection). The discussion regarding what is a failure downtime that was presented in the section explaining Mean Availability (w/o PM & Inspection) also applies here. For component [math]\displaystyle{ A }[/math] , this is 139.2168. Note that this value could fluctuate for the same component depending on the simulation end time. As an example, consider the deterministic scenario for this component. It fails every 100 hours and takes 10 hours to repair. Thus, it would be failed at 100, repaired by 110, failed at 210 and repaired by 220. Therefore, its uptime is 280 with two failure events, MTBF = 280/2 = 140. Repeating the same scenario with an end time of 330 would yield failures at 100, 210 and 320. Thus, the uptime would be 300 with three failures, or MTBF = 300/3 = 100. Note that this is not the same as the MTTF (mean time to failure), commonly referred to as MTBF by many practitioners.
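The dependence on the simulation end time can be checked with a few lines of code. This is a minimal sketch of the deterministic scenario just described, treating the component on its own. The function name `deterministic_mtbf` is hypothetical, and a failure that would occur exactly at the end time is not counted.

```python
def deterministic_mtbf(fail_interval, repair_time, end_time):
    """MTBF = (end time - failure downtime) / number of failures, single component."""
    clock, downtime, failures = 0.0, 0.0, 0
    while clock + fail_interval < end_time:
        clock += fail_interval          # component operates until it fails
        failures += 1
        repair = min(repair_time, end_time - clock)
        downtime += repair              # only downtime inside the window counts
        clock += repair
    return (end_time - downtime) / failures if failures else float("inf")


print(deterministic_mtbf(100.0, 10.0, 300.0))  # 140.0
print(deterministic_mtbf(100.0, 10.0, 330.0))  # 100.0
```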

Mean Downtime per Event, [math]\displaystyle{ MDPE }[/math]

Mean downtime per event is the average downtime for a component event. This is computed from:

[math]\displaystyle{ MDPE=\frac{{{T}_{Componen{{t}_{Down}}}}}{Componen{{t}_{NDE}}} }[/math]

Other Results of Interest

The remaining component (block) results are similar to those defined for the system with the exception that now they apply only to the component.

Imperfect Repairs

Restoration Factors (RF)

In the prior discussion it was assumed that a repaired component is as good as new after repair. This is usually the case when replacing a component with a new one. The concept of a restoration factor may be used in cases in which one wants to model imperfect repair, or a repair with a used component. The best way to indicate that a component is not as good as new is to give the component some age. As an example, if one is dealing with car tires, a tire that is not as good as new would have some pre-existing wear on it. In other words, the tire would have some accumulated mileage. A restoration factor concept is used to better describe the existing age of a component. The restoration factor is used to determine the age of the component after a repair or any other maintenance action (addressed in later sections, such as a PM action or inspection).

The restoration factor in BlockSim is defined as a number between 0 and 1 and has the following effect:

A restoration factor of 1 (100%) implies that the component is as good as new after repair, which in effect implies that the starting age of the component is 0.
A restoration factor of 0 implies that the component is the same as it was prior to repair, which in effect implies that the starting age of the component is the same as the age of the component at failure.
A restoration factor of 0.25 (25%) implies that the starting age of the component is equal to 75% of the age of the component at failure.

Figure figrestore provides a visual demonstration of restoration factors. It should be noted that for successive maintenance actions on the same component, the age of the component after such an action is the initial age plus the time to failure since the last maintenance action.

Different restoration factors (RF).

Type I and Type II RFs

BlockSim 7 offers two kinds of restoration factors. The type I restoration factor is based on Kijima [12, 13] model I and assumes that the repairs can only fix the wear-out and damage incurred during the last period of operation. Thus, the nth repair can only remove the damage incurred during the time between the (n-1)th and nth failures. The type II restoration factor, based on Kijima model II, assumes that the repairs fix all of the wear-out and damage accumulated up to the current time. As a result, the nth repair not only removes the damage incurred during the time between the (n-1)th and nth failures, but can also fix the cumulative damage incurred during the time from the first failure to the (n-1)th failure.

A Repairable System Structure

To illustrate this, consider a repairable system, observed from time [math]\displaystyle{ t=0 }[/math] , as shown in Figure RFInIIsys. Let the successive failure times be denoted by [math]\displaystyle{ {{t}_{1}} }[/math] , [math]\displaystyle{ {{t}_{2}} }[/math] , ... and let the times between failures be denoted by [math]\displaystyle{ {{x}_{1}} }[/math] , [math]\displaystyle{ {{x}_{2}} }[/math] , .... Let [math]\displaystyle{ RF }[/math] denote the restoration factor, then the age of the system [math]\displaystyle{ {{v}_{n}} }[/math] at time [math]\displaystyle{ {{t}_{n}} }[/math] using the two types of restoration factors is:
Type I Restoration Factor:

[math]\displaystyle{ {{v}_{n}}={{v}_{n-1}}+(1-RF){{x}_{n}} }[/math]

Type II Restoration Factor:

[math]\displaystyle{ {{v}_{n}}=(1-RF)({{v}_{n-1}}+{{x}_{n}}) }[/math]
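The two recurrences can be compared directly in code. The sketch below assumes a starting age of zero ([math]\displaystyle{ {{v}_{0}}=0 }[/math]). The function name `virtual_ages` and the times between failures are illustrative assumptions.

```python
def virtual_ages(times_between_failures, rf, model):
    """Age right after each repair under a type I or type II restoration factor."""
    ages, age = [], 0.0
    for x in times_between_failures:
        if model == 1:
            age = age + (1.0 - rf) * x      # only the most recent damage is reduced
        elif model == 2:
            age = (1.0 - rf) * (age + x)    # all accumulated damage is reduced
        else:
            raise ValueError("model must be 1 or 2")
        ages.append(age)
    return ages


# Two failures, after 6 and then 3 time units of operation, with RF = 0.5.
print(virtual_ages([6.0, 3.0], 0.5, 1))  # [3.0, 4.5]
print(virtual_ages([6.0, 3.0], 0.5, 2))  # [3.0, 3.0]
```

Both types give the same age after the first repair of a new component. They differ from the second repair onward, where type I keeps accumulating the unrepaired portion of earlier damage.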

Illustrating Type I RF Through an Example

Assume that you have a component with a Weibull failure distribution ( [math]\displaystyle{ \beta =1.5 }[/math] , [math]\displaystyle{ \eta =1000 }[/math] [math]\displaystyle{ hr }[/math] ), RF type I = 0.25 and the component undergoes instant repair. Furthermore, assume that the component starts life new (i.e. with a start age of zero). The simulation steps are as follows:

1. Generate a uniform random number, [math]\displaystyle{ {{U}_{R}}[0,1] }[/math] = 0.7021885.
2. The first failure event will then be at 500 hrs.
3. After instantaneous repair, the component will begin life with an age after repair of 375 hrs [math]\displaystyle{ (500\times (1-0.25)) }[/math] .
4. Generate another uniform random number, [math]\displaystyle{ {{U}_{R}}[0,1] }[/math] = 0.8824969.
5. The next failure event is now determined using the conditional reliability equation, or:

[math]\displaystyle{ \begin{align} R(T+t)= & R(T,t)\cdot R(T) \\ R(375+t)= & 0.8824969\cdot R(375) \\ R(375+t)= & 0.8824969\cdot 0.7948200 \\ R(375+t)= & 0.7014261 \\ 375+t= & 501.024 \\ t= & 126.024 \end{align} }[/math]

Thus, the next failure event will be at [math]\displaystyle{ 500+126.024=626.024 }[/math] hrs. Note that if the component had been as good as new (i.e. RF = 100%), then the next failure would have been at 750 hrs (500 + 250), where 250 is the time corresponding to a reliability of 0.8824969, which is the random number that was generated in Step 4.

6. At this failure point, the item's age will now be equal to the initial age, after the first corrective action, plus the additional time it operated, or [math]\displaystyle{ 375+126.024=501.024 }[/math] hrs.

7. Thus, the age after the second repair will be the sum of the previous age and (1 - RF) times the operating time of the component since the last failure, or [math]\displaystyle{ 375+(126.024\times (1-0.25))=469.518 }[/math] hrs.

8. Go to Step 4 and repeat the process.

Illustrating Type II RF Through an Example

Assume that you have a component with a Weibull failure distribution ( [math]\displaystyle{ \beta =1.5 }[/math] , [math]\displaystyle{ \eta =1000 }[/math] [math]\displaystyle{ hr }[/math] ), RF type II = 0.25 and the component undergoes instant repair. Furthermore, assume that the component starts life new (i.e. with a start age of zero). The simulation steps are as follows:

1. Generate a uniform random number, [math]\displaystyle{ {{U}_{R}}[0,1] }[/math] = 0.7021885.
2. The first failure event will then be at 500 hrs.
3. After instantaneous repair, the component will begin life with an age after repair of 375 hrs [math]\displaystyle{ (500\times (1-0.25)) }[/math] .
4. Generate another uniform random number, [math]\displaystyle{ {{U}_{R}}[0,1] }[/math] = 0.8824969.
5. The next failure event is now determined using the conditional reliability equation, or:

[math]\displaystyle{ \begin{align} R(t+T)= & R(t,T)\cdot R(T) \\ R(t+375)= & 0.8824969\cdot R(375) \\ R(t+375)= & 0.8824969\cdot 0.7948200 \\ R(t+375)= & 0.7014262 \\ t+375= & 501.024 \\ t= & 126.024 \end{align} }[/math]

Thus, the next failure event will be at [math]\displaystyle{ 500+126.024=626.024 }[/math] hrs. Note that if the component had been as good as new (i.e. RF = 100%), then the next failure would have been at 750 hrs (500 + 250), where 250 is the time corresponding to a reliability of 0.8824969, which is the random number that was generated in Step 4.

6. At this failure point, the item's age will now be equal to the initial age, after the first corrective action, plus the additional time it operated, or [math]\displaystyle{ 375+126.024=501.024 }[/math] hrs.

7. Thus, the age after the second repair will be [math]\displaystyle{ (1-RF) }[/math] times the age of the component at failure, or [math]\displaystyle{ (375+126.024)\times (1-0.25)=375.768 }[/math] hrs.

8. Go to Step 4 and repeat the process.
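The Type II steps differ from Type I only in the age update after each repair, which the following Python sketch makes explicit. As before, this is an illustration under stated assumptions, not BlockSim's code: the helper names are hypothetical and the random numbers are fixed to the values from the example.

```python
import math

BETA, ETA = 1.5, 1000.0   # Weibull shape and scale (hr), taken from the example
RF = 0.25                 # Type II restoration factor, taken from the example


def cumulative_hazard(t):
    """Weibull cumulative hazard H(t) = (t/eta)^beta, so R(t) = exp(-H(t))."""
    return (t / ETA) ** BETA


def time_to_next_failure(age, u):
    """Solve H(age + t) = H(age) - ln(u), i.e. R(age + t) = u * R(age)."""
    h = cumulative_hazard(age) - math.log(u)
    return ETA * h ** (1.0 / BETA) - age


def simulate_type2(uniforms, rf=RF):
    """Return the age after each repair under a Type II restoration factor."""
    age, ages = 0.0, []
    for u in uniforms:
        x = time_to_next_failure(age, u)   # operating time until this failure
        age = (1.0 - rf) * (age + x)       # Type II: the whole accumulated age is restored
        ages.append(age)
    return ages


ages_type2 = simulate_type2([0.7021885, 0.8824969])
print([round(a, 3) for a in ages_type2])
```

Because the whole age at failure is scaled by (1 − RF), the age after repair stays much lower here than under a Type I restoration with the same random numbers.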

Discussion of Type I and Type II RFs

As an application example, consider an automotive engine that fails after six years of operation. The engine is rebuilt. The rebuild has the effect of rejuvenating the engine to a condition as if it were three years old (i.e. a 50% RF). Assume that the rebuild affects all of the damage on the engine (i.e. a Type II restoration). The engine fails again after three years (when it again reaches an age of six) and another rebuild is required. This rebuild will also rejuvenate the engine by 50%, thus making it three years old again.

Now consider a similar engine subjected to a similar rebuild, but assume that the rebuild only affects the damage accumulated since the last repair (i.e. a Type I restoration of 50%). The first rebuild will rejuvenate the engine to a three-year-old condition. The engine will fail again after three years, but this time the rebuild will only affect the three years of age accumulated since the first rebuild. Thus the engine will have an age of four and a half years after the second rebuild ( [math]\displaystyle{ 3+3\times (1-0.5)=4.5 }[/math] ). After the second rebuild the engine will fail again after a period of one and a half years and a third rebuild will be required. The age of the engine after the third rebuild will be five years and three months ( [math]\displaystyle{ 4.5+1.5\times (1-0.5)=5.25 }[/math] ).
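The two engine scenarios can be checked with a short Python sketch. It follows the simplification used in the narrative, namely that the engine always fails when its age reaches six years; the function name `ages_after_rebuilds` and its parameters are hypothetical and chosen only for this illustration.

```python
def ages_after_rebuilds(rf, rf_type, fail_age=6.0, n_rebuilds=3):
    """Age of the engine (years) after each rebuild.

    Assumes, as in the narrative, that the engine fails deterministically
    whenever its age reaches fail_age.
    """
    age, ages = 0.0, []
    for _ in range(n_rebuilds):
        x = fail_age - age                 # operating time until the next failure
        if rf_type == 1:
            age = age + (1.0 - rf) * x     # Type I: restores only the last period
        else:
            age = (1.0 - rf) * (age + x)   # Type II: restores all accumulated age
        ages.append(age)
    return ages


type2_ages = ages_after_rebuilds(0.5, rf_type=2)
type1_ages = ages_after_rebuilds(0.5, rf_type=1)
print(type2_ages)  # [3.0, 3.0, 3.0]
print(type1_ages)  # [3.0, 4.5, 5.25]
```

Under Type II the engine returns to the same three-year-old condition after every rebuild, while under Type I each rebuild recovers less age and the intervals between failures shrink.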

It should be pointed out that when dealing with constant failure rates (i.e. with a distribution such as the exponential), the restoration factor has no effect.
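
A short derivation shows why. For an exponential distribution with constant failure rate [math]\displaystyle{ \lambda }[/math] (a symbol used here only for this illustration), [math]\displaystyle{ R(t)={{e}^{-\lambda t}} }[/math], so the conditional reliability of a component with age [math]\displaystyle{ T }[/math] after repair is:

[math]\displaystyle{ R(t,T)=\frac{R(t+T)}{R(T)}=\frac{{{e}^{-\lambda (t+T)}}}{{{e}^{-\lambda T}}}={{e}^{-\lambda t}} }[/math]

The result does not depend on [math]\displaystyle{ T }[/math], so whatever age the restoration factor assigns after a repair, the distribution of the time to the next failure is unchanged.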

Calculations to obtain RFs

The two types of restoration factors discussed in the previous sections can be calculated using the parametric RDA (Recurrent Data Analysis) tool in Weibull++ 7. This tool uses the GRP (General Renewal Process) model to analyze failure data of a repairable item. More information on the parametric RDA tool and the GRP model can be found in [25]. As an example, consider the times to failure for an air-conditioning unit of an aircraft recorded in the following table. Assume that each time the unit is repaired, the repair can only remove the damage incurred during the last period of operation. This assumption implies a type I RF, which is specified as an analysis setting in the Weibull++ folio. The type I RF for the air-conditioning unit can be calculated using the results from Weibull++ shown in the figure below.

Using the Parametric RDA tool in Weibull++ to calculate restoration factors.

The value of the action effectiveness factor [math]\displaystyle{ q }[/math] obtained from Weibull++ is:

[math]\displaystyle{ q=0.1344 }[/math]

The type I RF is calculated using [math]\displaystyle{ q }[/math] as:

[math]\displaystyle{ \begin{align} RF= & 1-q \\ = & 1-0.1344 \\ = & 0.8656 \end{align} }[/math]

The parameters of the Weibull distribution for the air-conditioning unit can also be calculated. [math]\displaystyle{ \beta }[/math] is obtained from Weibull++ as 1.1976. [math]\displaystyle{ \eta }[/math] can be calculated using the [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \lambda }[/math] values from Weibull++ as:

[math]\displaystyle{ \begin{align} \eta = & {{\left( \frac{1}{\lambda } \right)}^{\tfrac{1}{\beta }}} \\ = & {{\left( \frac{1}{0.0049} \right)}^{\tfrac{1}{1.1976}}} \\ = & 84.8582 \end{align} }[/math]

The values of the type I RF, [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \eta }[/math] calculated above can now be used to model the air-conditioning unit as a component in BlockSim.
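
The two conversions above are simple enough to check numerically. The Python sketch below uses the values of [math]\displaystyle{ q }[/math], [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \lambda }[/math] quoted in this section; the variable names are hypothetical, and the script only repeats the arithmetic without calling Weibull++ or BlockSim.

```python
# GRP results quoted in the text (as reported by Weibull++)
q = 0.1344        # action effectiveness factor
beta = 1.1976     # Weibull shape parameter
lam = 0.0049      # GRP lambda parameter

# Type I restoration factor: RF = 1 - q
rf_type1 = 1.0 - q

# Weibull scale parameter: eta = (1 / lambda)^(1 / beta)
eta = (1.0 / lam) ** (1.0 / beta)

print(f"RF  = {rf_type1:.4f}")
print(f"eta = {eta:.4f}")
```

Since [math]\displaystyle{ \lambda }[/math] is quoted to only two significant figures, the computed [math]\displaystyle{ \eta }[/math] carries the same rounding uncertainty.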