Event Log Data: Difference between revisions

From ReliaWiki
{{Stubs}}
{{Template:LDABOOK_SUB|Additional Reliability Analysis Tools|Event-Log Data Analysis}}
Event logs, or maintenance logs, store information about a piece of equipment's failures and repairs. They help companies achieve their productivity goals by giving insight into the failure modes, frequency of outages, repair duration, uptime/downtime and availability of the equipment. Event/maintenance logs typically include:
:* The date/time the system monitoring started and the date/time the monitoring ended.
:* The date/time when an event occurred and the date/time when the system was restored to operation.
 
 
The following assumptions are valid for cases in which the component operates through the failure of other components.
 
:*For ''n'' failures and repair actions that took place within the event logging period, the time-to-failure of each occurrence of an event is obtained by calculating the time between the previous repair and the new failure.
 
::<math>Time-to-failure_{i}=t_{i}-r_{i-1}\,\!</math>
 
::where:
:::<math>i=1,...,n\,\!</math>
:::<math>t_{i}\,\!</math> is the date/time of the <math>i\,\!</math>th occurrence of the event.
:::<math>r_{i-1}\,\!</math> is the date/time of restoration of the previous occurrence <math>(i-1)\,\!</math>.
 
 
:*For systems that were new when the collection of event log data started, the time to the first occurrence of each event is the date/time of that occurrence minus the date/time at which system monitoring started. That is:
 
::<math>Time-to-failure_{1}=t_{1}-SystemStartTime\,\!</math>
 
 
:*For systems that were not new when the collection of event log data started, the time to the first occurrence of each event is treated as a suspension (right censored), because the system is assumed to have accumulated additional, unrecorded hours before the data collection period started. In this case:
 
::<math>Suspension_{1}=t_{1}-SystemStartTime\,\!</math>
 
 
:*When monitoring of the system stops, or when the system is no longer in use, every event that has not occurred by that time is treated as a suspension, measured from the last restoration:
 
::<math>LastSuspension=SystemEndTime-r_{n}\,\!</math>
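
The four rules above can be sketched in Python for a single component. This is a minimal illustration, not the procedure Weibull++ uses: the function name <code>to_failure_data</code>, the (failure, restoration) tuple layout, the "F"/"S" state labels and the dates in the usage example are all assumptions made here. The sketch also assumes a component that runs around the clock and operates through the failures of other components.

```python
from datetime import datetime

def to_failure_data(events, start, end, was_new):
    """Convert one component's event log into (hours, state) records.

    events  -- chronologically sorted (t_i, r_i) datetime pairs (hypothetical layout)
    start   -- SystemStartTime; end -- SystemEndTime
    was_new -- True if the system was new when monitoring started
    state   -- "F" for a time-to-failure, "S" for a suspension
    """
    def hours(delta):
        return delta.total_seconds() / 3600.0

    records = []
    previous_restore = start  # plays the role of r_0 on the first pass
    for k, (t, r) in enumerate(events):
        # The first interval counts as a failure time only if the system was new.
        state = "F" if (k > 0 or was_new) else "S"
        records.append((hours(t - previous_restore), state))
        previous_restore = r
    # Time from the last restoration to the end of monitoring is a suspension.
    records.append((hours(end - previous_restore), "S"))
    return records


# Hypothetical log: two failures, system not new when monitoring started.
log = [
    (datetime(2020, 1, 3, 0), datetime(2020, 1, 3, 6)),
    (datetime(2020, 1, 10, 6), datetime(2020, 1, 10, 12)),
]
print(to_failure_data(log, datetime(2020, 1, 1), datetime(2020, 1, 15, 12), was_new=False))
# [(48.0, 'S'), (168.0, 'F'), (120.0, 'S')]
```

Passing <code>was_new=True</code> would turn the first record into a failure time, which is the only difference between the second and third rules.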
 
 
When the component does not operate through the failures of other components, the system downtime caused by those failures must be excluded from the component's accumulated time. In other words, the four equations above become:
 
::<math>Time-to-failure_{i}=t_{i}-r_{i-1}-(Downtime\,since\,r_{i-1})\,\!</math>
 
::<math>Time-to-failure_{1}=t_{1}-SystemStartTime-(Downtime\,since\,SystemStartTime)\,\!</math>
 
::<math>Suspension_{1}=t_{1}-SystemStartTime-(Downtime\,since\,SystemStartTime)\,\!</math>
 
::<math>LastSuspension = SystemEndTime-r_{n}-(Downtime\,since\,r_{n})\,\!</math>
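
The downtime term in these equations can be computed as the overlap between the interval of interest and the outages caused by other components. The sketch below is one possible way to do that; the helper names <code>overlap_hours</code> and <code>adjusted_hours</code>, the outage tuple layout and the example dates are hypothetical, and the outages are assumed not to overlap one another.

```python
from datetime import datetime

def overlap_hours(a, b, outages):
    """Hours of the interval [a, b] covered by (down, up) outage pairs."""
    total = 0.0
    for down, up in outages:
        lo, hi = max(a, down), min(b, up)  # clip the outage to [a, b]
        if hi > lo:
            total += (hi - lo).total_seconds() / 3600.0
    return total

def adjusted_hours(a, b, outages):
    """Elapsed hours from a to b minus downtime caused by other components."""
    return (b - a).total_seconds() / 3600.0 - overlap_hours(a, b, outages)


# Hypothetical data: a 48-hour interval with one outage fully inside it
# and one that extends past the end of the interval.
other_outages = [
    (datetime(2020, 1, 2, 0), datetime(2020, 1, 2, 5)),   # 5 h inside
    (datetime(2020, 1, 2, 22), datetime(2020, 1, 3, 4)),  # only 2 h inside
]
print(adjusted_hours(datetime(2020, 1, 1), datetime(2020, 1, 3), other_outages))
# 41.0
```

Here <code>a</code> would be the previous restoration time (or the system start time) and <code>b</code> the new failure time (or the system end time), depending on which of the four equations is being evaluated.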
 
 
Repair times are obtained by calculating the difference between the date/time of event occurrence and the date/time of restoration:
 
::<math>Time-to-repair_{i}=r_{i}-t_{i}\,\!</math>
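
As a small illustration, the repair times can be read directly off the same (failure, restoration) pairs. The function name <code>repair_hours</code> and the dates below are hypothetical, and the sketch ignores any non-operating periods that fall inside a repair.

```python
from datetime import datetime

def repair_hours(events):
    """Return r_i - t_i in hours for each (t_i, r_i) pair."""
    return [(r - t).total_seconds() / 3600.0 for t, r in events]


# Hypothetical log with two repairs.
log = [
    (datetime(2020, 1, 3, 0), datetime(2020, 1, 3, 6)),
    (datetime(2020, 1, 10, 6), datetime(2020, 1, 10, 13, 30)),
]
print(repair_hours(log))
# [6.0, 7.5]
```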
 
 
These equations should also take into consideration the periods when the system is not operating or not in use, as in the case of operations that do not run on a 24/7 basis. '''Weibull++''' automatically takes this into account when converting the event logs into failure/repair data. The failure/repair data of every component in the event log can then be used to derive failure distributions and repair distributions using life data analysis methods.
 
 
==Example==
Consider a very simple system composed of only two components, A and B. The system runs from 8 AM to 5 PM, Monday through Friday. When a failure is observed, the system undergoes repair and the failed component is replaced. The date and time of each failure are recorded in an equipment downtime log, along with an indication of the component that caused the failure. The date and time when the system was restored are also recorded. The downtime log for this simple system is given next.
 
Note that:
:* The date and time of each failure are recorded.
:* The date and time of repair completion for each failure are recorded.
:* The repair involves replacement of the responsible component.
:* The responsible component for each failure is recorded.
 
For this example, we will assume that an engineer began recording these events on January 1, 1997 at 12 PM and stopped recording on March 18, 1997 at 1 PM, at which time the analysis was performed. Information for events prior to January 1 is unknown.
 
The objective of the analysis is to obtain the failure and repair distributions for each component. To do this, the times-to-failure and the times-to-repair for each component need to be computed from the data in the table. Once the times-to-failure data and times-to-repair data have been obtained, a life distribution will be fitted to each data set. The principles and theory for fitting a life distribution are presented in detail in [[Life Distributions]].
 
 
'''Solution'''
 
 
Obtaining Failure and Repair Times for Component A<hr>
We begin the analysis by looking at component A. The first time that component A is known to have failed is recorded in row 1 of the data sheet; thus, the first age (or time-to-failure) for A is the difference between the time we began recording the data and the time when this failure event happened. The component does not age while the system is down due to the failure of another component, so any such downtime must be excluded from this difference.
 
 
'''1. The First Time-To-Failure for Component A, TTFA[1]'''
 
The first time-to-failure of component A, TTFA[1], is the sum of the hours of operation for each day, starting on the start date (and time) and ending with the failure date (and time). This is shown graphically next. The operating periods are indicated with a green background.
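
The day-by-day summation described above can be sketched as follows. The function name <code>operating_hours</code> and its parameters are assumptions made for illustration; the defaults encode the 8 AM to 5 PM, Monday through Friday schedule of this example, and holidays are ignored. Because the failure time from row 1 of the log is not reproduced here, the usage example applies the function to the full recording window instead.

```python
from datetime import datetime, time, timedelta

def operating_hours(a, b, opens=time(8), closes=time(17), workdays=range(0, 5)):
    """Sum the scheduled operating hours between datetimes a and b.

    workdays uses datetime.weekday() numbering (Monday is 0).
    """
    total = 0.0
    day = a.date()
    while day <= b.date():
        if day.weekday() in workdays:
            # Clip this day's operating window to the interval [a, b].
            lo = max(a, datetime.combine(day, opens))
            hi = min(b, datetime.combine(day, closes))
            if hi > lo:
                total += (hi - lo).total_seconds() / 3600.0
        day += timedelta(days=1)
    return total


# Recording window from the example: January 1, 1997 at 12 PM
# to March 18, 1997 at 1 PM.
print(operating_hours(datetime(1997, 1, 1, 12), datetime(1997, 3, 18, 13)))
# 487.0
```

Passing the failure date and time of the first event for component A as the second argument would give TTFA[1] under the same assumptions, provided no other component caused downtime in that interval.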

Revision as of 21:51, 23 August 2012
