
Tools for Failure Data Analysis


Dr. Albert H.C. Tsang

Phone: (852) 27666591

Fax: (852) 23625267

email: albert.tsang@polyu.edu.hk

Importance of Maintenance
What percentage of your organization's total operating budget is maintenance related?
- 15 to 40% of total manufacturing costs are maintenance related
- 40% of man-hours expended in providing railway services are consumed by maintenance work (MTR Corporation)


Maintenance as a Business Process


- Maintenance is the last remaining business process that has not been optimized
- Handled well, it pays direct dividends; handled wrongly, it incurs costs
- Effective maintenance can bring substantial savings, both direct & indirect, maximizing efficiency & productivity, and improving the bottom line

Types of Losses
- Stops
  - Planned: breaks, maintenance / cleaning, training
  - Unplanned: breakdowns / repairs, production changeover, tool change / adjustments
- Performance losses: running at reduced speeds, minor stoppages, idling, start-up / shut-down
- Quality losses: errors / defects, scrap, rework
What remains after these losses is the effective utilization of the equipment.


Overall Equipment Effectiveness (OEE)


The OEE is decreased by all non-value-adding events:
- Planned: breaks, maintenance, cleaning; changeover, adjustment, tool change
- Unplanned: breakdowns, repairs; running at reduced speeds; minor stoppages; idling (e.g. due to missing material / personnel); start-up and shut-down losses
- Quality: scrap, defects, rework

Time waterfall (example):
- Total Time: 480 min
- Availability Losses: -80 min, leaving Available Time: 400 min
- Performance Losses: -70 min, leaving Productive Time: 330 min
- Quality Losses: -20 min, leaving Effective Time: 310 min

OEE Calculation
Availability Rate (AR) = Available Time / Total Time = (Total Time - Availability Losses) / Total Time = (480 min - 80 min) / 480 min = 0.833

Performance Rate (PR) = Productive Time / Available Time = (Available Time - Performance Losses) / Available Time = (400 min - 70 min) / 400 min = 0.825

Quality Rate (QR) = Effective Time / Productive Time = (Productive Time - Quality Losses) / Productive Time = (330 min - 20 min) / 330 min = 0.939

OEE (in %) = AR x PR x QR = 0.833 x 0.825 x 0.939 = 0.646 = 64.6%
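For reference, the same arithmetic can be scripted. The following is a minimal Python sketch using the example figures above; the function and variable names are ours, not part of any standard:

```python
# Minimal sketch of the OEE calculation shown above (illustrative example values).
def oee(total_time, availability_losses, performance_losses, quality_losses):
    available_time = total_time - availability_losses           # 480 - 80 = 400 min
    productive_time = available_time - performance_losses       # 400 - 70 = 330 min
    effective_time = productive_time - quality_losses           # 330 - 20 = 310 min

    availability_rate = available_time / total_time             # 0.833
    performance_rate = productive_time / available_time         # 0.825
    quality_rate = effective_time / productive_time             # 0.939
    return availability_rate * performance_rate * quality_rate  # 0.646

print(f"OEE = {oee(480, 80, 70, 20):.1%}")   # OEE = 64.6%
```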


Prioritization of the Losses


Use a Pareto diagram to visually prioritize the individual types of losses.

[Chart: Losses Pareto Analysis - bars for break-downs, changeovers, adjustments, and minor stops, ranked by their contribution to lost OEE / effective utilization]
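A loss Pareto of this kind is straightforward to produce from downtime records. The sketch below (Python with matplotlib) uses invented downtime figures purely for illustration:

```python
# A minimal sketch of a Pareto chart of losses (illustrative downtime figures, in minutes).
import matplotlib.pyplot as plt

losses = {"Break-downs": 80, "Changeovers": 45, "Adjustments": 25, "Minor stops": 20}
items = sorted(losses.items(), key=lambda kv: kv[1], reverse=True)
labels, values = zip(*items)
cum_pct = [100 * sum(values[: i + 1]) / sum(values) for i in range(len(values))]

fig, ax = plt.subplots()
ax.bar(labels, values)                 # individual loss contributions, largest first
ax2 = ax.twinx()
ax2.plot(labels, cum_pct, marker="o")  # cumulative percentage line
ax2.set_ylim(0, 110)
ax.set_ylabel("Downtime (min)")
ax2.set_ylabel("Cumulative %")
plt.show()
```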

Determining Maintenance Priorities

How do you prioritize failure codes?


Optimizing Maintenance Decisions


Optimizing Equipment Maintenance and Replacement Decisions - four key decision areas:
- Component replacement
- Inspection procedures
- Capital equipment replacement
- Resource requirements
Quantitative decision models draw on the maintenance DATABASE (CMMS/EAM/ERP system).

Quality of Life Data


Component Replacement
Failure Replacement
The unscheduled actions taken, as a result of failure, to restore a system to a specified level of performance

Preventive Replacement
The scheduled actions taken, not as a result of failure, to retain a system at a specified level of performance by such functions as scheduled replacement of critical items and overhauls

Component Replacement

Is preventive replacement a panacea for optimization of maintenance?


Component Replacement
Use Preventive Replacement only if:
- the risk of failure increases with age or usage, i.e., a wear-out effect is occurring
- the total cost of a failure replacement is greater than the total cost of a preventive replacement


Optimizing Preventive Replacement Decisions

We want:


Fact-based arguments (data-driven decisions)

NOT

Intuition-based pronouncements (strength of personalities, number of mechanics' complaints)


Preventive Replacement Cost Conflicts


[Chart: cost per week versus preventive replacement age - the total cost per week is the sum of the failure replacement cost per week and the preventive replacement cost per week; the minimum of the total cost curve marks the optimal preventive replacement age]
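The trade-off sketched in this chart can be made concrete with a standard age-based replacement cost model. The Python sketch below is illustrative only: it assumes Weibull-distributed failure times and made-up cost figures, and simply searches for the replacement age that minimises expected cost per week.

```python
# A minimal sketch of an age-based replacement cost model behind this chart.
# Parameter values are illustrative; Cp/Cf are preventive/failure replacement costs.
import numpy as np

beta, eta = 2.5, 100.0   # Weibull shape and scale (weeks); wear-out since beta > 1
Cp, Cf = 200.0, 1200.0   # cost of a preventive vs. a failure replacement

def cost_per_week(tp, n=2000):
    t = np.linspace(0.0, tp, n)
    R = np.exp(-(t / eta) ** beta)                   # survival function R(t)
    dt = t[1] - t[0]
    expected_cycle = np.sum((R[:-1] + R[1:]) * dt / 2.0)   # expected cycle length (trapezoid rule)
    expected_cost = Cp * R[-1] + Cf * (1 - R[-1])          # cost per replacement cycle
    return expected_cost / expected_cycle

ages = np.arange(10, 201, 5)
costs = [cost_per_week(tp) for tp in ages]
best = ages[int(np.argmin(costs))]
print(f"optimal preventive replacement age is about {best} weeks")
```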


Optimizing Preventive Replacement Decisions


- Maintenance data are stored in the databases of CMMS/EAM/ERP systems
- A typical scenario: Data Rich, Information Poor
- The risk of failure can be determined from analysis of the item's failure data - Weibull Analysis


Software & Web Sites


Software
- ReliaSoft: www.reliasoft.com
- MORE tools: http://www.crcpress.com/product/isbn/9780849339660

Maintenance Web Sites
- http://www.pem-mag.com
- http://www.plant-maintenance.com
- http://reliabilityweb.com


References
Tsang, Albert H.C., et al. (2000) RCM: A Key to Maintenance Excellence, CityU Press
Jardine, A.K.S. and Tsang, Albert H.C. (2006) Maintenance, Replacement, and Reliability: Theory and Applications, Taylor & Francis / CRC Press


The Instructor
Albert H.C. Tsang is Principal Lecturer in the Department of Industrial & Systems Engineering at The Hong Kong Polytechnic University. He has a PhD from the University of Toronto. Dr. Tsang is a chartered engineer in the United Kingdom with working experience in the manufacturing industry covering functions such as industrial engineering, quality assurance, and project management. He is a founding member, fellow, and past Chairman of the Hong Kong Society for Quality (HKSQ). Dr. Tsang has provided consultancy and advisory services to enterprises and industry support organizations in the manufacturing, logistics, public utilities, health care, and government sectors on matters related to quality, reliability, maintenance, performance management, and assessment of performance excellence. Dr. Tsang is the author of WeibullSoft, a computer-aided self-learning package on Weibull analysis, and a co-author of two books: Reliability-Centred Maintenance: A Key to Maintenance Excellence, and Maintenance, Replacement, and Reliability: Theory and Applications.


Tools for Failure Data Analysis


Albert H.C. Tsang
The Hong Kong Polytechnic University albert.tsang@polyu.edu.hk

Abstract

Using knowledge acquired from fact-based analysis to inform decisions is the hallmark of performance excellence in maintenance management. This paper presents a set of data mining tools applicable to failure data analysis that will support maintenance and asset replacement optimization, and prioritization of failure codes. These tools (Weibull analysis, the Laplace trend test, the logarithmic scatter plot of MTTR versus count of failures, and jack-knife diagrams) will enable maintenance, plant, works or planning managers and engineers to:
(1) characterize the risk of failure of specific assets
(2) choose between preventive replacement and run-to-failure
(3) detect trends of reliability degradation or improvement
(4) classify failure codes into reliability, maintainability and availability problems
(5) determine the criticality of failure codes in the context of the business cycle
(6) visualize maintenance performance trends associated with specific failure codes

Keywords: Maintenance optimization, Weibull analysis, Laplace trend test, plot of MTTR versus failure counts, Jack-knife diagrams

Introduction

A survey conducted by Plant Engineering & Maintenance Magazine (Robertson & Jones, 2004) indicated that maintenance budgets ranged from 2 to 90% of the total plant operating budget, with the average being 20.8%. It can be reasoned that operation and maintenance (O&M) represent a major cost item in equipment-intensive industrial operations. These operations can achieve significant savings in O&M costs by making the right and opportune maintenance decisions.

Maintenance is often the last business process that has not been optimised. Instead of treating maintenance as a liability of business operations, achieving excellence in maintenance will pay huge dividends through reduced waste and maximised efficiency and productivity, thereby improving the bottom line. Maintenance excellence is concerned with balancing performance, risks and resource inputs to achieve an optimal solution. This is not an easy task because much of what happens in an industrial environment is characterised by uncertainties.

Traditionally, maintenance practitioners in industry are expected to cope with maintenance problems without seeking to operate in an optimal manner. For example, many preventive maintenance schemes are put into operation with only a slight, if any, quantitative basis. As a consequence, no one is very sure just what the best interval between preventive replacements is and, as a result, these schemes are cancelled because it is said they cost too much. Clearly, some form of balance is required between the preventive replacement interval and the returns from it (for example, maintenance costs are reduced because items are replaced before they fail in service, avoiding costly repairs).

Asset managers who wish to optimise the life-cycle value of the organisation's human and physical assets must consider four key decision areas: (1) component replacement, (2) inspection activities, including condition monitoring, (3) replacement of capital equipment, and (4) resource requirements (Jardine and Tsang, 2005).

Consider the conflicts involved in decision area (1), i.e., component replacement. If a decision is made to perform repairs only, and not to do any preventive maintenance such as overhauls, it may well reduce the budget required by the asset management department, but it may also cause considerable production or operation downtime. Analytical models can help the maintenance manager take account of these interactions in order to achieve optimal solutions, thereby reducing the tension that often occurs between maintenance and operations.
[Figure: cost per unit time versus the maintenance policy (say, frequency of overhauls) - the total cost of maintenance and time lost is the sum of the cost of the maintenance policy and the cost of time lost due to breakdowns; the minimum of the total cost curve identifies the optimal policy]

Figure 1: Optimal frequency of overhauls

Figure 1 illustrates the type of approach taken by using a mathematical model to determine the optimal frequency of overhauling a piece of plant, balancing the input of the maintenance policy (maintenance cost) against its output (reduction in downtime).

Modelling the risk of failure is a crucial step in optimising the replacement of components that are subject to failure in preventive maintenance or condition-based maintenance schemes. Weibull analysis, a powerful tool for modelling such risks, as well as the techniques and tools available to address complications that may arise in such analysis, are presented in the next section.

Outage incidents and repair times are typically classified according to failure modes, which are known as failure codes in many organizations. Pareto charts are commonly used to determine maintenance priorities by ranking equipment failure codes according to their relative downtime contribution. However, these charts fail to highlight the dominant variables that contribute to equipment downtime: frequency of failure, and mean time to repair (MTTR). The latter part of this paper presents a powerful charting tool that addresses this limitation of Pareto analyses. It allows us to classify failures into acute and chronic problems, and to identify problems that affect system reliability, availability, and maintainability. This type of knowledge is critical in adjusting the focus of maintenance efforts to take account of changes in business priorities. The charting technique can also be extended to visualize maintenance performance trends associated with specific failure codes.

Weibull Analysis

Maintenance decision analyses that consider risks of failure involve the use of the equipment's failure time distribution, which may not be known. However, observations of failure times may well be available from maintenance records. We might wish to find a Weibull distribution that fits these failure observations. The Weibull distribution is named after Waloddi Weibull (1887-1979), who found that, in general, distributions of data on an item's time-to-failure can be modelled by a function of the following form:

f(t) = \frac{\beta}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{\beta-1}\exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{\beta}\right],  t > \gamma

f(t) = 0,  t \le \gamma

The three parameters of a Weibull distribution are: β (shape parameter), γ (location parameter), and η (scale parameter). The parameters have non-negative values. When an item's failure time distribution can be modelled by a Weibull distribution, the value of β determines the item's risk profile. When β is less than, equal to, or greater than 1, the risk of failure will decrease, remain constant, or increase with age (usage), respectively. The motivation for replacing an item preventively is to reduce the risk of in-service failure so as to optimize total preventive and failure replacement costs. Obviously, there is no justification for preventive replacement when β is not greater than 1. In such cases, run-to-failure should be the optimal replacement policy.

The Weibull distribution that models a given set of failure data can be determined graphically using Weibull analysis. The procedure, which involves plotting the cumulative percentage failure F(t) against the observed failure time t on a special probability paper known as Weibull paper, is given in Jardine and Tsang (2005: 239-243). Figure 2 shows the Weibull plot of a set of lamp failure data. A straight line can be fitted through the plot, indicating that a Weibull distribution with γ = 0 can be used as the model of the data set. We can then proceed to estimate the other parameters of the distribution from the plot. From the estimation point in the top left-hand corner of the Weibull paper, we draw a line perpendicular to the fitted line. The intersection between the perpendicular line and the scale beneath the estimation point gives the estimated value of β, which is 1.2 in this case. Thus, the lamp passes an essential hurdle to justify the adoption of a preventive replacement policy. The value of t at which the fitted line cuts F(t) = 63.2% (the estimation line on Weibull paper) is an estimate of η.

While a Weibull distribution is completely defined by the values of its β, γ and η parameters, we may also wish to determine its mean value μ. It can also be determined from the Weibull plot, from the intersection between the perpendicular line and the Pμ scale beneath the estimation point of the Weibull paper. In the example given in Figure 2, Pμ = 60%. Thus, the distribution mean μ is 40 hours, the age at which the cumulative probability of failure is 60%.
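The same kind of parameter estimation can also be done numerically. The following is a minimal Python sketch using median-rank regression, assuming a two-parameter model (γ = 0); the failure times are invented for illustration and are not the lamp data of Figure 2:

```python
# A minimal numerical counterpart to the graphical procedure above: Weibull analysis by
# median-rank regression, assuming a two-parameter model (location gamma = 0). The
# failure times below are illustrative only.
import math
import numpy as np

def fit_weibull(failure_times):
    t = np.sort(np.asarray(failure_times, dtype=float))
    n = len(t)
    ranks = np.arange(1, n + 1)
    F = (ranks - 0.3) / (n + 0.4)      # Bernard's approximation to the median ranks
    x = np.log(t)
    y = np.log(-np.log(1.0 - F))       # linearised Weibull CDF: y = beta*x - beta*ln(eta)
    beta, intercept = np.polyfit(x, y, 1)
    eta = np.exp(-intercept / beta)
    return beta, eta

beta, eta = fit_weibull([31, 38, 44, 50, 57, 63, 70, 79, 90, 105])
mu = eta * math.gamma(1.0 + 1.0 / beta)   # mean life of the fitted distribution
print(f"beta = {beta:.2f}, eta = {eta:.1f} hours, mean = {mu:.1f} hours")
# beta > 1 would indicate wear-out, the essential condition for preventive replacement.
```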

[Figure 2: Weibull plot of lamp failure data (item: lamps; age in hours) - the fitted line gives β = 1.2, η = 43 hours, and Pμ = 60%, i.e., a mean life μ of 40 hours]

Weibull analysis on failure data may get complicated in practice. Commonly encountered complications include: the Weibull plot is non-linear; not every item observed has been run to failure, i.e., a data set with censored (incomplete) failure data; too many records need to be analysed; and analysis of data generated from multiple failure modes. Furthermore, additional information may need to be mined from the Weibull plot, such as determining the confidence interval on estimates of reliability at a specified age, and assessing the goodness of fit between the hypothesized Weibull model and the data set. A discussion of the techniques and tools to deal with these situations and demands would considerably increase the size of this paper. Interested readers are referred to Appendix 2 of Jardine and Tsang (2005) for such information.

Identifying Trends of Failure Data

A Weibull analysis involves fitting a probability distribution to a set of failure data. It is assumed that the process generating the failure times is statistically stable. In reality, this condition may not apply, as in the case of failure times observed from maintenance records of repairable systems. For example, design modifications and improvements made on the equipment in successive life cycles may have the effect of progressively reducing the frequency of failure. In another scenario, imperfect repair or increasing severity of usage in successive life cycles may produce a trend of increasing frequency of failure. Conducting a Weibull analysis on time-between-failure data in these cases is inappropriate because the failure distribution varies from one life cycle to another.

The Laplace trend test can be used to detect the existence of trends in a data set of successive event times. Let t_i denote the running time of a repairable item at its i-th failure, where i = 1, ..., n; let N(t_n) be the total number of failures observed up to time t_n, and let the observation terminate at time T while the item is in the operational state. In other words, the failure times are obtained from a time-terminated test. Figure 3 shows the notation used.
[Figure 3: Time-terminated failure times - running times t_1, t_2, ..., t_{n-1}, t_n, with the observation ending at T]

Using the Laplace trend test to determine whether the failure-generating process is stable, the test statistic for time-terminated data is:

u = \sqrt{12\,N(t_n)}\left[\frac{\sum_{i=1}^{n} t_i}{N(t_n)\,T} - 0.5\right]    (1)

If the process is stable, u will be approximately normally distributed with mean 0 and standard deviation 1. When u is significantly small (negative), we infer the existence of reliability growth. When u is significantly large (positive), we infer the existence of reliability deterioration. If we are satisfied that the failure-generating process is stable, Weibull analysis can be performed on the inter-failure times (t_i - t_{i-1}), where i = 1 to n. In the case where the observation terminates at a failure event, say t_n, we have a set of failure-terminated data. The test statistic for failure-terminated data is:
u = \sqrt{12\,N(t_{n-1})}\left[\frac{\sum_{i=1}^{n-1} t_i}{N(t_{n-1})\,t_n} - 0.5\right]    (2)

Worked examples illustrating the use of the Laplace trend test can be found in Jardine and Tsang (2005: 268-270).
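As a minimal illustration, equations (1) and (2) translate directly into code; the Python sketch below uses invented running failure times:

```python
# A minimal sketch of the Laplace trend test statistics in equations (1) and (2).
# The running failure times below are illustrative only.
import math

def laplace_u_time_terminated(times, T):
    """Equation (1): times are running failure times t_1..t_n; observation ends at T."""
    n = len(times)
    return math.sqrt(12 * n) * (sum(times) / (n * T) - 0.5)

def laplace_u_failure_terminated(times):
    """Equation (2): the observation ends at the last failure t_n."""
    t_n = times[-1]
    m = len(times) - 1                      # N(t_{n-1}): failures before the last one
    return math.sqrt(12 * m) * (sum(times[:-1]) / (m * t_n) - 0.5)

t = [55, 95, 130, 200, 230, 280, 310, 330, 350, 365]
print(round(laplace_u_time_terminated(t, T=400), 2))
# Roughly, |u| < 1.96 (5% level) suggests no significant trend, so Weibull analysis of the
# inter-failure times is reasonable; a large positive u suggests reliability deterioration,
# a large negative u suggests reliability growth.
```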


Visualization of Maintenance Priorities

Pareto charts are the classical tool used by maintenance analysts to prioritize failure codes. Depending on the focus of attention, these Pareto charts may help to visualize the ranking of failure codes according to cumulative maintenance downtime (or costs), failure frequency, or MTTR. The priorities generated from these Pareto charts are often dissimilar to each other. Given the multiple listings, which failure codes should be given top priority for improvement effort to maximize impact on business performance?

It is also noted that maintenance downtime, failure frequency, and MTTR are inter-related: maintenance downtime results from the combined effects of failure frequency and MTTR. Thus, a Pareto chart of maintenance downtime that prioritizes availability problems is unable to identify failure modes that cause brief yet frequent operational disturbances (reliability problems), or those with long downtime in each instance (maintainability problems). A tool that can simultaneously identify availability, reliability, and maintainability problems will be very useful to maintenance analysts. Knights (2004) presents a visualization tool that serves this purpose. It is a scatter plot of failure codes on log-log graph paper. The X and Y axes of the plot indicate the number of failures recorded and the MTTR, respectively. Curves of constant downtime appear as lines with a slope of -1 on the graph. Figure 4 is an example of such a plot, with each dot representing a failure code.

Figure 4: Log scatter plot of MTTR versus number of failures Source: Knights (2004)

Failure codes that result in lengthy repairs on each occasion are classified as acute problems, and those that recur frequently are considered chronic problems. Thus, we can divide the scatter plot into four quadrants, as shown in Figure 5. The threshold limits that define the boundaries of these quadrants can be set by company policy, or they can be relative values, as suggested by Knights (2004).

The determination of maintenance priorities is influenced by business imperatives. When facilities have to operate at capacity to meet surplus demand for output, or when each unit of output generates a high return, the opportunity cost of lost production will far exceed the direct cost of repair and maintenance. In such situations, enhancing equipment availability and reliability should have higher priority than improving maintainability, and the Jack-knife diagram shown in Figure 6 helps to identify these higher priority problems.
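A log scatter plot of this kind is easy to generate from failure-code records. The Python sketch below (matplotlib) uses invented failure-code data and assumed threshold limits purely for illustration; in practice the limits would come from company policy or the relative values suggested by Knights (2004).

```python
# A minimal sketch of the log-log scatter plot of MTTR versus number of failures,
# with acute/chronic threshold lines. Failure-code data and limits are illustrative.
import matplotlib.pyplot as plt

codes = {               # failure code: (number of failures, MTTR in hours)
    "FC01": (45, 0.8),
    "FC02": (6, 9.0),
    "FC03": (30, 3.5),
    "FC04": (12, 1.2),
}
limit_n, limit_mttr = 20, 4.0   # assumed chronic (frequency) and acute (MTTR) limits

fig, ax = plt.subplots()
for code, (n, mttr) in codes.items():
    ax.scatter(n, mttr)
    ax.annotate(code, (n, mttr))
ax.axvline(limit_n, linestyle="--")     # chronic threshold (failure frequency)
ax.axhline(limit_mttr, linestyle="--")  # acute threshold (MTTR)
ax.set_xscale("log")
ax.set_yscale("log")                    # constant-downtime curves appear as slope -1 lines
ax.set_xlabel("Number of failures")
ax.set_ylabel("MTTR (hours)")
plt.show()
```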

Figure 5: Log scatter plot with limit values Source: Knights (2004)

Figure 6: Jack-knife diagram for facilities operating at peak capacity Source: Knights (2004)


In another scenario, when facilities have lots of idle capacity, or when the return from output is low, controlling and reducing the direct cost of repair and maintenance will become the primary focus of attention. Under such circumstances, dealing with equipment availability and maintainability problems should be given higher priority than resolving reliability problems, and the Jack-knife diagram shown in Figure 7 helps to identify these higher priority problems.

Figure 7: Jack-knife diagram for facilities with lots of idle capacity Source: Knights (2004)

Maintenance performance trends associated with specific failure codes can also be visualized on the log scatter plot. Figure 8 shows the trends of four failure codes over a period of 3 years.

Figure 8: Trends in unplanned failures for an item of equipment Source: Knights (2004)



Concluding Remarks

This paper presents two powerful tools for failure data analysis that support maintenance and replacement optimization, and prioritization of maintenance problems. The data used in such analyses are typically maintained in the database of the computerised maintenance management system (CMMS), enterprise asset management system (EAM), or enterprise resource planning system (ERP). However, it should be noted that these data are often not well organized in structure and imperfect (dirty and incomplete) in content. Preprocessing such data for decision support becomes a challenge. Tsang et al. (2006) discuss issues of data management for condition-based maintenance (CBM) optimization. Some of the common data quality problems identified in that publication are also relevant for effective application of the tools presented in this paper.

References

Jardine, A.K.S. and Tsang, A.H.C. (2005) Maintenance, Replacement, and Reliability: Theory & Applications, CRC Press
Knights, P.F. (2004) Downtime Priorities, Jack-knife Diagrams, and the Business Cycle, Maintenance Journal, 17(2), Melbourne, Australia
Robertson, R. and Jones, A. (2004) Pay Day, Plant Engineering & Maintenance, 28(9), pages 18-25
Tsang, A.H.C., Yeung, W.K., Jardine, A.K.S. and Leung, Bartholomew P.K. (2006) Data Management for CBM Optimization, Journal of Quality in Maintenance Engineering, 12(1), 37-51

