
Concept of

Rev. 1 : August 07, 2007

ALL RIGHTS RESERVED 2007

OBJECTIVE OF THIS COURSE :


- Provide a basic understanding of equipment failure
- Understand the different maintenance strategies at hand
- Learn when to use the different maintenance matrix and indices
- Understand what Root Cause Failure Analysis is and how far we should analyze the problem
- Share the lessons on Reliability and Maintenance

RELIABILITY MAINTENANCE

RELIABILITY MAINTENANCE COURSE MODULES :


Module 1 : Understanding Equipment Failure
- The truth about equipment failures
- What can maintenance do after all ?
- Understanding the patterns of failure
- Why is Preventive Maintenance limited ?

Module 2 : Understanding The Different Maintenance Strategies


- Reactive Maintenance
- Preventive Maintenance
- Predictive Maintenance
- Proactive Maintenance

Module 3 : Maintenance Matrix, Indices and KPIs


- Understanding MTBF and MTTF
- Understanding MTTR

Module 4 : Root Cause Failure Analysis


- 3 Levels of Root Cause Failure Analysis
- Sample Case Study : Pump Failure

Module 5 : Lessons On Reliability

When we set out to maintain something


- What is the existing state that we wish to preserve ?
- What is it that we wish to cause to continue ?

Hence, when we maintain an asset


- Someone wants it to do something
- They expect it to fulfil a specific function
- Because it is what the users want it to do

By Definition
Maintenance is ensuring that physical assets continue to do what the users want them to do.

RELIABILITY DEFINED
FAILURE simply means the inability of an equipment to perform its required function. The failure of a component is viewed as terminating its life. RELIABILITY, on the other hand, is the probability that no failure will occur throughout a prescribed operating period.

BAZOVSKY states that . . . .


The modern concept of reliability, in popular language, is simply the capability of an equipment not to break down in operation. When an equipment works well and does the job for which it was designed, such equipment is said to be reliable

MODULE 1

Understanding Equipment Failures

INTRODUCTION :
DOMINO EFFECT OF BEING REACTIVE

[Cycle diagram. Start here : a belief that all parts will wear. Stages shown in the loop :]
- Pressure on maintenance to keep the machine OK
- Get the line going with temporary repairs
- More repeat work, working long hours
- Resources are taken down by breakdowns
- Backlog grows and PM is missed
- Operations, coping with the backlog, won't give up the machine for PM
- More failures occur
- Spares and maintenance budget grow
- Morale declines and standards drop

SIMPLE FLOWCHART FOR REACTIVE ENVIRONMENT

[Humorous flowchart with YES / NO branches.
Decision boxes : "Is it working ?", "Did you mess with it ?", "Does anyone else know ?", "Will it blow up in your hands ?", "Can you blame someone else ?"
Outcome boxes : "Don't mess with it !", "You better watch out !!!", "You better not cry !!!", "Hide it now, quick", "Look the other way", "No problem !"]

LOOKING AT THE REAL MEANING OF FAILURE

Is failure GOOD or BAD ?


Failure is not bad after all

Don't we learn from failure ?


Failure is our greatest teacher

Can we succeed without failing ?


Failure is our key to success

Is failure telling us something ?


Yes, there are lessons to be learned from it

Is it ok to fail?
I really don't see anything wrong with failures if we accept them positively

What they say about failures :

Thomas Alva Edison (1847 - 1931)
"Don't call it a mistake, call it an education . . ."
- Only 3 months of schooling
- First incandescent electric bulb lit on Oct. 21st, 1879, for 40 hrs
- When he died he held over 1368 separate US & foreign patents

Henry Ford (1863 - 1947)
"Failure is only the opportunity to begin again more intelligently"
- Had only limited schooling
- He produced an affordable car, paid high wages, and helped create a middle class

Soichiro Honda (1906 - 1991)
"What people see of my success is only 1 percent, but what they don't see is the 99% which are my failures"
- Today, Honda Corporation employs over 100,000 people in the USA and Japan, and is one of the world's largest automobile companies.

THE TRUTH ABOUT MACHINERY AND EQUIPMENT

CAN WE REALLY ELIMINATE FAILURES ?
An equipment is composed of the following :
- Electronic parts (100,000 pcs)
- Electrical parts (30,000 pcs)
- Mechanical parts (5,000 pcs)

Two important questions for maintenance to raise are :

1) What exact part will fail ?
2) When will that part fail ?

THE TRUTH ABOUT MACHINERY AND EQUIPMENT

- But we have around 100 similar machines & 10 types of equipment
- Each equipment has more than 130,000 components in it
- We only have 5 maintenance craftspeople per shift for all our equipment
- How do we know which parts will fail, on which machine, and when ?
- Can we accept the fact that failures are really meant to happen after all ?

THE TRUTH ABOUT MACHINERY AND EQUIPMENT

TO ADDRESS THIS ISSUE :


- Many people are deployed to perform full-time repair work
- We have some form of Preventive Maintenance that schedules this equipment for some form of maintenance work
- Maintenance will only focus on failed parts that stop the equipment from running & will likely ignore failures of secondary functions
- Inspections are added from time to time, increasing the amount of work for maintenance
- Maintenance is measured by how fast it performs repairs

"I guess that's the way it is, boss !"

Despite these very noble efforts, machines still fail, RIGHT ?

THE TRUTH ABOUT MACHINERY AND EQUIPMENT


We use machinery and equipment to perform a particular function. If it cannot provide that function, we say that our equipment has failed or that a breakdown has occurred

FACT 1
Equipment does not fail; some parts of the equipment fail. Once we have identified the failed part and replaced it, the machine will be running again.

FACT 2
Although we might be using some statistics & history records as a baseline, the fact still remains : we do not know exactly which parts are going to fail or precisely when they will fail. But we certainly know that one day our car will stop running, our computer will stop working and our equipment will stop working due to a failure or breakdown . . . . .

THE TRUTH ABOUT MACHINERY AND EQUIPMENT


Therefore, the aim of maintenance is to control the timing of failure so that we can select or perform a task before failure happens. The best that we can do for our equipment is to :

1st - Extend the length of time between failures

2nd - Prevent failures by replacing the most worrisome components before they fail

3rd - Monitor for the signs and symptoms that a part is on the verge of failing; this is possible by determining the condition of the equipment

Making equipment more reliable is about extending its life & the time between failures (MTBF), as well as preventing failures by replacing parts & components. This is what maintenance is all about . . . . .

TYPICAL CAUSES OF FAILURES

[Iceberg diagram : FAILURE is the visible tip of the iceberg. Causes shown below the surface :]
FRACTURE, HUMAN ERROR, CORROSION, LOOSE BOLTS, VIBRATION, LOOSENESS, DEFORMATION, MISALIGNMENT, DIRT / DUST, LEAKAGE, TEMPERATURE, FATIGUE, ABRASION, CONTAMINATION, LUBRICATION, ENVIRONMENT

Common Belief : Do all parts wear out ?


Maintenance people believe that ALL parts, after consistent use, will reach a point of wear and tear. Hence, overhauling or replacing the part before it fails, on a specific fixed schedule, will ensure the reliability of the equipment, and therefore the concept of Preventive Maintenance will solve the problem of unexpected failures. RIGHT or WRONG ?
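[Graph : DETERIORATION versus TIME, with a Failure Line. Two curves are shown, Natural Deterioration and Accelerated Deterioration, running to the point where the part is expected to reach failure. Also marked on the graph : Points 1 to 4 and the labels Time-Based, Condition-Based, and Failed State / Run To Fail.]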

Most Manufacturing Industries Experience . . .

It is also borne out by the machine operator who says that every time maintenance works on the machine over the weekend, it takes until Wednesday to get it going again
Reference : page 143, RCM by John Moubray

It is this belief that led to the idea that the more often an item is overhauled, the less likely it is to fail . . .
Scheduled Overhauls / Preventive Maintenance increase overall failures by introducing Infant Mortality into an otherwise stable system

The resulting schedules are then used for all similar assets, without considering that different consequences apply in different operating contexts. This results in a large number of schedules which are wasted, not because they are wrong in the technical sense, but because in reality they achieve nothing

What did Stanley Nowlan and the late Howard Heap discover ? Two discoveries emerged which changed the evolution and thinking of maintenance systems worldwide . . . . .

First, scheduled maintenance has little or no effect on the reliability of a complex item unless the item has a dominant failure mode. Second, there are many items for which there is no effective form of scheduled maintenance.

UNDERSTANDING BREAKDOWN
HARD FACTS ABOUT EQUIPMENT FAILURES

- Not all failures will constitute a downtime
- Failures occur in 3 patterns : Infant Mortality, Random Failures & Age-Related Failures. Most of the failures we encounter are either random or infant mortality failures
- Increasing the amount of Preventive Maintenance activities on the equipment will likewise increase the chances of Infant Mortality Failures; the only way to reduce Infant Mortality Failures is to reduce the amount of work in our PM
- Not all failures can be eliminated. The best that maintenance can actually do is to control the timing of failure; reducing the consequences of failure is more feasible than trying to eliminate the failure itself

UNDERSTANDING BREAKDOWN
HARD FACTS ABOUT EQUIPMENT FAILURES

- Preventive Maintenance can only capture wear-out or age-related failures. When failure is random in nature, PM is at its weakest point and is not feasible to use
- Not all failures are created equal, yet all failures have some degree of consequences. Hence, the degree of maintenance required should be based upon the consequences of the failure itself
- When a failure has little or minor consequences, it is a good decision to allow the failure to occur

THE TRUTH ABOUT MACHINERY AND EQUIPMENT


MACHINE 1 : 1 Failure / Mo
MACHINE 2 : 1 Failure / Mo
MACHINE 3 : No Failures
MACHINE 4 : No Failures
MACHINE 5 : 1 Failure / Mo
MACHINE 6 : 9 Failures / Mo
MACHINE 7 : 8 Failures / Mo
MACHINE 8 : 1 Failure / Mo
MACHINE 9 : No Failures
MACHINE 10 : No Failures

- Will these 10 machines require the same amount of PM ?
- Which machines will require the greater amount of maintenance ?
- Should we follow the specs or apply common sense to maintenance ?

COMPARING RANDOM AND AGE-RELATED FAILURES


CASE 1 : RANDOM FAILURES

Ex : 100 failures encountered on a ball bearing over a span of 9 years, distributed as follows :

PERIOD OR LIFE :   1    2    3    4    5    6    7    8    9
FAILURES       :   5   15   10   20   10    5   15   10   10

CONCLUSION : Failure distribution is not symmetrical, PM not applicable


CASE 2 : AGE-RELATED FAILURES

PERIOD OR LIFE :   1    2    3    4    5    6    7    8    9
FAILURES       :   2    1    0    0    0    2    1    0   94

BEST PERIOD TO PERFORM REPLACEMENT : period 8

CONCLUSION : The failure distribution is almost entirely age-related. In this case the best time to perform replacement is the 8th period, just before the failures cluster in period 9
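The comparison of the two cases can be sketched in a few lines of Python. This is only an illustration: the per-period counts are one reading of the two charts above, and the function names and the 50 % threshold are assumptions made here, not part of the course material.

```python
# Failure counts for periods 1..9, as read from the two charts (an assumption
# about how the extracted numbers pair up with the periods).
random_case = [5, 15, 10, 20, 10, 5, 15, 10, 10]
age_related_case = [2, 1, 0, 0, 0, 2, 1, 0, 94]


def largest_period_share(counts):
    """Return (period, share) for the period holding the most failures."""
    total = sum(counts)
    peak = max(counts)
    period = counts.index(peak) + 1  # periods are numbered from 1
    return period, peak / total


def looks_age_related(counts, threshold=0.5):
    """Hypothetical rule of thumb: one period holds most of the failures."""
    _, share = largest_period_share(counts)
    return share >= threshold


for name, counts in [("random", random_case), ("age-related", age_related_case)]:
    period, share = largest_period_share(counts)
    print(name, "peak period:", period, "share:", share,
          "age-related:", looks_age_related(counts))
```

In the first case no period holds more than a fifth of the failures, so a fixed replacement interval has nothing to aim at. In the second case almost all failures fall in one period, so replacing just before it is meaningful.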

There is a belief that all items have a life and that installing a new part before that life is reached will automatically restore the equipment to its original basic condition = FALSE

This will lead us to the conclusion that the truth is . . . . .

MORE PM MEANS MORE PROBLEMS


LESS PM MEANS FEWER PROBLEMS

CHANGING THE WAY WE THINK ABOUT FAILURES

We need to understand that failures occur in 3 ways . . . . .


1st - INFANT MORTALITY : Failure can occur at the beginning

2nd - RANDOM FAILURES : Failure can occur at any period


3rd - AGE-RELATED FAILURES : Failure occurs as the part wears out with age

Most maintenance focuses only on the 3rd type of failure, neglecting the fact that infant mortality failures & random failures occur more frequently than wear-out failures

[Figure : BATHTUB CURVE with three regions : INFANT MORTALITY, RANDOM FAILURES, WEAR OUT FAILURES]

MISCONCEPTION ABOUT PREVENTIVE MAINTENANCE ?

Can all failures be captured by Preventive Maintenance ?


ANSWER : Despite the best efforts & structure in Preventive Maintenance, failures are still inevitable & will not be captured solely by PM. Zeroing out all breakdowns is like catching lightning with a Polaroid camera . . .

Why won't PM capture all failures ?


ANSWER : Typically only around 20% of component failures are wear-out failures or are directly related to the age of the equipment, and around 80% of all failures fit the random and infant mortality patterns. When the failure is random in nature, no amount of PM can address it. This is where PM is at its weakest; hence, let us not misuse this strategy.

MODULE 2

Understanding The Different Maintenance Strategies

REACTIVE MAINTENANCE

PREVENTIVE MAINTENANCE

PREDICTIVE MAINTENANCE

PROACTIVE MAINTENANCE

REACTIVE MAINTENANCE :
Maintenance is done at the point when there is a repair or an actual breakdown. Repair action is taken on a problem only when the problem results in machine failure, which means unplanned downtime. In its simplest definition, breakdown maintenance simply means fixing it when it fails

Run-to-fail

Run-to-destruction

Reactive Maintenance

Band-Aid Maintenance

No Scheduled Maintenance

Firefighting

REACTIVE MAINTENANCE :
"If it ain't broke, don't fix it; when it breaks, we'll fix it"

When this is the sole type of maintenance practice :

- High percentage of unplanned activities
- High replacement and parts inventories
- High pressure to keep equipment running

A purely reactive maintenance strategy ignores opportunities to influence equipment reliability and survivability. It is justifiable in particular instances if it :
- Does not produce critical delays
- Does not sacrifice people's safety
- Does not significantly increase costs
- Involves functions with redundancy or standby

RUN TO FAIL : If the failure is evident and does not affect safety or the environment, or if it is hidden but does not affect safety or the environment, then the default decision is No Scheduled Maintenance
RUN TO FAIL MAINTENANCE IS VALID IF :

- A suitable scheduled task cannot be found for a hidden function
- A cost-effective preventive task cannot be found for failures which have operational or non-operational consequences

WHEN REACTIVE MAINTENANCE CAN BE JUSTIFIED


IS MONITORING, SCHEDULED MAINTENANCE OR INSPECTION REQUIRED FOR SAFETY OR ENVIRONMENTAL COMPLIANCE ?

NO
WILL THE BREAKDOWN BE MORE COSTLY THAN THE TASKS OF PREVENTING THE FAILURE ITSELF ?

NO
IS THE EQUIPMENT IN THE CRITICAL PATH IN MANUFACTURING OR CONSIDERED A BOTTLENECK EQUIPMENT OR PROCESS ?

NO
IS BACK-UP EQUIPMENT UNAVAILABLE ?

NO
WILL THE BREAKDOWN ADVERSELY AFFECT DELIVERY OR CUSTOMER SERVICE OR PROVIDE ANY DELAYS ?

NO
WILL THE BREAKDOWN FURTHER DAMAGE THE EQUIPMENT OR PROVIDE SECONDARY DAMAGES ?

THEN REACTIVE MAINTENANCE IS JUSTIFIED
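The chain of questions above behaves like a simple checklist: a single YES rules reactive maintenance out. A minimal sketch follows, in which the question keys and the function name are illustrative labels chosen here, not terms from the course.

```python
# The six screening questions from the chart, in order.
# The key names are hypothetical shorthand for the questions.
QUESTIONS = [
    "required_for_safety_or_environmental_compliance",
    "breakdown_costlier_than_prevention",
    "critical_path_or_bottleneck",
    "backup_unavailable",
    "affects_delivery_or_customer_service",
    "causes_secondary_damage",
]


def reactive_maintenance_justified(answers):
    """Return True only if every screening question is answered NO (False)."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    return not any(answers[q] for q in QUESTIONS)


# Example: a light bulb in a non-critical area, every answer is NO.
light_bulb = {q: False for q in QUESTIONS}
print(reactive_maintenance_justified(light_bulb))

# Example: a pump with no standby unit, one answer is YES.
pump = dict(light_bulb, backup_unavailable=True)
print(reactive_maintenance_justified(pump))
```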

RUN TO FAIL EXAMPLES


- Electronic Circuit Boards
- Busted Light Bulb Failures
- Parts with redundancy or standby items such as pumps & motors
- Spare parts & component failures that will limit the failure to the component itself with no chance of secondary failures
- Overstock inventories that can accommodate the repair time itself
- When the consequences of failure and the cost of repair are minimal

PREVENTIVE MAINTENANCE :
Preventive Maintenance is simply performing maintenance on a fixed interval, which may be in the form of time, number of strokes or frequency

Calendar-Based

Stroke-Based

Scheduled-Discard / Replace Parts

Time-Based

Running Hours

Scheduled-Restoration / Overhaul

PREVENTIVE MAINTENANCE :
- Also known as Time-Based or Calendar-Based Maintenance
- Maintenance activities are performed on a calendar or fixed operating schedule in order to extend the life of the equipment and prevent failures
- Maintenance is performed without regard to equipment condition
- Assumes that the condition of the machine and the need for maintenance are correlated with time, which means that the item can be expected to operate reliably for an amount of time and is then expected to wear out
- Failure rates and history records are used to establish the best frequency

PREVENTIVE MAINTENANCE :
Stress causes an asset to deteriorate by lowering its resistance. Exposure to stress is measured by output, distance traveled, operating cycles, calendar time and running time

Trademark for Patterns A, B, and C

WHEN PREVENTIVE MAINTENANCE IS FEASIBLE


- When the part or component wears out in direct relation to its operating age
- When most of these parts survive to a defined age. Ex. 98 % of impellers were replaced after the end of 2 years
- When the part or component has a normal rate of wear. The TPM term is natural deterioration; a more technical term is normal fatigue
- Fatigue happens when the stress exceeds the strength of the material of the spare part or component
- Preventive Maintenance tasks are only worth doing, and only feasible, for parts that have normal wear or deterioration

WHY PREVENTIVE MAINTENANCE IS LIMITED ?


A common problem with mature maintenance programs is that, if they are not correctly designed, between 40 and 60% of the PM tasks serve very little purpose (John Moubray, 1997, author of Reliability-Centered Maintenance).

Therefore, evaluating our current Preventive Maintenance System should lead us to find that :
- Many tasks duplicate other tasks
- Some tasks are done too often while others are not done often enough
- Some tasks serve no purpose whatsoever
- Many tasks are intrusive (forced) and overhaul-based whereas they should be condition-based
- Some tasks cost more to do than the failure they are meant to prevent
- Maintenance is costly because perfectly good parts are replaced, since replacement is time-based

WHY PREVENTIVE MAINTENANCE IS LIMITED ?


Should maintenance or replacement be carried out on a piece of equipment ? If the equipment is in good condition, then it should remain in service. Preventive Maintenance does not guarantee that the parts to be replaced really need to be replaced

Why don't PMs significantly reduce the amount of reactive maintenance being performed in your plant? The answer is simple. PMs were designed around the theory that equipment failures are directly related to the age of the equipment. Since only 20 percent of equipment failures fit this pattern that means that 80 percent of equipment failures are not being effectively managed by doing time-based PMs.

PREDICTIVE MAINTENANCE :
Predictive Maintenance aids in detecting potential failures in equipment with the help of specialized instruments. Maintenance is based on the condition of the equipment, which differentiates it from Preventive Maintenance

Condition-Based Maintenance

Equipment Diagnostic Technique

Equipment Monitoring Technique

On-Condition Tasks

Just In Time Maintenance

On-Line Monitoring Equipment

Reliability-Based Maintenance

PREDICTIVE MAINTENANCE DEFINED

A person is gifted with 5 senses : smell, touch, taste, hearing and sight. He can use these senses to detect problems on the equipment. Condition-Based Monitoring checks the condition of an equipment through the use of sophisticated, high-precision measuring instruments. Predictive Maintenance instruments are a higher form of the human senses

CONDITION-BASED MAINTENANCE DEFINED

CBM tasks entail checking for potential failures, so that action can be taken to prevent the functional failure or to avoid the consequences of a functional failure

P-F INTERVAL
When to use the CBM technique ?
P-F INTERVAL :

Is the interval between the emergence of the Potential Failure and its decay into a Functional Failure

POTENTIAL FAILURE : Is defined as an identifiable physical condition which indicates that a functional failure is either about to occur or is in the process of occurring

FUNCTIONAL FAILURE : Is defined as the inability of an item to meet a specific performance standard

DETERMINING POTENTIAL FAILURES


Predictive Maintenance aids us in determining the potential failure, or the symptoms that an equipment is in the process of failing. Changes or increases in the following can denote a potential failure. Specialized diagnostic instruments can aid in detecting the following :
- Heat or temperature
- Vibration
- For Electrical we have :
  - changes in resistance
  - changes in conductivity
  - changes in dielectric strength
- Increase in noise
- Pressure change
- Flow rate change
- Lubricant contamination
- Wall thickness decrement
- Rate of corrosion
- Leak detection
- Crack detection
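As a rough illustration of how a change in one of these indicators might be flagged as a potential failure, the sketch below compares readings against a baseline. The function name, the 1.5x alert ratio and the vibration values are all made-up assumptions for illustration; real alert limits come from the instrument vendor or the applicable standard.

```python
def first_potential_failure(readings, baseline, alert_ratio=1.5):
    """Return the index of the first reading at or above alert_ratio * baseline.

    Returns None when no reading reaches the alert limit.
    """
    limit = baseline * alert_ratio
    for index, value in enumerate(readings):
        if value >= limit:
            return index
    return None


# Hypothetical weekly vibration readings in mm/s for one bearing.
vibration_mm_s = [2.0, 2.1, 2.0, 2.4, 3.1, 4.0]
print(first_potential_failure(vibration_mm_s, baseline=2.0))
```

The point at which the reading first crosses the limit corresponds to the Potential Failure; the time left before the Functional Failure is the P-F interval available for planning the repair.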

WHY PDM IS BETTER THAN PM ?


Preventive Maintenance : Overhauls are performed on a fixed interval, whether time-based or running hours
Predictive Maintenance : Overhauls are performed only if a potential failure is detected

Preventive Maintenance : Performed when the machine is stopped
Predictive Maintenance : Can be performed while the machine is running

Preventive Maintenance : Parts are replaced on a fixed interval, after reaching a specific time or running hours
Predictive Maintenance : Parts are only replaced if a specific potential failure is present; if nothing is wrong, no replacement takes place

Preventive Maintenance : Parts are utilized based on the frequency of replacement; parts will be replaced even when good, to conform to the schedule
Predictive Maintenance : More cost effective than preventive, since the part is utilized for almost its entire life span

Preventive Maintenance : Possibility of replacing good parts
Predictive Maintenance : Only parts with potential failures are replaced

Preventive Maintenance : Cannot detect the exact location of the problem
Predictive Maintenance : Infra-red cameras can detect the exact location of the temperature rise

PROACTIVE MAINTENANCE :
- Proactive Maintenance is about analyzing why failures occur so that their recurrence is finally eliminated, thereby extending the life of the part or component
- Proactive Maintenance is when maintenance or a cross-functional team analyzes the failure with analytical techniques such as Root Cause Failure Analysis, FMEA, Kepner Tregoe, P-M Analysis, Fault-Tree Analysis etc., to better understand why the failure occurred in the first place
- In Preventive Maintenance we replace the part that we think is in the process of wearing out. Our thinking is that replacing the part will bring the equipment back to its original condition; we have not taken into account the need to analyze further why a certain part keeps on failing

Troubleshooting is no longer an effective strategy. In today's competitive world, the analysts find real solutions . . . .

PROACTIVE MAINTENANCE :
REDESIGN or MODIFICATION
- Includes changing the specification of a component
- Adding a new item
- Replacing an entire machine with a different type
- Relocating a machine
- Change in process or procedure which affects operation

SAFETY & ENVIRONMENTAL ASPECTS


- Reduce the probability of the Failure Mode occurring to a level which is acceptable
- Replace the component with a stronger or more reliable one, making the failure no longer a threat to safety and the environment

PROACTIVE MAINTENANCE :
OPERATIONAL & NON-OPERATIONAL CONSEQUENCES
- Reduce the no. of times failure occurs
- Reduce or eliminate the consequences of a failure (for example, through redundancy)
- When no preventive task is cost-effective, the alternative solution is to redesign

FACTORS CONSIDERED IN REDESIGN :
1. Does the failure involve major operational consequences ?
2. Is the cost of scheduled and / or breakdown maintenance high ?
3. Are there specific costs which can be eliminated by the design change ?
4. Is the design free of harmful effects which could be generated afterwards ?
5. Is there an economic trade-off study on expected cost savings ?
6. Is the asset to stay in use for a long time, or will it be decommissioned ?

IF YOUR ANSWER TO THESE QUESTIONS IS YES, THEN REDESIGN IS RECOMMENDED.

WORLD CLASS MAINTENANCE EXCELLENCE :


Level 5 : Maintenance Prevention (5 % and more)
- Maintenance Free, Plug and Play, Longer Lifespan

Level 4 : Proactive Maintenance (10 - 20 %)
- P-M Analysis, Root Cause Failure Analysis, Failure Mode & Effect Analysis, Failure Analysis

Level 3 : Predictive Maintenance (40 - 50 %)
- Condition-Based Maintenance, Use of Diagnostic Tools, Specialized Equipment, Predict Imminent Failure, Early Alert / Detection

Level 2 : Preventive Maintenance (20 - 30 %)
- Scheduled Overhauls, Scheduled Discards, Outage Schedules, Time-Based Maintenance, Stroke-Based / Running Hrs, Scheduled and Fixed Intervals

Level 1 : Reactive Maintenance (10 - 15 %)
- Band-Aid Maintenance, Breakdown Maintenance, Run to Fail / Destruction, No Scheduled Maintenance

Is your company adopting Reliability-Centred Maintenance ?

MODULE 3

MAINTENANCE MATRIX, KPIs and INDICES

MEAN TIME BETWEEN FAILURE


MTBF is a reliability engineering term that means the average amount of operating time between the occurrence of breakdowns that require repair

MTBF simply means the average time between failures. It is based on historical data or estimated by vendors and is used as a benchmark for reliability

MTBF = OPERATING TIME / NUMBER OF FAILURES

WHERE :
OPERATING TIME = LOADING TIME - MACHINE RELATED DOWNTIME
LOADING TIME = AVAILABLE TIME - NON-MACHINE RELATED DOWNTIME

EXAMPLE :
AVAILABLE TIME = 168 hrs
NON-MACHINE RELATED DOWNTIME (NMDT) = 40 hrs
MACHINE RELATED DOWNTIME (MDT) = 72 hrs (6x)

COMPUTE THE MTBF IF THE BREAKDOWN OCCURRENCE (BDO) IS 6
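The exercise can be worked through directly from the two definitions above. The sketch below is one way to do it; the function and parameter names are chosen here for illustration.

```python
def mtbf(available_time, non_machine_downtime, machine_downtime, failures):
    """MTBF = operating time / number of failures, using the slide's definitions."""
    if failures <= 0:
        raise ValueError("failures must be a positive count")
    loading_time = available_time - non_machine_downtime
    operating_time = loading_time - machine_downtime
    return operating_time / failures


# Values from the exercise: 168 hrs available, 40 hrs NMDT, 72 hrs MDT, 6 breakdowns.
# Loading time = 168 - 40 = 128 hrs; operating time = 128 - 72 = 56 hrs.
print(round(mtbf(168, 40, 72, 6), 2))  # 56 / 6, about 9.33 hrs
```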

MEAN TIME BETWEEN FAILURE


- MTBF trend : the higher the value, the more reliable the machine or part
- In a case where there is no breakdown or failure, an MTBF of infinity will be obtained. This does not mean that there is anything wrong with the equation : either prolong the period over which MTBF is measured or, when there is no failure, assume a denominator of 1 to obtain a value
- If we buy a component with an MTBF of 30,000 hours, it means that on average the part will run for 3.42 years without failure (30,000 / 8,760 hours per year, assuming continuous operation)
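Both points can be checked with a short sketch. It assumes the 30,000 figure is in hours and that the part runs around the clock (8,760 hours per year); the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes continuous, round-the-clock operation


def mtbf_hours_to_years(mtbf_hours):
    """Convert an MTBF given in hours to years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def mtbf_with_zero_failure_convention(operating_time, failures):
    """Use a denominator of 1 when no failure was recorded in the period."""
    return operating_time / max(failures, 1)


print(round(mtbf_hours_to_years(30_000), 2))        # about 3.42 years
print(mtbf_with_zero_failure_convention(500.0, 0))  # 500.0 instead of infinity
```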

MTBF VARIATIONS
MTBF can be computed on the following basis :
- MTBF BY CRITICAL COMPONENT : To determine on average when a particular critical component will fail
- MTBF BY SUB-ASSEMBLY : To determine which sub-assembly fails frequently on a machine
- MTBF BY MACHINE : To determine the MTBF of a particular machine
- MTBF BY GROUP OF MACHINES : To determine the machine with the lowest MTBF and perform improvements
- MTBF BY PROCESS OR LINE : To determine which equipment fails frequently and identify the bottleneck area in a process
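For the "group of machines" variation, a small sketch of how the machine with the lowest MTBF could be picked out is shown below. The machine names, hours and failure counts are hypothetical, and the denominator-of-1 convention for machines with no failures is applied as an assumption.

```python
def mtbf_per_machine(operating_hours, failures):
    """Map each machine to its operating hours divided by its failure count."""
    result = {}
    for machine, hours in operating_hours.items():
        count = failures.get(machine, 0)
        # With no recorded failure, fall back to a denominator of 1.
        result[machine] = hours / max(count, 1)
    return result


# Hypothetical monthly figures for three machines.
operating_hours = {"M1": 600.0, "M2": 600.0, "M3": 600.0}
failures = {"M1": 1, "M2": 8, "M3": 0}

table = mtbf_per_machine(operating_hours, failures)
worst = min(table, key=table.get)  # lowest MTBF = first candidate for improvement
print(table, worst)
```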

MEAN TIME TO FAILURE


MTBF is a key reliability metric for systems that can be repaired or restored. MTTF is the expected time to failure of a system. Repairable systems can fail several times, while non-repairable systems can fail only once; hence, for non-repairable items, MTTF is equivalent to the mean of the failure time distribution.

[Timeline diagram relating MTBF, MTTF and MTTR, with points A and B. Labels on the timeline :
- Point where a new part is installed
- Total time it will take for the part to fail (MTTF)
- Point where the 1st failure occurs
- Time to repair (MTTR)
- Point where the new part will fail again / point where the 2nd failure occurs]

HENCE : MTBF = MTTR + MTTF
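A small numeric sketch of this relationship, using made-up run and repair durations (in hours) for a single part position:

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical history: hours each new part ran before failing,
# and hours each repair took.
times_to_failure = [120.0, 100.0, 110.0]
times_to_repair = [4.0, 6.0, 5.0]

mttf = mean(times_to_failure)  # average run time before failure
mttr = mean(times_to_repair)   # average repair time
mtbf = mttr + mttf             # failure-to-failure time, per the relation above

print(mttf, mttr, mtbf)
```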

WHEN TO USE MEAN TIME BETWEEN FAILURE


- When the frequency of equipment breakdown or failure is high
- When we want to improve the design weakness of a critical component of an equipment
- To determine the main contributor to why equipment keeps on failing (PARETO)
- To compare 2 identical parts from different vendors
- To determine the frequency of replacement for parts which have symmetrical or linear failures; not recommended for parts that fail randomly (Patterns D, E and F)

WHY MEASURE REPAIR TIME ?


- When a failure occurs, it is critical to restore the equipment as soon as possible
- Typically much of the repair time is spent in determining the cause of the problem
- The traditional tendency is to apply a fix and never get to the root cause
- Repair should be performed in the shortest possible time; our goal is to put the equipment back into its operating state
- For failures that keep repeating over and over, the best strategy is to address the real root cause of the problem and prevent it from recurring

MTTR DEFINED
MTTR is defined as the total time spent on repairs divided by the number of breakdown occurrences.
"When the system fails, and it will fail, how easy will it be to recover?"

MTTR = Repair Time / Breakdown Occurrence

[Timeline diagram of MACHINE DOWNTIME : Machine stops -> Find person who can repair it -> Diagnose the fault -> Find the spare parts -> Repair the fault -> Revalidate / test run the machine -> Endorse machine to operator. "Repair time" is marked on the timeline.]

Downtime means the total amount of time the asset would normally be out of service from the time it fails until it is fully operational

MTTR

MTTR varies from one company to another, hence, there must be a clear understanding on what MTTR constitutes
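To show why the definition matters, the sketch below computes MTTR twice from the same breakdown records: once counting only the hands-on repair stage and once counting every stage of the downtime. The stage names follow the downtime timeline; the durations (in minutes) are made up.

```python
ALL_STAGES = ["find_person", "diagnose", "find_parts", "repair", "test_run"]

# Hypothetical records: minutes spent in each stage for two breakdowns.
breakdowns = [
    {"find_person": 10, "diagnose": 30, "find_parts": 20, "repair": 45, "test_run": 15},
    {"find_person": 5, "diagnose": 60, "find_parts": 0, "repair": 30, "test_run": 10},
]


def mttr(records, stages):
    """Total time in the chosen stages divided by the number of breakdowns."""
    if not records:
        raise ValueError("at least one breakdown record is required")
    total = sum(record[stage] for record in records for stage in stages)
    return total / len(records)


print(mttr(breakdowns, ["repair"]))  # hands-on repair only
print(mttr(breakdowns, ALL_STAGES))  # full downtime, stop to handover
```

The two results differ by a factor of three on the same data, which is why the scope of MTTR has to be agreed before figures are compared.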

MTTR DEFINED
- MTTR (Mean Time To Repair) is the average time required to repair a component
- Other terms used are Mean Time To Restore and Mean Time To Recover
- MTTR trend : the lower the value, or the shorter the time to repair, the better
- Improving the MTTR means shortening the time to repair the machine

MTTR DEFINED
- MTTR (Mean Time To Repair) is the average time required to perform corrective maintenance or repair on all of the removable items in a product or system
- MTTR analyzes how long repairs & maintenance tasks will take in the event of a system failure
- MTTR may be defined as the time it will take to bring a failed system back to its available or operating status again

If an Ethernet card in your computer fails and it takes 3 hrs to purchase and install a new card, the MTTR for your computer will be 3 hrs. But the Ethernet card is still broken and may never be repaired; hence the MTTR for the Ethernet card is forever

UNDERSTANDING MTTR
A true and correct MTTR starts at the time of failure and continues until the system is operational again, regardless of whether a system part or component is available or not

MTTR is also difficult to estimate, since one must consider a variety of repairs. An engine repair can range from tightening a drain plug bolt to overhauling an entire engine assembly

MTTR and MTBF are limited to the consideration of predictable failures of parts or systems from operations-related causes. Equipment failures due to war, vehicle collision, fires, terrorism, lightning and sabotage are generally ignored

MTTR TO IMPROVE REPAIR TIME


MTTR can be used to track the skill level of maintenance personnel and technicians in performing repairs, and to improve upon it.

Example : MTTR is monitored for a group of 20 people from the maintenance department, and Bob is found to have the lowest MTTR when performing repairs. We can therefore define proper repair procedures based on Bob's practices for other people to follow, thereby avoiding trial & error. The goal is to improve the repair time achieved by the other maintenance craftspeople
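A minimal sketch of that comparison, using a made-up repair log of (technician, repair hours) pairs; the names other than Bob and all of the durations are illustrative only:

```python
from collections import defaultdict


def mttr_by_technician(repairs):
    """Return {technician: average repair hours} from (technician, hours) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for technician, hours in repairs:
        totals[technician] += hours
        counts[technician] += 1
    return {tech: totals[tech] / counts[tech] for tech in totals}


# Hypothetical repair log.
repairs = [("Bob", 1.0), ("Sam", 2.5), ("Bob", 1.5), ("Sam", 1.5), ("Rico", 3.0)]

averages = mttr_by_technician(repairs)
best = min(averages, key=averages.get)  # lowest MTTR
print(averages, best)
```

A fair comparison also needs similar types of repair per technician; a low average built only on simple jobs says little about skill.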

Planned Maintenance Skills Evaluation


Gearing Towards A Pro-Active Maintenance System

Team name : The Untouchables
Leader : Sam Milby
Equipment type handled : All Types
Division : Central Equipment Engineering
Station : PLCC Department
Classes : Class A, Class B, Class C, Class D

Legend :
Knowledge & Skill not Satisfactory (0 points)
Knowledge Satisfactory (0.50 points)
Skill Satisfactory (0.75 points)
Knowledge and Skill both Satisfactory (1 point)

PLANNED MAINTENANCE MEMBERS : SAM, BOB, RICO, RACQUEL, CAS, SAY, UMA, NENE, FRANZIN, JB

Columns : Classification, No., Knowledge / Skill Item, Training Attended (Yes / No), one rating column per member

BASIC MACHINE FUNCTION
1 Basic Machine Function
2 Machine Specs, Parts and Function
3 Knowledge in Actual Set-up and Conversion
4 Basic Lubrication Knowledge
5 Basic Repair and Troubleshooting

ANALYTICAL SKILLS ENHANCEMENT
8 Failure Mode and Effect Analysis
9 Root Cause Failure Analysis
10 P-M Analysis
11 MTBA Snapshot and Analysis
12 Sequence Of Events Analysis

PNEUMATICS & HYDRAULICS
13 Knowledge and use of FRLs
14 Knowledge and use of Pipings and Connectors
15 Knowledge and use of Cylinders
16 Knowledge and use of Filtration
17 Knowledge and use of Speed Controllers
18 Leaks and Seals

OTHERS
19 Bearing Failures and Causes
20 Sensors Technology
21 Motors and Pumps
22 Screws and Fasteners
23 Spare Parts Management
24 RCM and OER Strategy
25 Maintenance Indices and Measurements

PREDICTIVE MAINTENANCE (Specialization)
26 Knowledge on Vibration Monitoring
27 Principles of Heat and Thermography
28 Oil Analysis and Tribology
29 Ultrasonic Monitoring
30 CMMS Structure and System

Total Points

S5-03
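A minimal sketch of how the "Total Points" row could be tallied from the legend's point values. The ratings below are invented for illustration and cover only four items for two members; the rating keys (`"none"`, `"knowledge"`, `"skill"`, `"both"`) are assumed names, not part of the form.

```python
# Points per rating, taken from the legend of the evaluation form.
POINTS = {
    "none": 0.0,        # Knowledge & Skill not Satisfactory
    "knowledge": 0.50,  # Knowledge Satisfactory
    "skill": 0.75,      # Skill Satisfactory
    "both": 1.0,        # Knowledge and Skill both Satisfactory
}

# Hypothetical ratings: item number -> rating, for two members only.
ratings = {
    "SAM": {1: "both", 2: "skill", 3: "knowledge", 4: "none"},
    "BOB": {1: "both", 2: "both", 3: "skill", 4: "knowledge"},
}

def total_points(member_ratings):
    """Sum the legend points over all rated items of one member."""
    return sum(POINTS[rating] for rating in member_ratings.values())

totals = {member: total_points(r) for member, r in ratings.items()}
for member, points in totals.items():
    print(f"{member}: {points:.2f} points")
```

Items rated below "both" are the natural candidates for the next round of training.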

Module 4

Understanding Root Cause Failure Analysis

Root Cause Analysis Defined :

Root Cause Failure Analysis is trying to UNDERSTAND why something went wrong . . . . .
Root Cause Failure Analysis identifies the basic source or origin of the problem so that recurrence of the problem may be prevented

RCFA provides a methodology for investigating, categorizing and eliminating the root causes of incidents with safety, quality, reliability and manufacturing process consequences . . .

Identifying the root cause of a failure event allows us to explain the WHAT, HOW and WHY of the failure

Root Cause Analysis Defined :

Proper Root Cause Analysis identifies the basic source or the origin of the problem . . . .
Every system, spare part or component failure happens for a reason. There is a specific succession of events that leads to a failure. RCFA follows the cause-and-effect path from the final failure back to its origin. The root cause analysis methodology provides a specific and solid foundation for preventing the recurrence of the problem or failure.

Root cause analysis is a tool to better explain what happened, to determine how it happened and to better understand why it happened . . . . .

Root Cause Analysis Defined :

Root Cause Analysis separates the facts from hearsay. RCFA is not about trial and error and seeing what works and what does not.
While there are many techniques for analyzing a problem that provide a quick answer, this does not mean that the answer is correct every time. A true and meaningful Root Cause Failure Analysis takes the time to prove that what we say is fact and supports our hypothesis with evidence before we spend our money to improve the design of the equipment.

When the facts are backed up by evidence and science, and they are separated from the fiction, we have a better understanding of the real root cause of the problem.

RCFA CASE STUDY : MISSING MONEY


CASE STUDY :
A car wash manufacturer, Bill, sold one of his complete, turn-key car wash systems to a client in Maryland. The system includes the change machines for customers who need change to wash their cars. The problem started when the new owner complained to Bill that he was losing significant amounts of money from his change machines each week, and insinuated that the manufacturer's employees had a spare key and were stealing the money. Bill just could not believe that his people were stealing the money, since he had known them for many years. Bill then formed an RCFA team to get to the bottom of the problem.

The group decided to install a surveillance camera to find out who was stealing the money.

RCFA CASE STUDY : MISSING MONEY Logic Tree Diagram


Missing Money (Money from the change machine was missing)
    Money was never there
        Customers not paying
    Money was stolen from the machine (There's a thief)
        Stolen by someone
        Stolen by something
    Change Machine Malfunction
        Not working properly

The video surveillance indicated that the customers entering the car wash were paying; hence, the hypothesis that customers were not paying was disregarded. The owner tried to test the machine by placing some coins in it, and the machine worked properly, so Change Machine Malfunction was not the problem. It was clear to them that someone was stealing the money, but who . . .

RCFA CASE STUDY : MISSING MONEY


But the RCFA group did not give up. They kept monitoring the surveillance camera and found out . . .

That is a bird sitting on the change slot of the machine, and it had to go down into the machine. But why ?

That is 3 quarters it has in its beak. Another amazing thing is that it was not just one bird but several of them.

There goes another bird, this time taking only 1 quarter.

Once they had identified the thieves, they found over $ 4,000.00 on the roof of the car wash and more under a nearby tree. The case of the stolen money was solved, thanks to Root Cause Analysis . . .

Understanding Why-Why Analysis :


Level 1 : Kingdom is lost
Why is the kingdom lost ?

Level 2 : King is killed
Why is the king killed ?

Level 3 : King fell off the horse
Why did the king fall off the horse ?

Level 4 : Horseshoe comes off
Why did the horseshoe come off ?

Level 5 : 1 nail short on shoe
Why is one nail short on the horseshoe ?

Level 6 : Shortage of nails
Why is there a shortage of nails ?

Level 7 : Prepare horses for battle
Why prepare horses for battle ?

If the king had not been killed, would the kingdom not have been captured ?
If the city had been defended even though the king was dead, might it not have been captured ?
If the horseshoe had not come off, the king might not have fallen to the ground and might not have been killed.
The groomsman might have prevented the king from riding the horse because of the missing nail and its implications.
If the king's horseshoe had had all its nails, it might not have come off at all.

Understanding Why-Why Analysis :


The story is told that before an important battle a king sent his horse with a groomsman to the blacksmith for shoeing. But the blacksmith had used all the nails shoeing the knights' horses for battle and was one short. The groomsman tells the blacksmith to do as good a job as he can. But the blacksmith warns him that the missing nail may allow the shoe to come off. The king rides into battle not knowing of the missing horseshoe nail. In the midst of the battle he rides toward the enemy. As he approaches them the horseshoe comes off the horse's hoof, causing it to stumble, and the king falls to the ground. The enemy is quickly onto him and kills him. The king's troops see the death, give up the fight and retreat. The enemy surges into the city and captures the kingdom. The kingdom is lost because of a missing horseshoe nail.


EXERCISES : Lets Determine The Sequence Of Events


Cause cards (the same seven cards appear in two sets on the slide) :
Excessive Moisture
Lack of Lubricant
Bearing Failure
Leak in the Seal
Corrosion Present
Seal was Damaged
High Acidity Level

Determine the problem and ask why to determine the sequence of events in this sample

Physical, Human and Latent Causes :

PROBLEM

Layer 1 : PHYSICAL CAUSE
How did the incident occur ? This is the physics of the incident. It usually explains how the failure occurred; for example, a bearing failed due to fatigue. It mostly explains the metallurgical factors behind the failure.

Layer 2 : HUMAN CAUSE
What error was committed that led to the physical cause ? Either someone did something wrong or someone did the wrong thing.

Layer 3 : LATENT CAUSE
We ask what caused the person to commit this mistake. These are the management system weaknesses, which include training, policies, procedures and specifications. People make decisions based on these, and if the system is flawed, the decision will be in error and will be the triggering mechanism that causes the mechanical failure to occur.

RCFA LOGIC TREE DIAGRAM

In RCFA, a Logic Tree is used to work through a failure. The failure event is placed on top, followed by all failure modes or possible causes of breakdown.

Each of the causes is a hypothesis that needs to be verified so that we understand which of the causes actually led to the problem.

The next step consists of determining and verifying the physical roots, human roots and latent roots behind the failure. The final cause will always have to do with the latent cause of the failure.

Logic tree steps, from top to bottom :
DESCRIBE THE FAILURE EVENT
DESCRIBE THE FAILURE MODE
HYPOTHESIS
VERIFY HYPOTHESIS
DETERMINE PHYSICAL ROOTS & VERIFY
DETERMINE HUMAN ROOTS & VERIFY
DETERMINE LATENT ROOTS & VERIFY

Physical, Human and Latent Cause :


Problem : Cylinder does not operate smoothly
WHY 1 : Why does the cylinder not operate smoothly ? Strainer was clogged
WHY 2 : Why is the strainer clogged ? Oil was dirty
WHY 3 : Why is the oil dirty ? Dirt entered the tank
WHY 4 : Why did the dirt enter the tank ? Upper plate in the tank had a hole and gap - Physical Cause
WHY 5 : Why was there a hole and gap in the tank ? Repair error during maintenance work - Human Cause
WHY 6 : Why was there a repair error ? No procedure to follow - Latent Cause
Evidence of dirt from Oil Analysis
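The why-chain above can be written down as plain data, which makes it easy to check whether an analysis has actually reached a latent cause. This is only a sketch: the `Why` record, the `cause_type` labels and the two helper functions are illustrative names, not part of any RCFA standard or tool.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Why:
    question: str
    answer: str
    cause_type: Optional[str] = None  # "physical", "human", "latent" or None

# The cylinder example from the slide, one entry per WHY.
chain = [
    Why("Why does the cylinder not operate smoothly?", "Strainer was clogged"),
    Why("Why is the strainer clogged?", "Oil was dirty"),
    Why("Why is the oil dirty?", "Dirt entered the tank"),
    Why("Why did the dirt enter the tank?",
        "Upper plate in the tank had a hole and gap", "physical"),
    Why("Why was there a hole and gap in the tank?",
        "Repair error during maintenance work", "human"),
    Why("Why was there a repair error?", "No procedure to follow", "latent"),
]

def causes_by_type(chain):
    """Map each labelled cause type to the answer that was given for it."""
    return {w.cause_type: w.answer for w in chain if w.cause_type}

def is_complete(chain):
    """The analysis is treated as finished only when it ends on a latent cause."""
    return bool(chain) and chain[-1].cause_type == "latent"

print(causes_by_type(chain))
print("Reached latent cause:", is_complete(chain))
```

Stopping the same chain at WHY 5 would leave `is_complete` false, which mirrors the point that an analysis ending at the human cause is not yet a root cause analysis.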

ROOT CAUSE IS LIKE A ROADMAP


PROBLEM

Root Cause

In performing Root Cause Failure Analysis, we are interested to know the real cause of a particular failure by verifying each hypothesis until we reach the final cause of the failure . . . . .

WHAT SEPARATES RCFA FROM THE REST


PROBLEM SOLVING TOOLS
ISHIKAWA / FISHBONE
WHY-WHY ANALYSIS
BRAINSTORMING
PARETO ANALYSIS
FMEA / FMECA
FAULT TREE ANALYSIS
P-M ANALYSIS
PROCESS MAPPING
FAILURE ANALYSIS
These techniques mostly conclude on the physical and human causes only.

RCFA : IN-DEPTH ANALYSIS
PHYSICAL CAUSE
HUMAN CAUSE
LATENT CAUSE
Root Cause Failure Analysis is always based on evidence and takes the time to verify each failure mode to determine the real cause of the problem. RCFA concludes only once the latent cause has been identified.

RCFA WORKSHOP 1 :
CASE STUDY :
A pump was declared failed since it was not discharging fluid at all. The pump failed due to a failure of the bearing. The maintenance team decided to perform a Root Cause Analysis on the failed bearing to determine the real cause of the problem, and had the failed bearing analyzed in a metallurgical laboratory. Arrange the causes in sequence to determine the real root cause of the problem.

INSTRUCTION :
Brainstorm and analyze the case study, rearrange the set of cards and prepare an RCFA Logic Tree Diagram.

Clues :
There are 6 or 7 levels in the logic tree
The metallurgical lab report indicates that the bearing failed due to fatigue, which is a type of wear
The last level (bottom part) will be the real root cause of the problem

ANALYZING THE BEARING FAILURE LOGIC TREE


LEVEL 1

The pump may fail for a variety of reasons. In this case it is evident to maintenance that the cause of the pump failing to fulfill its function of discharging fluid is a bearing failure.

What will maintenance do ?


A typical maintenance job is to replace the bearing with a new one, since the part has evidently failed, and production is up and running again. But the question is asked: did the problem go away ? No, it will recur in time.

What will the engineers do ?


When we have our engineers take a look at the failed bearing, they look at the failure history and data of the pump and conclude that a different, more heavy-duty type of bearing should be installed. We would then get a heavy-duty bearing and install it with the new design, and again the question is asked: did the problem go away ?

ANALYZING THE BEARING FAILURE LOGIC TREE


Logic Tree Diagram

LEVEL 1
Pump Failure (No discharge at all) [Functional Failure]
    Motor Burned Out [Failure Mode]
    Bearing Failure [Failure Mode]
    Valve Is Shut [Failure Mode]

Let's analyze the failure of a pump


The pump failed since it is not discharging fluid at all.
All causes are hypotheses and must be proven to exist.
The motor was checked and it was working; therefore, motor burned out was disregarded.
The valve was open; therefore, valve shut was disregarded.
The bearing was examined and it was evident that there was a bearing failure. We now ask why the bearing failed.

ANALYZING THE BEARING FAILURE LOGIC TREE


LEVEL 2 : DIRT/DEBRIS and WEAR

The bearing may fail for a variety of reasons, such as dirt entry or ingression, which may have caused accelerated wear of the bearing. All are probable causes and are still considered hypotheses. Hence, to distinguish the facts from hearsay, the bearing was sent to a metallurgical lab for further analysis to determine how the bearing failed to fulfill its function.
LEVEL 3 : WEAR DUE TO FATIGUE

The bearing was analyzed and reviewed by a metallurgist, and the report concluded that there is strong evidence of FATIGUE. Now that the other probable causes have been eliminated, we ask ourselves: how can fatigue occur on the bearing ?

ANALYZING THE BEARING FAILURE LOGIC TREE


Logic Tree Diagram

LEVEL 1
Pump Failure (No discharge at all) [Functional Failure]
    Motor Burned Out [Failure Mode]
    Bearing Failure [Failure Mode]
    Valve Is Shut [Failure Mode]

LEVEL 2 (under Bearing Failure)
    Dirt / Debris
    Lack of Lubrication
    Overloading
    Wear

Have the bearing analyzed by a metallurgical lab to find out why it failed

LEVEL 3 (under Wear)
    Adhesive
    Abrasive
    Erosive
    Fatigue
    Corrosive

How ?

Lubrication in the bearing was checked and found to be sufficient.
Vibration monitoring shows no indication of overloading.
The only possibilities left were Dirt/Debris and Wear, so the team decided to have the bearing tested in a metallurgical laboratory.

ANALYZING THE BEARING FAILURE LOGIC TREE


LEVEL 4 : HIGH VIBRATION

In Level 4 of our analysis we ask ourselves: how can fatigue occur on the bearing ? We hypothesize that it can come from high vibration. We check our vibration monitoring records and find clear evidence of excessive vibration. The excessive amplitude in our vibration data supports our hypothesis that fatigue occurred on the bearing due to high or excessive vibration.
LEVEL 5 : MISALIGNMENT

As we dig deeper into the root cause, we again hypothesize: how can we have excessive vibration ? The possibilities are imbalance, resonance and misalignment. Again the vibration analyst checks his vibration records and finds that resonance and imbalance are not major causes of the excessive vibration. We called the maintenance person who aligned the pump to align it again, and we observed his practices. From our observation we are certain that he does not know how to align the pump properly.

ANALYZING THE BEARING FAILURE LOGIC TREE


LEVEL 6 : NO PROCEDURE / NO TRAINING / IMPROPER TOOLS

We asked the mechanic if he had been trained in proper alignment, and he said that he was never trained in how to align and that there was no procedure for the alignment or for how frequently it should be performed. People often misalign because they were never trained in proper alignment practices, because no procedure exists outlining alignment as a required practice with specifications, or because the alignment equipment in use is worn out or inadequate for the application.

THIS IS THE LATENT CAUSE

ANALYZING THE BEARING FAILURE LOGIC TREE


Logic Tree Diagram

LEVEL 1
Pump Failure (No discharge at all) [Functional Failure]
    Motor Burned Out [Failure Mode]
    Bearing Failure [Failure Mode]
    Valve Is Shut [Failure Mode]

LEVEL 2 (under Bearing Failure)
    Dirt / Debris
    Lack of Lubrication
    Overloading
    Wear

Have the bearing analyzed by a metallurgical lab to find out why it failed

LEVEL 3 (under Wear)
    Adhesive
    Abrasive
    Erosive
    Fatigue
    Corrosive

How ?

LEVEL 4 (under Fatigue)
    High Vibration

How ?

LEVEL 5 (under High Vibration)
    Imbalance
    Misalignment
    Resonance

How ?

LEVEL 6 (under Misalignment) : Real Root Cause of the Problem
    No Procedure
    No Training
    No Alignment Tools
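The completed logic tree can be sketched as a small data structure in which each hypothesis carries a flag saying whether the evidence supported it. Walking only the verified branches then reproduces the path from the failure event down to the latent causes. The `Node` class, the `verified` flag and `verified_levels` are hypothetical names chosen for this illustration; the three latent causes are marked as verified because the diagram lists all three as the real root cause.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    verified: bool = False  # True only when evidence supported the hypothesis
    children: List["Node"] = field(default_factory=list)

# The pump case study; unverified hypotheses keep the default verified=False.
tree = Node("Pump Failure (no discharge at all)", True, [
    Node("Motor Burned Out"),   # motor was checked and working
    Node("Valve Is Shut"),      # valve was open
    Node("Bearing Failure", True, [
        Node("Dirt / Debris"),
        Node("Lack of Lubrication"),  # lubrication found sufficient
        Node("Overloading"),          # no indication from vibration monitoring
        Node("Wear", True, [
            Node("Adhesive"), Node("Abrasive"),
            Node("Erosive"), Node("Corrosive"),
            Node("Fatigue", True, [   # confirmed by the metallurgical lab
                Node("High Vibration", True, [
                    Node("Imbalance"), Node("Resonance"),
                    Node("Misalignment", True, [
                        Node("No Procedure", True),
                        Node("No Training", True),
                        Node("No Alignment Tools", True),
                    ]),
                ]),
            ]),
        ]),
    ]),
])

def verified_levels(root):
    """Return, level by level, the hypotheses that the evidence supported."""
    levels = [[root.label]]
    current = [root]
    while True:
        supported = [c for n in current for c in n.children if c.verified]
        if not supported:
            return levels
        levels.append([c.label for c in supported])
        current = supported

for depth, labels in enumerate(verified_levels(tree)):
    print(depth, ", ".join(labels))
```

Note that this sketch counts the failure event and the failure mode as separate steps, so its list has one more entry than the LEVEL numbering used on the slides.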

WITHOUT RCFA WHAT DO THEY DO TO SOLVE THE PROBLEM


FROM A PREVENTIVE MAINTENANCE VIEWPOINT

Maintenance will merely change or replace the bearing. If this part fails frequently, then the boss makes sure that there is enough stock in the warehouse.
FROM A PREDICTIVE MAINTENANCE VIEWPOINT

Our CBM group can warn operations of an impending failure brought about by excessive vibration in the pump. Although the failure is predicted, the problem still does not go away.
FROM AN ENGINEERING VIEWPOINT

Modify the bearing or change it to a more heavy-duty one and put it in service. In short, we conclude at once to change out the bearings with a new design.
FROM A CONTINUOUS IMPROVEMENT VIEWPOINT

Brainstorming teams gather with the past history and performance data of the pump and see a variety of causes. However, they are not certain which is the real cause, so they all agree that it was due to the change in the lubricant.
FROM AN OPERATIONS VIEWPOINT

Hold countless hours of meetings blaming maintenance for not doing their job.
FROM TOP MANAGEMENT VIEWPOINT

We penalize the culprits and even threaten to cut off their 13th month pay if the same problem arises in the future, or get another guy who can do the job better.

MODULE 5

LESSONS ON RELIABILITY

LESSON # 1 ON RELIABILITY
Focus must be on RELIABILITY and not cost. If RELIABILITY starts to improve, COST will definitely go down, whereas focusing on COST will at times tend to hurt RELIABILITY; it cannot be the other way around. Low-cost maintenance is a consequence of good maintenance practice.
The goal of any maintenance is to improve the equipment's reliability. Once reliability starts to improve, cost goes down, and not the other way around. Cutting maintenance cost will definitely not improve reliability. Reducing cost has been the focus of most maintenance managers, and perhaps we need to learn from the lessons of history. Cost must be studied thoroughly, based not just on the initial cost but on the entire life cycle cost of the equipment . . . . .

LESSON # 2 ON RELIABILITY

Never ever accept failures in your plant. Troubleshooting is no longer an effective strategy. In today's competitive world, the analyst finds real solutions to the problems.
When we get really good at doing something, then something is wrong, because it means we are doing it too often. And expecting a different result from the same tasks we keep doing is simply not possible; the Chinese call this INSANITY . . . . . The new paradigm is that FAILURES MUST NOT BE ACCEPTED; they can be eliminated if we know the right tools to address them. The true job of maintenance is to eliminate failures, not to fix them all the time . . . . .

LESSON # 3 ON RELIABILITY
The best time to address a problem is when it is small. It is very hard to advance to any form of specialized maintenance activities and improvement efforts if the equipment's Basic Condition has not been well established. Always remember that our equipment is a shared responsibility of both operators and maintenance people, a lesson we must all learn from the Japanese.

Performing maintenance on the equipment is not the sole responsibility of the maintenance department; it should be a shared responsibility of operations and maintenance . . . . .

LESSON # 4 ON RELIABILITY
In a REACTIVE ENVIRONMENT, we always complain that we lack the manpower to address failures, but once the equipment starts to improve we wonder where that manpower had been in the first place . . .

In reality, maintenance is not outnumbered; they are just too busy working on breakdowns. Maintenance is measured not by how fast we repair but by how well we are able to eliminate the failure itself.

LESSON # 5 ON RELIABILITY
Every failure has a specific set of consequences. Being PROACTIVE is about reducing the consequences of failure to a minimum, or eliminating them, rather than completely eliminating the failure itself . . . .

The best maintenance strategy to adopt will always have to be based on the consequences of the failure itself. The first thing to ask in the event of a failure is: what are the consequences of the failure if it occurs on its own, and will the failure be acceptable to the user or not . . . .

LESSON # 6 ON RELIABILITY
Asking why industry remains reactive may lead to a thousand reasons or more. Those who fear that improving reliability may lead to the elimination of jobs are right only to the point where they resist change. Increasing reliability is not achieved by cutting manpower, nor are the two contrasting goals. Increasing reliability means slowly getting out of the repair business so that new doors will open to the maintenance function.
The best positions in industry always belong to the maintenance function; however, most industries groom their people to be mechanics rather than maintenance people. Always be proud that you belong to the maintenance function . . . .

POSITIONS IN MAINTENANCE

Maintenance Positions :
Vibration Analyst
Thermographer
Ultrasonic Analyst
Technical Trainer
Oil / Lube Analyst
Reliability Expert
Spare Parts Manager
Tribologist
Fractographer
CMMS Specialists
Preventive Maintenance
Failure Analyst

LESSON # 7 ON RELIABILITY
The real mission of the maintenance department is to provide reliable physical assets and excellent support for its customers by reducing and eliminating the need for maintenance. Do not confuse maintenance with repair; the two are entirely different. The distinction between a true-blooded maintenance person and a mechanic is that the maintenance person uses his brain more than his hands, while a mechanic uses his hands most of the time. Let us treat our people as maintenance people and not as mere mechanics.

LESSON # 8 ON RELIABILITY
There is no silver-bullet program or strategy that can transform a plant's reliability overnight. Everything starts with the basic foundation, which is EDUCATION, the most powerful weapon for changing the mindset of our people.

Reliability is not a program with an end but a culture without an end; it is the same as any continuous improvement philosophy . . . .

LESSON # 9 ON RELIABILITY
Always remember that in any Reliability Improvement Initiative, the focus must be on the people. Provide them with the skills they need, and these skills will be used to improve their equipment. People improve their machines, and not the other way around.

The saying that a company's greatest asset is its people is not always true in the real world of manufacturing. What is correct is that the right people are the company's greatest asset. There are people who want to learn and there are people who never learn.
