
Lean RW Final.

qxp 4/16/2007 10:00 AM Page 290

Lessons from the Crime Scene: Evidence Preservation during Equipment Troubleshooting
By: Ken Reed of System Improvements, Inc.

Introduction
Law enforcement organizations have this down to a science. Arrive at any crime scene, and you’ll find yourself immediately in the midst of a flurry of activity. After the Generic Food Mart is burgled, the area is roped off, witnesses are gathered together and segregated from other onlookers, fingerprints are being lifted, and suspects may already be in custody. More cops are there to guard the area from accidental or purposeful intrusion.

The amount of resources expended on a major (or even many a minor) crime scene can be truly mind-boggling. You’ll find the team leader, who directs general responsibilities. The photographer documents visual evidence, a sketch artist takes descriptions and draws the crime scene, and a number of officers guard the area. Investigators interview people at the scene, while more patrolmen canvass the local residents for more data. Specially trained evidence-gathering personnel process the evidence and ensure the documentation is foolproof. Investigators immediately start researching the backgrounds of suspects, looking for clues in past history.

Why is this immediate effort so massive? When actually analyzed, the number of man-hours invested, equipment expended and depreciated, and the inter-departmental coordination required add up to a hefty wad of cash that the taxpayers must pony up. Of course, this must have been determined to be appropriate, or local law-enforcement efforts would be shut down. Is this initial level of investigation really necessary? In fact, why not wait a few days for everyone to calm down and let the emotions die off? After all, we are hurting the business owner by restricting access to the shop, bothering his customers, even appropriating pieces of his store or inventory. Let him get back on his feet. What makes this worth the effort?

The reason this is acceptable is that there is really no other method available that can reliably produce the required results. If the photographer were not there, there would be no record of the actual environment at the scene. Evidence that is not quickly and accurately recorded will be lost or modified, with no hope of retrieval. We could wait to begin researching background information, but this will just prolong the successful completion of the investigation beyond reasonable time limits. Sweeping up and throwing away the broken glass gets the business up and running, but for how long? Without this process in place, the crime is almost guaranteed to happen again. The stricken store may install bars on the windows, but the criminal still at large will just find another way in, or move on to the next store down the street.

The process of determining the cause of an equipment malfunction can often seem as daunting as a major crime scene investigation. It often appears to require expert knowledge about how the equipment was operated, how it was installed, the original design specs, changes in the environment, how it was actually being used, etc. Luckily, with just the right combination of repair expertise, root cause analysis, and corrective action implementation, the process does not necessarily have to be harder to get more productive and lasting results. The right systems have usually already been purchased and put in place at most production facilities to get the data required for an accurate and detailed failure analysis. Unfortunately, the employment of these resources is not always optimum. A smarter approach to the gathering of evidence, the correct interpretation of what that evidence is telling you, and the judicious application of corrective actions will put those expensive monitoring systems to work for you.

The evidence gathering process
Most companies already have many systems in place that can help the troubleshooter narrow down his focus, but oftentimes the data is no longer available. The act of repairing the gear has already modified, moved, or destroyed key pieces of evidence. Although the failure appeared to be minor at first, these data points can be crucial to finding an actual root cause of equipment damage. Where do you get the evidence you need to determine the actual root cause of the failure?

A good place to start is with the equipment operators. How often have you heard (AFTER the gear is down), “Oh, yeah, it’s been doing that for a while,” or “It’s always been that way”? This can be one of the most frustrating times in the life of the maintenance manager: listening to an operator describe in detail the telltale signs that his gear is about to fail. However, at this point in the failure analysis, this is just INFORMATION TO BE GATHERED. The fact that the operator did not inform anyone about the previous abnormalities is yet another data point. Again, this is only data that can be used later for root cause analysis and corrective actions. Do not draw any conclusions at this time.

Some companies have trained their operators to immediately document the conditions encountered at the time of a failure. The data is often written on a standard form or in the operator’s log using an approved format. In either event, the report should include some basic information:
• Time and date
• The initial indication of the failure (loud vibration, initial alarm, etc.)
• Operator’s name
• Operation being performed (start-up, shutdown, capacity test, etc.)
• Any alarms, indicators, warnings, or other installed indications, including pressure and temperature of the process

290 2007 Conference Proceedings



/ Reliability Engineering

• Environmental conditions (e.g., air conditioning secured for 3 hours)
• Physical conditions noted (smoke, noise, smell, hot to touch)
• Actions taken in response to the failure

This data must be captured as soon as the failure is recognized and any required immediate actions are completed.

The operator may be one of your best sources of information, but here caution is required. Although he may have the data:
• He may not know that he has it. You may have to ask the right questions to get the information you are looking for.
• He may think he has it. In reality, he may have misinterpreted an indication, missed another indicator, or simply not understood what he saw.
• He may not want you to have it. This is an angle on the investigation that I will not focus on at this point. Just be aware that the motivations of the people you are interviewing for data may not be known, and the answers you get may or may not match up with what really happened.

Equipment monitoring records and recordings contain a wealth of information. Vibration monitoring recordings, thermal images, and oil analysis results can all be used to determine the timeline of events leading up to the failure. You may not know what to do with the data yet, but have it available and ready for further scrutiny.

Machinery history and repair records are invaluable. These records can be on paper or in electronic format. They can be used to discover long-term trends in equipment operational status and for downtime analysis. Has this happened before? What caused it that time? How did we fix it last time? Did that fix work?

At this point, the usefulness of these records is established by past maintenance practices. An entry in these records that says (more or less), “Process pump #3 down due to pump failure” is much less useful than, “Process pump #3 secured (run hours 2910). Smoke noted issuing from mechanical seal upon initial start-up. Discovered clogged flush line. Line cleared, flow verified, seal replaced and retested.” The second entry contains a wealth of information that supports a much better analysis of why the pump failed, rather than just a single failure data point. This entry would probably take the maintenance supervisor an extra 3 minutes to complete.

When should entries be made in the machinery history log? Best practice is to make a minimum of 2 entries: one immediately at the initial failure, and one following repair and retest. If further indications were found, special troubleshooting methods were employed, or the troubleshooting was very complex, more entries can be made as required. Bottom line: for electronic recording systems, there cannot be too much data. Paper systems may require a more judicious use of space to prevent an unmanageable clutter, but can still contain a good amount of information.

Another important information resource is the broken piece of equipment itself. It is critical that the troubleshooter look at the failed part to determine not only what broke, but how it broke. The failure mode and failure agents must be determined to find and eliminate the actual cause of the failure.

Sequencing the analysis
The sequence of the data-gathering steps is actually fairly important. The operator should immediately write down his indications. The troubleshooter should talk to the operator early on to get his thoughts while they are still fresh in his mind. But when can equipment repair begin? After all, working in parallel to find the cause, while simultaneously preparing for the repair, just seems like good sense. However, this is where an enormous amount of information is often lost, destroyed, or altered. The following example illustrates how working ahead of the analysis can lead to frustrating rework.

Example
A power plant was having its entire main condensate system overhauled. New piping was being installed, and the condensate pumps were to be rebuilt. Work began on the system by removing the pumps and hauling them to the pump shop for refurbishment. Piping in the system was cut out and replaced to correct below-spec minimum wall thicknesses.

The pumps were spec’ed out, rebuilt, and hydrostatically tested in the shop. No issues were found.

Two months after their removal, the pumps were re-installed in the system. The system was filled, vented, and tested one pump at a time. After running for 20 hours, the lower pump bearing failed, as indicated by excessive vibration.

The pump was removed from the system and inspected. The lower pump bearing was found to have failed. The bearing was replaced and the pump re-installed. Twelve hours after start-up for run-in, the bearing again failed.

This time, the ace pump re-builder was called in. Obviously, someone was not installing the bearing correctly. He had been doing this for years, and would make sure the job was done correctly this time. He personally supervised the rebuilding and retesting of the pump. It was run on a test fixture for 80 hours, with all vibration measurements well within spec. Everything looked fine from his perspective. He saw nothing that he recognized as a problem from his experience.

The pump was again re-installed and retested. The bearing failed for the 3rd time after 20 hours of operation. Each bearing replacement cost over $23,000 for just parts and labor. So far, this equated to nearly $70,000, not including the slip in delivery date, the extra time and effort expended by the expert pump supervisor, and the extensive pre-installation vibration testing on the third go-around. Unfortunately, the pump was in worse condition than before the overhaul.
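As a rough check on the figures in the example, the sketch below tallies the three bearing failures. The run hours come from the text. The $23,000 per replacement is treated as a floor, since each replacement cost "over" that amount. The class and function names, and the two-occurrence threshold for calling a failure recurring, are illustrative assumptions rather than part of any maintenance system described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BearingFailure:
    occurrence: int    # 1st, 2nd, or 3rd failure in the example
    run_hours: int     # hours of operation before the bearing failed
    min_cost_usd: int  # lower bound: the text says "over $23,000" each


# Values taken from the example; 23_000 is a floor, not an exact figure.
FAILURES = [
    BearingFailure(occurrence=1, run_hours=20, min_cost_usd=23_000),
    BearingFailure(occurrence=2, run_hours=12, min_cost_usd=23_000),
    BearingFailure(occurrence=3, run_hours=20, min_cost_usd=23_000),
]


def minimum_rework_cost(failures):
    """Sum the lower-bound parts-and-labor cost across the repeat failures."""
    return sum(f.min_cost_usd for f in failures)


def is_recurring(failures, threshold=2):
    """Assumed rule of thumb: the same failure seen `threshold` or more times."""
    return len(failures) >= threshold


total = minimum_rework_cost(FAILURES)
print(f"Failures: {len(FAILURES)}, minimum parts and labor: ${total:,}")
print(f"Recurring failure: {is_recurring(FAILURES)}")
```

Three replacements at a minimum of $23,000 each give at least $69,000, which is consistent with the "nearly $70,000" quoted above, before any schedule slip or supervision costs are counted.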




This is just an isolated incident, right? Recently, a company was having problems with the mechanical seals on 2 alkali process pumps at their plant. They replaced the seals 23 times in a 12-month period. In addition to the $500,000 in lost production, the last replacement resulted (due to a lockout/tagout error) in an explosion with personnel injury and over $12 million in facility damage.

Finding the culprit
From this example, with the data you have been given, the cause of the bearing failure will not be obvious. Even the expert is left scratching his head. How do you go about finding the cause of this type of failure?

The sequence of evidence gathering listed above was followed for all three bearing failures. Obviously, there must be something else going on that even the “pump guru” was not aware of or hadn’t thought of. What do you do?

This facility fell into one of the traps that many companies stumble into. Repairs were commenced before the failure analysis was complete. Companies want to get ahead and disassemble the pump, but this can lead to the disruption (or destruction) of evidence needed to determine the cause.

But, wait a minute. We determined earlier that one of the most important pieces of the puzzle is the failed component. How can we analyze the bearing if we don’t first disassemble the pump? We seem to need to know the possible causes before we even start the disassembly!

This is a great question. It runs to the core of why many troubleshooting and repair scenarios end with a rework of the same failure.

The Right Way
Let’s walk through the above example again, using what we’ve discussed. The first step in any root cause analysis should be to diagram out exactly what happened. This can be done using a dry-erase board or specialty software like TapRooT®, and would look something like Figure 1.

Figure 1. Initial SnapCharT®.

By using this system, a timeline is set up with all the known data incorporated into an easy-to-understand format. It may be tempting to skip this part (“I know what happened!”), but this is a crucial step in understanding exactly what happened when.

Now, we need to start trying to look for the cause of the failure. This is not easy. We are tempted to start ripping and tearing into the gear, looking for obvious problems and setting up for the repair work. This is the normal practice, but again, this can be self-defeating. As disassembly continues, our technicians may not know what to look for. They are doing the best they can, but generally they may not have the expertise or the guidance to look for the right thing. In this example, the shop removing the pump is the rigging shop. They are good at what they do, but they are not pump rebuilding experts by any stretch of the imagination.

Before disassembly, you must have a list of probable failure modes. These can be obtained from many sources. Previous troubleshooting and repair records are a great resource for recurring failures (although the fact that they are recurring should catch your attention). If you are lucky enough to have that expert on-site, use him. There may even be troubleshooting tables available that can give you guidance as to where to start looking.

In this example, the possibilities you have put together may include:
a. A bent shaft
b. Distorted pump casing due to piping strain
c. Pump / motor misalignment
d. Pump imbalance
e. Motor imbalance
f. Motor electrical problems

You can eliminate many of these causes right away (the pump had been verified in balance, the shaft was not bent, etc.). The possible remaining causes are now known, and valuable data can be brought to the job site to find the actual cause. You now know the right questions to ask during the equipment teardown:
1. Is there a misalignment between the pump and motor?
2. Is there casing distortion due to excessive pipe strain?

At this point, you can continue the investigation just like any other. Since you know what to ask, you know what to look for. You can go to the job site and gather the extra data that you need. In this case, before the pump is unbolted from the foundation, you notice the riggers are connecting chain falls to the discharge piping and the pump. When questioned, the riggers tell you that it took chain falls to get the piping aligned during installation, and there will be quite a bit of tension as the flange bolts are loosened.

The Root Cause?
Our timeline would now look something like Figure 2:

Figure 2. Final SnapCharT®.

We found the root cause!! Those mechanics obviously don’t know what they’re doing and are flexing the pipe (and the pump casing) too much. Tell those mechanics to line it up right next time!

Sound reasonable? Of course not. Unfortunately, this is the type of response that is heard over and over again throughout industry. “Tell those guys to be more careful.” This has the same effect as telling your son (after he runs over the mailbox) to drive more carefully in the future. You’ll get a half-hearted “OK,” and nothing has changed. These are not the root causes. After completing the investigation and fully analyzing the problems, several root causes may be found. For example:
1. The prints used to fabricate the piping contained a typographical error, causing the incorrect piping length to be used.
2. Riggers were not trained on the correct method of rigging pumps.
3. A procedure for rigging the pump was already written, but it was buried in the notes section of the piping print.
4. No audits had ever been conducted on rigging large pumps and valves into position.
5. Supervisors were not available during the rigging.
6. The personnel in the pump shop did not communicate effectively with the riggers.
7. After the first failure, there was no process in place to determine the actual root cause. (In actuality, this incident was discovered by an independent supervisor, working another job, who happened to watch the riggers install the chain falls.)

Corrective Actions
This is another point in the incident investigation process that often fails. Corrective actions must now be assigned that are meaningful and achievable, with measurable results. For example, it does no good to tell the workers to be more careful. Each of the root causes must be addressed on its own merit, with corrective actions assigned, carried through, and audited.

Best Practice
Who has time for this type of analysis? In reality, all best-in-class companies have found the time. The time spent properly following up on equipment failures is rarely wasted time. In fact, the savings are twofold. In this particular case, the time spent conducting a proper equipment failure analysis would have saved the shipyard the 3 weeks and over $150,000 in delays after the first bearing failure. In addition, if the corrective actions are not implemented, this same issue is almost guaranteed to happen again, causing repeat equipment failures and delays further down the road.

Unfortunately, this scenario is not an isolated case. Every plant has at least one of these stories to tell. Not every plant can say they have come up with a proven system that has averted further repeat problems. Studies have shown that industry is not meeting the best practice mix of maintenance resource strategies. Table 1 illustrates the gulf between world-class companies and the average manufacturing facility.

Table 1. Best practice comparison.

Summary
Industry is spending large sums of money on predictive maintenance systems that tell it WHEN the gear is about to fail, but none of these systems can tell you WHY. It is up to the trained investigator, with the right tools, to avoid the costly repeat failures that continue to plague the manufacturing field.

By ensuring failures are understood and fixed right the first time, enormous amounts of time, effort, and money can be saved, allowing your production lines to remain operating at peak capacity.

