Root Cause Analysis Methodology in
Troubleshooting of Pump Failures
Mubashshir Mirza
1
Mubashshir Mirza Mubashshir Mirza works as a Staff Maintenance Engineer for Husky
Energy at the Lloydminster Upgrader. He has over 25 years of experience
working mainly in the areas of Rotating Machinery, both at operating
plants and EPC. He is a registered Professional Engineer with APEGA and
APEGS and holds a bachelor’s degree in Mechanical Engineering. He has a
professional background working in the areas of steam turbines,
centrifugal and reciprocating compressors and pumps, trouble shooting
and root cause failure analysis.
Prior to joining Husky, he worked at Petro-Canada’s Lubricants Refinery
as Rotating Equipment Specialist for 10 years.
Presenter
2
Agenda
3
1. Definitions
2. Root Cause Analysis Methodology
3. Common Root Causes
4. RCA Tree
5. RCA Case Studies
Incident
An unusual or unexpected event or emergency,
which either resulted in, or had the potential to
injure people, adversely impact the
environment, damage property or assets,
interrupt process operations or negatively affect
the company’s reputation. Usually involves an
energy source and the release of energy or the
occurrence of an action.
4
Root Cause
A root cause is an underlying cause (physical,
human or latent) of an incident and should be
permanently eliminated to prevent incident
reoccurrence.
5
Root Cause Analysis
Root cause analysis is an approach for
identifying the underlying causes (physical,
human and latent roots) of an incident so that
the most effective solutions can be identified
and implemented to prevent incident
recurrence.
The goal is to find a cure – not just treat the
symptoms
6
Root Cause Analysis
Why
• Learn from our mistakes and educate others
• To know what failed, which is usually is
obvious, but not always
• To know what led up to the failure; what
sequence of events had to line up
7
Physical Roots
Are related to the physics of the incident (how
the incident / failure occurred)
e.g.
• Fatigue
• Erosion
• Corrosion
8
Human Roots
Stem from Decision Errors (actions or inactions)
that trigger the physical roots to surface (what
error was committed)
e.g.
• Purchased poor quality material
• Procedures not followed
• OEM recommendations not followed
9
Latent Roots
Stem from Organizational or Management
System Flaws (why the human made the error)
e.g.
• Training deficiencies
• Policy and Procedure deficiencies
• Paradigms or beliefs
10
RCA Methodology
RCA Process Steps:
1. Preserve Event Information – Parts, Position,
People, Paradigms and Paper (The Five Ps)
2. Order the Analysis Team
3. Analyze – Describe the Event, Describe Modes,
Hypothesize, Verify Hypothesis,
Determine and Verify Physical Roots,
Determine and Verify Human Roots,
Determine and Verify Latent Roots
4. Communicate Findings and Recommendations
– in the form of RCA reports and Bulletins
5. Track the Results
11
RCA Methodology
12
PRESERVE
1. Parts – Failed components, product samples
2. Position – Pictures of failure
3. People – Interview personnel involved with the
failure, Operator logs
4. Paradigms – Repetitive themes/common
mindset
5. Paper – Drawings, failure reports, repair
reports, procedures, manuals
RCA Methodology
13
ASSEMBLE ANALYSIS TEAM
DRAW IN ADEQUATE RESOURCES
• Principal analyst/facilitator
• Operator
• Tradesman
• Subject matter expert
• Other stakeholders
DEVELOP CHARTER
RCA Methodology
14
ANALYZE
• Define event (Why we care)
• Define failure modes (How did…)
• Brainstorm and verify hypotheses (How can..)
• Identify root causes
• Generate recommendations to overcome root
causes
RCA Methodology
15
COMMUNICATE
• Check if the recommendations apply to
other assets in the organization
• Report and bulletin is provided to incident
owner
• Present recommendations and findings to
review team; typically bulletin is shared site
wide
RCA Methodology
16
TRACK
• Verify that recommendations are
implemented to ensure execution
• RCA tracking spreadsheet
• Bottom line impact of implemented
recommendations
Common Root Causes
Human & Latent Roots
• Operating outside design conditions (Human Root)
Incorrect operating and maintenance procedures
(Latent Root)
• Deficient designs (Human Root)
Design criteria does not meet plant needs (Latent Root)
• Lack of inspections (Human Root)
No defined inspection interval/inspection interval too long
(Latent Root)
• Poor documentation (Human Root)
Documentation requirements for MOCs, EWRs, etc. not well
defined (Latent Root)
17
RCA Tree
18
RCA Tree
19
RCA Tree
20
Root Cause Analysis Work Flow
21
RCA Tools
•Five “Whys”
•Reason Pro
•Fishbone
•Cause and Effect Analysis
•Fault Tree Analysis
•KT (Kepner Tregoe)
•TapRoot® or Equifactor®
•PROACT®
22
RCA Case Study 1
23
High thrust bearing temperatures upon
startup of a pump
Incident: Operations attempted several times to
put the pump back on line; however, was forced to
shut down due to observed high thrust bearing
temperatures.
Physical Root Findings
24
• The Physical Root cause of these higher thrust
bearing temperatures was found to be the RTD
change incorporated in the thrust pads of the
bearings.
• The temperatures as measured by the old style
temperature device were far lower than the new
embedded type.
• This was verified by pump OEM and the bearing
manufacturer
• When the change was implemented on the sister
pump, similar rise of bearing temperatures
observed.
Bayonet Type MTD
Existing Temperature Measurement Device was Bayonet Type
25
Embedded RTD Thrust Pad
26
Embedded RTD
Changed Temperature Measurement Device to Embedded
RTD
Human and Latent Roots Findings
27
• The Human Root was that change was not carried
out following the proper MOC process.
• The Latent Root was that OEM failed to inform
that higher temperatures will be observed upon
changing to embedded type RTDs from existing
Bayonet Type (A paradigm belief that client knows
and will adjust the settings)
Recommendations
28
• Set the temperature alert and danger limits
higher to accommodate changed bearing RTD
conditions
• Follow proper Management of Change (MOC)
business process for any changes in equipment
RCA Case Study 2
29
Bearing Failures
Incident:
Subsequent to major overhaul of a pump by a
pump vendor, the pump installed and started with
high drive end bearing temperatures and high
vibrations. Pump taken off line and inspection of
Drive End bearing revealed babbit wipe off on all
five pads.
Physical Root Findings
30
• The Physical Root cause of the bearing failure was
found to be insufficient clearance between the
bearings and the bearing housing.
• The condition of the bearing housing at the time
of the failure was such that there was an
interference fit between the drive end bearing
and the spherical bore of the bearing housing.
• This housing crush would prevent the bearing
from self-aligning, resulting in clearance loss
(bearing ID to shaft), rubbing and bearing failure.
Self-Aligning Tilt Pad Spherical Seat Journal Bearing
31
Damaged Babbit Due to Rubbing and Clearance Loss
32
Damaged Babbit
Human and Latent Roots Findings
33
• The Human Root was that repair checks to
compare required versus as left interference fit
were not recorded.
• The Latent Root was that the Vendor failed to
establish fit diameter checking criteria /
procedure
Recommendations
34
• Include bearing fit checks in the repair scope
• Pump vendor to develop clear written step-by-
step procedure and record required versus as left
fits
Summary
35
• Root cause analysis helps to identify underlying
causes of an incident so the most effective
solution can be implemented
• Five steps are taken when completing a root
cause analysis
• Completing RCA lets us learn from our mistakes
while identifying what failed and what led to the
failure to prevent recurring incidents
Questions?
36

Root Cause Failure Analysis Methods for Pump Failures

  • 1.
    Root Cause AnalysisMethodology in Troubleshooting of Pump Failures Mubashshir Mirza 1
  • 2.
    Mubashshir Mirza MubashshirMirza works as a Staff Maintenance Engineer for Husky Energy at the Lloydminster Upgrader. He has over 25 years of experience working mainly in the areas of Rotating Machinery, both at operating plants and EPC. He is a registered Professional Engineer with APEGA and APEGS and holds a bachelor’s degree in Mechanical Engineering. He has a professional background working in the areas of steam turbines, centrifugal and reciprocating compressors and pumps, trouble shooting and root cause failure analysis. Prior to joining Husky, he worked at Petro-Canada’s Lubricants Refinery as Rotating Equipment Specialist for 10 years. Presenter 2
  • 3.
    Agenda 3 1. Definitions 2. RootCause Analysis Methodology 3. Common Root Causes 4. RCA Tree 5. RCA Case Studies
  • 4.
    Incident An unusual orunexpected event or emergency, which either resulted in, or had the potential to injure people, adversely impact the environment, damage property or assets, interrupt process operations or negatively affect the company’s reputation. Usually involves an energy source and the release of energy or the occurrence of an action. 4
  • 5.
    Root Cause A rootcause is an underlying cause (physical, human or latent) of an incident and should be permanently eliminated to prevent incident reoccurrence. 5
  • 6.
    Root Cause Analysis Rootcause analysis is an approach for identifying the underlying causes (physical, human and latent roots) of an incident so that the most effective solutions can be identified and implemented to prevent incident recurrence. The goal is to find a cure – not just treat the symptoms 6
  • 7.
    Root Cause Analysis Why •Learn from our mistakes and educate others • To know what failed, which is usually is obvious, but not always • To know what led up to the failure; what sequence of events had to line up 7
  • 8.
    Physical Roots Are relatedto the physics of the incident (how the incident / failure occurred) e.g. • Fatigue • Erosion • Corrosion 8
  • 9.
    Human Roots Stem fromDecision Errors (actions or inactions) that trigger the physical roots to surface (what error was committed) e.g. • Purchased poor quality material • Procedures not followed • OEM recommendations not followed 9
  • 10.
    Latent Roots Stem fromOrganizational or Management System Flaws (why the human made the error) e.g. • Training deficiencies • Policy and Procedure deficiencies • Paradigms or beliefs 10
  • 11.
    RCA Methodology RCA ProcessSteps: 1. Preserve Event Information – Parts, Position, People, Paradigms and Paper (The Five Ps) 2. Order the Analysis Team 3. Analyze – Describe the Event, Describe Modes, Hypothesize, Verify Hypothesis, Determine and Verify Physical Roots, Determine and Verify Human Roots, Determine and Verify Latent Roots 4. Communicate Findings and Recommendations – in the form of RCA reports and Bulletins 5. Track the Results 11
  • 12.
    RCA Methodology 12 PRESERVE 1. Parts– Failed components, product samples 2. Position – Pictures of failure 3. People – Interview personnel involved with the failure, Operator logs 4. Paradigms – Repetitive themes/common mindset 5. Paper – Drawings, failure reports, repair reports, procedures, manuals
  • 13.
    RCA Methodology 13 ASSEMBLE ANALYSISTEAM DRAW IN ADEQUATE RESOURCES • Principal analyst/facilitator • Operator • Tradesman • Subject matter expert • Other stakeholders DEVELOP CHARTER
  • 14.
    RCA Methodology 14 ANALYZE • Defineevent (Why we care) • Define failure modes (How did…) • Brainstorm and verify hypotheses (How can..) • Identify root causes • Generate recommendations to overcome root causes
  • 15.
    RCA Methodology 15 COMMUNICATE • Checkif the recommendations apply to other assets in the organization • Report and bulletin is provided to incident owner • Present recommendations and findings to review team; typically bulletin is shared site wide
  • 16.
    RCA Methodology 16 TRACK • Verifythat recommendations are implemented to ensure execution • RCA tracking spreadsheet • Bottom line impact of implemented recommendations
  • 17.
    Common Root Causes Human& Latent Roots • Operating outside design conditions (Human Root) Incorrect operating and maintenance procedures (Latent Root) • Deficient designs (Human Root) Design criteria does not meet plant needs (Latent Root) • Lack of inspections (Human Root) No defined inspection interval/inspection interval too long (Latent Root) • Poor documentation (Human Root) Documentation requirements for MOCs, EWRs, etc. not well defined (Latent Root) 17
  • 18.
  • 19.
  • 20.
  • 21.
    Root Cause AnalysisWork Flow 21
  • 22.
    RCA Tools •Five “Whys” •ReasonPro •Fishbone •Cause and Effect Analysis •Fault Tree Analysis •KT (Kepner Tregoe) •TapRoot® or Equifactor® •PROACT® 22
  • 23.
    RCA Case Study1 23 High thrust bearing temperatures upon startup of a pump Incident: Operations attempted several times to put the pump back on line; however, was forced to shut down due to observed high thrust bearing temperatures.
  • 24.
    Physical Root Findings 24 •The Physical Root cause of these higher thrust bearing temperatures was found to be the RTD change incorporated in the thrust pads of the bearings. • The temperatures as measured by the old style temperature device were far lower than the new embedded type. • This was verified by pump OEM and the bearing manufacturer • When the change was implemented on the sister pump, similar rise of bearing temperatures observed.
  • 25.
    Bayonet Type MTD ExistingTemperature Measurement Device was Bayonet Type 25
  • 26.
    Embedded RTD ThrustPad 26 Embedded RTD Changed Temperature Measurement Device to Embedded RTD
  • 27.
    Human and LatentRoots Findings 27 • The Human Root was that change was not carried out following the proper MOC process. • The Latent Root was that OEM failed to inform that higher temperatures will be observed upon changing to embedded type RTDs from existing Bayonet Type (A paradigm belief that client knows and will adjust the settings)
  • 28.
    Recommendations 28 • Set thetemperature alert and danger limits higher to accommodate changed bearing RTD conditions • Follow proper Management of Change (MOC) business process for any changes in equipment
  • 29.
    RCA Case Study2 29 Bearing Failures Incident: Subsequent to major overhaul of a pump by a pump vendor, the pump installed and started with high drive end bearing temperatures and high vibrations. Pump taken off line and inspection of Drive End bearing revealed babbit wipe off on all five pads.
  • 30.
    Physical Root Findings 30 •The Physical Root cause of the bearing failure was found to be insufficient clearance between the bearings and the bearing housing. • The condition of the bearing housing at the time of the failure was such that there was an interference fit between the drive end bearing and the spherical bore of the bearing housing. • This housing crush would prevent the bearing from self-aligning, resulting in clearance loss (bearing ID to shaft), rubbing and bearing failure.
  • 31.
    Self-Aligning Tilt PadSpherical Seat Journal Bearing 31
  • 32.
    Damaged Babbit Dueto Rubbing and Clearance Loss 32 Damaged Babbit
  • 33.
    Human and LatentRoots Findings 33 • The Human Root was that repair checks to compare required versus as left interference fit were not recorded. • The Latent Root was that the Vendor failed to establish fit diameter checking criteria / procedure
  • 34.
    Recommendations 34 • Include bearingfit checks in the repair scope • Pump vendor to develop clear written step-by- step procedure and record required versus as left fits
  • 35.
    Summary 35 • Root causeanalysis helps to identify underlying causes of an incident so the most effective solution can be implemented • Five steps are taken when completing a root cause analysis • Completing RCA lets us learn from our mistakes while identifying what failed and what led to the failure to prevent recurring incidents
  • 36.