Root Cause Analysis
Investigating causes of failures & mishaps
Stop and ask yourself…
Did you really find the causes of the failure?
Like icebergs, most of the problem is usually below the surface!
This is NOT Root Cause Analysis
Technical Proficiency 
• Once the accident happened, how did Gene Kranz rely on the skills and expertise of his people?
• How did Lovell work to initiate actions in the spacecraft? Was he able to balance that with his technical responsibilities in the craft? How did he do it?
• What steps does your unit take to maintain Technical Proficiency?
Lessons from Apollo 13
Teambuilding 
• How did Lovell contribute to the group process when Mattingly wanted to practice the docking procedure again after 3 hrs of practice?
• When Kranz had the team in the classroom, how did he establish the goal, and then how did he go about motivating others to achieve the goal of returning the spacecraft safely to Earth?
• Did Lovell make the right call when faced with the challenge of forcing Mattingly to stay behind because of the fear of measles?
• How does a leader successfully build a strong team, but then separate himself or herself from the Team to make a critical decision?
• How’s your Team doing?
Effective Communications 
• Even as everything is breaking loose in Mission Control, Gene Kranz asks his team to “Work the Problem.” He then listens to the experts report in on their areas of the mission. How did his effective communication set the stage for a successful recovery?
• Kranz stated “Failure is not an option” and Lovell told his crew “I intend to go home.” How did clearly stating their ideas and vision direct the teams toward mission accomplishment?
• Who’s the best communicator you’ve ever worked with? What made them excel?
Vision Development & Implementation 
• JFK’s Vision: “I believe that this nation should commit itself to achieving the goal, before this decade is out, of landing a man on the moon and returning him safely to Earth.”
• How does a stated vision focus the unit and bring the crew together?
• Lovell states: “Columbus, Lindbergh, and Armstrong; it is not a miracle for man to walk on the moon, we just decided to go.”
• What’s the vision at your unit? Has everyone decided “to go”? What can your unit do to get everyone “on board”?
Conflict Management 
• How did Lovell deal with stress and conflict in the LEM?
• How did the CO2 challenge help the crew to overcome the conflict they were experiencing?
• Is there more or less conflict when people are busy and focused or when there is less to do and folks have time on their hands? Why?
• How did Kranz and Lovell go about alleviating conflict between the crew and the medical team?
Decision Making & Problem Solving
• How did the Team live the Competency of Decision Making and Problem Solving in working the “Power” problem to conclusion?
• Right after the explosion, Kranz asks Mission Control, “What do we have on the spacecraft that’s good?”
• Why did he ask this question?
• How did it aid in making the correct decision to shut down the fuel cells?
• Does everyone on your Team ensure that the decision makers have all the available and correct information? Why or why not?
Creativity and Innovation 
• We’ve discussed a lot of positive leadership qualities during this session. How did Gene Kranz create an environment in which his Mission Control team was able to solve the CO2 problem of fitting a “Square Peg in a Round Hole”?
• Lovell states at the end of the movie: “Thousands of people worked to bring the 3 of us back home.” How did creativity and innovation make the “Successful Failure” a reality?
• How does your unit build on Lessons Learned?
Apollo 13 
Questions on homework
Investigating causes of failures & mishaps
When performing an investigation, it is necessary to look at more than just the immediately visible cause, which is often the proximate cause. There are underlying organizational causes that are more difficult to see; however, they may contribute significantly to the undesired outcome and, if not corrected, will continue to create similar types of problems. These are root causes.
Mishap reporting and investigation requirements: all mishap investigations must identify the proximate cause(s), root cause(s), and contributing factor(s).
Definitions 
Proximate Cause(s) (Direct Cause) 
• The event(s) that occurred, including any condition(s) that 
existed immediately before the undesired outcome, directly 
resulted in its occurrence and, if eliminated or modified, would 
have prevented the undesired outcome. 
• Examples of proximate causes:
  Equipment: Arced, Leaked, Overloaded, Overheated
  Human: Pushed incorrect button, Fell, Dropped tool, Connected wires
Definitions
Root Cause(s)
• One of multiple factors (events, conditions, or organizational factors) that contributed to or created the proximate cause and subsequent undesired outcome and, if eliminated or modified, would have prevented the undesired outcome. Typically, multiple root causes contribute to an undesired outcome.
Organizational factors 
• Any operational or management structural entity that exerts control 
over the system at any stage in its life cycle, including but not limited 
to the system’s concept development, design, fabrication, test, 
maintenance, operation, and disposal. 
• Examples: resource management (budget, staff, training); policy 
(content, implementation, verification); and management decisions.
Definitions 
Root Cause Analysis (RCA) 
• A structured evaluation method that identifies the root causes for 
an undesired outcome and the actions adequate to prevent 
recurrence. Root cause analysis should continue until 
organizational factors have been identified, or until data are 
exhausted. 
• RCA is a method that helps professionals determine: 
• What happened. 
• How it happened. 
• Why it happened. 
• RCA allows learning from past problems, failures, and accidents.
Root Cause Analysis - Steps 
1. Identify and clearly define the undesired outcome (outage). 
2. Gather data. 
3. Create a timeline. 
4. Place events & conditions on an event and causal factor tree. 
5. Use a fault tree or other method/tool to identify all potential causes. 
6. Decompose system failures down to basic events or conditions (further describe what happened).
7. Identify specific failure modes (immediate causes).
8. Continue asking “WHY” to identify root causes.
9. Check your logic and your facts. Eliminate items that are not causes or contributing factors.
10. Generate solutions that address both proximate causes and root causes.
Root Cause Analysis - Steps 
Clearly define the undesirable outcome. 
• Describe the undesired outcome. 
• For example: “software failed to deploy,” “transaction failed,” or 
“XYZ project schedule significantly slipped.” 
Gather data. 
Identify facts surrounding the undesired outcome. 
• When did the undesired outcome occur? 
• Where did it occur? 
• What conditions were present prior to its occurrence? 
• What controls or barriers could have prevented its 
occurrence but did not? 
• What are all the potential causes? 
• What actions can prevent recurrence? 
• What amelioration occurred? Did it prevent further damage?
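The fact-finding questions above can be captured in a simple structured record so that unanswered questions are easy to spot. The Python sketch below is a minimal illustration; the `MishapFacts` class and its field names are hypothetical and simply mirror the questions on this slide.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class MishapFacts:
    """Hypothetical record mirroring the 'Gather data' questions."""
    undesired_outcome: str
    when: Optional[str] = None                # When did it occur?
    where: Optional[str] = None               # Where did it occur?
    prior_conditions: List[str] = field(default_factory=list)
    failed_barriers: List[str] = field(default_factory=list)   # controls that did not prevent it
    potential_causes: List[str] = field(default_factory=list)
    preventive_actions: List[str] = field(default_factory=list)
    amelioration: List[str] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        """Return the names of fields that are still empty (unanswered)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


facts = MishapFacts(
    undesired_outcome="Application failed to go live",
    prior_conditions=["Switch port in wrong VLAN"],
)
print(facts.open_questions())
```

Listing the open questions at each review keeps the investigation from moving on to causes before the basic facts are in.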
Root Cause Analysis - Steps 
Create a timeline (sequence diagram) 
• Illustrate the sequence of events in chronological order 
horizontally across the page. 
• Depict relationships between conditions, events, and exceeded 
or failed barriers/controls. 
[Figure: timeline schematic showing events, a condition, and an exceeded/failed barrier or control leading to the undesired outcome]
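A timeline can be built by sorting the recorded items by time and tagging each with its kind (event, condition, failed or exceeded barrier/control, undesired outcome). The sketch below prints a left-to-right text timeline; the `TimelineItem` type, the kind names, and the timestamps in the example are hypothetical, while the labels come from the example outage in this deck.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class TimelineItem:
    at: datetime   # when the item occurred or was observed
    kind: str      # "event", "condition", "barrier" or "outcome"
    label: str


def render_timeline(items: List[TimelineItem]) -> str:
    """Return the items in chronological order as one left-to-right line."""
    ordered = sorted(items, key=lambda item: item.at)
    return " -> ".join(f"[{item.kind}] {item.label}" for item in ordered)


# Hypothetical timestamps; recorded out of order on purpose
items = [
    TimelineItem(datetime(2024, 1, 1, 9, 5), "event", "Operating system started up"),
    TimelineItem(datetime(2024, 1, 1, 9, 0), "event", "Server powered up"),
    TimelineItem(datetime(2024, 1, 1, 9, 10), "outcome", "Application failed to go live"),
]
print(render_timeline(items))
# [event] Server powered up -> [event] Operating system started up -> [outcome] Application failed to go live
```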
Root Cause Analysis - Steps 
Create a timeline (sequence diagram) 
• If amelioration occurred (e.g., reboot server, move application to 
another server), this should be included in the evaluation to ensure that 
it did not contribute to the undesired outcome. 
Example: In the case of a server reboot, the investigation should determine whether the reboot was a result of the mishap or of latent hardware defects.
[Figure: the timeline schematic with an exceeded/failed amelioration step added]
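Once amelioration steps are on the timeline, a rough first filter (an assumption for illustration, not a rule stated in this deck) is to flag every step taken at or before the undesired outcome, since only those could have contributed to it. The data below is hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

# (timestamp, description) pairs
Action = Tuple[datetime, str]


def actions_to_review(ameliorations: List[Action], outcome_at: datetime) -> List[str]:
    """Return amelioration steps taken at or before the undesired outcome.

    Steps that happened before the outcome could have contributed to it,
    so they are flagged for closer review; later steps are not returned.
    """
    return [desc for at, desc in sorted(ameliorations) if at <= outcome_at]


outcome_at = datetime(2024, 1, 1, 10, 0)
ameliorations = [
    (datetime(2024, 1, 1, 10, 30), "Moved application to another server"),
    (datetime(2024, 1, 1, 9, 45), "Rebooted server"),
]
print(actions_to_review(ameliorations, outcome_at))  # ['Rebooted server']
```

Steps after the outcome still belong on the timeline; they are simply less likely to be causes.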
Root Cause Analysis - Steps 
Example: simple timeline. 
[Figure: simple timeline of the example outage: Server powered up; Operating system started up; Switch port in wrong VLAN; Application failed to go live; Technician used wrong method to correct; Lost transactions (penalties paid)]
Root Cause Analysis - Steps 
Create an event and causal factor tree. 
(A visual representation of the causes that led to the failure or mishap.) 
• Place the undesired outcome at the top of the tree. 
• Add all events, conditions, and exceeded/failed barriers that occurred immediately 
before the undesired outcome and might have caused it. 
[Figure: initial event and causal factor tree for the example, with nodes: Lost transactions (penalties paid); Application failed to go live; Technician used wrong method to correct; Operating system started up; Server powered up; Switch port in wrong VLAN]
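An event and causal factor tree maps naturally onto a recursive node type: the undesired outcome is the root and each event, condition, or failed barrier that might have caused it is a child. The `CauseNode` class below is hypothetical, and although the labels come from the example slides, the parent-child links are only an illustrative arrangement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CauseNode:
    label: str
    kind: str = "event"      # "outcome", "event", "condition" or "barrier"
    children: List["CauseNode"] = field(default_factory=list)

    def add(self, label: str, kind: str = "event") -> "CauseNode":
        """Attach a possible cause beneath this node and return it."""
        child = CauseNode(label, kind)
        self.children.append(child)
        return child

    def render(self, depth: int = 0) -> str:
        """Indented text view: the undesired outcome at the top, causes below."""
        lines = ["  " * depth + f"{self.label} ({self.kind})"]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


# Illustrative arrangement of labels from the example slides
outcome = CauseNode("Lost transactions (penalties paid)", "outcome")
app = outcome.add("Application failed to go live")
outcome.add("Technician used wrong method to correct")
app.add("Switch port in wrong VLAN", "condition")
app.add("Operating system started up")
print(outcome.render())
```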
Root Cause Analysis - Steps 
Create an event and causal factor tree. 
• Brainstorm to ensure that all 
possible causes are included, NOT 
just those that you are sure are 
involved. 
• Be sure to consider people, 
hardware, software, policy, 
procedures, and the environment. 
[Figure: the example tree expanded by brainstorming, adding Electric power tripped; Power supply failed; Technicians not properly trained; Port labeled incorrectly; Switch labeled incorrectly; NIC driver wrong]
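One way to keep a brainstorm broad is to file each candidate cause under one of the categories listed above and check which categories are still empty. In this minimal sketch the category names come from the slide, but the assignment of the example causes to categories is an assumption.

```python
from typing import Dict, List

# Categories named on the slide
CATEGORIES = ["people", "hardware", "software", "policy", "procedures", "environment"]


def uncovered_categories(candidates: Dict[str, str]) -> List[str]:
    """Return categories that have no candidate cause yet.

    `candidates` maps each candidate cause to the category it was filed under.
    """
    covered = set(candidates.values())
    return [c for c in CATEGORIES if c not in covered]


candidates = {
    "Technicians not properly trained": "people",
    "Power supply failed": "hardware",
    "NIC driver wrong": "software",
    "Port labeled incorrectly": "procedures",
}
print(uncovered_categories(candidates))  # ['policy', 'environment']
```

An empty category does not mean a cause is missing, only that nobody has looked there yet.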
Root Cause Analysis - Steps 
Create an event and causal factor 
tree continued... 
• If you have solid data indicating 
that one of the possible causes is 
not applicable, it can be 
eliminated from the tree. 
Caution: Do not be too eager to eliminate 
early on. If there is a possibility that it is a 
causal factor, leave it and eliminate it later 
when more information is available. 
[Figure: the same tree with one candidate cause crossed out (X) after data ruled it out]
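The caution above can be enforced by refusing to eliminate a candidate unless supporting evidence is recorded, so that everything else stays on the tree as "possible". The `Candidate` type, the status names, and the evidence text below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    label: str
    status: str = "possible"          # "possible" or "eliminated"
    evidence: Optional[str] = None    # data justifying the elimination

    def eliminate(self, evidence: str) -> None:
        """Eliminate this candidate, but only when evidence is supplied."""
        if not evidence.strip():
            raise ValueError(f"no evidence given for eliminating {self.label!r}")
        self.status = "eliminated"
        self.evidence = evidence


server = Candidate("Server powered up")
server.eliminate("Illustrative: logs show a routine, unrelated power-on")
driver = Candidate("NIC driver wrong")   # no data yet, so it stays on the tree
print(server.status, driver.status)      # eliminated possible
```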
Root Cause Analysis - Steps 
Create an event and causal factor tree 
continued… 
• You may use a fault tree to determine all 
potential causes and to decompose the 
failure down to the “basic event” (e.g., 
system component level). 
[Figure: the tree decomposed further with fault-tree branches, adding the basic events Maintenance swap with no re-label; Diagram wrong; Confusing labels]
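A fault tree decomposes a failure through logic gates: an OR gate's output occurs if any input occurs, an AND gate's only if all inputs occur, and the leaves are basic events. The sketch below evaluates a small tree; the gate structure is a hypothetical example built from labels in this deck (modeling a quality inspection as a barrier that would have caught the mislabeling), not the slide's actual tree.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Gate:
    name: str
    kind: str                                   # "AND" or "OR"
    inputs: List[Union["Gate", str]] = field(default_factory=list)


def occurs(node: Union[Gate, str], basic_events: Set[str]) -> bool:
    """True if `node` occurs, given the set of basic events that happened."""
    if isinstance(node, str):                   # leaf: a basic event
        return node in basic_events
    results = [occurs(child, basic_events) for child in node.inputs]
    return all(results) if node.kind == "AND" else any(results)


# Hypothetical tree: the port is mislabeled for either reason (OR),
# and the error goes through only if no quality inspection catches it (AND).
mislabeled = Gate("Port labeled incorrectly", "OR",
                  ["Maintenance swap with no re-label", "Diagram wrong"])
wrong_vlan = Gate("Switch port in wrong VLAN", "AND",
                  [mislabeled, "No quality inspection"])

print(occurs(wrong_vlan, {"Diagram wrong", "No quality inspection"}))  # True
print(occurs(wrong_vlan, {"Diagram wrong"}))                           # False
```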
Root Cause Analysis - Steps 
Create an event and causal factor 
tree continued… 
• A fault tree can also be used to 
identify all possible types of 
human failures. 
[Figure: human-failure fault tree for the example (Technician used wrong method to correct; Switch port in wrong VLAN; Operating system started up) with four branches: Didn’t perceive system feedback (perception error); Didn’t understand system feedback (interpretation error); Correct interpretation, incorrect decision (decision-making error); Correct decision but incorrect action (action-execution error). Error types shown: rule-based, knowledge-based, and skill-based.]
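The four human-failure branches can be read as a sequence of yes/no questions about what the person did, stopping at the first stage that went wrong. The sketch below is a minimal reading of that sequence: it assumes a human failure did occur, the function name and parameters are hypothetical, and it does not attempt the finer rule-, knowledge-, and skill-based split.

```python
from enum import Enum


class HumanError(Enum):
    PERCEPTION = "Perception error"              # didn't perceive system feedback
    INTERPRETATION = "Interpretation error"      # didn't understand system feedback
    DECISION_MAKING = "Decision-making error"    # correct interpretation, incorrect decision
    ACTION_EXECUTION = "Action-execution error"  # correct decision but incorrect action


def classify(perceived: bool, understood: bool, decided_correctly: bool) -> HumanError:
    """Return the first failed stage, assuming a human failure occurred.

    If perception, interpretation, and decision were all correct,
    the failure must lie in carrying out the action.
    """
    if not perceived:
        return HumanError.PERCEPTION
    if not understood:
        return HumanError.INTERPRETATION
    if not decided_correctly:
        return HumanError.DECISION_MAKING
    return HumanError.ACTION_EXECUTION


print(classify(perceived=True, understood=True, decided_correctly=False).value)
# Decision-making error
```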
Root Cause Analysis - Steps 
Create an event and causal factor tree continued… 
• After you have identified all the possible causes, ask yourself “WHY” each may have occurred.
• Be sure to keep your questions focused on the original issue. For example: “Why was the condition present?”; “Why did the event occur?”; “Why was the parameter exceeded?”; or “Why did the barrier or control fail?”
[Figure: generic tree with the undesired outcome at the top; Event #1, Event #2, a condition, and a failed or exceeded barrier or control beneath it; and “WHY” branches under each, asking why the event occurred, why the condition existed or changed, or why the barrier or control failed or was exceeded]
Root Cause Analysis - Steps
Continue to ask “why” until you have reached: 
1. Root cause(s) - including all 
organizational factors that exert control 
over the design, fabrication, 
development, maintenance, operation, 
and disposal of the system. 
2. A problem that is not correctable by IT or an IT contractor.
3. Insufficient data to continue.
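The "keep asking why" loop and its three stopping conditions can be expressed as a small function. In this sketch the `answers` mapping stands in for the investigation team's answers, and "organizational" and "not correctable" causes are caller-supplied sets; the chain itself is an illustrative arrangement of labels from the example slides, with budget treated as an organizational factor as in the definitions above.

```python
from typing import Dict, List, Set, Tuple


def ask_why(start: str,
            answers: Dict[str, str],
            organizational: Set[str],
            not_correctable: Set[str]) -> Tuple[List[str], str]:
    """Follow 'why?' answers from `start` until a stopping rule applies.

    Returns the chain of causes found and the reason questioning stopped.
    """
    chain = [start]
    current = start
    while True:
        if current in organizational:
            return chain, "root cause: organizational factor reached"
        if current in not_correctable:
            return chain, "not correctable by IT or an IT contractor"
        if current not in answers:
            return chain, "insufficient data to continue"
        nxt = answers[current]
        if nxt in chain:                     # guard against circular answers
            return chain, "circular answer detected"
        chain.append(nxt)
        current = nxt


answers = {
    "Switch port in wrong VLAN": "VLAN changed in unrelated move",
    "VLAN changed in unrelated move": "No quality inspection",
    "No quality inspection": "Insufficient quality staff",
    "Insufficient quality staff": "Insufficient budget",
}
chain, reason = ask_why("Switch port in wrong VLAN", answers, {"Insufficient budget"}, set())
print(" <- ".join(chain))
print(reason)  # root cause: organizational factor reached
```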
Root Cause Analysis - Steps
The resulting tree of questions and answers should lead to a comprehensive picture of POTENTIAL causes for the undesired outcome.
[Figure: the fully expanded WHY tree, with some branches already crossed out (X)]
Root Cause Analysis - Steps
Check your logic with a detailed review of each potential cause.
• Verify it is a contributor or cause.
• If the action, deficiency, or decision in question were corrected, eliminated or avoided, would the undesired outcome be prevented or avoided?
  > If no, then eliminate it from the tree.
[Figure: the WHY tree with the branches that fail the logic check crossed out (X)]
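The logic check above is a "but-for" test: would removing this item have prevented the outcome? With a fault tree it can be automated by removing one basic event at a time and re-evaluating the top event. In this sketch gates are nested tuples for brevity, and the tree and observed events are hypothetical, built from labels in this deck.

```python
from typing import Set, Tuple, Union

# A node is either a basic event (str) or a gate: ("AND" | "OR", child, child, ...)
Node = Union[str, Tuple]


def occurs(node: Node, events: Set[str]) -> bool:
    """Evaluate whether `node` occurs given the basic events that happened."""
    if isinstance(node, str):
        return node in events
    kind, *children = node
    results = [occurs(child, events) for child in children]
    return all(results) if kind == "AND" else any(results)


def but_for_causes(top: Node, events: Set[str]) -> Set[str]:
    """Keep only events whose removal alone would have prevented the top event."""
    if not occurs(top, events):
        raise ValueError("the undesired outcome does not occur with these events")
    return {e for e in events if not occurs(top, events - {e})}


top = ("AND", ("OR", "Maintenance swap with no re-label", "Diagram wrong"),
       "No quality inspection")
observed = {"Maintenance swap with no re-label", "Diagram wrong",
            "No quality inspection", "Confusing labels"}
print(sorted(but_for_causes(top, observed)))  # ['No quality inspection']
```

Note that the two redundant mislabeling events each fail a one-at-a-time test because the other still triggers the OR gate. That is a known limitation of single removal, so such results deserve review rather than automatic pruning.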
Root Cause Analysis - Steps 
Create an event and causal factor tree continued… 
• The remaining items on the tree are the causes (or probable causes) necessary to produce the undesired outcome.
• Proximate causes are those immediately before the undesired outcome. 
• Intermediate causes are those between the proximate and root causes. 
• Root causes are organizational factors or systemic problems located at the bottom 
of the tree. 
[Figure: the pruned tree, labeled from top to bottom as proximate causes, intermediate causes, and root causes]
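The proximate, intermediate, and root labels follow from position in the pruned tree. The sketch below assigns them by depth under a simplifying assumption: direct causes of the outcome are proximate, causes with nothing below them are root, and everything in between is intermediate. The tree is a hypothetical node-to-causes mapping using labels from the example slides.

```python
from typing import Dict, List

Tree = Dict[str, List[str]]   # node label -> labels of its causes


def classify_causes(tree: Tree, outcome: str) -> Dict[str, str]:
    """Label every cause below `outcome` as proximate, intermediate or root."""
    labels: Dict[str, str] = {}

    def visit(node: str, depth: int) -> None:
        for cause in tree.get(node, []):
            if depth == 0:
                labels[cause] = "proximate"      # immediately before the outcome
            elif not tree.get(cause):
                labels[cause] = "root"           # bottom of the tree
            else:
                labels[cause] = "intermediate"   # between proximate and root
            visit(cause, depth + 1)

    visit(outcome, 0)
    return labels


tree = {
    "Lost transactions (penalties paid)": ["Application failed to go live"],
    "Application failed to go live": ["Switch port in wrong VLAN"],
    "Switch port in wrong VLAN": ["No quality inspection"],
    "No quality inspection": ["Insufficient quality staff"],
    "Insufficient quality staff": ["Insufficient budget"],
}
labels = classify_causes(tree, "Lost transactions (penalties paid)")
print(labels["Insufficient budget"])  # root
```

A real tree may need judgment where a branch is only one level deep, since a direct cause with nothing below it is labeled proximate here.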
Root Cause Analysis - Steps
Some people choose to leave contributing factors on the tree to show 
all factors that influenced the event. 
Contributing factor: An event or condition that may have contributed to the 
occurrence of an undesired outcome but, if eliminated or modified, would not by 
itself have prevented the occurrence. 
If this is done, illustrate them differently (e.g., dotted line boxes and arrows) so that it is 
clear that they are not causes. 
[Figure: the pruned tree with contributing factors shown apart from the causes]
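One way to draw contributing factors differently is to generate the tree in the Graphviz DOT language and give contributing-factor nodes and their arrows `style=dashed`. The edge list and the choice of which node is a contributing factor are hypothetical; the output text can be rendered with the Graphviz `dot` tool.

```python
from typing import List, Set, Tuple


def to_dot(edges: List[Tuple[str, str]], contributing: Set[str]) -> str:
    """Return a DOT digraph; contributing factors get dashed boxes and arrows.

    Each edge is (cause, effect). rankdir=BT draws arrows upward,
    so the undesired outcome ends up at the top of the picture.
    """
    lines = ["digraph rca {", "  rankdir=BT;", "  node [shape=box];"]
    nodes = {n for edge in edges for n in edge}
    for node in sorted(nodes):
        style = " [style=dashed]" if node in contributing else ""
        lines.append(f'  "{node}"{style};')
    for cause, effect in edges:
        style = " [style=dashed]" if cause in contributing else ""
        lines.append(f'  "{cause}" -> "{effect}"{style};')
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("Switch port in wrong VLAN", "Application failed to go live"),
    ("Confusing labels", "Switch port in wrong VLAN"),
]
print(to_dot(edges, {"Confusing labels"}))
```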
Investigating Causes of Failures & Mishaps
Root Cause is Much Deeper: Keep Asking Why
[Figure: the example tree (Lost transactions (penalties paid); Application failed to go live; Technician used wrong method to correct; Switch port in wrong VLAN; Operating system started up) extended with further causes: No IP connection to network; VLAN incorrectly assigned; Incorrect server static address used; Engineer did not read correct label]
Investigating Causes of Failures & Mishaps 
[Figure: the complete example tree, which adds further causes: VLAN changed in unrelated move; No quality inspection; Insufficient quality staff; Insufficient budget; Procedure incorrect, not updated; Not under configuration management; Decision-making error (correct interpretation, incorrect decision); New task; Insufficient anomaly training; Training does not exist; Insufficient training budget; Organization underestimates importance of anomaly training]
Root Cause Analysis - Steps
Generating Recommendations:
At a minimum, corrective actions should be generated to eliminate proximate causes and to eliminate or mitigate the negative effects of root causes.
When multiple causes exist, the budget is limited, or it is difficult to determine what should be corrected:
• Quantitative analysis can be used to determine the total contribution of each cause to the undesired outcome.
• Fishbone diagrams (or other methods, such as Pareto charts) can be used to organize causes and rank them in order of importance.
• Those causes which contribute most to the undesired outcome should be eliminated, or their negative effects mitigated, to minimize risk.
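When the budget is limited, one common quantitative approach is a Pareto analysis (named here as an example; the slide does not prescribe a method): rank causes by estimated contribution and address the smallest set that covers most of the impact. The contribution figures and the 80% threshold below are hypothetical.

```python
from typing import Dict, List


def pareto_select(contribution: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return the highest-contributing causes whose cumulative share reaches `threshold`."""
    total = sum(contribution.values())
    if total <= 0:
        return []
    selected: List[str] = []
    cumulative = 0.0
    for cause, value in sorted(contribution.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(cause)
        cumulative += value / total
        if cumulative >= threshold:
            break
    return selected


# Hypothetical share of penalty cost attributed to each root cause
contribution = {
    "Insufficient anomaly training": 60.0,
    "Procedure not updated": 25.0,
    "No quality inspection": 10.0,
    "Confusing labels": 5.0,
}
print(pareto_select(contribution))
# ['Insufficient anomaly training', 'Procedure not updated']
```

The remaining causes are not ignored; they simply move to a later round once the largest contributors are addressed.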
Definitions of RCA & Related Terms
Cause (Causal Factor): An event or condition that results in an effect. Anything that shapes or influences the outcome.
Proximate Cause(s): The event(s) that occurred, including any condition(s) that existed immediately before the undesired outcome, directly resulted in its occurrence and, if eliminated or modified, would have prevented the undesired outcome. Also known as the direct cause(s).
Root Cause(s): One of multiple factors (events, conditions, or organizational factors) that contributed to or created the proximate cause and subsequent undesired outcome and, if eliminated or modified, would have prevented the undesired outcome. Typically, multiple root causes contribute to an undesired outcome.
Root Cause Analysis (RCA): A structured evaluation method that identifies the root causes for an undesired outcome and the actions adequate to prevent recurrence. Root cause analysis should continue until organizational factors have been identified, or until data are exhausted.
Event: A real-time occurrence describing one discrete action, typically an error, failure, or malfunction. Examples: pipe broke, power lost, lightning struck, person opened valve.
Condition: Any as-found state, whether or not resulting from an event, that may have safety, health, quality, security, operational, or environmental implications.
Organizational Factors: Any operational or management structural entity that exerts control over the system at any stage in its life cycle, including but not limited to the system’s concept development, design, fabrication, test, maintenance, operation, and disposal. Examples: resource management (budget, staff, training); policy (content, implementation, verification); and management decisions.
Contributing Factor: An event or condition that may have contributed to the occurrence of an undesired outcome but, if eliminated or modified, would not by itself have prevented the occurrence.
Barrier: A physical device or an administrative control used to reduce risk of the undesired outcome to an acceptable level. Barriers can provide physical intervention (e.g., a guardrail) or procedural separation in time and space (e.g., lock-out-tag-out procedure).
MIR Process / Forms 
Major Incident – Severe Business impact: 
• service, system or infrastructure component not functioning adequately to enable business 
process 
• total loss of service, system or infrastructure component 
Major Incidents can also be considered to be those which do not entirely impede the use of the 
service, system or infrastructure component such as: 
• continuous slow response 
• general degradation of service 
• Refer: http://thinkingproblemmanagement.blogspot.com

More from Ronald Bartels


Root cause analysis

  • 2. Investigating causes of failures & mishaps Stop and ask yourself… Did you really find the causes of the failure? Like icebergs, most of the problem is usually below the surface!
  • 3. This is NOT Root Cause Analysis
  • 4. Technical Proficiency • Once the accident happened, how did Gene Kranz rely on the skills and expertise of his people? • How did Lovell work to initiate actions in the spaceship? Was he able to balance that with his technical responsibilities in the craft? How did he do it? • What steps does your unit take to maintain Technical Proficiency? Lessons from Apollo 13
  • 5. Teambuilding • How did Lovell contribute to the group process when Mattingly wanted to practice the docking procedure again after 3 hrs of practice? • When Kranz had the team in the classroom, how did he establish the goal, and how did he then motivate others to achieve the goal of returning the spacecraft safely to Earth? • Did Lovell make the right call when faced with the challenge of forcing Mattingly to stay behind because of the fear of measles? • How does a leader successfully build a strong team, but then separate him- or herself from the team to make a critical decision? • How’s your Team doing? Lessons from Apollo 13
  • 6. Effective Communications • Even as everything is breaking loose in Mission Control, Gene Kranz asks his team to “Work the problem.” He then listens to the experts report in on their areas of the mission. How did his effective comms set the stage for a successful recovery? • Kranz stated “Failure is not an option” and Lovell told his crew “I intend to go home.” How did clearly stating their ideas and vision direct the teams towards mission accomplishment? • Who’s the best communicator you’ve ever worked with? What made them excel? Lessons from Apollo 13
  • 7. Vision Development & Implementation • JFK’s Vision: “I believe that this nation should commit itself to achieving the goal, before this decade is out, of landing a man on the moon and returning him safely to Earth.” • How does a stated vision focus the unit and bring the crew together? • Lovell states: “Columbus, Lindbergh, and Armstrong; it is not a miracle for man to walk on the moon, we just decided to go.” • What’s the vision at your unit? Has everyone decided “to go”? What can your unit do to get everyone “on board”? Lessons from Apollo 13
  • 8. Conflict Management • How did Lovell deal with stress and conflict in the LEM? • How did the CO2 challenge help the crew to overcome the conflict they were experiencing? • Is there more or less conflict when people are busy and focused or when there is less to do and folks have time on their hands? Why? • How did Kranz and Lovell go about alleviating conflict between the crew and the Medical team? Lessons from Apollo 13
  • 9. Decision Making & Problem Solving • How did the Team live the Competency of Decision Making and Problem Solving in working the “Power” problem to conclusion? • Right after the explosion, Kranz asks Mission Control, “What do we have on the spacecraft that’s good?” • Why did he ask this question? • How did it aid in making the correct decision to shut down the fuel cells? • Does everyone on your Team ensure that the Decision Makers have all the available and correct information? Why or why not? Lessons from Apollo 13
  • 10. Creativity and Innovation • We’ve discussed a lot of positive leadership qualities during this session. How did Gene Kranz create an environment with his Mission Control team to ensure they were able to figure out how to solve the CO2 problem with a “square peg in a round hole”? • Lovell states at the end of the movie: “Thousands of people worked to bring the 3 of us back home.” How did creativity and innovation make the “Successful Failure” a reality? • How does your unit build on Lessons Learned? Lessons from Apollo 13
  • 11. Apollo 13 Questions on homework
  • 12. Investigating causes of failures & mishaps When performing an investigation, it is necessary to look at more than just the immediately visible cause, which is often the proximate cause. There are underlying organizational causes that are more difficult to see; however, they may contribute significantly to the undesired outcome and, if not corrected, will continue to create similar types of problems. These are root causes. Requirements for mishap reporting and investigating: all mishap investigations must identify the proximate cause(s), root cause(s) and contributing factor(s).
  • 13. Definitions Proximate Cause(s) (Direct Cause) • The event(s) that occurred, including any condition(s) that existed immediately before the undesired outcome, that directly resulted in its occurrence and, if eliminated or modified, would have prevented the undesired outcome. • Examples of proximate causes: Equipment: arced, leaked, overloaded, overheated. Human: pushed incorrect button, fell, dropped tool, connected wires.
  • 14. Definitions Root Cause(s) • One of multiple factors (events, conditions or organizational factors) that contributed to or created the proximate cause and subsequent undesired outcome and, if eliminated or modified, would have prevented the undesired outcome. Typically, multiple root causes contribute to an undesired outcome. Organizational factors • Any operational or management structural entity that exerts control over the system at any stage in its life cycle, including but not limited to the system’s concept development, design, fabrication, test, maintenance, operation, and disposal. • Examples: resource management (budget, staff, training); policy (content, implementation, verification); and management decisions.
  • 15. Definitions Root Cause Analysis (RCA) • A structured evaluation method that identifies the root causes for an undesired outcome and the actions adequate to prevent recurrence. Root cause analysis should continue until organizational factors have been identified, or until data are exhausted. • RCA is a method that helps professionals determine: • What happened. • How it happened. • Why it happened. • It allows learning from past problems, failures, and accidents.
  • 16. Root Cause Analysis - Steps 1. Identify and clearly define the undesired outcome (outage). 2. Gather data. 3. Create a timeline. 4. Place events & conditions on an event and causal factor tree. 5. Use a fault tree or other method/tool to identify all potential causes. 6. Decompose system failures down to basic events or conditions (further describe what happened). 7. Identify specific failure modes (immediate causes). 8. Continue asking “WHY” to identify root causes. 9. Check your logic and your facts. Eliminate items that are not causes or contributing factors. 10. Generate solutions that address both proximate causes and root causes.
  • 17. Root Cause Analysis - Steps Clearly define the undesirable outcome. • Describe the undesired outcome. • For example: “software failed to deploy,” “transaction failed,” or “XYZ project schedule significantly slipped.” Gather data. Identify facts surrounding the undesired outcome. • When did the undesired outcome occur? • Where did it occur? • What conditions were present prior to its occurrence? • What controls or barriers could have prevented its occurrence but did not? • What are all the potential causes? • What actions can prevent recurrence? • What amelioration occurred? Did it prevent further damage?
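The data-gathering questions above can be tracked as a simple checklist so that unanswered questions stay visible during the investigation. This is a minimal sketch, assuming a hypothetical IncidentFacts record whose field names are illustrative and map one-to-one to the questions on the slide; it is not part of any RCA tool.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class IncidentFacts:
    """Hypothetical record of the facts gathered for one undesired outcome.

    Field names are illustrative; each maps to one data-gathering question.
    """
    outcome: str                                   # the clearly defined undesired outcome
    when: Optional[str] = None                     # When did it occur?
    where: Optional[str] = None                    # Where did it occur?
    prior_conditions: List[str] = field(default_factory=list)    # conditions present beforehand
    failed_barriers: List[str] = field(default_factory=list)     # controls that could have prevented it
    potential_causes: List[str] = field(default_factory=list)
    preventive_actions: List[str] = field(default_factory=list)  # actions to prevent recurrence
    amelioration: Optional[str] = None             # what was done, and did it limit damage?

    def open_questions(self) -> List[str]:
        """Names of fields still unanswered (None or an empty list)."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) is None or getattr(self, f.name) == []]


facts = IncidentFacts(outcome="software failed to deploy", where="hypothetical build server")
print(facts.open_questions())
# ['when', 'prior_conditions', 'failed_barriers', 'potential_causes', 'preventive_actions', 'amelioration']
```

An empty open_questions() list only means every question has some answer recorded, not that the answers are verified.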
  • 18. Root Cause Analysis - Steps Create a timeline (sequence diagram) • Illustrate the sequence of events in chronological order horizontally across the page. • Depict relationships between conditions, events, and exceeded or failed barriers/controls. [Figure: timeline of conditions and events, with an exceeded/failed barrier or control, leading to the undesired outcome]
  • 19. Root Cause Analysis - Steps Create a timeline (sequence diagram) • If amelioration occurred (e.g., rebooting a server, moving an application to another server), it should be included in the evaluation to ensure that it did not contribute to the undesired outcome. Example: in the case of a server reboot, the investigation should ensure that the reboot was a result of the mishap and not a result of latent hardware defects. [Figure: the same timeline with an exceeded/failed amelioration step added]
  • 20. Root Cause Analysis - Steps Example: simple timeline. [Figure: timeline boxes: Application failed to Go Live; Operating system started up; Lost transactions (penalties paid); Technician used wrong method to correct; Server powered up; Switch port in wrong VLAN]
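The timeline slides above can be sketched in code: each entry is tagged as a condition, event, barrier, amelioration or outcome, sorted by time, and laid out left to right. This is a minimal sketch under stated assumptions: the Entry and Kind names, the use of minutes as the time unit, and the generic labels are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Kind(Enum):
    CONDITION = "condition"
    EVENT = "event"
    BARRIER = "exceeded/failed barrier or control"
    AMELIORATION = "amelioration"
    OUTCOME = "undesired outcome"


@dataclass(frozen=True)
class Entry:
    time: float   # minutes from an arbitrary reference point (an assumption)
    kind: Kind
    label: str


def build_timeline(entries: Iterable[Entry]) -> List[Entry]:
    """Order entries chronologically; sorted() is stable, so ties keep input order."""
    return sorted(entries, key=lambda e: e.time)


def render(timeline: List[Entry]) -> str:
    """Lay the timeline out left to right, as on the slide."""
    return " -> ".join(f"[{e.kind.value}] {e.label}" for e in timeline)


entries = [
    Entry(30, Kind.OUTCOME, "Undesired outcome"),
    Entry(0, Kind.CONDITION, "Condition"),
    Entry(35, Kind.AMELIORATION, "Server rebooted"),
    Entry(10, Kind.EVENT, "Event"),
    Entry(20, Kind.BARRIER, "Barrier or control"),
]
print(render(build_timeline(entries)))
```

Keeping the amelioration step on the same timeline makes it easy to check, as the slides ask, whether it happened before or after the outcome and so whether it could have contributed.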
  • 21. Root Cause Analysis - Steps Create an event and causal factor tree. (A visual representation of the causes that led to the failure or mishap.) • Place the undesired outcome at the top of the tree. • Add all events, conditions, and exceeded/failed barriers that occurred immediately before the undesired outcome and might have caused it. [Figure: tree with boxes: Application failed to Go Live; Operating system started up; Technician used wrong method to correct; Lost transactions (penalties paid); Server powered up; Switch port in wrong VLAN]
  • 22. Root Cause Analysis - Steps Create an event and causal factor tree. • Brainstorm to ensure that all possible causes are included, NOT just those that you are sure are involved. • Be sure to consider people, hardware, software, policy, procedures, and the environment. [Figure: the tree extended with possible causes: Electric power tripped; Technicians not properly trained; Power supply failed; Port labeled incorrectly; Switch labeled incorrectly; NIC driver wrong]
  • 23. Root Cause Analysis - Steps Create an event and causal factor tree continued… • If you have solid data indicating that one of the possible causes is not applicable, it can be eliminated from the tree. Caution: do not be too eager to eliminate early on. If there is a possibility that it is a causal factor, leave it and eliminate it later when more information is available. [Figure: the same tree with one possible cause crossed out]
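The tree-building slides above suggest a small data structure: the undesired outcome at the top, brainstormed causes tagged with one of the slide's six categories, and elimination allowed only when evidence is recorded. This is a minimal sketch; the Node class, the category strings and the example tree and evidence are hypothetical illustrations, not the deck's own tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The slide's brainstorming checklist.
CATEGORIES = {"people", "hardware", "software", "policy", "procedures", "environment"}


@dataclass
class Node:
    label: str
    category: Optional[str] = None        # None for the undesired outcome at the top
    children: List["Node"] = field(default_factory=list)
    eliminated_by: Optional[str] = None   # evidence that ruled this cause out, if any

    def add(self, label: str, category: str) -> "Node":
        """Attach a possible cause beneath this node and return it."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        child = Node(label, category)
        self.children.append(child)
        return child

    def eliminate(self, evidence: str) -> None:
        """Rule a cause out, but only when solid evidence is recorded for it."""
        if not evidence or not evidence.strip():
            raise ValueError("no evidence given; leave the cause on the tree for now")
        self.eliminated_by = evidence

    def open_causes(self) -> List[str]:
        """Labels of causes still on the tree; an eliminated cause hides its subtree."""
        labels = []
        for child in self.children:
            if child.eliminated_by is None:
                labels.append(child.label)
                labels.extend(child.open_causes())
        return labels


outcome = Node("Application failed to Go Live")
power = outcome.add("Power supply failed", "hardware")
vlan = outcome.add("Switch port in wrong VLAN", "hardware")
vlan.add("Port labeled incorrectly", "procedures")
power.eliminate("hypothetical: PSU logs show no fault")
print(outcome.open_causes())  # ['Switch port in wrong VLAN', 'Port labeled incorrectly']
```

Eliminated nodes are marked rather than deleted, so a cause ruled out too early can be reinstated once more information arrives, in line with the caution on the slide.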
  • 24. Root Cause Analysis - Steps Create an event and causal factor tree continued… • You may use a fault tree to determine all potential causes and to decompose the failure down to the “basic event” (e.g., system component level). [Figure: the tree decomposed further, adding: Maintenance swap; Diagram wrong with no re-label; Confusing labels]
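A fault tree decomposes a failure through AND/OR gates down to basic events. The sketch below evaluates whether the top event occurs for a given set of failed basic events and computes minimal cut sets, a standard fault-tree notion that goes beyond what the slide itself describes. The tuple encoding of gates and the example tree (built from labels in the slides, with gate structure invented for illustration) are assumptions.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple, Union

# A node is either a basic event (a string) or a gate: ("AND" | "OR", [children]).
Tree = Union[str, Tuple[str, list]]


def occurs(node: Tree, failed: Set[str]) -> bool:
    """Does this node's event occur, given the set of basic events that failed?"""
    if isinstance(node, str):
        return node in failed
    gate, children = node
    results = [occurs(child, failed) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node: Tree) -> List[FrozenSet[str]]:
    """Smallest combinations of basic events that are sufficient to cause the node's event."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [minimal_cut_sets(child) for child in children]
    if gate == "OR":      # any child's cut set is enough
        sets = {s for cs in child_sets for s in cs}
    elif gate == "AND":   # need one cut set from every child at once
        sets = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate: {gate}")
    # Drop any set that strictly contains another cut set.
    minimal = [s for s in sets if not any(other < s for other in sets)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


# Hypothetical gate structure over labels taken from the slides.
top = ("OR", ["Power supply failed",
              ("AND", ["Switch labeled incorrectly", "Technicians not properly trained"])])
print(minimal_cut_sets(top))
```

Each minimal cut set is a candidate "basic event" combination to investigate; a single-element cut set points at a single point of failure.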
  • 25. Root Cause Analysis - Steps Create an event and causal factor tree continued… • A fault tree can also be used to identify all possible types of human failures. [Figure: human-failure branch of the example tree: Didn’t perceive system feedback (perception error); Didn’t understand system feedback (interpretation error); Correct interpretation, incorrect decision (decision-making error); Correct decision but incorrect action (action-execution error); error types: skill-based, rule-based, knowledge-based]
  • 26. Root Cause Analysis - Steps Create an event and causal factor tree continued… • After you have identified all the possible causes, ask yourself “WHY” each may have occurred. • Be sure to keep your questions focused on the original issue. For example: “Why was the condition present?”, “Why did the event occur?”, “Why was the parameter exceeded?” or “Why did the condition fail?” [Figure: generic tree with the Undesired Outcome at the top; Event #1, Event #2, a Condition and a Failed or Exceeded Barrier or Control beneath it; and three “WHY” boxes beneath each]
  • 27. Root Cause Analysis - Steps Continue to ask “why” until you have reached: 1. Root cause(s), including all organizational factors that exert control over the design, fabrication, development, maintenance, operation, and disposal of the system. 2. A problem that is not correctable by IT or an IT contractor. 3. Insufficient data to continue.
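The three stopping rules above can be expressed as a loop that keeps asking "why?" along one branch of the tree. This is a minimal sketch, assuming each cause has a single recorded answer (real trees branch, so it would run once per branch); the ask_why function and Stop names are hypothetical, and the links between the example labels are one reading of the later slide-33 figure, not the author's stated chain.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Stop(Enum):
    ROOT_CAUSE = "root cause (organizational factor) reached"
    NOT_CORRECTABLE = "not correctable by IT or an IT contractor"
    NO_DATA = "insufficient data to continue"


def ask_why(start: str, why: Dict[str, str], organizational: Set[str],
            not_correctable: Set[str], max_depth: int = 20) -> Tuple[List[str], Stop]:
    """Follow the answers to "why?" from `start` until one of the three stopping rules applies.

    `why` maps each cause to a single answer, a simplification: real trees branch,
    so this would be run once per branch.
    """
    chain = [start]
    current = start
    for _ in range(max_depth):
        if current in organizational:
            return chain, Stop.ROOT_CAUSE
        if current in not_correctable:
            return chain, Stop.NOT_CORRECTABLE
        if current not in why:
            return chain, Stop.NO_DATA
        current = why[current]
        chain.append(current)
    raise RuntimeError("no stopping rule reached; check the answers for a cycle")


why = {  # hypothetical links between labels from the slides
    "Switch port in wrong VLAN": "VLAN changed in unrelated move",
    "VLAN changed in unrelated move": "No quality inspection",
    "No quality inspection": "Insufficient quality staff",
    "Insufficient quality staff": "Insufficient budget",
}
chain, reason = ask_why("Switch port in wrong VLAN", why,
                        organizational={"Insufficient budget"}, not_correctable=set())
print(chain, reason.value)
```

Returning the stop reason alongside the chain keeps the distinction the slide draws: a chain that ends for lack of data is not the same as one that reached an organizational root cause.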
  • 28. Root Cause Analysis - Steps The resultant tree of questions and answers should lead to a comprehensive picture of POTENTIAL causes for the undesired outcome. [Figure: the generic tree expanded with many levels of “WHY” boxes beneath each event, condition and barrier]
  • 29. Root Cause Analysis - Steps Check your logic with a detailed review of each potential cause. • Verify it is a contributor or cause. • If the action, deficiency, or decision in question were corrected, eliminated or avoided, would the undesired outcome be prevented or avoided? > If no, then eliminate it from the tree. [Figure: the generic tree with many “WHY” boxes crossed out]
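The logic check above is a counterfactual test applied to every node. This is a minimal sketch, assuming each cause records a yes/no answer to the slide's question; the Cause class, the labels and the choice to remove a failing cause together with everything beneath it are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    label: str
    # Answer to the slide's test: if this were corrected, eliminated or avoided,
    # would the undesired outcome have been prevented or avoided?
    prevents_outcome: bool
    children: List["Cause"] = field(default_factory=list)


def check_logic(node: Cause) -> Cause:
    """Return a pruned copy of the tree: every cause answering "no" is removed with its subtree."""
    kept = [check_logic(child) for child in node.children if child.prevents_outcome]
    return Cause(node.label, node.prevents_outcome, kept)


def labels(node: Cause) -> List[str]:
    """Depth-first list of labels, for printing or comparison."""
    return [node.label] + [label for child in node.children for label in labels(child)]


tree = Cause("Undesired outcome", True, [
    Cause("Event #1", True, [Cause("Why Event #1 occurred (a)", False),
                             Cause("Why Event #1 occurred (b)", True)]),
    Cause("Condition", False, [Cause("Why condition existed", True)]),
])
print(labels(check_logic(tree)))  # ['Undesired outcome', 'Event #1', 'Why Event #1 occurred (b)']
```

The original tree is left untouched, so the pruned and unpruned versions can be compared during review.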
  • 30. Root Cause Analysis - Steps Create an event and causal factor tree continued… • The remaining items on the tree are the causes (or probable causes) necessary to produce the undesired outcome. • Proximate causes are those immediately before the undesired outcome. • Intermediate causes are those between the proximate and root causes. • Root causes are organizational factors or systemic problems located at the bottom of the tree. [Figure: pruned tree with bands labeled PROXIMATE CAUSES, INTERMEDIATE CAUSES and ROOT CAUSES, top to bottom]
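The classification above depends only on where a cause sits in the pruned tree. This is a minimal sketch, assuming the tree is stored as a hypothetical dict from each node to the causes directly beneath it, and simplifying "root cause" to "a cause with nothing beneath it"; the slide additionally expects root causes to be organizational or systemic factors, which the code cannot check.

```python
from typing import Dict, List


def classify(tree: Dict[str, List[str]], outcome: str) -> Dict[str, List[str]]:
    """Label each cause under `outcome` as proximate, intermediate or root by its position.

    `tree` maps each node to the causes directly beneath it (after pruning).
    Simplification: any deeper cause with nothing beneath it is treated as a root cause.
    """
    result: Dict[str, List[str]] = {"proximate": [], "intermediate": [], "root": []}

    def walk(node: str, depth: int) -> None:
        for cause in tree.get(node, []):
            if depth == 0:
                # Immediately before the outcome. If nothing sits beneath it yet,
                # the "why" questions for it have not been asked.
                result["proximate"].append(cause)
            elif not tree.get(cause):
                result["root"].append(cause)
            else:
                result["intermediate"].append(cause)
            walk(cause, depth + 1)

    walk(outcome, 0)
    return result


tree = {  # hypothetical pruned tree
    "Undesired outcome": ["Event #1", "Condition"],
    "Event #1": ["Why Event #1 occurred"],
    "Why Event #1 occurred": ["Organizational factor A"],
    "Condition": ["Organizational factor B"],
}
print(classify(tree, "Undesired outcome"))
```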
  • 31. Root Cause Analysis - Steps Some people choose to leave contributing factors on the tree to show all factors that influenced the event. Contributing factor: an event or condition that may have contributed to the occurrence of an undesired outcome but, if eliminated or modified, would not by itself have prevented the occurrence. If this is done, illustrate them differently (e.g., dotted-line boxes and arrows) so that it is clear that they are not causes. [Figure: pruned tree with contributing factors drawn in dotted boxes]
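When contributing factors are kept on the tree, they need a visibly different marker. This is a minimal text-rendering sketch, assuming the same hypothetical dict-of-children tree layout; the solid "---" and dotted ":.." markers are an arbitrary stand-in for the slide's solid and dotted boxes.

```python
from typing import Dict, List, Set


def render(node: str, tree: Dict[str, List[str]], contributing: Set[str], depth: int = 0) -> List[str]:
    """Indented text view of the tree beneath `node`.

    Causes get a solid marker ("---"); contributing factors get a dotted one (":..")
    so it stays clear they are not causes.
    """
    lines = []
    for child in tree.get(node, []):
        marker = ":.. " if child in contributing else "--- "
        lines.append("    " * depth + marker + child)
        lines.extend(render(child, tree, contributing, depth + 1))
    return lines


tree = {  # hypothetical tree
    "Undesired outcome": ["Event #1", "Condition"],
    "Event #1": ["Why Event #1 occurred"],
}
print("\n".join(["Undesired outcome"] + render("Undesired outcome", tree, {"Condition"})))
```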
  • 32. Investigating Causes of Failures & Mishaps [Figure: example tree with boxes: Application failed to Go Live; Technician used wrong method to correct; Lost transactions (penalties paid); Switch port in wrong VLAN; Operating system started up; No IP connection to network; VLAN assigned incorrectly; Incorrect server static address used; Engineer did not read correct label. Callout: “Root Cause is Much Deeper. Keep Asking Why”]
  • 33. Investigating Causes of Failures & Mishaps [Figure: the same tree extended down to organizational factors, adding: VLAN changed in unrelated move; No quality inspection; Insufficient quality staff; Insufficient budget; Procedure incorrect; Not updated; Not under configuration mgmt; Correct interpretation, incorrect decision (decision-making error); New task; Insufficient anomaly training; Training does not exist; Insufficient training budget; Organization underestimates importance of anomaly training]
  • 34. Root Cause Analysis - Steps Generating Recommendations: At a minimum, corrective actions should be generated to eliminate proximate causes and to eliminate or mitigate the negative effects of root causes. When multiple causes exist, budget is limited, or it is difficult to determine what should be corrected: • Quantitative analysis can be used to determine the total contribution of each cause to the undesired outcome. • Fishbone diagrams (or other methods) can be used to arrange causes in order of their importance. • The causes that contribute most to the undesired outcome should be eliminated, or their negative effects mitigated, to minimize risk.
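One way to rank causes once their contributions are quantified is a Pareto-style ranking. This is a swapped-in technique, not the fishbone diagram the slide names. The sketch assumes each cause already has a numeric contribution in some consistent unit (for example cost or lost transactions) and uses a hypothetical 80% coverage threshold; the cause names and numbers in the example are placeholders, not data.

```python
from typing import Dict, List, Tuple


def rank_causes(contributions: Dict[str, float],
                threshold: float = 0.8) -> Tuple[List[Tuple[str, float, float]], List[str]]:
    """Pareto-style ranking of causes by their quantified contribution.

    Returns (rows, selected): rows are (cause, contribution, cumulative share) in
    descending order; selected is the shortest prefix covering `threshold` of the total.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("contributions must sum to a positive number")
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    target = threshold * total
    rows, selected, running = [], [], 0.0
    for cause, value in ranked:
        if running < target:   # target not yet covered, so this cause is needed
            selected.append(cause)
        running += value
        rows.append((cause, value, round(running / total, 3)))
    return rows, selected


rows, selected = rank_causes({"Cause A": 50, "Cause B": 25, "Cause C": 15, "Cause D": 10})
print(selected)  # ['Cause A', 'Cause B', 'Cause C']
```

The selected causes are the first candidates for elimination or mitigation; the rest still belong in the report even if budget does not cover them.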
  • 35. Definitions of RCA & Related Terms Cause (Causal Factor): An event or condition that results in an effect. Anything that shapes or influences the outcome. Proximate Cause(s): The event(s) that occurred, including any condition(s) that existed immediately before the undesired outcome, that directly resulted in its occurrence and, if eliminated or modified, would have prevented the undesired outcome. Also known as the direct cause(s). Root Cause(s): One of multiple factors (events, conditions or organizational factors) that contributed to or created the proximate cause and subsequent undesired outcome and, if eliminated or modified, would have prevented the undesired outcome. Typically, multiple root causes contribute to an undesired outcome. Root Cause Analysis (RCA): A structured evaluation method that identifies the root causes for an undesired outcome and the actions adequate to prevent recurrence. Root cause analysis should continue until organizational factors have been identified, or until data are exhausted. Event: A real-time occurrence describing one discrete action, typically an error, failure, or malfunction. Examples: pipe broke, power lost, lightning struck, person opened valve, etc. Condition: Any as-found state, whether or not resulting from an event, that may have safety, health, quality, security, operational, or environmental implications. Organizational Factors: Any operational or management structural entity that exerts control over the system at any stage in its life cycle, including but not limited to the system’s concept development, design, fabrication, test, maintenance, operation, and disposal. Examples: resource management (budget, staff, training); policy (content, implementation, verification); and management decisions. Contributing Factor: An event or condition that may have contributed to the occurrence of an undesired outcome but, if eliminated or modified, would not by itself have prevented the occurrence.
Barrier: A physical device or an administrative control used to reduce risk of the undesired outcome to an acceptable level. Barriers can provide physical intervention (e.g., a guardrail) or procedural separation in time and space (e.g., lock-out-tag-out procedure).
  • 36. MIR Process / Forms Major Incident – Severe Business impact: • service, system or infrastructure component not functioning adequately to enable business process • total loss of service, system or infrastructure component Major Incidents can also be considered to be those which do not entirely impede the use of the service, system or infrastructure component such as: • continuous slow response • general degradation of service • Refer: http://thinkingproblemmanagement.blogspot.com
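The major-incident criteria above can be applied as a simple check. This is a minimal sketch, assuming hypothetical boolean impact flags; because the slide says degradation cases "can also be considered" major, they sit behind a switch rather than always counting. The Impact class and major_incident_criteria function are illustrative names, not part of any MIR form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Impact:
    """Hypothetical flags describing an incident's business impact."""
    total_loss: bool = False                  # total loss of service, system or component
    inadequate_for_business: bool = False     # not functioning adequately to enable business process
    continuous_slow_response: bool = False
    general_degradation: bool = False


def major_incident_criteria(impact: Impact, include_degradation: bool = True) -> List[str]:
    """Return the slide's major-incident criteria that this impact meets.

    A non-empty result means the incident qualifies as major. Degradation cases
    "can also be considered" major on the slide, so they are behind a switch.
    """
    met = []
    if impact.total_loss:
        met.append("total loss of service, system or infrastructure component")
    if impact.inadequate_for_business:
        met.append("not functioning adequately to enable business process")
    if include_degradation:
        if impact.continuous_slow_response:
            met.append("continuous slow response")
        if impact.general_degradation:
            met.append("general degradation of service")
    return met


print(major_incident_criteria(Impact(continuous_slow_response=True)))  # ['continuous slow response']
```

Returning the matched criteria, rather than a bare yes/no, gives the MIR record a stated reason for the classification.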