2. AIM
To compare existing methods for modeling (and/or
predicting) failure in real-world, complex systems.
Research and summarize a number of failures and
accidents involving complex socio-technical
systems.
Research and summarize several modeling
approaches.
Apply selected modeling approaches to a chosen
accident.
3. BREAKDOWN
3 Weeks: Researching failures and accidents.
- Columbia and Challenger Disasters
- 2010 Flash Crash
- Aviation accidents and incidents
4 Weeks: Researching accident modeling approaches
5 Weeks: Modeling a well-documented accident using
various systemic and sequential accident
modeling techniques.
6. SPACE SHUTTLE
COLUMBIA DISASTER
On 1 February 2003, a critical
systems failure occurred on the
space shuttle Columbia (STS-107) during
its re-entry into the Earth's
atmosphere.
This caused the disintegration of the
shuttle leading to the death of all
seven crew members.
STS-107 flight insignia
7. FLASH CRASH 2010
At approximately 2:45 pm on 6 May 2010, prices on the United
States stock market fell sharply, only to recover minutes later.
The Dow Jones dropped 600 points during the crash, adding
to a 300-point drop earlier that day (due to the Greek debt crisis).
Most of the 600-point drop was recovered within tens of
minutes.
(Staffs of the U.S. Commodity Futures Trading Commission and the U.S. Securities and Exchange Commission, 2010)
Dow Jones: an important index of the stock of 30 large
companies that are representative of the
United States economy. It is used as an indicator of the state of the
market.
8. KEGWORTH AIR
DISASTER
On 8 January 1989, British Midland Flight 92 crashed while
attempting an emergency landing.
Crash site: M1 embankment near the village of Kegworth.
The Boeing 737-400 aircraft was severely damaged; 79 of the
126 people aboard the plane survived.
An investigation was carried out by the Air Accidents
Investigation Branch (AAIB).
(Air Accidents Investigation Branch, 1989)
9. EVENTS OF CRASH
1. During the climb to cruising altitude, a fan blade broke
off, causing a decrease in power and an increase in vibrations.
This caused the left engine to produce a jet of flames.
2. Smoke flooded into the cabin. The captain shut down the
engine on the right.
3. The smell of smoke and the vibrations reduced.
4. The crew diverted to East Midlands Airport. The left engine completely
failed during the descent for the emergency landing.
10. (Diagram: chain of causes leading to the crash)
Insufficient knowledge of aircraft and inadequate training: right engine was shut down.
Improper design testing: left engine failed.
Both led to the crash.
11. CONTRIBUTING
FACTORS
Inadequate knowledge of the aircraft
The flight crew observed smoke in the cabin.
They believed they could not trust the vibration sensors, which
indicate the state of the engines. This was true
of the old Boeing 737 but not of the new 737-400.
They fell back on general knowledge of the aircraft, which was wrong.
They thought that bleed air (pressure and heating) was taken from
the right engine.
In fact, the air conditioning systems used both engines in
the new model.
12. BOEING 737 (OLD)
(Diagram of the older Boeing 737 with the right engine labeled.
Key: bleed air via air conditioning.)
14. CONTRIBUTING
FACTORS
Inadequate training
The combination of violent engine vibrations and the smell of
smoke while climbing to cruising altitude was not covered in
training.
A separate protocol existed for each event, but none for both in
conjunction.
There was no simulation training for engine failure of this kind, or for what to
do if a situation falls outside the bounds of standard procedures.
Differences between the Boeing 737 and the 737-400 were not adequately
taught.
15. WHY THE MISTAKE
WAS NOT FOUND
By chance, the smoke dissipated and the vibrations
reduced. This was actually due to a standard procedure
that reduced fuel flow to both engines.
The pilots did not communicate with the cabin crew, who had
visual confirmation of which engine was damaged.
The immediate diversion to East Midlands Airport created a high cabin
workload, which resulted in an incorrect review procedure after the
right engine was shut down.
18. SEQUENTIAL
ACCIDENT MODELS
Simplest form of accident modeling.
Describes the accident as a series of events that occur in a
particular order.
Events occur along a linear timeline.
Analysis: Identifies specific causes and broken links in
the accident chain. The goal is to eliminate broken links.
Examples: Fault tree analysis, Domino model of accident causation, Events and
causal factors charting, Event tree analysis, Management and Oversight
Risk Tree (MORT), Sequential Timed Events Plotting
(STEP), Man, Technology and Organization (MTO) analysis, TRIPOD
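A sequential model can be sketched as an ordered list of events in which the analyst flags the links that deviated from the expected course. The sketch below is only an illustration: the `Event` class, the `deviation` flag and the event wording are assumptions made for this example, loosely paraphrasing the Kegworth timeline, and do not come from any of the named methods.

```python
from dataclasses import dataclass


@dataclass
class Event:
    description: str
    deviation: bool  # True if the analyst judges this link to be "broken"


# Ordered chain of events (illustrative wording, analyst's judgment on flags).
chain = [
    Event("Fan blade breaks off in left engine", True),
    Event("Smoke enters cabin", False),
    Event("Crew shuts down right engine", True),
    Event("Crew diverts to airport", False),
    Event("Left engine fails during descent", True),
]


def broken_links(events):
    """Return (position, description) for every link flagged as a deviation."""
    return [(i, e.description) for i, e in enumerate(events, start=1) if e.deviation]


def first_broken_link(events):
    """Return the earliest deviation, or None if the chain has none."""
    links = broken_links(events)
    return links[0] if links else None


print(first_broken_link(chain))
print(broken_links(chain))
```

The earliest flagged link is what a sequential analysis would treat as the candidate root cause; the model has no place for conditions that existed before the first event.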
19. SEQUENTIAL ACCIDENT
MODELS SUMMARY
Advantages:
Human readable; easy to communicate the chain of events.
Can identify the root cause or the break in the chain of events that led
to the accident.
Good starting point.
Disadvantages:
Do not take into account latent factors.
Inadequate for modeling the variability of sociotechnical
systems.
20. EPIDEMIOLOGICAL
ACCIDENT MODEL
SUMMARY
The accident is described as analogous to a disease.
Some factors that affect the accident occur right
away, while others are latent.
Takes into account that events can manifest over time.
Swiss cheese model (Reason, 1997)
22. SYSTEMIC ACCIDENT
MODEL SUMMARY
Accidents emerge naturally; they are expected to occur, as
detailed in Perrow's Normal Accidents.
(Perrow, 1984)
Focus:
Systemic models focus on the characteristics of a system
as opposed to a series of events that cause the accident in the
system.
Difficult but powerful:
Ideal for complex systems but hard to represent graphically.
23. SYSTEMIC ACCIDENT
MODEL SUMMARY
Considers the performance of the system as a whole:
Organizational
Environmental
Human
Technical
The system is viewed as many interacting components that settle into an
equilibrium.
Systems can evolve dynamically.
Flawed interactions between components can cause the
system to be thrown out of balance, resulting in an
accident.
24. SYSTEMIC ACCIDENT
MODEL SUMMARY
Cognitive Reliability and Error Analysis Method (CREAM)
(Hollnagel, 1998)
The Functional Resonance Analysis Method (FRAM)
(Hollnagel, 2012)
AcciMap
(Rasmussen, 1997)
Systems-Theoretic Accident Model and Processes (STAMP)
(Leveson, 2004)
26. FAULT TREE
ANALYSIS
(Høyland & Rausand, 1994)
Graphical representation of normal events, system
failures, human errors and environmental factors.
Logic gates are used to construct chains of events.
Used to identify sequences of failure.
Identifies the root cause.
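The way logic gates combine lower-level events into a top event can be sketched in a few lines. This is a minimal illustration, not a full fault tree tool: the `Node` class and `occurs` function are hypothetical names, and the gate types assigned to the Kegworth events are assumptions, since the slides do not state which gates were used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    gate: str = "BASIC"  # "AND", "OR", or "BASIC" (a leaf event)
    children: List["Node"] = field(default_factory=list)


def occurs(node, basic_events):
    """Return True if the node's event occurs, given the set of basic events that occurred."""
    if node.gate == "BASIC":
        return node.name in basic_events
    results = [occurs(child, basic_events) for child in node.children]
    if node.gate == "AND":
        return all(results)
    if node.gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {node.gate}")


# Gate types below are assumed for illustration only.
wrong_shutdown = Node("Wrong engine shutdown", "OR", [
    Node("Judgment error"),
    Node("Equipment failure"),
])
engine_failure = Node("Engine failure", "OR", [Node("Fan blade fracture")])
top = Node("Crash landing", "AND", [wrong_shutdown, engine_failure])

print(occurs(top, {"Judgment error", "Fan blade fracture"}))  # True
print(occurs(top, {"Fan blade fracture"}))                    # False
```

With an AND gate at the top, the engine failure alone does not produce the top event; it takes the wrong shutdown as well, which matches the narrative of the accident.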
27. Fault tree: British Midland Flight BD 92 crash landing
- Wrong engine shutdown (right engine)
  - Judgment error
    - Other tasks
      - Pilots did not re-evaluate the engine switch-off decision due to high cabin workload.
    - Insufficient protocols
      - No protocols in place to deal with simultaneous symptoms of vibration and smoke.
    - Poor aircraft design
      - No way to get visual confirmation from the cockpit.
    - Inadequate training
      - Pilots did not know that the aircraft had a different air-conditioning system.
  - Equipment failure
    - Engine vibration sensor failure
      - Inadequate maintenance
- Engine failure
  - Fan blade fracture
    - Metal fatigue
      - Heavy vibrations
        - High power setting in flight
    - Heavy vibrations
      - Flawed engine design
        - Inadequate testing in high [...]
28. Fault tree branch: Engine failure
- Fan blade fracture
  - Metal fatigue
    - Heavy vibrations
      - High power setting in flight
  - Heavy vibrations
    - Flawed engine design
      - Inadequate testing in high [...]
29. Fault tree branch: Wrong engine shutdown (right engine)
- Judgment error
  - Other tasks
    - Pilots did not re-evaluate the engine switch-off decision due to high cabin workload.
  - Insufficient protocols
    - No protocols in place to deal with simultaneous symptoms of vibration and smoke.
  - Poor aircraft design
    - No way to get visual confirmation from the cockpit.
  - Inadequate training
    - Pilots did not know that the aircraft had a different air-conditioning system.
- Equipment failure
  - Engine vibration sensor failure
    - Inadequate maintenance
31. ADVANTAGES AND
DISADVANTAGES
Advantages:
The root cause can be easily identified.
Human readable; easy to communicate the events that led to the
accident.
Disadvantages:
Does not take into account latent conditions.
Does not take into account the environment in which the
accident occurred.
32. CREAM - COGNITIVE
RELIABILITY AND ERROR
ANALYSIS METHOD
(Hollnagel, 1998)
Background:
Developed by Erik Hollnagel in 1998.
Cognitive systems engineering approach:
the design of human-machine systems, accounting for
factors of the environment in which the system
exists.
Key idea:
Cognitive modeling of human performance for accident
analysis or performance prediction.
33. HOW CAN IT BE USED
CREAM is a bidirectional analysis method.
Retrospective analysis – the analysis of errors that have occurred. Used for
accident analysis.
Prospective analysis – predicting possible errors. Used for
accident prediction.
34. COMMON PERFORMANCE
CONDITIONS
Human actions can be correct or incorrect, but they always occur within
the context of a situation.
Context can greatly affect a person's actions. CREAM breaks
down context into 9 criteria:
Adequacy of organization
Working conditions
Adequacy of MMI (man-machine interface) and operational support
Availability of procedures/plans
Number of simultaneous goals
Available time
Time of day (circadian rhythm)
Adequacy of training and expertise
Crew collaboration quality
After the context has been established, analysis can begin.
36. ANALYSIS
CREAM describes an error in two ways:
Phenotype – the error as a physical action that can be
measured and observed.
Genotype – the error's possible cause, influenced by context.
These definitions greatly reduce the inconsistency between
different analysts.
37. ANALYSIS
CREAM describes how errors happen using the following
terminology:
Antecedent – the cause of the error.
Consequent – the effect of the error.
Each antecedent may have one or more consequents, and each
consequent may have one or more antecedents.
Using a table of various antecedents and consequents, an
analysis of the accident can be built.
(Serwy, Rantanen, & Hollnagel, n.d.)
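The many-to-many relationship between antecedents and consequents can be sketched as a lookup table that is walked backwards from an observed error. This is a hedged illustration only: the `links` table and the `trace` function are hypothetical, and the labels are loosely based on the Kegworth example rather than taken from the official CREAM classification tables.

```python
# Each consequent maps to the antecedents that may explain it.
# Labels are illustrative, not CREAM's official categories.
links = {
    "Wrong engine shut down": ["Wrong diagnosis", "Decision not reviewed"],
    "Wrong diagnosis": ["Lack of knowledge of the aircraft", "Inadequate training"],
    "Decision not reviewed": ["Competing tasks"],
}


def trace(consequent, links, seen=None):
    """Walk backwards from a consequent and return the antecedents with no further cause listed."""
    seen = set() if seen is None else seen
    if consequent in seen:  # guard against cycles in the table
        return []
    seen.add(consequent)
    antecedents = links.get(consequent)
    if not antecedents:  # nothing further listed: treat as a specific antecedent
        return [consequent]
    found = []
    for antecedent in antecedents:
        for leaf in trace(antecedent, links, seen):
            if leaf not in found:
                found.append(leaf)
    return found


print(trace("Wrong engine shut down", links))
```

Each branch stops when no further antecedent is listed, which mirrors the retrospective procedure of stopping once a specific antecedent has been selected.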
38. MAN-TECHNOLOGY-
ORGANIZATION (MTO)
TRIAD
The contextual antecedents and consequents are split into
three categories:
Man – physical and cognitive limitations of the person.
Technology – technological failure.
Organization – failure of the organization in which the
situation exists.
At each stage of the analysis there are several options for how to
proceed; given the context established earlier, some of these options are
more likely than others.
This simplifies the analysis process.
39. HOW TO DO CREAM
The CREAM technique can be used for both retrospective
and prospective analysis. Here is how to use it:
1) Identify the Common Performance Conditions, under 'CPC'.
2) Start with a phenotype "Error Mode" (for retrospective analysis) or a
genotype "MTO triad" (for prospective analysis) under 'Workspace'.
3) For each step, select a Specific Consequent to better explain the step.
4) For retrospective analysis, if there is enough information to select a
specific antecedent, then do so. The analysis stops for that branch.
5) Continue with each step of the analysis, exploring all the likely paths as
shown in the left panel of the Workspace.
(Serwy, Rantanen, & Hollnagel, n.d.)
43. EVALUATION
Specific antecedents were found to be:
• Lack of knowledge of the aircraft
• Inadequate training of the flight crew
• Design failure of the aircraft (no visibility of engines)
• Competing tasks – cabin workload too high.
44. ADVANTAGES OF
CREAM
Allows the context of the accident to be taken into
account. Shows how the context in which people work affects
their actions.
Can effectively do both retrospective and prospective
analysis. The method only needs to be learned once, as both use the same
simple principles.
A good structure that keeps inconsistency between different
analysts low.
45. DISADVANTAGES OF
CREAM
Resource hungry; requires a long period of time to complete.
Requires a good level of experience in accident analysis, in
particular in human factors.
No guidance on how the errors that have been found can be
reduced.
46. USEFUL RESOURCES
Software tool for CREAM analysis.
http://www.ews.uiuc.edu/~serwy/cream/v0.6.1/
Evaluation of the software (tells you how to use it)
47. FRAM - FUNCTIONAL
RESONANCE ANALYSIS
METHOD
(Hollnagel, 2012)
Background:
Developed by Erik Hollnagel in 2004.
Performance variability:
Performance in a system, whether internal or external,
fluctuates dynamically. Variability in complex systems is
normal.
Key idea:
Models how components of a system resonate and interact
with each other, causing the system to lose balance and leading
to accidents.
48. FRAM ANALYSIS
0. Define the purpose of modeling and describe the situation
being analyzed: an event that has occurred (incident/accident) or a
possible future scenario (risk).
1. Identify the essential functions in the event ('foreground'
functions when things go right); characterize each by six basic
aspects.
2. Characterize the actual / potential variability of 'foreground'
functions and 'background' functions (context). Consider both
normal and worst case variability.
3. Define functional resonance based on potential / actual
dependencies (couplings) among functions.
4. Propose ways to monitor and dampen performance
variability (indicators, barriers, design / modification, etc.).
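Steps 1 and 3 above can be sketched as a small data structure: each function is described by its six aspects (Input, Output, Precondition, Resource, Time, Control), and a coupling exists wherever one function's output appears in another function's aspect. This is a minimal sketch under stated assumptions: the `Function` class and `couplings` helper are hypothetical, and the item names "Decision to divert" and "Aircraft on approach" are invented for illustration; only the function names and the air traffic control precondition come from the slides.

```python
from dataclasses import dataclass, field
from typing import List

# The five aspects that can receive another function's output.
ASPECTS = ("input", "precondition", "resource", "time", "control")


@dataclass
class Function:
    name: str
    input: List[str] = field(default_factory=list)
    output: List[str] = field(default_factory=list)
    precondition: List[str] = field(default_factory=list)
    resource: List[str] = field(default_factory=list)
    time: List[str] = field(default_factory=list)
    control: List[str] = field(default_factory=list)


def couplings(functions):
    """Return (upstream, downstream, aspect, item) for each output that feeds another function."""
    found = []
    for up in functions:
        for down in functions:
            if up is down:
                continue
            for aspect in ASPECTS:
                for item in getattr(down, aspect):
                    if item in up.output:
                        found.append((up.name, down.name, aspect, item))
    return found


smoke = Function("Air conditioning smoke procedures", output=["Decision to divert"])
divert = Function(
    "Divert to nearest airport",
    input=["Decision to divert"],
    precondition=["Air traffic control clearance"],
    output=["Aircraft on approach"],
)
landing = Function("Landing procedure", input=["Aircraft on approach"])

for coupling in couplings([smoke, divert, landing]):
    print(coupling)
```

Listing the couplings this way gives the analyst the dependencies along which variability in one function could propagate to another, which is the starting point for judging functional resonance.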
49. (Diagram: functions identified for the FRAM model)
Non-normal event (Engine failure)
Non-normal procedures
Air conditioning smoke
High engine vibrations
Air conditioning smoke procedures
High engine vibrations procedures
Engine shutdown
Engine shutdown checklist
Divert to nearest airport
Review any engine shutdown decisions
Landing procedure
Landing
50. (FRAM diagram: each function is drawn as a hexagon whose corners are the six aspects Input (I), Output (O), Precondition (P), Resource (R), Time (T) and Control (C). Functions shown: Non-normal event (Engine failure), Non-normal procedures, High engine vibrations, High engine vibrations procedures, Air conditioning smoke, Air conditioning smoke procedures, Engine shutdown, Engine shutdown checklist, Divert to nearest airport, Review any engine shutdown decisions, Landing procedure, Landing.)
51. CHARACTERISTICS OF
FUNCTION VII
Function: Divert to nearest airport
Time (T): Must divert immediately, top priority.
Control (C): Non-normal procedures, commander's and first officer's actions, Boeing 737 Operations Manual.
Input (I): Air conditioning smoke procedures
Output (O): Input to: Landing procedure
Precondition (P): Air traffic control clearance
Resource (R): Commander's and first officer's attention and time, cabin crew's attention, air traffic control and ground crew manpower
52. (FRAM diagram of the accident: the functions drawn as hexagons, with the following annotations.)
- Left engine malfunctioned.
- Pilots experienced symptoms of engine failure.
- Pilots did not know of the newly introduced engine vibration procedure.
- Engine vibration procedures were not carried out.
- Flight crew chose to deal with the smoke.
- Determined that the right engine was the cause.
- Symptoms stopped.
- Shutdown of the right engine was not reviewed.
- Flight crew's attention was focused on diverting to the nearest airport.
- High cabin workload.
- Pilots were required to land a [...]
53. EVALUATION
Harmful interactions were found to be:
• Inadequate training of the flight crew, who did not know of
certain protocols
• Competing tasks – cabin workload too high.
54. ADVANTAGES OF
FRAM
Guides the investigation team to ask more questions rather
than just looking for answers.
Can effectively do both retrospective and prospective
analysis. The method only needs to be learned once, as both use the same
simple principles.
Takes into account the system in which the accident
occurred.
55. DISADVANTAGES OF
FRAM
Resource hungry; requires a long period of time to complete.
Requires a good level of experience in accident analysis, in
particular in human factors.
Does not find the root cause; further analysis is needed to
determine this.
56. REFERENCES
Marais, K., Dulac, N., & Leveson, N. (2004). Beyond Normal Accidents and High Reliability Organizations: The Need for an Alternative Approach to Safety in Complex Systems. Cambridge.
Air Accidents Investigation Branch. (2012). June 2012 Bulletin. Aldershot: Air Accidents Investigation Branch.
Air Accidents Investigation Branch. (1989). Report on the Accident to Boeing 737-400 G-OBME near Kegworth, Leicestershire on 8 January 1989. Aldershot: Air Accidents Investigation Branch.
Amalberti, R. (1996). La conduite des systèmes à risques. Paris: PUF.
Australian Transport Safety Bureau. (2008). In-flight upset 154 km west of Learmonth, WA 7 October 2008 VH-QPA Airbus A330-303. Canberra: Australian Transport Safety Bureau.
Columbia Accident Investigation Board. (2003). Columbia Accident Investigation Board Vol 1. Washington, D.C.: Columbia Accident Investigation Board.
CME Group. (2010). What Happened on May 6th? Chicago: CME Group.
Department of Energy. (1999). DOE Workbook, Conducting Accident Investigations. Washington, D.C.: Department of Energy.
Dulac, N. (2007). A Framework for Dynamic Safety and Risk Management Modeling in Complex Engineering Systems. Cambridge: MIT.
Easley, D., Lopez de Prado, M. M., & O'Hara, M. (2012). Flow Toxicity and Liquidity in a High Frequency World. Review of Financial Studies, 1457-1493.
Easley, D., Lopez de Prado, M. M., & O'Hara, M. (2010). The Microstructure of the 'Flash Crash': Flow Toxicity, Liquidity Crashes and the Probability of Informed Trading. The Journal of Portfolio Management, 118-128.
Ferry, T. (1988). Modern Accident Investigation and Analysis. Second Edition. New York: Wiley.
Gouran, D. S., Hirokawa, R. Y., & Martz, A. E. (1986). A critical analysis of factors related to decisional processes involved in the Challenger disaster. Central States Speech Journal, 37.
Høyland, A., & Rausand, M. (1994). System Reliability Theory: Models and Statistical Methods. New York: Wiley.
Heimann, C. F. (1993). Understanding the Challenger Disaster: Organizational Structure and the Design of Reliable Systems. The American Political Science Review, 87, 421-435.
Hollnagel, E. (1998). Cognitive Reliability and Error Analysis Method. Oxford: Elsevier Science Ltd.
Hollnagel, E. (2012). FRAM – The Functional Resonance Analysis Method. Farnham: Ashgate.
Hollnagel, E. (2005). Functional Resonance Accident Model: Method and examples. Cognitive Systems Engineering Laboratory, University of Linköping.
Hollnagel, E. (2002). Understanding accidents: from root causes to performance variability. Human Factors and Power Plants, 2002. Proceedings of the 2002 IEEE 7th Conference on, (pp. 1 - 1-6).
Hopkins, A. (2006, December). Studying organisational cultures and their effects on safety. Safety Science, 44, pp. 875-889.
Keong, T. H. (1997, July 9). Risk Analysis Methodologies. Retrieved June 8, 2012, from pacific.net.sg: http://home1.pacific.net.sg/~thk/risk.html
Kim, M., Seong, P., & Hollnagel, E. (2006). A probabilistic approach for determining the control mode in CREAM. Reliability Engineering and System Safety, 191-199.
Lehto, M. (1991). Models of accident causation and their application: Review and reappraisal. Journal of Engineering and Technology Management, 173.
Leveson, N. G. (2004). A new accident model for engineering safer systems. Safety Science, 237-270.
Perrow, C. (1984). Normal Accidents: Living With High-Risk Technologies. New York: Basic Books.
Presidential Commission on the Space Shuttle Challenger Accident. (1986). Report of the Presidential Commission on the Space Shuttle Challenger Accident. Washington, D.C.: Presidential Commission on the Space Shuttle Challenger Accident.
Qureshi, Z. H. (2007). A review of accident modelling approaches for complex socio-technical systems. SCS '07 Proceedings of the twelfth Australian workshop on Safety critical systems and software and safety-related programmable systems (pp. 47-59). Darlinghurst: Australian Computer Society.
Rasmussen, J. (1997). Risk management in a dynamic society: a modelling problem. Safety Science, 183–213.
Reason, J. (1997). Managing the Risks of Organizational Accidents. Aldershot: Ashgate.
Serwy, R. D., Rantanen, E. M., & Hollnagel, E. (n.d.). How to do CREAM. Retrieved August 3, 2012, from Cognitive Reliability Error Analysis Method Web Demonstration Version 0.6: http://www.ews.uiuc.edu/~serwy/cream/v0.6.1/
Sklet, S. (2003). Comparison of some selected methods for accident investigation. Journal of Hazardous Materials, 29-37.
Smith, D. (2000). On a wing and a prayer? Exploring the human components of technological failure. Syst. Res., 543–559.
Svedung, I., & Rasmussen, J. (2002). Graphic representation of accident scenarios: mapping system structure and the causation of accident. Safety Science, 397–417.
Svenson, O. (2001). Accident and Incident Analysis Based on the Accident Evolution and Barrier Function (AEB) Model. Cognition, Technology & Work, 42-52.
Svenson, O. (1991). The Accident Evolution and Barrier Function (AEB) Model Applied to Incident Analysis in the Processing Industries. Risk Analysis, 499–507.
Staffs of the U.S. Commodity Futures Trading Commission and the U.S. Securities and Exchange Commission. (2010). Findings Regarding the Market Events of May 6, 2010. Washington, D.C.: U.S. Commodity Futures Trading Commission and the U.S. Securities and Exchange Commission.
Øien, K. (2001). Risk indicators as a tool for risk control. Reliability Engineering & System Safety, 129–145.