Cheryl MacKenzie of the United States Chemical Safety Board and Peter Wilkinson, who supported the CSB during the investigation, will take us through some of the more important human and organisational factors, discuss how the lessons can be put into practice, and explain why the disaster was not a Black Swan.
The speakers will highlight the issues that should be at the forefront of our thinking in the everyday operations we take so much for granted. Understanding process safety may save our lives, and the lives of our workmates.
3. Organisational Issues From Investigation Perspectives
Presented by Cheryl MacKenzie, Investigator at the United States Chemical Safety and Hazard Investigation Board (CSB)
4. Cheryl MacKenzie, U.S. CSB Investigations Team Lead
Cheryl.Mackenzie@csb.gov www.csb.gov
February 20 – 27, 2017
Deepwater Horizon Revisited
CSB Investigative Insights
University of Sydney Chemical and Biomolecular
Engineering Foundation & SIA
5. Outline
• Purpose – Popular View vs. Reality
• Who is the U.S. Chemical Safety Board?
• What is Human & Organizational Factors?
• Investigation HOF Findings and Conclusions
• Broad Takeaways
7. Popular View – Movie and Elsewhere
• Single “bad guy” and single bad actor among industry
• Individuals on the rig made inexplicable, bad decisions
• Profits and greed solely to blame
• Incident could have been prevented had it not been for a few incompetent people
8. Reality
• Complex incident involving multiple parties making numerous (often unrecognizably) interdependent decisions
• Individuals on the rig made decisions and took actions that made sense to them at the time
• Identifying gaps between policy and practice gives useful safety insights
• More proactive approaches for hazard management exist
9. US Chemical Safety Board
Drive chemical safety change through independent
investigations to protect people and the environment
10. US Chemical Safety Board (CSB)
• Independent non-regulatory federal agency
• Investigate catastrophic chemical accidents in the US
• Determine causes and identify lessons learned
• Make recommendations for safety improvements
12. ‘Human & Organizational Factors’ is NOT about
• Modifying individual behavior
• Finding fault in order to blame
• Weeding out the bad apples
13. Human & Organizational Factors is about
• Understanding the interactions between people and other elements of a complex system
• Defining what we expect of those interactions
• Determining if those expectations are reasonable
• Putting in place systems and processes that ensure those expectations can be achieved
• Monitoring the gaps between expectations and practice
14. CSB found:
• Crew exhibited natural human tendencies to rationalize the situation
• Undefined and unrealistic expectations placed on the well operations crew
• Major gaps in Work-as-Imagined versus Work-as-Done
• Organizational practices influenced human performance
15. Adverse outcomes are not the
result of unusual actions in usual
conditions, but the result of usual
actions in unusual conditions.
Erik Hollnagel, “Is Justice Really Important for Safety?,” 2013
22. Temporary Abandonment
• Install cement surface plug
• Intentionally unbalance the well to
test for integrity
• Monitor well conditions
• Remove mud fully
23. “Free gas in the riser represents one of the most
dangerous situations on a rig from a standpoint of
personnel safety… It is not out of the realm of
possibilities that this slow migration of gas in the
riser could go unnoticed as the other activities are
taking place, and the gas will begin to unload
before anyone notices it.”
BP Well Control Manual
24. Negative Test
Partially remove the mud barrier → Negative (pressure/flow) test(s) → Crew accepts negative (pressure/flow) test results
25. Challenges
• Downhole conditions inferred and
calculated
• Delayed real-time feedback
• Various groups provide critical
information – no one person or entity can
feasibly have it all
• The interconnectedness of decisions not
fully understood
26. Unusual Spacer
• Spacer is used to displace the mud in
preparation to test for well integrity
• Atypical type and amount used
• No operational reason for decision; chosen
to ease disposal.
• This material likely plugged the kill line that was later used to conduct the negative test, which was deemed a success. The crew did not know this
27. Rationalized Well Conditions Based on Experience
• Riser level was not full
• The level could have dropped before BOP was closed or after
─ After = leak past annular
─ Before = well integrity lost
• Crew assumed it was after the BOP was closed; this option made more sense to them
28. Post-Incident Well Data Analysis
Real-time Deepwater Horizon data indicates the drillpipe pressure began to drop just after the crew closed an annular preventer, implying a loss of well integrity, NOT a leaking annular.
Why did that assumption seem more plausible?
• Challenges of the well had so far been successfully overcome, reinforcing the mentality that success was inevitable
- Multiple loss-of-well-control events
- Changes to drilling plans to accommodate challenges
• Various personnel deemed the cement job successful
• Positive pressure test was successful (i.e., no leaks from inside the well to the outside)
• It is “not uncommon” to see an annular leak.
31. Gap Between Policies and Practices
BP did not send a corrected “Forward Plan”
Transocean had policy to co-develop Standing Instructions
to the Driller (SID) with its customer (BP)
‒ Described as a key communication tool that should be
discussed with drillers at the beginning of a shift
A Transocean advisory issued weeks before noted that a
SID should “raise awareness and […] highlight”
underbalanced conditions in a well when a single barrier is
present
‒ No evidence SIDs were used on the Deepwater Horizon
33. Procedure Assumes Successful Test
Close [BOP] and conduct negative test. After successful negative test open [BOP]
The night shift WSL recalled participating in approximately 50 previous negative tests; to his knowledge, none had ever failed.
34. Negative Test Procedure & Approach
• At least 6 different procedures used by the
DWH from August 2007 through April 2010
• The procedure at Macondo was different
from any of these
• Transocean required written procedures for
safety critical tasks—including negative
tests
• Generic DWH procedures identified personal
safety and minor spills of mud
36. Conversation between Well Site Leader
and Onshore Drilling Engineer
• Conversation about the next steps - negative test
came up
• WSL tells ODE test was “squirrelly” but “no
problems”
• Toolpusher/drill crew explanation was “annular compression” that “happens all the time”
• “If there had been a kick in the well, we would have
seen it”
37. Conversation between Well Site Leader
and Onshore Drilling Engineer
• Lacking contextual information
• Influence of org hierarchy and structure
• Relationship will impact tone and purpose
• Purpose of call is to discuss next steps
39. Conversation between Mudlogger and
Other Well Operations Crewmembers
• Mudlogger provides a second set of eyes
on the well data from the control board and
video feed of fluid flow on the rig
• Perceived as independent layer of
protection
• Yet not privy to all pertinent information to
fulfill his protective role
40. Conversation between Mudlogger and
Other Well Operations Crewmembers
• Multiple fluid movements and transfers between
pits and off the rig between 9:10 and 9:35 pm
– These activities impacted his understanding of the
data he was meant to monitor
• When he sought information, he didn’t get sufficient feedback
• Org structure discouraged assertiveness
• Not co-located – lacking same visual and
contextual information as well operations crew
41. Other Organizational Factors
• Development and use of relevant safety
performance indicators and metrics
─ LTI award recognition from BP to
Transocean
─ LTI ≠ control over major accident
hazards
43. Other Organizational Factors
• Development and use of relevant safety
performance indicators and metrics
─ LTI award recognition from BP to
Transocean
─ LTI ≠ control over major accident
hazards
─ Influence of safety observation
programs
44. Influence of Safety Observation Program
Policy: Employees shall observe and report unsafe
situations/activities
• Transocean crews required to submit daily START card
• Crewmembers believed the focus was on the quantity, not the quality, of observations.
• “people [tried] not to rat people out so to speak, you
know like you wanted to be helpful, […] whereas some
of the higher-ups in the office, they kind of wanted to
weed out problems …”
• “I’ve seen guys get fired for someone [writing] a bad
START card about them”
(pg 143-144, Vol 3 CSB Macondo Report)
45. Well Control Events – Precursor Data
2008 – 2009:
• 6 riser unloading events
2009:
• 121 well control events
• 32 different operators
• Various geographic locations
Indicators:
• Kick volume
• Kick intensity
• Riser unloading events
Source: Transocean Well Control Events & Statistics report, 2005 - 2009
46. Other Organizational Factors
• Insights of Organizational Culture found
in the WAI-WAD Gap
─ Not necessarily about operational
discipline
─ The gap is there for a reason, and it is
usually not due to complacency
─ The gap reveals discrepancies
between espoused values and actual
culture
47. Broad Takeaways
• Complex incident involving multiple parties making numerous (often unrecognizably) interdependent decisions
• Individuals on the rig made decisions and took actions that made sense to them at the time
• The power of metrics
• Safety opportunity resides within the gaps between policies and practice
48. Beyond Today’s Presentation
Volume 1
Incident Background
Offshore Drilling Primer
Volume 2
Blowout Preventer
Safety Critical Barrier Management
Volume 3
Human & Organizational Factors
Safety Performance Indicators
Risk Management
Corporate Governance
Safety Culture
Volume 4
US Offshore Safety Regulations During & Post-Macondo
Attributes of An Effective Regulator & Regulatory System
49. Disclaimer
This presentation for the SIA and the University of Sydney Chemical and Biomolecular Engineering Foundation by Cheryl MacKenzie, Investigator for the U.S. Chemical Safety and Hazard Investigation Board, on February 20 – 27, 2017, is for general informational purposes only. The presentation is the view of Ms. MacKenzie. References, conclusions or other statements about CSB investigations may not represent a formal, adopted product or position of the entire Board. For information on completed investigations, please refer to the final written products on the CSB website at: www.csb.gov.
50. Cheryl MacKenzie, U.S. CSB Investigations Team Lead
Cheryl.Mackenzie@csb.gov www.csb.gov
Questions?
54. What will I cover?
Definition of Human and Organisational Factors (HOF)
The special problem of very low probability but very high consequence events
What are the main HOF issues and what can we do about them?
A checklist for improvement – not limited to oil and gas
But first – what do we mean by Human and Organisational Factors?
55. What are Human and Organisational Factors?
Definitions of Human Factors:
Human Factors are “the study of the interactions between human and machine” –
Gordon 1998
Human Factors “…include a focus on environmental, organisational and job factors
which influence work behaviour in a way that can affect health and safety” – UK
HSE
Human Factors “…[cover] …management functions, decision making, learning
and communication, training, resource allocation and organisational culture”
As the focus has widened the term Human and Organisational Factors is increasingly
used.
HOF is multi-disciplinary: Psychology, Management Science, Sociology,
Anthropology…
56. Intellectual roots of HOF
Sectors: Aviation, Oil and Gas, Chemical, Mining, Rail, Healthcare, Public Service, Maritime
Academic Disciplines: Management Science, Psychology, Engineering, Sociology
57. Incident causation?
Key Question – what is the mental model of incident causation in your organisation?
What are the causes of incidents?
80% caused by human error? – So, who caused the remaining 20%?
Is there ever one root cause?
More modern view:
Humans involved in all incidents, but not just at front line
Managers, supervisors, designers, manufacturers, suppliers at all levels and not just
“hands on” front line workers
Incidents typically involve failures or defects in:
Systems, Processes and procedures
Equipment, hardware and software
Organisational culture or “climate”
58. Terminology – We have a problem
Sector/region specific terms
Process Safety - PSM USA
Major Accident Events - MAEs offshore oil and gas e.g. UK/Australia
Major Accident Hazards – onshore major hazard industry
Technical Safety – old BP term
Catastrophic Events – some mining companies
Material Unwanted Events - ICMM (international mining peak body)
These are all low probability/high consequence events. They can have devastating
impacts on people, the environment and businesses.
They are material risks to an organisation and they (and their precursor events)
warrant serious attention.
They share a similar set of underpinning ideas and concepts.
59. Terminology – Process vs Personal Safety
Leak in oil pipeline can result in: Gas Release, Oil Spill, Loss of Supply, Financial Loss, Reputation Damage, Fire, Explosion, Environment Damage
BUT – it is down to chance which, if any, of these consequences eventuate
60. Low probability but high consequence events –
Are these especially difficult to deal with and if so, why?
Feedback
Low probability but high consequence = less feedback?
High probability but low consequence = more feedback
Cognitive biases
Optimism bias – “she’ll be right…!”
Availability heuristic and risk matrices
Confirmation bias
Hindsight bias
Decision Making
Decision making – validity of rational actor model?
Making sense of decision making in practice
Leadership
Avoiding dissonance “…tell me why this can’t happen to us”
And finally, how well does bad news travel upwards in an organisation? You cannot manage what you do not know about
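The feedback point on this slide can be illustrated with simple arithmetic. The sketch below is not from the presentation: it assumes events are independent from year to year with a constant annual probability, and the two probabilities and the 30-year career length are hypothetical values chosen only for illustration.

```python
def chance_of_at_least_one_event(annual_probability: float, years: int) -> float:
    """Return P(at least one event in `years`), assuming independent years."""
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    if years < 0:
        raise ValueError("years must not be negative")
    # Chance of no event in a single year, compounded over all years.
    chance_of_no_event = (1.0 - annual_probability) ** years
    return 1.0 - chance_of_no_event


# Hypothetical values, chosen only for illustration.
CAREER_YEARS = 30
SCENARIOS = {
    "high probability, low consequence (0.5 per year)": 0.5,
    "low probability, high consequence (0.001 per year)": 0.001,
}

if __name__ == "__main__":
    for label, probability in SCENARIOS.items():
        chance = chance_of_at_least_one_event(probability, CAREER_YEARS)
        print(f"{label}: {chance:.1%} chance of seeing it in {CAREER_YEARS} years")
```

Under these assumptions the rare event has roughly a 3% chance of being seen even once in a 30-year career, so most people never receive direct feedback on how well it is being controlled.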
61. Many high hazard organisations will have one
or more of these characteristics
Strong focus on personal safety including fatality risk
Genuine shock and surprise when a serious event occurs – they might even call
it a Black Swan event!
They have a large number of systems, procedures, policies, practices.
Quality sometimes good BUT ease of use varies due to:
Volume of material
Complexity
Clarity
They assume that work is done in accordance with the written procedures –
work as imagined vs work as done
Reporting on “health” of risk controls – doesn’t get high enough in organisation
And even where it does - it is often unduly optimistic
62. Signs, symptoms and treatments (1)
Strong focus on personal safety especially fatality risk
LEADERSHIP
Senior leaders can articulate the difference between process safety (or
MAEs/Catastrophic Hazards etc.) and personal safety
Metrics for both types of hazards are reported to the top
Senior leaders are incentivised to improve control over process safety
The annual report talks about this aspect of the company’s activities
63. Signs, symptoms and treatments (2)
Large volume of paperwork – is it clear what really matters?
Risk assessment and bowties
Volume, complexity and length of procedures
Is their purpose clear?
Training
Checklist to be rigidly followed
General guidance
Is what is really important clear?
Why are the first 3 pages about document control?
65. Large no. of controls
Processes & procedures (Shelfware)?
Complex bowties – “shelf-ware”
But what really matters?
Bowtie
Critical control summary sheet
66. Signs, symptoms and treatments (3)
Monitoring of the implementation of controls –
“work as imagined” vs “work as done”
Is there a clear model of the purpose and scope of monitoring?
Who is accountable for monitoring control implementation?
How is this to be done?
What is the frequency of monitoring?
Do supervisors at all levels have the skills for this?
Are the results available in a useful format? After all, these are “material risks”
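The questions on this slide (who is accountable, how, how often) can be captured in a simple record per critical control. The sketch below is a hypothetical illustration, not a tool described in the presentation: the `CriticalControl` class, its field names, and the example controls are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class CriticalControl:
    """Hypothetical monitoring record for one critical control."""
    name: str
    accountable_role: str          # who is accountable for monitoring
    method: str                    # how the monitoring is to be done
    frequency_days: int            # how often it should be checked
    last_checked: Optional[date]   # None means never verified in the field

    def is_overdue(self, today: date) -> bool:
        # A control that has never been checked is treated as overdue.
        if self.last_checked is None:
            return True
        return today - self.last_checked > timedelta(days=self.frequency_days)


def overdue_controls(controls: List[CriticalControl], today: date) -> List[str]:
    """Return the names of controls whose monitoring is overdue."""
    return [control.name for control in controls if control.is_overdue(today)]


if __name__ == "__main__":
    # Example data is invented for illustration only.
    register = [
        CriticalControl("Control A", "Supervisor", "Field observation", 30, date(2024, 1, 1)),
        CriticalControl("Control B", "Manager", "Record review", 30, date(2024, 2, 20)),
        CriticalControl("Control C", "Supervisor", "Field observation", 7, None),
    ]
    print(overdue_controls(register, today=date(2024, 3, 1)))
```

A list like this gives the “useful format” the slide asks for: it reports work as done (when a control was last checked) rather than work as imagined (the frequency written in the procedure).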
67. Simplified model of monitoring
Managers: Ensure supervisors have systems for monitoring critical controls and carry out some monitoring themselves
Supervisors: Monitor implementation of critical controls by operators
Front line workers: Do the work! Carry out their own monitoring – including each other
Audit
68. Signs, symptoms and treatments (4)
Reporting and governance over Risk
Process Safety risks are usually Material Risks – ASX Principle 7
As a result, they should appear in the Enterprise Risk Management System
Are conclusions on these sorts of risk based on field data? Or are they unsupported assertions?
Bad news doesn’t travel upwards well – but you cannot manage what you do not know about
Reward bad news – but expect people to bring you solutions too!
69. Signs, symptoms and treatments (5)
Culture
Culture: “…remains a confusing and ambiguous concept…little evidence of a
relationship between safety culture and safety performance…”
Values + Practices = Culture (John Coleman, Harvard Business Review); Andrew
Hopkins and Edgar Schein say much the same.
Values can be faked – Practices are visible. In good cultures, Values and Practices must be in sync
To improve culture as applied to safety – a focus on practices is likely to be more
successful. Practices repeated are “How we do things round here.”
Putting Safety Critical Controls at the heart of the prevention
(and mitigation) strategy for MAEs is good for the culture!
70. HOFs – Examples of what we can do (1)
Deepwater Horizon | HOF Issues | What we can do
7 years LTI free award | Leadership Focus | Personal injury data not related to major accident prevention
Diverter: over-reliance on front line personnel | Human Factors Engineering | Engineering design important in preventing human error
Focus on risk on environmental “spills” | |
Decision on cement plug integrity | Group think and confirmation bias; WAI vs WAD | Senior and respected “black hat” as part of team to challenge
Assumption re drillers instructions | WAI vs WAD | Active monitoring of critical controls
71. HOFs – Examples of what we can do (2)
Deepwater Horizon HOF Issues
BOP – technical issues Maintenance induced error? Design, active monitoring of
maintenance procedures
Previous incidents did not result
in effective action to
communicate and take action
Lessons Learnt processes
ineffective
Many organisations identify “Lessons
to be learnt” – lesson only learnt when
tools, techniques, practices are
changed and implemented
Important issues left to front line
personnel
Availability Heuristic How well can we tell stories about low
probability but high consequence
events
Risk Matrices
legal blameworthy approaches
especially front line workers
Fundamental Attribution
Error
Hindsight Bias
Normative thinking and language
prevalent
72. Conclusion
We know there are a variety of factors involved in major accidents
But we are better at dealing with the engineering factors than with the human and organisational ones
Today’s thesis is that naming and explaining some of these HOFs helps people to talk about them, research them and apply them in practice
Some HOFs are easier to deal with than others, e.g. reducing an over-focus on LTIs compared with managing group think – but techniques are readily available to address most HOFs