What the NTSB teaches us about incident management & postmortems

Michael Kehoe – Architect of reliable, scalable infrastructure at LinkedIn
Jeff Weiner, Chief Executive Officer
Michael Kehoe, Staff Site Reliability Engineer
Nina Mushiana, Sr Site Reliability Manager
Agenda and Vision
Today’s agenda
1. Introductions
2. Background on the NTSB
3. NTSB: Investigative Process
4. Recommendations & Most Wanted List
5. How this applies to us
6. Final thoughts
Michael Kehoe
$ /usr/bin/whoami
● Staff Site Reliability Engineer @ LinkedIn
● Production-SRE Team
● Funny accent = Australian + 4 years American
Nina Mushiana
$ /usr/bin/whoami
● Sr Site Reliability Engineer Manager @ LinkedIn
● Production-SRE Team & Site-Ops
Production-SRE Team @ LinkedIn
$ /usr/bin/whoami
● Disaster Recovery – Planning & Automation
● Incident Response – Process & Automation
● Visibility Engineering – Making use of operational data
● Reliability Principles – Defining best practice & automating it
Incident Command System (ICS)
https://training.fema.gov/emiweb/is/icsresource/assets/reviewmaterials.pdf
Background on the NTSB
JURISDICTION
● Aviation
● Surface Transportation
● Marine
● Pipeline
● Assistance to other agencies/governments
“The NTSB shall investigate or have investigated and establish the facts, circumstances, and cause or probable cause of accidents…”
U.S. Code § 1131
“… The Board shall report on the facts and circumstances of each accident investigated … The Board shall make each report available to the public at reasonable cost…”
U.S. Code § 1131
“The NTSB does not assign fault or blame for an accident or incident … accident/incident investigations are fact-finding proceedings with no formal issues and no adverse parties … and are not conducted for the purpose of determining the rights or liabilities of any person.”
U.S. Code § 1154
Similar Organizations
● Italy – Agenzia nazionale per la Sicurezza del Volo (ANSV)
● Canada – Transportation Safety Board of Canada (TSB)
● Indonesia – Komite Nasional Keselamatan Transportasi (NTSC)
● Netherlands – Dutch Safety Board (DSB)
● Australia – Australian Transport Safety Bureau (ATSB)
● United Kingdom – Air Accidents Investigation Branch (AAIB)
● Germany – Bundesstelle für Flugunfalluntersuchung (BFU)
● France – Bureau d’Enquêtes et d’Analyses pour la Sécurité de l’Aviation Civile (BEA)
NTSB Investigation Process
1. Pre-Investigation Preparation
2. Notification & Initial Response
3. On-Scene Activities
4. Post-On-Scene Activities
1. Pre-Investigation Preparation
GO TEAM
● Go-Team: on-call investigators ready for assignment
● Investigator-In-Charge (IIC) is pre-assigned
● A full Go-Team may contain several subject matter experts, e.g.:
○ Human performance
○ Aircraft performance
○ Air Traffic Control
Pre-Investigation Preparation
GO TEAM ROSTER
● On-call roster made available internally
○ Phone & pager numbers
● Updated weekly
● All personnel should be able to arrive at an airport within 2 hours of notification
○ Those who live far from an airport should keep essentials with them
● Division Chiefs are responsible for testing pagers
2. Notification & Initial Response
REGIONAL RESPONSE
1. Regional office notifies headquarters of the incident
2. The regional office closest to the accident provides at least one investigator to perform PR & “stakedown”
Notification & Initial Response
HEADQUARTERS RESPONSE
1. After an incident occurs, the communication center advises the IIC and the Chief of Major Investigations (who then inform their superiors)
2. The Office of Aviation Safety (OAS) director decides whether to launch a Go-Team
3. The Chief of Major Investigations makes other executives aware
Notification & Initial Response
NOTIFICATION & ASSIGNMENTS
● Go-Team composition is determined by the incident circumstances
● If in doubt, send more specialists
Notification & Initial Response
PARTY NOTIFICATION
● IIC gives party status to organizations that can provide technical assistance (airlines, aircraft manufacturers, etc.)
● The communication center helps with travel arrangements and on-site administrative support
● The Go-Team travels together to the accident site
3. On-Scene Activities
COMMAND ROOMS
● Have meeting rooms that accommodate at least 30 people
● Have space for media
● Ensure the command room is equipped with:
○ PCs
○ Telephone systems
○ Forms
● IIC is responsible for managing this
On-Scene Activities
COMMAND ROOMS
● For major investigations, administrative support is provided
● A government purchase card is available for goods or services
On-Scene Activities
ORGANIZATIONAL MEETING
● Share preliminary information
● Organize (assign) participants
● Organize observers
● Establish lines of authority
“The manner in which the IIC conducts the organizational meeting will establish the tone of the investigation. Therefore, the importance of being organized, articulate, assertive, composed, and understanding cannot be overstated”
Major Investigations Manual Sec 3.2
On-Scene Activities
ACCIDENT SITE SAFETY PRECAUTIONS
● Safety officer identifies & classifies risks, then develops countermeasures
● Safety officer gives daily briefings to the accident-site team
On-Scene Activities
OBSERVERS
● Observers may be allowed if they have no self-interest in the investigation
● May include:
○ Congressional oversight committee(s)
○ Military personnel
○ Foreign governments
○ Federal agencies
On-Scene Activities
LINE OF AUTHORITY
● IIC is the most senior person on-scene; all investigative activity is under his/her control
● If the IIC cannot resolve an issue, the IIC may escalate to the Chief of Major Investigations
● Further escalation is possible if required
On-Scene Activities
PROGRESS MEETINGS
● On-site progress meetings are held daily to:
○ Disseminate information obtained
○ Plan the day’s activities
○ Discuss plans for subsequent investigative activities
● Generally start at 6 p.m.
● Plan the next day’s meeting
On-Scene Activities
DAILY ACTIVITIES OF IIC
● Headquarters briefing
● Safety board staff meeting
● Party coordinator meeting
● Site visit
4. Post-On-Scene Activities
NTSB Report Structure
● Factual Information – gathering facts about the incident
● Analysis – analyzing how the facts contributed to the incident
● Conclusions – drawing conclusions about what happened
● Recommendations – writing detailed recommendations
● Appendices – extra information
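The same report structure can double as an internal postmortem template. The sketch below is a minimal illustration (Python 3.9+), not an NTSB or LinkedIn tool: the `Postmortem` class, the `render` and `missing_sections` helpers, and the field names are all hypothetical.

```python
from dataclasses import dataclass, field

# Section order follows the NTSB report structure; field names are hypothetical.
SECTIONS = (
    ("Factual Information", "factual_information"),
    ("Analysis", "analysis"),
    ("Conclusions", "conclusions"),
    ("Recommendations", "recommendations"),
    ("Appendices", "appendices"),
)


@dataclass
class Postmortem:
    title: str
    factual_information: list[str] = field(default_factory=list)  # facts only, no blame
    analysis: list[str] = field(default_factory=list)             # how the facts contributed
    conclusions: list[str] = field(default_factory=list)          # what happened, in summary
    recommendations: list[str] = field(default_factory=list)      # action items to track
    appendices: list[str] = field(default_factory=list)           # graphs, logs, chat excerpts


def missing_sections(pm: Postmortem) -> list[str]:
    """Return the headings of sections that are still empty."""
    return [heading for heading, attr in SECTIONS if not getattr(pm, attr)]


def render(pm: Postmortem) -> str:
    """Render the postmortem as plain text, one bullet per entry."""
    lines = [pm.title]
    for heading, attr in SECTIONS:
        lines += ["", heading]
        items = getattr(pm, attr)
        lines += [f"● {item}" for item in items] or ["● (none recorded)"]
    return "\n".join(lines)
```

Checking `missing_sections` before a review meeting is one way to make sure no section is skipped before the report is finalized.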
Post-On-Scene Activities
WORK PLANNING
● Discuss activities that will follow the on-scene phase of the investigation
● Build timelines for work
● Provide avenues for various teams to work together
Post-On-Scene Activities
FACTS & ANALYSIS REPORT
● A factual report based on the field notes and subsequent investigation activities
● Each group chairman shall submit an analysis report based on the information contained in his or her factual report
Post-On-Scene Activities
PUBLIC HEARING
● Led by the IIC/Hearing Officer
● Identify witnesses whose testimony is appropriate
● Witnesses may be from the parties to the investigation or may be suggested by one or more of the parties
● Purpose: to ensure all relevant information is gathered before writing the report
Post-On-Scene Activities
TECHNICAL REVIEW
● Provides an additional opportunity for all parties to review all factual information
● Ensures all issues are resolved
● Held as soon as possible after the public hearing
Post-On-Scene Activities
PREPARATION OF FINAL REPORT
● A dedicated department helps write the report
● Follows a standard template
○ Annex 13 to the Convention on International Civil Aviation (ICAO)
● Contains formal recommendations to manufacturers/transportation authorities
Recommendations & Most Wanted List
● NTSB advocates for particular action items based on its report(s):
○ Generally directed towards transport bodies/manufacturers
● NTSB publicly tracks the response of the responsible body
https://www.ntsb.gov/safety/mwl/Pages/default.aspx
How does this relate to all of us?
1. Pre-Investigation Preparation
Applying this to operations
PRE-INCIDENT PREPARATION
● Pre-assign an incident commander
● Publish on-call schedules
○ Manager is responsible
● Test on-call pagers regularly
● Ensure that you can respond within SLA
● Keep a printed copy of on-call contact info
● Disaster recovery (DR)
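Two of the checks above, publishing a weekly on-call schedule and confirming pagers are acknowledged within SLA, can be sketched in a few lines of Python 3.9+. This assumes a simple round-robin weekly hand-off and a single SLA value; `on_call_for`, `pager_test_passed`, and the roster names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def on_call_for(roster: list[str], rotation_start: datetime, when: datetime) -> str:
    """Return who is on call at `when`, assuming a round-robin weekly hand-off."""
    if not roster:
        raise ValueError("roster is empty")
    if when < rotation_start:
        raise ValueError("`when` is before the rotation started")
    weeks_elapsed = (when - rotation_start) // timedelta(weeks=1)  # whole weeks, as an int
    return roster[weeks_elapsed % len(roster)]


def pager_test_passed(sent_at: datetime, acked_at: Optional[datetime], sla: timedelta) -> bool:
    """A pager test passes only if the page was acknowledged within the SLA."""
    if acked_at is None:  # never acknowledged
        return False
    return timedelta(0) <= acked_at - sent_at <= sla
```

For example, `on_call_for(["alice", "bob"], datetime(2018, 1, 1), datetime(2018, 1, 9))` returns `"bob"`, since one full week has elapsed.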
2. Notification & Initial Response
Applying this to operations
NOTIFICATION & INITIAL RESPONSE
● NOC/SiteOps team notifies the incident commander + manager
○ Prod-SRE gets engaged
● Prod-SRE Manager/On-call:
○ Assess, Engage, Notify, Mitigate
Applying this to operations
NOTIFICATION & INITIAL RESPONSE
● Once the incident is verified, we launch a full Major Incident response
● Incident commander gives “party status” to observers
● Manager informs executives & PR
○ Periodic updates
● Mitigate
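A rough sketch of this notification fan-out, assuming two tiers (any incident vs. a verified major incident) and a fixed update interval. The recipient names and the helpers `who_to_notify` and `update_schedule` are hypothetical placeholders for whatever paging or chat tooling a team actually uses (Python 3.9+).

```python
from datetime import datetime, timedelta

# Hypothetical recipient groups; map these onto your own paging/chat tooling.
FIRST_RESPONDERS = ["incident-commander", "manager", "prod-sre-oncall"]
MAJOR_INCIDENT_EXTRAS = ["executives", "pr"]


def who_to_notify(verified_major: bool) -> list[str]:
    """First responders are always engaged; execs & PR only for a verified major incident."""
    recipients = list(FIRST_RESPONDERS)  # copy so the module-level list is never mutated
    if verified_major:
        recipients += MAJOR_INCIDENT_EXTRAS
    return recipients


def update_schedule(start: datetime, end: datetime, every: timedelta) -> list[datetime]:
    """Times at which the manager sends periodic updates between start and end."""
    if every <= timedelta(0):
        raise ValueError("update interval must be positive")
    times = []
    t = start + every
    while t <= end:
        times.append(t)
        t += every
    return times
```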
3. On-Scene Activities
Applying this to operations
ON-SCENE ACTIVITIES
● Private + public Slack work channels
● IC is empowered to make decisions
● Organizational call to ensure:
○ The problem is understood
○ Areas of investigation are assigned
Applying this to operations
ON-SCENE ACTIVITIES
● War room
○ Incident commander drives the war room
○ Roles & responsibilities are assigned to each “party”
○ Communication to execs at a regular cadence
○ Admin ensures supplies and food
● Gather data and keep the timeline doc updated
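The timeline doc is easiest to keep accurate when every party appends timestamped entries and the document is always presented in event order. A minimal in-memory sketch (Python 3.9+), assuming all timestamps are recorded in one timezone; `Timeline` and `TimelineEntry` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime  # when the event happened (not when it was written down)
    party: str    # which team/"party" reported it
    note: str


class Timeline:
    """Append-only incident timeline; entries are shown in event order."""

    def __init__(self) -> None:
        self._entries: list[TimelineEntry] = []

    def add(self, at: datetime, party: str, note: str) -> None:
        self._entries.append(TimelineEntry(at, party, note))

    def entries(self) -> list[TimelineEntry]:
        # Facts often arrive out of order during an incident; sort by event time.
        # sorted() is stable, so simultaneous events keep the order they were added.
        return sorted(self._entries, key=lambda e: e.at)

    def render(self) -> str:
        return "\n".join(f"{e.at:%Y-%m-%d %H:%M} [{e.party}] {e.note}" for e in self.entries())
```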
4. Post-On-Scene Activities
Applying this to operations
POST ON-SCENE ACTIVITIES
● Postmortem
○ Dedicated team
○ PM template
○ Blameless
● “Postmortem rollup”
○ Action items are prioritized
○ Weekly reporting on the status of action items
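A “postmortem rollup” can be as simple as sorting open action items by priority and due date and flagging the overdue ones. The sketch below (Python 3.9+) assumes a P0 to P3 priority scale where 0 is most urgent; `ActionItem` and `weekly_rollup` are hypothetical, not part of any specific tracker’s API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    owner: str
    priority: int  # hypothetical P0..P3 scale, 0 = most urgent
    due: date
    done: bool = False


def weekly_rollup(items: list[ActionItem], today: date) -> str:
    """Open items by priority, then due date; overdue items are flagged."""
    open_items = sorted((i for i in items if not i.done), key=lambda i: (i.priority, i.due))
    lines = []
    for item in open_items:
        flag = " OVERDUE" if item.due < today else ""
        lines.append(f"P{item.priority} {item.title} ({item.owner}, due {item.due.isoformat()}){flag}")
    closed = sum(1 for i in items if i.done)
    lines.append(f"{len(open_items)} open, {closed} closed")
    return "\n".join(lines)
```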
Recommendations: Most Wanted List
Applying this to operations
MOST WANTED LIST
● Use the post-incident process to drive improvements and hold people accountable for action items
● Keep track of recurring issues/repeaters
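Tracking repeaters works much like the NTSB Most Wanted List: tag each postmortem with its contributing causes, then surface the tags that keep coming back. A minimal sketch (Python 3.9+), assuming free-form string tags; `repeaters` and the tag names are hypothetical.

```python
from collections import Counter


def repeaters(postmortem_tags: list[list[str]], min_count: int = 2) -> list[tuple[str, int]]:
    """Return (tag, count) for causes seen in at least `min_count` postmortems.

    Each postmortem counts once per tag, even if the tag is listed twice.
    Results are ordered by count (descending), then tag name for stable output.
    """
    counts = Counter(tag for tags in postmortem_tags for tag in set(tags))
    frequent = [(tag, n) for tag, n in counts.items() if n >= min_count]
    return sorted(frequent, key=lambda kv: (-kv[1], kv[0]))
```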
Final Thoughts
● NTSB Investigative Process – complete incident + postmortem process
● Invest – the more you put in, the more you’ll get out
● Accountability – accountability for improvements/action items
Questions?