What the NTSB teaches us about
incident management & postmortems
Michael Kehoe
Staff Site Reliability Engineer
AllDayDevops: What the NTSB teaches us about incident management & postmortems

The National Transportation Safety Board (NTSB) is one of the most widely known government bodies in the world. Its role is to run into an incident, secure the scene and understand everything that happened. Given the important and unpredictable nature of its work, it has an extensive manual that sets out how incidents should be attended to and how the investigation should progress.

This session details how the NTSB’s approach to its work, and the procedure that drives it, is transferable to us as incident responders. We’ll cover the NTSB’s pre-incident preparation, incident notification, on-scene response, collection of information from the field, report writing and hearings. Throughout, we’ll draw parallels to IT incident management and show how to create processes and procedures that mimic those of the NTSB.

  1. What the NTSB teaches us about incident management & postmortems
     Michael Kehoe, Staff Site Reliability Engineer
  2. Agenda and Vision
  3. Today’s agenda
     1 Introductions
     2 Background on the NTSB
     3 NTSB: Investigative Process
     4 Recommendations & Most Wanted List
     5 How this applies to us?
     6 Final thoughts
  4. Michael Kehoe: $ WHOAMI
     • Staff Site Reliability Engineer @ LinkedIn
     • Production-SRE Team:
       • Disaster Recovery
       • Incident Response
       • Visibility Engineering
       • Reliability Principles
     • Find me online at:
       • @matrixtek
       • https://michael-kehoe.io
       • linkedin.com/in/michaelkkehoe
  5. Production-SRE Team @ LinkedIn: $ /USR/BIN/WHOAMI
     ● Disaster Recovery – Planning & Automation
     ● Incident Response – Process & Automation
     ● Visibility Engineering – Making use of operational data
     ● Reliability Principles – Defining best practice & automating it
  6. Incident Command System (ICS)
     https://training.fema.gov/emiweb/is/icsresource/assets/reviewmaterials.pdf
  7. Background on the NTSB
  8. Background on the NTSB: JURISDICTION
     ● Aviation
     ● Surface Transportation
     ● Marine
     ● Pipeline
     ● Assistance to other agencies/governments
  9. “The NTSB shall investigate or have investigated and establish the facts, circumstances, and cause or probable cause of accidents…”
     49 U.S. Code § 1131
  10. “… The Board shall report on the facts and circumstances of each accident investigated…The Board shall make each report available to the public at reasonable cost…”
      49 U.S. Code § 1131
  11. “The NTSB does not assign fault or blame for an accident or incident…accident/incident investigations are fact-finding proceedings with no formal issues and no adverse parties … and are not conducted for the purpose of determining the rights or liabilities of any person.”
      49 C.F.R. § 831.4
  12. Similar Organizations
      ● Italy – Agenzia Nazionale per la Sicurezza del Volo (ANSV)
      ● Canada – Transportation Safety Board of Canada (TSB)
      ● Indonesia – Komite Nasional Keselamatan Transportasi (NTSC)
      ● Netherlands – Dutch Safety Board (DSB)
      ● Australia – Australian Transport Safety Bureau (ATSB)
      ● United Kingdom – Air Accidents Investigation Branch (AAIB)
      ● Germany – Bundesstelle für Flugunfalluntersuchung (BFU)
      ● France – Bureau d’Enquêtes et d’Analyses pour la sécurité de l’aviation civile (BEA)
  13. NTSB Investigation Process
  14. NTSB Investigation Process
      1. Pre-Investigation Preparation
      2. Notification & Initial Response
      3. On-Scene Activities
      4. Post-On-Scene Activities
  15. 1. Pre-Investigation Preparation
  16. Pre-Investigation Preparation: GO TEAM
      ● Go team: on-call investigators ready for assignments
      ● Investigator-in-Charge (IIC) pre-assigned
      ● Full Go team may contain several subject matter experts, e.g.:
        ○ Human performance
        ○ Aircraft performance
        ○ Air Traffic Control
  17. Pre-Investigation Preparation: GO TEAM ROSTER
      ● On-call roster made available internally
        ○ Phone & pager numbers
      ● Updated weekly
      ● All personnel should be able to arrive at an airport within 2 hours of notification
        ○ Should have essentials on them if they live far away from an airport
      ● Division Chiefs responsible for testing pagers
  18. 2. Notification & Initial Response
  19. Notification & Initial Response: REGIONAL RESPONSE
      1. Regional office notifies headquarters of incident
      2. Closest regional office to accident will provide at least one investigator to perform PR & “stakedown”
  20. Notification & Initial Response: HEADQUARTERS RESPONSE
      1. After incident occurs: communication center advises IIC and Chief of Major Investigations (who subsequently inform their superiors)
      2. OAS director decides whether to launch a Go-Team
      3. Other executives are made aware by Chief of Major Investigations
  21. Notification & Initial Response: NOTIFICATION & ASSIGNMENTS
      ● Go-Team composition determined by incident circumstances
      ● Send more specialists if in doubt
  22. Notification & Initial Response: PARTY NOTIFICATION
      ● IIC gives party status to organizations that can provide technical assistance (airlines, aircraft manufacturers etc.)
      ● Communication center will help with travel arrangements and on-site administrative support
      ● Go-Team will travel together to accident site
  23. 3. On-Scene Activities
  24. On-Scene Activities: COMMAND ROOMS
      ● Have meeting rooms to accommodate at least 30 people
      ● Have space for media
      ● Ensure you have equipment in command room
        ○ PCs
        ○ Telephone systems
        ○ Forms
      ● IIC is responsible for managing this
  25. On-Scene Activities: COMMAND ROOMS
      ● For major investigations, administrative support is provided
      ● Government purchase card is available for goods or services
  26. On-Scene Activities: ORGANIZATIONAL MEETING
      ● Share preliminary information
      ● Organize (assign) participants
      ● Organize observers
      ● Establish lines of authority
  27. “The manner in which the IIC conducts the organizational meeting will establish the tone of the investigation. Therefore, the importance of being organized, articulate, assertive, composed, and understanding cannot be overstated”
      Major Investigations Manual Sec 3.2
  28. On-Scene Activities: ACCIDENT SITE SAFETY PRECAUTIONS
      ● Safety officer identifies & classifies risks and then develops counter-measures
      ● Safety officer performs daily briefings to accident site team
  29. On-Scene Activities: OBSERVERS
      ● Observers may be allowed if they do not have self-interest
      ● May include:
        ○ Congressional oversight committee(s)
        ○ Military personnel
        ○ Foreign governments
        ○ Federal agencies
  30. On-Scene Activities: LINE OF AUTHORITY
      ● IIC is the most senior person on-scene and all investigative activity is under his/her control
      ● If IIC cannot resolve an issue, IIC may talk to Chief of Major Investigations
      ● Ability to escalate further if required
  31. On-Scene Activities: PROGRESS MEETINGS
      ● On-site progress meetings are held daily to:
        ○ Disseminate information obtained
        ○ Plan the day’s activities
        ○ Discuss plans for subsequent investigative activities
      ● Generally start at 6pm
      ● Plan next day’s meeting
  32. On-Scene Activities: DAILY ACTIVITIES OF IIC
      ● Headquarters briefing
      ● Safety board staff meeting
      ● Party coordinator meeting
      ● Site visit
  33. 4. Post-On-Scene Activities
  34. NTSB Report Structure
      ● Factual Information: gathering facts about the incident
      ● Analysis: analyze how the facts contributed to the incident
      ● Conclusions: draw conclusions about what happened
      ● Recommendations: write detailed recommendations
      ● Appendices: extra information
  35. Post-On-Scene Activities: WORK PLANNING
      ● Discuss activities that will follow the on-scene phase of investigation
      ● Build timelines for work
      ● Provides avenues for various teams to work together
  36. Post-On-Scene Activities: FACTS & ANALYSIS REPORT
      ● A factual report based on the field notes and subsequent investigation activities
      ● Each group chairman shall submit an analysis report based on the information contained in his or her factual report
  37. Post-On-Scene Activities: PUBLIC HEARING
      ● Led by IIC/Hearing Officer
      ● Identify witnesses whose testimony is appropriate
      ● The witnesses may be from the parties to the investigation or can be suggested by one or more of the parties
      ● Purpose: to ensure all relevant information is gathered before writing the report
  38. Post-On-Scene Activities: TECHNICAL REVIEW
      ● Provides an additional opportunity for all parties to review all factual information
      ● Ensures all issues are resolved
      ● Technical Review is held as soon as possible after public hearing
  39. Post-On-Scene Activities: PREPARATION OF FINAL REPORT
      ● Dedicated department to help write report
      ● Follows a standard template
        ○ ICAO Annex 13 (Annex 13 to the Convention on International Civil Aviation)
      ● Contains formal recommendations to manufacturers/transportation authorities
  40. Recommendations & Most Wanted List
  41. Recommendations & Most Wanted List
      ● NTSB advocates for particular action items based on report(s):
        ○ Generally directed towards transport bodies/manufacturers
      ● NTSB publicly tracks response of the responsible body
      https://www.ntsb.gov/safety/mwl/Pages/default.aspx
  42. How this relates to all of us?
  43. 1. Pre-Investigation Preparation
  44. Applying this to operations: PRE-INCIDENT PREPARATION
      ● Have an incident commander pre-assigned
      ● Publish on-call schedules
        ○ Manager is responsible
      ● Test on-call pagers regularly
      ● Ensure that you can respond within SLA
      ● Printed copy of on-call contact info
      ● DR
      http://i.imgur.com/wvg8IDq.gif
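The preparation checklist on this slide lends itself to an automated readiness check. Below is a minimal Python sketch of that idea, not LinkedIn's tooling: the `OnCallEntry` fields, the role names, the 7-day pager-test window and the sample roster are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class OnCallEntry:
    """One row of a hypothetical on-call roster."""
    name: str
    role: str                        # e.g. "incident-commander" or "sre"
    phone: str
    last_pager_test: Optional[date]  # None means the pager was never tested


def readiness_problems(roster: List[OnCallEntry], today: date,
                       max_test_age_days: int = 7) -> List[str]:
    """Return human-readable gaps in pre-incident preparation."""
    problems = []
    # An incident commander should be assigned before anything breaks.
    if not any(e.role == "incident-commander" for e in roster):
        problems.append("no incident commander pre-assigned")
    cutoff = today - timedelta(days=max_test_age_days)
    for entry in roster:
        if not entry.phone:
            problems.append(f"{entry.name}: missing contact number")
        if entry.last_pager_test is None or entry.last_pager_test < cutoff:
            problems.append(f"{entry.name}: pager not tested recently")
    return problems


# Hypothetical roster: alice is fully prepared, bob is not.
roster = [
    OnCallEntry("alice", "incident-commander", "555-0100", date(2018, 10, 15)),
    OnCallEntry("bob", "sre", "", date(2018, 9, 1)),
]
for problem in readiness_problems(roster, today=date(2018, 10, 17)):
    print(problem)
```

Running a check like this on a schedule mirrors the NTSB practice of updating the roster weekly and making Division Chiefs responsible for testing pagers.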
  45. 2. Notification & Initial Response
  46. Applying this to operations: NOTIFICATION & INITIAL RESPONSE
      ● NOC/SiteOps team notifies incident commander + manager
        ○ Prod-SRE gets engaged
      ● Prod-SRE Manager/On-call
        ○ Access, Engage, Notify, Mitigate
      https://docs.microsoft.com/en-us/windows/uwp/design/shell/tiles-and-notifications/images/toast-mirroring.gif
  47. Applying this to operations: NOTIFICATION & INITIAL RESPONSE
      ● Once verified, we launch full response for Major Incident
      ● Incident commander gives “party status” to observers
      ● Manager informs executives & PR
        ○ Periodic updates
      ● Mitigate
      http://www.roadrunneremaillogin.com/wp-content/uploads/2018/06/RoadRunner-Email.jpg
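The notification flow on the two slides above can be written down as a small function. This is a minimal sketch under stated assumptions: the `Incident` class, the recipient labels and the ordering are hypothetical and only mirror the bullets, not any real paging system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    summary: str
    verified_major: bool = False
    # Teams or observers the incident commander has granted "party status".
    parties: List[str] = field(default_factory=list)


def notification_plan(incident: Incident) -> List[str]:
    """Return the ordered list of who gets notified for this incident."""
    # First hop: NOC/SiteOps tells the incident commander and the manager.
    plan = ["incident-commander", "manager"]
    if incident.verified_major:
        # Full response: parties join, and the manager informs execs and PR.
        plan.extend(incident.parties)
        plan.extend(["executives", "pr"])
    return plan


incident = Incident("checkout latency", parties=["database-team"])
print(notification_plan(incident))   # initial response only
incident.verified_major = True
print(notification_plan(incident))   # full major-incident response
```

Keeping the plan as data makes it easy to review in advance, much as the NTSB decides Go-Team composition from the incident circumstances.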
  48. 3. On-Scene Activities
  49. Applying this to operations: ON-SCENE ACTIVITIES
      ● Private + public Slack work-channels
      ● IC is empowered to make decisions
      ● Organizational call to ensure:
        ○ Problem is understood
        ○ Areas of investigation assigned
      http://www.gpla.com/static/img/projects/ubisofts-e3-social-media-war-room/war-room.gif
  50. Applying this to operations: ON-SCENE ACTIVITIES
      ● War room
        ○ Incident commander drives the war room
        ○ Roles & responsibilities assigned to each “party”
        ○ Communication at regular cadence to execs
        ○ Admin ensures supplies and food
      ● Gathering data and updating timeline doc
      http://www.gpla.com/static/img/projects/ubisofts-e3-social-media-war-room/war-room.gif
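The "timeline doc" mentioned above can be as simple as an append-only list of timestamped notes. The following Python sketch is one possible shape for it; the `Timeline` class, its field names and the sample entries are assumptions for illustration, not a tool described in the talk.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TimelineEntry:
    at: datetime
    author: str
    note: str


class Timeline:
    """Append-only incident timeline; entries are rendered in time order."""

    def __init__(self) -> None:
        self._entries: List[TimelineEntry] = []

    def add(self, at: datetime, author: str, note: str) -> None:
        self._entries.append(TimelineEntry(at, author, note))

    def render(self) -> str:
        # Sort at render time so late-arriving facts land in the right place.
        ordered = sorted(self._entries, key=lambda e: e.at)
        return "\n".join(f"{e.at:%H:%M} [{e.author}] {e.note}" for e in ordered)


timeline = Timeline()
timeline.add(datetime(2018, 10, 17, 14, 5), "ic", "War room opened")
timeline.add(datetime(2018, 10, 17, 13, 58), "noc", "Alert fired for checkout latency")
print(timeline.render())
```

Recording facts as they are gathered plays the same role as the NTSB's field notes: it gives the later factual report something concrete to build on.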
  51. 4. Post-On-Scene Activities
  52. Applying this to operations: POST ON-SCENE ACTIVITIES
      ● Post mortem
        ○ Dedicated team
        ○ PM template
        ○ Blameless
      ● “Postmortem rollup”
        ○ Action items are prioritized
        ○ Weekly reporting on status of action items
      https://www.economist.com/sites/default/files/imagecache/1280-width/20180414_OFP021.gif
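A postmortem template can borrow its sections directly from the NTSB report structure shown earlier in the deck (Factual Information, Analysis, Conclusions, Recommendations, Appendices). The Python sketch below is one hypothetical way to model that template and the weekly "postmortem rollup"; the class names, status values and sample data are assumptions, not the template used at LinkedIn.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActionItem:
    title: str
    owner: str
    priority: int         # 1 = highest
    status: str = "open"  # "open", "in-progress", or "done"


@dataclass
class Postmortem:
    """Sections mirror the NTSB report structure."""
    title: str
    factual_information: str = ""
    analysis: str = ""
    conclusions: str = ""
    recommendations: List[ActionItem] = field(default_factory=list)
    appendices: List[str] = field(default_factory=list)


def weekly_rollup(postmortems: List[Postmortem]) -> Dict[str, int]:
    """Count action items by status across all postmortems."""
    counts = Counter(
        item.status for pm in postmortems for item in pm.recommendations
    )
    return dict(counts)


pm = Postmortem(
    title="Checkout latency",
    recommendations=[
        ActionItem("Add capacity alert", owner="alice", priority=1),
        ActionItem("Document failover", owner="bob", priority=2, status="done"),
    ],
)
print(weekly_rollup([pm]))
```

Note that the template has no field for who was at fault, which keeps it blameless in the same way NTSB investigations are fact-finding rather than liability-finding.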
  53. Recommendations: Most Wanted List
  54. Applying this to operations: MOST WANTED LIST
      ● Use the post-incident process to improve and hold people accountable for action items
      ● Keep track of recurring issues/repeaters
      https://clip2art.com/images/meeting-clipart-animated-gif-2.gif
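Tracking "repeaters" can start with counting how often the same cause shows up across postmortems. The sketch below is a minimal illustration in Python; the `most_wanted` function, its `min_repeats` threshold and the sample causes are hypothetical, and real cause labels would need more careful normalization than lowercasing.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_wanted(root_causes: Iterable[str],
                min_repeats: int = 2) -> List[Tuple[str, int]]:
    """Return causes seen at least `min_repeats` times, most frequent first."""
    counts = Counter(cause.strip().lower() for cause in root_causes)
    repeaters = [(c, n) for c, n in counts.items() if n >= min_repeats]
    # Sort by count descending, then by name, so the output is deterministic.
    return sorted(repeaters, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical cause labels pulled from past postmortems.
causes = [
    "Expired certificate", "config push", "expired certificate",
    "Config push", "expired certificate", "disk full",
]
print(most_wanted(causes))
```

Publishing a list like this internally is the operational analogue of the NTSB publicly tracking how responsible bodies respond to its recommendations.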
  55. Final Thoughts
  56. Final Thoughts
      ● NTSB Investigative Process: complete incident + postmortem process
      ● Invest: the more you put in, the more you’ll get out
      ● Accountability: accountability for improvements/action items
  57. Questions?
  • RaviSingh139

    Oct. 17, 2018

