
It's Not Your Fault - Blameless Post-mortems

A deeper look at why we perform "blameless" post-mortems.

Published in: Technology

  1. (Blameless) post-mortems: It’s Not Your Fault. @jasonhand
  2. Jason Hand, DevOps “Handyman”, jason@VictorOps.com
  3. A little about me… Dir. of Platform Support - AppDirect; Dir. of Technical Support - Standing Cloud; Dir. of Operational Systems - American Fasteners, Inc. Hiker, climber, brewer, runner, biker, boarder, surfer, painter, singer, reader, writer, picker, coder, racer, camper, volunteer … all the usual “Colorado 1-upper” crap.
  4. Alternative names. Also known as (note: public & internal): Project Retrospectives; Post-mortem analysis; Post-project review; Project Analysis Review; Quality Improvement Review; Autopsy Review; Santayana Review; After Action Review; Touchdown Meeting.
  5. Post-mortem Defined. What? A process intended to inform improvements by determining aspects that were successful or unsuccessful.
  6. Post-mortem Defined. When? As soon as feasible after the incident is resolved.
  7. Post-mortem Defined. Who? Everybody.
  8. Post-mortem Defined. Why? To communicate with your team. To understand what happened for learning and improving.
  9. Post-mortem Defined. How? Talk about the incident timeline; escalation steps; what was done to resolve the problem; create a remediation plan; make it available.
  10. The Three R’s (simple format; Dave Zwieback, VP Engineering - Next Big Sound). Regret: acknowledgement and apology. Reason: initial incident detection to resolution, including the so-called “root causes.” Remedy: actionable remediation items.
  11. (Remedy) Use SMART recommendations: Specific, Measurable, Agreed Upon/Agreeable, Realistic, Timebound. Moving from Reaction to Action.
  12. Blameless (image from “Across the Universe”)
  13. Cool story, bro. 2011 - Hired to Standing Cloud: cloud marketplace & automated deployment of apps; build Support team; provide Managed services.
  14. Cool story, bro
  15. “Reprimanding bad apples may seem like a quick and rewarding fix, but it’s like peeing in your pants. You feel relieved and perhaps even nice and warm for a little while, but then it gets cold and uncomfortable. And you look like a fool.” – Sidney Dekker (quote first seen in J. Paul Reed’s “A Look at Looking in the Mirror”)
  16. What is a blameless post-mortem? Team members are accountable but not responsible; complete transparency; deeper look at circumstances; what happened and how to improve it (specific details); real conditions of failure in complex systems.
  17. “Your organization must continually affirm that individuals are NEVER the ‘root cause’ of outages.” – Dave Zwieback
  18. Paraphrased from “Fallible Humans” by Ian Malpass - DevOpsDays - Minneapolis. Source: http://www.indecorous.com/fallible_humans/
  19. ETTO (Efficiency Thoroughness Trade Off): the trade-off between being efficient vs being thorough.
  20. “We can be thorough and really dig into the task at hand and understand it well but this takes time: it is inefficient.” - Ian Malpass
  21. Cause & Effect. There are many factors that played a part in the problem. “may be”. Source: http://xkcd.com
  22. Stress & Cognitive Bias
  23. Yerkes-Dodson Model (source: The Human Side of Postmortems)
  24. @jasonhand
  25. Reduce Stress? … build muscle memory. Simulate many types of problems and outages as “practice” …
  26. Evaluative Threat: being negatively judged plays a big role in stress.
  27. What is stress surface? Variables of a situation. Evaluative threats: novel or unusual; unpredictable; controllable situation; negative judgement. ALSO: lack of sleep; problems at home; health; relationships; etc…
  28. Capturing the Human-side: ask questions.
  29. Stress Questionnaire. During the outage, how often have you felt or thought that: The situation was novel or unusual? The situation was unpredictable? You were unable to control the situation? Others could judge your actions negatively? Scale: 0 = Never, 1 = Almost Never, 2 = Sometimes, 3 = Fairly Often, 4 = Very Often.
  30. Why we don’t punish: de-incentivized to give the details; practically guarantees a repeat of the problem; understand why actions made sense (at the time); create safety AND accountability; move away from idea of “individuals are problems”; create new “experts”.
  31. @jasonhand
  32. Promoting from within. Where do we start? The basics: • Document your timeline or log data • Document conversations • Leave room for notes • Mean time to resolution / Time calculations • Level of severity • Archive it for historical retrieval • Remediation. Make it actionable
  33. Tools: Etsy’s Morgue; VictorOps Post-mortem Report; Internal Wiki
  34. Seek the truth. Don’t blame others … Don’t blame yourself. Thank You
  35. Questions?
  36. Resources
      • “The Human Side of Postmortems” - Dave Zwieback
      • “The Field Guide to Understanding Human Error” - Sidney Dekker
      • “A Look at Looking in the Mirror” - J. Paul Reed
      • “Fallible Humans” - Ian Malpass (http://www.indecorous.com/fallible_humans/)
      • “4 Questions to ask for an effective Technical Post Mortem” - Jeffrey O’Brien (http://www.maintenanceassistant.com/blog/4-questions-effective-technical-post-mortem/)
      • “Nine steps to IT post-mortem excellence” - Michael Krigsman (http://www.zdnet.com/blog/projectfailures/nine-steps-to-it-post-mortem-excellence/1069)
      • “Postmortem reviews: purpose and approaches in software engineering” - Torgeir Dingsøyr (http://www.uio.no/studier/emner/matnat/ifi/INF5180/v10/undervisningsmateriale/reading-materials/p08/post-mortems.pdf)
      • “Blameless PostMortems and a Just Culture” - John Allspaw (http://codeascraft.com/2012/05/22/blameless-postmortems/)
      • “What blameless really means” - Jessica Harllee (http://www.jessicaharllee.com/notes/what-blameless-really-means/)
      • “Each necessary, but only jointly sufficient” - John Allspaw (http://www.kitchensoap.com/2012/02/10/each-necessary-but-only-jointly-sufficient/)
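
The "basics" listed on slide 32 (timeline, time to resolution, severity, remediation) and the Stress Questionnaire on slide 29 can be captured in one small record. What follows is only a minimal Python sketch, not a format the deck prescribes: the `PostMortem` class, its field names, the "1 = most severe" convention, and the idea of summing the four answers into a single score are all assumptions made for illustration, and the sample incident data is invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Answer scale taken from the Stress Questionnaire slide.
SCALE = {0: "Never", 1: "Almost Never", 2: "Sometimes", 3: "Fairly Often", 4: "Very Often"}

# The four questionnaire prompts, in slide order.
QUESTIONS = (
    "The situation was novel or unusual?",
    "The situation was unpredictable?",
    "You were unable to control the situation?",
    "Others could judge your actions negatively?",
)


@dataclass
class PostMortem:
    """Hypothetical record covering the 'basics' from slide 32."""
    title: str
    severity: int                 # assumed convention: 1 = most severe
    detected_at: datetime
    resolved_at: datetime
    timeline: list[tuple[datetime, str]] = field(default_factory=list)
    remediation: list[str] = field(default_factory=list)
    stress: dict[str, list[int]] = field(default_factory=dict)

    def log(self, at: datetime, note: str) -> None:
        """Add a timeline entry and keep the timeline in chronological order."""
        self.timeline.append((at, note))
        self.timeline.sort(key=lambda entry: entry[0])

    def time_to_resolution(self) -> timedelta:
        """Elapsed time from detection to resolution for this one incident."""
        return self.resolved_at - self.detected_at

    def record_stress(self, responder: str, answers: list[int]) -> int:
        """Store one responder's answers; return their sum (an assumed scoring)."""
        if len(answers) != len(QUESTIONS):
            raise ValueError("expected one answer per question")
        if any(a not in SCALE for a in answers):
            raise ValueError("answers must be on the 0-4 scale")
        self.stress[responder] = list(answers)
        return sum(answers)


# Invented example data, for illustration only.
pm = PostMortem(
    title="Checkout API outage",
    severity=2,
    detected_at=datetime(2015, 3, 4, 9, 0),
    resolved_at=datetime(2015, 3, 4, 10, 30),
)
pm.log(datetime(2015, 3, 4, 9, 10), "Escalated to on-call lead")
pm.log(datetime(2015, 3, 4, 9, 0), "Alert fired")
pm.remediation.append("Add a disk-usage alert by Friday")

print(pm.time_to_resolution())                        # 1:30:00
print(pm.record_stress("responder-1", [3, 2, 4, 0]))  # 9
```

Keying the stress answers by responder rather than by name of an "at fault" person keeps the record in line with the blameless framing: it describes the conditions people worked under, not who caused the outage.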
