Next-Level Incident Management: Culture Matters When Things Break

Ever had an incident that didn't go as planned? The culture among Ops, DevOps and SRE teams is critical to ensuring that your team is effective when things break. We've distilled our years of collective experience into five easy-to-understand values that help teams move away from a hero-driven culture to a team-based culture. Learn how to guide autonomous decisions and create a consistent culture between teams that you can easily apply and share with others.

Published in: Software

  1. Next-Level Incident Management: Culture Matters When Things Break. Patrick Hill | SRE Solutions Lead | @TOPOFTHEHILL
  2. Culture
  3. 78% of people don’t trust their teammates: 59% say it’s poor communication, 29% say it’s lack of accountability.
  4. When @#$% hits the fan, we need to trust each other.
  5. Culture and Process. Process: mandated standardized process; centralized decision making; lots of hierarchy and structure; focus on outputs; efficiency. Culture: provide guides and guardrails; empower decision making; autonomy; focus on outcomes; effectiveness.
  6. [Chart with axes: Time, Number of Engineers]
  7. Culture and Process
  8. Culture and Process: Documentation & Diagrams; Reports; Roles & Responsibilities
  9. Culture and Process: Documentation & Diagrams; Reports; Roles & Responsibilities
  10. Atlassian Values: They guide what we do, why we create, and who we hire. Open company, no bullshit. Build with heart & balance. Be the change you seek. Play, as a team. Don’t #@!% the customer.
  11. Goals. Certainty: Have a consistent process and culture between teams for how we identify, manage and learn from incidents. Relatedness: Align teams on the attitude they should bring to each part of incident identification, resolution and reflection. Autonomy: Guide autonomous decision-making by people and teams in incident and post-incident review (PIR) situations.
  12. SCARF. Thanks, David Rock!
  13. Status, Certainty, Autonomy, Relatedness, Fairness
  14. 5 cultural values for when @#$% breaks: (1) Detect: Atlassian knows before our customers do. (2) Respond: Escalate, escalate, escalate. (3) Recover: Sh!t happens, clean it up quickly. (4) Learn: Always Blameless. (5) Improve: Never have the same incident twice.
  15. Detect (value 1): Atlassian knows before our customers do.
  16. If you’re not first, you’re last.
  17. What this value does represent. Good alerting: Have someone working on it before users see it. Good communications: Inform users before they run into trouble. Good monitoring: Detect problems before they impact users.
  18. Atlassian’s related value: Build with heart and balance.
  19. Respond (value 2): Escalate, escalate, escalate.
  20. Do: Ask for the help you need. Don’t: Wake people up unless you need them.
  21. We don’t need a hero
  22. Atlassian’s related value: Play, as a team.
  23. Resolve (value 3): Sh!t happens, clean it up quickly.
  24. Customers don’t care about you gathering diagnostics.
  25. Source: https://howlingpixel.com/wiki/Flight_recorder
  26. Atlassian’s related value: Don’t !@#$ the customer.
  27. Learn (value 4): Always Blameless
  28. Atlassian’s related value: Open company, no bullsh!t.
  29. Improve (value 5): Never have the same incident twice.
  30. Never have the same incident twice. Be open-ended: Ask open questions, ask “what if”, ask what you would do if money were not a problem. Ask how and why repeatedly: Really understand what caused this incident. Follow up: Make sure that actions get completed.
  31. Reinforcing the values: Each PIR asks about the values and whether they held up during this incident.
  32. Atlassian’s related value: Be the change you seek.
  33. 5 cultural values for when @#$% breaks: (1) Detect: Atlassian knows before our customers do. (2) Respond: Escalate, escalate, escalate. (3) Recover: Sh!t happens, clean it up quickly. (4) Learn: Always Blameless. (5) Improve: Never have the same incident twice.
  34. A few tips for building your own culture!
  35. Tips for your own culture: Be authentic. Be inclusive.
  36. Tips for your own culture: Involve people gradually and get them to be your promoters.
  37. Tips for your own culture: Understand and use the SCARF model in your teams.
  38. Tips for your own culture: Reward the good behaviors.
  39. 5 cultural values for when @#$% breaks: (1) Detect: Atlassian knows before our customers do. (2) Respond: Escalate, escalate, escalate. (3) Recover: Sh!t happens, clean it up quickly. (4) Learn: Always Blameless. (5) Improve: Never have the same incident twice.
  40. Thank you! Patrick Hill | SRE Solutions Lead | @TOPOFTHEHILL
