0
Escalating                           Scenarios                           A Deep Dive Into Outage                          ...
TROUBLESHOOTING         This is NOT about troubleshooting         Or, not just about troubleshootingWednesday, October 3, 12
LAYOUT      • Criteria      • Situational Awareness      • HROs      • Decision Making      • Communication      • Team Co...
Wednesday, October 3, 12
Wednesday, October 3, 12
How important is this?Wednesday, October 3, 12
Wednesday, October 3, 12
Wednesday, October 3, 12
Wednesday, October 3, 12
Wednesday, October 3, 12
Wednesday, October 3, 12
Oct 2011                           Sept 2012Wednesday, October 3, 12
Wednesday, October 3, 12
Wednesday, October 3, 12
Wednesday, October 3, 12
Wednesday, October 3, 12
Where to learn                              from?Wednesday, October 3, 12
TMIWednesday, October 3, 12
Wednesday, October 3, 12
Wednesday, October 3, 12
Kegworth 1989Wednesday, October 3, 12
Dr. Richard Cook, Velocity US 2012                           http://www.youtube.com/watch?v=R_PDc0HFdP0Wednesday, October ...
“The Self-Designing High-Reliability Organization:         Aircraft Carrier Flight Operations at Sea”         Rochlin, La ...
Wednesday, October 3, 12
Wednesday, October 3, 12
What Goes On In Our Heads?Wednesday, October 3, 12
Jens Rasmussen, 1983                                                       Senior Member, IEEE         “Skills, Rules, and...
SKILL - BASED                                Simple, routine RULE - BASED                           Knowable, but unfamili...
Situational Awareness"the perception of elements in the environment within a volume oftime and space, the comprehension of...
OODA Loop            Observe          Orient         Decide                           Act        Metrics            Analys...
Canonical Work         “Towards a Theory of Situational Awareness”         Mica Endsley, Human Factors (1995)         http...
Situational Awareness                                  Level I                                Perception                  ...
System capability                                                                                   Interface design      ...
Level One: PerceptionWednesday, October 3, 12
Wednesday, October 3, 12
Context     Can you spot the anomaly?Wednesday, October 3, 12
Context                           24 hoursWednesday, October 3, 12
Context                           7 daysWednesday, October 3, 12
Context                           Normal                             But                            NoisyWednesday, Octobe...
Level  TwoWednesday, October 3, 12                           Comprehension
Level  TwoWednesday, October 3, 12
Mental Models      • Categorization & Comprehension      • Mental “map” or “schema”      • Informed by experience, stored ...
Mental ModelsWednesday, October 3, 12
Level ThreeWednesday, October 3, 12
Mental ModelsWednesday, October 3, 12
Level Three       Common Clues you’re losing SA at this level         • Ambiguity         • Fixation         • Confusion  ...
Characteristics of response to    escalating scenariosWednesday, October 3, 12
Characteristics of response to    escalating scenarios                           ...tend to neglect how processes         ...
Characteristics of response to    escalating scenarios                           ...have difficulty in dealing with       ...
Characteristics of response to    escalating scenarios                           ...inclined to think in causal SERIES,   ...
SA                           PitfallsRequisite Memory TrapWednesday, October 3, 12
SA                           PitfallsWorkload, anxiety,fatigue, other stressorsWednesday, October 3, 12
SA                           PitfallsData OverloadWednesday, October 3, 12
SA                           PitfallsMisplace SalienceWednesday, October 3, 12
http://www.perceptualedge.com/articles/Whitepapers/Dashboard_Design.pdfWednesday, October 3, 12
http://www.perceptualedge.com/articles/Whitepapers/Dashboard_Design.pdfWednesday, October 3, 12
Wednesday, October 3, 12
SA                                                 Pitfalls Complexity Creep             “Everything should be as simple a...
SA                           PitfallsPoor Mental ModelsWednesday, October 3, 12
SA                           PitfallsOut-Of-The-LoopSyndromeWednesday, October 3, 12
SA                           PitfallsRefusal to makedecisionsWednesday, October 3, 12
Heroism         Non-communicating lone wolf-ismsWednesday, October 3, 12
Distraction         Irrelevant noise in comm channelsWednesday, October 3, 12
Wednesday, October 3, 12
TEAMS      • Divide and conquer applied to problem space,              division of labor      • Incident resolution vs. Pr...
TEAMS         Shotgun debuggingWednesday, October 3, 12
JOINT                                      ACTIVITY      • Interpredictability      • Common Ground      • Directability  ...
InterpredictabilityWednesday, October 3, 12
Common GroundWednesday, October 3, 12
DirectabilityWednesday, October 3, 12
ImprovisationWednesday, October 3, 12
IMPROVISATIONWednesday, October 3, 12
IMPROVISATIONWednesday, October 3, 12
Improvisation    “...you can’t improvise on nothing; you got to    improvise on something.”                   Charles Ming...
Diagnose                           the problem                            Represent                           the problem ...
Communication     Recommendations         •Explicitness         •Assertiveness         •TimingWednesday, October 3, 12
Assertiveness      • Passive      • Assertive      • AggressiveWednesday, October 3, 12
Wednesday, October 3, 12
ExerciseWednesday, October 3, 12
Communication      • IRC?      • Face-To-Face?      • Conference Call?      • Morse Code?Wednesday, October 3, 12
Kegworth 1989Wednesday, October 3, 12
Transmission                           Encode                       Decode       Meaning                                  ...
Transmission                           Encode                       Decode       Meaning                                  ...
Transmission                           Encode                         Decode      Meaning                                 ...
Transmission                           Encode                         Decode      Meaning                                 ...
Feedback        InformationalWednesday, October 3, 12
Feedback            CorrectiveWednesday, October 3, 12
Feedback          ReinforcingWednesday, October 3, 12
Decision Making         Naturalistic Decision Making (NDM)         Gary KleinWednesday, October 3, 12
Decision Making         Step One: What is the problem?Wednesday, October 3, 12
Decision Making         Step Two: What shall I do?Wednesday, October 3, 12
Decision Making         Recognition-Primed DecisionsWednesday, October 3, 12
Decision Making         Rule-Based DecisionsWednesday, October 3, 12
Decision Making                           Choice decisionsWednesday, October 3, 12
Decision Making                           Creative decisionsWednesday, October 3, 12
Decision Making                                      Decreasing cognitive effort                                      Decr...
Decision Making         PRE-MortemWednesday, October 3, 12
ToolingWednesday, October 3, 12
?                           ?Wednesday, October 3, 12
Time                  Period   MetricWednesday, October 3, 12
ControlsWednesday, October 3, 12
ALERTS      • Meant to boost SA      • Alarm overload      • High false alarm rates      • Routinely disable alertsWednesd...
Alert ReliabilityWednesday, October 3, 12
ALERT DESIGN      • Signal:Noise can be difficult      • Easy to err on more false alarms      • Decay in trust      • Ori...
ALERT DESIGN         ConfirmationWednesday, October 3, 12
ALERT DESIGN         ExpectancyWednesday, October 3, 12
ALERT DESIGNWednesday, October 3, 12
ALERT DESIGN      • Don’t make people singularly reliant on alarms      • Support alarm confirmation activities      • Make...
ALERT DESIGN      • Use multiple modalities      • Minimize alarm disruptions to ongoing activities      • Support the ass...
Mature Role of Automation       “Ironies of Automation” - Lisanne Bainbridge          http://www.bainbrdg.demon.co.uk/Pape...
Mature Role of Automation        •       Moves humans from manual operator to supervisor        •       Extends and augmen...
SUMMARYWednesday, October 3, 12
So what can we do?         “In preparing for battle, I have always         found that plans are useless but planning      ...
So what can we do?         We develop our Non-Technical Skills            • Situational Awareness            • Communicati...
So what can we do?         We tailor our environment to adapt            • Tooling to support SA            • Learning fro...
Wednesday, October 3, 12
THE ENDWednesday, October 3, 12
Credits       •      http://www.flickr.com/photos/28650594@N03/2718027136/       •      http://www.flickr.com/photos/telstar...
Upcoming SlideShare
Loading in...5
×

Velocity EU 2012 Escalating Scenarios: Outage Handling Pitfalls

5,869

Published on



When things go wrong, our judgement is clouded at best, blinded at worst.

In order to successfully navigate a large-scale outage, being aware of potentials gaps in knowledge and context can help make for a better outcome. The Human Factors and Systems Safety community have been studying how people situate themselves, coordinate amongst a team, use tooling, make decisions, and keep their cool under sometimes very stressful and escalating scenarios. We can learn from this research in order to adopt a more mature stance when the s*#t hits the fan.

We’re going to look closely at how people behave under these circumstances using real-world examples and scan what we can learn from High Reliability Organizations(HROs) and fields such as aviation, military, and trauma-driven healthcare.

Published in: Business
3 Comments
14 Likes
Statistics
Notes
No Downloads
Views
Total Views
5,869
On Slideshare
0
From Embeds
0
Number of Embeds
11
Actions
Shares
0
Downloads
142
Comments
3
Likes
14
Embeds 0
No embeds

No notes for slide

Transcript of "Velocity EU 2012 Escalating Scenarios: Outage Handling Pitfalls"

  1. 1. Escalating Scenarios A Deep Dive Into Outage Pitfalls John Allspaw Velocity London 2012Wednesday, October 3, 12
  2. 2. TROUBLESHOOTING This is NOT about troubleshooting Or, not just about troubleshootingWednesday, October 3, 12
  3. 3. LAYOUT • Criteria • Situational Awareness • HROs • Decision Making • Communication • Team Coordination • A little bit of psychologyWednesday, October 3, 12
  4. 4. Wednesday, October 3, 12
  5. 5. Wednesday, October 3, 12
  6. 6. How important is this?Wednesday, October 3, 12
  7. 7. Wednesday, October 3, 12
  8. 8. Wednesday, October 3, 12
  9. 9. Wednesday, October 3, 12
  10. 10. Wednesday, October 3, 12
  11. 11. Wednesday, October 3, 12
  12. 12. Oct 2011 Sept 2012Wednesday, October 3, 12
  13. 13. Wednesday, October 3, 12
  14. 14. Wednesday, October 3, 12
  15. 15. Wednesday, October 3, 12
  16. 16. Wednesday, October 3, 12
  17. 17. Where to learn from?Wednesday, October 3, 12
  18. 18. TMIWednesday, October 3, 12
  19. 19. Wednesday, October 3, 12
  20. 20. Wednesday, October 3, 12
  21. 21. Kegworth 1989Wednesday, October 3, 12
  22. 22. Dr. Richard Cook, Velocity US 2012 http://www.youtube.com/watch?v=R_PDc0HFdP0Wednesday, October 3, 12
  23. 23. “The Self-Designing High-Reliability Organization: Aircraft Carrier Flight Operations at Sea” Rochlin, La Porte, and Roberts. Naval War College Review 1987 http://govleaders.org/reliability.htmWednesday, October 3, 12
  24. 24. Wednesday, October 3, 12
  25. 25. Wednesday, October 3, 12
  26. 26. What Goes On In Our Heads?Wednesday, October 3, 12
  27. 27. Jens Rasmussen, 1983 Senior Member, IEEE “Skills, Rules, and Knowledge; Signals, Signs, and Symbols, and Other Distinctions in Human Performance Models” IEEE Transactions On Systems, Man, and Cybernetics, May 1983Wednesday, October 3, 12
  28. 28. SKILL - BASED Simple, routine RULE - BASED Knowable, but unfamiliar KNOWLEDGE - BASED (Reason, 1990) WTF IS GOING ON?Wednesday, October 3, 12
  29. 29. Situational Awareness"the perception of elements in the environment within a volume oftime and space, the comprehension of their meaning, and theprojection of their status in the near future,” - (Endsley, 1995)"keeping track of what is going on around you in a complex,dynamic environment" (Moray, 2005, p. 4)"knowing what is going on so you can figure out what todo" (Adam, 1993)Wednesday, October 3, 12
  30. 30. OODA Loop Observe Orient Decide Act Metrics Analysis Planning Execution Monitoring Visualization Resourcing Alerting Correlation Alarming credit: http://blog.b3k.us/ooda.htmlWednesday, October 3, 12
  31. 31. Canonical Work “Towards a Theory of Situational Awareness” Mica Endsley, Human Factors (1995) http://www.satechnologies.com/Papers/pdf/Toward%20a%20Theory %20of%20SA.pdfWednesday, October 3, 12
  32. 32. Situational Awareness Level I Perception Level II Comprehension Level III ProjectionWednesday, October 3, 12
  33. 33. System capability Interface design Stress and workload Complexity Automation Task/System Factors Feedback Situational Awareness Perception Performance State of the of elements Comprehension Projection Decision of actions environment in current situation of current situation of future status LEVEL I LEVEL II LEVEL III Information processing Individual Factors mechanisms Long term Goals and memory states Automaticity objectives Preconceptions - Abilities (expectations) - Experience - Training (Endsley)Wednesday, October 3, 12
  34. 34. Level One: PerceptionWednesday, October 3, 12
  35. 35. Wednesday, October 3, 12
  36. 36. Context Can you spot the anomaly?Wednesday, October 3, 12
  37. 37. Context 24 hoursWednesday, October 3, 12
  38. 38. Context 7 daysWednesday, October 3, 12
  39. 39. Context Normal But NoisyWednesday, October 3, 12
  40. 40. Level TwoWednesday, October 3, 12 Comprehension
  41. 41. Level TwoWednesday, October 3, 12
  42. 42. Mental Models • Categorization & Comprehension • Mental “map” or “schema” • Informed by experience, stored in memoryWednesday, October 3, 12
  43. 43. Mental ModelsWednesday, October 3, 12
  44. 44. Level ThreeWednesday, October 3, 12
  45. 45. Mental ModelsWednesday, October 3, 12
  46. 46. Level Three Common Clues you’re losing SA at this level • Ambiguity • Fixation • Confusion • Lack of Information • Failure to maintain • Failure to meet expected checkpoint or target • Failure to resolve discrepancies • A bad gut feeling that things are not quite right Wednesday, October 3, 12
  47. 47. Characteristics of response to escalating scenariosWednesday, October 3, 12
  48. 48. Characteristics of response to escalating scenarios ...tend to neglect how processes develop within time (awareness of rates) versus assessing how things are in the moment “On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980Wednesday, October 3, 12
  49. 49. Characteristics of response to escalating scenarios ...have difficulty in dealing with exponential developments (hard to imagine how fast something can change, or accelerate) “On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980Wednesday, October 3, 12
  50. 50. Characteristics of response to escalating scenarios ...inclined to think in causal SERIES, instead of causal NETS. A therefore B, instead of A, therefore B and C (therefore D and E), etc. “On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980Wednesday, October 3, 12
  51. 51. SA PitfallsRequisite Memory TrapWednesday, October 3, 12
  52. 52. SA PitfallsWorkload, anxiety,fatigue, other stressorsWednesday, October 3, 12
  53. 53. SA PitfallsData OverloadWednesday, October 3, 12
  54. 54. SA PitfallsMisplace SalienceWednesday, October 3, 12
  55. 55. http://www.perceptualedge.com/articles/Whitepapers/Dashboard_Design.pdfWednesday, October 3, 12
  56. 56. http://www.perceptualedge.com/articles/Whitepapers/Dashboard_Design.pdfWednesday, October 3, 12
  57. 57. Wednesday, October 3, 12
  58. 58. SA Pitfalls Complexity Creep “Everything should be as simple as it can be, but not simpler.” - paraphrased, EinsteinWednesday, October 3, 12
  59. 59. SA PitfallsPoor Mental ModelsWednesday, October 3, 12
  60. 60. SA PitfallsOut-Of-The-LoopSyndromeWednesday, October 3, 12
  61. 61. SA PitfallsRefusal to makedecisionsWednesday, October 3, 12
  62. 62. Heroism Non-communicating lone wolf-ismsWednesday, October 3, 12
  63. 63. Distraction Irrelevant noise in comm channelsWednesday, October 3, 12
  64. 64. Wednesday, October 3, 12
  65. 65. TEAMS • Divide and conquer applied to problem space, division of labor • Incident resolution vs. Problem resolution • Reproducibility • Fault Tolerance EffectsWednesday, October 3, 12
  66. 66. TEAMS Shotgun debuggingWednesday, October 3, 12
  67. 67. JOINT ACTIVITY • Interpredictability • Common Ground • Directability http://csel.eng.ohio-state.edu/woods/distributed/CG%20final.pdfWednesday, October 3, 12
  68. 68. InterpredictabilityWednesday, October 3, 12
  69. 69. Common GroundWednesday, October 3, 12
  70. 70. DirectabilityWednesday, October 3, 12
  71. 71. ImprovisationWednesday, October 3, 12
  72. 72. IMPROVISATIONWednesday, October 3, 12
  73. 73. IMPROVISATIONWednesday, October 3, 12
  74. 74. Improvisation “...you can’t improvise on nothing; you got to improvise on something.” Charles MingusWednesday, October 3, 12
  75. 75. Diagnose the problem Represent the problem Detect the Generate a Apply course of Leverage Problem/Opportunity action Points EvaluateWednesday, October 3, 12
  76. 76. Communication Recommendations •Explicitness •Assertiveness •TimingWednesday, October 3, 12
  77. 77. Assertiveness • Passive • Assertive • AggressiveWednesday, October 3, 12
  78. 78. Wednesday, October 3, 12
  79. 79. ExerciseWednesday, October 3, 12
  80. 80. Communication • IRC? • Face-To-Face? • Conference Call? • Morse Code?Wednesday, October 3, 12
  81. 81. Kegworth 1989Wednesday, October 3, 12
  82. 82. Transmission Encode Decode Meaning Meaning Sender ReceiverWednesday, October 3, 12
  83. 83. Transmission Encode Decode Meaning Meaning Sender ReceiverWednesday, October 3, 12
  84. 84. Transmission Encode Decode Meaning Meaning Sender Receiver Meaning Meaning Receiver Sender Decode Encode TransmissionWednesday, October 3, 12
  85. 85. Transmission Encode Decode Meaning Meaning Sender Receiver Meaning Meaning Receiver Sender Decode Encode TransmissionWednesday, October 3, 12
  86. 86. Feedback InformationalWednesday, October 3, 12
  87. 87. Feedback CorrectiveWednesday, October 3, 12
  88. 88. Feedback ReinforcingWednesday, October 3, 12
  89. 89. Decision Making Naturalistic Decision Making (NDM) Gary KleinWednesday, October 3, 12
  90. 90. Decision Making Step One: What is the problem?Wednesday, October 3, 12
  91. 91. Decision Making Step Two: What shall I do?Wednesday, October 3, 12
  92. 92. Decision Making Recognition-Primed DecisionsWednesday, October 3, 12
  93. 93. Decision Making Rule-Based DecisionsWednesday, October 3, 12
  94. 94. Decision Making Choice decisionsWednesday, October 3, 12
  95. 95. Decision Making Creative decisionsWednesday, October 3, 12
  96. 96. Decision Making Decreasing cognitive effort Decreasing effects of stress Creative Choice Rule-Based RPD Increasing cognitive effort Increasing effects of stressWednesday, October 3, 12
  97. 97. Decision Making PRE-MortemWednesday, October 3, 12
  98. 98. ToolingWednesday, October 3, 12
  99. 99. ? ?Wednesday, October 3, 12
  100. 100. Time Period MetricWednesday, October 3, 12
  101. 101. ControlsWednesday, October 3, 12
  102. 102. ALERTS • Meant to boost SA • Alarm overload • High false alarm rates • Routinely disable alertsWednesday, October 3, 12
  103. 103. Alert ReliabilityWednesday, October 3, 12
  104. 104. ALERT DESIGN • Signal:Noise can be difficult • Easy to err on more false alarms • Decay in trust • Origins: Undetectable conditionsWednesday, October 3, 12
  105. 105. ALERT DESIGN ConfirmationWednesday, October 3, 12
  106. 106. ALERT DESIGN ExpectancyWednesday, October 3, 12
  107. 107. ALERT DESIGNWednesday, October 3, 12
  108. 108. ALERT DESIGN • Don’t make people singularly reliant on alarms • Support alarm confirmation activities • Make alarms unambiguous • Reduce, reduce, reduce false alerts • Set missed/false alert trade-offs appropriatelyWednesday, October 3, 12
  109. 109. ALERT DESIGN • Use multiple modalities • Minimize alarm disruptions to ongoing activities • Support the assessment/diagnosis of multiple alerts • Support global SA of systems in an alarm stateWednesday, October 3, 12
  110. 110. Mature Role of Automation “Ironies of Automation” - Lisanne Bainbridge http://www.bainbrdg.demon.co.uk/Papers/Ironies.htmlWednesday, October 3, 12
  111. 111. Mature Role of Automation • Moves humans from manual operator to supervisor • Extends and augments human abilities, doesn’t replace it • Doesn’t remove “human error” • Are brittle • Recognize that there is always discretionary space for humans • Recognizes the Law of Stretched SystemsWednesday, October 3, 12
  112. 112. SUMMARYWednesday, October 3, 12
  113. 113. So what can we do? “In preparing for battle, I have always found that plans are useless but planning is indispensable.” - EisenhowerWednesday, October 3, 12
  114. 114. So what can we do? We develop our Non-Technical Skills • Situational Awareness • Communication • Decision Making • Improvisation • Crew Resource Management (CRM)Wednesday, October 3, 12
  115. 115. So what can we do? We tailor our environment to adapt • Tooling to support SA • Learning from outages (PostMortem) • Anticipating problems (PreMortem) • Gather Meta-MetricsWednesday, October 3, 12
  116. 116. Wednesday, October 3, 12
  117. 117. THE ENDWednesday, October 3, 12
  118. 118. Credits • http://www.flickr.com/photos/28650594@N03/2718027136/ • http://www.flickr.com/photos/telstar/2816887784/ • http://www.flickr.com/photos/soldiersmediacenter/3855375117/ • http://www.flickr.com/photos/19743256@N00/2640217771/ • http://www.flickr.com/photos/29456680@N06/6148754691/ • https://www.flickr.com/photos/spencediddy/3197199659/sizes/l/in/faves-allspaw/ • http://www.flickr.com/photos/splorp/64027565/sizes/l/in/photostream/Wednesday, October 3, 12
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×