Published in: Technology, Education

  1. National Aeronautics and Space Administration How Complex Systems Fail David Fuller NASA Glenn Research Center www.nasa.gov
  2. How Complex Systems Fail: A Short Treatise by Richard Cook, MD • Written by Richard Cook, MD, Director of the Cognitive Technologies Laboratory at the University of Chicago • http://www.ctlab.org/documents/How%20Complex%20Systems%20Fail.pdf • 18 short paragraphs on complex systems that will help every project manager understand and reduce risk in their projects
  3. 1. Complex Systems are Intrinsically Hazardous Systems • Complex systems are found in transportation, healthcare, power generation, and space. • Because they are complex, they are inherently and unavoidably hazardous. • The defenses that are created against these hazards characterize these systems.
  4. 2. Complex Systems are Heavily and Successfully Defended Against Failure • Multiple layers of defense against hazards in: – Machine – Human – Organizational – Institutional – Regulatory • These defenses keep operations away from accidents.
  5. 3. Catastrophe Requires Multiple Failures • Defenses are generally successful. • Catastrophic failures occur when small or disconnected failures come together. • Most initial failure trajectories are blocked by the system's safety components. • Trajectories that reach the operational level are blocked by the humans operating the system.
  6. 4. Complex Systems Contain Changing Mixtures of Latent Failures • Multiple flaws are always present. • Individual flaws are considered minor factors because they are insufficient individually to cause failure. • Eradication of latent failures is limited by economic cost. • It is difficult to foresee how these minor flaws might contribute to accidents. • Failures change constantly: – Changing technology – Changing work organization – Changing efforts to eradicate failures.
  7. 5. Complex Systems Run in Degraded Mode • Complex systems run as broken systems. • They continue to function because they contain many redundancies. • Human operators learn to make them function. • System operations are dynamic: – Organization changes – Human behavior changes – Technology changes.
  8. 6. Catastrophe is Always Just Around the Corner • Human operators are in close physical and temporal proximity to these potential failures. • Failure can occur at any time and any place. • It is impossible to eliminate this potential. • The potential for disaster is always present by the system's own nature.
  9. 7. Post-Accident Attribution to a “Root Cause” is Fundamentally Wrong • There is never an isolated cause of an accident. • Many individual causes join together to cause accidents. • Causes are often not coupled. • Evaluations based on finding the “root cause” show a misunderstanding of the nature of accidents. • Insistence on a “root cause” reflects the social and cultural need to blame specific, localized forces for accidents.
  10. 8. Hindsight Biases Post-Accident Assessments of Human Performance • Knowledge of the outcome makes the investigator unable to understand the human factors present at the time of the accident. • Knowledge of the outcome poisons the ability of the investigator to recreate the views of the humans involved. • Hindsight bias remains the primary obstacle to accident investigation, especially when expert human performance is involved.
  11. 9. Human Operators have Dual Roles: Producers and Defenders Against Failure • Operators work to produce the desired product and also work to forestall accidents. • Operators balance production against safety in a dynamic environment. • When no accidents are occurring, production is emphasized. • After accidents, the defensive role is emphasized.
  12. 10. All Practitioner Actions are Gambles • All decisions are made in the face of uncertainty. • The degree of uncertainty changes from moment to moment. • The “gamble” appears clear after accidents (see 8 above). • Post hoc analysis of accidents regards these gambles as poor ones. • Successful outcomes are also the result of gambles, but are seen in a much more favorable light.
  13. 11. Actions at the Sharp End Resolve All Ambiguity • Organizations are ambiguous about the relationship between: – Production – Efficient use of resources – Economy/costs of operations – Acceptable risk • All of this ambiguity is resolved moment by moment by the operators.
  14. 12. Human Practitioners are the Adaptable Element of Complex Systems • Operators actively adapt the system to maximize production and minimize accidents. • These adaptations include: – Restructuring the system to reduce exposure of vulnerable parts to failure – Concentrating critical resources in areas of high demand – Providing pathways for retreat or recovery from faults – Establishing means for early detection of changed system performance.
  15. 13. Human Expertise in Complex Systems is Constantly Changing • Expertise changes as technology changes. • Experts are replaced (turnover). • Operators are being trained and their skills refined. • The cognitive abilities of humans vary from moment to moment.
  16. 14. Change Introduces New Forms of Failure • A low rate of accidents may encourage changes. • Changes create opportunities for new failure modes. • New technologies introduce new failure pathways. • Because failures occur at a low rate, multiple system changes may occur before an accident, making it hard to understand the contribution of the new technology.
  17. 15. Views of “Cause” Limit the Effectiveness of Defenses Against Future Events • Post-accident remedies for “human error” are usually predicated on obstructing activities that “cause” accidents. • These measures do little to reduce the likelihood of further accidents. • The likelihood of an identical accident is very low because the pattern of latent failures changes constantly. • Post-accident remedies usually increase the coupling and complexity of the system.
  18. 16. Safety is a Characteristic of Systems and not their Components • Safety is an emergent property. • It does not reside in any one person, device, or department within the organization. • The state of safety is always dynamic. • The whole is greater than the sum of the parts.
  19. 17. People Continuously Create Safety • Failure-free operations are the result of the activities of people who work to keep the system within the boundaries of tolerable performance. • These activities are part of normal operations. • Because system operations are never trouble-free, operators adapt to changing conditions. • Operators are creating safety from moment to moment. • Safety is at the mercy of the operators' perception of the situation.
  20. 18. Failure-Free Operations Require Experience with Failure • Recognizing hazards and successfully manipulating system operations require intimate contact with failure. • Operators must be able to see the “edge of the envelope.” • Improved safety depends on providing operators with calibrated views of the hazards. • Training allows errors to be experienced in a controlled environment.