CS5032 Lecture 10: Learning from failure 2
LEARNING FROM FAILURE 2
DR JOHN ROOKSBY
IN THIS LECTURE
This lecture will focus on how accidents and serious incidents are investigated and analysed.
• When do investigations happen?
• How are they conducted?
• What analytical methods are used?
INVESTIGATIONS
Accidents and failures need to be investigated.
• Investigations enable you to identify the (most likely) causes of accidents and failures.
• The causes and conditions leading up to accidents and incidents are often complex, and immediate reactions may be wrong.
• Investigations seek to uncover underlying causes, not just the immediate causes.
• Investigations should also address how future accidents can be avoided.
• In this context, investigations aim primarily to prevent future occurrences rather than to establish responsibility.
INVESTIGATIONS
The basic steps of an investigation are:
1. Collection phase: Evidence and facts are sought.
2. Analysis phase: The evidence and facts are analysed, and opinions are invited from experts and other parties.
3. Judgements phase: Judgements are made about the causes of an incident or accident and the associated responsibilities.
4. Follow up: Recommendations should be made on how to stop similar problems happening again.
In practice, the process will be iterative.
INVESTIGATIONS
There are limitations on the investigation process. Investigations can be costly.
• We may never know all the facts. With complex systems, it can be very hard or impossible to know everything that happened in the run-up to an incident. In major accidents, sources of evidence may be damaged or lost.
• There will always be subjective views and uncertainties, especially around human actions.
Judgements need to be made about the extent to which an incident can be investigated.
Investigations often conclude with the “likely” or “probable” causes rather than a definitive version of events.
WHO INVESTIGATES?
The scope and emphasis of an investigation are likely to reflect the position of the investigator.
An investigator ought to be independent.
• In practice, this can be hard to achieve.
• Some industries have an official, independent investigation organisation.
• In the event of a major incident, a ‘public enquiry’ may be used, in which the evidence and investigative process are made public and so open to scrutiny.
ANALYSIS
The analysis phase of an investigation needs to explore and evaluate information that is often complex.
Experts and specialists may need to be involved at this point.
There is no standard method for analysing an accident, and there is continuing debate about how this is best done.
Approaches include:
• Narrative approaches
• Causal chains
• Systems approaches
NARRATIVE APPROACHES
All accident investigations will produce a narrative of some kind. Many reports are purely a narrative and a set of conclusions. A narrative is a written account of an incident or accident.
• Producing this can be non-trivial, because it is difficult to structure events into a linear document when many of them occurred simultaneously and many are ambiguous.
Producing a narrative is a key step in making sense of an incident.
Narrative accounts have serious limitations, however. It is difficult to evaluate their depth and coverage, and they tend to ‘storify’ complex events.
“ROOT CAUSE” APPROACHES
Many approaches have been developed to systematically identify the root causes of an incident. These approaches are based on the idea that the immediate events in an incident are symptoms of a much deeper problem.
Root cause analysis techniques usually express events as a chain. The chains often branch, and multiple chains can be synchronised to represent parallel events.
• Examples: MORT (Management Oversight Risk Tree), FMEA (Failure Mode and Effects Analysis), Barrier Analysis, WBA (Why-Because Analysis)
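The branching chains described on this slide can be sketched as a small directed graph in which each event maps to the events that directly caused it. This is only an illustration in Python, not an implementation of MORT, FMEA or WBA; the incident, the event names and the `root_causes` helper are all hypothetical.

```python
# Hypothetical incident: each event maps to the events that directly caused it.
# Events with no recorded causes are treated as candidate root causes.
CAUSES = {
    "pump failure": ["seal worn", "alarm ignored"],
    "seal worn": ["maintenance skipped"],
    "alarm ignored": ["frequent false alarms"],
    "maintenance skipped": [],
    "frequent false alarms": [],
}


def root_causes(event, causes):
    """Walk backwards along the causal chains and collect events with no recorded cause."""
    roots = set()
    seen = set()
    stack = [event]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # an event shared by two branches is only visited once
        seen.add(current)
        parents = causes.get(current, [])
        if not parents:
            roots.add(current)
        stack.extend(parents)
    return roots


print(sorted(root_causes("pump failure", CAUSES)))
# prints ['frequent false alarms', 'maintenance skipped']
```

Note that the "roots" found here are simply the points where the recorded evidence runs out, which is one reason investigations speak of "likely" or "probable" root causes.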
“ROOT CAUSE” APPROACHES - LIMITATIONS
The stopping problem
• A causal chain could in theory go backwards indefinitely.
The proximity problem
• A root cause is often found to be something proximal to the accident (often a human operator).
The causation problem
• Hindsight and investigative biases frame particular actions in terms of their contributions to an outcome.
However, this does not mean that it is wrong to try to identify underlying causes.
Investigations usually refer to the “likely” or “probable” root causes.
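The stopping and proximity problems can be illustrated with a toy example: the event labelled as the "root cause" depends on how far back the investigator chooses to trace. This is a minimal Python sketch under assumed data; the chain of events and the `trace_back` helper are hypothetical and do not correspond to any particular analysis method.

```python
# Hypothetical causal chain: each event maps to the single event recorded as its cause.
CAUSED_BY = {
    "valve left open": "operator skipped checklist",
    "operator skipped checklist": "shift understaffed",
    "shift understaffed": "budget cut",
    "budget cut": "company reorganisation",
}


def trace_back(event, caused_by, max_steps):
    """Follow the chain backwards for at most max_steps links and return where the search stops."""
    current = event
    for _ in range(max_steps):
        if current not in caused_by:
            break  # no earlier cause was recorded
        current = caused_by[current]
    return current


for steps in (1, 2, 4):
    print(steps, trace_back("valve left open", CAUSED_BY, steps))
# prints:
# 1 operator skipped checklist
# 2 shift understaffed
# 4 company reorganisation
```

Stopping after one step blames the operator, who is proximal to the accident; tracing further shifts the "root cause" to organisational factors, and nothing in the chain itself says where to stop.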
SYSTEMS METHODS
Systems methods for accident analysis have come into use over the last decade.
• From this perspective, accidents result from inadequate control or enforcement of safety-related constraints on the development, design, and operation of the systems.
Systems methods emphasise controls over the system itself. This recognises that no system is inherently safe, and that systems (particularly socio-technical systems) adapt and change over time.
• A key approach is STAMP (Systems-Theoretic Accident Model and Processes).
SYSTEMS METHODS
Key criticisms of systems models:
• They are often used as a means of pursuing and attributing blame to high-level people in an organisation.
• They can turn attention too far away from the actual design and implementation of the technology.
HINDSIGHT AND FORESIGHT
It is essential to learn from our mistakes, but we should not wait for accidents to happen before we try to improve the dependability of systems. How can we predict problems that may occur? How can we ensure systems are resilient to possible problems?
Several of the methods mentioned in this lecture can be used to follow through the consequences of possible problems or failures.
Predicting possible causes and consequences of failure, except in very narrow circumstances, can involve many arbitrary decisions.
INVESTIGATION
The Columbia Accident Investigation Board was an independent board set up to analyse the Columbia disaster.
• 13 board members and many investigators
• Investigation took around 5 months
• Cost approximately 17 million dollars
• 230 page report produced
The proximal cause was fairly clear from the outset. The investigation sought to focus on underlying causes.
• The investigation focused on organisational, historical, budgetary and political factors in the shuttle programme.
• The questions surrounded the issue that foam strikes were routinely ignored.
KEY POINTS
Investigations are important for learning from failures.
Investigations often show that initial assumptions about the cause of an incident are wrong or partial. They aim to find underlying or “root” causes.
All investigations involve some sort of judgement. Investigations should be as neutral as possible, but in practice this is difficult to achieve.
There are many methods for analysing an incident or accident, and no single right way to do this.