Organizational Failure (LSCITS EngD 2012)

Discusses the organizational issues that affect systems failure

Published in: Technology, Business

  1. Organisational Failure. Prof Ian Sommerville. Organisational Failure, York EngD Course in LSCITS, 2012
  2. Organisational failure
     • Why and how organisational factors can contribute to system failures
  3. Why organisations matter
     • Organisations have multiple, inter-related, potentially conflicting goals:
       – Efficient resource utilisation
       – Timely delivery of products/services
       – Customer satisfaction
       – Owner satisfaction
       – Regulatory compliance
       – Safety and dependability
       – Maintenance of reputation/brand
       – Future development
  4. Decision making
     • Organisational decision making involves taking all of these goals into account
       – Inevitably, this sometimes means making compromises that affect the safety and dependability of a system
     • These compromises lead to vulnerabilities and hazards that may then compromise the safety or dependability of the system
     • In complex organisations, there are competing priorities in different parts of the organisation
       – Shifting power and authority in an organisation affects decision making
       – There may be a deliberate lack of communication across the organisation
  5. NASA Challenger disaster
     • Space shuttle exploded shortly after take-off
     • The cause was the failure of rubber seals (O-rings) that allowed hot gas to escape and make contact with fuel tanks, which then exploded
     • Subsequent enquiry showed that O-ring failure was due to brittleness at low temperatures
     • Arguably, decision makers were complacent because of:
       – Redundant (primary and secondary) O-rings in the system
       – Damage to primary O-rings having been tolerated in previous launches
  6. Organisational failure
     • Engineers were concerned about launching in low temperatures and advised against launch
     • But goals other than safety and dependability took precedence and engineers were overruled
       – ‘Owner’ satisfaction
         • Already several delays to flight
       – Future planning
         • NASA wanted a success to support budget negotiations
       – Resource utilisation
         • Reluctance to address known problem with O-rings because of costs
  7. Normal accidents
     • Developed by Charles Perrow, who conducted a study of a nuclear accident in the USA (Three Mile Island)
     • Official conclusion was that the problems were due to “human error”
     • Perrow disagreed with this and argued that failures are ‘normal’ and inevitable in complex systems which have:
       – Interactive complexity
         • The presence of unfamiliar, unplanned and unexpected sequences of events in a system that are not visible or immediately comprehensible
       – Tight coupling
         • The presence of interdependent components
         • Tight coupling makes a system more prone to cascading errors
  9. Redundancy
     • The use of redundancy is a fundamental technique in achieving system safety
       – Primary and secondary O-rings on space shuttle
       – Quintuple redundancy in Airbus FCS
     • Failure of primary system can be tolerated
     • Perrow argues that redundancy can decrease rather than increase safety:
       – Increases complexity and coupling in the system
       – Provides reassurance that system faults can be tolerated
  10. Failures or successes
     • Normal accident theory is based on extensive studies of system failures
     • It argues that failure is systemic and an inherent characteristic of the system itself
     • Alternative perspective is based on studies of success
       – Why are there some areas that are apparently complex (e.g. air traffic management) where failures are relatively uncommon?
     • Led to the notion of high-reliability organisations
  11. Failure-free organisations?
     • High-reliability organisation (HRO) researchers disagree that complex, highly interdependent systems will inevitably have accidents
       – They believe organisations are able to compensate for technical shortcomings through their methods of operation. In essence, they argue that organisations can be ‘failure free’.
     • Based on studies of ‘reliable’ organisations
       – Aircraft carriers
       – Air traffic control
       – Nuclear power stations
       – Intensive care units
  12. Aircraft carrier flight operations
  13. Nuclear powered carriers
     • Complex systems
       – Carriers are 24 storeys high and carry enough fuel for 15 years, with 2,000 telephones and 3,360 compartments and spaces
       – Multiple software-intensive systems (command systems, aircraft software)
       – Dangerous objects (aircraft, fuel and explosives) in close proximity
       – Aircraft taking off and landing at 48-60 second intervals
       – 6,000 crew, several different kinds of aircraft, multiple squadrons
       – All work interdependently and must be coordinated
  14. Nuclear powered carriers
     • High risk
       – Nuclear reactor accidents
       – Fire, flooding, grounding, collision
       – Fuel and weapons explosions
       – Mistaken identification of friends and foes
       – High risks both to crew and a much larger public
     • High reliability
       – Low “crunch rates”: comparatively few major accidents
     • High reliability achieved through organisational design
  15. High Reliability Organisations
     • High Reliability Organisations (HROs) have particular qualities
       – Reliability takes precedence over efficiency
       – Preoccupation with failure, not success
       – Share the big picture
       – Focus on details
       – Migrate decisions
  16. Reliability over Efficiency
     – Reliability comes before efficiency but cannot replace it
     – Decisions are made on the grounds of reliability first and then efficiency
     – Efficiency initiatives are treated with scepticism
     – Managers regularly talk to staff to familiarise themselves with how they do their work and why. This stops managers focusing just on figures.
     – Organisations develop safety measures as well as financial measures, and include these in employee evaluations
     – Organisations assign value to the avoidance of accidents
     – High redundancy despite cost
     – Cautious actions when necessary despite cost
  17. Preoccupation with Failure
     • HROs recognise that:
       – Workers need to be heedful of the possibility of failure
       – Failures are normal but accidents should be avoided
       – There can be unexpected failure modes, even in common activities
     • HROs address failure by:
       – Constant training of all people (simulations, apprenticing, practice)
       – Using incident reporting
       – Designing in extensive redundancy
       – Maintaining contingencies for critical operations
       – Requiring proofs that something is safe, not that it is unsafe
  18. Carrier operations
     – There is constant tracking of issues around malfunctioning, defective and substandard equipment
       • They act on these by training crew how to overcome problems and pressuring vendors to make improvements
     – Extensive redundancy (overlapping jobs, multiple channels and centres of communications, spare parts, multiple sources for decision making)
       • Example: if an aircraft's landing gear warning light comes on, the spotter, commander and pilot all work together to establish what the issue is
     – Multiple contingencies are maintained
       • Example: there will always be multiple options for how to land the plane (or for the pilot to escape)
  19. Sharing the Big Picture
     • HROs recognise that:
       – If people are narrowly focused they will act only in their own interest
       – People need to maintain awareness of other people and events around the organisation
     • HROs:
       – Train people broadly
       – Educate people about overarching objectives, and set statements of purpose
       – Give people access to information on what is happening elsewhere
       – Clearly specify how people and teams fit into the whole
  20. Reluctance to Simplify
     • HROs are reluctant to simplify
     • All organisations have to simplify and abstract, to filter out unnecessary information (particularly for getting “big pictures”)
     • Rather, HROs:
       – Use labels and categories as little as possible, as they stop you from looking further into details and events
       – Continually rework labels and categories
       – Listen to wisdom, but with scepticism
       – Do not focus on information that supports expectations, but on that which doesn’t fit or disconfirms desires
  21. Migration of decision making
     • HROs migrate decision making as far down the organisation as possible
       – Decisions are not made by one central authority
     • HROs recognise:
       – Decisions need to be made where there is expertise
       – Decisions often need to be made quickly
       – People must be trained in making decisions and given the right resources to do so
       – Skill levels and legitimacy extend through the organisation, and people are trusted
  22. HROs and Normal Accidents
     • HRO theory is sometimes presented as conflicting with Normal Accidents
       – HRO proponents may argue that accidents are not ‘normal’
       – Leveson critiques work on HROs and argues that they are not based on concerns of tightly coupled systems
     • Arguably, an HRO is an organisation that has taken active steps to:
       – Reduce coupling
       – Reduce interactions
     • Once that has been achieved, the driver for HROs is perhaps a strong ‘safety culture’ to promote safety across the organisation
  23. Organisational vulnerabilities
     • Organisational vulnerabilities are characteristics of an organisation that weaken defensive layers and so may lead to system failure
     • Examples of organisational vulnerabilities
       – Over-reliance on process to achieve safety/dependability
       – Responsibility failures
       – Weak safety/dependability culture
       – Under-resourcing of safety
  24. Over-reliance on process
     • Quality standards such as ISO 9000 place great emphasis on process and process assurance
       – Implication of these standards is that process is paramount
     • This tends to promote a belief that focusing on process is the way to achieve safety and dependability
     • However, processes are never isolated and have to be enacted in a dynamic context
     • Sometimes necessary to deviate from the ‘normal’ process to achieve safety and dependability
  25. Responsibility failures
     • System failures are often a consequence of responsibility failures
       – Unassigned responsibility
       – Misassigned responsibility
       – Misunderstood responsibility
       – Duplicated responsibilities
       – Responsibility overload
       – Responsibility fragility
     • Responsibility failures may be a consequence of poor communications and/or under-resourcing
  26. Organisational culture
     • “The way that we do things around here”
     • Culture may conflict with public statements of priorities
       – “The patient comes first”
       – “Safety is our goal”
     • Investment banking
       – High risk, high reward
       – Lack of regulation or weak compliance with regulations
       – Large-scale failures
  27. Safety culture
     • Some organisations have developed a strong safety culture where safety is seen as a priority by all members of the organisation
     • Safety culture (UK HSE)
       – “The product of individual and group values, attitudes, perceptions, competencies, and patterns of behaviour that determine the commitment to, and the style and proficiency of, an organization’s health and safety management”
  28. Safety culture (Reason)
  29. Safety maturity
  30. Under-resourcing
     • If operations are under-resourced then safety and dependability are often sacrificed
     • Organisational priorities focus on optimising resource utilisation to continue service delivery
       – Safety and dependability may be seen as an avoidable overhead
     • Example
       – Cleaning services in hospital outsourced to save money
       – Competitive tender
       – Under-resourced so quality of service reduced
         • Consequent increase in hospital-acquired infections
  31. Complex systems
     • Complexity = Coupling + Interaction
     • Lesson for LSCITS
       – Increasing complexity will lead to unpredictable system failure
       – Strive to build LSITS rather than LSCITS
     • Improve safety by
       – Reducing coupling
       – Reducing interactions
       – Redundancy may not improve safety as it increases complexity in the system
     • Address problems at the organisational as well as the system level
  32. Key points
     • Organisational decisions, influenced by structure and culture, often have a major impact on safety and dependability
     • Normal Accident Theory postulates that accidents are inevitable in complex, tightly coupled systems
     • High-reliability organisations aim to achieve safety through a set of practices that aim to reduce failures
     • Organisational vulnerabilities include over-reliance on process, responsibility failures, poor safety culture and under-resourcing
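
The redundancy point on slides 9 and 31 can be made concrete with a little arithmetic. The sketch below is an illustration only: it compares the chance that a primary and a secondary component both fail when their failures are independent with the chance when some failures share a common cause (as low temperature did for both O-rings). The beta-factor approximation, the function names and the numbers are assumptions chosen for the example; none of them come from the slides.

```python
def both_fail_independent(p: float) -> float:
    """Probability that two independent components both fail,
    each with failure probability p."""
    return p * p


def both_fail_common_cause(p: float, beta: float) -> float:
    """Beta-factor approximation (an assumed model, not from the slides).

    A fraction beta of each component's failure probability comes from a
    shared cause that defeats both components at once; the remainder is
    treated as independent.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("p and beta must lie in [0, 1]")
    shared = beta * p                      # one shared event fails both
    independent = ((1.0 - beta) * p) ** 2  # both fail for unrelated reasons
    return shared + independent


if __name__ == "__main__":
    p = 0.01     # hypothetical per-component failure probability
    beta = 0.1   # hypothetical share of failures with a common cause
    print(both_fail_independent(p))
    print(both_fail_common_cause(p, beta))
```

With these assumed values the common-cause figure is roughly ten times the independent one, which is one way of seeing why coupling between redundant components can erode the protection that redundancy appears to offer.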
