ORGANISATIONS
AND
DEPENDABILITY 1
DR JOHN ROOKSBY
IN THIS LECTURE…
High Reliability Organisations
These are organisations that are able to achieve high reliability
from complex, critical systems
  • This lecture will cover five of the key qualities said to be held
    by these organisations

This lecture will use Nuclear Powered Carriers as an example of a
High Reliability Organisation, and NASA at the time of the
Columbia disaster as an example of an unreliable organisation
NORMAL ACCIDENTS
Normal Accidents (1984), by Charles Perrow, introduced the idea that
failures are normal in complex systems. Perrow argued that serious
failures are likely when there is:
  • Interactive complexity: The presence of unfamiliar, unplanned
    and unexpected sequences of events in a system that are not
    visible or immediately comprehensible
  • Tight coupling: The presence of interdependent components.
    Tight coupling will make a system more prone to cascading
    errors.
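
Tight coupling can be illustrated with a minimal sketch. It assumes the strictest possible coupling (a component fails as soon as anything it depends on fails), and the component names are hypothetical, not taken from Perrow.

```python
from collections import deque

def cascade(dependents, initial_failure):
    """Return every component that fails after `initial_failure` fails.

    `dependents` maps a component to the components that rely on it.
    Assumption: a dependent always fails when something it relies on fails.
    """
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in failed:
                failed.add(nxt)
                queue.append(nxt)
    return failed

# Hypothetical systems: the same components, with and without dependencies.
tightly_coupled = {"pump": ["cooling"], "cooling": ["reactor"], "reactor": ["power"]}
loosely_coupled = {"pump": [], "cooling": [], "reactor": ["power"]}

print(sorted(cascade(tightly_coupled, "pump")))
print(sorted(cascade(loosely_coupled, "pump")))
```

In the tightly coupled version a single pump failure reaches every component. In the loosely coupled version it stays local.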

So complex, tightly coupled systems shouldn't be built?
HRO researchers argue that some complex, tightly coupled systems
are far more dependable than others – because of the way they are
managed
PRINCIPLES
 High Reliability                 Low Reliability
 Organisations                    Organisations


 Focus on failure                 Focus on Success

 Focus on reliability             Focus on efficiency

 Reluctant to simplify            Rely on Simplicity

 Dynamic hierarchies              Inflexible Hierarchy

 De-centralised decision making   Centralised decision making

 Open information                 Hide Information

 Multiple perspectives            Single perspectives

 Are committed to resilience      Are on “automatic pilot”
NUCLEAR POWERED
CARRIERS

Complex, high risk socio-technical systems
  • Multiple (mechanical and digital) systems
  • Dangerous objects (aircraft, fuel, and explosives) in close
    proximity. Aircraft taking off and landing in 48-60 second
    intervals.
  • 6000 crew. Several different kinds of aircraft, multiple squadrons.
    All work interdependently and must be coordinated.
  • Carriers are 24 storeys high and carry enough fuel for 15 years.
    2000 telephones. 3,360 compartments and spaces
NUCLEAR POWERED
CARRIERS
High risk
  •   Nuclear reactor accidents
  •   Fire, flooding, grounding, collision
  •   Fuel and weapons explosions
  •   Mistaken identification of friends and foes
  •   High risks both to crew and a much larger public


High reliability
  • Low “crunch rates”
  • Comparatively few major accidents
COLUMBIA DISASTER
Feb 1st 2003 - Columbia disintegrates during re-entry into the
Earth's atmosphere
The thermal protection system had been damaged during launch
when a large piece of foam insulation broke off the main
propellant tank and hit the shuttle
  • Known problem.
       • The majority of shuttle launches had included foam
         strikes, but nothing had been done about the design
       • NASA was aware that foam had struck the wing, but the
         strike was not treated as serious
       • Engineers' concerns were not listened to
NASA
NASA had repeated similar failings
  • The Challenger disaster, 28th Jan 1986 (mission STS 51-L)
  • The Columbia disaster, 1st Feb 2003 (Mission STS-107)


Many of the failings were the result of deep-rooted organisational
problems
NASA strived to implement HRO principles
FIVE PRINCIPLES
  High Reliability                 Low Reliability
  Organisations                    Organisations


  Focus on failure                 Focus on Success

  Focus on reliability             Focus on efficiency

  Reluctant to simplify            Rely on Simplicity

  Dynamic hierarchies              Inflexible Hierarchy

  De-centralised decision making   Centralised decision making

  Share information                Hide Information

  Multiple perspectives            Single perspectives

  Are committed to resilience      Are on “automatic pilot”
1. RELIABILITY OVER
EFFICIENCY
High Reliability Organisations give reliability precedence over
efficiency
  • Decisions are made on the grounds of reliability first and then
    efficiency
  • Efficiency initiatives are treated with scepticism
1. RELIABILITY OVER
EFFICIENCY
High Reliability Organisations do the following:
  • Managers regularly talk to staff to familiarise themselves
    with how they do their work and why.
  • Organisations develop safety measures as well as financial
    measures, and include these in employee evaluations
  • Organisations assign value to the avoidance of accidents
  • High redundancy despite cost
  • Cautious actions when necessary despite cost
• Carriers have to persuade Congress that enormous amounts
  of redundancy (in jobs, communication structures, parts) are
  necessary, and that enormous amounts of training are
  necessary
• Constant training despite cost. Commanding officers demand
  that carriers have regular sea exercises, that they are not just
  kept in port
NASA prioritised efficiency over reliability
• In the 1990s NASA faced drastic cuts and became overly
  concerned with pleasing Congress. It initiated the
  Faster, Better, Cheaper strategy in the mid-90s and wanted to
  stick to a strict schedule.
• With STS-107, NASA worried that the time needed to analyse
  the foam strike would delay the next mission. It did not want to
  change the next mission's objectives to a rescue mission.
• Positioning the shuttle over Hawaii so that images could be
  taken was seen as time-consuming and costly
2. PREOCCUPATION WITH
FAILURE
High Reliability Organisations are preoccupied with failure (They do
not focus on success)
  • Workers need to be heedful of the possibility of failure
  • Failures are understood to be normal (but unacceptable)
  • Know there can be unexpected failure modes, even in common
    activities
2. PREOCCUPATION
WITH FAILURE

High Reliability Organisations address failure by
  • Constant training of all people (simulations, apprenticing,
    practice)
  • Using incident reporting
  • Designing in extensive redundancy
  • Maintaining contingencies for critical operations
  • Requiring proof that something is safe, rather than proof that
    it is unsafe
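
The last point is about where the burden of proof sits. A minimal sketch follows. The `Evidence` categories and both function names are hypothetical, introduced only to show how the two policies differ when nothing has been proven either way.

```python
from enum import Enum

class Evidence(Enum):
    SHOWN_SAFE = "shown safe"
    SHOWN_UNSAFE = "shown unsafe"
    UNKNOWN = "unknown"

def proceed_requiring_proof_of_safety(evidence: Evidence) -> bool:
    """Proceed only when safety has been demonstrated."""
    return evidence is Evidence.SHOWN_SAFE

def proceed_requiring_proof_of_danger(evidence: Evidence) -> bool:
    """Proceed unless danger has been demonstrated."""
    return evidence is not Evidence.SHOWN_UNSAFE

for evidence in Evidence:
    print(
        evidence.value,
        proceed_requiring_proof_of_safety(evidence),
        proceed_requiring_proof_of_danger(evidence),
    )
```

The two policies agree when the evidence is conclusive. They differ only in the unknown case, where the first stops and the second carries on.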
• There is constant tracking of issues around malfunctioning,
  defective and substandard equipment. They act on these by
  training crew how to overcome problems and pressuring
  vendors to make improvements
• Extensive redundancy (overlapping jobs, multiple channels
  and centres of communications, spare parts, multiple sources
  for decision making).
     • Example: if an aircraft's landing gear warning light comes on,
       the spotter, commander and pilot all work together to establish
       what the issue is.
• Multiple contingencies are maintained. Example: There will
  always be multiple options for how to land the plane (or for
  the pilot to escape).
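
A rough sketch of why redundancy is worth its cost. It assumes each redundant channel fails independently with the same probability, which real systems rarely satisfy exactly, and the 0.1 failure probability is purely illustrative.

```python
def prob_at_least_one_works(p_fail: float, n_channels: int) -> float:
    """Probability that at least one of n independent channels works.

    Assumption: failures are independent and equally likely per channel.
    """
    return 1.0 - p_fail ** n_channels

# Illustrative only: each channel is assumed to fail 1 time in 10.
for n in (1, 2, 3):
    print(n, round(prob_at_least_one_works(0.1, n), 4))
```

Under these assumptions each extra channel cuts the chance of total failure by a factor of ten. Shared causes of failure would reduce that benefit.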
• Foam had been shed on 65 of 79 missions prior to STS-107.
  There were repeated resolutions to do something about this, and
  yet nothing happened.
• After the foam strike, engineers who raised concerns were
  asked to prove it posed a danger rather than prove it didn't.
• There was no sustained effort to acquire images of the shuttle,
  or to share them internally
• A shuttle was available for a rescue mission, but a rescue was
  never actually considered.
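
The foam figures above can be turned into a rate with a line of arithmetic. This uses only the two numbers given on the slide.

```python
# Figures from the slide: foam was shed on 65 of the 79 missions before STS-107.
missions_with_foam_shedding = 65
missions_counted = 79

strike_rate = missions_with_foam_shedding / missions_counted
print(f"Foam shedding rate: {strike_rate:.0%}")
```

This comes to roughly four missions in five, which is why foam shedding had come to be seen as routine rather than as a warning.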
3. SHARING THE BIG
PICTURE
High Reliability Organisations want everyone to know the whole
picture
  • If people are narrowly focused they will act only in their own
    interest.
  • People need to maintain awareness of other people and
    events around the organisation
3. SHARING THE BIG
PICTURE
High Reliability Organisations
  • Train people broadly
  • Educate people about overarching objectives, and set
    statements of purpose
  • Give people access to information on what is happening
    elsewhere
  • Clearly specify how people and teams fit into the whole
• Maintain awareness through many communication devices of
  multiple kinds, and have multiple centres of communication.
  Each centre has direct access to information and each is vigilant.
• Have well-articulated hierarchies
• Deck hands are motivated because they are treated as core
  parts of teams
• People are rotated through different jobs. Top personnel are
  rotated to a different position every 90 days.
• NASA employees had little understanding of the overall
  organisation and its internal processes
• A team with the correct expertise was set up to assess the
  foam strike damage, but its objectives were fuzzy and it had
  no direct connection to management
• The team was not given the appropriate official category, “Tiger Team”
• The investigators did not know the process for requesting
  images, and were rebuked when they tried because they did
  not have the authority or the correct approval to request them
4. RELUCTANCE TO
SIMPLIFY
All organisations have to simplify and abstract, to filter out
unnecessary information (particularly for getting “big pictures”)
But High Reliability Organisations
  • Use labels and categories as little as possible as they stop
    you from looking further into details and events.
  • Continually rework labels and categories
  • Listen to wisdom, but with scepticism
  • Do not focus on information that supports expectations, but
    focus on that which doesn't fit or disconfirms desires
•   There are clear responsibilities and tasks, but in practice the
    crew are constantly negotiating, communicating and
    interacting
•   If there is a problem with an aircraft, multiple people take
    multiple views.
• NASA narrowed the foam strike down to a “tile incident”, because
  management had expertise in tiles. It was actually a reinforced
  carbon-carbon (RCC) panel incident.
• The damage was assessed using simulation
  software called “Crater”.
• This software was designed for simulating small projectiles,
  but the foam debris was 640 times larger than the projectiles in
  the data used to calibrate Crater.
• Crater was not understood by NASA, and the simulation was
  actually run and interpreted outside the organisation.
• The simulation was only run twice, and the people who ran it
  did not think it was very useful but did not communicate this
  well
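
The Crater problem is one of using a model far outside the data it was calibrated on. The sketch below shows a simple guard against that. It is not how Crater works: the function names and the tolerance are hypothetical, and sizes are in arbitrary units chosen so that the ratio matches the factor of 640 from the slide.

```python
def extrapolation_factor(input_size: float, largest_calibrated_size: float) -> float:
    """How many times larger the input is than the largest calibration case."""
    return input_size / largest_calibrated_size

def within_calibration(input_size: float, largest_calibrated_size: float,
                       tolerance: float = 1.0) -> bool:
    """True if the input lies inside the calibrated range (times a tolerance)."""
    return extrapolation_factor(input_size, largest_calibrated_size) <= tolerance

# Arbitrary units: only the ratio (640) comes from the slide.
largest_calibrated = 1.0
foam_debris = 640.0

print(extrapolation_factor(foam_debris, largest_calibrated))
print(within_calibration(foam_debris, largest_calibrated))
```

A check of this kind does not make a result right or wrong. It flags that the result needs to be questioned rather than accepted.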
5. MIGRATION OF DECISION
MAKING
High Reliability Organisations migrate decision making as far
down the organisation as possible
  • Decisions are not made by one central authority. Decisions
    need to be made where there is expertise. This helps
    decisions to be made quickly and correctly
5. MIGRATION OF DECISION
MAKING
In order to defer to expertise:
  • Decision making ability is migrated to the lowest appropriate
    levels
  • People are trained in making decisions and are given the right
    resources to do so
  • Skill levels and legitimacy are recognised throughout the
    organisation, and people are trusted
•   There is hierarchy, but decision making is pushed to the
    extremes. For example if there is debris on the runway,
    whoever spots it can halt operations and have it cleared
•   Rank is not treated as an issue here
• NASA Mission STS-107
• Decision making was centralised among managers, who ignored
  the expert opinions of engineers
• Authority was required for decisions to be made
• Example: When images were requested, the organisation
  worried about the rank of the requestor
KEY POINTS
• Organisational approaches are necessary for achieving
  dependable systems. Dependability is not a quality of a
  technology but a quality of technology-in-practice.
• Technologies are not inherently dependable, but require
  people to operate and manage them in ways that are
  dependable
• The HRO literature has identified a number of qualities of
  highly reliable organisations. These mainly relate to the
  operation of technology, although some researchers have
  studied software development organisations from this
  perspective.
READING
KH Roberts (1990) Some Characteristics of One Type of High Reliability
Organization. Organization Science, 1, 2: 160-76.


Book: Charles Perrow (1984) Normal Accidents: Living with High-Risk
Technologies


Book chapter: Karl Weick (2005) Making Sense of Blurred Images. In W Starbuck
and M Farjoun (eds), Organization at the Limit. Blackwell Publishing

CS5032 Lecture 13: organisations and failure
