CS5032 Lecture 13: organisations and failure

ORGANISATIONS AND DEPENDABILITY 1
DR JOHN ROOKSBY

IN THIS LECTURE…
High Reliability Organisations
These are organisations that are able to achieve high reliability
from complex, critical systems
• This lecture will cover five of the key qualities said to be held
by these organisations

This lecture will use nuclear-powered carriers as an example of a
High Reliability Organisation, and NASA at the time of the
Columbia disaster as an example of an unreliable organisation

NORMAL ACCIDENTS
Charles Perrow's Normal Accidents (1984) introduced the idea that
failures are normal in complex systems. Perrow argued that serious
failures are likely when there is:
• Interactive complexity: The presence of unfamiliar, unplanned
and unexpected sequences of events in a system that are not
visible or immediately comprehensible
• Tight coupling: The presence of interdependent components.
Tight coupling will make a system more prone to cascading
errors.
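
A minimal sketch of why tight coupling makes cascading errors more likely, assuming a system can be modelled as a map from each component to the components that depend on it. The component names and the `cascade` function are hypothetical illustrations and do not come from Perrow's work.

```python
from collections import deque


def cascade(dependents, initial_failure):
    """Return every component that fails once `initial_failure` fails.

    `dependents` maps a component to the components that rely on it.
    A failure is assumed (for illustration only) to always propagate
    to every dependent component.
    """
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in failed:
                failed.add(dependent)
                queue.append(dependent)
    return failed


# Hypothetical tightly coupled system: each part depends on the previous one.
tightly_coupled = {"pump": ["cooling"], "cooling": ["reactor"], "reactor": ["power"]}

# Hypothetical loosely coupled system: nothing depends on the pump.
loosely_coupled = {"pump": [], "cooling": ["reactor"], "reactor": []}

print(sorted(cascade(tightly_coupled, "pump")))
print(sorted(cascade(loosely_coupled, "pump")))
```

In the tightly coupled map a single pump failure reaches every component, while in the loosely coupled map it stays contained. Real systems are not this deterministic; the sketch only shows the structural point.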

So complex, tightly coupled systems shouldn't be built?
HRO researchers argue that some complex, tightly coupled systems
are far more dependable than others – because of the way they are
managed

PRINCIPLES
High Reliability Organisations   | Low Reliability Organisations
Focus on failure                 | Focus on success
Focus on reliability             | Focus on efficiency
Reluctant to simplify            | Rely on simplicity
Dynamic hierarchies              | Inflexible hierarchy
De-centralised decision making   | Centralised decision making
Open information                 | Hide information
Multiple perspectives            | Single perspectives
Are committed to resilience      | Are on “automatic pilot”

NUCLEAR POWERED CARRIERS

Complex, high risk socio-technical systems
• Multiple (mechanical and digital) systems
• Dangerous objects (aircraft, fuel, and explosives) in close
proximity. Aircraft taking off and landing in 48-60 second
intervals.
• 6000 crew. Several different kinds of aircraft, multiple squadrons.
All work interdependently and must be coordinated.
• Carriers are 24 storeys high and carry enough fuel for 15 years.
They have 2,000 telephones and 3,360 compartments and spaces

NUCLEAR POWERED CARRIERS
High risk
• Nuclear reactor accidents
• Fire, flooding, grounding, collision
• Fuel and weapons explosions
• Mistaken identification of friends and foes
• High risks both to crew and a much larger public

High reliability
• Low “crunch rates”
• Comparatively few major accidents

COLUMBIA DISASTER
Feb 1st 2003 - Columbia disintegrates during re-entry into the
Earth's atmosphere
The thermal protection system had been damaged during launch
when a large piece of foam insulation broke off the main
propellant tank and hit the shuttle
• Known problem.
• The majority of shuttle launches had included foam
strikes, but nothing had been done about the design
• They were aware the foam had struck the wing, but it
was not treated as serious
• Engineers' concerns were not listened to

NASA
NASA had repeated similar failings
• The Challenger disaster, 28th Jan 1986 (mission STS-51-L)
• The Columbia disaster, 1st Feb 2003 (mission STS-107)

Many of the failings were the result of deep-rooted organisational
problems
NASA strived to implement HRO principles

FIVE PRINCIPLES
High Reliability Organisations   | Low Reliability Organisations
Focus on failure                 | Focus on success
Focus on reliability             | Focus on efficiency
Reluctant to simplify            | Rely on simplicity
Dynamic hierarchies              | Inflexible hierarchy
De-centralised decision making   | Centralised decision making
Share information                | Hide information
Multiple perspectives            | Single perspectives
Are committed to resilience      | Are on “automatic pilot”

1. RELIABILITY OVER EFFICIENCY
High Reliability Organisations give reliability precedence over
efficiency
• Decisions are made on the grounds of reliability first and then
efficiency
• Efficiency initiatives are treated with scepticism

1. RELIABILITY OVER EFFICIENCY
High Reliability Organisations do the following:
• Managers regularly talk to staff to familiarise themselves
with how they do their work and why.
• Organisations develop safety measures as well as financial
measures, and include these in employee evaluations
• Organisations assign value to the avoidance of accidents
• High redundancy despite cost
• Cautious actions when necessary despite cost

• Carriers have to persuade Congress that enormous amounts
of redundancy (in jobs, communication structures, parts) and
enormous amounts of training are necessary
• Constant training despite cost. Commanding officers demand
that carriers have regular sea exercises, that they are not just
kept in port

NASA prioritised efficiency over reliability
• In the 1990s NASA faced drastic cuts and became overly
concerned with pleasing Congress. NASA initiated the
Faster, Better, Cheaper strategy in the mid-90s and wanted to
stick to a strict schedule.
• With STS-107 they worried that the time needed to analyse
the foam strike would delay the next mission. They didn't want
to change the next mission's objectives to a rescue mission.
• They saw positioning the shuttle over Hawaii so that images
could be taken as time consuming and costly

2. PREOCCUPATION WITH FAILURE
High Reliability Organisations are preoccupied with failure (They do
not focus on success)
• Workers need to be heedful of the possibility of failure
• Failures are understood to be normal (but unacceptable)
• Know there can be unexpected failure modes, even in common
activities

2. PREOCCUPATION WITH FAILURE

High Reliability Organisations address failure by
• Constant training of all people (simulations, apprenticing,
practice)
• Using incident reporting
• Designing in extensive redundancy
• Maintaining contingencies for critical operations
• Requiring proofs that something is safe, not that it is unsafe

• There is constant tracking of issues around malfunctioning,
defective and substandard equipment. They act on these by
training crew how to overcome problems and pressuring
vendors to make improvements
• Extensive redundancy (overlapping jobs, multiple channels
and centres of communications, spare parts, multiple sources
for decision making).
• Example: if an aircraft's landing gear warning light comes on,
the spotter, commander and pilot all work together to establish
what the issue is.
• Multiple contingencies are maintained. Example: There will
always be multiple options for how to land the plane (or for
the pilot to escape).

• Foam had been shed on 65 of 79 missions (roughly 82%) prior
to STS-107. There were repeated resolutions to do something
about this and yet nothing happened.
• After the foam strike, engineers who raised concerns were
asked to prove it posed a danger rather than prove it didn't.
• No sustained effort to acquire images of the shuttle, or to
share them internally
• A shuttle was available for a rescue mission, but a rescue was
never actually considered.

3. SHARING THE BIG PICTURE
High Reliability Organisations want everyone to know the whole
picture
• If people are narrowly focused they will act only in their own
interest.
• People need to maintain awareness of other people and
events around the organisation

3. SHARING THE BIG PICTURE
High Reliability Organisations
• Train people broadly
• Educate people about overarching objectives, and set
statements of purpose
• Give people access to information on what is happening
elsewhere
• Clearly specify how people and teams fit into the whole

• Maintain awareness through many communication devices of
multiple kinds, and have multiple centres of communication.
Each centre has direct access to information and each is
vigilant.
• Have well-articulated hierarchies
• Deck hands are motivated because they are treated as core
parts of teams
• People are rotated through different jobs. Top personnel are
rotated to a different position every 90 days.

• Employees had little understanding of the overall
organisation, and its internal processes
• A team was set up with the correct expertise to assess the
foam strike damage, but its objectives were fuzzy and it had
no direct connection to management
• It was not given the appropriate official category of “Tiger Team”
• The investigators did not know the process for requesting
images, and were rebuked when they tried because they had
neither the authority to request them nor the correct approval
4. RELUCTANCE TO SIMPLIFY
All organisations have to simplify and abstract, to filter out
unnecessary information (particularly for getting “big pictures”)
But High Reliability Organisations
• Use labels and categories as little as possible as they stop
you from looking further into details and events.
• Continually rework labels and categories
• Listen to wisdom, but with scepticism
• Do not focus on information that supports expectations, but
focus on that which doesn't fit or disconfirms desires

• There are clear responsibilities and tasks, but in practice the
crew are constantly negotiating, communicating and
interacting
• If there is a problem with an aircraft, multiple people take
multiple views.

• Narrowed the foam strike down to a 'tile incident', because
management had expertise in tiles. In fact it was a reinforced
carbon-carbon (RCC) panel incident.
• The assessment of the damage was done using simulation
software called 'Crater'.
• This software was designed for simulating small projectiles,
but the foam debris was 640 times larger than the data used
to calibrate Crater.
• Crater was not understood by NASA, and the simulation was
actually run and interpreted outside the organisation.
• The simulation was only run twice. The people who ran it
did not think it was very useful, but did not communicate this
well
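
The risk of using a model far outside its calibration range can be made explicit with a simple guard. This is a minimal sketch under stated assumptions: the function names, the arbitrary size units and the warning text are hypothetical, and nothing here reflects how Crater itself works. Only the factor of 640 comes from the lecture.

```python
def extrapolation_factor(input_size, largest_calibrated_size):
    """How many times larger the input is than the largest calibration case."""
    if largest_calibrated_size <= 0:
        raise ValueError("largest_calibrated_size must be positive")
    return input_size / largest_calibrated_size


def check_model_input(input_size, largest_calibrated_size):
    """Flag inputs that fall outside the range the model was calibrated on."""
    factor = extrapolation_factor(input_size, largest_calibrated_size)
    if factor > 1.0:
        return (
            f"WARNING: input is {factor:g} times larger than the calibration "
            "data; treat the result as an extrapolation"
        )
    return "OK: input is within the calibrated range"


# Arbitrary units: calibration debris has size 1.0, the foam debris 640.0.
print(check_model_input(640.0, 1.0))
print(check_model_input(0.5, 1.0))
```

A check like this does not make the simulation any more accurate. It only forces the simplification ("the tool says it is fine") to carry its caveat along with it, which is the kind of detail a reluctance to simplify is meant to preserve.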

5. MIGRATION OF DECISION MAKING
High Reliability Organisations migrate decision making as far
down the organisation as possible
• Decisions are not made by one central authority. Decisions
need to be made where there is expertise. This helps
decisions to be made quickly and correctly

5. MIGRATION OF DECISION MAKING
In order to defer to expertise:
• Decision-making ability is migrated to the lowest appropriate
levels
• People are trained in making decisions and are given the right
resources to do so
• There is recognition of skill levels and legitimacy through the
organisation and people are trusted

• There is hierarchy, but decision making is pushed to the
extremes. For example, if there is debris on the runway,
whoever spots it can halt operations and have it cleared.
• Rank is not treated as an issue here

• NASA mission STS-107
• Decision making was centralised among managers, who ignored
the expert opinions of engineers
• Authority was required for decisions to be made
• Example: When images were requested, the organisation
worried about the rank of the requestor

KEY POINTS
• Organisational approaches are necessary for achieving
dependable systems. Dependability is not a quality of a
technology but a quality of technology-in-practice.
• Technologies are not inherently dependable, but require
people to operate and manage them in ways that are
dependable
• The HRO literature has identified a number of qualities of
highly reliable organisations. These mainly relate to the
operation of technology, although some researchers have
studied software development organisations from this
perspective.

READING
KH Roberts (1990) Some Characteristics of One Type of High Reliability
Organization. Organization Science, 1, 2: 160-76.

Book: Charles Perrow (1984) Normal Accidents: Living with High-Risk
Technologies

Book chapter: Karl Weick (2005) Making Sense of Blurred Images. In W Starbuck
and M Farjoun, Organization at the Limit. Blackwell Publishing
