CHAPTER 8

History As Cause:
Columbia and Challenger
The Board began its investigation with two central ques-
tions about NASA decisions. Why did NASA continue to fly
with known foam debris problems in the years preceding the
Columbia launch, and why did NASA managers conclude
that the foam debris strike 81.9 seconds into Columbiaʼs
flight was not a threat to the safety of the mission, despite
the concerns of their engineers?
8.1 ECHOES OF CHALLENGER
As the investigation progressed, Board member Dr. Sally
Ride, who also served on the Rogers Commission, observed
that there were “echoes” of Challenger in Columbia. Ironi-
cally, the Rogers Commission investigation into Challenger
started with two remarkably similar central questions: Why
did NASA continue to fly with known O-ring erosion prob-
lems in the years before the Challenger launch, and why, on
the eve of the Challenger launch, did NASA managers decide
that launching the mission in such cold temperatures was an
acceptable risk, despite the concerns of their engineers?
The echoes did not stop there. The foam debris hit was not
the single cause of the Columbia accident, just as the failure
of the joint seal that permitted O-ring erosion was not the
single cause of Challenger. Both Columbia and Challenger
were lost also because of the failure of NASAʼs organizational system. Part Two of this report cites failures of the three parts of NASAʼs organizational system. This chapter
shows how previous political, budgetary, and policy deci-
sions by leaders at the White House, Congress, and NASA
(Chapter 5) impacted the Space Shuttle Programʼs structure,
culture, and safety system (Chapter 7), and how these in turn
resulted in flawed decision-making (Chapter 6) for both ac-
cidents. The explanation is about system effects: how actions
taken in one layer of NASAʼs organizational system impact
other layers. History is not just a backdrop or a scene-setter.
History is cause. History set the Columbia and Challenger
accidents in motion. Although Part Two is separated into
chapters and sections to make clear what happened in the
political environment, the organization, and managersʼ and engineersʼ decision-making, the three worked together. Each
is a critical link in the causal chain.
This chapter shows that both accidents were “failures of
foresight” in which history played a prominent role.1 First,
the history of engineering decisions on foam and O-ring
incidents had identical trajectories that “normalized” these
anomalies, so that flying with these flaws became routine
and acceptable. Second, NASA history had an effect. In re-
sponse to White House and Congressional mandates, NASA
leaders took actions that created systemic organizational
flaws at the time of Challenger that were also present for
Columbia. The final section compares the two critical deci-
sion sequences immediately before the loss of both Orbit-
ers – the pre-launch teleconference for Challenger and the
post-launch foam strike discussions for Columbia. It shows
history again at work: how past definitions of risk combined
with systemic problems in the NASA organization caused
both accidents.
Connecting the parts of NASAʼs organizational system and
drawing the parallels with Challenger demonstrate three
things. First, despite all the post-Challenger changes at
NASA and the agencyʼs notable achievements since, the
causes of the institutional failure responsible for Challenger
have not been fixed. Second, the Board strongly believes
that if these persistent, systemic flaws are not resolved,
the scene is set for another accident. Therefore, the recom-
mendations for change are not only for fixing the Shuttleʼs
technical system, but also for fixing each part of the orga-
nizational system that produced Columbiaʼs failure. Third,
the Boardʼs focus on the context in which decision making
occurred does not mean that individuals are not responsible
and accountable. To the contrary, individuals always must
assume responsibility for their actions. What it does mean
is that NASAʼs problems cannot be solved simply by retire-
ments, resignations, or transferring personnel.2
The constraints under which the agency has operated
throughout the Shuttle Program have contributed to both
Shuttle accidents. Although NASA leaders have played
an important role, these constraints were not entirely of
NASAʼs own making. The White House and Congress must
recognize the role of their decisions in this accident and take
responsibility for safety in the future.
8.2 FAILURES OF FORESIGHT: TWO DECISION
HISTORIES AND THE NORMALIZATION OF
DEVIANCE
Foam loss may have occurred on all missions, and left bipod
ramp foam loss occurred on 10 percent of the flights for
which visible evidence exists. The Board had a hard time
understanding how, after the bitter lessons of Challenger,
NASA could have failed to identify a similar trend. Rather
than view the foam decision only in hindsight, the Board
tried to see the foam incidents as NASA engineers and man-
agers saw them as they made their decisions. This section
gives an insider perspective: how NASA defined risk and
how those definitions changed over time for both foam debris
hits and O-ring erosion. In both cases, engineers and manag-
ers conducting risk assessments continually normalized the
technical deviations they found.3 In all official engineering
analyses and launch recommendations prior to the accidents,
evidence that the design was not performing as expected was
reinterpreted as acceptable and non-deviant, which dimin-
ished perceptions of risk throughout the agency.
The initial Shuttle design predicted neither foam debris
problems nor poor sealing action of the Solid Rocket Boost-
er joints. To experience either on a mission was a violation
of design specifications. These anomalies were signals of
potential danger, not something to be tolerated, but in both
cases after the first incident the engineering analysis con-
cluded that the design could tolerate the damage. These en-
gineers decided to implement a temporary fix and/or accept
the risk, and fly. For both O-rings and foam, that first deci-
sion was a turning point. It established a precedent for ac-
cepting, rather than eliminating, these technical deviations.
As a result of this new classification, subsequent incidents of
O-ring erosion or foam debris strikes were not defined as
signals of danger, but as evidence that the design was now
acting as predicted. Engineers and managers incorporated
worsening anomalies into the engineering experience base,
which functioned as an elastic waistband, expanding to hold
larger deviations from the original design. Anomalies that
did not lead to catastrophic failure were treated as a source
of valid engineering data that justified further flights. These
anomalies were translated into a safety margin that was ex-
tremely influential, allowing engineers and managers to add
incrementally to the amount and seriousness of damage that
was acceptable. Both O-ring erosion and foam debris events
were repeatedly “addressed” in NASAʼs Flight Readiness
Reviews but never fully resolved. In both cases, the engi-
neering analysis was incomplete and inadequate. Engineers
understood what was happening, but they never understood
why. NASA continued to implement a series of small correc-
tive actions, living with the problems until it was too late.4
NASA documents show how official classifications of risk
were downgraded over time.5 Program managers designated
both the foam problems and O-ring erosion as “acceptable
risks” in Flight Readiness Reviews. NASA managers also
assigned each bipod foam event In-Flight Anomaly status,
and then removed the designation as corrective actions
were implemented. But when major bipod foam-shedding
occurred on STS-112 in October 2002, Program manage-
ment did not assign an In-Flight Anomaly. Instead, it down-
graded the problem to the lower status of an “action” item.
Before Challenger, the problematic Solid Rocket Booster
joint had been elevated to a Criticality 1 item on NASAʼs
Critical Items List, which ranked Shuttle components by
failure consequences and noted why each was an accept-
able risk. The joint was later demoted to a Criticality 1-R
(redundant), and then in the month before Challengerʼs
launch was “closed out” of the problem-reporting system.
Prior to both accidents, this demotion from high-risk item
to low-risk item was very similar, but with some important
differences. Damaging the Orbiterʼs Thermal Protection
System, especially its fragile tiles, was normalized even be-
fore Shuttle launches began: it was expected due to forces
at launch, orbit, and re-entry.6 So normal was replacement
of Thermal Protection System materials that NASA manag-
ers budgeted for tile cost and turnaround maintenance time
from the start.
It was a small and logical next step for the discovery of foam
debris damage to the tiles to be viewed by NASA as part of an
already existing maintenance problem, an assessment based
on experience, not on a thorough hazard analysis. Foam de-
bris anomalies came to be categorized by the reassuring
term “in-family,” a formal classification indicating that new
occurrences of an anomaly were within the engineering ex-
perience base. “In-family” was a strange term indeed for a
violation of system requirements. Although “in-family” was
a designation introduced post-Challenger to separate prob-
lems by seriousness so that “out-of-family” problems got
more attention, by definition the problems that were shifted
into the lesser “in-family” category got less attention. The
Boardʼs investigation uncovered no paper trail showing es-
calating concern about the foam problem like the one that
Solid Rocket Booster engineers left prior to Challenger.7
So ingrained was the agencyʼs belief that foam debris was
not a threat to flight safety that in press briefings after the
Columbia accident, the Space Shuttle Program Manager
still discounted the foam as a probable cause, saying that
Shuttle managers were “comfortable” with their previous
risk assessments.
From the beginning, NASAʼs belief about both these prob-
lems was affected by the fact that engineers were evaluat-
ing them in a work environment where technical problems
were normal. Although management treated the Shuttle
as operational, it was in reality an experimental vehicle.
Many anomalies were expected on each mission. Against
this backdrop, an anomaly was not in itself a warning sign
of impending catastrophe. Another contributing factor was
that both foam debris strikes and O-ring erosion events were
examined separately, one at a time. Individual incidents
were not read by engineers as strong signals of danger.
What NASA engineers and managers saw were pieces of ill-
structured problems.8 An incident of O-ring erosion or foam
bipod debris would be followed by several launches where
the machine behaved properly, so that signals of danger
were followed by all-clear signals – in other words, NASA
managers and engineers were receiving mixed signals.9
Some signals defined as weak at the time were, in retrospect,
warnings of danger. Foam debris damaged tile was assumed
(erroneously) not to pose a danger to the wing. If a primary
O-ring failed, the secondary was assumed (erroneously)
to provide a backup. Finally, because foam debris strikes
were occurring frequently, like O-ring erosion in the years
before Challenger, foam anomalies became routine signals
– a normal part of Shuttle operations, not signals of danger.
Other anomalies gave signals that were strong, like wiring
malfunctions or the cracked balls in Ball Strut Tie Rod As-
semblies, which had a clear relationship to a “loss of mis-
sion.” On those occasions, NASA stood down from launch,
sometimes for months, while the problems were corrected.
In contrast, foam debris and eroding O-rings were defined
as nagging issues of seemingly little consequence. Their
significance became clear only in retrospect, after lives had
been lost.
History became cause as the repeating pattern of anomalies
was ratified as safe in Flight Readiness Reviews. The official
definitions of risk assigned to each anomaly in Flight Readi-
ness Reviews limited the actions taken and the resources
spent on these problems. Two examples of the road not taken
and the devastating implications for the future occurred close
in time to both accidents. On the October 2002 launch of
STS-112, a large piece of bipod ramp foam hit and dam-
aged the External Tank Attachment ring on the Solid Rocket
Booster skirt, a strong signal of danger 10 years after the last
known bipod ramp foam event. Prior to Challenger, there
was a comparable surprise. After a January 1985 launch, for
which the Shuttle sat on the launch pad for three consecutive
nights of unprecedented cold temperatures, engineers discov-
ered upon the Orbiterʼs return that hot gases had eroded the
primary and reached the secondary O-ring, blackening the
putty in between – an indication that the joint nearly failed.
But accidents are not always preceded by a wake-up call.10
In 1985, engineers realized they needed data on the rela-
tionship between cold temperatures and O-ring erosion.
However, the task of getting better temperature data stayed
on the back burner because of the definition of risk: the
primary erosion was within the experience base; the sec-
ondary O-ring (thought to be redundant) was not damaged
and, significantly, there was a low probability that such cold
Florida temperatures would recur.11 The scorched putty, ini-
tially a strong signal, was redefined after analysis as weak.
On the eve of the Challenger launch, when cold temperature
became a concern, engineers had no test data on the effect
of cold temperatures on O-ring erosion. Before Columbia,
engineers concluded that the damage from the STS-112
foam hit in October 2002 was not a threat to flight safety.
The logic was that, yes, the foam piece was large and there
was damage, but no serious consequences followed. Further,
a hit this size, like cold temperature, was a low-probability
event. After analysis, the biggest foam hit to date was re-
defined as a weak signal. Similar self-defeating actions and
inactions followed. Engineers were again dealing with the
poor quality of tracking camera images of strikes during
ascent. Yet NASA took no steps to improve imagery and
took no immediate action to reduce the risk of bipod ramp
foam shedding and potential damage to the Orbiter before
Columbia. Furthermore, NASA performed no tests on what
would happen if a wing leading edge were struck by bipod
foam, even though foam had repeatedly separated from the
External Tank.
During the Challenger investigation, Rogers Commis-
sion member Dr. Richard Feynman famously compared
launching Shuttles with known problems to playing Russian
roulette.12 But that characterization is only possible in hind-
sight. It is not how NASA personnel perceived the risks as
they were being assessed, one launch at a time. Playing Rus-
sian roulette implies that the pistol-holder realizes that death
might be imminent and still takes the risk. For both foam
debris and O-ring erosion, fixes were in the works at the time
of the accidents, but there was no rush to complete them be-
cause neither problem was defined as a show-stopper. Each
time an incident occurred, the Flight Readiness process
declared it safe to continue flying. Taken one at a time, each
decision seemed correct. The agency allocated attention and
resources to these two problems accordingly. The conse-
quences of living with both of these anomalies were, in its
view, minor. Not all engineers agreed in the months immedi-
ately preceding Challenger, but the dominant view at NASA
– the managerial view – was, as one manager put it, “we
were just eroding rubber O-rings,” which was a low-cost
problem.13 The financial consequences of foam debris also
were relatively low: replacing tiles extended the turnaround
time between launches. In both cases, NASA was comfort-
able with its analyses. Prior to each accident, the agency saw
no greater consequences on the horizon.
8.3 SYSTEM EFFECTS: THE IMPACT OF HISTORY
AND POLITICS ON RISKY WORK
The series of engineering decisions that normalized technical
deviations shows one way that history became cause in both
accidents. But NASAʼs own history encouraged this pattern
of flying with known flaws. Seventeen years separated the
two accidents. NASA Administrators, Congresses, and po-
litical administrations changed. However, NASAʼs political
and budgetary situation remained the same in principle as it
had been since the inception of the Shuttle Program. NASA
remained a politicized and vulnerable agency, dependent on
key political players who accepted NASAʼs ambitious pro-
posals and then imposed strict budget limits. Post-Challeng-
er policy decisions made by the White House, Congress, and
NASA leadership resulted in the agency reproducing many
of the failings identified by the Rogers Commission. Policy
constraints affected the Shuttle Programʼs organization cul-
ture, its structure, and the structure of the safety system. The
three combined to keep NASA on its slippery slope toward
Challenger and Columbia. NASA culture allowed flying
with flaws when problems were defined as normal and rou-
tine; the structure of NASAʼs Shuttle Program blocked the
flow of critical information up the hierarchy, so definitions
of risk continued unaltered. Finally, a perennially weakened
safety system, unable to critically analyze and intervene, had
no choice but to ratify the existing risk assessments on these
two problems. The following comparison shows that these
system effects persisted through time, and affected engineer-
ing decisions in the years leading up to both accidents.
The Board found that dangerous aspects of NASAʼs 1986
culture, identified by the Rogers Commission, remained
unchanged. The Space Shuttle Program had been built on
compromises hammered out by the White House and NASA
headquarters.14 As a result, NASA was transformed from a
research and development agency to more of a business,
with schedules, production pressures, deadlines, and cost
efficiency goals elevated to the level of technical innovation
and safety goals.15 The Rogers Commission dedicated an
entire chapter of its report to production pressures.16 More-
over, the Rogers Commission, as well as the 1990 Augus-
tine Committee and the 1999 Shuttle Independent Assess-
ment Team, criticized NASA for treating the Shuttle as if it
were an operational vehicle. Launching on a tight schedule,
which the agency had pursued as part of its initial bargain
with the White House, was not the way to operate what
was in fact an experimental vehicle. The Board found that
prior to Columbia, a budget-limited Space Shuttle Program,
forced again and again to refashion itself into an efficiency
model because of repeated government cutbacks, was beset
by these same ills. The harmful effects of schedule pressure
identified in previous reports had returned.
Prior to both accidents, NASA was scrambling to keep up.
Not only were schedule pressures impacting the people
who worked most closely with the technology – techni-
cians, mission operators, flight crews, and vehicle proces-
sors – engineering decisions also were affected.17 For foam
debris and O-ring erosion, the definition of risk established
during the Flight Readiness process determined actions
taken and not taken, but the schedule and shoestring bud-
get were equally influential. NASA was cutting corners.
Launches proceeded with incomplete engineering work on
these flaws. Challenger-era engineers were working on a
permanent fix for the booster joints while launches contin-
ued.18 After the major foam bipod hit on STS-112, manage-
ment made the deadline for corrective action on the foam
problem after the next launch, STS-113, and then slipped it
again until after the flight of STS-107. Delays for flowliner
and Ball Strut Tie Rod Assembly problems left no margin in
the schedule between February 2003 and the management-
imposed February 2004 launch date for the International
Space Station Node 2. Available resources – including time
out of the schedule for research and hardware modifications
– went to the problems that were designated as serious –
those most likely to bring down a Shuttle. The NASA
culture encouraged flying with flaws because the schedule
could not be held up for routine problems that were not de-
fined as a threat to mission safety.19
The question the Board had to answer was why, since the foam debris anomalies had gone on for so long, no one had recognized the trend and intervened. The O-ring history prior
to Challenger had followed the same pattern. This question
pointed the Boardʼs attention toward the NASA organiza-
tion structure and the structure of its safety system. Safety-
oriented organizations often build in checks and balances
to identify and monitor signals of potential danger. If these
checks and balances were in place in the Shuttle Program,
they werenʼt working. Again, past policy decisions pro-
duced system effects with implications for both Challenger
and Columbia.
Prior to Challenger, Shuttle Program structure had hindered
information flows, leading the Rogers Commission to con-
clude that critical information about technical problems was
not conveyed effectively through the hierarchy.20 The Space
Shuttle Program had altered its structure by outsourcing
to contractors, which added to communication problems.
The Commission recommended many changes to remedy
these problems, and NASA made many of them. However,
the Board found that those post-Challenger changes were
undone over time by management actions.21 NASA ad-
ministrators, reacting to government pressures, transferred
more functions and responsibilities to the private sector.
The change was cost-efficient, but personnel cuts reduced
oversight of contractors at the same time that the agencyʼs
dependence upon contractor engineering judgment in-
creased. When high-risk technology is the product and lives
are at stake, safety, oversight, and communication flows are
critical. The Board found that the Shuttle Programʼs normal
chain of command and matrix system did not perform a
check-and-balance function on either foam or O-rings.
The Flight Readiness Review process might have reversed
the disastrous trend of normalizing O-ring erosion and foam
debris hits, but it didnʼt. In fact, the Rogers Commission
found that the Flight Readiness process only affirmed the
pre-Challenger engineering risk assessments.22 Equally
troubling, the Board found that the Flight Readiness pro-
cess, which is built on consensus verified by signatures of
all responsible parties, in effect renders no one accountable.
Although the process was altered after Challenger, these
changes did not erase the basic problems that were built into
the structure of the Flight Readiness Review.23 Managers at
the top were dependent on engineers at the bottom for their
engineering analysis and risk assessments. Information was
lost as engineering risk analyses moved through the process.
At succeeding stages, management awareness of anomalies,
and therefore risks, was reduced either because of the need
to be increasingly brief and concise as all the parts of the
system came together, or because of the need to produce
consensus decisions at each level. The Flight Readiness
process was designed to assess hardware and take corrective
actions that would transform known problems into accept-
able flight risks, and that is precisely what it did. The 1986
House Committee on Science and Technology concluded
during its investigation into Challenger that Flight Readi-
ness Reviews had performed exactly as they were designed,
but that they could not be expected to replace engineering
analysis, and therefore they “cannot be expected to prevent
a flight because of a design flaw that Project management
had already determined an acceptable risk.”24 Those words,
true for the history of O-ring erosion, also hold true for the
history of foam debris.
The last line of defense against errors is usually a safety
system. But the previous policy decisions by leaders de-
scribed in Chapter 5 also impacted the safety structure
and contributed to both accidents. Neither in the O-ring
erosion nor the foam debris problems did NASAʼs safety
system attempt to reverse the course of events. In 1986,
the Rogers Commission called it “The Silent Safety Sys-
tem.”25 Pre-Challenger budget shortages resulted in safety
personnel cutbacks. Without clout or independence, the
safety personnel who remained were ineffective. In the
case of Columbia, the Board found the same problems
were reproduced and for an identical reason: when pressed
for cost reduction, NASA attacked its own safety system.
The faulty assumption that supported this strategy prior to
Columbia was that a reduction in safety staff would not
result in a reduction of safety, because contractors would
assume greater safety responsibility. The effectiveness
of those remaining staff safety engineers was blocked by
their dependence on the very Program they were charged
to supervise. Also, the Board found many safety units with
unclear roles and responsibilities that left crucial gaps.
Post-Challenger NASA still had no systematic procedure
for identifying and monitoring trends. The Board was sur-
prised at how long it took NASA to put together trend data
in response to Board requests for information. Problem
reporting and tracking systems were still overloaded or
underused, which undermined their very purpose. Mul-
tiple job titles disguised the true extent of safety personnel
shortages. The Board found cases in which the same person
was occupying more than one safety position – and in one
instance at least three positions – which compromised any
possibility of safety organization independence because the
jobs were established with built-in conflicts of interest.
8.4 ORGANIZATION, CULTURE, AND
UNINTENDED CONSEQUENCES
A number of changes to the Space Shuttle Program structure
made in response to policy decisions had the unintended
effect of perpetuating dangerous aspects of pre-Challenger
culture and continued the pattern of normalizing things that
were not supposed to happen. At the same time that NASA
leaders were emphasizing the importance of safety, their
personnel cutbacks sent other signals. Streamlining and
downsizing, which scarcely go unnoticed by employees,
convey a message that efficiency is an important goal.
The Shuttle/Space Station partnership affected both pro-
grams. Working evenings and weekends just to meet the
International Space Station Node 2 deadline sent a signal
to employees that schedule is important. When paired with
the “faster, better, cheaper” NASA motto of the 1990s and
cuts that dramatically decreased safety personnel, efficiency
becomes a strong signal and safety a weak one. This kind of
doublespeak by top administrators affects peopleʼs decisions
and actions without them even realizing it.26
Changes in Space Shuttle Program structure contributed to
the accident in a second important way. Despite the con-
straints that the agency was under, prior to both accidents
NASA appeared to be immersed in a culture of invincibility,
in stark contradiction to post-accident reality. The Rogers
Commission found a NASA blinded by its “Can-Do” atti-
tude,27 a cultural artifact of the Apollo era that was inappro-
priate in a Space Shuttle Program so strapped by schedule
pressures and shortages that spare parts had to be cannibal-
ized from one vehicle to launch another.28 This can-do atti-
tude bolstered administratorsʼ belief in an achievable launch
rate, the belief that they had an operational system, and an
unwillingness to listen to outside experts. The Aerospace
Safety and Advisory Panel in a 1985 report told NASA
that the vehicle was not operational and NASA should stop
treating it as if it were.29 The Board found that even after the
loss of Challenger, NASA was guilty of treating an experi-
mental vehicle as if it were operational and of not listening
to outside experts. In a repeat of the pre-Challenger warn-
ing, the 1999 Shuttle Independent Assessment Team report
reiterated that “the Shuttle was not an ʻoperationalʼ vehicle
in the usual meaning of the term.”30 Engineers and program
planners were also affected by “Can-Do,” which, when
taken too far, can create a reluctance to say that something
cannot be done.
How could the lessons of Challenger have been forgotten
so quickly? Again, history was a factor. First, if success
is measured by launches and landings,31 the machine ap-
peared to be working successfully prior to both accidents.
Challenger was the 25th launch. Seventeen years and 87
missions passed without major incident. Second, previous
policy decisions again had an impact. NASAʼs Apollo-era
research and development culture and its prized deference
to the technical expertise of its working engineers was
overridden in the Space Shuttle era by “bureaucratic ac-
countability” – an allegiance to hierarchy, procedure, and
following the chain of command.32 Prior to Challenger, the
can-do culture was a result not just of years of apparently
successful launches, but of the cultural belief that the Shut-
tle Programʼs many structures, rigorous procedures, and
detailed system of rules were responsible for those success-
es.33 The Board noted that the pre-Challenger layers of pro-
cesses, boards, and panels that had produced a false sense of
confidence in the system and its level of safety returned in
full force prior to Columbia. NASA made many changes to
the Space Shuttle Program structure after Challenger. The
fact that many changes had been made supported a belief in
the safety of the system, the invincibility of organizational
and technical systems, and ultimately, a sense that the foam
problem was understood.
8.5 HISTORY AS CAUSE: TWO ACCIDENTS
Risk, uncertainty, and history came together when unprec-
edented circumstances arose prior to both accidents. For
Challenger, the weather prediction for launch time the next
day was for cold temperatures that were out of the engineer-
ing experience base. For Columbia, a large foam hit – also
outside the experience base – was discovered after launch.
For the first case, all the discussion was pre-launch; for
the second, it was post-launch. This initial difference de-
termined the shape these two decision sequences took, the
number of people who had information about the problem,
and the locations of the involved parties.
For Challenger, engineers at Morton-Thiokol,34 the Solid
Rocket Motor contractor in Utah, were concerned about
the effect of the unprecedented cold temperatures on the
rubber O-rings.35 Because launch was scheduled for the
next morning, the new condition required a reassessment of
the engineering analysis presented at the Flight Readiness
Review two weeks prior. A teleconference began at 8:45
p.m. Eastern Standard Time (EST) that included 34 people
in three locations: Morton-Thiokol in Utah, Marshall, and
Kennedy. Thiokol engineers were recommending a launch
delay. A reconsideration of a Flight Readiness Review risk
assessment the night before a launch was as unprecedented
as the predicted cold temperatures. With no ground rules or
procedures to guide their discussion, the participants auto-
matically reverted to the centralized, hierarchical, tightly
structured, and procedure-bound model used in Flight Read-
iness Reviews. The entire discussion and decision to launch
began and ended with this group of 34 engineers. The phone
conference linking them together concluded at 11:15 p.m.
EST after a decision to accept the risk and fly.
For Columbia, information about the foam debris hit was
widely distributed the day after launch. Time allowed for
videos of the strike, initial assessments of the size and speed
of the foam, and the approximate location of the impact to
be dispersed throughout the agency. This was the first de-
bris impact of this magnitude. Engineers at the Marshall,
Johnson, Kennedy, and Langley centers showed initiative
and jumped on the problem without direction from above.
Working groups and e-mail groups formed spontaneously.
The size of Johnsonʼs Debris Assessment Team alone neared
and in some instances exceeded the total number of partici-
pants in the 1986 Challenger teleconference. Rather than a
tightly constructed exchange of information completed in a
few hours, time allowed for the development of ideas and
free-wheeling discussion among the engineering ranks. The
early post-launch discussion among engineers and all later
decision-making at management levels were decentralized,
loosely organized, and with little form. While the spontane-
ous and decentralized exchanging of information was evi-
dence that NASAʼs original technical culture was alive and
well, the diffuse form and lack of structure in the rest of the
proceedings would have several negative consequences.
In both situations, all new information was weighed and
interpreted against past experience. Formal categories and
cultural beliefs provide a consistent frame of reference in
which people view and interpret information and experi-
ences.36 Pre-existing definitions of risk shaped the actions
taken and not taken. Worried engineers in 1986 and again
in 2003 found it impossible to reverse the Flight Readiness
Review risk assessments that foam and O-rings did not pose
safety-of-flight concerns. These engineers could not prove
that foam strikes and cold temperatures were unsafe, even
though the previous analyses that declared them safe had
been incomplete and were based on insufficient data and
testing. Engineersʼ failed attempts were not just a matter
of psychological frames and interpretations. The obstacles
these engineers faced were political and organizational.
They were rooted in NASA history and the decisions of
leaders that had altered NASA culture, structure, and the
structure of the safety system and affected the social con-
text of decision-making for both accidents. In the following
comparison of these critical decision scenarios for Columbia
and Challenger, the systemic problems in the NASA orga-
nization are in italics, with the system effects on decision-
making following.
NASA had conflicting goals of cost, schedule, and safety.
Safety lost out as the mandates of an “operational system”
increased the schedule pressure. Scarce resources went to
problems that were defined as more serious, rather than to
foam strikes or O-ring erosion.
In both situations, upper-level managers and engineering
teams working the O-ring and foam strike problems held
opposing definitions of risk. This was demonstrated imme-
diately, as engineers reacted with urgency to the immediate
safety implications: Thiokol engineers scrambled to put
together an engineering assessment for the teleconference,
Langley Research Center engineers initiated simulations
of landings that were run after hours at Ames Research
Center, and Boeing analysts worked through the weekend
on the debris impact analysis. But key managers were re-
sponding to additional demands of cost and schedule, which
competed with their safety concerns. NASAʼs conflicting
goals put engineers at a disadvantage before these new situ-
ations even arose. In neither case did they have good data
as a basis for decision-making. Because both problems had
been previously normalized, resources sufficient for testing
or hardware were not dedicated. The Space Shuttle Program
had not produced good data on the correlation between cold
temperature and O-ring resilience or good data on the poten-
tial effect of bipod ramp foam debris hits.37
Cultural beliefs about the low risk O-rings and foam debris
posed, backed by years of Flight Readiness Review deci-
sions and successful missions, provided a frame of refer-
ence against which the engineering analyses were judged.
When confronted with the engineering risk assessments, top
Shuttle Program managers held to the previous Flight Readi-
ness Review assessments. In the Challenger teleconference,
where engineers were recommending that NASA delay the
launch, the Marshall Solid Rocket Booster Project manager,
Lawrence Mulloy, repeatedly challenged the contractorʼs
risk assessment and restated Thiokolʼs engineering ratio-
nale for previous flights.38 STS-107 Mission Management
Team Chair Linda Ham made many statements in meetings
reiterating her understanding that foam was a maintenance
problem and a turnaround issue, not a safety-of-flight issue.
The effects of working as a manager in a culture with a cost/
efficiency/safety conflict showed in managerial responses. In
both cases, managersʼ techniques focused on the information
that tended to support the expected or desired result at that
time. In both cases, believing the safety of the mission was
not at risk, managers drew conclusions that minimized the
risk of delay.39 At one point, Marshallʼs Mulloy, believing
in the previous Flight Readiness Review assessments, un-
convinced by the engineering analysis, and concerned about
the schedule implications of the 53-degree temperature limit
on launch the engineers proposed, said, “My God, Thiokol,
when do you want me to launch, next April?”40 Reflecting the
overall goal of keeping to the Node 2 launch schedule, Ham s̓
priority was to avoid the delay of STS–114, the next mis-
sion after STS-107. Ham was slated as Manager of Launch
Integration for STS-114 – a dual role promoting a conflict of
interest and a single-point failure, a situation that should be
avoided in all organizational as well as technical systems.
NASAʼs culture of bureaucratic accountability emphasized
chain of command, procedure, following the rules, and go-
ing by the book. While rules and procedures were essential
for coordination, they had an unintended but negative effect.
Allegiance to hierarchy and procedure had replaced defer-
ence to NASA engineersʼ technical expertise.
In both cases, engineers initially presented concerns as well
as possible solutions – a request for images, a recommenda-
tion to place temperature constraints on launch. Manage-
ment did not listen to what their engineers were telling them.
Instead, rules and procedures took priority. For Columbia,
program managers turned off the Kennedy engineersʼ initial
request for Department of Defense imagery, with apologies
to Defense Department representatives for not having fol-
lowed “proper channels.” In addition, NASA administrators
asked for and promised corrective action to prevent such
a violation of protocol from recurring. Debris Assessment
Team analysts at Johnson were asked by managers to dem-
onstrate a “mandatory need” for their imagery request, but
were not told how to do that. Both Challenger and Columbia
engineering teams were held to the usual quantitative stan-
dard of proof. But it was a reverse of the usual circumstance:
instead of having to prove it was safe to fly, they were asked
to prove that it was unsafe to fly.
In the Challenger teleconference, a key engineering chart
presented a qualitative argument about the relationship be-
tween cold temperatures and O-ring erosion that engineers
were asked to prove. Thiokolʼs Roger Boisjoly said, “I had
no data to quantify it. But I did say I knew it was away from
goodness in the current data base.”41 Similarly, the Debris
Assessment Team was asked to prove that the foam hit was
a threat to flight safety, a determination that only the imag-
ery they were requesting could help them make. Ignored by
management was the qualitative data that the engineering
teams did have: both instances were outside the experience
base. In stark contrast to the requirement that engineers ad-
here to protocol and hierarchy was managementʼs failure to
apply this criterion to their own activities. The Mission Man-
agement Team did not meet on a regular schedule during the
mission, proceeded in a loose format that allowed informal
influence and status differences to shape their decisions, and
allowed unchallenged opinions and assumptions to prevail,
all the while holding the engineers who were making risk
assessments to higher standards. In highly uncertain circum-
stances, when lives were immediately at risk, management
failed to defer to its engineers and failed to recognize that
different data standards – qualitative, subjective, and intui-
tive – and different processes – democratic rather than proto-
col and chain of command – were more appropriate.
The organizational structure and hierarchy blocked effective
communication of technical problems. Signals were over-
looked, people were silenced, and useful information and
dissenting views on technical issues did not surface at higher
levels. What was communicated to parts of the organization
was that O-ring erosion and foam debris were not problems.
Structure and hierarchy represent power and status. For both
Challenger and Columbia, employeesʼ positions in the orga-
nization determined the weight given to their information,
by their own judgment and in the eyes of others. As a result,
many signals of danger were missed. Relevant information
that could have altered the course of events was available
but was not presented.
Early in the Challenger teleconference, some engineers who
had important information did not speak up. They did not
define themselves as qualified because of their position: they
were not in an appropriate specialization, had not recently
worked the O-ring problem, or did not have access to the
“good data” that they assumed others more involved in key
discussions would have.42 Geographic locations also re-
sulted in missing signals. At one point, in light of Marshallʼs
objections, Thiokol managers in Utah requested an “off-line
caucus” to discuss their data. No consensus was reached,
so a “management risk decision” was made. Managers
voted and engineers did not. Thiokol managers came back
on line, saying they had reversed their earlier NO-GO rec-
ommendation, decided to accept risk, and would send new
engineering charts to back their reversal. When a Marshall
administrator asked, “Does anyone have anything to add to
this?,” no one spoke. Engineers at Thiokol who still objected
to the decision later testified that they were intimidated by
management authority, were accustomed to turning their
analysis over to managers and letting them decide, and did
not have the quantitative data that would empower them to
object further.43
In the more decentralized decision process prior to
Columbiaʼs re-entry, structure and hierarchy again were re-
sponsible for an absence of signals. The initial request for
imagery came from the “low status” Kennedy Space Center,
bypassed the Mission Management Team, and went directly
to the Department of Defense separate from the all-power-
ful Shuttle Program. By using the Engineering Directorate
avenue to request imagery, the Debris Assessment Team was
working at the margins of the hierarchy. But some signals
were missing even when engineers traversed the appropriate
channels. The Mission Management Team Chairʼs position in
the hierarchy governed what information she would or would
not receive. Information was lost as it traveled up the hierar-
chy. A demoralized Debris Assessment Team did not include
a slide about the need for better imagery in their presentation
to the Mission Evaluation Room. Their presentation included
the Crater analysis, which they reported as incomplete and
uncertain. However, the Mission Evaluation Room manager
perceived the Boeing analysis as rigorous and quantitative.
The choice of headings, arrangement of information, and size
of bullets on the key chart served to highlight what manage-
ment already believed. The uncertainties and assumptions
that signaled danger dropped out of the information chain
when the Mission Evaluation Room manager condensed the
Debris Assessment Teamʼs formal presentation to an infor-
mal verbal brief at the Mission Management Team meeting.
As what the Board calls an “informal chain of command”
began to shape STS-107ʼs outcome, location in the struc-
ture empowered some to speak and silenced others. For
example, a Thermal Protection System tile expert, who was
a member of the Debris Assessment Team but had an office
in the more prestigious Shuttle Program, used his personal
network to shape the Mission Management Team view and
snuff out dissent. The informal hierarchy among and within
Centers was also influential. Early identifications of prob-
lems by Marshall and Kennedy may have contributed to the
Johnson-based Mission Management Teamʼs indifference to
concerns about the foam strike. The engineers and managers
circulating e-mails at Langley were peripheral to the Shuttle
Program, not structurally connected to the proceedings, and
therefore of lower status. When asked in a post-accident
press conference why they didnʼt voice their concerns to
Shuttle Program management, the Langley engineers said
that people “need to stick to their expertise.”44 Status mat-
tered. In its absence, numbers were the great equalizer.
One striking exception: the Debris Assessment Team tile
expert was so influential that his word was taken as gospel,
though he lacked the requisite expertise, data, or analysis
to evaluate damage to RCC. For those with lesser standing,
the requirement for data was stringent and inhibiting, which
resulted in information that warned of danger not being
passed up the chain. As in the teleconference, Debris As-
sessment Team engineers did not speak up when the Mission
Management Team Chair asked if anyone else had anything
to say. Not only did they not have the numbers, they also
were intimidated by the Mission Management Team Chairʼs
position in the hierarchy and the conclusions she had already
made. Debris Assessment Team members signed off on the
Crater analysis, even though they had trouble understanding
it. They still wanted images of Columbiaʼs left wing.
In neither impending crisis did management recognize how
structure and hierarchy can silence employees and follow
through by polling participants, soliciting dissenting opin-
ions, or bringing in outsiders who might have a different
perspective or useful information. In perhaps the ultimate
example of engineering concerns not making their way
upstream, Challenger astronauts were told that the cold tem-
perature was not a problem, and Columbia astronauts were
told that the foam strike was not a problem.
NASA structure changed as roles and responsibilities were
transferred to contractors, which increased the dependence
on the private sector for safety functions and risk assess-
ment while simultaneously reducing the in-house capability
to spot safety issues.
A critical turning point in both decisions hung on the discus-
sion of contractor risk assessments. Although both Thiokol
and Boeing engineering assessments were replete with
uncertainties, NASA ultimately accepted each. Thiokolʼs
initial recommendation against the launch of Challenger
was at first criticized by Marshall as flawed and unaccept-
able. Thiokol was recommending an unheard-of delay on
the eve of a launch, with schedule ramifications and NASA-
contractor relationship repercussions. In the Thiokol off-line
caucus, a senior vice president who seldom participated in
these engineering discussions championed the Marshall
engineering rationale for flight. When he told the managers
present to “Take off your engineering hat and put on your
management hat,” they reversed the position their own
engineers had taken.45 Marshall engineers then accepted
this assessment, deferring to the expertise of the contractor.
NASA was dependent on Thiokol for the risk assessment,
but the decision process was affected by the contractorʼs
dependence on NASA. Not willing to be responsible for a
delay, and swayed by the strength of Marshallʼs argument,
the contractor did not act in the best interests of safety.
Boeingʼs Crater analysis was performed in the context of
the Debris Assessment Team, which was a collaborative
effort that included Johnson, United Space Alliance, and
Boeing. In this case, the decision process was also affected
by NASAʼs dependence on the contractor. Unfamiliar with
Crater, NASA engineers and managers had to rely on Boeing
for interpretation and analysis, and did not have the train-
ing necessary to evaluate the results. They accepted Boeing
engineersʼ use of Crater to model a debris impact 400 times
outside validated limits.
NASAʼs safety system lacked the resources, independence,
personnel, and authority to successfully apply alternate per-
spectives to developing problems. Overlapping roles and re-
sponsibilities across multiple safety offices also undermined
the possibility of a reliable system of checks and balances.
NASAʼs "Silent Safety System" did nothing to alter the deci-
sion-making that immediately preceded both accidents. No
safety representatives were present during the Challenger
teleconference – no one even thought to call them.46 In the
case of Columbia, safety representatives were present at
Mission Evaluation Room, Mission Management Team, and
Debris Assessment Team meetings. However, rather than
critically question or actively participate in the analysis, the
safety representatives simply listened and concurred.
8.6 CHANGING NASAʼS ORGANIZATIONAL
SYSTEM
The echoes of Challenger in Columbia identified in this
chapter have serious implications. These repeating patterns
mean that flawed practices embedded in NASAʼs organiza-
tional system continued for 20 years and made substantial
contributions to both accidents. The Columbia Accident
Investigation Board noted the same problems as the Rog-
ers Commission. An organizational system failure calls for
corrective measures that address all relevant levels of the
organization, but the Boardʼs investigation shows that for all
its cutting-edge technologies, “diving-catch” rescues, and
imaginative plans for the technology and the future of space
exploration, NASA has shown very little understanding of
the inner workings of its own organization.
NASA managers believed that the agency had a strong
safety culture, but the Board found that the agency had
the same conflicting goals that it did before Challenger,
when schedule concerns, production pressure, cost-cut-
ting and a drive for ever-greater efficiency – all the signs
of an "operational" enterprise – had eroded NASAʼs abil-
ity to assure mission safety. The belief in a safety culture
has even less credibility in light of repeated cuts of safety
personnel and budgets – also conditions that existed before
Challenger. NASA managers stated confidently that every-
one was encouraged to speak up about safety issues and that
the agency was responsive to those concerns, but the Board
found evidence to the contrary in the responses to the Debris
Assessment Teamʼs request for imagery, to the initiation of
the imagery request from Kennedy Space Center, and to the
“we were just ʻwhat-iffingʼ” e-mail concerns that did not
reach the Mission Management Team. NASAʼs bureaucratic
structure kept important information from reaching engi-
neers and managers alike. The same NASA whose engineers
showed initiative and a solid working knowledge of how
to get things done fast had a managerial culture with an al-
legiance to bureaucracy and cost-efficiency that squelched
the engineersʼ efforts. When it came to managersʼ own ac-
tions, however, a different set of rules prevailed. The Board
found that Mission Management Team decision-making
operated outside the rules even as it held its engineers to
a stifling protocol. Management was not able to recognize
that in unprecedented conditions, when lives are on the line,
flexibility and democratic process should take priority over
bureaucratic response.47
During the Columbia investigation, the Board consistently
searched for causal principles that would explain both the
technical and organizational system failures. These prin-
ciples were needed to explain Columbia and its echoes of
Challenger. They were also necessary to provide guidance
for NASA. The Boardʼs analysis of organizational causes in
Chapters 5, 6, and 7 supports the following principles that
should govern the changes in the agencyʼs organizational
system. The Boardʼs specific recommendations, based on
these principles, are presented in Part Three.
Leaders create culture. It is their responsibility to change
it. Top administrators must take responsibility for risk,
failure, and safety by remaining alert to the effects their
decisions have on the system. Leaders are responsible for
establishing the conditions that lead to their subordinatesʼ
successes or failures. The past decisions of national lead-
ers – the White House, Congress, and NASA Headquarters
– set the Columbia accident in motion by creating resource
and schedule strains that compromised the principles of a
high-risk technology organization. The measure of NASAʼs
success became how much costs were reduced and how ef-
ficiently the schedule was met. But the Space Shuttle is not
now, nor has it ever been, an operational vehicle. We cannot
explore space on a fixed-cost basis. Nevertheless, due to
International Space Station needs and scientific experiments
that require particular timing and orbits, the Space Shuttle
Program seems likely to continue to be schedule-driven.
National leadership needs to recognize that NASA must fly
only when it is ready. As the White House, Congress, and
NASA Headquarters plan the future of human space flight,
the goals and the resources required to achieve them safely
must be aligned.
Changes in organizational structure should be made only
with careful consideration of their effect on the system and
their possible unintended consequences. Changes that make
the organization more complex may create new ways that it
can fail.48 When changes are put in place, the risk of error
initially increases, as old ways of doing things compete with
new. Institutional memory is lost as personnel and records
are moved and replaced. Changing the structure of organi-
zations is complicated by external political and budgetary
constraints, the inability of leaders to conceive of the full
ramifications of their actions, the vested interests of insiders,
and the failure to learn from the past.49
Nonetheless, changes must be made. The Shuttle Programʼs
structure is a source of problems, not just because of the
way it impedes the flow of information, but because it
has had effects on the culture that contradict safety goals.
NASAʼs blind spot is its belief that it has a strong safety cul-
ture. Program history shows that the loss of a truly indepen-
dent, robust capability to protect the systemʼs fundamental
requirements and specifications inevitably compromised
those requirements, and therefore increased risk. The
Shuttle Programʼs structure created power distributions that
need new structuring, rules, and management training to
restore deference to technical experts, empower engineers
to get resources they need, and allow safety concerns to be
freely aired.
Strategies must increase the clarity, strength, and presence
of signals that challenge assumptions about risk. Twice in
NASA history, the agency embarked on a slippery slope that
resulted in catastrophe. Each decision, taken by itself, seemed
correct, routine, and indeed, insignificant and unremarkable.
Yet in retrospect, the cumulative effect was stunning. In
both pre-accident periods, events unfolded over a long time
and in small increments rather than in sudden and dramatic
occurrences. NASAʼs challenge is to design systems that
maximize the clarity of signals, amplify weak signals so they
can be tracked, and account for missing signals. For both ac-
cidents there were moments when management definitions
of risk might have been reversed were it not for the many
missing signals – an absence of trend analysis, imagery data
not obtained, concerns not voiced, information overlooked
or dropped from briefings. A safety team must have equal
and independent representation so that managers are not
again lulled into complacency by shifting definitions of risk.
It is obvious but worth acknowledging that people who are
marginal and powerless in organizations may have useful
information or opinions that they donʼt express. Even when
these people are encouraged to speak, they find it intimidat-
ing to contradict a leaderʼs strategy or a group consensus.
Extra effort must be made to contribute all relevant informa-
tion to discussions of risk. These strategies are important for
all safety aspects, but especially necessary for ill-structured
problems like O-rings and foam debris. Because ill-structured
problems are less visible and therefore invite the normaliza-
tion of deviance, they may be the most risky of all.
Challenger launches on the ill-fated STS-33/51-L mission on January 28, 1986. The Orbiter would be destroyed 73 seconds later.
ENDNOTES FOR CHAPTER 8
The citations that contain a reference to “CAIB document” with CAB or CTF followed by seven to eleven digits, such as CAB001-0010, refer to a document in the Columbia Accident Investigation Board database maintained by the Department of Justice and archived at the National Archives.
1 Turner studied 85 different accidents and disasters, noting a common pattern: each had a long incubation period in which hazards and warning signs prior to the accident were either ignored or misinterpreted. He called these “failures of foresight.” Barry Turner, Man-made Disasters (London: Wykeham, 1978); Barry Turner and Nick Pidgeon, Man-made Disasters, 2nd ed. (Oxford: Butterworth-Heinemann, 1997).
2 Changing personnel is a typical response after an organization has some kind of harmful outcome. It has great symbolic value. A change in personnel points to individuals as the cause, and removing them gives the false impression that the problems have been solved, leaving unresolved organizational system problems. See Scott Sagan, The Limits of Safety (Princeton: Princeton University Press, 1993).
3 Diane Vaughan, The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA (Chicago: University of Chicago Press, 1996).
4 William H. Starbuck and Frances J. Milliken, “Challenger: Fine-tuning the Odds until Something Breaks,” Journal of Management Studies 23 (1988), pp. 319-40.
5 Report of the Presidential Commission on the Space Shuttle Challenger Accident (Washington: Government Printing Office, 1986), Vol. II, Appendix H.
6 Alex Roland, “The Shuttle: Triumph or Turkey?” Discover, November 1985, pp. 29-49.
7 Report of the Presidential Commission, Vol. I, Ch. 6.
8 Turner, Man-made Disasters.
9 Vaughan, The Challenger Launch Decision, pp. 243-49, 253-57, 262-64, 350-52, 356-72.
10 Turner, Man-made Disasters.
11 U.S. Congress, House, Investigation of the Challenger Accident (Washington: Government Printing Office, 1986), p. 149.
12 Report of the Presidential Commission, Vol. I, p. 148; Vol. IV, p. 1446.
13 Vaughan, The Challenger Launch Decision, p. 235.
14 Report of the Presidential Commission, Vol. I, pp. 1-3.
15 Howard E. McCurdy, “The Decay of NASAʼs Technical Culture,” Space Policy (November 1989), pp. 301-10.
16 Report of the Presidential Commission, Vol. I, pp. 164-177.
17 Report of the Presidential Commission, Vol. I, Ch. VII and VIII.
18 Report of the Presidential Commission, Vol. I, p. 140.
19 For background on culture in general and engineering culture in particular, see Peter Whalley and Stephen R. Barley, “Technical Work in the Division of Labor: Stalking the Wily Anomaly,” in Stephen R. Barley and Julian Orr (eds.), Between Craft and Science (Ithaca: Cornell University Press, 1997), pp. 23-53; Gideon Kunda, Engineering Culture: Control and Commitment in a High-Tech Corporation (Philadelphia: Temple University Press, 1992); Peter Meiksins and James M. Watson, “Professional Autonomy and Organizational Constraint: The Case of Engineers,” Sociological Quarterly 30 (1989), pp. 561-85; Henry Petroski, To Engineer is Human: The Role of Failure in Successful Design (New York: St. Martinʼs, 1985); Edgar Schein, Organization Culture and Leadership (San Francisco: Jossey-Bass, 1985); John Van Maanen and Stephen R. Barley, “Cultural Organization,” in Peter J. Frost, Larry F. Moore, Meryl Ries Louise, Craig C. Lundberg, and Joanne Martin (eds.), Organization Culture (Beverly Hills: Sage, 1985).
20 Report of the Presidential Commission, Vol. I, pp. 82-111.
21 Harry McDonald, Report of the Shuttle Independent Assessment Team.
22 Report of the Presidential Commission, Vol. I, pp. 145-148.
23 Vaughan, The Challenger Launch Decision, pp. 257-264.
24 U.S. Congress, House, Investigation of the Challenger Accident (Washington: Government Printing Office, 1986), pp. 70-71.
25 Report of the Presidential Commission, Vol. I, Ch. VII.
26 Mary Douglas, How Institutions Think (London: Routledge and Kegan Paul, 1987); Michael Burawoy, Manufacturing Consent (Chicago: University of Chicago Press, 1979).
27 Report of the Presidential Commission, Vol. I, pp. 171-173.
28 Report of the Presidential Commission, Vol. I, pp. 173-174.
29 National Aeronautics and Space Administration, Aerospace Safety Advisory Panel, “National Aeronautics and Space Administration Annual Report: Covering Calendar Year 1984” (Washington: Government Printing Office, 1985).
30 Harry McDonald, Report of the Shuttle Independent Assessment Team.
31 Richard P. Feynman, “Personal Observations on Reliability of the Shuttle,” Report of the Presidential Commission, Appendix F:1.
32 Howard E. McCurdy, “The Decay of NASAʼs Technical Culture,” Space Policy (November 1989), pp. 301-10; see also Howard E. McCurdy, Inside NASA (Baltimore: Johns Hopkins University Press, 1993).
33 Diane Vaughan, “The Trickle-Down Effect: Policy Decisions, Risky Work, and the Challenger Tragedy,” California Management Review 39, 2, Winter 1997.
34 Morton subsequently sold its propulsion division to Alcoa, and the company is now known as ATK Thiokol Propulsion.
35 Report of the Presidential Commission, pp. 82-118.
36 For discussions of how frames and cultural beliefs shape perceptions, see, e.g., Lee Clarke, “The Disqualification Heuristic: When Do Organizations Misperceive Risk?” in Social Problems and Public Policy, vol. 5, ed. R. Ted Youn and William F. Freudenberg (Greenwich, CT: JAI, 1993); William Starbuck and Frances Milliken, “Executive Perceptual Filters – What They Notice and How They Make Sense,” in The Executive Effect, Donald C. Hambrick, ed. (Greenwich, CT: JAI Press, 1988); Daniel Kahneman, Paul Slovic, and Amos Tversky, eds., Judgment Under Uncertainty: Heuristics and Biases (Cambridge: Cambridge University Press, 1982); Carol A. Heimer, “Social Structure, Psychology, and the Estimation of Risk,” Annual Review of Sociology 14 (1988): 491-519; Stephen J. Pfohl, Predicting Dangerousness (Lexington, MA: Lexington Books, 1978).
37 Report of the Presidential Commission, Vol. IV: 791; Vaughan, The Challenger Launch Decision, p. 178.
38 Report of the Presidential Commission, Vol. I, pp. 91-92; Vol. IV, p. 612.
39 Report of the Presidential Commission, Vol. I, pp. 164-177; Chapter 6, this Report.
40 Report of the Presidential Commission, Vol. I, p. 90.
41 Report of the Presidential Commission, Vol. IV, p. 791. For details of the teleconference and engineering analysis, see Roger M. Boisjoly, “Ethical Decisions: Morton Thiokol and the Space Shuttle Challenger Disaster,” American Society of Mechanical Engineers (Boston: 1987), pp. 1-13.
42 Vaughan, The Challenger Launch Decision, pp. 358-361.
43 Report of the Presidential Commission, Vol. I, pp. 88-89, 93.
44 Edward Wong, “E-Mail Writer Says He was Hypothesizing, Not Predicting Disaster,” New York Times, 11 March 2003, Sec. A-20, Col. 1 (excerpts from press conference, Col. 3).
45 Report of the Presidential Commission, Vol. I, pp. 92-95.
46 Report of the Presidential Commission, Vol. I, p. 152.
47 Weick argues that in a risky situation, people need to learn how to “drop their tools”: learn to recognize when they are in unprecedented situations in which following the rules can be disastrous. See Karl E. Weick, “The Collapse of Sensemaking in Organizations: The Mann Gulch Disaster,” Administrative Science Quarterly 38, 1993, pp. 628-652.
48 Lee Clarke, Mission Improbable: Using Fantasy Documents to Tame Disaster (Chicago: University of Chicago Press, 1999); Charles Perrow, Normal Accidents, op. cit.; Scott Sagan, The Limits of Safety, op. cit.; Diane Vaughan, “The Dark Side of Organizations,” Annual Review of Sociology, Vol. 25, 1999, pp. 271-305.
49 Typically, after a public failure, the responsible organization makes safety the priority. They sink resources into discovering what went wrong, and lessons learned are on everyoneʼs minds. A boost in resources goes to safety to build on those lessons in order to prevent another failure. But concentrating on rebuilding, repair, and safety takes energy and resources from other goals. As the crisis ebbs and normal functioning returns, institutional memory grows short. The tendency is then to backslide, as external pressures force a return to operating goals. William R. Freudenberg, “Nothing Recedes Like Success? Risk Analysis and the Organizational Amplification of Risks,” Risk: Issues in Health and Safety 3, 1: 1992, pp. 1-35; Richard H. Hall, Organizations: Structures, Processes, and Outcomes (Prentice-Hall, 1998), pp. 184-204; James G. March, Lee S. Sproull, and Michal Tamuz, “Learning from Samples of One or Fewer,” Organization Science, 2, 1: February 1991, pp. 1-13.
Understanding the Challenger Disaster: Organizational Structure and the Design of Reliable Systems
Author(s): C. F. Larry Heimann
Source: The American Political Science Review, Vol. 87, No. 2 (Jun., 1993), pp. 421-435
Published by: American Political Science Association
Stable URL: http://www.jstor.org/stable/2939051
UNDERSTANDING THE CHALLENGER DISASTER: ORGANIZATIONAL STRUCTURE AND THE DESIGN OF RELIABLE SYSTEMS
C. F. LARRY HEIMANN, Michigan State University
The destruction of the space shuttle Challenger was a tremendous blow to American space policy. To what extent was this loss the result of organizational factors at the National Aeronautics and Space Administration? To discuss this question analytically, we need a theory of organizational reliability and agency behavior. Martin Landau's work on redundancy and administrative performance provides a good starting point for such an effort. Expanding on Landau's work, I formulate a more comprehensive theory of organizational reliability that incorporates both type I and type II errors. These principles are then applied in a study of NASA and its administrative behavior before and after the Challenger accident.
On 28 January 1986, the entire nation focused
on a single event. Seventy-three seconds after
lift-off, the space shuttle Challenger was de-
stroyed in a powerful explosion fifty thousand feet
above the Kennedy Space Center. The losses result-
ing from this catastrophe were quite high. Seven
astronauts, including Teacher-in-Space Christa
McAuliffe, were killed as the shuttle broke apart and
fell into the sea. The shuttle itself had to be replaced
at a cost of over two billion dollars. The launch of
many important commercial and military satellites
had to be delayed as American space policy ground to
a complete halt. The accident had a profound impact
on the National Aeronautics and Space Administra-
tion (NASA), as well. The agency's credibility and its
reputation for flawless execution of complex techno-
logical tasks were lost, along with the Challenger. To
this day, the legacy of the Challenger haunts the
decisions of both the agency and its political superi-
ors in Congress and the White House.
An examination of the shuttle remnants and other
launch data revealed the technical cause of the acci-
dent. The shuttle was destroyed when an O-ring seal
on the right solid rocket motor failed, allowing the
escaping hot gases to burn through and ignite the
main fuel tank of liquid hydrogen and liquid oxygen.
Although identifying the technical cause of this dis-
aster is important, it represents only one aspect of the
problem. Perrow (1984) has argued that organiza-
tional and technological failures have become so
intimately linked that to fully understand the cause of
most major accidents, we must analyze both the
administrative and technical aspects of the situation.
While there have been some administrative critiques
of NASA in the wake of this disaster, almost all have
centered on issues of bureaucratic culture, such as the
agency's propensity to ignore key evidence and its
myopic view of its mission. Surprisingly little has
been done on a systematic analysis of the NASA
organization structure and how it may or may not
have contributed to the loss of the Challenger.
One reason for this void is that the analytical tool
necessary-a comprehensive theory of organizational
reliability-is still lacking. Those involved in the
debate over structural design have adopted two op-
posing stances. The traditional public administration
focus on organizational design has involved the pur-
suit of efficiency in the sense of minimizing costs for
a given level of output. Implicit in this traditional
analysis is that the reliable performance of the orga-
nization was constant. The critical question, there-
fore, has usually been how one might achieve the same
level of services at a lower cost. As a result, the policy
recommendations from this traditional line of think-
ing have been to streamline administrative systems
and reduce organizational redundancy as much as
possible.
The work of Martin Landau (1969) on redundancy
in organizations was a particularly important contri-
bution to this debate, inasmuch as he recognized that
administrative reliability is dependent on structural
factors. In breaking with conventional wisdom,
Landau's landmark 1969 essay, "Redundancy, Ration-
ality, and the Problem of Duplication and Overlap,"
asserted that the critical question was not how to cut
the costs of administrative performance but, rather,
how to ensure the organization's effectiveness.
Landau began by noting that, "no matter how much
a part is perfected, there is always the chance that it
will fail" (p. 350). As he observed, a streamlined
system requires only one part to fail for the entire
system to fail. Drawing on concepts from engineering
reliability theory, Landau argued that redundancy
built into the system can make an organization more
reliable than any of its parts. Therefore, Landau
concluded, administrative redundancy and duplica-
tion can be an important part of effective govern-
ment.
Some work has been done to extend Landau's
initial insights regarding organizational redundancy
and administrative reliability, primarily by students
and colleagues of Landau. Jon Bendor's (1985) Parallel
Systems, originally written as a dissertation under
Landau, was the most comprehensive effort to follow
up on these ideas. Bendor formalized many of Land-
au's concepts and was the first to conduct empirical
testing of these propositions, focusing on transporta-
tion planning and operations in three American met-
ropolitan areas. Donald Chisholm's (1989) Coordina-
tion without Hierarchy, also written originally as a
dissertation under Landau, discussed the notion of
reliability and redundancy as it existed in informal
organizational structures. Others have extended
some of Landau's concepts of reliability to the oper-
ations of air traffic controllers and aircraft carrier
battle groups (LaPorte and Consolini 1991; Rochlin,
LaPorte, and Roberts 1987).
The policy prescriptions that follow from these two
positions are thus quite different. The tradition-
alist argument tells us to reduce redundancy and
streamline administrative systems whenever possi-
ble, whereas Landau advocates increasing redun-
dancy by adding parallel units to the system. Most
interesting, however, is the fact that an analysis of
the organizational changes at NASA reveals that
neither the traditionalists nor Landau are fully cor-
rect. Certain parts of the space agency followed a
traditionalist policy of streamlining, while other seg-
ments of NASA adopted the Landau perspective by
generating new parallel linkages. Yet (as I shall
show), both these decisions were wrong and contrib-
uted to the untimely destruction of the Challenger.
How can this be? The reason is a fundamental
limitation in the current framework for discussing
organizational reliability. Most work in this area has
implicitly assumed that there was only one kind of
institutional failure and thus only two possible states
for organizational performance: the agency either
adopted the proper policy or not. But it should be
recognized that the latter possibility conceals two
different problems: the agency can simply fail to act,
or it can adopt an improper policy. Considering the
impact of both forms of error is important, because it
leads us to a different set of policy prescriptions. An
organizational structure that is effective at preventing
one type of error may not be equally effective at
preventing the other type of error.
To date, Bendor alone has formally recognized this
distinction; but his analysis was limited (1985, 49-52).
I shall extend our understanding of organizational
reliability by exploring how each kind of failure is
affected by different kinds of administrative struc-
tures. I shall lay a foundation for an analysis of
multiple forms of administrative failure by describing
the principles from engineering reliability theory,
then show how different kinds of structures leave the
agency vulnerable to different kinds of errors. Apply-
ing these arguments to the structure and decision-
making process of NASA in the pre- and post-
Challenger eras, I shall argue that the disaster had its
roots in the structural changes adopted by the space
agency in the 1970s and 1980s.
Although occasionally cited by students of public
administration, Landau's work on institutional per-
formance and reliability has received little attention
from political scientists. One possible reason for this
stems from the engineering roots of reliability theory.
Political scientists may have considered the issue of
organizational redundancy and reliability simply to
be a technocratic problem that could be solved by
reference to engineering formulas that dictate the
appropriate structural form and do not raise political
issues. Indeed, if we limit the scope of the problem to
two-state devices, this apolitical view of structural
design has some merit. But in a system with multiple
types of errors, trade-offs between the errors must be
made; and this moves us into the realm of politics.
Which goals will be embraced? How will resources be
allocated to combat each kind of error? Answers to
these questions are ultimately political issues.
TWO TYPES OF
ADMINISTRATIVE ERRORS
Before proceeding further, it is important to discuss
more thoroughly what is meant by the term policy
failure. As just noted, administrative problems often
have a richer structure than the simple two-state
(operating/failed) model can incorporate. In particu-
lar, bureaucracies are often in the position to commit
two types of errors: (1) implementation of the wrong
policy, an error of commission; and (2) failure to act
when action is warranted, an error of omission. If we
consider the agency's decision to take action to be
comparable to the acceptance of a hypothesis, we can
relate these two kinds of failures to the more familiar
type I and type II errors often studied in statistics.
To establish this link, we must first define the null
and alternative hypothesis in terms of potential bu-
reaucratic action. Since presumption often favors the
status quo, let us consider the null hypothesis to be
that the agency should not take any new action and
the alternative hypothesis to be that the bureau
should take such action. Therefore, if the agency
chooses to act when it is improper to do so (rejects the
null hypothesis when it is true), a type I error has
been committed. Likewise, if the bureau fails to act in
a situation where it should (accepts the null hypoth-
esis when it is false), then a type II error has been
committed.
To illustrate the concept of type I and type II errors
in organizational systems, let us consider the exam-
ple of NASA and its decision to launch the space
shuttle. The space agency traditionally approaches
the launch decision with the assumption that a mis-
sion is not safe to fly. Subordinates are then required
to prove that such is not the case before the launch
is permitted. The null hypothesis, therefore, is that
the mission should be aborted. If NASA were to
reject the null hypothesis by launching a mission that
is actually unsafe, it would be committing a type I
error. On the other hand, if NASA decided not to
launch a mission that was technologically sound,
then it would have committed a type II error. The
agency's choices and consequences are summarized
in Figure 1.
[Figure 1. Summary of NASA responses and possible errors regarding launch decisions. Rows: NASA's decision (launch or abort); columns: the proper course of action (launch or abort). Launch when launch is proper: correct decision, mission successful. Launch when abort is proper: type I error, accident occurs with possible loss of life and/or equipment. Abort when launch is proper: type II error, missed opportunity and wasted resources. Abort when abort is proper: correct decision, accident avoided. Null hypothesis: the mission should be aborted.]
Each form of failure is associated with a different
set of costs. By committing a type II error, NASA
loses the opportunity to achieve its objectives and
wastes time, effort, and materials that could have
been usefully employed elsewhere. For example, the
shuttle's propellants in the external tank, liquid hy-
drogen and liquid oxygen, are lost if the mission is
scrubbed and have alone been valued at approxi-
mately five hundred thousand dollars. Furthermore,
the agency may forfeit the opportunity to carry out
rare scientific research, as was the case when NASA
missed the launch date for its ASTRO mission to
study Halley's Comet. The Challenger accident clearly
demonstrated, however, that a type I failure may be
far more costly. The destroyed shuttle was replaced
by the Endeavour at an expense over two billion
dollars, and the death of the seven astronauts repre-
sents an incalculable loss. Certainly, in the case of
NASA, type I errors are associated with greater costs
than type II failures. The exact cost trade-off between
these types of failure would vary, of course, for
different agencies and according to the individual
circumstances prevailing at the time of each decision.
For this reason, it is important to develop a general
approach to the study of organizational reliability that
recognizes both types of errors and identifies the
consequences associated with them.
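To make the cost asymmetry concrete, here is a minimal sketch in Python; the probabilities and dollar figures are assumptions chosen for illustration, not values taken from the article:

```python
# Illustrative expected-cost comparison of type I vs. type II launch errors.
# All numbers below are assumptions; the loss of a crew is not priced.

COST_TYPE_I = 2_000_000_000   # assumed: replacing a destroyed orbiter
COST_TYPE_II = 500_000        # assumed: a scrubbed launch (lost propellants, etc.)

def expected_cost(p_error: float, cost: float) -> float:
    """Expected cost of an error committed with probability p_error."""
    return p_error * cost

p_type_i = 0.001   # assumed chance of approving an unsafe launch
p_type_ii = 0.05   # assumed chance of scrubbing a sound launch

# Even a much rarer type I error can dominate because its cost is so large.
print("Expected type I cost: ", expected_cost(p_type_i, COST_TYPE_I))
print("Expected type II cost:", expected_cost(p_type_ii, COST_TYPE_II))
```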
By recognizing both forms of potential failure, we
are better able to understand the nature of the trade-
offs demanded. As with hypothesis testing, gains in
type I reliability often come at the expense of type II
reliability. However, just as it is conceivable to reduce
both α and β in hypothesis testing by increasing the
sample size, it is possible to increase both type I and
type II reliability by raising an agency's resource
levels. But because of resource limitations in the real
world, it is inevitable that bureaucrats and their
political superiors will have to strike a balance be-
tween each type of reliability. On the whole, three-
state devices allow us to consider both forms of error
and thus provide a richer framework for studying the
issue of organizational reliability.1
BASIC CONCEPTS OF STRUCTURE
AND RELIABILITY
General Assumptions
Throughout this discussion I will be contrasting com-
ponent and system reliability. It is important to clarify,
in advance, what is meant by these terms. A system
is a collection of subunits, known as components,
which are linked together in a particular structure. In
administrative theory, identifying components is de-
pendent on the system level in question. If one were
concentrating on the behavior of an agency, then the
agency as a whole is the system, and the offices
within it are the components. Alternatively, if one
were looking at the executive branch as a whole, that
would be the system and each agency a component
within the system. In an organizational context, de-
termining what the components are depends largely
on the system with which you are concerned. In this
case study of NASA, the system of concern is the
agency, and each component represents an office or
division inside NASA.
Three other assumptions of this work must also be
specified. First, I start by assuming that the probabil-
ity of failure for each component in the system has
already been determined either by testing or past
history,2 then, later, relax this assumption, demon-
strating that the theoretical framework is valuable
even when the exact probabilities of component fail-
ure are not known. Making this assumption at the
onset, however, is useful for developing the theory;
since the components are assumed to be known
quantities, the remaining question is how to assemble
these components into a reliable network. Second, I
assume that organizational reliability is static, not
dynamic. In the engineering literature, reliability is
often treated as time-dependent so as to simulate the
breakdown of mechanical components; the probabil-
ity of such failure naturally increases with age. For
administrative systems, however, it is not clear how
component reliability would change over time. One
might argue that agents become more reliable over
time because they have greater experience and exper-
tise with the issues and are thus better able to address
them. On the other hand, it could be said that agents
are less reliable over time because they are more
secure in their positions and lose their incentive to
perform well. Additionally, interest-group capture of
some public agency may affect the reliability of its
performance. While these ideas raise interesting
questions, static models of administrative reliability
will provide sufficient insight for our needs here.
Finally, I assume that the states of all components are
statistically independent. In other words, the failure
of one component or subsystem does not affect the
probability of failure of other components. Those
who study public administration may question the
validity of this assumption.
[Figure 2. Redundant system configurations: (a) serial configuration; (b) parallel configuration.]
Bendor, however, proves
a theorem demonstrating that some degree of com-
ponent interaction does not negate the general results
of reliability theory (1985, 44-49). Therefore, the as-
sumption of component independence is a useful sim-
plifying assumption that does not undercut the gener-
alizability of the results. With these assumptions in
hand, we can now focus our attention on the structural
configurations commonly found in organizations.
Types of Organizational Structures
There are several basic organizational forms em-
ployed in the development of administrative sys-
tems. The first is a serial structure, a form often found
in organizations. In a traditional serial structure,
pictured in Figure 2, the first component must cor-
rectly process a policy initiative before sending it on
to the next component. To be effective, policy must
successfully pass through each of these components.
The result is that in order for the system as a whole to
fail, it is only necessary for one of the components in
series to fail. If any one component were to fail to
pass the policy to the next unit, then all the compo-
nents that followed it would be unable to act, and the
policy could not get through the system.
Another possible organizational form is the estab-
lishment of parallel linkages between components. In
a parallel structure, such as the one illustrated in
Figure 2, a policy may pass through any one of the
components in order to get to the implementation
stage. It is different from the serial structure inasmuch
as even if one or more units fail to pass the policy along,
it may still be able to make it through the system. The
end result is that for a policy to fail to get through this
type of system, all components must fail.
One variation of this structural form is the k-out-
of-m unit network. In certain cases, we require that a
certain number, k, of the m units in an active parallel
redundant system must work for the system to be
successful. One example of this type of system might
be the requirement that at least two of the space
shuttle's four major computers be on-line for launch.
An organizational application of this system would
be an agency director's decision rule not to imple-
ment any policy that a majority of the staff cannot
agree upon (k = ⌊m/2⌋ + 1).
There are two special cases of the k-out-of-m unit
network which merit attention. The first instance is
where k = 1, in which case we are back to a simple
parallel system. The second case is where k = m,
which effectively reduces the structure to a serial
system. (See Appendix for proof.) In the latter case,
although the network is configured as an active
parallel system, in terms of reliability it is behaving as
a serial structure. To differentiate between this type
of system and a traditional serial network, we will
classify this type of structure as a serially independent
system. This name signifies that while the system is
essentially a serial one, its components operate com-
pletely independently from each other.
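A short sketch of the k-out-of-m logic under the paper's independence assumption (the component reliability used is hypothetical); k = 1 recovers the parallel case and k = m the serially independent case:

```python
from math import comb

def k_of_m_reliability(k: int, m: int, r: float) -> float:
    """Probability that at least k of m independent components work,
    where each component works with probability r (a binomial tail)."""
    return sum(comb(m, j) * r**j * (1 - r)**(m - j) for j in range(k, m + 1))

r = 0.9  # assumed component reliability
print(k_of_m_reliability(1, 4, r))  # k = 1: parallel system, 1 - (1 - r)**4
print(k_of_m_reliability(4, 4, r))  # k = m: serially independent system, r**4
print(k_of_m_reliability(2, 4, r))  # an "at least 2 of 4 computers" style rule
```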
In an organizational context, an important distinc-
tion between a serially independent system and the
traditional serial structure is the difference in process-
ing time. In a traditional serial system, the first
component must process the information, then pass
it on to the next component for processing. This
continues until all the components have processed
the information. The total processing time of the
system is the sum of processing times for each
component plus some factor to account for transmis-
sion delay between units. In contrast, the serially
independent system allows all components to oper-
ate simultaneously. The time needed for the serially
independent system to complete its task is simply the
time it takes the slowest component to finish its
operations. So while both structures yield the same
level of reliability, serially independent systems re-
quire less processing time.
Larger organizations often utilize combinations of
serial and parallel structural forms. Two examples of
this can be seen in Figure 3. In a series-parallel system
there are several parallel subsystems linked together in
a serial fashion. A parallel-series configuration is the
result of several serial substructures combined into a
larger parallel network. As these two examples illus-
trate, most large and complex structures can be decom-
posed into smaller units for easier analysis.
THE ADVANTAGE OF PARALLEL
SYSTEMS IN A TWO-STATE WORLD
Components aligned in a series configuration are
perhaps the easiest systems to analyze, as well as the
most commonly encountered. As I noted earlier, in
order for the system as a whole to fail, only one of the
components in series has to fail. Defining the proba-
bility of failure for component i as fi, a mathematical
statement of the reliability of a serial system with m
components would be
$$R_{series} = \prod_{i=1}^{m} (1 - f_i)$$
[Figure 3. Combinations of structures with m units in series and n units in parallel: (a) series-parallel configuration; (b) parallel-series configuration.]
We can see that two factors determine the reliability
of a series system: component reliability and the
number of components in the system. In order to
increase the reliability of this type of system, one
must either increase the reliability of the components
or decrease the total number of components em-
ployed. As illustrated in Figure 4, marginal gains in
system reliability from increasing component perfor-
mance decrease as component reliability increases.
Because the costs of increasing component reliability
often rise exponentially, it is more effective in many
cases to reduce the total number of components in
series to reach reliability goals.
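A minimal sketch of the serial-reliability expression above, with an assumed component failure probability, showing reliability declining as components are added in series:

```python
from math import prod

def serial_reliability(failure_probs):
    """R_series = product of (1 - f_i): every component must work."""
    return prod(1 - f for f in failure_probs)

# Assumed identical components with failure probability 0.02.
for m in (1, 5, 10, 15):
    print(m, round(serial_reliability([0.02] * m), 4))
# Reliability falls as more components are placed in series.
```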
Perhaps the most common means of increasing
reliability in a two-state world is to add parallel
components to a system. It is assumed that all
branches are active and that a signal needs to pass
through only one branch to be successfully transmit-
ted. Since it is assumed that all branches must fail in
order for the system to fail, the reliability function for
a parallel system with n components is simply
$$R_{parallel} = 1 - \prod_{i=1}^{n} f_i$$
[Figure 4. Declining reliability of serial systems: overall system reliability versus the number of components in series (1 to 15), plotted for component reliabilities R = 0.99, 0.98, and 0.95.]
Figure 5 illustrates the relationship between compo-
nent reliability, the number of parallel elements, and
overall system reliability. In this case, we see that the
marginal gains from adding parallel channels de-
crease as the number of parallel components in-
creases.
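The parallel counterpart, again with assumed numbers, showing the diminishing marginal gain from each added branch:

```python
from math import prod

def parallel_reliability(failure_probs):
    """R_parallel = 1 - product of f_i: the system fails only if all branches fail."""
    return 1 - prod(failure_probs)

# Assumed identical components with failure probability 0.3.
for n in (1, 2, 3, 4):
    print(n, round(parallel_reliability([0.3] * n), 4))
# Each added branch helps, but by less than the one before it.
```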
In comparing Figures 4 and 5, the attractiveness of
parallel systems in a two-state world is clear. Holding
component reliability constant, the addition of redun-
dancy in a parallel fashion will raise the reliability of
the overall system, while creating serial redundancies
decreases total system reliability. It is not surprising,
therefore, that many scholars in this area have
spurned serial systems and focused, instead, on
parallel linkages when discussing this issue. Press-
man and Wildavsky (1973) have criticized the exist-
ence of "multiple clearance points" (a serial system)
for an implementive decision because it reduces the
likelihood that any policy can ever be executed.
Landau (1969) directs the same logic against "stream-
lined" serial structures, claiming that such systems
are more susceptible to failure than those with paral-
lel redundancies.
[Figure 5. Increasing reliability with parallel systems: overall system reliability versus component reliability (0.50 to 1.00), plotted for n = 1, 2, and 4, where n is the number of parallel components.]
As a result, the major policy pre-
scription of the work of Landau (1969, 1973, 1991) and
others (Bendor 1985; Lerner 1986) has been to empha-
size the use of parallel linkages in order to create
more reliable organizations.
However, if it is true that serial systems are gener-
ally less reliable, why are such structures ever
adopted by organizations? To answer this question
fully, we must go beyond the two-state world to
allow for multiple forms of error.
TYPE I AND TYPE II ERRORS IN AN
ADMINISTRATIVE SETTING
The Advantages of Serial Systems in a
Three-State World
As we have seen in the two-state world, serial struc-
tures are less reliable than others of comparable size.
Nonetheless, it is clear that this structural form has
been employed repeatedly in the development of
bureaucracies. The reason this type of organizational
structure has survived, I will suggest, is that it is
valuable when we must design systems to accommo-
date multiple types of error. I previously assumed
that there existed only two states for every compo-
nent: good (operating) and failed (not operating). I
shall now elaborate a theory of organizational reliabil-
ity that allows for the existence of both type I and
type II errors.
In general, a series structure is better suited to stop
type I errors from occurring. Since the null hypothe-
sis is that the agency should not take any action, if
any component chooses to pass the proposed policy
through its part of the system, then it is rejecting the
null hypothesis. For any policy to pass through a
serial system successfully, all units must agree to pass
it along. If rejecting the null hypothesis was actually
the incorrect decision (a type I error), then all units
must commit such an error for the serial system as a
whole to fail. Mathematically, we can identify both
the probability of a type I failure occurring in the
system ($F_\alpha$) and the reliability of the system against
this error ($R_\alpha$) as
$$F_\alpha = \prod_{i=1}^{m} \alpha_i$$
$$R_\alpha = 1 - \prod_{i=1}^{m} \alpha_i,$$
where $\alpha_i$ is the probability of a type I error occurring
in the ith component and m is the number of compo-
nents linked in series.
For an example, consider NASA's launch decision
process. We would find that a serial structure is more
effective in preventing unsafe launches. In order to
approve an unsafe launch in a serial system, every
unit must err. The more hurdles are established (via
increasing numbers of serial components), the harder
it is for an unsafe launch proposal to pass through the
system unopposed.
With regard to type II errors, however, series
structures are less effective. Consider the fact that if
one unit accepts the null hypothesis of no new action,
then the policy cannot be passed through the rest of
the serial system. If accepting the null hypothesis was
actually incorrect (a type II error), then the whole
system would have failed. Thus, in order for a type II
error to occur at the system level, only one compo-
nent in series must fail in this manner for the system
to fail. The more components added in series, the
greater the probability that a component will commit
a type II error, causing the system to fail ($F_\beta$). The
reliability of a series structure with regard to type II
errors ($R_\beta$) can be represented as
$$F_\beta = 1 - \prod_{i=1}^{m} (1 - \beta_i)$$
$$R_\beta = \prod_{i=1}^{m} (1 - \beta_i),$$
where $\beta_i$ is the probability of a type II error occurring
in the ith component and m is the number of compo-
nents linked in series.
We may also be interested in the overall reliability
of the system-the likelihood that an agency would
commit either a type I or type II error. To find this, we
assume that for a given event at a given time, it is not
possible for an administrative system to commit both
a type I and a type II error simultaneously. This
assumption is not unreasonable. A type I error re-
quires that the agency act, while a type II error
demands that the agency postpone any action. An
organization cannot both act and not act at the same
time. Consider NASA's decision to launch the shut-
tle. The space agency can either decide to launch the
shuttle now (possibly resulting in a type I error) or
abort the launch until another time (possibly result-
ing in a type II error). It cannot decide to both launch
and abort at the same time.
Given this argument, we can conclude that type I
and type II errors are, at any point in time, mutually
exclusive events. Therefore, the probability of either a
type I or type II error in a serial system can be found
as
$$P(F_\alpha \cup F_\beta) = P(F_\alpha) + P(F_\beta) = \prod_{i=1}^{m} \alpha_i + \left[1 - \prod_{i=1}^{m} (1 - \beta_i)\right]$$
The overall reliability of the serial system is then
found to be
$$R_{sys} = 1 - P(F_\alpha \cup F_\beta) = 1 - \left(\prod_{i=1}^{m} \alpha_i + 1 - \prod_{i=1}^{m} (1 - \beta_i)\right)$$
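A short numerical sketch of these expressions (the per-component error rates are assumed for illustration): adding components in series drives the system's type I failure probability down while driving its type II failure probability up, and overall reliability reflects the sum of both risks.

```python
from math import prod

def serial_type_i_failure(alphas):
    """F_alpha: every component must commit a type I error for the system to."""
    return prod(alphas)

def serial_type_ii_failure(betas):
    """F_beta: a single type II error anywhere in the chain fails the system."""
    return 1 - prod(1 - b for b in betas)

def serial_overall_reliability(alphas, betas):
    """R_sys = 1 - (F_alpha + F_beta); the two errors are mutually exclusive."""
    return 1 - (serial_type_i_failure(alphas) + serial_type_ii_failure(betas))

# Assumed identical components: alpha_i = 0.10, beta_i = 0.05.
for m in (1, 2, 4, 8):
    a, b = [0.10] * m, [0.05] * m
    print(m,
          round(serial_type_i_failure(a), 6),
          round(serial_type_ii_failure(b), 4),
          round(serial_overall_reliability(a, b), 4))
```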
426
This content downloaded from 69.67.124.210 on Tue, 16 Aug
2016 18:48:44 UTC
All use subject to http://about.jstor.org/terms
American Political Science Review Vol. 87, No. 2
m m
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx
A C C I D E N T  I N V E S T I G A T I O N  B O A R DCOLUM.docx

More Related Content

Similar to A C C I D E N T I N V E S T I G A T I O N B O A R DCOLUM.docx

Rodney rocha
Rodney rochaRodney rocha
Rodney rochaNASAPMC
 
Rodney rocha
Rodney rochaRodney rocha
Rodney rochaNASAPMC
 
Gehman’s axiom number 3 (Columbia Space Shuttle)
Gehman’s axiom number 3 (Columbia Space Shuttle)Gehman’s axiom number 3 (Columbia Space Shuttle)
Gehman’s axiom number 3 (Columbia Space Shuttle)
Cpo Creed
 
Nasa tragedies and lessons
Nasa tragedies and lessonsNasa tragedies and lessons
Nasa tragedies and lessonspkmarvel
 
Risk management paper
Risk management paperRisk management paper
Risk management paper
PMlynda17
 
Long_Alexandra-8900_final
Long_Alexandra-8900_finalLong_Alexandra-8900_final
Long_Alexandra-8900_finalAlexandra Long
 
The Space Shuttle Disasters
The Space Shuttle DisastersThe Space Shuttle Disasters
The Space Shuttle DisastersDon W. Lewis
 
There are several documentaries about the engineering disasters assoc.pdf
 There are several documentaries about the engineering disasters assoc.pdf There are several documentaries about the engineering disasters assoc.pdf
There are several documentaries about the engineering disasters assoc.pdf
anshuanil26
 
The communication challenges at nasa
The communication challenges at nasaThe communication challenges at nasa
The communication challenges at nasaAneth Sanchez
 
1 Which typical team challenges refer to those in the book.pdf
1 Which typical team challenges refer to those in the book.pdf1 Which typical team challenges refer to those in the book.pdf
1 Which typical team challenges refer to those in the book.pdf
abhishek483040
 
Engl3355 case analysis group 6
Engl3355 case analysis group 6Engl3355 case analysis group 6
Engl3355 case analysis group 6Aneth Sanchez
 
KXEX2165_Moral & Ethics_Assignment 2
KXEX2165_Moral & Ethics_Assignment 2KXEX2165_Moral & Ethics_Assignment 2
KXEX2165_Moral & Ethics_Assignment 2
Max Lee
 
52352main contour
52352main contour52352main contour
52352main contourPedro León
 
SPACE SHUTTLE CHALLENGER
SPACE SHUTTLE CHALLENGERSPACE SHUTTLE CHALLENGER
SPACE SHUTTLE CHALLENGER
MIbrar4
 
Ag avt-140-01
Ag avt-140-01Ag avt-140-01
Ag avt-140-01ceginc
 
Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions.
 Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions. Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions.
Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions.
Om Shukla
 

Similar to A C C I D E N T I N V E S T I G A T I O N B O A R DCOLUM.docx (20)

Rodney rocha
Rodney rochaRodney rocha
Rodney rocha
 
Rodney rocha
Rodney rochaRodney rocha
Rodney rocha
 
Gehman’s axiom number 3 (Columbia Space Shuttle)
Gehman’s axiom number 3 (Columbia Space Shuttle)Gehman’s axiom number 3 (Columbia Space Shuttle)
Gehman’s axiom number 3 (Columbia Space Shuttle)
 
Nasa tragedies and lessons
Nasa tragedies and lessonsNasa tragedies and lessons
Nasa tragedies and lessons
 
Risk management paper
Risk management paperRisk management paper
Risk management paper
 
Long_Alexandra-8900_final
Long_Alexandra-8900_finalLong_Alexandra-8900_final
Long_Alexandra-8900_final
 
The Space Shuttle Disasters
The Space Shuttle DisastersThe Space Shuttle Disasters
The Space Shuttle Disasters
 
There are several documentaries about the engineering disasters assoc.pdf
 There are several documentaries about the engineering disasters assoc.pdf There are several documentaries about the engineering disasters assoc.pdf
There are several documentaries about the engineering disasters assoc.pdf
 
The communication challenges at nasa
The communication challenges at nasaThe communication challenges at nasa
The communication challenges at nasa
 
1 Which typical team challenges refer to those in the book.pdf
1 Which typical team challenges refer to those in the book.pdf1 Which typical team challenges refer to those in the book.pdf
1 Which typical team challenges refer to those in the book.pdf
 
Engl3355 case analysis group 6
Engl3355 case analysis group 6Engl3355 case analysis group 6
Engl3355 case analysis group 6
 
KXEX2165_Moral & Ethics_Assignment 2
KXEX2165_Moral & Ethics_Assignment 2KXEX2165_Moral & Ethics_Assignment 2
KXEX2165_Moral & Ethics_Assignment 2
 
52352main contour
52352main contour52352main contour
52352main contour
 
SPACE SHUTTLE CHALLENGER
SPACE SHUTTLE CHALLENGERSPACE SHUTTLE CHALLENGER
SPACE SHUTTLE CHALLENGER
 
PellegrinoHonorsThesis
PellegrinoHonorsThesisPellegrinoHonorsThesis
PellegrinoHonorsThesis
 
Space Shuttle Columbia Disaster Review
Space Shuttle Columbia Disaster ReviewSpace Shuttle Columbia Disaster Review
Space Shuttle Columbia Disaster Review
 
Ag avt-140-01
Ag avt-140-01Ag avt-140-01
Ag avt-140-01
 
NPS_Article
NPS_ArticleNPS_Article
NPS_Article
 
Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions.
 Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions. Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions.
Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions.
 
Proc,Air,Water_July 2016
Proc,Air,Water_July 2016Proc,Air,Water_July 2016
Proc,Air,Water_July 2016
 

More from ransayo

Zoe is a second grader with autism spectrum disorders. Zoe’s father .docx
Zoe is a second grader with autism spectrum disorders. Zoe’s father .docxZoe is a second grader with autism spectrum disorders. Zoe’s father .docx
Zoe is a second grader with autism spectrum disorders. Zoe’s father .docx
ransayo
 
Zlatan Ibrahimović – Sports PsychologyOutlineIntroduction .docx
Zlatan Ibrahimović – Sports PsychologyOutlineIntroduction .docxZlatan Ibrahimović – Sports PsychologyOutlineIntroduction .docx
Zlatan Ibrahimović – Sports PsychologyOutlineIntroduction .docx
ransayo
 
Zia 2Do You Choose to AcceptYour mission, should you choose.docx
Zia 2Do You Choose to AcceptYour mission, should you choose.docxZia 2Do You Choose to AcceptYour mission, should you choose.docx
Zia 2Do You Choose to AcceptYour mission, should you choose.docx
ransayo
 
Ziyao LiIAS 3753Dr. Manata HashemiWorking Title The Edu.docx
Ziyao LiIAS 3753Dr. Manata HashemiWorking Title The Edu.docxZiyao LiIAS 3753Dr. Manata HashemiWorking Title The Edu.docx
Ziyao LiIAS 3753Dr. Manata HashemiWorking Title The Edu.docx
ransayo
 
Ziyan Huang (Jerry)Assignment 4Brand PositioningProfessor .docx
Ziyan Huang (Jerry)Assignment 4Brand PositioningProfessor .docxZiyan Huang (Jerry)Assignment 4Brand PositioningProfessor .docx
Ziyan Huang (Jerry)Assignment 4Brand PositioningProfessor .docx
ransayo
 
Zhtavius Moye04192019BUSA 4126SWOT AnalysisDr. Setliff.docx
Zhtavius Moye04192019BUSA 4126SWOT AnalysisDr. Setliff.docxZhtavius Moye04192019BUSA 4126SWOT AnalysisDr. Setliff.docx
Zhtavius Moye04192019BUSA 4126SWOT AnalysisDr. Setliff.docx
ransayo
 
Zichun Gao Professor Karen Accounting 1AIBM FInancial Stat.docx
Zichun Gao Professor Karen Accounting 1AIBM FInancial Stat.docxZichun Gao Professor Karen Accounting 1AIBM FInancial Stat.docx
Zichun Gao Professor Karen Accounting 1AIBM FInancial Stat.docx
ransayo
 
Zheng Hes Inscription This inscription was carved on a stele erec.docx
Zheng Hes Inscription This inscription was carved on a stele erec.docxZheng Hes Inscription This inscription was carved on a stele erec.docx
Zheng Hes Inscription This inscription was carved on a stele erec.docx
ransayo
 
Zhou 1Time and Memory in Two Portal Fantasies An Analys.docx
Zhou 1Time and Memory in Two Portal Fantasies An Analys.docxZhou 1Time and Memory in Two Portal Fantasies An Analys.docx
Zhou 1Time and Memory in Two Portal Fantasies An Analys.docx
ransayo
 
Zhang 1Yixiang ZhangTamara KuzmenkovEnglish 101.docx
Zhang 1Yixiang ZhangTamara KuzmenkovEnglish 101.docxZhang 1Yixiang ZhangTamara KuzmenkovEnglish 101.docx
Zhang 1Yixiang ZhangTamara KuzmenkovEnglish 101.docx
ransayo
 
Zhang 1Nick ZhangMr. BetheaLyric Peotry13.docx
Zhang 1Nick ZhangMr. BetheaLyric Peotry13.docxZhang 1Nick ZhangMr. BetheaLyric Peotry13.docx
Zhang 1Nick ZhangMr. BetheaLyric Peotry13.docx
ransayo
 
Zero trust is a security stance for networking based on not trusting.docx
Zero trust is a security stance for networking based on not trusting.docxZero trust is a security stance for networking based on not trusting.docx
Zero trust is a security stance for networking based on not trusting.docx
ransayo
 
Zero plagiarism4 referencesNature offers many examples of sp.docx
Zero plagiarism4 referencesNature offers many examples of sp.docxZero plagiarism4 referencesNature offers many examples of sp.docx
Zero plagiarism4 referencesNature offers many examples of sp.docx
ransayo
 
Zero plagiarism4 referencesLearning ObjectivesStudents w.docx
Zero plagiarism4 referencesLearning ObjectivesStudents w.docxZero plagiarism4 referencesLearning ObjectivesStudents w.docx
Zero plagiarism4 referencesLearning ObjectivesStudents w.docx
ransayo
 
that if these persistent, systemic flaws are not resolved, the scene is set for another accident. Therefore, the recommendations for change are not only for fixing the Shuttleʼs technical system, but also for fixing each part of the organizational system that produced Columbiaʼs failure. Third, the Boardʼs focus on the context in which decision making occurred does not mean that individuals are not responsible and accountable. To the contrary, individuals always must assume responsibility for their actions. What it does mean is that NASAʼs problems cannot be solved simply by retirements, resignations, or transferring personnel.2
The constraints under which the agency has operated throughout the Shuttle Program have contributed to both
Shuttle accidents. Although NASA leaders have played an important role, these constraints were not entirely of NASAʼs own making. The White House and Congress must recognize the role of their decisions in this accident and take responsibility for safety in the future.
8.2 FAILURES OF FORESIGHT: TWO DECISION HISTORIES AND THE NORMALIZATION OF DEVIANCE
Foam loss may have occurred on all missions, and left bipod ramp foam loss occurred on 10 percent of the flights for which visible evidence exists. The Board had a hard time understanding how, after the bitter lessons of Challenger, NASA could have failed to identify a similar trend. Rather than view the foam decision only in hindsight, the Board tried to see the foam incidents as NASA engineers and managers saw them as they made their decisions. This section gives an insider perspective: how NASA defined risk and how those definitions changed over time for both foam debris hits and O-ring erosion. In both cases, engineers and managers conducting risk assessments continually normalized the technical deviations they found.3 In all official engineering analyses and launch recommendations prior to the accidents, evidence that the design was not performing as expected was reinterpreted as acceptable and non-deviant, which diminished perceptions of risk throughout the agency.
The initial Shuttle design predicted neither foam debris problems nor poor sealing action of the Solid Rocket Booster joints. To experience either on a mission was a violation of design specifications. These anomalies were signals of potential danger, not something to be tolerated, but in both cases after the first incident the engineering analysis concluded that the design could tolerate the damage. These engineers decided to implement a temporary fix and/or accept the risk, and fly. For both O-rings and foam, that first decision was a turning point. It established a precedent for accepting, rather than eliminating, these technical deviations. As a result of this new classification, subsequent incidents of O-ring erosion or foam debris strikes were not defined as signals of danger, but as evidence that the design was now acting as predicted. Engineers and managers incorporated worsening anomalies into the engineering experience base, which functioned as an elastic waistband, expanding to hold larger deviations from the original design. Anomalies that did not lead to catastrophic failure were treated as a source of valid engineering data that justified further flights. These anomalies were translated into a safety margin that was extremely influential, allowing engineers and managers to add incrementally to the amount and seriousness of damage that was acceptable. Both O-ring erosion and foam debris events were repeatedly “addressed” in NASAʼs Flight Readiness Reviews but never fully resolved. In both cases, the engineering analysis was incomplete and inadequate. Engineers understood what was happening, but they never understood why. NASA continued to implement a series of small corrective actions, living with the problems until it was too late.4
NASA documents show how official classifications of risk were downgraded over time.5 Program managers designated both the foam problems and O-ring erosion as “acceptable risks” in Flight Readiness Reviews. NASA managers also assigned each bipod foam event In-Flight Anomaly status, and then removed the designation as corrective actions were implemented. But when major bipod foam-shedding occurred on STS-112 in October 2002, Program management did not assign an In-Flight Anomaly. Instead, it downgraded the problem to the lower status of an “action” item. Before Challenger, the problematic Solid Rocket Booster
joint had been elevated to a Criticality 1 item on NASAʼs Critical Items List, which ranked Shuttle components by failure consequences and noted why each was an acceptable risk. The joint was later demoted to a Criticality 1-R (redundant), and then in the month before Challengerʼs launch was “closed out” of the problem-reporting system.
Prior to both accidents, this demotion from high-risk item to low-risk item was very similar, but with some important differences. Damaging the Orbiterʼs Thermal Protection System, especially its fragile tiles, was normalized even before Shuttle launches began: it was expected due to forces at launch, orbit, and re-entry.6 So normal was replacement of Thermal Protection System materials that NASA managers budgeted for tile cost and turnaround maintenance time from the start.
It was a small and logical next step for the discovery of foam debris damage to the tiles to be viewed by NASA as part of an already existing maintenance problem, an assessment based on experience, not on a thorough hazard analysis. Foam debris anomalies came to be categorized by the reassuring term “in-family,” a formal classification indicating that new occurrences of an anomaly were within the engineering experience base. “In-family” was a strange term indeed for a violation of system requirements. Although “in-family” was a designation introduced post-Challenger to separate problems by seriousness so that “out-of-family” problems got more attention, by definition the problems that were shifted into the lesser “in-family” category got less attention. The Boardʼs investigation uncovered no paper trail showing escalating concern about the foam problem like the one that Solid Rocket Booster engineers left prior to Challenger.7 So ingrained was the agencyʼs belief that foam debris was not a threat to flight safety that in press briefings after the Columbia accident, the Space Shuttle Program Manager still discounted the foam as a probable cause, saying that Shuttle managers were “comfortable” with their previous risk assessments.
From the beginning, NASAʼs belief about both these problems was affected by the fact that engineers were evaluating them in a work environment where technical problems were normal. Although management treated the Shuttle as operational, it was in reality an experimental vehicle. Many anomalies were expected on each mission. Against this backdrop, an anomaly was not in itself a warning sign of impending catastrophe. Another contributing factor was that both foam debris strikes and O-ring erosion events were examined separately, one at a time. Individual incidents were not read by engineers as strong signals of danger. What NASA engineers and managers saw were pieces of ill-structured problems.8 An incident of O-ring erosion or foam bipod debris would be followed by several launches where the machine behaved properly, so that signals of danger were followed by all-clear signals – in other words, NASA managers and engineers were receiving mixed signals.9 Some signals defined as weak at the time were, in retrospect, warnings of danger. Foam debris damaged tile was assumed (erroneously) not to pose a danger to the wing. If a primary
O-ring failed, the secondary was assumed (erroneously) to provide a backup. Finally, because foam debris strikes were occurring frequently, like O-ring erosion in the years before Challenger, foam anomalies became routine signals – a normal part of Shuttle operations, not signals of danger. Other anomalies gave signals that were strong, like wiring malfunctions or the cracked balls in Ball Strut Tie Rod Assemblies, which had a clear relationship to a “loss of mission.” On those occasions, NASA stood down from launch, sometimes for months, while the problems were corrected. In contrast, foam debris and eroding O-rings were defined as nagging issues of seemingly little consequence. Their significance became clear only in retrospect, after lives had been lost.
History became cause as the repeating pattern of anomalies was ratified as safe in Flight Readiness Reviews. The official definitions of risk assigned to each anomaly in Flight Readiness Reviews limited the actions taken and the resources spent on these problems. Two examples of the road not taken and the devastating implications for the future occurred close in time to both accidents. On the October 2002 launch of STS-112, a large piece of bipod ramp foam hit and damaged the External Tank Attachment ring on the Solid Rocket Booster skirt, a strong signal of danger 10 years after the last known bipod ramp foam event. Prior to Challenger, there was a comparable surprise. After a January 1985 launch, for which the Shuttle sat on the launch pad for three consecutive nights of unprecedented cold temperatures, engineers discovered upon the Orbiterʼs return that hot gases had eroded the primary and reached the secondary O-ring, blackening the putty in between – an indication that the joint nearly failed. But accidents are not always preceded by a wake-up call.10 In 1985, engineers realized they needed data on the relationship between cold temperatures and O-ring erosion. However, the task of getting better temperature data stayed on the back burner because of the definition of risk: the primary erosion was within the experience base; the secondary O-ring (thought to be redundant) was not damaged and, significantly, there was a low probability that such cold Florida temperatures would recur.11 The scorched putty, initially a strong signal, was redefined after analysis as weak. On the eve of the Challenger launch, when cold temperature became a concern, engineers had no test data on the effect of cold temperatures on O-ring erosion. Before Columbia, engineers concluded that the damage from the STS-112 foam hit in October 2002 was not a threat to flight safety. The logic was that, yes, the foam piece was large and there was damage, but no serious consequences followed. Further, a hit this size, like cold temperature, was a low-probability event. After analysis, the biggest foam hit to date was redefined as a weak signal. Similar self-defeating actions and inactions followed. Engineers were again dealing with the poor quality of tracking camera images of strikes during ascent. Yet NASA took no steps to improve imagery and took no immediate action to reduce the risk of bipod ramp foam shedding and potential damage to the Orbiter before Columbia. Furthermore, NASA performed no tests on what would happen if a wing leading edge were struck by bipod foam, even though foam had repeatedly separated from the External Tank.
During the Challenger investigation, Rogers Commission member Dr. Richard Feynman famously compared launching Shuttles with known problems to playing Russian roulette.12 But that characterization is only possible in hindsight. It is not how NASA personnel perceived the risks as they were being assessed, one launch at a time. Playing Russian roulette implies that the pistol-holder realizes that death might be imminent and still takes the risk. For both foam
debris and O-ring erosion, fixes were in the works at the time of the accidents, but there was no rush to complete them because neither problem was defined as a show-stopper. Each time an incident occurred, the Flight Readiness process declared it safe to continue flying. Taken one at a time, each decision seemed correct. The agency allocated attention and resources to these two problems accordingly. The consequences of living with both of these anomalies were, in its view, minor. Not all engineers agreed in the months immediately preceding Challenger, but the dominant view at NASA – the managerial view – was, as one manager put it, “we were just eroding rubber O-rings,” which was a low-cost problem.13 The financial consequences of foam debris also were relatively low: replacing tiles extended the turnaround time between launches. In both cases, NASA was comfortable with its analyses. Prior to each accident, the agency saw no greater consequences on the horizon.
8.3 SYSTEM EFFECTS: THE IMPACT OF HISTORY AND POLITICS ON RISKY WORK
The series of engineering decisions that normalized technical deviations shows one way that history became cause in both accidents. But NASAʼs own history encouraged this pattern of flying with known flaws. Seventeen years separated the two accidents. NASA Administrators, Congresses, and political administrations changed. However, NASAʼs political and budgetary situation remained the same in principle as it had been since the inception of the Shuttle Program. NASA remained a politicized and vulnerable agency, dependent on key political players who accepted NASAʼs ambitious proposals and then imposed strict budget limits. Post-Challenger policy decisions made by the White House, Congress, and NASA leadership resulted in the agency reproducing many of the failings identified by the Rogers Commission. Policy constraints affected the Shuttle Programʼs organization culture, its structure, and the structure of the safety system. The three combined to keep NASA on its slippery slope toward Challenger and Columbia. NASA culture allowed flying with flaws when problems were defined as normal and routine; the structure of NASAʼs Shuttle Program blocked the flow of critical information up the hierarchy, so definitions of risk continued unaltered. Finally, a perennially weakened safety system, unable to critically analyze and intervene, had no choice but to ratify the existing risk assessments on these two problems. The following comparison shows that these system effects persisted through time, and affected engineering decisions in the years leading up to both accidents.
The Board found that dangerous aspects of NASAʼs 1986 culture, identified by the Rogers Commission, remained unchanged. The Space Shuttle Program had been built on compromises hammered out by the White House and NASA headquarters.14 As a result, NASA was transformed from a research and development agency to more of a business, with schedules, production pressures, deadlines, and cost efficiency goals elevated to the level of technical innovation and safety goals.15 The Rogers Commission dedicated an entire chapter of its report to production pressures.16 Moreover, the Rogers Commission, as well as the 1990
Augustine Committee and the 1999 Shuttle Independent Assessment Team, criticized NASA for treating the Shuttle as if it were an operational vehicle. Launching on a tight schedule, which the agency had pursued as part of its initial bargain with the White House, was not the way to operate what was in fact an experimental vehicle. The Board found that prior to Columbia, a budget-limited Space Shuttle Program, forced again and again to refashion itself into an efficiency model because of repeated government cutbacks, was beset by these same ills. The harmful effects of schedule pressure identified in previous reports had returned.
Prior to both accidents, NASA was scrambling to keep up. Not only were schedule pressures impacting the people who worked most closely with the technology – technicians, mission operators, flight crews, and vehicle processors – engineering decisions also were affected.17 For foam debris and O-ring erosion, the definition of risk established during the Flight Readiness process determined actions taken and not taken, but the schedule and shoestring budget were equally influential. NASA was cutting corners. Launches proceeded with incomplete engineering work on these flaws. Challenger-era engineers were working on a permanent fix for the booster joints while launches continued.18 After the major foam bipod hit on STS-112, management made the deadline for corrective action on the foam problem after the next launch, STS-113, and then slipped it again until after the flight of STS-107. Delays for flowliner and Ball Strut Tie Rod Assembly problems left no margin in the schedule between February 2003 and the management-imposed February 2004 launch date for the International Space Station Node 2. Available resources – including time out of the schedule for research and hardware modifications – went to the problems that were designated as serious – those most likely to bring down a Shuttle. The NASA culture encouraged flying with flaws because the schedule could not be held up for routine problems that were not defined as a threat to mission safety.19
The question the Board had to answer was why, since the foam debris anomalies went on for so long, had no one recognized the trend and intervened? The O-ring history prior to Challenger had followed the same pattern. This question pointed the Boardʼs attention toward the NASA organization structure and the structure of its safety system. Safety-oriented organizations often build in checks and balances to identify and monitor signals of potential danger. If these checks and balances were in place in the Shuttle Program, they werenʼt working. Again, past policy decisions produced system effects with implications for both Challenger and Columbia.
Prior to Challenger, Shuttle Program structure had hindered information flows, leading the Rogers Commission to conclude that critical information about technical problems was not conveyed effectively through the hierarchy.20 The Space Shuttle Program had altered its structure by outsourcing to contractors, which added to communication problems. The Commission recommended many changes to remedy these problems, and NASA made many of them. However, the Board found that those post-Challenger changes were undone over time by management actions.21 NASA administrators, reacting to government pressures, transferred more functions and responsibilities to the private sector. The change was cost-efficient, but personnel cuts reduced oversight of contractors at the same time that the agencyʼs dependence upon contractor engineering judgment increased. When high-risk technology is the product and lives are at stake, safety, oversight, and communication flows are critical. The Board found that the Shuttle Programʼs normal chain of command and matrix system did not perform a check-and-balance function on either foam or O-rings.
The Flight Readiness Review process might have reversed the disastrous trend of normalizing O-ring erosion and foam debris hits, but it didnʼt. In fact, the Rogers Commission found that the Flight Readiness process only affirmed the pre-Challenger engineering risk assessments.22 Equally troubling, the Board found that the Flight Readiness process, which is built on consensus verified by signatures of all responsible parties, in effect renders no one accountable. Although the process was altered after Challenger, these changes did not erase the basic problems that were built into the structure of the Flight Readiness Review.23 Managers at the top were dependent on engineers at the bottom for their engineering analysis and risk assessments. Information was lost as engineering risk analyses moved through the process. At succeeding stages, management awareness of anomalies, and therefore risks, was reduced either because of the need to be increasingly brief and concise as all the parts of the system came together, or because of the need to produce consensus decisions at each level. The Flight Readiness process was designed to assess hardware and take corrective actions that would transform known problems into acceptable flight risks, and that is precisely what it did. The 1986 House Committee on Science and Technology concluded during its investigation into Challenger that Flight Readiness Reviews had performed exactly as they were designed, but that they could not be expected to replace engineering analysis, and therefore they “cannot be expected to prevent a flight because of a design flaw that Project management had already determined an acceptable risk.”24 Those words, true for the history of O-ring erosion, also hold true for the history of foam debris.
The last line of defense against errors is usually a safety system. But the previous policy decisions by leaders described in Chapter 5 also impacted the safety structure and contributed to both accidents. Neither in the O-ring erosion nor the foam debris problems did NASAʼs safety system attempt to reverse the course of events. In 1986, the Rogers Commission called it “The Silent Safety System.”25 Pre-Challenger budget shortages resulted in safety personnel cutbacks. Without clout or independence, the safety personnel who remained were ineffective. In the case of Columbia, the Board found the same problems were reproduced and for an identical reason: when pressed for cost reduction, NASA attacked its own safety system. The faulty assumption that supported this strategy prior to Columbia was that a reduction in safety staff would not result in a reduction of safety, because contractors would assume greater safety responsibility. The effectiveness of those remaining staff safety engineers was blocked by their dependence on the very Program they were charged to supervise. Also, the Board found many safety units with unclear roles and responsibilities that left crucial gaps.
Post-Challenger NASA still had no systematic procedure for identifying and monitoring trends. The Board was surprised at how long it took NASA to put together trend data in response to Board requests for information. Problem reporting and tracking systems were still overloaded or
underused, which undermined their very purpose. Multiple job titles disguised the true extent of safety personnel shortages. The Board found cases in which the same person was occupying more than one safety position – and in one instance at least three positions – which compromised any possibility of safety organization independence because the jobs were established with built-in conflicts of interest.
8.4 ORGANIZATION, CULTURE, AND UNINTENDED CONSEQUENCES
A number of changes to the Space Shuttle Program structure made in response to policy decisions had the unintended effect of perpetuating dangerous aspects of pre-Challenger culture and continued the pattern of normalizing things that were not supposed to happen. At the same time that NASA leaders were emphasizing the importance of safety, their personnel cutbacks sent other signals. Streamlining and downsizing, which scarcely go unnoticed by employees, convey a message that efficiency is an important goal.
The Shuttle/Space Station partnership affected both programs. Working evenings and weekends just to meet the International Space Station Node 2 deadline sent a signal to employees that schedule is important. When paired with the “faster, better, cheaper” NASA motto of the 1990s and cuts that dramatically decreased safety personnel, efficiency becomes a strong signal and safety a weak one. This kind of doublespeak by top administrators affects peopleʼs decisions and actions without them even realizing it.26
Changes in Space Shuttle Program structure contributed to the accident in a second important way. Despite the constraints that the agency was under, prior to both accidents NASA appeared to be immersed in a culture of invincibility, in stark contradiction to post-accident reality. The Rogers Commission found a NASA blinded by its “Can-Do” attitude,27 a cultural artifact of the Apollo era that was inappropriate in a Space Shuttle Program so strapped by schedule pressures and shortages that spare parts had to be cannibalized from one vehicle to launch another.28 This can-do attitude bolstered administratorsʼ belief in an achievable launch rate, the belief that they had an operational system, and an unwillingness to listen to outside experts. The Aerospace Safety and Advisory Panel in a 1985 report told NASA that the vehicle was not operational and NASA should stop treating it as if it were.29 The Board found that even after the loss of Challenger, NASA was guilty of treating an experimental vehicle as if it were operational and of not listening to outside experts. In a repeat of the pre-Challenger warning, the 1999 Shuttle Independent Assessment Team report reiterated that “the Shuttle was not an ʻoperationalʼ vehicle in the usual meaning of the term.”30 Engineers and program planners were also affected by “Can-Do,” which, when taken too far, can create a reluctance to say that something cannot be done.
How could the lessons of Challenger have been forgotten so quickly? Again, history was a factor. First, if success is measured by launches and landings,31 the machine appeared to be working successfully prior to both accidents. Challenger was the 25th launch. Seventeen years and 87 missions passed without major incident. Second, previous policy decisions again had an impact. NASAʼs Apollo-era research and development culture and its prized deference to the technical expertise of its working engineers was overridden in the Space Shuttle era by “bureaucratic accountability” – an allegiance to hierarchy, procedure, and following the chain of command.32 Prior to Challenger, the can-do culture was a result not just of years of apparently successful launches, but of the cultural belief that the Shuttle Programʼs many structures, rigorous procedures, and
detailed system of rules were responsible for those successes.33 The Board noted that the pre-Challenger layers of processes, boards, and panels that had produced a false sense of confidence in the system and its level of safety returned in full force prior to Columbia. NASA made many changes to the Space Shuttle Program structure after Challenger. The fact that many changes had been made supported a belief in the safety of the system, the invincibility of organizational and technical systems, and ultimately, a sense that the foam problem was understood.
8.5 HISTORY AS CAUSE: TWO ACCIDENTS
Risk, uncertainty, and history came together when unprecedented circumstances arose prior to both accidents. For Challenger, the weather prediction for launch time the next day was for cold temperatures that were out of the engineering experience base. For Columbia, a large foam hit – also outside the experience base – was discovered after launch. For the first case, all the discussion was pre-launch; for the second, it was post-launch. This initial difference determined the shape these two decision sequences took, the number of people who had information about the problem, and the locations of the involved parties.
For Challenger, engineers at Morton-Thiokol,34 the Solid Rocket Motor contractor in Utah, were concerned about the effect of the unprecedented cold temperatures on the rubber O-rings.35 Because launch was scheduled for the next morning, the new condition required a reassessment of the engineering analysis presented at the Flight Readiness Review two weeks prior. A teleconference began at 8:45 p.m. Eastern Standard Time (EST) that included 34 people in three locations: Morton-Thiokol in Utah, Marshall, and Kennedy. Thiokol engineers were recommending a launch delay. A reconsideration of a Flight Readiness Review risk assessment the night before a launch was as unprecedented as the predicted cold temperatures. With no ground rules or procedures to guide their discussion, the participants automatically reverted to the centralized, hierarchical, tightly structured, and procedure-bound model used in Flight Readiness Reviews. The entire discussion and decision to launch began and ended with this group of 34 engineers. The phone conference linking them together concluded at 11:15 p.m. EST after a decision to accept the risk and fly.
For Columbia, information about the foam debris hit was widely distributed the day after launch. Time allowed for videos of the strike, initial assessments of the size and speed of the foam, and the approximate location of the impact to be dispersed throughout the agency. This was the first debris impact of this magnitude. Engineers at the Marshall, Johnson, Kennedy, and Langley centers showed initiative and jumped on the problem without direction from above. Working groups and e-mail groups formed spontaneously. The size of Johnsonʼs Debris Assessment Team alone neared and in some instances exceeded the total number of participants in the 1986 Challenger teleconference. Rather than a tightly constructed exchange of information completed in a
few hours, time allowed for the development of ideas and free-wheeling discussion among the engineering ranks. The early post-launch discussion among engineers and all later decision-making at management levels were decentralized, loosely organized, and with little form. While the spontaneous and decentralized exchanging of information was evidence that NASAʼs original technical culture was alive and well, the diffuse form and lack of structure in the rest of the proceedings would have several negative consequences.
In both situations, all new information was weighed and interpreted against past experience. Formal categories and cultural beliefs provide a consistent frame of reference in which people view and interpret information and experiences.36 Pre-existing definitions of risk shaped the actions taken and not taken. Worried engineers in 1986 and again in 2003 found it impossible to reverse the Flight Readiness Review risk assessments that foam and O-rings did not pose safety-of-flight concerns. These engineers could not prove that foam strikes and cold temperatures were unsafe, even though the previous analyses that declared them safe had been incomplete and were based on insufficient data and testing. Engineersʼ failed attempts were not just a matter of psychological frames and interpretations. The obstacles these engineers faced were political and organizational. They were rooted in NASA history and the decisions of leaders that had altered NASA culture, structure, and the structure of the safety system and affected the social context of decision-making for both accidents. In the following comparison of these critical decision scenarios for Columbia and Challenger, the systemic problems in the NASA organization are in italics, with the system effects on decision-making following.
NASA had conflicting goals of cost, schedule, and safety. Safety lost out as the mandates of an “operational system” increased the schedule pressure. Scarce resources went to problems that were defined as more serious, rather than to foam strikes or O-ring erosion.
In both situations, upper-level managers and engineering teams working the O-ring and foam strike problems held opposing definitions of risk. This was demonstrated immediately, as engineers reacted with urgency to the immediate safety implications: Thiokol engineers scrambled to put together an engineering assessment for the teleconference, Langley Research Center engineers initiated simulations of landings that were run after hours at Ames Research Center, and Boeing analysts worked through the weekend on the debris impact analysis. But key managers were responding to additional demands of cost and schedule, which competed with their safety concerns. NASAʼs conflicting goals put engineers at a disadvantage before these new situations even arose. In neither case did they have good data as a basis for decision-making. Because both problems had been previously normalized, resources sufficient for testing or hardware were not dedicated. The Space Shuttle Program had not produced good data on the correlation between cold temperature and O-ring resilience or good data on the potential effect of bipod ramp foam debris hits.37
Cultural beliefs about the low risk O-rings and foam debris posed, backed by years of Flight Readiness Review decisions and successful missions, provided a frame of reference against which the engineering analyses were judged. When confronted with the engineering risk assessments, top Shuttle Program managers held to the previous Flight Readiness Review assessments. In the Challenger teleconference, where engineers were recommending that NASA delay the launch, the Marshall Solid Rocket Booster Project manager, Lawrence Mulloy, repeatedly challenged the contractorʼs risk assessment and restated Thiokolʼs engineering
rationale for previous flights.38 STS-107 Mission Management Team Chair Linda Ham made many statements in meetings reiterating her understanding that foam was a maintenance problem and a turnaround issue, not a safety-of-flight issue.
The effects of working as a manager in a culture with a cost/efficiency/safety conflict showed in managerial responses. In both cases, managersʼ techniques focused on the information that tended to support the expected or desired result at that time. In both cases, believing the safety of the mission was not at risk, managers drew conclusions that minimized the risk of delay.39 At one point, Marshallʼs Mulloy, believing in the previous Flight Readiness Review assessments, unconvinced by the engineering analysis, and concerned about the schedule implications of the 53-degree temperature limit on launch the engineers proposed, said, “My God, Thiokol, when do you want me to launch, next April?”40 Reflecting the overall goal of keeping to the Node 2 launch schedule, Hamʼs priority was to avoid the delay of STS-114, the next mission after STS-107. Ham was slated as Manager of Launch Integration for STS-114 – a dual role promoting a conflict of interest and a single-point failure, a situation that should be avoided in all organizational as well as technical systems.
NASAʼs culture of bureaucratic accountability emphasized chain of command, procedure, following the rules, and going by the book. While rules and procedures were essential for coordination, they had an unintended but negative effect. Allegiance to hierarchy and procedure had replaced deference to NASA engineersʼ technical expertise.
In both cases, engineers initially presented concerns as well as possible solutions – a request for images, a recommendation to place temperature constraints on launch. Management did not listen to what their engineers were telling them. Instead, rules and procedures took priority. For Columbia, program managers turned off the Kennedy engineersʼ initial request for Department of Defense imagery, with apologies to Defense Department representatives for not having followed “proper channels.” In addition, NASA administrators asked for and promised corrective action to prevent such a violation of protocol from recurring. Debris Assessment Team analysts at Johnson were asked by managers to demonstrate a “mandatory need” for their imagery request, but were not told how to do that. Both Challenger and Columbia engineering teams were held to the usual quantitative standard of proof. But it was a reverse of the usual circumstance: instead of having to prove it was safe to fly, they were asked to prove that it was unsafe to fly.
In the Challenger teleconference, a key engineering chart presented a qualitative argument about the relationship between cold temperatures and O-ring erosion that engineers were asked to prove. Thiokolʼs Roger Boisjoly said, “I had no data to quantify it. But I did say I knew it was away from goodness in the current data base.”41 Similarly, the Debris Assessment Team was asked to prove that the foam hit was a threat to flight safety, a determination that only the imagery they were requesting could help them make. Ignored by management was the qualitative data that the engineering
teams did have: both instances were outside the experience base. In stark contrast to the requirement that engineers adhere to protocol and hierarchy was managementʼs failure to apply this criterion to their own activities. The Mission Management Team did not meet on a regular schedule during the mission, proceeded in a loose format that allowed informal influence and status differences to shape their decisions, and allowed unchallenged opinions and assumptions to prevail, all the while holding the engineers who were making risk assessments to higher standards. In highly uncertain circumstances, when lives were immediately at risk, management failed to defer to its engineers and failed to recognize that different data standards – qualitative, subjective, and intuitive – and different processes – democratic rather than protocol and chain of command – were more appropriate.
The organizational structure and hierarchy blocked effective communication of technical problems. Signals were overlooked, people were silenced, and useful information and dissenting views on technical issues did not surface at higher levels. What was communicated to parts of the organization was that O-ring erosion and foam debris were not problems.
Structure and hierarchy represent power and status. For both Challenger and Columbia, employeesʼ positions in the organization determined the weight given to their information, by their own judgment and in the eyes of others. As a result, many signals of danger were missed. Relevant information that could have altered the course of events was available but was not presented.
Early in the Challenger teleconference, some engineers who had important information did not speak up. They did not define themselves as qualified because of their position: they were not in an appropriate specialization, had not recently worked the O-ring problem, or did not have access to the “good data” that they assumed others more involved in key discussions would have.42 Geographic locations also resulted in missing signals. At one point, in light of Marshallʼs objections, Thiokol managers in Utah requested an “off-line caucus” to discuss their data. No consensus was reached, so a “management risk decision” was made. Managers voted and engineers did not. Thiokol managers came back on line, saying they had reversed their earlier NO-GO recommendation, decided to accept risk, and would send new engineering charts to back their reversal. When a Marshall administrator asked, “Does anyone have anything to add to this?,” no one spoke. Engineers at Thiokol who still objected to the decision later testified that they were intimidated by management authority, were accustomed to turning their analysis over to managers and letting them decide, and did not have the quantitative data that would empower them to object further.43
In the more decentralized decision process prior to Columbiaʼs re-entry, structure and hierarchy again were responsible for an absence of signals. The initial request for imagery came from the “low status” Kennedy Space Center, bypassed the Mission Management Team, and went directly to the Department of Defense separate from the all-powerful Shuttle Program. By using the Engineering Directorate avenue to request imagery, the Debris Assessment Team was working at the margins of the hierarchy. But some signals were missing even when engineers traversed the appropriate channels. The Mission Management Team Chairʼs position in the hierarchy governed what information she would or would not receive. Information was lost as it traveled up the hierarchy. A demoralized Debris Assessment Team did not include a slide about the need for better imagery in their presentation to the Mission Evaluation Room. Their presentation included the Crater analysis, which they reported as incomplete and
• 26. uncertain. However, the Mission Evaluation Room manager perceived the Boeing analysis as rigorous and quantitative. The choice of headings, arrangement of information, and size of bullets on the key chart served to highlight what management already believed. The uncertainties and assumptions that signaled danger dropped out of the information chain when the Mission Evaluation Room manager condensed the Debris Assessment Teamʼs formal presentation to an informal verbal brief at the Mission Management Team meeting. As what the Board calls an "informal chain of command" began to shape STS-107ʼs outcome, location in the structure empowered some to speak and silenced others. For example, a Thermal Protection System tile expert, who was a member of the Debris Assessment Team but had an office in the more prestigious Shuttle Program, used his personal network to shape the Mission Management Team view and snuff out dissent. The informal hierarchy among and within Centers was also influential. Early identifications of problems by Marshall and Kennedy may have contributed to the Johnson-based Mission Management Teamʼs indifference to concerns about the foam strike. The engineers and managers circulating e-mails at Langley were peripheral to the Shuttle Program, not structurally connected to the proceedings, and
  • 27. therefore of lower status. When asked in a post-accident press conference why they didnʼt voice their concerns to Shuttle Program management, the Langley engineers said that people “need to stick to their expertise.”44 Status mat- tered. In its absence, numbers were the great equalizer. One striking exception: the Debris Assessment Team tile expert was so influential that his word was taken as gospel, though he lacked the requisite expertise, data, or analysis to evaluate damage to RCC. For those with lesser standing, the requirement for data was stringent and inhibiting, which resulted in information that warned of danger not being passed up the chain. As in the teleconference, Debris As- sessment Team engineers did not speak up when the Mission Management Team Chair asked if anyone else had anything to say. Not only did they not have the numbers, they also were intimidated by the Mission Management Team Chairʼs position in the hierarchy and the conclusions she had already made. Debris Assessment Team members signed off on the Crater analysis, even though they had trouble understanding it. They still wanted images of Columbiaʼs left wing. In neither impending crisis did management recognize how structure and hierarchy can silence employees and follow through by polling participants, soliciting dissenting opin- ions, or bringing in outsiders who might have a different perspective or useful information. In perhaps the ultimate example of engineering concerns not making their way upstream, Challenger astronauts were told that the cold tem- perature was not a problem, and Columbia astronauts were told that the foam strike was not a problem. NASA structure changed as roles and responsibilities were transferred to contractors, which increased the dependence on the private sector for safety functions and risk assess- ment while simultaneously reducing the in-house capability
  • 28. to spot safety issues. A critical turning point in both decisions hung on the discus- sion of contractor risk assessments. Although both Thiokol and Boeing engineering assessments were replete with uncertainties, NASA ultimately accepted each. Thiokolʼs initial recommendation against the launch of Challenger was at first criticized by Marshall as flawed and unaccept- able. Thiokol was recommending an unheard-of delay on the eve of a launch, with schedule ramifications and NASA- contractor relationship repercussions. In the Thiokol off-line caucus, a senior vice president who seldom participated in these engineering discussions championed the Marshall engineering rationale for flight. When he told the managers present to “Take off your engineering hat and put on your management hat,” they reversed the position their own engineers had taken.45 Marshall engineers then accepted this assessment, deferring to the expertise of the contractor. NASA was dependent on Thiokol for the risk assessment, but the decision process was affected by the contractorʼs dependence on NASA. Not willing to be responsible for a delay, and swayed by the strength of Marshallʼs argument, the contractor did not act in the best interests of safety. Boeingʼs Crater analysis was performed in the context of the Debris Assessment Team, which was a collaborative effort that included Johnson, United Space Alliance, and Boeing. In this case, the decision process was also affected by NASA ̓ s dependence on the contractor. Unfamiliar with Crater, NASA engineers and managers had to rely on Boeing for interpretation and analysis, and did not have the train- ing necessary to evaluate the results. They accepted Boeing engineers ̓use of Crater to model a debris impact 400 times outside validated limits. NASA s̓ safety system lacked the resources, independence,
  • 29. personnel, and authority to successfully apply alternate per- spectives to developing problems. Overlapping roles and re- sponsibilities across multiple safety offices also undermined the possibility of a reliable system of checks and balances. NASA ̓ s “Silent Safety System” did nothing to alter the deci- sion-making that immediately preceded both accidents. No safety representatives were present during the Challenger teleconference – no one even thought to call them.46 In the case of Columbia, safety representatives were present at Mission Evaluation Room, Mission Management Team, and Debris Assessment Team meetings. However, rather than critically question or actively participate in the analysis, the safety representatives simply listened and concurred. 8.6 CHANGING NASAʼS ORGANIZATIONAL SYSTEM The echoes of Challenger in Columbia identified in this chapter have serious implications. These repeating patterns mean that flawed practices embedded in NASA ̓ s organiza- tional system continued for 20 years and made substantial contributions to both accidents. The Columbia Accident Investigation Board noted the same problems as the Rog- ers Commission. An organization system failure calls for corrective measures that address all relevant levels of the organization, but the Boardʼs investigation shows that for all its cutting-edge technologies, “diving-catch” rescues, and imaginative plans for the technology and the future of space exploration, NASA has shown very little understanding of the inner workings of its own organization. NASA managers believed that the agency had a strong safety culture, but the Board found that the agency had the same conflicting goals that it did before Challenger, when schedule concerns, production pressure, cost-cut-
• 30. ting and a drive for ever-greater efficiency – all the signs of an "operational" enterprise – had eroded NASAʼs ability to assure mission safety. The belief in a safety culture has even less credibility in light of repeated cuts of safety personnel and budgets – also conditions that existed before Challenger. NASA managers stated confidently that everyone was encouraged to speak up about safety issues and that the agency was responsive to those concerns, but the Board found evidence to the contrary in the responses to the Debris Assessment Teamʼs request for imagery, to the initiation of the imagery request from Kennedy Space Center, and to the "we were just ʻwhat-iffingʼ" e-mail concerns that did not reach the Mission Management Team. NASAʼs bureaucratic structure kept important information from reaching engineers and managers alike. The same NASA whose engineers showed initiative and a solid working knowledge of how to get things done fast had a managerial culture with an allegiance to bureaucracy and cost-efficiency that squelched the engineersʼ efforts. When it came to managersʼ own actions, however, a different set of rules prevailed. The Board found that Mission Management Team decision-making operated outside the rules even as it held its engineers to a stifling protocol. Management was not able to recognize
  • 31. that in unprecedented conditions, when lives are on the line, flexibility and democratic process should take priority over bureaucratic response.47 During the Columbia investigation, the Board consistently searched for causal principles that would explain both the technical and organizational system failures. These prin- ciples were needed to explain Columbia and its echoes of Challenger. They were also necessary to provide guidance for NASA. The Boardʼs analysis of organizational causes in Chapters 5, 6, and 7 supports the following principles that should govern the changes in the agencyʼs organizational system. The Boardʼs specific recommendations, based on these principles, are presented in Part Three. Leaders create culture. It is their responsibility to change it. Top administrators must take responsibility for risk, failure, and safety by remaining alert to the effects their decisions have on the system. Leaders are responsible for establishing the conditions that lead to their subordinates ̓ successes or failures. The past decisions of national lead- ers – the White House, Congress, and NASA Headquarters – set the Columbia accident in motion by creating resource and schedule strains that compromised the principles of a high-risk technology organization. The measure of NASA ̓ s success became how much costs were reduced and how ef- ficiently the schedule was met. But the Space Shuttle is not now, nor has it ever been, an operational vehicle. We cannot explore space on a fixed-cost basis. Nevertheless, due to International Space Station needs and scientific experiments that require particular timing and orbits, the Space Shuttle Program seems likely to continue to be schedule-driven. National leadership needs to recognize that NASA must fly only when it is ready. As the White House, Congress, and NASA Headquarters plan the future of human space flight, the goals and the resources required to achieve them safely
  • 32. must be aligned. Changes in organizational structure should be made only with careful consideration of their effect on the system and their possible unintended consequences. Changes that make the organization more complex may create new ways that it can fail.48 When changes are put in place, the risk of error initially increases, as old ways of doing things compete with new. Institutional memory is lost as personnel and records are moved and replaced. Changing the structure of organi- zations is complicated by external political and budgetary constraints, the inability of leaders to conceive of the full ramifications of their actions, the vested interests of insiders, and the failure to learn from the past.49 Nonetheless, changes must be made. The Shuttle Programʼs structure is a source of problems, not just because of the way it impedes the flow of information, but because it has had effects on the culture that contradict safety goals. NASAʼs blind spot is it believes it has a strong safety cul- ture. Program history shows that the loss of a truly indepen- dent, robust capability to protect the systemʼs fundamental requirements and specifications inevitably compromised those requirements, and therefore increased risk. The Shuttle Programʼs structure created power distributions that need new structuring, rules, and management training to restore deference to technical experts, empower engineers to get resources they need, and allow safety concerns to be freely aired. Strategies must increase the clarity, strength, and presence of signals that challenge assumptions about risk. Twice in NASA history, the agency embarked on a slippery slope that resulted in catastrophe. Each decision, taken by itself, seemed correct, routine, and indeed, insignificant and unremarkable.
• 33. Yet in retrospect, the cumulative effect was stunning. In both pre-accident periods, events unfolded over a long time and in small increments rather than in sudden and dramatic occurrences. NASAʼs challenge is to design systems that maximize the clarity of signals, amplify weak signals so they can be tracked, and account for missing signals. For both accidents there were moments when management definitions of risk might have been reversed were it not for the many missing signals – an absence of trend analysis, imagery data not obtained, concerns not voiced, information overlooked or dropped from briefings. A safety team must have equal and independent representation so that managers are not again lulled into complacency by shifting definitions of risk. It is obvious but worth acknowledging that people who are marginal and powerless in organizations may have useful information or opinions that they donʼt express. Even when these people are encouraged to speak, they find it intimidating to contradict a leaderʼs strategy or a group consensus. Extra effort must be made to contribute all relevant information to discussions of risk. These strategies are important for all safety aspects, but especially necessary for ill-structured problems like O-rings and foam debris. Because ill-structured problems are less visible and therefore invite the normalization of deviance, they may be the most risky of all. [Photo caption: Challenger launches on the ill-fated STS-33/51-L mission on January 28, 1986. The Orbiter would be destroyed 73 seconds later.]
  • 34. ENDNOTES FOR CHAPTER 8 The citations that contain a reference to “CAIB document” with CAB or CTF followed by seven to eleven digits, such as CAB001-0010, refer to a document in the Columbia Accident Investigation Board database maintained by the Department of Justice and archived at the National Archives. 1 Turner studied 85 different accidents and disasters, noting a common pattern: each had a long incubation period in which hazards and warning signs prior to the accident were either ignored or misinterpreted. He called these “failures of foresight.” Barry Turner, Man-made Disasters, (London: Wykeham, 1978); Barry Turner and Nick Pidgeon, Man-made Disasters, 2nd ed. (Oxford: Butterworth Heinneman,1997). 2 Changing personnel is a typical response after an organization has some kind of harmful outcome. It has great symbolic value. A change in personnel points to individuals as the cause and removing them gives the false impression that the problems have been solved, leaving unresolved organizational system problems. See Scott Sagan, The Limits of Safety. Princeton: Princeton University Press, 1993. 3 Diane Vaughan, The Challenger Launch Decision: Risky
  • 35. Technology, Culture, and Deviance at NASA (Chicago: University of Chicago Press. 1996). 4 William H. Starbuck and Frances J. Milliken, “Challenger: Fine-tuning the Odds until Something Breaks.” Journal of Management Studies 23 (1988), pp. 319-40. 5 Report of the Presidential Commission on the Space Shuttle Challenger Accident, (Washington: Government Printing Office, 1986), Vol. II, Appendix H. 6 Alex Roland, “The Shuttle: Triumph or Turkey?” Discover, November 1985: pp. 29-49. 7 Report of the Presidential Commission, Vol. I, Ch. 6. 8 Turner, Man-made Disasters. 9 Vaughan, The Challenger Launch Decision, pp. 243-49, 253- 57, 262-64, 350-52, 356-72. 10 Turner, Man-made Disasters. 11 U.S. Congress, House, Investigation of the Challenger Accident, (Washington: Government Printing Office, 1986), pp. 149. 12 Report of the Presidential Commission, Vol. I, p. 148; Vol. IV, p. 1446. 13 Vaughan, The Challenger Launch Decision, p. 235. 14 Report of the Presidential Commission, Vol. I, pp. 1-3.
  • 36. 15 Howard E. McCurdy, “The Decay of NASAʼs Technical Culture,” Space Policy (November 1989), pp. 301-10. 16 Report of the Presidential Commission, Vol. I, pp. 164-177. 17 Report of the Presidential Commission, Vol. I, Ch. VII and VIII. 18 Report of the Presidential Commission, Vol. I, pp. 140. 19 For background on culture in general and engineering culture in particular, see Peter Whalley and Stephen R. Barley, “Technical Work in the Division of Labor: Stalking the Wily Anomaly,” in Stephen R. Barley and Julian Orr (eds.) Between Craft and Science, (Ithaca: Cornell University Press, 1997) pp. 23-53; Gideon Kunda, Engineering Culture: Control and Commitment in a High-Tech Corporation, (Philadelphia: Temple University Press, 1992); Peter Meiksins and James M. Watson, “Professional Autonomy and Organizational Constraint: The Case of Engineers,” Sociological Quarterly 30 (1989), pp. 561-85; Henry Petroski, To Engineer is Human: The Role of Failure in Successful Design (New York: St. Martinʼs, 1985); Edgar Schein. Organization Culture and Leadership, (San Francisco: Jossey-Bass, 1985); John Van Maanen and Stephen R. Barley, “Cultural Organization,” in Peter J. Frost, Larry F. Moore, Meryl Ries Louise, Craig C. Lundberg, and Joanne
  • 37. Martin (eds.) Organization Culture, (Beverly Hills: Sage, 1985). 20 Report of the Presidential Commission, Vol. I, pp. 82-111. 21 Harry McDonald, Report of the Shuttle Independent Assessment Team. 22 Report of the Presidential Commission, Vol. I, pp. 145-148. 23 Vaughan, The Challenger Launch Decision, pp. 257-264. 24 U. S. Congress, House, Investigation of the Challenger Accident, (Washington: Government Printing Office, 1986), pp. 70-71. 25 Report of the Presidential Commission, Vol. I, Ch.VII. 26 Mary Douglas, How Institutions Think (London: Routledge and Kegan Paul, 1987); Michael Burawoy, Manufacturing Consent (Chicago: University of Chicago Press, 1979). 27 Report of the Presidential Commission, Vol. I, pp. 171-173. 28 Report of the Presidential Commission, Vol. I, pp. 173-174. 29 National Aeronautics and Space Administration, Aerospace Safety Advisory Panel, “National Aeronautics and Space Administration Annual Report: Covering Calendar Year 1984,” (Washington: Government Printing Office, 1985). 30 Harry McDonald, Report of the Shuttle Independent Assessment Team. 31 Richard J. Feynman, “Personal Observations on Reliability of the
  • 38. Shuttle,” Report of the Presidential Commission, Appendix F:1. 32 Howard E. McCurdy, “The Decay of NASAʼs Technical Culture,” Space Policy (November 1989), pp. 301-10; See also Howard E. McCurdy, Inside NASA (Baltimore: Johns Hopkins University Press, 1993). 33 Diane Vaughan, “The Trickle-Down Effect: Policy Decisions, Risky Work, and the Challenger Tragedy,” California Management Review, 39, 2, Winter 1997. 34 Morton subsequently sold its propulsion division of Alcoa, and the company is now known as ATK Thiokol Propulsion. 35 Report of the Presidential Commission, pp. 82-118. 36 For discussions of how frames and cultural beliefs shape perceptions, see, e.g., Lee Clarke, “The Disqualification Heuristic: When Do Organizations Misperceive Risk?” in Social Problems and Public Policy, vol. 5, ed. R. Ted Youn and William F. Freudenberg, (Greenwich, CT: JAI, 1993); William Starbuck and Frances Milliken, “Executive Perceptual Filters – What They Notice and How They Make Sense,” in The Executive Effect, Donald C. Hambrick, ed. (Greenwich, CT: JAI Press, 1988); Daniel Kahneman, Paul Slovic, and Amos Tversky, eds. Judgment Under
  • 39. Uncertainty: Heuristics and Biases (Cambridge: Cambridge University Press, 1982); Carol A. Heimer, “Social Structure, Psychology, and the Estimation of Risk.” Annual Review of Sociology 14 (1988): 491-519; Stephen J. Pfohl, Predicting Dangerousness (Lexington, MA: Lexington Books, 1978). 37 Report of the Presidential Commission, Vol. IV: 791; Vaughan, The Challenger Launch Decision, p. 178. 38 Report of the Presidential Commission, Vol. I, pp. 91-92; Vol. IV, p. 612. 39 Report of the Presidential Commission, Vol. I, pp. 164-177; Chapter 6, this Report. 40 Report of the Presidential Commission, Vol. I, p. 90. 41 Report of the Presidential Commission, Vol. IV, pp. 791. For details of teleconference and engineering analysis, see Roger M. Boisjoly, “Ethical Decisions: Morton Thiokol and the Space Shuttle Challenger Disaster,” American Society of Mechanical Engineers, (Boston: 1987), pp. 1-13. 42 Vaughan, The Challenger Launch Decision, pp. 358-361. 43 Report of the Presidential Commission, Vol. I, pp. 88-89, 93. 44 Edward Wong, “E-Mail Writer Says He was Hypothesizing, Not
  • 40. Predicting Disaster,” New York Times, 11 March 2003, Sec. A- 20, Col. 1 (excerpts from press conference, Col. 3). 45 Report of the Presidential Commission, Vol. I, pp. 92-95. 46 Report of the Presidential Commission, Vol. I, p. 152. 47 Weick argues that in a risky situation, people need to learn how to “drop their tools:” learn to recognize when they are in unprecedented situations in which following the rules can be disastrous. See Karl E. Weick, “The Collapse of Sensemaking in Organizations: The Mann Gulch Disaster.” Administrative Science Quarterly 38, 1993, pp. 628-652. 48 Lee Clarke, Mission Improbable: Using Fantasy Documents to Tame Disaster, (Chicago: University of Chicago Press, 1999); Charles Perrow, Normal Accidents, op. cit.; Scott Sagan, The Limits of Safety, op. cit.; Diane Vaughan, “The Dark Side of Organizations,” Annual Review of Sociology, Vol. 25, 1999, pp. 271-305. 49 Typically, after a public failure, the responsible organization makes safety the priority. They sink resources into discovering what went wrong and lessons learned are on everyoneʼs minds. A boost in resources goes to safety to build on those lessons in order to prevent another failure. But concentrating on rebuilding, repair, and safety takes energy
• 41. and resources from other goals. As the crisis ebbs and normal functioning returns, institutional memory grows short. The tendency is then to backslide, as external pressures force a return to operating goals. William R. Freudenberg, "Nothing Recedes Like Success? Risk Analysis and the Organizational Amplification of Risks," Risk: Issues in Health and Safety 3, 1: 1992, pp. 1-35; Richard H. Hall, Organizations: Structures, Processes, and Outcomes, (Prentice-Hall, 1998), pp. 184-204; James G. March, Lee S. Sproull, and Michal Tamuz, "Learning from Samples of One or Fewer," Organization Science, 2, 1: February 1991, pp. 1-13. Understanding the Challenger Disaster: Organizational Structure and the Design of Reliable Systems. Author: C. F. Larry Heimann. Source: The American Political Science Review, Vol. 87, No. 2 (June 1993), pp. 421-435. Published by: American Political Science Association. Stable URL: http://www.jstor.org/stable/2939051
  • 42. Terms & Conditions of Use, available at http://about.jstor.org/terms JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected] Cambridge University Press, American Political Science Association are collaborating with JSTOR to digitize, preserve and extend access to The American Political Science Review This content downloaded from 69.67.124.210 on Tue, 16 Aug 2016 18:48:44 UTC All use subject to http://about.jstor.org/terms American Political Science Review Vol. 87, No. 2 June 1993 UNDERSTANDING THE CHALLENGER DISASTER: ORGANIZATIONAL STRUCTURE AND THE DESIGN OF RELIABLE SYSTEMS C. F. LARRY HEIMANN Michigan State University e destruction of the space shuttle Challenger was a tremendous blow to American space policy. To what extent was this loss the result of organizational
• 43. factors at the National Aeronautics and Space Administration? To discuss this question analytically, we need a theory of organizational reliability and agency behavior. Martin Landau's work on redundancy and administrative performance provides a good starting point for such an effort. Expanding on Landau's work, I formulate a more comprehensive theory of organizational reliability that incorporates both type I and type II errors. These principles are then applied in a study of NASA and its administrative behavior before and after the Challenger accident. On 28 January 1986, the entire nation focused on a single event. Seventy-three seconds after lift-off, the space shuttle Challenger was destroyed in a powerful explosion fifty thousand feet above the Kennedy Space Center. The losses resulting from this catastrophe were quite high. Seven astronauts, including Teacher-in-Space Christa McAuliffe, were killed as the shuttle broke apart and fell into the sea. The shuttle itself had to be replaced at a cost of over two billion dollars. The launch of many important commercial and military satellites had to be delayed as American space policy ground to a complete halt. The accident had a profound impact on the National Aeronautics and Space Administration (NASA), as well. The agency's credibility and its reputation for flawless execution of complex technological tasks were lost, along with the Challenger. To this day, the legacy of the Challenger haunts the decisions of both the agency and its political superiors in Congress and the White House.
  • 44. An examination of the shuttle remnants and other launch data revealed the technical cause of the acci- dent. The shuttle was destroyed when an O-ring seal on the right solid rocket motor failed, allowing the escaping hot gases to burn through and ignite the main fuel tank of liquid hydrogen and liquid oxygen. Although identifying the technical cause of this dis- aster is important, it represents only one aspect of the problem. Perrow (1984) has argued that organiza- tional and technological failures have become so intimately linked that to fully understand the cause of most major accidents, we must analyze both the administrative and technical aspects of the situation. While there have been some administrative critiques of NASA in the wake of this disaster, almost all have centered on issues of bureaucratic culture, such as the agency's propensity to ignore key evidence and its myopic view of its mission. Surprisingly little has been done on a systematic analysis of the NASA organization structure and how it may or may not have contributed to the loss of the Challenger. One reason for this void is that the analytical tool necessary-a comprehensive theory of organizational reliability-is still lacking. Those involved in the debate over structural design have adopted two op- posing stances. The traditional public administration focus on organizational design has involved the pur- suit of efficiency in the sense of minimizing costs for a given level of output. Implicit in this traditional analysis is that the reliable performance of the orga- nization was constant. The critical question, there- fore, has usually been how one might achieve same level of services at a lower cost. As a result, the policy recommendations from this traditional line of think-
  • 45. ing have been to streamline administrative systems and reduce organizational redundancy as much as possible. The work of Martin Landau (1969) on redundancy in organizations was a particularly important contri- bution to this debate, inasmuch as he recognized that administrative reliability is dependent on structural factors. In breaking with conventional wisdom, Landau's landmark 1969 essay, "Redundancy, Ration- ality, and the Problem of Duplication and Overlap," asserted that the critical question was not how to cut the costs of administrative performance but, rather, how to ensure the organization's effectiveness. Landau began by noting that, "no matter how much a part is perfected, there is always the chance that it will fail" (p. 350). As he observed, a streamlined system requires only one part to fail for the entire system to fail. Drawing on concepts from engineering reliability theory, Landau argued that redundancy built into the system can make an organization more reliable than any of its parts. Therefore, Landau concluded, administrative redundancy and duplica- tion can be an important part of effective govern- ment. Some work has been done to extend Landau's initial insights regarding organizational redundancy and administrative reliability, primarily by students and colleagues of Landau. Jon Bendor's (1985) Parallel Systems, originally written as a dissertation under Landau, was the most comprehensive effort to follow up on these ideas. Bendor formalized many of Land- au's concepts and was the first to conduct empirical testing of these propositions, focusing on transporta- tion planning and operations in three American met-
• 46. ropolitan areas. Donald Chisholm's (1989) Coordination without Hierarchy, also written originally as a dissertation under Landau, discussed the notion of reliability and redundancy as it existed in informal organizational structures. Others have extended some of Landau's concepts of reliability to the operations of air traffic controllers and aircraft carrier battle groups (LaPorte and Consolini 1991; Rochlin, LaPorte, and Roberts 1987). The policy prescriptions that follow from these two positions are thus quite different. The traditionalist argument tells us to reduce redundancy and streamline administrative systems whenever possible, whereas Landau advocates increasing redundancy by adding parallel units to the system. Most interesting, however, is the fact that an analysis of the organizational changes at NASA reveals that neither the traditionalists nor Landau are fully correct. Certain parts of the space agency followed a traditionalist policy of streamlining, while other segments of NASA adopted the Landau perspective by generating new parallel linkages. Yet (as I shall show), both these decisions were wrong and contributed to the untimely destruction of the Challenger.
  • 47. How can this be? The reason is a fundamental limitation in the current framework for discussing organizational reliability. Most work in this area has implicitly assumed that there was only one kind of institutional failure and thus only two possible states for organizational performance: the agency either adopted the proper policy or not. But it should be recognized that the latter possibility conceals two different problems: the agency can simply fail to act, or it can adopt an improper policy. Considering the impact of both forms of error is important, because it leads us to a different set of policy prescriptions. An organizational structure that is effective at preventing one type of error may not be equally effective at preventing the other type of error. To date, Bendor alone has formally recognized this distinction; but his analysis was limited (1985, 49-52). I shall extend our understanding of organizational reliability by exploring how each kind of failure is affected by different kinds of administrative struc- tures. I shall lay a foundation for an analysis of multiple forms of administrative failure by describing the principles from engineering reliability theory, then show how different kinds of structures leave the agency vulnerable to different kinds of errors. Apply- ing these arguments to the structure and decision- making process of NASA in the pre- and post- Challenger eras, I shall argue that the disaster had its roots in the structural changes adopted by the space agency in the 1970s and 1980s. Although occasionally cited by students of public administration, Landau's work on institutional per- formance and reliability has received little attention
• 48. from political scientists. One possible reason for this stems from the engineering roots of reliability theory. Political scientists may have considered the issue of organizational redundancy and reliability simply to be a technocratic problem that could be solved by reference to engineering formulas that dictate the appropriate structural form and do not raise political issues. Indeed, if we limit the scope of the problem to two-state devices, this apolitical view of structural design has some merit. But in a system with multiple types of errors, trade-offs between the errors must be made; and this moves us into the realm of politics. Which goals will be embraced? How will resources be allocated to combat each kind of error? Answers to these questions are ultimately political issues. TWO TYPES OF ADMINISTRATIVE ERRORS Before proceeding further, it is important to discuss more thoroughly what is meant by the term policy failure. As just noted, administrative problems often have a richer structure than the simple two-state (operating/failed) model can incorporate. In particular, bureaucracies are often in the position to commit two types of errors: (1) implementation of the wrong policy, an error of commission; and (2) failure to act when action is warranted, an error of omission. If we consider the agency's decision to take action to be comparable to the acceptance of a hypothesis, we can relate these two kinds of failures to the more familiar type I and type II errors often studied in statistics. To establish this link, we must first define the null and alternative hypotheses in terms of potential bu-
• 49. reaucratic action. Since presumption often favors the status quo, let us consider the null hypothesis to be that the agency should not take any new action and the alternative hypothesis to be that the bureau should take such action. Therefore, if the agency chooses to act when it is improper to do so (rejects the null hypothesis when it is true), a type I error has been committed. Likewise, if the bureau fails to act in a situation where it should (accepts the null hypothesis when it is false), then a type II error has been committed. To illustrate the concept of type I and type II errors in organizational systems, let us consider the example of NASA and its decision to launch the space shuttle. The space agency traditionally approaches the launch decision with the assumption that a mission is not safe to fly. Subordinates are then required to prove that such is not the case before the launch is permitted. The null hypothesis, therefore, is that the mission should be aborted. If NASA were to reject the null hypothesis by launching a mission that is actually unsafe, it would be committing a type I error. On the other hand, if NASA decided not to launch a mission that was technologically sound, then it would have committed a type II error. The agency's choices and consequences are summarized in Figure 1. Each form of failure is associated with a different set of costs. By committing a type II error, NASA loses the opportunity to achieve its objectives and wastes time, effort, and materials that could have
• 50. been usefully employed elsewhere. [Figure 1. Summary of NASA Responses and Possible Errors Regarding Launch Decisions. Rows give NASAʼs decision, columns the proper course of action. Launch decided when launch is proper: correct decision, mission successful. Launch decided when abort is proper: Type I error, accident occurs, possible loss of life and/or equipment. Abort decided when launch is proper: Type II error, missed opportunity, wasted resources. Abort decided when abort is proper: correct decision, accident avoided. Null hypothesis: the mission should be aborted.] For example, the shuttle's propellants in the external tank, liquid hydrogen and liquid oxygen, are lost if the mission is scrubbed and have alone been valued at approximately five hundred thousand dollars. Furthermore,
• 51. the agency may forfeit the opportunity to carry out rare scientific research, as was the case when NASA missed the launch date for its ASTRO mission to study Halley's Comet. The Challenger accident clearly demonstrated, however, that a type I failure may be far more costly. The destroyed shuttle was replaced by the Endeavour at an expense of over two billion dollars, and the death of the seven astronauts represents an incalculable loss. Certainly, in the case of NASA, type I errors are associated with greater costs than type II failures. The exact cost trade-off between these types of failure would vary, of course, for different agencies and according to the individual circumstances prevailing at the time of each decision. For this reason, it is important to develop a general approach to the study of organizational reliability that recognizes both types of errors and identifies the consequences associated with them. By recognizing both forms of potential failure, we are better able to understand the nature of the trade-offs demanded. As with hypothesis testing, gains in type I reliability often come at the expense of type II reliability. However, just as it is conceivable to reduce both α and β in hypothesis testing by increasing the sample size, it is possible to increase both type I and type II reliability by raising an agency's resource levels. But because of resource limitations in the real world, it is inevitable that bureaucrats and their political superiors will have to strike a balance between each type of reliability. On the whole, three-state devices allow us to consider both forms of error and thus provide a richer framework for studying the issue of organizational reliability.1
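Before moving on, the two-error framing can be made concrete with a small sketch. The Python below is not from the article; it simply encodes the four outcomes summarized in Figure 1 and compares rough expected costs using the dollar figures quoted in the text (about five hundred thousand dollars of propellant for a scrub versus more than two billion dollars for a lost Orbiter, with the loss of the crew left unpriced). The break-even rule at the end is an illustrative assumption, not NASA's actual launch criterion.

# Illustrative sketch (not from the article): type I vs. type II outcomes for
# the launch decision, using the cost figures quoted in the surrounding text.
OUTCOMES = {
    # (agency decision, proper course of action): outcome
    ("launch", "launch"): "correct decision: mission successful",
    ("launch", "abort"):  "type I error: accident occurs, possible loss of life and equipment",
    ("abort",  "launch"): "type II error: missed opportunity, wasted resources",
    ("abort",  "abort"):  "correct decision: accident avoided",
}

COST_TYPE_I = 2_000_000_000   # replacement Orbiter alone; crew loss not priced
COST_TYPE_II = 500_000        # propellants lost in a scrubbed launch

def expected_costs(p_unsafe: float) -> tuple[float, float]:
    """Expected cost of launching vs. aborting for an assumed probability
    that the mission is actually unsafe. Purely illustrative."""
    launch_cost = p_unsafe * COST_TYPE_I
    abort_cost = (1.0 - p_unsafe) * COST_TYPE_II
    return launch_cost, abort_cost

if __name__ == "__main__":
    print(OUTCOMES[("launch", "abort")])
    for p in (0.0001, 0.001, 0.01):
        launch, abort = expected_costs(p)
        print(f"P(unsafe)={p}: launch ${launch:,.0f} vs. abort ${abort:,.0f}")

With these numbers, launching is the cheaper gamble only when the chance that the vehicle is actually unsafe falls below roughly a few parts in ten thousand, which is one way to see why the agency's formal presumption favors the abort.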
  • 52. BASIC CONCEPTS OF STRUCTURE AND RELIABILITY General Assumptions Throughout this discussion I will be contrasting com- ponent and system reliability. It is important to clarify, in advance, what is meant by these terms. A system is a collection of subunits, known as components, which are linked together in a particular structure. In administrative theory, identifying components is de- pendent on the system level in question. If one were concentrating on the behavior of an agency, then the agency as a whole is the system, and the offices within it are the components. Alternatively, if one were looking at the executive branch as a whole, that would be the system and each agency a component within the system. In an organizational context, de- termining what the components are depends largely on the system with which you are concerned. In this case study of NASA, the system of concern is the agency, and each component represents an office or division inside NASA. Three other assumptions of this work must also be specified. First, I start by assuming that the probabil- ity of failure for each component in the system has already been determined either by testing or past history,2 then, later, relax this assumption, demon- strating that the theoretical framework is valuable even when the exact probabilities of component fail- ure are not known. Making this assumption at the onset, however, is useful for developing the theory; since the components are assumed to be known quantities, the remaining question is how to assemble these components into a reliable network. Second, I
• 53. assume that organizational reliability is static, not dynamic. In the engineering literature, reliability is often treated as time-dependent so as to simulate the breakdown of mechanical components; the probability of such failure naturally increases with age. For administrative systems, however, it is not clear how component reliability would change over time. One might argue that agents become more reliable over time because they have greater experience and expertise with the issues and are thus better able to address them. On the other hand, it could be said that agents are less reliable over time because they are more secure in their positions and lose their incentive to perform well. Additionally, interest-group capture of some public agency may affect the reliability of its performance. While these ideas raise interesting questions, static models of administrative reliability will provide sufficient insight for our needs here. Finally, I assume that the states of all components are statistically independent. In other words, the failure of one component or subsystem does not affect the probability of failure of other components. Those who study public administration may question the
• 54. validity of this assumption. Bendor, however, proves a theorem demonstrating that some degree of component interaction does not negate the general results of reliability theory (1985, 44-49). Therefore, the assumption of component independence is a useful simplifying assumption that does not undercut the generalizability of the results. [Figure 2. Redundant System in Parallel: (a) serial configuration; (b) parallel configuration.] With these assumptions in hand, we can now focus our attention on the structural configurations commonly found in organizations. Types of Organizational Structures There are several basic organizational forms employed in the development of administrative systems. The first is a serial structure, a form often found in organizations. In a traditional serial structure, pictured in Figure 2, the first component must correctly process a policy initiative before sending it on to the next component. To be effective, policy must successfully pass through each of these components. The result is that in order for the system as a whole to fail, it is only necessary for one of the components in series to fail. If any one component were to fail to pass the policy to the next unit, then all the components that followed it would be unable to act, and the policy could not get through the system. Another possible organizational form is the establishment of parallel linkages between components. In a parallel structure, such as the one illustrated in Figure 2, a policy may pass through any one of the components in order to get to the implementation
• 55. stage. It is different from the serial structure inasmuch as even if one or more units fail to pass the policy along, it may still be able to make it through the system. The end result is that for a policy to fail to get through this type of system, all components must fail. One variation of this structural form is the k-out-of-m unit network. In certain cases, we require that a certain number, k, of the m units in an active parallel redundant system must work for the system to be successful. One example of this type of system might be the requirement that at least two of the space shuttle's four major computers be on-line for launch. An organizational application of this system would be an agency director's decision rule not to implement any policy that a majority of the staff cannot agree upon (k = ⌊m/2⌋ + 1). There are two special cases of the k-out-of-m unit network which merit attention. The first instance is where k = 1, in which case we are back to a simple parallel system. The second case is where k = m, which effectively reduces the structure to a serial system. (See Appendix for proof.) In the latter case, although the network is configured as an active parallel system, in terms of reliability it is behaving as a serial structure. To differentiate between this type of system and a traditional serial network, we will classify this type of structure as a serially independent system. This name signifies that while the system is essentially a serial one, its components operate completely independently from each other. In an organizational context, an important distinction between a serially independent system and the
• 56. traditional serial structure is the difference in processing time. In a traditional serial system, the first component must process the information, then pass it on to the next component for processing. This continues until all the components have processed the information. The total processing time of the system is the sum of processing times for each component plus some factor to account for transmission delay between units. In contrast, the serially independent system allows all components to operate simultaneously. The time needed for the serially independent system to complete its task is simply the time it takes the slowest component to finish its operations. So while both structures yield the same level of reliability, serially independent systems require less processing time. Larger organizations often utilize combinations of serial and parallel structural forms. Two examples of this can be seen in Figure 3. In a series-parallel system there are several parallel subsystems linked together in a serial fashion. A parallel-series configuration is the result of several serial substructures combined into a larger parallel network. As these two examples illustrate, most large and complex structures can be decomposed into smaller units for easier analysis. THE ADVANTAGE OF PARALLEL SYSTEMS IN A TWO-STATE WORLD Components aligned in a series configuration are perhaps the easiest systems to analyze, as well as the most commonly encountered. As I noted earlier, in order for the system as a whole to fail, only one of the components in series has to fail. Defining the probability of failure for component i as f_i, a mathematical statement of the reliability of a serial system with m components would be R_serial = ∏_{i=1}^{m} (1 - f_i).
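To make the formula concrete, here is a minimal Python sketch (mine, not the article's) that computes R_serial from a list of component failure probabilities; the example numbers are assumptions chosen to echo the 0.99-reliability curve in Figure 4.

# Illustrative sketch: reliability of a serial structure, R = prod(1 - f_i).
from math import prod

def serial_reliability(failure_probs):
    """Every component must succeed, so multiply the component reliabilities."""
    return prod(1.0 - f for f in failure_probs)

if __name__ == "__main__":
    # Components that each fail 1% of the time (component reliability 0.99).
    for m in (1, 5, 10, 15):
        print(m, round(serial_reliability([0.01] * m), 3))
    # Prints roughly: 1 0.99, 5 0.951, 10 0.904, 15 0.86.
    # Reliability falls steadily as the chain of components grows.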
• 57. [Figure 3. Combinations of Structures with m Units in Series and n Units in Parallel: (a) Series-Parallel Configuration; (b) Parallel-Series Configuration.] We can see that two factors determine the reliability of a series system: component reliability and the number of components in the system. In order to
• 58. increase the reliability of this type of system, one must either increase the reliability of the components or decrease the total number of components employed. As illustrated in Figure 4, marginal gains in system reliability from increasing component performance decrease as component reliability increases. [Figure 4. Declining Reliability of Serial Systems: system reliability versus the number of components in series (1 to 15), plotted for component reliability R = 0.99, 0.98, and 0.95.] Because the costs of increasing component reliability often rise exponentially, it is more effective in many cases to reduce the total number of components in series to reach reliability goals. Perhaps the most common means of increasing reliability in a two-state world is to add parallel components to a system. It is assumed that all branches are active and that a signal needs to pass through only one branch to be successfully transmitted. Since it is assumed that all branches must fail in order for the system to fail, the reliability function for a parallel system with n components is simply R_parallel = 1 - ∏_{i=1}^{n} f_i. Figure 5 illustrates the relationship between component reliability, the number of parallel elements, and overall system reliability. In this case, we see that the
  • 59. 2 ~~~~~~~~0 4.. ~~~~~~~0 * 0.60 o l _ ?0 R=0.95 0.50 R Component Reliability | 0 0 0.40 .............. 1 3 5 7 9 11 13 15 Number of Components In Series marginal gains from adding parallel channels de- crease as the number of parallel components in- creases. In comparing Figures 4 and 5, the attractiveness of parallel systems in a two-state world is clear. Holding component reliability constant, the addition of redun- dancy in a parallel fashion will raise the reliability of the overall system, while creating serial redundancies decreases total system reliability. It is not surprising, therefore, that many scholars in this area have spurned serial systems and focused, instead, on parallel linkages when discussing this issue. Press- man and Wildavsky (1973) have criticized the exist- ence of "multiple clearance points" (a serial system) for an implementive decision because it reduces the likelihood that any policy can ever be executed. Landau (1969) directs the same logic against "stream- lined" serial structures, claiming that such systems are more susceptible to failure than those with paral-
  • 60. Increasing Reliability with Parallel Systems 1.00n 4 0.90 n= 0.80 n=2 cc 0.70 E , 0.60 n= n = number of parallel components 0.50 0.40 I l l l l I l l l 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 Component Reliability 425 This content downloaded from 69.67.124.210 on Tue, 16 Aug 2016 18:48:44 UTC All use subject to http://about.jstor.org/terms Understanding the Challenger Disaster June 1993 lel redundancies. As a result, the major policy pre- scription of the work of Landau (1969, 1973, 1991) and
  • 61. others (Bendor 1985; Lerner 1986) has been to empha- size the use of parallel linkages in order to create more reliable organizations. However, if it is true that serial systems are gener- ally less reliable, why are such structures ever adopted by organizations? To answer this question fully, we must go beyond the two-state world to allow for multiple forms of error. TYPE I AND TYPE II ERRORS IN AN ADMINISTRATIVE SETTING The Advantages of Serial Systems in a Three-State World As we have seen in the two-state world, serial struc- tures are less reliable than others of comparable size. Nonetheless, it is clear that this structural form has been employed repeatedly in the development of bureaucracies. The reason this type of organizational structure has survived, I will suggest, is that it is valuable when we must design systems to accommo- date multiple types of error. I previously assumed that there existed only two states for every compo- nent: good (operating) and failed (not operating). I shall now elaborate a'theory of organizational reliabil- ity that allows for the existence of both type I and type II errors. In general, a series structure is better suited to stop type I errors from occurring. Since the null hypothe- sis is that the agency should not take any action, if any component chooses to pass the proposed policy through its part of the system, then it is rejecting the null hypothesis. For any policy to pass through a
• 62. serial system successfully, all units must agree to pass it along. If rejecting the null hypothesis was actually the incorrect decision (a type I error), then all units must commit such an error for the serial system as a whole to fail. Mathematically, we can identify both the probability of a type I failure occurring in the system (F_α) and the reliability of the system against this error (R_α) as F_α = ∏_{i=1}^{m} α_i and R_α = 1 - ∏_{i=1}^{m} α_i, where α_i is the probability of a type I error occurring in the ith component and m is the number of components linked in series. For an example, consider NASA's launch decision process. We would find that a serial structure is more effective in preventing unsafe launches. In order to approve an unsafe launch in a serial system, every unit must err. The more hurdles are established (via increasing numbers of serial components), the harder it is for an unsafe launch proposal to pass through the system unopposed. With regard to type II errors, however, series structures are less effective. Consider the fact that if one unit accepts the null hypothesis of no new action, then the policy cannot be passed through the rest of
• 63. the serial system. If accepting the null hypothesis was actually incorrect (a type II error), then the whole system would have failed. Thus, in order for a type II error to occur at the system level, only one component in series must fail in this manner for the system to fail. The more components added in series, the greater the probability that a component will commit a type II error, causing the system to fail (F_β). The reliability of a series structure with regard to type II errors (R_β) can be represented as F_β = 1 - ∏_{i=1}^{m} (1 - β_i) and R_β = ∏_{i=1}^{m} (1 - β_i), where β_i is the probability of a type II error occurring in the ith component and m is the number of components linked in series. We may also be interested in the overall reliability of the system, which requires knowing the likelihood that an agency would commit either a type I or type II error. To find this, we assume that for a given event at a given time, it is not possible for an administrative system to commit both a type I and a type II error simultaneously. This assumption is not unreasonable. A type I error requires that the agency act, while a type II error demands that the agency postpone any action. An organization cannot both act and not act at the same time. Consider NASA's decision to launch the shuttle. The space agency can either decide to launch the
• 64. shuttle now (possibly resulting in a type I error) or abort the launch until another time (possibly resulting in a type II error). It cannot decide to both launch and abort at the same time. Given this argument, we can conclude that type I and type II errors are, at any point in time, mutually exclusive events. Therefore, the probability of either a type I or type II error in a serial system can be found as P(F_α ∪ F_β) = P(F_α) + P(F_β) = ∏_{i=1}^{m} α_i + [1 - ∏_{i=1}^{m} (1 - β_i)]. The overall reliability of the serial system is then found to be R_sys = 1 - P(F_α ∪ F_β) = 1 - [∏_{i=1}^{m} α_i + 1 - ∏_{i=1}^{m} (1 - β_i)] = ∏_{i=1}^{m} (1 - β_i) - ∏_{i=1}^{m} α_i.
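Taken together, the formulas above imply the trade-off the article goes on to exploit: lengthening a serial review chain drives the type I failure probability toward zero while the type II failure probability accumulates. The Python sketch below (mine, with assumed per-unit error rates) makes that visible; the peak in overall reliability at an intermediate chain length is a property of the illustrative numbers, not a claim from the article.

# Illustrative sketch: type I, type II, and overall reliability of a serial
# review chain, following the formulas above. Per-unit error rates are assumed.
from math import prod

def system_error_rates(alphas, betas):
    """Return (P(type I), P(type II)) for units linked in series.
    A type I system failure needs every unit to err; a type II failure needs
    only one unit to balk."""
    p_type1 = prod(alphas)                        # F_alpha
    p_type2 = 1.0 - prod(1.0 - b for b in betas)  # F_beta
    return p_type1, p_type2

def overall_reliability(alphas, betas):
    p1, p2 = system_error_rates(alphas, betas)
    return 1.0 - p1 - p2   # the two errors are mutually exclusive at any one decision

if __name__ == "__main__":
    alpha, beta = 0.30, 0.02   # assumed: units often too permissive, rarely too timid
    for m in range(1, 7):
        print(m, round(overall_reliability([alpha] * m, [beta] * m), 4))
    # Prints roughly: 1 0.68, 2 0.8704, 3 0.9142, 4 0.9143, 5 0.9015, 6 0.8851.
    # The first added serial checks help (type I risk collapses), later ones hurt
    # (type II risk accumulates); where to strike the balance is a political choice.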