Extensible Python: Robustness through Addition - PyCon 2024
1. Beyond Band-Aid Solutions:
Proactively Reducing Mishap Risks
Project Management Challenge 2010
Presenters:
Tim Barth, NASA Engineering and Safety Center
Mark Nappi, United Space Alliance
Used with Permission
3. Background
Safety performance in Shuttle ground operations over the history of
the Shuttle Program is commendable
Many safety improvements over the years
We can still raise the bar
KSC is proactively reducing mishap
risks through Shuttle fly-out
Significant numbers of hardware
and software changes/challenges,
process changes/challenges, and
workforce changes/challenges are
happening at the same time
Systems will be stretched,
especially with workforce
challenges
Making KSC organizational
systems and processes more
robust to handle these changes
and challenges reduces the risks
of mishaps, process escapes, and
other adverse events
[Figure: Hardware/Software Systems, Tasks and Processes, and Workers within the Work Environment]
4. Background - continued
“We must challenge our assumptions, recognize
our risks, and address each difficulty directly
and openly so that we can operate more safely
and more successfully than we did yesterday, or
last month, or last year. We must always strive
to be better, and to do better.”
Chris Scolese, Day of Remembrance Memo, Jan. 29, 2009
“Space shuttle safety is not a random event. It is
derived from carefully understanding and then
controlling or mitigating known risks.”
Richard Covey, Florida Today, Jan. 15, 2009
5. Staying on the Cutting Edge of
Investigative Methods and Tools
Mishaps, close-calls, and process escapes are learning
opportunities
Steady evolution of investigative techniques and
capabilities over the past 20 years in Shuttle ground ops
Joint NASA/Contractor Human Factors Team
Perry Committee
Human factors model
Human factors reps on investigation teams
Industrial and Human Engineering Groups
Standing Accident Investigation Boards
Additional investigation teams
White papers
Corrective Action Engineering
Software and experts for root cause analysis
“No one wants to learn by mistakes, but we cannot learn enough
from successes to go beyond the state of the art.”
Henry Petroski, To Engineer Is Human
6. Methodology Development and
Implementation Timeline
Milestones:
Columbia Tragedy: Feb 1, 2003
NESC Established: Nov 1, 2003
KSC Safety Stand-down: March 16, 2006
Shuttle Processing Mishap Study kicked off by PH & USA Mgmt: Aug 2006
KSC Shuttle Processing All-Star Off-site Meeting: June 2008
Shuttle missions:
STS-114 July 26-Aug 9, 2005
STS-121 July 4-17, 2006
STS-115 Sept 9-23, 2006
STS-116 Dec 9-22, 2006
STS-117 June 8-22, 2007
STS-118 Aug 9-21, 2007
STS-120 Oct 23–Nov 6, 2007
STS-122 Feb 7-20, 2008
STS-123 March 11-26, 2008
STS-124 May 31-June 14, 2008
STS-126 Nov 14-30, 2008
STS-119 Mar 15-25, 2009
STS-125 May 11-24, 2009
STS-127 July 15-31, 2009
STS-128 Aug 28-Sept 11, 2009
STS-129 Nov 16-27, 2009
Study phases (timeline axis 2002 to 2009):
Initial Research: Jan 2002 – Feb 2003
Methodology Development & Validation: March 2003 – Aug 2006
Shuttle Processing Mishap Study (mishaps from Feb 2003 – May 2008): Aug 2006 – May 2008
Risk Reduction Action Dev & Process Escape Assessment: June 2008 – Jan 2009
Risk Red. Actions: Sept 2008 – present
“The NESC gains insight into the technical activities of programs/projects
through…systems engineering reviews and independent trend or pattern analyses of
program/project technical problems, technical issues, mishaps, and close calls within
and across programs/projects”
NESC Management Plan, Feb. 2008
7. Fundamental (System-level) and
Symptomatic Solutions
Two “balancing processes”
compete for control of a
problem symptom
Proactive & reactive
Preventive & corrective
Both solutions treat the
symptom, but only the
fundamental solution treats the
system-level issue
Medical analogies: lung cancer,
diabetes
Symptomatic solution
frequently has the side effect
of deferring the fundamental
solution, making it harder to
achieve
Figure from Peter Senge, “Systemic Leadership and Change”
8. Swiss Cheese Model of Defenses
[Figure: Swiss cheese model. Layers: Decision Makers, Line Management / Support Activities, Pre Conditions, Production Activities, Defenses. Other labels: Latent Failures, Active Failures, Individual Human Error, Adverse Event]
Error trajectory passes through
corresponding holes in the layers
of defenses, barriers,
safeguards, and controls
Adapted from James Reason
9. Contributing Factors and Causes
Influence chain assessments focus on
DIRECT CONTRIBUTING FACTORS
and CAUSES
Mishap investigations
focus on CAUSES
[Figure: nested categories. Indirect Contributing FACTORS; Direct Contributing FACTORS; CAUSES (Contributing Causes, Proximate Causes, Root or Probable Cause(s))]
10. Influence Chain Mapping Methodology
Specifically designed to step back from individual
mishaps to evaluate trends and patterns in
contributing factors/causes in order to identify the
most significant system-level safety issues
Shuttle mishap “recurring cause” study
Complements (does not replace) root cause
analysis methods
Explicitly models the influences between
organizational systems and individual behaviors
of front-line workers
Emphasizes absent barriers/controls in addition
to failed barriers/controls
11. Dual Role Model for Addressing
System-Level Safety Issues
12. Dual Role Taxonomy of
Contributing Factors and Causes
Control System Factors
Dual Role Factors
Local Resource Factors
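One way to picture an influence chain map is as a small directed graph whose nodes are contributing factors assigned to one of the three tiers above. The sketch below is only an illustration under that assumption: the class name `InfluenceChainMap`, its methods, the factor ids, and the links between them are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, field

# Tier names follow the taxonomy slide; everything else is illustrative.
TIERS = ("Control System", "Dual Role", "Local Resource")


@dataclass
class InfluenceChainMap:
    factors: dict = field(default_factory=dict)  # factor id -> tier name
    links: list = field(default_factory=list)    # (influencing id, influenced id)

    def add_factor(self, factor_id, tier):
        if tier not in TIERS:
            raise ValueError(f"unknown tier: {tier}")
        self.factors[factor_id] = tier

    def add_link(self, source, target):
        if source not in self.factors or target not in self.factors:
            raise KeyError("add both factors before linking them")
        self.links.append((source, target))

    def chain_from(self, start):
        """Follow the first outgoing influence link from each factor."""
        chain, seen = [start], {start}
        while True:
            following = [t for s, t in self.links
                         if s == chain[-1] and t not in seen]
            if not following:
                return chain
            chain.append(following[0])
            seen.add(following[0])


# Hypothetical chain: a control system factor influences a dual role
# factor, which in turn influences a local resource factor.
ic_map = InfluenceChainMap()
ic_map.add_factor("1a", "Control System")
ic_map.add_factor("1b", "Dual Role")
ic_map.add_factor("1c", "Local Resource")
ic_map.add_link("1a", "1b")
ic_map.add_link("1b", "1c")
print(ic_map.chain_from("1a"))  # ['1a', '1b', '1c']
```

Storing links separately from factors keeps absent or failed barriers easy to add later as extra node attributes without changing how chains are traced.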
16. Mobile Crane Mishap
Completed Influence Chain (IC) Map
SUMMARY
- 3 influence chains (major issues)
- 9 contributing factors
[Figure: IC map placing contributing factors 1a, 2a, 3a, 1b, 2b, 3b, 1c, 2c, and 3c across the Control System, Dual Role, and Local Resource Factors tiers]
Key: IC Contributing Factor; IC Influence Link
17. “Swiss Cheese” Model for Mobile Crane Mishap
[Figure: Swiss cheese layers ending in the adverse event "Crane Impact with Facility". Labels recovered from the figure: Design & Dev Systems, Procedure Design, Support Equip Design, Enabling Systems, Schedule Controls, Senior Leadership, Operational Support, Training Systems, Procedures, Information, Task Team, Supervision, Team Comm, Feedback, Quality Control, Individuals, Cognitive Factors, Emotional Factors, Material Resources & Work Environment, Incomplete Procedures, Support Equip]
18. Mobile Crane Mishap
Influence Chain Cont. Factors + SAIB Findings + SAIB Recommendations
[Figure: IC map overlaid with SAIB findings F1.1, F1.2, F2.1, F2.2, F2.3, and F3.1 across the Control System, Dual Role, and Local Resource Factors tiers]
2 IA cf’s F1.2-through corrective action link
Key: IC Contributing Factor; IC Influence Link; SAIB Contributing Factor; SAIB Corrective Action; SAIB Corr. Action Link
19. Event-Specific Risk Reduction
Recommendations
From a human-system integration
perspective, a vulnerable system
enables workers to make unintentional
errors and/or cause collateral damage
A well designed (robust or resistant)
system enables workers to avoid
errors and collateral damage
Recommendation types, from Least Effective to Most Effective:
vulnerable: Accept
average: Inspect, Warn, Train, Add Procedure Details
resistant: Design, Guard, Provide System Feedback
“To err is human, but errors can be prevented.”
National Institute of Medicine
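The effectiveness scale on this slide can be read as an ordering over recommendation types. A minimal sketch of that reading follows; the grouping of action types into the three levels is an interpretation of the slide, and the names `Effectiveness`, `ACTION_LEVELS`, and `rank_recommendations`, as well as the sample recommendations, are hypothetical.

```python
from enum import IntEnum


class Effectiveness(IntEnum):
    VULNERABLE = 1  # least effective
    AVERAGE = 2
    RESISTANT = 3   # most effective


# Assumed reading of the slide's scale; the grouping is an interpretation.
ACTION_LEVELS = {
    "accept": Effectiveness.VULNERABLE,
    "inspect": Effectiveness.AVERAGE,
    "warn": Effectiveness.AVERAGE,
    "train": Effectiveness.AVERAGE,
    "add procedure details": Effectiveness.AVERAGE,
    "design": Effectiveness.RESISTANT,
    "guard": Effectiveness.RESISTANT,
    "provide system feedback": Effectiveness.RESISTANT,
}


def rank_recommendations(recommendations):
    """Sort (description, action type) pairs from most to least effective."""
    return sorted(recommendations,
                  key=lambda rec: ACTION_LEVELS[rec[1]],
                  reverse=True)


# Hypothetical recommendations, not taken from any SAIB report.
candidates = [
    ("Add a caution note to the procedure", "add procedure details"),
    ("Accept the residual risk", "accept"),
    ("Install a physical guard", "guard"),
]
ranked = rank_recommendations(candidates)
print([description for description, _ in ranked])
```

Ranking this way makes it visible when a set of corrective actions leans on the weaker end of the scale, such as training and warnings, rather than on design changes.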
20. Event-Specific Recommendations
Examples
Crane Boom Impact with VAB Structure
(04/20/04)
Install a sensor system with beepers and/or
flashing lights on the mobile cranes that are
activated when the cranes are moved in the
destowed position, similar to backup
beepers on trucks.
Freedom Star
Retrieval Ship
Frustum Incident
(12/10/06)
Replace the polyester
straps used to secure
the frustum to the deck.
Consider using steel
cables and the
frustum’s cable attach
points used for VAB
stacking operations.
21. Event-Specific Recommendations
Additional Examples
OPF HB1 Platform
System Leak onto
OV-104 RH OMS
Pod TPS (10/26/04)
Re-implement torque
stripe requirements for
facility KC/AA fittings.
Crane Overturned on Pad B
Surface (01/31/05)
Install tire pressure indicators
in crane tires to provide visual
indications of low tire pressure
and potential instability issues.
22. “Recurring Cause” Summary
Influence chain assessments were completed
for over 60% of Standing Accident
Investigation Board (SAIB) reports from
February 2003 through May 2008
Observed similar trends and patterns in
contributing factors/causes to process
escapes and process catches from August
2008 through January 2009
Results of aggregate data analysis were used
to formulate system-level risk reduction
actions
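The aggregate step described here amounts to counting how often each contributing factor recurs across individual assessments. The sketch below shows one way that count could be done; the report ids and factor assignments are made up for illustration and are not study data, and the function names and the `min_reports` threshold are assumptions.

```python
from collections import Counter

# Made-up reports for illustration only; these are not study data.
reports = {
    "report-A": {"Supervision", "Procedure Design", "Team Comm"},
    "report-B": {"Supervision", "Schedule Controls"},
    "report-C": {"Supervision", "Procedure Design"},
}


def factor_frequency(reports):
    """Count how many reports cite each contributing factor."""
    counts = Counter()
    for factors in reports.values():
        counts.update(factors)  # a set counts each factor once per report
    return counts


def recurring_factors(reports, min_reports=2):
    """Return factors cited in at least min_reports reports, most frequent first."""
    counts = factor_frequency(reports)
    recurring = [(f, n) for f, n in counts.items() if n >= min_reports]
    return sorted(recurring, key=lambda item: (-item[1], item[0]))


print(recurring_factors(reports))
# [('Supervision', 3), ('Procedure Design', 2)]
```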
23. Aggregate Data Analysis Results
“Top 8” Proactive Risk Reduction Opportunities
Major Factors in Analysis:
• Control System or Dual Role Factor
• Non-design issue
• Frequency of occurrence
• Frequency unaddressed by SAIB
• Part of influence chains
• Emerging risk area
[Figure: “Top 8” opportunities marked across the Control System, Dual Role, and Local Resource Factors tiers]
Key: Proactive risk reduction opportunity for Shuttle ground operations
24. Development of System-Level
Risk Reduction Actions
Selected Shuttle processing “all-stars” developed
recommendations for actions focused on buying
down the risk of mishaps and process escapes
Recognized leaders from Engineering, Shop, and
Operations organizations in different facilities
Reviewed the data and applied their knowledge of
operational practices
Some recommendations were not practical to
implement at this point in the Shuttle Program
Results presented to Ground Operations Steering
Committee
Multiple iterations of risk reduction action plans
26. Performance Self-Assessments
Designed to stimulate a two-way conversation
between supervisors and employees to identify:
What went well (recognize successes)
Opportunities for improvement (identify and manage
risk)
Positive behaviors (reinforce and encourage)
Similar to post task de-briefings
Minimum 1x/month
Listen and learn: “together we’re smarter and
safer”
27. “Do Not Use or Operate” Tags
Visual operational
constraint system to alert
and inform personnel of the
following conditions:
Out of configuration hardware
with potential to be forgotten
In-process work unattended
for more than one shift
Replaces an ad hoc system
(tape) for stationary GSE
panel set-ups and portable
GSE
OSHA lock-out tag-out (LOTO)
Operating procedure
released
28. Systems Training for Loaned
Personnel
A new process to reduce risks
of mishaps associated with
personnel loaned to other
facilities or Programs
Flight systems, unique facility
systems, and GSE
The need for support and the
applicable skills are matched to
a group capabilities model
Identifies requisite skills and
provides management the
opportunity to assure any deltas
to equivalent training are
addressed before work begins
Prior to returning to the home
department, the employee
receives notification to review
current policies/practices
29. Risk Assessment Enhancements
Ground Operations Risk Assessment
(GORA) performed for any first-time or
infrequent task, unplanned task
(especially unplanned work performed in
previously closed out work areas),
troubleshooting, hazardous jobs, or tasks
with unusual test assemblies/setups
Scope of each Process Failure Modes and
Effects Analysis (PFMEA) and GORA
includes pre-ops and close-out
inspections
Require an assessment of similar
operations for associated mishaps or
process escapes
Technician and human factors
engineering support
Team members communicate identified
risks
NESC support to KSC Risk Review Board
[Images: STS 117, STS 124, STS 128]
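The GORA trigger conditions listed on this slide can be sketched as a simple rule check. This is an illustration only: the `Task` fields and the functions `gora_triggers` and `requires_gora` are hypothetical names, and the actual KSC process is not described in enough detail here to model it exactly.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    first_time: bool = False
    infrequent: bool = False
    unplanned: bool = False
    troubleshooting: bool = False
    hazardous: bool = False
    unusual_setup: bool = False  # unusual test assemblies/setups


def gora_triggers(task):
    """Return the names of the trigger conditions that apply to the task."""
    conditions = ("first_time", "infrequent", "unplanned",
                  "troubleshooting", "hazardous", "unusual_setup")
    return [c for c in conditions if getattr(task, c)]


def requires_gora(task):
    """A task needs a GORA if any trigger condition applies."""
    return bool(gora_triggers(task))


# Hypothetical tasks for illustration.
routine = Task("Routine panel inspection")
rework = Task("Unplanned rework in closed-out area",
              unplanned=True, hazardous=True)
print(requires_gora(routine), gora_triggers(rework))
```

Returning the list of matched conditions, rather than only a yes/no answer, lets the team communicate which risks prompted the assessment.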
30. Problem Resolution Center and
Flow Management Workshop
Problem Resolution Center deploys floor
engineers to "hot spots" to help resolve
technical and scheduling issues real-time
Roving troubleshooters
Joint NASA/USA Flow Management Workshop
addressed the following issues (what to do,
what NOT to do):
Workload vs. right resources
Constellation and Shuttle co-existence
Uncertainty
Critical skills and sharing resources
Maintaining focus and attention to detail
31. Crucial Conversations Training
Communication skills training
to increase trust and dialog
during Shuttle fly-out and
transition
Focus is on making it safe to
talk about anything by creating
mutual purpose and mutual
respect
USA Ground Operations and
NASA Shuttle Processing
managers and supervisors
have received training
32. Summary
Proactive risk reduction efforts will continue
through Shuttle fly-out
Influence chain methodology complements root
cause analysis efforts
Study results have also been applied to
Constellation systems
Human factors engineering pathfinder for GSE
designers
Ground support equipment (GSE) design reviews
Ground operations planning and operability
enhancements
Orion assembly and processing
“Complex systems sometimes fail in complex ways.
Sometimes you have to work pretty hard to pin down
those complex failure mechanisms. But if you can do
that, you will have done the system a great service.”
Admiral Gehman, Chair of the Columbia Accident Investigation Board