Sean carter dan_deans

1
Constellation Reliability Engineering Process – Optimizing CxP Risk
Sean Carter, NASA JSC
Daniel Deans, ManTech SRS Technologies
Used with Permission

2
DFRAM Overview
 Why does reliability engineering exist?
 How does it fit within the life cycle?
 Success space vs. failure space
 Partnership on system engineering team
 The value of “designing-out” failure modes
 Where does it fit in the lifecycle?
 What are some of the tools?
 How are they applied?
 Real examples

3
Reliability Engineering Value - Clichés
 Failure is not an option…
 A design engineer does not know what he does not know
 An extra set of eyes and ears is always good
 You have to spend money to make money
 Mr. Murphy tends to rear his ugly head when you are not expecting it…
 What all this means is: You have to work at it – nothing worth accomplishing comes easy
 Reliability engineering is a discipline that adds value to the systems engineering process!

4
Typical System Engineering Lifecycle

5
Reliability Engineering Throughout Project Life

6
The Life Cycle Approach
 Reliability is best designed-in;
it is, for the most part, not:
 Analyzed in
 Tested in
 Operated in
 Successful reliability performance
begins with a diligent, intentional
approach at the very beginning of a project
 Pre-phase A: requirements
 Phase A: allocation; plan; resources
 Phase B: analysis, design input, preliminary design review
 Phase C: detailed design inputs; more analysis; trade studies;
design verification; critical design review
 Phase D: test planning, test readiness, manufacturing, final
validation; flight readiness review
 Phase E/F: ops, growth, disposal and lessons learned
[Figure: systems engineering lifecycle diagram.
Block labels: System Engineering; Test and Assessment; System Concept Exploration; Preliminary Design; Design Synthesis; Component Fabrication, Assembly, Integrate, & Test; Element Integration & Test; System Integration Test; System Element Data Reduction and Assessment; Requirements Compliance; Configuration Management; Project Direction, Control, & Planning; Risk Management; System Analysis; Project Direction and Control.
Callout items: System, Element, Subsystem Models; System Performance Analyses; Specifications; Verification; Management Plan; Budget Development & Control; Project Plan Development; Schedule Development & Control; Design Data Base; Problem/Failure Reports (PFR); Engineering Change Orders; Risk Planning; Risk Assessment; Risk Handling/Mitigation; Risk Monitoring.]

7
Success Space vs. Failure Space
 A design engineer thinks in success space (typically)
 How will the widget work?
 When it is designed, what function will it perform?
 What are the performance requirements?
 A reliability engineer is paid to think in failure space
 How will the widget fail?
 What about the operating environment will cause issues?
 What materials, processes, and tools will accentuate failure modes?
 Is redundancy required?
 Are there operational work-arounds?
 How will faults propagate through the system?
 What are the effects of a failure mode on the mission?
 Superimpose the two processes and you get success!

8
Credibility: Partnership on
System Engineering Team
 Safety and Mission Assurance organization provides
discipline experts to support design teams
 Our job is to serve; not to inhibit
 We help the system engineering teams identify
hazards and failure modes and design them out
 Our sole reason for existing is to ensure
project/program success and to reduce/eliminate
operational risk
 We are partners for success
 The aim in partnership is to duplicate our knowledge
in the collective heads of our design-team partners

9
The Value of “Designing-Out” Failure Modes
 A failure mode is an obstacle to mission success
 Not every failure mode will cause mission failure, but any
component failure has the potential to
 In the commercial world, a failure in the field costs 10 times
what it costs to mitigate in the design process
 In the space business, a failure can and will cost the
mission and quite possibly endanger people
 Identifying and designing-out failure modes is important!

10
How Do We Design Out Failure Modes?
 Methodical process; starts in pre-phase A, follows the lifecycle.
 DMEDI – Define, Measure, Explore, Develop, Implement
(12 steps)
 Define requirements
 Allocate requirements
 Plan activities and analysis, including test and verification
 Collect data and develop data sources
 Use RAM simulation, FMEA, FTA, worst case analysis, derating,
proven design practices to drive the design
 Support design reviews and require improvement
 Verify and ensure that design will meet requirements
 Plan and implement thorough testing
 Finalize verification, ascertain flight readiness
 Identify reliability growth opportunities once design is complete
 Investigate and eliminate root causes to anomalies
 Develop lessons learned, provide feedback to future engineering teams

11
Pre-Phase A Concept
Development
 Very important part of process –
DFRAM starts here
 Develop requirements that will
optimize RAM for program/project
 Requirements include availability,
mean time to failure, fault tolerance,
mean time to repair, time to replace
 Import lessons learned from similar
programs/systems
 Collect similar system failure history
data
 Begin development of system model
 Begin development of RAM Plan
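As a rough illustration of how the requirement types listed above relate to each other, steady-state (inherent) availability is commonly computed as MTTF / (MTTF + MTTR). The function name and the numbers below are hypothetical and are not Constellation requirements.

```python
def inherent_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a repairable item: uptime / (uptime + downtime)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical values: 990 h mean time to failure, 10 h mean time to repair.
availability = inherent_availability(990.0, 10.0)
print(f"Inherent availability: {availability:.3f}")
```

A requirement written as an availability target can be traded this way against failure-rate and repair-time targets before any design exists.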

12
Phase A: Preliminary Analysis
 Refine requirements, negotiate
allocations with design elements
 Finalize RAM Plan and educate the design
team on the process and the role the reliability
engineering team will fill
 Continue to develop preliminary model;
begin FMEAs, FTAs, probabilistic
assessments
 Allocate requirements to lowest
design-to level
 Negotiate failure definitions, failure
budgets with design teams
 Identify initial critical items, compare with
lessons learned from previous systems
 Continue to identify data sources
 Identify critical suppliers; begin to form
partnerships
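One simple way to picture the allocation step above is a weighted apportionment for a series system, where the element reliabilities must multiply back to the system target. This is only a sketch: the slides do not say which allocation method was used, and the element names and weights are hypothetical.

```python
import math


def allocate_reliability(system_target: float, weights: dict[str, float]) -> dict[str, float]:
    """Split a series-system reliability target so the element product equals the target.

    Each element gets target ** (w / sum(w)); a larger weight means a larger share
    of the failure budget, that is, a lower allocated reliability.
    """
    if not 0.0 < system_target <= 1.0:
        raise ValueError("system_target must be in (0, 1]")
    total = sum(weights.values())
    return {name: system_target ** (w / total) for name, w in weights.items()}


# Hypothetical elements and complexity weights (illustrative only).
allocation = allocate_reliability(0.95, {"avionics": 2.0, "propulsion": 3.0, "structure": 1.0})
for name, value in allocation.items():
    print(f"{name}: {value:.4f}")
print(f"product: {math.prod(allocation.values()):.4f}")
```

The weights are the negotiable part: they stand in for the failure budgets agreed with each design team.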

13
Phase B – Preliminary Design
 Continue to build simulation (model) and
add more details
 Identify the most effective analysis tools to use
to drive design
 Complete preliminary FMEA, FTA, PRA
 Continue to develop supplier partnerships
 Prepare for preliminary design review
 Perform maintenance task analysis
 Identify design improvement initiatives and
optimize using simulation
 Perform other sensitivity studies based on
fault tolerance requirements
 Begin developing and finalizing FRACAS,
test plans, reliability growth strategy
 Partner with designers to identify failure
modes, design them out
 Support concept of operations optimization
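A sensitivity study on fault tolerance can start from something as small as the k-out-of-n formula below. It is a sketch that assumes identical, independent units (no common-cause failures), which real systems rarely satisfy exactly; the unit reliability is a made-up number.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, unit_reliability: float) -> float:
    """Probability that at least k of n identical, independent units survive."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    p = unit_reliability
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical unit reliability of 0.90 over the mission.
for n in (1, 2, 3):
    print(f"1-out-of-{n}: {k_out_of_n_reliability(1, n, 0.90):.4f}")
```

Comparing the rows shows how much each added redundant unit buys, which is the kind of question a fault tolerance requirement raises.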

14
Phase C – Detailed Design
 Perform detailed design analysis – PDR recovery
 Focus on Pareto items identified from analyses (Top 10)
 Continue to develop and use RAM simulation, FMEA,
FTA, etc. to design out failure modes
 Use Con-Ops to develop operational work-arounds as
failure mode mitigation
 Finalize test plans – review for reliability success criteria
 Audit suppliers, provide support for reliability
improvement
 Mitigate schedule risks
 Finalize critical items, document for testing
 Begin life testing of components and subsystems as
feasible
 Perform specialized analysis (sneak circuit analysis, fault propagation)
 Prepare for and support CDR
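The focus on Pareto items above can be sketched as a simple ranking: sort contributors from largest to smallest and keep adding them until they cover a chosen share of the total. The item names, the downtime figures, and the 80% threshold are all hypothetical.

```python
def pareto_top_contributors(contributions: dict[str, float], share: float = 0.8) -> list[str]:
    """Return the largest contributors, in order, until `share` of the total is covered."""
    total = sum(contributions.values())
    selected: list[str] = []
    running = 0.0
    for name, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        selected.append(name)
        running += value
    return selected


# Hypothetical downtime hours attributed to each item by an analysis run.
downtime = {"pump": 55.0, "valve": 30.0, "sensor": 8.0, "harness": 4.0, "bracket": 3.0}
print(pareto_top_contributors(downtime))
```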

15
Phase D – Development
 Finalize design - CDR recovery, cut into
manufacturing
 Finalize FMEAs, FTAs, Simulations, CILs
 Support testing, root cause
investigations and corrective action
 Begin collection of failure and
operational history data (upon first
application of power)
 Finalize reliability growth strategy
 Develop and begin implementation of
reliability-centered maintenance
approach
 Make “last minute” improvements based
on test results
 Identify lessons learned and document
 Update Con-Ops with operational work-arounds
for critical items

16
Phase E/F – Ops and Disposal
 Continue to gather data, monitor
operations for anomalies
 Support failure analyses, root cause
investigations
 Implement reliability growth process,
identify areas for growth, design
solutions
 Document lessons learned
 Use simulation to validate reliability
growth strategy, sensitivities
 Update RAM Plan with lessons
learned
 Support system disposal via
identification of reliability challenges
to shutdown
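One common way to track reliability growth from operational data is the Duane model, in which cumulative MTBF plotted against cumulative operating time is roughly a straight line on log-log axes. The slides do not say which growth model the team used, so the sketch below is an assumption, and the failure times are invented.

```python
import math


def duane_growth_slope(failure_times: list[float]) -> float:
    """Least-squares slope of log(cumulative MTBF) versus log(cumulative time).

    `failure_times` are cumulative operating hours at each failure, in ascending order.
    A positive slope means failures are arriving more slowly over time.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failure times")
    xs = [math.log(t) for t in failure_times]
    ys = [math.log(t / (i + 1)) for i, t in enumerate(failure_times)]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical cumulative hours at which failures were recorded.
slope = duane_growth_slope([35.0, 110.0, 250.0, 480.0, 900.0])
print(f"Growth slope: {slope:.2f}")
```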

17
What are the Tools?
 Some of the tools that we use are:
 Requirements allocation
 RAM simulation/probabilistic risk assessment
 FMEA/FMECA
 Fault tree analysis (FTA)/event tree assessment
 Parts stress analysis/derating
 Detailed design analysis
 Worst case analysis
 Redundancy screens
 Extensive testing and verification analysis
 Reliability growth planning and implementation
 Others….
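To make one of these tools concrete, the arithmetic behind a small fault tree can be sketched with two gate functions. The sketch assumes independent basic events; the tree and its probabilities are hypothetical and do not come from the presentation.

```python
from math import prod


def and_gate(probabilities: list[float]) -> float:
    """Output event occurs only if every independent input event occurs."""
    return prod(probabilities)


def or_gate(probabilities: list[float]) -> float:
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: loss of cooling = (pump A fails AND pump B fails) OR controller fails.
both_pumps_fail = and_gate([0.01, 0.01])
loss_of_cooling = or_gate([both_pumps_fail, 0.001])
print(f"P(loss of cooling) = {loss_of_cooling:.6f}")
```

In this made-up tree the single controller contributes far more to the top event than the redundant pumps, which is the sort of finding that points at where a failure mode should be designed out.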

18
Reliability and Maintainability Simulation
 A very powerful process
 Can help design out failure modes without cutting metal
 Supports applying the Pareto Principle (80/20)
 Gives design team a tool for sensitivity analysis
 Allows for trying many different scenarios
 Helps to optimize the return on investment based on the
cost-to-improve curve
[Figure: reliability versus cost-to-improve curve. Axes: $ Cost, Reliability. Regions: high rate of return; area of diminishing return. KITC = point on the curve where rise becomes less than run (reliability improvement = rise, cost to improve = run).]
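The KITC definition above (rise becomes less than run) can be turned into a small search over sampled points on the curve. This is a sketch under one stated assumption: both axes are scaled to 0..1 so that rise and run are comparable, which the slide does not specify. The data points are hypothetical.

```python
def find_knee_index(costs: list[float], reliabilities: list[float]) -> int:
    """Index of the first point after which normalized rise is less than normalized run."""
    if len(costs) != len(reliabilities) or len(costs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    cost_span = costs[-1] - costs[0]
    rel_span = reliabilities[-1] - reliabilities[0]
    if cost_span <= 0 or rel_span <= 0:
        raise ValueError("cost and reliability must both increase overall")
    for i in range(len(costs) - 1):
        run = (costs[i + 1] - costs[i]) / cost_span
        rise = (reliabilities[i + 1] - reliabilities[i]) / rel_span
        if rise < run:
            return i
    return len(costs) - 1


# Hypothetical curve: cost in arbitrary units, reliability from simulation runs.
costs = [0.0, 1.0, 2.0, 3.0, 4.0]
reliabilities = [0.80, 0.90, 0.95, 0.97, 0.98]
knee = find_knee_index(costs, reliabilities)
print(f"Knee at cost {costs[knee]}, reliability {reliabilities[knee]}")
```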

19
Simulation Basics
 Simulations are built based on the system architecture
 Model provides for “RAM” characteristics of system
 Input data includes failure rates, repair times, sparing
information, logistics information, operational work-arounds
 Simulation is run based on mission profiles
 “Monte Carlo” methodology is used
 Typically data is input using statistical distributions
 Outputs are system availability and cutsets (and other
failure “illuminators”)
 Cutsets lead to sensitivity analyses which in turn can
drive improvements (failure mode elimination)
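The basics above can be illustrated with a deliberately tiny Monte Carlo model: a single repairable unit run over a mission profile many times. It is a sketch, not the tool used by the authors. The exponential distributions, the parameter values, and the function name are assumptions made for illustration.

```python
import random


def simulate_availability(mttf: float, mttr: float, mission_hours: float,
                          runs: int = 2000, seed: int = 1) -> float:
    """Estimate average availability of one repairable unit by Monte Carlo.

    Times to failure and to repair are drawn from exponential distributions
    (an assumed choice; real models use whatever distributions fit the data).
    """
    rng = random.Random(seed)
    total_up = 0.0
    for _ in range(runs):
        clock = 0.0
        up = 0.0
        while clock < mission_hours:
            time_to_failure = rng.expovariate(1.0 / mttf)
            up += min(time_to_failure, mission_hours - clock)
            clock += time_to_failure
            if clock >= mission_hours:
                break
            clock += rng.expovariate(1.0 / mttr)  # unit is down while being repaired
        total_up += up
    return total_up / (runs * mission_hours)


# Hypothetical inputs: 100 h MTTF, 10 h MTTR, 1000 h mission.
estimate = simulate_availability(100.0, 10.0, 1000.0)
print(f"Estimated availability: {estimate:.3f}")
```

A real RAM simulation adds the system architecture, sparing, logistics delays, and operational work-arounds on top of this loop, and reports cutsets as well as availability.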

20
RAM Simulation Example
 Simulation is dynamic, not static analysis
 Can provide much information about overall availability
of system under many different sets of conditions
 Today’s tools can include operational concepts and
rules, optimization of spares (some automatic)
 Requires specific input data
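Spares optimization, mentioned above, can be approximated by hand for a single item if failures are assumed to arrive at a constant rate, so that demand for spares follows a Poisson distribution. That assumption, the function names, and the numbers are illustrative only; simulation tools handle far richer cases.

```python
import math


def spares_sufficiency(failure_rate: float, hours: float, spares: int) -> float:
    """P(number of failures in `hours` <= spares) for a constant failure rate."""
    mean = failure_rate * hours
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(spares + 1))


def minimum_spares(failure_rate: float, hours: float, target: float) -> int:
    """Smallest spares count whose sufficiency probability meets `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    spares = 0
    while spares_sufficiency(failure_rate, hours, spares) < target:
        spares += 1
    return spares


# Hypothetical item: 0.002 failures per hour over a 1000 h mission, 95% sufficiency target.
print(minimum_spares(0.002, 1000.0, 0.95))
```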

21
How Results are Used
 Outputs of baseline simulations are verified and
validated using expert elicitation
 Once all agree that the simulation is in the “ballpark,” begin the
sensitivity analyses (do not get wrapped around the axle on the
numbers; it is the gap elimination that provides the most value)
 Identify opportunities for improvement, plug those back
into the sim, ascertain value of improvements
 Continue this process until gaps are eliminated or at
least reduced.
 This can include block improvement of overall
component failure rates – get the suppliers in on the act
(supplier partnerships)
 Ensure data from simulation is used in the design
process
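The improve, re-run, compare loop described above can be sketched with a closed-form stand-in for the simulation: a series system whose availability is the product of its component availabilities. Each component's MTTF is improved in turn and the gains are ranked. The component data, the improvement factor, and the use of a formula in place of a real simulation are all assumptions.

```python
from math import prod


def system_availability(components: dict[str, tuple[float, float]]) -> float:
    """Series-system availability from per-component (MTTF, MTTR) pairs."""
    return prod(mttf / (mttf + mttr) for mttf, mttr in components.values())


def rank_improvements(components: dict[str, tuple[float, float]],
                      factor: float = 2.0) -> list[tuple[str, float]]:
    """Availability gain from multiplying each component's MTTF by `factor`, one at a time."""
    base = system_availability(components)
    gains = []
    for name, (mttf, mttr) in components.items():
        trial = dict(components)
        trial[name] = (mttf * factor, mttr)
        gains.append((name, system_availability(trial) - base))
    return sorted(gains, key=lambda item: item[1], reverse=True)


# Hypothetical baseline data: (MTTF hours, MTTR hours).
baseline = {"pump": (500.0, 20.0), "valve": (2000.0, 5.0), "controller": (10000.0, 2.0)}
for name, gain in rank_improvements(baseline):
    print(f"{name}: +{gain:.5f}")
```

The ranking matters more than the absolute numbers, in keeping with the advice above not to get wrapped around the axle on the numbers.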

22
Success Stories: NASA Instrument Design
 Validation of proper installation of sample cup retaining springs
on Sample Manipulation System to preclude workmanship
failures (a single ring failure would result in loss of solid sample
science).
 Use of physics of failure methods to identify and eliminate,
where possible, failure modes of Pyrolysis Oven.
 Implementation of HiPot test for Wide Range Pump motor to
eliminate workmanship related failures.
 Identification of Hall Effect Device on actuators as a possible
radiation-sensitive device. Subsequent testing validated
suitability of device.
 Identification of thermal switch on Gas Trap as a reliability
issue. Redesign produced a higher-reliability solution.
 FMEA of Gas Processing System provided justification for
addition of limited redundancy.
 Improved reliability of instrument by approximately 25% based on
initial predictions.

23
Complex Space Systems Application
 Predicated on effective
requirements
implementation
 Detailed RAM Plan
developed and
implemented at Program
Level
 RAM requirements, RAM
Plan flowed down to
systems, elements of
systems
 System owners
responsible for DFRAM,
but program will facilitate
and audit
 Program level analyses
including simulation, FMEA,
PRA being performed
 Verification and validation
will be program level
functions
 PRA will be part of flight
readiness decision
 Software included in DFRAM
activities (no longer black
box)
 System Engineering
organization partnering with
S&MA organization for RAM
implementation

24
SUMMARY
 Success of a system
predicated on intentional
implementation of DFRAM
 It will not happen
spontaneously
 Must be married with the
system engineering
process
 Program management
must be disciples – will
not work otherwise
 It is always easier and
more cost effective to do
it right the first time
 Implementation requires
people skills and a
service mentality
