Sean carter dan_deans

Constellation Reliability
Engineering Process –
Optimizing CxP Risk

Sean Carter, NASA JSC
Daniel Deans, ManTech SRS Technologies
Used with Permission

DFRAM (Design for Reliability, Availability, and Maintainability) Overview

 Why does reliability engineering exist?
 How does it fit within the life cycle?
 Success space vs. failure space
 Partnership on system engineering team
 The value of “designing-out” failure modes
 Where does it fit in the lifecycle?
 What are some of the tools?
 How are they applied?
 Real examples


Reliability Engineering Value - Clichés

 Failure is not an option…
 A design engineer does not know what he does not know
 An extra set of eyes and ears is always good
 You have to spend money to make money
 Mr. Murphy tends to rear his ugly head when you are not expecting it…
 What all this means is: you have to work at it – nothing worth accomplishing comes easy
 Reliability engineering is a discipline that adds value to the systems engineering process!


Typical System Engineering Lifecycle


Reliability Engineering Throughout Project Life


The Life Cycle Approach
 Reliability is best designed-in; it is, for the most part, not:
 Analyzed in
 Tested in
 Operated in
 Successful reliability performance begins with a diligent, intentional approach at the very beginning of a project
 Pre-phase A: requirements
 Phase A: allocation; plan; resources
 Phase B: analysis, design input, preliminary design review
 Phase C: detailed design inputs; more analysis; trade studies; design verification; critical design review
 Phase D: test planning, test readiness, manufacturing, final validation; flight readiness review
 Phase E/F: ops, growth, disposal and lessons learned

[Figure: system engineering functions (Project Direction, Control, & Planning; Requirements Compliance; System Analysis; Configuration Management; Risk Management; Test and Assessment) alongside lifecycle stages from System Concept Exploration and Preliminary System Design through Element Design; Component Fabrication, Assembly, Integrate, & Test; Element Integration & Test; and System Integration Test]

Success Space vs. Failure Space

 A design engineer thinks in success space (typically)
 How will the widget work?
 When it is designed, what function will it perform?
 What are the performance requirements?

 A reliability engineer is paid to think in failure space
 How will the widget fail?
 What about the operating environment will cause issues?
 What materials, processes, and tools will accentuate failure modes?
 Is redundancy required?
 Are there operational work-arounds?
 How will faults propagate through the system?
 What are the effects of a failure mode on the mission?

 Superimpose the two processes and you get success!

Credibility: Partnership on
System Engineering Team

 The Safety and Mission Assurance (S&MA) organization provides discipline experts to support design teams
 Our job is to serve, not to inhibit
 We help the system engineering teams identify hazards and failure modes and design them out
 Our sole reason for existing is to ensure project/program success and to reduce/eliminate operational risk
 We are partners for success
 The aim of the partnership is to replicate our knowledge in the collective heads of our design-team partners

The Value of “Designing-Out” Failure Modes

 A failure mode is an obstacle to mission success

 Not every failure mode will cause mission failure, but any component failure has the potential to

 In the commercial world, a failure in the field costs 10 times
what it costs to mitigate in the design process

 In the space business, a failure can and will cost the
mission and quite possibly endanger people

 Identifying and designing-out failure modes is important!


How Do We Design Out Failure Modes?
 Methodical process; starts in pre-phase A, follows the lifecycle.
 DMEDI – Define, Measure, Explore, Develop, Implement
(12 steps)
 Define requirements
 Allocate requirements
 Plan activities and analysis, including test and verification
 Collect data and develop data sources
 Use RAM simulation, FMEA, FTA, worst case analysis, derating,
proven design practices to drive the design
 Support design reviews and require improvement
 Verify and ensure that design will meet requirements
 Plan and implement thorough testing
 Finalize verification, ascertain flight readiness
 Identify reliability growth opportunities once design is complete
 Investigate and eliminate root causes of anomalies
 Develop lessons learned, provide feedback to future engineering teams


Pre-Phase A Concept Development
 Very important part of process –
DFRAM starts here
 Develop requirements that will
optimize RAM for program/project
 Requirements include availability,
mean time to failure, fault tolerance,
mean time to repair, time to replace
 Import lessons learned from similar
programs/systems
 Collect similar system failure history
data
 Begin development of system model
 Begin development of RAM Plan
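The availability, MTTF, and MTTR requirements listed above are linked. A minimal sketch, assuming the standard steady-state inherent availability relation A = MTTF / (MTTF + MTTR), which counts only corrective repair time (no logistics or administrative delay); the function names and numbers are illustrative, not taken from the presentation.

```python
def inherent_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state inherent availability: fraction of time the item is up,
    counting only corrective repair time (no logistics delay)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def max_mttr_for_target(mttf_hours: float, target_availability: float) -> float:
    """Largest MTTR that still meets an availability requirement for a given
    MTTF, found by solving A = MTTF / (MTTF + MTTR) for MTTR."""
    if not 0 < target_availability <= 1:
        raise ValueError("target availability must be in (0, 1]")
    return mttf_hours * (1 - target_availability) / target_availability


if __name__ == "__main__":
    # Illustrative values only: 999 h MTTF and 1 h MTTR.
    print(inherent_availability(999.0, 1.0))  # prints 0.999
    print(max_mttr_for_target(999.0, 0.999))
```

Solving for the allowable MTTR is one way to turn an availability requirement into a maintainability requirement that a design team can own.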


Phase A: Preliminary Analysis
 Refine requirements, negotiate
allocations with design elements
 Finalize RAM Plan and educate the design team on the process and on the role the reliability engineering team will fill
 Continue to develop preliminary model;
begin FMEAs, FTAs, Probabilistic
assessments
 Allocate requirements to lowest
design-to level
 Negotiate failure definitions, failure
budgets with design teams
 Identify initial critical items, compare with
lessons learned from previous systems
 Continue to identify data sources
 Identify critical suppliers; begin to form
partnerships
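Allocation and failure budgets can be illustrated for the simplest case. A minimal sketch, assuming independent elements in series (so R_sys is the product of element reliabilities and, with constant failure rates, the system failure rate is the sum of element rates); equal apportionment and weight-proportional budgets are two simple allocation methods chosen here for illustration, and the weights are hypothetical.

```python
import math


def equal_apportionment(system_target: float, n_subsystems: int) -> list[float]:
    """Split a series-system reliability target equally across n subsystems.

    For independent elements in series, R_sys = R_1 * R_2 * ... * R_n,
    so equal shares are R_i = R_sys ** (1 / n).
    """
    if not 0 < system_target <= 1 or n_subsystems < 1:
        raise ValueError("target must be in (0, 1] and n_subsystems >= 1")
    share = system_target ** (1.0 / n_subsystems)
    return [share] * n_subsystems


def failure_rate_budget(system_rate: float, weights: dict[str, float]) -> dict[str, float]:
    """Split a system failure-rate budget in proportion to per-element weights.

    For a series system with constant failure rates, the system rate is the
    sum of element rates, so the budgets add back up to system_rate.
    The weights (e.g. relative complexity) are a hypothetical input.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: system_rate * w / total_weight for name, w in weights.items()}


if __name__ == "__main__":
    shares = equal_apportionment(0.99, 4)
    print(shares, math.prod(shares))  # product recovers the 0.99 target
    print(failure_rate_budget(1e-4, {"avionics": 1.0, "propulsion": 3.0}))
```

In practice the weights are what gets negotiated with the design teams; the arithmetic only guarantees the allocations roll back up to the system requirement.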

Phase B – Preliminary Design
 Continue to build simulation (model) and
add more details
 Identify the most effective analysis tools to use to drive design
 Complete preliminary FMEA, FTA, PRA
 Continue to develop supplier partnerships
 Prepare for preliminary design review
 Perform maintenance task analysis
 Identify design improvement initiatives and
optimize using simulation
 Perform other sensitivity studies based on
fault tolerance requirements
 Begin developing and finalizing FRACAS (Failure Reporting, Analysis, and Corrective Action System), test plans, and reliability growth strategy
 Partner with designers to identify failure
modes, design them out
 Support concept of operations optimization
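Fault-tolerance sensitivity studies often start by comparing redundancy schemes. A minimal sketch, assuming identical, independent units with a fixed mission reliability r and perfect switching, using the binomial k-out-of-n formula; the unit reliability of 0.90 is illustrative, not from the presentation.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical, independent units survive,
    each with reliability r (binomial sum over i = k..n). Assumes perfect
    switching between redundant units."""
    if not 0 <= k <= n or not 0.0 <= r <= 1.0:
        raise ValueError("need 0 <= k <= n and 0 <= r <= 1")
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    r_unit = 0.90  # illustrative unit reliability
    for label, k, n in [("single string", 1, 1), ("1-of-2", 1, 2), ("2-of-3", 2, 3)]:
        print(f"{label:>13}: {k_out_of_n_reliability(k, n, r_unit):.4f}")
```

With r = 0.9 this gives 0.9, 0.99, and 0.972, which shows why fault tolerance requirements can drive architecture choices well before hardware exists.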


Phase C – Detailed Design
 Perform detailed design analysis – PDR recovery
 Focus on Pareto items identified from analyses (Top 10)
 Continue to develop and use RAM simulation, FMEA,
FTA, etc. to design out failure modes
 Use Con-Ops to develop operational work-arounds as
failure mode mitigation
 Finalize test plans; review for reliability success criteria
 Audit suppliers, provide support for reliability
improvement
 Mitigate schedule risks
 Finalize critical items, document for testing
 Begin life testing of components and subsystems as
feasible
 Perform specialized analysis (sneaks, fault propagation)
 Prepare for and support CDR
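Selecting Pareto items from analysis output is a ranking problem. A minimal sketch, assuming each failure mode or item already has a single contribution number (for example downtime hours from the simulation); the item names and values are hypothetical, and the 80% cut-off and Top 10 cap mirror the slides rather than any fixed rule.

```python
from typing import Optional


def pareto_items(contributions: dict[str, float],
                 cumulative_share: float = 0.8,
                 max_items: Optional[int] = 10) -> list[str]:
    """Return the largest contributors, in descending order, until they account
    for cumulative_share of the total or max_items have been selected."""
    total = sum(contributions.values())
    threshold = cumulative_share * total
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    selected: list[str] = []
    running = 0.0
    for name, value in ranked:
        if running >= threshold:
            break  # already covering the target share
        if max_items is not None and len(selected) >= max_items:
            break  # "Top N" cap reached
        selected.append(name)
        running += value
    return selected


if __name__ == "__main__":
    # Hypothetical unavailability contributions (e.g. downtime hours per year).
    downtime = {"valve": 50.0, "pump": 25.0, "sensor": 15.0, "cable": 10.0}
    print(pareto_items(downtime))  # prints ['valve', 'pump', 'sensor']
```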


Phase D – Development
 Finalize design - CDR recovery, cut into
manufacturing
 Finalize FMEAs, FTAs, Simulations, CILs
 Support testing, root cause
investigations and corrective action
 Begin collection of failure and
operational history data (upon first
application of power)
 Finalize reliability growth strategy
 Develop and begin implementation of
reliability-centered maintenance
approach
 Make “last minute” improvements based
on test results
 Identify and document lessons learned
 Update Con-Ops with operational work-
arounds for critical items

Phase E/F – Ops and Disposal
 Continue to gather data, monitor
operations for anomalies
 Support failure analyses, root cause
investigations
 Implement reliability growth process,
identify areas for growth, design
solutions
 Document lessons learned
 Use simulation to validate reliability
growth strategy, sensitivities
 Update RAM Plan with lessons
learned
 Support system disposal via
identification of reliability challenges
to shutdown
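Tracking reliability growth needs some model of how MTBF improves as failures are found and fixed. A minimal sketch using the Duane model, one common choice assumed here rather than one named in the presentation: cumulative MTBF (t_i / i at the i-th failure) is taken to be linear in log-log space, and its slope alpha is fitted by ordinary least squares. The failure times in the example are synthetic.

```python
import math


def duane_growth_slope(failure_times: list[float]) -> float:
    """Estimate the Duane growth slope (alpha) from cumulative test times at
    which failures occurred (sorted, strictly increasing, all > 0).

    Cumulative MTBF at the i-th failure is t_i / i. The Duane model assumes
    log(cumulative MTBF) is linear in log(t); alpha is the fitted slope.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failure times")
    xs = [math.log(t) for t in failure_times]
    ys = [math.log(t / i) for i, t in enumerate(failure_times, start=1)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic data that follows the Duane model exactly with alpha = 0.4.
    alpha = 0.4
    times = [100.0 * i ** (1 / (1 - alpha)) for i in range(1, 11)]
    print(round(duane_growth_slope(times), 6))
```

A positive slope indicates that corrective actions are improving MTBF; a slope near zero means failures are arriving at a constant rate and the growth process is not yet paying off.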

What are the Tools?
 Some of the tools that we use are:
 Requirements allocation
 RAM simulation/probabilistic risk assessment
 FMEA/FMECA
 Fault tree analysis (FTA)/event tree assessment
 Parts stress analysis/derating
 Detailed design analysis
 Worst case analysis
 Redundancy screens
 Extensive testing and verification analysis
 Reliability growth planning and implementation
 Others…
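Parts stress analysis and derating reduce to comparing applied stress against a fraction of the part's rating. A minimal sketch, assuming each part record carries its applied and rated values for one stress parameter plus an allowed stress ratio; the record layout, reference designators, and limits are hypothetical and are not taken from any derating standard.

```python
from dataclasses import dataclass


@dataclass
class PartStress:
    ref_des: str             # reference designator, e.g. "R1" (hypothetical)
    parameter: str           # stressed parameter, e.g. "power" or "voltage"
    applied: float           # stress seen in the application
    rated: float             # manufacturer rating for the same parameter
    max_stress_ratio: float  # hypothetical derating limit (0.5 = at most 50% of rating)


def derating_violations(parts: list[PartStress]) -> list[tuple[str, str, float]]:
    """Return (ref_des, parameter, stress ratio) for every part whose
    applied/rated ratio exceeds its allowed derating limit."""
    violations = []
    for p in parts:
        ratio = p.applied / p.rated
        if ratio > p.max_stress_ratio:
            violations.append((p.ref_des, p.parameter, ratio))
    return violations


if __name__ == "__main__":
    parts = [
        PartStress("R1", "power", 0.10, 0.25, 0.5),   # 40% of rating: passes
        PartStress("C3", "voltage", 40.0, 50.0, 0.6),  # 80% of rating: flagged
    ]
    for ref, param, ratio in derating_violations(parts):
        print(f"{ref} {param}: {ratio:.0%} of rating exceeds derating limit")
```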


Reliability and Maintainability Simulation
 A very powerful process
 Can help design out failure modes without cutting metal
 Supports application of the Pareto Principle (roughly 20% of causes account for 80% of effects)
 Gives design team a tool for sensitivity analysis
 Allows for trying many different scenarios
 Helps to optimize the return on investment based on the cost-to-improve curve

[Figure: cost-to-improve curve, Reliability vs. $ Cost, showing a region of high rate of return followed by an area of diminishing return]
KITC (knee in the curve) = point on the curve where rise becomes less than run (reliability improvement = rise, cost to improve = run)
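The KITC definition can be applied directly to a tabulated cost-to-improve curve. A minimal sketch, assuming the curve is given as (cost, reliability) points sorted by increasing cost and that both axes have been normalized to a common 0 to 1 scale so that "rise less than run" is meaningful; the normalization and the sample points are assumptions for illustration.

```python
def knee_in_the_curve(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Return the first point after which rise (reliability gain) becomes less
    than run (added cost), i.e. where returns start to diminish.

    Assumes (cost, reliability) pairs sorted by increasing cost, with both
    axes normalized to a common 0-1 scale.
    """
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        run = c1 - c0
        rise = r1 - r0
        if run <= 0:
            raise ValueError("points must be sorted by strictly increasing cost")
        if rise < run:
            return (c0, r0)
    return points[-1]  # no knee found: every segment still pays for itself


if __name__ == "__main__":
    # Hypothetical normalized curve: steep early gains, then flattening.
    curve = [(0.0, 0.0), (0.1, 0.5), (0.2, 0.7), (0.4, 0.8), (1.0, 0.85)]
    print(knee_in_the_curve(curve))  # prints (0.2, 0.7)
```

Spending up to the knee stays in the high-rate-of-return region; beyond it each increment of reliability costs more than it buys.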


Simulation Basics
 Simulations are built based on the system architecture
 Model provides for “RAM” characteristics of system
 Input data includes failure rates, repair times, sparing
information, logistics information, operational work-
arounds
 Simulation is run based on mission profiles
 “Monte Carlo” methodology is used
 Typically data is input using statistical distributions
 Outputs are system availability and cutsets (and other
failure “illuminators”)
 Cutsets lead to sensitivity analyses which in turn can
drive improvements (failure mode elimination)
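Cutsets turn directly into a top-event estimate and a ranking of where to focus. A minimal sketch, assuming independent basic events with known probabilities and minimal cutsets already produced by the fault tree or simulation tool; it uses the rare-event approximation and the min-cut upper bound, both standard approximations, and the event names and probabilities are hypothetical.

```python
import math


def cut_set_probability(cut_set: frozenset[str], p: dict[str, float]) -> float:
    """Probability that every basic event in the cutset occurs (independence assumed)."""
    return math.prod(p[event] for event in cut_set)


def top_event_estimates(cut_sets: list[frozenset[str]], p: dict[str, float]) -> tuple[float, float]:
    """Return (rare-event approximation, min-cut upper bound) for the top event."""
    probs = [cut_set_probability(c, p) for c in cut_sets]
    rare_event = sum(probs)
    min_cut_upper_bound = 1.0 - math.prod(1.0 - q for q in probs)
    return rare_event, min_cut_upper_bound


def rank_cut_sets(cut_sets: list[frozenset[str]], p: dict[str, float]) -> list[frozenset[str]]:
    """Order cutsets from most to least probable: the candidates for design-out."""
    return sorted(cut_sets, key=lambda c: cut_set_probability(c, p), reverse=True)


if __name__ == "__main__":
    p = {"A": 0.01, "B": 0.02, "C": 0.1}  # hypothetical basic-event probabilities
    cuts = [frozenset({"B", "C"}), frozenset({"A"})]
    print(top_event_estimates(cuts, p))
    print(rank_cut_sets(cuts, p))
```

A single-event cutset such as {A} outranking a double-failure cutset is the typical signal that a single point of failure deserves redundancy or redesign.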


RAM Simulation Example

 Simulation is a dynamic analysis, not a static one
 Can provide much information about overall availability
of system under many different sets of conditions
 Today’s tools can include operational concepts and
rules, optimization of spares (some automatic)
 Requires specific input data
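The Monte Carlo idea behind these tools fits in a few lines for a simple case. A minimal sketch, assuming a series system whose components fail and are repaired independently with exponentially distributed times to failure and repair; it ignores spares, logistics delays, and operational rules that real RAM tools model, and all names and numbers are hypothetical.

```python
import random


def component_down_intervals(mttf, mttr, mission_time, rng):
    """Simulate one component's alternating up/down cycle; return its down intervals."""
    t, intervals = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mttf)  # time to next failure
        if t >= mission_time:
            return intervals
        repair = rng.expovariate(1.0 / mttr)  # time to repair
        intervals.append((t, min(t + repair, mission_time)))
        t += repair


def union_length(intervals):
    """Total length covered by possibly overlapping intervals."""
    total, cur_start, cur_end = 0.0, None, None
    for s, e in sorted(intervals):
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def simulate_series_availability(components, mission_time, runs=200, seed=1):
    """Mean fraction of mission time the series system is up: the system is
    down whenever any component is down."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        down = []
        for mttf, mttr in components:
            down.extend(component_down_intervals(mttf, mttr, mission_time, rng))
        total += 1.0 - union_length(down) / mission_time
    return total / runs


if __name__ == "__main__":
    comps = [(99.0, 1.0), (99.0, 1.0)]  # hypothetical (MTTF, MTTR) pairs in hours
    print(round(simulate_series_availability(comps, 100_000.0, runs=20), 4))
    print(0.99 * 0.99)  # analytic steady-state value for comparison
```

For this simple case the estimate should land close to the analytic 0.9801; the value of simulation is that the same loop extends to sparing, repair crews, and mission phases where no closed form exists.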


How Results are Used
 Outputs of baseline simulations are verified and
validated using expert elicitation
 Once all agree that the simulation is in the “ballpark,” begin the sensitivity analyses (do not get wrapped around the axle on the numbers; it is the gap elimination that provides the most value)
 Identify opportunities for improvement, plug those back
into the sim, ascertain value of improvements
 Continue this process until gaps are eliminated or at
least reduced.
 This can include block improvement of overall
component failure rates – get the suppliers in on the act
(supplier partnerships)
 Ensure data from simulation is used in the design
process
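The "plug improvements back in and measure their value" loop can be shown with an analytic stand-in for the simulation. A minimal sketch, assuming independent components in series and steady-state availability MTTF / (MTTF + MTTR) per component; improving each component's MTTF by a fixed factor one at a time is a simple one-factor sensitivity scheme chosen for illustration, and the component data are hypothetical.

```python
def series_availability(components: dict[str, tuple[float, float]]) -> float:
    """Product of component availabilities MTTF / (MTTF + MTTR) (independence assumed)."""
    a = 1.0
    for mttf, mttr in components.values():
        a *= mttf / (mttf + mttr)
    return a


def mttf_sensitivity(components: dict[str, tuple[float, float]],
                     improvement_factor: float = 2.0) -> dict[str, float]:
    """Availability gain from improving each component's MTTF by
    improvement_factor, one at a time, sorted from largest gain to smallest."""
    baseline = series_availability(components)
    gains = {}
    for name, (mttf, mttr) in components.items():
        trial = dict(components)
        trial[name] = (mttf * improvement_factor, mttr)
        gains[name] = series_availability(trial) - baseline
    return dict(sorted(gains.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical (MTTF, MTTR) pairs in hours.
    system = {"pump": (100.0, 10.0), "valve": (1000.0, 1.0)}
    for name, gain in mttf_sensitivity(system).items():
        print(f"{name}: +{gain:.4f} availability")
```

Ranking the gains shows which supplier or design improvement buys the most availability, which is the input needed for the cost-to-improve comparison.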


Success Stories: NASA Instrument Design
 Validation of proper installation of sample cup retaining springs
on Sample Manipulation System to preclude workmanship
failures (a single ring failure would result in loss of solid sample
science).
 Use of physics of failure methods to identify and eliminate,
where possible, failure modes of Pyrolysis Oven.
 Implementation of HiPot test for Wide Range Pump motor to
eliminate workmanship related failures.
 Identification of Hall Effect Device on actuators as a possible radiation-sensitive device; subsequent testing validated the suitability of the device.
 Identification of thermal switch on Gas Trap as a reliability issue; redesign produced a higher-reliability solution.
 FMEA of Gas Processing System provided justification for addition of limited redundancy.
 Improved reliability of instrument by approximately 25% based on initial predictions.

Complex Space Systems Application
 Predicated on effective requirements implementation
 Detailed RAM Plan developed; will be implemented at Program Level
 RAM requirements and RAM Plan flowed down to systems and elements of systems
 System owners responsible for DFRAM, but program will facilitate and audit implementation
 Program-level analyses, including simulation, FMEA, and PRA, being performed
 Verification and validation will be program-level functions
 PRA will be part of flight readiness decision
 Software included in DFRAM activities (no longer a black box)
 System Engineering organization partnering with S&MA organization for RAM


SUMMARY
 Success of a system is predicated on intentional implementation of DFRAM
 It will not happen spontaneously
 Must be married with the system engineering process
 Program management must be disciples – will not work otherwise
 It is always easier and more cost effective to do it right the first time
 Implementation requires people skills and a service mentality

