1. Constellation Reliability Engineering Process – Optimizing CxP Risk
Sean Carter, NASA JSC
Daniel Deans, ManTech SRS Technologies
Used with Permission
2. DFRAM Overview
Why does reliability engineering exist?
How does it fit within the life cycle?
Success space vs. failure space
Partnership on system engineering team
The value of “designing-out” failure modes
Where does it fit in the lifecycle?
What are some of the tools?
How are they applied?
Real examples
3. Reliability Engineering Value - Clichés
Failure is not an option…
A design engineer does not know what he does not know
An extra set of eyes and ears is always good
You have to spend money to make money
Mr. Murphy tends to rear his ugly head when you are not expecting it…
What all this means is: You have to work at it – nothing worth accomplishing comes easy
Reliability engineering is a discipline that adds value to the systems engineering process!
6. The Life Cycle Approach
Reliability is best designed-in; it is, for the most part, not:
• Analyzed in
• Tested in
• Operated in
Successful reliability performance begins with a diligent, intentional approach at the very beginning of a project
[Figure: System Engineering functions and development flow]
System Engineering functions shown in the figure:
Project Direction, Control, & Planning
• Management Plan
• Budget Development & Control
• Project Plan Development
• Schedule Development & Control
Requirements Compliance
• Specifications
• Verification
System Analysis
• System, Element, Subsystem Models
• System Performance Analyses
Configuration Management
• Design Data Base
• Problem/Failure Reports (PFR)
• Engineering Change Orders
Risk Management
• Risk Planning
• Risk Assessment
• Risk Handling/Mitigation
• Risk Monitoring
Development flow stages shown in the figure: System Concept Exploration; Preliminary System Design; Element Design Synthesis; Component Fabrication, Assembly, Integrate, & Test; Element Integration & Test; System Integration Test; Test and Assessment; Data Reduction and Assessment
Pre-phase A: requirements
Phase A: allocation; plan; resources
Phase B: analysis, design input, preliminary design review
Phase C: detailed design inputs; more analysis; trade studies;
design verification; critical design review
Phase D: test planning, test readiness, manufacturing, final
validation; flight readiness review
Phase E/F: ops, growth, disposal and lessons learned
7. Success Space vs. Failure Space
A design engineer thinks in success space (typically)
How will the widget work?
When it is designed, what function will it perform?
What are the performance requirements?
A reliability engineer is paid to think in failure space
How will the widget fail?
What about the operating environment will cause issues?
What materials, processes, and tools will accentuate failure modes?
Is redundancy required?
Are there operational work-arounds?
How will faults propagate through the system?
What are the effects of a failure mode on the mission?
Superimpose the two processes and you get success!
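The failure-space question "Is redundancy required?" can be made concrete with a small calculation. The following is a minimal sketch, assuming independent, identical units in active parallel (the slides do not specify a redundancy scheme); the function names are hypothetical.

```python
from typing import Optional


def parallel_reliability(unit_reliability: float, n_units: int) -> float:
    """Reliability of n independent, identical units in active parallel.

    The arrangement fails only if every unit fails, so
    R_parallel = 1 - (1 - R_unit) ** n.
    """
    if not 0.0 <= unit_reliability <= 1.0:
        raise ValueError("unit_reliability must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return 1.0 - (1.0 - unit_reliability) ** n_units


def units_needed(unit_reliability: float, target: float,
                 max_units: int = 10) -> Optional[int]:
    """Smallest number of parallel units that meets the target, or None."""
    for n in range(1, max_units + 1):
        if parallel_reliability(unit_reliability, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # A single 0.90 unit misses a 0.995 target; see how many are needed.
    print(units_needed(0.90, 0.995))
```

A result of None signals that redundancy alone will not close the gap within the allowed unit count, which points back to designing out the failure mode or adding an operational work-around.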
8. Credibility: Partnership on System Engineering Team
Safety and Mission Assurance organization provides
discipline experts to support design teams
Our job is to serve; not to inhibit
We help the system engineering teams identify
hazards and failure modes and design them out
Our sole reason for existing is to ensure
project/program success and to reduce/eliminate
operational risk
We are partners for success
The aim in partnership is to duplicate our knowledge
in the collective heads of our design-team partners
9. The Value of “Designing-Out” Failure Modes
A failure mode is an obstacle to mission success
Not every failure mode causes mission failure, but any component failure has the potential to
In the commercial world, a failure in the field costs 10 times
what it costs to mitigate in the design process
In the space business, a failure can and will cost the
mission and quite possibly endanger people
Identifying and designing-out failure modes is important!
Company Confidential
10. How Do We Design Out Failure Modes?
Methodical process; starts in pre-phase A, follows the lifecycle.
DMEDI – Define, Measure, Explore, Develop, Implement
(12 steps)
Define requirements
Allocate requirements
Plan activities and analysis, including test and verification
Collect data and develop data sources
Use RAM simulation, FMEA, FTA, worst case analysis, derating,
proven design practices to drive the design
Support design reviews and require improvement
Verify and ensure that design will meet requirements
Plan and implement thorough testing
Finalize verification, ascertain flight readiness
Identify reliability growth opportunities once design is complete
Investigate and eliminate root causes of anomalies
Develop lessons learned, provide feedback to future engineering teams
11. Pre-Phase A Concept Development
Very important part of process –
DFRAM starts here
Develop requirements that will
optimize RAM for program/project
Requirements include availability,
mean time to failure, fault tolerance,
mean time to repair, time to replace
Import lessons learned from similar
programs/systems
Collect similar system failure history
data
Begin development of system model
Begin development of RAM Plan
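The requirements named above (availability, mean time to failure, mean time to repair) are linked by a standard steady-state relationship, A = MTTF / (MTTF + MTTR). The sketch below shows how an availability requirement constrains the repair-time requirement. It is an illustration only: the function names and the numbers are assumptions, not program values.

```python
def inherent_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state inherent availability: uptime / (uptime + downtime)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def max_mttr_for_target(mttf_hours: float, target_availability: float) -> float:
    """Largest MTTR that still meets the availability target.

    Solves A = MTTF / (MTTF + MTTR) for MTTR.
    """
    if not 0.0 < target_availability <= 1.0:
        raise ValueError("target_availability must be in (0, 1]")
    return mttf_hours * (1.0 - target_availability) / target_availability


if __name__ == "__main__":
    # Hypothetical numbers: 990 h between failures, 10 h to repair.
    print(inherent_availability(990.0, 10.0))
    print(max_mttr_for_target(990.0, 0.99))
```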
12. Phase A: Preliminary Analysis
Refine requirements, negotiate
allocations with design elements
Finalize RAM Plan and educate design
team on process; what role reliability
engineering team will fill
Continue to develop preliminary model;
begin FMEAs, FTAs, Probabilistic
assessments
Allocate requirements to lowest
design-to level
Negotiate failure definitions, failure
budgets with design teams
Identify initial critical items, compare with
lessons learned from previous systems
Continue to identify data sources
Identify critical suppliers; begin to form
partnerships
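Allocating requirements and negotiating failure budgets can be illustrated with a weighted apportionment. This is a minimal sketch, assuming a series system with constant failure rates (exponential reliability); the element names and weights are hypothetical, and the slides do not state which allocation method the program used.

```python
import math


def allocate_failure_rates(system_reliability, mission_hours, weights):
    """Split a system failure-rate budget across elements by relative weight.

    weights maps element name -> relative share of the budget (any positive
    numbers; they are normalized here). Returns failure rates per hour.
    """
    if not 0.0 < system_reliability < 1.0:
        raise ValueError("system_reliability must be in (0, 1)")
    # For a constant failure rate, R = exp(-rate * t), so rate = -ln(R) / t.
    system_rate = -math.log(system_reliability) / mission_hours
    total_weight = sum(weights.values())
    return {name: system_rate * w / total_weight for name, w in weights.items()}


def series_reliability(rates, mission_hours):
    """Reliability of a series system: element failure rates add."""
    return math.exp(-sum(rates.values()) * mission_hours)


if __name__ == "__main__":
    budget = allocate_failure_rates(
        0.95, 100.0, {"avionics": 2, "propulsion": 1, "power": 1})
    for element, rate in budget.items():
        print(element, rate)
```

Rolling the allocated rates back up with series_reliability returns the original system requirement, which is a quick check that the budget is self-consistent before it is negotiated with the design teams.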
13. Phase B – Preliminary Design
Continue to build simulation (model) and
add more details
Identify most effective analysis tools to use
to drive design
Complete preliminary FMEA, FTA, PRA
Continue to develop supplier partnerships
Prepare for preliminary design review
Perform maintenance task analysis
Identify design improvement initiatives and
optimize using simulation
Perform other sensitivity studies based on
fault tolerance requirements
Begin developing and finalizing FRACAS,
test plans, reliability growth strategy
Partner with designers to identify failure
modes, design them out
Support concept of operations optimization
14. Phase C – Detailed Design
Perform detailed design analysis – PDR recovery
Focus on Pareto items identified from analyses (Top 10)
Continue to develop and use RAM simulation, FMEA,
FTA, etc. to design out failure modes
Use Con-Ops to develop operational work-arounds as
failure mode mitigation
Finalize test plans – review for reliability success criteria
Audit suppliers, provide support for reliability
improvement
Mitigate schedule risks
Finalize critical items, document for testing
Begin life testing of components and subsystems as
feasible
Perform specialized analysis (sneak circuit, fault propagation)
Prepare for and support CDR
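Picking the Pareto items from analysis results can be sketched as a simple ranking. The example assumes each item already has a numeric contribution (for instance, expected downtime or failure count from the analyses); the item names, values, and the 80% cutoff are hypothetical.

```python
def pareto_items(contributions, cutoff=0.80):
    """Return the top contributors whose cumulative share first reaches cutoff.

    contributions maps item name -> non-negative contribution.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    # Rank from largest to smallest contribution.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        selected.append(name)
        running += value
        if running / total >= cutoff:
            break
    return selected


if __name__ == "__main__":
    downtime_hours = {"valve": 55, "pump": 30, "sensor": 8,
                      "harness": 4, "bracket": 3}
    print(pareto_items(downtime_hours))
```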
15. Phase D – Development
Finalize design - CDR recovery, cut into
manufacturing
Finalize FMEAs, FTAs, Simulations, CILs
Support testing, root cause
investigations and corrective action
Begin collection of failure and
operational history data (upon first
application of power)
Finalize reliability growth strategy
Develop and begin implementation of
reliability-centered maintenance
approach
Make “last minute” improvements based
on test results
Identify lessons learned and document
Update Con-Ops with operational work-arounds for critical items
16. Phase E/F – Ops and Disposal
Continue to gather data, monitor
operations for anomalies
Support failure analyses, root cause
investigations
Implement reliability growth process,
identify areas for growth, design
solutions
Document lessons learned
Use simulation to validate reliability
growth strategy, sensitivities
Update RAM Plan with lessons
learned
Support system disposal via
identification of reliability challenges
to shutdown
17. What are the Tools?
Some of the tools that we use are:
Requirements allocation
RAM simulation/probabilistic risk assessment
FMEA/FMECA
Fault tree analysis (FTA)/event tree assessment
Parts stress analysis/derating
Detailed design analysis
Worst case analysis
Redundancy screens
Extensive testing and verification analysis
Reliability growth planning and implementation
Others….
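As one illustration of the tools listed, a fault tree can be evaluated bottom-up from basic-event probabilities. This is a minimal sketch, assuming independent basic events that each appear only once in the tree (repeated events would need a cut-set treatment); the tree, event names, and probabilities are hypothetical.

```python
from math import prod


def top_event_probability(node, basic_event_probs):
    """Evaluate a fault tree made of AND/OR gates.

    node is either a basic-event name (str) or a tuple (gate, children),
    where gate is "AND" or "OR" and children is a list of nodes.
    """
    if isinstance(node, str):
        return basic_event_probs[node]
    gate, children = node
    probs = [top_event_probability(child, basic_event_probs) for child in children]
    if gate == "AND":
        # All inputs must fail.
        return prod(probs)
    if gate == "OR":
        # Any input failing is enough: 1 - P(no input fails).
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate}")


if __name__ == "__main__":
    # Loss of flow if both redundant pumps fail OR the controller fails.
    tree = ("OR", [("AND", ["pump_a", "pump_b"]), "controller"])
    probs = {"pump_a": 0.1, "pump_b": 0.1, "controller": 0.01}
    print(top_event_probability(tree, probs))
```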
18. Reliability and Maintainability Simulation
A very powerful process
Can help design out failure modes without cutting metal
Applies the Pareto Principle (20/80): a small share of contributors typically drives most of the unreliability
Gives design team a tool for sensitivity analysis
Allows for trying many different scenarios
Helps to optimize the return on investment based on the cost-to-improve curve
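[Figure: cost-to-improve curve. Vertical axis: Reliability; horizontal axis: $ Cost. Regions labeled "High rate of return" and "Area of diminishing return", with the KITC point marked on the curve.]
KITC = Point on curve where rise becomes less than run (reliability improvement = rise, cost to improve = run)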
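The KITC definition (rise becomes less than run) can be applied to a sampled cost-to-improve curve. Because reliability and cost have different units, this sketch normalizes both axes to the plotted range before comparing rise and run; that normalization is an assumption, as are the function name and the sample points.

```python
def find_kitc(points):
    """Locate the point where normalized rise first drops below normalized run.

    points is a list of (cost, reliability) pairs sorted by increasing cost.
    Returns the pair at the start of the first such segment, or None if the
    curve never flattens that far.
    """
    costs = [c for c, _ in points]
    rels = [r for _, r in points]
    cost_span = costs[-1] - costs[0]
    rel_span = rels[-1] - rels[0]
    if cost_span <= 0 or rel_span <= 0:
        raise ValueError("curve must increase in both cost and reliability")
    for i in range(len(points) - 1):
        run = (costs[i + 1] - costs[i]) / cost_span
        rise = (rels[i + 1] - rels[i]) / rel_span
        if rise < run:
            return points[i]
    return None


if __name__ == "__main__":
    curve = [(0, 0.90), (10, 0.95), (20, 0.98), (30, 0.99), (40, 0.995)]
    print(find_kitc(curve))
```

Spending up to the returned point sits in the high rate of return region; spending beyond it moves into the area of diminishing return.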
19. Simulation Basics
Simulations are built based on the system architecture
Model provides for “RAM” characteristics of system
Input data includes failure rates, repair times, sparing information, logistics information, operational work-arounds
Simulation is run based on mission profiles
“Monte Carlo” methodology is used
Typically data is input using statistical distributions
Outputs are system availability and cutsets (and other
failure “illuminators”)
Cutsets lead to sensitivity analyses which in turn can
drive improvements (failure mode elimination)
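The basics above can be sketched as a small Monte Carlo availability simulation. This is not the tool used by the authors: it assumes a series architecture (any component failure takes the system down), exponentially distributed failure and repair times, and no aging while the system is down. The component names and values are hypothetical, and the per-component downtime share is only a crude stand-in for cutsets.

```python
import random


def simulate_availability(components, mission_hours, n_trials=500, seed=1):
    """Estimate availability of a series system by Monte Carlo.

    components maps name -> (mttf_hours, mttr_hours).
    Returns (availability, downtime share of mission time per component).
    """
    rng = random.Random(seed)
    total_up = 0.0
    downtime = {name: 0.0 for name in components}
    for _ in range(n_trials):
        remaining = mission_hours
        while remaining > 0.0:
            # Draw a time to failure per component; the earliest one fails.
            # Redrawing after each repair is valid because the exponential
            # distribution is memoryless.
            draws = {name: rng.expovariate(1.0 / mttf)
                     for name, (mttf, _mttr) in components.items()}
            failed = min(draws, key=draws.get)
            up = min(draws[failed], remaining)
            total_up += up
            remaining -= up
            if remaining <= 0.0:
                break
            # System is down while the failed component is repaired.
            repair = rng.expovariate(1.0 / components[failed][1])
            down = min(repair, remaining)
            downtime[failed] += down
            remaining -= down
    total_time = n_trials * mission_hours
    share = {name: d / total_time for name, d in downtime.items()}
    return total_up / total_time, share


if __name__ == "__main__":
    parts = {"pump": (1000.0, 10.0), "valve": (2000.0, 20.0)}
    availability, share = simulate_availability(parts, 10000.0)
    print(availability, share)
```

Components with the largest downtime share are natural candidates for the sensitivity analyses that follow.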
20. RAM Simulation Example
Simulation is dynamic, not static analysis
Can provide much information about overall availability
of system under many different sets of conditions
Today’s tools can include operational concepts and
rules, optimization of spares (some automatic)
Requires specific input data
21. How Results are Used
Outputs of baseline simulations are verified and
validated using expert elicitation
Once all agree that the simulation is in the “ballpark,” begin the sensitivity analyses (do not get wrapped around the axle on the numbers; it is the gap elimination that provides the most value)
Identify opportunities for improvement, plug those back
into the sim, ascertain value of improvements
Continue this process until gaps are eliminated or at
least reduced.
This can include block improvement of overall
component failure rates – get the suppliers in on the act
(supplier partnerships)
Ensure data from simulation is used in the design
process
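The loop described above (identify an improvement, plug it back in, ascertain its value) can be sketched as follows. To keep the example short it swaps the simulation for a closed-form steady-state availability of a series system; the component names, values, and the factor-of-two MTTF improvement are hypothetical.

```python
def series_availability(components):
    """Steady-state availability of a series system.

    components maps name -> (mttf_hours, mttr_hours).
    """
    availability = 1.0
    for mttf, mttr in components.values():
        availability *= mttf / (mttf + mttr)
    return availability


def rank_improvements(components, mttf_factor=2.0):
    """Improve one component at a time and rank by availability gained."""
    baseline = series_availability(components)
    gains = {}
    for name, (mttf, mttr) in components.items():
        trial = dict(components)
        trial[name] = (mttf * mttf_factor, mttr)  # candidate improvement
        gains[name] = series_availability(trial) - baseline
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    parts = {"pump": (500.0, 10.0), "valve": (2000.0, 10.0),
             "sensor": (5000.0, 10.0)}
    for name, gain in rank_improvements(parts):
        print(name, gain)
```

The ranking shows where a block improvement in component failure rate buys the most availability, which is the kind of result worth taking to suppliers.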
22. Success Stories: NASA Instrument Design
Validation of proper installation of sample cup retaining springs
on Sample Manipulation System to preclude workmanship
failures. (single ring failure would result in loss of solid sample
science)
Use of physics of failure methods to identify and eliminate,
where possible, failure modes of Pyrolysis Oven.
Implementation of HiPot test for Wide Range Pump motor to
eliminate workmanship related failures.
Identification of Hall Effect Device on actuators as possible
Radiation Sensitive device. Subsequent testing validated
suitability of device.
Identification of thermal switch on Gas Trap as Reliability
Issue. Redesign produced higher Reliability solution.
FMEA of Gas Processing System provided justification for
addition of limited redundancy.
Improved reliability of instrument by approximately 25% based on initial predictions.
23. Complex Space Systems Application
Predicated on effective implementation
Detailed RAM Plan developed and will be implemented at Program Level
RAM requirements, RAM Plan flowed down to systems, elements of systems
System owners responsible for DFRAM, but program will facilitate and audit implementation
Program level analyses requirements including simulation, FMEA, PRA being performed
Verification and validation: program level functions
PRA will be part of flight readiness decision
Software included in DFRAM activities (no longer black box)
System Engineering organization partnering with S&MA organization for RAM
24. SUMMARY
Success of a system predicated on intentional implementation of DFRAM
It will not happen spontaneously
Must be married with the system engineering process
Program management must be disciples – will not work otherwise
It is always easier and more cost effective to do it right the first time
Implementation requires people skills and a service mentality