1
Reliability
October 26, 2004
2
Today
• DFDC (Design for a Developing Country)
• HW November 2
– detailed design
– Parts list
– Trade-off
• Midterm November 4
• Factory Visit November 16th
3
Midterm
• Presentation purpose: a midcourse correction
– Less than 15 minutes, with 5 minutes of discussion
– Approx. 7 PowerPoint slides; all team members should participate in the presentation
– Show what you have done
– Show what you are going to do
– Discuss issues, barriers, and plans for overcoming them (procedural, team, subject matter, etc.)
– Scored on originality, candor, thoughtfulness, etc., not on total amount accomplished
– Schedule today from 1:00 to 4:00 (speaker at 4:00 PM)
4
Reliability
The probability that no (system) failure will occur
in a given time interval
A reliable system is one that meets the specifications. Do you accept this?
5
What do Reliability Engineers Do?
• Implement Reliability Engineering Programs across all functions
– Engineering
– Research
– Manufacturing
– Testing
– Packaging
– Field service
6
Reliability as a Process Module
INPUT
• Reliability Goals
• Schedule time
• Budget Dollars
• Test Units
• Design Data
Reliability Assurance Module: Internal Methods
• Design Rules
• Component Testing
• Subsystem Testing
• Architectural Strategy
• Life Testing
• Prototype Testing
• Field Testing
• Reliability Predictions (models)
OUTPUT
• Product Assurance
7
Early product failure
• Strongest effect on customer satisfaction
– A field day for competitors
• The most expensive to repair
– Why?
– Rings through the entire production system
– High volume
– Long C/T (cycle time)
• Examples from GE (but problem not confined to GE!)
– GE Variable Power module for House Air Conditioning
– GE Refrigerators
– GE Cellular
8
Early Product Failure
• Can be catastrophic for human life
– Challenger, Columbia
– Titanic
– DC-10
– Auto design
– Aircraft Engine
– Military equipment
9
Reliability as a function of System Complexity
System reliability (%) for identical components in series
# of components in series   Component reliability = 99.999%   Component reliability = 99.99%
100                         99.90                             99.00
250                         99.75                             97.53
500                         99.50                             95.12
1,000                       99.00                             90.48
10,000                      90.48                             36.79
100,000                     36.79                             0.0045
Why computers made of vacuum tubes (or discrete transistors) cannot be made to work reliably
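The table assumes that components fail independently and that any single component failure takes down the series system, so system reliability is the product of the component reliabilities, R_sys = r^n. A minimal Python sketch under those assumptions (the function name is illustrative) computes both columns:

```python
def series_reliability(component_reliability: float, n_components: int) -> float:
    """Reliability of n identical, independent components in series: R = r**n."""
    return component_reliability ** n_components

# System reliability in percent for each component count in the table.
for n in (100, 250, 500, 1_000, 10_000, 100_000):
    r_high = 100 * series_reliability(0.99999, n)
    r_low = 100 * series_reliability(0.9999, n)
    print(f"{n:>7}  {r_high:6.2f}  {r_low:8.4f}")
```

The last row makes the slide's point: at 99.99% per component, 100,000 parts in series leave only about a 0.0045% chance of getting through the interval without a failure.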
10
Three Classifications of Reliability Failure Type
• Early (infant mortality)
• Wearout (physical degradation)
• Chance (overstress)
Old Remedy: Repair Mentality
• Burn-in
• Maintenance
• In service testing
11
Bathtub Curve
(Figure: failure rate, in #/million hours, vs. time)
• Infant mortality
• Useful life: no memory, no improvement, no wear-out; random causes
• Wear out
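One common way to sketch a bathtub curve (an assumption here, not something the slide specifies) is to add three failure-rate terms: a Weibull term with shape β < 1 for infant mortality, a constant term for the random failures of the useful life, and a Weibull term with β > 1 for wear-out. All parameter values below are hypothetical.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1), per hour."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def bathtub_hazard(t: float) -> float:
    """Hypothetical bathtub failure rate, in failures per hour."""
    infant = weibull_hazard(t, beta=0.5, eta=1_000.0)    # decreasing: early failures
    chance = 1e-4                                        # constant: random causes
    wearout = weibull_hazard(t, beta=5.0, eta=20_000.0)  # increasing: wear-out
    return infant + chance + wearout

for hours in (10, 1_000, 10_000, 20_000, 30_000):
    # The slide's vertical axis is in failures per million hours.
    print(hours, round(bathtub_hazard(hours) * 1e6, 1))
```

With β = 1 the Weibull term reduces to a constant rate 1/η, which is why the exponential model describes the flat bottom of the tub.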
12
Reliability
(Figure: human mortality, probability of dying in the next year in deaths/1000, vs. age from 0 to 90 years)
From the Statistical Bulletin 79, no 1, Jan-Mar 1998
13
Early failure causes or infant mortality
(Occur at the beginning of life and then disappear)
• Manufacturing Escapes
– workmanship/handling
– process control
– materials
– contamination
• Improper installation
14
Chance Failures
(Occur throughout the life of a product at a constant rate)
• Insufficient safety factors in design
• Higher than expected random loads
• Human errors
• Misapplication
• Developing world concerns
15
Wear-out
(Occur late in life and increase with age)
• Aging
• Degradation in strength
• Materials Fatigue
• Creep
• Corrosion
• Poor maintenance
• Developing World Concerns
16
Failure Types
• Catastrophic
• Degradation
• Drift
• Intermittent
17
Failure Effects
(What the customer experiences)
• Noise
• Erratic operation
• Inoperability
• Instability
• Intermittent operation
• Impaired Control
• Impaired operation
• Roughness
• Excessive effort requirements
• Unpleasant or unusual odor
• Poor appearance
18
Failure Modes
• Cracking
• Deformation
• Wear
• Corrosion
• Loosening
• Leaking
• Sticking
• Electrical shorts
• Electrical opens
• Oxidation
• Vibration
• Fracturing
19
Reliability Remedies
• Early: quality manufacture/robust design
• Wearout: physically-based models, preventative maintenance, robust design (FMEA)
• Chance: tight customer linkages, testing, HAST
20
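Reliability semi-empirical formulae
• Wear out: $f(T) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(T-M)^2/2\sigma^2}$
• Chance failure: $f(T) = \frac{1}{m}\, e^{-T/m}$
• Early failure: $f(T) = \frac{k}{2\sqrt{kT}}\, e^{-\sqrt{kT}}$
where f(T) = pdf of time to failure, M = mean wear-out life, σ = standard deviation of wear-out life, m = MTBF (constant failure rate = 1/m), k = constant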
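To see why the exponential model is the "chance failure" (no-memory) case and the normal model is a wear-out case, one can compute the failure rate h(T) = f(T)/R(T), where R(T) is the probability of surviving past T. The sketch below covers only the wear-out and chance-failure formulas; the MTBF, mean life, and standard deviation values are hypothetical.

```python
import math

def normal_pdf(t, mean_life, sigma):
    """Wear-out model: normal pdf of failure time with mean M and std. dev. sigma."""
    z = (t - mean_life) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_reliability(t, mean_life, sigma):
    """R(T) = P(failure time > T) for the normal model."""
    return 0.5 * math.erfc((t - mean_life) / (sigma * math.sqrt(2.0)))

def exponential_pdf(t, mtbf):
    """Chance-failure model: f(T) = (1/m) * exp(-T/m), with m = MTBF."""
    return math.exp(-t / mtbf) / mtbf

def exponential_reliability(t, mtbf):
    """R(T) = exp(-T/m) for the exponential model."""
    return math.exp(-t / mtbf)

def normal_hazard(t, mean_life, sigma):
    """Failure rate h(T) = f(T) / R(T) for the wear-out model."""
    return normal_pdf(t, mean_life, sigma) / normal_reliability(t, mean_life, sigma)

def exponential_hazard(t, mtbf):
    """Failure rate h(T) = f(T) / R(T) for the chance-failure model."""
    return exponential_pdf(t, mtbf) / exponential_reliability(t, mtbf)

# Hypothetical parameters, in hours.
MTBF = 50_000.0
MEAN_LIFE, SIGMA = 20_000.0, 4_000.0

for hours in (10_000.0, 20_000.0, 30_000.0):
    print(hours, exponential_hazard(hours, MTBF), normal_hazard(hours, MEAN_LIFE, SIGMA))
```

For the exponential model the rate stays at 1/m at every age, while the normal model's rate climbs as T approaches and passes the mean life M.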
21
Failures vs. Time as a Function of Stress
(Figure: one failures-vs.-time curve each for high, medium, and low stress)
22
Highly Accelerated Stress Testing
• Test to Failure
• Fix Failed component
• Continue to Test
• Appropriate for developing world?
23
Duane Plot
Reinertsen, p. 237
(Figure: log failures per 100 hours vs. log cumulative operating hours; labels: Actual Reliability (plotted points), Predicted, Required Reliability at Introduction)
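A Duane plot is a straight line on log-log axes, which corresponds to a power law. Assuming the vertical axis is the cumulative failure rate (cumulative failures divided by cumulative hours, as in Duane's classic formulation), the line can be fitted by least squares and extended to estimate how many test hours are needed to reach a required rate. The test data, target, and function names below are hypothetical.

```python
import math

def fit_duane(cum_hours, cum_failures):
    """Fit N(T)/T = K * T**(-alpha) by least squares on log-log axes.

    cum_hours: cumulative operating hours at each observation.
    cum_failures: cumulative failures observed by that time.
    Returns (K, alpha); alpha > 0 means the failure rate is falling (reliability growth).
    """
    xs = [math.log(t) for t in cum_hours]
    ys = [math.log(n / t) for t, n in zip(cum_hours, cum_failures)]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    k = math.exp(y_bar - slope * x_bar)
    return k, -slope

def hours_to_reach(failures_per_100h, k, alpha):
    """Cumulative hours at which the fitted cumulative rate drops to the target."""
    target_per_hour = failures_per_100h / 100.0
    return (target_per_hour / k) ** (-1.0 / alpha)

# Hypothetical test log: cumulative hours and cumulative failures.
hours = [200, 500, 1_000, 2_000, 5_000]
failures = [6, 11, 17, 25, 42]
k, alpha = fit_duane(hours, failures)
print(f"alpha = {alpha:.2f}")
print(f"hours to reach 0.5 failures/100 h: {hours_to_reach(0.5, k, alpha):,.0f}")
```

Comparing the extrapolated hours with the time left before introduction is one way to judge whether the "Predicted" line will meet the "Required Reliability at Introduction".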
24
Integration into the Product Development Process
FMEA: Failure Modes and Effects Analysis
(Flowchart elements)
• Customer Requirements
• Baseline data from Previous Products
• Brainstorm potential failures
• Summarize results (FMEA)
• Update FMEA
• Feed results to Risk Assessment Process
• Use at Design Reviews
• Develop Failure Compensation Provisions
• Test Activity uncovers new failure modes
• Failure probabilities through test/field data
• Probabilities developed through analysis
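The slide does not say how failure modes are ranked before their results are fed to risk assessment. A widely used FMEA convention (an assumption here, not taken from the lecture) scores each failure mode from 1 to 10 for severity, occurrence, and detection, and ranks by risk priority number, RPN = S × O × D. The failure modes are borrowed from the valve fault tree later in the lecture; their scores are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (no effect) to 10 (hazardous)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost certainly caught before shipment) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

modes = [
    FailureMode("Valve fails open when commanded closed", 9, 3, 4),
    FailureMode("Excessive case leakage", 5, 4, 3),
    FailureMode("Fails to meet response time", 6, 5, 6),
]

# Highest RPN first: candidates for design-review attention and mitigation.
for mode in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"{mode.rpn:4d}  {mode.description}")
```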
25
Risk Assessment process
Assess risk
• Program Risk
• Market Risk
• Technology Risk
– Reliability Risk
• Systems Integration Risk
Devise mitigation Strategy
Re-assess
26
Fault Tree Analysis
Top event: Seal Regulator Valve Fails
• Valve fails open when commanded closed (1, continued on next page)
• Fails closed when commanded open
• Fails to meet response time (appears three times in the tree)
• Excessive leakage
• Excessive port leakage
• Excessive case leakage
• Regulates high
• Regulates low
• Excessive hysteresis
27
Fault Tree Analysis (cont.)
1. Valve fails open when commanded closed
• Electrical failure of solenoid
– Open circuit: solder joint failure; wire broken (transient electromechanical force)
– Coil short: insulation
• Mechanical failure of solenoid
– Corrosion (armature, seals): material selection
– Wear: material selection
– Contamination: valve orientation; insufficient filtering
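The fault-tree slides list events but not gate types. If each branch is treated as an OR gate (any one cause is enough) and basic events are assumed independent, an event's probability is 1 − Π(1 − p_i) over its children; an AND gate would instead multiply the children's probabilities. A minimal sketch of that evaluation for part of the solenoid's electrical branch, with hypothetical probabilities:

```python
from math import prod

def p_or(*probs: float) -> float:
    """OR gate with independent inputs: P(at least one event occurs)."""
    return 1.0 - prod(1.0 - p for p in probs)

def p_and(*probs: float) -> float:
    """AND gate with independent inputs: P(all events occur)."""
    return prod(probs)

# Hypothetical probabilities for basic events over one mission (not from the slides).
solder_joint = 1e-4
wire_broken = 5e-5
insulation = 2e-5

open_circuit = p_or(solder_joint, wire_broken)
coil_short = insulation
electrical = p_or(open_circuit, coil_short)
print(f"P(electrical failure of solenoid) = {electrical:.3e}")
```

For small probabilities the OR result is close to the simple sum (here about 1.7e-4), which is why the largest basic-event probabilities dominate the top event.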
28
FMEA
29
FMEA Root Cause Analysis
30
Fault Tree Analysis: Example
Example: A solar-cell-driven LED
31
Reliability Management
• Redundancy
– Examples
• Computers
• memory chips?
• Aircraft
– What are the problems with this approach?
• 1. Design inelegance
– expensive
– heavy
– slow
– complex
• 2. Suboptimization
– Can distract from the real goal: improving component and system reliability by reducing defects
– Where should the redundancy be allocated?
• system
• subsystem
• board
• chip
• device
• software module
• operation
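A quick calculation shows why the allocation question matters. Assuming independent failures and ideal switching, a parallel group works unless all of its units fail, R = 1 − Π(1 − r_i), while a series chain needs every unit, R = Π r_i. The sketch below (hypothetical function names and a hypothetical component reliability of 0.9) compares duplicating a whole two-component series system with duplicating each component.

```python
from math import prod

def series(*reliabilities: float) -> float:
    """All units must work."""
    return prod(reliabilities)

def parallel(*reliabilities: float) -> float:
    """At least one unit must work (ideal, independent redundancy)."""
    return 1.0 - prod(1.0 - x for x in reliabilities)

r = 0.9  # hypothetical reliability of each component

no_redundancy = series(r, r)
system_level = parallel(series(r, r), series(r, r))       # two complete systems
component_level = series(parallel(r, r), parallel(r, r))  # each component doubled

for label, value in (("no redundancy", no_redundancy),
                     ("system-level redundancy", system_level),
                     ("component-level redundancy", component_level)):
    print(f"{label:<28}{value:.4f}")
```

Both redundant designs use four components, yet component-level redundancy comes out higher (0.9801 vs. 0.9639) under these idealized assumptions; in practice switching, common-cause failures, and the cost and weight penalties listed above can change the answer.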
32
Other “best practices”
• Fewer Components
• Small Batch Size (why?)
• Better material selection
• Parallel Testing
• Starting Earlier
• Module to systems test allocation
• Predictive (Duane) testing
• Look for past experience
– emphasize re-use
• over-design
– e.g. power modules
• Best: Understand the physics of the failure and model it
– e.g. Crack propagation in airframes or nuclear reactors
33
Other suggestions?

Lecture 10 Reliability.ppt
