Reliability Tools and Integration Seminar

RELIABILITY INTEGRATION TOOLS

RELIABILITY INTEGRATION

“the process of seamlessly and cohesively integrating reliability tools to maximize reliability at the lowest possible cost”

Reliability vs. Cost
♦ Intuitively, an emphasis on reliability increases development and manufacturing costs while reducing warranty and in-service costs, so there is some level of reliability at which total cost is minimized. Use of the proper tools during the proper life cycle phase will help to minimize total Life Cycle Cost (LCC).

CRE Primer by QCI, 1998

Reliability vs. Cost, continued
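[Figure: cost versus reliability. Reliability program costs rise and warranty costs fall as reliability increases; the total cost curve (their sum) reaches a minimum at the optimum cost point.]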
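The trade-off in the figure can be illustrated with a toy model. The sketch below is a minimal illustration, not the source's method: it assumes a hypothetical program cost k·R/(1 - R) that grows without bound as reliability R approaches 1, and a warranty cost proportional to the fraction of units failing during warranty. All constants are made up for illustration.

```python
import math

# Hypothetical cost model (illustrative numbers only, not from the source).
PROGRAM_COST_SCALE = 1_000.0   # k: scales the reliability program cost
UNITS_SHIPPED = 10_000
COST_PER_FAILURE = 100.0       # warranty cost per failed unit


def program_cost(r: float) -> float:
    """Reliability program cost: rises steeply as reliability R approaches 1."""
    return PROGRAM_COST_SCALE * r / (1.0 - r)


def warranty_cost(r: float) -> float:
    """Warranty cost: fraction failing in warranty x units x cost per failure."""
    return (1.0 - r) * UNITS_SHIPPED * COST_PER_FAILURE


def total_cost(r: float) -> float:
    return program_cost(r) + warranty_cost(r)


def optimum_reliability(step: float = 1e-4) -> float:
    """Grid-search R in [0.5, 0.9999] for the minimum total cost."""
    n_steps = int(round(0.4999 / step))
    candidates = [0.5 + i * step for i in range(n_steps + 1)]
    return min(candidates, key=total_cost)


r_opt = optimum_reliability()
print(f"optimum R = {r_opt:.4f}, total cost = {total_cost(r_opt):,.0f}")
```

For this toy model the optimum can also be found analytically: setting the derivative of total cost to zero gives R* = 1 - sqrt(k / (units × cost per failure)), about 0.968 with these numbers, which the grid search reproduces to within its step size.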


In order to minimize total Life Cycle Cost (LCC), a Reliability Engineer must do two things:
♦ choose the best tools from all of those available and apply them at the proper phases of the product life cycle.
♦ properly integrate these tools to assure that the right information is fed forward and backward at the right times.


As part of the integration process, we must choose a set of tools at the heart of our program, which all other tools feed into and are fed from. The tools we have chosen for this are:

Reliability Goals & Metrics
and
HALT and HASS

Reliability Definition
♦ Reliability is often considered quality over time

♦ Reliability is the probability of a product
performing its intended function over its specified
period of usage, and under specified operating
conditions, in a manner that meets or exceeds
customer expectations.

Reliability Goals & Metrics Summary
♦ Reliability Goals & Metrics tie together all stages of the product life cycle. Well-crafted goals provide the target for the business to achieve; they set the direction.

♦ Metrics provide the milestones, the “are we there yet?”, the feedback that all elements of the organization need to stay on track toward the goals.

♦ A reliability goal includes each of the five elements
of the reliability definition.
• Probability of product performance
• Intended function
• Specified life
• Specified operating conditions
• Customer expectations

♦ A reliability metric is often something that the organization can measure on a relatively short, periodic basis:

• Predicted failure rate (during design phase)
• Field failure rate
• Warranty
• Actual field return rate
• Dead on Arrival rate
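As an illustration of how field metrics like these might be computed each reporting period, the sketch below assumes a simple record of shipments, dead-on-arrival units, field failures, and accumulated operating hours. The record layout and numbers are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class PeriodData:
    """Hypothetical field data for one reporting period (e.g. one month)."""
    units_shipped: int            # units shipped during the period
    doa_units: int                # units reported dead on arrival
    field_failures: int           # failures of installed units during the period
    installed_unit_hours: float   # operating hours accumulated by the installed base


def doa_rate(p: PeriodData) -> float:
    """Fraction of shipped units that were dead on arrival."""
    return p.doa_units / p.units_shipped


def field_failure_rate(p: PeriodData) -> float:
    """Observed failure rate in failures per operating hour."""
    return p.field_failures / p.installed_unit_hours


def annualized_failure_rate(p: PeriodData) -> float:
    """Observed failure rate expressed as expected failures per unit-year."""
    return field_failure_rate(p) * HOURS_PER_YEAR


# Example: 5,000 installed units running a 720-hour month (hypothetical numbers).
month = PeriodData(units_shipped=2_000, doa_units=10,
                   field_failures=12, installed_unit_hours=5_000 * 720)
print(f"DOA rate: {doa_rate(month):.2%}")
print(f"Annualized failure rate: {annualized_failure_rate(month):.2%}")
```

Trending the annualized failure rate period by period against the predicted failure rate from the design phase is one way to use these numbers as an "are we there yet" check.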

HALT and HASS Summary
♦ Highly Accelerated Life Testing (HALT) and Highly Accelerated Stress Screening (HASS) are two of the best reliability tools developed to date, and every year more engineers are turning to HALT and HASS to help them achieve high reliability.

HALT and HASS Summary, continued
♦ In HALT, a product is subjected to progressively higher stress levels in order to quickly uncover design weaknesses. Fixing these weaknesses increases the operating margins of the product, which translates to higher reliability.
♦ In HASS, a product is “screened” at stress levels above specification levels in order to quickly uncover process weaknesses, thereby reducing infant mortality failures, which translates to higher quality.

Presentation Objective

This presentation will review the best reliability tools to use in conjunction with Reliability Goals & Metrics and HALT & HASS, and how to integrate them.

RELIABILITY INTEGRATION TOOLS

Reliability Integration Tools - Summary
♦ For the Organization
• Reliability Integration for the Organization - Tools
that are used across an organization in order to
define the reliability requirements and policy of a
program.
• The output of this phase is the Reliability Program goals, metrics and policies. This structure will guide the development of reliability plans for specific products. It establishes the approach and business connections that guide the rest of the program.

♦ PHASE I: Concept Phase
• Reliability Integration in the CONCEPT Phase -
Tools that are used in the concept phase of a
project in order to define the reliability requirements
of a program. Benchmarking is usually required.
• The output of this phase is the Reliability Program
and Integration Plan. This plan will specify which
tools to use and the goals and specifications of
each. This is the plan that drives the rest of the
program.

♦ PHASE II: Design Phase
• Reliability Integration in the DESIGN Phase - Tools that are used in the design phase of a project after the reliability requirements have been defined.
• Predictions and other forms of reliability analysis
are performed here.
• These tools will only have an impact on the design
if they are done very early in the design process.

♦ PHASE III: Prototype Phase
• Reliability Integration in the PROTOTYPE Phase - Tools that are used after a working prototype has been developed.
• This represents the first time the product will be tested.
• The testing will mostly be focused on finding design issues.

♦ PHASE IV: Manufacturing Phase
• Reliability Integration in the MANUFACTURING Phase - Tools here are a combination of analytical and test tools that are used in the manufacturing environment to continually assess the reliability of the product.
• The focus here will mostly be on finding process issues.

RELIABILITY INTEGRATION IN THE CONCEPT PHASE

Reliability Integration in the
CONCEPT Phase
♦ Reliability Goal-Setting
♦ Review of Current Capabilities
♦ Gap Analysis
♦ Reliability Program and Integration Plan

Reliability Goal-Setting
♦ Reliability Goals can be derived from
• Customer-specified or implied requirements
• Internally-specified or self-imposed requirements
(usually based on trying to be better than previous
products)
• Benchmarking against competition

Reliability Goal-Setting
♦ Reliability Goals – Which Should We Use?
• Customer-specified or implied requirements?
• Internally-specified or self-imposed requirements?
• Benchmarking?
♦ For Best Results, Use All Three!

Review of Current Capabilities
♦ Once we have defined our goals, we must understand our current capabilities; these will be used to define the Gap.
• Interviews (possibly with the same people as during goal-setting)
• Review documents – plans, reports
• Review field data

Gap Analysis
♦ The Gap Analysis naturally flows from the
Reliability Goal-Setting exercise and Review of
Current Capabilities.
• Once we understand what is expected of the product in the industry, we must compare that with current capabilities; this comparison is the Gap Analysis.

Gap Analysis
♦ Measuring Size of Gap
♦ Determining if Gap is Attainable
♦ Adjusting Goals if Gap Is Unrealistically High

Gap Analysis
♦ Measuring the Size of Gap

Gap = Goals – Current Capabilities
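Applied per metric, the gap calculation can be sketched as below. This assumes goals and current capabilities are expressed as MTBF in hours (higher is better), and it uses a hypothetical rule (a required improvement of more than 2x flags the goal as possibly unrealistic) only to illustrate the "Is the gap attainable?" step; the threshold is not from the source.

```python
from __future__ import annotations

# Hypothetical screening threshold, not from the source: a goal needing more than a
# 2x improvement over current capability is flagged for review.
MAX_IMPROVEMENT_RATIO = 2.0


def gap_analysis(goals: dict[str, float], current: dict[str, float],
                 max_ratio: float = MAX_IMPROVEMENT_RATIO) -> dict[str, dict]:
    """Gap = Goal - Current Capability for each MTBF-style metric (higher is better)."""
    report = {}
    for name, goal in goals.items():
        capability = current[name]
        ratio = goal / capability
        report[name] = {
            "gap": goal - capability,
            "improvement_ratio": ratio,
            "attainable": ratio <= max_ratio,
        }
    return report


# Hypothetical MTBF goals and current capabilities (hours) for two assemblies.
goals = {"power supply": 200_000, "controller board": 150_000}
current = {"power supply": 120_000, "controller board": 50_000}

for assembly, row in gap_analysis(goals, current).items():
    print(f"{assembly}: gap {row['gap']:,} h, needs {row['improvement_ratio']:.1f}x, "
          f"attainable: {row['attainable']}")
```

A goal flagged as not attainable is a candidate for the "Adjusting Goals if Gap Is Unrealistically High" step rather than an automatic rejection.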

Reliability Program and Integration Plan
♦ A Reliability Program and Integration Plan is crucial
at the beginning of the product life cycle because in
this plan, we define:
• What are the overall goals of the product and of each assembly that makes up the product?
• What has been the past performance of the product?
• What is the size of the gap?
• What reliability elements/tools will be used?
• How will each tool be implemented and integrated to achieve the goals?
• What is our schedule for meeting these goals?

Reliability Program and Integration Plan –
Plan Execution

♦ Now it is time to execute the Reliability Program
and Integration Plan.
♦ Each element of the plan will call for a different
reliability tool.

THE REMAINDER OF THE PRESENTATION WILL
REVIEW EACH TOOL AND HOW TO INTEGRATE
IT TOGETHER WITH THE OTHER TOOLS.

RELIABILITY INTEGRATION IN THE DESIGN PHASE

DESIGN Phase
• Reliability Modeling and Predictions
• Derating Analysis
• Failure Modes, Effects, and Criticality Analysis
(FMECA)
• Design of Experiments
• Fault Tree Analysis
• Stress-Strength Analysis
• Tolerance and Worst-Case Analysis
• Human Factors Analysis
• Maintainability and Preventive Maintenance

RELIABILITY MODELING AND PREDICTIONS




Reliability Prediction: Definition
♦ A reliability prediction is a method of calculating the
reliability of a product or piece of a product from the
bottom up - by assigning a failure rate to each
individual component and then summing all of the
failure rates.
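The bottom-up summation can be sketched as follows. This assumes constant (exponential) failure rates and a series model, in which any part failure fails the product; the parts and FIT values are hypothetical placeholders, whereas a real prediction would take them from a standard such as Telcordia SR-332 or MIL-HDBK-217.

```python
import math

# Hypothetical bill of materials: (part type, quantity, failure rate in FITs).
# 1 FIT = 1 failure per 1e9 device-hours. Values are illustrative only.
BOM = [
    ("microcontroller", 1, 50.0),
    ("ceramic capacitor", 40, 0.5),
    ("resistor", 60, 0.2),
    ("connector", 4, 10.0),
    ("laser diode", 1, 300.0),
]


def predicted_failure_rate_fit(bom) -> float:
    """Bottom-up prediction: sum quantity x failure rate over every part (series model)."""
    return sum(qty * fit for _, qty, fit in bom)


def mtbf_hours(total_fit: float) -> float:
    """MTBF is the reciprocal of a constant failure rate."""
    return 1e9 / total_fit


def reliability(total_fit: float, hours: float) -> float:
    """Probability of surviving `hours` at a constant failure rate: R(t) = exp(-lambda * t)."""
    return math.exp(-total_fit * 1e-9 * hours)


total = predicted_failure_rate_fit(BOM)
print(f"Predicted failure rate: {total:.0f} FIT")
print(f"MTBF: {mtbf_hours(total):,.0f} hours")
print(f"One-year reliability: {reliability(total, 8760):.4f}")
```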

Reliability Predictions
♦ help assess the effect of product reliability on the quantity of spare units
required which feeds into the life cycle cost model.

♦ provide necessary input to system-level reliability models (e.g. frequency of
system outages, expected downtime per year, and system availability).

♦ assist in deciding which product to purchase from a list of competing products.

♦ provide input to the analysis of complex systems by showing how often different parts of the system are expected to fail, even where components are redundant.

♦ can drive design trade-off studies. For example, we can compare a design with
many simple devices to a design with fewer devices that are newer but more
complex. The unit with fewer devices is usually more reliable.

♦ set achievable in-service performance standards against which to judge actual
performance and stimulate action.
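For the first use above (spares), one common approach, assumed here rather than taken from the source, treats field failures as a Poisson process at the predicted failure rate and stocks enough spares to cover the failures expected during the resupply time at a chosen confidence. The fleet sizes, failure rate, and resupply time below are hypothetical.

```python
import math


def spares_needed(units_in_field: int, failure_rate_per_hour: float,
                  resupply_hours: float, confidence: float = 0.95) -> int:
    """Smallest spares count s such that P(failures during resupply <= s) >= confidence.

    Assumes failures across the fleet form a Poisson process with mean
    units_in_field * failure_rate_per_hour * resupply_hours.
    """
    mean_failures = units_in_field * failure_rate_per_hour * resupply_hours
    spares = 0
    term = math.exp(-mean_failures)   # P(exactly 0 failures)
    cumulative = term
    while cumulative < confidence:
        spares += 1
        term *= mean_failures / spares   # Poisson recurrence: P(k) = P(k-1) * mean / k
        cumulative += term
    return spares


# Hypothetical: 422 FIT per unit (4.22e-7 failures/hour), 30-day (720 h) resupply time.
for fleet in (500, 10_000):
    print(fleet, "units ->", spares_needed(fleet, 422e-9, 720), "spares")
```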

Reliability Modeling and Predictions:
How to Use in Preparation for HALT and
HASS, continued

♦ Reliability Modeling and Predictions can be used for:
• Identifying Thermocouple Locations
• Revealing technology-limiting components
• Calculating amount of HASS needed

Reliability Modeling and Predictions: How to Use in Preparation for HALT and HASS, continued
♦ Reliability Modeling and Predictions can be used to identify thermocouple locations
• For temperature stresses, many component temperatures will be measured during HALT; therefore, a quick analysis is helpful before choosing thermocouple locations. This analysis reveals which component types are more sensitive to temperature from a reliability perspective, and when it is used in conjunction with some basic thermal analysis tools, the temperature gradients of a product can easily be modeled. Used properly during the setup of a HALT, this analysis can be a very powerful tool in planning the discovery of the upper thermal operating limit and the upper thermal destruct limit.
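One way to carry out this quick analysis, assuming the Arrhenius model commonly used in reliability predictions for temperature acceleration, is to rank component types by how much a temperature rise accelerates their failure rate. The activation energies below are hypothetical placeholders, not recommended values.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor between a use and a stress temperature (deg C)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


# Hypothetical activation energies (eV) for component types on the board.
ACTIVATION_ENERGY = {
    "laser diode": 0.9,
    "electrolytic capacitor": 0.7,
    "CMOS logic IC": 0.5,
    "film resistor": 0.3,
}

# Rank component types by how strongly a 40 C -> 85 C rise accelerates their failure rate;
# the most sensitive types are candidates for thermocouple placement.
ranked = sorted(ACTIVATION_ENERGY,
                key=lambda part: arrhenius_af(ACTIVATION_ENERGY[part], 40.0, 85.0),
                reverse=True)
for part in ranked:
    print(f"{part:24s} AF = {arrhenius_af(ACTIVATION_ENERGY[part], 40.0, 85.0):6.1f}")
```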

Reliability Modeling and Predictions: How to Use in Preparation for HALT and HASS, continued

♦ Reliability Modeling and Predictions reveal
technology-limiting components
• Reliability predictions can also reveal technology-
limiting components – components that are much
more sensitive to external stresses due to the
technology being used (e.g. opto-electronics are very
sensitive to high temperature)

Reliability Modeling and Predictions: How to Use in Preparation for HALT and HASS, continued
♦ Reliability Modeling and Predictions can be used to calculate the amount of HASS needed
• After HALT is complete, the effect of the first-year multiplier factor on the reliability prediction will play a big part in helping to determine the HASS profile, because the first-year multiplier factor is derived from the amount of “effective” screening being performed, and HASS is probably the most effective type of screening developed to date.

Reliability - Reliability Modeling/Prediction Flow

♦ Work with Marketing to develop reliability/availability targets from requirements
♦ Develop models based on redundancies and perform reliability allocations for each subassembly
♦ Work with product architects to implement redundancies where necessary to meet reliability targets
♦ Work with hardware engineering during board design using reliability allocations
♦ Perform reliability predictions to determine if the design stays within targets
♦ Does the design stay within targets?
• Yes: publish reliability results, then use the results as input for HALT and as input for Reliability Demonstration Testing
• No: can the model or requirements change? If yes, evaluate the model and requirements; if no, work with hardware engineering to make changes to the product
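The allocation and redundancy steps in this flow can be sketched with simple block-diagram math. This assumes independent subassemblies, equal apportionment of a series-system target (one of several allocation methods), and hypothetical reliability values for a three-subassembly product.

```python
import math


def series(*reliabilities: float) -> float:
    """All blocks must work: R = R1 * R2 * ... * Rn."""
    return math.prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one redundant block must work: R = 1 - (1 - R1)(1 - R2)...(1 - Rn)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def equal_allocation(system_target: float, n_subassemblies: int) -> float:
    """Equal apportionment of a series-system target: each subassembly gets target**(1/n)."""
    return system_target ** (1.0 / n_subassemblies)


TARGET = 0.95  # hypothetical mission reliability target for the system
print(f"Allocation per subassembly: {equal_allocation(TARGET, 3):.4f}")

# Hypothetical predicted reliabilities for controller, I/O board, and power supply.
single_psu = series(0.99, 0.985, 0.96)
redundant_psu = series(0.99, 0.985, parallel(0.96, 0.96))
print(f"Single power supply:    {single_psu:.4f} (meets target: {single_psu >= TARGET})")
print(f"Redundant power supply: {redundant_psu:.4f} (meets target: {redundant_psu >= TARGET})")
```

With these numbers, a single power supply leaves the system below target while a redundant pair brings it above, which is the kind of result that would prompt the "implement redundancies where necessary" step.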

DERATING ANALYSIS

DERATING ANALYSIS

Derating is defined as
♦ Using an item in such a way that applied
stresses are below rated values, or
♦ The lowering of the rating of an item in one
stress field to allow an increase in rating in
another stress field.


DERATING ANALYSIS: How to Use in
Preparation for a HALT

♦ How to use a Derating Analysis in Preparation for a
HALT
• For electrical stresses, design engineers typically
follow derating guidelines, but there are times when
these guidelines are violated, either by exception or
by mistake. Reliability predictions can quickly catch
these violations and determine the impact on the
reliability of the component and on the product. This
is important input when planning a HALT, specifically
to determine which electrical stresses to apply as
accelerant stresses, and by how much.
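A derating check of this kind amounts to comparing each part's applied-to-rated stress ratio against a guideline limit. The sketch assumes a hypothetical list of part stresses and guideline ratios; real limits would come from the organization's own derating guidelines.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PartStress:
    ref: str          # reference designator
    parameter: str    # stressed parameter (voltage, power, current, ...)
    applied: float    # worst-case applied stress from the circuit analysis
    rated: float      # manufacturer's rated value
    max_ratio: float  # hypothetical derating guideline: max allowed applied/rated


def derating_violations(parts: list[PartStress]) -> list[tuple[str, str, float]]:
    """Return (ref, parameter, stress ratio) for every part exceeding its guideline."""
    return [(p.ref, p.parameter, p.applied / p.rated)
            for p in parts if p.applied / p.rated > p.max_ratio]


parts = [
    PartStress("C12", "voltage", applied=40.0, rated=50.0, max_ratio=0.5),
    PartStress("R7", "power", applied=0.1, rated=0.25, max_ratio=0.5),
    PartStress("Q3", "current", applied=1.5, rated=2.0, max_ratio=0.7),
]
for ref, parameter, ratio in derating_violations(parts):
    print(f"{ref}: {parameter} at {ratio:.0%} of rating")
```

Parts that appear in this list are natural candidates for the electrical accelerant stresses in the HALT plan.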

Failure Modes, Effects, and Criticality Analysis (FMECA)

FMECA, continued
♦ A FMECA is a systematic technique to analyze a
system for all potential failure modes. Each failure
mode is scored as follows:
• Probability that the failure mode will actually occur
• Severity of the failure on the rest of the system
• Detectability of the failure mode when it occurs
♦ The criticality portion of this method allows us to
place a value or rating on the criticality of the failure
effect on the entire system or user.
♦ Following the scoring of each failure mode, we
prioritize and then provide mitigations for the top
failure modes and then rescore.
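One common way to combine the three scores is the Risk Priority Number (RPN), the product of occurrence, severity, and detection ratings, each on a 1-10 scale. The sketch assumes that scheme (the source does not specify a scoring scale) and uses hypothetical failure modes and a hypothetical mitigation.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    occurrence: int  # 1 (remote) .. 10 (very likely)
    severity: int    # 1 (no effect) .. 10 (hazardous)
    detection: int   # 1 (almost certainly detected) .. 10 (cannot be detected)

    @property
    def rpn(self) -> int:
        """Risk Priority Number = occurrence x severity x detection."""
        return self.occurrence * self.severity * self.detection


def prioritize(modes, top_n=2):
    """Return the top_n failure modes by RPN, highest first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)[:top_n]


modes = [
    FailureMode("BGA solder joint crack", occurrence=4, severity=8, detection=7),
    FailureMode("connector fretting", occurrence=3, severity=6, detection=5),
    FailureMode("capacitor short", occurrence=2, severity=9, detection=4),
    FailureMode("watchdog reset", occurrence=5, severity=4, detection=2),
]
top = prioritize(modes)
for m in top:
    print(f"{m.description}: RPN {m.rpn}")

# After a hypothetical mitigation (e.g. underfill on the BGA), rescore and re-rank.
modes[0].occurrence = 2
print([m.description for m in prioritize(modes)])
```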


FMECA: How to Use in Preparation for a
HALT

♦ FMECAs can be used for:
• Identifying failure modes that HALT is likely to uncover
• Identifying failure modes that require extra planning to
find
• Identifying non-relevant failure modes
• Helping to identify the number of samples

FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify failure modes that HALT is likely to uncover
• A FMECA will identify failure modes that are likely to be found in HALT so that the duration of the HALT can be estimated.
• A FMECA can also help identify the effects of certain failure modes so that they can be easily isolated when they occur, saving troubleshooting time during the HALT.

FMECA: How to Use in Preparation for a HALT, continued

♦ FMECAs can identify failure modes that require extra planning to find during HALT
• One of the first steps in planning a HALT is to determine which stresses to apply and which test routines and monitoring techniques to use. A FMECA can identify failure modes that may be difficult to find, requiring special types or sequences of stresses, as well as special ways of exercising the product so that these failure modes can be detected.
• Conversely, a FMECA will also identify failure modes that are not critical, so that time is not spent developing test routines to find these types of failures.

FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify non-relevant failure modes
• Some failure modes for a product may not be relevant. A carefully performed FMECA can help identify these so that the HALT plan can attempt to avoid re-discovering them. This will save a lot of time and money, as well as embarrassment (nothing damages the credibility of a HALT program more than having design engineers chase non-relevant failure modes).

FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify non-relevant failure modes, continued
• Identifying all non-relevant failure modes is also very important when setting up a HASS. We may find that the product survived a specific temperature or vibration level during HALT, but degradation due to wearout may be taking place. Even a sophisticated proof-of-screen may not be able to catch some wearout mechanisms. If this occurs, HASSing the product may result in partially degraded parts being shipped to the field.

FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify non-relevant failure modes, continued
WEAROUT EXAMPLE
An example of this type of wearout failure occurs with optical components due to heat. Many optical components wear out quickly when exposed to excessive temperatures. For HALT, if we know about this when writing our HALT Plan, we can put stop points in our high temperature testing so that we avoid finding these failures. Another approach is to still go to and beyond these levels, but to disregard these failures when found and concentrate on other, more relevant failures at or above these levels. For HASS, we must be aware of these limits so that the screen does not itself cause wearout.

FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs help identify the number of samples needed for HALT
• In determining which stresses to apply, the FMECA will identify the major modes of failure and the sensitivity of each to different stresses. Then, based on the number of failure modes identified, an estimate can be made of how many failure modes will be uncovered during HALT, which helps determine the number of samples needed for the test.

DESIGN OF EXPERIMENTS

Design of Experiments (SDE)

Traditional experiments focus on one or two
factors at a couple of levels and try to hold
everything else constant (which is impossible to
do in a complex process).

When SDE is properly constructed, it can evaluate a wide range of key input factors or variables at the same time and determine the optimum level of each factor.
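A two-level full factorial is the simplest design that varies all factors at once. The sketch assumes three stresses as factors, each at a low (-1) and high (+1) level, and hypothetical defect counts per run; it estimates each factor's main effect as the difference between the mean response at its high and low levels.

```python
from __future__ import annotations

from itertools import product

FACTORS = ["temperature", "vibration", "power_cycling"]


def full_factorial(k: int) -> list[tuple[int, ...]]:
    """All 2**k combinations of low (-1) and high (+1) levels for k factors."""
    return list(product((-1, +1), repeat=k))


def main_effects(runs: list[tuple[int, ...]], responses: list[float],
                 factors: list[str]) -> dict[str, float]:
    """Main effect = mean response at the high level minus mean response at the low level."""
    effects = {}
    for j, name in enumerate(factors):
        high = [y for x, y in zip(runs, responses) if x[j] == +1]
        low = [y for x, y in zip(runs, responses) if x[j] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects


runs = full_factorial(len(FACTORS))     # 8 runs, in itertools.product order
responses = [1, 1, 2, 2, 5, 5, 6, 6]     # hypothetical defects found in each run
effects = main_effects(runs, responses, FACTORS)
print(effects)
```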


Design of Experiments: When to Use in
Conjunction with HALT and HASS

♦ When to use Design of Experiments in conjunction
with HALT and HASS
• When identifying how to combine different stresses for
HALT or HASS
• When trying to justify using combined stresses for
HASS
• When troubleshooting HALT or HASS failures

Design of Experiments: When to Use in Conjunction with HALT and HASS, continued

♦ Design of Experiments can be used when identifying how to combine different stresses for HALT or HASS
• It may be worth running a Design of Experiments with different stresses to find which stresses combined together are best at finding defects. Papers have been published on which stresses work best together, but the results may differ for some types of products, in which case running a small Design of Experiments can help.


♦ Design of Experiments can be used when trying to
justify using combined stresses for HASS
• When trying to justify implementing HASS, it may be
worth running a Design of Experiments with different
stresses to find which stresses to use. This can help
determine the return on investment if capital
equipment must be purchased. With most products, a
combined environment chamber is the best equipment
to use, but some companies have been able to justify
running a single stress HASS using existing
equipment without compromising the effectiveness of
the screen.


♦ Design of Experiments can be used when
troubleshooting HALT or HASS failures
• Some failures require detailed failure analysis, and
Design of Experiments is a specific type of failure
analysis tool that can be deployed when trying to
determine the cause of a failure.


♦ Design of Experiments can be used when
troubleshooting HALT or HASS failures, continued
EXAMPLE
A failure of an IC occurs in HASS ramping up to 80°C
while power cycling and modulating the vibration from
3 to 5 Grms. A second failure of the same IC occurs
ramping down from 80°C while power cycling and
modulating the vibration from 5 to 3 Grms. Which
stress(es) contributed to the failure? We don’t have
an unlimited sample of units at our disposal to figure
this out. A mini Design of Experiments can help solve
this for us.
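When units are scarce, as in this example, a fractional factorial can cut the number of runs. The sketch assumes a half fraction of a 2^3 design (generator C = A×B, 4 runs instead of 8), with three stresses from the example as on/off factors and a hypothetical pass (0) / fail (1) result for one unit per run.

```python
from itertools import product

FACTORS = ["thermal_ramp", "vibration", "power_cycling"]


def half_fraction_2_3():
    """2^(3-1) design: factor C is set to A*B (defining relation I = ABC), giving 4 runs."""
    return [(a, b, a * b) for a, b in product((-1, +1), repeat=2)]


def main_effects(runs, responses, factors):
    """Mean response at the high level minus mean response at the low level, per factor."""
    effects = {}
    for j, name in enumerate(factors):
        high = [y for x, y in zip(runs, responses) if x[j] == +1]
        low = [y for x, y in zip(runs, responses) if x[j] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects


runs = half_fraction_2_3()
# Hypothetical outcome for one unit per run: 1 = IC failed, 0 = survived.
failed = [0, 1, 0, 1]
for run, result in zip(runs, failed):
    print(dict(zip(FACTORS, run)), "failed" if result else "passed")
print(main_effects(runs, failed, FACTORS))
```

In a half fraction like this, each main effect is aliased with the interaction of the other two factors, so a clear result (here, vibration) is a lead to confirm with follow-up runs rather than proof.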

FAULT TREE ANALYSIS

Fault Tree Analysis

♦ Fault tree analysis uses the concept of logic gates to determine the overall reliability of a system.
• When we perform an FTA, we start with an undesired event. The undesired event constitutes the top event in a fault tree diagram.
• We then brainstorm (just as in an FMEA) the possible failure modes that can result in this undesired event.
♦ Fault tree analysis is also used in assessing potential system failure modes.
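The gate logic can be evaluated numerically once basic-event probabilities are estimated. The sketch assumes independent basic events and a hypothetical tree in which the top event "system shutdown" occurs if the power supply fails, both redundant fans fail, or the controller fails.

```python
import math


def and_gate(*probs: float) -> float:
    """Output event occurs only if all (independent) input events occur."""
    return math.prod(probs)


def or_gate(*probs: float) -> float:
    """Output event occurs if any (independent) input event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities over one year of operation.
P_POWER_SUPPLY = 0.01
P_FAN = 0.05          # each of two redundant fans
P_CONTROLLER = 0.002

# Top event "system shutdown": power supply fails OR both fans fail OR controller fails.
p_both_fans = and_gate(P_FAN, P_FAN)
p_top = or_gate(P_POWER_SUPPLY, p_both_fans, P_CONTROLLER)
print(f"P(both fans fail) = {p_both_fans:.4f}")
print(f"P(top event)      = {p_top:.5f}")
```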


Fault Tree Analysis: When to Use

♦ When to use Fault Tree Analysis (FTA):
• During HALT planning
• During failure analysis


♦ Using FTAs during failure analysis
• FTAs are a powerful failure analysis tool after a failure occurs, helping to identify its cause. In many cases, troubleshooting can isolate which component failed, but an FTA is needed to determine what caused the failure. Even if we know what stresses we were applying at the time, we may not know what ultimately caused the failure.


♦ Using FTAs during failure analysis, continued
EXAMPLE
A device has an internal short during vibration. After
performing an FTA, it was discovered that the
vibration caused momentary surges of the power
supply and the power supply surge ultimately
damaged the component.

STRESS-STRENGTH ANALYSIS

Stress-Strength Analysis
In the most basic terms, an item fails when the
applied stress exceeds the strength of the
item. In general, designers design for a
nominal strength and a nominal stress that will
be applied to an item. One must also be aware
of the variability about the stress and strength
nominals.
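When both stress and strength can be approximated as independent normal distributions, the probability that strength exceeds stress (the interference calculation) has a closed form. The sketch assumes normality and uses hypothetical means and standard deviations in arbitrary stress units.

```python
from statistics import NormalDist


def interference_reliability(strength_mean: float, strength_sd: float,
                             stress_mean: float, stress_sd: float) -> float:
    """P(strength > stress) when stress and strength are independent normal variables.

    The difference D = strength - stress is normal with mean (mu_strength - mu_stress)
    and standard deviation sqrt(sd_strength**2 + sd_stress**2); reliability is P(D > 0).
    """
    difference = NormalDist(strength_mean - stress_mean,
                            (strength_sd ** 2 + stress_sd ** 2) ** 0.5)
    return 1.0 - difference.cdf(0.0)


print(f"{interference_reliability(100.0, 8.0, 80.0, 6.0):.4f}")   # nominal spread
print(f"{interference_reliability(100.0, 15.0, 80.0, 6.0):.4f}")  # wider strength spread
```

Keeping the nominals fixed and widening the strength spread lowers the computed reliability, which is why the variability about the nominals matters as much as the nominals themselves.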


Stress-Strength Analysis: How to Use in Conjunction with HALT and HASS
♦ How to use Stress-Strength Analysis in Conjunction
with HALT and HASS
• When HALT discovers a failure with little margin
• When HALT discovers an inconsistent margin

♦ Using Stress-Strength Analysis after HALT
In the example below, the upper operating limit (strength) does not have enough margin compared with the upper product spec (the stress applied by a customer may slightly exceed the specs), so failures can occur within the operating range.
[Figure: stress axis showing the lower destruct limit, lower operating limit, product specs, upper operating limit, and upper destruct limit, with the operating and destruct margins marked on each side and a "Problem Area" where the upper operating limit falls close to the upper product spec.]

♦ Using Stress-Strength Analysis after HALT, continued
Many times, margins will vary from one sample to the next. If there is
enough variability, then products with margins that appear to be adequate
may still fail because of the shape or distribution of the strength curve.

[Figure: the same stress axis (lower/upper destruct limits, lower/upper operating limits, product specs, operating and destruct margins), with a "Problem Area" where sample-to-sample variation in the operating limit reaches toward the product specs.]
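One way to quantify this, assuming the upper operating limit is roughly normally distributed across units, is to estimate its spread from the HALT samples and compute the fraction of units whose limit would fall below the upper spec. The five limits and the spec below are hypothetical, and with so few samples the estimate is rough.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical upper operating limits (deg C) measured on five HALT samples.
uol_samples = [95.0, 100.0, 88.0, 102.0, 97.0]
UPPER_SPEC_C = 85.0  # hypothetical upper product specification

mu, sigma = mean(uol_samples), stdev(uol_samples)
worst_observed_margin = min(uol_samples) - UPPER_SPEC_C

# Assume the upper operating limit is normally distributed across production units.
p_limit_below_spec = NormalDist(mu, sigma).cdf(UPPER_SPEC_C)

print(f"Mean UOL {mu:.1f} C, std dev {sigma:.1f} C")
print(f"Smallest observed operating margin: {worst_observed_margin:.1f} C")
print(f"Estimated fraction of units with UOL below spec: {p_limit_below_spec:.1%}")
```

Even though every sample here cleared the spec, the spread implies that a small fraction of production units might not, which is the "Problem Area" in the figure.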

TOLERANCE AND WORST CASE ANALYSIS

Tolerance and Worst Case Analysis

Another method of evaluating design reliability is to analyze the design assuming worst case, that is, assuming that the components are at the extremes of their tolerance, environmental, or operating conditions.
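A simple worst-case analysis evaluates a circuit at every combination of tolerance extremes (corner analysis). The sketch assumes a hypothetical resistive divider with a ±2% supply and ±5% resistors; corner analysis finds the true extremes here because the output is monotonic in each input, which is not guaranteed for every circuit.

```python
from itertools import product


def divider_output(v_in: float, r1: float, r2: float) -> float:
    """Output of a resistive voltage divider: Vout = Vin * R2 / (R1 + R2)."""
    return v_in * r2 / (r1 + r2)


def worst_case(nominals, tolerances, func):
    """Evaluate func at every low/high tolerance corner and return (min, max) output."""
    extremes = [(n * (1 - t), n * (1 + t)) for n, t in zip(nominals, tolerances)]
    outputs = [func(*corner) for corner in product(*extremes)]
    return min(outputs), max(outputs)


# Hypothetical circuit: 5 V supply (+/-2 %) feeding two 10 kohm resistors (+/-5 %).
nominal = divider_output(5.0, 10_000.0, 10_000.0)
v_min, v_max = worst_case([5.0, 10_000.0, 10_000.0], [0.02, 0.05, 0.05], divider_output)
print(f"Nominal {nominal:.3f} V, worst case {v_min:.4f} V to {v_max:.4f} V")
```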


Tolerance and Worst Case Analysis:
How to Use in Conjunction with HALT
♦ How to use Tolerance and Worst Case Analysis in
conjunction with HALT
• Do not over-design - HALT will catch most tolerance
issues
• Use Tolerance and Worst Case Analysis in critical
areas that have wide tolerance spreads

Tolerance and Worst Case Analysis: How to Use in Conjunction with HALT, continued
♦ Do not over-design – HALT will catch most tolerance issues
• Many engineers make the mistake of over-designing the entire product. A better approach is to design the product using basic design guidelines and then let HALT point out the weaknesses of the product. If HALT is performed early enough, a Tolerance Analysis can be run on the failure areas, and these areas can be strengthened without impacting the schedule. This saves cost, because only a small portion of the product will then be “over-designed”.

Tolerance and Worst Case Analysis: How to Use in Conjunction with HALT, continued
♦ Use Tolerance and Worst Case Analysis in critical areas that have wide tolerance spreads
• If a part of the design has a wide tolerance spread, then tolerance-related issues may not be picked up in HALT because of the small sample size being used. In these cases, worst-case design practices may need to be employed.
• Many mechanical assemblies fall into this category. If issues can arise due to tolerance stacking of dimensions, designing for worst case is the best approach.
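For a linear dimensional stack, the worst-case tolerance is the sum of the individual tolerances, while the root-sum-square (RSS) estimate assumes independent, centered variations and is therefore smaller. The part dimensions and cavity size below are hypothetical.

```python
import math

# Hypothetical stack of three parts along one axis: (nominal mm, +/- tolerance mm).
STACK = [(10.0, 0.10), (20.0, 0.20), (5.0, 0.05)]
CAVITY_MM = 35.40  # hypothetical housing cavity the stack must fit into (treated as exact)

nominal = sum(n for n, _ in STACK)
worst_case_tol = sum(t for _, t in STACK)            # every part at its extreme at once
rss_tol = math.sqrt(sum(t ** 2 for _, t in STACK))   # statistical root-sum-square estimate

min_clearance_wc = CAVITY_MM - (nominal + worst_case_tol)
min_clearance_rss = CAVITY_MM - (nominal + rss_tol)
print(f"Stack {nominal:.2f} mm, worst case +/-{worst_case_tol:.3f} mm, RSS +/-{rss_tol:.3f} mm")
print(f"Minimum clearance: worst case {min_clearance_wc:.3f} mm, RSS {min_clearance_rss:.3f} mm")
```

Designing to the worst-case figure guarantees fit for every combination of in-tolerance parts, which is the point of using worst-case practices where HALT's small sample size cannot expose tolerance stacking.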

HUMAN FACTORS ANALYSIS

Human Factors Analysis
♦ Human Factors Considerations must be reviewed in
each design for:
• Safety
• Workmanship
• Maintainability

♦ Depending on the product type and user interface,
the scope of this task can vary dramatically.

Human Factors Analysis: How to use in
Planning for HALT and HASS

♦ How to use Human Factors Analysis in planning for
HALT and HASS
• Use/Abuse Conditions Added to HALT Plan
• Human Factors Analysis Can Find Manufacturing Variability before HASS Catches It

Human Factors Analysis: How to use in Planning for HALT and HASS, continued
♦ Human Factors Analysis can pinpoint use/abuse conditions so that they can be added to the HALT plan
• In products with extensive user interaction, use/abuse scenarios must be considered. This can lead to additional stresses and tests being required.
EXAMPLE
On a medical product that was intended to be carried
around in a purse, two protocols were developed and
added to the HALT Plan:
• the possibility of a sharp object accidentally poking
into the side of the product.
• the possibility of lipstick coming in contact with the
product.

Human Factors Analysis: How to use in Planning for HALT and HASS, continued

♦ Human Factors Analysis Can Find Manufacturing Variability before HASS Catches It
• One of the goals of a Human Factors Analysis is to make the product easier to manufacture. Variability in manufacturing processes is easily detected in HASS, but issues found in HASS are more expensive to fix, and if there are too many variability issues, HASS is liable to miss some. Therefore, a good Human Factors Analysis of the manufacturing process can help increase throughput during HASS.

MAINTAINABILITY AND PREVENTIVE MAINTENANCE

Maintainability and Preventive
Maintenance, continued

Maintainability is a function of the design cycle with
the focus on providing a system design that
contributes to the ease of maintenance and lowest
life cycle cost.
• Maintainability must be applied early because it can
drive both the mechanical and in some cases the
electrical design.
• A maintainability prediction is a calculation of the
average amount of time a product will be in repair
once a failure occurs. This is a function of isolation
time, repair time, and checkout time.
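The maintainability prediction can be sketched as a failure-rate-weighted average of repair times, where each item's repair time is its isolation, repair, and checkout time. The items, failure rates, and times are hypothetical, and the inherent availability formula MTBF / (MTBF + MTTR) is added only to show how the result could feed a system-level model.

```python
from dataclasses import dataclass


@dataclass
class RepairableItem:
    name: str
    failure_rate_fit: float  # failures per 1e9 hours
    isolate_h: float         # time to isolate the fault to this item
    repair_h: float          # time to remove/replace or repair it
    checkout_h: float        # time to verify the repair

    @property
    def repair_time_h(self) -> float:
        return self.isolate_h + self.repair_h + self.checkout_h


def mttr_hours(items) -> float:
    """Failure-rate-weighted mean time to repair: sum(lambda_i * t_i) / sum(lambda_i)."""
    total_rate = sum(i.failure_rate_fit for i in items)
    return sum(i.failure_rate_fit * i.repair_time_h for i in items) / total_rate


def inherent_availability(mtbf_h: float, mttr_h: float) -> float:
    """Fraction of time the product is operational, ignoring logistics delays."""
    return mtbf_h / (mtbf_h + mttr_h)


items = [
    RepairableItem("power supply", 200.0, isolate_h=0.5, repair_h=1.0, checkout_h=0.5),
    RepairableItem("main board", 800.0, isolate_h=1.0, repair_h=1.5, checkout_h=0.5),
    RepairableItem("fan tray", 100.0, isolate_h=0.25, repair_h=0.25, checkout_h=0.5),
]
mttr = mttr_hours(items)
mtbf = 1e9 / sum(i.failure_rate_fit for i in items)
availability = inherent_availability(mtbf, mttr)
print(f"MTTR {mttr:.2f} h, MTBF {mtbf:,.0f} h, availability {availability:.6f}")
```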


Maintainability and Preventive Maintenance, continued

Preventive maintenance (PM) aims to prevent failures through planned or scheduled efforts. PM can be based on:
• scheduled service for cleaning.
• service for lubricating.
• detection of early signals of problems.
• replacement after specific length of use.


Maintainability and Preventive Maintenance: How to Use in Conjunction with HALT and HASS
♦ How to use Maintainability and Preventive Maintenance in conjunction with HALT and HASS
• Performing HASS on spares
• Being prepared to maintain the system during HALT

Maintainability and Preventive Maintenance: How to Use in Conjunction with HALT and HASS, continued
♦ Performing HASS on spares in conjunction with Preventive Maintenance
• Performing Preventive Maintenance/parts replacement on subsystems that have no wearout mode, or well before a subsystem goes into its wearout mode, can actually reduce reliability, because we will be taking out a part in its steady-state failure period (bottom of the bathtub curve) and replacing it with one in the infant mortality period (left-most part of the bathtub curve). One way around this is to perform HASS on the subsystem before shipping it as a spare.
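The bathtub argument can be made concrete with a Weibull hazard function, where a shape parameter β < 1 models the decreasing failure rate of the infant mortality period. The sketch models only that left side of the bathtub, assumes hypothetical Weibull parameters, and compares a part 10,000 hours into service with a fresh, unscreened spare 10 hours into service.

```python
def weibull_hazard(t_hours: float, beta: float, eta_hours: float) -> float:
    """Weibull hazard (instantaneous failure rate): h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta_hours) * (t_hours / eta_hours) ** (beta - 1)


# Hypothetical infant-mortality population: beta < 1 means the hazard falls with age.
BETA, ETA_HOURS = 0.5, 1e6

h_installed = weibull_hazard(10_000, BETA, ETA_HOURS)  # part already 10,000 h into service
h_new_spare = weibull_hazard(10, BETA, ETA_HOURS)      # unscreened spare, 10 h into service
ratio = h_new_spare / h_installed
print(f"New spare hazard is {ratio:.1f}x that of the part it replaced")
```

A HASS on the spare is intended to use up the early, high-hazard portion of this curve before the spare is shipped.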

Maintainability and Preventive Maintenance: How to Use in Conjunction with HALT and HASS, continued
♦ Maintainability Analysis prior to HALT can reveal how to diagnose and repair the product during HALT
• When planning a HALT, a maintainability analysis will indicate what equipment is needed to diagnose and repair different types of failure modes. This will save the HALT engineer a lot of time, and it may even lead to changes in the test plan to avoid discovering these types of failure modes if adequate resources are not available to help fix the failure (or to postpone discovery of those failure modes until the resources are available).

RELIABILITY INTEGRATION IN THE PROTOTYPE PHASE

PROTOTYPE Phase

• Highly Accelerated Life Testing (HALT)
• Failure Reporting, Analysis and Corrective Action
System (FRACAS)
• Reliability Demonstration Test

HIGHLY ACCELERATED LIFE TESTING (HALT)

HALT: How to Perform HALT in
Conjunction with the Reliability Tools
♦ How to Perform HALT in Conjunction with the other
Reliability Tools
• Planning for a HALT
• Using results from the Modeling and Predictions, FMECA,
and Derating Analyses to help develop the HALT Plan
• Executing the HALT
• Using a FRACAS for root cause analysis on each failure
• Using the HALT Results
• Using the HALT results to help plan the RDT
• Using the HALT results to help plan HASS
Each of these is discussed in more detail in the section for the corresponding tool.

HALT Flow Chart

Reliability - Highly Accelerated Life Testing (HALT) Flow

♦ Use reliability modeling/derating data as input
♦ Perform a Failure Modes and Effects Analysis (FMEA) to determine weakpoints in a design
♦ Research environmental limitations on all “exotic” technologies being used
♦ Perform HALT, taking the product outside environmental and performance specs to find weakpoints
♦ Evaluate failures/weaknesses and fix those that are relevant and cost-effective; send failure information to FRACAS
♦ Retest the product to determine new limits
♦ Are margins acceptable for reliability requirements?
• No: continue evaluating and fixing weaknesses, then retest
• Yes: publish results, then use the results to develop a HASS profile and a Reliability Demonstration Test

FAILURE REPORTING, ANALYSIS, AND CORRECTIVE ACTION SYSTEM (FRACAS)

FRACAS

♦ This is also sometimes referred to as Closed
Loop Corrective Action (CLCA) or Corrective
and Preventive Action (CAPA).

♦ The purpose of the FRACAS is to provide a
closed loop failure reporting system,
procedures for analysis of failures to
determine root cause, and documentation for
recording corrective action.


FRACAS: How to use in conjunction with
a HALT
♦ When performing HALT, failures are identified and
each must be taken to root cause. FRACAS is the
perfect tool for this. A FRACAS can:
• Help classify failures as to their relevancy
• Help choose the appropriate analysis tool
• Keep track of the progress on each open issue
• Help communicate results with other departments and
outside the company

FRACAS: How to use in conjunction
with a HALT, continued
♦ A FRACAS can help classify failures as to their relevancy
• During HALT, many failures are likely to be uncovered, but not all of them will be relevant. The FMECA process will identify many of these non-relevant failures in advance; for those first found in HALT, a FRACAS helps determine relevancy through the use of a variety of tools.

♦ When performing a failure analysis, there are many
tools that can be helpful. Some of these are:
• Fault Tree Analyses (FTAs)
• Fishbone diagrams
• Pareto charts
• Designs of Experiments
• Tolerance Analyses

♦ A FRACAS can keep track of the progress on each
open issue
• Each failure is assigned a unique FRACAS Report ID
• Each report requires detailed information about the
corrective action and must be signed off
• During critical stages in a project, regular FRACAS
review meetings are typically held
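The tracking features listed above (a unique report ID, required corrective action details, and sign-off) can be sketched as a small record type. The class, field names, status values, and ID format are hypothetical and not taken from any particular FRACAS product.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ANALYZED = "root cause found"
    CORRECTIVE_ACTION = "corrective action implemented"
    CLOSED = "closed"


_next_id = itertools.count(1)  # source of unique, sequential report numbers


@dataclass
class FracasReport:
    """Hypothetical FRACAS record; field names are illustrative only."""
    source: str                  # e.g. "HALT", "HASS", "repair center"
    description: str
    report_id: str = field(default_factory=lambda: f"FR-{next(_next_id):04d}")
    status: Status = Status.OPEN
    root_cause: str | None = None
    corrective_action: str | None = None
    signed_off_by: str | None = None

    def record_root_cause(self, cause: str) -> None:
        self.root_cause = cause
        self.status = Status.ANALYZED

    def record_corrective_action(self, action: str) -> None:
        if self.root_cause is None:
            raise ValueError("root cause must be recorded before corrective action")
        self.corrective_action = action
        self.status = Status.CORRECTIVE_ACTION

    def close(self, approver: str) -> None:
        if self.corrective_action is None:
            raise ValueError("cannot close a report without a corrective action")
        self.signed_off_by = approver
        self.status = Status.CLOSED


report = FracasReport(source="HALT", description="IC fails intermittently at 80 C under vibration")
report.record_root_cause("hypothetical: cracked solder joint")
report.record_corrective_action("hypothetical: revised pad geometry")
report.close(approver="reliability engineer")
print(report.report_id, report.status.value)
```

Refusing to close a report without a recorded corrective action and an approver is one way to enforce the sign-off requirement in code.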

♦ A FRACAS can help communicate results with other
departments and outside the company
• FRACAS databases are typically kept on a network
drive for general viewing
• FRACAS can be sent to a vendor to track failure
analysis
• FRACAS can be used to communicate with customers
on product development or field issues

FRACAS Flow Chart
Reliability - Failure Reporting Analysis and Corrective Action System (FRACAS) Flow

♦ Failure sources: trend failure discovered in repair center process; failure discovered in HALT process; failure discovered in HASS process
♦ Gather failure information
♦ Develop failure analysis plan for the specific failure, including resource plan
♦ Contact customer or supplier (if appropriate) to inform them of the plan
♦ Send sample of failure back to component manufacturer (if appropriate)
♦ Analyze failure to root cause
♦ Duplicate failure, if possible
♦ Report findings and recommendations
♦ Implement corrective action
♦ Test solution
♦ Did solution fix problem?
• Yes: report solution and close failure analysis; monitor effectiveness of solution / perform verification HALT; modify HASS profile, if necessary
• No: continue the failure analysis

RELIABILITY DEMONSTRATION TESTING (RDT)

RDT: What is it?
♦ A sample of units is tested at accelerated stresses for several months.
♦ The stresses are somewhat lower than the HALT stresses, and they are held constant (or cycled continuously) rather than gradually increased.
♦ This enables us to calculate the acceleration factor
for the test.
♦ The RDT can be used to validate the reliability
prediction analyses.
♦ It is also useful in finding failure modes that are
not easily detected in a high time compression test
such as HALT.
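The slides do not name an acceleration model. As an assumption, the sketch below uses the Arrhenius model, a common choice for a constant-temperature stress: AF = exp[(Ea/k)(1/T_use − 1/T_test)], with temperatures in kelvin. The activation energy (0.7 eV) and the temperatures (40 °C use, 85 °C test) are hypothetical values. A temperature-cycling RDT would need a different model, such as Coffin-Manson.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev: float, t_use_c: float, t_test_c: float) -> float:
    """Arrhenius acceleration factor between a use and a test temperature (deg C)."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_test_k))

# Hypothetical inputs: Ea = 0.7 eV, 40 C in use, 85 C constant RDT temperature.
af = arrhenius_af(0.7, 40.0, 85.0)
```

With these assumed inputs, `af` comes out at roughly 26. Under the model, each RDT hour would then stand in for about 26 hours of field use.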

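RDT: Hypothesis Testing
♦ Testing whether a mean exceeds a reference value μ0 (one-sided test)

$$H_0:\ \mu = \mu_0 \qquad H_a:\ \mu > \mu_0$$

$$n = \frac{\sigma^2\,(z_\alpha + z_\beta)^2}{\Delta^2}$$

where σ is the standard deviation, Δ is the smallest shift μ − μ0 worth detecting, and z_α, z_β are the upper α and β points of the standard normal distribution.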
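As a worked example of the sample-size formula, here is a short sketch that uses Python's standard-library `statistics.NormalDist` for the normal quantiles. The inputs σ = 1.0, Δ = 0.5, α = 0.05, and β = 0.10 are hypothetical:

```python
import math
from statistics import NormalDist

def one_sided_sample_size(sigma: float, delta: float, alpha: float, beta: float) -> int:
    """n = sigma^2 (z_alpha + z_beta)^2 / delta^2, rounded up to a whole unit."""
    z = NormalDist()                 # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha)   # upper-alpha point
    z_beta = z.inv_cdf(1 - beta)     # upper-beta point
    return math.ceil(sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)

# Hypothetical inputs: detect a half-sigma shift with alpha = 5%, beta = 10%.
n = one_sided_sample_size(sigma=1.0, delta=0.5, alpha=0.05, beta=0.10)
```

Here (1.645 + 1.282)² / 0.25 ≈ 34.3, which rounds up to n = 35. The result is rounded up because a fractional unit cannot be tested.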

RDT: Type I and Type II Errors

Decision      Null Hypothesis True       Null Hypothesis False
Reject H0     Type I error (α)           Correct (1 − β)
Accept H0     Correct (1 − α)            Type II error (β)

RDT: Success Testing

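$$n = \frac{\ln(1 - C)}{\ln R_L}$$

where C is the confidence level and R_L is the reliability to be demonstrated; all n units must complete the test without a failure.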
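The success-testing formula translates directly into code. The confidence and reliability values below are hypothetical examples:

```python
import math

def success_test_sample_size(confidence: float, reliability: float) -> int:
    """Zero-failure sample size n = ln(1 - C) / ln(R_L), rounded up."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

n_90_90 = success_test_sample_size(0.90, 0.90)   # 90% reliability at 90% confidence
n_90_95 = success_test_sample_size(0.90, 0.95)   # 95% reliability at 90% confidence
```

Demonstrating 90% reliability at 90% confidence needs 22 units with no failures. Raising the reliability target to 95% roughly doubles that to 45 units.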

RDT: Accelerated Life Testing

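$$n = R'\,V\left(\frac{K_\gamma\,\sigma}{w}\right)^2$$

where
R′ is the ratio of the Meeker-Hahn plan variance to the optimum-plan variance
V is the optimum variance factor
K_γ is the 100(1 + γ)/2 percentile of the standard normal distribution
σ is the scale parameter of log life (the standard deviation for a lognormal; 1/β if Weibull)
w is the allowable distance from the true value (the desired confidence-interval half-width)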
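Here is a sketch of the accelerated-life-test sample-size formula. In practice R′ and V would be read from published test-plan tables. The values used here (R′ = 1.5, V = 5.0) are placeholders only. The example also assumes that w is expressed on the log-time scale, so w = ln 2 corresponds to a precision factor of 2. For a Weibull with β = 1.5, σ = 1/β.

```python
import math
from statistics import NormalDist

def alt_sample_size(r_prime: float, v: float, gamma: float, sigma: float, w: float) -> int:
    """n = R' * V * (K_gamma * sigma / w)^2, rounded up."""
    # K_gamma: the 100(1 + gamma)/2 percentile of the standard normal.
    k_gamma = NormalDist().inv_cdf((1 + gamma) / 2)
    return math.ceil(r_prime * v * (k_gamma * sigma / w) ** 2)

# Placeholder plan factors; sigma = 1/beta for an assumed Weibull beta of 1.5.
n_alt = alt_sample_size(r_prime=1.5, v=5.0, gamma=0.95, sigma=1 / 1.5, w=math.log(2))
```

With these placeholder inputs, n_alt comes out at 27. Loosening the precision requirement (a larger w) reduces the sample size quadratically.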

RDT: Sample Size Calculation
Figure: Needed sample size giving approximately a 50% chance of having a confidence interval factor for the 0.2 quantile that is less than R. Weibull distribution with η = 1573 and β = 1.5; test censored at 2160 time units with 80% expected failing. Axes: sample size (log scale, 2 to 2000) vs. confidence interval precision factor R (1.0 to 3.5), with curves for 80%, 90%, 95%, and 99% confidence. (Plot dated Sat Oct 23 11:04:43 PDT 2004.)
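The figure's "80 expected percent failing" follows from the stated Weibull parameters. The Weibull CDF at the censoring time is F(t) = 1 − exp[−(t/η)^β]. The short check below uses only the values given in the figure caption:

```python
import math

def weibull_cdf(t: float, eta: float, beta: float) -> float:
    """Fraction of units expected to fail by time t for a 2-parameter Weibull."""
    return 1.0 - math.exp(-((t / eta) ** beta))

# Values from the figure: eta = 1573, beta = 1.5, censored at 2160 time units.
frac_failing = weibull_cdf(2160, eta=1573, beta=1.5)
```

The result is about 0.80, which matches the 80% quoted on the plot.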

RDT: How to Use the Results of HALT in
Planning an RDT
♦ Two of the most important decisions when planning
an RDT are which stresses to apply and how much of
each to apply. From these, we can derive the
acceleration factor for the test. HALT can help with
both.
• HALT will identify the effect of each stress on the
product, to determine which stresses are most
applicable.
• HALT will identify the product's margins with
respect to each stress. This is critical for applying as
much stress as possible in the RDT, to gain the most
acceleration, without applying so much that it causes
non-relevant failures.

RDT: How to Use the Results of
Reliability Predictions in Planning an RDT

♦ Another key factor in planning an RDT is the goal of
the test. The goal is usually driven by marketing
requirements, but the Reliability Prediction will help
determine how achievable it is.
• Although the prediction may not give an exact
MTBF number, it will give a number close enough to
help determine how long an RDT to run and what
confidence in the results to expect.
• Many times, the reliability of the product will far
exceed initial marketing requirements. If this is the
case, the RDT can be planned to try to prove these
higher levels. Once achieved, the published specs
from marketing can be increased.
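To show how a predicted MTBF can feed into test-length planning, the sketch below assumes a constant failure rate (exponential distribution) and a zero-failure, time-terminated test. Under those assumptions, the total unit-hours needed to demonstrate an MTBF goal at confidence C is T = −MTBF · ln(1 − C). Dividing T by the number of units and by the acceleration factor gives the test hours per unit. The MTBF goal, unit count, and acceleration factor below are hypothetical.

```python
import math

def zero_failure_test_time(mtbf_goal: float, confidence: float) -> float:
    """Total unit-hours (at use conditions) to demonstrate mtbf_goal with zero failures.

    Assumes an exponential (constant failure rate) model.
    """
    return -mtbf_goal * math.log(1 - confidence)

def hours_per_unit(total_unit_hours: float, units: int, accel_factor: float) -> float:
    """Chamber hours each unit must run, given the test's acceleration factor."""
    return total_unit_hours / (units * accel_factor)

# Hypothetical plan: 50,000 h MTBF goal, 90% confidence, 20 units, AF = 10.
total = zero_failure_test_time(50_000, 0.90)
per_unit = hours_per_unit(total, units=20, accel_factor=10)
```

This plan requires about 115,129 equivalent unit-hours, or roughly 576 chamber hours per unit. Any failure during the test would require a longer test, or a lower demonstrated MTBF.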

RDT Flow Chart
Reliability - Reliability Demonstration Testing Flow

Inputs: Reliability Modeling/Derating; HALT
1. Review Reliability Goals Based on Marketing Input
2. Develop Test Plan, including
   1. Number of Units
   2. Acceleration Factors
   3. Total Test Time
   4. Confidence Levels
3. Set up and Begin Reliability Test
4. Monitor Results
5. Have Reliability Goals Been Met? If yes, Publish Results

RELIABILITY
INTEGRATION IN THE
MANUFACTURING
PHASE

Reliability Tools and Integration in the
MANUFACTURING Phase

• Highly Accelerated Stress Screening (HASS)
• Highly Accelerated Stress Auditing (HASA)
• On-Going Reliability Testing (ORT)
• Repair Depot Setup
• Field Failure Tracking System
• Reliability Performance Reporting
• End-of-Life Assessment

HIGHLY ACCELERATED
STRESS SCREENING
(HASS)

HASS: How to Use the Results of FMECA and a
Reliability Prediction in Planning a HASS
♦ FMECA results can identify possible wearout
mechanisms that need to be taken into account for
HASS.
♦ Reliability Prediction results can help determine how
much screening is necessary.

HASS: How to Use the Results of
FMECA and a Reliability Prediction in
Planning a HASS, continued
♦ Using FMECA results to identify possible wearout
mechanisms that need to be taken into account for
HASS
• As we discussed in the FMECA section, certain
wearout failure modes are not easily detectable in
HALT or even in HASS Development. Therefore,
when wearout failure modes are present, we must rely
on the results of an FMECA to help determine
appropriate screen parameters.

♦ Using Reliability Prediction results to determine how
much screening is necessary
• One of the parameters of a reliability prediction is the
First Year Multiplier factor. This is a factor applied to
a product based on how much manufacturing
screening is being performed (or is planned for) to
take into account infant mortality failures.
• The factor is on a scale between 1 and 4. No
screening yields a factor of 4, and 10,000 hours of
“effective” screening yields a factor of 1 (the scale is
logarithmic).

♦ Using Reliability Prediction results to determine how
much screening is necessary, continued
• “Effective” screening hours take into account
accelerants such as temperature and temperature
cycling, so an hour of accelerated screening can count
as more than one hour of screening.
• HASS offers the best acceleration of any known
screen. Therefore, HASS is the perfect vehicle for
helping to keep this factor low in a reliability
prediction.

HASS: Using the Results of HALT to
Develop a HASS Profile

♦ Using the HALT results, we then run a HASS
Development process
• The process must prove that significant life is left
in the product after screening
• The process must prove that the screen is effective
at finding defects

HASS: Linking the Repair Depot with
HASS by Sending “NTF” hardware back
through HASS
♦ During the repair process, we may identify a large
number of “No Trouble Found” (NTF) units. HASS is
the perfect vehicle for determining whether these NTFs
are truly intermittent hardware problems or are due to
something else, so it can help resolve the NTF issue
at the Repair Depot.

ON-GOING RELIABILITY
TESTING (ORT)

On-Going Reliability Testing (ORT)

♦ ORT is a process of taking a sample of products
off a production line and testing them for a period
of time, accumulating test time across samples until
the cumulative test time is enough to demonstrate a
reliability target. The samples are rotated on a
periodic basis to:
• get an on-going indication of the reliability
• assure that the samples do not accumulate too much
wear (because the samples are shipped after the ORT is complete).
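Below is a minimal sketch of how cumulative ORT hours might be turned into a demonstrated MTBF. It assumes an exponential model and a time-terminated test. With those assumptions, the lower confidence bound is MTBF_L = 2T / χ²(C; 2r + 2), where T is the cumulative unit-hours and r is the number of failures. The χ² quantile for even degrees of freedom is computed with the closed-form Poisson-sum CDF and bisection, so only the standard library is needed. The rotation data are hypothetical.

```python
import math

def chi2_cdf_even_df(x: float, df: int) -> float:
    """Chi-square CDF for even df, via the closed-form Poisson sum."""
    k = df // 2
    term = math.exp(-x / 2)
    total = term
    for i in range(1, k):
        term *= (x / 2) / i
        total += term
    return 1.0 - total

def chi2_ppf_even_df(p: float, df: int) -> float:
    """Inverse of chi2_cdf_even_df, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_df(hi, df) < p:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def ort_mtbf_lower_bound(rotations, failures: int, confidence: float):
    """rotations: list of (units_on_test, hours_per_unit), one per rotation period.

    Returns (cumulative unit-hours, lower confidence bound on MTBF).
    """
    cumulative_hours = sum(units * hours for units, hours in rotations)
    chi2 = chi2_ppf_even_df(confidence, 2 * failures + 2)
    return cumulative_hours, 2 * cumulative_hours / chi2

# Hypothetical: two rotations of 10 units x 1000 h each, one failure, 90% confidence.
hours, mtbf_lb = ort_mtbf_lower_bound([(10, 1000), (10, 1000)], failures=1, confidence=0.90)
```

Here the 20,000 cumulative hours with one failure support an MTBF of about 5,140 hours at 90% confidence. Each new rotation adds hours and tightens the bound.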

Comparison Between ORT and HASA

♦ ORT Benefits over HASA
• You can measure reliability at any given time
♦ HASA Benefits over ORT
• Effective process-monitoring tool, due to its ability
to find failures and drive timely corrective actions
• No need to measure on-going reliability, because
reliability was already measured once in the RDT.
Also, periodic HALT is a much better vehicle for
continuously monitoring reliability over time once it
has been baselined.

REPAIR DEPOT
SETUP

Repair Depot Setup
♦ A Repair Depot facility must be set up with the
proper testing in place to reproduce the failures
and to assure that the product has enough life left
to be shipped back into the field.

♦ More importantly, it must be set up in such a
way as to learn from the failures and make
changes to the design and manufacturing
processes to assure the failures are not repeated.

Repair Depot Setup, continued
♦ Set up the Repair Depot System to feed data to the
Field Failure Tracking System
• The Repair Depot Center retests products returned
from the field to confirm failures and determine root
cause.
• The confirmation is then fed back to the Field
Failure Tracking System so that it can be properly
categorized for reliability data reporting.

FIELD FAILURE
TRACKING SYSTEM

♦ The purpose of the Field Failure Tracking System
is to evaluate a product’s performance in the field
and to identify trends quickly.

♦ Integrating the Field Failure Tracking System with
the Repair Depot Center
• Failed products from the field are returned to the
Repair Depot Center to confirm the failures and
determine root cause.
• The confirmation is then fed back to the Field
Failure Tracking System so that it can be properly
categorized for reliability data reporting.

RELIABILITY
PERFORMANCE
REPORTING

Reliability Performance Reporting
♦ Reliability Performance Reporting, in its simplest
form, is reporting back how we are doing against our
plan. In this report, we must capture
• how we are doing against our goals and against our
schedule to meet them
• how well we are integrating the tools together
• what modifications we may need to make to our plan
♦ In the report, we can also add information on specific
issues, progress on failure analyses, and Pareto and
trend charts

♦ How are we doing against our goals and against our
schedule to meet them?
• After collecting the field data, we compare it with our
goals and estimate how we are doing.
• If we are achieving a specific goal element, we explain
what pieces are working and the steps we are going to
take to assure that this continues
• If we are not achieving a specific goal element, we must
understand what contributed to this and what steps we
are going to take to change this
• As part of this, we must understand the major
contributors to each goal element through trend
plotting and failure analyses
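One simple way to compare field data with a goal element is an annualized failure rate (AFR). The example assumes an AFR-style goal, which is hypothetical; the slides do not prescribe a metric. AFR is the number of confirmed failures divided by the unit-years in service. The failure count, installed base, and 2% goal below are illustrative only.

```python
def annualized_failure_rate(confirmed_failures: int, unit_days_in_service: float) -> float:
    """Confirmed failures per unit-year of field operation."""
    unit_years = unit_days_in_service / 365.0
    return confirmed_failures / unit_years

# Hypothetical: 12 confirmed failures over 1000 units in service for a full year.
afr = annualized_failure_rate(12, 1000 * 365)
afr_goal = 0.02                      # hypothetical goal element: AFR of 2% or less
meets_goal = afr <= afr_goal
```

Only failures confirmed by the Repair Depot should be counted here. Otherwise NTF returns will inflate the rate.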

♦ How well are we integrating the tools together?
• To understand the effectiveness of our reliability
program, we must look at how the tools were
integrated across the overall program
• For example, if we stated in the plan that we were going
to use the results of the prediction as input to HALT, we
must describe here how we accomplished this
• This can help explain the effectiveness of the HALT
so that its results can be repeated
• This can help explain how the HALT can be more
effective in future programs if we overlooked or
skipped some of the integration
• This will serve as documentation for future programs

♦ What modifications may we need to make to our plan?
• Occasionally, we may need to modify the plan
• Goals may change due to new customer/marketing
requirements
• We may have discovered new tools or new
approaches to using existing tools based on research
• We may have developed new methods of integration
based on experimentation and research
• Schedule may have changed

♦ What modifications may we need to make to our plan? (continued)
• If this occurs, we need to
• Re-write the plan
• Summarize the changes in our Reliability
Performance Report so that we can accurately
capture these new elements going forward

END-OF-LIFE
ASSESSMENT

End-of-Life (EOL) Assessment
♦ We perform End-of-Life Assessments to
• Determine when a product is starting to wear out, in
case the product needs to be discontinued
• Monitor preventive maintenance strategy and
modify as needed
• Monitor spares requirements to determine if a
change in allocation is necessary
• Tie back to End-of-Life Analysis done in the Design
Phase to determine accuracy of analysis

♦ A review of the “bathtub” curve

Figure: the bathtub curve (failure rate vs. time), annotated with:
• Infant mortality: level driven by the amount of screening in manufacturing, characterized using a special factor in the prediction
• Ideal reliability at time of ship
• Steady-state reliability: level described by the prediction
• Onset of end-of-life (EOL)

♦ To figure out where we are, we plot the field data
• We must “scrub” the data to
• accurately determine the number of days in use
before failure
• properly categorize the failure
• We must be careful and plot data by assembly
type, especially if different assemblies have
different wearout mechanisms. Otherwise, it will
be impossible to determine a pattern
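Here is a minimal sketch of the "scrub" step. It assumes each field return carries an install date, a failure date, an assembly type, and a Repair Depot category. These record fields are hypothetical. The code keeps only confirmed failures, computes days in use before failure, and groups the results by assembly type so that each assembly can be plotted separately.

```python
from collections import defaultdict
from datetime import date

# Hypothetical field-return records; the field names are illustrative assumptions.
returns = [
    {"assembly": "PSU", "install": date(2004, 1, 28), "failed": date(2004, 3, 1), "category": "CONFIRMED"},
    {"assembly": "PSU", "install": date(2004, 2, 2), "failed": date(2004, 2, 20), "category": "NTF"},
    {"assembly": "CPU", "install": date(2004, 1, 30), "failed": date(2004, 4, 9), "category": "CONFIRMED"},
]

def scrub(records):
    """Keep confirmed failures, compute days in use, and group by assembly type."""
    by_assembly = defaultdict(list)
    for r in records:
        if r["category"] != "CONFIRMED":   # drop NTFs and other non-relevant returns
            continue
        days_in_use = (r["failed"] - r["install"]).days
        if days_in_use < 0:                # skip inconsistent dates (need manual review)
            continue
        by_assembly[r["assembly"]].append(days_in_use)
    return dict(by_assembly)

scrubbed = scrub(returns)
```

Here `scrubbed` is `{"PSU": [33], "CPU": [70]}`. Each list can then be fitted separately, for example with a Weibull distribution, so that different wearout mechanisms are not mixed.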

Figure: Failure Rate vs Time Plot (ReliaSoft's Weibull++ 6.0, www.Weibull.com). Data set “Since Jan 28 - (NTF-knwnissues)”, 2-parameter Weibull fit by rank regression on X (W2 RRX - SRM MED), F = 49 failures, S = 0 suspensions. Fitted parameters: β = 2.9032, η = 60.9188, ρ = 0.8154. Axes: failure rate f(t)/R(t) (0 to 0.10) vs. time t (0 to 200). (Plot by Mike Silverman, 5/2/2004.)
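The plotted curve is the Weibull hazard function h(t) = f(t)/R(t) = (β/η)(t/η)^(β−1). A shape parameter β > 1 means the failure rate rises with time, which is the wearout signature that the EOL assessment looks for. β < 1 corresponds to infant mortality, and β = 1 to a constant failure rate. The sketch below evaluates the hazard with the β and η reported for this plot:

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) = f(t)/R(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

# Parameters reported with the failure-rate plot.
BETA, ETA = 2.9032, 60.9188
rates = [weibull_hazard(t, BETA, ETA) for t in (20, 40, 60, 80)]
```

The hazard climbs steadily, reaching about 0.08 at t = 80, which is consistent with the curve's rise toward the top of the plot's 0.10 scale.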

RELIABILITY INTEGRATION
SUMMARY

Reliability Integration Summary
In this section, we learned about:
• The four phases of a reliability program
• Concept
• Design
• Prototype
• Manufacturing
• We learned about the reliability tools used in each
phase and how to integrate all of the tools together
• We learned about HALT and HASS and their role
in an overall reliability program

In the Concept Phase, we learned about:
• Benchmarking
• Gap Analyses
• Reliability Program and Integration Plan
Development
and how to use these tools to effectively plan and
execute a Reliability Program

In the Design Phase, we learned about:
• Reliability Modeling and Predictions
• Derating Analysis/Component Selection
• Tolerance/Worst Case Analysis/Design of Experiments
• Risk Management / FMECAs
• Fault Tree Analysis (FTA)
• Human Factors/Maintainability/Preventive Maintenance
• Software Reliability
and how to integrate these together and with
tools from the other phases, including HALT and
HASS

In the Prototype Phase, we learned about:
• Reliability Test Plan Development
• Highly Accelerated Life Testing (HALT)
• Design Verification Testing (DVT)
• Reliability Demonstration Testing
• Failure Analysis Process Setup
and how to integrate these together and with
tools from the other phases

In the Manufacturing Phase, we learned about:
• Highly Accelerated Stress Screening (HASS)
• On-Going Reliability Testing
• Repair Depot Setup
• Field Failure Tracking System Setup
• Reliability Performance Reporting
• End-of-Life Assessment
and how to integrate these together and with
tools from the other phases

In Summary we have learned:
• the power of developing realistic reliability goals
early, planning an implementation strategy, and
then executing the strategy, and...

the power of integration!


WHAT ARE YOUR QUESTIONS?

Further Education

• For a more in-depth view of this topic and more,
Mike will be teaching at:
• January 11th through March 1st, 2005: “Certified
Reliability Engineer (CRE) Preparation Course”
to prepare for taking the ASQ CRE Exam
• May, 2005: “Analysis & Test Tools for
Comprehensive Reliability” – a more in-depth
look at the best reliability tools being used today

Additional Services from
Ops A La Carte
Reliability Integration in the Concept Phase
1. Benchmarking
2. Gap Analysis
3. Reliability Program and Integration Plan Development

Reliability Integration in the Design Phase
1. Reliability Modeling and Predictions
2. Derating Analysis/Component Selection
3. Tolerance/Worst Case Analysis/Design of Experiments
4. Risk Management / Failure Modes, Effects, & Criticality Analysis (FMECA)
5. Fault Tree Analysis (FTA)
6. Human Factors/Maintainability/Preventive Maintenance Analysis
7. Software Reliability

Additional Services from
Ops A La Carte
Reliability Integration in the Prototype Phase
1. Reliability Test Plan Development
2. Highly Accelerated Life Testing (HALT)
3. Design Verification Testing (DVT)
4. Reliability Demonstration Testing
5. Failure Analysis Process Setup

Reliability Integration in the Manufacturing Phase
1. Highly Accelerated Stress Screening (HASS)
2. On-Going Reliability Testing
3. Repair Depot Setup
4. Field Failure Tracking System Setup
5. Reliability Performance Reporting
6. End-of-Life Assessment

Additional Educational Courses from
Ops A La Carte
1. Reliability Tools and Integration for Overall Reliability Programs
2. Reliability Tools and Integration in the Concept Phase
3. Reliability Tools and Integration in the Design Phase
4. Reliability Tools and Integration in the Prototype Phase
5. Reliability Tools and Integration in the Manufacturing Phase
6. Reliability Techniques for Beginners
7. Reliability Statistics
8. FMECA
9. Certified Reliability Engineer (CRE) Preparation Course for ASQ
10. Certified Quality Engineer (CQE) Preparation Course for ASQ

For more information...

♦ Contact Ops A La Carte (www.opsalacarte.com)
• Mike Silverman
• (408) 472-3889
• mikes@opsalacarte.com
• Fred Schenkelberg
• (408) 710-8248
• fms@opsalacarte.com

Thank you for your time !
