Cost of changes (with respect to reliability)
NJIT by Rishi R Persad
Objectives of a reliability approach
Early identification of weak points in design to:
• Limit the risk/cost of modifications in production or deployment phase
• Reduce product failures/returns/recalls during the product lifecycle
Improve time to market by early detection of weakness and flaws
Minimize number of dead-on-arrivals
Increase customer satisfaction
The definition of the reliability approach starts with an inventory by subsystem:
System breakdown into subsystems, assemblies, subassemblies and components
Typical systems: Electronic & electrical systems, mechanical, hydraulic, process
Which critical topics are relevant (FMECA: Failure modes)?
How will these critical topics be evaluated with respect to lifetime (via which standard or guideline)?
This will involve an inventory of the standards or guidelines that are the most relevant
for the application or intended purpose.
How? Example: FMECA
RPN (Risk Priority Number) = Severity × Occurrence × Detection
The RPN can then be used to compare issues within the analysis and to prioritize problems for corrective
action. The ratings are defined by:
Main published standards for this type of analysis, like SAE J1739, AIAG FMEA-3 and MIL-STD-1629A.
Industries and companies have developed their own procedures to meet the specific requirements of
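The RPN formula above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed worksheet format: the 1 to 10 rating scales, the `FailureMode` class and the three example entries are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMECA worksheet (ratings assumed on a 1-10 scale)."""
    name: str
    severity: int    # 1 = negligible effect, 10 = hazardous
    occurrence: int  # 1 = remote, 10 = very frequent
    detection: int   # 1 = almost certain to be detected, 10 = undetectable

    @property
    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical entries, for illustration only
modes = [
    FailureMode("Solder-joint crack", severity=8, occurrence=4, detection=6),
    FailureMode("Connector corrosion", severity=5, occurrence=3, detection=4),
    FailureMode("Sensor drift", severity=6, occurrence=7, detection=5),
]

ranked = rank_by_rpn(modes)
for m in ranked:
    print(f"{m.name}: RPN = {m.rpn}")
```

With these assumed ratings, sensor drift (RPN 210) ranks ahead of the solder-joint crack (RPN 192) even though its severity is lower, which is why the RPN is used for comparison within one analysis rather than as an absolute measure.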
Why use an FMECA?
FMECA/FMEA is useful as a survey method to identify effects of major failure modes in a system
It can contribute to improved designs for products and processes, resulting in higher reliability, better
quality, increased safety, enhanced customer satisfaction and reduced costs.
It helps avoid time-consuming and costly design changes at a late stage of development
The tool can also be used to establish and optimize maintenance plans, control plans and other
quality assurance procedures.
In addition, an FMEA or FMECA is often required to comply with safety and quality requirements,
such as ISO 9001, QS 9000, ISO/TS 16949, ISO 13485, FDA, …
Complex systems & processes make the task of defining a detailed FMEA/FMECA time-consuming
Assumes the causes of problems are all single events in nature (combinations of events = 1 event)
The process relies on the right participants, open communication & cooperation
Human error is sometimes overlooked
It’s just a tool. Without a follow-up plan & actions, it will not improve the reliability of your system
Evaluation & definition of the appropriate calculation methods of the failure rate
For the defined building blocks (sub-systems) & specific parts, we will analyze which norm or
standard provides the best method for the evaluation & calculation of the failure rate.
ECSS-E-ST-33-01C Space Mechanisms
o Scope of the standard: requirements applicable to the concept definition, design, analysis, development,
production, test verification and operation of space mechanisms, to meet the mission performance requirements
MTBF, FIT calculations (Prediction Method)
To obtain high product reliability, consideration of reliability issues should be integrated from the very
beginning of the design phase. This leads to the concept of reliability prediction.
MTBF: Mean Operating Time Between Failures
The failure rate of the system is calculated by summing up the failure rates of each component in
each category (based on probability theory). This applies under the assumption that a failure of any
component leads to a system failure.
Constant failure rate: relevant for the useful lifetime
Fault is repairable
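The summation described above can be sketched as follows, assuming a constant failure rate and a series system in which any component failure is a system failure. The component names and rates are hypothetical; FIT is taken in its usual sense of failures per 10^9 device-hours.

```python
def system_failure_rate(component_rates_per_hour):
    """Series system: the system failure rate is the sum of the component rates."""
    return sum(component_rates_per_hour)


def mtbf_hours(failure_rate_per_hour):
    """With a constant failure rate, MTBF is the reciprocal of that rate."""
    return 1.0 / failure_rate_per_hour


def to_fit(failure_rate_per_hour):
    """Convert failures per hour to FIT (failures per 1e9 device-hours)."""
    return failure_rate_per_hour * 1e9


# Hypothetical component failure rates in failures per hour
component_rates = {
    "microcontroller": 50e-9,
    "power supply": 200e-9,
    "connector": 10e-9,
    "temperature sensor": 40e-9,
}

lam = system_failure_rate(component_rates.values())
print(f"System failure rate: {to_fit(lam):.0f} FIT")
print(f"System MTBF: {mtbf_hours(lam):,.0f} hours")
```

In this sketch the assumed power supply contributes two thirds of the system failure rate, so it is the first place to look when the predicted MTBF falls short of the target.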
MIL-HDBK-217F is probably the most widely recognized empirical prediction method internationally.
FEM analysis (FEA)
FEA consists of a computer model (2D, 3D) of a material or design that is stressed and analyzed for
It is used in new product design and existing product refinement. A company is able to verify that a proposed
design will be able to perform to the client's specifications prior to manufacturing or construction.
What can you check at an early stage?
Point, pressure, thermal, gravity, and centrifugal static loads
Thermal loads from solution of heat transfer analysis
Heat flux and convection
Point, pressure and gravity dynamic loads
Thermal cross points
DESTECS (Design Support and Tooling for Dependable Embedded Control Software)
o Use collaborative multidisciplinary design of Embedded Systems
o Rapid construction and evaluation of system models
o Evaluated on industrial applications
Needed because embedded systems have:
o More demanding requirements for reliability and fault tolerance
o Increasingly distributed architectures: more complex design possibilities, more fault scenarios
Advantages of empirical methods:
Easy to use, and a lot of component models exist.
Relatively good indicators of inherent reliability.
Provide an approximation of field failure rates.
Disadvantages of empirical methods:
Based on statistical data that is sometimes outdated
Not all components from new designs are described in
Failure of the components is not always due to
component-intrinsic mechanisms but can be caused by
the system design.
Early validation of your system
More and faster iterations
Parallel hardware & software development
Early full system validation and risk mitigation without hardware
Less real-life testing
(= the poor man’s approach)
Not Traditional Testing!!
Traditional (QA) testing is done before product release but after the design & development phase (e.g.
burn-in tests, environmental testing, drop testing, shock & vibration testing, …)
Many of today's products are capable of operating under extremes of environmental stress and for
thousands of hours without failure. Traditional test methods are no longer sufficient to identify design
weaknesses or validate life predictions.
Testing under operating conditions takes too long
Testing is costly! (equipment, time-consuming, …)
Will not tell you anything about the reliability during useful life, only about infant failures (DOA)
Too late in the NPD process; design corrections will be
Highly accelerated testing
HALT = Highly Accelerated Life Test
Highly accelerated life testing (HALT) techniques are important in uncovering many of
the weak links of a product DURING THE DESIGN PHASE
These discovery tests rapidly find weaknesses using accelerated stress conditions
Stresses are applied in a controlled, incremental fashion while the unit under test is
continuously monitored for failures
HALT reveals product failure modes in a matter of hours or days that traditional test methods
can take weeks or even months to find, if at all
The purpose of HALT is to determine the operating and destruct limits of a design – why
those limitations exist and what is required to increase those margins. HALT, therefore,
stresses products beyond their design specifications.
Using a test environment that is more severe than that experienced during normal equipment use.
Done on early prototypes & different design concepts
Since higher stresses are used, accelerated testing must be approached with caution to avoid introducing
failure modes that will not be encountered in normal use. Accelerating factors used, either singly or in
More frequent power cycling
Higher vibration levels
More severe temperature cycling
‘It’s not a pass/fail test but a discovery process!’
• Component failures
• Component dislocation
• PCB delamination, via-cracking, …
• Solder failure
• Software failures due to component degradation
• Connector problems
Information on product limits and product capabilities outside the limits
Product weaknesses & design errors
HALT provides engineers with the opportunity to improve
product design, increasing its robustness and minimizing
the possibility of costly warranty service and expensive
product recalls after release
Once the weaknesses of the product are uncovered and corrective actions taken, the limits of the
product are clearly understood and the operating margins have been extended as far as possible.
A much more mature product can be introduced much more quickly with a
higher degree of reliability.
Taking it a step further…
Define the S-N curve for the specific failure mechanisms
Use test data in a model relating the reliability (or life) measured under high stress conditions to that
which is expected under normal operation to determine length of life
Accelerated test models relate the failure rate or the life of a component to a given stress such that
measurements taken during accelerated testing can then be extrapolated back to the expected
performance under normal operating conditions
Design for Reliability!!! PoF
EXAMPLE: Central Heating sensor
Thermal cycle vs measurement errors
Required product lifetime = 10 years
Verify the reliability of measurements with HALT test setup
Discover design weakness, improve & repeat test
Acceleration: cycle 1x/day => 1x/hour
Acceleration: min-max temperatures & high transient
Statistically meaningful number of test samples (one is not enough)
Identify & measure performance parameter(s)
Upfront definition of evaluation criteria is important.
Multiple failure modes
Non-constant (random) failures
Performance degradation over time: Quality of the measurements will degrade in time.
Temperature induced (thermo-mechanical stress)
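For the central heating sensor example above, the gain from cycling once per hour instead of once per day can be worked out as a simple time compression. The sketch below assumes one thermal cycle per day in the field for the full 10 years and 365 days per year. It covers only the cycle-rate acceleration, not the additional acceleration from wider temperature extremes.

```python
HOURS_PER_DAY = 24


def cycles_in_life(years, cycles_per_day, days_per_year=365):
    """Number of thermal cycles the product sees over its required life."""
    return years * cycles_per_day * days_per_year


def accelerated_duration_days(total_cycles, test_cycles_per_hour):
    """Test time in days needed to apply the same number of cycles."""
    return total_cycles / test_cycles_per_hour / HOURS_PER_DAY


# Values from the example: 10-year life, 1 cycle/day in use, 1 cycle/hour in test
field_cycles = cycles_in_life(years=10, cycles_per_day=1)
duration_days = accelerated_duration_days(field_cycles, test_cycles_per_hour=1)
compression = HOURS_PER_DAY * 1 / 1  # test cycles per day divided by field cycles per day

print(f"Cycles over required life: {field_cycles}")
print(f"Test duration at 1 cycle/hour: {duration_days:.0f} days")
print(f"Time compression factor: {compression:.0f}x")
```

Cycle-rate acceleration alone still requires about five months of testing for 3650 cycles, which is why the example also raises the temperature range and transients.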
HALT vs Field & Traditional testing
• Faster results (accelerated stress)
• Correct & increase design
reliability throughout the test
• Control over test conditions
• Main costs:
Fabrication of samples, test setup,
• More spread on the test results
• Same test conditions cannot be
guaranteed: difficult for quantitative
• Expensive setups
• Expensive corrective actions
• Too late in design cycle
• Only for infant failures (DOA)
Current approaches = not sufficient?
Mostly only FMECA executed. Rarely identifies design issues because of limited focus on the failure
Incorporation of HALT and failure analysis (HALT is test, not DfR; failure analysis is too late)
MTBF/MTTF calculations tend to assume that failures are random in nature
Provides no motivation for failure avoidance
Easy to manipulate numbers
Tweaks are made to reach desired MTBF
E.g., quality factors for each component are
50K hour MTBF does not mean no failures in
Source: Loughborough University
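The remark that a 50K hour MTBF does not rule out failures can be illustrated with the exponential survival function that follows from a constant failure rate, R(t) = exp(-t / MTBF). The sketch assumes continuous operation (8760 hours per year); the function name is illustrative.

```python
import math


def survival_probability(t_hours, mtbf_hours):
    """R(t) = exp(-t / MTBF), valid only for a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


MTBF_HOURS = 50_000.0
HOURS_PER_YEAR = 8760  # assumes continuous operation

failed_after_one_year = 1.0 - survival_probability(HOURS_PER_YEAR, MTBF_HOURS)
surviving_at_mtbf = survival_probability(MTBF_HOURS, MTBF_HOURS)

print(f"Fraction failed after 1 year: {failed_after_one_year:.1%}")
print(f"Fraction surviving at t = MTBF: {surviving_at_mtbf:.1%}")
```

Under these assumptions roughly 16% of units fail within the first year, and only about 37% survive to 50,000 hours.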
Alternative = Physics-of-Failure principle:
The use of science (physics, chemistry, etc.) to capture an understanding of failure mechanisms
and evaluate useful life under actual operating conditions
Focus on failure mechanisms
o Failure mode: the EFFECT by which a failure is OBSERVED, PERCEIVED or SENSED.
o Failure mechanism: the PROCESS (electrical, mechanical, physical, chemical, etc.) that causes failures.
FMMEA: Add failure mechanisms to FMEA
Center for Advanced Life Cycle Engineering (CALCE),
University of Maryland
Further break-down to PBA level
Failure site = CBGA IC broken-off from PCB
Solder-joint = Surface mount solder attachment.
Electrical interconnection & mechanical attachment of electronic
component on the PCB but also critical heat transfer in
Failure Mode = Solder-joint fatigue
Failure effect: Solder-Joint crack
Example: Solder-joint cracks
Failure Mechanism: Solder-joint fatigue by CTE mismatch
Caused by the local thermal mismatches between the different material characteristics of IC, PCB and
solder itself = CTE mismatch (CTE: coefficient of thermal expansion)
Result: different thermal expansions, driven by dissipated thermal energy, put stress on the solder joints
Fatigue leads to grain growth inside the solder. Result: cracks!
S-N curve of solder-joint fatigue
For each failure mode an S-N curve can be defined
Solder-joint fatigue = Function of Thermal strain vs N cycles to failure
Established out of:
• Test data
• FE simulation
• Physical modeling
Thermal swings (dT) in the operational environment
accelerating the thermal strain
accelerating solder-joint fatigue
accelerating failure effect: Solder-joint crack
Thermal cycling test requirements:
• Heat/cool rate limited (transient)
• Allow for minimal dwell times at extreme temperatures: time is essential.
• Materials set limits to temperature extremes
Establish acceleration factor = thermal strain (accelerated temperature conditions) / thermal strain
(normal temperature conditions)
These are mathematical models that can extrapolate the number of cycles to failure under accelerated
temperature conditions to the number of cycles to failure under operational temperature conditions
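One common way to express such a model is a Coffin-Manson-type power law, in which cycles to failure scale with the temperature swing raised to a negative exponent. The slides do not name a specific model, so this choice is an assumption, as are the exponent of 2.0 and the example temperature swings; in practice the exponent comes from test data for the solder alloy in question.

```python
def acceleration_factor(delta_t_test, delta_t_use, exponent=2.0):
    """Coffin-Manson-type acceleration factor on cycles to failure.

    AF = N_use / N_test = (dT_test / dT_use) ** exponent
    The exponent is material dependent; 2.0 is a placeholder assumption.
    """
    return (delta_t_test / delta_t_use) ** exponent


def cycles_to_failure_in_use(cycles_to_failure_in_test, af):
    """Extrapolate test cycles to failure back to operational conditions."""
    return cycles_to_failure_in_test * af


# Hypothetical conditions: 100 K swing in the chamber, 50 K swing in the field
af = acceleration_factor(delta_t_test=100.0, delta_t_use=50.0)
n_use = cycles_to_failure_in_use(cycles_to_failure_in_test=1500, af=af)

print(f"Acceleration factor: {af:.1f}")
print(f"Expected cycles to failure in use: {n_use:.0f}")
```

The extrapolation only holds while the accelerated test triggers the same failure mechanism as normal use, which is the caution raised earlier about avoiding failure modes that will not be encountered in the field.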
Example: Solder-joint cracks
Establish test failure distribution and predict operational failure distribution
using the acceleration factors and the operational use of the product
Use test data in a model relating the reliability (or life) measured under high
stress conditions to that which is expected under normal operation to
determine length of life
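A rough sketch of this last step: scale each observed test life by the acceleration factor, convert cycles to years using the operational duty, and check the result against the required life. All numbers are hypothetical, and the empirical fraction used here stands in for what would normally be a fitted life distribution (for example a Weibull fit).

```python
def scale_to_operation(test_cycles, acceleration_factor):
    """Map cycles to failure observed in test onto operational cycles to failure."""
    return [n * acceleration_factor for n in test_cycles]


def fraction_failing_before(lives_years, required_years):
    """Empirical fraction of samples whose predicted life is below the requirement."""
    failures = sum(1 for life in lives_years if life < required_years)
    return failures / len(lives_years)


# Hypothetical inputs
test_cycles_to_failure = [900, 1200, 1500, 1800, 2400]  # one value per test sample
ACCELERATION_FACTOR = 4.0    # assumed, e.g. from a thermal-strain model
USE_CYCLES_PER_YEAR = 365    # assumed: one thermal cycle per day in the field
REQUIRED_LIFE_YEARS = 10

use_cycles = scale_to_operation(test_cycles_to_failure, ACCELERATION_FACTOR)
use_years = [n / USE_CYCLES_PER_YEAR for n in use_cycles]
early_fraction = fraction_failing_before(use_years, REQUIRED_LIFE_YEARS)

for cycles, years in zip(use_cycles, use_years):
    print(f"{cycles:.0f} cycles in use = {years:.1f} years")
print(f"Fraction predicted to fail before {REQUIRED_LIFE_YEARS} years: {early_fraction:.0%}")
```

Five samples are far too few for a meaningful distribution; the sketch only shows how the acceleration factor and the operational use profile enter the prediction.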
Characteristics, benefits and limitations:
Physics not statistics.
The only way to predict long term wearout lifetime.
Testing is in general done on specially designed test samples, not on the actual product.
It is input for the design process. Can be established independent from design cycle. Time-to–
Requires profound understanding of technologies used in the product and the wearout physics
Limitation: Establishing the S-N curves and acceleration factors is a tedious, time-consuming and
expensive job with a lot of pitfalls. Therefore, for many relevant failure mechanisms S-N or
acceleration factor information is not available.
Still subject of scientific research.
VERHAERT MASTERS IN INNOVATION®
9150 Kruibeke (B)
tel +32 (0)3 250 19 00
fax +32 (0)3 254 10 08
More at www.verhaert.com