We Provide You Confidence in Your Product Reliability™
   Ops A La Carte / (408) 654-0499 / askops@opsalacarte.com / www.opsalacarte.com
DESIGN FOR
      RELIABILITY (DfR)
         SEMINAR
             at


              February 11, 2010
Mike Silverman // (408) 472-3889 // mikes@opsalacarte.com

       Ops A La Carte LLC // www.opsalacarte.com
                         © 2009 Ops A La Carte              1
DfR Seminar Overview
                                           Thurs, Feb 11, 2010
                                              - DFR SEMINAR -
       10:00-10:10am   Introduction
       10:10-10:30am   DfR Overview/Introduction
       10:30-11:00am   FMEA
       11:00-11:30am   Using FMEA to Design a Better Reliability Test Program
       11:30-11:50am   HALT
       11:50-12:10pm   Lunch Break
       12:10-12:30pm   ALT
       12:30-1:00pm    HALT vs. ALT – When to Use Which Technique?
       1:00-1:15pm     Reliability Demonstration Test (RDT)
       1:15-1:45pm     HALT vs. RDT – The HALT Calculator
       1:45-2:00pm     Wrap-Up/Questions
Note that this half-day seminar is an abridged version of a 5-day DfX seminar that we will hold three times this year:
     - Apr 16-20 in Santa Clara, CA
     - May 17-21 in Huntsville, AL
     - Oct 11-15 in Maryland
Product Life Cycle Reliability and Test Spectrum
    Wyle and OPS Combined Capabilities



[Figure: Wyle and OPS combined capabilities across the product life cycle phases Program Capture, Design, Build, Test & Eval, Qualify, Manufacture, and Operate & Maintain. Test Engineering Services: Test Quotes, Tech Requirements, Test Plans, Test Procedures, Data Analysis. Test Services: HALT, Dev Test, Qual Test, HASS, Acceptance. Reliability, Maintainability, Supportability Services: FMECA, Reliability Eng & Analysis, Configuration Management, Publications, Asset Management, Training, Lean, Six Sigma, TOC, RCM, NDI. Key: each service is marked as provided by Wyle, Ops, or Wyle & OPS.]
Jim Pinyan
                          Director of Business Development
                        Test, Engineering & Research Division
                                    (310) 563-6651
                                Jim.pinyan@wyle.com
                                 128 Maryland Street
                                El Segundo, CA 90245
RELIABILITY CONSULTING
Company Overview
Ops A La Carte provides clients with integrated reliability solutions across the
Product Life Cycle.

We have the unique ability to assess a product and understand the key
reliability elements necessary to measure/improve product performance
and customer satisfaction.

Our strength lies in our ability to tailor a solution to fit your needs based on
your product reliability requirements, schedule and budget.

HALT and HASS Labs
• Our own lab facility located in Northern California in the heart of Silicon Valley. We
  provide HALT/HASS services on a world-wide basis, using partner labs for tests
  outside California.




• Second oldest HALT facility in the world, established in 1995 (originally owned by
  QualMark)

• HALT equipment has all latest technology – only lab in region

• Highly-experienced staff with over 100 years of combined experience in HALT and
  HASS

• Tested over 500 products in over 30 different industries
The following presentation materials are
    copyright protected property of
         Ops A La Carte LLC.
These materials may not be distributed
      outside of your company.




What is DESIGN for
  RELIABILITY?


First we must ask: What is Reliability?
Reliability is often considered quality over time.



Reliability is…
 “The ability of a system or component to perform its required
  functions under stated conditions for a specified period of time”

                                                     - IEEE 610.12-1990




 We shall revisit this when we discuss Reliability Goal Setting.




Different Views of Reliability
 Product development teams
   View reliability as the domain for addressing mechanical,
   electrical, and manufacturing issues.

 Customers
   View reliability as a system-level issue, with minimal
   concern for the distinction between sub-domains.

 Since the primary measure of reliability is made by the
  customer, engineering teams must balance both views (system
  and sub-domain) in order to develop a reliable product.

[Diagram: Mechanical Reliability + Electrical Reliability + SW Reliability, combining into System reliability]
Reliability vs. Cost
   Intuitively, an emphasis on reliability that reduces
    warranty and in-service costs results in some minimal
    increase in development and manufacturing costs.


   Use of the proper tools during the proper life
    cycle phase will help to minimize total Life
    Cycle Cost (LCC).




Reliability vs. Cost, continued
   To minimize total Life Cycle Costs (LCC), an
   organization must do two things:
1. Choose the best tools from all of the tools
   available and apply these tools at the proper
   phases of the product life cycle.
2. Properly integrate these tools together to assure
   that the proper information is fed forwards and
   backwards at the proper times.




Reliability Integration
     “the process of seamlessly and
    cohesively integrating reliability
       tools to maximize reliability
       at the lowest possible cost”




Reliability vs. Cost, continued
[Figure: HW reliability and costs. Cost is plotted against reliability: reliability program costs rise as reliability increases, warranty costs fall, and the total cost curve has its minimum at the optimum cost point.]
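The idea of an optimum cost point can be sketched numerically. The snippet below is only an illustration: the two cost functions and every number in them are hypothetical assumptions, not data from the seminar. It adds a rising program cost to a falling warranty cost and picks the reliability target with the lowest total.

```python
# Hypothetical cost model: the functional forms and constants are
# illustrative assumptions only, not figures from the seminar.

def program_cost(r: float) -> float:
    """Reliability program cost, assumed to climb steeply as r approaches 1."""
    return 10.0 / (1.0 - r)

def warranty_cost(r: float) -> float:
    """Warranty cost, assumed to fall linearly as reliability r improves."""
    return 1000.0 * (1.0 - r)

def total_cost(r: float) -> float:
    """Total cost curve: the sum of the two curves above."""
    return program_cost(r) + warranty_cost(r)

# Scan reliability targets from 0.50 to 0.99 and keep the cheapest one.
candidates = [i / 100 for i in range(50, 100)]
optimum = min(candidates, key=total_cost)

print(f"optimum reliability target = {optimum:.2f}")
print(f"total cost at optimum      = {total_cost(optimum):.1f}")
```

With real program and warranty cost estimates in place of the assumed functions, the same scan locates the optimum cost point on the total cost curve.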
ELEMENTS
   OF A
RELIABILITY
 PROGRAM
DfR Tool Selection


A reliability assessment is the recommended first
step in establishing a reliability program. This
mechanism is the appropriate forum for selecting
the best tools for each product life cycle phase.




RELIABILITY
ASSESSMENT

Reliability Program Assessment
  •   Initiate a Reliability Program
  •   Determine next best steps
  •   Reduce customer complaints
  •   Select right tools
  •   Improve reliability

A detailed evaluation of an organization’s approach and
processes involved in creating reliable products. The assessment
captures the current state and leads to an actionable reliability
program plan.

[Figure: staircase from "Now" (field failures, complaints, $ unreliability, unknown reliability) up through Assessment Interviews, Statistical Data Analysis, Benchmarking, Gap Analysis, and Program Plan to the "Goal" ($ profits, market share, satisfaction).]
Agenda

             • motivation
             • approach
             • results
             • findings
             • observations
             • next steps
             • close

Assessment Motivation

• Identify systemic changes that impact
 reliability
   – Tie into culture and product
   – Both enjoy benefits

• Provides roadmap for activities that
 achieve results
   – Matching of capabilities and expectations
   – Cooperative approach


Assessment Approach
 Preparation

 Checklist

 Who to interview in organization

 Analysis, average scores and summary of
 comments




Steps Involved

 Select people to survey
 Select survey topics
 Develop scoring system
 Analyze data
 Summarize feedback results
 Review results
 Recommend actions




Select People to Survey
Hardware:
 Hardware manager
 Electrical engineering lead
 Mechanical engineering lead
 System engineering lead
 Reliability manager/engineer
 Procurement
 Manufacturing

Software:
 SW R&D manager
 SW R&D engineer
 SW test manager
 SW test engineer


Select Survey Topics
                    DFR Methods Survey
           Scoring: 4 = 100%, top priority, always done
                    3 = >75%, use normally, expected
                    2 = 25% - 75%, variable use
                    1 = <25%, only occasional use
                    0 = not done or discontinued
                    - = not visible, no comment

 Management:
 □ Goal setting for division
 □ Priority of quality & reliability improvement
 □ Management attention & follow up (goal ownership)


 Design:
 □ Documented hardware design cycle
 □ Goal setting by product or module
Example
 To what extent is FMEA used?
    Design Engineer

        Score = 1: Used only as a troubleshooting tool

      Manufacturing Engineer
        Score = 3: Commonly used on critical design elements

      Reliability Engineer
        Score = 4: Always used on all products

Results: Score 2.7 (mean of 1, 3, and 4)
Comments: Clearly a disconnect between reliability and
design engineering – indicative of a problem with the tool.
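As a rough sketch of the analysis step, the snippet below averages the three example scores and reports their spread, which is what exposes the disconnect between roles. Treating a "-" (not visible, no comment) answer as `None` and leaving it out of the average is an assumption made here, not something the survey method states.

```python
from statistics import mean

# Responses to "To what extent is FMEA used?" from the example above.
# None would stand for a "-" (not visible, no comment) answer.
responses = {
    "Design Engineer": 1,
    "Manufacturing Engineer": 3,
    "Reliability Engineer": 4,
}

def average_score(scores):
    """Mean of the numeric scores, skipping None ("-") responses."""
    numeric = [s for s in scores if s is not None]
    return mean(numeric) if numeric else None

avg = average_score(responses.values())
# A large spread between roles flags a disconnect worth a comment.
spread = max(responses.values()) - min(responses.values())

print(f"average = {avg:.2f}, spread = {spread}")
```

A spread of 3 on a 0-4 scale is what prompts the comment about a disconnect, even though the average alone looks moderate.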
Reliability Maturity Grid
• 5 levels of maturity
• Loosely based on IEEE 1332: “Reliability Program
  for the Development and Production of Electronic
  Products” (currently in draft form)
• Similar to Crosby’s Quality Management Maturity Grid

• On the following page is a matrix based on
  Crosby’s as an example.
• Read across each row and find the statement that
  seems most true for your organization.
• The center of mass of the levels is the
  organization’s overall level.
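One way to read "center of mass" is as the average of the stage numbers chosen across the rows. That reading is an assumption, and the stage selections below are hypothetical, so the snippet is only a sketch of the bookkeeping.

```python
from collections import Counter

# Hypothetical self-assessment: the stage (1 to 5) judged most true
# for each measurement category. These picks are illustrative only.
selected_stage = {
    "Management understanding and attitude": 2,
    "Reliability status": 2,
    "Problem handling": 3,
    "Cost of reliability": 2,
    "Feedback process": 3,
    "DFR program status": 2,
    "Summation of reliability posture": 3,
}

def center_of_mass(stages):
    """Average stage number, weighting each stage by how many rows chose it."""
    counts = Counter(stages)
    total = sum(counts.values())
    return sum(stage * n for stage, n in counts.items()) / total

overall = center_of_mass(selected_stage.values())
print(f"overall maturity level = {overall:.2f}")
```

For these hypothetical picks the result falls between Stage II and Stage III, closer to Stage II.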
Reliability Maturity Matrix
        Measurement                       Stage I:                        Stage II:                      Stage III:                 Stage IV:                     Stage V:
        Category                         Uncertainty                   Awakening                        Enlightenment                  Wisdom                     Certainty
Management                    No comprehension of               Recognizing that reliability    Still learning more about   Participating.               Consider reliability
Understanding and Attitude    reliability as a management       management may be of            reliability management.     Understand absolutes of      management an
                              tool. Tend to blame               value but not willing to        Becoming supportive and     reliability management.      essential part of company
                              reliability engineering for       provide money or time to        helpful.                    Recognize their personal     system.
                              ‘reliability problems’            make it happen.                                             role in continuing
                                                                                                                            emphasis.
Reliability status            Reliability is hidden in          A stronger reliability          Reliability manager         Reliability manager is an    Reliability manager is on
                              manufacturing or                  leader appointed, yet           reports to top              officer of company;          board of directors.
                              engineering departments.          main emphasis is still on       management, with role in    effective status reporting   Prevention is main
                              Reliability testing probably      an audit of initial product     management of division.     and preventive action.       concern. Reliability is a
                              not part of organization.         functionality. Reliability                                  Involved with consumer       thought leader.
                              Emphasis on initial product       testing still not performed.                                affairs.
                              functionality.
Problem handling              Fire fighting; no root cause      Teams are set up to solve       Corrective action process   Problems are identified      Except in the most
                              analysis or resolution; lots of   major problems. Long-           in place. Problems are      early in their               unusual cases, problems
                              yelling and accusations.          range solutions are not         recognized and solved in    development. All             are prevented.
                                                                identified or                   orderly way.                functions are open to
                                                                implemented.                                                suggestion and
                                                                                                                            improvement.
Cost of Reliability as % of   Warranty: unknown                 Warranty: 3%                    Warranty: 4%                Warranty: 3%                 Warranty: 1.5%
net revenue                   Reported: unknown                 Reported: unknown               Reported: 8%                Reported: 6.5%               Reported: 3%
                              Actual: 20%                       Actual: 18%                     Actual: 12%                 Actual: 8%                   Actual: 3%
Feedback process              None. No reliability testing.     Some understanding of           Accelerated testing of      Refinement of testing        The few field failures are
                              No field failure reporting        field failures and              critical systems during     systems – only testing       fully analyzed and
                              other than customer               complaints. Designers           design. System level        critical or uncertain        product designs or
                              complaints and returns.           and manufacturing do            modeling and testing.       areas. Increased             procurement
                                                                not get meaningful              Field failures analyzed     understanding of causes      specifications altered.
                                                                information.                    and root causes reported.   of failure allow             Reliability testing done to
                                                                                                                            deterministic failure rate   augment reliability
                                                                                                                            prediction models            models.
DFR program status            No organized activities.          Organization told               Implementation of DFR       DFR program active in all    Reliability improvement is
                              No understanding of such          reliability is important. DFR   program with thorough       areas of division – not      a normal and continued
                              activities.                       tools and processes             understanding and           just design & mfg’ing.       activity.
                                                                inconsistently applied and      establishment of each       DFR normal part of R&D
                                                                only ‘when time permits’.       tool.                       and manufacturing.
Summation of reliability      “We don’t know why we             “Is it absolutely necessary     “Through commitment         “Failure prevention is a     “We know why we do not
posture                       have problems with                to always have problems         and reliability             routine part of our          have problems with
                              reliability”                      with reliability?”              improvement we are          operation.”                  reliability.”
                                                                                                identifying and resolving
                                                                                                our problems.”



Reliability Maturity Matrix
Let's look at one row to get a better understanding.
Measurement Category: Problem handling
  Stage I (Uncertainty):     Fire fighting; no root cause analysis or resolution; lots of yelling and accusations.
  Stage II (Awakening):      Teams are set up to solve major problems. Long-range solutions are not identified or implemented.
  Stage III (Enlightenment): Corrective action process in place. Problems are recognized and solved in orderly way.
  Stage IV (Wisdom):         Problems are identified early in their development. All functions are open to suggestion and improvement.
  Stage V (Certainty):       Except in the most unusual cases, problems are prevented.
Results & Meaning
• Looking for trends, gaps in process, skill mismatches,
  over analysis, under analysis, etc.

• Looking for differences across the organization,
  pockets of excellence, areas with good results

• Process provides snapshot of current system

• No one tool makes an entire reliability program. The
  tools need to match the needs of the products and
  the culture.

• A check step is critical before moving to
  recommendations for an improvement plan
Observations
What Companies Are Doing Best
 Prediction
 HALT
 Golden nuggets
 Fast reaction to fix problems

What Companies Are Weak at
 Goal setting/Planning
 Repair & warranty invisible
 Lessons learned capture
 Single owner of product reliability
 Multiple defect tracking systems
 Reliability Integration
 Statistics
Next Steps
• Determine current state of your organization
 (Summary of Assessment)
   – Identify strong and weak areas

• Goal Setting
   – Market Analysis to gather requirements
   – Benchmarking


• Gap Analysis


• Develop plan and implement

Failure Mode and Effect
Analysis (FMEA) Seminar



FMEA



An FMEA is a systematic method
of identifying and preventing
product and process problems
BEFORE they occur.



Not close enough to home yet?




FMEA Benefits
 Facilitates investigation of design alternatives to consider high
  reliability at the conceptual stages of the design.
 Provides a basis for identifying root causes of failures and
  developing corrective actions.
 Determines the effects of each failure mode on system
  performance.
 Aids in developing test methods and troubleshooting
  techniques.
 Provides a foundation for qualitative analyses.
 Provides a structured forum for cross-functional discussions.
 Provides a common understanding and focus to reduce product
  or process issues.
 Provides documentation of the risk management effort.
Types of FMEAs

  Design FMEA
  Process FMEA
  System FMEA
  Functional FMEA
  User FMEA
  Software FMEA
  Many others




When Is an FMEA Performed?


 FMEAs are begun early in the design process and
 then updated throughout the life cycle of a product to
 capture changes in the design.




The 10 Steps
 Step 1: Review the Process/Design
 Step 2: Brainstorm potential failure modes
 Step 3: List potential effects of each failure mode
 Step 4: Assign a severity rating for each effect
 Step 5: Assign an occurrence rating for failure modes
 Step 6: Assign a detection rating for modes/effects
 Step 7: Calculate the risk priority numbers
 Step 8: Prioritize the failure modes for action
 Step 9: Take action to eliminate/reduce high-risk failure modes
 Step 10: Calculate the resulting RPN


Step 1: Review the Design or Process
 Understand the topic of study
   Design – drawings, prototypes, etc.
   Process – flowcharts, assembly instructions, etc.
 Focus on developing common understanding of
 design or process
 Designers or Process Experts available for questions




Step 2: Brainstorm potential failure
modes
 Have fun!
 How can the design/process fail?


 Break complex designs/processes into smaller
  elements
 Combine like ideas (affinity plotting)
 May have more than one failure mode per item or
  function
 List failure modes on worksheet
 Determine failure modes vs. failure mechanisms
 Use Boundary Interface Diagram Tool
 Use P-Diagram Tool
Common brainstorming tools
 Team dynamics
 Consensus-building techniques
 Team project documentation
 Idea-generation techniques
   Group brainstorming with a facilitator
   Affinity diagramming
 Flowcharting
 Boundary Interface Diagram
 P-Diagram
 Data analysis
 Graphing techniques
Step 3: List Potential effects of each
failure mode
 If the failure occurs, what are the consequences?


 List effect for each failure mode (not mechanism).


 List more than one effect, when necessary
   (note: list more than one effect if the ratings will be different or
    the solution would have to be different)




Step 4: Assign a severity rating for each
effect
 What is the consequence of the failure should it
  occur?
 Assign a severity rating for each effect
 An estimation of how serious the effects would be if
  the failure mode occurs
   Historical data
   Engineering judgment
   Experimentation, DOE, if needed




Severity
Severity is the assessment of the seriousness of the
effect of the failure mode to the next component,
subsystem, system or customer if it occurs.
Below is a typical Severity Rating Table.

Rating   Description        Definition
  10     Dangerously High   Catastrophic failure causing replacement of the entire system
   9     Very High          Failure of a FRU component, MTTR > 1 hour
   8     High               Failure of a FRU component, MTTR < 1 hour
   6     Moderate           Failure that results in reduced throughput
   4     Minor              Failure that requires a tool reset or recalibration
   2     Very Minor         Failure that can be corrected during a PM cycle
   1     None               Failure that does not affect system performance
Step 5: Assign an occurrence rating for
each failure mode
 What is the probability of the failure occurring?


 List the potential causes of failure


 Use actual data when available for rating


 When real data is not available:
   Engineering estimates or models
   Consider the probabilities of the failure causes
   Rank order then assign rating


Probability of Occurrence
 Probability of Occurrence can be in terms of failure rate or
 can just be a scale of 1-10 relative to all other failure modes.
 Below is a typical Probability Rating Table

Rating   Description        Definition
  10     Dangerously High   Likely to occur chronically (daily or hourly)
   9     Very High          Likely to occur during one week of operation
   8     High               Likely to occur during one month of operation
   6     Medium             Likely to occur during one year of operation
   4     Moderate           Likely to occur during the life of the system
   2     Low                A remote probability of occurrence in the life of the system
   1     Remote             An unlikely probability of occurrence in the life of the system
Step 6: Assign a detection rating for each
failure mode and/or effect
 What is the probability of the failure being detected
  before the impact of the effect is realized?


 List known current controls
 Those items without controls are unlikely to be
  detected (scoring 9 or 10)
 Again, use actual data when possible




Detection
A third factor used in assessing the risk of a failure is
likelihood of Detection of the failure before releasing the
product. The following table is an example of detection
scores (note that a high score indicates that the failure
is more difficult to detect).
Below is a typical Detection Rating Scale
Rating   Description      Definition
   5     Very Low         No ability to detect before it occurs and only some ability to detect after (unconfirmed failures)
   3     Moderate         No ability to detect before it occurs but can detect after
   2     High             Some ability to detect before it occurs and can detect after
   1     Almost Certain   Very likely it will be detectable before it occurs and after

Note that the Detection Scale has been derated (scale 1-5 only). For many industries, the
key drivers are severity and probability.
In many industries, there is a high unconfirmed failure rate, yet there is a high
probability that failures will repeat when units go back to the field after not
being confirmed. Hence the importance of health diagnostics and a condition-based
maintenance strategy built on these health monitoring diagnostics.
Step 7: Calculate the risk priority number
for each effect
 RPN = S x P x D


 Risk Priority Number equals
  Severity rating times
             Probability of Occurrence rating times
                            Detection rating




Risk Priority Number
 Risk Priority Number (RPN)
      The RPN is the product of the Severity Score, the
       Probability Score, and the Detection Score.
      Once all of the RPNs have been calculated, the data
       can be sorted from highest to lowest RPN to show
       which are the most critical items to work on.
      Below is an example of an RPN Table
RISK VALUE (RPN)   Risk Level         Required Action
     251-500       Intolerable Risk   Additional measures are required to ensure
                                      adequate safety.
     101-250       Undesirable Risk   Risk is tolerable only if risk reduction is impractical or
                                      if reduction costs are grossly disproportionate to the
                                      improvement(s) gained. (Requires Executive Mgt.
                                      Approval.)
      11-100       Tolerable Risk     The risk is tolerable if the cost of risk reduction will
                                      exceed the improvement(s) gained. (Requires Project
                                      Mgt. Approval.)
       1-10        Negligible         Acceptable as implemented.
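
The RPN calculation and the example bands above can be sketched in a few lines of Python. This is only an illustration: it assumes 1-10 scales for severity and probability and the derated 1-5 detection scale (which gives the maximum of 500 shown in the table). The `FailureMode` class, the `risk_band` function, and the sample failure mode are hypothetical names, not part of any FMEA tool.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int      # assumed 1-10 scale
    probability: int   # assumed 1-10 scale
    detection: int     # derated 1-5 scale from the Detection slide

    @property
    def rpn(self) -> int:
        # RPN = S x P x D
        return self.severity * self.probability * self.detection


def risk_band(rpn: int) -> str:
    """Map an RPN onto the bands of the example RPN table."""
    if rpn >= 251:
        return "Intolerable Risk"
    if rpn >= 101:
        return "Undesirable Risk"
    if rpn >= 11:
        return "Tolerable Risk"
    return "Negligible"


# Hypothetical failure mode, for illustration only.
mode = FailureMode("cap tether breaks", severity=8, probability=5, detection=3)
print(mode.name, mode.rpn, risk_band(mode.rpn))
```

If your organization uses a different detection scale or different band boundaries, only the thresholds in `risk_band` need to change.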

Step 8: Prioritize the failure modes for
action

 Simple rank ordering from high to low based on RPN


 Decide on cutoff value
   Those above get attention & resources to improve
   Those below are left alone for now


 Consider including above the cut off any Severity
 rating of 9 or 10
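
As a rough sketch of this step, the rank ordering, the cutoff, and the severity override might look like the following. The function name `prioritize`, the cutoff of 100, and the failure modes A to D are made up for illustration; the team chooses the real cutoff.

```python
def prioritize(modes, cutoff):
    """Split failure modes into 'act now' and 'leave for now'.

    modes: list of (name, severity, probability, detection) tuples.
    A mode gets attention if its RPN is above the cutoff, or if its
    severity is 9 or 10 regardless of RPN.
    """
    ranked = sorted(modes, key=lambda m: m[1] * m[2] * m[3], reverse=True)
    act, defer = [], []
    for name, s, p, d in ranked:
        rpn = s * p * d
        if rpn > cutoff or s >= 9:
            act.append((name, rpn))
        else:
            defer.append((name, rpn))
    return act, defer


# Hypothetical scores, for illustration only.
modes = [("A", 10, 1, 2), ("B", 5, 6, 4), ("C", 3, 3, 2), ("D", 7, 5, 3)]
act, defer = prioritize(modes, cutoff=100)
print("act:", act)
print("defer:", defer)
```

Mode A has a low RPN but lands in the action list because of its severity rating of 10, which is the point of the override.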



Step 9: Take action to eliminate or reduce
the high risk failure modes
 Use an organized problem-solving process


 Identify and implement actions to eliminate or reduce
 the high-risk failure modes


 Consider DOE as tool to break down and solve
 multiple variable or complex issues




Step 10: Calculate the resulting RPN as
the failure modes are reduced or
eliminated
 Document progress in reducing product risk with an
 update by the team of the resulting RPN.


 You should expect a 50% or greater reduction in total
 RPN after an FMEA


 Continue to make improvements on highest risk items
 until time, resources or overall ROI shift focus.




Linking FMEAs with Test Plans


 In order to write better test plans,
 we must first understand:
 - the use environment
 - the key risks to the design

 The best tool for this is FMEA
Developing Better Test Plans

Stated another way, we cannot
know what to test for unless we
understand the key risks.

Therefore, FMEA is one of the
best sources of input for a
Reliability Test Plan.
Case Study - Inhaler
Developing a Test Plan
without FMEA
 What types of tests can you think of for
 this device?
Developing a Test Plan
without FMEA
 We used the IEC standards and came up
 with a number of solid tests, including:
    High/Low Temperature
    Temperature Cycling
    Vibration
    Drop
    Shock
    Crush
    Humidity
    Altitude
    Did we miss any?
FMEA Generated Tests

 Then we performed an FMEA and
 came up with the following:
   Different cleaning solutions
   Pen test
   Lipstick test
   Motor Oil Test
   Cap Tether Test
   Did we miss any?
Conclusion
FMEA is a development tactic
 that can help solve the problem
 of testing too little by uncovering
 failure modes that require
 tailored test methods rather than
 only cookbook methods from
 industry standards.
HALT
Highly Accelerated
   Life Testing


HALT - Highly Accelerated
        Life Test
    Quickly discover design issues.
    Evaluate & improve design margins.
    Release mature product at market introduction.
    Reduce development time & cost.
    Eliminate design problems before release.
    Evaluate cost reductions made to product.

 Developmental HALT is not really a test you pass or fail,
 it is a process tool for the design engineers.

 There are no pre-established limits.

HALT, How It Works
    Start low and step up the stress, testing the product
     during the stressing
    Gradually increase stress level until a failure occurs
    Analyze the failure
    Make temporary improvements
    Increase stress and start process over
    Repeat until the Fundamental Technological Limit is reached
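
The loop above can be illustrated with a toy simulation. This is not a HALT procedure, only a sketch of the control flow: each weakness is assumed to cause a failure at a known stress level, a "temporary improvement" is modeled by removing that weakness, and `tech_limit` stands in for the fundamental technological limit. All names and numbers are hypothetical.

```python
def halt_campaign(start, step, tech_limit, weaknesses):
    """Simulate a step-stress campaign.

    weaknesses: dict mapping a weakness name to the stress level at
    which it causes a failure. Returns a log of (level, weakness fixed).
    """
    remaining = dict(weaknesses)
    log = []
    level = start
    while level <= tech_limit:
        # Test the product during the stressing.
        triggered = sorted(n for n, limit in remaining.items() if level >= limit)
        if triggered:
            name = triggered[0]
            log.append((level, name))   # analyze the failure
            del remaining[name]         # make a temporary improvement
            continue                    # retest at the same level
        level += step                   # otherwise step the stress up
    return log


# Hypothetical weaknesses and stress levels, for illustration only.
log = halt_campaign(start=20, step=10, tech_limit=120,
                    weaknesses={"connector": 60, "capacitor": 90})
print(log)
```

Retesting at the same level after each fix is one possible design choice; it confirms the improvement before the stress is raised again.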
HALT, Why It Works
[Figure: Classic S-N Diagram (stress vs. number of cycles). Increasing stress levels
S0, S1, S2 correspond to decreasing cycles to failure N0, N1, N2.
S0 = Normal Stress conditions; N0 = Projected Normal Life.
The diagram also marks the point at which failures become non-relevant.]
Margin Improvement Process
 This is what the product spec distribution really looks like
[Figure: a Stress axis marked, from low to high stress, with the
Lower Destruct Limit, Lower Operating Limit, Product Operational Specs,
Upper Operating Limit, and Upper Destruct Limit. The Operating Margin and
Destruct Margin are labeled.]
Developmental HALT Process
    Planning a HALT
    Setting up for a HALT
    Executing a HALT
    Post Testing




When to Perform HALT?
    Feasibility (P1-P2): Perform HALT on 1 to 2 early prototypes.
     These samples may be hand-made and test coverage may be low,
     but we can still get clues as to gross design issues.
    Development (Late P2): Perform HALT on more samples. These
     samples will be closer to final product and functional tests
     will be more refined with higher test coverage.
    Qualification (P3): Demonstrate 100% reliability target @ 80% C.L.
     Shipping / Packaging test. Validation HALT can be performed here.
    Launch: Tracking reliability through field data.

    Lessons learned feedback to next generation product
Summary of Results
         - by stress -


                                          Cold Step Stress: 14%

                                          Hot Step Stress: 17%

                                          Rapid Thermal Transitions: 4%

                                          Vibration Step Stress: 45%

                                          Combined Environment: 20%




Significance:
Without Combined Environment, 20% of all
failures would have been missed
Traditional vs HALT
           Engineering Needs
[Figure: Product Development Manpower Requirements. Spending Rate (scale 0 to 6)
plotted against Time (periods 1 to 26), with labels DVT1 ..... DVTn, MR (shown
twice), and $ Savings.]
HALT
             Cost Benefits
   Reduced product time to market
   Lowered warranty cost through higher MTBF
   Faster DVT with fewer product samples
   Accelerated screening (HASS) allowed




Accelerated Life Testing
         (ALT)




Accelerated Life Test (ALT)
 An Accelerated Life Test (ALT) is the process of
  determining the reliability of a product in a short period
  of time by accelerating the use environment.
 ALTs are good for finding dominant failure
  mechanisms.
 ALTs are usually performed on individual assemblies
  rather than full systems.
 ALTs are also frequently used when there is a wear-out
  mechanism involved.




Stress
  Anything applied to a product, either electrically or
  environmentally, to accelerate finding possible
  weaknesses


  Examples of Electrical Stress: Current, Voltage (DC
  and AC), Power Cycling, and Frequency (line and
  board)
  Examples of Environmental Stress: Temperature
  Extremes, Temperature Cycling, Vibration, Shock,
  Humidity, ESD, Drop, Altitude




Physical Acceleration
 Acceleration means that operating a unit at high
 stress (temperature, voltage, humidity, duty cycle,
 etc.) produces the same failures that would occur at
 typical-use stresses, except that they happen much
 sooner.


 Failure may be due to mechanical fatigue, corrosion,
 chemical reaction, diffusion, migration, etc. The
 causes are the same, the time scale is simply
 different.
 Changing the stress is equivalent to transforming the
 time scale. This is often a linear transform, which
 means the time-to-fail at high stress is multiplied by a
 constant (acceleration factor) to obtain the equivalent
 time-to-fail at use stress.
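
As an illustration of the linear transform, the sketch below multiplies a time-to-fail at high stress by an acceleration factor. The Arrhenius model is used here as one common way to get that factor for temperature stress. The activation energy of 0.7 eV and the temperatures are placeholder assumptions, not values from this seminar; the real activation energy depends on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between use and stress temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1 / t_use_k - 1 / t_stress_k))


def equivalent_use_time(time_to_fail_at_stress, af):
    """Linear time transform: time at stress times the acceleration factor."""
    return time_to_fail_at_stress * af


# Placeholder inputs, for illustration only.
af = arrhenius_af(ea_ev=0.7, t_use_c=40, t_stress_c=85)
print("AF:", round(af, 1))
print("500 h at stress is about", round(equivalent_use_time(500, af)), "h at use")
```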
Failure Mode Dependence
 Keep in mind that the acceleration factor is highly
  dependent on the failure mechanism.
 Each failure mechanism will most likely have a
  different acceleration factor.


 During testing, conduct thorough failure analysis and
  separate the failure mechanisms for separate
  analysis.


 Selecting the stress to apply must be done with the
  expected failure mechanisms in mind.


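Theory of ALT
[Figure: Classic S-N Diagram (stress vs. number of cycles). Stress is on the vertical
axis and Number of Cycles on the horizontal axis. Increasing stress levels S0, S1, S2
correspond to decreasing cycles to failure N0, N1, N2.
S0 = Normal Stress conditions; N0 = Projected Normal Life.]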
When to Apply ALT




              ALT Region of Application




ALT Parameters
In order to set up an ALT, we must know several different
parameters, including
 Length of Test
 Number of Samples
 Goal of Test
 Confidence Desired
 Accuracy Desired
 Cost
 Acceleration Factor
   •   Field Environment
   •   Test Environment
   •   Acceleration Factor Calculation
 Slope of Weibull Distribution (Beta parameter)
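
To show how these parameters interact, here is a minimal sketch using the two-parameter Weibull distribution and a zero-failure test plan. It assumes the Weibull slope (beta) is known in advance and that no failures occur during the test; the function names and the numbers are hypothetical.

```python
import math


def weibull_reliability(t, eta, beta):
    """Probability of surviving to time t (eta = scale, beta = slope)."""
    return math.exp(-((t / eta) ** beta))


def zero_failure_test_time(life, reliability, confidence, n, beta, af=1.0):
    """Test time per unit needed to demonstrate `reliability` at `life`.

    Assumes n units all survive the test, a known Weibull slope beta,
    and an acceleration factor af between test and field environment.
    """
    ratio = math.log(1 - confidence) / (n * math.log(reliability))
    return life * ratio ** (1 / beta) / af


# Placeholder goal: 90% reliability at 10,000 hours with 80% confidence.
for n in (5, 10):
    hours = zero_failure_test_time(10_000, 0.90, 0.80, n, beta=3.0, af=5.0)
    print(n, "units:", round(hours), "test hours each")
```

With a slope of 3, doubling the sample size shortens the test by only a factor of about 1.26 (the cube root of 2), which is why samples are a poor substitute for test time when wear-out dominates.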
Review


 When wear-out is a dominant failure
 mechanism, we must be able to predict or
 characterize this wear-out mechanism to
 assure that it occurs outside customer
 expectations and outside the warranty period.

 ALT is an excellent method for doing this




HALT vs. ALT
When to Use Which
   Technique?



Overview

HALT and ALT are two of the most
popular testing methods, but engineers
are often confused about
which to use when.




Overview
Highly Accelerated Life Testing (HALT) is a great
reliability technique to use for finding predominant
failure mechanisms in a hardware product.

However, in many cases, the predominant failure
mechanism is wear-out.

When this is the situation, we must be able to predict or
characterize this wear-out mechanism to assure that it
occurs outside customer expectations and outside the
warranty period.

The best technique to use for this is a slower test method,
Accelerated Life Testing (ALT).
Overview
In many cases, it is best to use both
because each technique is good at
finding different types of failure
mechanisms.

The proper use of both techniques
together will offer a complete picture
of the reliability of the product.


HALT
    Highly Accelerated Life Testing
           used for Product Ruggedization




                      ALT
        Accelerated Life Testing
used to Characterize Predominant Failure Mechanisms,
               Especially for Wearout



Comparison Between
  ALT and HALT
                FAILURE TESTING

  HALT
  OBJECTIVES
  1. Root Cause Analysis
  2. Corrective Action Identification
  3. Design Robustness Determination
  TESTING REQUIREMENTS
  1. Detailed Product Knowledge
  2. Engineering Experience

  ALT
  OBJECTIVES
  1. Reliability Evaluation (e.g. Failure Rates)
  2. Dominant Failure Mechanisms Identification
  TESTING REQUIREMENTS
  1. Detailed Parameters
     (a) Test Length
     (b) Number of Samples
     (c) Confidence/Accuracy
     (d) Acceleration Factors
     (e) Test Environment
  2. Test Metrology & Factors
     (a) 4:2:1 Procedure Or Other
     (b) Costs
  ANALYTICAL MODELS
  1. Weibull Distribution
  2. Arrhenius
  3. Coffin-Manson
  4. Norris-Landzberg
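
As one example of the analytical models listed for ALT, a commonly used simplified form of the Coffin-Manson relationship gives the acceleration factor for temperature cycling as the ratio of temperature swings raised to an exponent. The sketch below assumes that simplified form; the exponent depends on the material and failure mechanism, and the value of 2 used here is only a placeholder.

```python
def coffin_manson_af(delta_t_test, delta_t_use, exponent):
    """Simplified Coffin-Manson acceleration factor for thermal cycling.

    delta_t_test: temperature swing per cycle in the test (deg C)
    delta_t_use:  temperature swing per cycle in the field (deg C)
    exponent:     mechanism-dependent exponent (assumed, not from the seminar)
    """
    return (delta_t_test / delta_t_use) ** exponent


# Placeholder inputs, for illustration only.
af = coffin_manson_af(delta_t_test=100, delta_t_use=50, exponent=2)
print("Each test cycle is worth about", af, "field cycles")
```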
Advantage of ALT over
           HALT

   One key advantage of ALT over HALT is when we
    need to know the life of the product.
   In HALT, we don’t concern ourselves with this
    much because we are more interested in making
    the product as reliable as we can, and measuring
    the amount of reliability is not as important.
   However, with mechanical items that wear over
    time, it is very important to know the life of the
    product as accurately as possible.


Advantage of ALT over HALT
Another advantage is that we often do not need any
environmental equipment. Benchtop testing is often adequate.




Advantage of HALT over
             ALT
   A big advantage of HALT over ALT is time. We
    are not so worried about time to failure as we are
    about which failure mode is dominant. And this we can
    usually find out in a matter of days rather than
    weeks or months.
   This savings in time is also a big savings in money
    since it takes less time at a test lab.
   Far fewer samples are needed (ALT usually needs
    about 10 times as many)
   We don’t need to calculate an acceleration factor
   We don’t need to stay with the same stresses as the
    field environment because of the cross-over effect
Combining ALT with HALT
Often we will run a product through HALT and then
run through ALT the subassemblies that were not good
candidates for HALT.

      HALT on System                           ALT on System Fan




Developing ALT from HALT
And at other times, we may develop the ALT based on the
HALT limits, using the same accelerants but lowering the
acceleration factors to measurable levels.

    HALT on System                             ALT on System




Examples of Products for
           HALT and ALT
    Component
    Fan
    Hard Drive
    Automotive Electronics
    Automobile
    Robot
    Infusion Pump
    Medical Cabinet
    Cell Phone

  These pictures are representative of the kinds of products we have tested. They are
  not the actual products, in order to protect the proprietary nature of the products we test.
Component

Characteristic                                         Accelerant
Aging                                                  High Temperature
Contamination, Package Hermeticity                     Temp/Humidity
Mismatch of Thermal Characteristics of Package Matls   Temp Cycling
Die Attachment, Bond Wires                             Vibration
Automobile

Test           Accelerant
Electronics    Temperature, Vibration, Humidity, Contamination
Mechanical     Repetitive cycling test
Fan



              Test                                Accelerant
Spinning                               Duty Cycle, Speed, Torque,
                                       Backpressure
Lubricant Longevity                    Temperature, Humidity,
                                       Contamination




Hard Drive


                 Test                                 Accelerant
Head Spinning                             Duty Cycle, Start/Stop, Speed,
                                          Temperature?, Vibration?
Contamination on Head Surface             Non-Operational Vibration

Board Derating                            Temperature/Voltage

Connectors – Power, Data                  Duty Cycle, Force, Angle


Robot



                Test                                      Accelerant
Arm Movement (side to side)                  Duty Cycle, Speed, Torque


Z-Stage (up and down)                        Duty Cycle, Speed, Torque
Vacuum Hold-down                             Temperature, Altitude
Repeatability                                Duty Cycle


Automotive Electronics – GPS Receiver

Test              Accelerant
Electronics       Temperature, Vibration, Humidity, Contamination
Button Pushing    Duty Cycle, Force?, Angle
Infusion Pump

Test                                          Accelerant
Battery Charging                              Duty Cycle, Deep Discharge, Speed of Charge
Touchscreen                                   Duty Cycle, Location, Force?
Pumping                                       Duty Cycle, Rate, Plunger Force
Connectors – Battery, Charger, Pole Clamp,    Duty Cycle, Force, Angle
IV Line, Cassette
Drawer for
  Medical Cabinet


              Test                                   Accelerant

Opening/Closing of Drawer                 Duty Cycle, Force, Angle

Locking Mechanism                         Duty Cycle, Force, Contamination




Cell Phone

Test                                      Accelerant
Button Pushing                            Duty Cycle, Force?, Angle
Touchscreen                               Duty Cycle, Location, Force?
Connectors – Headset, Battery, Charger    Duty Cycle, Force, Angle
Summary
   When wear-out is not a dominant failure
    mechanism, HALT is an excellent tool for
    finding product weaknesses in a short
    period of time.




Summary
   When wear-out is a dominant failure
    mechanism, we must be able to predict or
    characterize this wear-out mechanism to
    assure that it occurs outside customer
    expectations and outside the warranty
    period.

   ALT is an excellent method for doing this

RELIABILITY
DEMONSTRATION
 TESTING (RDT)



Reliability Demonstration Testing (RDT)
 A sample of units is tested at accelerated
  stresses for several months.
 The stresses are a bit lower than the HALT
  stresses and they are held constant (or cycled
  constantly) rather than gradually increasing.
 This enables us to calculate the acceleration
  factor for the test.
 The RDT can be used to validate the reliability
  prediction analyses.




RDT vs. ALT
 RDT and ALT are very similar in that the stresses
 are usually accelerated but at a lower level than
 HALT.
 The main difference between RDT and ALT is that
 ALT is usually used to characterize the wearout
 region of the product whereas RDT is usually used
 to demonstrate the MTBF in the steady state region
 of the product.
 In an RDT, you CAN substitute samples for time.
 In an ALT, you CANNOT substitute samples for
 time.



RDT vs. ALT




                                 ALT Region

     RDT Region

RDT, continued




                                                   125
 CRE Primer by QCI, 1998   © 2009 Ops A La Carte
RDT, continued




                                                   126
 CRE Primer by QCI, 1998   © 2009 Ops A La Carte
HALT vs. RDT




Overview
Highly Accelerated Life Testing (HALT) is a great
reliability technique to use for finding predominant
failure mechanisms in a hardware product.

However, in many cases, customers need to know the
MTBF or Annualized Failure Rate (AFR) of a product in
the field.

When this is the situation, most people turn to RDT.

However, recently we have developed a method for
estimating MTBF from HALT data.
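
For reference, MTBF and AFR can be converted into one another under simple assumptions. The sketch below is not the estimation method referred to above; it only shows the standard conversion, assuming a constant failure rate and continuous operation (8,760 hours per year). The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # assumes the product runs continuously


def afr_from_mtbf(mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """Fraction of units expected to fail in one year of operation."""
    return 1 - math.exp(-hours_per_year / mtbf_hours)


def mtbf_from_afr(afr, hours_per_year=HOURS_PER_YEAR):
    """Inverse conversion, from AFR (as a fraction) back to MTBF in hours."""
    return -hours_per_year / math.log(1 - afr)


# Placeholder MTBF, for illustration only.
afr = afr_from_mtbf(100_000)
print("AFR:", round(100 * afr, 2), "%")
print("MTBF:", round(mtbf_from_afr(afr)), "hours")
```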

The AFR Estimator


   The AFR Estimator is a patent-pending
    mathematical model that, when provided with
    the appropriate HALT and product
    information, will accurately estimate the
    product’s field Annualized Failure Rate (AFR).
   This methodology has been used on a number
    of products with significant positive financial
    results.

Justification for the
                AFR Estimator
   HALT takes only a few days to run and to implement its
    corrective action(s). Even if it took a bit longer, this time
    would be far less than waiting for an RDT to be run and its
    corrective action(s) to be implemented. The application of this
    model can therefore be a huge time and cost saver.
   As higher HALT limits equate to lower AFR, you now have a
    tool that can accurately estimate the field AFR before
    launching the product. Stress levels that are depicted in the
    table in Section E are highly recommended for HALT. These
    levels can assure the producer that the product will exceed
    customer expectations and allow the producer to accurately
    forecast warranty expenditures.


Justification for the
                AFR Estimator
   By not performing life tests and simply doing HALT, time and
    money will be saved. This is not to say that life testing isn’t
    important. It should be considered for new technologies and
    for an existing part/design with a different application but not
    as a process to accurately estimate AFR.
   With seven to ten simple data entry points and most of them
    coming from the HALT effort, the AFR Estimator will provide
    an accurate field AFR instantaneously with its associated 90%
    statistical confidence limits. The inputs for HASS and HASA
    are: whether you will perform HASS or HASA, the daily sample size,
    and the shift in the AFR you wish to detect.



Justification for the
               AFR Estimator
   The AFR Estimator has been validated on over twenty products
    from diverse manufacturers and design environments.
   The model can accommodate HALT sample sizes from one to
    six with the optimum size being four. Sample sizes of greater
    than four will default to four.
   90% upper and lower confidence limits are calculated based on
    the HALT AFR and the HALT Sample Size.




Recommendations when
        using the AFR Estimator
   An effective HALT needs to be done with at least three units,
    and preferably four, although the model can
    accommodate sample sizes from one to six.
   Please realize that HALT sample sizes of three or fewer will
    dramatically reduce the ability to detect product defects and
    hence, the statistical confidence is likewise impacted.




Recommendations when
        using the AFR Estimator
1.   Root Cause for Failures
2.   Robust Protocols
3.   Achieve at least the Guard Band Limits
4.   For HASS or HASA, normalize chamber vibration tables
5.   Obtain a copy of “HALT, HASS, & HASA Explained” by
     Harry McLean and use it as a reference.




Recommendation 1:
          Root Cause for Failures
   Each of the issues encountered needs to have its root cause
    understood and corrective action implemented, then
    verified in HALT under the same stress conditions in which the
    defect was detected. Exceptions to this would be limitations
    that occur beyond the Guard Band Limits in the table following
    Section E. Issues encountered beyond these levels should still have
    root cause analysis performed, but whether to implement corrective
    action is a business decision based on timeliness, cost,
    and program delays.




Recommendation 3:
      Achieve Guard Band Limits
   For the maximum benefit of a low field AFR or a high MTBF,
    it is suggested that the product achieve at least the levels shown
    under the Guard Band Limits in Section E below. These are
    very achievable with time and understanding within the
    organization without having to use extended (more costly)
    temperature range components.




How to Use the Estimator:




How to Use the Estimator:
      Calculated MTBF Estimate
   The MTBF estimate in kHours can be from Telcordia, Relex,
    or a similar tool. If this estimate is not available, use 40,000
    as a default value for the estimator. This parameter has very
    little effect on the final field AFR or MTBF estimate because of
    the highly variable processes followed and the many
    assumptions used in estimating an MTBF. Enter this value in
    the table following Section H. Please note that the estimator
    will recommend an MTBF of 40,000 when a value of less than
    40,000 is used.




How to Use the Estimator:
       HALT Operating Limits
   The final Hot operating limit (HOL) achieved in HALT as
    measured on the product and not the chamber setpoint. Enter
    this value in the table following Section H.
   The final Cold operating limit (COL) achieved in HALT as
    measured on the product and not the chamber setpoint. Enter
    this value in the table following Section H.
   The final Vibration operating limit (VOL) achieved in HALT
    as measured on the product and not the chamber setpoint.
    Enter this value in the table following Section H.




How to Use the Estimator:
          Product Environment
    • Use the product’s published thermal operating specifications,
     in °C: match them to the corresponding Level number listed in
     the table below. For example, a high-end consumer product
     equates to Level 2.

    Product's Published Specs   Category               Guard Band Limits   Level
    0°C to +40°C                Consumer               -30°C to +80°C      1
    0°C to +50°C                Hi-end Consumer        -30°C to +100°C     2
    -10°C to +50°C              Hi Performance         -40°C to +110°C     3
    -20°C to +50°C              Critical Application   -50°C to +110°C     4
    -25°C to +65°C              Sheltered              -50°C to +110°C     5
    -40°C to +85°C              All Outdoor            -65°C to +110°C     6




How to Use the Estimator:
   Running the Estimator
Once the Value for AFR Estimator column is completed, you
are ready to run the AFR Estimator and determine the
product’s AFR, MTBF, and Confidence Limits, plus, if HASS or
HASA is being used, the days needed to detect a shift in AFR.




WRAP-UP



Design for Reliability (DfR) Tools
 • Reliability Assessment, Goal Setting, and Planning
 • Reliability Modeling and Prediction
 • Thermal Analysis
 • Derating Analysis
 • Failure Modes and Effects Analysis (FMEA)
 • Fault Tree Analysis (FTA)
 • Design of Experiments (DoE)
 • Human Engineering/Human Factors Analysis
 • Highly Accelerated Life Test (HALT)
 • Accelerated Life Test (ALT)
 • RDT and ORT
 • Highly Accelerated Stress Screen (HASS)
 • Root Cause Analysis (RCA)
 • Restriction of Hazardous Substances (RoHS)
 • Outsourced Engineering and Reliability
 • Field Data Analysis
  Red shows tools we introduced today.
Thank you for your
  participation!


Contact Information

 Ops A La Carte, LLC                             Ops A La Carte, LLC
   Mike Silverman                                   Vijay Prasad
  Managing Partner                             Program Manager, S. Cal
    (408) 472-3889                                 (858) 349-0443
mikes@opsalacarte.com                          vijayp@opsalacarte.com
 www.opsalacarte.com                            www.opsalacarte.com




Additional OPS A La Carte
       Information




Presenter’s Biographical Sketch – Mike Silverman

◈ Mike Silverman is founder and managing partner at Ops A La Carte, a Professional
  Consulting Company that has an intense focus on helping customers with end-to-end
  reliability. Through Ops A La Carte, Mike has had extensive experience as a consultant
  to high-tech companies, and has consulted for over 300 companies including Cisco,
  Ciena, Siemens, Abbott Labs, and Applied Materials. He has consulted in a variety of
  different industries including power electronics, telecommunications, networking,
  medical, semiconductor, semiconductor equipment, consumer electronics, and defense.
◈ Mike has 20 years of reliability and quality experience. He is also an expert in
  accelerated reliability techniques, including HALT and HASS (and recently purchased a HALT
  Lab), testing over 500 products for 100 companies in 40 different industries. Mike has
  authored and published 8 papers on reliability techniques and has presented these
  around the world including China, Germany, Canada, Taiwan, Singapore, and Korea.
  He has also developed and currently teaches 27 courses on reliability techniques.
◈ Mike has a BS degree in Electrical and Computer Engineering from the University of
  Colorado at Boulder, and is both a Certified Reliability Engineer and a course instructor
  through the American Society for Quality (ASQ), IEEE, Effective Training Associates, and
  Hobbs Engineering. Mike is a member of ASQ, IEEE, SME, ASME, PATCA, and IEEE
  Consulting Society and is the current chapter president in the IEEE Reliability Society for
  Silicon Valley.




We Can Help You Sell to Your
          Management
Oftentimes, our main contact has difficulty
selling reliability within their company. We
have many techniques to help:

 1) Detailed Proposals with Case Examples
 2) Free Presentations at your site
 3) Technical Articles/White Papers
 4) Blog Articles covering your situation
 5) Articles from our quarterly Newsletter
What’s New at Ops?
0) New Book “50 Ways to Improve Your Reliability”
1) A new HALT Calculator
2) A new Reliability Blog
3) Semiconductor Reliability services
4) Software Reliability services
5) RoHS conversion services
6) Warranty analysis services
7) New Accelerated Life Test methodology
8) Quality/6 Sigma Seminars
9) Offices: Singapore, China, Taiwan, UK, India
10) Complete Reliability Solutions
11) Green Reliability Services
Reliability Integration Education
                   - 31 different seminars on reliability -
1) Overall Program Reliability Integration                              17) Design for ‘X’ (DfX)
2) Concept Phase Reliability Tools & Integration                        18) Mechanical Design for IC Packaging
3) Design Phase Reliability Tools & Integration                         19) Design of Experiments (DoE)
4) Prototype Phase Reliability Tools & Integration                      20) HALT and HASS Application
5) Manufacturing Phase Reliability Tools & Integr.                      21) Statistics for 6 Sigma
6) Reliability Techniques for Beginners                                 22) Fundamentals of Climatic Testing
7) Reliability Statistics                                               23) Design for Vibration and Shock
8) FMECA                                                                24) Software Reliability
9) CRE Exam Preparation                                                 25) Root Cause Analysis
10) CQE Exam Preparation                                                26) Reality of Pb-Free Reliability
11) Design for Reliability (DfR)                                        27) Statistical Process Control
12) Design for Manufacturability (DfM)                                  28) Innovative Problem Solving
13) Design for Testability (DfT)                                        29) Mechanical Design for Reliability
14) Design for Warranty Cost Reduction (DfW)                            30) Problem Solving Tools
15) Design for 6 Sigma (DfSS)                                           31) Applied Data Analysis
16) Design for Safety

                             Red – Part of our yearly symposium
Upcoming Seminars
CQE Course – Apr-Jun and Oct-Dec, 2010
CRE Course – Jan-Mar and Aug-Oct, 2010


We offer 31 different courses and seminars in Reliability,
 Quality, and Technical Operations.


 Please see our Educational Brochure inside your Ops A La
 Carte packet for more details




Upcoming Events
ARS – June, 2010 Reno
 We are a co-sponsor, and we will be exhibiting and
 presenting a paper on our new HALT Calculator.


ASTR – October, Denver
 We are on the committee and will be exhibiting and
 presenting.


RAMS – January, 11, Orlando
 We are on the committee and will be exhibiting and
 presenting.



Design for Reliability Seminar for WYLE Labs - Feb 2010 - Mike Silverman Webinar

  • 1. & We Provide You Confidence in Your Product ReliabilityTM Ops A La Carte / (408) 654-0499 / askops@opsalacarte.com / www.opsalacarte.com
  • 2. DESIGN FOR RELIABILITY (DfR) SEMINAR at February 11, 2010 Mike Silverman // (408) 472-3889 // mikes@opsalacarte.com Ops A La Carte LLC // www.opsalacarte.com © 2009 Ops A La Carte 1
  • 3. DfR Seminar Overview Thurs, Feb 11, 2010 - DFR SEMINAR -  10:00-10:10am Introduction  10:10-10:30am DfR Overview/Introduction  10:30-11:00am FMEA  11:00-11:30am Using FMEA to Design a Better Reliability Test Program  11:30-11:50 am HALT  11:50- 12:10pm Lunch Break  12:10-12:30pm ALT  12:30:1:00pm HALT vs. ALT – When to Use Which Technique?  1:00-1:15pm Reliability Demonstration Test (RDT)  1:15-1:45pm HALT vs. RDT – The HALT Calculator  1:45-2:00pm Wrap-Up/Questions Note that this ½ day seminar is an abridged version of a 5 day DfX seminar we will be holding 3 times this year: - Apr 16-20 in Santa Clara, CA - May 17-21 in Huntsville, AL - Oct 11-15 in Maryland © 2009 Ops A La Carte 2
  • 4. Product Life Cycle Reliability and Test Spectrum Wyle and OPS Combined Capabilities Program Test & Operate & Capture Design Build Eval Qualify Manufacture Maintain Test Engineering Services Test Quotes Tech Test Requirements KEY Plans Test Data Analysis Wyle Procedures Ops Test Services Wyle & OPS HALT HASS Dev Test Qual Test Acceptance Reliability, Maintainability, Supportability Services FMECA Reliability Eng Configuration Publications Training Reliability Eng & Analysis Management & Analysis Asset Lean Management RCM Six Sigma NDI TOC © 2009 Ops A La Carte 3
  • 5. Jim Pinyan Director of Business Development Test, Engineering & Research Division (310) 563-6651 Jim.pinyan@wyle.com 128 Maryland Street El Segundo, CA 90245 © 2009 Ops A La Carte 4
  • 6. RELIABILITY CONSULTING Company Overview © 2009 Ops A La Carte 5
  • 7. provides clients with integrated reliability solutions across the Product Life Cycle. We have the unique ability to assess a product and understand the key reliability elements necessary to measure/improve product performance and customer satisfaction. Our strength lies in our ability to tailor a solution to fit your needs based on your product reliability requirements, schedule and budget. © 2009 Ops A La Carte 6
  • 8. HALT and HASS Labs • Our own lab facility located in Northern California in the heart of Silicon Valley. We provide HALT/HASS services on a world-wide basis, using partner labs for tests outside California. • Second oldest HALT facility in the world, established in 1995 (originally owned by QualMark) • HALT equipment has all latest technology – only lab in region • Highly-experienced staff with over 100 years of combined experience in HALT and HASS • Tested over 500 products in over 302009 Ops A La Carte industries © different 7
  • 9. The following presentation materials are copyright protected property of Ops A La Carte LLC. These materials may not be distributed outside of your company. © 2009 Ops A La Carte 8
  • 10. What is DESIGN for RELIABILITY? © 2009 Ops A La Carte 9
  • 11. First we must ask: What is Reliability? Reliability is often considered quality over time. Reliability is… “The ability of a system or component to perform its required functions under stated conditions for a specified period of time” - IEEE 610.12-1990  We shall revisit this when we discuss Reliability Goal Setting. © 2009 Ops A La Carte 10
  • 12. Different Views of Reliability  Product development teams View reliability as the domain to address mechanical and electrical, and Mechanical manufacturing issues. Reliability  Customers + View reliability as a system-level issue, Electrical with minimal concern placed on the Reliability distinction into sub-domains.  Since the primary measure of + reliability is made by the customer, SW engineering teams must maintain a Reliability balance of both views (system and sub-domain) in order to develop a reliable product. System © 2009 Ops A La Carte 11
  • 13. Reliability vs. Cost  Intuitively, the emphasis in reliability to achieve a reduction in warranty and in-service costs results in some minimal increase in development and manufacturing costs .  Use of the proper tools during the proper life cycle phase will help to minimize total Life Cycle Cost (LCC). © 2009 Ops A La Carte 12
  • 14. Reliability vs. Cost, continued To minimize total Life Cycle Costs (LCC), an organization must do two things: 1. Choose the best tools from all of the tools available and apply these tools at the proper phases of the product life cycle. 2. Properly integrate these tools together to assure that the proper information is fed forwards and backwards at the proper times. © 2009 Ops A La Carte 13
  • 15. Reliability Integration “the process of seamlessly, cohesively integrating reliability tools together to maximize reliability and at the lowest possible cost” © 2009 Ops A La Carte 14
  • 16. Reliability vs. Cost, continued TOTAL COST OPTIMUM CURVE COST POINT RELIABILITY PROGRAM COSTS COST WARRANTY COSTS RELIABILITY HW RELIABILITY & COSTS © 2009 Ops A La Carte 15
  • 17. ELEMENTS OF A RELIABILITY PROGRAM © 2009 Ops A La Carte 16
  • 18. DfR Tool Selection A reliability assessment is the recommended first step in establishing a reliability program. This mechanism is the appropriate forum for selecting the best tools for each product life cycle phase. © 2009 Ops A La Carte 17
  • 19. RELIABILITY ASSESSMENT © 2009 Ops A La Carte 18
  • 20. Reliability Program Assessment • Initiate a Reliability Program • Determine next best steps $ Profits • Reduce customer complaints • Select right tools • Improve reliability market Goal share Program Plan Gap Analysis satisfaction Benchmarking Statistical Data Analysis A detailed evaluation of an organization’s approach and Assessment Interviews processes involved in creating field reliable products. The assessment failures $ unreliability captures the current state and Now leads to an actionable reliability ? Unknown program plan. complaints Reliability ? © 2009 Ops A La Carte 19
  • 21. Agenda • motivation • approach • results • findings • observations • next steps • close © 2009 Ops A La Carte 20
  • 22. Assessment Motivation • Identify systemic changes that impact reliability – Tie into culture and product – Both enjoy benefits • Provides roadmap for activities that achieve results – Matching of capabilities and expectations – Cooperative approach © 2009 Ops A La Carte 21
  • 23. Assessment Approach  Preparation  Checklist  Who to interview in organization  Analysis, average scores and summary of comments © 2009 Ops A La Carte 22
  • 24. Steps Involved  selecting people to survey  selecting survey topics  develop scoring system  data analysis  summary feedback results  review of results  recommended actions © 2009 Ops A La Carte 23
  • 25. Select People to Survey Hardware:  Hardware manager  Electrical engineering lead  Mechanical engineering lead  System engineering lead  Reliability manager/engineer  Procurement  Manufacturing Software:  sw r&d manager  sw r&d engineer  sw test manager  sw test engineer © 2009 Ops A La Carte 24
  • 26. Select Survey Topics DFR Methods Survey Scoring: 4 = 100%, top priority, always done 3 = >75%, use normally, expected 2 = 25% - 75%, variable use 1 = <25%, only occasional use 0 = not done or discontinued - = not visible, no comment Management: □ Goal setting for division □ Priority of quality & reliability improvement □ Management attention & follow up (goal ownership) Design: □ Documented hardware design cycle □ Goal setting by product or module © 2009 Ops A La Carte 25
  • 27. Example  To what extent is FMEA used?  Design Engineer Score = 1: Used only as a troubleshooting tool  Manufacturing Engineer Score = 3: Commonly used on critical design elements  Reliability Engineer Score = 4: Always used on all products Results: Score 2.6 Comments: Clearly a disconnect between reliability and design engineering – indicative of a problem with the tool. © 2009 Ops A La Carte 26
  • 28. Reliability Maturity Grid • 5 levels of maturity • Loosely based on IEEE 1332: “Reliability Program for the Development and Production of Electronic Products” (currently in draft form) • Similar to Crosby’s Quality Maturity • On the following page is a matrix based on Crosby’s as an example. • Read across each row and find the statement that seems most true for your organization. • The center of mass of the levels is the organization’s overall level. © 2009 Ops A La Carte 27
  • 29. Reliability Maturity Matrix Measurement Stage I: Stage II: Stage III: Stage IV: Stage V: Category Uncertainty Awakening Enlightenment Wisdom Certainty Management No comprehension of Recognizing that reliability Still learning more about Participating. Consider reliability Understanding and Attitude reliability as a management management may be of reliability management. Understand absolutes of management an tool. Tend to blame value but not willing to Becoming supportive and reliability management. essential part of company reliability engineering for provide money or time to helpful. Recognize their personal system. ‘reliability problems’ make it happen. role in continuing emphasis. Reliability status Reliability is hidden in A stronger reliability Reliability manager Reliability manager is an Reliability manager is on manufacturing or leader appointed, yet reports to top officer of company; board of directors. engineering departments. main emphasis is still on management, with role in effective status reporting Prevention is main Reliability testing probably an audit of initial product management of division. and preventive action. concern. Reliability is a not part of organization. functionality. Reliability Involved with consumer thought leader. Emphasis on initial product testing still not performed. affairs. functionality. Problem handling Fire fighting; no root cause Teams are set up to solve Corrective action process Problems are identified Except in the most analysis or resolution; lots of major problems. Long- in place. Problems are early in their unusual cases, problems yelling and accusations. range solutions are not recognized and solved in development. All are prevented. identified or orderly way. functions are open to implemented. suggestion and improvement. 
Cost of Reliability as % of Warranty: unknown Warranty: 3% Warranty: 4% Warranty: 3% Warranty: 1.5% net revenue Reported: unknown Reported: unknown Reported: 8% Reported: 6.5% Reported: 3% Actual: 20% Actual: 18% Actual: 12% Actual: 8% Actual: 3% Feedback process None. No reliability testing. Some understanding of Accelerated testing of Refinement of testing The few field failures are No field failure reporting field failures and critical systems during systems – only testing fully analyzed and other than customer complaints. Designers design. System level critical or uncertain product designs or complaints and returns. and manufacturing do modeling and testing. areas. Increased procurement not get meaningful Field failures analyzed understanding of causes specifications altered. information. and root causes reported. of failure allow Reliability testing done to deterministic failure rate augment reliability prediction models models. DFR program status No organized activities. Organization told Implementation of DFR DFR program active in all Reliability improvement is No understanding of such reliability is important. DFR program with thorough areas of division – not a normal and continued activities. tools and processes understanding and just design & mfg’ing. activity. inconsistently applied and establishment of each DFR normal part of R&D only ‘when time permits’. tool. and manufacturing. Summation of reliability “We don’t know why we “Is it absolutely necessary “Through commitment “Failure prevention is a “We know why we do not posture have problems with to always have problems and reliability routine part of our have problems with reliability” with reliability?” improvement we are operation.” reliability.” identifying and resolving our problems.” © 2009 Ops A La Carte 28
  • 30. Reliability Maturity Matrix Lets look at one row to get a better understanding. Measure- Stage I: Stage II: Stage III: Stage IV: Stage V: Uncertainty Awakening Enlighten- Wisdom Certainty ment ment Category Problem Fire Teams are Corrective Problems Except in handling fighting; no set up to action are the most root cause solve process in identified unusual analysis or major place. early in cases, resolution; problems. Problems their problems lots of Long- are developm are yelling and range recognize ent. All prevented. accusations solutions d and functions . are not solved in are open identified orderly to or way. suggestio implement n and ed. improvem ent. © 2009 Ops A La Carte 29
  • 31. Results & Meaning • Looking for trends, gaps in process, skill mismatches, over analysis, under analysis, etc. • Looking for differences across the organization, pockets of excellence, areas with good results • Process provides snapshot of current system • No one tool make an entire reliability program. The tools need to match the needs of the products and the culture. • Check step is critical before moving to recommendation around improvement plan © 2009 Ops A La Carte 30
  • 32. Observations What Companies Are What Companies Are Doing Best Weak at  Prediction  Goal setting/Planning  HALT  Repair & warranty invisible  Golden nuggets  Lessons learned  Fast reaction to fix capture problems  Single owner of product reliability  Multiple defect tracking systems  Reliability Integration  Statistics © 2009 Ops A La Carte 31
  • 33. Next Steps • Determine current state of your organization (Summary of Assessment) – Identify strong and weak areas • Goal Setting – Market Analysis to gather requirements – Benchmarking • Gap Analysis • Develop plan and implement © 2009 Ops A La Carte 32
  • 34. Failure Mode and Effect Analysis (FMEA) Seminar © 2009 Ops A La Carte 33
  • 35. FMEA A FMEA is a systematic method of identifying and preventing product and process problems BEFORE they occur. © 2009 Ops A La Carte 34
  • 36. © 2009 Ops A La Carte 35
  • 37. © 2009 Ops A La Carte 36
  • 38. Not close enough to home yet? © 2009 Ops A La Carte 37
  • 39. © 2009 Ops A La Carte 38
  • 40. FMEA Benefits  Facilitates investigation of design alternatives to consider high reliability at the conceptual stages of the design.  Provides a basis for identifying root cause failures and developing corrective actions.  Determines the effects of each failure mode on system performance.  Aids in developing test methods and troubleshooting techniques.  Provides a foundation for qualitative analyses.  Provide structured forum for cross functional discussions  Provide common understanding and focus to reduce product or process issues  Provide documentation of risk management effort © 2009 Ops A La Carte 39
  • 41. Types of FMEAs  Design FMEA  Process FMEA  System FMEA  Functional FMEA  User FMEA  Software FMEA  Many others © 2009 Ops A La Carte 40
  • 42. When Is a FMEA Performed  FMEA’s are begun early in the design process and then updated throughout the life cycle of a product to capture changes in the design. © 2009 Ops A La Carte 41
  • 43. The 10 Steps  Step 1: Review the Process/Design  Step 2: Brainstorm potential failure modes  Step 3: List potential effects of each failure mode  Step 4: Assign a severity rating for each effect  Step 5: Assign an occurrence rating for failure modes  Step 6: Assign a detection rating for modes/effects  Step 7: Calculate the risk priority numbers  Step 8: Prioritize the failure modes for action  Step 9: Take action to eliminate/reduce high-risk  Step 10: Calculate the resulting RPN © 2009 Ops A La Carte 42
  • 44. Step 1: Review the Design or Process  Understand the topic of study  Design – drawings, prototypes, etc.  Process – flowcharts, assembly instructions, etc.  Focus on developing common understanding of design or process  Designers or Process Experts available for questions © 2009 Ops A La Carte 43
  • 45. Step 2: Brainstorm potential failure modes  Have fun!  How can the design/process fail?  Break complex designs/processes into smaller elements  Combine like ideas (affinity plotting)  May have more than one failure mode per item or function  List failure modes on worksheet  Determine failure modes vs. failure mechanisms  Use Boundary Interface Diagram Tool  Use P-Diagram Tool © 2009 Ops A La Carte 44
  • 46. Common brainstorming tools  Team dynamics  Consensus-building techniques  Team project documentation  Idea-generation techniques  Group brainstorming with a facilitator  Affinity diagramming  Flowcharting  Boundary Interface Diagram  P-Diagram  Data analysis  Graphing techniques © 2009 Ops A La Carte 45
  • 47. Step 3: List potential effects of each failure mode
      - If the failure occurs, what are the consequences?
      - List the effect for each failure mode (not mechanism).
      - List more than one effect when necessary (note: list more than one effect if the ratings will be different, or if the solution would have to be different).
  • 48. Step 4: Assign a severity rating for each effect  What is the consequence of the failure should it occur?  Assign a severity rating for each effect  An estimation of how serious the effects would be if the failure mode occurs  Historical data  Engineering judgment  Experimentation, DOE, if needed © 2009 Ops A La Carte 47
  • 49. Severity Severity is the assessment of the seriousness of the effect of the failure mode to the next component, subsystem, system or customer if it occurs. Below is a typical Severity Rating Table.
      Rating | Description | Definition
      10 | Dangerously High | Catastrophic failure causing replacement of the entire system
      9 | Very High | Failure of a FRU component, MTTR > 1 hour
      8 | High | Failure of a FRU component, MTTR < 1 hour
      6 | Moderate | Failure that results in reduced throughput
      4 | Minor | Failure that requires a tool reset or recalibration
      2 | Very Minor | Failure that can be corrected during a PM cycle
      1 | None | Failure that does not affect system performance
  • 50. Step 5: Assign an occurrence rating for each failure mode
      - What is the probability of the failure occurring?
      - List the potential causes of failure.
      - Use actual data for the rating when available.
      - When real data is not available: use engineering estimates or models, consider the probabilities of the failure causes, then rank order and assign ratings.
  • 51. Probability of Occurrence Probability of Occurrence can be expressed as a failure rate or simply as a scale of 1-10 relative to all other failure modes. Below is a typical Probability Rating Table.
      Rating | Description | Definition
      10 | Dangerously High | Likely to occur chronically (daily or hourly)
      9 | Very High | Likely to occur during one week of operation
      8 | High | Likely to occur during one month of operation
      6 | Medium | Likely to occur during one year of operation
      4 | Moderate | Likely to occur during the life of the system
      2 | Low | A remote probability of occurrence in the life of the system
      1 | Remote | An unlikely probability of occurrence in the life of the system
  • 52. Step 6: Assign a detection rating for each failure mode and/or effect  What is the probability of the failure being detected before the impact of the effect is realized  List known current controls  Those items without controls are unlikely to be detected (scoring 9 or 10)  Again, use actual data when possible © 2009 Ops A La Carte 51
  • 53. Detection A third factor used in assessing the risk of a failure is the likelihood of detecting the failure before releasing the product. Below is a typical Detection Rating Scale (note that a high score indicates that the failure is more difficult to detect).
      Rating | Description | Definition
      5 | Very Low | No ability to detect before it occurs and some ability to detect after (unconfirmed failures)
      3 | Moderate | No ability to detect before it occurs but can detect after
      2 | High | Some ability to detect before it occurs but can detect after
      1 | Almost Certain | Very likely it will be detectable before it occurs and after
      Note that the Detection Scale has been derated (scale 1-5 only). For many industries, the key drivers are severity and probability. In many industries, there is a high unconfirmed failure rate, yet there is a high probability of failures repeating themselves when units go back to the field after the failure was not confirmed. Hence the importance of health diagnostics and of a condition-based maintenance strategy built on those health-monitoring diagnostics.
  • 54. Step 7: Calculate the risk priority number for each effect  RPN = S x P x D  Risk Priority Number equals Severity rating times Probability of Occurrence rating times Detection rating © 2009 Ops A La Carte 53
  • 55. Risk Priority Number (RPN)
      - The RPN is the product of the Severity Score, the Probability Score, and the Detection Score.
      - Once all of the RPNs have been calculated, the data can be sorted from highest to lowest RPN to show which are the most critical items to work on.
      - Below is an example of an RPN Table.
      RPN | Risk | Action
      251-500 | Intolerable Risk | Additional measures are required to ensure adequate safety.
      101-250 | Undesirable Risk | Risk is tolerable only if risk reduction is impractical or if reduction costs are grossly disproportionate to the improvement(s) gained. (Requires Executive Mgt. approval.)
      11-100 | Tolerable Risk | The risk is tolerable if the cost of risk reduction will exceed the improvement(s) gained. (Requires Project Mgt. approval.)
      1-10 | Negligible | Acceptable as implemented.
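The RPN calculation and the example risk bands above can be sketched in a few lines of Python. This is only an illustration: the failure modes and their ratings are hypothetical, and the class and function names are assumptions introduced here. Because the Detection scale in this seminar is derated to 1-5, the largest possible RPN is 10 x 10 x 5 = 500, which matches the top of the example table.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int      # 1-10, from the Severity Rating Table
    probability: int   # 1-10, from the Probability Rating Table
    detection: int     # 1-5, from the derated Detection Rating Scale

    @property
    def rpn(self) -> int:
        # RPN = S x P x D
        return self.severity * self.probability * self.detection


def risk_band(rpn: int) -> str:
    """Map an RPN onto the bands of the example RPN table."""
    if rpn >= 251:
        return "Intolerable Risk"
    if rpn >= 101:
        return "Undesirable Risk"
    if rpn >= 11:
        return "Tolerable Risk"
    return "Negligible"


# Hypothetical failure modes and ratings, for illustration only.
modes = [
    FailureMode("cap tether breaks", severity=4, probability=6, detection=2),
    FailureMode("housing cracks when dropped", severity=8, probability=4, detection=3),
    FailureMode("seal attacked by cleaning solution", severity=9, probability=6, detection=5),
]

# Sort from highest to lowest RPN to show the most critical items first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.rpn:4d}  {risk_band(m.rpn):17s} {m.name}")
```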
  • 56. Step 8: Prioritize the failure modes for action  Simple rank ordering from high to low based on RPN  Decide on cutoff value  Those above get attention & resources to improve  Those below are left alone for now  Consider including above the cut off any Severity rating of 9 or 10 © 2009 Ops A La Carte 55
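A minimal sketch of this prioritization rule, assuming failure modes are given as (name, severity, probability, detection) tuples. The cutoff value, the sample ratings, and the function name are hypothetical; the only logic taken from the slide is "rank by RPN, act on those above the cutoff, and also pull in any Severity of 9 or 10".

```python
def select_for_action(modes, rpn_cutoff, severity_floor=9):
    """Return (name, rpn) pairs that deserve attention, highest RPN first.

    modes: iterable of (name, severity, probability, detection) tuples.
    A mode is selected if its RPN is above the cutoff, or if its severity
    is at or above severity_floor regardless of its RPN.
    """
    scored = [(name, s, s * p * d) for name, s, p, d in modes]
    scored.sort(key=lambda row: row[2], reverse=True)
    return [
        (name, rpn)
        for name, severity, rpn in scored
        if rpn > rpn_cutoff or severity >= severity_floor
    ]


# Hypothetical ratings, for illustration only.
example_modes = [
    ("A", 8, 4, 3),    # RPN 96
    ("B", 10, 1, 2),   # RPN 20, but severity 10
    ("C", 4, 6, 2),    # RPN 48
    ("D", 9, 6, 5),    # RPN 270
]
selected = select_for_action(example_modes, rpn_cutoff=90)
print(selected)
```

Mode B stays on the action list even though its RPN is low, because a severity of 10 is treated as important enough to override the cutoff.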
  • 57. Step 9: Take action to eliminate or reduce the high risk failure modes  Use an organized problem-solving process  Identify and implement actions to eliminate or reduce the high-risk failure modes  Consider DOE as tool to break down and solve multiple variable or complex issues © 2009 Ops A La Carte 56
  • 58. Step 10: Calculate the resulting RPN as the failure modes are reduced or eliminated
      - Document progress in reducing product risk by having the team update the resulting RPN.
      - You should expect a 50% or greater reduction in total RPN after an FMEA.
      - Continue to make improvements on the highest-risk items until time, resources, or overall ROI shift the focus.
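The 50% guideline is a comparison of the summed RPN before and after corrective action. A small sketch, using made-up before/after values (the function name and the numbers are assumptions for illustration):

```python
def total_rpn_reduction(before, after):
    """Fractional reduction in total RPN between two FMEA revisions.

    before, after: dicts mapping failure-mode name to RPN.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    return (total_before - total_after) / total_before


# Hypothetical RPNs before and after corrective actions.
rpn_before = {"A": 96, "B": 20, "C": 48, "D": 270}
rpn_after = {"A": 48, "B": 20, "C": 48, "D": 54}

reduction = total_rpn_reduction(rpn_before, rpn_after)
print(f"Total RPN reduced by {reduction:.0%}")
```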
  • 59. Linking FMEAs with Test Plans In order to write better test plans, we must first understand:
      - the use environment
      - the key risks to the design
      The best tool for this is FMEA.
  • 60. Developing Better Test Plans Stated another way, we cannot know what to test for unless we understand the key risks. Therefore, FMEA is one of the best sources of input for a Reliability Test Plan.
  • 61. Case Study - Inhaler
  • 62. Developing a Test Plan without FMEA  What types of tests can you think of for this device?
  • 63. Developing a Test Plan without FMEA  We used the IEC standards and came up with a number of solid tests, including:  High/Low Temperature  Temperature Cycling  Vibration  Drop  Shock  Crush  Humidity  Altitude  Did we miss any?
  • 70. FMEA Generated Tests  Then we performed an FMEA and came up with the following:  Different cleaning solutions  Pen test  Lipstick test  Motor Oil Test  Cap Tether Test  Did we miss any?
  • 71. Conclusion FMEA is a development tactic that can help solve the problem of testing too little by uncovering failure modes that require tailored test methods rather than only cookbook methods from industry standards.
  • 72. HALT Highly Accelerated Life Testing © 2009 Ops A La Carte 71
  • 73. HALT - Highly Accelerated Life Test  Quickly discover design issues.  Evaluate & improve design margins.  Release mature product at market introduction.  Reduce development time & cost.  Eliminate design problems before release.  Evaluate cost reductions made to product. Developmental HALT is not really a test you pass or fail, it is a process tool for the design engineers. There are no pre-established limits. © 2009 Ops A La Carte 72
  • 74. HALT, How It Works Start low and step up the stress, testing the product during the stressing © 2009 Ops A La Carte 73
  • 75. HALT, How It Works Gradually increase stress level until a failure occurs © 2009 Ops A La Carte 74
  • 76. HALT, How It Works Analyze the failure © 2009 Ops A La Carte 75
  • 77. HALT, How It Works Make temporary improvements © 2009 Ops A La Carte 76
  • 78. HALT, How It Works Increase stress and start process over © 2009 Ops A La Carte 77
  • 79. HALT, How It Works Fundamental Technological Limit © 2009 Ops A La Carte 78
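The step-stress loop described on the preceding slides can be sketched as a toy simulation. Everything here is hypothetical: a real HALT uses functional testing and root cause analysis at each step, whereas this sketch simply assumes each weakness has a known stress level at which it shows up, and treats the "temporary improvement" as removing that weakness so the stepping can continue.

```python
def step_stress(weakness_thresholds, start, step, technological_limit):
    """Simulate a step-stress run and return (stress_level, weakness) findings.

    weakness_thresholds: dict mapping a weakness name to the stress level
    at which it causes a failure (hypothetical values).
    """
    remaining = dict(weakness_thresholds)
    findings = []
    stress = start
    # Start low and step up until the fundamental technological limit.
    while stress <= technological_limit:
        failed = sorted(n for n, limit in remaining.items() if limit <= stress)
        for name in failed:
            findings.append((stress, name))  # analyze the failure
            del remaining[name]              # apply a temporary improvement
        stress += step                       # increase stress and start over
    return findings


# Hypothetical vibration thresholds in Grms.
found = step_stress(
    {"connector": 20, "capacitor": 35, "solder joint": 50},
    start=10, step=10, technological_limit=60,
)
print(found)
```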
  • 80. HALT, Why It Works [Figure: classic S-N diagram, stress vs. number of cycles. Stress levels S0, S1, S2 are plotted against lives N0, N1, N2, where S0 = normal stress conditions and N0 = projected normal life.]
  • 81. HALT, Why It Works [Figure: the same S-N diagram, with an added marker for the point at which failures become non-relevant.]
  • 82. Margin Improvement Process [Figure: stress axis marked, in order, with Lower Destruct Limit, Lower Operating Limit, Product Operational Specs, Upper Operating Limit, Upper Destruct Limit.]
  • 83. Margin Improvement Process This is what the product spec distribution really looks like. [Figure: the same stress axis and limits.]
  • 84. Margin Improvement Process [Figure: the same stress axis and limits, annotated with the Operating Margin and the Destruct Margin.]
  • 85. Developmental HALT Process  Planning a HALT  Setting up for a HALT  Executing a HALT  Post Testing © 2009 Ops A La Carte 84
  • 86. When to Perform HALT?
      Phase | Milestone | Activity
      Feasibility | P1-P2 | Perform HALT on 1 to 2 early prototypes. These samples may be hand-made and test coverage may be low, but we can still get clues as to gross design issues.
      Development | Late P2 | Perform HALT on more samples. These samples will be closer to the final product, and functional tests will be more refined with higher test coverage.
      Qualification | P3 | Demonstrate 100% reliability target @ 80% C.L. Shipping/packaging test. Validation HALT can be performed here.
      Launch | | Tracking reliability through field data.
      Lessons learned feed back to the next-generation product.
  • 87. Summary of Results - by stress - Cold Step Stress: 14% Hot Step Stress: 17% Rapid Thermal Transitions: 4% Vibration Step Stress: 45% Combined Environment: 20% Significance: Without Combined Environment, 20% of all failures would have been missed © 2009 Ops A La Carte 86
  • 88. Traditional vs HALT Engineering Needs [Figure: product development manpower requirements (spending rate) vs. time. Labels: DVT1 ... DVTn, MR (twice), $ Savings.]
  • 89. HALT Cost Benefits  Reduced product time to market  Lowered warranty cost through higher MTBF  Faster DVT with fewer product samples  Accelerated screening (HASS) allowed © 2009 Ops A La Carte 88
  • 90. Accelerated Life Testing (ALT) © 2009 Ops A La Carte 89
  • 91. Accelerated Life Test (ALT)
      - An Accelerated Life Test (ALT) is the process of determining the reliability of a product in a short period of time by accelerating the use environment.
      - ALTs are good for finding dominant failure mechanisms.
      - ALTs are usually performed on individual assemblies rather than full systems.
      - ALTs are also frequently used when there is a wear-out mechanism involved.
  • 92. Stress  Anything applied to a product, either electrically or environmentally, to accelerate finding possible weaknesses  Examples of Electrical Stress: Current, Voltage (DC and AC), Power Cycling, and Frequency (line and board)  Examples of Environmental Stress: Temperature Extremes, Temperature Cycling, Vibration, Shock, Humidity, ESD, Drop, Altitude © 2009 Ops A La Carte 91
  • 93. Physical Acceleration
      - Acceleration means that operating a unit at high stress (temperature, voltage, humidity, duty cycle, etc.) produces the same failures that would occur at typical-use stresses, except that they happen much sooner.
      - Failure may be due to mechanical fatigue, corrosion, chemical reaction, diffusion, migration, etc. The causes are the same; only the time scale is different.
      - Changing the stress is equivalent to transforming the time scale. This is often a linear transform, which means the time-to-fail at high stress is multiplied by a constant (the acceleration factor) to obtain the equivalent time-to-fail at use stress.
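As an illustration of the linear time transform, the sketch below computes an acceleration factor and applies it to a time-to-fail. The Arrhenius temperature model is used here only as one common example of how an acceleration factor might be obtained; the activation energy, the temperatures, and the function names are assumptions, and the right model depends on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(activation_energy_ev, use_temp_c, stress_temp_c):
    """Acceleration factor between use and stress temperatures (Arrhenius)."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1 / use_k - 1 / stress_k)
    return math.exp(exponent)


def equivalent_use_time(time_to_fail_at_stress, acceleration_factor):
    """Linear transform: time at high stress times AF gives time at use stress."""
    return time_to_fail_at_stress * acceleration_factor


# Hypothetical values: 0.7 eV activation energy, 40 C use, 85 C test.
af = arrhenius_af(0.7, use_temp_c=40, stress_temp_c=85)
print(f"AF = {af:.1f}")
print(f"500 h at stress is roughly {equivalent_use_time(500, af):.0f} h at use conditions")
```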
  • 94. Failure Mode Dependence  Keep in mind that the acceleration factor is highly dependent on the failure mechanism.  Each failure mechanism will most likely have a different acceleration factor.  During testing, conduct thorough failure analysis and separate the failure mechanisms for separate analysis.  Selecting the stress to apply must be done with the expected failure mechanisms in mind. © 2009 Ops A La Carte 93
  • 95. Theory of ALT [Figure: classic S-N diagram, stress vs. number of cycles. Stress levels S0, S1, S2 are plotted against lives N0, N1, N2, where S0 = normal stress conditions and N0 = projected normal life.]
  • 96. When to Apply ALT [Figure label: ALT Region of Application.]
  • 97. ALT Parameters In order to set up an ALT, we must know several different parameters, including  Length of Test  Number of Samples  Goal of Test  Confidence Desired  Accuracy Desired  Cost  Acceleration Factor • Field Environment • Test Environment • Acceleration Factor Calculation  Slope of Weibull Distribution (Beta parameter) © 2009 Ops A La Carte 96
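To show how these parameters interact, here is a sketch of one common way to size a zero-failure test when life follows a Weibull distribution (often called a parametric binomial or Weibayes calculation). It is not taken from the seminar; the goal, confidence, acceleration factor, and Beta below are hypothetical inputs.

```python
import math


def zero_failure_sample_size(reliability, confidence, life,
                             test_time, acceleration_factor, beta):
    """Samples needed to show `reliability` at `life` with zero failures.

    Assumes a Weibull life distribution with shape `beta`, and that each
    unit is tested for `test_time` at a stress giving `acceleration_factor`.
    """
    equivalent_time = test_time * acceleration_factor
    ratio = (equivalent_time / life) ** beta
    n = math.log(1 - confidence) / (ratio * math.log(reliability))
    return math.ceil(n)


# Hypothetical goal: 90% reliability at 10,000 h, 80% confidence, AF = 10, Beta = 2.
n_short = zero_failure_sample_size(0.90, 0.80, life=10_000,
                                   test_time=1_000, acceleration_factor=10, beta=2)
n_long = zero_failure_sample_size(0.90, 0.80, life=10_000,
                                  test_time=2_000, acceleration_factor=10, beta=2)
print(n_short, n_long)
```

With Beta = 2, doubling the test length cuts the required sample size to roughly a quarter rather than a half, which is one reason the Weibull slope has to be known before the test is planned.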
  • 98. Review  When wear-out is a dominant failure mechanism, we must be able to predict or characterize this wear-out mechanism to assure that it occurs outside customer expectations and outside the warranty period.  ALT is an excellent method for doing this © 2009 Ops A La Carte 97
  • 99. HALT vs. ALT When to Use Which Technique? © 2009 Ops A La Carte 98
  • 100. Overview HALT and ALT are two of the most popular testing methods, but engineers are often confused about which one to use and when.
  • 101. Overview Highly Accelerated Life Testing (HALT) is a great reliability technique to use for finding predominant failure mechanisms in a hardware product. However, in many cases, the predominant failure mechanism is wear-out. When this is the situation, we must be able to predict or characterize this wear-out mechanism to assure that it occurs outside customer expectations and outside the warranty period. The best technique to use for this is a slower test method, Accelerated Life Testing (ALT).
  • 102. Overview In many cases, it is best to use both because each technique is good at finding different types of failure mechanisms. The proper use of both techniques together will offer a complete picture of the reliability of the product. © 2009 Ops A La Carte 101
  • 103. HALT Highly Accelerated Life Testing used for Product Ruggedization ALT Accelerated Life Testing used to Characterize Predominant Failure Mechanisms, Especially for Wearout © 2009 Ops A La Carte 102
  • 104. Comparison Between ALT and HALT (failure testing)
      HALT
        Objectives: 1. Root Cause Analysis 2. Corrective Action Identification 3. Design Robustness Determination
        Testing requirements: 1. Detailed Product Knowledge 2. Engineering Experience
      ALT
        Objectives: 1. Reliability Evaluation (e.g. failure rates) 2. Dominant Failure Mechanisms Identification
        Testing requirements: 1. Detailed Parameters: (a) Test Length (b) Number of Samples (c) Confidence/Accuracy (d) Acceleration Factors (e) Test Environment 2. Test Metrology & Factors: (a) 4:2:1 Procedure or other (b) Costs
        Analytical models: 1. Weibull Distribution 2. Arrhenius 3. Coffin-Manson 4. Norris-Landzberg
  • 105. Advantage of ALT over HALT  One key advantage of ALT over HALT is when we need to know the life of the product.  In HALT, we don’t concern ourselves with this much because we are more interested in making the product as reliable as we can, and measuring the amount of reliability is not as important.  However, with mechanical items that wear over time, it is very important to know the life of the product as accurately as possible. © 2009 Ops A La Carte 104
  • 106. Advantage of ALT over HALT Another advantage is that we often do not need any environmental equipment. Benchtop testing is often adequate. © 2009 Ops A La Carte 105
  • 107. Advantage of HALT over ALT
      - A big advantage of HALT over ALT is time. We are less concerned with time to failure than with which failure mode is dominant, and this we can usually find out in a matter of days rather than weeks or months.
      - This savings in time is also a big savings in money, since it takes less time at a test lab.
      - The number of samples is far fewer (usually 10 to 1).
      - We don't need to calculate an acceleration factor.
      - We don't need to stay with the same stresses as the field environment, because of the cross-over effect.
  • 108. Combining ALT with HALT Often we will run a product through HALT and then run the subassemblies that were not good candidates for HALT through ALT. [Figure labels: HALT on System; ALT on System Fan.]
  • 109. Developing ALT from HALT At other times, we may develop the ALT based on the HALT limits, using the same accelerants but lowering the acceleration factors to measurable levels. [Figure labels: HALT on System; ALT on System.]
  • 110. Examples of Products for HALT and ALT: Component, Robot, Fan, Infusion Pump, Hard Drive, Medical Cabinet, Automotive Electronics, Cell Phone, Automobile. The pictures are representative samples of the kinds of products we have tested; they are not the actual products, in order to protect the proprietary nature of the products we test.
  • 111. Component
      Characteristic | Accelerant
      Aging | High Temperature
      Contamination, Package Hermeticity | Temp/Humidity
      Mismatch of Thermal Characteristics of Package Matls | Temp Cycling
      Die Attachment, Bond Wires | Vibration
  • 112. Automobile
      Test | Accelerant
      Electronics | Temperature, Vibration, Humidity, Contamination
      Mechanical | Repetitive cycling test
  • 113. Fan
      Test | Accelerant
      Spinning | Duty Cycle, Speed, Torque, Backpressure
      Lubricant Longevity | Temperature, Humidity, Contamination
  • 114. Hard Drive
      Test | Accelerant
      Head Spinning | Duty Cycle, Start/Stop, Speed, Temperature?, Vibration?
      Contamination on Head Surface | Non-Operational Vibration
      Board Derating | Temperature/Voltage
      Connectors – Power, Data | Duty Cycle, Force, Angle
  • 115. Robot
      Test | Accelerant
      Arm Movement (side to side) | Duty Cycle, Speed, Torque
      Z-Stage (up and down) | Duty Cycle, Speed, Torque
      Vacuum Hold-down | Temperature, Altitude
      Repeatability | Duty Cycle
  • 116. Automotive Electronics – GPS Receiver
      Test | Accelerant
      Electronics | Temperature, Vibration, Humidity, Contamination
      Button Pushing | Duty Cycle, Force?, Angle
  • 117. Infusion Pump
      Test | Accelerant
      Battery Charging | Duty Cycle, Deep Discharge, Speed of Charge
      Touchscreen | Duty Cycle, Location, Force?
      Pumping | Duty Cycle, Rate, Plunger Force
      Connectors – Battery, Charger, Pole Clamp, IV Line, Cassette | Duty Cycle, Force, Angle
  • 118. Drawer for Medical Cabinet
      Test | Accelerant
      Opening/Closing of Drawer | Duty Cycle, Force, Angle
      Locking Mechanism | Duty Cycle, Force, Contamination
  • 119. Cell Phone
      Test | Accelerant
      Button Pushing | Duty Cycle, Force?, Angle
      Touchscreen | Duty Cycle, Location, Force?
      Connectors – Headset, Battery, Charger | Duty Cycle, Force, Angle
  • 120. Summary  When wear-out is not a dominant failure mechanism, HALT is an excellent tool for finding product weaknesses in a short period of time. © 2009 Ops A La Carte 119
  • 121. Summary  When wear-out is a dominant failure mechanism, we must be able to predict or characterize this wear-out mechanism to assure that it occurs outside customer expectations and outside the warranty period.  ALT is an excellent method for doing this © 2009 Ops A La Carte 120
  • 122. RELIABILITY DEMONSTRATION TESTING (RDT) 121 © 2009 Ops A La Carte
  • 123. Reliability Demonstration Testing (RDT)  A sample of units are tested at accelerated stresses for several months.  The stresses are a bit lower than the HALT stresses and they are held constant (or cycled constantly) rather than gradually increasing.  This enables us to calculate the acceleration factor for the test.  The RDT can be used to validate the reliability prediction analyses. 122 © 2009 Ops A La Carte
  • 124. RDT vs. ALT  RDT and ALT are very similar in that the stresses are usually accelerated but at a lower level than HALT.  The main difference between RDT and ALT is that ALT is usually used to characterize the wearout region of the product whereas RDT is usually used to demonstrate the MTBF in the steady state region of the product.  In an RDT, you CAN substitute samples for time.  In an ALT, you CANNOT substitute samples for time. 123 © 2009 Ops A La Carte
  • 125. RDT vs. ALT [Figure labels: ALT Region; RDT Region.]
  • 126. RDT, continued 125 CRE Primer by QCI, 1998 © 2009 Ops A La Carte
  • 127. RDT, continued 126 CRE Primer by QCI, 1998 © 2009 Ops A La Carte
  • 128. HALT vs. RDT © 2009 Ops A La Carte 127
  • 129. Overview Highly Accelerated Life Testing (HALT) is a great reliability technique to use for finding predominant failure mechanisms in a hardware product. However, in many cases, customers need to know the MTBF or Annualized Failure Rate (AFR) of a product in the field. When this is the situation, most people turn to RDT. However, recently we have developed a method for estimating MTBF from HALT data. © 2009 Ops A La Carte 128
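MTBF and AFR are two views of the same constant failure rate, so one can be converted to the other. The sketch below uses the standard exponential relationship and assumes continuous operation (8,760 hours per year). It is not the AFR Estimator model, whose internals are not described in this seminar.

```python
import math

HOURS_PER_YEAR = 8760  # assumes the product runs continuously


def afr_from_mtbf(mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """Fraction of units expected to fail in one year at a constant failure rate."""
    return 1 - math.exp(-hours_per_year / mtbf_hours)


def mtbf_from_afr(afr, hours_per_year=HOURS_PER_YEAR):
    """Inverse of afr_from_mtbf."""
    return -hours_per_year / math.log(1 - afr)


afr = afr_from_mtbf(40_000)
print(f"MTBF 40,000 h corresponds to an AFR of about {afr:.1%}")
```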
  • 130. The AFR Estimator
      - The AFR Estimator is a patent-pending mathematical model that, when provided with the appropriate HALT and product information, will accurately estimate the product's field Annualized Failure Rate (AFR).
      - This methodology has been used on a number of products with significant positive financial results.
  • 131. Justification for the AFR Estimator
      - HALT takes only a few days to run and to implement its corrective actions. Even if it took somewhat longer, this would still be far less time than waiting for an RDT to be run and its corrective actions implemented. The application of this model can be a huge time and cost saver.
      - Because higher HALT limits equate to lower AFR, you now have a tool that can accurately estimate the field AFR before launching the product. The stress levels depicted in the table in Section E are highly recommended for HALT. These levels can assure the producer that the product will exceed customer expectations and allow the producer to accurately forecast warranty expenditures.
  • 132. Justification for the AFR Estimator  By not performing life tests and simply doing HALT, time and money will be saved. This is not to say that life testing isn’t important. It should be considered for new technologies and for an existing part/design with a different application but not as a process to accurately estimate AFR.  With seven to ten simple data entry points and most of them coming from the HALT effort, the AFR Estimator will provide an accurate field AFR instantaneously with its associated 90% statistical confidence limits. The inputs for HASS and HASA are: will you perform HASS or HASA, the daily sample size, and the detectable shift in the AFR you wish to detect. © 2009 Ops A La Carte 131
  • 133. Justification for the AFR Estimator
      - The AFR Estimator has been validated on over twenty products from diverse manufacturers and design environments.
      - The model can accommodate HALT sample sizes from one to six, with the optimum size being four. Sample sizes greater than four will default to four.
      - 90% upper and lower confidence limits are calculated based on the HALT AFR and the HALT sample size.
  • 134. Recommendations when using the AFR Estimator
      - An effective HALT needs to be done with at least three units, and preferably four, although the model can accommodate sample sizes from one to six.
      - Please realize that HALT sample sizes of three or fewer will dramatically affect the ability to detect product defects, and hence the statistical confidence is likewise impacted.
  • 135. Recommendations when using the AFR Estimator 1. Root Cause for Failures 2. Robust Protocols 3. Achieve at least the Guard Band Limits 4. For HASS or HASA, normalize chamber vibration tables 5. Obtain a copy of, “HALT, HASS, & HASA Explained”, by Harry McLean and use it as a reference. © 2009 Ops A La Carte 134
  • 136. Recommendation 1: Root Cause for Failures  Each of the issues encountered needs to have root cause analysis understood, corrective action implemented, then verified in HALT under the same stress conditions in which the defect was detected. Exceptions to this would be limitations that occur beyond the Guard Band Limits in the table following Section E. Issues encountered beyond these levels are to have root cause analysis performed but corrective action implemented as a business decision based on timeliness, cost, and program delays. © 2009 Ops A La Carte 135
  • 137. Recommendation 3: Achieve Guard Band Limits  For the maximum benefit of a low field AFR or a high MTBF, it is suggested that the product achieve at least the levels shown under the Guard Band Limits in Section E below. These are very achievable with time and understanding within the organization without having to use extended (more costly) temperature range components. © 2009 Ops A La Carte 136
  • 138. How to Use the Estimator: © 2009 Ops A La Carte 137
  • 139. How to Use the Estimator: Calculated MTBF Estimate
      - The MTBF estimate in kHours can be from Telcordia, Relex, or a similar tool. If this estimate is not available, use 40,000 as a default value for the estimator. This parameter has very little effect on the final field AFR or MTBF estimate, because of the highly variable processes and the many assumptions used in estimating an MTBF. Enter this value in the table following Section H. Please note that the estimator will recommend an MTBF of 40,000 when a value of less than 40,000 is used.
  • 140. How to Use the Estimator: HALT Operating Limits  The final Hot operating limit (HOL) achieved in HALT as measured on the product and not the chamber setpoint. Enter this value in the table following Section H.  The final Cold operating limit (COL) achieved in HALT as measured on the product and not the chamber setpoint. Enter this value in the table following Section H.  The final Vibration operating limit (VOL) achieved in HALT as measured on the product and not the chamber setpoint. Enter this value in the table following Section H. © 2009 Ops A La Carte 139
  • 141. How to Use the Estimator: Product Environment
      - The product's published thermal operating specifications, in °C. Try to match your product's published specifications to a corresponding Level number listed in the table below, e.g., a high-end consumer product equates to a Level 2.
      Product's Published Specs | Category | Guard Band Limits | Level
      0C to +40C | Consumer | -30C to +80C | 1
      0C to +50C | Hi-end Consumer | -30C to +100C | 2
      -10C to +50C | Hi Performance | -40C to +110C | 3
      -20C to +50C | Critical Application | -50C to +110C | 4
      -25C to +65C | Sheltered | -50C to +110C | 5
      -40C to +85C | All Outdoor | -65C to +110C | 6
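Recommendation 3 asks that the HALT operating limits reach at least the Guard Band Limits for the product's level. That check can be sketched as a simple table lookup. The dictionary below copies the guard band values from the table above; the function name and the sample HALT results are assumptions, and this is not part of the AFR Estimator itself.

```python
# Guard Band Limits in degrees C by Level: (cold limit, hot limit).
GUARD_BAND_LIMITS_C = {
    1: (-30, 80),
    2: (-30, 100),
    3: (-40, 110),
    4: (-50, 110),
    5: (-50, 110),
    6: (-65, 110),
}


def meets_guard_band(level, cold_operating_limit_c, hot_operating_limit_c):
    """True if the HALT operating limits reach or exceed the guard band."""
    cold_limit, hot_limit = GUARD_BAND_LIMITS_C[level]
    return cold_operating_limit_c <= cold_limit and hot_operating_limit_c >= hot_limit


# Hypothetical HALT results for a Level 2 (Hi-end Consumer) product,
# measured on the product rather than at the chamber setpoint.
print(meets_guard_band(2, cold_operating_limit_c=-40, hot_operating_limit_c=105))
print(meets_guard_band(2, cold_operating_limit_c=-25, hot_operating_limit_c=105))
```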
  • 142. How to Use the Estimator: Running the Estimator Once the Value for AFR Estimator column is completed, you are ready to run the AFR Estimator and determine the product’s AFR, MTBF, Confidence Limits, and days to detect shift in AFR if HASS or HASA is being used. © 2009 Ops A La Carte 141
  • 143. WRAP-UP © 2009 Ops A La Carte 142
  • 144. Design for Reliability (DfR) Tools  Reliability Assessment, Goal Setting, and Planning  Reliability Modeling and Prediction  Thermal Analysis  Derating Analysis  Failure Modes and Effects Analysis (FMEA)  Fault Tree Analysis (FTA)  Design of Experiments (DoE)  Human Engineering/Human Factors Analysis  Highly Accelerated Life Test (HALT)  Accelerated Life Test (ALT)  RDT and ORT  Highly Accelerated Stress Screen (HASS)  Root Cause Analysis (RCA)  Restriction of Hazardous Substances (RoHS)  Outsourced Engineering and Reliability  Field Data Analysis Red shows tools we introduced today. 143 © 2009 Ops A La Carte
  • 145. Thank you for your participation! © 2009 Ops A La Carte 144
  • 146. Contact Information
      - Mike Silverman, Managing Partner, Ops A La Carte, LLC, (408) 472-3889, mikes@opsalacarte.com, www.opsalacarte.com
      - Vijay Prasad, Program Manager, S. Cal, Ops A La Carte, LLC, (858) 349-0443, vijayp@opsalacarte.com, www.opsalacarte.com
  • 147. Additional OPS A La Carte Information © 2009 Ops A La Carte 146
  • 148. Presenter's Biographical Sketch – Mike Silverman
      - Mike Silverman is founder and managing partner at Ops A La Carte, a professional consulting company that has an intense focus on helping customers with end-to-end reliability. Through Ops A La Carte, Mike has had extensive experience as a consultant to high-tech companies, and has consulted for over 300 companies including Cisco, Ciena, Siemens, Abbott Labs, and Applied Materials. He has consulted in a variety of different industries including power electronics, telecommunications, networking, medical, semiconductor, semiconductor equipment, consumer electronics, and defense.
      - Mike has 20 years of reliability and quality experience. He is also an expert in accelerated reliability techniques, including HALT and HASS (and recently purchased a HALT lab), testing over 500 products for 100 companies in 40 different industries. Mike has authored and published 8 papers on reliability techniques and has presented these around the world including China, Germany, Canada, Taiwan, Singapore, and Korea. He has also developed and currently teaches 27 courses on reliability techniques.
      - Mike has a BS degree in Electrical and Computer Engineering from the University of Colorado at Boulder, and is both a Certified Reliability Engineer and a course instructor through the American Society for Quality (ASQ), IEEE, Effective Training Associates, and Hobbs Engineering. Mike is a member of ASQ, IEEE, SME, ASME, PATCA, and IEEE Consulting Society and is the current chapter president in the IEEE Reliability Society for Silicon Valley.
  • 149. We Can Help You Sell to Your Management Our main contact often has difficulty selling reliability within their company. We have many techniques to help: 1) Detailed proposals with case examples 2) Free presentations at your site 3) Technical articles/white papers 4) Blog articles covering your situation 5) Articles from our quarterly newsletter
  • 150. What’s New at Ops? 0) New Book “50 Ways to Improve Your Reliability” 1) A new HALT Calculator 2) A new Reliability Blog 3) Semiconductor Reliability services 4) Software Reliability services 5) RoHS conversion services 6) Warranty analysis services 7) New Accelerated Life Test methodology 8) Quality/6 Sigma Seminars 9) Offices: Singapore, China, Taiwan, UK, India 10) Complete Reliability Solutions 11) Green Reliability Services
  • 151. Reliability Integration Education: 31 different seminars on reliability 1) Overall Program Reliability Integration 2) Concept Phase Reliability Tools & Integration 3) Design Phase Reliability Tools & Integration 4) Prototype Phase Reliability Tools & Integration 5) Manufacturing Phase Reliability Tools & Integration 6) Reliability Techniques for Beginners 7) Reliability Statistics 8) FMECA 9) CRE Exam Preparation 10) CQE Exam Preparation 11) Design for Reliability (DfR) 12) Design for Manufacturability (DfM) 13) Design for Testability (DfT) 14) Design for Warranty Cost Reduction (DfW) 15) Design for 6 Sigma (DfSS) 16) Design for Safety 17) Design for ‘X’ (DfX) 18) Mechanical Design for IC Packaging 19) Design of Experiments (DoE) 20) HALT and HASS Application 21) Statistics for 6 Sigma 22) Fundamentals of Climatic Testing 23) Design for Vibration and Shock 24) Software Reliability 25) Root Cause Analysis 26) Reality of Pb-Free Reliability 27) Statistical Process Control 28) Innovative Problem Solving 29) Mechanical Design for Reliability 30) Problem Solving Tools 31) Applied Data Analysis. Red – Part of our yearly symposium
  • 152. Upcoming Seminars CQE Course – Apr-Jun and Oct-Dec, 2010. CRE Course – Jan-Mar and Aug-Oct, 2010. We offer 31 different courses and seminars in Reliability, Quality, and Technical Operations. Please see the Educational Brochure inside your Ops A La Carte packet for more details.
  • 153. Upcoming Events ARS – June, 2010, Reno: We are a co-sponsor and will be exhibiting and presenting a paper on our new HALT Calculator. ASTR – October, Denver: We are on the committee and will be exhibiting and presenting. RAMS – January, 11, Orlando: We are on the committee and will be exhibiting and presenting.