Effective Reliability Program
Traits and Management


           Fred Schenkelberg
          Ops A La Carte, LLC
Reliability Engineering Management



Fred Schenkelberg
Senior Reliability Consultant
Ops A La Carte, LLC
(408) 710-8248
fms@opsalacarte.com
Tutorial Objectives

   To outline the key traits for the effective
    management of a reliability program.

   To make you think about how to implement
    reliability engineering within an organization.
My Background and Context
Primary Reference




                McGRAW-HILL, 1996
                ISBN: 00701-27506
Additional Reading

   Practical Reliability Engineering, 4th Edition,
    Patrick D. T. O’Connor, 2002
   Improving Product Reliability: Strategies and
    Implementation, Mark A. Levin and Ted T.
    Kalal, 2003
   Quality is Free: The Art of Making Quality
    Certain, Philip B. Crosby, 1979
   Design Paradigms: Case Histories of Error
    and Judgment in Engineering, Henry Petroski,
    1994
HP’s Design for Reliability Story


       Which activities have impact?
Product Development


       (THE OLD WAY)


            GOOD


            FAST

            CHEAP


       PICK ANY TWO!
The Situation



   "Based on an in-depth study of HP's most
  successful divisions, we discovered that as
   much as 25% of our manufacturing assets
  were tied up in reacting to quality problems!

 "Clearly, a bold approach was needed to convince
  people that a problem existed and to fully
  engage the entire organization in solving it."
The 10X Challenge




 "The proper place to start, we concluded,
 was with a startling goal - one that would
 get attention. The goal we chose was a
   tenfold reduction in the failure rates of
      our products during the 1980's."

                                 John
                                 Young
                                 HP CEO
Dick Moss retired from HP in February
1999, as the Corporate Product
Reliability Manager and winner of the
CEO’s Customer Satisfaction Award.
He worked at HP 39 years, the first 15
in new product development (R&D),
and the last 24 in hardware quality &
reliability. During that time, he
presented more than 700 technical
seminars to over 35,000 HP employees
worldwide. He wrote or edited parts of
4 books and published numerous
papers. He holds a BSEE from
Princeton and an MSEE from Stanford,
and has one patent.
The 10X Challenge Results

[Chart: normalized failure rate by fiscal year, 1981 to 1990, Actual vs. 10X Goal.
 End-of-period values: 0.126 ACTUAL (8X), 0.100 GOAL (10X).]
Warranty Savings During 10X

                        (ACTUAL vs PROJECTED @ 1980 RATE)

[Chart: annual warranty expense by fiscal year, FY80 to FY90, on a $0 to $300M axis.
 Two curves: projected cost at the 1980 rate and actual warranty cost.
 $808 MILLION 10 YR SAVINGS.]
Design for Reliability



     HOW'D WE DO THAT?
                             Commitment
     Management Leadership & Involvement


     Lengthen Warranty Period

     Find & Share Best Practices
thoughts or questions

                  what are your
                   questions?
               •   …your comments?
DFR Survey

                                 SURVEY CHECKLIST
Scoring:   4 = 100%, top priority
           3 = >75%, use expected
           2 = 25 - 75%, variable use
           1 = <25%, occasional use
           0 = not done or discontinued

Management:
        Goal setting for division
        Priority of Quality & Reliab.
        Mgmnt attention & follow up

Manufacturing:
         Design for Manufacturability
         Priority of Q & R goals
         Ownership of Q & R goals
         Quality training programs
         SPC & SQC use
         Internal process audits
         Supplier process audits
         Incoming inspection
         Product burn-in
         Defect Tracking
         Corrective action

Engineering:
         Documented design cycle
         Reliability goal budgeting
         Priority of reliability improvement
         DFR training programs
         Preferred technology program
         Component qualification testing
         OEM selection & qualif. Testing
         Physical failure analysis
         Root cause analysis
         Statistical engineering experiments
         Design & stress derating rules
         Design reviews & checking
         Failure rate estimation
         Thermal design & measurements
         Worst case analysis
         Failure Modes & Effects Analysis
         Environmental (margin) testing
         Highly Accel. Stress Testing
         Design defect tracking
         Lessons-learned database
results

widespread use
 environmental test manual
 product lifecycle

range of use
 module goal setting
 derating rules

limited use
 DFR training
 physics of failure analysis
findings

            ODM concerns
           how to convey needs
             and get reliable products?

             time to market priority
           urgent versus important

            management structures
           many ways to organize roles

            mature products & scores
           when only select tools apply
observations

best practices
 goal setting
 prediction
 statistics
 golden nuggets
 first look process

worst practices
 repair & warranty invisible
 lessons learned capture
 single owner of product reliability
 multiple defect tracking systems
QUESTIONS?




04/23/2002    Design For Reliability -   20
                  Overview.PPT
Reliability Goal Setting


     Establish the target in an engineering
                       meaningful manner
Reliability Definition

   Reliability is often considered quality over
    time

   Reliability is the probability of a product
    performing its intended function over its
    specified period of usage, and under
    specified operating conditions, in a
    manner that meets or exceeds customer
    expectations.
Reliability Goals & Metrics Summary

   Reliability Goals & Metrics tie together all
    stages of the product life cycle. Well
    crafted goals provide the target for the
    business to achieve; they set the
    direction.

   Metrics provide the milestones, the “are
    we there yet?”, the feedback all elements
    of the organization need to stay on track
    toward the goals.
Reliability Goals & Metrics Summary

   A reliability goal includes each of the four
    elements of the reliability definition.
    o   Intended function
    o   Environment (including use profile)
    o   Duration
    o   Probability of success
    o   [Customer expectations]
Reliability Goals & Metrics Summary

   A reliability metric is often something that
    an organization can measure on a relatively short,
    periodic basis.

    o   Predicted failure rate (during design phase)
    o   Field failure rate
    o   Warranty
    o   Actual field return rate
    o   Dead on Arrival rate
Reliability Goal-Setting

   Reliability Goals can be derived from
    o   Customer-specified or implied requirements
    o   Internally-specified or self-imposed requirements
        (usually based on trying to be better than previous
        products)
    o   Benchmarking against competition
Example Exercise

   Elements of Product Requirements Document

   Take notes to build a reliability goal statement
PRD Scope

This document defines the product specification for the
    Device A (Dev A). This specification includes a
    description of all electrical, mechanical, and
    functional aspects of the Dev A. It is intended to
    define the characteristics of the Dev A, but is not
    intended to describe a specific design
    implementation, which is covered in other
    documents. Unless otherwise specified, the
    tolerance of the nominal values specified herein will
    be taken as ± 20% at an ambient temperature of
    25° C.

Dev A provides demand-only flow regulation in order to
   conserve gas.
PRD Background

   The device includes a built in regulator, valve, control
    circuitry, and enclosure. The device will be designed
    to attach to a standard compressed gas cylinder.
   The industrial design of the device allows the user a
    simple method of attachment to the cylinder and easy
    access to all controls, batteries, and outlet port.
   A high-valuation, portable, 2 year life, dependable
    product will be targeted, while minimizing cost of
    goods to permit market flexibility.
PRD Reliability Section

 Warranty Period
The Warranty period will be decided by Marketing prior
  to release. The MRD currently states a 1 year
  warranty, however, for design purposes a two year
  warranty period shall be assumed.(PRD074)
 Reliability Over Warranty Period

The project goal is less than 2% at the end of first years
  production.
 Maintainability

The Dev A is intended to be serviced and repaired by
  Company A authorized service centers or authorized
  health care providers.
PRD Reliability Section

  Useful Life
The useful design life of the Dev A shall be
   6,000 hours based on 4 years at 4 hours use
   per day.(PRD077)
PRD Environment Section

  Operating Environment
These devices shall meet all performance
   specifications defined herein while subject to
   the following environmental conditions
   unless otherwise specified:(PRD078)
Temperature: 5 to 40° C
Relative Humidity: 15 to 95% non-condensing
Atmospheric Pressure: 76.7 to 102 kPa
DC Supply Voltage: 4.5 to 6.5 VDC
PRD Environment Section

  Storage Environment
These devices shall perform to all specifications
   after one hour at operating environment
   conditions after storage at the following
   environmental conditions :(PRD079)
Temperature: -20 to 60 ° C
Relative Humidity: 15 to 95% non-condensing
 The Dev A and all package contents shall be

   stored in a sealed plastic bag away from oil
   and grease contaminates.(PRD080)
Goal Statement exercise

   In groups of two or three draft a reliability goal

   Note the missing information and draft
    questions to get the missing information

   This is a brand new product with no field
    history – how would you apportion the system
    goal to the various subsystems?
    (regulator, valve, control circuitry, and enclosure)
Reliability Goals & Metrics Summary

       A reliability metric is often something that
        an organization can measure on a relatively
        short, periodic basis:
        o   Predicted failure rate (during design phase)
        o   Field failure rate
        o   Warranty
        o   Actual field return rate
        o   Dead on Arrival rate




(v5)
Fully-Stated Reliability Goals

      System goal at multiple points
       o   Supporting metrics during development and field
       o   Apportionment to appropriate level

      Provide connections to overall business plan,
       contracts, customer expectations, and include
       any assumptions concerning financials

      Benefit: clear target for development, vendor
       and production teams.
Reliability Goal

      Let’s say we expect a few failures in one year.
       Less than 2%
      Laboratory environ.
      XYZ function
      Assuming constant failure rate

       $R(t) = e^{-t/\theta}$

       $\ln(0.98) = -8760/\theta$

      XYZ function for one year with 98% reliability in the lab.
       (MTBF is 433,605 hrs.)
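
The arithmetic on this slide can be checked with a short sketch. It assumes, as the slide does, a constant failure rate (exponential model) and a one-year mission of 8760 hours. The function names are illustrative and not part of the original tutorial.

```python
import math


def mtbf_from_reliability(reliability: float, hours: float) -> float:
    """Solve R(t) = exp(-t / theta) for theta, given R and t."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    return -hours / math.log(reliability)


def reliability_at(hours: float, mtbf: float) -> float:
    """Reliability at time t for a constant failure rate of 1 / mtbf."""
    return math.exp(-hours / mtbf)


# Goal from the slide: 98% reliability over one year (8760 hours).
theta = mtbf_from_reliability(0.98, 8760.0)
print(f"MTBF = {theta:,.0f} hours")
print(f"Check: R(8760) = {reliability_at(8760.0, theta):.3f}")
```

Note that an MTBF of roughly 433,605 hours does not mean a typical unit lasts that long. Under this model it is only another way of stating a 2% chance of failure in the first year.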




Other Points in Time

      Also consider other business relevant points in
       time

      Infant mortality, out of box type failures
       o   Shipping damage
       o   Component defects, manufacturing defects

      Wear out related failures
       o   Bearings, connectors, solder joints, e-caps


Break Down Overall Goal

      Let’s look at example

      A computer with a one year warranty and the
       business model requires less than 5% failures
       within the first year.
       o   A desktop business computer in office environment
           with 95% reliability at one year.




Break Down the Goal (continued)


          For simplicity consider five major elements
           of the computer
           o   CPU/motherboard
           o   Hard Disk Drive
           o   Power Supply
           o   Monitor
           o   Bios, firmware

          For starters, let’s give each sub-system the
           same goal

Apportionment of Goals



                                         Computer
                                         R = 0.95




            CPU             HDD              P/S          Monitor           Bios
           R = 0.99        R = 0.99        R = 0.99       R = 0.99        R = 0.99



       Assuming failures within each sub-system are independent, the simple
       multiplication of the reliabilities should result in meeting the system goal

       0.99 * 0.99 * 0.99 * 0.99 * 0.99 = 0.95

       Given no history or vendor data – this is just a starting point.
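
A minimal sketch of the equal apportionment above, assuming independent sub-systems in series as the slide states. The helper names are illustrative. It shows that the exact equal share is slightly below 0.99, so rounding each sub-system goal up to 0.99 leaves a small margin against the 0.95 system goal.

```python
import math


def equal_apportionment(system_goal: float, n_subsystems: int) -> float:
    """Equal per-sub-system goal whose product equals the system goal."""
    return system_goal ** (1.0 / n_subsystems)


def series_reliability(reliabilities) -> float:
    """System reliability for independent sub-systems in series."""
    return math.prod(reliabilities)


subsystems = ["CPU", "HDD", "P/S", "Monitor", "Bios"]
exact_share = equal_apportionment(0.95, len(subsystems))
system_with_rounded_goals = series_reliability([0.99] * len(subsystems))

print(f"Exact equal share per sub-system: {exact_share:.4f}")
print(f"System reliability with 0.99 each: {system_with_rounded_goals:.4f}")
```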
Estimate Reliability

          The next step is to determine the sub-system
           reliability.
           o   Historical data from similar products
           o   Reliability estimates/test data by vendors
           o   In house reliability testing


          At first estimates are crude, refine as needed
           to make good decisions.



Apportionment of Goals



                                    Computer
                                    R = 0.95


  Goals
           CPU          HDD            P/S          Monitor        Bios
          R = 0.99     R = 0.99      R = 0.99       R = 0.99     R = 0.99


  Estimates
           CPU          HDD             P/S         Monitor        Bios
          R = 0.96     R = 0.98      R = 0.999      R = 0.99     R = 0.999


       First pass estimates do not meet system goal. Now what?
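
To see how far the first pass estimates fall short, the same series model can be applied to the estimates. This is a sketch under the independence assumption used earlier; the dictionaries simply restate the numbers on the slide.

```python
import math

SYSTEM_GOAL = 0.95
goals = {"CPU": 0.99, "HDD": 0.99, "P/S": 0.99, "Monitor": 0.99, "Bios": 0.99}
estimates = {"CPU": 0.96, "HDD": 0.98, "P/S": 0.999, "Monitor": 0.99, "Bios": 0.999}

# Series system: multiply the sub-system reliabilities.
system_estimate = math.prod(estimates.values())

# Gap per sub-system (goal minus estimate), largest shortfall first.
gaps = {name: goals[name] - estimates[name] for name in goals}
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)

print(f"System estimate: {system_estimate:.4f} (goal {SYSTEM_GOAL})")
for name, gap in ranked:
    print(f"{name:8s} gap = {gap:+.3f}")
```

The system estimate comes out near 0.93, and the ranking puts the CPU first and the HDD second, which is the order the following slides address them in.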

Resolving the Gap

      CPU goal 99% est. 96%
      Largest gap, lowest estimate

      First, will the known issues bridge the difference?
      Second, if that is not enough, use FMEA and HALT to populate
       a Pareto of what to fix
      Third, validate improvements

      Use the simple reliability model to determine if reliability
       improvements will impact the system reliability, e.g. changing
       the Bios reliability from 99.9% to 99.99% will not significantly
       alter the system reliability result.
      Invest in improvements that will impact the system reliability.
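
The point about where to invest can be illustrated with the simple series model. The sketch below assumes the first pass estimates from the apportionment slide and compares two hypothetical improvements: raising the Bios from 99.9% to 99.99%, and raising the CPU from 96% to its 99% goal. The helper name is illustrative.

```python
import math

estimates = {"CPU": 0.96, "HDD": 0.98, "P/S": 0.999, "Monitor": 0.99, "Bios": 0.999}


def system_with(current: dict, name: str, new_value: float) -> float:
    """System reliability if one sub-system estimate is replaced."""
    trial = dict(current)
    trial[name] = new_value
    return math.prod(trial.values())


baseline = math.prod(estimates.values())
bios_case = system_with(estimates, "Bios", 0.9999)
cpu_case = system_with(estimates, "CPU", 0.99)

print(f"Baseline system reliability: {baseline:.4f}")
print(f"Bios 0.999 -> 0.9999:        {bios_case:.4f} (gain {bios_case - baseline:.4f})")
print(f"CPU  0.96  -> 0.99:          {cpu_case:.4f} (gain {cpu_case - baseline:.4f})")
```

Under these numbers the Bios change moves the system result by less than 0.001, while closing the CPU gap adds almost 0.03 and by itself brings the system above the 0.95 goal.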




Resolving the Gap (continued)

      HDD goal 0.99 est. 0.98
      Small gap, clear path to resolve

      HDD reliability and operating temperature are related.
       Lowering the internal temperature the HDD experiences
       will improve performance.

      When the relationship between the failure mode and either
       design or environmental conditions is known, we do not need
       FMEA or HALT; go straight to design improvements.
      Use ALT to validate the model and/or design improvements.




Resolving the Gap (continued)

      P/S goal 0.99 est. 0.999
      Estimate over the goal

      Further improvement not cost effective given minimal impact
       to system reliability.
      Possible to reduce reliability (select less expensive model)
       and use savings to improve CPU/motherboard.

      For any subsystem that exceeds the reliability goal, explore
       potential cost savings by reducing the reliability performance.
      This is only done when there are accurate reliability estimates
       and significant cost savings.




Progression of Estimates

[Figure: upper and lower confidence bounds on the reliability estimate narrow
 as the data source progresses from initial engineering, to vendor data,
 to test data, to actual field data.]
Microsoft Model

      Proposed Model:
       Get feedback to the design and
       manufacturing team that permits visibility of
       the reliability gap. Permit comparison to goal.

      Microsoft Model:
       Not estimating or measuring the reliability
       during design is something I call the Microsoft
       model. Just ship it, the customers will tell you
       what needs improvement.
                  Don’t try the Microsoft Model!
             (it works for them but probably won’t work for you)
Reliability Goals & Metrics Summary

   A reliability goal includes each of the four
    elements of the reliability definition.
    o   Intended function
    o   Environment (including use profile)
    o   Duration
    o   Probability of success
    o   [Customer expectations]
Reliability Philosophies


   Two fundamental methods to achieving
                  high product reliability
Build, Test, Fix

   In any design there are a finite number of
    flaws.
   If we find them, we can remove the flaw.

   Rapid prototyping
   HALT
   Large field trials or ‘beta’ testing
   Reliability growth modeling
Analytical Approach

   Develop goals
   Model expected failure mechanisms
   Conduct accelerated life tests
   Conduct reliability demonstration tests
   Routinely update system level model

   Balance of simulation/testing to increase
    ability of reliability model to predict field
    performance.
Issues with each approach

Build, Test, Fix
 Uncertain if design is good enough
 Limited prototypes mean limited flaws discovered
 Unable to plan for warranty or field service

Analytical
 Fix mostly known flaws
 ALTs take too long
 RDTs take even longer
 Models have large uncertainty with new technology and environments
Balanced approach


             Goal
             Plan

      FMEA      Prediction
      HALT      RDT/ALT

          Verification
           Review
Reliability Planning


     Selecting the minimum set of tools to
               achieve the reliability goals
Planning Introduction

   Mil Hdbk 785 task 1

“The purpose of this task is to develop a
  reliability program which identifies, and ties
  together, all program management tasks
  required to accomplish program
  requirements.”
Fully Stated Reliability Goals

   System goal at multiple points
    o   Supporting metrics during development and field
    o   Apportionment to appropriate level
   Provide connections to overall business plan,
    contracts, customer expectations, and include
    any assumptions concerning financials

   Benefit: clear target for development, vendor
    and production teams.
Medicine


"The abdomen, the chest, and the brain will be
  forever shut from the intrusion of the wise and
  humane surgeon"




                                    Sir John Erichsen
                        leading British surgeon, 1873
Gap Analysis

   Estimate/review current reliability of system
    against the next project goal
   The difference is the gap to close

   That gap is what the plan needs to bridge
Path to close gap

   This is the ‘art’ of our profession and each
    project needs a unique solution.

   Just because the plan succeeded for the last
    project, it may not work for the current one
    o   Timelines change
    o   Goals and risks change
    o   Business objectives and customer expectations
        change
    o   The organization has grown/lost capabilities
If, small gap and clear Pareto

Then,
 Select issues on the Pareto from past products
  that are easiest in terms of cost, timeline, and risk.
 Engineering doesn’t need HALT or FMEA to
  identify or prioritize issues to resolve
 Assumes a system/sub-system reliability
  model, even as simple as a Pareto based on
  failure rates.
 Engineers may need ALT to verify solution
  assumptions
If, large gap and clear Pareto

Then,
 Same as small gap, generally
 Early step is to estimate ability to close gap
  with reasonable business risk
 If there is doubt on validity of issues to
  resolve, consider HALT to uncover possible
  new issues
If, new features, new market

   Then,
   Increase use of HALT, including on
    competitor’s products if possible
   Increase use of environmental testing (HALT
    if able to afford samples and testing
    facilities). Find margins related to new
    market environment.
   Use reliability growth modeling to determine if
    plan of record is able to meet goals
If, reliant on vendor’s failure
   analysis

Then,
 Consider building internal or third party failure

  analysis and component expertise
 Accelerate time to detection of vendor issues
If, (what is your situation)

When starting a project, consider the goals,
 constraints, etc. and look at the entire
 horizontal process.

Then,
 Let’s find a few options to consider
Exercise

   Identify a circumstance and an approach to
    building the reliability plan.

   What will be the biggest challenges to
    implementing the plan?
   Separate from the plan, what will you do as
    the reliability engineer to overcome the
    obstacles?
Close on Planning Discussion

   Introduction to Planning
   Fully stated reliability goals
   Constraints
    o   Timeline
    o   Prototype samples
    o   Capabilities (skills and maturity)
   Current state and gap to goal
   Paths to close the gap
    o   Investments
    o   Dual paths
    o   Tolerance for risk
Television


"People will soon get tired of staring at a
plywood box every night."




                                    Darryl F. Zanuck
                         Twentieth Century-Fox, 1946
Reliability Value


         How to speak in management’s
                             language
A Reliability Engineer’s Use of
Warranty Cost Information

            Fred Schenkelberg
Introduction

   Many (most, all?) products have a warranty

   Examples of how to use this information in
    your reliability engineering work
Electric Light




“Good enough for our transatlantic friends, but
  unworthy of the attention of practical or
  scientific men.”



                   British Parliament report on Edison’s work
                                                         1878
Overview


   Warranty as a percentage of revenue.
   Warranty as a cost per unit.
   Who owns warranty?
   How much warranty expense is right?
   What is the right investment to reduce
    warranty?
Warranty Week


 www.warrantyweek.com
Computers


“There is no reason for any individual to have a
  computer in their home.”




                                         Ken Olsen
                       Digital Equipment Corp. 1977
Reliability Specifications
     Example
   Given two fan datasheets

   Fan A has a mean time to fail of 4645 hours
   Fan B has a mean time to fail of 300 hours

   Both same price, etc.

   Choose one to maximize reliability
    at 100 hours
Reliability Specifications
     Example
   Consulting an internal fan expert, you are
    advised to get more information

   Fan A has a Weibull time to fail shape
    parameter of 0.8
   Fan B has a Weibull time to fail shape
    parameter of 3.0

              $\mu = \theta \, \Gamma\left(1 + \frac{1}{\beta}\right)$
Reliability Specifications
     Example
   Fan A has a scale parameter of 4100 hours
   Fan B has a scale parameter of 336 hours

   Use the Weibull Reliability function
          $R(t) = e^{-(t/\theta)^{\beta}}$
   Fan A reliability at 100 hours is 0.95
   Fan B reliability at 100 hours is 0.974
Reliability Specifications
     Example
   Given two fan datasheets

   Fan A has a mean time to fail of 4645 hours
   Fan B has a mean time to fail of 300 hours

   What about later, say 1000 hours?

   Fan A reliability at 1000 hours is 0.723
   Fan B reliability at 1000 hours is 3.5E-12
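
The fan comparison can be reproduced with a short sketch using only the standard library. It assumes the two-parameter Weibull model given on the slides, with scale θ and shape β as quoted. The function names are illustrative.

```python
import math


def weibull_mean(scale: float, shape: float) -> float:
    """Mean time to fail: mu = theta * Gamma(1 + 1/beta)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def weibull_reliability(t: float, scale: float, shape: float) -> float:
    """Reliability function: R(t) = exp(-(t/theta)^beta)."""
    return math.exp(-((t / scale) ** shape))


# (scale theta in hours, shape beta) as quoted on the slides
fans = {"A": (4100.0, 0.8), "B": (336.0, 3.0)}

for name, (scale, shape) in fans.items():
    mean = weibull_mean(scale, shape)
    r100 = weibull_reliability(100.0, scale, shape)
    r1000 = weibull_reliability(1000.0, scale, shape)
    print(f"Fan {name}: mean = {mean:7.0f} h, "
          f"R(100) = {r100:.3f}, R(1000) = {r1000:.3g}")
```

Fan B, with the much shorter mean time to fail, is the better choice at 100 hours because its shape parameter of 3.0 puts few failures early. By 1000 hours the ranking reverses completely. The mean alone does not say which fan is more reliable at a given time.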
The Telephone


"That's an amazing invention, but who
would ever want to use one of them?"




                               Rutherford Hayes
                             U.S. President, 1876
The Cost Reduction Example

   Given a FET that costs 10 cents, a new
    procurement engineer finds a new FET
    vendor that only charges 5 cents.

   Switch?

   What else to consider?
The Cost Reduction Example

   Given a FET that costs 10 cents, a new
    procurement engineer finds a new FET
    vendor that only charges 5 cents.

   $0.05 FET has MTBF of 50,000 hours
   $0.10 FET has MTBF of 75,000 hours

   1000 hours of operation
   Shipping 1000 units
   Cost to repair unit $250
The Cost Reduction Example

   Total Cost of $0.10 FET

   $R_{0.10}(1000) = e^{-1000/75{,}000} = 0.987$

   #Failed = (1 - 0.98675) * 1000 units = 13.25, rounded to 13

   Cost of Repairs = 250*13 = $3250

   Total Cost = $3250+0.10*1000 = $3350
The Cost Reduction Example

   Total Cost of $0.05 FET

   $R_{0.05}(1000) = e^{-1000/50{,}000} = 0.98$
   #Failed = (1-0.98) 1000units = 20

   Cost of Repairs = 250*20 = $5000

   Total Cost = $5000+0.05*1000 = $5050
The Cost Reduction Example

   Total Cost of $0.50 FET

   $R_{0.50}(1000) = e^{-1000/100{,}000} = 0.99$
   #Failed = (1-0.99) 1000units = 10

   Cost of Repairs = 250*10 = $2500

   Total Cost = $2500+0.50*1000 = $3000
The Cost Reduction Example

   Result?
      FET     MTBF        Repair    Total
      Cost    (hours)     Cost      Cost

      $0.10    75,000     $3250     $3350
      $0.05    50,000     $5000     $5050
      $0.50   100,000     $2500     $3000
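
The comparison can be recomputed with a small sketch. It assumes the same inputs as the slides (1000 units, 1000 hours of operation, $250 per repair, constant failure rate), but keeps the expected number of failures unrounded. The totals therefore differ slightly from the slide values, which round failures to whole units. The function name is illustrative.

```python
import math

UNITS = 1000          # units shipped
HOURS = 1000.0        # hours of operation per unit
REPAIR_COST = 250.0   # dollars per repaired unit

# (FET unit price in dollars, MTBF in hours) from the slides
fets = [(0.10, 75_000.0), (0.05, 50_000.0), (0.50, 100_000.0)]


def total_cost(unit_price: float, mtbf: float) -> float:
    """Parts cost plus expected repair cost over the operating period."""
    reliability = math.exp(-HOURS / mtbf)
    expected_failures = (1.0 - reliability) * UNITS
    return REPAIR_COST * expected_failures + unit_price * UNITS


results = {price: total_cost(price, mtbf) for price, mtbf in fets}
for price, cost in sorted(results.items(), key=lambda item: item[1]):
    print(f"${price:.2f} FET: total cost = ${cost:,.0f}")
```

The ordering matches the table: the most expensive part gives the lowest total cost, and the cheapest part gives the highest.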
Aviation


"The popular mind often pictures gigantic flying
  machines speeding across the Atlantic and
  carrying innumerable passengers...it seems
  safe to say that such ideas are wholly
  visionary."



                             Wm. Henry Pickering
                          Harvard astronomer, 1908
Component Challenges

   Cost driving manufacturing to low labor cost
    areas of the world
   Pb-free causing redesign/reformulation
   Outsourced design and manufacturing
    facilities gaining “commodity” component
    selection

   Other than yield - who’s watching Quality,
    Reliability and Warranty?
Component Challenges

   P50 formula error example

   Cracked ceramic capacitors
Component Challenges

   Trust and verify solution

   Build strong, technically verifiable, language
    into purchase contracts

   Check construction and formulation on
    periodic basis
Nuclear Energy


"Nuclear powered vacuum cleaners will
probably be a reality within 10 years."




                                       Alex Lewyt
                   vacuum cleaner manufacturer,1955
Where to Get More Information

   Newsletter and seminars
    http://Warrantyweek.com


   “Warranty Cost: An Introduction”
    http://quanterion.com/ReliabilityQues/V3N3.html

   “Economics of Reliability,” Chapter 4 of
    Handbook of Reliability Engineering and
    Management, 2nd Ed by Ireson, Coombs and Moss.
Reliability Engineering Value
How to determine ‘value add’ or ROI
“All metrics are wrong, some are useful.”
value
Terms

   Value
    o   An amount considered to be a suitable equivalent
        for something else; a fair price or return for goods
        or services
   Value Add
    o   The return or result of individual, team or product
        investment
   Value Capture
    o   Value add documentation related directly to
        merger
   Warranty Reduction
    o   Lower failure rates leading to fewer claims
How is value requested?


   Quarterly review: What have you done for me
    lately?

   Checkpoint meeting: Are we on track to meet
    goals?

   Budget: Which option provides best ROI?

   Annual review: What is your impact?
current status
Warranty – The Big Picture

“American manufacturers spent over $25 billion in
2004 honoring their product warranties, an increase
of 4.8% from the levels seen in 2003. However, an
incredible 63% of U.S.-based product manufacturers
actually saw a decrease in their claims rates as a
percentage of sales. Only 35% saw an increase and
2% saw no change, according to the latest statistics
compiled by Warranty Week.”

                          Eric Arnum, Warranty Week
                        www.warrantyweek.com, May 27th, 2005
document value
VALUE ADDED/ROI
          QUESTIONNAIRE
                            Savings/Impact/Benefit
1. Risk / cost / warranty   a. Has the work directly identified or mitigated a field-related
   reduction:               problem?
                            b. If so, estimate the probable cost of the field problem in $ (e.g. units
                            affected x repair cost).
                            c. Has the probability of field-related problems been reduced?

                            d. If so, indicate by how much and give the estimated cost avoided
                            (e.g. an estimated 1000 unit failures per month at $50 each, reduced by 5%).

                            e. Has the work provided processes that will reduce the risk of field
                            failures in subsequent products?
2. TTM impact:              a. Did the work help you meet or beat your TTM goals?
                            b. Did the work identify any problems that would have impacted your
                            TTM?
                            c. Has the use of tools/techniques identified issues that would have
                            impacted TTM?
                            d. If the above are applicable, please identify the type of problems and
                            estimate the TTM impact in days/weeks/months.
                            e. What is the estimated cost of a delay in TTM?
                            f. What is the opportunity in $ of additional income from an early
                            TTM?
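The estimate asked for in item 1d is simple arithmetic. A minimal sketch, using the numbers from the example in 1d; the function name is hypothetical, and reading the 5% as a fractional reduction in monthly failure cost is an assumption:

```python
def monthly_cost_avoided(failures_per_month, cost_per_failure, fractional_reduction):
    """Estimated field-failure cost avoided per month (hypothetical helper)."""
    baseline_cost = failures_per_month * cost_per_failure
    return baseline_cost * fractional_reduction


# Numbers from the questionnaire example: 1000 failures/month, $50 each, 5% reduction.
avoided = monthly_cost_avoided(1000, 50.0, 0.05)
print(f"Estimated cost avoided: ${avoided:,.0f} per month, ${12 * avoided:,.0f} per year")
```

The same helper can be reused with the "units affected x repair cost" estimate from item 1b by setting the reduction to 1.0.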
VALUE ADDED/ROI
          QUESTIONNAIRE

                       Savings/Impact/Benefit
3. TT Volume impact:    a. Did the work help you accelerate or meet your Time to Volume (TTV)
                        goals?
                        b. If applicable, what is the estimated $ impact of avoiding the
                        TTV issues that were identified?
4. Material costs:      a. Did we avoid or save any direct product material or test
                        equipment costs?
                        b. If so, please identify the type and cost.

5. TCE:                 a. Has the work contributed to the TCE of your product?

                        b. If so, identify how (e.g. estimated number of customer calls
                        avoided).
                        c. If you have a TCE cost model, what is the estimated $ impact
                        of the identified improvement?
6. Opportunity cost:    a. If engineers from the business had been used to do this work,
                        would they have been unable to do other product-related work
                        (e.g. delivering new functions)?
7. Indirect impact:     a. What advantages did internal work provide over an external
                        consultancy (e.g. time, cost, contractual issues, intellectual
                        property, response time)?
“I fall back dazzled at beholding myself all rosy red,
At having, I myself, caused the sun to rise.”


            Edmond Rostand (1868-1918)
VALUE ADDED/ROI
          QUESTIONNAIRE

                        Savings/Impact/Benefit
8. Engineering effort    a. How long would it have taken your team to undertake the
saved:                   work provided? Take into account research time and whether you
                         had the skills available.
                         b. If you did not have the skills available, how many people
                         would have needed to be recruited to undertake the work?

                         c. How long would it take for these people to become
                         productive?
                         d. Estimate the training cost associated with new personnel.

9. Misc:                 a. Please identify any other benefits or cost savings from using
                         our resources.
“Gross national product measures neither the health of our children,
the quality of their education, nor the joy of their play.
It measures neither the beauty of our poetry, nor the strength of our
marriages.
It is indifferent to the decency of our factories and the safety of our
streets alike.
It measures neither our wisdom nor our learning, neither our wit nor
our courage, neither our compassion nor our devotion to country.
It measures everything in short, except that which makes life worth
living, and it can tell us everything about our country except those
things which make us proud to be part of it.”

                                                    Robert Kennedy
Your ‘value case’

   Problem statement

   Work done to solve problem

   Value statement(s)
Reliability Maturity


     How to understand an organization’s
                        reliability culture
Maturity Matrix

   Handout Matrix

   Based on Quality Management Maturity Grid
    from Quality Is Free, © 1979 by Philip B.
    Crosby
Measurement Categories

   Management Understanding and Attitude
    o   Business objectives and language
    o   Attention and investments


   Reliability Status
    o   Position and stature
    o   Location and influence
Measurement Categories

   Problem Handling
    o   Proactive or Reactive

   Cost of ‘Un’ Reliability
    o   Understanding and influence of metrics
    o   Local budget or total product cost

   Feedback Process
    o   Predictions, reliability testing
    o   Failure analysis, time to detection
Measurement Categories

   DFR program status
    o   Exists separately or integrated
    o   Template or customized

   Summation of Reliability Posture
    o   How does the organization talk about reliability?
Stage I Uncertainty

   Management – blame others
   Status – hidden or doesn’t exist
   Problems – may have good fire fighting
   Cost – unknown and no influence
   Feedback – customer returns & complaints
   DFR – doesn’t exist even with designers

   Summation – “Reliability must be ok, since
    customers are buying our products.”
Stage II Awakening

   Management – important w/o resources
   Status – champion recognized
   Problems – organized fire fighting
   Cost – generally warranty only
   Feedback – disorganized, anecdotal
   DFR – trying some tools

   Summation – “We really should make more
    reliable products.”
Stage III Enlightenment

   Management – Support and encouragement
   Status – Senior staff influence
   Problems – Systematic and reactive
   Cost – Starting to track cost of un-reliability
   Feedback – ALT and modeling, root cause
   DFR – program of reliability activities

   Summation – “We can see how these tools
    help our product’s field performance.”
Stage IV Wisdom

   Management – Personally involved, leading
   Status – Senior manager, major role
   Problems – found and resolved quickly
   Cost – understanding of major drivers
   Feedback – selective testing in risk areas
   DFR – Part of how products get designed

   Summation – “We avoid most field reliability
    issues”
Stage V Certainty

   Management – Considered core capability
   Status – thought leader in company
   Problems – Only a few issues, & expected
   Cost – Accurate and decreasing
   Feedback – Testing & field support models
   DFR – Normal part of company business

   Summation – “We do get surprised by the few
    field failures that occur.”
Why do we need to know Maturity?

   Recommendations need to match the
    organization’s capabilities

   From current state build path toward the right
    one step at a time

   Value proposition for changes addresses
    management’s approach to reliability
How to determine maturity?

   Self assessment
    o   Small team from across organization
    o   Each marks blocks that describe their maturity
    o   Team determines Stage description by consensus

   Observation from within an organization
    o   As an individual trying to position changes
    o   Informally conduct self assessment
How to determine maturity?

   Assessment Interviews
    o   Conduct interviews to understand current reliability
        activities
    o   Review and summarize interviews
    o   Interpret results onto maturity matrix
   What are your questions?
Reliability Assessment


     Using a survey to quickly understand
     the organization’s reliability program
survey approach

   selecting survey topics
   interview format
   data collection
   business unit summary
   immediate follow up
   analysis
   review
   key stakeholder reporting

   choosing interviewees
    o   hw r&d manager
    o   hw r&d engineer
    o   reliability manager
    o   reliability engineer
    o   procurement
    o   manufacturing
survey form & scoring


                             DFR Methods Survey
                                         
                 Scoring:   4 = 100%, top priority, always done
                            3 = >75%, use normally, expected
                            2 = 25% - 75%, variable use
                            1 = <25%, only occasional use
                            0 = not done or discontinued
                            - = not visible, no comment
 
Management:
    Goal setting for division
    Priority of quality & reliability improvement
    Management attention & follow up (goal ownership)
 
Design:
    Documented hardware design cycle
    Goal setting by product or module
design survey topics

Design:
       Documented hardware design cycle
       Goal setting by product or module
       Priority of Q&R vs performance, cost, schedule
       Design for Reliability (DFR) training
       Preferred technology selection/standardization
       Component qualification testing
       OEM selection & testing to equal HP requirements
       Fault Tree Analysis/Rel. Block Diagrams (FTA/RBD)
       Failure/root cause analysis
       Statistically-designed engineering experiments
       Accelerated Stress/Life Testing (ALT)
       Design & derating rules
design survey topics

      Design reviews/design rule checking
      Finite Element Analysis (FEA) or simulations
      Failure rate estimation/prediction
      Thermal design & measurements
      Design tolerance analysis
      Failure Modes & Effects Analysis (FMEA)
      Environmental (design margin) testing
      Highly accelerated life testing (HALT)
      Physics of Failure analysis
      Lessons-learned database
      Design Defect Tracking (DDT)
      Ownership of quality & reliability goals
manufacturing survey topics

Manufacturing:
        Design for manufacturability (DFM)
        Priority of Q&R vs schedule & cost
        Quality training programs
        Statistical Process Control (SPC/SQC)
        Total Quality Management (TQM)
        HP process audits (written reports)
        Vendor (& OEM) process audits, TQRDCE
        Incoming inspection/sampling
        Component burn-in
        Assembly-level environmental stress screening (ESS)
        Product-level environmental stress screening (ESS)
        Defect Detection & Tracking (DD&T)
        Corrective Action Reports
        Ownership of quality & reliability goals
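One way to tally the survey, sketched under assumptions: the topic names come from the survey form, but the responses, the helper name, and the choice of a simple average per topic are hypothetical. A "-" (not visible, no comment) is skipped rather than counted as 0, so that it is not confused with "not done or discontinued".

```python
def summarize_scores(responses):
    """Average the 0-4 scores for one topic, skipping '-' (not visible, no comment)."""
    scored = [r for r in responses if r != "-"]
    if not scored:
        return None  # nobody could comment on this topic
    return sum(scored) / len(scored)


# Hypothetical responses from four interviewees per topic.
survey = {
    "Goal setting by product or module": [4, 3, "-", 1],
    "Design & derating rules": [3, 3, 2, 4],
    "Physics of Failure analysis": ["-", "-", 0, 1],
}

for topic, responses in survey.items():
    average = summarize_scores(responses)
    spread = [r for r in responses if r != "-"]
    print(f"{topic}: average {average:.2f}, range {min(spread)}-{max(spread)}")
```

A wide range within one topic is often as informative as the average, since it can point to different practices between groups.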
Aircraft Company Example

   AC, Inc. a private jet manufacturer, develops,
    manufactures, sells and provides support for
    aircraft, throughout the intended life cycle.
    The product design process is dominated by
    the ability to meet FAA certification
    requirements. This product is high cost and
    very low volume.

   Handout, AC, Inc. Survey Summary
   Determine maturity stage and make
    recommendations
AC, Inc. key points

   MTBF metrics
   Excellent field data
   Very limited sample sizes
   Reactive mode to improvement activities
AC, Inc. Recommendations

   Use Reliability rather than MTBUR (mean time between
    unscheduled removals). Establish a fully stated reliability
    goal in terms of the probability of
    components and aircraft successfully performing as
    expected under stated conditions for two or more
    defined time periods. Reliability is a metric that does
    not have a dependence on a particular lifetime
    distribution and is intuitively interpreted by engineers
    correctly. Using multiple time marks, it promotes the
    use of lifetime distributions rather than single
    parameter descriptions. Once engineers are using
    lifetime distributions, calculating confidence intervals
    is a natural extension.
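A short illustration of why a single MTBUR value is easy to misread. This sketch assumes a constant removal rate (exponential distribution) purely for illustration, and the 5,000 hour MTBUR is a hypothetical value. Under that assumption only about 37% of units survive to the MTBUR, which is why stating reliability at two or more time marks is clearer.

```python
import math


def reliability_exponential(t_hours, mtbur_hours):
    """Probability of surviving to t_hours, assuming a constant removal rate."""
    return math.exp(-t_hours / mtbur_hours)


mtbur = 5000.0  # hypothetical component MTBUR in flight hours

for t in (500.0, 2000.0, mtbur):
    r = reliability_exponential(t, mtbur)
    print(f"R({t:,.0f} h) = {r:.3f}")

r_at_mtbur = reliability_exponential(mtbur, mtbur)
```

With a different lifetime distribution the values at each time mark change even if the MTBUR stays the same, which is the argument for quoting reliability at stated times.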
AC, Inc. Recommendations

   Build and support an aircraft reliability model. Use the historical
    data, lifetime distributions (not MTBUR), RBD (reliability block
    diagramming) and simple mathematics to quickly create a basic
    reliability model. An extension of the model would be to
    incorporate the various environmental factors, flight profiles, and
    the influence of other relevant variables on failure rates. For
    example, some systems experience damaging stress during
    takeoffs and landings, others only while in flight, some only
    when landing in high temperature and humidity climates. Ideally
    for each component the model would incorporate field
    history along with environmental and component data. Even a
    very simple model that enables the design and procurement
    teams to evaluate options is well worth the effort to build and
    support. Most importantly a reliability model provides feedback
    very quickly to the design team during the design process.
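A very simple RBD-style model can be only a few lines. The following is a sketch, not AC, Inc.'s model: the subsystem names and reliability values are hypothetical, and failures are assumed independent.

```python
import math


def series(*reliabilities):
    """All blocks must work: multiply the block reliabilities."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """Redundant blocks: the group fails only if every block fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical reliabilities for one defined time period (e.g. one year of flights).
avionics = 0.995
hydraulic_pump = 0.98          # two pumps installed, either one is sufficient
landing_gear = 0.999

hydraulics = parallel(hydraulic_pump, hydraulic_pump)
aircraft = series(avionics, hydraulics, landing_gear)

print(f"Hydraulics (redundant pair): {hydraulics:.4f}")
print(f"Aircraft (series of subsystems): {aircraft:.4f}")
```

Swapping in a candidate component's reliability and re-running gives the design and procurement teams the quick feedback described above.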
AC, Inc. Recommendations

   Handout, AC, Inc. recommendations and
    matrix results

   Basic idea is to make the reliability engineer
    more valuable to the design team by building
    an aircraft reliability model.

   Value proposition: better design tradeoffs that
    include reliability.

2011 RAMS Tutorial Effective Reliability Program Traits and Management

  • 1.
    Effective Reliability Program Traits and Management Fred Schenkelberg Ops A La Carte, LLC
  • 2.
    Reliability Engineering Management Fred Schenkelberg Senior Reliability Consultant Ops A La Carte, LLC (408) 710-8248 fms@opsalacarte.com
  • 3.
    Tutorial Objectives  To outline the key traits for the effective management of a reliability program.  To make you think about how to implement reliability engineering within an organization.
  • 4.
  • 5.
    Primary Reference McGRAW-HILL, 1996 ISBN: 00701-27506
  • 6.
    Additional Reading  Practical Reliability Engineering, 4th Edition, Patrick D. T. O’Connor, 2002  Improving Product Reliability: Strategies and Implementation, Mark A. Levin and Ted T. Kalal, 2003  Quality Is Free: The Art of Making Quality Certain, Philip B. Crosby, 1979  Design Paradigms: Case Histories of Error and Judgment in Engineering, Henry Petroski, 1994
  • 7.
    HP’s Design for Reliability Story Which activities have impact?
  • 8.
    Product Development (THE OLD WAY) GOOD FAST CHEAP PICK ANY TWO!
  • 9.
    The Situation "Based on an in-depth study of HP's most successful divisions, we discovered that as much as 25% of our manufacturing assets were tied up in reacting to quality problems! "Clearly, a bold approach was needed to convince people that a problem existed and to fully engage the entire organization in solving it."
  • 10.
    The 10X Challenge "The proper place to start, we concluded, was with a startling goal - one that would get attention. The goal we chose was a tenfold reduction in the failure rates of our products during the 1980's." John Young HP CEO
  • 11.
    Dick Moss retired from HP in February 1999, as the Corporate Product Reliability Manager and winner of the CEO’s Customer Satisfaction Award. He worked at HP 39 years, the first 15 in new product development (R&D), and the last 24 in hardware quality & reliability. During that time, he presented more than 700 technical seminars to over 35,000 HP employees worldwide. He wrote or edited parts of 4 books and published numerous papers. He holds a BSEE from Princeton and an MSEE from Stanford, and has one patent.
  • 12.
    The 10X Challenge Results [Chart: normalized failure rate by fiscal year, 1981 to 1990, actual vs. 10X goal. Actual: 0.126 (8X); goal: 0.100 (10X).]
  • 13.
    Warranty Savings During 10X (Actual vs. Projected @ 1980 Rate) [Chart: annual warranty expense by fiscal year, FY80 to FY90, on a $0 to $300M scale, actual warranty cost vs. cost projected at the 1980 rate. 10-year savings: $808 million.]
  • 14.
    Design for Reliability HOW'D WE DO THAT? Commitment Management Leadership & Involvement Lengthen Warranty Period Find & Share Best Practices
  • 15.
    thoughts or questions  what are your questions? • …your comments?
  • 16.
    DFR Survey SURVEY CHECKLIST Scoring: 4 = 100%, top priority; 3 = >75%, use expected; 2 = 25 - 75%, variable use; 1 = <25%, occasional use; 0 = not done or discontinued. Management: Goal setting for division; Priority of Quality & Reliab.; Mgmnt attention & follow up. Engineering: Documented design cycle; Reliability goal budgeting; Priority of reliability improvement; DFR training programs; Preferred technology program; Component qualification testing; OEM selection & qualif. testing; Physical failure analysis; Root cause analysis; Statistical engineering experiments; Design & stress derating rules; Design reviews & checking; Failure rate estimation; Thermal design & measurements; Worst case analysis; Failure Modes & Effects Analysis; Environmental (margin) testing; Highly Accel. Stress Testing; Design defect tracking; Lessons-learned database. Manufacturing: Design for Manufacturability; Priority of Q & R goals; Ownership of Q & R goals; Quality training programs; SPC & SQC use; Internal process audits; Supplier process audits; Incoming inspection; Product burn-in; Defect Tracking; Corrective action
  • 17.
    results widespread use:  environmental test manual  product lifecycle range of use:  module goal setting  derating rules limited use:  DFR training  physics of failure analysis
  • 18.
    findings  ODM concerns: how to convey needs and get reliable products?  time to market priority: urgent versus important  management structures: many ways to organize roles  mature products & scores: when only select tools apply
  • 19.
    observations best practices worst practices  goal setting  repair & warranty  prediction invisible  statistics  lessons learned capture  golden nuggets  single owner of product  first look process reliability  multiple defect tracking systems
  • 20.
    QUESTIONS? 04/23/2002 Design For Reliability - 20 Overview.PPT
  • 21.
    Reliability Goal Setting Establish the target in an engineering meaningful manner
  • 22.
    Reliability Definition  Reliability is often considered quality over time  Reliability is the probability of a product performing its intended function over its specified period of usage, and under specified operating conditions, in a manner that meets or exceeds customer expectations.
  • 23.
    Reliability Goals & Metrics Summary  Reliability Goals & Metrics tie together all stages of the product life cycle. Well crafted goals provide the target for the business to achieve; they set the direction.  Metrics provide the milestones, the “are we there, yet”, the feedback all elements of the organization need to stay on track toward the goals.
  • 24.
    Reliability Goals & Metrics Summary  A reliability goal includes each of the four elements of the reliability definition. o Intended function o Environment (including use profile) o Duration o Probability of success o [Customer expectations]
  • 25.
    Reliability Goals & Metrics Summary  A reliability metric is often something the organization can measure on a relatively short periodic basis. o Predicted failure rate (during design phase) o Field failure rate o Warranty o Actual field return rate o Dead on Arrival rate
  • 26.
    Reliability Goal-Setting  Reliability Goals can be derived from o Customer-specified or implied requirements o Internally-specified or self-imposed requirements (usually based on trying to be better than previous products) o Benchmarking against competition
  • 27.
    Example Exercise  Elements of Product Requirements Document  Take notes to build a reliability goal statement
  • 28.
    PRD Scope This document defines the product specification for the Device A (Dev A). This specification includes a description of all electrical, mechanical, and functional aspects of the Dev A. It is intended to define the characteristics of the Dev A, but is not intended to describe a specific design implementation, which is covered in other documents. Unless otherwise specified, the tolerance of the nominal values specified herein will be taken as ± 20% at an ambient temperature of 25° C. Dev A provides demand-only flow regulation in order to conserve gas.
  • 29.
    PRD Background  The device includes a built in regulator, valve, control circuitry, and enclosure. The device will be designed to attach to a standard compressed gas cylinder.  The industrial design of the device allows the user a simple method of attachment to the cylinder and easy access to all controls, batteries, and outlet port.  A high-valuation, portable, 2 year life, dependable product will be targeted, while minimizing cost of goods to permit market flexibility.
  • 30.
    PRD Reliability Section Warranty Period The Warranty period will be decided by Marketing prior to release. The MRD currently states a 1 year warranty, however, for design purposes a two year warranty period shall be assumed.(PRD074)  Reliability Over Warranty Period The project goal is less than 2% at the end of first years production.  Maintainability The Dev A is intended to be serviced and repaired by Company A authorized service centers or authorized health care providers.
  • 31.
    PRD Reliability Section  Useful Life The useful design life of the Dev A shall be 6,000 hours based on 4 years at 4 hours use per day.(PRD077)
  • 32.
    PRD Environment Section  Operating Environment These devices shall meet all performance specifications defined herein while subject to the following environmental conditions unless otherwise specified:(PRD078) Temperature: 5 to 40° C Relative Humidity: 15 to 95% non-condensing Atmospheric Pressure: 76.7 to 102 kPa DC Supply Voltage: 4.5 to 6.5 VDC
  • 33.
    PRD Environment Section  Storage Environment These devices shall perform to all specifications after one hour at operating environment conditions after storage at the following environmental conditions :(PRD079) Temperature: -20 to 60 ° C Relative Humidity: 15 to 95% non-condensing  The Dev A and all package contents shall be stored in a sealed plastic bag away from oil and grease contaminates.(PRD080)
  • 34.
    Goal Statement exercise  In groups of two or three draft a reliability goal  Note the missing information and draft questions to get the missing information  This is a brand new product with no field history – how would you apportion the system goal to the various subsystems? (regulator, valve, control circuitry, and enclosure)
  • 35.
    Reliability Goals & Metrics Summary  A reliability metric is often something the organization can measure on a relatively short, periodic basis: o Predicted failure rate (during design phase) o Field failure rate o Warranty o Actual field return rate o Dead on Arrival rate (v5)
  • 36.
    Fully-Stated Reliability Goals  System goal at multiple points o Supporting metrics during development and field o Apportionment to appropriate level  Provide connections to overall business plan, contracts, customer expectations, and include any assumptions concerning financials  Benefit: clear target for development, vendor and production teams. (v5)
  • 37.
    Reliability Goal  Let’s say we expect a few failures in one year. Less than 2%  Laboratory environ.  XYZ function  XYZ function for one year with 98% reliability in the lab.  Assuming constant failure rate: $R(t) = e^{-t/\theta}$, so $\ln(0.98) = -8760/\theta$ (MTBF is 433,605 hrs.) (v5)
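    The MTBF on this slide follows from solving the constant failure rate reliability function for θ. A minimal sketch of that arithmetic; the function name is hypothetical, and 8760 is the number of hours in one year of continuous operation, as on the slide:

```python
import math


def mtbf_for_goal(reliability_goal, duration_hours):
    """Solve R(t) = exp(-t / theta) for theta, assuming a constant failure rate."""
    return -duration_hours / math.log(reliability_goal)


# 98% reliability over one year (8760 hours), as in the slide.
theta = mtbf_for_goal(0.98, 8760)
print(f"Required MTBF: {theta:,.0f} hours")
```

    The large MTBF is a reminder that a 98% goal at one year is not the same thing as a product that lasts 433,605 hours.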
  • 38.
    Other Points in Time  Also consider other business relevant points in time  Infant mortality, out of box type failures o Shipping damage o Component defects, manufacturing defects  Wear out related failures o Bearings, connectors, solder joints, e-caps (v5)
  • 39.
    Break Down Overall Goal  Let’s look at an example  A computer with a one year warranty and the business model requires less than 5% failures within the first year. o A desktop business computer in office environment with 95% reliability at one year. (v5)
  • 40.
    Break Down the Goal, (continued)  For simplicity consider five major elements of the computer o CPU/motherboard o Hard Disk Drive o Power Supply o Monitor o Bios, firmware  For starters, let’s give each sub-system the same goal (v5)
  • 41.
    Apportionment of Goals Computer: R = 0.95. Sub-system goals: CPU R = 0.99, HDD R = 0.99, P/S R = 0.99, Monitor R = 0.99, Bios R = 0.99. Assuming failures within each sub-system are independent, the simple multiplication of the reliabilities should result in meeting the system goal: 0.99 * 0.99 * 0.99 * 0.99 * 0.99 = 0.95. Given no history or vendor data – this is just a starting point. (v5)
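    Equal apportionment can be computed directly: with n independent sub-systems in series, each needs a reliability of the system goal raised to the power 1/n. A minimal sketch with a hypothetical function name, using the slide's numbers:

```python
def equal_apportionment(system_goal, n_subsystems):
    """Per-sub-system reliability so that n independent series elements meet the goal."""
    return system_goal ** (1.0 / n_subsystems)


required = equal_apportionment(0.95, 5)
system_with_rounded_goals = 0.99 ** 5

print(f"Exact equal share per sub-system: {required:.4f}")
print(f"System reliability with 0.99 each: {system_with_rounded_goals:.4f}")
```

    The exact equal share is slightly below 0.99, so rounding each sub-system goal up to 0.99 leaves a small margin against the 0.95 system goal.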
  • 42.
    Estimate Reliability  The next step is to determine the sub-system reliability. o Historical data from similar products o Reliability estimates/test data by vendors o In house reliability testing  At first estimates are crude, refine as needed to make good decisions. (v5)
  • 43.
    Apportionment of Goals Computer: R = 0.95. Goals: CPU R = 0.99, HDD R = 0.99, P/S R = 0.99, Monitor R = 0.99, Bios R = 0.99. Estimates: CPU R = 0.96, HDD R = 0.98, P/S R = 0.999, Monitor R = 0.99, Bios R = 0.999. First pass estimates do not meet system goal. Now what? (v5)
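    The gap on this slide can be checked by multiplying the first pass estimates and comparing each sub-system to its goal. A sketch using the slide's values; the variable names are hypothetical and independence between sub-systems is assumed, as on the previous slides:

```python
import math

system_goal = 0.95
goals = {"CPU": 0.99, "HDD": 0.99, "P/S": 0.99, "Monitor": 0.99, "Bios": 0.99}
estimates = {"CPU": 0.96, "HDD": 0.98, "P/S": 0.999, "Monitor": 0.99, "Bios": 0.999}

# Series system: the system estimate is the product of the sub-system estimates.
system_estimate = math.prod(estimates.values())

# Positive gap means the estimate falls short of the sub-system goal.
gaps = {name: goals[name] - estimates[name] for name in goals}
largest_gap = max(gaps, key=gaps.get)

print(f"System estimate: {system_estimate:.4f} (goal {system_goal})")
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"  {name}: goal {goals[name]}, estimate {estimates[name]}, gap {gap:+.3f}")
print(f"Largest shortfall: {largest_gap}")
```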
  • 44.
    Resolving the Gap  CPU goal 99% est. 96%  Largest gap, lowest estimate  First, will the known issues bridge the difference?  Second, invest in improvements that will impact the system reliability.  Third, validate improvements  Use the simple reliability model to determine if reliability improvements will impact the system reliability, e.g. changing the bios reliability from 99.9% to 99.99% will not significantly alter the system reliability result.  If not enough, then use FMEA and HALT to populate Pareto of what to fix (v5)
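    The bios remark on this slide can be checked with the same simple series model. A sketch using the first pass estimates from the earlier slide; the helper names are hypothetical, and bringing the CPU up to its 0.99 goal is used only as a comparison case:

```python
import math

estimates = {"CPU": 0.96, "HDD": 0.98, "P/S": 0.999, "Monitor": 0.99, "Bios": 0.999}


def system_reliability(subsystems):
    """Series model with independent sub-systems."""
    return math.prod(subsystems.values())


def with_change(subsystems, name, new_value):
    """System reliability if one sub-system's reliability is changed."""
    changed = dict(subsystems)
    changed[name] = new_value
    return system_reliability(changed)


base = system_reliability(estimates)
bios_improved = with_change(estimates, "Bios", 0.9999)
cpu_improved = with_change(estimates, "CPU", 0.99)

print(f"Base system estimate:      {base:.4f}")
print(f"Bios 0.999 -> 0.9999:      {bios_improved:.4f}")
print(f"CPU  0.96  -> 0.99 (goal): {cpu_improved:.4f}")
```

    The bios change moves the system result by less than 0.001, while closing the CPU gap alone lifts the system estimate above the 0.95 goal.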
  • 45.
    Resolving the Gap, (continued)  HDD goal 0.99 est. 0.98  Small gap, clear path to resolve  HDD reliability and operating temperature are related. Lowering the internal temperature the HDD experiences will improve performance.  When the relationship between the failure mode and either design or environmental conditions exists, we do not need FMEA or HALT – go straight to design improvements.  Use ALT to validate the model and/or design improvements. (v5)
  • 46.
    Resolving the Gap, (continued)  P/S goal 0.99 est. 0.999  Estimate over the goal  Further improvement not cost effective given minimal impact to system reliability.  Possible to reduce reliability (select less expensive model) and use savings to improve CPU/motherboard.  For any subsystem that exceeds the reliability goal, explore potential cost savings by reducing the reliability performance.  This is only done when there are accurate reliability estimates and significant cost savings. (v5)
  • 47.
    Progression of Estimates [Chart: upper and lower confidence in estimate across the stages Initial Eng., Vendor Data, Test Data, Actual Field Data.] (v5)
  • 48.
    Microsoft Model  Proposed Model: Get feedback to the design and manufacturing team that permits visibility of the reliability gap. Permit comparison to goal.  Microsoft Model: Not estimating or measuring the reliability during design is something I call the Microsoft model. Just ship it, the customers will tell you what needs improvement. Don’t try the Microsoft Model! (it works for them but probably won’t work for you) (v5)
  • 49.
    Reliability Goals &Metrics Summary  A reliability goal includes each of the four elements of the reliability definition. o Intended function o Environment (including use profile) o Duration o Probability of success o [Customer expectations]
  • 50.
    Reliability Philosophies Two fundamental methods to achieving high product reliability
  • 51.
    Build, Test, Fix  In any design there are a finite number of flaws.  If we find them, we can remove the flaw.  Rapid prototyping  HALT  Large field trials or ‘beta’ testing  Reliability growth modeling
  • 52.
    Analytical Approach  Develop goals  Model expected failure mechanisms  Conduct accelerated life tests  Conduct reliability demonstration tests  Routinely update system level model  Balance of simulation/testing to increase ability of reliability model to predict field performance.
  • 53.
    Issues with each approach Build, Test, Fix:  Uncertain if design is good enough  Limited prototypes means limited flaws discovered  Unable to plan for warranty or field service Analytical:  Fix mostly known flaws  ALTs take too long  RDTs take even longer  Models have large uncertainty with new technology and environments
  • 54.
    Balanced approach Goal Plan FMEA Prediction HALT RDT/ALT Verification Review
  • 58.
    Reliability Planning Selecting the minimum set of tools to achieve the reliability goals
  • 59.
    Planning Introduction  Mil Hdbk 785 task 1 “The purpose of this task is to develop a reliability program which identifies, and ties together, all program management tasks required to accomplish program requirements.”
  • 60.
    Fully Stated Reliability Goals  System goal at multiple points o Supporting metrics during development and field o Apportionment to appropriate level  Provide connections to overall business plan, contracts, customer expectations, and include any assumptions concerning financials  Benefit: clear target for development, vendor and production teams.
  • 61.
    Medicine "The abdomen, the chest, and the brain will be forever shut from the intrusion of the wise and humane surgeon" Sir John Erichsen leading British surgeon, 1873
  • 62.
    Gap Analysis  Estimate/review current reliability of system against the next project goal  The difference is the gap to close  That gap is what the plan needs to bridge
  • 63.
    Path to close gap  This is the ‘art’ of our profession and each project needs a unique solution.  Just because the plan succeeded for the last project does not mean it will work for the current one o Timelines change o Goals and risks change o Business objectives and customer expectations change o The organization has grown/lost capabilities
  • 64.
    If, small gap and clear Pareto Then,  Select issues on Pareto from past products that have the easiest cost, timeline, risk.  Engineering doesn’t need HALT or FMEA to identify or prioritize issues to resolve  Assumes a system/sub-system reliability model, even as simple as Pareto based on failure rates.  Engineers may need ALT to verify solution assumptions
  • 65.
    If, large gap and clear Pareto Then,  Same as small gap, generally  Early step is to estimate ability to close gap with reasonable business risk  If there is doubt on validity of issues to resolve, consider HALT to uncover possible new issues
  • 66.
    If, new features, new market Then,  Increase use of HALT, including on competitor’s products if possible  Increase use of environmental testing (HALT if able to afford samples and testing facilities). Find margins related to new market environment.  Use reliability growth modeling to determine if plan of record is able to meet goals
  • 67.
    If, reliant on vendor’s failure analysis Then,  Consider building internal or third party failure analysis and component expertise  Accelerate time to detection of vendor issues
  • 68.
    If, (what is your situation) When starting a project, consider the goals, constraints, etc. and look at the entire horizontal process. Then,  Let’s find a few options to consider
  • 69.
    Exercise  Identify a circumstance and an approach to building the reliability plan.  What will be the biggest challenges to implementing the plan?  Separate from the plan, what will you, as the reliability engineer, do to overcome the obstacles?
  • 70.
    Close on Planning Discussion  Introduction to Planning  Fully stated reliability goals  Constraints o Timeline o Prototype samples o Capabilities (skills and maturity)  Current state and gap to goal  Paths to close the gap o Investments o Dual paths o Tolerance for risk
  • 71.
    Television
      "People will soon get tired of staring at a plywood box every night."
      Darryl F. Zanuck, Twentieth Century-Fox, 1946
  • 72.
    Reliability Value How to speak in management’s language
  • 73.
    A Reliability Engineer’s Use of Warranty Cost Information
      Fred Schenkelberg
  • 74.
    Introduction
      • Many (most, all?) products have a warranty
      • Examples of how to use this information in your reliability engineering work
  • 75.
    Electric Light
      “Good enough for our transatlantic friends, but unworthy of the attention of practical or scientific men.”
      British Parliament report on Edison’s work, 1878
  • 76.
    Overview
      • Warranty as a percentage of revenue.
      • Warranty as a cost per unit.
      • Who owns warranty?
      • How much warranty expense is right?
      • What is the right investment to reduce warranty?
  • 77.
  • 78.
    Computers
      “There is no reason for any individual to have a computer in their home.”
      Ken Olsen, Digital Equipment Corp., 1977
  • 79.
    Reliability Specifications Example
      • Given two fan datasheets
      • Fan A has a mean time to fail of 4645 hours
      • Fan B has a mean time to fail of 300 hours
      • Both same price, etc.
      • Choose one to maximize reliability at 100 hours
  • 80.
    Reliability Specifications Example
      • Consulting an internal fan expert, you are advised to get more information
      • Fan A has a Weibull time to fail shape parameter of 0.8
      • Fan B has a Weibull time to fail shape parameter of 3.0
      • \(\mu = \theta \, \Gamma\left(1 + \frac{1}{\beta}\right)\)
  • 81.
    Reliability Specifications Example
      • Fan A has a scale parameter of 4100 hours
      • Fan B has a scale parameter of 336 hours
      • Use the Weibull Reliability function \(R(t) = e^{-(t/\theta)^{\beta}}\)
      • Fan A reliability at 100 hours is 0.95
      • Fan B reliability at 100 hours is 0.974
  • 82.
    Reliability Specifications Example
      • Given two fan datasheets
      • Fan A has a mean time to fail of 4645 hours
      • Fan B has a mean time to fail of 300 hours
      • What about later, say 1000 hours?
      • Fan A reliability at 1000 hours is 0.723
      • Fan B reliability at 1000 hours is 3.5E-12
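
The fan comparison can be reproduced with a few lines of Python. This is a minimal sketch that uses only the shape and scale parameters quoted on the slides; the function names are illustrative and not part of the original tutorial.

```python
import math


def weibull_mean(scale, shape):
    """Mean time to fail: mu = theta * Gamma(1 + 1/beta)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def weibull_reliability(t, scale, shape):
    """Probability of surviving to time t: R(t) = exp(-(t/theta)^beta)."""
    return math.exp(-((t / scale) ** shape))


# (scale theta in hours, shape beta) as quoted on the slides
fans = {"Fan A": (4100.0, 0.8), "Fan B": (336.0, 3.0)}

for name, (theta, beta) in fans.items():
    print(
        f"{name}: mean = {weibull_mean(theta, beta):.0f} h, "
        f"R(100) = {weibull_reliability(100, theta, beta):.3f}, "
        f"R(1000) = {weibull_reliability(1000, theta, beta):.3g}"
    )
```

The point of the example holds in the numbers: the fan with the much shorter mean life is slightly more reliable at 100 hours, and almost certain to have failed by 1000 hours.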
  • 83.
    The Telephone
      "That's an amazing invention, but who would ever want to use one of them?"
      Rutherford Hayes, U.S. President, 1876
  • 84.
    The Cost Reduction Example
      • Given a FET that costs 10 cents, a new procurement engineer finds a new FET vendor that only charges 5 cents.
      • Switch?
      • What else to consider?
  • 85.
    The Cost Reduction Example
      • Given a FET that costs 10 cents, a new procurement engineer finds a new FET vendor that only charges 5 cents.
      • $0.05 FET has MTBF of 50,000 hours
      • $0.10 FET has MTBF of 75,000 hours
      • 1000 hours of operation
      • Shipping 1000 units
      • Cost to repair unit $250
  • 86.
    The Cost Reduction Example
      • Total Cost of $0.10 FET
      • \(R_{0.10}(1000) = e^{-1000/75{,}000} = 0.987\)
      • # Failed = (1 - 0.987) × 1000 units = 13.25
      • Cost of Repairs = 250 × 13 = $3250
      • Total Cost = $3250 + 0.10 × 1000 = $3350
  • 87.
    The Cost Reduction Example
      • Total Cost of $0.05 FET
      • \(R_{0.05}(1000) = e^{-1000/50{,}000} = 0.98\)
      • # Failed = (1 - 0.98) × 1000 units = 20
      • Cost of Repairs = 250 × 20 = $5000
      • Total Cost = $5000 + 0.05 × 1000 = $5050
  • 88.
    The Cost Reduction Example
      • Total Cost of $0.50 FET
      • \(R_{0.50}(1000) = e^{-1000/100{,}000} = 0.99\)
      • # Failed = (1 - 0.99) × 1000 units = 10
      • Cost of Repairs = 250 × 10 = $2500
      • Total Cost = $2500 + 0.50 × 1000 = $3000
  • 89.
    The Cost Reduction Example
      • Result?

      FET Cost   MTBF          Repair Cost   Total Cost
      $0.10      75,000 hrs    $3250         $3350
      $0.05      50,000 hrs    $5000         $5050
      $0.50      100,000 hrs   $2500         $3000
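
The comparison above can be sketched in Python. It assumes, as the slides do, an exponential (constant failure rate) life model, 1000 hours of operation, 1000 units shipped and $250 per repair; the function name `total_cost` is illustrative. The sketch keeps the expected number of failures unrounded, so its totals differ slightly from the slide figures, which round failures to whole units. The ranking of the three options is unchanged.

```python
import math


def total_cost(unit_price, mtbf_hours, hours=1000, units=1000, repair_cost=250.0):
    """Purchase cost plus expected repair cost for one FET option."""
    reliability = math.exp(-hours / mtbf_hours)     # exponential model, R(t) = exp(-t/MTBF)
    expected_failures = (1.0 - reliability) * units  # left unrounded
    return expected_failures * repair_cost + unit_price * units


# unit price in dollars -> MTBF in hours, as given on the slides
options = {0.10: 75_000, 0.05: 50_000, 0.50: 100_000}

for price, mtbf in options.items():
    print(f"${price:.2f} FET ({mtbf:,} h MTBF): total cost ${total_cost(price, mtbf):,.0f}")
```

Changing `repair_cost` or `hours` shows how quickly the cheapest part stops being the cheapest choice.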
  • 90.
    Aviation
      "The popular mind often pictures gigantic flying machines speeding across the Atlantic and carrying innumerable passengers...it seems safe to say that such ideas are wholly visionary."
      Wm. Henry Pickering, Harvard astronomer, 1908
  • 91.
    Component Challenges
      • Cost driving manufacturing to low labor cost areas of the world
      • Pb-free causing redesign/reformulation
      • Outsourced design and manufacturing facilities gaining ‘commodity’ component selection
      • Other than yield, who’s watching Quality, Reliability and Warranty?
  • 92.
    Component Challenges
      • P50 formula error example
      • Cracked ceramic capacitors
  • 93.
    Component Challenges
      • Trust and verify solution
      • Build strong, technically verifiable language into purchase contracts
      • Check construction and formulation on a periodic basis
  • 94.
    Nuclear Energy
      "Nuclear powered vacuum cleaners will probably be a reality within 10 years."
      Alex Lewyt, vacuum cleaner manufacturer, 1955
  • 95.
    Where to Get More Information
      • Newsletter and seminars: http://Warrantyweek.com
      • “Warranty Cost: An Introduction”: http://quanterion.com/ReliabilityQues/V3N3.html
      • “Economics of Reliability,” Chapter 4 of Handbook of Reliability Engineering and Management, 2nd Ed., by Ireson, Coombs and Moss.
  • 96.
    Reliability Engineering Value
      How to determine ‘value add’ or ROI
  • 97.
    “All metrics are wrong, some are useful.”
  • 98.
  • 99.
    Terms
      • Value
        o An amount considered to be a suitable equivalent for something else; a fair price or return for goods or services
      • Value Add
        o The return or result of individual, team or product investment
      • Value Capture
        o Value add documentation related directly to merger
      • Warranty Reduction
        o Lower failure rates leading to fewer claims
  • 100.
    How is value requested?
      • Quarterly review: What have you done for me lately?
      • Checkpoint meeting: Are we on track to meet goals?
      • Budget: Which option provides best ROI?
      • Annual review: What is your impact?
  • 101.
  • 102.
    Warranty – The Big Picture
      “American manufacturers spent over $25 billion in 2004 honoring their product warranties, an increase of 4.8% from the levels seen in 2003. However, an incredible 63% of U.S.-based product manufacturers actually saw a decrease in their claims rates as a percentage of sales. Only 35% saw an increase and 2% saw no change, according to the latest statistics compiled by Warranty Week.”
      Eric Arnum, Warranty Week, www.warrantyweek.com, May 27th, 2005
  • 103.
  • 104.
    VALUE ADDED/ROI QUESTIONNAIRE: Savings/Impact/Benefit
      1. Risk / cost / warranty reduction:
         a. Has the work directly identified or mitigated a field-related problem?
         b. If so, estimate the probable cost of the field problem in $ (i.e., units affected × repair cost).
         c. Has the probability of field-related problems been reduced?
         d. If so, give a guide to how much and the estimated cost of avoidance (e.g., an estimated 1000 unit failures per month at $50 each, reduced by 5%).
         e. Has the work provided processes that will reduce the risk of field failures in subsequent products?
      2. TTM impact:
         a. Did the work help you meet or beat your TTM goals?
         b. Did the work identify any problems that would have impacted your TTM?
         c. Has the use of tools/techniques identified issues that would have impacted TTM?
         d. If the above are applicable, please identify the type of problems and estimate the TTM impact in days/weeks/months.
         e. What is the estimated cost of a delay in TTM?
         f. What is the opportunity in $ of additional income from an early TTM?
  • 105.
    VALUE ADDED/ROI QUESTIONNAIRE: Savings/Impact/Benefit
      3. TT Volume impact:
         a. Did the work help you accelerate or meet your Time to Volume goals?
         b. If applicable, what is the estimated $ impact of avoiding the TTV issues that were identified?
      4. Material costs:
         a. Did we avoid or save any direct product material or test equipment costs?
         b. If so, please identify type and cost.
      5. TCE:
         a. Has the work contributed to the TCE of your product?
         b. If so, identify how (e.g., estimated number of customer calls avoided).
         c. If you have a TCE cost model, what is the estimated $ impact of the identified improvement?
      6. Opportunity cost:
         a. If engineers from the business had been used to do this work, would they have been unable to do other product-related work (e.g., deliver new functions)?
      7. Indirect impact:
         a. What advantages did internal work provide over an external consultancy (e.g., time, cost, contractual issues, intellectual property, response time)?
  • 106.
    “I fall back dazzled at beholding myself all rosy red, At having, I myself, caused the sun to rise.”
      Edmond Rostand (1868-1918)
  • 107.
    VALUE ADDED/ROI QUESTIONNAIRE: Savings/Impact/Benefit
      8. Engineering effort saved:
         a. How long would it have taken your team to undertake the work provided? Take into account research time and whether you had the skills available.
         b. If you did not have the skills available, how many people would have needed to be recruited to undertake the work?
         c. How long would it take for these people to become productive?
         d. Estimate the training cost associated with new personnel.
      9. Misc:
         a. Please identify any other benefits or cost savings from using our resources.
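
Item 1 of the questionnaire asks for two simple estimates: the cost of a field problem (units affected × repair cost) and the cost avoided by a reduction in failures. A minimal sketch of that arithmetic, using the questionnaire's own example of 1000 failures per month at $50 each reduced by 5%, which works out to $2,500 per month avoided. The function names are illustrative only.

```python
def field_problem_cost(units_affected, repair_cost):
    """Probable cost of a field problem: units affected x repair cost."""
    return units_affected * repair_cost


def monthly_cost_avoided(failures_per_month, cost_per_failure, reduction_fraction):
    """Cost avoided each month when the failure count drops by the given fraction."""
    return failures_per_month * cost_per_failure * reduction_fraction


# Questionnaire example: 1000 failures per month at $50 each, reduced by 5%
avoided = monthly_cost_avoided(1000, 50.0, 0.05)
print(f"Estimated cost avoided: ${avoided:,.0f} per month")
```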
  • 108.
    “Gross national product measures neither the health of our children, the quality of their education, nor the joy of their play. It measures neither the beauty of our poetry, nor the strength of our marriages. It is indifferent to the decency of our factories and the safety of our streets alike. It measures neither our wisdom nor our learning, neither our wit nor our courage, neither our compassion or our devotion to country. It measures everything in short, except that which makes life worth living, and it can tell us everything about our country except those things which make us proud to be part of it.”
      Robert Kennedy
  • 109.
    Your ‘value case’
      • Problem statement
      • Work done to solve problem
      • Value statement(s)
  • 110.
    Reliability Maturity How to understand an organization’s reliability culture
  • 111.
    Maturity Matrix
      • Handout: Matrix
      • Based on the Quality Management Maturity Grid from Quality is Free, © 1979 by Philip B. Crosby
  • 112.
    Measurement Categories
      • Management Understanding and Attitude
        o Business objectives and language
        o Attention and investments
      • Reliability Status
        o Position and stature
        o Location and influence
  • 113.
    Measurement Categories
      • Problem Handling
        o Proactive or Reactive
      • Cost of ‘Un’ Reliability
        o Understanding and influence of metrics
        o Local budget or total product cost
      • Feedback Process
        o Predictions, reliability testing
        o Failure analysis, time to detection
  • 114.
    Measurement Categories
      • DFR program status
        o Exists separately or integrated
        o Template or customized
      • Summation of Reliability Posture
        o How does the organization talk about reliability?
  • 115.
    Stage I: Uncertainty
      • Management – blame others
      • Status – hidden or doesn’t exist
      • Problems – may have good fire fighting
      • Cost – unknown and no influence
      • Feedback – customer returns & complaints
      • DFR – doesn’t exist even with designers
      • Summation – “Reliability must be ok, since customers are buying our products.”
  • 116.
    Stage II: Awakening
      • Management – important w/o resources
      • Status – champion recognized
      • Problems – organized fire fighting
      • Cost – generally warranty only
      • Feedback – disorganized, anecdotal
      • DFR – trying some tools
      • Summation – “We really should make more reliable products.”
  • 117.
    Stage III: Enlightenment
      • Management – Support and encouragement
      • Status – Senior staff influence
      • Problems – Systematic and reactive
      • Cost – Starting to track cost of un-reliability
      • Feedback – ALT and modeling, root cause
      • DFR – program of reliability activities
      • Summation – “We can see how these tools help our product’s field performance.”
  • 118.
    Stage IV: Wisdom
      • Management – Personally involved, leading
      • Status – Senior manager, major role
      • Problems – found and resolved quickly
      • Cost – understanding of major drivers
      • Feedback – selective testing in risk areas
      • DFR – Part of how products get designed
      • Summation – “We avoid most field reliability issues.”
  • 119.
    Stage V: Certainty
      • Management – Considered core capability
      • Status – thought leader in company
      • Problems – Only a few issues, & expected
      • Cost – Accurate and decreasing
      • Feedback – Testing & field support models
      • DFR – Normal part of company business
      • Summation – “We do get surprised by the few field failures that occur.”
  • 120.
    Why do we need to know Maturity?
      • Recommendations need to match the organization’s capabilities
      • From the current state, build a path toward the right of the matrix, one step at a time
      • Value propositions for changes address management’s approach to reliability
  • 121.
    How to determine maturity?
      • Self assessment
        o Small team from across organization
        o Each marks blocks that describe their maturity
        o Team determines Stage description by consensus
      • Observation from within an organization
        o As an individual trying to position changes
        o Informally conduct self assessment
  • 122.
    How to determine maturity?
      • Assessment Interviews
        o Conduct interviews to understand current reliability activities
        o Review and summarize interviews
        o Interpret results onto maturity matrix
  • 123.
    What are your questions?
  • 124.
    Reliability Assessment Using a survey to quickly understand the organization’s reliability program
  • 125.
    survey approach
      • selecting survey topics
      • interview format
      • data collection
      • business unit summary
      • immediate follow up
      • analysis
      • review
      • key stakeholder reporting
    choosing interviewees
      • hw r&d manager
      • hw r&d engineer
      • reliability manager
      • reliability engineer
      • procurement
      • manufacturing
  • 126.
    survey form & scoring
      DFR Methods Survey
      Scoring:
        4 = 100%, top priority, always done
        3 = >75%, use normally, expected
        2 = 25% - 75%, variable use
        1 = <25%, only occasional use
        0 = not done or discontinued
        - = not visible, no comment
      Management:
        • Goal setting for division
        • Priority of quality & reliability improvement
        • Management attention & follow up (goal ownership)
      Design:
        • Documented hardware design cycle
        • Goal setting by product or module
  • 127.
    design survey topics
      Design:
        • Documented hardware design cycle
        • Goal setting by product or module
        • Priority of Q&R vs performance, cost, schedule
        • Design for Reliability (DFR) training
        • Preferred technology selection/standardization
        • Component qualification testing
        • OEM selection & testing to equal HP requirements
        • Fault Tree Analysis/Rel. Block Diagrams (FTA/RBD)
        • Failure/root cause analysis
        • Statistically-designed engineering experiments
        • Accelerated Stress/Life Testing (ALT)
        • Design & derating rules
  • 128.
    design survey topics
        • Design reviews/design rule checking
        • Finite Element Analysis (FEA) or simulations
        • Failure rate estimation/prediction
        • Thermal design & measurements
        • Design tolerance analysis
        • Failure Modes & Effects Analysis (FMEA)
        • Environmental (design margin) testing
        • Highly accelerated life testing (HALT)
        • Physics of Failure analysis
        • Lessons-learned database
        • Design Defect Tracking (DDT)
        • Ownership of quality & reliability goals
  • 129.
    manufacturing survey topics
      Manufacturing:
        • Design for manufacturability (DFM)
        • Priority of Q&R vs schedule & cost
        • Quality training programs
        • Statistical Process Control (SPC/SQC)
        • Total Quality Management (TQM)
        • HP process audits (written reports)
        • Vendor (& OEM) process audits, TQRDCE
        • Incoming inspection/sampling
        • Component burn-in
        • Assembly-level environmental stress screening (ESS)
        • Product-level environmental stress screening (ESS)
        • Defect Detection & Tracking (DD&T)
        • Corrective Action Reports
        • Ownership of quality & reliability goals
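
One possible way to summarize survey answers is sketched below. It assumes each topic is scored on the 0 to 4 scale from the survey form, that "-" (not visible, no comment) is excluded from the average, and that a simple mean per section is a useful summary. The averaging rule and the sample answers are illustrative assumptions, not part of the original survey method.

```python
from statistics import mean


def section_average(scores):
    """Mean of the numeric scores (0-4); '-' entries are skipped.

    Returns None when no topic in the section received a numeric score.
    """
    numeric = [s for s in scores if s != "-"]
    return mean(numeric) if numeric else None


# Hypothetical answers for illustration only
responses = {
    "Management": [3, 2, 2],
    "Design": [4, 1, "-", 0],
    "Manufacturing": ["-", "-"],
}

for section, scores in responses.items():
    avg = section_average(scores)
    label = "no visible data" if avg is None else f"{avg:.1f} of 4"
    print(f"{section}: {label}")
```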
  • 130.
    Aircraft Company Example
      • AC, Inc., a private jet manufacturer, develops, manufactures, sells and provides support for aircraft throughout the intended life cycle. The product design process is dominated by the ability to meet FAA certification requirements. This product is high cost and very low volume.
      • Handout: AC, Inc. Survey Summary
      • Determine maturity stage and make recommendations
  • 131.
    AC, Inc. key points
      • MTBF metrics
      • Excellent field data
      • Very limited sample sizes
      • Reactive mode to improvement activities
  • 132.
    AC, Inc. Recommendations
      • Use Reliability rather than MTBUR. Establish a fully stated reliability goal in terms of the probability of components and aircraft successfully performing as expected under stated conditions for two or more defined time periods. Reliability is a metric that does not depend on a particular lifetime distribution, and engineers intuitively interpret it correctly. Using multiple time marks promotes the use of lifetime distributions rather than single parameter descriptions. Once engineers are using lifetime distributions, calculating confidence intervals is a natural extension.
  • 133.
    AC, Inc. Recommendations
      • Build and support an aircraft reliability model. Use the historical data, lifetime distributions (not MTBUR), RBD (reliability block diagramming) and simple mathematics to quickly create a basic reliability model. An extension of the model would be to incorporate the various environmental factors, flight profiles, and the influence of other relevant variables on failure rates. For example, some systems experience damaging stress during takeoffs and landings, others only while in flight, and some only when landing in high temperature and humidity climates. Ideally, for each component the model would incorporate field history along with environmental and component data. Even a very simple model that enables the design and procurement teams to evaluate options is well worth the effort to build and support. Most importantly, a reliability model provides feedback very quickly to the design team during the design process.
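
A very simple model of the kind recommended above might look like the following sketch. It assumes each subsystem's life follows a Weibull distribution, that blocks in series must all work, and that a redundant pair works if either unit works. The subsystem names and parameter values are hypothetical placeholders, not AC, Inc. data.

```python
import math


def weibull_reliability(t, scale, shape):
    """R(t) = exp(-(t/theta)^beta) for one component."""
    return math.exp(-((t / scale) ** shape))


def series(*reliabilities):
    """All blocks must work: multiply the block reliabilities."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """At least one block must work: 1 minus the product of unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical subsystems: (Weibull scale in flight hours, shape)
SUBSYSTEMS = {
    "avionics": (40_000.0, 1.0),
    "landing_gear": (25_000.0, 2.5),
    "hydraulic_pump": (15_000.0, 1.8),  # installed as a redundant pair
}


def aircraft_reliability(t):
    """Series RBD of avionics, landing gear and a redundant hydraulic pump pair."""
    r = {name: weibull_reliability(t, *params) for name, params in SUBSYSTEMS.items()}
    pumps = parallel(r["hydraulic_pump"], r["hydraulic_pump"])
    return series(r["avionics"], r["landing_gear"], pumps)


for hours in (1_000, 5_000):
    print(f"R({hours} h) = {aircraft_reliability(hours):.4f}")
```

Evaluating the model at two or more time marks matches the fully stated reliability goal format, and swapping one subsystem's parameters gives the design or procurement team a quick comparison of options.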
  • 134.
    AC, Inc. Recommendations
      • Handout: AC, Inc. recommendations and matrix results
      • Basic idea is to make the reliability engineer more valuable to the design team by building an aircraft reliability model.
      • Value proposition: better design tradeoffs that include reliability.