
overview of reliability engineering



This seminar session provides an overview of the major aspects of reliability engineering, including a general introduction to reliability engineering (definition of reliability, function of reliability engineering, a brief history of reliability, etc.), reliability basics (metrics used in reliability, commonly used probability distributions in reliability, the bathtub curve, reliability demonstration test planning, confidence intervals, application of Bayesian statistics in reliability, strength-stress interference theory, etc.), accelerated life testing (ALT) (types of ALT, Arrhenius model, inverse power law model, Eyring model, temperature-humidity model, etc.), reliability growth (reliability-based growth models, MTBF-based growth models, etc.), systems reliability and availability (reliability block diagrams, non-repairable and repairable systems, reliability modeling of series, parallel, standby, and complex systems, load-sharing reliability, reliability allocation, system availability, Monte Carlo simulation, etc.), and degradation-based reliability (an introduction to degradation-based reliability, the difference between traditional and degradation-based reliability, etc.).



  1. Overview of Reliability Engineering. Dr. Wei Huang. ©2012 ASQ & Presentation Sun. Presented live on Aug 19th, 2012. Calendar/Webinars ‐_Chinese/Webinars_‐_Chinese.html
  2. ASQ Reliability Division Chinese Webinar Series: one of the monthly webinars on topics of interest to reliability engineers. To view the recorded webinar (available to ASQ Reliability Division members only), visit ) /. To sign up for the free live webinars, which are open to anyone, visit and select English Webinars to find links to register for upcoming events. Calendar/Webinars ‐_Chinese/Webinars_‐_Chinese.html
  3. Introduction to Reliability Engineering. Presented by Huang, Wei, August 18, 2012
  4. 1. Introduction
  5. What Is Reliability? From The Oxford Essential Dictionary of the U.S. Military (Oxford University Press, Inc., 2002): Reliability – the ability of an item to perform a required function under stated conditions for a specified period of time. From the McGraw-Hill Dictionary of Scientific and Technical Terms (McGraw-Hill Companies, Inc., 2003): Reliability – the probability that a component part, equipment, or system will satisfactorily perform its intended function under given circumstances, such as environmental conditions, limitations as to operating time, and frequency and thoroughness of maintenance, for a specified period of time. This second definition is the one most commonly used in reliability engineering textbooks.
  6. Function of Reliability Engineering: Ensure that designs meet product reliability requirements. Verify that a product will function reliably over its mission lifetime. Identify design discrepancies and resolve them. Evaluate potential failure modes and their effects on the mission, then provide guidance on corrective actions. Recommend design configurations for redundancy. Establish a cost-effective test plan, based on the reliability goal, to determine sample size and test duration. Assess product failure probability at the mission lifetime. Predict systems reliability and availability.
  7. Costs due to Unreliability: In April 1986, during a poorly conducted safety test on a reactor with serious design flaws, the Chernobyl nuclear power plant in Ukraine (then part of the Soviet Union) released a huge amount of radiation into the environment, causing the worst nuclear accident in history; about 30 plant workers and emergency responders died from the explosion or from acute radiation sickness within months. In November 2001, after its vertical tail fin separated from the fuselage, American Airlines Flight 587 crashed into a New York City neighborhood and killed 265 people, including all passengers and crew members on board and several people on the ground. In August 2003, the Northeastern and Midwestern United States and Ontario, Canada, experienced a widespread power outage, due to a lack of good reliability design in the power transmission grid, leaving an estimated 45 million people in eight U.S. states and 10 million people in Ontario without power.
  8. A Brief History of Reliability Engineering*: In 1941, Robert Lusser, who led the German V-1 missile test program, first recognized the need for a separate discipline of reliability engineering. In 1950, the US Department of Defense (DoD) established the Ad Hoc Group on Reliability. In 1951, the Secretary of Defense, General George C. Marshall, ordered all DoD agencies to increase their emphasis on the reliability of military electronic equipment. In 1955, the Institute of Radio Engineers (IRE), a predecessor of the Institute of Electrical and Electronics Engineers (IEEE), initiated the world's first Reliability & Quality Control Society. In 1960, the US Naval Postgraduate School became the first institution to teach reliability engineering courses in the US. In 1962, the first annual Reliability and Maintainability (RAM) Conference was held in the US. In 1963, the University of Arizona, with support from the National Science Foundation, became the first national research university to establish a reliability engineering program in the US. (*Source: Dimitri Kececioglu, Reliability Engineering Handbook, Vol. 1, PTR Prentice Hall, 1991.)
  9. Difference between Reliability & Quality: Reliability deals with the behavior of the failure rate over a long period of operation, while quality control deals with the percentage of defectives, based on performance specifications, at a certain point in time. Reliability deals with all periods of a product's existence, with prime emphasis on the design stage, while quality control deals primarily with the manufacturing stage. Reliability and quality control use different statistical tools for evaluation. [Figure: performance measurement against LSL, Target, and USL; percent defective and reliability (0% to 100%) versus time.]
  10. 2. Reliability Basics
  11. Metrics in Reliability Engineering: Reliability ($R$), or probability of success ($P_s$). Failure probability ($P_f = 1 - R$), equal to the cumulative distribution function (cdf) of a lifetime distribution: $F(t) = \int_0^t f(x)\,dx$, where $f$ is the probability density function (pdf). Failure (or hazard) rate: $\lambda(t) = f(t)/R(t)$. Mean time to failure: $\mathrm{MTTF} = \int_0^\infty x\,f(x)\,dx = \int_0^\infty R(x)\,dx$. Mean time between failures (MTBF). System availability ($A$).
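The identity $\mathrm{MTTF} = \int_0^\infty R(x)\,dx$ can be checked numerically. The sketch below is a minimal illustration assuming a Weibull life distribution with hypothetical parameters $\beta = 2$ and $\eta = 1000$ h (not taken from the slides); it integrates $R(t)$ with the trapezoidal rule and compares the result with the known closed-form Weibull mean $\eta\,\Gamma(1 + 1/\beta)$.

```python
import math

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Weibull reliability R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def mttf_by_integration(reliability, upper: float, steps: int = 100_000) -> float:
    """Approximate MTTF = integral of R(t) from 0 to infinity.

    Uses the trapezoidal rule on [0, upper]; `upper` must be large enough
    that R(upper) is negligible.
    """
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for i in range(1, steps):
        total += reliability(i * h)
    return total * h

# Hypothetical Weibull parameters (hours), chosen only for illustration.
beta, eta = 2.0, 1000.0
numeric = mttf_by_integration(lambda t: weibull_reliability(t, beta, eta), upper=10 * eta)
closed_form = eta * math.gamma(1 + 1 / beta)  # Weibull mean: eta * Gamma(1 + 1/beta)
print(f"MTTF by integrating R(t): {numeric:.2f} h; closed form: {closed_form:.2f} h")
```

The same integration works for any reliability function, which is useful when no closed-form mean is available.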
  12. Commonly Used Probability Distributions:

      | Distribution | Variable | Application |
      |---|---|---|
      | Exponential | Continuous variable (time-to-failure) | Commonly used for electronic parts/assemblies with constant failure rates. |
      | Weibull | Continuous variable (time-to-failure) | Versatile; suitable for almost any application. |
      | Lognormal | Continuous variable (time-to-failure) | Mostly used for products subject to wear-out. |
      | Chi-square (χ²) | Continuous variable | Calculating confidence bounds of a constant failure rate estimate; also used for two-sample comparison, goodness-of-fit tests, etc. |
      | Binomial | Discrete variable with repeated tests | Estimating the probability of success from binary outcomes; also used for sampling plans. |
      | F | Continuous variable | Calculating confidence bounds of a probability of success; also used for two-sample comparison. |
  13. Bathtub Curve: The bathtub curve describes a particular form of the failure (hazard) rate function, which comprises three parts: early failure, random failure, and wear-out failure. Military specifications require that, for life-critical or system-critical applications, the infant mortality section be burned out or removed, as this greatly reduces the possibility of the system failing early in its life.
  14. Exponential Distribution: Most commonly used for electronic parts or assemblies after burn-in. The failure rate is constant, so the distribution applies only to the random-failure region. MIL-HDBK-217 provides failure rate data for electronic parts as a function of electrical stresses and temperature. The probability density function (pdf) is $f(t) = \lambda e^{-\lambda t}$, with $\mathrm{MTBF} = 1/\lambda$. [Figure: constant failure rate versus time.]
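A short sketch of the exponential formulas, using a hypothetical failure rate of $2 \times 10^{-5}$ per hour and a hypothetical one-year mission (neither is from the slides). It also shows a consequence of $\mathrm{MTBF} = 1/\lambda$ that is easy to overlook: only about 36.8% of units survive to $t = \mathrm{MTBF}$, because $R(\mathrm{MTBF}) = e^{-1}$.

```python
import math

def exponential_reliability(t: float, failure_rate: float) -> float:
    """Exponential reliability R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)

failure_rate = 2e-5              # hypothetical constant failure rate (failures per hour)
mtbf = 1.0 / failure_rate        # MTBF = 1/lambda
mission = 8760.0                 # hypothetical mission: one year of continuous operation

print(f"MTBF = {mtbf:,.0f} h")
print(f"R(1 year) = {exponential_reliability(mission, failure_rate):.4f}")
# R(MTBF) = exp(-1): only about 36.8% of units survive to t = MTBF.
print(f"R(MTBF) = {exponential_reliability(mtbf, failure_rate):.4f}")
```

Because the failure rate is constant, the exponential distribution is memoryless: $R(t + s)/R(s) = R(t)$, so a unit that has already survived $s$ hours is as good as new.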
  15. Weibull Distribution: Named after Swedish scientist Waloddi Weibull. The most commonly used probability distribution for life data analysis. Its failure rate can cover the whole scope of the bathtub curve. The probability density function (pdf) is $f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-(t/\eta)^{\beta}}$, where $\beta$ is the shape parameter and $\eta$ the scale parameter. [Figure: failure rate versus time for β < 1.0 (decreasing), β = 1.0 (constant), and β > 1.0 (increasing).]
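The claim that the Weibull failure rate can cover the whole bathtub curve follows from its hazard function $h(t) = f(t)/R(t) = (\beta/\eta)(t/\eta)^{\beta-1}$. The sketch below evaluates it for three shape values; $\eta = 1000$ h and the time points are hypothetical.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull failure (hazard) rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def hazard_trend(beta: float, eta: float, times=(100.0, 500.0, 1000.0)) -> str:
    """Classify the failure-rate trend over the given (hypothetical) time points."""
    first = weibull_hazard(times[0], beta, eta)
    last = weibull_hazard(times[-1], beta, eta)
    if first > last:
        return "decreasing (early failures)"
    if first < last:
        return "increasing (wear-out)"
    return "constant (random failures)"

eta = 1000.0  # hypothetical scale parameter, hours
for beta in (0.5, 1.0, 3.0):
    print(f"beta = {beta}: {hazard_trend(beta, eta)}")
```

With $\beta = 1$ the Weibull reduces to the exponential distribution with $\lambda = 1/\eta$.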
  16. Lognormal Distribution: Initially introduced for mechanical fatigue data analysis; also used for the long-term rate of return on a stock investment. Its failure rate covers both early failure and wear-out failure, but not random failure. The probability density function (pdf) is $f(t) = \frac{1}{\sigma_x t \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln t - \mu_x}{\sigma_x}\right)^2\right]$, where $x = \ln t$ is normally distributed with mean $\mu_x$ and standard deviation $\sigma_x$. [Figure: failure rate versus time for σ < 1.0 and σ > 1.0.]
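Because $x = \ln t$ is normal, lognormal reliability can be computed with the standard normal cdf $\Phi$: $R(t) = 1 - \Phi\left((\ln t - \mu_x)/\sigma_x\right)$. This minimal sketch uses Python's `statistics.NormalDist`; the parameters (median life 5000 h, $\sigma_x = 0.8$) are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_pdf(t: float, mu_x: float, sigma_x: float) -> float:
    """f(t) = exp(-0.5 * z**2) / (sigma_x * t * sqrt(2*pi)), z = (ln t - mu_x) / sigma_x."""
    z = (math.log(t) - mu_x) / sigma_x
    return math.exp(-0.5 * z * z) / (sigma_x * t * math.sqrt(2 * math.pi))

def lognormal_reliability(t: float, mu_x: float, sigma_x: float) -> float:
    """R(t) = 1 - Phi((ln t - mu_x) / sigma_x), with Phi the standard normal cdf."""
    return 1.0 - NormalDist().cdf((math.log(t) - mu_x) / sigma_x)

def lognormal_hazard(t: float, mu_x: float, sigma_x: float) -> float:
    """Failure rate h(t) = f(t) / R(t)."""
    return lognormal_pdf(t, mu_x, sigma_x) / lognormal_reliability(t, mu_x, sigma_x)

# Hypothetical parameters: median life exp(mu_x) = 5000 h, sigma_x = 0.8.
mu_x, sigma_x = math.log(5000.0), 0.8
print(f"R(median) = {lognormal_reliability(5000.0, mu_x, sigma_x):.3f}")  # 0.500
for t in (100.0, 1000.0, 5000.0, 20000.0):
    print(f"h({t:.0f} h) = {lognormal_hazard(t, mu_x, sigma_x):.2e} per hour")
```

The median life $e^{\mu_x}$ is the usual lognormal life characteristic, since exactly half of the units have failed by then.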
  17. Other Distributions: In addition to the three distributions described above, other distributions are occasionally used for life data analysis: mixed Weibull (competing failure modes), normal, extreme value, logistic, gamma, and Gumbel.
  18. How To Determine A Lifetime Distribution? From industry standards or common practices: for example, the exponential distribution is usually used for electronic parts because of its wide acceptance in the electronics industry. From experience or historical data: for example, based on long-term field data, a typical computer hard disk drive lifetime follows a Weibull distribution with β < 1. From reliability life testing: the most common situation in reliability engineering. Test data can come in many different types (e.g., complete, left-censored, right-censored, interval, and grouped data).
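When the distribution comes from life testing, one common way to estimate Weibull parameters is median rank regression. The sketch below is a minimal version for complete (uncensored) data only, using Benard's median-rank approximation; censored, interval, or grouped data would need adjusted ranks or maximum likelihood instead. The self-check generates hypothetical failure times from $\beta = 1.5$, $\eta = 2000$ h and recovers them.

```python
import math

def median_ranks(n: int) -> list[float]:
    """Benard's approximation of median ranks: F_i = (i - 0.3) / (n + 0.4)."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def fit_weibull_rank_regression(times: list[float]) -> tuple[float, float]:
    """Estimate Weibull (beta, eta) from complete failure data.

    Linearizes the cdf: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta),
    then fits a least-squares line of y = ln(-ln(1 - F)) on x = ln(t).
    """
    t_sorted = sorted(times)
    n = len(t_sorted)
    xs = [math.log(t) for t in t_sorted]
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks(n)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                      # slope of the linearized cdf
    eta = math.exp(x_bar - y_bar / beta)  # from y_bar = beta * (x_bar - ln(eta))
    return beta, eta

# Self-check: hypothetical failure times placed exactly at Weibull(1.5, 2000 h) quantiles.
true_beta, true_eta = 1.5, 2000.0
synthetic = [true_eta * (-math.log(1.0 - f)) ** (1.0 / true_beta) for f in median_ranks(10)]
beta_hat, eta_hat = fit_weibull_rank_regression(synthetic)
print(f"beta = {beta_hat:.3f}, eta = {eta_hat:.1f} h")
```

With real test data the points scatter around the line, and a β estimate below 1 would point to early failures, as in the hard disk drive example.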
  19. Confidence Interval: A confidence interval (CI) is an interval estimate of a parameter, used in statistics to indicate how reliable an estimate is. Since reliability models are often built from reliability life test data, every estimated quantity needs a CI.
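One place where confidence bounds directly drive test planning is the zero-failure (success-run) demonstration test based on the binomial distribution: if $n$ units each complete the mission-length test without failure, the one-sided lower bound on reliability at confidence $C$ is $(1 - C)^{1/n}$, so demonstrating $R$ requires $n = \ln(1 - C)/\ln R$ units. This is a minimal sketch of that standard relation; the target values are only examples.

```python
import math

def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Units that must each pass with zero failures to demonstrate `reliability`
    at `confidence`, from the binomial success-run relation C = 1 - R**n."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

def lower_reliability_bound(n_passed: int, confidence: float) -> float:
    """One-sided lower confidence bound on R after n_passed successes and no failures."""
    return (1.0 - confidence) ** (1.0 / n_passed)

print(zero_failure_sample_size(0.90, 0.90))  # 22 units
print(zero_failure_sample_size(0.95, 0.90))  # 45 units
print(round(lower_reliability_bound(22, 0.90), 4))
```

The rounding up matters: with 21 units the lower bound falls just below 0.90, so the 90% reliability claim would not be demonstrated.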
  20. 3. Accelerated Life Testing
  21. What is Accelerated Life Testing (ALT)? The concept of ALT was introduced in the 1960s. Dr. Wayne Nelson played a key role in laying its foundation while he worked at GE Corporate Research & Development. The driving force behind accelerated life testing came from the electronics industry, where product lifetimes are so long that it would be difficult, if not impossible, to observe any failure in an affordable period of life testing. ALT aims to force test units to fail more quickly than they would under normal use conditions; in other words, ALT accelerates the failure of test units.
  22. Qualitative ALT: The goal of qualitative ALT is to obtain failure information, such as failure modes, failure effects, and environmental stress limits; it is not designed to yield life data. A typical example of qualitative ALT is HALT (highly accelerated life testing). Sample sizes are usually small. Test units are subjected to a single level or multiple levels of a stress, quite often time-varying stresses (e.g., temperature cycling from cold to hot to observe thermal fatigue). Qualitative ALT is primarily used to reveal potential design flaws that affect product reliability.
  23. Quantitative ALT: The goal of quantitative ALT is to obtain life data. Acceleration is achieved by overstress acceleration or usage-rate acceleration. In most cases, the term “accelerated life testing (ALT)” means quantitative ALT by overstress acceleration. The sample size cannot be small; it is usually determined by a sampling plan. Each batch of test samples is subjected to a single level of a stress or of combined stresses. Typical stresses include temperature, humidity, voltage, current, pressure, vibration, etc.
  24. ALT Data Analysis: The characteristic of a lifetime distribution (e.g., mean, median, Weibull scale parameter) depends on the level of stress. However, researchers have found that the shape parameter (e.g., Weibull shape parameter β, lognormal standard deviation σx) does not vary from one stress level to another unless the failure mode changes. Typical life characteristics for the three most common lifetime distributions (exponential, Weibull, and lognormal) are listed below.

      | Distribution | Distribution Parameter(s) | Life Characteristic |
      |---|---|---|
      | Exponential | λ | MTBF (= 1/λ) |
      | Weibull | β, η | η |
      | Lognormal | σx, μx | Median |
  25. Example of ALT Data Analysis: The following example presents time-to-failure data for the insulation of electric motors. Tests were conducted at four elevated temperature levels (110, 130, 150, and 170 °C) to speed up insulation deterioration. The use condition for the motors is 80 °C.
  26. Arrhenius Model: The Arrhenius model is the best-known life-stress relationship in ALT for thermal stress (i.e., temperature). It is derived from the Arrhenius reaction rate equation proposed by Swedish scientist Svante August Arrhenius in 1889. $AF = \frac{CL_u}{CL_a} = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_u} - \frac{1}{T_a}\right)\right]$, where AF is the acceleration factor, $CL_u$ and $CL_a$ the life characteristic at the use and accelerated conditions, $E_a$ the activation energy, $k$ Boltzmann's constant, $T_u$ the absolute temperature (in kelvin) at the use condition, and $T_a$ the absolute temperature at the accelerated test condition. (Note: The activation energy is the energy that a molecule must have in order to participate in a chemical reaction; in other words, the activation energy is a measure of the effect that temperature has on the reaction.)
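A minimal sketch of the Arrhenius acceleration factor applied to the motor insulation example (use condition 80 °C, test levels 110 to 170 °C). The activation energy of 0.7 eV and the Weibull scale at 150 °C are hypothetical values chosen for illustration; in practice $E_a$ is estimated from the ALT data. Following the common-shape assumption, only the scale parameter is multiplied by AF.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant, eV/K

def arrhenius_af(ea_ev: float, t_use_c: float, t_test_c: float) -> float:
    """AF = CL_u / CL_a = exp[(Ea/k) * (1/T_u - 1/T_a)], with T converted to kelvin."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_test_k))

# Use condition from the example: 80 °C. Ea = 0.7 eV is a hypothetical value.
for t_test in (110.0, 130.0, 150.0, 170.0):
    print(f"{t_test:.0f} °C: AF = {arrhenius_af(0.7, 80.0, t_test):.1f}")

# Common Weibull shape across stress levels: only the scale eta is stretched by AF.
eta_test = 2000.0                                   # hypothetical eta at 150 °C, hours
eta_use = arrhenius_af(0.7, 80.0, 150.0) * eta_test
print(f"eta at 80 °C = {eta_use:,.0f} h")
```

Forgetting the kelvin conversion is a common mistake; with Celsius values the factor would be badly wrong.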
  27. Inverse Power Law Model: Developed from the Coffin-Manson equation for low-cycle thermal fatigue lifetime analysis, which states that the number of cycles to failure is proportional to an inverse power of the temperature range of the cycling. It is also used for other non-thermal stresses (current, voltage, vibration, etc.). $AF = \frac{CL_u}{CL_a} = \left(\frac{S_a}{S_u}\right)^n$, where $n$ is the model exponent, to be determined, $S_u$ the stress at the use condition, and $S_a$ the stress at the accelerated test condition.
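The sketch below computes the inverse power law AF and shows one simple way the exponent $n$ can be determined: if life follows $L \propto S^{-n}$, the life characteristics at two stress levels give $n = \ln(L_1/L_2)/\ln(S_2/S_1)$. All numbers are hypothetical; with more than two stress levels, a regression of $\ln L$ on $\ln S$ would be used instead.

```python
import math

def ipl_af(s_use: float, s_test: float, n: float) -> float:
    """Inverse power law: AF = CL_u / CL_a = (S_a / S_u) ** n."""
    return (s_test / s_use) ** n

def ipl_exponent(s1: float, life1: float, s2: float, life2: float) -> float:
    """Estimate n from life characteristics observed at two stress levels.

    Life proportional to S**(-n) implies life1 / life2 = (s2 / s1) ** n.
    """
    return math.log(life1 / life2) / math.log(s2 / s1)

# Hypothetical numbers: use voltage 12 V, test voltage 18 V, n = 3.
print(ipl_af(12.0, 18.0, 3.0))  # (18/12)**3 = 3.375
```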
  28. Eyring Model: The Eyring model was originally developed from quantum mechanics for thermal stresses. In general, for thermal stresses, the Eyring and Arrhenius models yield very close results, but the Eyring model can also be used for humidity stress. $AF = \frac{CL_u}{CL_a} = \frac{S_a}{S_u} \exp\left[b\left(\frac{1}{S_u} - \frac{1}{S_a}\right)\right]$, where $b$ is the model parameter, to be determined, $S_u$ the stress at the use condition, and $S_a$ the stress at the accelerated test condition.
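Comparing the Eyring AF with the Arrhenius-form exponential term makes the relationship between the two models concrete: for the same $b$, they differ only by the pre-factor $S_a/S_u$. This is a sketch with hypothetical values (80 °C use, 150 °C test, $b = 8000$ K); when each model is fitted to the same data, $b$ adjusts, which helps explain why the two often give close results.

```python
import math

def eyring_af(s_use: float, s_test: float, b: float) -> float:
    """Eyring: AF = CL_u / CL_a = (S_a / S_u) * exp[b * (1/S_u - 1/S_a)].

    For thermal stress, S is absolute temperature in kelvin.
    """
    return (s_test / s_use) * math.exp(b * (1.0 / s_use - 1.0 / s_test))

def arrhenius_form_af(s_use: float, s_test: float, b: float) -> float:
    """Same exponential term without the S_a/S_u pre-factor (Arrhenius form, b = Ea/k)."""
    return math.exp(b * (1.0 / s_use - 1.0 / s_test))

# Hypothetical values: 80 °C use, 150 °C test, b = 8000 K.
t_use, t_test, b = 353.15, 423.15, 8000.0
ratio = eyring_af(t_use, t_test, b) / arrhenius_form_af(t_use, t_test, b)
print(f"Eyring / Arrhenius-form AF with the same b: {ratio:.3f}")  # equals T_a / T_u
```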
  29. Temperature-Humidity Model: The temperature-humidity (T-H) model is a variation of the Eyring model used when both temperature and humidity stresses are involved. $AF = \frac{CL_u}{CL_a} = \exp\left[b\left(\frac{1}{T_u} - \frac{1}{T_a}\right) + a\left(\frac{1}{RH_u} - \frac{1}{RH_a}\right)\right]$, where $a$ and $b$ are model parameters, to be determined, $T_u$ and $RH_u$ the temperature and relative humidity at the use condition, and $T_a$ and $RH_a$ the temperature and relative humidity at the accelerated test condition.
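Because the T-H exponent is a sum, the overall AF factors into a temperature part times a humidity part, which makes it easy to see how much each stress contributes. The sketch assumes hypothetical parameters ($b = 5000$ K, $a = 0.5$ with RH expressed as a fraction) and hypothetical conditions (30 °C / 60% RH use, 85 °C / 85% RH test); none of these values come from the slides.

```python
import math

def th_af(t_use_k: float, rh_use: float, t_test_k: float, rh_test: float,
          a: float, b: float) -> float:
    """T-H model: AF = exp[b*(1/T_u - 1/T_a) + a*(1/RH_u - 1/RH_a)]."""
    return math.exp(b * (1.0 / t_use_k - 1.0 / t_test_k) + a * (1.0 / rh_use - 1.0 / rh_test))

# Hypothetical parameters and conditions (RH as a fraction, T in kelvin).
af_total = th_af(303.15, 0.60, 358.15, 0.85, a=0.5, b=5000.0)
af_temp_only = th_af(303.15, 0.60, 358.15, 0.60, a=0.5, b=5000.0)  # humidity held at use level
af_rh_only = th_af(303.15, 0.60, 303.15, 0.85, a=0.5, b=5000.0)    # temperature held at use level
print(f"AF = {af_total:.1f} = {af_temp_only:.1f} (temperature) x {af_rh_only:.2f} (humidity)")
```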
  30. 4. Reliability Growth
  31. What is Reliability Growth? Reliability growth is a tool used either to predict, from the information available now, the reliability that a system or equipment under development will reach at some future development time, or to monitor the reliability of the system or equipment and establish the trend of increasing reliability produced by research and engineering efforts, to make sure it achieves its reliability goal. Reliability growth studies are necessary to ensure, from information available at the beginning of a project, that the reliability goal is achievable by delivery time. In general, a growth model is projected to the project completion date.
  32. A Typical Reliability Growth Curve* (*Source: Dimitri Kececioglu, Reliability Engineering Handbook, Vol. 2, PTR Prentice Hall, 1991.)
  33. Reliability-Based Growth Models: Gompertz model: $R(t) = a\,b^{c^t}$, where $t$ is the development time and $0 < a, b, c < 1$. Logistic model: $R(t) = \frac{1}{1 + a\,e^{-bt}}$, where $t$ is the development time and $a, b > 0$. Lloyd-Lipow model: $R_k = R - \frac{\alpha}{k}$, where $R_k$ is the reliability at the $k$th stage of development/testing, $R$ the ultimate reliability, and $\alpha$ a positive growth parameter.
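To see the shape of each reliability-based growth model, the sketch below evaluates them with hypothetical parameters. Note the limits implied by the formulas: the Gompertz curve rises from $ab$ toward $a$, the logistic curve rises toward 1, and the Lloyd-Lipow sequence approaches the ultimate reliability $R$ as $k$ grows.

```python
import math

def gompertz(t: float, a: float, b: float, c: float) -> float:
    """Gompertz: R(t) = a * b**(c**t), 0 < a, b, c < 1; R rises from a*b toward a."""
    return a * b ** (c ** t)

def logistic(t: float, a: float, b: float) -> float:
    """Logistic: R(t) = 1 / (1 + a*exp(-b*t)), a, b > 0; R rises toward 1."""
    return 1.0 / (1.0 + a * math.exp(-b * t))

def lloyd_lipow(k: int, r_ultimate: float, alpha: float) -> float:
    """Lloyd-Lipow: R_k = R - alpha/k at the k-th development/test stage."""
    return r_ultimate - alpha / k

# Hypothetical parameters, chosen only to show the shape of each curve.
for t in (0, 5, 10, 20):
    print(t, round(gompertz(t, 0.95, 0.5, 0.8), 4), round(logistic(t, 9.0, 0.3), 4))
for k in (1, 2, 5, 10):
    print(k, round(lloyd_lipow(k, 0.98, 0.2), 4))
```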
  34. MTBF-Based Growth Models: Duane model: $\mathrm{MTBF}(t) = a\,t^b$, where $t$ is the development time, $a$ the MTBF at the beginning of development (defined as $t_0 = 1$), and $0 < b < 1$. AMSAA (U.S. Army Materiel Systems Analysis Activity) model: $\mathrm{MTBF}(t) = \frac{t^{1-\beta}}{\lambda\beta}$, where $t$ is the development time and $\lambda, \beta > 0$.
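Since $\ln \mathrm{MTBF} = \ln a + b \ln t$ is a straight line on log-log axes, the Duane parameters can be estimated by least squares on the logarithms and then projected to a completion date. This is a minimal sketch; the self-check uses hypothetical MTBF values generated from $a = 20$ h, $b = 0.4$, and the 5000-hour completion time is also hypothetical.

```python
import math

def fit_duane(times: list[float], mtbf_values: list[float]) -> tuple[float, float]:
    """Least-squares fit of ln MTBF = ln a + b * ln t; returns (a, b)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in mtbf_values]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = math.exp(y_bar - b * x_bar)
    return a, b

def duane_mtbf(t: float, a: float, b: float) -> float:
    """Duane model: MTBF(t) = a * t**b."""
    return a * t ** b

# Self-check on synthetic values generated from a = 20 h, b = 0.4 (hypothetical).
times = [100.0, 200.0, 500.0, 1000.0, 2000.0]
mtbf_values = [duane_mtbf(t, 20.0, 0.4) for t in times]
a_hat, b_hat = fit_duane(times, mtbf_values)
# Project the MTBF to a hypothetical completion time of 5000 development hours.
projected = duane_mtbf(5000.0, a_hat, b_hat)
print(f"a = {a_hat:.2f} h, b = {b_hat:.3f}, projected MTBF = {projected:.0f} h")
```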
  35. 5. Systems Reliability & Availability
  36. Objective of System Reliability & Availability: To evaluate system reliability, i.e., the probability that a system operates properly without a failure. To evaluate system availability, i.e., the probability that a system is operating properly when it is requested for use. To provide recommendations on design changes for redundancy to achieve a specified system reliability or availability goal.
  37. Reliability Block Diagram (RBD): A graphical representation of the subsystems or components of a system and the reliability-wise connections among them. An RBD should be created before doing system reliability modeling. An RBD may differ from the system's functional block diagram. [Figure: a simplified RBD of a computer system, with blocks for two fans, power supply, microprocessor, SDRAM, hard drive, and peripheral electronics.]
  38. Non-Repairable & Repairable Systems: A non-repairable system is not repaired when it fails. For a non-repairable system, system reliability is a sufficient measure of system performance. A repairable system is repaired when it fails. For a repairable system, two types of distributions are considered: the life distribution and the repair time distribution. System reliability alone is not a sufficient measure of a repairable system's performance, since it does not account for repair. System availability also needs to be evaluated and, in most cases, is even more important than system reliability.
  39. Methods of RBD Analysis: RBD analysis can be performed with both analytical and simulation techniques. The analytical approach develops a mathematical model describing the reliability of a system, based on reliability data for its subsystems or components. Advantage: with a math model in hand, further analysis can be performed, such as conditional reliability, warranty, etc. Disadvantage: in general, it is difficult to obtain the model for a complex system or a repairable system. The simulation approach uses random number generation to obtain the time-to-failure of each subsystem or component; the failure times are then analyzed to determine the behavior of the system. Advantage: it can be used for a highly complex system for which no analytical solution is expected. Disadvantages: (1) it can be time-consuming; (2) the result depends on the number of simulation runs; (3) results lack repeatability due to the random nature of data generation.
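A minimal Monte Carlo sketch of the simulation approach for a three-unit series system with exponential units (the failure rates and mission time are hypothetical). Each run draws a time-to-failure for every unit; the system fails at the first unit failure. The analytical result $e^{-\sum \lambda_i t}$ is computed alongside for comparison. A fixed seed makes the estimate repeatable, which addresses disadvantage (3), while the estimate still depends on the number of runs, as noted in (2).

```python
import math
import random

def simulate_series_reliability(rates, mission, runs=100_000, seed=1):
    """Monte Carlo estimate of series-system reliability at `mission` hours.

    Each run draws an exponential time-to-failure for every unit; a series
    system fails at its first unit failure.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    survived = 0
    for _ in range(runs):
        system_ttf = min(rng.expovariate(lam) for lam in rates)
        if system_ttf > mission:
            survived += 1
    return survived / runs

rates = [1e-4, 2e-4, 5e-5]   # hypothetical unit failure rates (per hour)
mission = 1000.0             # hypothetical mission time (hours)
estimate = simulate_series_reliability(rates, mission)
exact = math.exp(-sum(rates) * mission)  # analytical series result for comparison
print(f"simulated R = {estimate:.4f}, analytical R = {exact:.4f}")
```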
  40. Reliability of Series Systems: Success of a series system requires every single subsystem or unit to succeed. [Figure: reliability block diagram of a series system, S1, S2, S3, …, Sn.] The system reliability equals the product of the reliabilities of the individual subsystems or units: $R_{sys}(t) = \prod_{i=1}^{n} R_i(t)$.
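The series formula in code, with hypothetical unit reliabilities at the mission time. The second line illustrates why long series chains are hard to make reliable: even ten units at 0.99 each give a system reliability of only about 0.90.

```python
import math

def series_reliability(unit_reliabilities: list[float]) -> float:
    """R_sys = product of R_i: every unit must succeed."""
    return math.prod(unit_reliabilities)

# Hypothetical unit reliabilities at the mission time.
print(round(series_reliability([0.99, 0.98, 0.95]), 5))  # 0.92169
# Adding units in series always lowers system reliability:
print(round(series_reliability([0.99] * 10), 4))         # 0.9044
```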
  41. Reliability of Parallel Systems – Active Redundancy: A parallel system fails only when all of its subsystems or units fail. [Figure: reliability block diagram of a parallel system, S1, S2, S3, …, Sn.] The system reliability is $R_{sys}(t) = 1 - \prod_{i=1}^{n} \left[1 - R_i(t)\right]$.
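The parallel formula in code, assuming hypothetical identical units with $R = 0.90$. Each added unit multiplies the system unreliability by 0.10, so redundancy shows diminishing returns in absolute terms (0.90, then 0.99, then 0.999).

```python
import math

def parallel_reliability(unit_reliabilities: list[float]) -> float:
    """R_sys = 1 - product of (1 - R_i): the system fails only if all units fail."""
    return 1.0 - math.prod(1.0 - r for r in unit_reliabilities)

# Hypothetical: identical units with R = 0.90 at the mission time.
for n in (1, 2, 3):
    print(n, round(parallel_reliability([0.90] * n), 4))  # 0.9, 0.99, 0.999
```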
  42. Difference between Function & Reliability: A functionally parallel system does not have to be reliability-wise parallel. [Figure: two capacitors connected in parallel.] For the open-circuit failure mode, the functionally parallel capacitors are reliability-wise parallel. For the short-circuit failure mode, the functionally parallel capacitors are reliability-wise series.
  43. System Reliability in Standby – Inactive Redundancy: The standby subsystem remains inactive until the active one fails. [Figure: reliability block diagram of a 2-for-1 standby system with active unit SA and standby unit SS.] For this 2-for-1 standby system, the system reliability is $R(t) = R_A(t) + \int_0^t f_A(x)\, R_S(x)\, \frac{R_A(t_e + t - x)}{R_A(t_e)}\, dx$, where $f_A$ is the pdf of the active unit and $t_e$ is an equivalent time such that $R_S(x) = R_A(t_e)$.
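The standby integral can be evaluated numerically. This sketch assumes exponential units, perfect switching, and a standby unit that, once switched on, has the same life distribution as the active unit; its quiescent (waiting) reliability is $R_S(x) = e^{-\lambda_s x}$, so $t_e = \lambda_s x / \lambda$. The failure rate and mission time are hypothetical. Two limiting cases check the result: cold standby ($\lambda_s = 0$) gives $e^{-\lambda t}(1 + \lambda t)$, and hot standby ($\lambda_s = \lambda$) reduces to a two-unit active parallel system.

```python
import math

def standby_reliability(t, lam_active, lam_standby, steps=2000):
    """2-for-1 standby system reliability with exponential units and perfect switching.

    R(t) = R_A(t) + integral_0^t f_A(x) R_S(x) R_A(t_e + t - x) / R_A(t_e) dx,
    where R_S(x) = exp(-lam_standby * x) is the standby unit's quiescent reliability
    and t_e solves R_A(t_e) = R_S(x), i.e. t_e = lam_standby * x / lam_active.
    """
    def r_a(u):
        return math.exp(-lam_active * u)

    def f_a(u):
        return lam_active * math.exp(-lam_active * u)

    def r_s(u):
        return math.exp(-lam_standby * u)

    def integrand(x):
        t_e = lam_standby * x / lam_active  # equivalent age of the standby unit
        return f_a(x) * r_s(x) * r_a(t_e + t - x) / r_a(t_e)

    # Trapezoidal rule on [0, t]
    h = t / steps
    integral = 0.5 * (integrand(0.0) + integrand(t))
    integral += sum(integrand(i * h) for i in range(1, steps))
    return r_a(t) + integral * h

lam = 1e-3   # hypothetical failure rate of the active unit (per hour)
t = 1000.0   # hypothetical mission time (hours)
cold = standby_reliability(t, lam, 0.0)   # standby unit cannot fail while waiting
hot = standby_reliability(t, lam, lam)    # standby unit ages like an active unit
print(f"cold standby: {cold:.4f}, hot standby: {hot:.4f}, single unit: {math.exp(-lam * t):.4f}")
```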
  44. Example of Complex Systems: [Figure: RBD with Unit A on the left, Units B, C, and D in the middle, Units E and F next, and Unit G on the right.] In this RBD, assume all units are in active redundancy. It would be difficult to recognize which units are in series and which are in parallel, because Unit C has two paths leading away from it, while Units B and D have only one.
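One general way to handle a complex RBD is to enumerate every up/down state of the units and add up the probabilities of the states in which the system works. The connectivity below is an assumption reconstructed from the figure layout and is hypothetical: A feeds B, C, and D; B and C lead to E; C and D lead to F; E and F feed G. The unit reliabilities of 0.9 are also hypothetical.

```python
from itertools import product

UNITS = "ABCDEFG"

def system_works(up: dict[str, bool]) -> bool:
    """Hypothetical connectivity assumed from the figure layout:
    A feeds B, C, and D; B -> E, C -> E, C -> F, D -> F; E and F feed G."""
    via_e = (up["B"] or up["C"]) and up["E"]
    via_f = (up["C"] or up["D"]) and up["F"]
    return up["A"] and (via_e or via_f) and up["G"]

def complex_system_reliability(r: dict[str, float]) -> float:
    """Exact reliability by enumerating all 2**7 up/down states of the units."""
    total = 0.0
    for states in product((True, False), repeat=len(UNITS)):
        up = dict(zip(UNITS, states))
        if system_works(up):
            prob = 1.0
            for name in UNITS:
                prob *= r[name] if up[name] else 1.0 - r[name]
            total += prob
    return total

r = {name: 0.9 for name in UNITS}  # hypothetical: every unit has R = 0.9
print(round(complex_system_reliability(r), 6))
```

The same answer follows from conditioning on Unit C: if C works, the middle section succeeds when E or F works; if C fails, it reduces to two parallel paths, B-E and D-F. Enumeration grows as $2^n$, so it suits small diagrams like this one.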
  45. System Availability: Availability is the probability that a system is operating properly when it is requested for use. It is a performance characteristic for repairable systems that accounts for both the reliability and maintainability properties of a subsystem or unit. For example, a lamp with 99.90% availability means that, on average, one out of every thousand times someone needs to use the lamp, they will find it not operational, either because the lamp is burned out or because it is in the process of being replaced.
  46. Repairable Systems vs. Renewal Process: For a repairable system, the operating time is not continuous; the life cycle contains a sequence of up and down states. Once the system fails, it is repaired and restored to its original operating state. This repeated process of failure and repair is classified as an alternating renewal process, and the associated random variables are the times-to-failure and the times-to-repair.
  47. Definition of Availability: Instantaneous (or point) availability, $A(t)$: $A(t) = R(t) + \int_0^t R(t - x)\, m(x)\, dx$, where $m(x)$ is the renewal density function of the system. Average uptime (or mean) availability, $\bar{A}(t)$: $\bar{A}(t) = \frac{1}{t}\int_0^t A(x)\, dx$. Steady-state availability, $A(\infty)$: $A(\infty) = \lim_{t \to \infty} A(t)$. Inherent availability (steady-state availability for exponential distributions), $A_I$: $A_I = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$, where MTTR is the mean time to repair.
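For the special case of a single unit with exponential times-to-failure (rate $\lambda$) and exponential times-to-repair (rate $\mu$), the renewal integral has the well-known closed form $A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t}$, whose limit $\mu/(\lambda + \mu)$ equals $\mathrm{MTBF}/(\mathrm{MTBF} + \mathrm{MTTR})$. The sketch below uses a hypothetical lamp with MTBF = 999 h and MTTR = 1 h, which matches the 99.90% availability example.

```python
import math

def point_availability(t: float, lam: float, mu: float) -> float:
    """A(t) for one unit, exponential failures (rate lam) and repairs (rate mu), up at t = 0."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def mean_availability(t: float, lam: float, mu: float) -> float:
    """(1/t) * integral of A(x) over [0, t], evaluated in closed form (t > 0)."""
    s = lam + mu
    return mu / s + lam * (1.0 - math.exp(-s * t)) / (s * s * t)

def inherent_availability(mtbf: float, mttr: float) -> float:
    """A_I = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

# Hypothetical lamp: MTBF = 999 h, MTTR = 1 h, so A_I = 0.999.
lam, mu = 1 / 999.0, 1 / 1.0
for t in (1.0, 10.0, 100.0):
    print(t, round(point_availability(t, lam, mu), 6), round(mean_availability(t, lam, mu), 6))
print("A_I =", inherent_availability(999.0, 1.0))
```

Point availability starts at 1 and decays quickly to the steady-state value, while the mean availability approaches it more slowly from above.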
  48. 6. Degradation-Based Reliability
  49. What & Why Degradation-Based Reliability? Degradation-based reliability is a new technique that evaluates product reliability based on performance degradation measurements, rather than time-to-failure data. Many failure mechanisms are directly linked to the degradation of some critical performance characteristic, such as brake failure due to pad wear or solder joint failure due to fatigue crack propagation. The reliability of today's products has been greatly improved, so that few failures may be observed in reliability testing. Reliability evaluation based on degradation provides a bridge between reliability and physics of failure. Degradation testing can be much shorter because it does not need to witness any “hard failure”. It makes it possible to predict a product's residual life from critical performance measurements.
  50. Graphic Showing Degradation-Based Reliability: The following plot illustrates three units being tested for performance degradation. The failure criterion is determined based on the performance design specification. [Figure: degradation paths y1(t), y2(t), and y3(t) versus time t, crossing the failure criterion at TTF1, TTF2, and TTF3.]
  51. Approaches for Degradation-Based Reliability: Determine the failure criterion of a performance characteristic, which defines the maximum allowable degradation level; reaching it constitutes a failure. Measure performance degradation on multiple test units over time, either continuously or at predetermined intervals. Analyze the performance degradation data to establish statistical models of the performance degradation. Evaluate the product reliability based on its failure criterion.
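One simple way to carry out these steps is the pseudo-failure-time approach: fit a degradation path to each unit's measurements and solve for the time at which it reaches the failure criterion. This sketch assumes linear degradation paths; the wear measurements, inspection times, and 3.0 mm criterion are all hypothetical. The resulting pseudo failure times, one per unit as TTF1 to TTF3 in the plot, could then be fitted with a life distribution such as the Weibull.

```python
def pseudo_failure_time(times, measurements, y_critical):
    """Fit a linear degradation path y = a0 + a1*t and solve for the time it reaches y_critical."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(measurements) / n
    a1 = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, measurements)) / sum(
        (t - t_bar) ** 2 for t in times
    )
    a0 = y_bar - a1 * t_bar
    if a1 <= 0:
        raise ValueError("no increasing degradation trend; the criterion is never reached")
    return (y_critical - a0) / a1

# Hypothetical wear measurements (mm) on three units, taken every 100 h.
inspection_times = [0.0, 100.0, 200.0, 300.0]
units = {
    "unit 1": [0.00, 0.52, 0.98, 1.51],
    "unit 2": [0.00, 0.41, 0.83, 1.20],
    "unit 3": [0.00, 0.61, 1.22, 1.79],
}
Y_CR = 3.0  # hypothetical failure criterion: 3.0 mm of wear
pseudo_ttfs = {name: pseudo_failure_time(inspection_times, ys, Y_CR) for name, ys in units.items()}
for name, ttf in pseudo_ttfs.items():
    print(f"{name}: pseudo failure time = {ttf:.0f} h")
```

None of the units has actually failed by 300 h, which illustrates why degradation testing can be much shorter than life testing.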
  52. Difference in Reliability Modeling: In traditional failure-based reliability modeling, (1) the goal is to establish a distribution function for the time-to-failure variable; (2) distribution parameters are usually time independent; (3) reliability evaluation is performed directly from the established time-to-failure distribution function, that is, $R(t) = \Pr\{T > t\}$. In degradation-based reliability modeling, (1) the goal is to establish a distribution function for the performance characteristic variable; (2) distribution parameters are usually time dependent; (3) reliability evaluation is performed indirectly from the determined failure criterion and the established performance degradation distribution function, that is, $R(t) = \Pr\{Y(t) < Y_{cr}\}$.
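A minimal sketch of $R(t) = \Pr\{Y(t) < Y_{cr}\}$, assuming (hypothetically) that the performance characteristic is normally distributed at each time, with a time-dependent mean $\mu_0 + \mu_1 t$ and a constant standard deviation. The wear rate, spread, and criterion below are hypothetical. Reliability falls to 0.5 exactly when the mean degradation reaches the criterion.

```python
from statistics import NormalDist

def degradation_reliability(t, mu0, mu1, sigma, y_critical):
    """R(t) = Pr{Y(t) < Y_cr}, assuming Y(t) ~ Normal(mean = mu0 + mu1*t, sd = sigma)."""
    return NormalDist(mu0 + mu1 * t, sigma).cdf(y_critical)

# Hypothetical model: mean wear grows 0.005 mm/h from 0 with a 0.3 mm spread; Y_cr = 3.0 mm.
for t in (200.0, 400.0, 600.0, 800.0):
    print(f"R({t:.0f} h) = {degradation_reliability(t, 0.0, 0.005, 0.3, 3.0):.4f}")
```

Here the mean is the time-dependent distribution parameter mentioned in point (2); a model in which the spread also grows with time would only change how `sigma` is computed.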