Probabilistic Design for Reliability (PDfR) in Electronics, Part 2 of 2



This is a four-part lecture series. The course is designed for reliability engineers working in the electronics, opto-electronics and photonics industries. It explains the roles of Highly Accelerated Life Testing (HALT) in the design and manufacturing efforts, with the emphasis on design (HALT in manufacturing is the well-known approach of the late Greg Hobbs), and teaches what could and should be done to design, when high reliability is a must, a product with a predicted, specified (“prescribed”) and, if necessary, even controlled low probability of field failure.
Part 3: • Design for Reliability (DfR)
• Probabilistic Design for Reliability (PDfR): role, attributes, challenges, pitfalls
• Safety margin and safety factor
• Practical examples: assemblies subjected to thermal and/or dynamic loading
Part 4: • More general PDfR approach
• New Qualification Approaches Needed?
• One effective way to improve the existing QT practices and specifications

Published in: Technology, Business

  1. 1. Probabilistic Design for Reliability (PDfR) in Electronics, Part II. Dr. E. Suhir. ©2011 ASQ & Presentation Suhir. Presented live on Jan 03~06th, 2011. Calendar/Short Courses/Short_Courses.html
  2. 2. ASQ Reliability Division Short Course Series. One of the monthly webinars on topics of interest to reliability engineers. Recorded webinars are available to ASQ Reliability Division members only. The live webinars are free and open to anyone; select English Webinars to find links to register for upcoming events. Calendar/Short Courses/Short_Courses.html
  3. 3. PROBABILISTIC DESIGN for RELIABILITY (PDfR) CONCEPT, the Roles of Failure Oriented Accelerated Testing (FOAT) and Predictive Modeling (PM), and a Novel Approach to Qualification Testing (QT). “You can see a lot by observing” Yogi Berra, American Baseball Player. “It is easy to see, it is hard to foresee” Benjamin Franklin, American Scientist and Statesman. E. Suhir, Bell Laboratories, Physical Sciences and Engineering Research Division, Murray Hill, NJ (ret), University of California, Dept. of Electrical Engineering, Santa Cruz, CA, University of Maryland, Dept. of Mechanical Engineering, College Park, MD, and ERS Co. LLC, 727 Alvina Ct. Los Altos, CA, 94024, USA. Tel. 650-969-1530, cell. 408-410-0886. Four-hour ASQ-IEEE RS Webinar short course. Dr. E. Suhir, January 3-6, 2011, Page 1
  4. 4. Contents
     Session I
     1. Introduction: background, motivation, incentive
     2. Reliability engineering as part of applied probability and Probabilistic Risk Management (PRM) bodies of knowledge
     3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls and interaction with other accelerated test categories
     Session II
     4. Predictive Modeling (PM): FOAT cannot do without it
     5. Example of a FOAT: physics, modeling, experimentation, prediction
     Session III
     6. Probabilistic Design for Reliability (PDfR), its role and significance
     Session IV
     7. General PDfR approach using probability density functions (pdf)
     8. Twelve steps to be conducted to add value to the existing practice
     9. Do electronic industries need new approaches to qualify their devices into products?
     10. Concluding remarks
  5. 5. Session III. 6. PROBABILISTIC DESIGN FOR RELIABILITY, ITS ROLE and SIGNIFICANCE. “Probable is what usually happens” Aristotle, Greek philosopher. “Probability is the very guide of life” Marcus Tullius Cicero, Roman philosopher and statesman
  6. 6. Design-for-Reliability. Design for reliability (DfR) is a set of approaches, methods and best practices that are supposed to be used during the design phase of the product to minimize the likelihood (risk) that the product will not meet the reliability requirements, objectives and expectations. While 50% of the total actual cost of an electronic product is due to the cost of materials, 15% to the cost of labor, 30% to the overhead costs and only 5% to the design effort, this effort influences about 70% of the total cost of the product (“Six Sigma”, M. Harry and R. Schroeder). If reliability is taken care of during the design phase, the final cost of the product does not go up. If a reliability problem is detected during engineering, the cost of the product goes up by a factor of 10. If the problem is caught in the production phase, the cost of the product increases by a factor of 100 or more.
  7. 7. Deterministic approach. The deterministic approach is based on the concept that reliability is assured by introducing a sufficiently high deterministic safety factor, which is defined as the ratio of the capacity (“strength”) C of the system to the demand (“load”) D: $\delta = SF = C/D$. The level of the safety factor SF is chosen depending on the consequences of failure, acceptable risks, the available and trustworthy information about the capacity and the demand, the accuracy with which these characteristics are determined, possible costs and social benefits, variability of materials and structural parameters, construction (manufacturing, fabrication) procedures, etc. In a particular problem the capacity and demand could be different from the strength and load, and the role of these characteristics can be played by, say, acceptable and actual current, voltage, light intensity, electrical resistance; traffic capacity and traffic flow; culvert size and the quantity of water; critical (buckling) and actual compressive stresses; etc. The safety factors in engineering are established from previous experience with the considered system in its anticipated environmental or operating conditions.
  8. 8. Probabilistic approach. The probabilistic DfR (PDfR) approach is based on the probabilistic risk management (PRM) concept and, if applied broadly and consistently, brings in the probability measure (dimension) to each of the design characteristics of interest. Using AT data, and particularly FOAT data, and PM techniques, it enables one to establish the probability of the possible (anticipated) failure under the given operation conditions and for the given moment of time in operation. After the probabilistic PMs are developed, one should use sensitivity analyses to determine the most feasible materials and geometric characteristics of the design, so that the lowest probability of failure is achieved. In other cases, the probabilistic DfR approach enables one to find the most feasible compromise between the reliability and cost effectiveness of the product. When the probabilistic DfR (PRM) approach is used, the reliability criteria (specifications) are based on the acceptable (allowable) probability of failure for the given product.
  9. 9. Basic Principles Underlying our PDfR Approach-1. Not all products require the PDfR approach, but only those for which high reliability is crucial and for which there is a reason to believe that this reliability might not be high enough for particular applications. Nobody and nothing is perfect. The difference between a reliable and an unreliable system (device) is in the level of the probability of failure in the field under the given (anticipated) loading (environmental) conditions and after the given (specified) time in operation. The probability of failure in the field is the ultimate and a “reliable” criterion (“judge”) of the product’s reliability. This probability can be established through a specially designed and carefully conducted DFOAT aimed at understanding the physics of failure and choosing the right predictive DFOAT model (e.g., Arrhenius, Coffin-Manson, crack propagation, demand-capacity “interference”, etc.) for the anticipated loading conditions or their combination (say, thermal+vibrations).
  10. 10. Basic Principles Underlying our PDfR Approach-2. The reliability of a product is due to the reliability of its one or two most vulnerable (most unreliable) functional or structural elements, and it is for these elements that the adequate DFOAT should be designed and conducted. Sensitivity analyses are a must after the physics of the anticipated failure is established, the appropriate predictive model is agreed upon, and the acceptable probability of failure in the field is specified, but prior to the final decision about launching the mass production of the product. DFOAT is not necessarily a destructive test, but is always a test to failure, a test to determine the limits of reliable operation and the probability that these limits are exceeded. DFOAT cannot do without predictive modeling, and it is only through predictive modeling that the probability of failure in the field could be found (established).
  11. 11. Basic Principles Underlying our PDfR Approach-3. Time- and labor-consuming a-posteriori “statistics-of-failure” can be successfully replaced, to a great extent, by the anticipated a-priori “probability-of-failure” confirmed by some statistical data (for the mean and STD values of the probability distribution of interest, but not for the probability-distribution function itself). The PDfR concept enables one to qualify a viable device (system) into a reliable-in-the-field product, with the predicted, prescribed (specified) and even, if necessary, controlled probability of failure in the field. Technical diagnostics, prognostication and health monitoring could be effective means to anticipate, establish and prevent possible field failures. PDfR has to do with the DfR, and not with the Manufacturing-for-Reliability (MfR). Burn-ins could be viewed as a special type of FOAT intended for MfR objectives and are always a must, whatever DfR approach is considered.
  12. 12. Reliability function. The simplest objects (items) in reliability engineering are those that do not lend themselves to restoration (repair) and have to be replaced after the first failure. The reliability of such items is due entirely to their dependability, i.e., probability of non-failure, which is the probability that no failure could possibly occur during the given period of time. The dependence of this probability on time is known as the reliability function. As any other probability, the dependability of a sufficiently large population of non-repairable items can be substituted by the frequency, and therefore the reliability function can be sought as
      $$R(t) = \frac{N_s(t)}{N_0},$$
      where $N_0$ is the total number of items being tested and $N_s(t)$ is the number of items that are still sound by the time $t$.
  13. 13. Failure rate. Differentiating the relationship $R(t) = N_s(t)/N_0$ with respect to time $t$, we have:
      $$\frac{dR(t)}{dt} = \frac{1}{N_0}\frac{dN_s(t)}{dt} = -\frac{1}{N_0}\frac{dN_f(t)}{dt},$$
      where $N_f(t) = N_0 - N_s(t)$ is the number of the failed items. The failure rate is introduced as follows:
      $$\lambda(t) = \frac{1}{N_s(t)}\frac{dN_f(t)}{dt}.$$
      As evident from this formula, the failure rate is the ratio of the number of items failing per unit time at the time $t$ to the number of items that remained sound by this time. The failure rate characterizes the change in the dependability of an item in the course of its lifetime.
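The statistical definitions above can be tried on test data. The following is a minimal sketch, assuming a hypothetical list of failure times and a finite-difference estimate of the derivative of the number of failed items; the function name, the data and the interval `dt` are illustrative assumptions, not part of the course material.

```python
def reliability_and_failure_rate(failure_times, n0, t, dt):
    """Estimate R(t) = Ns(t)/N0 and the failure rate
    lambda(t) ~ [Nf(t + dt) - Nf(t)] / (Ns(t) * dt) from failure counts."""
    nf_t = sum(1 for x in failure_times if x <= t)          # failed by time t
    nf_t_dt = sum(1 for x in failure_times if x <= t + dt)  # failed by time t + dt
    ns_t = n0 - nf_t                                        # still sound at time t
    r = ns_t / n0
    lam = (nf_t_dt - nf_t) / (ns_t * dt) if ns_t > 0 else float("inf")
    return r, lam


# Hypothetical test: 20 items on test, 5 observed failures (hours).
failure_times = [120.0, 340.0, 560.0, 610.0, 905.0]
r, lam = reliability_and_failure_rate(failure_times, n0=20, t=500.0, dt=200.0)
print(f"R(500 h) = {r:.3f}, failure rate = {lam:.2e} per hour")
```

With a small population the finite-difference estimate is noisy, so in practice the interval `dt` would have to be wide enough to contain several failures.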
  14. 14. Bathtub curve (figure not reproduced)
  15. 15. Probabilistic and statistical definitions of the reliability function. Considering $\lambda(t) = \frac{1}{N_s(t)}\frac{dN_f(t)}{dt}$, the formula $\frac{dR(t)}{dt} = \frac{1}{N_0}\frac{dN_s(t)}{dt} = -\frac{1}{N_0}\frac{dN_f(t)}{dt}$ yields:
      $$\frac{dR(t)}{dt} = -\lambda(t)\frac{N_s(t)}{N_0} = -\lambda(t)R(t), \quad \text{or} \quad \frac{dR(t)}{R(t)} = -\lambda(t)\,dt,$$
      so that $\ln R(t) = -\int_0^t \lambda(\tau)\,d\tau$. Hence,
      $$R(t) = \exp\left[-\int_0^t \lambda(\tau)\,d\tau\right].$$
      The reliability function $R(t)$ satisfies the obvious initial condition $R(0) = 1$. The above formula for the reliability function expresses the probabilistic definition of this function, while the formula $R(t) = N_s(t)/N_0$ provides its statistical definition.
  16. 16. Exponential formula of reliability (revisited). Probability of failure. When the failure rate is time independent, the formula
      $$R(t) = \exp\left[-\int_0^t \lambda(\tau)\,d\tau\right]$$
      leads to the exponential formula of reliability: $R(t) = e^{-\lambda t}$. The function
      $$f(t) = -\frac{dR(t)}{dt} = \lambda(t)\exp\left[-\int_0^t \lambda(\tau)\,d\tau\right]$$
      is the probability density function for the flow of failures, or the failure frequency. The probability of a failure during the time $t$ can be evaluated as
      $$Q(t) = 1 - R(t) = \int_0^t f(\tau)\,d\tau.$$
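The relation between the failure rate and the reliability function can be checked numerically. The sketch below integrates a failure rate with the trapezoidal rule and exponentiates the result; the constant rate and the linearly growing ("wear-out") rate are hypothetical values chosen only for illustration.

```python
import math


def reliability(failure_rate, t, steps=10_000):
    """R(t) = exp(-integral from 0 to t of lambda(tau) d tau), trapezoidal rule."""
    if t == 0:
        return 1.0  # initial condition R(0) = 1
    h = t / steps
    total = 0.5 * (failure_rate(0.0) + failure_rate(t))
    for i in range(1, steps):
        total += failure_rate(i * h)
    return math.exp(-total * h)


lam = 2.0e-5    # hypothetical constant failure rate, 1/hour
t = 10_000.0    # hours

r_const = reliability(lambda tau: lam, t)         # should agree with exp(-lam * t)
q_const = 1.0 - r_const                           # probability of failure Q(t)
r_wear = reliability(lambda tau: 1.0e-9 * tau, t)  # hypothetical rate growing with time

print(f"constant rate: R = {r_const:.5f}, Q = {q_const:.5f}")
print(f"growing rate:  R = {r_wear:.5f}")
```

For the constant rate the numerical result reduces to the exponential formula; for the linearly growing rate it reduces to $\exp(-a t^2/2)$ with $a$ the assumed slope.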
  17. 17. Stress-strength (“interference”) concept. The curve on the right should be obtained experimentally, based on accelerated life testing and on the accumulated experience. The bearing capacity of the structure should be such that the probability of failure, P(t), is sufficiently low, and the safety factor (SF) is not lower than the specified value, say, SF=1.4. In a simplified analysis the curve on the right could be substituted, in particular, by a constant value, which, if a conservative approach is taken, should be sufficiently low.
      Figure labels (figure not reproduced): “Capacity”, C: capability of the tile structure with respect to the particular mechanical or thermal loading (may or may not be time-dependent). In the current analysis we assume that the bearing capacity for a particular reliability characteristic is either a constant value or a normally distributed random variable with a known (evaluated) mean and standard deviation. “Demand”, D: probability density function for a particular mechanical or thermal characteristic (response) of the tile structure to the given environmental factor at the given moment of time.
      The larger the overlap of these two curves, the higher the probability of failure and the lower the safety factor. After these two curves are evaluated (established) for each reliability characteristic of interest and for each moment of time (separately for the take-off and landing processes), we evaluate the probability distribution function, $f(\psi)$, for the safety margin, $\psi = C - D$, its mean, $\langle\psi\rangle$, and standard deviation, $\hat{s}$, and the safety factor, $SF = \langle\psi\rangle/\hat{s}$. It should not be lower than the specified value, say, SF=1.4.
  18. 18. Probability of non-failure (dependability). The “reliability” (actually, “dependability”) of a non-repairable item is defined as the probability of non-failure, $P = P\{C > D\}$, i.e., as the probability that the item’s bearing capacity (“strength”), C, during the time, t, of operation under the given stress conditions, will always be greater than the demand (“loading”), D. Although the probability of failure is never zero, it can be made, if a probabilistic approach is used, as low as necessary. If the probability distributions $f(C)$ and $g(D)$ (probability density functions) for the random variables C and D are known, then the probability, P, of non-failure (reliability, dependability) can be evaluated as
      $$P = \int_0^\infty f_\psi(\psi)\,d\psi,$$
      where $f_\psi(\psi)$ is the probability density function of the margin of safety $\psi = C - D$, which is also a random variable.
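For the special case of independent, normally distributed capacity and demand, the safety margin is itself normal and the integral above has a closed form. The sketch below evaluates it and cross-checks the result with a simple Monte Carlo run; the means and standard deviations are hypothetical, and independence of C and D is an assumption made here, not a statement from the slides.

```python
import math
import random


def prob_non_failure_normal(mean_c, std_c, mean_d, std_d):
    """P{C > D} for independent normal C and D, via the safety margin psi = C - D."""
    mean_psi = mean_c - mean_d
    std_psi = math.sqrt(std_c**2 + std_d**2)
    sf = mean_psi / std_psi  # mean of the margin over its standard deviation
    p = 0.5 * (1.0 + math.erf(sf / math.sqrt(2.0)))  # standard normal CDF at sf
    return p, sf


def prob_non_failure_mc(mean_c, std_c, mean_d, std_d, n=200_000, seed=1):
    """Monte Carlo estimate of P{C > D}: fraction of samples with C above D."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n)
        if rng.gauss(mean_c, std_c) > rng.gauss(mean_d, std_d)
    )
    return hits / n


# Hypothetical capacity and demand, in arbitrary stress units.
p_exact, sf = prob_non_failure_normal(100.0, 8.0, 70.0, 6.0)
p_mc = prob_non_failure_mc(100.0, 8.0, 70.0, 6.0)
print(f"SF = {sf:.2f}, P (closed form) = {p_exact:.5f}, P (Monte Carlo) = {p_mc:.5f}")
```

The Monte Carlo route becomes useful when C or D is not normal, in which case only the sampling lines need to change.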
  19. 19. Safety factor-1. Direct use of the probability of non-failure is often inconvenient, since, for highly reliable items, this probability is expressed by a number which is very close to one, and, for this reason, even significant changes in the item’s (system’s) design, which have an appreciable impact on the item’s reliability, may have a minor effect on the probability of non-failure. In those cases when both the mean value, $\langle\psi\rangle$, and the standard deviation, $\hat{s}$, of the margin of safety (or any other suitable characteristic of the item’s reliability, such as stress, temperature, displacement, affected area, etc.) are available, the safety factor (safety index, reliability index) $SF = \delta = \langle\psi\rangle/\hat{s}$ can be used as a suitable reliability criterion.
  20. 20. Safety factor-2. After the capacity and the demand curves are established for each reliability characteristic of interest and for each moment of time, the probability distribution function $f(\psi)$ for the safety margin $\psi = C - D$ should be determined. Then, for normally distributed capacity and demand, the mean value
      $$\langle\psi\rangle = \int_0^\infty f(\psi)\,\psi\,d\psi$$
      of the safety margin and its standard deviation
      $$s_\psi = \sqrt{\int_0^\infty f(\psi)\,(\psi - \langle\psi\rangle)^2\,d\psi}$$
      should be evaluated. The safety factor could be found as the ratio of the mean value of the safety margin to its standard deviation:
      $$SF = \delta = \frac{\langle\psi\rangle}{s_\psi}.$$
  21. 21. Safety factor-3. The SF should not be lower than its specified value for the characteristic of interest. This value should reflect the state-of-the-art in the given area of engineering, cost and time-to-market considerations, and should account for the consequences of failure. If the computed SF does not meet the specification requirements, the design should be revised (improved) until the required level of safety (reliability) is met. The required level of safety could also be established based on the level of the probability
      $$P(\psi) = \int_\psi^\infty f(\psi)\,d\psi$$
      of non-failure. This formula defines the probability that the safety margin $\psi = C - D$ is found between the given value and infinity, i.e., is higher than the given (specified) value of this margin.
  22. 22. The SF and the probability $P(\psi)$ of exceeding a certain level of the safety margin are related. If the reliability characteristic of interest (such as, e.g., the safety margin, $\psi$) is distributed in accordance with the normal law
      $$f_\psi(\psi) = \frac{1}{\sqrt{2\pi D_\psi}}\exp\left[-\frac{(\psi - \langle\psi\rangle)^2}{2D_\psi}\right],$$
      then the probability of non-failure is related to the safety factor SF as $P = \tfrac{1}{2}[1 + \Phi(SF)]$, where
      $$\Phi(\alpha) = \frac{2}{\sqrt{\pi}}\int_0^\alpha e^{-t^2}\,dt$$
      is the probability integral (Laplace function).
      P          SF
      0.999000   3.0901
      0.999900   3.7194
      0.999990   4.5255
      0.999999   4.7518
      1.0        ∞
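The link between the safety factor and the probability of non-failure can be explored with the Python standard library. The sketch below assumes that, for a normally distributed safety margin, P equals the standard normal cumulative distribution function evaluated at SF, which in terms of the error function is $\tfrac{1}{2}[1 + \mathrm{erf}(SF/\sqrt{2})]$; this scaling of the argument is an assumption about the intended normalization of the Laplace function, and computed values may differ from the tabulated ones in some digits.

```python
import math
from statistics import NormalDist


def p_from_sf(sf):
    """Probability of non-failure for a normal safety margin with safety factor sf
    (standard normal CDF, written through the error function)."""
    return 0.5 * (1.0 + math.erf(sf / math.sqrt(2.0)))


def sf_from_p(p):
    """Safety factor needed to reach the probability of non-failure p
    (standard normal quantile)."""
    return NormalDist().inv_cdf(p)


for p in (0.999, 0.9999, 0.99999, 0.999999):
    print(f"P = {p:.6f}  SF = {sf_from_p(p):.4f}")
```

The inverse function is the practically useful one: given an allowable probability of failure, it returns the safety factor the design has to reach.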
  23. 23. Safety factor-4. The SF establishes both the upper limit of the reliability characteristic of interest (through the mean value of the corresponding margin of safety) and the accuracy with which this characteristic is defined (through the corresponding standard deviation). The structure of the SF indicates that it is acceptable that a system characterized by a high mean value of the safety margin (i.e., a system whose bearing capacity with respect to a certain stress/reliability-characteristic, not necessarily mechanical, is significantly higher than the level of loading) has a less accurately defined deviation from this mean value than a system characterized by a low mean value of the safety margin (i.e., a system whose bearing capacity is much closer to the possible level of loading). In other words, the uncertainty in the evaluation of the safety margin should be smaller for a more vulnerable design.
  24. 24. Safety factor (SF) and coefficient of variability (COV). The safety factor (SF) is reciprocal to the coefficient of variability (COV). The latter is defined as the ratio of the standard deviation to the mean value of the random variable of interest. While the COV is the characteristic of uncertainty of the random variable of interest, the SF is the characteristic of certainty of the random parameter (stress-at-failure, the highest possible temperature, the ultimate displacement, the affected area, etc.) that is responsible for the non-failure of the item. If the reliability characteristic of interest (for a non-repairable item) is a random variable that is determined by just two independent non-random quantities (say, the mean value and the standard deviation), then the safety factor, SF, determines completely the probability of non-failure (reliability): the larger the SF is, the higher is the probability of non-failure.
  25. 25. Time-to-failure (TTF), MTTF and the corresponding SF. Usually the capacity (strength), C, and/or the demand (loading), D, change in time. Failure occurs when the demand (loading), D, becomes equal to or greater than the bearing capacity (strength), C, of the item. The moment at which this random event occurs is the time-at-failure (TAF), and the duration of operation until this moment is the random variable known as time-to-failure (TTF). Thus, TTF is the time from the beginning of operation until the moment of time when the demand (loading) D becomes equal to or higher than the bearing capacity C, i.e., when the safety margin $\psi = C - D$ becomes zero or negative. The corresponding safety factor, SF, is the ratio of the MTTF to the STD of the TTF: SF = MTTF/STD.
  26. 26. Mean time-to-failure and reliability function. Mean-time-to-failure (MTTF) is the mean time of the item operation until it fails. Hence, it can be computed as $\langle t\rangle = \int_0^\infty f(t)\,t\,dt$. Since $f(t) = -\frac{dR(t)}{dt}$, we have (using integration by parts):
      $$\langle t\rangle = \int_0^\infty f(t)\,t\,dt = -\int_0^\infty \frac{dR(t)}{dt}\,t\,dt = -\left[R(t)\,t\right]_0^\infty + \int_0^\infty R(t)\,dt = \int_0^\infty R(t)\,dt,$$
      and the variance of the TTF can be found as
      $$D_t = \int_0^\infty f(t)\,(t - \langle t\rangle)^2\,dt = \int_0^\infty f(t)\,t^2\,dt - \langle t\rangle^2 = 2\int_0^\infty R(t)\,t\,dt - \langle t\rangle^2.$$
      The corresponding SF is
      $$\delta = SF = \frac{MTTF}{STD} = \frac{\langle t\rangle}{\sqrt{D_t}}.$$
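The two integrals of the reliability function are easy to evaluate numerically. The following sketch uses the trapezoidal rule on a truncated time axis and applies it to the exponential reliability function with a hypothetical failure rate; the truncation point and step count are assumptions chosen so that the neglected tail is negligible.

```python
import math


def integrate(func, upper, steps=200_000):
    """Trapezoidal rule on [0, upper]."""
    h = upper / steps
    total = 0.5 * (func(0.0) + func(upper))
    for i in range(1, steps):
        total += func(i * h)
    return total * h


def mttf_and_sf(reliability, upper):
    """MTTF = integral of R(t); variance = 2 * integral of R(t) * t - MTTF^2;
    SF = MTTF / sqrt(variance). The infinite upper limit is truncated at `upper`."""
    mttf = integrate(reliability, upper)
    second_moment = 2.0 * integrate(lambda t: reliability(t) * t, upper)
    variance = second_moment - mttf**2
    return mttf, mttf / math.sqrt(variance)


lam = 1.0e-3  # hypothetical constant failure rate, 1/hour
mttf, sf = mttf_and_sf(lambda t: math.exp(-lam * t), upper=40.0 / lam)
print(f"MTTF = {mttf:.1f} h, SF = {sf:.4f}")
```

For the exponential law the MTTF equals $1/\lambda$ and the standard deviation equals the mean, so the safety factor comes out as one; other reliability functions can be passed in the same way.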
  27. 27. Example #1. As a simple example, examine a device whose MTTF, $\tau$, during steady-state operation is described by the Boltzmann-Arrhenius equation
      $$\tau = \tau_0 \exp\left(\frac{U}{kT}\right).$$
      The failure rate is therefore
      $$\lambda = \frac{1}{\tau} = \frac{1}{\tau_0}\exp\left(-\frac{U}{kT}\right).$$
      If the Weibull law is used to predict the probability of failure, then the probability of non-failure (dependability) can be evaluated on the basis of the following probability distribution function:
      $$P = \exp\left[-(\lambda t)^\beta\right] = \exp\left[-\left(\frac{t}{\tau_0}\right)^\beta \exp\left(-\frac{\beta U}{kT}\right)\right],$$
      where $\beta$ is a shape parameter. Solving this equation for the absolute temperature $T$, we obtain:
      $$T = -\frac{U}{k\,\ln\left[\dfrac{\tau_0}{t}\,(-\ln P)^{1/\beta}\right]}.$$
  28. 28. Example #1 (cont). Let, for the given type of failure (say, surface charge accumulation), the ratio $U/k$ be $U/k = 11600$ K, the $\tau_0$ value predicted on the basis of the ALT be $\tau_0 = 5\times10^{-8}$ hours, and the shape parameter $\beta$ turn out to be close to $\beta = 2$ (Rayleigh distribution). Let the allowable (specified) probability of failure at the end of the device’s service time of, say, $t = 40{,}000$ hours be $Q = 10^{-5}$ (it is acceptable that one out of a hundred thousand devices fails). Then the above formula indicates that the steady-state operation temperature should not exceed $T = 349.8$ K $= 76.8$ °C, and the thermal management tools should be designed accordingly. This rather elementary example gives a feeling of how the PDfR concept works and what kind of information one could expect using it.
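The numbers in this example can be reproduced in a few lines. The sketch below implements the formula for the allowable temperature, assuming $U/k = 11600$ K (roughly an activation energy of 1 eV, and the value that reproduces the quoted result of about 349.8 K); the function and argument names are illustrative.

```python
import math


def max_operating_temperature(u_over_k, tau0, beta, t, q):
    """Allowable steady-state temperature (K) from the Weibull/Arrhenius relation:
    T = -(U/k) / ln[(tau0 / t) * (-ln P)^(1/beta)], with P = 1 - Q."""
    minus_ln_p = -math.log1p(-q)  # -ln(1 - Q), accurate for small Q
    arg = (tau0 / t) * minus_ln_p ** (1.0 / beta)
    return -u_over_k / math.log(arg)


T = max_operating_temperature(
    u_over_k=11600.0,  # K, assumed ratio U/k
    tau0=5.0e-8,       # hours
    beta=2.0,          # Weibull shape parameter (Rayleigh case)
    t=40_000.0,        # hours of service
    q=1.0e-5,          # allowable probability of failure
)
print(f"T = {T:.1f} K")
```

Because the temperature depends on the inputs only through a logarithm, even order-of-magnitude changes in the allowable probability of failure shift the result by only a few kelvin, which a sensitivity run with this function makes visible.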
  29. 29. Example #2. Let, for instance, the absolute temperature $T$ be distributed in accordance with the Rayleigh law, so that the probability that a certain level $T_*$ is exceeded is determined as
      $$P(T > T_*) = \exp\left(-\frac{T_*^2}{T_0^2}\right),$$
      where $T_0$ is the most likely value of the absolute temperature $T$. Then, using the Boltzmann-Arrhenius relationship
      $$\tau = \tau_0 \exp\left(\frac{U_a}{kT}\right),$$
      we conclude that the probability that the random MTTF $\tau$ (“random”, because of the uncertainty in the level of the most likely temperature) is below a certain level $\tau_*$ (the probability of failure is defined in this case as the probability that the specified level is not achieved) can be found as
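  30. 30. Example #2 (cont).
      $$P(\tau < \tau_*) = \exp\left(-\frac{T_*^2}{T_0^2}\right) = \exp\left[-\left(\frac{U_a}{k T_0 \ln\dfrac{\tau_*}{\tau_0}}\right)^2\right].$$
      Here the temperature level $T_* = U_a / \left(k \ln(\tau_*/\tau_0)\right)$ corresponds to the MTTF level $\tau_*$, and, since the MTTF decreases with temperature, the event $T > T_*$ is the event $\tau < \tau_*$. Solving this equation for the most likely (specified) $T_0$ value, we find:
      $$T_0 = \frac{U_a}{k \ln\dfrac{\tau_*}{\tau_0}\,\sqrt{-\ln P}}.$$
      This formula indicates how the (most likely) level of the device temperature should be established, so that the probability that the specified level $\tau_*$ of the MTTF is not achieved is sufficiently low.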
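A short numerical sketch may help to see how this relation is used. The slide gives no numbers for this example, so all input values below are hypothetical and chosen only to exercise the formulas; the code computes the most likely temperature for a target probability and then substitutes it back as a consistency check.

```python
import math


def most_likely_temperature(ua_over_k, tau0, tau_star, p):
    """T0 = (Ua/k) / [ln(tau*/tau0) * sqrt(-ln P)], in kelvin."""
    return ua_over_k / (math.log(tau_star / tau0) * math.sqrt(-math.log(p)))


def prob_mttf_below(ua_over_k, tau0, tau_star, t0):
    """Probability that the MTTF falls below tau*, for Rayleigh-distributed T."""
    t_star = ua_over_k / math.log(tau_star / tau0)  # temperature giving MTTF = tau*
    return math.exp(-((t_star / t0) ** 2))


# Hypothetical inputs, for illustration only.
ua_over_k = 11600.0   # K
tau0 = 5.0e-8         # hours
tau_star = 40_000.0   # hours, specified MTTF level
p_target = 1.0e-5     # allowable probability that the MTTF is below tau*

t0 = most_likely_temperature(ua_over_k, tau0, tau_star, p_target)
p_check = prob_mttf_below(ua_over_k, tau0, tau_star, t0)
print(f"T0 = {t0:.1f} K, recovered probability = {p_check:.2e}")
```

The round trip (target probability to temperature and back) is a cheap way to confirm that the solved formula and the original probability expression are mutually consistent.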
  31. 31. Reliability of repairable items. Reliability of complex items (products) depends not only on their dependability, but on their repairability as well. It is important that the products are designed in such a way that their gradual and potential failures could be easily detected and eliminated in due time, and that the detected damages (defects), such as, say, fatigue cracks, could be removed before a catastrophic failure process commences. The reliability of complex products is characterized, first of all, by their availability, which is defined as the ability of an item (system) to perform its required function at the given time or over a stated period of time, with consideration of its dependability, repairability, maintainability and maintenance support. A high level of reliability of complex products can be achieved by employing the most feasible combination of dependability, on one hand, and repairability, maintainability and maintenance support, on the other.
  32. 32. Availability index-1. The non-steady-state (time dependent) operational availability index $K(t)$ is defined as the probability that the item of interest will be available to the user at the given moment $t$ of time and will operate failure-free during the given time $T$ beginning with the moment $t$. The steady-state availability index $K$ is the time-independent probability that the item will operate (will be available) failure-free during the time $T$, beginning with an arbitrary moment $t$ of time that is sufficiently remote from the beginning of operations (so that the “infant mortality” portion of the “bathtub” curve is excluded). The most often used availability characteristic of the Class II and Class III items, whose normal operation includes regular repairs (say, workstations or other complex and expensive electronic systems), is the availability index $K_a$, defined as the steady-state probability that the item will be available at an arbitrary moment of time taken between the preplanned preventive maintenance activities.
  33. 33. Availability index-2. The availability index $K_a$ can be computed by the formula
      $$K_a = \frac{1}{1 + \sum_{i=1}^{n} \dfrac{t_i^r}{t_i^f}},$$
      where $t_i^f$ is the mean time between successive failures for the i-th item in the system, and $t_i^r$ is the mean-time-to-repair for this item. The index $K_a$ indicates the fraction of time during which the system is in the working (available) condition. The use of the index $K_a$ enables one to make assessments of the unforeseen idle times and to consider these times at early stages of the design of the product.
  34. 34. Operational Availability Index. The operational availability index $K(t)$ can be calculated, for situations when the probability of failure-free operation during the time interval $t$ is independent of the beginning of this interval, by the formula $K(t) = K_a R(t)$, where $R(t)$ is the dependability of the item. This formula determines the probability that two events take place: 1) the item is available at an arbitrary moment of time, with the probability $K_a$, and 2) it will operate failure-free during the time period of the duration $t$.
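The two availability formulas can be combined in a short calculation. The sketch below assumes a hypothetical system of three items with made-up mean times between failures and mean times to repair, and an exponential dependability R(t) with an assumed failure rate; none of these values come from the course.

```python
import math


def availability_index(items):
    """K_a = 1 / (1 + sum over items of (mean time to repair / mean time between failures)).
    `items` is an iterable of (mean_time_between_failures, mean_time_to_repair) pairs."""
    return 1.0 / (1.0 + sum(t_r / t_f for t_f, t_r in items))


def operational_availability(k_a, reliability_t):
    """K(t) = K_a * R(t): available at an arbitrary moment and failure-free for time t."""
    return k_a * reliability_t


# Hypothetical items: (mean time between failures, mean time to repair), in hours.
items = [(2000.0, 4.0), (5000.0, 10.0), (10000.0, 40.0)]
k_a = availability_index(items)

r_t = math.exp(-1.0e-4 * 100.0)  # assumed exponential dependability over a 100 h mission
k_t = operational_availability(k_a, r_t)
print(f"K_a = {k_a:.4f}, K(100 h) = {k_t:.4f}")
```

Since the repair-to-failure time ratios simply add up, the item with the largest ratio dominates the loss of availability and is the natural first target for design improvement.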
  35. 35. Session IV. 7. GENERAL PDfR APPROACH USING PROBABILITY DENSITY FUNCTIONS (PDF). “Education is man’s going forward from cocksure ignorance to thoughtful uncertainty”, Donald B. Clark, Australian author, “Scrapbook”. “There are things in this world far more important than the most splendid discoveries: the methods by which they were made”, Gottfried Leibnitz, German mathematician
  36. 36. PDfR Characteristics. The appropriate electrical, optical, mechanical, thermal, and other physical characteristics that determine the functional performance, mechanical (physical/structural) reliability and/or environmental durability of the design/device/apparatus of interest should be established. Examples are: appropriate electrical parameters (current, voltage, etc.), light output, heat transfer capability, mechanical ultimate and fatigue strength, fracture toughness, maximum and/or minimum temperatures, maximum accelerations/decelerations, etc.
  37. 37. Factors that affect the PDfR characteristics-1. Establish the electrical, optical, mechanical, thermal, environmental and other possible (say, human) stress (loading) factors (conditions) that might affect the reliability characteristics, i.e., characteristics that determine (affect) the short- and long-term reliability of the object (structure) of interest. Examples are: high and/or low temperatures, high electrical current or voltage, electrical and/or optical properties of materials, mechanical and thermal stresses, displacements, maximum temperatures, size of the affected areas, etc. This should be done separately for each characteristic of interest and, if necessary, for each manufacturing process and for different phases of manufacturing, testing and/or operations.
  38. 38. Factors that affect the PDfR characteristics-2. Based on the physical nature of the particular environmental/loading factor (electrical, optical, mechanical, environmental) and on the available information about it, establish whether this factor should be treated as a non-random (deterministic) value, or should/could be treated as a random variable with the given (assumed) probability distribution function. At this stage one could treat random characteristics of interest as nonrandom functions of random factors, and establish the probability distribution functions for the random factors using experimental data, and/or Monte-Carlo simulations, and/or finite-element analyses (FEA), and/or evaluations based on analytical (“mathematical”) modeling, etc.
  39. 39. Factors that affect the PDfR characteristics-3. Let, for instance, the absolute temperature $T$ be distributed in accordance with the Rayleigh law, so that the probability that a certain level $T_*$ is exceeded is determined as
      $$P(T > T_*) = \exp\left(-\frac{T_*^2}{T_0^2}\right),$$
      where $T_0$ is the most likely value of the absolute temperature $T$. Then, using the Boltzmann-Arrhenius relationship
      $$\tau = \tau_0 \exp\left(\frac{U_a}{kT}\right),$$
      we conclude that the probability that the random mean-time-to-failure $\tau$ (“random”, because of the uncertainty in the level of the most likely temperature) is below a certain level $\tau_*$
  40. 40. Factors that affect the PDfR characteristics-4. (the probability of failure is defined in this case as the probability that the specified level is not achieved) can be found as
      $$P(\tau < \tau_*) = \exp\left(-\frac{T_*^2}{T_0^2}\right) = \exp\left[-\left(\frac{U_a}{k T_0 \ln\dfrac{\tau_*}{\tau_0}}\right)^2\right].$$
      Solving this equation for the $T_0$ value, we find:
      $$T_0 = \frac{U_a}{k \ln\dfrac{\tau_*}{\tau_0}\,\sqrt{-\ln P}}.$$
      This formula indicates how the (most likely) level of the device temperature should be established, so that the probability that the specified level $\tau_*$ of the MTTF is not achieved is sufficiently low.
  41. 41. Choose appropriate basic probability distributions-1. After the reliability characteristics are established and the factors affecting these characteristics are selected, one should choose the adequate probability distributions for the factors (conditions) that affect the short- and long-term reliability characteristics. For those factors (conditions) that should be treated as random variables, establish (accept) the physically meaningful probability distribution laws. When the actual experimental information is not available, assume, based on general physical considerations, the most suitable (or the most conservative) laws of the probability distribution (e.g., uniform, exponential, normal, Weibull, Rayleigh, etc.).
  42. 42. Choose appropriate basic probability distributions-2. Here are some general considerations that can be used in practical applications. Since the exponential distribution has the largest entropy (the largest uncertainty) of all the distributions with the same mean, this distribution should be considered if no other information, except the expected (mean) value, is available. The exponentially distributed random variable is always positive. The safety factor for an exponentially distributed random variable is always “one”. If the random process of failures can be treated as a simple Poisson flow with a constant intensity, then the time interval between two adjacent consecutive failures has an exponential distribution. The most likely value of the exponentially distributed random variable, $t$, is at the initial moment of time $t = 0$.
  43. 43. Choose appropriate basic probability distributions-3 If the physical nature of a random environmental factor is such that it can be only positive (e.g., acceleration during take-off of an aircraft, or a current in an electronic module) or only negative (e.g., deceleration during landing or during drop tests of a cell phone), its most likely value is certainly non-zero. If only this value (or the mean) is available, then the Rayleigh law could be employed. This law is also (like the exponential law) a single-parameter law. The safety factor, when the Rayleigh distribution is used, is always
      $$\delta = \frac{1}{\sqrt{1 + \frac{4}{\pi}}} = 0.6633$$
  44. 44. Choose appropriate basic probability distributions-4 If a normally distributed random variable has a finite variance and zero mean, and changes periodically with a constant or next-to-constant frequency, but with a random amplitude and random phase angle, then these amplitudes and the corresponding energies obey the Rayleigh law of distribution. If the expected (mean) value and the variance are known, and the physical nature of the random environmental factor is such that the probability density function is symmetric with respect to the mean value (which coincides with the median and the most likely value), then the normal distribution should be accepted, especially (but not necessarily) if the random variable can be either positive or negative.
  45. 45. Choose appropriate basic probability distributions-5 It is noteworthy that if the safety factor, defined as the ratio of the mean value of the safety margin to its standard deviation, is significant (which is typically the case), then application of the normal law to the distribution of the safety margin is acceptable: its negative values, although possible in principle, are characterized by negligibly low probabilities and need not be considered. If the expected (mean) value and the variance are known, and the physical nature of the random environmental factor is such that the probability density function is highly asymmetric (skewed) with respect to its mean or most likely value, then the Weibull distribution, the distribution of the absolute value of a normal random variable, a truncated normal distribution, or a log-normal distribution can be used.
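The remark about negligible negative values can be checked numerically. The sketch below assumes a normally distributed safety margin whose mean-to-standard-deviation ratio is the safety factor; the probability of a negative margin is then the standard normal tail Φ(−SF). The function name and the sample safety factors are illustrative only.

```python
import math


def prob_negative_margin(safety_factor):
    """P(margin < 0) for a normal margin with mean/std equal to safety_factor."""
    return 0.5 * math.erfc(safety_factor / math.sqrt(2.0))


# Illustrative safety factors; larger values give rapidly vanishing tails.
for sf in (1.0, 3.0, 5.0):
    print(f"SF = {sf:.0f}: P(margin < 0) = {prob_negative_margin(sf):.3e}")
```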
  46. 46. Establish appropriate cumulative probability distributions-1 Treating each reliability characteristic of interest as a non-random function (output) of a random argument (input) due to a particular external or internal factor, evaluate the probability density function of this characteristic for the assumed (accepted, determined) law of the probability distribution of the environmental factor. Time could enter as an independent parameter into the computed response. For some factors, the input could be considered as a non-random (deterministic) value.
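For a monotonic response function, the step described on this slide reduces to the standard change-of-variables rule. The notation below (random input $x$ with density $f_X$, response $y = g(x)$) is introduced here for illustration and does not come from the slides.

```latex
f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left|\frac{d\,g^{-1}(y)}{dy}\right|,
\qquad
F_Y(y) = \int_{-\infty}^{y} f_Y(\eta)\,d\eta .
```

When $g$ is not monotonic, the same expression is summed over all branches of the inverse function.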
  47. 47. Establish appropriate cumulative probability distributions-2 Determine the cumulative probability distribution functions for all the probability density functions that affect the given mechanical or thermal characteristic of interest. Such a convolution of the constituent laws of distribution considers, in the most accurate and non-conservative way, the probabilistic input of each of the environmental parameters that affect the particular mechanical, electrical, optical or thermal characteristic. Cumulative distributions consider the likelihood that the maxima of different important factors might not occur simultaneously.
  48. 48. Establish appropriate cumulative probability distributions-3 If the number of random variables does not exceed two, the convolution could be carried out analytically. If the number of random variables is three or more, one should “teach” a computer how to obtain a cumulative law of distribution. Since the above distributions are based on the transient responses of the mechanical (thermal) characteristics of interest to the time-dependent environmental excitations (parameters), these distributions determine the probability that at the given moment of time the given characteristic is below/above the given value of this characteristic.
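One possible way to “teach” a computer the cumulative law for three or more random inputs is direct Monte Carlo sampling. The sketch below is an assumption about how this could be done, not the author's procedure; the three input distributions, their parameters, and the additive response are hypothetical.

```python
import math
import random


def rayleigh(rng, mode):
    """Draw a Rayleigh variate with the given most likely value (mode)."""
    return mode * math.sqrt(-2.0 * math.log(1.0 - rng.random()))


def monte_carlo_cdf(samplers, response, level, n_trials=200_000, seed=0):
    """Estimate P(response(inputs) <= level) by sampling the inputs independently."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        inputs = [draw(rng) for draw in samplers]
        if response(inputs) <= level:
            hits += 1
    return hits / n_trials


# Hypothetical environmental factors acting on one characteristic.
samplers = [
    lambda rng: rng.uniform(0.0, 10.0),      # uniformly distributed factor
    lambda rng: rng.expovariate(1.0 / 5.0),  # exponentially distributed, mean 5
    lambda rng: rayleigh(rng, 4.0),          # Rayleigh distributed, mode 4
]

# Response assumed to be the plain sum of the three factors.
p_below = monte_carlo_cdf(samplers, sum, level=30.0)
print(f"estimated P(response <= 30) = {p_below:.4f}")
```

Because the inputs are sampled jointly, the estimate accounts for the fact that their maxima rarely coincide, which is the point made on the preceding slide.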
  49. 49. Probabilistic reliability criteria Determine, for each point of time after the given duration of operation (mission): the safety factors and other reliability criteria for the characteristics that determine the performance, reliability, durability and safety of the system; the probability of non-failure, P(t), for the established (accepted) safety factor; and the mean time-to-failure, MTTF, for the established (accepted) safety factor, the standard deviation, STD, of the time-to-failure, and the safety factor SF = MTTF/STD for the time-to-failure.
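The criteria listed on this slide can be estimated from a sample of observed or simulated times-to-failure. The sketch below assumes such a sample is available; the function name, the data, and the mission time are hypothetical.

```python
import statistics


def reliability_criteria(times_to_failure, mission_time):
    """Return empirical P(t), MTTF, STD and SF = MTTF/STD for a failure-time sample."""
    mttf = statistics.fmean(times_to_failure)
    std = statistics.stdev(times_to_failure)  # sample standard deviation
    survivors = sum(1 for t in times_to_failure if t > mission_time)
    return {
        "P": survivors / len(times_to_failure),  # probability of non-failure at mission_time
        "MTTF": mttf,
        "STD": std,
        "SF": mttf / std,
    }


# Hypothetical failure times (hours) and mission duration.
criteria = reliability_criteria([100.0, 200.0, 300.0, 400.0, 500.0], mission_time=250.0)
print(criteria)
```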
  50. 50. 8. Twelve steps to be conducted to add value to the existing practice “The man who removes a mountain begins by carrying away small stones” Chinese saying “Give me a fruitful error any time, full of seeds, bursting with its own corrections. You can keep your sterile truth for yourself” Vilfredo Pareto, Italian engineer, sociologist, and economist
  51. 51. Some important preliminary steps Establish, as the manufacturer of a particular product, the list of possible failures and suitable failure criteria, as far as the functional, mechanical (physical) and environmental failures are concerned. Find out the similar requirements that the customer specifies (desires) regarding lifetimes (minimum and mean time to failure), failure rates (considering, for a particular product, if necessary, the wear-out portion of the bath-tub curve), probability of failure (for non-reparable products), availability specifications, etc. Identify active and passive parts, reparable and non-reparable parts, the most vulnerable (least reliable) parts (e.g., solder joint interconnections, materials prone to creep or aging), the feasibility of introducing redundancy, etc. As a customer, evaluate the ability of a particular manufacturer to make parts with consistent quality, and, as a manufacturer, establish your company’s ability to produce such parts.
  52. 52. Twelve steps to be conducted to add value to the existing practice-1
      1) Develop a detailed list of possible electrical, mechanical (structural), thermal, and environmental failures that should be considered, in one way or another, in the particular design (package, inverter, module, structure, etc.)
      2) Make, based on the existing experience and best practices, the preliminary decision on the materials and geometries in the physical design and packaging of the product and its units/subunits/assemblies
      3) Conduct predictive modeling (using FEA or other simulation packages, as well as analytical (“mathematical”) modeling wherever possible) of the stresses and other failure criteria (say, elevated temperatures or electrical characteristics), considering steady-state and transient thermal, stress/strain and electrical fields
      4) Consider possible loading in actual use conditions (electrical, thermal, mechanical, dynamic, as well as their combinations) and distinguish between short-term high-level loading (related to the ultimate strength of the structure) and long-term low-level loading (related to the fatigue strength of the structure)
  53. 53. Twelve steps to be conducted to add value to the existing practice-2
      5) Review the existing qualification standards for similar structures, having in mind, however, that these standards were designed for conditions (power, geometry, materials, use) that are similar to, but different from, those we will be dealing with; come up with the preliminary level of acceptable stresses, accelerations, temperatures, voltages, currents, etc.
      6) Having in mind FOAT procedures, decide on the constitutive relationships (formulas, FEA procedures, plots) that govern the failure mechanisms in question (Arrhenius type of equations for high-temperature “baking”, Miner type for the materials that are expected to work within the elastic range, Erdogan-Paris type for brittle materials, etc.)
      7) Design, conduct and interpret the results of the FOAT and, based on this testing, predict the reliability characteristics of the assemblies, joints, subunits and units of interest
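As an illustration of the Arrhenius-type relationship named in step 6, the sketch below computes the acceleration factor between a use temperature and a test (“baking”) temperature in its commonly used form, AF = exp[(Ua/k)(1/T_use − 1/T_test)]. The activation energy and temperatures are hypothetical; in practice they would come from the FOAT data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k, eV/K


def arrhenius_acceleration_factor(u_a_ev, t_use_k, t_test_k):
    """Ratio of the time-to-failure at t_use_k to that at t_test_k (temperatures in kelvin)."""
    if t_use_k <= 0.0 or t_test_k <= 0.0:
        raise ValueError("temperatures must be positive absolute temperatures")
    return math.exp((u_a_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_test_k))


# Hypothetical case: Ua = 0.7 eV, use at 55 C, test at 125 C.
af = arrhenius_acceleration_factor(0.7, 55.0 + 273.15, 125.0 + 273.15)
print(f"acceleration factor = {af:.1f}")
```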
  54. 54. Twelve steps to be conducted to add value to the existing practice-3
      8) Based on the obtained information, the state-of-the-art in the area in question and the requirements of the existing specifications, decide on the allowable (acceptable) values of the characteristics of failure, with consideration of the economically and technically feasible lifetime of the module and its major subassemblies
      9) Write the first draft of the qualification specs (in other words, revise, if necessary, the existing JEDEC specs) for the module and its units/subunits of interest
      10) Develop root cause analysis (RCA) methodologies
      11) Decide on the burn-in conditions and establish an adequate service for collecting field failures
      12) Conduct, on a permanent basis, revisions of the designs and the reliability specifications.
  55. 55. 9. DO ELECTRONIC INDUSTRIES NEED NEW APPROACHES TO QUALIFY THEIR DEVICES INTO PRODUCTS? “I do not need an everlasting pen. I do not intend to live forever” I. Ilf and E. Petrov, “The Golden Calf” (in Russian) “It is always better to be approximately right than precisely wrong” Unknown Reliability Manager
  56. 56. Nobody and nothing is perfect: probability of failure is never zero It should be widely recognized that the probability of a failure is never zero, but could be predicted and, if necessary, controlled and maintained at an acceptably low level. One effective way to achieve this is to implement the existing methods and approaches of PRM techniques and to develop adequate PDfR methodologies. These methodologies should be based mostly on FOAT and on a widely employed predictive modeling effort. FOAT should be carried out in a relatively narrow but highly focused and time-effective fashion for the most vulnerable elements of the design of interest. If the QT has a solid basis in FOAT, PM and PDfR, then there is reason to believe that the product of interest will be sufficiently robust in the field.
  57. 57. QT could be viewed as “quasi-FOAT” The QT could be viewed as “quasi-FOAT,” as a sort of “initial stage of FOAT” that more or less adequately replicates the initial non-destructive, yet full-scale, stage of FOAT. We believe that such an approach to qualify devices into products will enable industry to specify, and the manufacturers to assure, a predicted and low enough probability of failure for a device that passed the QT and will be operated in the field under the given conditions for the given time. We expect that the suggested approach to the DfR and QT will be accepted by the engineering and manufacturing communities, implemented into the engineering practice and be adequately reflected in the future editions of the QT specifications and methodologies.
  58. 58. The PDfR-based QT will still be non-destructive Such QTs could be designed, therefore, as a sort of mini-FOAT that, unlike the actual “full-scale” FOAT, is non-destructive and conducted on a limited scale. The duration and conditions of such “mini-FOAT” QT should be established based on the observed and recorded results of the actual FOAT, and should be limited to the stage when no failures in the actual full-scale FOAT were observed. Prognostics and health management (PHM) technologies (such as “canaries”) should be concurrently tested to make sure that the safe limit is not exceeded.
  59. 59. What should be done differently It is important to understand the reliability physics that underlies the mechanisms and modes of failure in electronics and photonics components and devices. FOAT should be thoroughly implemented, so that the QT is based on the FOAT information and data. The PDfR concept should be widely employed. FOAT cannot do without predictive modeling: the role of such modeling, both computer-aided and analytical (“mathematical”), in making the suggested new approach to product qualification practical and successful is important.
  60. 60. 10. CONCLUDING REMARKS “Life is the art of drawing sufficient conclusions from insufficient premises” Samuel Butler, British poet and satirist, “The Way of All Flesh”
  61. 61. Conclusions-1 Improvements in the existing QT, as well as in the existing best QT practices, are indeed possible, provided that the Probabilistic Design for Reliability (PDfR) concept is thoroughly developed and the corresponding methodologies are employed. One effective way to improve the existing QT and specs is to: conduct, on a wide scale, Failure Oriented Accelerated Testing (FOAT) at the design stage (DFOAT) and at the manufacturing stage (MFOAT), and, since DFOAT cannot do without PM, carry out, whenever and wherever possible, predictive modeling (PM) to understand the physics of failure and to accumulate, when appropriate, failure statistics; revisit, review and revise the existing QT and specs considering the DFOAT and, to a lesser extent, MFOAT data for the most vulnerable elements of the device of interest; develop and widely implement the PDfR methodologies, having in mind that “nobody and nothing is perfect”, that the probability of failure is never zero, but could be predicted and, if necessary, controlled and maintained during operation at an acceptably low level.
  62. 62. Conclusions-2 We believe that our new approach to the qualification of the electronic devices will enable industry to specify and the manufacturers to assure a predicted and low enough probability of failure for a device that passed the qualification specifications and will be operated under the given stress (not necessarily mechanical) conditions for the given time. We expect that eventually the suggested new approaches to the DfR and QT will be accepted by the engineering and manufacturing communities, implemented in a timely fashion into the engineering practice and be adequately reflected in the future editions of the qualification specifications and methodologies.
  63. 63. Thank you for taking my course © 2009 Dr. E. Suhir