Probabilistic Design for Reliability (PDfR) in Electronics, Part 1 of 2



This is a four-part lecture series. The course is designed for reliability engineers working in the electronics, opto-electronics and photonics industries. It explains the roles of Highly Accelerated Life Testing (HALT) in design and manufacturing, with the emphasis on design (HALT in manufacturing is the well-known approach of the late Greg Hobbs), and teaches what could and should be done to design, when high reliability is a must, a product with a predicted, specified (“prescribed”) and, if necessary, even controlled low probability of field failure.
Part 1:
• Reliability Engineering (RE) as part of Applied Probability (AP) and Probabilistic Risk Management (PRM)
• Accelerated Testing (AT) and its categories
• Qualification Testing (QT), Accelerated Testing and Highly Accelerated Life Testing (HALT)
• Predictive Modeling (PM) and its role
Part 2:
• The most widespread HALT models:
  1) power law (used when the physics of failure (PoF) is unclear);
  2) Boltzmann-Arrhenius equation (used when elevated temperature is the major cause of failure);
  3) Coffin-Manson equation (an inverse power law used to evaluate low-cycle fatigue lifetime);
  4) crack growth equations (used to evaluate the fracture toughness of brittle materials);
  5) Bueche-Zhurkov and Eyring equations (used to consider the combined effect of high temperature and mechanical loading);
  6) Peck equation (used to evaluate the combined effect of elevated temperature and relative humidity);
  7) Black equation (used to evaluate the combined effect of elevated temperature and current density);
  8) Miner-Palmgren rule (used to assess fatigue lifetime when the yield stress of the material is not exceeded);
  9) creep rate equations;
  10) weakest link model (applicable to extremely brittle materials with defects);
  11) stress-strength (demand-capacity) interference model
• Example: typical HALT for an assembly subjected to thermal loading
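The acceleration models listed for Part 2 are not derived in this part of the course. As a preview only, the Python sketch below implements the acceleration-factor (AF) forms in which several of them are commonly written: Boltzmann-Arrhenius, a temperature-range form of Coffin-Manson, Peck, Black, and the Palmgren-Miner damage sum. The function names, the Celsius inputs and all parameter values are illustrative assumptions; activation energies and exponents are mechanism-specific and have to come from FOAT data, not from this sketch.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def _kelvin(t_c):
    return t_c + 273.15

def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Boltzmann-Arrhenius AF: how much faster the mechanism runs at t_test_c than at t_use_c."""
    return math.exp(ea_ev / BOLTZMANN_EV * (1.0 / _kelvin(t_use_c) - 1.0 / _kelvin(t_test_c)))

def coffin_manson_af(delta_t_test, delta_t_use, n):
    """Coffin-Manson (inverse power law) AF, written for thermal-cycling temperature ranges."""
    return (delta_t_test / delta_t_use) ** n

def peck_af(rh_test, rh_use, n, ea_ev, t_use_c, t_test_c):
    """Peck model AF: relative-humidity power law times the Arrhenius temperature term."""
    return (rh_test / rh_use) ** n * arrhenius_af(ea_ev, t_use_c, t_test_c)

def black_af(j_test, j_use, n, ea_ev, t_use_c, t_test_c):
    """Black's equation AF: current-density power law times the Arrhenius term."""
    return (j_test / j_use) ** n * arrhenius_af(ea_ev, t_use_c, t_test_c)

def miner_damage(cycles_applied, cycles_to_failure):
    """Palmgren-Miner cumulative damage sum; failure is predicted when it reaches 1."""
    return sum(n / n_f for n, n_f in zip(cycles_applied, cycles_to_failure))

# Illustrative (assumed) values only.
af = arrhenius_af(ea_ev=0.7, t_use_c=55.0, t_test_c=125.0)
print(f"Arrhenius AF (Ea = 0.7 eV, 55 C use, 125 C test): {af:.1f}")
```

With these assumed values the Arrhenius AF is about 78: one hour at 125 °C stands in for roughly 78 hours at 55 °C, but only if the same failure mechanism operates at both temperatures.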


  1. Probabilistic Design for Reliability (PDfR) in Electronics, Part I. Dr. E. Suhir. ©2011 ASQ & Presentation Suhir. Presented live on January 3-6, 2011. Calendar/Short Courses/Short_Courses.html
  2. ASQ Reliability Division Short Course Series. One of the monthly webinars on topics of interest to reliability engineers. To view the recorded webinar (available to ASQ Reliability Division members only), visit … To sign up for the free live webinars, which are open to anyone, visit … and select English Webinars to find links to register for upcoming events.
  3. PROBABILISTIC DESIGN for RELIABILITY (PDfR) CONCEPT, the Roles of Failure Oriented Accelerated Testing (FOAT) and Predictive Modeling (PM), and a Novel Approach to Qualification Testing (QT)
    “You can see a lot by observing” (Yogi Berra, American baseball player)
    “It is easy to see, it is hard to foresee” (Benjamin Franklin, American scientist and statesman)
    E. Suhir: Bell Laboratories, Physical Sciences and Engineering Research Division, Murray Hill, NJ (ret.); University of California, Dept. of Electrical Engineering, Santa Cruz, CA; University of Maryland, Dept. of Mechanical Engineering, College Park, MD; and ERS Co. LLC, 727 Alvina Ct., Los Altos, CA 94024, USA. Tel. 650-969-1530, cell 408-410-0886, e-mail:
    Four-hour ASQ-IEEE RS webinar short course. Dr. E. Suhir, January 3-6, 2011, Page 1
  4. Contents
    Session I
      1. Introduction: background, motivation, incentive
      2. Reliability engineering as part of applied probability and Probabilistic Risk Management (PRM) bodies of knowledge
      3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls and interaction with other accelerated test categories
    Session II
      4. Predictive Modeling (PM): FOAT cannot do without it
      5. Example of a FOAT: physics, modeling, experimentation, prediction
    Session III
      6. Probabilistic Design for Reliability (PDfR), its role and significance
    Session IV
      7. General PDfR approach using probability density functions (pdf)
      8. Twelve steps to be conducted to add value to the existing practice
      9. Do electronic industries need new approaches to qualify their devices into products?
      10. Concluding remarks
  5. Session I. 1. Introduction: background, motivation, incentive
    “Vision without action is a daydream. Action without vision is a nightmare” (Japanese saying)
    “The problem is not that old age comes. The problem is that young age passes” (common wisdom)
  6. Background
    • The short-term, down-to-earth and practical goal of a particular electronic or photonic device manufacturer is to conduct and pass the established qualification tests, without questioning whether they are perfect or not.
    • The ultimate, long-term and broad goal of the electronic, opto-electronic and photonic industries, regardless of a particular manufacturer or even a particular product, is to make the industries’ deliverables sufficiently reliable in the field and consistently good in performance, and so to earn the trust of the customer.
    • Qualification testing (QT), such as that prescribed by the JEDEC, Telcordia, AEC or MIL specs, is the major means that the electronic, opto-electronic and photonic industries use to make their viable-and-promising devices into reliable-and-marketable products.
  7. Motivation
    • It is well known, however, that devices and systems that passed the existing qualification tests often fail in the field. Should it be this way? Is this a problem indeed? Are the existing qualification specifications adequate? Do electronic and photonic industries need new approaches to qualify their devices into products? If they do, could today’s qualification specifications and testing procedures be improved to an extent that, if the device passed these tests, its performance in the field would be satisfactory?
    • On the other hand, there is a perception, perhaps a rather substantiated one, that some electronic components “never fail”. Although one should never say “never”, such a perception exists because some products might be too robust and, as a consequence, are more costly than necessary. Could the situation be changed, and could the cost be brought down considerably, if one were able to assess the actual, most likely superfluous, probability of non-failure in the field and come up, for a particular product, with the best compromise between reliability, cost and time-to-market?
    • Would it be possible to “prescribe” (specify), predict and, if necessary, even control a low enough probability of failure for a product that operates under the given stress (not necessarily mechanical, of course) conditions for the given time?
  8. Incentive
    We argue that improvements in QT, as well as in the existing best practices, are indeed possible, provided that the Probabilistic Design for Reliability (PDfR) concept is thoroughly developed and the corresponding methodologies are employed. One effective way to improve the existing QT and specs is to:
    • conduct, on a wide scale, appropriate Failure Oriented Accelerated Testing (FOAT) at both the design stage (DFOAT) and the manufacturing stage (MFOAT);
    • since DFOAT cannot do without predictive modeling (PM), carry out PM, whenever and wherever possible, to understand the physics of failure and to predict, based on the DFOAT, the probability of failure in the field;
    • revisit, review and revise the existing QT practices, procedures and specifications, considering the DFOAT and, to a lesser extent, MFOAT data obtained for the most vulnerable elements of the device of interest;
    • develop and widely implement PDfR methodologies and algorithms, keeping in mind that “nobody and nothing is perfect” and that the probability of failure in the field is never zero, but could be predicted and, if necessary, minimized, controlled and maintained at an acceptably low level during product operation.
  9. 2. Reliability engineering as part of applied probability and Probabilistic Risk Management (PRM) bodies of knowledge
    “A pinch of probability is worth a pound of perhaps” (James G. Thurber, American writer and cartoonist)
    “In the long run we are all dead” (John Maynard Keynes, British economist)
  10. Reliability engineering:
    • deals with failure modes and mechanisms, the “root” causes of various failures, the role of various defects, methods to estimate and prevent failures, and probability-based designs for reliability;
    • provides guidance on how to make a viable device into a reliable and marketable product;
    • for products in which a certain level of failures is considered acceptable (e.g., consumer products), examines ways of bringing the failure rate down to an allowable level;
    • for products in which a failure is a catastrophe, examines ways of making the probability of failure as low as necessary or possible.
  11. Reliability engineering as part of applied probability and probabilistic risk management (PRM) bodies of knowledge
    • Reliability is part of the applied probability and PRM bodies of knowledge and includes the items’ (systems’) dependability, durability, maintainability, reparability, availability, testability, etc., i.e., the probabilities of the corresponding events or characteristics.
    • Each of these characteristics is measured as a certain probability and could be of greater or lesser importance depending on the particular function and operating conditions of the item or system, and on the consequences of failure.
    • Applied probability and PRM approaches and techniques put the art of reliability engineering on a solid, “reliable” ground.
  12. “If a man will begin with certainties, he will end with doubts; but if he will be content to begin with doubts, he shall end in certainties.” (Sir Francis Bacon, English philosopher and statesman)
    “We see that the theory of probability is at heart only common sense reduced to calculations; it makes us appreciate with exactitude what reasonable minds feel by a sort of instinct, often without being able to account for it… The most important questions of life are, for the most part, really only problems of probability.” (Pierre-Simon, Marquis de Laplace)
    “Mathematical formulas have their own life, they are smarter than we, even smarter than their authors, and provide more than what has been put into them.” (Heinrich Hertz, German physicist)
  13. Reliability should be taken care of on a permanent basis
    Reliability evaluation and assurance cannot be delayed until the device is made (although this is often the case in industry). Reliability should be:
    • “conceived” at the early stages of design (a reliability engineer and an electronic engineer should start working together from the very beginning of the device/system development);
    • implemented during manufacturing (through a high-quality manufacturing process);
    • qualified and evaluated by electrical, optical, environmental and mechanical testing, both at the design and the manufacturing stages (the customer requirements and the general qualification requirements are to be considered);
    • checked (screened) during production (by implementing an adequate burn-in process);
    • if necessary and appropriate, maintained in the field during the product’s operation, especially at the early stages of the product’s use (by employing, e.g., technical diagnostics, prognostication and health monitoring methods and instrumentation).
  14. Three classes of engineering products from the reliability point of view (see E. Suhir, Applied Probability for Engineers and Scientists, McGraw-Hill, 1997)
    • Class I. The product has to be made as reliable as possible. Failure should not be permitted. Examples are some military or space objects.
    • Class II. The product has to be made as reliable as possible, but only for a certain level of demand (stress, loading). Failure is a catastrophe. Examples are civil engineering structures, bridges, ships, aircraft, cars.
    • Class III. The reliability does not have to be very high. Failures are permitted, but should be restricted. Examples are consumer products, commercial electronics, agricultural equipment.
  15. Class I (military or similar) products
    • The product (object) has to be made as reliable as possible. Failure is viewed as a catastrophe. Examples are some weapons systems, military aircraft, battleships and spacecraft.
    • Cost is not a dominating factor.
    • The products usually have a single customer, such as the government or a big firm.
    • The reliability requirements are defined in the form of government standards.
    • The standards not only formulate the reliability requirements for the product, but also specify the methods to be used to prove (demonstrate) the reliability, and often even prescribe how the system must be manufactured, tested and screened.
    • It is typically the customer, not the manufacturer, who sets the reliability standards.
  16. Class II (industrial or similar) products
    • The product (system, structure) has to be made as reliable as possible, but only for a certain specified level of loading (demand). If the actual load (waves, winds, earthquakes, etc.) happens to be larger than the design demand, the product might fail, although the probability of such a failure should be determined beforehand and should/could be (made) very small.
    • Examples are long-haul communication systems, civil engineering structures (bridges, tunnels, towers), passenger elevators, ocean-going vessels, offshore structures, commercial aircraft, railroad carriages, cars and some medical equipment.
    • These are highly expensive products that are produced in large quantities; therefore, applying Class I requirements would lead to unjustifiable, unfeasible and unacceptable expenses. Failure is a catastrophe and might be associated with loss of human lives and with significant economic losses.
    • The products are typically intended for industrial, rather than government, markets. These markets are characterized by a rather high volume of production (buildings, bridges, ships, aircraft, automobiles, telecommunication networks, etc.), but also by fewer and more sophisticated customers than in the commercial (Class III) market.
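The Class II notion that the actual load may exceed the design demand, with a probability to be determined beforehand, is what the stress-strength (demand-capacity) interference model listed for Part 2 quantifies. A minimal sketch, assuming (for illustration only) that the load (demand) and the capacity are independent and normally distributed; real loads such as waves or earthquakes are often better described by extreme-value distributions. The numbers are hypothetical and in arbitrary load units.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def interference(mu_capacity, sd_capacity, mu_demand, sd_demand):
    """Return (probability that demand exceeds capacity, safety index beta).

    For independent normal capacity C and demand D, the margin M = C - D is
    normal with mean mu_C - mu_D and standard deviation sqrt(sd_C**2 + sd_D**2),
    so P(M < 0) = Phi(-beta) with beta = mean(M) / sd(M).
    """
    beta = (mu_capacity - mu_demand) / math.hypot(sd_capacity, sd_demand)
    return normal_cdf(-beta), beta

# Hypothetical numbers in arbitrary load units.
p_fail, beta = interference(mu_capacity=100.0, sd_capacity=10.0,
                            mu_demand=60.0, sd_demand=15.0)
print(f"safety index = {beta:.2f}, probability of failure = {p_fail:.4f}")
```

Here the mean capacity exceeds the mean demand by 40 units, yet the probability of failure is still about 1.3%; narrowing either distribution (better control of materials, better knowledge of loads) raises the safety index and lowers that probability.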
  17. Class III (consumer, commercial) products
    • The typical market is the consumer market. An individual consumer is a very small part of the total consumer base. The product is inexpensive and manufactured in mass quantities.
    • The demand for the product is usually driven by its cost and time-to-market, rather than by its reliability. As long as the product is “sellable”, its reliability does not have to be very high: it should only be adequate for customer acceptance and reasonable satisfaction. Simple and innovative products that have a high degree of customer appeal and are in significant demand may be able to prosper, at least for some time, even if they are not very reliable.
    • Failure is not a catastrophe: a certain reasonable level of failures during normal operation of the product is acceptable, as long as the failure rate is within the anticipated range.
    • Reliability testing is limited, and improvements are often implemented based on field feedback.
    • It is typically the manufacturer, not the consumer, who sets the reliability standards, if any, for the product. Often no special reliability standards are followed, and customer satisfaction (on a statistical basis) is the major criterion of the viability and quality of the product.
  18. Reliability, cost-effectiveness, and time-to-market
    • Reliability, cost-effectiveness and time-to-market considerations play an important role in design, materials selection and manufacturing decisions, and are the key issues in competing in the global marketplace. A company cannot be successful if its products are not cost-effective or do not have a worthwhile lifetime and service reliability to match the expectations of the customer. Too low a reliability can lead to a total loss of business.
    • Product failures have an immediate, and often dramatic, effect on the profitability and even the very existence of a company. Profits decrease as the failure rate increases. This is due not only to the increase in the cost of replacing or repairing parts but, more importantly, to the losses due to the interruption in service, not to mention the “moral losses”. These make obvious dents in the company’s reputation and, as a consequence, affect its sales.
    • The time to develop and produce products is rapidly decreasing. This places significant pressure on both business people and reliability engineers, who are supposed to come up with a reliable product and confirm its long-term reliability in a short period of time, to make their device a product and to make this product successful in the marketplace.
    • Each business, whether small or large, should try to optimize its overall approach to reliability. “Reliability costs money”, and therefore a business must understand the cost of reliability: both the “direct” cost (the cost of its own operations) and the “indirect” cost (the cost to its customers and their willingness to make future purchases and to pay more for more reliable products).
  19. 3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls and interaction with other accelerated test categories
    “Nothing is impossible. It is often merely for an excuse that we say that things are impossible” (François de La Rochefoucauld, French philosopher)
    “Truth is rarely pure and never simple” (Oscar Wilde, British writer, “The Importance of Being Earnest”)
  20. Why accelerated tests?
    • It is impractical and uneconomical to wait for failures when the mean time to failure of a typical present-day electronic device (equipment) is on the order of hundreds of thousands of hours.
    • Accelerated testing (AT) enables one to gain greater control over the reliability of a product.
    • AT has become a powerful means of improving reliability. This is true regardless of whether (irreversible or reversible) failures actually occur during FOAT (“testing to fail”) or QT (“testing to pass”).
    • In order to accelerate the material’s (device’s) degradation and/or failure, one has to deliberately “distort” (“skew”) one or more parameters (temperature, humidity, load, current, voltage, etc.) affecting the device’s functional and/or mechanical performance and/or its environmental durability.
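The first point above can be made concrete with a constant-failure-rate (exponential) model, which is an assumption here: it ignores infant mortality and wear-out. The sketch computes the expected number of failures in a test of n units; the unit count, the test length, the MTTF of 300,000 hours and the acceleration factor of 80 are hypothetical.

```python
import math

def expected_failures(n_units, test_hours, mttf_hours, accel_factor=1.0):
    """Expected number of failures among n_units under an exponential life model.

    Each test hour is assumed to count as `accel_factor` hours of use.
    """
    equivalent_hours = accel_factor * test_hours
    return n_units * (1.0 - math.exp(-equivalent_hours / mttf_hours))

# Hypothetical numbers: 50 units, a 1000-hour test, MTTF = 300,000 h.
unaccelerated = expected_failures(50, 1000, 300_000)
accelerated = expected_failures(50, 1000, 300_000, accel_factor=80)
print(f"unaccelerated: {unaccelerated:.2f} expected failures")
print(f"AF = 80:       {accelerated:.2f} expected failures")
```

Without acceleration, fewer than one failure (about 0.17) is expected, so the test reveals almost nothing about the physics or statistics of failure; with AF = 80, about a dozen failures are expected in the same test time, provided the acceleration does not shift the failure mechanism.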
  21. Accelerated test categories: traditional definitions
    • Product development (verification) tests (PDTs)
      - Objective: technical feedback to ensure that the taken design approach is viable (acceptable)
      - End point: time, type, level and/or number of failures
      - Follow-up activity: failure analysis, design decision
      - Perfect (ideal) test: specific definition(s)
    • Qualification (“screening”) tests (QTs)
      - Objective: proof of reliability; demonstration that the product is qualified to serve in the given capacity
      - End point: predetermined time and/or number of cycles, and/or an excessive (unexpected) number of failures
      - Follow-up activity: pass/fail decision
      - Perfect (ideal) test: no failures in a long time
    • Accelerated life tests (ALTs), highly accelerated life tests (HALTs) and failure oriented accelerated tests (FOATs)
      - Objective: understand the modes and mechanisms of failure and, time permitting, accumulate failure statistics
      - End point: predetermined number or percent of failures
      - Follow-up activity: failure analysis and, time permitting, statistical analysis of the test data
      - Perfect (ideal) test: numerous failures in a short time
  22. Accelerated test categories: updated definitions
    • Product development (verification) testing (PDT)
      - Objective: technical feedback to ensure that the taken design approach is viable (acceptable)
      - End point: time, type, level and/or number of failures
      - Follow-up activity: failure analysis, design decision
      - Perfect (ideal) test: specific definitions
    • Qualification (“screening”) testing (QT), at the design stage (DQT) and at the manufacturing stage (MQT)
      - Objective: proof of reliability; demonstration that the item is qualified into a product, i.e., is able to serve in the given capacity
      - End point: predetermined time and/or number of cycles, and/or an excessive (unexpected) number of failures
      - Follow-up activity: pass/fail decision
      - Perfect (ideal) test: no failures in a long time
    • Accelerated life testing (ALT) = failure oriented accelerated testing (FOAT) at the design stage (DFOAT)
      - Objective: understand the physics (modes and mechanisms) of failure and the failure limits and, time permitting, accumulate failure statistics
      - End point: predetermined number or percent of failures
      - Follow-up activity: failure analysis and, time permitting, also statistical analysis of the test data
      - Perfect (ideal) test: numerous failures in a short time
    • FOAT at the manufacturing stage (MFOAT) = Hobbs’ highly accelerated life testing (HHALT) = burn-in
      - Objective: assess failure limits; weed out infant mortalities
      - End point: predetermined number or percent of failures
      - Follow-up activity: pass/fail decision
      - Perfect (ideal) test: no failures in a long time
  23. Some of the most common accelerated test conditions (stimuli): High-Temperature (Steady-State) Soaking/Storage/Baking/Aging/Dwell; Low-Temperature Storage; Temperature (Thermal) Cycling; Power Cycling; Power Input and Output; Thermal Shock; Thermal Gradients; Fatigue (Crack Propagation) Tests; Mechanical Shock; Drop Shock Tests; Random Vibration Tests; Sinusoidal Vibration Tests (with a given or variable frequency); Creep/Stress-Relaxation Tests; Electrical Current Extremes; Voltage Extremes; High Humidity; Radiation (UV, cosmic, X-rays); Altitude; Space Vacuum.
  24. Elevated stress
    • AT uses an elevated stress level and/or a higher stress-cycle frequency as effective stimuli to precipitate failures over a much shorter time frame.
    • The “stress” in reliability engineering does not have to be mechanical or thermo-mechanical: it could be electrical current or voltage, high (or low) temperature, high humidity, high frequency, high pressure or vacuum, cycling rate, or any other factor (stimulus) responsible for the reliability of the device or the equipment.
    • AT must be specifically designed for the product under test.
    • The experimental design of AT should consider the anticipated failure modes and mechanisms, typical use conditions, and the required or available test resources, approaches and techniques.
  25. Qualification Testing (QT) is a must
    • The objective of qualification testing (QT) is to prove that the reliability of the product under test is above a specified level. This level is usually measured by the percentage of failures per lot and/or by the number of failures per unit time (failure rate). The typical requirement is no more than a few percent of failing parts out of the total lot (population).
    • QT enables one to “reduce to a common denominator” different products, as well as similar products produced by different manufacturers.
    • QT reflects the state of the art in a particular field of engineering and the typical requirements for the performance of the product. Industry cannot do without QT.
    • Testing is time-limited and generally non-destructive (not failure oriented).
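What a zero-failure qualification test can demonstrate statistically is captured by the classical success-run relation (a standard binomial result, not taken from the slides): if n units all survive the test, the reliability at the test conditions and duration is at least R with confidence C when n ≥ ln(1 − C) / ln(R). A minimal sketch; the 97% reliability target mirrors the “no more than a few percent failing parts” requirement, and the 90% confidence level is an assumption.

```python
import math

def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Smallest n such that n passes with zero failures demonstrate
    `reliability` at `confidence` (binomial success-run formula)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

n = success_run_sample_size(reliability=0.97, confidence=0.90)
print(f"units that must pass with zero failures: {n}")
```

About 76 units must pass with no failures. What this does not give is equally important: the result applies only to the test stress and test duration and says nothing about the probability of failure after a given time in the field.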
  26. Today’s qualification testing (QT): shortcomings
    • Today’s qualification standards and requirements are only good for what they are intended for: to confirm that the given device is qualified into a product to serve in a particular capacity.
    • If a product passed the standardized qualification tests, it is not always clear why it was good; and if the product failed the tests, it is equally unclear what could be done to improve its reliability.
    • Passing the qualification tests does not mean that there will be no failures in the field, nor is it clear how likely or unlikely these failures might be.
    • Since QT is not failure oriented, it is unable to provide the most important, ultimate information about the reliability of the product: the probability of its failure after the given time in service and under the given service (operation, stress) conditions.
  27. Failure Oriented Accelerated Testing (FOAT)-1
    • FOAT is aimed at revealing and understanding the physics of expected or occurred failures. Unlike QT, FOAT is able to detect the possible failure modes and mechanisms.
    • Another objective of FOAT is to accumulate failure statistics. Thus, FOAT deals with the two major aspects of reliability engineering: the physics and the statistics of failure.
    • Adequately planned, carefully conducted and properly interpreted FOAT provides a consistent basis for predicting the probability of failure after a given time in service. Well-designed and thoroughly implemented FOAT can dramatically facilitate the solution of many engineering and business problems associated with cost-effectiveness and time-to-market.
    • This information can be helpful in understanding what should be changed to design a viable and reliable product. Indeed, any structural, materials and/or technological improvement can be “translated”, using the FOAT data, into the probability of failure for the given duration of operation under the given service (environmental) conditions.
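A minimal sketch of how FOAT data might be “translated” into a field failure probability, under stated assumptions: the failure times follow a two-parameter Weibull distribution, the test accelerates the same mechanism that acts in the field (same shape parameter), and a known acceleration factor AF stretches the test time scale into the field one. Median-rank regression is used here for brevity; it is one of several fitting methods, and maximum likelihood is generally preferred for small or censored samples. The failure times and the AF value are invented.

```python
import math

def fit_weibull_mrr(failure_times):
    """Median-rank regression fit of a two-parameter Weibull to complete failure data.

    Uses Benard's approximation F_i = (i - 0.3) / (n + 0.4) and regresses
    ln(-ln(1 - F)) on ln(t); returns (beta, eta).
    """
    times = sorted(failure_times)
    n = len(times)
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                     # slope is the shape parameter
    intercept = y_mean - beta * x_mean   # intercept equals -beta * ln(eta)
    return beta, math.exp(-intercept / beta)

def field_failure_probability(t_field_hours, beta, eta_test_hours, accel_factor):
    """P(failure by t_field), assuming eta_field = AF * eta_test and the same beta."""
    eta_field = accel_factor * eta_test_hours
    return 1.0 - math.exp(-((t_field_hours / eta_field) ** beta))

# Hypothetical FOAT failure times (hours at the test condition).
foat_hours = [410, 620, 780, 905, 1040, 1230, 1450, 1800]
beta, eta = fit_weibull_mrr(foat_hours)
p10 = field_failure_probability(87_600, beta, eta, accel_factor=400.0)  # ~10 years
print(f"beta = {beta:.2f}, eta_test = {eta:.0f} h, P(fail in 10 years) = {p10:.3%}")
```

Any design or process change that shifts the fitted parameters, or the acceleration factor established by predictive modeling, shows up directly as a change in the predicted field failure probability.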
  28. Failure Oriented Accelerated Testing (FOAT)-2
    • FOAT should be conducted in addition to, and preferably long before, the qualification tests. There might also be situations in which FOAT can be used as an effective substitute for QT, especially for new products for which acceptable qualification standards do not yet exist.
    • While it is QT that makes a device into a product, it is FOAT that enables one to understand the reliability physics behind the product and, based on the appropriate PM, to create a reliable product with a predicted probability of failure.
    • There is always a temptation to broaden (enhance) the stress as far as possible to achieve the maximum “destructive effect” (FOAT effect) in the shortest period of time. Unfortunately, accelerated test conditions may sometimes hasten failure mechanisms that are different from those actually observed in service conditions (a “shift” in the modes and/or mechanisms of failure).
  29. FOAT pitfalls
    • Because of such “shifts”, it is always necessary to correctly identify the expected failure modes and mechanisms and to establish appropriate stress limits, in order to prevent “shifts” in the original (actual) dominant failure mechanism.
    • Examples are changes in materials properties at high or low temperatures, time-dependent strain due to diffusion, creep at elevated temperatures, the occurrence and movement of dislocations caused by elevated stress, or a situation in which a bimodal distribution of failures (a dual mechanism of failure) occurs.
    • Since infant mortality (“early”) failures, in particular, might occur concurrently with the anticipated (“operational”) failures, it is imperative to make sure that the “early” and “operational” failures are well separated in the tests.
    • Different failure mechanisms are characterized by different physical phenomena and different activation energies; therefore, a simple superposition of the effects of two mechanisms is unacceptable: it can result in erroneous reliability projections.
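The last point can be illustrated with a hypothetical pair of mechanisms, each assumed to have a constant failure rate that follows the Arrhenius law with its own activation energy, acting as independent competing risks. The sketch compares scaling each mechanism separately with lumping all test failures together and extrapolating them with a single activation energy. All rates, activation energies and temperatures are invented for illustration.

```python
import math

K_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def rate_at(rate_ref, ea_ev, t_c, t_ref_c):
    """Scale a constant failure rate from t_ref_c to t_c (Celsius) with Arrhenius."""
    t, t_ref = t_c + 273.15, t_ref_c + 273.15
    return rate_ref * math.exp(-ea_ev / K_EV * (1.0 / t - 1.0 / t_ref))

T_TEST_C, T_USE_C = 150.0, 55.0
# Hypothetical mechanisms: (failure rate per hour at the test temperature, Ea in eV).
mechanisms = {"A": (1.0e-3, 1.0), "B": (5.0e-5, 0.3)}

# Each mechanism scaled with its own activation energy.
use_rates = {m: rate_at(r, ea, T_USE_C, T_TEST_C) for m, (r, ea) in mechanisms.items()}
use_total = sum(use_rates.values())

# Lumped projection: all test failures treated as mechanism A (Ea = 1.0 eV).
test_total = sum(r for r, _ in mechanisms.values())
naive_use_total = rate_at(test_total, 1.0, T_USE_C, T_TEST_C)

for m, r in use_rates.items():
    print(f"mechanism {m}: share at test = {mechanisms[m][0] / test_total:.1%}, "
          f"share at use = {r / use_total:.1%}")
print(f"use-condition rate: separate = {use_total:.2e}/h, lumped = {naive_use_total:.2e}/h")
```

With these invented numbers, mechanism A accounts for about 95% of the test failures, but mechanism B dominates at 55 °C, and the lumped projection underestimates the field failure rate by more than an order of magnitude.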
  30. Burn-in testing (BIT) is a special type of FOAT-1
    • Burn-in (“screening”) testing (BIT) is widely implemented to detect and eliminate infant mortality failures. BIT could be viewed as a special type of manufacturing FOAT (MFOAT). BIT is needed to stabilize the performance of the device in use.
    • BIT is supposed to stimulate failures in defective devices by accelerating the stresses that will cause these devices to fail, without damaging good items. The bathtub curve of a device that has undergone BIT is supposed to consist of the steady-state and wear-out portions only.
    • The rationale behind BIT is based on the concept that mass production of electronic devices generates two categories of products that passed QT: 1) robust (“strong”) components that are not expected to fail in the field, and 2) relatively unreliable (“weak”) components (“freaks”) that, if shipped to the customer, will most likely fail in the field.
    • BIT can be based on high temperatures, thermal cycling, voltage, current density, high humidity, etc., and is performed either by the manufacturer or by an independent test house.
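The two-population rationale above can be sketched as a mixture model. Assumptions, all hypothetical and for illustration only: a small fraction of “weak” parts with a short Weibull life (shape below 1), a majority of “strong” parts with a long, wear-out-dominated Weibull life, and burn-in duration already converted into equivalent hours at use conditions (e.g., with an acceleration factor). The sketch reports the share of weak parts among burn-in survivors and the fraction of strong parts lost.

```python
import math

def weibull_reliability(t, beta, eta):
    """Probability of surviving to time t for a two-parameter Weibull."""
    return math.exp(-((t / eta) ** beta))

def after_burn_in(weak_fraction, weak, strong, t_burn_in):
    """Return (share of weak parts among survivors, fraction of strong parts lost).

    `weak` and `strong` are (beta, eta) tuples; t_burn_in is in equivalent use hours.
    """
    r_weak = weibull_reliability(t_burn_in, *weak)
    r_strong = weibull_reliability(t_burn_in, *strong)
    surviving_weak = weak_fraction * r_weak
    surviving_strong = (1.0 - weak_fraction) * r_strong
    return surviving_weak / (surviving_weak + surviving_strong), 1.0 - r_strong

WEAK = (0.8, 50.0)       # hypothetical "freaks": beta < 1, eta = 50 h
STRONG = (3.0, 1.0e6)    # hypothetical healthy parts: wear-out, eta = 1e6 h
for t_b in (0.0, 100.0, 500.0, 2000.0):
    share, lost = after_burn_in(0.02, WEAK, STRONG, t_b)
    print(f"burn-in {t_b:6.0f} h: weak share {share:.2e}, strong parts lost {lost:.1e}")
```

In this idealized case, where the weak parts have a much shorter life than the healthy ones, the weak share drops steeply within a few hundred equivalent hours while the loss of strong parts stays negligible. The survival count does not capture the life consumed by the surviving strong parts.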
  31. Burn-ins: a special type of FOAT-2
    • For products that will be shipped out to the customer, BIT is nondestructive.
    • BIT is a costly process, and therefore its application must be thoroughly monitored. BIT is mandatory in most high-reliability procurement contracts, such as those for defense, space and telecommunication systems. In today’s practice, BIT is often used for consumer products as well. For military applications, BIT can last as long as a week (168 hours). For commercial applications, burn-ins typically do not last longer than two days (48 hours).
    • Optimum BIT conditions can be established by assessing the main expected failure modes and their activation energies, and by analyzing the failure statistics during BIT.
    • Special investigations are usually required if one wishes to ensure that cost-effective BIT of smaller quantities is acceptable. A cost-effective simplification can be achieved if BIT is applied to the complete equipment (assembly or subassembly), rather than to individual components, unless it is a large system fabricated of several separately testable assemblies.
  32. Burn-ins: a special type of FOAT-3
    • Although there is always a possibility that some defects might escape BIT, it is more likely that BIT will introduce some damage to the “healthy” structure and/or “consume” a certain portion of the product’s useful service life: BIT not only “fights” infant mortality, but also accelerates the very degradation process that takes place in actual operating conditions, unless the defectives have a much shorter lifetime than the healthy products and a narrower (more “deterministic”, more “delta-like”) probability density of failure.
    • Some BITs (e.g., high electric fields for dielectric breakdown screening, mechanical stresses below the fatigue limit) are harmless to the materials and structures under test and do not lead to an appreciable “consumption” of the useful lifetime (field life loss). Others, although they do not trigger any new failure mechanisms, might consume some small portions of the device lifetime.
  33. Burn-ins: a special type of FOAT-4
    • When planning, conducting and evaluating BIT, one should make sure that the applied stress is high enough to weed out infant mortalities, but low enough neither to consume a significant portion of the product’s lifetime nor to introduce permanent damage.
    • A natural concern associated with BIT is that there is always a risk that BIT might trigger failure mechanisms that would not be possible in actual use conditions and/or might affect components that should not be viewed as defective.
    • In lasers, the “steady-state” portion of the bathtub curve is, in effect, not horizontal but a slowly rising curve. In addition, wear-out failures, which are characterized by a time-dependent failure rate, occupy a significant portion of the failure-rate (bathtub) diagram. For laser devices, standard production BIT should be combined with long-term life testing.
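The “high enough, but low enough” criterion can be quantified, under assumptions, by converting the burn-in into equivalent hours of field operation. The sketch assumes (hypothetically) a single Arrhenius-governed degradation mechanism, linear damage accumulation, an activation energy of 0.7 eV, a 125 °C burn-in, 55 °C field operation and a 10-year design life, and compares the 48-hour and 168-hour burn-in durations quoted earlier.

```python
import math

K_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of the stress temperature relative to the use temperature."""
    t_use, t_stress = t_use_c + 273.15, t_stress_c + 273.15
    return math.exp(ea_ev / K_EV * (1.0 / t_use - 1.0 / t_stress))

def life_consumed(burn_in_hours, design_life_hours, ea_ev, t_use_c, t_stress_c):
    """Equivalent field hours used up by burn-in, and their share of the design life."""
    equivalent = burn_in_hours * arrhenius_af(ea_ev, t_use_c, t_stress_c)
    return equivalent, equivalent / design_life_hours

DESIGN_LIFE_H = 10 * 8760  # hypothetical 10-year design life
for hours in (48, 168):
    eq, share = life_consumed(hours, DESIGN_LIFE_H, ea_ev=0.7, t_use_c=55.0, t_stress_c=125.0)
    print(f"{hours:3d} h burn-in ~ {eq:,.0f} field hours = {share:.1%} of design life")
```

Under these assumptions a one-week burn-in uses up roughly 15% of the design life and a two-day burn-in roughly 4%, which is why burn-in stress and duration have to be chosen with the activation energies of the expected mechanisms in mind.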
  34. Wear-out failures
    • For a well-designed and adequately manufactured product, wear-out failures should occur at the late stages of operation and testing. If one observes that this is not the case (the steady-state portion of the bathtub curve is not long enough or does not exist at all), one should revisit the design and choose different materials and/or different design solutions, and/or a different (more consistent) manufacturing process, etc.
    • In some electronic packages (such as BGA and PGA systems) and in some photonic products (e.g., lasers), the wear-out part of the bathtub curve can occupy a significant portion of the product’s lifetime and should be carefully analyzed.
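One common way to picture the bathtub curve, used here as an assumption rather than as the author’s model, is as the sum of three hazards: a decreasing Weibull hazard (infant mortality, shape β < 1), a constant rate (steady state) and an increasing Weibull hazard (wear-out, β > 1). Taking the ends of the steady-state portion to be the times at which the infant-mortality and wear-out hazards equal the constant rate (a convention chosen for this sketch) gives a quick check of whether that portion is long enough. All parameter values are hypothetical.

```python
import math

def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate of a two-parameter Weibull at time t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

def crossover_time(beta, eta, lam0):
    """Time at which a Weibull hazard with beta != 1 equals the constant rate lam0."""
    return eta * (lam0 * eta / beta) ** (1.0 / (beta - 1.0))

LAM0 = 1.0e-6            # hypothetical steady-state failure rate, 1/h
INFANT = (0.5, 1.0e9)    # hypothetical decreasing hazard (beta, eta)
WEAROUT = (4.0, 2.0e5)   # hypothetical increasing hazard (beta, eta)

t_start = crossover_time(*INFANT, LAM0)
t_end = crossover_time(*WEAROUT, LAM0)
print(f"steady-state portion: ~{t_start:,.0f} h to ~{t_end:,.0f} h")
for t in (10.0, t_start, 1.0e4, t_end, 1.5e5):
    total = weibull_hazard(t, *INFANT) + LAM0 + weibull_hazard(t, *WEAROUT)
    print(f"t = {t:>9,.0f} h: total hazard = {total:.2e} 1/h")
```

If the wear-out crossover fell well short of the intended service life, that would be the signal to revisit the materials, the design or the manufacturing process.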
35. What one should/could possibly do to prevent failures-1. Develop an in-depth understanding of the physics of possible failures. Neither failure statistics nor the most effective ways to accommodate failures (such as redundancy, troubleshooting, diagnostics, prognostics, health monitoring, maintenance) can replace a good understanding of the physics of failure and a good (robust) physical design. Assess the likelihood (the probability) that the anticipated failure modes and mechanisms might occur in service conditions, and minimize the likelihood of a failure by selecting the best materials and the best physical design for your product. Understand and distinguish between different aspects of reliability: operational (functional) performance, structural/mechanical reliability (affected by mechanical loading) and environmental durability (affected by harsh environmental conditions).
36. What one should/could possibly do to prevent failures-2. Distinguish between materials reliability and structural reliability, and assess how the mechanical and environmental behavior of the materials and structures in his/her design affects the functional performance of the product. Understand the difference between the requirements of the qualification specifications and standards and the actual operation conditions. In other words, understand the QT conditions well and design the product so that it not only withstands the operation conditions on a short- and long-term basis, but also passes the QT. Understand the role and importance of FOAT, and conduct PM whenever and wherever possible.
37. Session II. 4. Predictive Modeling (PM): FOAT cannot do without it. “The probability of anything happening is in inverse ratio to its desirability” (John W. Hazard, American attorney-at-law). “Any equation longer than three inches is most likely wrong” (Unknown Experimental Physicist).
38. FOAT cannot do without predictive modeling (PM). FOAT cannot do without simple and meaningful predictive models. It is on the basis of such models that one decides which parameter should be accelerated, how to process the experimental data and, most importantly, how to bridge the gap between what one “sees” as a result of accelerated testing and what he/she will possibly “get” under actual operation conditions. By considering the fundamental physics that might constrain the final design, PM can result in significant savings of time and expense and shed additional light on the physics of failure. PM can be very helpful in predicting reliability at conditions other than those of the FOAT and can provide important information about device performance. Modeling can also help optimize the performance and lifetime of the device and find the best compromise between reliability, cost-effectiveness and time-to-market.
39. Requirements for a good predictive model. A good FOAT PM does not need to reflect all possible situations, but it should be simple, should clearly indicate what affects what in the given phenomenon or structure, and should be suitable/flexible for new applications, new environmental conditions and technology developments, as well as for accumulating reliability statistics on its basis. The scope of the model depends on the type and amount of information available. A FOAT PM does not have to be comprehensive, but it has to be sufficiently generic and should include all the major variables affecting the phenomenon (failure mode) of interest. It should contain all the most important parameters needed to describe and characterize the phenomenon of interest, while parameters of second-order importance should not be included in the model. FOAT PMs take inputs from various theoretical analyses, test data, field data, customer requirements, qualification spec requirements, the state of the art in the given field, the consequences of failure for the given failure mode, etc.
40. What the existing FOAT PMs predict. Before deciding on a particular FOAT PM, one should anticipate the predominant failure mechanism and then apply the appropriate model. The most widespread PMs determine the mean time-to-failure (MTTF) in steady-state conditions. If one assumes a certain probability density function for the particular failure mechanism, then, for a two-parameter distribution (e.g., the normal one), he/she could construct this function from the determined MTTF and the measured standard deviation (STD). For a single-parameter distribution, such as the exponential one, knowledge of the MTTF is sufficient to determine the failure rate and the probability of failure for the given time in operation.
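The two cases above can be sketched as follows, assuming a normal distribution for the two-parameter case and an exponential one for the single-parameter case; the MTTF, STD and operating time are hypothetical numbers chosen only for illustration.

```python
import math

def p_fail_normal(t: float, mttf: float, std: float) -> float:
    """P(failure by t) for a normal lifetime with mean mttf and standard deviation std."""
    return 0.5 * (1.0 + math.erf((t - mttf) / (std * math.sqrt(2.0))))

def p_fail_exponential(t: float, mttf: float) -> float:
    """P(failure by t) for an exponential lifetime; the failure rate is 1 / MTTF."""
    failure_rate = 1.0 / mttf
    return 1.0 - math.exp(-failure_rate * t)

# Hypothetical numbers: MTTF = 50,000 h, STD = 10,000 h, time in operation 30,000 h.
print(p_fail_normal(30_000.0, 50_000.0, 10_000.0))   # ~0.0228
print(p_fail_exponential(30_000.0, 50_000.0))        # ~0.451
```

With the same MTTF, the exponential assumption predicts a much higher probability of failure at 30,000 h than the fairly narrow normal distribution does, so the choice of distribution matters as much as the MTTF itself. A normal distribution also assigns some probability to negative lifetimes, which is negligible only when the STD is small compared with the MTTF.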
41. Most widespread predictive models (PMs): power law (used when the PoF is unclear); Boltzmann-Arrhenius equation (used when there is a belief that elevated temperature is the major cause of failure); Coffin-Manson equation (inverse power law; used particularly when there is a need to evaluate the low-cycle-fatigue lifetime); crack growth equations (used to assess the fracture toughness of brittle materials); Bueche-Zhurkov and Eyring equations (used to assess the MTTF when both high temperature and stress are viewed as the major causes of failure); Peck equation (used to consider the combined action of elevated temperature and relative humidity); Black equation (used to consider the roles of elevated temperature and current density); Miner-Palmgren rule (used to consider the role of fatigue when the yield stress is not exceeded); creep rate equations; weakest link model (used to evaluate the MTTF of extremely brittle materials with defects); stress-strength interference model, which is, perhaps, the most flexible and well-substantiated model.
42. Example: Boltzmann-Arrhenius equation. The Boltzmann-Arrhenius equation underlies many FOAT-related concepts. The MTTF, τ, is proportional to an exponential function whose argument is a fraction, with the activation energy, U_a (eV), in the numerator and the product of Boltzmann's constant, k = 8.617×10⁻⁵ eV/K, and the absolute temperature, T (shifted by T* in the form below), in the denominator:
$$\tau = \tau_0 \exp\left(\frac{U_a}{k\,(T - T^{*})}\right)$$
The equation was first obtained by L. Boltzmann in the statistical theory of gases and then applied by S. Arrhenius to describe the inversion of sucrose. Arrhenius drew attention to the fact that physical processes and chemical reactions in solid bodies are also accelerated by elevated absolute temperature. The Boltzmann-Arrhenius equation is applicable when the failure mechanisms are attributed to a combination of physical and chemical processes. Since the rates of many physical processes (such as solid-state diffusion and many semiconductor degradation mechanisms) and chemical reactions (such as those that limit battery life) are temperature-dependent, it is the temperature that is the acceleration parameter.
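Because τ0 cancels when the MTTFs at two temperatures are compared, the equation above directly gives the acceleration factor between a use temperature and a FOAT temperature, AF = τ(T_use)/τ(T_test). A minimal sketch follows; the activation energy and temperatures are hypothetical, and T* is kept as a parameter carried over from the slide's formula, with T* = 0 giving the classical Arrhenius form.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann's constant, eV/K

def mttf_arrhenius(temp_k: float, tau0: float, ua_ev: float, t_star: float = 0.0) -> float:
    """tau = tau0 * exp(Ua / (k (T - T*))), temperatures in K."""
    return tau0 * math.exp(ua_ev / (BOLTZMANN_EV * (temp_k - t_star)))

def acceleration_factor(t_use_k: float, t_test_k: float, ua_ev: float,
                        t_star: float = 0.0) -> float:
    """tau(T_use) / tau(T_test); the prefactor tau0 cancels."""
    return math.exp(ua_ev / BOLTZMANN_EV
                    * (1.0 / (t_use_k - t_star) - 1.0 / (t_test_k - t_star)))

# Hypothetical: Ua = 0.7 eV, use at 55 C, FOAT at 125 C, T* = 0.
T_USE, T_TEST = 273.15 + 55.0, 273.15 + 125.0
af = acceleration_factor(T_USE, T_TEST, 0.7)
print(f"AF = {af:.1f}")
```

Under these assumptions, each hour at 125 °C corresponds to roughly 78 hours at 55 °C. Because U_a sits in the exponent, an error of 0.1 eV in the activation energy changes AF by nearly a factor of two.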
43. Boltzmann-Arrhenius equation and the PDfR concept. The Boltzmann-Arrhenius equation addresses degradation processes and attributes degradation, and possible failures caused by degradation, to elevated temperatures and possibly to elevated humidity as well, i.e., to environmental factors. The failure rate for a system whose MTTF is given by the Boltzmann-Arrhenius equation can be found as
$$\lambda = \frac{1}{\tau} = \frac{1}{\tau_0} \exp\left(-\frac{U_a}{k\,(T - T^{*})}\right)$$
The probability of failure by the moment t of time can be found as
$$P = 1 - e^{-\lambda t}$$
This formula is known as the exponential formula of reliability. If the probability of failure P is established for the given time t in operation, then the exponential formula of reliability can be used to determine the acceptable failure rate, λ = −ln(1 − P)/t.
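Going from a required probability of failure to an acceptable failure rate amounts to inverting P = 1 − e^(−λt), i.e., λ = −ln(1 − P)/t. The sketch below does this for a hypothetical requirement (P = 0.1 % over 10 years of continuous operation) and also reports the result in FIT (failures per 10⁹ device-hours); `math.log1p` and `math.expm1` are used only to keep small probabilities accurate.

```python
import math

def probability_of_failure(failure_rate: float, t: float) -> float:
    """Exponential formula of reliability: P = 1 - exp(-lambda t)."""
    return -math.expm1(-failure_rate * t)

def acceptable_failure_rate(p_target: float, t: float) -> float:
    """Invert P = 1 - exp(-lambda t) for lambda: lambda = -ln(1 - P) / t."""
    return -math.log1p(-p_target) / t

HOURS_PER_YEAR = 8760.0
# Hypothetical requirement: P <= 0.1 % over 10 years of continuous operation.
lam = acceptable_failure_rate(1.0e-3, 10 * HOURS_PER_YEAR)
print(f"lambda = {lam:.3e} 1/h = {lam * 1e9:.2f} FIT")
```

Since λ = 1/τ for the exponential distribution, this requirement corresponds to an MTTF of about 8.8×10⁷ h at the operating temperature, which the Boltzmann-Arrhenius equation can then relate to the MTTF expected at the FOAT temperature.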
44. Coffin-Manson equation (inverse power law)-1. Many electronic materials, and especially solder joints, fail primarily because of elevated mechanical stresses and deformations (strains). The numerous existing empirical and semi-empirical methods and approaches that address the low-cycle-fatigue lifetime of solders are, in one way or another, based on the pioneering work of Coffin and Manson. It has been established that materials that experience elevated stresses and strains within the elastic range fail because of the elevated stresses, whether steady-state or variable, while materials that experience high stresses exceeding the yield stress fail primarily because of inelastic deformations. The latter behavior, known as low-cycle-fatigue conditions, is typical for solder materials, including even lead-free solders, whose yield point might be substantially higher than that of tin-lead solders. The original Coffin-Manson equation is just an inverse power law applicable to highly compliant materials that exhibit significant plastic deformations prior to failure. The inverse power law is also used in some other, physically quite different, applications, such as the MTTF in random vibration tests (Steinberg's formula), aging in high-power lasers, etc.
45. Coffin-Manson equation (inverse power law)-2. The studies carried out in the 1990s addressed primarily flip-chip tin-lead solder joint interconnections. Today's studies address primarily the thermal fatigue life of ball-grid-array (BGA) and pad-grid-array (PGA) systems, and especially of lead-free solder joints. The thermally induced stresses and strains in flip-chip solder joints are caused by the CTE mismatch of the chip and package substrate materials, as well as by temperature gradients due to the difference in temperature between the “hot” chip and the “cold” substrate. In BGA and PGA systems, the stresses and strains are caused by the mismatch between the package structure and the PCB (the “system's substrate”). The numerous suggested phenomenological semi-empirical models are aimed at predicting (and improving) the fatigue life of the solder material, which is governed by the accumulated cyclic inelastic strain in the solder. This strain is due to temperature fluctuations resulting from changes in the ambient temperature (temperature cycling) and/or from heat dissipation in the package (power cycling).
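As a first-order illustration of the mismatch mechanism and of the inverse-power-law character of Coffin-Manson-type relations, the sketch below estimates the cyclic shear strain in a solder joint from the simple relation Δγ ≈ Δα·ΔT·L/h (L the distance from the joint to the neutral point, h the joint height) and then inverts a Coffin-Manson relation of the form Δγ/2 = ε_f·(2N_f)^c. This is an assumption-laden sketch, not a model from the course: it treats the whole strain as inelastic and ignores warpage and temperature gradients, and the fatigue ductility coefficient ε_f and exponent c are hypothetical placeholders, not data for any particular solder.

```python
def mismatch_shear_strain(delta_cte: float, delta_t: float,
                          dnp: float, height: float) -> float:
    """First-order cyclic shear strain: delta_alpha * delta_T * L / h."""
    return delta_cte * delta_t * dnp / height

def cycles_to_failure(delta_gamma: float, eps_f: float, c: float) -> float:
    """Invert delta_gamma / 2 = eps_f * (2 N_f) ** c for N_f (c < 0)."""
    return 0.5 * (delta_gamma / (2.0 * eps_f)) ** (1.0 / c)

# Hypothetical inputs: 14 ppm/K CTE mismatch, 100 K swing,
# 5 mm distance to neutral point, 0.5 mm joint height.
gamma = mismatch_shear_strain(14e-6, 100.0, 5.0, 0.5)
n_f = cycles_to_failure(gamma, eps_f=0.3, c=-0.5)   # eps_f, c: placeholder values
print(f"strain = {gamma:.4f}, N_f = {n_f:.0f} cycles")
```

Because N_f ∝ Δγ^(1/c), halving the strain (for example, by halving the CTE mismatch or the temperature swing) multiplies the predicted life by 2^(−1/c), i.e., by 4 for c = −0.5.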
46. Coffin-Manson equation (inverse power law)-3. The modified Coffin-Manson model
$$N_f = A\, f^{-\alpha}\, \Delta T^{-\beta} \exp\left(\frac{U}{k\,T_{\max}}\right)$$
can be used to model crack growth in solder and other metals due to temperature cycling. In the above formula, N_f is the number of cycles to failure, A is an empirical constant, f is the cycling frequency, ΔT is the temperature range during a cycle, T_max is the maximum temperature reached in each cycle, and k is Boltzmann's constant. Typical values for the cycling frequency exponent α and the temperature range exponent β are around −1/3 and 2, respectively; with α ≈ −1/3, a reduction in the cycling frequency reduces the number of cycles to failure. The activation energy U is around 1.25. In recent years, a visco-plastic, rate-dependent constitutive model, known as the Anand model, has often been used in combination with FEA simulation to predict solder joint reliability. In Anand's model (which includes one flow equation and three evolution equations), plasticity and creep phenomena are unified and described by the same set of flow and evolution relations.
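Because the constant A cancels, the modified Coffin-Manson model gives the ratio of cycles to failure between a field profile and a test profile. The sketch below uses the typical exponents quoted for this model (α ≈ −1/3, β ≈ 2) and, as an assumption, interprets U = 1.25 as being in eV so that it pairs with k in eV/K; the field and test profiles are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann's constant, eV/K

def cycles_ratio(field: dict, test: dict, alpha: float = -1.0 / 3.0,
                 beta: float = 2.0, u_ev: float = 1.25) -> float:
    """N_f(field) / N_f(test) for N_f = A f**(-alpha) dT**(-beta) exp(U / (k Tmax)).

    A cancels. "f": cycles per unit time (same unit for both profiles),
    "dT": temperature range in K, "Tmax": peak cycle temperature in K.
    u_ev: activation energy, assumed here to be in eV.
    """
    freq_term = (field["f"] / test["f"]) ** (-alpha)
    range_term = (field["dT"] / test["dT"]) ** (-beta)
    temp_term = math.exp(u_ev / BOLTZMANN_EV
                         * (1.0 / field["Tmax"] - 1.0 / test["Tmax"]))
    return freq_term * range_term * temp_term

# Hypothetical profiles: field 1 cycle/day, 40 K swing, 60 C peak;
# test 48 cycles/day, -40 C to 125 C (165 K swing).
field_profile = {"f": 1.0, "dT": 40.0, "Tmax": 273.15 + 60.0}
test_profile = {"f": 48.0, "dT": 165.0, "Tmax": 273.15 + 125.0}
ratio = cycles_ratio(field_profile, test_profile)
print(f"N_f(field) / N_f(test) = {ratio:.0f}")
```

Almost all of this ratio comes from the exponential term, so the estimate is only as good as the activation energy (and its units) assumed for the solder in question. The ratio counts cycles; converting it into test hours also requires the two cycling frequencies.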
47. Stress-strength (“interference”) model. Fig. 20. Stress-strength (“interference”) models-1: stress (demand) and strength (capacity) distributions.
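For the common textbook case in which the stress (demand, D) and the strength (capacity, C) are independent and normally distributed, the interference probability of failure has a closed form, P = Φ(−z), with safety index z = (μ_C − μ_D)/√(σ_C² + σ_D²). The sketch below assumes this normal-normal case (other distributions require numerical integration) and cross-checks it with a seeded Monte Carlo simulation; the means and standard deviations are hypothetical.

```python
import math
import random

def interference_p_fail(mu_c: float, sd_c: float, mu_d: float, sd_d: float) -> float:
    """P(demand > capacity) for independent normal demand D and capacity C."""
    z = (mu_c - mu_d) / math.hypot(sd_c, sd_d)   # safety index
    return 0.5 * math.erfc(z / math.sqrt(2.0))    # standard normal CDF at -z

def interference_p_fail_mc(mu_c, sd_c, mu_d, sd_d, n=200_000, seed=1):
    """Monte Carlo estimate of the same probability (seeded for repeatability)."""
    rng = random.Random(seed)
    failures = sum(
        rng.gauss(mu_d, sd_d) > rng.gauss(mu_c, sd_c) for _ in range(n)
    )
    return failures / n

# Hypothetical values: capacity 100 +/- 10 MPa, demand 60 +/- 15 MPa.
p_exact = interference_p_fail(100.0, 10.0, 60.0, 15.0)
p_mc = interference_p_fail_mc(100.0, 10.0, 60.0, 15.0)
print(f"closed form: {p_exact:.4f}, Monte Carlo: {p_mc:.4f}")
```

Increasing the mean margin μ_C − μ_D, or reducing the scatter of either distribution, raises z and lowers P, which is how this model lets the probability of failure be controlled at the design stage.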
48. 5. EXAMPLE OF A FOAT: Physics, Modeling, Experimentation, Prediction. “A theory without an experiment is dead. An experiment without a theory is blind” (Unknown Reliability Engineer).
56. Finite-Element Analysis (FEA) Data
57. Predicted Stresses and Strains in a Short Cylinder
59. Experimental bathtub curve for the solder joint interconnections in a flip-chip multichip module
60. Probability of failure of the solder joint interconnections vs. failure rate
62. Thank you for taking my course. © 2009 Dr. E. Suhir