RELIABILITY INTEGRATION TOOLS
RELIABILITY INTEGRATION
“the process of seamlessly and cohesively integrating reliability tools together to maximize reliability at the lowest possible cost”
Reliability vs. Cost
♦ Intuitively, one recognizes that some minimum total cost exists, because an emphasis on reliability increases development and manufacturing costs while reducing warranty and in-service costs. Use of the proper tools during the proper life cycle phase will help to minimize total Life Cycle Cost (LCC).
(CRE Primer by QCI, 1998)
Reliability vs. Cost, continued
[Figure: cost plotted against reliability, showing the total cost curve with its optimum cost point, the reliability program costs, and the warranty costs.]
Reliability vs. Cost, continued
In order to minimize total Life Cycle Costs (LCC), a Reliability Engineer must do two things:
♦ choose the best tools from all of the tools available and apply them at the proper phases of the product life cycle.
♦ properly integrate these tools so that the proper information is fed forward and backward at the proper times.
Reliability vs. Cost, continued
As part of the integration process, we must choose a set of tools at the heart of our program that all other tools feed into and are fed from. The tools we have chosen for this are:
Reliability Goals & Metrics and HALT and HASS
Reliability Definition
♦ Reliability is often considered quality over time.
♦ Reliability is the probability of a product performing its intended function over its specified period of usage, and under specified operating conditions, in a manner that meets or exceeds customer expectations.
Reliability Goals & Metrics Summary
♦ Reliability Goals & Metrics tie together all stages of the product life cycle. Well-crafted goals provide the target for the business to achieve; they set the direction.
♦ Metrics provide the milestones, the “are we there yet?”, the feedback that all elements of the organization need to stay on track toward the goals.
Reliability Goals & Metrics Summary
♦ A reliability goal includes each of the five elements of the reliability definition.
  • Probability of product performance
  • Intended function
  • Specified life
  • Specified operating conditions
  • Customer expectations
Reliability Goals & Metrics Summary
♦ A reliability metric is often something that the organization can measure on a relatively short periodic basis.
  • Predicted failure rate (during design phase)
  • Field failure rate
  • Warranty
  • Actual field return rate
  • Dead on Arrival rate
HALT and HASS Summary
♦ Highly Accelerated Life Testing (HALT) and Highly Accelerated Stress Screening (HASS) are two of the best reliability tools developed to date, and every year engineers are turning to HALT and HASS to help them achieve high reliability.
HALT and HASS Summary, continued
♦ In HALT, a product is subjected to progressively higher stress levels in order to quickly uncover design weaknesses. Fixing these increases the operating margins of the product, which translates to higher reliability.
♦ In HASS, a product is “screened” at stress levels above specification levels in order to quickly uncover process weaknesses. Fixing these reduces infant mortality, which translates to higher quality.
Presentation Objective
This presentation reviews the best reliability tools to use in conjunction with Reliability Goals & Metrics and HALT & HASS, and how to integrate them.
RELIABILITY INTEGRATION TOOLS
Reliability Integration Tools - Summary
♦ For the Organization
  • Reliability Integration for the Organization - Tools that are used across an organization to define the reliability requirements and policy of a program.
  • The output of this phase is the Reliability Program goals, metrics and policies. This structure will guide the development of the reliability plan for specific products. These are the approach and business connections that guide the rest of the program.
Reliability Integration Tools - Summary
♦ PHASE I: Concept Phase
  • Reliability Integration in the CONCEPT Phase - Tools that are used in the concept phase of a project to define the reliability requirements of a program. Benchmarking is usually required.
  • The output of this phase is the Reliability Program and Integration Plan. This plan will specify which tools to use and the goals and specifications of each. This is the plan that drives the rest of the program.
Reliability Integration Tools - Summary
♦ PHASE II: Design Phase
  • Reliability Integration in the DESIGN Phase - Tools that are used in the design phase of a project after the reliability requirements have been defined.
  • Predictions and other forms of reliability analysis are performed here.
  • These tools will only have an impact on the design if they are applied very early in the design process.
Reliability Integration Tools - Summary
♦ PHASE III: Prototype Phase
  • Reliability Integration in the PROTOTYPE Phase - Tools that are used after a working prototype has been developed.
  • This represents the first time a product will be tested.
  • The testing will mostly be focused on finding design issues.
Reliability Integration Tools - Summary
♦ PHASE IV: Manufacturing Phase
  • Reliability Integration in the MANUFACTURING Phase - Tools here are a combination of analytical and test tools that are used in the manufacturing environment to continually assess the reliability of the product.
  • The focus here will be mostly on finding process issues.
RELIABILITY INTEGRATION IN THE CONCEPT PHASE
Reliability Integration in the CONCEPT Phase
♦ Reliability Goal-Setting
♦ Review of Current Capabilities
♦ Gap Analysis
♦ Reliability Program and Integration Plan
Reliability Goal-Setting
♦ Reliability Goals can be derived from
  • Customer-specified or implied requirements
  • Internally-specified or self-imposed requirements (usually based on trying to be better than previous products)
  • Benchmarking against competition
Reliability Goal-Setting
♦ Reliability Goals: Which Should We Use?
  • Customer-specified or implied requirements?
  • Internally-specified or self-imposed requirements?
  • Benchmarking?
♦ For Best Results, Use All Three!
Review of Current Capabilities
♦ Once we have defined our goals, we must understand our current capabilities. This understanding will be used to define the Gap.
  • Interviews (may be with the same people as during goal-setting)
  • Review documents: plans, reports
  • Review field data
Gap Analysis
♦ The Gap Analysis naturally flows from the Reliability Goal-Setting exercise and the Review of Current Capabilities.
  • Once we understand what is expected of the product in the industry, we must compare that with current capabilities. This comparison becomes the Gap Analysis.
Gap Analysis
♦ Measuring Size of Gap
♦ Determining if Gap is Attainable
♦ Adjusting Goals if Gap Is Unrealistically High
Gap Analysis
♦ Measuring the Size of Gap
  Gap = Goals - Current Capabilities
Reliability Program and Integration Plan
♦ A Reliability Program and Integration Plan is crucial at the beginning of the product life cycle because in this plan, we define:
  • What are the overall goals of the product and of each assembly that makes up the product?
  • What has been the past performance of the product?
  • What is the size of the gap?
  • What reliability elements/tools will be used?
  • How will each tool be implemented and integrated to achieve the goals?
  • What is our schedule for meeting these goals?
Reliability Program and Integration Plan: Plan Execution
♦ Now it is time to execute the Reliability Program and Integration Plan.
♦ Each element of the plan will call for a different reliability tool.
THE REMAINDER OF THE PRESENTATION WILL REVIEW EACH TOOL AND HOW TO INTEGRATE IT WITH THE OTHER TOOLS.
RELIABILITY INTEGRATION IN THE DESIGN PHASE
Reliability Integration in the DESIGN Phase
  • Reliability Modeling and Predictions
  • Derating Analysis
  • Failure Modes, Effects, and Criticality Analysis (FMECA)
  • Design of Experiments
  • Fault Tree Analysis
  • Stress-Strength Analysis
  • Tolerance and Worst-Case Analysis
  • Human Factors Analysis
  • Maintainability and Preventive Maintenance
RELIABILITY MODELING AND PREDICTIONS
Reliability Modeling and Predictions, cont.
(CRE Primer by QCI, 1998)
Reliability Modeling and Predictions, cont.
Reliability Prediction: Definition
♦ A reliability prediction is a method of calculating the reliability of a product, or a piece of a product, from the bottom up: a failure rate is assigned to each individual component and then all of the failure rates are summed.
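The bottom-up summation in the definition above can be sketched in a few lines. This is a minimal illustration, not a handbook method: it assumes a series model with constant failure rates, and the part names and failure rates (in failures per million hours, FPMH) are hypothetical.

```python
import math

# Hypothetical component failure rates in failures per million hours (FPMH).
# Illustrative values only; they are not taken from any prediction handbook.
component_fpmh = {
    "microcontroller": 0.50,
    "dc_dc_converter": 1.20,
    "connector": 0.30,
    "electrolytic_capacitor": 0.80,
}

def predict(fpmh_by_part, mission_hours):
    """Sum part failure rates, then derive MTBF and mission reliability.

    Assumes every part is needed for operation (series model) and that
    failure rates are constant over the mission.
    """
    total_fpmh = sum(fpmh_by_part.values())
    failures_per_hour = total_fpmh / 1e6
    mtbf_hours = 1.0 / failures_per_hour
    reliability = math.exp(-failures_per_hour * mission_hours)
    return total_fpmh, mtbf_hours, reliability

total_fpmh, mtbf_hours, reliability = predict(component_fpmh, mission_hours=8760)
print(f"Total failure rate: {total_fpmh:.2f} FPMH")
print(f"MTBF: {mtbf_hours:.0f} hours")
print(f"One-year reliability: {reliability:.4f}")
```

Sorting `component_fpmh` by value gives a quick view of which parts dominate the prediction.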
Reliability Modeling and Predictions, cont.
Reliability Predictions
♦ help assess the effect of product reliability on the quantity of spare units required, which feeds into the life cycle cost model.
♦ provide necessary input to system-level reliability models (e.g. frequency of system outages, expected downtime per year, and system availability).
♦ assist in deciding which product to purchase from a list of competing products.
♦ are needed as input to the analysis of complex systems, to know how often different parts of the system are going to fail, even for redundant components.
♦ can drive design trade-off studies. For example, we can compare a design with many simple devices to a design with fewer devices that are newer but more complex. The unit with fewer devices is usually more reliable.
♦ set achievable in-service performance standards against which to judge actual performance and stimulate action.
Reliability Modeling and Predictions: How to Use in Preparation for HALT and HASS, continued
♦ Reliability Modeling and Predictions can be used for:
  • Identifying thermocouple locations
  • Revealing technology-limiting components
  • Calculating the amount of HASS needed
Reliability Modeling and Predictions: How to Use in Preparation for HALT and HASS, continued
♦ Reliability Modeling and Predictions can be used to identify thermocouple locations
  • For temperature stresses, many component temperatures will be measured during HALT; therefore, a quick analysis is helpful prior to choosing thermocouple locations. This analysis will reveal which component types are more sensitive to temperature from a reliability perspective. Used in conjunction with some basic thermal analysis tools, it allows the temperature gradients of a product to be modeled easily. When used properly during the setup of a HALT, this analysis can be a very powerful tool in planning the discovery of the upper thermal operating limit and the upper thermal destruct limit.
Reliability Modeling and Predictions: How to Use in Preparation for HALT and HASS, continued
♦ Reliability Modeling and Predictions reveal technology-limiting components
  • Reliability predictions can also reveal technology-limiting components: components that are much more sensitive to external stresses due to the technology being used (e.g. opto-electronics are very sensitive to high temperature).
Reliability Modeling and Predictions: How to Use in Preparation for HALT and HASS, continued
♦ Reliability Modeling and Predictions can be used to calculate the amount of HASS needed
  • After HALT is complete, the effects of the first year multiplier factor on the reliability prediction will play a big part in helping to determine the HASS profile, because the first year multiplier factor is derived from the amount of “effective” screening being performed, and HASS is probably the most effective type of screening developed to date.
Reliability Modeling and Predictions, cont.
[Flow chart: Reliability Modeling/Prediction Flow.
 Steps: Work with Marketing to Develop Reliability/Availability Targets from Requirements; Develop Models Based on Redundancies and Perform Reliability Allocations for each Subassembly; Work with Product Architects to Implement Redundancies where Necessary to Meet Reliability Targets; Work with Hardware Engineering During Board Design using Reliability Allocations; Perform Reliability Predictions to Determine if Design Stays within Targets.
 Decisions: Does Design Stay within Targets? Can Model or Requirements Change?
 Follow-up actions: Evaluate Model and Requirements; Work with Hardware Engineering to Make Changes to Product; Publish Reliability Results.
 Outputs: Use Results as Input for HALT; Use Results as Input for Reliability Demonstration Testing.]
DERATING ANALYSIS
Derating is defined as
♦ Using an item in such a way that applied stresses are below rated values, or
♦ The lowering of the rating of an item in one stress field to allow an increase in rating in another stress field.
(CRE Primer by QCI, 1998)
DERATING ANALYSIS: How to Use in Preparation for a HALT
♦ How to use a Derating Analysis in Preparation for a HALT
  • For electrical stresses, design engineers typically follow derating guidelines, but there are times when these guidelines are violated, either by exception or by mistake. Reliability predictions can quickly catch these violations and determine the impact on the reliability of the component and of the product. This is important input when planning a HALT, specifically to determine which electrical stresses to apply as accelerant stresses, and by how much.
Failure Modes, Effects, and Criticality Analysis (FMECA)
FMECA, continued
♦ A FMECA is a systematic technique to analyze a system for all potential failure modes. Each failure mode is scored as follows:
  • Probability that the failure mode will actually occur
  • Severity of the failure on the rest of the system
  • Detectability of the failure mode when it occurs
♦ The criticality portion of this method allows us to place a value or rating on the criticality of the failure effect on the entire system or user.
♦ Following the scoring of each failure mode, we prioritize, provide mitigations for the top failure modes, and then rescore.
(CRE Primer by QCI, 1998)
FMECA: How to Use in Preparation for a HALT
♦ FMECAs can be used for:
  • Identifying failure modes that HALT is likely to uncover
  • Identifying failure modes that require extra planning to find
  • Identifying non-relevant failure modes
  • Helping to identify the number of samples
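The score-prioritize-mitigate-rescore loop described for FMECA can be sketched as follows. The slides do not say how the three scores are combined; this sketch assumes the common risk priority number (RPN), the product of the three scores on 1 to 10 scales. The failure modes and scores are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    probability: int    # 1 (rare) .. 10 (almost certain)
    severity: int       # 1 (negligible) .. 10 (catastrophic)
    detectability: int  # 1 (always detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Assumed combination rule: risk priority number = P x S x D.
        return self.probability * self.severity * self.detectability

# Hypothetical failure modes and scores, for illustration only.
modes = [
    FailureMode("solder joint crack", probability=4, severity=7, detectability=6),
    FailureMode("connector fretting", probability=6, severity=5, detectability=3),
    FailureMode("capacitor dry-out", probability=3, severity=8, detectability=8),
]

# Prioritize: highest RPN first. Mitigations go to the top of this list,
# after which the affected modes are rescored and the list is re-sorted.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.name}")
```

The top of the ranked list is also a reasonable starting point for deciding which failure modes the HALT plan should deliberately try to provoke.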
FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify failure modes that HALT is likely to uncover
  • A FMECA will identify failure modes that are likely to be found in HALT so that an estimate can be made of how long the HALT will take.
  • A FMECA can also help in identifying the effects of certain failure modes so that they can be easily isolated when they occur, saving troubleshooting time during the HALT.
FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify failure modes that require extra planning to find during HALT
  • One of the first steps in planning a HALT is to determine which stresses to apply and what test routines and monitoring techniques to use. A FMECA can identify failure modes that may be difficult to find and that require special types or sequences of stresses, and it can indicate how to exercise the product so that these failure modes are actually being looked for.
  • Conversely, a FMECA will also identify failure modes that are not critical, so that time is not spent developing test routines to find these types of failures.
FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify non-relevant failure modes
  • Some failure modes for a product may not be relevant. A carefully performed FMECA can help identify these so that the HALT plan can attempt to avoid re-discovering them. This will save a lot of time and money, as well as embarrassment (nothing affects the credibility of a HALT program more than having design engineers chasing non-relevant failure modes).
FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify non-relevant failure modes, cont.
  • Having found all non-relevant failure modes is also very important when setting up a HASS. We may find that the product survived a specific temperature or vibration level during HALT, but degradation due to wearout may be taking place. Even a sophisticated proof-of-screen may not be able to catch some wearout mechanisms. If this occurs, HASSing the product may result in partially degraded parts being shipped to the field.
FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs can identify non-relevant failure modes, cont.
WEAROUT EXAMPLE
An example of this type of wearout failure occurs with optical components due to heat. Many optical components wear out quickly when exposed to excessive temperatures. For HALT, if we know about this when writing our HALT Plan, we can put stop points in our high temperature testing so that we avoid finding these failures. Another approach is to still go to and beyond these levels, but to disregard these failures when found and concentrate on other, more relevant failures at or above these levels. For HASS, we must be aware of these mechanisms so that the screen does not itself cause wearout.
FMECA: How to Use in Preparation for a HALT, continued
♦ FMECAs help identify the number of samples needed for HALT
  • In determining which stresses to apply, the FMECA will identify the major modes of failure and the sensitivity of each to different stresses. Then, based on the number of failure modes identified, an estimate can be made of how many failure modes will be uncovered during HALT, thereby helping to determine the number of samples needed for the test.
DESIGN OF EXPERIMENTS
Design of Experiments (SDE)
Traditional experiments focus on one or two factors at a couple of levels and try to hold everything else constant (which is impossible to do in a complex process). When SDE is properly constructed, it can focus on a wide range of key input factors or variables and will determine the optimum levels of each of the factors.
(CRE Primer by QCI, 1998)
Design of Experiments: When to Use in Conjunction with HALT and HASS
♦ When to use Design of Experiments in conjunction with HALT and HASS
  • When identifying how to combine different stresses for HALT or HASS
  • When trying to justify using combined stresses for HASS
  • When troubleshooting HALT or HASS failures
Design of Experiments: When to Use in Conjunction with HALT and HASS, cont.
♦ Design of Experiments can be used when identifying how to combine different stresses for HALT or HASS
  • It may be worth running a Design of Experiments with different stresses to find which stresses, when combined, are best at finding defects. There are published papers on which stresses work best together, but the results may differ for some types of products, in which case running a small Design of Experiments can help.
Design of Experiments: When to Use in Conjunction with HALT and HASS, cont.
♦ Design of Experiments can be used when trying to justify using combined stresses for HASS
  • When trying to justify implementing HASS, it may be worth running a Design of Experiments with different stresses to find which stresses to use. This can help determine the return on investment if capital equipment must be purchased. With most products, a combined environment chamber is the best equipment to use, but some companies have been able to justify running a single-stress HASS using existing equipment without compromising the effectiveness of the screen.
Design of Experiments: When to Use in Conjunction with HALT and HASS, cont.
♦ Design of Experiments can be used when troubleshooting HALT or HASS failures
  • Some failures require detailed failure analysis, and Design of Experiments is a specific type of failure analysis tool that can be deployed when trying to determine the cause of a failure.
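A small experiment of the kind described above can be laid out as a two-level full factorial. The sketch below is only an illustration: the three stress factors are assumed, the defect counts are hypothetical, and main effects are estimated as the difference between the average response at the high and low level of each factor.

```python
from itertools import product

# Assumed stress factors, each run at a low (-1) and a high (+1) level.
factors = ["temperature", "vibration", "power_cycling"]

# Full factorial design: every combination of levels, 2**3 = 8 runs.
design = list(product([-1, +1], repeat=len(factors)))

# Hypothetical number of defects precipitated in each run, listed in the
# same order as the rows of `design`.
defects_found = [1, 2, 2, 5, 3, 4, 6, 9]

def main_effect(column):
    """Average response at the high level minus average at the low level."""
    high = [y for run, y in zip(design, defects_found) if run[column] == +1]
    low = [y for run, y in zip(design, defects_found) if run[column] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

effects = {name: main_effect(i) for i, name in enumerate(factors)}
for name, effect in effects.items():
    print(f"{name:14s} main effect: {effect:+.1f} defects")
```

With limited samples, a fractional design (a subset of these eight runs) trades some of this information for fewer units consumed.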
Design of Experiments: When to Use in Conjunction with HALT and HASS, cont.
♦ Design of Experiments can be used when troubleshooting HALT or HASS failures, continued
EXAMPLE
A failure of an IC occurs in HASS while ramping up to 80°C, power cycling, and modulating the vibration from 3 to 5 Grms. A second failure of the same IC occurs while ramping down from 80°C, power cycling, and modulating the vibration from 5 to 3 Grms. Which stress(es) contributed to the failure? We don’t have an unlimited sample of units at our disposal to figure this out. A mini Design of Experiments can help solve this for us.
FAULT TREE ANALYSIS
Fault Tree Analysis
♦ Fault tree analysis (FTA) uses the concept of logic gates to determine the overall reliability of a system.
  • When we perform an FTA, we start with an undesired event. The undesired event constitutes the top event in a fault tree diagram.
  • We then brainstorm (just as in the FMEA) the possible failure modes that can result in this undesired event.
♦ Fault tree analysis is also used in assessing potential system failure modes.
(CRE Primer by QCI, 1998)
Fault Tree Analysis: When to Use in Conjunction with HALT and HASS
♦ When to Use Fault Tree Analysis (FTA) in Conjunction with HALT and HASS
  • During HALT planning
  • During failure analysis
Fault Tree Analysis: When to Use in Conjunction with HALT and HASS, cont.
♦ Using FTAs during failure analysis
  • FTAs are a powerful failure analysis tool for identifying the cause of a failure after it occurs. In many cases, troubleshooting can isolate which component failed, but an FTA is needed to determine what caused the failure. Even if we know what stresses we were applying at the time, we may not know what ultimately caused the failure.
Fault Tree Analysis: When to Use in Conjunction with HALT and HASS, cont.
♦ Using FTAs during failure analysis, continued
EXAMPLE
A device has an internal short during vibration. After performing an FTA, it was discovered that the vibration caused momentary surges of the power supply, and the power supply surge ultimately damaged the component.
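The logic-gate idea behind a fault tree can be sketched numerically. This minimal example assumes independent basic events; the tree is loosely modeled on the vibration example above, and both its structure and its probabilities are hypothetical.

```python
from math import prod

def and_gate(probabilities):
    """Output event occurs only if all inputs occur (independent events)."""
    return prod(probabilities)

def or_gate(probabilities):
    """Output event occurs if at least one input occurs (independent events)."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical basic-event probabilities, for illustration only.
p_vibration_induced_surge = 0.02   # vibration causes a power supply surge
p_surge_protection_fails = 0.10    # the surge reaches the component
p_random_component_defect = 0.001  # component shorts for unrelated reasons

# Intermediate event: a surge occurs AND it is not suppressed.
p_surge_damage = and_gate([p_vibration_induced_surge, p_surge_protection_fails])

# Top event: the component is damaged by a surge OR by a random defect.
p_top = or_gate([p_surge_damage, p_random_component_defect])
print(f"P(surge damage) = {p_surge_damage:.4f}")
print(f"P(top event)    = {p_top:.6f}")
```

Comparing the contribution of each branch to the top event shows where a corrective action would have the most effect.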
STRESS-STRENGTH ANALYSIS
Stress-Strength Analysis
In the most basic terms, an item fails when the applied stress exceeds the strength of the item. In general, designers design for a nominal strength and a nominal stress that will be applied to an item. One must also be aware of the variability about the stress and strength nominals.
(CRE Primer by QCI, 1998)
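The effect of variability about the nominals can be illustrated with the classic interference calculation. This sketch assumes that stress and strength are independent and normally distributed, which the slides do not state; the temperatures used are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def interference_reliability(mu_strength, sd_strength, mu_stress, sd_stress):
    """P(strength > stress) for independent, normally distributed variables."""
    z = (mu_strength - mu_stress) / sqrt(sd_strength**2 + sd_stress**2)
    return NormalDist().cdf(z)

# Hypothetical case: upper operating limit (strength) averages 90 C with a
# standard deviation of 4 C; the applied temperature (stress) averages 70 C
# with a standard deviation of 3 C.
r_wide_margin = interference_reliability(90.0, 4.0, 70.0, 3.0)

# Same nominal stress, but only a 5 C nominal margin.
r_thin_margin = interference_reliability(75.0, 4.0, 70.0, 3.0)

print(f"Reliability with 20 C nominal margin: {r_wide_margin:.6f}")
print(f"Reliability with  5 C nominal margin: {r_thin_margin:.6f}")
```

The second case shows how a design whose nominal strength exceeds nominal stress can still fail a meaningful fraction of the time once the spread is considered.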
Stress-Strength Analysis: How to Use in Conjunction with HALT and HASS
♦ How to use Stress-Strength Analysis in Conjunction with HALT and HASS
  • When HALT discovers a failure with little margin
  • When HALT discovers an inconsistent margin
Stress-Strength Analysis: How to Use in Conjunction with HALT and HASS, cont.
♦ Using Stress-Strength Analysis after HALT
In the example below, the upper operating limit (strength) does not have enough margin compared with the upper product spec (the stress applied by the customer may slightly exceed specs), and failures can occur within the operating range if the margin is not enough.
[Figure: stress axis showing the lower destruct limit, lower operating limit, product specs, upper operating limit, and upper destruct limit, with the operating margins and destruct margins marked. The problem area is at the upper operating limit.]
Stress-Strength Analysis: How to Use in Conjunction with HALT and HASS, cont.
♦ Using Stress-Strength Analysis after HALT, continued
Many times, margins will vary from one sample to the next. If there is enough variability, then products with margins that appear to be adequate may still fail because of the shape or distribution of the strength curve.
[Figure: stress axis showing the lower destruct limit, lower operating limit, product specs, upper operating limit, and upper destruct limit, with the operating margins and destruct margins marked and a problem area indicated.]
TOLERANCE AND WORST CASE ANALYSIS
Tolerance and Worst Case Analysis
Another method of evaluating the design reliability is to analyze the design assuming worst case, that is, assuming that the components are at the extremes of their tolerance, environmental, or operating conditions.
(CRE Primer by QCI, 1998)
Tolerance and Worst Case Analysis: How to Use in Conjunction with HALT
♦ How to use Tolerance and Worst Case Analysis in conjunction with HALT
  • Do not over-design - HALT will catch most tolerance issues
  • Use Tolerance and Worst Case Analysis in critical areas that have wide tolerance spreads
Tolerance and Worst Case Analysis: How to Use in Conjunction with HALT, cont.
♦ Do not over-design - HALT will catch most tolerance issues
  • Many engineers make the mistake of over-designing the entire product. A better approach is to design the product using basic design guidelines and then let HALT point out the weaknesses of the product. If HALT is performed early enough, a Tolerance Analysis can be run on the failure areas, and these areas can be redesigned to be stronger without impacting schedule. This will save costs because only a small portion of the product will then be “over-designed”.
Tolerance and Worst Case Analysis: How to Use in Conjunction with HALT, cont.
♦ Use Tolerance and Worst Case Analysis in critical areas that have wide tolerance spreads
  • If a part of the design has a wide tolerance spread, then issues related to tolerance may not be picked up in HALT due to the low sample size being used. In these cases, worst-case design practices may need to be employed.
  • Many mechanical assemblies fall into this category. If issues can arise due to tolerance stacking of dimensions, designing for worst case is the best approach.
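Tolerance stacking of dimensions can be illustrated with a simple one-dimensional stack. The worst-case sum follows the approach described above; the root-sum-square (RSS) figure is added here only for comparison and is not mentioned in the slides. The part tolerances are hypothetical.

```python
from math import sqrt

# Hypothetical symmetric tolerances (+/- mm) of four parts stacked in a line.
tolerances_mm = [0.10, 0.05, 0.20, 0.05]

# Worst case: every part sits at its tolerance extreme in the same direction.
worst_case_mm = sum(tolerances_mm)

# RSS (shown for comparison only): assumes independent, centered variation.
# It is less conservative than the worst-case stack.
rss_mm = sqrt(sum(t * t for t in tolerances_mm))

print(f"Worst-case stack: +/- {worst_case_mm:.3f} mm")
print(f"RSS stack:        +/- {rss_mm:.3f} mm")
```

The gap between the two numbers gives a feel for how much extra margin a worst-case design is buying.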
HUMAN FACTORS ANALYSIS
Human Factors Analysis
♦ Human Factors considerations must be reviewed in each design for:
  • Safety
  • Workmanship
  • Maintainability
♦ Depending on the product type and user interface, the scope of this task can vary dramatically.
Human Factors Analysis: How to Use in Planning for HALT and HASS
♦ How to use Human Factors Analysis in planning for HALT and HASS
  • Use/Abuse Conditions Added to HALT Plan
  • Human Factors Analysis Can Find Manufacturing Variability Issues before HASS Catches Them
Human Factors Analysis: How to Use in Planning for HALT and HASS, continued
♦ Human Factors Analysis can pinpoint use/abuse conditions so that they can be added to the HALT plan
  • In products with a high degree of user interaction, use/abuse scenarios must be considered. This can lead to additional required stresses and tests.
EXAMPLE
On a medical product that was intended to be carried around in a purse, two protocols were developed and added to the HALT Plan:
  • the possibility of a sharp object accidentally poking into the side of the product.
  • the possibility of lipstick coming in contact with the product.
Human Factors Analysis: How to Use in Planning for HALT and HASS, continued
♦ Human Factors Analysis Can Find Manufacturing Variability Issues before HASS Catches Them
  • One of the goals of a Human Factors Analysis is to make the product easier to manufacture. Variability in manufacturing processes is easily detected in HASS, but issues found in HASS are more expensive to fix. If there are too many variability issues, HASS is liable to miss some. Therefore, a good Human Factors Analysis of the manufacturing process can help increase the throughput during HASS.
MAINTAINABILITY AND PREVENTIVE MAINTENANCE
Maintainability and Preventive Maintenance, continued
Maintainability is a function of the design cycle with the focus on providing a system design that contributes to the ease of maintenance and lowest life cycle cost.
  • Maintainability must be applied early because it can drive both the mechanical and in some cases the electrical design.
  • A maintainability prediction is a calculation of the average amount of time a product will be in repair once a failure occurs. This is a function of isolation time, repair time, and checkout time.
(CRE Primer by QCI, 1998)
Maintainability and Preventive Maintenance, continued
Preventive maintenance (PM) has the function of preventing failures via planned or scheduled efforts. PM can be based on:
  • scheduled service for cleaning.
  • service for lubricating.
  • detection of early signals of problems.
  • replacement after a specific length of use.
(CRE Primer by QCI, 1998)
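The maintainability prediction described above, the average time in repair as a function of isolation, repair, and checkout time, can be sketched as a failure-rate-weighted average. The weighting by failure rate is an assumption of this sketch, and the subassemblies, rates, and times are hypothetical.

```python
# Hypothetical data per subassembly:
# (failure rate in FPMH, isolation hours, repair hours, checkout hours)
subassemblies = {
    "power_supply": (2.0, 0.5, 1.0, 0.5),
    "controller": (1.0, 1.0, 0.5, 0.5),
    "fan_tray": (5.0, 0.2, 0.3, 0.1),
}

def mean_time_to_repair(items):
    """Average repair time, weighting each subassembly by how often it fails."""
    total_rate = sum(rate for rate, *_ in items.values())
    weighted_hours = sum(
        rate * (isolation + repair + checkout)
        for rate, isolation, repair, checkout in items.values()
    )
    return weighted_hours / total_rate

mttr_hours = mean_time_to_repair(subassemblies)
print(f"Predicted mean time to repair: {mttr_hours:.3f} hours")
```

Because frequent failures dominate the average, shortening the isolation time of the most failure-prone subassembly usually gives the largest improvement.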
Maintainability and Preventive Maintenance: How to Use in Conjunction with HALT and HASS
♦ How to use Maintainability and Preventive Maintenance in conjunction with HALT and HASS
  • Performing HASS on spares
  • Being prepared for maintaining the system during HALT
Maintainability and Preventive Maintenance: How to Use in Conjunction with HALT and HASS
♦ Performing HASS on spares in conjunction with Preventive Maintenance
  • Performing Preventive Maintenance/parts replacement on subsystems that have no wearout mode, or too early, well before the subsystem enters its wearout mode, can actually reduce reliability. We would be taking out a part in its steady-state failure period (bottom of the bathtub curve) and replacing it with one in its infant mortality period (left-most part of the bathtub curve). One way around this is to perform HASS on the subsystem prior to shipping it as a spare.
Maintainability and Preventive Maintenance: How to Use in Conjunction with HALT and HASS
♦ Maintainability Analysis prior to HALT can reveal how to diagnose and repair the product during HALT
  • When planning a HALT, a maintainability analysis will indicate what equipment is needed to diagnose and repair different types of failure modes. This will save the HALT engineer a lot of time. It may even lead to changes in the test plan to avoid discovering certain failure modes if adequate resources are not available to help fix the failure (or to postpone discovery of those failure modes until the resources are available).
RELIABILITY INTEGRATION IN THE PROTOTYPE PHASE
Reliability Integration in the PROTOTYPE Phase
  • Highly Accelerated Life Testing (HALT)
  • Failure Reporting, Analysis and Corrective Action System (FRACAS)
  • Reliability Demonstration Test
HIGHLY ACCELERATED LIFE TESTING (HALT)
HALT: How to Perform HALT in Conjunction with the Reliability Tools
♦ How to Perform HALT in Conjunction with the other Reliability Tools
  • Planning for a HALT
    - Using results from the Modeling and Predictions, FMECA, and Derating Analyses to help develop the HALT Plan
  • Executing the HALT
    - Using a FRACAS for root cause analysis on each failure
  • Using the HALT Results
    - Using the HALT results to help plan the RDT
    - Using the HALT results to help plan HASS
All of these are discussed in more detail in the specific section for that tool at the end of the section.
HALT Flow Chart
[Flow chart: Highly Accelerated Life Testing (HALT) Flow.
 Inputs and preparation: Use Reliability Modeling/Derating Data as Input; Perform a Failure Modes and Effects Analysis (FMEA) to Determine Weakpoints in a Design; Research Environmental Limitations on All "Exotic" Technologies Being Used.
 Test: Perform HALT, Taking Product Outside Environmental and Performance Specs to Find Weakpoints; Evaluate Failures/Weaknesses and Fix Those That Are Relevant and Cost-Effective; Send failure information to FRACAS; Retest Product to Determine New Limits.
 Decision: Are Margins Acceptable for Reliability Reqts?
 Outputs: Publish Results; Use Results to Develop a HASS Profile; Use Results to Develop a Reliability Demonstration Test.]
FAILURE REPORTING, ANALYSIS, AND CORRECTIVE ACTION SYSTEM (FRACAS)
FRACAS
♦ This is also sometimes referred to as Closed Loop Corrective Action (CLCA) or Corrective and Preventive Action (CAPA).
♦ The purpose of the FRACAS is to provide a closed loop failure reporting system, procedures for analysis of failures to determine root cause, and documentation for recording corrective action.
(CRE Primer by QCI, 1998)
FRACAS: How to Use in Conjunction with a HALT
♦ When performing HALT, failures are identified and each must be taken to root cause. FRACAS is the perfect tool for this. A FRACAS can:
  • Help classify failures as to their relevancy
  • Help choose the appropriate analysis tool
  • Keep track of the progress on each open issue
  • Help communicate results with other departments and outside the company
FRACAS: How to Use in Conjunction with a HALT, continued
♦ A FRACAS can help classify failures as to their relevancy
  • During HALT, many failures are likely to be uncovered. However, not all failures will be relevant. The FMECA process will find many of these non-relevant failures, but for those that are first found in HALT, a FRACAS will help determine their relevancy through the use of a variety of tools.
FRACAS: How to Use in Conjunction with a HALT, continued
♦ When performing a failure analysis, there are many tools that can be helpful. Some of these are:
  • Fault Tree Analyses (FTAs)
  • Fishbone diagrams
  • Pareto charts
  • Designs of Experiments
  • Tolerance Analyses
FRACAS: How to Use in Conjunction with a HALT, continued
♦ A FRACAS can keep track of the progress on each open issue
  • Each failure is assigned a unique FRACAS Report ID
  • Each report requires detailed information about the corrective action and must be signed off
  • During critical stages in a project, regular FRACAS review meetings are typically held
FRACAS: How to use in conjunction with a HALT, continued
♦ A FRACAS can help communicate results with other departments and outside the company
  • FRACAS databases are typically kept on a network drive for general viewing
  • FRACAS reports can be sent to a vendor to track failure analysis
  • FRACAS can be used to communicate with customers on product development or field issues
FRACAS Flow Chart
Reliability - Failure Reporting Analysis and Corrective Action System (FRACAS) Flow
[Flow chart. Boxes recoverable from the original figure, in approximate flow order:]
  • Entry points: Failure Discovered in HALT Process; Failure Discovered in HASS Process; Trend Discovered in Repair Center
  • Gather Failure Information
  • Develop Failure Analysis Plan for Specific Failure Including Resource Plan
  • Contact Customer or Supplier (if appropriate) to Inform Them of Plan
  • Send Sample of Failure Back to Component Manufacturer (if appropriate)
  • Analyze Failure to Root Cause
  • Duplicate Failure, if possible
  • Report Findings and Recommendations
  • Implement Corrective Action
  • Test Solution
  • Decision: Did Solution Fix Problem? (Yes / No)
  • Report Solution and Close Failure Analysis
  • Monitor Effectiveness of Solution / Perform Verification HALT
  • Modify HASS Profile, if necessary
RELIABILITY DEMONSTRATION TESTING (RDT)
RDT: What is it?
♦ A sample of units is tested at accelerated stresses for several months.
♦ The stresses are a bit lower than the HALT stresses, and they are held constant (or cycled constantly) rather than gradually increased.
♦ This enables us to calculate the acceleration factor for the test.
♦ The RDT can be used to validate the reliability prediction analyses.
♦ It is also useful in finding failure modes that are not easily detected in a high time compression test such as HALT.
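Because the RDT stresses are held constant, an acceleration factor can be computed from a stress model. The slides do not say which model is used, so the sketch below assumes the Arrhenius model, a common choice for constant-temperature stress. The activation energy of 0.7 eV and the two temperatures are placeholder assumptions that would have to be justified for a real product.

```python
from math import exp

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(use_temp_c, stress_temp_c, activation_energy_ev):
    """Acceleration factor of a constant-temperature test relative to use conditions.

    Assumes the Arrhenius model: AF = exp((Ea / k) * (1/T_use - 1/T_stress)),
    with temperatures converted to kelvin.
    """
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    return exp((activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


# Placeholder inputs: 40 C in use, 70 C in the chamber, Ea = 0.7 eV (assumed).
af = arrhenius_acceleration_factor(40.0, 70.0, 0.7)
print(f"Acceleration factor: {af:.1f}")
```

With these assumed inputs, one chamber hour stands in for roughly ten field hours. Other stresses, such as temperature cycling or vibration, need their own models.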
RDT, continued
[Two figure slides; source: CRE Primer by QCI, 1998]
RDT: Hypothesis Testing
♦ Testing if two means are equal

$$H_0: \mu = \mu_0 \qquad H_a: \mu > \mu_0$$

$$n = \frac{\sigma^2 (z_\alpha + z_\beta)^2}{\Delta^2}$$

RDT: Type I and Type II Errors

Decision      Null hypothesis true     Null hypothesis false
Reject H_0    Type I error (α)         Correct (1 − β)
Accept H_0    Correct (1 − α)          Type II error (β)
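As a rough illustration of the sample-size formula on the hypothesis-testing slide, the sketch below evaluates n = σ²(z_α + z_β)²/Δ² with Python's standard library. It assumes that Δ is the smallest shift in the mean the test should detect, and that z_α and z_β are upper-tail standard normal quantiles for a one-sided test. The function name and the numeric inputs are made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean_test(sigma, delta, alpha=0.05, beta=0.10):
    """Sample size for a one-sided test of a mean.

    sigma: standard deviation of the measurement (assumed known)
    delta: smallest shift in the mean the test should detect (assumed meaning)
    alpha: Type I error probability; beta: Type II error probability
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return ceil(sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)


# Illustrative inputs only: sigma = 100 hours, delta = 50 hours.
print(sample_size_for_mean_test(sigma=100.0, delta=50.0))  # prints 35
```

Halving Δ quadruples the required sample size, which is why the detectable difference should be agreed on before the test plan is written.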
RDT: Success Testing

$$n = \frac{\ln(1 - C)}{\ln R_L}$$

RDT: Accelerated Life Testing

$$n = R \, V \left( \frac{K_\gamma \, \sigma}{w} \right)^2$$

  • R is the ratio of the Meeker-Hahn variance over the optimum variance
  • V is the optimum variance factor
  • K_γ is the standard normal 100(1 + γ)/2 percentile
  • σ is the standard deviation (1/β if Weibull)
  • w is the distance to the true value
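The success-testing formula is straightforward to evaluate. In the sketch below, C is read as the confidence level and R_L as the lower bound on reliability to be demonstrated with zero failures; that reading, the function name, and the example values are assumptions added for illustration.

```python
from math import ceil, log


def success_run_sample_size(confidence, reliability_lower_bound):
    """Units that must all pass to demonstrate R_L at confidence C.

    Implements n = ln(1 - C) / ln(R_L), rounded up to a whole unit.
    """
    if not (0.0 < confidence < 1.0 and 0.0 < reliability_lower_bound < 1.0):
        raise ValueError("confidence and reliability must lie strictly between 0 and 1")
    return ceil(log(1.0 - confidence) / log(reliability_lower_bound))


print(success_run_sample_size(0.90, 0.90))  # prints 22
print(success_run_sample_size(0.95, 0.95))  # prints 59
```

The required sample size grows quickly as the reliability target approaches 1, which is one reason accelerated stresses are used to shorten the test instead.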
RDT: Sample Size Calculation
[Figure: needed sample size giving approximately a 50% chance of having a confidence interval factor for the 0.2 quantile that is less than R. Weibull distribution with eta = 1573 and beta = 1.5; test censored at 2160 time units with 80 expected percent failing. Y axis: Sample Size (log scale, 1 to 2000). X axis: Confidence Interval Precision Factor R (1.0 to 3.5). One curve each for 80%, 90%, 95%, and 99% confidence.]
RDT: How to Use the Results of HALT in Planning an RDT
♦ Two of the most important decisions when planning an RDT are which stresses to apply and how much of each. From these, we can derive the acceleration factor for the test. HALT can help with both.
  • HALT will identify the effects of each stress on the product to determine which are most applicable.
  • HALT will identify the margins of the product with respect to each stress. This is critical so that the highest amount of stress is applied in the RDT to gain the most acceleration without applying too much, which could cause non-relevant failures.
RDT: How to Use the Results of Reliability Predictions in Planning an RDT
♦ Another key factor in planning an RDT is the goal of the test. This is usually driven by marketing requirements, but the Reliability Prediction will help determine how achievable the goal is.
  • Although the prediction may not be able to give an exact MTBF number, it will give a number close enough to help determine how long an RDT to run and what level of confidence in the numbers to expect.
  • Many times, the reliability of the product will far exceed initial marketing requirements. If this is the case, the RDT can be planned to try to prove these higher levels. Once achieved, the published specs from marketing can be increased.
RDT Flow Chart
Reliability - Reliability Demonstration Testing Flow
[Flow chart. Boxes recoverable from the original figure:]
  • Input From Reliability Modeling/Derating
  • Input From HALT
  • Review Reliability Goals Based on Marketing Input
  • Develop Test Plan, including 1. Number of Units 2. Acceleration Factors 3. Total Test Time 4. Confidence Levels
  • Set up and Begin Test
  • Monitor Results
  • Decision: Have Reliability Goals Been Met? (Yes)
  • Publish Results
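The four test-plan items (number of units, acceleration factor, total test time, confidence level) are tied together by simple arithmetic once a failure model is chosen. The sketch below assumes a constant failure rate (exponential model) and a plan that allows zero failures. The slides do not prescribe either assumption, and the MTBF goal and other inputs are placeholders.

```python
from math import log


def required_device_hours(mtbf_goal_hours, confidence):
    """Equivalent field hours needed to demonstrate the MTBF goal with zero failures.

    Assumes an exponential (constant failure rate) model, for which the lower
    confidence bound on MTBF after T failure-free hours is T / (-ln(1 - C)).
    """
    return mtbf_goal_hours * -log(1.0 - confidence)


def chamber_hours_per_unit(mtbf_goal_hours, confidence, units, acceleration_factor):
    """Test time each unit must run, given the sample size and acceleration factor."""
    return required_device_hours(mtbf_goal_hours, confidence) / (units * acceleration_factor)


# Placeholder plan: 50,000 h MTBF goal, 90% confidence, 20 units, AF = 10.
hours = chamber_hours_per_unit(50_000, 0.90, units=20, acceleration_factor=10.0)
print(f"{hours:.0f} hours per unit, about {hours / 24:.0f} days")
```

If a failure occurs during the test, the zero-failure bound no longer applies and the required test time has to be recalculated for the observed number of failures.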
RELIABILITY INTEGRATION IN THE MANUFACTURING PHASE
Reliability Tools and Integration in the MANUFACTURING Phase
  • Highly Accelerated Stress Screening (HASS)
  • Highly Accelerated Stress Auditing (HASA)
  • On-Going Reliability Testing (ORT)
  • Repair Depot Setup
  • Field Failure Tracking System
  • Reliability Performance Reporting
  • End-of-Life Assessment
HIGHLY ACCELERATED STRESS SCREENING (HASS)
HASS: How to Use the Results of FMECA and a Reliability Prediction in Planning a HASS
♦ How to use the results of FMECA and a Reliability Prediction in planning a HASS
  • FMECA results can identify possible wearout mechanisms that need to be taken into account for HASS.
  • Reliability Prediction results can help determine how much screening is necessary.
HASS: How to Use the Results of FMECA and a Reliability Prediction in Planning a HASS, continued
♦ Using FMECA results to identify possible wearout mechanisms that need to be taken into account for HASS
  • As we discussed in the FMECA section, certain wearout failure modes are not easily detectable in HALT or even in HASS Development. Therefore, when wearout failure modes are present, we must rely on the results of a FMECA to help determine appropriate screen parameters.
HASS: How to Use the Results of FMECA and a Reliability Prediction in Planning a HASS, continued
♦ Using Reliability Prediction results to determine how much screening is necessary
  • One of the parameters of a reliability prediction is the First Year Multiplier factor. This factor is applied to a product to account for infant mortality failures, based on how much manufacturing screening is being performed (or is planned).
  • The factor is on a scale between 1 and 4. No screening yields a factor of 4, and 10,000 hours of “effective” screening yields a factor of 1 (the scale is logarithmic).
HASS: How to Use the Results of FMECA and a Reliability Prediction in Planning a HASS, continued
♦ Using Reliability Prediction results to determine how much screening is necessary, continued
  • Effective screening allows for accelerants such as temperature and temperature cycling.
  • HASS offers the best acceleration of any known screen. Therefore, HASS is the perfect vehicle for helping to keep this factor low in a reliability prediction.
HASS: Using the Results of HALT to Develop a HASS Profile
♦ Using the HALT results, we then run a HASS Development process
  • The process must prove that there is significant life left in the product.
  • The process must prove that it is effective at finding defects.
HASS: Linking the Repair Depot with HASS by Sending “NTF” Hardware Back Through HASS
♦ During the repair process, we may identify a large number of “No Trouble Founds” (NTFs). HASS is the perfect vehicle for determining whether these NTFs are truly intermittent hardware problems or are due to something else.
Using HASS to assist with the “No Trouble Found (NTF)” issue at the Repair Depot.
ON-GOING RELIABILITY TESTING (ORT)
On-Going Reliability Testing (ORT)
♦ ORT is a process of taking a sample of products off a production line and testing them for a period of time, accumulating test time across samples until a reliability target is achieved. The samples are rotated on a periodic basis to:
  • get an on-going indication of the reliability
  • assure that the samples are not wearing too much (because after the ORT is complete, the samples are shipped).
Comparison Between ORT and HASA
♦ ORT Benefits over HASA
  • You can measure reliability at any given time
♦ HASA Benefits over ORT
  • Effective process monitoring tool due to its ability to find failures and to drive timely corrective actions
  • No need to measure on-going reliability, because the reliability measurement was already done once in RDT. Also, periodic HALT is a much better vehicle for continuously monitoring reliability over time after it has been baselined.
REPAIR DEPOT SETUP
Repair Depot Setup
♦ A Repair Depot facility must be set up with the proper testing in place to reproduce the failures and to assure that the product has enough life left to be shipped back into the field.
♦ More importantly, it must be set up in such a way as to learn from the failures and make changes to the design and manufacturing processes to assure that the failures are not repeated.
Repair Depot Setup
♦ Set up the Repair Depot System to feed data to the Field Failure Tracking System
  • The Repair Depot Center retests products returned from the field to confirm failures and determine root cause.
  • The confirmation is then fed back to the Field Failure Tracking System so that it can be properly categorized for reliability data reporting.
FIELD FAILURE TRACKING SYSTEM
Field Failure Tracking System
♦ The purpose of the Field Failure Tracking System is to provide a system for evaluating a product’s performance in the field and for quickly identifying trends.
Field Failure Tracking System
♦ Integrating the Field Failure Tracking System with the Repair Depot Center
  • Failed products from the field are returned to the Repair Depot Center for confirmation and to determine root cause.
  • The confirmation is then fed back to the Field Failure Tracking System so that it can be properly categorized for reliability data reporting.
RELIABILITY PERFORMANCE REPORTING
Reliability Performance Reporting
♦ Reliability Performance Reporting in its simplest form is just reporting back how we are doing against our plan. In this report, we must capture:
  • how we are doing against our goals and against our schedule to meet our goals
  • how well we are integrating the tools together
  • what modifications we may need to make to our plan
♦ In the report, we can also add information on specific issues, progress on failure analyses, and Pareto and trend charts.
Reliability Performance Reporting
♦ How are we doing against our goals and against our schedule to meet our goals?
  • After collecting the field data, we compare it with our goals and estimate how we are doing.
  • If we are achieving a specific goal element, we explain what pieces are working and the steps we are going to take to assure that this continues.
  • If we are not achieving a specific goal element, we must understand what contributed to this and what steps we are going to take to change it.
  • As part of this, we must understand the major contributors to each goal element through trend plotting and failure analyses.
Reliability Performance Reporting
♦ How well are we integrating the tools together?
  • As part of understanding the effectiveness of our reliability program, we must look at the overall program.
  • For example, if we stated in the plan that we were going to use the results of the prediction as input to HALT, we must describe here how we accomplished this.
  • This can help explain the effectiveness of the HALT so that its results can be repeated.
  • This can help explain how the HALT can be more effective in future programs if we overlooked or skipped some of the integration.
  • This will serve as documentation for future programs.
Reliability Performance Reporting
♦ What modifications may we need to make to our plan?
  • Occasionally, we may need to modify the plan
    • Goals may change due to new customer/marketing requirements
    • We may have discovered new tools or new approaches to using existing tools based on research
    • We may have developed new methods of integration based on experimentation and research
    • The schedule may have changed
Reliability Performance Reporting
♦ What modifications may we need to make to our plan?
  • If this occurs, we need to
    • Re-write the plan
    • Summarize the changes in our Reliability Performance Report so that we can accurately capture these new elements going forward
END-OF-LIFE ASSESSMENT
End-of-Life (EOL) Assessment
♦ We perform End-of-Life Assessments to
  • Determine when a product is starting to wear out, in case the product needs to be discontinued
  • Monitor the preventive maintenance strategy and modify it as needed
  • Monitor spares requirements to determine if a change in allocation is necessary
  • Tie back to the End-of-Life Analysis done in the Design Phase to determine the accuracy of that analysis
End-of-Life (EOL) Assessment
♦ A review of the “bathtub” curve
[Figure: bathtub curve. Y axis: Failure Rate. X axis: Time. Labels: Infant Mortality level, driven by amount of screening in mfg. and characterized using a special factor in the prediction; Ideal reliability at time of ship; Steady State Reliability Level, described by the prediction; Onset of end-of-life (EOL).]
End-of-Life (EOL) Assessment
♦ To figure out where we are, we plot the field data
  • We must “scrub” the data to
    • accurately determine the number of days in use before failure
    • properly categorize the failure
  • We must be careful to plot data by assembly type, especially if different assemblies have different wearout mechanisms. Otherwise, it will be impossible to determine a pattern.
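Once scrubbed field data has been fitted with a Weibull distribution, the position on the bathtub curve can be read from the shape parameter β. The sketch below evaluates the Weibull failure rate h(t) = (β/η)(t/η)^(β−1) using the β and η values printed on the Weibull++ plot on the following slide. The ±0.1 tolerance around β = 1 and the function names are illustrative assumptions; the time units are whatever units the field data used.

```python
def weibull_failure_rate(t, beta, eta):
    """Weibull failure rate h(t) = f(t)/R(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta, and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def bathtub_region(beta, tol=0.1):
    """Rough reading of the shape parameter; tol is an assumed threshold."""
    if beta < 1.0 - tol:
        return "infant mortality (decreasing failure rate)"
    if beta > 1.0 + tol:
        return "wearout (increasing failure rate)"
    return "steady state (roughly constant failure rate)"


BETA, ETA = 2.9032, 60.9188  # fitted values reported on the Weibull++ plot

for t in (20.0, 40.0, 60.0, 80.0):
    print(f"t = {t:5.1f}  failure rate = {weibull_failure_rate(t, BETA, ETA):.4f}")
print(bathtub_region(BETA))
```

A shape parameter near 2.9 means the failure rate is climbing with time, so the assembly in that data set would be read as being in the wearout region.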
End-of-Life (EOL) Assessment
[Figure: Failure Rate vs Time Plot from ReliaSoft's Weibull++ 6.0 (www.Weibull.com). Data set: "Weibull Since Jan 28 - (NTF-knwnissues)"; W2 RRX - SRM MED; F=49 / S=0. Y axis: Failure Rate, f(t)/R(t), 0 to 0.10. X axis: Time (t), 0 to 200. Fitted parameters: β=2.9032, η=60.9188, ρ=0.8154. Plot footer: Mike Silverman, Company, 5/2/2004 07:58.]
RELIABILITY INTEGRATION SUMMARY
Reliability Integration Summary
In this section, we learned about:
  • The four phases of a reliability program
    • Concept
    • Design
    • Prototype
    • Manufacturing
  • The reliability tools used in each phase and how to integrate all of the tools together
  • HALT and HASS and their role in an overall reliability program
Reliability Integration Summary
In the Concept Phase, we learned about:
  • Benchmarking
  • Gap Analyses
  • Reliability Program and Integration Plan Development
and how to use these tools to effectively plan and execute a Reliability Program
Reliability Integration Summary
In the Design Phase, we learned about:
  • Reliability Modeling and Predictions
  • Derating Analysis/Component Selection
  • Tolerance/Worst Case Analysis/Design of Experiments
  • Risk Management / FMECAs
  • Fault Tree Analysis (FTA)
  • Human Factors/Maintainability/Preventive Maintenance
  • Software Reliability
and how to integrate these together and with tools from the other phases, including HALT and HASS
Reliability Integration Summary
In the Prototype Phase, we learned about:
  • Reliability Test Plan Development
  • Highly Accelerated Life Testing (HALT)
  • Design Verification Testing (DVT)
  • Reliability Demonstration Testing
  • Failure Analysis Process Setup
and how to integrate these together and with tools from the other phases
Reliability Integration Summary
In the Manufacturing Phase, we learned about:
  • Highly Accelerated Stress Screening (HASS)
  • On-Going Reliability Testing
  • Repair Depot Setup
  • Field Failure Tracking System Setup
  • Reliability Performance Reporting
  • End-of-Life Assessment
and how to integrate these together and with tools from the other phases
Reliability Integration Summary
In summary, we have learned:
  • the power of developing realistic reliability goals early, planning an implementation strategy, and then executing the strategy, and...
the power of integration!!
Reliability Integration Summary
WHAT ARE YOUR QUESTIONS?
Further Education
  • For a more in-depth view of this topic and more, Mike will be teaching:
    • January 11th through March 1st, 2005: “Certified Reliability Engineer (CRE) Preparation Course” to prepare for taking the ASQ CRE Exam
    • May, 2005: “Analysis & Test Tools for Comprehensive Reliability”: a more in-depth look at the best reliability tools being used today
Additional Services from Ops A La Carte
Reliability Integration in the Concept Phase
  1. Benchmarking
  2. Gap Analysis
  3. Reliability Program and Integration Plan Development
Reliability Integration in the Design Phase
  1. Reliability Modeling and Predictions
  2. Derating Analysis/Component Selection
  3. Tolerance/Worst Case Analysis/Design of Experiments
  4. Risk Management / Failure Modes, Effects, & Criticality Analysis (FMECA)
  5. Fault Tree Analysis (FTA)
  6. Human Factors/Maintainability/Preventive Maintenance Analysis
  7. Software Reliability
Additional Services from Ops A La Carte
Reliability Integration in the Prototype Phase
  1. Reliability Test Plan Development
  2. Highly Accelerated Life Testing (HALT)
  3. Design Verification Testing (DVT)
  4. Reliability Demonstration Testing
  5. Failure Analysis Process Setup
Reliability Integration in the Manufacturing Phase
  1. Highly Accelerated Stress Screening (HASS)
  2. On-Going Reliability Testing
  3. Repair Depot Setup
  4. Field Failure Tracking System Setup
  5. Reliability Performance Reporting
  6. End-of-Life Assessment
Additional Educational Courses from Ops A La Carte
  1. Reliability Tools and Integration for Overall Reliability Programs
  2. Reliability Tools and Integration in the Concept Phase
  3. Reliability Tools and Integration in the Design Phase
  4. Reliability Tools and Integration in the Prototype Phase
  5. Reliability Tools and Integration in the Manufacturing Phase
  6. Reliability Techniques for Beginners
  7. Reliability Statistics
  8. FMECA
  9. Certified Reliability Engineer (CRE) Preparation Course for ASQ
  10. Certified Quality Engineer (CQE) Preparation Course for ASQ
For more information...
♦ Contact Ops A La Carte (www.opsalacarte.com)
  • Mike Silverman
    • (408) 472-3889
    • firstname.lastname@example.org
  • Fred Schenkelberg
    • (408) 710-8248
    • email@example.com
Thank you for your time!