Design for Testability (DfT) Seminar

Transcript

  • 1. Test Engineering. Courtesy of Patrick D.T. O’Connor, 62 Whitney Drive, Stevenage, Herts. SG1 4BJ, UK. www.pat-oconnor.co.uk www.pat-oconnor.co.uk/testengineering/htm pat@pat-oconnor.co.uk pdtoconnor@ieee.org
  • 2. Test Engineering. Outline (day 1): 1. Introduction 2. Stress, strength, failure of materials 3. Stress, strength, failure of electronics 4. Variation and reliability 5. Design analysis 6. Development test principles
  • 3. Test Engineering. Outline (day 2): 7. Materials and systems test 8. Electronics test 9. Software 10. Manufacturing test 11. Testing in service 12. Data collection and analysis 13. Laws, regulations, standards 14. Managing test
  • 4. Test Engineering. Why test? • Design uncertainty • Manufacturing • Variation • Maintenance • Regulations • Contracts
  • 5. Test Engineering. Causes of failure: • Design inherently incapable • Variation (parameters, environments) • Wearout • Other time-dependent mechanisms • Sneaks • Errors. We must know them all!
  • 6. Test Engineering. How to test? • Test to succeed/test to fail? • Accelerated test • Systems and components • Technologies • Processes • Analysis and simulation
  • 7. Test Engineering. Testing tales: • “Our engineers are paid to design right” • “Trains don’t need testing” • Ship engine for a locomotive? • We always have done this test • The telecoms system • MIL-STD-883 IC burn-in test • “Don’t overstress” • Too much test?
  • 8. Test Engineering. Development test principles: • Failure costs exceed costs of test to detect & remove (Deming). • Failure-free design: selection, training, teams, leadership • Optimise test programme • Test adds value!
  • 9. Test Engineering. Development test costs: • Test articles (“UUT”) • People × time • Facilities • Delay to market • Downstream opportunities (warranty, fixes, reputation, etc.)
  • 10. Test Engineering. Management aspects: • Design capability/risks • Markets, competition • Product environment, life • Suppliers • Regulations • Manufacturing, service
  • 11. FAILURE CAUSES: MECHANICAL • Maximum stress, fracture • Stress cycling, fatigue, creep (vibration, temperature cycle) • Wear • Corrosion • Manufacture • Variation • Other (leaks, backlash, friction, ...)
  • 12. MATERIAL STRESS, STRENGTH, FAILURE. Properties: • Strength/elasticity (Hooke’s Law) – Stress (σ) = Young’s Modulus (E) × strain (ε) • Yield strength, ultimate tensile strength (UTS) • Toughness/brittleness (resistance to fracture: energy/volume) • Crack growth (Griffith’s Law)
  • 13. MATERIAL STRESS, STRENGTH, FAILURE. Hooke’s Law [Figure 2.1 Material behaviour in tensile stress: stress σ vs. strain ε, showing the elastic region, yield point, plastic region and fracture]
  • 14. MATERIAL STRESS, STRENGTH, FAILURE [Figure 2.2 Tensile stress/strain behaviour of different materials (generalised): stress σ (MPa) vs. strain ε (%). Brittle: cast iron, ceramics, glass. Tough: kevlar, steels, alloys (Al, Ti, etc.). Ductile: plastics, copper, solder]
  • 15. FINITE ELEMENT ANALYSIS (MECHANICAL STRESS) (MSC)
  • 16. MECHANICAL FAILURE CAUSES • Shock overload: constant failure/hazard rate (CFR/CHR) (load-strength analysis) • Strength deterioration: increasing failure/hazard rate (IFR/IHR); durability
  • 17. CAUSES OF STRENGTH DETERIORATION • Fatigue (cyclic stress: vibration, handling, temperature cycling) • Creep (high temperature + mech. stress) • Wear (parts moving in contact: connectors) • Corrosion (electrolytic, contamination, ...) • etc.
  • 18. FATIGUE: S-N CURVE [Figure: stress S vs. cycles to failure N (log scale), marking the UTS and the fatigue limit]
  • 19. FATIGUE: MINER’S RULE: $\frac{M_1}{n_1} + \frac{M_2}{n_2} + \dots + \frac{M_k}{n_k} = 1$
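Miner's rule can be illustrated with a short calculation. The sketch below is only an illustration: the slide does not define its symbols, so it assumes that M_i is the number of cycles applied at stress level i and n_i is the number of cycles to failure at that level (read from an S-N curve). The function name and the cycle counts are hypothetical.

```python
def miner_damage(applied_cycles, cycles_to_failure):
    """Return the cumulative damage sum of M_i / n_i over all stress levels.

    Assumption: applied_cycles[i] is M_i (cycles applied at level i) and
    cycles_to_failure[i] is n_i (cycles to failure at level i).
    Under Miner's rule, failure is expected when the sum reaches 1.
    """
    if len(applied_cycles) != len(cycles_to_failure):
        raise ValueError("both lists must describe the same stress levels")
    return sum(m / n for m, n in zip(applied_cycles, cycles_to_failure))


# Hypothetical duty cycle: three stress levels, low to high.
applied = [20_000, 5_000, 500]
to_failure = [100_000, 20_000, 2_000]

damage = miner_damage(applied, to_failure)
print(f"cumulative damage = {damage:.2f}")
print("fatigue life used up" if damage >= 1.0 else "fatigue life remaining")
```

With these made-up numbers the damage fractions are 0.20, 0.25 and 0.25, so about 70% of the fatigue life would be consumed.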
  • 20. “CLASSIC” FATIGUE FAILURE [Figure labels: initiating crack or damage; crack growth rings; granular fracture surface]
  • 21. DESIGN AGAINST FATIGUE • Reduce mech. stress concentrations (FEA) • Provide support for heavy components, connectors, etc. • Minimise thermal gradients • Know material fatigue properties, particularly solder! • Design for safe life • Design for fail-safe • Design for inspection & test
  • 22. VIBRATION. Leads to: • Fatigue • Wear • Loosening • Leaks • Noise
  • 23. VIBRATION. Measures: • Frequency (Hz) • Displacement (m) • Velocity (m/s) • Acceleration (peak) (m/s² or gn) • Damping (reduces amplitude) • Noise, vibration and harshness (NVH)
  • 24. VIBRATION: WATERFALL PLOT [Figure 2.5 Waterfall plot of vibration data]
  • 25. TEMPERATURE EFFECTS • Expansion/contraction (TCE) • Softening, weakening, melting (metals, some plastics) • Charring (plastics, organics) • Drying/condensation/freezing • Other physical/chemical (Arrhenius’ Law) • Viscosity change, lubricant loss • Interactions (corrosion, …)
  • 26. WEAR MECHANISMS • Adhesive • Fretting • Abrasive • Cavitation/erosion • Corrosive
  • 27. WEAR REDUCTION • Examine • Test/analyse • Lubricate (oils, MoS2, ...) • Surface treatment (PTFE, …) • Stress reduction (mech, temp, vibration) • Material change (e.g. non-abrasive)
  • 28. CORROSION • Ferrous alloys (rust) • Non-ferrous: Al, Mg • Chemical • Electrolytic
  • 29. PREVENTING CORROSION • Material selection • Surface protection – Anodising – Plating (Cr, Sn, ...) – Painting – Lubricating • Environmental protection (seals, desiccants)
  • 30. OTHER MECHANICAL FAILURE MECHANISMS • Backlash (wear?) • Adjustments • Leaks • Loosening (fasteners) – Wear? – Maintenance? • etc.
  • 31. MATERIAL SELECTION FOR RELIABILITY/DURABILITY • Metals: corrosion, protection, fatigue • Plastics, rubbers: chemical, temperature stability, UV sensitivity • Ceramics: fracture toughness • Composites: impact strength, delamination, erosion
  • 32. Electrical/electronics Stress, Strength & Failure • Component selection • Stress derating (electrical, thermal) • EMI, EMC, ESD • Parameter variation • Connectors • Mechanical
  • 33. Stress Effects • Current – temperature rise – drift • Voltage – current/overstress (EOS) – arcing, corona discharge • Power (W = I²R) • Temperature
  • 34. Arrhenius’ Law: $\lambda = K \exp\left(-\frac{E}{kT}\right)$ or $\lambda = K \exp\left(-\frac{A}{T}\right)$, where E = activation energy (0.3 - 1.5 eV) and k = Boltzmann’s constant (8.62 × 10⁻⁵ eV K⁻¹)
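To get a feel for how strongly the Arrhenius model depends on the activation energy, the following sketch evaluates the slide's formula and the ratio of rates at two temperatures. It is a minimal illustration, not part of the seminar: the function names, the 40 °C use temperature and the 125 °C test temperature are assumptions chosen for the example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K


def arrhenius_rate(k_const, activation_energy_ev, temp_kelvin):
    """Rate lambda = K * exp(-E / (k * T)), as on the slide."""
    return k_const * math.exp(
        -activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_kelvin)
    )


def acceleration_factor(activation_energy_ev, use_temp_c, test_temp_c):
    """Ratio of the rate at the test temperature to the rate in use.

    The constant K cancels, so only E and the two temperatures matter.
    """
    use_k = use_temp_c + 273.15
    test_k = test_temp_c + 273.15
    return (arrhenius_rate(1.0, activation_energy_ev, test_k)
            / arrhenius_rate(1.0, activation_energy_ev, use_k))


# Sweep the activation energy range quoted on the slide (0.3 - 1.5 eV).
for energy in (0.3, 0.7, 1.5):
    af = acceleration_factor(energy, use_temp_c=40.0, test_temp_c=125.0)
    print(f"E = {energy} eV: acceleration factor = {af:.3g}")
```

The spread of results across the quoted energy range shows why an assumed activation energy dominates any extrapolation, which is consistent with the question mark the later slides put against Arrhenius for thermal acceleration.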
  • 35. Temperature Effect on Reliability [Figure: λ vs. T (°C), contrasting the MIL217/Bellcore prediction with reality; T axis marked at 20, rated (85/125) and 200?]
  • 36. Drift Characteristics [Figure: carbon resistor at +70 °C; change in R (%) from 0 to -1.5 vs. time (h × 1000) from 1.0 to 2.0, with curves for 50% PSR and 100% PSR]
  • 37. Semiconductor Device Construction Features • Si preparation • Diffusion • Passivation* • Metallization* • Glassivation • Connection • Packaging (*multilayer)
  • 38. Semiconductor Device Technologies • ASIC • Mixed signal (analog/digital/RF) • III-V (GaAs, InP) • Power (transistors, thyristors, GTO, IGBT) • Microwave (MMIC)
  • 39. Microcircuit Mounting and Connection • DIP in PTH • Flat pack / SOIC • Surface mounting – Leadless chip carrier (LCC) – Pin grid array (PGA)/ball grid array (BGA) – Chip scale packaging (CSP) – Tape automated bonding (TAB) • IC sockets (DIP, LCC)
  • 40. Semiconductor Device Failure Mechanisms 1. Die Related • Crystal structure / impurity • Diffusion / masking • Passivation / dielectric breakdown (TDDB) • Electromigration • Passivation • Latch-up • Slow trapping, hot carriers, alpha particle • External: ESD / EOS / EMP
  • 41. Semiconductor Device Failure Mechanisms 2. Package Related • Adhesion • Bonding • Impurity / corrosion / inclusions • Hermeticity • Solderability
  • 42. Passive Device Failure Mechanisms 1. Resistors (Fixed) • Parameter drift • Open circuit • Noise 2. Variables • As above plus: • Mechanical failure • Contact failure • Seal failure
  • 43. Passive Device Failure Mechanisms 3. Capacitors • Short circuit (dielectric breakdown) • Open circuit (high V) • Leakage (wet types) • Wire bond failure (open circuit)
  • 44. Passive Device Failure Mechanisms 4. Interconnections • PCB – ball bonds – track cracks (opens) – through hole opens – shorts • Wire/ribbon – breaks (fatigue, damage) – solder attach • Intermittents
  • 45. Solder. Major contributor to failures! (SMT, BGA, >10K joints/board) • Inadequate wetting (contamination, oxidation) • Insufficient time (“second drop”) • Fatigue • Creep
  • 46. Insulation • Damaged, cut, chafed, trapped, … • Overheated • Aged, embrittled • Eaten (rodents)
  • 47. System/circuit Problems • Distortion • Jitter • Timing • Interference/compatibility (“noise”) (EMI/EMC) • Intermittents/no fault found (NFF)
  • 48. EMI: Problems • High frequencies (MHz - GHz) (VHF-UHF!) • Close spacing (SMT, narrow tracks) • ASICs, mixed signals (digital, RF) • New regulations (UL, CE, etc.) • Lack of knowledge (designers, managers) • Basic EDA does not simulate
  • 49. EMI Sources (internal) • Current loops (Lenz’s Law: reduce loop area) • Signal noise (components, conductors) • Ground noise
  • 50. EMI Sources (external) • ESD • Switched inductive loads • Supply transients • Other systems (motors, radars, computers, peripherals)
  • 51. EMI Protection • Shielding – Faraday shield – Coax cables • Circuit protection – Capacitive (decoupling) – Inductive – Opto-couplers – Filters, regulators (on PCB)
  • 52. Electrical Overstress/Electrostatic Damage (EOS/ESD) • ICs ARE VULNERABLE!! • People generate 1 - 5 kV / 50 - 100 μJ • EOS / ESD can kill ICs • It can also do GBH • On-chip protection
  • 53. EOS/ESD Protection • Connector separation for different voltage levels • Decoupling of ICs • Isolation (opto-couplers) • Handling / packaging / bonding • On-chip protection
  • 54. Probability Distributions. Histogram and Probability Density Function [Figure: pdf f(x) vs. x]
  • 55. Normal Distribution [Figure: probability vs. variable, from -4 to +4 standard deviations (s) about the mean]
  • 56. “Natural” Variation • Constant in time. Past = Future • “Normal” Distribution Function (Mean, Standard Deviation) • “Made by God”
  • 57. Normal (Gaussian) Distribution • Central Limit Theorem • Symmetrical about mean/median μ • Standard deviation (SD) σ. Variance = σ² • Within ±nσ lie: n = 1: 68%; n = 2: 95%; n = 3: 99.7%; n = 6: 99.9999998%
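The ±nσ percentages for a normal distribution follow from the error function: the fraction within ±nσ of the mean is erf(n/√2). The short sketch below reproduces them; the function name is a hypothetical choice for this illustration.

```python
import math


def fraction_within(n_sigma):
    """Fraction of a normal population lying within +/- n_sigma of the mean."""
    return math.erf(n_sigma / math.sqrt(2.0))


for n in (1, 2, 3, 6):
    inside = fraction_within(n)
    print(f"+/-{n} sigma: {100 * inside:.7f}% inside, "
          f"{1 - inside:.3g} outside")
```

Printing the fraction outside the limits as well makes the point of the later slides easier to see: the tails involve very small probabilities, where real engineering data rarely follow the normal model.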
  • 58. Variation in Engineering • Not “normal” • Not constant in time. Past NOT = Future • Selection effects • Often deterministic (V = IR, F = ma) • Sometimes due to failures, errors, ... • Occasionally catastrophic (discontinuous, e.g. fatigue) • “Made by man”
  • 59. Curtailed Distribution [Figure: probability vs. variable, from -4 to +4 standard deviations (s) about the mean]
  • 60. Effect of Selection [Figure: probability vs. parameter, axis marked -10%, -5%, Nom., +5%, +10%]
  • 61. Skewed Distribution [Figure: probability vs. variable]
  • 62. Bimodal Distribution (typical human mortality) [Figure: probability of death at this age vs. variable (years), 10 to 110]
  • 63. Normal Distributions? [Figure: four distributions (1-4) with the same mean and SD, plotted from -nσ to +nσ about the mean (from Shewhart)]
  • 64. Weibull Distribution: $R = \exp\left[-(t/\mu)^{\beta}\right]$, where μ = characteristic life and β = shape parameter (slope): β = 1: CHR; β < 1: DHR; β > 1: IHR. If failure-free life = γ, replace t with (t - γ)
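The Weibull reliability function is easy to evaluate directly. The sketch below implements the slide's formula, including the optional failure-free life γ; treating reliability as 1 for t ≤ γ is how the substitution (t - γ) is interpreted here. The function name and the numbers in the loop are hypothetical.

```python
import math


def weibull_reliability(t, char_life, shape, failure_free_life=0.0):
    """R(t) = exp(-((t - gamma) / mu) ** beta), with R = 1 for t <= gamma.

    char_life is mu (characteristic life), shape is beta,
    failure_free_life is gamma.
    """
    if char_life <= 0 or shape <= 0:
        raise ValueError("char_life and shape must be positive")
    if t <= failure_free_life:
        return 1.0
    return math.exp(-(((t - failure_free_life) / char_life) ** shape))


# Same characteristic life, three hazard patterns: DHR, CHR, IHR.
for beta in (0.5, 1.0, 3.0):
    r = weibull_reliability(t=500.0, char_life=1000.0, shape=beta)
    print(f"beta = {beta}: R(500 h) = {r:.4f}")
```

Whatever the shape parameter, R at t = μ equals exp(-1), which is why μ is called the characteristic life.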
  • 65. Distributed load and strength [Figure: probability vs. value for load (L) and strength (S) distributions, four panels: a. Non-overlapping distributions; b. Overlapping distributions: wide strength variation (low LR); c. Curtailed strength distribution; d. Overlapping distributions: wide load distribution (high LR)]
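  • 66. Distributed Load & Strength. For Normally Distributed Load L and Strength S: safety margin $SM = \frac{\bar{S} - \bar{L}}{\sqrt{\sigma_S^2 + \sigma_L^2}}$; loading roughness $LR = \frac{\sigma_L}{\sqrt{\sigma_S^2 + \sigma_L^2}}$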
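For normally distributed load and strength, the two quantities used on the neighbouring slides are the safety margin SM = (mean S - mean L) / sqrt(σS² + σL²) and the loading roughness LR = σL / sqrt(σS² + σL²). The sketch below computes both and, as an added assumption not stated on the slide, the probability that strength exceeds load for a single independent load application, which for normal distributions is Φ(SM). All names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def safety_margin(mean_s, sd_s, mean_l, sd_l):
    """SM: separation of the means in units of the combined SD."""
    return (mean_s - mean_l) / math.sqrt(sd_s ** 2 + sd_l ** 2)


def loading_roughness(sd_s, sd_l):
    """LR: share of the combined spread that comes from the load."""
    return sd_l / math.sqrt(sd_s ** 2 + sd_l ** 2)


def single_load_reliability(mean_s, sd_s, mean_l, sd_l):
    """P(strength > load) for one load application (normal assumption)."""
    return NormalDist().cdf(safety_margin(mean_s, sd_s, mean_l, sd_l))


# Hypothetical values in arbitrary stress units.
sm = safety_margin(mean_s=100.0, sd_s=8.0, mean_l=70.0, sd_l=6.0)
lr = loading_roughness(sd_s=8.0, sd_l=6.0)
rel = single_load_reliability(100.0, 8.0, 70.0, 6.0)
print(f"SM = {sm:.2f}, LR = {lr:.2f}, R = {rel:.5f}")
```

The result depends entirely on the tails of both distributions, so it should be read with the caution the seminar itself attaches to assuming normality.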
  • 67. Time-dependent load and strength [Figure: load and strength distributions vs. time/load cycles (log scale), with time t’ marked]
  • 68. Strength v. specification (time dependent) [Figure 6.3 Strength vs. Specification (time-dependent): probability vs. strength and time, showing the specification and the probability of failing at max. specified stress]
  • 69. Summary of High Reliability Design Principles • Determine most likely distributions of load and strength • Evaluate SM for intrinsic reliability • Determine protection methods (load limit, derate, screen, QC) • Analyse strength degradation modes • Test to corroborate, analyse results • Correct or control (redesign, safe life, maintenance, ...)
  • 70. Multiple Variations. Traditional Method: • Test effect of one variable at a time • Cannot test interactions
  • 71. Statistical Design of Experiments (DoE) • Test all variables simultaneously • Randomisation • Analysis of variance (ANOVA): 1. Determines effects of all variables 2. Determines effects of all interactions (R.A. Fisher, 1926)
  • 72. Genichi Taguchi • “Loss to Society” • System Design • Parameter Design • Tolerance Design • Control & Noise Factors • Orthogonal Arrays • Brainstorm
  • 73. DoE: Engineering Aspects • Statistical v. engineering significance • Randomisation • Cost effectiveness • Confirmation • SPC • CAE • Nonlinearity • Management
  • 74. Confidence and Risk • s-confidence = probability that population parameter lies between “confidence limits” • Bigger sample, narrower confidence limits • Risk = (1 - confidence) (probability that parameter lies outside confidence limits) • s-confidence vs. engineering confidence
  • 75. Statistical, Scientific and Engineering Confidence • Statistical test (binomial): items tested with 0 failures: 0, 1, 10, 20; 80% s-confidence that R >: 0, 0.90, 0.98, 0.99. Data is entirely statistical, no prior knowledge • Scientific test: items dropped, all fall: 0, 1, 10, 20; confidence that all will fall: 1, 1, 1, 1. Information is deterministic • Engineering: can range from deterministic to statistical
  • 76. Measures of Reliability • Failure Rate (FR) (λ) • Hazard Rate (HR for non-repairable items) (λ) • Mean Time Between Failures (MTBF) (M)* • Mean Time to Failure (MTTF) (M)* • Durability (failure free life; FR = 0) • Reliability R = Probability of no failures in time t = $e^{-\lambda t} = e^{-t/M}$* (*for constant failure/hazard rate)
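A quick numerical check of the constant-rate formula R = exp(-λt) = exp(-t/M) helps to show what an MTBF does and does not promise. The sketch below is only an illustration; the function name and the 10,000 h MTBF are hypothetical.

```python
import math


def reliability_constant_rate(t, mtbf):
    """R(t) = exp(-t / M) = exp(-lambda * t), valid only for a constant rate."""
    if mtbf <= 0:
        raise ValueError("mtbf must be positive")
    return math.exp(-t / mtbf)


mtbf_hours = 10_000.0  # hypothetical MTBF
for hours in (100.0, 1_000.0, 10_000.0):
    r = reliability_constant_rate(hours, mtbf_hours)
    print(f"t = {hours:>8.0f} h: R = {r:.4f}")
```

At t = M the reliability is exp(-1), roughly 0.37, so under this model most items will have failed at least once by the time the MTBF is reached.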
  • 77. Patterns of Failure. The Bathtub Curve [Figure: total failure rate vs. time t, the sum of DFR (weak items), CFR and IFR (wearout) curves, over the infant mortality, useful life and wearout periods]
  • 78. Variation: summary • Variation is seldom (never?) “normal” • Most important variation is in the tails – Less data – More uncertain – Conventional stats most misleading • Variation can change over time • Interaction effects • Variation made by people • Most engineering education maths only
  • 79. Development Test Principles. Categories of test: • Functional (design proving/proof of principle) • Reliability/durability • Contractual/safety/regulatory • Test and evaluation (T&E) • Beta testing
  • 80. Development Test Principles. Fill “uncertainty gap” • Performance/safety: – demonstrate success – perform once • Reliability/durability: – test to fail – accelerated tests • Variation: – Taguchi/statistical experiments – Multiple tests?
  • 81. Development Test Principles • Components, systems, interfaces • Software • External suppliers • FRACAS • Integrated test programme
  • 82. Development Test Principles. Test economics: major driver of development cost & time, BUT: • Failure costs increase during project phases (x10 rule: design, development, production, service) • Failure free design is cheaper! (experience, training, integrated engineering, design analysis)
  • 83. Development Test Principles. Strength v. Specification [Figure: probability vs. strength (stress to fail), with the specification and load L marked]
  • 84. Development Test Principles. Strength v. Specification (transient & permanent failures) [Figure: probability vs. strength (stress to fail), with the specification and the transient and permanent failure distributions]
  • 85. Development Test Principles. Strength v. Specification (time dependent) [Figure: probability vs. strength (stress to fail) and time, with the specification marked]
  • 86. Development Test Principles • Failures are often due to combined stresses/strengths (uncertain) • Failures are often influenced by interactions (uncertain) • Failures often time-dependent (uncertain) • Causes of service failures can be shown by different test stresses, e.g. – vibration/temperature cycle – high frequency/low frequency
  • 87. Development Test Principles. Fundamental principle: increase (combined) stresses to cause failures, then use information to make product stronger. Limits: • Technology (e.g. solder melt) • Test capability • Economic
  • 88. Development Test Principles. Testing at “representative” stresses, and hoping for no failures, is ineffective and a waste of resources. Examples: • Engines on test beds • Cars on test tracks • “Simulated” environmental test (MIL-STD-781, MIL-STD-810, etc.)
  • 89. Development Test Principles. Environments (1): • All relevant environments • Combined environments (CERT) • User • Environmental simulation?
  • 90. Development Test Principles. Environments (2): • Thermal • Thermal fatigue (switching) • Vibration • Shock • Humidity • Power supply/load • Transients (ESD, EOS) • Pollution, corrosion • People, other animals • Etc.
  • 91. Development Test Principles. Accelerated stress test • Miner’s Law for fatigue (mech, thermal) • Arrhenius Law for thermal acceleration? • Step-stress testing • Failure modes relevant, not stress levels!
  • 92. Development Test Principles. Highly accelerated life test (HALT) (1) • Highly accelerated combined stresses (temperature, cycling, multi-axis vibration, others...) • Step stress to discover transient and permanent limits • Time compression: orders of magnitude • Developed by Gregg Hobbs
  • 93. Development Test Principles. HALT (2) • Special chambers, facilities (QualMark, Thermotron, Screening Systems, TEAM, ...) • Savings: time, space, energy • Optimise manufacturing screens (HASS) • Similar approaches: – Highly accelerated stress test (HAST) – Stress-induced failure test (STRIFE) – Failure mode verification test (FMVT®, Entela) – Etc.
  • 94. HALT Philosophy (1) [Figure: stress limits along a (combined) stress axis: lower destruct limit, lower operating limit, product spec., upper operating limit, upper destruct limit] • High stresses = small samples!
  • 95. HASS Philosophy [Figure: precipitation screen and detection screen placed on the (combined) stress axis: lower destruct limit, lower operating limit, product spec., upper operating limit, upper destruct limit]
  • 96. HALT/HASS Philosophy (2) [Figure: stress (S) vs. cycles to fail (log N), marking HALT/HASS, ESS and in-use stress levels]
  • 97. Accelerated Test Approach (TE p. 105) 1. What failures might occur in service? (FMEA, etc.) 2. List/analyse stresses, combinations. 3. Plan how to apply. 4. Apply single stresses, step increases to failure. 5. Analyse failure, strengthen design. 6. Iterate 4 & 5 to fundamental limits. 7. Repeat with combined stresses. 8. Iterate 5 & 6.
  • 98. Accelerated Test Approach. Examples: • Mechanical (rotating, engines, etc.) – Old lubricants, filters – Low fluid levels (oil, coolant) – Out-of-balance • Electro-mech (printers, etc.) – Temp, vib, power V level, humidity, ... – Misalign shafts, etc. – Out-of-spec. materials (paper, friction, ...) • Electronic components/packages, etc. – Temp, vib (high frequencies), etc. – Use vibration transducers (speaker coils?)
  • 99. Accelerated Test Approach. Questions (TE p. 109): • How many to test? As many as practicable/economic • Can reliability (MTBF, durability) be measured? NO! It will be increased! • How do we know if failure on test could occur in service? Analyse, use experience, THINK! • Product will see no vibration in service. Why vibrate on test? Vibration on test can stimulate failures caused by temp. cycle, handling, etc. in service, QUICKLY! • Is the principle limited to temp, vib, elec stress? Not at all. Apply to fluid systems, mech tolerances, etc.
  • 100. HALT/HASS Payoffs • Robust designs + capable processes = High Reliability • Reduced test time and cost • Feedback to design: reduce “uncertainty gap” on future products • Continuous improvement (“kaizen”) of design capability (products, processes)
  • 101. Accelerated Test or DoE? Important variables, effects, etc., and whether to use DoE or HALT:
    Parameters: electrical, dimensions, etc.: DoE
    Effects on measured performance parameters, yields: DoE
    Stress: temperature, vibration, etc.: HALT
    Effects on reliability/durability: HALT
    Several uncertain variables: DoE
    Not enough items available for DoE: HALT
    Not enough time available for DoE: HALT
  • 102. Circuit Test Principles: Analog • DC: current, potential, resistance (AVO), capacitance, ... • AC: current, potential, impedance, waveforms, ... • Signals: waveforms, gain, distortion, jitter, ...
  • 103. Circuit Test Principles: Digital. “Stuck at” faults (SA0, SA1). 2-input AND gate with inputs A, B and output O. Test vectors: 4 (combinational logic). Truth table for 2-input AND gate:
    A B O
    0 0 0
    0 1 0
    1 0 0
    1 1 1
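The stuck-at fault model for the 2-input AND gate can be simulated in a few lines. The sketch below is an illustrative assumption about how such a check might be coded, not a tool from the seminar: it injects each single stuck-at fault (SA0 or SA1 on node A, B or O) and lists which faults every input vector detects, meaning the faulty output differs from the good output.

```python
from itertools import product

NODES = ("A", "B", "O")
FAULTS = [(node, value) for node in NODES for value in (0, 1)]


def and_gate(a, b, fault=None):
    """2-input AND gate; fault is (node, stuck_value) or None for a good gate."""
    if fault is not None and fault[0] == "A":
        a = fault[1]
    if fault is not None and fault[0] == "B":
        b = fault[1]
    out = a & b
    if fault is not None and fault[0] == "O":
        out = fault[1]
    return out


def detected_faults(vector):
    """Return the set of single stuck-at faults this input vector exposes."""
    a, b = vector
    good = and_gate(a, b)
    return {f for f in FAULTS if and_gate(a, b, f) != good}


for vector in product((0, 1), repeat=2):
    found = sorted(detected_faults(vector))
    print(vector, "->", [f"{node} SA{value}" for node, value in found])
```

Under this single-fault model the vectors (0, 1), (1, 0) and (1, 1) together expose all six faults, so the exhaustive set of four vectors on the slide is more than the minimum for this gate. Exhaustive testing stops being practical as the number of inputs grows.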
  • 104. Circuit Test Principles: Digital. Logic classes: • Combinational: outputs follow inputs • Sequential: input dependent, also data flow, memory allocation • Dynamic: requires refresh/“keep alive”
  • 105. Circuit Test Principles: Digital. Fault types: • SA0, SA1 • Stuck at input • “At speed” • Pattern sensitive • Etc.
  • 106. Manual Test Equipment • Basic instruments – DMMs, power meters, ... • Instruments – oscilloscopes, waveform generators, spectrum analysers, logic analysers, ... • Special instruments – RF testers, optical signal testers, hi volt, ... • PC-based
  • 107. Automatic Test Equipment (ATE) • Vision: automatic optical inspection (AOI), X-ray (AXI) • Manufacturing defects analyser (MDA) • In-circuit test (ICT) • Fixtureless/flying probe • Functional test (FT) (via circuit connectors) • Combined ICT/FT • Special test (RF, power supplies, manual, “hot rig”, ...)
  • 108. Test Capability. ATE must: • Confirm correct operation of good circuits • Not classify good as faulty • Detect faulty items • Diagnose fault causes
  • 109. Design for Test (DFT). Design must allow ATE to: • Initialize (start clocks, set logic states) • Control (e.g. open feedback loops, force logic, generate inputs) • Observe (access to important nodes) • Partition (reduce test program complexity)
  • 110. Layout for ICT • Keep PCB edges clear • Location holes • Large components on top (for double sided PCBs) • Resistors between power lines and control signals (resets, enables, tristates) • Clock disable (provide link)
  • 111. Built-in Test (BIT) • Boundary scan (IEEE 1149.1) • ASICs • Logic and function tests • Complexity, false alarms
  • 112. EMI/EMC Test. Must test for: • Radiated emissions • Conducted emissions (power lines, signal lines) • Compatibility (susceptibility) (radiated, power, signals) • Internal problems • Special situations (rail signalling, avionics, lightning, nuclear (NEMP, etc.)). Standards and regulations
  • 113. Test Control and Data Acquisition (DAQ). Test databus standards: • General purpose interface bus (GPIB) (IEEE 488) • Peripheral Component Interconnect (PCI), PCI eXtensions for Instrumentation (PXI) • VME eXtensions for Instrumentation (VXI)
  • 114. IC Test • Special/expensive ATE • Test cost ≅ IC manufacture cost! • IDDQ test • BIST • Standard tests (MIL-STD-883, etc.) • Rely on IC manufacturer’s tests
  • 115. IDDQ Test [Figure 8.11 IDDQ plot: IDDQ (mA, 0.1 to 0.3) vs. node state (1 to 15, etc.) for a good device and a defective device (at states 2, 3, 10, ...)]
  • 116. Standards, References, Software • MIL-STD-2165 (USA) • DEF STAN 00-13 (UK) • ‘Design for Testability’ - Jon Turino • ‘Testability Advisor’ - Logical Solutions Inc.
  • 117. Software Reliability • All new systems involved (operating & test) • Cannot predict failure modes and effects • Cannot test complete system* • Errors are present in all copies* • S/W - H/W interfaces (keyboards, sensors, devices, EMI) (*Compare VLSI hardware)
  • 118. Hardware/Software Reliability Differences (1)
    1. Hardware: Failures can be caused by deficiencies in design, production, use and maintenance. Software: Failures are primarily due to design faults.
    2. Hardware: Failures can be due to wear or other energy-related phenomena. Software: There are no wearout phenomena. Software failures occur without warning.
    3. Hardware: No two items are identical. Failures can be caused by variation. Software: There is no variation: all copies of a program are identical.
    4. Hardware: Repairs can be made to make equipment more reliable. Software: There is no repair. The only solution is redesign (reprogramming).
    5. Hardware: Reliability may be time-related, with failures occurring as a function of operating (or storage) time, cycles, etc. Software: Reliability is not time related. Failures occur when a specific program step or path is executed or a specific input condition is encountered, which triggers a failure.
    6. Hardware: Reliability may be related to environmental factors (temperature, vibration, humidity, etc.). Software: The external environment does not affect reliability except insofar as it might affect program inputs.
    7. Hardware: Reliability can be predicted, in principle but mostly with large uncertainty, from knowledge of design, parts, usage, and environmental stress factors. Software: Reliability cannot be predicted from any physical bases, since it entirely depends on human factors in design.
  • 119. Hardware/Software Reliability Differences (2)
    8. Hardware: Reliability can be improved by redundancy. Software: Reliability cannot be improved by redundancy if the parallel paths are identical, since if one path fails, the other will have the error.
    9. Hardware: Failures can occur in components of a system in a pattern that is, to some extent, predictable from the stresses on the components and other factors. Reliability critical lists are useful to identify high risk items. Software: Failures are rarely predictable from analyses of separate statements. Errors are likely to exist randomly throughout the program, and any statement may be in error. Reliability critical lists are not appropriate.
    10. Hardware: Hardware interfaces are visual; one can see a 10-pin connector. Software: Software interfaces are conceptual rather than visual.
    11. Hardware: Computer-aided design systems exist that can be used to create and analyse designs. Software: There are no computerised methods for software design and analysis.
    12. Hardware: Hardware products use standard components as basic building blocks. Software: There are no standard parts in software, although there are standardised logic structures. Software reuse is being deployed, but on a limited basis.
  • 120. Software in Engineering • “Real time” • Wide range of interfaces (hardware, human, timing, ...) • Different levels of embedding (ASICs, PGAs, BIOS, ...) • Hardware/software options for functions • Electrically “noisy” environments • Usually smaller
  • 121. Software Reliability. ERROR → FAULT → FAILURE. Sources of error: • Specification (60%) • Design (20%) • Code (20%) (typo, numerical, omissions, etc.) • Timing/EMI • Data (information) integrity
  • 122. Error Reduction • Modular design • Error traps • Remarks • Spec & code review • Test
  • 123. Fault Tolerance • Internal tests (rates of change, cycle times, logic) • Resets, fault indications • Redundancy, voting • Hardware failure protection
  • 124. Languages • Machine code/microcode • Assembly level/symbolic assemblers – Both processor specific – Faster, less memory – Difficult, error prone • High level (HLL) (BASIC, Fortran, *Pascal, *Ada, *C, *C++) – Processor independent – Easier, error protection* – Assemblers, compilers • Programmable logic controllers (PLCs) • Assemblers, compilers
  • 125. Software Testing (1) • Total paths = $2^n$ (n = branches + loops) • Test specs – All requirements (“must do”, “must not do”) – Extreme conditions (timing, parameter values, rates of change, memory utilisation, ...) – Input sequences – Fault tolerance/error recovery
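The path-count formula, total paths = 2^n with n = branches + loops, explains why exhaustive path testing is rarely possible. The sketch below evaluates it for a few program sizes; the function names and the assumed rate of 1,000 path tests per second are hypothetical values chosen only to give a sense of scale.

```python
def total_paths(branches, loops):
    """Total paths = 2 ** n, with n = branches + loops (slide's formula)."""
    return 2 ** (branches + loops)


def years_to_test_all(paths, tests_per_second=1_000):
    """Time to execute every path once at an assumed, hypothetical test rate."""
    seconds_per_year = 365 * 24 * 3600
    return paths / tests_per_second / seconds_per_year


for branches, loops in ((8, 2), (30, 10), (60, 20)):
    paths = total_paths(branches, loops)
    print(f"n = {branches + loops:>2}: {paths:.3e} paths, "
          f"{years_to_test_all(paths):.3e} years at 1,000 tests/s")
```

Even modest modules quickly exceed what any test campaign could cover, which is why the test specs on the slide focus on requirements, extreme conditions and input sequences rather than on every path.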
  • 126. Software Testing (2) • Module & interface tests (“white box”) – Data/control flow – Memory allocation – Lookups – Etc. • System tests – Verification – Validation (“black box”)
  • 127. Documentation • Specifications • Code, remarks • Notebooks • Changes, corrections • Test results: – Version – Test – Faults
  • 128. Software Reliability Prediction and Measurement • Methods: – Error/bug count – Time-based (hours, days, CPU seconds) • “Cleanroom” approach (IBM) • Do not use!
  • 129. Test in Manufacture. Manufactured items are either: 1. Good 2. Defective, but detected and fixed or scrapped 3. Defective, but shipped, and might/will fail later. We must inspect/test to discriminate
  • 130. Manufacturing Test Principles (1) • All testing costs. So minimise (ideal = zero) • But: – Manufacturing processes generate variation & defects – Later costs of variation & defects can exceed costs of detection & correction/removal • So: – Must consider total life cycle (manufacturing, use, ...). Value-added testing
  • 131. Manufacturing Test Principles (2). Test cost justification is difficult, because: • Test costs arise in manufacture; failure costs arise later • Failure occurrences and costs cannot be predicted. Some testing might be obligatory: calibration, EMI/EMC, safety, etc.
  • 132. Test Capability. Tests must: • Identify good items • Detect defects (parts, processes, suppliers, ...) • Indicate defect source/location
  • 133. Test Pass-Fail Logic [Figure 10.6 Test pass-fail logic: flowchart with the decisions “Pass?”, “Test OK?” and “Detect?”, leading either to the next test or to diagnose and repair]
  • 134. Test Criteria and Stresses • Manufacturing tests are not tests of the design • Manufacturing tests must not damage good items (contrast with development)
  • 135. Manufacturing Test Economics. Aspects to consider: • Cost of test(s) (setup, run, repairs, ...) • Defects that might be generated upstream • Test capability • Alternatives to test (inspection, ...) • Methods to reduce/prevent defects • Downstream costs of undetected defects • 100% or sample test?
  • 136. Manufacturing Test Economics. Examples: • Screw • Integrated circuit • Automotive gearbox • Car • Spacecraft • Electronics assembly
  • 137. Inspection and Measurement. Inspection: • Visual (manual, automatic). Measurement: • Dimensional (metrology) – Micrometers, CMMs, ... • Parameters – mech. (strength, torque, ...) – elec. (instruments, ATE, ...) (Module 8). Inspection, measurement, test: not absolute definitions
  • 138. Stress Screening. Definition: application of stresses to cause defective items to fail/show without damaging good ones. Alternative terms: • Environmental stress screening (ESS) • Burn-in (electronic components & systems) • STRIFE test • etc. Guidelines, etc.: • US NAVMAT P-9492 • US MIL-STD-2164 • IEST ESSEH Guidelines
  • 139. Highly Accelerated Stress Screening (HASS) • Highly accelerated stresses (temp., vib., elec., ...) • Developed via HALT in development testing • Stresses are not extrapolations of service conditions • Can be applied only to products that have been subjected to HALT in development
  • 140. HASS Philosophy (1) [Figure: precipitation screen and detection screen placed on the (combined) stress axis: lower destruct limit, lower operating limit, product spec., upper operating limit, upper destruct limit]
  • 141. HALT/HASS Philosophy (2) [Figure: stress (S) vs. cycles to fail (log N), marking HALT/HASS, ESS and in-use stress levels]
  • 142. HASS Philosophy (3) • Proof (safety) of screen (POS) • HASA (audit): sample v. 100% • Review/adapt (e.g. repeat POS) • Can apply to any technology (elec., mech.) • Keep flexible (no standard procedures)
  • 143. Electronics Manufacturing Faults. In rough order: • Solder problems (permanent/intermittent o/c or s/c, weak, ...) • Parts missing/wrong place/wrong value • Part parameters/functions • Damage (physical, ESD, ...) • System/assembly level (cables/connectors, variation, EMI/EMC, ...). In the 1970s the list could have been reversed!
  • 144. Electronics Test Options/Economics. Board test [Figure 10.3 Electronics assembly test flow example: Assemble (CA), AOI (CI), MDA (CM), ICT/FT (CF), Ship; items that fail at AOI, MDA or ICT/FT (proportions dI, dm, df) go to diagnose/repair (CR). C = cost, d = proportion failed]
  • 145. Electronics Test Options/Economics. A simple model for the manufacturing and test cost per unit is: C = CA + CI + CM + CF + (CR + CM + CF)(dI + dm + df). If, for example, CA = $200, CI = $10, CM = $10, CF = $20, CR = $50 and dI = dm = df = 0.05, then the total cost per unit would be $252
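The cost model on this slide is simple enough to check numerically. The sketch below evaluates C = CA + CI + CM + CF + (CR + CM + CF)(dI + dm + df) with the slide's example values; the function and parameter names are hypothetical labels for the slide's symbols.

```python
def cost_per_unit(c_assemble, c_inspect, c_mda, c_functional, c_repair,
                  d_inspect, d_mda, d_functional):
    """Manufacturing and test cost per unit, following the slide's model.

    Every unit pays for assembly and each test once; the failed proportion
    additionally pays for repair plus repeat MDA and functional test.
    """
    first_pass = c_assemble + c_inspect + c_mda + c_functional
    rework = (c_repair + c_mda + c_functional) * (
        d_inspect + d_mda + d_functional
    )
    return first_pass + rework


cost = cost_per_unit(c_assemble=200, c_inspect=10, c_mda=10, c_functional=20,
                     c_repair=50, d_inspect=0.05, d_mda=0.05,
                     d_functional=0.05)
print(f"cost per unit = ${cost:.2f}")  # $252.00 with the slide's values
```

The first-pass cost is $240 and the rework term adds $80 × 0.15 = $12, which matches the slide's $252. Varying the failed proportions shows how quickly defect rates feed into unit cost.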
  • 146. Fault Proportions & Coverage (share of all faults in %, then coverage in % by test method; * as marked on the slide)
    Fault                        Faults  AOI  AXI  MDA/ICT  FT  HASS
    Open circuit                 25      40   95   85       95  *
    Insufficient solder          18      40   80   0        0   20-80
    Short circuit                13      60   99   99       95  *
    Component missing            12      90   99   85       85  *
    Component misaligned         8       80   80   50       0   0
    Component elec. para error   8       0    0    20/80    80  *
    Wrong component              5       15   10   80       90  *
    Other non-electrical         4       80   0    0        0   20-80
    Excess solder                3       90   90   0        0   0
    Component reversed           2       90   90   80       90  *
  • 147. Assembly Test [Figure: assembly-level test of Board 1, Board 2, backplane, PSU, keypad and display]
  • 148. Electronic Assembly Burn-In (ESS) • Typically -30°C to 70°C, 5 cycles • Power on (monitor) • (Vibrate) • Finds production defects – Solder – Damage • Not effective against component defects (low temp, low stress)
  • 149. Integrating Stress Screening • Integrate with functional test (FT) • Before/after AOI/ICT? • Assembly stages?: – Board – Intermediate – Final • Re-screen after repair? YES No fixed rules!
  • 150. Post-Production Economics • TE page 183
  • 151. Electronic Component Test • All components tested by manufacturers • Generally not practicable/economic for OEMs/CEMs to test (IC tester $5M!) • No repair possible • Special cases: – Power devices? – Etc?
  • 152. Electronic Component Population Categories [Figure: failure probability against time (h), log axis from 10 to 10000, showing infant mortality, “freaks” and the good population (zero failures)]
  • 153. IC Test • MIL-STD-883 (TE p. 186) – Level A, B, C screens – Burn-in (125°C, 168 h) – Plastic/hermetic packages (autoclave test) • Other standards (CECC, IEC, ...) Don’t use!
  • 154. In-Service Test Philosophy Test only: • If only way to determine correct function • To determine failure cause (diagnostic) • To confirm repair Optimise during development
  • 155. Test Schedules • Continuous (BIT, monitors, ...) • Time run (electronics, aircraft, engines, ...) • Distance travelled (cars, trains, ...) • Operating cycles (electronics, aircraft engines, ...) • Calendar (calibration, seasonal, ...) Must be measured. Intervals, tolerances.
  • 156. Examples • TE pages 191-193
  • 157. Built-in (Self) Test (BIT/BIST) • Apply only to functions that are not observed • Keep it simple! – Sensors etc. fail – False alarms • Implement in software (no weight, power, complexity)
  • 158. “No Fault Found” (NFF) Causes: • Intermittent failures (components, connections, ...) • Tolerance effects • Connectors • BIT false alarms • Incorrect diagnosis/repair • Inconsistent test criteria • People • Ambiguous cause: >1 suspect unit changed (Also “retest OK” (RTOK), etc.) 50% - 80% of repairs!
  • 159. RCM (Reliability Centred Maintenance) Objectives • Optimises preventive maintenance (PM) • Balances cost, availability, reliability, safety
  • 160. Maintenance Categories (1) Corrective (CM): • Failure repair • Unplanned • Expensive/unsafe Minimise by high reliability and durability, + effective PM
  • 161. Maintenance Categories (2) Preventive (PM): • Failure prevention • Planned • Less expensive/safe Optimise by RCM
  • 162. RCM Decision Logic (1) Failure Pattern: • Increasing (wearout)? Consider replacement – Failure-free life (light bulbs/tubes, drive belts, bearings, ...) • Decreasing/constant? No replacement (electronics, ...)
  • 163. RCM Replacement Intervals (1) [Figures: hazard rate against time, with scheduled replacements at m, 2m, 3m] • Decreasing hazard rate: scheduled replacement increases failure probability • Constant hazard rate: scheduled replacement has no effect on failure probability
  • 164. RCM Replacement Intervals (2) [Figures: hazard rate against time, with scheduled replacements at m, 2m, 3m] • Increasing hazard rate: scheduled replacement reduces failure probability • Increasing hazard rate with failure-free life > m: scheduled replacement makes failure probability = 0
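The four replacement-interval cases can be reproduced numerically. The sketch below assumes a Weibull life distribution, which the slides do not name: shape beta below 1, equal to 1 and above 1 gives a decreasing, constant and increasing hazard rate, and a location parameter gamma stands in for the failure-free life. All function names and numbers are hypothetical illustrations.

```python
import math


def survival(t, eta, beta, gamma=0.0):
    """Weibull survival probability; no failures before the failure-free life gamma."""
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / eta) ** beta))


def failure_prob(horizon, interval, eta, beta, gamma=0.0):
    """Probability of at least one failure by `horizon` when the item is
    renewed every `interval` (each interval starts with a new item)."""
    full, rest = divmod(horizon, interval)
    p_survive = survival(interval, eta, beta, gamma) ** full * survival(rest, eta, beta, gamma)
    return 1.0 - p_survive


horizon, m, eta = 3.0, 1.0, 2.0  # replace at m, 2m, 3m
for beta in (0.5, 1.0, 2.0):
    with_pm = failure_prob(horizon, m, eta, beta)
    without_pm = 1.0 - survival(horizon, eta, beta)
    print(f"beta={beta}: with replacement {with_pm:.3f}, without {without_pm:.3f}")

# Increasing hazard rate with a failure-free life longer than m.
print(failure_prob(horizon, m, eta, beta=2.0, gamma=1.5))  # 0.0
```

Under these assumptions replacement raises the failure probability for beta = 0.5, leaves it unchanged for beta = 1 and lowers it for beta = 2, matching the slides; with a failure-free life longer than the interval no item is ever old enough to fail.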
  • 165. RCM Decision Logic (2) Failure Effect (FMECA): • Critical? Consider replacement / PM • Detectable? Consider PM (e.g. fatigue)
  • 166. RCM Decision Logic (3) Failure Cost: • High? Consider replacement (gearboxes, engines, ...) • Low? Consider replacement on failure (light bulbs/tubes, hydraulic hoses (?), ...)
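  • 167. RCM Decision Logic (4) [Flowchart] Failure rate (FR) increasing? No: no replacement. Yes: failure effect (FE) critical? If FE is not critical: failure cost high? No: replace on failure. If FE is critical, or the failure cost is high: failure detectable? Yes: PM. No: scheduled replacement.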
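The decision flowchart can be written as a small function. This is a sketch of one reading of the slide, not a definitive statement of RCM: the extracted flowchart is garbled, and the assumption here is that both "effect critical" and "cost high" lead on to the "failure detectable?" question. The function name and return strings are hypothetical.

```python
def rcm_decision(rate_increasing, effect_critical, cost_high, detectable):
    """Return the maintenance action suggested by the slide's decision logic.

    Assumed reading: a critical failure effect or a high failure cost both
    lead to the 'detectable?' question.
    """
    if not rate_increasing:
        # Decreasing or constant failure rate: replacement does not help.
        return "no replacement"
    if not effect_critical and not cost_high:
        return "replace on failure"
    # Critical or costly wearout failure: inspect if the onset can be detected.
    return "PM" if detectable else "scheduled replacement"


# Electronics (constant failure rate).
print(rcm_decision(False, True, True, False))   # no replacement
# Light bulb: wears out, not critical, cheap.
print(rcm_decision(True, False, False, False))  # replace on failure
# Fatigue-prone critical part with detectable cracks.
print(rcm_decision(True, True, True, True))     # PM
```

Keeping the rule as a pure function of four yes/no answers makes it easy to tabulate all sixteen combinations and review them against an FMECA.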
  • 168. (Incipient) Failure Detection Methods Mechanical: • Manual (corrosion, wear, condition, ...) • NDT for fatigue (ultrasonic, dye penetrant, radiographic, ...) • Oil analysis (spectroscopic, magnetic) • Vibration/acoustic Electrical/Electronic: • Built-in test • Functional test/calibration
  • 169. Stress Screens for Repairs • Proves repair effectiveness • Reduces NFF • Use HASS if units subjected to HALT/HASS
  • 170. Calibration • Regular test to ensure accuracy – Measuring devices – Instruments – Sensors • Traceability • Accuracy (ISO 5725) • Management, records, labels
  • 171. Organisation and Responsibilities Test Department: • Provide facilities (strategic, tactical) • Knowledge (methods, requirements, regulations, standards, ...) • External facilities (contracts, hire, ...) • Maintenance and calibration • Training
  • 172. Organisation and Responsibilities Projects: • Create and manage team • Plan and manage testing • Liaison with Test Department • Identify/obtain project-specific requirements
  • 173. Organisation and Responsibilities Design: • Design product • Design processes (manufacture, test, maintenance) • Integrate design analysis & development test • Design review (specification, pre-test, pre-production)
  • 174. Test Procedures Include: • Organisation and responsibilities • Methods (design analysis, test) • Test planning and action • Failure reporting (FRACAS) • Project/design reviews • Integration (development, production, maintenance test) • Test equipment maintenance & calibration • In-service maintenance & calibration
  • 175. Development Test Programme What/when to test? • Components, modules, system • Component test: – earlier – more/cheaper – higher stresses – selection • External suppliers’ products • Output module(s) first
  • 176. Development Test Programme How many to test? • As many as practicable (components/modules/systems) • Consider design analyses, risks, time, costs • Rotate items through tests (e.g. software, proving, environmental, ...) Ever heard of too much testing?
  • 177. Testing Purchased Items Base testing on: • Project requirements • Existing knowledge – supplier’s data – past use • Application/risks/novelty/costs ... • Supplier’s test programme/results Integrate! Retain Repeat
  • 178. In-House v. External Facilities In-house: • Core technologies/confidentiality • Designers more involved • More flexible (?) • Cheaper (?) External: • Lower capital outlay (?) • Better facilities/expertise (?) Consider balanced use of both TE homepage (/testservices.htm)
  • 179. Project Test Plan (1) Include: • Requirements (performance, reliability, standards, ...) • Failures that must/should not occur • Design/design analysis inputs (design review) • Tests to be performed • Test items/allocations • Suppliers’ test requirements • Integration through project phases • Responsibilities (primary, support) • Schedules
  • 180. Project Test Plan (2) • Single test plan • Link to other project plans – reliability – safety – quality, ... • Link/refer to procedures, standards, ... Flowchart: TE Fig. 14.1 (p. 241) Example: Appendix 3
  • 181. Manufacturing Test Plan • Develop from development test results • HALT/HASS Flowchart: TE Fig. 14.2 (p. 242) Example: Appendix 4
  • 182. Management Issues • Training – degree courses – short courses – on-the-job (HALT/HASS) • Integration – across functions – through phases • Economics – Long v. short term – Test adds value Reference: The Practice of Engineering Management, P.D.T. O’Connor (Wiley)
  • 183. The Future of Test • Virtual test – EDA, FEA, CFD, ... – Simulation – Virtual reality • “Intelligent” CAE – Integrated physics, variation, ergonomics, ... – automatic design • Internet • Test hardware (BIT, “Sentient™”, ...) • Computer-based test • Teaching (?)