Colwell validation attitude


  1. The Validation Attitude (Bob Colwell, April 2010)
  2. Attitude
      I could talk about techniques, tools, FV, environments, algorithms, machinery, languages, suites, training, but I think attitude is more important than any of those.
  3. No Perfect Designs
      • Nothing is perfect, everything has bugs
          – Shortcomings, compromises, defects, design errata, gaffes, goofs, fumbles, errors, boneheaded mistakes, bobbles, bungles, boo-boos
          – But not all bugs are equal!
      • Can’t test to saturation: schedule matters too
      • Why is everything always so darned buggy?
          – Software…need say no more…
          – Why did Titanic not have waterproof compartments?
          – Why did Ford Pinto have gas tank in back?
          – Why did Challenger fly with leaky O-rings?
          – Why did torpedoes not explode in WWII?
      • Entropy has a preferred direction
      • Only genius could paint Mona Lisa, but any small child can destroy it quickly
      • 1000 ways to do things wrong, 1 or 2 that work
      4/4/07 Bob Colwell
  4. Prescription: SW visualization, tools to localize bugs, diagnose problems, and instrument behavior
  5. Accidents Are Inevitable
      – It's the nature of engineering to push designs to edge of failure (schedule, reliability, thermals, materials, tools, judgment of unknowns)
      – P(accident) = ε, for ε ≠ 0
      – World rewards this behavior
      • Cool new features + first to market often preferred to dependability
      • Other markets (life-support) make (or should make) this trade-off differently!
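One way to read "P(accident) = ε, for ε ≠ 0" on slide 5 is to ask what a small nonzero ε adds up to over many launches, shipments, or operating hours. The sketch below is an illustration only: it assumes every trial is independent and has the same ε, which the slide does not claim, and the function name `p_at_least_one` is made up for this example.

```python
def p_at_least_one(eps: float, n: int) -> float:
    """Probability of at least one accident in n trials.

    Assumes (for illustration only) that trials are independent
    and that each has the same accident probability eps.
    """
    return 1.0 - (1.0 - eps) ** n


if __name__ == "__main__":
    eps = 1e-3  # hypothetical per-trial accident probability
    for n in (1, 100, 1_000, 10_000):
        print(f"n={n:>6}: P(at least one accident) = {p_at_least_one(eps, n):.5f}")
```

Under those assumptions the result climbs toward 1 as n grows for any ε greater than zero, which is the sense in which accidents are inevitable. Only ε = 0 keeps it at zero.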
  6. Isn’t that just Murphy’s Law? Close. But Murphy is not quite right.
      1. #Near-misses >> #disasters
      2. Competent design/test finds simple errors
      3. Complex sequences & unlikely event cascades survive to prod’n
  7. Failures Getting Worse
      • Mechanical things usually fail predictably due to physics
          – Wings bend, bridges groan, engines rattle, knees ache
          – By contrast, computer-based things fail “all over the place”
      Helpful Engineering Attitude:
      1. Nature does not want your engineered system to work; will actively work against you
      2. Your design will do only what you’ve constrained it to do, only as long as it has to
      3. Watch out for… normalization of deviance (Challenger O-rings, Apollo 1 fire)
  8. The Steely-Eyed Missile Validator
      • Apollo 12
      • 2nd try to land on moon, launched 11/14/69
      • 36 seconds after liftoff, spacecraft struck by lightning => power surge
          – All telemetry went haywire; book said to abort liftoff
          – Both spacecraft pilot and mission controller were furiously considering that option
          – But John Aaron was on shift, and thought he’d seen this malfunction before
      • During testing 1 year earlier, Aaron observed test that went off into weeds
          – Aaron took it on himself to investigate
          – This led him to obscure SCE subsystem
      • In critical “abort or not” few seconds, with lives on line, Aaron made one of most famous calls in NASA history
          – “Flight, try SCE to ‘Aux’”
          – Neither Flight nor spacecraft pilot Conrad knew what that even meant, but Alan Bean tried it
          – Telemetry came right back, vaulted Aaron into validation stardom
      • He could have blown off earlier test, but he didn’t
      • His inner validator wanted to know “what just happened?”
      Isaac Asimov once said 3 most important words in science are “What was THAT?”
  9. Complexity Implies Surprises …and surprises are bad
      • Chaos effects in complex µPs
          – Decomposability is a fundamental tenet of complex system design
          – Butterfly wings ruin decomposability
          – “Improve design, get slower performance” not at all uncommon
      • We must stop designing large systems as though small ones simply scale up
          – Lesson from comm engineers: assume errors
  10. Thinking about validation
      • Ability to think in analogies is highest form of intelligence
          – IQ tests like “a:b :: c:d”
          – Hofstadter's book: numerical sequences
      • Analogies may illuminate a subject in a way that direct introspection cannot
          – They drive our minds to their creative limits
  11. Listen to Your Inner Validator
      • 0, 1, 2, …?
      • You knew it wouldn’t be 3, didn’t you?
          – You sensed something’s not quite as it seems
      • Answer: 0, 1, 2, 720!, … = 0, 1, 2, 6!!, … = 0, 1!, 2!!, 3!!!, … (D. Hofstadter, Fluid Concepts and Creative Analogies)
      • That was the voice of your inner validator that you were hearing
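The sequence on slide 11 can be checked mechanically. The sketch below assumes, as the slide's own expansion suggests, that n!!…! (n marks) means the ordinary factorial applied n times in a row: 3!!! = ((3!)!)! = (6!)! = 720!. This is not the "double factorial" that "!!" sometimes denotes elsewhere. The function name `iterated_factorial` is made up for this example.

```python
from math import factorial


def iterated_factorial(n: int) -> int:
    """Apply the ordinary factorial to n, n times in a row.

    n = 0 -> 0 (no factorial applied), n = 1 -> 1!, n = 2 -> (2!)!,
    n = 3 -> ((3!)!)!, matching 0, 1!, 2!!, 3!!!, ... on the slide.
    """
    value = n
    for _ in range(n):
        value = factorial(value)
    return value


if __name__ == "__main__":
    # The first three terms look like plain counting...
    print([iterated_factorial(n) for n in range(3)])
    # ...but the fourth term is 720!, not 3.
    print(iterated_factorial(3) == factorial(720))
```

Stopping at n = 3 is deliberate. The next term would need the factorial of 24! taken three more times, far beyond anything that can be computed.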
  12. Lesson: Trust Nothing
      • Hyatt Regency hotel, Kansas City, Missouri, 1981
      • Catwalks on rods
      • 40’ threaded rods with nuts halfway
      • Killed 114, injured 200
  13. What Happened?
      • Spec was marginal
      • 40’ threaded rods “too hard”, changed to 2x20’ by contractor
      • No simulation, no test
      • Who goofed? Engineer, contractor, inspector…everyone
  14. Therac-25
      • Medical particle accelerator
      • Electrons, protons, X-rays
      • Six overdose accidents, several fatal, from poor system/SW design
          – And blind, naïve faith in computers!
  15. Question Everything
      • Test assumptions as well as design
          – If assumptions are broken, design surely is too
          – Try to “catch the field goals”
  16. Fight Urge to Relax Requirements
      • Challenger
          – Not ok to slip design assumptions (launch temp, # of unburnt O-rings) to suit desires
      • Airbus
          – Blaming pilot not reasonable explanation; pilot is part of system design
      • Runway “incursions” up 71% since ’93
          – Near-misses are trying to tell us something
      Diane Vaughan, The Challenger Launch Decision, Chicago Press 1996; Nancy Leveson, Safeware, Addison-Wesley 1995
  17. If You Didn’t Test It, It Doesn’t Work
      • Mir: fire extinguishers bolted to wall
          – Still had strong metal launch straps
          – Had never been needed before, so never tested
          – Discovered with a roaring fire several feet away
  18. Complexity Makes Everything Worse
      • Some things must be complicated to do their job
          – Our brains, for example
      • But complex sequences are root of most disasters
          – Challenger, Bhopal, Chernobyl, FDIV, Exxon Valdez
      • Where does complexity come from? Why does it keep increasing? Where are the limits?
          – Pentium 4
      • “In the small” vs “in the large” design (micros vs comm systems)
      • What to do? Vigilance, testing, awareness…we are all validators
  19. What To Do
      • Get the spec right
      • Design for correctness but…
      • design knowing perfection is unattainable
      • Users are part of the system
      • Formal methods
      • Pre-production testing and validation
      • Post-production testing and verification
      • Education of the public
  20. Roles
      • Engineers must stand their ground
          – There are always doubts, incomplete data; don’t let ‘em use those against you
      • Judgment is crucially needed: YOURS
          – Remember the Challenger management to engineer: “My God, Thiokol, when do you want me to launch? Next April?”
          – Be careful with “data”: “Risk assessment data is like a captured spy; if you torture it long enough, it will tell you anything you want to know…” (Wm. Ruckelshaus)
          – Crushing, conflicting demands are norm
      • Design must push the envelope w/o ceding responsibility
      • Validation establishes whether they've pushed it too far
      • Management must beware overriding tech judgment
      • Public must understand limits of human design process
      • All players must value roles of others!
  21. Roles cont.
      • Management
          – wants to assume a product is safe
          – knows nothing’s ever perfect, comes a time to “shoot the engineers” or they’ll never stop tinkering
      • Validators
          – want to prove a product is safe
          – assume it is not by default
          – only informed arbiters of when product is ready
          – don’t fall for “might as well sign, we’re
  22. Future Directions: Public Expectations
      • Andy Grove’s FDIV epiphany
      • Paradoxically, the more high tech, the more public expects of product
      • Users caused Chernobyl, TMI by going “off book”, but prevented many other disasters with real-time creativity…lessons are subtle
      • Takes exquisite understanding & judgment to discern accidents from reasonable risk-taking and bonehead errors or incompetence
      • This is what a jury must do. How?
      • Can’t keep trending this way
  23. Future of Validation
      • Multiple Culture Changes Needed
          – Public needs to stop expecting perfection
          – Design teams must explicitly limit complexity and avoid auto-scale-up assumptions
          – Companies must mature past point of viewing validation as an unpleasant overhead
          – Does your company have “Validation Fellows”?
      • Validation is a profession of its own. Cultivate the Validation Attitude!
  24. The End
