Software Safety: An Oxymoron? - VanQ 2007

It is a well-known maxim that complexity is an essential property of software. In spite of that, software-implemented functionality has increased dramatically in almost all safety-critical sectors. This increasing reliance on software poses great challenges to the traditional practice of system safety, which focuses on managing system hazards in order to mitigate safety risk. Software safety has emerged as a sub-discipline of system safety to help address these challenges. However, the marriage of software and system safety has been an uneasy one. This talk discusses some of the issues that arise when software meets safety. It addresses why safety is a property distinct from quality, reliability and the other "-ilities", discusses the impact of software on system safety, and briefly touches on the need for safety verification of software-intensive systems.


  1. Software Safety: An Oxymoron? March 29, 2007. Ken Wong, Ph.D., Senior Systems Analyst, McKesson Medical Imaging Group
  2. Points to Ponder*: A system can be correct and reliable and yet unsafe. Software safety is not about bugs. Program testing can be used to show the presence of bugs, but never to show their absence. (* We will return to these statements in the discussion.)
  3. Outline: Introduction to Software Safety; Software: Meet System Safety; System Safety: Meet Software; Verifying Software Safety
  4. Introduction to Software Safety
  5. Software in the Real World: Therac-25 accidents; Ariane 5 Flight 501 explosion; Titan 4 Centaur/Milstar failure; TCAS collision near Überlingen, Germany
  6. Ariane 501
  7. Ariane 501 Events: Destruction of Ariane 501 on 4 June 1996 (from final report): nominal behaviour of the launcher up to H0 + 36 seconds; failure of the back-up Inertial Reference System (SRI) followed immediately by failure of the active SRI;
  8. Building Dependable Software … [Figure: interlocking puzzle pieces labelled Quality, Safety, Correctness, Reliability, Security]
  9. Safety is a Distinct Property: Safety is a distinct part of the interlocking puzzle of how to build dependable software. A system can be “correct” and “reliable” and yet unsafe! Improved software process alone does not mean a safer system. Note: These can be contentious claims even among safety engineers.
  10. Safety is … avoiding mishaps!
  11. Software: Meet System Safety
  12. “Is it Safe”? Christian Szell: Is it safe? Babe: Yes, it's safe, it's very safe, it's so safe you wouldn't believe it. (Marathon Man, 1976)
  13. System Safety: “System Safety” is a systematic approach to safety primarily developed in the US for the aerospace and defense industries. Spreading to other industries, e.g., health care. Focus on managing system hazards. E.g., FDA Quality System Regulation recommends “risk analysis” (a.k.a. hazard analysis).
  14. System Safety: Hazard ID; Hazard Analysis; Risk Assessment; Hazard Mitigation; Safety Verification
  15. Hazard: A hazard is the system’s potential contribution to a mishap. E.g., brake failure, engine overheating. Key is understanding the system environment.
  16. Hazards and Mishaps [Figure: diagram labelled “hazard causes”, “hazard”, “mishap”, “System” and “Environment”]
  17. Ariane 501: SRI Bug? Uncaught exception from floating point conversion. From high value of BH (Horizontal Bias). Programming 101! Conversion check deliberately removed for performance reasons. SRI reused from Ariane 4. Check not required for Ariane 4 trajectory.
  18. Safety is a System Property: SRI worked exactly as specified, for Ariane 4! Ariane 5 trajectory different from Ariane 4. SRI spec did NOT include Ariane 5 trajectory data. SRI NOT tested with Ariane 5 trajectory data. “Safety” cannot be understood without knowing the operational environment. FDA “use-related” vs “device failure” hazards. E.g., TCAS collision in Germany.
  19. When Software Met Safety: “… there was a definite risk in assuming that critical equipment such as the SRI had been validated by qualification on its own, or by previous use on Ariane 4.” (ARIANE 5 Flight 501 Failure Report)
  20. System Safety: Meet Software
  21. In the beginning (or Europe) …*: Mechanical systems with well understood designs. Hazards caused by component failure from random hardware faults. Mitigation through integrity and redundancy. (* Myth, but there is underlying truth in all good myths.)
  22. Fault Tree Analysis [Figure: fault tree with top event “Steering Fails”, OR gates, and the events “Steering Assembly Fails”, “Driver Error”, “Steering Wheel Fails”, “Drive Shaft Fails” and “Steering Control Software Fails”; legend: Basic Event, Intermediate Event]
  23. Is Software Another Component? What is the probability that the steering control software fails? If software is just another component: 1. Software cannot wear out or break down like a mechanical component. 2. Only “fault” is a programming bug. 3. Assuming programmers do their job, failure rate should be zero.* (* Paraphrased from a talk by a system safety engineer.)
  24. Software Revealed [Figure: the same fault tree, with the labels “Steering Fails”, “Steering Assembly Fails”, “Driver Error”, “Steering Wheel”, “Drive Shaft Fails” and “Steering Control Software Fails”; legend: Basic Event, Intermediate Event]
  25. The Software Werewolf: “Of all the monsters that fill the nightmares of our folklore, none terrify more than werewolves, because they transform unexpectedly from the familiar into horrors … The familiar software project, at least as seen by the nontechnical manager, has something of this character …” (Frederick P. Brooks, Jr., from No Silver Bullet: Essence and Accidents of Software Engineering)
  26. Ariane 501: Safety in Numbers? In response to “fault”, the Primary SRI was deliberately shut down. Attempt made to switch to backup SRI. Typical strategy in face of random failures. However, BOTH SRIs shut down! “Fault” due to same design in both SRIs. Exception in non-essential component.
  27. Safety is an Emergent Property: Software safety is not about “faults”. Many potential “faults” but not all created equal: most have no impact on safety. “Correct” behaviour can contribute to the hazard! Hazards can emerge from complex interactions between “correct” components.
  28. When Safety Met Software: An underlying theme in the development of Ariane 5 is the bias towards the mitigation of random failure. Board wishes to point out that software is an expression of a highly detailed design and does not fail in the same sense as a mechanical system. (ARIANE 5 Flight 501 Failure Report)
  29. Verifying Software Safety
  30. Software and Safety Process [Figure: process diagram with the labels “Requirements”, “Design”, “Source Code”, “Hazards”, “Hazard ID, Analysis and Mitigation”, “Safety Verification” and “Verification”]
  31. Limits of Testing: “Program testing can be used to show the presence of bugs, but never to show their absence.” (E. Dijkstra in Structured Programming)
  32. Hazard-Driven Testing: Focus on hazard, force it to occur. Consider: hazard risk (“risk-based testing”); mishap scenarios; hazard causes identified during hazard analysis; problem reports/issues with safety implications. See Jeffrey J. Joyce and Ken Wong, Hazard-driven Testing of Safety-Related Software.
  33. Summary and Conclusions: Safety is a distinct property. Safety is a system property (operational and development environment factors). Safety is an emergent property (hazards can emerge from complex interactions between “correct” components).
  34. Safety and Software: Happy Together?
  35. References*: ARIANE 5 Flight 501 Failure Report by the Inquiry Board, Paris, July 1996. Frederick P. Brooks, Jr., No Silver Bullet: Essence and Accidents of Software Engineering, Computer Magazine, April 1987. Jeffrey J. Joyce and Ken Wong, Hazard-driven Testing of Safety-Related Software, 21st International System Safety Conference, Ottawa, Ontario, August 4-8, 2003. (* All available on-line.)
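
The Ariane 501 slides describe an unprotected floating-point conversion that raised an exception in both SRIs, because the backup ran the same design as the primary. The following is a minimal Python sketch of that failure pattern, not the flight code (which was written in Ada). All function names and the two bias values are hypothetical, chosen only so that one value fits in a signed 16-bit integer and the other does not. The clamping variant is one possible mitigation shown for contrast, not the fix recommended by the inquiry board.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Raised when a value does not fit in a signed 16-bit integer."""


def to_int16_unprotected(value: float) -> int:
    """Convert to a signed 16-bit integer, raising on overflow."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return result


def to_int16_saturating(value: float) -> int:
    """Hypothetical mitigation: clamp to the 16-bit range instead of raising."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


def sri_channel(horizontal_bias: float) -> str:
    """Model one SRI unit: a conversion error makes the whole unit shut down."""
    try:
        to_int16_unprotected(horizontal_bias)
        return "ok"
    except OperandError:
        return "shutdown"


def redundant_pair(horizontal_bias: float) -> list:
    """Primary and backup run identical software on the same input."""
    return [sri_channel(horizontal_bias) for _unit in ("primary", "backup")]


# Made-up values: the first stays in range, the second does not.
ARIANE4_LIKE_BH = 20000.0
ARIANE5_LIKE_BH = 40000.0

print(redundant_pair(ARIANE4_LIKE_BH))
print(redundant_pair(ARIANE5_LIKE_BH))
```

In the spirit of hazard-driven testing, a test for this sketch would deliberately feed the out-of-range value to force the hazard, instead of only exercising the range the software was originally specified for. Redundancy does not help here because the cause is a shared design decision, not a random hardware fault.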

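
The fault tree slides ask what probability to assign to "Steering Control Software Fails". The sketch below shows how that number would feed a conventional fault tree calculation. It makes several assumptions: the tree structure (Steering Fails = Steering Assembly Fails OR Driver Error, with the assembly failing if the wheel, drive shaft or control software fails) is one plausible reading of the flattened figure, basic events are treated as independent, and every probability is invented for illustration.

```python
from math import prod


def or_gate(*probabilities: float) -> float:
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def steering_fails(p_wheel: float, p_shaft: float,
                   p_software: float, p_driver: float) -> float:
    """Top-event probability for the assumed two-level OR tree."""
    p_assembly = or_gate(p_wheel, p_shaft, p_software)
    return or_gate(p_assembly, p_driver)


# Invented numbers, for illustration only.
hardware_only = steering_fails(1e-4, 1e-4, 0.0, 1e-3)   # "software never fails"
with_software = steering_fails(1e-4, 1e-4, 1e-3, 1e-3)  # nonzero software term

print(f"software assumed perfect: {hardware_only:.6f}")
print(f"software term included:   {with_software:.6f}")
```

Setting the software term to zero, as in the "just another component" argument, makes the top-event estimate look better without making the system safer. The calculation also assumes independent random failures, which is exactly the assumption that a shared design error violates.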