Concepts in Software Safety


An overview of software safety concepts and applicability to electronic medical records.

Published in: Business, Technology

  1. Principles of Software Safety
     Sanders, Ver 4, Copyright 2002
     Guidelines for Clinical Information Systems and the Electronic Medical Record (EMR)
  2. Overview
     - Premise
     - Background
     - Brief history of software safety
       - Examples of software safety incidents
       - Analysis concepts
     - Characteristics of a “safe software” environment
     - Principles for clinical information systems and the Electronic Medical Record (EMR)
  3. Premise
     - Clinical information systems and Electronic Medical Records (EMRs) are safety-critical, software-based systems
       - Lives are at stake if the software is “unsafe”
     - These systems should be designed, developed, and maintained in similar fashion to other safety-critical software systems
       - Transportation, medical devices, nuclear power, weapon systems
  4. My Background: I’ve given this topic some thought…
     - Air Force Nuclear Weapons Safety Program: 1989-1995
       - Peacekeeper and Minuteman Intercontinental Ballistic Missile (ICBM) launch and guidance control software safety analysis
     - TRW Independent Research and Development (IR&D): 1992-1995
       - “Assessing Software Safety and Risk in Commercially Developed Software”
       - “Predicting Software Safety and Reliability Using Expert Systems”
     - Army Tactical Nuclear Weapons Safety Office: 1993-1994
       - Pershing Missile Software Safety Assessment for Unauthorized Launch
     - National Security Agency (NSA): 1993-1995
       - U.S. Nuclear Command and Control Risk Assessment
     - Food and Drug Administration: 1995
       - “Applying 510(k) Principles to Computerized Patient Records”
     - Intermountain Health Care: 1997-2004
  5. Characteristics of Safe Software
     - Software is “safe” if…
       - It has features and procedures which ensure that it performs predictably under normal and abnormal conditions
       - The likelihood of an undesirable event occurring in the execution of that software is minimized
       - If an undesirable event does occur, the consequences are controlled and contained
     “Software Safety and Reliability” -- D. Herrmann
  6. Key Concepts
     - Not all software should be developed and tested using the same methodology
       - An EMR vs. a web page for posting minutes
       - Adjust your methodology according to your risks
     - No software-based, safety-critical system is 100% “safe”
       - The question is: how safe is safe enough?
     - Software safety is a component of system safety
       - Software safety must be evaluated within the context of the system in which it operates, including the humans that interact with that system
  7. Drivers of Software Safety History
     - 1960s
       - Nuclear Intercontinental Ballistic Missile Program
       - Apollo Space Program
     - 1970s
       - Space Transportation System (Space Shuttle)
       - Food and Drug Administration
       - Department of Transportation
       - Department of Energy
     - 1980s and 1990s
       - Rapid increase in software dependence within all the above
       - Control of safety-critical computer systems moves from hardware-based logic to software-based logic
       - Complexity in all of these environments increases
  8. A Few Notable Examples
     - Patriot Missile system in the Gulf War
       - Abnormal condition: continuous operation, no “reboot”
     - Chrysler Jeep
       - Sudden acceleration transitioning from Park to Drive
     - Therac-25 radiation therapy machine
       - Dose calculation algorithm error
     - Washington D.C. Metro
       - Central control system failure
     - Radiology report failed to display
       - Missed diagnosis, delayed treatment for cancer
  9. The Role of Risk Assessment
     - Frequency
       - How often is this bad event likely to occur?
         - Probability of an event occurring during a given time frame
     - Consequence
       - The business impact of that bad event
       - If possible, it should be measured in dollars
       - Not always possible
         - Could be measured in lives, customers lost, etc.
     - Risk
       - Ideally, expressed as dollars lost per unit of time

     Risk = Frequency x Consequence
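The formula on this slide can be sketched numerically. A minimal sketch follows; the defect scenario and dollar figures are invented for illustration only:

```python
def annualized_risk(frequency_per_year: float, consequence_dollars: float) -> float:
    """Risk = Frequency x Consequence, expressed as dollars lost per year."""
    return frequency_per_year * consequence_dollars

# Hypothetical: a report-display defect expected about twice a year,
# each occurrence costing an estimated $50,000 in rework and delayed care.
risk = annualized_risk(2.0, 50_000)
print(risk)  # 100000.0
```

When consequence cannot be priced in dollars, the same arithmetic applies with any other unit of consequence (lives, customers lost), as the slide notes.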
  10. Risk Prevention: Software Safety Analysis
      - Two basic types of safety analysis techniques
        - Event based: “If we made a mistake in this software, could it lead to a patient safety incident? If so, how so and how severe?”
        - Consequence based: “What would happen if we failed to retrieve all the radiology reports for this patient?”
  11. Contributing Factors to Safety Risks (diagram from the FAA)
  12. Risk and Software Safety
      - Frequency
        - How often will we experience a patient safety event that is attributable to a software error?
        - It is a subset of your software defect rate, assuming you are tracking the number of “bugs” found in your software over a given period of time
      - Consequence
        - The clinical impact of that software error
        - If possible, it should be measured in dollars
        - If not dollars, some other meaningful unit of consequence
          - Lawsuits, readmits, length of stay (LOS), sentinel events, etc.
      - Risk
        - Ideally, expressed as dollars lost per unit of time
  13. Root Causes of Software Safety Violations
      - Requirements specification and communication
        - Single largest source of errors
        - Software executes “correctly” according to the understanding of the requirement, but the requirement was wrong within the scope of the system
        - Or the requirement was simply misunderstood by the programmer
      - Design and coding errors
        - Second most common source of errors
        - Poorly structured code
        - Timing errors, incorrect queries, syntax errors, algorithm errors, results-display errors, lack of self-tests, failed error handling
  14. Root Causes of Software Safety Violations (cont.)
      - Computer hardware induced errors
        - Not as common, but possible
        - Hardware logic errors caused by overheating, power transients, radiation, magnetic fields
      - Software change control process
        - Changes to software introduce unanticipated errors
        - Can be traced back to requirements and programming errors
        - Failure of the configuration control process
      - Inadequate testing
        - Software functions properly in unit testing
        - Software passes system and integration testing, but should not, because safety-critical test coverage is inadequate
        - Can be traced back to requirements and programming errors
  15. Specific Techniques for Software Safety Analysis
      - All have their roots in hardware-based systems
        - But they can be applied effectively to software
      - Failure Modes and Effects Analysis (FMEA)
        - “If this software fails to return the correct lab value, what is the impact?”
      - Fault Tree Analysis
        - “What are all the events that could cause this software to incorrectly display lab values?”
      - Fault Hazard Analysis
        - Typically uses a Fault Tree Analysis
        - Also considers human factors and operational procedures
      - Common Cause Analysis
        - Looks across fault trees for common roots
      - Sneak Circuit Analysis
        - “Stray” code that inhibits desired functions or causes undesired functions to occur
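As an illustration of how Fault Tree Analysis combines basic-event probabilities through AND/OR gates, here is a minimal sketch. It assumes the basic events are statistically independent, and the lab-display scenario and probabilities are invented:

```python
def and_gate(probs):
    """P(all events occur) for independent inputs: the product of probabilities."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_gate(probs):
    """P(at least one event occurs) for independent inputs: 1 - product(1 - p)."""
    none_occur = 1.0
    for p in probs:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur

# Top event: "lab values displayed incorrectly".
# Branch A (AND): query returns a stale row AND the cache is not invalidated.
# Branch B: a display-formatting defect alone.
p_top = or_gate([and_gate([0.01, 0.1]), 0.002])
print(round(p_top, 6))  # 0.002998
```

Real fault trees for software also capture non-independent causes, which is exactly what the Common Cause Analysis bullet above addresses.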
  16. General Software Safety Scenarios
      - Software fails to perform a required function
        - Function not executed or answer never returned
      - Software performs a function that is not required or intended
        - Wrong answer returned, or wrong control instruction issued
      - Software performs the right function, but at the wrong time or under inappropriate conditions
      - Software timing or sequencing failure
        - Parallel executions fail
        - Synchronous or time-dependent executions fail
      - Software fails to recognize a hazardous condition and react accordingly
        - Or the software recognizes the condition but reacts improperly
  17. Data Safety Areas
      - Validity checks fail (or do not exist) before acting upon safety-critical data
        - Illegal or out-of-range parameters
      - Failure during initializing, clearing, or resetting critical data
      - Validation failure of data addresses, pointers, indices, and variables
      - Incorrect relationships established between files and records
      - Detecting, handling, and/or correcting errors during data transfers
      - Protecting data from being deleted or inadvertently overwritten
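A minimal sketch of the first area, a validity check applied before safety-critical data is acted upon. The field name and plausible range below are hypothetical:

```python
def validate_lab_value(name: str, value: float, low: float, high: float) -> float:
    """Reject illegal or out-of-range parameters before posting to the record."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{name}: non-numeric value {value!r}")
    if not (low <= value <= high):
        raise ValueError(f"{name}: {value} is outside the plausible range [{low}, {high}]")
    return value

# A serum potassium of 40 mmol/L is almost certainly a data-entry or transfer error,
# so the check refuses it rather than posting it to the patient record.
validate_lab_value("serum_potassium_mmol_L", 4.1, low=1.0, high=10.0)  # passes
```

The design choice here is fail-fast: an exception forces the calling workflow to handle the bad value explicitly, rather than letting it silently reach the EMR.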
  18. Creating a “Safe Software” Environment
      What would an auditor from the FAA look for at Boeing?
  19. Auditing for Software Safety
      - Is at least a high-level risk assessment conducted for software safety during the requirements and design phase?
      - Is the software testing and quality assurance process risk-adjusted?
      - Are the test and development environments adequate for identifying safety risks before they appear in production?
      - Is there an emphasis on human-computer interfaces (HCI) and their relationship to safety risks?
      - Is there a well-documented safety event response process for when software safety defects are discovered?
      - Is there a robust root-cause discovery and communication process after a safety event has occurred?
      - Is there a software safety defect reporting and tracking system?
      - Are there similar principles but different safety risk analysis processes for software developed internally vs. purchased?
  20. More Audit Areas
      - Is there an understanding and appreciation among the software development staff for safety risks?
      - Is there a clear nomenclature for characterizing software safety risk scenarios?
      - Is there a nomenclature for categorizing software defects based on safety risk?
      - Is there a software safety governance and oversight body?
      - Is there a well-documented software engineering process for safety-critical applications?
      - Is independent verification and validation (IV&V) of software part of the development methodology?
      - Is safety-critical software more tightly controlled for versioning and configuration?
      - Is there a certification program for software engineers who are allowed to develop and work on safety-critical software?
  21. Applying This to Clinical Information Systems and EMRs
  22. Relating Patient Safety to EMR Software Safety
      - Step 1: Define the general categories of patient safety risk scenarios, regardless of cause
      - Step 2: Define the relationship between these general risk scenarios and the ability of the EMR or clinical information system to contribute to them
      - Step 3: Use a software testing and safety tracking system to measure against these risk scenarios
        - “This function or module of software could contribute to a Moderate patient safety risk scenario. We should design and test accordingly.”
        - “This is a Severe software defect. It must be repaired immediately.”
  23. Potential Categories of Patient Safety Risk Scenarios
      - Type 1: Catastrophic. Patient life is in grave danger. The probability that humans will recognize and intervene to mitigate this event is very low or non-existent. Intervention is required within seconds to prevent the loss of life.
      - Type 2: Severe. Patient health is in immediate danger. The probability that humans will recognize and intervene to mitigate this event is low, but possible. Intervention is required within minutes to prevent serious injury or degradation of patient health that could lead to the loss of life.
      - Type 3: Moderate. Patient health is at risk. However, it is probable that humans will recognize and intervene to mitigate this event. Intervention is required within hours or a few days to prevent a moderate degradation in patient health.
      - Type 4: Minor. Patient health is minimally at risk. The probability that humans will recognize and intervene to mitigate this event is high. Corrective action should occur within days or weeks to avoid any degradation in patient health.
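One way a defect-tracking system might encode the four categories so that severity can drive response time automatically. The class is a sketch, and the numeric windows merely restate the slide's seconds / minutes / hours-to-days / days-to-weeks bounds as illustrative upper limits:

```python
from enum import Enum

class PatientSafetyRisk(Enum):
    """Type 1-4 patient safety risk scenario categories."""
    CATASTROPHIC = 1  # intervene within seconds
    SEVERE = 2        # intervene within minutes
    MODERATE = 3      # intervene within hours or a few days
    MINOR = 4         # correct within days or weeks

# Nominal upper bound, in seconds, on how long a defect in each
# category may go unaddressed (illustrative values, not policy).
INTERVENTION_WINDOW_SECONDS = {
    PatientSafetyRisk.CATASTROPHIC: 10,
    PatientSafetyRisk.SEVERE: 10 * 60,
    PatientSafetyRisk.MODERATE: 24 * 60 * 60,
    PatientSafetyRisk.MINOR: 7 * 24 * 60 * 60,
}
```

With severities encoded like this, a defect report tagged `PatientSafetyRisk.SEVERE` can automatically page on-call staff, while a `MINOR` one simply enters the normal backlog.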
  24. Specific EMR Safety Risk Scenarios
      - Errors in computerized protocols and decision support tools
      - Invalid data posted to a patient record
      - Valid data that is accidentally deleted
      - Valid data that is not posted or not available
        - Incomplete record
      - Clinical data posted to the wrong patient record
        - Right data, wrong patient
      - Data that appears current and timely, but is not

      Their severity depends on the nature of the specific data or decision-making context.
  25. When Developing and Testing…
      Software staff must ask: Does this software control or affect…
      - Computerized protocols and decision support tools?
      - Data that is posted to the EMR?
        - Valid data that is not posted
          - Incomplete record
        - Clinical data posted to the wrong patient record
          - Right data, wrong patient
        - Timeliness of EMR data
      - The deletion of EMR data?
      - The performance or availability of the overall EMR?

      If so, the rigor of the software engineering process must increase accordingly.
  26. Software Control vs. Safety Risk
      Does my software control any of these? If so, what is the probability that a defect could cause one of these scenarios? High risk = rigorous design and testing.
      Severity columns: Catastrophic | Severe | Moderate | Minor
      Rows:
      - Computerized protocols and decision support tools
      - Creating or updating data in the EMR
      - Deleting data from the EMR
      - Performance or availability of the overall EMR
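The risk-matrix idea on this slide can be sketched as a lookup from a scenario's severity and defect probability to a required level of engineering rigor. The 10% threshold and the rigor labels are illustrative assumptions, not taken from any standard:

```python
SEVERITIES = ["Minor", "Moderate", "Severe", "Catastrophic"]
RIGOR = ("Standard", "Standard", "Elevated", "Rigorous", "Rigorous")

def required_rigor(severity: str, p_defect: float) -> str:
    """Map a risk-matrix cell to a design-and-testing rigor level."""
    score = SEVERITIES.index(severity) + 1   # 1 (Minor) .. 4 (Catastrophic)
    if p_defect >= 0.10:                     # likely defects push rigor up a notch
        score += 1
    return RIGOR[min(score, 4)]

print(required_rigor("Catastrophic", 0.2))  # Rigorous
print(required_rigor("Minor", 0.01))        # Standard
```

Note that Severe and Catastrophic scenarios land on "Rigorous" even at low defect probability, matching the slide's point that control over high-consequence functions alone demands rigorous design and testing.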
  27. Most to Least Safety Critical?
      For purposes of illustration, systems arranged along an axis of increasing safety criticality and software engineering rigor: GroupWise, Transfusion Management, HELP, HELP2, Mysis
      Not necessarily the same ordering as “business criticality”…
  28. For Illustration, Again…
      Software safety processes don’t apply to every system:

      Information System   Business Criticality   Data Sensitivity   Safety Criticality
      Accudose             1                      1                  1
      AGFA                 1                      1                  1
      Amicus               1                      1                  1
      AS/400 Financial     1                      1                  4
      Audit Log            2                      4                  4
  29. In Conclusion
      - There is a growing need for software safety awareness in clinical information systems and EMRs
      - There are significant lessons to be learned from other industries
        - We don’t have to reinvent the wheel
      - To get started…
        - Think like an FAA software safety auditor
        - Think like a patient
        - Think like a physician
  30. Acknowledgements
      - Commercial aviation
        - RTCA/DO-178B, Software Considerations in Airborne Systems and Equipment Certification
      - European Committee for Electrotechnical Standardization (CENELEC)
        - EN 50128, Software for Railway Control and Protection Systems
      - Society of Automotive Engineers
        - JA 1002, Software Safety and Reliability Program Standard
      - U.S. FDA Center for Devices and Radiological Health
        - Premarket Notification Submissions (510(k))
          - “General Principles of Software Validation”
      - U.S. FAA System Safety Handbook
        - Appendix J: Software Safety
      - “Software Safety and Reliability”
        - Debra S. Herrmann
      - “Safeware: System Safety and Computers”
        - Nancy G. Leveson
  31. Thank You
      Please contact me if you have any questions:
      Dale Sanders
      Intermountain Health Care
      801-408-2121
      [email_address]