An Extended Notation of FTA for Risk Assessment of Software-intensive Medical Devices.
Yoshio Sakai, Seiko Shirasaka and Yasuharu Nishi
It is difficult to assess the risk of software-intensive medical devices. The extended notation of FTA makes visible the risk class before and after the risk control measure, and shows where software in the system affects the top event of the fault tree.

This content is also available as a six-page paper on the IEEE website.

    Presentation Transcript

    • An Extended Notation of FTA for Risk Assessment of Software-intensive Medical Devices: Recognition of the Risk Class Before and After the Risk Control Measure
      Yoshio SAKAI, Engineering Promotion Center, NIHON KOHDEN CORPORATION
      Seiko SHIRASAKA, The Graduate School of System Design and Management, KEIO University
      Yasuharu NISHI, Department of Systems Engineering, The University of Electro-Communications
    • Flow of the Presentation
      1. Traditional FTA (old): explanation of the traditional FTA, which lacks consideration of software failure.
      2. Risk assessment method in ISO 14971 (old): explanation of the ISO 14971 method, which lacks consideration of software-intensive systems.
      3. An extended notation of FTA (new): explanation of solutions using the extended notation.
      [Figure: sequence of events. Hazard, exposure (P1), hazardous situation, P2, harm. Risk combines the severity of the harm with the probability of occurrence of harm, P1 × P2.]
      Yoshio_Sakai@mb2.nkc.co.jp, 24th ISSRE / 1st MedSRDR 2013, Nov 7, 2013
    • The History of FTA (Fault Tree Analysis)
      • 1962: Fault Tree Analysis (FTA) was originally developed at Bell Laboratories by H.A. Watson for the Minuteman missile. FTA was designed at that time because the electronic system could not endure vibration and broke down. The cause of the trouble was hardware failure, not software.
      • 1965: BOEING improved the completeness of FTA.
      • Now: FTA is widely used.
    • The Traditional FTA Lacks Consideration of Software
      • When FTA was developed, failures caused by software were not among the failures that FTA considered.
      • The traditional FTA does not make clear
        – the effectiveness of the risk control measure, that is, the risk before and after the measure;
        – where the software in the system and the risk control measure affect the top event.
      • The failure-rate calculation used in FTA cannot be applied to failures caused by software (HARDWARE: ○, SOFTWARE: ×).
    • The Traditional Risk Assessment Method
      The example is water boiled in an electric kettle (Fig. 3, ISO 14971).
      1. The hot water, as thermal energy.
      2. The cover opens and hot water spills.
      3. Someone gets burned.
      P1 is the probability of a hazardous situation occurring. P2 is the probability of a hazardous situation leading to harm.
      Software: ?
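The P1 × P2 relation on this slide can be sketched as a short calculation. This is a minimal illustration, not part of the presentation: the function name and the kettle figures are hypothetical and chosen only to show the arithmetic.

```python
def probability_of_harm(p1: float, p2: float) -> float:
    """Probability of occurrence of harm as P1 x P2.

    p1: probability of a hazardous situation occurring.
    p2: probability of the hazardous situation leading to harm.
    """
    for name, value in (("p1", p1), ("p2", p2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return p1 * p2


# Hypothetical kettle figures, for illustration only:
# the cover opens in 1 of 100 uses, and half of those spills cause a burn.
p_harm = probability_of_harm(p1=0.01, p2=0.5)
print(p_harm)
```

The scheme needs a numeric P1, which is exactly what the later slides argue is unavailable when the cause is software.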
    • The Estimation of the Probability of a Hazardous Situation
      • HARDWARE: the failure rate of random hardware failure.
      • USABILITY: the likelihood of the usability failure, on a scale from high to low: Frequent, Probable, Occasional, Remote, Improbable (source: IEC 80001-2-1, Step by Step).
      • SOFTWARE:
        – Software is invisible.
        – Failures caused by software occur systematically, not statistically.
      We cannot estimate the probability or the likelihood of a failure caused by software.
    • Features of Systematic Failure
      Systematic failure is unwanted behaviour which is
      • repeatable, if the conditions can be exactly replicated;
      • predictable (but not accurately), because all systems have flaws;
      • indefensible: it should not occur, but it is extremely hard to prevent.
    • The Definition and Explanation of Systematic Failure
      Systematic failure: failure, related in a deterministic way to a certain cause, that can only be eliminated by a change of the design or of the manufacturing process, operational procedures, documentation or other relevant factors. (Source: ISO 26262-1:2011)
      NOTE 4: This International Standard sets requirements for the avoidance and control of systematic faults, which are based on experience and judgment from practical experience gained in industry. Even though the probability of occurrence of systematic failures cannot in general be quantified the standard does, however, allow a claim to be made, for a specified safety function, that the target failure measure associated with the safety function can be considered to be achieved if all the requirements in the standard have been met. (Source: IEC 61508-3:2010)
    • Two Types of Evaluation of the Hazard Caused by Systematic Software Failure
      IEC 62304:2006: the probability of such failure shall be assumed to be 100 percent.
      • The probability is 100%.
      • This 100 percent principle was chosen to be conservative, but it is not practical in real applications.
      IEC 62304:2006 Amd.1 and this study: if the hazard could arise from a failure of the software, the risk evaluation should address the following two concerns.
      • The 1st concern is the risk level, as the severity of the harm, before the risk control measures.
      • The 2nd concern is the risk level, as the severity of the harm, after the risk control measures.
      • The evaluation of the residual risk is important. When software is the cause, the probability of occurrence of harm before the risk control measures is not.
    • The Procedure for Evaluating the Hazard Caused by Systematic Software Failure
      • RISK: the hazardous situation occurs through systematic software failure.
      • RISK CONTROL MEASURES: the safety is affected by
        – the hardware used as the risk control measure, and
        – the reliability of the critical software component.
      • RESIDUAL RISK: after the risk control measures, we have to evaluate the residual risk for safety.
      The probability of occurrence of harm caused by the software before the risk control measures is not necessary for the risk assessment.
    • Method of Evaluating Systematic Failure
      Medical device manufacturers can evaluate the residual risk class after the countermeasure from the following combination.
      a. The severity of the residual risk
      b. The reliability of the software items that could contribute to a hazardous situation
      c. The safe architecture of the software system
      These are not elements of the probability.
    • Relation Between the Risk Control Measures and Architecture
      • Complicated software items (low cohesion and high coupling): the result of continuous additions (a real software system). The relation is not clear.
      • Segregated software items (high cohesion and low coupling): a layered architecture (3 layers: Presentation, Domain and Data Source). The relation is clear.
    • The Principles of the Electrosurgical Knife
      The mode, cut or coagulation, is switched by software.
      • Cut: for cutting, a continuous single-frequency sine wave is often employed.
      • Coagulation: for coagulation, the average power is typically reduced below the threshold of cutting. Generally, the sine wave is turned on and off in rapid succession.
      There are serious hazardous situations in the software system.
    • Electrosurgical Knife Block Diagram
      [Block diagram: the wave is controlled and switched by the software. Two blocks are marked "High Risk Software Component".]
      The most serious hazard is unintended hemorrhage caused by abnormal output of the electrosurgical knife. The following slides show the fault tree analysis.
    • Extended Notation of FTA (1)
      FTA example:
      Abnormal Output of Electrosurgical Knife: Class A(C)s = OR (A(C)s, A(C)s)
        Abnormal Output caused by Hardware: Class A(C)s = AND (C, --Bs)
          Output Hardware Failure: Class C = OR (C, C, B)
            High-frequency Wave Failure: Class C
            Wave Circuit Failure: Class C
            Timer Failure: Class B
          Abnormal Monitoring Failure: Class Bs
        Unintended Output caused by Software: Class A(C)s = AND (Cs, --Bs)
          Cut/Coag Mode Mismatch: Class Cs
          Failure of the Abnormal Detection: Class Bs = AND (Bs, B)
            Abnormal Monitoring Failure: Class Bs
            A/D Converter Failure: Class B
      1st column from the bottom, on the left side of the FTA example:
      a. There are three hardware failures.
      b. Each failure is classified by the risk level.
      c. The three basic events are connected with an OR gate.
      d. The OR function adopts the highest risk class: Class C = OR (C, C, B).
      Definition of the risk classes (source: IEC 62304:2006):
      • Class A: no injury or damage to health is possible.
      • Class B: non-serious injury is possible.
      • Class C: death or serious injury is possible.
    • Extended Notation of FTA (2)
      [Fault tree: Abnormal Output of Electrosurgical Knife, Class A(C)s = OR (A(C)s, A(C)s). Highlighted gate: Abnormal Output caused by Hardware, Class A(C)s = AND (C, --Bs), combining Output Hardware Failure (Class C) with Abnormal Monitoring Failure as the risk control measure (Class Bs), which lowers the risk level by two stages.]
      2nd column from the bottom, on the left side of the FTA example:
      • The right basic event is an abnormal monitoring failure. This event is caused by the software. It is written as Class Bs: Class B is the impact level of the risk, and "s" marks the effect of the software.
      • The abnormal monitoring inhibits and controls the output hardware failure. This is indicated by the AND function as AND (C, --Bs).
      • The number of inhibit stages is shown by the number of minus signs.
      • In this case, the risk control measure lowers the risk level by two stages, from C to A.
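The inhibit notation on this slide can be read mechanically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the operand syntax is inferred from the single example AND (C, --Bs), and the rule that a class cannot fall below A is an assumption the slides do not state.

```python
CLASSES = "ABC"  # IEC 62304:2006 classes, ordered from lowest to highest risk


def parse_operand(token: str) -> tuple[int, str, bool]:
    """Split an operand such as '--Bs' into (inhibit stages, class, software mark)."""
    body = token.lstrip("-")
    stages = len(token) - len(body)      # one stage per leading minus sign
    if not body or body[0] not in CLASSES or body[1:] not in ("", "s"):
        raise ValueError(f"unrecognized operand: {token!r}")
    return stages, body[0], body[1:] == "s"


def lower_class(risk_class: str, stages: int) -> str:
    """Lower a risk class by the given number of stages (assumed floor: Class A)."""
    index = max(CLASSES.index(risk_class) - stages, 0)
    return CLASSES[index]


# AND (C, --Bs): the Class Bs measure inhibits the Class C event by two stages.
stages, measure_class, by_software = parse_operand("--Bs")
residual = lower_class("C", stages)
print(stages, measure_class, by_software, residual)  # 2 B True A
```

Counting the minus signs reproduces the slide's result: two stages take Class C down to Class A, and the "s" mark records that the measure depends on software.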
    • Extended Notation of FTA (3)
      [Fault tree: Abnormal Output of Electrosurgical Knife, Class A(C)s = OR (A(C)s, A(C)s). Highlighted gate: Failure of the Abnormal Detection, Class Bs = AND (Bs, B), combining Abnormal Monitoring Failure (Class Bs) with A/D Converter Failure (Class B).]
      1st column from the bottom, on the right side of the FTA example:
      • The abnormal monitoring failure is caused by the software.
      • The A/D converter failure is caused by hardware.
      • If a basic event does not inhibit the other basic event, the AND function adopts the highest risk class. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
      • The subscript "s" is inherited through the function, from the left side to the right side, as the effect of the software on the system.
    • Extended Notation of FTA (4)
      [Fault tree: highlighted gate is the top event, Abnormal Output of Electrosurgical Knife, Class A(C)s = OR (A(C)s, A(C)s), combining Abnormal Output caused by Hardware, Class A(C)s = AND (C, --Bs), with Unintended Output caused by Software, Class A(C)s = AND (Cs, --Bs).]
      1st column from the top of the FTA example:
      a. The OR function adopts the highest risk class. In this case, the risk classes are the same.
      b. The risk class of the top event is finally expressed as Class A(C)s.
      • This notation makes the following recognizable.
        – The risk class of the residual risk is A.
        – The highest risk class before the risk control measure is C.
        – The software affects the top event or the risk control measure in the system.
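The four notation slides can be combined into one small evaluation of the example tree. This is a sketch under stated assumptions, not the authors' tool: the names `Risk`, `basic`, `or_gate` and `and_gate` are hypothetical, the assignment of Class Cs to the Cut/Coag Mode Mismatch event is a reading of the flattened diagram, the class before the measure is taken from the inhibited events only, and a class is assumed never to fall below A.

```python
from dataclasses import dataclass
from typing import Optional

ORDER = {"A": 0, "B": 1, "C": 2}   # IEC 62304:2006 classes, lowest to highest
NAMES = "ABC"


@dataclass(frozen=True)
class Risk:
    after: str              # class after the risk control measures (residual)
    before: str             # highest class before the risk control measures
    software: bool = False  # the "s" mark: software affects this event

    def __str__(self) -> str:
        core = self.after if self.after == self.before else f"{self.after}({self.before})"
        return core + ("s" if self.software else "")


def basic(risk_class: str, software: bool = False) -> Risk:
    """A basic event: the same class before and after, optionally marked 's'."""
    return Risk(risk_class, risk_class, software)


def or_gate(*inputs: Risk) -> Risk:
    """OR adopts the highest class; the 's' mark is inherited from any input."""
    return Risk(
        after=max((r.after for r in inputs), key=ORDER.get),
        before=max((r.before for r in inputs), key=ORDER.get),
        software=any(r.software for r in inputs),
    )


def and_gate(*inputs: Risk, inhibitor: Optional[Risk] = None, stages: int = 0) -> Risk:
    """AND without an inhibitor adopts the highest class.

    With an inhibitor (the risk control measure), the class after the measure
    is lowered by `stages`, one stage per minus sign in the notation.
    """
    combined = or_gate(*inputs)
    if inhibitor is None:
        return combined
    lowered = NAMES[max(ORDER[combined.after] - stages, 0)]
    return Risk(lowered, combined.before, combined.software or inhibitor.software)


# Left branch: Class A(C)s = AND (C, --Bs)
output_hw = or_gate(basic("C"), basic("C"), basic("B"))   # Class C = OR (C, C, B)
monitoring = basic("B", software=True)                     # Class Bs
hw_branch = and_gate(output_hw, inhibitor=monitoring, stages=2)

# Right branch: Class A(C)s = AND (Cs, --Bs)
detection = and_gate(basic("B", software=True), basic("B"))  # Class Bs = AND (Bs, B)
mode_mismatch = basic("C", software=True)                     # assumed Class Cs
sw_branch = and_gate(mode_mismatch, inhibitor=detection, stages=2)

top = or_gate(hw_branch, sw_branch)
print(output_hw, detection, hw_branch, sw_branch, top)  # C Bs A(C)s A(C)s A(C)s
```

Keeping the class before and after the measure as separate fields is what lets the top event print as A(C)s: the residual class, the highest class before the measure, and the software mark survive every gate.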
    • Effectiveness of this Notation
      This notation is effective in the following ways.
      • The safety analysts can recognize
        – the risk class before and after the risk control measure;
        – where the software in the system and the risk control measure affect the top event;
        – the effect of the risk control, from the minus mark in the AND function.
      • Where an event in the fault tree carries the mark "s", the safety analysts find the starting point of the effect of the software on system safety.
      • Where an event carries both the mark "s" and the minus mark, the safety analysts can recognize the risk introduced by changing the software of the risk control measure.
    • Effectiveness of this Notation
      [Fault tree with callouts: two events are marked "There is a risk introduced by changing the software of the risk control measure"; one event is marked "The starting point of the effect of the software on system safety".]
    • Attention!
      • FTA is an excellent way to show the structure of the mechanism by which the top event, an "undesired state of the system", is generated.
      • On the other hand, the calculation of the failure rate in FTA also has a dangerous side. Before systematic software failure was recognized, the analysis of a radiation therapy machine named Therac-25 included the software in the fault trees but used a "generic failure rate" of 10^-4 for software events. This number was justified based on the historical performance of the Therac-25 software. (Source: SAFEWARE by Prof. Nancy Leveson)
      Now we understand the features of software well, and recognize that this is not realistic.
      1. The evaluation of the residual risk is important.
      2. We can evaluate the severity of the harm before and after the risk control measures.
      Therefore, we should focus on the architecture of the software system and the structure of the risk control measures.
    • Thank you. I hope this notation will be used in the real development of medical devices.
    • REFERENCES
      [1] Dolores R. Wallace, D. Richard Kuhn, "Failure Modes in Medical Device Software: An Analysis of 15 Years of Recall Data", 2001.
      [2] S. Shirasaka, Y. Sakai, Y. Nishi, "Feature Analysis of Estimated Causes of Failures in Medical Device Software and Proposal of Effective Measures", ISSRE 2011.
      [3] ISO 14971:2007, Medical devices - Application of risk management to medical devices.
      [4] ISO 26262-1:2011, Road vehicles - Functional safety - Part 1: Vocabulary.
      [5] IEC/TR 80001-2-1, Application of risk management for IT-networks incorporating medical devices – Part 2-1: Step-by-step risk management of medical IT-networks – Practical applications and examples.
      [6] IEC 62304:2006, Medical device software - Software life cycle processes.
      [7] Katerina Goseva-Popstojanova, Ahmed Hassan, Ajith Guedem, Walid Abdelmoez, Diaa Eldin M. Nassar, Hany Ammar, Ali Mili, "Architectural-Level Risk Analysis Using UML", IEEE Transactions on Software Engineering, vol. 29, no. 10, October 2003.
      [8] Sherif M. Yacoub, Hany H. Ammar, "A Methodology for Architecture-Level Reliability Risk Analysis", IEEE Transactions on Software Engineering, vol. 28, no. 6, June 2002.
    • Extra Information for this study
    • Therac-25 FTA
      [Fault tree: top event "System outputs the wrong energy", with the events "Computer chooses the wrong energy" (0.00000000001) and "Computer chooses the wrong mode" (0.000000004). Hardware labels: PDP-11, VT100.]
      • The probability assigned to the computer choosing the wrong energy was 10^-11.
      • The probability assigned to the computer choosing the wrong mode was 4×10^-9.
      • A hardware safety device was removed for economic reasons.
      • Systematic software failure had not been recognized.
      • This number was justified based on the historical performance of the Therac-25 software.
      Is the probability really 10^-11? Is it really 4×10^-9?
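For contrast, the conventional probabilistic gate arithmetic of FTA can be sketched as follows. The formulas assume statistically independent basic events, which is the usual textbook assumption. Combining the two Therac-25 figures through an OR gate is an assumption made here for illustration, since the transcript does not preserve the gate type; the function names are hypothetical.

```python
import math


def or_gate(probabilities):
    """P(at least one event occurs), assuming independent events."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """P(all events occur), assuming independent events."""
    return math.prod(probabilities)


# Figures quoted on the slide for the two software events.
p_wrong_energy = 1e-11
p_wrong_mode = 4e-9

# Assumed OR gate: either event leads to the top event.
p_top = or_gate([p_wrong_energy, p_wrong_mode])
print(f"{p_top:.3e}")
```

The arithmetic is sound for random hardware failures. The slides question the inputs rather than the formula: a systematic software failure has no statistically meaningful rate to feed into it.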
    • IEC 80001-2-1 Figure 8: Work Sheet Example of Hazard Analysis
    • New Hazard Analysis of Real Medical Devices
      • One "Probability" entry should be replaced by "Probability, Likelihood, or NA (Software)", where NA means Not Applicable.
      • The other "Probability" entry should be replaced by "Effect of Risk Control Measure" (e.g. Major/Moderate/Minor).
      • Add "Risk Control Measure Type of Concern": SOFTWARE, USABILITY, HARDWARE, COMBINATION of ・・・
      If hardware faults and software errors occur in combination, we should separate the concerns into hardware, usability and software.
    • Separation of the Concerns for the Risk Assessment
      • 1st concern, SOFTWARE: probability is NA → use the risk level, that is, the risk level before the risk control measures and the risk level after the risk control measures.
      • 2nd concern, USABILITY: likelihood.
      • 3rd concern, HARDWARE: probability (statistically).
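The separation of concerns can be sketched as a worksheet row type. This is a hypothetical data structure, not the IEC 80001-2-1 worksheet: the class and field names are invented for illustration, and the example rows use made-up entries apart from the Cut/Coag mode event taken from the fault tree example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Concern(Enum):
    SOFTWARE = "software"
    USABILITY = "usability"
    HARDWARE = "hardware"


@dataclass
class HazardRow:
    hazardous_situation: str
    concern: Concern
    class_before: str                 # risk class before the risk control measures
    class_after: str                  # risk class after the risk control measures
    occurrence: Optional[str] = None  # probability (hardware) or likelihood (usability)

    def __post_init__(self) -> None:
        # Software failures are systematic, so no probability or likelihood is recorded.
        if self.concern is Concern.SOFTWARE and self.occurrence is not None:
            raise ValueError("software rows take NA instead of a probability")
        if self.concern is not Concern.SOFTWARE and self.occurrence is None:
            raise ValueError("hardware and usability rows need an occurrence estimate")

    def occurrence_label(self) -> str:
        return "NA" if self.occurrence is None else self.occurrence


rows = [
    HazardRow("Cut/Coag mode mismatch", Concern.SOFTWARE, "C", "A"),
    HazardRow("Operator selects wrong mode", Concern.USABILITY, "C", "B", "Remote"),
]
for row in rows:
    print(row.concern.value, row.occurrence_label(), row.class_before, "->", row.class_after)
```

Rejecting a probability on software rows at construction time keeps the worksheet from quietly mixing statistical estimates with systematic failures.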
    • IEC 80001-2-1 Table D.3
      Usability: likelihood applies (○). Software: likelihood does not apply (×).
      If the hazardous situation arises in the software, we can estimate the risk level only as the severity of the harm after the risk control measures.
    • Sequence of Events: Change the Method of the Risk Assessment!
      [Figure: the sequence of events (hazard, exposure (P1), hazardous situation, P2, harm; risk from the severity of the harm and the probability of occurrence of harm, P1 × P2) shown alongside the development flow. Labels in the flow: User Needs, Intended Use, Medical Device System Requirements Analysis, Risk Assessment (Hazard, Hazardous Situation & Harm), Risk Reduction (Risk Control Measure), Software Architecture (Hardware & Software), Residual Risk.]
      The important aspects: we should focus on the architecture of the software system and the structure of the risk control measures.
    • IEC 62304:2006 Amd1 CD 4.3 Software safety classification
      This chart and our study use the same classification method.
    • The Types of Safety Design
      • Specific optimization: Fault Avoidance. Specific optimization as a fault avoidance approach is not realistic for large-scale and complicated software systems.
      • Total optimization (the contrasting method): Architecture, Fail Safe, Fault Tolerance, Error Proof (Fool Proof), and Usability for the user. The total optimization approach is reasonable for today's medical device software.
    • Safety Design Method and Realization Technique
      • Fault Avoidance: High Coverage Testing, Formal Method.
      • Fail Safe: Interlock, Lockout, Safeguard.
      • Fault Tolerance: Space Tolerance (Main, Sub), Time Tolerance (1st, 2nd), Information Tolerance (Main Information, Error Correction).
      • Error Proof / Fool Proof: Easy Operation, Home button, Safety Label.
    • ISO 26262-9 Figure 2: ASIL decomposition schemes
      • If a basic event does not inhibit the other basic event, the AND function adopts the highest risk class. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
      An AND function without an inhibiting risk control element should select the maximum level among the failures, because the notation focuses on the risk class before and after the risk control measures.