An Extended Notation of FTA for Risk Assessment of
Software-intensive Medical Devices.
- Recognition of The Risk Class Before and After The Risk Control Measure -

Yoshio SAKAI
Engineering Promotion Center, NIHON KOHDEN CORPORATION
Seiko SHIRASAKA
The Graduate School of System Design and Management, KEIO University
Yasuharu NISHI
Department of Systems Engineering, The University of Electro-Communications
Flow of the Presentation
Problem: lack of consideration of software failure.
1. Traditional FTA (OLD): explanation of the traditional FTA, which lacks consideration of software.
2. Risk Assessment Method in ISO 14971 (OLD): explanation of the risk assessment method in ISO 14971, which lacks consideration of software.
3. An Extended Notation of FTA (NEW): explanation of solutions using an extended notation of FTA.
[Figure: Sequence of Events. Hazard -> Exposure (P1) -> Hazardous Situation -> P2 -> Harm. Risk combines the Severity of the Harm with the Probability of Occurrence of Harm (P1 × P2).]

Yoshio_Sakai@mb2.nkc.co.jp


24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
The History of FTA (Fault Tree Analysis)

1962: Fault Tree Analysis (FTA) was originally developed at Bell Laboratories by H.A. Watson for the Minuteman missile. FTA was devised at that time because the electronic system could not endure vibration and broke down.
1965: Boeing improved the completeness of FTA.
NOW: FTA is widely used.

The cause of the trouble was hardware failure, not software.
The traditional FTA, which lacks consideration of software
• When FTA was developed, failures caused by software were not among the failures that FTA considered.
• The traditional FTA does not make clear
– the effect before and after the risk control measure;
– whether the software in the system or in the risk control measure affects the top event.
• The failure-rate calculation of FTA cannot be used for failures caused by software.
[Figure: the failure-rate calculation applies to HARDWARE (○) but not to SOFTWARE (×).]
The Traditional Risk Assessment Method
Example: water boiled in an electric kettle.
1. Hazard: the hot water, as thermal energy.
2. Hazardous situation: the cover opens and hot water spills.
3. Harm: getting burned.

Fig. 3. ISO 14971

P1 is the probability of a hazardous situation occurring.
P2 is the probability of a hazardous situation leading to harm.
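
As a minimal sketch of the traditional estimate, the probability of occurrence of harm is the product P1 × P2. The function name and the kettle numbers below are hypothetical and chosen only to show the arithmetic; they do not come from ISO 14971 or from this study.

```python
def probability_of_harm(p1: float, p2: float) -> float:
    """Return P1 x P2, the probability of occurrence of harm.

    p1: probability of a hazardous situation occurring
    p2: probability of a hazardous situation leading to harm
    """
    for name, value in (("p1", p1), ("p2", p2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    return p1 * p2


# Hypothetical values for the kettle example (assumptions, not measured data).
kettle_p1 = 0.01  # the cover opens and hot water spills
kettle_p2 = 0.5   # spilled hot water actually burns someone
kettle_risk_probability = probability_of_harm(kettle_p1, kettle_p2)
print(kettle_risk_probability)
```

The estimate only makes sense when both factors can be quantified, which is the point the following slides question for software.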
Software?
The Estimation of the Probability of a Hazardous Situation
HARDWARE: failure rate of random hardware failure.
USABILITY: the likelihood of the usability failure, on a scale from HIGH to LOW: Frequent, Probable, Occasional, Remote, Improbable. (Likelihood scale, SOURCE: IEC 80001-2-1 Step by Step)
SOFTWARE:
• Software is invisible.
• Failures caused by software occur systematically, not statistically.

We cannot estimate the probability or the likelihood of a failure caused by software.
Features of Systematic Failure
Systematic failure is unwanted behaviour which is
• repeatable
– if the conditions can be exactly replicated

• predictable (but not accurately)
– all systems have flaws

• indefensible
– it should not occur, but it is extremely hard to prevent

The definition and explanation of Systematic Failure
Systematic Failure:
failure, related in a deterministic way to a certain cause, that can only be eliminated by a change of the design or of the manufacturing process, operational procedures, documentation or other relevant factors
SOURCE: ISO 26262-1:2011

This International Standard (NOTE 4):
• sets requirements for the avoidance and control of systematic faults, which are based on experience and judgment from practical experience gained in industry. Even though the probability of occurrence of systematic failures cannot in general be quantified the standard does, however, allow a claim to be made, for a specified safety function, that the target failure measure associated with the safety function can be considered to be achieved if all the requirements in the standard have been met;
SOURCE: IEC 61508-3:2010
Two types of evaluation of the hazard caused by Systematic Software Failure
1. "The probability of such failure shall be assumed to be 100 percent." (IEC 62304:2006)
• The probability is 100%.
• This 100 percent principle was chosen to be conservative, but it is not practical in real applications.

2. If the hazard could arise from a failure of the software, the risk should be evaluated with the following two concerns. (IEC 62304:2006 Amd.1, this study)
• The 1st concern is the risk level, as the severity of the harm, before the risk control measures.
• The 2nd concern is the risk level, as the severity of the harm, after the risk control measures.
• Evaluating the residual risk is important. When software is the cause, the probability of occurrence of harm before the risk control measures is not.

The procedure of evaluation of the hazard caused by Systematic Software Failure
RISK: the hazardous situation occurs through a systematic software failure.
RISK CONTROL MEASURES: the safety is affected by
• the hardware as the risk control measure and
• the reliability of the critical software component.
RESIDUAL RISK: after the risk control measures, we have to evaluate the residual risk for safety.

The probability of occurrence of harm caused by the software before the risk control measures is not necessary for the risk assessment.
Method of evaluating Systematic Failure
Medical device manufacturers can evaluate the residual risk class after the countermeasure by the following combination:
a. The severity of the residual risk
b. The reliability of the software items that could contribute to a hazardous situation
c. The safe architecture of the software system

These are not elements of the probability.
Relation between the risk control measures and Architecture
• Complicated software items (low cohesion and high coupling): the result of continuous additions (a real software system). The relation is not clear.
• Segregated software items (high cohesion and low coupling): layered architecture (3 layers: Presentation, Domain and Data Source). The relation is clear.
The Principles of Electrosurgical Knife

The mode of cut or coagulation is switched by software.

Mode: Cut
Principles: For cutting, a continuous single-frequency sine wave is often employed.

Mode: Coagulation
Principles: For coagulation, the average power is typically reduced below the threshold of cutting. Generally, the sine wave is turned on and off in rapid succession.

There are serious hazardous situations in the software system.
Electrosurgical Knife Block Diagram

[Figure: block diagram of the electrosurgical knife, with two blocks marked "High Risk Software Component". The wave is controlled and switched by the software.]

The most serious hazard is unintended hemorrhage caused by abnormal output of the electrosurgical knife.

The following slides show the fault tree analysis.

Extended Notation of FTA (1)

FTA example (top event first; each line shows an event and its risk class):
Abnormal Output of Electrosurgical Knife: Class A(C)s = OR (A(C)s, A(C)s)
  Abnormal Output caused by Hardware: Class A(C)s = AND (C, --Bs)
    Output Hardware Failure: Class C = OR (C, C, B)
      High-frequency Wave Failure: Class C
      Wave Circuit Failure: Class C
      Timer Failure: Class B
    Abnormal Monitoring Failure: Class Bs
  Unintended Output caused by Software: Class A(C)s = AND (Cs, --Bs)
    Cut/Coag Mode Mismatch: Class Cs
    Failure of the Abnormal Detection: Class Bs = AND (Bs, B)
      Abnormal Monitoring Failure: Class Bs
      A/D Convertor Failure: Class B

1st column from the bottom, on the left side of the FTA example:
a. There are three hardware failures.
b. Each failure is classified by its risk level.
c. The three basic events are connected with an OR gate.
d. The OR function adopts the highest risk class.

Risk Class Definition (Source: IEC 62304:2006)
Class A: No injury or damage to health is possible
Class B: Non-serious injury is possible
Class C: Death or serious injury is possible
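
The OR rule on this slide (the gate adopts the highest risk class among its inputs) can be sketched as follows. The names `RiskClass` and `or_gate` are illustrative assumptions and are not part of the notation itself.

```python
from enum import IntEnum


class RiskClass(IntEnum):
    """IEC 62304:2006 classes, ordered from lowest to highest risk."""
    A = 1  # no injury or damage to health is possible
    B = 2  # non-serious injury is possible
    C = 3  # death or serious injury is possible


def or_gate(*inputs: RiskClass) -> RiskClass:
    """OR function of the extended notation: adopt the highest risk class."""
    return max(inputs)


# Output Hardware Failure: Class C = OR (C, C, B)
output_hardware_failure = or_gate(RiskClass.C, RiskClass.C, RiskClass.B)
print(output_hardware_failure.name)
```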

Extended Notation of FTA (2)

[Figure: the FTA example for the top event "Abnormal Output of Electrosurgical Knife", highlighting the gate Class A(C)s = AND(C, --Bs). Output Hardware Failure (Class C) is inhibited by the risk control measure, Abnormal Monitoring Failure (Class Bs).]

2nd column from the bottom, on the left side of the FTA example:
• The right basic event is an abnormal monitoring failure. This event is caused by the software.
• It is described as Class Bs: risk Class B as the impact level, and "s" as the effect of the software.
• The abnormal monitoring inhibits and controls the output hardware failure. This is indicated by the AND function as AND(C, --Bs).
• The number of minus signs shows the stages of inhibition. In this case, the risk control measure lowers the risk level by two stages, from C to A.
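
A minimal sketch of the inhibiting AND function, AND(C, --Bs) = A(C)s, follows. The `Event` type, the function name, and the choice to stop at Class A when more stages are requested than exist are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ORDER = "ABC"  # IEC 62304:2006 classes, lowest to highest


@dataclass(frozen=True)
class Event:
    risk_class: str                # class after risk control (residual)
    software: bool = False         # the "s" mark
    before: Optional[str] = None   # highest class before risk control

    def label(self) -> str:
        text = self.risk_class
        if self.before is not None and self.before != self.risk_class:
            text += f"({self.before})"
        return text + ("s" if self.software else "")


def and_with_inhibit(event: Event, control: Event, stages: int) -> Event:
    """Lower `event` by `stages` classes; each minus sign is one stage."""
    if stages < 0:
        raise ValueError("stages must not be negative")
    lowered = ORDER[max(ORDER.index(event.risk_class) - stages, 0)]
    return Event(
        risk_class=lowered,
        software=event.software or control.software,  # "s" is inherited
        before=event.before or event.risk_class,
    )


# AND(C, --Bs): two minus signs, so two stages, from C to A.
result = and_with_inhibit(Event("C"), Event("B", software=True), stages=2)
print(result.label())  # A(C)s
```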

Extended Notation of FTA (3)

[Figure: the FTA example for the top event "Abnormal Output of Electrosurgical Knife", highlighting the gate Failure of the Abnormal Detection, Class Bs = AND (Bs, B), whose inputs are Abnormal Monitoring Failure (Class Bs) and A/D Convertor Failure (Class B).]

1st column from the bottom, on the right side of the FTA example:
a. The abnormal monitoring failure is caused by the software.
b. The A/D convertor failure is caused by hardware.
c. If a basic event does not inhibit the other basic event, the AND function adopts the highest risk class. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
d. The subscript "s" is inherited from the left side to the right side through the function, as the effect of the software on the system.
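
The non-inhibiting AND function, Class Bs = AND (Bs, B), can be sketched as below. The `Event` tuple and the function name are hypothetical helpers; the sketch assumes that "s" is inherited whenever any input carries it.

```python
from typing import NamedTuple

ORDER = "ABC"  # IEC 62304:2006 classes, lowest to highest


class Event(NamedTuple):
    risk_class: str
    software: bool = False  # the "s" mark

    def label(self) -> str:
        return self.risk_class + ("s" if self.software else "")


def and_without_inhibit(*events: Event) -> Event:
    """AND with no inhibiting input: adopt the highest class, inherit "s"."""
    highest = max((e.risk_class for e in events), key=ORDER.index)
    return Event(highest, any(e.software for e in events))


abnormal_monitoring_failure = Event("B", software=True)  # Class Bs
ad_convertor_failure = Event("B")                        # Class B
detection_failure = and_without_inhibit(
    abnormal_monitoring_failure, ad_convertor_failure
)
print(detection_failure.label())  # Bs
```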

Extended Notation of FTA (4)

[Figure: the FTA example for the top event "Abnormal Output of Electrosurgical Knife", highlighting the top gate Class A(C)s = OR (A(C)s, A(C)s), whose inputs are Abnormal Output caused by Hardware, Class A(C)s = AND (C, --Bs), and Unintended Output caused by Software, Class A(C)s = AND (Cs, --Bs).]

1st column from the top of the FTA example:
a. The OR function adopts the highest risk class. In this case, the risk classes are the same.
b. The risk class of the top event is finally expressed as Class A(C)s.
• This notation makes the following recognizable:
– The risk class of the residual risk is A.
– The highest risk class before the risk control measure is C.
– The software affects the top event or the risk control measure in the system.
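
Putting the rules together, the whole example tree can be evaluated bottom-up. This is a sketch under stated assumptions: the tree structure is one reading of the FTA example figure, and `Event`, `combine`, and `inhibit` are illustrative names rather than part of the notation.

```python
from dataclasses import dataclass

ORDER = "ABC"  # IEC 62304:2006 classes, lowest to highest


@dataclass(frozen=True)
class Event:
    risk_class: str         # class after risk control (residual risk)
    software: bool = False  # the "s" mark
    before: str = ""        # highest class before risk control, "" if same

    @property
    def pre(self) -> str:
        return self.before or self.risk_class

    def label(self) -> str:
        same = self.pre == self.risk_class
        text = self.risk_class if same else f"{self.risk_class}({self.pre})"
        return text + ("s" if self.software else "")


def highest(classes) -> str:
    return max(classes, key=ORDER.index)


def combine(*events: Event) -> Event:
    """OR, or AND without inhibit: adopt the highest class, inherit "s"."""
    return Event(highest(e.risk_class for e in events),
                 any(e.software for e in events),
                 highest(e.pre for e in events))


def inhibit(event: Event, control: Event, stages: int) -> Event:
    """AND with inhibit: each minus sign lowers the class by one stage."""
    lowered = ORDER[max(ORDER.index(event.risk_class) - stages, 0)]
    return Event(lowered, event.software or control.software, event.pre)


monitoring = Event("B", software=True)                        # Class Bs
output_hw = combine(Event("C"), Event("C"), Event("B"))       # OR (C, C, B)
by_hardware = inhibit(output_hw, monitoring, 2)               # AND (C, --Bs)
detection = combine(monitoring, Event("B"))                   # AND (Bs, B)
by_software = inhibit(Event("C", software=True), detection, 2)  # AND (Cs, --Bs)
top = combine(by_hardware, by_software)                       # OR (A(C)s, A(C)s)
print(top.label())  # A(C)s
```

The label of the top event carries all three pieces of information listed above: residual class A, pre-control class C, and the software mark.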

Effectiveness of this Notation
This notation is effective in the following ways.
• Safety analysts can recognize
– the risk class before and after the risk control measure;
– that the software in the system or in the risk control measure affects the top event;
– the effect of the risk control, from the minus marks in the AND function.

• When an event in the fault tree carries the mark "s", safety analysts can find the point where the effect of the software on system safety starts.
• When an event carries both the mark "s" and the minus mark, safety analysts can recognize the risk introduced by changing the software of the risk control measure.
Effectiveness of this Notation

[Figure: the FTA example with callouts. Two callouts read "There is the risk which is given by changing software of the risk control measure". One callout reads "The start point of the effect of the software for the system safety".]
Attention!
• FTA is an excellent way to show the structure of the mechanism by which the top event, an "undesired state of the system", is generated.
• On the other hand, the failure-rate calculation on FTA also has a dangerous feature.
Before systematic software failure was recognized, the analysis of the radiation therapy machine Therac-25 included the software in the fault trees but used a "generic failure rate" of 10^-4 for software events.
This number was justified based on the historical performance of the Therac-25 software. (Source: SAFEWARE by Prof. Nancy Leveson)
Now we understand the features of software well, and recognize that this is not realistic.

1. The evaluation of the residual risk is important.
2. We can evaluate the severity of the harm before and after the risk control measures.
Therefore, we should focus on the architecture of the software system and the structure of the risk control measures.
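
To illustrate why the failure-rate calculation is tempting, here is a sketch of the conventional gate arithmetic under the usual assumption of independent events. The hardware probabilities are hypothetical. The software value is the "generic failure rate" quoted above, inserted only to show that the arithmetic accepts any number whether or not it is meaningful for a systematic failure.

```python
import math
from typing import Iterable


def and_gate(probabilities: Iterable[float]) -> float:
    """All independent input events occur: product of the probabilities."""
    return math.prod(probabilities)


def or_gate(probabilities: Iterable[float]) -> float:
    """At least one independent input event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


hardware_events = [1e-5, 2e-5]   # hypothetical random hardware failures
generic_software_rate = 1e-4     # the "generic failure rate" quoted above

top_event = or_gate([or_gate(hardware_events), generic_software_rate])
print(f"{top_event:.3e}")
```

The result looks precise, but the software term is not a statistical quantity, so the top-event figure inherits an unjustified number.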
Thank you.
I hope this notation will be used in the real development of medical devices.

REFERENCES
[1] Dolores R. Wallace, D. Richard Kuhn, “Failure Modes In Medical Device Software: An Analysis Of 15 Years Of Recall Data”, 2001
[2] S. Shirasaka, Y. Sakai, Y. Nishi, “Feature Analysis of Estimated Causes of Failures in Medical Device Software and Proposal of Effective Measures”, ISSRE 2011
[3] ISO 14971:2007 Medical devices - Application of risk management to medical devices
[4] ISO 26262-1:2011 Road vehicles - Functional safety - Part 1: Vocabulary
[5] IEC/TR 80001-2-1 Application of risk management for IT-networks incorporating medical devices - Part 2-1: Step-by-step risk management of medical IT-networks - practical applications and examples
[6] IEC 62304:2006 Medical device software - Software life cycle processes
[7] Katerina Goseva-Popstojanova, Ahmed Hassan, Ajith Guedem, Walid Abdelmoez, Diaa Eldin M. Nassar, Hany Ammar, Ali Mili, “Architectural-Level Risk Analysis Using UML”, IEEE Transactions on Software Engineering, Vol. 29, No. 10, October 2003
[8] Sherif M. Yacoub, Hany H. Ammar, “A Methodology for Architecture-Level Reliability Risk Analysis”, IEEE Transactions on Software Engineering, Vol. 28, No. 6, June 2002
Extra Information for this study

Therac-25 FTA

[Figure: part of the Therac-25 fault tree (PDP-11, VT100). Top event: "System outputs the wrong energy". Input events: "Computer chooses the wrong energy" (0.00000000001) and "Computer chooses the wrong mode" (0.000000004).]

• The probability for the computer to choose the wrong energy is 10^-11.
• The probability for the computer to choose the wrong mode is 4×10^-9.
• A hardware safety device was removed for economic reasons.
• Systematic software failure had not been recognized.
• This number was justified based on the historical performance of the Therac-25 software.

Is the probability really 10^-11?
Is the probability really 4×10^-9?
IEC 80001-2-1 Figure 8
Work Sheet Example of Hazard Analysis
New Hazard Analysis of the real medical devices
• "Probability" should be replaced with "Probability or Likelihood or NA (Software): Not Applicable".
• "Probability" should be replaced with "Effect of Risk Control Measure" (e.g. Major/Moderate/Minor).
• Add "Risk Control Measure Type of Concern": SOFTWARE, USABILITY, HARDWARE, COMBINATION of ・・・
• If there is a combination of hardware faults and software errors, we should separate the concerns into Hardware, Usability, or Software.
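
One possible shape for a row of such a worksheet is sketched below. The field names, the validation rules, and the sample rows are assumptions for illustration; they are not taken from IEC 80001-2-1 or from the authors' worksheet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Concern(Enum):
    SOFTWARE = "Software"
    USABILITY = "Usability"
    HARDWARE = "Hardware"


@dataclass(frozen=True)
class HazardRow:
    hazard: str
    concern: Concern
    occurrence: Optional[str]  # probability or likelihood; None means NA
    control_effect: str        # effect of the risk control measure

    def __post_init__(self) -> None:
        if self.concern is Concern.SOFTWARE and self.occurrence is not None:
            raise ValueError("software rows carry no probability or likelihood")
        if self.concern is not Concern.SOFTWARE and self.occurrence is None:
            raise ValueError("hardware and usability rows need an estimate")
        if self.control_effect not in ("Major", "Moderate", "Minor"):
            raise ValueError("control_effect must be Major, Moderate or Minor")

    def occurrence_text(self) -> str:
        return "NA (Software)" if self.occurrence is None else self.occurrence


# Hypothetical rows for the electrosurgical knife example.
rows = [
    HazardRow("Cut/Coag mode mismatch", Concern.SOFTWARE, None, "Major"),
    HazardRow("Wave circuit failure", Concern.HARDWARE, "Remote", "Major"),
]
for row in rows:
    print(row.hazard, "|", row.concern.value, "|", row.occurrence_text())
```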
Separation of The Concern for the risk assessment
1st Concern: SOFTWARE. Probability is NA; use the risk level instead:
– the risk level before the risk control measures;
– the risk level after the risk control measures.
2nd and 3rd Concerns: USABILITY (likelihood) and HARDWARE (probability, statistically).
IEC 80001-2-1 Table D.3

Usability <-> ○ Likelihood
Software <-> × Likelihood

If the hazardous situation originates in the software, we can estimate the risk level only as the severity of the harm after the risk control measures.

Sequence of Events

Change the method of the risk assessment!

[Figure: the ISO 14971 sequence of events (Hazard -> Exposure (P1) -> Hazardous Situation -> P2 -> Harm; Risk from the Severity of the Harm and the Probability of Occurrence of Harm, P1 × P2) shown beside the development flow: User Needs, Intended Use, Medical Device System Requirements Analysis, Risk Assessment (Hazard, Hazardous Situation & Harm), Risk Reduction (Risk Control Measure), Software Architecture (Hardware & Software), Residual Risk.]

The important aspects: we should focus on the architecture of the software system and the structure of the risk control measures.
IEC 62304:2006 Amd1 CD 4.3 Software safety classification
This chart and our study use the same classification method.

The Types of Safety Design
Contrasting methods:
• Specific Optimization: Fault Avoidance
• Total Optimization: Architecture (Fail Safe, Fault Tolerance, Error Proof (Fool Proof)) and USER (Usability)

Specific optimization, as the Fault Avoidance approach, is not realistic for large-scale and complicated software systems.
The total optimization approach is reasonable for today’s medical device software.
Safety Design Method: Realization Technique
• Fault Avoidance: High Coverage Testing, Formal Method
• Fail Safe: Interlock, Lockout, Safeguard
• Fault Tolerance: Space Tolerance (Main, Sub), Time Tolerance (1st, 2nd), Information Tolerance (Main Information, Error Correction)
• Error Proof / Fool Proof: Easy Operation, Home button, Safety Label
ISO 26262-9 Figure 2 — ASIL decomposition schemes
• If a basic event does not inhibit the other basic event, the AND function adopts the highest risk class. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
An AND function without a risk control element acting as an inhibit should select the maximum level of the failures, because the notation focuses on the risk class before and after the risk control measures.
Yoshio_Sakai@mb2.nkc.co.jp

34

24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013

An Extended Notation of FTA for Risk Assessment of Software-intensive Medical Devices.

  • 1.
    An Extended Notationof FTA for Risk Assessment of Software-intensive Medical Devices. - Recognition of The Risk Class Before and After The Risk Control Measure - Yoshio SAKAI Engineering Promotion Center, NIHON KOHDEN CORPORATION Seiko SHIRASAKA The Graduate School of System Design and Management, KEIO University Yasuharu NISHI Department of Systems Engineering, The University of Electro-Communications
  • 2.
    Flow of thePresentation Lack of consideration of the Software Failure Intensive-Software 2. Risk Assessment Method in ISO 14971 Sequence of Events 1. Traditional FTA 3. An Extended Notation of FTA Hazard Exposure (P1) Hazardous Situation P2 Harm Severity of the Harm Probability of Occurrence of Harm Risk P1 × P2 OLD 1. 2. 3. OLD NEW Explanation of the traditional FTA which lack consideration of the software. Explanation of the risk assessment method in ISO 14971 which lack consideration of the software. Explanation of solutions using an extended notation of FTA. Yoshio_Sakai@mb2.nkc.co.jp 2 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 3.
    The History ofFTA (Fault Tree Analysis) NOW 1965 1962 The FTA is used widely. As for the FTA, completeness was raised by BOEING. Fault Tree Analysis (FTA) was originally developed for Minuteman Missile in 1962 at Bell Laboratories by H.A. Watson. At that time, FTA was designed because the electronic system was not able to endure vibration and caused it to break down. The cause of the trouble was the hardware failure, not software. Yoshio_Sakai@mb2.nkc.co.jp 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 4.
    The traditional FTAwhich lacks consideration of the software. • When FTA was developed, the failure caused by the software was not an element of the failures of FTA. • The traditional FTA is not comprehensible about – The effectiveness before and after the risk control measure. – The software in the system and the risk control measure affects the top event. • The calculation of the failure rate on FTA can not use for the failure caused by the software. ○ × Yoshio_Sakai@mb2.nkc.co.jp HARDWARE SOFTWARE 4 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 5.
    The Traditional RiskAssessment Method The example is the boiled water with an electric kettle. 1. The hot water as the thermal energy 2. A cover opens and spills hot water 3. Getting burned Fig. 3. ISO 14971 P1 is the probability of a hazardous situation occurring. P2 is the probability of a hazardous situation leading to harm. Yoshio_Sakai@mb2.nkc.co.jp 5 Software ? 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 6.
    The Estimation ofthe probability of a hazardous situation HARDWARE USABILITY Failure Rate of Random Hardware Failure The likelihood of the usability failure HIGH Frequent Probable Occasional Remote Improbable LOW Likelihood: SOURCE IEC 80001-2-1 Step by Step SOFTWARE •Software is Invisible. •The failure caused by the software occurs systematically, but not statistically. We can not estimate the probability or the likelihood of the failure cased by Software. Yoshio_Sakai@mb2.nkc.co.jp 6 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 7.
    Feature of SystematicFailure Systematic failure is unwanted behaviour which is • repeatable – If the conditions can be exactly replicated • predictable (but not accurately) – all systems have flaws • indefensible – it should not occur... … but it is extremely hard to prevent Yoshio_Sakai@mb2.nkc.co.jp 7 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 8.
    The definition andexplanation of Systematic Failure Systematic Failure failure, related in a deterministic way to a certain cause, that can only be eliminated by a change of the design or of the manufacturing process, operational procedures, documentation or other relevant factors SOURCE: ISO 26262-1:2011 This International Standard NOTE4 : • sets requirements for the avoidance and control of systematic faults, which are based on experience and judgment from practical experience gained in industry. Even though the probability of occurrence of systematic failures cannot in general be quantified the standard does, however, allow a claim to be made, for a specified safety function, that the target failure measure associated with the safety function can be considered to be achieved if all the requirements in the standard have been met; SOURCE: IEC 61508-3:2010 Yoshio_Sakai@mb2.nkc.co.jp 8 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 9.
    Two types ofevaluation of the hazard caused by Systematic Software Failure The probability of such failure shall be assumed to be 100 percent. (IEC 62304:2006) • The probability is 100%. • This 100 percent principle has been chosen for conservative purpose but not practical in real application. If the hazard could arise from a failure of the software, the risk evaluation should be analyzed by the following two concerns. (IEC 62304:2006 Amd.1 , This Study) • 1st concern is the risk level as the severity of the harm before the risk control measures. • 2nd concern is the risk level as the severity of the harm after the risk control measures. • The evaluation of the residual risk is of importance, but under the cause of the software, the probability of occurrence of harm before the risk control measures is not. Yoshio_Sakai@mb2.nkc.co.jp 9 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 10.
    The procedure ofevaluation of the hazard caused by Systematic Software Failure If the hazardous situation occurs by Systematic Software Failure RISK The safety is affected by • the hardware as the risk control measure and • the reliability of the critical software component. RISK CONTROL MEASURES After the risk control measures, we have to evaluate the residual risk for the safety. RESIDUAL RISK The probability of occurrence of harm caused by the software before the risk control measures is not necessary for the risk assessment. Yoshio_Sakai@mb2.nkc.co.jp 10 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 11.
    Method of evaluatingSystematic Failure Medical device Manufacturers can evaluate the residual risk class by the following combination after countermeasure. a. The severity of the residual risk b. The reliability of the software items that could contribute to a hazardous situation c. The safe architecture of the software system These are not elements of Yoshio_Sakai@mb2.nkc.co.jp the probability 11 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 12.
    Relation between therisk control measures and Architecture. Complicated Software Items (Low cohesion and High coupling) Segregated Software Items (High cohesion and Low coupling) Layered Architecture (3 Layers: Presentation, Domain and Date Source) Result of having continuous addition (A real software system) Not Clear Yoshio_Sakai@mb2.nkc.co.jp Clear 12 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 13.
    The Principles ofElectrosurgical Knife The mode of cut or coagulation is switched by software. Mode Principles Cut For cutting, a continuous single frequency sine wave is often employed. Coagulation For coagulation, the average power is typically reduced below the threshold of cutting. Generally, the sine wave is turned on and off in a rapid succession. There are the serious hazardous situations in the software system. Yoshio_Sakai@mb2.nkc.co.jp 13 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 14.
    Electrosurgical Knife BlockDiagram The wave is controlled and switched by the software High Risk Software Component High Risk Software Component The most serious hazard is hemorrhage not intended by the abnormal output of Electrosurgical knife. Let’s see the fault tree analysis following slides. Yoshio_Sakai@mb2.nkc.co.jp 14 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 15.
    Abnormal Output of ElectrosurgicalKnife Extended Notation of FTA (1) Class A(C)s = OR (A(C)s, A(C)s) Abnornal Output caused by Hardware Class A(C)s = AND (C, --Bs)) 1st column from the bottom and on the left side of FTA Example Unintended Output caused by Software Class A(C)s = AND (Cs, --Bs) Output Hardware Failure Abnormal Monitoring Failure Class Bs d. There are three hardware failures. Each failure is classified by the risk level. Three basic events are connected with OR gate. The highest risk class is adopted by the OR function. Risk Class High-frequency Wave Failure Class C Wave Circuit Failure Class C Failure of the Abnormal Detection Class Cs Class Bs = AND (Bs, B) Class C = OR (C, C, B) a. b. c. Cut/Coag Mode Mismatch Timer Failure Class B Abnormal Monitoring Failure Class Bs A/D Convertor Failure Class B Definition (Source IEC 62304:2006) Class A No injury or damage to health is possible Class B Non-serious injury is possible Class C Death or serious injury is possible Yoshio_Sakai@mb2.nkc.co.jp 15 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 16.
    Extended Notation ofFTA (2) Abnormal Output of Electrosurgical Knife Class A(C)s = OR (A(C)s, A(C)s) Abnornal Output caused by Hardware Class A(C)s = AND (C, --Bs)) 2nd column from the bottom and on the left side of FTA Example Unintended Output caused by Software Class A(C)s = AND (Cs, --Bs) Output Hardware Failure a. b. c. d. The right basic event is an abnormal monitoring failure. This event is caused by the software. It is described with Class Bs as impact level of risk Class B and with “s” as the effect of the software. The abnormal monitoring inhibits and controls the output hardware failure. This is indicated by AND function as AND(C, --Bs). The stage of inhibit is shown by the number of the minus. In this case, the risk control measure goes down the risk level by two stages from C to A. Class A Class Bs Class C Wave Circuit Failure Class C Failure of the Abnormal Detection Class Cs Class Bs = AND (Bs, B) Class C = OR (C, C, B) High-frequency Wave Failure Cut/Coag Mode Mismatch Timer Failure Class B Abnormal Monitoring Failure Class Bs A/D Convertor Failure Class B Class A(C) s = AND(C, --Bs) Class C -- Abnormal Monitoring Failure Risk Control Measure(Class Bs) Down the risk level by two stages Yoshio_Sakai@mb2.nkc.co.jp 16 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 17.
    Extended Notation ofFTA (3) Abnormal Output of Electrosurgical Knife Class A(C)s = OR (A(C)s, A(C)s) Abnornal Output caused by Hardware 1st column from the bottom and On the right side of FTA Example. a. The abnormal monitoring failure is caused by the software. b. Class A(C)s = AND (Cs, --Bs) Output Hardware Failure Abnormal Monitoring Failure Class Bs Class C Wave Circuit Failure Class C Failure of the Abnormal Detection Class Cs Class Bs = AND (Bs, B) Class C = OR (C, C, B) High-frequency Wave Failure Cut/Coag Mode Mismatch Timer Failure Class B Abnormal Monitoring Failure Class Bs A/D Convertor Failure Class B If the basic event does not inhibit the other basic event, the highest risk class is adopted by the AND function. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9) d. Class A(C)s = AND (C, --Bs)) The A/D convertor failure is caused by hardware. c. Unintended Output caused by Software The subscript “s” is inherited from the left side to the right side through the function as the affect of the software to the system. Yoshio_Sakai@mb2.nkc.co.jp 17 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 18.
    Extended Notation ofFTA (4) Abnormal Output of Electrosurgical Knife Class A(C)s = OR (A(C)s, A(C)s) 1st column from the top of FTA Example. Abnornal Output caused by Hardware Unintended Output caused by Software Class A(C)s = AND (C, --Bs)) Class A(C)s = AND (Cs, --Bs) Output Hardware Failure a. The highest risk class is adopted by the OR function. In this case, the risk classes are same. Abnormal Monitoring Failure Class Bs Class C Wave Circuit Failure Class C Failure of the Abnormal Detection Class Cs Class Bs = AND (Bs, B) Class C = OR (C, C, B) High-frequency Wave Failure Cut/Coag Mode Mismatch Timer Failure Class B Abnormal Monitoring Failure Class Bs A/D Convertor Failure Class B b. The risk class of a top event is expressed after all as Class A (C) s. • The followings are recognized by this notation. – The risk class of the residual risk is A. – The highest risk class before the risk control measure is C. – The software affects the top event or the risk control measure in the system. Yoshio_Sakai@mb2.nkc.co.jp 18 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
  • 19.
    Effectiveness of this Notation
    This notation is effective in the following ways.
    • The safety analysts can recognize
    – the risk class before and after the risk control measure.
    – that the software in the system and the risk control measure affects the top event.
    – the effect of the risk control, shown by the minus mark in the AND function.
    • When an event in the fault tree carries the mark "s", the safety analysts can find the start point of the effect of the software on the system safety.
    • When an event carries both the mark "s" and the minus mark, the safety analysts can recognize the risk introduced by changing the software of the risk control measure.
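As a rough illustration of the last two points, the following Python sketch walks a fault tree and lists the events carrying the mark "s" and the minus mark. The `Node` type, the function names, and the exact parent/child layout of the tree are assumptions made for the example; only the event names come from the FTA example in this presentation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    software: bool = False   # mark "s"
    minus: int = 0           # minus marks: the event acts as a risk control
    children: list[Node] = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def software_start_points(root):
    """Basic events (leaves) marked "s": where the software effect starts."""
    names = []
    for n in walk(root):
        if n.software and not n.children and n.name not in names:
            names.append(n.name)
    return names


def software_risk_controls(root):
    """Events with both "s" and a minus mark: software risk controls."""
    names = []
    for n in walk(root):
        if n.software and n.minus > 0 and n.name not in names:
            names.append(n.name)
    return names


# ASSUMED layout of the example tree (the slide text does not fix it).
monitoring = Node("Abnormal Monitoring Failure", software=True, minus=2, children=[
    Node("Abnormal Monitoring Failure (basic event)", software=True),
    Node("A/D Convertor Failure"),
])
output_hw = Node("Output Hardware Failure", children=[
    Node("Wave Circuit Failure"),
    Node("Cut/Coag Mode Mismatch"),
    Node("Timer Failure"),
])
top = Node("Abnormal Output of Electrosurgical Knife", software=True, children=[
    Node("Abnormal Output caused by Hardware", software=True,
         children=[output_hw, monitoring]),
    Node("Unintended Output caused by Software", software=True,
         children=[Node("Failure of the Abnormal Detection", software=True),
                   monitoring]),
])

start_points = software_start_points(top)
controls = software_risk_controls(top)
print(start_points)
print(controls)
```

The second list is the set of events an analyst would re-examine whenever the software of a risk control measure is changed.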
  • 20.
    Effectiveness of this Notation
    [Figure: FTA example with callouts]
    • Callout (shown at two places in the figure): There is a risk introduced by changing the software of the risk control measure.
    • Callout: The start point of the effect of the software on the system safety.
  • 21.
    Attention!
    • FTA is an excellent way to show the structure of the mechanism by which the Top Event, an "undesired state of the system", is generated.
    • On the other hand, calculating the failure rate on FTA also has a dangerous side. At a time when Systematic Software Failure had not yet been recognized, the analysis of a radiation therapy machine named Therac-25 included the software in the fault trees but used a “generic failure rate” of 10^-4 for software events. This number was justified based on the historical performance of the Therac-25 software. (Source: SAFEWARE by Prof. Nancy Leveson.) Now that we understand the characteristics of software well, we recognize that this is not realistic.
    1. The evaluation of the residual risk is important.
    2. We can evaluate the severity of the harm before and after the risk control measures.
    Therefore, we should focus on the architecture of the software system and the structure of the risk control measures.
  • 22.
    Thank you. I hope this notation will be used in the real development of medical devices.
  • 23.
    REFERENCES
    [1] Dolores R. Wallace, D. Richard Kuhn, “Failure Modes In Medical Device Software: An Analysis Of 15 Years Of Recall Data”, 2001
    [2] S. Shirasaka, Y. Sakai, Y. Nishi, “Feature Analysis of Estimated Causes of Failures in Medical Device Software and Proposal of Effective Measures”, ISSRE 2011
    [3] ISO 14971:2007 Medical devices - Application of risk management to medical devices
    [4] ISO 26262-1:2011 Road vehicles - Functional safety - Part 1: Vocabulary
    [5] IEC/TR 80001-2-1 Application of risk management for IT-networks incorporating medical devices - Part 2-1: Step-by-step risk management of medical IT-networks - practical applications and examples
    [6] IEC 62304:2006 Medical device software - Software life cycle processes
    [7] Katerina Goseva-Popstojanova, Ahmed Hassan, Ajith Guedem, Walid Abdelmoez, Diaa Eldin M. Nassar, Hany Ammar, Ali Mili, “Architectural-Level Risk Analysis Using UML”, IEEE Transactions on Software Engineering, Vol. 29, No. 10, October 2003
    [8] Sherif M. Yacoub, Hany H. Ammar, “A Methodology for Architecture-Level Reliability Risk Analysis”, IEEE Transactions on Software Engineering, Vol. 28, No. 6, June 2002
  • 24.
    Extra Information for this study
  • 25.
    Therac-25 FTA
    [Figure: fault tree. Top event: System outputs the wrong energy. Events: Computer chooses the wrong energy (0.00000000001; the probability is 10^-11?); Computer chooses the wrong mode (0.000000004; the probability is 4×10^-9?). Other labels: PDP-11, VT100.]
    • The probability for the computer to choose the wrong energy is 10^-11.
    • The probability for the computer to choose the wrong mode is 4×10^-9.
    • A hardware safety device was removed for economic reasons.
    • Systematic Software Failure had not been recognized.
    • This number was justified based on the historical performance of the Therac-25 software.
  • 26.
    IEC 80001-2-1 Figure 8: Work Sheet Example of Hazard Analysis
  • 27.
    New Hazard Analysis of the real medical devices.
    • "Probability" should be replaced with "Probability or Likelihood or NA (Software)", where NA means Not Applicable.
    • "Probability" should be replaced with "Effect of Risk Control Measure" (e.g. Major/Moderate/Minor).
    • Add "Risk Control Measure Type of Concern": SOFTWARE, USABILITY, HARDWARE, COMBINATION of ...
    • If hardware faults and software errors occur in combination, we should separate the concerns into Hardware, Usability, or Software.
  • 28.
    Separation of the Concern for the risk assessment
    • 1st Concern: SOFTWARE. Probability is NA → the risk level (the risk level before the risk control measures and the risk level after the risk control measures).
    • 2nd Concern: USABILITY. Likelihood.
    • 3rd Concern: HARDWARE. Probability (statistically).
  • 29.
    IEC 80001-2-1 Table D.3
    • Usability <-> ○ Likelihood
    • Software <-> × Likelihood
    If the hazardous situation occurs in the software, we can estimate the risk level only from the severity of the harm after the risk control measures.
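One way to picture this separation of concerns is a worksheet row whose occurrence column depends on the type of concern. The Python sketch below is only an illustration: the `Concern` and `HazardRow` names, the field layout, and the sample values are hypothetical and are not taken from IEC 80001-2-1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Concern(Enum):
    SOFTWARE = "software"
    USABILITY = "usability"
    HARDWARE = "hardware"


@dataclass
class HazardRow:
    hazard: str
    concern: Concern
    level_before: str                 # risk level before the risk control measures
    level_after: str                  # risk level after the risk control measures
    occurrence: Optional[str] = None  # probability (hardware) or likelihood (usability)

    def occurrence_column(self) -> str:
        # Software: no probability or likelihood is estimated, so the row is
        # judged on the risk levels alone.
        if self.concern is Concern.SOFTWARE:
            return "NA"
        if self.occurrence is None:
            raise ValueError("hardware and usability rows need an occurrence estimate")
        label = "probability" if self.concern is Concern.HARDWARE else "likelihood"
        return f"{label}: {self.occurrence}"


# Hypothetical sample rows, for illustration only.
sw_row = HazardRow("Unintended output", Concern.SOFTWARE, "C", "A")
hw_row = HazardRow("Timer failure", Concern.HARDWARE, "B", "B", occurrence="remote")
print(sw_row.occurrence_column())
print(hw_row.occurrence_column())
```

Keeping the concern explicit in each row makes it visible which entries rely on a statistical estimate and which rely on severity alone.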
  • 30.
    Change the method of the risk assessment!
    [Figure: Sequence of Events: Hazard → Exposure (P1) → Hazardous Situation → P2 → Harm. Risk is determined by the Severity of the Harm and the Probability of Occurrence of Harm (P1 × P2).]
    [Figure: development flow. Labels: Medical Device System; Requirements Analysis (User Needs, Intended Use); Risk Assessment (Hazard, Hazardous Situation & Harm); Risk Reduction (Risk Control Measure); Software Architecture (Hardware & Software); Residual Risk.]
    The important aspects: We should focus on the architecture of the software system and the structure of the risk control measures.
  • 31.
    IEC 62304:2006 Amd1 CD, 4.3 Software safety classification
    This chart and our study use the same classification method.
  • 32.
    The Types of Safety Design
    [Figure: contrasting methods. Specific Optimization: Fault Avoidance. Total Optimization: Architecture (Fail Safe, Fault Tolerance) and USER / Usability (Error Proof (Fool Proof)).]
    • Specific optimization as a Fault Avoidance approach is not realistic for large-scale and complicated software systems.
    • The total optimization approach is reasonable for today’s medical device software.
  • 33.
    Safety Design Method and Realization Technique
    • Fault Avoidance: High Coverage Testing, Formal Method
    • Fail Safe: Interlock, Lockout, Safeguard
    • Fault Tolerance: Space Tolerance (Main, Sub), Time Tolerance (1st, 2nd), Information Tolerance (Main Information, Error Correction)
    • Error Proof / Fool Proof: Easy Operation, Home button, Safety Label
  • 34.
    ISO 26262-9 Figure 2: ASIL decomposition schemes
    • If a basic event does not inhibit the other basic event, the AND function adopts the highest risk class. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
    • An AND function without a risk control element acting as an inhibit should select the maximum level of the failures, because the notation focuses on the risk class before and after the risk control measures.