Viewpoint on ISA TR84.0.02 - simplified methods and fault tree analysis
ISA Transactions 39 (2000) 125–131
www.elsevier.com/locate/isatrans

Editorial viewpoint

Angela E. Summers*
SIS-TECH Solutions, LLC, PMB-295, 2323 Clear Lake City Blvd, Houston, TX 77062-8032, USA

Abstract

ANSI/ISA-S84.01-1996 and IEC 61508 require the establishment of a safety integrity level for any safety instrumented system or safety related system used to mitigate risk. Each stage of design, operation, maintenance, and testing is judged against this safety integrity level. Quantitative techniques can be used to verify whether the safety integrity level is met. ISA-dTR84.0.02 is a technical report under development by ISA, which discusses how to apply quantitative analysis techniques to safety instrumented systems. This paper discusses two of those techniques: (1) Simplified equations and (2) Fault tree analysis. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Safety integrity level (SIL); Safety instrumented system (SIS); ANSI/ISA-S84.01-1996; IEC 61508; ISA-dTR84.0.02

1. Introduction

In 1996, ISA, the international society for measurement and control, voted unanimously for the approval of ISA-S84.01. In 1997, the standard was accepted by the American National Standards Institute (ANSI) and is now known as ANSI/ISA-S84.01-1996 [1]. This standard is considered by the US Environmental Protection Agency (EPA) and Occupational Safety and Health Administration (OSHA) as a generally accepted good industry practice [2,3]. Any US based instrumented systems specified after March 1997 should be designed in compliance with this standard.

Internationally, IEC 61508, "Functional safety of electrical/electronic/programmable electronic (E/E/PES) safety-related systems" [4,5], is getting very close to being released as a final standard. The standard consists of seven parts, four of which have already been issued as final and three are waiting for final vote on the final draft international standard. The intent is to release the entire standard as final in early 2000. Instrumented systems designed in the next millennium must comply with this standard with the exception of US installations that must follow ANSI/ISA-S84.01-1996.

Both standards are performance-based and contain very few prescriptive requirements. The "performance" of the safety instrumented system (SIS) is based on a target safety integrity level (SIL) that is defined during the safety requirements specification development [6]. According to the standards, the ability of the SIS to achieve a specific SIL must be validated at each stage of design and prior to any change made to the design after commissioning. The entire operation, testing, and maintenance procedures and practices are also judged for agreement with the target SIL. Thus, the successful implementation of a validation process for SIL is very important for compliance with either standard.

* Tel.: +1-713-320-4777; fax: +1-281-461-8109. E-mail address: email@example.com

0019-0578/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0019-0578(00)00018-5
The SP84 committee is working to complete a technical report, ISA-dTR84.0.02, which will discuss three techniques for the quantification of SIL. These methods are simplified equations [8], Fault tree analysis [9], and Markov modeling [10,11]. The technical report introductory material states that the purpose of dTR84.0.02 is to provide supplemental information that would assist the User in evaluating the capability of any given SIS design to achieve its required SIL and to reinforce the concept of the performance based evaluation of SIS. The technical report further states that the quantification of the SIL is performed to ensure that the SIS meets the SIL required for each safety function, to understand the interactions of all the safety functions, and to understand the impact of failure of each component in the SIS. Therefore, the technical report emphasizes the importance of evaluating the SIS design [7].

The technical report also acknowledges the importance of spurious trip rate to the operation of the facility. Spurious trips are often not without incident. There is a process disruption; alarms sound; and PRVs lift causing flares many meters high. Consequently, the technical report presents the mathematics involved in determining the spurious trip rate. When viewing the calculations presented and interpreting the results, it is important to understand that the spurious trip rate is a frequency with the units of failures per unit of time and the SIL is a probability, i.e. a dimensionless number.

ISA-dTR84.0.02 presents three quantitative methods: (1) Simplified equations, (2) Fault tree analysis, and (3) Markov modeling. The technical report is not a comprehensive textbook or treatise on any of the methods. All of the parts assume that the User of the technical report has a basic understanding of probabilistic theory and the method being presented. It also assumes that the User knows how to obtain and evaluate the appropriateness of the data for a specific application. The intent of the technical report is to provide guidance on how to apply this knowledge to safety instrumented systems.

Many Users will choose to use Simplified equations for an initial estimation of the Average Probability to Fail on Demand (PFDavg) for various design options. It may also be used to evaluate SIL 1 and SIL 2 systems where the architecture is sufficiently simple for the hand calculations. For SIL 3 systems, the complexity of the design often makes the Simplified equations not so simple to use. Therefore, the technical report recommends the use of Simplified equations for "simple" SISs. For more complex SISs, Fault tree analysis or Markov modeling is recommended.

Fault tree analysis is widely used by the general risk assessment industry for defining the frequency or probability of particular incident scenarios. The calculations can be done by hand, but since computer software models are readily available, most Fault tree analysis is performed using a computer program.

Many risk analysts are not familiar with Markov modeling and the fundamental math behind the method will be a rude awakening to those Users who have forgotten how to do matrix math or how to solve Laplace transforms. However, Markov modeling should be used for the evaluation of any programmable logic solver [11], since Markov modeling can take into account time dependent failures and variable repair rates found in most TÜV Class 5 and 6 certified logic solvers. It is best to leave the Markov modeling to the Vendor and ask the Vendor for the PFDavg at the anticipated logic solver testing frequency. Users should focus instead on learning how to apply Simplified equations and Fault tree analysis to evaluate the field design, including the input and output devices and support systems.

2. Determining SIL of a SIS via simplified equations

The Simplified equation technique involves determining the PFDavg for the field sensors (FS), logic solver (LS), final elements (FE), and support systems (SS). The field sensors are the inputs required to detect the hazardous condition. The logic solver accepts these inputs and generates correct outputs that change the state of the final elements in order to mitigate the hazardous condition. The support systems are those systems that are required for successful functioning of the SIS. If the valves are air-to-move, the instrument air supply must be analyzed. If the SIS is energize-to-trip, the power supply must be considered as part
of the SIS. Once the individual PFDs for each input, logic solver, output and support system are known, these PFDs are summed for the PFDSIS.

\[ \mathrm{PFD}_{SIS} = \sum \mathrm{PFD}_{FS} + \sum \mathrm{PFD}_{LS} + \sum \mathrm{PFD}_{FE} + \sum \mathrm{PFD}_{SS} \]

The Simplified equations used for calculating the PFDavg were initially derived from Markov models; however, the simplification of the models resulted in some limitations. Unlike Markov models, this method does not handle time dependent failures or sequence dependent failures. Due to these limitations, this method should not be used to analyze programmable logic solvers.

Part 2 includes equations for 1oo1, 1oo2, 1oo3, 2oo2, 2oo3, and 2oo4 architectures. These equations have been derived from Markov models, assuming the rare event approximation. The rare event approximation can only be used when the failure rate ($\lambda$) multiplied by the testing interval (TI) is much smaller than 0.1. This can be stated mathematically as $\lambda \, TI \ll 0.1$. Simplified equations result in the calculation of the PFDavg for each voting configuration. The extended equations do include some variables for which published data is not available. These variables must be estimated from experience. Consequently, an experienced risk analyst and/or engineer is required for correct estimation of these variables. For instance, the equation for 1oo2 architecture is as follows:

\[ \mathrm{PFD}_{avg} = \left[ \left( \lambda^{DU} \right)^{2} \times \frac{TI^{2}}{3} \right] + \left[ \lambda^{DU} \times \lambda^{DD} \times MTTR \times TI \right] + \left[ \beta \times \lambda^{DU} \times \frac{TI}{2} \right] + \left[ \lambda^{D}_{F} \times \frac{TI}{2} \right] \]

The first term is the undetected dangerous failure of the SIS. It shows the effect that the device undetected dangerous failure rate ($\lambda^{DU}$) and testing interval (TI) have on the PFDavg. This term is the most important part of this equation in determining the unavailability of the SIS. This term is actually simplified from the full Markov solution.

In explanation, the beta ($\beta$) factor method is a technique that can be used to estimate common cause failure effects on the SIS design. The $\beta$ factor is estimated as a percentage of the failure rate of one of the devices in a redundant configuration, assuming both devices have the same failure rate (note third term above). Therefore, the common cause failure rate or dependent failure rate would be $\beta \times \lambda^{DU}$ and the device failure rate or independent failure rate would be $(1 - \beta) \times \lambda^{DU}$. For the purposes of Part 2, $(1 - \beta)$ was considered to be equal to 1, yielding conservative results. For large $\beta$ factors, $(1 - \beta)$ should be considered, which would yield the following equation for a 1oo2 architecture:

\[ \mathrm{PFD}_{avg} = \left[ \left( (1 - \beta) \lambda^{DU} \right)^{2} \times \frac{TI^{2}}{3} \right] + \left[ \lambda^{DU} \times \lambda^{DD} \times MTTR \times TI \right] + \left[ \beta \times \lambda^{DU} \times \frac{TI}{2} \right] + \left[ \lambda^{D}_{F} \times \frac{TI}{2} \right] \]

The published data in OREDA [12], CCPS [13], and RAC [14] sometimes provide the undetected dangerous failure rate; however, many times, only a total dangerous failure rate is published. If only the total dangerous failures are known, the User must make an assumption concerning the percentage of the total dangerous failures that can be detected with diagnostics. If the percentage is not known, the total dangerous failures can be used to obtain a conservative estimate of the PFDavg.

The second term is the probability of having a second undetected failure ($\lambda^{DU}$) during the repair of a detected failure ($\lambda^{DD}$). The numerical value of this term is generally very small, since the repair time (MTTR) is typically less than 24 h. Consequently, this term often can be considered negligible.

The third term represents the probability of common cause failure based on the beta factor method. The beta factor must be estimated by the User, since there is almost no published data available for current technology. The technical report states that the value is somewhere between 0 and 20%. Many Users have determined that, with proper design practices [15], a beta factor in the range of 0.1 to 2% can be used. The beta factor has a profound effect on the PFDavg obtained for redundant architectures, so it must be selected carefully. For initial comparisons of architecture and testing frequency, it is best to
assume that this term is negligible. Effective design can minimize common cause failure. However, if an analysis of the design indicates that common cause failures can occur, such as shared process taps or a shared orifice plate, a beta factor should be selected and included in the final calculation.

The fourth term is the probability of systematic failure. Systematic failures are those failures that result from design and implementation errors. Systematic failures are not related to hardware failure. Examples of systematic failures are as follows:

1. SIS design errors
2. Hardware implementation errors
3. Software errors
4. Human interaction errors
5. Hardware design errors
6. Modification errors

The systematic failure rate ($\lambda^{D}_{F}$) is extremely difficult to estimate. Also, many of the listed systematic failures will affect all of the architectures equally. If software design is poor, it does not matter whether there is one, two or three transmitters. This term assumes that the systematic failures can be diagnosed through testing. Therefore, effective design, independent reviews, and thorough testing processes must be implemented to minimize the probability of systematic failures. When good engineering design practices are utilized, these failures can be considered negligible.

Based on the repair time being short and on the common cause and systematic failures being minimized through good design practices, these terms can be neglected, yielding the following equation:

\[ \mathrm{PFD}_{avg} = \frac{\left( \lambda^{DU} \right)^{2} \times TI^{2}}{3} \]

Similar reduced equations are provided for 1oo1, 1oo2, 1oo3, 2oo2, 2oo3, and 2oo4 architectures.

3. Determining spurious trip rate via simplified equations

For the spurious trip rate (STR), the full equation for 1oo2 is as follows:

\[ \mathrm{STR} = \left[ 2 \times \left( \lambda^{S} + \lambda^{DD} \right) \right] + \left[ \beta \times \left( \lambda^{S} + \lambda^{DD} \right) \right] + \lambda^{S}_{F} \]

The first term contains the failures associated with a device experiencing either a dangerous detected failure which forces the logic to the trip state or a safe failure. Due to spurious trip concerns, many Users choose to fail a detected device failure "away from the trip". This converts the logic to 1oo1 for the remaining device until repair is initiated. If this type of logic is utilized, the dangerous detected failure rate contribution to the spurious failure rate can be assumed to be zero.

The second term is the common cause term and the third term is the systematic failure rate. Effective design and good engineering techniques should minimize both of these terms. The equation can then be reduced to the following:

\[ \mathrm{STR} = 2 \lambda^{S} \]

Similar reduced equations can be derived for the other architectures.

When the STR is known for each combination of field sensors, logic solver, final element, and support systems, the overall STR is calculated by summing the individual STRs. The final answer is the frequency at which the SIS is expected to experience a spurious trip.

4. Limitations of the simplified equations methodology

The published equations in ISA-dTR84.0.02 do not allow the modeling of diverse technologies. The sensors or final elements used in each voting strategy must have the same failure rate. Consequently, this method does not allow the modeling of a switch and a transmitter or a control valve and a block valve. During the derivation for the equations in Part 2 and those shown in Part 5, it was assumed that the failure rates of voted devices were the same. It must be emphasized that this is a limitation of the equations presented in these parts. It is not a limitation of the mathematics of the methodology.

However, a significant limitation of the mathematics is the requirement that the testing frequency
be the same for all voted devices. To perform the Markov model derivation, the integration is performed over the range of time 0 to time "testing frequency" (that is, the testing interval TI). Consequently all devices in a voted set must be tested at the same interval.

The method also does not allow the modeling of any SIS device interactions or complex failure logic, such as 1oo2 temperature sensors detecting the same potential event as 2oo3 pressure sensors. The actual failure logic may be that the event will not occur unless both temperature sensors and 2oo3 pressure sensors fail. This method will only look at the sensor failures as separate issues. Consequently, this method is used to model simple SISs only. However, the math is easy and all this method requires for execution is a pad of paper and a pen (or computer).

5. Determining SIL of a SIS via fault tree analysis

Part 3 discusses the use of fault tree analysis for modeling the SIS. Fault tree symbols are used to show the failure logic of the SIS. The graphical technique of Fault tree analysis allows easy visualization of failure paths. Since the actual failure logic is modeled, diverse technologies, complex voting strategies, and interdependent relationships can be evaluated. However, Fault tree analysis is not readily adaptable to SISs that have time dependent failures. As with Simplified equations, Fault tree analysis is not recommended for modeling programmable logic solvers. The User should obtain the PFDavg for the logic solver from the Vendor at the anticipated logic solver testing frequency.

Fault tree analysis is one of the most common techniques applied for quantifying risk in the process industry. Computer programs, books, and courses are available to the User to learn how to apply Fault tree analysis. The technical report recommends the use of Fault tree analysis in SIL 2 and SIL 3 SIS applications. It does require more training and experience than the Simplified equations, but will yield more precise results.

The mathematical approach for Fault tree analysis is different from Markov model analysis. Fault tree analysis assumes that the failures of redundant devices are independent and unconditional. In Fault tree analysis, the PFDavg is calculated for each device and then Boolean algebra is used to account for the architecture and voting. Consequently, the equations used for some architectures will be different when Simplified equations are used rather than Fault tree analysis. When the equations are different, of course, the PFDavg value will differ. However, both methods provide acceptable approximations of the PFDavg for the SIS.

A Fault tree analysis begins with a graphical representation of the SIS failure. For example, in the 1oo2 voting of two identical devices, the fault tree would look as shown in Fig. 1. The failure of the SIS would only occur if both device 1 and device 2 failed. The AND gate is used to illustrate this logic.

Fig. 1. Fault tree for PFDavg for 1oo2 voting devices.

The data would be collected and used to calculate the PFDavg of each device:

\[ \mathrm{PFD}_{avg} = \lambda^{DU} \times \frac{TI}{2} \]

Boolean algebra, also known as cut-set math, is used to calculate the AND gate. This yields:

\[ \mathrm{PFD}_{avg} = \left( \lambda^{DU} \times \frac{TI}{2} \right) \times \left( \lambda^{DU} \times \frac{TI}{2} \right) = \frac{\left( \lambda^{DU} \, TI \right)^{2}}{4} \]

Since these calculations are based on the PFDavg for a single device, it is easy to examine cases where the failure rates and testing frequencies of the two devices are not the same. The PFDavg for each event is simply calculated based on its failure rate and testing frequency. These PFDavg values are combined using the cut-set math.

Any of the terms discussed in the Simplified equations overview can be included in the fault tree as events, such as systematic failure and common cause failure. The 1oo2 voting devices, including common cause, would appear as shown in Fig. 2.
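The cut-set arithmetic for the 1oo2 AND gate can be sketched in a few lines of Python. This is only an illustration under the rare event approximation, not something taken from ISA-dTR84.0.02: the function names, failure rates, and test intervals are assumptions chosen for the example. It multiplies the per-device PFDavg values, which may come from devices with different failure rates and test intervals, and compares the identical-device case with the reduced Simplified equation for 1oo2, $(\lambda^{DU})^2 TI^2 / 3$.

```python
# Illustrative sketch only: names and numbers are hypothetical, not from the technical report.

def pfd_avg_single(lambda_du, ti):
    """PFDavg of one device: lambda_DU * TI / 2 (rare event approximation)."""
    return lambda_du * ti / 2.0


def pfd_avg_1oo2_fault_tree(lambda_du_1, ti_1, lambda_du_2, ti_2):
    """AND gate (cut-set math): product of the two per-device PFDavg values.

    Each device may have its own failure rate and test interval.
    """
    return pfd_avg_single(lambda_du_1, ti_1) * pfd_avg_single(lambda_du_2, ti_2)


def pfd_avg_1oo2_simplified(lambda_du, ti):
    """Reduced Simplified equation for two identical devices: (lambda_DU)^2 * TI^2 / 3."""
    return (lambda_du ** 2) * (ti ** 2) / 3.0


# Assumed example data: failure rates per hour, test intervals in hours.
LAMBDA_DU = 2.0e-6   # hypothetical dangerous undetected failure rate
TI = 8760.0          # hypothetical one-year test interval

fault_tree = pfd_avg_1oo2_fault_tree(LAMBDA_DU, TI, LAMBDA_DU, TI)
simplified = pfd_avg_1oo2_simplified(LAMBDA_DU, TI)
print("fault tree 1oo2 PFDavg:", fault_tree)
print("simplified 1oo2 PFDavg:", simplified)
print("ratio simplified / fault tree:", simplified / fault_tree)

# Diverse devices: a second device with a different rate, tested twice as often.
diverse = pfd_avg_1oo2_fault_tree(LAMBDA_DU, TI, 5.0e-6, TI / 2.0)
print("fault tree 1oo2 PFDavg, diverse devices:", diverse)
```

For identical devices the two results differ by a constant factor of 4/3 (the ratio of 1/3 to 1/4), which is one concrete instance of the two methods giving different PFDavg values for the same architecture.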
Fig. 2. PFDavg for 1oo2 voting devices with common cause consideration.

The independent failure rate contribution would be calculated as follows:

\[ \mathrm{PFD}_{avg} = \left[ (1 - \beta) \lambda^{DU} \frac{TI}{2} \right] \times \left[ (1 - \beta) \lambda^{DU} \frac{TI}{2} \right] = \left\{ (1 - \beta) \lambda^{DU} \right\}^{2} \frac{TI^{2}}{4} \]

The common cause contribution to the PFDavg would be calculated as follows:

\[ \mathrm{PFD}_{avg} = \beta \times \lambda^{DU} \times \frac{TI}{2} \]

The common cause failure contribution can then be added to the independent failure rate contribution using cut-set math. For rare events, the PFDavg calculations would be as follows:

\[ \mathrm{PFD}_{avg} = \frac{\left[ (1 - \beta) \lambda^{DU} \right]^{2} TI^{2}}{4} + \beta \times \lambda^{DU} \times \frac{TI}{2} \]

The systematic failure contribution to the PFDavg can be added in a similar fashion.

6. Determining the spurious trip rate via fault tree analysis

For the spurious trip rate calculation, the same graphical technique is used, as well as the same cut-set mathematics. However, the equations used to describe the individual events are based on frequencies, not probabilities. For the 1oo2 voting devices, the fault tree is drawn as shown in Fig. 3.

Fig. 3. Fault tree for spurious trip for 1oo2 voting devices.

The spurious trip rate is calculated as follows:

\[ \mathrm{STR} = \lambda^{S}_{\text{device 1}} + \lambda^{S}_{\text{device 2}} \]

7. Limitations of the methodology

The derivation methodology for fault tree analysis is different from the Markov derivation methodology used in the other parts of TR84. While not truly a limitation of the methodology, the difference in the PFDavg values for some architectures has resulted in disagreement among TR84 members about the true definition of PFDavg. The difference in the overall results is seldom significant, but the reader is warned that there will be instances where Simplified equations and Fault tree analysis will not yield identical results.

There are three principal benefits associated with using Fault tree analysis for SIL verification. First, the graphical representation of the failure logic is easily understood by risk analysts, engineers, and project managers. Second, the method has been used by the process industry for risk assessment for many years, so there is already a resource base within many User companies, as well as outside consultants. Finally, the availability of software tools to facilitate the calculations improves the quality and precision of the calculation.

8. Conclusions

ISA-dTR84.0.02 is intended to provide guidance on how to calculate the SIL of a SIS. Since ISA-dTR84.0.02 is a guidance document, there are no mandatory requirements. The document was not
developed to be a comprehensive treatise on any of the methodologies, but was intended to provide assistance on how to apply the techniques to the evaluation of SISs. Each part expects the User to be familiar with the methodology and suggests that the User obtain additional information and resources beyond that contained in the technical report. The technical report was issued in draft in 1998 and should be released as final in 2000.

Simplified equations and Fault tree analysis are two excellent techniques that can be used together to cost effectively evaluate SIS designs for SIL. Initial assessment of proposed options for input and output architectures can be performed quickly at various testing frequencies using Simplified equations. When the overall SIS needs to be evaluated, Fault tree analysis is a proven technique that can model even the most complex logic relationships.

Acknowledgements

This paper was presented at Interkama, Düsseldorf, Germany, October 1999.

References

[1] Anon. Application of safety instrumented systems for the process industries (ANSI/ISA-S84.01-1996). ISA, Research Triangle Park, NC.
[2] Anon. Process safety management of highly hazardous chemicals; explosives and blasting agents (29 CFR Part 1910). OSHA: Washington, 1992.
[3] Anon. Risk management programs for chemical accidental release prevention (40 CFR Part 68). EPA: Washington, 1996.
[4] Anon. Functional Safety of Electrical/Electronic/Programmable Electronic Safety Related Systems, Parts 1, 3, 4, and 5 (IEC 61508, 65A/255/CDV). International Electrotechnical Commission, Final Standard, December 1998.
[5] Anon. Functional Safety of Electrical/Electronic/Programmable Electronic Safety Related Systems, Parts 2, 6, and 7 (IEC 61508, 65A/255/CDV). International Electrotechnical Commission, Final Draft International Standard, January 1999.
[6] A.E. Summers. Techniques for assigning a target safety integrity level. ISA Transactions 37 (1998) 95–104.
[7] Anon. Safety Instrumented Systems (SIS) - Safety Integrity Level (SIL) Evaluation Techniques, Part 1: Introduction (ISA dTR84.0.02). Draft, Version 4, March 1998.
[8] Anon. Safety Instrumented Systems (SIS) - Safety Integrity Level (SIL) Evaluation Techniques, Part 2: Determining the SIL of a SIS via Simplified Equations (ISA dTR84.0.02). Draft, Version 4, March 1998.
[9] Anon. Safety Instrumented Systems (SIS) - Safety Integrity Level (SIL) Evaluation Techniques, Part 3: Determining the SIL of a SIS via Fault Tree Analysis (ISA dTR84.0.02). Draft, Version 3, March 1998.
[10] Anon. Safety Instrumented Systems (SIS) - Safety Integrity Level (SIL) Evaluation Techniques, Part 4: Determining the SIL of a SIS via Markov Analysis (ISA dTR84.0.02). Draft, Version 4, March 1998.
[11] Anon. Safety Instrumented Systems (SIS) - Safety Integrity Level (SIL) Evaluation Techniques, Part 5: Determining the PFD of SIS Logic Solvers via Markov Analysis (ISA dTR84.0.02). Draft, Version 4, April 1998.
[12] Anon. OREDA: Offshore Reliability Data Handbook. 3rd Ed., DNV Technica (Det Norske Veritas Industri Norge), Norway, 1997.
[13] Anon. Guidelines for Process Equipment Reliability Data. Center for Chemical Process Safety of the American Institute of Chemical Engineers, New York, 1989.
[14] Non-Electronic Parts Reliability Data. Reliability Analysis Center, Rome, NY, 1995.
[15] A.E. Summers. Common cause and common sense, designing failure out of your safety instrumented systems (SIS). ISA Transactions 38 (1999) 291–299.