Jh Bickel Risk Implications Of Digital Rps Operating Experience

Presentation on Digital I&C Common Cause Failure Rates at June 2007 IAEA Symposium

  1. Risk Implications of Digital RPS Operating Experience
     For Presentation at IAEA Technical Meeting on Common-Cause Failures in Digital Instrumentation and Control Systems of Nuclear Power Plants
     June 19-21, 2007, Bethesda, Maryland, USA
     Dr. John H. Bickel, Evergreen Safety & Reliability Technologies, LLC
  2. Motivations for this work:
     - No prior risk or importance analysis of existing digital RPS failure experience exists
     - Prior NRC Research reports concluded LER data too sparse to use
       - Only found: 18 microprocessor failures, 4 software failures
       - Suggested need to consider data from aerospace, medical, transport systems
     - Lack of data implied: inability to risk-inform digital I&C applications and issues
     - My belief: Much more data actually exists on CE CPCS
     - Risks from CPCS experience should be assessed
     JHBickel - ESRT, LLC
  3. CE Digital Core Protection Calculator Basics:
     - CE High LPD, Low DNBR RPS design switched from analog Thermal Margin/Low Pressure Trip to digital Core Protection Calculators in mid-1970s
     - Used 6 specially qualified minicomputers running stored computer software and addressable constants
     - CPCS performs static/dynamic projections of local power density and DNBR based upon:
       - Ex-core neutron flux
       - Pressurizer pressure
       - Reactor Tcold, Thot
       - RCP speed
       - Control rod positions
     - CPCS generates: alarms, pre-trip, and trip safety actions
     - Original system was licensed on ANO-2 in 1978
     - Subsequently utilized at: SONGS-2/3, Waterford-3, Palo Verde-1/2/3, and Korean Standard NPPs
  4. CE Digital Core Protection Calculator Basics:
     CPCS credited for reactor trip for following events:
     - Uncontrolled Control Rod withdrawal from critical (> $10^{-4}$ power)
     - Uncontrolled Boron Dilution from critical (> $10^{-4}$ power)
     - Uncontrolled Control Rod withdrawal from power operation
     - Dropped or mis-positioned Control Rods
     - Ejected Control Rods
     - Single RCP loss of flow
     - Single RCP shaft seizure
     - 4-RCP loss of flow
     - Electrical grid under-frequency
     - Excess secondary steam flow (including turbine bypass valve malfunction)
     - Excess feedwater flow
     - Loss of feedwater heater
     - Steam line break
     - Single MSIV closure
     - Rapid increase in local power
  5. CE Digital CPCS Software Basics:
     Software Design: “One Good Version” not “N-Version”
  6. CE Digital CPCS Interchannel Communications Basics:
     - 4 CPC computers evaluate LPD and DNBR, using neutron flux, temperature, RCS flow and control rod position inputs in each quadrant
     - 2 CEA computers (CEACs) monitor all quadrants for CEA deviations within groups and generate Penalty Factors transmitted to all 4 CPCs
     - CEACs communicate to CPCs via one-way “simplex” communication links
  7. CE Reactor Protection System PRA Basics:
     - PRA Assessments of overall CE RPS have existed for some time (2001)
     - Component unavailabilities based on “time averaged” values
     - NUREG/CR-5500 Vol. 10:
       - $Q_{\text{RPS}}$ = 7.2E-6 (Digital CPCS, w/o Operator Action)
       - $Q_{\text{RPS}}$ = 1.6E-6 (Digital CPCS, w/ Operator Action)
     - Relay and breaker CCF dominates predicted $Q_{\text{RPS}}$:
       - CCF of master trip relays (K-1 through K-4)
       - CCF of reactor trip breaker is not as significant on CE design due to configuration
  8. How This Study was Carried Out:
     - Failure experience taken from on-line NRC LER data base, which currently goes back to 1984
       (NOTE: misses first 6 years of ANO-2 experience)
     - Post-1984 CPC LERs on CE plants were evaluated
     - CPCS failure experience categorized by subsystem
     - Size of operating experience pool:
       - 141 LERs (1984 - 2005)
       - ~145.5 Rx years (or: $1.27 \times 10^{6}$ Rx hr)
       - 70 actual CPC reactor trip demands
       - 26 events involving latent CCF (including: 1 latent software CCF)
     - Subsystem failure rates calculated via Bayesian estimation using Jeffreys non-informative prior
     - CCDP risk estimated via ASP approach
     - Method highlights CCDP impact of “higher” than average unavailability
  9. How Component Population Was Estimated:
     - Total CPCS subsystem operating time estimation was based upon above component inventory per plant
     - Total CPCS operating time (for 4/4 Channel CCF estimation) was simply total plant operating time
  10. How Subsystem Operating Time Was Estimated
      - Each of 4 CPC Computers and 2 CEAC Computers contains: 1 processor board, 1 memory board, 1 multiplexer board, 1 external Watchdog Timer
      - Each of 4 CPC Channels contains: 1 PZR pressure sensor, 3 ex-core neutron flux inputs, 4 RCP speed sensors, 2 Tcold and 2 Thot inputs
  11. Subsystem failure rates were calculated via Bayesian estimation using Jeffreys non-informative prior
      Technique allows a bounding failure rate to be estimated even when “0” failures were observed
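
The estimator named on this slide can be sketched as follows. This is a minimal illustration, assuming failures follow a Poisson process with a constant rate, in which case the Jeffreys prior gives a Gamma posterior whose mean is (failures + 0.5) / exposure. The function name `jeffreys_rate` and the 8,760 hr/yr conversion are illustrative choices, not taken from the slides.

```python
HOURS_PER_YEAR = 8760  # assumed non-leap calendar year for the Rx-yr to Rx-hr conversion


def jeffreys_rate(failures: int, exposure_hours: float) -> float:
    """Posterior-mean failure rate (per hour) under the Jeffreys prior.

    For Poisson-distributed failure counts the Jeffreys prior is proportional
    to rate**-0.5, so the posterior is Gamma(shape=failures + 0.5,
    rate=exposure_hours) and its mean is (failures + 0.5) / exposure_hours.
    """
    if failures < 0:
        raise ValueError("failures must be non-negative")
    if exposure_hours <= 0:
        raise ValueError("exposure_hours must be positive")
    return (failures + 0.5) / exposure_hours


# Operating-experience pool quoted in the study: about 145.5 reactor-years.
reactor_hours = 145.5 * HOURS_PER_YEAR

# With zero observed failures the estimate is still finite: 0.5 / T.
zero_failure_rate = jeffreys_rate(0, reactor_hours)

print(f"exposure: {reactor_hours:.3g} Rx hr")
print(f"rate with 0 observed failures: {zero_failure_rate:.2e} per hr")
```

The half-count added to the numerator is what keeps the estimate non-zero for failure modes that have not yet been observed.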
  12. Failure Rate and Unavailability Estimation Issues
      Data Needs for Risk Estimation Process:
      - Ability to estimate CCDP given specific event demands and event-conditional system unavailabilities (such as RPS)
        - Includes conditional unavailability due to specific combinations of input conditions to digital system
        - Certain software “bugs” only triggered by unusual input sets
      - Overall RPS unavailability must consider combinations of random and CCF events
      - Operating experience estimates failure rates: $\lambda$
      - Conversion to RPS unavailability uses estimate of time to detect and restore: $P = \lambda \times (\text{fault duration})$
      - In many cases for latent Digital CCFs, fault durations are many months
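
A minimal sketch of the rate-to-unavailability conversion above. The linear form is the one on the slide. The exponential form is added here only as a comparison under an assumed constant-rate model and is not part of the original study. The input values are the ones quoted for the 1995 SONGS 2-3 event elsewhere in this deck, and the function names are hypothetical.

```python
import math


def unavailability_linear(rate_per_hr: float, fault_duration_hr: float) -> float:
    """Slide formula: P = lambda x (fault duration)."""
    return rate_per_hr * fault_duration_hr


def unavailability_exponential(rate_per_hr: float, fault_duration_hr: float) -> float:
    """Assumed constant-rate comparison: P = 1 - exp(-lambda x duration)."""
    return -math.expm1(-rate_per_hr * fault_duration_hr)


rate = 2.75e-6       # per hour, value quoted for the 1995 SONGS 2-3 event
duration = 10_968    # hours the latent fault was present in that event

p_linear = unavailability_linear(rate, duration)
p_exponential = unavailability_exponential(rate, duration)

print(f"linear:      {p_linear:.4e}")
print(f"exponential: {p_exponential:.4e}")
```

Even for a fault lasting more than a year the two forms differ by only a few percent, so the linear product is a reasonable shortcut as long as rate times duration stays well below 1.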
  13. Actual Design Basis CPCS Trip Demands
  14. Estimated CPCS Single Subsystem Failure Rates
  15. CPCS Single Subsystem Failure Rates
      Also important to note: failure modes of recent regulatory concern which have not occurred in population exposure time
      - Recall that failure rates for “0” observed events can be estimated as: $\lambda \approx 0.5/T$
      Faults propagated via inter-channel communication:
      - 2 events noted involving loss of CPCS -> Plant Computer communications link that resulted in failure to perform Tech. Spec. required cross-checks: $\lambda = 2.5 / (6 \times 1.27 \times 10^{6}\ \text{hr}) = 3.3 \times 10^{-7}$/hr
      - Other events in which communication link failure occurred without operation impairment likely occurred but were not reported in LER data base
      - Events involving a failure propagating to CPC or CEAC would be in LER data base if they occurred
      - “0” events noted in which a communication link failure caused corruption to CPC or CEAC channel: $\lambda \approx 0.5 / (6 \times 1.27 \times 10^{6}\ \text{hr}) \approx 6.6 \times 10^{-8}$/hr
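
The two communication-link rates on this slide can be reproduced with a few lines. This sketch assumes that the factor of 6 in the denominator is the population of 4 CPC plus 2 CEAC computers, which is an interpretation of the slide rather than something it states. Variable names are illustrative.

```python
N_COMPUTERS = 6        # assumed: 4 CPC + 2 CEAC computers per plant
PLANT_HOURS = 1.27e6   # total reactor operating hours in the data pool

# Component exposure is the population size times the plant operating time.
exposure_hr = N_COMPUTERS * PLANT_HOURS

# Two observed link-loss events: (2 + 0.5) / T
rate_link_loss = (2 + 0.5) / exposure_hr

# Zero observed corruption events: (0 + 0.5) / T
rate_link_corruption = (0 + 0.5) / exposure_hr

print(f"link loss:       {rate_link_loss:.1e} per hr")
print(f"link corruption: {rate_link_corruption:.1e} per hr")
```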
  16. Estimated CPCS Double Event Failure Rates
  17. Estimated CPCS System CCF Failure Rates
  18. Results: CPCS System CCF Failure Rates
      [Pie chart: Breakdown of Common Mode Failures]
      Slice labels:
      - Computer Technicians insert Wrong Data Sets to all 4 CPCS Channels
      - Reactor Vendor supplies Erroneous Data Sets input to all 4 CPCS Channels
      - Reactor Vendor Supplies Software Update Containing Latent Software Error
      - Operators Fail to Confirm ASI in all four CPCS Channels when Reactor Power > 20%
      - Incorrect Acceptance Criteria Used for Excore Data Set Calibration Checks >80%
      - Inaccurate Cross Calibration of Excore Data Sets (Cross Channel, COLSS, etc.)
      - High Log Power Bypass Removal Setpoints (1E-4) Incorrect
      - Inaccurate Cross Calibration of RCS Flow Data Sets (Cross Channel, COLSS, etc.)
      - Operators Fail to Perform 12hr Auto-RESTART Surveillance on all CPCS Channels
      - Operators Fail to Perform Refueling Interval Surveillance on all CPCS Channels
      - Communication Data Link Failure to Plant Computer results in Missed Surveillances on both CEAC Channels
      - 2 of 2 CEACs Inoperable
      - 3 of 4 CPCS Neutron Flux Cross Channel Calibrations OOT
      Slice values as extracted, not matched to labels: 4%, 4%, 11%, 4%, 4%, 8%, 4%, 4%, 8%, 8%, 11%, 4%, 26%
      - The issue of latent software CCF represents only 4% of the CCF experience
      - Calibration, generation, and loading of incorrect data sets are the dominant sources of CCF
  19. Types of Observed CCF Events:
      - Inaccurate cross-calibration of all Ex-core neutron flux (7 events) or all RCS flow channels (2 events)
      - Computer technicians insert wrong addressable constant data sets into all 4 CPCS channels (3 events)
        - Swapping addressable data sets between units
      - CE supplies erroneous data sets (2 events)
      - Software update provided to plant with incorrect logic for processing of indicated failed sensors (1 event)
  20. Risk significance of this failure experience?
      - None of actual CCF events resulted in core damage (all were latent faults missing “triggering event”)
      - Need to consider CCDP implications of specific failure modes
      - Intent: apply risk screening process similar to NRC ASP program, which focuses on higher than average values of system unavailability
      - Use: ASP-type failure rate data, SPAR plant specific risk models, actual observed unavailability
        $\text{CCDP} = \sum_i \lambda_i \times P_{\text{CPCS-CCF}} \times \text{HEP}_{\text{NR}}$
        $P_{\text{CPCS-CCF}} = \lambda_{\text{CPCS-CCF}} \times (\text{duration of latent fault})$
      - First: How sensitive is CCDP to RPS Logic CCF?
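
The screening formula on this slide can be sketched as a small function. The initiator names and frequencies below are hypothetical placeholders chosen only to exercise the arithmetic; they are not values from the study. The function name `ccdp` and its parameters are likewise illustrative.

```python
from typing import Mapping


def ccdp(initiator_freq_per_yr: Mapping[str, float],
         p_cpcs_ccf: float,
         hep_nr: float) -> float:
    """Sum of initiator frequencies x P(CPCS CCF) x operator non-recovery HEP."""
    for name, value in (("p_cpcs_ccf", p_cpcs_ccf), ("hep_nr", hep_nr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability between 0 and 1")
    return sum(initiator_freq_per_yr.values()) * p_cpcs_ccf * hep_nr


# Hypothetical inputs, for illustration only.
hypothetical_initiators = {
    "dropped_control_rod": 0.2,   # per year, placeholder
    "rod_cycling_test": 0.1,      # per year, placeholder
}
example_ccdp = ccdp(hypothetical_initiators, p_cpcs_ccf=1.0e-3, hep_nr=0.01)

print(f"example CCDP: {example_ccdp:.1e}")
```

Only the initiators that would actually demand the failed function belong in the sum, which is why each event in the examples uses its own initiator frequency.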
  21. How sensitive is CCDP to RPS Logic CCF?
      RPS failure considers:
      - Mechanical CCF jamming of control rods
      - Relay/Breaker CCF failure
      - RPS Logic CCF
      - Operators fail to manually trip
      - Operators fail to trip MG sets
      Loss of Offsite Power generates reactor trip without RPS
      Sensitivity studies conducted using NRC SPAR PRA models
  22. How sensitive is CCDP to RPS Logic CCF?
      Variations in RPS-LOGIC-CCF are not risk significant until > $1 \times 10^{-3}$
  23. Some example risk assessments of actual Digital CCF events
  24. 1995 SONGS 2-3 Addressable Data Swapped
      - Rod shadowing constants (on data disks) were swapped between adjacent SONGS units for 10,968 hours.
      - Units were at different power and burnup history, so rod shadowing corrections were different.
      - Rod shadowing constants only impact power density predictions when control rods are dropped or partially inserted.
      - $P_{\text{CPCS-CCF}} = 2.75 \times 10^{-6}/\text{hr} \times 10{,}968\ \text{hr} = 3.0 \times 10^{-2}$
      - Summing over all initiating events involving dropped control rods and rod cycling tests yields:
        $\text{CCDP} < 0.488/\text{yr} \times 3.0 \times 10^{-2} \times 0.01 = 1.5 \times 10^{-4}$
      - This represents a bounding conservative estimate because better knowledge of duty cycle of rod cycling tests would likely reduce it by a factor of 10 or more.
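
The arithmetic on this slide can be checked step by step. All inputs are the slide's own numbers. The one added observation, flagged as an assumption, is that the quoted rate of 2.75E-6/hr appears consistent with a Jeffreys estimate for the 3 wrong-data-set events over 1.27E6 plant hours.

```python
REACTOR_HOURS = 1.27e6          # plant operating hours in the data pool
RATE_PER_HR = 2.75e-6           # CCF rate quoted on the slide
FAULT_DURATION_HR = 10_968      # hours the swapped constants were installed
INITIATOR_FREQ_PER_YR = 0.488   # dropped-rod and rod-cycling-test demands
HEP_NR = 0.01                   # operator non-recovery probability

# Assumption: the quoted rate comes from (3 events + 0.5) / plant hours.
jeffreys_check = (3 + 0.5) / REACTOR_HOURS

# Unavailability while the latent fault was present.
p_cpcs_ccf = RATE_PER_HR * FAULT_DURATION_HR

# Bounding risk estimate for the event.
ccdp_bound = INITIATOR_FREQ_PER_YR * p_cpcs_ccf * HEP_NR

print(f"Jeffreys check: {jeffreys_check:.3e} per hr")
print(f"P_CPCS-CCF:     {p_cpcs_ccf:.1e}")
print(f"CCDP bound:     {ccdp_bound:.1e}")
```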
  25. 1984 Erroneous Fx,y factors supplied by CE and uploaded to SONGS-2
      - Incorrect Fx,y factors generated by CE and used for CPCS LPD calculations from 2-7-84 to 3-20-84 (1,032 hrs).
      - Events such as this have occurred twice.
      - $P_{\text{CPCS-CCF}} = 1.96 \times 10^{-6}/\text{hr} \times 1{,}032\ \text{hr} = 2.0 \times 10^{-3}$
      - $\text{CCDP} = 0.488/\text{yr} \times 2.0 \times 10^{-3} \times 0.01 = 9.8 \times 10^{-6}$
  26. 2005 Software Design Error in Software Upgrade at Palo Verde 2 for 2,736 hrs.
      - Original software design: Trip CPC channel if sensor detected to be “Failed - Out of Range”
      - Software/hardware upgrade: Use inputs from two sets of instruments and multiplexers (primary and secondary)
      - Out of Range Sensor Failure: a detected failure of the primary sensor results in switchover to secondary. An Out of Range Failure on the secondary reverts to “last stored good value”
      - CCF of all sensors of one type could result in continuous use of “last good value” in all 4 CPCS channels rather than TRIP.
      - $P_{\text{CPCS-CCF}} = 8 \times P_{\text{Sensor-CCF}} \times 2.75 \times 10^{-6}/\text{hr} \times 2{,}736\ \text{hr} = 8 \times 8.4 \times 10^{-4} \times 2.75 \times 10^{-6}/\text{hr} \times 2{,}736\ \text{hr} = 5.0 \times 10^{-5}$
      - Given CCF of instruments, no credit for operators: HEP = 1.0
      - $\text{CCDP} = 0.289/\text{yr} \times 5.0 \times 10^{-5} \times 1.0 = 1.44 \times 10^{-5}$
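
The logic difference described on this slide can be illustrated with a toy model. This is a sketch of the behaviour as the slide describes it, not the actual CPCS code: every name, the range limits, and the return convention are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    TRIP = "trip"
    USE_VALUE = "use_value"


def in_range(reading: float, low: float, high: float) -> bool:
    return low <= reading <= high


def original_logic(reading: float, low: float, high: float) -> Tuple[Action, Optional[float]]:
    """Original design as described: an out-of-range sensor trips the channel."""
    if in_range(reading, low, high):
        return Action.USE_VALUE, reading
    return Action.TRIP, None


def upgraded_logic_as_described(primary: float, secondary: float, last_good: float,
                                low: float, high: float) -> Tuple[Action, Optional[float]]:
    """Upgrade as described: fall back to secondary, then to the last good value."""
    if in_range(primary, low, high):
        return Action.USE_VALUE, primary
    if in_range(secondary, low, high):
        return Action.USE_VALUE, secondary
    # Latent error: both inputs have failed, yet the channel keeps running
    # on a stale value instead of tripping.
    return Action.USE_VALUE, last_good


# Both sensors fail out of range (hypothetical 0-100 span, reading of 999).
before = original_logic(999.0, 0.0, 100.0)
after = upgraded_logic_as_described(999.0, 999.0, 50.0, 0.0, 100.0)
print(before, after)
```

If the same sensor type fails in all four channels, every channel takes the final branch, which is how a single design choice becomes a common cause failure.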
  27. $P_{\text{RPS-CCF}}$ values from single events span many decades
      - Fault duration times drive $P_{\text{RPS-CCF}}$ values
      - Latent data uploading errors are dominant unavailability contributors
      - Data uploading errors larger than relay and breaker CCF found in NUREG/CR-5500 (which used time-averaged values)
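
A quick way to see the spread is to recompute the unavailability of the three worked events from their quoted rates and fault durations. The inputs are the numbers from the three example slides. The dictionary keys and the use of a base-10 logarithm to count decades are illustrative choices.

```python
import math

# Unavailability = rate x fault duration, using the values quoted for each event.
event_unavailability = {
    "1995 SONGS 2-3 data swap": 2.75e-6 * 10_968,
    "1984 SONGS-2 Fx,y factors": 1.96e-6 * 1_032,
    # Palo Verde also needs a sensor CCF: 8 x P(sensor CCF) x rate x duration.
    "2005 Palo Verde 2 software": 8 * 8.4e-4 * 2.75e-6 * 2_736,
}

largest = max(event_unavailability.values())
smallest = min(event_unavailability.values())
decades_spanned = math.log10(largest / smallest)

for name, value in sorted(event_unavailability.items(), key=lambda kv: -kv[1]):
    print(f"{name:30s} {value:.1e}")
print(f"spread: {decades_spanned:.1f} decades")
```

These three events alone differ by nearly three orders of magnitude, with the two data uploading errors at the top and the software event at the bottom.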
  28. Event specific CCDP also dominated by data uploading errors
      Latent software CCF event is smaller due to unlikelihood of triggering condition.
  29. Observations from this “Total Picture of RPS”:
      - Designers of Digital I&C not particularly surprised by relative dominance of:
        - Calibration problems and human errors uploading wrong data sets
        - CCF due to errors by vendor in generating data sets
        - These failure modes also existed in NPPs with Analog I&C
      - CCF Unavailability and event CCDP estimates from operating experience are dominated by latent events with very long fault duration intervals
      - Software-related CCF, while important, isn’t dominant CCF source when actual operating experience is evaluated
        - Likely because: software V&V processes more rigorous than operational controls after deployment at NPP
        - Most-obvious software “bugs” generally caught by burn-in testing and qualification programs
        - Software “bugs” triggered by highly unlikely input combinations are not key sources of RPS unavailability or CCDP risk
  30. What is Concluded from all this?
      - To assess Digital I&C risk it’s necessary to view Total Picture of RPS, not just “software” or “microprocessors”:
        - Final trip relays and trip breakers - will still be there
        - Problems cross calibrating nuclear with thermal - will still be there
        - Human errors inputting set-points and coefficients - will still be there
      - When this is done, Total Picture of RPS risk emerges
      - NPPs with CPCS have been operating since 1978 in typical, controlled, nuclear operations environment, which includes:
        - Vendor generation of cycle specific constants, set-points
        - Routine hardware, software upgrades developed and installed
        - Routine operation, trouble alarms, and alarm response
        - Impact of Technical Specifications, Testing, Calibrations
      - Actual nuclear field reliability experience is better source of data than non-nuclear sources or theoretical models
      - Ability to estimate, or bound, risks of specific Digital I&C CCF failure modes thus clearly exists
