Dynamic vs. Traditional Probabilistic Risk Assessment Methodologies - by Huairui Gup
ASQ Reliability Division
Chinese Webinar Series
• Determine potential undesirable consequences
associated with use of systems and processes.
• Identify ways that such consequences could occur.
• Estimate the likelihood (e.g., probability) of such events.
• Provide input to decision makers on optimal
strategies to reduce the levels of risk.
Introduction to Risk Analysis
• Risk is usually associated with the uncertainty and
undesirability of a potential situation or event.
• In order to have a risk situation, both elements must be present.
Risk = Uncertainty and Undesirability
Risk = Likelihood and Severity
Definition of Risk
Key metrics of risk are embedded in its definition. Risk
can be measured in terms of
– the frequency or likelihood of occurrence of events,
– the degree or magnitude of their direct and indirect consequences.
Levels of risk need to be measured and compared with an
acceptance or tolerance criterion.
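A minimal sketch of comparing risk levels with a tolerance criterion. It assumes risk is quantified as frequency multiplied by consequence magnitude, a common convention that the slides do not state explicitly. The scenario names, the numbers, and the `TOLERANCE` value are all hypothetical.

```python
# Hypothetical scenarios: (name, frequency per year, consequence in cost units).
SCENARIOS = [
    ("minor leak", 1e-2, 1e4),
    ("major release", 1e-5, 1e8),
    ("loss of mission", 1e-4, 1e6),
]

# Assumed tolerance criterion in expected cost units per year (illustrative).
TOLERANCE = 500.0


def risk(frequency, consequence):
    """Assumed risk measure: expected consequence per year."""
    return frequency * consequence


def exceeds_tolerance(scenarios, tolerance):
    """Return the names of scenarios whose risk is above the tolerance."""
    return [name for name, f, c in scenarios if risk(f, c) > tolerance]


flagged = exceeds_tolerance(SCENARIOS, TOLERANCE)
print(flagged)
```

A rare but severe scenario can exceed the criterion while frequent, low-consequence ones do not, which is why both likelihood and severity have to be measured.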
• Risk assessment is the process of providing answers to
four basic questions:
1. What can go wrong?
2. What are the consequences?
3. How frequently might they happen?
4. How confident are we about our answers to the questions above?
• Answering these questions could be simple or require a
significant amount of analysis and modeling.
Managing risk requires answers to the following questions:
1. What can be done:
- to prevent/avoid risk?
- to mitigate risk?
- to detect/notify of risk?
2. How much will it cost?
3. How efficient is it?
[Figure: scenario end states: success of mission, loss of mission, abort, degraded mission]
A path from the initiating event to an end state is called a scenario.
Anatomy of a Risk Scenario
Risk in Qualitative Measures
• Traditional Methodologies
– Fault Tree
– Event Sequence Diagram
• Dynamic Methodologies
– Monte Carlo Simulation
Risk Assessment Methodologies
• Traditional methodologies are a set of methods for identifying and
assessing the probability of situations leading to an undesired state of a system.
• Traditional methodologies require the analyst to assess possible system failure scenarios.
• The quality of a PRA using traditional methodologies is analyst dependent.
• Inductive Method: Induction involves reasoning from individual
cases to a general conclusion.
– Event Sequence Diagram
– Reliability Block Diagram
• Deductive Method: Deduction constitutes reasoning from the
general to the specific. In a deductive system analysis, it is
postulated that the system itself has failed in a certain way, and an
attempt is made to find out what modes of system or subsystem
(component) behavior contribute to this failure.
– Fault Tree
• The protection system is designed to operate in the following manner. If a
runaway reaction takes place, the pressure and temperature sensors will
detect the increase in pressure and temperature above a threshold setting.
Providing sensors for both temperature and pressure builds
redundancy into the shutdown system design, since only one of
these sensors needs to indicate that the threshold is exceeded in order to send a signal
to the alarm unit and valve controller. The function of the valve controller is
to signal both electrical valves to close. Both input streams must be
shut down to ensure the runaway reaction is halted. The alarm unit
indicates to the operator that a runaway reaction is taking place. If either of
the two electrical valves fails, the operator may shut valves MV1 and
MV2 manually. Both electrical valves are powered from the grid.
• If the input stream valves do not close, one of two possible hazardous
events will occur. If the pressure relief valve NRV opens successfully, the
runaway reaction will be halted with a minor release of toxic chemicals. If
the pressure relief valve NRV is stuck closed, the reactor vessel will
rupture with a major release of toxic chemicals.
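The protection logic described above can be sketched as a small model that enumerates every combination of component states and sums the probability of each end state. This is only an illustration: all failure probabilities are hypothetical, the component labels other than MV1, MV2, and NRV are invented here, and it is assumed that MV1 sits on the same stream as the first electrical valve, MV2 on the second, and that the operator acts only if the alarm works.

```python
from itertools import product

# Hypothetical per-demand failure probabilities; not taken from the slides.
FAILURE_PROB = {
    "PS": 1e-2,     # pressure sensor
    "TS": 1e-2,     # temperature sensor
    "ALARM": 5e-3,  # alarm unit
    "VC": 1e-3,     # valve controller
    "GRID": 1e-4,   # grid power to the electrical valves
    "EV1": 2e-2,    # electrical valve, stream 1 (assumed label)
    "EV2": 2e-2,    # electrical valve, stream 2 (assumed label)
    "OP": 1e-1,     # operator fails to respond to the alarm
    "MV1": 1e-2,    # manual valve, assumed to be on stream 1
    "MV2": 1e-2,    # manual valve, assumed to be on stream 2
    "NRV": 1e-2,    # pressure relief valve stuck closed
}


def end_state(failed):
    """Map a dict of component -> failed? to one of three end states."""
    signal = not (failed["PS"] and failed["TS"])  # one sensor is enough
    auto = signal and not failed["VC"] and not failed["GRID"]
    manual = signal and not failed["ALARM"] and not failed["OP"]
    stream1_shut = (auto and not failed["EV1"]) or (manual and not failed["MV1"])
    stream2_shut = (auto and not failed["EV2"]) or (manual and not failed["MV2"])
    if stream1_shut and stream2_shut:
        return "halted"
    return "major release" if failed["NRV"] else "minor release"


def end_state_probabilities(failure_prob):
    """Sum the probability of every component state combination per end state."""
    names = list(failure_prob)
    totals = {"halted": 0.0, "minor release": 0.0, "major release": 0.0}
    for combo in product([False, True], repeat=len(names)):
        failed = dict(zip(names, combo))
        p = 1.0
        for name in names:
            p *= failure_prob[name] if failed[name] else 1.0 - failure_prob[name]
        totals[end_state(failed)] += p
    return totals


totals = end_state_probabilities(FAILURE_PROB)
for state, p in totals.items():
    print(f"{state}: {p:.3e}")
```

Exhaustive enumeration is exact and handles shared components (grid, controller, operator) without an independence approximation, but it only scales to a handful of basic events; larger models rely on cut sets or simulation.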
• Identify the objective
• Define the Initiator/Top Event.
• Define the scope.
• Define the resolution.
• Define ground rules.
• Construct the Model.
• Evaluate the Model.
• Interpret and present the results.
– Build Model
• Common Cause Failure
– Quantify Basic Events
• Hardware Failures
• Software/Human Failures
• Accident Probability
• Cut Set / Importance Measure
– Demand Based Models: Events which occur
at the specific time (absolute mission time or
time relative to the occurrence of a previous
event) that an item is called upon (demanded)
– Time Distributed Models: Events which occur
over an interval of time, for which the
probability of failure over the length of the
interval is expressed as a point estimate and
an uncertainty distribution
• Models specify a distribution over probability of
occurrence of an event
• Distribution consists of a parametric distribution
model, e.g., lognormal, Beta
• Point estimate values are approximated using
parametric distributions (e.g., uniform) with small
Demand Based Models
[Figure: distribution over event probability, axis from 0 to 1]
Point Estimation / Demand Based Models
• Models specify a distribution over time-to-failure
– Example: failure rate for Exponential model
• In addition, the models specify a time interval
• Distributions consist of a parametric distribution
model, e.g., lognormal
Time Based Model
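A minimal sketch of the time based model with an exponential time-to-failure and a lognormal uncertainty distribution on the failure rate. The failure rate, interval, and error factor are hypothetical, and the error factor is assumed to be the ratio of the 95th percentile to the median.

```python
import math
import random


def time_failure_probability(failure_rate, interval):
    """Exponential model: probability of failure within the time interval."""
    return 1.0 - math.exp(-failure_rate * interval)


def sample_lognormal_rate(median, error_factor, rng):
    """Draw a failure rate from a lognormal uncertainty distribution.

    Assumes error_factor = 95th percentile / median.
    """
    mu = math.log(median)
    sigma = math.log(error_factor) / 1.645
    return rng.lognormvariate(mu, sigma)


rng = random.Random(0)

# Point estimate using the median failure rate (hypothetical values).
point_estimate = time_failure_probability(1e-3, 100.0)

# Propagate the uncertainty in the failure rate through the model.
samples = [
    time_failure_probability(sample_lognormal_rate(1e-3, 3.0, rng), 100.0)
    for _ in range(10_000)
]
mean_probability = sum(samples) / len(samples)
print(point_estimate, mean_probability)
```

Because the lognormal distribution is skewed toward high rates, the mean failure probability comes out above the value computed from the median rate.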
Human / Software Failures
[Figure: risk metrics (likelihood and severity, hazard ranking) built from failure models of System 1 (1A, 1B), System 2, System 3, Systems A and B, and human action]
• The risk associated with a system is computed as the
sum of many different combinations of events that
would bring the system into an undesirable state.
• Component failures leading to top events and risk
scenarios can be thought of as contributors to the
overall risk of the system.
• The following questions are examples:
• Which components or risk scenarios contribute
most to the overall system risk?
• To changes in the reliability of which components is
the total risk most sensitive?
• A risk scenario is defined as a combination of
events anticipated to bring the system into an undesirable state
• Scenarios can be described in different forms
• Paths through an Event Tree
• Event sequences in an Event Sequence Diagram
• Scenarios can be ranked for significance by
sorting them according to their probabilities
Results - Risk Scenario
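A short sketch of ranking scenarios by probability. Each scenario is treated as a path through an event tree, so its probability is taken as the initiating event frequency multiplied by the conditional probabilities along the path. The scenario names and all numbers are hypothetical.

```python
import math

# Hypothetical scenarios: (initiating event frequency, branch probabilities).
SCENARIOS = {
    "S1: valves close": (1e-2, [0.98]),
    "S2: valves fail, relief valve opens": (1e-2, [0.02, 0.99]),
    "S3: valves fail, relief valve stuck": (1e-2, [0.02, 0.01]),
}


def scenario_probability(initiator_frequency, branch_probabilities):
    """Product of the initiator frequency and the branch probabilities."""
    return initiator_frequency * math.prod(branch_probabilities)


def rank_scenarios(scenarios):
    """Return (name, probability) pairs sorted from most to least likely."""
    scored = [
        (name, scenario_probability(freq, branches))
        for name, (freq, branches) in scenarios.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_scenarios(SCENARIOS)
for name, p in ranking:
    print(f"{p:.2e}  {name}")
```

Sorting by probability alone ignores severity; a ranking by risk would also weight each scenario by the consequence of its end state.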
• Cut-set: a set of events whose joint occurrence causes
system failure
• A cut-set is minimal if after removal of any event from
the set, the set is no longer a cut-set
– All events are required
Results - Cut Set
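The minimality test can be sketched as a subset check: a cut set is dropped if another cut set is a proper subset of it. The example also applies the rare event approximation (sum of cut set probabilities, assuming independent basic events), which is a standard shortcut rather than something stated in the slides. The event names and probabilities are hypothetical.

```python
def minimal_cut_sets(cut_sets):
    """Keep only cut sets that contain no other cut set as a proper subset."""
    unique = {frozenset(cs) for cs in cut_sets}
    return [cs for cs in unique if not any(other < cs for other in unique)]


def rare_event_probability(cut_sets, event_prob):
    """Rare event approximation: sum over cut sets of the product of
    basic event probabilities, assuming independent events."""
    total = 0.0
    for cs in cut_sets:
        p = 1.0
        for event in cs:
            p *= event_prob[event]
        total += p
    return total


# Hypothetical cut sets and basic event probabilities.
CUT_SETS = [{"A"}, {"A", "B"}, {"B", "C"}, {"B", "C", "D"}]
EVENT_PROB = {"A": 1e-3, "B": 1e-2, "C": 1e-2, "D": 1e-1}

mcs = minimal_cut_sets(CUT_SETS)
top = rare_event_probability(mcs, EVENT_PROB)
print(sorted(sorted(cs) for cs in mcs), top)
```

Here {A, B} and {B, C, D} are discarded because {A} and {B, C} already cause the failure on their own.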
• Ranking scenarios provides limited insight regarding the contribution
of individual components
• Many occurrences in low probability scenarios may be as significant
as few occurrences in high probability scenarios.
• Risk importance measures provide perspective on dominant
contributions by individual components.
• Quantitative measures indicating contribution to risk or sensitivity
• Function of component’s reliability and its role in the system
• Common importance measures:
– Risk Reduction Worth
– Risk Achievement Worth
Results – Importance Measure
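A minimal sketch of the two importance measures using their usual definitions: Risk Achievement Worth is the risk with the component assumed failed divided by the baseline risk, and Risk Reduction Worth is the baseline risk divided by the risk with the component assumed perfect. The three-component system and its probabilities are hypothetical.

```python
def system_risk(p):
    """Hypothetical system: top event = A OR (B AND C), independent events."""
    return 1.0 - (1.0 - p["A"]) * (1.0 - p["B"] * p["C"])


def risk_achievement_worth(risk_fn, p, component):
    """Risk with the component always failed, relative to baseline risk."""
    return risk_fn({**p, component: 1.0}) / risk_fn(p)


def risk_reduction_worth(risk_fn, p, component):
    """Baseline risk relative to risk with the component never failing.

    Raises ZeroDivisionError if the risk drops to zero without the component.
    """
    return risk_fn(p) / risk_fn({**p, component: 0.0})


BASE = {"A": 0.01, "B": 0.1, "C": 0.1}
for name in BASE:
    raw = risk_achievement_worth(system_risk, BASE, name)
    rrw = risk_reduction_worth(system_risk, BASE, name)
    print(f"{name}: RAW={raw:.2f} RRW={rrw:.2f}")
```

Component A is a single point of failure, so its Risk Achievement Worth is much larger than that of B or C, even though its failure probability is the lowest of the three.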
• Dynamic methodology is a set of methods and techniques in which
executable models that represent the behavior of the elements of a system
are exercised in order to identify risks and vulnerabilities of the system
• The essence of this approach is the probabilistic simulation of the dynamic
behavior of the system using the models of the system elements and rules
of their internal and external interactions
– A formal representation of the system behavior needs to be constructed
for the hardware, software, and human components
– A set of rules needs to be prescribed to systematically decompose the
– The executable model is used to simulate the behavior of the system
and the physical processes taking place in the system, as a function of time
– The event sequences are generated automatically by controlling the
stochastic events in the model
• Dynamic Probabilistic Risk Assessment
– Discrete Dynamic Event Tree
• Systematically explore all scenarios
– Continuous Event Tree Simulation
• Randomly select system states and the timing of events
Continuous Event Tree Simulation
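A minimal sketch of simulation with randomly sampled event timing. The system is a hypothetical one: a primary unit with a standby that is started when the primary fails. Failure times are exponential, and the standby may fail to start on demand. The rate, start failure probability, and mission time are illustrative, and an analytic result is included only as a cross-check for this simple case.

```python
import math
import random


def simulate_mission(rate, start_failure_prob, mission_time, rng):
    """Return True if the system fails before the mission ends."""
    t_primary = rng.expovariate(rate)      # random failure time of the primary
    if t_primary >= mission_time:
        return False                       # primary survives the mission
    if rng.random() < start_failure_prob:
        return True                        # standby fails to start on demand
    t_standby = rng.expovariate(rate)      # standby runs from t_primary onward
    return t_primary + t_standby < mission_time


def estimate_failure_probability(n_runs, rate, start_failure_prob,
                                 mission_time, seed=0):
    """Fraction of simulated missions that end in system failure."""
    rng = random.Random(seed)
    failures = sum(
        simulate_mission(rate, start_failure_prob, mission_time, rng)
        for _ in range(n_runs)
    )
    return failures / n_runs


def analytic_failure_probability(rate, start_failure_prob, mission_time):
    """Closed form for this simple case (both units share the same rate)."""
    x = rate * mission_time
    p_primary = 1.0 - math.exp(-x)
    p_both = 1.0 - math.exp(-x) * (1.0 + x)   # sum of two exponential times
    return start_failure_prob * p_primary + (1.0 - start_failure_prob) * p_both


estimate = estimate_failure_probability(50_000, 1e-3, 0.05, 1000.0)
exact = analytic_failure_probability(1e-3, 0.05, 1000.0)
print(estimate, exact)
```

The time at which the standby is demanded determines how long it has to survive, which is the kind of timing dependence a static fault tree does not represent directly.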
• Approach to Solve State Explosion Issue
– Reduce the number of risk scenarios
• Combine system and operator states that lead to
similar end states
– Bias the system and operator states toward
interesting or risk significant events and end states
• Reduces the computational effort expended on
less important scenarios
• Provides results for desired event sequences using
less simulation effort
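One way to bias a simulation toward rare, risk significant events without distorting the result is importance sampling: failures are sampled with an inflated probability and each run is reweighted by the ratio of true to biased likelihood. This is a generic technique shown for illustration, not necessarily the biasing scheme used in the tools discussed here. The probabilities and run count are hypothetical.

```python
import random


def biased_estimate(p_fail, q_biased, n_components, n_runs, seed=0):
    """Estimate P(all components fail on demand).

    Failures are sampled with probability q_biased instead of p_fail, and
    each run is weighted by the likelihood ratio to keep the estimate unbiased.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        weight = 1.0
        all_failed = True
        for _ in range(n_components):
            if rng.random() < q_biased:
                weight *= p_fail / q_biased                  # sampled a failure
            else:
                weight *= (1.0 - p_fail) / (1.0 - q_biased)  # sampled a success
                all_failed = False
        if all_failed:
            total += weight
    return total / n_runs


# True value is p_fail ** 2 = 1e-6; unbiased sampling with 20,000 runs
# would almost never observe the event at all.
estimate = biased_estimate(p_fail=1e-3, q_biased=0.5, n_components=2, n_runs=20_000)
print(estimate)
```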
• The scheduler that manages the exploration process
– Save the system states and restart the simulation
• Guide the simulation toward the plan generated by
– Maintain sufficient coverage of important scenarios
– Guide simulation toward areas where it is expected to
gain more insight of the system vulnerabilities
– Continuously adjust priorities based on simulated results
– Simulation should be able to cover all the event sequences
• Scheduling rules constitute a dynamic adjustment of
event biasing factors with the objective to favor
simulation of high importance scenarios
– Learning value changes when a scenario is simulated
– No absolute control over how often a scenario is simulated
• Frequency at which a particular scenario is simulated
depends among other factors on:
– Total number of planned scenarios
– Complexity of the scenario
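A toy sketch of a scheduling rule of this kind. It assumes each planned scenario has an importance value and a learning value of 1 / (1 + times simulated), and that the next scenario is drawn at random with weight importance × learning value. Both the rule and the numbers are assumptions made for illustration; the slides do not give the actual formula.

```python
import random


def run_scheduler(importance, n_rounds, seed=0):
    """Pick scenarios at random with weight = importance * learning value.

    Assumed rule: learning value = 1 / (1 + times simulated), so it drops
    each time the scenario is simulated.
    """
    rng = random.Random(seed)
    names = list(importance)
    counts = {name: 0 for name in names}
    for _ in range(n_rounds):
        weights = [importance[n] / (1 + counts[n]) for n in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        counts[chosen] += 1  # simulating a scenario lowers its learning value
    return counts


# Hypothetical planned scenarios and their importance values.
IMPORTANCE = {"high": 0.6, "medium": 0.3, "low": 0.1}
counts = run_scheduler(IMPORTANCE, 1000)
print(counts)
```

High importance scenarios are simulated more often, but the decaying learning value keeps low importance scenarios from being starved, and the random draw means the scheduler has no absolute control over the exact counts.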
[Figure: example model with Pump Control Software and Life Support System; variables: Temperature, Pressure, Time]