Dynamic vs. Traditional Probabilistic Risk Assessment Methodologies - by Huairui Gup
1. Dynamic vs. Traditional Probabilistic Risk Assessment Methodologies
Huairui Gup
2. ASQ Reliability Division
Chinese Webinar Series
One of the monthly webinars on topics of interest to reliability engineers.
To view upcoming or recorded webinars, visit us today at www.asqrd.org
4. 4
• Determine potential undesirable consequences
associated with use of systems and processes.
• Identify ways that such consequences could
materialize.
• Estimate the likelihood (e.g., probability) of such
events.
• Provide input to decision makers on optimal
strategies to reduce the levels of risk.
Introduction to Risk Analysis
5. 5
• Risk is usually associated with the uncertainty and
undesirability of a potential situation or event.
• In order to have a risk situation, both elements must
be present.
Risk = Uncertainty and Undesirability
Risk = Likelihood and Severity
Definition of Risk
6. 6
Key metrics of risk are embedded in its definition. Risk
can be measured in terms of
• the frequency or likelihood of occurrence of events, and
• the degree or magnitude of their direct and indirect
consequences.
Levels of risk need to be measured and compared with an
acceptance or tolerance criterion.
Risk Metrics
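A minimal sketch of how the two metrics can be combined and compared with a tolerance criterion. It assumes risk is summarized as frequency times consequence magnitude, which is one common convention and is not stated on the slide. The scenario names, the numbers, and the tolerance value are all hypothetical.

```python
# Hypothetical scenarios: (frequency per year, consequence magnitude in cost units).
scenarios = {
    "minor leak": (1e-2, 1e4),
    "major release": (1e-5, 1e8),
}

TOLERANCE = 500.0  # hypothetical acceptance criterion, cost units per year


def expected_loss(frequency, consequence):
    """Risk summarized as frequency times consequence (an assumed convention)."""
    return frequency * consequence


def exceeds_tolerance(frequency, consequence, tolerance=TOLERANCE):
    """True when the summarized risk is above the acceptance criterion."""
    return expected_loss(frequency, consequence) > tolerance


for name, (freq, cons) in scenarios.items():
    print(name, expected_loss(freq, cons), exceeds_tolerance(freq, cons))
```

A single product hides the difference between frequent small losses and rare large ones, so in practice the two metrics are often also reported separately.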
7. 7
• Risk assessment is the process of providing answers to
four basic questions:
1. What can go wrong?
2. What are the consequences?
3. How frequently might they happen?
4. How confident are we about our answers to the
above questions?
• Answering these questions could be simple or require a
significant amount of analysis and modeling.
Risk Assessment
8. 8
Managing risk requires answers to the following questions:
1. What can be done:
- to prevent/avoid risk?
- to mitigate risk?
- to detect/notify of risk?
2. How much will it cost?
3. How efficient is it?
Risk Management
9. 9
[Figure: event tree drawn along the mission time axis. A perturbation (initiating event) passes through branch points (pivotal events) to end states: success of mission, or risk scenarios such as loss of mission, abort, or degraded mission.]
A path from the initiating event to an end state is called a scenario.
Anatomy of a Risk Scenario
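The scenario definition can be sketched in a few lines: every path from the initiating event through the pivotal events is enumerated, and its frequency is the initiator frequency times the branch probabilities along the path. The pivotal event names, the probabilities, and the independence assumption are hypothetical and are not taken from the slides.

```python
from itertools import product

# Hypothetical pivotal events with failure probabilities, given the initiating event.
pivotal_failure_prob = {"detection": 0.01, "mitigation": 0.05}


def enumerate_scenarios(initiator_freq, pivotal):
    """Return {path: frequency}. A path is a tuple of 'S'/'F' outcomes
    (success/failure), one per pivotal event. Events are assumed independent."""
    names = list(pivotal)
    scenarios = {}
    for outcome in product("SF", repeat=len(names)):
        freq = initiator_freq
        for name, result in zip(names, outcome):
            p_fail = pivotal[name]
            freq *= p_fail if result == "F" else 1.0 - p_fail
        scenarios[outcome] = freq
    return scenarios


scenarios = enumerate_scenarios(initiator_freq=0.1, pivotal=pivotal_failure_prob)
# Rank scenarios from most to least frequent.
for path, freq in sorted(scenarios.items(), key=lambda kv: -kv[1]):
    print("".join(path), freq)
```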
12. 12
• Traditional Methodologies
– Fault Tree
– Event Sequence Diagram
– FMECA
– Etc
• Dynamic Methodologies
– Monte Carlo Simulation
Risk Assessment Methodologies
13. 13
• Traditional methodologies are a set of methods for identifying and
assessing the probability of situations leading to an undesired state of a
system.
• Traditional methodologies require the analyst to assess possible system
failures.
• The quality of a PRA using traditional methodologies is
analyst-dependent.
Traditional Methodologies
14. 14
• Inductive Method: Induction involves reasoning from individual
cases to a general conclusion.
– Event Sequence Diagram
– FMECA
– Reliability Block Diagram
– etc
• Deductive Method: Deduction constitutes reasoning from the
general to the specific. In a deductive system analysis, it is
postulated that the system itself has failed in a certain way, and an
attempt is made to find out what modes of system or subsystem
(component) behavior contribute to this failure.
– Fault Tree
Traditional Methodologies
16. 16
• The protection system is designed to operate in the following manner. If a
runaway reaction takes place, the pressure and temperature sensors will
detect the increase in pressure and temperature above a threshold setting.
Providing sensors for both temperature and pressure builds redundancy into
the shutdown system design, since only one of these sensors needs to
indicate that the threshold is exceeded in order to send a signal
to the alarm unit and valve controller. The function of the valve controller is
to signal both electrical valves to close. Both input streams must be
shut down to ensure the runaway reaction is halted. The alarm unit
indicates to the operator that a runaway reaction is taking place. If either of
the two electrical valves fails, the operator may shut valves MV1 and
MV2 manually. Both electrical valves are powered from the grid.
• If the input stream valves do not close, one of two possible hazardous
events will occur. If the pressure relief valve NRV opens successfully, then
the runaway reaction will be halted with a minor release of toxic chemicals. If
the pressure relief valve NRV is stuck closed, then the reactor vessel will
rupture with a major release of toxic chemicals.
Examples
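The logic of this example can be written as a small structure function and quantified by brute-force enumeration of component states. This is a sketch under one reading of the text, not the presenter's model. The names EV1 and EV2 for the electrical valves are hypothetical. It is assumed that manual closure needs detection, the alarm, and the operator, and that automatic closure needs detection, the controller, and grid power. All probabilities are hypothetical.

```python
from itertools import product

# Hypothetical per-demand failure probabilities; none of these values come from the slides.
p_fail = {
    "temp_sensor": 0.01, "pressure_sensor": 0.01, "controller": 0.001,
    "alarm": 0.005, "operator": 0.05, "grid": 0.001,
    "EV1": 0.01, "EV2": 0.01, "MV1": 0.005, "MV2": 0.005, "NRV": 0.01,
}


def end_state(ok):
    """Map component states (True = works) to an end state, given a runaway reaction."""
    detected = ok["temp_sensor"] or ok["pressure_sensor"]  # either sensor suffices
    auto_signal = detected and ok["controller"] and ok["grid"]
    manual_possible = detected and ok["alarm"] and ok["operator"]
    stream1_shut = (auto_signal and ok["EV1"]) or (manual_possible and ok["MV1"])
    stream2_shut = (auto_signal and ok["EV2"]) or (manual_possible and ok["MV2"])
    if stream1_shut and stream2_shut:  # both input streams must be shut down
        return "halted"
    return "minor release" if ok["NRV"] else "major release"


def end_state_probabilities(p_fail):
    """Exact end-state probabilities by enumerating all component state combinations."""
    names = list(p_fail)
    totals = {"halted": 0.0, "minor release": 0.0, "major release": 0.0}
    for states in product([True, False], repeat=len(names)):
        prob = 1.0
        for name, works in zip(names, states):
            prob *= (1.0 - p_fail[name]) if works else p_fail[name]
        totals[end_state(dict(zip(names, states)))] += prob
    return totals


print(end_state_probabilities(p_fail))
```

Enumeration is only practical for a handful of components (here 2^11 states). Larger models rely on cut sets or simulation instead.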
17. 17
• Identify the objective.
• Define the Initiator/Top Event.
• Define the scope.
• Define the resolution.
• Define ground rules.
• Construct the Model.
• Evaluate the Model.
• Interpret and present the results.
Procedures
22. 22
– Build Model
• Common Cause Failure
– Quantify Basic Events
• Hardware Failures
• Software/Human Failures
– Results
• Accident Probability
• Cut Set / Importance Measure
• Uncertainty
Key Elements
23. 23
– Demand Based Models: Events which occur
at the specific time (absolute mission time or
time relative to the occurrence of a previous
event) that an item is called upon (demanded)
to function.
– Time Distributed Models: Events which occur
over an interval of time, for which the
probability of failure over the length of the
interval is expressed as a point estimate and
an uncertainty distribution
Failure Types
24. 24
• Models specify a distribution over probability of
occurrence of an event
• Distribution consists of a parametric distribution
model, e.g., lognormal, Beta
• Point estimate values are approximated using
parametric distributions (e.g., uniform) with small
standard deviations
Demand Based Models
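As an illustration, a demand-based event can carry a Beta distribution over its probability of failure on demand, which is then sampled to obtain an uncertainty range. The shape parameters below are hypothetical (mean 0.01). The sketch uses only the Python standard library.

```python
import random
import statistics


def sample_demand_failure_prob(alpha, beta, n, seed=0):
    """Draw n samples of the failure-on-demand probability from Beta(alpha, beta)."""
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(n)]


# Hypothetical parameters: the mean is alpha / (alpha + beta) = 0.01.
samples = sample_demand_failure_prob(alpha=2.0, beta=198.0, n=20000)
ordered = sorted(samples)
mean = statistics.fmean(samples)
p05 = ordered[int(0.05 * len(ordered))]   # approximate 5th percentile
p95 = ordered[int(0.95 * len(ordered))]   # approximate 95th percentile
print(mean, p05, p95)
```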
26. 26
• Models specify a distribution over the parameters of a time-to-failure
distribution model
– Example: the failure rate of an exponential model
• In addition, the models specify a time interval
• Distributions consist of a parametric distribution
model, e.g., lognormal
Time Based Model
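For the exponential example, the probability of failure over an interval of length $t$ with failure rate $\lambda$ is $1 - e^{-\lambda t}$. The sketch below propagates a lognormal uncertainty on $\lambda$ to that probability. The median rate, the error factor, and the interval are hypothetical. Defining the error factor as the ratio of the 95th percentile to the median is an assumed convention.

```python
import math
import random
import statistics

Z_95 = 1.645  # standard normal 95th percentile


def failure_prob_exponential(rate, interval):
    """Probability of failure within the interval for a constant failure rate."""
    return 1.0 - math.exp(-rate * interval)


def sample_interval_failure_prob(median_rate, error_factor, interval, n, seed=0):
    """Propagate lognormal uncertainty on the rate to the interval failure probability.
    error_factor is taken as 95th percentile / median (an assumed convention)."""
    rng = random.Random(seed)
    mu = math.log(median_rate)
    sigma = math.log(error_factor) / Z_95
    return [
        failure_prob_exponential(rng.lognormvariate(mu, sigma), interval)
        for _ in range(n)
    ]


point = failure_prob_exponential(rate=1e-4, interval=100.0)
samples = sample_interval_failure_prob(1e-4, 3.0, 100.0, n=10000)
print(point, statistics.median(samples), statistics.fmean(samples))
```

Because the lognormal is right-skewed, the sample mean is expected to sit above the value computed from the median rate.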
27. 27
Human / Software Failures
[Figure: an event sequence runs from the initiating event through SYSTEM 1, a human action, and SYSTEM 2, with success (S) and failure (F) branches. Fault trees decompose SYSTEM 1 FAILURE, SYSTEM 2 FAILURE, and HUMAN ACTION into subsystems and root causes. The system sits within its organization (maintenance, operation) and its physical, socio-economic, and regulatory environments. Risk metrics: likelihood and severity matrix (levels L, M, MH, H), hazard ranking.]
29. 29
• The risk associated with a system is computed as the
sum over the many different combinations of events that
would bring the system into an undesirable state.
• Component failures leading to top events and risk
scenarios can be thought of as contributors to the
overall risk of the system.
• Typical questions include:
• Which components or risk scenarios contribute
most to the overall system risk?
• To changes in the reliability of which components is
the total risk most sensitive?
Results
30. 30
• A risk scenario is defined as a combination of
events anticipated to bring the system into an
undesirable state.
• Scenarios can be described in different forms
• Paths through an Event Tree
• Event sequences in an Event Sequence
Diagram
• Cut-sets
• Scenarios can be ranked for significance by
sorting them according to their probabilities
Results - Risk Scenario
31. 31
• Cut-set: a set of events whose occurrence causes
the system failure to occur
• A cut-set is minimal if after removal of any event from
the set, the set is no longer a cut-set
– All events are required
[Figure: fault tree with one AND gate and one OR gate over basic events A, B, and C]
Minimal Cut-Sets: {A}, {B, C}
Results - Cut Set
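The definition of a minimal cut-set can be checked mechanically. The sketch below assumes the fault tree on this slide is TOP = A OR (B AND C), which is consistent with the minimal cut-sets listed. It searches event combinations from smallest to largest and skips any candidate that already contains a smaller cut-set. The search relies on the system being coherent, meaning that adding failures never repairs it.

```python
from itertools import combinations


def top_event(failed):
    """Assumed fault tree: TOP = A OR (B AND C). `failed` is a set of failed events."""
    return "A" in failed or {"B", "C"} <= failed


def minimal_cut_sets(events, top_event):
    """Brute-force minimal cut-sets of a coherent system, smallest sets first."""
    found = []
    for size in range(1, len(events) + 1):
        for combo in combinations(events, size):
            candidate = frozenset(combo)
            if any(smaller <= candidate for smaller in found):
                continue  # contains a smaller cut-set, so it cannot be minimal
            if top_event(candidate):
                found.append(candidate)
    return found


cut_sets = minimal_cut_sets(["A", "B", "C"], top_event)
print([sorted(cs) for cs in cut_sets])
```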
32. 32
• Ranking scenarios provides limited insight regarding the contribution
of individual components
• Many occurrences in low probability scenarios may be as significant
as few occurrences in high probability scenarios.
• Risk importance measures provide perspective on dominant
contributions by individual components.
• Quantitative measures indicating contribution to risk or sensitivity
of risk
• Function of component’s reliability and its role in the system
• Common importance measures:
– Birnbaum
– Fussell-Vesely
– Risk Reduction Worth
– Risk Achievement Worth
Results – Importance Measure
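The four measures can be sketched for a small system. The code assumes a top event TOP = A OR (B AND C) with independent basic events and hypothetical probabilities. It uses commonly quoted forms of the measures. Fussell-Vesely in particular has several variants, and the one used here is the fractional risk decrease when the component is made perfect.

```python
def top_probability(p):
    """P(TOP) for the assumed tree TOP = A OR (B AND C), independent basic events."""
    return 1.0 - (1.0 - p["A"]) * (1.0 - p["B"] * p["C"])


def importance(p, event, risk=top_probability):
    """Importance measures for one basic event, using commonly quoted definitions."""
    base = risk(p)
    risk_if_failed = risk({**p, event: 1.0})   # component assumed always failed
    risk_if_perfect = risk({**p, event: 0.0})  # component assumed never to fail
    return {
        "birnbaum": risk_if_failed - risk_if_perfect,
        "fussell_vesely": (base - risk_if_perfect) / base,
        "risk_reduction_worth": (
            base / risk_if_perfect if risk_if_perfect > 0 else float("inf")
        ),
        "risk_achievement_worth": risk_if_failed / base,
    }


# Hypothetical basic event probabilities.
p = {"A": 0.001, "B": 0.05, "C": 0.02}
for event in p:
    print(event, importance(p, event))
```

With these numbers A has a far larger Birnbaum value than B or C, because A alone is a cut-set while B needs C to fail as well.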
34. 34
• Dynamic methodology is a set of methods and techniques in which
executable models that represent the behavior of the elements of a system
are exercised in order to identify risks and vulnerabilities of the system
• The essence of this approach is the probabilistic simulation of the dynamic
behavior of the system using the models of the system elements and rules
of their internal and external interactions
– A formal representation of the system behavior needs to be constructed
for the hardware, software, and human components
– A set of rules needs to be prescribed to systematically decompose the
system
– The executable model is used to simulate the behavior of the system
and the physical processes taking place in the system, as a function of
time
– The event sequences are generated automatically by controlling the
stochastic events in the model
Dynamic Methodologies
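A minimal sketch of the idea, assuming a hypothetical cooling system with a primary pump and a standby backup: an executable model is run repeatedly as a function of time, the stochastic events (failure times, failure to start) are sampled, and each run produces an event sequence automatically. All rates and probabilities are made up for illustration.

```python
import random


def simulate_mission(rng, mission_time, primary_rate, backup_rate, backup_start_fail):
    """Simulate one mission and return its event sequence as a list of (time, event)."""
    events = []
    t = rng.expovariate(primary_rate)  # time at which the primary pump fails
    if t >= mission_time:
        return events + [(mission_time, "mission success")]
    events.append((t, "primary pump fails"))
    if rng.random() < backup_start_fail:  # backup is demanded and may not start
        return events + [(t, "backup fails to start"), (t, "cooling lost")]
    events.append((t, "backup starts"))
    t += rng.expovariate(backup_rate)  # backup runs until it fails
    if t >= mission_time:
        return events + [(mission_time, "mission success")]
    return events + [(t, "backup fails while running"), (t, "cooling lost")]


def estimate_loss_probability(n, seed=0, **params):
    """Fraction of simulated missions that end with cooling lost."""
    rng = random.Random(seed)
    losses = sum(
        simulate_mission(rng, **params)[-1][1] == "cooling lost" for _ in range(n)
    )
    return losses / n


estimate = estimate_loss_probability(
    20000, mission_time=1000.0, primary_rate=1e-3,
    backup_rate=1e-3, backup_start_fail=0.1,
)
print(estimate)
```

The time the backup has to survive depends on when the primary fails, which is the kind of timing dependence that a static fault tree does not capture directly.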
35. 35
• Dynamic Probabilistic Risk Assessment
– Discrete Dynamic Event Tree
• Systematically explore all scenarios
– Continuous Event Tree Simulation
• Randomly select system states and the timing
of events
Dynamic Methodologies
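The systematic alternative can be sketched as a discrete dynamic event tree: time is cut into steps, and at each step every possible outcome of the stochastic events is branched on, so all scenarios are generated with their probabilities. The example assumes a hypothetical primary pump with a standby backup and per-step failure probabilities. None of the numbers come from the slides.

```python
def dynamic_event_tree(n_steps, p_primary, p_backup, p_start_fail):
    """Enumerate all event sequences as (history, end_state, probability) leaves."""
    leaves = []

    def expand(step, state, prob, history):
        if state == "lost" or step == n_steps:
            leaves.append((history, state, prob))
            return
        if state == "primary":
            # Primary survives this step.
            expand(step + 1, "primary", prob * (1 - p_primary), history)
            # Primary fails; branch on whether the backup starts.
            expand(step + 1, "backup", prob * p_primary * (1 - p_start_fail),
                   history + ((step, "primary fails, backup starts"),))
            expand(step + 1, "lost", prob * p_primary * p_start_fail,
                   history + ((step, "primary fails, backup fails to start"),))
        else:  # backup is running
            expand(step + 1, "backup", prob * (1 - p_backup), history)
            expand(step + 1, "lost", prob * p_backup,
                   history + ((step, "backup fails"),))

    expand(0, "primary", 1.0, ())
    return leaves


def loss_probability(leaves):
    """Total probability of the sequences that end with cooling lost."""
    return sum(prob for _, state, prob in leaves if state == "lost")


leaves = dynamic_event_tree(n_steps=3, p_primary=0.1, p_backup=0.1, p_start_fail=0.2)
print(len(leaves), loss_probability(leaves))
```

The number of leaves grows quickly with the number of steps and branching events, which is the state explosion that practical tools have to manage.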
37. 37
Continuous Event Tree Simulation
[Figure: simulated paths through the (x, r) state space over time, from (x0, r0) to (xt, rt), marked as high, medium, or low probability.]
38. 38
• Approaches to the state explosion issue
– Reduce the number of risk scenarios
• Combine system and operator states that lead to
similar end states
– Bias the system and operator states toward
interesting or risk significant events and end
states
• Reduces the computational effort expended on
less important scenarios
• Provides results for desired event sequences using
less simulation effort
State Explosion
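Biasing toward risk-significant events can be illustrated with importance sampling, named here as a stand-in technique rather than the presenter's specific algorithm. Failures are sampled more often than they really occur, and each run is weighted by the ratio of true to biased probability so that the estimate stays unbiased. The two-component system and all numbers are hypothetical.

```python
import random


def biased_estimate(p_true, p_biased, n, n_components=2, seed=0):
    """Estimate P(all components fail) while sampling failures at p_biased.
    Each run carries a weight that corrects for the biased sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _run in range(n):
        weight = 1.0
        all_failed = True
        for _component in range(n_components):
            failed = rng.random() < p_biased
            if failed:
                weight *= p_true / p_biased
            else:
                weight *= (1.0 - p_true) / (1.0 - p_biased)
            all_failed = all_failed and failed
        if all_failed:
            total += weight
    return total / n


# True answer is p_true ** 2 = 1e-6; crude sampling would rarely see it in 20000 runs.
estimate = biased_estimate(p_true=1e-3, p_biased=0.5, n=20000)
print(estimate)
```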
41. 41
• The scheduler manages the exploration process
– Saves the system states and restarts the simulation
• Guides the simulation toward the plan generated by the
planner
– Maintain sufficient coverage of important scenarios
– Guide simulation toward areas where it is expected to
gain more insight of the system vulnerabilities
– Continuously adjust priorities based on simulated
results
– Simulation should be able to cover all the event
sequence space
Scheduling
42. 42
• Scheduling rules constitute a dynamic adjustment of
event biasing factors with the objective to favor
simulation of high importance scenarios
– Learning value changes when a scenario is simulated
– No absolute control over how often a scenario is
simulated
• Frequency at which a particular scenario is simulated
depends among other factors on:
– Total number of planned scenarios
– Complexity of the scenario
Scheduling
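One possible reading of these rules, sketched with entirely hypothetical mechanics: each planned scenario has an importance and a learning value, the next scenario is drawn at random with weight importance times learning value, and the learning value drops each time the scenario is simulated. The update rule 1 / (1 + runs so far) is an assumption for illustration and is not taken from the slides.

```python
import random


def run_scheduler(importance, n_runs, seed=0):
    """Return how many times each scenario was simulated.
    Selection is random, so there is no absolute control over the counts."""
    rng = random.Random(seed)
    names = list(importance)
    counts = {name: 0 for name in names}
    for _ in range(n_runs):
        # Hypothetical learning value: shrinks as a scenario is simulated more often.
        weights = [importance[name] / (1 + counts[name]) for name in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        counts[chosen] += 1  # a real scheduler would run the simulation here
    return counts


# Hypothetical importance scores for three planned scenarios.
counts = run_scheduler({"rupture": 10.0, "abort": 1.0, "degraded": 0.1}, n_runs=500)
print(counts)
```

Because the weight of a scenario falls as it is simulated, high-importance scenarios are favored without starving the others, which keeps some coverage of the whole event sequence space.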
43. 43
[Figure: life support system example in which pump control software responds to temperature and pressure. Labels: Temperature, Pressure, Time; Low Level: Detailed Equation; High Level: Lookup Table; Software; Scheduler; Danger, Safe, Sensitive; Level Adjustment.]