Moe wynn caise13 presentation
1. PROFILING EVENT LOGS TO CONFIGURE
RISK INDICATORS FOR PROCESS DELAYS
25th International Conference on Advanced Information Systems Engineering,
Valencia, Spain
21 June 2013
Anastasiia Pika, Wil M. P. van der Aalst, Colin J. Fidge,
Arthur H. M. ter Hofstede, and Moe T. Wynn
Queensland University of Technology & Eindhoven University of Technology
Presenter: Moe T. Wynn
2. Risk-aware Business Process Management
“It will investigate, evaluate and enhance current approaches for the
identification, analysis, evaluation, treatment, and overall management of
risk as it relates to business processes”
BPM lifecycle
ARC Discovery Grant Project 2011-2013. Risk-aware business process management
ISO 31000:2009
Risk Management Process
3. Process-related risk
Risk - “effect of uncertainty on objectives” where “an effect is a
deviation from the expected - positive and/or negative.”
International Organization for Standardization. Risk management: vocabulary = Management
du risque: vocabulaire (ISO guide 73). Geneva, 2009.
A process-related risk is one that threatens the achievement of one or
more process goals.
It has a negative effect in terms of timeliness, cost, or quality of the
outcome.
It is caused by any combination of process design, resource
behaviour, or case data.
E.g., breaching SLAs in terms of completion times,
over-running agreed budgets, producing low-quality outputs.
4. Research Scope
Objective: Develop techniques for identification of
process-related risks
Time-based process-related risk: case delays
Question: Can we identify the risk of case delays by
analyzing the behaviour of a process?
<a, b, c, d, e>
<a, b, c, d, e>
<a, b, c, c, c, c, d, e>
<a, b, c, d, e>
<a, b, c, d, e>
When an activity is repeated multiple
times, the likelihood of a case delay is high.
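A minimal sketch of the observation above, assuming each trace is a list of activity labels. The function name `max_repetitions` and the cut-off of 2 are illustrative assumptions, not part of the presented method; the traces are the five shown on the slide.

```python
from collections import Counter

# Traces copied from the slide: one list of activity labels per case.
traces = [
    ["a", "b", "c", "d", "e"],
    ["a", "b", "c", "d", "e"],
    ["a", "b", "c", "c", "c", "c", "d", "e"],
    ["a", "b", "c", "d", "e"],
    ["a", "b", "c", "d", "e"],
]

def max_repetitions(trace):
    """Return how often the most frequent activity occurs in a trace."""
    return max(Counter(trace).values())

# Hypothetical cut-off: flag traces where some activity occurs more than twice.
ASSUMED_CUTOFF = 2
flagged = [i for i, trace in enumerate(traces)
           if max_repetitions(trace) > ASSUMED_CUTOFF]
print(flagged)
```

Only the third trace, where activity c is executed four times, is flagged under this assumed cut-off.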
5. Starting point: exploiting event log data
Modern organisations automate their business processes and processes’
execution data is usually recorded in event logs.
Process mining provides techniques and tools that help to extract
knowledge about processes from event logs.
6. Research Approach
Goal: develop a method that can identify the risk of
delay for cases with a high degree of precision
This idea was presented at the BPI 2012 workshop [14]
Steps:
1. Define Process Risk Indicators (PRIs)
2. Configure PRIs
3. Identify the presence of PRI instances in a current case
7. Step 1: Define Process Risk Indicators
Process Risk Indicator (PRI) - a pattern observable in an event log whose presence
indicates a higher likelihood of some process-related risk.
Activity-based PRIs
PRI 1: Atypical activity execution time
PRI 2: Atypical waiting time
PRI 3: Multiple activity repetitions
PRI 4: Presence of a “risky” activity
PRI 6: Atypical sub-process duration
8. Process Risk Indicators (Resource-based)
Resource-based PRIs
PRI 5: Multiple resource involvement
PRI 7: High resource workload
PRI 8: Use of a “risky” resource
Example resource assignments shown on the slide:
Activity  Resource
A         R1
B         R23
C         R12
D         R11
E         R5
F         R4
-         -
Activity  Resource
A         R1
B         R1
C         R1
D         R1
E         R5
F         R4
-         -
Activity  Resource
A         R1
B         R1
C         R111
D         R1
E         R5
F         R4
-         -
9. PRI instantiation from logs: approach in [14]
“Sample standard deviations” approach for outlier detection:
Cut-off thresholds for a PRI:
Limitations:
Assumption of a normal distribution
Assumption that any atypical behavior is “risky”
E.g., a large variation in execution time of a short activity has the same impact
on case delay predictions as that of a long activity
Results:
Indicators can predict case delays but produce a high level of false
positive predictions
(cases that are predicted to be late but in the end are not)
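The cut-off formula itself did not survive on this slide. As a hedged illustration of a "sample standard deviations" rule in general, the sketch below flags a value as atypical when it exceeds the sample mean by more than k sample standard deviations. The multiplier `k = 2`, the function names, and the duration values are assumptions for illustration only, not the thresholds used in [14].

```python
from statistics import mean, stdev

def cutoff(values, k=2.0):
    """Upper cut-off: sample mean plus k sample standard deviations.

    k is an assumed multiplier; the value used in [14] is not given here.
    """
    return mean(values) + k * stdev(values)

def is_atypical(values, new_value, k=2.0):
    """True if new_value lies above the cut-off learned from past values."""
    return new_value > cutoff(values, k)

# Hypothetical past execution times of one activity, in days.
durations = [2, 3, 3, 4, 3, 2, 4, 3]
print(round(cutoff(durations), 2))
print(is_atypical(durations, 10), is_atypical(durations, 4))
```

Such a rule treats every outlier as risky regardless of whether it was associated with delays in the past, which is the limitation the slide points out.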
10. Step 2: Configure Process Risk Indicators
Motivation: calibrate PRIs so that process semantics are considered
How: using information about the known outcomes from cases in the
past (whether they are delayed or completed on time)
Learn the threshold values for PRIs for a desired precision level
Input parameter: precision level – 80%, 85%, 90%
Input parameter: a log (training set)
11. Example of configuring PRI 1:
“Atypical activity duration”
If the duration of activity A is more than t days, there is a high risk of
case delay. (t = 10? 20? ??)
To define t:
1. Define a pool of candidates C including values: 10, 12, 14, 16, 18, 20, …
2. Calculate, for each value in C, the precision of delay predictions in the training set:
• If activity A was executed for more than 12 days, 60% of cases were delayed
• If activity A was executed for more than 16 days, 90% of cases were delayed …
3. Assign t the smallest value from C that allows predicting delays with the
desired degree of precision.
• t = 16 if we would like a 90% precision level
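The three configuration steps on this slide can be sketched as follows. This is a minimal illustration, not the ProM implementation: the function names, the training observations, and the choice to return `None` when no candidate reaches the desired precision are all assumptions.

```python
def precision_above(threshold, observations):
    """Fraction of training cases with value > threshold that were delayed.

    observations: list of (value, delayed) pairs; returns None if no case
    exceeds the threshold.
    """
    flagged = [delayed for value, delayed in observations if value > threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)

def learn_threshold(candidates, observations, desired_precision):
    """Smallest candidate whose delay predictions reach the desired precision."""
    for t in sorted(candidates):
        p = precision_above(t, observations)
        if p is not None and p >= desired_precision:
            return t
    return None  # assumption: no threshold is learned in this situation

# Hypothetical training set: (duration of activity A in days, case delayed?).
observations = [(8, False), (11, False), (13, False), (13, True), (15, True),
                (15, False), (17, True), (19, True), (21, True), (23, True)]
candidates = [10, 12, 14, 16, 18, 20]

print(learn_threshold(candidates, observations, 0.90))
print(learn_threshold(candidates, observations, 0.80))
```

With this invented data, a 90% precision level yields t = 16 and an 80% level yields t = 14; a lower precision level admits a smaller threshold and therefore more (and earlier) alerts.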
12. Example of configuring PRI 5:
“Multiple resource involvement”
If more than t resources are involved in a case, there is a high risk of
case delay. (t = 5? 8? ??)
To define t:
1. Define a pool of candidates C including values: 3, 4, 5, 7, 8, 9, …
2. Calculate, for each value in C, the precision of delay predictions in the training
set:
• If more than 7 resources were involved, 80% of cases were delayed
• If more than 8 resources were involved, 95% of cases were delayed …
3. Assign t the smallest value from C that allows predicting delays with the
desired degree of precision.
• t = 8
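To apply such a threshold, the number of distinct resources per case first has to be derived from the log. The sketch below assumes the log is available as `(case_id, activity, resource)` tuples; the case identifiers, the function name, and the threshold of 5 are hypothetical, while the resource labels reuse those from the resource-based PRI slide.

```python
from collections import defaultdict

def resources_per_case(events):
    """Count the distinct resources involved in each case."""
    seen = defaultdict(set)
    for case_id, _activity, resource in events:
        seen[case_id].add(resource)
    return {case_id: len(resources) for case_id, resources in seen.items()}

# Hypothetical log fragment with two cases.
events = [
    ("c1", "A", "R1"), ("c1", "B", "R23"), ("c1", "C", "R12"),
    ("c1", "D", "R11"), ("c1", "E", "R5"), ("c1", "F", "R4"),
    ("c2", "A", "R1"), ("c2", "B", "R1"), ("c2", "C", "R1"),
    ("c2", "D", "R1"), ("c2", "E", "R5"), ("c2", "F", "R4"),
]

counts = resources_per_case(events)
ASSUMED_T = 5  # hypothetical learned threshold
flagged = sorted(c for c, n in counts.items() if n > ASSUMED_T)
print(counts, flagged)
```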
13. Configuring other PRIs
Activity-based PRIs: PRI 1, 2, 3, 6
Resource-based PRIs: PRI 5, 7
PRI 4: Presence of a risky activity
we check if there exists an activity that is executed mainly in
delayed cases
PRI 8: Use of a risky resource
we check for each pair “activity-resource" if some resource's
involvement in the execution of an activity mainly occurs in
delayed cases
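One possible reading of "mainly occurs in delayed cases" is that the share of delayed cases among all training cases containing the item reaches the desired precision level. That interpretation, the function name, and the data below are assumptions. The same function covers PRI 4 (items are activities) and PRI 8 (items are activity-resource pairs).

```python
def risky_items(cases, min_delayed_fraction):
    """Items whose containing cases are delayed at least the given fraction of the time.

    cases: list of (items, delayed) where items are activities (PRI 4)
    or (activity, resource) pairs (PRI 8).
    """
    totals, delayed_counts = {}, {}
    for items, delayed in cases:
        for item in set(items):  # count each item once per case
            totals[item] = totals.get(item, 0) + 1
            delayed_counts[item] = delayed_counts.get(item, 0) + int(delayed)
    return {item for item in totals
            if delayed_counts[item] / totals[item] >= min_delayed_fraction}

# Hypothetical training cases: activities executed and whether the case was delayed.
training = [
    (["a", "b", "c"], False),
    (["a", "b", "x", "c"], True),
    (["a", "x", "c"], True),
    (["a", "b", "c"], False),
    (["a", "b", "c"], True),
]
print(risky_items(training, 0.9))

# The same check on activity-resource pairs (PRI 8).
pair_cases = [([("a", "R1"), ("b", "R2")], False),
              ([("a", "R1"), ("b", "R9")], True)]
print(risky_items(pair_cases, 0.9))
```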
14. Step 3: Identify the presence of PRI
instances in a current case
Input parameter: log (test set)
Compare the values for a current case against the thresholds
of PRIs
Record a likelihood of a case delay (0 or 1); record 1:
if the value is higher than the learned threshold t
if a ‘risky’ activity is present in the current case
if an activity is assigned to a ‘risky’ resource
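A rough sketch of this comparison step for a single running case. All names are hypothetical; the thresholds 16 and 8 reuse the values from the two configuration examples, and treating PRI 1 as a single value per case is a simplification made for brevity.

```python
def evaluate_case(case_values, thresholds, case_activities, risky_activities):
    """Return a 0/1 flag per PRI for one case."""
    # Threshold-based PRIs: 1 if the observed value exceeds the learned t.
    flags = {pri: int(case_values[pri] > t) for pri, t in thresholds.items()}
    # PRI 4: 1 if any activity of the case is in the learned 'risky' set.
    flags["PRI 4"] = int(any(a in risky_activities for a in case_activities))
    return flags

thresholds = {"PRI 1": 16, "PRI 5": 8}    # learned in the configuration step
case_values = {"PRI 1": 18, "PRI 5": 6}   # hypothetical values of a running case
flags = evaluate_case(case_values, thresholds, ["a", "b", "x"], {"x"})
print(flags)
```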
15. Implementation within the ProM framework
Event log
ProM plug-in
1. Inputs: expected case duration
2. Learn cut-off thresholds for PRIs and Identify the presence of PRIs in cases
3. If any of the PRIs is identified in a case, we consider that there is a risk of delay
4. Compare predicted values with the real case durations
Case ID  PRI 1  PRI 2  PRI 3  PRI 4  PRI 5  PRI 6  PRI 7  PRI 8  Risk
1000     1      0      1      0      1      1      1      1      1
102      0      0      0      0      0      0      0      0      0
103      0      0      1      0      0      0      0      0      1
106      0      0      0      0      0      0      0      0      0
305      0      0      0      0      0      0      0      0      0
554      0      0      0      0      0      0      0      0      0
Callouts in the screenshot: delayed activity, multiple resources, activity repetitions
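The rule in item 3 (a case is at risk if any PRI is identified) amounts to a logical OR over the indicator columns. A small sketch using the indicator values from the table on this slide; the variable and function names are illustrative.

```python
# PRI 1..PRI 8 flags per case, taken from the table on this slide.
rows = {
    "1000": [1, 0, 1, 0, 1, 1, 1, 1],
    "102":  [0, 0, 0, 0, 0, 0, 0, 0],
    "103":  [0, 0, 1, 0, 0, 0, 0, 0],
    "106":  [0, 0, 0, 0, 0, 0, 0, 0],
    "305":  [0, 0, 0, 0, 0, 0, 0, 0],
    "554":  [0, 0, 0, 0, 0, 0, 0, 0],
}

def risk_of_delay(indicators):
    """1 if at least one PRI was identified in the case, else 0."""
    return int(any(indicators))

risk = {case_id: risk_of_delay(flags) for case_id, flags in rows.items()}
print(risk)
```

The computed values match the Risk column: only cases 1000 and 103 are marked as at risk.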
16. Validation with real event logs
Experimental setup
Hold-out cross-validation training and test sets (75:25)
“Random” split and “Time” split (4:2 months)
Evaluation of precision and recall
Precision: the fraction of cases predicted to be delayed that are actually
delayed
Recall: the fraction of actually delayed cases that are successfully
predicted to be delayed
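For reference, both measures can be computed from per-case predictions and outcomes as sketched below. The function name and the example vectors are illustrative; returning `None` for an undefined ratio is an assumption.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from 0/1 predictions and 0/1 outcomes."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else None  # no delay predictions made
    recall = tp / (tp + fn) if tp + fn else None     # no delayed cases present
    return precision, recall

# Hypothetical test set of six cases: 1 = delayed, 0 = on time.
predicted = [1, 1, 0, 0, 1, 0]
actual    = [1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```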
Data pre-processing:
Completed cases
Recent data
Separating cases that are executed in different contexts (e.g., different
departments)
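The two hold-out splits listed on this slide could be produced along the following lines. This is a sketch under assumptions: cases are `(case_id, start_date)` pairs, the "Time" split is read as training on the earlier months and testing on the later ones, and the dates, seed, and function names are invented for illustration.

```python
import random
from datetime import date

def random_split(cases, train_fraction=0.75, seed=0):
    """Shuffle the cases and cut them into training and test sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def time_split(cases, cutoff):
    """Train on cases started before the cutoff date, test on the rest."""
    train = [c for c in cases if c[1] < cutoff]
    test = [c for c in cases if c[1] >= cutoff]
    return train, test

# Hypothetical cases spread over six months.
cases = [(f"c{i}", date(2012, month, 1))
         for i, month in enumerate([1, 1, 2, 3, 4, 5, 6, 6])]

train_r, test_r = random_split(cases)
train_t, test_t = time_split(cases, date(2012, 5, 1))  # first 4 months vs last 2
print(len(train_r), len(test_r), len(train_t), len(test_t))
```

The time-based split mimics how the method would be used in practice, since thresholds are learned only from cases that precede the ones being predicted.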
17. Validation with real event logs
Data sets
Six data sets from Suncorp, a large Australian insurance company
The data sets represent insurance claim processes from different organisational units
Properties of data set A
Properties of data sets B1-B5
18. Validation with real event logs
Results. Data set A. “Random” split experiment.
Legend:
• 95%, 90%, 80% - desired precision levels
• TP – True Positives (cases predicted correctly as delayed)
• FP – False Positives (cases predicted to be delayed but are not delayed)
• FN – False Negatives (delayed cases that are not predicted to be delayed)
• TN – True Negatives (on-time cases that are also not predicted to be delayed)
• PRI 1: Atypical activity execution time
• PRI 2: Atypical waiting time
• PRI 3: Multiple activity repetitions
• PRI 4: Presence of a “risky” activity
• PRI 5: Multiple resource involvement
• PRI 6: Atypical sub-process duration
• PRI 7: High resource workload
• PRI 8: Use of a “risky” resource
19. Validation with real event logs
Results. Data set A, “Time” split experiment.
20. Data set A. “Random” split experiment (without
configuration)
21. Moment of delay prediction: motivation
Predicting delays early during a case’s execution is a
highly desirable capability
Early risk detection enables risk mitigation:
Risk elimination (e.g., reallocation of an activity to another
resource)
Reduction of impact (e.g., adding resources to a
case to decrease the extent of the delay)
22. Moment of delay prediction
Data set A, Random split, 90% precision level
x: The number of days since the beginning of a case when the risk of the case delay
was discovered.
y: The cumulative number of delay predictions at a certain point in time
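The plotted curve can be derived from the day on which each delay prediction was first raised. A minimal sketch, assuming one detection day per predicted case; the detection days, the horizon, and the function name are invented for illustration.

```python
from collections import Counter
from itertools import accumulate

def cumulative_predictions(detection_days, horizon):
    """Cumulative number of delay predictions for each day 0..horizon."""
    per_day = Counter(detection_days)
    daily = [per_day.get(day, 0) for day in range(horizon + 1)]
    return list(accumulate(daily))

# Hypothetical days (since case start) on which a delay risk was first detected.
detection_days = [0, 2, 2, 3, 5, 5, 5]
curve = cumulative_predictions(detection_days, horizon=6)
print(curve)
```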
23. Observations from the experiments
• Good predictors in all data sets:
• PRI 1: Atypical activity execution time
• PRI 2: Atypical waiting time
• PRI 6: Atypical sub-process duration
• Good predictors in some data sets:
• PRI 3: Multiple activity repetitions
• PRI 4: Presence of a ‘risky’ activity
• PRI 7: High resource workload
• PRI 8: Use of a ‘risky’ resource
• Early predictions:
• PRI 4: Presence of a ‘risky’ activity
• PRI 7: High resource workload
• PRI 8: Use of a ‘risky’ resource
• Limitations of the data:
• High process variability in data sets B1-B5
• No complete information about resource workload
• Limitations of the approach:
• Assumption that a process is in a steady state
• External context is not considered
24. Conclusions and Future work
A method for predicting case delays with a high degree of precision
Utilise eight process risk indicators
Calibrate the threshold values for risk indicators using log data
Predict the likelihood of case delays using current case and log data
Experiments showed that this approach
decreases the level of false positive alerts,
significantly improves the precision of case delay predictions,
can predict case delays before a certain deadline
Future work:
Investigating the relation between PRIs and the extent of the expected
delay
Alternative approaches: neural networks, decision trees
Applying the technique to other types of risks (e.g., budget overrun or
low-quality output)
25. Thank You! Questions?
Email: m.wynn@qut.edu.au