Paper presentation at the 3rd International Conference on Process Mining (ICPM), 4 November 2021.
The paper is available at: https://arxiv.org/abs/2105.07111
Prescriptive Process Monitoring for Cost-Aware Cycle Time Reduction
1. Prescriptive Process Monitoring for Cost-Aware Cycle Time Reduction
Zahra Dasht Bozorgi, Irene Teinemaa,
Marlon Dumas, Marcello La Rosa, Artem Polyvyanyy
International Conference on Process Mining (ICPM 2021)
November 4, 2021
2. The Problem
• Mary is the owner of a loan origination process at a credit union.
• Speed matters to Mary! Sometimes, cases are delayed.
• There are a few actions (treatments) that Mary and her colleagues can trigger to reduce the cycle time of a case:
– Call the customer instead of sending an email
– Prepare two or more loan offers at once to avoid back-and-forth
• Each intervention has a cost.
• For which cases should an intervention be triggered, and when?
3. Baseline Solution: Predictive Monitoring
1. Train a predictive model from an event log.
2. Find the cases that are predicted to have the longest cycle time.
3. Apply the treatment to the cases with the longest predicted cycle time.
4. Example: loan application process
[Timeline diagram: a case runs from case start to case end through Receive application → Request further documents → Review application → Create loan offer → Check fraud → Send Offer → Wait for customer reply. The treatment point and the remaining (execution) time are marked; the waiting time for the customer's reply is highlighted as the cause of the long cycle time.]
If we treat those cases that are predicted to be most delayed, we will
treat cases that take a long time, regardless of whether or not the
treatment has any effect!
5. Causal Inference to the Rescue!
[Diagram: an event log and a chosen treatment/intervention are used to train a causal model, which yields a treatment policy.]
6. Preliminaries
How do we measure causal effects?
Causal Effect = difference between potential outcomes
Example (loan origination process)
- Treatment/intervention: calling the customer
- Outcome: shortening the cycle time
World 1: every case receives the treatment; average outcome E[Y1].
World 2: no case receives the treatment; average outcome E[Y0].
Average Treatment Effect:
ATE = E[Y1 − Y0]
Conditional Average Treatment Effect:
CATE = E[Y1 − Y0 | X = x]
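As an illustration (not part of the talk), under randomised treatment assignment ATE and CATE reduce to (conditional) differences in group means. The data-generating process below is entirely hypothetical:

```python
import numpy as np

# Hypothetical illustration: with randomised treatment assignment,
# ATE and CATE reduce to (conditional) differences in group means.
rng = np.random.default_rng(0)

n = 10_000
x = rng.integers(0, 2, size=n)   # a binary covariate, e.g. "urgent case"
t = rng.integers(0, 2, size=n)   # randomised treatment: 1 = customer was called
# Assumed data-generating process: the call shortens the cycle time by
# 1 day for non-urgent cases (x == 0) and by 2 days for urgent ones (x == 1).
y = 10.0 - t * (1.0 + x) + rng.normal(0.0, 0.5, size=n)

# ATE = E[Y1 - Y0]: under randomisation, the difference in group means.
ate = y[t == 1].mean() - y[t == 0].mean()          # close to -1.5

# CATE = E[Y1 - Y0 | X = x]: condition on the covariate before differencing.
cate_x0 = y[(t == 1) & (x == 0)].mean() - y[(t == 0) & (x == 0)].mean()  # ~ -1
cate_x1 = y[(t == 1) & (x == 1)].mean() - y[(t == 0) & (x == 1)].mean()  # ~ -2
```

Note that the ATE averages over the two subgroups, hiding the fact that the treatment is twice as effective for urgent cases; this is exactly why the talk targets case-level (conditional) effects.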
7. Preliminaries
id | T | Y | Y1 | Y0 | Y1 − Y0
 1 | 0 | 0 | ?  | 0  | ?
 2 | 1 | 1 | 1  | ?  | ?
 3 | 1 | 0 | 0  | ?  | ?
 4 | 0 | 0 | ?  | 0  | ?
 5 | 0 | 1 | ?  | 1  | ?
 6 | 1 | 1 | 1  | ?  | ?
Fundamental Problem of Causal Inference:
Missing Data!
Solution: Randomised AB Experiment
• Expensive
• Time-consuming
• You can do it every so often, but not all the time
Next best solution:
Uplift Modelling based on observational data
Pre-requisite:
I need to have seen the treatment being applied in a random or "sufficiently diverse" set of situations.
12. Net-value Curve
1. Take all cases in the test set that are ranked in the top n% according to the estimated treatment effect.
2. Calculate a scale factor for each segment: N_treated / N_control.
3. Calculate Qini(n) = sum(control durations) × scale factor − sum(treated durations).
4. Then, gain(n) = v × Qini(n) − c × N_treated.
5. Plot gain(n) for each n% policy.
Here, v is the value of reducing one unit of duration and c is the cost of treating one case.
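The five steps above can be sketched in code. Everything here is an assumption for illustration, not the paper's implementation: the array names, and the sign convention that a larger estimated effect means a larger expected reduction in duration.

```python
import numpy as np

# Sketch of the net-value curve. Inputs (all hypothetical): per-case
# durations and treatment indicators from a held-out test set, plus
# model-estimated effects (larger = bigger expected duration reduction).
def net_value_curve(duration, treated, effect, v, c,
                    percents=(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    order = np.argsort(-effect)          # rank cases, best estimated effect first
    gains = []
    for p in percents:
        top = order[: int(np.ceil(len(duration) * p / 100))]   # step 1: top n%
        d, t = duration[top], treated[top]
        n_t, n_c = int(t.sum()), int((1 - t).sum())
        if n_t == 0 or n_c == 0:         # segment lacks treated or control cases
            gains.append(float("nan"))
            continue
        scale = n_t / n_c                                      # step 2: scale factor
        qini = d[t == 0].sum() * scale - d[t == 1].sum()       # step 3: Qini(n)
        gains.append(v * qini - c * n_t)                       # step 4: gain(n)
    return np.array(gains)
```

An analyst would then plot the returned gains against the percentages (step 5) and pick the policy with the highest gain, subject to organisational constraints.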
14. Now, the cycle time of my talk is going too long…
Should I apply a treatment?
15. Datasets and Experimental Setup
• Baselines: Lasso and Random Forest
• Chosen because they are examples of a linear and a non-linear model, and they perform well on cycle time prediction for these datasets.
BPI Challenge 2017:
• Loan Application Process
• A mix of case and event attributes
• Balanced treatment and control groups
• Selected treatment: Call customer after offer
BPI Challenge 2019:
• Purchase-to-pay Process
• Mostly contains case attributes
• Unbalanced treatment and control groups
• Selected treatment: Allow a price change for the item in the middle of the process.
17. Future Work
• Optimising the time of treatment
• Handling multiple treatments:
• Multiple types of treatments, e.g. call customer vs make a second loan offer
• Discovering candidate treatments from an event log
• Conducting complementary evaluations such as simulation studies or randomised
experiments to validate the findings
18. Thank you
Any treatments to reduce the cycle
time of this talk?
Zahra Dasht Bozorgi
zdashtbozorg@student.unimelb.edu.au
School of Computing and Information Systems
University of Melbourne
Editor's Notes
This is also an example of a long case. But this time, the cause is the long check for fraud, not waiting time. So the proposed treatment of calling the customer doesn’t change much.
So in this instance, skipping the ‘check fraud’ activity is a better treatment. Calling this customer is a waste of resource.
Instead of a predictive model, we train a causal model which quantifies the effect of the proposed treatment on the cycle time.
How do we measure causal effects? We do it by taking the difference between the potential outcomes. To see what that means let’s look at an example.
Suppose we have a loan application process. We would like to know whether calling a customer has a causal effect on the cycle time.
Now suppose we have two hypothetical worlds. In world one, all customers get a phone call after an offer is made to them. In world two, no one gets this phone call. In each of these worlds, each application has a duration, which is called a potential outcome. This means that each application has two potential outcomes: one outcome in world 1, denoted by Y superscript 1, and another outcome in world 2, denoted by Y superscript 0. To measure the average treatment effect, the average outcome of world two is subtracted from the average outcome of world one.
But most of the time, an action or treatment has different effects on different cases. For example, making a phone call does not always cause the duration to be shorter, as discussed in the previous slides. So what we are interested in is case-level causal effects. This is measured by the conditional average treatment effect, which is the expected difference in potential outcomes conditioned on a set of variables X.
But training accurate causal models is challenging, because we never observe both potential outcomes. Often, randomised trials are conducted to infer causal effects by comparing treatment and no-treatment groups. Since conducting randomised trials is not always possible, we use observational data.
Observational studies do not always provide accurate causal effects, but they are still useful for obtaining treatment policies that lead to some benefit.
We propose a two-phase approach, consisting of an offline phase and an online phase.
K-prefix is the prefix before the treatment activity.
Feature engineering: temporal features such as the month, weekday, and hour of the last event in the prefix; the time between the first and last event in the prefix; the time between the last two events; and the time since the beginning of the first case in the log.
We also include the number of active cases as a feature to act as a proxy for the current workload in the process.
Activities and resources are encoded using aggregation encoding. We also include case attributes.
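A minimal sketch of this prefix encoding, assuming a count-based aggregation of activities plus the temporal features mentioned above (the function name and feature names are hypothetical; the paper's actual encoding may differ):

```python
from collections import Counter
from datetime import datetime

# Hypothetical sketch: aggregation (count) encoding of the activities in a
# case prefix, plus the temporal features described in the notes.
def encode_prefix(events, activity_vocab):
    """events: time-ordered list of (activity, timestamp) pairs."""
    counts = Counter(activity for activity, _ in events)
    # Count encoding: one feature per known activity.
    features = {f"count_{a}": counts.get(a, 0) for a in activity_vocab}
    first_ts, last_ts = events[0][1], events[-1][1]
    # Temporal features of the last event in the prefix.
    features["month"] = last_ts.month
    features["weekday"] = last_ts.weekday()
    features["hour"] = last_ts.hour
    # Time between the first and last event, and between the last two events.
    features["elapsed_sec"] = (last_ts - first_ts).total_seconds()
    features["since_prev_sec"] = (
        (last_ts - events[-2][1]).total_seconds() if len(events) >= 2 else 0.0
    )
    return features
```

Resource encoding, case attributes, and the number-of-active-cases workload feature would be added to the same feature dictionary in an analogous way.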
Why ORF? Because it allows the use of flexible models for the estimation of the nuisance functions (the propensity score and the outcome model). This is good if we have a high-dimensional set of confounders. Another advantage is the non-parametric estimation of the treatment effect, which is useful if the underlying causal structure is complex.
Also, ORF is asymptotically normal, which allows for the construction of valid confidence intervals if required.
In this step we take a separate test set and estimate the treatment effect for it. We construct a net-value curve, which is shown to an analyst to select the best treatment policy based on organisational constraints. If there are no constraints, the policy leading to the highest gain is selected automatically.
Next slide describes how we get the net-value curve.
at a given targeting threshold on the x-axis, let's say a threshold corresponding to targeting 10% cases:
1. you take all cases in the test set that are ranked in the top 10% according to your net value ranking metric
2. calculate a "scale factor": N_treated_cases in this segment divided by N_control_cases in this segment
3. calculate sum(duration of cases in the control group) * scale_factor - sum(duration of cases in the treated group)
This would give you the expected incremental reduction in duration, given that you treat the top 10% cases selected by your model.
4. multiply this quantity by v (the value of reducing one unit of duration) and subtract c*N_treated_cases to get the expected net value of your policy at the 10% threshold
The shape of the curve changes based on the ratio between v and c. In this example we see that if this ratio is 0.3 (meaning that the treatment is very expensive compared to the benefit it provides), the best policy is to treat half the cases. Even if this ratio is high, the best policy is to treat 80-90% instead of all the cases.
How we operationalise: we take an ongoing case, estimate a treatment effect for it. Suppose the policy is to treat the top 50% of cases. Based on that policy a threshold for treatment is selected. If the estimated effect for the ongoing case is above the threshold, we treat that case.
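That online decision rule can be sketched as follows, under the assumptions that effects are estimated reductions in cycle time and that all names are illustrative:

```python
import numpy as np

# Illustrative sketch of the online step: derive a threshold from the
# test-set effect estimates for a "treat the top q% of cases" policy, then
# treat an ongoing case if its estimated effect clears the threshold.
def policy_threshold(test_effects, top_percent):
    # Top q% = the q% of cases with the largest estimated reductions.
    return np.percentile(test_effects, 100 - top_percent)

def should_treat(estimated_effect, threshold):
    return estimated_effect >= threshold

# Hypothetical test-set effect estimates (days of cycle time saved).
test_effects = np.array([0.2, 1.5, 0.8, 2.3, 0.1, 1.1, 0.5, 1.9])
threshold = policy_threshold(test_effects, top_percent=50)   # treat top 50%
```

An ongoing case's estimated effect would then be compared against `threshold` via `should_treat` at the treatment point.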
Note: the time for treatment is selected randomly from the distribution of the treatment time in past cases.
Here we can see that for both datasets, treating based on the causal model leads to more benefit than treating based on predictive models.
This is because while predictive models are good at identifying which cases will take a long time, they are not necessarily good at identifying which cases should be targeted with the chosen treatment.