Prescriptive Process Monitoring for Cost-Aware Cycle Time Reduction



  1. Prescriptive Process Monitoring for Cost-Aware Cycle Time Reduction
     Zahra Dasht Bozorgi, Irene Teinemaa, Marlon Dumas, Marcello La Rosa, Artem Polyvyanyy
     International Conference on Process Mining (ICPM 2021), November 4, 2021
  2. The Problem
     • Mary is owner of a loan origination process @ a credit union.
     • Speed matters to Mary! Sometimes, cases are delayed.
     • There are a few actions (treatments) that Mary and colleagues can trigger to reduce the cycle time of a case:
       – Give a phone call to the customer instead of sending an email
       – Prepare two or more loan offers at once to avoid back-and-forth
     • Each intervention has a cost.
     • For which cases should an intervention be triggered and when?
  3. Baseline Solution: Predictive Monitoring
     1. Train a predictive model from an event log
     2. Find cases that are predicted to have the longest cycle time.
     3. Apply the treatment to the cases with the longest predicted cycle time.
  4. Example: loan application process
     [Figure: timeline of a case from case start to case end. Activities: Receive application, Request further documents, Review application, Create loan offer, Check fraud, Send offer, Wait for customer reply. Annotations: treatment point, remaining [execution] time, waiting time, cause of long cycle time.]
     If we treat those cases that are predicted to be most delayed, we will treat cases that take a long time, regardless of whether or not the treatment has any effect!
  5. Causal Inference to the Rescue!
     [Figure: event log, treatment/intervention, causal model, treatment policy]
  6. Preliminaries
     How do we measure causal effects?
     Causal Effect = difference between potential outcomes
     Example (loan origination process)
     - Treatment/intervention: calling the customer
     - Outcome: shortening the cycle time
     World 1: E[Y1]
     World 2: E[Y0]
     Average Treatment Effect: ATE = E[Y1 – Y0]
     Conditional Average Treatment Effect: CATE = E[Y1 – Y0 | X = x]
  7. Preliminaries
     id | T | Y | Y1 | Y0 | Y1 – Y0
     1  | 0 | 0 | ?  | 0  | ?
     2  | 1 | 1 | 1  | ?  | ?
     3  | 1 | 0 | 0  | ?  | ?
     4  | 0 | 0 | ?  | 0  | ?
     5  | 0 | 1 | ?  | 1  | ?
     6  | 1 | 1 | 1  | ?  | ?
     Fundamental Problem of Causal Inference: Missing Data!
     Solution: Randomised AB Experiment
     • Expensive
     • Time-consuming
     • You can do it every so often, but not all the time
     Next best solution: Uplift Modelling based on observational data
     Pre-requisite: I need to have seen the treatment being applied in a random or “sufficiently diverse” set of situations
  8. Approach
  9. Log Pre-processing
     • Data Cleaning
     • Feature Engineering
     • K-prefix extraction
     • Prefix Encoding
  10. Causal Model Construction
     [Figure: training set, Orthogonal Random Forest]
     Input:
     • Outcome Y
     • Treatment T
     • Features X
     • Confounders W
  11. Policy Selection
     [Figure: test set, trained model, net-value curve, optimal policy based on organizational constraints]
  12. Net-value Curve
     1. Take all cases in the test set who are ranked in the top n% according to the estimated treatment effect.
     2. Calculate a scale factor for each segment: N_treated/N_control
     3. Calculate Qini(n) = sum(control) × scaleFactor – sum(treated)
     4. Then, gain(n) = v × Qini(n) – c × N_treated
     5. Plot gain(n) for each n% policy
     Where v is the value of reducing one unit of duration and c is the cost of treating one case.
  13. Online Phase
     [Figure: live data, trained model, treatment effects, optimal policy, decision to treat]
  14. Now, the cycle time of my talk is going too long… Should I apply a treatment?
  15. Datasets and Experimental setup
     • Baselines: Lasso and Random Forest
     • Chosen because they are examples of a linear and a non-linear model, and they perform well on cycle time prediction for these data sets.
     BPI Challenge 2017:
     • Loan Application Process
     • A mix of case and event attributes
     • Balanced treatment and control groups
     • Selected treatment: Call customer after offer
     BPI Challenge 2019:
     • Purchase-to-pay Process
     • Mostly contains case attributes
     • Unbalanced treatment and control groups
     • Selected treatment: Allow price change for the item in the middle of the process.
  16. Results
     [Figure: results for BPIC 2017 and BPIC 2019]
  17. Future Work
     • Optimising the time of treatment
     • Handling multiple treatments:
       • Multiple types of treatments, e.g. call customer vs make a second loan offer
     • Discovering candidate treatments from an event log
     • Conducting complementary evaluations such as simulation studies or randomised experiments to validate the findings
  18. Thank you
     Any treatments to reduce the cycle time of this talk?
     Zahra Dasht Bozorgi, School of Computing and Information Systems, University of Melbourne

Editor's Notes

  • This is also an example of a long case. But this time, the cause is the long check for fraud, not waiting time. So the proposed treatment of calling the customer doesn’t change much.

    So in this instance, skipping the ‘check fraud’ activity would be a better treatment. Calling this customer is a waste of resources.
  • Instead of a predictive model, we train a causal model which quantifies the effect of the proposed treatment on the cycle time.
  • How do we measure causal effects? We do it by taking the difference between the potential outcomes. To see what that means, let’s look at an example.
    Suppose we have a loan application process. We would like to know whether calling a customer has a causal effect on the cycle time.
    Now suppose we have two hypothetical worlds. In world one, all customers get a phone call after an offer is made to them. In world two, no one gets this phone call. In each of these worlds, each application has a duration, which is called a potential outcome. This means that each application has two potential outcomes: one in world 1, denoted by y superscript 1, and another in world 2, denoted by y superscript 0. To measure the average treatment effect, the average outcome of world two is subtracted from the average outcome of world one.
    But most of the time, an action or treatment has different effects on different cases. For example, making a phone call does not always shorten the duration, as discussed in the previous slides. So what we are interested in is case-level causal effects. These are measured by the conditional average treatment effect, which is the expected difference in potential outcomes conditioned on a set of variables X.
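
The two-worlds thought experiment can be made concrete with a small sketch. The durations below are invented purely for illustration (they do not come from the event logs used in the talk), and the `cause` attribute is a hypothetical stand-in for the conditioning variables X. Because this is a thought experiment, both potential outcomes are listed for every case, which is never possible with real data.

```python
from statistics import mean

# Hypothetical cycle times in days: y1 = with phone call, y0 = without.
cases = [
    {"cause": "waiting", "y1": 10, "y0": 16},
    {"cause": "waiting", "y1": 12, "y0": 18},
    {"cause": "fraud_check", "y1": 20, "y0": 20},
    {"cause": "fraud_check", "y1": 22, "y0": 22},
]

def ate(rows):
    """Average treatment effect: mean of Y1 - Y0 over the given cases."""
    return mean(r["y1"] - r["y0"] for r in rows)

def cate(rows, cause):
    """Conditional average treatment effect for cases with the given cause."""
    return ate([r for r in rows if r["cause"] == cause])

print("ATE:", ate(cases))
print("CATE (waiting):", cate(cases, "waiting"))
print("CATE (fraud_check):", cate(cases, "fraud_check"))
```

In this toy data the average effect is a 3-day reduction, but it comes entirely from the cases delayed by waiting time (6 days each); the cases delayed by the fraud check gain nothing from the call.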
  • But training accurate causal models is challenging, because we never observe both potential outcomes for the same case. Often, randomised trials are done to infer causal effects by comparing treatment and no-treatment groups. Since conducting randomised trials is not always possible, we use observational data.

    Observational studies do not always provide accurate causal effects, but they are still useful for obtaining treatment policies that lead to some benefit.
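
A minimal sketch of the missing-data problem, using the six (id, T, Y) rows from the Preliminaries slide. Each case reveals only the potential outcome that matches its treatment; the other stays unknown (`None` here). The difference in group means at the end is a valid estimate of the average effect only under the assumption that treatment was assigned at random, which is exactly what observational data cannot guarantee.

```python
# (case id, treatment T, observed outcome Y), as listed on the slide.
observed = [(1, 0, 0), (2, 1, 1), (3, 1, 0), (4, 0, 0), (5, 0, 1), (6, 1, 1)]

# Fill in the potential-outcome columns: only one of Y1 / Y0 is ever seen.
rows = []
for case_id, t, y in observed:
    y1 = y if t == 1 else None  # outcome under treatment
    y0 = y if t == 0 else None  # outcome without treatment
    rows.append((case_id, t, y, y1, y0))

# Case-level effects Y1 - Y0 cannot be computed for any row.
assert all(y1 is None or y0 is None for _, _, _, y1, y0 in rows)

# Under random assignment, the group means can still be compared.
treated = [y for _, t, y in observed if t == 1]
control = [y for _, t, y in observed if t == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)
print(round(diff_in_means, 3))
```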
  • We propose a two-phase approach: an offline phase and an online phase.
  • The K-prefix is the prefix before the treatment activity.

    Feature engineering: temporal features such as the month, weekday, and hour of the last event in the prefix; the time between the first and last event in the prefix; the time between the last two events; and the time since the beginning of the first case in the log.
    We also include the number of active cases as a feature to act as a proxy for the current workload in the process.

    Activities and resources are encoded using aggregation encoding. We also include case attributes.
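
The prefix features described above could be computed roughly as follows. This is a sketch under stated assumptions, not the authors' code: the function name `encode_prefix`, the feature names, the event tuple layout, and the sample events are all hypothetical, and aggregation encoding is taken here to mean per-prefix frequency counts of activities and resources. The number of active cases is left out because it needs the whole log.

```python
from collections import Counter
from datetime import datetime

def encode_prefix(prefix, log_start):
    """Encode a prefix given as time-ordered (activity, resource, timestamp) tuples."""
    first_ts = prefix[0][2]
    last_ts = prefix[-1][2]
    prev_ts = prefix[-2][2] if len(prefix) > 1 else last_ts
    features = {
        "month": last_ts.month,
        "weekday": last_ts.weekday(),  # Monday = 0
        "hour": last_ts.hour,
        "elapsed_hours": (last_ts - first_ts).total_seconds() / 3600,
        "since_prev_event_hours": (last_ts - prev_ts).total_seconds() / 3600,
        "since_log_start_hours": (last_ts - log_start).total_seconds() / 3600,
    }
    # Aggregation encoding (assumed): how often each activity/resource occurs.
    for activity, count in Counter(a for a, _, _ in prefix).items():
        features[f"act_{activity}"] = count
    for resource, count in Counter(r for _, r, _ in prefix).items():
        features[f"res_{resource}"] = count
    return features

# Hypothetical prefix of a loan application, cut just before the treatment.
prefix = [
    ("Receive application", "clerk_1", datetime(2021, 3, 1, 9, 0)),
    ("Review application", "clerk_2", datetime(2021, 3, 1, 15, 0)),
    ("Create loan offer", "clerk_2", datetime(2021, 3, 2, 10, 0)),
]
features = encode_prefix(prefix, log_start=datetime(2021, 1, 1))
print(features)
```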
  • Why ORF? Because it allows the use of flexible models for the estimation of nuisance functions (propensity score and outcome model). This is good if we have a high-dimensional set of confounders. Another advantage is the non-parametric estimation of the treatment effect, which is useful if the underlying causal structure is complex.

    Also, the ORF estimator is asymptotically normal, which allows for the construction of valid confidence intervals if required.
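
To give some intuition for the "orthogonal" part, here is a deliberately simplified sketch of residual-on-residual (partialling-out) regression on synthetic data. It is not an Orthogonal Random Forest: the nuisance models are plain one-variable linear regressions, the treatment is continuous rather than binary, and a single constant effect is estimated instead of a case-level one. All numbers and variable names are assumptions made for the example.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def residuals(x, y):
    """Part of y that a linear fit on x cannot explain."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

random.seed(0)
n = 5000
true_effect = -2.0  # each unit of treatment shortens the duration by 2

# w is a confounder (say, workload): it drives both treatment and outcome.
w = [random.gauss(0, 1) for _ in range(n)]
t = [wi + random.gauss(0, 1) for wi in w]
y = [true_effect * ti + 3.0 * wi + random.gauss(0, 1) for ti, wi in zip(t, w)]

# Naive estimate: regress the outcome on the treatment, ignoring w.
naive = slope(t, y)

# Orthogonalised estimate: remove what w explains from both t and y,
# then regress the outcome residuals on the treatment residuals.
orthogonal = slope(residuals(w, t), residuals(w, y))

print(f"naive={naive:.2f} orthogonal={orthogonal:.2f} true={true_effect}")
```

The naive slope is pulled away from the true effect by the confounder, while the residual-on-residual slope recovers it closely. ORF applies the same idea with flexible nuisance models and forest-based local estimation.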
  • In this step we take a separate test set and estimate the treatment effect for each case in it. We construct a net-value curve, which is shown to an analyst to select the best treatment policy based on organizational constraints. If there are no constraints, the policy leading to the highest gain is automatically selected.

    Next slide describes how we get the net-value curve.
  • At a given targeting threshold on the x-axis, let's say a threshold corresponding to targeting 10% of cases:
    1. take all cases in the test set that are ranked in the top 10% according to your net value ranking metric
    2. calculate a "scale factor": N_treated_cases in this segment divided by N_control_cases in this segment
    3. calculate sum(duration of cases in the control group) * scale_factor - sum(duration of cases in the treated group)

    This gives you the expected incremental reduction in duration, given that you treat the top 10% of cases selected by your model.

    4. multiply this quantity by v (the value of reducing one unit of duration) and subtract c*N_treated_cases to get the expected net value of your policy at the 10% threshold

    The shape of the curve changes based on the ratio between v and c. In this example we see that if this ratio is 0.3 (meaning that the treatment is very expensive compared to the benefit it provides), the best policy is to treat half the cases. Even if this ratio is high, the best policy is to treat 80-90% instead of all the cases.
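
The steps above can be sketched as a short function. The test-set rows, the scores, and the values of `v` and `c` are hypothetical; a higher score is assumed to mean a larger estimated reduction in duration, and skipping segments without any control case is an assumption of this sketch rather than something stated in the talk.

```python
def net_value_curve(cases, v, c, percents=(20, 40, 60, 80, 100)):
    """cases: (score, treated flag, duration) tuples from the test set."""
    ranked = sorted(cases, key=lambda case: case[0], reverse=True)
    curve = []
    for pct in percents:
        segment = ranked[: len(ranked) * pct // 100]
        treated = [d for _, is_treated, d in segment if is_treated]
        control = [d for _, is_treated, d in segment if not is_treated]
        if not control:
            continue  # scale factor undefined without control cases
        scale_factor = len(treated) / len(control)
        qini = sum(control) * scale_factor - sum(treated)
        gain = v * qini - c * len(treated)
        curve.append((pct, gain))
    return curve

# Hypothetical test set: (estimated effect score, treated?, duration in days).
test_cases = [
    (9, True, 10), (8, False, 20), (7, True, 12), (6, False, 18),
    (5, True, 15), (4, False, 16), (3, True, 15), (2, False, 15),
    (1, True, 16), (0, False, 14),
]
curve = net_value_curve(test_cases, v=1, c=2)
best_pct, best_gain = max(curve, key=lambda point: point[1])
print(curve)
print("best policy: treat top", best_pct, "% with gain", best_gain)
```

With these made-up numbers the gain peaks when the top 40% of cases are treated: beyond that point, the extra reduction in duration no longer covers the cost of the additional treatments.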
  • How we operationalise: we take an ongoing case, estimate a treatment effect for it. Suppose the policy is to treat the top 50% of cases. Based on that policy a threshold for treatment is selected. If the estimated effect for the ongoing case is above the threshold, we treat that case.

    Note: the time for treatment is selected randomly from the distribution of the treatment time in past cases.
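
A minimal sketch of the online decision, assuming the threshold is read off the ranked effect estimates of the test set. The function names and the scores are hypothetical, and a case whose estimate equals the threshold is treated here, which is a choice made for this example.

```python
def treatment_threshold(test_scores, top_pct):
    """Smallest estimated effect among the top `top_pct` percent of test cases."""
    ranked = sorted(test_scores, reverse=True)
    k = max(1, len(ranked) * top_pct // 100)
    return ranked[k - 1]

def should_treat(estimated_effect, threshold):
    """Treat an ongoing case if its estimated effect reaches the threshold."""
    return estimated_effect >= threshold

# Hypothetical estimated reductions in duration (days) on the test set.
test_scores = [0.5, 3.0, 1.2, 4.1, 0.0, 2.2, 2.9, 0.8]
threshold = treatment_threshold(test_scores, top_pct=50)

print(threshold)
print(should_treat(2.5, threshold))
print(should_treat(1.0, threshold))
```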
  • Here we can see that, for both datasets, treating based on the causal model leads to more benefit than treating based on predictive models.

    This is because while predictive models are good at identifying which cases will take a long time, they are not necessarily good at identifying which cases should be targeted with the chosen treatment.