Predictive Process Monitoring Framework with Hyperparameter Optimization

Chiara Di Francescomarino, Chiara Ghidini (Fondazione Bruno Kessler)
Marlon Dumas, Fabrizio Maria Maggi (University of Tartu)
Marco Federici, Williams Rizzi (University of Trento)
Predictive Business Process Monitoring
[Figure: historical execution traces and a running trace feed a prediction problem, which yields a prediction, e.g., "Does Alice need a given exam?"]
Predictive Process Monitoring Frameworks
• Framework instance (or configuration): a combination of techniques and their input parameters (hyperparameters); see the sketch below.
• No single framework instance fits all prediction problems and datasets.
[Figure: a Predictive Process Monitoring Framework instantiated with alternative classifiers (Decision Tree, Random Forest), taking historical execution traces, a running trace, and a prediction problem as input]
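To make "framework instance" concrete: a minimal sketch of a configuration space and its exhaustive enumeration. The techniques and hyperparameter values below are illustrative, not the ones actually used in the framework.

```python
from itertools import product

# Hypothetical configuration space: a framework instance combines a
# clustering technique, a classification technique, and their hyperparameters.
space = {
    "clustering": [("kmeans", {"n_clusters": k}) for k in (3, 5, 10)],
    "classifier": [("decision_tree", {"max_depth": d}) for d in (5, 10)]
                + [("random_forest", {"n_estimators": n}) for n in (10, 100)],
    "prefix_size": [5, 10, 15],
}

# Every combination is one candidate framework instance.
instances = [dict(zip(space, combo)) for combo in product(*space.values())]
print(len(instances), "candidate framework instances")  # 3 * 4 * 3 = 36
```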
In the “Real” World
Does Alice need the exams "tumor marker CA-19.9" or "ca-125 using meia"?
Which framework instance best suits my dataset and prediction problem?
Which one should I choose if I only want accurate predictions?
The Existing Landscape
• Approaches exist for:
  – selecting machine learning techniques
  – tuning their hyperparameters
  – jointly optimizing techniques and their hyperparameters
• Here, however, we must deal with a combination of several machine learning techniques that depend on one another.
How to Avoid Users’ Panic?
• A Predictive Process Monitoring Framework enhanced with technique and hyperparameter optimization:
  1. an exhaustive exploration of a set of framework configurations (see the tuner sketch below);
  2. comparison and analysis of the results.
The Enhanced Framework
[Figure: the enhanced framework. A Technique and Hyperparameter Tuner replays validation execution traces against candidate framework instances; a Replayer and an Evaluator produce aggregated metrics per framework instance. The resulting Predictive Process Monitoring Framework takes historical execution traces, a running trace, and the prediction problem, and returns a prediction.]
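A minimal sketch of the tuner loop implied by the figure: train each candidate instance, replay the validation traces, and aggregate per-trace outcomes. The hooks train_fn and replay_fn and the record keys are assumptions, not the framework's actual API.

```python
def aggregate(records):
    """Average per-trace outcomes into the metrics the evaluator reports."""
    n = len(records)
    return {
        "accuracy": sum(r["correct"] for r in records) / n,
        "failure_rate": sum(r["failed"] for r in records) / n,
        "earliness": sum(r["earliness"] for r in records) / n,
    }

def tune(instances, validation_traces, train_fn, replay_fn):
    """Exhaustively evaluate every candidate framework instance."""
    results = []
    for instance in instances:
        model = train_fn(instance)  # fit on the historical traces
        records = [replay_fn(model, t) for t in validation_traces]
        results.append((instance, aggregate(records)))
    # Rank the configurations so the user can compare and analyze them.
    return sorted(results, key=lambda r: r[1]["accuracy"], reverse=True)
```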
The Predictive Process Monitoring Framework

[Figure: framework internals. Pre-processing: prefixes are extracted from the historical execution traces; the trace prefixes are control-flow encoded and clustered; within each cluster, data-encoded prefixes labeled by a labeling function feed supervised learning, yielding one classifier per cluster. Predictive monitoring: the running trace is control-flow and data encoded, its cluster(s) identified, and classification returns the prediction for the prediction problem.]
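A runnable sketch of that pipeline under simplifying assumptions: control-flow-only frequency encoding (the framework also uses data encodings), k-means clustering, and one random forest per cluster.

```python
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def encode_control_flow(prefix, alphabet):
    """Frequency encoding: one activity-occurrence count per alphabet symbol."""
    counts = Counter(prefix)
    return [counts[a] for a in alphabet]

def fit(traces, labels, prefix_size=5, n_clusters=3):
    """Pre-processing: prefix extraction, encoding, clustering, learning."""
    alphabet = sorted({a for t in traces for a in t})
    X = [encode_control_flow(t[:prefix_size], alphabet) for t in traces]
    clustering = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    classifiers = {}
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(clustering.labels_) if lab == c]
        classifiers[c] = RandomForestClassifier(n_estimators=100).fit(
            [X[i] for i in idx], [labels[i] for i in idx])
    return alphabet, clustering, classifiers

def predict(running_trace, alphabet, clustering, classifiers, prefix_size=5):
    """Predictive monitoring: encode the running trace, pick its cluster, classify."""
    x = encode_control_flow(running_trace[:prefix_size], alphabet)
    cluster = clustering.predict([x])[0]
    return classifiers[cluster].predict_proba([x])[0]
```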
The Predictive Process Monitoring Framework Instances
• Each technique has its own hyperparameters.
• Other framework parameters:
  – trace prefix size
  – voting mechanism (sketched below)
  – interval choice, in the case of interval time predictions
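The deck does not define the voting mechanism; a plausible minimal reading is that when a prefix matches several clusters, the predictions of their classifiers are combined by majority vote. This is purely an illustrative assumption:

```python
from collections import Counter

def majority_vote(predictions):
    """Hypothetical voting: return the most frequent prediction among the
    classifiers of all clusters the running prefix belongs to."""
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote(["yes", "no", "yes"]))  # "yes"
```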
Technique and Hyperparameter Tuning
• A trace is replayed until an evaluation point with a prediction confidence above a given threshold is reached.
• Three metrics/evaluation dimensions (computed in the sketch below):
  – accuracy
  – failure rate
  – earliness
[Figure: the tuner in ProM. A Configuration Sender passes each framework instance to the Predictive Monitor (ProM Operational Support Service 2.0); the Replayer feeds it the validation execution traces, and the Evaluator turns the outcomes into aggregated metrics per framework instance.]
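A sketch of how one validation trace could be replayed and scored on the three dimensions. The exact metric definitions are the paper's; here earliness is illustratively taken as how early in the trace the confident prediction arrives, and model is a hypothetical callable returning (label, confidence).

```python
def replay(model, trace, true_label, threshold=0.8):
    """Replay a trace event by event until the prediction is confident enough."""
    for i in range(1, len(trace) + 1):
        label, confidence = model(trace[:i])
        if confidence >= threshold:
            return {
                "correct": label == true_label,   # feeds accuracy
                "failed": False,                  # feeds failure rate
                "earliness": 1 - i / len(trace),  # earlier point, higher score
            }
    # No sufficiently confident prediction along the whole trace: a failure.
    return {"correct": False, "failed": True, "earliness": 0.0}
```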
Improving Efficiency
• Scheduling mechanism for parallel replayers
• Reuse of data structures
[Figure: the parallel architecture. A Scheduler (with a GUI and an Unfolding Module) receives each configuration, assigns Run IDs, and dispatches <Run ID, Trace> pairs to Replayers 1..N, which query the Predictive Monitor (ProM Operational Support Service 2.0); shared data structures are kept in a repository and reused across runs.]
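A minimal sketch of such a scheduling mechanism: fan every (configuration, trace) run out to a fixed pool of replayers. The worker replay_one is a hypothetical hook standing in for one replayer call.

```python
from concurrent.futures import ThreadPoolExecutor

def schedule(configurations, traces, replay_one, n_replayers=8):
    """Dispatch each (configuration, trace) run to a pool of N replayers."""
    pairs = [(cfg, trace) for cfg in configurations for trace in traces]
    with ThreadPoolExecutor(max_workers=n_replayers) as pool:
        futures = [pool.submit(replay_one, run_id, cfg, trace)
                   for run_id, (cfg, trace) in enumerate(pairs)]
        return [f.result() for f in futures]
```

The repository in the figure is what enables the reuse of data structures (e.g., encodings shared across configurations); a later slide reports the measured gains from both optimizations.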
Supporting Users in the Analysis of the Results
Evaluation
• Does the approach identify a suitable configuration for the prediction problem and dataset in practice?
  1. Does it return a set of configurations suitable for the prediction problem?
  2. Does the selected configuration meet the choice criteria?
  3. Does it require a reasonable amount of time?
Experimental Settings
• Two datasets and two prediction problems each:
  – BPI Challenge 2011
    • 𝜑11 = F("tumor marker CA-19.9") ∨ F("ca-125 using meia")
    • 𝜑12 = G("CEA - tumor marker using meia" → F("squamous cell carcinoma using eia"))
  – BPI Challenge 2015
    • 𝜑21 = F("start WABO procedure") ∧ F("extend procedure term")
    • 𝜑22 = G("send confirmation receipt" → F("retrieve missing data"))
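These are linear temporal logic properties over traces: F means "eventually", G means "globally". A minimal finite-trace checker, usable as the labeling function that marks each historical trace as satisfying a formula or not; the suffix-based semantics below is a standard simplification, an assumption rather than the paper's exact machinery:

```python
def atom(activity):
    # An atomic proposition holds on a suffix whose first event is `activity`.
    return lambda s: len(s) > 0 and s[0] == activity

def F(f):
    # Eventually: f holds on some suffix of the trace.
    return lambda s: any(f(s[i:]) for i in range(len(s) + 1))

def G(f):
    # Globally: f holds on every suffix of the trace.
    return lambda s: all(f(s[i:]) for i in range(len(s) + 1))

def implies(f, g):
    return lambda s: (not f(s)) or g(s)

def lor(f, g):
    return lambda s: f(s) or g(s)

# 𝜑11: the trace eventually contains either exam.
phi11 = lor(F(atom("tumor marker CA-19.9")), F(atom("ca-125 using meia")))

trace = ["registration", "ca-125 using meia", "discharge"]
print(phi11(trace))  # True: this trace satisfies 𝜑11
```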
Dataset preparation:
• training set (70%)
• validation set (20%)
• testing set (10%)

Procedure: identification of the most suitable configurations (among 160) on the validation set, then evaluation of the identified configurations on the testing set.
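For concreteness, a trivial split helper; the deck does not say whether the split is chronological or random, so this sketch simply slices the log in order:

```python
def split_log(traces, train=0.7, validation=0.2):
    """70/20/10 split into training, validation, and testing sets."""
    a = int(len(traces) * train)
    b = int(len(traces) * (train + validation))
    return traces[:a], traces[a:b], traces[b:]
```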
Configuration Set Variability
• Higher variability for the first dataset → which configuration to pick depends on the user’s needs.
• Lower variability for the second dataset → the suitable configurations do not change much.
Configuration Selection
• No unique best configuration.
• The evaluation values (on the testing set) are aligned with the tuning values.
Computation Time
• Computation time can depend on the trace length.
• Data structure reuse → 20% time reduction.
• 8 replayers → 13% time reduction.
Summing Up & Looking Ahead
• A predictive monitoring framework enhanced with technique and hyperparameter optimization.
• Three directions:
  – increase user support
  – optimize the exhaustive search
  – prescriptive process monitoring