Process Mining and Predictive Process Monitoring

Seminar delivered at Sapienza University of Rome on 28/04/2017 and at Tallinn Tech on 16/02/2017. A video recording of the Rome delivery is available at: https://www.youtube.com/watch?v=hMQolsRT0K0

  1. Process Mining and Predictive Process Monitoring. Marlon Dumas, marlon.dumas@ut.ee
  2. Business Process Monitoring. Diagram labels: Dashboards & Reports, Process Mining, Event stream, DB
  3. Offline Process Mining. Diagram labels: event log, discovered model, Discovery, Conformance, Deviance, Difference diagnostics, Performance, input model, Enhanced model, event log’
  4. Offline Process Mining: The Apromore Approach. Diagram labels: event log, discovered model, Discovery, Conformance, Deviance, Difference diagnostics, Performance, input model, Enhanced model, event log’, BPMN Miner, Log Delta Analysis, Behavioral Alignment. All integrated into: http://apromore.org
  5. Automated Process Discovery. Model tasks: Enter Loan Application, Retrieve Applicant Data, Compute Installments, Approve Simple Application, Approve Complex Application, Notify Rejection, Notify Eligibility. Event log excerpt:
     CID   | Task                       | Time Stamp          | …
     13219 | Enter Loan Application     | 2007-11-09T11:20:10 | -
     13219 | Retrieve Applicant Data    | 2007-11-09T11:22:15 | -
     13220 | Enter Loan Application     | 2007-11-09T11:22:40 | -
     13219 | Compute Installments       | 2007-11-09T11:22:45 | -
     13219 | Notify Eligibility         | 2007-11-09T11:23:00 | -
     13219 | Approve Simple Application | 2007-11-09T11:24:30 | -
     13220 | Compute Installments       | 2007-11-09T11:24:35 | -
     …     | …                          | …                   | …
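The event log excerpt on slide 5 can be turned into per-case traces, which is the input that discovery techniques work from. The following is a minimal sketch, not the BPMN Miner algorithm: it groups the rows by case ID, orders them by timestamp, and counts directly-follows pairs, a relation that many discovery techniques use as a starting point. The function names `build_traces` and `directly_follows` are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from the slide's event log excerpt: (case ID, task, timestamp).
EVENT_LOG = [
    ("13219", "Enter Loan Application", "2007-11-09T11:20:10"),
    ("13219", "Retrieve Applicant Data", "2007-11-09T11:22:15"),
    ("13220", "Enter Loan Application", "2007-11-09T11:22:40"),
    ("13219", "Compute Installments", "2007-11-09T11:22:45"),
    ("13219", "Notify Eligibility", "2007-11-09T11:23:00"),
    ("13219", "Approve Simple Application", "2007-11-09T11:24:30"),
    ("13220", "Compute Installments", "2007-11-09T11:24:35"),
]


def build_traces(event_log):
    """Group events by case ID and order each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, task, stamp in event_log:
        by_case[case_id].append((datetime.fromisoformat(stamp), task))
    return {
        case_id: [task for _, task in sorted(events)]
        for case_id, events in by_case.items()
    }


def directly_follows(traces):
    """Count how often task a is immediately followed by task b in a case."""
    counts = defaultdict(int)
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    traces = build_traces(EVENT_LOG)
    for pair, count in directly_follows(traces).items():
        print(pair, count)
```

Note that case 13220 is still running in the excerpt, so its trace is only a prefix of a complete execution.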
  6. Automated Process Discovery: Before BPMN Miner (Heuristics Miner)
  7. Automated Process Discovery: BPMN Miner
  8. Conformance Checking ≠
  9. Conformance Checking with Trace Alignment. [Figure: log traces over activities A to K aligned against the model.] Legend: activity occurs in the log only, but occurs in the model in another path; activity occurs in the model only and is not observed anywhere in the log; activity occurs in the model only, but occurs in the log in another trace; activity occurs both in the model and the log.
  10. Conformance Checking with Behavioral Alignment. Diagram labels: Event log, Input model, unfold, PESL, PESM, merge, Partially Synchronized Product (PSP), compare, extract differences, Difference statements
  11. Conformance Checking with Behavioral Alignment. Desired conformance output: • task C is optional in the log • the cycle including IGDF is not observed in the log. Log traces: ABCDEH, ACBDEH, ABCDFH, ACBDFH, ABDEH, ABDFH. L. Garcia-Banuelos, N.R. van Beest, M. Dumas, M. La Rosa, W. Mertens: Complete and Interpretable Conformance Checking of Business Processes. Technical Report; IEEE Transactions on Software Engineering, in press.
  12. Deviance Mining. Given two logs, find the differences and root causes for variation or deviance between the two logs. Example logs: “Simple claims and quick” vs. “Simple claims and slow”. S. Suriadi et al.: Understanding Process Behaviours in a Large Insurance Company in Australia: A Case Study. CAiSE 2013.
  13. Deviance Mining via Sequence Classification • Apply discriminative sequence mining methods to extract features characteristic of one class • Build classification models (e.g. decision trees) • Extract difference diagnostics from the classification model. C. Sun et al.: Mining explicit rules for software process evaluation. ICSSP 2013.
  14. Log Delta Analysis. Diagram labels: Event log, Input model, unfold, PESL, PESM, merge, Partially Synchronized Product (PSP), compare, extract differences, Difference statements. N.R. van Beest, L. Garcia-Banuelos, M. Dumas, M. La Rosa: Log Delta Analysis: Interpretable Differencing of Business Process Event Logs. BPM 2015: 386-405
  15. Sequence classification vs. log delta analysis. Logs: L1 (short stay), 448 cases, 7329 events; L2 (long stay), 363 cases, 7496 events. Sequence classification: 106-130 statements, e.g. IF |“Nursing Progress Notes”| > 7.5 THEN L1; IF |“Nursing Progress Notes”| ≤ 7.5 AND |“Nursing Assessment”| > 1.5 THEN L2; … Log delta analysis: 48 statements, e.g. In L1, “Nursing Primary Assessment” is repeated after “Medical Assign” and “Triage Request”, while in L2 it is not; … N.R. van Beest, L. Garcia-Banuelos, M. Dumas, M. La Rosa: Log Delta Analysis: Interpretable Differencing of Business Process Event Logs. BPM 2015: 386-405
  16. Apromore Process Analytics Platform (apromore.org). Open-source, highly scalable, SaaS BPM analytics platform. M. La Rosa, H. Reijers, W. van der Aalst, R. Dijkman, J. Mendling, M. Dumas, L. Garcia-Banuelos: APROMORE: an advanced process model repository. Expert Systems with Applications, 2011.
  17. Beyond Deviance Mining: Predictive Process Monitoring. How likely is it that a running process will become “deviant”? Will it end up in a negative outcome? Will it fail to meet its SLAs in the next 24 hours? Will it generate abnormal effort, costs or rework?
  18. Deviance Mining and Predictive Monitoring
  19. Predictive Monitoring Example: Debt Recovery Process. Trace: Debt repayment due, Call the debtor, Send a reminder, Payment received
  20. Predictive Monitoring Example: Debt Recovery Process. Trace: Debt repayment due, Call the debtor, Send a reminder, Send a warning, Call the debtor, Call the debtor, Send to external debt collection agency. [Figure: further traces with repeated “Call the debtor” events.]
  21. Predictive Process Monitoring: General Approach. Diagram 1 labels: Event log, Traces, Attributes, Classifier, Outcome Predictions. Diagram 2 labels: Event log, Traces, Attributes, Regressor / structured predictor, Future “paths” prediction
  22. Predictive Monitoring: Runtime Nearest-Neighbors Approach. Trace Processor labels: Current trace [Event+], Event log, kNN extraction (string-edit distance), Similar execution traces, Feature extraction, Labeled samples, Current trace [Data+]. Predictor labels: Decision tree learning, Decision tree, Class estimation, Current trace [Data+], Prediction. F.M. Maggi, C. Di Francescomarino, M. Dumas, C. Ghidini: Predictive Monitoring of Business Processes. CAiSE 2014.
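The kNN extraction step on slide 22 selects historical traces that are close to the running trace under string-edit distance. The sketch below, assuming traces are lists of activity labels, computes Levenshtein distance over such lists and retrieves the k nearest historical traces. For brevity it then takes a majority vote over the neighbours' labels; this is a swapped-in simplification, since the slide feeds the neighbours into decision tree learning instead. The toy history, the short activity names and the function names are hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two sequences of activity labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def k_nearest(prefix, history, k):
    """Return the k (trace, label) pairs whose same-length prefix is closest."""
    n = len(prefix)
    ranked = sorted(history, key=lambda item: edit_distance(prefix, item[0][:n]))
    return ranked[:k]


def predict(prefix, history, k=3):
    """Majority vote over neighbour labels (a stand-in for the decision tree)."""
    neighbours = k_nearest(prefix, history, k)
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Hypothetical labelled history, loosely modelled on the debt recovery example.
HISTORY = [
    (["due", "call", "paid"], "normal"),
    (["due", "call", "reminder", "paid"], "normal"),
    (["due", "call", "reminder", "warning", "call", "agency"], "deviant"),
    (["due", "call", "reminder", "warning", "call", "call", "agency"], "deviant"),
]

if __name__ == "__main__":
    print(predict(["due", "call", "reminder", "warning"], HISTORY))
```

Computing distances against the whole log for every prediction is one plausible source of the runtime overhead reported on slide 24.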
  23. Evaluation Setup • BPI Challenge 2011 dataset • Healthcare process at Dutch hospital • 1141 cases, avg length 14 events/case • Split normal-deviant via 5 predicates: φ1–φ5 • Prediction made at: • Start event (initial event) • Early event (ca. ¼ of the trace) • Middle
  24. Evaluation Results • Reasonably accurate at mid-point (AUC 0.78-0.88) • High runtime overhead: 5-10 secs / prediction
  25. Predictive Process Monitoring: Cluster & Classify. Diagram labels: Pre-processing, Historical execution traces, Running trace, Runtime, Clustering, Clusters, Control flow encoding, Encoded control flow, CONTROL FLOW, Prefix extraction, Trace Prefixes, Predictive Monitoring, Control flow encoding, Data encoding, Cluster(s) identification, Classification, Prediction Problem, Prediction, Supervised Learning, Classifiers, Data encoding, Encoded data, DATA, Labeling function. Result: AUC of 0.6 to 0.85 with a lot of variation
  26. Predictive Process Monitoring: Cluster & Classify with Hyperparameter Optimization. Each technique has its own hyperparameters. Other parameters: • Trace prefix size • Voting mechanism • Interval choice in case of interval time predictions
  27. Experimental Settings • Four outcome labellings of a large real-life patient treatment dataset. Dataset preparation: • Training set (70%) • Validation set (20%) • Testing set (10%). Identification of the most suitable configurations (among 160). Evaluation of the identified configurations (with the testing set)
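The 70/20/10 dataset preparation on slide 27 can be sketched as a split over case IDs, so that all events of a case land in the same partition. This is an assumption about how the split might be done, not a description of the authors' scripts; `split_cases` and its `seed` parameter are hypothetical.

```python
import random


def split_cases(case_ids, train=0.7, validation=0.2, seed=0):
    """Shuffle case IDs and cut them into training, validation and testing sets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train)
    n_val = round(len(ids) * validation)
    return (
        ids[:n_train],                 # used to fit each candidate configuration
        ids[n_train:n_train + n_val],  # used to pick the most suitable configurations
        ids[n_train + n_val:],         # held out for the final evaluation
    )


if __name__ == "__main__":
    train_ids, val_ids, test_ids = split_cases(range(100))
    print(len(train_ids), len(val_ids), len(test_ids))
```

Keeping the testing set out of the configuration search is what allows the tuning results and the final evaluation to be compared.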
  28. Evaluation Results • No unique best configuration. • Accuracy is consistently high, and accuracy on the testing set is consistent with the tuning. Chiara Di Francescomarino, Marlon Dumas, Fabrizio Maria Maggi, Irene Teinemaa: Clustering-Based Predictive Process Monitoring. IEEE Transactions on Services Computing, 2017.
  29. Computation Time!!!
  30. Index-Based Multi-Classifier • Idea: One classifier per index • Classifier for prefixes of length 1 • Classifier for prefixes of length 2 • Etc. • Traces of length m encoded using an index-based scheme • At runtime, classify a trace of length m using the corresponding classifier. Anna Leontjeva, Raffaele Conforti, Chiara Di Francescomarino, Marlon Dumas, Fabrizio Maria Maggi: Complex Symbolic Sequence Encodings for Predictive Monitoring of Business Processes. Proc. of BPM 2015, pp. 297-313.
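The index-based scheme on slide 30 can be illustrated as follows. In this sketch, a prefix of length m becomes one feature record holding the static case attributes plus one slot per event position, and the samples are bucketed by prefix length so that a separate classifier can be trained on each bucket. The attribute names (`age`, `cost`) and the function names are hypothetical, and the classifier training itself is left out.

```python
from collections import defaultdict


def index_encode(prefix, case_attrs):
    """Encode a prefix as case attributes plus one feature slot per position."""
    features = dict(case_attrs)
    for i, (activity, event_attrs) in enumerate(prefix, start=1):
        features[f"event_{i}"] = activity
        for name, value in event_attrs.items():
            features[f"{name}_{i}"] = value
    return features


def training_buckets(cases, max_len):
    """Map each prefix length m to the samples for the classifier of index m."""
    buckets = defaultdict(list)
    for case_attrs, events, label in cases:
        for m in range(1, min(len(events), max_len) + 1):
            buckets[m].append((index_encode(events[:m], case_attrs), label))
    return dict(buckets)


# Hypothetical cases: (case attributes, [(activity, event attributes)], label).
CASES = [
    ({"age": 40}, [("A", {"cost": 1}), ("B", {"cost": 5})], "regular"),
    ({"age": 25}, [("A", {"cost": 2})], "deviant"),
]

if __name__ == "__main__":
    for m, samples in training_buckets(CASES, max_len=3).items():
        print(m, samples)
```

At runtime, a running trace of length m would be encoded with `index_encode` and passed to the classifier trained on bucket m. Because each bucket has a fixed number of event slots, every classifier sees feature vectors of a single, fixed width.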
  31. Index-Based Multi-Classifier + HMM • Same as before, but the feature vector of a prefix is extended with the Log-Likelihood Ratio of being in the deviant or regular class according to a Hidden Markov Model
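The log-likelihood ratio feature on slide 31 compares how well two Hidden Markov Models, one per class, explain the same prefix. The sketch below assumes both models are already trained and given as plain probability tables (training, e.g. with Baum-Welch, is omitted). It scores a prefix with the scaled forward algorithm and returns the difference of the two log-likelihoods. All parameter values and names are hypothetical, and every observed symbol is assumed to have non-zero probability under both models.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM).

    start[s] is the initial probability of state s, trans[r][s] the
    transition probability r -> s, emit[s][symbol] the emission probability.
    """
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs[t]]
                for s in states
            ]
        scale = sum(alpha)  # probability of obs[t] given the earlier symbols
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def log_likelihood_ratio(obs, deviant_hmm, regular_hmm):
    """Positive values mean the deviant model explains the prefix better."""
    return log_likelihood(obs, *deviant_hmm) - log_likelihood(obs, *regular_hmm)


# Hypothetical single-state models over two symbols, as (start, trans, emit).
DEVIANT = ([1.0], [[1.0]], [{"warning": 0.8, "paid": 0.2}])
REGULAR = ([1.0], [[1.0]], [{"warning": 0.2, "paid": 0.8}])

if __name__ == "__main__":
    print(log_likelihood_ratio(["warning", "warning"], DEVIANT, REGULAR))
```

The resulting number would be appended as one extra column to the index-based feature vector of the prefix.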
  32. Evaluation Setup
  33. Evaluation Results
  34. Predictive Monitoring with Unstructured Data
  35. Text mining
  36. Text-Extended Index-Based Encoding • Bag-of-N-grams • Weighted bag-of-N-grams • Latent Dirichlet Allocation (LDA) • Paragraph Vector (PV)
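Of the text models listed on slide 36, the bag-of-N-grams is the simplest to show. The sketch below counts all word n-grams up to a maximum length in one document. It tokenizes by lowercasing and splitting on whitespace, which is an assumption made for brevity; the later setup slide reports lemma counts, so the actual pipeline presumably lemmatizes. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous runs of n tokens, joined into one string each."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count every n-gram of length 1..max_n in a document."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag


if __name__ == "__main__":
    print(bag_of_ngrams("Call the debtor call"))
```

In a text-extended encoding, counts like these would be added as further features next to the control-flow and data features of each prefix.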
  37. Evaluation Setup
                          | Debt Recovery | Lead-to-contract
     # normal cases       | 13608         | 385
     # deviant cases      | 417           | 390
     Avg # words per doc  | 11            | 8
     # lemmas             | 11822         | 2588
     • Data split: 80% train, 20% test (randomly) • Handling imbalance: oversampling • Classifiers: random forest and logistic regression • Evaluation metrics: F-Score and earliness • Parameter-tuning: grid search with 5-fold cross validation on training set
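Two items from the setup on slide 37, oversampling and the F-Score, can be sketched in a few lines. The example assumes random oversampling with replacement until all classes are equally frequent, and the balanced F1 variant of the F-Score; the slide does not say which variants were used. The function names and the `positive="deviant"` default are hypothetical.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def f_score(y_true, y_pred, positive="deviant"):
    """F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    x, y = oversample(list(range(10)), ["normal"] * 8 + ["deviant"] * 2)
    print(Counter(y))
```

Oversampling is normally applied to the training portion only, so that duplicated cases do not leak into the test set. With 13608 normal against 417 deviant cases in the Debt Recovery log, plain accuracy would reward always predicting "normal", which is one reason to report an F-Score instead.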
  38. Evaluation Results
  39. Ongoing work: LSTM-Based Predictive Process Monitoring. Niek Tax, Ilya Verenich, Marcello La Rosa, Marlon Dumas: Predictive Business Process Monitoring with LSTM Neural Networks. CoRR abs/1612.02130 (2016).
  40. Online predictive process monitoring • Accurate, robust techniques to predict case outcome, covering control-flow, structured and textual data • LSTM-based architecture to predict • Next task + timestamp + resource or other attributes • Remaining execution path and time • All code available: • Clustering-based method: http://goo.gl/ykozBf • Index-based method: https://goo.gl/BQFk7k • Index-based method with textual features: https://goo.gl/a2DoWT • LSTM-based method: https://goo.gl/mkQDyy
