Research paper presentation on deep learning for predictive monitoring of business processes. Talk delivered by Niek Tax at the CAiSE'2017 conference, 15 June 2017. Research paper available at: http://tinyurl.com/yambgtng and source code available at: https://github.com/verenich/ProcessSequencePrediction/
Predictive Business Process Monitoring with LSTM Neural Networks
1. Predictive Business Process Monitoring with LSTM Neural Networks
Niek Tax (TU/e)
Ilya Verenich (QUT)
Marcello La Rosa (QUT)
Marlon Dumas (Tartu)
June 15th, 2017
4. Predictive Monitoring Example
Current situation
• What is the next activity for this case?
• When is this next activity going to take place?
• How long will this case take until it is finished?
• What is the outcome of this case? Is the compensation going to be paid, or rejected?
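The first three questions can be turned into prediction targets computed from every prefix of a completed case. The sketch below is a minimal illustration that assumes a trace is a list of (activity, timestamp-in-seconds) pairs sorted by time, and uses a hypothetical end-of-case label `END`. The outcome question is left out because it needs a case-level label that the slide does not define.

```python
END = "END"  # hypothetical end-of-case label


def prefix_examples(trace):
    """Turn one completed trace into training tuples.

    trace: list of (activity, timestamp_seconds) pairs, sorted by timestamp.
    Returns (prefix, next_activity, time_until_next_event, remaining_time) per prefix.
    """
    case_end = trace[-1][1]
    examples = []
    for k in range(1, len(trace) + 1):
        prefix = trace[:k]
        now = prefix[-1][1]
        if k < len(trace):
            next_activity, next_time = trace[k]
        else:
            # After the last event the case is complete: the target is the end label.
            next_activity, next_time = END, now
        examples.append((prefix, next_activity, next_time - now, case_end - now))
    return examples
```

For a trace `[("A", 0), ("B", 10), ("C", 30)]` this yields three examples, one per prefix length, with the last one predicting `END`.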
5. Recurrent Neural Networks
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436-444 (2015)
Plain RNNs have problems with long-term dependencies in the data!
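Why long-term dependencies are hard: in a plain RNN, the gradient of a late hidden state with respect to an early one is a product of per-step factors, and when those factors are smaller than 1 in magnitude the product shrinks geometrically with the distance (the vanishing-gradient problem). The toy below assumes a scalar tanh RNN with hypothetical weights and a constant input, purely to make that product visible.

```python
import math


def gradient_through_time(w, steps, h0=0.9, x=0.0, u=1.0, b=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + u * x + b).

    The same input x is fed at every step (a simplifying assumption for this toy).
    """
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + u * x + b)
        # Chain rule for one step: d h_t / d h_{t-1} = w * (1 - h_t ** 2)
        grad *= w * (1.0 - h * h)
    return grad
```

With `w = 0.5`, every factor is below 0.5, so after 20 steps the gradient is below 0.5 ** 20 (about 1e-6): the first event has almost no influence on learning at step 20.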
6. Long Short-Term Memory
Figure from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735-1780 (1997)
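For reference, the standard LSTM cell in the notation of the blog post credited above, where σ is the logistic sigmoid, ⊙ is element-wise multiplication and [h_{t-1}, x_t] is concatenation. Variants such as peephole connections differ slightly, and the slide does not say which variant the talk's model uses.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate}\\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{candidate cell state}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate}\\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state}
\end{aligned}
```

The additive update of C_t is what lets information persist across many steps when the forget gate stays close to 1, which is how LSTMs mitigate the long-term dependency problem of plain RNNs.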
How do we represent events as neural network inputs?
7. From Event to Feature Vector
One-hot encoding of the activity: <1,0,0,0,0,0,0,0>
• 8 different activities
• Each position represents one activity
Time features:
1) Time since midnight
2) Time since week start
3) Time since last event
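A minimal sketch of this event-to-vector step, assuming a hypothetical alphabet of 8 activities named "A" to "H", timestamps as Python `datetime` objects, time features in seconds, a week that starts on Monday at 00:00, and a time-since-last-event of 0 for the first event of a case. The one-hot part comes first here; the actual feature order used in the paper's code is not shown on the slide.

```python
from datetime import datetime, timedelta

# Hypothetical activity alphabet with 8 activities, matching the slide's example size.
ACTIVITIES = ["A", "B", "C", "D", "E", "F", "G", "H"]


def encode_event(activity, timestamp, prev_timestamp=None):
    """Feature vector: one-hot activity followed by three time features in seconds."""
    one_hot = [0] * len(ACTIVITIES)
    one_hot[ACTIVITIES.index(activity)] = 1

    midnight = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    # Assumption: the week starts on Monday 00:00 (weekday() == 0 for Monday).
    week_start = midnight - timedelta(days=timestamp.weekday())

    since_midnight = (timestamp - midnight).total_seconds()
    since_week_start = (timestamp - week_start).total_seconds()
    # Assumption: the first event of a case has no predecessor, so this feature is 0.
    if prev_timestamp is None:
        since_last = 0.0
    else:
        since_last = (timestamp - prev_timestamp).total_seconds()
    return one_hot + [since_midnight, since_week_start, since_last]
```

For an event "A" on Thursday 15 June 2017 at 10:30, preceded by an event at 10:00, this gives `[1, 0, 0, 0, 0, 0, 0, 0, 37800.0, 297000.0, 1800.0]`. Raw seconds span very different ranges, so in practice such features are usually rescaled before training; how the paper scales them is not stated on the slide.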
9. Data Sets for Evaluation
• Helpdesk log
- Ticketing management process of the helpdesk of an Italian software company
- Events: 13 710 | Cases: 3 804 | Activities: 9
• BPI Challenge 2012 W subprocess log
- A financial loan application process at a large financial institution in the Netherlands
- Events: 72 413 | Cases: 9 658 | Activities: 6
• Environmental Permit log
- Process of handling environmental permit applications at a Dutch municipality
- Events: 38 944 | Cases: 937 | Activities: 381
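The three statistics per log can be read off a flat event log. The sketch assumes a hypothetical record layout of (case_id, activity, timestamp) tuples; real logs (for example XES files) need to be parsed into this shape first.

```python
def log_statistics(events):
    """Count events, cases and distinct activities in a flat event log.

    events: iterable of (case_id, activity, timestamp) records (hypothetical layout).
    """
    events = list(events)
    cases = {case_id for case_id, _, _ in events}
    activities = {activity for _, activity, _ in events}
    return {"events": len(events), "cases": len(cases), "activities": len(activities)}
```

Seen this way, the Environmental Permit log is unusual: it has far more distinct activities (381) than the other two logs while containing the fewest cases (937).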
10. Baseline Technique for Time Prediction
van der Aalst, W.M.P., Schonenberg, M.H., Song, M.: Time prediction based on process mining. Information Systems 36(2), 450-475 (2011)
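The cited baseline builds a transition system from the log: each prefix is mapped to an abstract state (the paper discusses sequence, multiset and set abstractions with a configurable horizon), and each state is annotated with time information observed in the training data. A minimal sketch, assuming the sequence abstraction over the last `horizon` activities, the mean remaining time as the annotation, and a hypothetical fallback value for states never seen in training:

```python
from collections import defaultdict
from statistics import mean


def abstract_state(activities, horizon):
    """Sequence abstraction: the last `horizon` activities (horizon >= 1)."""
    return tuple(activities[-horizon:])


def fit_transition_system(traces, horizon=1):
    """Annotate each abstract state with the mean remaining time seen in training.

    traces: list of traces, each a list of (activity, timestamp_seconds) pairs sorted by time.
    """
    observed = defaultdict(list)
    for trace in traces:
        case_end = trace[-1][1]
        activities = [activity for activity, _ in trace]
        for k in range(1, len(trace) + 1):
            state = abstract_state(activities[:k], horizon)
            observed[state].append(case_end - trace[k - 1][1])
    return {state: mean(times) for state, times in observed.items()}


def predict_remaining_time(model, prefix_activities, horizon=1, default=0.0):
    """Look up the state of a running case; return `default` for unseen states."""
    return model.get(abstract_state(prefix_activities, horizon), default)
```

The same structure can be annotated with the time until the next event instead of the remaining time, which is the quantity compared against on the next slide.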
11. Predicting the Next Activity and its Time
• An architecture with two layers, one of them shared between the activity and time predictions, is the best architecture on both data sets
• LSTMs outperform traditional RNNs
• LSTMs outperform the transition-system-based approach for time-of-next-event prediction
• LSTMs are more accurate than 2 baseline methods in predicting the next activity on BPI'12 W
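One way to write down the best-performing configuration, a first LSTM layer shared by both tasks that feeds two task-specific LSTM layers, is sketched below. Here x_t is the feature vector of the t-th event and k is the prefix length; the symbols, the use of the last hidden state, and the linear time output are assumptions for illustration rather than details taken from the slide.

```latex
\begin{aligned}
h^{s}_t &= \mathrm{LSTM}_{s}\left(x_t,\; h^{s}_{t-1}\right) && \text{shared layer}\\
h^{a}_t &= \mathrm{LSTM}_{a}\left(h^{s}_t,\; h^{a}_{t-1}\right) && \text{activity branch}\\
h^{\tau}_t &= \mathrm{LSTM}_{\tau}\left(h^{s}_t,\; h^{\tau}_{t-1}\right) && \text{time branch}\\
\hat{y}^{a} &= \mathrm{softmax}\left(W_a\, h^{a}_k + b_a\right) && \text{next-activity distribution}\\
\hat{y}^{\tau} &= w_{\tau}^{\top} h^{\tau}_k + b_{\tau} && \text{time until the next event}
\end{aligned}
```

Because the shared layer receives gradients from both objectives, it is trained on both tasks at once, which is the multi-task setting behind the finding that one shared model beats two separate ones.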
12. Predicting the Suffix of a Case
Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Time and activity sequence prediction of business process instances. arXiv preprint arXiv:1602.07566 (2016)
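Suffix prediction can be done by repeatedly predicting the next event and feeding each prediction back in as input until an end-of-case label is predicted. A minimal sketch, assuming a hypothetical `predict_next` callable (standing in for a trained next-event model), prefixes as lists of (activity, time-since-previous-event) pairs, and a hypothetical `END` label:

```python
END = "END"  # hypothetical end-of-case label


def predict_suffix(prefix, predict_next, max_length=100):
    """Roll out the rest of a case by feeding each prediction back in.

    prefix: list of (activity, time_since_previous_event) pairs observed so far.
    predict_next: callable mapping a prefix to (next_activity, predicted_time_delta).
    max_length: cap on predicted events, since nothing forces the model to predict END.
    """
    history = list(prefix)
    suffix = []
    for _ in range(max_length):
        activity, delta = predict_next(history)
        if activity == END:
            break
        suffix.append((activity, delta))
        history.append((activity, delta))
    return suffix
```

Usage with a toy predictor that looks only at the last activity:

```python
toy = {"A": ("B", 10.0), "B": ("C", 20.0), "C": (END, 0.0)}
suffix = predict_suffix([("A", 0.0)], lambda h: toy[h[-1][0]])
remaining_time = sum(delta for _, delta in suffix)  # summing predicted deltas
```

Summing the predicted time deltas of the suffix gives a remaining-time estimate as a by-product.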
14. Conclusions
• LSTMs can be used as a general framework for prediction tasks in
the context of business processes that outperforms tailor-made
approaches on a range of tasks and data sets
• Predicting time and activity in one shared model outperforms
predicting both in separate models
• We identified a limitation of the technique when event logs contain
many repeated events
• Code and documentation are available at:
http://verenich.github.io/ProcessSequencePrediction
15. Neural Network Details
• Learning algorithm: Adam
• Loss Functions
• Batch Normalization layer after every two layers
Ioffe, S., Szegedy, C.: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 448-456 (2015)
Kingma, D., Ba, J.: Adam: A Method for Stochastic Optimization. In: Proceedings of the 3rd International Conference on Learning Representations (2015)
Loss function per objective:
• Activity: cross-entropy loss
• Time: mean absolute error (MAE) loss
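The two losses in plain Python, for a batch of predictions. How they are combined into one training objective is not stated on the slide; this sketch assumes an unweighted sum, which is what Keras does by default for a model with two outputs and no `loss_weights`. The small `eps` guard against log(0) is also an assumption.

```python
import math


def cross_entropy(probabilities, true_index, eps=1e-12):
    """Categorical cross-entropy for one example: -log p(true activity)."""
    return -math.log(max(probabilities[true_index], eps))


def mean_absolute_error(predicted, actual):
    """Mean absolute error over a batch of time predictions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def combined_loss(batch_probabilities, true_indices, predicted_times, true_times):
    """Multi-task objective: mean activity cross-entropy plus time MAE (equal weights assumed)."""
    ce = sum(cross_entropy(p, i) for p, i in zip(batch_probabilities, true_indices))
    ce /= len(true_indices)
    return ce + mean_absolute_error(predicted_times, true_times)
```

MAE penalizes time errors linearly, so a few very long waiting times in a log do not dominate the objective the way they would under squared error.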
Editor's Notes
This figure sketches a high-level architecture of a business process monitoring system that supports predictive monitoring.
Predictive monitoring produces recommendations for process workers during the execution of a case. These recommendations refer to a specific (uncompleted) case of the process and tell the user the impact of a given action on the probability that the case at hand will fail to fulfill the performance objectives or compliance rules. In particular, predictive monitoring can be used to raise alerts when certain actions are likely to lead to a violation of business constraints. In this way, rather than imposing what to do, the business process support system acts as a compliance monitoring and recommender system, raising flags whenever certain actions heighten the probability of undesirable deviations.
In this context, an SLA is either a condition that can be evaluated to true or false over every completed case of the process (e.g., “every simple insurance claim should be resolved at most 2 weeks after all required documents have been submitted”) or a performance objective (e.g., “every claim should be resolved within 2 months”).
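As an illustration of how such an alert could be raised from a time prediction, the sketch below checks the performance objective “every claim should be resolved within 2 months” against a predicted remaining time. The input format, the function name, and approximating 2 months as 60 days are all assumptions; the predicted remaining time would come from a model such as the LSTM in this talk.

```python
from datetime import timedelta

# Assumption: "2 months" approximated as 60 days.
DEADLINE = timedelta(days=60)


def cases_to_flag(running_cases, deadline=DEADLINE):
    """Return IDs of running cases whose predicted completion breaks the deadline.

    running_cases: dict mapping case_id -> (elapsed, predicted_remaining), both timedeltas
    (hypothetical input format).
    """
    return sorted(
        case_id
        for case_id, (elapsed, remaining) in running_cases.items()
        if elapsed + remaining > deadline
    )
```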
This slide is shown only at Benelearn; skip it at CAiSE.
Process discovery is the task of discovering a process model from an event log, extracted, for instance, from an ERP system. Each event in an event log refers to a case, which groups together events that belong together; here, each case represents a paper and each event represents a step in the submission process of that paper. Each case can be seen as an instance of the process. Depending on the algorithm, the discovered process model can be expressed in different process modeling notations; Petri nets are often used, as in the example on the screen.
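Grouping events by case and ordering them by time gives one trace per case, and counting which activity directly follows which within a case gives the directly-follows relation that many discovery algorithms (for example, the alpha algorithm) start from. A minimal sketch, assuming hypothetical (case_id, activity, timestamp) records from the paper-submission example; it stops at the directly-follows counts and does not construct a Petri net.

```python
from collections import Counter, defaultdict


def traces_from_events(events):
    """Group (case_id, activity, timestamp) events into per-case activity sequences ordered by time."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))
    # Sort each case by timestamp (ties are broken by activity name).
    return {case_id: [a for _, a in sorted(evts)] for case_id, evts in by_case.items()}


def directly_follows(traces):
    """Count how often activity x is immediately followed by activity y in the same case."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts
```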