Learning Accurate Business Process Simulation Models from Event Logs via Automated Process Discovery and Deep Learning

Paper presentation at the International Conference on Advanced Information Systems Engineering (CAiSE).
This paper presents an approach to automatically discover business process simulation models from event logs by combining process mining and deep learning techniques.
Paper available at: https://link.springer.com/chapter/10.1007/978-3-031-07472-1_4

  1. Learning Accurate Business Process Simulation Models from Event Logs via Automated Process Discovery and Deep Learning
     Manuel Camargo, Marlon Dumas, Oscar González-Rojas
  2. BUSINESS PROCESS SIMULATION
     How to assess the impact of a business process change on temporal & cost measures?
     • Cycle time
     • Processing / waiting times
     • Costs (per activity, per instance, per resource, …)
     • Resource utilization
     [Figure: example process with the activities Identify payment method, Process Credit Card, Accept Cash or Check, and Prepare package for customer, annotated with processing / waiting times (18 / 8 min, 20 / 5 min, 5 / 2 min, 10 / 5 min); other values shown: 1h 5 min, 38 min, 10 min]
  3. DESIGNING BUSINESS PROCESS SIMULATION (BPS) MODELS
     • Interviews
     • Expert knowledge
     • Observations
     • Sampling
     Time-consuming (hard to tune). Execution paths left aside. Prone to human errors. Unrealistic parameters.
     [Figure: process model with the activities Identify payment method, Process Credit Card, Accept Cash or Check, Process Virtual Currencies, and Prepare package for customer, and the decision points "Payment Method?" and "Credit Card Accepted?"]
  4. Data to the Rescue!
     Enterprise System (CRM, ERP, …) -> Event Log -> Simulation Model
  5. PROBLEM STATEMENT
     How to automatically create accurate business process simulation models based on data extracted from enterprise information systems?
  6. OBSERVATION
     A generative model of business processes is a statistical model constructed from an event log that can generate traces that resemble those observed in the log and other traces of the process (not observed in the log).
     A Business Process Simulation (BPS) model is a generative model of a business process.
  7. LEARNING GENERATIVE BUSINESS PROCESS MODELS
     Data-Driven Discrete Event Simulation
     • Use process mining & data mining techniques to discover branching probabilities, resource pools, resource calendars, etc.
     Deep Learning
     • Use deep learning methods to discover a neural network (e.g. an LSTM network) to predict successive events and to generate event sequences
  8. LEARNING GENERATIVE BUSINESS PROCESS MODELS
     Data-Driven Simulation
     • May take as input a process specification (helps with interpretability)
     • Requires specifications of resource constraints
     • Models the case creation process via a probability distribution
     • Assumes undifferentiated resources with robotic behavior
     • Models resource availability as calendars (possibly discovered from historical data)
     • Relies on branching probabilities to model local conditional choices
     • Provides a natural mechanism for capturing the effect of changes to the process
     Deep Learning (DL) Sequence Generation Methods
     • No interpretable process specification
     • Does not explicitly consider resource constraints
     • Learns the case arrival process from data
     • May capture differentiated resources and robotic behavior
     • Captures resource availability via non-linear functions
     • Branching behavior modeled via neural networks, which may capture complex relations
     • Does not have a mechanism for capturing the effect of changes to the process
     M. Camargo, M. Dumas, O. González Rojas: Discovering generative models from event logs: data-driven simulation vs deep learning. PeerJ Computer Science vol. 7, e577, 2021
  9. HYPOTHESIS
     By combining data-driven simulation and deep learning, we can learn generative business process models that are:
     1. more accurate than those we can learn by using these methods in isolation
     2. suitable both for “as is” and for “what if” analysis use cases
  10. DeepSimulator: HYBRID LEARNING OF BUSINESS PROCESS SIMULATION MODELS
      Input: event log (example traces such as {T1 -> T2 -> T3} and {T1 -> T3 -> T2})
      Phase 1: Activity sequence generation
      • Stochastic process model discovery & optimization
      • Generation of activity sequences
      Phase 2: Case start-times generation
      • Time-series analysis & optimization
      • Enrichment of traces with start-times
      Phase 3: Activity timestamps generation
      • Deep-learning models training & optimization
      • Enrichment of traces with timestamps
      Accuracy assessment
  11. DeepSimulator: HYBRID LEARNING OF BUSINESS PROCESS SIMULATION MODELS
      PHASE 1 - ACTIVITY SEQUENCE GENERATION
      • Control-flow discovery (BPMN model):
        • Split Miner algorithm
      • Discover the branching probabilities:
        I. Assign equal values to each conditional branch
        II. Compute the branching probabilities by replaying the aligned event log against the process model
      [Figure: process model over the activities A1 to A6, with two example activity sequences]
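The two ways of setting branching probabilities listed on this slide can be sketched as follows. This is a simplified illustration, not the DeepSimulator implementation: it assumes a decision point can be identified by the activity that precedes it, whereas a real replay would align each trace against the discovered BPMN model. The function names, the `branches` mapping, and the toy traces are all hypothetical.

```python
from collections import Counter, defaultdict


def equal_probabilities(branches):
    """Option I: every outgoing branch of a decision point gets the same probability."""
    return {gw: {b: 1.0 / len(bs) for b in bs} for gw, bs in branches.items()}


def replay_probabilities(branches, traces):
    """Option II (simplified): count which branch follows each decision point
    in the observed traces, then normalize the counts into probabilities."""
    counts = defaultdict(Counter)
    for trace in traces:
        for prev, nxt in zip(trace, trace[1:]):
            if prev in branches and nxt in branches[prev]:
                counts[prev][nxt] += 1
    probs = {}
    for gw, bs in branches.items():
        total = sum(counts[gw].values())
        # Fall back to equal probabilities if the decision point was never observed.
        probs[gw] = {b: (counts[gw][b] / total if total else 1.0 / len(bs)) for b in bs}
    return probs


# Hypothetical decision after A1: continue with A2 or with A3.
branches = {"A1": ["A2", "A3"]}
traces = [
    ["A1", "A2", "A4"],
    ["A1", "A2", "A4"],
    ["A1", "A3", "A4"],
    ["A1", "A2", "A4"],
]
equal = equal_probabilities(branches)
replayed = replay_probabilities(branches, traces)
```

With these toy traces, option I assigns 0.5 to each branch, while the replay-based estimate reflects that A2 followed A1 in three of the four traces.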
  12. DeepSimulator: HYBRID LEARNING OF BUSINESS PROCESS SIMULATION MODELS
      PHASE 2 - CASE START-TIMES GENERATION
      Time series decomposition (trend, seasonality, and holidays):
          y(t) = g(t) + s(t) + h(t) + ε_t
      • Trend g(t): models nonperiodic changes in the value of the time series. Saturating growth model (logistic growth model).
      • Seasonality s(t): represents periodic changes (e.g., weekly and yearly seasonality). Fourier series.
      • Holidays h(t): represents the effects of holidays that occur on potentially irregular schedules over one or more days. Introduced by the analyst.
      • Error ε_t: idiosyncratic changes which are not accommodated by the model.
      [Figure: process model over the activities A1 to A6, with an example activity sequence]
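To make the additive decomposition concrete, here is a minimal sketch of the trend and seasonality terms, assuming the simplest textbook forms: a logistic curve with constant capacity for g(t) and a truncated Fourier series for s(t). The parameter names and all numeric values are illustrative assumptions; the slide does not give fitted values, and the noise term is omitted.

```python
import math


def logistic_trend(t, capacity, rate, offset):
    """Saturating (logistic) growth: approaches `capacity` as t grows."""
    return capacity / (1.0 + math.exp(-rate * (t - offset)))


def fourier_seasonality(t, period, coeffs):
    """Truncated Fourier series; coeffs is a list of (a_n, b_n) pairs for n = 1, 2, ..."""
    total = 0.0
    for n, (a, b) in enumerate(coeffs, start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a * math.cos(angle) + b * math.sin(angle)
    return total


def expected_value(t, holiday_effect=0.0):
    """y(t) without the error term: g(t) + s(t) + h(t), with made-up parameters."""
    g = logistic_trend(t, capacity=100.0, rate=0.1, offset=30.0)
    s = fourier_seasonality(t, period=7.0, coeffs=[(5.0, 2.0)])  # weekly pattern
    return g + s + holiday_effect
```

In this sketch t is measured in days, so a period of 7.0 models a weekly cycle; a holiday contributes a fixed offset on the days the analyst marks.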
  13. DeepSimulator: HYBRID LEARNING OF BUSINESS PROCESS SIMULATION MODELS
      PHASE 3 - ACTIVITY TIMESTAMPS GENERATION
      • Waiting time predictive model. Features: Wait+Ac2+Cx+WIP+RO
      • Processing time predictive model. Features: Proc+Ac1+Cx+WIP+RO
      [Figure: trace σ1 with two consecutive activities Ac1 and Ac2 and the timestamps e1-start, e1-complete, e2-start, e2-complete; process model over the activities A1 to A6]
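The two quantities the predictive models target can be read off the start and complete timestamps in the figure. The sketch below shows one plausible way to derive them from a trace; it assumes activities within a case run sequentially, sets the waiting time of the first event to zero, and uses hypothetical field names and timestamps. It does not compute the other features (Cx, WIP, RO).

```python
from datetime import datetime


def add_durations(trace):
    """trace: list of dicts with 'activity', 'start', 'complete', ordered by start time.

    Processing time = complete - start of the same event.
    Waiting time    = start of this event - complete of the previous event.
    """
    enriched = []
    prev_complete = None
    for ev in trace:
        processing = (ev["complete"] - ev["start"]).total_seconds()
        if prev_complete is None:
            waiting = 0.0  # assumption: no waiting recorded before the first event
        else:
            waiting = max(0.0, (ev["start"] - prev_complete).total_seconds())
        enriched.append({**ev, "processing_s": processing, "waiting_s": waiting})
        prev_complete = ev["complete"]
    return enriched


# Illustrative trace with two events (timestamps are made up).
trace = [
    {"activity": "Ac1", "start": datetime(2022, 6, 1, 9, 0), "complete": datetime(2022, 6, 1, 9, 18)},
    {"activity": "Ac2", "start": datetime(2022, 6, 1, 9, 26), "complete": datetime(2022, 6, 1, 9, 46)},
]
enriched = add_durations(trace)
```

Here Ac1 is processed for 18 minutes, Ac2 waits 8 minutes after Ac1 completes, and Ac2 is then processed for 20 minutes.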
  14. DATASETS (EVENT LOGS)
      Size   Source  Log     #Traces  #Events  #Act.  Avg. act./trace  Avg. dur. (days)  Max. dur. (days)  Description
      LARGE  R       POC       70512   415261      8             5.89             15.21            269.23  Undisclosed banking process*
      LARGE  R       BPI17W    30276   240854      8             7.96             12.66            286.07  Dutch financial institution updated
      LARGE  R       BPI12W     8616    59302      6             6.88              8.91             85.87  Dutch financial institution
      LARGE  S       CVS       10000   103906     15            10.39              7.58              21.0  CVS retail pharmacy**
      LARGE  S       CFM        1670    44373     29            26.57              0.76              5.83  Anonymized confidential process**
      SMALL  R       INS        1182    23141      9            19.58             70.93             599.9  Insurance claims process*
      SMALL  R       ACR         954     4962     16              5.2             14.89            135.84  Academic Credential Recognition
      SMALL  R       MP          225     4503     24            20.01             20.63              87.5  Manufacturing Production
      SMALL  S       CFS         800    21221     29            26.53              0.83              4.09  Anonymized confidential process**
      SMALL  S       P2P         608     9119     21               15             21.46            108.31  Purchase-to-Pay process
      (*) Private logs, (**) Generated from simulation models of real processes
  15. EXP1 - AS-IS ANALYSIS USING DEEPSIMULATOR
      • Time splitting of the event log: Partition 1 (70%) for training, Partition 2 (30%) for testing, with a further Training (80%) / Validation (20%) split
      • Deep Learning Trainer -> BEST DL MODEL -> Trace generator
      • Simod parameter extraction -> BEST SIM MODEL -> Simulator
      • DeepSimulator -> BEST DEEP SIM MODEL -> Deep Simulator
      • Evaluator: ELS/DL/MAE/EMD
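A chronological split of this kind can be sketched in a few lines. The example assumes that traces are ordered by their start time and that the 80% / 20% training and validation split is applied inside the 70% partition; the slide's diagram suggests this but does not spell it out. The `time_split` helper and the toy data are hypothetical.

```python
def time_split(traces, first_fraction):
    """Order traces by start time and cut once, so the second part is strictly later."""
    ordered = sorted(traces, key=lambda tr: tr["start"])
    cut = round(len(ordered) * first_fraction)
    return ordered[:cut], ordered[cut:]


# Toy log: 100 traces whose start times are given in arbitrary (descending) order.
traces = [{"case": i, "start": 100 - i} for i in range(100)]

train_val, test = time_split(traces, 0.7)   # Partition 1 (70%) / Partition 2 (30%)
train, val = time_split(train_val, 0.8)     # Training (80%) / Validation (20%)
```

Splitting by time rather than at random keeps every test trace later than every training trace, which avoids evaluating a model on cases that started before the ones it was trained on.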
  16. EXP1 – EVALUATION RESULTS
      DeepSimulator generally outperforms classical data-driven simulation (DDS) w.r.t. temporal measures.
  17. EXP2 - WHAT-IF ANALYSIS (ARRIVAL INTENSITY)
      Scenario 1
      • Logs: BPI17W, BPI12W, CVS
      • Version 1: Demand modified (D)
      • Version 2: Demand + waiting times modified (TD)
      [Figure: log divided into Batch 1 to Batch 6]
  18. EXP2 - WHAT-IF ANALYSIS (ADDING A NEVER-BEFORE-OBSERVED ACTIVITY)
      Scenario 2
      • Time splitting of the event log: Partition 1 (70%), Partition 2 (30%) for testing; Training (80%) / Validation (20%)
      • BASELINE MODEL: remove activity, modified log, train DSIM baseline model, simulation model, generate log
      • UPDATED MODEL: list of changes, update models, update embeddings, replace embeddings of generative models, generate log
      • Train DSIM modified model, simulation model
      • Evaluator: MAE/RMSE/SMAPE
  19. EXP2 - EVALUATION RESULTS
      • DeepSimulator can better estimate the impact of changes in the demand in settings where such changes have been previously observed in the data.
      • The accuracy of DeepSimulator degraded when evaluated in a previously unobserved scenario (a new activity is added to the process).
      Scenario 1
      Version    Log     MAE SIMOD  MAE DSIM  EMD SIMOD  EMD DSIM  DTW SIMOD  DTW DSIM
      Version 1  BPI17W     971151    417572    0.02222   0.03593       3185      3647
      Version 1  BPI12W     660211    534341    0.11295   0.04853        515       458
      Version 1  CVS       1489252    467572    0.03213   0.00001       3380       849
      Version 2  BPI17W     895524    290980    0.06438   0.03218       4528      3431
      Version 2  BPI12W     550266    524995    0.25888   0.22003        726       507
      Version 2  CVS        540112    246159    0.15674   0.05708       2453      1967
      Scenario 2
      Log  MAE AS-IS  MAE WHAT-IF  RMSE AS-IS  RMSE WHAT-IF  SMAPE AS-IS  SMAPE WHAT-IF
      CFM       7155        17546       22006         33137      0.15629        0.28762
      CVS     283061      1040344      357717       1052255      0.31972        1.84601
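For readers unfamiliar with the error measures reported for Scenario 2, the sketch below gives generic definitions of MAE, RMSE, and SMAPE over paired observed and simulated values. The slides do not state which quantities are paired or which SMAPE variant was used; the version here, bounded between 0 and 2, is an assumption chosen because one reported SMAPE value exceeds 1.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def smape(actual, predicted):
    """Symmetric mean absolute percentage error (assumed variant, range 0 to 2)."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        terms.append(0.0 if denom == 0 else abs(a - p) / denom)
    return sum(terms) / len(terms)


# Made-up observed vs. simulated durations, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
errors = (mae(observed, simulated), rmse(observed, simulated), smape(observed, simulated))
```

RMSE is never smaller than MAE on the same data, which matches the pattern in the Scenario 2 table, where each RMSE value is larger than the corresponding MAE.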
  20. CONCLUSION
      • The DeepSimulator method combines data-driven simulation to capture the control-flow perspective of a process with deep learning techniques to capture the temporal perspective.
      • The evaluation in the AS-IS setting shows that DeepSimulator outperforms a pure Data-Driven Discrete Event Simulation method and a pure Deep Learning method.
      • The evaluation on WHAT-IF analysis scenarios shows that DeepSimulator can better estimate the impact of changes on the arrival rate of cases (the demand) in settings where such changes have been previously observed in the data.
      • However, the accuracy of DeepSimulator degrades when applied to a previously unobserved scenario, specifically a scenario where a completely new activity is added to the process.
  21. FUTURE WORK
      • Explore other mechanisms for modeling the activities of the process via embeddings (e.g., word2vec, transformer models).
      • Generate events that include resource and domain-specific attributes.
      • Support a broader range of changes, such as changes in the resource perspective.
  22. QUESTIONS?
      Ph.D. Manuel Camargo
      University of Tartu, Universidad de los Andes
      manuel.camargo@ut.ee