
Process Mining: Past, Present, and Open Challenges (AIST 2017 Keynote)

Since the first algorithms for automatically discovering process models from event logs were proposed in the late 1990s, the problem of obtaining insights into processes by mining event logs has gained growing attention. By now, the field has grown into a maturing discipline, and industry has begun adopting process mining in regular operations, supported by several commercial process mining solutions available on the market.

In the early days of process mining, several algorithms for constructively discovering a process model from an event log were proposed, each pursuing its own principles for constructing a model. This first generation of process discovery techniques, which includes for instance the alpha-algorithm, paved the way for process mining as a research discipline. As these algorithms were applied in practice, new research challenges showed up, sparking new results in both pre-processing event data and evaluating process models on event logs. The latter in particular deepened the understanding of the challenges in process mining and established a reliable feedback mechanism in the form of conformance checking. This feedback mechanism enabled research on a second generation of process mining techniques addressing a large variety of problems, such as quality guarantees for discovered models, including the data perspective in discovered models, or discovering temporal logic constraints. In particular, the inductive miner family was seen as a new milestone, as it provided a systematic way to develop process discovery algorithms with reliable results. Yet again, as these more capable techniques are being applied to the growing and more detailed event data recorded in practice, further unsolved challenges arise.

In the first part of my talk, I will draw an arc from the early days of process mining to the current state of the art, highlighting central techniques and their impact on later developments. In the second part, I will turn to the kinds of event data and challenges found in practice today, how existing process mining techniques fail to address them, and thus which open challenges and opportunities the process mining field offers, also for researchers from other domains.

Published in: Data & Analytics

  1. 1. Process Mining: Past, Present, and Open Challenges. Dirk Fahland (d.fahland@tue.nl), @dfahland
  2. 2. Design vs Actual Use
  3. 3. Process Design vs Actual Use 1
  4. 4. Process Design vs Actual Use 2
  5. 5. Process Design vs Actual Use: “1 returned”, “refund 1”
  6. 6. Process Design vs Actual Use: “refund 2”, “refund 1”
  7. 7. Actual use… unknown. “Hey… what’s your return order process?” “Just use our app, send item, receive money.”
  8. 8. What is Process Mining? Diagram labels: process mining; stochastics; operations management & research; business process management; process automation & optimization; formal methods & concurrency theory; business process improvement; workflow management; process science. ©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
  9. 9. … the link between Process Science and Data Science. ©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
  10. 10. Discover actual use of a system: read the traces
  11. 11. Traces left by (Information) Systems: event log. Columns: OrderID | Activity | Time | Source | Product | …. Rows: 302 | Receive Order | 09.02 22:15 | Web | 1, 2, 3; 412 | Receive Order | 14.02 22:21 | … | …; 302 | Create Return # | 15.02 11:25 | App | 1; 302 | Create Return # | 15.02 11:27 | App | 2; …; 302 | Receive Package | 17.02 9:24 | Cam1 | 1; 412 | …; 302 | Customer Call | 18.02 20:13 | Anna | 2; …. Annotations: What? When? Who? Which case?
  12. 12. Traces left by (Information) Systems: event log. Columns: OrderID | Activity | Time. Rows: 302 | Receive Order | 09.02 22:15; 412 | Receive Order | 14.02 22:21; 302 | Create Return # | 15.02 11:25; 302 | Create Return # | 15.02 11:27; …; 302 | Receive Package | 17.02 9:24; 412 | …; 302 | Customer Call | 18.02 20:13; …. Event sequence: …, Create Return #, Create Return #, Receive Package, Receive Order, Customer Call, Receive Order, …
  13. 13. Process Discovery. Figure labels: event log (…, Create Return #, Create Return #, Receive Package, Receive Order, Customer Call, Create Order, …); discover; process model (Create Order, Receive Return, Create Return, Customer Call, …); describes; simple…
  14. 14. Process Mining… Past, Present, Open Challenges (figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end)
  15. 15. Learning Automata: Directly-Follows Graph
  16. 16. Learning Automata: Directly-Follows Graph and K-Tails. state = “sequences of next k activities”. Mining = find structure in these relations [Cook, Wolf 1995-1998], [Cohen, Maoz 2014]
  17. 17. Learning Concurrency. Inductive Miner: B and C concurrent
  18. 18. Learning Concurrency. Inductive Miner: B and C concurrent; reveals true frequencies, local repetitions, …; zoom in
  19. 19. Learning Models with Concurrency: ILP Miner [Werf, Dongen, Hurkens, Serebrenik 2009]. Traces: ABCD, ACBD, AED (transitions A, B, C, D, E). “D must happen before B” → prevents traces #1 and #2 → don’t add place. “A must happen before B or E” → allows all traces → add place. → Encode as ILP problem
  20. 20. Learning Models with Concurrency: ILP Miner [Werf, Dongen, Hurkens, Serebrenik 2009]. Traces: ABCD, ACBD, AED (figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end). Alpha Algorithm: construct places based on binary relations (derived from directly-follows graph) [Aalst, Weijters, Maruster 2004]
  21. 21. Precise Semantics and “Messy” Data. Road Traffic Fines Log. ILP Miner: fitting, but complex. Alpha Miner: “unsound” (no proper behavior)
  22. 22. Less precise: the Visual Approach. Directly & Eventually Follows Relation: thresholds for filtering edges + structural simplification. Heuristics Miner [Agrawal, Gunopulos, Leymann 1998] [Weijters, Aalst 2001]. Road Traffic Fines Log
  23. 23. Many Process Discovery Algorithms…: alpha, ILP, Heuristics, Transition System, Fuzzy, Disco
  24. 24. … and the Challenges of Real-Life Data: ILP, Transition System, alpha, Heuristics, Fuzzy, Disco; show/hide details
  25. 25. …but concurrency matters for… frequencies, performance analysis, simplicity (figure values: 2hrs, 7hrs, 1.6hrs, 4.5hrs)
  26. 26. How to get correct models on real data? Past, Present, Open Challenges (figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end)
  27. 27. Quality and Forces in Process Discovery: log, process model; positive examples only
  28. 28. Quality and Forces in Process Discovery [Buijs, van Dongen, Aalst 2014]: log, process model; ensure fitness, generalize, increase precision, simple models
  29. 29. The Process Discovery Problem: event log, discover, process model; fitting and precise; can rediscover (generalizes); Simple, Sound, Semantics; Analysis
  30. 30. Basic Process Discovery Principle: event log → extract behavioral specification → synthesize process model → process model
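  31. 31. Bottom-Up Discovery: Directly-Follows Relation. Traces: ACDE, ADCE, ADECFDE, BDEC, BCDEFDE, BDEFCDE. Activities: A, B, C, D, E, F
  32. 32. Dominant Behavioral Relation: Sequence Cut. Traces split as: A CDE, A DCE, A DECFDE, B DEC, B CDEFDE, B DEFCDE (figure: directly-follows graph over A, B, C, D, E, F and partial process tree)
  33. 33. Split Along Cut & Recurse. Traces split as: A CDE, A DCE, A DECFDE, B DEC, B CDEFDE, B DEFCDE (figure: directly-follows graph over A, B, C, D, E, F and partial process tree)
  34. 34. Choice Cut & Base Case. Traces split as: A CDE, A DCE, A DECFDE, B DEC, B CDEFDE, B DEFCDE (figure: directly-follows graph over A, B, C, D, E, F and partial process tree)
  35. 35. Parallel Cut on C, D, E, F. Traces split as: A CDE, A DCE, A DECFDE, B DEC, B CDEFDE, B DEFCDE (figure: directly-follows graph and partial process tree with subtree over A, B)
  36. 36. Loop Cut on D, E, F (C split off). Traces split as: A CDE, A DCE, A DECFDE, B DEC, B CDEFDE, B DEFCDE (figure: directly-follows graph and partial process tree)
  37. 37. Loop Cut on D, E, F (C split off), continued. Traces split as: A CDE, A DCE, A DECFDE, B DEC, B CDEFDE, B DEFCDE (figure: directly-follows graph and partial process tree)
  38. 38. … until All Bases Reached. Traces split as: A CDE, A DCE, A DECFDE, B DEC, B CDEFDE, B DEFCDE (figure: directly-follows graph and process tree)
  39. 39. Sequence, Choice, Parallel, Loop (or “Flower”). Traces split as: A CDE, A DCE, A DECFDE, B DEC, B CDEFDE, B DEFCDE (figure: directly-follows graph and process tree)
  40. 40. Process Tree = Block-Structured Model (figure: process tree over A, B, C, D, E, F and the corresponding block-structured model)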
  41. 41. Inductive Miner: sound, fitting models (+/- filtering) → allows for reliable analysis of behavior [Leemans, Fahland, Aalst 2013-2015]
  42. 42. Inductive Miner: sound, fitting models (+/- filtering) → allows for reliable analysis of behavior [Leemans, Fahland, Aalst 2013-2015]. Adding details: 59000x credit collection, 4000x appeal
  43. 43. Inductive Miner: sound, fitting models (+/- filtering) → allows for reliable analysis of behavior [Leemans, Fahland, Aalst 2013-2015]. Animate flow of cases
  44. 44. Highlight deviations
  45. 45. Inductive Miner: sound, fitting models (+/- filtering) → allows for reliable analysis of behavior [Leemans, Fahland, Aalst 2013-2015]. Analyze performance: 87 days until fine is sent
  46. 46. Combining Process Mining and Data Mining [Leoni et al 2013]. Conditions for choices: “Appeal to Judge” if amount  36 EUR
  47. 47. Where are we now? Process Mining Software: www.promtools.org. 1500+ plug-ins available covering the whole process mining spectrum; >150k downloads. ©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
  48. 48. Commercial Uptake. ©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
  49. 49. IEEE CIS Taskforce on Process Mining: https://www.win.tue.nl/ieeetfpm/
  50. 50. So, has the problem been cracked? Past, Open Challenges (figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end)
  51. 51. So, has the problem been cracked? Past, Present (figure: Petri net with transitions A, B, C, D, E and places start, p1, p2, p3, p4, end)
  52. 52. Processes may follow many different variants. Purchasing Process: all variants in one model → very imprecise
  53. 53. Cluster traces based on similarity of event context [Lu et al. 2015-2017]
  54. 54. Cluster traces based on similarity of event context [Lu et al. 2015-2017]
  55. 55. Put events into data context: decompose [van Eck, Sidorova, Aalst 2016]. Create Sales Order Position; Creating Invoice
  56. 56. Put events into data context: decompose [Aalst, Kalenkova, Rubin, Verbeek 2014]. Traces: (Register, Select Flight, Select Hotel, Book Flight, Book Hotel, Pay); (Register, Select Flight, Select Hotel, Book Hotel, Book Flight, Pay); (Register, Select Flight, Book Flight, Select Hotel, Book Hotel, Pay); (Register, Select Flight, Select Hotel, Cancel); (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Cancel); (Register, Select Hotel, Book Hotel, Pay); (Register, Select Hotel, Cancel); (Register, Select Hotel, Book Hotel, Pay); (Register, Select Hotel, Book Hotel, Pay)
  57. 57. Put events into data context: decompose & recompose [Aalst, Kalenkova, Rubin, Verbeek 2014]. Traces: (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Cancel); (Register, Select Hotel, Book Hotel, Pay); (Register, Select Hotel, Cancel); (Register, Select Hotel, Book Hotel, Pay); (Register, Select Hotel, Book Hotel, Pay)
  58. 58. Put events into data context: decompose & recompose [Aalst, Kalenkova, Rubin, Verbeek 2014]. Traces: (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Book Flight, Pay); (Register, Select Flight, Cancel); (Register, Select Hotel, Book Hotel, Pay); (Register, Select Hotel, Cancel); (Register, Select Hotel, Book Hotel, Pay); (Register, Select Hotel, Book Hotel, Pay). Allows discovering non-block structured models!
  59. 59. Post-Process: restructure output of Heuristics Miner [Augusto et al. 2016]
  60. 60. Process Mining = Discovery + Conformance + Extension + Log Preprocessing + … Figure labels: event log; discover; model of actual process; model of intended process; check conformance; deviations between actual and intended process; extend; enriched model. Topics: Filtering; Clustering; Activity identification; Deviation detection; Partially ordered event data; Event log visualization; Database tables; Database logs; Event streams; IoT devices. → www.promtools.org
  61. 61. Opportunities for Data Mining in Process Mining. Find patterns and contexts: identify variants; identify independence → concurrency; aggregate sets of low-level events to high-level activities. Learn prediction models: outcomes of a process based on case features; detect deviations/risks early on. Mine and integrate domain knowledge: identify patterns/variants/views that fit domain expectations; enrich models with domain concepts.
  62. 62. How to get started? Get ProM: www.promtools.org. Get event logs: real-life event logs https://data.4tu.nl/repository/collection:event_logs_real ; synthetic event logs https://data.4tu.nl/repository/collection:event_logs_synthetic . Read up on analyses: case studies https://www.win.tue.nl/ieeetfpm/doku.php?id=shared:process_mining_case_studies ; BPI Challenge 2017 (and all previous editions) https://www.win.tue.nl/bpi/doku.php?id=2017:challenge . Take a free online course on Process Mining: https://www.coursera.org/learn/process-mining/ ; https://www.futurelearn.com/courses/process-mining ; https://www.futurelearn.com/courses/process-mining-healthcare . Check the literature list on the next page.
  63. 63. Literature
      1. Jonathan E. Cook, Alexander L. Wolf: Automating Process Discovery through Event-Data Analysis. 17th International Conference on Software Engineering 1995: 73
      2. Jonathan E. Cook, Alexander L. Wolf: Discovering Models of Software Processes from Event-Based Data. ACM Trans. Softw. Eng. Methodol. 7: 215-249 (1998)
      3. Jonathan E. Cook, Alexander L. Wolf: Event-Based Detection of Concurrency. ACM SIGSOFT 1998
      4. Rakesh Agrawal, Dimitrios Gunopulos, Frank Leymann: Mining Process Models from Workflow Logs. EDBT 1998
      5. A. J. M. M. Weijters, Wil M. P. van der Aalst: Process Mining: Discovering Workflow Models from Event-Based Data. 2001
      6. Laura Maruster, A. J. M. M. Weijters, Wil M. P. van der Aalst, Antal van den Bosch: Process Mining: Discovering Direct Successors in Process Logs. Discovery Science 2002
      7. Wil M. P. van der Aalst, A. J. M. M. Weijters, Laura Maruster: Workflow mining: discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering 16: 1128-1142 (2004)
      8. Jan Martijn E. M. van der Werf, Boudewijn F. van Dongen, Cor A. J. Hurkens, Alexander Serebrenik: Process Discovery using Integer Linear Programming. Fundam. Inform. 94(3-4): 387-412 (2009)
      9. Sander J. J. Leemans, Dirk Fahland, Wil M. P. van der Aalst: Discovering Block-Structured Process Models from Event Logs - A Constructive Approach. Petri Nets 2013: 311-329
      10. Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Giorgio Bruno: Automated Discovery of Structured Process Models: Discover Structured vs. Discover and Structure. ER 2016: 313-329
      11. Wil M. P. van der Aalst, Anna A. Kalenkova, Vladimir A. Rubin, Eric Verbeek: Process Discovery Using Localized Events. Petri Nets 2015: 287-308
      12. Hila Cohen, Shahar Maoz: The confidence in our k-tails. ASE 2014
      13. Xixi Lu, Dirk Fahland, Wil M. P. van der Aalst: Interactively Exploring Logs and Mining Models with Clustering, Filtering, and Relabeling. BPM (Demos) 2016: 44-49
      14. Xixi Lu, Dirk Fahland: A Conceptual Framework for Understanding Event Data Quality for Behavior Analysis. ZEUS 2017: 11-14
      15. Xixi Lu, Dirk Fahland, Frank J. H. M. van den Biggelaar, Wil M. P. van der Aalst: Detecting Deviating Behaviors Without Models. Business Process Management Workshops 2015: 126-139
      16. Maikel L. van Eck, Natalia Sidorova, Wil M. P. van der Aalst: Discovering and Exploring State-Based Models for Multi-perspective Processes. BPM 2016: 142-157
      17. Massimiliano de Leoni, Marlon Dumas, Luciano García-Bañuelos: Discovering Branching Conditions from Business Process Execution Logs. FASE 2013: 114-129
      18. Massimiliano de Leoni, Wil M. P. van der Aalst: Data-aware process mining: discovering decisions in processes using alignments. SAC 2013: 1454-1461
      19. Joos C. A. M. Buijs, Boudewijn F. van Dongen, Wil M. P. van der Aalst: Quality Dimensions in Process Discovery: The Importance of Fitness, Precision, Generalization and Simplicity. Int. J. Cooperative Inf. Syst. 23(1) (2014)
      20. Arya Adriansyah, Boudewijn F. van Dongen, Wil M. P. van der Aalst: Conformance Checking Using Cost-Based Fitness Analysis. EDOC 2011: 55-64
      21. Jorge Munoz-Gama, Josep Carmona: A Fresh Look at Precision in Process Conformance. BPM 2010: 211-226
      22. Arya Adriansyah, Jorge Munoz-Gama, Josep Carmona, Boudewijn F. van Dongen, Wil M. P. van der Aalst: Measuring precision of modeled behavior. Inf. Syst. E-Business Management 13(1): 37-67 (2015)
  64. 64. Process Mining: Past, Present, and Open Challenges. Dirk Fahland (d.fahland@tue.nl), @dfahland
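
The directly-follows relation on slide 31 and the sequence cut on slide 32 can be illustrated with a short Python sketch on the six example traces. This is a minimal illustration under stated assumptions, not the Inductive Miner as implemented in ProM: the function names (`directly_follows`, `reachability`, `sequence_cut`) are hypothetical, and the cut detection is a simplified grouping that merges two activities when they are either mutually reachable or mutually unreachable in the directly-follows graph.

```python
from collections import Counter
from itertools import combinations

# The six example traces from slide 31; each character is one activity.
LOG = ["ACDE", "ADCE", "ADECFDE", "BDEC", "BCDEFDE", "BDEFCDE"]


def directly_follows(log):
    """Count directly-follows pairs (a, b) plus start and end activities."""
    dfg, starts, ends = Counter(), Counter(), Counter()
    for trace in log:
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg, starts, ends


def reachability(dfg):
    """Transitive closure of the directly-follows graph (fixpoint iteration)."""
    activities = sorted({x for edge in dfg for x in edge})
    reach = {a: {b for (x, b) in dfg if x == a} for a in activities}
    changed = True
    while changed:
        changed = False
        for a in activities:
            extra = set()
            for b in reach[a]:
                extra |= reach[b]
            if not extra <= reach[a]:
                reach[a] |= extra
                changed = True
    return reach


def sequence_cut(dfg):
    """Simplified sequence cut (assumption: not the full published algorithm).

    Two activities end up in the same part if each can reach the other,
    or if neither can reach the other. Parts are ordered so that a part
    reaching more activities outside itself comes first.
    """
    reach = reachability(dfg)
    group = {a: {a} for a in reach}
    for a, b in combinations(sorted(reach), 2):
        if (b in reach[a]) == (a in reach[b]):
            merged = group[a] | group[b]
            for x in merged:
                group[x] = merged
    parts = {frozenset(g) for g in group.values()}
    return sorted(
        (sorted(p) for p in parts),
        key=lambda p: len(reach[p[0]] - set(p)),
        reverse=True,
    )


dfg, starts, ends = directly_follows(LOG)
print(dfg[("D", "E")], dict(starts), dict(ends))
print(sequence_cut(dfg))
```

On this log the sketch separates the activities into the parts A, B and C, D, E, F, which matches the split of each trace after its first activity shown on slide 32. Real logs with noise would additionally need the filtering the slides mention for the Inductive Miner.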
