
Introduction to Business Process Monitoring and Process Mining

Two-day course delivered at the Chinese Business Process Management (BPM) Summer School in Jinan, China, 23-24 August 2018. The course introduces a range of techniques, tools, and algorithms for process monitoring and mining.

Published in: Data & Analytics
Introduction to Business Process Monitoring and Process Mining

  1. 1. Introduction to Business Process Monitoring and Process Mining Marlon Dumas University of Tartu, Estonia marlon.dumas@ut.ee
  2. 2. 3 months later 2
  3. 3. Back to basics… 1. Any process is better than no process 2. A good process is better than a bad process 3. Even a good process can be improved 4. Any good process eventually becomes a bad process […unless continuously cared for] (Michael Hammer) 3
  4. 4. 4
  5. 5. Part 1: Techniques for Business Process Monitoring 5
  6. 6. Business Process Monitoring Dashboards & reports Process mining Event stream DB logs Event log
  7. 7. Process Dashboards. Types of process dashboards: • Operational dashboards (runtime) • Tactical dashboards (historical) • Strategic dashboards (historical)
  8. 8. Operational process dashboards • Aimed at process workers & operational managers • Emphasis on monitoring (detect-and-respond), e.g.: - Work-in-progress - Problematic cases – e.g. overdue/at-risk cases - Resource load
  9. 9. Tactical dashboards • Aimed at process owners / managers • Emphasis on analysis and management, e.g. detecting bottlenecks • Typical process performance indicators: cycle times, error rates, resource utilization
  10. 10. Tactical Performance Dashboard @ Australian Insurer
  11. 11. Strategic dashboards • Aimed at executives & managers • Emphasis on linking process performance to strategic objectives
  12. 12. Strategic Performance Dashboard @ Australian Utilities Provider
Key Performance \ Process | Manage Unplanned Outages | Manage Emergencies & Disasters | Manage Work Programming & Resourcing | Manage Procurement
Customer Satisfaction | 0.5 | 0.55 | - | 0.2
Customer Complaint | 0.6 | - | - | 0.5
Customer Feedback | 0.4 | - | - | 0.8
Connection Less Than Agreed Time | 0.3 | 0.6 | 0.7 | -
  13. 13. Strategic performance dashboard (drill-down example): 1st layer Key Result Areas (Financial, People, Customer Excellence, Operational Excellence, Risk Management, Health & Safety); 2nd layer Key Performance indicators (Customer Satisfaction, Customer Complaint, Customer Rating (%), Customer Loyalty Index, Satisfied Customer Index, Market Share (%), Average Time Spent on Plan); 3rd & 4th layer Process Performance Measures for the processes Manage Emergencies & Disasters, Manage Procurement, and Manage Unplanned Outages, rolled up into an Overall Process Performance score
  14. 14. Sketch operational and tactical process monitoring dashboards for CVS Pharmacy’s prescription fulfillment process. Consider the viewpoints of each stakeholder in the process. Teamwork
  15. 15. Business Process Monitoring Dashboards & reports Process mining Event stream DB logs Event log
  16. 16. Business Process Monitoring Dashboards & reports Process mining Event stream DB logs Event log
  17. 17. Process Mining 17 • Process Discovery: event log → discovered model • Conformance Checking: event log + input model → difference diagnostics • Variants Analysis: event log vs. event log’ → difference diagnostics • Performance Mining: event log + input model → enhanced model
  18. 18. Event log structure: minimum requirements (each event records at least a case identifier, an activity label, and a timestamp). Concrete formats: • Comma-Separated Values (CSV) • XES (XML format)
  19. 19. Automated Process Discovery 19 Discovered model activities: Enter Loan Application, Retrieve Applicant Data, Compute Installments, Approve Simple Application, Approve Complex Application, Notify Rejection, Notify Eligibility. Input event log:
CID | Task | Time Stamp | …
13219 | Enter Loan Application | 2007-11-09 T 11:20:10 | -
13219 | Retrieve Applicant Data | 2007-11-09 T 11:22:15 | -
13220 | Enter Loan Application | 2007-11-09 T 11:22:40 | -
13219 | Compute Installments | 2007-11-09 T 11:22:45 | -
13219 | Notify Eligibility | 2007-11-09 T 11:23:00 | -
13219 | Approve Simple Application | 2007-11-09 T 11:24:30 | -
13220 | Compute Installments | 2007-11-09 T 11:24:35 | -
… | … | … | …
  20. 20. Process Mining Tools Open-source • Apromore • ProM • bupaR Lightweight • Disco Mid-range • Minit • myInvenio • ProcessGold • QPR Process Analyzer • Signavio Process Intelligence • StereoLOGIC Discovery Analyst Heavyweight • ARIS Process Performance Manager • Celonis Process Mining • Perceptive Process Mining (Lexmark) • Interstage Process Discovery (Fujitsu) 20
  21. 21. Fluxicon Disco 21
  22. 22. Process Maps • A process map of an event log is a graph where: • Each activity is represented by one node • An arc from activity A to activity B means that A is directly followed by B in at least one trace in the log • Arcs in a process map can be annotated with: • Absolute frequency: how many times is A directly followed by B? • Relative frequency: in what percentage of the cases where A is executed is it directly followed by B? • Time: what is the average time between the occurrence of A and the occurrence of B? (A sketch of how these annotations can be computed follows the example on the next slide.) 22
  23. 23. Process Maps – Example 23 Event log: 10: a,b,c,g,e,h 10: a,b,c,f,g,h 10: a,b,d,g,e,h 10: a,b,d,e,g,h 10: a,b,e,c,g,h 10: a,b,e,d,g,h 10: a,c,b,e,g,h 10: a,c,b,f,g,h 10: a,d,b,e,g,h 10: a,d,b,f,g,h
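A minimal Python sketch (not part of the slides) of how the directly-follows counts behind such a process map can be computed; the trace encoding follows the example log above, and the 10-fold variant frequencies are hard-coded for illustration.

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Event log from the example: each of the ten trace variants is observed 10 times
variants = ["abcgeh", "abcfgh", "abdgeh", "abdegh", "abecgh",
            "abedgh", "acbegh", "acbfgh", "adbegh", "adbfgh"]
log = [(list(v), 10) for v in variants]

arc_freq = Counter()   # absolute frequency of each directly-follows arc
act_freq = Counter()   # absolute frequency of each activity
for trace, count in log:
    for activity in trace:
        act_freq[activity] += count
    for a, b in pairwise(trace):
        arc_freq[(a, b)] += count

for (a, b), n in sorted(arc_freq.items()):
    rel = n / act_freq[a]   # relative frequency: share of a's occurrences directly followed by b
    print(f"{a} -> {b}: {n} times ({rel:.0%})")
```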
  24. 24. Process Maps – Exercise 24
Case ID | Task Name | Originator | Timestamp
1 | File Fine | Anne | 20-07-2004 14:00:00
2 | File Fine | Anne | 20-07-2004 15:00:00
1 | Send Bill | system | 20-07-2004 15:05:00
2 | Send Bill | system | 20-07-2004 15:07:00
3 | File Fine | Anne | 21-07-2004 10:00:00
3 | Send Bill | system | 21-07-2004 14:00:00
4 | File Fine | Anne | 22-07-2004 11:00:00
4 | Send Bill | system | 22-07-2004 11:10:00
1 | Process Payment | system | 24-07-2004 15:05:00
1 | Close Case | system | 24-07-2004 15:06:00
2 | Reminder | Mary | 20-08-2004 10:00:00
3 | Reminder | John | 21-08-2004 10:00:00
2 | Process Payment | system | 22-08-2004 09:05:00
2 | Close Case | system | 22-08-2004 09:06:00
4 | Reminder | John | 22-08-2004 15:10:00
4 | Reminder | Mary | 22-08-2004 17:10:00
4 | Process Payment | system | 29-08-2004 14:01:00
4 | Close Case | system | 29-08-2004 17:30:00
3 | Reminder | John | 21-09-2004 10:00:00
3 | Reminder | John | 21-10-2004 10:00:00
3 | Process Payment | system | 25-10-2004 14:00:00
3 | Close Case | system | 25-10-2004 14:01:00
  25. 25. Process Maps in Disco • Disco (and other commercial process mining tools) use process maps as the main visualization technique for event logs • These tools also provide three types of operations: 1. Abstract the process map: • Show only most frequent activities • Show only most frequent arcs 2. Filter the traces in the event log… 25
  26. 26. Types of filters • Event filters • Retain only events that fulfil a given condition (e.g. all events of type “Create purchase order”) • Performance filter • Retain traces that have a duration above or below a given value • Event pair filter (a.k.a. “follower” filter) • Retain traces where there is a pair of events that fulfil a given condition (e.g. “Create invoice” followed by “Create purchase order”) • Endpoint filter • Retain traces that start with or finish with an event that fulfils a given condition 26
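A minimal sketch (not part of the slides) of how these four filter types could be expressed over a log held in memory; the log representation (a list of cases, each with an event list and a duration) and the function names are illustrative assumptions, not the API of Disco or any other tool.

```python
# Each case is assumed to look like: {"events": [{"activity": ...}, ...], "duration_days": ...}

def event_filter(log, activity):
    # Event filter: keep only the events of a given type within each case
    return [{**c, "events": [e for e in c["events"] if e["activity"] == activity]}
            for c in log]

def performance_filter(log, min_days=None, max_days=None):
    # Performance filter: keep cases whose duration is above/below the given thresholds
    return [c for c in log
            if (min_days is None or c["duration_days"] >= min_days)
            and (max_days is None or c["duration_days"] <= max_days)]

def follower_filter(log, first, second):
    # Event pair ("follower") filter: keep cases where `first` is eventually followed by `second`
    def matches(events):
        acts = [e["activity"] for e in events]
        return any(a == first and second in acts[i + 1:] for i, a in enumerate(acts))
    return [c for c in log if matches(c["events"])]

def endpoint_filter(log, start=None, end=None):
    # Endpoint filter: keep cases that start and/or finish with a given activity
    return [c for c in log if c["events"]
            and (start is None or c["events"][0]["activity"] == start)
            and (end is None or c["events"][-1]["activity"] == end)]
```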
  27. 27. Process Maps in Disco • Disco (and other commercial process mining tools) use process maps as the main visualization technique for event logs • These tools also provide three types of operations: 1. Abstract the process map: • Show only most frequent activities • Show only most frequent arcs 2. Filter the traces in the event log 3. Enhance the process map 27
  28. 28. Process Map Enhancement • Nodes and arcs in a process map can be color-coded or thickness-coded to capture: • Frequency: how often a given task or a given directly-follows relation occurs • Time performance: processing times, waiting times, cycle times of tasks • More advanced tools support enhancement by other attributes (e.g. cost, revenue) if the data is available. 28
  29. 29. Hands-on Disco demo 29
  30. 30. Exercise (by Anne Rozinat, Fluxicon). Using Disco, answer the following questions on the PurchasingExample log: • How many cases had to settle a dispute with the purchasing agent? • Is there a difference in cycle time for the cases that had to settle a dispute with the purchasing agent, compared to the ones that did not? Make sure you only compare cases that actually reach the endpoint ‘Pay invoice’ • Are there any cases where the invoice is released and authorized by the same resource? And if so, who is doing this most often?
  31. 31. One more exercise: Refund process. Consider the dataset of a refund process from an electronics manufacturer. Customer complaints and the inspection of individual cases indicate that this process suffers from inefficiencies and overly long cycle times. Assume that only cases that have reached the ‘Order completed’ event are finished. Questions: 1. Is it a problem if you take the average cycle time of all cases, including the ones that have not finished yet? 2. In general, which channel(s) have the biggest problems with missing documents that need to be requested from the customer? 3. How many customers have received a refund without the product being received by the electronics manufacturing company? This should not happen in this process. 4. Has a customer ever received a double payment? This should not happen in this process. To complete this exercise, use the log RefundProcess.fbt
  32. 32. Process Maps - Limitations • Process maps over-generalize: a process map may allow paths that do not occur in the log and that might not make sense • Example: draw the process map of [ abc, adc, afce, afec ] and check which traces it can recognize that have no support in the event log • Process maps make it difficult to distinguish conditional branching, parallelism, and loops • See the previous example… or a simpler one: [abcd, acbd] • Solution: automated BPMN process discovery • More on this tomorrow… 33
  33. 33. Process Mining 34 • Process Discovery: event log → discovered model • Conformance Checking: event log + input model → difference diagnostics • Variants Analysis: event log vs. event log’ → difference diagnostics • Performance Mining: event log + input model → enhanced model
  34. 34. Process Performance Mining • Dotted charts • One line per trace, each line contains points, one point per event • Each event type is mapped to a colour • The position of a point denotes its occurrence time (on a relative scale) • Bird's-eye view of the timing of different events (e.g. activity end times), but does not allow one to see the “processing” times of activities • Timeline diagrams • One line per trace, each line contains segments capturing the start and end of tasks • Captures processing times (unlike dotted charts) • Not scalable for large event logs – good for showing “representative” traces • Performance-enhanced process maps • Process maps where nodes are colour-coded w.r.t. a performance measure. Nodes may represent activities (default option) • But they may also represent resources, in which case arcs denote hand-offs between resources
  35. 35. Dotted chart 36
  36. 36. Timeline diagram 37 See: http://timelines.nirdizati.org
  37. 37. Performance-enhanced process map 38 Nodes are activities (default) Screenshot of Disco
  38. 38. Performance-enhanced process map 39 Nodes are resources (hand-off graph) Screenshot of myInvenio
  39. 39. Exercise • Consider the following event log of a telephone repair process: http://tinyurl.com/repairLogs • What are the bottlenecks in this process? • Which task has the longest waiting time and which one has the longest processing time? 40
  40. 40. Process Mining 41 • Process Discovery: event log → discovered model • Conformance Checking: event log + input model → difference diagnostics • Variants Analysis: event log vs. event log’ → difference diagnostics • Performance Mining: event log + input model → enhanced model
  41. 41. Variants Analysis: given two logs, find the differences and the root causes of the variation or deviance between the two logs
  42. 42. Case Study: Variants Analysis at Suncorp (chart of claim variants plotted against an expected performance line, with regions labelled OK, Good, Bad)
  43. 43. Variants Analysis via Process Map Comparison: simple & quick claims vs. simple & slow claims. S. Suriadi et al.: Understanding Process Behaviours in a Large Insurance Company in Australia: A Case Study. CAiSE 2013
  44. 44. Variants analysis - Exercise We consider a process for handling health insurance claims, for which we have extracted two event logs, namely L1 and L2. Log L1 contains all the cases executed in 2011, while L2 contains all cases executed in 2012. The logs are available in the book’s companion website or directly at: http://tinyurl.com/InsuranceLogs Based on these logs, answer the following questions using a process mining tool: 1. What is the cycle time of each log? 2. Where are the bottlenecks (highest waiting times) in each of the two logs and how do these bottlenecks differ? 3. Describe the differences between the frequency of tasks and the order in which tasks are executed in 2011 (L1) versus 2012 (L2). Hint: If you are using process maps, you should consider using the abstraction slider in your tool to hide some of the most infrequent arcs so as to make the maps more readable 45
  45. 45. Process Mining 46 • Process Discovery: event log → discovered model • Conformance Checking: event log + input model → difference diagnostics • Variants Analysis: event log vs. event log’ → difference diagnostics • Performance Mining: event log + input model → enhanced model
  46. 46. Conformance Checking 47 ≠
  47. 47. Conformance Checking: Unfitting vs. Additional Behavior Unfitting behaviour: • Task C is optional (i.e. may be skipped) in the log Additional behavior: • The cycle including IGDF is not observed in the log Event log: ABCDEH ACBDEH ABCDFH ACBDFH ABDEH ABDFH
  48. 48. Conformance Checking in Apromore 49 Full demo at: https://www.youtube.com/watch?v=3d00pORc9X8
  49. 49. Open-source tools: Apromore (apromore.org) • Open-source BPM analytics platform delivered as Software as a Service • Focus is on end users (business analysts and operations managers), not on data scientists • Over 40 plugins
  50. 50. Key features • Repository of process models and event logs (BPMN, AML, XPDL, EPML, YAWL, XES, MXML) • Offers a range of features along the BPM lifecycle: From logs • Automated discovery of BPMN models • Filter noise from log • Visualize log • Mine process stages From models • Structure model On logs • Animate logs • Compare model-log, log-log • Detect and characterize drifts • Measure log complexity • Mine process performance On models • Measure model complexity • Compare model-model • Detect clones • Search similar models • Simulate model On models • Merge model variants • Configure model with questionnaire From logs • Animate logs • Compare model-log, log-log • Detect and characterize drifts • Mine process performance • Predict outcomes and performance (via Nirdizati)
  51. 51. Access Apromore You can access it in the cloud or download and install a standalone version Cloud-version • Node 1(Estonia): http://apromore.cs.ut.ee • Node 2 (Australia): http://apromore.qut.edu.au Standalone • One-click: a lightweight version of Apromore. Simply unzip and run from localhost • Full-fledged: for developers and advanced users, this distribution gives you full control over Apromore Source code • Apromore’s source code is open-source, licensed under LGPL 3.0 • The code can be accessed from GitHub
  52. 52. ProM: the very first process mining tool • 600+ plug-ins available for the whole process mining spectrum • Open source license • Download it from www.processmining.org
  53. 53. Nirdizati: predictive process monitoring (nirdizati.com) • Predict process outcome (e.g. “Is this loan offer going to be rejected?”) • Predict process performance (e.g. “Will this claim take longer than 5 days to be handled?”) • Predict future events (e.g. “What activity is likely to be executed next? And after that?”)
  54. 54. Part 2: Process Mining Algorithms 55
  55. 55. BPMN-Based Process Mining 56 • Process Discovery: event log → discovered model • Conformance Checking: event log + input model → difference diagnostics • Variants Analysis: event log vs. event log’ → difference diagnostics • Performance Mining: event log + input model → enhanced model
  56. 56. Conformance Checking: Unfitting vs. Additional Behavior Unfitting behaviour: • Task C is optional (i.e. may be skipped) in the log Additional behavior: • The cycle including IGDF is not observed in the log Event log: ABCDEH ACBDEH ABCDFH ACBDFH ABDEH ABDFH
  57. 57. Accuracy of Automatically Discovered Process Models 58 (diagram: process model vs. event log; unfitting behavior = lack of fitness; additional behavior = lack of precision; lack of generalization)
  58. 58. Accuracy of Automatically Discovered Process Models • Fitness: to what extent does the behaviour observed in the event log fit the process model? • No unfitting behaviour ⇒ Fitness = 1 • Precision: how much additional behaviour does the process model allow that is not observed in the event log? • No additional behaviour ⇒ Precision = 1 • Generalization (of an algorithm): given a (partial) event log of a process, to what extent does the discovery algorithm produce models that fit the behaviour of the process that is not observed in the log? 59
  59. 59. Measuring Fitness • Replay • Replay each trace against the model • When a parsing error occurs, repair it locally • Keep track of the “parsing errors” • Does not calculate an exact distance measure! • Optimal Trace Alignment • For each trace t in the log, find the trace t’ of the process model such that the string-edit distance between t and t’ is minimal • Use the string-edit distances • Calculates a “distance” between log and model 60
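To make the notion of string-edit distance used in optimal trace alignment concrete, here is a minimal sketch (not part of the slides) of the classic Levenshtein distance between two traces; in alignment-based conformance checking, the model trace t’ minimizing this distance over the model's possible executions would then be searched for.

```python
def edit_distance(t1, t2):
    """Levenshtein (string-edit) distance between two traces given as sequences of activity labels."""
    prev = list(range(len(t2) + 1))            # distances against the empty prefix of t1
    for i, a in enumerate(t1, start=1):
        curr = [i]
        for j, b in enumerate(t2, start=1):
            cost = 0 if a == b else 1          # substitution cost
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# A log trace vs. a model trace that differ in a single activity:
print(edit_distance(list("abcdeh"), list("abcdfh")))  # -> 1
```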
  60. 60. Conformance Checking via Replay A B C E
  61. 61. Conformance Checking via Replay B C E
  62. 62. Conformance Checking via Replay B C E
  63. 63. Conformance Checking via Replay C E
  64. 64. Conformance Checking via Replay C E
  65. 65. Conformance Checking via Replay E
  66. 66. Conformance Checking via Replay E
  67. 67. Conformance Checking via Replay E
  68. 68. Conformance Checking via Replay
  69. 69. Conformance Checking via Replay
  70. 70. Conformance Checking via Replay A C E
  71. 71. Conformance Checking via Replay C E
  72. 72. Conformance Checking via Replay C E
  73. 73. Conformance Checking via Replay C E
  74. 74. Conformance Checking via Replay E
  75. 75. Conformance Checking via Replay E
  76. 76. Conformance Checking via Replay E missing token
  77. 77. Conformance Checking via Replay E
  78. 78. Conformance Checking via Replay
  79. 79. Conformance Checking via Replay
  80. 80. Conformance Checking via Replay remaining token
  81. 81. Accuracy of automatically discovered process models The accuracy of an automatically discovered process model consists of three quality dimensions: 1. Fitness: the discovered model should allow for the behavior seen in the event log. A model has perfect fitness if all traces in the log can be replayed from beginning to end.
  82. 82. Accuracy of process models The accuracy of an automatically discovered process model consists of three quality dimensions: 1. Fitness 2. Precision (avoid underfitting): the discovered model should not allow for behavior completely unrelated to what was seen in the event log.
  83. 83. The accuracy of an automatically discovered process model consists of three quality dimensions: 1. Fitness 2. Precision (avoid underfitting) 3. Generalization (avoid overfitting): the discovered model should generalize the example behavior seen in the event log. Accuracy of process models
  84. 84. Flower model (underfitting) L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360}
  85. 85. Enumerating model (overfitting) L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360}
  86. 86. Something in the middle… L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360}
  87. 87. Computing fitness: basic approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360}
  88. 88. Computing fitness: basic approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360}
  89. 89. Computing fitness: basic approach A B I J K L
  90. 90. Computing fitness: basic approach A B I J K L
  91. 91. Computing fitness: basic approach B I J K L
  92. 92. Computing fitness: basic approach B I J K L
  93. 93. Computing fitness: basic approach I J K L
  94. 94. Computing fitness: basic approach I J K L
  95. 95. Computing fitness: basic approach I J K L non-conformance
  96. 96. Computing fitness: basic approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360} A “basic approach” to compute fitness is to count the fraction of cases that can be “parsed completely” (i.e., the proportion of cases corresponding to firing sequences leading from [start] to [end]).
  97. 97. Computing fitness: basic approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360} A “basic approach” to compute fitness is to count the fraction of cases that can be “parsed completely” (i.e., the proportion of cases corresponding to firing sequences leading from [start] to [end]). Fitness = 0.97
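Reading off the replay slides above, presumably the 10 occurrences of ⟨a,b,i,j,k,l⟩ and the 5 occurrences of ⟨a,f,g,j,i,k⟩ are the cases that cannot be parsed completely, which matches the reported value:

$$\text{fitness} = \frac{\#\text{completely parsed cases}}{\#\text{cases}} = \frac{515 - 15}{515} = \frac{500}{515} \approx 0.97$$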
  98. 98. Computing fitness: Event-based approach • In the simple fitness computation, we stopped replaying a trace as soon as we encountered a problem and marked the trace as non-fitting. • An event-based approach to calculating fitness consists of continuing to replay the trace on the model and: • recording all situations where a transition is forced to fire without being enabled, i.e., counting all missing tokens • recording the tokens that remain at the end • Four counters are used: • p = produced tokens • c = consumed tokens • m = missing tokens • r = remaining tokens (the resulting fitness formula is shown below)
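These counters feed the usual token-replay fitness formula; this formulation is standard in the process mining literature and reproduces the values computed on the following slides (e.g. p = c = 12, m = r = 1 gives 0.9166):

$$\text{fitness}(\sigma, N) = \frac{1}{2}\left(1 - \frac{m}{c}\right) + \frac{1}{2}\left(1 - \frac{r}{p}\right)$$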
  99. 99. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360}
  100. 100. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360}
  101. 101. Computing fitness: Event-based approach A B I J K L
  102. 102. Computing fitness: Event-based approach p = 1 c = 0 m = 0 r = 0 A B I J K L
  103. 103. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0 B I J K L
  104. 104. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 B I J K L p = 1 c = 0 m = 0 r = 0
  105. 105. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 I J K L p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0
  106. 106. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 I J K L p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0
  107. 107. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 I J K L p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 0 m = 1 r = 0
  108. 108. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 I J K L p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 1 m = 1 r = 0 p = 1 c = 0 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0
  109. 109. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 J K L p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 1 m = 1 r = 0 p = 1 c = 0 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0
  110. 110. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 K L p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 1 m = 1 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0
  111. 111. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 L p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 1 m = 1 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0
  112. 112. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 L p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 1 m = 1 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0
  113. 113. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 1 m = 1 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 0
  114. 114. Computing fitness: Event-based approach p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 1 m = 1 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 12 c = 12 m = 1 r = 1
  115. 115. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360}
  116. 116. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360} p = 12 c = 12 m = 1 r = 1
  117. 117. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 12 c = 12 m = 1 r = 1 Fitness = 0.9166
  118. 118. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360}
  119. 119. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 13 c = 13 m = 0 r = 0
  120. 120. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 13 c = 13 m = 0 r = 0 Fitness = 1
  121. 121. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360}
  122. 122. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 0 m = 0 r = 0
  123. 123. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 1 m = 0 r = 0 p = 1 c = 0 m = 0 r = 1 p = 0 c = 0 m = 0 r = 0 p = 12 c = 11 m = 0 r = 1
  124. 124. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 12 c = 11 m = 0 r = 1 Fitness = 0.9583
  125. 125. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360}
  126. 126. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 13 c = 13 m = 0 r = 0
  127. 127. Computing fitness: Event-based approach L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l> 360} p = 13 c = 13 m = 0 r = 0 Fitness = 1
  128. 128. Computing fitness at log level L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360}
  129. 129. Computing fitness at log level L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} Number of occurrences of a specific trace in the log (e.g., if a trace σ appears 200 times in the log, L(σ) will be equal to 200 )
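At the log level, the per-trace counters are aggregated with each trace weighted by its frequency L(σ); this standard formulation matches the value 0.998 obtained on the next slides:

$$\text{fitness}(L, N) = \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} L(\sigma)\, m_\sigma}{\sum_{\sigma \in L} L(\sigma)\, c_\sigma}\right) + \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} L(\sigma)\, r_\sigma}{\sum_{\sigma \in L} L(\sigma)\, p_\sigma}\right)$$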
  130. 130. Computing fitness at log level L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 13 c = 13 m = 0 r = 0 p = 12 c = 12 m = 1 r = 1 p = 13 c = 13 m = 0 r = 0 p = 12 c = 11 m = 0 r = 1
  131. 131. Computing fitness at log level L= { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360} p = 13 c = 13 m = 0 r = 0 p = 12 c = 12 m = 1 r = 1 p = 13 c = 13 m = 0 r = 0 p = 12 c = 11 m = 0 r = 1 Fitness = 0.998
  132. 132. Calculating precision • Precision = 1 ⇒ the behaviour allowed by the model is contained in or equal to the behaviour in the log • Precision close to 0 ⇒ (almost) none of the behaviour allowed by the model is observed in the log • Precision can be calculated as a “difference” between a state space representing the behaviour of the model and a state space representing the behaviour of the log • Adriano Augusto et al. “Abstract-and-Compare: A Family of Scalable Precision Measures for Automated Process Discovery”. In Proceedings of BPM’2018 133
  133. 133. Measuring generalization of a process discovery algorithm via cross-validation L L1 L2 L3 L4 L5
  134. 134. L L1 L2 L3 L4 L5 N1 N2 N3 N4 N5 Discovery Measuring generalization of a process discovery algorithm via cross-validation
  135. 135. Measuring generalization of a process discovery algorithm via cross-validation L L1 L2 L3 L4 L5 N1 N2 N3 N4 N5 Fitness
  136. 136. Measuring generalization of a process discovery algorithm via cross-validation L L1 L2 L3 L4 L5 N1 N2 N3 N4 N5 Fitness
  137. 137. Measuring generalization of a process discovery algorithm via cross-validation L L1 L2 L3 L4 L5 N1 N2 N3 N4 N5 Fitness
  138. 138. Measuring generalization of a process discovery algorithm via cross-validation L L1 L2 L3 L4 L5 N1 N2 N3 N4 N5 Fitness
  139. 139. Measuring generalization of a process discovery algorithm via cross-validation L L1 L2 L3 L4 L5 N1 N2 N3 N4 N5 Fitness
  140. 140. Measuring generalization of a process discovery algorithm via cross-validation L L1 L2 L3 L4 L5 N1 N2 N3 N4 N5 Avg Fitness
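A minimal Python sketch (not part of the slides) of the k-fold cross-validation idea illustrated above; discover(log) and fitness(model, log) are hypothetical callables standing in for whatever discovery algorithm and fitness measure are being evaluated.

```python
import random

def cross_validated_generalization(log, discover, fitness, k=5, seed=42):
    """Estimate the generalization of a discovery algorithm by k-fold cross-validation:
    discover a model on k-1 folds and measure its fitness on the held-out fold."""
    traces = list(log)
    random.Random(seed).shuffle(traces)
    folds = [traces[i::k] for i in range(k)]                 # split log L into L1..Lk
    scores = []
    for i in range(k):
        held_out = folds[i]                                   # Li
        training = [t for j, fold in enumerate(folds) if j != i for t in fold]
        model = discover(training)                            # Ni, discovered without Li
        scores.append(fitness(model, held_out))               # fitness of Ni on Li
    return sum(scores) / k                                    # average fitness across the folds
```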
  141. 141. Automated Discovery of BPMN Process Models 142
  142. 142. α-algorithm: the Origin of Process Discovery van der Aalst, W. M. P. and Weijters, A. J. M. M. and Maruster, L. (2003). Workflow Mining: Discovering process models from event logs, IEEE Transactions on Knowledge and Data Engineering
  143. 143. α-algorithm Basic Idea: Ordering relations • Direct succession: x>y iff for some case x is directly followed by y. • Causality: x→y iff x>y and not y>x. • Parallel: x||y iff x>y and y>x. • Unrelated: x#y iff not x>y and not y>x. Example log fragment: case 1: task A, case 2: task A, case 3: task A, case 3: task B, case 1: task B, case 1: task C, case 2: task C, case 4: task A, case 2: task B, ... Direct succession: A>B A>C B>C B>D C>B C>D E>F Causality: A→B A→C B→D C→D E→F Parallelism: B||C C||B Traces: ABCD ACBD EF
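A minimal Python sketch (not part of the slides) of how the footprint of ordering relations can be derived from a set of traces; the relation symbols are printed as ->, <-, || and #.

```python
from itertools import pairwise  # Python 3.10+

def footprint(traces):
    """Derive the alpha-algorithm ordering relations from traces (sequences of activity labels)."""
    activities = sorted({a for t in traces for a in t})
    succ = {(a, b) for t in traces for a, b in pairwise(t)}   # direct succession a > b
    rel = {}
    for a in activities:
        for b in activities:
            if (a, b) in succ and (b, a) not in succ:
                rel[(a, b)] = "->"    # causality
            elif (a, b) in succ and (b, a) in succ:
                rel[(a, b)] = "||"    # parallel
            elif (a, b) not in succ and (b, a) in succ:
                rel[(a, b)] = "<-"    # reversed causality
            else:
                rel[(a, b)] = "#"     # unrelated
    return activities, rel

acts, rel = footprint(["abcd", "acbd", "ef"])   # the example traces above
print("   " + "  ".join(acts))
for a in acts:
    print(a, " ".join(f"{rel[(a, b)]:>2}" for b in acts))
```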
  144. 144. Basic Idea: Example
  145. 145. Basic Idea: Example
  146. 146. Basic Idea: Footprints
  147. 147. α-Algorithm • Idea (a)
  148. 148. α-Algorithm • Idea (a) a → b
  149. 149. α-Algorithm • Idea (b)
  150. 150. α-Algorithm • Idea (b) a → b, a → c and b # c
  151. 151. α-Algorithm • Idea (c)
  152. 152. α-Algorithm • Idea (c) b → d, c → d and b # c
  153. 153. α-Algorithm • Idea (d)
  154. 154. α-Algorithm • Idea (d) a → b, a → c and b || c
  155. 155. α-Algorithm • Idea (e)
  156. 156. α-Algorithm • Idea (e) b → d, c → d and b || c
  157. 157. α-algorithm: Applicative Example α(L) = ? L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  158. 158. α-algorithm: Applicative Example ALPHABET L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  159. 159. α-algorithm: Applicative Example ALPHABET {a,b,c,d,e,f,g} L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  160. 160. α-algorithm: Applicative Example INITIAL ACTIVITIES L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  161. 161. α-algorithm: Applicative Example INITIAL ACTIVITIES {a} L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  162. 162. α-algorithm: Applicative Example FINAL ACTIVITIES L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  163. 163. α-algorithm: Applicative Example FINAL ACTIVITIES {g} L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  164. 164. α-algorithm: Applicative Example FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4} a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < #
  165. 165. α-algorithm: Applicative Example FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4} a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < #
  166. 166. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  167. 167. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  168. 168. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  169. 169. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  170. 170. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  171. 171. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  172. 172. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  173. 173. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  174. 174. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  175. 175. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  176. 176. α-algorithm: Applicative Example a b c d e f g a # > > # # # # b < # || > # # # c < || # > # # # d # < < # > > # e # # # < # # > f # # # < # # > g # # # # < < # FOOTPRINTS L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  177. 177. α-algorithm: Applicative Example L= { <a,b,c,d,e,g> 3, <a,c,b,d,e,g> 2, <a,b,c,d,f,g>, <a,c,b,d,f,g> 4}
  178. 178. α-algorithm: Applicative Example α(L) = ? L= { <a,b,c,d,e,f> 4, <a,g,h,d,f,e> 2, <a,b,c,d,f,e> 3, <a,g,h,d,e,f> 3}
  179. 179. Limitations of the alpha miner • Completeness: all possible traces of the process (model) need to be in the log • Short loops: c>b and b>c implies c||b and b||c instead of c→b and b→c • Self-loops: deriving b→b would require b>b and not b>b (impossible!)
  180. 180. Frequency of the Ordering Relations?
  181. 181. Little Thumb to Deal with Noise van der Aalst, W. M. P. and Weijters, A. J. M. M. (2003). Rediscovering workflow models from event-based data using little thumb, Integrated Computer-Aided Engineering
  182. 182. Heuristics Miner
  183. 183. Heuristics Miner: a number between -1 and 1 indicating the strength of the causal dependency between a and b (see the dependency measure below)
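The formula shown on this slide is not reproduced in the transcript; for reference, the dependency measure commonly used by the Heuristics Miner (with |a >_W b| the number of times a is directly followed by b in log W) is:

$$a \Rightarrow_W b \;=\; \begin{cases} \dfrac{|a >_W b| - |b >_W a|}{|a >_W b| + |b >_W a| + 1} & \text{if } a \neq b \\[2ex] \dfrac{|a >_W a|}{|a >_W a| + 1} & \text{if } a = b \end{cases}$$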
  184. 184. Heuristics Miner
  185. 185. Heuristics Miner
  186. 186. Heuristics Miner What’s the corresponding process model?
  187. 187. Automated Process Discovery: quality dimensions of an automated process discovery method • Fitness: parses the traces of the log • Precision: does not parse traces not in the log • Generalization: parses traces of the process not included in the log • Simplicity: minimal size & structural complexity 188
  188. 188. Process Discovery Algorithms: The Two Worlds • High-fitness, high-precision: Heuristic Miner, Fodina Miner • High-fitness, low-complexity
  189. 189. Process Model discovered with Heuristics Miner
  190. 190. Process Model discovered with Inductive Miner • Structured by construction • Based on process tree
  191. 191. Process Discovery Algorithms: The Two Worlds • High fitness, high precision, high complexity: Heuristic Miner, Fodina Miner • High fitness, low precision, low complexity: Inductive Miner, Evolutionary Tree Miner
  192. 192. Split Miner Augusto, A. and Conforti, R. and Dumas, M. and La Rosa, M. (2017). Split Miner: Discovering Accurate and Simple Business Process Models from Event Logs. ICDM 2017
  193. 193. Process Model discovered with Split Miner
  194. 194. Process Discovery Algorithms • High fitness, high precision, low complexity: Split Miner
  195. 195. From Event Log to Process Model in 5 Steps 196: Directly-Follows Graph and Loops Discovery; Filtering; Concurrency Discovery; Splits Discovery; Joins Discovery (Event Log → Process Model)
  196. 196. Trace #obs a » b » c » g » e » h 10 a » b » c » f » g » h 10 a » b » d » g » e » h 10 a » b » d » e » g » h 10 a » b » e » c » g » h 10 a » b » e » d » g » h 10 a » c » b » e » g » h 10 a » c » b » f » g » h 10 a » d » b » e » g » h 10 a » d » b » f » g » h 10 197 Directly-Follows Graph and Loops Discovery Concurrency DiscoveryEvent Log Filtering Splits Discovery Joins Discovery Process Model
  197. 197. Trace #obs a » b » c » g » e » h 10 a » b » c » f » g » h 10 a » b » d » g » e » h 10 a » b » d » e » g » h 10 a » b » e » c » g » h 10 a » b » e » d » g » h 10 a » c » b » e » g » h 10 a » c » b » f » g » h 10 a » d » b » e » g » h 10 a » d » b » f » g » h 10 198 Directly-Follows Graph and Loops Discovery Concurrency DiscoveryEvent Log Filtering Splits Discovery Joins Discovery Process Model
  198. 198. 199 Directly-Follows Graph and Loops Discovery Concurrency DiscoveryEvent Log Filtering Splits Discovery Joins Discovery Process Model (b || c) (b || d) (d || e) (e || g)
  199. 199. 200 Directly-Follows Graph and Loops Discovery Concurrency DiscoveryEvent Log Filtering Splits Discovery Joins Discovery Process Model
  200. 200. 201 Break for Make-up!
  201. 201. 202 Directly-Follows Graph and Loops Discovery Concurrency DiscoveryEvent Log Filtering Splits Discovery Joins Discovery Process Model (b || c) (b || d) (d || e) (e || g)
  202. 202. 203 Directly-Follows Graph and Loops Discovery Concurrency DiscoveryEvent Log Filtering Splits Discovery Joins Discovery Process Model
  203. 203. 204 Done!
  204. 204. Automated discovery of BPMN Process Models in Apromore 205
  205. 205. Summary • Process Dashboards: operational dashboards, tactical dashboards, strategic dashboards • Process Mining: automated process discovery, conformance checking, performance mining, variants analysis
  206. 206. Topics not covered in this class • Event log filtering • Removing anomalous or infrequent behaviour from an event log • Business process drift detection • Detecting changes in a business process (over time) using event logs • Predictive process monitoring • Predicting the outcome or a future property of a process based on an event log containing completed cases, and an incomplete case 207
  207. 207. Thank you! http://fundamentals-of-bpm.org Chapter 12: Process Monitoring
