Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Discovering Concurrency: Learning (Business) Process Models from Examples

655 views

Published on

Invited Lecture International Conference on Concurrency Theory (CONCUR 2011), Aachen, Germany, September 2011.

Published in: Technology, Education
  • Be the first to comment

Discovering Concurrency: Learning (Business) Process Models from Examples

  1. 1. Discovering ConcurrencyLearning (Business) Process Models from ExamplesInvited Talk CONCUR 2011, 8-9-2011, Aachen.prof.dr.ir. Wil van der Aalstwww.processmining.org
  2. 2. Business Process Management ? PAGE 1
  3. 3. BusinessProcessManagement !! PAGE 2
  4. 4. Types of Process Models Used in BPM Emphasis on graphical models supporting a substantial part of the so-called workflow patterns (incl. concurrency) PAGE 3
  5. 5. Classical Challenges in BPM • Verification (cf. soundness problem in WF-nets) • Performance analysis (e.g., simulation) • Converting models into running systems • Providing flexibility without loosing control • …Many problems have been “solved”: the real problem is adoption in practice! PAGE 4
  6. 6. A more interesting challenge: Dealingwith variability• Variants of the same process exist in various domains, e.g., Dutch Municipalities, Hertz, Suncorp, Salesforce, Easychair, etc.• Configurable process models to generate concrete processes, cf. C-YAWL.• Merging process models is a challenging problem.• Model similarity (rather than equivalence). PAGE 5
  7. 7. So we are interested in processes … … but most of us do not study them! PAGE 6
  8. 8. Desire lines in process models PAGE 7
  9. 9. Data explosion PAGE 8
  10. 10. Process Mining = Event Data + Processes Data Mining + Process AnalysisMachine Learning + Formal Methods PAGE 9
  11. 11. Process Mining • Process discovery: "What is really happening?" • Conformance checking: "Do we do what was agreed upon?" • Performance analysis: "Where are the bottlenecks?" • Process prediction: "Will this case be late?" • Process improvement: "How to redesign this process?" • Etc. PAGE 10
  12. 12. Process Mining
  13. 13. Starting point: event log XES, MXML, SA-MXML, CSV, etc. PAGE 12
  14. 14. Simplified event log a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, and h = reject request PAGE 13
  15. 15. Processdiscovery PAGE 14
  16. 16. Conformancechecking case 7: e is executed without case 8: g or being h is missing enabled case 10: e is missing in second round PAGE 15
  17. 17. Extension: Adding perspectives tomodel based on event log PAGE 16
  18. 18. We applied ProM in >100 organizations• Municipalities (e.g., Alkmaar, Heusden, Harderwijk, etc.)• Government agencies (e.g., Rijkswaterstaat, Centraal Justitieel Incasso Bureau, Justice department)• Insurance related agencies (e.g., UWV)• Banks (e.g., ING Bank)• Hospitals (e.g., AMC hospital, Catharina hospital)• Multinationals (e.g., DSM, Deloitte)• High-tech system manufacturers and their customers (e.g., Philips Healthcare, ASML, Ricoh, Thales)• Media companies (e.g. Winkwaves)• ... PAGE 17
  19. 19. All supported by …• Open-source (L-GPL), cf. www.processmining.org• Plug-in architecture• Plug-ins cover the whole process mining spectrum and also support classical forms of process analysis PAGE 18
  20. 20. Process discovery supports/ controls business processes people machines components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes conformance enhancement PAGE 19
  21. 21. Process Discovery Techniques (small selection) distributed genetic mining automata-based learning language-based regions heuristic mining state-based regions partial-order based mining genetic mining LTL mining pattern-based mining neural networks stochastic task graphsfuzzy mining mining block structures hidden Markov models α algorithm multi-phase mining conformal process graph α# algorithm ILP mining α++ algorithm PAGE 20
  22. 22. Why is process discovery such a difficultproblem?• There are no negative examples (i.e., a log shows what has happened but does not show what could not happen).• Due to concurrency, loops, and choices the search space has a complex structure and the log typically contains only a fraction of all possible behaviors.• There is no clear relation between the size of a model and its behavior (i.e., a smaller model may generate more or less behavior although classical analysis and evaluation methods typically assume some monotonicity property). PAGE 21
  23. 23. Challenge: four competing qualitycriteria“able to replay event log” “Occam’s razor”“not overfitting the log” “not underfitting the log” PAGE 22
  24. 24. Example: one log four models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdeh fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeggeneralization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 23 N4 : fitness = +, precision = +, generalization = -, simplicity = -
  25. 25. # trace 455 acdehModel N1 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg 33 acdefbdeh 14 acdefbdeg 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 24 1391
  26. 26. # trace 455 acdehModel N2 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg 33 acdefbdeh 14 acdefbdeg 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 25 1391
  27. 27. # trace 455 acdehModel N3 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg 33 acdefbdeh 14 acdefbdeg 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 26 1391
  28. 28. # trace 455 acdehModel N4 191 abdeg 177 adceh 144 abdeh a d c e g 111 acdeg register check examine decide pay request ticket casually compensation 82 adceg a c d e g 56 adbeh register examine check decide pay request casually ticket compensation 47 acdefdbeh a d c e h 38 adbeg register check examine decide reject request ticket casually request 33 acdefbdeh a c d e h 14 acdefbdegstart end register examine check decide reject request casually ticket request 11 acdefdbeg 9 adcefcdeh (all 21 variants seen in the log) 8 adcefdbeh 5 adcefbdeg a b d e g register examine check decide pay 3 acdefbdefdbeg request thoroughly ticket compensation 2 adcefdbeg a d b e h register check examine decide reject 2 adcefbdefbdeg request ticket thoroughly request 1 adcefdbefbdeh a b d e h register examine check decide reject 1 adbefbdefdbeg request thoroughly ticket request 1 adcefdbefcdefdbeg N4 : fitness = +, precision = +, generalization = -, simplicity = - PAGE 27 1391
  29. 29. Example of a process discovery technique:Genetic Mining• Characteristics • requires a lot of computing power, but can be distributed easily, • can deal with noise, infrequent behavior, duplicate tasks, invisible tasks, • allows for incremental improvement and combinations with other approaches (heuristics post-optimization, etc.). PAGE 28
  30. 30. Genetic process mining: Overview PAGE 29
  31. 31. Example: crossover PAGE 30
  32. 32. Example: mutation remove place b b examine examine thoroughly thoroughly g g pay pay c c compensation compensation a e a e examine examinestart register casually decide end start register casually decide end request request h h d d reject reject check ticket request check ticket request f f reinitiate reinitiate request added arc request PAGE 31
  33. 33. Example of a process discoverytechnique: Theory of Regions• Two types of regions theory: − State-based regions − Language-based regions All about discovering places! PAGE 32
  34. 34. State-based regions: Two step approach d e [a,e] [a,d,e] [ a,b] a b [] [a] c c b d [a,c] [a,b,c] [a,b,c,d] b a p1 e p3 d start end p2 c p4 PAGE 33
  35. 35. Example regiona enters, b and e exit, c and d do not cross d e [a,e] [a,d,e] [ a,b] a b [] [a] c c b d [a,c] [a,b,c] [a,b,c,d] b a p1 e p3 d start end p2 c p4 PAGE 34
  36. 36. Language-based regions A place is feasible if it can be added without disabling any of the traces in the event log. R PAGE 35
  37. 37. Conformance checking supports/ controls business processes people machines components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery enhancement PAGE 36
  38. 38. Four models, one logN1 b examine thoroughly g p1 p3 pay c compensation a examine estart register casually decide p5 end request h p2 d p4 reject check ticket request f reinitiate request PAGE 37
  39. 39. PAGE 38
  40. 40. Replaying (1/7) σ1 on N1p’ = p+1 p = produced c = consumed m = missing ≤ c r = remaining ≤ p PAGE 39
  41. 41. Replaying (2/7)σ1 on N1 p’ = p+2 c’ = c+1 PAGE 40
  42. 42. Replaying (3/7)σ1 on N1 p’ = p+1 c’ = c+1 PAGE 41
  43. 43. Replaying (4/7)σ1 on N1 p’ = p+1 c’ = c+1 PAGE 42
  44. 44. Replaying (5/7)σ1 on N1 p’ = p+1 c’ = c+2 PAGE 43
  45. 45. Replaying (6/7)σ1 on N1 p’ = p+1 c’ = c+1 PAGE 44
  46. 46. Replaying (7/7) σ1 on N1 p=7 p=7 c=6 c=7 m=0 m=0 r=0 r=0 p1 p3 start p5 end c’ = c+1 p2 p4m=0r=0no problems encountered PAGE 45
  47. 47. Replaying (1/7) σ3 on N2p’ = p+1 p = produced c = consumed m = missing ≤ c r = remaining ≤ p PAGE 46
  48. 48. Replaying (2/7)σ3 on N2 p’ = p+1 c’ = c+1 PAGE 47
  49. 49. Replaying (3/7)σ3 on N2 p’ = p+1 c’ = c+1 m’ = m+1 PAGE 48
  50. 50. Replaying (4/7)σ3 on N2 p’ = p+1 c’ = c+1 PAGE 49
  51. 51. Replaying (5/7)σ3 on N2 p’ = p+1 c’ = c+1 PAGE 50
  52. 52. Replaying (6/7)σ3 on N2 p’ = p+1 c’ = c+1 PAGE 51
  53. 53. Replaying (7/7)σ3 on N2 r’ = r+1 c’ = c+1Problems:• m = 1 : d was forced to occur without being enabled• r = 1 : output of c was not used by d PAGE 52
  54. 54. Computing fitness at trace level PAGE 53
  55. 55. Computing fitness at the log level PAGE 54
  56. 56. Example valuesN1 b examine thoroughly g p1 p3 pay c compensation a examine estart register casually decide p5 end request h p2 d p4 reject check ticket request f reinitiate request PAGE 55
  57. 57. Diagnostics problem problem 430 tokens remain in place p1, 566 tokens were missing in because c did not happen while place p3 during replay, the model expected c to happen because e happened while this was not possible according to the model problem10 tokens were missing in place p1 during replay, because c happened while this was not possible according to the model 971 971 1391 1537 +430 1391 -10 c -566 1537 930 930 p1 examine p3 casually a e +607 h -461 start register decide p5 reject end request request -146 d 1537 p2 check p4 1391 ticket 1537 1537 problem problem 146 tokens were missing in problem 461 of the 1391 place p2 during replay, because 607 tokens remain in place p5, cases did not d happened while this was not because h did not happen while reach place end possible according to the model the model expected h to happen PAGE 56
  58. 58. Challenges related toconformance checking• Not as simple as it seems!• In case of duplicate tasks (two transition with the same label) or silent tasks (τ labeled transitions), multiple paths need to be considered (state space analysis, heuristics, or optimization).• More general formulation of the problem with costs associated to skipping/inserting particular tasks, see ProM latest conformance checker (A* algorithm).• Computing the most likely alignment is needed for other types of process mining (time analysis, measuring precision, social network analysis, etc.). PAGE 57
  59. 59. How can process mining help?• Detect bottlenecks • Provide mirror• Detect deviations • Highlight important• Performance problems measurement • Avoid ICT failures• Suggest improvements • Avoid management by• Decision support (e.g., PowerPoint recommendation and • From “politics” to prediction) “analytics” PAGE 58
  60. 60. PAGE 59
  61. 61. Example of a Lasagna process: WMO process of a Dutch municipalityEach line corresponds to one of the 528 requests that werehandled in the period from 4-1-2009 until 28-2-2010. In totalthere are 5498 events represented as dots. The mean timeneeded to handled a case is approximately 25 days. PAGE 60
  62. 62. WMO process (Wet Maatschappelijke Ondersteuning)• WMO refers to the social support act that came into force in The Netherlands on January 1st, 2007.• The aim of this act is to assist people with disabilities and impairments. Under the act, local authorities are required to give support to those who need it, e.g., household help, providing wheelchairs and scootmobiles, and adaptations to homes.• There are different processes for the different kinds of help. We focus on the process for handling requests for household help.• In a period of about one year, 528 requests for household WMO support were received.• These 528 requests generated 5498 events. PAGE 61
  63. 63. C-net discovered usingheuristic miner (1/3) PAGE 62
  64. 64. C-net discovered usingheuristic miner (2/3) PAGE 63
  65. 65. C-net discovered usingheuristic miner (3/3) PAGE 64
  66. 66. Conformance check WMO process (1/3) PAGE 65
  67. 67. Conformance check WMO process (2/3) PAGE 66
  68. 68. Conformance check WMO process (3/3) The fitness of the discovered process is 0.99521667. Of the 528 cases, 496 cases fit perfectly whereas for 32 cases there are missing or remaining tokens. PAGE 67
  69. 69. Bottleneck analysis WMO process (1/3) PAGE 68
  70. 70. Bottleneck analysis WMO process (2/3) PAGE 69
  71. 71. Bottleneck analysis WMO process (3/3)flow time ofapprox. 25 dayswith a standarddeviation ofapprox. 28 PAGE 70
  72. 72. Two additional Lasagna processes RWS (“Rijkswaterstaat”) process WOZ (“Waardering Onroerende Zaken”) process PAGE 71
  73. 73. RWS Process• The Dutch national public works department, called “Rijkswaterstaat” (RWS), has twelve provincial offices. We analyzed the handling of invoices in one of these offices.• The office employs about 1,000 civil servants and is primarily responsible for the construction and maintenance of the road and water infrastructure in its province.• To perform its functions, the RWS office subcontracts various parties such as road construction companies, cleaning companies, and environmental bureaus. Also, it purchases services and products to support its construction, maintenance, and administrative activities. PAGE 72
  74. 74. C-net discovered using heuristic miner PAGE 73
  75. 75. Social network constructed based onhandovers of work Each of the 271 nodes corresponds to a civil servant. Two civil servants are connected if one executed an activity causally following an activity executed by the other civil servant PAGE 74
  76. 76. Social network consisting of civil servants thatexecuted more than 2000 activities in a 9 month period. The darker arcs indicate the strongest relationships in the social network. Nodes having the same color belong to the same clique. PAGE 75
  77. 77. WOZ process• Event log containing information about 745 objections against the so-called WOZ (“Waardering Onroerende Zaken”) valuation.• Dutch municipalities need to estimate the value of houses and apartments. The WOZ value is used as a basis for determining the real-estate property tax.• The higher the WOZ value, the more tax the owner needs to pay. Therefore, there are many objections (i.e., appeals) of citizens that assert that the WOZ value is too high.• “WOZ process” discovered for another municipality (i.e., different from the one for which we analyzed the WMO process). PAGE 76
  78. 78. Discovered process modelThe log contains events related to 745 objections against theso-called WOZ valuation. These 745 objections generated 9583events. There are 13 activities. For 12 of these activities bothstart and complete events are recorded. Hence, the WF-net has PAGE 7725 transitions.
  79. 79. Conformance checker:(fitness is 0.98876214) PAGE 78
  80. 80. Performance analysis PAGE 79
  81. 81. Resource-activity matrix(four groups discovered) PAGE 80
  82. 82. PAGE 81
  83. 83. Example of a Spaghetti processSpaghetti process describing the diagnosis and treatment of 2765patients in a Dutch hospital. The process model was constructedbased on an event log containing 114,592 events. There are 619different activities (taking event types into account) executed by 266different individuals (doctors, nurses, etc.). PAGE 82
  84. 84. Fragment18 activities of the 619 activities (2.9%) PAGE 83
  85. 85. Another example(event log of Dutch housing agency) The event log contains 208 cases that generated 5987 events. There are 74 different activities. PAGE 84
  86. 86. PAGE 85
  87. 87. Conclusion• Many concurrency-related BPM challenges.• Process leave traces in event logs. So, if you are interested in processes, use them!• Process mining: challenging and highly relevant.• Process discovery challenge − balancing between different objectives − only example behavior• Conformance checking challenge − finding the most likely trace − dealing with silent/duplicate steps• Eldorado for exciting concurrency research! PAGE 86
  88. 88. www.processmining.org www.win.tue.nl/ieeetfpm/ PAGE 87

×