Keynote on Process Mining at SSCI 2010 / CIDM 2011

629 views

Published on

Keynote IEEE Symposium Series on Computational Intelligence (SSCI 2011)/IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011), April 2011, Paris, France

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
629
On SlideShare
0
From Embeds
0
Number of Embeds
10
Actions
Shares
0
Downloads
50
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Keynote on Process Mining at SSCI 2010 / CIDM 2011

  1. 1. Process Mining:Discovering and ImprovingSpaghetti and Lasagna Processesprof.dr.ir. Wil van der Aalstwww.processmining.org
  2. 2. Architecture of Information Systems @ TU/e process BPM/WFM/ discovery SOA systems Process PAIS Mining Technology conformance workflow checking patterns simulation Process Modeling/ Analysis verification
  3. 3. Data explosion PAGE 2
  4. 4. The Worlds Technological Capacity to Store, Communicate, and ComputeInformation by Martin Hilbert and Priscila López (DOI 10.1126/science.1200970) PAGE 3
  5. 5. Process Mining = (RM,RD) c11 modify conditions (YE,RD) check_A c5 (RM,RD) c2 check_A c8 (E,SD) needed? (RM,RD) (E,RD) Smoker c6(YE,RD) No start register c1 initial c3 check_B check_B c9 asses c12 decline conditions needed? risk Yes c7(FE,FD) Drinker c4 check_C check_C c10 needed? Short(91/10) <81.5 Yes Weight ≥81.5 No Long (30/1) + (SM,SD) (E,SD) c13 (E,FD) (E,SD) make c14 handle c15 handle c16 send offer response payment insurance documents (E,SD) Long Short (150/20) (321/25) c17 withdraw timeout1 timeout2 offerData Mining Process Analysis PAGE 4
  6. 6. Process Mining • Process discovery: "What is really happening?" • Conformance checking: "Do we do what was agreed upon?" • Performance analysis: "Where are the bottlenecks?" • Process prediction: "Will this case be late?" • Process improvement: "How to redesign this process?" • Etc. PAGE 5
  7. 7. We applied ProM in >100 organizations• Municipalities (e.g., Alkmaar, Heusden, Harderwijk, etc.)• Government agencies (e.g., Rijkswaterstaat, Centraal Justitieel Incasso Bureau, Justice department)• Insurance related agencies (e.g., UWV)• Banks (e.g., ING Bank)• Hospitals (e.g., AMC hospital, Catharina hospital)• Multinationals (e.g., DSM, Deloitte)• High-tech system manufacturers and their customers (e.g., Philips Healthcare, ASML, Ricoh, Thales)• Media companies (e.g. Winkwaves)• ... PAGE 6
  8. 8. Process Mining supports/ “world” business controls processes software people machines system components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery (process) event conformance model logs enhancement
  9. 9. Starting point: event log XES, MXML, SA-MXML, CSV, etc. PAGE 8
  10. 10. Simplified event log a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, and h = reject request PAGE 9
  11. 11. Processdiscovery b examine thoroughly g c1 c3 pay c compensation a examine estart register casually decide c5 end request h c2 d c4 reject check ticket request f reinitiate request PAGE 10
  12. 12. Conformancechecking b case 7: e is executed examine without thoroughly case 8: g or being g h is missing enabled c1 c3 pay c compensation a examine estart register casually decide c5 end request case 10: e h d is missing c2 c4 reject in second check ticket round request f reinitiate request PAGE 11
  13. 13. Extension: Adding perspectives tomodel based on event log The event log can be used to discover roles in the organization (e.g., groups of people with similar work patterns). These roles can be Performance information (e.g., the used to relate individuals and average time between two activities. subsequent activities) can be extracted from the event log and visualized on top of the model. Role A: Role E: Role M: Assistant Expert Manager Decision rules (e.g., a decision tree based on data known at the time a Pete Sue Sara particular choice was made) can be learned from the event log and used Mike Sean to annotated decisions. Ellen E b A examine thoroughly A g A M c1 c3 pay c compensation a examine e A start register casually A decide c5 end request h c2 d c4 M reject check ticket request f reinitiate request PAGE 12
  14. 14. Let us play …
  15. 15. Play-Out process model event log PAGE 14
  16. 16. Play-Out (Classical use of models) B A p1 E p3 Dstart end p2 C p4 A B C D AED AED ABCD ACBD ACBD AED ACBD PAGE 15
  17. 17. Play-Inevent log process model PAGE 16
  18. 18. Play-InABCD AED AED ABCD ACBDACBD AED ACBD B A p1 E p3 Dstart end p2 C p4 PAGE 17
  19. 19. Replay • extended model showing times, frequencies, etc. • diagnostics • predictions • recommendationsevent log process model PAGE 18
  20. 20. Replay ABC D B A p1 E p3 Dstart end p2 C p4 PAGE 19
  21. 21. Replay can detect problems AC D Problem! Problem! token left behind B missing token A p1 E p3 Dstart end p2 C p4 PAGE 20
  22. 22. Replay can extract timing information A5 B8 C9 D13 8 5 6 4 7 3 B 2 5 8 A p1 E p3 Dstart end 5 13 4 p2 3 C p4 4 37 4 7 6 9 PAGE 21
  23. 23. Desire lines in process models PAGE 22
  24. 24. An example algorithm PAGE 23
  25. 25. Process Discovery:basic idea α PAGE 24
  26. 26. >,→,||,# relations• Direct succession: x>y iff for some case x is directly followed by y. abcd• Causality: x→y iff x>y and acbd not y>x. aed• Parallel: x||y iff x>y and a>b y>x a>c a→b b#e a>e• Choice: x#y iff not x>y and a→c e#b b>c b||c c#e not y>x. a→e b>d c||b c>b b→d a#d … c>d c→d e>d e→d PAGE 25
  27. 27. Basic Idea Used by α Algorithm (1) a b (a) sequence pattern: a→b PAGE 26
  28. 28. Basic Idea Used by α Algorithm (2) a b c b a (c) XOR-join pattern: b a→c, b→c, and a#b a c c (b) XOR-split pattern: (b) XOR-split pattern:a→c, and b#c a→b, a→b, a→c, and b#c PAGE 27
  29. 29. Basic Idea Used by α Algorithm (3) a b c b a (e) AND-join pattern: b a→c, b→c, and a||b a c c (d) AND-split pattern: (d) AND-split pattern: a→b, a→c, and b||c a→b, a→c, and b||c PAGE 28
  30. 30. Example Revisited a>b a→b b||c b#e a>c a→c c||b e#b a>e a→e c#e b>c a#d b→d b>d … c>b c→ d c>d e→d b e>d a p1 e p3 d start end p2 c p4Result produced by α algorithm PAGE 29
  31. 31. PAGE 30
  32. 32. Challenge: four competing qualitycriteria “able to replay event log” “Occam’s razor” fitness simplicity process discoverygeneralization precision “not overfitting the log” “not underfitting the log” PAGE 31
  33. 33. Flower model b c a d start end e h f g PAGE 32
  34. 34. What is the best model? A D C ACD 99 B E ACE 0 BCE 85 A D BCD 0 C B E PAGE 33
  35. 35. What is the best model? A D C ACD 99 B E ACE 88 BCE 85 A D BCD 78 C B E PAGE 34
  36. 36. What is the best model? A D C ACD 99 B E ACE 2 BCE 85 A D BCD 3 C B E PAGE 35
  37. 37. Example: one log four models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdeh fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeggeneralization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request … (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 36 N4 : fitness = +, precision = +, generalization = -, simplicity = -
  38. 38. # trace 455 acdeh Model N1 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh b 47 acdefdbeh examine thoroughly 38 adbeg g 33 acdefbdeh pay c compensation 14 acdefbdeg a examine e 11 acdefdbegstart register casually decide end request 9 adcefcdeh h d reject 8 adcefdbeh check ticket request 5 adcefbdeg f reinitiate 3 acdefbdefdbeg requestN1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 37 1391
  39. 39. # trace 455 acdeh Model N2 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg a c d e h 33 acdefbdehstart register examine check decide reject end 14 acdefbdeg request casually ticket request N2 : fitness = -, precision = +, generalization = -, simplicity = + 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 38 1391
  40. 40. # trace 455 acdeh Model N3 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh examine check thoroughly b d ticket g 38 adbeg pay 33 acdefbdeh compensation a 14 acdefbdegstart register examine end 11 acdefdbeg request casually c e f reinitiate reject h 9 adcefcdeh decide request request 8 adcefdbeh N3 : fitness = +, precision = -, generalization = +, simplicity = + 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 39 1391
  41. 41. # trace 455 acdehModel N4 191 abdeg 177 adceh 144 abdeh a d c e g 111 acdeg register check examine decide pay request ticket casually compensation 82 adceg a c d e g 56 adbeh register examine check decide pay request casually ticket compensation 47 acdefdbeh a d c e h 38 adbeg register check examine decide reject request ticket casually request 33 acdefbdeh a c d e h 14 acdefbdegstart end register examine check decide reject request casually ticket request 11 acdefdbeg … (all 21 variants seen in the log) 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg a b d e g register examine check decide pay 3 acdefbdefdbeg request thoroughly ticket compensation 2 adcefdbeg a d b e h register check examine decide reject 2 adcefbdefbdeg request ticket thoroughly request 1 adcefdbefbdeh a b d e h register examine check decide reject 1 adbefbdefdbeg request thoroughly ticket request 1 adcefdbefcdefdbeg N4 : fitness = +, precision = +, generalization = -, simplicity = - PAGE 40 1391
  42. 42. Why is process mining such a difficultproblem?• There are no negative examples (i.e., a log shows what has happened but does not show what could not happen).• Due to concurrency, loops, and choices the search space has a complex structure and the log typically contains only a fraction of all possible behaviors.• There is no clear relation between the size of a model and its behavior (i.e., a smaller model may generate more or less behavior although classical analysis and evaluation methods typically assume some monotonicity property). PAGE 41
  43. 43. How can process mining help?• Detect bottlenecks • Provide mirror• Detect deviations • Highlight important• Performance problems measurement • Avoid ICT failures• Suggest improvements • Avoid management by• Decision support (e.g., PowerPoint recommendation and • From “politics” to prediction) “analytics” PAGE 42
  44. 44. PAGE 43
  45. 45. Example of a Lasagna process: WMO process of a Dutch municipalityEach line corresponds to one of the 528 requests that werehandled in the period from 4-1-2009 until 28-2-2010. In totalthere are 5498 events represented as dots. The mean timeneeded to handled a case is approximately 25 days. PAGE 44
  46. 46. WMO process (Wet Maatschappelijke Ondersteuning)• WMO refers to the social support act that came into force in The Netherlands on January 1st, 2007.• The aim of this act is to assist people with disabilities and impairments. Under the act, local authorities are required to give support to those who need it, e.g., household help, providing wheelchairs and scootmobiles, and adaptations to homes.• There are different processes for the different kinds of help. We focus on the process for handling requests for household help.• In a period of about one year, 528 requests for household WMO support were received.• These 528 requests generated 5498 events. PAGE 45
  47. 47. C-net discovered usingheuristic miner (1/3) PAGE 46
  48. 48. C-net discovered usingheuristic miner (2/3) PAGE 47
  49. 49. C-net discovered usingheuristic miner (3/3) PAGE 48
  50. 50. Conformance check WMO process (1/3) PAGE 49
  51. 51. Conformance check WMO process (2/3) PAGE 50
  52. 52. Conformance check WMO process (3/3) The fitness of the discovered process is 0.99521667. Of the 528 cases, 496 cases fit perfectly whereas for 32 cases there are missing or remaining tokens. PAGE 51
  53. 53. Bottleneck analysis WMO process (1/3) PAGE 52
  54. 54. Bottleneck analysis WMO process (2/3) PAGE 53
  55. 55. Bottleneck analysis WMO process (3/3)flow time ofapprox. 25 dayswith a standarddeviation ofapprox. 28 PAGE 54
  56. 56. Two additional Lasagna processes RWS (“Rijkswaterstaat”) process WOZ (“Waardering Onroerende Zaken”) process PAGE 55
  57. 57. RWS Process• The Dutch national public works department, called “Rijkswaterstaat” (RWS), has twelve provincial offices. We analyzed the handling of invoices in one of these offices.• The office employs about 1,000 civil servants and is primarily responsible for the construction and maintenance of the road and water infrastructure in its province.• To perform its functions, the RWS office subcontracts various parties such as road construction companies, cleaning companies, and environmental bureaus. Also, it purchases services and products to support its construction, maintenance, and administrative activities. PAGE 56
  58. 58. C-net discovered using heuristic miner PAGE 57
  59. 59. Social network constructed based onhandovers of work Each of the 271 nodes corresponds to a civil servant. Two civil servants are connected if one executed an activity causally following an activity executed by the other civil servant PAGE 58
  60. 60. Social network consisting of civil servants thatexecuted more than 2000 activities in a 9 month period. The darker arcs indicate the strongest relationships in the social network. Nodes having the same color belong to the same clique. PAGE 59
  61. 61. WOZ process• Event log containing information about 745 objections against the so-called WOZ (“Waardering Onroerende Zaken”) valuation.• Dutch municipalities need to estimate the value of houses and apartments. The WOZ value is used as a basis for determining the real-estate property tax.• The higher the WOZ value, the more tax the owner needs to pay. Therefore, there are many objections (i.e., appeals) of citizens that assert that the WOZ value is too high.• “WOZ process” discovered for another municipality (i.e., different from the one for which we analyzed the WMO process). PAGE 60
  62. 62. Discovered process modelThe log contains events related to 745 objections against theso-called WOZ valuation. These 745 objections generated 9583events. There are 13 activities. For 12 of these activities bothstart and complete events are recorded. Hence, the WF-net has PAGE 6125 transitions.
  63. 63. Conformance checker:(fitness is 0.98876214) PAGE 62
  64. 64. Performance analysis bottleneck detection: places are colored based on average durations time required to move from one activity to another information on total flow time PAGE 63
  65. 65. Resource-activity matrix(four groups discovered) clique 1 clique 2 clique 3 clique 4 PAGE 64
  66. 66. PAGE 65
  67. 67. Example of a Spaghetti processSpaghetti process describing the diagnosis and treatment of 2765patients in a Dutch hospital. The process model was constructedbased on an event log containing 114,592 events. There are 619different activities (taking event types into account) executed by 266different individuals (doctors, nurses, etc.). PAGE 66
  68. 68. Fragment18 activities of the 619 activities (2.9%) PAGE 67
  69. 69. Another example(event log of Dutch housing agency) The event log contains 208 cases that generated 5987 events. There are 74 different activities. PAGE 68
  70. 70. PAGE 69
  71. 71. PAGE 70
  72. 72. Example of a map Road map of The Netherlands. The map abstracts from smaller cities and less significant roads; only the bigger cities, highways, and other important roads are shown. Moreover, cities aggregate local roads and local districts. Also not use of color, size, etc. PAGE 71
  73. 73. Illustrating the problemx start y 1.0 z 1.0 1.0 a f j p3 p9 p1 p12 p7 0.4 0.3 0.4 0.6 0.6 0.4 0.3 0.6 0.4 b c d g h k l0.4 0.3 0.3 p4 0.4 0.6 p10 p2 p8 p5 p11 1.0 e i 1.0 p6 end PAGE 72
  74. 74. Classical top level view: low level connections still exist p3 p9 p4 x y z p10 p5 p11x start y 1.0 z 1.0 p6 1.0 a f j p3 p9 p1 p12 p7 0.4 0.3 0.4 0.6 0.6 0.4 0.3 0.6 0.4 b c d g h k l0.4 0.3 0.3 p4 0.4 0.6 p10 p2 p8 p5 p11 1.0 e i 1.0 p6 end PAGE 73
  75. 75. Seamless zoomThreshold: 1.0 x y z a f j x y z e iThreshold: 0.6 x y z a f j h k x y z e iThreshold: 0.4 x y z a f j b g h k l x y z e iThreshold: 0.3 x y z a f j b c d g h k l x y z e i PAGE 74
  76. 76. Example: Reviewing papers(100 cases generating 3730 events) WF-net discovered using the α-algorithm PAGE 75
  77. 77. Fuzzy miner: two views on the same process fuzzy model showing fuzzy model all activities showing only two activities color and width of arc indicates significanceof connection PAGE 76
  78. 78. Balancing between both extremes fuzzy model showing all activities fuzzy model showing only two activities color and width of arc indicates significanceof connection aggregated node containing 10 activities inner structure of aggregated node PAGE 77
  79. 79. Not a single map! PAGE 78
  80. 80. Projecting dynamic information onbusiness process maps PAGE 79
  81. 81. Projecting traffic jams on maps PAGE 80
  82. 82. Business process movies PAGE 81
  83. 83. Navigation• Whereas a TomTom device is continuously showing the expected arrival time, users of today’s information systems are often left clueless about likely outcomes of the cases they are working on.• Car navigation systems provide directions and guidance without controlling the driver. The driver is still in control, but, given a goal (e.g. to get from A to B as fast as possible), the navigation system recommends the next action to be taken.• Operational support provides TomTom functionality for business processes. PAGE 82
  84. 84. Recommend: How to get home ASAP? Take a left turn! Detect: You drive too fast! Predict: When will I be home? At 11.26! PAGE 83
  85. 85. Conclusion: two types of processes PAGE 84
  86. 86. www.processmining.org www.win.tue.nl/ieeetfpm/ PAGE 85

×