Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Process Mining - Chapter 5 - Process Discovery

1,725 views

Published on

Slides supporting the book "Process Mining: Discovery, Conformance, and Enhancement of Business Processes" by Wil van der Aalst. See also http://springer.com/978-3-642-19344-6 (ISBN 978-3-642-19344-6) and the website http://www.processmining.org/book/start providing sample logs.

Published in: Business, Technology
  • Do This Simple 2-Minute Ritual To Loss 1 Pound Of Belly Fat Every 72 Hours ◆◆◆ http://scamcb.com/bkfitness3/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Doctor's 2-Minute Ritual For Shocking Daily Belly Fat Loss! Watch This Video ♥♥♥ https://tinyurl.com/bkfitness4u
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • ★★ How Long Does She Want You to Last? ★★ A recent study proved that the average man lasts just 2-5 minutes in bed (during intercourse). The study also showed that many women need at least 7-10 minutes of intercourse to reach "The Big O" - and, worse still... 30% of women never get there during intercourse. Clearly, most men are NOT fulfilling there women's needs in bed. Now, as I've said many times - how long you can last is no guarantee of being a GREAT LOVER. But, not being able to last 20, 30 minutes or more, is definitely a sign that you're not going to "set your woman's world on fire" between the sheets. Question is: "What can you do to last longer?" Well, one of the best recommendations I can give you today is to read THIS report. In it, you'll discover a detailed guide to an Ancient Taoist Thrusting Technique that can help any man to last much longer in bed. I can vouch 100% for the technique because my husband has been using it for years :) Here's the link to the report ➤➤ https://tinyurl.com/rockhardxxx
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Thanks for sharing, greeting from Ecuador
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Process Mining - Chapter 5 - Process Discovery

  1. 1. Chapter 5Process Discovery:An Introductionprof.dr.ir. Wil van der Aalstwww.processmining.org
  2. 2. OverviewChapter 1IntroductionPart I: PreliminariesChapter 2 Chapter 3Process Modeling and Data MiningAnalysisPart II: From Event Logs to Process ModelsChapter 4 Chapter 5 Chapter 6Getting the Data Process Discovery: An Advanced Process Introduction Discovery TechniquesPart III: Beyond Process DiscoveryChapter 7 Chapter 8 Chapter 9Conformance Mining Additional Operational SupportChecking PerspectivesPart IV: Putting Process Mining to WorkChapter 10 Chapter 11 Chapter 12Tool Support Analyzing “Lasagna Analyzing “Spaghetti Processes” Processes”Part V: ReflectionChapter 13 Chapter 14Cartography and EpilogueNavigation PAGE 1
  3. 3. Process discovery supports/ “world” business controls processes software people machines system components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery (process) event conformance model logs enhancement PAGE 2
  4. 4. Process discovery = Play-InPlay-In event log process modelPlay-Out process model event logReplay • extended model showing times, frequencies, etc. • diagnostics • predictions • recommendations event log process model PAGE 3
  5. 5. Example b a p1 e p3 dstart end p2 c p4 Event log contains all possible traces of model and vice versa. PAGE 4
  6. 6. Another example p1 b p3 a f e dstart p5 end p2 c p4 Generalization: event log contains only subset of all possible traces of model. PAGE 5
  7. 7. Notation is less relevant (e.g. BPMN) b a p1 e p3 dstart end b p2 c p4 c a d start end e PAGE 6
  8. 8. Another BPMN example p1 b p3 a f e dstart p5 end b p2 c p4 c a d start end f e PAGE 7
  9. 9. Challenge• In general, there is a trade-off between the following four quality criteria:1.Fitness: the discovered model should allow for the behavior seen in the event log.2.Precision (avoid underfitting): the discovered model should not allow for behavior completely unrelated to what was seen in the event log.3.Generalization (avoid overfitting): the discovered model should generalize the example behavior seen in the event log.4.Simplicity: the discovered model should be as simple as possible. PAGE 8
  10. 10. Process Discovery:example of algorithm α PAGE 9
  11. 11. >,→,||,# relations• Direct succession: x>y iff for some case x is directly followed by y. abcd• Causality: x→y iff x>y and acbd not y>x. aed• Parallel: x||y iff x>y and a>b y>x a>c a→b b#e a>e• Choice: x#y iff not x>y and a→c e#b b>c b||c c#e not y>x. a→e b>d c||b c>b b→d a#d … c>d c→d e>d e→d PAGE 10
  12. 12. Basic Idea Used by α Algorithm (1) a b (a) sequence pattern: a→b PAGE 11
  13. 13. Basic Idea Used by α Algorithm (2) a b c b a (c) XOR-join pattern: b a→c, b→c, and a#b a c c (b) XOR-split pattern: (b) XOR-split pattern:a→c, and b#c a→b, a→b, a→c, and b#c PAGE 12
  14. 14. Basic Idea Used by α Algorithm (3) a b c b a (e) AND-join pattern: b a→c, b→c, and a||b a c c (d) AND-split pattern: (d) AND-split pattern: a→b, a→c, and b||c a→b, a→c, and b||c PAGE 13
  15. 15. Example Revisited a>b a→b b||c b#e a>c a→c c||b e#b a>e a→e c#e b>c a#d b→d b>d … c>b c→ d c>d e→d b e>d a p1 e p3 d start end p2 c p4Result produced by α algorithm PAGE 14
  16. 16. Footprint of L1 b a p1 e p3 d start end p2 c p4 PAGE 15
  17. 17. Footprint of L2 p1 b p3 a f e d start p5 end p2 c p4 PAGE 16
  18. 18. Simple patterns a b (a) sequence pattern: a→b b a a c c b (b) XOR-split pattern: (c) XOR-join pattern: a→b, a→c, and b#c a→c, b→c, and a#b b a a c c b (d) AND-split pattern: (e) AND-join pattern: PAGE 17 a→b, a→c, and b||c a→c, b→c, and a||b
  19. 19. AlgorithmLet L be an event log over T. α(L) is defined as follows.1. TL = { t ∈ T | ∃σ ∈ L t ∈ σ},2. TI = { t ∈ T | ∃σ ∈ L t = first(σ) },3. TO = { t ∈ T | ∃σ ∈ L t = last(σ) },4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 },5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) },6. PL = { p(A,B) | (A,B) ∈ YL } ∪{iL,oL},7. FL = { (a,p(A,B)) | (A,B) ∈ YL ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈ YL ∧ b ∈ B } ∪{ (iL,t) | t ∈ TI} ∪{ (t,oL) | t ∈ TO}, and8. α(L) = (PL,TL,FL). PAGE 18
  20. 20. Key idea: find places a1 b1 a2 b2 ... p(A,B) ... am bn A={a1,a2, … am} B={b1,b2, … bn}4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 },5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) }, PAGE 19
  21. 21. Places as footprints a1 b1 a2 b2 ... p(A,B) ... am bn A={a1,a2, … am} B={b1,b2, … bn} PAGE 20
  22. 22. b a p1 e p3 dstart end p2 c p4 PAGE 21
  23. 23. Another event log L3 PAGE 22
  24. 24. Model for L3 f c a b p({b},{c}) p({c},{e}) e giL p({a,f},{b}) d p({e},{f,g}) oL p({b},{d}) p({d},{e}) PAGE 23
  25. 25. Another event log L4 a d ciL p({a,b},{c}) p({c},{d,e}) oL b e PAGE 24
  26. 26. Event log L5 PAGE 25
  27. 27. PAGE 26
  28. 28. Discovered model d c p({c},{d}) b p({a,d},{b}) p({b},{c,f})iL a f oL e p({a},{e}) p({e},{f}) PAGE 27
  29. 29. Limitation of α algorithm (implicit places) c a d p1 g e p2 b fGreen places are implicit! PAGE 28
  30. 30. Limitation of α algorithm(loops of length 1) b a c b a c PAGE 29
  31. 31. Limitation of α algorithm(loops of length 2) c a b d b a d c PAGE 30
  32. 32. Limitation of α algorithm(non-local dependencies) a p1 d c b p2 eGreen places are not discovered! PAGE 31
  33. 33. Difficult constructs for α algorithm a c b PAGE 32
  34. 34. Taking the transactional life-cycle into accounta c assign assign b start assigned assigned suspend running start start suspended resume complete running running complete complete PAGE 33
  35. 35. Rediscovering process models simulate discover discovered original process event process model log model N N’ N=N’ ? The rediscovery problem: Is the discovered model N’ equivalent to the original model N? PAGE 34
  36. 36. Equivalence: trace equivalence, bisimilarity, and branching bisimilarity s1 s5 s8 birth curse birth birth curse curse s6 curse curse ? s9 s10 s2 curse heaven hell heaven hell heaven hell hell heavencurse heaven s3 s4 s7 s11 TS1 TS2 TS3Three trace equivalent transition systems: TS1 and TS2are not bisimilar, but TS2 and TS3 are bisimilar PAGE 35
  37. 37. Branching bisimilarity defined for YAWL start s1 s6 start check check check check s2 τ τ s7 c1 c2 s3 s4 c3 ?reject accept reject accept reject accept reject accept end s8 end s5 TS1 TS2TS1 and TS2 are not branching bisimilar (although trace equivalent). PAGE 36
  38. 38. Challenge: finding the rightrepresentational bias a a start p end There is no WF-net with unique visible labels that exhibits this behavior. PAGE 37
  39. 39. Another example τ a b cstart p1 p1 end (a) a a b c There is no WF-start p1 p1 end net with unique (b) visible labels that exhibits this behavior. a b cstart p1 p1 end PAGE 38 (c)
  40. 40. Challenge: noise and incompleteness• To discover a suitable process model it is assumed that the event log contains a representative sample of behavior.• Two related phenomena: − Noise: the event log contains rare and infrequent behavior not representative for the typical behavior of the process. − Incompleteness: the event log contains too few events to be able to discover some of the underlying control-flow structures. PAGE 39
  41. 41. More on incompletenessSee also chapter 3 (cross-validation, precision, recall, etc.) PAGE 40
  42. 42. Challenge: BalancingBetween Underfitting andOverfitting PAGE 41
  43. 43. Challenge: four competing qualitycriteria “able to replay event log” “Occam’s razor” fitness simplicity process discoverygeneralization precision “not overfitting the log” “not underfitting the log” PAGE 42
  44. 44. Flower model b c a d start end e h f g PAGE 43
  45. 45. What is the best model? A D C ACD 99 B E ACE 0 BCE 85 A D BCD 0 C B E PAGE 44
  46. 46. What is the best model? A D C ACD 99 B E ACE 88 BCE 85 A D BCD 78 C B E PAGE 45
  47. 47. What is the best model? A D C ACD 99 B E ACE 2 BCE 85 A D BCD 3 C B E PAGE 46
  48. 48. Example: one log four models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdeh fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeggeneralization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request … (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 47 N4 : fitness = +, precision = +, generalization = -, simplicity = -
  49. 49. # trace 455 acdeh Model N1 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh b 47 acdefdbeh examine thoroughly 38 adbeg g 33 acdefbdeh pay c compensation 14 acdefbdeg a examine e 11 acdefdbegstart register casually decide end request 9 adcefcdeh h d reject 8 adcefdbeh check ticket request 5 adcefbdeg f reinitiate 3 acdefbdefdbeg requestN1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 48 1391
  50. 50. # trace 455 acdeh Model N2 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg a c d e h 33 acdefbdehstart register examine check decide reject end 14 acdefbdeg request casually ticket request N2 : fitness = -, precision = +, generalization = -, simplicity = + 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 49 1391
  51. 51. # trace 455 acdeh Model N3 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh examine check thoroughly b d ticket g 38 adbeg pay 33 acdefbdeh compensation a 14 acdefbdegstart register examine end 11 acdefdbeg request casually c e f reinitiate reject h 9 adcefcdeh decide request request 8 adcefdbeh N3 : fitness = +, precision = -, generalization = +, simplicity = + 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 50 1391
  52. 52. # trace 455 acdehModel N4 191 abdeg 177 adceh 144 abdeh a d c e g 111 acdeg register check examine decide pay request ticket casually compensation 82 adceg a c d e g 56 adbeh register examine check decide pay request casually ticket compensation 47 acdefdbeh a d c e h 38 adbeg register check examine decide reject request ticket casually request 33 acdefbdeh a c d e h 14 acdefbdegstart end register examine check decide reject request casually ticket request 11 acdefdbeg … (all 21 variants seen in the log) 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg a b d e g register examine check decide pay 3 acdefbdefdbeg request thoroughly ticket compensation 2 adcefdbeg a d b e h register check examine decide reject 2 adcefbdefbdeg request ticket thoroughly request 1 adcefdbefbdeh a b d e h register examine check decide reject 1 adbefbdefdbeg request thoroughly ticket request 1 adcefdbefcdefdbeg N4 : fitness = +, precision = +, generalization = -, simplicity = - PAGE 51 1391
  53. 53. Why is process mining such a difficultproblem?• There are no negative examples (i.e., a log shows what has happened but does not show what could not happen).• Due to concurrency, loops, and choices the search space has a complex structure and the log typically contains only a fraction of all possible behaviors.• There is no clear relation between the size of a model and its behavior (i.e., a smaller model may generate more or less behavior although classical analysis and evaluation methods typically assume some monotonicity property). PAGE 52
  54. 54. Creating a 2-D slice of a 3-D realityCreating a 2-D slice of a 3-D reality: theprocess is viewed from a specific angle, theprocess is scoped using a frame, and theresolution determines the granularity of theresulting model PAGE 53

×