Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
On the Role of Fitness,
Precision, Generalization and
Simplicity in Process
Discovery

Joos Buijs
Boudewijn van Dongen
Wil...
Advances in Process Mining

• Many process discovery and conformance checking algorithms
  and tools are available (cf. th...
Example Process Discovery
(Vestia, Dutch housing agency, 208 cases, 5987 events)




                                     ...
Example: Conformance Checking
(WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)




              ...
Challenge:
Four Competing Quality Criteria

 “able to replay event log”                 “Occam’s razor”

replay fitness   ...
Example: one log four models
                                                                                             ...
#     trace
                                                                               455 acdeh
        Model N1     ...
#     trace
                                                                              455 acdeh
        Model N2      ...
#     trace
                                                                                          455 acdeh
        Mo...
#     trace
                                                                                               455 acdeh
Model...
Another challenge: Huge search space

M    M M         M              M
  M M                M M             M M         M...
… with just a few interesting candidates

M    M M         M
  M M                M M       M    M M        M             ...
Two requirements

1. It should be possible to      “able to replay event log”                 “Occam’s razor”

           ...
Proposal: Evolutionary Tree Miner (ETM)

• Process trees as representation (= limit search space
  to "good" models).
• Ge...
Representational Bias: Process Trees

         →           • Always sound because of the
                       block stru...
Petri Net Semantics
(used for comparison and conformance checking only)




                                              ...
Steps of the Genetic ETM Algorithm



              Change
             Population

                            No
  Creat...
Population Change


                           Elite




Population i                               Population i+1




   ...
Four Metrics (see paper)

          “able to replay event log”                 “Occam’s razor”

         replay fitness   ...
Example



                              B            E

                 A            C                       G

        ...
Conventional Algorithms (1/3)
   ("best effort" mapping to process trees to allow for comparison)

alpha miner




       ...
Conventional Algorithms (2/3)

heuristic miner




multi-phase miner




                                   PAGE 21
Conventional Algorithms (3/3)

genetic miner




state-based region miner




                                  PAGE 22
Often unsound result and no mechanism
to seamlessly balance the four criteria




                                     PAG...
Genetic Mining (ETM) While Considering
    Only One Criterion




                                                        ...
Considering Replay Fitness and One
    Other Criterion




ETM with weight zero to two out of four perspectives.   PAGE 25
Considering 3 of 4 Criteria




     replay fitness needs to have a larger weight   PAGE 26
Considering All Four Criteria with
Emphasis on Fitness




    fitness has weight 10

                                    ...
Initial Model Versus Discovered Model




    simulated          discovered by ETM




    B    E

A   C           G

    ...
Real-Life Event Logs

• Event log L0 is the event log used before. L0 contains
  100 traces, 590 events and 7 activities.
...
Results
          If unsound , the
          sound behavior
          is approximated
          when creating
          th...
Conclusion

 “able to replay event log”                 “Occam’s razor”

replay fitness                                sim...
www.processmining.org
  www.win.tue.nl/ieeetfpm/
                             PAGE 35
Upcoming SlideShare
Loading in …5
×

of

On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 1 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 2 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 3 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 4 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 5 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 6 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 7 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 8 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 9 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 10 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 11 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 12 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 13 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 14 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 15 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 16 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 17 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 18 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 19 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 20 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 21 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 22 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 23 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 24 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 25 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 26 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 27 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 28 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 29 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 30 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 31 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 32 On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Slide 33
Upcoming SlideShare
Event Logs: What kind of data does process mining require?
Next
Download to read offline and view in fullscreen.

0 Likes

Share

Download to read offline

On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Download to read offline

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all
  • Be the first to like this

On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

  1. 1. On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery Joos Buijs Boudewijn van Dongen Wil van der Aalst http://www.win.tue.nl/coselog/
  2. 2. Advances in Process Mining • Many process discovery and conformance checking algorithms and tools are available (cf. the various ProM packages). • Also commercial software based on these ideas: Disco (Fluxicon), Reflect (Futura/Perceptive), BPMOne (Pallas Athena/Perceptive), ARIS Process Performance Manager (Software AG), Interstage Automated Process Discovery (Fujitsu), QPR ProcessAnalyzer/Analysis (QPR Software), flow (fourspark), Discovery Analyst (StereoLOGIC), etc. • We applied process mining in over 100 organizations. More than 75 people involving more than 50 organizations created the Process Mining Manifesto in the context of the IEEE Task Force on Process Mining. Available in 13 languages PAGE 1
  3. 3. Example Process Discovery (Vestia, Dutch housing agency, 208 cases, 5987 events) PAGE 2
  4. 4. Example: Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988) PAGE 3
  5. 5. Challenge: Four Competing Quality Criteria “able to replay event log” “Occam’s razor” replay fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” PAGE 4
  6. 6. Example: one log four models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdeh replay fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeg generalization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request … (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 5 N4 : fitness = +, precision = +, generalization = -, simplicity = -
  7. 7. # trace 455 acdeh Model N1 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh b 47 acdefdbeh examine thoroughly 38 adbeg g 33 acdefbdeh pay c compensation 14 acdefbdeg a examine e 11 acdefdbeg start register casually decide end request 9 adcefcdeh h d reject 8 adcefdbeh check ticket request 5 adcefbdeg f reinitiate 3 acdefbdefdbeg request N1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 6 1391
  8. 8. # trace 455 acdeh Model N2 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg a c d e h 33 acdefbdeh start register examine check decide reject end 14 acdefbdeg request casually ticket request N2 : fitness = -, precision = +, generalization = -, simplicity = + 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 7 1391
  9. 9. # trace 455 acdeh Model N3 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh examine check thoroughly b d ticket g 38 adbeg pay 33 acdefbdeh compensation a 14 acdefbdeg start register examine end 11 acdefdbeg request casually c e f reinitiate reject h 9 adcefcdeh decide request request 8 adcefdbeh N3 : fitness = +, precision = -, generalization = +, simplicity = + 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 8 1391
  10. 10. # trace 455 acdeh Model N4 191 abdeg 177 adceh 144 abdeh a d c e g 111 acdeg register check examine decide pay request ticket casually compensation 82 adceg a c d e g 56 adbeh register examine check decide pay request casually ticket compensation 47 acdefdbeh a d c e h 38 adbeg register check examine decide reject request ticket casually request 33 acdefbdeh a c d e h 14 acdefbdeg start end register examine check decide reject request casually ticket request 11 acdefdbeg … (all 21 variants seen in the log) 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg a b d e g register examine check decide pay 3 acdefbdefdbeg request thoroughly ticket compensation 2 adcefdbeg a d b e h register check examine decide reject 2 adcefbdefbdeg request ticket thoroughly request 1 adcefdbefbdeh a b d e h register examine check decide reject 1 adbefbdefdbeg request thoroughly ticket request 1 adcefdbefcdefdbeg N4 : fitness = +, precision = +, generalization = -, simplicity = - PAGE 9 1391
  11. 11. Another challenge: Huge search space M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M MM M M M M M M MM M M M M M M M M M M M M M M M M M M M M M M M M M M M M M MM M MM M M MM M M M M M M MM M MM M MM M MM MM MM M M M M M M M M M M M M M M M M M M M M M MM M M M M M M M M M M M M M M M M MM M M M M M M M M M M M M M M M M M MM M M M M M M M M MM M M M MM M M M M M M M MM M M M M MM M M M M M M MM M M M M M M M M M M M MM M MM M M MM M M M M M M M M M MM M M M MM M M M M M MM M MM MM M M M M M M M M M M M M M M MM M M M M M M M M M M M MM M M M M M M M MM M M M M MM M M M M M M M M M M MM M M M M M M M MM M M M M M M MM M M M MM M MM M M M M MM M M MM M MM M M M M MM M M M M PAGE 10
  12. 12. … with just a few interesting candidates M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M MM M M M M M MM M M M M M M M M M M M M M M M M M M M M M M M M MM M M M M M M M MM M MM M M M M M M MM M M M M MM M MM M MM MM M M M M MM M M M MM M M M M M M M M MM M M MM M M M M M M M M M M M M M M M M M MM M M M M M M MM M M M M M M M M M M M MM MM M M M M M MM M M M M M M M MM M M M M MM M M M M M MM M M M M M M M M M M M M MM M M M M M M M MM M M M MM M M MM M M M M MM M MM MM M M M M M M M M MM M M M M M M M M MM M M M M M M M M M M M M M MM M M M M M M M MM M M M M MM M M M M M M M M M M MM M M M M M M M M MM M M M M M MM M M M M M M M M M MM M M M MM M M M M M MM M MM M M M M M PAGE 11
  13. 13. Two requirements 1. It should be possible to “able to replay event log” “Occam’s razor” replay fitness simplicity seamlessly balance the different quality criteria process discovery based on user-defined generalization precision preferences. “not overfitting the log” “not underfitting the log” 2. The algorithm should always return a "correct" process model and not waste time on model having deadlocks and other anomalies. PAGE 12
  14. 14. Proposal: Evolutionary Tree Miner (ETM) • Process trees as representation (= limit search space to "good" models). • Genetic approach (= very flexible) • Fitness function uses all four criteria (= seamlessly balance the different "forces") PAGE 13
  15. 15. Representational Bias: Process Trees → • Always sound because of the block structure A A → • Also Loop and OR operator ∧ E B C D X D A B B C A (BA)* E PAGE 14
  16. 16. Petri Net Semantics (used for comparison and conformance checking only) A A B Sequence Parallellism B A B A Exclusive Choice A B B Or Choice Loop PAGE 15
  17. 17. Steps of the Genetic ETM Algorithm Change Population No Create Return Measure Stop Initial Best Quality ? Yes Population Individual PAGE 16
  18. 18. Population Change Elite Population i Population i+1 Selection Replace Crossover Mutation PAGE 17
  19. 19. Four Metrics (see paper) “able to replay event log” “Occam’s razor” replay fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” 1 = optimal 0 = very bad PAGE 18
  20. 20. Example B E A C G D F A = send e-mail, B = check credit, C = calculate capacity, D = check system, E = accept, F = reject, G = send e-mail PAGE 19
  21. 21. Conventional Algorithms (1/3) ("best effort" mapping to process trees to allow for comparison) alpha miner low fitness ILP miner low precision language-based region miner low fitness PAGE 20
  22. 22. Conventional Algorithms (2/3) heuristic miner multi-phase miner PAGE 21
  23. 23. Conventional Algorithms (3/3) genetic miner state-based region miner PAGE 22
  24. 24. Often unsound result and no mechanism to seamlessly balance the four criteria PAGE 23
  25. 25. Genetic Mining (ETM) While Considering Only One Criterion best value possible for this log ETM with weight zero to three out of four perspectives. PAGE 24
  26. 26. Considering Replay Fitness and One Other Criterion ETM with weight zero to two out of four perspectives. PAGE 25
  27. 27. Considering 3 of 4 Criteria replay fitness needs to have a larger weight PAGE 26
  28. 28. Considering All Four Criteria with Emphasis on Fitness fitness has weight 10 PAGE 27
  29. 29. Initial Model Versus Discovered Model simulated discovered by ETM B E A C G D F PAGE 28
  30. 30. Real-Life Event Logs • Event log L0 is the event log used before. L0 contains 100 traces, 590 events and 7 activities. • Event Log L1 contains 105 traces, 743 events in total, with 6 different activities. • Event Log L2 contains 444 traces, 3.269 events in total, with 6 different activities. • Event Log L3 contains 274 traces, 1:582 events in total, with 6 different activities. Event logs L1, L2 and L3 are extracted from the information systems of municipalities participating in the CoSeLoG project (http://www.win.tue.nl/coselog/). PAGE 29
  31. 31. Results If unsound , the sound behavior is approximated when creating the process tree. Equal weights for all criteria. PAGE 30
  32. 32. Conclusion “able to replay event log” “Occam’s razor” replay fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” • First algorithm that allows for balancing all four perspectives. • Genetic algorithm is very flexible, but also very slow. • Process trees only used internally (choose your favorite representation) • Future work: − Improve speed − Distribute PM tasks − Discover configurable process trees PAGE 34
  33. 33. www.processmining.org www.win.tue.nl/ieeetfpm/ PAGE 35

Views

Total views

1,651

On Slideshare

0

From embeds

0

Number of embeds

52

Actions

Downloads

52

Shares

0

Comments

0

Likes

0

×