Chapter 6
Advanced Process Discovery Techniques
prof.dr.ir. Wil van der Aalst
www.processmining.org
Overview
Chapter 1: Introduction
• Part I: Preliminaries
   −   Chapter 2: Process Modeling and Analysis
   −   Chapter 3: Data Mining
• Part II: From Event Logs to Process Models
   −   Chapter 4: Getting the Data
   −   Chapter 5: Process Discovery: An Introduction
   −   Chapter 6: Advanced Process Discovery Techniques
• Part III: Beyond Process Discovery
   −   Chapter 7: Conformance Checking
   −   Chapter 8: Mining Additional Perspectives
   −   Chapter 9: Operational Support
• Part IV: Putting Process Mining to Work
   −   Chapter 10: Tool Support
   −   Chapter 11: Analyzing “Lasagna Processes”
   −   Chapter 12: Analyzing “Spaghetti Processes”
• Part V: Reflection
   −   Chapter 13: Cartography and Navigation
   −   Chapter 14: Epilogue
                                                                          PAGE 1
Process discovery

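Figure: the “world” (people, machines, components, organizations) with its business processes, which a software system supports/controls; the software system records events (e.g., messages, transactions, etc.) in event logs. (Process) models analyze the world and specify, configure, implement, and analyze the software system. Discovery, conformance, and enhancement relate the (process) model and the event logs.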
                                                                   PAGE 2
Challenge

Figure: process discovery at the center of four quality dimensions:
• Fitness: “able to replay event log”
• Simplicity: “Occam’s razor”
• Precision: “not underfitting the log”
• Generalization: “not overfitting the log”



                                                              PAGE 3
Observing a stable process infinitely long

Figure: frequent behavior, a trace in the event log, and all behavior (including noise).




                                                  PAGE 4
Target model


               target model




                              PAGE 5
Non-fitting model


                    non-fitting model




                                        PAGE 6
Overfitting model


                    overfitting model




                                        PAGE 7
Underfitting model


               underfitting model




                                    PAGE 8
Characteristics of process discovery algorithms
• Representational bias
   −   Inability to represent concurrency
   −   Inability to deal with (arbitrary) loops
   −   Inability to represent silent actions
   −   Inability to represent duplicate actions
   −   Inability to model OR-splits/joins
   −   Inability to represent non-free-choice behavior
   −   Inability to represent hierarchy
• Ability to deal with noise
• Completeness notion assumed
• Approach used (direct algorithmic approaches, two-phase approaches,
  computational intelligence approaches, partial approaches, etc.)
                                                       PAGE 9
Examples
• Algorithmic techniques
  • Alpha miner
  • Alpha+, Alpha++, Alpha#
  • FSM miner
  • Fuzzy miner
  • Heuristic miner
  • Multi-phase miner
• Genetic process mining
  • Single/duplicate tasks
  • Distributed GM
• Region-based process mining
  • State-based regions
  • Language-based regions
• Classical approaches not dealing with concurrency
  • Inductive inference (Mark Gold, Dana Angluin et al.)
  • Sequence mining
                                                           PAGE 10
Heuristic mining

• To deal with noise and incompleteness.
• To have a better representational bias than the α
  algorithm (AND/XOR/OR/skip).
• Uses C-nets.


Figure: example C-net with activities a (register claim), b (check policy), c (check damage), d (consult expert), and e (close case).
                                                      PAGE 11
Example log; problem with the α algorithm




Figure: Petri net with transitions a, b, c, d, e and places start, p1, p2, p3, p4, p5, end.

                                         PAGE 12
Taking into account frequencies




                                  PAGE 13
Dependency measure
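The formula on this slide did not survive extraction. As a hedged reconstruction, the dependency measure usually used in heuristic mining is given below, where |a >_L b| is the number of times a is directly followed by b in log L. It agrees with the values in the threshold figures: 11 direct successions and none in the reverse direction give 11/12 ≈ 0.92, and a self-loop observed 4 times gives 4/5 = 0.80.

```latex
|a \Rightarrow_L b| =
\begin{cases}
  \dfrac{|a >_L b| - |b >_L a|}{|a >_L b| + |b >_L a| + 1} & \text{if } a \neq b,\\[2ex]
  \dfrac{|a >_L a|}{|a >_L a| + 1} & \text{if } a = b.
\end{cases}
```

Values close to 1 indicate a strong dependency; the +1 in the denominator keeps pairs that were observed only a few times from getting a value close to 1.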




                     PAGE 14
Example
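A minimal Python sketch of how the frequencies and dependency values can be computed. The log below is an assumption: the example log itself did not survive extraction, so we use a log chosen to reproduce the counts shown in the figures (a and e occur 40 times, b and c 21 times, d 17 times). The names `LOG`, `directly_follows`, and `dependency` are hypothetical.

```python
from collections import Counter

# Assumed log (trace -> frequency), reconstructed to match the counts in
# the figures; it is not taken from the slide itself.
LOG = {
    ("a", "e"): 5,
    ("a", "b", "c", "e"): 10,
    ("a", "c", "b", "e"): 10,
    ("a", "b", "e"): 1,
    ("a", "c", "e"): 1,
    ("a", "d", "e"): 10,
    ("a", "d", "d", "e"): 2,
    ("a", "d", "d", "d", "e"): 1,
}


def directly_follows(log):
    """|x >_L y|: how often x is directly followed by y in the log."""
    counts = Counter()
    for trace, freq in log.items():
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += freq
    return counts


def dependency(counts, x, y):
    """Dependency measure |x =>_L y| used to filter the dependency graph."""
    xy = counts[(x, y)]  # Counter returns 0 for unseen pairs
    if x == y:
        return xy / (xy + 1)
    yx = counts[(y, x)]
    return (xy - yx) / (xy + yx + 1)


df = directly_follows(LOG)
for x, y in [("a", "b"), ("a", "e"), ("d", "d"), ("b", "c")]:
    print(f"{x} -> {y}: {df[(x, y)]} ({dependency(df, x, y):.2f})")
```

This prints 11 (0.92) for a → b, 5 (0.83) for a → e, 4 (0.80) for d → d, and 10 (0.00) for b → c: b and c follow each other equally often in both orders, which points to concurrency rather than a dependency.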




          PAGE 15
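Lower threshold (at least 2 direct successions and a dependency of at least 0.7)
Figure: dependency graph with arcs a → b 11 (0.92), a → c 11 (0.92), a → d 13 (0.93), a → e 5 (0.83), b → e 11 (0.92), c → e 11 (0.92), d → e 13 (0.93), and the self-loop d → d 4 (0.80); each arc shows the number of direct successions and, in parentheses, the dependency value.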




                                               PAGE 16
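Higher threshold (at least 5 direct successions and a dependency of at least 0.9)
Figure: dependency graph with arcs a → b 11 (0.92), a → c 11 (0.92), a → d 13 (0.93), b → e 11 (0.92), c → e 11 (0.92), and d → e 13 (0.93); the arc a → e and the self-loop on d no longer meet the thresholds.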




                                         PAGE 17
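The two thresholds can be illustrated with a small sketch that filters arcs by both criteria. The counts and dependency values are taken from the lower-threshold figure; which value belongs to which arc is our reading of that figure, and `dependency_graph` is a hypothetical helper name.

```python
# (direct successions |x >_L y|, dependency |x =>_L y|) per arc, as read
# from the lower-threshold figure.
ARCS = {
    ("a", "b"): (11, 0.92), ("a", "c"): (11, 0.92),
    ("a", "d"): (13, 0.93), ("a", "e"): (5, 0.83),
    ("b", "e"): (11, 0.92), ("c", "e"): (11, 0.92),
    ("d", "e"): (13, 0.93), ("d", "d"): (4, 0.80),
}


def dependency_graph(arcs, min_count, min_dependency):
    """Keep only the arcs that meet both thresholds."""
    return {arc for arc, (count, dep) in arcs.items()
            if count >= min_count and dep >= min_dependency}


low = dependency_graph(ARCS, min_count=2, min_dependency=0.7)
high = dependency_graph(ARCS, min_count=5, min_dependency=0.9)
print(sorted(low - high))  # [('a', 'e'), ('d', 'd')]
```

Raising the thresholds removes a → e because its dependency (0.83) is below 0.9, and the self-loop on d because it occurs only 4 times, which matches the difference between the two figures.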
Learning splits and joins

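Figure: C-net learned from the dependency graph, annotated with activity frequencies (a and e occur 40 times, b and c 21 times, d 17 times) and with how often each input and output binding was observed.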



                                                                     PAGE 18
Alternative visualization

Figure: the same annotated C-net, shown next to a conventional notation with explicit AND connectors between a, b, c, d, and e.




                                                                                        PAGE 19
Characteristics of heuristic mining

• Can deal with noise and is therefore quite robust.
• Improved representational bias.
• Split and join rules are only considered locally
  (therefore most discovered models are not
  sound and require repair actions).




                                                     PAGE 20
Genetic process mining

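Figure: the genetic process mining loop. An initial population is created from the event log and the fitness of each individual is computed. On termination, the best individual is selected; otherwise the next generation is formed through elitism and through tournament selection of parents, whose children are produced by crossover and mutation. Individuals that are not selected are “dead”.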



                                                                         PAGE 21
Design decisions

•   Representation of individuals
•   Initialization
•   Fitness function
•   Selection strategy (tournament and elitism)
•   Crossover
•   Mutation
Figure: the genetic process mining loop, as on the previous slide.




                                                                                                     PAGE 22
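The design decisions above can be made concrete with a minimal sketch of the genetic loop. The representation is a deliberate simplification and an assumption: an individual is just a set of allowed directly-follows arcs, whereas genetic miners described in the literature use richer process model representations (for example, causal matrices). The log, the fitness function, and all function names are hypothetical.

```python
import random

# Toy log (assumed for illustration): each trace is a tuple of activities.
LOG = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
ACTIVITIES = sorted({x for trace in LOG for x in trace})
# Representation (simplified): a frozenset of allowed directly-follows arcs.
ALL_ARCS = [(x, y) for x in ACTIVITIES for y in ACTIVITIES]
PAIRS = [(x, y) for trace in LOG for x, y in zip(trace, trace[1:])]


def fitness(model):
    """Share of observed directly-follows pairs the model allows,
    minus a small penalty per arc to favor simpler models."""
    replayed = sum(pair in model for pair in PAIRS) / len(PAIRS)
    return replayed - 0.01 * len(model)


def crossover(p1, p2, rng):
    """Uniform crossover: keep shared arcs, inherit each other arc with p=0.5."""
    return (p1 & p2) | frozenset(
        arc for arc in sorted(p1 ^ p2) if rng.random() < 0.5)


def mutate(model, rng, rate=0.05):
    """Flip each possible arc (add it or remove it) with probability `rate`."""
    return frozenset(arc for arc in ALL_ARCS
                     if (arc in model) != (rng.random() < rate))


def tournament(population, rng, k=2):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def evolve(generations=50, size=20, elite=2, seed=1):
    rng = random.Random(seed)
    # Initialization: random individuals.
    population = [frozenset(arc for arc in ALL_ARCS if rng.random() < 0.2)
                  for _ in range(size)]
    history = []
    for _ in range(generations):  # termination: fixed number of generations
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = population[:elite]  # elitism: copy the best unchanged
        while len(next_gen) < size:
            parents = tournament(population, rng), tournament(population, rng)
            next_gen.append(mutate(crossover(*parents, rng), rng))
        population = next_gen  # all other individuals are "dead"
    return max(population, key=fitness), history


best, history = evolve()
print(sorted(best), round(fitness(best), 2))
```

Because the elite individuals are copied unchanged, the best fitness per generation never decreases; crossover and mutation explore new candidates around them.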
Example: crossover

Figure: crossover applied to two Petri nets (top) yields two new nets (bottom). All nets share the activities a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request, and the places start and end.




                                                                                                                                            PAGE 23
Example: mutation



Figure: mutation of a Petri net over the same activities (a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request): a place is removed and an arc is added.




                                                                                                                                        PAGE 24
Characteristics of genetic process mining




• Requires a lot of computing power.
• Can be distributed easily.
• Can deal with noise, infrequent behavior, duplicate tasks,
  invisible tasks, etc.
• Allows for incremental improvement and combinations
  with other approaches (heuristics post-optimization, etc.).
                                                       PAGE 25
Region-based mining

• Two types of regions theory:
   − State-based regions
   − Language-based regions
• All about discovering places (like in the α algorithm)!


Figure: a place p(A,B) connecting the transitions in A = {a1, a2, … am} to the transitions in B = {b1, b2, … bn}.
                                                      PAGE 26
State-based regions

Two steps:
1. Discover a transition system (different abstractions
   are possible).
2. Convert the transition system into an “equivalent” Petri
   net.




                                                     PAGE 27
Step 1: learning a transition system

Figure: the current state splits a trace (abcdcdcdefaghhhi) into a past and a future; a state can be based on the past, the future, or both.

•   past, future, or past and future
•   sequence, multiset, or set abstraction
•   limited horizon to abstract further
•   filtering, e.g., based on transaction type, names, etc.
•   labels based on activity name or other features
                                                            PAGE 28
Past without abstraction (full sequence)


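Figure: transition system for the traces ‹a,b,c,d›, ‹a,c,b,d› and ‹a,e,d› in which each state is the full sequence of past events: ‹›, ‹a›, ‹a,b›, ‹a,c›, ‹a,e›, ‹a,b,c›, ‹a,c,b›, ‹a,e,d›, ‹a,b,c,d›, ‹a,c,b,d›.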

                                                PAGE 29
Future without abstraction


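Figure: transition system for the traces ‹a,b,c,d›, ‹a,c,b,d› and ‹a,e,d› in which each state is the full sequence of future events: ‹a,b,c,d›, ‹a,c,b,d›, ‹a,e,d›, ‹b,c,d›, ‹c,b,d›, ‹e,d›, ‹c,d›, ‹b,d›, ‹d›, ‹›.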

                                                   PAGE 30
Past with multiset abstraction

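Figure: transition system for the traces ‹a,b,c,d›, ‹a,c,b,d› and ‹a,e,d› in which each state is the multiset of past events: [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d].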

                                                  PAGE 31
Only last event matters for state

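Figure: transition system for the traces ‹a,b,c,d›, ‹a,c,b,d› and ‹a,e,d› in which each state is only the last event: ‹›, ‹a›, ‹b›, ‹c›, ‹d›, ‹e›.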

                                                     PAGE 32
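Step 1 can be sketched as a function that maps every position in a trace to a state, using the past or the future, a sequence, multiset, or set abstraction, and an optional horizon. The function name and its parameters are hypothetical. For the traces ‹a,b,c,d›, ‹a,c,b,d› and ‹a,e,d› used in the figures, it yields 10, 10, 8, and 6 states for the four variants shown.

```python
def transition_system(log, view="past", abstraction="sequence", horizon=None):
    """Build a transition system (states, arcs) from a list of traces.

    view: "past" (events seen so far) or "future" (events still to come).
    abstraction: "sequence", "multiset", or "set".
    horizon: if given, keep only the last (past) or first (future) k events.
    """
    def state(prefix, suffix):
        events = list(prefix) if view == "past" else list(suffix)
        if horizon is not None:
            events = (events[max(0, len(events) - horizon):]
                      if view == "past" else events[:horizon])
        if abstraction == "multiset":
            return tuple(sorted(events))  # order removed, counts kept
        if abstraction == "set":
            return frozenset(events)      # order and counts removed
        return tuple(events)              # full sequence

    states, arcs = set(), set()
    for trace in log:
        for i in range(len(trace) + 1):
            states.add(state(trace[:i], trace[i:]))
        for i, event in enumerate(trace):
            arcs.add((state(trace[:i], trace[i:]), event,
                      state(trace[:i + 1], trace[i + 1:])))
    return states, arcs


LOG = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
for kwargs in [{}, {"view": "future"}, {"abstraction": "multiset"}, {"horizon": 1}]:
    states, arcs = transition_system(LOG, **kwargs)
    print(kwargs, len(states), "states,", len(arcs), "arcs")
```

Stronger abstractions merge states: with only the last event as state, the model also allows behavior not in the log (e.g., alternating b and c repeatedly), which illustrates the trade-off between overfitting and underfitting.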
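Step 2: constructing a Petri net using regions
Figure: a transition system with a region R (a set of states) and the corresponding place pR. With respect to R, a and b enter, c and d exit, and e and f do not cross. In a region, all transitions with the same label agree: they all enter, all exit, or all do not cross R. Hence pR has input transitions a and b and output transitions c and d, while e and f are not connected to pR.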

                                                               PAGE 33
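The region condition can be checked mechanically: classify every arc as entering, exiting, or not crossing a set of states, and require all arcs with the same label to agree. A minimal sketch, with the arcs taken from the multiset-based transition system for the traces ‹a,b,c,d›, ‹a,c,b,d› and ‹a,e,d› (states written as strings for readability); `region_place` is a hypothetical name.

```python
# Multiset-based transition system: (source, label, target).
ARCS = [
    ("[]", "a", "[a]"),
    ("[a]", "b", "[a,b]"), ("[a]", "c", "[a,c]"), ("[a]", "e", "[a,e]"),
    ("[a,b]", "c", "[a,b,c]"), ("[a,c]", "b", "[a,b,c]"),
    ("[a,e]", "d", "[a,d,e]"), ("[a,b,c]", "d", "[a,b,c,d]"),
]


def region_place(arcs, region):
    """Return (entering labels, exiting labels) if `region` is a region,
    i.e., all arcs with the same label all enter, all exit, or all do not
    cross it; otherwise return None."""
    kinds = {}
    for src, label, tgt in arcs:
        src_in, tgt_in = src in region, tgt in region
        if tgt_in and not src_in:
            kind = "enter"
        elif src_in and not tgt_in:
            kind = "exit"
        else:
            kind = "no-cross"
        kinds.setdefault(label, set()).add(kind)
    if any(len(k) > 1 for k in kinds.values()):
        return None
    enter = {lbl for lbl, k in kinds.items() if k == {"enter"}}
    exit_ = {lbl for lbl, k in kinds.items() if k == {"exit"}}
    return enter, exit_


print(region_place(ARCS, {"[a]", "[a,c]"}))  # a enters; b and e exit
print(region_place(ARCS, {"[a]"}))           # None: c both exits and does not cross
```

The first set corresponds to a place with input transition a and output transitions b and e. Trivial sets such as the empty set also satisfy the condition but yield unconnected places, so region-based approaches typically focus on minimal non-trivial regions.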
Example

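Figure: the transition system based on multisets of past events (states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d]) and the Petri net constructed from its regions, with transitions a, b, c, d, e and places start, p1, p2, p3, p4, end.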
                                                                           PAGE 34
Language based regions


Figure: place pR with the transitions in X producing tokens for it and the transitions in Y consuming tokens from it.

Region R = (X,Y,c) corresponding to place pR: X = {a1,a2,c1} = transitions producing a token for pR, Y = {b1,b2,c1} = transitions consuming a token from pR, and c is the initial marking of pR.
                                                                     PAGE 35
Basic idea: enough tokens should be present when consuming

A place is feasible if it can be added without disabling any of the traces in the event log.
Figure: the place pR with producing transitions X and consuming transitions Y, as on the previous slide.
                                               PAGE 36
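The feasibility condition can be sketched by replaying each trace while tracking the tokens in a candidate place pR = (X, Y, c). A transition in both X and Y (like c1) first consumes and then produces, so it still needs a token to be present. The function name `is_feasible` is hypothetical.

```python
def is_feasible(log, X, Y, c):
    """Check whether place pR = (X, Y, c) can be added without disabling
    any trace in `log` (a list of traces, each a sequence of activities).

    X: transitions producing a token for pR
    Y: transitions consuming a token from pR
    c: initial marking of pR
    """
    for trace in log:
        tokens = c
        for event in trace:
            if event in Y:        # consume first: the token must already be there
                tokens -= 1
                if tokens < 0:
                    return False  # this event would be disabled by pR
            if event in X:        # then produce (also handles self-loops like c1)
                tokens += 1
    return True


log = [("a", "b"), ("a", "c", "b")]
print(is_feasible(log, X={"a"}, Y={"b"}, c=0))  # True
print(is_feasible(log, X={"b"}, Y={"a"}, c=0))  # False: a needs a token first
```

Feasibility alone is not enough to get a good model: a place without output transitions, for instance, is always feasible, so a miner also needs a criterion for selecting useful places.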
Example




          PAGE 37
Regions




          PAGE 38
Model

Figure: Petri net with transitions a, b, c, d, e and places p1 to p6.




                                        PAGE 39
Characteristics of region-based mining

• Can be used to discover more complex control-flow
  structures.
• Classical approaches need to be adapted
  (overfitting!).
• Representational bias can be parameterized (e.g.,
  free-choice nets, label splitting, etc.).
• Problems dealing with noise.




                                                  PAGE 40
Other approaches, e.g., fuzzy mining




                                      PAGE 41
Evaluating the discovered process



• Fitness: Is the event log possible according to the model?
• Precision: Is the model not underfitting (allowing too much behavior)?
• Generalization: Is the model not overfitting (allowing only the “accidental” examples in the log)?
• Structure: Is this the simplest model (Occam’s Razor)?



                                                                          PAGE 42
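Fitness can be quantified in several ways. A minimal sketch of one common approach, token-based replay, computes ½(1 − m/c) + ½(1 − r/p) from the missing (m), consumed (c), remaining (r), and produced (p) tokens. It assumes visible, uniquely labeled transitions and a single start and end place; the `PetriNet` class, the function name, and the example net are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PetriNet:
    pre: dict[str, list[str]]   # transition -> input places
    post: dict[str, list[str]]  # transition -> output places


def token_replay_fitness(net, trace, start="start", end="end"):
    """Token-based replay fitness 1/2(1 - m/c) + 1/2(1 - r/p) of one trace."""
    tokens = {start: 1}
    produced, consumed, missing = 1, 0, 0  # the initial token counts as produced
    for activity in trace:
        for place in net.pre[activity]:
            if tokens.get(place, 0) == 0:
                missing += 1               # add the missing token artificially
                tokens[place] = 1
            tokens[place] -= 1
            consumed += 1
        for place in net.post[activity]:
            tokens[place] = tokens.get(place, 0) + 1
            produced += 1
    # At the end, one token should be consumed from the end place.
    if tokens.get(end, 0) == 0:
        missing += 1
        tokens[end] = 1
    tokens[end] -= 1
    consumed += 1
    remaining = sum(tokens.values())
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


# Sequential net: start -> a -> p1 -> b -> end
net = PetriNet(pre={"a": ["start"], "b": ["p1"]},
               post={"a": ["p1"], "b": ["end"]})
for trace in [("a", "b"), ("a",), ("b",)]:
    print(trace, token_replay_fitness(net, trace))
```

The trace ‹a,b› replays perfectly (1.0), while ‹a› and ‹b› each score 0.5 because one token is missing and one remains. This measures fitness only; precision, generalization, and structure need other measures.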

Process mining chapter_13_cartography_and_navigation
 
Process mining chapter_10_tool_support
Process mining chapter_10_tool_supportProcess mining chapter_10_tool_support
Process mining chapter_10_tool_support
 
Process mining chapter_09_operational_support
Process mining chapter_09_operational_supportProcess mining chapter_09_operational_support
Process mining chapter_09_operational_support
 
Process mining chapter_07_conformance_checking
Process mining chapter_07_conformance_checkingProcess mining chapter_07_conformance_checking
Process mining chapter_07_conformance_checking
 
Process mining chapter_04_getting_the_data
Process mining chapter_04_getting_the_dataProcess mining chapter_04_getting_the_data
Process mining chapter_04_getting_the_data
 
Process mining chapter_03_data_mining
Process mining chapter_03_data_miningProcess mining chapter_03_data_mining
Process mining chapter_03_data_mining
 
Process mining chapter_02_process_modeling_and_analysis
Process mining chapter_02_process_modeling_and_analysisProcess mining chapter_02_process_modeling_and_analysis
Process mining chapter_02_process_modeling_and_analysis
 
Process mining
Process miningProcess mining
Process mining
 

Process mining chapter_06_advanced_process_discovery_techniques

  • 1. Chapter 6 Advanced Process Discovery Techniques prof.dr.ir. Wil van der Aalst www.processmining.org
  • 2. Overview. Chapter 1: Introduction. Part I: Preliminaries (Chapter 2: Process Modeling and Analysis; Chapter 3: Data Mining). Part II: From Event Logs to Process Models (Chapter 4: Getting the Data; Chapter 5: Process Discovery: An Introduction; Chapter 6: Advanced Process Discovery Techniques). Part III: Beyond Process Discovery (Chapter 7: Conformance Checking; Chapter 8: Mining Additional Perspectives; Chapter 9: Operational Support). Part IV: Putting Process Mining to Work (Chapter 10: Tool Support; Chapter 11: Analyzing “Lasagna Processes”; Chapter 12: Analyzing “Spaghetti Processes”). Part V: Reflection (Chapter 13: Cartography and Navigation; Chapter 14: Epilogue).
  • 3. Process discovery (figure: the “world” of people, machines, components, and organizations with its business processes; a software system that supports/controls these processes and records events, e.g., messages and transactions, in event logs; (process) models that specify, configure, implement, and analyze the system; discovery, conformance, and enhancement relate event logs to process models)
  • 4. Challenge: process discovery must balance four quality dimensions: fitness (“able to replay event log”), simplicity (“Occam’s razor”), precision (“not underfitting the log”), and generalization (“not overfitting the log”).
  • 5. Observing a stable process infinitely long (figure: nested sets of behavior: all behavior, including noise; frequent behavior; and the traces in the event log)
  • 6. Target model (figure)
  • 7. Non-fitting model (figure)
  • 8. Overfitting model (figure)
  • 9. Underfitting model (figure)
  • 10. Characteristics of process discovery algorithms:
    • Representational bias, e.g.:
      − Inability to represent concurrency
      − Inability to deal with (arbitrary) loops
      − Inability to represent silent actions
      − Inability to represent duplicate actions
      − Inability to model OR-splits/joins
      − Inability to represent non-free-choice behavior
      − Inability to represent hierarchy
    • Ability to deal with noise
    • Completeness notion assumed
    • Approach used (direct algorithmic approaches, two-phase approaches, computational intelligence approaches, partial approaches, etc.)
  • 11. Examples:
    • Algorithmic techniques
      − Alpha miner
      − Alpha+, Alpha++, Alpha#
      − FSM miner
      − Fuzzy miner
      − Heuristic miner
      − Multi-phase miner
    • Genetic process mining
      − Single/duplicate tasks
      − Distributed GM
    • Region-based process mining
      − State-based regions
      − Language-based regions
    • Classical approaches not dealing with concurrency
      − Inductive inference (Mark Gold, Dana Angluin et al.)
      − Sequence mining
  • 12. Heuristic mining:
    • Designed to deal with noise and incompleteness.
    • Has a better representational bias than the α algorithm (AND/XOR/OR splits and joins, skips).
    • Uses C-nets (causal nets).
    (figure: example C-net with a = register claim, b = check policy, c = check damage, d = consult expert, e = close case)
  • 13. Example log; problem with the α algorithm (figure: the Petri net discovered by the α algorithm, with places start, p1 to p5, and end, and transitions a to e)
  • 14. Taking into account frequencies
  • 16. Example
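  • 17. Lower threshold (at least 2 direct successions and a dependency of at least 0.7) (figure: dependency graph over activities a to e; each arc is labeled with its number of direct successions followed by its dependency value in parentheses: arcs labeled 11(0.92) and 13(0.93), plus one arc labeled 5(0.83) and one labeled 4(0.80))
  • 18. Higher threshold (at least 5 direct successions and a dependency of at least 0.9) (figure: the same dependency graph; only the arcs labeled 11(0.92) and 13(0.93) remain)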
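The arc labels on slides 17 and 18 can be reproduced with the dependency measure commonly used in heuristic mining: for x ≠ y, x ⇒ y = (|x>y| - |y>x|) / (|x>y| + |y>x| + 1), and for a self-loop, x ⇒ x = |x>x| / (|x>x| + 1), where |x>y| counts how often x is directly followed by y. The Python sketch below is a minimal illustration, not a full heuristic miner: the log is a hypothetical one chosen because its counts match the arc labels on those slides, and all function names are our own.

```python
from collections import Counter

def directly_follows(log):
    """Count |x>y| over a log given as {trace: frequency}."""
    counts = Counter()
    for trace, freq in log.items():
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += freq
    return counts

def dependency(df, x, y):
    """Dependency measure x => y (separate formula for self-loops)."""
    if x == y:
        n = df[(x, x)]
        return n / (n + 1)
    xy, yx = df[(x, y)], df[(y, x)]
    return (xy - yx) / (xy + yx + 1)

def dependency_graph(log, min_count, min_dependency):
    """Keep arc (x, y) only if |x>y| >= min_count and x => y >= min_dependency."""
    df = directly_follows(log)
    arcs = {}
    for (x, y), n in df.items():
        dep = dependency(df, x, y)
        if n >= min_count and dep >= min_dependency:
            arcs[(x, y)] = (n, round(dep, 2))  # label as on the slides: count(dependency)
    return arcs

# Hypothetical log (trace -> frequency) whose counts match the slide annotations.
log = {
    ("a", "e"): 5,
    ("a", "b", "c", "e"): 10,
    ("a", "c", "b", "e"): 10,
    ("a", "b", "e"): 1,
    ("a", "c", "e"): 1,
    ("a", "d", "e"): 10,
    ("a", "d", "d", "e"): 2,
    ("a", "d", "d", "d", "e"): 1,
}

low = dependency_graph(log, min_count=2, min_dependency=0.7)
high = dependency_graph(log, min_count=5, min_dependency=0.9)
print(sorted(low.items()))
print(sorted(high.items()))
```

With the lower threshold, the arc a→e (5, 0.83) and the self-loop on d (4, 0.8) survive; the higher threshold keeps only the six arcs with dependency 0.92 or 0.93. Note that b and c follow each other equally often (10 times each way), so their mutual dependency is 0 and no arc is drawn between them, which is how concurrency shows up in the dependency graph.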
  • 19. Learning splits and joins (figure: the dependency graph annotated with the frequencies of activities and of their input and output bindings)
  • 20. Alternative visualization (figure: the same model with AND connectors, showing that after a both b and c are executed before e)
  • 21. Characteristics of heuristic mining:
    • Can deal with noise and is therefore quite robust.
    • Improved representational bias.
    • Split and join rules are only considered locally; therefore, most discovered models are not sound and require repair actions.
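  • 22. Genetic process mining (figure: the genetic loop: create an initial population from the event log; compute fitness; if the termination condition holds, return the best individual; otherwise, build the next generation by elitism and by tournament selection of parents, crossover producing children, and mutation; the remaining individuals are “dead”)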
  • 23. Design decisions:
    • Representation of individuals
    • Initialization
    • Fitness function
    • Selection strategy (tournament and elitism)
    • Crossover
    • Mutation
    (figure: the same genetic process mining loop)
  • 24. Example: crossover (figure: process models before and after crossover, over activities a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request, with start and end places)
  • 25. Example: mutation (figure: a process model over the same activities a to h before and after mutation; one place is removed and one arc is added)
  • 26. Characteristics of genetic process mining:
    • Requires a lot of computing power.
    • Can be distributed easily.
    • Can deal with noise, infrequent behavior, duplicate tasks, invisible tasks, etc.
    • Allows for incremental improvement and for combinations with other approaches (e.g., heuristics, post-optimization).
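The genetic loop and its design decisions can be sketched in a few lines of Python. This is a toy under loudly stated assumptions: an individual is just a set of directly-follows arcs (not a Petri net, and not the representation used by real genetic process miners), and the fitness function, crossover, and mutation operators are hypothetical simplifications chosen only to mirror the listed decisions (initialization, fitness, tournament selection, elitism, crossover, mutation).

```python
import random
from collections import Counter

def directly_follows(log):
    """Multiset of directly-follows pairs (x, y) observed in the log."""
    return Counter(pair for trace in log for pair in zip(trace, trace[1:]))

def fitness(individual, df):
    """Toy fitness: share of observed successions the individual explains,
    minus a small penalty for every arc never observed (allowing too much)."""
    covered = sum(n for arc, n in df.items() if arc in individual)
    unobserved = sum(1 for arc in individual if arc not in df)
    return covered / sum(df.values()) - 0.02 * unobserved

def evolve(log, generations=50, pop_size=20, elite=2, tournament=3,
           p_mutation=0.3, seed=0):
    rng = random.Random(seed)
    activities = sorted({a for trace in log for a in trace})
    all_arcs = [(x, y) for x in activities for y in activities]
    df = directly_follows(log)

    def score(ind):
        return fitness(ind, df)

    def random_individual():
        # Initialization: include each possible arc with probability 0.3.
        return frozenset(arc for arc in all_arcs if rng.random() < 0.3)

    def select_parent(population):
        # Tournament selection: best of `tournament` randomly drawn individuals.
        return max(rng.sample(population, tournament), key=score)

    def crossover(p1, p2):
        # For each activity, inherit all its outgoing arcs from one parent.
        child = set()
        for x in activities:
            parent = p1 if rng.random() < 0.5 else p2
            child.update(arc for arc in parent if arc[0] == x)
        return frozenset(child)

    def mutate(ind):
        # Toggle one random arc: add it if absent, remove it if present.
        return ind ^ {rng.choice(all_arcs)}

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        if history[-1] == 1.0:  # termination: all successions explained, nothing extra
            break
        next_generation = population[:elite]  # elitism: copy the best unchanged
        while len(next_generation) < pop_size:
            child = crossover(select_parent(population), select_parent(population))
            if rng.random() < p_mutation:
                child = mutate(child)
            next_generation.append(child)
        population = next_generation
    best = max(population, key=score)
    return best, history

log = ["abcd", "acbd", "aed"]
best, history = evolve(log)
print(sorted(best), round(history[-1], 3))
```

Because the best individuals are copied unchanged (elitism), the best fitness per generation never decreases. The penalty of 0.02 per unobserved arc is an arbitrary illustrative choice that trades fitness against precision; the repeated fitness evaluations over the whole population hint at why the slides note that genetic mining needs a lot of computing power.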
  • 27. Region-based mining:
    • Two types of region theory:
      − State-based regions
      − Language-based regions
    • It is all about discovering places (like in the α algorithm)!
    (figure: place p(A,B) connecting input transitions A = {a1, a2, …, am} to output transitions B = {b1, b2, …, bn})
  • 28. State-based regions, in two steps:
    1. Discover a transition system (different abstractions are possible).
    2. Convert the transition system into an “equivalent” Petri net.
  • 29. Step 1: learning a transition system (figure: the current state splits the trace abcdcdcdefaghhhi into a past and a future). Options:
    • past, future, or past+future
    • sequence, multiset, or set abstraction
    • limited horizon to abstract further
    • filtering, e.g., based on transaction type, names, etc.
    • labels based on activity name or other features
  • 30. Past without abstraction (full sequence) (figure: transition system whose states are the full prefixes ‹›, ‹a›, ‹a,b›, ‹a,b,c›, ‹a,b,c,d›, ‹a,c›, ‹a,c,b›, ‹a,c,b,d›, ‹a,e›, ‹a,e,d›)
  • 31. Future without abstraction (figure: states are the remaining suffixes ‹a,b,c,d›, ‹b,c,d›, ‹c,d›, ‹a,c,b,d›, ‹c,b,d›, ‹b,d›, ‹a,e,d›, ‹e,d›, ‹d›, ‹›)
  • 32. Past with multiset abstraction (figure: states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d])
  • 33. Only the last event matters for the state (figure: states ‹›, ‹a›, ‹b›, ‹c›, ‹d›, ‹e›)
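Step 1 can be sketched as follows, assuming traces are strings of single-letter activities and using the log ‹a,b,c,d›, ‹a,c,b,d›, ‹a,e,d› that the state labels on slides 30 to 33 reveal. The view (past, future, past+future), the abstraction (sequence, multiset, set), and an optional horizon are parameters; the function names are hypothetical, and filtering and alternative labeling are left out.

```python
from collections import Counter

def abstract(events, mode):
    """Abstract a sequence of events as a 'sequence', 'multiset', or 'set'."""
    if mode == "sequence":
        return tuple(events)
    if mode == "multiset":
        return tuple(sorted(Counter(events).items()))
    if mode == "set":
        return tuple(sorted(set(events)))
    raise ValueError(f"unknown abstraction: {mode}")

def state(trace, i, view, mode, horizon):
    """State at position i, which lies between trace[:i] (past) and trace[i:] (future)."""
    past = trace[:i] if horizon is None else trace[max(0, i - horizon):i]
    future = trace[i:] if horizon is None else trace[i:i + horizon]
    if view == "past":
        return abstract(past, mode)
    if view == "future":
        return abstract(future, mode)
    if view == "past+future":
        return (abstract(past, mode), abstract(future, mode))
    raise ValueError(f"unknown view: {view}")

def transition_system(log, view="past", mode="sequence", horizon=None):
    """Return (states, transitions); a transition is (source, activity, target)."""
    states, transitions = set(), set()
    for trace in log:
        for i, activity in enumerate(trace):
            src = state(trace, i, view, mode, horizon)
            dst = state(trace, i + 1, view, mode, horizon)
            states.update((src, dst))
            transitions.add((src, activity, dst))
    return states, transitions

log = ["abcd", "acbd", "aed"]
for view, mode, horizon in [("past", "sequence", None), ("future", "sequence", None),
                            ("past", "multiset", None), ("past", "sequence", 1)]:
    states, _ = transition_system(log, view, mode, horizon)
    print(view, mode, horizon, len(states))
```

The four settings yield 10, 10, 8, and 6 states, the same numbers of states as drawn on slides 30 to 33: coarser abstractions merge states (e.g., ‹a,b,c› and ‹a,c,b› both become [a,b,c]) and therefore generalize more.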
  • 34. Step 2: constructing a Petri net using regions (figure: a region R in a transition system; with respect to R, a and b enter, c and d exit, and e and f do not cross; the corresponding place pR has input transitions a and b and output transitions c and d)
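A set of states R is a region if, for every activity, all transitions with that label consistently enter R, exit R, or do not cross R; entering activities then become input transitions of place pR and exiting activities its output transitions. The sketch below checks this for one hard-coded transition system, the multiset abstraction of the log ‹a,b,c,d›, ‹a,c,b,d›, ‹a,e,d› (state names are plain strings chosen for illustration). It only tests a given candidate set; it does not search for minimal regions or perform label splitting.

```python
def classify(transitions, region):
    """For each label, collect how its transitions relate to the state set `region`."""
    kinds = {}
    for src, label, dst in transitions:
        if src not in region and dst in region:
            kind = "enter"
        elif src in region and dst not in region:
            kind = "exit"
        else:
            kind = "no-cross"
        kinds.setdefault(label, set()).add(kind)
    return kinds

def region_to_place(transitions, region):
    """Return (input labels, output labels) of place pR, or None if not a region."""
    kinds = classify(transitions, region)
    if any(len(k) > 1 for k in kinds.values()):
        return None  # some label relates to R in more than one way
    inputs = {label for label, k in kinds.items() if k == {"enter"}}
    outputs = {label for label, k in kinds.items() if k == {"exit"}}
    return inputs, outputs

# Transition system with multiset-abstracted past states.
ts = {
    ("[]", "a", "[a]"),
    ("[a]", "b", "[a,b]"),
    ("[a,b]", "c", "[a,b,c]"),
    ("[a,b,c]", "d", "[a,b,c,d]"),
    ("[a]", "c", "[a,c]"),
    ("[a,c]", "b", "[a,b,c]"),
    ("[a]", "e", "[a,e]"),
    ("[a,e]", "d", "[a,d,e]"),
}
print(region_to_place(ts, {"[a]", "[a,c]"}))
print(region_to_place(ts, {"[a,b]"}))
```

For R = {[a], [a,c]} the check yields a place with input a and outputs b and e; R = {[a,b]} is rejected because b both enters R (from [a]) and does not cross it (from [a,c] to [a,b,c]).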
  • 35. Example (figure: the transition system with multiset states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d] and the Petri net derived from it, with places start, p1 to p4, and end, and transitions a to e)
  • 36. Language-based regions (figure: place pR with input transitions X and output transitions Y). Region R = (X, Y, c) corresponds to place pR: X = {a1, a2, c1} is the set of transitions producing a token for pR, Y = {b1, b2, c1} is the set of transitions consuming a token from pR, and c is the initial marking of pR.
  • 37. Basic idea: enough tokens should be present when consuming. A place is feasible if it can be added without disabling any of the traces in the event log. (figure: place pR between X and Y)
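The feasibility condition can be checked by replaying each trace against the single candidate place: start with c tokens, let every activity in Y consume a token (failing if none is left), and let every activity in X produce one. A minimal Python sketch with hypothetical names; an activity in both X and Y (like c1 above) consumes before it produces.

```python
def place_is_feasible(log, produce, consume, initial=0):
    """Check that place (X=produce, Y=consume, c=initial) never blocks a trace."""
    for trace in log:
        tokens = initial
        for activity in trace:
            if activity in consume:
                tokens -= 1       # firing needs a token in the place first
                if tokens < 0:
                    return False  # adding the place would disable this trace
            if activity in produce:
                tokens += 1
    return True

log = ["abcd", "acbd", "aed"]
print(place_is_feasible(log, {"a"}, {"b", "e"}))  # a feeds b or e
print(place_is_feasible(log, {"b"}, {"c"}))       # fails on acbd: c needs b first
```

Many places are feasible, including the trivial place without any arcs, so a practical miner also has to choose which feasible places to keep; this sketch only decides feasibility.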
  • 38. Example
  • 39. Regions
  • 40. Model (figure: Petri net with transitions a to e and places p1 to p6)
  • 41. Characteristics of region-based mining:
    • Can be used to discover more complex control-flow structures.
    • Classical approaches need to be adapted (they tend to overfit!).
    • The representational bias can be parameterized (e.g., free-choice nets, label splitting, etc.).
    • Has problems dealing with noise.
  • 42. Other approaches, e.g., fuzzy mining
  • 43. Evaluating the discovered process:
    • Fitness: Is the event log possible according to the model?
    • Precision: Is the model not underfitting (i.e., does it not allow for too much)?
    • Generalization: Is the model not overfitting (i.e., does it allow for more than just the “accidental” examples)?
    • Structure: Is this the simplest model (Occam’s razor)?
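Fitness is often quantified by token-based replay (conformance checking is the topic of Chapter 7): replay each trace on the model while counting produced (p), consumed (c), missing (m), and remaining (r) tokens, then compute f = ½(1 - m/c) + ½(1 - r/p). The sketch below covers only this fitness dimension, not precision, generalization, or simplicity. The net encoding (transition → input and output places) and the place names are assumptions for illustration: after a, either b and c run concurrently or e occurs, and then d.

```python
from collections import Counter

# Hypothetical net: transition -> (input places, output places).
NET = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p2"], ["p4"]),
    "e": (["p1", "p2"], ["p3", "p4"]),
    "d": (["p3", "p4"], ["end"]),
}

def replay_fitness(log, net, source="start", sink="end"):
    """Token-based replay fitness f = 1/2 (1 - m/c) + 1/2 (1 - r/p) over a log."""
    p = c = m = r = 0
    for trace in log:
        marking = Counter({source: 1})
        p += 1  # the environment produces the initial token
        for activity in trace:
            inputs, outputs = net[activity]
            for place in inputs:
                if marking[place] == 0:
                    m += 1  # missing token: create it so replay can continue
                    marking[place] += 1
                marking[place] -= 1
                c += 1
            for place in outputs:
                marking[place] += 1
                p += 1
        # The environment consumes the final token.
        if marking[sink] == 0:
            m += 1
            marking[sink] += 1
        marking[sink] -= 1
        c += 1
        r += sum(marking.values())  # tokens left behind
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)

print(replay_fitness(["abcd", "acbd", "aed"], NET))
print(replay_fitness(["abd"], NET))
```

The three fitting traces score 1.0. Replaying ‹a,b,d› produces 5 and consumes 5 tokens, with one token missing before d and one token left in the place between a and c, giving ½(1 - 1/5) + ½(1 - 1/5) = 0.8.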