SlideShare a Scribd company logo
1 of 54
Download to read offline
Chapter 5
Process Discovery:
An Introduction
prof.dr.ir. Wil van der Aalst
www.processmining.org
Overview
Chapter 1
Introduction



Part I: Preliminaries

Chapter 2                   Chapter 3
Process Modeling and        Data Mining
Analysis


Part II: From Event Logs to Process Models

Chapter 4                  Chapter 5               Chapter 6
Getting the Data           Process Discovery: An   Advanced Process
                           Introduction            Discovery Techniques


Part III: Beyond Process Discovery

Chapter 7                   Chapter 8              Chapter 9
Conformance                 Mining Additional      Operational Support
Checking                    Perspectives


Part IV: Putting Process Mining to Work

Chapter 10                  Chapter 11             Chapter 12
Tool Support                Analyzing “Lasagna     Analyzing “Spaghetti
                            Processes”             Processes”


Part V: Reflection

Chapter 13                  Chapter 14
Cartography and             Epilogue
Navigation
                                                                          PAGE 1
Process discovery

                              supports/
      “world”    business
                               controls
                processes                      software
   people   machines                            system
        components
           organizations                              records
                                                   events, e.g.,
                                                    messages,
                                   specifies       transactions,
    models
                                  configures            etc.
   analyzes
                                 implements
                                   analyzes


                            discovery
        (process)                                 event
                            conformance
          model                                    logs
                            enhancement
                                                                   PAGE 2
Process discovery = Play-In
Play-In




      event log                                   process model


Play-Out




                  process model                                event log


Replay

                                                      •   extended model
                                                          showing times,
                                                          frequencies, etc.
                                                      •   diagnostics
                                                      •   predictions
                                                      •   recommendations
     event log                    process model                               PAGE 3
Example

                                   b



            a          p1          e   p3   d

start                                             end

                       p2          c   p4




 Event log contains all possible
 traces of model and vice versa.
                                                PAGE 4
Another example

                  p1                  b        p3




            a              f               e        d

start                                 p5                  end

                  p2                  c        p4




 Generalization: event log contains only
 subset of all possible traces of model.
                                                        PAGE 5
Notation is less relevant (e.g. BPMN)
                  b



         a   p1   e    p3     d

start                                 end   b
             p2   c   p4



                                            c

                                  a             d
                      start                            end
                                            e




                                                    PAGE 6
Another BPMN example
            p1       b        p3




        a        f        e         d

start                p5                    end
                                                         b

            p2       c        p4



                                                         c

                                                 a               d
                                   start                                 end

                                                     f       e




                                                                     PAGE 7
Challenge

• In general, there is a trade-off between the following
  four quality criteria:
1.Fitness: the discovered model should allow for the
  behavior seen in the event log.
2.Precision (avoid underfitting): the discovered model
  should not allow for behavior completely unrelated
  to what was seen in the event log.
3.Generalization (avoid overfitting): the discovered
  model should generalize the example behavior seen
  in the event log.
4.Simplicity: the discovered model should be as
  simple as possible.
                                                      PAGE 8
Process Discovery:
example of algorithm



             α

                       PAGE 9
>,→,||,# relations


• Direct succession: x>y iff
  for some case x is directly
  followed by y.                            abcd
• Causality: x→y iff x>y and                acbd
  not y>x.                                   aed
• Parallel: x||y iff x>y and    a>b
  y>x                           a>c   a→b          b#e
                                a>e
• Choice: x#y iff not x>y and         a→c          e#b
                                b>c         b||c   c#e
  not y>x.                            a→e
                                b>d         c||b
                                c>b   b→d          a#d
                                                    …
                                c>d   c→d
                                e>d   e→d                PAGE 10
Basic Idea Used by α Algorithm (1)




        a                     b

     (a) sequence pattern: a→b




                                     PAGE 11
Basic Idea Used by α Algorithm (2)

                                 a

                                     b                   c

                                 b
           a
                                 (c) XOR-join pattern:
                       b
                                  a→c, b→c, and a#b
   a                                 c
                       c
            (b) XOR-split pattern:
   (b) XOR-split pattern:a→c, and b#c
              a→b,
   a→b, a→c, and b#c                                     PAGE 12
Basic Idea Used by α Algorithm (3)

                              a

                                    b                  c

                              b
          a
                               (e) AND-join pattern:
                 b              a→c, b→c, and a||b

  a
                                      c
                 c
             (d) AND-split pattern:
  (d) AND-split pattern:
               a→b, a→c, and b||c
   a→b, a→c, and b||c                                  PAGE 13
Example Revisited

 a>b       a→b    b||c   b#e
 a>c       a→c    c||b   e#b
 a>e       a→e           c#e
 b>c                     a#d
           b→d
 b>d                      …
 c>b       c→ d
 c>d       e→d                   b
 e>d

              a          p1      e   p3   d

   start                                      end

                         p2      c   p4
Result produced by α algorithm                PAGE 14
Footprint of L1


                    b



           a   p1   e   p3   d

   start                         end

               p2   c   p4




                                       PAGE 15
Footprint of L2



                     p1       b        p3




                 a        f        e        d

         start                p5                end

                     p2       c        p4




                                                      PAGE 16
Simple patterns

                     a                  b

                  (a) sequence pattern: a→b

                         b          a

    a                                                        c

                             c      b

    (b) XOR-split pattern:           (c) XOR-join pattern:
     a→b, a→c, and b#c               a→c, b→c, and a#b

                         b          a

    a                                                        c

                             c      b

    (d) AND-split pattern:           (e) AND-join pattern:
                                                                 PAGE 17
     a→b, a→c, and b||c               a→c, b→c, and a||b
Algorithm

Let L be an event log over T. α(L) is defined as follows.
1. TL = { t ∈ T | ∃σ ∈ L t ∈ σ},
2. TI = { t ∈ T | ∃σ ∈ L t = first(σ) },
3. TO = { t ∈ T | ∃σ ∈ L t = last(σ) },
4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧
   ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 },
5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) },
6. PL = { p(A,B) | (A,B) ∈ YL } ∪{iL,oL},
7. FL = { (a,p(A,B)) | (A,B) ∈ YL ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈
   YL ∧ b ∈ B } ∪{ (iL,t) | t ∈ TI} ∪{ (t,oL) | t ∈ TO}, and
8. α(L) = (PL,TL,FL).



                                                                      PAGE 18
Key idea: find places


                       a1                              b1


                       a2                              b2

                       ...           p(A,B)            ...
                       am                              bn


              A={a1,a2, … am}                 B={b1,b2, … bn}

4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧
   ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 },
5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) },
                                                                          PAGE 19
Places as footprints

                a1                          b1


                a2                          b2

                ...         p(A,B)          ...
                am                          bn


          A={a1,a2, … am}            B={b1,b2, … bn}




                                                       PAGE 20
b



        a   p1   e   p3   d

start                               end

            p2   c   p4




                          PAGE 21
Another event log L3




                       PAGE 22
Model for L3




                                            f



                                            c

        a                  b   p({b},{c})       p({c},{e})   e                  g

iL          p({a,f},{b})                    d                    p({e},{f,g})                 oL

                               p({b},{d})       p({d},{e})



                                                                                    PAGE 23
Another event log L4




      a                                      d

                          c

iL         p({a,b},{c})       p({c},{d,e})       oL
      b                                      e




                                                 PAGE 24
Event log L5




               PAGE 25
PAGE 26
Discovered model

                d                            c
                           p({c},{d})

                              b
            p({a,d},{b})                p({b},{c,f})
iL      a                                              f       oL

                              e
            p({a},{e})                  p({e},{f})




                                                           PAGE 27
Limitation of α algorithm
 (implicit places)




                             c
              a
                             d
                                 p1   g
                             e
                                 p2
              b
                             f

Green places are implicit!
                                          PAGE 28
Limitation of α algorithm
(loops of length 1)




              b



      a               c




                                b



                            a       c


                                        PAGE 29
Limitation of α algorithm
(loops of length 2)




                      c


     a                b       d




                                  b

                          a           d

                                  c


                                          PAGE 30
Limitation of α algorithm
(non-local dependencies)




           a                       p1   d

                             c

           b                       p2   e



Green places are not discovered!




                                            PAGE 31
Difficult constructs for α algorithm



           a

                                       c

           b




                                           PAGE 32
Taking the transactional life-cycle into
    account

a                                                    c
      assign                                               assign

                 b
                       start
    assigned                                             assigned
                               suspend


                     running
        start                                                start
                                         suspended

                               resume
                  complete
     running                                              running



    complete                                             complete




                                                                     PAGE 33
Rediscovering process models



                    simulate            discover   discovered
 original process
                                event                process
      model
                                 log                  model
         N                                              N’
                               N=N’ ?


 The rediscovery problem: Is the discovered
 model N’ equivalent to the original model N?



                                                                PAGE 34
Equivalence: trace equivalence,
 bisimilarity, and branching bisimilarity


                s1
                                              s5                    s8
                                      birth                                   curse
             birth                                          birth
        curse                     curse         s6      curse        curse
                              ?                                     s9              s10
                 s2
        curse                     heaven         hell   heaven        hell
                   heaven hell                                               hell
                heaven
curse                                                                        heaven
        s3               s4                s7                   s11
                 TS1                       TS2                  TS3

Three trace equivalent transition systems: TS1 and TS2
are not bisimilar, but TS2 and TS3 are bisimilar

                                                                                    PAGE 35
Branching bisimilarity defined for YAWL


   start                      s1                              s6          start

                          check                       check
  check                                                                  check
                              s2

                              τ          τ
                                                              s7
  c1       c2            s3                  s4                                   c3
                                                  ?

reject          accept   reject          accept   reject      accept   reject          accept

   end                                                           s8       end
                              s5

                                   TS1                     TS2
TS1 and TS2 are not branching bisimilar (although trace equivalent).

                                                                                           PAGE 36
Challenge: finding the right
representational bias




                                   a                       a
                     start                     p                     end


 There is no WF-net with unique visible labels that exhibits this behavior.

                                                                         PAGE 37
Another example

                 τ

        a        b          c
start       p1         p1       end
                 (a)


            a



        a        b          c         There is no WF-
start       p1         p1       end   net with unique
                 (b)                  visible labels
                                      that exhibits
                                      this behavior.


        a        b          c
start       p1         p1       end
                                               PAGE 38
                 (c)
Challenge: noise and incompleteness

• To discover a suitable process model it is
  assumed that the event log contains a
  representative sample of behavior.
• Two related phenomena:
    − Noise: the event log contains rare and
      infrequent behavior not representative for
      the typical behavior of the process.
    − Incompleteness: the event log contains
      too few events to be able to discover
      some of the underlying control-flow
      structures.
                                              PAGE 39
More on incompleteness




See also chapter 3 (cross-validation, precision, recall, etc.)
                                                                 PAGE 40
Challenge: Balancing
Between Underfitting and
Overfitting
                           PAGE 41
Challenge: four competing quality
criteria

 “able to replay event log”                 “Occam’s razor”

          fitness                             simplicity

                               process
                              discovery



generalization                                precision
 “not overfitting the log”                “not underfitting the log”



                                                                PAGE 42
Flower model


                  b   c
              a               d



      start                       end



              e
                                  h
                  f       g


                                        PAGE 43
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   0
 BCE   85
                 A        D
 BCD   0

                     C



                 B        E




                              PAGE 44
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   88
 BCE   85
                 A        D
 BCD   78

                     C



                 B        E




                              PAGE 45
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   2
 BCE   85
                 A        D
 BCD   3

                     C



                 B        E




                              PAGE 46
Example: one log four models
                                                                                                               b
                                                                                                            examine
                                                                                                           thoroughly
                                                                                                                                                                            g
                                                                                                                                                                         pay
                                                                                                               c                                                     compensation
                                                                                          a                examine                                 e
                                                                           start     register              casually                           decide                                   end
                                                                                                                                                                                                     #      trace
                                                                                     request
                                                                                                                                                                            h                        455 acdeh
                                                                                                               d                                                         reject
                                                                                                          check ticket                                                  request                      191 abdeg
                                                                                                                                               f     reinitiate
                                                                                                                                                      request                                        177 adceh
                                                                               N1 : fitness = +, precision = +, generalization = +, simplicity = +
                                                                                                                                                                                                     144 abdeh
                                                                                                                                                                                                     111 acdeg
                                                                                      a              c                        d                          e                      h
                                                                                                                                                                                                      82 adceg
                                                                          start    register       examine                   check                      decide                reject     end
                                                                                   request        casually                  ticket                                          request
                                                                                                                                                                                                      56 adbeh
                                                                               N2 : fitness = -, precision = +, generalization = -, simplicity = +
                                                                                                                                                                                                      47 acdefdbeh
 “able to replay event log”                 “Occam’s razor”
                                                                                                                                                                                                      38 adbeg
                                                                                                       examine                                check
                                                                                                      thoroughly        b             d       ticket                        g                         33 acdefbdeh
          fitness                             simplicity                                                                                                            pay
                                                                                                                                                                compensation
                                                                                          a                                                                                                           14 acdefbdeg
                                                                           start     register   examine
                                                                                                             c                                                                         end            11 acdefdbeg
                                                                                     request    casually
                                                                                                                         e                f        reinitiate               h
                               process                                                                        decide                                request        reject
                                                                                                                                                                  request
                                                                                                                                                                                                         9 adcefcdeh
                              discovery                                        N3 : fitness = +, precision = -, generalization = +, simplicity = +                                                       8 adcefdbeh
                                                                                                                                                                                                         5 adcefbdeg
                                                                                       a              d                        c                           e                    g
                                                                                                                                                                                                         3 acdefbdefdbeg
generalization                                precision                             register
                                                                                    request
                                                                                                    check
                                                                                                    ticket
                                                                                                                         examine
                                                                                                                         casually
                                                                                                                                                        decide              pay
                                                                                                                                                                        compensation
                                                                                                                                                                                                         2 adcefdbeg
                                                                                       a              c                        d                          e                     g                        2 adcefbdefbdeg
 “not overfitting the log”                “not underfitting the log”                register      examine                    check                      decide              pay
                                                                                    request       casually                   ticket                                     compensation                     1 adcefdbefbdeh
                                                                                       a              d                        c                           e                    h                        1 adbefbdefdbeg
                                                                                    register        check                examine                        decide                reject
                                                                                    request         ticket               casually                                            request                     1 adcefdbefcdefdbeg
                                                                                      a               c                       d                           e                     h                   1391
                                                                       start                                                                                                                  end
                                                                                   register       examine                   check                      decide                reject
                                                                                   request        casually                  ticket                                          request


                                                                                                 …                 (all 21 variants seen in the log)


                                                                                      a              b                        d                           e                     g
                                                                                   register        examine                  check                      decide               pay
                                                                                   request        thoroughly                ticket                                      compensation

                                                                                      a              d                        b                           e                     h
                                                                                   register         check                 examine                      decide                reject
                                                                                   request          ticket               thoroughly                                         request

                                                                                      a              b                        d                           e                     h
                                                                                   register        examine                  check                      decide                reject
                                                                                   request        thoroughly                ticket                                          request                        PAGE 47
                                                                                N4 : fitness = +, precision = +, generalization = -, simplicity = -
#     trace
                                                                               455 acdeh
        Model N1                                                               191 abdeg
                                                                               177 adceh
                                                                               144 abdeh
                                                                               111 acdeg
                                                                                82 adceg
                                                                                56 adbeh
                          b                                                     47 acdefdbeh
                       examine
                      thoroughly                                                38 adbeg
                                                              g                 33 acdefbdeh
                                                             pay
                          c                              compensation           14 acdefbdeg
            a         examine               e
                                                                                11 acdefdbeg
start    register     casually          decide                          end
         request                                                                   9 adcefcdeh
                                                              h
                          d                                 reject                 8 adcefdbeh
                     check ticket                          request                 5 adcefbdeg
                                        f   reinitiate                             3 acdefbdefdbeg
                                             request
N1 : fitness = +, precision = +, generalization = +, simplicity = +                2 adcefdbeg
                                                                                   2 adcefbdefbdeg
                                                                                   1 adcefdbefbdeh
                                                                                   1 adbefbdefdbeg
                                                                                   1 adcefdbefcdefdbeg
                                                                                             PAGE 48
                                                                              1391
#     trace
                                                                              455 acdeh
        Model N2                                                              191 abdeg
                                                                              177 adceh
                                                                              144 abdeh
                                                                              111 acdeg
                                                                               82 adceg
                                                                               56 adbeh
                                                                               47 acdefdbeh
                                                                               38 adbeg
           a          c             d            e             h               33 acdefbdeh
start   register   examine        check        decide         reject   end     14 acdefbdeg
        request    casually       ticket                     request
   N2 : fitness = -, precision = +, generalization = -, simplicity = +         11 acdefdbeg
                                                                                  9 adcefcdeh
                                                                                  8 adcefdbeh
                                                                                  5 adcefbdeg
                                                                                  3 acdefbdefdbeg
                                                                                  2 adcefdbeg
                                                                                  2 adcefbdefbdeg
                                                                                  1 adcefdbefbdeh
                                                                                  1 adbefbdefdbeg
                                                                                  1 adcefdbefcdefdbeg
                                                                                            PAGE 49
                                                                             1391
#     trace
                                                                                          455 acdeh
        Model N3                                                                          191 abdeg
                                                                                          177 adceh
                                                                                          144 abdeh
                                                                                          111 acdeg
                                                                                           82 adceg
                                                                                           56 adbeh
                                                                                           47 acdefdbeh
                           examine                  check
                          thoroughly    b   d       ticket                     g           38 adbeg
                                                                        pay                33 acdefbdeh
                                                                    compensation
            a                                                                              14 acdefbdeg
start    register   examine                                                        end     11 acdefdbeg
         request    casually   c
                                        e       f      reinitiate
                                                                      reject
                                                                               h              9 adcefcdeh
                               decide                   request
                                                                     request                  8 adcefdbeh
 N3 : fitness = +, precision = -, generalization = +, simplicity = +
                                                                                              5 adcefbdeg
                                                                                              3 acdefbdefdbeg
                                                                                              2 adcefdbeg
                                                                                              2 adcefbdefbdeg
                                                                                              1 adcefdbefbdeh
                                                                                              1 adbefbdefdbeg
                                                                                              1 adcefdbefcdefdbeg
                                                                                                        PAGE 50
                                                                                         1391
#     trace
                                                                                               455 acdeh
Model N4                                                                                       191 abdeg
                                                                                               177 adceh
                                                                                               144 abdeh
              a             d                 c                e              g                111 acdeg
           register       check           examine            decide          pay
           request        ticket          casually                       compensation           82 adceg
              a             c                 d                e              g                 56 adbeh
           register     examine             check           decide           pay
           request      casually            ticket                       compensation           47 acdefdbeh
              a             d                 c                e              h                 38 adbeg
           register       check           examine           decide           reject
           request        ticket          casually                          request             33 acdefbdeh
              a            c                 d                e               h                 14 acdefbdeg
start                                                                                   end
           register     examine            check            decide           reject
           request      casually           ticket                           request             11 acdefdbeg

                       …             (all 21 variants seen in the log)
                                                                                                   9 adcefcdeh
                                                                                                   8 adcefdbeh
                                                                                                   5 adcefbdeg
             a             b                 d                e               g
           register     examine            check            decide           pay                   3 acdefbdefdbeg
           request     thoroughly          ticket                        compensation
                                                                                                   2 adcefdbeg
              a            d                 b                e               h
           register       check            examine          decide           reject                2 adcefbdefbdeg
           request        ticket          thoroughly                        request
                                                                                                   1 adcefdbefbdeh
             a             b                 d                e               h
          register       examine           check            decide          reject                 1 adbefbdefdbeg
          request       thoroughly         ticket                          request
                                                                                                   1 adcefdbefcdefdbeg
        N4 : fitness = +, precision = +, generalization = -, simplicity = -
                                                                                                             PAGE 51
                                                                                              1391
Why is process mining such a difficult
problem?

• There are no negative examples (i.e., a log shows
  what has happened but does not show what could
  not happen).
• Due to concurrency, loops, and choices the search
  space has a complex structure and the log typically
  contains only a fraction of all possible behaviors.
• There is no clear relation between the size of a model
  and its behavior (i.e., a smaller model may generate
  more or less behavior although classical analysis
  and evaluation methods typically assume some
  monotonicity property).


                                                     PAGE 52
Creating a 2-D slice of a 3-D reality




Creating a 2-D slice of a 3-D reality: the
process is viewed from a specific angle, the
process is scoped using a frame, and the
resolution determines the granularity of the
resulting model                                PAGE 53

More Related Content

Similar to Process mining chapter_05_process_discovery

Process Mining - Chapter 7 - Conformance Checking
Process Mining - Chapter 7 - Conformance CheckingProcess Mining - Chapter 7 - Conformance Checking
Process Mining - Chapter 7 - Conformance CheckingWil van der Aalst
 
Process Mining - Chapter 8 - Mining Additional Perspectives
Process Mining - Chapter 8 - Mining Additional PerspectivesProcess Mining - Chapter 8 - Mining Additional Perspectives
Process Mining - Chapter 8 - Mining Additional PerspectivesWil van der Aalst
 
Process mining chapter_08_mining_additional_perspectives
Process mining chapter_08_mining_additional_perspectivesProcess mining chapter_08_mining_additional_perspectives
Process mining chapter_08_mining_additional_perspectivesMuhammad Ajmal
 
Process mining chapter_06_advanced_process_discovery_techniques
Process mining chapter_06_advanced_process_discovery_techniquesProcess mining chapter_06_advanced_process_discovery_techniques
Process mining chapter_06_advanced_process_discovery_techniquesMuhammad Ajmal
 
Process Mining - Chapter 6 - Advanced Process Discovery_techniques
Process Mining - Chapter 6 - Advanced Process Discovery_techniquesProcess Mining - Chapter 6 - Advanced Process Discovery_techniques
Process Mining - Chapter 6 - Advanced Process Discovery_techniquesWil van der Aalst
 
Discovering Petri Nets: Evidence-Based Business Process Management
Discovering Petri Nets: Evidence-Based Business Process ManagementDiscovering Petri Nets: Evidence-Based Business Process Management
Discovering Petri Nets: Evidence-Based Business Process ManagementWil van der Aalst
 
Discovering Concurrency: Learning (Business) Process Models from Examples
Discovering Concurrency: Learning (Business) Process Models from ExamplesDiscovering Concurrency: Learning (Business) Process Models from Examples
Discovering Concurrency: Learning (Business) Process Models from ExamplesWil van der Aalst
 
Repairing Process Models to Match Reality
Repairing Process Models to Match RealityRepairing Process Models to Match Reality
Repairing Process Models to Match RealityDirk Fahland
 
The Popper Experimentation Protocol and CLI tool
The Popper Experimentation Protocol and CLI toolThe Popper Experimentation Protocol and CLI tool
The Popper Experimentation Protocol and CLI toolIvo Jimenez
 
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSEYandex
 
Process Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - IntroductionProcess Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - IntroductionWil van der Aalst
 
Process mining chapter_01_introduction
Process mining chapter_01_introductionProcess mining chapter_01_introduction
Process mining chapter_01_introductionMuhammad Ajmal
 
Process Mining: Understanding and Improving Desire Lines in Big Data
Process Mining: Understanding and Improving Desire Lines in Big DataProcess Mining: Understanding and Improving Desire Lines in Big Data
Process Mining: Understanding and Improving Desire Lines in Big DataWil van der Aalst
 
Simplifying Mined Process Models
Simplifying Mined Process ModelsSimplifying Mined Process Models
Simplifying Mined Process ModelsDirk Fahland
 
e Service Prototype
e Service Prototypee Service Prototype
e Service PrototypeYves Pigneur
 
Keynote Gartner Business Process Management Summit, February 2009, London
Keynote Gartner Business Process Management Summit, February 2009, London Keynote Gartner Business Process Management Summit, February 2009, London
Keynote Gartner Business Process Management Summit, February 2009, London Wil van der Aalst
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsdgarijo
 
Section07-Deadlocks (1).ppt
Section07-Deadlocks (1).pptSection07-Deadlocks (1).ppt
Section07-Deadlocks (1).pptamadayshwan
 

Similar to Process mining chapter_05_process_discovery (20)

Process Mining - Chapter 7 - Conformance Checking
Process Mining - Chapter 7 - Conformance CheckingProcess Mining - Chapter 7 - Conformance Checking
Process Mining - Chapter 7 - Conformance Checking
 
Process Mining - Chapter 8 - Mining Additional Perspectives
Process Mining - Chapter 8 - Mining Additional PerspectivesProcess Mining - Chapter 8 - Mining Additional Perspectives
Process Mining - Chapter 8 - Mining Additional Perspectives
 
Process mining chapter_08_mining_additional_perspectives
Process mining chapter_08_mining_additional_perspectivesProcess mining chapter_08_mining_additional_perspectives
Process mining chapter_08_mining_additional_perspectives
 
Process mining chapter_06_advanced_process_discovery_techniques
Process mining chapter_06_advanced_process_discovery_techniquesProcess mining chapter_06_advanced_process_discovery_techniques
Process mining chapter_06_advanced_process_discovery_techniques
 
Process Mining - Chapter 6 - Advanced Process Discovery_techniques
Process Mining - Chapter 6 - Advanced Process Discovery_techniquesProcess Mining - Chapter 6 - Advanced Process Discovery_techniques
Process Mining - Chapter 6 - Advanced Process Discovery_techniques
 
Discovering Petri Nets: Evidence-Based Business Process Management
Discovering Petri Nets: Evidence-Based Business Process ManagementDiscovering Petri Nets: Evidence-Based Business Process Management
Discovering Petri Nets: Evidence-Based Business Process Management
 
Discovering Concurrency: Learning (Business) Process Models from Examples
Discovering Concurrency: Learning (Business) Process Models from ExamplesDiscovering Concurrency: Learning (Business) Process Models from Examples
Discovering Concurrency: Learning (Business) Process Models from Examples
 
Repairing Process Models to Match Reality
Repairing Process Models to Match RealityRepairing Process Models to Match Reality
Repairing Process Models to Match Reality
 
The Popper Experimentation Protocol and CLI tool
The Popper Experimentation Protocol and CLI toolThe Popper Experimentation Protocol and CLI tool
The Popper Experimentation Protocol and CLI tool
 
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
Process Mining: Data Science in Action - Wil van der Aalst, TU/e, DSC/e, HSE
 
Process Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - IntroductionProcess Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - Introduction
 
Process mining chapter_01_introduction
Process mining chapter_01_introductionProcess mining chapter_01_introduction
Process mining chapter_01_introduction
 
Process Mining: Understanding and Improving Desire Lines in Big Data
Process Mining: Understanding and Improving Desire Lines in Big DataProcess Mining: Understanding and Improving Desire Lines in Big Data
Process Mining: Understanding and Improving Desire Lines in Big Data
 
Simplifying Mined Process Models
Simplifying Mined Process ModelsSimplifying Mined Process Models
Simplifying Mined Process Models
 
Fuzzing - Part 2
Fuzzing - Part 2Fuzzing - Part 2
Fuzzing - Part 2
 
e Service Prototype
e Service Prototypee Service Prototype
e Service Prototype
 
Keynote Gartner Business Process Management Summit, February 2009, London
Keynote Gartner Business Process Management Summit, February 2009, London Keynote Gartner Business Process Management Summit, February 2009, London
Keynote Gartner Business Process Management Summit, February 2009, London
 
It Works On Dev
It Works On DevIt Works On Dev
It Works On Dev
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 
Section07-Deadlocks (1).ppt
Section07-Deadlocks (1).pptSection07-Deadlocks (1).ppt
Section07-Deadlocks (1).ppt
 

More from Muhammad Ajmal

Process mining chapter_13_cartography_and_navigation
Process mining chapter_13_cartography_and_navigationProcess mining chapter_13_cartography_and_navigation
Process mining chapter_13_cartography_and_navigationMuhammad Ajmal
 
Process mining chapter_12_analyzing_spaghetti_processes
Process mining chapter_12_analyzing_spaghetti_processesProcess mining chapter_12_analyzing_spaghetti_processes
Process mining chapter_12_analyzing_spaghetti_processesMuhammad Ajmal
 
Process mining chapter_11_analyzing_lasagna_processes
Process mining chapter_11_analyzing_lasagna_processesProcess mining chapter_11_analyzing_lasagna_processes
Process mining chapter_11_analyzing_lasagna_processesMuhammad Ajmal
 
Process mining chapter_10_tool_support
Process mining chapter_10_tool_supportProcess mining chapter_10_tool_support
Process mining chapter_10_tool_supportMuhammad Ajmal
 
Process mining chapter_09_operational_support
Process mining chapter_09_operational_supportProcess mining chapter_09_operational_support
Process mining chapter_09_operational_supportMuhammad Ajmal
 
Process mining chapter_07_conformance_checking
Process mining chapter_07_conformance_checkingProcess mining chapter_07_conformance_checking
Process mining chapter_07_conformance_checkingMuhammad Ajmal
 
Process mining chapter_04_getting_the_data
Process mining chapter_04_getting_the_dataProcess mining chapter_04_getting_the_data
Process mining chapter_04_getting_the_dataMuhammad Ajmal
 
Process mining chapter_03_data_mining
Process mining chapter_03_data_miningProcess mining chapter_03_data_mining
Process mining chapter_03_data_miningMuhammad Ajmal
 
Process mining chapter_02_process_modeling_and_analysis
Process mining chapter_02_process_modeling_and_analysisProcess mining chapter_02_process_modeling_and_analysis
Process mining chapter_02_process_modeling_and_analysisMuhammad Ajmal
 
Process mining chapter_14_epilogue
Process mining chapter_14_epilogueProcess mining chapter_14_epilogue
Process mining chapter_14_epilogueMuhammad Ajmal
 

More from Muhammad Ajmal (11)

Process mining chapter_13_cartography_and_navigation
Process mining chapter_13_cartography_and_navigationProcess mining chapter_13_cartography_and_navigation
Process mining chapter_13_cartography_and_navigation
 
Process mining chapter_12_analyzing_spaghetti_processes
Process mining chapter_12_analyzing_spaghetti_processesProcess mining chapter_12_analyzing_spaghetti_processes
Process mining chapter_12_analyzing_spaghetti_processes
 
Process mining chapter_11_analyzing_lasagna_processes
Process mining chapter_11_analyzing_lasagna_processesProcess mining chapter_11_analyzing_lasagna_processes
Process mining chapter_11_analyzing_lasagna_processes
 
Process mining chapter_10_tool_support
Process mining chapter_10_tool_supportProcess mining chapter_10_tool_support
Process mining chapter_10_tool_support
 
Process mining chapter_09_operational_support
Process mining chapter_09_operational_supportProcess mining chapter_09_operational_support
Process mining chapter_09_operational_support
 
Process mining chapter_07_conformance_checking
Process mining chapter_07_conformance_checkingProcess mining chapter_07_conformance_checking
Process mining chapter_07_conformance_checking
 
Process mining chapter_04_getting_the_data
Process mining chapter_04_getting_the_dataProcess mining chapter_04_getting_the_data
Process mining chapter_04_getting_the_data
 
Process mining chapter_03_data_mining
Process mining chapter_03_data_miningProcess mining chapter_03_data_mining
Process mining chapter_03_data_mining
 
Process mining chapter_02_process_modeling_and_analysis
Process mining chapter_02_process_modeling_and_analysisProcess mining chapter_02_process_modeling_and_analysis
Process mining chapter_02_process_modeling_and_analysis
 
Process mining
Process miningProcess mining
Process mining
 
Process mining chapter_14_epilogue
Process mining chapter_14_epilogueProcess mining chapter_14_epilogue
Process mining chapter_14_epilogue
 

Process mining chapter_05_process_discovery

  • 1. Chapter 5 Process Discovery: An Introduction prof.dr.ir. Wil van der Aalst www.processmining.org
  • 2. Overview Chapter 1 Introduction Part I: Preliminaries Chapter 2 Chapter 3 Process Modeling and Data Mining Analysis Part II: From Event Logs to Process Models Chapter 4 Chapter 5 Chapter 6 Getting the Data Process Discovery: An Advanced Process Introduction Discovery Techniques Part III: Beyond Process Discovery Chapter 7 Chapter 8 Chapter 9 Conformance Mining Additional Operational Support Checking Perspectives Part IV: Putting Process Mining to Work Chapter 10 Chapter 11 Chapter 12 Tool Support Analyzing “Lasagna Analyzing “Spaghetti Processes” Processes” Part V: Reflection Chapter 13 Chapter 14 Cartography and Epilogue Navigation PAGE 1
  • 3. Process discovery supports/ “world” business controls processes software people machines system components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery (process) event conformance model logs enhancement PAGE 2
  • 4. Process discovery = Play-In Play-In event log process model Play-Out process model event log Replay • extended model showing times, frequencies, etc. • diagnostics • predictions • recommendations event log process model PAGE 3
  • 5. Example b a p1 e p3 d start end p2 c p4 Event log contains all possible traces of model and vice versa. PAGE 4
  • 6. Another example p1 b p3 a f e d start p5 end p2 c p4 Generalization: event log contains only subset of all possible traces of model. PAGE 5
  • 7. Notation is less relevant (e.g. BPMN) b a p1 e p3 d start end b p2 c p4 c a d start end e PAGE 6
  • 8. Another BPMN example p1 b p3 a f e d start p5 end b p2 c p4 c a d start end f e PAGE 7
  • 9. Challenge • In general, there is a trade-off between the following four quality criteria: 1.Fitness: the discovered model should allow for the behavior seen in the event log. 2.Precision (avoid underfitting): the discovered model should not allow for behavior completely unrelated to what was seen in the event log. 3.Generalization (avoid overfitting): the discovered model should generalize the example behavior seen in the event log. 4.Simplicity: the discovered model should be as simple as possible. PAGE 8
  • 10. Process Discovery: example of algorithm α PAGE 9
  • 11. >,→,||,# relations • Direct succession: x>y iff for some case x is directly followed by y. abcd • Causality: x→y iff x>y and acbd not y>x. aed • Parallel: x||y iff x>y and a>b y>x a>c a→b b#e a>e • Choice: x#y iff not x>y and a→c e#b b>c b||c c#e not y>x. a→e b>d c||b c>b b→d a#d … c>d c→d e>d e→d PAGE 10
  • 12. Basic Idea Used by α Algorithm (1) a b (a) sequence pattern: a→b PAGE 11
  • 13. Basic Idea Used by α Algorithm (2) a b c b a (c) XOR-join pattern: b a→c, b→c, and a#b a c c (b) XOR-split pattern: (b) XOR-split pattern:a→c, and b#c a→b, a→b, a→c, and b#c PAGE 12
  • 14. Basic Idea Used by α Algorithm (3) a b c b a (e) AND-join pattern: b a→c, b→c, and a||b a c c (d) AND-split pattern: (d) AND-split pattern: a→b, a→c, and b||c a→b, a→c, and b||c PAGE 13
  • 15. Example Revisited a>b a→b b||c b#e a>c a→c c||b e#b a>e a→e c#e b>c a#d b→d b>d … c>b c→ d c>d e→d b e>d a p1 e p3 d start end p2 c p4 Result produced by α algorithm PAGE 14
  • 16. Footprint of L1 b a p1 e p3 d start end p2 c p4 PAGE 15
  • 17. Footprint of L2 p1 b p3 a f e d start p5 end p2 c p4 PAGE 16
  • 18. Simple patterns a b (a) sequence pattern: a→b b a a c c b (b) XOR-split pattern: (c) XOR-join pattern: a→b, a→c, and b#c a→c, b→c, and a#b b a a c c b (d) AND-split pattern: (e) AND-join pattern: PAGE 17 a→b, a→c, and b||c a→c, b→c, and a||b
  • 19. Algorithm Let L be an event log over T. α(L) is defined as follows. 1. TL = { t ∈ T | ∃σ ∈ L t ∈ σ}, 2. TI = { t ∈ T | ∃σ ∈ L t = first(σ) }, 3. TO = { t ∈ T | ∃σ ∈ L t = last(σ) }, 4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 }, 5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) }, 6. PL = { p(A,B) | (A,B) ∈ YL } ∪{iL,oL}, 7. FL = { (a,p(A,B)) | (A,B) ∈ YL ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈ YL ∧ b ∈ B } ∪{ (iL,t) | t ∈ TI} ∪{ (t,oL) | t ∈ TO}, and 8. α(L) = (PL,TL,FL). PAGE 18
  • 20. Key idea: find places a1 b1 a2 b2 ... p(A,B) ... am bn A={a1,a2, … am} B={b1,b2, … bn} 4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 }, 5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) }, PAGE 19
  • 21. Places as footprints a1 b1 a2 b2 ... p(A,B) ... am bn A={a1,a2, … am} B={b1,b2, … bn} PAGE 20
  • 22. b a p1 e p3 d start end p2 c p4 PAGE 21
  • 23. Another event log L3 PAGE 22
  • 24. Model for L3 f c a b p({b},{c}) p({c},{e}) e g iL p({a,f},{b}) d p({e},{f,g}) oL p({b},{d}) p({d},{e}) PAGE 23
  • 25. Another event log L4 a d c iL p({a,b},{c}) p({c},{d,e}) oL b e PAGE 24
  • 26. Event log L5 PAGE 25
  • 28. Discovered model d c p({c},{d}) b p({a,d},{b}) p({b},{c,f}) iL a f oL e p({a},{e}) p({e},{f}) PAGE 27
  • 29. Limitation of α algorithm (implicit places) c a d p1 g e p2 b f Green places are implicit! PAGE 28
  • 30. Limitation of α algorithm (loops of length 1) b a c b a c PAGE 29
  • 31. Limitation of α algorithm (loops of length 2) c a b d b a d c PAGE 30
  • 32. Limitation of α algorithm (non-local dependencies) a p1 d c b p2 e Green places are not discovered! PAGE 31
  • 33. Difficult constructs for α algorithm a c b PAGE 32
  • 34. Taking the transactional life-cycle into account a c assign assign b start assigned assigned suspend running start start suspended resume complete running running complete complete PAGE 33
  • 35. Rediscovering process models simulate discover discovered original process event process model log model N N’ N=N’ ? The rediscovery problem: Is the discovered model N’ equivalent to the original model N? PAGE 34
  • 36. Equivalence: trace equivalence, bisimilarity, and branching bisimilarity s1 s5 s8 birth curse birth birth curse curse s6 curse curse ? s9 s10 s2 curse heaven hell heaven hell heaven hell hell heaven curse heaven s3 s4 s7 s11 TS1 TS2 TS3 Three trace equivalent transition systems: TS1 and TS2 are not bisimilar, but TS2 and TS3 are bisimilar PAGE 35
  • 37. Branching bisimilarity defined for YAWL start s1 s6 start check check check check s2 τ τ s7 c1 c2 s3 s4 c3 ? reject accept reject accept reject accept reject accept end s8 end s5 TS1 TS2 TS1 and TS2 are not branching bisimilar (although trace equivalent). PAGE 36
  • 38. Challenge: finding the right representational bias a a start p end There is no WF-net with unique visible labels that exhibits this behavior. PAGE 37
  • 39. Another example τ a b c start p1 p1 end (a) a a b c There is no WF- start p1 p1 end net with unique (b) visible labels that exhibits this behavior. a b c start p1 p1 end PAGE 38 (c)
  • 40. Challenge: noise and incompleteness • To discover a suitable process model it is assumed that the event log contains a representative sample of behavior. • Two related phenomena: − Noise: the event log contains rare and infrequent behavior not representative for the typical behavior of the process. − Incompleteness: the event log contains too few events to be able to discover some of the underlying control-flow structures. PAGE 39
  • 41. More on incompleteness See also chapter 3 (cross-validation, precision, recall, etc.) PAGE 40
  • 42. Challenge: Balancing Between Underfitting and Overfitting PAGE 41
  • 43. Challenge: four competing quality criteria “able to replay event log” “Occam’s razor” fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” PAGE 42
  • 44. Flower model b c a d start end e h f g PAGE 43
  • 45. What is the best model? A D C ACD 99 B E ACE 0 BCE 85 A D BCD 0 C B E PAGE 44
  • 46. What is the best model? A D C ACD 99 B E ACE 88 BCE 85 A D BCD 78 C B E PAGE 45
  • 47. What is the best model? A D C ACD 99 B E ACE 2 BCE 85 A D BCD 3 C B E PAGE 46
  • 48. Example: one log four models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdeh fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeg generalization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request … (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 47 N4 : fitness = +, precision = +, generalization = -, simplicity = -
  • 49. # trace 455 acdeh Model N1 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh b 47 acdefdbeh examine thoroughly 38 adbeg g 33 acdefbdeh pay c compensation 14 acdefbdeg a examine e 11 acdefdbeg start register casually decide end request 9 adcefcdeh h d reject 8 adcefdbeh check ticket request 5 adcefbdeg f reinitiate 3 acdefbdefdbeg request N1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 48 1391
  • 50. # trace 455 acdeh Model N2 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg a c d e h 33 acdefbdeh start register examine check decide reject end 14 acdefbdeg request casually ticket request N2 : fitness = -, precision = +, generalization = -, simplicity = + 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 49 1391
  • 51. # trace 455 acdeh Model N3 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh examine check thoroughly b d ticket g 38 adbeg pay 33 acdefbdeh compensation a 14 acdefbdeg start register examine end 11 acdefdbeg request casually c e f reinitiate reject h 9 adcefcdeh decide request request 8 adcefdbeh N3 : fitness = +, precision = -, generalization = +, simplicity = + 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 50 1391
  • 52. # trace 455 acdeh Model N4 191 abdeg 177 adceh 144 abdeh a d c e g 111 acdeg register check examine decide pay request ticket casually compensation 82 adceg a c d e g 56 adbeh register examine check decide pay request casually ticket compensation 47 acdefdbeh a d c e h 38 adbeg register check examine decide reject request ticket casually request 33 acdefbdeh a c d e h 14 acdefbdeg start end register examine check decide reject request casually ticket request 11 acdefdbeg … (all 21 variants seen in the log) 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg a b d e g register examine check decide pay 3 acdefbdefdbeg request thoroughly ticket compensation 2 adcefdbeg a d b e h register check examine decide reject 2 adcefbdefbdeg request ticket thoroughly request 1 adcefdbefbdeh a b d e h register examine check decide reject 1 adbefbdefdbeg request thoroughly ticket request 1 adcefdbefcdefdbeg N4 : fitness = +, precision = +, generalization = -, simplicity = - PAGE 51 1391
  • 53. Why is process mining such a difficult problem? • There are no negative examples (i.e., a log shows what has happened but does not show what could not happen). • Due to concurrency, loops, and choices the search space has a complex structure and the log typically contains only a fraction of all possible behaviors. • There is no clear relation between the size of a model and its behavior (i.e., a smaller model may generate more or less behavior although classical analysis and evaluation methods typically assume some monotonicity property). PAGE 52
  • 54. Creating a 2-D slice of a 3-D reality Creating a 2-D slice of a 3-D reality: the process is viewed from a specific angle, the process is scoped using a frame, and the resolution determines the granularity of the resulting model PAGE 53