Chapter 5
Process Discovery:
An Introduction
prof.dr.ir. Wil van der Aalst
www.processmining.org
Overview
Chapter 1
Introduction



Part I: Preliminaries

Chapter 2                   Chapter 3
Process Modeling and        Data Mining
Analysis


Part II: From Event Logs to Process Models

Chapter 4                  Chapter 5               Chapter 6
Getting the Data           Process Discovery: An   Advanced Process
                           Introduction            Discovery Techniques


Part III: Beyond Process Discovery

Chapter 7                   Chapter 8              Chapter 9
Conformance                 Mining Additional      Operational Support
Checking                    Perspectives


Part IV: Putting Process Mining to Work

Chapter 10                  Chapter 11             Chapter 12
Tool Support                Analyzing “Lasagna     Analyzing “Spaghetti
                            Processes”             Processes”


Part V: Reflection

Chapter 13                  Chapter 14
Cartography and             Epilogue
Navigation
                                                                          PAGE 1
Process discovery

                              supports/
      “world”    business
                               controls
                processes                      software
   people   machines                            system
        components
           organizations                              records
                                                   events, e.g.,
                                                    messages,
                                   specifies       transactions,
    models
                                  configures            etc.
   analyzes
                                 implements
                                   analyzes


                            discovery
        (process)                                 event
                            conformance
          model                                    logs
                            enhancement
                                                                   PAGE 2
Process discovery = Play-In
Play-In




      event log                                   process model


Play-Out




                  process model                                event log


Replay

                                                      •   extended model
                                                          showing times,
                                                          frequencies, etc.
                                                      •   diagnostics
                                                      •   predictions
                                                      •   recommendations
     event log                    process model                               PAGE 3
Example

                                   b



            a          p1          e   p3   d

start                                             end

                       p2          c   p4




 Event log contains all possible
 traces of model and vice versa.
                                                PAGE 4
Another example

                  p1                  b        p3




            a              f               e        d

start                                 p5                  end

                  p2                  c        p4




 Generalization: event log contains only
 subset of all possible traces of model.
                                                        PAGE 5
Notation is less relevant (e.g. BPMN)
                  b



         a   p1   e    p3     d

start                                 end   b
             p2   c   p4



                                            c

                                  a             d
                      start                            end
                                            e




                                                    PAGE 6
Another BPMN example
            p1       b        p3




        a        f        e         d

start                p5                    end
                                                         b

            p2       c        p4



                                                         c

                                                 a               d
                                   start                                 end

                                                     f       e




                                                                     PAGE 7
Challenge

• In general, there is a trade-off between the following
  four quality criteria:
1.Fitness: the discovered model should allow for the
  behavior seen in the event log.
2.Precision (avoid underfitting): the discovered model
  should not allow for behavior completely unrelated
  to what was seen in the event log.
3.Generalization (avoid overfitting): the discovered
  model should generalize the example behavior seen
  in the event log.
4.Simplicity: the discovered model should be as
  simple as possible.
                                                      PAGE 8
Process Discovery:
example of algorithm



             α

                       PAGE 9
>,→,||,# relations


• Direct succession: x>y iff
  for some case x is directly
  followed by y.                            abcd
• Causality: x→y iff x>y and                acbd
  not y>x.                                   aed
• Parallel: x||y iff x>y and    a>b
  y>x                           a>c   a→b          b#e
                                a>e
• Choice: x#y iff not x>y and         a→c          e#b
                                b>c         b||c   c#e
  not y>x.                            a→e
                                b>d         c||b
                                c>b   b→d          a#d
                                                    …
                                c>d   c→d
                                e>d   e→d                PAGE 10
Basic Idea Used by α Algorithm (1)




        a                     b

     (a) sequence pattern: a→b




                                     PAGE 11
Basic Idea Used by α Algorithm (2)

                                 a

                                     b                   c

                                 b
           a
                                 (c) XOR-join pattern:
                       b
                                  a→c, b→c, and a#b
   a                                 c
                       c
            (b) XOR-split pattern:
   (b) XOR-split pattern:a→c, and b#c
              a→b,
   a→b, a→c, and b#c                                     PAGE 12
Basic Idea Used by α Algorithm (3)

                              a

                                    b                  c

                              b
          a
                               (e) AND-join pattern:
                 b              a→c, b→c, and a||b

  a
                                      c
                 c
             (d) AND-split pattern:
  (d) AND-split pattern:
               a→b, a→c, and b||c
   a→b, a→c, and b||c                                  PAGE 13
Example Revisited

 a>b       a→b    b||c   b#e
 a>c       a→c    c||b   e#b
 a>e       a→e           c#e
 b>c                     a#d
           b→d
 b>d                      …
 c>b       c→ d
 c>d       e→d                   b
 e>d

              a          p1      e   p3   d

   start                                      end

                         p2      c   p4
Result produced by α algorithm                PAGE 14
Footprint of L1


                    b



           a   p1   e   p3   d

   start                         end

               p2   c   p4




                                       PAGE 15
Footprint of L2



                     p1       b        p3




                 a        f        e        d

         start                p5                end

                     p2       c        p4




                                                      PAGE 16
Simple patterns

                     a                  b

                  (a) sequence pattern: a→b

                         b          a

    a                                                        c

                             c      b

    (b) XOR-split pattern:           (c) XOR-join pattern:
     a→b, a→c, and b#c               a→c, b→c, and a#b

                         b          a

    a                                                        c

                             c      b

    (d) AND-split pattern:           (e) AND-join pattern:
                                                                 PAGE 17
     a→b, a→c, and b||c               a→c, b→c, and a||b
Algorithm

Let L be an event log over T. α(L) is defined as follows.
1. TL = { t ∈ T | ∃σ ∈ L t ∈ σ},
2. TI = { t ∈ T | ∃σ ∈ L t = first(σ) },
3. TO = { t ∈ T | ∃σ ∈ L t = last(σ) },
4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧
   ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 },
5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) },
6. PL = { p(A,B) | (A,B) ∈ YL } ∪{iL,oL},
7. FL = { (a,p(A,B)) | (A,B) ∈ YL ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈
   YL ∧ b ∈ B } ∪{ (iL,t) | t ∈ TI} ∪{ (t,oL) | t ∈ TO}, and
8. α(L) = (PL,TL,FL).



                                                                      PAGE 18
Key idea: find places


                       a1                              b1


                       a2                              b2

                       ...           p(A,B)            ...
                       am                              bn


              A={a1,a2, … am}                 B={b1,b2, … bn}

4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧
   ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 },
5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) },
                                                                          PAGE 19
Places as footprints

                a1                          b1


                a2                          b2

                ...         p(A,B)          ...
                am                          bn


          A={a1,a2, … am}            B={b1,b2, … bn}




                                                       PAGE 20
b



        a   p1   e   p3   d

start                               end

            p2   c   p4




                          PAGE 21
Another event log L3




                       PAGE 22
Model for L3




                                            f



                                            c

        a                  b   p({b},{c})       p({c},{e})   e                  g

iL          p({a,f},{b})                    d                    p({e},{f,g})                 oL

                               p({b},{d})       p({d},{e})



                                                                                    PAGE 23
Another event log L4




      a                                      d

                          c

iL         p({a,b},{c})       p({c},{d,e})       oL
      b                                      e




                                                 PAGE 24
Event log L5




               PAGE 25
PAGE 26
Discovered model

                d                            c
                           p({c},{d})

                              b
            p({a,d},{b})                p({b},{c,f})
iL      a                                              f       oL

                              e
            p({a},{e})                  p({e},{f})




                                                           PAGE 27
Limitation of α algorithm
 (implicit places)




                             c
              a
                             d
                                 p1   g
                             e
                                 p2
              b
                             f

Green places are implicit!
                                          PAGE 28
Limitation of α algorithm
(loops of length 1)




              b



      a               c




                                b



                            a       c


                                        PAGE 29
Limitation of α algorithm
(loops of length 2)




                      c


     a                b       d




                                  b

                          a           d

                                  c


                                          PAGE 30
Limitation of α algorithm
(non-local dependencies)




           a                       p1   d

                             c

           b                       p2   e



Green places are not discovered!




                                            PAGE 31
Difficult constructs for α algorithm



           a

                                       c

           b




                                           PAGE 32
Taking the transactional life-cycle into
    account

a                                                    c
      assign                                               assign

                 b
                       start
    assigned                                             assigned
                               suspend


                     running
        start                                                start
                                         suspended

                               resume
                  complete
     running                                              running



    complete                                             complete




                                                                     PAGE 33
Rediscovering process models



                    simulate            discover   discovered
 original process
                                event                process
      model
                                 log                  model
         N                                              N’
                               N=N’ ?


 The rediscovery problem: Is the discovered
 model N’ equivalent to the original model N?



                                                                PAGE 34
Equivalence: trace equivalence,
 bisimilarity, and branching bisimilarity


                s1
                                              s5                    s8
                                      birth                                   curse
             birth                                          birth
        curse                     curse         s6      curse        curse
                              ?                                     s9              s10
                 s2
        curse                     heaven         hell   heaven        hell
                   heaven hell                                               hell
                heaven
curse                                                                        heaven
        s3               s4                s7                   s11
                 TS1                       TS2                  TS3

Three trace equivalent transition systems: TS1 and TS2
are not bisimilar, but TS2 and TS3 are bisimilar

                                                                                    PAGE 35
Branching bisimilarity defined for YAWL


   start                      s1                              s6          start

                          check                       check
  check                                                                  check
                              s2

                              τ          τ
                                                              s7
  c1       c2            s3                  s4                                   c3
                                                  ?

reject          accept   reject          accept   reject      accept   reject          accept

   end                                                           s8       end
                              s5

                                   TS1                     TS2
TS1 and TS2 are not branching bisimilar (although trace equivalent).

                                                                                           PAGE 36
Challenge: finding the right
representational bias




                                   a                       a
                     start                     p                     end


 There is no WF-net with unique visible labels that exhibits this behavior.

                                                                         PAGE 37
Another example

                 τ

        a        b          c
start       p1         p1       end
                 (a)


            a



        a        b          c         There is no WF-
start       p1         p1       end   net with unique
                 (b)                  visible labels
                                      that exhibits
                                      this behavior.


        a        b          c
start       p1         p1       end
                                               PAGE 38
                 (c)
Challenge: noise and incompleteness

• To discover a suitable process model it is
  assumed that the event log contains a
  representative sample of behavior.
• Two related phenomena:
    − Noise: the event log contains rare and
      infrequent behavior not representative for
      the typical behavior of the process.
    − Incompleteness: the event log contains
      too few events to be able to discover
      some of the underlying control-flow
      structures.
                                              PAGE 39
More on incompleteness




See also chapter 3 (cross-validation, precision, recall, etc.)
                                                                 PAGE 40
Challenge: Balancing
Between Underfitting and
Overfitting
                           PAGE 41
Challenge: four competing quality
criteria

 “able to replay event log”                 “Occam’s razor”

          fitness                             simplicity

                               process
                              discovery



generalization                                precision
 “not overfitting the log”                “not underfitting the log”



                                                                PAGE 42
Flower model


                  b   c
              a               d



      start                       end



              e
                                  h
                  f       g


                                        PAGE 43
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   0
 BCE   85
                 A        D
 BCD   0

                     C



                 B        E




                              PAGE 44
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   88
 BCE   85
                 A        D
 BCD   78

                     C



                 B        E




                              PAGE 45
What is the best model?
                 A        D



                     C



 ACD   99        B        E
 ACE   2
 BCE   85
                 A        D
 BCD   3

                     C



                 B        E




                              PAGE 46
Example: one log four models
                                                                                                               b
                                                                                                            examine
                                                                                                           thoroughly
                                                                                                                                                                            g
                                                                                                                                                                         pay
                                                                                                               c                                                     compensation
                                                                                          a                examine                                 e
                                                                           start     register              casually                           decide                                   end
                                                                                                                                                                                                     #      trace
                                                                                     request
                                                                                                                                                                            h                        455 acdeh
                                                                                                               d                                                         reject
                                                                                                          check ticket                                                  request                      191 abdeg
                                                                                                                                               f     reinitiate
                                                                                                                                                      request                                        177 adceh
                                                                               N1 : fitness = +, precision = +, generalization = +, simplicity = +
                                                                                                                                                                                                     144 abdeh
                                                                                                                                                                                                     111 acdeg
                                                                                      a              c                        d                          e                      h
                                                                                                                                                                                                      82 adceg
                                                                          start    register       examine                   check                      decide                reject     end
                                                                                   request        casually                  ticket                                          request
                                                                                                                                                                                                      56 adbeh
                                                                               N2 : fitness = -, precision = +, generalization = -, simplicity = +
                                                                                                                                                                                                      47 acdefdbeh
 “able to replay event log”                 “Occam’s razor”
                                                                                                                                                                                                      38 adbeg
                                                                                                       examine                                check
                                                                                                      thoroughly        b             d       ticket                        g                         33 acdefbdeh
          fitness                             simplicity                                                                                                            pay
                                                                                                                                                                compensation
                                                                                          a                                                                                                           14 acdefbdeg
                                                                           start     register   examine
                                                                                                             c                                                                         end            11 acdefdbeg
                                                                                     request    casually
                                                                                                                         e                f        reinitiate               h
                               process                                                                        decide                                request        reject
                                                                                                                                                                  request
                                                                                                                                                                                                         9 adcefcdeh
                              discovery                                        N3 : fitness = +, precision = -, generalization = +, simplicity = +                                                       8 adcefdbeh
                                                                                                                                                                                                         5 adcefbdeg
                                                                                       a              d                        c                           e                    g
                                                                                                                                                                                                         3 acdefbdefdbeg
generalization                                precision                             register
                                                                                    request
                                                                                                    check
                                                                                                    ticket
                                                                                                                         examine
                                                                                                                         casually
                                                                                                                                                        decide              pay
                                                                                                                                                                        compensation
                                                                                                                                                                                                         2 adcefdbeg
                                                                                       a              c                        d                          e                     g                        2 adcefbdefbdeg
 “not overfitting the log”                “not underfitting the log”                register      examine                    check                      decide              pay
                                                                                    request       casually                   ticket                                     compensation                     1 adcefdbefbdeh
                                                                                       a              d                        c                           e                    h                        1 adbefbdefdbeg
                                                                                    register        check                examine                        decide                reject
                                                                                    request         ticket               casually                                            request                     1 adcefdbefcdefdbeg
                                                                                      a               c                       d                           e                     h                   1391
                                                                       start                                                                                                                  end
                                                                                   register       examine                   check                      decide                reject
                                                                                   request        casually                  ticket                                          request


                                                                                                 …                 (all 21 variants seen in the log)


                                                                                      a              b                        d                           e                     g
                                                                                   register        examine                  check                      decide               pay
                                                                                   request        thoroughly                ticket                                      compensation

                                                                                      a              d                        b                           e                     h
                                                                                   register         check                 examine                      decide                reject
                                                                                   request          ticket               thoroughly                                         request

                                                                                      a              b                        d                           e                     h
                                                                                   register        examine                  check                      decide                reject
                                                                                   request        thoroughly                ticket                                          request                        PAGE 47
                                                                                N4 : fitness = +, precision = +, generalization = -, simplicity = -
#     trace
                                                                               455 acdeh
        Model N1                                                               191 abdeg
                                                                               177 adceh
                                                                               144 abdeh
                                                                               111 acdeg
                                                                                82 adceg
                                                                                56 adbeh
                          b                                                     47 acdefdbeh
                       examine
                      thoroughly                                                38 adbeg
                                                              g                 33 acdefbdeh
                                                             pay
                          c                              compensation           14 acdefbdeg
            a         examine               e
                                                                                11 acdefdbeg
start    register     casually          decide                          end
         request                                                                   9 adcefcdeh
                                                              h
                          d                                 reject                 8 adcefdbeh
                     check ticket                          request                 5 adcefbdeg
                                        f   reinitiate                             3 acdefbdefdbeg
                                             request
N1 : fitness = +, precision = +, generalization = +, simplicity = +                2 adcefdbeg
                                                                                   2 adcefbdefbdeg
                                                                                   1 adcefdbefbdeh
                                                                                   1 adbefbdefdbeg
                                                                                   1 adcefdbefcdefdbeg
                                                                                             PAGE 48
                                                                              1391
#     trace
                                                                              455 acdeh
        Model N2                                                              191 abdeg
                                                                              177 adceh
                                                                              144 abdeh
                                                                              111 acdeg
                                                                               82 adceg
                                                                               56 adbeh
                                                                               47 acdefdbeh
                                                                               38 adbeg
           a          c             d            e             h               33 acdefbdeh
start   register   examine        check        decide         reject   end     14 acdefbdeg
        request    casually       ticket                     request
   N2 : fitness = -, precision = +, generalization = -, simplicity = +         11 acdefdbeg
                                                                                  9 adcefcdeh
                                                                                  8 adcefdbeh
                                                                                  5 adcefbdeg
                                                                                  3 acdefbdefdbeg
                                                                                  2 adcefdbeg
                                                                                  2 adcefbdefbdeg
                                                                                  1 adcefdbefbdeh
                                                                                  1 adbefbdefdbeg
                                                                                  1 adcefdbefcdefdbeg
                                                                                            PAGE 49
                                                                             1391
#     trace
                                                                                          455 acdeh
        Model N3                                                                          191 abdeg
                                                                                          177 adceh
                                                                                          144 abdeh
                                                                                          111 acdeg
                                                                                           82 adceg
                                                                                           56 adbeh
                                                                                           47 acdefdbeh
                           examine                  check
                          thoroughly    b   d       ticket                     g           38 adbeg
                                                                        pay                33 acdefbdeh
                                                                    compensation
            a                                                                              14 acdefbdeg
start    register   examine                                                        end     11 acdefdbeg
         request    casually   c
                                        e       f      reinitiate
                                                                      reject
                                                                               h              9 adcefcdeh
                               decide                   request
                                                                     request                  8 adcefdbeh
 N3 : fitness = +, precision = -, generalization = +, simplicity = +
                                                                                              5 adcefbdeg
                                                                                              3 acdefbdefdbeg
                                                                                              2 adcefdbeg
                                                                                              2 adcefbdefbdeg
                                                                                              1 adcefdbefbdeh
                                                                                              1 adbefbdefdbeg
                                                                                              1 adcefdbefcdefdbeg
                                                                                                        PAGE 50
                                                                                         1391
#     trace
                                                                                               455 acdeh
Model N4                                                                                       191 abdeg
                                                                                               177 adceh
                                                                                               144 abdeh
              a             d                 c                e              g                111 acdeg
           register       check           examine            decide          pay
           request        ticket          casually                       compensation           82 adceg
              a             c                 d                e              g                 56 adbeh
           register     examine             check           decide           pay
           request      casually            ticket                       compensation           47 acdefdbeh
              a             d                 c                e              h                 38 adbeg
           register       check           examine           decide           reject
           request        ticket          casually                          request             33 acdefbdeh
              a            c                 d                e               h                 14 acdefbdeg
start                                                                                   end
           register     examine            check            decide           reject
           request      casually           ticket                           request             11 acdefdbeg

                       …             (all 21 variants seen in the log)
                                                                                                   9 adcefcdeh
                                                                                                   8 adcefdbeh
                                                                                                   5 adcefbdeg
             a             b                 d                e               g
           register     examine            check            decide           pay                   3 acdefbdefdbeg
           request     thoroughly          ticket                        compensation
                                                                                                   2 adcefdbeg
              a            d                 b                e               h
           register       check            examine          decide           reject                2 adcefbdefbdeg
           request        ticket          thoroughly                        request
                                                                                                   1 adcefdbefbdeh
             a             b                 d                e               h
          register       examine           check            decide          reject                 1 adbefbdefdbeg
          request       thoroughly         ticket                          request
                                                                                                   1 adcefdbefcdefdbeg
        N4 : fitness = +, precision = +, generalization = -, simplicity = -
                                                                                                             PAGE 51
                                                                                              1391
Why is process mining such a difficult
problem?

• There are no negative examples (i.e., a log shows
  what has happened but does not show what could
  not happen).
• Due to concurrency, loops, and choices the search
  space has a complex structure and the log typically
  contains only a fraction of all possible behaviors.
• There is no clear relation between the size of a model
  and its behavior (i.e., a smaller model may generate
  more or less behavior although classical analysis
  and evaluation methods typically assume some
  monotonicity property).


                                                     PAGE 52
Creating a 2-D slice of a 3-D reality




Creating a 2-D slice of a 3-D reality: the
process is viewed from a specific angle, the
process is scoped using a frame, and the
resolution determines the granularity of the
resulting model                                PAGE 53

Process Mining - Chapter 5 - Process Discovery

  • 1.
    Chapter 5 Process Discovery: AnIntroduction prof.dr.ir. Wil van der Aalst www.processmining.org
  • 2.
    Overview Chapter 1 Introduction Part I:Preliminaries Chapter 2 Chapter 3 Process Modeling and Data Mining Analysis Part II: From Event Logs to Process Models Chapter 4 Chapter 5 Chapter 6 Getting the Data Process Discovery: An Advanced Process Introduction Discovery Techniques Part III: Beyond Process Discovery Chapter 7 Chapter 8 Chapter 9 Conformance Mining Additional Operational Support Checking Perspectives Part IV: Putting Process Mining to Work Chapter 10 Chapter 11 Chapter 12 Tool Support Analyzing “Lasagna Analyzing “Spaghetti Processes” Processes” Part V: Reflection Chapter 13 Chapter 14 Cartography and Epilogue Navigation PAGE 1
  • 3.
    Process discovery supports/ “world” business controls processes software people machines system components organizations records events, e.g., messages, specifies transactions, models configures etc. analyzes implements analyzes discovery (process) event conformance model logs enhancement PAGE 2
  • 4.
    Process discovery =Play-In Play-In event log process model Play-Out process model event log Replay • extended model showing times, frequencies, etc. • diagnostics • predictions • recommendations event log process model PAGE 3
  • 5.
    Example b a p1 e p3 d start end p2 c p4 Event log contains all possible traces of model and vice versa. PAGE 4
  • 6.
    Another example p1 b p3 a f e d start p5 end p2 c p4 Generalization: event log contains only subset of all possible traces of model. PAGE 5
  • 7.
    Notation is lessrelevant (e.g. BPMN) b a p1 e p3 d start end b p2 c p4 c a d start end e PAGE 6
  • 8.
    Another BPMN example p1 b p3 a f e d start p5 end b p2 c p4 c a d start end f e PAGE 7
  • 9.
    Challenge • In general,there is a trade-off between the following four quality criteria: 1.Fitness: the discovered model should allow for the behavior seen in the event log. 2.Precision (avoid underfitting): the discovered model should not allow for behavior completely unrelated to what was seen in the event log. 3.Generalization (avoid overfitting): the discovered model should generalize the example behavior seen in the event log. 4.Simplicity: the discovered model should be as simple as possible. PAGE 8
  • 10.
    Process Discovery: example ofalgorithm α PAGE 9
  • 11.
    >,→,||,# relations • Directsuccession: x>y iff for some case x is directly followed by y. abcd • Causality: x→y iff x>y and acbd not y>x. aed • Parallel: x||y iff x>y and a>b y>x a>c a→b b#e a>e • Choice: x#y iff not x>y and a→c e#b b>c b||c c#e not y>x. a→e b>d c||b c>b b→d a#d … c>d c→d e>d e→d PAGE 10
  • 12.
    Basic Idea Usedby α Algorithm (1) a b (a) sequence pattern: a→b PAGE 11
  • 13.
    Basic Idea Usedby α Algorithm (2) a b c b a (c) XOR-join pattern: b a→c, b→c, and a#b a c c (b) XOR-split pattern: (b) XOR-split pattern:a→c, and b#c a→b, a→b, a→c, and b#c PAGE 12
  • 14.
    Basic Idea Usedby α Algorithm (3) a b c b a (e) AND-join pattern: b a→c, b→c, and a||b a c c (d) AND-split pattern: (d) AND-split pattern: a→b, a→c, and b||c a→b, a→c, and b||c PAGE 13
  • 15.
    Example Revisited a>b a→b b||c b#e a>c a→c c||b e#b a>e a→e c#e b>c a#d b→d b>d … c>b c→ d c>d e→d b e>d a p1 e p3 d start end p2 c p4 Result produced by α algorithm PAGE 14
  • 16.
    Footprint of L1 b a p1 e p3 d start end p2 c p4 PAGE 15
  • 17.
    Footprint of L2 p1 b p3 a f e d start p5 end p2 c p4 PAGE 16
  • 18.
    Simple patterns a b (a) sequence pattern: a→b b a a c c b (b) XOR-split pattern: (c) XOR-join pattern: a→b, a→c, and b#c a→c, b→c, and a#b b a a c c b (d) AND-split pattern: (e) AND-join pattern: PAGE 17 a→b, a→c, and b||c a→c, b→c, and a||b
  • 19.
    Algorithm Let L bean event log over T. α(L) is defined as follows. 1. TL = { t ∈ T | ∃σ ∈ L t ∈ σ}, 2. TI = { t ∈ T | ∃σ ∈ L t = first(σ) }, 3. TO = { t ∈ T | ∃σ ∈ L t = last(σ) }, 4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 }, 5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) }, 6. PL = { p(A,B) | (A,B) ∈ YL } ∪{iL,oL}, 7. FL = { (a,p(A,B)) | (A,B) ∈ YL ∧ a ∈ A } ∪ { (p(A,B),b) | (A,B) ∈ YL ∧ b ∈ B } ∪{ (iL,t) | t ∈ TI} ∪{ (t,oL) | t ∈ TO}, and 8. α(L) = (PL,TL,FL). PAGE 18
  • 20.
    Key idea: findplaces a1 b1 a2 b2 ... p(A,B) ... am bn A={a1,a2, … am} B={b1,b2, … bn} 4. XL = { (A,B) | A ⊆ TL ∧ A ≠ ø ∧ B ⊆ TL ∧ B ≠ ø ∧ ∀a ∈ A∀b ∈ B a →L b ∧ ∀a1,a2 ∈ A a1#L a2 ∧ ∀b1,b2 ∈ B b1#L b2 }, 5. YL = { (A,B) ∈ XL | ∀(A′,B′) ∈ XL A ⊆ A′ ∧B ⊆ B′⇒ (A,B) = (A′,B′) }, PAGE 19
  • 21.
    Places as footprints a1 b1 a2 b2 ... p(A,B) ... am bn A={a1,a2, … am} B={b1,b2, … bn} PAGE 20
  • 22.
    b a p1 e p3 d start end p2 c p4 PAGE 21
  • 23.
  • 24.
    Model for L3 f c a b p({b},{c}) p({c},{e}) e g iL p({a,f},{b}) d p({e},{f,g}) oL p({b},{d}) p({d},{e}) PAGE 23
  • 25.
    Another event logL4 a d c iL p({a,b},{c}) p({c},{d,e}) oL b e PAGE 24
  • 26.
  • 27.
  • 28.
    Discovered model d c p({c},{d}) b p({a,d},{b}) p({b},{c,f}) iL a f oL e p({a},{e}) p({e},{f}) PAGE 27
  • 29.
    Limitation of αalgorithm (implicit places) c a d p1 g e p2 b f Green places are implicit! PAGE 28
  • 30.
    Limitation of αalgorithm (loops of length 1) b a c b a c PAGE 29
  • 31.
    Limitation of αalgorithm (loops of length 2) c a b d b a d c PAGE 30
  • 32.
    Limitation of αalgorithm (non-local dependencies) a p1 d c b p2 e Green places are not discovered! PAGE 31
  • 33.
    Difficult constructs forα algorithm a c b PAGE 32
  • 34.
    Taking the transactionallife-cycle into account a c assign assign b start assigned assigned suspend running start start suspended resume complete running running complete complete PAGE 33
  • 35.
    Rediscovering process models simulate discover discovered original process event process model log model N N’ N=N’ ? The rediscovery problem: Is the discovered model N’ equivalent to the original model N? PAGE 34
  • 36.
    Equivalence: trace equivalence, bisimilarity, and branching bisimilarity s1 s5 s8 birth curse birth birth curse curse s6 curse curse ? s9 s10 s2 curse heaven hell heaven hell heaven hell hell heaven curse heaven s3 s4 s7 s11 TS1 TS2 TS3 Three trace equivalent transition systems: TS1 and TS2 are not bisimilar, but TS2 and TS3 are bisimilar PAGE 35
  • 37.
    Branching bisimilarity definedfor YAWL start s1 s6 start check check check check s2 τ τ s7 c1 c2 s3 s4 c3 ? reject accept reject accept reject accept reject accept end s8 end s5 TS1 TS2 TS1 and TS2 are not branching bisimilar (although trace equivalent). PAGE 36
  • 38.
    Challenge: finding theright representational bias a a start p end There is no WF-net with unique visible labels that exhibits this behavior. PAGE 37
  • 39.
    Another example τ a b c start p1 p1 end (a) a a b c There is no WF- start p1 p1 end net with unique (b) visible labels that exhibits this behavior. a b c start p1 p1 end PAGE 38 (c)
  • 40.
    Challenge: noise andincompleteness • To discover a suitable process model it is assumed that the event log contains a representative sample of behavior. • Two related phenomena: − Noise: the event log contains rare and infrequent behavior not representative for the typical behavior of the process. − Incompleteness: the event log contains too few events to be able to discover some of the underlying control-flow structures. PAGE 39
  • 41.
    More on incompleteness Seealso chapter 3 (cross-validation, precision, recall, etc.) PAGE 40
  • 42.
  • 43.
    Challenge: four competingquality criteria “able to replay event log” “Occam’s razor” fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” PAGE 42
  • 44.
    Flower model b c a d start end e h f g PAGE 43
  • 45.
    What is thebest model? A D C ACD 99 B E ACE 0 BCE 85 A D BCD 0 C B E PAGE 44
  • 46.
    What is thebest model? A D C ACD 99 B E ACE 88 BCE 85 A D BCD 78 C B E PAGE 45
  • 47.
    What is thebest model? A D C ACD 99 B E ACE 2 BCE 85 A D BCD 3 C B E PAGE 46
  • 48.
    Example: one logfour models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdeh fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeg generalization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request … (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 47 N4 : fitness = +, precision = +, generalization = -, simplicity = -
  • 49.
    # trace 455 acdeh Model N1 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh b 47 acdefdbeh examine thoroughly 38 adbeg g 33 acdefbdeh pay c compensation 14 acdefbdeg a examine e 11 acdefdbeg start register casually decide end request 9 adcefcdeh h d reject 8 adcefdbeh check ticket request 5 adcefbdeg f reinitiate 3 acdefbdefdbeg request N1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 48 1391
  • 50.
    # trace 455 acdeh Model N2 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg a c d e h 33 acdefbdeh start register examine check decide reject end 14 acdefbdeg request casually ticket request N2 : fitness = -, precision = +, generalization = -, simplicity = + 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 49 1391
  • 51.
    # trace 455 acdeh Model N3 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh examine check thoroughly b d ticket g 38 adbeg pay 33 acdefbdeh compensation a 14 acdefbdeg start register examine end 11 acdefdbeg request casually c e f reinitiate reject h 9 adcefcdeh decide request request 8 adcefdbeh N3 : fitness = +, precision = -, generalization = +, simplicity = + 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 50 1391
  • 52.
    # trace 455 acdeh Model N4 191 abdeg 177 adceh 144 abdeh a d c e g 111 acdeg register check examine decide pay request ticket casually compensation 82 adceg a c d e g 56 adbeh register examine check decide pay request casually ticket compensation 47 acdefdbeh a d c e h 38 adbeg register check examine decide reject request ticket casually request 33 acdefbdeh a c d e h 14 acdefbdeg start end register examine check decide reject request casually ticket request 11 acdefdbeg … (all 21 variants seen in the log) 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg a b d e g register examine check decide pay 3 acdefbdefdbeg request thoroughly ticket compensation 2 adcefdbeg a d b e h register check examine decide reject 2 adcefbdefbdeg request ticket thoroughly request 1 adcefdbefbdeh a b d e h register examine check decide reject 1 adbefbdefdbeg request thoroughly ticket request 1 adcefdbefcdefdbeg N4 : fitness = +, precision = +, generalization = -, simplicity = - PAGE 51 1391
  • 53.
    Why is processmining such a difficult problem? • There are no negative examples (i.e., a log shows what has happened but does not show what could not happen). • Due to concurrency, loops, and choices the search space has a complex structure and the log typically contains only a fraction of all possible behaviors. • There is no clear relation between the size of a model and its behavior (i.e., a smaller model may generate more or less behavior although classical analysis and evaluation methods typically assume some monotonicity property). PAGE 52
  • 54.
    Creating a 2-Dslice of a 3-D reality Creating a 2-D slice of a 3-D reality: the process is viewed from a specific angle, the process is scoped using a frame, and the resolution determines the granularity of the resulting model PAGE 53