Process mining chapter_06_advanced_process_discovery_techniques

Chapter 6
Advanced Process Discovery Techniques
prof.dr.ir. Wil van der Aalst
www.processmining.org

Overview
Chapter 1: Introduction

Part I: Preliminaries
• Chapter 2: Process Modeling and Analysis
• Chapter 3: Data Mining

Part II: From Event Logs to Process Models
• Chapter 4: Getting the Data
• Chapter 5: Process Discovery: An Introduction
• Chapter 6: Advanced Process Discovery Techniques

Part III: Beyond Process Discovery
• Conformance Checking
• Mining Additional Perspectives
• Operational Support

Part IV: Putting Process Mining to Work
• Tool Support
• Analyzing “Lasagna Processes”
• Analyzing “Spaghetti Processes”

Part V: Reflection
• Chapter 13: Cartography and Navigation
• Chapter 14: Epilogue
PAGE 1

Process discovery

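[Figure: process mining overview. A software system supports/controls the “world” (business processes, people, machines, components, organizations) and records events (e.g., messages, transactions, etc.) in event logs. A (process) model models and analyzes the world, and specifies, configures, implements, and analyzes the software system. Discovery, conformance, and enhancement relate event logs and (process) models.]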
PAGE 2

Challenge

[Figure: process discovery has to balance four quality dimensions.]
• Fitness: “able to replay event log”
• Simplicity: “Occam’s razor”
• Generalization: “not overfitting the log”
• Precision: “not underfitting the log”

PAGE 3

Observing a stable process infinitely long

[Figure: three labeled elements: the frequent behavior, all behavior (including noise), and a trace in the event log.]

PAGE 4

Target model

[Figure: target model]

PAGE 5

Non-fitting model

[Figure: non-fitting model]

PAGE 6

Overfitting model

[Figure: overfitting model]

PAGE 7

Underfitting model

[Figure: underfitting model]

PAGE 8

Characteristics of process discovery algorithms
• Representational bias
− Inability to represent concurrency
− Inability to deal with (arbitrary) loops
− Inability to represent silent actions
− Inability to represent duplicate actions
− Inability to model OR-splits/joins
− Inability to represent non-free-choice behavior
− Inability to represent hierarchy
• Ability to deal with noise
• Completeness notion assumed
• Approach used (direct algorithmic approaches, two-phase approaches, computational intelligence approaches, partial approaches, etc.)

PAGE 9

Examples
• Algorithmic techniques
− Alpha miner
− Alpha+, Alpha++, Alpha#
− FSM miner
− Fuzzy miner
− Heuristic miner
− Multi phase miner
• Genetic process mining
− Single/duplicate tasks
− Distributed GM
• Region-based process mining
− State-based regions
− Language-based regions
• Classical approaches not dealing with concurrency
− Inductive inference (Mark Gold, Dana Angluin et al.)
− Sequence mining
PAGE 10

Heuristic mining

• To deal with noise and incompleteness.
• To have a better representational bias than the α
algorithm (AND/XOR/OR/skip).
• Uses C-nets.

[Figure: example C-net with activities a (register claim), b (check policy), c (check damage), d (consult expert), and e (close case).]
PAGE 11

Example log; problem α algorithm

[Figure: Petri net with transitions a, b, c, d, e and places start, p1, p2, p3, p4, p5, end.]

PAGE 12

Taking into account frequencies

PAGE 13

Dependency measure
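
The slide body did not survive extraction, so the following is a minimal sketch of the dependency measure as it is usually defined for heuristic mining: with |a > b| the number of times a is directly followed by b, the dependency is (|a > b| − |b > a|) / (|a > b| + |b > a| + 1) for a ≠ b, and |a > a| / (|a > a| + 1) for a self-loop. The event log `LOG` is an assumption: it does not appear in the slide text and was chosen so that the resulting counts agree with the arc annotations on the threshold slides (for example 11(0.92) and 5(0.83)). The function names are illustrative.

```python
from collections import Counter

# Assumed example log (trace -> frequency); not part of the slide text.
LOG = {
    ("a", "e"): 5,
    ("a", "b", "c", "e"): 10,
    ("a", "c", "b", "e"): 10,
    ("a", "b", "e"): 1,
    ("a", "c", "e"): 1,
    ("a", "d", "e"): 10,
    ("a", "d", "d", "e"): 2,
    ("a", "d", "d", "d", "e"): 1,
}


def direct_successions(log):
    """Count how often activity x is directly followed by activity y."""
    counts = Counter()
    for trace, freq in log.items():
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += freq
    return counts


def dependency(counts, x, y):
    """Dependency measure between x and y, a value in the interval (-1, 1)."""
    xy = counts[(x, y)]
    if x == y:
        # Self-loop: only the number of direct repetitions matters.
        return xy / (xy + 1)
    yx = counts[(y, x)]
    return (xy - yx) / (xy + yx + 1)


counts = direct_successions(LOG)
for x, y in [("a", "b"), ("a", "d"), ("a", "e"), ("d", "d"), ("b", "c")]:
    print(f"{x} -> {y}: {counts[(x, y)]} ({dependency(counts, x, y):.2f})")
```

In this log b and c follow each other equally often in both directions, so their dependency is 0: the measure treats them as concurrent rather than causally related.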

PAGE 14

Lower threshold (2 direct successions and
a dependency of at least 0.7)
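[Figure: dependency graph over activities a, b, c, d, e. Each arc is annotated with the number of direct successions and, in parentheses, the dependency value. Annotations shown: 5(0.83), 11(0.92), 13(0.93), 4(0.80).]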

PAGE 16

Higher threshold (5 direct successions
and a dependency of at least 0.9)

[Figure: dependency graph over activities a, b, c, d, e for the higher threshold. Annotations shown: 11(0.92), 13(0.93).]
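
The effect of the two threshold settings can be sketched as a simple filter over annotated arcs. The assignment of annotations to arcs in `ARCS` is an assumption based on reading the figures (in particular, that 5(0.83) belongs to the arc from a to e and 4(0.80) to the self-loop on d); the function name is illustrative.

```python
# arc -> (number of direct successions, dependency value)
# Assumed reading of the figure annotations.
ARCS = {
    ("a", "b"): (11, 0.92),
    ("a", "c"): (11, 0.92),
    ("a", "d"): (13, 0.93),
    ("a", "e"): (5, 0.83),
    ("b", "e"): (11, 0.92),
    ("c", "e"): (11, 0.92),
    ("d", "e"): (13, 0.93),
    ("d", "d"): (4, 0.80),
}


def dependency_graph(arcs, min_successions, min_dependency):
    """Keep only the arcs that meet both thresholds."""
    return {
        arc
        for arc, (count, dep) in arcs.items()
        if count >= min_successions and dep >= min_dependency
    }


lower = dependency_graph(ARCS, min_successions=2, min_dependency=0.7)
higher = dependency_graph(ARCS, min_successions=5, min_dependency=0.9)
print(sorted(lower - higher))  # arcs dropped by the higher threshold
```

Under this reading, raising the thresholds removes the direct arc from a to e and the self-loop on d, which matches the two annotations that no longer appear in the second figure.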

PAGE 17

Learning splits and joins

[Figure: C-net over activities a, b, c, d, e, annotated with the frequencies of activities, arcs, and input and output bindings.]

PAGE 18

Alternative visualization

[Figure: the frequency-annotated C-net over activities a, b, c, d, e, shown next to a compact rendering of the same model in which the AND-split and AND-join are marked explicitly.]

PAGE 19

Characteristics of heuristic mining

• Can deal with noise and is therefore quite robust.
• Improved representational bias.
• Split and join rules are only considered locally (therefore most of the discovered models are not sound and require repair actions).

PAGE 20

Genetic process mining

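[Figure: genetic process mining cycle. Starting from the event log, create an initial population and compute the fitness of each individual. At termination, select the best individual. Otherwise, elitism carries the best individuals over to the next generation, a tournament selects parents, crossover and mutation turn the parents into children for the next generation, and the remaining individuals are “dead”.]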

PAGE 21

Design decisions

• Representation of individuals
• Initialization
• Fitness function
• Selection strategy (tournament and elitism)
• Crossover
• Mutation

[Figure: the genetic process mining cycle, repeated from the previous slide.]

PAGE 22

Example: crossover

[Figure: two parent process models over the activities a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), and h (reject request), and below them the two child models produced by crossover.]

PAGE 23

Example: mutation

[Figure: a process model over activities a to h before and after mutation. The mutation removes a place and adds an arc.]
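
The design decisions listed earlier (representation, initialization, fitness, tournament selection with elitism, crossover, mutation) can be put together in a toy evolutionary loop. This is a minimal sketch under strong assumptions, not the representation or fitness function used by actual genetic miners: an individual is simply a set of directly-follows arcs, fitness is the fraction of traces whose consecutive activity pairs are all allowed minus a small size penalty, and the three-trace `LOG` and all parameter values are hypothetical.

```python
import random

# Hypothetical toy log; each trace occurs once.
LOG = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
ACTIVITIES = sorted({act for trace in LOG for act in trace})
ALL_ARCS = [(x, y) for x in ACTIVITIES for y in ACTIVITIES]


def fitness(individual):
    """Share of replayable traces, minus a penalty per arc (simplicity)."""
    replayed = sum(
        all(pair in individual for pair in zip(t, t[1:])) for t in LOG
    )
    return replayed / len(LOG) - 0.01 * len(individual)


def tournament(population, rng, k=3):
    """Pick k random individuals and return the fittest one."""
    return max(rng.sample(population, k), key=fitness)


def crossover(p1, p2, rng):
    """Single-point crossover over the fixed ordering of ALL_ARCS."""
    cut = rng.randrange(len(ALL_ARCS) + 1)
    return frozenset(
        arc for i, arc in enumerate(ALL_ARCS) if arc in (p1 if i < cut else p2)
    )


def mutate(individual, rng):
    """Add or remove one randomly chosen arc."""
    return individual ^ {rng.choice(ALL_ARCS)}


def evolve(generations=30, size=20, elite=2, seed=0):
    rng = random.Random(seed)
    population = [
        frozenset(arc for arc in ALL_ARCS if rng.random() < 0.3)
        for _ in range(size)
    ]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_generation = population[:elite]  # elitism
        while len(next_generation) < size:
            child = crossover(
                tournament(population, rng), tournament(population, rng), rng
            )
            if rng.random() < 0.5:
                child = mutate(child, rng)
            next_generation.append(child)
        population = next_generation  # everything else is "dead"
    return max(population, key=fitness), history


best, history = evolve()
print(sorted(best), round(fitness(best), 2))
```

Because the elite individuals are copied unchanged, the best fitness never decreases from one generation to the next; everything else about the outcome depends on the random seed.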

PAGE 24

Characteristics of genetic process mining

• Requires a lot of computing power.
• Can be distributed easily.
• Can deal with noise, infrequent behavior, duplicate tasks,
invisible tasks, etc.
• Allows for incremental improvement and combinations
with other approaches (heuristics post-optimization, etc.).
PAGE 25

Region-based mining

• Two types of region theory:
− State-based regions
− Language-based regions
• All about discovering places (like in the α algorithm)!

[Figure: a place p(A,B) with input transitions A = {a1, a2, …, am} and output transitions B = {b1, b2, …, bn}.]
PAGE 26

State-based regions

Two steps:
1. Discover a transition system (different abstractions are possible).
2. Convert the transition system into an “equivalent” Petri net.

PAGE 27

Step 1: learning a transition system

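[Figure: the trace abcdcdcde|faghhhi, where | marks the current state. The prefix abcdcdcde is the past and the suffix faghhhi is the future; a state can be based on the past, the future, or both.]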

• past, future, past+future
• sequence, multiset, set abstraction
• limited horizon to abstract further
• filtering e.g. based on transaction type, names, etc.
• labels based on activity name or other features
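
The abstraction choices above can be sketched for the "past" case as follows. The three traces in `LOG` are the ones that appear as states on the following slides; the function and parameter names are assumptions made for this illustration.

```python
from collections import Counter


def abstract(events, abstraction="sequence", horizon=None):
    """Turn the past of a state (a list of events) into a hashable state."""
    if horizon is not None:
        # Limited horizon: only the last `horizon` events matter.
        events = events[max(0, len(events) - horizon):]
    if abstraction == "sequence":
        return tuple(events)
    if abstraction == "multiset":
        return tuple(sorted(Counter(events).items()))
    if abstraction == "set":
        return frozenset(events)
    raise ValueError(f"unknown abstraction: {abstraction}")


def learn_transition_system(log, abstraction="sequence", horizon=None):
    """Return (states, transitions) with transitions as (source, activity, target)."""
    states, transitions = set(), set()
    for trace in log:
        trace = list(trace)
        for i, activity in enumerate(trace):
            source = abstract(trace[:i], abstraction, horizon)
            target = abstract(trace[: i + 1], abstraction, horizon)
            states.update((source, target))
            transitions.add((source, activity, target))
    return states, transitions


LOG = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]

full, _ = learn_transition_system(LOG, "sequence")
multi, _ = learn_transition_system(LOG, "multiset")
last, _ = learn_transition_system(LOG, "sequence", horizon=1)
print(len(full), len(multi), len(last))  # 10 8 6
```

Coarser abstractions merge states: the full sequence keeps 10 states, the multiset abstraction keeps 8, and looking only at the last event keeps 6, which corresponds to the transition systems on the next slides.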
PAGE 28

Past without abstraction (full sequence)

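[Figure: transition system with states ‹›, ‹a›, ‹a,b›, ‹a,b,c›, ‹a,b,c,d›, ‹a,c›, ‹a,c,b›, ‹a,c,b,d›, ‹a,e›, and ‹a,e,d›. Each transition is labeled with the activity that extends the prefix.]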

PAGE 29

Future without abstraction

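[Figure: transition system with states ‹a,b,c,d›, ‹b,c,d›, ‹c,d›, ‹a,c,b,d›, ‹c,b,d›, ‹b,d›, ‹a,e,d›, ‹e,d›, ‹d›, and ‹›. Each transition is labeled with the activity removed from the front of the remaining suffix.]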

PAGE 30

Past with multiset abstraction

[Figure: transition system with states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], and [a,b,c,d]. Transitions are labeled with activities a, b, c, d, e.]

PAGE 31

Only last event matters for state

[Figure: transition system with states ‹›, ‹a›, ‹b›, ‹c›, ‹d›, and ‹e›. Transitions are labeled with activities a, b, c, d, e.]

PAGE 32

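Step 2: constructing a Petri net using regions

[Figure: a region R in a transition system and the corresponding place pR. Legend: a = enter, b = enter, c = exit, d = exit, e = do not cross, f = do not cross. In the Petri net, a and b are the input transitions of pR and c and d are its output transitions; e and f are not connected to pR.]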

PAGE 33

Example

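[Figure: the transition system based on the multiset abstraction of the past, with states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], and [a,b,c,d], and the Petri net derived from it, with transitions a, b, c, d, e and places start, p1, p2, p3, p4, end.]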
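
The region condition from Step 2 can be checked mechanically: a set of states is a region if, for every activity, all transitions with that label enter the set, or all exit it, or none cross its border. The sketch below applies this to the transition system of this example. The transition list `TS` is an assumed reading of the figure (states written as strings), and the function names are illustrative.

```python
# (source state, activity, target state); assumed reading of the figure.
TS = [
    ("[]", "a", "[a]"),
    ("[a]", "b", "[a,b]"),
    ("[a]", "c", "[a,c]"),
    ("[a]", "e", "[a,e]"),
    ("[a,b]", "c", "[a,b,c]"),
    ("[a,c]", "b", "[a,b,c]"),
    ("[a,b,c]", "d", "[a,b,c,d]"),
    ("[a,e]", "d", "[a,d,e]"),
]


def crossings(region, ts):
    """Map each activity to the kinds of border crossings it exhibits."""
    kinds = {}
    for source, activity, target in ts:
        if source not in region and target in region:
            kind = "enter"
        elif source in region and target not in region:
            kind = "exit"
        else:
            kind = "do not cross"
        kinds.setdefault(activity, set()).add(kind)
    return kinds


def is_region(region, ts):
    """Every activity must behave uniformly with respect to the region."""
    return all(len(k) == 1 for k in crossings(region, ts).values())


def place_for(region, ts):
    """Input and output transitions of the place that the region induces."""
    kinds = crossings(region, ts)
    inputs = {a for a, k in kinds.items() if k == {"enter"}}
    outputs = {a for a, k in kinds.items() if k == {"exit"}}
    return inputs, outputs


print(is_region({"[a]", "[a,c]"}, TS), place_for({"[a]", "[a,c]"}, TS))
print(is_region({"[a]"}, TS))
```

The set {[a], [a,c]} is a region: a enters it, b and e exit it, and c and d do not cross it, so it yields a place with input a and outputs b and e. The set {[a]} alone is not a region, because one b transition exits it while another b transition does not cross it.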
PAGE 34

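Language-based regions

[Figure: place pR with the transitions in X (a1, a2, c1) on its input side and the transitions in Y (b1, b2, c1) on its output side. The figure also shows transitions d, e, and f.]

Region R = (X,Y,c) corresponds to place pR: X = {a1,a2,c1} is the set of transitions producing a token for pR, Y = {b1,b2,c1} is the set of transitions consuming a token from pR, and c is the initial marking of pR.

PAGE 35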

Basic idea: enough tokens should be present when consuming

A place is feasible if it can be added without disabling any of the traces in the event log.

[Figure: the place pR with input transitions X and output transitions Y, repeated from the previous slide.]
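
The feasibility condition can be sketched as a token count along each trace: starting from the initial marking c, every transition in Y must find a token in the place when it fires, and every transition in X adds one afterwards. The log and the candidate places below are hypothetical examples, and the function name is an assumption.

```python
def is_feasible(produce, consume, initial, log):
    """Check that the place (X, Y, c) never blocks a trace of the log.

    produce -- X, transitions that put a token into the place
    consume -- Y, transitions that take a token from the place
    initial -- c, number of tokens in the initial marking
    """
    for trace in log:
        tokens = initial
        for activity in trace:
            if activity in consume:
                tokens -= 1
                if tokens < 0:
                    # The place would disable this trace.
                    return False
            if activity in produce:
                tokens += 1
    return True


# Hypothetical log used only for illustration.
LOG = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]

print(is_feasible({"a"}, {"b", "e"}, 0, LOG))  # place between a and {b, e}
print(is_feasible(set(), {"a"}, 1, LOG))       # initially marked start place
print(is_feasible({"b"}, {"c"}, 0, LOG))       # forces b before c
```

The third candidate is infeasible because the trace a, c, b, d executes c before b has produced a token. A transition that is in both X and Y, such as c1 on the slide, first consumes and then produces, so it needs a token to be present but leaves the count unchanged.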

PAGE 36

Model

[Figure: Petri net with transitions a, b, c, d, e and places p1, p2, p3, p4, p5, p6.]

PAGE 39

Characteristics of region-based mining

• Can be used to discover more complex control-flow
structures.
• Classical approaches need to be adapted
(overfitting!).
• Representational bias can be parameterized (e.g.,
free-choice nets, label splitting, etc.).
• Problems dealing with noise.

PAGE 40

Other approaches, e.g. fuzzy mining

PAGE 41

Evaluating the discovered process

Fitness: Is the event log possible according to the model?

Precision: Is the model not underfitting (allow for too much)?

Generalization: Is the model not overfitting (only allow for the “accidental” examples)?

Structure: Is this the simplest model (Occam's Razor)?
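
The fitness question can be illustrated with a deliberately naive trace-level measure: the fraction of cases in the log that the model can replay from start to end. This is a sketch under assumptions, not the replay-based or alignment-based fitness used in practice; the model is given as a hypothetical deterministic transition system, and the log frequencies are made up.

```python
def replays(model, initial, finals, trace):
    """True if the model can execute the trace and end in a final state."""
    state = initial
    for activity in trace:
        state = model.get(state, {}).get(activity)
        if state is None:
            return False
    return state in finals


def trace_fitness(model, initial, finals, log):
    """Fraction of cases (weighted by frequency) that the model can replay."""
    total = sum(log.values())
    fitting = sum(
        freq for trace, freq in log.items()
        if replays(model, initial, finals, trace)
    )
    return fitting / total


# Hypothetical model: a, then b and c in any order (or e instead), then d.
MODEL = {
    "s0": {"a": "s1"},
    "s1": {"b": "s2", "c": "s3", "e": "s4"},
    "s2": {"c": "s4"},
    "s3": {"b": "s4"},
    "s4": {"d": "s5"},
}

# Hypothetical log (trace -> frequency); the last trace skips c.
LOG = {
    ("a", "b", "c", "d"): 3,
    ("a", "c", "b", "d"): 2,
    ("a", "e", "d"): 1,
    ("a", "b", "d"): 2,
}

print(trace_fitness(MODEL, "s0", {"s5"}, LOG))  # 0.75
```

Six of the eight cases fit, giving 0.75. A measure like this says nothing about precision, generalization, or structure: a model that allows every possible trace would score 1.0 here while underfitting badly.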

PAGE 42
