Split Miner: Discovering Accurate and Simple Business Process Models from Event Logs

Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa
17th IEEE International Conference on Data Mining

What is a Business Process?
Start event
End event
Parallel (AND) Gateway
Exclusive (XOR) Gateway
Split (AND) Gateway Join (XOR) Gateway

Event Logs & Traces
Case ID   Activity                     Timestamp             …
01        Enter Loan Application       2007-11-09T11:20:10   -
01        Compute Installments         2007-11-09T11:25:15   -
01        Retrieve Applicant Data      2007-11-09T11:40:50   -
02        Retrieve Applicant Data      2007-11-09T11:50:00   -
02        Notify Eligibility           2007-11-09T12:10:45   -
01        Notify Rejection             2007-11-09T14:45:00   -
02        Approve Simple Application   2007-11-09T15:00:30   -
…         …                            …                     …
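
As an illustration of how traces relate to the event log, the sketch below groups the rows shown above by case ID and orders each group by timestamp. The function name `traces_from_log` and the tuple layout are assumptions made for this example, and because the table is only an excerpt, the resulting traces are partial.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from the excerpt above: (case ID, activity, timestamp).
events = [
    ("01", "Enter Loan Application", "2007-11-09T11:20:10"),
    ("01", "Compute Installments", "2007-11-09T11:25:15"),
    ("01", "Retrieve Applicant Data", "2007-11-09T11:40:50"),
    ("02", "Retrieve Applicant Data", "2007-11-09T11:50:00"),
    ("02", "Notify Eligibility", "2007-11-09T12:10:45"),
    ("01", "Notify Rejection", "2007-11-09T14:45:00"),
    ("02", "Approve Simple Application", "2007-11-09T15:00:30"),
]


def traces_from_log(rows):
    """Return {case ID: list of activities ordered by timestamp}."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in rows:
        by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    # Sorting the (timestamp, activity) pairs orders each case chronologically.
    return {case: [act for _, act in sorted(evs)] for case, evs in by_case.items()}


traces = traces_from_log(events)
for case_id, trace in traces.items():
    print(case_id, " » ".join(trace))
```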

Quality of Discovered Process Models
— Recall (a.k.a. Fitness): share of the behaviour recorded in the event log that the model can reproduce (%)
— Precision: share of the behaviour the model can produce that is observed in the event log (%)
— F-Score: (2 * fitness * precision) / (fitness + precision)
— Generalization: share of the behaviour the model can produce that is eventually recorded in the event log (%)
— Complexity: a set of metrics estimating simplicity and understandability of the process model
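
The F-Score formula above is the harmonic mean of fitness and precision. A minimal sketch follows; the function name `f_score` and the sample values are hypothetical, and returning 0.0 when both inputs are zero is an assumption made to avoid a division by zero.

```python
def f_score(fitness, precision):
    """Harmonic mean of fitness and precision, both expressed in [0, 1]."""
    if fitness + precision == 0:
        return 0.0  # assumption: define the score as 0 when both inputs are 0
    return (2 * fitness * precision) / (fitness + precision)


# Hypothetical values: a model with high fitness but low precision
# gets a score pulled towards the lower of the two.
print(round(f_score(0.9, 0.6), 2))
```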

Process Discovery Methods
Heuristics Miner
— good F-score
— complex
— semantic errors
— tuning complexity
Inductive Miner
— high fitness
— no semantic errors
— simple
— low precision

Our Objectives
Balanced Fitness and Precision
High F-Score
High Generalization
Low Complexity
Low Execution Time
Tuning Simplicity

From Event Log to Process Model in 5 Steps
Input: Event Log
1. Directly-Follows Graph and Loops Discovery
2. Concurrency Discovery
3. Filtering
4. Splits Discovery
5. Joins Discovery
Output: Process Model

Trace #obs
a » b » c » g » e » h 10
a » b » c » f » g » h 10
a » b » d » g » e » h 10
a » b » d » e » g » h 10
a » b » e » c » g » h 10
a » b » e » d » g » h 10
a » c » b » e » g » h 10
a » c » b » f » g » h 10
a » d » b » e » g » h 10
a » d » b » f » g » h 10
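
To make the first step concrete, the sketch below builds a directly-follows graph from the traces listed above, counting how often one activity is immediately followed by another. It is only an illustration: the function name `directly_follows` and the encoding of each trace as a string of single-letter activities are assumptions, and loop discovery is not covered.

```python
from collections import Counter

# Traces from the table above, written as strings of single-letter
# activities, mapped to their number of observations.
log = {
    "abcgeh": 10, "abcfgh": 10, "abdgeh": 10, "abdegh": 10, "abecgh": 10,
    "abedgh": 10, "acbegh": 10, "acbfgh": 10, "adbegh": 10, "adbfgh": 10,
}


def directly_follows(log):
    """Return a Counter mapping (source, target) to its frequency."""
    dfg = Counter()
    for trace, observations in log.items():
        # Each pair of consecutive activities contributes one edge occurrence
        # per observation of the trace.
        for source, target in zip(trace, trace[1:]):
            dfg[(source, target)] += observations
    return dfg


dfg = directly_follows(log)
for (source, target), frequency in sorted(dfg.items()):
    print(f"{source} -> {target}: {frequency}")
```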

Concurrency Discovery: (b || c) (b || d) (d || e) (e || g)

Break for Make-up!

Done!

Evaluation Setup
— 12 publicly available real-life event logs
domains: finance, health care, law enforcement, government, information technology
recorded events: 6,660 to 561,470
recorded traces: 681 to 150,370
— 5 state-of-the-art process model discovery methods
— hyperparameter-optimization evaluation
— default-parameters evaluation

Hyperparameter-optimization Results
In 10 of the 12 logs, Split Miner (green points) Pareto-dominates the other methods

Default-parameters Results
— Balanced fitness and precision: 11 times Split Miner achieved the highest F-score
— High F-score: 9 times Split Miner achieved an F-score greater than 0.80
— High generalization: 10 times Split Miner scored the highest generalization
— Low complexity: Split Miner always discovered at least the second simplest process model
— Low execution time: Split Miner was always 2 to 6 times faster than the second fastest method
— Tuning simplicity: Split Miner has only 2 simple and well-documented tuning parameters

Split Miner in Apromore

Future Work
— Improve the Filtering to achieve higher F-score
— Minimize the use of OR gateways
— Turn Split Miner into a run-time process discovery method
— Apply a-posteriori algorithms to improve model simplicity

Thanks for Listening!
Any Questions?

Extra Slides

Concurrency Discovery
b || c
1. There is an edge from b to c
2. There is an edge from c to b
3. b and c are not a short-loop
4. edges (b, c) and (c, b) have similar frequencies:
$$\frac{\bigl|\,|a \to b| - |b \to a|\,\bigr|}{|a \to b| + |b \to a|} < \varepsilon$$
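
The four conditions above can be sketched as a single check. This is a minimal illustration, not the authors' implementation: the function name `is_concurrent`, the dictionary layout of the directly-follows graph, and passing short loops as a set of unordered pairs are all assumptions, and the frequencies in the usage lines are hypothetical.

```python
def is_concurrent(a, b, dfg, epsilon, short_loops=frozenset()):
    """Check the four concurrency conditions for activities a and b.

    dfg maps (source, target) to the edge frequency; short_loops holds
    frozenset pairs of activities that form a short-loop.
    """
    ab = dfg.get((a, b), 0)
    ba = dfg.get((b, a), 0)
    # Conditions 1 and 2: both edges must exist.
    if ab == 0 or ba == 0:
        return False
    # Condition 3: the two activities must not form a short-loop.
    if frozenset((a, b)) in short_loops:
        return False
    # Condition 4: the two frequencies must be similar.
    return abs(ab - ba) / (ab + ba) < epsilon


# Hypothetical frequencies, chosen only to exercise each branch.
sample = {("b", "c"): 20, ("c", "b"): 20, ("x", "y"): 30, ("y", "x"): 10, ("p", "q"): 5}
print(is_concurrent("b", "c", sample, 0.2))  # equal frequencies
print(is_concurrent("x", "y", sample, 0.2))  # frequencies too far apart
print(is_concurrent("p", "q", sample, 0.2))  # reverse edge missing
```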

Concurrency Discovery
ε = 0.2
b || c
b || d
d || e
e || g

Filtering
— Retain the most frequent behaviour (high fitness)
— Retain the minimum number of edges (high precision, low complexity)
— Guarantee that each node lies on a path from start to end (high fitness)
1. Create a set B
2. Add to B the most frequent incoming edge and the most frequent outgoing edge of each node of the DFG
3. Evaluate the percentile-η on the frequencies of the edges in B
4. Add to B the edges of the DFG whose frequency is greater than the percentile-η
5. Remove all the edges of the DFG
6. Starting from the most frequent edge e in B, and proceeding in decreasing order of frequency, add e to the DFG if:
the source of e does not have outgoing edges OR
the target of e does not have incoming edges.
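
The six steps above can be sketched as follows. This is one possible reading of the slide rather than the authors' code: the function name `filter_dfg`, the nearest-rank percentile, the tie-breaking order, and taking the step 4 edges from the whole DFG are assumptions, and the example graph is hypothetical.

```python
import math


def filter_dfg(dfg, eta):
    """Filter a DFG given as {(source, target): frequency}; eta is in [0, 100]."""
    if not dfg:
        return {}
    # Steps 1-2: B holds the most frequent incoming and outgoing edge per node.
    best = set()
    for node in {n for edge in dfg for n in edge}:
        incoming = [e for e in dfg if e[1] == node]
        outgoing = [e for e in dfg if e[0] == node]
        if incoming:
            best.add(max(incoming, key=dfg.get))
        if outgoing:
            best.add(max(outgoing, key=dfg.get))
    # Step 3: percentile of the frequencies in B (nearest-rank, an assumption).
    freqs = sorted(dfg[e] for e in best)
    threshold = freqs[max(1, math.ceil(eta / 100 * len(freqs))) - 1]
    # Step 4: add the DFG edges whose frequency exceeds the percentile.
    candidates = best | {e for e, f in dfg.items() if f > threshold}
    # Steps 5-6: start from an empty edge set and re-add edges, most frequent
    # first, when the source has no outgoing edge yet or the target has no
    # incoming edge yet.
    filtered = {}
    for edge in sorted(candidates, key=lambda e: (-dfg[e], e)):
        source, target = edge
        no_outgoing = all(s != source for s, _ in filtered)
        no_incoming = all(t != target for _, t in filtered)
        if no_outgoing or no_incoming:
            filtered[edge] = dfg[edge]
    return filtered


# Hypothetical graph: the infrequent edges (a, c) and (b, d) are dropped.
example = {("a", "b"): 60, ("a", "c"): 20, ("b", "c"): 50, ("c", "d"): 70, ("b", "d"): 5}
print(filter_dfg(example, 50))
```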

Split Discovery
b || c
b || d
b || c, d
c || b
d || b
Split A successors: b, c, d

Split Discovery
split: A
b || c, d
c || b
d || b
Xor(c, d)
b || c, d
c, d || b
And(b, Xor(c,d))
b, c, d || —
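
One way to read the two slides above: each successor of the split starts with the set of successors it is concurrent with; successors that share the same set are grouped under an XOR, and groups that together cover each other are joined under an AND, until a single expression remains. The sketch below follows that reading. The function name `discover_split`, the string representation of gateways, and the `Or(...)` fallback when neither rule applies are assumptions, not taken from the slides.

```python
def discover_split(successors, concurrent):
    """Combine the successors of a split into a nested gateway expression.

    concurrent is a set of frozenset pairs of concurrent activities.
    Each working node is (expression, activities covered, concurrent activities).
    """
    nodes = []
    for s in successors:
        cover = frozenset(t for t in successors
                          if t != s and frozenset((s, t)) in concurrent)
        nodes.append((s, frozenset([s]), cover))

    while len(nodes) > 1:
        merged = None
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        # XOR rule: two nodes that are concurrent with exactly the same activities.
        for i, j in pairs:
            (e1, f1, c1), (e2, f2, c2) = nodes[i], nodes[j]
            if c1 == c2:
                merged = (i, j, (f"Xor({e1}, {e2})", f1 | f2, c1))
                break
        # AND rule: two nodes whose covered plus concurrent activities coincide.
        if merged is None:
            for i, j in pairs:
                (e1, f1, c1), (e2, f2, c2) = nodes[i], nodes[j]
                if f1 | c1 == f2 | c2:
                    merged = (i, j, (f"And({e1}, {e2})", f1 | f2, c1 & c2))
                    break
        if merged is None:
            # Assumed fallback when neither rule applies.
            return "Or(" + ", ".join(e for e, _, _ in nodes) + ")"
        i, j, new_node = merged
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [new_node]
    return nodes[0][0]


relations = {frozenset(("b", "c")), frozenset(("b", "d"))}
print(discover_split(["b", "c", "d"], relations))
```

With the relations b || c and b || d from the slide, c and d are first grouped under an XOR, which is then joined with b under an AND.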
