Split Miner: Discovering Accurate and Simple Business Process Models from Event Logs
The document discusses the Split Miner, a process discovery tool that extracts accurate and simple business process models from event logs. It evaluates the quality of discovered models based on fitness, precision, generalization, complexity, and execution time, showcasing its effectiveness against other methodologies. Future work aims to enhance filtering, reduce the complexity of gateways, and develop real-time process discovery capabilities.
1.
Split Miner:
Discovering Accurate and Simple
Business Process Models from Event Logs
Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa
17th IEEE International Conference on Data Mining
2.
What is a Business Process?
Start event
End event
Parallel (AND) Gateway
Exclusive (XOR) Gateway
Split (AND) Gateway Join (XOR) Gateway
3.
Event Logs & Traces
Case ID  Activity                    Timestamp            …
01       Enter Loan Application      2007-11-09T11:20:10  -
01       Compute Installments        2007-11-09T11:25:15  -
02       Enter Loan Application      2007-11-09T11:30:40  -
01       Retrieve Applicant Data     2007-11-09T11:40:50  -
02       Retrieve Applicant Data     2007-11-09T11:50:00  -
02       Compute Installments        2007-11-09T12:00:30  -
02       Notify Eligibility          2007-11-09T12:10:45  -
03       Enter Loan Application      2007-11-09T13:23:15  -
03       Compute Installments        2007-11-09T13:30:35  -
01       Notify Rejection            2007-11-09T14:45:00  -
02       Approve Simple Application  2007-11-09T15:00:30  -
…        …                           …                    …
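As a minimal sketch of how the rows above become traces, the following Python groups events by case ID and orders them by timestamp. The function name `traces_by_case` and the tuple layout are illustrative assumptions, not part of Split Miner.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from the example log: (case ID, activity, timestamp).
events = [
    ("01", "Enter Loan Application", "2007-11-09T11:20:10"),
    ("01", "Compute Installments", "2007-11-09T11:25:15"),
    ("02", "Enter Loan Application", "2007-11-09T11:30:40"),
    ("01", "Retrieve Applicant Data", "2007-11-09T11:40:50"),
    ("02", "Retrieve Applicant Data", "2007-11-09T11:50:00"),
    ("02", "Compute Installments", "2007-11-09T12:00:30"),
    ("02", "Notify Eligibility", "2007-11-09T12:10:45"),
    ("03", "Enter Loan Application", "2007-11-09T13:23:15"),
    ("03", "Compute Installments", "2007-11-09T13:30:35"),
    ("01", "Notify Rejection", "2007-11-09T14:45:00"),
    ("02", "Approve Simple Application", "2007-11-09T15:00:30"),
]


def traces_by_case(events):
    """Return {case ID: [activities in timestamp order]} (hypothetical helper)."""
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e[2]))
    traces = defaultdict(list)
    for case_id, activity, _ in ordered:
        traces[case_id].append(activity)
    return dict(traces)


traces = traces_by_case(events)
```

Each value in `traces` is one trace, that is, the ordered sequence of activities executed for a single case.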
4.
Quality of Discovered Process Models
— Recall (a.k.a. Fitness): share of the behaviour recorded in the event log that the model can reproduce (%)
— Precision: share of the behaviour producible by the model that is observed in the event log (%)
— F-Score: (2 * fitness * precision) / (fitness + precision)
— Generalization: share of the behaviour producible by the model that may eventually be recorded in the event log (%)
— Complexity: a set of metrics estimating simplicity and understandability of the process model
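The F-score formula above can be sketched in a few lines of Python. The input values below are purely illustrative and are not taken from the evaluation.

```python
def f_score(fitness, precision):
    """Harmonic mean of fitness and precision, as defined on the slide."""
    if fitness + precision == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * fitness * precision / (fitness + precision)


# Illustrative values only: a model with high fitness but lower precision.
example = f_score(0.9, 0.6)
```

Because it is a harmonic mean, the F-score stays low unless fitness and precision are both high, which is why it is used to measure their balance.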
5.
Process Discovery Methods
HeuristicsMiner
good F-score
complex
semantic errors
tuning complexity
Inductive Miner
high fitness
no semantic errors
simple
low precision
6.
Our Objectives
Balanced Fitness and Precision
High F-Score
High Generalization
Low Complexity
Low Execution Time
Tuning Simplicity
7.
From Event Log to Process Model in 5 Steps
Event Log → Process Model:
1. Directly-Follows Graph and Loops Discovery
2. Concurrency Discovery
3. Filtering
4. Splits Discovery
5. Joins Discovery
8.
Trace #obs
a » b » c » g » e » h 10
a » b » c » f » g » h 10
a » b » d » g » e » h 10
a » b » d » e » g » h 10
a » b » e » c » g » h 10
a » b » e » d » g » h 10
a » c » b » e » g » h 10
a » c » b » f » g » h 10
a » d » b » e » g » h 10
a » d » b » f » g » h 10
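A directly-follows graph (DFG) can be derived from this log by counting how often one activity is immediately followed by another. The sketch below is a minimal illustration; the string encoding of traces and the name `directly_follows` are assumptions made for brevity.

```python
from collections import Counter

# The example log: each trace is written as a string of activity labels,
# mapped to its number of observations.
log = {
    "abcgeh": 10, "abcfgh": 10, "abdgeh": 10, "abdegh": 10, "abecgh": 10,
    "abedgh": 10, "acbegh": 10, "acbfgh": 10, "adbegh": 10, "adbfgh": 10,
}


def directly_follows(log):
    """Count, for each pair (x, y), how often y directly follows x."""
    dfg = Counter()
    for trace, observations in log.items():
        for x, y in zip(trace, trace[1:]):
            dfg[(x, y)] += observations
    return dfg


dfg = directly_follows(log)
```

For instance, `dfg[("a", "b")]` is 60 because six of the ten trace variants start with a » b, each observed 10 times.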
10.
(b || c) (b || d) (d || e) (e || g)
Evaluation Setup
— 12 publicly available real-life event logs
domains: finance, health care, law enforcement, government, information technology
recorded events: 6,660 to 561,470
recorded traces: 681 to 150,370
— 5 state-of-the-art process model discovery methods
— hyperparameter-optimization evaluation
— default-parameters evaluation
17.
Hyperparameter-optimization Results
10 times out of 12, Split Miner (green points) Pareto-dominates the other methods
18.
Default-parameters Results
— 11 times Split Miner achieved the highest F-score
balancing fitness and precision
— 9 times Split Miner achieved an F-score greater than 0.80
high F-score
— 10 times Split Miner scored the highest generalization
high generalization
— Split Miner always discovered at least the second simplest process model
low complexity
— Split Miner was always 2 to 6 times faster than the second fastest
low execution time
— Split Miner has only 2 simple and well documented tuning parameters
tuning simplicity
19.
Split Miner in Apromore
20.
Future Work
— Improve the filtering to achieve a higher F-score
— Minimize the use of OR gateways
— Turn Split Miner into a run-time process discovery method
— Apply a-posteriori algorithms to improve model simplicity
Concurrency Discovery
b || c holds when:
1. There is an edge from b to c
2. There is an edge from c to b
3. b and c do not form a short loop
4. Edges (b, c) and (c, b) have similar frequencies:
$$\frac{\bigl|\,|b \to c| - |c \to b|\,\bigr|}{|b \to c| + |c \to b|} < \varepsilon$$
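The four conditions can be checked directly on a DFG stored as a dictionary of edge frequencies. The sketch below is an illustration under stated assumptions: the function name, the `short_loops` parameter (short-loop detection is not described on the slide, so known short loops are simply passed in), and the frequencies and threshold are all hypothetical.

```python
def is_concurrent(dfg, x, y, epsilon, short_loops=frozenset()):
    """Check the four concurrency conditions for activities x and y.

    dfg maps (source, target) to a frequency; short_loops is a set of
    frozenset pairs assumed to have been detected beforehand.
    """
    forward = dfg.get((x, y), 0)
    backward = dfg.get((y, x), 0)
    if forward == 0 or backward == 0:
        return False  # conditions 1 and 2: both edges must exist
    if frozenset((x, y)) in short_loops:
        return False  # condition 3: the pair must not be a short loop
    # condition 4: relative frequency difference below epsilon
    return abs(forward - backward) / (forward + backward) < epsilon


# Illustrative frequencies and threshold only.
sample_dfg = {("b", "c"): 20, ("c", "b"): 20, ("a", "b"): 60, ("b", "a"): 5}
b_c = is_concurrent(sample_dfg, "b", "c", epsilon=0.3)
a_b = is_concurrent(sample_dfg, "a", "b", epsilon=0.3)
```

Here b and c qualify because their two edges have equal frequencies, whereas a and b do not because the edge from a to b is far more frequent than the reverse edge.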
Filtering
— Retain the most frequent behaviour (high fitness)
— Retain the minimum number of edges (high precision, low complexity)
— Guarantee that each node lies on a path from start to end (high fitness)
1. Create a set B
2. Add to B the most frequent incoming and outgoing edges of each node of the DFG
3. Compute the η-th percentile of the frequencies of the edges in B
4. Add to B the edges whose frequency is greater than the η-th percentile
5. Remove all the edges of the DFG
6. Starting from the most frequent edge e in B, add e to the DFG if:
the source of e does not have outgoing edges OR
the target of e does not have incoming edges.
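The six steps can be sketched as follows. This is an illustration under assumptions rather than the tool's implementation: the function name is hypothetical, the percentile uses a simple nearest-rank rule with η in [0, 1], and step 6 additionally keeps any edge above the percentile threshold, on the assumption that steps 3 and 4 would otherwise have no effect on the result. The sketch also does not verify that every node ends up on a path from start to end.

```python
import math


def filter_dfg(dfg, eta):
    """Filter a DFG given as {(source, target): frequency} (hypothetical helper)."""
    nodes = {node for edge in dfg for node in edge}

    # Steps 1-2: B holds each node's most frequent incoming and outgoing edge.
    best = set()
    for node in nodes:
        incoming = [e for e in dfg if e[1] == node]
        outgoing = [e for e in dfg if e[0] == node]
        if incoming:
            best.add(max(incoming, key=dfg.get))
        if outgoing:
            best.add(max(outgoing, key=dfg.get))

    # Step 3: nearest-rank percentile over the frequencies in B (assumed rule).
    freqs = sorted(dfg[e] for e in best)
    rank = max(1, math.ceil(eta * len(freqs)))
    threshold = freqs[rank - 1]

    # Step 4: add every edge that is more frequent than the threshold.
    best |= {e for e, f in dfg.items() if f > threshold}

    # Steps 5-6: rebuild the DFG from scratch, most frequent edges first.
    kept, has_outgoing, has_incoming = {}, set(), set()
    for edge in sorted(best, key=lambda e: (-dfg[e], e)):
        source, target = edge
        if (source not in has_outgoing or target not in has_incoming
                or dfg[edge] > threshold):  # last clause is an assumption
            kept[edge] = dfg[edge]
            has_outgoing.add(source)
            has_incoming.add(target)
    return kept


# Illustrative DFG only.
sample = {("a", "b"): 10, ("b", "c"): 8, ("c", "d"): 9,
          ("a", "c"): 7, ("b", "d"): 1, ("d", "e"): 3}
loose = filter_dfg(sample, eta=0.25)
strict = filter_dfg(sample, eta=1.0)
```

With the lower η the moderately frequent edge (a, c) survives, while with η = 1.0 only each node's most frequent incoming and outgoing edges remain; the rare edge (b, d) is dropped in both cases.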
26.
Split Discovery
b || c
b || d
b || c, d
c || b
d || b
Split A successors: b, c, d
27.
Split Discovery
split: A
b || c, d
c || b
d || b
Xor(c, d)
b || c, d
c, d || b
And(b, Xor(c,d))
b, c, d || —
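The merging shown above can be read as follows: each successor starts with the set of successors it is concurrent to (written after `||`); successors with the same set are grouped under an XOR, and groups that jointly cover the same activities are grouped under an AND. The sketch below reproduces that reading for the example. It is an assumption-laden illustration: the names `cover` and `future`, the helper functions, and the stopping rule are hypothetical, and cases that match neither rule are simply left unmerged.

```python
def _merge(items, key, label):
    """Merge the first group of two or more items that share the same key."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    for group in groups.values():
        if len(group) > 1:
            expr = f"{label}(" + ", ".join(e for e, _, _ in group) + ")"
            cover = frozenset().union(*(c for _, c, _ in group))
            future = frozenset.intersection(*(f for _, _, f in group))
            rest = [item for item in items if item not in group]
            return rest + [(expr, cover, future)]
    return None


def discover_split(successors, concurrent_pairs):
    """Return gateway expressions for the successors of one split node."""
    pairs = {frozenset(p) for p in concurrent_pairs}
    # Each item: (expression, cover, future). The cover is the set of
    # activities the expression contains; the future is the set of
    # successors concurrent to it.
    items = [
        (s, frozenset([s]),
         frozenset(t for t in successors
                   if t != s and frozenset((s, t)) in pairs))
        for s in successors
    ]
    while len(items) > 1:
        # XOR: identical futures. AND: identical cover-plus-future.
        merged = (_merge(items, lambda i: i[2], "Xor")
                  or _merge(items, lambda i: i[1] | i[2], "And"))
        if merged is None:
            break  # neither rule applies; left unmerged in this sketch
        items = merged
    return [expr for expr, _, _ in items]


result = discover_split(["b", "c", "d"], [("b", "c"), ("b", "d")])
```

For the successors b, c, d with b || c and b || d, the sketch first groups c and d under an XOR and then joins that group with b under an AND, matching the sequence on the slide.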