Split Miner:
Discovering Accurate and Simple
Business Process Models from Event Logs
Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa
17th IEEE International Conference on Data Mining
What is a Business Process?
Start event
End event
Parallel (AND) Gateway
Exclusive (XOR) Gateway
Split (AND) Gateway
Join (XOR) Gateway
Event Logs & Traces
Case ID  Activity                    Timestamp            …
01       Enter Loan Application      2007-11-09T11:20:10  -
01       Compute Installments        2007-11-09T11:25:15  -
02       Enter Loan Application      2007-11-09T11:30:40  -
01       Retrieve Applicant Data     2007-11-09T11:40:50  -
02       Retrieve Applicant Data     2007-11-09T11:50:00  -
02       Compute Installments        2007-11-09T12:00:30  -
02       Notify Eligibility          2007-11-09T12:10:45  -
03       Enter Loan Application      2007-11-09T13:23:15  -
03       Compute Installments        2007-11-09T13:30:35  -
01       Notify Rejection            2007-11-09T14:45:00  -
02       Approve Simple Application  2007-11-09T15:00:30  -
…        …                           …                    …
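The step from an event log to traces can be sketched as follows. This is a minimal illustration, not part of the original slides: the function name `traces_from_events` and the tuple layout are assumptions, and the rows are only the ones visible in the table above (the log is elided, so the resulting traces are prefixes at best).

```python
from collections import defaultdict
from datetime import datetime

# Rows visible in the table above: (case ID, activity, ISO timestamp).
EVENTS = [
    ("01", "Enter Loan Application", "2007-11-09T11:20:10"),
    ("01", "Compute Installments", "2007-11-09T11:25:15"),
    ("02", "Enter Loan Application", "2007-11-09T11:30:40"),
    ("01", "Retrieve Applicant Data", "2007-11-09T11:40:50"),
    ("02", "Retrieve Applicant Data", "2007-11-09T11:50:00"),
    ("02", "Compute Installments", "2007-11-09T12:00:30"),
    ("02", "Notify Eligibility", "2007-11-09T12:10:45"),
    ("03", "Enter Loan Application", "2007-11-09T13:23:15"),
    ("03", "Compute Installments", "2007-11-09T13:30:35"),
    ("01", "Notify Rejection", "2007-11-09T14:45:00"),
    ("02", "Approve Simple Application", "2007-11-09T15:00:30"),
]


def traces_from_events(events):
    """Group events by case ID and order each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    # A trace is the time-ordered sequence of activities of one case.
    return {
        case_id: [activity for _, activity in sorted(case_events)]
        for case_id, case_events in by_case.items()
    }


TRACES = traces_from_events(EVENTS)
```

Cases 01 and 02 run the same first three activities in a different order, which is the kind of interleaving that concurrency discovery later has to interpret.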
Quality of Discovered Process Models
— Recall (a.k.a. Fitness): behaviour recorded in the event log reproducible by the model (%)
— Precision: behaviour producible by the model observed in the event log (%)
— F-Score: (2 * fitness * precision) / (fitness + precision)
— Generalization: behaviour producible by the model eventually recorded in the event log (%)
— Complexity: a set of metrics estimating simplicity and understandability of the process model
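The F-score formula above is the harmonic mean of fitness and precision. The sketch below only restates that formula; the function name `f_score` and the zero-denominator guard are assumptions, and the input values are hypothetical, not results from the evaluation.

```python
def f_score(fitness, precision):
    """Harmonic mean of fitness and precision, both in [0, 1]."""
    if fitness + precision == 0:
        return 0.0  # assumed convention when both measures are zero
    return (2 * fitness * precision) / (fitness + precision)


# Hypothetical values: a balanced model beats a lopsided one.
balanced = f_score(0.8, 0.8)
lopsided = f_score(1.0, 0.5)
```

Because the harmonic mean is pulled toward the smaller value, a model with perfect fitness but low precision scores lower than a model that balances the two.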
Process Discovery Methods
Heuristics Miner
good F-score
complex
semantic errors
tuning complexity
Inductive Miner
high fitness
no semantic errors
simple
low precision
Our Objectives
Balanced Fitness and Precision
High F-Score
High Generalization
Low Complexity
Low Execution Time
Tuning Simplicity
From Event Log to Process Model in 5 Steps
Input: Event Log
1. Directly-Follows Graph and Loops Discovery
2. Concurrency Discovery
3. Filtering
4. Splits Discovery
5. Joins Discovery
Output: Process Model
Trace #obs
a » b » c » g » e » h 10
a » b » c » f » g » h 10
a » b » d » g » e » h 10
a » b » d » e » g » h 10
a » b » e » c » g » h 10
a » b » e » d » g » h 10
a » c » b » e » g » h 10
a » c » b » f » g » h 10
a » d » b » e » g » h 10
a » d » b » f » g » h 10
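A directly-follows graph (DFG) for the example log above can be sketched as follows. This is an illustration under assumptions: the name `build_dfg` and the encoding of each trace as a string of one-letter activities are not from the slides, and loop discovery is left out.

```python
from collections import Counter

# The ten trace variants from the table above, each observed 10 times.
LOG = {
    "abcgeh": 10, "abcfgh": 10, "abdgeh": 10, "abdegh": 10, "abecgh": 10,
    "abedgh": 10, "acbegh": 10, "acbfgh": 10, "adbegh": 10, "adbfgh": 10,
}


def build_dfg(log):
    """Count how often each activity is directly followed by another."""
    dfg = Counter()
    for trace, observations in log.items():
        for source, target in zip(trace, trace[1:]):
            dfg[(source, target)] += observations
    return dfg


DFG = build_dfg(LOG)
```

In this log, `DFG[("b", "c")]` and `DFG[("c", "b")]` are both 20, while `DFG[("e", "g")]` is 30 and `DFG[("g", "e")]` is 20.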
Concurrent pairs: (b || c) (b || d) (d || e) (e || g)
Break for Make-up!
Done!
Evaluation Setup
— 12 publicly available real-life event logs
domains: finance, health care, law enforcement, government, information technology
recorded events: 6,660 to 561,470
recorded traces: 681 to 150,370
— 5 state-of-the-art process model discovery methods
— hyperparameter-optimization evaluation
— default-parameters evaluation
Hyperparameter-optimization Results
On 10 of the 12 logs, Split Miner (green points) Pareto-dominates the other methods
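Pareto dominance can be checked with a few lines of code. The sketch below assumes that every quality measure being compared is higher-is-better; the function name `dominates` and the score tuples are hypothetical and are not figures from the evaluation.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and better somewhere."""
    at_least_as_good = all(a >= b for a, b in zip(p, q))
    strictly_better = any(a > b for a, b in zip(p, q))
    return at_least_as_good and strictly_better


# Hypothetical (fitness, precision) pairs, for illustration only.
model_x = (0.90, 0.85)
model_y = (0.88, 0.80)
model_z = (0.95, 0.60)
x_beats_y = dominates(model_x, model_y)
x_beats_z = dominates(model_x, model_z)
```

Here `model_x` dominates `model_y`, but neither `model_x` nor `model_z` dominates the other, since each wins on one measure.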
Default-parameters Results
— 11 times Split Miner achieved the highest F-score
balancing fitness and precision
— 9 times Split Miner achieved an F-score greater than 0.80
high F-score
— 10 times Split Miner scored the highest generalization
high generalization
— Split Miner always discovered at least the second simplest process model
low complexity
— Split Miner was always 2 to 6 times faster than the second fastest
low execution time
— Split Miner has only 2 simple and well documented tuning parameters
tuning simplicity
Split Miner in Apromore
Future Work
— Improve the Filtering to achieve higher F-score
— Minimize the use of OR gateways
— Turn Split Miner into a run-time process discovery method
— Apply a-posteriori algorithms to improve model simplicity
Thanks for Listening!
Any Questions?
Extra Slides
Concurrency Discovery
b || c holds when:
1. There is an edge from b to c
2. There is an edge from c to b
3. b and c do not form a short loop
4. Edges (b, c) and (c, b) have similar frequencies:
   $\frac{\bigl|\,|b \to c| - |c \to b|\,\bigr|}{|b \to c| + |c \to b|} < \varepsilon$
Concurrency Discovery
ε = 0.2
b || c
b || d
d || e
e || g
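The four concurrency conditions can be sketched as a function over DFG edge frequencies. This is an assumption-laden illustration: the name `concurrent_pairs`, the dictionary encoding of the DFG, and the `short_loops` argument (pairs assumed to have been detected beforehand) are hypothetical. The frequencies are the ones obtained by counting the ten-trace example log.

```python
# Edge frequencies counted from the example log (only the edges needed here).
EXAMPLE_DFG = {
    ("a", "b"): 60, ("b", "c"): 20, ("c", "b"): 20, ("b", "d"): 20,
    ("d", "b"): 20, ("d", "e"): 10, ("e", "d"): 10, ("e", "c"): 10,
    ("e", "g"): 30, ("g", "e"): 20,
}


def concurrent_pairs(dfg, epsilon, short_loops=frozenset()):
    """Return pairs (x, y), x < y, that satisfy the four conditions."""
    pairs = set()
    for (x, y), freq_xy in dfg.items():
        freq_yx = dfg.get((y, x), 0)
        # Conditions 1 and 2: both edges exist (x < y also removes duplicates).
        if x >= y or freq_xy == 0 or freq_yx == 0:
            continue
        # Condition 3: the pair must not be a short loop.
        if frozenset((x, y)) in short_loops:
            continue
        # Condition 4: the two frequencies are similar (strict comparison).
        if abs(freq_xy - freq_yx) / (freq_xy + freq_yx) < epsilon:
            pairs.add((x, y))
    return pairs


STRICT = concurrent_pairs(EXAMPLE_DFG, epsilon=0.2)
LOOSER = concurrent_pairs(EXAMPLE_DFG, epsilon=0.25)
```

One boundary case is worth noting. In the example log, e → g occurs 30 times and g → e 20 times, so the ratio is exactly 10 / 50 = 0.2. With a strict comparison and ε = 0.2, this sketch returns (b, c), (b, d) and (d, e) but not (e, g); the pair appears once ε is slightly larger, or if the comparison is taken as non-strict.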
Filtering
— Retain the most frequent behaviour (high fitness)
— Retain the minimum number of edges (high precision, low complexity)
— Guarantee each node on a path from start to end (high fitness)
1. Create a set B
2. Add to B the most frequent incoming edge and the most frequent outgoing edge of each node of the DFG
3. Compute the η-th percentile of the frequencies of the edges in B
4. Add to B the DFG edges whose frequency is greater than that percentile
5. Remove all the edges from the DFG
6. Going through B from the most frequent edge to the least frequent, add edge e back to the DFG if:
   the source of e has no outgoing edges yet, OR
   the target of e has no incoming edges yet.
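The six filtering steps can be sketched as follows. This is a literal reading of the steps listed above, not the authors' implementation: the name `filter_dfg`, the nearest-rank percentile, the treatment of η as a fraction between 0 and 1, and the tie-breaking order are all assumptions, and the small DFG is hypothetical.

```python
import math


def filter_dfg(dfg, eta):
    """Keep a reduced set of DFG edges following the six steps above."""
    if not dfg:
        return {}
    nodes = {node for edge in dfg for node in edge}
    # Steps 1-2: most frequent incoming and outgoing edge of every node.
    best = set()
    for node in nodes:
        incoming = [e for e in dfg if e[1] == node]
        outgoing = [e for e in dfg if e[0] == node]
        if incoming:
            best.add(max(incoming, key=dfg.get))
        if outgoing:
            best.add(max(outgoing, key=dfg.get))
    # Step 3: nearest-rank percentile of the frequencies in B (assumption).
    freqs = sorted(dfg[e] for e in best)
    threshold = freqs[max(1, math.ceil(eta * len(freqs))) - 1]
    # Step 4: add every edge more frequent than the percentile.
    candidates = best | {e for e, f in dfg.items() if f > threshold}
    # Steps 5-6: start from no edges, re-add by decreasing frequency.
    kept, has_outgoing, has_incoming = {}, set(), set()
    for edge in sorted(candidates, key=lambda e: (-dfg[e], e)):
        source, target = edge
        if source not in has_outgoing or target not in has_incoming:
            kept[edge] = dfg[edge]
            has_outgoing.add(source)
            has_incoming.add(target)
    return kept


# Hypothetical DFG: the rare shortcut a -> e should be filtered out.
SMALL_DFG = {
    ("s", "a"): 10, ("a", "b"): 8, ("a", "c"): 2,
    ("b", "e"): 8, ("c", "e"): 2, ("a", "e"): 1,
}
FILTERED = filter_dfg(SMALL_DFG, eta=0.5)
```

In this example every node keeps at least one incoming and one outgoing edge where it had one, while the infrequent edge a → e is dropped because it is neither a best edge of any node nor above the percentile threshold.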
Split Discovery
Split a, successors: b, c, d
Concurrency relations: b || c, b || d
(left of ||: the successors grouped so far; right of ||: the successors concurrent with them)
Start:            b || c, d    c || b    d || b
Xor(c, d):        b || c, d    c, d || b
And(b, Xor(c,d)): b, c, d || (none)
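The walk-through above can be reproduced with a small sketch. It is a simplification under assumptions, not the authors' algorithm as published: the names `discover_split` and `_merge_once` are hypothetical, the XOR rule merges two groups with identical concurrent sets, the AND rule merges two groups whose covered-plus-concurrent sets coincide, and an OR gateway is assumed as the fallback when neither rule applies.

```python
def _merge_once(items):
    """Apply the XOR rule, else the AND rule, to the first matching pair."""
    for rule in ("Xor", "And"):
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                (expr1, cover1, future1) = items[i]
                (expr2, cover2, future2) = items[j]
                if rule == "Xor" and future1 == future2:
                    merged = (f"Xor({expr1}, {expr2})", cover1 | cover2, future1)
                elif rule == "And" and (cover1 | future1) == (cover2 | future2):
                    merged = (f"And({expr1}, {expr2})", cover1 | cover2,
                              future1 & future2)
                else:
                    continue
                rest = [it for k, it in enumerate(items) if k not in (i, j)]
                return rest + [merged]
    return None


def discover_split(successors, concurrent):
    """Build a gateway expression for the successors of one split task."""
    def is_concurrent(x, y):
        return (x, y) in concurrent or (y, x) in concurrent

    # Each item: (expression, covered tasks, tasks concurrent with them).
    items = [
        (s, frozenset([s]),
         frozenset(t for t in successors if t != s and is_concurrent(s, t)))
        for s in successors
    ]
    while len(items) > 1:
        merged = _merge_once(items)
        if merged is None:
            # Assumed fallback: no rule applies, so use an OR gateway.
            return "Or(" + ", ".join(expr for expr, _, _ in items) + ")"
        items = merged
    return items[0][0]


SPLIT_A = discover_split(["b", "c", "d"], {("b", "c"), ("b", "d")})
```

For the split after a, c and d share the same concurrent set {b}, so they are merged into an XOR first; b and the merged group then cover the same tasks, which yields the AND.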