Process discovery methods automatically infer process models from event logs. Often, event logs contain so-called noise, e.g., infrequent outliers or recording errors, which obscure the main behavior of the process. Existing methods filter this noise based on the frequency of event labels: infrequent paths and activities are excluded. However, infrequent behavior may reveal important insights into the process. Thus, not all infrequent behavior should be considered as noise. This paper proposes the Data-aware Heuristic Miner (DHM), a process discovery method that uses the data attributes to distinguish infrequent paths from random noise by using classification techniques. Data- and control-flow of the process are discovered together. We show that the DHM is, to some degree, robust against random noise and reveals data-driven decisions, which are filtered by other discovery methods. The DHM has been successfully tested on several real-life event logs, two of which we present in this paper.
Data-driven Process Discovery - Revealing Conditional Infrequent Behavior from Event Logs
1. Data-driven Process Discovery - Revealing Conditional Infrequent Behavior from Event Logs
Felix Mannhardt, Massimiliano de Leoni,
Hajo A. Reijers, Wil M.P. van der Aalst
5. Noise vs. potentially interesting infrequent behavior
Infrequent behavior
What exactly is noise in event logs?
• Infrequent out-of-order events
• Recording errors
• Exceptional behavior
Random / No explanation
6. State of the Art – Based on Control-flow & Frequencies
Inductive miner
Heuristics miner
Existing noise-filtering techniques are based on the control-flow perspective!
7. Proposed Method: Data-aware Heuristic Miner
Ⓐ priority = white
Ⓑ nurse = Alice
Ⓒ type = out
Source: van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition. Springer (2016)
8. Overview: Data-aware Heuristic Miner (DHM)
[Figure: overview of the DHM pipeline]
Input: Event log (L)
(1) Dependency conditions (C)
(2) Conditional directly-follows relation: $a >_{C_{a,b},L} b$
(3) Conditional dependency measure: $a \Rightarrow_{C,L} b$
(4) Causal Net
9. Step (1) Dependency Conditions
Binary classifiers that predict occurrence of directly-follows relations:
YES: Activity b directly-follows activity a
NO: Other activities (≠b) directly-follow a
Any classifier can be employed! (we used C4.5)
[Figure: step (1) derives the dependency conditions (C) from the event log (L)]
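The labelling scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions: the toy log, the `END` marker, and the helper name `training_instances` are hypothetical, and using the attributes recorded with the event of activity a is one possible choice, not necessarily the one made in the DHM.

```python
from collections import Counter

# Hypothetical toy log: each trace is a list of (activity, attributes) events.
log = [
    [("Register", {"priority": "white"})],
    [("Register", {"priority": "green"}), ("X-Ray", {"priority": "green"})],
    [("Register", {"priority": "red"}), ("Visit", {"priority": "red"})],
]

END = "END"  # artificial end activity, assumed for this sketch


def training_instances(log, a, b):
    """Collect one (attributes, label) pair per occurrence of activity a.

    The label is 1 (YES) if b directly follows a and 0 (NO) if any other
    activity directly follows a.
    """
    instances = []
    for trace in log:
        # Append END so that the last event also has a successor.
        activities = [act for act, _ in trace] + [END]
        for i, (act, attrs) in enumerate(trace):
            if act == a:
                label = 1 if activities[i + 1] == b else 0
                instances.append((attrs, label))
    return instances


instances = training_instances(log, "Register", END)
label_counts = Counter(label for _, label in instances)
```

The resulting instances could then be passed to any binary classifier, for example a decision tree learner.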
10. Building Dependency Conditions – Example #1
Relation: Register, END (Ⓐ)
Count   Relation             Class   Priority   Nurse   …
1430    (Register, END)      1       white      …       …
39780   (Register, X-Ray)    0       green      …       …
49295   (Register, Check)    0       orange     …       …
9491    (Register, Visit)    0       red        …       …
Classifier: if priority = white then YES, otherwise NO
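To make the example concrete, the following sketch learns a single-attribute rule from the counts in the table. It is a simple majority-vote stand-in for illustration, not C4.5, and it assumes that every instance summarized by a table row carries the priority value shown in that row; the function name `learn_single_attribute_rule` is hypothetical.

```python
# Weighted rows taken from the example table: (count, priority, class).
rows = [
    (1430, "white", 1),
    (39780, "green", 0),
    (49295, "orange", 0),
    (9491, "red", 0),
]


def learn_single_attribute_rule(rows):
    """Map each attribute value to its majority class, weighted by count."""
    weights = {}
    for count, value, label in rows:
        per_value = weights.setdefault(value, {0: 0, 1: 0})
        per_value[label] += count
    return {value: max(w, key=w.get) for value, w in weights.items()}


rule = learn_single_attribute_rule(rows)
# Attribute values for which the rule predicts YES (class 1).
yes_values = sorted(value for value, label in rule.items() if label == 1)
```

On this data the only value that predicts YES is `white`, which corresponds to the classifier stated on the slide.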
11. Building Dependency Conditions – Example #2
Relation: X-Ray, Visit (Ⓑ)
Count   Relation               Class   Priority   Nurse   …
20400   (X-Ray, Visit)         1       …          Alice   …
7923    (X-Ray, Final Visit)   0       …          Peter   …
(X-Ray, Check): detected as parallel activities
Classifier: if nurse = Alice then YES, otherwise NO
12. Step (2) Conditional Directly-follows Relation
[Figure: step (2) combines the event log (L) and the dependency conditions (C) into the conditional directly-follows relation $a >_{C_{a,b},L} b$]
Relation: a followed by b under condition $C_{a,b}$
$X >_{C_{X,V},L} V = 1$
$V >_{C_{X,V},L} X = 0$
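A minimal sketch of how such a conditional count could be computed, assuming that the condition is evaluated on the attributes of the event of activity a. The one-trace log, the helper name `conditional_directly_follows`, and the lambda condition are hypothetical illustrations.

```python
def conditional_directly_follows(log, a, b, condition):
    """Count how often b directly follows a while the condition holds.

    `condition` is a predicate over the attributes of the event of a,
    standing in for the learned dependency condition.
    """
    count = 0
    for trace in log:
        for (act, attrs), (nxt, _) in zip(trace, trace[1:]):
            if act == a and nxt == b and condition(attrs):
                count += 1
    return count


# Hypothetical log with a single trace: X-Ray (nurse Alice) followed by Visit.
log = [[("X-Ray", {"nurse": "Alice"}), ("Visit", {"nurse": "Alice"})]]


def is_alice(attrs):
    return attrs.get("nurse") == "Alice"


x_then_v = conditional_directly_follows(log, "X-Ray", "Visit", is_alice)
v_then_x = conditional_directly_follows(log, "Visit", "X-Ray", is_alice)
```

For this single trace the first count is 1 and the second is 0.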
14. Step (4) – Discover a Causal Net with Conditional Dep.
[Figure: step (4) uses the conditional dependency measure $a \Rightarrow_{C,L} b$ to build a Causal Net]
Step 4.1: Build Unconditional Dependency Graph
• Observation Threshold (θobs)
• Dependency Threshold (θdep)
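As an illustration of step 4.1, the sketch below builds an unconditional dependency graph using the standard Heuristics Miner dependency measure, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). Treating θobs as an absolute minimum number of observations is an assumption of this sketch, and the function names and the toy traces are hypothetical.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count every directly-follows pair (a, b) in the traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def dependency(counts, a, b):
    """Standard Heuristics Miner dependency measure for a != b.

    Self-loops (a == b) are not treated specially in this sketch.
    """
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(traces, theta_obs, theta_dep):
    """Keep edges that are observed often enough and are strong enough."""
    counts = directly_follows_counts(traces)
    return {
        (a, b)
        for (a, b), n in counts.items()
        if n >= theta_obs and dependency(counts, a, b) >= theta_dep
    }


# Hypothetical traces: 20 regular cases and one infrequent variant.
traces = [["Register", "Check", "Visit"]] * 20 + [["Register", "Visit", "Check"]]
graph = dependency_graph(traces, theta_obs=2, theta_dep=0.8)
```

The infrequent edge (Register, Visit) is dropped here, which is exactly the kind of behavior that step 4.2 may bring back if a condition explains it.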
15. Step (4) – Discover a Causal Net with Conditional Dep. #2
Step 4.2: Expand with Conditional Dependencies
• Dependency Threshold (θdep)
• Condition Threshold (θcon) [e.g., AUROC, Kappa, F-Score, ..]
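The two thresholds of step 4.2 can be illustrated as follows. The sketch computes Cohen's kappa from a binary confusion matrix as one possible quality score for a dependency condition and accepts a conditional edge only if both thresholds are met. Combining the thresholds with a logical AND is an assumption, and the function names and example numbers are hypothetical.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1:
        return 0.0  # degenerate case: only one class present
    return (observed - expected) / (1 - expected)


def keep_conditional_edge(cond_dependency, quality, theta_dep, theta_con):
    """Accept the edge if the dependency and the condition are both strong."""
    return cond_dependency >= theta_dep and quality >= theta_con


# Hypothetical classifier result for one relation.
kappa = cohens_kappa(tp=40, fp=10, fn=5, tn=45)
accepted = keep_conditional_edge(0.9, kappa, theta_dep=0.8, theta_con=0.5)
rejected = keep_conditional_edge(0.9, kappa, theta_dep=0.8, theta_con=0.9)
```

Any of the other quality measures named on the slide could replace kappa in `keep_conditional_edge` without changing the rest of the sketch.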
[Figure: dependency graph with the conditional dependencies Ⓐ, Ⓑ and Ⓒ]
16. Step (4) – Discover a Causal Net with Conditional Dep. #3
Step 4.3: Connect Tasks
17. Step (4) – Discover a Causal Net with Conditional Dep. #4
Step 4.4: Build Causal Net as in the Heuristic Miner
• Binding Threshold (θbin)
18. Evaluation – Can we rediscover conditions? (Synthetic)
• A noise level of 0.05 means that in 5% of the traces one event is out of order
• Compared three methods:
  • Heuristic Miner without filtering (HMA)
  • Heuristic Miner with filtering (HMF)
  • Data-aware Heuristic Miner (DHM)
• GED-based comparison, since we want to evaluate at the dependency level
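The evaluation setup can be sketched roughly as follows. Both functions are hypothetical illustrations: `add_noise` moves one randomly chosen event to a different position in a fixed share of the traces, and `edge_edit_distance` is a strongly simplified stand-in for a graph edit distance (assuming that is what GED stands for) that only counts edge insertions and deletions between two dependency graphs over the same activity labels.

```python
import random


def add_noise(traces, noise_level, rng):
    """Move one event out of order in a fixed share of the traces."""
    n_noisy = round(noise_level * len(traces))
    chosen = set(rng.sample(range(len(traces)), n_noisy))
    noisy = []
    for index, trace in enumerate(traces):
        trace = list(trace)
        if index in chosen and len(trace) > 1:
            i = rng.randrange(len(trace))
            event = trace.pop(i)
            # Reinsert at any position except the original one.
            j = rng.choice([k for k in range(len(trace) + 1) if k != i])
            trace.insert(j, event)
        noisy.append(trace)
    return noisy


def edge_edit_distance(edges_a, edges_b):
    """Number of edges that must be added or removed to align two graphs."""
    return len(set(edges_a) ^ set(edges_b))


traces = [["a", "b", "c", "d"] for _ in range(100)]
noisy = add_noise(traces, 0.05, random.Random(0))
changed = sum(1 for old, new in zip(traces, noisy) if old != new)

reference = {("a", "b"), ("b", "c")}
discovered = {("a", "b"), ("a", "c")}
distance = edge_edit_distance(reference, discovered)
```

With 100 traces of distinct activities and a noise level of 0.05, exactly five traces end up with one displaced event.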
19. Evaluation – Does it work in practice?
Hospital Billing – Event Log (100,000 cases)
20. Conclusion & Future Work
Implemented in ProM 6.7: Data-aware Heuristic Miner
• Data-first approach:
• Data attributes influence control-flow discovery
• Conditional infrequent behavior
• Combines classification methods and heuristic process discovery
• Validated on large real-life event logs
• Extend the idea to more complex patterns of behavior
• Long-term dependencies
• Duplicate activities
• Suggest suitable parameter settings / hyperparameter optimization