Data-driven Process Discovery - Revealing Conditional Infrequent Behavior from Event Logs

Process discovery methods automatically infer process models from event logs. Often, event logs contain so-called noise, e.g., infrequent outliers or recording errors, which obscure the main behavior of the process. Existing methods filter this noise based on the frequency of event labels: infrequent paths and activities are excluded. However, infrequent behavior may reveal important insights into the process. Thus, not all infrequent behavior should be considered as noise. This paper proposes the Data-aware Heuristic Miner (DHM), a process discovery method that uses the data attributes to distinguish infrequent paths from random noise by using classification techniques. Data- and control-flow of the process are discovered together. We show that the DHM is, to some degree, robust against random noise and reveals data-driven decisions, which are filtered by other discovery methods. The DHM has been successfully tested on several real-life event logs, two of which we present in this paper.

Published in: Science

  1. Data-driven Process Discovery - Revealing Conditional Infrequent Behavior from Event Logs
     Felix Mannhardt, Massimiliano de Leoni, Hajo A. Reijers, Wil M.P. van der Aalst
  2. Process Discovery
     Three traces recorded for three process instances
  3. The Noise Challenge
     Source: van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition. Springer (2016)
  4. “Happy Path” Discovery
  5. Noise vs. potentially interesting infrequent behavior
     Infrequent behavior
     What exactly is noise in event logs?
     • Infrequent out-of-order events
     • Recording errors
     • Exceptional behavior
     Random / No explanation
  6. State of the Art – Based on Control-flow & Frequencies
     • Inductive miner
     • Heuristics miner
     Existing noise filtering techniques are based on the control-flow perspective!
  7. Proposed Method: Data-aware Heuristic Miner
     Ⓐ priority = white
     Ⓑ nurse = Alice
     Ⓒ type = out
     Source: van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition. Springer (2016)
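  8. Overview: Data-aware Heuristic Miner (DHM)
     (1) Event log (L) → Dependency conditions (C)
     (2) Event log (L) and dependency conditions (C) → Conditional directly-follows relation $a >_{C_{a,b},L} b$
     (3) Conditional directly-follows relation → Conditional dependency measure $a \Rightarrow_{C,L} b$
     (4) Conditional dependency measure → Causal Net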
  9. Step (1) Dependency Conditions
     Event log (L) → (1) → Dependency conditions (C)
     Binary classifiers that predict occurrence of directly-follows relations:
     • YES: Activity b directly-follows activity a
     • NO: Other activities (≠b) directly-follow a
     Any classifiers can be employed! (we used C4.5)
  10. Building Dependency Conditions – Example #1
      Relation: Register, END (Ⓐ)
      Count  | Relation           | Class | Priority | Nurse | …
      1430   | (Register, END)    | 1     | white    | …     | …
      39780  | (Register, X-Ray)  | 0     | green    | …     | …
      49295  | (Register, Check)  | 0     | orange   | …     | …
      9491   | (Register, Visit)  | 0     | red      | …     | …
      Classifier: if priority = white then YES, otherwise NO
  11. Building Dependency Conditions – Example #2
      Relation: X-Ray, Visit (Ⓑ)
      Count  | Relation               | Class | Priority | Nurse | …
      20400  | (X-Ray, Visit)         | 1     | …        | Alice | …
      7923   | (X-Ray, Final Visit)   | 0     | …        | Peter | …
      (X-Ray, Check): Detected as parallel activities
      Classifier: if nurse = Alice then YES, otherwise NO
  12. Step (2) Conditional Directly-follows Relation
      Event log (L) and dependency conditions (C) → Conditional directly-follows relation $a >_{C_{a,b},L} b$
      Relation: a followed by b under condition $C_{a,b}$
      $X >_{C_{X,V},L} V = 1$
      $V >_{C_{X,V},L} X = 0$
  13. Step (3) – Conditional Dependency Measure
      Conditional directly-follows relation $a >_{C_{a,b},L} b$ → Conditional dependency measure $a \Rightarrow_{C,L} b$
      Adapting the Heuristics Miner for conditional directly-follows:
      $$
      a \Rightarrow_{C,L} b =
      \begin{cases}
      \dfrac{(a >_{C_{a,b},L} b) - (b >_{C_{a,b},L} a)}{(a >_{C_{a,b},L} b) + (b >_{C_{a,b},L} a) + 1}, & \text{for } a \neq b \\[2ex]
      \dfrac{a >_{C_{a,a},L} a}{(a >_{C_{a,a},L} a) + 1}, & \text{otherwise}
      \end{cases}
      $$
  14. Step (4) – Discover a Causal Net with Conditional Dep.
      Conditional dependency measure $a \Rightarrow_{C,L} b$ → Causal Net
      Step 4.1: Build Unconditional Dependency Graph
      • Observation Threshold ($\theta_{obs}$)
      • Dependency Threshold ($\theta_{dep}$)
  15. Step (4) – Discover a Causal Net with Conditional Dep. #2
      Conditional dependency measure $a \Rightarrow_{C,L} b$ → Causal Net
      Step 4.2: Expand with Conditional Dependencies (Ⓐ, Ⓑ, Ⓒ)
      • Dependency Threshold ($\theta_{dep}$)
      • Condition Threshold ($\theta_{con}$) [e.g., AUROC, Kappa, F-Score, ...]
  16. Step (4) – Discover a Causal Net with Conditional Dep. #3
      Conditional dependency measure $a \Rightarrow_{C,L} b$ → Causal Net
      Step 4.3: Connect Tasks
      Legend: added
  17. Step (4) – Discover a Causal Net with Conditional Dep. #4
      Conditional dependency measure $a \Rightarrow_{C,L} b$ → Causal Net
      Step 4.4: Build Causal Net as in the Heuristic Miner
      • Binding Threshold ($\theta_{bin}$)
  18. Evaluation – Can we rediscover conditions? (Synthetic)
      • Noise level 0.05 means that in 5% of the traces 1 event is out-of-order
      • Compared three methods
        • Heuristic Miner without filtering (HMA)
        • Heuristic Miner with filtering (HMF)
        • Data-aware Heuristic Miner (DHM)
      • GED-based comparison since we want to evaluate at dependency level
  19. Evaluation – Does it work in practice?
      Hospital Billing – Event Log (100,000 cases)
  20. Conclusion & Future Work
      Implemented in ProM 6.7: Data-aware Heuristic Miner
      • Data-first approach:
        • Data attributes influence control-flow discovery
        • Conditional infrequent behavior
      • Combines classification methods and heuristic process discovery
      • Validated on large real-life event logs
      • Extend the idea to more complex patterns of behavior
        • Long-term dependencies
        • Duplicate activities
      • Suggest suitable parameter settings / hyperparameter optimization
  21. Questions
      @fmannhardt - f.mannhardt@tue.nl - fmannhardt.de
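
The conditional dependency measure from Step (3) can be illustrated with a minimal Python sketch. This is not the ProM implementation; it rests on several assumptions that the slides do not state. The log encoding (`Trace`, `Event`), the function names, and the toy log are hypothetical. The dependency condition is a hand-written rule standing in for a trained classifier such as C4.5, and it is assumed to be evaluated on the attributes recorded with the first event of each directly-follows pair. As on the slide, the same condition is used for both the forward and the backward count.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical log encoding (not from the slides): a trace is a list of
# (activity, attributes) pairs.
Attributes = Dict[str, str]
Event = Tuple[str, Attributes]
Trace = List[Event]
Condition = Callable[[Attributes], bool]


def conditional_follows(log: List[Trace], a: str, b: str, cond: Condition) -> int:
    """Count how often a is directly followed by b while cond holds.

    Assumption: cond is evaluated on the attributes of the first event.
    """
    count = 0
    for trace in log:
        for (first, attrs), (second, _) in zip(trace, trace[1:]):
            if first == a and second == b and cond(attrs):
                count += 1
    return count


def conditional_dependency(log: List[Trace], a: str, b: str, cond: Condition) -> float:
    """Heuristics-Miner-style dependency measure over conditional counts."""
    forward = conditional_follows(log, a, b, cond)
    if a == b:
        return forward / (forward + 1)
    # Same condition for the reverse direction, mirroring the slide formula.
    backward = conditional_follows(log, b, a, cond)
    return (forward - backward) / (forward + backward + 1)


# Toy log, loosely modeled on Example #1 (counts are made up).
white = [("Register", {"priority": "white"}), ("END", {})]
green = [("Register", {"priority": "green"}), ("X-Ray", {}), ("END", {})]
stray = [("Register", {"priority": "green"}), ("END", {})]  # unexplained exit
log = [white] * 3 + [green] * 7 + [stray]


def is_white(attrs: Attributes) -> bool:
    """Stand-in for the classifier 'if priority = white then YES'."""
    return attrs.get("priority") == "white"


def always(attrs: Attributes) -> bool:
    """Trivial condition: reduces to the unconditional measure."""
    return True


conditional = conditional_dependency(log, "Register", "END", is_white)
unconditional = conditional_dependency(log, "Register", "END", always)
print(conditional, unconditional)
```

With the trivial condition, the function reduces to the plain Heuristics Miner dependency measure. With `is_white`, the one green-priority trace that jumps straight to END is no longer counted, so only the occurrences explained by the condition contribute.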
