Data-driven Process Discovery
Revealing Conditional Infrequent Behavior
from Event Logs
Felix Mannhardt, Massimiliano de Leoni,
Hajo A. Reijers, Wil M.P. van der Aalst
Process Discovery
PAGE 1 / 20
Three traces recorded for three process instances
The Noise Challenge
PAGE 2 / 20
Source: van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition. Springer (2016)
“Happy Path” Discovery
PAGE 3 / 20
Noise vs. potentially interesting infrequent behavior
PAGE 4 / 20
Infrequent behavior
What exactly is noise in event logs?
• Infrequent out-of-order events
• Recording errors
• Exceptional behavior
Random / No explanation
State of the Art – Based on Control-flow & Frequencies
PAGE 5 / 20
Inductive miner
Heuristics miner
Existing noise-filtering techniques are based on the control-flow perspective!
Proposed Method: Data-aware Heuristic Miner
PAGE 6 / 20
Ⓐ priority = white
Ⓑ nurse = Alice
Ⓒ type = out
[Figure: process model with the conditions Ⓐ, Ⓑ, Ⓒ marked]
Source: van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition. Springer (2016)
Overview: Data-aware Heuristic Miner (DHM)
PAGE 7 / 20
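[Figure: overview of the four steps]
(1) Event log (L) → Dependency conditions (C)
(2) Event log (L) and dependency conditions (C) → Conditional directly-follows relation $a >_{C_{a,b},L} b$
(3) Conditional directly-follows relation → Conditional dependency measure $a \Rightarrow_{C,L} b$
(4) Conditional dependency measure → Causal Net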
Step (1) Dependency Conditions
PAGE 8 / 20
Binary classifiers that predict occurrence of directly-follows relations:
YES: Activity b directly-follows activity a
NO: Other activities (≠b) directly-follow a
Any classifier can be employed (we used C4.5).
[Figure: step (1), Event log (L) → Dependency conditions (C)]
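The labelling scheme above can be sketched as follows. This is a minimal illustration, not the ProM implementation: the event representation (a dict with an `activity` key), the helper name `build_instances`, and the choice to use only the attributes of the event for `a` as features are all assumptions made for the example.

```python
from typing import Any

# Hypothetical event representation: a dict with an "activity" name
# plus arbitrary data attributes. A trace is a list of events.
Event = dict[str, Any]


def build_instances(log: list[list[Event]], a: str, b: str) -> list[tuple[Event, int]]:
    """Collect one labelled instance per occurrence of `a` that has a successor.

    Label 1 (YES): `b` directly follows `a`.
    Label 0 (NO):  some other activity directly follows `a`.
    """
    instances = []
    for trace in log:
        for current, following in zip(trace, trace[1:]):
            if current["activity"] != a:
                continue
            # Assumption: only the attributes recorded with event `a` are features.
            features = {k: v for k, v in current.items() if k != "activity"}
            label = 1 if following["activity"] == b else 0
            instances.append((features, label))
    return instances


# Toy log invented for illustration.
toy_log = [
    [{"activity": "Register", "priority": "white"}, {"activity": "END"}],
    [{"activity": "Register", "priority": "green"}, {"activity": "X-Ray"}],
    [{"activity": "Register", "priority": "red"}, {"activity": "Visit"}],
]
instances = build_instances(toy_log, "Register", "END")
```

The resulting list of (features, label) pairs can be handed to any binary classifier.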
Building Dependency Conditions – Example #1
PAGE 9 / 20
Relation: Register, END (Ⓐ)
Count   Relation            Class   Priority   Nurse   …
1430    (Register, END)     1       white      …       …
39780   (Register, X-Ray)   0       green      …       …
49295   (Register, Check)   0       orange     …       …
9491    (Register, Visit)   0       red        …       …
Classifier: if priority = white then YES, otherwise NO
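To illustrate how such a rule could be learned from the counts in Example #1, here is a small sketch. It deliberately swaps C4.5 for a much simpler one-attribute majority rule, and it assumes (for illustration only) that every instance of a relation carries the priority value shown in its table row. The names `rows`, `learn_one_rule`, and `classify` are hypothetical.

```python
from collections import defaultdict

# Rows from Example #1 as (count, class label, priority).
# Assumption: all instances of a relation share the priority shown in the table.
rows = [
    (1430, 1, "white"),
    (39780, 0, "green"),
    (49295, 0, "orange"),
    (9491, 0, "red"),
]


def learn_one_rule(rows):
    """Return the attribute values whose weighted majority class is 1 (YES)."""
    # value -> [total weight of class 0, total weight of class 1]
    weight = defaultdict(lambda: [0, 0])
    for count, label, value in rows:
        weight[value][label] += count
    return {value for value, (no, yes) in weight.items() if yes > no}


yes_values = learn_one_rule(rows)


def classify(priority: str) -> str:
    """Apply the learned rule: YES if END is predicted to follow Register."""
    return "YES" if priority in yes_values else "NO"
```

On these rows the learned rule is the one stated on the slide: YES only for priority = white.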
Building Dependency Conditions – Example #2
PAGE 10 / 20
Relation: X-Ray, Visit (Ⓑ)
Count   Relation               Class   Priority   Nurse   …
20400   (X-Ray, Visit)         1       …          Alice   …
7923    (X-Ray, Final Visit)   0       …          Peter   …
(X-Ray, Check): detected as parallel activities
Classifier: if nurse = Alice then YES, otherwise NO
Step (2) Conditional Directly-follows Relation
PAGE 11 / 20
[Figure: step (2), Event log (L) and dependency conditions (C) → Conditional directly-follows relation $a >_{C_{a,b},L} b$]
Relation: a followed by b under condition $C_{a,b}$
$X >_{C_{X,V},L} V = 1$
$V >_{C_{X,V},L} X = 0$
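One way to read the conditional directly-follows relation is as a count: how often one activity is directly followed by another in the log while the dependency condition holds. The sketch below follows that reading. The event representation, the function name `conditional_df_count`, the toy log, and the choice to evaluate the condition on the attributes of the first event are assumptions for illustration.

```python
from typing import Any, Callable

Event = dict[str, Any]  # hypothetical: {"activity": ..., plus data attributes}


def conditional_df_count(
    log: list[list[Event]],
    first: str,
    second: str,
    condition: Callable[[Event], bool],
) -> int:
    """Count how often `first` is directly followed by `second`
    in cases where `condition` holds for the `first` event."""
    count = 0
    for trace in log:
        for current, following in zip(trace, trace[1:]):
            if (
                current["activity"] == first
                and following["activity"] == second
                and condition(current)
            ):
                count += 1
    return count


# Toy log invented for illustration.
toy_log = [
    [{"activity": "X-Ray", "nurse": "Alice"}, {"activity": "Visit", "nurse": "Alice"}],
    [{"activity": "X-Ray", "nurse": "Peter"}, {"activity": "Final Visit", "nurse": "Peter"}],
    [{"activity": "X-Ray", "nurse": "Peter"}, {"activity": "Visit", "nurse": "Peter"}],
]


def nurse_is_alice(event: Event) -> bool:
    return event.get("nurse") == "Alice"


x_then_v = conditional_df_count(toy_log, "X-Ray", "Visit", nurse_is_alice)
v_then_x = conditional_df_count(toy_log, "Visit", "X-Ray", nurse_is_alice)
```

The same condition is passed for both directions, mirroring the shared subscript $C_{X,V}$ in the two expressions above.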
Step (3) – Conditional Dependency Measure
PAGE 12 / 20
[Figure: step (3), Conditional directly-follows relation $a >_{C_{a,b},L} b$ → Conditional dependency measure $a \Rightarrow_{C,L} b$]
Adapting the Heuristics Miner for conditional directly-follows
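\[
a \Rightarrow_{C,L} b =
\begin{cases}
\dfrac{(a >_{C_{a,b},L} b) - (b >_{C_{a,b},L} a)}{(a >_{C_{a,b},L} b) + (b >_{C_{a,b},L} a) + 1}, & \text{for } a \neq b \\[2ex]
\dfrac{a >_{C_{a,a},L} a}{(a >_{C_{a,a},L} a) + 1}, & \text{otherwise}
\end{cases}
\]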
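As a worked example, the measure can be evaluated directly from the two conditional directly-follows counts. The sketch below is a plain transcription of the formula; the function names are hypothetical, and the inputs are the counts from the Step (2) example (X then V = 1, V then X = 0).

```python
def conditional_dependency(ab: int, ba: int) -> float:
    """Conditional dependency for a != b.

    ab: conditional directly-follows count of a followed by b
    ba: conditional directly-follows count of b followed by a
    """
    return (ab - ba) / (ab + ba + 1)


def conditional_self_dependency(aa: int) -> float:
    """Conditional dependency for the length-one loop case (a == b)."""
    return aa / (aa + 1)


# Counts from the Step (2) example: X then V = 1, V then X = 0.
x_to_v = conditional_dependency(1, 0)   # (1 - 0) / (1 + 0 + 1) = 0.5
v_to_x = conditional_dependency(0, 1)   # (0 - 1) / (0 + 1 + 1) = -0.5
```

The `+ 1` in the denominator keeps the value strictly below 1 and damps relations that are supported by only a few observations.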
Step (4) – Discover a Causal Net with Conditional Dep.
PAGE 13 / 20
[Figure: step (4), Conditional dependency measure $a \Rightarrow_{C,L} b$ → Causal Net]
Step 4.1: Build Unconditional Dependency Graph
• Observation Threshold (θobs)
• Dependency Threshold (θdep)
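Step 4.1 can be sketched as a filter over the plain (unconditional) directly-follows counts. The sketch assumes that the observation threshold is an absolute minimum count and that a relation must pass both thresholds; the actual semantics in ProM may differ (for example, a relative threshold). All function names are hypothetical.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count how often each activity pair occurs as direct successors."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def dependency(counts, a, b):
    """Unconditional Heuristics Miner dependency measure."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if a == b:
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(traces, theta_obs, theta_dep):
    """Keep relations that are observed often enough and dependent enough.

    Assumption: theta_obs is an absolute count, theta_dep a value in [-1, 1].
    """
    counts = directly_follows_counts(traces)
    return {
        (a, b)
        for (a, b), n in counts.items()
        if n >= theta_obs and dependency(counts, a, b) >= theta_dep
    }


# Toy log: nine regular traces and one trace with b and c swapped.
traces = [["a", "b", "c"]] * 9 + [["a", "c", "b"]]
graph = dependency_graph(traces, theta_obs=2, theta_dep=0.7)
```

In this toy log the infrequent relations (a, c) and (c, b) are dropped, which is exactly the behavior that the conditional expansion in the next step can selectively recover.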
Step (4) – Discover a Causal Net with Conditional Dep. #2
PAGE 14 / 20
Step 4.2: Expand with Conditional Dependencies
• Dependency Threshold (θdep)
• Condition Threshold (θcon), e.g., AUROC, Kappa, F-score
[Figure: dependency graph expanded with the conditional dependencies Ⓐ, Ⓑ, Ⓒ]
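The expansion in Step 4.2 can be sketched as a second filter: a relation missing from the unconditional graph is added when its conditional dependency reaches θdep and the quality of its condition reaches θcon. The function name, the candidate structure, and all numeric values below are hypothetical.

```python
def expand_with_conditions(graph, candidates, theta_dep, theta_con):
    """Add conditional dependencies to an existing dependency graph.

    candidates maps (a, b) -> (conditional dependency, condition quality),
    where the quality is a classifier score such as AUROC, kappa or F-score.
    """
    expanded = set(graph)
    for edge, (cond_dep, quality) in candidates.items():
        if edge in graph:
            continue  # already present unconditionally
        if cond_dep >= theta_dep and quality >= theta_con:
            expanded.add(edge)
    return expanded


# Invented values for illustration only.
graph = {("Register", "X-Ray"), ("X-Ray", "Visit")}
candidates = {
    ("Register", "END"): (0.95, 0.90),     # strong dependency, good condition
    ("Visit", "Register"): (0.95, 0.30),   # condition too weak
    ("X-Ray", "END"): (0.40, 0.90),        # dependency too weak
}
expanded = expand_with_conditions(graph, candidates, theta_dep=0.9, theta_con=0.5)
```

Requiring both thresholds means that infrequent behavior is only added when the data attributes actually explain it.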
Step (4) – Discover a Causal Net with Conditional Dep. #3
PAGE 15 / 20
Step 4.3: Connect Tasks
[Figure: causal net with the dependencies added in this step marked "added"]
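The slide does not spell out how tasks are connected. One plausible reading, assumed here, is the "all tasks connected" heuristic known from the Heuristics Miner: every task except the start gets at least one predecessor, and every task except the end gets at least one successor, chosen by the highest dependency value. The function name and the toy values are hypothetical.

```python
def connect_tasks(tasks, graph, dep, start, end):
    """Ensure every task has a predecessor and a successor.

    Assumption: the missing edge is the one with the highest dependency
    value in `dep`, a dict mapping (a, b) -> dependency measure.
    """
    connected = set(graph)
    missing = float("-inf")
    for t in tasks:
        others = [o for o in tasks if o != t]
        if t != start and not any(b == t for _, b in connected):
            best = max(others, key=lambda a: dep.get((a, t), missing))
            connected.add((best, t))
        if t != end and not any(a == t for a, _ in connected):
            best = max(others, key=lambda b: dep.get((t, b), missing))
            connected.add((t, best))
    return connected


# Invented values for illustration: task "c" is initially disconnected.
tasks = ["a", "b", "c", "d"]
graph = {("a", "b"), ("b", "d")}
dep = {("a", "c"): 0.4, ("b", "c"): 0.6, ("c", "d"): 0.5, ("c", "b"): 0.1}
connected = connect_tasks(tasks, graph, dep, start="a", end="d")
```

Here "c" is attached through its best predecessor "b" and its best successor "d", even though neither edge passed the thresholds of the earlier steps.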
Step (4) – Discover a Causal Net with Conditional Dep. #4
PAGE 16 / 20
Step 4.4: Build Causal Net as in the Heuristic Miner
• Binding Threshold (θbin)
Evaluation – Can we rediscover conditions? (Synthetic)
PAGE 17 / 20
• Noise level 0.05 means that in 5% of the traces one event is out of order
• Compared three methods:
    • Heuristic Miner without filtering (HMA)
    • Heuristic Miner with filtering (HMF)
    • Data-aware Heuristic Miner (DHM)
• Comparison based on graph edit distance (GED), since we want to evaluate at the dependency level
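The evaluation setup can be approximated with two small helpers. Both are simplifying assumptions rather than the procedure used in the experiments: noise is injected by moving one randomly chosen event to a different position in exactly the requested share of traces, and the graph edit distance is reduced to the number of edge insertions and deletions between two dependency graphs over the same activities.

```python
import random


def inject_noise(traces, noise_level, seed=0):
    """Move one event out of order in a `noise_level` share of the traces."""
    rng = random.Random(seed)
    # Only traces with at least two events can be put out of order.
    eligible = [i for i, t in enumerate(traces) if len(t) > 1]
    k = min(round(noise_level * len(traces)), len(eligible))
    chosen = set(rng.sample(eligible, k))
    noisy = []
    for index, trace in enumerate(traces):
        trace = list(trace)
        if index in chosen:
            i = rng.randrange(len(trace))
            event = trace.pop(i)
            # Re-insert at any position except the original one.
            j = rng.choice([p for p in range(len(trace) + 1) if p != i])
            trace.insert(j, event)
        noisy.append(trace)
    return noisy


def edge_edit_distance(edges_a, edges_b):
    """Simplified GED: edges present in exactly one of the two graphs."""
    return len(set(edges_a) ^ set(edges_b))


clean = [["a", "b", "c", "d"] for _ in range(100)]
noisy = inject_noise(clean, noise_level=0.05)
changed = sum(1 for before, after in zip(clean, noisy) if before != after)

reference = {("a", "b"), ("b", "c")}
discovered = {("a", "b"), ("a", "c")}
distance = edge_edit_distance(reference, discovered)
```

Comparing edge sets rather than full models keeps the evaluation at the dependency level, which is the granularity the bullet above asks for.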
Evaluation – Does it work in practice?
PAGE 18 / 20
Hospital Billing – Event Log (100,000 cases)
Conclusion & Future Work
PAGE 19 / 20
Implemented in ProM 6.7: Data-aware Heuristic Miner
• Data-first approach:
• Data attributes influence control-flow discovery
• Conditional infrequent behavior
• Combines classification methods and heuristic process discovery
• Validated on large real-life event logs
• Extend the idea to more complex patterns of behavior
• Long-term dependencies
• Duplicate activities
• Suggest suitable parameter settings / hyperparameter optimization
Questions
@fmannhardt - f.mannhardt@tue.nl - fmannhardt.de


Editor's Notes

  • #2 PhD student in Eindhoven. Co-authors: Massimiliano, Hajo, and Wil.
  • #12 This is the wrong way around. Adapt example!!