Decision Mining Revisited - Discovering Overlapping Rules

Description

Decision mining enriches process models with rules underlying decisions in processes using historical process execution data. Choices between multiple activities are specified through rules defined over process data. Existing decision mining methods focus on discovering mutually-exclusive rules, which only allow one out of multiple activities to be performed. These methods assume that decision making is fully deterministic, and all factors influencing decisions are recorded. In case the underlying decision rules are overlapping due to non-determinism or incomplete information, the rules returned by existing methods do not fit the recorded data well. This paper proposes a new technique to discover overlapping decision rules, which fit the recorded data better at the expense of precision, using decision tree learning techniques. An evaluation of the method on two real-life data sets confirms this trade-off. Moreover, it shows that the method returns rules with better fitness and precision under certain conditions.

Original paper: http://dx.doi.org/10.1007/978-3-319-39696-5_23
Presented at CAiSE'16

Transcript

  1. Decision Mining Revisited - Discovering Overlapping Rules
     Felix Mannhardt, Massimiliano de Leoni, Hajo A. Reijers, Wil M.P. van der Aalst
  2. Scope: Mining decision rules from event logs
     Diagram labels: Apply, Amount, Grant, Extensive Check, Reject, Eligibility, Simple Check, Request Information, Income, Receive Information, Category, Activity, Data
  3. Control-flow – Petri net defines order & possible choices
     Diagram labels: Apply, Grant, Extensive Check, Reject, Simple Check, Request Information, Receive Information
     Annotations: Exclusive Choice, Sequence, Exclusive Choice
  4. Data-perspective – Data Petri Net modelling decisions
     Annotations: Decision point, Data recording, Decision rule
  5. Comparing the Petri net notation to DMN
     • DMN 1.1 released in 2016
     • Widely adopted by tool vendors
     Decision Table (hit policy U):
       #   Eligibility   Outcome
       1   Yes           Grant
       2   No            Reject
     Decision Rule / Guard:
       Grant    [Eligibility = Yes]
       Reject   [Eligibility = No]
  6. Why are overlapping rules needed?
     Incomplete Information
     • Not recorded
     • Process context
     • Confidential
     • ...
     • Expert approval
     • Deferred choice
     • Randomized check
     • Inconsistent human behavior
     • ...
  7. Goal: Discover rules which may overlap
     Input: Process Model, Event Log
     Step: Overlapping Rule Discovery
     Output: Process Model with Overlapping Decision Rules
  8. Decision point - Mutually-exclusive rule
     Guards:
       Grant    [Eligibility = Yes]
       Reject   [Eligibility = No]
     Observation instances from an event log:
       Count   Eligibility   Outcome
       5x      “No”          Reject
       20x     “Yes”         Grant
  9. Decision point – Overlapping rule
     Alternative Decision Table Notation (hit policy C):
       #   Rating    Amount   Activity
       1   Good      -        Simple Check
       2   Bad       -        Extensive Check
       3   Bad       Low      Simple Check
       4   Bad       High     Request Information
       5   Unknown   -        Request Information
  10. Proposed Discovery Method
     Input: Process Model, Event Log
     Overlapping Rule Discovery, foreach Decision Point: Collect Instances, 1st Classification, Collect Misclassified, 2nd Classification, Build Rules
     Output: Process Model With Overlapping Rules
  11. 1) Collect Instances
     Event Log, collect, Observation instances:
       Count   Rating    Amount   Outcome
       6x      Good      Low      Simple
       6x      Good      High     Simple
       6x      Bad       High     Extensive
       4x      Bad       High     Request
       6x      Bad       Low      Extensive
       4x      Bad       Low      Simple
       6x      Unknown   High     Request
     • Cyclic Behavior
     • Noise (Missing / Additional Events)
     • Unassigned values
     • Inconsistent recording
     Alignment-based method
  12. 2) 1st Classification & 3) Misclassified Instances
     Instances: the observation instances from slide 11
     Decision Tree (split on Rating):
       Good      Simple      12 OK
       Bad       Extensive   12 OK, 8 NOK
       Unknown   Request     6 OK
  13. 4) 2nd Classification
     Instances:
       Count   Rating   Amount   Outcome
       4x      Bad      High     Request
       4x      Bad      Low      Simple
     2nd Decision Tree (split on Amount):
       High   Request
       Low    Simple
  14. 5) Build Overlapping Decision Rules
     The two trees (Rating, then Amount) are compiled to overlapping rules:
       If Rating = Good then Simple
       If Rating = Unknown then Request
       If Rating = Bad then Extensive
       If Rating = Bad AND Amount = High then Request
       If Rating = Bad AND Amount = Low then Simple
  15. Resulting Data-aware Process Model
  16. Trade-off: Precise and fitting model
     Instances: the observation instances from slide 11
     Labels: Unfitting, Imprecise [Underfitting], Good Trade-off
  17. Evaluation – Measures
     Precision: How much unobserved behavior is modelled?
     Fitness: How much observed behavior is modelled?
     Image source (CC BY-SA): https://en.wikipedia.org/wiki/Precision_and_recall#/media/File:Precisionrecall.svg
  18. Evaluation – Setup
     Compared Methods:
       Method   Description                      Expected Precision   Expected Fitness
       WO       Without rules                    Poor                 Good
       DTF      Mutually-exclusive approach      Good                 Poor
       DTT      Naïve overlapping approach       Poor                 Good
       DTO      Presented overlapping approach   Balanced             Balanced
     Datasets:
       Dataset      # Traces   # Events   # Attributes   # Decisions
       Road Fines   150,000    500,000    9              5
       Hospital     1,000      15,000     39             11
  19. Evaluation – Example rules in the hospital data
       Method   Intensive Care     Normal Care   Skip                            Assessment
       DTO      L > 0 ∧ H = true   L > 0         L ≤ 0 ∨ (L > 0 ∧ H = false)     Good trade-off
       DTT      true               L > 0         L ≤ 0                           Imprecise
       DTF      false              L > 0         L ≤ 0                           Unfitting
  20. Evaluation – Precision & Fitness
     • Fitness: how often rules are violated
     • DTO improves fitness over DTF (mutually-exclusive)
     • Precision: how strict are the rules
     • DTO improves precision against WO
     • DTO does sacrifice precision vs. DTF
  21. Conclusion & Future Work
     • Method: Discovery of overlapping rules using event logs
     • Based on decision tree induction
     • ProM framework: MultiPerspectiveExplorer http://www.promtools.org
     • Results: Trade-off fitness & precision
     • Improves the model fitness over standard trees
     • Improves the model precision over naïve approach
     • Future work
     • Better experimental validation
     • Manage the complexity of discovered rules
     • Imbalanced distributions
  22. Questions?
     @fmannhardt - f.mannhardt@tue.nl - http://promtools.org
     Multi-Perspective Explorer
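
The discovery steps on slides 10 to 14 can be replayed on the observation instances from slide 11. The Python sketch below is only an illustration under stated assumptions: single-split stumps (`learn_stump`, a hypothetical helper) stand in for a full decision tree learner, and all function names are invented here rather than taken from the paper or from ProM.

```python
from collections import Counter, defaultdict

# Observation instances from slide 11: (Rating, Amount, Outcome, count).
OBSERVATIONS = [
    ("Good", "Low", "Simple", 6),
    ("Good", "High", "Simple", 6),
    ("Bad", "High", "Extensive", 6),
    ("Bad", "High", "Request", 4),
    ("Bad", "Low", "Extensive", 6),
    ("Bad", "Low", "Simple", 4),
    ("Unknown", "High", "Request", 6),
]


def expand(rows):
    """Turn counted rows into one (features, outcome) pair per instance."""
    return [
        ({"Rating": rating, "Amount": amount}, outcome)
        for rating, amount, outcome, count in rows
        for _ in range(count)
    ]


def learn_stump(instances, attributes):
    """Assumed stand-in for a tree learner: a single split on the attribute
    whose per-value majority outcome classifies the most instances correctly."""
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for features, outcome in instances:
            by_value[features[attr]][outcome] += 1
        mapping = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(c[mapping[v]] for v, c in by_value.items())
        if best is None or correct > best[0]:
            best = (correct, attr, mapping)
    return best[1], best[2]


def discover_overlapping_rules(instances, attributes):
    """Classify, collect the misclassified instances of each leaf,
    classify those again, and compile both passes into rules."""
    attr, mapping = learn_stump(instances, attributes)  # 1st classification
    remaining = [a for a in attributes if a != attr]
    rules = []
    for value, predicted in mapping.items():
        condition = {attr: value}
        rules.append((condition, predicted))
        # Instances that reach this leaf but have a different outcome.
        wrong = [
            (f, o) for f, o in instances
            if f[attr] == value and o != predicted
        ]
        if wrong and remaining:
            attr2, mapping2 = learn_stump(wrong, remaining)  # 2nd classification
            for value2, predicted2 in mapping2.items():
                rules.append(({**condition, attr2: value2}, predicted2))
    return rules


if __name__ == "__main__":
    for condition, outcome in discover_overlapping_rules(
        expand(OBSERVATIONS), ["Rating", "Amount"]
    ):
        guard = " AND ".join(f"{k} = {v}" for k, v in condition.items())
        print(f"If {guard} then {outcome}")
```

On this data the sketch yields the same five rules as slide 14: the first pass splits on Rating, and only the Bad leaf has misclassified instances, which the second pass separates by Amount.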

Editor's Notes

  • I would like to present our work about “Decision Mining – Discovering Overlapping Rules”.
    My name is Felix Mannhardt, I’m a PhD student from the Eindhoven University of Technology.
    This is joint work with Massimiliano, Hajo and Wil.
  • First to scope our work, I would like to introduce some of the assumptions/notations underlying our work.
    We want to analyze decisions that took place in processes.
    We assume that processes can be represented by process models.
    Notation: activities are drawn as boxes, data as rounded boxes.
  • The control-flow of a process can be described with a process model.
    A process model, such as a Petri net, defines the ordering of and dependencies between activities.
    We chose Petri nets as the notation to stay independent of the actual process modelling language (such as BPMN or similar).
    For example: …
  • Next to the order and dependencies between activities: decisions are at the heart of processes
    For example,
    data is recorded during the execution of activities;
    exclusive choices in the process are decision points;
    decision rules govern which activities can be executed
  • Decision point, exclusive choice between two activities
    Mutually-exclusive rule defined
  • DMN decision table using the Collect hit policy
  • Public “Road Fines” dataset, IEEE taskforce
    Private hospital dataset
  • Simplified Model of the care-path at the hospital
    DTO gets better scores for fitness and precision than DTT.
    Lactate levels are related to admission.
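
The note on the Collect hit policy means that every matching row of the decision table contributes its outcome, so several activities can be allowed for the same case. A minimal Python sketch, assuming the table from slide 9 and reading "-" as "any value"; `allowed_activities` and `share_allowed` are hypothetical helper names, and the share they compute is a simplified stand-in for fitness, not the measure used in the evaluation.

```python
# Decision table from slide 9 (hit policy C); a missing key means "-" (any value).
OVERLAPPING_RULES = [
    ({"Rating": "Good"}, "Simple Check"),
    ({"Rating": "Bad"}, "Extensive Check"),
    ({"Rating": "Bad", "Amount": "Low"}, "Simple Check"),
    ({"Rating": "Bad", "Amount": "High"}, "Request Information"),
    ({"Rating": "Unknown"}, "Request Information"),
]

# Rules that test Rating only, as produced by the first classification (slide 12).
EXCLUSIVE_RULES = [rule for rule in OVERLAPPING_RULES if len(rule[0]) == 1]

# Observation instances from slide 11, with activity names written out in full.
OBSERVATIONS = [
    ({"Rating": "Good", "Amount": "Low"}, "Simple Check", 6),
    ({"Rating": "Good", "Amount": "High"}, "Simple Check", 6),
    ({"Rating": "Bad", "Amount": "High"}, "Extensive Check", 6),
    ({"Rating": "Bad", "Amount": "High"}, "Request Information", 4),
    ({"Rating": "Bad", "Amount": "Low"}, "Extensive Check", 6),
    ({"Rating": "Bad", "Amount": "Low"}, "Simple Check", 4),
    ({"Rating": "Unknown", "Amount": "High"}, "Request Information", 6),
]


def allowed_activities(case, rules):
    """Collect semantics: return the outcomes of all rows that match the case."""
    return {
        activity
        for condition, activity in rules
        if all(case.get(key) == value for key, value in condition.items())
    }


def share_allowed(observations, rules):
    """Share of observed instances whose recorded activity the rules allow."""
    total = sum(count for _, _, count in observations)
    allowed = sum(
        count
        for case, outcome, count in observations
        if outcome in allowed_activities(case, rules)
    )
    return allowed / total


if __name__ == "__main__":
    case = {"Rating": "Bad", "Amount": "Low"}
    print(sorted(allowed_activities(case, OVERLAPPING_RULES)))
    print(share_allowed(OBSERVATIONS, EXCLUSIVE_RULES))
    print(share_allowed(OBSERVATIONS, OVERLAPPING_RULES))
```

For a case with Rating = Bad and Amount = Low, both Extensive Check and Simple Check are allowed. The Rating-only rules cover 30 of the 38 instances, while the overlapping table covers all 38, which is the gain in fit that the talk describes.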