- 1. Complete and Interpretable Conformance Checking of Business Processes Luciano García-Bañuelos University of Tartu, Estonia Nick van Beest Data61 | CSIRO, Australia Marlon Dumas University of Tartu, Estonia Marcello La Rosa Queensland University of Technology, Australia Willem Mertens Queensland University of Technology, Australia
- 2. Conformance checking 1. Compliance auditing – detect deviations with respect to a normative model (unfitting behavior) 2. Model maintenance – unfitting behavior – additional model behavior 3. Automated process model discovery – Iterative model improvement
- 3. Given a process model M and an event log L, explain the differences between the process behavior observed in M and L
- 4. State of the art Current approaches: • Are designed to identify the number and exact location of the differences • Don’t provide a “high-level” diagnosis that easily allows analysts to pinpoint differences: – Are unable to identify differences across traces – Are unable to fully characterize extra model behavior not present in the log
- 5. An example Desired conformance output: • task C is optional in the log • the cycle including IGDF is not observed in the log Log traces: ABCDEH ACBDEH ABCDFH ACBDFH ABDEH ABDFH
- 6. Our approach A method for business process conformance checking that: 1. Identifies all differences between the behavior in the model and the behavior in the log 2. Describes each difference via a natural language statement
- 7. How does it work? Difference statements Event log Input model PESM unfold PESL merge Partially Synchronized Product (PSP) compare extract differences
- 8. How does it work? Difference statements Event log Input model PESM unfold PESL merge Partially Synchronized Product (PSP) compare extract differences
- 9. How does it work? Difference statements Event log Input model PESM unfold PESL merge Partially Synchronized Product (PSP) compare extract differences
- 10. How does it work? Difference statements Event log Input model PESM unfold PESL merge Partially Synchronized Product (PSP) compare extract differences
- 11. Prime event structure (PES) A Prime Event Structure (PES) is a graph of events, where each event e represents the occurrence of a task in the modeled system (e.g. a business process) As such, multiple occurrences of the same task are represented by different events Pairs of events in a PES can have one of the following binary relations: • Causality: event e is a prerequisite for e' • Conflict: e and e' cannot occur in the same execution • Concurrency: no order can be established between e and e'
- 12. From event log to PES Log: Trace Ref N A B C E t1 3 A C B E t2 2 A B E t3 2 A D E t4 3 e0:A e1:B e2:C e3:E f0:A f1:B f2:E g0:A g1:D g2:E t1, t2 → p1 t3 → p2 t4 → p3 PO runs: {e0,f0,g0}:A
- 13. From event log to PES Log: Trace Ref N A B C E t1 3 A C B E t2 2 A B E t3 2 A D E t4 3 PO runs: {e0,f0,g0}:A {e1,f1}:B {e2}:C e0:A e1:B e2:C e3:E f0:A f1:B f2:E g0:A g1:D g2:E t1, t2 → p1 t3 → p2 t4 → p3
- 14. From event log to PES Log: Trace Ref N A B C E t1 3 A C B E t2 2 A B E t3 2 A D E t4 3 PO runs: {e0,f0,g0}:A {e1,f1}:B {e2}:C {g1}:D e0:A e1:B e2:C e3:E f0:A f1:B f2:E g0:A g1:D g2:E t1, t2 → p1 t3 → p2 t4 → p3
- 15. From event log to PES Log: Trace Ref N A B C E t1 3 A C B E t2 2 A B E t3 2 A D E t4 3 PO runs: {e0,f0,g0}:A {e1,f1}:B {e2}:C {g1}:D e0:A e1:B e2:C e3:E f0:A f1:B f2:E g0:A g1:D g2:E t1, t2 → p1 t3 → p2 t4 → p3
- 16. From event log to PES Log: Trace Ref N A B C E t1 3 A C B E t2 2 A B E t3 2 A D E t4 3 PO runs: {e0,f0,g0}:A {e1,f1}:B {f2}:E {e3}:E {g2}:E {e2}:C {g1}:D e0:A e1:B e2:C e3:E f0:A f1:B f2:E g0:A g1:D g2:E t1, t2 → p1 t3 → p2 t4 → p3
- 17. From event log to PES Log: Trace Ref N A B C E t1 3 A C B E t2 2 A B E t3 2 A D E t4 3 PO runs: {e0,f0,g0}:A {e1,f1}:B {f2}:E {e3}:E {g2}:E {e2}:C {g1}:D e0:A e1:B e2:C e3:E f0:A f1:B f2:E g0:A g1:D g2:E t1, t2 → p1 t3 → p2 t4 → p3
- 18. From model to PES BPMN model Petri net
- 19. From model to PES Branching process
- 20. From model to PES Complete prefix unfolding Cutoff event Corresponding event Cutoff event Corresponding event
- 21. PES prefix unfolding Complete prefix unfolding PES prefix unfolding Cutoff eventCorresponding event Corresponding event Cutoff event
- 23. Comparing PESs Log PES EL Model PES prefix unfolding EM e0:A e1:B e2:C e3:D e4:E e5:E e6:E Trace Ref N A B C E t1 3 A C B E t2 2 A B E t3 2 A D E t4 3 A B D E C f0:A f1:B f2:C f3:D f4:E f5:E
- 24. lh = {}, rh = {} m = {} Comparing PESs (cont’d) e0:A e1:B e2:C e3:D e4:E e5:E e6:E f0:A f1:B f2:C f3:D f4:E f5:E
- 25. lh = {}, rh = {} m = {(e0,f0)A} match A lh = {}, rh = {} m = {} Comparing PESs (cont’d) e0:A e1:B e2:C e3:D e4:E e5:E e6:E f0:A f1:B f2:C f3:D f4:E f5:E
- 26. match B lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B} lh = {}, rh = {} m = {(e0,f0)A} match A lh = {}, rh = {} m = {} Comparing PESs (cont’d) e0:A e1:B e2:C e3:D e4:E e5:E e6:E match Dmatch C f0:A f1:B f2:C f3:D f4:E f5:E
- 27. match B lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B} match C lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B,(e2,f2)C} lh = {}, rh = {} m = {(e0,f0)A} match A lh = {}, rh = {} m = {} Comparing PESs (cont’d) e0:A e1:B e2:C e3:D e4:E e5:E e6:E match Dmatch C f0:A f1:B f2:C f3:D f4:E f5:E
- 28. match B lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B} match C match E lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B,(e2,f2)C} lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B,(e2,f2)C,(e5,f4)E} lh = {}, rh = {} m = {(e0,f0)A} match A lh = {}, rh = {} m = {} Comparing PESs (cont’d) e0:A e1:B e2:C e3:D e4:E e5:E e6:E match Dmatch C f0:A f1:B f2:C f3:D f4:E f5:E
- 29. match B lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B} rhide Cmatch C lh = {}, rh = {f2:C} m = {(e0,f0)A,(e1,f1)B} lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B,(e2,f2)C} lh = {}, rh = {} m = {(e0,f0)A} match A lh = {}, rh = {} m = {} match E lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B,(e2,f2)C,(e5,f4)E} Comparing PESs (cont’d) e0:A e1:B e2:C e3:D e4:E e5:E e6:E match Dmatch C f0:A f1:B f2:C f3:D f4:E f5:E
- 30. match B lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B} rhide Cmatch C match E match E lh = {}, rh = {f2:C} m = {(e0,f0)A,(e1,f1)B} lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B,(e2,f2)C} lh = {}, rh = {} m = {(e0,f0)A,(e1,f1)B,(e2,f2)C,(e5,f4)E} lh = {}, rh = {f2:C} m = {(e0,f0)A,(e1,f1)B,(e4,f4)E} lh = {}, rh = {} m = {(e0,f0)A} match A lh = {}, rh = {} m = {} Comparing PESs (cont’d) e0:A e1:B e2:C e3:D e4:E e5:E e6:E match Dmatch C In the log, C is optional after {A,B}, whereas in the model it is not (task skipping) f0:A f1:B f2:C f3:D f4:E f5:E
- 31. Elementary mismatch patterns Unfitting behavior patterns: • Relation mismatch patterns 1. Causality-Concurrency 2. Conflict • Event mismatch patterns 3. Task skipping 4. Task substitution 5. Unmatched repetition 6. Task relocation 7. Task insertion / absence Additional model behavior patterns: 8. Unobserved acyclic interval 9. Unobserved cyclic interval
- 32. Example: Causality / Concurrency
- 34. Unobserved cyclic interval: PES and PES prefix unfolding A B C D Log PES EL Model PES prefix unfolding EM
- 35. Pomsets (partially ordered multisets) • A pomset is a Directed Acyclic Graph where: – the nodes are configurations – the edges represent direct causality relations between configurations – an edge is labeled by an event • Unlike an event structure, a pomset does not have any conflict relation, since a pomset represents one possible execution • The behavior of a PES can be characterized by the set of pomsets it induces • In the case of a PES prefix, the set of induced pomsets is infinite when the PES prefix captures cyclic behavior via cc-pairs • We cannot enumerate all pomsets of a PES prefix to compare with the PES of the log • Therefore, we can extract a set of elementary pomsets (inspired by the notion of elementary paths), which collectively cover all the possible pomsets induced by a PES prefix • Cyclic behavior is not required to be unfolded infinitely
- 36. Unobserved cyclic interval: expanded prefix with elementary pomsets
- 37. Unobserved cyclic interval: creating a PSP using the expanded prefix Two unobserved elementary acyclic pomsets: • s3 [a5, a9] • s9 [a8, a9] Two unobserved elementary cyclic pomsets: • s5 [a3, a6, a4, a7] • s11 [a4, a7, a3, a6]
- 38. Verbalization of elementary mismatch patterns Change pattern Condition Verbalization Causality / Concurrency if e' < e else In the log, after σ, λ(e') occurs before λ(e), while in the model they are concurrent In the model, after σ, λ(f') occurs before λ(f), while in the log they are concurrent Conflict if e' || e else if f' || f else if e' < e else In the log, after σ, λ(e') and λ(e) are concurrent, while in the model they are mutually exclusive In the model, after σ, λ(f') and λ(f) are concurrent, while in the log they are mutually exclusive In the log, after σ, λ(e') occurs before task λ(e), while in the model they are mutually exclusive after σ In the model, after σ, λ(f') occurs before λ(f), while in the log they are mutually exclusive Task skipping if e ≠ ┴ else In the log, after σ, λ(e) is optional In the model, after σ, λ(f) is optional Task substitution In the log, after σ, λ(f) is substituted by λ(e) Unmatched repetition In the log, λ(e) is repeated after σ Task relocation if e ≠ ┴ else In the log, λ(e) occurs after σ instead of σ' In the model, λ(f) occurs after σ instead of σ'
- 39. Change pattern Condition Verbalization Task insertion / absence if e ≠ ┴ else In the log, λ(e) occurs after σ and before σ' In the model, λ(f) occurs after σ and before σ' Unobserved acyclic interval In the log, interval ... does not occur after σ Unobserved cyclic interval In the log, the cycle involving interval ... does not occur after σ Verbalization of elementary mismatch patterns
- 40. Implementation Standalone Java tool: ProConformance OSGi plugin for Apromore: Compare – Input: BPMN process model and a log (MXML or XES format). Also accepts: • Two BPMN models for model comparison and • Two logs for log delta analysis – Output: set of difference statements
- 41. Evaluation 1. Qualitative evaluation on real life process: – Traffic fines management process in Italy with 150,370 traces, 231 distinct traces 2. Quantitative evaluation on two large process model collections: – IBM Business Integration Unit (BIT): 735 models – SAP R/3: 604 models 3. User evaluation (academics vs practitioners)
- 42. Qualitative evaluation: traffic fines model Start Create Fine Payment Send Fine Insert Fine Notification Add Penalty Appeal to Judge Send for Credit Collection Notify Result Appeal to Offender Insert Date Appeal to Prefecture Receive Result Appeal from Prefecture Send Appeal to Prefecture End Tau10
- 43. Qualitative evaluation: trace alignment • Replay a Log on Petri Net for Conformance Analysis: 205 misalignments out of 231 alignments • Replay a Log on Petri Net for All Optimal Alignments: 406 misalignments out of 412 alignments
- 44. Qualitative evaluation: verbalization 15 distinct statements in total, e.g. 1. In the log, “Send for credit collection” occurs after “Payment” and before the end state 2. In the model, after “Insert fine notification”, “Add penalty” occurs before “Appeal to judge”, while in the log they are concurrent 3. In the log, after “Add penalty”, “Receive results appeal from prefecture” is substituted by “Appeal to judge” 4. In the log, the cycle involving “Insert date appeal to prefecture, Send appeal to prefecture, Receive result appeal from prefecture, Notify result appeal to offender” does not occur after “Insert fine notification”.
- 45. Qualitative evaluation: verbalization 2. In the model, after “Insert fine notification”, “Add penalty” occurs before “Appeal to judge”, while in the log they are concurrent 4. In the log, the cycle involving “Insert date appeal to prefecture, Send appeal to prefecture, Receive result appeal from prefecture, Notify result appeal to offender” does not occur after “Insert fine notification”. Cannot be detected by trace alignment, as diagnostics are provided at the level of individual traces Cannot be entirely detected by trace alignment, as this difference concerns additional model behavior, while alignment-based ETC conformance only detects escaping edges
- 46. Qualitative evaluation: summary Verbalization: • produces a more compact yet more understandable diagnosis • exposes behavioral differences that are difficult or impossible to identify using trace alignment
- 47. Quantitative evaluation • For each model, we generated an event log using the ProM plugin “Generate Event Log from Petri Net” • This plugin generates a distinct log trace for each possible execution sequence in the model • The tool was only able to parse 274 models from the BIT collection, and 438 models from the R/3 collection, running into out-of-memory exceptions for the remaining models • Total models: 712 sound Workflow nets
- 49. Quantitative evaluation: log size Total log size (events)
- 50. Quantitative evaluation: time performance 0 100 200 300 400 500 600 700 800 0 50 100 150 200 250 300 Logsize(#events) Time (ms) No noise 5% noise 10% noise 15% noise 20% noise 0 2000 4000 6000 8000 10000 12000 0 0.5 1 1.5 2 Logsize(#events) Time (s) No noise 5% noise 10% noise 15% noise 20% noise BIT SAP Trace alignmentVerbalization BIT SAP 0 100 200 300 400 500 600 700 800 0 50 100 150 200 250 300 Logsize(#events) Time (ms) No noise 5% noise 10% noise 15% noise 20% noise 0 2000 4000 6000 8000 10000 0 10 20 30 40 50 60 70 80 90 100 Logsize(#events) Time (s) No noise 5% noise 10% noise 15% noise 20% noise
- 52. Quantitative evaluation: summary • Verbalization, although generally slower than trace alignment, shows reasonable execution times (within 10s) • Extreme cases: (logs with over 8,000 events in distinct traces) and a high number of differences, the execution time is still below 2 minutes • Verbalization consistently produces a more compact difference diagnosis than trace alignment
- 53. User evaluation • Online survey: – a simple Petri net with 31 nodes (10 visible transitions), created from a real-life claims handling process model – assumed that this model was accompanied by a log with 53 traces • Output of the alignment method (misalignments + Petri net with alignment information) overlaid vs • Output of the verbalization method (list of statements)
- 54. User evaluation • Respondents compared both methods using the Technology Acceptance Model: 1. What is the easiest approach for checking the conformance of an event log to a process model? 2. What is the easiest approach for identifying the differences between a process model and an event log? 3. What is the most useful approach for checking the conformance of an event log to a process model? 4. What is the most useful approach for identifying the differences between a process model and an event log? 5. Which approach would you likely use for checking the conformance of an event log to a process model? 6. Which approach would you likely use for identifying the differences between a process model and an event log? • Seven point Likert-scale: “Strongly prefer Alignment” to “Strongly prefer Verbalization” • Background: academic vs professional • Experience in process modelling • Confidence in modelling with Petri nets
- 55. User evaluation: hypotheses H1: respondents would have a preference for verbalization H2: respondents with less experience, familiarity, confidence and competence in the use of Petri nets would have a stronger preference for verbalization
- 56. User evaluation: results • Academics (38 responses) – More familiar in working with Petri nets – More competent in working with Petri nets – Analysed and created more models in the past 12 months • Professionals (33 responses) – Less familiar with Petri nets – Mostly rely on professional training
- 57. User evaluation: results • H1: – Tested for the full sample and for the two cohorts separately – For the full sample there is no general preference for our method: the median was zero (“neutral”) – Professionals did show a preference for verbalization (especially along ease of use) while academics preferred alignment, so H1 is supported for the professionals cohort only • H2: – Respondents with more experience, familiarity, confidence and competence in working with Petri nets have a stronger preference for alignments – H2 is supported by the results
- 58. User evaluation: summary • Academics prefer alignment • Professionals prefer verbalization • Overall, people with less expertise in the use of Petri nets show a stronger preference for verbalization
- 59. Limitations of the approach • Input log is assumed to consist of sequences of event labels – timestamps are ignored – event payloads are ignored • Simplicity of the used concurrency oracle (a+), leading to occasional difficulties in the presence of – short loops – skipped and/or duplicated tasks • Lack of visual representation of differences (text only) • No option to use different levels of abstraction • No statistical support for differences: all equally important even if some may be very infrequent
- 60. Future work • Employing a more accurate concurrency oracle (e.g. local a)* • Group related statements to trade accuracy with greater interpretability • Add statistical support to statements • Visual representation of differences in addition to natural language statements (e.g. via “representative” runs) • Capturing non-control-flow deviance – Analysis of underlying data – Resources – Temporal aspects • Use differences as a basis for model repair *Armas-Cervantes, A., Dumas, M., & La Rosa, M. (2016) Discovering Local Concurrency Relations in Business Process Event Logs, https://eprints.qut.edu.au/97615
- 61. Thank you for your attention

- detect deviations in the process execution with respect to the behavior stipulated by a normative model Model maintenance: The first situation suggests that the model needs to be extended to capture the unfitting behavior The second situation suggests that there are paths in the model that have become spurious overtime, meaning they are no longer used and need to be pruned if they are found to have lost relevance Iterative improvement of the process model. First, an initial model is discovered using any discovery algorithm, then conformance checking is measured and based on this result the behavior of the model is refined to better align it to that of the log, e.g. it is restricted to remove paths in the model never observed, in order to avoid over-generalization, and extended to add paths observed in the log but not captured in the model
- Related work MEASURING UNFITTING BEHAVIOR Replay fitness and ICS extension: replay a trace on the Petri net, to detect missing tokens that need to be added to a place to replay the trace, and remaining tokens that remain in the Petri net once the trace has been fully replayed. ICS extension trades accuracy for performance Vanden Bourcke et al. proposes a SESE decomposition to improve performance and offer more localized feedback General limitation: error recovery is performed locally each time an error is detected, thus these methods may not identify the minimum number of differences (errors) to explain the unfitting behavior. This limitation is solved by: - trace alignment, which however still provides feedback at the level of individual traces, rather than behavioral relations between events. MEASURING ADDITIONAL MODEL BEHAVIOR Negative events (Negative Events Precision): additional “negative” events are added and if the model can reply any of these negative events when replaying a trace, then this is a case of extra model behavior. However this method is heuristic, and cannot guarantee that all extra behavior is detected. While there are improvements on performance (the method is exponential as it is exponentially large the number of possible negative events), these still do not guarantee 100% accuracy in detecting extra behavior Trace automata (ETC Conformance): a prefix automaton is built from the log, such that each state in the automaton corresponds to a given trace prefix. Then for each state, by replaying the prefix on the model, a corresponding state in the model is detected, and if the set of enabled transitions in this stage includes events that cannot be taken at that state in the log, then this is detected as an “escaping edge” (i.e. a sink) in the automaton, pinpointing the beginning of extra model behavior and the reply is continued. Two limitations: i) unable to handle duplicate labels and silent transitions, and ii) assumes fully-fitting log. These limitations are solved by the: Alignment-based ETC Conformance: first, an optimal alignment is calculated, including silent moves on model. Then the model-projection of this alignment is used to compute the automaton, after which traces are replayed to detect escaping edges. General limitation of trace automata: they cannot fully characterize the extra behavior in the model, but only pinpoint where this extra behavior takes place and with what task it starts.
- The first statement characterizes the behavior observed in the log but not in the model: in the model, task C is compulsory, while in the log C is skippable The second statement characterizes the behavior observed in the model but not in the log Trace alignment would produce two optimal alignments: One between ABDEH of the log and ABCDHE of the model, the other between ABDFH of the log and ABCDFH of the model. From this one can infer that task C is optional in the log (move on log only). 1) However the number of misaligned traces is often very large, rendering this inference quite hard in practice. Visualizations, e.g. on top of Petri net, and at an aggregate level, can help, but fundamentally the problem is that trace alignment provides feedback at the level of individual traces, not at the level of behavioral relations observed in the log but not captured in the model. 2) Moreover, trace alignment would detect that there is escaping behavior starting with “Request addition information” at a trace prefix finishing with “Notify rejection”, but it will not identify that the extra behavior includes tasks IG and that IGDF is behavior that can be repeated in the model but not in the log. For example, task “Assess application” can be repeated in the model but not in the log.
- Input model in BPMN and event log in XES or MXML BPMN converted to Petri nets using Remco, Chun and Marlon’s technique (Information and Software Technology)
- The Petri net is unfolded using McMillan’s complete prefix unfolding technique From the log, first a set of partially-ordered runs are extracted over a concurrency relation discovered from the log. Then these runs are prefix-merged into a Prime Event Structure (PES)
- The PSP is a representation of a synchronized traversal of two input PESs, such that when a discrepancy is detected it is explicitly recorded and the traversal resumes from a “suitable” configuration in each of the two PESs. If the discrepancy if of type “unfitting log behavior” this will be recorded by a node of the PSP, so we can enumerate all unfitting behavior. To expose additional model behavior, we define a notion of coverage of the PES extracted from the model by the PES extracted from the log. The parts of the model PES not covered by the log PES are the additional model behavior.
- Each discrepancy falls under one of a set of disjoint patterns. For each pattern, we have a verbalization of the difference.
- An occurrence of a task in the business process, as represented by the model or the log Note that conflict is “inherited” by causality. For example, if e # e’ and e’ <= e’’, then e # e’’ Nielsen et al, “Petri nets, event structures and domains, part I”, TCS, 1981
- To simplify, in an event structure we don’t show: Transitive causality relations, e.g. A is causal to E Hereditary conflict, e.g. B and C are in conflict with {g2}E Concurrency: every pair of events that is neither directly nor transitively causally related, nor in conflict of course, is concurrent. E.g. B and C are concurrent Every state of a PES (called “configuration”) is represented by a set of events. A configuration is: causally closed, meaning that for each event e in C, C includes all causal predecessors of e, and conflict-free, meaning that there cannot be any pair of events in conflict within C. e is extension of C if {e} U C is also a configuration. A maximal configuration is a configuration that is maximal w.r.t. set inclusion Lossless representation: the set of maximal configurations of the PES is equivalent to the set of runs inferred from the log, modulo the inaccuracy of the concurrency oracle. The complexity of building such an event structure from a log is cubic on the length of the longest trace. We merge events with the same label (e.g. e0 and f0) and which have same history (same prefix)
- The model is a variation of what we showed before, with C being optional and without loopback after F. Requirements: The Petri net must be safe The silent transitions will be eliminated when constructing the PES
- First, we construct the branching process by prefix-merging all the partially ordered runs induced by the Petri net. We do so because branching processes explicitly represent the same set of behavioral relations as event structures. Transitions in the branching process represent events in the event structure, and the behavioral relations in the branching process can thus be used to generate an event structure. However, the event structure that we create by using a set of inductive rules, includes silent transitions. It has been shown (Abel’s IS paper) that silent transitions can be abstracted away in a behavioral-preserving manner, under the well-known notion of visible-pomset equivalence (which requires that sink events in the event structure are not silent, and we can always add fake ones). In the example, the future of t2 and t4 (C) is isomorphic, which is unfolded in the branching process separately for both t2 and t4. Furthermore, the future of E and F is isomorphic. Therefore, we can safely stop unfolding the branching process once we reach t4, provided that we continue unfolding from t2 and onwards.
- McMillan showed that for a safe net, a prefix of a branching process that unfolds each loop once fully encodes the behavior of the original net. Such prefix is referred to as complete prefix unfolding of the net. We use Esparza’s optimization that has been shown to produce compact unfoldings The trick is to find transitions which have isomorphic futures. When we stop unfolding after t4, we call this the cutoff event. The event with the isomorphic future, t2, is called the corresponding event. The resulting cc-pair (t4,t2) (cutoff-corresponding pair) is grapically depicted here with the red line. Similar for E and F
- We call the PES derived from a complex prefix unfolding PES prefix unfolding or simply prefix PES This translates to a PES prefix unfolding as shown in this slide. Reasoning about possible executions of a PES prefix unfolding is not convenient because some configurations are not explicitly represented. To make it more convenient to explore the configurations of a PES prefix unfolding, we use the “shift” operation on net unfoldings. Given a cc-pair (t4, t2), since the futures of [t4] and [t2] are isomorphic, we can “shift” from one configuration to the other. In other words, the shift operation is a “step” function that allows us to move from one configuration to another. Note that in a PES we retain those tau transitions that are involved in a cc-pair and safely get rid of the rest. The extraction of a complete prefix unfolding from a Petri net is in the worst-case scenario is exponential on the size of the net.
- For loop relations, the advantage of the prefix unfolding and shift operations is clear, as it allows us to identify elementary pomsets (partially ordered multisets), so that elementary loops can be identified and such that the PES prefix is not required to unfold the cyclic behavior infinitely. This corresponds to unfolding every cycle so that it is traversed only once. We will talk about pomsets later, when illustrating how to identify additional (cyclic and acyclic) model behavior.
- Let’s take this log and model, and their corresponding PESs
- Event matching must be Label preserving (the two events must have the same label), and Order preserving (the two events must be consistent with the casual relation in the input PESs). For example, I cannot match two events enabled at a given synchronized state if one is casual to the current event while the other is concurrent, even if these have the same label
- When matching, traversal of enabled events at a given configuration is done in lexicographical order to guarantee determinism.
- we can also match C and D
- In the example, note that C is optional in the log and mandatory in the model ONLY after state {A,B} and not always, e.g. in the model after {A} I can execute D and thus skip C At each synchronized state, the set of enabled events in the two PESs is checked, to identify those that are label-preserving and order-preserving. Label preservation is a simple check, but order-preservation requires a backward traversal of the event structure to check that all causality relations are maintained between each of the enabled events and the configuration path in the event structure. If this is not the case, the algorithm returns the set of events that have a causality discrepancy, the cut-off events being traversed and the set of all causality relations that are violated. This is later used to characterize behavioral mismatches. The PSP contruction aims at finding the optimal matchings (i.e. maximum number of matchings, meaning minimum number of hide) for every maximal configuration of the log PES. Hence, priority is given to the log. A PSP with no hide operations identifies a situation where the log is fully fitting into the model. --- Complexity of PSP construction: we use an A* heuristic to find the optimal number of matches, so worst-case is O(3^(nPES1 x nPES2)) where nPESx is the number of configurations of PESx. 3 is the branching factor – avg number of successors per state (match, lhide and rhide). Indeed each configuration of PES1 is associated with a configuration of PES2 via 3 possible operations
- We only report on immediate causality (not transitive causality) and direct conflict (not inherited conflict) because we want to report each mismatch once: 1. Immediate causality vs concurrency 2. direct conflict vs concurrency direct conflict vs immediate causality Each mismatch occurs in a given context, i.e. a pair of configurations, one for each PES Relation mismatch patterns are O(n) where n is the number of arcs of the PSP (via optimizations of O(n^3)) --- Task absence / insertion is a “catch all” pattern, essentially saying that there is a task at a given configuration in the PES of the log but not in the corresponding configuration in the PES of the model
- The pattern observed is the lhide and rhide of the same label in two different event structures. This happens because each configuration is conflict-free, so two conflicting events cannot co-exist in the same configuration (in our example a2 and b2). Note that it’s not enough to check that we have an lhide and an rhide of the same label, we also need to make sure that the two events being hidden are not order-preserving w.r.t. the enabling state. In our example, a2 is concurrent with a1 while b2 is causally related to b1, with the enabling state being {(a0,b0)A, (a2,b1)B. Given that there can be operations that are commutative, these two hides do not necessarily need to be contiguous. For example here after matching A I can lhide C and then match B. Note that the algorithm for computing the PSP applies partial order reductions from the field of model checking to remove redundant paths which lead to the same maximal configuration due to the concurrent enablement of events.
- To characterize additional model behavior not observed in the log we define a notion of “coverage”: essentially every elementary path (no repetition) and elementary cycle (first and last event of the sequence being the same, and if I remove the last event the cycle becomes an elementary path) of the prefix PES must be covered by a maximal configuration of the log PES, after hide operations have been applied. If this is the case, then there isn’t any additional model behavior. Elementary paths and cycles are borrowed from Graph Theory. The nice thing is that in this way we do not detect a difference between the log and the model if in the log we observe a finite number of iterations of a cycle while in the model the cycle is clearly infinite. This is because we only need the elementary cycle in the prefix PES to be covered by the log PES. --- In the example, paths in the log PES are highlighted in the model PES We can see in this example that the log behavior is perfectly fitting the model, as each of the paths in the PES of the log exist in the PES of the model. However, the PES of the model has a loop with two entry points and two exit points, while the PES of the log doesn’t have any repetitive behavior (each entry point is in fact an exit point).
- In the case of a PES prefix, the set of induced pomsets is infinite when the PES prefix captures cyclic behavior via cc-pairs. Therefore, we cannot enumerate all pomsets of a PES prefix in order to check if each of them is observed in the PES of the log. The set of elementary pomsets collectively cover all the possible pomsets induced by a PES prefix, such that every cycle is traversed only once.
- e ≠ ┴ means that the event is defined in the log σ is the state where the behavioral discrepancy is observed --- Note: in the case of unmatched repetition in the log, the idea is very simple: if we have an lhide of a label that has already been observed in the history of the PSP, that means that we are against repeated log behavior that is not matched by the model.