Keynote speech at the Belgian Process Mining Research Day 2021. I discuss the open, critical challenge of data preparation in process mining, considering the case where the original event data are implicitly stored in (legacy) relational databases. This covers the common situation where event data sit inside the data layer of an ERP or CRM system and are usually extracted through manual, ad-hoc, error-prone ETL procedures. I propose instead to adopt a pipeline based on semantic technologies, in particular the framework of ontology-based data access (also known as virtual knowledge graphs). The approach is code-less and relies on three main conceptual steps: (1) the creation of a data model capturing the relevant classes, attributes, and associations in the domain of interest; (2) the definition of declarative mappings from the source database to the data model, following the ontology-based data access paradigm; (3) the annotation of the data model with indications of which classes, associations, and attributes provide the relevant notions of case, event, event attribute, and event-to-case relation. Once this is done, the framework automatically extracts the event log from the legacy data. This makes it very smooth to generate logs that take multiple perspectives on the same reality. The approach has been operationalized in the onprom tool, which employs Semantic Web standard languages for the various steps and the XES standard as the target format for the event logs.
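To make the contrast with manual ETL concrete, below is a minimal, hypothetical Python sketch of what the annotated framework ultimately automates: once we have indicated which table and columns play the roles of case, event name, and timestamp, events can be pulled from the legacy database and serialized as an XES log. All database, table, and column names here are invented for illustration; the onprom tool performs this extraction declaratively, via OBDA mappings and annotations, rather than through such hand-written code.

```python
# Hypothetical sketch (NOT the onprom tool): extract an XES event log from a
# legacy relational database, assuming we have already decided that
#  - order_id       provides the case notion,
#  - activity       provides the event name,
#  - performed_at   provides the event timestamp.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("legacy_erp.db")  # assumed legacy database file

# The "annotation" step, here hard-coded as a query over an assumed table.
query = """
    SELECT order_id, activity, performed_at
    FROM order_events
    ORDER BY order_id, performed_at
"""

log = ET.Element("log", {"xes.version": "1.0"})
traces = {}  # one XES trace per case id
for case_id, activity, timestamp in conn.execute(query):
    trace = traces.get(case_id)
    if trace is None:
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string",
                      {"key": "concept:name", "value": str(case_id)})
        traces[case_id] = trace
    event = ET.SubElement(trace, "event")
    ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
    ET.SubElement(event, "date", {"key": "time:timestamp", "value": timestamp})

ET.ElementTree(log).write("extracted_log.xes",
                          encoding="utf-8", xml_declaration=True)
```

Changing the perspective (for instance, taking customers rather than orders as cases) would amount to changing the annotation, in this sketch the SELECT clause, rather than rewriting the whole pipeline.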
From legacy data to event data
1. Marco Montali Free University of Bozen-Bolzano, Italy
credits: Diego Calvanese, Tahir Emre Kalayci, Ario Santoso, Wil van der Aalst
From legacy data to event data
1
2. My research in one slide
I investigate foundational and applied techniques grounded in artificial intelligence for modelling, verification, execution, monitoring, and mining of dynamic systems operating over data, with a specific focus on business process management and multiagent systems.
2
3. How to attack these challenges?
Artificial Intelligence: knowledge representation, automated reasoning, multiagent systems
Information Systems: business process management, master data management, decision management
Formal Methods: infinite-state systems, verification, Petri nets
Data Science: process mining
3
7. Processes leave digital breadcrumbs…
Organisational level:
• Internal management
• Calculation of process metrics/KPIs
• Legal reasons
(compliance, external audits)
Personal level:
• We live in a digital society!
• Social networks, sensors,
cyberphysical systems, mobile
devices are all data loggers
7
8. [Figure: the interplay between reality, IT support, data, and models — (re)design, configure/deploy, enact/monitor, diagnose / get requirements, adjust — involving (knowledge) workers and managers/analysts.]
8
9. [Same figure as slide 8]
9
10. [Same figure as slide 8]
10
11. PM² [Eck et al., CAiSE 2015]
[Figure: the PM² methodology — an initialization phase followed by analysis iterations over six stages: 1. Planning, 2. Extraction, 3. Data processing, 4. Mining & analysis (discovery, conformance, enhancement), 5. Evaluation, 6. Process improvement & support. Inputs and outputs flowing between the stages include research questions (refined/new over the iterations), event data from the information system, event logs, process models, analytic models, performance and compliance findings, and improvement ideas; business experts and process analysts are involved throughout.]
11
13. IEEE standard XES
www.xes-standard.org
IEEE Standard for the representation of event logs
• Based on XML
• Minimal mandatory structure:
  • a log consists of traces, each representing the history of a case
  • a trace consists of a list of atomic events
• Extensions to “decorate” log, trace, and event with informative attributes: timestamps, task names, transactional lifecycle, resources, additional event data
• Supports “meta-level” declarations useful for log processors
13
17. A simple process
Apologies for being so predictable…
[Fig. 2: The process for managing papers in a simplified conference submission system — create paper (author), submit paper (author), assign reviewer (chair), review paper (reviewer), submit review (reviewer), take decision (chair), then an accept? choice: if yes, accept paper (chair) followed by upload camera ready (author); if no, reject paper (chair).]
17
18. The lucky situation
[Fig. 2 as on slide 17; gray tasks are external to the conference information system and cannot be logged.]
Example 1. As a running example, we consider a simplified conference submission system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors, reviewers, and conference chairs in the submission of papers to conferences, the consequent review process, and the final decision about paper acceptance or rejection. Figure 2 shows the process control flow considering papers as case objects. Under this perspective, the management of a single paper evolves through the following execution steps. First, the paper is created by one of its authors, and submitted to a conference available in the system. Once the paper is submitted, the review phase for that paper starts. This phase of the process consists of a so-called multi-instance section, i.e., a section of the process where the same set of activities is instantiated multiple times on …
Event Data
Case ID | ID       | Timestamp        | Activity      | User   | ...
1       | 35654423 | 30-12-2010:11.02 | create paper  | Pete   | ...
1       | 35654424 | 31-12-2010:10.06 | submit paper  | Pete   | ...
1       | 35654425 | 05-01-2011:15.12 | assign review | Mike   | ...
1       | 35654426 | 06-01-2011:11.18 | submit review | Sara   | ...
1       | 35654428 | 07-01-2011:14.24 | accept paper  | Mike   | ...
1       | 35654429 | 06-01-2011:11.18 | upload CR     | Pete   | ...
2       | 35654483 | 30-12-2010:11.32 | create paper  | George | ...
2       | 35654485 | 30-12-2010:12.12 | submit paper  | John   | ...
2       | 35654487 | 30-12-2010:14.16 | assign review | Mike   | ...
2       | 35654489 | 16-01-2011:10.30 | submit review | Ellen  | ...
2       | 35654490 | 18-01-2011:12.05 | reject paper  | Mike   | ...
18
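To make the XES structure of slide 13 concrete, here is a minimal sketch of how the first two events of case 1 above could be serialized; the attribute keys follow the standard Concept, Time, and Organizational extensions, while the extension URIs, timezone, and formatting are illustrative rather than copied from a real export:

<?xml version="1.0" encoding="UTF-8"?>
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <extension name="Concept" prefix="concept" uri="http://www.xes-standard.org/concept.xesext"/>
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
  <extension name="Organizational" prefix="org" uri="http://www.xes-standard.org/org.xesext"/>
  <trace>
    <!-- the case identifier -->
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="create paper"/>
      <date key="time:timestamp" value="2010-12-30T11:02:00+01:00"/>
      <string key="org:resource" value="Pete"/>
    </event>
    <event>
      <string key="concept:name" value="submit paper"/>
      <date key="time:timestamp" value="2010-12-31T10:06:00+01:00"/>
      <string key="org:resource" value="Pete"/>
    </event>
  </trace>
</log>

Every log, trace, and event element can carry further attributes (e.g., lifecycle:transition from the Lifecycle extension), which is how the “decorations” mentioned on slide 13 are realized.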
20. The common case
[Fig. 11: DB schema for the information system of the conference submission system (primary keys underlined and foreign keys in italic in the original figure): ACCEPTANCE(ID, uploadtime, user, paper); CONFERENCE(ID, name, organizer, time); DECISION(ID, decisiontime, chair, outcome); LOGIN(ID, user, CT); SUBMISSION(ID, uploadtime, user, paper); PAPER(ID, title, CT, user, conf, type, status); REVIEW(ID, RRid, submissiontime); REVIEWREQUEST(ID, invitationtime, reviewer, paper)]
Intuitively, mapping assertions involving such atoms are used to map source relations (and the tuples they store) to concepts, roles, and features of the ontology (and the objects and the values that constitute their instances), respectively. Note that for a feature atom, the type of values retrieved from the source database is not specified, and needs to be determined based on the data type of the variable v2 in the source query φ(x).
Example 10. Consider the CONFSYS running example, and an information system whose DB schema R consists of the eight relational tables shown in Figure 11. Some example mapping assertions are the following ones:
1. SELECT DISTINCT SUBMISSION.ID AS oid
   FROM SUBMISSION, PAPER
   WHERE SUBMISSION.PAPER = PAPER.ID
   AND SUBMISSION.UPLOADTIME = PAPER.CT
20
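The SQL above is only the source part of a mapping assertion; it is paired with a target that states which ontology elements the retrieved values populate. As spelled out later (slide 62), for this query the target is:

:submission/{oid} rdf:type :Creation .

i.e., each SUBMISSION tuple whose upload time coincides with the corresponding paper’s creation time yields an object :submission/oid that is asserted to be an instance of the concept Creation.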
21. The common case
[Same DB schema (Fig. 11) and mapping assertion as slide 20]
21
22. The common case
[Same DB schema (Fig. 11) and mapping assertion as slide 20]
22
24. Intertwined objects
[Figure 1: Structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages. An Order includes many Items (1 – *); each Item is carried in exactly one Package (* – 1). Instances: orders o1, o2, o3; items i1,1, i1,2, i2,1, i2,2, i2,3, i3,1; packages p1, p2, p3.]
[Excerpt of the event log for orders, with one column per order: create order o1; add item i1,1 to order o1; create order o2; add item i2,1 to order o2; create order o3; add item i3,1 to order o3; pay order o3; …]
Have you ever placed orders online?
24
25. Flattening reality
[Same Figure 1 and order event-log excerpt as slide 24]
focus on orders
25
27. The effect of flattening
[Figure 1 as on slide 24; the full event log for orders. Overall log:]
2019-09-22 10:00:00  create order o1
2019-09-22 10:01:00  add item i1,1 to order o1
2019-09-23 09:20:00  create order o2
2019-09-23 09:34:00  add item i2,1 to order o2
2019-09-23 11:33:00  create order o3
2019-09-23 11:40:00  add item i3,1 to order o3
2019-09-23 12:27:00  pay order o3
2019-09-23 12:32:00  add item i1,2 to order o1
2019-09-23 13:03:00  pay order o1
2019-09-23 14:34:00  load item i1,1 into package p1
2019-09-23 14:45:00  add item i2,2 to order o2
2019-09-23 14:51:00  load item i3,1 into package p1
2019-09-23 15:12:00  add item i2,3 to order o2
2019-09-23 15:41:00  pay order o2
2019-09-23 16:23:00  load item i2,1 into package p2
2019-09-23 16:29:00  load item i1,2 into package p2
2019-09-23 16:33:00  load item i2,2 into package p2
2019-09-23 17:01:00  send package p1
2019-09-24 06:38:00  send package p2
2019-09-24 07:33:00  load item i2,3 into package p3
2019-09-24 08:46:00  send package p3
2019-09-24 16:21:00  deliver package p1
2019-09-24 17:32:00  deliver package p2
2019-09-24 18:52:00  deliver package p3
2019-09-24 18:57:00  accept delivery p3
2019-09-25 08:30:00  deliver package p1
2019-09-25 08:32:00  accept delivery p1
2019-09-25 09:55:00  deliver package p2
2019-09-25 17:11:00  deliver package p2
2019-09-25 17:12:00  accept delivery p2
[In the per-order columns (order o1, order o2, order o3), each event is replicated into the trace of every order it relates to, directly or via its items and packages.]
27
28. The effect of flattening
[Same event log for orders as slide 27, highlighting the order columns]
orders
28
29. The effect of flattening
[Same event log for orders as slide 27, highlighting the order columns]
orders
29
30. Discovery?
[Directly-follows graph discovered with the Disco process mining tool from the flattened order log: create order (3), add item (6), pay order (3), load item (6), send package (5), deliver package (11), accept delivery (5), with misleading frequencies on the arcs.]
[Accompanying two-column text, partially cut off in the slide: flattening requires choosing a case notion over the raw log and filtering it accordingly; the flat trace of an order contains the events that refer to that order directly, to one of its items, or to a package carrying one of its items. This causes two undesired effects. 1. Replication of tasks — when an event relates to several case objects, it is copied into the trace of each of them (e.g., package events shared by orders o1 and o3 appear in both traces). 2. Shuffling of independent threads — events of different items and packages of the same order are interleaved in one trace and can no longer be told apart (e.g., it is unclear which item an add item event refers to, or how to correlate a load item event with the corresponding add item and accept delivery events). The result is misleading information and apparently wrong statistics in the discovered model; e.g., the directly-follows graph misleadingly indicates that deliver package occurred 11 times.]
30
31. Discovery?
[Same directly-follows graph and text as slide 30, with annotations pointing out a non-existing loop and wrong statistics in the discovered model.]
31
32. Level — Characterization — Examples
★★★★★ Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies. Events and their attributes point to this ontology. Examples: semantically annotated logs of BPM systems.
★★★★ Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner. Examples: event logs of traditional BPM/workflow systems.
★★★ Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa). Examples: tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.
★★ Events are recorded automatically, i.e., as a by-product of some information system. Coverage varies, i.e., no systematic approach is followed to decide which events are recorded. Moreover, it is possible to bypass the information system. Hence, events may be missing or not recorded properly. Examples: event logs of document and product management systems, error logs of embedded systems, worksheets of service engineers, etc.
★ Lowest level: event logs are of poor quality. Recorded events may not correspond to reality and events may be missing. Event logs for which events are recorded by hand typically have such characteristics. Examples: trails left in paper documents routed through the organization (“yellow notes”), paper-based medical records, etc.
32
33. Level 4–5: straightforward syntactic manipulation
Level 3: much more difficult
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and events
• Processes with one-to-many and many-to-many relations
[Same maturity table (Level — Characterization — Examples) as slide 32]
33
34. Level 4–5: straightforward syntactic manipulation
Level 3: much more difficult
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and events
• Processes with one-to-many and many-to-many relations
[Same maturity table (Level — Characterization — Examples) as slide 32]
Not covered today, but see recent works by Dirk Fahland, Wil van der Aalst, and my group:
https://pais.hse.ru/en/seminar-pne/
https://multiprocessmining.org
http://ocel-standard.org
34
35. Extracting XES from legacy data
[___,BIS2017]
Manual construction of views and ETL procedures to fetch the data
Done by IT experts, not by knowledge workers (domain experts)
Traditional methodology: create data model → choose perspective → extract relevant tables → design views with relevant attributes → design composite views → design log view → export to XES/CSV → do process mining (and back to “choose perspective” if another perspective is needed)
[Excerpt from the case study on log extraction and process mining: “Finally, EBITmax converted the log view into a CSV file, and analysed it using the Disco process mining toolkit.”]
35
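To make the kind of artifact this methodology produces concrete, here is a sketch of a hand-crafted log view over the CONFSYS schema of Fig. 11 (slide 20); the view name and the activity labels are illustrative and not taken from the original case study:

CREATE VIEW paper_event_log AS
  -- the paper is the case; each row is one event (activity, timestamp, resource)
  SELECT s.paper AS case_id, 'submit paper' AS activity,
         s.uploadtime AS ts, s."user" AS resource
  FROM SUBMISSION s
UNION ALL
  SELECT rr.paper, 'assign reviewer', rr.invitationtime, rr.reviewer
  FROM REVIEWREQUEST rr
UNION ALL
  SELECT rr2.paper, 'submit review', rv.submissiontime, rr2.reviewer
  FROM REVIEW rv JOIN REVIEWREQUEST rr2 ON rv.RRid = rr2.ID
UNION ALL
  SELECT a.paper, 'accept paper', a.uploadtime, a."user"
  FROM ACCEPTANCE a;

Even this sketch hides interpretation choices — for instance, SUBMISSION rows cover both paper creations and camera-ready uploads (cf. the Creation/CRUpload distinction made later via mappings) — which is exactly why correctness and maintenance become the crucial issues listed on the next slide.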
36. Extracting XES from legacy data
[___,BIS2017]
Crucial issues:
• Correctness: who knows? Process mining is dangerous if applied to wrong data
• Maintenance, evolution, and change of perspective are hard… but process mining should be highly interactive
[Same traditional methodology and case-study excerpt as slide 35]
36
37. The onprom approach
onprom.inf.unibz.it
Semantic technologies to:
1. Understand the data
2. Access the data using the domain vocabulary
3. Express the perspective for process mining using the domain vocabulary
4. Automatise the extraction of XES event logs
37
[Fig. 12: The onprom methodology and its four phases — if a high-level information system is available, bootstrap the conceptual data schema and the mappings, otherwise create them from scratch; enrich model and mappings; choose a perspective and create event-data annotations; get XES/CSV; do process mining; loop back if another perspective is needed.]
“… the same time generating (identity) mappings to link the two specifications. The result of bootstrapping can then be manually refined. Once the first phase is completed, process analysts and the other involved stakeholders do not need anymore to consider the structure of the legacy information system, …”
40. Data access is becoming a bottleneck
Optique project: Scalable, End-User Access to Big Data (http://optique-project.eu)
One case study: Statoil
• geologists and engineers develop models of unexplored areas based on drilling operations done in surrounding sites
Crompton (2008): domain experts use (too much) time to fetch data for decision making and to do their job
• Engineers in the oil/gas sector: 30–70% of working time spent in data access and data quality
40
41. Facts on Statoil
• 1000 TB of relational data (SQL)
• Non-aligned schemas, each with 2K+ tables
• 900 experts within “Statoil Exploration”
• Up to 4 days needed to express queries and translate them into SQL
41
42. Example of query
42
How much time/money is spent searching for data?
A user query at Statoil:
“Show all norwegian wellbores with some aditional attributes (wellbore id, completion date, oldest penetrated age, result). Limit to all wellbores with a core and show attributes like (wellbore id, core number, top core depth, base core depth, intersecting stratigraphy). Limit to all wellbores with core in Brentgruppen and show key atributes in a table. After connecting to EPDS (slegge) we could for instance limit futher to cores in Brent with measured permeability and where it is larger than a given value, for instance 1 mD. We could also find out whether there are cores in Brent which are not stored in EPDS (based on NPD info) and where there could be permeability values. Some of the missing data we possibly own, other not.”
43. [The same user query at Statoil as on slide 42, translated into SQL:]
43
SELECT [...]
FROM
db_name.table1 table1,
db_name.table2 table2a,
db_name.table2 table2b,
db_name.table3 table3a,
db_name.table3 table3b,
db_name.table3 table3c,
db_name.table3 table3d,
db_name.table4 table4a,
db_name.table4 table4b,
db_name.table4 table4c,
db_name.table4 table4d,
db_name.table4 table4e,
db_name.table4 table4f,
db_name.table5 table5a,
db_name.table5 table5b,
db_name.table6 table6a,
db_name.table6 table6b,
db_name.table7 table7a,
db_name.table7 table7b,
db_name.table8 table8,
db_name.table9 table9,
db_name.table10 table10a,
db_name.table10 table10b,
db_name.table10 table10c,
db_name.table11 table11,
db_name.table12 table12,
db_name.table13 table13,
db_name.table14 table14,
db_name.table15 table15,
db_name.table16 table16
WHERE [...]
table2a.attr1=‘keyword’ AND
table3a.attr2=table10c.attr1 AND
table3a.attr6=table6a.attr3 AND
table3a.attr9=‘keyword’ AND
table4a.attr10 IN (‘keyword’) AND
table4a.attr1 IN (‘keyword’) AND
table5a.kinds=table4a.attr13 AND
table5b.kinds=table4c.attr74 AND
table5b.name=‘keyword’ AND
(table6a.attr19=table10c.attr17 OR
(table6a.attr2 IS NULL AND
table10c.attr4 IS NULL)) AND
table6a.attr14=table5b.attr14 AND
table6a.attr2=‘keyword’ AND
(table6b.attr14=table10c.attr8 OR
(table6b.attr4 IS NULL AND
table10c.attr7 IS NULL)) AND
table6b.attr19=table5a.attr55 AND
table6b.attr2=‘keyword’ AND
table7a.attr19=table2b.attr19 AND
table7a.attr17=table15.attr19 AND
table4b.attr11=‘keyword’ AND
table8.attr19=table7a.attr80 AND
table8.attr19=table13.attr20 AND
table8.attr4=‘keyword’ AND
table9.attr10=table16.attr11 AND
table3b.attr19=table10c.attr18 AND
table3b.attr22=table12.attr63 AND
table3b.attr66=‘keyword’ AND
table10a.attr54=table7a.attr8 AND
table10a.attr70=table10c.attr10 AND
table10a.attr16=table4d.attr11 AND
table4c.attr99=‘keyword’ AND
table4c.attr1=‘keyword’ AND
table11.attr10=table5a.attr10 AND
table11.attr40=‘keyword’ AND
table11.attr50=‘keyword’ AND
table2b.attr1=table1.attr8 AND
table2b.attr9 IN (‘keyword’) AND
table2b.attr2 LIKE ‘keyword’% AND
table12.attr9 IN (‘keyword’) AND
table7b.attr1=table2a.attr10 AND
table3c.attr13=table10c.attr1 AND
table3c.attr10=table6b.attr20 AND
table3c.attr13=‘keyword’ AND
table10b.attr16=table10a.attr7 AND
table10b.attr11=table7b.attr8 AND
table10b.attr13=table4b.attr89 AND
table13.attr1=table2b.attr10 AND
table13.attr20=’‘keyword’’ AND
table13.attr15=‘keyword’ AND
table3d.attr49=table12.attr18 AND
table3d.attr18=table10c.attr11 AND
table3d.attr14=‘keyword’ AND
table4d.attr17 IN (‘keyword’) AND
table4d.attr19 IN (‘keyword’) AND
table16.attr28=table11.attr56 AND
table16.attr16=table10b.attr78 AND
table16.attr5=table14.attr56 AND
table4e.attr34 IN (‘keyword’) AND
table4e.attr48 IN (‘keyword’) AND
table4f.attr89=table5b.attr7 AND
table4f.attr45 IN (‘keyword’) AND
table4f.attr1=‘keyword’ AND
table10c.attr2=table4e.attr19 AND
(table10c.attr78=table12.attr56 OR
(table10c.attr55 IS NULL AND
table12.attr17 IS NULL))
44. [Same user query and SQL translation as slide 43]
50M€ per year
44
57. OBDA
Main components
57
[Figure: the ontology-based data integration framework — queries are posed over the ontology and results are returned over it]
• Ontology / conceptual model: provides the global vocabulary and a conceptual view
• Mappings: semantically link the sources and the ontology
• Data sources: external and heterogeneous
We achieve logical transparency in accessing data: the user does not know where and how the data is stored, and can only see a conceptual view of the data.
58. OBDA
Main technologies
58
[Same OBDA framework figure as slide 57, annotated with concrete technologies:]
• Data sources: SQL schemas (or other technologies)
• Ontology / conceptual model: OWL 2 QL / UML class diagrams; through the ontology, the data are exposed as a virtual knowledge graph of RDF triples
• Mappings: R2RML
• Queries: SPARQL
59. ontop-vkg.org
• State-of-the-art OBDA system
• Compliant with RDF(S), OWL2 QL, R2RML, SPARQL
• Supports all major relational DBs (Oracle, SQL Server, Postgres, …)
• Support for other data storage mechanisms ongoing (MongoDB,…)
• Development started in 2009
• Wide adoption in academia and industry
• At the basis of https://ontopic.biz
59
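As a concrete, illustrative example of this logical transparency: once an ontology and mappings are in place (as for the conference example revisited on the next slides), an analyst can ask for all paper creations and their timestamps directly in SPARQL, and ontop rewrites and unfolds the query into SQL over the legacy tables on the fly. The prefix and vocabulary below are assumptions for the sketch, not prescribed by the tool:

PREFIX : <http://example.org/confsys#>

SELECT ?creation ?when
WHERE {
  ?creation a :Creation ;
            :uploadTime ?when .
}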
60. Conference Example: Conceptual Schema
60
[Fig. 9: Data model of our CONFSYS running example — classes Paper (title, type) with subclass DecidedPaper (decTime, accepted); Person (pName, regTime); Conference (cName, crTime); Assignment (invTime); Review (subTime); Submission (uploadTime) with subclasses Creation and CRUpload; associations: Submission and Assignment relate Papers and Persons (reified, with timestamps), leadsTo relates Assignments and Reviews, submittedTo relates Papers and Conferences, notifiedBy relates DecidedPapers and Persons, chairs relates Persons and Conferences, with the multiplicities shown in the figure.]
N.B.: in onprom we use DL-LiteA (supports a controlled form of functionality)
61. Behind the scene…
61
The UML class diagram is encoded as a DL-LiteA ontology (δ denotes the domain of an attribute, ρ its range, ∃R the domain of role R and ∃R⁻ its range, and (funct ·) declares functionality):
δ(title) ≡ Paper, ρ(title) ⊑ String, (funct title)
δ(type) ≡ Paper, ρ(type) ⊑ String, (funct type)
δ(decTime) ≡ DecidedPaper, ρ(decTime) ⊑ ts, (funct decTime)
δ(accepted) ≡ DecidedPaper, ρ(accepted) ⊑ Boolean, (funct accepted)
δ(pName) ≡ Person, ρ(pName) ⊑ String, (funct pName)
δ(regTime) ≡ Person, ρ(regTime) ⊑ ts, (funct regTime)
δ(cName) ≡ Conference, ρ(cName) ⊑ String, (funct cName)
δ(crTime) ≡ Conference, ρ(crTime) ⊑ ts, (funct crTime)
δ(uploadTime) ≡ Submission, ρ(uploadTime) ⊑ ts, (funct uploadTime)
δ(invTime) ≡ Assignment, ρ(invTime) ⊑ ts, (funct invTime)
δ(subTime) ≡ Review, ρ(subTime) ⊑ ts, (funct subTime)
DecidedPaper ⊑ Paper, Creation ⊑ Submission, CRUpload ⊑ Submission
∃Submission₁ ≡ Submission, ∃Submission₁⁻ ≡ Paper, (funct Submission₁)
∃Submission₂ ≡ Submission, ∃Submission₂⁻ ⊑ Person, (funct Submission₂)
∃Assignment₁ ≡ Assignment, ∃Assignment₁⁻ ⊑ Paper, (funct Assignment₁)
∃Assignment₂ ≡ Assignment, ∃Assignment₂⁻ ⊑ Person, (funct Assignment₂)
∃leadsTo ⊑ Assignment, ∃leadsTo⁻ ≡ Review, (funct leadsTo), (funct leadsTo⁻)
∃submittedTo ≡ Paper, ∃submittedTo⁻ ⊑ Conference, (funct submittedTo)
∃notifiedBy ≡ DecidedPaper, ∃notifiedBy⁻ ⊑ Person, (funct notifiedBy)
∃chairs ⊑ Person, ∃chairs⁻ ≡ Conference, (funct chairs⁻)
[Fig. 9: CONFSYS data model, as on slide 60]
Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Obviously, since, due to reification, the ontology alphabet may contain additional symbols with respect to those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram …
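A minimal sketch of how a few of these axioms could be written in OWL 2 QL (Turtle syntax); the namespace is illustrative, and functionality assertions as well as the mandatory-participation direction of the ≡ axioms are omitted for brevity:

@prefix :     <http://example.org/confsys#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# DecidedPaper ⊑ Paper, Creation ⊑ Submission, CRUpload ⊑ Submission
:DecidedPaper rdfs:subClassOf :Paper .
:Creation     rdfs:subClassOf :Submission .
:CRUpload     rdfs:subClassOf :Submission .

# δ(uploadTime) ⊑ Submission, ρ(uploadTime) ⊑ ts
:uploadTime a owl:DatatypeProperty ;
            rdfs:domain :Submission ;
            rdfs:range  xsd:dateTime .

# ∃submittedTo ⊑ Paper, ∃submittedTo⁻ ⊑ Conference
:submittedTo a owl:ObjectProperty ;
             rdfs:domain :Paper ;
             rdfs:range  :Conference .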
62. Mapping Example
62
[Fig. 9: CONFSYS data model, as on slide 60, and Fig. 11: the DB schema, as on slide 20]
Example 10. Consider the CONFSYS running example, and an information system whose DB schema R consists of the eight relational tables shown in Figure 11. We give some examples of mapping assertions:
– The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid retrieved through the source query. This mapping expresses that each value in SUBMISSION identified by oid and such that its upload time equals the corresponding paper’s creation time, is mapped to an object :submission/oid, which becomes an instance of concept Creation in T.
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
AND SUBMISSION.UPLOADTIME = PAPER.CT
:submission/{oid} rdf:type :Creation .
– The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and instantiates also their features title and type with values of type String.
SELECT ID, title, type …
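In onprom/ontop, mapping assertions like the one above are expressed in the W3C R2RML language mentioned on slide 58. A minimal sketch of how the Creation assertion could be rendered in R2RML (the prefixes and IRIs are illustrative):

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix :   <http://example.org/confsys#> .

:CreationMapping a rr:TriplesMap ;
    # source: the SQL query of the mapping assertion
    rr:logicalTable [ rr:sqlQuery """
        SELECT DISTINCT SUBMISSION.ID AS oid
        FROM SUBMISSION, PAPER
        WHERE SUBMISSION.PAPER = PAPER.ID
          AND SUBMISSION.UPLOADTIME = PAPER.CT
    """ ] ;
    # target: each retrieved oid becomes an instance of :Creation
    rr:subjectMap [
        rr:template "http://example.org/confsys#submission/{oid}" ;
        rr:class :Creation
    ] .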
68. Theoretical Results
68
[Figure: (a) the 2-level OBDA framework — data are mapped to a domain schema, which is transformed into an upper schema; a query Q posed over the upper schema is turned into a query Q’ over the domain schema and answered via OBDA; (b) the 2OBDA framework instantiated for service management, where UFO-S is used to identify services and commitments and to inspect contract states via OBDA.]
69. Case study: reference model
69
[Figure: the 2OBDA framework instantiated with a reference model — data are mapped to a domain schema; UFO-S is used to identify services and commitments; contract states are inspected via OBDA.]
70. Case study: process mining!
70
[Figure: the 2OBDA framework instantiated for process mining — data are mapped to a domain schema; cases and events are identified in terms of the event log format; cases and events are then fetched via OBDA and handed to a process mining tool.]
72. Annotating the Conceptual Schema
Fix the perspective: declare the case
• Find the class whose instances are considered as case objects
• Express additional filters
Find the events (looking for timestamps)
• Find the classes whose instances refer to events
• Declare how they are connected to the corresponding case objects → navigation in the UML class diagram
• Declare how they are (in)directly related to event attributes (timestamp, task name, optionally event type and resource) → navigation in the UML class diagram
72
73. Conference Example
Case Annotation
73
[Fig. 9: the CONFSYS data model, annotated — the class Paper carries the Case annotation; the event annotations shown on the next slides navigate back to it.]
74. Conference Example
Case Annotation
74
[Same annotated data model as slide 73]
75. Conference Example
Event annotation
75
[Fig. 9: the CONFSYS data model, annotated with event annotations; each event declares its timestamp attribute and the navigation path to its case (a Paper):]
• Event Submission — timestamp: uploadTime; case: Submission₁
• Event Review — timestamp: subTime; case: leadsTo → Assignment₁
• Event Creation — timestamp: uploadTime; case: Submission → Submission₁
• Event Decision — timestamp: decTime; case: Paper
76. Conference Example
Event annotation
76
[Same annotated data model as slide 75]
77. 77
(Same annotated CONFSYS data model as on the previous slides.)
78. Switching Perspective
Simply amounts to redefining the annotations:
• Flow of accepted papers
• Flow of full papers
• Flow of reviews
• Flow of authors
• Flow of reviewers
• ….
78
80. Formalizing Annotations
Annotations are nothing but SPARQL queries over the conceptual data schema!
• Case annotation: query retrieving case objects
• Event annotation: query retrieving event objects
• Case-attribute annotation: query retrieving pairs <attribute, case>
• Event-attribute annotation: query retrieving pairs <attribute, event>
80
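For instance, and just as a sketch that is not taken from the original material, switching to the "flow of accepted papers" perspective mentioned on slide 78 could amount to replacing the case annotation query with one over DecidedPaper. The property name :accepted is assumed here to mirror the accepted attribute of the data model, analogously to :uploadTime in the later examples:

PREFIX : <http://www.example.com/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?case
WHERE {
  ?case rdf:type :DecidedPaper .
  ?case :accepted "true"^^xsd:boolean .
}

The event and attribute annotations could stay as they are, or be adapted, depending on the perspective of interest.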
81. 81
Fig. 16: Annotated data model of our CONFSYS running example (the excerpt shows, among others, the event annotations for Review and Creation).
Each event annotation comes with case, timestamp, and activity annotations, respectively used to capture the relationship between the event and its corresponding case(s), timestamp, and activity. As pointed out before, the timestamp annotation needs to have a functional navigation. This also applies to the activity annotation, except that, instead of providing a functional navigation, the activity annotation is given as a constant string that independently fixes the activity name. Beyond these mandatory attributes, additional optional attribute annotations can be defined over the various standard extensions provided by XES, e.g., to place the event within the activity transactional lifecycle, or to indicate the resource name and/or role.
The case annotation is formalised by the following SPARQL query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?case
WHERE {
?case rdf:type :Paper .
}
which retrieves all instances of the Paper class.
Event annotations are also tackled using SPARQL SELECT queries with a single an-
swer variable, this time matching with actual event identifiers, i.e., objects denoting
occurrences of events.
Example 14. Consider the event annotation for creation, as shown in Figure 16. The
actual events for this annotation are retrieved using the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent
WHERE {
?creationEvent rdf:type :Creation .
}
which in fact returns all instances of the Creation class.
Attribute annotations are formalised using SPARQL SELECT queries with two answer
variables, establishing a relation between events and their corresponding attribute val-
ues. In this light, for timestamp and activity attribute annotations, the second answer
variable will be substituted by corresponding values for timestamps/activity names. For
case attribute annotations, instead, the second answer variable will be substituted by
case objects, thus establishing a relationship between events and the case(s) they be-
long to.
Example 15. Consider again the annotation for creation events, as shown in Figure 16.
The relationship between creation events and their corresponding timestamps is estab-
lished by the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent ?creationTime
WHERE {
?creationEvent rdf:type :Creation .
?creationEvent :uploadTime ?creationTime .
}
which indeed retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
82. Annotations and XES Elements
Annotations can be easily “mapped” onto XES elements:
case annotation query —> traces
event annotation query —> events
attribute annotation query —> trace/event attributes with given key
82
(Conceptual event schema used as the target of the extraction: classes Trace, Event, and Attribute (attKey, attType, attValue: String), connected by the associations t-contains-e, t-has-a, and e-has-a.)
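To make this correspondence concrete, here is a minimal, hand-written XES fragment (a sketch, not produced by onprom; the timestamp value is invented) showing a trace with one event carrying concept:name and time:timestamp attributes:

<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <extension name="Concept" prefix="concept" uri="http://www.xes-standard.org/concept.xesext"/>
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
  <trace>
    <string key="concept:name" value="Submission1"/>
    <event>
      <string key="concept:name" value="Creation"/>
      <date key="time:timestamp" value="2017-05-02T10:30:00.000+01:00"/>
    </event>
  </trace>
</log>

In these terms, the case annotation query yields the <trace> elements, the event annotation query the <event> elements, and the attribute annotation queries the <string>/<date> attributes attached to them.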
83. Conference Example:
Case Annotation
83
(Annotated CONFSYS data model, as in Fig. 16.)
The case annotation query shown above returns all instances of the Paper class; its answers become the XES traces. The event annotation query for Creation (Example 14) returns all instances of the Creation class, and its answer variable provides the XES event identifiers:
XES events:
- id: ?creationEvent
The attribute annotation query (Example 15) additionally returns, for each creation event, the value taken by its uploadTime attribute; each such pair is turned into a XES timestamp attribute attached to the event:
XES attribute:
- key: timestamp extension
- type: milliseconds
- value: ?creationTime
- parent event: ?creationEvent
84. Rewriting Annotations
Annotations are nothing else than SPARQL queries over the conceptual data
schema
84
They can be automatically reformulated as SQL
queries over the legacy data
We automatically get a standard OBDA mapping
from the legacy data to the XES concepts
85. 85
This makes it possible to view the legacy information system as a (virtual) XES event log, and also to actually materialise such an event log. Technically, onprom takes as input an onprom model P = ⟨I, T, M, L⟩ and the event schema E, and produces a new OBDA system ⟨I, M^E_P, E⟩, where the annotations in L are automatically reformulated as OBDA mappings M^E_P that directly link I to E. Such mappings are synthesised using the three-step approach described next.
In the first step, the SPARQL queries formalising the annotations in L are reformulated into corresponding SQL queries posed directly over I. This is done by relying on standard query rewriting and unfolding, where each SPARQL query q ∈ Lq is rewritten considering the contribution of the conceptual data schema T, and then unfolded using the mappings in M. The resulting query qsql can then be posed directly over I so as to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all so-obtained SQL queries as Lsql.
Example 16. Consider the SPARQL query in Example 13, formalising the event anno-
tation that accounts for the creation of papers. A possible reformulation of the rewriting
and unfolding of such a query respectively using the conceptual data schema in Fig-
ure 9, and the mappings from Example 10, is the following SQL query:
SELECT DISTINCT
CONCAT(’http://www.example.com/submission/’,Submission."ID")
AS "creationEvent"
FROM Submission, Paper
WHERE Submission."Paper" = Paper."ID" AND
Submission."UploadTime" = Paper."CT" AND
Submission."ID" IS NOT NULL
This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also compact and fast to process by a standard DBMS.
1. For each SQL query q(c) ∈ Lsql obtained from a case annotation, we insert into M^E_P the following OBDA mapping:
   q(c)
   :trace/{c} rdf:type :Trace .
   Intuitively, such a mapping populates the concept Trace in E with the case objects that are created from the answers returned by query q(c).
2. For each SQL query q(e) ∈ Lsql that is obtained from an event annotation, we insert into M^E_P the following OBDA mapping:
   q(e)
   :event/{e} rdf:type :Event .
   Intuitively, such a mapping populates the concept Event in E with the event objects that are created from the answers returned by query q(e).
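Purely as an illustration (not taken from the paper or from the tool's actual output), such an entry of M^E_P for the Creation events might be written in ontop's native .obda mapping syntax roughly as follows, pairing the event mapping pattern above with the SQL query of Example 16; the mapping id is invented, and in onprom these mappings are generated automatically rather than written by hand:

[PrefixDeclaration]
:       http://www.example.com/

[MappingDeclaration] @collection [[
mappingId   creation-events
target      :event/{creationEvent} a :Event .
source      SELECT DISTINCT CONCAT('http://www.example.com/submission/', Submission."ID") AS "creationEvent" FROM Submission, Paper WHERE Submission."Paper" = Paper."ID" AND Submission."UploadTime" = Paper."CT" AND Submission."ID" IS NOT NULL
]]

Analogous mappings would link the attribute annotation queries to the Attribute concept of the event schema.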
86. Recap
86
Fig. 15: Sketch of the onprom model. Components: the database D, conforming to the db schema R (together the information system I); the mapping specification M, linking R to the conceptual data schema T (I, M, and T form the OBDA model B); the event-data annotations L, which annotate T and point to the conceptual event schema E; and the log mapping specification M^E_P, which is generated automatically (shown dashed in the figure).
87. Querying the “Virtual Log”
SPARQL queries over the event schema are answered using legacy data
• Example: get empty and nonempty traces; for nonempty traces, also fetch all their events
Answers can be serialised into a fully compliant XES log!
87
The following query is instead meant to retrieve (elementary) attributes, considering
in particular their key, type, and value.
PREFIX : <http://www.example.org/>
SELECT DISTINCT ?att ?attType ?attKey ?attValue
WHERE {
?att rdf:type :Attribute;
:attType ?attType;
:attKey ?attKey;
:attVal ?attValue.
}
The following query handles the retrieval of empty and nonempty traces, simulta-
neously obtaining, for nonempty traces, their constitutive events:
PREFIX : <http://www.example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?trace ?event
WHERE {
?trace a :Trace .
OPTIONAL {
?trace :t-contain-e ?event .
?event :e-contain-a ?timestamp .
?timestamp :attKey "time:timestamp"^^xsd:string .
?event :e-contain-a ?name .
?name :attKey "concept:name"^^xsd:string .
}
}
88. The onprom Toolchain
Implementation of all the described steps using
• Java (GUIs, algorithms)
• OWL 2 QL plus functionality (conceptual schemas)
• ontop (OBDA system)
• OpenXES (XES serialisation and manipulation)
• ProM process mining framework (environment)
88
89. onprom UML Editor
89
Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our running example.
90. onprom Annotation Editor
90
Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case
91. onprom Log Extractor
91
Fig. 20: Screenshot of the Log Extractor plug-in in ProM 6.6.
92. Experiments
• Very encouraging initial experiments
• Carried out using synthetic data
• We are looking for real case studies!
92
96. Conclusions
• Process Mining as a way to reconcile model-driven management with real behaviours
• Data preparation is an issue in the presence of legacy data
• Ontology-Based Data Access: solid theoretical basis with optimised implementations
• onprom as an effective toolchain for extracting event logs from legacy databases
• Several simplified settings can emerge depending on the context: fixed ERP schema, reference models, …
96
97. Future Work
• Conceptual Modeling
• How to improve the discovery of events?
• How to semi-automatically propose events to the user?
• How to integrate methodologies and results from formal ontology?
• Engineering
• How to handle different types of data?
• How to deal with different event schemas that go beyond XES?
• How to generalise the approach to handle rich ontology-to-ontology-mappings?
97