Marco Montali Free University of Bozen-Bolzano, Italy

credits: Diego Calvanese, Tahir Emre Kalayci, Ario Santoso, Wil van der Aalst
From legacy data to event data
1
My research in one slide
I investigate foundational and applied techniques grounded in artificial intelligence for modelling, verification, execution, monitoring, and mining of dynamic systems operating over data, with a specific focus on business process management and multiagent systems.
2
How to attack these challenges?
Artificial Intelligence: knowledge representation, automated reasoning, multiagent systems
Information Systems: business process management, master data management, decision management
Formal Methods: infinite-state systems, verification, Petri nets
Data Science: process mining
3
Business Process Management
4
Business Process Management
5
Processes leave breadcrumbs…
6
Processes leave digital breadcrumbs…
Organisational level:
• Internal management
• Calculation of process metrics/KPIs
• Legal reasons (compliance, external audits)
Personal level:
• We live in a digital society!
• Social networks, sensors, cyberphysical systems, mobile devices are all data loggers
7
[Figure: the BPM lifecycle — (re)design, configure/deploy, enact/monitor, diagnose/get requirements, adjust — connecting data and models, IT support, and reality, carried out by (knowledge) workers and managers/analysts, with effort split roughly 50%/50% between data and models.]
10
PM²
[Eck et al., CAiSE 2015]
[Figure: the six PM² stages and their inputs/outputs. Initialization — 1. Planning and 2. Extraction turn research questions and the information system's event data into event logs. Analysis iterations — 3. Data processing, 4. Mining & analysis (discovery, conformance, enhancement), and 5. Evaluation produce process models, analytic models, performance findings, compliance findings, and refined/new research questions. 6. Process improvement & support turns findings into improvement ideas. Business experts and process analysts are involved throughout.]
11
12
Practice…
[Word cloud: Camunda, ERP, Signavio, document-driven, EPCs, GSM, BPMN, CMMN, case management, legacy systems, CRM, E-R, Bizagi, Aris, UML, artifact-centric, SAP, Bonita]
IEEE standard XES
www.xes-standard.org
IEEE Standard for the representation of event logs
• Based on XML
• Minimal mandatory structure: a log consists of traces, each representing the history of a case; a trace consists of a list of atomic events
• Extensions to "decorate" log, trace, and event with informative attributes: timestamps, task names, transactional lifecycle, resources, additional event data
• Supports "meta-level" declarations useful for log processors
13
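The minimal structure above (log → traces → events, each decorated with attributes) can be read with the standard library alone. A hedged sketch follows; the sample document and the helper name read_log are invented for illustration, while the tag and key names (trace, event, string, concept:name) follow the XES standard.

```python
# Minimal XES reading sketch: a log contains traces, a trace contains
# atomic events; names live in "string" attributes with key concept:name.
import xml.etree.ElementTree as ET

SAMPLE = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="create paper"/>
    </event>
    <event>
      <string key="concept:name" value="submit paper"/>
    </event>
  </trace>
</log>"""

def read_log(xml_text):
    """Return a list of traces; each trace is (case id, [activity names])."""
    root = ET.fromstring(xml_text)
    traces = []
    for trace in root.iter("trace"):
        # the trace-level concept:name attribute identifies the case
        case_id = next((a.get("value") for a in trace.findall("string")
                        if a.get("key") == "concept:name"), None)
        events = [a.get("value")
                  for event in trace.findall("event")
                  for a in event.findall("string")
                  if a.get("key") == "concept:name"]
        traces.append((case_id, events))
    return traces

print(read_log(SAMPLE))  # [('1', ['create paper', 'submit paper'])]
```

Real logs would be parsed from files (ET.parse) and carry many more attributes; the point is only that the mandatory structure is tiny.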
Full XES Schema
[UML class diagram of the full XES schema: a Log (logFeatures, logVersion) contains Traces (1..*), which contain Events; Log, Trace, and Event carry Attributes (attKey, attType), disjointly specialised into ElementaryAttribute (attValue) and CompositeAttribute (which may contain further attributes); attributes belong to Extensions (extName, extPrefix, extUri); a Log declares GlobalTraceAttributes and GlobalEventAttributes, and TraceClassifiers/EventClassifiers (name) defined by those global attributes.]
14
Core XES schema
[Fig. 13: Core event schema — a Trace contains Events (t-contains-e, 1..*/0..*); Traces and Events carry Attributes (attKey, attType, attValue) via t-has-a and e-has-a.]
15
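The core schema is small enough to render directly in code. A hedged in-memory sketch follows: the class names and the attKey/attType/attValue fields mirror Fig. 13, while the constructor shapes and field names (attributes, events) are our own choices, not part of the standard.

```python
# In-memory rendering of the core XES schema: Trace contains Events,
# and both carry Attributes (attKey, attType, attValue).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Attribute:
    attKey: str
    attType: str
    attValue: str

@dataclass
class Event:
    attributes: List[Attribute] = field(default_factory=list)  # e-has-a

@dataclass
class Trace:
    attributes: List[Attribute] = field(default_factory=list)  # t-has-a
    events: List[Event] = field(default_factory=list)          # t-contains-e

t = Trace(
    attributes=[Attribute("concept:name", "string", "1")],
    events=[Event([Attribute("concept:name", "string", "create paper")])],
)
print(t.attributes[0].attValue, len(t.events))  # 1 1
```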
<log xes.version="1.0"
     xes.features="nested-attributes"
     openxes.version="1.0RC7">
  <extension name="Time"
             prefix="time"
             uri="http://www.xes-standard.org/time.xesext"/>
  <classifier name="Event Name" keys="concept:name"/>
  <string key="concept:name" value="XES Event Log"/>
  ...
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="create paper"/>
      <int key="Event ID" value="35654424"/>
      ...
    </event>
    <event>
      ...
      <string key="concept:name" value="submit paper"/>
      ...
    </event>
    ...
16
A simple process
Apologies for being so predictable…
[Fig. 2: The process for managing papers in a simplified conference submission system — create paper (author) → submit paper (author) → assign reviewer (chair) → review paper (reviewer) → submit review (reviewer) → take decision (chair) → accept? gateway: Y → accept paper (chair) → upload camera ready (author); N → reject paper (chair).]
17
The lucky situation
[Fig. 2 (repeated): the paper-management process; gray tasks are external to the conference information system and cannot be logged.]
Example 1. As a running example, we consider a simplified conference submission system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors, reviewers, and conference chairs in the submission of papers to conferences, the consequent review process, and the final decision about paper acceptance or rejection. Figure 2 shows the process control flow considering papers as case objects. Under this perspective, the management of a single paper evolves through the following execution steps. First, the paper is created by one of its authors, and submitted to a conference available in the system. Once the paper is submitted, the review phase for that paper starts. This phase of the process consists of a so-called multi-instance section, i.e., a section of the process where the same set of activities is instantiated multiple times on […]
Event Data
Case ID | ID       | Timestamp        | Activity      | User   | ...
1       | 35654423 | 30-12-2010:11.02 | create paper  | Pete   | ...
1       | 35654424 | 31-12-2010:10.06 | submit paper  | Pete   | ...
1       | 35654425 | 05-01-2011:15.12 | assign review | Mike   | ...
1       | 35654426 | 06-01-2011:11.18 | submit review | Sara   | ...
1       | 35654428 | 07-01-2011:14.24 | accept paper  | Mike   | ...
1       | 35654429 | 06-01-2011:11.18 | upload CR     | Pete   | ...
2       | 35654483 | 30-12-2010:11.32 | create paper  | George | ...
2       | 35654485 | 30-12-2010:12.12 | submit paper  | John   | ...
2       | 35654487 | 30-12-2010:14.16 | assign review | Mike   | ...
2       | 35654489 | 16-01-2011:10.30 | submit review | Ellen  | ...
2       | 35654490 | 18-01-2011:12.05 | reject paper  | Mike   | ...
18
The common case
[Fig. 11: DB schema for the information system of the conference submission system; primary keys are underlined and foreign keys shown in italic in the original:
CONFERENCE(ID, name, organizer, time)
PAPER(ID, title, CT, user, conf, type, status)
SUBMISSION(ID, uploadtime, user, paper)
REVIEWREQUEST(ID, invitationtime, reviewer, paper)
REVIEW(ID, RRid, submissiontime)
DECISION(ID, decisiontime, chair, outcome)
ACCEPTANCE(ID, uploadtime, user, paper)
LOGIN(ID, user, CT)]
Intuitively, mapping assertions involving such atoms are used to map source relations (and the tuples they store) to concepts, roles, and features of the ontology (and the objects and the values that constitute their instances), respectively. Note that for a feature atom, the type of values retrieved from the source database is not specified, and needs to be determined based on the data type of the variable v2 in the source query φ(~x).
Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. Some example mapping assertions are the following ones:
1. SELECT DISTINCT SUBMISSION.ID AS oid
   FROM SUBMISSION, PAPER
   WHERE SUBMISSION.PAPER = PAPER.ID
     AND SUBMISSION.UPLOADTIME = PAPER.CT
20
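Mapping assertion 1 above selects the submissions whose upload time coincides with the paper's creation time, i.e., the submission events that correspond to paper creation. A hedged sketch of running it on a toy instance follows; the column subsets and all data values are invented, only the query text is from the slide.

```python
# Demo of mapping assertion 1 on an invented fragment of the CONFSYS
# schema: SUBMISSION rows whose UPLOADTIME equals the paper's creation
# time (PAPER.CT) identify the creating submission.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PAPER (ID INTEGER, TITLE TEXT, CT TEXT);
CREATE TABLE SUBMISSION (ID INTEGER, UPLOADTIME TEXT, USER TEXT, PAPER INTEGER);
-- paper 7 is created at 10:00; its first submission shares that
-- timestamp, a later revision does not
INSERT INTO PAPER VALUES (7, 'A paper', '2010-12-30 10:00');
INSERT INTO SUBMISSION VALUES (101, '2010-12-30 10:00', 'Pete', 7);
INSERT INTO SUBMISSION VALUES (102, '2010-12-31 09:30', 'Pete', 7);
""")

rows = conn.execute("""
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
  AND SUBMISSION.UPLOADTIME = PAPER.CT
""").fetchall()
print(rows)  # [(101,)] -- only the creating submission matches
```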
Intertwined objects
[Figure: events plotted along time, each involving data objects and activities.]
23
Intertwined objects
[Figure 1: Structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages. An Order includes many Items (1–*); each Item is carried in one Package (*–1). Instances: orders o1, o2, o3; items i1,1, i1,2, i2,1, i2,2, i2,3, i3,1; packages p1, p2, p3. A fragment of the overall event log, with one column per order, is shown.]
Have you ever placed orders online?
24
Flattening reality
[Figure 1 (repeated): the order/item/package structure and the event log, now with focus on orders.]
25
Flattening Reality
[Figure: the same events along time, separated into one trace per case — order o1, order o2, order o3.]
26
The effect of flattening
[Figure 1 (fragment): packages p1, p2, p3 carrying items from different orders.]
Event log for orders:
timestamp           | overall log                    | order o1        | order o2        | order o3
2019-09-22 10:00:00 | create order o1                | create order    |                 |
2019-09-22 10:01:00 | add item i1,1 to order o1      | add item        |                 |
2019-09-23 09:20:00 | create order o2                |                 | create order    |
2019-09-23 09:34:00 | add item i2,1 to order o2      |                 | add item        |
2019-09-23 11:33:00 | create order o3                |                 |                 | create order
2019-09-23 11:40:00 | add item i3,1 to order o3      |                 |                 | add item
2019-09-23 12:27:00 | pay order o3                   |                 |                 | pay order
2019-09-23 12:32:00 | add item i1,2 to order o1      | add item        |                 |
2019-09-23 13:03:00 | pay order o1                   | pay order       |                 |
2019-09-23 14:34:00 | load item i1,1 into package p1 | load item       |                 |
2019-09-23 14:45:00 | add item i2,2 to order o2      |                 | add item        |
2019-09-23 14:51:00 | load item i3,1 into package p1 |                 |                 | load item
2019-09-23 15:12:00 | add item i2,3 to order o2      |                 | add item        |
2019-09-23 15:41:00 | pay order o2                   |                 | pay order       |
2019-09-23 16:23:00 | load item i2,1 into package p2 |                 | load item       |
2019-09-23 16:29:00 | load item i1,2 into package p2 | load item       |                 |
2019-09-23 16:33:00 | load item i2,2 into package p2 |                 | load item       |
2019-09-23 17:01:00 | send package p1                | send package    |                 | send package
2019-09-24 06:38:00 | send package p2                | send package    | send package    |
2019-09-24 07:33:00 | load item i2,3 into package p3 |                 | load item       |
2019-09-24 08:46:00 | send package p3                |                 | send package    |
2019-09-24 16:21:00 | deliver package p1             | deliver package |                 | deliver package
2019-09-24 17:32:00 | deliver package p2             | deliver package | deliver package |
2019-09-24 18:52:00 | deliver package p3             |                 | deliver package |
2019-09-24 18:57:00 | accept delivery p3             |                 | accept delivery |
2019-09-25 08:30:00 | deliver package p1             | deliver package |                 | deliver package
2019-09-25 08:32:00 | accept delivery p1             | accept delivery |                 | accept delivery
2019-09-25 09:55:00 | deliver package p2             | deliver package | deliver package |
2019-09-25 17:11:00 | deliver package p2             | deliver package | deliver package |
2019-09-25 17:12:00 | accept delivery p2             | accept delivery | accept delivery |
27
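The flattening step itself is mechanical: pick a case notion (here, orders) and copy each event into the trace of every case object it relates to. A hedged sketch on a small fragment of the running example follows; the helper name flatten and the event representation (activity plus the set of related objects, with item/package relations already resolved to orders) are our own simplification.

```python
# Flattening an object-centric event list on the Order case notion:
# an event touching a shared package (p1 carries items of o1 and o3)
# is replicated into the trace of every related order.
from collections import defaultdict

raw_log = [  # (activity, objects the event relates to, orders included)
    ("create order",    {"o1"}),
    ("create order",    {"o3"}),
    ("load item",       {"i1,1", "p1", "o1"}),
    ("load item",       {"i3,1", "p1", "o3"}),
    ("send package",    {"p1", "o1", "o3"}),  # shared package
    ("deliver package", {"p1", "o1", "o3"}),  # replicated on flattening
]

def flatten(log, cases):
    traces = defaultdict(list)
    for activity, objs in log:
        for case in objs & cases:  # one copy per related case object
            traces[case].append(activity)
    return dict(traces)

traces = flatten(raw_log, {"o1", "o2", "o3"})
print(traces["o1"])  # ['create order', 'load item', 'send package', 'deliver package']
```

Note how "deliver package" ends up in both the o1 and o3 traces: one physical delivery is counted twice, which is exactly the replication effect discussed next.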
Discovery?
[Figure: directly-follows graph with frequencies, produced by the well-known Disco process mining tool from the flattened order log: create order (3), add item (6), pay order (3), load item (6), send package (5), deliver package (11), accept delivery (5), with arc frequencies between them.]
[Clipped paper excerpt, partially reconstructed: discovering a process model that explains the behavior requires flattening the raw log, where a case notion is chosen and a flat view of the log is computed per case object. With Order as the case notion, the flat trace for a given order contains the events that directly refer to that order, to an item included in it, or to a package carrying one of its items. Two undesired effects consequently arise:
1. Replication of tasks — an event related to multiple case objects is replicated in the traces of all of them; in our scenario, events on a shared package refer to both order o1 and order o3 and appear in both traces.
2. Shuffling of independent threads — events applied to different objects related to the same case object are shuffled together, so one cannot distinguish to which actual object each event refers (e.g., which item is added, or which delivery attempt an accept delivery correlates to).
These effects pollute the discovered model with misleading information and apparent behaviour: the directly-follows graph discovered by Disco misleadingly indicates that deliver package occurred 11 times, a number derived from packages carrying objects of different orders.]
non-existing loop
wrong statistics
31
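The spurious loop and the inflated statistics can be reproduced in a few lines: compute a directly-follows graph with frequencies over flattened traces. A hedged sketch follows; the two abbreviated traces are invented stand-ins for flattened order traces in which a replicated "deliver package" directly follows itself.

```python
# Directly-follows graph with frequencies over flattened traces:
# replication of delivery events yields a deliver->deliver arc, i.e.,
# a self-loop that does not exist in the real process.
from collections import Counter

flattened_traces = [
    ["send package", "deliver package", "deliver package", "accept delivery"],
    ["send package", "deliver package", "deliver package", "accept delivery"],
]

dfg = Counter()
for trace in flattened_traces:
    for a, b in zip(trace, trace[1:]):  # count each directly-follows pair
        dfg[(a, b)] += 1

print(dfg[("deliver package", "deliver package")])  # 2 -> spurious self-loop
```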
Level | Characterization | Examples
★★★★★ | Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies. Events and their attributes point to this ontology. | Semantically annotated logs of BPM systems.
★★★★ | Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner. | Event logs of traditional BPM/workflow systems.
★★★ | Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa). | Tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.
★★ | Events are recorded automatically, i.e., as a by-product of some information system. Coverage varies, i.e., no systematic approach is followed to decide which events are recorded. Moreover, it is possible to bypass the information system. Hence, events may be missing or not recorded properly. | Event logs of document and product management systems, error logs of embedded systems, worksheets of service engineers, etc.
★ | Lowest level: event logs are of poor quality. Recorded events may not correspond to reality and events may be missing. Event logs for which events are recorded by hand typically have such characteristics. | Trails left in paper documents routed through the organization ("yellow notes"), paper-based medical records, etc.
32
Level 4-5: straightforward syntactic manipulation
Level 3: much more difficult
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and events
• Processes with one-to-many and many-to-many relations
33
Not covered today, but:
• Recent works by Dirk Fahland, Wil van der Aalst, my group
• https://pais.hse.ru/en/seminar-pne/
• https://multiprocessmining.org
• http://ocel-standard.org
34
Extracting XES from legacy data
[___,BIS2017]
Manual construction of views and ETL procedures to fetch the data
Done by IT experts, not by knowledge workers (domain experts)
Traditional Methodology
[Figure: flowchart of the traditional methodology — create data model → choose perspective → extract relevant tables → design views with relevant attributes → design composite views → design log view → export to XES/CSV → do process mining; loop back on "other perspective?" (Y/N).]
[Clipped excerpt on log extraction and process mining:] Finally, EBITmax converted the log view into a CSV file, and analysed it using the Disco process mining toolkit.
35
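The hand-crafted "log view" at the heart of this methodology is typically a UNION of one SQL query per activity, later exported to CSV. A hedged sketch follows: the table and column names follow the CONFSYS schema of Fig. 11, but the PAPER column on DECISION, the view name LOGVIEW, and all data values are invented for the demo.

```python
# Traditional methodology in miniature: hand-written SQL unions one
# query per activity into a flat log view, then exports it to CSV.
import csv, io, sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUBMISSION (ID INTEGER, UPLOADTIME TEXT, USER TEXT, PAPER INTEGER);
CREATE TABLE DECISION (ID INTEGER, DECISIONTIME TEXT, CHAIR TEXT, OUTCOME TEXT, PAPER INTEGER);
INSERT INTO SUBMISSION VALUES (101, '2010-12-31 10:06', 'Pete', 1);
INSERT INTO DECISION VALUES (301, '2011-01-07 14:24', 'Mike', 'accept', 1);

-- one UNION branch per activity: this is the hand-crafted part that
-- must be redesigned whenever the perspective changes
CREATE VIEW LOGVIEW AS
  SELECT PAPER AS case_id, UPLOADTIME AS ts,
         'submit paper' AS activity, USER AS resource
  FROM SUBMISSION
  UNION ALL
  SELECT PAPER, DECISIONTIME, 'take decision', CHAIR FROM DECISION;
""")

rows = conn.execute("SELECT * FROM LOGVIEW ORDER BY case_id, ts").fetchall()
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["case_id", "timestamp", "activity", "resource"])
writer.writerows(rows)
print(out.getvalue())
```

The fragility is visible even at this scale: every new activity, attribute, or perspective means editing the view by hand, which motivates the correctness and maintenance concerns on the next slide.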
Extracting XES from legacy data
[___,BIS2017]
Crucial issues:
• Correctness: who knows? Process mining is dangerous if applied to wrong data
• Maintenance, evolution, and change of perspective are hard… but process mining should be highly interactive
Traditional Methodology
[Figure: the same flowchart of the traditional methodology — create data model → choose perspective → extract relevant tables → design views → design composite views → design log view → export to XES/CSV → do process mining.]
36
The onprom approach
onprom.inf.unibz.it
Semantic technologies to:
1. Understand the data
2. Access the data using the domain vocabulary
3. Express the perspective for process mining using the domain vocabulary
4. Automatise the extraction of XES event logs
37
34 D. Calvanese et al.
The onprom methodology (flowchart): if no high-level conceptual model of the IS exists, create a conceptual data schema and mappings; otherwise, bootstrap model + mappings and enrich them. Then choose a perspective, create event-data annotations, get XES/CSV, and do process mining; if another perspective is needed, loop back.
Fig. 12: The onprom methodology and its four phases
the same time generating (identity) mappings to link the two specifications. The result of bootstrapping can then be manually refined.
Once the first phase is completed, process analysts and the other involved stakeholders no longer need to consider the structure of the legacy information system…
Step 1. Understand the data
38
Ontology-Based Data Access
(aka Virtual Knowledge Graphs)
39
Data access is becoming a bottleneck
Optique project: Scalable, End-User Access to Big Data (http://optique-project.eu)

One case study: Statoil

• geologists and engineers develop models of unexplored areas based on
drilling operations done in surrounding sites

Crompton (2008): domain experts spend (too much) time fetching data for decision making and doing their job

• Engineers in the oil/gas sector: 30–70% of working time spent on data access and data quality
40
Facts on Statoil
• 1000 TB of relational data (SQL)

• Non-aligned schemas, each with 2K+ tables

• 900 experts within “Statoil Exploration”

• Up to 4 days needed to express queries and translate them into SQL
41
Example of query
42
OBDI framework Query answering Ontology languages Mappings Identity Conclusions
How much time/money is spent searching for data?
A user query at Statoil
Show all norwegian wellbores with some aditional attributes
(wellbore id, completion date, oldest penetrated age,result). Limit
to all wellbores with a core and show attributes like (wellbore id,
core number, top core depth, base core depth, intersecting
stratigraphy). Limit to all wellbores with core in Brentgruppen and
show key atributes in a table. After connecting to EPDS (slegge)
we could for instance limit futher to cores in Brent with measured
permeability and where it is larger than a given value, for instance 1
mD. We could also find out whether there are cores in Brent which
are not stored in EPDS (based on NPD info) and where there could
be permeability values. Some of the missing data we possibly own,
other not.
Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (5/52)
43
A user query at Statoil (query repeated from the previous slide), and its translation into SQL:
SELECT [...]
FROM
db_name.table1 table1,
db_name.table2 table2a,
db_name.table2 table2b,
db_name.table3 table3a,
db_name.table3 table3b,
db_name.table3 table3c,
db_name.table3 table3d,
db_name.table4 table4a,
db_name.table4 table4b,
db_name.table4 table4c,
db_name.table4 table4d,
db_name.table4 table4e,
db_name.table4 table4f,
db_name.table5 table5a,
db_name.table5 table5b,
db_name.table6 table6a,
db_name.table6 table6b,
db_name.table7 table7a,
db_name.table7 table7b,
db_name.table8 table8,
db_name.table9 table9,
db_name.table10 table10a,
db_name.table10 table10b,
db_name.table10 table10c,
db_name.table11 table11,
db_name.table12 table12,
db_name.table13 table13,
db_name.table14 table14,
db_name.table15 table15,
db_name.table16 table16
WHERE [...]
table2a.attr1=‘keyword’ AND
table3a.attr2=table10c.attr1 AND
table3a.attr6=table6a.attr3 AND
table3a.attr9=‘keyword’ AND
table4a.attr10 IN (‘keyword’) AND
table4a.attr1 IN (‘keyword’) AND
table5a.kinds=table4a.attr13 AND
table5b.kinds=table4c.attr74 AND
table5b.name=‘keyword’ AND
(table6a.attr19=table10c.attr17 OR
(table6a.attr2 IS NULL AND
table10c.attr4 IS NULL)) AND
table6a.attr14=table5b.attr14 AND
table6a.attr2=‘keyword’ AND
(table6b.attr14=table10c.attr8 OR
(table6b.attr4 IS NULL AND
table10c.attr7 IS NULL)) AND
table6b.attr19=table5a.attr55 AND
table6b.attr2=‘keyword’ AND
table7a.attr19=table2b.attr19 AND
table7a.attr17=table15.attr19 AND
table4b.attr11=‘keyword’ AND
table8.attr19=table7a.attr80 AND
table8.attr19=table13.attr20 AND
table8.attr4=‘keyword’ AND
table9.attr10=table16.attr11 AND
table3b.attr19=table10c.attr18 AND
table3b.attr22=table12.attr63 AND
table3b.attr66=‘keyword’ AND
table10a.attr54=table7a.attr8 AND
table10a.attr70=table10c.attr10 AND
table10a.attr16=table4d.attr11 AND
table4c.attr99=‘keyword’ AND
table4c.attr1=‘keyword’ AND
table11.attr10=table5a.attr10 AND
table11.attr40=‘keyword’ AND
table11.attr50=‘keyword’ AND
table2b.attr1=table1.attr8 AND
table2b.attr9 IN (‘keyword’) AND
table2b.attr2 LIKE ‘keyword’% AND
table12.attr9 IN (‘keyword’) AND
table7b.attr1=table2a.attr10 AND
table3c.attr13=table10c.attr1 AND
table3c.attr10=table6b.attr20 AND
table3c.attr13=‘keyword’ AND
table10b.attr16=table10a.attr7 AND
table10b.attr11=table7b.attr8 AND
table10b.attr13=table4b.attr89 AND
table13.attr1=table2b.attr10 AND
table13.attr20=‘keyword’ AND
table13.attr15=‘keyword’ AND
table3d.attr49=table12.attr18 AND
table3d.attr18=table10c.attr11 AND
table3d.attr14=‘keyword’ AND
table4d.attr17 IN (‘keyword’) AND
table4d.attr19 IN (‘keyword’) AND
table16.attr28=table11.attr56 AND
table16.attr16=table10b.attr78 AND
table16.attr5=table14.attr56 AND
table4e.attr34 IN (‘keyword’) AND
table4e.attr48 IN (‘keyword’) AND
table4f.attr89=table5b.attr7 AND
table4f.attr45 IN (‘keyword’) AND
table4f.attr1=‘keyword’ AND
table10c.attr2=table4e.attr19 AND
(table10c.attr78=table12.attr56 OR
(table10c.attr55 IS NULL AND
table12.attr17 IS NULL))
44
(same user query and SQL translation repeated)
50M€ per year
45
(figure, progressive build over slides 46–53) Without OBDA, the geologist sits on the domain side and the data on the IT side. The geologist's info request cannot be posed directly over the data; an IT expert, using domain knowledge, translates it into SQL queries, collects the SQL answers, integrates them, and finally hands the answer back to the geologist.
53
(figure, progressive build over slides 54–56) With OBDA, a knowledge engineer uses the OBDA suite to build a knowledge graph over the data and a mapping between the two. The geologist now poses the info request directly in domain terms; the Ontop engine uses the knowledge graph and the mapping to compute the answer automatically.
OBDA
Main components
57
Ontology-based data integration framework:

• Ontology: provides the global vocabulary and a conceptual view

• Mappings: semantically link the sources to the ontology

• Data sources: external and heterogeneous

We achieve logical transparency in accessing data: the user does not know where and how the data is stored, and can only see a conceptual view of the data.
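The logical transparency above can be illustrated with a minimal unfolding step: in a GAV setting, each ontology concept is associated with an SQL query over the sources, and a conceptual query is answered by running the mapped SQL. All names below (the `wb` table, the `Wellbore` concepts, the `ask` helper) are invented for this sketch; real OBDA engines such as Ontop do far more (reasoning, query rewriting, optimization).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE wb(id INTEGER, country TEXT);
INSERT INTO wb VALUES (10, 'NO'), (11, 'UK');
""")

# GAV mappings: ontology concept -> SQL query producing its instances.
mappings = {
    "NorwegianWellbore": "SELECT id FROM wb WHERE country = 'NO'",
    "Wellbore": "SELECT id FROM wb",
}

def ask(concept):
    """Answer the conceptual query 'all instances of concept' by unfolding
    it into the SQL defined by the mapping: the user never sees the schema."""
    return [row[0] for row in con.execute(mappings[concept])]

print(ask("NorwegianWellbore"))  # -> [10]
```

The geologist asks for `NorwegianWellbore`; the where-and-how of the storage stays entirely behind the mapping.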
OBDA
Main technologies
58
(same framework figure, annotated with technologies)

• Data sources: SQL (or other technologies), described by a schema

• Ontology / conceptual model: OWL2 QL / UML class diagrams

• Virtual knowledge graph: RDF triples

• Mappings: R2RML

• Queries: SPARQL
ontop-vkg.org
• State-of-the-art OBDA system

• Compliant with RDF(S), OWL2 QL, R2RML, SPARQL

• Supports all major relational DBMSs (Oracle, SQL Server, Postgres, …)

• Support for other data storage mechanisms is ongoing (MongoDB, …)

• Development started in 2009

• Wide adoption in academia and industry

• At the basis of https://ontopic.biz
59
Conference Example: Conceptual Schema
60
[Fig. 9: Data model of our CONFSYS running example — a UML class diagram with classes Paper (title, type), Person (pName, regTime), Conference (cName, crTime), Assignment (invTime), Submission (uploadTime) with subclasses Creation and CRUpload, DecidedPaper (decTime, accepted) as a subclass of Paper, Review (subTime), and associations submittedTo, chairs, notifiedBy, leadsTo]
N.B.: in onprom we use DL-LiteA (which supports a controlled form of functionality)
Behind the scene…
61
δ(title) ≡ Paper   ρ(title) ⊑ string   (funct title)
δ(type) ≡ Paper   ρ(type) ⊑ string   (funct type)
δ(decTime) ≡ DecidedPaper   ρ(decTime) ⊑ ts   (funct decTime)
δ(accepted) ≡ DecidedPaper   ρ(accepted) ⊑ boolean   (funct accepted)
δ(pName) ≡ Person   ρ(pName) ⊑ string   (funct pName)
δ(regTime) ≡ Person   ρ(regTime) ⊑ ts   (funct regTime)
δ(cName) ≡ Conference   ρ(cName) ⊑ string   (funct cName)
δ(crTime) ≡ Conference   ρ(crTime) ⊑ ts   (funct crTime)
δ(uploadTime) ≡ Submission   ρ(uploadTime) ⊑ ts   (funct uploadTime)
δ(invTime) ≡ Assignment   ρ(invTime) ⊑ ts   (funct invTime)
δ(subTime) ≡ Review   ρ(subTime) ⊑ ts   (funct subTime)
DecidedPaper ⊑ Paper   Creation ⊑ Submission   CRUpload ⊑ Submission
∃Submission1 ≡ Submission   ∃Submission1⁻ ≡ Paper   (funct Submission1)
∃Submission2 ≡ Submission   ∃Submission2⁻ ⊑ Person   (funct Submission2)
∃Assignment1 ≡ Assignment   ∃Assignment1⁻ ⊑ Paper   (funct Assignment1)
∃Assignment2 ≡ Assignment   ∃Assignment2⁻ ⊑ Person   (funct Assignment2)
∃leadsTo ⊑ Assignment   ∃leadsTo⁻ ≡ Review   (funct leadsTo)   (funct leadsTo⁻)
∃submittedTo ≡ Paper   ∃submittedTo⁻ ⊑ Conference   (funct submittedTo)
∃notifiedBy ≡ DecidedPaper   ∃notifiedBy⁻ ⊑ Person   (funct notifiedBy)
∃chairs ⊑ Person   ∃chairs⁻ ≡ Conference   (funct chairs⁻)
[Fig. 9 (repeated): Data model of our CONFSYS running example]
Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Obviously, since, due to reification, the ontology alphabet may contain additional symbols with respect to those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram…
Mapping Example
62
[Fig. 9 (repeated): Data model of our CONFSYS running example]
Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. We give some examples of mapping assertions:
– The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid retrieved through the source query. This mapping expresses that each value in SUBMISSION identified by oid and such that its upload time equals the corresponding paper's creation time is mapped to an object :submission/oid, which becomes an instance of concept Creation in T.

SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
AND SUBMISSION.UPLOADTIME = PAPER.CT

:submission/{oid} rdf:type :Creation .

– The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and also instantiates their features title and type with values of type String.

SELECT ID, title, type

Tables of the CONFSYS database schema (Figure 11):
ACCEPTANCE(ID, uploadtime, user, paper)
CONFERENCE(ID, name, organizer, time)
DECISION(ID, decisiontime, chair, outcome)
LOGIN(ID, user, CT)
SUBMISSION(ID, uploadtime, user, paper)
PAPER(ID, title, CT, user, conf, type, status)
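A mapping assertion of the kind shown above can be mimicked in a few lines: the source SQL selects the oids, and the URI template plus target atom turn each row into an RDF-style triple. This is a toy materialization over a cut-down SUBMISSION/PAPER database, not the onprom implementation (which keeps the triples virtual).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PAPER(ID INTEGER, CT TEXT);
CREATE TABLE SUBMISSION(ID INTEGER, PAPER INTEGER, UPLOADTIME TEXT);
INSERT INTO PAPER VALUES (7, '2017-03-01');
INSERT INTO SUBMISSION VALUES (100, 7, '2017-03-01'), (101, 7, '2017-03-09');
""")

# Source part of the mapping assertion (as in Example 10).
SOURCE = """
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
  AND SUBMISSION.UPLOADTIME = PAPER.CT
"""

# Target part: :submission/{oid} rdf:type :Creation .
triples = [(f":submission/{oid}", "rdf:type", ":Creation")
           for (oid,) in con.execute(SOURCE)]
print(triples)  # only submission 100 matches its paper's creation time
```

Only the submission whose upload time coincides with the paper's creation time becomes an instance of Creation, exactly as the assertion prescribes.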
Abstraction Layers
63
(layered figure) Data → Domain Model → Reference Models, connected by spaghetti mappings
64
(layered figure) Data → Domain Model → Reference Models, supporting data analysis, reporting, KPIs, and query answering
From OBDA to 2-level OBDA
[___,EKAW2018]
65
66
(2-level OBDA pipeline)
• data: relational DB
• map: GAV mappings (SQL query -> atom)
• domain schema: UML class diagram / OWL2 QL TBox
• transform: transformation rules (ontology-to-ontology GAV mappings)
• upper schema: UML class diagram / OWL2 QL TBox
• query/answer: UCQs
From OBDA to 2-level OBDA
[___,EKAW2018]
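The two mapping levels in the pipeline above compose: upper-schema concepts unfold into domain concepts, which in turn unfold into SQL. A minimal sketch of this composition idea, with invented table and concept names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE submission(id INTEGER);
CREATE TABLE review(id INTEGER);
INSERT INTO submission VALUES (1), (2);
INSERT INTO review VALUES (9);
""")

# Level 1: domain concepts -> SQL (classic GAV mappings).
domain = {
    "Submission": "SELECT id FROM submission",
    "Review": "SELECT id FROM review",
}

# Level 2: upper concepts -> domain concepts (ontology-to-ontology GAV).
upper = {"Event": ["Submission", "Review"]}

def ask_upper(concept):
    """Answer an upper-schema query by unfolding through both mapping levels."""
    ids = []
    for d in upper[concept]:
        ids += [row[0] for row in con.execute(domain[d])]
    return sorted(ids)

print(ask_upper("Event"))  # -> [1, 2, 9]
```

The theoretical results on the next slides establish that such a composed rewriting (Q over the upper schema into Q' over the domain schema) is sound and complete for the mapping languages used.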
Theoretical Results
67
(figure: in 2-level OBDA, a query Q over the upper schema is transformed into a query Q' over the domain schema, and answered via OBDA over the data)
Theoretical Results
68
(same figure repeated: Q over the upper schema, Q' over the domain schema; (a) 2-level OBDA, (b) 2OBDA framework for service monitoring)
Case study: reference model
69
(from: Conceptual Schema Transformation in Ontology-based Data Access)
(a) 2-level OBDA: data → map → domain schema → transform → upper schema → query/answer (OBDA)
(b) 2OBDA framework instantiated with UFO-S as upper schema: data → map → domain schema → identify services and commitments → UFO-S → inspect contract states (OBDA)
Case study: process mining!
70
(from: Conceptual Schema Transformation in Ontology-based Data Access)
(c) 2OBDA framework for process mining: data → map → domain schema → identify cases and events → event log format → fetch cases and events with a process mining tool (OBDA)
Step 2. Find the event data
71
Annotating the Conceptual Schema
Fix perspective: declare the case

• Find the class whose instances are considered as case objects

• Express additional filters

Find the events (looking for timestamps)

• Find the classes whose instances refer to events

• Declare how they are connected to corresponding case objects —> navigation in the UML class diagram

• Declare how they are (in)directly related to event attributes (timestamp, task name, optionally event type and resource) —> navigation in the UML class diagram
72
Conference Example
Case Annotation
73
[Fig.: Annotated CONFSYS data model — Paper is declared as the Case class; event annotations: Event Submission (timestamp: uploadTime, case: Submission1), Event Review (timestamp: subTime, case: leadsTo → Assignment1), Event Creation (timestamp: uploadTime, case: Submission → Submission1), Event Decision (timestamp: decTime, case: Paper)]
Conference Example
Case Annotation
74
[annotated CONFSYS data model, repeated]
Conference Example
Event annotation
75
[annotated CONFSYS data model, repeated]
Conference Example
Event annotation
76
[annotated CONFSYS data model, repeated]
77
[annotated CONFSYS data model, repeated]
Switching Perspective
Simply amounts to redefining the annotations
• Flow of accepted papers

• Flow of full papers

• Flow of reviews

• Flow of authors

• Flow of reviewers

• ….
78
Step 3. Get your log, automatically
79
Formalizing Annotations
Annotations are nothing but SPARQL queries over the conceptual data schema!

• Case annotation: query retrieving case objects

• Event annotation: query retrieving event objects

• Case-attribute annotation: query retrieving pairs <attribute, case>

• Event-attribute annotation: query retrieving pairs <attribute, event>
80
81
[Fig. 16: Annotated data model of our CONFSYS running example — the CONFSYS diagram with event annotations for Review (timestamp: subTime, case: leadsTo → Assignment1) and Creation (timestamp: uploadTime, case: Submission → Submission1)]
annotations, respectively used to capture the relationship between the event and its corresponding case(s), timestamp, and activity. As pointed out before, the timestamp anno…
SELECT DISTINCT ?case
WHERE {
?case rdf:type :Paper .
}
which retrieves all instances of the Paper class.
Event annotations are also tackled using SPARQL SELECT queries with a single an-
swer variable, this time matching with actual event identifiers, i.e., objects denoting
occurrences of events.
Example 14. Consider the event annotation for creation, as shown in Figure 16. The
actual events for this annotation are retrieved using the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent
WHERE {
?creationEvent rdf:type :Creation .
}
which in fact returns all instances of the Creation class.
Attribute annotations are formalised using SPARQL SELECT queries with two answer
variables, establishing a relation between events and their corresponding attribute val-
ues. In this light, for timestamp and activity attribute annotations, the second answer
variable will be substituted by corresponding values for timestamps/activity names. For
case attribute annotations, instead, the second answer variable will be substituted by
case objects, thus establishing a relationship between events and the case(s) they be-
long to.
Example 15. Consider again the annotation for creation events, as shown in Figure 16.
The relationship between creation events and their corresponding timestamps is estab-
lished by the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent ?creationTime
WHERE {
?creationEvent rdf:type :Creation .
?creationEvent :Submission1 ?Paper .
?creationEvent :uploadTime ?creationTime .
}
which retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
Annotations and XES Elements
Annotations can be easily "mapped" onto XES elements:
• case annotation query → traces
• event annotation query → events
• attribute annotation query → trace/event attributes with a given key
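As a minimal illustration of this correspondence (a sketch under made-up inputs, not the onprom implementation), answers of hypothetical case, event, and attribute annotation queries can be assembled into an XES-like structure:

```python
# Sketch: assembling annotation-query answers into an XES-like structure.
# All input tuples below are hypothetical query answers, chosen for illustration.

def assemble_log(case_answers, event_answers, case_of, attrs):
    """case_answers: case objects (one trace each);
    event_answers: event objects;
    case_of: pairs (event, case) from the case-attribute annotation query;
    attrs: pairs (event, (key, value)) from attribute annotation queries."""
    log = {c: [] for c in case_answers}       # trace -> list of events
    events = {e: {} for e in event_answers}   # event -> attribute map
    for e, (key, value) in attrs:
        events[e][key] = value
    for e, c in case_of:                      # an event may belong to several cases
        log[c].append((e, events[e]))
    return log

log = assemble_log(
    case_answers=["paper1"],
    event_answers=["creation1"],
    case_of=[("creation1", "paper1")],
    attrs=[("creation1", ("time:timestamp", "2017-03-01T10:00:00"))],
)
```

Each case object becomes a trace, each event object an event, and attribute-annotation answers attach key/value pairs to the right parent.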
OBDA for Log Extraction in Process Mining 35
[XES metamodel: Trace —t-contains-e (1..* to 0..*)→ Event; Trace —t-has-a→ Attribute; Event —e-has-a→ Attribute (each 0..* to 0..*); Attribute carries attKey, attType, and attValue, all of type String.]
Conference Example: Case Annotation
[This slide repeats Fig. 16, highlighting the case annotation over Paper.]
XES event:
• id: ?creationEvent

XES attribute:
• key: timestamp extension
• type: milliseconds
• value: ?creationTime
• parent event: ?creationEvent
Rewriting Annotations
Annotations are nothing else than SPARQL queries over the conceptual data schema.
They can be automatically reformulated as SQL queries over the legacy data.
We automatically get a standard OBDA mapping from the legacy data to the XES concepts.
In the first step, the SPARQL queries formalising the annotations in L are reformulated into corresponding SQL queries posed directly over I. This is done by relying on standard query rewriting and unfolding, where each SPARQL query q ∈ L is rewritten considering the contribution of the conceptual data schema T, and then unfolded using the mappings in M. The resulting query q_sql can then be posed directly over I so as to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all so-obtained SQL queries as L_sql.
Example 16. Consider the SPARQL query in Example 13, formalising the event annotation that accounts for the creation of papers. A possible result of rewriting and unfolding such a query, respectively using the conceptual data schema in Figure 9 and the mappings from Example 10, is the following SQL query:
SELECT DISTINCT
  CONCAT('http://www.example.com/submission/', Submission."ID")
  AS "creationEvent"
FROM Submission, Paper
WHERE Submission."Paper" = Paper."ID" AND
  Submission."UploadTime" = Paper."CT" AND
  Submission."ID" IS NOT NULL
This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also compact and fast to process by a standard DBMS.
1. For each SQL query q(c) ∈ L_sql obtained from a case annotation, we insert into M_P^E the following OBDA mapping:
   source: q(c)    target: :trace/{c} rdf:type :Trace .
   Intuitively, such a mapping populates the concept Trace in E with the case objects that are created from the answers returned by query q(c).
2. For each SQL query q(e) ∈ L_sql obtained from an event annotation, we insert into M_P^E the following OBDA mapping:
   source: q(e)    target: :event/{e} rdf:type :Event .
   Intuitively, such a mapping populates the concept Event in E with the event objects that are created from the answers returned by query q(e).
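The two synthesis rules above amount to simple template instantiation: each annotation query is paired with a target triple template. A schematic sketch follows; the function name and the (query, template) tuple format are illustrative, not onprom's actual API:

```python
# Sketch of the two mapping-synthesis rules. Each case/event annotation query
# becomes an OBDA mapping whose target populates Trace or Event in the event schema.

def synthesise_log_mappings(case_queries, event_queries):
    mappings = []
    for q in case_queries:
        # Rule 1: populate Trace with the case objects returned by q(c)
        mappings.append((q, ":trace/{c} rdf:type :Trace ."))
    for q in event_queries:
        # Rule 2: populate Event with the event objects returned by q(e)
        mappings.append((q, ":event/{e} rdf:type :Event ."))
    return mappings
```

Running the synthesised mappings in an OBDA engine then exposes the legacy data directly under the event schema E.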
This makes it possible to see the legacy information system as a XES event log, and also to actually materialise such an event log.
Technically, onprom takes as input an onprom model P = ⟨I, T, M, L⟩ and the event schema E, and produces a new OBDA system ⟨I, M_P^E, E⟩, where the annotations in L are automatically reformulated as OBDA mappings M_P^E that directly link I to E. Such mappings are synthesised using a three-step approach.
Recap
Fig. 15: Sketch of the onprom model. The information system I consists of a database D conforming to a db schema R; a mapping specification M links R to the conceptual data schema T; the event-data annotations L annotate T and point to the conceptual event schema E. Together these form the onprom model P, from which the (dashed) log mapping specification M_P^E is automatically synthesised, yielding the OBDA model B.
Querying the “Virtual Log”
SPARQL queries over the event schema are answered using legacy data

• Example: get empty and nonempty traces; for nonempty traces, also fetch all their events

Answers can be serialised into a fully compliant XES log!
The following query retrieves (elementary) attributes, considering in particular their key, type, and value.
PREFIX : <http://www.example.org/>
SELECT DISTINCT ?att ?attType ?attKey ?attValue
WHERE {
?att rdf:type :Attribute;
:attType ?attType;
:attKey ?attKey;
:attVal ?attValue.
}
The following query handles the retrieval of empty and nonempty traces, simultaneously obtaining, for nonempty traces, their constitutive events:
PREFIX : <http://www.example.org/>
SELECT DISTINCT ?trace ?event
WHERE {
  ?trace a :Trace .
  OPTIONAL {
    ?trace :t-contain-e ?event .
    ?event :e-contain-a ?timestamp .
    ?timestamp :attKey "time:timestamp"^^xsd:string .
    ?event :e-contain-a ?name .
    ?name :attKey "concept:name"^^xsd:string .
  }
}
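Serialising such answers boils down to grouping events under their traces. The following is a minimal stdlib sketch with simplified attribute handling (one timestamp per event), not the fully compliant XES writer used by onprom (e.g., it declares no extensions):

```python
import xml.etree.ElementTree as ET

def to_xes(trace_event_pairs):
    """Serialise (trace, event-timestamp) answer tuples into a minimal XES document.
    Traces paired with None stay empty, mirroring the OPTIONAL pattern above."""
    log = ET.Element("log", {"xes.version": "1.0"})
    traces = {}
    for trace_id, event_ts in trace_event_pairs:
        trace = traces.get(trace_id)
        if trace is None:
            trace = ET.SubElement(log, "trace")
            ET.SubElement(trace, "string", {"key": "concept:name", "value": trace_id})
            traces[trace_id] = trace
        if event_ts is not None:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": event_ts})
    return ET.tostring(log, encoding="unicode")

xml = to_xes([("t1", "2017-03-01T10:00:00"), ("t2", None)])
```

Here t1 becomes a trace with one event and t2 an empty trace, exactly the two cases the query distinguishes.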
4.6 The onprom Toolchain
onprom comes with a toolchain that supports the various phases of the methodology.
The onprom Toolchain
Implementation of all the described steps using

• Java (GUIs, algorithms)

• OWL 2 QL plus functionality (conceptual schemas)

• ontop (OBDA system)

• OpenXES (XES serialisation and manipulation)

• ProM process mining framework (environment)
onprom UML Editor
46 D. Calvanese et al.
Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our CONFSYS running example
onprom Annotation Editor
Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case
onprom Log Extractor
Fig. 20: Screenshot of the Log Extractor plug-in in ProM 6.6
Experiments
• Very encouraging initial experiments

• Carried out using synthetic data

• We are looking for real case studies!
Data Generation with CPN Tools
Results
Postgres
[Plots: running time (in milliseconds, up to ~800,000) against the number of extracted components in the XES log (up to ~10M), and against the number of tuples in the whole database (up to ~3.5M).]
~11 minutes to extract ~9M XES components from ~3.5M tuples
Conclusions
• Process mining as a way to reconcile model-driven management and real behaviours
• Data preparation is an issue in the presence of legacy data
• Ontology-based data access: solid theoretical basis with optimised implementations
• onprom as an effective toolchain for extracting event logs from legacy databases
• Several simplified settings can emerge depending on the context: fixed ERP schema, reference models, …
Future Work
• Conceptual modeling
  • How to improve the discovery of events?
  • How to semi-automatically propose events to the user?
  • How to integrate methodologies and results from formal ontology?
• Engineering
  • How to handle different types of data?
  • How to deal with different event schemas that go beyond XES?
  • How to generalise the approach to handle rich ontology-to-ontology mappings?
Thank you!

 
Processes and organizations - a look behind the paper wall
Processes and organizations - a look behind the paper wallProcesses and organizations - a look behind the paper wall
Processes and organizations - a look behind the paper wall
 
Formal modeling and SMT-based parameterized verification of Data-Aware BPMN
Formal modeling and SMT-based parameterized verification of Data-Aware BPMNFormal modeling and SMT-based parameterized verification of Data-Aware BPMN
Formal modeling and SMT-based parameterized verification of Data-Aware BPMN
 

Recently uploaded

Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 

Recently uploaded (20)

Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 

From legacy data to event data

  • 1. Marco Montali Free University of Bozen-Bolzano, Italy credits: Diego Calvanese, Tahir Emre Kalayci, Ario Santoso, Wil van der Aalst From legacy data to event data 1
  • 2. My research in one slide I investigate foundational and applied techniques grounded in artificial intelligence for modelling, verification, execution, monitoring, and mining of dynamic systems operating over data, with a specific focus on business process management and multiagent systems. 2
  • 3. How to attack these challenges? Artificial Intelligence: knowledge representation, automated reasoning, multiagent systems. Information Systems: business process management, master data management, decision management. Formal Methods: infinite-state systems, verification, Petri nets. Data Science: process mining. 3
  • 7. Processes leave digital breadcrumbs… Organisational level: • Internal management • Calculation of process metrics/KPIs • Legal reasons (compliance, external audits) Personal level: • We live in a digital society! • Social networks, sensors, cyberphysical systems, mobile devices are all data loggers 7
  • 8. [BPM lifecycle diagram] Data and models at the centre; phases: (re)design, configure/deploy, enact/monitor, adjust, diagnose/get reqs.; involving IT support, reality, (knowledge) workers, managers/analysts 8
  • 11. PM2 [Eck et al., CAiSE 2015] Initialization, then analysis iterations over six stages: 1. Planning, 2. Extraction, 3. Data processing, 4. Mining & Analysis (Discovery, Conformance, Enhancement), 5. Evaluation, 6. Process Improvement & Support. Stage outputs/inputs include: event data, event logs, process models, analytic models, performance findings, compliance findings, research questions and refined/new research questions, improvement ideas, the information system. Roles: business experts, process analysts. 11
  • 13. IEEE standard XES www.xes-standard.org IEEE Standard for the representation of event logs • Based on XML • Minimal mandatory structure: a log consists of traces, each representing the history of a case; a trace consists of a list of atomic events • Extensions to “decorate” log, trace, event with informative attributes: timestamps, task names, transactional lifecycle, resources, additional event data • Supports “meta-level” declarations useful for log processors 13
  • 14. Full XES Schema [UML class diagram: a Log (logFeatures, logVersion) contains Traces and Events; a Trace contains Events; Log, Trace, and Event carry Attributes (attKey, attType), which are either elementary (attValue) or composite (containing further attributes, disjointly); Extensions (extName, extPrefix, extUri) declare the attributes they are used by; at log level, global trace/event attributes are declared, and trace/event Classifiers (name) are defined over them] 14
  • 15. Core XES schema [Fig. 13: core event schema. A Trace contains one or more Events; Traces and Events carry Attributes (attKey, attType, attValue)] 15
  • 16.
<log xes.version="1.0" xes.features="nested-attributes" openxes.version="1.0RC7">
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
  <classifier name="Event Name" keys="concept:name"/>
  <string key="concept:name" value="XES Event Log"/>
  ...
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="create paper"/>
      <int key="Event ID" value="35654424"/>
      ...
    </event>
    <event>
      ...
      <string key="concept:name" value="submit paper"/>
      ...
    </event>
    ...
16
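A document with this core XES structure can be assembled with any XML library. The following is a minimal sketch using only the Python standard library; it emits just the mandatory trace/event skeleton with `concept:name` and `time:timestamp` attributes (the helper name `make_xes` and the sample values are illustrative, not part of the standard):

```python
import xml.etree.ElementTree as ET

def make_xes(traces):
    """Build a minimal XES document from {case_id: [(activity, timestamp), ...]}.
    Only the core structure is emitted: a log of traces, each a list of events."""
    log = ET.Element("log", {"xes.version": "1.0"})
    ET.SubElement(log, "extension", {
        "name": "Time", "prefix": "time",
        "uri": "http://www.xes-standard.org/time.xesext"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string",
                      {"key": "concept:name", "value": case_id})
        for activity, ts in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string",
                          {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date",
                          {"key": "time:timestamp", "value": ts})
    return ET.tostring(log, encoding="unicode")

xml_doc = make_xes({"1": [("create paper", "2010-12-30T11:02:00"),
                          ("submit paper", "2010-12-31T10:06:00")]})
print(xml_doc)
```

For real-world use, dedicated libraries (e.g., OpenXES in Java) also handle global attributes, classifiers, and the full extension mechanism sketched on the previous slides.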
  • 17. A simple process. Apologies for being so predictable… Tasks (performers): create paper (author), submit paper (author), assign reviewer (chair), review paper (reviewer), submit review (reviewer), take decision (chair); accept? Y: accept paper (chair), then upload camera ready (author); N: reject paper (chair). Fig. 2: The process for managing papers in a simplified conference submission system. 17
  • 18. The lucky situation. (Same process as in Fig. 2; gray tasks are external to the conference information system and cannot be logged.) Example 1. As a running example, we consider a simplified conference submission system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors, reviewers, and conference chairs in the submission of papers to conferences, the consequent review process, and the final decision about paper acceptance or rejection. Figure 2 shows the process control flow considering papers as case objects. Under this perspective, the management of a single paper evolves through the following execution steps. First, the paper is created by one of its authors, and submitted to a conference available in the system. Once the paper is submitted, the review phase for that paper starts. This phase of the process consists of a so-called multi-instance section, i.e., a section of the process where the same set of activities is instantiated multiple times on… Event data:
Case ID | Event ID | Timestamp | Activity | User
1 | 35654423 | 30-12-2010:11.02 | create paper | Pete
1 | 35654424 | 31-12-2010:10.06 | submit paper | Pete
1 | 35654425 | 05-01-2011:15.12 | assign review | Mike
1 | 35654426 | 06-01-2011:11.18 | submit review | Sara
1 | 35654428 | 07-01-2011:14.24 | accept paper | Mike
1 | 35654429 | 06-01-2011:11.18 | upload CR | Pete
2 | 35654483 | 30-12-2010:11.32 | create paper | George
2 | 35654485 | 30-12-2010:12.12 | submit paper | John
2 | 35654487 | 30-12-2010:14.16 | assign review | Mike
2 | 35654489 | 16-01-2011:10.30 | submit review | Ellen
2 | 35654490 | 18-01-2011:12.05 | reject paper | Mike
18
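In this lucky situation, turning the flat event table into per-case traces is a matter of grouping on the case identifier and sorting by timestamp. A small sketch (rows abbreviated from the table; the function name `to_traces` is just illustrative):

```python
from collections import defaultdict
from datetime import datetime

# (case_id, event_id, timestamp, activity, user), as in the table above;
# note the rows are deliberately not pre-sorted
rows = [
    ("1", 35654424, "31-12-2010:10.06", "submit paper", "Pete"),
    ("1", 35654423, "30-12-2010:11.02", "create paper", "Pete"),
    ("2", 35654483, "30-12-2010:11.32", "create paper", "George"),
    ("2", 35654485, "30-12-2010:12.12", "submit paper", "John"),
]

def to_traces(rows):
    """Group events by case id and order each trace by timestamp."""
    traces = defaultdict(list)
    for case_id, event_id, ts, activity, user in rows:
        when = datetime.strptime(ts, "%d-%m-%Y:%H.%M")
        traces[case_id].append((when, activity, user))
    return {case: sorted(events) for case, events in traces.items()}

traces = to_traces(rows)
print([activity for _, activity, _ in traces["1"]])
```

This is exactly the grouping that log-extraction tools perform implicitly once a case notion is fixed; the harder cases on the following slides arise when no single case column exists.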
  • 20. The common case. Fig. 11: DB schema for the information system of the conference submission system (primary keys underlined and foreign keys in italic in the original figure): ACCEPTANCE(ID, uploadtime, user, paper), CONFERENCE(ID, name, organizer, time), DECISION(ID, decisiontime, chair, outcome), LOGIN(ID, user, CT), SUBMISSION(ID, uploadtime, user, paper), PAPER(ID, title, CT, user, conf, type, status), REVIEW(ID, RRid, submissiontime), REVIEWREQUEST(ID, invitationtime, reviewer, paper). Intuitively, mapping assertions involving such atoms are used to map source relations (and the tuples they store) to concepts, roles, and features of the ontology (and the objects and the values that constitute their instances), respectively. Note that for a feature atom, the type of values retrieved from the source database is not specified, and needs to be determined based on the data type of the variable v2 in the source query. Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. Some example mapping assertions are the following ones: 1. SELECT DISTINCT SUBMISSION.ID AS oid FROM SUBMISSION, PAPER WHERE SUBMISSION.PAPER = PAPER.ID AND SUBMISSION.UPLOADTIME = PAPER.CT 20
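The source side of mapping assertion 1 is plain SQL, so its effect can be checked directly: it selects submissions whose upload time coincides with the paper's creation time, i.e., first submissions. A runnable sketch over an in-memory SQLite copy of the two tables involved (the row contents are invented purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PAPER (ID INTEGER PRIMARY KEY, title TEXT, CT TEXT,
                    user TEXT, conf TEXT, type TEXT, status TEXT);
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT,
                         user TEXT, paper INTEGER REFERENCES PAPER(ID));
-- hypothetical rows: submission 10 coincides with the paper's creation
-- time, submission 11 is a later re-upload of the same paper
INSERT INTO PAPER VALUES
    (1, 'A paper', '2011-01-05T15:12', 'pete', 'c1', 'full', 'ok');
INSERT INTO SUBMISSION VALUES (10, '2011-01-05T15:12', 'pete', 1);
INSERT INTO SUBMISSION VALUES (11, '2011-01-07T09:00', 'pete', 1);
""")

# Mapping assertion 1 from the slide: first submissions only
first_submissions = conn.execute("""
    SELECT DISTINCT SUBMISSION.ID AS oid
    FROM SUBMISSION, PAPER
    WHERE SUBMISSION.PAPER = PAPER.ID
      AND SUBMISSION.UPLOADTIME = PAPER.CT
""").fetchall()
print(first_submissions)
```

In the full OBDA setting, the identifiers retrieved by such a query would not be used directly but plugged into templates constructing the ontology-level objects and events.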
  • 24. Intertwined objects: time, data, activities. [Data model: Order 1–* includes Item; Item *–1 is carried in Package. Objects: orders o1, o2, o3; items i1,1, i1,2, i2,1, i2,2, i2,3, i3,1; packages p1, p2, p3.] Figure 1: Structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages. Event log excerpt (timestamp, overall log entry): 2019-09-22 10:00:00 create order o1; 2019-09-22 10:01:00 add item i1,1 to order o1; 2019-09-23 09:20:00 create order o2; 2019-09-23 09:34:00 add item i2,1 to order o2; 2019-09-23 11:33:00 create order o3; 2019-09-23 11:40:00 add item i3,1 to order o3; 2019-09-23 12:27:00 pay order o3. Have you ever placed orders online? 24
  • 25. Flattening reality. Same data model and event log excerpt as slide 24 (Figure 1: structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages), now with focus on orders: each event is projected onto the per-order log columns of the orders it relates to. 25
  • 27. The effect of flattening. Figure 1: Structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages. Event log for orders (columns: timestamp | overall log | per-order entries for o1, o2, o3):
2019-09-22 10:00:00 | create order o1 | create order
2019-09-22 10:01:00 | add item i1,1 to order o1 | add item
2019-09-23 09:20:00 | create order o2 | create order
2019-09-23 09:34:00 | add item i2,1 to order o2 | add item
2019-09-23 11:33:00 | create order o3 | create order
2019-09-23 11:40:00 | add item i3,1 to order o3 | add item
2019-09-23 12:27:00 | pay order o3 | pay order
2019-09-23 12:32:00 | add item i1,2 to order o1 | add item
2019-09-23 13:03:00 | pay order o1 | pay order
2019-09-23 14:34:00 | load item i1,1 into package p1 | load item
2019-09-23 14:45:00 | add item i2,2 to order o2 | add item
2019-09-23 14:51:00 | load item i3,1 into package p1 | load item
2019-09-23 15:12:00 | add item i2,3 to order o2 | add item
2019-09-23 15:41:00 | pay order o2 | pay order
2019-09-23 16:23:00 | load item i2,1 into package p2 | load item
2019-09-23 16:29:00 | load item i1,2 into package p2 | load item
2019-09-23 16:33:00 | load item i2,2 into package p2 | load item
2019-09-23 17:01:00 | send package p1 | send package, send package
2019-09-24 06:38:00 | send package p2 | send package, send package
2019-09-24 07:33:00 | load item i2,3 into package p3 | load item
2019-09-24 08:46:00 | send package p3 | send package
2019-09-24 16:21:00 | deliver package p1 | deliver package, deliver package
2019-09-24 17:32:00 | deliver package p2 | deliver package, deliver package
2019-09-24 18:52:00 | deliver package p3 | deliver package
2019-09-24 18:57:00 | accept delivery p3 | accept delivery
2019-09-25 08:30:00 | deliver package p1 | deliver package, deliver package
2019-09-25 08:32:00 | accept delivery p1 | accept delivery, accept delivery
2019-09-25 09:55:00 | deliver package p2 | deliver package, deliver package
2019-09-25 17:11:00 | deliver package p2 | deliver package, deliver package
2019-09-25 17:12:00 | accept delivery p2 | accept delivery, accept delivery
27
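The flattening step these slides illustrate can be sketched programmatically: each event lists the objects it refers to, and the flat trace for an order collects every event touching the order itself, one of its items, or a package carrying one of its items. A toy sketch (object identifiers follow the scenario; the two correlation tables are a simplifying assumption, not part of the original log):

```python
# Each event: (timestamp, activity, set of object ids it refers to)
events = [
    (1, "create order", {"o1"}),
    (2, "add item",     {"o1", "i1,1"}),
    (3, "create order", {"o2"}),
    (4, "add item",     {"o2", "i2,1"}),
    (5, "load item",    {"i1,1", "p1"}),
    (6, "load item",    {"i2,1", "p1"}),
    (7, "send package", {"p1"}),
]

# Assumed correlation tables: which order each item belongs to,
# and which package carries each item
item_order = {"i1,1": "o1", "i2,1": "o2"}
item_package = {"i1,1": "p1", "i2,1": "p1"}

def objects_of(order):
    """All objects correlated to an order: itself, its items, their packages."""
    items = {i for i, o in item_order.items() if o == order}
    packages = {item_package[i] for i in items}
    return {order} | items | packages

def flatten(order):
    """Flat trace: activities of every event touching the order's objects."""
    scope = objects_of(order)
    return [act for _, act, objs in sorted(events) if objs & scope]

print(flatten("o1"))
print(flatten("o2"))
# The single 'send package' on p1 ends up in BOTH flat traces: replication.
```

Running this shows the first undesired effect named on the next slide: the shared package event is replicated into the trace of every order whose items it carries.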
  • 30. Discovery? To discover a process model that explains the behavior, a notion of case must be applied to the raw log, and a flat view of the log is computed per case object. The figure shows the effect of this flattening when Order is the case notion: one trace per order is obtained by filtering the raw log, keeping for each order the events that directly refer to that order, to one of its items, or to a package carrying one of its items. Two undesired effects consequently arise: 1. Replication of tasks: when an event relates to multiple case objects, it is replicated in the traces of all such case objects (in our scenario, the events on package p2 refer to both order o1 and order o2 and thus appear in both traces). 2. Shuffling of independent threads: events of the same activity applied to different objects of the same type are shuffled together, making it impossible to distinguish to which actual object each event refers (e.g., which item is added to an order, or which delivery attempt an acceptance corresponds to). The result of these two undesired effects is a discovered model containing misleading information and apparent frequencies, clearly visible in the directly-follows graph with frequencies produced by the well-known Disco process mining tool on the order log of our scenario: create order 3, add item 6, pay order 3, load item 6, send package 5, deliver package 11, accept delivery 5. 30
  • 31. Discovery?
[Same figure and directly-follows graph as the previous slide, now annotated: the flattened order log makes Disco discover a non-existing loop and wrong statistics.]
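The flattening step and its two undesired effects are easy to reproduce in a few lines. The sketch below (my own illustration, not part of the talk) flattens a tiny object-centric log onto the Order case notion and computes activity frequencies and a directly-follows graph: the single event on a shared package is replicated into both order traces, inflating the counts exactly as in the Disco graph above.

```python
from collections import Counter

# A tiny object-centric log: each event may refer to several orders
# (e.g. a package carrying items of two different orders).
events = [
    {"act": "create order",    "time": 1, "orders": {"o1"}},
    {"act": "create order",    "time": 2, "orders": {"o2"}},
    {"act": "add item",        "time": 3, "orders": {"o1"}},
    {"act": "add item",        "time": 4, "orders": {"o2"}},
    {"act": "load item",       "time": 5, "orders": {"o1"}},        # into shared package p2
    {"act": "load item",       "time": 6, "orders": {"o2"}},        # into shared package p2
    {"act": "send package",    "time": 7, "orders": {"o1", "o2"}},  # p2 serves both orders
    {"act": "deliver package", "time": 8, "orders": {"o1", "o2"}},
]

def flatten(events, case):
    """Flat trace for one order: all events that (in)directly refer to it."""
    return [e["act"] for e in sorted(events, key=lambda e: e["time"])
            if case in e["orders"]]

traces = {o: flatten(events, o) for o in ("o1", "o2")}

# Replication: the single 'send package' event now occurs in both traces,
# so its frequency in the flattened log is 2, not 1.
freq = Counter(a for t in traces.values() for a in t)

# Directly-follows counts over the flattened traces (also doubled).
dfg = Counter((t[i], t[i + 1]) for t in traces.values() for i in range(len(t) - 1))
```

On this toy input `freq["send package"]` is 2 even though the package was sent once, which is precisely the kind of wrong statistic the annotated Disco graph exhibits.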
  • 32. Event log maturity levels (Level — Characterization — Examples):
★★★★★ — Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies; events and their attributes point to this ontology. Examples: semantically annotated logs of BPM systems.
★★★★ — Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner. Examples: event logs of traditional BPM/workflow systems.
★★★ — Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system: although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists, and vice versa). Examples: tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.
★★ — Events are recorded automatically, i.e., as a by-product of some information system. Coverage varies, i.e., no systematic approach is followed to decide which events are recorded. Moreover, it is possible to bypass the information system. Hence, events may be missing or not recorded properly. Examples: event logs of document and product management systems, error logs of embedded systems, worksheets of service engineers, etc.
★ — Lowest level: event logs are of poor quality. Recorded events may not correspond to reality and events may be missing. Event logs for which events are recorded by hand typically have such characteristics. Examples: trails left in paper documents routed through the organization ("yellow notes"), paper-based medical records, etc.
  • 33. Level 4-5: straightforward syntactic manipulation. Level 3: much more difficult:
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and events
• Processes with one-to-many and many-to-many relations
[Maturity-level table repeated from the previous slide.]
  • 34. [Previous slide repeated, with a pointer:] Not covered today, but see recent works by Dirk Fahland, Wil van der Aalst, and my group:
https://pais.hse.ru/en/seminar-pne/
https://multiprocessmining.org
http://ocel-standard.org
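The object-centric direction pointed at by ocel-standard.org avoids flattening altogether by letting each event reference several objects of different types. A minimal sketch of such a structure follows; the field names loosely echo the OCEL JSON serialisation but are illustrative, not the normative schema:

```python
# An object-centric event keeps its object references explicit instead of
# being copied into one trace per case object (illustrative structure only).
event = {
    "id": "e7",
    "activity": "send package",
    "timestamp": "2021-03-01T10:00:00",
    "omap": {"p2", "o1", "o2"},   # objects the event refers to
    "vmap": {"weight": 2.5},      # hypothetical event attribute map
}

# Object universe: identifier -> object type.
objects = {"p2": "Package", "o1": "Order", "o2": "Order"}

def objects_of_type(event, objects, otype):
    """Select the event's object references of a given type."""
    return {o for o in event["omap"] if objects[o] == otype}
```

With this representation, asking "which orders does this event touch?" is a lookup rather than a lossy flattening step.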
  • 35. Extracting XES from legacy data [___,BIS2017]
Manual construction of views and ETL procedures to fetch the data.
Done by IT experts, not by knowledge workers (domain experts).
[Figure: Traditional methodology — Create data model → Choose perspective → Extract relevant tables → Design views with relevant attributes → Design composite views → Design log view → Export to XES/CSV → Do process mining → Other perspective? (Y: repeat / N: done). Caption fragment on log extraction and process mining: "Finally, EBITmax converted the log view into a CSV file, and analysed it using the Disco process mining toolkit."]
  • 36. Extracting XES from legacy data [___,BIS2017]
Crucial issues:
• Correctness: who knows? Process mining is dangerous if applied to wrong data
• Maintenance, evolution, and change of perspective are hard… but process mining should be highly interactive
[Same methodology figure as the previous slide.]
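To make the traditional methodology concrete, here is a toy rendering of its core steps over an in-memory SQLite database (the table and column names are hypothetical): an IT expert hand-crafts a log view joining the relevant tables, then exports it to the CSV format consumed by process mining tools. This is a sketch of the workflow being criticised, not the onprom approach.

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE paper(id INTEGER, title TEXT);
CREATE TABLE submission(id INTEGER, paper INTEGER, uploadtime TEXT);
INSERT INTO paper VALUES (1, 'A'), (2, 'B');
INSERT INTO submission VALUES (10, 1, '2021-01-05'), (11, 2, '2021-01-06');
""")

# Hand-crafted "log view": one row per event, case id = paper id.
# Changing the perspective means rewriting this view by hand.
con.execute("""
CREATE VIEW log_view AS
SELECT s.paper AS case_id, 'submit' AS activity, s.uploadtime AS timestamp
FROM submission s JOIN paper p ON s.paper = p.id
""")

# Export to CSV for a process mining tool.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["case_id", "activity", "timestamp"])
w.writerows(con.execute("SELECT * FROM log_view ORDER BY timestamp"))
```

Every change of perspective or schema evolution forces the view and export to be redone by the IT expert, which is exactly the maintenance problem the slide points out.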
  • 37. The onprom approach — onprom.inf.unibz.it
Semantic technologies to:
1. Understand the data
2. Access the data using the domain vocabulary
3. Express the perspective for process mining using the domain vocabulary
4. Automatise the extraction of XES event logs
[Fig. 12 (from Calvanese et al.): The onprom methodology and its four phases — high-level IS? → Create conceptual data schema + Create mappings, or Bootstrap model + mappings → Enrich model + mappings → Choose perspective → Create event-data annotations → Get XES/CSV → Do process mining → Other perspective? (Y/N).] Bootstrapping at the same time generates (identity) mappings to link the two specifications; the result of bootstrapping can then be manually refined. Once the first phase is completed, process analysts and the other involved stakeholders no longer need to consider the structure of the legacy information system.
  • 38. Step 1. Understand the data 38
  • 39. Ontology-Based Data Access (aka Virtual Knowledge Graphs) 39
  • 40. Data access is becoming a bottleneck
Optique project: Scalable, End-User Access to Big Data (http://optique-project.eu)
One case study: Statoil
• geologists and engineers develop models of unexplored areas based on drilling operations done in surrounding sites
Crompton (2008): domain experts use (too much) time to fetch data for decision making and to do their job
• Engineers in the oil/gas sector: 30-70% of working time spent on data access and data quality
  • 41. Facts on Statoil • 1000 TB of relational data (SQL) • Non-aligned schemas, each with 2K+ tables • 900 experts within “Statoil Exploration” • Up to 4 days needed to express queries and translate them into SQL 41
  • 42. Example of query
How much time/money is spent searching for data?
A user query at Statoil: "Show all norwegian wellbores with some aditional attributes (wellbore id, completion date, oldest penetrated age, result). Limit to all wellbores with a core and show attributes like (wellbore id, core number, top core depth, base core depth, intersecting stratigraphy). Limit to all wellbores with core in Brentgruppen and show key atributes in a table. After connecting to EPDS (slegge) we could for instance limit futher to cores in Brent with measured permeability and where it is larger than a given value, for instance 1 mD. We could also find out whether there are cores in Brent which are not stored in EPDS (based on NPD info) and where there could be permeability values. Some of the missing data we possibly own, other not." [sic]
(Slide adapted from Diego Calvanese (FUB), Ontologies for Data Integration, FOfAI 2015, Buenos Aires, 27/7/2015)
  • 43. 43 A user query at Statoil Show all norwegian wellbores with some aditional attributes (wellbore id, completion date, oldest penetrated age,result). Limit to all wellbores with a core and show attributes like (wellbore id, core number, top core depth, base core depth, intersecting stratigraphy). Limit to all wellbores with core in Brentgruppen and show key atributes in a table. After connecting to EPDS (slegge) we could for instance limit futher to cores in Brent with measured permeability and where it is larger than a given value, for instance 1 mD. We could also find out whether there are cores in Brent which are not stored in EPDS (based on NPD info) and where there could be permeability values. Some of the missing data we possibly own, other not. SELECT [...] FROM db_name.table1 table1, db_name.table2 table2a, db_name.table2 table2b, db_name.table3 table3a, db_name.table3 table3b, db_name.table3 table3c, db_name.table3 table3d, db_name.table4 table4a, db_name.table4 table4b, db_name.table4 table4c, db_name.table4 table4d, db_name.table4 table4e, db_name.table4 table4f, db_name.table5 table5a, db_name.table5 table5b, db_name.table6 table6a, db_name.table6 table6b, db_name.table7 table7a, db_name.table7 table7b, db_name.table8 table8, db_name.table9 table9, db_name.table10 table10a, db_name.table10 table10b, db_name.table10 table10c, db_name.table11 table11, db_name.table12 table12, db_name.table13 table13, db_name.table14 table14, db_name.table15 table15, db_name.table16 table16 WHERE [...] 
table2a.attr1=‘keyword’ AND table3a.attr2=table10c.attr1 AND table3a.attr6=table6a.attr3 AND table3a.attr9=‘keyword’ AND table4a.attr10 IN (‘keyword’) AND table4a.attr1 IN (‘keyword’) AND table5a.kinds=table4a.attr13 AND table5b.kinds=table4c.attr74 AND table5b.name=‘keyword’ AND (table6a.attr19=table10c.attr17 OR (table6a.attr2 IS NULL AND table10c.attr4 IS NULL)) AND table6a.attr14=table5b.attr14 AND table6a.attr2=‘keyword’ AND (table6b.attr14=table10c.attr8 OR (table6b.attr4 IS NULL AND table10c.attr7 IS NULL)) AND table6b.attr19=table5a.attr55 AND table6b.attr2=‘keyword’ AND table7a.attr19=table2b.attr19 AND table7a.attr17=table15.attr19 AND table4b.attr11=‘keyword’ AND table8.attr19=table7a.attr80 AND table8.attr19=table13.attr20 AND table8.attr4=‘keyword’ AND table9.attr10=table16.attr11 AND table3b.attr19=table10c.attr18 AND table3b.attr22=table12.attr63 AND table3b.attr66=‘keyword’ AND table10a.attr54=table7a.attr8 AND table10a.attr70=table10c.attr10 AND table10a.attr16=table4d.attr11 AND table4c.attr99=‘keyword’ AND table4c.attr1=‘keyword’ AND table11.attr10=table5a.attr10 AND table11.attr40=‘keyword’ AND table11.attr50=‘keyword’ AND table2b.attr1=table1.attr8 AND table2b.attr9 IN (‘keyword’) AND table2b.attr2 LIKE ‘keyword’% AND table12.attr9 IN (‘keyword’) AND table7b.attr1=table2a.attr10 AND table3c.attr13=table10c.attr1 AND table3c.attr10=table6b.attr20 AND table3c.attr13=‘keyword’ AND table10b.attr16=table10a.attr7 AND table10b.attr11=table7b.attr8 AND table10b.attr13=table4b.attr89 AND table13.attr1=table2b.attr10 AND table13.attr20=’‘keyword’’ AND table13.attr15=‘keyword’ AND table3d.attr49=table12.attr18 AND table3d.attr18=table10c.attr11 AND table3d.attr14=‘keyword’ AND table4d.attr17 IN (‘keyword’) AND table4d.attr19 IN (‘keyword’) AND table16.attr28=table11.attr56 AND table16.attr16=table10b.attr78 AND table16.attr5=table14.attr56 AND table4e.attr34 IN (‘keyword’) AND table4e.attr48 IN (‘keyword’) AND table4f.attr89=table5b.attr7 AND 
table4f.attr45 IN (‘keyword’) AND table4f.attr1=‘keyword’ AND table10c.attr2=table4e.attr19 AND (table10c.attr78=table12.attr56 OR (table10c.attr55 IS NULL AND table12.attr17 IS NULL))
  • 44. [Same user query and generated SQL as the previous slide.] 50M € per year
  • 48. [Diagram: a geologist with an info request on one side, the data on the other; the domain knowledge belongs to the geologist, while only the IT expert can write SQL queries over the data.]
  • 49. [Build: the IT expert runs the SQL queries and obtains SQL answers from the data.]
  • 50. [Build: the IT expert combines the SQL answers into integrated SQL answers.]
  • 51. [Build: the integrated SQL answers are turned into an answer in domain terms.]
  • 52. [Build: the answer finally reaches the geologist, closing the loop from info request to answer.]
  • 57. OBDA Main components
[Figure: ontology-based data integration framework — Query/Result at the top; the Ontology provides the global vocabulary and conceptual view; Mappings semantically link the sources and the ontology; Data Sources are external and heterogeneous.]
We achieve logical transparency in accessing data: the user does not know where and how the data is stored, and can only see a conceptual view of the data.
  • 58. OBDA Main technologies
[Same framework figure, annotated with the technologies:]
• Data sources: SQL (or other technologies), with their schemas
• Ontology / conceptual model: OWL 2 QL / UML class diagrams; the exposed virtual knowledge graph consists of RDF triples
• Mappings: R2RML
• Queries: SPARQL
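The logical transparency of the previous slides rests on query unfolding: an atom over the ontology vocabulary is rewritten, through the GAV mappings, into a union of source SQL queries. A toy sketch of that idea (the mapping table and SQL are hypothetical stand-ins for what real systems express in R2RML):

```python
# GAV mappings: each ontology predicate is defined by one or more SQL queries
# over the sources (hypothetical schema; real OBDA systems use R2RML).
mappings = {
    "Creation": [
        "SELECT s.id FROM submission s JOIN paper p "
        "ON s.paper = p.id AND s.uploadtime = p.ct",
    ],
    "Paper": [
        "SELECT id FROM paper",
        "SELECT paper AS id FROM submission",  # papers known only via submissions
    ],
}

def unfold(concept):
    """Rewrite a conceptual atom into a UNION of the source SQL queries
    that the mappings associate with it."""
    return " UNION ".join(mappings.get(concept, []))

sql = unfold("Paper")  # a UNION of two source queries
```

The user only ever mentions `Creation` or `Paper`; where and how the data is stored is hidden inside `mappings`, which is exactly the transparency claim of the framework figure.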
  • 59. ontop-vkg.org
• State-of-the-art OBDA system
• Compliant with RDF(S), OWL 2 QL, R2RML, SPARQL
• Supports all major relational DBMSs (Oracle, SQL Server, Postgres, …)
• Support for other data storage mechanisms ongoing (MongoDB, …)
• Development started in 2009
• Wide adoption in academia and industry
• At the basis of https://ontopic.biz
  • 60. Conference Example: Conceptual Schema
[Fig. 9: Data model of our CONFSYS running example — classes: Paper (title: String, type: String) with subclass DecidedPaper (decTime: ts, accepted: boolean); Person (pName: String, regTime: ts); Conference (cName: String, crTime: ts); Submission (uploadTime: ts) with subclasses Creation and CRUpload; Assignment (invTime: ts); Review (subTime: ts); associations: submittedTo, chairs, notifiedBy, leadsTo.]
N.B.: in onprom we use DL-LiteA (supports a controlled form of functionality)
  • 61. Behind the scenes…
δ(title) ≡ Paper, ρ(title) ⊑ string, (funct title)
δ(type) ≡ Paper, ρ(type) ⊑ string, (funct type)
δ(decTime) ≡ DecidedPaper, ρ(decTime) ⊑ ts, (funct decTime)
δ(accepted) ≡ DecidedPaper, ρ(accepted) ⊑ boolean, (funct accepted)
δ(pName) ≡ Person, ρ(pName) ⊑ string, (funct pName)
δ(regTime) ≡ Person, ρ(regTime) ⊑ ts, (funct regTime)
δ(cName) ≡ Conference, ρ(cName) ⊑ string, (funct cName)
δ(crTime) ≡ Conference, ρ(crTime) ⊑ ts, (funct crTime)
δ(uploadTime) ≡ Submission, ρ(uploadTime) ⊑ ts, (funct uploadTime)
δ(invTime) ≡ Assignment, ρ(invTime) ⊑ ts, (funct invTime)
δ(subTime) ≡ Review, ρ(subTime) ⊑ ts, (funct subTime)
DecidedPaper ⊑ Paper, Creation ⊑ Submission, CRUpload ⊑ Submission
∃Submission₁ ≡ Submission, ∃Submission₁⁻ ≡ Paper, (funct Submission₁)
∃Submission₂ ≡ Submission, ∃Submission₂⁻ ⊑ Person, (funct Submission₂)
∃Assignment₁ ≡ Assignment, ∃Assignment₁⁻ ⊑ Paper, (funct Assignment₁)
∃Assignment₂ ≡ Assignment, ∃Assignment₂⁻ ⊑ Person, (funct Assignment₂)
∃leadsTo ⊑ Assignment, ∃leadsTo⁻ ≡ Review, (funct leadsTo), (funct leadsTo⁻)
∃submittedTo ≡ Paper, ∃submittedTo⁻ ⊑ Conference, (funct submittedTo)
∃notifiedBy ≡ DecidedPaper, ∃notifiedBy⁻ ⊑ Person, (funct notifiedBy)
∃chairs ⊑ Person, ∃chairs⁻ ≡ Conference, (funct chairs⁻)
[Fig. 9 repeated.]
Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Obviously, since, due to reification, the ontology alphabet may contain additional symbols with respect to those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram […]
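For intuition on what the concept inclusions above entail, the sketch below (my own illustration; DL-LiteA reasoners actually work by first-order query rewriting, not by materialising closures) saturates the named-concept inclusions of the CONFSYS encoding with a plain transitive closure:

```python
# Named-concept inclusions A ⊑ B from the CONFSYS encoding.
inclusions = {
    ("DecidedPaper", "Paper"),
    ("Creation", "Submission"),
    ("CRUpload", "Submission"),
}

def saturate(incl):
    """Transitive closure of the ⊑ relation over named concepts:
    from A ⊑ B and B ⊑ C, derive A ⊑ C until a fixpoint is reached."""
    closed = set(incl)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed
```

Adding a hypothetical inclusion such as Paper ⊑ Artifact would let the closure derive DecidedPaper ⊑ Artifact, which is the kind of entailment OBDA query answering must respect.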
  • 62. Mapping Example
[Fig. 9 repeated.]
Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. We give some examples of mapping assertions:
– The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid retrieved through the source query. This mapping expresses that each value in SUBMISSION identified by oid and such that its upload time equals the corresponding paper's creation time is mapped to an object :submission/oid, which becomes an instance of the concept Creation in T.
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
AND SUBMISSION.UPLOADTIME = PAPER.CT
:submission/{oid} rdf:type :Creation .
– The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and instantiates also their features title and type with values of type String.
SELECT ID, title, type FROM PAPER […]
[Figure 11 (excerpt): ACCEPTANCE(ID, uploadtime, user, paper); CONFERENCE(ID, name, organizer, time); DECISION(ID, decisiontime, chair, outcome); LOGIN(ID, user, CT); SUBMISSION(ID, uploadtime, user, paper); PAPER(ID, title, CT, user, conf, type, status)]
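Operationally, executing such a mapping assertion amounts to running its source query and instantiating the URI template for every answer tuple. A self-contained sketch over a toy SQLite instance (the sample rows are invented; column names follow Figure 11):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PAPER(ID INTEGER, TITLE TEXT, CT TEXT);
CREATE TABLE SUBMISSION(ID INTEGER, UPLOADTIME TEXT, PAPER INTEGER);
INSERT INTO PAPER VALUES (1, 'Log Extraction', '2021-01-05');
INSERT INTO SUBMISSION VALUES (10, '2021-01-05', 1);  -- first upload = creation
INSERT INTO SUBMISSION VALUES (11, '2021-02-01', 1);  -- later camera-ready upload
""")

# Source query of the mapping assertion for the concept Creation.
source = """
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID AND SUBMISSION.UPLOADTIME = PAPER.CT
"""

# Target: instantiate the URI template :submission/{oid} for each answer,
# asserting membership in the concept Creation.
triples = [(f":submission/{oid}", "rdf:type", ":Creation")
           for (oid,) in con.execute(source)]
```

Only submission 10 matches the paper's creation time, so only `:submission/10` is typed as a Creation; submission 11 stays a plain Submission, exactly as the mapping's intent dictates.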
  • 65. From OBDA to 2-level OBDA [___,EKAW2018] 65
  • 66. From OBDA to 2-level OBDA [___,EKAW2018]
[Figure: data –map→ domain schema –transform→ upper schema, with query/answer at the top level.]
• data: relational DB
• map: GAV mappings (SQL query → atom)
• domain schema: UML class diagram / OWL 2 QL TBox
• transform: transformation rules (ontology-to-ontology GAV mappings)
• upper schema: UML class diagram / OWL 2 QL TBox
• query/answer: UCQs
  • 67. Theoretical Results
[Figure: a query Q posed over the upper schema is rewritten into a query Q' over the domain schema and answered through the underlying OBDA specification.]
  • 68. Theoretical Results
[Same figure repeated, alongside its instantiations: (a) 2-level OBDA and (b) the 2OBDA framework.]
  • 69. Case study: reference model
[Figure (b): the 2OBDA framework instantiated — data –map→ domain schema –identify services and commitments→ UFO-S, used to inspect contract states.]
  • 70. Case study: process mining!
[Figure (c): the 2OBDA framework instantiated for process mining — data –map→ domain schema –identify cases and events→ event log format, from which cases and events are fetched by a process mining tool.]
  • 71. Step 2. Find the event data 71
  • 72. Annotating the Conceptual Schema
Fix the perspective: declare the case
• Find the class whose instances are considered as case objects
• Express additional filters
Find the events (looking for timestamps)
• Find the classes whose instances refer to events
• Declare how they are connected to the corresponding case objects → navigation in the UML class diagram
• Declare how they are (in)directly related to event attributes (timestamp, task name, optionally event type and resource) → navigation in the UML class diagram
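Once the case and event annotations are fixed, log extraction is conceptually simple: collect the annotated events, resolve each one to its case object through the declared navigation, group by case, and order by timestamp. A hedged sketch (plain dictionaries stand in for the actual onprom annotation language and for the OBDA query answering that would populate them):

```python
from collections import defaultdict

# Annotated events: each carries a task name, the value of its declared
# timestamp attribute, and the case object reached via its navigation path
# (here already resolved; in onprom this resolution happens through OBDA).
annotated = [
    {"task": "Creation",   "ts": "2021-01-05", "case": "paper1"},
    {"task": "Submission", "ts": "2021-02-01", "case": "paper1"},
    {"task": "Review",     "ts": "2021-03-01", "case": "paper1"},
    {"task": "Decision",   "ts": "2021-04-01", "case": "paper1"},
    {"task": "Creation",   "ts": "2021-01-07", "case": "paper2"},
]

def to_traces(events):
    """Group events by case object and sort each trace by timestamp,
    yielding the per-case traces that an XES log serialises."""
    traces = defaultdict(list)
    for e in sorted(events, key=lambda e: e["ts"]):
        traces[e["case"]].append(e["task"])
    return dict(traces)

traces = to_traces(annotated)
```

Changing the perspective (e.g., making Review the case) only changes the annotations, not any hand-written ETL code, which is the interactivity the approach is after.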
  • 73. Conference Example: Case Annotation
[Fig. 9 annotated: Paper carries the Case annotation; Event annotations are attached to the timestamped classes —
• Event Submission: timestamp uploadTime, case via Submission₁
• Event Review: timestamp subTime, case via leadsTo → Assignment₁
• Event Creation: timestamp uploadTime, case via Submission → Submission₁
• Event Decision: timestamp decTime, case Paper]
  • 74. Conference Example Case Annotation 74 OBDA for Log Extraction in Process Mining 25 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs * * * 1..* * 1 1 0..1 * 1 1 * Fig. 9: Data model of our CONFSYS running example OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 * OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event 
Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 *
  • 75. Conference Example Event annotation 75 OBDA for Log Extraction in Process Mining 25 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs * * * 1..* * 1 1 0..1 * 1 1 * Fig. 9: Data model of our CONFSYS running example OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 * OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event 
Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 *
  • 76. Conference Example Event annotation 76 OBDA for Log Extraction in Process Mining 25 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs * * * 1..* * 1 1 0..1 * 1 1 * Fig. 9: Data model of our CONFSYS running example OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 * OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event 
Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 *
  • 77. 77 OBDA for Log Extraction in Process Mining 25 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs * * * 1..* * 1 1 0..1 * 1 1 * Fig. 9: Data model of our CONFSYS running example OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 * OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: 
Paper * * * 1..* * 1 1 0..1 * 1 1 *
  • 78. Switching Perspective Simply amounts to redefining the annotations • Flow of accepted papers • Flow of full papers • Flow of reviews • Flow of authors • Flow of reviewers • …. 78
  • 79. Step 3. Get your log, automatically 79
  • 80. Formalizing Annotations
 Annotations are nothing but SPARQL queries over the conceptual data schema!
 • Case annotation: query retrieving case objects
 • Event annotation: query retrieving event objects
 • Case-attribute annotation: query retrieving pairs <attribute, case>
 • Event-attribute annotation: query retrieving pairs <attribute, event> 80
  • 81. 81 [Figure: annotated data model of our CONFSYS running example.]
 Case annotation (retrieves all instances of the Paper class):
 PREFIX : <http://www.example.com/>
 SELECT DISTINCT ?case
 WHERE { ?case rdf:type :Paper . }
 Event annotations are also tackled using SPARQL SELECT queries with a single answer variable, this time matching with actual event identifiers, i.e., objects denoting occurrences of events. The event annotation for creation:
 PREFIX : <http://www.example.com/>
 SELECT DISTINCT ?creationEvent
 WHERE { ?creationEvent rdf:type :Creation . }
 which in fact returns all instances of the Creation class.
 Attribute annotations are formalised using SPARQL SELECT queries with two answer variables, establishing a relation between events and their corresponding attribute values: for timestamp and activity attribute annotations, the second answer variable is substituted by timestamps/activity names; for case attribute annotations, it is substituted by case objects, thus relating events to the case(s) they belong to. The timestamp annotation for creation events:
 PREFIX : <http://www.example.com/>
 SELECT DISTINCT ?creationEvent ?creationTime
 WHERE {
   ?creationEvent rdf:type :Creation .
   ?creationEvent :Submission1 ?Paper .
   ?creationEvent :uploadTime ?creationTime .
 }
  • 82. Annotations and XES Elements Annotations can be easily “mapped” onto XES elements:
 case annotation query —> traces
 event annotation query —> events
 attribute annotation query —> trace/event attributes with given key
 82 [Figure: the conceptual event schema: Trace t-contains-e Event; traces and events are linked (t-has-a / e-has-a) to Attributes with attKey, attType, attValue.]
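The mapping sketched above can be illustrated end-to-end in a few lines. A minimal Python sketch, assuming the annotation query answers are already available as plain in-memory values (all identifiers and attribute values below are hypothetical; real XES logs also carry extension declarations, omitted here):

```python
import xml.etree.ElementTree as ET

# Hypothetical answers of the annotation queries.
cases = ["paper1"]                                  # case annotation -> traces
events_of_case = {"paper1": ["creation1"]}          # event annotation -> events
event_attrs = {                                     # attribute annotations -> attributes
    "creation1": [("string", "concept:name", "Creation"),
                  ("date", "time:timestamp", "2017-03-01T10:00:00")],
}

log = ET.Element("log")
for c in cases:
    trace = ET.SubElement(log, "trace")
    ET.SubElement(trace, "string", {"key": "concept:name", "value": c})
    for e in events_of_case.get(c, []):
        event = ET.SubElement(trace, "event")
        for typ, key, val in event_attrs.get(e, []):
            ET.SubElement(event, typ, {"key": key, "value": val})

xes = ET.tostring(log, encoding="unicode")
print(xes)
```

Each annotation kind lands exactly where the slide says: case answers become traces, event answers become events nested in their trace, attribute answers become typed key/value children.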
  • 83. Conference Example: Case Annotation 83 [Figure: annotated data model of our CONFSYS running example.]
 The timestamp annotation for creation events:
 PREFIX : <http://www.example.com/>
 SELECT DISTINCT ?creationEvent ?creationTime
 WHERE {
   ?creationEvent rdf:type :Creation .
   ?creationEvent :Submission1 ?Paper .
   ?creationEvent :uploadTime ?creationTime .
 }
 which indeed retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
 XES events:
 - id: ?creationEvent
 XES attribute:
 - key: timestamp extension
 - type: milliseconds
 - value: ?creationTime
 - parent event: ?creationEvent
  • 84. Rewriting Annotations Annotations are nothing but SPARQL queries over the conceptual data schema 84 They can be automatically reformulated as SQL queries over the legacy data. We thus automatically get a standard OBDA mapping from the legacy data to the XES concepts
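The reformulation has two phases: rewriting (the conceptual schema's contribution, e.g. subclasses) and unfolding (replacing classes with their mapping queries). A deliberately tiny Python caricature of the idea, with a hypothetical schema fragment and mappings; real OBDA systems such as ontop do far more (full reasoning, query optimisation):

```python
# T: hypothetical fragment of the conceptual schema (subclass relation).
subclass_of = {":DecidedPaper": ":Paper"}

# M: hypothetical mappings from classes to SQL queries producing their instances.
mappings = {
    ":Paper": 'SELECT "ID" AS x FROM Paper',
    ":DecidedPaper": 'SELECT "ID" AS x FROM Paper WHERE "DecTime" IS NOT NULL',
}

def rewrite(cls):
    """All classes whose instances answer 'SELECT ?x WHERE { ?x rdf:type cls }'."""
    return [cls] + [sub for sub, sup in subclass_of.items() if sup == cls]

def unfold(classes):
    """Union of the mapping queries of the rewritten classes."""
    return "\nUNION\n".join(mappings[c] for c in classes if c in mappings)

sql = unfold(rewrite(":Paper"))
print(sql)
```

Asking for all papers thus yields the union of the SQL queries for Paper and its subclass DecidedPaper, posed directly over the legacy database.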
  • 85. 85 In the first step, the SPARQL queries formalising the annotations in L are reformulated into corresponding SQL queries posed directly over I. This is done by relying on standard query rewriting and unfolding, where each SPARQL query q ∈ Lq is rewritten considering the contribution of the conceptual data schema T, and then unfolded using the mappings in M. The resulting query qsql can then be posed directly over I so as to retrieve the data associated to the corresponding annotation. We denote the set of all so-obtained SQL queries as Lsql.
 Example 16. Consider the SPARQL query in Example 13, formalising the event annotation that accounts for the creation of papers. A possible rewriting and unfolding of such a query, using the conceptual data schema in Figure 9 and the mappings from Example 10, is the following SQL query:
 SELECT DISTINCT
   CONCAT('http://www.example.com/submission/', Submission."ID") AS "creationEvent"
 FROM Submission, Paper
 WHERE Submission."Paper" = Paper."ID"
   AND Submission."UploadTime" = Paper."CT"
   AND Submission."ID" IS NOT NULL
 This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also compact and fast to process by a standard DBMS.
 1. For each SQL query q(c) ∈ Lsql obtained from a case annotation, we insert into ME P the OBDA mapping
    q(c) :trace/{c} rdf:type :Trace .
    Intuitively, such a mapping populates the concept Trace in E with the case objects created from the answers returned by query q(c).
 2. For each SQL query q(e) ∈ Lsql obtained from an event annotation, we insert into ME P the OBDA mapping
    q(e) :event/{e} rdf:type :Event .
    Intuitively, such a mapping populates the concept Event in E with the event objects created from the answers returned by query q(e).
 Technically, onprom takes as input an onprom model P = <I, T, M, L> and the event schema E, and produces a new OBDA system <I, ME P, E>, where the annotations in L are automatically reformulated as OBDA mappings ME P that directly link I to E.
  • 86. Recap 86 [Fig. 15: Sketch of the onprom model. The information system I consists of a database D conforming to a db schema R; the mapping specification M links I to the conceptual data schema T; the event-data annotations L annotate T and point to the conceptual event schema E; the dashed log mapping specification ME P is automatically synthesised, yielding the OBDA model B.]
  • 87. Querying the “Virtual Log” SPARQL queries over the event schema are answered using legacy data
 • Example: get empty and nonempty traces; for nonempty traces, also fetch all their events
 Answers can be serialised into a fully compliant XES log! 87
 The following query retrieves (elementary) attributes, considering in particular their key, type, and value:
 PREFIX : <http://www.example.org/>
 SELECT DISTINCT ?att ?attType ?attKey ?attValue
 WHERE {
   ?att rdf:type :Attribute ;
     :attType ?attType ;
     :attKey ?attKey ;
     :attVal ?attValue .
 }
 The following query handles the retrieval of empty and nonempty traces, simultaneously obtaining, for nonempty traces, their constitutive events:
 PREFIX : <http://www.example.org/>
 SELECT DISTINCT ?trace ?event
 WHERE {
   ?trace a :Trace .
   OPTIONAL {
     ?trace :t-contain-e ?event .
     ?event :e-contain-a ?timestamp .
     ?timestamp :attKey "time:timestamp"^^xsd:string .
     ?event :e-contain-a ?name .
     ?name :attKey "concept:name"^^xsd:string .
   }
 }
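The OPTIONAL pattern in the trace/event query behaves like a left outer join: every trace is returned, and events are attached only when present. A minimal Python sketch of that behaviour over toy in-memory data (trace and event names are hypothetical):

```python
# Toy data: t2 is an empty trace (no events), mirroring the query's OPTIONAL block.
traces = ["t1", "t2"]
contains = [("t1", "e1"), ("t1", "e2")]    # t-contains-e tuples

answers = []
for t in traces:
    events = [e for (tt, e) in contains if tt == t]
    if events:
        answers.extend((t, e) for e in events)   # trace paired with each of its events
    else:
        answers.append((t, None))                # OPTIONAL leaves ?event unbound

print(answers)
```

Nonempty traces contribute one answer per event, while the empty trace still shows up once with an unbound event, exactly what a XES serialiser needs to emit empty traces faithfully.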
  • 88. The onprom Toolchain Implementation of all the described steps using • Java (GUIs, algorithms) • OWL 2 QL plus functionality (conceptual schemas) • ontop (OBDA system) • OpenXES (XES serialisation and manipulation) • ProM process mining framework (environment) 88
  • 89. onprom UML Editor 89 [Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our CONFSYS running example.]
  • 90. onprom Annotation Editor 90 [Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case.]
  • 91. onprom Log Extractor 91 [Fig. 20: Screenshot of the Log Extractor plug-in in ProM 6.6.]
  • 92. Experiments • Very encouraging initial experiments • Carried out using synthetic data • We are looking for real case studies! 92
  • 93. Data Generation with CPN Tools 93
  • 94. Results 94 [Plots (Postgres): running time in milliseconds against the number of extracted components in the XES log, and against the number of tuples in the whole database.] ~11 mins to extract ~9M XES components from ~3.5M tuples
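Reading the reported numbers as rough orders of magnitude, a back-of-the-envelope sketch (approximations, not a measured benchmark):

```python
components = 9_000_000   # ~ extracted XES components (approximate)
tuples = 3_500_000       # ~ tuples in the whole source database (approximate)
seconds = 11 * 60        # ~ reported running time

throughput = components / seconds
ratio = components / tuples
print(f"~{throughput:,.0f} XES components per second, "
      f"~{ratio:.1f} components per source tuple")
```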
  • 95. 95
  • 96. Conclusions • Process Mining as a way to reconcile model-driven management and the real behaviours • Data preparation is an issue in presence of legacy data • Ontology-Based Data Access: solid theoretical basis with optimised implementations • onprom as an effective tool chain for extracting event logs from legacy databases • Several simplified settings can emerge depending on the context: fixed ERP schema, reference models, … 96
  • 97. Future Work • Conceptual Modeling • How to improve the discovery of events? • How to semi-automatically propose events to the user? • How to integrate methodologies and results from formal ontology? • Engineering • How to handle different types of data? • How to deal with different event schemas that go beyond XES? • How to generalise the approach to handle rich ontology-to-ontology mappings? 97