Marco Montali Free University of Bozen-Bolzano, Italy

credits: Diego Calvanese, Tahir Emre Kalayci, Ario Santoso, Wil van der Aalst
From legacy data to event data
1
My research in one slide
I investigate foundational and applied techniques grounded in artificial intelligence for modelling, verification, execution, monitoring, and mining of dynamic systems operating over data, with a specific focus on business process management and multiagent systems.
2
How to attack these challenges?
Artificial Intelligence: knowledge representation, automated reasoning, multiagent systems
Information Systems: business process management, master data management, decision management
Formal Methods: infinite-state systems, verification, Petri nets
Data Science: process mining
3
Business Process Management
4
Business Process Management
5
Processes leave breadcrumbs…
6
Processes leave digital breadcrumbs…
Organisational level:
• Internal management
• Calculation of process metrics/KPIs
• Legal reasons (compliance, external audits)
Personal level:
• We live in a digital society!
• Social networks, sensors, cyberphysical systems, mobile devices are all data loggers
7
[Figure: the BPM lifecycle — (re)design, configure/deploy, enact/monitor, diagnose/get requirements, adjust — connecting data and models, IT support, and reality, carried out by (knowledge) workers and managers/analysts, with effort split roughly 50%/50% between data and models.]
10
PM²
[Eck et al., CAiSE 2015]
[Figure: the six PM² stages and their inputs/outputs. Initialization — 1. Planning and 2. Extraction turn research questions and the information system's event data into event logs. Analysis iterations — 3. Data processing, 4. Mining & analysis (discovery, conformance, enhancement), and 5. Evaluation produce process models, analytic models, performance findings, compliance findings, and refined/new research questions. 6. Process improvement & support turns findings into improvement ideas. Business experts and process analysts are involved throughout.]
11
12
Practice…
[Word cloud: Camunda, ERP, Signavio, document-driven, EPCs, GSM, BPMN, CMMN, case management, legacy systems, CRM, E-R, Bizagi, Aris, UML, artifact-centric, SAP, Bonita]
IEEE standard XES
www.xes-standard.org
IEEE Standard for the representation of event logs
• Based on XML
• Minimal mandatory structure: a log consists of traces, each representing the history of a case; a trace consists of a list of atomic events
• Extensions to "decorate" log, trace, and event with informative attributes: timestamps, task names, transactional lifecycle, resources, additional event data
• Supports "meta-level" declarations useful for log processors
13
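The minimal structure above (log → traces → events, each decorated with attributes) can be read with the standard library alone. A hedged sketch follows; the sample document and the helper name read_log are invented for illustration, while the tag and key names (trace, event, string, concept:name) follow the XES standard.

```python
# Minimal XES reading sketch: a log contains traces, a trace contains
# atomic events; names live in "string" attributes with key concept:name.
import xml.etree.ElementTree as ET

SAMPLE = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="create paper"/>
    </event>
    <event>
      <string key="concept:name" value="submit paper"/>
    </event>
  </trace>
</log>"""

def read_log(xml_text):
    """Return a list of traces; each trace is (case id, [activity names])."""
    root = ET.fromstring(xml_text)
    traces = []
    for trace in root.iter("trace"):
        # the trace-level concept:name attribute identifies the case
        case_id = next((a.get("value") for a in trace.findall("string")
                        if a.get("key") == "concept:name"), None)
        events = [a.get("value")
                  for event in trace.findall("event")
                  for a in event.findall("string")
                  if a.get("key") == "concept:name"]
        traces.append((case_id, events))
    return traces

print(read_log(SAMPLE))  # [('1', ['create paper', 'submit paper'])]
```

Real logs would be parsed from files (ET.parse) and carry many more attributes; the point is only that the mandatory structure is tiny.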
Full XES Schema
[UML class diagram of the full XES schema: a Log (logFeatures, logVersion) contains Traces (1..*), which contain Events; Log, Trace, and Event carry Attributes (attKey, attType), disjointly specialised into ElementaryAttribute (attValue) and CompositeAttribute (which may contain further attributes); attributes belong to Extensions (extName, extPrefix, extUri); a Log declares GlobalTraceAttributes and GlobalEventAttributes, and TraceClassifiers/EventClassifiers (name) defined by those global attributes.]
14
Core XES schema
[Fig. 13: Core event schema — a Trace contains Events (t-contains-e, 1..*/0..*); Traces and Events carry Attributes (attKey, attType, attValue) via t-has-a and e-has-a.]
15
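The core schema is small enough to render directly in code. A hedged in-memory sketch follows: the class names and the attKey/attType/attValue fields mirror Fig. 13, while the constructor shapes and field names (attributes, events) are our own choices, not part of the standard.

```python
# In-memory rendering of the core XES schema: Trace contains Events,
# and both carry Attributes (attKey, attType, attValue).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Attribute:
    attKey: str
    attType: str
    attValue: str

@dataclass
class Event:
    attributes: List[Attribute] = field(default_factory=list)  # e-has-a

@dataclass
class Trace:
    attributes: List[Attribute] = field(default_factory=list)  # t-has-a
    events: List[Event] = field(default_factory=list)          # t-contains-e

t = Trace(
    attributes=[Attribute("concept:name", "string", "1")],
    events=[Event([Attribute("concept:name", "string", "create paper")])],
)
print(t.attributes[0].attValue, len(t.events))  # 1 1
```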
<log xes.version="1.0"
     xes.features="nested-attributes"
     openxes.version="1.0RC7">
  <extension name="Time"
             prefix="time"
             uri="http://www.xes-standard.org/time.xesext"/>
  <classifier name="Event Name" keys="concept:name"/>
  <string key="concept:name" value="XES Event Log"/>
  ...
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="create paper"/>
      <int key="Event ID" value="35654424"/>
      ...
    </event>
    <event>
      ...
      <string key="concept:name" value="submit paper"/>
      ...
    </event>
    ...
16
A simple process
Apologies for being so predictable…
[Fig. 2: The process for managing papers in a simplified conference submission system — create paper (author) → submit paper (author) → assign reviewer (chair) → review paper (reviewer) → submit review (reviewer) → take decision (chair) → accept? gateway: Y → accept paper (chair) → upload camera ready (author); N → reject paper (chair).]
17
The lucky situation
[Fig. 2 (repeated): the paper-management process; gray tasks are external to the conference information system and cannot be logged.]
Example 1. As a running example, we consider a simplified conference submission system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors, reviewers, and conference chairs in the submission of papers to conferences, the consequent review process, and the final decision about paper acceptance or rejection. Figure 2 shows the process control flow considering papers as case objects. Under this perspective, the management of a single paper evolves through the following execution steps. First, the paper is created by one of its authors, and submitted to a conference available in the system. Once the paper is submitted, the review phase for that paper starts. This phase of the process consists of a so-called multi-instance section, i.e., a section of the process where the same set of activities is instantiated multiple times on […]
Event Data
Case ID | ID       | Timestamp        | Activity      | User   | ...
1       | 35654423 | 30-12-2010:11.02 | create paper  | Pete   | ...
1       | 35654424 | 31-12-2010:10.06 | submit paper  | Pete   | ...
1       | 35654425 | 05-01-2011:15.12 | assign review | Mike   | ...
1       | 35654426 | 06-01-2011:11.18 | submit review | Sara   | ...
1       | 35654428 | 07-01-2011:14.24 | accept paper  | Mike   | ...
1       | 35654429 | 06-01-2011:11.18 | upload CR     | Pete   | ...
2       | 35654483 | 30-12-2010:11.32 | create paper  | George | ...
2       | 35654485 | 30-12-2010:12.12 | submit paper  | John   | ...
2       | 35654487 | 30-12-2010:14.16 | assign review | Mike   | ...
2       | 35654489 | 16-01-2011:10.30 | submit review | Ellen  | ...
2       | 35654490 | 18-01-2011:12.05 | reject paper  | Mike   | ...
18
The common case
[Fig. 11: DB schema for the information system of the conference submission system; primary keys are underlined and foreign keys shown in italic in the original:
CONFERENCE(ID, name, organizer, time)
PAPER(ID, title, CT, user, conf, type, status)
SUBMISSION(ID, uploadtime, user, paper)
REVIEWREQUEST(ID, invitationtime, reviewer, paper)
REVIEW(ID, RRid, submissiontime)
DECISION(ID, decisiontime, chair, outcome)
ACCEPTANCE(ID, uploadtime, user, paper)
LOGIN(ID, user, CT)]
Intuitively, mapping assertions involving such atoms are used to map source relations (and the tuples they store) to concepts, roles, and features of the ontology (and the objects and the values that constitute their instances), respectively. Note that for a feature atom, the type of values retrieved from the source database is not specified, and needs to be determined based on the data type of the variable v2 in the source query φ(~x).
Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. Some example mapping assertions are the following ones:
1. SELECT DISTINCT SUBMISSION.ID AS oid
   FROM SUBMISSION, PAPER
   WHERE SUBMISSION.PAPER = PAPER.ID
     AND SUBMISSION.UPLOADTIME = PAPER.CT
20
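Mapping assertion 1 above selects the submissions whose upload time coincides with the paper's creation time, i.e., the submission events that correspond to paper creation. A hedged sketch of running it on a toy instance follows; the column subsets and all data values are invented, only the query text is from the slide.

```python
# Demo of mapping assertion 1 on an invented fragment of the CONFSYS
# schema: SUBMISSION rows whose UPLOADTIME equals the paper's creation
# time (PAPER.CT) identify the creating submission.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PAPER (ID INTEGER, TITLE TEXT, CT TEXT);
CREATE TABLE SUBMISSION (ID INTEGER, UPLOADTIME TEXT, USER TEXT, PAPER INTEGER);
-- paper 7 is created at 10:00; its first submission shares that
-- timestamp, a later revision does not
INSERT INTO PAPER VALUES (7, 'A paper', '2010-12-30 10:00');
INSERT INTO SUBMISSION VALUES (101, '2010-12-30 10:00', 'Pete', 7);
INSERT INTO SUBMISSION VALUES (102, '2010-12-31 09:30', 'Pete', 7);
""")

rows = conn.execute("""
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
  AND SUBMISSION.UPLOADTIME = PAPER.CT
""").fetchall()
print(rows)  # [(101,)] -- only the creating submission matches
```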
Intertwined objects
[Figure: events plotted along time, each involving data objects and activities.]
23
Intertwined objects
[Figure 1: Structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages. An Order includes many Items (1–*); each Item is carried in one Package (*–1). Instances: orders o1, o2, o3; items i1,1, i1,2, i2,1, i2,2, i2,3, i3,1; packages p1, p2, p3. A fragment of the overall event log, with one column per order, is shown.]
Have you ever placed orders online?
24
Flattening reality
[Figure 1 (repeated): the order/item/package structure and the event log, now with focus on orders.]
25
Flattening Reality
[Figure: the same events along time, separated into one trace per case — order o1, order o2, order o3.]
26
The effect of flattening
[Figure 1 (fragment): packages p1, p2, p3 carrying items from different orders.]
Event log for orders:
timestamp           | overall log                    | order o1        | order o2        | order o3
2019-09-22 10:00:00 | create order o1                | create order    |                 |
2019-09-22 10:01:00 | add item i1,1 to order o1      | add item        |                 |
2019-09-23 09:20:00 | create order o2                |                 | create order    |
2019-09-23 09:34:00 | add item i2,1 to order o2      |                 | add item        |
2019-09-23 11:33:00 | create order o3                |                 |                 | create order
2019-09-23 11:40:00 | add item i3,1 to order o3      |                 |                 | add item
2019-09-23 12:27:00 | pay order o3                   |                 |                 | pay order
2019-09-23 12:32:00 | add item i1,2 to order o1      | add item        |                 |
2019-09-23 13:03:00 | pay order o1                   | pay order       |                 |
2019-09-23 14:34:00 | load item i1,1 into package p1 | load item       |                 |
2019-09-23 14:45:00 | add item i2,2 to order o2      |                 | add item        |
2019-09-23 14:51:00 | load item i3,1 into package p1 |                 |                 | load item
2019-09-23 15:12:00 | add item i2,3 to order o2      |                 | add item        |
2019-09-23 15:41:00 | pay order o2                   |                 | pay order       |
2019-09-23 16:23:00 | load item i2,1 into package p2 |                 | load item       |
2019-09-23 16:29:00 | load item i1,2 into package p2 | load item       |                 |
2019-09-23 16:33:00 | load item i2,2 into package p2 |                 | load item       |
2019-09-23 17:01:00 | send package p1                | send package    |                 | send package
2019-09-24 06:38:00 | send package p2                | send package    | send package    |
2019-09-24 07:33:00 | load item i2,3 into package p3 |                 | load item       |
2019-09-24 08:46:00 | send package p3                |                 | send package    |
2019-09-24 16:21:00 | deliver package p1             | deliver package |                 | deliver package
2019-09-24 17:32:00 | deliver package p2             | deliver package | deliver package |
2019-09-24 18:52:00 | deliver package p3             |                 | deliver package |
2019-09-24 18:57:00 | accept delivery p3             |                 | accept delivery |
2019-09-25 08:30:00 | deliver package p1             | deliver package |                 | deliver package
2019-09-25 08:32:00 | accept delivery p1             | accept delivery |                 | accept delivery
2019-09-25 09:55:00 | deliver package p2             | deliver package | deliver package |
2019-09-25 17:11:00 | deliver package p2             | deliver package | deliver package |
2019-09-25 17:12:00 | accept delivery p2             | accept delivery | accept delivery |
27
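The flattening step itself is mechanical: pick a case notion (here, orders) and copy each event into the trace of every case object it relates to. A hedged sketch on a small fragment of the running example follows; the helper name flatten and the event representation (activity plus the set of related objects, with item/package relations already resolved to orders) are our own simplification.

```python
# Flattening an object-centric event list on the Order case notion:
# an event touching a shared package (p1 carries items of o1 and o3)
# is replicated into the trace of every related order.
from collections import defaultdict

raw_log = [  # (activity, objects the event relates to, orders included)
    ("create order",    {"o1"}),
    ("create order",    {"o3"}),
    ("load item",       {"i1,1", "p1", "o1"}),
    ("load item",       {"i3,1", "p1", "o3"}),
    ("send package",    {"p1", "o1", "o3"}),  # shared package
    ("deliver package", {"p1", "o1", "o3"}),  # replicated on flattening
]

def flatten(log, cases):
    traces = defaultdict(list)
    for activity, objs in log:
        for case in objs & cases:  # one copy per related case object
            traces[case].append(activity)
    return dict(traces)

traces = flatten(raw_log, {"o1", "o2", "o3"})
print(traces["o1"])  # ['create order', 'load item', 'send package', 'deliver package']
```

Note how "deliver package" ends up in both the o1 and o3 traces: one physical delivery is counted twice, which is exactly the replication effect discussed next.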
Discovery?
[Figure: directly-follows graph with frequencies, produced by the well-known Disco process mining tool from the flattened order log: create order (3), add item (6), pay order (3), load item (6), send package (5), deliver package (11), accept delivery (5), with arc frequencies between them.]
[Clipped paper excerpt, partially reconstructed: discovering a process model that explains the behavior requires flattening the raw log, where a case notion is chosen and a flat view of the log is computed per case object. With Order as the case notion, the flat trace for a given order contains the events that directly refer to that order, to an item included in it, or to a package carrying one of its items. Two undesired effects consequently arise:
1. Replication of tasks — an event related to multiple case objects is replicated in the traces of all of them; in our scenario, events on a shared package refer to both order o1 and order o3 and appear in both traces.
2. Shuffling of independent threads — events applied to different objects related to the same case object are shuffled together, so one cannot distinguish to which actual object each event refers (e.g., which item is added, or which delivery attempt an accept delivery correlates to).
These effects pollute the discovered model with misleading information and apparent behaviour: the directly-follows graph discovered by Disco misleadingly indicates that deliver package occurred 11 times, a number derived from packages carrying objects of different orders.]
non-existing loop
wrong statistics
31
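The spurious loop and the inflated statistics can be reproduced in a few lines: compute a directly-follows graph with frequencies over flattened traces. A hedged sketch follows; the two abbreviated traces are invented stand-ins for flattened order traces in which a replicated "deliver package" directly follows itself.

```python
# Directly-follows graph with frequencies over flattened traces:
# replication of delivery events yields a deliver->deliver arc, i.e.,
# a self-loop that does not exist in the real process.
from collections import Counter

flattened_traces = [
    ["send package", "deliver package", "deliver package", "accept delivery"],
    ["send package", "deliver package", "deliver package", "accept delivery"],
]

dfg = Counter()
for trace in flattened_traces:
    for a, b in zip(trace, trace[1:]):  # count each directly-follows pair
        dfg[(a, b)] += 1

print(dfg[("deliver package", "deliver package")])  # 2 -> spurious self-loop
```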
Level | Characterization | Examples
★★★★★ | Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies. Events and their attributes point to this ontology. | Semantically annotated logs of BPM systems.
★★★★ | Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner. | Event logs of traditional BPM/workflow systems.
★★★ | Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa). | Tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.
★★ | Events are recorded automatically, i.e., as a by-product of some information system. Coverage varies, i.e., no systematic approach is followed to decide which events are recorded. Moreover, it is possible to bypass the information system. Hence, events may be missing or not recorded properly. | Event logs of document and product management systems, error logs of embedded systems, worksheets of service engineers, etc.
★ | Lowest level: event logs are of poor quality. Recorded events may not correspond to reality and events may be missing. Event logs for which events are recorded by hand typically have such characteristics. | Trails left in paper documents routed through the organization ("yellow notes"), paper-based medical records, etc.
32
Level 4-5: straightforward syntactic manipulation
Level 3: much more difficult
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and events
• Processes with one-to-many and many-to-many relations
33
Not covered today, but:
• Recent works by Dirk Fahland, Wil van der Aalst, my group
• https://pais.hse.ru/en/seminar-pne/
• https://multiprocessmining.org
• http://ocel-standard.org
34
Extracting XES from legacy data
[___,BIS2017]
Manual construction of views and ETL procedures to fetch the data
Done by IT experts, not by knowledge workers (domain experts)
Traditional Methodology
[Figure: flowchart of the traditional methodology — create data model → choose perspective → extract relevant tables → design views with relevant attributes → design composite views → design log view → export to XES/CSV → do process mining; loop back on "other perspective?" (Y/N).]
[Clipped excerpt on log extraction and process mining:] Finally, EBITmax converted the log view into a CSV file, and analysed it using the Disco process mining toolkit.
35
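The hand-crafted "log view" at the heart of this methodology is typically a UNION of one SQL query per activity, later exported to CSV. A hedged sketch follows: the table and column names follow the CONFSYS schema of Fig. 11, but the PAPER column on DECISION, the view name LOGVIEW, and all data values are invented for the demo.

```python
# Traditional methodology in miniature: hand-written SQL unions one
# query per activity into a flat log view, then exports it to CSV.
import csv, io, sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUBMISSION (ID INTEGER, UPLOADTIME TEXT, USER TEXT, PAPER INTEGER);
CREATE TABLE DECISION (ID INTEGER, DECISIONTIME TEXT, CHAIR TEXT, OUTCOME TEXT, PAPER INTEGER);
INSERT INTO SUBMISSION VALUES (101, '2010-12-31 10:06', 'Pete', 1);
INSERT INTO DECISION VALUES (301, '2011-01-07 14:24', 'Mike', 'accept', 1);

-- one UNION branch per activity: this is the hand-crafted part that
-- must be redesigned whenever the perspective changes
CREATE VIEW LOGVIEW AS
  SELECT PAPER AS case_id, UPLOADTIME AS ts,
         'submit paper' AS activity, USER AS resource
  FROM SUBMISSION
  UNION ALL
  SELECT PAPER, DECISIONTIME, 'take decision', CHAIR FROM DECISION;
""")

rows = conn.execute("SELECT * FROM LOGVIEW ORDER BY case_id, ts").fetchall()
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["case_id", "timestamp", "activity", "resource"])
writer.writerows(rows)
print(out.getvalue())
```

The fragility is visible even at this scale: every new activity, attribute, or perspective means editing the view by hand, which motivates the correctness and maintenance concerns on the next slide.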
Extracting XES from legacy data
[___,BIS2017]
Crucial issues:
• Correctness: who knows? Process mining is dangerous if applied to wrong data
• Maintenance, evolution, and change of perspective are hard… but process mining should be highly interactive
Traditional Methodology
[Figure: the same flowchart of the traditional methodology — create data model → choose perspective → extract relevant tables → design views → design composite views → design log view → export to XES/CSV → do process mining.]
36
The onprom approach
onprom.inf.unibz.it
Semantic technologies to:
1. Understand the data
2. Access the data using the domain vocabulary
3. Express the perspective for process mining using the domain vocabulary
4. Automatise the extraction of XES event logs
37
34 D. Calvanese et al.
The onprom methodology (flowchart): if no high-level conceptual model of the IS exists, create a conceptual data schema and mappings; otherwise, bootstrap model + mappings and enrich them. Then choose a perspective, create event-data annotations, get XES/CSV, and do process mining; if another perspective is needed, loop back.
Fig. 12: The onprom methodology and its four phases
the same time generating (identity) mappings to link the two specifications. The result of bootstrapping can then be manually refined.
Once the first phase is completed, process analysts and the other involved stakeholders no longer need to consider the structure of the legacy information system…
Step 1. Understand the data
38
Ontology-Based Data Access
(aka Virtual Knowledge Graphs)
39
Data access is becoming a bottleneck
Optique project: Scalable, End-User Access to Big Data (http://optique-project.eu)

One case study: Statoil

• geologists and engineers develop models of unexplored areas based on
drilling operations done in surrounding sites

Crompton (2008): domain experts spend (too much) time fetching data for decision making and doing their job

• Engineers in the oil/gas sector: 30–70% of working time spent on data access and data quality
40
Facts on Statoil
• 1000 TB of relational data (SQL)

• Non-aligned schemas, each with 2K+ tables

• 900 experts within “Statoil Exploration”

• Up to 4 days needed to express queries and translate them into SQL
41
Example of query
42
OBDI framework Query answering Ontology languages Mappings Identity Conclusions
How much time/money is spent searching for data?
A user query at Statoil
Show all norwegian wellbores with some aditional attributes
(wellbore id, completion date, oldest penetrated age,result). Limit
to all wellbores with a core and show attributes like (wellbore id,
core number, top core depth, base core depth, intersecting
stratigraphy). Limit to all wellbores with core in Brentgruppen and
show key atributes in a table. After connecting to EPDS (slegge)
we could for instance limit futher to cores in Brent with measured
permeability and where it is larger than a given value, for instance 1
mD. We could also find out whether there are cores in Brent which
are not stored in EPDS (based on NPD info) and where there could
be permeability values. Some of the missing data we possibly own,
other not.
Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (5/52)
43
A user query at Statoil (query repeated from the previous slide), and its translation into SQL:
SELECT [...]
FROM
db_name.table1 table1,
db_name.table2 table2a,
db_name.table2 table2b,
db_name.table3 table3a,
db_name.table3 table3b,
db_name.table3 table3c,
db_name.table3 table3d,
db_name.table4 table4a,
db_name.table4 table4b,
db_name.table4 table4c,
db_name.table4 table4d,
db_name.table4 table4e,
db_name.table4 table4f,
db_name.table5 table5a,
db_name.table5 table5b,
db_name.table6 table6a,
db_name.table6 table6b,
db_name.table7 table7a,
db_name.table7 table7b,
db_name.table8 table8,
db_name.table9 table9,
db_name.table10 table10a,
db_name.table10 table10b,
db_name.table10 table10c,
db_name.table11 table11,
db_name.table12 table12,
db_name.table13 table13,
db_name.table14 table14,
db_name.table15 table15,
db_name.table16 table16
WHERE [...]
table2a.attr1=‘keyword’ AND
table3a.attr2=table10c.attr1 AND
table3a.attr6=table6a.attr3 AND
table3a.attr9=‘keyword’ AND
table4a.attr10 IN (‘keyword’) AND
table4a.attr1 IN (‘keyword’) AND
table5a.kinds=table4a.attr13 AND
table5b.kinds=table4c.attr74 AND
table5b.name=‘keyword’ AND
(table6a.attr19=table10c.attr17 OR
(table6a.attr2 IS NULL AND
table10c.attr4 IS NULL)) AND
table6a.attr14=table5b.attr14 AND
table6a.attr2=‘keyword’ AND
(table6b.attr14=table10c.attr8 OR
(table6b.attr4 IS NULL AND
table10c.attr7 IS NULL)) AND
table6b.attr19=table5a.attr55 AND
table6b.attr2=‘keyword’ AND
table7a.attr19=table2b.attr19 AND
table7a.attr17=table15.attr19 AND
table4b.attr11=‘keyword’ AND
table8.attr19=table7a.attr80 AND
table8.attr19=table13.attr20 AND
table8.attr4=‘keyword’ AND
table9.attr10=table16.attr11 AND
table3b.attr19=table10c.attr18 AND
table3b.attr22=table12.attr63 AND
table3b.attr66=‘keyword’ AND
table10a.attr54=table7a.attr8 AND
table10a.attr70=table10c.attr10 AND
table10a.attr16=table4d.attr11 AND
table4c.attr99=‘keyword’ AND
table4c.attr1=‘keyword’ AND
table11.attr10=table5a.attr10 AND
table11.attr40=‘keyword’ AND
table11.attr50=‘keyword’ AND
table2b.attr1=table1.attr8 AND
table2b.attr9 IN (‘keyword’) AND
table2b.attr2 LIKE ‘keyword’% AND
table12.attr9 IN (‘keyword’) AND
table7b.attr1=table2a.attr10 AND
table3c.attr13=table10c.attr1 AND
table3c.attr10=table6b.attr20 AND
table3c.attr13=‘keyword’ AND
table10b.attr16=table10a.attr7 AND
table10b.attr11=table7b.attr8 AND
table10b.attr13=table4b.attr89 AND
table13.attr1=table2b.attr10 AND
table13.attr20=‘keyword’ AND
table13.attr15=‘keyword’ AND
table3d.attr49=table12.attr18 AND
table3d.attr18=table10c.attr11 AND
table3d.attr14=‘keyword’ AND
table4d.attr17 IN (‘keyword’) AND
table4d.attr19 IN (‘keyword’) AND
table16.attr28=table11.attr56 AND
table16.attr16=table10b.attr78 AND
table16.attr5=table14.attr56 AND
table4e.attr34 IN (‘keyword’) AND
table4e.attr48 IN (‘keyword’) AND
table4f.attr89=table5b.attr7 AND
table4f.attr45 IN (‘keyword’) AND
table4f.attr1=‘keyword’ AND
table10c.attr2=table4e.attr19 AND
(table10c.attr78=table12.attr56 OR
(table10c.attr55 IS NULL AND
table12.attr17 IS NULL))
44
(same user query and SQL translation repeated)
50M€ per year
45
(figure, progressive build over slides 46–53) Without OBDA, the geologist sits on the domain side and the data on the IT side. The geologist's info request cannot be posed directly over the data; an IT expert, using domain knowledge, translates it into SQL queries, collects the SQL answers, integrates them, and finally hands the answer back to the geologist.
53
(figure, progressive build over slides 54–56) With OBDA, a knowledge engineer uses the OBDA suite to build a knowledge graph over the data and a mapping between the two. The geologist now poses the info request directly in domain terms; the Ontop engine uses the knowledge graph and the mapping to compute the answer automatically.
OBDA
Main components
57
Ontology-based data integration framework:

• Ontology: provides the global vocabulary and a conceptual view

• Mappings: semantically link the sources to the ontology

• Data sources: external and heterogeneous

We achieve logical transparency in accessing data: the user does not know where and how the data is stored, and can only see a conceptual view of the data.
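The logical transparency above can be illustrated with a minimal unfolding step: in a GAV setting, each ontology concept is associated with an SQL query over the sources, and a conceptual query is answered by running the mapped SQL. All names below (the `wb` table, the `Wellbore` concepts, the `ask` helper) are invented for this sketch; real OBDA engines such as Ontop do far more (reasoning, query rewriting, optimization).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE wb(id INTEGER, country TEXT);
INSERT INTO wb VALUES (10, 'NO'), (11, 'UK');
""")

# GAV mappings: ontology concept -> SQL query producing its instances.
mappings = {
    "NorwegianWellbore": "SELECT id FROM wb WHERE country = 'NO'",
    "Wellbore": "SELECT id FROM wb",
}

def ask(concept):
    """Answer the conceptual query 'all instances of concept' by unfolding
    it into the SQL defined by the mapping: the user never sees the schema."""
    return [row[0] for row in con.execute(mappings[concept])]

print(ask("NorwegianWellbore"))  # -> [10]
```

The geologist asks for `NorwegianWellbore`; the where-and-how of the storage stays entirely behind the mapping.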
OBDA
Main technologies
58
(same framework figure, annotated with technologies)

• Data sources: SQL (or other technologies), described by a schema

• Ontology / conceptual model: OWL2 QL / UML class diagrams

• Virtual knowledge graph: RDF triples

• Mappings: R2RML

• Queries: SPARQL
ontop-vkg.org
• State-of-the-art OBDA system

• Compliant with RDF(S), OWL2 QL, R2RML, SPARQL

• Supports all major relational DBMSs (Oracle, SQL Server, Postgres, …)

• Support for other data storage mechanisms is ongoing (MongoDB, …)

• Development started in 2009

• Wide adoption in academia and industry

• At the basis of https://ontopic.biz
59
Conference Example: Conceptual Schema
60
[Fig. 9: Data model of our CONFSYS running example — a UML class diagram with classes Paper (title, type), Person (pName, regTime), Conference (cName, crTime), Assignment (invTime), Submission (uploadTime) with subclasses Creation and CRUpload, DecidedPaper (decTime, accepted) as a subclass of Paper, Review (subTime), and associations submittedTo, chairs, notifiedBy, leadsTo]
N.B.: in onprom we use DL-LiteA (which supports a controlled form of functionality)
Behind the scene…
61
δ(title) ≡ Paper   ρ(title) ⊑ string   (funct title)
δ(type) ≡ Paper   ρ(type) ⊑ string   (funct type)
δ(decTime) ≡ DecidedPaper   ρ(decTime) ⊑ ts   (funct decTime)
δ(accepted) ≡ DecidedPaper   ρ(accepted) ⊑ boolean   (funct accepted)
δ(pName) ≡ Person   ρ(pName) ⊑ string   (funct pName)
δ(regTime) ≡ Person   ρ(regTime) ⊑ ts   (funct regTime)
δ(cName) ≡ Conference   ρ(cName) ⊑ string   (funct cName)
δ(crTime) ≡ Conference   ρ(crTime) ⊑ ts   (funct crTime)
δ(uploadTime) ≡ Submission   ρ(uploadTime) ⊑ ts   (funct uploadTime)
δ(invTime) ≡ Assignment   ρ(invTime) ⊑ ts   (funct invTime)
δ(subTime) ≡ Review   ρ(subTime) ⊑ ts   (funct subTime)
DecidedPaper ⊑ Paper   Creation ⊑ Submission   CRUpload ⊑ Submission
∃Submission1 ≡ Submission   ∃Submission1⁻ ≡ Paper   (funct Submission1)
∃Submission2 ≡ Submission   ∃Submission2⁻ ⊑ Person   (funct Submission2)
∃Assignment1 ≡ Assignment   ∃Assignment1⁻ ⊑ Paper   (funct Assignment1)
∃Assignment2 ≡ Assignment   ∃Assignment2⁻ ⊑ Person   (funct Assignment2)
∃leadsTo ⊑ Assignment   ∃leadsTo⁻ ≡ Review   (funct leadsTo)   (funct leadsTo⁻)
∃submittedTo ≡ Paper   ∃submittedTo⁻ ⊑ Conference   (funct submittedTo)
∃notifiedBy ≡ DecidedPaper   ∃notifiedBy⁻ ⊑ Person   (funct notifiedBy)
∃chairs ⊑ Person   ∃chairs⁻ ≡ Conference   (funct chairs⁻)
[Fig. 9 (repeated): Data model of our CONFSYS running example]
Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Obviously, since, due to reification, the ontology alphabet may contain additional symbols with respect to those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram…
Mapping Example
62
[Fig. 9 (repeated): Data model of our CONFSYS running example]
Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. We give some examples of mapping assertions:
– The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid retrieved through the source query. This mapping expresses that each value in SUBMISSION identified by oid and such that its upload time equals the corresponding paper's creation time is mapped to an object :submission/oid, which becomes an instance of concept Creation in T.

SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
AND SUBMISSION.UPLOADTIME = PAPER.CT

:submission/{oid} rdf:type :Creation .

– The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and also instantiates their features title and type with values of type String.

SELECT ID, title, type

Tables of the CONFSYS database schema (Figure 11):
ACCEPTANCE(ID, uploadtime, user, paper)
CONFERENCE(ID, name, organizer, time)
DECISION(ID, decisiontime, chair, outcome)
LOGIN(ID, user, CT)
SUBMISSION(ID, uploadtime, user, paper)
PAPER(ID, title, CT, user, conf, type, status)
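A mapping assertion of the kind shown above can be mimicked in a few lines: the source SQL selects the oids, and the URI template plus target atom turn each row into an RDF-style triple. This is a toy materialization over a cut-down SUBMISSION/PAPER database, not the onprom implementation (which keeps the triples virtual).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PAPER(ID INTEGER, CT TEXT);
CREATE TABLE SUBMISSION(ID INTEGER, PAPER INTEGER, UPLOADTIME TEXT);
INSERT INTO PAPER VALUES (7, '2017-03-01');
INSERT INTO SUBMISSION VALUES (100, 7, '2017-03-01'), (101, 7, '2017-03-09');
""")

# Source part of the mapping assertion (as in Example 10).
SOURCE = """
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
  AND SUBMISSION.UPLOADTIME = PAPER.CT
"""

# Target part: :submission/{oid} rdf:type :Creation .
triples = [(f":submission/{oid}", "rdf:type", ":Creation")
           for (oid,) in con.execute(SOURCE)]
print(triples)  # only submission 100 matches its paper's creation time
```

Only the submission whose upload time coincides with the paper's creation time becomes an instance of Creation, exactly as the assertion prescribes.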
Abstraction Layers
63
(layered figure) Data → Domain Model → Reference Models, connected by spaghetti mappings
64
(layered figure) Data → Domain Model → Reference Models, supporting data analysis, reporting, KPIs, and query answering
From OBDA to 2-level OBDA
[___,EKAW2018]
65
66
(2-level OBDA pipeline)
• data: relational DB
• map: GAV mappings (SQL query -> atom)
• domain schema: UML class diagram / OWL2 QL TBox
• transform: transformation rules (ontology-to-ontology GAV mappings)
• upper schema: UML class diagram / OWL2 QL TBox
• query/answer: UCQs
From OBDA to 2-level OBDA
[___,EKAW2018]
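The two mapping levels in the pipeline above compose: upper-schema concepts unfold into domain concepts, which in turn unfold into SQL. A minimal sketch of this composition idea, with invented table and concept names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE submission(id INTEGER);
CREATE TABLE review(id INTEGER);
INSERT INTO submission VALUES (1), (2);
INSERT INTO review VALUES (9);
""")

# Level 1: domain concepts -> SQL (classic GAV mappings).
domain = {
    "Submission": "SELECT id FROM submission",
    "Review": "SELECT id FROM review",
}

# Level 2: upper concepts -> domain concepts (ontology-to-ontology GAV).
upper = {"Event": ["Submission", "Review"]}

def ask_upper(concept):
    """Answer an upper-schema query by unfolding through both mapping levels."""
    ids = []
    for d in upper[concept]:
        ids += [row[0] for row in con.execute(domain[d])]
    return sorted(ids)

print(ask_upper("Event"))  # -> [1, 2, 9]
```

The theoretical results on the next slides establish that such a composed rewriting (Q over the upper schema into Q' over the domain schema) is sound and complete for the mapping languages used.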
Theoretical Results
67
(figure: in 2-level OBDA, a query Q over the upper schema is transformed into a query Q' over the domain schema, and answered via OBDA over the data)
Theoretical Results
68
(same figure repeated: Q over the upper schema, Q' over the domain schema; (a) 2-level OBDA, (b) 2OBDA framework for service monitoring)
Case study: reference model
69
(from: Conceptual Schema Transformation in Ontology-based Data Access)
(a) 2-level OBDA: data → map → domain schema → transform → upper schema → query/answer (OBDA)
(b) 2OBDA framework instantiated with UFO-S as upper schema: data → map → domain schema → identify services and commitments → UFO-S → inspect contract states (OBDA)
Case study: process mining!
70
(from: Conceptual Schema Transformation in Ontology-based Data Access)
(c) 2OBDA framework for process mining: data → map → domain schema → identify cases and events → event log format → fetch cases and events with a process mining tool (OBDA)
Step 2. Find the event data
71
Annotating the Conceptual Schema
Fix perspective: declare the case

• Find the class whose instances are considered as case objects

• Express additional filters

Find the events (looking for timestamps)

• Find the classes whose instances refer to events

• Declare how they are connected to corresponding case objects —> navigation in the UML class diagram

• Declare how they are (in)directly related to event attributes (timestamp, task name, optionally event type and resource) —> navigation in the UML class diagram
72
Conference Example
Case Annotation
73
[Fig.: Annotated CONFSYS data model — Paper is declared as the Case class; event annotations: Event Submission (timestamp: uploadTime, case: Submission1), Event Review (timestamp: subTime, case: leadsTo → Assignment1), Event Creation (timestamp: uploadTime, case: Submission → Submission1), Event Decision (timestamp: decTime, case: Paper)]
Conference Example
Case Annotation
74
[annotated CONFSYS data model, repeated]
Conference Example
Event annotation
75
[annotated CONFSYS data model, repeated]
Conference Example
Event annotation
76
[annotated CONFSYS data model, repeated]
77
[annotated CONFSYS data model, repeated]
Switching Perspective
Simply amounts to redefining the annotations
• Flow of accepted papers

• Flow of full papers

• Flow of reviews

• Flow of authors

• Flow of reviewers

• ….
78
Step 3. Get your log, automatically
79
Formalizing Annotations
Annotations are nothing but SPARQL queries over the conceptual data schema!

• Case annotation: query retrieving case objects

• Event annotation: query retrieving event objects

• Case-attribute annotation: query retrieving pairs <attribute, case>

• Event-attribute annotation: query retrieving pairs <attribute, event>
80
81
[Fig. 16: Annotated data model of our CONFSYS running example — the CONFSYS diagram with event annotations for Review (timestamp: subTime, case: leadsTo → Assignment1) and Creation (timestamp: uploadTime, case: Submission → Submission1)]
annotations, respectively used to capture the relationship between the event and its corresponding case(s), timestamp, and activity. As pointed out before, the timestamp anno…
SELECT DISTINCT ?case
WHERE {
?case rdf:type :Paper .
}
which retrieves all instances of the Paper class.
Event annotations are also tackled using SPARQL SELECT queries with a single an-
swer variable, this time matching with actual event identifiers, i.e., objects denoting
occurrences of events.
Example 14. Consider the event annotation for creation, as shown in Figure 16. The
actual events for this annotation are retrieved using the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent
WHERE {
?creationEvent rdf:type :Creation .
}
which in fact returns all instances of the Creation class.
Attribute annotations are formalised using SPARQL SELECT queries with two answer
variables, establishing a relation between events and their corresponding attribute val-
ues. In this light, for timestamp and activity attribute annotations, the second answer
variable will be substituted by corresponding values for timestamps/activity names. For
case attribute annotations, instead, the second answer variable will be substituted by
case objects, thus establishing a relationship between events and the case(s) they be-
long to.
Example 15. Consider again the annotation for creation events, as shown in Figure 16.
The relationship between creation events and their corresponding timestamps is estab-
lished by the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent ?creationTime
WHERE {
?creationEvent rdf:type :Creation .
?creationEvent :Submission1 ?Paper .
?creationEvent :uploadTime ?creationTime .
}
which retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
Annotations and XES Elements
Annotations can be easily "mapped" onto XES elements:
• case annotation query → traces
• event annotation query → events
• attribute annotation query → trace/event attributes with a given key
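As a minimal illustration of this correspondence (a sketch under made-up inputs, not the onprom implementation), answers of hypothetical case, event, and attribute annotation queries can be assembled into an XES-like structure:

```python
# Sketch: assembling annotation-query answers into an XES-like structure.
# All input tuples below are hypothetical query answers, chosen for illustration.

def assemble_log(case_answers, event_answers, case_of, attrs):
    """case_answers: case objects (one trace each);
    event_answers: event objects;
    case_of: pairs (event, case) from the case-attribute annotation query;
    attrs: pairs (event, (key, value)) from attribute annotation queries."""
    log = {c: [] for c in case_answers}       # trace -> list of events
    events = {e: {} for e in event_answers}   # event -> attribute map
    for e, (key, value) in attrs:
        events[e][key] = value
    for e, c in case_of:                      # an event may belong to several cases
        log[c].append((e, events[e]))
    return log

log = assemble_log(
    case_answers=["paper1"],
    event_answers=["creation1"],
    case_of=[("creation1", "paper1")],
    attrs=[("creation1", ("time:timestamp", "2017-03-01T10:00:00"))],
)
```

Each case object becomes a trace, each event object an event, and attribute-annotation answers attach key/value pairs to the right parent.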
OBDA for Log Extraction in Process Mining 35
[XES metamodel: Trace —t-contains-e (1..* to 0..*)→ Event; Trace —t-has-a→ Attribute; Event —e-has-a→ Attribute (each 0..* to 0..*); Attribute carries attKey, attType, and attValue, all of type String.]
Conference Example: Case Annotation
[This slide repeats Fig. 16, highlighting the case annotation over Paper.]
XES event:
• id: ?creationEvent

XES attribute:
• key: timestamp extension
• type: milliseconds
• value: ?creationTime
• parent event: ?creationEvent
Rewriting Annotations
Annotations are nothing else than SPARQL queries over the conceptual data schema.
They can be automatically reformulated as SQL queries over the legacy data.
We automatically get a standard OBDA mapping from the legacy data to the XES concepts.
In the first step, the SPARQL queries formalising the annotations in L are reformulated into corresponding SQL queries posed directly over I. This is done by relying on standard query rewriting and unfolding, where each SPARQL query q ∈ L is rewritten considering the contribution of the conceptual data schema T, and then unfolded using the mappings in M. The resulting query q_sql can then be posed directly over I so as to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all so-obtained SQL queries as L_sql.
Example 16. Consider the SPARQL query in Example 13, formalising the event annotation that accounts for the creation of papers. A possible result of rewriting and unfolding such a query, respectively using the conceptual data schema in Figure 9 and the mappings from Example 10, is the following SQL query:
SELECT DISTINCT
  CONCAT('http://www.example.com/submission/', Submission."ID")
  AS "creationEvent"
FROM Submission, Paper
WHERE Submission."Paper" = Paper."ID" AND
  Submission."UploadTime" = Paper."CT" AND
  Submission."ID" IS NOT NULL
This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also compact and fast to process by a standard DBMS.
1. For each SQL query q(c) ∈ L_sql obtained from a case annotation, we insert into M_P^E the following OBDA mapping:
   source: q(c)    target: :trace/{c} rdf:type :Trace .
   Intuitively, such a mapping populates the concept Trace in E with the case objects that are created from the answers returned by query q(c).
2. For each SQL query q(e) ∈ L_sql obtained from an event annotation, we insert into M_P^E the following OBDA mapping:
   source: q(e)    target: :event/{e} rdf:type :Event .
   Intuitively, such a mapping populates the concept Event in E with the event objects that are created from the answers returned by query q(e).
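The two synthesis rules above amount to simple template instantiation: each annotation query is paired with a target triple template. A schematic sketch follows; the function name and the (query, template) tuple format are illustrative, not onprom's actual API:

```python
# Sketch of the two mapping-synthesis rules. Each case/event annotation query
# becomes an OBDA mapping whose target populates Trace or Event in the event schema.

def synthesise_log_mappings(case_queries, event_queries):
    mappings = []
    for q in case_queries:
        # Rule 1: populate Trace with the case objects returned by q(c)
        mappings.append((q, ":trace/{c} rdf:type :Trace ."))
    for q in event_queries:
        # Rule 2: populate Event with the event objects returned by q(e)
        mappings.append((q, ":event/{e} rdf:type :Event ."))
    return mappings
```

Running the synthesised mappings in an OBDA engine then exposes the legacy data directly under the event schema E.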
This makes it possible to see the legacy information system as a XES event log, and also to actually materialise such an event log.
Technically, onprom takes as input an onprom model P = ⟨I, T, M, L⟩ and the event schema E, and produces a new OBDA system ⟨I, M_P^E, E⟩, where the annotations in L are automatically reformulated as OBDA mappings M_P^E that directly link I to E. Such mappings are synthesised using a three-step approach.
Recap
Fig. 15: Sketch of the onprom model. The information system I consists of a database D conforming to a db schema R; a mapping specification M links R to the conceptual data schema T; the event-data annotations L annotate T and point to the conceptual event schema E. Together these form the onprom model P, from which the (dashed) log mapping specification M_P^E is automatically synthesised, yielding the OBDA model B.
Querying the “Virtual Log”
SPARQL queries over the event schema are answered using legacy data

• Example: get empty and nonempty traces; for nonempty traces, also fetch all their events

Answers can be serialised into a fully compliant XES log!
The following query retrieves (elementary) attributes, considering in particular their key, type, and value.
PREFIX : <http://www.example.org/>
SELECT DISTINCT ?att ?attType ?attKey ?attValue
WHERE {
?att rdf:type :Attribute;
:attType ?attType;
:attKey ?attKey;
:attVal ?attValue.
}
The following query handles the retrieval of empty and nonempty traces, simultaneously obtaining, for nonempty traces, their constitutive events:
PREFIX : <http://www.example.org/>
SELECT DISTINCT ?trace ?event
WHERE {
  ?trace a :Trace .
  OPTIONAL {
    ?trace :t-contain-e ?event .
    ?event :e-contain-a ?timestamp .
    ?timestamp :attKey "time:timestamp"^^xsd:string .
    ?event :e-contain-a ?name .
    ?name :attKey "concept:name"^^xsd:string .
  }
}
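Serialising such answers boils down to grouping events under their traces. The following is a minimal stdlib sketch with simplified attribute handling (one timestamp per event), not the fully compliant XES writer used by onprom (e.g., it declares no extensions):

```python
import xml.etree.ElementTree as ET

def to_xes(trace_event_pairs):
    """Serialise (trace, event-timestamp) answer tuples into a minimal XES document.
    Traces paired with None stay empty, mirroring the OPTIONAL pattern above."""
    log = ET.Element("log", {"xes.version": "1.0"})
    traces = {}
    for trace_id, event_ts in trace_event_pairs:
        trace = traces.get(trace_id)
        if trace is None:
            trace = ET.SubElement(log, "trace")
            ET.SubElement(trace, "string", {"key": "concept:name", "value": trace_id})
            traces[trace_id] = trace
        if event_ts is not None:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": event_ts})
    return ET.tostring(log, encoding="unicode")

xml = to_xes([("t1", "2017-03-01T10:00:00"), ("t2", None)])
```

Here t1 becomes a trace with one event and t2 an empty trace, exactly the two cases the query distinguishes.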
4.6 The onprom Toolchain
onprom comes with a toolchain that supports the various phases of the methodology.
The onprom Toolchain
Implementation of all the described steps using

• Java (GUIs, algorithms)

• OWL 2 QL plus functionality (conceptual schemas)

• ontop (OBDA system)

• OpenXES (XES serialisation and manipulation)

• ProM process mining framework (environment)
onprom UML Editor
46 D. Calvanese et al.
Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our CONFSYS running example
onprom Annotation Editor
Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case
onprom Log Extractor
Fig. 20: Screenshot of the Log Extractor plug-in in ProM 6.6
Experiments
• Very encouraging initial experiments

• Carried out using synthetic data

• We are looking for real case studies!
Data Generation with CPN Tools
Results
Postgres
[Plots: running time (in milliseconds, up to ~800,000) against the number of extracted components in the XES log (up to ~10M), and against the number of tuples in the whole database (up to ~3.5M).]
~11 minutes to extract ~9M XES components from ~3.5M tuples
Conclusions
• Process mining as a way to reconcile model-driven management and real behaviours
• Data preparation is an issue in the presence of legacy data
• Ontology-based data access: solid theoretical basis with optimised implementations
• onprom as an effective toolchain for extracting event logs from legacy databases
• Several simplified settings can emerge depending on the context: fixed ERP schema, reference models, …
Future Work
• Conceptual modeling
  • How to improve the discovery of events?
  • How to semi-automatically propose events to the user?
  • How to integrate methodologies and results from formal ontology?
• Engineering
  • How to handle different types of data?
  • How to deal with different event schemas that go beyond XES?
  • How to generalise the approach to handle rich ontology-to-ontology mappings?
Thank you!

 
Processes and organizations - a look behind the paper wall
Processes and organizations - a look behind the paper wallProcesses and organizations - a look behind the paper wall
Processes and organizations - a look behind the paper wall
 
Formal modeling and SMT-based parameterized verification of Data-Aware BPMN
Formal modeling and SMT-based parameterized verification of Data-Aware BPMNFormal modeling and SMT-based parameterized verification of Data-Aware BPMN
Formal modeling and SMT-based parameterized verification of Data-Aware BPMN
 

Recently uploaded

Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 

Recently uploaded (20)

Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 

From legacy data to event data

  • 1. Marco Montali Free University of Bozen-Bolzano, Italy credits: Diego Calvanese, Tahir Emre Kalayci, Ario Santoso, Wil van der Aalst From legacy data to event data 1
  • 2. My research in one slide I investigate foundational and applied techniques grounded in artificial intelligence for modelling, verification, execution, monitoring, and mining of dynamic systems operating over data, with a specific focus on business process management and multiagent systems. 2
  • 3. How to attack these challenges? Artificial Intelligence: knowledge representation, automated reasoning, multiagent systems. Information Systems: business process management, master data management, decision management. Formal Methods: infinite-state systems, verification, Petri nets. Data Science: process mining. 3
  • 7. Processes leave digital breadcrumbs… Organisational level: • Internal management • Calculation of process metrics/KPIs • Legal reasons (compliance, external audits) Personal level: • We live in a digital society! • Social networks, sensors, cyberphysical systems, mobile devices are all data loggers 7
  • 8. [BPM lifecycle diagram] Data and models at the centre; phases: (re)design, configure/deploy, enact/monitor, adjust, diagnose/get reqs.; involving IT support, reality, (knowledge) workers, managers/analysts 8
  • 11. PM2 [Eck et al., CAiSE 2015] Initialization, then analysis iterations over six stages: 1. Planning, 2. Extraction, 3. Data processing, 4. Mining & Analysis (Discovery, Conformance, Enhancement), 5. Evaluation, 6. Process Improvement & Support. Stage outputs/inputs include: event data, event logs, process models, analytic models, performance findings, compliance findings, research questions and refined/new research questions, improvement ideas, the information system. Roles: business experts, process analysts. 11
  • 13. IEEE standard XES www.xes-standard.org IEEE Standard for the representation of event logs • Based on XML • Minimal mandatory structure: a log consists of traces, each representing the history of a case; a trace consists of a list of atomic events • Extensions to “decorate” log, trace, event with informative attributes: timestamps, task names, transactional lifecycle, resources, additional event data • Supports “meta-level” declarations useful for log processors 13
  • 14. Full XES Schema [UML class diagram: a Log (logFeatures, logVersion) contains Traces and Events; a Trace contains Events; Log, Trace, and Event carry Attributes (attKey, attType), which are either elementary (attValue) or composite (containing further attributes, disjointly); Extensions (extName, extPrefix, extUri) declare the attributes they are used by; at log level, global trace/event attributes are declared, and trace/event Classifiers (name) are defined over them] 14
  • 15. Core XES schema [Fig. 13: core event schema. A Trace contains one or more Events; Traces and Events carry Attributes (attKey, attType, attValue)] 15
  • 16.
<log xes.version="1.0" xes.features="nested-attributes" openxes.version="1.0RC7">
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
  <classifier name="Event Name" keys="concept:name"/>
  <string key="concept:name" value="XES Event Log"/>
  ...
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="create paper"/>
      <int key="Event ID" value="35654424"/>
      ...
    </event>
    <event>
      ...
      <string key="concept:name" value="submit paper"/>
      ...
    </event>
    ...
16
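A document with this core XES structure can be assembled with any XML library. The following is a minimal sketch using only the Python standard library; it emits just the mandatory trace/event skeleton with `concept:name` and `time:timestamp` attributes (the helper name `make_xes` and the sample values are illustrative, not part of the standard):

```python
import xml.etree.ElementTree as ET

def make_xes(traces):
    """Build a minimal XES document from {case_id: [(activity, timestamp), ...]}.
    Only the core structure is emitted: a log of traces, each a list of events."""
    log = ET.Element("log", {"xes.version": "1.0"})
    ET.SubElement(log, "extension", {
        "name": "Time", "prefix": "time",
        "uri": "http://www.xes-standard.org/time.xesext"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string",
                      {"key": "concept:name", "value": case_id})
        for activity, ts in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string",
                          {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date",
                          {"key": "time:timestamp", "value": ts})
    return ET.tostring(log, encoding="unicode")

xml_doc = make_xes({"1": [("create paper", "2010-12-30T11:02:00"),
                          ("submit paper", "2010-12-31T10:06:00")]})
print(xml_doc)
```

For real-world use, dedicated libraries (e.g., OpenXES in Java) also handle global attributes, classifiers, and the full extension mechanism sketched on the previous slides.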
  • 17. A simple process. Apologies for being so predictable… Tasks (performers): create paper (author), submit paper (author), assign reviewer (chair), review paper (reviewer), submit review (reviewer), take decision (chair); accept? Y: accept paper (chair), then upload camera ready (author); N: reject paper (chair). Fig. 2: The process for managing papers in a simplified conference submission system. 17
  • 18. The lucky situation. (Same process as in Fig. 2; gray tasks are external to the conference information system and cannot be logged.) Example 1. As a running example, we consider a simplified conference submission system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors, reviewers, and conference chairs in the submission of papers to conferences, the consequent review process, and the final decision about paper acceptance or rejection. Figure 2 shows the process control flow considering papers as case objects. Under this perspective, the management of a single paper evolves through the following execution steps. First, the paper is created by one of its authors, and submitted to a conference available in the system. Once the paper is submitted, the review phase for that paper starts. This phase of the process consists of a so-called multi-instance section, i.e., a section of the process where the same set of activities is instantiated multiple times on… Event data:
Case ID | Event ID | Timestamp | Activity | User
1 | 35654423 | 30-12-2010:11.02 | create paper | Pete
1 | 35654424 | 31-12-2010:10.06 | submit paper | Pete
1 | 35654425 | 05-01-2011:15.12 | assign review | Mike
1 | 35654426 | 06-01-2011:11.18 | submit review | Sara
1 | 35654428 | 07-01-2011:14.24 | accept paper | Mike
1 | 35654429 | 06-01-2011:11.18 | upload CR | Pete
2 | 35654483 | 30-12-2010:11.32 | create paper | George
2 | 35654485 | 30-12-2010:12.12 | submit paper | John
2 | 35654487 | 30-12-2010:14.16 | assign review | Mike
2 | 35654489 | 16-01-2011:10.30 | submit review | Ellen
2 | 35654490 | 18-01-2011:12.05 | reject paper | Mike
18
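In this lucky situation, turning the flat event table into per-case traces is a matter of grouping on the case identifier and sorting by timestamp. A small sketch (rows abbreviated from the table; the function name `to_traces` is just illustrative):

```python
from collections import defaultdict
from datetime import datetime

# (case_id, event_id, timestamp, activity, user), as in the table above;
# note the rows are deliberately not pre-sorted
rows = [
    ("1", 35654424, "31-12-2010:10.06", "submit paper", "Pete"),
    ("1", 35654423, "30-12-2010:11.02", "create paper", "Pete"),
    ("2", 35654483, "30-12-2010:11.32", "create paper", "George"),
    ("2", 35654485, "30-12-2010:12.12", "submit paper", "John"),
]

def to_traces(rows):
    """Group events by case id and order each trace by timestamp."""
    traces = defaultdict(list)
    for case_id, event_id, ts, activity, user in rows:
        when = datetime.strptime(ts, "%d-%m-%Y:%H.%M")
        traces[case_id].append((when, activity, user))
    return {case: sorted(events) for case, events in traces.items()}

traces = to_traces(rows)
print([activity for _, activity, _ in traces["1"]])
```

This is exactly the grouping that log-extraction tools perform implicitly once a case notion is fixed; the harder cases on the following slides arise when no single case column exists.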
  • 20. The common case. Fig. 11: DB schema for the information system of the conference submission system (primary keys underlined and foreign keys in italic in the original figure): ACCEPTANCE(ID, uploadtime, user, paper), CONFERENCE(ID, name, organizer, time), DECISION(ID, decisiontime, chair, outcome), LOGIN(ID, user, CT), SUBMISSION(ID, uploadtime, user, paper), PAPER(ID, title, CT, user, conf, type, status), REVIEW(ID, RRid, submissiontime), REVIEWREQUEST(ID, invitationtime, reviewer, paper). Intuitively, mapping assertions involving such atoms are used to map source relations (and the tuples they store) to concepts, roles, and features of the ontology (and the objects and the values that constitute their instances), respectively. Note that for a feature atom, the type of values retrieved from the source database is not specified, and needs to be determined based on the data type of the variable v2 in the source query. Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. Some example mapping assertions are the following ones: 1. SELECT DISTINCT SUBMISSION.ID AS oid FROM SUBMISSION, PAPER WHERE SUBMISSION.PAPER = PAPER.ID AND SUBMISSION.UPLOADTIME = PAPER.CT 20
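The source side of mapping assertion 1 is plain SQL, so its effect can be checked directly: it selects submissions whose upload time coincides with the paper's creation time, i.e., first submissions. A runnable sketch over an in-memory SQLite copy of the two tables involved (the row contents are invented purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PAPER (ID INTEGER PRIMARY KEY, title TEXT, CT TEXT,
                    user TEXT, conf TEXT, type TEXT, status TEXT);
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT,
                         user TEXT, paper INTEGER REFERENCES PAPER(ID));
-- hypothetical rows: submission 10 coincides with the paper's creation
-- time, submission 11 is a later re-upload of the same paper
INSERT INTO PAPER VALUES
    (1, 'A paper', '2011-01-05T15:12', 'pete', 'c1', 'full', 'ok');
INSERT INTO SUBMISSION VALUES (10, '2011-01-05T15:12', 'pete', 1);
INSERT INTO SUBMISSION VALUES (11, '2011-01-07T09:00', 'pete', 1);
""")

# Mapping assertion 1 from the slide: first submissions only
first_submissions = conn.execute("""
    SELECT DISTINCT SUBMISSION.ID AS oid
    FROM SUBMISSION, PAPER
    WHERE SUBMISSION.PAPER = PAPER.ID
      AND SUBMISSION.UPLOADTIME = PAPER.CT
""").fetchall()
print(first_submissions)
```

In the full OBDA setting, the identifiers retrieved by such a query would not be used directly but plugged into templates constructing the ontology-level objects and events.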
  • 24. Intertwined objects: time, data, activities. [Data model: Order 1–* includes Item; Item *–1 is carried in Package. Objects: orders o1, o2, o3; items i1,1, i1,2, i2,1, i2,2, i2,3, i3,1; packages p1, p2, p3.] Figure 1: Structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages. Event log excerpt (timestamp, overall log entry): 2019-09-22 10:00:00 create order o1; 2019-09-22 10:01:00 add item i1,1 to order o1; 2019-09-23 09:20:00 create order o2; 2019-09-23 09:34:00 add item i2,1 to order o2; 2019-09-23 11:33:00 create order o3; 2019-09-23 11:40:00 add item i3,1 to order o3; 2019-09-23 12:27:00 pay order o3. Have you ever placed orders online? 24
  • 25. Flattening reality. Same data model and event log excerpt as slide 24 (Figure 1: structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages), now with focus on orders: each event is projected onto the per-order log columns of the orders it relates to. 25
  • 27. The effect of flattening. Figure 1: Structure of order, item, and package data objects in an order-to-delivery scenario where items from different orders are carried in several packages. Event log for orders (columns: timestamp | overall log | per-order entries for o1, o2, o3):
2019-09-22 10:00:00 | create order o1 | create order
2019-09-22 10:01:00 | add item i1,1 to order o1 | add item
2019-09-23 09:20:00 | create order o2 | create order
2019-09-23 09:34:00 | add item i2,1 to order o2 | add item
2019-09-23 11:33:00 | create order o3 | create order
2019-09-23 11:40:00 | add item i3,1 to order o3 | add item
2019-09-23 12:27:00 | pay order o3 | pay order
2019-09-23 12:32:00 | add item i1,2 to order o1 | add item
2019-09-23 13:03:00 | pay order o1 | pay order
2019-09-23 14:34:00 | load item i1,1 into package p1 | load item
2019-09-23 14:45:00 | add item i2,2 to order o2 | add item
2019-09-23 14:51:00 | load item i3,1 into package p1 | load item
2019-09-23 15:12:00 | add item i2,3 to order o2 | add item
2019-09-23 15:41:00 | pay order o2 | pay order
2019-09-23 16:23:00 | load item i2,1 into package p2 | load item
2019-09-23 16:29:00 | load item i1,2 into package p2 | load item
2019-09-23 16:33:00 | load item i2,2 into package p2 | load item
2019-09-23 17:01:00 | send package p1 | send package, send package
2019-09-24 06:38:00 | send package p2 | send package, send package
2019-09-24 07:33:00 | load item i2,3 into package p3 | load item
2019-09-24 08:46:00 | send package p3 | send package
2019-09-24 16:21:00 | deliver package p1 | deliver package, deliver package
2019-09-24 17:32:00 | deliver package p2 | deliver package, deliver package
2019-09-24 18:52:00 | deliver package p3 | deliver package
2019-09-24 18:57:00 | accept delivery p3 | accept delivery
2019-09-25 08:30:00 | deliver package p1 | deliver package, deliver package
2019-09-25 08:32:00 | accept delivery p1 | accept delivery, accept delivery
2019-09-25 09:55:00 | deliver package p2 | deliver package, deliver package
2019-09-25 17:11:00 | deliver package p2 | deliver package, deliver package
2019-09-25 17:12:00 | accept delivery p2 | accept delivery, accept delivery
27
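The flattening step these slides illustrate can be sketched programmatically: each event lists the objects it refers to, and the flat trace for an order collects every event touching the order itself, one of its items, or a package carrying one of its items. A toy sketch (object identifiers follow the scenario; the two correlation tables are a simplifying assumption, not part of the original log):

```python
# Each event: (timestamp, activity, set of object ids it refers to)
events = [
    (1, "create order", {"o1"}),
    (2, "add item",     {"o1", "i1,1"}),
    (3, "create order", {"o2"}),
    (4, "add item",     {"o2", "i2,1"}),
    (5, "load item",    {"i1,1", "p1"}),
    (6, "load item",    {"i2,1", "p1"}),
    (7, "send package", {"p1"}),
]

# Assumed correlation tables: which order each item belongs to,
# and which package carries each item
item_order = {"i1,1": "o1", "i2,1": "o2"}
item_package = {"i1,1": "p1", "i2,1": "p1"}

def objects_of(order):
    """All objects correlated to an order: itself, its items, their packages."""
    items = {i for i, o in item_order.items() if o == order}
    packages = {item_package[i] for i in items}
    return {order} | items | packages

def flatten(order):
    """Flat trace: activities of every event touching the order's objects."""
    scope = objects_of(order)
    return [act for _, act, objs in sorted(events) if objs & scope]

print(flatten("o1"))
print(flatten("o2"))
# The single 'send package' on p1 ends up in BOTH flat traces: replication.
```

Running this shows the first undesired effect named on the next slide: the shared package event is replicated into the trace of every order whose items it carries.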
  • 30. Discovery? To discover a process model that explains the behavior, a notion of case must be applied to the raw log, and a flat view of the log is computed per case object. The figure shows the effect of this flattening when Order is the case notion: one trace per order is obtained by filtering the raw log, keeping for each order the events that directly refer to that order, to one of its items, or to a package carrying one of its items. Two undesired effects consequently arise: 1. Replication of tasks: when an event relates to multiple case objects, it is replicated in the traces of all such case objects (in our scenario, the events on package p2 refer to both order o1 and order o2 and thus appear in both traces). 2. Shuffling of independent threads: events of the same activity applied to different objects of the same type are shuffled together, making it impossible to distinguish to which actual object each event refers (e.g., which item is added to an order, or which delivery attempt an acceptance corresponds to). The result of these two undesired effects is a discovered model containing misleading information and apparent frequencies, clearly visible in the directly-follows graph with frequencies produced by the well-known Disco process mining tool on the order log of our scenario: create order 3, add item 6, pay order 3, load item 6, send package 5, deliver package 11, accept delivery 5. 30
  • 31. Discovery?
[Same figure and directly-follows graph as the previous slide, now annotated: the flattened order log makes Disco discover a non-existing loop and wrong statistics.]
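The flattening step and its two undesired effects are easy to reproduce in a few lines. The sketch below (my own illustration, not part of the talk) flattens a tiny object-centric log onto the Order case notion and computes activity frequencies and a directly-follows graph: the single event on a shared package is replicated into both order traces, inflating the counts exactly as in the Disco graph above.

```python
from collections import Counter

# A tiny object-centric log: each event may refer to several orders
# (e.g. a package carrying items of two different orders).
events = [
    {"act": "create order",    "time": 1, "orders": {"o1"}},
    {"act": "create order",    "time": 2, "orders": {"o2"}},
    {"act": "add item",        "time": 3, "orders": {"o1"}},
    {"act": "add item",        "time": 4, "orders": {"o2"}},
    {"act": "load item",       "time": 5, "orders": {"o1"}},        # into shared package p2
    {"act": "load item",       "time": 6, "orders": {"o2"}},        # into shared package p2
    {"act": "send package",    "time": 7, "orders": {"o1", "o2"}},  # p2 serves both orders
    {"act": "deliver package", "time": 8, "orders": {"o1", "o2"}},
]

def flatten(events, case):
    """Flat trace for one order: all events that (in)directly refer to it."""
    return [e["act"] for e in sorted(events, key=lambda e: e["time"])
            if case in e["orders"]]

traces = {o: flatten(events, o) for o in ("o1", "o2")}

# Replication: the single 'send package' event now occurs in both traces,
# so its frequency in the flattened log is 2, not 1.
freq = Counter(a for t in traces.values() for a in t)

# Directly-follows counts over the flattened traces (also doubled).
dfg = Counter((t[i], t[i + 1]) for t in traces.values() for i in range(len(t) - 1))
```

On this toy input `freq["send package"]` is 2 even though the package was sent once, which is precisely the kind of wrong statistic the annotated Disco graph exhibits.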
  • 32. Event log maturity levels (Level — Characterization — Examples):
★★★★★ — Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies; events and their attributes point to this ontology. Examples: semantically annotated logs of BPM systems.
★★★★ — Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner. Examples: event logs of traditional BPM/workflow systems.
★★★ — Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system: although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists, and vice versa). Examples: tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.
★★ — Events are recorded automatically, i.e., as a by-product of some information system. Coverage varies, i.e., no systematic approach is followed to decide which events are recorded. Moreover, it is possible to bypass the information system. Hence, events may be missing or not recorded properly. Examples: event logs of document and product management systems, error logs of embedded systems, worksheets of service engineers, etc.
★ — Lowest level: event logs are of poor quality. Recorded events may not correspond to reality and events may be missing. Event logs for which events are recorded by hand typically have such characteristics. Examples: trails left in paper documents routed through the organization ("yellow notes"), paper-based medical records, etc.
  • 33. Level 4-5: straightforward syntactic manipulation. Level 3: much more difficult:
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and events
• Processes with one-to-many and many-to-many relations
[Maturity-level table repeated from the previous slide.]
  • 34. [Previous slide repeated, with a pointer:] Not covered today, but see recent works by Dirk Fahland, Wil van der Aalst, and my group:
https://pais.hse.ru/en/seminar-pne/
https://multiprocessmining.org
http://ocel-standard.org
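The object-centric direction pointed at by ocel-standard.org avoids flattening altogether by letting each event reference several objects of different types. A minimal sketch of such a structure follows; the field names loosely echo the OCEL JSON serialisation but are illustrative, not the normative schema:

```python
# An object-centric event keeps its object references explicit instead of
# being copied into one trace per case object (illustrative structure only).
event = {
    "id": "e7",
    "activity": "send package",
    "timestamp": "2021-03-01T10:00:00",
    "omap": {"p2", "o1", "o2"},   # objects the event refers to
    "vmap": {"weight": 2.5},      # hypothetical event attribute map
}

# Object universe: identifier -> object type.
objects = {"p2": "Package", "o1": "Order", "o2": "Order"}

def objects_of_type(event, objects, otype):
    """Select the event's object references of a given type."""
    return {o for o in event["omap"] if objects[o] == otype}
```

With this representation, asking "which orders does this event touch?" is a lookup rather than a lossy flattening step.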
  • 35. Extracting XES from legacy data [___,BIS2017]
Manual construction of views and ETL procedures to fetch the data.
Done by IT experts, not by knowledge workers (domain experts).
[Figure: Traditional methodology — Create data model → Choose perspective → Extract relevant tables → Design views with relevant attributes → Design composite views → Design log view → Export to XES/CSV → Do process mining → Other perspective? (Y: repeat / N: done). Caption fragment on log extraction and process mining: "Finally, EBITmax converted the log view into a CSV file, and analysed it using the Disco process mining toolkit."]
  • 36. Extracting XES from legacy data [___,BIS2017]
Crucial issues:
• Correctness: who knows? Process mining is dangerous if applied to wrong data
• Maintenance, evolution, and change of perspective are hard… but process mining should be highly interactive
[Same methodology figure as the previous slide.]
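To make the traditional methodology concrete, here is a toy rendering of its core steps over an in-memory SQLite database (the table and column names are hypothetical): an IT expert hand-crafts a log view joining the relevant tables, then exports it to the CSV format consumed by process mining tools. This is a sketch of the workflow being criticised, not the onprom approach.

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE paper(id INTEGER, title TEXT);
CREATE TABLE submission(id INTEGER, paper INTEGER, uploadtime TEXT);
INSERT INTO paper VALUES (1, 'A'), (2, 'B');
INSERT INTO submission VALUES (10, 1, '2021-01-05'), (11, 2, '2021-01-06');
""")

# Hand-crafted "log view": one row per event, case id = paper id.
# Changing the perspective means rewriting this view by hand.
con.execute("""
CREATE VIEW log_view AS
SELECT s.paper AS case_id, 'submit' AS activity, s.uploadtime AS timestamp
FROM submission s JOIN paper p ON s.paper = p.id
""")

# Export to CSV for a process mining tool.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["case_id", "activity", "timestamp"])
w.writerows(con.execute("SELECT * FROM log_view ORDER BY timestamp"))
```

Every change of perspective or schema evolution forces the view and export to be redone by the IT expert, which is exactly the maintenance problem the slide points out.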
  • 37. The onprom approach — onprom.inf.unibz.it
Semantic technologies to:
1. Understand the data
2. Access the data using the domain vocabulary
3. Express the perspective for process mining using the domain vocabulary
4. Automatise the extraction of XES event logs
[Fig. 12 (from Calvanese et al.): The onprom methodology and its four phases — high-level IS? → Create conceptual data schema + Create mappings, or Bootstrap model + mappings → Enrich model + mappings → Choose perspective → Create event-data annotations → Get XES/CSV → Do process mining → Other perspective? (Y/N).] Bootstrapping at the same time generates (identity) mappings to link the two specifications; the result of bootstrapping can then be manually refined. Once the first phase is completed, process analysts and the other involved stakeholders no longer need to consider the structure of the legacy information system.
  • 38. Step 1. Understand the data 38
  • 39. Ontology-Based Data Access (aka Virtual Knowledge Graphs) 39
  • 40. Data access is becoming a bottleneck
Optique project: Scalable, End-User Access to Big Data (http://optique-project.eu)
One case study: Statoil
• geologists and engineers develop models of unexplored areas based on drilling operations done in surrounding sites
Crompton (2008): domain experts use (too much) time to fetch data for decision making and to do their job
• Engineers in the oil/gas sector: 30-70% of working time spent on data access and data quality
  • 41. Facts on Statoil • 1000 TB of relational data (SQL) • Non-aligned schemas, each with 2K+ tables • 900 experts within “Statoil Exploration” • Up to 4 days needed to express queries and translate them into SQL 41
  • 42. Example of query
How much time/money is spent searching for data?
A user query at Statoil: "Show all norwegian wellbores with some aditional attributes (wellbore id, completion date, oldest penetrated age, result). Limit to all wellbores with a core and show attributes like (wellbore id, core number, top core depth, base core depth, intersecting stratigraphy). Limit to all wellbores with core in Brentgruppen and show key atributes in a table. After connecting to EPDS (slegge) we could for instance limit futher to cores in Brent with measured permeability and where it is larger than a given value, for instance 1 mD. We could also find out whether there are cores in Brent which are not stored in EPDS (based on NPD info) and where there could be permeability values. Some of the missing data we possibly own, other not." [sic]
(Slide adapted from Diego Calvanese (FUB), Ontologies for Data Integration, FOfAI 2015, Buenos Aires, 27/7/2015)
  • 43. 43 A user query at Statoil Show all norwegian wellbores with some aditional attributes (wellbore id, completion date, oldest penetrated age,result). Limit to all wellbores with a core and show attributes like (wellbore id, core number, top core depth, base core depth, intersecting stratigraphy). Limit to all wellbores with core in Brentgruppen and show key atributes in a table. After connecting to EPDS (slegge) we could for instance limit futher to cores in Brent with measured permeability and where it is larger than a given value, for instance 1 mD. We could also find out whether there are cores in Brent which are not stored in EPDS (based on NPD info) and where there could be permeability values. Some of the missing data we possibly own, other not. SELECT [...] FROM db_name.table1 table1, db_name.table2 table2a, db_name.table2 table2b, db_name.table3 table3a, db_name.table3 table3b, db_name.table3 table3c, db_name.table3 table3d, db_name.table4 table4a, db_name.table4 table4b, db_name.table4 table4c, db_name.table4 table4d, db_name.table4 table4e, db_name.table4 table4f, db_name.table5 table5a, db_name.table5 table5b, db_name.table6 table6a, db_name.table6 table6b, db_name.table7 table7a, db_name.table7 table7b, db_name.table8 table8, db_name.table9 table9, db_name.table10 table10a, db_name.table10 table10b, db_name.table10 table10c, db_name.table11 table11, db_name.table12 table12, db_name.table13 table13, db_name.table14 table14, db_name.table15 table15, db_name.table16 table16 WHERE [...] 
table2a.attr1=‘keyword’ AND table3a.attr2=table10c.attr1 AND table3a.attr6=table6a.attr3 AND table3a.attr9=‘keyword’ AND table4a.attr10 IN (‘keyword’) AND table4a.attr1 IN (‘keyword’) AND table5a.kinds=table4a.attr13 AND table5b.kinds=table4c.attr74 AND table5b.name=‘keyword’ AND (table6a.attr19=table10c.attr17 OR (table6a.attr2 IS NULL AND table10c.attr4 IS NULL)) AND table6a.attr14=table5b.attr14 AND table6a.attr2=‘keyword’ AND (table6b.attr14=table10c.attr8 OR (table6b.attr4 IS NULL AND table10c.attr7 IS NULL)) AND table6b.attr19=table5a.attr55 AND table6b.attr2=‘keyword’ AND table7a.attr19=table2b.attr19 AND table7a.attr17=table15.attr19 AND table4b.attr11=‘keyword’ AND table8.attr19=table7a.attr80 AND table8.attr19=table13.attr20 AND table8.attr4=‘keyword’ AND table9.attr10=table16.attr11 AND table3b.attr19=table10c.attr18 AND table3b.attr22=table12.attr63 AND table3b.attr66=‘keyword’ AND table10a.attr54=table7a.attr8 AND table10a.attr70=table10c.attr10 AND table10a.attr16=table4d.attr11 AND table4c.attr99=‘keyword’ AND table4c.attr1=‘keyword’ AND table11.attr10=table5a.attr10 AND table11.attr40=‘keyword’ AND table11.attr50=‘keyword’ AND table2b.attr1=table1.attr8 AND table2b.attr9 IN (‘keyword’) AND table2b.attr2 LIKE ‘keyword’% AND table12.attr9 IN (‘keyword’) AND table7b.attr1=table2a.attr10 AND table3c.attr13=table10c.attr1 AND table3c.attr10=table6b.attr20 AND table3c.attr13=‘keyword’ AND table10b.attr16=table10a.attr7 AND table10b.attr11=table7b.attr8 AND table10b.attr13=table4b.attr89 AND table13.attr1=table2b.attr10 AND table13.attr20=’‘keyword’’ AND table13.attr15=‘keyword’ AND table3d.attr49=table12.attr18 AND table3d.attr18=table10c.attr11 AND table3d.attr14=‘keyword’ AND table4d.attr17 IN (‘keyword’) AND table4d.attr19 IN (‘keyword’) AND table16.attr28=table11.attr56 AND table16.attr16=table10b.attr78 AND table16.attr5=table14.attr56 AND table4e.attr34 IN (‘keyword’) AND table4e.attr48 IN (‘keyword’) AND table4f.attr89=table5b.attr7 AND 
table4f.attr45 IN (‘keyword’) AND table4f.attr1=‘keyword’ AND table10c.attr2=table4e.attr19 AND (table10c.attr78=table12.attr56 OR (table10c.attr55 IS NULL AND table12.attr17 IS NULL))
  • 44. [Same user query and generated SQL as the previous slide.] 50M € per year
  • 48. [Diagram: a geologist with an info request on one side, the data on the other; the domain knowledge belongs to the geologist, while only the IT expert can write SQL queries over the data.]
  • 49. [Build: the IT expert runs the SQL queries and obtains SQL answers from the data.]
  • 50. [Build: the IT expert combines the SQL answers into integrated SQL answers.]
  • 51. [Build: the integrated SQL answers are turned into an answer in domain terms.]
  • 52. [Build: the answer finally reaches the geologist, closing the loop from info request to answer.]
  • 57. OBDA Main components
[Figure: ontology-based data integration framework — Query/Result at the top; the Ontology provides the global vocabulary and conceptual view; Mappings semantically link the sources and the ontology; Data Sources are external and heterogeneous.]
We achieve logical transparency in accessing data: the user does not know where and how the data is stored, and can only see a conceptual view of the data.
  • 58. OBDA Main technologies
[Same framework figure, annotated with the technologies:]
• Data sources: SQL (or other technologies), with their schemas
• Ontology / conceptual model: OWL 2 QL / UML class diagrams; the exposed virtual knowledge graph consists of RDF triples
• Mappings: R2RML
• Queries: SPARQL
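The logical transparency of the previous slides rests on query unfolding: an atom over the ontology vocabulary is rewritten, through the GAV mappings, into a union of source SQL queries. A toy sketch of that idea (the mapping table and SQL are hypothetical stand-ins for what real systems express in R2RML):

```python
# GAV mappings: each ontology predicate is defined by one or more SQL queries
# over the sources (hypothetical schema; real OBDA systems use R2RML).
mappings = {
    "Creation": [
        "SELECT s.id FROM submission s JOIN paper p "
        "ON s.paper = p.id AND s.uploadtime = p.ct",
    ],
    "Paper": [
        "SELECT id FROM paper",
        "SELECT paper AS id FROM submission",  # papers known only via submissions
    ],
}

def unfold(concept):
    """Rewrite a conceptual atom into a UNION of the source SQL queries
    that the mappings associate with it."""
    return " UNION ".join(mappings.get(concept, []))

sql = unfold("Paper")  # a UNION of two source queries
```

The user only ever mentions `Creation` or `Paper`; where and how the data is stored is hidden inside `mappings`, which is exactly the transparency claim of the framework figure.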
  • 59. ontop-vkg.org
• State-of-the-art OBDA system
• Compliant with RDF(S), OWL 2 QL, R2RML, SPARQL
• Supports all major relational DBMSs (Oracle, SQL Server, Postgres, …)
• Support for other data storage mechanisms ongoing (MongoDB, …)
• Development started in 2009
• Wide adoption in academia and industry
• At the basis of https://ontopic.biz
  • 60. Conference Example: Conceptual Schema
[Fig. 9: Data model of our CONFSYS running example — classes: Paper (title: String, type: String) with subclass DecidedPaper (decTime: ts, accepted: boolean); Person (pName: String, regTime: ts); Conference (cName: String, crTime: ts); Submission (uploadTime: ts) with subclasses Creation and CRUpload; Assignment (invTime: ts); Review (subTime: ts); associations: submittedTo, chairs, notifiedBy, leadsTo.]
N.B.: in onprom we use DL-LiteA (supports a controlled form of functionality)
  • 61. Behind the scenes…
δ(title) ≡ Paper, ρ(title) ⊑ string, (funct title)
δ(type) ≡ Paper, ρ(type) ⊑ string, (funct type)
δ(decTime) ≡ DecidedPaper, ρ(decTime) ⊑ ts, (funct decTime)
δ(accepted) ≡ DecidedPaper, ρ(accepted) ⊑ boolean, (funct accepted)
δ(pName) ≡ Person, ρ(pName) ⊑ string, (funct pName)
δ(regTime) ≡ Person, ρ(regTime) ⊑ ts, (funct regTime)
δ(cName) ≡ Conference, ρ(cName) ⊑ string, (funct cName)
δ(crTime) ≡ Conference, ρ(crTime) ⊑ ts, (funct crTime)
δ(uploadTime) ≡ Submission, ρ(uploadTime) ⊑ ts, (funct uploadTime)
δ(invTime) ≡ Assignment, ρ(invTime) ⊑ ts, (funct invTime)
δ(subTime) ≡ Review, ρ(subTime) ⊑ ts, (funct subTime)
DecidedPaper ⊑ Paper, Creation ⊑ Submission, CRUpload ⊑ Submission
∃Submission₁ ≡ Submission, ∃Submission₁⁻ ≡ Paper, (funct Submission₁)
∃Submission₂ ≡ Submission, ∃Submission₂⁻ ⊑ Person, (funct Submission₂)
∃Assignment₁ ≡ Assignment, ∃Assignment₁⁻ ⊑ Paper, (funct Assignment₁)
∃Assignment₂ ≡ Assignment, ∃Assignment₂⁻ ⊑ Person, (funct Assignment₂)
∃leadsTo ⊑ Assignment, ∃leadsTo⁻ ≡ Review, (funct leadsTo), (funct leadsTo⁻)
∃submittedTo ≡ Paper, ∃submittedTo⁻ ⊑ Conference, (funct submittedTo)
∃notifiedBy ≡ DecidedPaper, ∃notifiedBy⁻ ⊑ Person, (funct notifiedBy)
∃chairs ⊑ Person, ∃chairs⁻ ≡ Conference, (funct chairs⁻)
[Fig. 9 repeated.]
Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Obviously, since, due to reification, the ontology alphabet may contain additional symbols with respect to those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram […]
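For intuition on what the concept inclusions above entail, the sketch below (my own illustration; DL-LiteA reasoners actually work by first-order query rewriting, not by materialising closures) saturates the named-concept inclusions of the CONFSYS encoding with a plain transitive closure:

```python
# Named-concept inclusions A ⊑ B from the CONFSYS encoding.
inclusions = {
    ("DecidedPaper", "Paper"),
    ("Creation", "Submission"),
    ("CRUpload", "Submission"),
}

def saturate(incl):
    """Transitive closure of the ⊑ relation over named concepts:
    from A ⊑ B and B ⊑ C, derive A ⊑ C until a fixpoint is reached."""
    closed = set(incl)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed
```

Adding a hypothetical inclusion such as Paper ⊑ Artifact would let the closure derive DecidedPaper ⊑ Artifact, which is the kind of entailment OBDA query answering must respect.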
  • 62. Mapping Example
[Fig. 9 repeated.]
Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. We give some examples of mapping assertions:
– The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid retrieved through the source query. This mapping expresses that each value in SUBMISSION identified by oid and such that its upload time equals the corresponding paper's creation time is mapped to an object :submission/oid, which becomes an instance of the concept Creation in T.
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
AND SUBMISSION.UPLOADTIME = PAPER.CT
:submission/{oid} rdf:type :Creation .
– The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and instantiates also their features title and type with values of type String.
SELECT ID, title, type FROM PAPER […]
[Figure 11 (excerpt): ACCEPTANCE(ID, uploadtime, user, paper); CONFERENCE(ID, name, organizer, time); DECISION(ID, decisiontime, chair, outcome); LOGIN(ID, user, CT); SUBMISSION(ID, uploadtime, user, paper); PAPER(ID, title, CT, user, conf, type, status)]
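Operationally, executing such a mapping assertion amounts to running its source query and instantiating the URI template for every answer tuple. A self-contained sketch over a toy SQLite instance (the sample rows are invented; column names follow Figure 11):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PAPER(ID INTEGER, TITLE TEXT, CT TEXT);
CREATE TABLE SUBMISSION(ID INTEGER, UPLOADTIME TEXT, PAPER INTEGER);
INSERT INTO PAPER VALUES (1, 'Log Extraction', '2021-01-05');
INSERT INTO SUBMISSION VALUES (10, '2021-01-05', 1);  -- first upload = creation
INSERT INTO SUBMISSION VALUES (11, '2021-02-01', 1);  -- later camera-ready upload
""")

# Source query of the mapping assertion for the concept Creation.
source = """
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID AND SUBMISSION.UPLOADTIME = PAPER.CT
"""

# Target: instantiate the URI template :submission/{oid} for each answer,
# asserting membership in the concept Creation.
triples = [(f":submission/{oid}", "rdf:type", ":Creation")
           for (oid,) in con.execute(source)]
```

Only submission 10 matches the paper's creation time, so only `:submission/10` is typed as a Creation; submission 11 stays a plain Submission, exactly as the mapping's intent dictates.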
  • 65. From OBDA to 2-level OBDA [___,EKAW2018] 65
  • 66. From OBDA to 2-level OBDA [___,EKAW2018]
[Figure: data –map→ domain schema –transform→ upper schema, with query/answer at the top level.]
• data: relational DB
• map: GAV mappings (SQL query → atom)
• domain schema: UML class diagram / OWL 2 QL TBox
• transform: transformation rules (ontology-to-ontology GAV mappings)
• upper schema: UML class diagram / OWL 2 QL TBox
• query/answer: UCQs
  • 67. Theoretical Results
[Figure: a query Q posed over the upper schema is rewritten into a query Q' over the domain schema and answered through the underlying OBDA specification.]
  • 68. Theoretical Results
[Same figure repeated, alongside its instantiations: (a) 2-level OBDA and (b) the 2OBDA framework.]
  • 69. Case study: reference model
[Figure (b): the 2OBDA framework instantiated — data –map→ domain schema –identify services and commitments→ UFO-S, used to inspect contract states.]
  • 70. Case study: process mining!
[Figure (c): the 2OBDA framework instantiated for process mining — data –map→ domain schema –identify cases and events→ event log format, from which cases and events are fetched by a process mining tool.]
  • 71. Step 2. Find the event data 71
  • 72. Annotating the Conceptual Schema
Fix the perspective: declare the case
• Find the class whose instances are considered as case objects
• Express additional filters
Find the events (looking for timestamps)
• Find the classes whose instances refer to events
• Declare how they are connected to the corresponding case objects → navigation in the UML class diagram
• Declare how they are (in)directly related to event attributes (timestamp, task name, optionally event type and resource) → navigation in the UML class diagram
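Once the case and event annotations are fixed, log extraction is conceptually simple: collect the annotated events, resolve each one to its case object through the declared navigation, group by case, and order by timestamp. A hedged sketch (plain dictionaries stand in for the actual onprom annotation language and for the OBDA query answering that would populate them):

```python
from collections import defaultdict

# Annotated events: each carries a task name, the value of its declared
# timestamp attribute, and the case object reached via its navigation path
# (here already resolved; in onprom this resolution happens through OBDA).
annotated = [
    {"task": "Creation",   "ts": "2021-01-05", "case": "paper1"},
    {"task": "Submission", "ts": "2021-02-01", "case": "paper1"},
    {"task": "Review",     "ts": "2021-03-01", "case": "paper1"},
    {"task": "Decision",   "ts": "2021-04-01", "case": "paper1"},
    {"task": "Creation",   "ts": "2021-01-07", "case": "paper2"},
]

def to_traces(events):
    """Group events by case object and sort each trace by timestamp,
    yielding the per-case traces that an XES log serialises."""
    traces = defaultdict(list)
    for e in sorted(events, key=lambda e: e["ts"]):
        traces[e["case"]].append(e["task"])
    return dict(traces)

traces = to_traces(annotated)
```

Changing the perspective (e.g., making Review the case) only changes the annotations, not any hand-written ETL code, which is the interactivity the approach is after.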
  • 73. Conference Example: Case Annotation
[Fig. 9 annotated: Paper carries the Case annotation; Event annotations are attached to the timestamped classes —
• Event Submission: timestamp uploadTime, case via Submission₁
• Event Review: timestamp subTime, case via leadsTo → Assignment₁
• Event Creation: timestamp uploadTime, case via Submission → Submission₁
• Event Decision: timestamp decTime, case Paper]
  • 74. Conference Example Case Annotation 74 OBDA for Log Extraction in Process Mining 25 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs * * * 1..* * 1 1 0..1 * 1 1 * Fig. 9: Data model of our CONFSYS running example OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 * OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event 
Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 *
  • 75. Conference Example Event annotation 75 OBDA for Log Extraction in Process Mining 25 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs * * * 1..* * 1 1 0..1 * 1 1 * Fig. 9: Data model of our CONFSYS running example OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 * OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event 
Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 *
  • 76. Conference Example Event annotation 76 OBDA for Log Extraction in Process Mining 25 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs * * * 1..* * 1 1 0..1 * 1 1 * Fig. 9: Data model of our CONFSYS running example OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 * OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event 
Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 *
  • 77. 77 OBDA for Log Extraction in Process Mining 25 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs * * * 1..* * 1 1 0..1 * 1 1 * Fig. 9: Data model of our CONFSYS running example OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: Paper * * * 1..* * 1 1 0..1 * 1 1 * OBDA for Log Extraction in Process Mining 39 Paper title : String type : String Person pName : String regTime: ts Assignment invTime: ts Submission uploadTime: ts CRUpload Creation DecidedPaper decTime: ts accepted: boolean notifiedBy Review subTime: ts leadsTo Conference cName: String crTime: ts submittedTo chairs Case Case Event Submission Timestamp: uploadTime Case: Submission1 Event Submission Timestamp: uploadTime Case: Submission1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Review Timestamp: subTime Case: leadsTo !Assignment1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Creation Timestamp: uploadTime Case: Submission!Submission1 Event Decision Timestamp: decTime Case: Paper Event Decision Timestamp: decTime Case: 
Paper * * * 1..* * 1 1 0..1 * 1 1 *
  • 78. Switching Perspective Simply amounts to redefining the annotations • Flow of accepted papers • Flow of full papers • Flow of reviews • Flow of authors • Flow of reviewers • …. 78
  • 79. Step 3. Get your log, automatically 79
  • 80. Formalizing Annotations
 Annotations are nothing but SPARQL queries over the conceptual data schema!
 • Case annotation: query retrieving case objects
 • Event annotation: query retrieving event objects
 • Case-attribute annotation: query retrieving pairs <attribute, case>
 • Event-attribute annotation: query retrieving pairs <attribute, event> 80
  • 81. 81 [Figure: annotated data model of our CONFSYS running example.]
 Case annotation (retrieves all instances of the Paper class):
 PREFIX : <http://www.example.com/>
 SELECT DISTINCT ?case
 WHERE { ?case rdf:type :Paper . }
 Event annotations are also tackled using SPARQL SELECT queries with a single answer variable, this time matching with actual event identifiers, i.e., objects denoting occurrences of events. The event annotation for creation:
 PREFIX : <http://www.example.com/>
 SELECT DISTINCT ?creationEvent
 WHERE { ?creationEvent rdf:type :Creation . }
 which in fact returns all instances of the Creation class.
 Attribute annotations are formalised using SPARQL SELECT queries with two answer variables, establishing a relation between events and their corresponding attribute values: for timestamp and activity attribute annotations, the second answer variable is substituted by timestamps/activity names; for case attribute annotations, it is substituted by case objects, thus relating events to the case(s) they belong to. The timestamp annotation for creation events:
 PREFIX : <http://www.example.com/>
 SELECT DISTINCT ?creationEvent ?creationTime
 WHERE {
   ?creationEvent rdf:type :Creation .
   ?creationEvent :Submission1 ?Paper .
   ?creationEvent :uploadTime ?creationTime .
 }
  • 82. Annotations and XES Elements Annotations can be easily “mapped” onto XES elements:
 case annotation query —> traces
 event annotation query —> events
 attribute annotation query —> trace/event attributes with given key
 82 [Figure: the conceptual event schema: Trace t-contains-e Event; traces and events are linked (t-has-a / e-has-a) to Attributes with attKey, attType, attValue.]
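The mapping sketched above can be illustrated end-to-end in a few lines. A minimal Python sketch, assuming the annotation query answers are already available as plain in-memory values (all identifiers and attribute values below are hypothetical; real XES logs also carry extension declarations, omitted here):

```python
import xml.etree.ElementTree as ET

# Hypothetical answers of the annotation queries.
cases = ["paper1"]                                  # case annotation -> traces
events_of_case = {"paper1": ["creation1"]}          # event annotation -> events
event_attrs = {                                     # attribute annotations -> attributes
    "creation1": [("string", "concept:name", "Creation"),
                  ("date", "time:timestamp", "2017-03-01T10:00:00")],
}

log = ET.Element("log")
for c in cases:
    trace = ET.SubElement(log, "trace")
    ET.SubElement(trace, "string", {"key": "concept:name", "value": c})
    for e in events_of_case.get(c, []):
        event = ET.SubElement(trace, "event")
        for typ, key, val in event_attrs.get(e, []):
            ET.SubElement(event, typ, {"key": key, "value": val})

xes = ET.tostring(log, encoding="unicode")
print(xes)
```

Each annotation kind lands exactly where the slide says: case answers become traces, event answers become events nested in their trace, attribute answers become typed key/value children.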
  • 83. Conference Example: Case Annotation 83 [Figure: annotated data model of our CONFSYS running example.]
 The timestamp annotation for creation events:
 PREFIX : <http://www.example.com/>
 SELECT DISTINCT ?creationEvent ?creationTime
 WHERE {
   ?creationEvent rdf:type :Creation .
   ?creationEvent :Submission1 ?Paper .
   ?creationEvent :uploadTime ?creationTime .
 }
 which indeed retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
 XES events:
 - id: ?creationEvent
 XES attribute:
 - key: timestamp extension
 - type: milliseconds
 - value: ?creationTime
 - parent event: ?creationEvent
  • 84. Rewriting Annotations Annotations are nothing but SPARQL queries over the conceptual data schema 84 They can be automatically reformulated as SQL queries over the legacy data. We thus automatically get a standard OBDA mapping from the legacy data to the XES concepts
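The reformulation has two phases: rewriting (the conceptual schema's contribution, e.g. subclasses) and unfolding (replacing classes with their mapping queries). A deliberately tiny Python caricature of the idea, with a hypothetical schema fragment and mappings; real OBDA systems such as ontop do far more (full reasoning, query optimisation):

```python
# T: hypothetical fragment of the conceptual schema (subclass relation).
subclass_of = {":DecidedPaper": ":Paper"}

# M: hypothetical mappings from classes to SQL queries producing their instances.
mappings = {
    ":Paper": 'SELECT "ID" AS x FROM Paper',
    ":DecidedPaper": 'SELECT "ID" AS x FROM Paper WHERE "DecTime" IS NOT NULL',
}

def rewrite(cls):
    """All classes whose instances answer 'SELECT ?x WHERE { ?x rdf:type cls }'."""
    return [cls] + [sub for sub, sup in subclass_of.items() if sup == cls]

def unfold(classes):
    """Union of the mapping queries of the rewritten classes."""
    return "\nUNION\n".join(mappings[c] for c in classes if c in mappings)

sql = unfold(rewrite(":Paper"))
print(sql)
```

Asking for all papers thus yields the union of the SQL queries for Paper and its subclass DecidedPaper, posed directly over the legacy database.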
  • 85. 85 In the first step, the SPARQL queries formalising the annotations in L are reformulated into corresponding SQL queries posed directly over I. This is done by relying on standard query rewriting and unfolding, where each SPARQL query q ∈ Lq is rewritten considering the contribution of the conceptual data schema T, and then unfolded using the mappings in M. The resulting query qsql can then be posed directly over I so as to retrieve the data associated to the corresponding annotation. We denote the set of all so-obtained SQL queries as Lsql.
 Example 16. Consider the SPARQL query in Example 13, formalising the event annotation that accounts for the creation of papers. A possible rewriting and unfolding of such a query, using the conceptual data schema in Figure 9 and the mappings from Example 10, is the following SQL query:
 SELECT DISTINCT
   CONCAT('http://www.example.com/submission/', Submission."ID") AS "creationEvent"
 FROM Submission, Paper
 WHERE Submission."Paper" = Paper."ID"
   AND Submission."UploadTime" = Paper."CT"
   AND Submission."ID" IS NOT NULL
 This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also compact and fast to process by a standard DBMS.
 1. For each SQL query q(c) ∈ Lsql obtained from a case annotation, we insert into ME P the OBDA mapping
    q(c) :trace/{c} rdf:type :Trace .
    Intuitively, such a mapping populates the concept Trace in E with the case objects created from the answers returned by query q(c).
 2. For each SQL query q(e) ∈ Lsql obtained from an event annotation, we insert into ME P the OBDA mapping
    q(e) :event/{e} rdf:type :Event .
    Intuitively, such a mapping populates the concept Event in E with the event objects created from the answers returned by query q(e).
 Technically, onprom takes as input an onprom model P = <I, T, M, L> and the event schema E, and produces a new OBDA system <I, ME P, E>, where the annotations in L are automatically reformulated as OBDA mappings ME P that directly link I to E.
  • 86. Recap 86 [Fig. 15: Sketch of the onprom model. The information system I consists of a database D conforming to a db schema R; the mapping specification M links I to the conceptual data schema T; the event-data annotations L annotate T and point to the conceptual event schema E; the dashed log mapping specification ME P is automatically synthesised, yielding the OBDA model B.]
  • 87. Querying the “Virtual Log” SPARQL queries over the event schema are answered using legacy data
 • Example: get empty and nonempty traces; for nonempty traces, also fetch all their events
 Answers can be serialised into a fully compliant XES log! 87
 The following query retrieves (elementary) attributes, considering in particular their key, type, and value:
 PREFIX : <http://www.example.org/>
 SELECT DISTINCT ?att ?attType ?attKey ?attValue
 WHERE {
   ?att rdf:type :Attribute ;
     :attType ?attType ;
     :attKey ?attKey ;
     :attVal ?attValue .
 }
 The following query handles the retrieval of empty and nonempty traces, simultaneously obtaining, for nonempty traces, their constitutive events:
 PREFIX : <http://www.example.org/>
 SELECT DISTINCT ?trace ?event
 WHERE {
   ?trace a :Trace .
   OPTIONAL {
     ?trace :t-contain-e ?event .
     ?event :e-contain-a ?timestamp .
     ?timestamp :attKey "time:timestamp"^^xsd:string .
     ?event :e-contain-a ?name .
     ?name :attKey "concept:name"^^xsd:string .
   }
 }
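The OPTIONAL pattern in the trace/event query behaves like a left outer join: every trace is returned, and events are attached only when present. A minimal Python sketch of that behaviour over toy in-memory data (trace and event names are hypothetical):

```python
# Toy data: t2 is an empty trace (no events), mirroring the query's OPTIONAL block.
traces = ["t1", "t2"]
contains = [("t1", "e1"), ("t1", "e2")]    # t-contains-e tuples

answers = []
for t in traces:
    events = [e for (tt, e) in contains if tt == t]
    if events:
        answers.extend((t, e) for e in events)   # trace paired with each of its events
    else:
        answers.append((t, None))                # OPTIONAL leaves ?event unbound

print(answers)
```

Nonempty traces contribute one answer per event, while the empty trace still shows up once with an unbound event, exactly what a XES serialiser needs to emit empty traces faithfully.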
  • 88. The onprom Toolchain Implementation of all the described steps using • Java (GUIs, algorithms) • OWL 2 QL plus functionality (conceptual schemas) • ontop (OBDA system) • OpenXES (XES serialisation and manipulation) • ProM process mining framework (environment) 88
  • 89. onprom UML Editor 89 [Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our CONFSYS running example.]
  • 90. onprom Annotation Editor 90 [Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case.]
  • 91. onprom Log Extractor 91 [Fig. 20: Screenshot of the Log Extractor plug-in in ProM 6.6.]
  • 92. Experiments • Very encouraging initial experiments • Carried out using synthetic data • We are looking for real case studies! 92
  • 93. Data Generation with CPN Tools 93
  • 94. Results 94 [Plots (Postgres): running time in milliseconds against the number of extracted components in the XES log, and against the number of tuples in the whole database.] ~11 mins to extract ~9M XES components from ~3.5M tuples
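Reading the reported numbers as rough orders of magnitude, a back-of-the-envelope sketch (approximations, not a measured benchmark):

```python
components = 9_000_000   # ~ extracted XES components (approximate)
tuples = 3_500_000       # ~ tuples in the whole source database (approximate)
seconds = 11 * 60        # ~ reported running time

throughput = components / seconds
ratio = components / tuples
print(f"~{throughput:,.0f} XES components per second, "
      f"~{ratio:.1f} components per source tuple")
```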
  • 95. 95
  • 96. Conclusions • Process Mining as a way to reconcile model-driven management and the real behaviours • Data preparation is an issue in presence of legacy data • Ontology-Based Data Access: solid theoretical basis with optimised implementations • onprom as an effective tool chain for extracting event logs from legacy databases • Several simplified settings can emerge depending on the context: fixed ERP schema, reference models, … 96
  • 97. Future Work • Conceptual Modeling • How to improve the discovery of events? • How to semi-automatically propose events to the user? • How to integrate methodologies and results from formal ontology? • Engineering • How to handle different types of data? • How to deal with different event schemas that go beyond XES? • How to generalise the approach to handle rich ontology-to-ontology mappings? 97