From Model to Event Data Analysis
Marco Montali
Free University of Bozen-Bolzano

ATAED 2016
Marrying Data and Processes
Our Starting Point
Marrying processes and data is a must 

if we want to really understand 

how complex dynamic systems o...
Our Thesis
Knowledge representation and 

computational logics 



is a swiss-army knife to



understand data-aware dynam...
Business Process Lifecycle
4
picture by Wil van der Aalst
Formal Verification
Automated analysis
of a formal model of the system
against a property of interest,
considering all poss...
Process Mining
Extraction of valuable,
process-related information 

from event logs,
i.e., the footprint of reality
6
pic...
7
Loneliness
Data/Process Fragmentation
• A business process consists of a set of activities that
are performed in coordination in an o...
Experts Dichotomy
• BPM professionals: data are subsidiary to
processes
• Master data managers: data are the main driver
f...
Conventional Data Modeling
Focus: relevant entities, relations, static constraints
Supplier ManufacturingProcurement/Suppl...
Conventional Process Modeling
Focus: control-flow of activities in response to events
But… how do activities update data?
W...
A Deployed Process
13
Do you like Spaghetti?
Manage
Cancelation
ShipAssemble
Manage
Material POs
Decompose
Customer PO
Activities
Process
Data
A...
Too Late…
• Where are the data?
• Where shall we model relevant business rules?
15
Too late to reconstruct the missing piece...
How is Research Reacting?
A recent review…
Verification typically takes place at the design stage
of a business process typ...
…But There is Hope!
• [Meyer et al, 2011]: data-process integration
crucial to assess the value of processes and
evaluate ...
Formal Verification
The Conventional, Propositional Case
Process control-flow
(Un)desired property
18
(Un)desired property
Finite-state
transition
system
Propositional
temporal formula|=
Formal Verification
The Conventional, ...
(Un)desired property
Finite-state
transition
system
Propositional
temporal formula|=
Verification
via model checking
2007 T...
Marriage
(Un)desired property
Formal Verification
The Data-Aware Case
22
Process+Data
(Un)desired property
First-order
temporal formula|=
Process+Data
Formal Verification
The Data-Aware Case
Infinite-state, rel...
(Un)desired property
First-order
temporal formula|=
?
Formal Verification
The Data-Aware Case
24
Process+Data
Infinite-state...
Why FO Temporal Logics
• To inspect data: FO queries
• To capture system dynamics: temporal
modalities
• To track the evol...
Why FO Temporal Logics
• To inspect data: FO queries
• To capture system dynamics: temporal
modalities
• To track the evol...
Problem Dimensions
Data
component
Relational DB
Description
logic KB
OBDA system
Inconsistency
tolerant KB
…
Process
compo...
Colored Petri Nets
28
Colored Petri Nets
• Where is the data model?
• Formal analysis only with a-priori propositionalization
29
Tools (e.g. BizAgi)
30
Review
Request
Fill Reim-
bursement
Review Reim-
bursement
Rejected
Accepted
• Which formal semanti...
A
true
false
BPMS and Data
Correct?
• BizAgi…
• YAWL…
31
A
true
false
BPMS and Data
Correct?
• BizAgi… not sure…
• YAWL… YES!
32
RAW-SYS
• Integrated data+process modeling
• Standard relational model for capturing data
• Standard workflow nets (or othe...
RAW-SYS
34
Task
local
Task
local
Task
Case
new case close case
archive
shared
local
archive
Example: User Cart
35
Customer
Id …
Product (read-only)
name …
InCart
BarCode Product
Owner
CustId
Shared DB
Local DB
crea...
Example: User Cart
36
Customer
Id …
Product (read-only)
name …
InCart
BarCode Product
Owner
CustId
Shared DB
Local DB
Cust...
Example: User Cart
37
Customer
Id …
Product (read-only)
name …
InCart
BarCode Product
Owner
CustId
Shared DB
Local DB
Cust...
Example: User Cart
38
Customer
Id …
Product (read-only)
name …
InCart
BarCode Product
Owner
CustId
Shared DB
Local DB
Cust...
Example: User Cart
39
Customer
Id …
Product (read-only)
name …
InCart
BarCode Product
Owner
CustId
Shared DB
Local DB
Cust...
Example: User Cart
40
Customer
Id …
Product (read-only)
name …
InCart
BarCode Product
Owner
CustId
Shared DB
Local DB
Cust...
Execution Semantics
Relational transition system. Each state is labeled by:
• Instance of the shared DB
• Case IDs of runn...
The Good…
RAW-SYS are:
• Markovian: Next state only depends on the
current state + input. 

Two states with identical DBs ...
… and the Bad
Reachability undecidable even with a single safe net
• Counter —> “size” of a unary relation
• Test counter ...
State-Boundedness
[PODS 2013]
Put a pre-defined bound on the DB size

(not the size of the data domain!)
• Resulting transi...
RAW-SYS, Boundedness,
and Reachability
Reachability undecidable as soon as one of the following
conditions holds:
• Shared...
Magic!
46
First-order
temporal formula
(FO-CTL or FO-LTL with

persistent quantification)
|=
Infinite-state
transition system
Magic!
47
First-order
temporal formula
(FO-CTL or FO-LTL with

persistent quantification)
|=
Infinite-state
transition syste...
Magic!
48
First-order
temporal formula
(FO-CTL or FO-LTL with

persistent quantification)
|=
Infinite-state
transition syste...
Towards Implementations
• [IJCAI 2015] Planning can be lifted to deal with this
infinite-state setting
• Ongoing implementa...
Process Mining
50 picture by Wil van der Aalst
Process Mining
51 picture by Wil van der Aalst
Expected Reality
52
log	
trace	
event
Expected Reality
XES standard for event logs
53
<log	xes.version="1.0"	xes.features="nested-attributes">	
<trace>	
			<str...
Actual Reality
54
Actual Reality
55
PaperInfo
Understanding Reality…
56
1..*
*
Conference
creation	time:DateTime
conf	name:String
User
creation	time:DateTime
username:S...
From here…
Impedance Mismatch
57
1..*
*
Conference
creation	time:DateTime
conf	name:String
User
creation	time:DateTime
use...
…to there!
Impedance Mismatch
58
Key Issues
• How to resolve the
“impedance mismatch”?
• How to get a “view” of
the data tailored to
process mining?
59
Pap...
Impedance Mismatch is
Really an Issue
Crompton (2008): domain experts lose too much
time to dig into data and turn them i...
Optique
Scalable, End-User Access to Big Data
• http://optique-project.eu
• Goal: engineering techniques for enabling end-...
Facts on Statoil
• 1000 TB of data inside relational DBMSs
• Schemas not aligned
• More than 2000 tables, in a plethora of...
Query Example
63
OBDI framework Query answering Ontology languages Mappings Identity Conclusions
How much time/money is sp...
64
A user query at Statoil
Show all Norwegian wellbores with some additional attributes
(wellbore id, completion date, oldest penetrate...
65
A user query at Statoil
Show all Norwegian wellbores with some additional attributes
(wellbore id, completion date, oldest penetrate...
Ontology-Based Data Access
66
OBDI framework Query answering Ontology languages Mappings Identity Conclusions
Ontology-bas...
Ontop
• Open-source OBDA technology developed at
UNIBZ (supervisor: Diego Calvanese)
• Fully supports semantic web standar...
Resolving the
Impedance Mismatch
68
PaperInfo
FullPaper
creationTime: DateTime
title: String
mappingId fp-mapping
target p...
What if my DB is Very Nice?
Ontology bootstrapping automatically creates
• a conceptual model that mirrors 1-1 the relatio...
OBDA for Process Mining
• Need to resolve a second impedance mismatch
problem!
• From here…
70
1..*
*
Conference
creation	...
OBDA for Process Mining
• …To there!
71
-xes.version	:	xs:decimal
-xes.features	:	xs:token
Log
-name	:	xs:string
-keys	:	x...
OBDA for Process Mining
• From here…
72
PaperInfo
OBDA for Process Mining
• …To there!
73
log	
trace	
event
Log Annotations
74
1..*
*
Conference
creation	time:DateTime
conf	name:String
User
creation	time:DateTime
username:String
P...
75
1..*
*
Conference
creation	time:DateTime
conf	name:String
User
creation	time:DateTime
username:String
Paper
creation	ti...
Multiple Log Views
76
1..*
*
Conference
creation	time:DateTime
conf	name:String
User
creation	time:DateTime
username:Strin...
77
1..*
*
Conference
creation	time:DateTime
conf	name:String
User
creation	time:DateTime
username:String
Paper
creation	ti...
And Now?
78
data
base
data
base
data
base
Domain	Ontology Event	Ontology
XES	Log	
Extraction
Mapping	
Annota+on	
?	
?
Mapping Synthesis
79
data
base
data
base
data
base
Domain	Ontology Event	Ontology
XES	Log	
Extraction
Mapping	
Annota+on	
...
Log Materialization
80
data
base
data
base
data
base
Domain	Ontology Event	Ontology
Mapping
Annotation
Log	Mapping
XES	
Ev...
81
82
Log Virtualization
83
data
base
data
base
data
base
Domain	Ontology Event	Ontology
Mapping
Annotation
Process	
Mining	Tool...
Questions
• How to optimize and test the scalability of the
approach? Fine-tuning is a must!
• Real vs simulated data? (Be...
KAOS Project
Knowledge-Aware Operational Support
• Goal: Empowering process mining and online
operational support with dom...
86
Conclusion
Acknowledgments
All coauthors of this research, 

in particular 



Diego Calvanese (UNIBZ)

Giuseppe De Giacomo (UNIROMA)...
ATAED2016 Montali - Marrying data and processes: from model to event data analysis

Keynote by Marco Montali "Marrying data and processes: from model to event data analysis" at the Workshop on Algorithms & Theories for the Analysis of Event Data (ATAED 2016), satellite event of the 37th International Conference on Application and Theory of Petri Nets and Concurrency and of the 16th International Conference on Application of Concurrency to System Design (PN 2016 and ACSD 2016).


  1. 1. From Model to Event Data Analysis Marco Montali Free University of Bozen-Bolzano
 ATAED 2016 Marrying Data and Processes
  2. 2. Our Starting Point Marrying processes and data is a must 
 if we want to really understand 
 how complex dynamic systems operate Dynamic systems of interest: • business processes • multiagent systems • distributed systems 2
  3. 3. Our Thesis Knowledge representation and 
 computational logics 
 
 is a swiss-army knife to
 
 understand data-aware dynamic systems, and 
 provide automated reasoning and verification capabilities along their entire lifecycle 3
  4. 4. Business Process Lifecycle 4 picture by Wil van der Aalst
  5. 5. Formal Verification Automated analysis of a formal model of the system against a property of interest, considering all possible system behaviors 5 picture by Wil van der Aalst
  6. 6. Process Mining Extraction of valuable, process-related information 
 from event logs, i.e., the footprint of reality 6 picture by Wil van der Aalst
  7. 7. 7
  8. 8. Loneliness
  9. 9. Data/Process Fragmentation • A business process consists of a set of activities that are performed in coordination in an organizational and technical environment [Weske, 2007] • Activities change the real world • The corresponding updates are reflected into the organizational information system(s) • Data trigger decision-making, which in turn determines the next steps to be taken in the process • Survey by Forrester [Karel et al, 2009]: lack of interaction between data and process experts 9
  10. 10. Experts Dichotomy • BPM professionals: data are subsidiary to processes • Master data managers: data are the main driver for the company’s existence • Forrester: in 83/100 companies, no interaction at all between these two groups • This isolation propagates to languages and tools, which never properly account for the process-data connection 10
  11. 11. Conventional Data Modeling Focus: relevant entities, relations, static constraints Supplier Manufacturing Procurement/Supplier Sales Customer PO Line Item Work Order Material PO * * spawns 0..1 Material But… how do data evolve? Where can we find the “state” of a purchase order? 11
  12. 12. Conventional Process Modeling Focus: control-flow of activities in response to events But… how do activities update data? What is the impact of canceling an order? 12
  13. 13. A Deployed Process 13
  14. 14. Do you like Spaghetti? Manage Cancelation Ship Assemble Manage Material POs Decompose Customer PO Activities Process Data Activities Process Data Activities Process Data Activities Process Data Activities Process Data Customers Suppliers & Catalogues Customer POs Work Orders Material POs IT integration: difficult to manage, understand, evolve 14
  15. 15. Too Late… • Where are the data? • Where shall we model relevant business rules? 15 Too late to reconstruct the missing pieces. Where is our data? Part is in the DBs, part is hidden in the process execution engine. Where are the relevant business rules, and how are they modeled? At the DB level? Which DB? How to import the process data? (Also) in the business model? How to import data from the DBs? Data Process Supplier Manufacturing Procurement/Supplier Sales Customer PO Line Item Work Order Material PO * * spawns 0..1 Determine cancelation penalty Notify penalty Material Process Engine Process State Business rules For each work order W For each material PO M in W if M has been shipped add returnCost(M) to penalty
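The business rule sketched on this slide (sum the return cost of every shipped material PO across all work orders) could be written as follows. This is only an illustrative sketch: the `MaterialPO` and `WorkOrder` classes and the `return_cost` field are hypothetical names standing in for the slide's `returnCost(M)`; the slides do not define a concrete schema.

```python
from dataclasses import dataclass, field


@dataclass
class MaterialPO:
    # Hypothetical fields: the slide only mentions "shipped" and returnCost(M).
    shipped: bool
    return_cost: float


@dataclass
class WorkOrder:
    material_pos: list = field(default_factory=list)


def cancelation_penalty(work_orders):
    """For each work order W, for each material PO M in W:
    if M has been shipped, add its return cost to the penalty."""
    penalty = 0.0
    for w in work_orders:
        for m in w.material_pos:
            if m.shipped:
                penalty += m.return_cost
    return penalty
```

The point of the slide still applies to the sketch: the rule needs both the process state (which order is being canceled) and the data (work orders, material POs), so it has no natural home in a data-only or process-only model.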
  16. 16. How is Research Reacting? A recent review… Verification typically takes place at the design stage of a business process type. However, at this stage, required knowledge about data (database schema, integrity constraints) is typically not yet available. 16
  17. 17. …But There is Hope! • [Meyer et al, 2011]: data-process integration crucial to assess the value of processes and evaluate KPIs • [Dumas, 2011]: data-process integration crucial to aggregate all relevant information, and to suitably inject business rules into the system • [Reichert, 2012]: “Process and data are just two sides of the same coin” 17
  18. 18. Formal Verification The Conventional, Propositional Case Process control-flow (Un)desired property 18
  19. 19. (Un)desired property Finite-state transition system Propositional temporal formula|= Formal Verification The Conventional, Propositional Case Process control-flow 19
  20. 20. (Un)desired property Finite-state transition system Propositional temporal formula|= Verification via model checking 2007 Turing award: Clarke, Emerson, Sifakis Formal Verification The Conventional, Propositional Case Process control-flow 20
  21. 21. Marriage
  22. 22. (Un)desired property Formal Verification The Data-Aware Case 22 Process+Data
  23. 23. (Un)desired property First-order temporal formula|= Process+Data Formal Verification The Data-Aware Case Infinite-state, relational transition system [Vardi 2005] 23
  24. 24. (Un)desired property First-order temporal formula|= ? Formal Verification The Data-Aware Case 24 Process+Data Infinite-state, relational transition system [Vardi 2005]
  25. 25. Why FO Temporal Logics • To inspect data: FO queries • To capture system dynamics: temporal modalities • To track the evolution of objects: FO quantification across states • Example: 
 It is always the case that every order 
 is eventually either cancelled or paid 25
  26. 26. Why FO Temporal Logics • To inspect data: FO queries • To capture system dynamics: temporal modalities • To track the evolution of objects: FO quantification across states • Example: 
 It is always the case that every order 
 is eventually either cancelled or paid 26 $\mathbf{G}\,\big(\forall x.\ \mathit{Order}(x) \rightarrow \mathbf{F}\,(\mathit{State}(x, \mathit{cancelled}) \lor \mathit{State}(x, \mathit{paid}))\big)$
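To make the example property concrete, here is a small sketch that evaluates it on a single finite trace. This is a simplification introduced for illustration: the talk considers (possibly infinite-state) relational transition systems, whereas the function below assumes a finite list of states, each represented as a hypothetical dictionary from order identifiers to their current status.

```python
FINAL_STATES = {"cancelled", "paid"}


def always_eventually_closed(trace):
    """Check, on a finite trace, that every order present in some state
    is cancelled or paid in that state or in a later one.

    trace: list of states; each state maps an order id to its status.
    """
    for i, state in enumerate(trace):
        for order in state:  # every x such that Order(x) holds in state i
            # F(...): look for a witness position j >= i
            if not any(later.get(order) in FINAL_STATES for later in trace[i:]):
                return False
    return True
```

Note how the quantified variable is tracked across states: the same `order` identifier is looked up in later states, which is what first-order quantification across states expresses.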
  27. 27. Problem Dimensions Data component Relational DB Description logic KB OBDA system Inconsistency tolerant KB … Process component condition- action rules BPMN Golog program Petri nets … Task modeling Conditional effects Add/delete assertions Programs User forms … External inputs None External services Input DB Fixed input … Network topology Single orchestrator Full mesh Connected, fixed graph Ring … Interaction mechanism None Synchronous Asynchronous and ordered Asynchronous lossy … 27
  28. 28. Colored Petri Nets 28
  29. 29. Colored Petri Nets • Where is the data model? • Formal analysis only with a-priori propositionalization 29
  30. 30. Tools (e.g. BizAgi) 30 Review Request Fill Reim- bursement Review Reim- bursement Rejected Accepted • Which formal semantics? • Analysable?
  31. 31. A true false BPMS and Data Correct? • BizAgi… • YAWL… 31
  32. 32. A true false BPMS and Data Correct? • BizAgi… not sure… • YAWL… YES! 32
  33. 33. RAW-SYS • Integrated data+process modeling • Standard relational model for capturing data • Standard workflow nets (or other types of Petri nets) for capturing processes • Net transitions interplay with data • Conditionally enabled by FO queries over the data • Described in terms of full-fledged CRUD operations over the data • Bridge between theory and practice • Mimics how BPMS actually work • Has unambiguous execution semantics 33
  34. 34. RAW-SYS 34 Task local Task local Task Case new case close case archive shared local archive
  35. 35. Example: User Cart 35 Customer Id … Product (read-only) name … InCart BarCode Product Owner CustId Shared DB Local DB create case close case
  36. 36. Example: User Cart 36 Customer Id … Product (read-only) name … InCart BarCode Product Owner CustId Shared DB Local DB Customer(x,…) ADD Owner(x) … create case close case
  37. 37. Example: User Cart 37 Customer Id … Product (read-only) name … InCart BarCode Product Owner CustId Shared DB Local DB Customer(x,…) ADD Owner(x) open cart … create case close case
  38. 38. Example: User Cart 38 Customer Id … Product (read-only) name … InCart BarCode Product Owner CustId Shared DB Local DB Customer(x,…) ADD Owner(x) open cart insert item(p) create case Product(p,…) ADD InCart(getBC(),p) close case
  39. 39. Example: User Cart 39 Customer Id … Product (read-only) name … InCart BarCode Product Owner CustId Shared DB Local DB Customer(x,…) ADD Owner(x) open cart empty cart insert item(p) create case Exist x,p. InCart(x,p) Product(p,…) Forall x,p.
 InCart(x,p)->DEL InCart(x,p) ADD InCart(getBC(),p) close case
  40. 40. Example: User Cart 40 Customer Id … Product (read-only) name … InCart BarCode Product Owner CustId Shared DB Local DB Customer(x,…) ADD Owner(x) open cart … close cart empty cart insert item(p) create case Exist x,p. InCart(x,p) Product(p,…) Forall x,p.
 InCart(x,p)->DEL InCart(x,p) ADD InCart(getBC(),p) close case
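The guarded updates of the cart example can be mimicked in a few lines of Python. This is a rough sketch under stated assumptions, not the RAW-SYS formalism itself: relations are plain Python sets, the class name `CartCase` is made up, and `getBC()` is replaced by a simple counter that produces fresh barcodes.

```python
import itertools


class CartCase:
    """One running case with its local DB (Owner, InCart)."""

    def __init__(self, shared_db, customer_id):
        # Guard of "create case": Customer(x, ...) must hold in the shared DB.
        if customer_id not in shared_db["Customer"]:
            raise ValueError("unknown customer")
        self.shared = shared_db
        self.owner = {customer_id}            # effect: ADD Owner(x)
        self.in_cart = set()                  # local relation InCart(BarCode, Product)
        self._barcodes = itertools.count(1)   # hypothetical stand-in for getBC()

    def insert_item(self, product):
        # Guard: Product(p, ...) in the (read-only) shared DB.
        if product not in self.shared["Product"]:
            return False
        self.in_cart.add((next(self._barcodes), product))  # ADD InCart(getBC(), p)
        return True

    def empty_cart(self):
        # Guard: Exist x,p. InCart(x,p)
        if not self.in_cart:
            return False
        self.in_cart.clear()                  # Forall x,p. InCart(x,p) -> DEL InCart(x,p)
        return True
```

Each method returns `False` when its guard fails, which corresponds to the transition not being enabled in the current state.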
  41. 41. Execution Semantics Relational transition system. Each state is labeled by: • Instance of the shared DB • Case IDs of running cases, together with corresponding • Instances of local DBs • Markings of their nets Successors constructed considering all possible ground executable actions and all possible input configurations (s.t. the resulting state satisfies the schema constraints) —> infinite-state transition system
  42. 42. The Good… RAW-SYS are: • Markovian: Next state only depends on the current state + input. 
 Two states with identical DBs are bisimilar. • Generic: FO/SQL (as all query languages) does not distinguish structures which are identical modulo uniform renaming of data objects. —> Two isomorphic states are bisimilar 42
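Genericity says that two database states differing only by a uniform renaming of data objects cannot be told apart. A brute-force check for this kind of isomorphism might look as follows. It is a sketch for tiny instances only (it tries every bijection, so it is factorial in the number of values), and it assumes that every value may be renamed, i.e. that no constants are fixed; the function names are illustrative.

```python
from itertools import permutations


def active_domain(db):
    """All values occurring in some tuple of some relation."""
    return list({v for tuples in db.values() for t in tuples for v in t})


def rename(db, mapping):
    """Apply a renaming of values uniformly to every tuple."""
    return {r: {tuple(mapping[v] for v in t) for t in tuples}
            for r, tuples in db.items()}


def isomorphic(db1, db2):
    """True if some bijection between the active domains maps db1 onto db2."""
    dom1, dom2 = active_domain(db1), active_domain(db2)
    if len(dom1) != len(dom2) or db1.keys() != db2.keys():
        return False
    return any(rename(db1, dict(zip(dom1, perm))) == db2
               for perm in permutations(dom2))
```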
  43. 43. … and the Bad Reachability undecidable even with a single safe net • Counter —> “size” of a unary relation • Test counter for zero: check whether counter relation is empty • What matters is the # of tuples, not the actual values • Can be reconstructed also without negation in the queries 43 New Increment Decrement
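The counter encoding behind this undecidability argument is easy to picture in code. The sketch below is an illustration with a made-up class name: the counter value is the number of tuples in a unary relation, increment inserts a fresh value, decrement deletes an arbitrary tuple, and the zero test checks emptiness.

```python
import itertools


class RelationCounter:
    """A counter encoded as the size of a unary relation."""

    def __init__(self):
        self.relation = set()
        self._fresh = itertools.count()  # source of fresh values

    def increment(self):
        self.relation.add((next(self._fresh),))

    def decrement(self):
        if self.is_zero():
            return False
        self.relation.pop()  # which tuple is removed does not matter
        return True

    def is_zero(self):
        return not self.relation

    def value(self):
        return len(self.relation)
```

Only the number of tuples matters, never the values themselves, which is why the encoding works with any supply of fresh data objects.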
  44. 44. State-Boundedness [PODS 2013] Put a pre-defined bound on the DB size
 (not the size of the data domain!) • Resulting transition system: still infinite-state • But: infinitely-many encountered values along a run cannot be “accumulated” in a single state 44
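The difference between bounding each state and bounding the data domain can be illustrated on a finite prefix of a run. The sketch below assumes that "DB size" means the number of tuples in a state, which is one possible reading of the slide; the helper names are hypothetical.

```python
def db_size(state):
    """Number of tuples stored in a state (relation name -> set of tuples)."""
    return sum(len(tuples) for tuples in state.values())


def within_bound(run, bound):
    """True if every state along the run stays within the size bound."""
    return all(db_size(state) <= bound for state in run)


def values_seen(run):
    """All values encountered anywhere along the run."""
    return {v for state in run for tuples in state.values() for t in tuples for v in t}
```

A run in which the cart always holds a single item with a new barcode stays within a bound of 1, although the set returned by `values_seen` grows with the length of the run.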
  45. 45. RAW-SYS, Boundedness, and Reachability Reachability undecidable as soon as one of the following conditions holds: • Shared DB with unbounded size • Local DB with unbounded size • Unboundedly many simultaneously running cases 
 What happens if all these three sources are “bounded in size”? 45
  46. 46. Magic! 46 First-order temporal formula (FO-CTL or FO-LTL with
 persistent quantification) |= Infinite-state transition system
  47. 47. Magic! 47 First-order temporal formula (FO-CTL or FO-LTL with
 persistent quantification) |= Infinite-state transition system |= Propositional temporal formula ‘ Finite-state abstraction
  48. 48. Magic! 48 First-order temporal formula (FO-CTL or FO-LTL with
 persistent quantification) |= Infinite-state transition system |= Propositional temporal formula ‘ If and only if Finite-state abstraction
  49. 49. Towards Implementations • [IJCAI 2015] Planning can be lifted to deal with this infinite-state setting • Ongoing implementation effort using DLVk and state-of-the-art ADL planners • [SEBD 2015, AMW 2015] Ongoing effort for implementing model checking techniques based on our abstraction natively in relational technology • Goal: combine the best of databases and formal methods 49
  50. 50. Process Mining 50 picture by Wil van der Aalst
  51. 51. Process Mining 51 picture by Wil van der Aalst
  52. 52. Expected Reality 52 log trace event
  53. 53. Expected Reality XES standard for event logs 53 <log xes.version="1.0" xes.features="nested-attributes"> <trace> <string key="concept:name" value="1" /> <event> <string key="concept:name" value="register request" /> <date key="time:timestamp" value="2010-12-30T11:02:00.000+01:00" /> </event> </trace> </log>
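As an illustration of the log/trace/event structure, the XES fragment from the slide can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the closing `</log>` tag is added so that the fragment parses, the function name `read_traces` is made up, and real XES files may declare an XML namespace that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

XES = """<log xes.version="1.0" xes.features="nested-attributes">
  <trace>
    <string key="concept:name" value="1" />
    <event>
      <string key="concept:name" value="register request" />
      <date key="time:timestamp" value="2010-12-30T11:02:00.000+01:00" />
    </event>
  </trace>
</log>"""


def read_traces(xes_text):
    """Return a list of (case id, [(activity, timestamp), ...]) pairs."""
    root = ET.fromstring(xes_text)
    traces = []
    for trace in root.findall("trace"):
        # The trace-level concept:name attribute identifies the case.
        case_id = trace.find("string[@key='concept:name']").get("value")
        events = []
        for event in trace.findall("event"):
            attrs = {child.get("key"): child.get("value") for child in event}
            events.append((attrs.get("concept:name"), attrs.get("time:timestamp")))
        traces.append((case_id, events))
    return traces
```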
  54. 54. Actual Reality 54
  55. 55. Actual Reality 55 PaperInfo
  56. 56. Understanding Reality… 56 1..* * Conference creation time:DateTime conf name:String User creation time:DateTime username:String Paper creation time:DateTime title:String Review Request invitation time:DateTime Review submission time:DateTime Decision decision time:DateTime outcome: Bool Upload Submitted upload time:DateTime upload accepted upload time:DateTime submitted to 1 * organizer of Accepted Paper <<no time>> * reviewer 1 0..1 PhasD 1 0..1 RhasR 1 10..1 corresponds to * UhasP 1 * AhasU 1 *1 for author 1..* * by 1 * USuploadbyU creator 1 * 1* UAuploadbyU 1 *
  57. 57. From here… Impedance Mismatch 57 1..* * Conference creation time:DateTime conf name:String User creation time:DateTime username:String Paper creation time:DateTime title:String Review Request invitation time:DateTime Review submission time:DateTime Decision decision time:DateTime outcome: Bool Upload Submitted upload time:DateTime upload accepted upload time:DateTime submitted to 1 * organizer of Accepted Paper <<no time>> * reviewer 1 0..1 PhasD 1 0..1 RhasR 1 10..1 corresponds to * UhasP 1 * AhasU 1 *1 for author 1..* * by 1 * USuploadbyU creator 1 * 1* UAuploadbyU 1 *
  58. 58. …to there! Impedance Mismatch 58
  59. 59. Key Issues • How to resolve the “impedance mismatch”? • How to get a “view” of the data tailored to process mining? 59 PaperInfo 1..* * Conference creation time:DateTime conf name:String User creation time:DateTime username:String Paper creation time:DateTime title:String Review Request invitation time:DateTime Review submission time:DateTime Decision decision time:DateTime outcome: Bool Upload Submitted upload time:DateTime upload accepted upload time:DateTime submitted to 1 * organizer of Accepted Paper <<no time>> * reviewer 1 0..1 PhasD 1 0..1 RhasR 1 10..1 corresponds to * UhasP 1 * AhasU 1 *1 for author 1..* * by 1 * USuploadbyU creator 1 * 1* UAuploadbyU 1 *
  60. 60. Impedance Mismatch is Really an Issue Crompton (2008): domain experts lose too much time to dig into data and turn them into knowledge • Engineers in the oil/gas industry: 30-70% of their working time spent on data searching and data quality 60
  61. 61. Optique Scalable, End-User Access to Big Data • http://optique-project.eu • Goal: engineering techniques for enabling end-users to access data through domain ontologies • Case studies: Statoil, Siemens 61
  62. 62. Facts on Statoil • 1000 TB of data inside relational DBMSs • Schemas not aligned • More than 2000 tables, in a plethora of different DBs • 900 experts part of “Statoil Exploration” • Up to 4 days to formulate queries and encode them in SQL 62
  63. 63. Query Example 63 OBDI framework Query answering Ontology languages Mappings Identity Conclusions How much time/money is spent searching for data? A user query at Statoil Show all Norwegian wellbores with some additional attributes (wellbore id, completion date, oldest penetrated age, result). Limit to all wellbores with a core and show attributes like (wellbore id, core number, top core depth, base core depth, intersecting stratigraphy). Limit to all wellbores with core in Brentgruppen and show key attributes in a table. After connecting to EPDS (slegge) we could for instance limit further to cores in Brent with measured permeability and where it is larger than a given value, for instance 1 mD. We could also find out whether there are cores in Brent which are not stored in EPDS (based on NPD info) and where there could be permeability values. Some of the missing data we possibly own, other not. Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (5/52)
  64. 64. 64 A user query at Statoil Show all Norwegian wellbores with some additional attributes (wellbore id, completion date, oldest penetrated age, result). Limit to all wellbores with a core and show attributes like (wellbore id, core number, top core depth, base core depth, intersecting stratigraphy). Limit to all wellbores with core in Brentgruppen and show key attributes in a table. After connecting to EPDS (slegge) we could for instance limit further to cores in Brent with measured permeability and where it is larger than a given value, for instance 1 mD. We could also find out whether there are cores in Brent which are not stored in EPDS (based on NPD info) and where there could be permeability values. Some of the missing data we possibly own, other not. SELECT [...] FROM db_name.table1 table1, db_name.table2 table2a, db_name.table2 table2b, db_name.table3 table3a, db_name.table3 table3b, db_name.table3 table3c, db_name.table3 table3d, db_name.table4 table4a, db_name.table4 table4b, db_name.table4 table4c, db_name.table4 table4d, db_name.table4 table4e, db_name.table4 table4f, db_name.table5 table5a, db_name.table5 table5b, db_name.table6 table6a, db_name.table6 table6b, db_name.table7 table7a, db_name.table7 table7b, db_name.table8 table8, db_name.table9 table9, db_name.table10 table10a, db_name.table10 table10b, db_name.table10 table10c, db_name.table11 table11, db_name.table12 table12, db_name.table13 table13, db_name.table14 table14, db_name.table15 table15, db_name.table16 table16 WHERE [...] 
table2a.attr1=‘keyword’ AND table3a.attr2=table10c.attr1 AND table3a.attr6=table6a.attr3 AND table3a.attr9=‘keyword’ AND table4a.attr10 IN (‘keyword’) AND table4a.attr1 IN (‘keyword’) AND table5a.kinds=table4a.attr13 AND table5b.kinds=table4c.attr74 AND table5b.name=‘keyword’ AND (table6a.attr19=table10c.attr17 OR (table6a.attr2 IS NULL AND table10c.attr4 IS NULL)) AND table6a.attr14=table5b.attr14 AND table6a.attr2=‘keyword’ AND (table6b.attr14=table10c.attr8 OR (table6b.attr4 IS NULL AND table10c.attr7 IS NULL)) AND table6b.attr19=table5a.attr55 AND table6b.attr2=‘keyword’ AND table7a.attr19=table2b.attr19 AND table7a.attr17=table15.attr19 AND table4b.attr11=‘keyword’ AND table8.attr19=table7a.attr80 AND table8.attr19=table13.attr20 AND table8.attr4=‘keyword’ AND table9.attr10=table16.attr11 AND table3b.attr19=table10c.attr18 AND table3b.attr22=table12.attr63 AND table3b.attr66=‘keyword’ AND table10a.attr54=table7a.attr8 AND table10a.attr70=table10c.attr10 AND table10a.attr16=table4d.attr11 AND table4c.attr99=‘keyword’ AND table4c.attr1=‘keyword’ AND table11.attr10=table5a.attr10 AND table11.attr40=‘keyword’ AND table11.attr50=‘keyword’ AND table2b.attr1=table1.attr8 AND table2b.attr9 IN (‘keyword’) AND table2b.attr2 LIKE ‘keyword’% AND table12.attr9 IN (‘keyword’) AND table7b.attr1=table2a.attr10 AND table3c.attr13=table10c.attr1 AND table3c.attr10=table6b.attr20 AND table3c.attr13=‘keyword’ AND table10b.attr16=table10a.attr7 AND table10b.attr11=table7b.attr8 AND table10b.attr13=table4b.attr89 AND table13.attr1=table2b.attr10 AND table13.attr20=’‘keyword’’ AND table13.attr15=‘keyword’ AND table3d.attr49=table12.attr18 AND table3d.attr18=table10c.attr11 AND table3d.attr14=‘keyword’ AND table4d.attr17 IN (‘keyword’) AND table4d.attr19 IN (‘keyword’) AND table16.attr28=table11.attr56 AND table16.attr16=table10b.attr78 AND table16.attr5=table14.attr56 AND table4e.attr34 IN (‘keyword’) AND table4e.attr48 IN (‘keyword’) AND table4f.attr89=table5b.attr7 AND 
table4f.attr45 IN (‘keyword’) AND table4f.attr1=‘keyword’ AND table10c.attr2=table4e.attr19 AND (table10c.attr78=table12.attr56 OR (table10c.attr55 IS NULL AND table12.attr17 IS NULL))
  65. 65. 65 The user query at Statoil and its SQL translation from the previous slide, repeated with the annotation: 50.000.000 €/year
  66. 66. Ontology-Based Data Access: the ontology-based data integration framework • Queries are posed over the ontology, and results are returned in its terms • Ontology: provides a global vocabulary and a conceptual view • Mappings: semantically link the sources and the ontology • Data sources: external and heterogeneous • We achieve logical transparency in accessing data: the user does not know where and how the data is stored, and can only see a conceptual view of the data • In short: data sources, mapping, “lightweight” conceptual model
  67. 67. Ontop • Open-source OBDA technology developed at UNIBZ (supervisor: Diego Calvanese) • Fully supports semantic web standards (OWL/SPARQL) • Integrates with a plethora of relational DBMSs • Apache open license • http://ontop.inf.unibz.it
  68. 68. Resolving the Impedance Mismatch • Relational table: PaperInfo • Ontology class: FullPaper (creationTime: DateTime, title: String) • Mapping: mappingId fp-mapping • target paper{ID} a :FullPaper ; :title {Title} ; :creationTime {CT} • source SELECT I.ID, I.Title, I.CT FROM PaperInfo I WHERE I.Type = 'FP'
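To make the role of the mapping concrete, the following is a minimal sketch of what fp-mapping does: it runs the source query and instantiates the target template once per returned row. It is an illustration in plain Python with sqlite3, not how Ontop works internally. The table contents, the column types, and the helper name `apply_fp_mapping` are assumptions made for the example.

```python
import sqlite3

# Hypothetical contents for the PaperInfo table named on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PaperInfo (ID INTEGER, Title TEXT, CT TEXT, Type TEXT)")
conn.executemany(
    "INSERT INTO PaperInfo VALUES (?, ?, ?, ?)",
    [
        (1, "Data-aware processes", "2015-03-01T10:00:00", "FP"),
        (2, "A short note", "2015-03-02T11:30:00", "SP"),
    ],
)

# Source part of fp-mapping: the SQL query from the slide.
SOURCE = "SELECT I.ID, I.Title, I.CT FROM PaperInfo I WHERE I.Type = 'FP'"


def apply_fp_mapping(connection):
    """Instantiate the target template once per row of the source query."""
    triples = []
    for paper_id, title, created in connection.execute(SOURCE):
        subject = f":paper{paper_id}"  # IRI template paper{ID}
        triples.append((subject, "rdf:type", ":FullPaper"))
        triples.append((subject, ":title", title))
        triples.append((subject, ":creationTime", created))
    return triples


triples = apply_fp_mapping(conn)
for triple in triples:
    print(triple)
```

Only the row with Type 'FP' contributes triples. In a virtual OBDA setting such triples are not materialized; the mapping is instead used to translate ontology-level queries into SQL.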
  69. 69. What if my DB is Very Nice? Ontology bootstrapping automatically creates • a conceptual model that mirrors 1-1 the relational DB • identity mappings Useful for “small” case studies 69
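The bootstrapping idea can be sketched as follows, under strong simplifying assumptions: every table becomes a class, every column becomes a data property, and the identity mapping is a plain SELECT over the table. The naming scheme (`Table#column`) and the function `bootstrap` are hypothetical; foreign keys, which a real bootstrapper would typically turn into object properties, are ignored here.

```python
import sqlite3


def bootstrap(connection):
    """Derive one class per table, one data property per column, and the
    identity mapping (a plain SELECT over the table) that populates them."""
    ontology = {}
    mappings = {}
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]
        ontology[table] = [f"{table}#{column}" for column in columns]
        mappings[table] = f"SELECT {', '.join(columns)} FROM {table}"
    return ontology, mappings


# Hypothetical schema reusing the PaperInfo table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PaperInfo (ID INTEGER, Title TEXT, CT TEXT, Type TEXT)")
ontology, mappings = bootstrap(conn)
print(ontology)
print(mappings)
```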
  70. 70. OBDA for Process Mining • Need to resolve a second impedance mismatch problem! • From here… [Figure: UML class diagram of the conference domain. Classes: Conference (creation time: DateTime, conf name: String); User (creation time: DateTime, username: String); Paper (creation time: DateTime, title: String); Review Request (invitation time: DateTime); Review (submission time: DateTime); Decision (decision time: DateTime, outcome: Bool); Upload Submitted (upload time: DateTime); Upload Accepted (upload time: DateTime); Accepted Paper (<<no time>>). Associations: submitted to, organizer of, reviewer, author, creator, for, by, corresponds to, PhasD, RhasR, UhasP, AhasU, USuploadbyU, UAuploadbyU]
  71. 71. OBDA for Process Mining • …To there! [Figure: UML class diagram of the XES event log schema. Classes: Log, Classifier, Extension, Trace, Event, Attribute, Global Attribute (scope: {event, trace}). Attributes shown: xes.version: xs:decimal; xes.features: xs:token; name, keys, prefix, uri, key, value, type (all xs:string). Associations: declare, define, contain]
  72. 72. OBDA for Process Mining • From here… 72 PaperInfo
  73. 73. OBDA for Process Mining • …To there! 73 log trace event
  74. 74. Log Annotations (on the conference domain model of slide 70) • trace annotation, with condition: submitted to = BPM 2015 • event: trace: follow has; activity name: “decision”; timestamp: decision time; resource: follow by; type: complete; attributes: outcome • event: trace: follow has & for; activity name: “review”; timestamp: submission time; resource: follow RhasR & reviewer; type: complete • event: trace: follow has; activity name: “upload submitted”; timestamp: upload time; resource: follow USuploadbyU; type: complete • event: trace: follow has & corr. to; activity name: “upload accepted”; timestamp: upload time; resource: follow UAuploadbyU; type: complete
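One way to read an event annotation is as a recipe for turning a domain object into an event by navigating associations. The sketch below illustrates this for the “decision” annotation on plain Python dictionaries. The `EventAnnotation` class, the helper names, the sample objects, and the choice of the paper title and the username as identifiers are all assumptions made for the example; the actual approach works on the ontology and the mappings, not on in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class EventAnnotation:
    activity: str          # activity name
    timestamp: str         # attribute holding the timestamp
    trace_path: list       # associations to follow to reach the trace object
    resource_path: list    # associations to follow to reach the resource
    lifecycle: str = "complete"
    attributes: list = field(default_factory=list)


def follow(obj, path):
    """Navigate from obj along the named associations."""
    for association in path:
        obj = obj[association]
    return obj


def extract_event(obj, annotation):
    """Build one event record from a domain object and its annotation."""
    event = {
        "trace": follow(obj, annotation.trace_path)["title"],
        "activity": annotation.activity,
        "timestamp": obj[annotation.timestamp],
        "resource": follow(obj, annotation.resource_path)["username"],
        "type": annotation.lifecycle,
    }
    for name in annotation.attributes:
        event[name] = obj[name]
    return event


# Hypothetical domain objects.
paper = {"title": "Paper 1"}
chair = {"username": "alice"}
decision = {
    "decision time": "2015-06-01T09:00:00",
    "outcome": True,
    "has": paper,
    "by": chair,
}

# The "decision" annotation: trace via "has", resource via "by".
decision_annotation = EventAnnotation(
    activity="decision",
    timestamp="decision time",
    trace_path=["has"],
    resource_path=["by"],
    attributes=["outcome"],
)

event = extract_event(decision, decision_annotation)
print(event)
```

Changing only the annotation (for instance the trace path) yields a different log view over the same data, without touching the data itself.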
  75. 75. [Figure: the annotated domain model restricted to the Conference, User, Paper, Review Request, Review, and Decision classes] • trace annotation, with condition: submitted to = BPM 2015 • event: trace: follow has; activity name: “decision”; timestamp: decision time; resource: follow by; type: complete; attributes: outcome • event: trace: follow has & for; activity name: “review”; timestamp: submission time; resource: follow RhasR & reviewer; type: complete
  76. 76. Multiple Log Views (a different annotation of the conference domain model of slide 70) • trace annotation • event: trace: follow has author; activity name: “decision author”; timestamp: decision time; resource: follow PhasD; type: complete • event: trace: follow by; activity name: “decision chair”; timestamp: decision time; resource: follow PhasD; type: complete; attributes: outcome • event: trace: follow has & reviewer; activity name: “review”; timestamp: submission time; resource: follow RhasR & for; type: complete • event: trace: follow upload by; activity name: “upload submitted”; timestamp: upload time; resource: follow UhasP; type: complete
  77. 77. [Figure: the annotated domain model restricted to the Conference, User, Paper, Review Request, Review, and Decision classes] • trace annotation • event: trace: follow has author; activity name: “decision author”; timestamp: decision time; resource: follow PhasD; type: complete • event: trace: follow by; activity name: “decision chair”; timestamp: decision time; resource: follow PhasD; type: complete; attributes: outcome • event: trace: follow has & reviewer; activity name: “review”; timestamp: submission time; resource: follow RhasR & for; type: complete
  78. 78. And Now? [Figure: data bases, Domain Ontology, Event Ontology, Mapping, Annotation, XES Log Extraction; two open links marked “?”]
  79. 79. Mapping Synthesis [Figure: data bases, Domain Ontology, Event Ontology, Mapping, Annotation, XES Log Extraction; two links marked “?”] • Automatically synthesized: 1. The annotation is transformed into an ontology-to-ontology mapping M’ 2. M’ is “rewritten” using the data-to-domain ontology mapping 3. The result is a mapping connecting the XES ontology directly to the data
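The three steps can be illustrated with a deliberately simplified toy, in which every mapping assertion concerns a single class and rewriting reduces to substituting each domain class with its SQL definition. Real mapping rewriting has to deal with joins, navigation paths, and IRI templates. The SQL queries, table names, and the function `synthesize` are hypothetical.

```python
# Data-to-domain mapping: domain class -> SQL query over the sources
# (hypothetical tables and columns).
domain_mapping = {
    "Decision": "SELECT id, decision_time AS ts FROM decisions",
    "Review": "SELECT id, submission_time AS ts FROM reviews",
}

# Step 1: the annotation read as an ontology-to-ontology mapping M'
# (event-ontology class -> domain classes that populate it).
m_prime = {"Event": ["Decision", "Review"]}


def synthesize(m_prime, domain_mapping):
    """Step 2: rewrite M' by unfolding each domain class into its SQL query.
    Step 3: the result maps event-ontology classes directly to the data."""
    log_mapping = {}
    for event_class, domain_classes in m_prime.items():
        queries = [domain_mapping[name] for name in domain_classes]
        log_mapping[event_class] = "\nUNION\n".join(queries)
    return log_mapping


log_mapping = synthesize(m_prime, domain_mapping)
print(log_mapping["Event"])
```

The synthesized log mapping no longer mentions the domain ontology: it relates the event vocabulary directly to queries over the sources.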
  80. 80. Log Materialization [Figure: data bases, Domain Ontology, Event Ontology, Mapping, Annotation, Log Mapping, XES Event Data, XES File, Process Mining Tools; numbered markers 1 to 4] • Extraction queries: • SELECT DISTINCT ?t ?v ?e WHERE { ?t :TcontainsA ?ta . ?ta :valueA ?v . ?t :TcontainsE ?e . } • SELECT DISTINCT ?e ?t WHERE { ?e :EcontainsA ?a . ?a :typeA ?t . } • SELECT DISTINCT ?e ?t WHERE { ?e :EcontainsA ?a . ?a :keyA ?t . } • SELECT DISTINCT ?e ?t WHERE { ?e :EcontainsA ?a . ?a :valueA ?t . }
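Once the extraction queries have been answered, materialization amounts to serializing the answers as an XES document. The sketch below assumes the answers are already available as Python tuples: (trace, event) pairs and (event, key, type, value) attribute records. This input shape, the sample values, and the function `build_xes` are assumptions for illustration; the element and key names (`log`, `trace`, `event`, `concept:name`, `time:timestamp`) follow the XES standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical query answers.
trace_events = [("t1", "e1"), ("t1", "e2"), ("t2", "e3")]
event_attributes = [
    ("e1", "concept:name", "string", "review"),
    ("e1", "time:timestamp", "date", "2015-05-01T10:00:00"),
    ("e2", "concept:name", "string", "decision"),
    ("e2", "time:timestamp", "date", "2015-06-01T09:00:00"),
    ("e3", "concept:name", "string", "review"),
    ("e3", "time:timestamp", "date", "2015-05-03T14:00:00"),
]


def build_xes(trace_events, event_attributes):
    """Assemble an XES element tree from the extracted tuples."""
    log = ET.Element("log", {"xes.version": "1.0"})
    traces = {}
    events = {}
    for trace_id, event_id in trace_events:
        if trace_id not in traces:
            traces[trace_id] = ET.SubElement(log, "trace")
            ET.SubElement(
                traces[trace_id], "string", {"key": "concept:name", "value": trace_id}
            )
        events[event_id] = ET.SubElement(traces[trace_id], "event")
    for event_id, key, attr_type, value in event_attributes:
        # In XES the element name encodes the attribute type (string, date, ...).
        ET.SubElement(events[event_id], attr_type, {"key": key, "value": value})
    return log


log = build_xes(trace_events, event_attributes)
print(ET.tostring(log, encoding="unicode"))
```

Events are emitted in the order in which the tuples arrive; a real exporter would also sort the events of each trace by timestamp.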
  83. 83. Log Virtualization [Figure: data bases, Domain Ontology, Event Ontology, Mapping, Annotation, Log Mapping, On Demand XES Loader, Process Mining Tools; numbered markers 1 and 2] • On-demand classes: XFactoryOnDemandImpl, XLogOnDemandImpl, XTraceOnDemandImpl, XEventOnDemandImpl, XLogOnDemandIterator, XTraceOnDemandIterator • Example: xlog.get(7).get(90) retrieves the event at index 90 inside the trace at index 7 of a log
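The on-demand idea can be sketched independently of the Java classes listed above: the log object holds no events and fetches data only when an element is actually requested. The Python classes below are hypothetical analogues written for illustration, not a port of the on-demand implementation, and `fake_backend` stands in for a query issued through the log mapping.

```python
class OnDemandTrace:
    """Handle for one trace; events are fetched only when requested."""

    def __init__(self, trace_index, fetch_event):
        self._trace_index = trace_index
        self._fetch_event = fetch_event

    def get(self, event_index):
        return self._fetch_event(self._trace_index, event_index)


class OnDemandLog:
    """Handle for the whole log; holds no event data itself."""

    def __init__(self, fetch_event):
        self._fetch_event = fetch_event

    def get(self, trace_index):
        # No data is loaded here; only a lightweight handle is returned.
        return OnDemandTrace(trace_index, self._fetch_event)


calls = []


def fake_backend(trace_index, event_index):
    """Stand-in for a query against the data sources; records each call."""
    calls.append((trace_index, event_index))
    return {"trace": trace_index, "event": event_index}


xlog = OnDemandLog(fake_backend)
event = xlog.get(7).get(90)
print(event, calls)
```

A single backend call is made for the single event requested. Whether this pays off depends on the access pattern of the mining algorithm: one that scans the whole log repeatedly would benefit from caching.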
  84. 84. Questions • How to optimize and test the scalability of the approach? Fine-tuning is a must! • Real vs simulated data? (Benchmarking OBDA) • Initial benchmarking using CPN Tools • Is the “virtual” approach useful? How do process mining algorithms access the data? • Hybrid virtual approach with caching strategies?
  85. 85. KAOS Project Knowledge-Aware Operational Support • Goal: Empowering process mining and online operational support with domain knowledge • Euregio project: Trento + Bolzano + Innsbruck • Mix of expertise from AI, BPM, database theory, formal methods, formal ontology, conceptual modeling, process mining, machine learning, software engineering • Just started: we are hiring!!! 85
  86. 86. 86 Conclusion
  87. 87. Acknowledgments All coauthors of this research, 
 in particular 
 
 Diego Calvanese (UNIBZ)
 Giuseppe De Giacomo (UNIROMA)
 Riccardo De Masellis (FBK-Trento)
 Alin Deutsch (UCSD)
 Chiara Di Francescomarino (FBK-Trento)
 Chiara Ghidini (FBK-Trento)
 Fabio Patrizi (UNIBZ)
 Sergio Tessaris (UNIBZ)
 Alifah Syamsiyah (TU/e)
 Wil van der Aalst (TU/e)
