Daniele Dell’Aglio
Problem setting
Real time integration of dynamic data from heterogeneous
sources
– Traffic Prediction
– Social media analytics
– Personalised services
2/33
Daniele Dell’Aglio
Information Flow Processing
In literature there are two different main approaches to
process streams
Data Stream Management Systems (DSMSs)
• Roots in DBMS research
• Aggregations, filters and joins
3/33
Daniele Dell’Aglio
Information Flow Processing
In literature there are two different main approaches to
process streams
Data Stream Management Systems (DSMSs)
• Roots in DBMS research
• Aggregations, filters and joins
Complex Event Processors (CEPs)
• Roots in Distributed Systems (Publish/Subscribe)
• Search of relevant patterns in the stream
• Non-equi-join on the timestamps (after, before, etc.)
3/33
Daniele Dell’Aglio
Information Flow Processing
In literature there are two different main approaches to
process streams
Data Stream Management Systems (DSMSs)
• Roots in DBMS research
• Aggregations, filters and joins
Complex Event Processors (CEPs)
• Roots in Distributed Systems (Publish/Subscribe)
• Search of relevant patterns in the stream
• Non-equi-join on the timestamps (after, before, etc.)
Current systems implements feature of both of them
• EPL (e.g. Esper, ORACLE CEP)
3/33
Daniele Dell’Aglio
The SR research so far
IMaRS
STARQL
DynamiTE
TROWL
EuropeMapfromWikipedia
StreamRule
StarCITY
SPARKWAVE
LARS
Key
5/33
StreamQR
Daniele Dell’Aglio
The problem (1)
t3 6 91
{:alice :isIn :hall}
{:bob :isIn :hall}
{:alice :isIn :kitchen}
{:bob :isIn :kitchen}
e1 e2 e3 e4S
6/33
Daniele Dell’Aglio
The problem (1)
Where are Alice and Bob,
when they are together?
t3 6 91
{:alice :isIn :hall}
{:bob :isIn :hall}
{:alice :isIn :kitchen}
{:bob :isIn :kitchen}
e1 e2 e3 e4S
6/33
Daniele Dell’Aglio
The problem (1)
Where are Alice and Bob,
when they are together?
Let’s consider a tumbling
window W of size 5
Let’s execute the
experiment 4 times
t3 6 91
{:alice :isIn :hall}
{:bob :isIn :hall}
{:alice :isIn :kitchen}
{:bob :isIn :kitchen}
e1 e2 e3 e4S
6/33
Daniele Dell’Aglio
The problem (1)
Execution 1° answer 2° answer
1 :hall [6] :kitchen [11]
2 :hall [5] :kitchen [10]
3 :hall [6] :kitchen [11]
4 - [7] - [12]
Where are Alice and Bob,
when they are together?
Let’s consider a tumbling
window W of size 5
Let’s execute the
experiment 4 times
t3 6 91
{:alice :isIn :hall}
{:bob :isIn :hall}
{:alice :isIn :kitchen}
{:bob :isIn :kitchen}
Which is the correct answer?
e1 e2 e3 e4S
6/33
Daniele Dell’Aglio
The problem (2)
Execution 1° answer 2° answer
1 :hall [6] :kitchen [11]
2 :hall [5] :kitchen [10]
3 :hall [6] :kitchen [11]
4 - [7] - [12]
Execution 1° answer 2° answer
1 :hall [3] :kitchen [9]
2 No answers
3 :hall [3] :kitchen [9]
4 No answers
CSPARQL CQELS
Which system behaves in the correct way?
t3 6 91
{:alice :isIn :hall}
{:bob :isIn :hall}
{:alice :isIn :kitchen}
{:bob :isIn :kitchen}
e1 e2 e3 e4S
7/33
Daniele Dell’Aglio
Research Question 1
In the context of continuous query answering over RDF
streams, how can the behaviour of existing systems be
captured, compared and contrast when deductive reasoning
processes are not involved?
8/33
Daniele Dell’Aglio
Research Question 1
In the context of continuous query answering over RDF
streams, how can the behaviour of existing systems be
captured, compared and contrast when deductive reasoning
processes are not involved?
Why do we need it?
• Comparison and contrast
• Study RDF Stream Processing related problems
• Standard RSP query language
8/33
Daniele Dell’Aglio
Background
data
Streams
Applications
RSP-QL BGP evaluation
over background
data
BGP evaluation
over streams
Model to express
continous queries
RDF
SPARQL
RSEP-QL
A reference model that formally defines the semantics of
RDF Stream Processing engines
9/33
Daniele Dell’Aglio
Background
data
Streams
RSEP-QL
Applications
RSP-QL BGP evaluation
over background
data
BGP evaluation
over streams
Event Pattern
detection operators
Model to express
continous queries
RDF
SPARQL
RSEP-QL
A reference model that formally defines the semantics of
RDF Stream Processing engines
9/33
Daniele Dell’Aglio
RSEP-QL: Dataset
From SPARQL to RSEP-QL dataset
t1
G(t1)
T⊆ ℕ R={RDF graph}
SPARQL dataset
G
Instantaneous Graph
G(t1) RTime-Varying Graph
G: T R
S3
S4 S5
S6
S7
S8
S9 S10
S11
S12
S
S1
S2
𝕎(S)
13/33
Daniele Dell’Aglio
RSEP-QL: Dataset
From SPARQL to RSEP-QL dataset
t1
G(t1)
T⊆ ℕ R={RDF graph}
SPARQL dataset
G
H
Instantaneous Graph
G(t1) RTime-Varying Graph
G: T R
S3
S4 S5
S6
S7
S8
S9 S10
S11
S12
S
S1
S2
𝕎(S)
13/33
Daniele Dell’Aglio
RSEP-QL: Dataset
From SPARQL to RSEP-QL dataset
t1
G(t1)
T⊆ ℕ R={RDF graph}
SPARQL dataset
G
H
Instantaneous Graph
G(t1) RTime-Varying Graph
G: T R
RESP-QL dataset
S3
S4 S5
S6
S7
S8
S9 S10
S11
S12
S
S1
S2
𝕎(S)
13/33
Daniele Dell’Aglio
RSEP-QL: Operators
Stream Processing operators (RSP-QL)
• SPARQL operators
• WINDOW to specify that the active element is a window (similar to
GRAPH)
• RStream, IStream, DStream to create the output stream
14/33
Daniele Dell’Aglio
RSEP-QL: Operators
Stream Processing operators (RSP-QL)
• SPARQL operators
• WINDOW to specify that the active element is a window (similar to
GRAPH)
• RStream, IStream, DStream to create the output stream
Event Processing operators (RSEP-QL)
• EVENT w P (Basic Event Pattern)
• E1 SEQ E2
• FIRST E1, LAST E1
• MATCH to describe an event pattern
• ...
14/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Stream Processing Evaluation Semantics
The SPARQL evaluation function is defined as
⟦𝑃⟧ 𝐷𝑆(𝐺)
The RSEP-QL evaluation function extends the SPARQL one
by introducing the evaluation time instant
⟪𝑃⟫ 𝑆𝐷𝑆(𝐴)
𝑡
15/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Stream Processing Evaluation Semantics
The SPARQL evaluation function is defined as
⟦𝑃⟧ 𝐷𝑆(𝐺)
The RSEP-QL evaluation function extends the SPARQL one
by introducing the evaluation time instant
⟪𝑃⟫ 𝑆𝐷𝑆(𝐴)
𝑡
SPARQL operators are straight extended to the new
evaluation function
Example: JOIN
⟦𝐽𝑂𝐼𝑁(𝑃1, 𝑃2)⟧ 𝐷𝑆(𝐺) = ⟦𝑃1⟧ 𝐷𝑆 𝐺 ⨝⟦𝑃2⟧ 𝐷𝑆 𝐺
⟪𝐽𝑂𝐼𝑁(𝑃1, 𝑃2)⟫ 𝑆𝐷𝑆 𝐴
𝑡
= ⟪𝑃1⟫ 𝑆𝐷𝑆 𝐴
𝑡
⨝⟪𝑃2⟫ 𝑆𝐷𝑆 𝐴
𝑡
15/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Stream Processing Evaluation Semantics
The main difference is on the BGP evaluation:
⟪𝐵𝐺𝑃⟫ 𝑆𝐷𝑆(𝐴)
𝑡
=⟦𝐵𝐺𝑃⟧ 𝑆𝐷𝑆(𝐴,𝑡)
16/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Stream Processing Evaluation Semantics
The main difference is on the BGP evaluation:
⟪𝐵𝐺𝑃⟫ 𝑆𝐷𝑆(𝐴)
𝑡
=⟦𝐵𝐺𝑃⟧ 𝑆𝐷𝑆(𝐴,𝑡)
SDS(A,t) is:
SDS(G,t)= SDS(G(t)) if A is a time-varying graph G
16/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Stream Processing Evaluation Semantics
The main difference is on the BGP evaluation:
⟪𝐵𝐺𝑃⟫ 𝑆𝐷𝑆(𝐴)
𝑡
=⟦𝐵𝐺𝑃⟧ 𝑆𝐷𝑆(𝐴,𝑡)
SDS(A,t) is:
SDS(G,t)= SDS(G(t)) if A is a time-varying graph G
SDS(𝕎(S),t)=SDS(m(𝕎(S,t))) if A is from a sliding window 𝕎
SDS(𝕃(S),t)=SDS(m(𝕃(S,t))) if A is from a landmark window 𝕃
where m denotes a merge function
m(𝕎(S,t))= 𝑑 𝑖,𝑡 𝑖 ∈𝕎(S,t) 𝑑𝑖
• takes as input a window content i.e. a sequence of timestamped
RDF graphs
• produces an RDF graph
16/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Event Processing Evaluation semantics
A new evaluation function ⦅⋅⦆ 𝑜,𝑐
𝑡
• t is the evaluation time instant (as in ⟪⋅⟫ 𝑆𝐷𝑆(𝐴)
𝑡
),
• 𝑜, 𝑐 is an additional window to identify the portion of the data
on which the event may happen
17/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Event Processing Evaluation semantics
A new evaluation function ⦅⋅⦆ 𝑜,𝑐
𝑡
• t is the evaluation time instant (as in ⟪⋅⟫ 𝑆𝐷𝑆(𝐴)
𝑡
),
• 𝑜, 𝑐 is an additional window to identify the portion of the data
on which the event may happen
Event pattern evaluation produces event mappings 𝜇, 𝑡1, 𝑡2
• 𝜇 is a solution mapping
• 𝑡1 and 𝑡2 denote the time inverval justifying 𝜇
17/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Event Processing Evaluation semantics
A new evaluation function ⦅⋅⦆ 𝑜,𝑐
𝑡
• t is the evaluation time instant (as in ⟪⋅⟫ 𝑆𝐷𝑆(𝐴)
𝑡
),
• 𝑜, 𝑐 is an additional window to identify the portion of the data
on which the event may happen
Event pattern evaluation produces event mappings 𝜇, 𝑡1, 𝑡2
• 𝜇 is a solution mapping
• 𝑡1 and 𝑡2 denote the time inverval justifying 𝜇
Selection and consumption policies to determine the data
involved in the evaluation
17/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Evaluation semantics - Example
⦅FIRST EVENT 𝑤1 𝑃1 SEQ EVERY EVENT 𝑤2 𝑃2⦆ 9,16
𝑡
S2
S3 S4
S1 S1
S6
S7
S8
S9 S10
S11
S12
S2
FIRST EVENT 𝑤1 𝑃1
SEQ
EVERY EVENT 𝑤2 𝑃2
t
10 12 14 1611 13 15
18/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Evaluation semantics - Example
⦅FIRST EVENT 𝑤1 𝑃1 SEQ EVERY EVENT 𝑤2 𝑃2⦆ 9,16
𝑡
S2
S3 S4
S1 S1
S6
S7
S8
S9 S10
S11
S12
S2
FIRST EVENT 𝑤1 𝑃1
SEQ
EVERY EVENT 𝑤2 𝑃2
t
10 12 14 1611 13 15
18/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Evaluation semantics - Example
⦅FIRST EVENT 𝑤1 𝑃1 SEQ EVERY EVENT 𝑤2 𝑃2⦆ 9,16
𝑡
S2
S3 S4
S1 S1
S6
S7
S8
S9 S10
S11
S12
S2
FIRST EVENT 𝑤1 𝑃1
SEQ
EVERY EVENT 𝑤2 𝑃2
t
10 12 14 1611 13 15
18/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Evaluation semantics - Example
⦅FIRST EVENT 𝑤1 𝑃1 SEQ EVERY EVENT 𝑤2 𝑃2⦆ 9,16
𝑡
S2
S3 S4
S1 S1
S6
S7
S8
S9 S10
S11
S12
S2
FIRST EVENT 𝑤1 𝑃1
SEQ
EVERY EVENT 𝑤2 𝑃2
t
10 12 14 1611 13 15
18/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
MATCH graph pattern
Event patterns are enclosed in MATCH graph patterns
• Event mappings exist only in the context of event patterns
• The evaluation of a MATCH graph pattern produces a bag of solution
mappings
⟪𝑀𝐴𝑇𝐶𝐻 𝐸⟫ 𝑆𝐷𝑆 𝐴
𝑡
= {𝜇| 𝜇, 𝑡1, 𝑡2 ∈ ⦅𝐸⦆ 0,𝑡
𝑡
}
It is possible to combine the MATCH graph pattern with
other SPARQL graph patterns
19/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Continuous evaluation
For each evaluation time t ∈ ET: ⟪𝑆𝐸⟫ 𝑆𝐷𝑆(𝐴)
𝑡
• The continuous evaluation is a sequence of instantaneous
evaluations
20/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Continuous evaluation
For each evaluation time t ∈ ET: ⟪𝑆𝐸⟫ 𝑆𝐷𝑆(𝐴)
𝑡
• The continuous evaluation is a sequence of instantaneous
evaluations
It is not always possible to compute ET a priori
20/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Continuous evaluation
For each evaluation time t ∈ ET: ⟪𝑆𝐸⟫ 𝑆𝐷𝑆(𝐴)
𝑡
• The continuous evaluation is a sequence of instantaneous
evaluations
It is not always possible to compute ET a priori
• Can be infinite
• Can be data dependent
20/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Continuous evaluation
For each evaluation time t ∈ ET: ⟪𝑆𝐸⟫ 𝑆𝐷𝑆(𝐴)
𝑡
• The continuous evaluation is a sequence of instantaneous
evaluations
It is not always possible to compute ET a priori
• Can be infinite
• Can be data dependent
ET is expressed through a Report Policy
• A set of conditions to one or more window operators in SDS
• Initially defined in SECRET for Stream Processing engines
20/33
Daniele Dell’Aglio
RSEP-QL: Evaluation Semantics
Continuous evaluation
Report Policy examples:
• P Periodic: the window reports only at regular intervals
• WC Window Close: the window reports if the active window
closes
• CC Content Change: the window reports if the content changes.
21/33
Daniele Dell’Aglio
Research Question 2
Is it possible to add deductive reasoning features to RSEP-QL?
Applications
RDF
SPARQL
Ontology
SPARQL
Entailment
Regimes
22/33
Daniele Dell’Aglio
Research Question 2
Is it possible to add deductive reasoning features to RSEP-QL?
Streams Ontology
Background
data
RSEP-QL
Applications
RSP-QL
RDF
SPARQL
Ontology
SPARQL
Entailment
Regimes
22/33
Daniele Dell’Aglio
Research Question 2
Is it possible to add deductive reasoning features to RSEP-QL?
Streams Ontology
Background
data
RSEP-QL
Entailment
Regimes
RSEP-QL
Applications
RSP-QL
RDF
SPARQL
Ontology
SPARQL
Entailment
Regimes
22/33
Daniele Dell’Aglio
RSEP-QL: Reasoning
Stream Processing
At which point of the continuous query answering process
should the inference process be taken into account?
Event ProcessingWindow merge
Window operator
Stream S3
4 5
6
7
8
9
S 1
2
6
7
8
9
6-9
t
RDF Graph
RDF Stream
Window
23/33
Daniele Dell’Aglio
RSEP-QL: Reasoning
Stream Processing
At which point of the continuous query answering process
should the inference process be taken into account?
Direct application
on SPARQL
entailment regimes
Event ProcessingWindow merge
Window operator
Stream S
Graph-level entailment
3
4 5
6
7
8
9
S 1
2
6
7
8
9
6-9
t
RDF Graph
RDF Stream
Window
23/33
Daniele Dell’Aglio
RSEP-QL: Reasoning
Stream Processing
At which point of the continuous query answering process
should the inference process be taken into account?
After applying the
window operator
Direct application
on SPARQL
entailment regimes
Event ProcessingWindow merge
Window operator
Stream S
Graph-level entailment
Window-level entailment
3
4 5
6
7
8
9
S 1
2
6
7
8
9
6-9
t
RDF Graph
RDF Stream
Window
23/33
Daniele Dell’Aglio
RSEP-QL: Reasoning
Stream Processing
At which point of the continuous query answering process
should the inference process be taken into account?
A larger, landmark
window to include
relevant information
from the past
After applying the
window operator
Direct application
on SPARQL
entailment regimes
Event ProcessingWindow merge
Window operator
Stream S
Graph-level entailment
Window-level entailment
Stream-level entailment
3
4 5
6
7
8
9
S 1
2
6
7
8
9
6-9
t
6
7
8
9
3
4 5
RDF Graph
RDF Stream
Window
23/33
Daniele Dell’Aglio
RSEP-QL: Reasoning
Window-level and stream-level entailment regimes
How can DL ontology be extended to capture the window
content?
• Ontology stream – a sequence of time-stamped ABoxes 𝑆 𝑜
𝑐
𝑖
w.r.t a static TBox
24/33
Daniele Dell’Aglio
RSEP-QL: Reasoning
Window-level and stream-level entailment regimes
How can DL ontology be extended to capture the window
content?
• Ontology stream – a sequence of time-stamped ABoxes 𝑆 𝑜
𝑐
𝑖
w.r.t a static TBox
• Introduction of so-far closure, as the closure of the i-th
snapshot w.r.t. the Tbox and the previus snapshots
• The so-far closure of 𝑆 𝑜
𝑐 𝑖 is done w.r.t. the TBox and the
snapshots 𝑆 𝑜
𝑐 𝑜 … 𝑆 𝑜
𝑐 𝑖 − 1
• The so-far closure of the ontology stream is obtained by so-far
closing all the ABox snapshots
24/33
Daniele Dell’Aglio
Event Processing
Stream-level entailment
RSEP-QL: Reasoning
Results
1. Theorem: in the context of
Stream Processing operations,
graph- and window-level
entailments have the same
behaviour as far as no
inconsistency is entailed
25/33
Stream Processing
Window merge
Window operator
Stream S
Graph-level ent.
Window-level entailment
Daniele Dell’Aglio
Event Processing
Stream-level entailment
RSEP-QL: Reasoning
Results
1. Theorem: in the context of
Stream Processing operations,
graph- and window-level
entailments have the same
behaviour as far as no
inconsistency is entailed
2. Theorem: The stream-level
entailment brings to a larger
set of answers w.r.t. graph-
and window- entailment ones.
25/33
Stream Processing
Event ProcessingWindow merge
Window operator
Stream S
Graph-level ent.
Window-level entailment
Stream-level entailment
Daniele Dell’Aglio
CSR Bench
CSR Bench is an extension of the SRbench benchmark
that focuses on correctness
A test suite
• LinkedSensorData as dataset
• 8 parametric queries to tests the RSP engines in different
conditions
An oracle (an automatic correctness validator)
• Based on RSP-QL
28/33
Daniele Dell’Aglio
CSR Bench
Experimental Results
All the three systems that we
considered in our experiments showed
wrong behaviours
The defects we identified are related to:
•The window operator
• Initialization
• Slide parameter
• Window contents
•Timestamps of the stream items
• Internal timestamp management
CQELS
C-SPARQL
SPARQLstream
Q1
Q2
Q3
Q4
Q5
Q6
Q7
29/33
Daniele Dell’Aglio
What’s next?
An RSEP-QL query language
• W3C RSP CG ongoing activities
Implementations
• Yet another RSP engine
• Stream Reasoner composition
• RSEP-QL as model to capture the behaviour of SRs
• RSEP-QL queries to define tasks to be executed over one or
more SR engines
30/33
Daniele Dell’Aglio
What’s next?
An RSEP-QL query language
• W3C RSP CG ongoing activities
Implementations
• Yet another RSP engine
• Stream Reasoner composition
• RSEP-QL as model to capture the behaviour of SRs
• RSEP-QL queries to define tasks to be executed over one or
more SR engines
Stream-level entailment scalable techniques
• Initial work on stream without inference
• Permanent storage of portions of data (raw or inferred)
30/33
Daniele Dell’Aglio
What’s next?
Reasoning over streams and continuous query answering
over streams are still two different worlds
• Key to show why RDF Stream Processing is useful w.r.t.
DSMS and CEP
• Few initial attempts to put them together
31/33
Daniele Dell’Aglio
What’s next?
Reasoning over streams and continuous query answering
over streams are still two different worlds
• Key to show why RDF Stream Processing is useful w.r.t.
DSMS and CEP
• Few initial attempts to put them together
Streams in the Web are getting popular – applications want
more and more sophisticated features
• Different timestamps, out-of-orders
• Noise (inductive reasoning)
31/33
Daniele Dell’Aglio
Conclusions
RSEP-QL captures the evaluation semantics of existing
RSP engines
RSEP-QL can determine which are the correct answers of
an RSP engine, given the input data and query
• At the basis of CSR Bench
32/33
Daniele Dell’Aglio
Conclusions
RSEP-QL captures the evaluation semantics of existing
RSP engines
RSEP-QL can determine which are the correct answers of
an RSP engine, given the input data and query
• At the basis of CSR Bench
The dynamics introduced in the continuous query evaluation
process have not been totally understood
• Not fully captured by existing models
• RSEP-QL captures those dynamics
32/33
Daniele Dell’Aglio
Thank you! Questions?
On Unified Stream Reasoning
Daniele Dell’Aglio
Advisor: Prof. Emanuele Della Valle
33/33