When a FILTER makes the difference in continuously answering SPARQL queries on streaming and quasi-static Linked Data

WHEN A FILTER MAKES THE DIFFERENCE IN
CONTINUOUSLYANSWERING SPARQL
QUERIES ON STREAMING AND QUASI-STATIC
LINKED DATA
Shima Zahmatkesh, Emanuele Della Valle, and Daniele Dell'Aglio
DEIB - Politecnico of Milano
ICWE 2016 – Lugano
7th June 2016

Outline
• Introduction
• Motivation
• Problem Statement
• State-of-the-art
• Research Question and Hypotheses
• Proposed Solution
• Experimental Results
• Conclusion and future works
!2

Introduction to Stream Processing
Stream Processing
Engine
ResultsWindows
Stream data
Register query once
and execute it
continuously
!3

Time-based sliding window 
Sequence of
timestamped events
W(3,1 ,1)
β
ω
widthslide
S3
S4 S5 S6
S7
S8 S9
S10
S11
S
S1
S2
t
1 2 3 4 5 6 7 8 9 10 11 12
W i
W i+1
!4
)

Introduction to RSP EnginesRSPengine
Web
Results
Join
Windows
RDF Streams SPARQL endpoint
- Fetching all background data in each evaluation
- High number of invocation to the endpoint
- Loosing reactiveness of the engine
!5

Introduction to RSP EnginesRSPengine
Web
Results
Join
WindowsRDF Streams SPARQL endpoint
Local
Replica
Data become stale
during time
!6

Introduction to Continuous QueryRSPengine
Web
Results
Join
Local
Replica
Maintenance
Policy
WindowsRDF Streams SPARQL endpoint
!7

Motivation
The cloth brand ACME wants to persuade influential Social
Network users to post commercial endorsements.
Every minute give me the ID of the users that are mentioned on
Social Network in the last 10 minutes whose number of followers is
greater than 100,000.
!8

Motivation
The cloth brand ACME wants to persuade influential Social
Network users to post commercial endorsements.
Every minute give me the ID of the users that are mentioned on
Social Network in the last 10 minutes whose number of followers is
greater than 100,000.
ΩW
?user :hasMentions ?metionsNum
ΩS
?user :hasFollowers ?followerCount
WINDOW
SERVICE
JOIN
FILTER
?followerCount > 100000
!8

Problem Statement
• RSP Engines:
• Fetch all the background data in each evaluation
• Have high number of invocations
• The application may loose its reactiveness.
• Class of query with WINDOW and SERVICE and FILTER
clauses
• Solution:
• Using a local replica of the background data
• Need maintenance policies to keep the data freshness
!9

State-of-the-art
• Acqua: guarantee reactiveness of RSP engines while
maximizing the freshness of the replica
• Used a local replica of the background data
• Proposed maintenance policies
• Used refresh budget γ to limit the number of the access to the
background data and guarantee reactiveness
• Class of query with WINDOW and SERVICE clauses without
FILTER
!10

Acqua framework
!11
WINDOW clause
JOIN Proposer Ranker
MaintainerLocal Replica
2
3
1
SERVICE clause
E
C
RND
LRU
WBM
WSJ
Candidate set
Elected set: top
γ mappings of
Candidate set

Research Question
• Given a class of query with WINDOW and SERVICE and
FILTER clauses
• how can we refresh the local replica in order to
• guarantee reactiveness while
• maximizing the freshness the replica?
!12

Intuition
!13
time
NumberofFollowers
t
Filtering Threshold
User A
User B
User C
User D

Hypotheses
• H.1 the replica can be maintained fresher than when using
Acqua policies, if we first refresh the mappings whose
values are closer to the Filtering Threshold.
• H.2 the replica can be maintained fresher than when using
Acqua policies by first selecting the mappings as in
Hypothesis H.1 and, then, applying the Acqua policies.
!14

Proposed Solution
!15
WINDOW clause
JOIN Proposer Ranker
MaintainerLocal Replica
2
3
1
SERVICE clause
contains
FILTER clause
E
C
WSJ
Candidate set
Elected set
Filter Update Policy
RND.F
LRU.F
WBM.F

Proposed Solution
• Filter Update Policy:
For each mapping in the replica:
• Computes how close is the value associate to the variable of the
mapping to the Filtering Threshold.
• Combination with Acqua:
• RND.F, LRU.F, WBM.F
• Select a subset of Candidate set whose mappings are closer to
Filtering Threshold.
• Apply the Acqua policies over this subset.
!16

Experimental Evaluation (1 of 3)
• Data Sets
• Streaming data: collected from 400 verified users of Twitter for
three hours of tweets
• Background data: collected the number of followers per user, every
minute during the three hours
• Transformation of the background data:
• To control the selectivity of filter
• The time-series of specified percentage of the users crosses the Filtering
Threshold at least once during the experiment.
• To avoid biases, generate 10 different dataset for each specified
percentage
!17

• Query
• Contains WINDOW, SERVICE, and FILTER clauses
• Generate correct answer of the query by a Oracle
• Ans(Oracle)
• KPIs
• Use Jaccard distance to measure diversity of the set generated by
the query Ans(Qi) and correct answers Ans(Oraclei)
• Define cumulative Jaccard distance
!18

• Two sets of experiments:
• Comparing the Filter Update Policy (.F) with Acqua ones
• Comparing the Filter Update Policy (.F) with the combine policies
(RND.F, LRU.F, and WBM.F)
• Sensitivity of Refresh Budget
• Sensitivity of Selectivity of Filter
!19

How to read result graphs
!20
CumulativeJaccardDistance
Iterations of query execution
Policy A
Policy C
Policy B
Policy A outperform Policy B and C

Result of Experiment 1
!21

!22
90 80 75 70 60 50
Selectivity

!23
Filter
Filter

!24
90 80 75 70 60 50
Selectivity

Conclusion
• Continuous evaluation of a class of queries that joins data
streams and background data with Filtering.
• Get the idea of local replica, maintenance policy, and refresh
budget from the state-of-the-art
• Extend the class of continuous queries
• Propose new maintenance policies
• Filter Update Policy keeps the replica fresher than Acqua
policies in higher selectivity (>60%).
• Combination of policies keep the replica even fresher than the
Filter Update Policy.
!25

Future works
• Broaden the class of queries
• Multiple filtering
• Filtering condition formulated as a ranking clause
• Pushing the FILTER clause into the SERVICE clause and
considering caching instead of local replica
• Study the effect of different trends in the data
!26

Thank you! 
Any Question?
When a FILTER makes the difference in
continuously answering SPARQL queries
on streaming and quasi-static Linked Data
Shima Zahmatkesh
shima.zahmatkesh@polimi.it
DEIB - Politecnico of Milano
!27

When a FILTER makes the di fference in continuously answering SPARQL queries on streaming and quasi-static Linked Data

More Related Content

Similar to When a FILTER makes the di fference in continuously answering SPARQL queries on streaming and quasi-static Linked Data

Recently uploaded