ON RELEVANT QUERY ANSWERING OVER
STREAMING AND DISTRIBUTED DATA
Shima Zahmatkesh
Politecnico di Milano – DEIB
Data Science Group – Stream Reasoning Team
Supervisor: Prof. Emanuele Della Valle
INTRODUCTION
Motivation
▪ An application that shows drivers the nearby places where there is a high
probability of finding free parking.
Query: return the best streets (around the car
that calls the service) where there were many free
parking lots and few cars looking for parking in
the last 10 minutes.
▪ Web applications need to combine data streams
with distributed data on the Web to continuously find
the best answers to users' queries.
Solving the example with web stream processing
[Figure: a Stream Processing Engine applies Windows to the car request streams and joins (Join) them with free parking lot data obtained from the Web (Request / Response). The Relevant Answers are the best streets to look for parking.]
Problem Statement
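[Figure: an RDF Stream Processing (RSP) Engine applies Windows to RDF Streams and joins (Join) them with data from a SPARQL endpoint on the Web, through a Local Replica, to return the Relevant Answers.]
Goals: minimize computational resources, being reactive.
▪ SPARQL endpoint: High Latency, Rate Limits
▪ Local Replica: Stale Data, Refresh Budget, Maintenance Policy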
Research Question
Is it possible to optimize query evaluation in order to
continuously obtain the most relevant combinations of
streaming and evolving distributed data, while
guaranteeing the reactiveness of the engine?
Related work
[Figure: three overlapping research areas: Stream Processing, Top-k Query Evaluation, and Federated Query Processing.]
▪ ACQUA: using resource replication
▪ MinTopK: optimal continuous top-k query evaluation
▪ Top-k linked data query by Wagner
▪ Top-k join queries by Ilyas
▪ Our work: previously unexplored
APPROACH
Scope of the state of the art
Is it possible to optimize query evaluation in order to
continuously obtain the most relevant combinations of
streaming and evolving distributed data, while
guaranteeing the reactiveness of the engine?
Features | ACQUA | MinTopK
Type of data | Streaming and distributed | Streaming
Relevancy | ✗ | ✓
Reactiveness | Refresh budget | Incremental evaluation
Handling evolving data | Local replica, maintenance policies | ✗
Scope of the research
▪ Queries that contain a FILTER clause that filters
the data coming from the distributed dataset.
▪ Top-k queries where the scoring function involves data
that appears in both the streaming and the distributed
datasets.
QUERIES WITH A FILTER CLAUSE
Query
▪ Every minute, give me the best influencers, i.e. users who
were mentioned on the Social Network in the last 10 minutes
and whose number of followers is greater than 100,000.
REGISTER STREAM <:Influencers> AS
CONSTRUCT {?user a :influentialUser}
WHERE {
WINDOW :W(10m,1m) ON :S
{?user :hasMentions ?mentionsNumber}
SERVICE :BKG
{?user :hasFollowers ?followersCount}
FILTER (?followersCount > 100000)
}
Filtering Threshold
ACQUA
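The semantics of the query above can be sketched in a few lines of Python. This is only an illustration under assumptions: the window content and the background data :BKG are modelled as plain in-memory structures, and the names `influential_users`, `window`, and `replica` are hypothetical.

```python
# Hypothetical in-memory stand-ins for the window :W and the service :BKG.
THRESHOLD = 100_000  # the Filtering Threshold used in the FILTER clause


def influential_users(window, replica, threshold=THRESHOLD):
    """Join the window content with the follower counts, then apply the filter."""
    result = []
    for user, _mentions in window:
        followers = replica.get(user)  # may be stale if read from a local copy
        if followers is not None and followers > threshold:
            result.append(user)
    return result


# (user, number of mentions) pairs seen in the last 10 minutes (made-up data)
window = [("alice", 12), ("bob", 3), ("carol", 7)]
# follower counts known for each user (made-up data)
replica = {"alice": 150_000, "bob": 99_000, "carol": 100_001}

print(influential_users(window, replica))
```

If the follower count of "bob" had already grown past the threshold on the remote side, a stale copy would wrongly exclude him, which is the kind of error the maintenance policies try to limit.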
State of the art - ACQUA
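[Figure: ACQUA architecture. The WINDOW clause is joined (JOIN) with the SERVICE clause through a Local Replica, which three components keep fresh.]
1. Proposer: selects the Candidate set.
2. Ranker: orders the Candidate set with a maintenance policy (RND, LRU, or WBM) to produce the Elected set.
3. Maintainer: refreshes the Elected set in the Local Replica.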
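The three steps can be sketched as a small loop. This is a minimal sketch under assumptions, not the ACQUA implementation: only an LRU-style ranker is shown (RND and WBM are left out), the refresh budget is a plain integer, and all function and variable names are hypothetical.

```python
def propose(window_users, replica):
    """Step 1 (Proposer): replica entries that join with the current window."""
    return [user for user in window_users if user in replica]


def rank_lru(candidates, last_refresh):
    """Step 2 (Ranker), LRU-style policy: least recently refreshed first."""
    return sorted(candidates, key=lambda user: last_refresh[user])


def maintain(elected, replica, last_refresh, fetch, now):
    """Step 3 (Maintainer): refresh the elected entries from the remote service."""
    for user in elected:
        replica[user] = fetch(user)
        last_refresh[user] = now


def acqua_step(window_users, replica, last_refresh, fetch, budget, now):
    """One evaluation: refresh at most `budget` replica entries."""
    candidates = propose(window_users, replica)
    elected = rank_lru(candidates, last_refresh)[:budget]
    maintain(elected, replica, last_refresh, fetch, now)
    return elected


# Made-up data: the remote values have drifted away from the replica.
remote = {"a": 10, "b": 20, "c": 30}
replica = {"a": 1, "b": 2, "c": 3}
last_refresh = {"a": 5, "b": 1, "c": 3}

elected = acqua_step(["a", "b", "c"], replica, last_refresh,
                     fetch=remote.__getitem__, budget=2, now=6)
print(elected, replica)
```

With a budget of 2, only the two entries that were refreshed longest ago are updated; the third stays stale until a later evaluation.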
Proposed Solution – ACQUA.F
[Figure: ACQUA.F architecture with the WINDOW clause, JOIN, Proposer, Ranker, Maintainer, Local Replica, and a SERVICE clause with a FILTER clause.]
Maintenance policies:
✓ Filter Update Policy
✓ RND.F
✓ LRU.F
✓ WBM.F
Filter Update Policy (intuition)
▪ Computes how close the value bound to the variable of each data item
is to the Filtering Threshold.
[Figure: number of followers over time for User A, User B, User C, and User D, compared with the Filtering Threshold at time t.]
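The intuition can be sketched as a ranking by distance from the threshold. This is a hedged illustration: the slide does not give the exact distance measure, so the absolute difference is an assumption here, and the follower counts are made up.

```python
def filter_update_rank(replica, threshold):
    """Order users by how close their replica value is to the threshold.

    Assumption: closeness is the absolute difference. Users near the
    threshold are the ones most likely to cross it after a small change.
    """
    return sorted(replica, key=lambda user: abs(replica[user] - threshold))


# Made-up follower counts for the four users of the figure.
followers = {"A": 180_000, "B": 104_000, "C": 98_000, "D": 10_000}
ranking = filter_update_rank(followers, threshold=100_000)
print(ranking)
```

Users C and B sit right next to the threshold, so refreshing them first is the most likely to change the query answer; A and D are far away and can wait.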
Experimental Result
[Figure: performance (from worst to best) plotted against the experiment dimension.]
▪ For low selectivity, WBM is better than the Filter Update Policy.
▪ For high selectivity, the Filter Update Policy is better than WBM.
Combined Policies – ACQUA.F
[Figure: number of followers over time for User A, User B, User C, and User D, with a Band marked at time t.]
▪ Combine the Filter Update Policy with the ACQUA ones
▪ RND.F, LRU.F, and WBM.F
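One possible reading of the combination is sketched below for an LRU.F-like policy. It is an assumption, not the definition from the slides: the Band is taken to be a symmetric interval around the Filtering Threshold, the items inside it are kept, and an LRU-style ordering then picks which of them to refresh. All names and values are hypothetical.

```python
def lru_f(replica, last_refresh, threshold, band, budget):
    """Keep the items inside the band, then refresh the least recently updated."""
    in_band = [user for user in replica
               if abs(replica[user] - threshold) <= band]
    return sorted(in_band, key=lambda user: last_refresh[user])[:budget]


# Made-up follower counts and last refresh times.
followers = {"A": 180_000, "B": 104_000, "C": 98_000, "D": 10_000}
last_refresh = {"A": 1, "B": 2, "C": 9, "D": 0}

chosen = lru_f(followers, last_refresh, threshold=100_000, band=5_000, budget=1)
print(chosen)
```

User D was refreshed longest ago, but it lies outside the band and is ignored; among B and C, B has the older refresh time and is selected.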
Experimental Result
[Figure: performance (from worst to best) plotted against the experiment dimension, annotated "Impossible in practice".]
State of the art - Rank Aggregation
▪ Fairly takes into account the opinions of different
algorithms.
▪ Combines the ranking lists by computing an aggregated score.
WBM:
User | Score
Alice | 0.8
Bob | 0.7
David | 0.4
Filter Update:
User | Score
Bob | 0.9
David | 0.8
Alice | 0.7
WBM.F+ (α = 0.5):
User | Score_agg
Bob | 0.8
Alice | 0.75
David | 0.6
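The aggregated scores on this slide are consistent with a weighted sum of the two input scores. The sketch below reproduces them; which list receives the weight α and which receives 1 − α is an assumption (with α = 0.5 it makes no difference), and the function name is hypothetical.

```python
def aggregate(scores_a, scores_b, alpha):
    """Weighted sum of two score lists, returned best first."""
    users = scores_a.keys() & scores_b.keys()
    combined = {u: alpha * scores_a[u] + (1 - alpha) * scores_b[u] for u in users}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


wbm = {"Alice": 0.8, "Bob": 0.7, "David": 0.4}
filter_update = {"Bob": 0.9, "David": 0.8, "Alice": 0.7}

ranking = aggregate(wbm, filter_update, alpha=0.5)
for user, score in ranking:
    print(f"{user}: {score:.2f}")
```

For example, Bob gets 0.5 × 0.7 + 0.5 × 0.9 = 0.8, which puts him ahead of Alice (0.75) even though Alice leads the WBM list.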
Proposed Solution – ACQUA.F+
[Figure: ACQUA.F+ architecture with the WINDOW clause, JOIN, Proposer, Ranker, Maintainer, Local Replica, and a SERVICE clause with a FILTER clause.]
Maintenance policies:
✓ LRU.F+
✓ WBM.F+
✓ WBM.F*
Experimental Results
Comparable to WBM.F
Possible in practice
TOP-K QUERIES
State of the art – MinTopK
[Figure: scores of objects A to H plotted over time (time axis 0 to 13, score axis 0 to 8), with the sliding windows W1, W2, and W3 shown one after the other as the current window.]
Window Length = 9, Slide = 3, Top-2 results
MTK Lists:
W1: E, C
W2: E, C
W3: E, F
Super-MTK List:
Object | Ws | We
E | 1 | 3
C | 1 | 2
F | 3 | 3
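The relation between the per-window MTK lists and the Super-MTK list can be sketched naively. This is not the incremental MinTopK algorithm: every window is simply recomputed from scratch. The object timestamps and scores are made up so that the result matches the lists on the slide, and the window convention (window i covers the half-open interval starting at (i − 1) × slide) is an assumption.

```python
def window_bounds(i, length, slide):
    """Half-open time interval [start, end) covered by window i (1-based)."""
    start = (i - 1) * slide
    return start, start + length


def mtk_lists(objects, k, length, slide, n_windows):
    """Top-k object names for each window, recomputed from scratch."""
    lists = {}
    for i in range(1, n_windows + 1):
        lo, hi = window_bounds(i, length, slide)
        alive = [o for o in objects if lo <= o[1] < hi]
        alive.sort(key=lambda o: o[2], reverse=True)
        lists[i] = [name for name, _, _ in alive[:k]]
    return lists


def super_mtk(lists, scores):
    """Merge the MTK lists into (object, Ws, We) entries, best score first."""
    marks = {}
    for i, names in lists.items():
        for name in names:
            ws, we = marks.get(name, (i, i))
            marks[name] = (min(ws, i), max(we, i))
    entries = [(name, ws, we) for name, (ws, we) in marks.items()]
    return sorted(entries, key=lambda e: scores[e[0]], reverse=True)


# (name, timestamp, score): hypothetical values chosen to match the slide.
objects = [("A", 0, 1), ("B", 1, 2), ("C", 4, 7), ("D", 5, 3),
           ("E", 6, 8), ("F", 8, 5), ("G", 10, 4), ("H", 12, 2)]
scores = {name: score for name, _, score in objects}

lists = mtk_lists(objects, k=2, length=9, slide=3, n_windows=3)
print(lists)
print(super_mtk(lists, scores))
```

The Super-MTK list stores each object once with its first and last window instead of repeating it in every MTK list, which is what keeps the structure compact.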
Top-k Query
▪ Every 3 minutes, return the top-2 popular users who were
most mentioned on Social Networks in the last 9 minutes.
REGISTER STREAM :TopkUsersToContact AS
SELECT ?user F(?mentionCount,?followerCount) AS ?score
FROM NAMED WINDOW :W ON :S [RANGE 9m STEP 3m]
WHERE {
WINDOW :W {?user :hasMentions ?mentionCount}
SERVICE :BKG {?user :hasFollowers ?followerCount}
}
ORDER BY DESC (?score)
LIMIT 2
Score
[Figure: three plots over time (time axis 0 to 13) for objects A to G: ScoreS, ScoreR, and FinalScore.]
Final Score = F(ScoreS, ScoreR)
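A possible shape for F is sketched below. The experimental setting later describes the scoring function as a normalized weighted summation; the min-max normalization, the value ranges, and the default weight used here are assumptions for illustration only.

```python
def normalize(value, lo, hi):
    """Min-max normalization into [0, 1]; a constant range maps to 0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def final_score(score_s, score_r, w_s=0.5, s_range=(0, 10), r_range=(0, 10)):
    """Weighted sum of the normalized streaming and distributed scores."""
    return (w_s * normalize(score_s, *s_range)
            + (1 - w_s) * normalize(score_r, *r_range))


# Same streaming score, different distributed score: the final score moves.
print(final_score(6, 2))
print(final_score(6, 8))
```

Because ScoreR comes from the distributed dataset, a remote change can reorder the top-k result even when nothing new arrives on the stream.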
Contributions (1 of 2)
▪ Data structure: Super-MTK+N List
▪ Handles changes in the distributed dataset: N changes per window
▪ N additional slots
▪ MTK+N list: keeps K+N elements
▪ Complexity: O(K+N)
[Figure: the MTK+N lists of windows W1, W2, and W3, each split into a K area and an N area, next to the resulting Super-MTK+N List with its LBP.]
Super-MTK+N List:
Object | Ws | We
E | 1 | 2
G | 1 | 3
C | 1 | 1
F | 2 | 2
A | 3 | 3
Contributions (2 of 2)
▪ Algorithm:
▪ Top-k+N
▪ Window expiration
▪ New arrival of distinct data items
▪ Handle changes in distributed data
▪ AcquaTop
▪ Handle updating local replica
▪ Complexity: O(K+N)
▪ Framework: AcquaTop Framework
▪ Apply maintenance policies
Top-K+N – New object arrival
K = 2, N = 1; W2 is the current window.
[Figure: scores of objects A to G plotted over time (time axis 0 to 13, score axis 0 to 8).]
Super-MTK+N List before the arrival:
Object | Ws | We
E | 2 | 3
C | 2 | 2
F | 2 | 3
A | 3 | 4
Super-MTK+N List after Top-K+N processes the arrival of G:
Object | Ws | We
G | 2 | 4
E | 2 | 3
C | 2 | 2
F | 3 | 3
A | 4 | 4
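The effect of an arrival on a single MTK+N list can be sketched as a bounded, score-ordered list. This is a simplification under assumptions: it models one window only and ignores the Ws and We marks of the Super-MTK+N list, the class name is hypothetical, and the scores are made up.

```python
import bisect


class MTKNList:
    """Score-ordered list that keeps at most K+N entries (single window)."""

    def __init__(self, k, n):
        self.k, self.n = k, n
        self.items = []  # (-score, name) tuples, so the best entry comes first

    def insert(self, name, score):
        """Insert an arrival; return the name evicted from the list, if any."""
        bisect.insort(self.items, (-score, name))
        if len(self.items) > self.k + self.n:
            return self.items.pop()[1]
        return None

    def k_area(self):
        return [name for _, name in self.items[:self.k]]

    def n_area(self):
        return [name for _, name in self.items[self.k:]]


mtkn = MTKNList(k=2, n=1)
for name, score in [("E", 8), ("C", 7), ("F", 5)]:
    mtkn.insert(name, score)

evicted = mtkn.insert("G", 7.5)  # the new arrival
print(mtkn.k_area(), mtkn.n_area(), evicted)
```

The N area acts as a reserve: C drops out of the top-2 but stays in the list, so it can return to the K area later without being fetched again.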
Top-K+N - Handling Changes
[Figure: scores of objects A to G plotted over time (time axis 0 to 13, score axis 0 to 8) for window W2, with the score of F changing.]
Super-MTK+N List before the change:
Object | Ws | We
G | 2 | 4
E | 2 | 3
C | 2 | 2
F | 3 | 3
A | 4 | 4
Super-MTK+N List after Top-K+N processes the change of F:
Object | Ws | We
G | 2 | 4
F | 2 | 3
E | 2 | 3
C | 2 | 2
A | 4 | 4
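The reordering shown above can be sketched as an update followed by a re-sort. This sketch only tracks the order of the entries and whether the K area changed; it does not recompute the Ws and We marks, and the scores are hypothetical values picked to reproduce the order on the slide.

```python
def apply_change(entries, name, new_score, k):
    """Update one score in a score-ordered list of (name, score) pairs.

    Returns the re-sorted list and whether the K area (first k names) changed.
    An object that is not in the list is left out, so nothing changes for it.
    """
    before = [n for n, _ in entries[:k]]
    updated = [(n, new_score if n == name else s) for n, s in entries]
    updated.sort(key=lambda entry: entry[1], reverse=True)
    after = [n for n, _ in updated[:k]]
    return updated, before != after


entries = [("G", 8.0), ("E", 7.0), ("C", 6.0), ("F", 5.0), ("A", 3.0)]
updated, k_area_changed = apply_change(entries, "F", 7.5, k=2)
print([name for name, _ in updated], k_area_changed)
```

Here the remote change lifts F above E, so the top-2 answer changes without any new stream element arriving.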
AcquaTop Framework
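[Figure: AcquaTop framework. The RDF Stream feeds the Super-MTK+N List; the Ranker selects the Elected set from the Candidate set; the Maintainer refreshes the Local Replica from the SPARQL endpoint.]
Maintenance policies:
✓ MTKN-T
✓ MTKN-F
✓ MTKN-A
AcquaTop Algorithm, built on the Top-k+N Algorithm:
▪ Expiration
▪ New Arrival
▪ Remote Changes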
New Maintenance Policies
▪ MTKN-T: selects the objects to update from the top of the MTKN list.
▪ MTKN-F: selects the objects to update from the border between the K and N areas of the MTKN list (half from the top of the N area and half from the bottom of the K area).
[Figure: the same MTKN list shown once for each policy, with the 2 items selected for updating marked.]
Object | Ws | We
E | 2 | 3
G | 2 | 4
C | 2 | 2
F | 3 | 3
A | 4 | 4
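The two selection rules can be sketched as list slices. Several details are assumptions: the slide does not state the value of K for this list (K = 3 is used here for illustration), and the way an odd budget is split between the two areas is a choice made for this sketch.

```python
def mtkn_t(mtkn, budget):
    """MTKN-T: take the objects from the top of the MTKN list."""
    return mtkn[:budget]


def mtkn_f(mtkn, k, budget):
    """MTKN-F: take the objects around the border between the K and N areas.

    Half of the budget comes from the top of the N area; the rest
    (the larger half when the budget is odd) from the bottom of the K area.
    """
    n_take = budget // 2
    k_take = budget - n_take
    lo = max(0, k - k_take)
    hi = min(len(mtkn), k + n_take)
    return mtkn[lo:hi]


mtkn = ["E", "G", "C", "F", "A"]  # ordered as on the slide
print(mtkn_t(mtkn, budget=2))
print(mtkn_f(mtkn, k=3, budget=2))
```

MTKN-F focuses the refresh budget on the objects most likely to cross the border, that is, the ones whose update could change the top-k answer.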
EXPERIMENTAL EVALUATION
Experimental setting
▪ Datasets:
▪ Streaming data from Twitter: number of mentions of each user
▪ Real data from the Twitter REST API: follower count of users
▪ Realistic and synthetic distributed data
▪ Queries:
▪ Query with FILTER clause
▪ Top-k query
▪ Scoring function: normalized weighted sum of the number
of mentions in each window and the number of changes in follower count
▪ Generate the Oracle for each query
Experimental setting
▪ Baselines:
▪ WST: never updates the local replica
▪ RND: randomly selects items to update
▪ MTKN-A: updates all the elements in the MTKN list
▪ Metrics:
▪ CJD: shows the correctness of the results by comparing 2 different sets
▪ nDCG@K: shows how relevant the results are compared to the
Oracle's
▪ ACC@K: shows the accuracy of the results
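Two of the metrics can be sketched as follows. nDCG@K uses the standard discounted cumulative gain with a log2 discount; the slides do not say how relevance grades are derived from the Oracle, so they are passed in explicitly here. The ACC@K definition (share of the returned top-K that also appears in the Oracle's top-K) is an assumption, and all names and values are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a list of relevance grades."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg_at_k(result, relevance, k):
    """nDCG@K of a ranked result, given a relevance grade per object."""
    gains = [relevance.get(item, 0.0) for item in result[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


def acc_at_k(result, oracle, k):
    """Assumed ACC@K: fraction of the top-K that matches the Oracle's top-K."""
    return len(set(result[:k]) & set(oracle[:k])) / k


relevance = {"E": 3, "C": 2, "F": 1}  # made-up grades taken from an Oracle
print(ndcg_at_k(["E", "C"], relevance, k=2))
print(ndcg_at_k(["C", "E"], relevance, k=2))
print(acc_at_k(["E", "F"], ["E", "C"], k=2))
```

nDCG@K penalizes a wrong order as well as wrong members, while this ACC@K only checks membership in the top-K.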
Evaluation Results – Queries with Filter
Columns: measuring | varying | Filter Update | LRU.F | WBM.F | LRU.F+ | WBM.F+ | WBM.F*
accuracy selectivity >60% ✓ ✓ <60%
accuracy budget ✓ >=2 =1 ✓ >5
sensitivity to α selectivity,α <50% ✓ ✓
sensitivity to α budget,α ✓ ✓
accuracy - ✓
accuracy S <60% ✓
sensitivity to α - ✓ ✓
Evaluation Results – Top-k Queries
Columns: measuring | varying | MTKN-T | MTKN-F
relevancy budget >3
accuracy budget ✓
relevancy CH ✓
accuracy CH =80 <=40
relevancy K ✓
accuracy K <7
relevancy N ✓
accuracy N ✓
relevancy - ✓
accuracy - ✓
CONCLUSION
Limitations and Future work
Limitations | Future work
Two classes of queries | Broaden the class of queries: N:M join relationship, multi-join operators, preference queries, …
Static refresh budget | Flexible budget allocation
Full replica | Cache and replacement strategies
Single stream of data and one query for evaluation | Distributed streams and multiple queries
Correct and complete data | Inaccurate or incomplete data
Conclusion
▪ In this work, we address the problem of relevant query
answering over streaming and distributed data.
▪ We propose maintenance policies for queries with a FILTER
clause.
▪ We propose a framework for top-k query answering, with
maintenance policies that generate more relevant and
accurate results.
▪ We obtain more relevant and accurate results than
the state-of-the-art approaches.
Thank you!

Any questions?
On Relevant Query Answering over
Streaming and Distributed Data
Shima Zahmatkesh
shima.zahmatkesh@polimi.it
DEIB - Politecnico di Milano

More Related Content

Similar to On Relevant Query Answering over Streaming and Distributed Data

Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with Debezium
ChengKuan Gan
 
Urban flood prediction digital ocean august edition
Urban flood prediction   digital ocean august editionUrban flood prediction   digital ocean august edition
Urban flood prediction digital ocean august edition
transight
 
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
Amazon Web Services
 
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout SolutionLarge GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Safe Software
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
InfluxData
 
Backscatter Working Group Software Inter-comparison Project Requesting and Co...
Backscatter Working Group Software Inter-comparison ProjectRequesting and Co...Backscatter Working Group Software Inter-comparison ProjectRequesting and Co...
Backscatter Working Group Software Inter-comparison Project Requesting and Co...
Giuseppe Masetti
 
On Unified Stream Reasoning - The RDF Stream Processing realm
On Unified Stream Reasoning - The RDF Stream Processing realmOn Unified Stream Reasoning - The RDF Stream Processing realm
On Unified Stream Reasoning - The RDF Stream Processing realm
Daniele Dell'Aglio
 
Data Time Travel by Delta Time Machine
Data Time Travel by Delta Time MachineData Time Travel by Delta Time Machine
Data Time Travel by Delta Time Machine
Databricks
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
Patrick McFadin
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward
 
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
confluent
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
confluent
 
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
 Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ... Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
Soheila Dehghanzadeh
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
Databricks
 
Foundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryFoundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theory
DataWorks Summit
 
CQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspectiveCQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspective
Maria Gomez
 
near real time search in e-commerce
near real time search in e-commerce  near real time search in e-commerce
near real time search in e-commerce
Umesh Prasad
 
BigDansing presentation slides for KAUST
BigDansing presentation slides for KAUSTBigDansing presentation slides for KAUST
BigDansing presentation slides for KAUST
Zuhair khayyat
 

Similar to On Relevant Query Answering over Streaming and Distributed Data (20)

Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with Debezium
 
Urban flood prediction digital ocean august edition
Urban flood prediction   digital ocean august editionUrban flood prediction   digital ocean august edition
Urban flood prediction digital ocean august edition
 
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDBAWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
 
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout SolutionLarge GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
Large GIS Data Reprojection With FME Workbench - UTM Zone Fanout Solution
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
 
Backscatter Working Group Software Inter-comparison Project Requesting and Co...
Backscatter Working Group Software Inter-comparison ProjectRequesting and Co...Backscatter Working Group Software Inter-comparison ProjectRequesting and Co...
Backscatter Working Group Software Inter-comparison Project Requesting and Co...
 
On Unified Stream Reasoning - The RDF Stream Processing realm
On Unified Stream Reasoning - The RDF Stream Processing realmOn Unified Stream Reasoning - The RDF Stream Processing realm
On Unified Stream Reasoning - The RDF Stream Processing realm
 
Data Time Travel by Delta Time Machine
Data Time Travel by Delta Time MachineData Time Travel by Delta Time Machine
Data Time Travel by Delta Time Machine
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
 Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ... Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
 
Foundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryFoundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theory
 
CQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspectiveCQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspective
 
near real time search in e-commerce
near real time search in e-commerce  near real time search in e-commerce
near real time search in e-commerce
 
BigDansing presentation slides for KAUST
BigDansing presentation slides for KAUSTBigDansing presentation slides for KAUST
BigDansing presentation slides for KAUST
 

Recently uploaded

Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
pvpriya2
 
Blood finder application project report (1).pdf
Blood finder application project report (1).pdfBlood finder application project report (1).pdf
Blood finder application project report (1).pdf
Kamal Acharya
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Transcat
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdfSri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Balvir Singh
 
SMT process how to making and defects finding
SMT process how to making and defects findingSMT process how to making and defects finding
SMT process how to making and defects finding
rameshqapcba
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
sydezfe
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
uqyfuc
 
Call Girls Chennai +91-8824825030 Vip Call Girls Chennai
Call Girls Chennai +91-8824825030 Vip Call Girls ChennaiCall Girls Chennai +91-8824825030 Vip Call Girls Chennai
Call Girls Chennai +91-8824825030 Vip Call Girls Chennai
paraasingh12 #V08
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
Indrajeet sahu
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
Kamal Acharya
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
OKORIE1
 
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call GirlCall Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
sapna sharmap11
 
Butterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdfButterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdf
Lubi Valves
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
vmspraneeth
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
Kamal Acharya
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
comptia-security-sy0-701-exam-objectives-(5-0).pdf
comptia-security-sy0-701-exam-objectives-(5-0).pdfcomptia-security-sy0-701-exam-objectives-(5-0).pdf
comptia-security-sy0-701-exam-objectives-(5-0).pdf
foxlyon
 

Recently uploaded (20)

Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
 
Blood finder application project report (1).pdf
Blood finder application project report (1).pdfBlood finder application project report (1).pdf
Blood finder application project report (1).pdf
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
 
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdfSri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
 
SMT process how to making and defects finding
SMT process how to making and defects findingSMT process how to making and defects finding
SMT process how to making and defects finding
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
Call Girls Chennai +91-8824825030 Vip Call Girls Chennai
Call Girls Chennai +91-8824825030 Vip Call Girls ChennaiCall Girls Chennai +91-8824825030 Vip Call Girls Chennai
Call Girls Chennai +91-8824825030 Vip Call Girls Chennai
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
 
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call GirlCall Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
 
Butterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdfButterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdf
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
comptia-security-sy0-701-exam-objectives-(5-0).pdf
comptia-security-sy0-701-exam-objectives-(5-0).pdfcomptia-security-sy0-701-exam-objectives-(5-0).pdf
comptia-security-sy0-701-exam-objectives-(5-0).pdf
 

On Relevant Query Answering over Streaming and Distributed Data

  • 1. ON RELEVANT QUERYANSWERING OVER STREAMING AND DISTRIBUTED DATA Shima Zahmatkesh Politecnico di Milano – DEIB Data Science Group – Stream Reasoning Team Supervisor: Prof. Emanuele Della Valle
  • 3. Motivation ▪ An application that shows the places around drivers where there is an high probability of finding free parking. Query: return the best streets (around the car that calls the service) where there are many free parking lots and few cars looking for parking in the last 10 minutes. !3 ▪ Web applications require to combine data streams with distributed data over the Web to continuously find the best answer to user’s queries
  • 4. Solving the example with web stream processing Web Relevant Answers Join Windows Car request streams !4 Stream Processing Engine Request Response Best streets to look for parking Free parking lots
  • 5. Problem Statement Web Relevant Answers Join SPARQL endpoint !5 RDF Stream Processing (RSP) Engine Local Replica Minimize computational resources • High Latency • Rate Limits Being Reactive • Stale Data • Refresh Budget • Maintenance Policy Windows RDF Streams
  • 6. Research Question Is it possible to optimize query evaluation in order to continuously obtain the most relevant combinations of streaming and evolving distributed data, while guaranteeing the reactiveness of the engine? !6
  • 7. Related work !7 Stream Processing Top-k Query Evaluation Federated Query Processing ACQUA Using resource replication MinTopK Optimal continuous top-k query evaluation •Top-k linked data query by Wagner •Top-k join queries by Ilyas Our work Previously unexplored
  • 9. Scope of the state of the art Is it possible to optimize query evaluation in order to continuously obtain the most relevant combinations of streaming and evolving distributed data, while guaranteeing the reactiveness of the engine? !9 Features ACQUA MinTopk Type of data Streaming and distributed streaming Relevancy ✗ ✓ Reactiveness Refresh budget Incremental evaluation Handling evolving data Local replica Maintenance policies ✗
  • 10. Scope of the research ▪ Queries that contains FILTER clause and have to filter the data come in the distributed dataset. ▪ Top-k queries where the scoring function involves data that appears both in the streaming and the distributed datasets. !10
  • 11. QUERIES WITH A FILTER CLAUSE !11
  • 12. Query ▪ Every minute give me the best influencers, i.e. users who are mentioned on Social Network in the last 10 minutes whose number of followers is greater than 100,000. !12 REGISTER STREAM <:Influencers> AS CONSTRUCT {?user a :influentialUser} WHERE { WINDOW :W(10m,1m) ON :S {?user :hasMentions ?mentionsNumber} SERVICE :BKG {?user :hasFollowers ?followersCount} FILTER (?followersCount > 100000) } Filtering Threshold ACQUA
  • 13. State of the art - ACQUA !13 WINDOW clause JOIN Local Replica Candidate set Elected set RND LRU WBM SERVICE clause Maintainer 3 Proposer 1 Ranker 2
  • 14. Proposed Solution – ACQUA.F !14 WINDOW clause JOIN Proposer Ranker MaintainerLocal Replica SERVICE clause with FILTER clause ✓ Filter Update Policy ✓ RND.F ✓ LRU.F ✓ WBM.F
  • 15. Filter Update Policy (intuition) !15 time NumberofFollowers t User A User B User D ▪ Computes how close is the value associate to the variable of each data item to the Filtering Threshold. User C Filtering Threshold
  • 16. Experimental Result !16 WorstBest Performance Experiment Dimension For low selectivity WBM is better than Filter Update Policy For high selectivity Filter Update Policy is better than WBM
  • 17. Combined Policies – ACQUA.F !17 time NumberofFollowers t Band User A User B User C User D ▪ Combine Filter Update Policy with ACQUA ones ▪ RND.F, LRU.F, and WBM.F
  • 19. State of the art - Rank Aggregation ▪ Fairly take into account the opinions of different algorithms. ▪ Combine the ranking lists by computing aggregated score !19 User Score Alice 0.8 Bob 0.7 David 0.4 User Score Bob 0.9 David 0.8 Alice 0.7 α = 0.5 User Scoreagg Bob 0.8 Alice 0.75 David 0.6 WBM Filter Update WBM.F+
  • 20. Proposed Solution – ACQUA.F+ [Diagram: WINDOW clause, JOIN, Proposer, Ranker, Maintainer, Local Replica, and a SERVICE clause with a FILTER clause.] New policies: ✓ LRU.F+ ✓ WBM.F+ ✓ WBM.F*
  • 21. Experimental Results ▪ Comparable to WBM.F ▪ Possible in practice
  • 23. State of the art – MinTopK [Figure: score over time for objects A to H; W1 is the current window. Window length = 9, top-2 results.] MTK lists: W1: E, C
  • 24. State of the art – MinTopK [Figure: same plot after the window slides to W2. Slide = 3, top-2 results.] MTK lists: W1: E, C; W2: E, C
  • 25. State of the art – MinTopK [Figure: same plot with windows W1, W2, and W3. Top-2 results.] MTK lists: W1: E, C; W2: E, C; W3: E, F
  • 26. State of the art – MinTopK [Figure: same plot with windows W1, W2, and W3. Top-2 results.] MTK lists: W1: E, C; W2: E, C; W3: E, F
      Super-MTK List (Object, Ws, We): E 1 3; C 1 2; F 3 3
  • 27. Top-k Query ▪ Every 3 minutes, return the top-2 popular users who are most mentioned on social networks in the last 9 minutes.
      REGISTER STREAM :TopkUsersToContact AS
      SELECT ?user F(?mentionCount,?followerCount) AS ?score
      FROM NAMED WINDOW :W ON :S [RANGE 9m STEP 3m]
      WHERE {
        WINDOW :W { ?user :hasMentions ?mentionCount }
        SERVICE :BKG { ?user :hasFollowers ?followerCount }
      }
      ORDER BY DESC (?score)
      LIMIT 2
  • 28. [Figure: two plots over time for objects A to G, one for ScoreS and one for ScoreR.]
  • 29. [Figure: plots over time for objects A to G showing ScoreS, ScoreR, and the resulting Final Score.] Final Score = F(ScoreS, ScoreR)
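The slides leave F abstract here; the experimental setting later describes it as a normalized weighted summation. A minimal sketch under that description follows, where min-max normalization, the weight `w_s`, and the sample counts are assumptions made for illustration only.

```python
import heapq
from typing import Dict, List, Tuple


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values to [0, 1]; returns 0.0 for every key if all values are equal."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def top_k(score_s: Dict[str, float],
          score_r: Dict[str, float],
          k: int,
          w_s: float = 0.5) -> List[Tuple[str, float]]:
    """Final score = w_s * norm(ScoreS) + (1 - w_s) * norm(ScoreR), top k returned."""
    s, r = min_max(score_s), min_max(score_r)
    final = {u: w_s * s[u] + (1 - w_s) * r[u] for u in score_s if u in score_r}
    return heapq.nlargest(k, final.items(), key=lambda kv: kv[1])


# Hypothetical mention counts (streaming side) and follower counts (replica side).
mentions = {"A": 10, "B": 4, "C": 7}
follower_counts = {"A": 1000, "B": 9000, "C": 5000}
result = top_k(mentions, follower_counts, k=2, w_s=0.7)
print([user for user, _ in result])
```

Because the replica side of the score can go stale, the final ranking can be wrong even when the streaming side is exact, which is the problem the following slides address.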
  • 30. Contributions (1 of 2) ▪ Data structure: Super-MTK+N List ▪ Handles changes in the distributed dataset: N changes per window ▪ N additional slots ▪ MTK+N list: keeps K+N elements ▪ Complexity: O(K+N)
      Super-MTK+N List (Object, Ws, We): E 1 2; G 1 3; C 1 1; F 2 2; A 3 3
      MTK+N lists: W1: E, G, C; W2: E, G, F; W3: G, A
      [Figure labels: K area, N area, LBP]
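The role of the N additional slots can be illustrated with a deliberately simplified, single-window buffer. This is not the Super-MTK+N list itself: the class `TopKPlusN`, its methods, and the scores are hypothetical, and the sorting used here costs O((K+N) log(K+N)) rather than the O(K+N) stated on the slide.

```python
from typing import Dict, List


class TopKPlusN:
    """Keeps the K+N best-scoring objects so that later score changes
    can promote an object from the N area into the top-K."""

    def __init__(self, k: int, n: int) -> None:
        self.k, self.n = k, n
        self.scores: Dict[str, float] = {}

    def _ordered(self) -> List[str]:
        # Highest score first; object name breaks ties deterministically.
        return sorted(self.scores, key=lambda o: (-self.scores[o], o))

    def arrive(self, obj: str, score: float) -> None:
        """Insert a new object and drop anything beyond K+N slots."""
        self.scores[obj] = score
        keep = self._ordered()[: self.k + self.n]
        self.scores = {o: self.scores[o] for o in keep}

    def change(self, obj: str, new_score: float) -> None:
        """Apply a score change caused by the distributed dataset."""
        if obj in self.scores:
            self.scores[obj] = new_score

    def top_k(self) -> List[str]:
        return self._ordered()[: self.k]


buf = TopKPlusN(k=2, n=1)
for name, score in [("E", 7.0), ("C", 6.0), ("F", 4.0), ("D", 2.0)]:
    buf.arrive(name, score)
before = buf.top_k()
buf.change("F", 6.5)  # F was kept only because of the extra N slot
after = buf.top_k()
print(before, after)
```

With N = 0, F would already have been discarded when its score rose, and the top-2 could not have been corrected without recomputing from the window content.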
  • 31. Contributions (2 of 2) ▪ Algorithm: ▪ Top-k+N: window expiration; new arrival of distinct data items; handling changes in distributed data ▪ AcquaTop: handles updating the local replica ▪ Complexity: O(K+N) ▪ Framework: AcquaTop Framework ▪ Applies maintenance policies
  • 32. Top-K+N – New object arrival [Figure: score over time for objects A to G; W2 is the current window. K = 2, N = 1.]
      List before the arrival (Object, Ws, We): E 2 3; C 2 2; F 2 3; A 3 4
  • 33. Top-K+N – New object arrival [Figure: same plot after the new object arrives. K = 2, N = 1.]
      List before the arrival (Object, Ws, We): E 2 3; C 2 2; F 2 3; A 3 4
      List after Top-K+N (Object, Ws, We): G 2 4; E 2 3; C 2 2; F 3 3; A 4 4
  • 34. Top-K+N - Handling Changes [Figure: score over time for objects A to G, window W2.]
      List before the change (Object, Ws, We): G 2 4; E 2 3; C 2 2; F 3 3; A 4 4
  • 35. Top-K+N - Handling Changes [Figure: same plot, with the score of F changed.]
      List before the change (Object, Ws, We): G 2 4; E 2 3; C 2 2; F 3 3; A 4 4
      List after Top-K+N (Object, Ws, We): G 2 4; F 2 3; E 2 3; C 2 2; A 4 4
  • 36. AcquaTop Framework [Diagram: RDF Stream, SPARQL endpoint, Ranker, Maintainer, Candidate set, Elected set, Local Replica, Super-MTK+N List.] ▪ Maintenance policies: ✓ MTKN-T ✓ MTKN-F ✓ MTKN-A ▪ AcquaTop Algorithm ▪ Top-k+N Algorithm: Expiration, New Arrival, Remote Changes
  • 37. New Maintenance Policies ▪ MTKN-T: selects objects for updating from the top of the MTKN list. ▪ MTKN-F: selects objects for updating from the border between the K and N areas of the MTKN list (half from the top of the N area and half from the bottom of the K area).
      Example MTKN list shown for both policies (Object, Ws, We): E 2 3; G 2 4; C 2 2; F 3 3; A 4 4, with 2 items selected for updating in each case.
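The two selection rules can be sketched as list slices. The function names `mtkn_t` and `mtkn_f`, the `budget` parameter, and the choice of K = 2 for the example are assumptions made for illustration; the slide does not state where the K/N border lies in its example list.

```python
from typing import List


def mtkn_t(mtkn_list: List[str], budget: int) -> List[str]:
    """MTKN-T: take the first `budget` objects from the top of the list."""
    return mtkn_list[:budget]


def mtkn_f(mtkn_list: List[str], k: int, budget: int) -> List[str]:
    """MTKN-F: take objects around the K/N border.

    Half of the budget comes from the bottom of the K area and the rest
    from the top of the N area (the N area gets the extra item when the
    budget is odd, which is an arbitrary choice made here).
    """
    from_k = budget // 2
    from_n = budget - from_k
    lo = max(0, k - from_k)
    hi = min(len(mtkn_list), k + from_n)
    return mtkn_list[lo:hi]


# Object order copied from the slide; K = 2 is assumed.
mtkn = ["E", "G", "C", "F", "A"]
top_pick = mtkn_t(mtkn, budget=2)
border_pick = mtkn_f(mtkn, k=2, budget=2)
print(top_pick, border_pick)
```

The border-based rule targets the objects whose rank is most likely to cross in or out of the top-K after a remote change, in the same spirit as the Filter Update Policy focusing on values near the filtering threshold.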
  • 39. Experimental setting ▪ Datasets: ▪ Streaming data from Twitter: number of mentions per user ▪ Real data from the Twitter REST API: follower count of users ▪ Realistic and synthetic distributed data ▪ Queries: ▪ Query with a FILTER clause ▪ Top-k query ▪ Scoring function: normalized weighted summation of the number of mentions in each window and the number of changes in follower count ▪ An Oracle is generated for each query
  • 40. Experimental setting ▪ Baselines: ▪ WST: no changes are updated ▪ RND: randomly selects items for update ▪ MTKN-A: updates all the elements in the MTKN list ▪ Metrics: ▪ CJD: shows the correctness of the results for 2 different sets ▪ nDCG@K: shows how relevant the results are compared to the Oracle's ▪ ACC@K: shows the accuracy of the results
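For readers unfamiliar with the ranking metrics, the sketch below shows the standard nDCG@K definition (logarithmic discount, base 2) and one possible reading of ACC@K as top-K set overlap with the Oracle. The exact formulas used in the experiments are not given on the slides, so both functions, their names, and the relevance values are assumptions.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(result: List[str], oracle_relevance: Dict[str, float], k: int) -> float:
    """DCG of the first k results divided by the DCG of the ideal ordering."""
    gains = [oracle_relevance.get(obj, 0.0) for obj in result[:k]]
    ideal = sorted(oracle_relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0


def acc_at_k(result: List[str], oracle: List[str], k: int) -> float:
    """Assumed definition: fraction of the Oracle's top-k found in the result's top-k."""
    return len(set(result[:k]) & set(oracle[:k])) / k


# Hypothetical Oracle relevance grades.
relevance = {"E": 3.0, "C": 2.0, "F": 1.0}
perfect = ndcg_at_k(["E", "C"], relevance, k=2)
swapped = ndcg_at_k(["C", "E"], relevance, k=2)
overlap = acc_at_k(["E", "F"], ["E", "C"], k=2)
print(perfect, swapped, overlap)
```

nDCG@K penalizes a correct set returned in the wrong order, whereas the set-overlap accuracy does not, which is why the two can disagree in the result tables.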
  • 41. Evaluation Results – Queries with Filter
      Columns: measuring | varying | Filter Update | LRU.F | WBM.F | LRU.F+ | WBM.F+ | WBM.F*
      accuracy | selectivity | >60% ✓ ✓ <60%
      accuracy | budget | ✓ >=2 =1 ✓ >5
      sensitivity to α | selectivity, α | <50% ✓ ✓
      sensitivity to α | budget, α | ✓ ✓
      accuracy | - | ✓
      accuracy | S <60% | ✓
      sensitivity to α | - | ✓ ✓
  • 42. Evaluation Results – Top-k Queries
      Columns: measuring | varying | MTKN-T | MTKN-F
      relevancy | budget | >3
      accuracy | budget | ✓
      relevancy | CH | ✓
      accuracy | CH | =80 <=40
      relevancy | K | ✓
      accuracy | K | <7
      relevancy | N | ✓
      accuracy | N | ✓
      relevancy | - | ✓
      accuracy | - | ✓
  • 44. Limitations and Future work
      Limitations | Future work
      Two classes of queries | Broaden the class of queries: N:M join relationship, multi-join operators, preference queries, …
      Static refresh budget | Flexible budget allocation
      Full replica | Cache and replacement strategies
      Single stream of data and one query for evaluation | Distributed streams and multiple queries
      Correct and complete data | Inaccurate or incomplete data
  • 45. Conclusion ▪ In this work, we address the problem of relevant query answering over streaming and distributed data. ▪ We proposed maintenance policies for queries with a FILTER clause. ▪ We proposed a framework for top-k query answering, with maintenance policies that generate more relevant and accurate results. ▪ We obtain more relevant and accurate results compared to the state-of-the-art approaches.
  • 46. Thank you!
 Any questions? On Relevant Query Answering over Streaming and Distributed Data Shima Zahmatkesh shima.zahmatkesh@polimi.it DEIB - Politecnico di Milano