ON RELEVANT QUERY ANSWERING OVER
STREAMING AND DISTRIBUTED DATA
Shima Zahmatkesh
Politecnico di Milano – DEIB
Data Science Group – Stream Reasoning Team
Supervisor: Prof. Emanuele Della Valle
INTRODUCTION
Motivation
▪ An application that shows the places
around drivers where there is a high
probability of finding free parking.
Query: return the best streets (around the car
that calls the service) where there are many free
parking lots and few cars looking for parking in
the last 10 minutes.
▪ Web applications need to combine data streams
with distributed data on the Web to continuously find
the best answers to users' queries
Solving the example with web stream processing
[Diagram labels: Web; Stream Processing Engine with Windows and Join; car request streams; free parking lots; Request and Response (best streets to look for parking); Relevant Answers.]
Problem Statement
[Diagram: an RDF Stream Processing (RSP) Engine applies Windows to RDF Streams and Joins them, through a Local Replica, with the data behind a SPARQL endpoint on the Web to return Relevant Answers.]
Annotations:
▪ Minimize computational resources
▪ High Latency
▪ Rate Limits
▪ Being Reactive
▪ Stale Data
▪ Refresh Budget
▪ Maintenance Policy
Research Question
Is it possible to optimize query evaluation in order to
continuously obtain the most relevant combinations of
streaming and evolving distributed data, while
guaranteeing the reactiveness of the engine?
Related work
[Diagram: three overlapping research areas: Stream Processing, Top-k Query Evaluation, and Federated Query Processing.]
▪ ACQUA: using resource replication
▪ MinTopK: optimal continuous top-k query evaluation
▪ Top-k linked data query by Wagner
▪ Top-k join queries by Ilyas
▪ Our work: previously unexplored
APPROACH
Scope of the state of the art
Is it possible to optimize query evaluation in order to
continuously obtain the most relevant combinations of
streaming and evolving distributed data, while
guaranteeing the reactiveness of the engine?
Features                ACQUA                                  MinTopK
Type of data            Streaming and distributed              Streaming
Relevancy               ✗                                      ✓
Reactiveness            Refresh budget                         Incremental evaluation
Handling evolving data  Local replica, maintenance policies    ✗
Scope of the research
▪ Queries that contain a FILTER clause that filters
the data coming from the distributed dataset.
▪ Top-k queries where the scoring function involves data
that appears in both the streaming and the distributed
datasets.
QUERIES WITH A FILTER
CLAUSE
Query
▪ Every minute, return the best influencers, i.e. users who
were mentioned on the social network in the last 10 minutes
and who have more than 100,000 followers.
REGISTER STREAM <:Influencers> AS
CONSTRUCT { ?user a :influentialUser }
WHERE {
  WINDOW :W(10m,1m) ON :S
    { ?user :hasMentions ?mentionsNumber }
  SERVICE :BKG
    { ?user :hasFollowers ?followersCount }
  FILTER (?followersCount > 100000)
}
Filtering Threshold
ACQUA
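A minimal Python sketch of what one evaluation of this query computes, assuming the window content and the local replica are both available as plain dictionaries. The function name `influential_users` and the sample users are hypothetical and only illustrate the join followed by the FILTER; they are not part of ACQUA.

```python
from typing import Dict, List

def influential_users(window_mentions: Dict[str, int],
                      replica_followers: Dict[str, int],
                      threshold: int = 100_000) -> List[str]:
    """Join the users in the current window with the local replica,
    then keep those whose follower count exceeds the filtering threshold."""
    return sorted(
        user for user in window_mentions
        # Assumption: a user missing from the replica cannot pass the filter.
        if replica_followers.get(user, 0) > threshold
    )

# Hypothetical window content (mentions) and replica content (followers).
window = {"alice": 3, "bob": 1, "carol": 5}
replica = {"alice": 250_000, "bob": 90_000, "carol": 100_001}
print(influential_users(window, replica))  # ['alice', 'carol']
```

If the replica holds a stale follower count for a user near the threshold, the answer can be wrong, which is the situation the maintenance policies address.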
State of the art - ACQUA
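[Diagram: the WINDOW clause is joined (JOIN) with a Local Replica of the data behind the SERVICE clause. Components, in order:]
1. Proposer (Candidate set)
2. Ranker (Elected set; policies: RND, LRU, WBM)
3. Maintainer (Local Replica)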
Proposed Solution – ACQUA.F
[Diagram: WINDOW clause, JOIN, Proposer, Ranker, Maintainer, Local Replica, SERVICE clause with FILTER clause.]
✓ Filter Update Policy
✓ RND.F
✓ LRU.F
✓ WBM.F
Filter Update Policy (intuition)
▪ Computes how close the value bound to the filtered
variable of each data item is to the Filtering Threshold.
[Figure: number of followers of Users A, B, C and D over time, compared with the Filtering Threshold at time t.]
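The intuition can be sketched in a few lines of Python: rank the candidates by the distance between their replicated value and the threshold, and refresh the closest ones first. This is a sketch under assumptions; the function name `filter_update_policy`, the `budget` parameter and the sample values are hypothetical.

```python
from typing import Dict, List

def filter_update_policy(candidates: Dict[str, int],
                         threshold: int,
                         budget: int) -> List[str]:
    """Pick the `budget` candidates whose value is closest to the threshold.

    Items near the threshold are the ones most likely to cross it after a
    change, so refreshing them first is most likely to change the answer.
    """
    ranked = sorted(candidates,
                    key=lambda user: (abs(candidates[user] - threshold), user))
    return ranked[:budget]

# Hypothetical follower counts held in the local replica.
followers = {"A": 120_000, "B": 101_000, "C": 99_500, "D": 40_000}
print(filter_update_policy(followers, threshold=100_000, budget=2))  # ['C', 'B']
```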
Experimental Result
[Figure: performance (from worst to best) across the experiment dimensions.]
▪ For low selectivity, WBM is better than the Filter Update Policy
▪ For high selectivity, the Filter Update Policy is better than WBM
Combined Policies – ACQUA.F
[Figure: number of followers of Users A, B, C and D over time, with a Band marked at time t.]
▪ Combine the Filter Update Policy with the ACQUA ones
▪ RND.F, LRU.F, and WBM.F
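One way to read the combination is as a two-stage selection: first keep only the candidates inside a band around the filtering threshold, then order the survivors with an existing ACQUA policy. The sketch below does this for an LRU-style ordering. Treating the band as a symmetric interval around the threshold is an assumption, as are the names `lru_f`, `band` and `last_refresh`.

```python
from typing import Dict, List

def lru_f(candidates: Dict[str, int],
          last_refresh: Dict[str, int],
          threshold: int,
          band: int,
          budget: int) -> List[str]:
    """Band filter followed by least-recently-refreshed ordering."""
    # Stage 1 (filter update idea): keep items whose value lies in the band.
    in_band = [u for u, value in candidates.items()
               if abs(value - threshold) <= band]
    # Stage 2 (LRU idea): the items refreshed longest ago come first.
    in_band.sort(key=lambda u: (last_refresh[u], u))
    return in_band[:budget]

# Hypothetical replica values and refresh timestamps.
followers = {"A": 120_000, "B": 101_000, "C": 99_500, "D": 40_000}
refreshed_at = {"A": 1, "B": 7, "C": 3, "D": 2}
print(lru_f(followers, refreshed_at, threshold=100_000, band=5_000, budget=1))  # ['C']
```

Swapping the sort key in the second stage for a random or a WBM score would give the analogous RND.F or WBM.F variant under the same assumption.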
Experimental Result
[Figure: performance (from worst to best) across the experiment dimensions. Annotation: impossible in practice.]
State of the art - Rank Aggregation
▪ Fairly take into account the opinions of different
algorithms.
▪ Combine the ranking lists by computing an aggregated score

α = 0.5

WBM               Filter Update      WBM.F+
User   Score      User   Score       User   Score_agg
Alice  0.8        Bob    0.9         Bob    0.8
Bob    0.7        David  0.8         Alice  0.75
David  0.4        Alice  0.7         David  0.6
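The numbers in this example are consistent with a weighted sum of the two scores, score_agg = α · score_1 + (1 − α) · score_2. The Python sketch below reproduces them under that assumption; the function name `aggregate` is hypothetical, and with α = 0.5 it does not matter which list receives the weight α.

```python
from typing import Dict, List, Tuple

def aggregate(scores_a: Dict[str, float],
              scores_b: Dict[str, float],
              alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Weighted-sum rank aggregation over the users present in both lists."""
    users = scores_a.keys() & scores_b.keys()
    combined = {
        u: round(alpha * scores_a[u] + (1 - alpha) * scores_b[u], 2)
        for u in users
    }
    # Highest aggregated score first; ties broken by name.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))

wbm = {"Alice": 0.8, "Bob": 0.7, "David": 0.4}
filter_update = {"Bob": 0.9, "David": 0.8, "Alice": 0.7}
print(aggregate(wbm, filter_update, alpha=0.5))
# [('Bob', 0.8), ('Alice', 0.75), ('David', 0.6)]
```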
Proposed Solution – ACQUA.F+
[Diagram: WINDOW clause, JOIN, Proposer, Ranker, Maintainer, Local Replica, SERVICE clause with FILTER clause.]
✓ LRU.F+
✓ WBM.F+
✓ WBM.F*
Experimental Results
Comparable to WBM.F
Possible in practice
TOP-K QUERIES
State of the art – MinTopK
[Figure: objects A to H plotted by Score (axis 0 to 8) over Time (axis 0 to 13), with the windows W1 (current window), W2 and W3. Window Length = 9, Slide = 3.]
MTK lists (top-2 results per window):
W1: E, C
W2: E, C
W3: E, F
Super-MTK list:
Object  Ws  We
E       1   3
C       1   2
F       3   3
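To make the Ws and We marks concrete, the Python sketch below recomputes the top-2 of each window from scratch and then records, for every object, the first and last window in which it appears. This naive recomputation is only an illustration and is not the incremental MinTopK algorithm. The arrival times and scores are hypothetical values chosen so that the output matches the lists above, and the window boundaries (W1 starting at time 0) are an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical (object, arrival time, score) triples; not read off the slides.
OBJECTS = [("A", 1, 3), ("B", 2, 4), ("C", 4, 7), ("D", 5, 2), ("F", 7, 5), ("E", 8, 8)]

def top_k_per_window(objects, length: int, slide: int,
                     n_windows: int, k: int) -> Dict[int, List[str]]:
    """Top-k objects of each window, recomputed from scratch per window."""
    tops = {}
    for w in range(n_windows):
        start = w * slide  # assumption: window w+1 covers [start, start + length)
        alive = [(score, name) for name, t, score in objects
                 if start <= t < start + length]
        alive.sort(key=lambda pair: (-pair[0], pair[1]))
        tops[w + 1] = [name for _, name in alive[:k]]
    return tops

def super_mtk(tops: Dict[int, List[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each object to (Ws, We): first and last window where it is top-k."""
    marks: Dict[str, Tuple[int, int]] = {}
    for w in sorted(tops):
        for name in tops[w]:
            first = marks[name][0] if name in marks else w
            marks[name] = (first, w)
    return marks

tops = top_k_per_window(OBJECTS, length=9, slide=3, n_windows=3, k=2)
print(tops)             # {1: ['E', 'C'], 2: ['E', 'C'], 3: ['E', 'F']}
print(super_mtk(tops))  # {'E': (1, 3), 'C': (1, 2), 'F': (3, 3)}
```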
Top-k Query
▪ Every 3 minutes, return the top-2 popular users who are
most mentioned on social networks in the last 9 minutes
REGISTER STREAM :TopkUsersToContact AS
SELECT ?user (F(?mentionCount, ?followerCount) AS ?score)
FROM NAMED WINDOW :W ON :S [RANGE 9m STEP 3m]
WHERE {
  WINDOW :W { ?user :hasMentions ?mentionCount }
  SERVICE :BKG { ?user :hasFollowers ?followerCount }
}
ORDER BY DESC(?score)
LIMIT 2
[Figure: ScoreS, ScoreR and the Final Score of objects A to G, each plotted over Time (axis 0 to 13).]
Final Score = F(ScoreS, ScoreR)
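As an illustration of one possible F, the Python sketch below uses a normalized weighted sum, which is the form of scoring function named in the experimental setting. Normalizing by a maximum value, the weight `w`, and reading ScoreS as the stream-side score and ScoreR as the replica-side score are assumptions made for this sketch.

```python
def final_score(score_s: float, score_r: float,
                max_s: float, max_r: float, w: float = 0.5) -> float:
    """Normalized weighted sum of the two partial scores.

    score_s: partial score from the streaming data (assumed meaning of ScoreS)
    score_r: partial score from the replicated data (assumed meaning of ScoreR)
    max_s, max_r: normalization constants, assumed to be the largest values seen
    """
    if max_s <= 0 or max_r <= 0:
        raise ValueError("normalization constants must be positive")
    return w * score_s / max_s + (1 - w) * score_r / max_r

print(final_score(5, 2, max_s=10, max_r=4))  # 0.5
```

Because the second term depends on replicated data, a change in the remote dataset can reorder objects even when nothing new arrives on the stream.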
Contributions (1 of 2)
▪ Data structure: Super-MTK+N list
  ▪ Handles changes in the distributed dataset: N changes per window
  ▪ N additional slots
  ▪ MTK+N list: keeps K+N elements
  ▪ Complexity: O(K+N)
MTK+N lists (each divided into a K area and an N area; LBP marked in the figure):
W1: E, G, C
W2: E, G, F
W3: G, A
Super-MTK+N list:
Object  Ws  We
E       1   2
G       1   3
C       1   1
F       2   2
A       3   3
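A minimal Python sketch of the K area and N area idea for a single window: keep the K+N best objects and split them into the K that form the answer and the N spare slots. The function name `split_mtkn` and the scores are hypothetical; the scores are chosen so that the result matches the W1 list above with K = 2 and N = 1.

```python
from typing import Dict, List, Tuple

def split_mtkn(scores: Dict[str, float], k: int, n: int) -> Tuple[List[str], List[str]]:
    """Return (K area, N area): the top-k objects and the next n objects."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k + n]
    names = [name for name, _ in ranked]
    return names[:k], names[k:]

# Hypothetical scores of the objects alive in one window.
window_scores = {"E": 8, "G": 7, "C": 6, "D": 2, "B": 1}
print(split_mtkn(window_scores, k=2, n=1))  # (['E', 'G'], ['C'])
```

The N area gives the engine a small set of near-misses that can be promoted when a change in the distributed data lowers the score of an object in the K area.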
Contributions (2 of 2)
▪ Algorithm:
  ▪ Top-k+N
    ▪ Window expiration
    ▪ New arrival of distinct data items
    ▪ Handle changes in distributed data
  ▪ AcquaTop
    ▪ Handle updating local replica
  ▪ Complexity: O(K+N)
▪ Framework: AcquaTop Framework
  ▪ Apply maintenance policies
Top-K+N – New object arrival
[Figure: objects A to G plotted by Score (axis 0 to 8) over Time (axis 0 to 13); W2 is the current window; K = 2, N = 1.]
List before the arrival:
Object  Ws  We
E       2   3
C       2   2
F       2   3
A       3   4
List after the arrival of G (Top-K+N):
Object  Ws  We
G       2   4
E       2   3
C       2   2
F       3   3
A       4   4
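The effect of a new arrival on the list of a single window can be sketched in Python as a sorted insertion followed by truncation to K+N entries. This is a single-window simplification that does not maintain the Ws and We marks; the function name `insert_arrival` and the scores are hypothetical.

```python
import bisect
from typing import List, Tuple

def insert_arrival(mtkn: List[Tuple[float, str]], name: str, score: float,
                   k: int, n: int) -> List[Tuple[float, str]]:
    """Insert (score, name) into a list sorted by descending score,
    then keep only the first k + n entries."""
    # bisect works on ascending keys, so search over the negated scores.
    keys = [-s for s, _ in mtkn]
    pos = bisect.bisect_right(keys, -score)
    updated = mtkn[:pos] + [(score, name)] + mtkn[pos:]
    return updated[:k + n]

# Hypothetical list for one window with K = 2 and N = 1.
current = [(8, "E"), (7, "C"), (5, "F")]
print(insert_arrival(current, "G", 9, k=2, n=1))  # [(9, 'G'), (8, 'E'), (7, 'C')]
print(insert_arrival(current, "D", 2, k=2, n=1))  # [(8, 'E'), (7, 'C'), (5, 'F')]
```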
Top-K+N - Handling Changes
[Figure: objects A to G plotted by Score (axis 0 to 8) over Time (axis 0 to 13), window W2; the score of F changes.]
List before the change:
Object  Ws  We
G       2   4
E       2   3
C       2   2
F       3   3
A       4   4
List after the change of F (Top-K+N):
Object  Ws  We
G       2   4
F       2   3
E       2   3
C       2   2
A       4   4
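The reordering caused by a changed score can be sketched in Python as an update followed by a stable re-sort. The sketch ignores the Ws and We marks, which also change in the example above. The function name `apply_score_change` and the scores are hypothetical, chosen so that F moves from fourth to second place as in the lists above.

```python
from typing import List, Tuple

def apply_score_change(entries: List[Tuple[float, str]],
                       name: str, new_score: float) -> List[Tuple[float, str]]:
    """Replace the score of `name` and restore descending score order.

    Objects that are not in the list are left out: this sketch only
    repositions an entry that is already being tracked.
    """
    updated = [(new_score if obj == name else score, obj)
               for score, obj in entries]
    updated.sort(key=lambda pair: -pair[0])  # stable: ties keep their order
    return updated

# Hypothetical scores for the five tracked objects.
tracked = [(9, "G"), (8, "E"), (7, "C"), (5, "F"), (4, "A")]
after = apply_score_change(tracked, "F", 8.5)
print([obj for _, obj in after])  # ['G', 'F', 'E', 'C', 'A']
```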
AcquaTop Framework
[Diagram components: RDF Stream, SPARQL endpoint, Local Replica, Candidate set, Elected set, Ranker, Maintainer, Super-MTK+N list.]
Maintenance policies:
✓ MTKN-T
✓ MTKN-F
✓ MTKN-A
AcquaTop Algorithm
▪ Top-k+N Algorithm: Expiration, New Arrival, Remote Changes
New Maintenance Policies
▪ MTKN-T: selects objects for updating from the top of
the MTKN list
▪ MTKN-F: selects objects for updating from the border
of the K and N areas in the MTKN list (half from the top
of the N area, and half from the bottom of the K area)
[Figure: the same list shown once per policy, each with 2 items selected for updating.]
Object  Ws  We
E       2   3
G       2   4
C       2   2
F       3   3
A       4   4
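Both selection rules can be sketched in Python as slices over a list ordered by score. The function names mirror the policy names, but the code is only a sketch: the value K = 3 used in the example and the handling of an odd budget (the extra item goes to the K area) are assumptions.

```python
from typing import List

def mtkn_t(mtkn: List[str], budget: int) -> List[str]:
    """MTKN-T: take the items at the top of the list."""
    return mtkn[:budget]

def mtkn_f(mtkn: List[str], k: int, budget: int) -> List[str]:
    """MTKN-F: take the items around the border between the K and N areas."""
    from_n = budget // 2       # from the top of the N area
    from_k = budget - from_n   # from the bottom of the K area
    return mtkn[max(0, k - from_k):k] + mtkn[k:k + from_n]

ordered = ["E", "G", "C", "F", "A"]  # list order taken from the table above
print(mtkn_t(ordered, budget=2))       # ['E', 'G']
print(mtkn_f(ordered, k=3, budget=2))  # ['C', 'F']
```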
EXPERIMENTAL
EVALUATION
Experimental setting
▪ Datasets:
  ▪ Streaming data from Twitter: number of mentions of each user
  ▪ Real data from the Twitter REST API: follower count of each user
  ▪ Realistic and synthetic distributed data
▪ Queries:
  ▪ Query with FILTER clause
  ▪ Top-k query
    ▪ Scoring function: normalized weighted sum of the number
      of mentions in each window and the number of changes in follower count
▪ Generate the Oracle for each query
Experimental setting
▪ Baselines:
  ▪ WST: does not apply any update
  ▪ RND: randomly selects items to update
  ▪ MTKN-A: updates all the elements in the MTKN list
▪ Metrics:
  ▪ CJD: shows the correctness of the results for 2 different sets
  ▪ nDCG@K: shows how relevant the results are compared to the
    Oracle's results
  ▪ ACC@K: shows the accuracy of the results
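For readers unfamiliar with the ranking metrics, the Python sketch below shows one common way to compute nDCG@K and an accuracy at K against an oracle. Using the oracle scores as relevance values and defining ACC@K as the overlap between the two top-K sets are assumptions of this sketch; the slides do not spell out the exact definitions.

```python
import math
from typing import Dict, List

def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(result: List[str], oracle_scores: Dict[str, float], k: int) -> float:
    """DCG of the returned top-k divided by the DCG of the ideal top-k."""
    gains = [oracle_scores.get(obj, 0.0) for obj in result[:k]]
    ideal = sorted(oracle_scores.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

def acc_at_k(result: List[str], oracle: List[str], k: int) -> float:
    """Fraction of the oracle top-k that also appears in the result top-k."""
    return len(set(result[:k]) & set(oracle[:k])) / k

oracle_scores = {"E": 3.0, "C": 2.0, "F": 1.0}  # hypothetical oracle scores
print(ndcg_at_k(["E", "C"], oracle_scores, k=2))  # 1.0
print(acc_at_k(["E", "F"], ["E", "C"], k=2))      # 0.5
```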
Evaluation Results – Queries with Filter
measuring  varying  Filter Update  LRU.F  WBM.F  LRU.F+  WBM.F+  WBM.F*
accuracy selectivity >60% ✓ ✓ <60%
accuracy budget ✓ >=2 =1 ✓ >5
sensitivity to α selectivity,α <50% ✓ ✓
sensitivity to α budget,α ✓ ✓
accuracy - ✓
accuracy S <60% ✓
sensitivity to α - ✓ ✓
Evaluation Results – Top-k Queries
measuring  varying  MTKN-T  MTKN-F
relevancy budget >3
accuracy budget ✓
relevancy CH ✓
accuracy CH =80 <=40
relevancy K ✓
accuracy K <7
relevancy N ✓
accuracy N ✓
relevancy - ✓
accuracy - ✓
CONCLUSION
Limitations and Future work
Limitations                                          Future work
Two classes of queries                               Broaden the class of queries: N:M join relationship, multi-join operators, preference queries, …
Static refresh budget                                Flexible budget allocation
Full replica                                         Cache and replacement strategies
Single stream of data and one query for evaluation   Distributed streams and multiple queries
Correct and complete data                            Inaccurate or incomplete data
Conclusion
▪ In this work, we address the problem of relevant query
answering over streaming and distributed data.
▪ We proposed maintenance policies for queries with a FILTER
clause.
▪ We proposed a framework for top-k query answering and
maintenance policies that generate more relevant and
accurate results.
▪ We get more relevant and accurate results compared
to the state-of-the-art approaches.
Thank you!

Any questions?
On Relevant Query Answering over
Streaming and Distributed Data
Shima Zahmatkesh
shima.zahmatkesh@polimi.it
DEIB - Politecnico di Milano
On Relevant Query Answering over Streaming and Distributed Data

  • 1. ON RELEVANT QUERYANSWERING OVER STREAMING AND DISTRIBUTED DATA Shima Zahmatkesh Politecnico di Milano – DEIB Data Science Group – Stream Reasoning Team Supervisor: Prof. Emanuele Della Valle
  • 3. Motivation ▪ An application that shows the places around drivers where there is an high probability of finding free parking. Query: return the best streets (around the car that calls the service) where there are many free parking lots and few cars looking for parking in the last 10 minutes. !3 ▪ Web applications require to combine data streams with distributed data over the Web to continuously find the best answer to user’s queries
  • 4. Solving the example with web stream processing Web Relevant Answers Join Windows Car request streams !4 Stream Processing Engine Request Response Best streets to look for parking Free parking lots
  • 5. Problem Statement Web Relevant Answers Join SPARQL endpoint !5 RDF Stream Processing (RSP) Engine Local Replica Minimize computational resources • High Latency • Rate Limits Being Reactive • Stale Data • Refresh Budget • Maintenance Policy Windows RDF Streams
  • 6. Research Question Is it possible to optimize query evaluation in order to continuously obtain the most relevant combinations of streaming and evolving distributed data, while guaranteeing the reactiveness of the engine? !6
  • 7. Related work !7 Stream Processing Top-k Query Evaluation Federated Query Processing ACQUA Using resource replication MinTopK Optimal continuous top-k query evaluation •Top-k linked data query by Wagner •Top-k join queries by Ilyas Our work Previously unexplored
  • 9. Scope of the state of the art Is it possible to optimize query evaluation in order to continuously obtain the most relevant combinations of streaming and evolving distributed data, while guaranteeing the reactiveness of the engine? !9 Features ACQUA MinTopk Type of data Streaming and distributed streaming Relevancy ✗ ✓ Reactiveness Refresh budget Incremental evaluation Handling evolving data Local replica Maintenance policies ✗
  • 10. Scope of the research ▪ Queries that contains FILTER clause and have to filter the data come in the distributed dataset. ▪ Top-k queries where the scoring function involves data that appears both in the streaming and the distributed datasets. !10
  • 11. QUERIES WITH A FILTER CLAUSE !11
  • 12. Query ▪ Every minute, give me the best influencers, i.e. users who were mentioned on the Social Network in the last 10 minutes and whose number of followers is greater than 100,000. Query: REGISTER STREAM <:Influencers> AS CONSTRUCT {?user a :influentialUser} WHERE { WINDOW :W(10m,1m) ON :S {?user :hasMentions ?mentionsNumber} SERVICE :BKG {?user :hasFollowers ?followersCount} FILTER (?followersCount > 100000) } Slide labels: Filtering Threshold (the 100000 in the FILTER clause); ACQUA.
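The continuous query on this slide amounts to a join between the window content and the remote data, followed by a filter. A minimal Python sketch of one evaluation, assuming the window and a local copy of `:BKG` are plain dictionaries (the function name and the sample users are hypothetical and do not come from the slides):

```python
FILTERING_THRESHOLD = 100_000  # constant from FILTER (?followersCount > 100000)

def influencers(window_mentions, followers_replica, threshold=FILTERING_THRESHOLD):
    """Join users mentioned in the current window with their follower
    counts and keep only those above the filtering threshold."""
    result = []
    for user in window_mentions:
        followers = followers_replica.get(user)
        # A user missing from the replica cannot satisfy the join.
        if followers is not None and followers > threshold:
            result.append(user)
    return sorted(result)

# Hypothetical data: ?user -> ?mentionsNumber and ?user -> ?followersCount
window = {"alice": 12, "bob": 3, "carol": 7}
replica = {"alice": 250_000, "bob": 99_000, "dave": 500_000}
print(influencers(window, replica))  # ['alice']
```

If the replica holds a stale value for `bob` (for example, his real count has just passed 100,000), the answer is wrong, which is the problem the maintenance policies address.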
  • 13. State of the art - ACQUA. Diagram: the WINDOW clause is joined (JOIN) with the SERVICE clause through a Local Replica. Components: (1) Proposer, (2) Ranker, (3) Maintainer. Labels: Candidate set, Elected set. Policies: RND, LRU, WBM.
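The three components on this slide can be read as a pipeline: propose candidates, rank them, refresh the best ones. The sketch below is one possible reading, not the ACQUA implementation. It assumes that the candidate set is the replica entries joining with the window, that LRU ranks the least recently refreshed entries first, and that the elected set is the first `budget` candidates. All function and parameter names are hypothetical.

```python
def proposer(window_users, replica):
    """Candidate set: replica entries that join with the current window."""
    return [u for u in window_users if u in replica]

def lru_ranker(candidates, last_refresh):
    """LRU policy (assumed): least recently refreshed entries come first."""
    return sorted(candidates, key=lambda u: last_refresh[u])

def maintainer(ranked, budget, replica, last_refresh, fetch_remote, now):
    """Refresh the elected set (first `budget` candidates) from the remote service."""
    elected = ranked[:budget]
    for user in elected:
        replica[user] = fetch_remote(user)
        last_refresh[user] = now
    return elected

# Hypothetical data: stale replica, refresh timestamps, and fresh remote values.
replica = {"a": 10, "b": 20, "c": 30}
last_refresh = {"a": 5, "b": 1, "c": 3}
remote = {"a": 11, "b": 25, "c": 31}

candidates = proposer(["a", "b", "c", "d"], replica)
elected = maintainer(lru_ranker(candidates, last_refresh), 2,
                     replica, last_refresh, remote.__getitem__, now=6)
print(elected, replica)
```

With a refresh budget of 2, only `b` and `c` are fetched; `a` keeps its possibly stale value until a later evaluation.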
  • 14. Proposed Solution – ACQUA.F. Diagram: WINDOW clause, JOIN, and SERVICE clause with FILTER clause, with the Local Replica, Proposer, Ranker, and Maintainer. New policies: ✓ Filter Update Policy ✓ RND.F ✓ LRU.F ✓ WBM.F
  • 15. Filter Update Policy (intuition) ▪ For each data item, computes how close the value associated with the filtered variable is to the Filtering Threshold. Figure: number of followers over time for User A, User B, User C, and User D, with the Filtering Threshold and the time t marked.
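The intuition on this slide can be sketched as a ranking by distance to the threshold: items whose replica value is closest to the Filtering Threshold are the most likely to cross it, so they are refreshed first. The distance measure (absolute difference) and the sample follower counts are assumptions made for illustration.

```python
def filter_update_rank(candidates, replica, threshold):
    """Order candidates by how close their replica value is to the threshold."""
    return sorted(candidates, key=lambda u: abs(replica[u] - threshold))

# Hypothetical follower counts for the four users in the figure.
replica = {"A": 150_000, "B": 101_000, "C": 99_500, "D": 40_000}
ranking = filter_update_rank(["A", "B", "C", "D"], replica, threshold=100_000)
print(ranking)  # ['C', 'B', 'A', 'D']
```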
  • 16. Experimental Result ▪ For low selectivity, WBM is better than the Filter Update Policy. ▪ For high selectivity, the Filter Update Policy is better than WBM. Figure: performance (worst to best) per experiment dimension.
  • 17. Combined Policies – ACQUA.F ▪ Combine the Filter Update Policy with the ACQUA ones ▪ RND.F, LRU.F, and WBM.F. Figure: number of followers over time for User A, User B, User C, and User D, with a Band marked at time t.
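One way to read the Band in the figure: first keep only the items whose value lies within a band around the Filtering Threshold, then order the survivors with an ACQUA policy. The sketch below does this for LRU.F under that assumption. The band width, the LRU ordering (least recently refreshed first), and the data are hypothetical.

```python
def lru_f(candidates, replica, last_refresh, threshold, band_width):
    """Keep candidates inside the band around the threshold, then apply LRU."""
    in_band = [u for u in candidates
               if abs(replica[u] - threshold) <= band_width]
    return sorted(in_band, key=lambda u: last_refresh[u])

replica = {"A": 150_000, "B": 101_000, "C": 99_500, "D": 40_000}
last_refresh = {"A": 1, "B": 4, "C": 9, "D": 2}
ranking = lru_f(["A", "B", "C", "D"], replica, last_refresh,
                threshold=100_000, band_width=2_000)
print(ranking)  # ['B', 'C']
```

Users A and D are far from the threshold, so they are not refreshed even though they were refreshed long ago.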
  • 19. State of the art - Rank Aggregation ▪ Fairly take into account the opinions of different algorithms. ▪ Combine the ranking lists by computing an aggregated score. Example with α = 0.5: WBM scores: Alice 0.8, Bob 0.7, David 0.4. Filter Update scores: Bob 0.9, David 0.8, Alice 0.7. Aggregated scores (WBM.F+): Bob 0.8, Alice 0.75, David 0.6.
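The numbers on this slide are consistent with a weighted sum of the two scores, α · score₁ + (1 − α) · score₂. A minimal sketch that reproduces them, assuming α weights the WBM score (with α = 0.5 the choice makes no difference); the function name is hypothetical:

```python
def aggregate(scores_a, scores_b, alpha):
    """Aggregated score: alpha * score_a + (1 - alpha) * score_b per user."""
    return {u: alpha * scores_a[u] + (1 - alpha) * scores_b[u]
            for u in scores_a}

wbm = {"Alice": 0.8, "Bob": 0.7, "David": 0.4}
filter_update = {"Bob": 0.9, "David": 0.8, "Alice": 0.7}

agg = aggregate(wbm, filter_update, alpha=0.5)
ranking = sorted(agg, key=agg.get, reverse=True)
print(ranking)  # ['Bob', 'Alice', 'David']
```

Bob is second in one list and first in the other, so the aggregated ranking places him first with a score of about 0.8.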
  • 20. Proposed Solution – ACQUA.F+. Diagram: WINDOW clause, JOIN, and SERVICE clause with FILTER clause, with the Local Replica, Proposer, Ranker, and Maintainer. New policies: ✓ LRU.F+ ✓ WBM.F+ ✓ WBM.F*
  • 21. Experimental Results. Comparable to WBM.F. Possible in practice.
  • 23. State of the art – MinTopK. Figure: scores of objects C, D, E, F, B, G, H, A plotted over time, with the current window W1 ending at now. Window Length = 9. MTK list (top-2 results) of W1: E, C.
  • 24. State of the art – MinTopK. Same figure with window W2 added. Slide = 3. MTK lists (top-2 results): W1: E, C; W2: E, C.
  • 25. State of the art – MinTopK. Same figure with window W3 added. MTK lists (top-2 results): W1: E, C; W2: E, C; W3: E, F.
  • 26. State of the art – MinTopK. Super-MTK List (Object: Ws, We): E: 1, 3; C: 1, 2; F: 3, 3. MTK lists (top-2 results): W1: E, C; W2: E, C; W3: E, F.
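The Super-MTK List stores each object once, together with the first window (Ws) and last window (We) in which it is in the top-k. The sketch below only illustrates this compact encoding by merging already computed per-window lists; MinTopK itself maintains the structure incrementally, which is not shown. It assumes each object appears in a contiguous range of windows, and the function name is hypothetical.

```python
def super_mtk_list(mtk_lists):
    """Merge per-window top-k lists into {object: (Ws, We)} ranges."""
    ranges = {}
    for window in sorted(mtk_lists):
        for obj in mtk_lists[window]:
            start, _ = ranges.get(obj, (window, window))
            ranges[obj] = (start, window)  # extend the range to this window
    return ranges

# Top-2 results per window, as on the slides.
mtk_lists = {1: ["E", "C"], 2: ["E", "C"], 3: ["E", "F"]}
merged = super_mtk_list(mtk_lists)
for obj, (ws, we) in merged.items():
    print(obj, ws, we)
# E 1 3
# C 1 2
# F 3 3
```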
  • 27. Top-k Query ▪ Every 3 minutes, return the top-2 popular users who are most mentioned on Social Networks in the last 9 minutes. Query: REGISTER STREAM :TopkUsersToContact AS SELECT ?user F(?mentionCount,?followerCount) AS ?score FROM NAMED WINDOW :W ON :S [RANGE 9m STEP 3m] WHERE { WINDOW :W {?user :hasMentions ?mentionCount} SERVICE :BKG {?user :hasFollowers ?followerCount} } ORDER BY DESC (?score) LIMIT 2
  • 28. Score. Figure: ScoreS and ScoreR over time for the points labeled C, D, E, F, B, A, G, A.
  • 29. Score. Figure: ScoreS, ScoreR, and the resulting FinalScore over time for the same points. Final Score = F(ScoreS, ScoreR)
  • 30. Contributions (1 of 2) ▪ Data structure: Super-MTK+N List ▪ Handles changes in the distributed dataset: N changes per window ▪ N additional slots ▪ MTK+N list: keeps K+N elements ▪ Complexity: O(K+N). Example Super-MTK+N List (Object: Ws, We): E: 1, 2; G: 1, 3; C: 1, 1; F: 2, 2; A: 3, 3. Figure: the MTK+N lists of W1, W2, and W3, split into a K area and an N area, with LBP markers.
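The idea of keeping K+N elements can be sketched as a bounded, score-ordered list: the first K entries form the K area (the current answer) and the next N entries form the N area (spare candidates that can move up when remote values change). This is an illustrative sketch, not the thesis data structure; the class name and the scores are hypothetical.

```python
import bisect

class TopKN:
    """Score-ordered list that keeps at most K+N objects."""

    def __init__(self, k, n):
        self.k = k
        self.capacity = k + n
        self.items = []  # (-score, obj), so ascending order = descending score

    def insert(self, obj, score):
        bisect.insort(self.items, (-score, obj))  # linear-time insertion
        if len(self.items) > self.capacity:
            self.items.pop()  # drop the lowest-scored entry

    def k_area(self):
        return [obj for _, obj in self.items[:self.k]]

    def n_area(self):
        return [obj for _, obj in self.items[self.k:]]

top = TopKN(k=2, n=1)
for obj, score in [("E", 7), ("G", 6), ("C", 5), ("D", 2)]:
    top.insert(obj, score)
print(top.k_area(), top.n_area())  # ['E', 'G'] ['C']
```

Each insertion touches at most K+N entries, which is in line with the O(K+N) complexity stated on the slide.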
  • 31. Contributions (2 of 2) ▪ Algorithms: ▪ Top-k+N: window expiration; new arrival of distinct data items; handling changes in the distributed data ▪ AcquaTop: handles updating the local replica ▪ Complexity: O(K+N) ▪ Framework: AcquaTop Framework ▪ Applies maintenance policies
  • 32. Top-K+N – New object arrival (K = 2, N = 1). Figure: scores over time, with the current window W2 ending at now. List before the arrival (Object: Ws, We): E: 2, 3; C: 2, 2; F: 2, 3; A: 3, 4.
  • 33. Top-K+N – New object arrival (K = 2, N = 1). List after Top-K+N processes the new object G (Object: Ws, We): G: 2, 4; E: 2, 3; C: 2, 2; F: 3, 3; A: 4, 4.
  • 34. Top-K+N - Handling Changes. Figure: scores over time, with window W2 and now marked. List before the change (Object: Ws, We): G: 2, 4; E: 2, 3; C: 2, 2; F: 3, 3; A: 4, 4.
  • 35. Top-K+N - Handling Changes. List after Top-K+N processes the change of object F (Object: Ws, We): G: 2, 4; F: 2, 3; E: 2, 3; C: 2, 2; A: 4, 4.
  • 36. AcquaTop Framework. Components: RDF Stream, Ranker, Maintainer, SPARQL endpoint, Candidate set, Elected set, Local Replica, Super-MTK+N List. Maintenance policies: ✓ MTKN-T ✓ MTKN-F ✓ MTKN-A. Algorithms: AcquaTop Algorithm; Top-k+N Algorithm (Expiration, New Arrival, Remote Changes).
  • 37. New Maintenance Policies ▪ MTKN-T: selects objects for updating from the top of the MTKN list ▪ MTKN-F: selects objects for updating from the border of the K and N areas in the MTKN list (half from the top of the N area, and half from the bottom of the K area). Example list shown for both policies, with 2 items selected for updating (Object: Ws, We): E: 2, 3; G: 2, 4; C: 2, 2; F: 3, 3; A: 4, 4.
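The two selection rules can be sketched on a score-ordered list whose first K entries are the K area. The value K = 3 and the handling of odd budgets (the extra item is taken from the K area) are assumptions for illustration; the slide only states the half-and-half split.

```python
def mtkn_t(mtkn_list, budget):
    """MTKN-T: pick the first `budget` objects from the top of the list."""
    return mtkn_list[:budget]

def mtkn_f(mtkn_list, k, budget):
    """MTKN-F: pick objects around the border between the K and N areas."""
    from_n = budget // 2      # half from the top of the N area
    from_k = budget - from_n  # the rest from the bottom of the K area (assumed)
    start = max(k - from_k, 0)
    return mtkn_list[start:k + from_n]

mtkn = ["E", "G", "C", "F", "A"]  # object order as in the slide's example list
print(mtkn_t(mtkn, budget=2))       # ['E', 'G']
print(mtkn_f(mtkn, k=3, budget=2))  # ['C', 'F']
```

MTKN-F targets the objects most likely to swap between the answer and the spare area, since a small score change at the border can alter the top-k result.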
  • 39. Experimental setting ▪ Datasets: ▪ Streaming data from Twitter: number of mentions per user ▪ Real data from the Twitter REST API: follower count of users ▪ Realistic and synthetic distributed data ▪ Queries: ▪ Query with FILTER clause ▪ Top-k query ▪ Scoring function: normalized weighted sum of the number of mentions in each window and the number of changes in follower count ▪ An Oracle is generated for each query
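A normalized weighted sum of a streaming value and a remote value can be sketched as follows. The normalization by maximum value, the equal default weights, and all parameter names are assumptions; the slides do not give the exact formula.

```python
def final_score(stream_value, remote_value, max_stream, max_remote,
                w_stream=0.5, w_remote=0.5):
    """Weighted sum of the two inputs after scaling each into [0, 1]."""
    return (w_stream * stream_value / max_stream
            + w_remote * remote_value / max_remote)

# Hypothetical inputs: 5 of at most 10 mentions, remote value 50 of at most 100.
print(final_score(5, 50, max_stream=10, max_remote=100))  # 0.5
```

Normalizing first keeps one input from dominating the other simply because it is measured on a larger scale.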
  • 40. Experimental setting ▪ Baselines: ▪ WST: no updates are applied ▪ RND: randomly selects items for update ▪ MTKN-A: updates all the elements in the MTKN list ▪ Metrics: ▪ CJD: shows the correctness of the results for 2 different sets ▪ nDCG@K: shows how relevant the results are compared to the Oracle's ▪ ACC@K: shows the accuracy of the results
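nDCG@K is a standard ranking metric: the discounted cumulative gain of the returned list divided by that of the ideal list. The sketch below uses the common linear-gain form with a log2 discount. How the thesis derives relevance values from the Oracle is not stated on the slides, so the `oracle_relevance` mapping is an assumption.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance values."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(result, oracle_relevance, k):
    """DCG of `result` divided by the DCG of the ideal (Oracle) ordering."""
    gains = [oracle_relevance.get(obj, 0.0) for obj in result]
    ideal = sorted(oracle_relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical Oracle relevance: E is the best answer, then C, then F.
oracle = {"E": 3.0, "C": 2.0, "F": 1.0}
print(ndcg_at_k(["E", "C"], oracle, k=2))  # 1.0
print(ndcg_at_k(["C", "F"], oracle, k=2))  # below 1.0
```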
  • 41. Evaluation Results – Queries with Filter. Columns: measuring | varying | Filter Update | LRU.F | WBM.F | LRU.F+ | WBM.F+ | WBM.F*. Rows (cell values in source order): accuracy | selectivity: >60% ✓ ✓ <60%; accuracy | budget: ✓ >=2 =1 ✓ >5; sensitivity to α | selectivity, α: <50% ✓ ✓; sensitivity to α | budget, α: ✓ ✓; accuracy | -: ✓; accuracy | S: <60% ✓; sensitivity to α | -: ✓ ✓.
  • 42. Evaluation Results – Top-k Queries. Columns: measuring | varying | MTKN-T | MTKN-F. Rows (cell values in source order): relevancy | budget: >3; accuracy | budget: ✓; relevancy | CH: ✓; accuracy | CH: =80 <=40; relevancy | K: ✓; accuracy | K: <7; relevancy | N: ✓; accuracy | N: ✓; relevancy | -: ✓; accuracy | -: ✓.
  • 44. Limitations and Future work (limitation → future work) ▪ Two classes of queries → Broaden the class of queries: N:M join relationship, multi-join operators, preference queries, … ▪ Static refresh budget → Flexible budget allocation ▪ Full replica → Cache and replacement strategies ▪ Single stream of data and one query for evaluation → Distributed streams and multiple queries ▪ Correct and complete data → Inaccurate or incomplete data
  • 45. Conclusion ▪ In this work, we address the problem of relevant query answering over streaming and distributed data. ▪ We proposed maintenance policies for queries with a FILTER clause. ▪ We proposed a framework for top-k query answering, with maintenance policies that generate more relevant and accurate results. ▪ We obtain more relevant and accurate results compared to the state-of-the-art approaches.
  • 46. Thank you!
 Any questions? On Relevant Query Answering over Streaming and Distributed Data. Shima Zahmatkesh, shima.zahmatkesh@polimi.it, DEIB - Politecnico di Milano