RELEVANT QUERY ANSWERING ON DYNAMIC AND DISTRIBUTED DATASETS
Shima Zahmatkesh
DEIB – Politecnico di Milano
Supervisor: Prof. Emanuele Della Valle
ISWC 2017 - Vienna
22 October 2017
Relevancy
• Several applications
• Domains: social networking, smart cities, financial markets
• These applications need to federate streams with distributed data to provide relevant answers to users.
Diagram: stream data and distributed data from the Web are joined to produce the answer.
Example: advertising agencies may want to continuously detect influential social network users across social networks, in order to ask them to endorse their commercials. Influential users:
✓ have a high number of followers
✓ are mentioned in micro-posts
Problem Statement
Diagram: an RDF Stream Processing engine joins windows over RDF streams with data from a SPARQL endpoint on the Web, kept in a local replica, to produce the answer.
• Provide answers in a timely fashion
• Replica data become stale if not refreshed
• Define a refresh budget to limit invocations
• Correct vs. approximate answer
Maintenance Policy
✓ Best usage of the refresh budget
✓ Maximize correctness
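The trade-off above can be illustrated with a minimal Python sketch. It rests on loud assumptions: plain dictionaries stand in for the SPARQL endpoint and the local replica, the follower-count scenario is borrowed from the Relevancy example, and the names `refresh`, `join_window`, and `min_followers` are hypothetical rather than taken from the slides. The point is only that a stale replica yields an approximate answer, and that a limited refresh budget repairs part of the error.

```python
def refresh(replica, remote, users, budget):
    """Copy at most `budget` entries from the remote source into the replica."""
    for user in list(users)[:budget]:
        replica[user] = remote[user]


def join_window(window, source, min_followers):
    """Join the users mentioned in one window with `source`, then filter."""
    return {u for u in set(window) if source.get(u, 0) >= min_followers}


# Hypothetical data: `remote` is the current distributed data,
# `replica` is a stale local copy of it.
remote = {"alice": 1200, "bob": 900, "carol": 300}
replica = {"alice": 800, "bob": 1100, "carol": 300}
window = ["alice", "bob", "alice", "carol"]  # mentions in the current window

# Answer computed on the stale replica (approximate).
approximate = join_window(window, replica, min_followers=1000)

# Spend a refresh budget of one invocation, then answer again.
refresh(replica, remote, sorted(set(window)), budget=1)
after_refresh = join_window(window, replica, min_followers=1000)

# Answer computed directly on the remote data (correct, but costly).
correct = join_window(window, remote, min_followers=1000)
```

In this toy run the budget of one repairs the entry for "alice" but leaves "bob" stale, so which entries to refresh is exactly what a maintenance policy has to decide.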
Related Work
• Continuous relevant query evaluation
• Data source replication
• Federated query answering
State of the art:
ACQUA: Approximate Continuous QUery Answering over streams and dynamic Linked Data sets
My work:
Continuously Relevant SPARQL Query Answering on Streaming and Slowly Evolving Linked Data
Research Question
• Given a user information need formulated as a relevant continuous query over an ontology,
• is it possible to optimize query evaluation in order to continuously obtain the most relevant (filter-based, top-k) combinations of streaming and distributed resources that answer the information need?
Approach
Diagram: RDF stream, JOIN, SPARQL endpoint, local replica
1. Proposer
2. Ranker
3. Maintainer
C: candidate set
E: elected set, the top γ mappings of the candidate set
Maintenance Policies
✓ Filter Update Policy
✓ ACQUA.F Policies
✓ Rank Aggregation Policies
✓ Top-k Policies
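One possible reading of the Proposer, Ranker, and Maintainer steps is sketched below in Python. The division of work is an assumption, since the slide only names the three components, the candidate set, and the elected set: here the proposer builds the candidate set C from the current window, the ranker orders it, and the maintainer refreshes the top γ entries as the elected set E. The ranking rule (least recently refreshed first) is a stand-in chosen for illustration, not one of the policies listed on the slide, and all function and variable names are hypothetical.

```python
def proposer(window, replica):
    """Candidate set C: replica keys that join with the current window."""
    return {key for key in window if key in replica}


def ranker(candidates, last_refreshed):
    """Order candidates, least recently refreshed first (assumed stand-in policy)."""
    return sorted(candidates, key=lambda key: (last_refreshed[key], key))


def maintainer(ranked, gamma, replica, remote, last_refreshed, now):
    """Elected set E: refresh the top `gamma` ranked candidates from the remote source."""
    elected = ranked[:gamma]
    for key in elected:
        replica[key] = remote[key]
        last_refreshed[key] = now
    return elected


# Hypothetical data: `remote` plays the SPARQL endpoint, `replica` the local copy.
remote = {"a": 10, "b": 20, "c": 30, "d": 40}
replica = {"a": 1, "b": 2, "c": 3, "d": 4}
last_refreshed = {"a": 5, "b": 1, "c": 3, "d": 0}
window = ["a", "b", "c"]

candidates = proposer(window, replica)
ranked = ranker(candidates, last_refreshed)
elected = maintainer(ranked, 2, replica, remote, last_refreshed, now=6)
```

Entry "d" is the oldest in the replica but is never refreshed here because it does not appear in the window, which is the reason for restricting maintenance to the candidate set in this sketch.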
Hypotheses
• For each proposed policy, I check that:
• The proposed policy makes the replica fresher and gives more accurate results than the state-of-the-art policies.
• The proposed policy is not sensitive to its parameters.
• The combination of the proposed policies is at least as accurate as the corresponding policies.
Evaluation Plan
• Data Sets
• Streaming data
• Realistic and synthetic distributed data
• Query
• Join Query with Filter Clause
• Top-k Query
• KPIs
• Measure the difference between the set generated by the query and the correct answers:
• Cumulative Jaccard distance
• nDCG
• Control the overall latency by using the refresh budget
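The two KPIs named above can be sketched in Python using their textbook definitions. This is an illustration under assumptions, not the evaluation code behind the slides: the cumulative variant is taken to be the sum of per-window Jaccard distances, nDCG uses the common log2 discount, and the function names are hypothetical.

```python
import math


def jaccard_distance(answer, correct):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    a, b = set(answer), set(correct)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cumulative_jaccard_distance(answers, correct_answers):
    """Sum of per-window Jaccard distances (assumed definition of 'cumulative')."""
    return sum(jaccard_distance(a, c) for a, c in zip(answers, correct_answers))


def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the returned ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

For example, `cumulative_jaccard_distance([{"a"}, {"a", "b"}], [{"a"}, {"a"}])` adds a distance of 0 for the first window and 0.5 for the second, and `ndcg` returns 1.0 only when the returned ranking is already in ideal order.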
Preliminary Results
            Hp.1         Hp.2         Hp.3                  Hp.4      Hp.5      Hp.6
measuring   accuracy     accuracy     sensitivity to alpha  accuracy  accuracy  sensitivity to alpha
varying     selectivity  selectivity  selectivity           budget    budget    budget
Filter Update >60% ✓
LRU.F >40% ✓
WBM.F <40% <4
LRU.F+ ✓ ✓ ✓ ✓
WBM.F+ ✗ ✓ >5 ✓
WBM.F* <60% ✓ ✗ ✓
Reflection
• In this thesis, I proposed various maintenance policies for continuous top-k query answering over streaming and distributed data.
• Limitations:
• Focus on join queries with a filter clause and on top-k queries → consider other types of queries
• A static refresh budget to control reactiveness → define a dynamic refresh budget
• Keeping a replica of the distributed data → use a cache
Thank you!
Any questions?
Shima Zahmatkesh
shima.zahmatkesh@polimi.it
DEIB - Politecnico di Milano
