JIST 2014
Optimizing SPARQL Query Processing On Dynamic
and Static Data Based on Query Time/Freshness
Requirements Using Materialization
Soheila Dehghanzadeh, Marcel Karnstedt, Stefan Decker, Josiane Xavier Parreira, Juergen Umbrich and Manfred Hauswirth
Outline
• Introduction
• Terminology
• Problem definition
• Proposed solution
• Experimental results
• Conclusion
Insight Centre for Data Analytics Slide 2
Introduction: Query Processing On Linked Data
• Reporting changes to the local store (maintenance):
  • Sources proactively report changes or their existence (pushing).
  • The query processor discovers new sources and changes by crawling (pulling).
• Frequent maintenance leads to high response quality but slow responses, and vice versa.
• Problem: on-demand maintenance according to response quality requirements.
• Why is it important? It eliminates unnecessary maintenance, which leads to faster responses and better scalability.
[Figure: Replication (database) or Caching (web). Off-line materialization fills a Local Store; the Query Processor answers each Query from the Local Store and returns the Response; NEW sources appear over time.]
Terminology
• Quality requirements:
  • Freshness = B/(A+B)
  • Completeness = B/(B+C)
• Maintenance plan:
  • Each set of views chosen for maintenance is called a maintenance plan.
  • With n views, there are 2^n maintenance plans.
  • Each maintenance plan leads to a different response quality.
20 October 2014, Insight Centre for Data Analytics, Slide 4
V1 V2 V3 V4
20% 90% 10% 80%
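The two quality measures and the size of the plan space can be sketched in a few lines of Python. The slide's diagram did not survive extraction, so this sketch assumes that A is the set of outdated answers the response contains, B is the set of up-to-date answers it contains, and C is the set of up-to-date answers it misses. Under that reading, freshness behaves like precision and completeness like recall. The answer sets and the helper names are hypothetical.

```python
from itertools import combinations

def freshness(response: set, up_to_date: set) -> float:
    """B / (A + B): share of the returned answers that are up to date."""
    if not response:
        return 1.0  # assumption: an empty response contains nothing stale
    return len(response & up_to_date) / len(response)

def completeness(response: set, up_to_date: set) -> float:
    """B / (B + C): share of the up-to-date answers that were returned."""
    if not up_to_date:
        return 1.0  # assumption: there is nothing to miss
    return len(response & up_to_date) / len(up_to_date)

def maintenance_plans(views):
    """Yield every subset of views; n views give 2**n plans (empty plan included)."""
    for k in range(len(views) + 1):
        yield from combinations(views, k)

cached = {"r1", "r2", "r3", "r4"}  # hypothetical answers computed from the local store
current = {"r1", "r2", "r5"}       # hypothetical answers over the live sources
print(freshness(cached, current))     # 0.5   (A = {r3, r4}, B = {r1, r2})
print(completeness(cached, current))  # 2/3   (C = {r5})
print(len(list(maintenance_plans(["V1", "V2", "V3", "V4"]))))  # 16
```

The enumeration makes the exponential growth concrete: four views already yield 16 candidate plans.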
Freshness Example (T = fresh, F = stale)

No maintenance:
  (a, b)      (a, c)      (a, b) ⋈ (a, c)
  a1 b1 T     a1 c1 F     a1 b1 c1 F
  a2 b2 T     a1 c2 F     a1 b1 c2 F
  a3 b3 F     a1 c3 T     a1 b1 c3 T
  a4 b4 T     a2 c4 T     a2 b2 c4 T
  a5 b5 F     a6 c5 F
  Freshness: 60%, 40%, 50%

After maintaining (a, b):
  (a, b)      (a, c)      (a, b) ⋈ (a, c)
  a1 b1 T     a1 c1 F     a1 b1 c1 F
  a2 b2 T     a1 c2 F     a1 b1 c2 F
  a3 b3 T     a1 c3 T     a1 b1 c3 T
  a4 b4 T     a2 c4 T     a2 b2 c4 T
  a5 b5 T     a6 c5 F
  Freshness: 100%, 40%, 50%

After maintaining (a, c):
  (a, b)      (a, c)      (a, b) ⋈ (a, c)
  a1 b1 T     a1 c1 T     a1 b1 c1 T
  a2 b2 T     a1 c2 T     a1 b1 c2 T
  a3 b3 F     a1 c3 T     a1 b1 c3 T
  a4 b4 T     a2 c4 T     a2 b2 c4 T
  a5 b5 F     a6 c5 T
  Freshness: 60%, 100%, 100%
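The freshness example above can be reproduced with a short Python sketch. It makes two assumptions that match every row of the example. First, a join result is fresh only if both of the tuples it combines are fresh. Second, maintaining a view marks each of its tuples fresh without changing the tuples themselves. Real maintenance could also add or remove tuples. The function and variable names are illustrative.

```python
# Cached views from the example: tuple -> True (fresh) / False (stale).
ab = {("a1", "b1"): True, ("a2", "b2"): True, ("a3", "b3"): False,
      ("a4", "b4"): True, ("a5", "b5"): False}
ac = {("a1", "c1"): False, ("a1", "c2"): False, ("a1", "c3"): True,
      ("a2", "c4"): True, ("a6", "c5"): False}

def freshness(view):
    """Fraction of tuples in the view that are fresh."""
    return sum(view.values()) / len(view)

def join(left, right):
    """Join on the shared first column; a result is fresh only if both inputs are."""
    return {(a, b, c): f1 and f2
            for (a, b), f1 in left.items()
            for (a2, c), f2 in right.items() if a == a2}

def refresh(view):
    """Model maintenance of one view: every tuple becomes fresh."""
    return {t: True for t in view}

def scenario(v1, v2):
    return freshness(v1), freshness(v2), freshness(join(v1, v2))

SCENARIOS = {
    "no maintenance": scenario(ab, ac),
    "maintain (a, b)": scenario(refresh(ab), ac),
    "maintain (a, c)": scenario(ab, refresh(ac)),
}
for name, scores in SCENARIOS.items():
    print(name, ", ".join(f"{s:.0%}" for s in scores))
```

The output shows why the choice of plan matters. Maintaining the (a, b) view raises that view's freshness to 100%, but the join stays at 50% because the stale tuples a3 b3 and a5 b5 never join. Maintaining the (a, c) view instead raises the join to 100%.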
Research questions
• What is the least costly maintenance plan that fulfils the response quality requirements?
• What is the quality of the response without maintenance?
• What is the quality of the response under each maintenance plan?
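The research questions can be answered for the two-view example by brute force. This Python sketch computes the join freshness for every maintenance plan and returns the cheapest plan that meets a freshness threshold.

The sketch rests on several assumptions:
- The per-view costs (`COST`) are hypothetical.
- Maintenance is modelled as marking a view's tuples fresh.
- Quality is computed exactly on the cache. The approach in these slides instead aims to estimate it from a cache summary without executing the query.

Enumerating all 2^n plans is only practical for a handful of views.

```python
from itertools import combinations

# Freshness-labelled cache from the example (True = fresh, False = stale).
VIEWS = {
    "ab": {("a1", "b1"): True, ("a2", "b2"): True, ("a3", "b3"): False,
           ("a4", "b4"): True, ("a5", "b5"): False},
    "ac": {("a1", "c1"): False, ("a1", "c2"): False, ("a1", "c3"): True,
           ("a2", "c4"): True, ("a6", "c5"): False},
}
COST = {"ab": 5, "ac": 5}  # hypothetical: cost = number of tuples to re-fetch

def join_freshness(ab, ac):
    """Freshness of the join on the first column; a result is fresh iff both inputs are."""
    results = [f1 and f2
               for (a1, _), f1 in ab.items()
               for (a2, _), f2 in ac.items() if a1 == a2]
    return sum(results) / len(results) if results else 1.0

def apply_plan(plan):
    """Model maintenance: every tuple of each view in the plan becomes fresh."""
    return {name: ({t: True for t in view} if name in plan else view)
            for name, view in VIEWS.items()}

def cheapest_plan(min_freshness):
    """Return (plan, cost, freshness) of the cheapest qualifying plan, or None."""
    best = None
    for k in range(len(VIEWS) + 1):
        for plan in combinations(VIEWS, k):
            views = apply_plan(plan)
            quality = join_freshness(views["ab"], views["ac"])
            cost = sum(COST[v] for v in plan)
            if quality >= min_freshness and (best is None or cost < best[1]):
                best = (plan, cost, quality)
    return best

print(cheapest_plan(0.0))  # ((), 0, 0.5): no maintenance is needed
print(cheapest_plan(0.9))  # (('ac',), 5, 1.0)
```

The empty plan answers the second question (quality without maintenance is 50%). Each iteration of the loop answers the third question for one plan.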
Experiment
• We use the BSBM benchmark to create a dataset and a query set.
• We label each triple as true/false to indicate its freshness status.
• We summarize the cache to estimate the quality of a query response without actually executing the query on the cache.
• To summarize the cache, we extended cardinality estimation techniques to the freshness estimation problem.
Alice Lives Dublin True
Bob Lives Berlin False
Alice Job Teacher True
Bob Job Developer False
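The "actual" freshness of a query requires scanning the labelled cache, which is the cost the summary is meant to avoid. This Python sketch computes that exact value for a single triple pattern over the four labelled triples above. Encoding a variable position as `None` is an illustrative choice, not part of the original work.

```python
# Freshness-labelled triples from the slide: (subject, predicate, object, fresh?).
TRIPLES = [
    ("Alice", "Lives", "Dublin", True),
    ("Bob", "Lives", "Berlin", False),
    ("Alice", "Job", "Teacher", True),
    ("Bob", "Job", "Developer", False),
]

def matches(terms, pattern):
    """A pattern position of None acts as a variable and matches any term."""
    return all(p is None or p == t for p, t in zip(pattern, terms))

def actual_freshness(pattern):
    """Exact freshness of one triple pattern, computed by scanning the whole cache."""
    hits = [fresh for *terms, fresh in TRIPLES if matches(terms, pattern)]
    return sum(hits) / len(hits) if hits else None

print(actual_freshness((None, "Lives", None)))  # 0.5
print(actual_freshness(("Alice", None, None)))  # 1.0
```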
Cardinality Estimation
• Capture the data distribution by splitting data into buckets and only keep the bucket cardinality in the summary.

Data (subject, predicate, object, freshness):
  Alice Job Teacher        True
  Alice Lives Dublin       True
  Alice Job PhD student    False
  Alice Lives Athlon       False
  Bob Job Manager          True
  Bob Lives Berlin         True
  Bob Lives Chicago        True
  Bob Lives Munich         False
  Bob Lives Belfast        False
  Bob Lives Limerick       False
  Bob Job CEO              False
  Bob Job Consultant       False

Summary (bucket, cardinality, fresh triples):
  Alice Job *      2   1
  Bob Job *        3   1
  Alice Lives *    2   1
  Bob Lives *      5   2
  * Job *          5   2
  * Lives *        7   3

Queries:
  Q1: ?a Job ?b
  Q2: (?a Job ?b) ∧ (?a Lives ?c)

Cardinality:
         Predicate-level buckets      Subject-and-predicate buckets
         Estimated   Actual           Estimated   Actual
  Q1     5           5                5           5
  Q2     35          19               19          19

Freshness:
         Predicate-level buckets      Subject-and-predicate buckets
         Estimated   Actual           Estimated   Actual
  Q1     2/5         2/5              2/5         2/5
  Q2     6/35        3/19             3/19        3/19
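The estimates in these tables can be reproduced from the bucket summaries alone, as the Python sketch below shows. The sketch assumes two combination rules, and these rules reproduce every number in the tables:
- Within a subject bucket, every triple of one predicate pairs with every triple of the other.
- A join result counts as fresh only if both of its triples are fresh.

These rules are one reading consistent with the slide and are not necessarily the authors' exact procedure. The names `COARSE`, `FINE`, `estimate_pattern` and `estimate_join` are hypothetical.

```python
from fractions import Fraction

# Bucket summaries from the example: (subject, predicate) -> (triples, fresh triples).
# "*" means the bucket ignores the subject.
COARSE = {("*", "Job"): (5, 2), ("*", "Lives"): (7, 3)}
FINE = {("Alice", "Job"): (2, 1), ("Bob", "Job"): (3, 1),
        ("Alice", "Lives"): (2, 1), ("Bob", "Lives"): (5, 2)}

def estimate_pattern(summary, predicate):
    """Estimate (?a predicate ?b): add up every bucket with that predicate."""
    buckets = [counts for (_, p), counts in summary.items() if p == predicate]
    total = sum(n for n, _ in buckets)
    fresh = sum(f for _, f in buckets)
    return total, (Fraction(fresh, total) if total else None)

def estimate_join(summary, p1, p2):
    """Estimate (?a p1 ?b) AND (?a p2 ?c), joined on the subject ?a.

    Per subject bucket, sizes multiply and fresh counts multiply. With the coarse
    summary every subject falls into the single "*" bucket, so the estimate is a
    plain product of predicate totals.
    """
    total = fresh = 0
    for subject in {s for s, _ in summary}:
        n1, f1 = summary.get((subject, p1), (0, 0))
        n2, f2 = summary.get((subject, p2), (0, 0))
        total += n1 * n2
        fresh += f1 * f2
    return total, (Fraction(fresh, total) if total else None)

for name, summary in (("predicate-level", COARSE), ("subject-and-predicate", FINE)):
    print(name, estimate_pattern(summary, "Job"), estimate_join(summary, "Job", "Lives"))
```

For Q2, the predicate-level summary yields 35 results with freshness 6/35. The subject-and-predicate summary yields 19 results with freshness 3/19, which matches the actual values. Keeping the subject in the bucket key captures the correlation that the coarse summary loses.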
Cardinality Estimation Approaches
• System R assumptions for cardinality estimation:
  • Data is uniformly distributed per attribute.
  • Join predicates are independent.
• Indexing approaches make both assumptions.
• Histograms capture the distribution of attributes for more accurate estimation.
• Probabilistic graphical models capture dependencies among attributes.
Measuring the accuracy of the estimation approach
• Measure the difference between the actual and estimated freshness of the queries in a query set.
• n is the number of queries.
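The slide's accuracy formula did not survive extraction. This Python sketch therefore assumes one common choice, the mean absolute error (1/n) Σ |actual_i − estimated_i| over the n queries, and it should not be read as the exact measure used in the work. The sample values are the Q1/Q2 freshness figures from the cardinality-estimation example.

```python
from fractions import Fraction

def mean_absolute_error(actual, estimated):
    """(1/n) * sum_i |actual_i - estimated_i| over a query set of n queries (assumed metric)."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("need one estimate per query and at least one query")
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)

# Freshness of Q1 and Q2 from the cardinality-estimation example.
actual = [Fraction(2, 5), Fraction(3, 19)]
predicate_level = [Fraction(2, 5), Fraction(6, 35)]
subject_and_predicate = [Fraction(2, 5), Fraction(3, 19)]

print(mean_absolute_error(actual, predicate_level))        # 9/1330
print(mean_absolute_error(actual, subject_and_predicate))  # 0
```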
Preliminary results
Conclusion
• We proposed a new approach for on-demand view maintenance based on response quality requirements.
• We defined quality requirements in terms of freshness and completeness.
• We summarized a synthetic dataset with indexing and histogram techniques to estimate the freshness of various queries.
• Summarizing the dataset with probabilistic graphical models is future work and promises to reduce the estimation error.
Thanks a lot for your attention!
Any questions are welcome!

More Related Content

What's hot

A Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation AlgorithmA Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation AlgorithmNECST Lab @ Politecnico di Milano
 
Group13 kdd cup_report_submitted
Group13 kdd cup_report_submittedGroup13 kdd cup_report_submitted
Group13 kdd cup_report_submittedChamath Sajeewa
 
Development Infographic
Development InfographicDevelopment Infographic
Development InfographicRealMassive
 
Yulong deng resume 2018
Yulong deng resume 2018Yulong deng resume 2018
Yulong deng resume 2018Yulong Deng
 
Data mining example
Data mining exampleData mining example
Data mining exampleAamir Khan
 
Energy Wasting Rate as a Metrics for Green Computing and Static Analysis
Energy Wasting Rate as a Metrics for Green Computing and Static AnalysisEnergy Wasting Rate as a Metrics for Green Computing and Static Analysis
Energy Wasting Rate as a Metrics for Green Computing and Static AnalysisJérôme Rocheteau
 
Ranking using pairwise preferences
Ranking using pairwise preferencesRanking using pairwise preferences
Ranking using pairwise preferencesSweta Sharma
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Greg Landrum
 
Mining Assumptions for Software Components using Machine Learning
Mining Assumptions for Software Components using Machine LearningMining Assumptions for Software Components using Machine Learning
Mining Assumptions for Software Components using Machine LearningLionel Briand
 
Canopy clustering algorithm
Canopy clustering algorithmCanopy clustering algorithm
Canopy clustering algorithmAshish Karki
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearnPratap Dangeti
 
AWS Forcecast: DeepAR Predictor Time-series
AWS Forcecast: DeepAR Predictor Time-series AWS Forcecast: DeepAR Predictor Time-series
AWS Forcecast: DeepAR Predictor Time-series PolarSeven Pty Ltd
 
PPT slides
PPT slidesPPT slides
PPT slidesbutest
 
SBSI optimization tutorial
SBSI optimization tutorialSBSI optimization tutorial
SBSI optimization tutorialRichard Adams
 
Measuring the Combinatorial Coverage of Software in Real Time
Measuring the Combinatorial Coverage of Software in Real  TimeMeasuring the Combinatorial Coverage of Software in Real  Time
Measuring the Combinatorial Coverage of Software in Real TimeZachary Ratliff
 
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017MLconf
 

What's hot (20)

A Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation AlgorithmA Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation Algorithm
 
Group13 kdd cup_report_submitted
Group13 kdd cup_report_submittedGroup13 kdd cup_report_submitted
Group13 kdd cup_report_submitted
 
Development Infographic
Development InfographicDevelopment Infographic
Development Infographic
 
Yulong deng resume 2018
Yulong deng resume 2018Yulong deng resume 2018
Yulong deng resume 2018
 
Weka linked in
Weka linked inWeka linked in
Weka linked in
 
Data mining example
Data mining exampleData mining example
Data mining example
 
Energy Wasting Rate as a Metrics for Green Computing and Static Analysis
Energy Wasting Rate as a Metrics for Green Computing and Static AnalysisEnergy Wasting Rate as a Metrics for Green Computing and Static Analysis
Energy Wasting Rate as a Metrics for Green Computing and Static Analysis
 
PerOpteryx
PerOpteryxPerOpteryx
PerOpteryx
 
Ranking using pairwise preferences
Ranking using pairwise preferencesRanking using pairwise preferences
Ranking using pairwise preferences
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)
 
Mining Assumptions for Software Components using Machine Learning
Mining Assumptions for Software Components using Machine LearningMining Assumptions for Software Components using Machine Learning
Mining Assumptions for Software Components using Machine Learning
 
Canopy clustering algorithm
Canopy clustering algorithmCanopy clustering algorithm
Canopy clustering algorithm
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
EfficientNet
EfficientNetEfficientNet
EfficientNet
 
AWS Forcecast: DeepAR Predictor Time-series
AWS Forcecast: DeepAR Predictor Time-series AWS Forcecast: DeepAR Predictor Time-series
AWS Forcecast: DeepAR Predictor Time-series
 
PPT slides
PPT slidesPPT slides
PPT slides
 
SBSI optimization tutorial
SBSI optimization tutorialSBSI optimization tutorial
SBSI optimization tutorial
 
Measuring the Combinatorial Coverage of Software in Real Time
Measuring the Combinatorial Coverage of Software in Real  TimeMeasuring the Combinatorial Coverage of Software in Real  Time
Measuring the Combinatorial Coverage of Software in Real Time
 
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017
 
assia2015sakai
assia2015sakaiassia2015sakai
assia2015sakai
 

Similar to JIST 2014: Optimizing SPARQL Query Processing Based on Query Time/Freshness Requirements

addressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenanceaddressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenanceSoheila Dehghanzadeh
 
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data Sets
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data SetsApproximate Continuous Query Answering Over Streams and Dynamic Linked Data Sets
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data SetsSoheila Dehghanzadeh
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksSungchul Kim
 
Six Sigma Presentation Storybd 07 Mar24
Six Sigma Presentation Storybd 07 Mar24Six Sigma Presentation Storybd 07 Mar24
Six Sigma Presentation Storybd 07 Mar24SKelly514
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationThomas Ploetz
 
DB2 LUW Access Plan Stability
DB2 LUW Access Plan StabilityDB2 LUW Access Plan Stability
DB2 LUW Access Plan Stabilitydmcmichael
 
Benchmarking Automated Machine Learning For Clustering
Benchmarking Automated Machine Learning For ClusteringBenchmarking Automated Machine Learning For Clustering
Benchmarking Automated Machine Learning For Clusteringbiagiolicari7
 
The metrics that matter using scalability metrics for project planning of a d...
The metrics that matter using scalability metrics for project planning of a d...The metrics that matter using scalability metrics for project planning of a d...
The metrics that matter using scalability metrics for project planning of a d...Mary Chan
 
Business Process Monitoring and Mining
Business Process Monitoring and MiningBusiness Process Monitoring and Mining
Business Process Monitoring and MiningMarlon Dumas
 
What+Is+Six+Sigma
What+Is+Six+SigmaWhat+Is+Six+Sigma
What+Is+Six+SigmaTyg Lucas
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1khairulhuda242
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?Chester Chen
 
261197832 8-performance-tuning-part i
261197832 8-performance-tuning-part i261197832 8-performance-tuning-part i
261197832 8-performance-tuning-part iNaviSoft
 
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
 Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ... Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...Soheila Dehghanzadeh
 
2019 2 testing and verification of vlsi design_verification
2019 2 testing and verification of vlsi design_verification2019 2 testing and verification of vlsi design_verification
2019 2 testing and verification of vlsi design_verificationUsha Mehta
 
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...Borhan Kazimipour
 
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...ORAU
 

Similar to JIST 2014: Optimizing SPARQL Query Processing Based on Query Time/Freshness Requirements (20)

addressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenanceaddressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenance
 
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data Sets
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data SetsApproximate Continuous Query Answering Over Streams and Dynamic Linked Data Sets
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data Sets
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural Networks
 
Six Sigma Presentation Storybd 07 Mar24
Six Sigma Presentation Storybd 07 Mar24Six Sigma Presentation Storybd 07 Mar24
Six Sigma Presentation Storybd 07 Mar24
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
 
DB2 LUW Access Plan Stability
DB2 LUW Access Plan StabilityDB2 LUW Access Plan Stability
DB2 LUW Access Plan Stability
 
Query processing
Query processingQuery processing
Query processing
 
Clinical data eav
Clinical data eavClinical data eav
Clinical data eav
 
Benchmarking Automated Machine Learning For Clustering
Benchmarking Automated Machine Learning For ClusteringBenchmarking Automated Machine Learning For Clustering
Benchmarking Automated Machine Learning For Clustering
 
The metrics that matter using scalability metrics for project planning of a d...
The metrics that matter using scalability metrics for project planning of a d...The metrics that matter using scalability metrics for project planning of a d...
The metrics that matter using scalability metrics for project planning of a d...
 
Business Process Monitoring and Mining
Business Process Monitoring and MiningBusiness Process Monitoring and Mining
Business Process Monitoring and Mining
 
What+Is+Six+Sigma
What+Is+Six+SigmaWhat+Is+Six+Sigma
What+Is+Six+Sigma
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
 
261197832 8-performance-tuning-part i
261197832 8-performance-tuning-part i261197832 8-performance-tuning-part i
261197832 8-performance-tuning-part i
 
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
 Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ... Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
Approximate Continuous Query Answering Over Streams and Dynamic Linked Data ...
 
2019 2 testing and verification of vlsi design_verification
2019 2 testing and verification of vlsi design_verification2019 2 testing and verification of vlsi design_verification
2019 2 testing and verification of vlsi design_verification
 
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
 
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
 

Recently uploaded

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 

Recently uploaded (20)

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 

JIST 2014: Optimizing SPARQL Query Processing Based on Query Time/Freshness Requirements

  • 1. JIST 2014 Optimizing SPARQL Query Processing On Dynamic and Static Data Based on Query Time/Freshness Requirements Using Materialization Soheila Dehghanzadeh, Marcel Karnstedt, Stefan Decker, Josiane Xavier Parreira, Juergen Umbrich and Manfred Hauswirth
  • 2. Outline • Introduction • Terminology • Problem definition • Proposed solution • Experimental results • Conclusion Insight Centre for Data Analytics Slide 2
  • 3. Introduction: Query Processing On Linked Data • Report changes to the local store (maintenance) • sources pro-actively report changes or their existence (pushing). • query processor discover new sources and changes by crawling (pulling). • Fast maintenance leads high quality but slow response and vice versa. • Problem: On-demand maintenance according to response quality requirements. • Why it is important? It eliminates unnecessary maintenance and leads to faster response and better scalability. Replication (database) or Caching (web) Off-line materialization Local Store Query Processor Query Response NEW sources
  • 4. Terminology • Quality requirements: • Freshness B/(A+B) • Completeness B/(B+C) • Maintenance plan • Each set of views chosen for maintenance is called a maintenance plan. • Having n views, number of maintenance plans is 2 𝑛 . • Each maintenance plan leads to a different response quality. 20 October 2014Insight Centre for Data Analytics Slide 4 V1 V2 V3 V4 20% 90% 10% 80%
  • 5. Freshness Example a1 b1 T a2 b2 T a3 b3 F a4 b4 T a5 b5 F 20 October 2014Insight Centre for Data Analytics Slide 5 a1 c1 F a1 c2 F a1 c3 T a2 c4 T a6 c5 F a1 b1 c1 F a1 b1 c2 F a1 b1 c3 T a2 b2 c4 T 60% 40% 50% a1 b1 T a2 b2 T a3 b3 T a4 b4 T a5 b5 T a1 c1 F a1 c2 F a1 c3 T a2 c4 T a6 c5 F a1 b1 c1 F a1 b1 c2 F a1 b1 c3 T a2 b2 c4 T 100% 40% 50% a1 b1 T a2 b2 T a3 b3 F a4 b4 T a5 b5 F a1 c1 T a1 c2 T a1 c3 T a2 c4 T a6 c5 T a1 b1 c1 T a1 b1 c2 T a1 b1 c3 T a2 b2 c4 T 60% 100% 100%
  • 6. Research questions • What is the least costly maintenance plan that fulfils response quality requirements. • What is the quality of response without maintenance? • what is the quality of response of each “maintenance plan”.
  • 7. Experiment • We use BSBM benchmark to create a dataset and a query set. • We label triples with true/false to specify their freshness status. • We summarize the cache to estimate the quality of a query response without actually executing the query on cache. • To summarize the cache we extended the cardinality estimation techniques for freshness estimation problem. Insight Centre for Data Analytics Slide 7 Alice Lives Dublin True Bob Lives Berlin False Alice Job Teacher True Bob Job Developer False
  • 8. Cardinality Estimation • Capture the data distribution by splitting data into buckets and only keep the bucket cardinality in the summary. Insight Centre for Data Analytics Slide 8 Alice Job Teacher Alice Lives Dublin Alice Job PhD student Alice Lives Athlon Bob Job Manager Bob Lives Berlin Bob Lives Chicago Bob Lives Munich Bob Lives Belfast Bob Lives Limerick Bob Job CEO Bob Job Consultant Alice Job * 2 Bob Job * 3 Alice Lives * 2 Bob Lives * 5 * Job * 5 * Lives * 7 Freshness True True False False True True True False False False False False 2 3 1 1 1 2 Q1: ?a Job ?b Q2: (?a Job ?b)^(?a Lives ?c) Estimated Actual 5 5 35 19 Estimated Actual 5 5 19 19 Estimated Actual 2/5 2/5 6/35 3/19 Estimated Actual 2/5 2/5 3/19 3/19
  • 9. Cardinality Estimation Approaches • System R assumptions for cardinality estimation: • data is uniformly distributed per attribute. • join predicates are independent. • Indexing approaches make both assumptions. • Histogram captures the distribution of attributes for more accurate estimation. • Probabilistic Graphical Models captures dependencies among attributes. Insight Centre for Data Analytics Slide 9
  • 10. Measure accuracy of the estimation approach Insight Centre for Data Analytics Slide 10 n is the number of queries Measure the difference between the actual and estimated freshness of queries in a query set.
  • 12. Conclusion • We proposed a new approach for on-demand view maintenance based on the response quality requirements. • We defined quality requirements based on freshness and completeness. • We summarized a synthetic dataset to estimate the freshness of various queries, using indexing and histograms adapted to our freshness estimation problem. • Using probabilistic graphical models to summarize the dataset is future work and promises to reduce the estimation error further.
  • 13. Thanks a lot for your attention! Any questions are welcome!

Editor's Notes

  1. Hi all, and thanks for coming to my presentation. In this work, I’m going to talk about how to optimize SPARQL query processing on static and dynamic data based on the quality requirements of the query response, namely response time and freshness.
  2. The outline of the talk is as follows. First, we give a brief introduction to query processing and the approaches proposed to make it faster. Second, we introduce the terminology of our work. Third, we illustrate the targeted problem with an example. Afterwards, we define the problem. The proposed solution and experimental results are then presented. Finally, we conclude the talk with some directions for future work.
  3. To process queries on Linked Data, the naïve approach is for the query processor to take the query, fetch the relevant data from the original sources, combine it, and return the response to the user. However, fetching data from the original sources takes a lot of time, and if an original source becomes temporarily unavailable, the query processor cannot provide the full response. Enter ------------ To get rid of the availability and latency problems, researchers came up with the idea of offline materialization, which is called replication in the database context and caching in the Web context. The idea is to materialize as much data as possible in a local store and answer queries using only the local store. This provides very fast responses and does not suffer from availability issues. Enter----------- However, if the original sources are updated or new sources become available, the query processor cannot reflect these changes in its responses, so the responses it provides are of low quality. Enter------------ To address this issue, maintenance mechanisms help the query processor compensate for the quality issues. Enter------ However, highly frequent maintenance consumes all computational resources, and queries have to wait for them. Thus, a high-quality response can only be achieved with a long response time, and vice versa. Enter ------ The problem we are targeting here is on-demand maintenance based on the quality requirements of the query response. Enter ------ This problem is important because solving it eliminates unnecessary maintenance, which leads to faster responses and better scalability.
  4. Here we define our terminology. <<<point to the first figure>>> To specify the quality requirements of the response, suppose the shaded circle is the response provided using the local store and the transparent circle is the actual response. These two responses share a set of tuples, represented by “B”. “A” represents the out-of-date tuples returned by the query processor. “C” represents valid tuples that the query processor missed because of the maintenance delay. If “A” is empty, every tuple the query processor returned is valid, but the response may not be complete. If “C” is empty, the query processor has provided a complete response, but the response may not be fully fresh. Therefore, freshness is defined as B divided by A plus B, and completeness as B divided by B plus C. ---------------------- In each maintenance round, the query processor decides to maintain a set of views, which we call a maintenance plan. Given n views, there are 2 to the power n maintenance plans. As we will show on the next slide, each maintenance plan leads to a different response quality. Note that in this work we do not address response completeness and leave it for future work. Therefore, we only deal with response freshness as the response quality requirement.
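Once both the returned and the actual response are known, the two quality measures can be computed directly from them. The following is a minimal Python sketch, assuming responses are represented as sets of hashable tuples; the function names and the convention that an empty set scores 1.0 are illustrative assumptions, not part of the original work.

```python
def freshness(returned: set, actual: set) -> float:
    """Freshness = B / (A + B), where B = returned ∩ actual and A = returned − actual."""
    if not returned:
        return 1.0  # assumption: an empty response contains nothing stale
    return len(returned & actual) / len(returned)


def completeness(returned: set, actual: set) -> float:
    """Completeness = B / (B + C), where C = actual − returned."""
    if not actual:
        return 1.0  # assumption: nothing could have been missed
    return len(returned & actual) / len(actual)


returned = {"t1", "t2", "t3", "t4"}  # response computed from the local store
actual = {"t3", "t4", "t5"}          # response over the current sources
print(freshness(returned, actual), completeness(returned, actual))
```

Here B = {t3, t4}, so two of the four returned tuples are fresh and two of the three actual tuples were found.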
  5. In this example we label fresh tuples with T and stale tuples with F. Suppose we have a join between two mappings. We want to show that different maintenance plans lead to different freshness of the response. <<<point to first row>>> One maintenance plan is not to maintain anything. In that case we need to measure the freshness of the response with the current data in the local store. As we can see in the first row, mapping 1, with 60% freshness, joins with mapping 2, with 40% freshness, and the result is 50% fresh. <<<point to second row>>> In the second row, we show the next maintenance plan, which is to maintain mapping 1. However, the join result is still 50% fresh, because the stale tuples of mapping 2 (a1 c1 and a1 c2) still join with a1 b1. <<<point to third row>>> In the third row, we show another maintenance plan, which is to maintain mapping 2. This time the join result becomes 100% fresh, because the stale tuples of mapping 1 (a3 b3 and a5 b5) have no join partner and therefore do not appear in the result. Therefore, different maintenance plans lead to different response quality.
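The slide 5 example can be reproduced with a few lines of Python. This is a hypothetical sketch that copies the tuples and labels from the slide; it models maintenance simply as marking every tuple of a mapping fresh (as the slide's rows do, leaving the tuples themselves unchanged) and treats a joined tuple as fresh only when both of its input tuples are fresh.

```python
from fractions import Fraction

# Mapping 1 (?a, ?b) and mapping 2 (?a, ?c) with freshness labels, copied from slide 5.
M1 = [("a1", "b1", True), ("a2", "b2", True), ("a3", "b3", False),
      ("a4", "b4", True), ("a5", "b5", False)]
M2 = [("a1", "c1", False), ("a1", "c2", False), ("a1", "c3", True),
      ("a2", "c4", True), ("a6", "c5", False)]


def maintain(mapping):
    # Simplification taken from the slide: maintenance marks every tuple fresh.
    return [(x, y, True) for x, y, _ in mapping]


def join(m1, m2):
    # Join on ?a; a joined tuple is fresh only if both input tuples are fresh.
    return [(a, b, c, f1 and f2)
            for a, b, f1 in m1 for a2, c, f2 in m2 if a == a2]


def freshness(rows):
    # Share of rows whose last field (the freshness label) is True.
    return Fraction(sum(1 for r in rows if r[-1]), len(rows))


plans = {
    "no maintenance": (M1, M2),
    "maintain mapping 1": (maintain(M1), M2),
    "maintain mapping 2": (M1, maintain(M2)),
}
for name, (m1, m2) in plans.items():
    print(name, freshness(m1), freshness(m2), freshness(join(m1, m2)))
```

Running it prints 3/5, 2/5, 1/2 for no maintenance; 1, 2/5, 1/2 after maintaining mapping 1; and 3/5, 1, 1 after maintaining mapping 2, which matches the three rows of the slide.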
  6. The problem we are targeting is to find the least costly maintenance plan that fulfils the response quality requirements. This boils down to two sub-problems: first, estimating the quality of the response provided by the present cache without maintenance; second, estimating the quality of the response for the other maintenance plans. In this work we only target the first sub-problem.
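The overall objective can be stated as a search over all 2^n maintenance plans. The brute-force Python sketch below assumes that a per-view maintenance cost and a freshness estimator for a given plan are available; the function name, the costs, and the estimator table are hypothetical, with the estimator values taken from the join result in the slide 5 example. Exhaustive search is exponential in n, so this only illustrates the objective, not a practical algorithm, and the work itself addresses only the first sub-problem.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Optional, Sequence


def cheapest_plan(
    views: Sequence[str],
    cost: Dict[str, float],
    estimate_freshness: Callable[[FrozenSet[str]], float],
    required: float,
) -> Optional[FrozenSet[str]]:
    """Scan all 2^n maintenance plans and return the cheapest one whose
    estimated response freshness meets the requirement, or None if none does."""
    best: Optional[FrozenSet[str]] = None
    best_cost = float("inf")
    for k in range(len(views) + 1):
        for combo in combinations(views, k):
            plan = frozenset(combo)
            plan_cost = sum(cost[v] for v in plan)
            if plan_cost < best_cost and estimate_freshness(plan) >= required:
                best, best_cost = plan, plan_cost
    return best


# Hypothetical inputs: equal maintenance costs, and join freshness per plan
# taken from the slide 5 example (mapping 1 = "M1", mapping 2 = "M2").
COST = {"M1": 1.0, "M2": 1.0}
JOIN_FRESHNESS = {
    frozenset(): 0.5,
    frozenset({"M1"}): 0.5,
    frozenset({"M2"}): 1.0,
    frozenset({"M1", "M2"}): 1.0,
}

print(cheapest_plan(["M1", "M2"], COST, JOIN_FRESHNESS.__getitem__, required=0.9))
```

With a 90% freshness requirement, maintaining only mapping 2 is the cheapest sufficient plan; with a 50% requirement, the empty plan (no maintenance) already suffices.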
  7. To estimate the quality of the response provided by the present cache, we use the BSBM benchmark generator to generate a dataset and a query set. We label triples with true/false to specify their freshness status. We summarize the cache to estimate the quality of a query response without actually executing the query on the cache. To summarize the cache, we extended cardinality estimation techniques to the freshness estimation problem. On the next slide we show how cardinality estimation methods can be extended for freshness estimation.
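One simple way to obtain such labels, assuming a snapshot of the current source data is available for comparison, is to mark a cached triple fresh exactly when it still appears in the source. The sketch below is illustrative only: the function name and the source triples are hypothetical, chosen so that the labels match the slide 7 example, and the talk does not state how the labels were actually produced.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def label_freshness(cache: Iterable[Triple],
                    source: Set[Triple]) -> List[Tuple[str, str, str, bool]]:
    """Attach a freshness label to every cached triple.

    Assumption: a cached triple is fresh iff it is still present in the source data.
    """
    return [(s, p, o, (s, p, o) in source) for s, p, o in cache]


cache = [
    ("Alice", "Lives", "Dublin"),
    ("Bob", "Lives", "Berlin"),
    ("Alice", "Job", "Teacher"),
    ("Bob", "Job", "Developer"),
]
# Hypothetical current state of the sources (Bob has moved and changed jobs).
source = {
    ("Alice", "Lives", "Dublin"),
    ("Alice", "Job", "Teacher"),
    ("Bob", "Lives", "Munich"),
    ("Bob", "Job", "Manager"),
}

labelled = label_freshness(cache, source)
for triple in labelled:
    print(*triple)
```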
  8. Cardinality estimation approaches try to capture the data distribution by splitting the data into buckets and keeping the bucket cardinalities in the summary. In our example, we summarize the whole dataset into an index that stores the cardinality of individual predicates. To test the summary, we run two queries, Q1 and Q2. This summary provides an accurate estimate for Q1, but to estimate the cardinality of Q2 it multiplies the cardinalities of its triple patterns, 5 × 7 = 35, while the actual response cardinality is 19. So this summary fails to provide a good estimate. Enter------ A more granular summary, however, can provide more accurate estimates. As we can see, the second index, which stores cardinalities per subject and predicate, gives accurate cardinality estimates for both Q1 and Q2. Enter------ Now, to extend cardinality estimation methods to freshness estimation, we add one more column to the summaries that stores the number of fresh entries in addition to the total number of entries in each bucket. Enter---------- As we can see, the first index provides an accurate freshness estimate for Q1 (2/5), but fails to provide a good estimate for Q2 (6/35 instead of 3/19). Enter----------- By storing more granular information, the second index provides accurate freshness estimates for both Q1 and Q2.
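The slide 8 example can be recomputed with the following Python sketch. It is a reconstruction for illustration, not the authors' code: each summary buckets the triples by a key and keeps only a [total, fresh] pair per bucket. For the coarse per-predicate summary, Q2 is estimated by multiplying the two pattern counts, as the slide does; for the per-subject summary, the per-subject products are summed, since only triples with the same subject can join on ?a. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Cached triples from slide 8: (subject, predicate, object, fresh).
CACHE = [
    ("Alice", "Job", "Teacher", True), ("Alice", "Lives", "Dublin", True),
    ("Alice", "Job", "PhD student", False), ("Alice", "Lives", "Athlon", False),
    ("Bob", "Job", "Manager", True), ("Bob", "Lives", "Berlin", True),
    ("Bob", "Lives", "Chicago", True), ("Bob", "Lives", "Munich", False),
    ("Bob", "Lives", "Belfast", False), ("Bob", "Lives", "Limerick", False),
    ("Bob", "Job", "CEO", False), ("Bob", "Job", "Consultant", False),
]


def summarize(cache, key):
    """Group triples into buckets by key(subject, predicate); keep [total, fresh] per bucket."""
    summary = defaultdict(lambda: [0, 0])
    for s, p, _o, fresh in cache:
        bucket = summary[key(s, p)]
        bucket[0] += 1
        bucket[1] += int(fresh)
    return dict(summary)


coarse = summarize(CACHE, lambda s, p: p)         # buckets "* Job *", "* Lives *"
granular = summarize(CACHE, lambda s, p: (s, p))  # buckets "Alice Job *", "Bob Lives *", ...


def estimate_q2_coarse(summary):
    # Q2 = (?a Job ?b) AND (?a Lives ?c): multiply the pattern counts, as on the slide.
    job_total, job_fresh = summary["Job"]
    lives_total, lives_fresh = summary["Lives"]
    total = job_total * lives_total
    return total, Fraction(job_fresh * lives_fresh, total)


def estimate_q2_granular(summary):
    # Sum per-subject products of the two patterns' total and fresh counts.
    total = fresh = 0
    for subject in {s for s, _ in summary}:
        job_total, job_fresh = summary.get((subject, "Job"), (0, 0))
        lives_total, lives_fresh = summary.get((subject, "Lives"), (0, 0))
        total += job_total * lives_total
        fresh += job_fresh * lives_fresh
    return total, Fraction(fresh, total)


def actual_q2(cache):
    # Execute the join on the cache; a result row is fresh only if both triples are fresh.
    job = [(s, f) for s, p, _o, f in cache if p == "Job"]
    lives = [(s, f) for s, p, _o, f in cache if p == "Lives"]
    rows = [f1 and f2 for s1, f1 in job for s2, f2 in lives if s1 == s2]
    return len(rows), Fraction(sum(rows), len(rows))


print(estimate_q2_coarse(coarse))
print(estimate_q2_granular(granular))
print(actual_q2(CACHE))
```

The coarse summary yields 35 rows with freshness 6/35 for Q2, while the granular one yields 19 rows with freshness 3/19, matching the actual join result.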
  9. To summarize the underlying data for cardinality estimation, the original System R made two simplifying assumptions: first, that data is uniformly distributed per attribute; second, that join predicates are independent. Indexing approaches make both assumptions to simplify the summarization and estimation process. However, these assumptions rarely hold in real datasets. Researchers therefore came up with histograms to address the uniform-distribution assumption, and with probabilistic graphical models we can build summaries that address the join-predicate independence assumption. In this paper we extended the indexing and histogram cardinality estimation methods to freshness estimation, following the procedure explained on the previous slide.
  10. To measure the accuracy of an estimation approach, we use the root mean square deviation (RMSD). That is, we sum the squared differences between the estimated and actual freshness over all queries, divide by the number of queries, and take the square root, which gives a single number quantifying the error of that method.
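A minimal Python version of this error measure follows, assuming estimated and actual freshness are given as two equal-length sequences of per-query values. The function name is hypothetical, and the usage lines plug in the two queries of the slide 8 example, which serve only as an illustration and are not the experimental query set.

```python
import math
from typing import Sequence


def rmsd(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square deviation between estimated and actual freshness over n queries."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(estimated)
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n)


# Q1 and Q2 from slide 8: (estimated, actual) freshness for each summary.
coarse_error = rmsd([2 / 5, 6 / 35], [2 / 5, 3 / 19])
granular_error = rmsd([2 / 5, 3 / 19], [2 / 5, 3 / 19])
print(coarse_error, granular_error)
```

Squaring the differences penalizes a few large misestimates more than many small ones, and taking the root returns the error to the same scale as freshness itself.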
  11. The results show that the indexing approach achieves a very low estimation error and a small storage footprint at the same time. Histograms, however, can only provide a lower estimation error with a very large summary. We believe that most of the estimation error on our queries is caused by join dependencies, which histograms do not address. We therefore hope to reduce the estimation error further by using probabilistic graphical models in future work.