Addressing Time/Quality Trade-off in View Maintenance
Soheila Dehghanzadeh
Outline
• Introduction
• Terminology
• Problem definition
• Proposed solution
• Experimental results
• Conclusion
Insight Centre for Data Analytics Slide 2
Introduction: Query Processing On Linked Data
• Report changes to the local store (maintenance)
• Sources proactively report changes or their existence (pushing).
• The query processor discovers new sources and changes by crawling (pulling).
• Maintenance trade-off
• More maintenance leads to higher quality but slower responses, and vice versa.
• Problem: maintenance according to a user-defined trade-off.
• Why is it important? It eliminates unnecessary maintenance, leading to faster responses and better scalability.
[Diagram: new sources are materialized off-line into a local store (replication in databases, caching on the web); the query processor answers queries from the local store and returns the response. Labels: scalability, availability, performance.]
View Maintenance Categorization
Time/quality trade-off management vs. change reporting mechanism:

                    Query level                        Replica level
                    Quality          Time              Quality          Time
                    constraint       constraint        constraint       constraint
Update stream       A                B                 C                D
No update stream    E                F                 G                H
Problem Definition
• Problem E
• Optimizing maintenance to satisfy quality constraints within the
lowest response time for each query.
• Problem F
• Optimizing maintenance to satisfy time constraints with the highest
response quality for each query.
Terminology
• Quality requirements:
• Freshness B/(A+B)
• Completeness B/(B+C)
• Maintenance plan
• Each set of views chosen for maintenance is called a maintenance
plan.
• With n views, the number of maintenance plans is 2^n.
• Each maintenance plan leads to a different response quality.
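As a rough illustration of why the number of plans grows as 2^n, the sketch below enumerates every subset of a set of views. The view names V1 to V4 follow the example on this slide; the function name `maintenance_plans` is hypothetical and only serves the illustration.

```python
from itertools import combinations

def maintenance_plans(views):
    """Yield every subset of `views`; each subset is one maintenance plan."""
    for size in range(len(views) + 1):
        for plan in combinations(views, size):
            yield frozenset(plan)

plans = list(maintenance_plans(["V1", "V2", "V3", "V4"]))
print(len(plans))  # 16, i.e. 2 ** 4
```

The empty plan (no maintenance) and the full plan (maintain every view) are both included, which is why the count is exactly 2^n.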
V1 V2 V3 V4
20% 90% 10% 80%
Freshness Example
(T = fresh, F = stale; the percentage under each table is its freshness.)

State 1:
  First view    Second view   Join
  a1 b1 T       a1 c1 F       a1 b1 c1 F
  a2 b2 T       a1 c2 F       a1 b1 c2 F
  a3 b3 F       a1 c3 T       a1 b1 c3 T
  a4 b4 T       a2 c4 T       a2 b2 c4 T
  a5 b5 F       a6 c5 F
  60%           40%           50%

State 2 (all entries of the first view fresh):
  First view    Second view   Join
  a1 b1 T       a1 c1 F       a1 b1 c1 F
  a2 b2 T       a1 c2 F       a1 b1 c2 F
  a3 b3 T       a1 c3 T       a1 b1 c3 T
  a4 b4 T       a2 c4 T       a2 b2 c4 T
  a5 b5 T       a6 c5 F
  100%          40%           50%

State 3 (all entries of the second view fresh):
  First view    Second view   Join
  a1 b1 T       a1 c1 T       a1 b1 c1 T
  a2 b2 T       a1 c2 T       a1 b1 c2 T
  a3 b3 F       a1 c3 T       a1 b1 c3 T
  a4 b4 T       a2 c4 T       a2 b2 c4 T
  a5 b5 F       a6 c5 T
  60%           100%          100%
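The numbers in this example can be reproduced with a small sketch. It assumes, consistently with the tables, that a joined row is fresh only when both contributing rows are fresh; the helper names (`freshness`, `join_on_first`, `refresh`) are hypothetical.

```python
def freshness(flags):
    """Fraction of entries labelled fresh (True)."""
    flags = list(flags)
    return sum(flags) / len(flags)

def join_on_first(left, right):
    """Join two views on their first column.
    Assumption: a joined row is fresh only if both input rows are fresh."""
    return [
        (a, b, c, lf and rf)
        for (a, b, lf) in left
        for (a2, c, rf) in right
        if a == a2
    ]

def refresh(view):
    """Simulate maintaining a view: every entry becomes fresh."""
    return [(x, y, True) for (x, y, _) in view]

view_ab = [("a1", "b1", True), ("a2", "b2", True), ("a3", "b3", False),
           ("a4", "b4", True), ("a5", "b5", False)]
view_ac = [("a1", "c1", False), ("a1", "c2", False), ("a1", "c3", True),
           ("a2", "c4", True), ("a6", "c5", False)]

def join_freshness(left, right):
    return freshness(row[3] for row in join_on_first(left, right))

print(freshness(r[2] for r in view_ab))           # 0.6
print(freshness(r[2] for r in view_ac))           # 0.4
print(join_freshness(view_ab, view_ac))           # 0.5
print(join_freshness(refresh(view_ab), view_ac))  # 0.5
print(join_freshness(view_ab, refresh(view_ac)))  # 1.0
```

In this data, maintaining the first view leaves the join freshness unchanged, while maintaining the second view raises it to 100%, which is why the choice of maintenance plan matters.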
Research questions
• What is the least costly maintenance plan that fulfills the response quality requirements?
• What is the quality of the response without maintenance?
• What is the quality of the response under each maintenance plan?
Experiment
• We use the BSBM benchmark to create a dataset and a query set.
• We label triples with true/false to specify their freshness status.
• We summarize the cache to estimate the quality of a query response without actually executing the query on the cache.
• To summarize the cache, we extended cardinality estimation techniques to the freshness estimation problem.
  Subject  Predicate  Object     Fresh
  Alice    Lives      Dublin     True
  Bob      Lives      Berlin     False
  Alice    Job        Teacher    True
  Bob      Job        Developer  False
Cardinality Estimation
• Capture the data distribution by splitting data into buckets
and only keep the bucket cardinality in the summary.
Data:
  Subject  Predicate  Object       Fresh
  Alice    Job        Teacher      True
  Alice    Lives      Dublin       True
  Alice    Job        PhD student  False
  Alice    Lives      Athlon       False
  Bob      Job        Manager      True
  Bob      Lives      Berlin       True
  Bob      Lives      Chicago      True
  Bob      Lives      Munich       False
  Bob      Lives      Belfast      False
  Bob      Lives      Limerick     False
  Bob      Job        CEO          False
  Bob      Job        Consultant   False

Summary:
  Bucket          Cardinality  Fresh
  Alice Job *     2            1
  Bob Job *       3            1
  Alice Lives *   2            1
  Bob Lives *     5            2
  * Job *         5            2
  * Lives *       7            3

Queries:
  Q1: ?a Job ?b
  Q2: (?a Job ?b)^(?a Lives ?c)

Cardinality:
                                     Q1                   Q2
                                     Estimated  Actual    Estimated  Actual
  (*, predicate, *) buckets          5          5         35         19
  (subject, predicate, *) buckets    5          5         19         19

Freshness:
                                     Q1                   Q2
                                     Estimated  Actual    Estimated  Actual
  (*, predicate, *) buckets          2/5        2/5       6/35       3/19
  (subject, predicate, *) buckets    2/5        2/5       3/19       3/19
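The estimates for Q2 can be reproduced with a minimal sketch of a bucket summary. It assumes each bucket stores a cardinality and a fresh count, and that a join estimate multiplies the counts of the paired buckets; the function names and the bucket key layout are hypothetical, not the authors' implementation.

```python
from collections import defaultdict

TRIPLES = [
    ("Alice", "Job", "Teacher", True), ("Alice", "Lives", "Dublin", True),
    ("Alice", "Job", "PhD student", False), ("Alice", "Lives", "Athlon", False),
    ("Bob", "Job", "Manager", True), ("Bob", "Lives", "Berlin", True),
    ("Bob", "Lives", "Chicago", True), ("Bob", "Lives", "Munich", False),
    ("Bob", "Lives", "Belfast", False), ("Bob", "Lives", "Limerick", False),
    ("Bob", "Job", "CEO", False), ("Bob", "Job", "Consultant", False),
]

def summarize(triples, key):
    """Map each bucket key to [cardinality, fresh count]."""
    buckets = defaultdict(lambda: [0, 0])
    for s, p, _o, fresh in triples:
        bucket = buckets[key(s, p)]
        bucket[0] += 1
        bucket[1] += int(fresh)
    return buckets

def estimate_join(summary, bucket_pairs):
    """Estimate (cardinality, fresh count) of a subject join by
    multiplying the counts of each pair of buckets and summing."""
    card = sum(summary[l][0] * summary[r][0] for l, r in bucket_pairs)
    fresh = sum(summary[l][1] * summary[r][1] for l, r in bucket_pairs)
    return card, fresh

def actual_join(triples):
    """Execute Q2 on the data: join Job and Lives triples on the subject."""
    jobs = [(s, f) for s, p, _o, f in triples if p == "Job"]
    lives = [(s, f) for s, p, _o, f in triples if p == "Lives"]
    rows = [jf and lf for js, jf in jobs for ls, lf in lives if js == ls]
    return len(rows), sum(rows)

coarse = summarize(TRIPLES, key=lambda s, p: ("*", p))
fine = summarize(TRIPLES, key=lambda s, p: (s, p))

coarse_est = estimate_join(coarse, [(("*", "Job"), ("*", "Lives"))])
fine_est = estimate_join(
    fine, [((s, "Job"), (s, "Lives")) for s in ("Alice", "Bob")]
)
print(coarse_est)            # (35, 6)  -> freshness 6/35
print(fine_est)              # (19, 3)  -> freshness 3/19
print(actual_join(TRIPLES))  # (19, 3)
```

Keeping the subject in the bucket key removes the error here because the join is on the subject, so the per-subject products add up to the exact result.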
Cardinality Estimation Approaches
• System R assumptions for cardinality estimation:
• Data is uniformly distributed per attribute.
• Predicates are independent (either in the same table or among different tables).
• Predicate multiplication approaches make both assumptions.
• Histograms capture the dependencies among predicates for more accurate estimation.
Measure accuracy of the estimation approach
Measure the difference between the actual and estimated freshness of the queries in a query set, where n is the number of queries.
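One common way to measure such a difference is the mean absolute error over the n queries. Whether the slides use exactly this metric is an assumption; the sketch below is only meant to make the idea concrete, using the Q1 and Q2 freshness values from the cardinality estimation example.

```python
from fractions import Fraction

def mean_absolute_error(actual, estimated):
    """Average of |actual - estimated| over the n queries of a query set.
    Assumption: this metric is illustrative, not necessarily the one used."""
    if len(actual) != len(estimated):
        raise ValueError("both lists must cover the same queries")
    n = len(actual)
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / n

actual = [Fraction(2, 5), Fraction(3, 19)]      # Q1, Q2 actual freshness
estimated = [Fraction(2, 5), Fraction(6, 35)]   # Q1, Q2 predicate-level estimate
print(mean_absolute_error(actual, estimated))   # 9/1330
```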
Estimation Results
Estimation Error 1
Query: (?s Job ?o1) join (?s Lives in ?o2)

Data (T = fresh, F = stale):
  a  Job       teacher    T
  a  Job       professor  F
  a  Job       PhD        F
  b  Job       developer  T
  a  Lives in  Dublin     T
  b  Lives in  Galway     F
  b  Lives in  Cork       T
  b  Lives in  Limerick   T

Join result:
  a  teacher    Dublin    T
  a  Professor  Dublin    F
  a  PhD        Dublin    F
  b  Developer  Galway    F
  b  Developer  Cork      T
  b  Developer  Limerick  T

Summary:
  ?s, Job, ?o        50%
  ?s, Lives in, ?o   75%

Join freshness estimated from the summary: 37.5%
Actual join freshness in the data: 50%

Reason: dependencies.
Solution:
• A more granular index on the join dimension (subject) and the bounded dimension (predicate).
• Histograms and table-level synopses can capture these dependencies and reduce the error accordingly.
Experiment: We did not observe this error in our experiment because the dataset contained no such dependencies.
Estimation Error 2
20 October 2014, Insight Centre for Data Analytics Slide 15
Query: (?s Job Developer) join (?s Lives in ?o2)

Data (T = fresh, F = stale):
  a  Job       teacher    T
  a  Job       professor  F
  a  Job       PhD        F
  b  Job       developer  T
  a  Lives in  Dublin     T
  b  Lives in  Galway     F
  b  Lives in  Cork       T
  b  Lives in  Limerick   T

Join result:
  b  Developer  Galway    F
  b  Developer  Cork      T
  b  Developer  Limerick  T

Summary:
  ?s, Job, ?o1        50%
  ?s, Lives in, ?o2   75%

Join freshness estimated from the summary: 37.5%
Actual join freshness in the data: 66%

Reason: bounded object.
Solution:
• A more granular index on the join dimension (subject) and the bounded dimensions (predicate and object). This amounts to indexing the whole dataset, which is not efficient.
Experiment: Using a histogram did not improve this error.
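Both estimation errors can be checked numerically with a short sketch over the example data. It assumes the predicate-level summary estimates join freshness as the product of the per-pattern freshness values, and that a joined row is fresh only if both inputs are fresh; the helper names are hypothetical, and object values are written in lower case as in the data rows.

```python
job = [("a", "teacher", True), ("a", "professor", False),
       ("a", "PhD", False), ("b", "developer", True)]
lives = [("a", "Dublin", True), ("b", "Galway", False),
         ("b", "Cork", True), ("b", "Limerick", True)]

def freshness(rows):
    """Fraction of rows whose last field (the freshness flag) is True."""
    return sum(row[-1] for row in rows) / len(rows)

def join_on_subject(left, right):
    """Join on the subject; a result row is fresh only if both inputs are."""
    return [
        (s, o1, o2, f1 and f2)
        for s, o1, f1 in left
        for s2, o2, f2 in right
        if s == s2
    ]

# Predicate-level summary: 50% * 75%, regardless of any bound object.
summary_estimate = freshness(job) * freshness(lives)

# Error 1: unbound objects, the error comes from dependencies on the subject.
actual_1 = freshness(join_on_subject(job, lives))

# Error 2: the object of the Job pattern is bound to "developer".
developer_only = [row for row in job if row[1] == "developer"]
actual_2 = freshness(join_on_subject(developer_only, lives))

print(summary_estimate)  # 0.375
print(actual_1)          # 0.5
print(round(actual_2, 3))  # 0.667
```

The summary yields the same 37.5% for both queries because it cannot see which object is bound, whereas the actual freshness moves from 50% to about 66%.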
Concern 1 on problem definition
Four cache states, from fully fresh to mostly stale:

Job view:
                           State 1  State 2  State 3  State 4
  Bob    Job  Teacher      True     True     True     False
  Bob    Job  PhD          True     False    False    False
  Alice  Job  Professor    True     True     False    False
  Freshness                100%     66%      33%      0%

Lives in view:
                              State 1  State 2  State 3  State 4
  Bob    Lives in  Limerick   True     True     True     False
  Bob    Lives in  Galway     True     True     False    False
  Alice  Lives in  Dublin     True     True     True     True
  Alice  Lives in  Cork       True     False    False    False
  Freshness                   100%     75%      50%      25%

Join:
                               State 1  State 2  State 3  State 4
  Bob    Teacher    Limerick   True     True     True     False
  Bob    Teacher    Galway     True     True     False    False
  Bob    PhD        Limerick   True     False    False    False
  Bob    PhD        Galway     True     False    False    False
  Alice  Professor  Dublin     True     True     False    False
  Alice  Professor  Cork       True     False    False    False
  Freshness                    100%     50%      16%      0%

Other labels on the slide: True, False, True, True, False; 66%
Concern 2 on the suggested solution
• We need to build one summary for each maintenance plan, because the summary of one maintenance plan cannot be used to estimate the freshness of a query executed on another maintenance plan.
• This is very inefficient given the space requirements and the cost of maintaining these summaries.
Conclusion
• We defined quality constraints based on freshness and completeness.
• We summarized a snapshot of a dataset to estimate the freshness of various queries, using indexing and histograms for our freshness estimation problem.
• We need to build an individual summary for each maintenance plan, since a summary for one maintenance plan cannot be used to estimate the quality of a query executed on other maintenance plans.
• Our experiment did not exhibit the estimation error caused by dependencies, because the dataset contains no such dependencies. The next step is to design a more realistic dataset and again compare the results of histograms and predicate multiplication.
• Summarization techniques are designed for a very static environment, and any change to the underlying data requires rebuilding the summary from scratch. So does it really make sense to extend cardinality estimation to freshness estimation?
Problem Definition
• Problem E
• Optimizing maintenance to satisfy quality constraints within the
lowest response time for each query.
• Problem F
• Optimizing maintenance to satisfy time constraints with the highest
response quality for each query.
Problem description without join
• The user queries the replica under time constraints.
• The replica should maintain only the subset of the result that is most likely to have expired.
Scenario
[Diagram: a window over the stream data is joined with a replica of the background data.]
Use Case
• Stream data: the Twitter stream; the window holds the number of mentions in the last Twitter window.
• Background data: user follower counts, held in the replica.
• Rising stars query: find users who have been mentioned more than 100 times in the last 10 minutes and have more than 1000 followers, with a constraint on the execution time.
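A minimal sketch of the rising stars query as a join between the stream window and the replica. The data structures (a list of mentioned user names for the window, a dict of follower counts for the replica) and the function name `rising_stars` are assumptions for illustration; the execution-time constraint is not modelled here.

```python
from collections import Counter

def rising_stars(window_mentions, follower_replica,
                 min_mentions=100, min_followers=1000):
    """Users mentioned more than `min_mentions` times in the window whose
    follower count in the local replica exceeds `min_followers`."""
    mention_counts = Counter(window_mentions)
    return sorted(
        user
        for user, mentions in mention_counts.items()
        if mentions > min_mentions
        and follower_replica.get(user, 0) > min_followers
    )

# Hypothetical 10-minute window and replica contents.
window = ["alice"] * 150 + ["bob"] * 101 + ["carol"] * 20
replica = {"alice": 5000, "bob": 900, "carol": 20000}
print(rising_stars(window, replica))  # ['alice']
```

The answer is only as correct as the follower counts in the replica, which is where the maintenance policy comes in: a stale count for a user near the 1000-follower threshold can flip the result.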
Continuous join operator with one replica
• We implemented a set of continuous join operators:
• DWJoin: uses the static replica and never changes it (the quality of the response degrades).
• Baseline join: chooses the least recently updated (LRU) entries from the set of matches for updating (the least recently updated entry is not necessarily the one that requires updating).
• Oracle join: fetches data directly from the source.
• Smart join: computes statistics of the change rate and chooses the entries that are likely to be expired for fetching.
• Mixed baseline-smart (possible extensions).
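The difference between the baseline and smart policies can be sketched as two ways of spending a fixed refresh budget on the matching replica entries. The scoring rule used for the smart policy (change rate multiplied by time since the last update) is an assumption chosen for illustration, as are all names and parameters; the slides only state that change-rate statistics are used.

```python
import heapq

def baseline_policy(matches, last_updated, budget):
    """Refresh the `budget` least recently updated entries among the matches."""
    return heapq.nsmallest(budget, matches, key=lambda k: last_updated[k])

def smart_policy(matches, last_updated, change_rate, now, budget):
    """Refresh the `budget` entries most likely to be expired.
    Assumption: likelihood is approximated by the expected number of
    changes since the last update, change_rate * elapsed time."""
    def expected_changes(k):
        return change_rate[k] * (now - last_updated[k])
    return heapq.nlargest(budget, matches, key=expected_changes)

# Hypothetical replica entries matching the current window.
matches = ["u1", "u2", "u3"]
last_updated = {"u1": 0, "u2": 5, "u3": 8}          # timestamps
change_rate = {"u1": 0.01, "u2": 0.5, "u3": 2.0}    # changes per time unit

print(baseline_policy(matches, last_updated, budget=1))                    # ['u1']
print(smart_policy(matches, last_updated, change_rate, now=10, budget=1))  # ['u3']
```

Here the oldest entry `u1` rarely changes, so refreshing it wastes the budget, while the recently updated but fast-changing `u3` is the better candidate.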
Performance of join operators
Possible extensions
• The problem becomes more complicated when the query is a join between replicas.
• Updating which combination of entries yields the highest increase in the correctness of the join result?
[Diagram: join between two replicas.]
Future works
• Use a better model for learning the change rate in the smart policy.
• We believe that the smart policy will perform better if the change rate is more predictable.
• Investigate the problem where there are joins on the background knowledge side, to learn which combination of stale entries will contribute most to the correctness of the result if they are updated.
Thanks a lot for your attention !

 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 

Addressing Time/Quality Trade-off in View Maintenance

  • 1. Addressing Time/Quality Trade-off in View Maintenance Soheila Dehghanzadeh
  • 2. Outline • Introduction • Terminology • Problem definition • Proposed solution • Experimental results • Conclusion Insight Centre for Data Analytics Slide 2
  • 3. Introduction: Query Processing On Linked Data • Report changes to the local store (maintenance) • Sources pro-actively report changes or their existence (pushing). • The query processor discovers new sources and changes by crawling (pulling). • Maintenance trade-off • Frequent maintenance leads to high quality but slow response, and vice versa. • Problem: Maintenance according to a user-defined trade-off. • Why is it important? It eliminates unnecessary maintenance and leads to faster response and better scalability. Diagram labels: Replication (database) or Caching (web); off-line materialization; local store; query processor; query; response; new sources; scalability, availability, performance.
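  • 4. View Maintenance Categorization • Trade-off management vs. change reporting mechanism. • Time/quality trade-off at query level: with an update stream, A (quality constraint) and B (time constraint); with no update stream, E (quality constraint) and F (time constraint). • Time/quality trade-off at replica level: with an update stream, C (quality constraint) and D (time constraint); with no update stream, G (quality constraint) and H (time constraint).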
  • 5. Problem Definition • Problem E • Optimizing maintenance to satisfy quality constraints within the lowest response time for each query. • Problem F • Optimizing maintenance to satisfy time constraints with the highest response quality for each query.
  • 6. Terminology • Quality requirements: • Freshness: B/(A+B) • Completeness: B/(B+C) • Maintenance plan • Each set of views chosen for maintenance is called a maintenance plan. • With n views, the number of maintenance plans is 2^n. • Each maintenance plan leads to a different response quality. Views and freshness: V1 20%, V2 90%, V3 10%, V4 80%.
  • 7. Freshness Example (T = fresh, F = stale) • Row 1: mapping 1 = {a1 b1 T, a2 b2 T, a3 b3 F, a4 b4 T, a5 b5 F}, 60% fresh; mapping 2 = {a1 c1 F, a1 c2 F, a1 c3 T, a2 c4 T, a6 c5 F}, 40% fresh; join = {a1 b1 c1 F, a1 b1 c2 F, a1 b1 c3 T, a2 b2 c4 T}, 50% fresh. • Row 2: mapping 1 = {a1 b1 T, a2 b2 T, a3 b3 T, a4 b4 T, a5 b5 T}, 100% fresh; mapping 2 = {a1 c1 F, a1 c2 F, a1 c3 T, a2 c4 T, a6 c5 F}, 40% fresh; join = {a1 b1 c1 F, a1 b1 c2 F, a1 b1 c3 T, a2 b2 c4 T}, 50% fresh. • Row 3: mapping 1 = {a1 b1 T, a2 b2 T, a3 b3 F, a4 b4 T, a5 b5 F}, 60% fresh; mapping 2 = {a1 c1 T, a1 c2 T, a1 c3 T, a2 c4 T, a6 c5 T}, 100% fresh; join = {a1 b1 c1 T, a1 b1 c2 T, a1 b1 c3 T, a2 b2 c4 T}, 100% fresh.
  • 8. Research questions • What is the least costly maintenance plan that fulfills the response quality requirements? • What is the quality of the response without maintenance? • What is the quality of the response for each “maintenance plan”?
  • 9. Experiment • We use the BSBM benchmark to create a dataset and a query set. • We label triples with true/false to specify their freshness status. • We summarize the cache to estimate the quality of a query response without actually executing the query on the cache. • To summarize the cache, we extended cardinality estimation techniques to the freshness estimation problem. Example labeled triples: Alice Lives Dublin (True); Bob Lives Berlin (False); Alice Job Teacher (True); Bob Job Developer (False).
  • 10. Cardinality Estimation • Capture the data distribution by splitting data into buckets and only keep the bucket cardinality in the summary. • Data (triple, freshness): Alice Job Teacher (True); Alice Lives Dublin (True); Alice Job PhD student (False); Alice Lives Athlon (False); Bob Job Manager (True); Bob Lives Berlin (True); Bob Lives Chicago (True); Bob Lives Munich (False); Bob Lives Belfast (False); Bob Lives Limerick (False); Bob Job CEO (False); Bob Job Consultant (False). • Predicate-level summary (cardinality, fresh): * Job * (5, 2); * Lives * (7, 3). • Subject-predicate summary (cardinality, fresh): Alice Job * (2, 1); Bob Job * (3, 1); Alice Lives * (2, 1); Bob Lives * (5, 2). • Queries: Q1: ?a Job ?b; Q2: (?a Job ?b)^(?a Lives ?c). • Cardinality, estimated vs. actual: predicate-level summary gives Q1 5 vs. 5 and Q2 35 vs. 19; subject-predicate summary gives Q1 5 vs. 5 and Q2 19 vs. 19. • Freshness, estimated vs. actual: predicate-level summary gives Q1 2/5 vs. 2/5 and Q2 6/35 vs. 3/19; subject-predicate summary gives Q1 2/5 vs. 2/5 and Q2 3/19 vs. 3/19.
  • 11. Cardinality Estimation Approaches • System R assumptions for cardinality estimation: • Data is uniformly distributed per attribute. • Predicates are independent (either in the same table or among different tables). • Predicate multiplication approaches make both assumptions. • Histograms capture the dependencies among predicates for more accurate estimation.
  • 12. Measure accuracy of the estimation approach • Measure the difference between the actual and estimated freshness of queries in a query set, using the root mean square deviation: sqrt((1/n) * sum over queries of (estimated freshness - actual freshness)^2). • n is the number of queries.
  • 14. Estimation Error 1 • Data: a Job teacher T; a Job professor F; a Job PhD F; b Job developer T; a Lives in Dublin T; b Lives in Galway F; b Lives in Cork T; b Lives in Limerick T. • Summary: ?s, Job, ?o 50%; ?s, Lives in, ?o 75%. • Query: <?s,Job,?o1> join <?s, Lives in,?o2>. • Join result: a teacher Dublin T; a Professor Dublin F; a PhD Dublin F; b Developer Galway F; b Developer Cork T; b Developer Limerick T. • Freshness estimated from the summary: 37.5%; actual: 50%. • Reason: Dependencies. • Solution: • A more granular index on the join dimension (subject) and the bounded dimension (predicate). • Histogram and table-level synopses can capture these dependencies and reduce the error accordingly. • Experiment: We did not observe this error in our experiment because we didn’t have such dependencies in the dataset.
  • 15. Estimation Error 2 (20 October 2014) • Data: a Job teacher T; a Job professor F; a Job PhD F; b Job developer T; a Lives in Dublin T; b Lives in Galway F; b Lives in Cork T; b Lives in Limerick T. • Summary: ?s, Job, ?o1 50%; ?s, Lives in, ?o2 75%. • Query: <?s,Job,Developer> join <?s, Lives in,?o2>. • Join result: b Developer Galway F; b Developer Cork T; b Developer Limerick T. • Freshness estimated from the summary: 37.5%; actual: 66%. • Reason: bounded object. • Solution: • A more granular index on the join dimension (subject) and the bounded dimensions (predicate and object), which means indexing the whole dataset and is not efficient. • Experiment: We did not observe any improvement on this error by using histograms.
  • 16. Concern 1 on problem definition • Case 1: Job = {Bob Teacher True, Bob PhD True, Alice Professor True}, 100% fresh; Lives in = {Bob Limerick True, Bob Galway True, Alice Dublin True, Alice Cork True}, 100% fresh; join = {Bob Teacher Limerick True, Bob Teacher Galway True, Bob PhD Limerick True, Bob PhD Galway True, Alice Professor Dublin True, Alice Professor Cork True}, 100% fresh. • Case 2: Job = {Bob Teacher True, Bob PhD False, Alice Professor True}, 66% fresh; Lives in = {Bob Limerick True, Bob Galway True, Alice Dublin True, Alice Cork False}, 75% fresh; join = {Bob Teacher Limerick True, Bob Teacher Galway True, Bob PhD Limerick False, Bob PhD Galway False, Alice Professor Dublin True, Alice Professor Cork False}, 50% fresh. • Case 3: Job = {Bob Teacher True, Bob PhD False, Alice Professor False}, 33% fresh; Lives in = {Bob Limerick True, Bob Galway False, Alice Dublin True, Alice Cork False}, 50% fresh; join = {Bob Teacher Limerick True, Bob Teacher Galway False, Bob PhD Limerick False, Bob PhD Galway False, Alice Professor Dublin False, Alice Professor Cork False}, 16% fresh. • Case 4: Job = {Bob Teacher False, Bob PhD False, Alice Professor False}, 0% fresh; Lives in = {Bob Limerick False, Bob Galway False, Alice Dublin True, Alice Cork False}, 25% fresh; join = {Bob Teacher Limerick False, Bob Teacher Galway False, Bob PhD Limerick False, Bob PhD Galway False, Alice Professor Dublin False, Alice Professor Cork False}, 0% fresh. True False True True False 66%
  • 17. Concern 2 on the suggested solution • We need to build one summary for each maintenance plan, because the summary of one maintenance plan cannot be used to estimate the freshness of a query executed on another maintenance plan. • This is very inefficient given the space requirements and the cost of maintaining these summaries.
  • 18. Conclusion • We defined quality constraints based on freshness and completeness. • We summarized a snapshot of a dataset to estimate the freshness of various queries, using indexing and histograms for our freshness estimation problem. • We need to build an individual summary for each maintenance plan, since a summary for one maintenance plan cannot be used to estimate the quality of a query executed on other maintenance plans. • Our experiment was not affected by estimation errors caused by dependencies, because the dataset contained no such dependencies. The next step is to design a more realistic dataset and compare the results of histogram and predicate multiplication again. • Summarization techniques are designed for a very static environment, and any change to the underlying data requires rebuilding the summary from scratch. So does it really make sense to extend cardinality estimation to freshness estimation?
  • 19. Problem Definition • Problem E • Optimizing maintenance to satisfy quality constraints within the lowest response time for each query. • Problem F • Optimizing maintenance to satisfy time constraints with the highest response quality for each query.
  • 20. Problem description without join • The user queries the replica with time constraints. • The replica should maintain only the subset of the result that is more likely to have expired. Diagram label: Replica.
  • 21. Scenario • Diagram labels: Stream Data; Background Data; Window; Replica.
  • 22. Use Case • Diagram labels: Twitter Stream Data (number of mentions in the last Twitter window); Background Data (user follower count); Replica; Raising stars. • Query: find users who have been mentioned more than 100 times in the last 10 minutes and have more than 1000 followers, with a constraint on the execution time.
  • 23. Continuous join operator with one replica • We implemented a set of continuous join operators: • DWJoin: uses the static replica and never changes it (the quality of the response degrades). • Baseline join: uses a least-recently-updated (LRU) policy to choose which of the matching entries to update (the least recently updated entry is not necessarily the one that needs updating). • Oracle join: fetches data directly from the source. • Smart join: computes statistics on the change rate and fetches the entries that are likely to have expired. • Mixed baseline-smart (possible extension).
  • 24. Performance of join operators
  • 25. Possible extensions • The problem becomes complicated when the query is a join between replicas. • Updating which combination of entries incurs the highest increase in join update? Diagram labels: Replica; Replica.
  • 26. Future work • Use a better model for learning the change rate in the smart policy. • We believe that the smart policy will perform better if the change rate is more predictable. • Investigate the problem where there are joins on the background knowledge side, to learn which combination of stale entries contributes most to the correctness of the result once updated.
  • 27. Thanks a lot for your attention!
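
The freshness example on slide 7 can be reproduced with a few lines of Python. This is only a sketch under stated assumptions, not the author's implementation: the names `MAPPING_1`, `MAPPING_2`, `maintain`, `join` and `evaluate_plans` are hypothetical, maintaining a view is assumed to make every row fresh without adding or removing rows, and a join row is assumed to be fresh only when both contributing rows are fresh (which matches the percentages on the slide).

```python
from itertools import combinations

# Slide 7 data: the last field is the freshness flag (True = fresh, False = stale).
MAPPING_1 = [("a1", "b1", True), ("a2", "b2", True), ("a3", "b3", False),
             ("a4", "b4", True), ("a5", "b5", False)]
MAPPING_2 = [("a1", "c1", False), ("a1", "c2", False), ("a1", "c3", True),
             ("a2", "c4", True), ("a6", "c5", False)]


def freshness(rows):
    """Fraction of rows flagged fresh, i.e. B / (A + B)."""
    return sum(1 for row in rows if row[-1]) / len(rows)


def maintain(rows):
    """Assumption: maintaining a view turns every row fresh, nothing else changes."""
    return [row[:-1] + (True,) for row in rows]


def join(left, right):
    """Join on the first column; assumed fresh only if both input rows are fresh."""
    return [(a, b, c, lf and rf)
            for (a, b, lf) in left
            for (a2, c, rf) in right
            if a == a2]


def evaluate_plans(views):
    """Map each of the 2^n maintenance plans to the freshness of the join."""
    names = sorted(views)
    results = {}
    for size in range(len(names) + 1):
        for plan in combinations(names, size):
            state = {name: maintain(rows) if name in plan else rows
                     for name, rows in views.items()}
            results[plan] = freshness(join(state["m1"], state["m2"]))
    return results


PLANS = evaluate_plans({"m1": MAPPING_1, "m2": MAPPING_2})
for plan, value in PLANS.items():
    print(plan or "(no maintenance)", f"{value:.0%}")
```

With two views there are four plans. Maintaining only mapping 1 leaves the join at 50%, while maintaining only mapping 2 raises it to 100%, which is why the choice of plan matters more than the amount of maintenance.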

Editor's Notes

  1. Hi all, and thanks for coming to my presentation. In this talk, I am going to discuss how to optimize SPARQL query processing on static and dynamic data based on the quality requirements of the query response, namely response time and freshness.
  2. The outline of the talk is as follows. First, we will have a brief introduction to query processing and the approaches proposed to make it faster. Second, we introduce the terminology of our work. Third, we illustrate the targeted problem with an example. Afterwards, we define the problem. The proposed solution and experimental results will then be presented. At the end, we will conclude the talk with some directions for future work.
  3. To process queries on Linked Data, the naïve approach is for the query processor to take the query, fetch the relevant data from the original sources, combine them, and return the response to the user. However, fetching data from the original sources takes a lot of time, and if the original sources become temporarily unavailable, the query processor cannot provide the full response. [Enter] To get rid of availability and latency problems, researchers came up with the idea of offline materialization, which is called replication in the database context or caching in the Web context. They proposed materializing as much data as possible in a local store and answering queries using only that local store. This provides very fast response times and does not suffer from availability issues. [Enter] However, if the original sources are updated or new sources become available, the query processor cannot reflect these changes in its responses, so the responses suffer from low quality. [Enter] To address this issue, maintenance mechanisms help the query processor compensate for the quality issues. [Enter] However, highly frequent maintenance consumes all computational resources, and queries must wait for them. Thus, a high-quality response is only achieved with a long response time, and vice versa. [Enter] The problem we are targeting here is on-demand maintenance based on the quality requirements of the query response. [Enter] This problem is important because solving it eliminates unnecessary maintenance and leads to faster response and better scalability.
  4. Based on how we specify the trade-off and how the underlying data sources report their changes, we categorized the view maintenance problem into 8 sub-problems. Problems A and B have already been addressed in the literature. Managing the replica-level trade-off is much easier because we measure quality at the replica level, based on the number of un-applied update streams (C), the oldness of the replica measured as the time of last maintenance (D), and the oldness of the replica measured as the time of the last processed update (H). E, F, and G are still open problems. G: how to measure the quality of a replica without assuming an update stream. E: how to measure the quality of a query response without assuming an update stream. F: how to maximize the quality of a query response under a time constraint without assuming an update stream.
  5. Here we define our terminology. <<<point to the first figure>>> To specify the quality requirements of the response, suppose the shaded circle is the response provided from the local store and the transparent circle is the actual response. These 2 responses share a set of tuples, represented by “B”. “A” represents out-of-date responses provided by the query processor. “C” represents valid responses that have been missed by the query processor due to the maintenance delay. If “A” is empty, the query processor has provided a valid response, but the response is not complete. If “C” is empty, the query processor has provided a complete response, but the response is not fully fresh. Therefore, freshness is defined as B divided by A plus B, and completeness is defined as B divided by B plus C. In each maintenance round, the query processor decides to maintain a set of views, which we call a maintenance plan. Given n views, there are 2 to the power n maintenance plans. As we will show in the next slide, each maintenance plan leads to a different response quality. I should mention that in this work we have not addressed response completeness and have left it for future work. Therefore, we only deal with response freshness as the response quality requirement.
  6. In this example we label fresh tuples with T and stale tuples with F. Suppose we have a join between 2 mappings. We want to show that different maintenance plans lead to different response freshness. <<<point to first row>>> One maintenance plan is not to maintain anything. In that case we need to measure the freshness of the response with the current data in the local store. As we can see in the first row, mapping 1 with 60% freshness joins with mapping 2 with 40% freshness, and the result is 50% fresh. <<<point to second row>>> In the second row, we show the next maintenance plan, which is to maintain mapping 1. However, the join result is still 50% fresh. <<<point to third row>>> In the third row, we show another maintenance plan, which is to maintain mapping 2. This time the join result becomes 100% fresh. Therefore, different maintenance plans lead to different response quality.
  7. The problem that we are targeting is to find the least costly maintenance plan that fulfils the response quality requirements. This boils down to 2 sub-problems: first, estimating the quality of the response provided by the present cache without maintenance; second, estimating the quality of the response for the other maintenance plans. In this work we only targeted the first sub-problem.
  8. To estimate the quality of the response provided by the present cache, we use the BSBM benchmark generator to generate a dataset and a query set. We labeled triples with true/false to specify their freshness status. We summarize the cache to estimate the quality of a query response without actually executing the query on the cache. To summarize the cache, we extended cardinality estimation techniques to the freshness estimation problem. In the next slide we will present how to extend cardinality estimation methods for freshness estimation.
  9. Cardinality estimation approaches try to capture the data distribution by splitting data into buckets and keeping the bucket cardinality in the summary. In our example, we summarize the whole dataset into an index that stores the cardinality of individual predicates. To test the summary, we run 2 queries, Q1 and Q2. This summary provides an accurate estimate for Q1, but to estimate the cardinality of Q2 it multiplies the cardinalities of its triple patterns, which gives 35, while the actual response cardinality is 19. As we can see, this summary fails to provide a good estimate. [Enter] However, a more granular summary can provide more accurate estimates. As we can see, the second index provides accurate cardinality estimates for both Q1 and Q2. [Enter] To extend cardinality estimation methods for freshness estimation, we extend the summaries with one more column that stores the number of fresh entries in addition to the total number of entries in that category. [Enter] As we can see, the first index provides an accurate freshness estimate for Q1 but fails to provide a good estimate for Q2. [Enter] However, by storing more granular information in the second index, we can provide accurate freshness estimates for both Q1 and Q2.
  10. We investigated 2 types of cardinality estimation techniques for our freshness estimation problem: indexing-based and histogram-based. A more accurate cardinality estimation technique exists, which tries to capture dependencies. To summarize the underlying data for cardinality estimation, the original System R made 2 simplifying assumptions: first, that data is uniformly distributed per attribute; second, that join predicates are independent. Indexing approaches make both assumptions to simplify the summarization and estimation process. However, such assumptions rarely hold in real datasets. Thus, researchers came up with the idea of histograms to address the uniform data distribution assumption. Using probabilistic graphical models, we can build summaries that address the join predicate independence assumption. In this paper we extended the indexing and histogram cardinality estimation methods for freshness estimation according to the procedure explained in the previous slide.
  11. To measure the accuracy of the estimation approach, we used the root mean square deviation. That is, we sum the squared differences between the estimated and actual freshness over all queries, divide by the number of queries, and take the square root to get a single value specifying the error of that method.
  12. The results showed that the indexing approach can achieve a very low estimation error and low storage space simultaneously. The histogram, however, provides a lower estimation error only with a huge summary size. We believe that the majority of estimation errors in our queries are caused by join dependencies, which are not addressed by histograms. So we hope to further reduce the estimation error by using probabilistic graphical models in our future work.
  13. We have materialized data and a system that fetches a subset of the materialized data under time constraints, and we need to optimize the view maintenance. A continuous join query is the perfect use case for our scenario.
  14. We have materialized data and a system that fetches a subset of the materialized data under time constraints, and we need to optimize the view maintenance. A continuous join query is the perfect use case for our scenario.
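
The summary-based estimation described in notes 9 and 11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the function names (`summarize`, `estimate_join`, `rmsd`, and so on) are hypothetical, the freshness flags are paired with the triples in the order they are listed on slide 10 (which is consistent with the per-bucket counts shown there), and the fresh count of a join is assumed to be the product of the fresh counts of its two buckets.

```python
from collections import defaultdict
from math import sqrt

# (subject, predicate, object, fresh?) triples from the slide 10 example.
TRIPLES = [
    ("Alice", "Job", "Teacher", True), ("Alice", "Lives", "Dublin", True),
    ("Alice", "Job", "PhD student", False), ("Alice", "Lives", "Athlon", False),
    ("Bob", "Job", "Manager", True), ("Bob", "Lives", "Berlin", True),
    ("Bob", "Lives", "Chicago", True), ("Bob", "Lives", "Munich", False),
    ("Bob", "Lives", "Belfast", False), ("Bob", "Lives", "Limerick", False),
    ("Bob", "Job", "CEO", False), ("Bob", "Job", "Consultant", False),
]


def summarize(triples, key):
    """Bucket triples by key(subject, predicate); keep [total, fresh] per bucket."""
    buckets = defaultdict(lambda: [0, 0])
    for s, p, _, fresh in triples:
        bucket = buckets[key(s, p)]
        bucket[0] += 1
        bucket[1] += int(fresh)
    return dict(buckets)


def estimate_pattern(summary, p):
    """Estimate (cardinality, fresh count) of the pattern ?a p ?b."""
    matches = [counts for (_, q), counts in summary.items() if q == p]
    return sum(t for t, _ in matches), sum(f for _, f in matches)


def estimate_join(summary, p1, p2):
    """Estimate (cardinality, fresh count) of (?a p1 ?b) ^ (?a p2 ?c)."""
    total = fresh = 0
    for s in {k[0] for k in summary}:  # {"*"} for the predicate-level summary
        t1, f1 = summary.get((s, p1), (0, 0))
        t2, f2 = summary.get((s, p2), (0, 0))
        total += t1 * t2
        fresh += f1 * f2
    return total, fresh


def actual_join(triples, p1, p2):
    """Execute the subject join and count result rows whose two inputs are fresh."""
    flags = [f1 and f2
             for s1, q1, _, f1 in triples
             for s2, q2, _, f2 in triples
             if s1 == s2 and q1 == p1 and q2 == p2]
    return len(flags), sum(flags)


def freshness_estimates(summary):
    """Estimated freshness of Q1 (?a Job ?b) and Q2 (Job joined with Lives)."""
    q1_total, q1_fresh = estimate_pattern(summary, "Job")
    q2_total, q2_fresh = estimate_join(summary, "Job", "Lives")
    return [q1_fresh / q1_total, q2_fresh / q2_total]


def rmsd(estimated, actual):
    """Root mean square deviation over a query set."""
    return sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))


COARSE = summarize(TRIPLES, lambda s, p: ("*", p))  # predicate-level index
FINE = summarize(TRIPLES, lambda s, p: (s, p))      # subject-predicate index

job_flags = [f for _, p, _, f in TRIPLES if p == "Job"]
q2_total, q2_fresh = actual_join(TRIPLES, "Job", "Lives")
ACTUAL = [sum(job_flags) / len(job_flags), q2_fresh / q2_total]

print("coarse RMSD:", rmsd(freshness_estimates(COARSE), ACTUAL))
print("fine RMSD:  ", rmsd(freshness_estimates(FINE), ACTUAL))
```

On this toy dataset the predicate-level summary estimates Q2 as 6 fresh rows out of 35, while the subject-predicate summary recovers the actual 3 out of 19, so only the coarse summary has a non-zero deviation. The finer key costs more buckets, which is the space/accuracy trade-off the notes discuss.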