0
The Impact of       Data Caching onQuery Execution for Linked Data                                          Olaf Hartig   ...
Can we query the Web of Data as of it were a single, giant database?                    SELECT DISTINCT ?i ?label         ...
Main Idea ●   Intertwine query evaluation with traversal of data links ●   We alternate between:     ●   Evaluate parts of...
Main Idea ●   Intertwine query evaluation with traversal of data links ●   We alternate between:     ●   Evaluate parts of...
Main Idea ●   Intertwine query evaluation with traversal of data links ●   We alternate between:                          ...
Main Idea ●   Intertwine query evaluation with traversal of data links ●   We alternate between:                          ...
Main Idea ●   Intertwine query evaluation with traversal of data links ●   We alternate between:                          ...
Main Idea ●   Intertwine query evaluation with traversal of data links ●   We alternate between:                          ...
Main Idea ●   Intertwine query evaluation with traversal of data links ●   We alternate between:     ●   Evaluate parts of...
Main Idea ●   Intertwine query evaluation with traversal of data links ●   We alternate between:     ●   Evaluate parts of...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Main Idea ●   Intertwine query evaluation with traversal of data links                                                    ...
Characteristics ●   Link traversal based query execution:     ●   Evaluation on a continuously augmented dataset     ●   D...
The Issue Query                 ?acq interest                                              ?i               s            o...
The Issue Query                 ?acq interest                                              ?i               s            o...
The Issue Query                 ?acq interest                                         http://bob.name                     ...
The Issue Query                 ?acq interest                                         http://bob.name                     ...
The Issue Query                 ?acq interest                                              ?i               s            o...
Reusing the Query-Local Dataset Query                 ?acq interest                                              ?i       ...
Reusing the Query-Local Dataset Query                 ?acq interest                                              ?i       ...
Reusing the Query-Local Dataset Query                 ?acq interest                                              ?i       ...
Hypothesis        Re-using the query-local dataset (a.k.a. data caching)                            may benefit           ...
Contributions ●   Systematic analysis of the impact of data caching     ●         Theoretical foundation*     ●         Co...
Experiment – Scenario                                                                              ●   Information about t...
Experiment – Single Query  no reuse        reuse 0 10 20 30 40 50 60                  query                        per    ...
Experiment – Single Query  no reuse        reuse 0 10 20 30 40 50 60                  query                        per    ...
Experiment – Single Query  no reuse        reuse 0 10 20 30 40 50 60                        per                           ...
Experiment – Single Query  no reuse        reuse 0 10 20 30 40 50 60                        per                           ...
Experiment – Single Query  no reuse        reuse 0 10 20 30 40 50 60                        per                           ...
Experiment – Complete Sequence  no reuse        reuse 0 10 20 30 all 50 60                  query                        p...
Experiment – Complete Sequence  no reuse        reuse 0 10 20 30 all 50 60                        per  reuse 40           ...
Experiment – Complete Sequence  no reuse        reuse 0 10 20 30 all 50 60                        per  reuse 40           ...
Experiment – Complete Sequence  no reuse        reuse 0 10 20 30 all 50 60                        per  reuse 40           ...
Experiment – Complete Sequence  no reuse        reuse 0 10 20 30 all 50 60                        per  reuse 40           ...
Outlook ●   Requirements of a data cache:     ●   Replacement mechanism     ●   Coherency mechanismOlaf Hartig - The Impac...
Cache Replacement ●   Cache full → remove descriptor objects ●   Replacement strategy     ●   Primary goal: maximize hit r...
Studying Cache Replacement?Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data   46
Studying Cache Replacement?                      “Web cache replacement in its general                       form seems to...
Studying Cache Replacement?                      “Web cache replacement in its general                       form seems to...
Cache Coherency ●   Data items in the cache may become inconsistent ●   Strong cache consistency     ●   Server validation...
Client Validation ●   Polling every time ●   Enables strong cache consistency ●   Conditional GET     ●   Request with If-...
Client Validation ●   Polling every time ●   Enables strong cache consistency ●   Conditional GET     ●   Request with If-...
Time to Live (TTL) ●   TTL field: life time estimation for each object ●   Supported by HTTP response headers:     ●   Exp...
Time to Live (TTL) ●   TTL field: life time estimation for each object ●   Supported by HTTP response headers:     ●   Exp...
Adaptive TTL ●   Assumption:     ●   The older an object, the less likely it is to be modified ●   TTL is a percentage of ...
Adaptive TTL ●   Assumption:     ●   The older an object, the less likely it is to be modified ●   TTL is a percentage of ...
Summary ●   Systematic analysis of the impact of data cache     ●   Theoretical foundation     ●   Conceptual analysis    ...
Backup SlidesOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data   57
Contributions ●   Theoretical foundation (extension of the original definition)     ●   Reachability by a Dseed-initialize...
Query Template Contact SELECT * WHERE {             <PERSON> foaf:knows ?p .                             OPTIONAL       { ...
Query Template UnsetProps SELECT DISTINCT ?result ?resultLabel WHERE {    ?result rdfs:isDefinedBy <http://xmlns.com/foaf/...
Query Template Incoming SELECT DISTINCT ?result WHERE {   ?result foaf:knows <PERSON> .       OPTIONAL       {         ?re...
Query Template 2ndDegree1 SELECT DISTINCT ?result WHERE {    <PERSON> foaf:knows ?p1 .    <PERSON> foaf:knows ?p2 .    FIL...
Query Template 2ndDegree2 SELECT DISTINCT ?result WHERE {    <PERSON> foaf:knows ?p1 .    <PERSON> foaf:knows ?p2 .    FIL...
Experiment – Single Query         Experiment              Avg.1 number of                        Average1                A...
Experiment – Single Query         Experiment              Avg.1 number of                        Average1                A...
Experiment – Complete Sequence         Experiment              Avg.1 number of                        Average1            ...
Experiment – Complete Sequence     Experiment                  Avg.1 number of                        Average1            ...
These slides have been created by                                       Olaf Hartig                                       ...
Upcoming SlideShare
Loading in...5
×

The Impact of Data Caching of on Query Execution for Linked Data

1,232

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,232
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
28
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "The Impact of Data Caching of on Query Execution for Linked Data"

  1. 1. The Impact of Data Caching onQuery Execution for Linked Data Olaf Hartig http://olafhartig.de/foaf.rdf#olaf @olafhartig Database and Information Systems Research Group Humboldt-Universität zu Berlin
  2. 2. Can we query the Web of Data as of it were a single, giant database? SELECT DISTINCT ?i ?label WHERE { ?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ; foaf:topic_interest ?i . } OPTIONAL { } ?i rdfs:label ?label FILTER( LANG(?label)="en" || LANG(?label)="") ORDER BY ?label ? Our approach: Link Traversal Based Query Execution [ISWC09]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 2
  3. 3. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 3
  4. 4. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 4
  5. 5. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 5
  6. 6. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 6
  7. 7. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 7
  8. 8. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data “Descriptor object” to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 8
  9. 9. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 9
  10. 10. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://bob.name Query kno ws http://bob.name http://alice.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 10
  11. 11. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://bob.name Query kno ws http://bob.name http://alice.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 11
  12. 12. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 12
  13. 13. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 13
  14. 14. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 14
  15. 15. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 15
  16. 16. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 16
  17. 17. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://alice.name Query pr o http://bob.name jec t ?prjName http://.../AlicesPrj s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 17
  18. 18. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset http://alice.name Query pr o http://bob.name jec t ?prjName http://.../AlicesPrj s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 18
  19. 19. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 19
  20. 20. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset ?prj ?prjName http://.../AlicesPrj “…“ Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 20
  21. 21. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset ?prj ?prjName http://.../AlicesPrj “…“ Query ?acq ?prj ?prjName http://bob.name ?prjName http://alice.name http://.../AlicesPrj “…“ s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 21
  22. 22. Characteristics ● Link traversal based query execution: ● Evaluation on a continuously augmented dataset ● Discovery of potentially relevant data during execution ● Discovery driven by intermediate solutions ● Main advantage: ● No need to know all data sources in advance ● Limitations: ● Query has to contain a URI as a starting point ● Ignores data that is not reachable* by the query execution * formal definition in [LDOW11a]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 22
  23. 23. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 23
  24. 24. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel htt query-local p: //b ob dataset ? .nam eOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 24
  25. 25. The Issue Query ?acq interest http://bob.name ?i kno s ow w s label kn http://alice.name http://bob.name ?iLabel query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 25
  26. 26. The Issue Query ?acq interest http://bob.name ?i kno s ow w s label kn http://alice.name http://bob.name ?iLabel query-local dataset ?acq ?i ?iLabelOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 26
  27. 27. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 27
  28. 28. Reusing the Query-Local Dataset Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 28
  29. 29. Reusing the Query-Local Dataset Query ?acq interest ?i s ow label kn http://bob.name ?iLabel http://alice.name o ws Query kn http://bob.name http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 29
  30. 30. Reusing the Query-Local Dataset Query ?acq interest ?i ?acq s ow http://alice.name label kn http://bob.name ?iLabel http://alice.name o ws Query kn http://bob.name http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 30
  31. 31. Hypothesis Re-using the query-local dataset (a.k.a. data caching) may benefit query performance + result completenessOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 31
  32. 32. Contributions ● Systematic analysis of the impact of data caching ● Theoretical foundation* ● Conceptual analysis* ● Empirical evaluation of the potential impact * see [LDOW11a] ● Out of scope: Caching strategies (replacement, invalidation)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 32
  33. 33. Experiment – Scenario ● Information about the distributed social network of FOAF profiles ● 5 types of queries ● Experiment Setup: ● 20 persons ● Sequential use ➔ 100 queriesOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 33
  34. 34. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 query per ● no reuse experiment: 0,01 0,1 1 10 100ContactInfoDanBri ● No data caching (Query No. 61) ● reuse per query experimentUnsetPropsDanBri ● Reuse of query-local dataset (Query No. 62) for 3 executions of each query2ndDegree1DanBri ● Third execution measured (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 34
  35. 35. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 query per ● no 0,01 reuse experiment: 0,1 1 10 100ContactInfoDanBri ● No data caching (Query No. 61) ● reuse per query experimentUnsetPropsDanBri ● Reuse of query-local dataset (Query No. 62) for 3 executions of each query2ndDegree1DanBri ● Third execution measured (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 35
  36. 36. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 36
  37. 37. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 37
  38. 38. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 38
  39. 39. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 query per reuse 40 queries ● reuse all queries experiment: 100 0,01 0,1 1 10ContactInfoDanBri ● Reuse of the query-local (Query No. 61) dataset for the complete sequence of all 100 queriesUnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 39
  40. 40. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 40
  41. 41. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 41
  42. 42. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 42
  43. 43. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 43
  44. 44. Outlook ● Requirements of a data cache: ● Replacement mechanism ● Coherency mechanismOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 44
  45. 45. Cache Replacement ● Cache full → remove descriptor objects ● Replacement strategy ● Primary goal: maximize hit rate ● Recency-based ● Frequency-based ● Function-based ● Randomized ● Replacement process ● Watermarks: high and lowOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 45
  46. 46. Studying Cache Replacement?Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 46
  47. 47. Studying Cache Replacement? “Web cache replacement in its general form seems to be a solved topic.” S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 47
  48. 48. Studying Cache Replacement? “Web cache replacement in its general form seems to be a solved topic.” S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003 ● 6 quad indexes* in main memory ● Size grows linear in the number of quads ● Example (after reuse all queries experiment, 100 queries): ● 905 descriptor objects, overall number of 745,756 triples ● ca. 103 MB ➔ Available main memory is almost no limit * as introduced in [LDOW11b]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 48
  49. 49. Cache Coherency ● Data items in the cache may become inconsistent ● Strong cache consistency ● Server validation ● Client validation ● Weak cache consistency ● Time to live (TTL) ● Adaptive TTLOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 49
  50. 50. Client Validation ● Polling every time ● Enables strong cache consistency ● Conditional GET ● Request with If-Modified-Since header ● Possible response: 304 Not ModifiedOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 50
  51. 51. Client Validation ● Polling every time ● Enables strong cache consistency ● Conditional GET ● Request with If-Modified-Since header ● Possible response: 304 Not Modified ● Not supported by most Linked Data servers ● Experiment based on the CKAN catalog of linked datasets ● 41 out of 154 example resources (26.6%) from 110 datasetsOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 51
  52. 52. Time to Live (TTL) ● TTL field: life time estimation for each object ● Supported by HTTP response headers: ● Expires ● Cache-Control: max-age ● When TTL elapses, object is invalid ● Accessing an invalid object → re-retrieve object again ● Conditional GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 52
  53. 53. Time to Live (TTL) ● TTL field: life time estimation for each object ● Supported by HTTP response headers: ● Expires 37.0% ● Cache-Control: max-age 37.7% ● When TTL elapses, object is invalid ● Accessing an invalid object → re-retrieve object again ● Conditional GET 26.6% ● Alternative (due to lack of support in Linked Data servers): ● Assume a default TTL for each object ● Ordinary GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 53
  54. 54. Adaptive TTL ● Assumption: ● The older an object, the less likely it is to be modified ● TTL is a percentage of the age: ● Threshold = 10% ; age = 30 days → TTL = 3 days ● Last verification: yesterday → invalidation in 2 days ● HTTP-based implementation: ● Calculation of age: use Last-Modified response header ● Verification with conditional GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 54
  55. 55. Adaptive TTL ● Assumption: ● The older an object, the less likely it is to be modified ● TTL is a percentage of the age: ● Threshold = 10% ; age = 30 days → TTL = 3 days ● Last verification: yesterday → invalidation in 2 days ● HTTP-based implementation: 35.1% ● Calculation of age: use Last-Modified response header ● Verification with conditional GET ● Alternative (due to lack of support in Linked Data servers): ● Assume Last-Modified is time of first retrieval ● Verification by comparing a response to the current versionOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 55
  56. 56. Summary ● Systematic analysis of the impact of data cache ● Theoretical foundation ● Conceptual analysis ● Empirical evaluation ● Main findings: ● Additional results possible (for semantically similar queries) ● Impact on performance may be positive but also negative ● Future work: ● Analysis of caching strategies in our context ● Main issue: invalidationOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 56
  57. 57. Backup SlidesOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 57
  58. 58. Contributions ● Theoretical foundation (extension of the original definition) ● Reachability by a Dseed-initialized execution of a BGP query b ● Dseed-dependent solution for a BGP query b ● Reachability R(B) for a serial execution of B = b1 , … , bn ➔ Each solution for bcur is also R(B)-dependent solution for bcur ● Conceptual analysis of the impact of data caching ● Performance factor: p( bcur , B ) = c( bcur , [ ] ) – c( bcur , B ) ● Serendipity factor: s( bcur , B ) = b( bcur , B ) – b( bcur , [ ] ) ● Empirical verification of the potential impact ● Out of scope: Caching strategies (replacement, invalidation)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 58
  59. 59. Query Template Contact SELECT * WHERE { <PERSON> foaf:knows ?p . OPTIONAL { ?p foaf:name ?name } OPTIONAL { ?p foaf:firstName ?firstName } OPTIONAL { ?p foaf:givenName ?givenName } OPTIONAL { ?p foaf:givenname ?givenname } OPTIONAL { ?p foaf:familyName ?familyName } OPTIONAL { ?p foaf:family_name ?family_name } OPTIONAL { ?p foaf:lastName ?lastName } OPTIONAL { ?p foaf:surname ?surname } OPTIONAL { ?p foaf:birthday ?birthday } OPTIONAL { ?p foaf:img ?img } OPTIONAL { ?p foaf:phone ?phone } OPTIONAL { ?p foaf:aimChatID ?aimChatID } OPTIONAL { ?p foaf:icqChatID ?icqChatID } OPTIONAL { ?p foaf:jabberID ?jabberID } OPTIONAL { ?p foaf:msnChatID ?msnChatID } OPTIONAL { ?p foaf:skypeID ?skypeID } OPTIONAL { ?p foaf:yahooChatID ?yahooChatID } }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 59
  60. 60. Query Template UnsetProps SELECT DISTINCT ?result ?resultLabel WHERE { ?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> . ?result rdfs:domain foaf:Person . OPTIONAL { <PERSON> ?result ?var0 } FILTER ( !bound(?var0) ) <PERSON> foaf:knows ?var2 . ?var2 ?result ?var3 . ?result rdfs:label ?resultLabel . ?result vs:term_status ?var1 . } ORDER BY ?var1Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 60
  61. 61. Query Template Incoming SELECT DISTINCT ?result WHERE { ?result foaf:knows <PERSON> . OPTIONAL { ?result foaf:knows ?var1 . FILTER ( <PERSON> = ?var1 ) <PERSON> foaf:knows ?result . } FILTER ( !bound(?var1) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 61
  62. 62. Query Template 2ndDegree1 SELECT DISTINCT ?result WHERE { <PERSON> foaf:knows ?p1 . <PERSON> foaf:knows ?p2 . FILTER ( ?p1 != ?p2 ) ?p1 foaf:knows ?result . FILTER ( <PERSON> != ?result ) ?p2 foaf:knows ?result . OPTIONAL { <PERSON> ?knows ?result . FILTER ( ?knows = foaf:knows ) } FILTER ( !bound(?knows) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 62
  63. 63. Query Template 2ndDegree2 SELECT DISTINCT ?result WHERE { <PERSON> foaf:knows ?p1 . <PERSON> foaf:knows ?p2 . FILTER ( ?p1 != ?p2 ) ?result foaf:knows ?p1 . FILTER ( <PERSON> != ?result ) ?result foaf:knows ?p2 . OPTIONAL { <PERSON> ?knows ?result . FILTER ( ?knows = foaf:knows ) } FILTER ( !bound(?knows) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 63
  64. 64. Experiment – Single Query Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 1 Averaged over all 100 queries ● In the ideal case for Bupper= [ bcur , bcur ] : ● pupper( bcur , Bupper ) = c( bcur , [ ] ) – c( bcur , Bupper ) = c( bcur , [ ] ) ● supper( bcur , Bupper ) = b( bcur , Bupper ) – b( bcur , [ ] ) = 0Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 64
  65. 65. Experiment – Single Query Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 1 Averaged over all 100 queries ● Summary (measurement errors aside): ● Same number of query results ● Significant improvements in query performanceOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 65
  66. 66. Experiment – Complete Sequence Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 44.87 0.991 37.91 s reuse all queries (178.36) (0.053) (112.94) 1 Averaged over all 100 queries ● Summary: ● Data cache may provide for additional query results ● Impact on performance may be positive but also negativeOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 66
  67. 67. Experiment – Complete Sequence Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 44.87 0.991 37.91 s reuse all queries (178.36) (0.053) (112.94) reuse all queries 118.18 0.992 20.61 s (random orders) (867.07) (0.016) (216.61) ● Executing the query sequence in a random order results in measurements similar to the given order.Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 67
  68. 68. These slides have been created by Olaf Hartig http://olafhartig.de This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 68
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×