The Impact of Data Caching of on Query Execution for Linked Data
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

The Impact of Data Caching of on Query Execution for Linked Data

on

  • 1,450 views

 

Statistics

Views

Total Views
1,450
Views on SlideShare
1,450
Embed Views
0

Actions

Likes
1
Downloads
28
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

The Impact of Data Caching of on Query Execution for Linked Data Presentation Transcript

  • 1. The Impact of Data Caching onQuery Execution for Linked Data Olaf Hartig http://olafhartig.de/foaf.rdf#olaf @olafhartig Database and Information Systems Research Group Humboldt-Universität zu Berlin
  • 2. Can we query the Web of Data as of it were a single, giant database? SELECT DISTINCT ?i ?label WHERE { ?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ; foaf:topic_interest ?i . } OPTIONAL { } ?i rdfs:label ?label FILTER( LANG(?label)="en" || LANG(?label)="") ORDER BY ?label ? Our approach: Link Traversal Based Query Execution [ISWC09]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 2
  • 3. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 3
  • 4. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 4
  • 5. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 5
  • 6. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 6
  • 7. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 7
  • 8. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data “Descriptor object” to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 8
  • 9. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 9
  • 10. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://bob.name Query kno ws http://bob.name http://alice.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 10
  • 11. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://bob.name Query kno ws http://bob.name http://alice.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 11
  • 12. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 12
  • 13. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 13
  • 14. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 14
  • 15. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 15
  • 16. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 16
  • 17. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://alice.name Query pr o http://bob.name jec t ?prjName http://.../AlicesPrj s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 17
  • 18. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset http://alice.name Query pr o http://bob.name jec t ?prjName http://.../AlicesPrj s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 18
  • 19. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 19
  • 20. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset ?prj ?prjName http://.../AlicesPrj “…“ Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 20
  • 21. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset ?prj ?prjName http://.../AlicesPrj “…“ Query ?acq ?prj ?prjName http://bob.name ?prjName http://alice.name http://.../AlicesPrj “…“ s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 21
  • 22. Characteristics ● Link traversal based query execution: ● Evaluation on a continuously augmented dataset ● Discovery of potentially relevant data during execution ● Discovery driven by intermediate solutions ● Main advantage: ● No need to know all data sources in advance ● Limitations: ● Query has to contain a URI as a starting point ● Ignores data that is not reachable* by the query execution * formal definition in [LDOW11a]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 22
  • 23. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 23
  • 24. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel htt query-local p: //b ob dataset ? .nam eOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 24
  • 25. The Issue Query ?acq interest http://bob.name ?i kno s ow w s label kn http://alice.name http://bob.name ?iLabel query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 25
  • 26. The Issue Query ?acq interest http://bob.name ?i kno s ow w s label kn http://alice.name http://bob.name ?iLabel query-local dataset ?acq ?i ?iLabelOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 26
  • 27. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 27
  • 28. Reusing the Query-Local Dataset Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 28
  • 29. Reusing the Query-Local Dataset Query ?acq interest ?i s ow label kn http://bob.name ?iLabel http://alice.name o ws Query kn http://bob.name http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 29
  • 30. Reusing the Query-Local Dataset Query ?acq interest ?i ?acq s ow http://alice.name label kn http://bob.name ?iLabel http://alice.name o ws Query kn http://bob.name http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 30
  • 31. Hypothesis Re-using the query-local dataset (a.k.a. data caching) may benefit query performance + result completenessOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 31
  • 32. Contributions ● Systematic analysis of the impact of data caching ● Theoretical foundation* ● Conceptual analysis* ● Empirical evaluation of the potential impact * see [LDOW11a] ● Out of scope: Caching strategies (replacement, invalidation)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 32
  • 33. Experiment – Scenario ● Information about the distributed social network of FOAF profiles ● 5 types of queries ● Experiment Setup: ● 20 persons ● Sequential use ➔ 100 queriesOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 33
  • 34. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 query per ● no reuse experiment: 0,01 0,1 1 10 100ContactInfoDanBri ● No data caching (Query No. 61) ● reuse per query experimentUnsetPropsDanBri ● Reuse of query-local dataset (Query No. 62) for 3 executions of each query2ndDegree1DanBri ● Third execution measured (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 34
  • 35. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 query per ● no 0,01 reuse experiment: 0,1 1 10 100ContactInfoDanBri ● No data caching (Query No. 61) ● reuse per query experimentUnsetPropsDanBri ● Reuse of query-local dataset (Query No. 62) for 3 executions of each query2ndDegree1DanBri ● Third execution measured (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 35
  • 36. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 36
  • 37. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 37
  • 38. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 38
  • 39. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 query per reuse 40 queries ● reuse all queries experiment: 100 0,01 0,1 1 10ContactInfoDanBri ● Reuse of the query-local (Query No. 61) dataset for the complete sequence of all 100 queriesUnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 39
  • 40. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 40
  • 41. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 41
  • 42. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 42
  • 43. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 43
  • 44. Outlook ● Requirements of a data cache: ● Replacement mechanism ● Coherency mechanismOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 44
  • 45. Cache Replacement ● Cache full → remove descriptor objects ● Replacement strategy ● Primary goal: maximize hit rate ● Recency-based ● Frequency-based ● Function-based ● Randomized ● Replacement process ● Watermarks: high and lowOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 45
  • 46. Studying Cache Replacement?Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 46
  • 47. Studying Cache Replacement? “Web cache replacement in its general form seems to be a solved topic.” S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 47
  • 48. Studying Cache Replacement? “Web cache replacement in its general form seems to be a solved topic.” S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003 ● 6 quad indexes* in main memory ● Size grows linear in the number of quads ● Example (after reuse all queries experiment, 100 queries): ● 905 descriptor objects, overall number of 745,756 triples ● ca. 103 MB ➔ Available main memory is almost no limit * as introduced in [LDOW11b]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 48
  • 49. Cache Coherency ● Data items in the cache may become inconsistent ● Strong cache consistency ● Server validation ● Client validation ● Weak cache consistency ● Time to live (TTL) ● Adaptive TTLOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 49
  • 50. Client Validation ● Polling every time ● Enables strong cache consistency ● Conditional GET ● Request with If-Modified-Since header ● Possible response: 304 Not ModifiedOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 50
  • 51. Client Validation ● Polling every time ● Enables strong cache consistency ● Conditional GET ● Request with If-Modified-Since header ● Possible response: 304 Not Modified ● Not supported by most Linked Data servers ● Experiment based on the CKAN catalog of linked datasets ● 41 out of 154 example resources (26.6%) from 110 datasetsOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 51
  • 52. Time to Live (TTL) ● TTL field: life time estimation for each object ● Supported by HTTP response headers: ● Expires ● Cache-Control: max-age ● When TTL elapses, object is invalid ● Accessing an invalid object → re-retrieve object again ● Conditional GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 52
  • 53. Time to Live (TTL) ● TTL field: life time estimation for each object ● Supported by HTTP response headers: ● Expires 37.0% ● Cache-Control: max-age 37.7% ● When TTL elapses, object is invalid ● Accessing an invalid object → re-retrieve object again ● Conditional GET 26.6% ● Alternative (due to lack of support in Linked Data servers): ● Assume a default TTL for each object ● Ordinary GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 53
  • 54. Adaptive TTL ● Assumption: ● The older an object, the less likely it is to be modified ● TTL is a percentage of the age: ● Threshold = 10% ; age = 30 days → TTL = 3 days ● Last verification: yesterday → invalidation in 2 days ● HTTP-based implementation: ● Calculation of age: use Last-Modified response header ● Verification with conditional GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 54
  • 55. Adaptive TTL ● Assumption: ● The older an object, the less likely it is to be modified ● TTL is a percentage of the age: ● Threshold = 10% ; age = 30 days → TTL = 3 days ● Last verification: yesterday → invalidation in 2 days ● HTTP-based implementation: 35.1% ● Calculation of age: use Last-Modified response header ● Verification with conditional GET ● Alternative (due to lack of support in Linked Data servers): ● Assume Last-Modified is time of first retrieval ● Verification by comparing a response to the current versionOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 55
  • 56. Summary ● Systematic analysis of the impact of data cache ● Theoretical foundation ● Conceptual analysis ● Empirical evaluation ● Main findings: ● Additional results possible (for semantically similar queries) ● Impact on performance may be positive but also negative ● Future work: ● Analysis of caching strategies in our context ● Main issue: invalidationOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 56
  • 57. Backup SlidesOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 57
  • 58. Contributions ● Theoretical foundation (extension of the original definition) ● Reachability by a Dseed-initialized execution of a BGP query b ● Dseed-dependent solution for a BGP query b ● Reachability R(B) for a serial execution of B = b1 , … , bn ➔ Each solution for bcur is also R(B)-dependent solution for bcur ● Conceptual analysis of the impact of data caching ● Performance factor: p( bcur , B ) = c( bcur , [ ] ) – c( bcur , B ) ● Serendipity factor: s( bcur , B ) = b( bcur , B ) – b( bcur , [ ] ) ● Empirical verification of the potential impact ● Out of scope: Caching strategies (replacement, invalidation)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 58
  • 59. Query Template Contact SELECT * WHERE { <PERSON> foaf:knows ?p . OPTIONAL { ?p foaf:name ?name } OPTIONAL { ?p foaf:firstName ?firstName } OPTIONAL { ?p foaf:givenName ?givenName } OPTIONAL { ?p foaf:givenname ?givenname } OPTIONAL { ?p foaf:familyName ?familyName } OPTIONAL { ?p foaf:family_name ?family_name } OPTIONAL { ?p foaf:lastName ?lastName } OPTIONAL { ?p foaf:surname ?surname } OPTIONAL { ?p foaf:birthday ?birthday } OPTIONAL { ?p foaf:img ?img } OPTIONAL { ?p foaf:phone ?phone } OPTIONAL { ?p foaf:aimChatID ?aimChatID } OPTIONAL { ?p foaf:icqChatID ?icqChatID } OPTIONAL { ?p foaf:jabberID ?jabberID } OPTIONAL { ?p foaf:msnChatID ?msnChatID } OPTIONAL { ?p foaf:skypeID ?skypeID } OPTIONAL { ?p foaf:yahooChatID ?yahooChatID } }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 59
  • 60. Query Template UnsetProps SELECT DISTINCT ?result ?resultLabel WHERE { ?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> . ?result rdfs:domain foaf:Person . OPTIONAL { <PERSON> ?result ?var0 } FILTER ( !bound(?var0) ) <PERSON> foaf:knows ?var2 . ?var2 ?result ?var3 . ?result rdfs:label ?resultLabel . ?result vs:term_status ?var1 . } ORDER BY ?var1Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 60
  • 61. Query Template Incoming SELECT DISTINCT ?result WHERE { ?result foaf:knows <PERSON> . OPTIONAL { ?result foaf:knows ?var1 . FILTER ( <PERSON> = ?var1 ) <PERSON> foaf:knows ?result . } FILTER ( !bound(?var1) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 61
  • 62. Query Template 2ndDegree1 SELECT DISTINCT ?result WHERE { <PERSON> foaf:knows ?p1 . <PERSON> foaf:knows ?p2 . FILTER ( ?p1 != ?p2 ) ?p1 foaf:knows ?result . FILTER ( <PERSON> != ?result ) ?p2 foaf:knows ?result . OPTIONAL { <PERSON> ?knows ?result . FILTER ( ?knows = foaf:knows ) } FILTER ( !bound(?knows) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 62
  • 63. Query Template 2ndDegree2 SELECT DISTINCT ?result WHERE { <PERSON> foaf:knows ?p1 . <PERSON> foaf:knows ?p2 . FILTER ( ?p1 != ?p2 ) ?result foaf:knows ?p1 . FILTER ( <PERSON> != ?result ) ?result foaf:knows ?p2 . OPTIONAL { <PERSON> ?knows ?result . FILTER ( ?knows = foaf:knows ) } FILTER ( !bound(?knows) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 63
  • 64. Experiment – Single Query Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 1 Averaged over all 100 queries ● In the ideal case for Bupper= [ bcur , bcur ] : ● pupper( bcur , Bupper ) = c( bcur , [ ] ) – c( bcur , Bupper ) = c( bcur , [ ] ) ● supper( bcur , Bupper ) = b( bcur , Bupper ) – b( bcur , [ ] ) = 0Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 64
  • 65. Experiment – Single Query Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 1 Averaged over all 100 queries ● Summary (measurement errors aside): ● Same number of query results ● Significant improvements in query performanceOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 65
  • 66. Experiment – Complete Sequence Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 44.87 0.991 37.91 s reuse all queries (178.36) (0.053) (112.94) 1 Averaged over all 100 queries ● Summary: ● Data cache may provide for additional query results ● Impact on performance may be positive but also negativeOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 66
  • 67. Experiment – Complete Sequence Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 44.87 0.991 37.91 s reuse all queries (178.36) (0.053) (112.94) reuse all queries 118.18 0.992 20.61 s (random orders) (867.07) (0.016) (216.61) ● Executing the query sequence in a random order results in measurements similar to the given order.Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 67
  • 68. These slides have been created by Olaf Hartig http://olafhartig.de This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 68