The Impact of Data Caching of on Query Execution for Linked Data

  • 1,145 views
Uploaded on

 

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,145
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
28
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. The Impact of Data Caching onQuery Execution for Linked Data Olaf Hartig http://olafhartig.de/foaf.rdf#olaf @olafhartig Database and Information Systems Research Group Humboldt-Universität zu Berlin
  • 2. Can we query the Web of Data as of it were a single, giant database? SELECT DISTINCT ?i ?label WHERE { ?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ; foaf:topic_interest ?i . } OPTIONAL { } ?i rdfs:label ?label FILTER( LANG(?label)="en" || LANG(?label)="") ORDER BY ?label ? Our approach: Link Traversal Based Query Execution [ISWC09]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 2
  • 3. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 3
  • 4. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 4
  • 5. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 5
  • 6. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 6
  • 7. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 7
  • 8. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: htt p:/ ? ● Evaluate parts of the query (triple patterns) /bo on a continuously augmented set of data b.n am Look up URIs in intermediate e ● solutions and add retrieved data “Descriptor object” to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 8
  • 9. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 9
  • 10. Main Idea ● Intertwine query evaluation with traversal of data links ● We alternate between: ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://bob.name Query kno ws http://bob.name http://alice.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 10
  • 11. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://bob.name Query kno ws http://bob.name http://alice.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 11
  • 12. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 12
  • 13. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 13
  • 14. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) ? me on a continuously augmented set of data a e.n lic a :// ● Look up URIs in intermediate p htt solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 14
  • 15. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 15
  • 16. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 16
  • 17. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset http://alice.name Query pr o http://bob.name jec t ?prjName http://.../AlicesPrj s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 17
  • 18. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset http://alice.name Query pr o http://bob.name jec t ?prjName http://.../AlicesPrj s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 18
  • 19. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 19
  • 20. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset ?prj ?prjName http://.../AlicesPrj “…“ Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 20
  • 21. Main Idea ● Intertwine query evaluation with traversal of data links ?acq ● We alternate between: http://alice.name ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data ● Look up URIs in intermediate ?acq ?prj http://alice.name http://.../AlicesPrj solutions and add retrieved data to the query-local dataset ?prj ?prjName http://.../AlicesPrj “…“ Query ?acq ?prj ?prjName http://bob.name ?prjName http://alice.name http://.../AlicesPrj “…“ s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 21
  • 22. Characteristics ● Link traversal based query execution: ● Evaluation on a continuously augmented dataset ● Discovery of potentially relevant data during execution ● Discovery driven by intermediate solutions ● Main advantage: ● No need to know all data sources in advance ● Limitations: ● Query has to contain a URI as a starting point ● Ignores data that is not reachable* by the query execution * formal definition in [LDOW11a]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 22
  • 23. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 23
  • 24. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel htt query-local p: //b ob dataset ? .nam eOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 24
  • 25. The Issue Query ?acq interest http://bob.name ?i kno s ow w s label kn http://alice.name http://bob.name ?iLabel query-local datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 25
  • 26. The Issue Query ?acq interest http://bob.name ?i kno s ow w s label kn http://alice.name http://bob.name ?iLabel query-local dataset ?acq ?i ?iLabelOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 26
  • 27. The Issue Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 27
  • 28. Reusing the Query-Local Dataset Query ?acq interest ?i s ow label kn http://bob.name ?iLabel query-local dataset Query http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 28
  • 29. Reusing the Query-Local Dataset Query ?acq interest ?i s ow label kn http://bob.name ?iLabel http://alice.name o ws Query kn http://bob.name http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 29
  • 30. Reusing the Query-Local Dataset Query ?acq interest ?i ?acq s ow http://alice.name label kn http://bob.name ?iLabel http://alice.name o ws Query kn http://bob.name http://bob.name ?prjName s ow me kn na ?acq query-local project ?prj datasetOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 30
  • 31. Hypothesis Re-using the query-local dataset (a.k.a. data caching) may benefit query performance + result completenessOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 31
  • 32. Contributions ● Systematic analysis of the impact of data caching ● Theoretical foundation* ● Conceptual analysis* ● Empirical evaluation of the potential impact * see [LDOW11a] ● Out of scope: Caching strategies (replacement, invalidation)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 32
  • 33. Experiment – Scenario ● Information about the distributed social network of FOAF profiles ● 5 types of queries ● Experiment Setup: ● 20 persons ● Sequential use ➔ 100 queriesOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 33
  • 34. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 query per ● no reuse experiment: 0,01 0,1 1 10 100ContactInfoDanBri ● No data caching (Query No. 61) ● reuse per query experimentUnsetPropsDanBri ● Reuse of query-local dataset (Query No. 62) for 3 executions of each query2ndDegree1DanBri ● Third execution measured (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 34
  • 35. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 query per ● no 0,01 reuse experiment: 0,1 1 10 100ContactInfoDanBri ● No data caching (Query No. 61) ● reuse per query experimentUnsetPropsDanBri ● Reuse of query-local dataset (Query No. 62) for 3 executions of each query2ndDegree1DanBri ● Third execution measured (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 35
  • 36. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 36
  • 37. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 37
  • 38. Experiment – Single Query no reuse reuse 0 10 20 30 40 50 60 per 0,01 0,1 1 10 100 queryContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 38
  • 39. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 query per reuse 40 queries ● reuse all queries experiment: 100 0,01 0,1 1 10ContactInfoDanBri ● Reuse of the query-local (Query No. 61) dataset for the complete sequence of all 100 queriesUnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 39
  • 40. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 40
  • 41. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 41
  • 42. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 42
  • 43. Experiment – Complete Sequence no reuse reuse 0 10 20 30 all 50 60 per reuse 40 0,01 0,1 1 10 100 query queriesContactInfoDanBri (Query No. 61)UnsetPropsDanBri (Query No. 62)2ndDegree1DanBri (Query No. 63)2ndDegree2DanBri (Query No. 64) IncomingDanBri (Query No. 65) 0 10 20 30 40 50 60 0,01 0,1 1 10 100 number of query results query execution time (in seconds)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 43
  • 44. Outlook ● Requirements of a data cache: ● Replacement mechanism ● Coherency mechanismOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 44
  • 45. Cache Replacement ● Cache full → remove descriptor objects ● Replacement strategy ● Primary goal: maximize hit rate ● Recency-based ● Frequency-based ● Function-based ● Randomized ● Replacement process ● Watermarks: high and lowOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 45
  • 46. Studying Cache Replacement?Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 46
  • 47. Studying Cache Replacement? “Web cache replacement in its general form seems to be a solved topic.” S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 47
  • 48. Studying Cache Replacement? “Web cache replacement in its general form seems to be a solved topic.” S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003 ● 6 quad indexes* in main memory ● Size grows linear in the number of quads ● Example (after reuse all queries experiment, 100 queries): ● 905 descriptor objects, overall number of 745,756 triples ● ca. 103 MB ➔ Available main memory is almost no limit * as introduced in [LDOW11b]Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 48
  • 49. Cache Coherency ● Data items in the cache may become inconsistent ● Strong cache consistency ● Server validation ● Client validation ● Weak cache consistency ● Time to live (TTL) ● Adaptive TTLOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 49
  • 50. Client Validation ● Polling every time ● Enables strong cache consistency ● Conditional GET ● Request with If-Modified-Since header ● Possible response: 304 Not ModifiedOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 50
  • 51. Client Validation ● Polling every time ● Enables strong cache consistency ● Conditional GET ● Request with If-Modified-Since header ● Possible response: 304 Not Modified ● Not supported by most Linked Data servers ● Experiment based on the CKAN catalog of linked datasets ● 41 out of 154 example resources (26.6%) from 110 datasetsOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 51
  • 52. Time to Live (TTL) ● TTL field: life time estimation for each object ● Supported by HTTP response headers: ● Expires ● Cache-Control: max-age ● When TTL elapses, object is invalid ● Accessing an invalid object → re-retrieve object again ● Conditional GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 52
  • 53. Time to Live (TTL) ● TTL field: life time estimation for each object ● Supported by HTTP response headers: ● Expires 37.0% ● Cache-Control: max-age 37.7% ● When TTL elapses, object is invalid ● Accessing an invalid object → re-retrieve object again ● Conditional GET 26.6% ● Alternative (due to lack of support in Linked Data servers): ● Assume a default TTL for each object ● Ordinary GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 53
  • 54. Adaptive TTL ● Assumption: ● The older an object, the less likely it is to be modified ● TTL is a percentage of the age: ● Threshold = 10% ; age = 30 days → TTL = 3 days ● Last verification: yesterday → invalidation in 2 days ● HTTP-based implementation: ● Calculation of age: use Last-Modified response header ● Verification with conditional GETOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 54
  • 55. Adaptive TTL ● Assumption: ● The older an object, the less likely it is to be modified ● TTL is a percentage of the age: ● Threshold = 10% ; age = 30 days → TTL = 3 days ● Last verification: yesterday → invalidation in 2 days ● HTTP-based implementation: 35.1% ● Calculation of age: use Last-Modified response header ● Verification with conditional GET ● Alternative (due to lack of support in Linked Data servers): ● Assume Last-Modified is time of first retrieval ● Verification by comparing a response to the current versionOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 55
  • 56. Summary ● Systematic analysis of the impact of data cache ● Theoretical foundation ● Conceptual analysis ● Empirical evaluation ● Main findings: ● Additional results possible (for semantically similar queries) ● Impact on performance may be positive but also negative ● Future work: ● Analysis of caching strategies in our context ● Main issue: invalidationOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 56
  • 57. Backup SlidesOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 57
  • 58. Contributions ● Theoretical foundation (extension of the original definition) ● Reachability by a Dseed-initialized execution of a BGP query b ● Dseed-dependent solution for a BGP query b ● Reachability R(B) for a serial execution of B = b1 , … , bn ➔ Each solution for bcur is also R(B)-dependent solution for bcur ● Conceptual analysis of the impact of data caching ● Performance factor: p( bcur , B ) = c( bcur , [ ] ) – c( bcur , B ) ● Serendipity factor: s( bcur , B ) = b( bcur , B ) – b( bcur , [ ] ) ● Empirical verification of the potential impact ● Out of scope: Caching strategies (replacement, invalidation)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 58
  • 59. Query Template Contact SELECT * WHERE { <PERSON> foaf:knows ?p . OPTIONAL { ?p foaf:name ?name } OPTIONAL { ?p foaf:firstName ?firstName } OPTIONAL { ?p foaf:givenName ?givenName } OPTIONAL { ?p foaf:givenname ?givenname } OPTIONAL { ?p foaf:familyName ?familyName } OPTIONAL { ?p foaf:family_name ?family_name } OPTIONAL { ?p foaf:lastName ?lastName } OPTIONAL { ?p foaf:surname ?surname } OPTIONAL { ?p foaf:birthday ?birthday } OPTIONAL { ?p foaf:img ?img } OPTIONAL { ?p foaf:phone ?phone } OPTIONAL { ?p foaf:aimChatID ?aimChatID } OPTIONAL { ?p foaf:icqChatID ?icqChatID } OPTIONAL { ?p foaf:jabberID ?jabberID } OPTIONAL { ?p foaf:msnChatID ?msnChatID } OPTIONAL { ?p foaf:skypeID ?skypeID } OPTIONAL { ?p foaf:yahooChatID ?yahooChatID } }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 59
  • 60. Query Template UnsetProps SELECT DISTINCT ?result ?resultLabel WHERE { ?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> . ?result rdfs:domain foaf:Person . OPTIONAL { <PERSON> ?result ?var0 } FILTER ( !bound(?var0) ) <PERSON> foaf:knows ?var2 . ?var2 ?result ?var3 . ?result rdfs:label ?resultLabel . ?result vs:term_status ?var1 . } ORDER BY ?var1Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 60
  • 61. Query Template Incoming SELECT DISTINCT ?result WHERE { ?result foaf:knows <PERSON> . OPTIONAL { ?result foaf:knows ?var1 . FILTER ( <PERSON> = ?var1 ) <PERSON> foaf:knows ?result . } FILTER ( !bound(?var1) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 61
  • 62. Query Template 2ndDegree1 SELECT DISTINCT ?result WHERE { <PERSON> foaf:knows ?p1 . <PERSON> foaf:knows ?p2 . FILTER ( ?p1 != ?p2 ) ?p1 foaf:knows ?result . FILTER ( <PERSON> != ?result ) ?p2 foaf:knows ?result . OPTIONAL { <PERSON> ?knows ?result . FILTER ( ?knows = foaf:knows ) } FILTER ( !bound(?knows) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 62
  • 63. Query Template 2ndDegree2 SELECT DISTINCT ?result WHERE { <PERSON> foaf:knows ?p1 . <PERSON> foaf:knows ?p2 . FILTER ( ?p1 != ?p2 ) ?result foaf:knows ?p1 . FILTER ( <PERSON> != ?result ) ?result foaf:knows ?p2 . OPTIONAL { <PERSON> ?knows ?result . FILTER ( ?knows = foaf:knows ) } FILTER ( !bound(?knows) ) }Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 63
  • 64. Experiment – Single Query Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 1 Averaged over all 100 queries ● In the ideal case for Bupper= [ bcur , bcur ] : ● pupper( bcur , Bupper ) = c( bcur , [ ] ) – c( bcur , Bupper ) = c( bcur , [ ] ) ● supper( bcur , Bupper ) = b( bcur , Bupper ) – b( bcur , [ ] ) = 0Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 64
  • 65. Experiment – Single Query Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 1 Averaged over all 100 queries ● Summary (measurement errors aside): ● Same number of query results ● Significant improvements in query performanceOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 65
  • 66. Experiment – Complete Sequence Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 44.87 0.991 37.91 s reuse all queries (178.36) (0.053) (112.94) 1 Averaged over all 100 queries ● Summary: ● Data cache may provide for additional query results ● Impact on performance may be positive but also negativeOlaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 66
  • 67. Experiment – Complete Sequence Experiment Avg.1 number of Average1 Avg.1 query Query Results Hit Rate Execution Time (std.dev.) (std.dev.) (std.dev.) 27.37 0.849 64.95 s no reuse (140.49) (0.205) (124.50) 26.71 1 0.02 s reuse per query (148.77) (0) (0.07) 44.87 0.991 37.91 s reuse all queries (178.36) (0.053) (112.94) reuse all queries 118.18 0.992 20.61 s (random orders) (867.07) (0.016) (216.61) ● Executing the query sequence in a random order results in measurements similar to the given order.Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 67
  • 68. These slides have been created by Olaf Hartig http://olafhartig.de This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 68