ESWC2015 - Query Optimization for Clients of Linked Data Fragments

Query Optimization for Clients of Linked Data Fragments

Published in: Technology

  1. Query Execution Optimization for Clients of Triple Pattern Fragments: reducing HTTP traffic for scalable Linked Data consumption. Joachim Van Herwegen, Ruben Verborgh, Erik Mannens, Rik Van De Walle (ELIS – Multimedia Lab).
  2. Accessing Linked Data: SPARQL endpoints, data dumps, simple interfaces, … We are still looking for the ultimate Linked Data solution: full SPARQL support, high scalability, fast response time, low server and client load, … It has not been found yet, so we focused on improving the response time for clients using simple interfaces (Triple Pattern Fragments).
  3. Outline: Accessing Linked Data; Problem statement; Improved join tree; Optimizing local joins; Bringing it all together.
  4. Section: Accessing Linked Data
  5. Linked Data access extremes. SPARQL protocol: live data, full SPARQL support, high server load. Data dump: static data, remote: 1 query, local: full queries, high client load.
  6. Linked Data Fragments: a generic way to describe how Linked Data can be accessed. Data: results when accessing a selector. Metadata: description of the fragment. Controls: links to other fragments. (Verborgh et al., Web-scale querying through Linked Data Fragments)
  7. SPARQL endpoint: accessing data through a SPARQL endpoint. Data: bindings matching a SPARQL query. Metadata: { } (the data contains everything needed). Controls: { } (the interface can answer everything).
  8. Triple Pattern Fragments: accessing data through Triple Pattern Fragments. Data: triples matching a triple pattern. Metadata: count estimate, page size, etc. Controls: first page, next page, root fragment.
  9. Triple Pattern Fragments [figure with labels: URI query, results, metadata/controls]
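
The data, metadata and controls of slide 8 can be illustrated with a small in-memory model. This is only a sketch: `InMemoryTPFServer`, `Fragment` and `download_all` are hypothetical names, the page size of 100 is an assumption, and a real server exposes its controls as hypermedia rather than as page numbers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None marks a variable


@dataclass
class Fragment:
    data: List[Triple]        # data: triples matching the pattern on this page
    total_count: int          # metadata: count (exact here, an estimate in practice)
    page_size: int            # metadata: page size
    next_page: Optional[int]  # control: number of the next page, if any


class InMemoryTPFServer:
    """Hypothetical stand-in for a TPF server that counts every request."""

    def __init__(self, triples: List[Triple], page_size: int = 100):
        self.triples = list(triples)
        self.page_size = page_size
        self.requests = 0

    def fetch(self, pattern: Pattern, page: int = 1) -> Fragment:
        self.requests += 1
        matches = [t for t in self.triples
                   if all(p is None or p == v for p, v in zip(pattern, t))]
        start = (page - 1) * self.page_size
        chunk = matches[start:start + self.page_size]
        has_next = start + self.page_size < len(matches)
        return Fragment(chunk, len(matches), self.page_size,
                        page + 1 if has_next else None)


def download_all(server: InMemoryTPFServer, pattern: Pattern) -> List[Triple]:
    """Follow the next-page control until the pattern is fully downloaded."""
    triples: List[Triple] = []
    page: Optional[int] = 1
    while page is not None:
        fragment = server.fetch(pattern, page)
        triples.extend(fragment.data)
        page = fragment.next_page
    return triples


# 1,200 made-up triples, mirroring the size of the architect pattern on slide 11.
server = InMemoryTPFServer(
    [(f"person{i}", "a", "Architect") for i in range(1200)], page_size=100)
result = download_all(server, (None, "a", "Architect"))
```

With these assumptions, downloading the whole pattern costs one request per page, which is the quantity the rest of the talk tries to minimize.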
  10. Section: Problem statement
  11. Greedy algorithm: start from the smallest pattern, apply bindings and recurse.
      SELECT ?person ?city WHERE {
        ?person a db:Architect.                                  # 1,200 triples
        ?person db:birthPlace ?city.                             # 430,000 triples
        ?city dc:subject db:Category:Capitals_in_Europe.         # 60 triples
      }
      [Diagram: Capitals, birthPlace and architect, annotated with 1, 400 and 40,000]
  12. Optimized algorithm: find the optimal solution for every pattern, using the same query as on slide 11. [Diagram: Capitals, birthPlace and architect, annotated with 1, 400, "local" and 12]
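
The greedy strategy of slide 11 (start from the smallest pattern, apply bindings, recurse) can be sketched as follows. This is a simplified illustration and not the authors' client: `CountingStore` is a hypothetical in-memory stand-in for a server, the data is made up, and the sketch refetches the first page of every remaining pattern at each step just to read its count.

```python
from typing import Dict, Iterator, List, Tuple

Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
PAGE_SIZE = 100                 # assumed page size


class CountingStore:
    """Hypothetical in-memory server stand-in that counts page requests."""

    def __init__(self, triples: List[Tuple[str, str, str]]):
        self.triples = list(triples)
        self.calls = 0

    def page(self, pattern: Pattern, number: int):
        """Return (triples on this page, total match count); one call per page."""
        self.calls += 1
        matches = [t for t in self.triples
                   if all(p.startswith("?") or p == v for p, v in zip(pattern, t))]
        start = (number - 1) * PAGE_SIZE
        return matches[start:start + PAGE_SIZE], len(matches)


def bind(pattern: Pattern, binding: Dict[str, str]) -> Pattern:
    """Replace every already-bound variable in the pattern by its value."""
    return tuple(binding.get(term, term) for term in pattern)


def greedy(store: CountingStore, patterns: List[Pattern],
           binding: Dict[str, str] = None) -> Iterator[Dict[str, str]]:
    binding = binding or {}
    if not patterns:
        yield binding
        return
    # Read the count of every remaining pattern from its first page.
    bound = [bind(p, binding) for p in patterns]
    first_pages = [store.page(p, 1) for p in bound]
    best = min(range(len(bound)), key=lambda i: first_pages[i][1])
    pattern, (triples, count) = bound[best], first_pages[best]
    rest = patterns[:best] + patterns[best + 1:]
    number = 1
    while True:
        for triple in triples:
            extended = dict(binding)
            extended.update({term: value for term, value in zip(pattern, triple)
                             if term.startswith("?")})
            yield from greedy(store, rest, extended)  # recurse per binding
        if number * PAGE_SIZE >= count:
            break
        number += 1
        triples, _ = store.page(pattern, number)


store = CountingStore([
    ("alice", "a", "Architect"), ("bob", "a", "Architect"), ("carol", "a", "Architect"),
    ("alice", "birthPlace", "Brussels"), ("bob", "birthPlace", "Ghent"),
    ("dave", "birthPlace", "Brussels"), ("carol", "birthPlace", "Paris"),
    ("Brussels", "subject", "Capitals"), ("Paris", "subject", "Capitals"),
])
query = [("?person", "a", "Architect"),
         ("?person", "birthPlace", "?city"),
         ("?city", "subject", "Capitals")]
results = list(greedy(store, query))
```

Even on this tiny dataset the number of requests grows with the number of intermediate bindings, since every candidate person triggers its own request for the architect pattern. That is the behaviour the optimized algorithm avoids by downloading a small pattern once and joining locally.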
  13. Section: Improved join tree
  14. Optimized algorithm. Goal: minimize the HTTP calls required to solve a BGP query. Solution: every pattern in the query takes one of 2 possible roles: download the pattern completely, or bind a variable and download the resulting patterns. Estimate the best option for every pattern.
  15. Extended example:
      ?player :team ?club            (365,000 triples)
      ?club :type :SoccerClub        (16,000 triples)
      ?club :ground ?city            (15,000 triples)
      ?city :country :Spain          (7,000 triples)
      ?player :birthPlace ?city      (430,000 triples)
      Always download the smallest pattern. Determine the roles of the others based on shared variables and the results so far. Roles can change during runtime.
  16. Extended example [diagram: the five patterns linked through their shared variables ?city, ?club and ?player, with edges labelled "supplies ?city" and "supplied by ?city"]
  17. First iteration [diagram: the same five patterns and shared variables ?city, ?club and ?player]
  18. Further iterations: making sure no pattern is ignored. [diagram: the same five patterns and shared variables ?city, ?club and ?player]
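  19. Updating pattern roles: estimate which option requires the fewest HTTP calls. Download: $\frac{\#\text{triples}}{\text{pagesize}}$. Bind: $\frac{\text{avg triples/binding}}{\text{pagesize}} \cdot \max_{\text{suppliers}} \left( \frac{\text{bindings found}}{\text{triples downloaded}} \cdot \#\text{triples} \right)$, where the first factor is the average number of pages per binding and the fraction inside the maximum is the average number of bindings per triple. Swap roles when necessary, taking into account the work done so far.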
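
The two estimates of slide 19 can be written out as a short sketch. The function names and the sample numbers below are illustrative assumptions, and the formulas are applied literally as shown on the slide, without rounding up to whole requests or accounting for work already done.

```python
import math
from typing import List, Tuple

# One supplier: (bindings found so far, triples downloaded so far, total #triples).
Supplier = Tuple[int, int, int]


def download_cost(triple_count: int, page_size: int) -> float:
    """Estimated HTTP calls to download a pattern completely."""
    return triple_count / page_size


def bind_cost(avg_triples_per_binding: float, page_size: int,
              suppliers: List[Supplier]) -> float:
    """Estimated HTTP calls when the pattern is requested once per binding."""
    avg_pages_per_binding = avg_triples_per_binding / page_size
    # For each supplier: average bindings per triple, scaled to its full size.
    estimated_bindings = max(found / downloaded * total
                             for found, downloaded, total in suppliers)
    return avg_pages_per_binding * estimated_bindings


def choose_role(triple_count: int, avg_triples_per_binding: float,
                page_size: int, suppliers: List[Supplier]) -> str:
    """Pick the role with the lower estimated number of HTTP calls."""
    download = download_cost(triple_count, page_size)
    bind = bind_cost(avg_triples_per_binding, page_size, suppliers)
    return "download" if download <= bind else "bind"


# Illustrative numbers only: a 1,200-triple pattern, an assumed page size of 100,
# and one supplier that yielded 50 bindings from its first 100 of 40,000 triples.
PAGE_SIZE = 100
download_estimate = download_cost(1200, PAGE_SIZE)
bind_estimate = bind_cost(1.0, PAGE_SIZE, [(50, 100, 40_000)])
role = choose_role(1200, 1.0, PAGE_SIZE, [(50, 100, 40_000)])
```

Under these assumed numbers the download estimate is far below the bind estimate, so the pattern would be downloaded in full and joined locally. The slide's point that roles can be swapped corresponds to re-running this comparison as more bindings are found.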
  20. Section: Optimizing local joins
  21. Local joining: most join data from previous iterations can be reused. Challenge: reuse as much data as possible. [Diagram: triple data in iteration i and in iteration i+1, which contains new triples]
  22. Join tree step [diagram: iteration i and iteration i+1, with nodes labelled "Bindings i-1", "Triples i" and "Bindings i", and changed nodes marked "New"]
  23. Minimizing joins: start with the largest unchanged, connected set of patterns. Estimate the remainder of the join order based on pattern size and connectivity. [Diagram labels: New, Bindings i-1, Triples i, New triples, Propagated changes]
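
The idea of reusing join data between iterations (slides 21 to 23) can be illustrated with a generic incremental join: when new triples arrive, only the new input is joined against what is already stored. This is a minimal sketch of that general technique under assumed names (`IncrementalJoin`, `compatible`), not the join tree described in the talk.

```python
from typing import Dict, List

Binding = Dict[str, str]


def compatible(a: Binding, b: Binding) -> bool:
    """Two bindings are compatible if they agree on every shared variable."""
    return all(b.get(variable, value) == value for variable, value in a.items())


class IncrementalJoin:
    """Keeps both inputs and the results, so each update only joins the delta."""

    def __init__(self) -> None:
        self.left: List[Binding] = []
        self.right: List[Binding] = []
        self.results: List[Binding] = []
        self.pair_checks = 0  # number of binding pairs compared so far

    def _join(self, lefts: List[Binding], rights: List[Binding]) -> None:
        for l in lefts:
            for r in rights:
                self.pair_checks += 1
                if compatible(l, r):
                    self.results.append({**l, **r})

    def add_left(self, new: List[Binding]) -> None:
        self._join(new, self.right)   # new left input against stored right input
        self.left.extend(new)

    def add_right(self, new: List[Binding]) -> None:
        self._join(self.left, new)    # stored left input against new right input
        self.right.extend(new)


join = IncrementalJoin()
# Iteration i: bindings for ?city and a first batch of birthPlace triples.
join.add_left([{"?city": "Brussels"}, {"?city": "Paris"}])
join.add_right([{"?person": "alice", "?city": "Brussels"}])
# Iteration i+1: one new triple arrives; earlier results are kept as they are.
join.add_right([{"?person": "carol", "?city": "Paris"}])
```

In this example the second update compares the new triple with the two stored bindings only, instead of recomputing the full join over all inputs.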
  24. Section: Bringing it all together
  25. Summary: prevent local optima; use a join tree instead of a join path; reuse local join data.
  26. Test setup: a single machine (Intel Core i5-3230M CPU @ 2.60GHz, 8 GB RAM) running both client and server, with an artificial delay of 100 ms on the server to simulate network delay.
  27. Results: WatDiv benchmark queries, 100 ms delay on the server. [Charts: median number of HTTP calls; median time (s)]
  28. Conclusion: fewer HTTP calls with more client-side processing. Ideal for slow connection situations. Still room for improvements. No parallelism. Focus on BGPs. More work per HTTP call. Not guaranteed to be better.
  29. Thank you! Come see demo #13 on Thursday. Questions?
