Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Linked Data Query ProcessingTutorial at the 22nd International World Wide Web Conference (WWW 2013)May 14, 2013http://db.u...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 2Query Plan Selection● Possible asse...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 3Outline Heuristics-Based Planning...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 4Heuristics-Based Plan Selection [Ha...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 5?p ex:affiliated_with <http://.../o...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 6?p ex:affiliated_with <http://.../o...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 7?p ex:affiliated_with <http://.../o...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 8Recall assumption:seed URIs = URIs ...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 9INSTANCE SEED RULE● Patterns to avo...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 10FILTER RULE● Filtering triple patt...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 11Outline Heuristics-Based Planning...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 12Next?Next?tp3= ( ?b , rdf:type , <...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 13Next?Next?tp3= ( ?b , rdf:type , <...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 14Next?Next?tp3= ( ?b , rdf:type , <...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 15Next?Next?tp3= ( ?b , rdf:type , <...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 16Next?Next?tp3= ( ?b , rdf:type , <...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 17Postponing Iterator [HBF09]● Idea:...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 19Outline Heuristics-Based Planning...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 20General Idea of Source RankingRank...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 21Harth et al. [HHK+10, UHK+11]● For...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 22Harth et al. [HHK+10, UHK+11]● For...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 23Ladwig and Tran [LT10]● Multiple s...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 24Metric: Triple Pattern Cardinality...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 25Metric: TF–ISF [LT10]● Idea: adopt...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 26Metric: Join Pattern Cardinality [...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 27Ladwig and Tran [LT10]● Multiple s...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 28Refinement at Run-Time [LT10]● Dur...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 29Outline Heuristics-Based Planning...
WWW 2013 Tutorial on Linked Data Query Processing [ Introduction ] 30Tutorial Outline(1) Introduction(2) Theoretical Found...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 31These slides have been created byO...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 32These slides have been created byO...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 33Backup Slides
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 34Metric: Links to Results [LT10]● R...
WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 35Metric: Retrieval Cost [LT10]● Rat...
Upcoming SlideShare
Loading in …5
×

Tutorial "Linked Data Query Processing" Part 5 "Query Planning and Optimization" (WWW 2013 Ed.)

1,200 views

Published on

These are the slides from my WWW 2013 Tutorial "Linked Data Query Processing" http://db.uwaterloo.ca/LDQTut2013/

Published in: Technology
  • Be the first to comment

Tutorial "Linked Data Query Processing" Part 5 "Query Planning and Optimization" (WWW 2013 Ed.)

  1. 1. Linked Data Query ProcessingTutorial at the 22nd International World Wide Web Conference (WWW 2013)May 14, 2013http://db.uwaterloo.ca/LDQTut2013/5. Query Planningand OptimizationOlaf HartigUniversity of Waterloo
  2. 2. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 2Query Plan Selection● Possible assessment criteria:● Benefit (size of computed query result)● Cost (overall query execution time)● Response time (time for returning k solutions)● To select from candidate plans, criteria must be estimated● For index-based source selection: estimation may bebased on information recorded in the index [HHK+10]● For (pure) live exploration: estimation impossible● No a-priori information available● Use heuristics instead
  3. 3. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 3Outline Heuristics-Based Planning Optimizing Link Traversing Iterators➢ Prefetching➢ Postponing Source Ranking➢ Harth et al. [HHK+10, UHK+11]➢ Ladwig and Tran [LT10]
  4. 4. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 4Heuristics-Based Plan Selection [Har11a]● Four rules:● DEPENDENCY RULE● SEED RULE● INSTANCE SEED RULE● FILTER RULE● Tailored to LTBQE implemented by link traversing iterators● Assumptions about queries:● Query pattern refers to instance data● URIs mentioned in the query pattern are the seed URIs
  5. 5. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 5?p ex:affiliated_with <http://.../orgaX>?p ex:interested_in ?b?b rdf:type <http://.../Book>QueryDEPENDENCY RULE● Dependency: a variable from each triple pattern alreadyoccurs in one of the preceding triple patternstp1= ( ?p , ex:affiliated_with , <http://.../orgaX>) I1tp2= ( ?p , ex:interested_in , ?b ) I2tp3= ( ?b , rdf:type , <http://.../Book> ) I3Use a dependency respecting query plan√
  6. 6. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 6?p ex:affiliated_with <http://.../orgaX>?p ex:interested_in ?b?b rdf:type <http://.../Book>QueryDEPENDENCY RULE● Dependency: a variable from each triple pattern alreadyoccurs in one of the preceding triple patternstp1= ( ?p , ex:affiliated_with , <http://.../orgaX>) I1tp2= ( ?p , ex:interested_in , ?b ) I2tp3= ( ?b , rdf:type , <http://.../Book> ) I3Use a dependency respecting query plan
  7. 7. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 7?p ex:affiliated_with <http://.../orgaX>?p ex:interested_in ?b?b rdf:type <http://.../Book>QueryDEPENDENCY RULE● Dependency: a variable from each triple pattern alreadyoccurs in one of the preceding triple patterns● Rationale:Avoidcartesianproductstp1= ( ?p , ex:affiliated_with , <http://.../orgaX>) I1tp2= ( ?b , rdf:type , <http://.../Book> ) I2tp3= ( ?p , ex:interested_in , ?b ) I3Use a dependency respecting query plan
  8. 8. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 8Recall assumption:seed URIs = URIs in the querySEED RULE● Seed triple pattern of a plan… is the first triple pattern in the plan, and… contains at least one HTTP URI● Rationale:Good starting pointUse a plan with a seed triple pattern?p ex:affiliated_with <http://.../orgaX>?p ex:interested_in ?b?b rdf:type <http://.../Book>Query√√√
  9. 9. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 9INSTANCE SEED RULE● Patterns to avoid:✗ ?s ex:any_property ?o✗ ?s rdf:type ex:any_class● Rationale: URIs for vocabulary terms usually resolve tovocabulary definitions with little instance dataAvoid a seed triple pattern with vocabulary terms?p ex:affiliated_with <http://.../orgaX>?p ex:interested_in ?b?b rdf:type <http://.../Book>Query√
  10. 10. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 10FILTER RULE● Filtering triple pattern: each variable already occurs in oneof the preceding triple patterns● For each valuationconsumed as inputa filtering TP canonly report 1 or 0valuations asoutput● Rationale: Reducecosttp2= ( ?p , ex:interested_in , ?b ) I2tp3= ( ?b , rdf:type , <http://.../Book> ) I3Use a plan where all filtering triple patterns areas close to the first triple pattern as possible{ ?p = <http://.../alice> }{ ?p = <http://.../alice> , ?b = <http://.../b1> }tp2 = ( <http://.../alice> , ex:interested_in , ?b )tp3 = ( <http://.../b1> , rdf:type , <http://.../Book> )tp1= ( ?p , ex:affiliated_with , <http://.../orgaX>) I1
  11. 11. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 11Outline Heuristics-Based Planning Optimizing Link Traversing Iterators➢ Prefetching➢ Postponing Source Ranking➢ Harth et al. [HHK+10, UHK+11]➢ Ladwig and Tran [LT10]√
  12. 12. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 12Next?Next?tp3= ( ?b , rdf:type , <http://.../Book> ) I3tp1= ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1tp2= ( ?p , ex:interested_in , ?b )tp2 = ( <http://.../alice> , ex:interested_in , ?b )I2query-localdataset{ ?p = <http://.../alice> }Link Traversing Iterators May Block!
  13. 13. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 13Next?Next?tp3= ( ?b , rdf:type , <http://.../Book> ) I3tp1= ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1tp2= ( ?p , ex:interested_in , ?b )tp2 = ( <http://.../alice> , ex:interested_in , ?b )I2query-localdataset{ ?p = <http://.../alice> }Link Traversing Iterators May Block!Initiate look-up(s)and wait
  14. 14. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 14Next?Next?tp3= ( ?b , rdf:type , <http://.../Book> ) I3tp1= ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1tp2= ( ?p , ex:interested_in , ?b )tp2 = ( <http://.../alice> , ex:interested_in , ?b )I2query-localdataset{ ?p = <http://.../alice> }Link Traversing Iterators May Block!Initiate look-up(s)and wait
  15. 15. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 15Next?Next?tp3= ( ?b , rdf:type , <http://.../Book> ) I3tp1= ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1tp2= ( ?p , ex:interested_in , ?b )tp2 = ( <http://.../alice> , ex:interested_in , ?b )I2query-localdataset{ ?p = <http://.../alice> }Prefetching of URIs [HBF09]Ensure look-upis finishedInitiatelook-upin thebackgroundInitiate look-up(s)and wait
  16. 16. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 16Next?Next?tp3= ( ?b , rdf:type , <http://.../Book> ) I3tp1= ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1tp2= ( ?p , ex:interested_in , ?b )tp2 = ( <http://.../alice> , ex:interested_in , ?b )I2query-localdataset{ ?p = <http://.../alice> }Prefetching of URIs [HBF09]Wait until look-upis finishedInitiatelook-upin thebackgroundInitiate look-up(s)and wait
  17. 17. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 17Postponing Iterator [HBF09]● Idea: temporarily reject an input solutionif processing it would cause blocking● Enabled by an extension of the iterator paradigm:● New function POSTPONE: treat the element most recentlyreported by GETNEXT as if ithas not yet been reported(i.e., “take back” this element)● Adjusted GETNEXT: either return a (new) next element orreturn a formerly postponed element
  18. 18. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 19Outline Heuristics-Based Planning Optimizing Link Traversing Iterators➢ Prefetching➢ Postponing Source Ranking➢ Harth et al. [HHK+10, UHK+11]➢ Ladwig and Tran [LT10]√√
  19. 19. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 20General Idea of Source RankingRank the URIs resulting from source selectionsuch thatthe ranking represents a priority for lookup● Possible objectives:● Report first solutions as early as possible● Minimize time for computing the first k solutions● Maximize the number of solutions computed in agiven amount of time
  20. 20. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 21Harth et al. [HHK+10, UHK+11]● For triple patterns this number is directly available:● Recall, each QTree bucket stores a set of (URI,count)-pairs● All query-relevant buckets are known after source selectionFor any URI u (selected by the QTree-based approach), let:rank(u) :═ estimated number of solutions that u contributes toRootBCAA1A2B2B1RootA BA1 A2CB1 B2
  21. 21. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 22Harth et al. [HHK+10, UHK+11]● For triple patterns this number is directly available:● Recall, each QTree bucket stores a set of (URI,count)-pairs● All query-relevant buckets are known after source selection● For BGPs, estimate the number recursively:● Recursively determine regions of join-able data(based on overlapping QTree buckets for each triple pattern)● For each of these regions, recursively estimate number oftriples the URI contributes to the region● Factor in the estimated join result cardinality of these regions(estimated based on overlap between contributing buckets)For any URI u (selected by the QTree-based approach), let:rank(u) :═ estimated number of solutions that u contributes to
  22. 22. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 23Ladwig and Tran [LT10]● Multiple scores● Triple pattern cardinality● Triple frequency – inverse source frequency (TF–ISF)● (URI-specific) join pattern cardinality● Incoming links● Assumption: pre-populated index that stores triple patterncardinalities and join pattern cardinalities for each URI● Aggregation of the scores to obtain ranks● For indexed URIs: weighted summation of all scores● For non-indexed URIs: weighting of (currently known) in-links● Ranking is refined at run-time
  23. 23. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 24Metric: Triple Pattern Cardinality [LT10]● Rationale: data that contains many matching triplesis likely to contribute to many solutions● Requirement: pre-populated index that stores the cardinalities● Caveat: some triple patterns have a highcardinality for almost all URIs● Example: (?x, rdf:type, ?y)● These patterns do not discriminate URIsFor a selected URI u, and a triple pattern tp (from the query), let:card(u, tp) :═ number of triples in the data of u that match tp
  24. 24. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 25Metric: TF–ISF [LT10]● Idea: adopt TF-IDF concept to weight triple patterns● Triple Frequency – Inverse Source Frequency (TF–ISF)● Rationale:● Importance positively correlates to the number of matchingtriples that occur in the data for a URI● Importance negatively correlates to how often matchingtriples occur for all known URIs (i.e., all indexed URIs)For a selected URI u, a triple pattern tp, and a set of all knownURIs Uknown , let:tf.isf (u ,tp):=card (u ,tp) ∗ log( ∣U known∣{r∈U known ∣ card (r ,tp)>0})
  25. 25. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 26Metric: Join Pattern Cardinality [LT10]● Rationale: data that matches pairs of (joined) triple patternsis highly relevant, because it matches a largerpart of the query● Requirement: these join cardinalities are also pre-computedand stored in a pre-populated indexFor a selected URI u, two triple pattern tpi and tpj , andquery variable v, let:card(u, tpi , tpj , v) :═ number of solutions producedby joining tpi and tpj on variable vusing only the data from u
  26. 26. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 27Ladwig and Tran [LT10]● Multiple scores● Triple pattern cardinality● Triple frequency – inverse source frequency (TF–ISF)● (URI-specific) join pattern cardinality● Incoming links● Assumption: pre-populated index that stores triple patterncardinalities and join pattern cardinalities for each URI● Aggregation of the scores to obtain ranks● For indexed URIs: weighted summation of all scores● For non-indexed URIs: weighting of (currently known) in-links● Ranking is refined at run-time
  27. 27. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 28Refinement at Run-Time [LT10]● During query execution information becomes available(1) intermediate join results (2) more incoming links● Use it to adjust scores & ranking (for integrated execution)● Re-estimate join pattern cardinalities based on samples ofintermediate results (available from hash tables in SHJ)● Parameters for influencing behavior of ranking process:● Invalid score threshold: re-rank when the number of URIswith invalid scores passes this threshold● Sample size: larger samples give better estimates, but makethe process more costly● Re-sampling threshold: reuse cached estimates unless thehash table of join operators grows past this threshold
  28. 28. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 29Outline Heuristics-Based Planning Optimizing Link Traversing Iterators➢ Prefetching➢ Postponing Source Ranking➢ Harth et al. [HHK+10, UHK+11]➢ Ladwig and Tran [LT10]√√√
  29. 29. WWW 2013 Tutorial on Linked Data Query Processing [ Introduction ] 30Tutorial Outline(1) Introduction(2) Theoretical Foundations(3) Source Selection Strategies(4) Execution Process(5) Query Planning and Optimization… Thanks!
  30. 30. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 31These slides have been created byOlaf Hartigfor theWWW 2013 tutorial onLink Data Query ProcessingTutorial Website: http://db.uwaterloo.ca/LDQTut2013/This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License(http://creativecommons.org/licenses/by-sa/3.0/)(Some of the slides in this slide set have been inspired byslides from Günter Ladwig [LT10] – Thanks!)
  31. 31. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 32These slides have been created byOlaf Hartigfor theWWW 2013 tutorial onLink Data Query ProcessingTutorial Website: http://db.uwaterloo.ca/LDQTut2013/This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License(http://creativecommons.org/licenses/by-sa/3.0/)(Slides 24 - 26, 33, and 34 are inspired by slidesfrom Günter Ladwig [LT10] – Thanks!)
  32. 32. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 33Backup Slides
  33. 33. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 34Metric: Links to Results [LT10]● Rationale: a URI is more relevant if data frommany relevant URIs mention it● Links are only discovered at run-timeThe “links to results” of a selected URI u is defined by:where Uprocessed is the set of URIs whose data has already beenprocessed and links( u1 , u2 ) are the links to URI u1 mentionedin the data from URI u2.links(u):={l ∈links(u ,uprocessed )∣u processed ∈U processed }
  34. 34. WWW 2013 Tutorial on Linked Data Query Processing [ Query Planning and Optimization ] 35Metric: Retrieval Cost [LT10]● Rationale: URIs are more relevant the faster their data canbe retrieved● Size is available in the pre-populated index● Bandwidth for any particular host can be approximatedbased on past experience or average performancerecorded during the query execution processThe retrieval cost of a selected URI u is defined by:cost( u) :═ Agg( size(u) , bandwidth(u) )where size(u) is the of the data from u, and bandwidth(u) is thebandwidth of the Web server that hosts u.

×