Linked Data Query ProcessingTutorial at the 22nd International World Wide Web Conference (WWW 2013)May 14, 2013http://db.u...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 2● Result construction approach● i.e., query-local ...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 3Query-Specific Relevance of URIs● Definition: A UR...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 4Objective of Source Selection● Source selection: G...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 5Outline Objectives of Source Selection Index-Bas...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 6Idea of Index-Based Source Selection● Use a pre-po...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 7General Properties of Lookup Indexes● Index entrie...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 8Perfect Summaries● Triple-pattern-based indexes● “...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 9Approximate Summaries● Recall, index keys may rang...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 10Multidimensional Histograms● Transform RDF triple...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 11Multidimensional Histograms● Transform RDF triple...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 12RootQTree● Combination of histograms and R-trees ...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 13Index Construction● Given a set of URIs to index,...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 14Index Maintenance● Adding additionally discovered...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 15Outline Objectives of Source Selection Index-Ba...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 16Live Exploration● General idea: Perform a recursi...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 17Comparison to Focused Crawling● Separate pre-runt...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 18Outline Objectives of Source Selection Index-Ba...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 19Live Exploration – vs. – Index-Based● Possibiliti...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 20Hybrid Source SelectionWhy not get the best of bo...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 21Outline Objectives of Source Selection Index-Ba...
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 22These slides have been created byOlaf Hartigfor t...
Upcoming SlideShare
Loading in …5
×

Tutorial "Linked Data Query Processing" Part 3 "Source Selection Strategies" (WWW 2013 Ed.)

943
-1

Published on

These are the slides from my WWW 2013 Tutorial "Linked Data Query Processing" http://db.uwaterloo.ca/LDQTut2013/

Published in: Technology, Business
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
943
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
35
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Tutorial "Linked Data Query Processing" Part 3 "Source Selection Strategies" (WWW 2013 Ed.)

  1. 1. Linked Data Query ProcessingTutorial at the 22nd International World Wide Web Conference (WWW 2013)May 14, 2013http://db.uwaterloo.ca/LDQTut2013/3. Source SelectionOlaf HartigUniversity of Waterloo
  2. 2. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 2● Result construction approach● i.e., query-local data processinghttp://mdb.../Paul http://geo.../Berlinhttp://mdb.../Ric http://geo.../Rome?loc?actor● Combining data retrievaland result construction● Data retrieval approach● Data source selection● Data source ranking(optional, for optimization)GET http://.../movie2449“Ingredients” for LD Query ExecutionQuery-local data
  3. 3. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 3Query-Specific Relevance of URIs● Definition: A URI is relevant for a given query if looking upthis URI gives us data that contributes to the query result.● Example:● Conjunctive query (BGP): { (Bob, lives in, ?x) , (?y, lives in, ?x) }● Looking up URI Bob gives us: { (Bob, lives in, Berlin) , ... }● Looking up URI Alice gives us: { (Alice, lives in, Berlin) , ... }● Hence, μ = { ?x → Berlin , ?y → Alice } is a solution● Thus, URIs Bob and Alice are relevant for the query● Simply contributing a matching triple is not sufficient:● Suppose, URI Charles gives us { (Charles, lives in, London) , ... }● Since the matching triple cannot be used for computinga solution, URI Charles is not relevant.
  4. 4. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 4Objective of Source Selection● Source selection: Given a Linked Data query,determine a set of URIs to look up● Ideal source selection approach:● For any query, selects all relevant URIs● For any query, selects relevant URIs only● Irrelevant URIs are not required to answer the query● Avoiding their lookup reduces cost of query executionssignificantly!● Caveat:● What URIs are relevant (resp. irrelevant) is unknownbefore the query execution has been completed.
  5. 5. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 5Outline Objectives of Source Selection Index-Based Strategy➢ General Idea➢ Possible Index Structures Live Exploration Strategy Comparison of both Strategies Combining both Strategies√
  6. 6. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 6Idea of Index-Based Source Selection● Use a pre-populated index structure to determine relevantURIs (and to avoid as many irrelevant ones as possible)● Example: triple-pattern-based indexes● For single triple pattern queries, sourceselection using such an index structure issound and complete (w.r.t. the indexed URIs)Entry: { uri1, uri2, … , urin }Key: tpGET uriimatches
  7. 7. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 7General Properties of Lookup Indexes● Index entries:● Usually, a set of URIs● Each URI in such an entry may be pairedwith a cardinality (utilized for source ranking)● Indexed URIs may appear multiple times(i.e., associated with multiple index keys)● Type of index keys depends on theparticular index structure used● e.g., triple patterns● Represent a summary of the data from all indexed URIs● Perfect summary: index keys are individual elements● Approximate summary: index keys may range over elements
  8. 8. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 8Perfect Summaries● Triple-pattern-based indexes● “Inverted URI Indexing” [UHK+11]● “Schema-level Indexing” [UHK+11]● Index keys: schema elements● Like a triple-pattern-based index that considers only two typesof triple patterns: ( ?s, property, ?o ) and ( ?s, rdf:type, class )● Tian et al. [TUY11]● Index keys: Unique encodings of combinations of triplepatterns (i.e., BGPs) frequently found in a query workloadKey: urimentioned inEntry: { uri1, … , urin } GET urii
  9. 9. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 9Approximate Summaries● Recall, index keys may range over elements● Advantage: approximation reduces index size● Disadvantage: index lookup may return false positives● Examples of data structures used:● Multidimensional histogram [UHK+11]● QTree [HHK+10, UHK+11]
  10. 10. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 10Multidimensional Histograms● Transform RDF triples to points in a 3-dimensional space(Bob, lives in, Berlin) → hash function → (422, 247, 143)
  11. 11. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 11Multidimensional Histograms● Transform RDF triples to points in a 3-dimensional space(Bob, lives in, Berlin) → hash function → (422, 247, 143)● Buckets partition that space into disjoint regions● Indexing: Each bucket contains entries for all URIs whosedata includes an RDF triple in the corresponding region● Source selection:● Transform triple patterns to lines / planes in the space(Bob, lives in, ?x) → (422, 247, ?)● Any URI relevant for the triple patternmay only be contained in buckets whoseregion is touched by the line / plane● Pruning due to non-overlapping regions
  12. 12. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 12RootQTree● Combination of histograms and R-trees (i.e., hierarchical)● Leaf nodes are the buckets● Different buckets mayrepresent regions ofdifferent size(in contrast to fixed-sizedregions used for MDH)● Non-populated regionsare ignored● Deals more efficiently with a spacethat is populated sparsely orcontains many clustersBCAA1A2RootA BA1 A2CB1 B2B2B1
  13. 13. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 13Index Construction● Given a set of URIs to index, each of these URIs needs tobe looked up and its data needs to be retrieved● Alternative: crawl the Web to obtain URIs and their data● Alternative: populate index as a by-product of executingqueries using live-exploration-based source selection
  14. 14. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 14Index Maintenance● Adding additionally discovered URIs● Keeping the index in sync with original data● Still an open research problem● Similar to index maintenance ininformation retrieval andview maintenance indatabase systems
  15. 15. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 15Outline Objectives of Source Selection Index-Based Strategy➢ General Idea➢ Possible Index Structures Live Exploration Strategy Comparison of both Strategies Combining both Strategies√√
  16. 16. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 16Live Exploration● General idea: Perform a recursive URI lookup processat query execution runtime● Start from a set of seed URIs● Explore the queried Web by traversing data links● Retrieved data serves two purposes:(1) Discover further URIs(2) Construct query result● Lookup of URIs may be constrained(i.e., not all links need be traversed)● Natural support of reachability-based query semantics
  17. 17. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 17Comparison to Focused Crawling● Separate pre-runtime (orbackground) process● Crawler populatesa search index ora local database● Essential part of the queryexecution process itself● Live exploration aimsto discover data foranswering a particularquery● URIs qualify for lookupbecause of their highrelevance for a topic● Relevance of URIsrelated to the queryat handFocused Crawling vs. Live Exploration
  18. 18. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 18Outline Objectives of Source Selection Index-Based Strategy➢ General Idea➢ Possible Index Structures Live Exploration Strategy Comparison of both Strategies Combining both Strategies√√√
  19. 19. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 19Live Exploration – vs. – Index-Based● Possibilities for parallelizeddata retrieval are limited● Data retrieval adds to queryexecution time significantly● Usable immediately● Most suitable for “on-demand” querying scenario● Depends on the structureof the network of data links● Data retrieval can be fullyparallelized● Reduces the impact of dataretrieval on query exec. time● Usable only afterinitialization phase● Depends on what has beenselected for the index● May miss new data sourcesNone of both strategies is superior over the other w.r.t.result completeness (under full-Web query semantics).● Both strategies may miss (different) solutions for a query
  20. 20. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 20Hybrid Source SelectionWhy not get the best of both strategies by combining them?● Ideas:● Use index to obtain seed URIs for live exploration(e.g., “mixed strategy” [LT10])● Feed back information discovered by live explorationto update, to expand, or to reorganize the index● Use data summary for controlling a live exploration process(e.g., by prioritizing the URIs scheduled for lookup)
  21. 21. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 21Outline Objectives of Source Selection Index-Based Strategy➢ General Idea➢ Possible Index Structures Live Exploration Strategy Comparison of both Strategies Combining both Strategies√√√√√Next part: 4. Execution Process ...
  22. 22. WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 22These slides have been created byOlaf Hartigfor theWWW 2013 tutorial onLink Data Query ProcessingTutorial Website: http://db.uwaterloo.ca/LDQTut2013/This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License(http://creativecommons.org/licenses/by-sa/3.0/)(Slides 10,11, and 12 are inspired by slidesfrom Andreas Harth [HHK+10] – Thanks!)
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×