Linked Data Top-K Query Processing
 


"Linked Data Top-K Query Processing" paper at ESWC'12.


  • Introduction: challenges in current Linked Data query processing; processing of ranked Linked Data; our contributions. Top-k: top-k query processing in a Linked Data setting; improving the threshold estimation; early pruning of partial results.
  • Linked Data query processing is a special case of federated query processing: only HTTP lookups are available for data access, and entire sources have to be retrieved.
  • Top-k processing provides strategies for computing only the k top-ranked results; other (less relevant) results are not materialized. For computing the top-1 result, no data from Src. 2 is needed.
  • Linked Data specific improvements: tighter threshold estimation and early pruning of partial results.
  • For instance, scores for triples can be obtained through PageRank-inspired ranking [4]. However, no triples are indexed, i.e., each source must be scanned.
  • Join inputs must be accessible in descending score order. We store the minimum and maximum triple score per source and, via a scheduling strategy, allow sources to be accessed in descending score order.
  • Given our ranking function, sorted access, and the source index, we can employ a push-based rank join.
  • The threshold allows us to estimate the scores of unseen query result bindings and to terminate early.
  • Systems compared: a push-based symmetric hash join operator (shj); a rank-join operator with corner bound (rj-cc) [6]; and a rank-join operator with tighter corner bound and early pruning (rj-tc). All use push-based join processing and left-deep join trees. Due to network latency issues, sources were downloaded and Linked Data access was simulated on a single machine.
  • Differences are due to less input data being retrieved. Some queries (e.g., q10 or q20) perform equally because their result sets are too small, i.e., all (!) data had to be retrieved. Differences between rj-cc and rj-tc do not show properly in (a) because the evaluation ran on a local machine. The outlier q19 is due to an implementation issue. Q9: early pruning saved 8% of buffered data. However, it had no "real" impact on efficiency; the main factor here is the number of sources to be retrieved.
  • (b) Average number of sources (different k, d = n). (c) Average evaluation time (different k, d = n). (d) Average evaluation time (different n, k = 10). (e) Average evaluation time with varying number of triple patterns (k = 1, d = n).
  • Memory is consumed by the "seen" and "output" buffers. We prune any partial result whose (partial) score, together with the maximal possible score for the unevaluated query part, is ≤ the currently smallest top-k score.

Linked Data Top-K Query Processing: Presentation Transcript

  • Top-k Linked Data Query Processing. Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Andreas Harth, and Rudi Studer. Institute of Applied Informatics and Formal Description Methods (AIFB), KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association, www.kit.edu
  • Introduction and Motivation; Top-k Linked Data Query Processing; Evaluation Results. Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Andreas Harth, and Rudi Studer, Institute of Applied Informatics and Formal Description Methods (AIFB)
  • INTRODUCTION & MOTIVATION
  • Linked Data Query Processing: the Linked Data query processing engine performs HTTP lookups on data URIs to retrieve data from the data sources. Problems: efficiency and scalability.
  • Top-K Query Processing: Users are usually interested in only a few results. Top-k query processing addresses the efficiency and scalability issues. Example data and query:
    Src. 1: ex:beatles foaf:name "The Beatles"; ex:album ex:sgt_pepper; ex:album ex:help.
    Src. 2: ex:sgt_pepper foaf:name "Sgt. Pepper"; ex:song "Lucy".
    Src. 3: ex:help foaf:name "Help!"; ex:song "Help!".
    SELECT * WHERE { ex:beatles ex:album ?album . ?album ex:song ?song . }
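For contrast with top-k processing, here is a minimal Python sketch of the non-ranked baseline on this example. It assumes Src. 1 holds the ex:beatles triples, Src. 2 the ex:sgt_pepper triples and Src. 3 the ex:help triples; `SOURCES` and `evaluate_all` are hypothetical names. Every source is retrieved before the join, which is what becomes expensive when only a few results are wanted.

```python
# Hypothetical in-memory copy of the three example sources (one list per source).
SOURCES = {
    "src1": [
        ("ex:beatles", "foaf:name", '"The Beatles"'),
        ("ex:beatles", "ex:album", "ex:sgt_pepper"),
        ("ex:beatles", "ex:album", "ex:help"),
    ],
    "src2": [
        ("ex:sgt_pepper", "foaf:name", '"Sgt. Pepper"'),
        ("ex:sgt_pepper", "ex:song", '"Lucy"'),
    ],
    "src3": [
        ("ex:help", "foaf:name", '"Help!"'),
        ("ex:help", "ex:song", '"Help!"'),
    ],
}


def evaluate_all(sources):
    """Naive evaluation of { ex:beatles ex:album ?album . ?album ex:song ?song . }.

    Every source is retrieved first; only then are the two patterns joined on ?album.
    """
    triples = [t for triples_of_src in sources.values() for t in triples_of_src]
    albums = [o for s, p, o in triples if s == "ex:beatles" and p == "ex:album"]
    results = []
    for album in albums:
        for s, p, o in triples:
            if s == album and p == "ex:song":
                results.append({"album": album, "song": o})
    return results


for row in evaluate_all(SOURCES):
    print(row)
```

Both results are produced, but even a top-1 request answered this way touches all three sources.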
  • Contributions: transfer top-k query processing to the Linked Data setting; Linked Data specific improvements of the top-k approach; evaluation using real-world data.
  • TOP-K LINKED DATA QUERY PROCESSING
  • Top-K Query Processing in a Linked Data Setting (1) – Requirements (1): a source index mapping triple patterns to sources containing bindings (e.g., [1,2]), and a ranking function determining the relevance of triple pattern bindings. Example: TP1: ex:beatles ex:album ?album . TP2: ?album ex:song ?song . The source index maps TP1 to Src. 1 (score ∈ [0,1]) and TP2 to Src. 2 (score ∈ [1,2]) and Src. 3 (score ∈ [2,3]).
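The two requirements can be pictured with a small Python sketch. The source index here is a plain dictionary keyed by the triple pattern's text, a simplifying assumption (real source indexes such as the data summaries in [1] are far more elaborate), and each entry carries the per-source score range of its triples. `SourceEntry`, `SOURCE_INDEX` and `relevant_sources` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SourceEntry:
    """Hypothetical source-index entry: a source and the score range of its triples."""
    uri: str
    min_score: float
    max_score: float


# Toy source index, keyed by the triple pattern's text (a simplifying assumption).
SOURCE_INDEX: Dict[str, List[SourceEntry]] = {
    "ex:beatles ex:album ?album": [SourceEntry("src1", 0.0, 1.0)],
    "?album ex:song ?song": [
        SourceEntry("src2", 1.0, 2.0),
        SourceEntry("src3", 2.0, 3.0),
    ],
}


def relevant_sources(pattern: str) -> List[SourceEntry]:
    """Sources that may contain bindings for `pattern` (empty if none are indexed)."""
    return SOURCE_INDEX.get(pattern, [])


for tp in SOURCE_INDEX:
    entries = relevant_sources(tp)
    print(tp, "->", [(e.uri, e.min_score, e.max_score) for e in entries])
```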
  • Top-K Query Processing in a Linked Data Setting (2) – Requirements (2): sorted access on each join input. A scheduling strategy orders the sources so that bindings arrive with descending scores: (1) Src. 1 (score ∈ [0,1]) for TP1: ex:beatles ex:album ?album, then (2) Src. 3 (score ∈ [2,3]) and (3) Src. 2 (score ∈ [1,2]) for TP2: ?album ex:song ?song.
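One way to realize sorted access when sources can only be fetched as a whole is sketched below in Python, assuming each source's maximal triple score is known in advance (as stored in the source index). Sources are loaded in descending order of their maximal score, and a buffered binding is released only once no unloaded source could still hold a higher-scored one. `sorted_access` and `fetch` are hypothetical, and the triple scores in the example are assumed values within the slide's ranges.

```python
import heapq
from typing import Callable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def sorted_access(
    sources: List[Tuple[str, float]],
    fetch: Callable[[str], List[Tuple[float, Triple]]],
) -> Iterator[Tuple[float, Triple]]:
    """Yield (score, triple) pairs for one join input in descending score order."""
    pending = sorted(sources, key=lambda s: s[1], reverse=True)  # schedule by max score
    buffer: list = []  # max-heap of loaded bindings, stored as (-score, tie, triple)
    tie = 0            # tie-breaker so that triples themselves are never compared
    i = 0
    while i < len(pending) or buffer:
        next_max = pending[i][1] if i < len(pending) else float("-inf")
        if buffer and -buffer[0][0] >= next_max:
            neg_score, _, triple = heapq.heappop(buffer)
            yield -neg_score, triple
        else:
            uri = pending[i][0]  # no buffered binding is safe yet: load the next source
            i += 1
            for score, triple in fetch(uri):
                heapq.heappush(buffer, (-score, tie, triple))
                tie += 1


# Example for TP2 (?album ex:song ?song); the scores are assumed values.
DATA = {
    "src2": [(2.0, ("ex:sgt_pepper", "ex:song", '"Lucy"'))],
    "src3": [(3.0, ("ex:help", "ex:song", '"Help!"'))],
}
fetched: List[str] = []


def fetch(uri: str) -> List[Tuple[float, Triple]]:
    fetched.append(uri)  # record which sources were actually retrieved
    return DATA[uri]


stream = sorted_access([("src2", 2.0), ("src3", 3.0)], fetch)
print(next(stream))  # (3.0, ('ex:help', 'ex:song', '"Help!"'))
print(fetched)       # ['src3']
```

Because `sorted_access` is a generator, a consumer that stops after the first binding never triggers the fetch of Src. 2.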
  • Top-K Query Processing in a Linked Data Setting (3) – Push Bound Rank Join (1): following the scheduling strategy, Src. 1 is loaded first, and sorted access for ex:beatles ex:album ?album yields the seen triples (TP1) ex:beatles ex:album ex:sgt_pepper (score 1) and ex:beatles ex:album ex:help (score 1). Loading Src. 3 then yields, via sorted access for ?album ex:song ?song, the seen triple (TP2) ex:help ex:song "Help!" (score 3).
  • Top-K Query Processing in a Linked Data Setting (4) – Push Bound Rank Join (2): joining the seen triples puts the query binding ex:beatles ex:album ex:help . ex:help ex:song "Help!" . into the output queue with score 1 + 3 = 4. The threshold is 4, so a query binding with score ≥ threshold has been found and processing stops (STOP) without accessing Src. 2.
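The push-based rank join walked through on the two slides above can be sketched in Python as follows. This is a simplified sketch, not the authors' implementation: it assumes two inputs joined on a single variable, join scores combined by summation, and the corner-bound threshold max{max_1 + min_2, max_2 + min_1}; exhausted inputs and duplicate results are ignored, and `PushRankJoin` is a hypothetical name.

```python
import heapq
from itertools import count


class PushRankJoin:
    """Simplified push-based rank join of two inputs on one join variable.

    Each input must push its bindings in descending score order (sorted access).
    Join scores are sums; the threshold is the corner bound
    max(max_1 + min_2, max_2 + min_1) over the scores seen so far.
    """

    def __init__(self, key1: str, key2: str):
        self.keys = (key1, key2)     # join variable name on each side
        self.seen = ({}, {})         # per side: join value -> [(score, binding)]
        self.max = [None, None]      # highest score seen per side
        self.min = [None, None]      # lowest score seen per side
        self.output = []             # max-heap of (-score, tie, result)
        self._tie = count()          # tie-breaker so dicts are never compared

    def push(self, side: int, score: float, binding: dict) -> None:
        """Accept one binding from input `side` (0 or 1) and probe the other input."""
        if self.max[side] is None:
            self.max[side] = score
        self.min[side] = score  # sorted access: the newest score is the lowest so far
        value = binding[self.keys[side]]
        self.seen[side].setdefault(value, []).append((score, binding))
        for other_score, other in self.seen[1 - side].get(value, []):
            result = {**binding, **other}
            heapq.heappush(self.output, (-(score + other_score), next(self._tie), result))

    def threshold(self) -> float:
        """Upper bound on the score of any join result not produced yet."""
        if None in self.max:
            return float("inf")  # an input has not delivered anything: no bound yet
        return max(self.max[0] + self.min[1], self.max[1] + self.min[0])

    def pop_ready(self) -> list:
        """Return buffered results whose score is at least the current threshold."""
        ready = []
        while self.output and -self.output[0][0] >= self.threshold():
            neg_score, _, result = heapq.heappop(self.output)
            ready.append((-neg_score, result))
        return ready


join = PushRankJoin("album", "album")
join.push(0, 1.0, {"album": "ex:sgt_pepper"})              # TP1 bindings from Src. 1
join.push(0, 1.0, {"album": "ex:help"})
join.push(1, 3.0, {"album": "ex:help", "song": '"Help!"'})  # TP2 binding from Src. 3
top = join.pop_ready()
print(join.threshold())  # 4.0
print(top)               # [(4.0, {'album': 'ex:help', 'song': '"Help!"'})]
```

With k = 1, one result at or above the threshold is enough to stop, so Src. 2 is never pushed into the operator.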
  • Improving the Threshold Estimation (1): Threshold estimation: threshold = max{max_1 + min_2, max_2 + min_1}, where max_i (the highest score seen for TP_i) is the upper bound for seen bindings and min_i (the lowest score seen for TP_i) is the upper bound for unseen bindings under sorted access. We improve the threshold estimation with star-shaped entity query bounds and look-ahead bounds.
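The threshold formula fits in a small Python function. The sketch generalizes the two-input formula to n inputs in the usual corner-bound way: an unseen join result must use an unseen binding of at least one input i, bounded by min_i, while every other input contributes at most its max_j. That generalization and the name `corner_bound` are assumptions, not taken from the slides; for two inputs it reduces to max{max_1 + min_2, max_2 + min_1}.

```python
from typing import Sequence


def corner_bound(max_seen: Sequence[float], min_seen: Sequence[float]) -> float:
    """Corner-bound threshold for a rank join whose score is a sum of input scores.

    max_seen[i]: highest score seen so far on input i (its first binding).
    min_seen[i]: lowest score seen so far on input i, which bounds every unseen
    binding of that input under sorted access.
    """
    total_max = sum(max_seen)
    # Replace one input's best score by its bound for unseen bindings, keep the rest.
    return max(total_max - hi + lo for hi, lo in zip(max_seen, min_seen))


# Two inputs with max_1 = 1, min_1 = 1, max_2 = 3, min_2 = 3.
print(corner_bound([1, 3], [1, 3]))  # 4
```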
  • Improving the Threshold Estimation (2) – Star-shaped Entity Query Bounds. Observation: results for entity queries come from a single source. Idea: upper-bound the scores of triple pattern bindings via the maximal possible triple score. Example: for the entity query ?x ex:song ?y . ?x foaf:name ?z . over Src. 2 (score ∈ [1,2]) and Src. 3 (score ∈ [2,3]), the upper bound for triple bindings is 3, so the upper bound for entity query bindings is 3 + 3.
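The arithmetic on this slide can be reproduced with a short Python sketch, assuming the bound for a triple pattern is the largest maximal triple score among the sources that may hold its bindings (sources given as (min, max) score ranges) and the bound for an entity query binding is the sum of its per-pattern bounds. The function names and the `index` dictionary are hypothetical; the sketch only reproduces the numbers shown, not every way the single-source observation might be exploited.

```python
from typing import Dict, List, Tuple

ScoreRange = Tuple[float, float]  # (min, max) triple score of one source


def pattern_bound(source_ranges: List[ScoreRange]) -> float:
    """Maximal possible triple score among the sources holding a pattern's bindings."""
    return max(hi for _lo, hi in source_ranges)


def entity_query_bound(patterns: List[str], index: Dict[str, List[ScoreRange]]) -> float:
    """Upper bound for a binding of a star-shaped entity query (sum of pattern bounds)."""
    return sum(pattern_bound(index[p]) for p in patterns)


# Src. 2 has scores in [1,2] and Src. 3 in [2,3]; both may hold bindings for either pattern.
index = {
    "?x ex:song ?y": [(1, 2), (2, 3)],
    "?x foaf:name ?z": [(1, 2), (2, 3)],
}
print(pattern_bound(index["?x ex:song ?y"]))                            # 3
print(entity_query_bound(["?x ex:song ?y", "?x foaf:name ?z"], index))  # 6, i.e. 3 + 3
```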
  • Improving the Threshold Estimation (3) – Look-ahead Bounds. Idea: provide a more accurate upper bound for the scores of unseen bindings via the "next possible" score. Example: the output queue holds ex:beatles ex:album ex:help . ex:help ex:song "Help!" . with score 4. The seen triples are ex:beatles ex:album ex:sgt_pepper and ex:beatles ex:album ex:help (TP1, score 1 each) and ex:help ex:song "Help!" (TP2, score 3), so max_1 = 1, min_1 = 1, max_2 = 3, min_2 = 3, and the threshold is max{1 + 3, 3 + 1} = 4. With look-ahead, min_2 = 2, because the next source for sorted access on ?album ex:song ?song is Src. 2 with score ∈ [1,2].
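A minimal Python sketch of the look-ahead bound, under the assumption that the "next possible" score of an input is the largest maximal score among its not-yet-loaded sources, capped by the last score seen. With the slide's values it lowers min_2 from 3 to 2, tightening the term max_1 + min_2 from 4 to 3; in this small example the other term, max_2 + min_1 = 4, still determines the threshold. `look_ahead_bound` is a hypothetical name, and keeping the plain bound for an input with no sources left is a conservative choice of this sketch.

```python
from typing import List


def look_ahead_bound(last_seen: float, unloaded_source_max: List[float]) -> float:
    """Upper bound for the scores of the still-unseen bindings of one join input.

    Sources are retrieved as a whole, so every unseen binding lives in a source
    that has not been loaded yet; its score cannot exceed that source's max score.
    """
    if not unloaded_source_max:
        return last_seen  # no sources left: keep the plain (conservative) bound
    return min(last_seen, max(unloaded_source_max))


max_1, max_2 = 1, 3
min_1 = look_ahead_bound(1, [])   # TP1: Src. 1 already loaded, no sources left
min_2 = look_ahead_bound(3, [2])  # TP2: Src. 3 loaded, Src. 2 (max score 2) next
print(min_1, min_2)                        # 1 2
print(max(max_1 + min_2, max_2 + min_1))   # 4
```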
  • EVALUATION
  • Evaluation – Setting: We implemented three systems: a push-based symmetric hash join operator [2,5], a standard top-k operator [6], and an improved top-k operator. Query set: 20 queries (8 from FedBench and 12 of our own), with varying result size (1 to ~10,000) and complexity (2 to 5 triple patterns). Data set: ~2,000,000 triples distributed over ~700,000 sources. Parameters: k ∈ {1, 5, 10, 20} and score distributions ∈ {uniform, normal, exponential}.
  • Evaluation – Results (1): Overall Results. Figure: overview of processing times for all queries (k = 1, d = n). Top-k strategies lead to a runtime improvement of 35% on average (compared to standard Linked Data processing). Tighter bounding leads to further improvements of 12% on average (compared to standard top-k processing).
  • Evaluation – Results (2): Effect of k and Score Distributions
  • CONCLUSION
  • Conclusion: We showed that top-k processing techniques are applicable to the Linked Data setting. Top-k strategies lead to significant time savings for small values of k (35% on average in our experiments). We showed that our improved top-k strategy leads to further runtime advantages (12% on average in our experiments).
  • QUESTIONS
  • REFERENCES
  • References
    [1] A. Harth, K. Hose, M. Karnstedt, A. Polleres, K. Sattler, and J. Umbrich. Data summaries for on-demand queries over linked data. In World Wide Web, 2010.
    [2] G. Ladwig and T. Tran. Linked Data Query Processing Strategies. In ISWC, 2010.
    [3] M. Wu, L. Berti-Equille, A. Marian, C. M. Procopiuc, and D. Srivastava. Processing top-k join queries. Proc. VLDB Endow., pages 860–870, 2010.
    [4] A. Harth, S. Kinsella, and S. Decker. Using naming authority to rank data and ontologies for web search. In ISWC, pages 277–292, 2009.
    [5] G. Ladwig and T. Tran. SIHJoin: Querying Remote and Local Linked Data. In ESWC, 2011.
    [6] K. Schnaitter and N. Polyzotis. Optimal algorithms for evaluating rank joins in database systems. ACM Trans. Database Syst., 35:6:1–6:47, 2010.
  • BACKUP SLIDES
  • Early Pruning of Partial Results. Motivation: top-k join processing can be quite costly in terms of memory consumption. Idea: prune partial query results that cannot contribute to a final top-k result. Example for the entity query ?x ex:song ?y . ?x foaf:name ?z . with the currently known top-2 results in the output queue: score 6 for ex:help foaf:name "Help!" . ex:help ex:song "Help!" . and score 4 for ex:sgt_pepper foaf:name "Sgt. Pepper" . ex:sgt_pepper ex:song "Lucy" . The currently known partial result ex:sgt_pepper ex:song "Getting Better" . has score 1; with an upper bound of 3 for the missing triple binding, its maximal score is 3 + 1 = 4 ≤ 4, so it can be pruned.
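The pruning test from the speaker notes (prune a partial result when its partial score plus the maximal possible score of the unevaluated query part is ≤ the currently smallest top-k score) fits in a few lines of Python. `can_prune` and its parameters are hypothetical; the sketch only decides whether to prune and leaves removing the entry from the "seen" buffers to the caller.

```python
from typing import List


def can_prune(partial_score: float, unevaluated_bound: float,
              topk_scores: List[float], k: int) -> bool:
    """True if a partial result can no longer enter the current top-k results."""
    if len(topk_scores) < k:
        return False  # fewer than k results known: nothing can be ruled out yet
    kth_best = sorted(topk_scores, reverse=True)[k - 1]  # currently smallest top-k score
    return partial_score + unevaluated_bound <= kth_best


# Backup-slide example: top-2 scores 6 and 4; partial score 1, bound 3 for the missing triple.
print(can_prune(1, 3, [6, 4], k=2))  # True, since 1 + 3 = 4 <= 4
```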