Index Structures and Top-k Joins for Native Keyword Search Databases

  1. Index Structures and Top-k Joins for Native Keyword Search Databases. Günter Ladwig, Thanh Tran. Conference on Information and Knowledge Management (CIKM 2011). Institute of Applied Informatics and Formal Description Methods (AIFB), KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association. www.kit.edu
  2. Contents. Introduction (native keyword search, contributions); Index Structures (d-length 2-Hop Cover, path indexes); Keyword Query Processing (Integrated Query Plan, Operator Ranking); Evaluation; Conclusion. October 25th, 2011, CIKM 2011, Glasgow. Institute of Applied Informatics and Formal Description Methods (AIFB).
  3. Keyword Search on Graph-Structured Data. [Figure: data graph with nodes matching "john", "steve", "mary", "alice", "acme", and "2009"; example queries "steve 2009" and "john steve alice".] Keyword queries over structured data. Approaches: query translation (based on schema exploration) and native keyword search (based on data graph exploration).
  4. Native Keyword Search. [Figure: the queries "steve 2009" and "john steve alice" matched against the data graph.] Match keywords to elements of the data graphs. Find structures connecting these elements (Steiner graphs). More expensive than query translation approaches. Preprocess data to reduce online effort.
  5. Native Keyword Search: EASE. Indexes at the level of r-maximal subgraphs. Given a keyword query, find relevant subgraphs using the index. Explore the subgraphs to construct Steiner graphs. [Figure: the query "steve 2009" retrieves indexed subgraphs, which are then explored.] High redundancy. Requires special operations: exploration, pruning.
  6. Native Keyword Search using Top-k Joins. Fine-grained indexing at the level of paths. [Figure: the query "steve 2009" retrieves indexed paths, which are combined by joins.] More pruning, less redundancy: less storage required. Enables use of database query processing concepts: data access and top-k joins. Keyword search is now a "traditional" query processing problem.
  7. Contributions. (1) We propose a new processing strategy for the keyword search problem based on standard database operations: data access and join. (2) For efficient data access, we extend the 2-hop cover to precompute and materialize neighborhoods of data elements, indexing the data at the level of paths. (3) Keyword search requires consideration of a large number of query plans: a push-based top-k join procedure ranks query plans during processing.
  8. INDEX STRUCTURES
  9. d-length 2-Hop Cover. Compact representation of connections in a graph, used to find paths between two nodes. Extension of the 2-Hop Cover that stores only paths of length d or less. The 2-Hop Cover labels every node u with a neighborhood NBu. If two nodes u, v are connected via paths of length d or less, then NBu ∩ NBv ≠ ∅. All paths of length d or less between center nodes u and v are of the form u ⇝ w ⇝ v with w ∈ NBu ∩ NBv; w is called a hop node. Construction prunes redundant entries from neighborhoods to reduce the size of the cover.
  10. Finding Paths Using Joins. To find paths between two nodes u and v: retrieve the neighborhoods NBu and NBv; intersect NBu and NBv to obtain all hop nodes; reconstruct the paths between u and v through the hop nodes. [Figure: two center nodes whose neighborhoods overlap in shared hop nodes.] The intersection is performed as a rank join. A rank join requires its inputs to be sorted.
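
The lookup on this slide can be sketched in Python. This is only an illustration under assumptions that are not in the slides: the graph is a small undirected adjacency dict made up for the example, the neighborhoods are unpruned d-neighborhoods computed by breadth-first search, and a shared node is accepted as a hop node only if the two partial distances sum to at most d. The names `TOY_GRAPH`, `neighborhood`, and `hop_nodes` are hypothetical.

```python
from collections import deque

# Hypothetical toy graph (undirected adjacency lists), not taken from the slides.
TOY_GRAPH = {
    "p4": ["o1", "p3"],
    "o1": ["p4", "p2", "l1"],
    "p3": ["p4", "p2"],
    "p2": ["o1", "p3"],
    "l1": ["o1"],
}


def neighborhood(graph, center, d):
    """Return {node: hop distance} for every node within d hops of center."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == d:
            continue  # do not expand beyond the length bound
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def hop_nodes(graph, u, v, d):
    """Intersect the neighborhoods of u and v; keep hop nodes w whose
    combined distance u..w..v stays within d. Returns sorted (length, w)."""
    nb_u = neighborhood(graph, u, d)
    nb_v = neighborhood(graph, v, d)
    shared = nb_u.keys() & nb_v.keys()
    return sorted(
        (nb_u[w] + nb_v[w], w) for w in shared if nb_u[w] + nb_v[w] <= d
    )


if __name__ == "__main__":
    print(hop_nodes(TOY_GRAPH, "p4", "p2", 2))
```

In this toy setting the endpoints themselves show up as trivial hop nodes, because an unpruned d-neighborhood of u already contains v. A pruned cover would store fewer entries.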
  11. Index Storage. Pruned neighborhoods are stored as path entries: a path entry (w, s) for each hop node w in NBu. The path entry index maps each node to its (sorted) path entries. The path index stores the paths for all center nodes and their path entries; it is used to reconstruct paths.
      Node   Path entries
      u1     (w1, 1.0), (w2, 2.0), (w3, 2.0)
      u2     (w5, 1.0)
      …
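
Because the path entries are kept sorted, two entry lists can be intersected with a rank join that stops as soon as the top-k results are known. The following Python sketch shows one standard threshold-based rank join, not necessarily the operator used by the authors. It assumes that the score s is a cost where lower is better, that the combined score is the sum of the two entry scores, and that each hop node occurs at most once per list. The list `V1` and the function name `rank_join` are made up for illustration; `U1` reuses the entries shown for u1.

```python
import heapq
import math

# Entries for u1 as in the table above; V1 is a hypothetical second list.
U1 = [("w1", 1.0), ("w2", 2.0), ("w3", 2.0)]
V1 = [("w3", 1.0), ("w2", 1.0), ("w5", 3.0)]


def rank_join(left, right, k):
    """Join two lists of (hop_node, cost), each sorted by ascending cost.

    Returns up to k tuples (total_cost, hop_node) in ascending total cost,
    reading as few entries as the threshold test allows.
    """
    if not left or not right:
        return []
    first_l, first_r = left[0][1], right[0][1]
    seen_l, seen_r = {}, {}
    candidates = []  # min-heap of joined results not yet safe to emit
    results = []
    i = j = 0
    while len(results) < k and (i < len(left) or j < len(right)):
        # Alternate between the inputs; fall back to whichever is not exhausted.
        read_left = j >= len(right) or (i < len(left) and i <= j)
        if read_left:
            node, cost = left[i]
            i += 1
            seen_l[node] = cost
            if node in seen_r:
                heapq.heappush(candidates, (cost + seen_r[node], node))
        else:
            node, cost = right[j]
            j += 1
            seen_r[node] = cost
            if node in seen_l:
                heapq.heappush(candidates, (seen_l[node] + cost, node))
        # Lower bound on the cost of any join result that is still unseen.
        next_l = left[i][1] if i < len(left) else math.inf
        next_r = right[j][1] if j < len(right) else math.inf
        threshold = min(next_l + first_r, first_l + next_r)
        while candidates and len(results) < k and candidates[0][0] <= threshold:
            results.append(heapq.heappop(candidates))
    return results


if __name__ == "__main__":
    print(rank_join(U1, V1, 1))
```

With k = 1 the loop stops after reading four of the six entries, which is the early termination that sorted inputs make possible.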
  12. KEYWORD QUERY PROCESSING
  13. Keyword Query Processing. Use joins to find connections between matching elements for all keywords. Base inputs: one keyword neighborhood per keyword, which is the union of the matching elements' neighborhoods. Process: data access to retrieve the keyword neighborhoods, then joins to connect the keyword matching elements. [Figure: join plan over the keyword neighborhoods for "steve", "john", and "alice".] Are all possible plans valid?
  14. Query Plans. [Figure: for d=2, a plan that joins the neighborhoods of "alice", "john", and "steve" in an unfavorable order yields no results.] Join order matters. No single join order delivers all results (some might even be empty). We do not know in advance which orders deliver results. Consider all possible join orders.
  15. Integrated Query Plan. Number of join operators in all query plans: N'(K). Query plans for different join orders overlap, so share as many operators as possible. Number of join operators with sharing: N(|K|, K).
      |K|   N'(K)   N(|K|, K)
      2     2       1
      3     12      6
      4     72      24
      5     480     100
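
The unshared counts in the table can be reproduced by enumerating join orders. The Python sketch below rests on an assumption that the slides do not state explicitly: every permutation of the keywords is one left-deep plan with |K| − 1 join operators. Under that reading the totals come out as 2, 12, 72, and 480, matching the N'(K) column. The sketch makes no claim about how the shared counts are derived. The function names are hypothetical.

```python
from itertools import permutations


def left_deep_plans(keywords):
    """Enumerate one left-deep plan per join order.

    Each plan is a list of join operators (already_joined, next_keyword).
    """
    plans = []
    for order in permutations(keywords):
        joins = [(order[:idx], order[idx]) for idx in range(1, len(order))]
        plans.append(joins)
    return plans


def count_unshared_operators(num_keywords):
    """Total number of join operators when no operator is shared."""
    return sum(len(plan) for plan in left_deep_plans(range(num_keywords)))


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, count_unshared_operators(n))
```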
  16. Top-k Keyword-Join Processing. High number of operators: terminate early after computing the top-k results instead of all results, using rank join operators and a top-k union operator. The Integrated Query Plan is a composition of many sub-plans. Some sub-plans might produce no results, and pull-based operators would block until a result can be produced, so use push-based operators: execution is driven by inputs instead of results. Some sub-plans might produce results earlier than others, so rank not only results but also operators.
  17. Operator Ranking. Prefer operators that have "promising" results. Each rank join operator has a global score, based on its current results and on upper bounds for the subsequent join operations. The global score is computed from R, the intermediate results, and NBK, the keyword neighborhoods not yet covered. Join operators have a global score when they have results ready. Only the operator with the highest global score can push results to subsequent operators. Otherwise, lower-level data access operators are activated.
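
The scheduling rule on this slide can be sketched as follows. This is a hypothetical simplification in Python: `global_score` is a placeholder number supplied by the caller, since the slide's scoring formula is not reproduced here, and "pushing" just moves a buffered result into the downstream operator's inbox without performing a join. The class and function names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class JoinOperator:
    name: str
    global_score: float = 0.0  # placeholder, not the slide's formula
    ready: List[str] = field(default_factory=list)  # results ready to push
    inbox: List[str] = field(default_factory=list)  # results received
    downstream: Optional["JoinOperator"] = None


def step(operators: List[JoinOperator],
         activate_data_access: Callable[[], None]) -> Optional[str]:
    """Run one scheduling step.

    Only operators with ready results have a usable global score. The one
    with the highest score pushes a result downstream; if no operator has
    results ready, lower-level data access is activated instead.
    """
    candidates = [op for op in operators if op.ready]
    if not candidates:
        activate_data_access()
        return None
    best = max(candidates, key=lambda op: op.global_score)
    result = best.ready.pop(0)
    if best.downstream is not None:
        best.downstream.inbox.append(result)
    return best.name


if __name__ == "__main__":
    top = JoinOperator("top")
    a = JoinOperator("a", global_score=0.9, ready=["r1"], downstream=top)
    b = JoinOperator("b", global_score=0.4, ready=["r2"], downstream=top)
    print(step([a, b, top], lambda: None), top.inbox)
```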
  18. EVALUATION
  19. Evaluation. Four approaches. EASE: indexing at the level of graphs. KJ: keyword join approach. KJU: keyword join approach without operator ranking. Datasets: BTC (10M triples); DBLP1/5/10 (1M, 5M, 10M triples, from SP2Bench). 9 keyword queries for each dataset. Reduction of index storage size: 50% (DBLP1) to 79% (DBLP10).
  20. Results. KJ and KJU outperform EASE. Operator ranking is beneficial.
  21. Results. The benefit of operator ranking is more pronounced for larger queries, as these need more join operators.
  22. Conclusion. Native keyword search based on data access and join. d-length 2-Hop Cover: index at the level of paths, instead of graphs. Top-k Keyword Join: exploration transformed into a series of join operators; operator ranking. Reduces storage requirements and increases performance.
  23. Thank you for your attention! Questions? Günter Ladwig, guenter.ladwig@kit.edu
  24. BACKUP SLIDES
  25. Introduction. Keyword search on graph-structured data (RDF). Query translation: translate keywords into a structured query using schema knowledge. Native keyword search: no translation; match keywords to elements of the data graphs; find structures connecting these elements (Steiner graphs); more expensive than query translation approaches. Preprocess data and create special indexes: reduces the search space during online query processing; requires offline preprocessing and storage.
  26. Example. Query: "alice malta peter". [Figure: example data graph with person nodes p1 to p4 (labels include Alice, Richard, Peter, Mary), organization nodes o1 and o2 (ABC Corp), and location nodes (Malta), connected by knows, worksAt, and locatedIn edges.] Match keyword elements. Find connections between keyword elements.
  27. Problem Definition. Given a graph GE=(NE,ER), find Steiner graphs connecting keyword elements.
  28. Scoring. Assumption: more compact Steiner graphs are more relevant. The scoring function is defined over GS, a Steiner graph, and P, the set of paths connecting its keyword elements. Other functions are possible, but not part of this work.
  29. Approaches. Bidirectional Search: explore the graph from keyword elements to find connections; does not scale well. EASE: indexes neighborhood graphs to restrict the search space for exploration. Our approach: use database operations (data access and join); transform graph exploration into a series of join operations; improves storage requirements and performance.
  30. d-Length 2-Hop Cover: Preliminaries. Compact representation of connections in a graph. Used to find paths between two nodes in a graph.
  31. Construction. The trivial d-length 2-hop cover is the set of all d-neighborhoods of GE, but it contains redundancies. Finding a minimal 2-hop cover is NP-hard (Minimum Set Cover). Approximation algorithm: select a "best" node covering a large number of paths, then use its neighborhood to prune redundant paths from all other neighborhoods.
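
To make the greedy idea concrete, here is a small Python sketch in the style of the classic greedy set-cover heuristic. It is not the authors' construction: instead of pruning full d-neighborhoods, it builds the labels up from empty sets, which leads to the same kind of reduced cover. It covers node pairs rather than individual paths, and it assumes an undirected graph given as an adjacency dict with every node as a key. All names are hypothetical.

```python
from collections import deque
from itertools import combinations


def distances(graph, source, d):
    """Breadth-first distances from source, limited to d hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == d:
            continue
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def greedy_cover(graph, d):
    """Return {node: set of hop nodes} so that every pair of nodes connected
    by a path of length <= d shares at least one hop node."""
    dist = {n: distances(graph, n, d) for n in graph}
    # Pairs that must be covered: all node pairs within distance d.
    uncovered = {
        frozenset((u, v)) for u, v in combinations(graph, 2) if v in dist[u]
    }
    # Pairs that hop node w can cover: both endpoints reachable from w and
    # the path through w no longer than d.
    covers = {
        w: {
            p for p in uncovered
            if all(x in dist[w] for x in p)
            and sum(dist[w][x] for x in p) <= d
        }
        for w in graph
    }
    labels = {n: set() for n in graph}
    while uncovered:
        # Greedy step: pick the node covering the most uncovered pairs.
        best = max(sorted(graph), key=lambda w: len(covers[w] & uncovered))
        newly = covers[best] & uncovered
        for pair in newly:
            for node in pair:
                labels[node].add(best)
        uncovered -= newly
    return labels


if __name__ == "__main__":
    star = {"c": ["a", "b", "e"], "a": ["c"], "b": ["c"], "e": ["c"]}
    print(greedy_cover(star, 2))
```

On the star-shaped example every node ends up with the single hop node `c`, whereas the trivial cover would store all four nodes in each neighborhood.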
  32. Example: Pruning. [Figure: neighborhoods for d=2 over nodes p1 to p4, o1, o2, l1, and l2 with knows, worksAt, and locatedIn edges; center nodes and hop nodes are marked, and redundant paths are pruned.] Pruned paths between two nodes can be reconstructed by intersecting their neighborhoods. Store each pruned neighborhood as a list of path entries.
  33. Neighborhood Join. [Figure: the neighborhoods of center nodes p4 and p2 are joined on their shared hop nodes, such as o1 and p3.] Result: keyword graphs. For example, p4 o1 p2 stands for all paths of length d between p4 and p2 through o1; p4 p3 p2 is another result.
  34. Graph Join. Expand keyword graphs to keyword graph neighborhoods. [Figure: the keyword graph p4 o1 p2 is expanded to its keyword graph neighborhood, with entries such as o3, l2, and l1.] Graph join: joins a keyword graph neighborhood with a keyword neighborhood.
  35. Integrated Query Plan. Number of join operators without operator sharing: N'(K). Number of join operators with operator sharing: N(|K|, K).
      |K|   N'(K)   N(|K|, K)
      2     2       1
      3     12      6
      4     72      24
      5     480     100