Approximate and Incremental Processing of Complex Queries against the Web of Data

  1. Approximate and Incremental Processing of Complex Queries against the Web of Data
     Thanh Tran, Günter Ladwig, Andreas Wagner
     DEXA 2011
     Institute of Applied Informatics and Formal Description Methods (AIFB)
     KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association
     www.kit.edu
  2. Contents
     - Introduction
     - Overview
     - Approximate & Incremental Processing: Entity Search, Approximate Structure Matching, Structure-based Result Refinement and Computation
     - Evaluation
     - Conclusion
     August 31st, 2011, DEXA 2011, Toulouse, France; Institute of Applied Informatics and Formal Description Methods (AIFB)
  3. INTRODUCTION
  4. Introduction – Data Model
     Resource Description Framework (RDF)
     [Figure: example RDF data graph with entities p1–p5, i1, i2, u1, u2, a1, a2, c1, c2, connected by relation edges such as worksAt, partOf, authorOf, supervises, knows, and conference, plus attribute edges labelled name (e.g., P2, P5, U1)]
  5. Introduction – Query Model
     Basic graph patterns: conjunctive queries over RDF data, answered by graph pattern matching.
     [Figure: example query graph over the variables u, v, w, x, y, z with relation edges worksAt, partOf, supervise, author, and conf, and attribute edges name "AIFB", name "KIT", name "ICDE", and age 29]
  6. Contribution
     - Techniques for matching (basic) query patterns against graph-structured data have limits.
     - We might wish to trade completeness and exactness for responsiveness.
     - Our approach allows an “affordable” computation of an initial set of approximate results, which can be incrementally refined as needed.
  7. Contribution – Pipeline Overview
     A pipeline of operations in which approximate results are refined incrementally:
     Entity Search → Approximate Structure Matching → Structure-based Result Refinement → Structure-based Answer Computation
     [Figure: intermediate, approximate results flow between the stages; supporting indexes: Entity & Relation Index, Neighborhood Index, Structure Index]
  8. ENTITY SEARCH
  9. Entity Search
     - Entity index: stores the attribute edges of the data graph and enables lookup of entities by attribute and value.
     - Entity search: obtains candidate bindings for all query variables that have attribute edges; it does not consider structure (i.e., relations between entities).
     - Query decomposition and transformation: decompose the query into entity queries to create a transformed query.
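A minimal sketch of an entity index and entity search, assuming the index is a simple in-memory map from (attribute, value) pairs to entities. The class `EntityIndex`, its methods, and the toy attribute values are hypothetical; the slides do not describe the actual storage layout.

```python
from collections import defaultdict


class EntityIndex:
    """Hypothetical in-memory entity index: maps each (attribute, value)
    pair to the set of entities that have that attribute edge."""

    def __init__(self, attribute_edges):
        # attribute_edges: iterable of (entity, attribute, value) triples
        self._index = defaultdict(set)
        for entity, attribute, value in attribute_edges:
            self._index[(attribute, value)].add(entity)

    def lookup(self, attribute, value):
        """Entities that have the given attribute edge (a fresh copy)."""
        return set(self._index.get((attribute, value), ()))

    def search(self, conditions):
        """Candidate bindings for one entity query: entities satisfying every
        (attribute, value) condition. Relations between entities are ignored.
        Only meaningful for variables that actually have attribute edges."""
        candidates = None
        for attribute, value in conditions:
            matches = self.lookup(attribute, value)
            candidates = matches if candidates is None else candidates & matches
        return candidates if candidates is not None else set()


# Toy attribute edges (values assumed for illustration)
index = EntityIndex([
    ("i1", "name", "AIFB"), ("u1", "name", "KIT"), ("c1", "name", "ICDE"),
    ("p1", "age", 29), ("p3", "age", 29), ("p2", "age", 41),
])
print(sorted(index.search([("age", 29)])))  # ['p1', 'p3']
```

Intersecting the per-condition lookups keeps each entity query a pure index operation; structure is deliberately left to the later pipeline stages.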
  10. Query Decomposition & Transformation
     [Figure: the example query graph (variables u, v, w, x, y, z; constants AIFB, KIT, ICDE, 29)]
     - Identify entity queries via breadth-first search starting from a random variable.
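The identification step can be sketched as follows. This assumes a query is a list of (subject, predicate, object) atoms whose subjects are variables (prefixed with `?`), that an atom with a constant object is an attribute edge, and that the query graph is connected. The function name `decompose` and the edge directions in the example query are assumptions; the slide figure does not fix them.

```python
import random
from collections import deque


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def decompose(query, start=None):
    """Split a basic graph pattern into entity queries using BFS.

    Returns (entity_queries, relation_atoms): entity_queries maps every
    variable (in BFS order) to its attribute conditions, possibly empty.
    Assumes all subjects are variables and the query graph is connected.
    """
    variables = {t for s, _, o in query for t in (s, o) if is_var(t)}
    neighbors = {v: [] for v in variables}
    attributes = {v: [] for v in variables}
    relation_atoms = []
    for s, p, o in query:
        if is_var(o):                        # relation edge between two variables
            relation_atoms.append((s, p, o))
            neighbors[s].append(o)
            neighbors[o].append(s)           # BFS ignores edge direction
        else:                                # attribute edge with a constant value
            attributes[s].append((p, o))

    if start is None:                        # "starting from a random variable"
        start = random.choice(sorted(variables))
    entity_queries, seen, queue = {}, {start}, deque([start])
    while queue:
        v = queue.popleft()
        entity_queries[v] = attributes[v]
        for n in neighbors[v]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return entity_queries, relation_atoms


# Example query from the slides (edge directions assumed)
query = [
    ("?x", "age", 29), ("?x", "worksAt", "?z"), ("?z", "name", "AIFB"),
    ("?z", "partOf", "?u"), ("?u", "name", "KIT"), ("?w", "supervise", "?x"),
    ("?x", "author", "?y"), ("?y", "conf", "?v"), ("?v", "name", "ICDE"),
]
entity_queries, relation_atoms = decompose(query, start="?x")
print(list(entity_queries))  # ['?x', '?z', '?w', '?y', '?u', '?v']
```

The relation atoms returned alongside the entity queries are exactly the edges that remain in the transformed query after collapsing.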
  11. Query Decomposition & Transformation
     - Collapse entity queries: each variable together with its attribute edges becomes a single node, so only relation edges remain between nodes.
     [Figure: transformed query in which the entity queries for z (name AIFB), u (name KIT), x (age 29), and v (name ICDE) are collapsed into single nodes, linked by the relation edges partOf, worksAt, supervise, author, and conf]
  12. Entity Search Results
     - Use the entity index to obtain bindings for all entity queries in the transformed query.
     - Entity queries are necessary, but not sufficient, conditions: the final results will be a subset of these bindings.
     Example bindings:
       x    z    u    v
       p1   i1   u1   c1
       p3   i1   u1   c1
       p5   i1   u1   c1
       p6   i1   u1   c1
     [Figure: transformed query, as on the previous slide]
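One simple way to form an intermediate table like the one on this slide is to combine the candidate bindings of the entity queries by a cross product; this matches the table's shape (x varies while z, u, v stay fixed) but is an assumption, not the documented implementation. The function name is hypothetical, and the candidate sets are read off the slide's table.

```python
from itertools import product


def entity_search_results(candidates):
    """Combine per-variable candidate sets into rows of bindings.

    candidates: dict variable -> set of entities returned by entity search.
    Relations are not checked yet, so each row is only a candidate answer
    (necessary, not sufficient); the final answers are a subset of the rows.
    """
    variables = sorted(candidates)
    domains = [sorted(candidates[v]) for v in variables]
    return [dict(zip(variables, combo)) for combo in product(*domains)]


rows = entity_search_results({
    "x": {"p1", "p3", "p5", "p6"}, "z": {"i1"}, "u": {"u1"}, "v": {"c1"},
})
for row in rows:
    print(row["x"], row["z"], row["u"], row["v"])  # the four rows shown on the slide
```

An empty candidate set for any variable yields no rows at all, which is the cheap early-exit a necessary condition allows.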
  13. APPROXIMATE STRUCTURE MATCHING
  14. Approximate Structure Matching
     - Only the entity parts of the query have been matched so far; the relation edges have yet to be processed.
     - Instead of performing exact equijoins, we propose to perform a neighborhood join.
     - The k-neighborhood of an entity e is the set of entities in the data graph that can be reached from e via a path of relation edges of length k or less.
     - A neighborhood join allows us to check whether two entities are connected via relation edges (but not via which ones).
     - A neighborhood join between two sets of entities E1, E2 is an equijoin over all pairs e1 ∈ E1, e2 ∈ E2, where e1 and e2 are considered equivalent if the intersection of their k-neighborhoods is non-empty.
     - Again: necessary, but not sufficient.
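The definitions above can be made concrete with an exact (non-probabilistic) sketch. It assumes relation edges are traversed in both directions and that an entity belongs to its own k-neighborhood (the path of length 0); the slides state neither choice, and all function names are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(relation_edges):
    """Undirected adjacency sets over (subject, predicate, object) relation edges."""
    adj = defaultdict(set)
    for s, _, o in relation_edges:
        adj[s].add(o)
        adj[o].add(s)
    return adj


def k_neighborhood(adj, entity, k):
    """Entities reachable from `entity` via at most k relation edges (BFS)."""
    dist = {entity: 0}
    queue = deque([entity])
    while queue:
        e = queue.popleft()
        if dist[e] == k:
            continue                      # do not expand beyond distance k
        for n in adj[e]:
            if n not in dist:
                dist[n] = dist[e] + 1
                queue.append(n)
    return set(dist)


def neighborhood_join(adj, E1, E2, k):
    """Pairs (e1, e2) whose k-neighborhoods intersect: connected by some
    relation path, but the join does not say which relations."""
    hood = {e: k_neighborhood(adj, e, k) for e in set(E1) | set(E2)}
    return {(e1, e2) for e1 in E1 for e2 in E2 if hood[e1] & hood[e2]}


# Toy chain a - b - c - d
adj = build_adjacency([("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")])
print(sorted(k_neighborhood(adj, "a", 2)))            # ['a', 'b', 'c']
print(neighborhood_join(adj, {"a"}, {"c", "d"}, 1))   # {('a', 'c')}
```

With k = 1, a and c qualify because both neighborhoods contain b, even though a and c are not adjacent; this is why the join is only a necessary condition.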
  15. Neighborhood Join via Bloom Filters
     - We store the set of k-neighborhood entities of each entity as a Bloom filter.
     - Bloom filter: a space-efficient, probabilistic data structure for set membership tests; false positives are possible, false negatives are not.
     - We refine the results of the previous step. To perform a neighborhood join between bindings E1, E2:
       - load the Bloom filters for one set of entities, say E1;
       - in a nested-loop manner, check whether the entities in E2 are contained in these Bloom filters.
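A minimal sketch of the Bloom-filter variant, assuming one filter per entity built from its precomputed k-neighborhood. The bit-array size `m`, the number of hash positions `h`, the SHA-256 double-hashing scheme, and all function names are illustrative choices, not parameters taken from the paper.

```python
import hashlib


class BloomFilter:
    """Bit array with h hash positions per item: membership tests may give
    false positives but never false negatives."""

    def __init__(self, m=1024, h=3):
        self.m, self.h = m, h
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive h positions from two 64-bit halves of SHA-256
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        a = int.from_bytes(digest[:8], "big")
        b = int.from_bytes(digest[8:16], "big") | 1
        return [(a + i * b) % self.m for i in range(self.h)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_filter(neighborhood, m=1024, h=3):
    """Bloom filter over one entity's precomputed k-neighborhood."""
    bf = BloomFilter(m, h)
    for entity in neighborhood:
        bf.add(entity)
    return bf


def bloom_neighborhood_join(filters, E1, E2):
    """Nested loop: keep (e1, e2) if e2 tests positive in e1's filter.
    Every truly connected pair is kept; a few unconnected pairs may slip through."""
    return [(e1, e2) for e1 in E1 for e2 in E2 if e2 in filters[e1]]


# Hypothetical neighborhoods loaded for the entities in E1
filters = {"p1": build_filter({"p1", "i1", "a1"}), "p3": build_filter({"p3", "i1"})}
print(bloom_neighborhood_join(filters, ["p1", "p3"], ["i1", "a1"]))
```

Because false negatives are impossible, the filter never discards a real answer; false positives only pass extra candidates to the later, exact refinement steps.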
  16. Neighborhood Join via Bloom Filters
     [Figure: example query graph with the k=1 and k=2 neighborhoods of x marked]
     - Load the Bloom filters for the entities bound to x.
     - Check whether the entities bound to w, y, and z are in the neighborhood of x.
     - When k=2, the Bloom filters for x also cover u and v.
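The k=1 / k=2 remark follows from distances in the query graph itself. This hypothetical helper lists the query variables within k relation atoms of a center variable, ignoring edge direction (the atom directions in the example are assumed); with filters of radius k for the entities bound to x, the bindings of all these variables can be checked against the same filters.

```python
from collections import deque


def variables_within_k(relation_atoms, center, k):
    """Query variables reachable from `center` over at most k relation
    atoms, ignoring edge direction (the center itself is excluded)."""
    neighbors = {}
    for s, _, o in relation_atoms:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)
    dist, queue = {center: 0}, deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue                  # stop expanding at radius k
        for n in neighbors.get(v, ()):
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    return {v for v, d in dist.items() if 0 < d <= k}


# Relation atoms of the example query (edge directions assumed)
atoms = [("x", "worksAt", "z"), ("z", "partOf", "u"), ("w", "supervise", "x"),
         ("x", "author", "y"), ("y", "conf", "v")]
print(sorted(variables_within_k(atoms, "x", 1)))  # ['w', 'y', 'z']
print(sorted(variables_within_k(atoms, "x", 2)))  # ['u', 'v', 'w', 'y', 'z']
```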
  17. STRUCTURE-BASED RESULT REFINEMENT
  18. Structure-based Result Refinement
     - From approximate structure matching (ASM) we know that the entities in the intermediate results are connected. This is necessary, but not sufficient.
     - With structure-based result refinement, we find out whether they are connected via the paths captured by the query atoms.
     - The query is matched against a structure index graph: a bisimulation-based summary of the data graph that captures its structural information. Nodes in the data graph with the same “structure” are grouped together, so the index graph is much smaller than the data graph.
  19. Structure Index
     [Figure: data graph G and its bisimulation-based structure index graph G~; the index nodes E1–E6 group data nodes with the same structure, e.g., {p1, p3}, {p2, p4}, {a1, a2}, {c1, c2}, {i1, i2}, {u1, u2}]
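A naive partition-refinement sketch of a bisimulation-based structure index. The slides do not say which bisimulation variant is used; this sketch assumes two nodes are equivalent when they have the same labelled outgoing and incoming edges to equivalent nodes (a forward-and-backward variant). The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def structure_index(edges):
    """Group data nodes into bisimulation blocks by naive partition refinement.

    edges: (subject, label, object) relation edges of the data graph G.
    Returns (extension, index_edges): extension maps block id -> set of
    data nodes; index_edges are the (block, label, block) edges of G~.
    """
    nodes = {n for s, _, o in edges for n in (s, o)}
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for s, label, o in edges:
        out_edges[s].append((label, o))
        in_edges[o].append((label, s))

    block = {n: 0 for n in nodes}    # start with a single block
    num_blocks = 1 if nodes else 0
    while True:
        ids, new_block = {}, {}
        for n in sorted(nodes):
            # Signature: current block plus labelled edges into neighbor blocks
            sig = (block[n],
                   frozenset((label, block[o]) for label, o in out_edges[n]),
                   frozenset((label, block[s]) for label, s in in_edges[n]))
            new_block[n] = ids.setdefault(sig, len(ids))
        if len(ids) == num_blocks:   # no block was split: partition is stable
            break
        block, num_blocks = new_block, len(ids)

    extension = defaultdict(set)
    for n, b in block.items():
        extension[b].add(n)
    index_edges = {(block[s], label, block[o]) for s, label, o in edges}
    return dict(extension), index_edges


# Toy data graph (not the one in the figure)
edges = [("p1", "authorOf", "a1"), ("p3", "authorOf", "a2"),
         ("a1", "conference", "c1"), ("a2", "conference", "c2"),
         ("p2", "worksAt", "i1")]
extension, index_edges = structure_index(edges)
print(sorted(sorted(ext) for ext in extension.values()))
# [['a1', 'a2'], ['c1', 'c2'], ['i1'], ['p1', 'p3'], ['p2']]
```

Here five data edges collapse into three index edges, which illustrates why matching on G~ is cheaper than matching on G. Production systems use faster refinement algorithms than this quadratic loop.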
  20. Structure-based Result Refinement
     - We take advantage of the following property: whenever a query graph q matches the data graph G, q also matches the index graph G~. Moreover, the extensions of the index graph matches contain all data graph matches, i.e., all bindings to the query variables.
     - Match the query against the structure index graph to obtain sets of extensions that contain the potential query answers.
     - Bindings computed in the previous ES/ASM steps can only be answers if they are contained in the matched extensions.
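The refinement step can be sketched as follows, assuming the structure index is given as index-graph edges between block ids plus a map from block id to its extension. The backtracking matcher, the per-variable filter, the function names, and the toy index are illustrative assumptions.

```python
from collections import defaultdict


def match_index_graph(query_atoms, index_edges):
    """All matches of the relation atoms (var, label, var) on the index
    graph G~, found by naive backtracking. Returns dicts var -> block."""
    by_label = defaultdict(list)
    for b1, label, b2 in index_edges:
        by_label[label].append((b1, b2))
    matches = []

    def extend(i, binding):
        if i == len(query_atoms):
            matches.append(dict(binding))
            return
        s, label, o = query_atoms[i]
        for b1, b2 in by_label[label]:
            if s == o and b1 != b2:
                continue
            if binding.get(s, b1) == b1 and binding.get(o, b2) == b2:
                added = {s, o} - binding.keys()
                binding[s], binding[o] = b1, b2
                extend(i + 1, binding)
                for v in added:      # undo only the bindings made here
                    del binding[v]

    extend(0, {})
    return matches


def refine(rows, query_atoms, index_edges, extension):
    """Keep rows whose bindings all lie in the matched extensions."""
    allowed = defaultdict(set)
    for m in match_index_graph(query_atoms, index_edges):
        for var, b in m.items():
            allowed[var] |= extension[b]
    query_vars = {t for s, _, o in query_atoms for t in (s, o)}
    return [row for row in rows
            if all(row[v] in allowed[v] for v in row if v in query_vars)]


# Toy structure index: block id -> extension, plus index-graph edges
extension = {0: {"a1", "a2"}, 1: {"c1", "c2"}, 3: {"p1", "p3"}, 4: {"p2"}}
index_edges = {(3, "authorOf", 0), (0, "conference", 1)}
atoms = [("x", "authorOf", "y"), ("y", "conference", "v")]
rows = [{"x": "p1", "v": "c1"}, {"x": "p2", "v": "c1"}]
print(refine(rows, atoms, index_edges, extension))  # [{'x': 'p1', 'v': 'c1'}]
```

The row with x = p2 is dropped because p2 is not in any extension matched by x. Checking each row against one index match at a time would be a stricter but equally sound variant.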
  21. STRUCTURE-BASED ANSWER COMPUTATION
  22. Structure-based Answer Computation
     - Finally, the results that exactly match the query are computed by the last refinement step. Only in this step do we actually perform joins on the data.
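A minimal sketch of this last step, assuming the surviving candidate rows are extended by a naive nested-loop join of the relation atoms against the data graph's relation edges; a real system would use indexed joins, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def compute_answers(rows, query_atoms, data_edges):
    """Exact answers: extend each surviving candidate row by joining the
    relation atoms against the data graph (nested-loop join)."""
    by_label = defaultdict(list)
    for s, label, o in data_edges:
        by_label[label].append((s, o))
    answers = []

    def extend(i, binding):
        if i == len(query_atoms):
            answers.append(dict(binding))
            return
        s, label, o = query_atoms[i]
        for ds, do in by_label[label]:
            if s == o and ds != do:
                continue
            if binding.get(s, ds) == ds and binding.get(o, do) == do:
                added = {s, o} - binding.keys()
                binding[s], binding[o] = ds, do
                extend(i + 1, binding)
                for v in added:      # backtrack: remove bindings made here
                    del binding[v]

    for row in rows:
        extend(0, dict(row))         # candidate bindings seed the join
    return answers


data_edges = [("p1", "authorOf", "a1"), ("a1", "conference", "c1"),
              ("p3", "authorOf", "a2"), ("a2", "conference", "c2")]
atoms = [("x", "authorOf", "y"), ("y", "conference", "v")]
rows = [{"x": "p1", "v": "c1"}, {"x": "p3", "v": "c1"}]
print(compute_answers(rows, atoms, data_edges))  # [{'x': 'p1', 'v': 'c1', 'y': 'a1'}]
```

Seeding the join with the already-filtered rows is what lets this step reuse the intermediate results of the earlier stages instead of joining from scratch.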
  23. EVALUATION
  24. Evaluation
     - Systems:
       - INC: the proposed approach
       - VP: join processing using vertical partitioning with sextuple indexing
     - Datasets:
       - DBLP: 13M triples
       - LUBM: 0.7M–6.7M triples
     - Queries: 80 queries generated via random sampling, in different shapes (path, star, graph)
  25. Results – Average Processing Time
  26. Results – Average Processing Time
     [Chart; label: Neighborhood Distance]
  27. Results – Precision vs. Time
  28. Results – Precision
  29. Conclusion
     - We proposed a novel process for approximate and incremental processing of complex graph pattern queries.
     - Initial results are computed in a small fraction of the total time and then incrementally refined via approximate matching at low cost.
     - Responsiveness increases because inexact results are available early.
     - Users can decide whether, and for which results, exactness and completeness are desirable.
     - Experiments show that our approach is relatively fast w.r.t. obtaining exact and complete results, indicating that the proposed mechanism is able to reuse intermediate results.
  31. BACKUP SLIDES