presentation(ppt)


  1. Containment of Partially Specified Tree-Pattern Queries
     Dimitri Theodoratos (NJIT, USA), Theodore Dalamagas (NTUA, Greece), Pawel Placek (NJIT, USA), Stefanos Souldatos (NTUA, Greece), Timos Sellis (NTUA, Greece)
  2. Introduction, Data Model, Additional Concepts, Query Containment, Experiments, Conclusion
  3. Motivating Example
     • Tree structure (e.g. XML) with motorbike spare parts.
     • We search for spare parts.
     • BUT…
     [Figure: example tree rooted at r; node labels include GREECE, USA, ATHENS, NJ, HONDA, YAMAHA, BMW, VARADERO, SERROW, F650, F650GS, TRAVEL, ON-OFF, 125cc, 200cc, 650cc, 1000cc]
  4. Motivating Example
     • Dimitri Theodoratos lives in NJ.
     • He has a Yamaha Serrow motorbike in Greece.
     • He searches for spare parts in Greece or the USA.
     • ⇒ structural difference
     [Figure: the example spare-parts tree rooted at r]
  5. Motivating Example
     • Theodore Dalamagas has a BMW motorbike.
     • He looks for spare parts worldwide.
     • ⇒ structural inconsistency: ../F650GS/650cc vs. ../650cc/F650GS
     [Figure: the example spare-parts tree rooted at r]
  6. Motivating Example
     • Stefanos Souldatos has a Honda Varadero.
     • But he is not fully aware of the tree structure.
     • ⇒ unknown structure
     [Figure: the example spare-parts tree rooted at r]
  7. Motivating Example
     • Pawel Placek wants to buy a motorbike that he can easily find spare parts for.
     • He searches in many different tree structures.
     • ⇒ source integration
     [Figure: the example spare-parts tree rooted at r, shown three times]
  8. Motivation
     • Querying tree-structured data
     • BUT
     • the structure is not always strictly defined
     • the user does not always want to deal with the structure:
        • Find Honda spare parts in Greece.
  9. Our Approach
     • Dimensions: sets of semantically related nodes.
     • Dimension Graphs: a summary of the tree structure.
     • Query Language: partial specification of the structure (Partially Specified Tree-Pattern Queries).
     • We study the problem of Query Containment for Partially Specified Tree-Pattern Queries.
  10. Introduction, Data Model, Additional Concepts, Query Containment, Experiments, Conclusion
  11. Dimension Graph
     • Dimensions: R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel), E(ngine)
     • dimension graph = summary of the tree structure
     [Figure: the example spare-parts tree rooted at r, next to its dimension graph over R, C, B, T, M, L, E]
  12. Dimension Graph…
     • … offers a summary of the structure of the tree.
     • … provides the necessary semantics for query formulation.
     • … sets the framework for querying sources with structural differences and inconsistencies.
     • … supports query evaluation and optimization.
     • Dimensions: R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel), E(ngine)
     [Figure: dimension graph over R, C, B, T, M, L, E]
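As a rough illustration of "summary of the tree structure", the following Python sketch derives a dimension graph from a value tree. It assumes that an edge A -> B belongs to the graph whenever some node of dimension A has a child of dimension B; the slides do not spell out the construction, so this rule, the `Node` class, the `dimension_of` mapping and the small tree fragment are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the value tree (hypothetical representation)."""
    value: str
    children: list = field(default_factory=list)


def dimension_graph(root, dimension_of):
    """Return the set of (parent_dimension, child_dimension) edges seen in the tree."""
    edges = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            edges.add((dimension_of[node.value], dimension_of[child.value]))
            stack.append(child)
    return edges


# Hypothetical tree fragment; the real tree on the slides is larger.
tree = Node("r", [
    Node("GREECE", [Node("ATHENS", [Node("HONDA", [Node("VARADERO")])])]),
    Node("USA", [Node("NJ", [Node("BMW", [Node("F650GS")])])]),
])

# Assumed assignment of values to dimensions.
dimension_of = {
    "r": "R", "GREECE": "C", "USA": "C", "ATHENS": "L", "NJ": "L",
    "HONDA": "B", "BMW": "B", "VARADERO": "M", "F650GS": "M",
}

graph = dimension_graph(tree, dimension_of)
```

Both branches of the fragment collapse into the same four edges, which is the sense in which the graph summarizes the tree.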
  13. Partially Specified Tree-Pattern Query
     • Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)
     • Dimensions: R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel), E(ngine)
     [Figure: dimension graph over R, C, B, T, M, L, E; query with nodes C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?]
  14. Partially Specified Tree-Pattern Query
     • Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)
     • Figure labels: partially specified paths (PSP)
     [Figure: the same query, drawn as PSP p1 and PSP *p2]
  15. Partially Specified Tree-Pattern Query
     • Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)
     • Figure labels: partially specified paths (PSP), output path (*)
     [Figure: the same query, drawn as PSP p1 and PSP *p2]
  16. Partially Specified Tree-Pattern Query
     • Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)
     • Figure labels: partially specified paths (PSP), output path (*), parent-child, ancestor-descendant
     [Figure: the same query, drawn as PSP p1 and PSP *p2]
  17. Partially Specified Tree-Pattern Query
     • Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)
     • Figure labels: partially specified paths (PSP), output path (*), parent-child, ancestor-descendant, node sharing expression (NSE)
     [Figure: the same query, drawn as PSP p1 and PSP *p2]
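The ingredients named on these slides (partially specified paths, an output path, parent-child and ancestor-descendant edges, node sharing expressions) can be held in a small data structure. The Python sketch below is one possible encoding, not the authors' representation: the class names are hypothetical, and the assignment of nodes to p1 and p2, the edge types and the shared B node are guesses, because the figure itself is not recoverable from the transcript.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class QNode:
    """A query node: a dimension plus a value set, or None for '?'."""
    dim: str
    values: Optional[FrozenSet[str]] = None


@dataclass
class PSP:
    """A partially specified path."""
    name: str
    nodes: List[QNode]
    child: List[Tuple[str, str]] = field(default_factory=list)       # A -> B
    descendant: List[Tuple[str, str]] = field(default_factory=list)  # A => B
    output: bool = False                                             # marked with * on the slides


@dataclass
class Query:
    paths: List[PSP]
    # Node sharing expressions: (dimension, path, other path).
    sharing: List[Tuple[str, str, str]] = field(default_factory=list)


def output_paths(query):
    """Names of the paths marked as output paths."""
    return [p.name for p in query.paths if p.output]


# Assumed layout of the slide's example query.
p1 = PSP("p1",
         [QNode("C", frozenset({"Greece"})), QNode("B", frozenset({"BMW"})), QNode("M")],
         descendant=[("C", "B"), ("B", "M")])
p2 = PSP("p2",
         [QNode("B", frozenset({"BMW"})), QNode("E")],
         descendant=[("B", "E")],
         output=True)
query = Query([p1, p2], sharing=[("B", "p1", "p2")])
```

Keeping child and descendant edges per path, and sharing expressions at the query level, mirrors the way the slides label the figure.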
  18. Introduction, Data Model, Additional Concepts, Query Containment, Experiments, Conclusion
  19. Additional Concepts
     • Full Form Query
     [Figure: query with PSP p1 and PSP *p2 over nodes C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?, C = {Greece}]
  20. Additional Concepts
     • Full Form Query
     • Dimension Trees
     • DIMENSION TREES = QUERY + GRAPH
     [Figure: dimension graph over R, C, B, T, M, L, E; the full form query; a dimension tree with nodes R, C = {Greece}, B = {BMW}, T, M, E]
  21. Introduction, Data Model, Additional Concepts, Query Containment, Experiments, Conclusion
  22. Absolute Containment
     • Q1 ⊆ Q2
     • Each result of Q1 is a result of Q2.
  23. Absolute Containment
     • Q1 ⊆ Q2
     • Each result of Q1 is a result of Q2.
     • ⇔ homomorphism from Q2 to Q1
  24. Absolute Containment
     • Q1 ⊆ Q2
     • Each result of Q1 is a result of Q2.
     • ⇔ homomorphism from Q2 to Q1
     [Figure: queries Q1 and Q2 with PSPs p2, *p1, p4, *p3 over nodes C, B, M, E]
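To make "homomorphism from Q2 to Q1" concrete, here is a deliberately simplified Python sketch that searches by brute force for a mapping of Q2's paths onto Q1's paths. The exact definition is not on the slides, so the conditions checked here are assumptions: each node of Q2 must map to a node of the same dimension in Q1 whose value set is at least as restrictive, child and descendant edges must be preserved, node sharing must be preserved, and the output path must map to the output path. Q1 is assumed to be in full form already, so its descendant edges are transitively closed. The dictionary layout and both example queries are hypothetical.

```python
from itertools import product


def node_ok(q2, q1, h, path, dim):
    """A[p] in q2 needs a node A[h(p)] in q1 whose values are at least as restrictive."""
    target = q1["paths"][h[path]]
    if dim not in target:
        return False
    wanted, given = q2["paths"][path][dim], target[dim]
    return wanted is None or (given is not None and given <= wanted)


def shared(q1, dim, p, q):
    """Two image paths share a node if they coincide or q1 declares the sharing."""
    return p == q or (dim, p, q) in q1["share"] or (dim, q, p) in q1["share"]


def has_homomorphism(q2, q1):
    """Try every mapping of q2's paths to q1's paths (exponential; fine for tiny queries)."""
    names2, names1 = list(q2["paths"]), list(q1["paths"])
    for image in product(names1, repeat=len(names2)):
        h = dict(zip(names2, image))
        if h[q2["output"]] != q1["output"]:
            continue
        if (all(node_ok(q2, q1, h, p, d) for p in names2 for d in q2["paths"][p])
                and all((h[p], a, b) in q1["child"] for p, a, b in q2["child"])
                and all((h[p], a, b) in q1["desc"] | q1["child"] for p, a, b in q2["desc"])
                and all(shared(q1, a, h[p], h[q]) for a, p, q in q2["share"])):
            return True
    return False


# q1: models of BMW motorbikes in Greece (descendant edges already closed).
q1 = {"paths": {"p1": {"C": {"Greece"}, "B": {"BMW"}, "M": None}},
      "child": set(),
      "desc": {("p1", "C", "B"), ("p1", "B", "M"), ("p1", "C", "M")},
      "share": set(), "output": "p1"}

# q2: any model below any country.
q2 = {"paths": {"p3": {"C": None, "M": None}},
      "child": set(), "desc": {("p3", "C", "M")},
      "share": set(), "output": "p3"}

q1_in_q2 = has_homomorphism(q2, q1)
q2_in_q1 = has_homomorphism(q1, q2)
```

The more general query q2 maps into q1, but not the other way round, since q2 has no B node and leaves C unrestricted.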
  25. Relative Containment (w.r.t. G)
     • Q1 ⊆_G Q2
     • Each result of Q1 in G is a result of Q2 in G.
  26. Relative Containment (w.r.t. G)
     • Q1 ⊆_G Q2
     • Each result of Q1 in G is a result of Q2 in G.
     • ⇔ homomorphism from the Dimension Trees of Q2 to the Dimension Trees of Q1
  27. Relative Containment (w.r.t. G)
     • Q1 ⊆_G Q2
     • Each result of Q1 in G is a result of Q2 in G.
     • ⇔ homomorphism from the Dimension Trees of Q2 to the Dimension Trees of Q1
     [Figure: a dimension tree of Q1 and a dimension tree of Q2 over dimensions R, C, B, T, E, M]
  28. Relative Containment Heuristic
     [Figure: time scale with Absolute Containment (AC) at 1 msec and Relative Containment (RC) at 100 msec]
  29. Relative Containment Heuristic
     • sound but not complete
     • extract structural information from the Dimension Graph
     • insert it in the query Q1
     • check Q1 ⊆ Q2 instead of Q1 ⊆_G Q2
     [Figure: time scale with Absolute Containment (AC) at 1 msec, Relative Containment (RC) at 100 msec, and the Relative Containment Heuristic (RCH)]
  30. Relative Containment Heuristic
     • Example
     • Q1 ⊄ Q2
     [Figure: Q1 (PSP *p1 with B = ?, T = ?) and Q2 (PSP *p2 with B = ?, C = ?); dimension graph over R, C, B, T, M, L, E]
  31. Relative Containment Heuristic
     • Example
     • Q1 ⊄ Q2
     • B => T : R -> C, C => B
     [Figure: Q1 (PSP *p1 with B = ?, T = ?) and Q2 (PSP *p2 with B = ?, C = ?); dimension graph over R, C, B, T, M, L, E]
  32. Relative Containment Heuristic
     • Example
     • Q1 ⊄ Q2
     • B => T : R -> C, C => B
     • Q1 ⊆_G Q2
     [Figure: Q1 (PSP *p1 with B = ?, T = ? and the added nodes C = ?, R = ?) and Q2 (PSP *p2 with B = ?, C = ?); dimension graph over R, C, B, T, M, L, E]
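One plausible reading of "extract structural information from the Dimension Graph" is to find the dimensions that lie on every path from the root to a dimension used in the query, and add them to Q1 as ancestors before running the cheaper absolute check. The Python sketch below implements only that extraction step. The edge set is a hypothetical graph (the edges on the slides are not recoverable), the function names are made up for illustration, and the actual heuristic may extract more, for example the child edge R -> C shown in the example.

```python
def reachable(edges, start, banned=None):
    """Dimensions reachable from start without passing through banned."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node == banned:
            continue
        seen.add(node)
        stack.extend(b for a, b in edges if a == node)
    return seen


def mandatory_ancestors(edges, root, target):
    """Dimensions (other than target) that lie on every root-to-target path."""
    if target not in reachable(edges, root):
        return set()
    dims = {a for a, _ in edges} | {b for _, b in edges}
    return {d for d in dims - {target}
            if target not in reachable(edges, root, banned=d)}


# Hypothetical dimension graph over R, C, B, T, M, L, E.
edges = {("R", "C"), ("C", "L"), ("C", "B"), ("L", "B"),
         ("B", "T"), ("T", "M"), ("T", "E"), ("M", "E")}

# Descendant edges that could be inserted into Q1 for its node B.
extra_for_b = {(d, "B") for d in mandatory_ancestors(edges, "R", "B")}
```

In this graph L is optional on the way to B, so only R and C are added, which matches the two nodes that appear in Q1 on the last example slide. The check stays sound only as long as every inserted edge really holds in all trees that conform to the graph.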
  33. Introduction, Data Model, Additional Concepts, Query Containment, Experiments, Conclusion
  34. Experiments
     • We measured…
        • execution time for
           • Absolute Containment (AC)
           • Relative Containment (RC)
           • Relative Containment Heuristic (RCH)
        • accuracy for RCH
     • … for various graph sizes
     • … for various query sizes
  35. Time
     [Figure: execution time (msec) of AC, RCH and RC. Panel titles: graph dimensions 20, 30, 40; graph paths 10-80, 15-120, 20-160; query PSPs 1 and 2; nodes per PSP 3-6]
  36. Accuracy of RCH
     • 80% for graphs of common sizes
        • based on XML benchmarks (XMach, XMark, etc.)
     • 50% for graphs of higher density
  37. Introduction, Data Model, Additional Concepts, Query Containment, Experiments, Conclusion
  38. Conclusion
     • Query Containment for Partially Specified Tree-Pattern Queries (PSTPQs).
     • Sound technique for checking Relative Query Containment
        • Time: one order of magnitude
        • Accuracy: over 80%
  39. Future Work
     • Heuristics for checking Relative Containment
        • precomputed and on-the-fly
        • trade-off between time and accuracy
     • Special forms of queries, e.g. swings:
     [Figure: a swing query with PSP p1, PSP p2 and PSP *p3 over nodes A, B, C]
  40. Questions?
  41. Links
     • Introduction (2-9)
     • Data Model (10-17)
     • Additional Concepts (18-20)
     • Query Containment (21-32)
     • Experiments (33-36)
     • Conclusion (37-41)
     • Appendix (42-46)
  42. Appendix
  43. Who defines the dimensions?
     • Automatic
        • XML tags (dimension graph = "path summary", "path index", "structural summary")
     • Semi-automatic
        • Graph administrator + XML tags (dimension = group of XML tags)
        • Graph administrator + ontology
     • Manual
        • Graph administrator
  44. Inference Rules
     • 1. Full Form Query
     • Notation: A[p] is the node of dimension A in path p; -> is parent-child, => is ancestor-descendant, ≈ is node sharing.
     • INFERENCE RULES
        • (IR1) |- R[p1] ≈ R[p2]
        • (IR2) A[p1] ≈ A[p2], A[p2] ≈ A[p3] |- A[p1] ≈ A[p3]
        • (IR3) a structural expression that involves A[p] |- R[p] => A[p]
        • (IR4) A[p] -> B[p] |- A[p] => B[p]
        • (IR5) A[p] => B[p], B[p] => C[p] |- A[p] => C[p]
        • (IR6) A[p] -> B[p], A[p] => C[p] |- B[p] => C[p]
        • (IR7) A[p] -> B[p], C[p] => B[p] |- C[p] => A[p]
        • (IR8) A[p1] -> B[p1], B[p1] ≈ B[p2] |- A[p2] -> B[p2]
        • (IR9) A[p1] => B[p1], B[p1] ≈ B[p2] |- A[p2] => B[p2]
        • (IR10) A[p1] => B[p1], A[p1] ≈ A[p2], R[p2] => B[p2] |- A[p2] => B[p2]
        • (IR11) A[p1] => B[p1], B[p1] ≈ B[p2] |- A[p1] ≈ A[p2]
        • (IR12) A[p1] -> B[p1], C[p2] -> B[p2], D[p1] ≈ D[p2] |- D[p1] => A[p1]
        • (IR13) A[p1] -> B[p1], A[p2] -> C[p2], D[p1] ≈ D[p2] |- D[p1] => A[p1]
        • (IR14) A[p1] => B[p1], B[p2] => A[p2], C[p1] ≈ C[p2] |- C[p1] => A[p1]
     [Figure: dimension graph over R, C, B, T, M, L, E; query with PSP p1 and PSP *p2 over nodes C = {Greece}, B = {BMW}, M = ?, B = {BMW}, E = ?, C = {Greece}]
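Two of the rules above are easy to mechanize within a single path: a parent-child edge implies an ancestor-descendant edge (IR4), and ancestor-descendant edges compose (IR5). The Python sketch below applies just these two rules until nothing new is derived. It is a partial illustration only: the remaining rules, which involve node sharing across paths, are not covered, and the function name and the pair-of-dimensions encoding are assumptions.

```python
def descendant_closure(child, desc):
    """Close the ancestor-descendant edges of one path under IR4 and IR5."""
    closure = set(desc) | set(child)          # IR4: A -> B gives A => B
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))       # IR5: A => B, B => C gives A => C
                    changed = True
    return closure


# Hypothetical path: R -> C, C => B, B => M.
closed = descendant_closure(child={("R", "C")}, desc={("C", "B"), ("B", "M")})
```

Running rules like these to a fixed point is one way to think about bringing a query into full form before a homomorphism test.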
  45. Dimension Trees
     • r/Greece/BMW/*T[*E]/*M
     • r/Greece/BMW/*T/*M[*E]
     • r/Greece/BMW/*T[*M/*E]/*E/*M
     • r/Greece/BMW/*T/*E/*M
     [Figure: dimension graph over R, C, B, T, M, L, E; the full form query; four dimension trees, each starting R, C = {Greece}, B = {BMW}, T and differing in how M and E are arranged below T]
  46. Previous Approaches
     • Keyword-based search approach
        • Absence of structure
     • Naive approach
        • All possible query patterns are generated (Honda=>Greece, Greece=>Honda)
     • Approximation techniques
        • Relax the query → more answers
     • Traditional integration approach
        • Global structure and mapping rules
