Relaxing Join and Selection Queries - VLDB 2006 Slides

Database users can be frustrated by having an empty answer to a query. In this paper, we propose a framework to systematically relax queries involving joins and selections. When considering relaxing a query condition, intuitively one seeks the 'minimal' amount of relaxation that yields an answer. We first characterize the types of answers that we return to relaxed queries. We then propose a lattice-based framework to aid query relaxation. Nodes in the lattice correspond to different ways to relax queries. We characterize the properties of relaxation at each node and present algorithms to compute the corresponding answer. We then discuss how to traverse this lattice so that a non-empty query answer is obtained with the minimum amount of query condition relaxation. We implemented this framework and present the results of a thorough performance evaluation using real and synthetic data. Our results indicate the practical utility of our framework.

Published in: Technology, Business

  1. Relaxing Join and Selection Queries. Rares Vernica, UC Irvine, USA. Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung.
  2. Query Example

      SELECT * FROM Jobs J, Candidates C
      WHERE J.Salary <= 95
        AND J.Zipcode = C.Zipcode
        AND C.WorkExp >= 5;

      Jobs
      ID   Company    Zipcode  Salary
      J1   Broadcom   92047    80
      J2   Intel      93652    95
      J3   Microsoft  82632    120
      J4   IBM        90391    130
      …    …          …        …

      Candidates
      ID   Zipcode  ExpSalary  WorkExp
      C1   93652    120        3
      C2   92612    130        6
      C3   82632    100        5
      C4   90391    150        1
      …    …        …          …
  3. What if the query answer is empty?

      SELECT * FROM Jobs J, Candidates C
      WHERE J.Salary <= 95
        AND J.Zipcode = C.Zipcode
        AND C.WorkExp >= 5;

      • Adjust the conditions
      • What conditions to adjust?
      • How to adjust them?
  4. Example Percentages of Empty Result Queries
      • In a Customer Relationship Management (CRM) application developed by IBM
          • 18.07% (3,396 empty result queries in 18,793 queries)
      • In a real estate application developed by IBM
          • 5.75%
      • In a digital library application [JCM+00]
          • 10.53%
      • In a bioinformatics application [RCP+98]
          • 38%
      • Source: Gang Luo (IBM T.J. Watson Research Center, USA), Efficient Detection of Empty-Result Queries, VLDB 2006, p. 1015
  5. Observations
      • Different ways to adjust the conditions: Select vs. Join
      • How much to adjust each condition? Salary <= 100 vs. Salary <= 120
      • Adjust join vs. Adjust both selections
      • Example conditions: Salary <= 95, WorkExp >= 5 (Jobs and Candidates tables as on slide 2)
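The question of how much to adjust each condition can be tried out on the example rows. The sketch below is only an illustration, not the authors' code: the rows are the Jobs and Candidates values as read from the slide (Company omitted), and `run_query` with its two parameters is a hypothetical helper that evaluates the slide's query with a nested-loop join.

```python
# Example rows as read from the slide's Jobs and Candidates tables (Company omitted).
JOBS = [
    {"id": "J1", "zipcode": 92047, "salary": 80},
    {"id": "J2", "zipcode": 93652, "salary": 95},
    {"id": "J3", "zipcode": 82632, "salary": 120},
    {"id": "J4", "zipcode": 90391, "salary": 130},
]
CANDIDATES = [
    {"id": "C1", "zipcode": 93652, "exp_salary": 120, "work_exp": 3},
    {"id": "C2", "zipcode": 92612, "exp_salary": 130, "work_exp": 6},
    {"id": "C3", "zipcode": 82632, "exp_salary": 100, "work_exp": 5},
    {"id": "C4", "zipcode": 90391, "exp_salary": 150, "work_exp": 1},
]


def run_query(max_salary=95, min_work_exp=5):
    """Hypothetical helper: nested-loop evaluation of the select-join query.

    Returns (job id, candidate id) pairs that satisfy both selections
    and the equality join on zipcode.
    """
    return [
        (j["id"], c["id"])
        for j in JOBS
        for c in CANDIDATES
        if j["salary"] <= max_salary
        and j["zipcode"] == c["zipcode"]
        and c["work_exp"] >= min_work_exp
    ]


original = run_query()                         # Salary <= 95, WorkExp >= 5
relax_salary_100 = run_query(max_salary=100)   # small relaxation of Salary
relax_salary_120 = run_query(max_salary=120)   # larger relaxation of Salary
relax_work_exp_3 = run_query(min_work_exp=3)   # relax WorkExp instead

print(original, relax_salary_100, relax_salary_120, relax_work_exp_3)
```

On these four rows per table, the original query and Salary <= 100 are both empty, Salary <= 120 returns the pair (J3, C3), and relaxing WorkExp to >= 3 returns (J2, C1). Different conditions, relaxed by different amounts, lead to different answers.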
  6. Contributions
      • Query relaxation framework for selections and joins
      • Lattice-based approach for query relaxation
      • Efficient relaxation algorithms
  7. Overview: Motivation; Query Relaxation; Lattice-based Relaxation; Relaxation Algorithms; Variations; Experiments
  8. Query Relaxation
      • Top-k / Nearest neighbor
          • Weight for each condition
      • Skyline
          • No weights are needed
          • Conditions are not considered equal
          • Return non-dominated points
  9. Query Relaxation
      • Skyline
      • Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001
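"Return non-dominated points" can be made concrete with a small sketch. It assumes each point is a tuple of per-condition relaxation amounts where smaller is better. The function names `dominates` and `skyline` are illustrative, and the quadratic scan is the naive definition, not one of the efficient skyline algorithms from the literature.

```python
def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better in one.

    Assumption: smaller values are better (less relaxation).
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(0, 2), (25, 0), (35, 4), (10, 3)]
print(skyline(points))  # -> [(0, 2), (25, 0)]
```

No weights are involved: (0, 2) and (25, 0) are incomparable, so both stay, while (35, 4) and (10, 3) are dropped because (0, 2) is at least as good in both dimensions.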
  10. Overview: Motivation; Query Relaxation; Lattice-based Relaxation; Relaxation Algorithms; Variations; Experiments
  11. Lattice-based Relaxation
      • R: select on Jobs (Salary <= 95)
      • J: join condition
      • S: select on Candidates (WorkExp >= 5)
      • (Jobs and Candidates tables as on slide 2)
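One plausible reading of the lattice, sketched below, is that each node names the subset of the three conditions R, J, S that is allowed to be relaxed, with nodes ordered by set inclusion. This is an assumption for illustration only: the slide does not spell out the node structure, and `lattice_levels` and `parents` are hypothetical helper names.

```python
from itertools import combinations

# R: select on Jobs, J: join condition, S: select on Candidates
CONDITIONS = ("R", "J", "S")


def lattice_levels(conditions):
    """Group every subset of the conditions by size, smallest subsets first."""
    return [
        [frozenset(combo) for combo in combinations(conditions, k)]
        for k in range(len(conditions) + 1)
    ]


def parents(node):
    """Nodes that relax exactly one condition fewer than `node`."""
    return [node - {c} for c in sorted(node)]


def label(node, conditions=CONDITIONS):
    """Readable name such as 'RJ'; the empty set is the original query."""
    return "".join(c for c in conditions if c in node) or "(none)"


levels = lattice_levels(CONDITIONS)
for k, level in enumerate(levels):
    print(k, [label(n) for n in level])
```

Under this reading, level 0 is the original query, level 1 relaxes a single condition, and the top node RJS relaxes all three. Traversing the levels bottom-up would try the least intrusive relaxations first.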
  12. Overview: Motivation; Query Relaxation; Lattice-based Relaxation; Relaxation Algorithms; Variations; Experiments
  13. Relaxing Selection Conditions
      • Algorithm (marked INCORRECT on the slide):
      • Compute Skyline on Jobs
      • Compute Skyline on Candidates
      • Join the Skylines
      • Example conditions: Salary <= 95, WorkExp >= 5. On the example tables (as on slide 2), joining the two Skylines gives an empty result.
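The empty join of the two skylines can be reproduced on the example rows. The sketch assumes that the relaxation of `Salary <= 95` is how far a salary exceeds 95 and that the relaxation of `WorkExp >= 5` is how far the experience falls short of 5. With a single condition per table, the per-table skyline is simply the set of rows with minimal relaxation. Helper names are illustrative.

```python
# (id, zipcode, salary) and (id, zipcode, work_exp), as read from the slide.
JOBS = [("J1", 92047, 80), ("J2", 93652, 95), ("J3", 82632, 120), ("J4", 90391, 130)]
CANDIDATES = [("C1", 93652, 3), ("C2", 92612, 6), ("C3", 82632, 5), ("C4", 90391, 1)]


def job_relaxation(job, bound=95):
    """Assumed measure: amount by which Salary <= bound must be loosened."""
    return max(0, job[2] - bound)


def candidate_relaxation(cand, bound=5):
    """Assumed measure: amount by which WorkExp >= bound must be loosened."""
    return max(0, bound - cand[2])


def one_dim_skyline(rows, relaxation):
    """With one condition, the skyline is the set of rows with minimal relaxation."""
    best = min(relaxation(r) for r in rows)
    return [r for r in rows if relaxation(r) == best]


jobs_sky = one_dim_skyline(JOBS, job_relaxation)
cands_sky = one_dim_skyline(CANDIDATES, candidate_relaxation)
joined = [(j[0], c[0]) for j in jobs_sky for c in cands_sky if j[1] == c[1]]

print([j[0] for j in jobs_sky], [c[0] for c in cands_sky], joined)
```

The per-table skylines are {J1, J2} and {C2, C3}, and none of their zipcodes match. Taking the skyline before the join discards exactly the rows (J3, C1) that would have formed the closest answers.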
  14. Relaxing Selection Conditions
      • Join First Algorithm:
      • Compute the join (disregarding the selections)
      • Compute Skyline on join results
      • Example conditions: Salary <= 95, WorkExp >= 5 (Jobs and Candidates tables as on slide 2)
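A minimal sketch of the Join First steps on the example rows follows. It assumes the same relaxation measure as a distance past each bound (salary above 95, experience below 5) and uses a naive quadratic skyline. `join_first` and `dominates` are illustrative names, not the authors' implementation.

```python
# (id, zipcode, salary) and (id, zipcode, work_exp), as read from the slide.
JOBS = [("J1", 92047, 80), ("J2", 93652, 95), ("J3", 82632, 120), ("J4", 90391, 130)]
CANDIDATES = [("C1", 93652, 3), ("C2", 92612, 6), ("C3", 82632, 5), ("C4", 90391, 1)]


def dominates(p, q):
    """p needs no more relaxation than q anywhere and strictly less somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def join_first(jobs, candidates, max_salary=95, min_work_exp=5):
    """Join on zipcode ignoring the selections, then take the skyline of the
    (salary relaxation, work-experience relaxation) vectors of the joined pairs."""
    joined = {}
    for jid, jzip, salary in jobs:
        for cid, czip, work_exp in candidates:
            if jzip == czip:
                joined[(jid, cid)] = (
                    max(0, salary - max_salary),      # assumed relaxation of Salary
                    max(0, min_work_exp - work_exp),  # assumed relaxation of WorkExp
                )
    return {
        pair: vec
        for pair, vec in joined.items()
        if not any(dominates(other, vec) for other in joined.values())
    }


result = join_first(JOBS, CANDIDATES)
print(result)  # -> {('J2', 'C1'): (0, 2), ('J3', 'C3'): (25, 0)}
```

The join produces three pairs. (J4, C4) needs relaxation (35, 4) and is dominated, leaving (J2, C1), which relaxes only WorkExp, and (J3, C3), which relaxes only Salary.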
  15. Relaxing Selection Conditions: Variations
      • Pruning Join
          • Build the Skyline during the join
      • Pruning Join+
          • Pruning Join
          • Build the local Skyline before the join
      • Sorted Access Join
          • Fagin's Top-k: sort the columns on relaxation
          • Compute the join Skyline
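"Build the Skyline during the join" can be read as maintaining the skyline incrementally, so that dominated join results are discarded as soon as they are produced instead of being materialized. The sketch below illustrates that reading only. The slide does not give the details of Pruning Join, and the relaxation measure (distance past each bound) and the name `pruning_join` are assumptions.

```python
# (id, zipcode, salary) and (id, zipcode, work_exp), as read from the slide.
JOBS = [("J1", 92047, 80), ("J2", 93652, 95), ("J3", 82632, 120), ("J4", 90391, 130)]
CANDIDATES = [("C1", 93652, 3), ("C2", 92612, 6), ("C3", 82632, 5), ("C4", 90391, 1)]


def dominates(p, q):
    """p needs no more relaxation than q anywhere and strictly less somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pruning_join(jobs, candidates, max_salary=95, min_work_exp=5):
    """Nested-loop join on zipcode that keeps only the current skyline in memory."""
    skyline = {}
    for jid, jzip, salary in jobs:
        for cid, czip, work_exp in candidates:
            if jzip != czip:
                continue
            vec = (max(0, salary - max_salary), max(0, min_work_exp - work_exp))
            if any(dominates(kept, vec) for kept in skyline.values()):
                continue  # pruned: an already-kept pair is at least as good
            # The new pair may in turn dominate pairs kept earlier; drop those.
            skyline = {p: v for p, v in skyline.items() if not dominates(vec, v)}
            skyline[(jid, cid)] = vec
    return skyline


print(pruning_join(JOBS, CANDIDATES))
```

On the example rows the result is the same set as joining first and then computing the skyline, (J2, C1) and (J3, C3), regardless of the order in which the join emits pairs. The difference is that the full join result is never stored.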
  16. Relaxing all conditions
      • Multi-Dim.-Index-based-Relaxation Algorithm:
      • Traverse the index structure top-down
      • Form pairs of nodes or records
      • Build the Skyline
      • (Figure labels: Skyline, Queue)
  17. Overview: Motivation; Query Relaxation; Lattice-based Relaxation; Relaxation Algorithms; Variations; Experiments
  18. Variations
      • Computing Top-k over Skyline
          • Weight to each condition
      • Queries with multiple joins
      • Conditions on nonnumeric attributes
          • Dominance checking function
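Top-k over the Skyline with a weight per condition might look like the following sketch. The scoring rule (a weighted sum of per-condition relaxations, lower is better) is an assumption, since the slide only says that each condition gets a weight. The two skyline entries are the example pairs with relaxation vectors measured as distance past `Salary <= 95` and `WorkExp >= 5`.

```python
import heapq

# Skyline pairs for the slide's example, with assumed relaxation vectors
# (salary relaxation, work-experience relaxation).
SKYLINE = {("J2", "C1"): (0, 2), ("J3", "C3"): (25, 0)}


def top_k(skyline, weights, k):
    """Return the k skyline pairs with the smallest weighted relaxation."""
    def score(item):
        _, vec = item
        return sum(w * r for w, r in zip(weights, vec))

    return [pair for pair, _ in heapq.nsmallest(k, skyline.items(), key=score)]


print(top_k(SKYLINE, weights=(1, 10), k=1))  # -> [('J2', 'C1')]
print(top_k(SKYLINE, weights=(1, 20), k=1))  # -> [('J3', 'C3')]
```

With weights (1, 10) the scores are 20 and 25, so (J2, C1) ranks first. Doubling the weight on work experience to 20 makes its score 40 and flips the ranking. The skyline itself does not depend on the weights; they only order its members.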
  19. Overview: Motivation; Query Relaxation; Lattice-based Relaxation; Relaxation Algorithms; Variations; Experiments
  20. Experimental Setting
      • Datasets
          • Real
              • Internet Movie Database (IMDB): Movies (120k) & ActorInMovies (1.2m)
              • Census-Income, UCI KDD Repository: Census (200k)
          • Synthetic
              • Independent, Correlated, and Anticorrelated
      • Implementation
          • GNU C++
          • Spatial Index Library (R-tree)
          • Linux, AMD Opteron 240, 1 GB RAM
  21. Different algorithms, different behaviors (figure: IMDB dataset)
  22. Different datasets, different behaviors (figure panels: Correlated, Anticorrelated, and Independent datasets)
  23. How big is the Skyline? (figure)
  24. Relaxing join takes time (figure: self-join on Census dataset)
  25. Top-k over Skyline (figure: IMDB dataset)
  26. Related Work
      • Muslea et al.
          • Alternate forms of conjunctive expressions
      • Efficient Skyline algorithms
          • Selection queries
      • Efficient Top-k algorithms
          • Require weights for conditions
  27. Conclusions
      • Query relaxation framework for selections and joins
      • Lattice-based approach for query relaxation
      • Efficient relaxation algorithms
  28. Future Work
      • Optimum use of the lattice structure
      • Relax conditions on string attributes
      • Algorithms applicable outside databases
  29. Questions?
  31. Skyline vs. Top-k (figure)
  32. Skyline vs. Top-k over Skyline (figure)
