Relaxing Join and Selection Queries - VLDB 2006 Slides

Database users can be frustrated by having an empty answer to a query. In this paper, we propose a framework to systematically relax queries involving joins and selections. When considering relaxing a query condition, intuitively one seeks the 'minimal' amount of relaxation that yields an answer. We first characterize the types of answers that we return to relaxed queries. We then propose a lattice-based framework in order to aid query relaxation. Nodes in the lattice correspond to different ways to relax queries. We characterize the properties of relaxation at each node and present algorithms to compute the corresponding answer. We then discuss how to traverse this lattice in a way that a non-empty query answer is obtained with the minimum amount of query condition relaxation. We implemented this framework and we present our results of a thorough performance evaluation using real and synthetic data. Our results indicate the practical utility of our framework.

Slide notes
  • Make a “real” story; companies are very eager to find the people they want. We assume that closer zip codes mean closer areas.
  • Queries can return nothing. It is important to have results. Automatically do relaxation!
  • Efficiency is a big issue
  • Skyline is not the only way. Skyline does not care which one is more important. We are not comparing apples with oranges.
  • In this diagram we are not relaxing join conditions. Each point is a join pair.
  • Sometimes we might not want to relax the join (e.g., when the attribute is an ID). Relaxation is done automatically by the system.
  • Skyline as a relational algebra operator with various properties
  • Main idea of the algorithms. For more details see the paper.
  • An index exists, e.g., an R-tree; the algorithm also works with other types of multi-dimensional indices. Children queue: enqueue, dequeue.
  • Explain the Top-k over Skyline
  • We present just a few of our results; for more details, see the paper.
  • Skyline size depends on cardinality, number of selections, and data size.
  • Muslea deals primarily with expressibility issues, without paying attention to the data management issues involved. Other studies assume that the attributes and the ordering of their values are already determined in a single table; our work requires us to compute the skyline dynamically for a set of tables that are to be joined and whose attribute values must be determined on the fly. Our work considers both the selection and the join conditions for relaxation.
  • Efficiency is a big issue

    1. Relaxing Join and Selection Queries
       Rares Vernica, UC Irvine, USA
       Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung
    2. Query Example
       SELECT * FROM Jobs J, Candidates C
       WHERE J.Salary <= 95
         AND J.Zipcode = C.Zipcode
         AND C.WorkExp >= 5;

       Jobs
       ID  Company    Zipcode  Salary  …
       J1  Broadcom   92047    80      …
       J2  Intel      93652    95      …
       J3  Microsoft  82632    120     …
       J4  IBM        90391    130     …
       …   …          …        …       …

       Candidates
       ID  Zipcode  ExpSalary  WorkExp  …
       C1  93652    120        3        …
       C2  92612    130        6        …
       C3  82632    100        5        …
       C4  90391    150        1        …
       …   …        …          …        …
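The query on slide 2 can be checked against the sample rows with a few lines of Python. This is only a sketch: the tuple layout, and the reading that the extra Zipcode column belongs to Candidates, are assumptions taken from the slide's tables, and `run_query` is a hypothetical helper name.

```python
# Sample rows from the slide (layout assumed):
# jobs are (id, zipcode, salary); candidates are (id, zipcode, exp_salary, work_exp).
JOBS = [("J1", 92047, 80), ("J2", 93652, 95), ("J3", 82632, 120), ("J4", 90391, 130)]
CANDIDATES = [("C1", 93652, 120, 3), ("C2", 92612, 130, 6),
              ("C3", 82632, 100, 5), ("C4", 90391, 150, 1)]


def run_query(max_salary=95, min_work_exp=5):
    """Return (job id, candidate id) pairs that satisfy both selections and the join."""
    return [(j[0], c[0])
            for j in JOBS
            for c in CANDIDATES
            if j[2] <= max_salary and j[1] == c[1] and c[3] >= min_work_exp]


print(run_query())        # original thresholds
print(run_query(120, 5))  # Salary condition relaxed to <= 120
```

With the original thresholds the result is empty; relaxing the Salary condition to <= 120 returns the single pair (J3, C3).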
    3. What if the query answer is empty?
       SELECT * FROM Jobs J, Candidates C
       WHERE J.Salary <= 95
         AND J.Zipcode = C.Zipcode
         AND C.WorkExp >= 5;
       • Adjust the conditions
       • What conditions to adjust?
       • How to adjust them?
    4. Example Percentages of Empty Result Queries
       • In a Customer Relationship Management (CRM) application developed by IBM
           • 18.07% (3,396 empty result queries in 18,793 queries)
       • In a real estate application developed by IBM
           • 5.75%
       • In a digital library application [JCM+00]
           • 10.53%
       • In a bioinformatics application [RCP+98]
           • 38%
       • Source: Gang Luo (IBM T.J. Watson Research Center, USA), Efficient Detection of Empty-Result Queries, VLDB 2006, p. 1015
    5. Observations
       • Different ways to adjust the conditions: Select vs. Join
       • How much to adjust each condition? Salary <= 100 vs. Salary <= 120
       • Adjust join vs. Adjust both selections
       [Figure: the Jobs and Candidates tables from slide 2, annotated with the selections Salary <= 95 and WorkExp >= 5]
    6. Contributions
       • Query relaxation framework for selections and joins
       • Lattice-based approach for query relaxation
       • Efficient relaxation algorithms
    7. Overview
       • Motivation
       • Query Relaxation
       • Lattice-based Relaxation
       • Relaxation Algorithms
       • Variations
       • Experiments
    8. Query Relaxation
       • Top-k / Nearest neighbor
           • Weight for each condition
       • Skyline
           • No weights are needed
           • Conditions are not considered equal
           • Return non-dominated points
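The "non-dominated points" idea on slide 8 can be illustrated with a small sketch. It assumes each point is a tuple of per-condition relaxation amounts where smaller is better; the function names and the sample points are made up for illustration, and the quadratic scan is the simplest possible formulation, not one of the efficient algorithms from the talk.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one.

    Smaller values are treated as better (less relaxation).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that no other point dominates (simple quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical relaxation vectors: (salary relaxation, work-experience relaxation).
points = [(0, 2), (25, 0), (35, 4), (10, 3)]
print(skyline(points))
```

Here (35, 4) and (10, 3) are both dominated by (0, 2), so only (0, 2) and (25, 0) remain. No weights are involved: the two survivors are incomparable, which is the contrast with Top-k drawn on the slide.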
    9. Query Relaxation
       • Skyline
       • Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001
    10. Overview
       • Motivation
       • Query Relaxation
       • Lattice-based Relaxation
       • Relaxation Algorithms
       • Variations
       • Experiments
    11. Lattice-based Relaxation
       • R: select on Jobs (Salary <= 95)
       • J: join condition
       • S: select on Candidates (WorkExp >= 5)
       [Figure: relaxation lattice over R, J, and S, shown with the Jobs and Candidates tables from slide 2]
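One way to picture the lattice on slide 11 is as the set of all subsets of {R, J, S}, where a node names the conditions that are allowed to be relaxed. That reading is an assumption based on the slide's legend and the abstract; the helper names below are hypothetical, and the sketch only enumerates nodes, it does not compute any answers.

```python
from itertools import combinations

# R: select on Jobs, J: join condition, S: select on Candidates (legend from the slide).
CONDITIONS = ("R", "J", "S")


def lattice_levels(conditions=CONDITIONS):
    """Group every subset of conditions by size; level k relaxes exactly k conditions."""
    return [[frozenset(subset) for subset in combinations(conditions, k)]
            for k in range(len(conditions) + 1)]


def children(node, conditions=CONDITIONS):
    """Nodes reached from `node` by relaxing one additional condition."""
    return [node | {c} for c in conditions if c not in node]


for level, nodes in enumerate(lattice_levels()):
    print(level, ["".join(sorted(n)) or "(none)" for n in nodes])
```

Visiting the levels in increasing order looks at nodes that relax fewer conditions first, which matches the abstract's goal of obtaining a non-empty answer with as little relaxation as possible.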
    12. Overview
       • Motivation
       • Query Relaxation
       • Lattice-based Relaxation
       • Relaxation Algorithms
       • Variations
       • Experiments
    13. Relaxing Selection Conditions
       • Algorithm (INCORRECT):
           • Compute Skyline on Jobs
           • Compute Skyline on Candidates
           • Join the Skylines
       [Figure: Skyline on Jobs (Salary <= 95) and Skyline on Candidates (WorkExp >= 5) on the tables from slide 2; the resulting join Skyline is empty]
    14. Relaxing Selection Conditions
       • Join First Algorithm:
           • Compute the join (disregarding the selections)
           • Compute Skyline on join results
       [Figure: join Skyline for Salary <= 95 and WorkExp >= 5 on the tables from slide 2]
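The difference between slides 13 and 14 can be reproduced on the sample rows. This is a sketch under stated assumptions: the relaxation of a condition is measured as the distance from its threshold (for example, Salary 120 needs 25 units of relaxation against Salary <= 95), the tuple layout is read off the slide's tables, and all function names are hypothetical.

```python
# jobs are (id, zipcode, salary); candidates are (id, zipcode, exp_salary, work_exp).
JOBS = [("J1", 92047, 80), ("J2", 93652, 95), ("J3", 82632, 120), ("J4", 90391, 130)]
CANDIDATES = [("C1", 93652, 120, 3), ("C2", 92612, 130, 6),
              ("C3", 82632, 100, 5), ("C4", 90391, 150, 1)]
MAX_SALARY, MIN_WORK_EXP = 95, 5


def dominates(a, b):
    """a needs no more relaxation than b everywhere, and strictly less somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(items, relax):
    """Items whose relaxation vector is not dominated by any other item's vector."""
    return [p for p in items if not any(dominates(relax(q), relax(p)) for q in items)]


def job_relax(job):
    return (max(0, job[2] - MAX_SALARY),)      # how far Salary exceeds 95


def cand_relax(cand):
    return (max(0, MIN_WORK_EXP - cand[3]),)   # how far WorkExp falls short of 5


def pair_relax(pair):
    job, cand = pair
    return job_relax(job) + cand_relax(cand)


def join(jobs, cands):
    """Equi-join on Zipcode; the join condition itself is not relaxed here."""
    return [(j, c) for j in jobs for c in cands if j[1] == c[1]]


# Slide 13: skyline of each table first, then join (marked INCORRECT on the slide).
skyline_then_join = join(skyline(JOBS, job_relax), skyline(CANDIDATES, cand_relax))
# Slide 14: join first, then skyline over the join results.
join_first = skyline(join(JOBS, CANDIDATES), pair_relax)

print([(j[0], c[0]) for j, c in skyline_then_join])
print([(j[0], c[0]) for j, c in join_first])
```

Under these assumptions the first approach returns nothing, because the best jobs (J1, J2) and the best candidates (C2, C3) share no zip code, while Join First returns the pairs (J2, C1) and (J3, C3).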
    15. Relaxing Selection Condition
       • Variations
       • Pruning Join
           • Build the Skyline during the join
       • Pruning Join+
           • Pruning Join
           • Build the local Skyline before the join
       • Sorted Access Join
           • Fagin’s Top-k: sort the columns on relaxation
           • Compute the join Skyline
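"Build the Skyline during the join" (Pruning Join on slide 15) might look roughly like the sketch below, which keeps a running skyline and discards a join pair as soon as it is dominated. The slide gives only the one-line description, so the nested-loop join, the distance-from-threshold relaxation measure, and the name `pruning_join` are all assumptions rather than the authors' algorithm.

```python
# jobs are (id, zipcode, salary); candidates are (id, zipcode, exp_salary, work_exp).
JOBS = [("J1", 92047, 80), ("J2", 93652, 95), ("J3", 82632, 120), ("J4", 90391, 130)]
CANDIDATES = [("C1", 93652, 120, 3), ("C2", 92612, 130, 6),
              ("C3", 82632, 100, 5), ("C4", 90391, 150, 1)]


def dominates(a, b):
    """a needs no more relaxation than b everywhere, and strictly less somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pruning_join(jobs, cands, max_salary=95, min_work_exp=5):
    """Nested-loop equi-join on Zipcode that maintains the skyline incrementally."""
    sky = []  # entries are (relaxation vector, job id, candidate id)
    for j in jobs:
        for c in cands:
            if j[1] != c[1]:
                continue  # join condition is not relaxed in this sketch
            vec = (max(0, j[2] - max_salary), max(0, min_work_exp - c[3]))
            if any(dominates(entry[0], vec) for entry in sky):
                continue  # pruned: an existing skyline pair is at least as good
            # The new pair survives; drop skyline entries it dominates.
            sky = [entry for entry in sky if not dominates(vec, entry[0])]
            sky.append((vec, j[0], c[0]))
    return sky


print(pruning_join(JOBS, CANDIDATES))
```

On the sample rows the pair (J4, C4) is pruned the moment it is produced, since (J2, C1) already needs less relaxation on both conditions, so the full join result never has to be materialized before the skyline is known.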
    16. Relaxing all conditions
       • Multi-Dim.-Index-based-Relaxation Algorithm:
           • Traverse the index structure top-down
           • Form pairs of nodes or records
           • Build the Skyline
       [Figure: Skyline and Queue]
    17. Overview
       • Motivation
       • Query Relaxation
       • Lattice-based Relaxation
       • Relaxation Algorithms
       • Variations
       • Experiments
    18. Variations
       • Computing Top-k over Skyline
           • Weight to each condition
       • Queries with multiple joins
       • Conditions on nonnumeric attributes
           • Dominance checking function
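For the "Top-k over Skyline" variation on slide 18, one plausible reading is to rank the skyline points by a weighted sum of their per-condition relaxations and keep the k best. The weighted-sum score, the weights, and the function name are illustrative assumptions; the slide only states that a weight is given to each condition.

```python
import heapq


def top_k_over_skyline(skyline_points, weights, k):
    """Return the k skyline points with the smallest weighted relaxation."""
    def score(point):
        return sum(w * x for w, x in zip(weights, point))
    return heapq.nsmallest(k, skyline_points, key=score)


# Hypothetical skyline of (salary relaxation, work-experience relaxation) vectors.
sky = [(0, 2), (25, 0)]
print(top_k_over_skyline(sky, weights=(1, 10), k=1))
print(top_k_over_skyline(sky, weights=(1, 20), k=1))
```

With weights (1, 10) the point (0, 2) scores 20 against 25 and is returned; raising the work-experience weight to 20 flips the choice to (25, 0). The skyline itself does not change, only the ordering imposed on it.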
    19. Overview
       • Motivation
       • Query Relaxation
       • Lattice-based Relaxation
       • Relaxation Algorithms
       • Variations
       • Experiments
    20. Experimental Setting
       • Datasets
           • Real
               • Internet Movie Database (IMDB): Movies (120k) & ActorInMovies (1.2m)
               • Census-Income (UCI KDD Repository): Census (200k)
           • Synthetic
               • Independent, Correlated, and Anticorrelated
       • Implementation
           • GNU C++
           • Spatial Index Library (R-tree)
           • Linux, AMD Opteron 240, 1GB RAM
    21. Different algorithms, different behaviors (IMDB Dataset)
    22. Different datasets, different behaviors (Correlated, Anticorrelated, and Independent Datasets)
    23. How big is the Skyline?
    24. Relaxing join takes time (self-join on Census Dataset)
    25. Top-k over Skyline (IMDB Dataset)
    26. Related Work
       • Muslea et al.
           • Alternate forms of conjunctive expressions
       • Efficient Skyline algorithms
           • Selection queries
       • Efficient Top-k algorithms
           • Require weights for conditions
    27. Conclusions
       • Query relaxation framework for selections and joins
       • Lattice-based approach for query relaxation
       • Efficient relaxation algorithms
    28. Future Work
       • Optimum use of the lattice structure
       • Relax conditions on string attributes
       • Algorithms applicable outside the databases
    29. Questions?
    30. 31. Skyline vs. Top-k
    31. 32. Skyline vs. Top-k over Skyline
