Full Disjunction

Published in: Economy & Finance, Travel


Transcript

  • 1. Full Disjunctions: Polynomial-Delay Iterators in Action
    • VLDB 2006, Seoul, Korea
    • Sara Cohen (Technion, Israel)
    • Yaron Kanza (University of Toronto, Canada)
    • Benny Kimelfeld (Hebrew University, Israel)
    • Yehoshua Sagiv (Hebrew University, Israel)
    • Itzhak Fadida (Technion, Israel)
  • 2. Computing Full Disjunctions
    • The full disjunction is a relational operator that maximally combines data from several relations
      • It extends the natural join by allowing incompleteness
      • It extends the binary outerjoin to many relations
    • This paper presents algorithms and optimizations for computing full disjunctions
      • Theoretically, full disjunctions are more tractable than previously known
      • Practically, a significant improvement over the state of the art, with an iterator-like evaluation
  • 3. Contents
    • Full Disjunctions
      • Complexity
    • Contributions
    • Algorithms
      • Algorithm NLOJ for Tree-Structured Schemes
      • Algorithm PDelayFD for General Schemes
      • Algorithm BiComNLOJ − Main Algorithm
    • Experimental Results
    • Conclusion
  • 5. The Natural Join Operator
    • Climates (Country, Climate): (Canada, diverse), (Bahamas, tropical), (UK, temperate)
    • Accommodations (Country, City, Hotel, Stars): (Canada, Toronto, Plaza, 4), (Canada, London, Ramada, 3), (Bahamas, Nassau, Hilton, null)
    • Sites (Country, City, Site): (Canada, London, Air Show), (Canada, null, Mount Logan), (UK, London, Buckingham), (UK, London, Hyde Park)
    • Climates ⋈ Accommodations ⋈ Sites (Country, Site, City, Climate, Hotel, Stars): (Canada, Air Show, London, diverse, Ramada, 3)
  • 6-8. The Natural Join Misses Information
    • Over the same relations Climates, Accommodations, and Sites, the natural join Climates ⋈ Accommodations ⋈ Sites contains only the tuple (Canada, Air Show, London, diverse, Ramada, 3)
    • Bahamas is not in Sites, so the natural join misses it
    • Mount Logan is not in a city, hence missed
    • In the slide tables, an empty cell means a null value
    • A looser notion of join is needed: one that enables joining tuples from some of the tables
  • 9. The Natural Join Operator
    • A tuple of the join corresponds to a set of tuples from the source relations (Climates, Accommodations, Sites) that is:
      • Join consistent
      • Connected (no Cartesian product)
      • Complete (one tuple from each relation)
    • Example: the join tuple (Canada, Air Show, London, diverse, Ramada, 3) corresponds to one tuple from each of Climates, Accommodations, and Sites
  • 10. Join-Consistent Sets of Tuples
    • Two tuples t1 and t2 are join-consistent if for every common attribute A:
      • 1. t1[A] and t2[A] are non-null
      • 2. t1[A] = t2[A]
    • A set T of tuples is join-consistent if every two tuples of T are join-consistent
    • Example: the Accommodations tuple (Country = Canada, City = London, Hotel = Ramada) and the Sites tuple (Country = Canada, City = London, Site = Air Show)
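The two conditions of slide 10 translate directly into a small predicate. The following is a minimal sketch, assuming tuples are represented as Python dicts from attribute name to value and that `None` plays the role of null; the function names and the Stars value for Ramada (taken from the Accommodations table on slide 5) are choices made for this illustration.

```python
from itertools import combinations

def join_consistent(t1, t2):
    """True if, on every common attribute, both values are non-null and equal."""
    common = t1.keys() & t2.keys()
    # If t1[a] is non-null and equal to t2[a], then t2[a] is non-null as well.
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def set_join_consistent(tuples):
    """A set of tuples is join-consistent if every two of its tuples are."""
    return all(join_consistent(a, b) for a, b in combinations(tuples, 2))

ramada = {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3}
air_show = {"Country": "Canada", "City": "London", "Site": "Air Show"}
mount_logan = {"Country": "Canada", "City": None, "Site": "Mount Logan"}

print(join_consistent(ramada, air_show))     # agree on Country and City
print(join_consistent(ramada, mount_logan))  # City is null in one tuple
```

Note that a null on a common attribute makes two tuples inconsistent even when the other value is also null, which is why Mount Logan never joins with a hotel.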
  • 11. Connected Sets of Tuples
    • The join graph of a set T of tuples:
      • The nodes are the tuples of T
      • There is an edge between every two tuples with a common attribute
    • A set of tuples is connected if its join graph is connected
    • Example: the tuples (Country = Canada, Climate = diverse), (Country = UK, City = London, Site = Buckingham), and (City = Toronto, Hotel = Plaza, Stars = 4)
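Connectivity of the join graph can be checked with an ordinary graph traversal. This is a minimal sketch, assuming the same dict representation of tuples as an illustration device; `is_connected` is a hypothetical helper name, and the three tuples are the ones shown on slide 11.

```python
def is_connected(tuples):
    """True if the join graph (edge = at least one common attribute) is connected."""
    if not tuples:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(tuples)):
            # Two tuples are adjacent when they share an attribute name;
            # the attribute values play no role for connectivity.
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)

climate = {"Country": "Canada", "Climate": "diverse"}
site = {"Country": "UK", "City": "London", "Site": "Buckingham"}
hotel = {"City": "Toronto", "Hotel": "Plaza", "Stars": 4}

print(is_connected([climate, site, hotel]))  # climate-site share Country, site-hotel share City
print(is_connected([climate, hotel]))        # no common attribute
```

The first set is connected although it is not join-consistent: connectivity and join consistency are independent conditions.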
  • 12. Natural Join (w/o Cartesian Product)
    • Each tuple of the result corresponds to a set T of tuples from the source relations, such that:
      • 1. T is join consistent
      • 2. T is connected (no Cartesian product)
      • 3. T is complete (one tuple from each relation)
    • Conditions 1 and 2 together: T is a JCC (join-consistent and connected) set
  • 13. Full Disjunction (Galindo-Legaria 1994)
    • Each tuple of the result corresponds to a set T of tuples from the source relations, such that:
      • 1. T is join consistent
      • 2. T is connected (no Cartesian product)
      • 3. T is maximal (not properly contained in any JCC set); this replaces the natural-join requirement that T is complete (one tuple from each relation)
    • Conditions 1 and 2 together: T is a JCC (join-consistent and connected) set
  • 14-19. An Example of a Full Disjunction
    • The input R:
      • Climates (Country, Climate): (Canada, diverse), (UK, temperate)
      • Accommodations (Country, City, Hotel, Stars): (Canada, Toronto, Plaza, 4), (Canada, London, Ramada, 3)
      • Sites (Country, City, Site): (Canada, London, Air Show), (Canada, null, Mount Logan), (UK, London, Buckingham)
    • FD(R), built up one tuple per slide, over (Country, Site, City, Climate, Hotel, Stars):
      • (Canada, null, Toronto, diverse, Plaza, 4)
      • (Canada, Air Show, London, diverse, Ramada, 3)
      • (Canada, Mount Logan, null, diverse, null, null)
      • (UK, Buckingham, London, temperate, null, null)
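The definition of slide 13 can be checked against this example by brute force. The sketch below is not one of the algorithms of the talk: it enumerates every subset of the seven input tuples (exponential time), keeps the join-consistent and connected ones, and outputs the maximal ones padded with nulls. The dict representation, `None` for null, and all function names are assumptions made for the illustration.

```python
from itertools import combinations

# Input R of slides 14-19; None stands for a null value.
tuples = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3},
    {"Country": "Canada", "City": "London", "Site": "Air Show"},
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},
    {"Country": "UK", "City": "London", "Site": "Buckingham"},
]
ATTRS = ["Country", "Site", "City", "Climate", "Hotel", "Stars"]

def consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())

def connected(idx):
    idx = list(idx)
    seen, stack = {idx[0]}, [idx[0]]
    while stack:
        i = stack.pop()
        for j in idx:
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(idx)

def is_jcc(idx):
    return connected(idx) and all(
        consistent(tuples[i], tuples[j]) for i, j in combinations(idx, 2))

def pad(idx):
    row = dict.fromkeys(ATTRS)  # every attribute starts as null
    for i in idx:
        row.update(tuples[i])
    return row

def full_disjunction():
    positions = range(len(tuples))
    jcc = [frozenset(s) for k in range(1, len(tuples) + 1)
           for s in combinations(positions, k) if is_jcc(s)]
    # Keep only the JCC sets that are not properly contained in another one.
    maximal = [s for s in jcc if not any(s < other for other in jcc)]
    return [pad(s) for s in maximal]

for row in full_disjunction():
    print(row)
```

On this input the sketch produces the four tuples listed on the slides; it is only practical for tiny inputs, which is the motivation for the algorithms that follow.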
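  • 20. Padding Joined Tuple Sets with Nulls
    • The Sites tuple (Country = Canada, City = null, Site = Mount Logan) and the Climates tuple (Country = Canada, Climate = diverse) form a joined tuple set
    • Joined and padded with nulls over (Country, Site, City, Climate, Hotel, Stars): (Canada, Mount Logan, null, diverse, null, null)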
  • 21. The Outerjoin Operator
    • The outerjoin R1 ⟗ R2 of two relations R1 and R2: the natural join R1 ⋈ R2 and, in addition, all dangling tuples padded with nulls
  • 22. Example of an Outerjoin
    • Climates (Country, Climate): (Canada, diverse), (Bahamas, tropical), (UK, temperate)
    • Accommodations (Country, City, Hotel, Stars): (Canada, Toronto, Plaza, 4), (Bahamas, Nassau, Hilton, null), (France, Paris, Atala, 4)
    • Climates ⟗ Accommodations (Country, Climate, City, Hotel, Stars): (Canada, diverse, Toronto, Plaza, 4), (Bahamas, tropical, Nassau, Hilton, null), (UK, temperate, null, null, null), (France, null, Paris, Atala, 4)
  • 23. Combining Relations using Outerjoins
    • The outerjoin operator is not associative
    • For more than two relations, the result depends on the order in which the outerjoin is applied
    • In general, outerjoins cannot maximally combine relations (no matter what order is used)
    • Outerjoin is not suitable for combining more than two relations!
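The outerjoin of slide 21 and the non-associativity claim of slide 23 can be tried out on small inputs. This is a minimal sketch, assuming relations are lists of dicts with `None` for null and that a null never joins (the join-consistency rule of slide 10). The first input is the example of slide 22; the relations `r1`, `r2`, `r3` are hypothetical and were chosen only to make the two parenthesizations differ.

```python
def join_consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())

def outerjoin(r1, r2):
    """Natural join of r1 and r2 plus all dangling tuples, padded with nulls."""
    attrs = list(dict.fromkeys(a for t in r1 + r2 for a in t))

    def pad(*parts):
        row = dict.fromkeys(attrs)  # every attribute starts as null
        for p in parts:
            row.update(p)
        return row

    rows, matched = [], set()
    for t1 in r1:
        partners = [j for j, t2 in enumerate(r2) if join_consistent(t1, t2)]
        matched.update(partners)
        rows.extend(pad(t1, r2[j]) for j in partners)
        if not partners:
            rows.append(pad(t1))  # dangling tuple of r1
    rows.extend(pad(t2) for j, t2 in enumerate(r2) if j not in matched)
    return rows

climates = [{"Country": "Canada", "Climate": "diverse"},
            {"Country": "Bahamas", "Climate": "tropical"},
            {"Country": "UK", "Climate": "temperate"}]
accommodations = [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton", "Stars": None},
    {"Country": "France", "City": "Paris", "Hotel": "Atala", "Stars": 4}]
for row in outerjoin(climates, accommodations):
    print(row)

# Hypothetical relations over a cyclic scheme: R1(A, B), R2(B, C), R3(A, C).
r1 = [{"A": "a", "B": "b"}]
r2 = [{"B": "b2", "C": "c"}]
r3 = [{"A": "a", "C": "c"}]

def canon(rows):
    return {tuple(sorted(r.items())) for r in rows}

left = outerjoin(outerjoin(r1, r2), r3)   # (R1 outerjoin R2) outerjoin R3
right = outerjoin(r1, outerjoin(r2, r3))  # R1 outerjoin (R2 outerjoin R3)
print(canon(left) == canon(right))
```

In the hypothetical example the left-deep order yields three tuples and the other order yields two, so the result does depend on where the parentheses are placed.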
  • 25. Efficiency of Evaluation
    • The full-disjunction operator (as well as other operators like the Cartesian product or the natural join) can generate an exponential (in the input size) number of tuples
    • Polynomial running time is not a suitable yardstick
    • The usual notion: polynomial time in the combined size of the input and the output
  • 26. History of Algorithms for Full Disjunctions
    • Notation: n = number of relations, N = number of tuples in the DB, F = number of tuples in the FD
    • Source, time, and class of databases:
      • RU96: O(n + F²), γ-acyclic
      • KS03: O(n⁵ · N² · F²), general
      • CS05: O(n³ · N · F²), “incremental polynomial”, general
    • F is typically very large; it can be exponential in the size of the database
    • This paper: linear dependence on F
  • 27. Polynomial Delay
    • One way to obtain an evaluation with a running time linear in the output is to devise an algorithm that acts as an iterator with an efficient next() operator, that is, an enumeration algorithm that runs with polynomial delay
    • An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input
  • 28. Other Benefits of Polynomial Delay
    • Incremental evaluation
      • First tuples are generated quickly
        • Full disjunctions are large, yet the user need not wait for the whole result to be generated
      • Suitable for Web applications, where users expect to get the first few pages quickly
        • In addition, the user can decide anytime that enough information has been shown
    • Enable parallel query processing
      • While one processor generates the FD tuples, other processors apply further processing
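The iterator view of slides 27 and 28 can be illustrated with a Python generator. The sketch below is not one of the talk's algorithms: it lazily enumerates the natural join of relations that share no attributes (a Cartesian product), so the full result has 2^30 tuples while the first answers are available after a delay that depends only on the number of relations. The relations and the name `lazy_join` are hypothetical.

```python
from itertools import islice, product

def lazy_join(relations):
    """Yield one combined tuple at a time instead of materializing the result."""
    for combo in product(*relations):
        yield tuple(value for part in combo for value in part)

# 30 hypothetical one-attribute relations with 2 tuples each and no common
# attributes: their join is a Cartesian product with 2**30 tuples.
relations = [[(f"a{i}",), (f"b{i}",)] for i in range(30)]

# The consumer decides how many answers it wants; nothing else is computed.
first_three = list(islice(lazy_join(relations), 3))
print(len(first_three), len(first_three[0]))
```

A consumer that stops after a few answers never pays for the rest of the output, which is the incremental-evaluation benefit listed on slide 28.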
  • 30. Main Contributions
    • 1. First algorithm for computing full disjunctions with polynomial delay
    • 2. First algorithm for computing full disjunctions in time linear in the output
    • 3. A general optimization technique for computing full disjunctions: division into biconnected components
    • Substantial improvement over the state of the art is proved theoretically and experimentally
  • 32. Our Algorithms
    • Algorithm NLOJ: tree schemes
    • Algorithm PDelayFD: general schemes
    • Optimization: division into biconnected components
    • Algorithm BiComNLOJ (main algorithm, general schemes): combines the above
  • 34. Tree Schemes
    • In the scheme graph, the relation schemes are the nodes and there is an edge between every two schemes with one or more common attributes
    • Tree schemes: scheme graphs w/o cycles
    • (Figure: an example tree scheme over the relations R1, …, R7.)
  • 35. Left-Deep Sequence of Outerjoins
    • R: a set of relations with a tree scheme
    • R1, …, Rn: a connected-prefix order of R
    • Proposition: FD(R) = (…((R1 ⟗ R2) ⟗ R3) …) ⟗ Rn
    • Algorithm NLOJ (Nested Loop OuterJoin):
      • 1. Compute a connected-prefix order of R
      • 2. Apply outerjoins in a left-deep order
  • 36. Connected-Prefix Order of Relations
    • A connected-prefix order of relations: each prefix forms a (connected) subtree
    • (Figure: a tree scheme over R1, …, R7, with the relations listed in the order R1, R3, R2, R7, R4, R5, R6.)
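One way to obtain a connected-prefix order is a breadth-first traversal of the scheme graph, since every relation is then listed after a neighbour. The sketch below assumes that approach (the slides do not say how the authors compute the order); the schemes R1 to R4 and both function names are hypothetical.

```python
from collections import deque

def scheme_graph(schemes):
    """Edge between every two relation schemes with a common attribute."""
    return {r: [s for s in schemes if s != r and schemes[r] & schemes[s]]
            for r in schemes}

def connected_prefix_order(schemes, root=None):
    """Breadth-first order: each relation appears after one of its neighbours."""
    graph = scheme_graph(schemes)
    root = root if root is not None else next(iter(schemes))
    order, seen, queue = [], {root}, deque([root])
    while queue:
        r = queue.popleft()
        order.append(r)
        for s in graph[r]:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return order

# Hypothetical tree scheme: edges R1-R2 (B), R2-R3 (C), R1-R4 (A).
schemes = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"C", "D"}, "R4": {"A", "E"}}
print(connected_prefix_order(schemes))
print(connected_prefix_order(schemes, root="R3"))
```

Any traversal that only visits a relation from an already visited neighbour would do; breadth-first search is just the shortest one to write down.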
  • 37. Achieving Polynomial Delay
    • Algorithm NLOJ (Nested Loop OuterJoin):
      • 1. Compute a connected-prefix order of R
      • 2. Apply outerjoins in a left-deep order over R1, R2, R3, …, Rn-1, Rn
    • Problem: exponential delay, since an intermediate result can already have exponential size!
    • Solution: use iterators
  • 38. Iterators
    • To obtain polynomial delay, we use iterators
    • Operate on top of an enumeration algorithm
    • Implement next() by controlling the execution
    • (Figure: an Iterator placed on top of an Algorithm, exposing next().)
  • 39. Using Iterators for Outerjoins
    • (Figure: the left-deep chain of outerjoins over R1, R2, R3, …, Rn-1, Rn, with Iterator 1, Iterator 2, …, Iterator n-1, Iterator n placed along the chain.)
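The chain of iterators on slide 39 can be imitated with nested Python generators: each generator pulls one tuple at a time from the one to its left, so no intermediate result is materialized. This is a minimal sketch under stated assumptions, not the authors' implementation. Relations are non-empty lists of dicts with `None` for null, they are given in a connected-prefix order of a tree scheme, and the `reviews` relation is hypothetical (added so that the scheme is a three-node tree).

```python
def join_consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())

def pad(attrs, *parts):
    row = dict.fromkeys(attrs)  # every attribute starts as null
    for p in parts:
        row.update(p)
    return row

def outerjoin_iter(left, right, attrs):
    """Lazy outerjoin of an iterator `left` with a stored relation `right`."""
    matched = set()
    for t in left:
        partners = [j for j, u in enumerate(right) if join_consistent(t, u)]
        matched.update(partners)
        if partners:
            for j in partners:
                yield pad(attrs, t, right[j])
        else:
            yield pad(attrs, t)      # dangling tuple from the left input
    for j, u in enumerate(right):
        if j not in matched:
            yield pad(attrs, u)      # dangling tuple of the right relation

def nloj(relations):
    """Left-deep chain of lazy outerjoins; returns an iterator over the result."""
    attrs = list(relations[0][0])
    stream = iter(relations[0])
    for rel in relations[1:]:
        attrs = attrs + [a for a in rel[0] if a not in attrs]
        stream = outerjoin_iter(stream, rel, attrs)
    return stream

climates = [{"Country": "Canada", "Climate": "diverse"},
            {"Country": "Bahamas", "Climate": "tropical"},
            {"Country": "UK", "Climate": "temperate"}]
accommodations = [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton", "Stars": None},
    {"Country": "France", "City": "Paris", "Hotel": "Atala", "Stars": 4}]
reviews = [{"Hotel": "Plaza", "Rating": "good"},   # hypothetical relation
           {"Hotel": "Ritz", "Rating": "great"}]

rows = nloj([climates, accommodations, reviews])
first = next(rows)   # only the work needed for the first answer is done
print(first)
```

Every tuple pulled from the left input produces at least one output tuple (joined or dangling), so in this sketch a call to `next()` never has to scan through left tuples that produce nothing.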
  • 40. Outerjoins are not Always Applicable
    • It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins
    • Rajaraman and Ullman [PODS 96]: Some full disjunctions cannot be formulated as expressions of outerjoins (i.e., with arbitrary placement of parentheses)
  • 42. About the Algorithm
    • Unlike NLOJ , the next algorithm, PDelayFD , is applicable to all schemes (and not just trees)
    • Algorithm PDelayFD has a polynomial delay , but the delay is larger than that of NLOJ
    • Nevertheless, PDelayFD by itself is a significant improvement over the state-of-art
  • 43. Shifting a Maximal JCC Tuple Set
    • t-shifting a maximal JCC tuple set T, for a tuple t:
      • 1. Add t to T
      • 2. Extract the max. JCC subset containing t
      • 3. Extend to a maximal JCC set
    • The result is the t-shift of T
  • 44. Algorithm PDelayFD
    • 1. Generate a max. JCC set T0
    • 2. Insert T0 into Q
    • Repeat until Q is empty:
      • 1. Move some T from Q to C
      • 2. Print the join of T, padded with nulls (output)
      • 3. Insert into Q a t-shift of T for all tuples t in the database (validate that the t-shift is not already in Q or C)
    • Theorem: PDelayFD(R) computes FD(R) with polynomial delay
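The loop of slide 44 and the t-shift of slide 43 can be sketched as a generator. This follows the steps as listed on the slides, but the details are assumptions: the extension to a maximal JCC set is done greedily, a single set `seen` stands for Q and C together, and nothing is tuned for the delay bound of the theorem. The input is the example R of slides 14-19, with `None` for null.

```python
from collections import deque

tuples = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3},
    {"Country": "Canada", "City": "London", "Site": "Air Show"},
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},
    {"Country": "UK", "City": "London", "Site": "Buckingham"},
]
ATTRS = ["Country", "Site", "City", "Climate", "Hotel", "Stars"]

def consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a] for a in t1.keys() & t2.keys())

def component(idx, start):
    """Positions in idx reachable from start in the join graph."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in idx:
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                stack.append(j)
    return seen

def is_jcc(idx):
    idx = set(idx)
    return component(idx, next(iter(idx))) == idx and all(
        consistent(tuples[i], tuples[j]) for i in idx for j in idx if i < j)

def extend(idx):
    """Greedily add tuples while the set stays JCC; the result is maximal."""
    s, grown = set(idx), True
    while grown:
        grown = False
        for i in range(len(tuples)):
            if i not in s and is_jcc(s | {i}):
                s.add(i)
                grown = True
    return frozenset(s)

def t_shift(T, t):
    # Steps 1-2: add t, keep what is consistent with t and connected to it.
    kept = {i for i in T if consistent(tuples[i], tuples[t])} | {t}
    # Step 3: extend to a maximal JCC set.
    return extend(component(kept, t))

def pad(idx):
    row = dict.fromkeys(ATTRS)
    for i in idx:
        row.update(tuples[i])
    return row

def pdelay_fd():
    first = extend({0})
    queue, seen = deque([first]), {first}   # seen = everything in Q or C
    while queue:
        T = queue.popleft()
        yield pad(T)
        for t in range(len(tuples)):
            S = t_shift(T, t)
            if S not in seen:
                seen.add(S)
                queue.append(S)

for row in pdelay_fd():
    print(row)
```

On the example input the generator yields the four tuples of FD(R), starting with the one for the Plaza hotel in Toronto.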
  • 46. NLOJ vs. PDelayFD
    • NLOJ, compared with PDelayFD:
      • Shorter delays
      • Less space
      • Simpler to impl.
    • Our approach: divide and conquer
    • (Figure: a scheme graph over the relations R1, …, R10, shown three times.)
  • 47. Biconnected Components
    • Biconnected component: a maximal subset B of relations, s.t. the scheme graph has two (or more) disjoint paths between every two relations of B
    • (Figure: two example scheme graphs, one over R1, …, R8 and one over R1, …, R9.)
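Biconnected components of a scheme graph can be computed with the standard depth-first-search procedure that keeps a stack of edges. The sketch below assumes that textbook method (the slides do not say which procedure the authors use) on a hypothetical scheme graph given as adjacency lists. As usual for this method, two relations joined by a bridge are reported as a two-node component, and isolated relations produce no component.

```python
def biconnected_components(graph):
    """graph: dict node -> list of neighbours (undirected, no parallel edges)."""
    index, low, comps, edge_stack = {}, {}, [], []
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        index[u] = low[u] = counter
        counter += 1
        for v in graph[u]:
            if v not in index:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # u separates the subtree of v: pop one whole component.
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp |= {a, b}
                        if (a, b) == (u, v):
                            break
                    comps.append(comp)
            elif v != parent and index[v] < index[u]:
                edge_stack.append((u, v))  # back edge to an ancestor
                low[u] = min(low[u], index[v])

    for u in graph:
        if u not in index:
            dfs(u, None)
    return comps

# Hypothetical scheme graph: two triangles connected by the bridge R3-R4.
graph = {
    "R1": ["R2", "R3"], "R2": ["R1", "R3"], "R3": ["R1", "R2", "R4"],
    "R4": ["R3", "R5", "R6"], "R5": ["R4", "R6"], "R6": ["R4", "R5"],
}
for comp in biconnected_components(graph):
    print(sorted(comp))
```

The recursion depth equals the length of the longest DFS path, which is not a concern for scheme graphs with a handful of relations.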
  • 48. Left-Deep Sequence of Outerjoins
    • R: a set of relations
    • Theorem: There exists an (efficiently computable) order B1, …, Bk of the biconnected components of R, s.t. FD(R) = (…(FD(B1) ⟗ FD(B2)) ⟗ …) ⟗ FD(Bk)
    • Optimized Algorithm:
      • 1. Compute the biconnected components of R
      • 2. Compute the full disjunction of each component
      • 3. Apply outerjoins in a suitable order
  • 49. BiComNLOJ: a Naïve Attempt
    • 1. Divide R into biconnected components B1, …, Bk in a suitable order
    • 2. Compute FD(B1), …, FD(Bk) using PDelayFD
    • 3. Using NLOJ, compute (…(FD(B1) ⟗ FD(B2)) ⟗ …) ⟗ FD(Bk)
    • Problem: each FD(Bi) can be exponential in the input, so the delay is non-polynomial!
    • Solution: iterators
  • 50. Retaining Polynomial Delay: 1st Problem
    • For simplification, assume only two components, B1 and B2
    • After generating a tuple t of FD(B1), we need to generate all tuples of FD(B2) that can join t
    • Non-polynomial delay if all of FD(B2) is computed for finding these tuples!
    • Solution: PDelayFD can be modified so that it generates only those tuples of FD(B2) that can join t
    • Details in the proceedings…
    • (Figure: a scheme graph over R1, …, R8, divided into the components B1 and B2.)
  • 51. Retaining Polynomial Delay: 2nd Problem
    • For simplification, assume only two components, B1 and B2
    • The last step is to generate all tuples of FD(B2) that cannot be joined with tuples of FD(B1)
    • However, this task is by itself NP-hard!
    • Solution: When generating all tuples of FD(B2) that can be joined with some tuple of FD(B1), we collect enough information for generating the remaining tuples of FD(B2)
    • Details in the proceedings…
    • (Figure: a scheme graph over R1, …, R8, divided into the components B1 and B2.)
  • 53. Experimental Setting
    • Algorithms: PDelayFD, BiComNLOJ (main), IncrementalFD (CS05, state-of-art)
    • Implementation: PostgreSQL (open source)
    • HW: Pentium 4, 1.6 GHz, 512 MB RAM
    • Synthetic data (randomly generated)
    • Fixed schemes
    • (Figure: the scheme graphs of Scheme S1, Scheme S2, and Scheme S3, each over the relations R1, …, R10.)
  • 54. State-of-Art vs. Main Algorithm
    • (Plots for Scheme 1, Scheme 2, and Scheme 3: average delay (msec) against the number of tuples in each relation, for IncrementalFD (state of art, CS05) and BiComNLOJ (our main algorithm).)
    • BiComNLOJ is a substantial improvement over the state-of-art
  • 55. Division into Biconnected Components
    • (Plots for Scheme 1, Scheme 2, and Scheme 3: average delay (msec) against the number of tuples in each relation, for PDelayFD (no division to b.c.c.) and BiComNLOJ (our main algorithm).)
    • Division reduces delays (amount depends on the scheme)
  • 56. Behavior of Delay
    • Measure the delay before each generated tuple
    • (Plot: delay (msec) against tuple number, for IncrementalFD (state of art, CS05) and BiComNLOJ (our main algorithm).)
    • While IncrementalFD has a slowdown, the delay of BiComNLOJ remains almost constant
  • 58. Summary
    • Full Disjunction: An associative extension of the outerjoin operator to an arbitrary number of relations
    • 3 Algorithms for computing FD:
      • NLOJ (Nested-Loop Outerjoin): tree-structured schemes
      • PDelayFD (Polynomial-Delay Full Disjunction): general schemes
      • BiComNLOJ: combines the first 2 and deploys division into biconnected components; general schemes
  • 59. Contributions
    • Substantial improvement of evaluation time over the state-of-art
      • Proved theoretically and experimentally
    • Full disjunctions can be computed with polynomial delay and in time linear in the output size
    • Optimization techniques for computing FDs
    • Implementation within PostgreSQL ( ongoing …)
    • Incorporating our algorithms into an SQL optimizer
      • E.g., some operators can be pushed through the FD
      • Not discussed here, appears in the proceedings…
  • 60. Thank you. Questions?