Full Disjunction

  1. Full Disjunctions: Polynomial-Delay Iterators in Action. VLDB 2006, Seoul, Korea. Sara Cohen (Technion, Israel), Yaron Kanza (University of Toronto, Canada), Benny Kimelfeld (Hebrew University, Israel), Yehoshua Sagiv (Hebrew University, Israel), Itzhak Fadida (Technion, Israel)
  2. Computing Full Disjunctions
     - The full disjunction is a relational operator that maximally combines data from several relations.
       - It extends the natural join by allowing incompleteness.
       - It extends the binary outerjoin to many relations.
     - This paper presents algorithms and optimizations for computing full disjunctions.
       - Theoretically, full disjunctions are more tractable than previously known.
       - Practically, the algorithms are a significant improvement over the state of the art and support an iterator-like evaluation.
  3. Contents
     - Full Disjunctions
       - Complexity
     - Contributions
     - Algorithms
       - Algorithm NLOJ for Tree-Structured Schemes
       - Algorithm PDelayFD for General Schemes
       - Algorithm BiComNLOJ (Main Algorithm)
     - Experimental Results
     - Conclusion
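  5. The Natural Join Operator: Climates ⋈ Accommodations ⋈ Sites

     Climates
     | Country | Climate |
     |---|---|
     | Canada | diverse |
     | Bahamas | tropical |
     | UK | temperate |

     Accommodations
     | Country | City | Hotel | Stars |
     |---|---|---|---|
     | Canada | Toronto | Plaza | 4 |
     | Bahamas | Nassau | Hilton | |
     | Canada | London | Ramada | 3 |

     Sites
     | Country | City | Site |
     |---|---|---|
     | Canada | London | Air Show |
     | UK | London | Hyde Park |
     | Canada | | Mouth Logan |
     | UK | London | Buckingham |

     Climates ⋈ Accommodations ⋈ Sites
     | Country | Climate | City | Hotel | Stars | Site |
     |---|---|---|---|---|---|
     | Canada | diverse | London | Ramada | 3 | Air Show |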
  6. The Natural Join Misses Information: Bahamas is not in Sites, so the natural join Climates ⋈ Accommodations ⋈ Sites misses it. The only result tuple is (Country=Canada, Climate=diverse, City=London, Hotel=Ramada, Stars=3, Site=Air Show).
  7. The Natural Join Misses Information (continued): Mouth Logan is not in a city, hence it is missed as well. In the tables, an empty space means a null value.
  8. The Natural Join Misses Information (continued): a looser notion of join is needed, one that enables joining tuples from only some of the tables.
  9. The Natural Join Operator: a tuple of Climates ⋈ Accommodations ⋈ Sites corresponds to a set of tuples from the source relations that is
     - join consistent,
     - connected (no Cartesian product), and
     - complete (one tuple from each relation).
  10. Join-Consistent Sets of Tuples
      - Two tuples t1 and t2 are join-consistent if, for every common attribute A, (1) t1[A] and t2[A] are non-null and (2) t1[A] = t2[A].
      - A set T of tuples is join-consistent if every two tuples of T are join-consistent.
      - Example: the Accommodations tuple (Country=Canada, City=London, Hotel=Ramada) and the Sites tuple (Country=Canada, City=London, Site=Air Show) agree on their common attributes Country and City.
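
The definition on slide 10 can be written down almost literally. The following is a minimal sketch, assuming tuples are represented as Python dicts and nulls as `None`; the function names are illustrative and do not come from the slides.

```python
from itertools import combinations

def join_consistent(t1, t2):
    """True if t1 and t2 are non-null and equal on every common attribute."""
    common = t1.keys() & t2.keys()
    return all(
        t1[a] is not None and t2[a] is not None and t1[a] == t2[a]
        for a in common
    )

def set_join_consistent(tuples):
    """A set of tuples is join-consistent if every two of its tuples are."""
    return all(join_consistent(a, b) for a, b in combinations(tuples, 2))

# Tuples taken from the running example of the slides.
ramada = {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3}
air_show = {"Country": "Canada", "City": "London", "Site": "Air Show"}
mouth_logan = {"Country": "Canada", "City": None, "Site": "Mouth Logan"}
```

Under this representation, `ramada` and `air_show` are join-consistent, while `ramada` and `mouth_logan` are not, because the common attribute City is null in `mouth_logan`.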
  11. Connected Sets of Tuples
      - The join graph of a set T of tuples: the nodes are the tuples of T, and there is an edge between every two tuples with a common attribute.
      - A set of tuples is connected if its join graph is connected.
      - Example: (Country=Canada, Climate=diverse), (Country=UK, City=London, Site=Buckingham) and (City=Toronto, Hotel=Plaza, Stars=4) form a connected set: the first two share the attribute Country and the last two share the attribute City.
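
Connectivity of the join graph can be checked with a plain graph traversal. This is a small sketch under the same assumed representation (tuples as dicts); note that an edge depends only on shared attribute names, not on the values. Treating the empty set as connected is a convention chosen here, not something the slides state.

```python
def is_connected(tuples):
    """True if the join graph of the given list of tuples is connected."""
    if not tuples:
        return True  # convention assumed for this sketch
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(tuples)):
            # Edge between two tuples that have at least one common attribute.
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)

# The three tuples of the example on slide 11.
climate = {"Country": "Canada", "Climate": "diverse"}
site = {"Country": "UK", "City": "London", "Site": "Buckingham"}
hotel = {"City": "Toronto", "Hotel": "Plaza", "Stars": 4}
```

Here `[climate, site, hotel]` is connected through `site`, whereas `[climate, hotel]` alone is not, since the two tuples have no attribute in common.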
  12. Natural Join (w/o Cartesian Product): each tuple of the result corresponds to a set T of tuples from the source relations such that
      1. T is join consistent,
      2. T is connected (no Cartesian product),
      3. T is complete (one tuple from each relation).
      A set that satisfies conditions 1 and 2 is called JCC (join consistent and connected).
  13. Full Disjunction (Galindo-Legaria 1994): each tuple of the result corresponds to a set T of tuples from the source relations such that
      1. T is join consistent,
      2. T is connected (no Cartesian product),
      3. T is maximal (not properly contained in any JCC set).
      Condition 3 replaces the completeness requirement of the natural join (one tuple from each relation).
  14. An Example of a Full Disjunction (the result FD(R) is built up one tuple at a time over slides 14 to 19)

      Climates
      | Country | Climate |
      |---|---|
      | Canada | diverse |
      | UK | temperate |

      Accommodations
      | Country | City | Hotel | Stars |
      |---|---|---|---|
      | Canada | Toronto | Plaza | 4 |
      | Canada | London | Ramada | 3 |

      Sites
      | Country | City | Site |
      |---|---|---|
      | Canada | London | Air Show |
      | Canada | | Mouth Logan |
      | UK | London | Buckingham |

      FD(R)
      | Country | Climate | City | Hotel | Stars | Site |
      |---|---|---|---|---|---|
      | Canada | diverse | Toronto | Plaza | 4 | |
      | Canada | diverse | London | Ramada | 3 | Air Show |
      | Canada | diverse | | | | Mouth Logan |
      | UK | temperate | London | | | Buckingham |
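
The example above can be reproduced by brute force directly from the definition: enumerate all JCC sets, keep the maximal ones, and pad their joins with nulls. The sketch below is only meant to make the definition concrete on tiny inputs; it takes exponential time and is not one of the algorithms presented in the talk. The dict representation and all function names are assumptions made for illustration.

```python
from itertools import combinations

CLIMATES = [{"Country": "Canada", "Climate": "diverse"},
            {"Country": "UK", "Climate": "temperate"}]
ACCOMMODATIONS = [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3}]
SITES = [{"Country": "Canada", "City": "London", "Site": "Air Show"},
         {"Country": "Canada", "City": None, "Site": "Mouth Logan"},
         {"Country": "UK", "City": "London", "Site": "Buckingham"}]

def consistent(t1, t2):
    # Common attributes must be non-null and equal.
    return all(t1[a] is not None and t1[a] == t2[a]
               for a in t1.keys() & t2.keys())

def connected(ts):
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(ts)):
            if j not in seen and ts[i].keys() & ts[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(ts)

def full_disjunction(relations):
    """Brute-force FD: joins of all maximal JCC sets, padded with nulls."""
    tagged = [(r, i) for r, rel in enumerate(relations) for i in range(len(rel))]
    jcc = []
    # Distinct tuples of one relation are never join-consistent, so a JCC
    # set holds at most one tuple per relation.
    for k in range(1, len(relations) + 1):
        for combo in combinations(tagged, k):
            ts = [relations[r][i] for r, i in combo]
            if all(consistent(a, b) for a, b in combinations(ts, 2)) and connected(ts):
                jcc.append(frozenset(combo))
    maximal = [s for s in jcc if not any(s < other for other in jcc)]
    attrs = sorted({a for rel in relations for t in rel for a in t})
    result = []
    for s in maximal:
        row = dict.fromkeys(attrs)  # every attribute starts as null
        for r, i in s:
            row.update({a: v for a, v in relations[r][i].items() if v is not None})
        result.append(row)
    return result

fd = full_disjunction([CLIMATES, ACCOMMODATIONS, SITES])
```

On the three relations of the example, `fd` contains the four rows of FD(R), with `None` in place of the empty cells.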
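  20. Padding Joined Tuple Sets with Nulls: the tuple set {(Country=Canada, City=null, Site=Mouth Logan), (Country=Canada, Climate=diverse)} is joined and padded with nulls, which gives the result tuple (Country=Canada, Climate=diverse, City=null, Hotel=null, Stars=null, Site=Mouth Logan).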
  21. The Outerjoin Operator: the outerjoin R1 ⟗ R2 of two relations R1 and R2 consists of the natural join R1 ⋈ R2 and, in addition, all dangling tuples padded with nulls.
  22. Example of an Outerjoin

      Climates
      | Country | Climate |
      |---|---|
      | Canada | diverse |
      | Bahamas | tropical |
      | UK | temperate |

      Accommodations
      | Country | City | Hotel | Stars |
      |---|---|---|---|
      | Canada | Toronto | Plaza | 4 |
      | Bahamas | Nassau | Hilton | |
      | France | Paris | Atala | 4 |

      Climates ⟗ Accommodations
      | Country | Climate | City | Hotel | Stars |
      |---|---|---|---|---|
      | Canada | diverse | Toronto | Plaza | 4 |
      | Bahamas | tropical | Nassau | Hilton | |
      | UK | temperate | | | |
      | France | | Paris | Atala | 4 |
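
The outerjoin of this example can be sketched with a nested loop: emit every joining pair, then the dangling tuples of each side padded with nulls. This is an illustrative sketch, assuming tuples are dicts with `None` for null; `full_outerjoin` is a name chosen here.

```python
def consistent(t1, t2):
    # Common attributes must be non-null and equal.
    return all(t1[a] is not None and t1[a] == t2[a]
               for a in t1.keys() & t2.keys())

def full_outerjoin(r1, r2):
    """Natural join of r1 and r2 plus all dangling tuples padded with nulls."""
    attrs = set().union(*r1, *r2)
    pad = lambda t: {**dict.fromkeys(attrs), **t}
    out, hit1, hit2 = [], set(), set()
    for i, t1 in enumerate(r1):
        for j, t2 in enumerate(r2):
            if consistent(t1, t2):
                out.append(pad({**t1, **t2}))
                hit1.add(i)
                hit2.add(j)
    # Dangling tuples: those that joined with nothing on the other side.
    out += [pad(t) for i, t in enumerate(r1) if i not in hit1]
    out += [pad(t) for j, t in enumerate(r2) if j not in hit2]
    return out

climates = [{"Country": "Canada", "Climate": "diverse"},
            {"Country": "Bahamas", "Climate": "tropical"},
            {"Country": "UK", "Climate": "temperate"}]
accommodations = [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton", "Stars": None},
    {"Country": "France", "City": "Paris", "Hotel": "Atala", "Stars": 4}]

result = full_outerjoin(climates, accommodations)
```

The result has four rows: the two joined rows for Canada and Bahamas, followed by the dangling UK and France tuples padded with nulls.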
  23. Combining Relations using Outerjoins
      - The outerjoin operator is not associative.
      - For more than two relations, the result depends on the order in which the outerjoins are applied.
      - In general, outerjoins cannot maximally combine relations (no matter what order is used).
      - Conclusion: the outerjoin is not suitable for combining more than two relations.
  25. Efficiency of Evaluation
      - The full-disjunction operator (like other operators, such as the Cartesian product or the natural join) can generate a number of tuples that is exponential in the input size.
      - Hence, running time that is polynomial in the input alone is not a suitable yardstick.
      - The usual notion: polynomial time in the combined size of the input and the output.
  26. History of Algorithms for Full Disjunctions (n: number of relations; N: number of tuples in the DB; F: number of tuples in the FD)

      | Source | Time | Databases |
      |---|---|---|
      | RU96 | $O(n + F^2)$ | $\gamma$-acyclic |
      | KS03 | $O(n^5 N^2 F^2)$ | general |
      | CS05 | $O(n^3 N F^2)$, "incremental polynomial" | general |

      - F is typically very large; it can be exponential in the size of the database.
      - This paper: linear dependence on F.
  27. Polynomial Delay
      - One way to obtain an evaluation with a running time linear in the output is to devise an algorithm that acts as an iterator with an efficient next() operator, that is, an enumeration algorithm that runs with polynomial delay.
      - An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input.
  28. Other Benefits of Polynomial Delay
      - Incremental evaluation
        - First tuples are generated quickly: full disjunctions are large, yet the user need not wait for the whole result to be generated.
        - Suitable for Web applications, where users expect to get the first few pages quickly. In addition, the user can decide at any time that enough information has been shown.
      - Enables parallel query processing
        - While one processor generates the FD tuples, other processors apply further processing.
  30. Main Contributions
      1. First algorithm for computing full disjunctions with polynomial delay
      2. First algorithm for computing full disjunctions in time linear in the output
      3. A general optimization technique for computing full disjunctions: division into biconnected components
      Substantial improvement over the state of the art is proved theoretically and experimentally.
  32. Our Algorithms
      - Algorithm NLOJ: tree schemes
      - Algorithm PDelayFD: general schemes
      - Optimization: division into biconnected components
      - Combining the above gives Algorithm BiComNLOJ, the main algorithm, for general schemes
  34. Tree Schemes: scheme graphs without cycles. In the scheme graph, the relation schemes are the nodes and there is an edge between every two schemes with one or more common attributes. (Figure: a tree-shaped scheme graph over R1, …, R7.)
  35. Left-Deep Sequence of Outerjoins
      - Proposition: let R be a set of relations with a tree scheme and let R1, …, Rn be a connected-prefix order of R. Then FD(R) = (…((R1 ⟗ R2) ⟗ R3) …) ⟗ Rn.
      - Algorithm NLOJ (Nested Loop Outer Join):
        1. Compute a connected-prefix order of R.
        2. Apply outerjoins in a left-deep order.
  36. Connected-Prefix Order of Relations: an order of the relations in which each prefix forms a (connected) subtree of the scheme graph. (Figure: the tree over R1, …, R7, shown next to the sequence R1, R3, R2, R7, R4, R5, R6.)
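
One simple way to obtain a connected-prefix order is a breadth-first traversal of the scheme graph, since every relation visited by the traversal is adjacent to one visited earlier. The slides do not say how the order is computed, so the following is only a sketch under that assumption; the schemas and names are made up for illustration.

```python
from collections import deque

def scheme_graph(schemas):
    """schemas: dict name -> set of attributes; edge iff a shared attribute."""
    return {r: [s for s in schemas if s != r and schemas[r] & schemas[s]]
            for r in schemas}

def connected_prefix_order(schemas, root):
    """Breadth-first order from root; every prefix of it is connected."""
    graph = scheme_graph(schemas)
    order, seen, queue = [], {root}, deque([root])
    while queue:
        r = queue.popleft()
        order.append(r)
        for s in graph[r]:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return order

# A hypothetical tree scheme: R2 is adjacent to R1, R3 and R4.
schemas = {"R1": {"A", "B"}, "R2": {"B", "C", "F"},
           "R3": {"C", "D"}, "R4": {"F", "G"}}
order = connected_prefix_order(schemas, "R3")
```

For this input the order is R3, R2, R1, R4, and each relation after the first shares an attribute with some relation before it.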
  37. Achieving Polynomial Delay
      - Algorithm NLOJ (Nested Loop Outer Join):
        1. Compute a connected-prefix order of R.
        2. Apply outerjoins in a left-deep order over R1, R2, R3, …, Rn-1, Rn.
      - Problem: exponential delay, because an intermediate result of the sequence can already have exponential size.
      - Solution: use iterators.
  38. Iterators: to obtain polynomial delay, we use iterators. An iterator
      - operates on top of an enumeration algorithm, and
      - implements next() by controlling the execution of that algorithm.
  39. Using Iterators for Outerjoins (Figure: a left-deep chain of iterators, Iterator 1 through Iterator n, over the relations R1, …, Rn.)
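
Python generators behave like the iterators described on slides 38 and 39: each call to `next()` resumes the underlying enumeration just long enough to produce one more tuple. The sketch below chains one lazy outerjoin stage per relation. It is an illustration of the idea under assumed representations (tuples as dicts, `None` for null, relations already given in a connected-prefix order of a tree scheme), not the authors' implementation of NLOJ.

```python
def consistent(t1, t2):
    return all(t1[a] is not None and t1[a] == t2[a]
               for a in t1.keys() & t2.keys())

def outerjoin_iter(left, left_attrs, right):
    """Lazily outerjoin a stream of tuples (left) with a stored relation."""
    right_attrs = set().union(*right)
    matched = set()
    for t in left:
        hit = False
        for j, u in enumerate(right):
            if consistent(t, u):
                hit = True
                matched.add(j)
                yield {**dict.fromkeys(right_attrs), **t, **u}
        if not hit:
            yield {**dict.fromkeys(right_attrs), **t}  # dangling left tuple
    for j, u in enumerate(right):
        if j not in matched:
            yield {**dict.fromkeys(left_attrs), **u}   # dangling right tuple

def nloj(relations):
    """Left-deep chain of lazy outerjoins over relations R1, ..., Rn."""
    stream, attrs = iter(relations[0]), set().union(*relations[0])
    for rel in relations[1:]:
        stream = outerjoin_iter(stream, attrs, rel)
        attrs = attrs | set().union(*rel)
    return stream

# Hypothetical path-shaped (hence tree) scheme: R1(A,B), R2(B,C), R3(C,D).
R1 = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]
R2 = [{"B": 1, "C": 1}, {"B": 3, "C": 3}]
R3 = [{"C": 1, "D": 1}, {"C": 3, "D": 3}, {"C": 5, "D": 5}]

gen = nloj([R1, R2, R3])
first = next(gen)   # produced without materializing any intermediate result
rest = list(gen)
```

The first tuple is available after a single pass over R2 and R3 for one tuple of R1; no intermediate outerjoin is ever stored in full. Each stage pads only with the attributes seen so far, so that nulls from padding are compared only on attributes the prefix actually has.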
  40. Outerjoins are not Always Applicable
      - It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins.
      - Rajaraman and Ullman [PODS 96]: some full disjunctions cannot be formulated as expressions of outerjoins at all (i.e., even with arbitrary placement of parentheses).
  42. About the Algorithm
      - Unlike NLOJ, the next algorithm, PDelayFD, is applicable to all schemes (and not just trees).
      - Algorithm PDelayFD has polynomial delay, but the delay is larger than that of NLOJ.
      - Nevertheless, PDelayFD by itself is a significant improvement over the state of the art.
  43. Shifting a Maximal JCC Tuple Set: for a maximal JCC set T and a tuple t, the t-shift of T is obtained as follows:
      1. Add t to T.
      2. Extract the maximal JCC subset containing t.
      3. Extend it to a maximal JCC set.
  44. Algorithm PDelayFD (Q and C are two collections of tuple sets)
      1. Generate a maximal JCC set T0.
      2. Insert T0 into Q.
      3. Repeat until Q is empty:
         1. Move some T from Q to C.
         2. Print the join of T, padded with nulls.
         3. For all tuples t in the database, insert into Q a t-shift of T, after validating that the t-shift is not already in Q or C.
      Theorem: PDelayFD(R) computes FD(R) with polynomial delay.
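
The steps of slides 43 and 44 can be sketched as a generator. The slides leave open how a maximal JCC subset is extracted and how a set is extended; the greedy `extend` and the component search in `shift` below are assumptions made for this illustration, as are all names and the flat list `db` that holds the tuples of all relations as dicts. Tuple sets are represented by frozensets of indices into `db`.

```python
from collections import deque

def consistent(t, u):
    return all(t[a] is not None and t[a] == u[a] for a in t.keys() & u.keys())

def adjacent(t, u):
    return bool(t.keys() & u.keys())

def extend(T, db):
    """Greedily add tuples while the set stays join consistent and connected."""
    T, grown = set(T), True
    while grown:
        grown = False
        for i, t in enumerate(db):
            if (i not in T and all(consistent(t, db[j]) for j in T)
                    and any(adjacent(t, db[j]) for j in T)):
                T.add(i)
                grown = True
    return frozenset(T)

def shift(T, t, db):
    """t-shift of T: add t, keep the JCC part connected to t, then extend."""
    keep = {j for j in T if j != t and consistent(db[t], db[j])}
    comp, stack = {t}, [t]
    while stack:
        i = stack.pop()
        for j in keep - comp:
            if adjacent(db[i], db[j]):
                comp.add(j)
                stack.append(j)
    return extend(comp, db)

def pdelay_fd(db):
    """Yield the maximal JCC sets of db one by one."""
    if not db:
        return
    first = extend({0}, db)
    queue, seen = deque([first]), {first}   # seen plays the role of Q and C
    while queue:
        T = queue.popleft()
        yield T                              # "print" step of the slide
        for t in range(len(db)):
            S = shift(T, t, db)
            if S not in seen:
                seen.add(S)
                queue.append(S)

# The tuples of the example on slide 14, indexed 0..6.
DB = [{"Country": "Canada", "Climate": "diverse"},
      {"Country": "UK", "Climate": "temperate"},
      {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
      {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3},
      {"Country": "Canada", "City": "London", "Site": "Air Show"},
      {"Country": "Canada", "City": None, "Site": "Mouth Logan"},
      {"Country": "UK", "City": "London", "Site": "Buckingham"}]

answers = set(pdelay_fd(DB))
```

On the example, the generator produces the four maximal JCC sets behind the four rows of FD(R). Between two answers the sketch performs one shift per database tuple, each of which takes polynomial time.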
  46. NLOJ vs. PDelayFD
      - NLOJ: shorter delays, less space, simpler to implement.
      - NLOJ or PDelayFD? Our approach: divide and conquer.
      (Figure: a scheme graph over R1, …, R10, shown three times.)
  47. Biconnected Components: a biconnected component is a maximal subset B of relations such that the scheme graph has two (or more) disjoint paths between every two relations of B. (Figure: a scheme graph over R1, …, R9 and its division into biconnected components.)
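
Biconnected components of a scheme graph can be found with the standard depth-first-search algorithm that keeps a stack of edges. The sketch below uses that textbook technique on an assumed adjacency-dict representation; the slides do not say which algorithm the authors use. In this standard formulation an edge that lies on no cycle forms a two-node component of its own, and isolated nodes produce no component.

```python
def biconnected_components(graph):
    """graph: dict node -> list of neighbours (undirected, no multi-edges)."""
    index, low, comps, edges = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for v in graph[u]:
            if v == parent:
                continue
            if v not in index:
                edges.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # u separates the subtree of v: pop one component.
                    comp = set()
                    while True:
                        a, b = edges.pop()
                        comp |= {a, b}
                        if (a, b) == (u, v):
                            break
                    comps.append(comp)
            elif index[v] < index[u]:
                edges.append((u, v))          # back edge to an ancestor
                low[u] = min(low[u], index[v])

    for u in graph:
        if u not in index:
            dfs(u, None)
    return comps

# Hypothetical scheme graph: two triangles joined by the edge R3-R4.
graph = {"R1": ["R2", "R3"], "R2": ["R1", "R3"], "R3": ["R1", "R2", "R4"],
         "R4": ["R3", "R5", "R6"], "R5": ["R4", "R6"], "R6": ["R4", "R5"]}
components = biconnected_components(graph)
```

For this graph the components are {R1, R2, R3}, {R3, R4} and {R4, R5, R6}. The recursion is fine for scheme graphs, which have one node per relation.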
  48. Left-Deep Sequence of Outerjoins
      - Theorem: let R be a set of relations. There exists an (efficiently computable) order B1, …, Bk of the biconnected components of R such that FD(R) = (…(FD(B1) ⟗ FD(B2)) ⟗ …) ⟗ FD(Bk).
      - Optimized algorithm:
        1. Compute the biconnected components of R.
        2. Compute the full disjunction of each component.
        3. Apply outerjoins in a suitable order.
  49. BiComNLOJ: a Naïve Attempt
      1. Divide R into biconnected components B1, …, Bk, in a suitable order.
      2. Compute FD(B1), …, FD(Bk) using PDelayFD.
      3. Using NLOJ, compute (…(FD(B1) ⟗ FD(B2)) ⟗ …) ⟗ FD(Bk).
      Problem: each FD(Bi) can be exponential in the input, which gives non-polynomial delay. Solution: use an iterator for each step.
  50. Retaining Polynomial Delay: 1st Problem (for simplification, assume only two components, B1 and B2)
      - After generating a tuple t of FD(B1), we need to generate all tuples of FD(B2) that can join t.
      - The delay is non-polynomial if all of FD(B2) is computed for finding these tuples.
      - Solution: PDelayFD can be modified so that it generates only those tuples of FD(B2) that can join t. Details are in the proceedings.
      (Figure: a scheme graph over R1, …, R8 divided into the components B1 and B2.)
  51. Retaining Polynomial Delay: 2nd Problem (for simplification, assume only two components, B1 and B2)
      - The last step is to generate all tuples of FD(B2) that cannot be joined with tuples of FD(B1).
      - However, this task is by itself NP-hard.
      - Solution: when generating all tuples of FD(B2) that can be joined with some tuple of FD(B1), we collect enough information for generating the remaining tuples of FD(B2). Details are in the proceedings.
  53. Experimental Setting
      - Algorithms: PDelayFD, BiComNLOJ (main), IncrementalFD (CS05, state of the art)
      - Implementation: PostgreSQL (open source)
      - Hardware: Pentium 4, 1.6 GHz, 512 MB RAM
      - Data: synthetic (randomly generated)
      - Fixed schemes: S1, S2 and S3, each over the relations R1, …, R10 (shown as scheme graphs on the slide)
  54. State of the Art vs. Main Algorithm (Figure: plots for Schemes 1, 2 and 3; x-axis: number of tuples in each relation; y-axis: average delay in msec; curves: IncrementalFD (state of the art, CS05) and BiComNLOJ (our main algorithm).) BiComNLOJ is a substantial improvement over the state of the art.
  55. Division into Biconnected Components (Figure: plots for Schemes 1, 2 and 3; x-axis: number of tuples in each relation; y-axis: average delay in msec; curves: PDelayFD (no division into biconnected components) and BiComNLOJ (our main algorithm).) Division reduces delays; the amount depends on the scheme.
  56. Behavior of Delay (Figure: the delay in msec before each generated tuple, plotted against the tuple number, for IncrementalFD (state of the art, CS05) and BiComNLOJ (our main algorithm).) While IncrementalFD slows down, the delay of BiComNLOJ remains almost constant.
  58. Summary
      - Full Disjunction: an associative extension of the outerjoin operator to an arbitrary number of relations
      - 3 algorithms for computing FD:
        - NLOJ (Nested-Loop Outerjoin): tree-structured schemes
        - PDelayFD (Polynomial-Delay Full Disjunction): general schemes
        - BiComNLOJ: combines the first two and deploys division into biconnected components; general schemes
  59. Contributions
      - Substantial improvement of evaluation time over the state of the art
        - Proved theoretically and experimentally
      - Full disjunctions can be computed with polynomial delay and in time linear in the output size
      - Optimization techniques for computing FDs
      - Implementation within PostgreSQL (ongoing …)
      - Incorporating our algorithms into an SQL optimizer
        - E.g., some operators can be pushed through the FD
        - Not discussed here; appears in the proceedings
  60. Thank you. Questions?
