
# Computing FDs


1. Computing Full Disjunctions
   - Yaron Kanza, Yehoshua Sagiv
   - The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem
2. A Formal Definition of Full Disjunction
3. Preliminary Notation
   - Given a set of relations $r_1, \dots, r_n$ with schemes $R_1, \dots, R_n$, respectively:
     - we denote by $t_{ij}$ the $j$-th tuple of $r_i$
     - for $X \subseteq R_i$, we denote by $t_{ij}[X]$ the projection of $t_{ij}$ on $X$
   - Next, we give some preliminary definitions
4. Scheme Graph
   - Two distinct schemes $R_i$ and $R_j$ are connected if $R_i \cap R_j$ is non-empty
   - The scheme graph of $R_1, \dots, R_n$ consists of
     - a node for each scheme $R_i$
     - an edge between $R_i$ and $R_j$ if $R_i$ and $R_j$ are connected
   - (Figure: a scheme graph over the schemes Movies, Actors, Actors-that-Directed and Acted-in.)
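
The definition of the scheme graph translates directly into code. The sketch below is a minimal illustration, assuming each scheme is modelled as a frozenset of attribute names keyed by the relation name; the attribute names (title, year, actor, birth) are hypothetical, since the slides name only the relations. It needs Python 3.9 or later for the built-in generic type hints.

```python
from itertools import combinations


def scheme_graph(schemes: dict[str, frozenset[str]]) -> set[frozenset[str]]:
    """Return the scheme-graph edges as unordered pairs of scheme names.

    Two distinct schemes are connected (joined by an edge) iff they share
    at least one attribute, i.e. R_i ∩ R_j is non-empty.
    """
    edges: set[frozenset[str]] = set()
    for (name_i, attrs_i), (name_j, attrs_j) in combinations(schemes.items(), 2):
        if attrs_i & attrs_j:  # non-empty intersection, so the schemes are connected
            edges.add(frozenset((name_i, name_j)))
    return edges


# Hypothetical attribute sets for three of the relations on the slide.
schemes = {
    "Movies": frozenset({"title", "year"}),
    "Acted-in": frozenset({"actor", "title"}),
    "Actors": frozenset({"actor", "birth"}),
}
edges = scheme_graph(schemes)
```

With these attribute sets, Acted-in is connected to both Movies and Actors, while Movies and Actors share no attribute and get no edge.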
5. Connected Relation Schemes
   - Relation schemes $R_{i_1}, \dots, R_{i_m}$ are connected if their scheme graph is connected
   - Tuples $t_{i_1 j_1}, \dots, t_{i_m j_m}$ from $m$ distinct relations are connected if the relation schemes of these relations are connected
   - (Figure: Movies, Actors and Acted-in are connected relation schemes; Movies and Actors alone are unconnected relation schemes.)
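
Checking whether a group of schemes (and hence a group of tuples drawn from those relations) is connected amounts to a graph search over the scheme graph. A minimal breadth-first sketch, again assuming schemes are frozensets of hypothetical attribute names:

```python
from collections import deque


def schemes_connected(schemes: list[frozenset[str]]) -> bool:
    """True iff the scheme graph over the given schemes is connected."""
    if not schemes:
        return False  # convention chosen for this sketch: an empty group is not connected
    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search starting from the first scheme
        i = queue.popleft()
        for j, other in enumerate(schemes):
            if j not in seen and schemes[i] & other:  # shared attribute means an edge
                seen.add(j)
                queue.append(j)
    return len(seen) == len(schemes)


movies = frozenset({"title", "year"})
acted_in = frozenset({"actor", "title"})
actors = frozenset({"actor", "birth"})
```

This mirrors the figure: `[movies, acted_in, actors]` is connected through Acted-in, whereas `[movies, actors]` is not.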
6. Join-Consistent Tuples
   - Two tuples $t_{i_1 j_1}$ and $t_{i_2 j_2}$ are join consistent if $t_{i_1 j_1}[R_{i_1} \cap R_{i_2}] = t_{i_2 j_2}[R_{i_1} \cap R_{i_2}]$
   - $m$ tuples from $m$ distinct relations are join consistent if every pair of connected tuples among them is join consistent
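
A sketch of the join-consistency test, assuming tuples are represented as Python dicts from attribute name to value (an illustrative encoding, not the slides' own). Because two tuples whose schemes are not connected share no attribute, they pass the pairwise test trivially, so checking every pair gives the same answer as checking only the connected pairs.

```python
from itertools import combinations


def pair_join_consistent(t1: dict, t2: dict) -> bool:
    """t1[R1 ∩ R2] == t2[R1 ∩ R2], with tuples as attribute -> value dicts."""
    common = t1.keys() & t2.keys()  # the shared attributes R1 ∩ R2
    return all(t1[a] == t2[a] for a in common)


def join_consistent(tuples: list[dict]) -> bool:
    """Every pair must agree; unconnected pairs share no attribute and pass trivially."""
    return all(pair_join_consistent(t1, t2) for t1, t2 in combinations(tuples, 2))


# Hypothetical tuples built from values that appear in the movie example.
movie = {"title": "Zelig", "year": 1983}
acted = {"actor": "Woody Allen", "title": "Zelig"}
other = {"actor": "Woody Allen", "title": "Antz"}
```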
7. Universal Tuple
   - A universal tuple $u$ is defined over all the attributes in $R_1 \cup \dots \cup R_n$ and consists of null and non-null values
   - We denote by $\hat{u}$ the non-null portion of $u$
   - A universal tuple $u$ is an integrated tuple if there are $m$ connected and join-consistent tuples $t_{i_1 j_1}, \dots, t_{i_m j_m}$ such that $\hat{u}$ is the natural join of $t_{i_1 j_1}, \dots, t_{i_m j_m}$
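
For join-consistent tuples from distinct relations, the natural join is a single tuple that carries every attribute value of its inputs. The sketch below builds the corresponding universal tuple, assuming `None` stands for null and tuples are attribute-to-value dicts; the function names are hypothetical.

```python
def integrate(tuples: list[dict], universe: set[str]) -> dict:
    """Natural join of the given tuples, padded with None (null) to the universal scheme.

    Assumes the tuples are connected and come from distinct relations;
    join consistency is checked while merging.
    """
    u = dict.fromkeys(sorted(universe))  # every attribute starts out null
    for t in tuples:
        for attr, value in t.items():
            if u[attr] is not None and u[attr] != value:
                raise ValueError(f"tuples disagree on {attr!r}: not join consistent")
            u[attr] = value
    return u


def non_null(u: dict) -> dict:
    """The non-null portion û of a universal tuple u."""
    return {a: v for a, v in u.items() if v is not None}


universe = {"title", "year", "actor", "birth"}
u = integrate([{"title": "Zelig", "year": 1983}, {"actor": "Woody Allen", "title": "Zelig"}], universe)
```

Here `u` has a null `birth`, and `non_null(u)` is exactly the natural join of the two input tuples.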
8. Maximal Universal Tuple
   - A universal tuple $u$ subsumes a universal tuple $v$ if $u$ is equal to $v$ on all the non-null attributes of $v$ (i.e., $u$ can be created from $v$ by replacing some null values with non-null values)
   - In a given set $D$, a tuple $u$ is maximal if there is no tuple in $D$, other than $u$, that subsumes $u$
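
Subsumption and maximality can be sketched as follows, assuming universal tuples are dicts over the same attribute set with `None` as null. Since $D$ is a set, exact duplicates are collapsed first; otherwise two equal tuples would subsume each other and both be discarded.

```python
def subsumes(u: dict, v: dict) -> bool:
    """u subsumes v iff u equals v on every non-null attribute of v."""
    return all(u[a] == x for a, x in v.items() if x is not None)


def maximal(tuples: list[dict]) -> list[dict]:
    """Keep the tuples that no other tuple of the set subsumes."""
    unique: list[dict] = []
    for t in tuples:  # treat the input as a set: drop exact duplicates
        if t not in unique:
            unique.append(t)
    return [u for i, u in enumerate(unique)
            if not any(j != i and subsumes(w, u) for j, w in enumerate(unique))]


partial = {"title": "Zelig", "year": 1983, "actor": None}
full = {"title": "Zelig", "year": 1983, "actor": "Woody Allen"}
antz = {"title": "Antz", "year": 1998, "actor": None}
```

`full` subsumes `partial` (it fills in the null actor), so only `full` and `antz` are maximal in `[partial, full, antz]`.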
9. A Full Disjunction
   - The full disjunction of $r_1, \dots, r_n$ is the set of all maximal integrated tuples that can be generated from $m$ tuples of $r_1, \dots, r_n$
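
Putting the definitions together gives a brute-force reading of the full disjunction. This is only an illustration of the definition, not the algorithm presented in these slides: it enumerates every subset of relations and every choice of one tuple per chosen relation, which is exponential. Relations are assumed to be `(scheme, rows)` pairs with rows as attribute-to-value dicts and `None` as null; all names are hypothetical.

```python
from collections import deque
from itertools import combinations, product


def _connected(schemes):
    """BFS over the scheme graph of the given schemes (sets of attributes)."""
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j, other in enumerate(schemes):
            if j not in seen and schemes[i] & other:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(schemes)


def _join_consistent(tuples):
    return all(t1[a] == t2[a]
               for t1, t2 in combinations(tuples, 2)
               for a in t1.keys() & t2.keys())


def _subsumes(u, v):
    return all(u[a] == x for a, x in v.items() if x is not None)


def full_disjunction(relations):
    """Naive full disjunction of a list of (scheme, rows) pairs."""
    universe = sorted(set().union(*(scheme for scheme, _ in relations)))
    integrated = []
    for m in range(1, len(relations) + 1):
        for chosen in combinations(relations, m):      # m distinct relations
            if not _connected([scheme for scheme, _ in chosen]):
                continue
            for ts in product(*(rows for _, rows in chosen)):  # one tuple from each
                if _join_consistent(ts):
                    u = dict.fromkeys(universe)        # start with nulls everywhere
                    for t in ts:
                        u.update(t)                    # natural join of consistent tuples
                    if u not in integrated:
                        integrated.append(u)
    # keep only the maximal integrated tuples
    return [u for u in integrated
            if not any(w is not u and _subsumes(w, u) for w in integrated)]


movies = (frozenset({"title", "year"}),
          [{"title": "Zelig", "year": 1983}, {"title": "Antz", "year": 1998}])
acted_in = (frozenset({"actor", "title"}), [{"actor": "Woody Allen", "title": "Zelig"}])
actors = (frozenset({"actor", "birth"}), [{"actor": "Woody Allen", "birth": "1/12/1935"}])
fd = full_disjunction([movies, acted_in, actors])
```

With this hypothetical data, the result keeps the fully joined Zelig tuple and the Antz movie padded with nulls; every other integrated tuple is subsumed by the first.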
10. Acyclic Schemes
    - Given a set of schemes $R_1, \dots, R_n$, their scheme hypergraph consists of
      - a node for each attribute that appears in some $R_i$
      - for each $R_i$ ($1 \le i \le n$), a hyperedge that includes the attributes of $R_i$
    - α-acyclic scheme hypergraph: all the hyperedges can be removed by a sequence of ear removals
    - γ-acyclic scheme hypergraph: the Bachman diagram of the scheme hypergraph is acyclic
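
α-acyclicity can be tested with the GYO reduction, a standard procedure equivalent to the ear-removal characterization: repeatedly delete attributes that occur in only one hyperedge and delete hyperedges that are empty or contained in another hyperedge. The hypergraph is α-acyclic iff every hyperedge disappears. A minimal sketch, with hyperedges as sets of attribute names:

```python
def is_alpha_acyclic(schemes: list[set[str]]) -> bool:
    """GYO reduction: α-acyclic iff the reduction removes every hyperedge."""
    edges = [set(e) for e in schemes]  # work on copies
    changed = True
    while changed:
        changed = False
        # Rule 1: delete attributes that occur in exactly one hyperedge.
        for e in edges:
            lonely = {a for a in e if sum(a in f for f in edges) == 1}
            if lonely:
                e -= lonely  # in-place set difference
                changed = True
        # Rule 2: delete one hyperedge that is empty or contained in another one.
        for i, e in enumerate(edges):
            if not e or any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges
```

A chain such as `[{"A", "B"}, {"B", "C"}, {"C", "D"}]` reduces to nothing, whereas the triangle `[{"A", "B"}, {"B", "C"}, {"C", "A"}]` gets stuck. Adding the hyperedge `{"A", "B", "C"}` to the triangle makes it α-acyclic, the usual example of a hypergraph that is α-acyclic but not γ-acyclic, which this check alone cannot detect.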
11. Computing Full Disjunctions
12. Product Graph
    - Given a query $Q$ and a database $D$, the product of $Q$ and $D$ is a graph such that
      - the nodes are pairs consisting of a node of $Q$ and a node of $D$
      - there is an edge from $(q_1, d_1)$ to $(q_2, d_2)$ if $Q$ has an edge from $q_1$ to $q_2$ and $D$ has an edge from $d_1$ to $d_2$, both with the same label
      - the root is the pair of the root of $Q$ and the root of $D$
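
The product graph can be built by pairing every query edge with every database edge that carries the same label. The sketch below assumes both graphs are given as lists of `(source, label, target)` triples; the variable names v1, v2, w1 follow the slides, while the small database is hypothetical. A second helper computes the part of the product graph that is reachable from the root, since the slides note that some product nodes are not.

```python
from collections import deque


def product_graph(q_edges, d_edges, q_root, d_root):
    """Return the root and the edge set of the product of Q and D."""
    root = (q_root, d_root)
    edges = {((u, a), label, (v, b))
             for (u, label, v) in q_edges
             for (a, d_label, b) in d_edges
             if label == d_label}  # both edges must carry the same label
    return root, edges


def reachable(root, edges):
    """Nodes of the product graph reachable from the root (BFS)."""
    succ = {}
    for src, _, tgt in edges:
        succ.setdefault(src, []).append(tgt)
    seen, queue = {root}, deque([root])
    while queue:
        n = queue.popleft()
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen


q = [("v1", "movie", "v2"), ("v2", "title", "w1")]
d = [(1, "movie", 2), (1, "movie", 3), (2, "title", 5), (3, "title", 8), (7, "movie", 9)]
root, edges = product_graph(q, d, "v1", 1)
```

The pair `("v1", 7)` appears in the product but is not reachable from the root `("v1", 1)`.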
13. (Figure: a query graph $Q$ with nodes $v_1, v_2, v_3, w_1, \dots, w_4$ and a movie database $D$ with nodes 1 to 11. Edge labels include movie, actor, title, year, language, director, name, date of birth and filmography item; atomic values include Zelig, Antz, 1983, 1998, English, Woody Allen and 1/12/1935.) The product of the query and the database is shown on the next slide.
14. (Figure: the product graph, with nodes $(v_1,1)$, $(v_2,2)$, $(v_2,3)$, $(v_3,4)$, $(w_1,5)$, $(w_2,6)$, $(w_1,8)$, $(w_3,10)$ and $(w_4,11)$.) There are additional nodes that are not reachable from the root.
15. Matching as a Subgraph of the Product Graph
    - Consider the following conditions on a subgraph $G$ of the product graph:
      1. $G$ has no repeated variables
      2. $G$ contains the root
      3. each node in $G$ is reachable from the root
      4. $G$ preserves the constraints (edges) of the query
    - A subgraph that satisfies conditions 1 to 3 is an OR-matching graph
    - A subgraph that satisfies conditions 1 to 4 is a weak-matching graph
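
The four conditions can be checked mechanically. In this sketch a subgraph is a set of `(variable, object)` nodes plus a set of product-graph edges. Condition 4 is interpreted here (an assumption) as: every query edge whose two variables are both bound in $G$ must correspond to an edge of the product graph. The small example reproduces the "director" situation from the slides with hypothetical data.

```python
from collections import deque


def classify(nodes, edges, root, q_edges, product_edges):
    """Return "weak", "or", or None for a subgraph G = (nodes, edges) of the product graph."""
    variables = [var for var, _ in nodes]
    if len(variables) != len(set(variables)):  # condition 1: no repeated variables
        return None
    if root not in nodes:                      # condition 2: G contains the root
        return None
    succ = {}
    for src, _, tgt in edges:
        succ.setdefault(src, []).append(tgt)
    seen, queue = {root}, deque([root])
    while queue:                               # condition 3: reachability inside G
        n = queue.popleft()
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    if seen != set(nodes):
        return None
    binding = dict(nodes)                      # variable -> database object
    for u, label, v in q_edges:                # condition 4 (assumed interpretation)
        if u in binding and v in binding:
            if ((u, binding[u]), label, (v, binding[v])) not in product_edges:
                return "or"
    return "weak"


root = ("v1", 1)
q_edges = {("v1", "movie", "v2"), ("v1", "actor", "v3"), ("v2", "director", "v3")}
product_edges = {
    (root, "movie", ("v2", 2)), (root, "movie", ("v2", 3)),
    (root, "actor", ("v3", 4)), (("v2", 2), "director", ("v3", 4)),
}
g1 = ({root, ("v2", 2), ("v3", 4)}, {(root, "movie", ("v2", 2)), (root, "actor", ("v3", 4))})
g2 = ({root, ("v2", 3), ("v3", 4)}, {(root, "movie", ("v2", 3)), (root, "actor", ("v3", 4))})
```

`g1` binds `v2` to 2, where the director edge exists, so it is a weak-matching graph; `g2` binds `v2` to 3 and is only an OR-matching graph.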
16. (Figure: the product graph with a highlighted subgraph over $(v_1,1)$, $(v_2,2)$, $(v_3,4)$, $(w_1,5)$, $(w_2,6)$, $(w_3,10)$ and $(w_4,11)$.) This subgraph is an OR-matching graph; it is also a weak-matching graph.
17. (Figure: the product graph with a highlighted subgraph over $(v_1,1)$, $(v_2,3)$, $(v_3,4)$, $(w_1,8)$, $(w_3,10)$ and $(w_4,11)$.) This is another OR-matching graph. It is not a weak-matching graph, since the "director" edge of the query is not preserved.
18. Matching Graphs
    - Each OR-matching graph represents an OR-matching (and each weak-matching graph represents a weak matching)
    - An OR-matching can be represented by many OR-matching graphs, but all these graphs have the same set of nodes and differ only in their edges (the same is true for weak matchings and weak-matching graphs)
19. Intuition
    - For DAG queries, matching graphs are constructed by adding edges according to the query constraints
      - The order of the extensions is given simply by a topological sort of the query nodes
    - For cyclic queries, neither a simple traversal over the query nor a simple traversal over the database works
      - Instead, we use a stratum traversal over the matching graph
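
For the DAG case, the extension order can be derived from a topological sort of the query nodes. This sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) and orders each query edge by the rank of its source node, so an edge is considered only after the edges leading into its source. For a cyclic query, `static_order()` raises `CycleError`, which is exactly the situation the stratum traversal is meant to handle. The function name is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def extension_order(q_edges):
    """Order the edges of a DAG query by a topological sort of the query nodes.

    Raises graphlib.CycleError if the query is cyclic.
    """
    sorter = TopologicalSorter()
    for src, _, tgt in q_edges:
        sorter.add(tgt, src)  # tgt must come after src
    rank = {node: i for i, node in enumerate(sorter.static_order())}
    return sorted(q_edges, key=lambda edge: rank[edge[0]])


query = [("v2", "title", "w1"), ("v1", "movie", "v2"),
         ("v3", "name", "w3"), ("v1", "actor", "v3")]
order = extension_order(query)
```

In the resulting `order`, each edge's source variable is either the root `v1` or the target of an earlier edge.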
20. Dividing the edges into strata (figure: the edges of the product graph grouped into stratum 1, stratum 2 and stratum 3, …)
21. Stratum Traversal
    - A stratum traversal is an ordered list that
      - starts with the edges of stratum 1,
      - followed by the edges of stratum 2,
      - …
      - followed by the edges of stratum $n$,
      - …
    - The order of the edges within each stratum is unimportant
    - The same edge can occur multiple times, in different strata
    - We only look at the first $n$ strata, where $n$ is the size of the query
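
The slides do not spell out how edges are assigned to strata. The sketch below assumes one plausible reading that matches the properties listed above: stratum 1 holds the product-graph edges leaving the root, and stratum $k+1$ holds the edges leaving the targets of stratum-$k$ edges. On a cyclic graph the same edge then recurs in several strata. Treat this definition as hypothetical.

```python
def stratum_traversal(root, edges, n):
    """Concatenate strata 1..n of the product-graph edges (hypothetical stratum definition)."""
    out = {}
    for edge in edges:
        out.setdefault(edge[0], []).append(edge)  # outgoing edges per node
    traversal, frontier = [], {root}
    for _ in range(n):
        stratum = [edge for node in frontier for edge in out.get(node, [])]
        traversal.extend(stratum)  # order inside a stratum does not matter
        frontier = {tgt for _, _, tgt in stratum}
    return traversal


root = ("v1", 1)
edges = [
    (root, "movie", ("v2", 2)),
    (("v2", 2), "director", ("v3", 4)),
    (("v3", 4), "filmography item", ("v2", 2)),
]
```

Here the director and filmography-item edges form a cycle, so the director edge appears in both stratum 2 and stratum 4 of `stratum_traversal(root, edges, 4)`.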
22. Computing the OR-Matching Graphs
    - We maintain a set of OR-matching graphs
    - Initially, the set includes a single graph that consists only of the root of the product graph
    - We extend each OR-matching graph in the set by adding edges in the order of the stratum traversal
    - In each extension step, we try to add the current edge to each of the graphs produced so far, and this may cause
      - the creation of a new graph that replaces the extended graph,
      - the creation of a new graph that is added to the set in addition to the existing graphs, or
      - no change to the set of graphs
23. Adding an Edge
    - After each addition of an edge, subsumed matching graphs are removed, to avoid an exponential blowup
    - There are six cases to handle when extending a graph by an edge; they are described next
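
A sketch of the pruning step, assuming (hypothetically) that a matching graph `(nodes, edges)` is subsumed when its node set is a proper subset of another graph's node set. This is consistent with an OR-matching being determined by its set of nodes. Among graphs with identical node sets, which represent the same matching, only the first is kept.

```python
def remove_subsumed(graphs):
    """Drop graphs subsumed by another graph in the list (hypothetical criterion)."""
    kept = []
    for i, (nodes, edges) in enumerate(graphs):
        dominated = any(
            nodes < other or (nodes == other and j < i)  # strictly smaller, or a later duplicate
            for j, (other, _) in enumerate(graphs)
            if j != i
        )
        if not dominated:
            kept.append((nodes, edges))
    return kept


root = ("v1", 1)
big = (frozenset({root, ("v2", 2), ("v3", 4)}),
       frozenset({(root, "movie", ("v2", 2)), (root, "actor", ("v3", 4))}))
small = (frozenset({root, ("v3", 4)}), frozenset({(root, "actor", ("v3", 4))}))
same_nodes = (big[0], frozenset({(root, "movie", ("v2", 2)), (("v2", 2), "director", ("v3", 4))}))
```

`small` is dropped because its nodes are a proper subset of `big`'s, and `same_nodes` is dropped as a second representation of the same matching.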
24. Cases 1 and 2 (the example graph has nodes $(V_1,O_1)$, $(V_2,O_2)$ and $(V_3,O_4)$, connected by movie and actor edges)
    - Case 1: the target of the added edge is a pair that includes the root of $Q$ without the root of $D$ (in the figure, a title edge from $(V_2,O_2)$ to $(V_1,O_3)$). No change is made.
    - Case 2: the graph already includes the added edge (in the figure, the movie edge). No change is made.
25. Cases 3 and 4
    - Case 3: the graph does not include the source of the added edge (in the figure, a title edge from $(V_2,O_3)$ to $(W_1,O_8)$). No change is made.
    - Case 4: the graph includes the source of the added edge and no node with the variable of the target (in the figure, a title edge from $(V_2,O_2)$ to $(W_1,O_5)$). The edge is added to the graph, and the new graph replaces the existing graph.
26. Case 5
    - The graph already includes the source and the target of the added edge, but not the added edge itself (in the figure, an a.k.a edge to $(W_1,O_3)$, which the graph already reaches through a title edge). The edge is added to the graph, and the new graph replaces the existing graph.
27. Case 6
    - The graph includes the source of the added edge, but also includes a different node with the same variable as the target of the added edge. A new graph is created and added to the set, without replacing the existing graph.
    - In the figure, the graph contains $(V_2,O_2)$ and the added edge is a film edge from $(V_3,O_4)$ to $(V_2,O_4)$. In the new graph, $(V_2,O_2)$ is replaced by $(V_2,O_4)$, and nodes that are no longer reachable from the root are erased.
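
The six cases can be combined into a single extension function. This is a sketch under stated assumptions, not the authors' implementation: a graph is a pair of frozensets `(nodes, edges)`, nodes are `(variable, object)` pairs, edges are `(source, label, target)` triples, and the function returns the list of graphs that take the place of the input graph. In case 6 it drops the clashing node together with its incident edges before pruning unreachable nodes; the helper names are hypothetical.

```python
from collections import deque


def _prune(root, nodes, edges):
    """Keep only the nodes (and edges among them) reachable from the root."""
    succ = {}
    for src, _, tgt in edges:
        succ.setdefault(src, []).append(tgt)
    seen, queue = {root}, deque([root])
    while queue:
        n = queue.popleft()
        for m in succ.get(n, []):
            if m in nodes and m not in seen:
                seen.add(m)
                queue.append(m)
    return frozenset(seen), frozenset(e for e in edges if e[0] in seen and e[2] in seen)


def extend(graph, edge, root):
    """Try to add `edge` to `graph`; return the resulting list of graphs."""
    nodes, edges = graph
    src, _, tgt = edge
    if tgt[0] == root[0] and tgt != root:  # case 1: root variable, non-root object
        return [graph]
    if edge in edges:                      # case 2: edge already present
        return [graph]
    if src not in nodes:                   # case 3: source missing
        return [graph]
    if tgt in nodes:                       # case 5: both ends present, edge missing
        return [(nodes, edges | {edge})]
    clash = [n for n in nodes if n[0] == tgt[0]]
    if not clash:                          # case 4: target variable still unbound
        return [(nodes | {tgt}, edges | {edge})]
    old = clash[0]                         # case 6: variable bound to another object
    new_nodes = (nodes - {old}) | {tgt}
    new_edges = frozenset(e for e in edges if old not in (e[0], e[2])) | {edge}
    return [graph, _prune(root, new_nodes, new_edges)]


root = ("v1", 1)
g0 = (frozenset({root}), frozenset())
e_movie2 = (root, "movie", ("v2", 2))
e_movie3 = (root, "movie", ("v2", 3))
[g1] = extend(g0, e_movie2, root)     # case 4, like step 2 of the movies example
kept, new = extend(g1, e_movie3, root)  # case 6, like step 3
```

As in steps 2 and 3 of the walkthrough, the two movie edges end up in two separate graphs, one binding `v2` to 2 and one binding it to 3.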
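28. Applying the algorithm to the movies example (figure, steps 1 to 3: the initial graph consists only of the root $(v_1,1)$; step 2 adds the movie edge to $(v_2,2)$; step 3 processes the movie edge to $(v_2,3)$, and since the graph already binds $v_2$, a second graph over $(v_1,1)$ and $(v_2,3)$ is created)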
29. (Figure, steps 4 and 5: the actor edge from $(v_1,1)$ to $(v_3,4)$ is added to both graphs; the title edge from $(v_2,2)$ to $(w_1,5)$ is added to the first graph.)
30. (Figure, steps 6 and 7: the language edge from $(v_2,2)$ to $(w_2,6)$ is added to the first graph; step 7 adds a title edge from $(v_2,3)$ to the second graph.)
31. (Figure, steps 8 and 9: the name edge from $(v_3,4)$ to $(w_3,10)$ and the date-of-birth edge from $(v_3,4)$ to $(w_4,11)$ are added to both graphs.)
32. (Figure, step 10: the director edge from $(v_2,2)$ to $(v_3,4)$ is added to the first graph.)
33. (Figure, step 11: the filmography-item edge from $(v_3,4)$ to $(v_2,2)$ is added to the first graph; for the second graph, a new graph over $(v_1,1)$, $(v_2,2)$, $(v_3,4)$, $(w_3,10)$ and $(w_4,11)$ is created, but it is subsumed by the left matching graph.)
34. (Figure, step 12: the filmography-item edge from $(v_3,4)$ to $(v_2,3)$ is added to the second graph; the new graph created from the first graph, over $(v_1,1)$, $(v_2,3)$, $(v_3,4)$, $(w_3,10)$ and $(w_4,11)$, is subsumed by the right matching graph.)
35. The OR-matchings (figure: the product graph next to the two resulting OR-matching graphs; the first is over $(v_1,1)$, $(v_2,2)$, $(v_3,4)$, $(w_1,5)$, $(w_2,6)$, $(w_3,10)$ and $(w_4,11)$, including the director edge; the second is over $(v_1,1)$, $(v_2,3)$, $(v_3,4)$, a title node, $(w_3,10)$ and $(w_4,11)$)
36. Computing Maximal Weak-Matching Graphs
    - To compute the maximal weak-matching graphs, the same algorithm is used with a slight change:
      - after each addition of an edge, the nodes that cause a query constraint not to be preserved are removed, along with the edges that contain these nodes
      - nodes that are no longer reachable from the root as a result of this deletion are also deleted
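
A sketch of the extra pruning step for weak matchings. The slides do not say which endpoint of a violated constraint is removed; this sketch assumes (hypothetically) that it is the target-side node, and it never removes the root. A constraint is treated as violated when a query edge between two bound variables has no corresponding product-graph edge. The example cascades: dropping $(v_3,4)$ also makes $(w_3,10)$ unreachable.

```python
from collections import deque


def enforce_weak(nodes, edges, root, q_edges, product_edges):
    """Prune a matching graph toward a weak-matching graph (hypothetical sketch)."""
    binding = dict(nodes)                 # variable -> database object
    bad = set()
    for u, label, v in q_edges:
        if u in binding and v in binding:
            if ((u, binding[u]), label, (v, binding[v])) not in product_edges:
                bad.add((v, binding[v]))  # assumption: drop the target-side node
    bad.discard(root)                     # the root is never removed
    nodes = set(nodes) - bad
    edges = {e for e in edges if e[0] in nodes and e[2] in nodes}
    succ = {}
    for src, _, tgt in edges:
        succ.setdefault(src, []).append(tgt)
    seen, queue = {root}, deque([root])
    while queue:                          # keep only nodes still reachable from the root
        n = queue.popleft()
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return frozenset(seen), frozenset(e for e in edges if e[0] in seen)


root = ("v1", 1)
q_edges = {("v1", "movie", "v2"), ("v1", "actor", "v3"),
           ("v2", "director", "v3"), ("v3", "name", "w3")}
product_edges = {
    (root, "movie", ("v2", 2)), (root, "movie", ("v2", 3)), (root, "actor", ("v3", 4)),
    (("v2", 2), "director", ("v3", 4)), (("v3", 4), "name", ("w3", 10)),
}
nodes = {root, ("v2", 3), ("v3", 4), ("w3", 10)}
edges = {(root, "movie", ("v2", 3)), (root, "actor", ("v3", 4)), (("v3", 4), "name", ("w3", 10))}
kept_nodes, kept_edges = enforce_weak(nodes, edges, root, q_edges, product_edges)
```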
37. The Algorithm Computes Weak Queries in Polynomial Time
    - Theorem: Given a query $Q$ and a database $D$, the revised algorithm terminates with the set of maximal weak-matching graphs of $Q$ w.r.t. $D$. The runtime of the algorithm is $O(q^3 d m^2)$, where $q$ is the size of the query, $d$ is the size of the database and $m$ is the size of the result.