# Databases 1


1. 1. Databases I (Ch. 10/11): Summary of FD's and Normalisation. Steven Klusener, Vrije Universiteit, Amsterdam, version 2007
2. 2. Topics <ul><li>Functional dependencies (FD's) </li></ul><ul><ul><li>Formal definition of an FD </li></ul></ul><ul><ul><li>Deriving FD's from others and constructing a minimal cover </li></ul></ul><ul><ul><li>Showing counterexamples </li></ul></ul><ul><li>Candidate and primary keys, based on FD's </li></ul><ul><li>Normalisation: 2NF, 3NF and BCNF </li></ul><ul><li>Decomposition and the lossless-join property </li></ul><ul><li>A 3NF decomposition algorithm that satisfies the lossless-join property </li></ul>
3. 3. Functional Dependency (FD) <ul><li>Given: </li></ul><ul><ul><li>A relation R with attributes $A_1, A_2, \dots, A_n$ </li></ul></ul><ul><ul><li>$X \subseteq \{A_1, A_2, \dots, A_n\}$ and $Y \subseteq \{A_1, A_2, \dots, A_n\}$ </li></ul></ul><ul><li>Y depends functionally on X (notation: $X \to Y$) iff for every possible extension of R it holds that </li></ul><ul><ul><li>$\forall t_1, t_2 \in R: \; t_1[X] = t_2[X] \Rightarrow t_1[Y] = t_2[Y]$ </li></ul></ul><ul><li>Note that if all tuples in R have a different value for X, then the FD $X \to Y$ is trivially satisfied </li></ul>
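The definition can be tested against one concrete instance of R by searching for a counterexample pair $t_1, t_2$. Below is a minimal Python sketch. It assumes a relation instance is a list of dicts (attribute name to value), and the sample rows and the helper names `find_violation` and `fd_holds` are hypothetical. A single instance can only refute an FD; it cannot prove that the FD holds for every possible extension of R.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]


def find_violation(rows: List[Row], lhs: Iterable[str],
                   rhs: Iterable[str]) -> Optional[Tuple[Row, Row]]:
    """Return a pair t1, t2 with t1[X] = t2[X] but t1[Y] != t2[Y], or None."""
    lhs, rhs = list(lhs), list(rhs)
    for t1, t2 in combinations(rows, 2):
        if all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs):
            return t1, t2
    return None


def fd_holds(rows: List[Row], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """True iff this particular instance contains no counterexample."""
    return find_violation(rows, lhs, rhs) is None


# Hypothetical instance: E2 and E3 share a birth date.
emp = [
    {"ENR": "E1", "NAME": "John", "BDATE": "1964-08-28"},
    {"ENR": "E2", "NAME": "Joe", "BDATE": "1968-04-04"},
    {"ENR": "E3", "NAME": "Jack", "BDATE": "1968-04-04"},
]

print(fd_holds(emp, ["ENR"], ["BDATE"]))         # True
print(find_violation(emp, ["BDATE"], ["NAME"]))  # the E2/E3 pair
```

The counterexample returned for BDATE → NAME is exactly the kind of evidence asked for when "showing counterexamples".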
4. 4. Minimal set of FD's <ul><li>Certain functional dependencies can be derived from others </li></ul><ul><li>For example </li></ul><ul><ul><li>ENR → BDATE </li></ul></ul><ul><ul><ul><li>(the ENR value E1 (John) determines the birth date 1964-08-28) </li></ul></ul></ul><ul><ul><li>BDATE → ZODIAC (the zodiac sign) </li></ul></ul><ul><ul><ul><li>(the birth date 1964-08-28 determines the zodiac value Virgo) </li></ul></ul></ul><ul><li>Hence, each employee number determines the zodiac sign of that employee, so we have ENR → ZODIAC, which follows from ENR → BDATE and BDATE → ZODIAC (by transitivity, as we will see later) </li></ul>
5. 5. Derivability of FD's, closure <ul><li>We can derive FD's from others using the following inference rules: </li></ul><ul><ul><li>Reflexive Rule: the FD $X \to Y$ is given, and hence derivable, for any $Y \subseteq X$. </li></ul></ul><ul><ul><li>Augmentation Rule: the FD $XZ \to YZ$ is derivable from the FD $X \to Y$. </li></ul></ul><ul><ul><li>Transitive Rule: the FD $X \to Z$ is derivable from the two FD's $X \to Y$ and $Y \to Z$. </li></ul></ul><ul><li>From these rules other inference rules can be obtained, such as: </li></ul><ul><ul><li>Decomposition Rule: the FD $X \to Y$ is derivable from the FD $X \to YZ$. </li></ul></ul><ul><ul><li>Additive Rule: the composed FD $X \to YZ$ is derivable from the two FD's $X \to Y$ and $X \to Z$. </li></ul></ul><ul><li>For a set of FD's F we write $F^+$ for the set of all FD's that can be derived from F; $F^+$ is called the closure of F </li></ul>
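In practice, derivability is usually decided without applying the rules one by one. Instead, one computes the attribute closure $X^+$ (all attributes derivable from X) and uses the standard fact that $X \to Y \in F^+$ iff $Y \subseteq X^+$. Below is a minimal Python sketch of that closure algorithm. It assumes FD's are pairs of attribute sets, and the names `fd`, `closure` and `derivable` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (X, Y) stands for X -> Y


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+: grow X until no FD adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # X -> Y fires once its whole left-hand side is in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def derivable(fds: List[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """True iff lhs -> rhs is in F+, i.e. rhs is a subset of lhs+."""
    return set(rhs) <= closure(lhs, fds)


F = [fd({"ENR"}, {"BDATE"}), fd({"BDATE"}, {"ZODIAC"})]
print(derivable(F, {"ENR"}, {"ZODIAC"}))  # True (transitive rule)
print(derivable(F, {"ZODIAC"}, {"ENR"}))  # False
```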
6. 6. Minimal set of FD's <ul><li>A set F of functional dependencies is minimal iff: </li></ul><ul><ul><li>Every FD in F is of the form $X \to A$, where A is a single attribute </li></ul></ul><ul><ul><li>No left-hand side can be made smaller: if an FD $XY \to A$ is replaced by $X \to A$, the closure changes (it becomes strictly larger) </li></ul></ul><ul><ul><li>No FD can be removed from F without losing FD's in the closure of F </li></ul></ul><ul><li>A minimal cover of F is a minimal set G with $G^+ = F^+$; in general, the construction of a minimal cover does not lead to a unique one </li></ul>
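A minimal cover can be constructed in three steps: split the right-hand sides into single attributes, remove superfluous attributes from left-hand sides, then drop every FD that is derivable from the others. Below is a minimal Python sketch of this common textbook procedure, which is not necessarily the exact order used in the lectures. Single-letter attribute names (so that `"AB"` means {A, B}) and all function names are assumptions for brevity. The result depends on the order in which FD's are examined, which shows why a minimal cover is not unique.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], str]  # X -> A with a single attribute A


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds: List[Tuple[Iterable[str], Iterable[str]]]) -> List[FD]:
    # Step 1: one attribute per right-hand side.
    g: List[FD] = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            if (frozenset(lhs), a) not in g:
                g.append((frozenset(lhs), a))
    # Step 2: drop attribute B from X if A is still derivable from X - {B}.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, a)
    g = list(dict.fromkeys(g))  # remove duplicates, keep order
    # Step 3: drop every FD that follows from the remaining ones.
    for item in list(g):
        rest = [f for f in g if f != item]
        if item[1] in closure(item[0], rest):
            g = rest
    return g


def show(fds: List[FD]) -> List[str]:
    return ["".join(sorted(lhs)) + "->" + a for lhs, a in fds]


print(show(minimal_cover([("A", "BC"), ("B", "C"), ("AB", "D")])))
# ['A->B', 'B->C', 'A->D']
```

With F = {A → B, B → A, A → C, B → C}, listing A → C before B → C yields {A → B, B → A, B → C}, while the reverse order yields {A → B, B → A, A → C}: two different minimal covers of the same F.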
7. 7. Keys <ul><li>Given </li></ul><ul><ul><li>A relation R with attributes $A_1, A_2, \dots, A_n$ </li></ul></ul><ul><ul><li>A minimal set of FD's F </li></ul></ul><ul><ul><li>A set of attributes $X \subseteq \{A_1, A_2, \dots, A_n\}$ </li></ul></ul><ul><li>then </li></ul><ul><li>X is a (candidate) key of R iff: </li></ul><ul><li>1) identification: $X \to A_i$ is derivable from F for each $i \in [1..n]$, and </li></ul><ul><li>2) minimality: for every proper subset $Y \subset X$ there is some $A_i$ such that $Y \to A_i$ is not derivable from F </li></ul>
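Both conditions can be checked mechanically with the attribute closure: X satisfies identification iff $X^+$ contains all attributes of R. Below is a minimal Python sketch that finds all candidate keys by trying attribute sets in order of increasing size and skipping supersets of keys already found, which are only superkeys. It uses the TEACH relation from the BCNF slide as input. The function names are hypothetical, and the enumeration is exponential, so it is only meant for small schemes.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(x: Set[str], attrs: Set[str], fds: List[FD]) -> bool:
    """Identification: X -> A_i is derivable for every attribute A_i."""
    return closure(x, fds) >= attrs


def candidate_keys(attrs: Set[str], fds: List[FD]) -> List[Set[str]]:
    keys: List[Set[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            # Minimality: a superset of a smaller key is only a superkey.
            if any(k <= x for k in keys):
                continue
            if is_superkey(x, attrs, fds):
                keys.append(x)
    return keys


teach = {"STUDENT", "COURSE", "INSTRUCTOR"}
teach_fds = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
print(candidate_keys(teach, teach_fds))
# two candidate keys: {COURSE, STUDENT} and {INSTRUCTOR, STUDENT}
```

Choosing one of the two keys as primary key makes the other an alternate key; the full set {STUDENT, COURSE, INSTRUCTOR} is a superkey but not a candidate key.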
8. 8. Some terminology w.r.t. keys <ul><li>superkey (identification, but not necessarily minimality) </li></ul><ul><ul><li>e.g. (ENR, NAME) </li></ul></ul><ul><li>candidate key (identification and minimality, see previous slide) </li></ul><ul><ul><li>e.g. (ENR) </li></ul></ul><ul><li>primary key (the selected candidate key) </li></ul><ul><li>alternate/alternative key (all remaining candidate keys) </li></ul>
9. 9. Categorizing FD's, based on the key <ul><li>Given a relation R and a minimal set of FD's F </li></ul><ul><li>An attribute A is a prime attribute if $A \in K$ for some candidate key K; otherwise it is a non-prime attribute </li></ul><ul><li>If attribute A is a non-prime attribute, then $X \to A$ </li></ul><ul><ul><li>is a regular FD if X is a candidate key </li></ul></ul><ul><ul><li>is a partial FD if $X \subset K$ for some candidate key K </li></ul></ul><ul><ul><li>is a transitive FD if all attributes in X are non-prime </li></ul></ul><ul><li>If attribute A is a prime attribute, then $X \to A$ is a prime FD </li></ul>
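Once the candidate keys are known, the categories translate directly into code. Below is a minimal Python sketch. It assumes the keys are supplied by hand (here the keys of EMP_PROJ, EMP_DEPT and TEACH from the following slides), and the function name `classify` is hypothetical. An FD whose left-hand side contains both prime and non-prime attributes fits none of the three categories for non-prime attributes, so the sketch returns "other" for it.

```python
from typing import Iterable, List, Set


def classify(lhs: Iterable[str], a: str, keys: List[Set[str]]) -> str:
    """Label the FD lhs -> a following the categories on this slide."""
    prime = set().union(*keys)  # attributes occurring in some candidate key
    x = set(lhs)
    if a in prime:
        return "prime"
    if any(x == k for k in keys):
        return "regular"
    if any(x < k for k in keys):  # proper subset of a candidate key
        return "partial"
    if not x & prime:             # all attributes of X are non-prime
        return "transitive"
    return "other"                # none of the categories above applies


emp_proj_keys = [{"SSN", "PNUMBER"}]
print(classify({"SSN", "PNUMBER"}, "HOURS", emp_proj_keys))  # regular
print(classify({"SSN"}, "ENAME", emp_proj_keys))             # partial
print(classify({"DNUMBER"}, "DNAME", [{"SSN"}]))             # transitive
```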
10. 10. Normal forms <ul><li>Given a relation R and a minimal set of FD's F </li></ul><ul><li>R is in </li></ul><ul><li>2NF, if there are no partial FD's </li></ul><ul><li>3NF, if there are no partial and no transitive FD's </li></ul><ul><li>BCNF, if R is in 3NF and for every $X \to A$ in F it holds that X is a key </li></ul>
11. 11. 2NF <ul><li>2NF: no partial FD's are allowed; if they occur, the relation has to be decomposed. </li></ul><ul><li>EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC), with key {SSN, PNUMBER} </li></ul><ul><li>With FD's </li></ul><ul><li>SSN, PNUMBER → HOURS </li></ul><ul><li>SSN → ENAME (partial dependency) </li></ul><ul><li>PNUMBER → PNAME, PLOC (partial dependency) </li></ul><ul><li>After decomposition: </li></ul><ul><li>EMP_PROJ1(SSN, PNUMBER, HOURS) </li></ul><ul><li>EMP(SSN, ENAME) </li></ul><ul><li>PROJ(PNUMBER, PNAME, PLOC) </li></ul><ul><li>Note </li></ul><ul><ul><li>If the key contains only one attribute, the 2NF property holds trivially </li></ul></ul><ul><ul><li>Make sure the decomposition has the lossless-join property (to be discussed later) </li></ul></ul>
12. 12. 3NF <ul><li>3NF: no transitive FD's are allowed; if they occur, the relation has to be decomposed. </li></ul><ul><li>EMP_DEPT(SSN, ENAME, BDATE, DNUMBER, DNAME, DMGRSSN), with key SSN </li></ul><ul><li>With FD's </li></ul><ul><li>SSN → ENAME, BDATE, DNUMBER </li></ul><ul><li>DNUMBER → DNAME, DMGRSSN (transitive dependency) </li></ul><ul><li>After decomposition: </li></ul><ul><li>EMP(SSN, ENAME, BDATE, DNUMBER) </li></ul><ul><li>DEPT(DNUMBER, DNAME, DMGRSSN) </li></ul><ul><li>Rephrasing the 3NF property: for every non-prime attribute A and FD $X \to A$ in F, X must be a candidate key </li></ul>
13. 13. BCNF <ul><li>For every FD $X \to A$, X must be a superkey </li></ul><ul><li>TEACH(STUDENT, COURSE, INSTRUCTOR), with key {STUDENT, COURSE} </li></ul><ul><li>With FD's </li></ul><ul><li>FD1: STUDENT, COURSE → INSTRUCTOR </li></ul><ul><li>FD2: INSTRUCTOR → COURSE (INSTRUCTOR is not a key) </li></ul><ul><li>TEACH is in 3NF (COURSE is a prime attribute), but not in BCNF </li></ul><ul><li>Decomposition is not trivial; there are three candidates: </li></ul><ul><li>S-I-1(STUDENT, INSTRUCTOR) and S-C-1(STUDENT, COURSE) </li></ul><ul><li>C-I-2(INSTRUCTOR, COURSE) and C-S-2(COURSE, STUDENT) </li></ul><ul><li>I-C-3(INSTRUCTOR, COURSE) and I-S-3(INSTRUCTOR, STUDENT) </li></ul><ul><li>Conclusion: </li></ul><ul><li>FD1 is lost in all three cases (no subrelation contains all three attributes) </li></ul><ul><li>I-C-3/I-S-3 is the best, because it is non-additive, i.e. lossless (to be discussed later) </li></ul>
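The three examples can be checked mechanically. Below is a minimal Python sketch that reports the highest normal form of a relation given its FD's. It uses the standard textbook formulations: 2NF means no proper subset of a candidate key determines a non-prime attribute; 3NF means every non-trivial $X \to A$ has X a superkey or A prime; BCNF means X is always a superkey. These agree with the slide's categories on the examples shown. The function names are hypothetical, and "1NF" is returned simply to mean "not in 2NF", assuming atomic values.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs: Set[str], fds: List[FD]) -> List[Set[str]]:
    keys: List[Set[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= attrs:
                keys.append(x)
    return keys


def normal_form(attrs: Set[str], fds: List[FD]) -> str:
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    # Non-trivial FD's with a single attribute on the right-hand side.
    single = [(lhs, a) for lhs, rhs in fds for a in rhs if a not in lhs]

    def superkey(x: Set[str]) -> bool:
        return closure(x, fds) >= attrs

    # 2NF: no proper subset of a key determines a non-prime attribute.
    for k in keys:
        for size in range(len(k)):
            for s in combinations(sorted(k), size):
                if closure(s, fds) - prime:
                    return "1NF"
    # 3NF: every X -> A has a superkey X or a prime A.
    if not all(superkey(x) or a in prime for x, a in single):
        return "2NF"
    # BCNF: every X -> A has a superkey X.
    if not all(superkey(x) for x, _ in single):
        return "3NF"
    return "BCNF"


emp_proj = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"}
emp_proj_fds = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"}),
                ({"PNUMBER"}, {"PNAME", "PLOC"})]
emp_dept = {"SSN", "ENAME", "BDATE", "DNUMBER", "DNAME", "DMGRSSN"}
emp_dept_fds = [({"SSN"}, {"ENAME", "BDATE", "DNUMBER"}),
                ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
teach = {"STUDENT", "COURSE", "INSTRUCTOR"}
teach_fds = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

print(normal_form(emp_proj, emp_proj_fds))  # 1NF (partial FD's)
print(normal_form(emp_dept, emp_dept_fds))  # 2NF (transitive FD)
print(normal_form(teach, teach_fds))        # 3NF (INSTRUCTOR -> COURSE)
```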
14. 14. Dependency preservation during decomposition <ul><li>During normalisation we decompose relations into subrelations until we arrive at the desired normal form </li></ul><ul><li>However, we have to make sure that this decomposition guarantees: </li></ul><ul><ul><li>Lossless joins: we must be able to reconstruct the original relation by joining the subrelations </li></ul></ul><ul><ul><li>In other words, joining the subrelations must not introduce spurious tuples </li></ul></ul><ul><ul><li>Dependency preservation: every FD in F should remain derivable from the FD's that hold within the individual subrelations </li></ul></ul>
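15. 15. Example Natural Join (⋈) <ul><li>EMP </li></ul>
<table>
<tr><th>E#</th><th>ENAME</th><th>BDATE</th><th>D#</th></tr>
<tr><td>E1</td><td>John</td><td>28-08-1964</td><td>D1</td></tr>
<tr><td>E2</td><td>Joe</td><td>04-04-1968</td><td>D1</td></tr>
<tr><td>E3</td><td>Jack</td><td>03-09-1969</td><td>D1</td></tr>
<tr><td>E4</td><td>Will</td><td>21-03-1971</td><td>D2</td></tr>
<tr><td>E5</td><td>Bridget</td><td>22-01-1972</td><td>D2</td></tr>
</table>
<ul><li>DEPT </li></ul>
<table>
<tr><th>D#</th><th>DNAME</th><th>BUDGET</th></tr>
<tr><td>D1</td><td>engineering</td><td>500,000</td></tr>
<tr><td>D2</td><td>sales</td><td>200,000</td></tr>
</table>
<ul><li>EMP ⋈ DEPT: the join of EMP and DEPT combines every tuple of EMP with every tuple of DEPT that has the same value for their common attributes (here D#) </li></ul>
<table>
<tr><th>E#</th><th>ENAME</th><th>BDATE</th><th>D#</th><th>DNAME</th><th>BUDGET</th></tr>
<tr><td>E1</td><td>John</td><td>28-08-1964</td><td>D1</td><td>engineering</td><td>500,000</td></tr>
<tr><td>E2</td><td>Joe</td><td>04-04-1968</td><td>D1</td><td>engineering</td><td>500,000</td></tr>
<tr><td>E3</td><td>Jack</td><td>03-09-1969</td><td>D1</td><td>engineering</td><td>500,000</td></tr>
<tr><td>E4</td><td>Will</td><td>21-03-1971</td><td>D2</td><td>sales</td><td>200,000</td></tr>
<tr><td>E5</td><td>Bridget</td><td>22-01-1972</td><td>D2</td><td>sales</td><td>200,000</td></tr>
</table>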
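Below is a minimal Python sketch of the natural join on this EMP/DEPT example. It assumes tuples are dicts keyed by attribute name, and all helper names are hypothetical. The second part deliberately splits EMP badly into (E#, D#) and (D#, ENAME, BDATE). D# is not a key of either part, so joining the two projections produces spurious tuples, which is exactly what the lossless-join requirement rules out.

```python
from typing import Dict, List

Row = Dict[str, object]


def natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Combine each tuple of r with each tuple of s that agrees on all common attributes."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]


def project(r: List[Row], attrs: List[str]) -> List[Row]:
    """Projection onto attrs, removing duplicate tuples."""
    seen, out = set(), []
    for t in r:
        values = tuple(t[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


emp = [
    {"E#": "E1", "ENAME": "John", "BDATE": "28-08-1964", "D#": "D1"},
    {"E#": "E2", "ENAME": "Joe", "BDATE": "04-04-1968", "D#": "D1"},
    {"E#": "E3", "ENAME": "Jack", "BDATE": "03-09-1969", "D#": "D1"},
    {"E#": "E4", "ENAME": "Will", "BDATE": "21-03-1971", "D#": "D2"},
    {"E#": "E5", "ENAME": "Bridget", "BDATE": "22-01-1972", "D#": "D2"},
]
dept = [
    {"D#": "D1", "DNAME": "engineering", "BUDGET": 500_000},
    {"D#": "D2", "DNAME": "sales", "BUDGET": 200_000},
]

emp_dept = natural_join(emp, dept)
print(len(emp_dept))  # 5, one tuple per employee

# A lossy split of EMP: the common attribute D# is not a key of either part.
lossy = natural_join(project(emp, ["E#", "D#"]),
                     project(emp, ["D#", "ENAME", "BDATE"]))
print(len(lossy))     # 13 = 3*3 + 2*2, so 8 spurious tuples
```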
17. 22. Lossless-join, decomposition into 2 projections <ul><li>A decomposition $D = \{R_1, R_2\}$ of R is a lossless-join decomposition w.r.t. F iff </li></ul><ul><ul><li>$(R_1 \cap R_2) \to R_1 \in F^+$, or </li></ul></ul><ul><ul><li>$(R_1 \cap R_2) \to R_2 \in F^+$ </li></ul></ul><ul><li>Example: R = DPD_EMP = {E#, DPD_N, REL, EMP_N, BDATE, D#} with F = { {E#, DPD_N} → REL, E# → EMP_N, E# → BDATE, E# → D# } </li></ul><ul><ul><li>$R_1$ = DPD = {E#, DPD_N, REL} </li></ul></ul><ul><ul><li>$R_2$ = EMP = {E#, EMP_N, BDATE, D#} </li></ul></ul><ul><ul><li>$R_1 \cap R_2$ = {E#} and E# → {E#, EMP_N, BDATE, D#} (= EMP) is in $F^+$ </li></ul></ul><ul><li>Hence this decomposition of DPD_EMP is lossless w.r.t. F </li></ul>
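The test for two projections reduces to one attribute closure: compute $(R_1 \cap R_2)^+$ and check whether it contains $R_1$ or $R_2$. Below is a minimal Python sketch on the DPD_EMP example. It assumes FD's are pairs of attribute sets, and the function names are hypothetical. The second call uses a hypothetical split whose only common attribute, D#, determines nothing under F.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1: Set[str], r2: Set[str], fds: List[FD]) -> bool:
    """{R1, R2} is lossless iff (R1 intersect R2)+ contains R1 or R2."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common


F = [({"E#", "DPD_N"}, {"REL"}), ({"E#"}, {"EMP_N"}),
     ({"E#"}, {"BDATE"}), ({"E#"}, {"D#"})]
dpd = {"E#", "DPD_N", "REL"}
emp = {"E#", "EMP_N", "BDATE", "D#"}
print(lossless_binary(dpd, emp, F))  # True: E# -> EMP

# Hypothetical lossy split: the common attribute D# determines nothing here.
print(lossless_binary({"E#", "DPD_N", "REL", "D#"}, {"D#", "EMP_N", "BDATE"}, F))  # False
```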
18. 23. Ex. lossless-join (with n projections) (1/5) <ul><li>DPD_EMP_DPM(E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET) </li></ul><ul><ul><li>with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET } </li></ul></ul><ul><ul><li>is decomposed into: </li></ul></ul><ul><ul><li>DPD(E#, DPD_N, REL) </li></ul></ul><ul><ul><li>EMP(E#, EMP_N, BDATE, D#) </li></ul></ul><ul><ul><li>DEPT(D#, DPM_N, BUDGET) </li></ul></ul>
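19. 24. Ex. lossless-join (with n projections) (2/5) <ul><li>DPD_EMP_DPM(E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET) </li></ul><ul><li>with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET } </li></ul><ul><li>Initial matrix: the entry in row i and column j is $a_j$ if the subrelation of row i has the attribute of column j, otherwise $b_{ij}$ </li></ul>
<table>
<tr><th></th><th>E#</th><th>DPD_N</th><th>REL</th><th>EMP_N</th><th>BDATE</th><th>D#</th><th>DPM_N</th><th>BUDGET</th></tr>
<tr><th>DPD</th><td>$a_1$</td><td>$a_2$</td><td>$a_3$</td><td>$b_{14}$</td><td>$b_{15}$</td><td>$b_{16}$</td><td>$b_{17}$</td><td>$b_{18}$</td></tr>
<tr><th>EMP</th><td>$a_1$</td><td>$b_{22}$</td><td>$b_{23}$</td><td>$a_4$</td><td>$a_5$</td><td>$a_6$</td><td>$b_{27}$</td><td>$b_{28}$</td></tr>
<tr><th>DEPT</th><td>$b_{31}$</td><td>$b_{32}$</td><td>$b_{33}$</td><td>$b_{34}$</td><td>$b_{35}$</td><td>$a_6$</td><td>$a_7$</td><td>$a_8$</td></tr>
</table>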
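20. 25. Ex. lossless-join (with n projections) (3/5) <ul><li>DPD_EMP_DPM(E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET) </li></ul><ul><li>with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET } </li></ul>
<table>
<tr><th></th><th>E#</th><th>DPD_N</th><th>REL</th><th>EMP_N</th><th>BDATE</th><th>D#</th><th>DPM_N</th><th>BUDGET</th></tr>
<tr><th>DPD</th><td>$a_1$</td><td>$a_2$</td><td>$a_3$</td><td>$b_{14}$</td><td>$b_{15}$</td><td>$b_{16}$</td><td>$b_{17}$</td><td>$b_{18}$</td></tr>
<tr><th>EMP</th><td>$a_1$</td><td>$b_{22}$</td><td>$b_{23}$</td><td>$a_4$</td><td>$a_5$</td><td>$a_6$</td><td>$b_{27}$</td><td>$b_{28}$</td></tr>
<tr><th>DEPT</th><td>$b_{31}$</td><td>$b_{32}$</td><td>$b_{33}$</td><td>$b_{34}$</td><td>$b_{35}$</td><td>$a_6$</td><td>$a_7$</td><td>$a_8$</td></tr>
</table>
<ul><li>N.B. Next, apply the FD E# → {EMP_N, BDATE, D#}: rows DPD and EMP agree on E# (both $a_1$), so their EMP_N, BDATE and D# entries must be made equal </li></ul>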
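21. 26. Ex. lossless-join (with n projections) (4/5) <ul><li>DPD_EMP_DPM(E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET) </li></ul><ul><li>with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET } </li></ul>
<table>
<tr><th></th><th>E#</th><th>DPD_N</th><th>REL</th><th>EMP_N</th><th>BDATE</th><th>D#</th><th>DPM_N</th><th>BUDGET</th></tr>
<tr><th>DPD</th><td>$a_1$</td><td>$a_2$</td><td>$a_3$</td><td>$a_4$</td><td>$a_5$</td><td>$a_6$</td><td>$b_{17}$</td><td>$b_{18}$</td></tr>
<tr><th>EMP</th><td>$a_1$</td><td>$b_{22}$</td><td>$b_{23}$</td><td>$a_4$</td><td>$a_5$</td><td>$a_6$</td><td>$b_{27}$</td><td>$b_{28}$</td></tr>
<tr><th>DEPT</th><td>$b_{31}$</td><td>$b_{32}$</td><td>$b_{33}$</td><td>$b_{34}$</td><td>$b_{35}$</td><td>$a_6$</td><td>$a_7$</td><td>$a_8$</td></tr>
</table>
<ul><li>N.B. After applying the FD E# → {EMP_N, BDATE, D#} </li></ul>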
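22. 27. Ex. lossless-join (with n projections) (5/5) <ul><li>DPD_EMP_DPM(E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET) </li></ul><ul><li>with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET } </li></ul>
<table>
<tr><th></th><th>E#</th><th>DPD_N</th><th>REL</th><th>EMP_N</th><th>BDATE</th><th>D#</th><th>DPM_N</th><th>BUDGET</th></tr>
<tr><th>DPD</th><td>$a_1$</td><td>$a_2$</td><td>$a_3$</td><td>$a_4$</td><td>$a_5$</td><td>$a_6$</td><td>$a_7$</td><td>$a_8$</td></tr>
<tr><th>EMP</th><td>$a_1$</td><td>$b_{22}$</td><td>$b_{23}$</td><td>$a_4$</td><td>$a_5$</td><td>$a_6$</td><td>$a_7$</td><td>$a_8$</td></tr>
<tr><th>DEPT</th><td>$b_{31}$</td><td>$b_{32}$</td><td>$b_{33}$</td><td>$b_{34}$</td><td>$b_{35}$</td><td>$a_6$</td><td>$a_7$</td><td>$a_8$</td></tr>
</table>
<ul><li>N.B. After applying the FD's E# → {EMP_N, BDATE, D#} and D# → {DPM_N, BUDGET} </li></ul><ul><li>The lossless-join property has been shown: row DPD contains only a's! </li></ul>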
23. 28. Algorithm lossless-join <ul><li>Does the decomposition $R_1, \dots, R_k$ of R satisfy the lossless-join property w.r.t. a set of FD's F? </li></ul><ul><li>1) Construct the initial matrix, consisting of </li></ul><ul><ul><ul><li>a row for each subrelation $R_i$ </li></ul></ul></ul><ul><ul><ul><li>a column for each attribute of R </li></ul></ul></ul><ul><ul><ul><li>for each entry, i.e. for each row i and column j: </li></ul></ul></ul><ul><ul><ul><ul><li>put $a_j$ in the entry if $R_i$ has the attribute of column j </li></ul></ul></ul></ul><ul><ul><ul><ul><li>otherwise, put $b_{ij}$ in this entry </li></ul></ul></ul></ul><ul><li>2) Apply each FD $X \to Y \in F$ (see below), repeatedly, until </li></ul><ul><ul><ul><li>either one row is of the form $a_1, \dots, a_n$, and the decomposition is indeed lossless </li></ul></ul></ul><ul><ul><ul><li>or applying the FD's again no longer changes the matrix, and the decomposition is not lossless </li></ul></ul></ul><ul><li>Applying FD $X \to Y$: for all rows that have the same values for the attributes in X, make their values for the attributes in Y equal as well (if one of these values is an $a_j$, use $a_j$; otherwise choose one of the $b_{ij}$) </li></ul>
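Below is a minimal Python sketch of this matrix test (often called the chase), run on the DPD_EMP_DPM example from the previous slides. It assumes matrix symbols are plain strings such as "a1" and "b14" and attribute sets are Python sets; the function name is hypothetical. The sketch applies the FD's in the order of F and checks for an all-a row after each full pass, so its intermediate matrices can differ from the slides while reaching the same final matrix. Besides the rule on the slide, it renames the dropped symbol everywhere in its column, as the standard formulation of the test does; the example does not depend on this.

```python
from typing import List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def chase(attrs: List[str], subrels: List[Set[str]],
          fds: List[FD]) -> Tuple[bool, List[List[str]]]:
    """Matrix test for the lossless-join property; returns (lossless, final matrix)."""
    col = {a: j for j, a in enumerate(attrs)}
    # Step 1: a_j if R_i contains the attribute of column j, otherwise b_ij.
    m = [[f"a{j + 1}" if a in ri else f"b{i + 1}{j + 1}" for j, a in enumerate(attrs)]
         for i, ri in enumerate(subrels)]
    changed = True
    while changed:
        changed = False
        # Step 2: apply every FD X -> Y to every pair of rows agreeing on X.
        for lhs, rhs in fds:
            for i in range(len(m)):
                for k in range(i + 1, len(m)):
                    if any(m[i][col[x]] != m[k][col[x]] for x in lhs):
                        continue
                    for y in rhs:
                        j = col[y]
                        u, v = m[i][j], m[k][j]
                        if u == v:
                            continue
                        # Prefer the distinguished symbol a_j; otherwise keep one b.
                        keep, drop = (u, v) if u.startswith("a") else (v, u)
                        # Rename the dropped symbol everywhere in column j.
                        for row in m:
                            if row[j] == drop:
                                row[j] = keep
                        changed = True
        if any(all(s.startswith("a") for s in row) for row in m):
            return True, m
    return False, m


attrs = ["E#", "DPD_N", "REL", "EMP_N", "BDATE", "D#", "DPM_N", "BUDGET"]
subrels = [{"E#", "DPD_N", "REL"},           # DPD
           {"E#", "EMP_N", "BDATE", "D#"},   # EMP
           {"D#", "DPM_N", "BUDGET"}]        # DEPT
F = [({"E#", "DPD_N"}, {"REL"}), ({"D#"}, {"DPM_N"}), ({"E#"}, {"EMP_N"}),
     ({"D#"}, {"BUDGET"}), ({"E#"}, {"BDATE"}), ({"DPM_N"}, {"D#"}),
     ({"E#"}, {"D#"}), ({"DPM_N"}, {"BUDGET"})]
lossless, matrix = chase(attrs, subrels, F)
print(lossless)   # True
print(matrix[0])  # ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8']
```

The final matrix matches the one on slide 5/5; for a lossy decomposition the loop simply stops when a full pass changes nothing.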
24. 29. 3NF decomposition algorithm (lossless + dependency preserving) <ul><li>Given a relation scheme R and a set of FD's F: </li></ul><ul><li>1) Construct a minimal cover of F, call it G </li></ul><ul><li>2) For each $X_i \to A_i$ in G, construct a subrelation scheme $X_i A_i$ </li></ul><ul><li>3) If $X_i = X_j$ for two subrelation schemes ($X_i A_i$ and $X_j A_j$), merge them into $X_i A_i A_j$ </li></ul><ul><li>4) Construct one relation scheme X, where X is the primary key of R (this step can be skipped if one of the schemes already contains a key of R) </li></ul><ul><li>5) Check whether there are still remaining attributes (not yet covered by the earlier steps), and put them in separate subrelation schemes </li></ul><ul><li>N.B. Steps 2 and 3 can also be combined into one step. </li></ul>
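Below is a minimal Python sketch of these steps, run on EMP_PROJ from the 2NF slide. All function names are hypothetical. Steps 2 and 3 are combined by grouping the minimal cover on left-hand sides. Step 4 only adds a key scheme when no scheme already contains a key, a common refinement; with step 4 applied unconditionally, EMP_PROJ would get an extra scheme {SSN, PNUMBER}. The key is found by shrinking R attribute by attribute, which yields one candidate key, not necessarily the primary key chosen by a designer.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], str]  # X -> A with a single attribute A


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds: List[Tuple[Iterable[str], Iterable[str]]]) -> List[FD]:
    g: List[FD] = []
    for lhs, rhs in fds:                      # split right-hand sides
        for a in sorted(rhs):
            if (frozenset(lhs), a) not in g:
                g.append((frozenset(lhs), a))
    for i, (lhs, a) in enumerate(g):          # reduce left-hand sides
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, a)
    g = list(dict.fromkeys(g))
    for item in list(g):                      # drop redundant FD's
        rest = [f for f in g if f != item]
        if item[1] in closure(item[0], rest):
            g = rest
    return g


def find_key(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Shrink R to a candidate key by dropping attributes that are not needed."""
    key = set(attrs)
    for a in sorted(attrs):
        if closure(key - {a}, fds) >= attrs:
            key.discard(a)
    return key


def synthesize_3nf(attrs: Set[str], fds) -> List[Set[str]]:
    g = minimal_cover(fds)                                 # step 1
    by_lhs: Dict[FrozenSet[str], Set[str]] = {}
    for lhs, a in g:                                       # steps 2 and 3 combined
        by_lhs.setdefault(lhs, set(lhs)).add(a)
    schemes = list(by_lhs.values())
    if not any(closure(s, g) >= attrs for s in schemes):   # step 4, only if needed
        schemes.append(find_key(attrs, g))
    covered = set().union(*schemes)
    schemes += [{a} for a in sorted(attrs - covered)]      # step 5
    return schemes


emp_proj = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"}
emp_proj_fds = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"}),
                ({"PNUMBER"}, {"PNAME", "PLOC"})]
for scheme in synthesize_3nf(emp_proj, emp_proj_fds):
    print(sorted(scheme))
# ['HOURS', 'PNUMBER', 'SSN']
# ['ENAME', 'SSN']
# ['PLOC', 'PNAME', 'PNUMBER']
```

The result coincides with EMP_PROJ1, EMP and PROJ from the 2NF slide. For R(A, B, C) with only A → B, no scheme contains a key, so step 4 adds {A, C}.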
25. 30. Final remarks <ul><li>With this material one should be able to recognize potential redundancies and to avoid them to a certain extent </li></ul><ul><li>Normalisation can lead to a large number of small tables (i.e., tables with few attributes), which have to be joined with others to obtain the required data. This may cost performance and add maintenance overhead. </li></ul><ul><li>Hence, it is up to the database designer to decide whether these potential redundancies have to be resolved or not. </li></ul>