Databases 1
 


  • Databases I (H.10/H.11): Summary of FD’s and Normalisation. Steven Klusener, Vrije Universiteit, Amsterdam, version 2007
  • Topics
    • Functional dependencies (FD’s)
      • Formal definition of an FD
      • Deriving FD’s from others, or, constructing a minimal cover
      • Showing counter examples
    • Candidate and primary keys, based on FD’s
    • Normalisation: 2NF, 3NF and BCNF
    • Decomposition and the lossless join property
    • A 3NF decomposition algorithm that satisfies the lossless join property.
  • Functional Dependency (FD)
    • Given:
      • A relation R with attributes A1, A2, …, A_n
      • X ⊆ {A1, A2, …, A_n} and Y ⊆ {A1, A2, …, A_n}
    • Y depends functionally on X (notation: X → Y) iff for each possible extension of R it holds that
      • ∀ t1, t2 ∈ R: (t1[X] = t2[X]) ⇒ (t1[Y] = t2[Y])
    • Note that if all tuples in R have a different value for X, then the FD X → Y is trivially satisfied
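    • The definition above can be checked mechanically on one concrete extension. The sketch below is only an illustration: the function name `fd_holds` and the sample rows are assumptions, not part of the slides. Note that a single extension can refute an FD but never prove it, since an FD must hold for every possible extension of R.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this one relation instance.

    rows: list of dicts (one dict per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps the X-value of a tuple to the Y-value first seen with it
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True


# Hypothetical sample extension (made up for this illustration).
emp = [
    {"ENR": "E1", "BDATE": "1964-08-28", "ZODIAC": "Virgo"},
    {"ENR": "E2", "BDATE": "1968-04-04", "ZODIAC": "Aries"},
    {"ENR": "E3", "BDATE": "1969-09-03", "ZODIAC": "Virgo"},
]

print(fd_holds(emp, ["ENR"], ["BDATE"]))     # True: all ENR values differ
print(fd_holds(emp, ["ZODIAC"], ["BDATE"]))  # False: two Virgo tuples, two dates
```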
  • Minimal set of FD’s
    • Certain functional dependencies can be derived from others
    • For example
      • ENR → BDATE
        • (ENR value E1 (John) determines the birth date 1964-08-28)
      • BDATE → ZODIAC (zodiac sign; Dutch: “sterrenbeeld”)
        • (The birth date 1964-08-28 determines the zodiac value Virgo)
    • Hence, each employee number determines the zodiac sign of that employee, so we have ENR → ZODIAC, which follows from ENR → BDATE and BDATE → ZODIAC (by transitivity, as we will see later)
  • Derivability of FD’s, closure
    • We can derive FD’s from others using the following inference rules:
      • Reflexive Rule: the FD X → Y is given, and hence derivable, for any Y ⊆ X.
      • Augmentation Rule: the FD XZ → YZ is derivable from the FD X → Y.
      • Transitive Rule: the FD X → Z is derivable from the two FD’s X → Y and Y → Z.
    • From these rules other inference rules can be obtained, such as:
      • Decomposition Rule: the FD X → Y is derivable from the FD X → YZ.
      • Additive Rule: the composed FD X → YZ is derivable from the two FD’s X → Y and X → Z.
    • For a set of FD’s F we write F+ for the set of all FD’s that can be derived from F; F+ is called the closure of F
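    • Enumerating F+ itself is rarely practical. A common alternative, sketched here under the assumption that FD’s are stored as (left-hand side, right-hand side) pairs of Python sets, is to compute the attribute closure of X under F: X → Y is in F+ exactly when Y is contained in that closure. The function names `closure` and `derivable` are illustrative choices.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already determined, so is the right.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def derivable(lhs, rhs, fds):
    """True if the FD lhs -> rhs is in the closure F+ of fds."""
    return set(rhs) <= closure(lhs, fds)


F = [({"ENR"}, {"BDATE"}), ({"BDATE"}, {"ZODIAC"})]

print(derivable({"ENR"}, {"ZODIAC"}, F))  # True, by transitivity
print(derivable({"ZODIAC"}, {"ENR"}, F))  # False
```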
  • Minimal set of FD’s
    • A set F of functional dependencies is minimal iff:
      • Every FD in F is of the form X → A, where A is a single attribute
      • We cannot make any left-hand side smaller: if an FD XY → A is replaced by X → A, then the closure changes (it becomes strictly larger)
      • We cannot remove FD’s from F without losing FD’s in the closure of F
    • In general, the construction of a minimal cover does not lead to a unique result
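    • One possible way to construct a minimal cover follows the three conditions in order: split right-hand sides, shrink left-hand sides, drop redundant FD’s. The sketch below is an assumption about how this could be coded, not the procedure from the course; the extra FD’s in the example set are made up, and a different processing order may give a different (equally valid) cover.

```python
def closure(attrs, fds):
    """Attribute closure under fds, given as (frozenset lhs, single attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds):
    """fds: list of (lhs set, rhs set). Returns a list of (frozenset lhs, attribute)."""
    # 1) split right-hand sides so that every FD has the form X -> A
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    # 2) shrink left-hand sides: drop b if the smaller side still determines a
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            smaller = g[i][0] - {b}
            if smaller and a in closure(smaller, g):
                g[i] = (smaller, a)
    g = list(dict.fromkeys(g))  # remove duplicates created by step 2
    # 3) drop every FD that is derivable from the remaining ones
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


F = [
    ({"ENR"}, {"BDATE"}),
    ({"BDATE"}, {"ZODIAC"}),
    ({"ENR"}, {"ZODIAC"}),          # redundant: follows by transitivity
    ({"ENR", "NAME"}, {"BDATE"}),   # left-hand side can be shrunk to ENR
]

for lhs, a in minimal_cover(F):
    print(sorted(lhs), "->", a)
```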
  • Keys
    • Given
      • A relation R with attributes A1, A2, …, A_n
      • A minimal set of FD’s F
      • A set of attributes X ⊆ {A1, A2, …, A_n}
    • then X is a (candidate) key of R iff:
    • 1) identification: X → A_i is derivable from F for each i in [1..n], and
    • 2) minimality: for every Y with Y ⊆ X ∧ Y ≠ X there is some i in [1..n] for which Y → A_i is not derivable from F
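    • The two conditions translate into a brute-force search: try attribute sets in order of increasing size, keep those whose closure is all of R, and skip any set that contains a key already found. This is a sketch for small schemas only (it is exponential in the number of attributes); the function names are assumptions, and the example uses the TEACH relation from the BCNF slide.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all candidate keys of a relation with the given attributes."""
    keys = []
    names = sorted(attributes)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key, so X is not minimal
            if closure(x, fds) == set(attributes):  # identification
                keys.append(x)
    return keys


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

for key in candidate_keys(TEACH, F):
    print(sorted(key))
```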
  • Some terminology w.r.t. keys
    • superkey (identification, but no minimality)
      • Like (ENR,NAME)
    • candidate key (a candidate, see previous slide)
      • Like (ENR)
    • primary key (the selected candidate key)
    • alternate/alternative key (all remaining candidate keys)
  • Categorizing FD’s, based on the key
    • Given a relation R and a minimal set of FD’s F
    • An attribute A is a prime attribute if A ∈ K for some candidate key K; otherwise it is a non-prime attribute
    • If A is a non-prime attribute, then X → A
      • is a regular FD if X is a candidate key
      • is a partial FD if X ⊂ K, for some candidate key K
      • is a transitive FD if all attributes in X are non-prime
    • If A is a prime attribute, then X → A is a prime FD
  • Normal forms
    • Given a relation R, a minimal set of FD’s F
    • R is in
    • 2NF, if there are no partial FD’s
    • 3NF, if there are no partial and no transitive FD’s
    • BCNF, if R is in 3NF and if for every X → A in F it holds that X is a key
  • 2NF
    • 2NF: no partial FD’s are allowed; if they occur, the relation has to be decomposed.
    • EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC), key (SSN, PNUMBER)
    • With FD’s
    • SSN, PNUMBER → HOURS
    • SSN → ENAME (partial dependency)
    • PNUMBER → PNAME, PLOC (partial dependency)
    • After decomposition:
    • EMP_PROJ1(SSN, PNUMBER, HOURS), key (SSN, PNUMBER)
    • EMP(SSN, ENAME), key SSN
    • PROJ(PNUMBER, PNAME, PLOC), key PNUMBER
    • Note
      • If the key contains only one attribute, the 2NF property holds trivially
      • Be sure that the decomposition is lossless (to be discussed later)
  • 3NF
    • 3NF: no transitive FD’s are allowed; if they occur, the relation has to be decomposed.
    • EMP_DEPT(SSN, ENAME, BDATE, DNUMBER, DNAME, DMGRSSN), key SSN
    • With FD’s
    • SSN → ENAME, BDATE, DNUMBER
    • DNUMBER → DNAME, DMGRSSN (transitive dependency)
    • After decomposition:
    • EMP(SSN, ENAME, BDATE, DNUMBER), key SSN
    • DEPT(DNUMBER, DNAME, DMGRSSN), key DNUMBER
    • Rephrasing the 3NF property: for every non-prime attribute A and FD X → A in F, X must be a candidate key
  • BCNF
    • For every FD X → A, X must be a superkey
    • TEACH(STUDENT, COURSE, INSTRUCTOR), key (STUDENT, COURSE)
    • With FD’s
    • FD1: STUDENT, COURSE → INSTRUCTOR
    • FD2: INSTRUCTOR → COURSE (INSTRUCTOR is not a key)
    • Decomposition is not trivial; there are three options:
    • S-I-1(STUDENT, INSTRUCTOR) and S-C-1(STUDENT, COURSE)
    • C-I-2(INSTRUCTOR, COURSE) and C-S-2(COURSE, STUDENT)
    • I-C-3(INSTRUCTOR, COURSE) and I-S-3(INSTRUCTOR, STUDENT)
    • Conclusion:
    • FD1 is lost in all three cases (no subrelation contains all three attributes)
    • I-C-3/I-S-3 is the best, because its join is non-additive, i.e. lossless (to be discussed later)
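    • The BCNF condition can be tested with attribute closures: the left-hand side X of an FD is a superkey exactly when its closure contains every attribute of the relation. The sketch below only inspects the FD’s listed in F and skips trivial ones; the function name `bcnf_violations` is an assumption made for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs in fds whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                  # skip trivial FDs
        and closure(lhs, fds) != set(attributes)     # lhs is not a superkey
    ]


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

for lhs, rhs in bcnf_violations(TEACH, F):
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```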
  • Dependency preservation during decomposition
    • During normalisation we decompose relations into subrelations, until we arrive at the right level
    • However, we have to take care that this decomposition guarantees:
      • Lossless joins: we must be able to construct the original relation from joining the subrelations
      • In other words, joining the subrelations may not introduce spurious tuples
  • Example Natural Join (⋈)
    • EMP
      • E#  ENAME    BDATE       D#
      • E1  John     28-08-1964  D1
      • E2  Joe      04-04-1968  D1
      • E3  Jack     03-09-1969  D1
      • E4  Will     21-03-1971  D2
      • E5  Bridget  22-01-1972  D2
    • DEPT
      • D#  DNAME        BUDGET
      • D1  engineering  500,000
      • D2  sales        200,000
    • EMP ⋈ DEPT: the join of EMP and DEPT combines every tuple of EMP with every tuple of DEPT whose common attributes (here D#) have the same value
    • EMP ⋈ DEPT
      • E#  ENAME    BDATE       D#  DNAME        BUDGET
      • E1  John     28-08-1964  D1  engineering  500,000
      • E2  Joe      04-04-1968  D1  engineering  500,000
      • E3  Jack     03-09-1969  D1  engineering  500,000
      • E4  Will     21-03-1971  D2  sales        200,000
      • E5  Bridget  22-01-1972  D2  sales        200,000
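    • The join can be reproduced with a small nested-loop sketch over the two example tables. Representing each tuple as a Python dict and storing the budgets as integers are assumptions made for this illustration; the function name `natural_join` is likewise hypothetical.

```python
def natural_join(r, s):
    """Natural join of two relations, each given as a list of dicts."""
    if not r or not s:
        return []
    common = sorted(set(r[0]) & set(s[0]))  # attributes the relations share
    joined = []
    for t in r:
        for u in s:
            # Combine the tuples only if they agree on all common attributes.
            if all(t[a] == u[a] for a in common):
                joined.append({**t, **u})
    return joined


emp = [
    {"E#": "E1", "ENAME": "John", "BDATE": "28-08-1964", "D#": "D1"},
    {"E#": "E2", "ENAME": "Joe", "BDATE": "04-04-1968", "D#": "D1"},
    {"E#": "E3", "ENAME": "Jack", "BDATE": "03-09-1969", "D#": "D1"},
    {"E#": "E4", "ENAME": "Will", "BDATE": "21-03-1971", "D#": "D2"},
    {"E#": "E5", "ENAME": "Bridget", "BDATE": "22-01-1972", "D#": "D2"},
]
dept = [
    {"D#": "D1", "DNAME": "engineering", "BUDGET": 500000},
    {"D#": "D2", "DNAME": "sales", "BUDGET": 200000},
]

joined = natural_join(emp, dept)
for row in joined:
    print(row)
```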
  • Lossless-join, decomp. into 2 projections
    • D = {R1, R2} is a lossless join decomposition of R w.r.t. F iff
      • (R1 ∩ R2) → R1 ∈ F+, or
      • (R1 ∩ R2) → R2 ∈ F+
    • Example:
      • R = DPD_EMP = {E#, DPD_N, REL, EMP_N, BDATE, D#}
      • F = { {E#, DPD_N} → REL, E# → EMP_N, E# → BDATE, E# → D# }
      • R1 = DPD = {E#, DPD_N, REL}
      • R2 = EMP = {E#, EMP_N, BDATE, D#}
      • R1 ∩ R2 = {E#} and E# → {E#, EMP_N, BDATE, D#} (= EMP)
      • Hence: this decomposition of DPD_EMP is lossless w.r.t. F
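    • For two projections the test reduces to one closure computation: the closure of the common attributes must contain all of R1 or all of R2. The following sketch assumes FD’s are stored as pairs of Python sets and uses a hypothetical function name `lossless_binary`; it is run on the DPD/EMP example and on two of the TEACH decompositions from the BCNF slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if splitting a relation into r1 and r2 is lossless w.r.t. fds."""
    determined = closure(set(r1) & set(r2), fds)
    return set(r1) <= determined or set(r2) <= determined


F = [
    ({"E#", "DPD_N"}, {"REL"}),
    ({"E#"}, {"EMP_N"}),
    ({"E#"}, {"BDATE"}),
    ({"E#"}, {"D#"}),
]
DPD = {"E#", "DPD_N", "REL"}
EMP = {"E#", "EMP_N", "BDATE", "D#"}
print(lossless_binary(DPD, EMP, F))  # True

F_TEACH = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
print(lossless_binary({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}, F_TEACH))  # True
print(lossless_binary({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}, F_TEACH))     # False
```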
  • Ex. lossless-join (with n projections) (1/5)
    • DPD_EMP_DPM (E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET)
      • with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET }
      • DPD (E#, DPD_N, REL)
      • EMP (E#, EMP_N, BDATE, D#)
      • DEPT (D#, DPM_N, BUDGET)
  • Ex. lossless-join (with n projections) (2/5)
    • DPD_EMP_DPM (E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET)
    • with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET }
    • INITIAL MATRIX: a_j in each row (subrelation) that has the attribute of column j, b_ij otherwise
    •         E#   DPD_N  REL  EMP_N  BDATE  D#   DPM_N  BUDGET
    • DPD     a1   a2     a3   b14    b15    b16  b17    b18
    • EMP     a1   b22    b23  a4     a5     a6   b27    b28
    • DEPT    b31  b32    b33  b34    b35    a6   a7     a8
  • Ex. lossless-join (with n projections) (3/5)
    • DPD_EMP_DPM (E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET)
    • with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET }
    •         E#   DPD_N  REL  EMP_N  BDATE  D#   DPM_N  BUDGET
    • DPD     a1   a2     a3   b14    b15    b16  b17    b18
    • EMP     a1   b22    b23  a4     a5     a6   b27    b28
    • DEPT    b31  b32    b33  b34    b35    a6   a7     a8
    • N.B. Next we apply the FD E# → {EMP_N, BDATE, D#}: rows DPD and EMP agree on E# (both have a1)
  • Ex. lossless-join (with n projections) (4/5)
    • DPD_EMP_DPM (E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET)
    • with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET }
    •         E#   DPD_N  REL  EMP_N  BDATE  D#   DPM_N  BUDGET
    • DPD     a1   a2     a3   a4     a5     a6   b17    b18
    • EMP     a1   b22    b23  a4     a5     a6   b27    b28
    • DEPT    b31  b32    b33  b34    b35    a6   a7     a8
    • N.B. After applying the FD E# → {EMP_N, BDATE, D#}
  • Ex. lossless-join (with n projections) (5/5)
    • DPD_EMP_DPM (E#, DPD_N, REL, EMP_N, BDATE, D#, DPM_N, BUDGET)
    • with F = { {E#, DPD_N} → REL, D# → DPM_N, E# → EMP_N, D# → BUDGET, E# → BDATE, DPM_N → D#, E# → D#, DPM_N → BUDGET }
    •         E#   DPD_N  REL  EMP_N  BDATE  D#   DPM_N  BUDGET
    • DPD     a1   a2     a3   a4     a5     a6   a7     a8
    • EMP     a1   b22    b23  a4     a5     a6   a7     a8
    • DEPT    b31  b32    b33  b34    b35    a6   a7     a8
    • N.B. After applying the FD’s E# → {EMP_N, BDATE, D#} and D# → {DPM_N, BUDGET}
    • The lossless-join property has been shown: row DPD contains only a’s!
  • Algorithm lossless-join
    • Does the decomposition R1, …, R_k of R satisfy the lossless-join property w.r.t. a set of FD’s F?
    • 1) construct the initial matrix, consisting of
        • A row for each subrelation R_i
        • A column for each attribute of R
        • For each entry, thus for each row i and column j
          • Put a_j in the entry, if R_i has the attribute of column j
          • Otherwise, put b_ij in this entry
    • 2) apply each FD X → Y ∈ F (see below) until
        • Either one row is of the form a1, …, a_n, and the decomposition is indeed lossless
        • Or applying the FD’s again does not change the matrix anymore, and the decomposition is not lossless
    • Applying FD X → Y: for all rows that have the same values for the attributes in X, make the values for the attributes in Y equal as well (use a_j if one of these rows has it, otherwise pick one of the b_ij)
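    • The matrix algorithm can be sketched as follows. The encoding is an assumption made for this illustration: the symbol a_j is the tuple ("a", j) and b_ij is ("b", i, j), so that `min` prefers an a over any b. For simplicity the sketch runs until the matrix stops changing and only then looks for a row of a’s, instead of stopping at the first such row.

```python
def lossless_join(attributes, subrelations, fds):
    """Matrix test for the lossless-join property.

    attributes: list of attribute names of R; subrelations: list of sets;
    fds: list of (lhs, rhs) pairs of sets.
    """
    # Step 1: initial matrix, one dict per row, keyed by attribute name.
    matrix = [
        {a: ("a", j) if a in r else ("b", i, j)
         for j, a in enumerate(attributes, 1)}
        for i, r in enumerate(subrelations, 1)
    ]
    # Step 2: apply the FDs until nothing changes any more.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}  # rows grouped by their values for the attributes in X
            for row in matrix:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for rows in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in rows}
                    if len(symbols) > 1:
                        target = min(symbols)  # ("a", j) sorts before any ("b", i, j)
                        for row in matrix:     # equate the symbols everywhere
                            if row[a] in symbols:
                                row[a] = target
                        changed = True
    return any(all(row[a][0] == "a" for a in attributes) for row in matrix)


R = ["E#", "DPD_N", "REL", "EMP_N", "BDATE", "D#", "DPM_N", "BUDGET"]
F = [
    ({"E#", "DPD_N"}, {"REL"}), ({"D#"}, {"DPM_N"}), ({"E#"}, {"EMP_N"}),
    ({"D#"}, {"BUDGET"}), ({"E#"}, {"BDATE"}), ({"DPM_N"}, {"D#"}),
    ({"E#"}, {"D#"}), ({"DPM_N"}, {"BUDGET"}),
]
DPD = {"E#", "DPD_N", "REL"}
EMP = {"E#", "EMP_N", "BDATE", "D#"}
DEPT = {"D#", "DPM_N", "BUDGET"}

print(lossless_join(R, [DPD, EMP, DEPT], F))  # True: row DPD ends up with only a's
```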
  • 3NF decomposition algorithm (lossless + dependency preserving)
    • Given a relation scheme R and a set of FD’s F:
    • 1) Construct a minimal cover of F, call it G
    • 2) For each X_i → A_i in G, construct a subrelation scheme X_i A_i
    • 3) If X_i = X_j for two subrelation schemes (X_i A_i and X_j A_j), merge them into X_i A_i A_j
    • 4) Construct one relation scheme X, where X is the primary key of R
    • 5) Check whether there are still remaining attributes (not yet covered by the earlier steps), and put them in separate subrelation schemes
    • N.B. Steps 2 and 3 can also be combined into one step.
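    • A rough sketch of the decomposition steps, assuming the minimal cover G and a key of R are already given as input. It deviates from the slide in two labelled ways: the key scheme is only added when no existing scheme already contains the key, and any leftover attributes are collected in one scheme. The function name `synthesize_3nf` is hypothetical; the example is the EMP_DEPT relation from the 3NF slide.

```python
def synthesize_3nf(attributes, cover, key):
    """cover: minimal cover as a list of (lhs set, single attribute) pairs;
    key: a key of R as a set of attribute names."""
    # Grouping by left-hand side builds X_i A_i and merges equal X_i in one pass.
    schemes = {}
    for lhs, a in cover:
        schemes.setdefault(frozenset(lhs), set(lhs)).add(a)
    result = list(schemes.values())
    # Key scheme (assumption: skipped if some scheme already contains the key).
    if not any(set(key) <= s for s in result):
        result.append(set(key))
    # Leftover attributes (assumption: collected in a single scheme).
    rest = set(attributes) - set().union(*result)
    if rest:
        result.append(rest)
    return result


EMP_DEPT = {"SSN", "ENAME", "BDATE", "DNUMBER", "DNAME", "DMGRSSN"}
G = [
    ({"SSN"}, "ENAME"), ({"SSN"}, "BDATE"), ({"SSN"}, "DNUMBER"),
    ({"DNUMBER"}, "DNAME"), ({"DNUMBER"}, "DMGRSSN"),
]

for scheme in synthesize_3nf(EMP_DEPT, G, {"SSN"}):
    print(sorted(scheme))
```

    • On this input the sketch yields the same two schemes as the EMP/DEPT decomposition shown on the 3NF slide.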
  • Final remarks
    • With this material one should be able to recognize potential redundancies, and to avoid them to a certain extent
    • Normalisation can lead to a large number of small tables (i.e., tables with a small number of attributes) which have to be joined with others to obtain the complete data. This may cost performance and add maintenance overhead.
    • Hence, it is up to the database designer to decide whether these potential redundancies have to be resolved or not.