1. CS 222
Database Management System
Spring 2010-11
Lecture 5
Database Design (Decomposition)
Korra Sathya Babu
Department of Computer Science
NIT Rourkela
2. Recap
• Design of DB is needed to reduce redundancy and
anomalies
• The theory of functional dependencies has already been
covered
• Better Design requires schema refinement
• A solution for schema refinement is Synthesis of
relations
2/11/2013 Database Design 2
4. Relation Decomposition
• Reason for Decomposition
• A solution for reducing redundancy and Anomalies
• Rules for synthesis
• Lossless Join (Information Preservation)
• Dependency Preservation (a special case of information
preservation)
• Decomposition (synthesis) types
• By functional dependency
• By multi-valued dependency
• By Join dependency
5. Lossless Join
• Definition
A decomposition D = {R1, R2,..., Rm} of R has the lossless
join property with respect to the set of dependencies F on
R if, for every relation r of R that satisfies F, the following
holds, ⋈(π_R1(r), ..., π_Rm(r)) = r
where π_Ri(r) is the projection of r onto Ri and ⋈ is the natural join of all the relations in D
• The word loss in lossless refers to loss of
information, not to loss of tuples.
6. Test for Lossless Join
Input: A relation R, a decomposition D = {R1, R2,..., Rm} of R, and a set F of
Functional Dependencies
Lossless Join Test Algorithm:
Step 1: Create an initial matrix S with one row i for each relation Ri in D, and one
column j for each attribute Aj in R.
Step 2: Set S(i, j) := bij for all matrix entries
Step 3: For each row i representing relation schema Ri do
{for each column j representing attribute Aj do
{if relation Ri includes attribute Aj then
set S(i, j) := aj;}}
7. Test for Lossless Join
Lossless Join Test Algorithm: continues…
Step 4: Repeat the following loop until a complete loop execution results in no
changes to S.
{for each functional dependency X → Y in F do
{for all rows in S which have the same symbols in the columns
corresponding to attributes in X do
{make the symbols in each column that corresponds to
an attribute in Y be the same in all these rows as follows:
if any of the rows has an “a” symbol for the column,
set the other rows to the same “a” symbol in the column.
If no “a” symbol exists for the attribute in any of the
rows, choose one of the “b” symbols that appear in one
of the rows for the attribute and set the other rows to
that same “b” symbol in the column;}}}
Step 5: If a row is made up entirely of “a” symbols, then the
decomposition has the lossless join property;
otherwise it does not.
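The matrix test above can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the slides: attributes are single characters, each FD is an (lhs, rhs) pair of strings, and the function name is_lossless is hypothetical. One deliberate variation: when symbols are equated, the sketch renames the symbol in every row of that column (the usual chase formulation) rather than only in the matching rows.

```python
def is_lossless(attrs, decomposition, fds):
    """Chase-style lossless join test (illustrative sketch).

    attrs: string of single-character attribute names, e.g. "ABC"
    decomposition: list of strings, one per Ri, e.g. ["AB", "AC"]
    fds: list of (lhs, rhs) string pairs, e.g. [("A", "B")]
    """
    # Steps 1-3: symbol ("a", A) if Ri contains A, otherwise ("b", i, A).
    rows = [
        {A: ("a", A) if A in Ri else ("b", i, A) for A in attrs}
        for i, Ri in enumerate(decomposition)
    ]
    # Step 4: apply the FDs until a full pass changes nothing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every attribute of X.
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[A] for A in lhs), []).append(row)
            for group in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in group}
                    if len(symbols) > 1:
                        # ("a", ...) sorts before ("b", ...), so an "a" wins.
                        target = min(symbols)
                        for row in rows:
                            if row[A] in symbols:
                                row[A] = target
                        changed = True
    # Step 5: lossless iff some row consists only of "a" symbols.
    return any(all(sym[0] == "a" for sym in row.values()) for row in rows)


print(is_lossless("ABC", ["AB", "AC"], [("A", "B")]))              # True
print(is_lossless("XYZ", ["XY", "YZ"], [("X", "Y"), ("Z", "Y")]))  # False
```

The loop terminates because every change reduces the number of distinct symbols in the matrix.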
13. Problems
• Check whether the following decompositions are
lossy or lossless
• Let R=ABCDE, R1=AD, R2=AB, R3=BE, R4=CDE, R5=AE.
Let F={A→C, B→C, C→D, DE→C, CE→A}
• R(XYZWQ), F={X→Z, Y→Z, Z→W, WQ→Z, ZQ→X}.
R1(XW), R2(XY), R3(YQ), R4(ZWQ), R5(XQ)
• R(XYZ), F={X→Y, Z→Y}. R1(XY), R2(YZ)
• R(XYWZPQ), D={R1(ZPQ), R2(XYZPQ)}
F={XY→W, XW→P, PQ→Z, XY→Q}
14. Dependency Preservation
R was decomposed (normalisation) into R1, …, Rn
S - the set of FDs for R
S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers
to only the attributes of Ri)
S’ = S1 ∪ … ∪ Sn (usually, S’ ≠ S)
the decomposition is dependency preserving if
S’+ = S+
15. Test for Dependency Preservation
Input: decomposition D={D1,…,Dk} and a set of FDs F
Dependency Preservation Test:
Step 1: For each X → Y ∈ F, initialize a set T of attributes with the attributes of X (the
determinant of the FD under consideration), i.e. set T = X, and continue with step 2
Step 2: Repeat step 3 until the set T no longer changes. When T no longer
changes continue with step 4
Step 3: For each relation Ri (1 ≤ i ≤ k) of the input decomposition apply
the corresponding Ri operation (on the set of attributes T with respect to the set
of dependencies F), i.e. T = T ∪ ((T ∩ Ri)+ ∩ Ri)
Step 4: Test whether Y (the right-hand side of the FD under consideration)
satisfies Y ⊆ T. If Y is not a subset of T, stop the execution of the
algorithm and report that the decomposition does not preserve the FD. If
Y ⊆ T, then X → Y ∈ G+, where G is the union of the FDs that hold on the
individual Ri. If there are other FDs in F that need to be considered, repeat
step 1 with an FD that has not been considered before. If there are no more
FDs in F, stop and report that the decomposition preserves the dependencies
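The test above can be sketched in Python using the update T = T ∪ ((T ∩ Ri)+ ∩ Ri), where the closure is taken with respect to F. This is a minimal sketch under assumptions that are not in the slides: attributes are single characters, each FD is an (lhs, rhs) pair of strings, and the names closure and preserves are hypothetical helpers.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(decomposition, fds):
    """True if every FD X -> Y in fds survives the decomposition."""
    for lhs, rhs in fds:
        T = set(lhs)                      # Step 1: T = X
        changed = True
        while changed:                    # Step 2: repeat until T is stable
            changed = False
            for Ri in decomposition:      # Step 3: one Ri operation
                gained = closure(T & set(Ri), fds) & set(Ri)
                if not gained <= T:
                    T |= gained
                    changed = True
        if not set(rhs) <= T:             # Step 4: is Y a subset of T?
            return False
    return True


# R(ABCD), F = {A -> B, C -> D}, R1(AB), R2(CD)
print(preserves(["AB", "CD"], [("A", "B"), ("C", "D")]))   # True
# R(XYZ), F = {Z -> X, XY -> Z}, R1(XY), R2(XZ)
print(preserves(["XY", "XZ"], [("Z", "X"), ("XY", "Z")]))  # False
```

Working on attribute sets avoids computing the projections of F+ onto each Ri explicitly, which can be exponentially large.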
16. Problems
1. Given R(XYZ) and the set F = {Z→X, XY→Z}. Check if the
decomposition R1(XY) and R2(XZ) preserves the set F.
2. Given R(ABCD) and the set F = {A→B, C→D}. Check if the
decomposition R1(AB) and R2(CD) preserves the set F.
3. Determine if the decomposition D={R1(XY), R2(YZ), R3(ZW)} of the
relation R(WXYZ) preserves the dependencies of the set F={X→Y,
Y→Z, Z→W, W→X}.
4. Given R(ABCDEF) and the set F = {A→B, C→DF, AC→E, D→F}. Check
if the decomposition R1(ACE), R2(CD), R3(DF) and R4(AB) preserves
the set F.
17. Normalization
• Normalization is the process of successive reduction
of a given set of relations to a better form (reduced
redundancy and anomalies)
• The level of normalization to aim for
depends on the workload (a trade-off between fast
access and maintenance of integrity)
• Assumes that all possible functional dependencies
are known
• First construct a minimal set of FDs
• Then apply algorithms that construct a required Normal
Form
• Additional criteria may be needed to ensure that the
set of relations in a relational database is
satisfactory
18. 1 NF
• A relation is in first normal form (1NF) if it does not
contain any repeating columns or repeating groups
of columns
• It is the process of converting complex data
structures into more simple, stable data structures
• A relvar is in 1NF if and only if in every legal value
of that relvar, every tuple contains exactly one
value for each attribute
• First Normal Form (1NF)
• Unique rows
• All attributes are atomic
19. 2 NF
• A table is in the second normal form (2NF) if it is in
the first normal form and if all non-key columns in
the table depend on the entire primary key
• The following relation is in 1NF but not 2NF
EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)
Functional dependencies:
1. Emp_ID → Name, Dept, Salary (partial key dependency)
2. Emp_ID, Course → Date_Completed
Decompose into 2NF
EMPLOYEE1(Emp_ID, Name, Dept, Salary)
Functional dependencies: Emp_ID → Name, Dept, Salary
EMPCOURSE(Emp_ID, Course, Date_Completed)
Functional dependency: Emp_ID, Course → Date_Completed
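A partial key dependency such as the one in EMPLOYEE2 can be detected mechanically. The following is a minimal sketch, assuming a single candidate key (Emp_ID, Course) and FDs given as pairs of tuples; the helper names closure and partial_dependencies are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, all_attrs, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    non_key = set(all_attrs) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = (closure(subset, fds) - set(subset)) & non_key
            if dependents:
                found[subset] = dependents
    return found


attrs = ("Emp_ID", "Name", "Dept", "Salary", "Course", "Date_Completed")
fds = [
    (("Emp_ID",), ("Name", "Dept", "Salary")),
    (("Emp_ID", "Course"), ("Date_Completed",)),
]
# Only Emp_ID, a proper part of the key, determines non-key columns.
print(partial_dependencies(("Emp_ID", "Course"), attrs, fds))
```

An empty result would mean that no non-key column depends on just part of the key, which is the 2NF condition for this key.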
20. 3 NF
• A table is in the third normal form (3NF) if it is in
the second normal form and if all non-key columns
in the table depend non-transitively on the entire
primary key
SALES(Customer_ID, Customer_Name, SalesPerson, Region)
Functional dependencies:
1. Customer_ID → Customer_Name, SalesPerson, Region
2. SalesPerson → Region (transitive dependency)
Decompose into 3NF
SALES1(Customer_ID, Customer_Name, SalesPerson)
Functional dependencies: Customer_ID → Customer_Name, SalesPerson
SPERSON(SalesPerson, Region)
Functional dependency: SalesPerson → Region
21. BCNF
• A table is in Boyce-Codd normal form (BCNF) if
every column, on which some other column is fully
functionally dependent, is also a candidate for the
primary key of the table
• A table is in BCNF if the only determinants in the
table are the candidate keys
SCHOOL(Student, Subject, Teacher)
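Functional dependencies:
1. Student, Subject → Teacher
2. Student, Teacher → Subject
3. Teacher → Subject
Decompose into BCNF (split on the violating FD Teacher → Subject so that the join is lossless)
SCHOOL1(Student, Teacher)
SCHOOL2(Teacher, Subject)
Teacher → Subject is preserved, but Student, Subject → Teacher can no longer be checked within a single table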
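The rule "the only determinants are candidate keys" can be checked by computing the closure of each determinant. The following is a minimal sketch applied to SCHOOL, assuming FDs are given as pairs of tuples; the helper names closure and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose left-hand side does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


school = ("Student", "Subject", "Teacher")
fds = [
    (("Student", "Subject"), ("Teacher",)),
    (("Student", "Teacher"), ("Subject",)),
    (("Teacher",), ("Subject",)),
]
# Teacher is a determinant but not a candidate key, so SCHOOL is not in BCNF.
print(bcnf_violations(school, fds))
```

For the original relation it is enough to test the FDs listed in F; testing a relation produced by a decomposition would need the FDs projected onto it.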
22. Comparison between 3NF and BCNF
• It is always possible to decompose a relation into
relations in 3NF such that:
the decomposition is lossless
the dependencies are preserved
• It is always possible to decompose a relation into
relations in BCNF such that:
the decomposition is lossless
but it may not be possible to preserve dependencies
BCNF may, however, eliminate more redundancy than 3NF
23. Multivalued Dependency
Let R be a relation schema and let α ⊆ R and β ⊆ R. The
multivalued dependency α →→ β
holds on R if in any legal relation r(R), for all pairs of
tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4
in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
• MVD is a tuple generating Dependency
24. 4 NF
• A table is in the fourth normal form (4 NF) if it is in
BCNF and does not have any independent multi-
valued parts of the primary key
• If there are two attributes A and B, and for a given
value of A there exists a set of values of B that is
independent of the other attributes, then
we say that an MVD exists between A and B (A →→ B)
• The normal forms after BCNF are mainly of theoretical
interest
25. 4 NF
Student Table
Student    Subject      Language
Geeta      Mythology    English
Geeta      Psychology   English
Geeta      Mythology    Hindi
Geeta      Psychology   Hindi
Shekher    Gardening    English
Student →→ Subject
Student →→ Language
26. 4 NF
Split the independent multi-valued components of the
primary key into two tables
The primary key is (Student, Subject, Language)
Student_Subject Table
Student    Subject
Geeta      Mythology
Geeta      Psychology
Shekher    Gardening
Student_Language Table
Student    Language
Geeta      English
Geeta      Hindi
Shekher    English
Here we take care of the update anomaly
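The Student example can be checked directly: the MVD Student →→ Subject holds on the original table, and joining the two smaller tables gives the original rows back. The following is a minimal sketch using Python sets of tuples; the function name satisfies_student_mvd is hypothetical.

```python
student = {
    ("Geeta", "Mythology", "English"),
    ("Geeta", "Psychology", "English"),
    ("Geeta", "Mythology", "Hindi"),
    ("Geeta", "Psychology", "Hindi"),
    ("Shekher", "Gardening", "English"),
}


def satisfies_student_mvd(rows):
    """Student ->> Subject: for any two rows of the same student, the row
    pairing the subject of the first with the language of the second exists."""
    return all(
        (s1, subj1, lang2) in rows
        for (s1, subj1, _lang) in rows
        for (s2, _subj, lang2) in rows
        if s1 == s2
    )


# Project onto (Student, Subject) and (Student, Language).
student_subject = {(s, subj) for s, subj, _ in student}
student_language = {(s, lang) for s, _, lang in student}

# Natural join of the two projections on Student.
rejoined = {
    (s, subj, lang)
    for s, subj in student_subject
    for s2, lang in student_language
    if s == s2
}

print(satisfies_student_mvd(student))  # True
print(rejoined == student)             # True
```

If one of Geeta's four rows were deleted from the original table, the MVD would no longer hold and the join would reintroduce the missing row.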
27. Surprise: Lossless Decomposition
• There exist relations that cannot be nonloss-decomposed
into two projections, but can be
nonloss-decomposed into three or more
28. Join Dependency
• Definition: A relation R satisfies the join
Dependency (JD) *(X,Y,…,Z)
iff R is equal to the join of its projections on
X,Y,..,Z, where X,Y,..,Z are subsets of the set of
attributes of R.
• Consider the following table of Suppliers (S), Parts (P) and the Locations (L) they
supply
SPL Table
S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1
Actual decomposition
SP Table     PL Table
S    P       P    L
S1   P1      P1   L2
S1   P2      P2   L1
S2   P1      P1   L1
29. Join Dependency
Join of the SP and PL projections
S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1
S2   P1   L2   (spurious tuple)
30. Join Dependency
Decomposition of SPL into three projections
SP Table     PL Table     LS Table
S    P       P    L       L    S
S1   P1      P1   L2      L2   S1
S1   P2      P2   L1      L1   S1
S2   P1      P1   L1      L1   S2
Join of SP, PL and LS
S    P    L
S1   P1   L2
S1   P2   L1
S2   P1   L1
S1   P1   L1
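The SPL example can be replayed with Python sets to see where the spurious tuple comes from and why the third projection removes it. This is a minimal sketch; the variable names are illustrative only.

```python
spl = {
    ("S1", "P1", "L2"),
    ("S1", "P2", "L1"),
    ("S2", "P1", "L1"),
    ("S1", "P1", "L1"),
}

# The three binary projections of SPL.
sp = {(s, p) for s, p, l in spl}
pl = {(p, l) for s, p, l in spl}
ls = {(l, s) for s, p, l in spl}

# Joining only SP and PL (on P) produces one tuple that is not in SPL.
sp_join_pl = {(s, p, l) for s, p in sp for p2, l in pl if p == p2}
spurious = sp_join_pl - spl

# Joining the result with LS (on L and S) filters the spurious tuple out.
three_way = {(s, p, l) for s, p, l in sp_join_pl if (l, s) in ls}

print(spurious)          # {('S2', 'P1', 'L2')}
print(three_way == spl)  # True
```

So this instance of SPL satisfies the join dependency *(SP, PL, LS) even though the join of just SP and PL does not reproduce it.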
31. 5 NF
• A table is in fifth normal form (5NF) if it is in the
fourth normal form and every join dependency in
the table is implied by the candidate keys
• It is also called the Project-Join Normal Form
(PJNF)
32. Normalization
Un-normalized Relation
  ↓ Arrange every atomic value in its own cell
    (intersection of row and column) of a table
First Normal Form (1NF)
  ↓ Eliminate partial dependencies
Second Normal Form (2NF)
  ↓ Eliminate transitive dependencies
Third Normal Form (3NF)
  ↓ Make every determinant a candidate key
Boyce-Codd Normal Form (BCNF)
  ↓ Eliminate multi-valued dependencies
    that are not functional dependencies
Fourth Normal Form (4NF)
  ↓ Eliminate join dependencies that are not
    implied by candidate keys
Fifth Normal Form (5NF)
33. Denormalization
• Denormalization is a process in which we retain or
introduce some amount of redundancy for faster
data access
• It is applied where the trade-off favours access speed over reduced redundancy
34. Summary
• Normalization helps to reduce redundancy and
anomalies
• The first three normal forms (1NF, 2NF and 3NF) are practical,
but BCNF, 4NF and 5NF are of more theoretical
interest
• Denormalization is done for fast access