This chapter covers the importance of normalization in database management systems. We learn the criteria that a good decomposition must satisfy and discuss the different normal forms, with examples.
2. Lecture Outcome:
• After the completion of this chapter, the students
will be able to:
– Define Normalization
– Explain Lossless Decomposition and Dependency
Preservation
– Identify the guidelines followed in good database design
– Explain different normal forms
– List the advantages and disadvantages of Normalization
– Define denormalization
16 March 2021 2
3. Organization of this Lecture:
• Introduction to Normalization
• Lossless Decomposition and Dependency
Preservation
• Guidelines followed in good database design
• Normal Forms
• Advanced Normal Forms
• Advantages and Disadvantages of Normalization
• Denormalization
4. Introduction to Normalization
• Normalization is the process of decomposing or breaking a
relational schema R into fragments (i.e. smaller schemas) R1,
R2, ... Rn such that the following conditions hold:
– Lossless decomposition: The fragments should contain the
same information as the original relation.
– Dependency preservation: The functional dependencies that
hold within the individual fragments Ri should together imply
every functional dependency of R.
– Good form: Each fragment Ri should be free from any type
of redundancy.
• In other words, normalization is the process of refining the
relational data model. It is used because of the following reasons:
•It improves database design
•It ensures minimum redundancy of data
•It removes anomalies for database activities
5. Lossless and Lossy Decomposition
• Decomposition is lossless if it is feasible to reconstruct relation
R from decomposed tables using Joins. The join would result
in the same original relation.
• A relation R should be decomposed into two or more relations
only if the decomposition is lossless-join as well as dependency
preserving. A decomposition of R into R1 and R2 is lossless when:
– The union of the attributes of R1 and R2 is equal to the
attributes of R. Each attribute of R must be either in R1 or in
R2, i.e. Att(R1) ∪ Att(R2) = Att(R)
– The intersection of the attributes of R1 and R2 is not
empty, i.e. Att(R1) ∩ Att(R2) ≠ ∅
– The common attributes form a key for at least one relation
(R1 or R2), i.e. they give distinct values for all the
tuples of that relation.
6. contd..
• When the base relation schema is decomposed into the
fragmented relation schemas, the consecutive relations should
be related by primary key - foreign key pair on the common
column.
• This ensures that a natural join is possible on the common
column. Thus, we have to check whether the common column
is a key of either relation:
– If the common column is the key, the decomposition is
lossless. i.e. Att(R1) ∩ Att(R2) → Att(R1) or
Att(R1) ∩ Att(R2) → Att(R2)
– If the common column is not the key, the decomposition is
lossy
7. Sample Example
• Consider the relation R<EmpInfo> as shown below:
• Now, suppose we decompose the above relation R into two
relations R1<EmpDetails> and R2<DeptDetails> as below:
8. contd..
• The above decomposition is lossless as we can achieve the
original relation R by joining the two fragments R1 and R2.
• Lossy Decomposition: As the name suggests, a decomposition of
a relation into two or more relational schemas is lossy when
information is lost, i.e. joining the fragments does not give
back exactly the original relation.
9. contd..
• Now, suppose we decompose the above relation R into two
relations R1<EmpDetails> and R2<DeptDetails> as below:
• Now, we won't be able to join the above tables on a common
column, since Emp_ID isn't part of the DeptDetails relation.
Therefore, the above decomposition is lossy.
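The difference between the two decompositions can be tried out on a small instance. The sketch below is only an illustration: the rows, the attribute names other than Emp_ID, and the helper functions `project` and `natural_join` are assumptions, not the tables from the slides. It projects a hypothetical EmpInfo relation in both ways and joins the fragments back.

```python
def project(rows, attrs):
    """Project dict rows onto attrs; duplicates collapse because the result is a set."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations given as sets of (attribute, value) tuples."""
    joined = set()
    for l_row in left:
        for r_row in right:
            l, r = dict(l_row), dict(r_row)
            # Rows combine only if they agree on every shared attribute.
            # With no shared attribute this degenerates to a Cartesian product.
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.add(tuple(sorted({**l, **r}.items())))
    return joined


# Hypothetical EmpInfo rows (assumed data, not taken from the slides).
emp_info = [
    {"Emp_ID": "E001", "Emp_Name": "Jacob", "Dept_ID": "D1", "Dept_Name": "Operations"},
    {"Emp_ID": "E002", "Emp_Name": "Henry", "Dept_ID": "D2", "Dept_Name": "HR"},
    {"Emp_ID": "E003", "Emp_Name": "Tom", "Dept_ID": "D3", "Dept_Name": "Finance"},
]
original = project(emp_info, ["Emp_ID", "Emp_Name", "Dept_ID", "Dept_Name"])

# Emp_ID appears in both fragments and is a key: the join restores EmpInfo.
lossless_join = natural_join(
    project(emp_info, ["Emp_ID", "Emp_Name"]),
    project(emp_info, ["Emp_ID", "Dept_ID", "Dept_Name"]),
)

# No common column: every employee pairs with every department.
lossy_join = natural_join(
    project(emp_info, ["Emp_ID", "Emp_Name"]),
    project(emp_info, ["Dept_ID", "Dept_Name"]),
)

print(lossless_join == original)        # True
print(len(original), len(lossy_join))   # 3 9
```

The six extra rows in the second join are spurious tuples: the original rows are still present, but the information about which employee belongs to which department is gone.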
10. Example based on Dataset
• Consider the relation R(A,B,C,D,E) which has been
decomposed into fragments R1 and R2. Which
decomposition(s) is lossless?
– R1(A,B) & R2(C,D)
– R1(A,B,C) and R2(D,E)
– R1(A,B,C) and R2(C,D,E)
– R1(A,B,C,D) and R2(A,C,D,E)
– R1(A,B,C,D) and R2(D,E)
11. Example based on FD
• R(A, B, C, D) with F={A→B, B→C, C→D} has been
decomposed into R1(A, B, C) with F1={A→B, B→C} and R2(C, D)
with F2={C→D}. Determine if this decomposition is lossless or
not.
Ans:
• Union Property:
– Att(R1) ∪ Att(R2) = (A,B,C) ∪ (C,D) = (A,B,C,D) = Att(R)
• Intersection Property:
– Att(R1) ∩ Att(R2) = (A,B,C) ∩ (C,D) = (C)
• The common attribute must be a key for either R1 or R2.
– For relation R1: (C)+ = (C), i.e. C is not a key for R1
– For relation R2: (C)+ = (C, D), i.e. C is a key for R2
Hence, the above decomposition is lossless.
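The three checks above can be sketched in a few lines of Python. The function names `closure` and `is_lossless_binary` are illustrative choices, and the closure is computed over the full FD set F rather than F1 and F2 separately, which gives the same answer for this example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r, r1, r2, fds):
    """Apply the union, intersection, and key tests to a two-way decomposition."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:            # union property
        return False
    common = r1 & r2
    if not common:              # intersection property
        return False
    common_plus = closure(common, fds)
    # The common attributes must determine all of R1 or all of R2.
    return r1 <= common_plus or r2 <= common_plus


F = [(set("A"), set("B")), (set("B"), set("C")), (set("C"), set("D"))]

print(sorted(closure("C", F)))                      # ['C', 'D']
print(is_lossless_binary("ABCD", "ABC", "CD", F))   # True
print(is_lossless_binary("ABCD", "AB", "CD", F))    # False: no common attribute
```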
12. Lossless Join Algorithm
• This algorithm is used to check whether the decomposition of a
relation is lossless or lossy type. The steps are:
– S-1: Construct a table with n columns, where n is the
number of attributes in the original relation and k rows,
where k is the number of decomposed relations. Label the
columns as A1, A2, ... An
– S-2: Fill the entries in this table as follows: for each
attribute Ai , check if this attribute is one of the attributes of
the relation schema Rj
• If attribute Ai is in the relation Rj, then the entry (Ai, Rj)
of the table will be ai
• If attribute Ai is not an attribute in the relation Rj, then
the entry (Ai, Rj) of the table will be bij
13. contd..
– S-3: For each of FD X→Y of F, do the following
until it is not possible to make any more changes to
the table.
• If there are two or more rows with the same value under
the attribute or attributes of the determinant X, then
make equal their entries under attribute Y (i.e. RHS of
FD)
– When making equal two or more entries under any
column, if one of them is ai , then make all of them
ai
– If none of them is ai , then make all the entries
uniform by considering one b term.
14. contd..
– If they are bij and bkl, choose one of these two values
as the representative value and make the other
values equal to it. That means, make all the entries
either bij or bkl. Continue with S-3.
– If there are no two rows with the same value under the
attribute or attributes of the determinant X, continue with
the next FD in S-3.
• S-4: Check the rows of the table. If there is a row with its
entries equal to a1, a2 ... an , then the decomposition is lossless.
Otherwise, the decomposition is lossy.
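Steps S-1 to S-4 can be sketched as a short Python function. This is a minimal sketch under some assumptions: symbols are encoded as tuples `("a", i)` and `("b", i, j)` with zero-based indices, the function name `is_lossless` is illustrative, and when two symbols are made equal the replaced symbol is rewritten everywhere it occurs in that column.

```python
def is_lossless(attributes, fragments, fds):
    """Lossless-join test: build the table, apply the FDs, look for an all-a row."""
    col = {attr: i for i, attr in enumerate(attributes)}
    # S-1 and S-2: one row per fragment, one column per attribute.
    table = [
        [("a", i) if attr in frag else ("b", i, j) for i, attr in enumerate(attributes)]
        for j, frag in enumerate(fragments)
    ]
    # S-3: apply every FD repeatedly until the table stops changing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lhs_cols = [col[a] for a in lhs]
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    if not all(table[i][c] == table[j][c] for c in lhs_cols):
                        continue
                    for attr in rhs:
                        c = col[attr]
                        v1, v2 = table[i][c], table[j][c]
                        if v1 == v2:
                            continue
                        # Prefer an "a" symbol; otherwise either "b" will do.
                        keep, drop = (v1, v2) if v1[0] == "a" else (v2, v1)
                        for row in table:
                            if row[c] == drop:
                                row[c] = keep
                        changed = True
    # S-4: lossless iff some row consists only of "a" symbols.
    return any(all(cell[0] == "a" for cell in row) for row in table)


F = [(set("A"), set("B")), (set("B"), set("C")), (set("C"), set("D"))]
print(is_lossless("ABCD", [set("ABC"), set("CD")], F))   # True
print(is_lossless("ABC", [set("AB"), set("BC")], []))    # False: no FD to apply
```

Each change merges two distinct symbols, so the loop must terminate after finitely many passes.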
15. Example-2
• Let R = (A,B,C,D,E), R1 = (A,D), R2 = (A,B), R3 = (B,E),
R4 = (C,D,E), and R5 = (A,E). Let the set of FDs be: F = {A ->
C, B -> C, C -> D, DE -> C, CE -> A}. Applying the Lossless
Join Algorithm, determine if the above decomposition is
lossless.
Ans:
S-1 and S-2: The initial table can be made as below:

      A1(A)  A2(B)  A3(C)  A4(D)  A5(E)
R1    a1     b21    b31    a4     b51
R2    a1     a2     b32    b42    b52
R3    b13    a2     b33    b43    a5
R4    b14    b24    a3     a4     a5
R5    a1     b25    b35    b45    a5

16. contd..
• S-3:
– For FD: A->C, rows R1, R2, and R5 agree on A, so their
entries under C are made equal (b31). The table becomes:

      A1(A)  A2(B)  A3(C)  A4(D)  A5(E)
R1    a1     b21    b31    a4     b51
R2    a1     a2     b31    b42    b52
R3    b13    a2     b33    b43    a5
R4    b14    b24    a3     a4     a5
R5    a1     b25    b31    b45    a5

– For FD: B->C, rows R2 and R3 agree on B, so the entry of R3
under C also becomes b31. The algorithm gives the following result:

      A1(A)  A2(B)  A3(C)  A4(D)  A5(E)
R1    a1     b21    b31    a4     b51
R2    a1     a2     b31    b42    b52
R3    b13    a2     b31    b43    a5
R4    b14    b24    a3     a4     a5
R5    a1     b25    b31    b45    a5
20. Dependency Preservation
• The decomposition of a relational schema R with FDs F is a
set of fragment relations (R1, R2,..., Rn) with FDs (F1,
F2,...,Fn), where Fi is the subset of dependencies in F+ that
include only attributes in Ri.
• The decomposition is dependency preserving iff
– (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
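Computing F+ explicitly is expensive, so a common alternative is to test each FD X → Y of F separately: grow X using only what each fragment can derive on its own, and check that Y is reached. The sketch below follows that idea; the function names are illustrative, and the two test cases reuse decompositions that appear elsewhere in this lecture.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_dependency_preserving(fragments, fds):
    """True if every FD in fds follows from the FDs projected onto the fragments."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for frag in fragments:
                # Attributes derivable from inside this fragment alone.
                gained = closure(reached & frag, fds) & frag
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not rhs <= reached:
            return False
    return True


F = [(set("A"), set("B")), (set("B"), set("C")), (set("C"), set("D"))]
print(is_dependency_preserving([set("ABC"), set("CD")], F))   # True

G = [(set("AB"), set("C")), (set("C"), set("B"))]
print(is_dependency_preserving([set("BC"), set("AC")], G))    # False: AB -> C is lost
```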
23. Guidelines followed in Designing Good Database
• G-1: Design a relation schema so that it is easy to explain its
meaning. Do not combine attributes from multiple entity sets
and relationship sets into a single relation.
– Only foreign keys should be used to refer to other entities.
Entity and relationship attributes should be kept apart as
much as possible.
• G-2: Design the base relation schemas in such a way that
insertion, deletion, and update anomalies are
removed from the relations.
– If any anomalies are present, note them clearly and make
sure that the programs that modify (update) the database
will operate correctly.
24. contd..
• G-3: Avoid placing attributes in a base relation whose values
may frequently be NULL.
– If NULLs are unavoidable, make sure that they apply in
exceptional cases only and do not apply to a majority of
tuples in the relation.
– Attributes that are NULL frequently could be placed in
separate relations (with the primary key).
• G-4: Design the relation schemas so that they can be joined
in such a way that no spurious tuples are generated.
– Avoid relations that contain matching attributes that are not
(foreign key and primary key) combinations, because
joining on such attributes may produce spurious tuples
25. Normal Forms
• Normal forms provide a stepwise progression towards the
construction of normalized relation schemas, which are free
from data redundancies.
• The normal forms define a series of tests that can be carried
out on individual relation schemas so that the relational
database can be normalized to any desired degree.
• Normal Form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to
which it has been normalized.
• A relation schema is said to be in a particular normal form if it
satisfies certain defined conditions.
26. contd..
• Most practical design projects acquire existing designs of
database from previous designs, from designs in legacy
models, or from existing files.
• Existing designs are evaluated by applying the normal form
tests, and normalization is carried out in practice so that the
resulting designs are of high quality and meet the desirable
properties.
• Although several normal forms have been defined, their
practical utility extends only up to 3NF, BCNF, or at most 4NF.
This is because the higher normal forms like 4NF and 5NF are
based on constraints that are rare and hard for the designers
and users to understand or to detect.
• Thus, database design as practiced in industry today pays
particular attention to normalization up to 3NF, BCNF, or at
most 4NF.
27. First Normal Form (1NF)
• The normal form now known as First Normal Form was introduced
by Codd in 1970. It was defined to prohibit the use of multivalued
attributes, composite attributes, or their combination in a table.
• A relation is in 1NF iff the values in the relation are atomic and
single-valued, and from the same domain for every attribute in
the relation.
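As a small illustration of atomic values, the sketch below flattens a multivalued attribute into one row per value. The student data, the attribute names, and the helper `to_1nf` are all hypothetical and only serve the example.

```python
# Hypothetical non-1NF rows: "phones" holds several values in one field.
students = [
    {"roll_no": 1, "name": "Asha", "phones": ["98100", "98200"]},
    {"roll_no": 2, "name": "Ravi", "phones": ["97300"]},
]


def to_1nf(rows, multivalued_attr, atomic_name):
    """Emit one row per value of the multivalued attribute."""
    flat = []
    for row in rows:
        for value in row[multivalued_attr]:
            new_row = {k: v for k, v in row.items() if k != multivalued_attr}
            new_row[atomic_name] = value
            flat.append(new_row)
    return flat


students_1nf = to_1nf(students, "phones", "phone")
for row in students_1nf:
    print(row)
```

Every field now holds a single value, but roll_no and name are repeated for each phone number, so roll_no alone no longer identifies a row.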
28. contd..
• A relation schema, by definition, contains no multivalued
entries; therefore every relation schema is in 1NF.
• In 1NF, data redundancy increases, as the same data may be
repeated in many columns across multiple rows, although each
row as a whole is unique.
• 1NF also suffers from insertion, deletion, and
update anomalies.
29. Second Normal Form (2NF)
• Partial FD: An FD A → B is a partial FD if some attribute of A
can be removed and the FD still holds. That means there is
some proper subset C ⊂ A such that C → B.
• A relation is in 2NF if it satisfies the following conditions:
– It is in 1NF.
– All non-key attributes are fully functionally dependent on the
primary key, i.e. there should not be any partial FD.
• If the primary key is not a composite key, all non-key attributes
are always fully functionally dependent on the primary key, i.e. a
table that is in 1NF and has a single-attribute
primary key is automatically in 2NF.
30. Steps to convert non-2NF into 2NF
• A non-2NF relation can be decomposed into 2NF relations by
following these steps:
– Create a new relation by using the attributes from the
offending FD as the attributes in the new relation.
– The determinant of the FD becomes the primary key of the
new relation.
– The attribute on the RHS of the FD is then eliminated from
the original relation.
– If more than one FD prevents the relation from being 2NF,
repeat the steps above for each offending FD.
– If the same determinant appears in more than one FD, place
all the attributes functionally dependent on this
determinant as non-key attributes in the relation having the
determinant as the primary key.
31. Example for 2NF
• Consider the following example:
• This table has a composite primary key [Customer ID, Store
ID]. The non-key attribute is [Purchase Location], i.e.
{Customer_ID, Store_ID} → {purchase_location}
holds.
• Here, the partial FD {Store_ID} → {purchase_location}
also exists because [Purchase Location] depends only on
[Store ID]. Therefore, this table does not satisfy 2NF.
32. contd..
• To bring this table to second normal form, we break the table
into two tables, and now we have the following:
• This removes the partial functional dependency that we
initially had.
• Now, in the table [TABLE_STORE], the column [Purchase Location] is
fully dependent on the primary key of that table, which is [Store ID].
• Relations in 2NF can still suffer from anomalies.
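The detection of partial FDs and the split described in the conversion steps can be sketched at the schema level. This is a simplified sketch that assumes a single candidate key (the primary key); the function names are illustrative, and only the attribute names come from the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    key = set(key)
    non_key = set(attributes) - key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[frozenset(subset)] = determined
    return found


def to_2nf(key, attributes, fds):
    """Return (attributes, primary key) pairs: one relation per offending determinant."""
    partial = partial_dependencies(key, attributes, fds)
    moved = set().union(*partial.values())
    relations = [(set(det) | deps, set(det)) for det, deps in partial.items()]
    relations.append((set(attributes) - moved, set(key)))
    return relations


attrs = {"Customer_ID", "Store_ID", "Purchase_Location"}
pk = {"Customer_ID", "Store_ID"}
fds = [
    ({"Customer_ID", "Store_ID"}, {"Purchase_Location"}),
    ({"Store_ID"}, {"Purchase_Location"}),
]

for rel_attrs, rel_key in to_2nf(pk, attrs, fds):
    print(sorted(rel_attrs), "key:", sorted(rel_key))
```

For this schema the sketch yields a store relation with key Store_ID and a remaining relation that keeps only Customer_ID and Store_ID.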
33. Third Normal Form (3NF)
• A relation is in 3NF iff the following two conditions are
satisfied simultaneously:
•It is in 2NF
•There is no transitive FD, i.e. no non-key attribute depends on
the key only through another non-key attribute (A → B, B → C,
and hence A → C, where B is not a key).
• The process of decomposing the non-3NF relation into 3NF
relations is similar to the process of decomposing the non-2NF
relation to 2NF relations. Consider the following example:
35. 3NF contd..
• The 3NF helped us to get rid of the anomalies caused by
dependencies of a non-key attribute on another non-key
attribute.
• However, relations in 3NF are still susceptible to anomalies
when the relations have two overlapping candidate keys or
when a non-key attribute functionally determines a key attribute.
Overlapping candidate keys are composite candidate keys
with at least one attribute in common.
• Note: A database should normally be in 3NF at least.
• Q1: Lecturer = (lectid, lectname, courseid, coursename) &
F={lectid → lectname, lectid → courseid, lectid →
coursename, courseid → coursename}
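One way to test a schema such as Lecturer for 3NF is to find its candidate keys and then check every listed FD: the determinant must be a superkey, or the dependent attribute must belong to some candidate key. The sketch below does this by brute force; the function names are illustrative, and it inspects only the FDs listed in F.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= set(attrs):
                keys.append(candidate)
    return keys


def third_nf_violations(attributes, fds):
    """List (determinant, attribute) pairs from fds that break 3NF."""
    all_attrs = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= all_attrs
        for attr in sorted(rhs - lhs):
            if not is_superkey and attr not in prime:
                violations.append((lhs, attr))
    return violations


lecturer = {"lectid", "lectname", "courseid", "coursename"}
F = [
    ({"lectid"}, {"lectname"}),
    ({"lectid"}, {"courseid"}),
    ({"lectid"}, {"coursename"}),
    ({"courseid"}, {"coursename"}),
]

print(candidate_keys(lecturer, F))        # [{'lectid'}]
print(third_nf_violations(lecturer, F))   # [({'courseid'}, 'coursename')]
```

The reported pair is the transitive dependency lectid → courseid → coursename.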
36. Boyce Codd Normal Form (BCNF)
• It is a stricter version of 3NF, which is why it is also referred
to as 3.5NF.
• A table complies with BCNF iff
– it is in 3NF
– for every non-trivial FD, the determinant is a superkey
• A non-BCNF relation is decomposed into BCNF relations as
follows: for each non-trivial FD whose determinant is not a
superkey, construct a new relation from the attributes of that FD.
37. contd..
• Consider the following relation
• This relation is not in BCNF because, in the FD Time → Course,
the determinant {Time} is not a key.
38. contd..
• To convert this relation to BCNF, create a new
relation R1=(Time, Course) with the set of FDs F1={Time
→ Course}:

Time   Course
12:00  Database
12:00  Database
15:00  Database
10:00  Programming
10:00  Programming
13:00  Programming

• The original relation is changed to R=(Student, Time), where the
set {Student, Time} is the key of the relation.
39. contd..

Student  Time
Rahul    12:00
Pratik   12:00
Praveen  15:00
Praveen  10:00
Rajib    10:00
Shivam   13:00

• Here, we have lost the FD {Student, Course} → Time.
• Corollary: If a relation has only one candidate key, then 3NF
and BCNF are the same.
• Note: A relation can always be normalized to 3NF by a
decomposition that is lossless and dependency preserving.
Normalization to BCNF is always lossless, but may not preserve
all the functional dependencies.
40. Example-1
• Let's take R=ABCDE, F = {A -> BC, C -> DE}. Decompose R into BCNF.
Ans: First, let's compute the attribute closures:
A+ = ABCDE; B+ = B; C+ = CDE; D+ = D; E+ = E
These closures tell us that the only candidate key for R is A.
Now that we know the candidate key, we can begin to
decompose R.
– 1st FD: A -> BC; A is a super key for R. So, it satisfies
BCNF.
– 2nd FD: C -> DE; C is not a super key for R. So, we
decompose R into (CDE) and (ABC).
– In (ABC), A is still the key, so the first FD is still not
in violation.
– In (CDE), C is the key, so C -> DE is also not in
violation. This decomposition is in BCNF.
41. Example-2
• Q: R(A, B, C); F={AB→C, C→B}. Determine if this
relation is in BCNF.
Ans:
S-1: Determine if R is in BCNF.
(A B)+ = {A B C } i.e. (A B) is a super key.
(C)+ ={C B} i.e. C is not a super key.
Since the determinant of the non-trivial FD C→B is not a super
key, the above relation R is not in BCNF.
S-2: Convert the relation R into BCNF.
Since C is not a key, a new relation R1 can be made
using the FD C->B, i.e. R1(C,B) where C is the key.
42. contd..
– Remove the attribute B from the original table R because B is
the dependent of the FD C->B.
– After converting into BCNF, we get the relations (C, B) and (A, C).
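The decomposition procedure used in Example-1 and Example-2 can be sketched recursively: look for an attribute set X inside the relation whose closure is neither trivial nor the whole relation, split on it, and repeat on both parts. The sketch below is a brute-force version meant for small schemas; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(fragment, fds):
    """Return (X, X+ within fragment) for a BCNF violation, or None."""
    attrs = sorted(fragment)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            x_plus = closure(x, fds) & fragment
            # X determines something new but is not a superkey of the fragment.
            if x_plus != x and x_plus != fragment:
                return x, x_plus
    return None


def bcnf_decompose(fragment, fds):
    """Split the fragment on violating determinants until none remain."""
    fragment = set(fragment)
    violation = find_violation(fragment, fds)
    if violation is None:
        return [fragment]
    x, x_plus = violation
    remainder = x | (fragment - x_plus)
    return bcnf_decompose(x_plus, fds) + bcnf_decompose(remainder, fds)


F1 = [(set("A"), set("BC")), (set("C"), set("DE"))]
F2 = [(set("AB"), set("C")), (set("C"), set("B"))]

print(bcnf_decompose("ABCDE", F1))   # the fragments CDE and ABC
print(bcnf_decompose("ABC", F2))     # the fragments BC and AC
```

Each split is lossless because the shared attributes X are a key of the first part, but, as the second example shows, an FD such as AB → C may no longer be checkable within a single fragment.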
Q1: R =(A, B, C, D, E), F={A → {B, E}, C → D}. Decompose
the relation to BCNF
Q2: R=(A, B, C, D), F={{A, B} → {C, D}, C → B}.
Decompose the relation into BCNF
Q3: R =(A, B, C, D, E, G), F={{A, B} → {C, D}, {B, C} → {D,
A}, C → G, B → E}. Decompose this relation to BCNF
43. Multi-Valued Dependency (MVD)
• MVD (Multi-Valued Dependency): A table involves a multi-valued
dependency if one attribute value is associated with a set of
values of another attribute, independently of the remaining attributes.
• A multi-valued dependency A →→ B exists iff, for every
value of A, the set of B values associated with it does not depend
on the values of the other attributes.
• If A →→ B and A →→ C, then we have a single attribute A
which multi-determines two other attributes, B and C.
• Multi-valued dependencies are also referred to as tuple
generating dependencies.
44. MVD (contd..)
• An MVD X →→ Y in relation R is called a trivial MVD if:
•Y is a subset of X, or
•X ∪ Y = R
• An MVD that satisfies neither the first nor the second
condition is called a nontrivial MVD
• Normally, MVDs exist in pairs: if X →→ Y holds in R, then
X →→ (R − X − Y) also holds.
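Whether an MVD X →→ Y holds in a given table can be checked directly from the tuple-based definition: for any two rows that agree on X, the row that takes its Y values from the first and its remaining values from the second must also be present. The sketch below applies this to a hypothetical course, teacher, and book table; the data and the function name `mvd_holds` are assumptions made for illustration.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on a relation instance given as a list of dicts."""
    attrs = set(rows[0])
    rest = attrs - set(x) - set(y)
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # X and Y values come from t1, the remaining values from t2.
                t3 = {a: (t2[a] if a in rest else t1[a]) for a in attrs}
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True


# Hypothetical data: teachers and books of a course are independent of each other.
rows = [
    {"course": "DB", "teacher": "Rao", "book": "Korth"},
    {"course": "DB", "teacher": "Rao", "book": "Navathe"},
    {"course": "DB", "teacher": "Sen", "book": "Korth"},
    {"course": "DB", "teacher": "Sen", "book": "Navathe"},
]

print(mvd_holds(rows, ["course"], ["teacher"]))       # True
print(mvd_holds(rows, ["course"], ["book"]))          # True: the paired MVD
print(mvd_holds(rows[:-1], ["course"], ["teacher"]))  # False once a combination is missing
```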
45. Fourth Normal Form (4NF)
• A relation is in 4NF iff the following two conditions are
satisfied simultaneously:
•It is in 3NF
•It contains no nontrivial MVD X →→ Y in which X is not a
superkey (in particular, it does not hold two independent MVDs)
•The above table is not in 4NF as it contains two MVDs.
46. contd..
• To bring this up to 4NF, it is necessary to break this
information into two tables.
• Now, the tables are in 4NF, as each table contains only one MVD.
47. Fifth Normal Form (5NF)
• Join Dependency (JD)
– If the join of R1 and R2 over C is equal to relation R, then
we can say that a join dependency (JD) exists,
where R1(A, B, C) and R2(C, D) are the decompositions
of a given relation R(A, B, C, D).
– Alternatively, R1 and R2 are a lossless decomposition of R.
• In other words, a join dependency is said to hold over a
relation R if R1, R2, ..., Rn is a lossless-join decomposition of R.
• A relation is in 5NF iff the following two conditions are
satisfied simultaneously:
•It is in 4NF
•Every join dependency is implied by the candidate keys.
• In other words, a relation is in 5NF if it is in 4NF and it
cannot be losslessly decomposed into smaller relations, other
than by decompositions that follow from its candidate keys.
49. contd..
• Thus, the relations Dealer_Parts, Parts_Customer, and
Dealer_Customer obtained by decomposing Dealer are in 4NF as
well as in 5NF.
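A join dependency over three fragments can be checked on an instance by projecting and joining back. The sketch below assumes a hypothetical Dealer(dealer, part, customer) table; the rows and the helper functions are illustrative, not the data from the slides. Joining only two of the projections produces a spurious tuple, while joining all three restores the original.

```python
def project(rows, attrs):
    """Project dict rows onto attrs; duplicates collapse because the result is a set."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations given as sets of (attribute, value) tuples."""
    joined = set()
    for l_row in left:
        for r_row in right:
            l, r = dict(l_row), dict(r_row)
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.add(tuple(sorted({**l, **r}.items())))
    return joined


# Hypothetical Dealer rows (assumed data).
dealer = [
    {"dealer": "D1", "part": "P1", "customer": "C2"},
    {"dealer": "D1", "part": "P2", "customer": "C1"},
    {"dealer": "D2", "part": "P1", "customer": "C1"},
    {"dealer": "D1", "part": "P1", "customer": "C1"},
]
original = project(dealer, ["dealer", "part", "customer"])

dealer_parts = project(dealer, ["dealer", "part"])
parts_customer = project(dealer, ["part", "customer"])
dealer_customer = project(dealer, ["dealer", "customer"])

two_way = natural_join(dealer_parts, parts_customer)
three_way = natural_join(two_way, dealer_customer)

print(len(original), len(two_way))   # 4 5: one spurious tuple (D2, P1, C2)
print(three_way == original)         # True: the three-way join dependency holds
```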
50. Advantages and Disadvantages of Normalization
• Advantages of normalization
– It removes data redundancy
– It solves insertion, update, and deletion anomalies
– It makes it easier to maintain the database in a
consistent state
• Disadvantages of normalization
– It leads to more tables in the database
– For retrieving the records or information, these tables need
to be joined back together, which is an expensive task
• Thus, sometimes it is worth denormalizing.
51. Denormalization
• Denormalization is the opposite of Normalization. It is the
process of increasing redundancy in the database either for
convenience or to improve performance
• Once a normalized database design has been achieved,
adjustments can be made with the potential consequences
(anomalies) in mind.
• Possible denormalization steps include the following:
•Recombining relations that were split to satisfy
normalization rules.
•Storing redundant data in tables
•Storing summarized data in tables.
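The trade-off can be seen in a small experiment with Python's built-in sqlite3 module. The schema and data below are hypothetical: a normalized customer and orders pair, plus a denormalized `customer_summary` table that stores the customer name redundantly together with a precomputed total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);

    -- Denormalized copy: redundant name plus summarized data.
    CREATE TABLE customer_summary AS
        SELECT c.customer_id, c.name, SUM(o.amount) AS total_amount
        FROM customer c JOIN orders o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id;
""")

# Normalized read: needs a join and an aggregate on every query.
normalized = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id ORDER BY c.customer_id
""").fetchall()

# Denormalized read: a single-table lookup.
denormalized = conn.execute(
    "SELECT name, total_amount FROM customer_summary ORDER BY customer_id"
).fetchall()
print(normalized == denormalized)   # True

# The price: a new order leaves the summary stale unless it is updated too.
conn.execute("INSERT INTO orders VALUES (13, 2, 25.0)")
fresh_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer_id = 2").fetchone()[0]
stale_total = conn.execute(
    "SELECT total_amount FROM customer_summary WHERE customer_id = 2").fetchone()[0]
print(fresh_total, stale_total)     # 100.0 75.0
```

The stale total is exactly the kind of update anomaly that normalization removes, which is why denormalized tables need their refresh logic planned alongside them.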