2. • The normalization process, as first proposed by Codd
(1972a), takes a relation schema through a series of
tests to certify whether it satisfies a certain normal
form.
• The process, which proceeds in a top-down fashion
by evaluating each relation against the criteria for
normal forms and decomposing relations as
necessary, can thus be considered as relational
design by analysis.
3. • Initially, Codd proposed three normal forms,
which he called first, second, and third normal
form.
• A stronger definition of 3NF—called Boyce-
Codd normal form (BCNF)—was proposed
later by Boyce and Codd.
• All these normal forms are based on a single
analytical tool: the functional dependencies
among the attributes of a relation.
• Later, a fourth normal form (4NF) and a fifth
normal form (5NF) were proposed, based on
the concepts of multivalued dependencies
and join dependencies, respectively
4. • Normalization of data can be considered a
process of analyzing the given relation schemas
based on their FDs and primary keys to achieve
the desirable properties of
– minimizing redundancy and
– minimizing the insertion, deletion, and update
anomalies
5. • It can be considered as a “filtering” or
“purification” process to make the design have
successively better quality.
• Unsatisfactory relation schemas that do not
meet certain conditions—the normal form
tests—are decomposed into smaller relation
schemas that meet the tests and hence
possess the desirable properties.
6. • Thus, the normalization procedure provides
database designers with the following:
– A formal framework for analyzing relation
schemas based on their keys and on the
functional dependencies among their attributes
– A series of normal form tests that can be carried
out on individual relation schemas so that the
relational database can be normalized to any
desired degree
7. Definition.
• The normal form of a relation refers to the
highest normal form condition that it meets,
and hence indicates the degree to which it has
been normalized.
8. • Normal forms, when considered in isolation from
other factors, do not guarantee a good database
design.
• It is generally not sufficient to check separately
that each relation schema in the database is, say,
in BCNF or 3NF.
• Rather, the process of normalization through
decomposition must also confirm the existence
of additional properties that the relational
schemas, taken together, should possess.
9. • These would include two properties:
– The nonadditive join or lossless join property,
which guarantees that the spurious tuple
generation problem does not occur with respect
to the relation schemas created after
decomposition.
– The dependency preservation property, which
ensures that each functional dependency is
represented in some individual relation resulting
after decomposition.
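As a minimal sketch of how these two properties might be tested for a two-way decomposition, the Python below uses attribute closure. The schema R(A, B, C), the FDs A -> B and B -> C, and all function names are assumptions made for illustration. The dependency check is only the simple sufficient test (each FD fits inside one relation), not a complete test.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Nonadditive join test for two relations: the common attributes
    must functionally determine all of r1 or all of r2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


def fds_fit_in_parts(parts, fds):
    """Sufficient check: every FD lies entirely inside some relation."""
    return all(any(lhs | rhs <= p for p in parts) for lhs, rhs in fds)


# Hypothetical schema R(A, B, C) with A -> B and B -> C.
FDS = [({"A"}, {"B"}), ({"B"}, {"C"})]

lossless_ab_bc = is_lossless_binary({"A", "B"}, {"B", "C"}, FDS)
lossless_ac_bc = is_lossless_binary({"A", "C"}, {"B", "C"}, FDS)
preserved_ab_bc = fds_fit_in_parts([{"A", "B"}, {"B", "C"}], FDS)
preserved_ab_ac = fds_fit_in_parts([{"A", "B"}, {"A", "C"}], FDS)
```

Under these assumptions, (A, B) with (B, C) passes both checks, (A, C) with (B, C) fails the join test, and (A, B) with (A, C) joins losslessly but no longer represents B -> C in any single relation.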
10. • The nonadditive join property is critical and
must always be achieved, whereas the
dependency preservation property, although
desirable, is sometimes sacrificed.
11. • Database designers need not normalize to the
highest possible normal form.
• Relations may be left in a lower normalization
status, such as 2NF, for performance reasons.
Doing so incurs the corresponding penalties of
dealing with the anomalies.
12. FIRST NORMAL FORM
• First normal form (1NF) is now considered to be part of
the formal definition of a relation in the relational
model.
• Historically, it was defined to disallow multivalued
attributes, composite attributes, and their
combinations.
• It states that the domain of an attribute must include
only atomic (simple, indivisible) values and that the
value of any attribute in a tuple must be a single value
from the domain of that attribute.
13. • Hence, 1NF disallows having
– a set of values (multivalued attribute)
– a tuple of values (composite attribute)
– or a combination of both
as an attribute value for a single tuple.
14. • In the case of composite attributes, 1NF can be
achieved by breaking the composite attributes
into atomic attributes, for example:
• Employee(eno, Name)
• Address(eno, Hno, Street, City)
16. • To normalize a relation with a multivalued
attribute into 1NF, we place the multivalued
attribute in a separate relation together with
the primary key of the original relation
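The step of moving a multivalued attribute into its own relation can be sketched in Python as follows. The `phones` attribute, the sample rows, and the function name `split_multivalued` are hypothetical and only serve as an illustration.

```python
def split_multivalued(rows, key, attr):
    """Move a multivalued attribute into its own relation keyed by `key`."""
    # Original relation without the multivalued attribute.
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    # New relation: one row per (key value, single attribute value) pair.
    detail = [{key: row[key], attr: value}
              for row in rows for value in row[attr]]
    return base, detail


# Hypothetical unnormalized rows: `phones` holds a set of values per tuple.
employees = [
    {"eno": 1, "name": "Asha", "phones": ["555-0101", "555-0102"]},
    {"eno": 2, "name": "Ravi", "phones": ["555-0199"]},
]

employee, employee_phone = split_multivalued(employees, "eno", "phones")
```

Here `employee` keeps one atomic row per employee, while `employee_phone` holds one row per (eno, phone) pair, so its key is the combination of both attributes.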
17. SECOND NORMAL FORM (2NF)
• Based on the concept of Full Functional
Dependency
18. • A FD X -> Y is said to be a full FD if removal of
any attribute A from X means that the
dependency no longer holds
• i.e. for every A in X, (X - {A}) does not
functionally determine Y
19. • A FD X -> Y is said to be partial if some
attribute A can be removed from X and the
dependency (X - {A}) -> Y still holds
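The full/partial distinction can be checked mechanically with attribute closure. The sketch below assumes a hypothetical set of FDs, {A, B} -> C and A -> D; the function names are illustrative only.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and stops holding when any attribute
    is removed from lhs; False if it is partial or does not hold."""
    if not rhs <= closure(lhs, fds):
        return False  # the FD does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# Hypothetical FDs: {A, B} -> C and A -> D.
FDS = [({"A", "B"}, {"C"}), ({"A"}, {"D"})]

full_c = is_full_fd({"A", "B"}, {"C"}, FDS)   # neither A nor B alone gives C
full_d = is_full_fd({"A", "B"}, {"D"}, FDS)   # A alone already gives D
```

With these FDs, {A, B} -> C is full, while {A, B} -> D is partial because B can be dropped from the left-hand side.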
20.
21.
22. 2NF states that
A relation schema R is in 2NF if it satisfies the
following conditions:
• It is in 1NF
• Every non-prime attribute A in R is fully
functionally dependent on every key of R (no
non-prime attribute is dependent on a proper
part of a key)
23. • Alternate Definition of 2NF
A relation schema R is said to be in 2NF if it is
in 1NF and
every non-prime attribute A in R is not partially
dependent on any key of R
24. • {Ssn, Pno} is the primary key
• Hours, pname, ename, plocation are the non-
prime attributes
25. • The relation is not in 2NF because the
following FDs are partial
– {Ssn, Pno} -> ename (ename depends on Ssn alone)
– {Ssn, Pno} -> pname (pname depends on Pno alone)
– {Ssn, Pno} -> plocation (plocation depends on Pno alone)
26. • To normalize a non-2NF relation into 2NF
relations
– Decompose the non-2NF relation into 2NF
relations where the non-prime attributes are
associated with only that part of the primary key
on which they are fully functionally dependent
27.
28. Algorithm for 2NF Decomposition
• Let R be a relational schema not in 2NF
• Let F be the set of FDs holding on R
• Determine the canonical cover Fc of F
• Determine the set of candidate keys {K1, K2, …, Kn}
for R
• Determine the non-prime attributes K’ = R - (K1 U
K2 U … U Kn)
• While there exists a non-2NF schema Ri
• Do
– For (each non-trivial, left-irreducible FD X->Y in Ri)
– Do
• If X is a proper subset of a candidate key and Y has only non-prime attributes, then
• Ri = (Ri – Y) U XY, i.e. replace Ri by the two schemas (Ri – Y) and XY
29. Steps to decompose a non-2NF relation into a
2NF relation
• Step 1:
– Create a separate relation for each partial
dependency
• Step 2:
– Remove the right hand side attribute of the
partial dependency from the relation that is
being decomposed.
i.e. R= (R-Y) U (XY) if X->Y is a partial FD
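The two steps can be sketched in Python for the relation with primary key {Ssn, Pno}. This is a simplified illustration that assumes a single candidate key and assumes the FDs Ssn -> ename and Pno -> pname, plocation; the function name `decompose_2nf` is hypothetical.

```python
def decompose_2nf(attrs, key, fds):
    """Split off every partial dependency X -> Y (X a proper subset of
    the key, Y non-prime) into its own relation XY and remove Y from
    the original relation. Assumes a single candidate key."""
    remaining = set(attrs)
    parts = []
    for lhs, rhs in fds:
        nonprime_rhs = rhs - key
        if lhs < key and nonprime_rhs:       # lhs is a proper part of the key
            parts.append(lhs | nonprime_rhs)  # Step 1: new relation XY
            remaining -= nonprime_rhs         # Step 2: R = R - Y
    return [remaining] + parts


ATTRS = {"Ssn", "Pno", "Hours", "ename", "pname", "plocation"}
KEY = {"Ssn", "Pno"}
# Assumed FDs for illustration.
FDS = [
    ({"Ssn", "Pno"}, {"Hours"}),
    ({"Ssn"}, {"ename"}),
    ({"Pno"}, {"pname", "plocation"}),
]

relations = decompose_2nf(ATTRS, KEY, FDS)
```

Under these assumptions the result is three relations: (Ssn, Pno, Hours), (Ssn, ename), and (Pno, pname, plocation).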
30. THIRD NORMAL FORM (3NF)
Based on concept of Transitive Functional
Dependency
31. • A FD X-> Y in a relation schema R is said to be
transitive if there is a set of attributes Z that is
neither a superkey nor a subset of any key of R
and both X-> Z and Z->Y hold
32. • From Ssno -> dnumber and dnumber -> dmgrno,
we get
• Ssno -> dmgrno (by transitivity)
• Here X = Ssno, Y = dmgrno and Z = dnumber
• dnumber is neither a key nor a subset of any key,
therefore Ssno -> dmgrno is a transitive FD
33. • In simpler words, a transitive FD is one that
is obtained by applying the transitivity rule to
other FDs
34. • 3NF states that
– A relation schema R is in 3NF if
• It satisfies 2NF
• No non-prime attribute is transitively dependent on the
primary key
35. • Emp_Deptt is in 2NF because all non-prime
attributes are fully functionally dependent on
the primary key
• But the non-prime attribute dmgrno is transitively
dependent on the primary key Ssno
• Hence the relation is not in 3NF
36. • To decompose a non-3NF relation into 3NF
relations
– Decompose and set up a relation that includes the
non key attributes that functionally determine
other non key attributes
37.
38. Check for 3NF
• A relation schema R is said to be in 3NF if
every FD X->Y holding on R satisfies at least one
of the following conditions
– It is a trivial FD
– X is a superkey of R
– Every attribute of Y that is not in X is a prime attribute of R
39. • Emp_Deptt is not in 3NF because, for the FD
dnumber -> dname, dmgrno
– The FD is non-trivial
– The RHS attributes are non-prime
– dnumber is not a superkey
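The three-condition check can be sketched in Python for Emp_Deptt. The function names are illustrative, and the set of prime attributes is passed in directly on the assumption that Ssno is the only key.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attrs, fds, prime):
    """Return the FDs that fail all three 3NF conditions."""
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = attrs <= closure(lhs, fds)
        rhs_prime = (rhs - lhs) <= prime
        if not (trivial or superkey or rhs_prime):
            bad.append((lhs, rhs))
    return bad


EMP_DEPTT = {"Ssno", "ename", "Address", "dnumber", "dname", "dmgrno"}
FDS = [
    ({"Ssno"}, {"ename", "Address", "dnumber"}),
    ({"dnumber"}, {"dname", "dmgrno"}),
]
PRIME = {"Ssno"}  # assumed: Ssno is the only candidate key

bad_fds = violations_3nf(EMP_DEPTT, FDS, PRIME)
```

Only dnumber -> dname, dmgrno is reported, which matches the reasoning on the slide.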
40. • The decomposed relations are in 3NF as
In Emp,
– Ssno is the primary key (Ssno -> ename, Address, dnumber)
In Deptt
– dnumber is the primary key (dnumber -> dname, dmgrno)
41. • Given the relation schema R(ABCDE) and the
set of functional dependencies
A->BC, CD->E, E->A , B->D
Check if R is in 3NF
Solution in notes
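One possible way to approach this exercise by machine is sketched below: a brute-force search for candidate keys followed by the 3NF check. It is not the solution from the notes; the function names are hypothetical and the search is exponential, so it is only suitable for small schemas like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if any(k <= subset for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if attrs <= closure(subset, fds):
                keys.append(subset)
    return keys


R = set("ABCDE")
FDS = [(set("A"), set("BC")), (set("CD"), set("E")),
       (set("E"), set("A")), (set("B"), set("D"))]

keys = candidate_keys(R, FDS)
prime = set().union(*keys)
in_3nf = all(rhs <= lhs or R <= closure(lhs, FDS) or (rhs - lhs) <= prime
             for lhs, rhs in FDS)
```

The search finds the keys A, E, BC and CD, so every attribute is prime and each FD satisfies the third 3NF condition.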
42. Algorithm for 3NF Decomposition
• Let R be a relational schema not in 3NF
• Let F be the set of FDs holding on R
• Determine the canonical cover Fc of F
• Determine the set of candidate keys {K1, K2, …, Kn} for R
• Determine the non-prime attributes K’ = R - (K1 U K2 U … U Kn)
• While there exists a non-3NF schema Ri
• Do
– For (each non-trivial, left-irreducible FD X->Y in Ri)
– Do
• If (X is not a superkey of Ri) and (Y has only non-prime attributes), then
• Ri = (Ri – Y) U XY, i.e. replace Ri by the two schemas (Ri – Y) and XY
43.
44. • For the FD D->E, we decompose the relation
as follows
• R = (R-E) U (D,E)
= (ABCD) U (DE)
• In (DE), the FD D->E holds and D is the primary
key, hence (DE) is in 3NF
• In (ABCD), the FDs A->BCD and BC->AD hold;
A and BC are superkeys, hence (ABCD) is in 3NF
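The single decomposition step R = (R - Y) U XY used in this example can be sketched in Python. The function name is hypothetical, and the prime attributes {A, B, C} are passed in by hand on the assumption that A and BC are the candidate keys.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def split_first_3nf_violation(rel, fds, prime):
    """Find the first non-trivial FD X -> Y where X is not a superkey and
    Y has only non-prime attributes; return ((R - Y), XY), or None."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD
        not_superkey = not rel <= closure(lhs, fds)
        only_nonprime = not (rhs & prime)
        if not_superkey and only_nonprime:
            return rel - rhs, lhs | rhs
    return None


R = set("ABCDE")
FDS = [(set("A"), set("BCDE")), (set("BC"), set("AD")), (set("D"), set("E"))]
PRIME = set("ABC")  # assumed: candidate keys are A and BC

result = split_first_3nf_violation(R, FDS, PRIME)
```

For these FDs only D -> E triggers a split, giving (ABCD) and (DE).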
45. • BOYCE CODD NORMAL FORM
– A stricter form of 3NF
– A relation schema in 3NF may still have some
anomalies, especially when the schema has
multiple candidate keys which may be composite
or overlapping
– In such cases update anomalies may exist
46. • The FDs are
{S#, P#} -> Qty
{Sname, P#} -> Qty
S#-> Sname
Sname -> S#
• The keys are
{S#, P#}
{Sname, P#}
• The schema is in 2NF and 3NF
47. • However, in spite of being in 2NF and 3NF, a
relation under this schema will have
redundancies
• Thus there is a need for a normal form
stronger than 3NF
48. • The solution is provided by Boyce Codd
Normal Form (BCNF)
• A relation schema R is said to be in BCNF if
each FD X-> Y holding on R, satisfies one of the
following conditions :
– It is a trivial FD
– X is a superkey of R
49. • These conditions for BCNF are the same as the
first two conditions for 3NF
• However, the third condition is missing, thus
BCNF is a stricter form than 3NF
• A schema may be in 3NF but not in BCNF
50. • Now consider the schema SP(S#, Sname, P#, Qty)
And the FDs
{S#, P#} -> Qty
{Sname, P#} -> Qty
S#-> Sname
Sname -> S#
• The keys are
{S#, P#} and {Sname, P#}
• It is in 3NF but not in BCNF because the FDs S#-> Sname
and Sname -> S# are non-trivial and their LHS is not a superkey
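The BCNF test for SP can be sketched in Python by computing the closure of each left-hand side; the function names are illustrative only.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not attrs <= closure(lhs, fds)]


SP = {"S#", "Sname", "P#", "Qty"}
FDS = [
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
    ({"S#"}, {"Sname"}),
    ({"Sname"}, {"S#"}),
]

violations = bcnf_violations(SP, FDS)
```

The two FDs between S# and Sname are reported, while both FDs that determine Qty pass because their left-hand sides are keys.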
51. • Such a relation can be decomposed into BCNF
relations
52. • The BCNF decomposition of SP based on the FDs
violating BCNF is
– S(S#, Sname) with FDs S#-> Sname and Sname-> S#
and key S# or Sname
– SP1(S#, P#, Qty) with FD {S#, P#} -> Qty and key {S#, P#}
OR
– S(S#, Sname) with FDs S#-> Sname and Sname-> S#
and key S# or Sname
– SP1(Sname, P#, Qty) with FD {Sname, P#} -> Qty and key
{Sname, P#}
53. • Algorithm to decompose a non-BCNF schema into a set
of BCNF schemas
• Let R be a relational schema not in BCNF
• Let F be the set of FDs holding on R
• Determine the canonical cover Fc of F
• Determine the set of candidate keys {K1, K2, …, Kn} for R
• Determine the non-prime attributes K’ = R - (K1 U K2 U … U Kn)
• While there exists a non-BCNF schema Ri
• Do
– For (each non-trivial, left-irreducible FD X->Y in Ri)
– Do
• If (X is not a superkey of Ri), then
• Ri = (Ri – Y) U XY, i.e. replace Ri by the two schemas (Ri – Y) and XY
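A rough Python sketch of this loop is given below, applied to SP. It differs from the slide in one respect, which is an assumption of this sketch: instead of using a canonical cover, it looks for violations by testing every attribute subset of each schema, which is exponential and only practical for small schemas. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) where X -> Y holds inside rel, Y is non-empty and
    X is not a superkey of rel; return None if rel is in BCNF."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_closure = closure(x, fds)
            y = (x_closure & rel) - x
            if y and not rel <= x_closure:
                return x, y
    return None


def decompose_bcnf(rel, fds):
    """Repeatedly replace a violating schema Ri by (Ri - Y) and XY."""
    todo, done = [set(rel)], []
    while todo:
        r = todo.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            todo.append(r - y)   # (Ri - Y)
            todo.append(x | y)   # XY
    return done


SP = {"S#", "Sname", "P#", "Qty"}
FDS = [
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
    ({"S#"}, {"Sname"}),
    ({"Sname"}, {"S#"}),
]

schemas = decompose_bcnf(SP, FDS)
```

For SP this yields (S#, Sname) and (S#, P#, Qty). Which of the two equivalent decompositions is produced depends on the order in which violations are found.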
54. COMPARISON OF BCNF and 3NF
• The goal of database design is to reduce the
redundancy in relations and have consistency
of data
• This is achieved through decomposition into
normal forms so as to obtain schemas that are
– In the highest possible normal form (up to BCNF)
– Obtained by a decomposition that is
• Attribute preserving
• Dependency preserving
• Lossless
55.
56.
57. • 3NF decomposition results in dependency-
preserving and lossless decompositions
• However, the limitations of 3NF decompositions
are
– Possibility of NULL values
– Some redundancy
• So a higher form, BCNF, is used
– However, it may not always be possible to obtain a
BCNF design without sacrificing some FDs
• Thus we may opt for either a 3NF design, with
some redundancy and NULL values but better
data integrity, or a BCNF design with the loss
of some FDs