Normalization
• Normalization is the process of efficiently organizing data in a database.
• Two goals of the normalization process:
1. Eliminating redundant data (for example, storing the
same data in more than one table).
2. Ensuring data dependencies make sense (only
storing related data in a table).
• Both of these goals reduce the amount of space a database consumes and ensure that data is stored logically.
The Normal Forms:
Six types:
• 1. First normal form or 1NF
• 2. Second normal form or 2NF
• 3. Third normal form or 3NF
• 4. Boyce-Codd normal form or BCNF
• 5. Fourth normal form or 4NF
• 6. Fifth normal form or 5NF
First Normal Form (1NF)
• 1NF: multivalued attributes should be removed.
• First normal form (1NF) sets the very basic rules for an organized
database.
• It is a relation in which the intersection of each row and column
contains one and only one value.
• Rules:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify
each row with a unique column or set of columns (the primary key).
• All attributes are atomic (each column holds a single, indivisible value).
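As a minimal sketch of these rules in Python, the hypothetical helper `split_multivalued` below moves a multivalued attribute out of a table and into a separate table keyed by (key, value), so every row/column intersection holds one atomic value. The customer data and column names are made up for illustration.

```python
# Hypothetical unnormalized data: "phone" holds a list, so it is not atomic.
unnormalized = [
    {"customer_id": 1, "name": "Ann", "phone": ["555-0101", "555-0102"]},
    {"customer_id": 2, "name": "Bob", "phone": ["555-0201"]},
]


def split_multivalued(rows, key, multivalued):
    """Split rows into a main table without the multivalued column and a
    separate table with one (key, value) row per value."""
    main = [{k: v for k, v in row.items() if k != multivalued} for row in rows]
    detail = [
        {key: row[key], multivalued: value}
        for row in rows
        for value in row[multivalued]
    ]
    return main, detail


customers, customer_phones = split_multivalued(unnormalized, "customer_id", "phone")
# customer_phones now has one row per phone number; its primary key is
# the pair (customer_id, phone).
```

The related data (phone numbers) gets its own table with its own primary key, rather than repeating the customer's name once per phone number.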
Second Normal Form (2NF)
• 2NF: partial dependencies should be removed.
• Second normal form (2NF) further addresses the concept of removing duplicative data. A table in 2NF meets the following conditions:
• Meet all the requirements of first normal form.
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Have no partial dependencies: every non-key attribute must be fully functionally dependent on the entire primary key.
• This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase Location]. In this case, [Purchase Location] depends only on [Store ID], which is only part of the primary key. Therefore, this table does not satisfy second normal form.
To bring this table to second normal form, we break it into two tables: one holding [Customer ID, Store ID], and [TABLE_STORE] holding [Store ID, Purchase Location].
• What we have done is remove the partial functional dependency that we initially had. Now, in the table [TABLE_STORE], the column [Purchase Location] is fully dependent on the primary key of that table, which is [Store ID].
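The decomposition can be checked mechanically. The sketch below assumes hypothetical sample rows (the original table is not reproduced here) and a hypothetical helper `fd_holds`; `table_purchase` is an illustrative name for the table that keeps [Customer ID, Store ID]. It confirms the partial dependency on sample data, splits the table, and rejoins on [Store ID] to show no rows are lost or added.

```python
# Hypothetical rows: (Customer ID, Store ID, Purchase Location).
purchases = {
    (1, 1, "Los Angeles"),
    (1, 3, "San Francisco"),
    (2, 1, "Los Angeles"),
    (3, 2, "New York"),
    (4, 3, "San Francisco"),
}
CUSTOMER, STORE, LOCATION = 0, 1, 2


def fd_holds(rows, lhs, rhs):
    """Return True if the columns in lhs functionally determine the columns
    in rhs on this set of rows (equal lhs values imply equal rhs values)."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# The partial dependency: Store ID alone determines Purchase Location.
print(fd_holds(purchases, [STORE], [LOCATION]))     # True
print(fd_holds(purchases, [CUSTOMER], [LOCATION]))  # False

# 2NF decomposition: (Customer ID, Store ID) and TABLE_STORE (Store ID, Purchase Location).
table_purchase = {(c, s) for c, s, _ in purchases}
table_store = {(s, loc) for _, s, loc in purchases}

# Rejoining on Store ID gives back exactly the original rows.
rejoined = {(c, s, loc) for c, s in table_purchase for s2, loc in table_store if s == s2}
print(rejoined == purchases)  # True
```

Note that sample data can only refute a functional dependency, never prove it; in practice the dependency comes from the business rules, and the check above just illustrates them.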
Third Normal Form (3NF)
• A table is in third normal form if it satisfies the following conditions:
• It is in second normal form.
• There is no transitive functional dependency.
• By transitive functional dependency, we mean the table has the following relationships: B is functionally dependent on A, and C is functionally dependent on B (A -> B and B -> C). In this case, C is transitively dependent on A via B.
• In this table, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID], so we have a transitive functional dependency, and this structure does not satisfy third normal form.
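The transitive step can be made mechanical with the standard attribute-closure algorithm: repeatedly add the right-hand side of any functional dependency whose left-hand side is already covered. This is a minimal sketch; the FD list encodes the book example ([Book ID] determines [Genre ID] and [Price]; [Genre ID] determines [Genre Type]), and `closure` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs
    under the FDs, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"Book ID"}, {"Genre ID", "Price"}),
    ({"Genre ID"}, {"Genre Type"}),
]
print(sorted(closure({"Book ID"}, fds)))
# ['Book ID', 'Genre ID', 'Genre Type', 'Price']
```

[Genre Type] appears in the closure of [Book ID] only through [Genre ID], which is exactly the transitive dependency that 3NF forbids.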
• To bring this table to third normal form, we split it into two tables: [TABLE_BOOK] with [Book ID], [Genre ID], and [Price], and [TABLE_GENRE] with [Genre ID] and [Genre Type].
• Now all non-key attributes are fully functionally dependent only on the primary key. In [TABLE_BOOK], both [Genre ID] and [Price] depend only on [Book ID]. In [TABLE_GENRE], [Genre Type] depends only on [Genre ID].
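As a sketch of the resulting 3NF schema, the example below uses Python's built-in `sqlite3` module. The table names come from the text; the snake_case column names, types, and sample rows are assumptions for illustration. A foreign key links each book to its genre, and a join recovers the original wide view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE TABLE_GENRE (
    genre_id   INTEGER PRIMARY KEY,
    genre_type TEXT NOT NULL
);
CREATE TABLE TABLE_BOOK (
    book_id  INTEGER PRIMARY KEY,
    genre_id INTEGER NOT NULL REFERENCES TABLE_GENRE (genre_id),
    price    REAL NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO TABLE_GENRE VALUES (?, ?)",
                 [(1, "Gardening"), (2, "Sports")])
conn.executemany("INSERT INTO TABLE_BOOK VALUES (?, ?, ?)",
                 [(1, 1, 25.99), (2, 2, 14.99), (3, 1, 10.00)])

# Genre Type is stored once per genre, so renaming it is a single-row update.
conn.execute("UPDATE TABLE_GENRE SET genre_type = 'Home & Garden' WHERE genre_id = 1")

# A join recovers the original wide view of each book.
rows = conn.execute("""
    SELECT b.book_id, b.genre_id, g.genre_type, b.price
    FROM TABLE_BOOK AS b
    JOIN TABLE_GENRE AS g ON b.genre_id = g.genre_id
    ORDER BY b.book_id
""").fetchall()
```

Because [Genre Type] lives only in TABLE_GENRE, the rename reaches both gardening books without any chance of an update anomaly.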
BOYCE-CODD NORMAL FORM (BCNF)
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial functional dependency X -> A holds in R, X is a superkey of R.
• Each normal form is strictly stronger than the previous
one.
• Every 2NF relation is in 1NF
• Every 3NF relation is in 2NF
• Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)
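The BCNF test can be sketched in Python with the attribute closure: a nontrivial FD X -> A violates BCNF when the closure of X does not cover the whole schema. The schema below is a hypothetical textbook-style case (each instructor teaches one course; a student takes a given course from one instructor). It is in 3NF, because [course] is part of the candidate key {student, course}, but it is not in BCNF. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the nontrivial FDs X -> A whose left side X is not a superkey.
    For the original schema, checking the given FDs (rather than all of
    their consequences) is sufficient."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD: can never violate BCNF
        if not schema <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


# Hypothetical schema that is in 3NF but not in BCNF.
schema = {"student", "course", "instructor"}
fds = [
    ({"student", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]
print(bcnf_violations(schema, fds))
# [({'instructor'}, {'course'})]
```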
Multivalued dependency
• A multivalued dependency (MVD) X ->> Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: if two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 must also exist in r with the following properties, where Z denotes R − (X ∪ Y):
  – t3[X] = t4[X] = t1[X] = t2[X].
  – t3[Y] = t1[Y] and t4[Y] = t2[Y].
  – t3[Z] = t2[Z] and t4[Z] = t1[Z].
• An MVD X ->> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R.
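The definition translates directly into a check on a relation state. This is a minimal sketch with a hypothetical `mvd_holds` helper. For every ordered pair (t1, t2) that agrees on X, it requires the tuple t3 built from t1's X and Y values and t2's Z values. The required t4 is the t3 of the swapped pair (t2, t1), so looping over all ordered pairs covers both. The test data is the BranchStaffOwner relation used later in these notes.

```python
def mvd_holds(rows, attrs, x, y):
    """Check whether the MVD X ->> Y holds in a relation state.

    rows is a set of tuples whose positions follow attrs; x and y are sets
    of attribute names.
    """
    pos = {a: i for i, a in enumerate(attrs)}
    z = [a for a in attrs if a not in x and a not in y]  # Z = R - (X ∪ Y)
    for t1 in rows:
        for t2 in rows:
            if all(t1[pos[a]] == t2[pos[a]] for a in x):
                t3 = list(t1)                # X and Y values come from t1
                for a in z:
                    t3[pos[a]] = t2[pos[a]]  # Z values come from t2
                if tuple(t3) not in rows:
                    return False
    return True


attrs = ["branchNo", "sName", "oName"]
branch_staff_owner = {
    ("B003", "Ann Beech", "Carl Farrel"),
    ("B003", "David Ford", "Carl Farrel"),
    ("B003", "Ann Beech", "Tina Murphy"),
    ("B003", "David Ford", "Tina Murphy"),
}
print(mvd_holds(branch_staff_owner, attrs, {"branchNo"}, {"sName"}))  # True

# Dropping one staff/owner combination breaks the MVD.
partial = branch_staff_owner - {("B003", "David Ford", "Tina Murphy")}
print(mvd_holds(partial, attrs, {"branchNo"}, {"sName"}))  # False
```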
Fourth Normal Form (4NF)
• Fourth normal form eliminates independent multi-valued relationships between columns of the same table.
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a relation may not contain more than one independent multi-valued attribute.
• Defined as a relation that is in Boyce-Codd Normal
Form and contains no nontrivial multi-valued
dependencies.
JOIN DEPENDENCIES
• When we decompose a relation into two relations, the resulting relations should have the lossless-join property. This property means that we can rejoin the resulting relations to produce exactly the original relation.
• The lossless-join property of a decomposition ensures that no spurious tuples are generated when the relations are reunited through a natural join operation.
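For binary decompositions there is a standard functional-dependency test: splitting R into R1 and R2 is lossless if the shared attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The sketch below applies it to the store example from the 2NF section; the helper names are hypothetical. It only reasons about FDs. The general condition replaces -> with ->> (R1 ∩ R2 ->> R1 − R2), which is why a decomposition such as the BranchStaffOwner example can be lossless without any FD.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless with respect to the FDs if (R1 ∩ R2) -> (R1 - R2) or
    (R1 ∩ R2) -> (R2 - R1) is implied, i.e. the shared attributes
    determine all of R1 or all of R2."""
    shared = closure(r1 & r2, fds)
    return r1 <= shared or r2 <= shared


fds = [({"Store ID"}, {"Purchase Location"})]
good = is_lossless_binary({"Customer ID", "Store ID"},
                          {"Store ID", "Purchase Location"}, fds)
bad = is_lossless_binary({"Customer ID", "Purchase Location"},
                         {"Store ID", "Purchase Location"}, fds)
print(good, bad)  # True False
```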
Example:
The decomposition of the BranchStaffOwner relation

  branchNo  sName        oName
  B003      Ann Beech    Carl Farrel
  B003      David Ford   Carl Farrel
  B003      Ann Beech    Tina Murphy
  B003      David Ford   Tina Murphy

into the BranchStaff relation

  branchNo  sName
  B003      Ann Beech
  B003      David Ford

and the BranchOwner relation

  branchNo  oName
  B003      Carl Farrel
  B003      Tina Murphy
• This decomposition has the lossless-join property, i.e., the original BranchStaffOwner relation can be reconstructed by performing a natural join on the BranchStaff and BranchOwner relations.
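The claim can be checked by computing the two projections and their natural join in Python. This is a minimal sketch; `project` and `natural_join` are hypothetical helpers that treat a relation as a set of tuples plus a list of column names, and the data is the BranchStaffOwner relation above.

```python
def project(rows, attrs, keep):
    """Projection: keep only the listed columns; duplicates collapse in a set."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join on all columns the two relations share.
    Returns the joined rows and their column list."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    rows = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                rows.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return rows, attrs1 + extra


attrs = ["branchNo", "sName", "oName"]
branch_staff_owner = {
    ("B003", "Ann Beech", "Carl Farrel"),
    ("B003", "David Ford", "Carl Farrel"),
    ("B003", "Ann Beech", "Tina Murphy"),
    ("B003", "David Ford", "Tina Murphy"),
}
branch_staff = project(branch_staff_owner, attrs, ["branchNo", "sName"])
branch_owner = project(branch_staff_owner, attrs, ["branchNo", "oName"])

joined, joined_attrs = natural_join(branch_staff, ["branchNo", "sName"],
                                    branch_owner, ["branchNo", "oName"])
print(joined == branch_staff_owner, joined_attrs == attrs)  # True True
```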
Fifth Normal Form (5NF)
• A relation decomposed into two relations must have the lossless-join property, which ensures that no spurious tuples are generated when the relations are reunited through a natural join operation.
• However, some relations cannot be decomposed losslessly into two relations and must instead be decomposed into more than two. Although rare, these cases are handled by join dependencies and fifth normal form (5NF).
• Fifth Normal Form:
• A relation that has no nontrivial join dependency (other than those implied by its candidate keys) is in fifth normal form.
• Example: Consider the PropertyItemSupplier relation.

  propertyNo  itemDescription  supplierNo
  PG4         Bed              S1
  PG4         Chair            S2
  PG16        Bed              S3
• As this relation contains a join dependency, it is not in fifth normal form.
• To remove the join dependency, decompose the relation into three relations as follows:
• PropertyItem

  propertyNo  itemDescription
  PG4         Bed
  PG4         Chair
  PG16        Bed
• The PropertyItemSupplier relation with the form (A, B, C) satisfies the join dependency JD(R1(A, B), R2(B, C), R3(A, C)),
• i.e., performing the join on all three relations recreates the original PropertyItemSupplier relation.
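The join dependency can be observed on the PropertyItemSupplier data with the same kind of projection and natural-join helpers (hypothetical names, redefined here so the sketch is self-contained). On this relation state, joining just R1(A, B) and R2(B, C) introduces spurious tuples, while joining in R3(A, C) as well removes them. This only demonstrates the dependency on one state; a join dependency is a constraint on every legal state.

```python
def project(rows, attrs, keep):
    """Projection: keep only the listed columns; duplicates collapse in a set."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join on all shared columns; returns (rows, column list)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    rows = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                rows.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return rows, attrs1 + extra


attrs = ["propertyNo", "itemDescription", "supplierNo"]
property_item_supplier = {
    ("PG4", "Bed", "S1"),
    ("PG4", "Chair", "S2"),
    ("PG16", "Bed", "S3"),
}
a1 = ["propertyNo", "itemDescription"]  # R1(A, B)
a2 = ["itemDescription", "supplierNo"]  # R2(B, C)
a3 = ["propertyNo", "supplierNo"]       # R3(A, C)
r1 = project(property_item_supplier, attrs, a1)
r2 = project(property_item_supplier, attrs, a2)
r3 = project(property_item_supplier, attrs, a3)

# Joining only R1 and R2 produces spurious tuples ...
r12, a12 = natural_join(r1, a1, r2, a2)
print(sorted(r12 - property_item_supplier))
# [('PG16', 'Bed', 'S1'), ('PG4', 'Bed', 'S3')]

# ... but joining R3 as well removes them and recreates the original relation.
r123, a123 = natural_join(r12, a12, r3, a3)
print(r123 == property_item_supplier, a123 == attrs)  # True True
```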
DOMAIN-KEY NORMAL FORM (DKNF)
• The idea behind DKNF is to specify the “ultimate normal
form” that takes into account all possible types of
dependencies and constraints.
• A relation is said to be in DKNF if all constraints and dependencies that should hold on the relation can be enforced simply by enforcing the domain constraints and key constraints on the relation.
• For a relation in DKNF, it becomes very straightforward to enforce all database constraints by simply checking that each attribute value in a tuple belongs to the appropriate domain and that every key constraint is enforced.
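Denormalization
- Introduces redundancy and gives up some integrity guarantees (such as referential integrity between the tables it merges) in exchange for faster query performance
- Denormalize when
• specific queries occur frequently,
• strict performance requirements must be met, and
• the data is not heavily updated.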