Database Normalization: A Guide to 1NF, 2NF, 3NF and BCNF Forms
1. Normalization
□ It is a technique for designing relational database tables
to minimize duplication of information.
□ Normalization is a practice to safeguard the database
against logical and structural anomalies.
□ Normalization is also termed canonical synthesis by
the experts.
□ It keeps data consistent and helps ensure that no data
is lost and that data integrity is preserved.
□ Its cost is complexity: a normalized design may need
more join operations, which can sometimes degrade
throughput.
□ The normal forms in use are 1NF, 2NF, 3NF, BCNF, 4NF,
5NF, DKNF and 6NF.
Jitendra Tomar 1
2. Normalization
Normal Forms
□ The Normal Forms (NF) of relational database theory provide
criteria for determining a relation's vulnerability to
logical inconsistencies and anomalies.
□ A relation always meets the requirements of its HNF
(Highest Normal Form) and of all normal forms lower than
its HNF. By definition, it fails to meet the requirements
of any Normal Form higher than its HNF.
□ The Normal Forms are applicable to individual relations. The
entire database is said to be in Normal Form n if all of its
relations are in Normal Form n.
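□ E.F. Codd introduced the first three Normal Forms, which are
based on the concept of functional dependencies (FDs), whereas
the higher Normal Forms (4NF and beyond) are based on other
kinds of relationship between the attributes of a relation,
such as multivalued and join dependencies.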
3. Normalization
According to Chris Date in "What First Normal Form
Really Means", a Relation is said to be an Un-Normalized
Relation if it does not follow all of the following rules:
□ There's no top-to-bottom ordering to the rows.
□ There's no left-to-right ordering to the columns.
□ There are no duplicate rows.
□ Every row-and-column intersection contains exactly
one value from the applicable domain (and nothing
else).
□ All columns are regular [i.e. rows have no hidden
components such as row IDs, object IDs, or hidden
timestamps].
4. Normalization
Un-Normalized Relation
□ The Relation given below is an Un-Normalized Relation
because the cell under the Contact_No attribute for
customer 456 is non-atomic: it holds two values.
Customer
Cust_ID  Name            Contact_No
123      Navin Kumar     01202536145
456      Vikas Malhotra  01245847698
                         01243265984
789      Ashish Sharma   01125698745
5. Normalization
First Normal Form
□ A Relation is said to be in First Normal Form (1NF) if it
meets a certain minimum set of criteria:
□ There's no top-to-bottom ordering of the rows.
□ There's no left-to-right ordering of the columns.
□ There are no duplicate rows.
□ Every row-and-column intersection contains exactly one
value from the applicable domain (and nothing else).
□ All columns are regular [i.e. rows have no hidden
components such as row IDs, object IDs, or timestamps].
□ These criteria are basically concerned with ensuring that
the table is a faithful representation of a Relation and
that it is free of Repeating Groups.
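Two of the criteria above (no duplicate rows, exactly one value per cell) can be checked mechanically. The following is a minimal sketch, assuming a table is held as a list of Python dictionaries and that a list, tuple, set or dict inside a cell stands for a repeating group. The function name `first_normal_form_problems` is hypothetical, and the ordering and hidden-column criteria are not something this representation can test.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def first_normal_form_problems(rows: List[Row]) -> List[str]:
    """Describe the atomicity and duplicate-row violations found in rows."""
    problems: List[str] = []
    seen = set()
    for index, row in enumerate(rows):
        for column, value in row.items():
            # A collection stored in one cell is treated as a repeating group.
            if isinstance(value, (list, tuple, set, dict)):
                problems.append(
                    f"row {index}: column {column!r} holds {len(value)} values"
                )
        # Compare rows as sets of (column, value) pairs, so that neither
        # row order nor column order affects the result.
        fingerprint = frozenset((c, repr(v)) for c, v in row.items())
        if fingerprint in seen:
            problems.append(f"row {index}: duplicate of an earlier row")
        seen.add(fingerprint)
    return problems


# The un-normalized Customer relation from the earlier slide.
customer = [
    {"Cust_ID": 123, "Name": "Navin Kumar", "Contact_No": "01202536145"},
    {"Cust_ID": 456, "Name": "Vikas Malhotra",
     "Contact_No": ["01245847698", "01243265984"]},
    {"Cust_ID": 789, "Name": "Ashish Sharma", "Contact_No": "01125698745"},
]

for problem in first_normal_form_problems(customer):
    print(problem)
```

An empty result only means that no violation was found in the rows supplied; it does not prove that the table design itself is in 1NF.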
6. Normalization
First Normal Form
□ A naive structure for a Customer table.
Customer
Cust_ID Name Contact_No
123 Navin Kumar 01202536145
456 Vikas Malhotra 01245847698
789 Ashish Sharma 01125698745
□ A problem of atomicity arises if the structure of the
Relation is kept as above.
□ Recording multiple contact numbers for a customer
violates atomicity and introduces repeating groups.
7. Normalization
First Normal Form
□ Solution 1
□ The simplest way is to allow more than one value in a
cell, which is itself the problem.
□ This introduces repeating groups of values from the
same domain within a single cell.
Customer
Cust_ID  Name            Contact_No
123      Navin Kumar     01202536145
456      Vikas Malhotra  01245847698
                         01243265984
789      Ashish Sharma   01125698745
8. Normalization
First Normal Form
□ Solution 2
Customer
Cust_ID  Name            Contact_No1  Contact_No2
123      Navin Kumar     01202536145  NULL
456      Vikas Malhotra  01245847698  01243265984
789      Ashish Sharma   01125698745  NULL
□ Another way is to define multiple columns, which
leaves some of them Null.
□ This introduces Repeating Groups across Columns.
□ Contact_No1, Contact_No2 and so on share an
identical domain and the same meaning.
9. Normalization
First Normal Form
□ The splitting is artificial and causes logical problems such as:
□ Difficulty in querying the table, for example "Which
customers have telephone number X?" requires
checking every Contact_No column.
□ A customer may by mistake be given identical values for
Contact_No1 & Contact_No2, which creates redundancy.
□ A restriction on the number of contact numbers per
customer. It leaves the DB with unrecorded information
and means that the DB design is imposing constraints on
the business process, rather than (as should ideally be
the case) vice versa.
10. Normalization
First Normal Form
□ Solution (A design that complies with 1NF)
□ Make two tables, one for the customer details and the
other for the contact details of the customer.
□ Repeating groups of contact details are absent in this
design.
Customer
Cust_ID  Name
123      Navin Kumar
456      Vikas Malhotra
789      Ashish Sharma
Contact
Cust_ID  Contact_No
123      01202536145
456      01245847698
456      01243265984
789      01125698745
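The two-table design can be tried out with Python's built-in sqlite3 module. This is a minimal sketch rather than a prescribed schema: the column types, the composite primary key on Contact and the helper `customers_with_number` are assumptions added for illustration. Contact numbers are stored as text so that leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        Cust_ID INTEGER PRIMARY KEY,
        Name    TEXT NOT NULL
    );
    CREATE TABLE Contact (
        Cust_ID    INTEGER NOT NULL REFERENCES Customer(Cust_ID),
        Contact_No TEXT NOT NULL,
        -- Assumed key: the same number cannot be stored twice for a customer.
        PRIMARY KEY (Cust_ID, Contact_No)
    );
""")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [(123, "Navin Kumar"), (456, "Vikas Malhotra"), (789, "Ashish Sharma")],
)
conn.executemany(
    "INSERT INTO Contact VALUES (?, ?)",
    [
        (123, "01202536145"),
        (456, "01245847698"),
        (456, "01243265984"),
        (789, "01125698745"),
    ],
)


def customers_with_number(number: str) -> list:
    """Answer 'Which customers have telephone number X?' with one predicate."""
    rows = conn.execute(
        "SELECT c.Name FROM Customer c "
        "JOIN Contact k ON k.Cust_ID = c.Cust_ID "
        "WHERE k.Contact_No = ? ORDER BY c.Name",
        (number,),
    ).fetchall()
    return [name for (name,) in rows]


print(customers_with_number("01243265984"))
```

The query needs a single comparison on Contact_No however many numbers a customer has, which is the difficulty the Contact_No1, Contact_No2 design could not avoid.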
12. Normalization
Functional Dependency (FD)
□ It is a property of the information represented by the
relation.
□ It defines the most commonly encountered type of
relatedness property between data items of a
database.
□ Usually, relatedness between the attributes of a single
relational table is considered.
□ FD concerns the dependence of the values of one
attribute or set of attributes on those of another attribute
or set of attributes, giving rise to constraint between two
attributes or two sets of attributes.
13. Normalization
Functional Dependency
□ In a Functional Dependency Diagram (FDD), a functional
dependency is represented by rectangles for the attributes
and a heavy arrow showing the dependency.
□ E.g. FD: Y → X (Functional Dependency Diagram
when X is functionally dependent on Y.)
[ Y ] ==> [ X ]
□ The arrow notation '→' is read as 'functionally
determines.'
14. Normalization
Functional Dependency
□ In general terms, a set of attributes X in a relational
table T is said to be functionally dependent on a set of
attributes Y in the table T if a given set of values for
the attributes in Y determines a unique (only one) value
for the set of attributes in X.
□ In the FD: Y → X, the left-hand side Y is called the
determinant and the right-hand side X is called the
dependent. The determinant and dependent are both sets
of attributes.
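The definition above can also be stated formally. As a sketch, writing t[Y] for the values a tuple t takes on the attributes in Y, and T ⊨ Y → X for "the FD holds in T" (both notations are assumptions introduced here, not taken from the slides), the FD holds when any two tuples that agree on Y also agree on X:

```latex
T \models Y \rightarrow X
\;\Longleftrightarrow\;
\forall\, t_1, t_2 \in T:\quad
t_1[Y] = t_2[Y] \;\Rightarrow\; t_1[X] = t_2[X]
```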
15. Normalization
Functional Dependency
□ Example
□ Let us consider the functional dependency that each
person works on only one machine on any given day,
which is given as:
FD: {Person_Id, Date_Used} → {Machine_No}
16. Normalization
Functional Dependency
□ Let us consider a functional dependency of the relation
Assign, which is given as:
Assign
Emp_No  Project  Yrs_Spent
106519  P1       5
112233  P3       2
106519  P2       4
111222  P1       4
□ Here, if the values of Emp_No and Project are known, a
unique value of Yrs_Spent is also known.
FD: {Emp_No, Project} → {Yrs_Spent}
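The Assign example can be checked against its rows. The sketch below assumes the relation is held as a list of dictionaries; the helper `fd_holds` is a hypothetical name. It reports whether any two rows agree on the determinant but differ on the dependent. Such a test can refute an FD from sample data, but it cannot prove that the FD holds for every future row.

```python
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def fd_holds(rows: List[Row], determinant: Sequence[str],
             dependent: Sequence[str]) -> bool:
    """True if no two rows agree on determinant but differ on dependent."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The rows of the Assign relation from the slide.
assign = [
    {"Emp_No": 106519, "Project": "P1", "Yrs_Spent": 5},
    {"Emp_No": 112233, "Project": "P3", "Yrs_Spent": 2},
    {"Emp_No": 106519, "Project": "P2", "Yrs_Spent": 4},
    {"Emp_No": 111222, "Project": "P1", "Yrs_Spent": 4},
]

print(fd_holds(assign, ["Emp_No", "Project"], ["Yrs_Spent"]))  # True
print(fd_holds(assign, ["Emp_No"], ["Yrs_Spent"]))   # False: 106519 has 5 and 4
print(fd_holds(assign, ["Project"], ["Yrs_Spent"]))  # False: P1 has 5 and 4
```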
17. Normalization
Full Functional Dependency
□ The set of attributes X will be fully functionally dependent
on the set of attributes Y if the following conditions are
satisfied:
□ X is functionally dependent on Y and
□ X is not functionally dependent on any subset of Y
□ Example:
{Proj_Id} has a functional dependency on {Employee
ID, Skill}, but not a full functional dependency,
because it is also dependent on {Employee ID} alone.
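The two conditions for full functional dependency can be sketched in code. The slide gives no rows for the Employee/Skill/Project example, so the `staffing` rows below are hypothetical data made up for illustration, and the function names are assumptions. Only non-empty proper subsets of the determinant are tried.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: List[Row], determinant: Sequence[str],
             dependent: Sequence[str]) -> bool:
    """True if no two rows agree on determinant but differ on dependent."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_determinants(rows: List[Row], determinant: Sequence[str],
                         dependent: Sequence[str]) -> List[Tuple[str, ...]]:
    """Non-empty proper subsets of determinant that already fix dependent."""
    return [
        subset
        for size in range(1, len(determinant))
        for subset in combinations(determinant, size)
        if fd_holds(rows, subset, dependent)
    ]


def is_full_fd(rows: List[Row], determinant: Sequence[str],
               dependent: Sequence[str]) -> bool:
    """The FD holds and no proper subset of the determinant suffices."""
    return (fd_holds(rows, determinant, dependent)
            and not partial_determinants(rows, determinant, dependent))


# Hypothetical rows: each employee works on exactly one project.
staffing = [
    {"Employee_ID": 1, "Skill": "SQL", "Proj_Id": "P1"},
    {"Employee_ID": 1, "Skill": "Python", "Proj_Id": "P1"},
    {"Employee_ID": 2, "Skill": "SQL", "Proj_Id": "P2"},
    {"Employee_ID": 3, "Skill": "Java", "Proj_Id": "P2"},
]

print(fd_holds(staffing, ["Employee_ID", "Skill"], ["Proj_Id"]))
print(partial_determinants(staffing, ["Employee_ID", "Skill"], ["Proj_Id"]))
print(is_full_fd(staffing, ["Employee_ID", "Skill"], ["Proj_Id"]))
```

With these rows the FD holds, but {Employee_ID} alone already determines Proj_Id, so the dependency is partial rather than full.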
19. Normalization
Second Normal Form
□ A relation R is said to be in Second Normal Form (2NF) if
□ it is in 1NF and
□ every non-prime attribute of R is fully functionally
dependent on the primary key, i.e. no partial
dependency is allowed in the relation R.
□ In other words, no non-prime attribute of the relation
should be functionally dependent on only one part of a
concatenated primary key.
□ Thus, 2NF can be violated only when the key is a
composite key, consisting of more than one attribute.
□ 2NF is an intermediate step towards higher normal forms;
it removes the partial-dependency problems that remain in 1NF.
20. Normalization
Second Normal Form
Example 1
□ Consider the relation PATIENT_DOCTOR given below. The
relation is in 1NF.
Patient_Name  DOB         Doctor_Name  Contact_No  Date_Time       Consult_Duration
Ravi          10.02.1963  Abhishek     1122334     10.01.08 10:20  15
Sanjay        05.08.1983  Prakash      2255886     10.01.08 11:00  10
Ravi          10.02.1963  Manish       5566448     10.01.08 12:30  20
□ Since a doctor cannot have two simultaneous
appointments, DOCTOR_NAME and DATE_TIME together
form a composite key.
21. Normalization
Second Normal Form
□ Similarly, a patient cannot have appointments with two
different doctors at the same time. Therefore, the
PATIENT_NAME and DATE_TIME attributes together also
form a candidate key.
□ The Relation could be depicted as:
PATIENT_DOCTOR (PATIENT_NAME, DOB, DOCTOR_NAME,
DATE_TIME, CONTACT_NO, CONSULT_DURATION)
□ In this relation the composite key (DOCTOR_NAME, DATE_TIME)
is taken as the primary key.
□ But there is a partial dependency, as CONTACT_NO is
functionally dependent upon DOCTOR_NAME alone, and
hence the relation is not in 2NF.
22. Normalization
Second Normal Form
□ Therefore, to bring the relation into 2NF, the information
about doctors and their contact numbers has to be
separated from the information about patients and their
appointments with doctors. Thus, the relation is
decomposed into two tables, namely PATIENT_DOCTOR
and DOCTOR.
□ The relation in 2NF can be depicted as:
PATIENT_DOCTOR
(PATIENT_NAME, DOB, DOCTOR_NAME, DATE_TIME,
CONSULT_DURATION)
DOCTOR
(DOCTOR_NAME, CONTACT_NO)
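The decomposition into PATIENT_DOCTOR and DOCTOR is a pair of projections, and a natural join on Doctor_Name should give back the original rows. The sketch below illustrates this on the three rows from the example table. The helpers `project`, `natural_join` and `as_set` are hypothetical names, and the 2NF PATIENT_DOCTOR is taken to carry no Contact_No column, since that attribute moves to DOCTOR.

```python
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def project(rows: List[Row], attributes: Sequence[str]) -> List[Row]:
    """Relational projection: keep the given attributes, drop duplicate rows."""
    unique = {tuple(row[a] for a in attributes) for row in rows}
    return [dict(zip(attributes, values)) for values in sorted(unique)]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Join two relations on the attributes they have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows: List[Row]) -> set:
    """Order-independent view of a relation, for comparing two relations."""
    return {frozenset(row.items()) for row in rows}


# The PATIENT_DOCTOR relation in 1NF, as given in the example.
patient_doctor_1nf = [
    {"Patient_Name": "Ravi", "DOB": "10.02.1963", "Doctor_Name": "Abhishek",
     "Contact_No": "1122334", "Date_Time": "10.01.08 10:20",
     "Consult_Duration": 15},
    {"Patient_Name": "Sanjay", "DOB": "05.08.1983", "Doctor_Name": "Prakash",
     "Contact_No": "2255886", "Date_Time": "10.01.08 11:00",
     "Consult_Duration": 10},
    {"Patient_Name": "Ravi", "DOB": "10.02.1963", "Doctor_Name": "Manish",
     "Contact_No": "5566448", "Date_Time": "10.01.08 12:30",
     "Consult_Duration": 20},
]

doctor = project(patient_doctor_1nf, ["Doctor_Name", "Contact_No"])
patient_doctor = project(
    patient_doctor_1nf,
    ["Patient_Name", "DOB", "Doctor_Name", "Date_Time", "Consult_Duration"],
)

rejoined = natural_join(patient_doctor, doctor)
print(as_set(rejoined) == as_set(patient_doctor_1nf))
```

Because Doctor_Name is the key of DOCTOR, the join neither loses rows nor invents new ones. In the sample data every doctor appears only once, so the saving in repeated contact numbers would only show with more appointments per doctor.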
23. Normalization
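Second Normal Form
Functional dependency diagram of the relation in 1NF:
{Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration, Contact_No
Doctor_Name → Contact_No
Functional dependency diagram after decomposition into 2NF:
{Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration
Doctor_Name → Contact_No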
24. Normalization
Second Normal Form
Problems remaining in the above example:
□ Deleting a record from the relation PATIENT_DOCTOR may
lose the patient’s details.
□ Any change in the details of a patient may involve
changing multiple occurrences, because this information
is still stored redundantly.
26. Normalization
Third Normal Form
□ A relation R is said to be in Third Normal Form (3NF) if
□ it is in 2NF, and
□ the non-prime attributes are
□ mutually independent and
□ functionally dependent on the primary (or relation) key.
□ In other words, no non-prime attribute should be
transitively dependent on the primary key, i.e. no
non-prime attribute is functionally dependent on another
non-prime attribute.
□ This means that a relation in 3NF consists of the
primary key and a set of independent non-prime
attributes.
27. Normalization
Third Normal Form
□ The Relation in 1NF was depicted as:
PATIENT_DOCTOR (PATIENT_NAME, DOB, DOCTOR_NAME,
DATE_TIME, CONTACT_NO, CONSULT_DURATION)
□ The relation in 2NF was depicted as:
PATIENT_DOCTOR
(PATIENT_NAME, DOB, DOCTOR_NAME, DATE_TIME,
CONSULT_DURATION)
DOCTOR
(DOCTOR_NAME, CONTACT_NO)
29. Normalization
The functional dependency diagrams (FDD) in 1NF and 2NF were given as below:
In 1NF:
{Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration, Contact_No
Doctor_Name → Contact_No
In 2NF:
{Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration
Doctor_Name → Contact_No
30. Normalization
Third Normal Form
□ But the Relation still has a problem, as the attribute
DOB is not directly dependent on the composite
primary key, i.e. {Doctor_Name, Date_Time}.
□ In fact, the attribute DOB is functionally dependent on
Patient_Name, which is a non-key attribute, so DOB
depends on the primary key only transitively.
□ Thus Patient_Name and DOB are not mutually
independent.
□ Since in 3NF the non-key attributes should be mutually
independent and no transitive dependency should
occur, the relation violates the requirements of 3NF.
□ The relation has to be decomposed to remove the parts
that are not directly dependent on the primary key.
31. Normalization
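The functional dependency diagram (FDD) of the relation in 2NF, first as
given before and then redrawn to show the transitive dependency:
As given before:
{Doctor_Name, Date_Time} → Patient_Name, DOB, Cons_Duration
Doctor_Name → Contact_No
With the transitive dependency shown:
{Doctor_Name, Date_Time} → Cons_Duration, Patient_Name
Patient_Name → DOB
Doctor_Name → Contact_No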
32. Normalization
Third Normal Form
□ Thus the relation in 3NF is depicted as:
PATIENT
(PATIENT_NAME, DOB)
PATIENT_DOCTOR
(PATIENT_NAME, DOCTOR_NAME, DATE_TIME,
CONSULT_DURATION)
DOCTOR
(DOCTOR_NAME, CONTACT_NO)
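The three 3NF relations can be written down as tables with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the column types, the foreign keys and the UNIQUE constraint for the second candidate key are additions for illustration, and names are used as keys only because the slides do so.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE PATIENT (
        PATIENT_NAME TEXT PRIMARY KEY,
        DOB          TEXT NOT NULL
    );
    CREATE TABLE DOCTOR (
        DOCTOR_NAME TEXT PRIMARY KEY,
        CONTACT_NO  TEXT NOT NULL
    );
    CREATE TABLE PATIENT_DOCTOR (
        PATIENT_NAME     TEXT NOT NULL REFERENCES PATIENT(PATIENT_NAME),
        DOCTOR_NAME      TEXT NOT NULL REFERENCES DOCTOR(DOCTOR_NAME),
        DATE_TIME        TEXT NOT NULL,
        CONSULT_DURATION INTEGER NOT NULL,
        PRIMARY KEY (DOCTOR_NAME, DATE_TIME),
        UNIQUE (PATIENT_NAME, DATE_TIME)  -- the other candidate key
    );
""")
conn.executemany("INSERT INTO PATIENT VALUES (?, ?)",
                 [("Ravi", "10.02.1963"), ("Sanjay", "05.08.1983")])
conn.executemany("INSERT INTO DOCTOR VALUES (?, ?)",
                 [("Abhishek", "1122334"), ("Prakash", "2255886"),
                  ("Manish", "5566448")])
conn.executemany("INSERT INTO PATIENT_DOCTOR VALUES (?, ?, ?, ?)",
                 [("Ravi", "Abhishek", "10.01.08 10:20", 15),
                  ("Sanjay", "Prakash", "10.01.08 11:00", 10),
                  ("Ravi", "Manish", "10.01.08 12:30", 20)])


def appointments() -> list:
    """Rebuild the original wide rows by joining the three relations."""
    return conn.execute("""
        SELECT pd.PATIENT_NAME, p.DOB, pd.DOCTOR_NAME, d.CONTACT_NO,
               pd.DATE_TIME, pd.CONSULT_DURATION
        FROM PATIENT_DOCTOR pd
        JOIN PATIENT p ON p.PATIENT_NAME = pd.PATIENT_NAME
        JOIN DOCTOR d ON d.DOCTOR_NAME = pd.DOCTOR_NAME
        ORDER BY pd.DATE_TIME
    """).fetchall()


for row in appointments():
    print(row)
```

Ravi has two appointments, but his DOB is now stored in a single PATIENT row, so a correction touches one row and deleting an appointment does not lose the patient's details.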
33. Normalization
Third Normal Form
□ Functional dependency diagram (FDD) of the relation in 3NF:
{Doctor_Name, Date_Time} → Cons_Duration, Patient_Name
Patient_Name → DOB
Doctor_Name → Contact_No