Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model.
By Harsiddhi Thakkar
If you have any questions,
contact me at: harsiddhithakkar94@gmail.com
2. Agenda
• What Is Normalization?
• Why Do We Use Normalization?
• Various Levels Of Normalization
• Are There Any Tools For Normalization?
• Summary
3. What Is Normalization?
• Normalization is the process of minimizing redundancy in a
relation or set of relations.
• Redundancy in a relation may cause insertion, deletion and update
anomalies, so minimizing redundancy helps to avoid them.
4. What Are Anomalies?
• Insertion anomaly: There are circumstances in which certain facts
cannot be recorded at all.
• Update anomaly: The same information can be expressed on multiple
rows; therefore updates to the relation may result in logical
inconsistencies.
• Deletion anomaly: The unintended loss of data due to deletion of other
data.
5. Example Of Insert Anomaly
StudentNo  CourseNo  Student Name  Address     Course
S21        9201      Jones         Edinburgh   Accounts
S21        9267      Jones         Edinburgh   Accounts
S24        9267      Smith         Glasgow     Physics
S30        9201      Richards      Manchester  Computing
S30        9322      Richards      Manchester  Maths
Here we can't add a new course unless we have at least one student
enrolled on the course.
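A minimal sketch of the insert anomaly, using Python's built-in sqlite3 module. The table name `enrolment`, the assumption that (StudentNo, CourseNo) is the primary key, and the new course 9400 "Chemistry" are illustrative and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: (StudentNo, CourseNo) is the primary key of the single wide table.
# NOT NULL is spelled out because SQLite does not enforce it by default on
# non-integer primary key columns.
conn.execute("""
    CREATE TABLE enrolment (
        StudentNo   TEXT NOT NULL,
        CourseNo    INTEGER NOT NULL,
        StudentName TEXT,
        Address     TEXT,
        Course      TEXT,
        PRIMARY KEY (StudentNo, CourseNo)
    )
""")

def try_add_course_without_student(course_no, course):
    """Try to record a course that has no enrolled student yet."""
    try:
        conn.execute(
            "INSERT INTO enrolment (StudentNo, CourseNo, Course) "
            "VALUES (NULL, ?, ?)",
            (course_no, course),
        )
        return True
    except sqlite3.IntegrityError:
        # The key needs a StudentNo, so the course cannot be stored on its own.
        return False

added = try_add_course_without_student(9400, "Chemistry")  # hypothetical course
print(added)
```

The insert is rejected because part of the key would be missing, which is exactly the situation described above.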
6. Example Of Update Anomaly
StudentNo  CourseNo  Student Name  Address     Course
S21        9201      Jones         Edinburgh   Accounts
S21        9267      Jones         Edinburgh   Accounts
S24        9267      Smith         Glasgow     Physics
S30        9201      Richards      Manchester  Computing
S30        9322      Richards      Manchester  Maths
Consider Jones changing address: every row that holds
Jones's address must be updated.
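A small sketch of how the update anomaly can arise, again with sqlite3. The table name `enrolment` and the new address "London" are assumptions made for illustration; the rows are the ones in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolment (StudentNo TEXT, CourseNo INTEGER, "
    "StudentName TEXT, Address TEXT, Course TEXT)"
)
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?, ?, ?, ?)",
    [
        ("S21", 9201, "Jones", "Edinburgh", "Accounts"),
        ("S21", 9267, "Jones", "Edinburgh", "Accounts"),
        ("S24", 9267, "Smith", "Glasgow", "Physics"),
        ("S30", 9201, "Richards", "Manchester", "Computing"),
        ("S30", 9322, "Richards", "Manchester", "Maths"),
    ],
)

# A careless update that touches only one of Jones's two rows.
# "London" is a hypothetical new address.
conn.execute(
    "UPDATE enrolment SET Address = 'London' "
    "WHERE StudentNo = 'S21' AND CourseNo = 9201"
)

addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Address FROM enrolment WHERE StudentNo = 'S21'"
    )
)
print(addresses)  # Jones now has two conflicting addresses
```

Because the address is stored once per enrolment rather than once per student, nothing in the table prevents the two rows from disagreeing.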
7. Example Of Delete Anomaly
StudentNo  CourseNo  Student Name  Address     Course
S21        9201      Jones         Edinburgh   Accounts
S21        9267      Jones         Edinburgh   Accounts
S24        9267      Smith         Glasgow     Physics
S30        9201      Richards      Manchester  Computing
S30        9322      Richards      Manchester  Maths
Consider what happens if student S30 is the last student to leave a
course: all information about that course is lost.
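The delete anomaly can be sketched the same way. The table name `enrolment` is an assumption; the rows come from the table above, where S30 is the only student on course 9322.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolment (StudentNo TEXT, CourseNo INTEGER, "
    "StudentName TEXT, Address TEXT, Course TEXT)"
)
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?, ?, ?, ?)",
    [
        ("S21", 9201, "Jones", "Edinburgh", "Accounts"),
        ("S21", 9267, "Jones", "Edinburgh", "Accounts"),
        ("S24", 9267, "Smith", "Glasgow", "Physics"),
        ("S30", 9201, "Richards", "Manchester", "Computing"),
        ("S30", 9322, "Richards", "Manchester", "Maths"),
    ],
)

def course_numbers():
    """Return the set of course numbers the table still knows about."""
    return {row[0] for row in conn.execute("SELECT DISTINCT CourseNo FROM enrolment")}

before = course_numbers()
conn.execute("DELETE FROM enrolment WHERE StudentNo = 'S30'")  # S30 leaves
lost = before - course_numbers()
print(lost)  # courses that vanished together with the student
```

Removing the student also removes the only record that course 9322 exists.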
8. Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
Most databases should be in 3NF or BCNF in order to avoid the
database anomalies.
9. First Normal Form
• We say a relation is in 1NF if all values stored in the relation are single-
valued and atomic.
• 1NF places restrictions on the structure of relations. Values must be
simple.
10. Example Of 1NF
• The following Student table is not in 1NF
StudentNo  CourseNo  Student Name  Address     Course
S21        9201      Jones         Edinburgh   Accounts, Physics, Maths
S30        9201      Richards      Manchester  Computing, Physics
S24        9267      Smith         Glasgow     Physics
11. Example Of 1NF
• Now the Student table is in 1NF
StudentNo  CourseNo  Student Name  Address     Course
S21        9201      Jones         Edinburgh   Accounts
S21        9201      Jones         Edinburgh   Physics
S21        9201      Jones         Edinburgh   Maths
S30        9201      Richards      Manchester  Computing
S30        9201      Richards      Manchester  Physics
S24        9267      Smith         Glasgow     Physics
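The step from the non-1NF table to the 1NF table can be sketched in a few lines of Python. The function name `to_1nf` is hypothetical, and the sketch assumes the multi-valued Course cell is stored as a comma-separated string.

```python
# Rows of the non-1NF table: the Course cell holds a comma-separated list.
unnormalized = [
    ("S21", 9201, "Jones", "Edinburgh", "Accounts,Physics,Maths"),
    ("S30", 9201, "Richards", "Manchester", "Computing,Physics"),
    ("S24", 9267, "Smith", "Glasgow", "Physics"),
]

def to_1nf(rows):
    """Emit one row per course so that every cell holds a single value."""
    result = []
    for student_no, course_no, name, address, courses in rows:
        for course in courses.split(","):
            course = course.strip()
            if course:  # skip empty pieces left by stray commas
                result.append((student_no, course_no, name, address, course))
    return result

first_nf = to_1nf(unnormalized)
for row in first_nf:
    print(row)
```

Each output row repeats the student's details, which is why 1NF on its own still leaves the redundancy that the later normal forms remove.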
12. Second Normal Form
• Before we learn about the second normal form, we need to understand
the following terms:
• A candidate key is a column, or minimal set of columns, in a table that
can uniquely identify any database record without referring to any other
data. Each table may have one or more candidate keys; one of them is
chosen as the primary key.
• Prime attribute: An attribute that is part of some candidate key is
known as a prime attribute.
• Non-prime attribute: An attribute that is not part of any candidate
key is said to be a non-prime attribute.
13. Functional Dependency
• Functional dependency is a relationship that exists when one attribute
(or set of attributes) uniquely determines another attribute.
• Functional dependency is represented by an arrow sign (→): X → Y
means that X functionally determines Y. The left-hand side attributes
determine the values of the attributes on the right-hand side.
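A minimal sketch of checking whether X → Y holds in a set of sample rows. The helper name `holds` is hypothetical; the rows are the Student table from the anomaly examples. Note that sample data can only refute a dependency: a dependency that holds in these rows may still fail for other data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[col] for col in lhs)
        y = tuple(row[col] for col in rhs)
        if seen.setdefault(x, y) != y:
            return False  # the same X value is paired with two different Y values
    return True

columns = ("StudentNo", "CourseNo", "StudentName", "Address", "Course")
data = [
    ("S21", 9201, "Jones", "Edinburgh", "Accounts"),
    ("S21", 9267, "Jones", "Edinburgh", "Accounts"),
    ("S24", 9267, "Smith", "Glasgow", "Physics"),
    ("S30", 9201, "Richards", "Manchester", "Computing"),
    ("S30", 9322, "Richards", "Manchester", "Maths"),
]
rows = [dict(zip(columns, record)) for record in data]

# StudentNo -> StudentName, Address: each student has one name and address.
student_fd = holds(rows, ("StudentNo",), ("StudentName", "Address"))
# CourseNo -> StudentNo: fails, because course 9201 has two students.
course_fd = holds(rows, ("CourseNo",), ("StudentNo",))
print(student_fd, course_fd)
```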
14. Second Normal Form
• A relation is in second normal form if:
• It is in first normal form
• All non-prime attributes are fully functionally dependent on the primary
key (none of them depends on only part of a composite key)
15. Example Of 2NF
• Here the Student table is not in 2NF
StudentNo  CourseNo  Student Name  Address     Course
S21        9201      Jones         Edinburgh   Accounts
S21        9201      Jones         Edinburgh   Physics
S21        9201      Jones         Edinburgh   Maths
S30        9201      Richards      Manchester  Computing
S30        9201      Richards      Manchester  Physics
S24        9267      Smith         Glasgow     Physics
16. Example Of 2NF
• Solution: the Student table in 2NF (each student is stored once)
StudentNo  Student Name  Address
S21        Jones         Edinburgh
S30        Richards      Manchester
S24        Smith         Glasgow
17. Example Of 2NF
• Here is the Course table in 2NF (repeated rows stored once)
CourseNo  Course
9201      Accounts
9201      Physics
9201      Maths
9201      Computing
9267      Physics
18. Example Of 2NF
• The original Student table has a composite primary key [StudentNo,
CourseNo]. Here, Course depends on CourseNo
• FD: CourseNo → Course
• The non-key attribute is [Course]. In this case, [Course] depends only
on [CourseNo], which is only part of the primary key. Therefore, this
table does not satisfy second normal form.
• This is called a partial dependency
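A rough sketch of how partial dependencies could be detected from a list of functional dependencies. The function name `partial_dependencies` is hypothetical, the sketch assumes the composite primary key is the only candidate key, and the second dependency in the list (StudentNo → StudentName, Address) is an assumption that is not stated on the slide.

```python
def partial_dependencies(key, fds):
    """List the FDs that make a non-key attribute depend on part of the key.

    key is a set of column names; fds is a list of (lhs, rhs) pairs of sets.
    """
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        non_key = frozenset(rhs) - key
        # "lhs < key" means lhs is a proper subset of the key.
        if lhs < key and non_key:
            found.append((sorted(lhs), sorted(non_key)))
    return found

key = {"StudentNo", "CourseNo"}
fds = [
    ({"CourseNo"}, {"Course"}),                   # stated on the slide
    ({"StudentNo"}, {"StudentName", "Address"}),  # assumed for illustration
]
violations = partial_dependencies(key, fds)
for lhs, rhs in violations:
    print(lhs, "->", rhs)
```

Each reported dependency suggests one table to split off, keyed by its left-hand side, which is the decomposition shown on the previous slides.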
19. Third Normal Form
• For a relation to be in Third Normal Form, it must be in Second Normal
Form and satisfy the following:
• No non-prime attribute is transitively dependent on the primary key.
20. Example Of 3NF
StudentNo  Student Name  City       Zip      Address
S21        Jones         Surat      3080005  Edinburgh
S21        Jones         Surat      3080005  Edinburgh
S21        Jones         Surat      3080005  Edinburgh
S30        Richards      Surat      3080005  Manchester
S30        Richards      Ahmedabad  380009   Manchester
S24        Smith         Ahmedabad  380009   Glasgow
21. Example Of 3NF
• In the above Student relation, StudentNo is the key and the only
prime attribute. City can be determined from StudentNo as well as from
Zip. Zip is not a superkey and City is not a prime attribute. Since
StudentNo → Zip and Zip → City, City is transitively dependent on
StudentNo.
• To bring this relation into third normal form, we break the relation into
two relations as follows:
22. Example Of 3NF
StudentNo  Student Name  Zip
S21        Jones         3080005
S30        Richards      380009
S24        Smith         380009
Student Table
23. Example Of 3NF
Zip      City
3080005  Surat
380009   Ahmedabad
Student_details Table
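The two relations can be sketched as SQLite tables. The table names `student` and `zip_city` are assumptions, and the Zip/City pairs follow the un-decomposed table on slide 20 (3080005 for Surat, 380009 for Ahmedabad).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zip_city (
        Zip  TEXT PRIMARY KEY,
        City TEXT NOT NULL
    );
    CREATE TABLE student (
        StudentNo   TEXT PRIMARY KEY,
        StudentName TEXT NOT NULL,
        Zip         TEXT NOT NULL REFERENCES zip_city (Zip)
    );
    INSERT INTO zip_city VALUES ('3080005', 'Surat'), ('380009', 'Ahmedabad');
    INSERT INTO student VALUES
        ('S21', 'Jones', '3080005'),
        ('S30', 'Richards', '380009'),
        ('S24', 'Smith', '380009');
""")

# The city is stored once per zip code and recovered with a join.
cities = dict(conn.execute(
    "SELECT s.StudentNo, z.City FROM student AS s "
    "JOIN zip_city AS z ON z.Zip = s.Zip"
))
print(cities)
```

Since City now lives only in the zip table, a city name is stored once per zip code and no longer depends on StudentNo through Zip.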
24. Boyce-Codd Normal Form
• Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal
Form. BCNF states that:
• For any non-trivial functional dependency X → A, X must be a super-
key.
• In the tables above, StudentNo is the super-key in the Student
relation and Zip is the super-key in the Zip/City relation. So,
• StudentNo → Stu_Name, Zip and
• Zip → City
• which confirms that both relations are in BCNF.
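The BCNF condition can be checked mechanically with an attribute closure. This is a sketch under the assumption that the listed functional dependencies are the only ones that apply to each relation; the function names `closure` and `is_bcnf` are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(relation, fds):
    """Check that the left side of every non-trivial FD is a superkey."""
    relation = set(relation)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not relation <= closure(lhs, fds):
            return False  # lhs does not determine the whole relation
    return True

student_ok = is_bcnf(
    {"StudentNo", "Stu_Name", "Zip"},
    [({"StudentNo"}, {"Stu_Name", "Zip"})],
)
zip_ok = is_bcnf({"Zip", "City"}, [({"Zip"}, {"City"})])

# Before the split, Zip -> City sits in a relation where Zip is not a superkey.
combined_ok = is_bcnf(
    {"StudentNo", "Stu_Name", "Zip", "City"},
    [({"StudentNo"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})],
)
print(student_ok, zip_ok, combined_ok)
```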
25. Fourth Normal Form
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one independent
multi-valued dependency.
27. Example Of 4NF
• Note that all three attributes make up the Primary Key.
• Note that StudentNo can be associated with many
CourseNo values as well as many Activity values
(multi-valued dependencies).
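A small sketch of why two independent multi-valued facts are better kept in separate tables. The sample data (student S21 with two courses and two activities) is hypothetical, since the example table itself is not reproduced here.

```python
from itertools import product

# Hypothetical data: S21 takes two courses and two activities, independently.
courses = {"S21": [9201, 9267]}
activities = {"S21": ["Chess", "Football"]}

# Single table: every course has to be paired with every activity.
combined = {
    (student, course, activity)
    for student in courses
    for course, activity in product(courses[student], activities[student])
}

# 4NF decomposition: one table per independent multi-valued fact.
student_course = {(s, c) for s, c, _ in combined}
student_activity = {(s, a) for s, _, a in combined}

# Joining the two projections on StudentNo rebuilds the original rows.
rejoined = {
    (s, c, a)
    for s, c in student_course
    for s2, a in student_activity
    if s == s2
}
print(len(combined), len(student_course), len(student_activity))
print(rejoined == combined)
```

In the single table, adding one more activity for S21 would need one new row per course; in the decomposed form it is a single row in the activity table.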
29. Fifth Normal Form
• A table is in 5NF if it is in 4NF and cannot be losslessly decomposed into
any number of smaller tables (relations), other than decompositions that
follow from its candidate keys.
• It is also known as Project-Join Normal Form (PJ/NF)
31. Example Of 5NF
Class  CourseNo
SEM1   9201
SEM2   9201
SEM3   9201
SEM1   9201
SEM2   9201
SEM1   9267
T3 - Student_class Table
32. Example Of 5NF
• If we perform a natural join of the above three
relations, no spurious (extra) rows are added, so this
decomposition is called a lossless decomposition.
• So the three tables P1, P2 and P3 are now in 5NF
33. Example Of 5NF
StudentNo  CourseNo  Class
S21        9201      01
S21        9201      02
S21        9201      01
S30        9201      02
S30        9201      01
S24        9267      02
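The lossless-join claim can be checked directly on the StudentNo / CourseNo / Class table above. This is a sketch: the names p1, p2 and p3 echo the slides, but which pair of columns each projection holds is an assumption.

```python
# Rows of the StudentNo / CourseNo / Class table (the repeated row is stored once).
rows = {
    ("S21", 9201, "01"),
    ("S21", 9201, "02"),
    ("S30", 9201, "02"),
    ("S30", 9201, "01"),
    ("S24", 9267, "02"),
}

# Three binary projections of the table.
p1 = {(s, c) for s, c, _ in rows}  # StudentNo, CourseNo
p2 = {(c, k) for _, c, k in rows}  # CourseNo, Class
p3 = {(s, k) for s, _, k in rows}  # StudentNo, Class

# Natural join of the three projections.
joined = {
    (s, c, k)
    for s, c in p1
    for c2, k in p2
    if c == c2 and (s, k) in p3
}

spurious = joined - rows  # rows that the join invents
lossless = joined == rows
print(lossless, spurious)
```

For this data the join returns exactly the original rows. With other data the same three projections can produce spurious rows, so the check has to be justified by the rules of the domain rather than by one sample.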
34. Why Normalization Is Not Always Good
• More tables to join: spreading data out into more tables increases the
need to join tables, and the task becomes more tedious. The
database also becomes harder to understand.
• Tables contain codes rather than real data: repeated data is stored
as codes rather than the actual values. Therefore, there
is always a need to go to the lookup table.
35. Why Normalization Is Not Always Good
• The data model becomes difficult to query against because it is
optimized for applications, not for ad hoc querying. (An ad hoc
query is a query that cannot be determined before it is issued.
It consists of SQL that is constructed dynamically, usually by
desktop-friendly query tools.) Hence it is hard
to model the database without knowing what the customer wants.
• As the level of normalization increases, queries generally need more
joins, which can slow down performance.
36. Why Normalization Is Not Always Good
• Proper knowledge of the various normal forms is required to carry out
the normalization process efficiently. Careless use may lead to a terrible
design filled with major anomalies and data inconsistency.
37. Are There Any Tools For Normalization?
• No tools are available that perform normalization itself.
• Some of the best paid-for data modeling software:
• ERwin (CA ERwin data modeling tool), Embarcadero ER/Studio (enterprise
data modeling and metadata management software)
• Some of the best free data modeling software:
• SQL Power Architect; also available in
a paid-for version
• MySQL Workbench; also available in an
"enterprise edition"
• Oracle SQL Developer Data Modeler