SlideShare a Scribd company logo
1 of 20
* 
In relational database design, the process of organizing data to minimize redundancy is 
called normalization or the process of decomposing relation with anomalies to produce 
smaller ,well-structured relation. Normalization usually involves dividing a database into 
two or more tables and defining relationships between the tables. The objective is to 
isolate data so that additions, deletions, and modifications of a field can be made in just 
one table and then propagated through the rest of the database via the defined 
relationships. 
Normalization is a process of decomposing a set of relations(table) with anomalies to 
produce smaller and well structured relations that contain minimum or no redundancy. 
The basic objectives of normalization are to reduce redundancy, which means that 
information is to be stored only once. Storing information several times leads to wastage 
of storage space and increase in the total size of the data stored. Normalization 
provides the designer with a systematic and scientific process of grouping of attributes 
in a relation. 
The normalization process can be defined as a procedure of analyzing and successive 
reduction of the given relational schemas based on their FDs and primary keys to 
achieve the desirable properties of - 
(i) Minimizing redundancy, 
(ii) minimizing insertion, deletion and update anomalies.
The objectives of the normalization process are 
1- To create a formal framework for analyzing relation schemas based on their keys 
and on the functional dependencies among their attributes. 
2- To free relations from undesirable insertion, update and deletion anomalies. 
3- To reduce the need for restructuring the relations as new data types are 
introduced. 
4- To carry out series of tests on individual relation schema so that the relational 
database can be normalized to some degree. When a test fails, the relation violating 
that test must be decomposed into relations that individually meet the normalization 
tests.
* 
*The main goal of Database Normalization is to 
restructure the logical data model of a database to: 
*Eliminate redundancy 
*Organize data efficiently 
*Reduce the potential for data anomalies.
* 
*First normal form (1NF) is a property of a relation in a relational 
database. A relation is in first normal form if the domain of each 
attribute contains only atomic values, and the value of each attribute 
contains only a single value from that domain. 
*First normal form is an essential property of a relation in a relational 
database. Database normalization is the process of representing a 
database in terms of relations in standard normal forms, where first 
normal is a minimal requirement.
*Examples 
The following scenario illustrates how a database design might violate first normal form. 
Suppose a designer wishes to record the names and telephone numbers of customers. He defines a customer table which 
looks like this: 
Customer ID First Name Surname Telephone Number 
123 Robert Ingram 555-861-2025 
456 Jane Wright 555-403-1659 
789 Maria Fernandez 555-808-9633 
Customer 
The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons 
that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one 
value: 
Customer ID First Name Surname Telephone Number 
123 Robert Ingram 555-861-2025 
456 Jane Wright 
555-403-1659 
555-776-4100 
789 Maria Fernandez 555-808-9633 
Customer 
Assuming, however, that the Telephone Number column is defined on some telephone number-like domain, such as the 
domain of 12-character strings, the representation above is not in first normal form. It is in violation of first normal form 
as a single field has been allowed to contain multiple values. A typical relational database management system will not 
allow fields in a table to contain multiple values in this way.
*A design that complies with 1NF 
A design that is unambiguously in first normal form makes use of two tables: a 
Customer Name table and a Customer Telephone Number table. 
Customer Name 
Customer ID First Name Surname 
123 Robert Ingram 
456 Jane Wright 
789 Maria Fernandez 
Customer Telephone Number 
Customer ID Telephone 
Number 
123 555-861-2025 
456 555-403-1659 
456 555-776-4100 
789 555-808-9633 
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to- 
Telephone Number link appears on its own record. With Customer ID as key, a one-to-many 
relationship exists between the two tables. A record in the "parent" table, Customer Name, 
can have many telephone number records in the "child" table, Customer Telephone Number, 
but each telephone number belongs to one, and only one customer. It is worth noting that this 
design meets the additional requirements for second and third normal form.
* 
Second normal form (2NF) is a normal form used in database normalization. 
A table that is in first normal form (1NF) must meet additional criteria if it is to 
qualify for second normal form. Specifically: a table is in 2NF if and only if it is in 
1NF and no non-prime attribute is dependent on any proper subset of any candidate 
key of the table. A non-prime attribute of a table is an attribute that is not a part of 
any candidate key of the table. 
Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute 
of the table is dependent on the whole of a candidate key.
*Example 
Consider a table describing employees' skills: 
Employees' Skills 
Employee Skill Current Work Location 
Brown Light Cleaning 73 Industrial Way 
Brown Typing 73 Industrial Way 
Harrison Light Cleaning 73 Industrial Way 
Jones Shorthand 114 Main Street 
Jones Typing 114 Main Street 
Jones Whittling 114 Main Street 
Neither {Employee} nor {Skill} is a candidate key for the table. 
This is because a given Employee might need to appear more than once (he might have multiple Skills), and a given Skill 
might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee, 
Skill} qualifies as a candidate key for the table. 
The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee. 
Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told 
three times that Jones works at 114 Main Street, and twice that Brown works at 73 Industrial Way. This redundancy makes 
the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Shorthand" and 
"Typing" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question 
"What is Jones' current work location?"
*A 2NF alternative to this design would represent the same 
information in two tables: an "Employees" table with candidate 
key {Employee}, and an "Employees' Skills" table with candidate 
key {Employee, Skill}: 
Employees 
Employee 
Current Work 
Location 
Brown 73 Industrial Way 
Harrison 73 Industrial Way 
Jones 114 Main Street 
Employees' Skills 
Employee Skill 
Brown Light 
Cleaning 
Brown Typing 
Harrison Light 
Cleaning 
Jones Shorthand 
Jones Typing 
Jones Whittling
* Third normal form 
The third normal form (3NF) is a normal form used in database normalization 
Codd's definition states that a table is in 3NF if and only if both of the following conditions hold: 
* The relation R (table) is in second normal form (2NF) 
* Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of 
R. 
* A non-prime attribute of R is an attribute that does not belong to any candidate key of R. 
A transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly, by 
virtue of X → Y and Y → Z (where it is not the case that Y → X). 
A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. 
This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at 
least one of the following conditions holds: 
* X contains A (that is, X → A is trivial functional dependency), or 
* X is a superkey, or 
* Every element of A-X, the set difference between A and X, is a prime attribute (i.e., each attribute 
in A-X is contained in some candidate key
* An example of a 2NF table that fails to meet the requirements of 3NF is: 
Tournament Winners 
Tournament Year Winner Winner Date of Birth 
Indiana Invitational 1998 Al Fredrickson 21 July 1975 
Cleveland Open 1999 Bob Albertson 28 September 1968 
Des Moines Masters 1999 Al Fredrickson 21 July 1975 
Indiana Invitational 1999 Chip Masterson 14 March 1977 
Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the 
composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. 
That is, {Tournament, Year} is a candidate key for the table. 
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate 
key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on 
Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown 
with different dates of birth on different records. 
In order to express the same facts without violating 3NF, it is necessary to split the table into two: 
Tournament Winners 
Tournament Year Winner 
Indiana Invitational 1998 Al Fredrickson 
Cleveland Open 1999 Bob Albertson 
Des Moines Masters 1999 Al Fredrickson 
Indiana Invitational 1999 Chip Masterson 
Player Dates of Birth 
Player Date of Birth 
Chip Masterson 14 March 1977 
Al Fredrickson 21 July 1975 
Bob Albertson 28 September 1968
*Normalization beyond 3NF 
Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF 
tables, rarely met with in practice, are affected by such anomalies; these are tables which 
either fall short of Boyce–Codd normal form (BCNF) or, if they meet BCNF, fall short of the 
higher normal forms 4NF or 5NF. 
Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database 
normalization. It is a slightly stronger version of the third normal form (3NF). 
* If a relational schema is in BCNF then all redundancy based on functional dependency has 
been removed, although other types of redundancy may still exist. A relational schema R is 
in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least 
one of the following conditions hold: 
* X → Y is a trivial functional dependency (Y ⊆ X) 
* X is a superkey for schema R
*3NF tables not meeting BCNF (Boyce–Codd normal form) 
Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which 
does not have multiple overlapping candidate keys is guaranteed to be in BCNF. Depending 
on what its functional dependencies are, a 3NF table with two or more overlapping 
candidate keys may or may not be in BCNF. 
* An example of a 3NF table that does not meet BCNF is: 
Today's Court Bookings 
Court Start Time End Time Rate Type 
1 09:30 10:30 SAVER 
1 11:00 12:00 SAVER 
1 14:00 15:30 STANDARD 
2 10:00 11:30 PREMIUM-B 
2 11:30 13:30 PREMIUM-B 
2 15:00 16:30 PREMIUM-A
* Each row in the table represents a court booking at a tennis club that has one hard court (Court 1) 
and one grass court (Court 2) 
* A booking is defined by its Court and the period for which the Court is reserved 
* Additionally, each booking has a Rate Type associated with it. There are four distinct rate types: 
* SAVER, for Court 1 bookings made by members 
* STANDARD, for Court 1 bookings made by non-members 
* PREMIUM-A, for Court 2 bookings made by members 
* PREMIUM-B, for Court 2 bookings made by non-members 
Note that even though in the above table Start Time and End Time attributes have no duplicate values 
for each of them, we still have to admit that in some other days two different bookings on court 1 and 
court 2 could start at the same time or end at the same time. This is the reason why {Start Time} and 
{End Time} cannot be considered as the table's super keys. 
However, only S1, S2, S3 and S4 are candidate keys (that is, minimal superkeys for that 
relation) because e.g. S1 ⊂ S5, so S5 cannot be a candidate key.
* In Today's Court Bookings table, there are no non-prime attributes: that is, all attributes belong 
to some candidate key. Therefore the table adheres to both 2NF and 3NF. 
* The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in 
which the determining attribute (Rate Type) is neither a candidate key nor a superset of a 
candidate key. 
* Dependency Rate Type → Court is respected as a Rate Type should only ever apply to a single 
Court. 
* The design can be amended so that it meets BCNF: 
Rate Types 
Rate Type Court 
Member 
Flag 
SAVER 1 Yes 
STANDARD 1 No 
PREMIUM-A 2 Yes 
PREMIUM-B 2 No 
Today's Bookings 
Rate Type Start Time End Time 
SAVER 09:30 10:30 
SAVER 11:00 12:00 
STANDARD 14:00 15:30 
PREMIUM-B 10:00 11:30 
PREMIUM-B 11:30 13:30 
PREMIUM-A 15:00 16:30 
The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}; 
the candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate 
Type, End Time}. Both tables are in BCNF
*Fourth normal form 
Fourth normal form (4NF) is a normal form used in database normalization , 4NF is the next level 
of normalization after Boyce–Codd normal form (BCNF). 
Whereas the second, third, and Boyce–Codd normal forms are concerned with functional 
dependencies, 4NF is concerned with a more general type of dependency known as a multivalued 
dependency. 
A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X Y, X is 
a super key—that is, X is either a candidate key or a superset thereof 
Consider the following example: 
Pizza Delivery Permutations 
Restaurant Pizza Variety Delivery Area 
A1 Pizza Thick Crust Springfield 
A1 Pizza Thick Crust Shelbyville 
A1 Pizza Thick Crust Capital City 
A1 Pizza Stuffed Crust Springfield 
A1 Pizza Stuffed Crust Shelbyville 
A1 Pizza Stuffed Crust Capital City 
Elite Pizza Thin Crust Capital City 
Elite Pizza Stuffed Crust Capital City 
Vincenzo's Pizza Thick Crust Springfield 
Vincenzo's Pizza Thick Crust Shelbyville 
Vincenzo's Pizza Thin Crust Springfield 
Vincenzo's Pizza Thin Crust Shelbyville
* Each row indicates that a given restaurant can deliver a given variety of pizza to a given area. 
* The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery 
Area}. Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza 
varieties offered by a restaurant are not affected by delivery area, then it does not meet 4NF. 
The problem is that the table features two non-trivial multivalued dependencies on the 
{Restaurant} attribute (which is not a super key). 
The dependencies are: 
* {Restaurant} {Pizza Variety} 
* {Restaurant} {Delivery Area} 
* These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties 
of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This 
state of affairs leads to redundancy in the table: 
for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts 
producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's 
delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we might add 
Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the 
multivalued dependency {Restaurant} {Pizza Variety}. 
* To eliminate the possibility of these anomalies, we must place the facts about varieties offered 
into a different table from the facts about delivery areas, yielding two tables that are both in 
4NF:
Varieties By Restaurant 
Restaurant Pizza Variety 
A1 Pizza Thick Crust 
A1 Pizza Stuffed Crust 
Elite Pizza Thin Crust 
Elite Pizza Stuffed Crust 
Vincenzo's Pizza Thick Crust 
Vincenzo's Pizza Thin Crust 
Delivery Areas By Restaurant 
Restaurant Delivery Area 
A1 Pizza Springfield 
A1 Pizza Shelbyville 
A1 Pizza Capital City 
Elite Pizza Capital City 
Vincenzo's Pizza Springfield 
Vincenzo's Pizza Shelbyville 
In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary 
from one delivery area to another, the original three-column table would satisfy 4NF
* Fifth normal form (5NF), also known as project-join normal form (PJ/NF) is 
a level of database normalization designed to reduce redundancy in relational 
databases recording multi-valued facts by isolating semantically related 
multiple relationships. A table is said to be in the 5NF if and only if every non-trivial 
join dependency in it is implied by the candidate keys. 
Consider the following example: 
Traveling Salesman Brand Product Type 
Jack Schneider Acme Vacuum Cleaner 
Jack Schneider Acme Breadbox 
Willy Loman Robusto Pruning Shears 
Willy Loman Robusto Vacuum Cleaner 
Willy Loman Robusto Breadbox 
Willy Loman Robusto Umbrella Stand 
Louis Ferguson Robusto Vacuum Cleaner 
Louis Ferguson Robusto Telescope 
Louis Ferguson Acme Vacuum Cleaner 
Louis Ferguson Acme Lava Lamp 
Louis Ferguson Nimbus Tie Rack 
Traveling Salesman Product 
Availability By Brand
* The table's predicate is: Products of the type designated by Product Type, made by the brand 
designated by Brand, are available from the traveling salesman designated by Traveling Salesman. 
* In the absence of any rules restricting the valid possible combinations of Traveling Salesman, Brand, 
and Product Type, the three-attribute table above is necessary in order to model the situation correctly. 
* Suppose, however, that the following rule applies: A Traveling Salesman has certain Brands and 
certain Product Types in his repertoire. If Brand B1 and Brand B2 are in his repertoire, and Product 
Type P is in his repertoire, then (assuming Brand B1 and Brand B2 both make Product Type P), the 
Traveling Salesman must offer products of Product Type P those made by Brand B1 and those made 
by Brand B2. 
* In that case, it is possible to split the table into three: 
Product Types By Traveling Salesman 
Traveling 
Salesman 
Product Type 
Jack Schneider Vacuum Cleaner 
Jack Schneider Breadbox 
Willy Loman Pruning Shears 
Willy Loman Vacuum Cleaner 
Willy Loman Breadbox 
Willy Loman Umbrella Stand 
Louis Ferguson Telescope 
Louis Ferguson Vacuum Cleaner 
Louis Ferguson Lava Lamp 
Louis Ferguson Tie Rack 
Brands By Traveling 
Salesman 
Traveling 
Salesman 
Brand 
Jack 
Schneider 
Acme 
Willy 
Loman 
Robusto 
Louis 
Ferguson 
Robusto 
Louis 
Ferguson 
Acme 
Louis 
Ferguson 
Nimbus 
Product Types By Brand 
Brand Product Type 
Acme Vacuum Cleaner 
Acme Breadbox 
Acme Lava Lamp 
Robusto Pruning Shears 
Robusto Vacuum Cleaner 
Robusto Breadbox 
Robusto Umbrella Stand 
Robusto Telescope 
Nimbus Tie Rack

More Related Content

What's hot

Entity relationship modelling
Entity relationship modellingEntity relationship modelling
Entity relationship modelling
Dr. C.V. Suresh Babu
 
Functional dependencies and normalization
Functional dependencies and normalizationFunctional dependencies and normalization
Functional dependencies and normalization
daxesh chauhan
 

What's hot (20)

Functional dependancy
Functional dependancyFunctional dependancy
Functional dependancy
 
Normalization in databases
Normalization in databasesNormalization in databases
Normalization in databases
 
Normalization
NormalizationNormalization
Normalization
 
FUNCTION DEPENDENCY AND TYPES & EXAMPLE
FUNCTION DEPENDENCY  AND TYPES & EXAMPLEFUNCTION DEPENDENCY  AND TYPES & EXAMPLE
FUNCTION DEPENDENCY AND TYPES & EXAMPLE
 
Normalization of Data Base
Normalization of Data BaseNormalization of Data Base
Normalization of Data Base
 
Database Concept - Normalization (1NF, 2NF, 3NF)
Database Concept - Normalization (1NF, 2NF, 3NF)Database Concept - Normalization (1NF, 2NF, 3NF)
Database Concept - Normalization (1NF, 2NF, 3NF)
 
Database Normalization
Database NormalizationDatabase Normalization
Database Normalization
 
Normalization
NormalizationNormalization
Normalization
 
Database - Normalization
Database - NormalizationDatabase - Normalization
Database - Normalization
 
Functional dependencies in Database Management System
Functional dependencies in Database Management SystemFunctional dependencies in Database Management System
Functional dependencies in Database Management System
 
Entity relationship modelling
Entity relationship modellingEntity relationship modelling
Entity relationship modelling
 
Normal forms
Normal formsNormal forms
Normal forms
 
Normalisation
NormalisationNormalisation
Normalisation
 
Functional dependency
Functional dependencyFunctional dependency
Functional dependency
 
Dbms normalization
Dbms normalizationDbms normalization
Dbms normalization
 
ER MODEL
ER MODELER MODEL
ER MODEL
 
Normalization in DBMS
Normalization in DBMSNormalization in DBMS
Normalization in DBMS
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)
 
Assignment#11
Assignment#11Assignment#11
Assignment#11
 
Functional dependencies and normalization
Functional dependencies and normalizationFunctional dependencies and normalization
Functional dependencies and normalization
 

Similar to Database normalization

Database normalization
Database normalizationDatabase normalization
Database normalization
Jignesh Jain
 
What is Database NormalizationExplain the guidelines for ensuring t.pdf
What is Database NormalizationExplain the guidelines for ensuring t.pdfWhat is Database NormalizationExplain the guidelines for ensuring t.pdf
What is Database NormalizationExplain the guidelines for ensuring t.pdf
arjunstores123
 
Research gadot
Research gadotResearch gadot
Research gadot
Jotham Gadot
 
Normalization by Ashwin and Tanmay
Normalization by Ashwin and TanmayNormalization by Ashwin and Tanmay
Normalization by Ashwin and Tanmay
Ashwin Dinoriya
 
Dependencies in various topics like normalisation and its types
Dependencies in various topics like normalisation and its typesDependencies in various topics like normalisation and its types
Dependencies in various topics like normalisation and its types
nsrChowdary1
 
Normalisation - 2nd normal form
Normalisation - 2nd normal formNormalisation - 2nd normal form
Normalisation - 2nd normal form
college
 

Similar to Database normalization (20)

Normalization
NormalizationNormalization
Normalization
 
normaliztion
normaliztionnormaliztion
normaliztion
 
Database normalization
Database normalizationDatabase normalization
Database normalization
 
Normalization
NormalizationNormalization
Normalization
 
Understanding about relational database m-square systems inc
Understanding about relational database m-square systems incUnderstanding about relational database m-square systems inc
Understanding about relational database m-square systems inc
 
Relational Database Design
Relational Database DesignRelational Database Design
Relational Database Design
 
What is Database NormalizationExplain the guidelines for ensuring t.pdf
What is Database NormalizationExplain the guidelines for ensuring t.pdfWhat is Database NormalizationExplain the guidelines for ensuring t.pdf
What is Database NormalizationExplain the guidelines for ensuring t.pdf
 
Relational Theory for Budding Einsteins -- LonestarPHP 2016
Relational Theory for Budding Einsteins -- LonestarPHP 2016Relational Theory for Budding Einsteins -- LonestarPHP 2016
Relational Theory for Budding Einsteins -- LonestarPHP 2016
 
Research gadot
Research gadotResearch gadot
Research gadot
 
Presentation on Normalization.pptx
Presentation on Normalization.pptxPresentation on Normalization.pptx
Presentation on Normalization.pptx
 
Normalization by Ashwin and Tanmay
Normalization by Ashwin and TanmayNormalization by Ashwin and Tanmay
Normalization by Ashwin and Tanmay
 
Dependencies in various topics like normalisation and its types
Dependencies in various topics like normalisation and its typesDependencies in various topics like normalisation and its types
Dependencies in various topics like normalisation and its types
 
Data normailazation
Data normailazationData normailazation
Data normailazation
 
Types of normalization
Types of normalizationTypes of normalization
Types of normalization
 
Year 11 DATA PROCESSING 1st Term
Year 11 DATA PROCESSING 1st TermYear 11 DATA PROCESSING 1st Term
Year 11 DATA PROCESSING 1st Term
 
Database Normalization.pptx
Database Normalization.pptxDatabase Normalization.pptx
Database Normalization.pptx
 
Impact of Normalization in Future
Impact of Normalization in FutureImpact of Normalization in Future
Impact of Normalization in Future
 
Database Normalization.docx
Database Normalization.docxDatabase Normalization.docx
Database Normalization.docx
 
DBMS (UNIT 2)
DBMS (UNIT 2)DBMS (UNIT 2)
DBMS (UNIT 2)
 
Normalisation - 2nd normal form
Normalisation - 2nd normal formNormalisation - 2nd normal form
Normalisation - 2nd normal form
 

More from Vaibhav Kathuria (6)

Copy of sec d (2)
Copy of sec d (2)Copy of sec d (2)
Copy of sec d (2)
 
Copy of sec d (2)
Copy of sec d (2)Copy of sec d (2)
Copy of sec d (2)
 
Cursors, triggers, procedures
Cursors, triggers, proceduresCursors, triggers, procedures
Cursors, triggers, procedures
 
Dbms mca-section a
Dbms mca-section aDbms mca-section a
Dbms mca-section a
 
Relational algebra calculus
Relational algebra  calculusRelational algebra  calculus
Relational algebra calculus
 
B & c
B & cB & c
B & c
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
Christopher Logan Kennedy
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 

Database normalization

  • 1. * In relational database design, the process of organizing data to minimize redundancy is called normalization or the process of decomposing relation with anomalies to produce smaller ,well-structured relation. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships. Normalization is a process of decomposing a set of relations(table) with anomalies to produce smaller and well structured relations that contain minimum or no redundancy. The basic objectives of normalization are to reduce redundancy, which means that information is to be stored only once. Storing information several times leads to wastage of storage space and increase in the total size of the data stored. Normalization provides the designer with a systematic and scientific process of grouping of attributes in a relation. The normalization process can be defined as a procedure of analyzing and successive reduction of the given relational schemas based on their FDs and primary keys to achieve the desirable properties of - (i) Minimizing redundancy, (ii) minimizing insertion, deletion and update anomalies.
  • 2. The objectives of the normalization process are 1- To create a formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes. 2- To free relations from undesirable insertion, update and deletion anomalies. 3- To reduce the need for restructuring the relations as new data types are introduced. 4- To carry out series of tests on individual relation schema so that the relational database can be normalized to some degree. When a test fails, the relation violating that test must be decomposed into relations that individually meet the normalization tests.
  • 3. * *The main goal of Database Normalization is to restructure the logical data model of a database to: *Eliminate redundancy *Organize data efficiently *Reduce the potential for data anomalies.
  • 4. * *First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain. *First normal form is an essential property of a relation in a relational database. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement.
  • 5. *Examples The following scenario illustrates how a database design might violate first normal form. Suppose a designer wishes to record the names and telephone numbers of customers. He defines a customer table which looks like this: Customer ID First Name Surname Telephone Number 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 789 Maria Fernandez 555-808-9633 Customer The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one value: Customer ID First Name Surname Telephone Number 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 555-776-4100 789 Maria Fernandez 555-808-9633 Customer Assuming, however, that the Telephone Number column is defined on some telephone number-like domain, such as the domain of 12-character strings, the representation above is not in first normal form. It is in violation of first normal form as a single field has been allowed to contain multiple values. A typical relational database management system will not allow fields in a table to contain multiple values in this way.
  • 6. *A design that complies with 1NF A design that is unambiguously in first normal form makes use of two tables: a Customer Name table and a Customer Telephone Number table. Customer Name Customer ID First Name Surname 123 Robert Ingram 456 Jane Wright 789 Maria Fernandez Customer Telephone Number Customer ID Telephone Number 123 555-861-2025 456 555-403-1659 456 555-776-4100 789 555-808-9633 Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to- Telephone Number link appears on its own record. With Customer ID as key, a one-to-many relationship exists between the two tables. A record in the "parent" table, Customer Name, can have many telephone number records in the "child" table, Customer Telephone Number, but each telephone number belongs to one, and only one customer. It is worth noting that this design meets the additional requirements for second and third normal form.
  • 7. * Second normal form (2NF) is a normal form used in database normalization. A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table. A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table. Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of a candidate key.
  • 8. *Example Consider a table describing employees' skills: Employees' Skills Employee Skill Current Work Location Brown Light Cleaning 73 Industrial Way Brown Typing 73 Industrial Way Harrison Light Cleaning 73 Industrial Way Jones Shorthand 114 Main Street Jones Typing 114 Main Street Jones Whittling 114 Main Street Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to appear more than once (he might have multiple Skills), and a given Skill might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key for the table. The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told three times that Jones works at 114 Main Street, and twice that Brown works at 73 Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Shorthand" and "Typing" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones' current work location?"
  • 9. *A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}: Employees Employee Current Work Location Brown 73 Industrial Way Harrison 73 Industrial Way Jones 114 Main Street Employees' Skills Employee Skill Brown Light Cleaning Brown Typing Harrison Light Cleaning Jones Shorthand Jones Typing Jones Whittling
  • 10. * Third normal form The third normal form (3NF) is a normal form used in database normalization Codd's definition states that a table is in 3NF if and only if both of the following conditions hold: * The relation R (table) is in second normal form (2NF) * Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R. * A non-prime attribute of R is an attribute that does not belong to any candidate key of R. A transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly, by virtue of X → Y and Y → Z (where it is not the case that Y → X). A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds: * X contains A (that is, X → A is trivial functional dependency), or * X is a superkey, or * Every element of A-X, the set difference between A and X, is a prime attribute (i.e., each attribute in A-X is contained in some candidate key
  • 11. * An example of a 2NF table that fails to meet the requirements of 3NF is: Tournament Winners Tournament Year Winner Winner Date of Birth Indiana Invitational 1998 Al Fredrickson 21 July 1975 Cleveland Open 1999 Bob Albertson 28 September 1968 Des Moines Masters 1999 Al Fredrickson 21 July 1975 Indiana Invitational 1999 Chip Masterson 14 March 1977 Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table. The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records. In order to express the same facts without violating 3NF, it is necessary to split the table into two: Tournament Winners Tournament Year Winner Indiana Invitational 1998 Al Fredrickson Cleveland Open 1999 Bob Albertson Des Moines Masters 1999 Al Fredrickson Indiana Invitational 1999 Chip Masterson Player Dates of Birth Player Date of Birth Chip Masterson 14 March 1977 Al Fredrickson 21 July 1975 Bob Albertson 28 September 1968
  • 12. *Normalization beyond 3NF Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in practice, are affected by such anomalies; these are tables which either fall short of Boyce–Codd normal form (BCNF) or, if they meet BCNF, fall short of the higher normal forms 4NF or 5NF. Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). * If a relational schema is in BCNF then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist. A relational schema R is in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions hold: * X → Y is a trivial functional dependency (Y ⊆ X) * X is a superkey for schema R
  • 13. *3NF tables not meeting BCNF (Boyce–Codd normal form) Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF. Depending on what its functional dependencies are, a 3NF table with two or more overlapping candidate keys may or may not be in BCNF. * An example of a 3NF table that does not meet BCNF is: Today's Court Bookings Court Start Time End Time Rate Type 1 09:30 10:30 SAVER 1 11:00 12:00 SAVER 1 14:00 15:30 STANDARD 2 10:00 11:30 PREMIUM-B 2 11:30 13:30 PREMIUM-B 2 15:00 16:30 PREMIUM-A
  • 14. * Each row in the table represents a court booking at a tennis club that has one hard court (Court 1) and one grass court (Court 2) * A booking is defined by its Court and the period for which the Court is reserved * Additionally, each booking has a Rate Type associated with it. There are four distinct rate types: * SAVER, for Court 1 bookings made by members * STANDARD, for Court 1 bookings made by non-members * PREMIUM-A, for Court 2 bookings made by members * PREMIUM-B, for Court 2 bookings made by non-members Note that even though in the above table Start Time and End Time attributes have no duplicate values for each of them, we still have to admit that in some other days two different bookings on court 1 and court 2 could start at the same time or end at the same time. This is the reason why {Start Time} and {End Time} cannot be considered as the table's super keys. However, only S1, S2, S3 and S4 are candidate keys (that is, minimal superkeys for that relation) because e.g. S1 ⊂ S5, so S5 cannot be a candidate key.
  • 15. * In Today's Court Bookings table, there are no non-prime attributes: that is, all attributes belong to some candidate key. Therefore the table adheres to both 2NF and 3NF. * The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in which the determining attribute (Rate Type) is neither a candidate key nor a superset of a candidate key. * Dependency Rate Type → Court is respected as a Rate Type should only ever apply to a single Court. * The design can be amended so that it meets BCNF: Rate Types Rate Type Court Member Flag SAVER 1 Yes STANDARD 1 No PREMIUM-A 2 Yes PREMIUM-B 2 No Today's Bookings Rate Type Start Time End Time SAVER 09:30 10:30 SAVER 11:00 12:00 STANDARD 14:00 15:30 PREMIUM-B 10:00 11:30 PREMIUM-B 11:30 13:30 PREMIUM-A 15:00 16:30 The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}; the candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate Type, End Time}. Both tables are in BCNF
  • 16. *Fourth normal form Fourth normal form (4NF) is a normal form used in database normalization , 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X Y, X is a super key—that is, X is either a candidate key or a superset thereof Consider the following example: Pizza Delivery Permutations Restaurant Pizza Variety Delivery Area A1 Pizza Thick Crust Springfield A1 Pizza Thick Crust Shelbyville A1 Pizza Thick Crust Capital City A1 Pizza Stuffed Crust Springfield A1 Pizza Stuffed Crust Shelbyville A1 Pizza Stuffed Crust Capital City Elite Pizza Thin Crust Capital City Elite Pizza Stuffed Crust Capital City Vincenzo's Pizza Thick Crust Springfield Vincenzo's Pizza Thick Crust Shelbyville Vincenzo's Pizza Thin Crust Springfield Vincenzo's Pizza Thin Crust Shelbyville
  • 17. * Each row indicates that a given restaurant can deliver a given variety of pizza to a given area. * The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not affected by delivery area, then it does not meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a super key). The dependencies are: * {Restaurant} {Pizza Variety} * {Restaurant} {Delivery Area} * These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This state of affairs leads to redundancy in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} {Pizza Variety}. * To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a different table from the facts about delivery areas, yielding two tables that are both in 4NF:
  • 18. Varieties By Restaurant Restaurant Pizza Variety A1 Pizza Thick Crust A1 Pizza Stuffed Crust Elite Pizza Thin Crust Elite Pizza Stuffed Crust Vincenzo's Pizza Thick Crust Vincenzo's Pizza Thin Crust Delivery Areas By Restaurant Restaurant Delivery Area A1 Pizza Springfield A1 Pizza Shelbyville A1 Pizza Capital City Elite Pizza Capital City Vincenzo's Pizza Springfield Vincenzo's Pizza Shelbyville In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one delivery area to another, the original three-column table would satisfy 4NF
  • 19. * Fifth normal form (5NF), also known as project-join normal form (PJ/NF) is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every non-trivial join dependency in it is implied by the candidate keys. Consider the following example: Traveling Salesman Brand Product Type Jack Schneider Acme Vacuum Cleaner Jack Schneider Acme Breadbox Willy Loman Robusto Pruning Shears Willy Loman Robusto Vacuum Cleaner Willy Loman Robusto Breadbox Willy Loman Robusto Umbrella Stand Louis Ferguson Robusto Vacuum Cleaner Louis Ferguson Robusto Telescope Louis Ferguson Acme Vacuum Cleaner Louis Ferguson Acme Lava Lamp Louis Ferguson Nimbus Tie Rack Traveling Salesman Product Availability By Brand
  • 20. * The table's predicate is: Products of the type designated by Product Type, made by the brand designated by Brand, are available from the traveling salesman designated by Traveling Salesman. * In the absence of any rules restricting the valid possible combinations of Traveling Salesman, Brand, and Product Type, the three-attribute table above is necessary in order to model the situation correctly. * Suppose, however, that the following rule applies: A Traveling Salesman has certain Brands and certain Product Types in his repertoire. If Brand B1 and Brand B2 are in his repertoire, and Product Type P is in his repertoire, then (assuming Brand B1 and Brand B2 both make Product Type P), the Traveling Salesman must offer products of Product Type P those made by Brand B1 and those made by Brand B2. * In that case, it is possible to split the table into three: Product Types By Traveling Salesman Traveling Salesman Product Type Jack Schneider Vacuum Cleaner Jack Schneider Breadbox Willy Loman Pruning Shears Willy Loman Vacuum Cleaner Willy Loman Breadbox Willy Loman Umbrella Stand Louis Ferguson Telescope Louis Ferguson Vacuum Cleaner Louis Ferguson Lava Lamp Louis Ferguson Tie Rack Brands By Traveling Salesman Traveling Salesman Brand Jack Schneider Acme Willy Loman Robusto Louis Ferguson Robusto Louis Ferguson Acme Louis Ferguson Nimbus Product Types By Brand Brand Product Type Acme Vacuum Cleaner Acme Breadbox Acme Lava Lamp Robusto Pruning Shears Robusto Vacuum Cleaner Robusto Breadbox Robusto Umbrella Stand Robusto Telescope Nimbus Tie Rack