Normalization is a logical database design method that minimizes data redundancy and reduces design flaws. It involves applying normal forms such as 1NF, 2NF, and 3NF to break large tables into smaller subsets. The normal forms improve data integrity by preventing insertion, update, and deletion anomalies. Applying them yields relations in first, second, and third normal form, but additional steps may be needed to reach Boyce-Codd normal form (BCNF), which removes further anomalies caused by overlapping candidate keys.
1. Normalization
A logical design method that minimizes data redundancy and reduces design flaws.
• Consists of applying various “normal” forms to the database design.
• The normal forms break down large tables into smaller subsets.
2. First Normal Form (1NF)
Each attribute must be atomic
• No repeating columns within a row.
• No multi-valued columns.
1NF simplifies attributes
• Queries become easier.
3. 1NF
Employee (unnormalized)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
Employee (1NF)
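The 1NF step above can be sketched in Python. This is a minimal illustration, assuming each row is a dict and that the multi-valued skills column arrives as a comma-separated string; the helper name to_1nf is hypothetical. Each value of the multi-valued column becomes its own row, and the single-valued columns are copied into it.

```python
def to_1nf(rows, multi_col):
    """Split a comma-separated multi-valued column into one row per value."""
    flat = []
    for row in rows:
        for value in row[multi_col].split(","):
            new_row = dict(row)  # copy the single-valued columns
            new_row[multi_col] = value.strip()
            flat.append(new_row)
    return flat


unnormalized = [
    {"emp_no": 1, "name": "Kevin Jacobs", "dept_no": 201, "dept_name": "R&D",
     "skills": "C, Perl, Java"},
    {"emp_no": 2, "name": "Barbara Jones", "dept_no": 224, "dept_name": "IT",
     "skills": "Linux, Mac"},
    {"emp_no": 3, "name": "Jake Rivera", "dept_no": 201, "dept_name": "R&D",
     "skills": "DB2, Oracle, Java"},
]

employee_1nf = to_1nf(unnormalized, "skills")
for row in employee_1nf:
    print(row["emp_no"], row["name"], row["dept_no"], row["dept_name"], row["skills"])
```

The eight printed rows correspond to Employee (1NF). The comma-separated string is only a stand-in for a non-atomic value; a real schema would never store it that way.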
4. Second Normal Form (2NF)
Each non-key attribute must be fully functionally dependent on the whole primary key (no partial dependencies).
• Functional dependence: the property that one or more attributes uniquely determine the value of other attributes.
• Attributes that depend on only part of the key are moved into a smaller (subset) table.
2NF improves data integrity.
• Reduces update, insert, and delete anomalies.
5. Functional Dependence
Name, dept_no, and dept_name are functionally dependent on
emp_no. (emp_no -> name, dept_no, dept_name)
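Skills is not functionally dependent on emp_no, because a single emp_no maps to several skills. The 1NF table's key is therefore (emp_no, skills), and name, dept_no, and dept_name depend on only part of that key (a partial dependency).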
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
Employee (1NF)
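A functional dependency can be tested against sample rows. The sketch below (the helper fd_holds is hypothetical) checks that every combination of left-hand values maps to exactly one combination of right-hand values. Sample data can only refute a dependency: a dependency that happens to hold in these eight rows still has to be confirmed by the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (each lhs value maps to one rhs value)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


cols = ("emp_no", "name", "dept_no", "dept_name", "skills")
employee_1nf = [dict(zip(cols, r)) for r in [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]]

print(fd_holds(employee_1nf, ["emp_no"], ["name", "dept_no", "dept_name"]))  # True
print(fd_holds(employee_1nf, ["emp_no"], ["skills"]))                        # False
print(fd_holds(employee_1nf, ["emp_no", "skills"], ["name"]))                # True
```

The last line shows the composite key (emp_no, skills) determining name, while name needs only emp_no: that is the partial dependency 2NF removes.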
6. 2NF
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D
Employee (2NF)
emp_no  skills
1       C
1       Perl
1       Java
2       Linux
2       Mac
3       DB2
3       Oracle
3       Java
Skills (2NF)
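The 2NF split is a pair of projections, and a natural join on emp_no should rebuild the 1NF table without losing or inventing rows (a lossless decomposition). A minimal sketch with hypothetical helpers project and natural_join, assuming non-empty lists of dicts:

```python
def project(rows, cols):
    """Relational projection: keep only cols and drop duplicate rows, preserving order."""
    out, seen = [], set()
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out


def natural_join(left, right):
    """Join two non-empty lists of dicts on every column name they share."""
    shared = set(left[0]) & set(right[0])
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if all(lrow[c] == rrow[c] for c in shared)
    ]


def as_set(rows):
    """Compare tables as sets of rows, ignoring row and column order."""
    return {tuple(sorted(row.items())) for row in rows}


cols = ("emp_no", "name", "dept_no", "dept_name", "skills")
employee_1nf = [dict(zip(cols, r)) for r in [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]]

employee_2nf = project(employee_1nf, ["emp_no", "name", "dept_no", "dept_name"])
skills_2nf = project(employee_1nf, ["emp_no", "skills"])

print(len(employee_2nf), len(skills_2nf))  # 3 8
print(as_set(natural_join(employee_2nf, skills_2nf)) == as_set(employee_1nf))  # True
```

The join works because emp_no, the shared column, is the key of Employee (2NF); each skills row matches exactly one employee row.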
7. Data Integrity
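• Insert anomaly: a fact cannot be recorded without unrelated data, or only by adding null values. E.g., a new department with no employees cannot be inserted without a null emp_no, even though a department should not need an employee number.
• Update anomaly: a single change requires multiple updates, which degrades performance and leaves the data inconsistent if any row is missed. E.g., changing the IT dept_name to IS means updating every IT row.
• Delete anomaly: deleting one fact removes other, wanted information. E.g., deleting the IT department's rows removes employee Barbara Jones from the database.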
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
Employee (1NF)
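The update and delete anomalies can be reproduced on the 1NF rows. This sketch simulates a hypothetical failed update that stops after the first matching row, then deletes the IT rows:

```python
cols = ("emp_no", "name", "dept_no", "dept_name", "skills")
employee_1nf = [dict(zip(cols, r)) for r in [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]]

# Update anomaly: renaming department 224 must touch every row that repeats it.
# it_rows holds references to the same dicts, so edits show up in employee_1nf.
it_rows = [r for r in employee_1nf if r["dept_no"] == 224]
print(len(it_rows))  # 2

it_rows[0]["dept_name"] = "IS"  # simulate an update that misses the second row
names_for_224 = {r["dept_name"] for r in employee_1nf if r["dept_no"] == 224}
print(sorted(names_for_224))  # ['IS', 'IT']: the table now contradicts itself

# Delete anomaly: removing the department's rows also removes the employee.
remaining = [r for r in employee_1nf if r["dept_no"] != 224]
print(any(r["name"] == "Barbara Jones" for r in remaining))  # False
```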
8. Third Normal Form (3NF)
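Remove transitive dependencies.
• Transitive dependence: a non-key attribute depends on another non-key attribute rather than directly on the primary key (emp_no -> dept_no -> dept_name). This usually means two separate entities exist within one table.
• Any transitive dependencies are moved into a smaller (subset) table.
3NF further improves data integrity.
• Reduces the remaining update, insert, and delete anomalies.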
9. Transitive Dependence
Both dept_no and dept_name are functionally dependent on emp_no; however, dept_name depends on emp_no only through dept_no, so department can be considered a separate entity.
emp_no  name           dept_no  dept_name
1       Kevin Jacobs   201      R&D
2       Barbara Jones  224      IT
3       Jake Rivera    201      R&D
Employee (2NF)
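The transitive chain (emp_no -> dept_no, dept_no -> dept_name) can be made mechanical with the standard attribute-closure algorithm: start from a set of attributes and keep adding the right-hand side of any dependency whose left-hand side is already covered. A minimal sketch; the dependency list combines slide 5 with the editor's notes, and the function name closure is our own.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    ({"emp_no"}, {"name", "dept_no"}),
    ({"dept_no"}, {"dept_name"}),
]

print(sorted(closure({"emp_no"}, fds)))   # ['dept_name', 'dept_no', 'emp_no', 'name']
print(sorted(closure({"dept_no"}, fds)))  # ['dept_name', 'dept_no']
```

dept_name enters the closure of emp_no only after dept_no has been added, which is exactly the transitive dependency; the non-key attribute dept_no determines dept_name on its own.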
10. 3NF
emp_no  name           dept_no
1       Kevin Jacobs   201
2       Barbara Jones  224
3       Jake Rivera    201
Employee (3NF)
dept_no  dept_name
201      R&D
224      IT
Department (3NF)
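A minimal sketch of the 3NF split, assuming the 2NF rows are dicts: Employee keeps dept_no as a reference, and Department is keyed by dept_no. Using a Python dict for Department is an illustrative choice, not part of the slides.

```python
employee_2nf = [
    {"emp_no": 1, "name": "Kevin Jacobs", "dept_no": 201, "dept_name": "R&D"},
    {"emp_no": 2, "name": "Barbara Jones", "dept_no": 224, "dept_name": "IT"},
    {"emp_no": 3, "name": "Jake Rivera", "dept_no": 201, "dept_name": "R&D"},
]

# Employee (3NF): drop dept_name and keep dept_no as a reference to the department
employee_3nf = [
    {k: row[k] for k in ("emp_no", "name", "dept_no")} for row in employee_2nf
]

# Department (3NF): one entry per dept_no. This relies on dept_no -> dept_name
# holding; if it did not, the comprehension would silently keep the last name seen.
department_3nf = {row["dept_no"]: row["dept_name"] for row in employee_2nf}
print(department_3nf)  # {201: 'R&D', 224: 'IT'}

# Renaming IT is now a single update that every employee row sees through dept_no
department_3nf[224] = "IS"
print([department_3nf[e["dept_no"]] for e in employee_3nf])  # ['R&D', 'IS', 'R&D']
```

Compare with the update anomaly in the 1NF table: the rename is one write, and no row can disagree with another.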
11. Other Normal Forms
Boyce-Codd Normal Form (BCNF)
• Strengthens 3NF by requiring the determinant (left-hand side) of every nontrivial functional dependency to be a superkey (a column or set of columns that uniquely identifies a row).
Fourth Normal Form (4NF)
• Eliminates nontrivial multivalued dependencies whose determinant is not a superkey.
Fifth Normal Form (5NF)
• Eliminates join dependencies that are not implied by the candidate keys.
12. Normalizing our webstore (1NF)
items
item_id  name          price  inventory
34       sweater red   50     21
35       sweater blue  50     10
56       t-shirt       25     76
72       jeans         75     5
81       jacket        175    9
customers
cust_id  name          address         credit_card_num  credit_card_type
45       Mike Speedy   123 A St.       45154            visa
45       Mike Speedy   123 A St.       32499            mastercard
45       Mike Speedy   123 A St.       12834            discover
78       Frank Newmon  2 Main St.      45698            visa
102      Joe Powers    343 Blue Blvd.  94065            mastercard
102      Joe Powers    343 Blue Blvd.  10532            discover
orders
order_id  cust_id  item_id  quantity  cost  date     status
405       45       34       2         100   2/3/06   shipped
405       45       35       1         50    2/3/06   shipped
405       45       56       3         75    2/3/06   shipped
408       78       56       2         50    3/5/06   refunded
410       102      72       2         150   3/10/06  shipped
410       102      81       1         175   3/10/06  shipped
13. Normalizing our webstore (2NF & 3NF)
customers
cust_id  name          address
45       Mike Speedy   123 A St.
78       Frank Newmon  2 Main St.
102      Joe Powers    343 Blue Blvd.
credit_cards
cust_id  num    type
45       45154  visa
45       32499  mastercard
45       12834  discover
78       45698  visa
102      94065  mastercard
102      10532  discover
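The customers/credit_cards split maps onto two SQL tables linked by a foreign key. A minimal sketch using Python's built-in sqlite3 module and an in-memory database; the column types, the NOT NULL constraints, and the choice of num as the credit_cards primary key are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customers (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT NOT NULL
);
CREATE TABLE credit_cards (
    num     TEXT PRIMARY KEY,  -- assumption: a card number identifies one card
    cust_id INTEGER NOT NULL REFERENCES customers(cust_id),
    type    TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (45, "Mike Speedy", "123 A St."),
    (78, "Frank Newmon", "2 Main St."),
    (102, "Joe Powers", "343 Blue Blvd."),
])
conn.executemany("INSERT INTO credit_cards (cust_id, num, type) VALUES (?, ?, ?)", [
    (45, "45154", "visa"), (45, "32499", "mastercard"), (45, "12834", "discover"),
    (78, "45698", "visa"), (102, "94065", "mastercard"), (102, "10532", "discover"),
])

# The customer's name and address are stored once, however many cards they hold
rows = conn.execute("""
    SELECT c.name, cc.type
    FROM customers AS c JOIN credit_cards AS cc ON cc.cust_id = c.cust_id
    WHERE c.cust_id = 45
    ORDER BY cc.type
""").fetchall()
print(rows)  # [('Mike Speedy', 'discover'), ('Mike Speedy', 'mastercard'), ('Mike Speedy', 'visa')]

try:
    conn.execute("INSERT INTO credit_cards VALUES ('99999', 999, 'visa')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # no customer 999 exists
```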
16. Boyce-Codd Normal Form (BCNF)
• When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.
• 3NF does not deal satisfactorily with a relation that has overlapping candidate keys,
• i.e. composite candidate keys with at least one attribute in common.
• BCNF is based on the concept of a determinant.
• A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.
• A relation is in BCNF if, and only if, every determinant is a candidate key.
17. The theory
• Consider the following relation and determinants.
R(a,b,c,d)
a,c -> b,d
a,d -> b
• To be in BCNF, every determinant must be a candidate key. a,c -> b,d means a,c determines every other attribute of R, so {a,c} is a candidate key and the first determinant is fine.
• a,d -> b suggests that a,d could be the primary key, since it determines b. However, a,d does not determine c. So {a,d} is a determinant that is not a candidate key, and thus R is not in BCNF.
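The check above can be sketched with attribute closure: a determinant passes when its closure covers every attribute of the relation. This uses the common superkey formulation of BCNF and assumes all listed dependencies lie within the relation; closure and bcnf_violations are our own helper names.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Nontrivial dependencies whose determinant is not a superkey of relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                  # skip trivial dependencies
        and not set(relation) <= closure(lhs, fds)   # determinant is not a superkey
    ]


R = {"a", "b", "c", "d"}
fds = [({"a", "c"}, {"b", "d"}), ({"a", "d"}, {"b"})]

for lhs, rhs in bcnf_violations(R, fds):
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
# prints: ['a', 'd'] -> ['b'] violates BCNF
```

The closure of {a,d} is {a,b,d}; c is missing, which is the slide's argument in code form.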
18. Example 1
Patient No  Patient Name  Appointment Id  Time   Doctor
1           John          0               09:00  Zorro
2           Kerr          0               09:00  Killer
3           Adam          1               10:00  Zorro
4           Robert        0               13:00  Killer
5           Zane          1               14:00  Zorro
19. Two possible keys
• DB(Patno,PatName,appNo,time,doctor)
• Determinants:
• Patno -> PatName
• Patno,appNo -> Time,doctor
• Time -> appNo
• Two options for 1NF primary key selection:
• DB(Patno,PatName,appNo,time,doctor) with key (Patno, appNo) (example 1a)
• DB(Patno,PatName,appNo,time,doctor) with key (Patno, time) (example 1b)
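Both keys can be found by brute force: try attribute sets in increasing size and keep those whose closure is the whole relation and that contain no smaller key. A minimal sketch (helper names are our own), practical only for small schemas like this one:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure covers the relation."""
    attrs = sorted(relation)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so not minimal
            if set(relation) <= closure(candidate, fds):
                keys.append(candidate)
    return keys


DB = {"Patno", "PatName", "appNo", "time", "doctor"}
fds = [
    ({"Patno"}, {"PatName"}),
    ({"Patno", "appNo"}, {"time", "doctor"}),
    ({"time"}, {"appNo"}),
]

for key in candidate_keys(DB, fds):
    print(sorted(key))
# ['Patno', 'appNo']
# ['Patno', 'time']
```

The two keys overlap on Patno, which is the overlapping-candidate-key situation BCNF addresses.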
20. Example 1a
• DB(Patno,PatName,appNo,time,doctor)
• No repeating groups, so in 1NF
• 2NF – eliminate partial key dependencies:
• DB(Patno,appNo,time,doctor)
• R1(Patno,PatName)
• 3NF – no transitive dependencies, so in 3NF
• Now try BCNF.
21. BCNF Every determinant is a candidate key
DB(Patno,appNo,time,doctor)
R1(Patno,PatName)
• Is determinant a candidate key?
• Patno -> PatName
Patno is present in DB, but PatName is not, so this dependency is irrelevant to DB.
22. Continued…
DB(Patno,appNo,time,doctor)
R1(Patno,PatName)
• Patno,appNo -> Time,doctor
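All LHS and RHS attributes are present, so it is relevant. Is this a candidate key?
Patno,appNo IS the key, so this is a candidate key.
• Time -> appNo
Time and appNo are both present, so it is relevant. Is time a candidate key? If it were, time alone could serve as the key of:
DB(Patno,appNo,time,doctor)
This will not work: patients 1 and 2 both have a 09:00 appointment with different doctors, so time does not determine Patno or doctor. Time is a determinant that is not a candidate key, so DB is not in BCNF.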
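The same argument can be checked against the Example 1 rows. A minimal sketch with a hypothetical helper fd_holds, assuming the column names used on the slides: time -> appNo holds in the data, but time does not determine Patno or doctor, so time cannot be a key.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (each lhs value maps to one rhs value)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


cols = ("Patno", "PatName", "appNo", "time", "doctor")
example1 = [dict(zip(cols, r)) for r in [
    (1, "John", 0, "09:00", "Zorro"),
    (2, "Kerr", 0, "09:00", "Killer"),
    (3, "Adam", 1, "10:00", "Zorro"),
    (4, "Robert", 0, "13:00", "Killer"),
    (5, "Zane", 1, "14:00", "Zorro"),
]]

print(fd_holds(example1, ["time"], ["appNo"]))            # True
print(fd_holds(example1, ["time"], ["Patno", "doctor"]))  # False: two patients at 09:00
```

As before, sample rows can refute a dependency but not prove one; time -> appNo is taken from the slide's stated determinants.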
23. Rewrite to BCNF
• DB(Patno,appNo,time,doctor)
R1(Patno,PatName)
• BCNF: rewrite to
DB(Patno,time,doctor)
R1(Patno,PatName)
R2(time,appNo)
• time is enough to work out the appointment number of a patient, so time -> appNo moves into R2. BCNF is now satisfied: the final relations shown are all in BCNF.
24. Example 1b
• DB(Patno,PatName,appNo,time,doctor)
• No repeating groups, so in 1NF
• 2NF – eliminate partial key dependencies:
• DB(Patno,time,doctor)
• R1(Patno,PatName)
• R2(time,appNo)
• 3NF – no transitive dependencies, so in 3NF
• Now try BCNF.
25. BCNF Every determinant is a candidate key
DB(Patno,time,doctor)
R1(Patno,PatName)
R2(time,appNo)
• Is determinant a candidate key?
• Patno -> PatName
Patno is present in DB, but PatName is not, so irrelevant.
• Patno,appNo -> Time,doctor
Not all LHS attributes are present (appNo is not in DB), so not relevant.
• Time -> appNo
Time is present in DB, but appNo is not, so not relevant there; in R2, time is the key.
• Relations are in BCNF.
26. Summary - Example 1
This example has demonstrated three things:
• BCNF is stronger than 3NF: relations that are in 3NF are not necessarily in BCNF.
• BCNF is needed in certain situations to obtain a full understanding of the data model.
• There are several routes to take to arrive at the same set of relations in BCNF.
• Unfortunately, there are no rules as to which route will be the easiest one to take.
31. Example 2 cont...
• BCNF Every determinant is a candidate key
• Student : only determinant is StudNo
• StudCourse: only determinant is StudNo,Major
• Course: only determinant is CourseNo
• Instructor: only determinant is InstrucName
• StudMajor: the determinants are
• StudNo,Major, or
• Advisor
Only StudNo,Major is a candidate key.
33. Problems BCNF overcomes
• If the record for student 456 is deleted, we lose not only information on student 456 but also the fact that DARWIN advises in BIOLOGY.
• We cannot record the fact that WATSON can advise on COMPUTING until we have a student majoring in COMPUTING to whom we can assign WATSON as an advisor.
STUDENT  MAJOR    ADVISOR
123      PHYSICS  EINSTEIN
123      MUSIC    MOZART
456      BIOLOGY  DARWIN
789      PHYSICS  BOHR
999      PHYSICS  EINSTEIN
34. Split into two tables
In BCNF we have two tables:
STUDENT  ADVISOR
123      EINSTEIN
123      MOZART
456      DARWIN
789      BOHR
999      EINSTEIN
ADVISOR   MAJOR
EINSTEIN  PHYSICS
MOZART    MUSIC
DARWIN    BIOLOGY
BOHR      PHYSICS
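Both problems from slide 33 can be replayed on the data. A minimal sketch, assuming rows are (student, major, advisor) tuples and the BCNF tables are Python sets of pairs (an illustrative representation): deleting student 456 loses the DARWIN/BIOLOGY fact in the combined table but not in the split design, and WATSON can be recorded before any student is assigned.

```python
student_major_advisor = [
    (123, "PHYSICS", "EINSTEIN"),
    (123, "MUSIC", "MOZART"),
    (456, "BIOLOGY", "DARWIN"),
    (789, "PHYSICS", "BOHR"),
    (999, "PHYSICS", "EINSTEIN"),
]

# The two BCNF tables from slide 34
student_advisor = {(student, advisor) for student, _, advisor in student_major_advisor}
advisor_major = {(advisor, major) for _, major, advisor in student_major_advisor}

# Delete anomaly: remove student 456 from each design
combined = [row for row in student_major_advisor if row[0] != 456]
student_advisor = {(s, a) for s, a in student_advisor if s != 456}

print(any(a == "DARWIN" and m == "BIOLOGY" for _, m, a in combined))  # False: fact lost
print(("DARWIN", "BIOLOGY") in advisor_major)                        # True: fact kept

# Insert anomaly avoided: an advisor's major is recorded with no student yet
advisor_major.add(("WATSON", "COMPUTING"))
print(len(student_advisor), len(advisor_major))  # 4 5
```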
35. Returning to the ER Model
• Now that we have reached the end of the normalisation process, you must go back and compare the resulting relations with the original ER model.
• You may need to alter it to take account of the changes that occurred during the normalisation process. Your ER diagram should always be a perfect reflection of the model you are going to implement in the database, so keep it up to date!
• The changes required depend on how good the ER model was at first!
36. Video Library Example
• A video library allows customers to borrow videos.
• Assume that there is only 1 of each video.
• We are told that:
video(title,director,serial)
customer(name,addr,memberno)
hire(memberno,serial,date)
title->director,serial
serial->title
serial->director
name,addr -> memberno
memberno -> name,addr
serial,date -> memberno
37. What NF is this?
• No repeating groups therefore at least 1NF
• 2NF – A Composite key exists:
hire(memberno,serial,date)
• Can memberno be found with just serial or date?
• NO, therefore the relations are already in 2NF.
• 3NF?
38. Test for 3NF
• video(title,director,serial)
• title->director,serial
• serial->director
• Director can be derived using serial, and serial and director are both non-keys, so this is a transitive (non-key) dependency.
• Rewrite video…
40. Check BCNF
• Is every determinant a candidate key?
• video(title,serial) - Determinants are:
• title->serial Candidate key
• serial->title Candidate key
• video in BCNF
• serial(serial,director) Determinants are:
• serial->director Candidate key
• serial in BCNF
41. • customer(name,addr,memberno) Determinants are:
• name,addr -> memberno Candidate key
• memberno -> name,addr Candidate key
• customer in BCNF
• hire(memberno,serial,date) Determinants are:
• serial,date -> memberno Candidate key
• hire in BCNF
• Therefore the relations are also now in BCNF.
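The determinant checks on slides 40 and 41 can be run together. A minimal sketch, reusing the closure-based superkey test; each relation's dependencies are restricted by hand to its own attributes (for example, title->director,serial becomes title->serial inside video(title,serial)), and the helper names are our own.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Nontrivial dependencies whose determinant is not a superkey of relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not set(relation) <= closure(lhs, fds)
    ]


# Each relation with its dependencies restricted to its own attributes
schemas = {
    "video": ({"title", "serial"},
              [({"title"}, {"serial"}), ({"serial"}, {"title"})]),
    "serial": ({"serial", "director"},
               [({"serial"}, {"director"})]),
    "customer": ({"name", "addr", "memberno"},
                 [({"name", "addr"}, {"memberno"}), ({"memberno"}, {"name", "addr"})]),
    "hire": ({"memberno", "serial", "date"},
             [({"serial", "date"}, {"memberno"})]),
}

for name, (relation, fds) in schemas.items():
    status = "in BCNF" if not bcnf_violations(relation, fds) else "not in BCNF"
    print(name, status)
```

All four relations print "in BCNF", matching the slides' conclusion.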
Editor's Notes
Accomplish normalization by analyzing the interdependencies among attributes in tables and taking subsets of larger tables to form smaller ones.
The subsets are created from examining the interdependencies among the table attributes.
Note that dept_name is functionally dependent on dept_no, and dept_no is functionally dependent on emp_no, so dept_name is functionally dependent on emp_no transitively, via the middle step of dept_no.
(emp_no -> dept_no, dept_no -> dept_name, thus emp_no -> dept_name)