CNIC Information System with Pakdata Cf In Pakistan
Â
Database normalization
1. *
In relational database design, the process of organizing data to minimize redundancy is
called normalization or the process of decomposing relation with anomalies to produce
smaller ,well-structured relation. Normalization usually involves dividing a database into
two or more tables and defining relationships between the tables. The objective is to
isolate data so that additions, deletions, and modifications of a field can be made in just
one table and then propagated through the rest of the database via the defined
relationships.
Normalization is a process of decomposing a set of relations(table) with anomalies to
produce smaller and well structured relations that contain minimum or no redundancy.
The basic objectives of normalization are to reduce redundancy, which means that
information is to be stored only once. Storing information several times leads to wastage
of storage space and increase in the total size of the data stored. Normalization
provides the designer with a systematic and scientific process of grouping of attributes
in a relation.
The normalization process can be defined as a procedure of analyzing and successive
reduction of the given relational schemas based on their FDs and primary keys to
achieve the desirable properties of -
(i) Minimizing redundancy,
(ii) minimizing insertion, deletion and update anomalies.
2. The objectives of the normalization process are
1- To create a formal framework for analyzing relation schemas based on their keys
and on the functional dependencies among their attributes.
2- To free relations from undesirable insertion, update and deletion anomalies.
3- To reduce the need for restructuring the relations as new data types are
introduced.
4- To carry out series of tests on individual relation schema so that the relational
database can be normalized to some degree. When a test fails, the relation violating
that test must be decomposed into relations that individually meet the normalization
tests.
3. *
*The main goal of Database Normalization is to
restructure the logical data model of a database to:
*Eliminate redundancy
*Organize data efficiently
*Reduce the potential for data anomalies.
4. *
*First normal form (1NF) is a property of a relation in a relational
database. A relation is in first normal form if the domain of each
attribute contains only atomic values, and the value of each attribute
contains only a single value from that domain.
*First normal form is an essential property of a relation in a relational
database. Database normalization is the process of representing a
database in terms of relations in standard normal forms, where first
normal is a minimal requirement.
5. *Examples
The following scenario illustrates how a database design might violate first normal form.
Suppose a designer wishes to record the names and telephone numbers of customers. He defines a customer table which
looks like this:
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
789 Maria Fernandez 555-808-9633
Customer
The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons
that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one
value:
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright
555-403-1659
555-776-4100
789 Maria Fernandez 555-808-9633
Customer
Assuming, however, that the Telephone Number column is defined on some telephone number-like domain, such as the
domain of 12-character strings, the representation above is not in first normal form. It is in violation of first normal form
as a single field has been allowed to contain multiple values. A typical relational database management system will not
allow fields in a table to contain multiple values in this way.
6. *A design that complies with 1NF
A design that is unambiguously in first normal form makes use of two tables: a
Customer Name table and a Customer Telephone Number table.
Customer Name
Customer ID First Name Surname
123 Robert Ingram
456 Jane Wright
789 Maria Fernandez
Customer Telephone Number
Customer ID Telephone
Number
123 555-861-2025
456 555-403-1659
456 555-776-4100
789 555-808-9633
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-
Telephone Number link appears on its own record. With Customer ID as key, a one-to-many
relationship exists between the two tables. A record in the "parent" table, Customer Name,
can have many telephone number records in the "child" table, Customer Telephone Number,
but each telephone number belongs to one, and only one customer. It is worth noting that this
design meets the additional requirements for second and third normal form.
7. *
Second normal form (2NF) is a normal form used in database normalization.
A table that is in first normal form (1NF) must meet additional criteria if it is to
qualify for second normal form. Specifically: a table is in 2NF if and only if it is in
1NF and no non-prime attribute is dependent on any proper subset of any candidate
key of the table. A non-prime attribute of a table is an attribute that is not a part of
any candidate key of the table.
Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute
of the table is dependent on the whole of a candidate key.
8. *Example
Consider a table describing employees' skills:
Employees' Skills
Employee Skill Current Work Location
Brown Light Cleaning 73 Industrial Way
Brown Typing 73 Industrial Way
Harrison Light Cleaning 73 Industrial Way
Jones Shorthand 114 Main Street
Jones Typing 114 Main Street
Jones Whittling 114 Main Street
Neither {Employee} nor {Skill} is a candidate key for the table.
This is because a given Employee might need to appear more than once (he might have multiple Skills), and a given Skill
might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee,
Skill} qualifies as a candidate key for the table.
The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee.
Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told
three times that Jones works at 114 Main Street, and twice that Brown works at 73 Industrial Way. This redundancy makes
the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Shorthand" and
"Typing" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question
"What is Jones' current work location?"
9. *A 2NF alternative to this design would represent the same
information in two tables: an "Employees" table with candidate
key {Employee}, and an "Employees' Skills" table with candidate
key {Employee, Skill}:
Employees
Employee
Current Work
Location
Brown 73 Industrial Way
Harrison 73 Industrial Way
Jones 114 Main Street
Employees' Skills
Employee Skill
Brown Light
Cleaning
Brown Typing
Harrison Light
Cleaning
Jones Shorthand
Jones Typing
Jones Whittling
10. * Third normal form
The third normal form (3NF) is a normal form used in database normalization
Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:
* The relation R (table) is in second normal form (2NF)
* Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of
R.
* A non-prime attribute of R is an attribute that does not belong to any candidate key of R.
A transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly, by
virtue of X → Y and Y → Z (where it is not the case that Y → X).
A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982.
This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at
least one of the following conditions holds:
* X contains A (that is, X → A is trivial functional dependency), or
* X is a superkey, or
* Every element of A-X, the set difference between A and X, is a prime attribute (i.e., each attribute
in A-X is contained in some candidate key
11. * An example of a 2NF table that fails to meet the requirements of 3NF is:
Tournament Winners
Tournament Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the
composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row.
That is, {Tournament, Year} is a candidate key for the table.
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate
key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on
Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown
with different dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table into two:
Tournament Winners
Tournament Year Winner
Indiana Invitational 1998 Al Fredrickson
Cleveland Open 1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson
Player Dates of Birth
Player Date of Birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968
12. *Normalization beyond 3NF
Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF
tables, rarely met with in practice, are affected by such anomalies; these are tables which
either fall short of Boyce–Codd normal form (BCNF) or, if they meet BCNF, fall short of the
higher normal forms 4NF or 5NF.
Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database
normalization. It is a slightly stronger version of the third normal form (3NF).
* If a relational schema is in BCNF then all redundancy based on functional dependency has
been removed, although other types of redundancy may still exist. A relational schema R is
in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least
one of the following conditions hold:
* X → Y is a trivial functional dependency (Y ⊆ X)
* X is a superkey for schema R
13. *3NF tables not meeting BCNF (Boyce–Codd normal form)
Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which
does not have multiple overlapping candidate keys is guaranteed to be in BCNF. Depending
on what its functional dependencies are, a 3NF table with two or more overlapping
candidate keys may or may not be in BCNF.
* An example of a 3NF table that does not meet BCNF is:
Today's Court Bookings
Court Start Time End Time Rate Type
1 09:30 10:30 SAVER
1 11:00 12:00 SAVER
1 14:00 15:30 STANDARD
2 10:00 11:30 PREMIUM-B
2 11:30 13:30 PREMIUM-B
2 15:00 16:30 PREMIUM-A
14. * Each row in the table represents a court booking at a tennis club that has one hard court (Court 1)
and one grass court (Court 2)
* A booking is defined by its Court and the period for which the Court is reserved
* Additionally, each booking has a Rate Type associated with it. There are four distinct rate types:
* SAVER, for Court 1 bookings made by members
* STANDARD, for Court 1 bookings made by non-members
* PREMIUM-A, for Court 2 bookings made by members
* PREMIUM-B, for Court 2 bookings made by non-members
Note that even though in the above table Start Time and End Time attributes have no duplicate values
for each of them, we still have to admit that in some other days two different bookings on court 1 and
court 2 could start at the same time or end at the same time. This is the reason why {Start Time} and
{End Time} cannot be considered as the table's super keys.
However, only S1, S2, S3 and S4 are candidate keys (that is, minimal superkeys for that
relation) because e.g. S1 ⊂ S5, so S5 cannot be a candidate key.
15. * In Today's Court Bookings table, there are no non-prime attributes: that is, all attributes belong
to some candidate key. Therefore the table adheres to both 2NF and 3NF.
* The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in
which the determining attribute (Rate Type) is neither a candidate key nor a superset of a
candidate key.
* Dependency Rate Type → Court is respected as a Rate Type should only ever apply to a single
Court.
* The design can be amended so that it meets BCNF:
Rate Types
Rate Type Court
Member
Flag
SAVER 1 Yes
STANDARD 1 No
PREMIUM-A 2 Yes
PREMIUM-B 2 No
Today's Bookings
Rate Type Start Time End Time
SAVER 09:30 10:30
SAVER 11:00 12:00
STANDARD 14:00 15:30
PREMIUM-B 10:00 11:30
PREMIUM-B 11:30 13:30
PREMIUM-A 15:00 16:30
The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag};
the candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate
Type, End Time}. Both tables are in BCNF
16. *Fourth normal form
Fourth normal form (4NF) is a normal form used in database normalization , 4NF is the next level
of normalization after Boyce–Codd normal form (BCNF).
Whereas the second, third, and Boyce–Codd normal forms are concerned with functional
dependencies, 4NF is concerned with a more general type of dependency known as a multivalued
dependency.
A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X Y, X is
a super key—that is, X is either a candidate key or a superset thereof
Consider the following example:
Pizza Delivery Permutations
Restaurant Pizza Variety Delivery Area
A1 Pizza Thick Crust Springfield
A1 Pizza Thick Crust Shelbyville
A1 Pizza Thick Crust Capital City
A1 Pizza Stuffed Crust Springfield
A1 Pizza Stuffed Crust Shelbyville
A1 Pizza Stuffed Crust Capital City
Elite Pizza Thin Crust Capital City
Elite Pizza Stuffed Crust Capital City
Vincenzo's Pizza Thick Crust Springfield
Vincenzo's Pizza Thick Crust Shelbyville
Vincenzo's Pizza Thin Crust Springfield
Vincenzo's Pizza Thin Crust Shelbyville
17. * Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.
* The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery
Area}. Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza
varieties offered by a restaurant are not affected by delivery area, then it does not meet 4NF.
The problem is that the table features two non-trivial multivalued dependencies on the
{Restaurant} attribute (which is not a super key).
The dependencies are:
* {Restaurant} {Pizza Variety}
* {Restaurant} {Delivery Area}
* These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties
of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This
state of affairs leads to redundancy in the table:
for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts
producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's
delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we might add
Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the
multivalued dependency {Restaurant} {Pizza Variety}.
* To eliminate the possibility of these anomalies, we must place the facts about varieties offered
into a different table from the facts about delivery areas, yielding two tables that are both in
4NF:
18. Varieties By Restaurant
Restaurant Pizza Variety
A1 Pizza Thick Crust
A1 Pizza Stuffed Crust
Elite Pizza Thin Crust
Elite Pizza Stuffed Crust
Vincenzo's Pizza Thick Crust
Vincenzo's Pizza Thin Crust
Delivery Areas By Restaurant
Restaurant Delivery Area
A1 Pizza Springfield
A1 Pizza Shelbyville
A1 Pizza Capital City
Elite Pizza Capital City
Vincenzo's Pizza Springfield
Vincenzo's Pizza Shelbyville
In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary
from one delivery area to another, the original three-column table would satisfy 4NF
19. * Fifth normal form (5NF), also known as project-join normal form (PJ/NF) is
a level of database normalization designed to reduce redundancy in relational
databases recording multi-valued facts by isolating semantically related
multiple relationships. A table is said to be in the 5NF if and only if every non-trivial
join dependency in it is implied by the candidate keys.
Consider the following example:
Traveling Salesman Brand Product Type
Jack Schneider Acme Vacuum Cleaner
Jack Schneider Acme Breadbox
Willy Loman Robusto Pruning Shears
Willy Loman Robusto Vacuum Cleaner
Willy Loman Robusto Breadbox
Willy Loman Robusto Umbrella Stand
Louis Ferguson Robusto Vacuum Cleaner
Louis Ferguson Robusto Telescope
Louis Ferguson Acme Vacuum Cleaner
Louis Ferguson Acme Lava Lamp
Louis Ferguson Nimbus Tie Rack
Traveling Salesman Product
Availability By Brand
20. * The table's predicate is: Products of the type designated by Product Type, made by the brand
designated by Brand, are available from the traveling salesman designated by Traveling Salesman.
* In the absence of any rules restricting the valid possible combinations of Traveling Salesman, Brand,
and Product Type, the three-attribute table above is necessary in order to model the situation correctly.
* Suppose, however, that the following rule applies: A Traveling Salesman has certain Brands and
certain Product Types in his repertoire. If Brand B1 and Brand B2 are in his repertoire, and Product
Type P is in his repertoire, then (assuming Brand B1 and Brand B2 both make Product Type P), the
Traveling Salesman must offer products of Product Type P those made by Brand B1 and those made
by Brand B2.
* In that case, it is possible to split the table into three:
Product Types By Traveling Salesman
Traveling
Salesman
Product Type
Jack Schneider Vacuum Cleaner
Jack Schneider Breadbox
Willy Loman Pruning Shears
Willy Loman Vacuum Cleaner
Willy Loman Breadbox
Willy Loman Umbrella Stand
Louis Ferguson Telescope
Louis Ferguson Vacuum Cleaner
Louis Ferguson Lava Lamp
Louis Ferguson Tie Rack
Brands By Traveling
Salesman
Traveling
Salesman
Brand
Jack
Schneider
Acme
Willy
Loman
Robusto
Louis
Ferguson
Robusto
Louis
Ferguson
Acme
Louis
Ferguson
Nimbus
Product Types By Brand
Brand Product Type
Acme Vacuum Cleaner
Acme Breadbox
Acme Lava Lamp
Robusto Pruning Shears
Robusto Vacuum Cleaner
Robusto Breadbox
Robusto Umbrella Stand
Robusto Telescope
Nimbus Tie Rack