NORMALIZATION
SYNOPSIS:
1. Introduction
2. Table: Company
3. Definition of Normalization
4. Rules of Normalization
5. Normalization form tables
6. Conclusion
INTRODUCTION:
In this exercise I look at optimising a
data structure named Company. The example
system I use as a model is a database that
keeps track of the employees of an
organisation working on different projects.
TABLE: COMPANY
Problem without Normalization
Definition of Normalization:
❖ Normalization is the process of
removing redundant data from our tables
to improve storage efficiency, data
integrity and scalability.
❖ It is the process of reorganizing data
in a database so that it meets two basic
requirements:
1. There is no redundancy of data: each fact
is stored in only one place.
2. Data dependencies are logical: all related
data items are stored together.
❖ Normalization is important for several
reasons, chiefly because it prevents the
insert, update and delete anomalies
described below. Removing redundancy also
lets databases take up less disk space,
which can improve performance.
❖ It generally involves splitting existing
tables into multiple ones, which must be
joined back together whenever a query
needs data from more than one of them.
❖Normalization is also known as data
normalization.
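To make the point about splitting and rejoining concrete, here is a minimal sketch using Python's built-in sqlite3 module. Emp_num follows the primary key chosen later in this document; the department table, its columns (dept_num, dept_name) and all rows are hypothetical. Each department name is stored once, so a query that needs employee and department together has to join the two tables.

```python
import sqlite3

# Hypothetical split of one "company" table into two related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_num  INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_num  INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept_num INTEGER NOT NULL REFERENCES department(dept_num)
    );
    INSERT INTO department VALUES (10, 'Research'), (20, 'Sales');
    INSERT INTO employee VALUES (1, 'Asha', 10), (2, 'Ben', 10), (3, 'Chen', 20);
""")

# The department name lives in one place only; the query rejoins the tables.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON d.dept_num = e.dept_num
    ORDER BY e.emp_num
""").fetchall()

for name, dept in rows:
    print(name, dept)
```

The join is the price paid for storing each department name once: renaming a department touches a single row instead of every employee row.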
Why is Normalization used?
➢ The relation derived from the user
view or data store will most likely be
unnormalized.
➢ The problem usually arises when
an existing system uses unstructured
files, for example MS Excel spreadsheets.
Normalization Avoids:
● Duplication of Data: the same data is
listed in multiple rows of the database.
● Insert Anomaly: a record about one entity
cannot be inserted into the table without
first inserting information about another
entity, e.g. a customer cannot be entered
without a sales order.
● Delete Anomaly: a record cannot be
deleted without deleting a record about a
related entity, e.g. a sales order cannot be
deleted without deleting all of the
customer's information.
● Update Anomaly: information cannot be
updated without changing it in many
places, e.g. customer information must be
updated in every sales order the customer
has placed.
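The three anomalies can be reproduced with a small in-memory table. This is a minimal sketch under stated assumptions: the sales-order rows, the customer names, the columns order_id, customer and city, and the helper update_first_match are all hypothetical, and order_id is assumed to be the table's primary key.

```python
# Hypothetical unnormalized sales-order table: customer details repeat on every order.
orders = [
    {"order_id": 101, "customer": "Acme", "city": "Leeds"},
    {"order_id": 102, "customer": "Acme", "city": "Leeds"},
    {"order_id": 103, "customer": "Bolt", "city": "York"},
]


def update_first_match(rows, customer, new_city):
    """Update only the first matching row, as a careless program might."""
    for row in rows:
        if row["customer"] == customer:
            row["city"] = new_city
            return


# Update anomaly: after a partial update, Acme has two different cities.
update_first_match(orders, "Acme", "Hull")
acme_cities = {r["city"] for r in orders if r["customer"] == "Acme"}
print(sorted(acme_cities))  # ['Hull', 'Leeds']

# Delete anomaly: deleting Bolt's only order also deletes Bolt's city.
orders = [r for r in orders if r["order_id"] != 103]
print(any(r["customer"] == "Bolt" for r in orders))  # False

# Insert anomaly: a customer with no order has no order_id, so (assuming
# order_id is the primary key) it cannot be stored in `orders` at all.
# A normalized layout stores each customer fact once; orders refer to it.
customers = {"Acme": {"city": "Leeds"}, "Bolt": {"city": "York"}}
normalized_orders = [
    {"order_id": 101, "customer": "Acme"},
    {"order_id": 102, "customer": "Acme"},
    {"order_id": 103, "customer": "Bolt"},
]
customers["Acme"]["city"] = "Hull"    # one change updates Acme everywhere
customers["Dyne"] = {"city": "Bath"}  # a customer can exist without orders
```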
Normalization Rules
Normalization rules are divided into the
following normal forms:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)
First Normal Form (1NF)
Definition:
An entity is in first normal form if it
contains no repeating groups. In relational
terms, a table is in first normal form if it
contains no repeating columns. Repeating
columns make our data less flexible, waste
disk space and make it more difficult to
search for data.
It has the following conditions:
1. The value of each attribute is atomic.
2. All entries in any column must be of the same
kind.
3. Each column must have a unique name.
4. No two rows are identical.
5. No composite values.
Additional:
Choose a primary key.
Reminder:
A primary key is unique, not null and
unchanging.
It can be either a single attribute or a
combination of attributes (a composite key).
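Putting the 1NF conditions and the primary-key reminder together, a repeating group can be flattened into one atomic row per value. This is a sketch only: the column names emp_name, projects and proj_num, the sample rows and the helper to_first_normal_form are hypothetical; emp_num follows the Emp_num key used later in the document.

```python
# Hypothetical unnormalized rows: each employee carries a repeating group of projects.
unnormalized = [
    {"emp_num": 1, "emp_name": "Asha", "projects": ["P1", "P2"]},
    {"emp_num": 2, "emp_name": "Ben", "projects": ["P2"]},
]


def to_first_normal_form(rows):
    """Return one row per (employee, project) so every value is atomic."""
    flat = []
    for row in rows:
        for project in row["projects"]:
            flat.append({"emp_num": row["emp_num"],
                         "emp_name": row["emp_name"],
                         "proj_num": project})
    return flat


table_1nf = to_first_normal_form(unnormalized)
for r in table_1nf:
    print(r)

# Choose a primary key: the composite (emp_num, proj_num) must be unique and not null.
keys = [(r["emp_num"], r["proj_num"]) for r in table_1nf]
assert len(keys) == len(set(keys)), "duplicate primary key"
assert all(None not in k for k in keys), "null in primary key"
```

A single column such as emp_num can no longer identify a row here, which is why the reminder allows a key made of combined attributes.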
Functional Dependency:
The value of one attribute in a table is
determined entirely by the value of another
attribute (or set of attributes).
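A functional dependency A → B holds when any two rows that agree on A also agree on B. The sketch below checks that rule against sample rows; the helper name holds and the sample data are hypothetical. Note that sample data can only refute a dependency: a dependency is a rule about all valid data, so passing the check on a few rows does not prove it.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    lhs and rhs are tuples of column names. The dependency is violated as soon
    as two rows agree on the lhs columns but differ on the rhs columns.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"emp_num": 1, "emp_name": "Asha", "proj_num": "P1"},
    {"emp_num": 1, "emp_name": "Asha", "proj_num": "P2"},
    {"emp_num": 2, "emp_name": "Ben", "proj_num": "P2"},
]
print(holds(rows, ("emp_num",), ("emp_name",)))  # True
print(holds(rows, ("emp_num",), ("proj_num",)))  # False
```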
Second Normal Form (2NF)
Definition:
A relation is in 2NF if it is in 1NF and every
non-key attribute is fully dependent on each
candidate key of the relation.
● Remove partial dependencies.
● Functional Dependency: the value of one
attribute in a table is determined entirely
by the value of another.
● Partial Dependency: a type of functional
dependency where an attribute is
functionally dependent on only part of the
primary key (so the primary key must be a
composite key).
● Create a separate table with the
functionally dependent data and the part
of the key on which it depends. The tables
created at this step will usually contain
descriptions of resources.
It has the following conditions:
1. It should satisfy all the conditions of first
normal form.
2. All non-prime (non-key) attributes are
fully functionally dependent on the whole of
each candidate key, i.e. 2NF has no partial
dependency.
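The 2NF step can be sketched as a projection: the partially dependent attribute moves into its own table together with the part of the key it depends on. This is an illustration under assumptions: the works_on, employee and assignment tables, the hours column, the sample rows and the project helper are hypothetical; the composite key is assumed to be (emp_num, proj_num), with emp_name depending on emp_num alone.

```python
def project(rows, cols):
    """Projection: keep only cols and drop duplicate rows (order preserved)."""
    out, seen = [], set()
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out


# Hypothetical 1NF table with composite key (emp_num, proj_num).
# emp_name depends only on emp_num: a partial dependency.
works_on = [
    {"emp_num": 1, "proj_num": "P1", "emp_name": "Asha", "hours": 10},
    {"emp_num": 1, "proj_num": "P2", "emp_name": "Asha", "hours": 5},
    {"emp_num": 2, "proj_num": "P2", "emp_name": "Ben", "hours": 8},
]

# 2NF: emp_name moves to a table keyed by emp_num alone; hours stays with
# the full key because it depends on both emp_num and proj_num.
employee = project(works_on, ("emp_num", "emp_name"))
assignment = project(works_on, ("emp_num", "proj_num", "hours"))

print(employee)  # [{'emp_num': 1, 'emp_name': 'Asha'}, {'emp_num': 2, 'emp_name': 'Ben'}]
print(assignment)
```

Asha's name is now stored once instead of once per project, so renaming her cannot leave the table inconsistent.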
Third Normal Form (3NF)
Definition:
A relation is in third normal form if it is in
2NF and every non-key attribute of the
relation is non-transitively dependent on
each candidate key of the relation.
It has the following conditions:
● It is in second normal form.
● There is no transitive functional
dependency: no non-key attribute depends
on another non-key attribute. Each non-key
attribute should depend only on the
primary key.
● Remove transitive dependencies.
● Transitive Dependency: a type of
functional dependency where an attribute
is functionally dependent on an attribute
other than the primary key. Thus its value
is only indirectly determined by the
primary key.
● Create a separate table containing the
attribute and the fields that are
functionally dependent on it. The tables
created at this step will usually contain
descriptions of either resources or agents.
Keep a copy of that attribute in the
original table so the two can be joined.
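The 3NF procedure can be sketched the same way, with a join afterwards to confirm that nothing was lost. Assumptions: the dept_num and dept_name columns, the rows, and the helpers project, natural_join and as_set are hypothetical, and the dependency chain is assumed to be emp_num → dept_num → dept_name. The natural_join helper is a naive nested-loop join meant for illustration only.

```python
def project(rows, cols):
    """Projection: keep only cols and drop duplicate rows (order preserved)."""
    out, seen = [], set()
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out


def natural_join(left, right):
    """Nested-loop join of rows that agree on every column the tables share."""
    if not left or not right:
        return []
    common = [c for c in left[0] if c in right[0]]
    return [{**a, **b} for a in left for b in right
            if all(a[c] == b[c] for c in common)]


def as_set(rows):
    """Order-insensitive view of a table, for comparing contents."""
    return {tuple(sorted(r.items())) for r in rows}


# Hypothetical 2NF table: dept_name depends on the key only via dept_num.
employee = [
    {"emp_num": 1, "emp_name": "Asha", "dept_num": 10, "dept_name": "Research"},
    {"emp_num": 2, "emp_name": "Ben", "dept_num": 10, "dept_name": "Research"},
    {"emp_num": 3, "emp_name": "Chen", "dept_num": 20, "dept_name": "Sales"},
]

# 3NF: dept_name moves to a table keyed by dept_num; dept_num stays in the
# employee table as the copy of the key attribute used for joining.
emp_3nf = project(employee, ("emp_num", "emp_name", "dept_num"))
dept_3nf = project(employee, ("dept_num", "dept_name"))

print(dept_3nf)  # [{'dept_num': 10, 'dept_name': 'Research'}, {'dept_num': 20, 'dept_name': 'Sales'}]
same = as_set(natural_join(emp_3nf, dept_3nf)) == as_set(employee)
print(same)  # True
```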
Primary key:
In theory, Employee name could be chosen as
the primary key, but in practice I add
Emp_num as the primary key, because two
employees can share the same name.
Fourth Normal Form (4NF)
Definition:
A table is in fourth normal form (4NF) if
and only if it is in BCNF and has no
non-trivial multi-valued dependency other
than those whose determinant is a superkey.
An entity is in Fourth Normal Form (4NF)
when it meets the requirements of
Boyce-Codd Normal Form (and therefore of
3NF) and additionally satisfies the
following conditions:
● It should be in Boyce-Codd Normal Form.
● It should have no non-trivial
multi-valued dependency (two independent
1:N relationships stored in one table).
● It has no multiple sets of multi-valued
dependencies. In other words, 4NF states
that no entity can have more than a single
one-to-many relationship within an
entity if the one-to-many attributes are
independent of each other.
● Fourth Normal Form applies to situations
involving many-to-many relationships.
In relational databases, many-to-many
relationships are expressed through
cross-reference tables.
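The "two independent 1:N relationships in one table" case can be shown with a small sketch. Everything here is hypothetical: the employee's skills and languages, the tables combined, emp_skill and emp_language, and the natural_join and as_set helpers. A single table holding both multi-valued facts must store every skill/language combination; splitting it into one table per fact stores each fact once, and joining the two tables back gives exactly the original rows.

```python
from itertools import product


def natural_join(left, right):
    """Nested-loop join of rows that agree on every column the tables share."""
    if not left or not right:
        return []
    common = [c for c in left[0] if c in right[0]]
    return [{**a, **b} for a in left for b in right
            if all(a[c] == b[c] for c in common)]


def as_set(rows):
    """Order-insensitive view of a table, for comparing contents."""
    return {tuple(sorted(r.items())) for r in rows}


# Hypothetical data: Asha's skills and languages are independent of each other.
skills = {"Asha": ["SQL", "Java", "Python"]}
languages = {"Asha": ["English", "French"]}

# One table holding both multi-valued facts must store every
# (skill, language) combination: 3 x 2 = 6 rows.
combined = [{"emp_name": e, "skill": s, "language": lang}
            for e in skills
            for s, lang in product(skills[e], languages[e])]

# 4NF: one table per independent multi-valued fact: 3 + 2 = 5 rows.
emp_skill = [{"emp_name": e, "skill": s} for e in skills for s in skills[e]]
emp_language = [{"emp_name": e, "language": lang}
                for e in languages for lang in languages[e]]

print(len(combined), len(emp_skill) + len(emp_language))  # 6 5
# The split is lossless: joining it back reproduces the combined table.
print(as_set(natural_join(emp_skill, emp_language)) == as_set(combined))  # True
```

The gap widens as the lists grow: the combined table grows with the product of the list sizes, the decomposed tables only with their sum, and adding one skill in the combined form means inserting one row per language.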
Fifth Normal Form (5NF)
Definition:
A table is in fifth normal form (5NF),
or Project-Join Normal Form (PJNF), if it is in
4NF and cannot be losslessly decomposed into
smaller tables, except in ways implied by its
candidate keys.
Fifth normal form states that no non-trivial
join dependencies exist other than those
implied by the candidate keys. 5NF ensures
that any fact can be reconstructed without
anomalous (spurious) results, regardless of
the number of tables being joined. A 5NF
table may have a composite primary key;
conversely, a 3NF table whose candidate keys
each consist of a single column is
guaranteed to be in 5NF.
In 5NF, a table has the following conditions:
● It should be in 4NF.
● It should have no non-trivial join
dependency, i.e. it cannot be rebuilt by a
non-additive (lossless) join of smaller
projections unless that join is implied by
its candidate keys.
● If a join dependency exists, it should be
trivial or implied by the candidate keys.
● A relation without such a join dependency
cannot be decomposed by projection into
other relations without spurious results.
● A relation is in 5NF when its information
content cannot be reconstructed from
several smaller relations, i.e. from
relations having fewer attributes than the
original relation (other than decompositions
based on its candidate keys).
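Whether a join dependency holds can be tested directly: project the table onto the component column sets, join the projections back, and compare with the original. This sketch uses the textbook supplier/part/project shape; the rows (s1, p1, j2 and so on), the column names and the helpers project, natural_join, as_set, lossless and table are all hypothetical.

```python
def project(rows, cols):
    """Projection: keep only cols and drop duplicate rows (order preserved)."""
    out, seen = [], set()
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out


def natural_join(left, right):
    """Nested-loop join of rows that agree on every column the tables share."""
    if not left or not right:
        return []
    common = [c for c in left[0] if c in right[0]]
    return [{**a, **b} for a in left for b in right
            if all(a[c] == b[c] for c in common)]


def as_set(rows):
    """Order-insensitive view of a table, for comparing contents."""
    return {tuple(sorted(r.items())) for r in rows}


def lossless(rows, components):
    """True if joining the projections onto `components` gives back exactly rows."""
    parts = [project(rows, cols) for cols in components]
    joined = parts[0]
    for part in parts[1:]:
        joined = natural_join(joined, part)
    return as_set(joined) == as_set(rows)


def table(triples):
    """Build a hypothetical supplier/part/project table from tuples."""
    return [{"supplier": s, "part": p, "project": j} for s, p, j in triples]


# Candidate join dependency *(SP, PJ, JS): three binary projections.
jd = [("supplier", "part"), ("part", "project"), ("project", "supplier")]

r = table([("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1")])
print(lossless(r, jd))  # False: the join adds the spurious row (s1, p1, j1)

r2 = table([("s1", "p1", "j2"), ("s1", "p2", "j1"),
            ("s2", "p1", "j1"), ("s1", "p1", "j1")])
print(lossless(r2, jd))      # True: r2 satisfies the join dependency
print(lossless(r2, jd[:2]))  # False: two projections alone are not enough
```

The first table has no such dependency, so splitting it produces spurious rows and it should be kept whole. The second does satisfy the three-way dependency, which is not implied by its key (all three columns), so it is not in 5NF and can be split into the three binary tables; the last line shows that only the full three-way split is lossless.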
CONCLUSION:
★ Normalization keeps data consistent,
preserves data integrity and ensures that
no data is lost when tables are split.
★ Its cost is that queries may need more
join operations, which can sometimes
degrade throughput.

Normalization in relational database management systems
