transection mangemanrtin data bse mangement system .pdf

UNIT 4: Data Normalization: Anomalies in relational database design.
Decomposition. Functional dependencies. Normalization. First normal form,
Second normal form, Third normal form. Boyce-Codd normal form.
Dr Hamela K
Dept of Computer Science
GFGC, Malur
Prepared by Dr Hamela, GFGC, Malur

Types of Anomalies in DBMS
Insert Anomaly: The term "insertion anomaly" describes the situation where adding a
new row to a table causes an inconsistency.
Update Anomaly: If some data in the database changes, the change must be applied to
every row that stores it. If we miss any row, the table holds conflicting values for the
same fact, creating an update anomaly in the database.
Delete Anomaly: The term "deletion anomaly" describes the situation where deleting
some rows from a table also removes other information that is still needed.

Assume a manufacturing company stores employee details in a table called Employee
having four attributes:

Insert anomaly
If inserting a new row into the table creates an inconsistency in the table, this is
called an insertion anomaly.
Example
Assume that a new employee joins the company as a trainee and is not yet assigned to
any department. If the emp_dept field does not allow nulls, we cannot insert this
employee's data into the table.
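A minimal sketch of the insertion anomaly, using Python's built-in sqlite3 module. The column names (emp_id, emp_name, emp_address, emp_dept), the sample rows and the helper try_insert are hypothetical, since the slide's Employee table is not reproduced here. The NOT NULL constraint on emp_dept makes the database reject a trainee who has no department yet.

```python
import sqlite3

# Hypothetical Employee table (column names and rows are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Employee (
           emp_id      INTEGER,
           emp_name    TEXT,
           emp_address TEXT,
           emp_dept    TEXT NOT NULL  -- every row must name a department
       )"""
)

def try_insert(row):
    """Insert one employee row; return False if the table rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        # NOT NULL constraint failed on emp_dept
        return False

print(try_insert((101, "Rick", "Delhi", "D001")))  # True
print(try_insert((200, "Trainee", "Pune", None)))  # False: the trainee cannot be stored
```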

Update anomaly
When updating some rows of the table leads to an inconsistency in
the table, this anomaly occurs. This type of anomaly is known as
an update anomaly.
Example
In the given table, there are two rows for an employee named
Rick, because he belongs to two different departments of the
company. If we need to update Rick's address, we must update
the address in both rows. Otherwise, the data will become
inconsistent.
If the address is updated in the row for one department but
not in the other, the database will show two different
addresses for Rick, which is incorrect and leads to
inconsistent data.
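The update anomaly can be reproduced with a small sqlite3 sketch. The rows and the new address are made-up values for illustration; the point is that the UPDATE touches only one of Rick's two rows.

```python
import sqlite3

# Hypothetical data: Rick is stored once per department he works in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (emp_id INTEGER, emp_name TEXT, emp_address TEXT, emp_dept TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(101, "Rick", "Delhi", "D001"), (101, "Rick", "Delhi", "D002")],
)

# Rick moves, but the update is applied to the row for one department only.
conn.execute(
    "UPDATE Employee SET emp_address = 'Mumbai' WHERE emp_id = 101 AND emp_dept = 'D001'"
)

addresses = {a for (a,) in conn.execute("SELECT emp_address FROM Employee WHERE emp_id = 101")}
print(sorted(addresses))  # ['Delhi', 'Mumbai']: one employee, two addresses
```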

Delete anomaly
If deleting some rows from the table also
deletes other information that is still
required, this is called a deletion anomaly
in the database.
Example
Assume that the company closes department
D890. Deleting the rows that have emp_dept
as D890 would also delete the information
about employee Maggie, since she is assigned
only to this department.
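A short sqlite3 sketch of the deletion anomaly. The column names and rows (apart from Maggie and D890, which come from the example) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (emp_id INTEGER, emp_name TEXT, emp_address TEXT, emp_dept TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"),
        (123, "Maggie", "Agra", "D890"),  # Maggie works only in D890
    ],
)

# Closing department D890 removes its rows...
conn.execute("DELETE FROM Employee WHERE emp_dept = 'D890'")

# ...and with them every fact about Maggie (her id, name and address).
remaining = [name for (name,) in conn.execute("SELECT emp_name FROM Employee")]
print(remaining)  # ['Rick']
```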

DECOMPOSITION
• Decomposition is the DBMS process of dividing a single relation
into two or more sub-relations.
• Its main purpose is to break a complex relation down into
smaller relations, each describing the data at a finer level of detail.
• It helps eliminate anomalies and redundancy from the database
by splitting the data across several tables.

Types of decomposition

Lossy Decomposition
A decomposition is lossy when the original relation cannot be reconstructed by joining
the decomposed relations: the join produces extra (spurious) tuples, so information is
lost and cannot be recovered when the original relation is retrieved.
Now, we cannot correctly join the above tables, since Emp_ID is not part of the
DeptDetails relation. Therefore, the above relation has a lossy decomposition.

Lossless Decomposition
A decomposition is lossless if relation R can be reconstructed from the decomposed tables
using joins. This is the preferred choice: no information is lost when the relation is
decomposed, and the join gives back exactly the original relation.
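Both cases can be checked by projecting a relation, joining the projections back together and comparing the result with the original. This is a minimal sketch: the Employee(Emp_ID, Emp_Name, Dept_Name) relation, its rows, and the helpers as_relation, project and natural_join are hypothetical, not the slide's tables.

```python
from itertools import product

def as_relation(rows):
    """Represent each tuple as a sorted tuple of (attribute, value) pairs."""
    return {tuple(sorted(r.items())) for r in rows}

def project(rel, attrs):
    """Projection onto attrs (duplicates disappear because rel is a set)."""
    return {tuple((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all shared attributes."""
    out = set()
    for t1, t2 in product(r1, r2):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            out.add(tuple(sorted({**d1, **d2}.items())))
    return out

employee = as_relation([
    {"Emp_ID": 1, "Emp_Name": "Asha", "Dept_Name": "Sales"},
    {"Emp_ID": 2, "Emp_Name": "Ravi", "Dept_Name": "HR"},
])

# Lossy: the second part lacks Emp_ID, so the join degenerates to a cross product.
lossy = natural_join(project(employee, {"Emp_ID", "Emp_Name"}),
                     project(employee, {"Dept_Name"}))
# Lossless: both parts share Emp_ID, which determines every other attribute.
lossless = natural_join(project(employee, {"Emp_ID", "Emp_Name"}),
                        project(employee, {"Emp_ID", "Dept_Name"}))

print(len(lossy), lossy == employee)        # 4 False (2 spurious tuples)
print(len(lossless), lossless == employee)  # 2 True
```

Note that the lossy join still contains every original tuple; the information is lost because the spurious tuples make it impossible to tell which ones are real.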

Advantages of decomposition in DBMS
Easy use of Codes
Decomposition makes it easier for programs to copy and reuse important code for other work in
the DBMS. It not only saves a lot of time but also makes things convenient for the users.
Finding Mistakes
Another reason programmers opt for decomposition is that it lets them complete complex programs
conveniently. Mistakes are much easier to find with this style of programming.
Problem-Solving Approach
It is considered a good problem-solving strategy that makes complex computer programs easier to
write, because smaller pieces of code can be combined to produce the required result.
Eliminating Errors
The biggest advantage of decomposition in DBMS is that it largely eliminates inconsistencies and
duplication. Data is also easier to identify once a relation has been decomposed.

Properties of decomposition
Programmers should be aware of the main properties of decomposition in DBMS.
The major ones are described below:
Attribute Preservation
Every attribute of the original (universal) relation must appear in at least one of
the decomposed relations, so that no attribute is lost.
Dependency Preservation
Every functional dependency of the original relation should be enforceable within the
decomposed relation schemas. If the decomposition does not preserve dependencies,
some dependencies are lost and can only be checked by joining the tables.
No Redundancy
Decomposition should remove the problems caused by improper design, such as
redundancy, anomalies, and inconsistencies.

Issues of decomposition in DBMS
The main problems related to decomposition in DBMS are listed below:
Redundant Storage
Storing the same information in many places can confuse programmers and takes a
lot of space in the system.
Insertion Anomalies
Some details cannot be stored unless some other, unrelated information is stored as
well.
Deletion Anomalies
Some details cannot be deleted without also losing other information.

Functional Dependency
A functional dependency is a relationship that exists between
two sets of attributes. It typically exists between the primary
key and a non-key attribute within a table.
X → Y
X → Y means that each value of X is associated with exactly one
value of Y. The left side of an FD is known as the determinant,
and the right side is known as the dependent.
For example:
Assume we have an employee table with attributes: Emp_Id,
Emp_Name, Emp_Address.
Here the Emp_Id attribute uniquely identifies the Emp_Name
attribute of the employee table, because if we know the Emp_Id,
we can tell the employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on
Emp_Id.
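As a sketch, the definition can be tested against a table instance: X → Y is violated if two rows agree on X but differ on Y. The function name fd_holds and the sample rows are hypothetical. An instance can only refute an FD; that it holds for the schema is a design decision about all possible data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this relation instance.

    rows: list of dicts, one per tuple; lhs, rhs: lists of attribute names.
    The FD holds if no two rows agree on lhs but differ on rhs.
    """
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, different dependent value
    return True

employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Malur"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Kolar"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Bengaluru"},
]
print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))         # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))    # False: two different Ashas
```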

Types of Functional dependency

1. Trivial functional dependency
A → B is a trivial functional dependency if B is a subset of A.
Dependencies such as A → A and B → B are also trivial.
Example:
Consider a table with two columns, Employee_Id and
Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial
functional dependency, since
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name →
Employee_Name are trivial dependencies.

2. Non-trivial functional dependency
A → B is a non-trivial functional dependency if B is not a
subset of A.
When A ∩ B is empty, A → B is called completely
non-trivial.
Example:
ID → Name,
Name → DOB

Armstrong's Axioms
If F is a set of functional dependencies, then the closure
of F, denoted F+, is the set of all functional
dependencies logically implied by F. Armstrong's
Axioms are a set of rules that, when applied
repeatedly, generate the closure of a set of functional
dependencies.
Reflexive rule − If a is a set of attributes and b is a
subset of a, then a → b holds.
Augmentation rule − If a → b holds and y is a set of
attributes, then ay → by also holds. That is, adding the
same attributes to both sides of a dependency does not
change the basic dependency.
Transitivity rule − As with the transitive rule in algebra,
if a → b holds and b → c holds, then a → c also holds.
a → b is read as "a functionally determines b".
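Because F+ can be very large, a common way to apply the axioms in practice is to compute the attribute closure X+ (every attribute X determines) and test X → Y by checking Y ⊆ X+. This is a sketch of that standard algorithm, not something the slides define; the names closure and implies and the attributes A to D are hypothetical.

```python
def closure(attrs, fds):
    """Compute the attribute closure attrs+ under a set of FDs.

    fds: list of (lhs, rhs) pairs of frozensets. Whenever an lhs is already
    inside the closure, its rhs is added (justified by the axioms above),
    until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Return True if fds logically imply lhs -> rhs, i.e. it is in F+."""
    return set(rhs) <= closure(lhs, fds)

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(sorted(closure({"A"}, F)))           # ['A', 'B', 'C']
print(implies(F, {"A"}, {"C"}))            # True  (transitivity)
print(implies(F, {"A", "D"}, {"B", "D"}))  # True  (augmentation)
print(implies(F, {"C"}, {"A"}))            # False
```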

• Trivial Functional Dependency
• Trivial − If a functional dependency (FD) X → Y holds,
where Y is a subset of X, then it is called a trivial FD.
Trivial FDs always hold.
• Non-trivial − If an FD X → Y holds, where Y is not a
subset of X, then it is called a non-trivial FD.
• Completely non-trivial − If an FD X → Y holds, where
X ∩ Y = Φ, it is said to be a completely non-trivial FD.
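The three definitions translate directly into set tests. In this sketch (classify_fd is a hypothetical name) the most specific label is returned, so a completely non-trivial FD is reported as such even though it is also non-trivial.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs using the definitions above."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if not (x & y):
        return "completely non-trivial"  # X and Y share no attribute
    return "non-trivial"                 # Y not a subset of X, but they overlap

print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                 # completely non-trivial
print(classify_fd({"ID", "Name"}, {"Name", "DOB"}))  # non-trivial
```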

The Purpose of Normalization
Normalization is a technique for producing a set of
relations with desirable properties, given the data
requirements of an enterprise.
The process of normalization is a formal method that
identifies relations based on their primary or candidate
keys and the functional dependencies among their
attributes.

Problem Without Normalization
Without normalization, it becomes difficult to handle and update the database
without facing data loss. Insertion, update and deletion anomalies are very
frequent if the database is not normalized.

Purpose
• Normalization is the process of organizing the data in
the database.
• Normalization is used to minimize the redundancy
from a relation or set of relations. It is also used to
eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
• Normalization divides larger tables into smaller
ones and links them using relationships.
• Normal forms are used to reduce redundancy in
database tables.

Advantages of Normalization
• Normalization helps to minimize data
redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.

Disadvantages of Normalization
We cannot start building the database before knowing
what the user needs.
The performance degrades when normalizing the
relations to higher normal forms, i.e., 4NF, 5NF.
It is very time-consuming and difficult to normalize
relations of a higher degree.
Careless decomposition may lead to a bad database
design and serious problems.

FIRST NORMAL FORM
• First Normal Form is a relation in which the
intersection of each row and column contains one
and only one value.

As per First Normal Form, no row of data may contain a repeating group of
information, i.e., each column must hold a single value, and no two rows may be
identical. Each table should be organized into rows, and each row should have a
primary key that distinguishes it as unique.
The primary key is usually a single column, but sometimes more than one
column can be combined to create a single primary key. For example, consider
a table which is not in First Normal Form:
Student Table:
Student   Age   Subject
Adam      15    Biology, Maths
Alex      14    Maths
Stuart    17    Maths

In First Normal Form, no row may have a column that stores more than one
value, for example values separated by commas. Instead, such data must be
split into multiple rows.
Student Table following 1NF will be:
Student   Age   Subject
Adam      15    Biology
Adam      15    Maths
Alex      14    Maths
Stuart    17    Maths
Applying First Normal Form increases data redundancy, since the same values are
repeated across several rows, but each row as a whole is unique.
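The conversion shown above can be sketched in a few lines: each comma-separated Subject value is split into its own row. The rows are the ones from the Student table; the function name to_1nf and the assumption that values are separated by commas are illustrative.

```python
# Rows of the Student table above; Subject holds a comma-separated
# list, which violates 1NF.
unnormalized = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

def to_1nf(rows):
    """Split the multi-valued Subject column so every cell holds one value."""
    return [
        (student, age, subject.strip())
        for student, age, subjects in rows
        for subject in subjects.split(",")
    ]

for row in to_1nf(unnormalized):
    print(row)
# ('Adam', 15, 'Biology')
# ('Adam', 15, 'Maths')
# ('Alex', 14, 'Maths')
# ('Stuart', 17, 'Maths')
```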

Full functional dependency
Full functional dependency indicates that if A and B are
attributes of a relation, B is fully functionally dependent on
A if B is functionally dependent on A, but not on any proper
subset of A.
A functional dependency A → B is a partial dependency if some
attribute can be removed from A and the dependency still
holds.
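Using attribute closure, a dependency is partial exactly when some proper subset of its left side already determines the right side. The sketch below applies that test to the Student example, where Age depends only on Student. The function dependency_kind and the frozenset encoding of FDs are hypothetical choices.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def dependency_kind(lhs, rhs, fds):
    """Return 'none', 'partial' or 'full' for the dependency lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return "none"  # lhs -> rhs does not follow from fds at all
    # Try every proper subset of lhs, smallest first.
    for size in range(len(lhs)):
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(subset, fds):
                return "partial"  # some attributes of lhs can be removed
    return "full"

# FD of the 1NF Student table: Age depends on Student alone.
F = [(frozenset({"Student"}), frozenset({"Age"}))]
print(dependency_kind({"Student", "Subject"}, {"Age"}, F))  # partial
print(dependency_kind({"Student"}, {"Age"}, F))             # full
```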

Second Normal Form (2NF)
Second normal form (2NF) is a relation that is in first
normal form and every non-primary-key attribute is fully
functionally dependent on the primary key.
The normalization of 1NF relations to 2NF involves the
removal of partial dependencies. If a partial dependency
exists, we remove the functionally dependent attributes
from the relation by placing them in a new relation along
with a copy of their determinant.

• In the First Normal Form example there are two rows for Adam, to include
the multiple subjects he has opted for. While this is searchable and follows
First Normal Form, it is an inefficient use of space. Also, in the 1NF table
above, the candidate key is {Student, Subject}, but Age depends only on the
Student column. This partial dependency violates Second Normal Form. To
achieve Second Normal Form, it would be helpful to split the subjects out into
an independent table and match them up using the student names as foreign
keys.
New Student Table following 2NF will be:
Student   Age
Adam      15
Alex      14
Stuart    17
In the Student Table, the candidate key is the Student column, because
the other column, Age, depends on it.
New Subject Table introduced for 2NF will be:
Student   Subject
Adam      Biology
Adam      Maths
Alex      Maths
Stuart    Maths

• In the Subject Table, the candidate key is {Student, Subject}. Now both
tables qualify for Second Normal Form, and the update anomaly caused by
repeating a student's age no longer occurs. However, there are some more
complex cases in which a table in Second Normal Form still suffers from
update anomalies, and Third Normal Form handles those scenarios.
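The 2NF split can be sketched as two projections of the 1NF rows, followed by a join on Student to confirm that no information was lost. The rows come from the tables above; the variable names are illustrative.

```python
# Student table in 1NF (from the example above): (Student, Age, Subject).
student_1nf = [
    ("Adam", 15, "Biology"),
    ("Adam", 15, "Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

# Move the partial dependency Student -> Age into its own table.
student = sorted({(name, age) for name, age, _ in student_1nf})
subject = sorted({(name, subj) for name, _, subj in student_1nf})

# Rejoin on Student to confirm the decomposition is lossless.
age_of = dict(student)  # Student is the key of the new Student table
rejoined = sorted((name, age_of[name], subj) for name, subj in subject)

print(student)                          # [('Adam', 15), ('Alex', 14), ('Stuart', 17)]
print(rejoined == sorted(student_1nf))  # True
```

Adam's age is now stored once, so changing it touches a single row.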

Third Normal Form (3NF)
Transitive dependency
A condition where A, B, and C are attributes of a relation such that
if A → B and B → C, then C is transitively dependent on A via B
(provided that A is not functionally dependent on B or C).
Third normal form (3NF)
A relation that is in first and second normal form, and in which
no non-primary-key attribute is transitively dependent on the
primary key.
The normalization of 2NF relations to 3NF involves the removal
of transitive dependencies by placing the attribute(s) in a new
relation along with a copy of the determinant.

• Third Normal Form requires that every non-prime attribute of a table must be
dependent on the primary key; in other words, no non-prime attribute may be
determined by another non-prime attribute. Such a transitive functional
dependency should be removed from the table, and the table must also be in
Second Normal Form. For example, consider a table with the following fields.
Student_Detail Table:
Student_id Student_name DOB Street city state Zip
In this table Student_id is the primary key, but street, city and state depend on Zip.
The dependency between Zip and these fields is called a transitive dependency. Hence, to
apply 3NF, we move street, city and state to a new table, with Zip as its primary key.

New Student_Detail Table:
Student_id Student_name DOB Zip
Address Table:
Zip Street city state
The advantages of removing transitive dependencies are:
• The amount of data duplication is reduced.
• Data integrity is achieved.
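A sketch of a 3NF check for this example. It uses the common general form of the rule (for each FD X → A with A not in X, either X is a super key or A is a prime attribute) rather than the slide's wording; the FD list, the function names, and the assumption that Student_id is the only candidate key are illustrative. Passing each new table only the FDs on its own attributes is a simplification that works here.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(attrs, fds, prime):
    """List (X, A) for each FD X -> A (A not in X) that breaks 3NF.

    X -> A is allowed if X is a super key of attrs or A is a prime attribute.
    """
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attrs
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad

# Original Student_Detail table (Student_id assumed to be the only key).
ALL = {"Student_id", "Student_name", "DOB", "Street", "city", "state", "Zip"}
FDS = [
    (frozenset({"Student_id"}), frozenset({"Student_name", "DOB", "Zip"})),
    (frozenset({"Zip"}), frozenset({"Street", "city", "state"})),
]
print(third_nf_violations(ALL, FDS, {"Student_id"}))
# [(['Zip'], 'Street'), (['Zip'], 'city'), (['Zip'], 'state')]

# After the split: New Student_Detail and Address tables.
print(third_nf_violations({"Student_id", "Student_name", "DOB", "Zip"},
                          FDS[:1], {"Student_id"}))               # []
print(third_nf_violations({"Zip", "Street", "city", "state"},
                          FDS[1:], {"Zip"}))                      # []
```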

Boyce Codd normal form (BCNF)
• BCNF is an advanced version of 3NF. It is stricter than
3NF.
• A table is in BCNF if, for every non-trivial functional
dependency X → Y, X is a super key of the table.
• For BCNF, the table should be in 3NF, and for every
non-trivial FD, the LHS is a super key.

Example: Let's assume there is a company where employees work in more than
one department.
EMPLOYEE table:
EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549
In the above table, the functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key, yet each
is the left side of a functional dependency.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID  EMP_COUNTRY
264     India
364     UK
EMP_DEPT table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549
EMP_DEPT_MAPPING table:
EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now this decomposition is in BCNF, because the left side of each
functional dependency is a key of its table.
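The BCNF test can be sketched with attribute closure: a non-trivial FD X → Y violates BCNF when X+ does not cover every attribute of the table. The function bcnf_violations is a hypothetical name, and giving each decomposed table only the FDs on its own attributes is a simplification that is valid for this example (in general the projected FDs must be derived from F+).

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Non-trivial FDs X -> Y whose left side X is not a super key of attrs."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attrs
    ]

EMPLOYEE = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
FDS = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"})),
    (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"})),
]
print(bcnf_violations(EMPLOYEE, FDS))
# [(['EMP_ID'], ['EMP_COUNTRY']), (['EMP_DEPT'], ['DEPT_TYPE', 'EMP_DEPT_NO'])]

# Each decomposed table, with the FDs that apply to it:
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, FDS[:1]))                 # []
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, FDS[1:]))  # []
print(bcnf_violations({"EMP_ID", "EMP_DEPT"}, []))                         # []
```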
