Chapter Four Logical Database Design (Normalization).pptx

Chapter Four
Normalization
Logical Database Design

Normal Forms &
Normalization

Normalization
• Normalization: the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations
• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
• The main goal of database normalization is to restructure the logical data model of a database to:
  - Eliminate redundancy
  - Organize data efficiently
  - Reduce the potential for data anomalies

Data Anomalies
• Mixing attributes of multiple entities may cause problems
  - Information is stored redundantly, wasting storage
• Well-structured relations contain minimal redundancy of data
  - They allow modification, insertion, and deletion of data in the relation without error
• Data anomalies are errors or inconsistencies that arise from redundantly stored data in a relation
• The three most common anomalies in relational database design are:
  - Insertion anomalies
  - Deletion anomalies
  - Modification anomalies (update anomalies)

Anomalies…
• Let’s consider the EMP_DEPT relation

Data Anomalies: Insertion Anomalies
This type of data anomaly occurs when we try to insert new records into a relation.
Insertion anomalies can be differentiated into two types:

Data Anomalies: Insertion Anomalies
1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or nulls (if the employee does not yet work for a department)
2. It is difficult to insert a new department that has no employees yet into the EMP_DEPT relation.
  - The only way to do this is to place null values in the attributes for employee.
  - This causes a problem because SSN is the primary key of EMP_DEPT, and each tuple is supposed to represent an employee entity, not a department entity
  - Moreover, when the first employee is assigned to that department, we no longer need this tuple with null values.
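The two insertion cases above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table layout is an assumption: only SSN, DNUMBER, DNAME and DMGRSSN are named in these slides, while the ENAME column and all row values are made up for illustration.

```python
import sqlite3

# Hypothetical, simplified EMP_DEPT; ENAME and the sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emp_dept (
        ssn     TEXT NOT NULL PRIMARY KEY,  -- each tuple represents an employee
        ename   TEXT,
        dnumber INTEGER,
        dname   TEXT,
        dmgrssn TEXT
    )
    """
)

# Type 1: a new employee without a department must carry nulls.
conn.execute("INSERT INTO emp_dept VALUES ('111', 'Alice', NULL, NULL, NULL)")

# Type 2: a department without employees has no SSN to use as the key,
# so the insert violates the NOT NULL primary key and is rejected.
try:
    conn.execute("INSERT INTO emp_dept VALUES (NULL, NULL, 7, 'Audit', NULL)")
    department_inserted = True
except sqlite3.IntegrityError:
    department_inserted = False

print(department_inserted)
```

NOT NULL is spelled out because SQLite, unlike most systems, otherwise tolerates NULL in a non-integer primary key column.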

Data Anomalies: Deletion Anomalies
• This type of anomaly occurs when critical data is removed from the database, perhaps unintentionally
• E.g.: If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database
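As a rough illustration of the deletion anomaly, the sketch below deletes the last employee of a department from a simplified EMP_DEPT table. The column subset and the sample rows are assumptions chosen for the demonstration, not data from the slides.

```python
import sqlite3

# Simplified EMP_DEPT with made-up rows (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_dept (ssn TEXT PRIMARY KEY, ename TEXT, dnumber INTEGER, dname TEXT)"
)
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?)",
    [
        ("111", "Alice", 5, "Research"),
        ("222", "Bob", 5, "Research"),
        ("333", "Carol", 4, "Administration"),  # only employee of department 4
    ],
)

# Deleting Carol also deletes everything we knew about department 4.
conn.execute("DELETE FROM emp_dept WHERE ssn = '333'")

remaining_departments = [
    row[0] for row in conn.execute("SELECT DISTINCT dname FROM emp_dept ORDER BY dname")
]
print(remaining_departments)
```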

Data Anomalies: Modification/Update Anomalies
• These anomalies arise when the database must change multiple records to reflect a single attribute change
• Example:
  - In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of department 5), we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent.
  - If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which would be wrong
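The inconsistency described above can be reproduced with a small sketch. The table is a simplified EMP_DEPT and the SSN values are invented; the point is only that updating one of the two tuples for department 5 leaves two different managers on record.

```python
import sqlite3

# Simplified EMP_DEPT with made-up rows (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_dept (ssn TEXT PRIMARY KEY, ename TEXT, dnumber INTEGER, dmgrssn TEXT)"
)
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?)",
    [("111", "Alice", 5, "888"), ("222", "Bob", 5, "888")],
)

# The manager of department 5 changes, but only one tuple gets updated.
conn.execute("UPDATE emp_dept SET dmgrssn = '999' WHERE ssn = '111'")

managers_of_dept_5 = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT dmgrssn FROM emp_dept WHERE dnumber = 5")
)
print(managers_of_dept_5)  # more than one manager means the data is inconsistent
```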

Practical Use of Normal Forms
• Normal form: a condition, stated in terms of the keys and functional dependencies (FDs) of a relation, used to certify whether a relation schema is in a particular normal form
• The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
• Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form

Normalization and Normal Forms
• The normalization process, as first proposed by Codd (1972a), takes a relation schema through a series of tests to "certify" whether it satisfies a certain normal form.
• Normalization helps to:
  - Eliminate redundancy
  - Organize data efficiently
  - Reduce the potential for anomalies during data operations, and
  - Improve data consistency

Normalization and Normal Forms
• In the relational model, methods exist for classifying how well a relation schema is designed, that is, how far it avoids redundancy and anomalies
• These classifications are called normal forms (NF), and there are algorithms for converting a given database from one to another
• Edgar F. Codd originally established three normal forms:
  - 1NF
  - 2NF
  - 3NF
• Later, others such as BCNF, 4NF, and 5NF were introduced and generally accepted, but 3NF is widely considered sufficient for most applications
• Most tables that reach 3NF are also in BCNF (Boyce-Codd Normal Form)

Normal Forms: First Normal Form (1NF)
• A relation (table) R is in 1NF if and only if all underlying domains of attributes contain only atomic (simple, indivisible) values
• Each attribute must be atomic
  - No repeating columns within a row
  - No multi-valued columns
• 1NF simplifies non-atomic attributes
  - Queries become easier
• Normalization (Decomposition)
  - There are three options to normalize a relation into 1NF (as discussed in the next slide), but the best option is to form a new relation for each non-atomic attribute or nested relation
• Example: Employee relation (unnormalized)

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java

Normal Forms: First Normal Form (1NF)
• There are three techniques to achieve 1NF for such a relation:
  1. Expand the key so that there is a separate tuple in the original Employee relation for each skill of an employee. This option has the disadvantage of introducing redundancy in the relation
  2. Remove the attribute skills that violates 1NF and place it in a separate relation Emp_Skills along with the primary key emp_no of Employee. This decomposes the non-1NF relation into two 1NF relations with the following schemas:
     - Employee (emp_no, name, dept_no, dept_name)
     - Emp_Skills (emp_no, skills)
  3. If a maximum number of values is known for the non-atomic attribute (for example, if it is known that at most three skills can exist for an employee), replace the skills attribute by three atomic attributes: Skill1, Skill2, and Skill3. This has the disadvantage of introducing null values if some employees have fewer than three skills
• Result of the first technique (one tuple per skill):

emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
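The second technique (a separate Emp_Skills relation) can be sketched in a few lines of Python. The rows are the unnormalized Employee rows from the earlier slide; the variable names and the assumption that skills are separated by commas are illustrative choices.

```python
# Unnormalized Employee rows as shown on the earlier slide.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]

employee = []    # Employee (emp_no, name, dept_no, dept_name)
emp_skills = []  # Emp_Skills (emp_no, skills), one atomic skill per tuple

for emp_no, name, dept_no, dept_name, skills in unnormalized:
    employee.append((emp_no, name, dept_no, dept_name))
    # Assumes the multi-valued attribute is a comma-separated string.
    for skill in skills.split(","):
        emp_skills.append((emp_no, skill.strip()))

for row in emp_skills:
    print(row)
```

Each employee now appears once in Employee, and the name and department values are no longer repeated for every skill.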

Normal Forms: Second Normal Form (2NF)
• Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency no longer holds
• A relation schema R is in 2NF if every nonprime attribute of R is fully functionally dependent on the primary key of R
• The test for 2NF involves testing for functional dependencies whose left-hand side attributes are part of the primary key. If the primary key contains a single attribute, the test need not be applied at all
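The full-dependency test can be sketched with an attribute-closure helper. The functions closure and is_full_dependency are hypothetical names, and the three dependencies are an assumption: the EMP_PROJ figure is not reproduced in this text, so the FDs below follow the usual textbook version of that schema (the key determines HOURS, SSN determines ENAME, PNUMBER determines PNAME and PLOCATION).

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_full_dependency(lhs, rhs, fds):
    """X -> Y is full if it holds and fails once any attribute is removed from X."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # the dependency does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# Assumed FDs for EMP_PROJ (the schema figure is missing from this text).
fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
key = {"SSN", "PNUMBER"}

hours_is_full = is_full_dependency(key, {"HOURS"}, fds)
ename_is_full = is_full_dependency(key, {"ENAME"}, fds)
print(hours_is_full, ename_is_full)
```

Under these assumed FDs, HOURS depends on the whole key while ENAME depends on SSN alone, which is exactly the kind of partial dependency the 2NF test looks for.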

Normal Forms: Second Normal Form (2NF)
• Example: Consider the Employee-Project relation schema below
  - The relation is in 1NF
  - But the functional dependencies FD2 and FD3 make ENAME, PNAME, and PLOCATION partially dependent on the primary key {SSN, PNUMBER} of EMP_PROJ, thus violating the 2NF test.
  - Normalizing the relation into 2NF hence leads to the decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3 as shown below:
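A possible way to see the decomposition in action is to project a small EMP_PROJ table into three tables and join them back. The column assignment (EP1 with SSN, PNUMBER, HOURS; EP2 with SSN, ENAME; EP3 with PNUMBER, PNAME, PLOCATION) is an assumption based on the usual textbook version of this example, since the figure is not reproduced here, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emp_proj (
        ssn TEXT, pnumber INTEGER, hours REAL,
        ename TEXT, pname TEXT, plocation TEXT,
        PRIMARY KEY (ssn, pnumber)
    )
    """
)
conn.executemany(
    "INSERT INTO emp_proj VALUES (?, ?, ?, ?, ?, ?)",
    [  # made-up sample rows
        ("111", 1, 32.5, "Alice", "ProductX", "Bellaire"),
        ("111", 2, 7.5, "Alice", "ProductY", "Sugarland"),
        ("222", 2, 40.0, "Bob", "ProductY", "Sugarland"),
    ],
)

# Assumed 2NF decomposition: one table per determinant.
conn.executescript(
    """
    CREATE TABLE ep1 AS SELECT ssn, pnumber, hours FROM emp_proj;
    CREATE TABLE ep2 AS SELECT DISTINCT ssn, ename FROM emp_proj;
    CREATE TABLE ep3 AS SELECT DISTINCT pnumber, pname, plocation FROM emp_proj;
    """
)

original = sorted(
    conn.execute("SELECT ssn, pnumber, hours, ename, pname, plocation FROM emp_proj")
)
rejoined = sorted(
    conn.execute(
        """
        SELECT ep1.ssn, ep1.pnumber, ep1.hours, ep2.ename, ep3.pname, ep3.plocation
        FROM ep1
        JOIN ep2 ON ep1.ssn = ep2.ssn
        JOIN ep3 ON ep1.pnumber = ep3.pnumber
        """
    )
)
print(rejoined == original)
```

Joining the three projections reproduces the original rows, while each employee name and each project description is now stored only once.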

Normal Forms: Third Normal Form (3NF)
• Third normal form (3NF) is based on the concept of transitive dependency
• A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.
• Example:
  - The dependency SSN → DMGRSSN is transitive through DNUMBER in EMP_DEPT

Normal Forms: Third Normal Form (3NF)
• Example:
  - The relation EMP_DEPT is in 2NF since no partial dependencies on a key exist
  - However, EMP_DEPT is not in 3NF because of the transitive dependency of DMGRSSN (and also DNAME) on SSN via DNUMBER.
  - We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2 as shown below
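The effect of the 3NF decomposition can be sketched with sqlite3. The attribute split (ED1 holding the employee attributes plus DNUMBER, ED2 holding DNUMBER, DNAME, DMGRSSN) is an assumption based on the usual textbook version of this example, because the figure is not reproduced here; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed decomposition of EMP_DEPT; sample rows are made up.
    CREATE TABLE ed2 (dnumber INTEGER PRIMARY KEY, dname TEXT, dmgrssn TEXT);
    CREATE TABLE ed1 (
        ssn TEXT PRIMARY KEY, ename TEXT, bdate TEXT, address TEXT,
        dnumber INTEGER REFERENCES ed2(dnumber)
    );
    INSERT INTO ed2 VALUES (5, 'Research', '888');
    INSERT INTO ed1 VALUES ('111', 'Alice', '1990-01-01', '12 Main St', 5);
    INSERT INTO ed1 VALUES ('222', 'Bob', '1988-05-17', '34 Oak Ave', 5);
    """
)

# Changing the manager of department 5 now touches exactly one row.
cursor = conn.execute("UPDATE ed2 SET dmgrssn = '999' WHERE dnumber = 5")
rows_changed = cursor.rowcount

# Every employee of department 5 sees the same manager after the join.
managers_seen = conn.execute(
    """
    SELECT DISTINCT ed2.dmgrssn
    FROM ed1 JOIN ed2 ON ed1.dnumber = ed2.dnumber
    WHERE ed1.dnumber = 5
    """
).fetchall()
print(rows_changed, managers_seen)
```

Because the manager is stored once per department, the update anomaly described earlier cannot occur in this layout.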

Individual Assignment 10%
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)

Denormalization
• Normalization is performed to reduce or eliminate insertion, deletion, or update anomalies
• However, a completely normalized database may not be the most efficient or effective implementation
• "Denormalization" is sometimes used to improve efficiency
• Denormalization
  - Is the process of selectively taking normalized tables and recombining the data in them
  - Is usually driven by the need to improve query speed
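One way to picture denormalization is to store the join of two normalized tables as a base table of its own. The sketch below uses sqlite3; the table names employee, department and emp_dept_report, and all rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized tables (hypothetical names and rows).
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (
        ssn TEXT PRIMARY KEY, ename TEXT,
        dnumber INTEGER REFERENCES department(dnumber)
    );
    INSERT INTO department VALUES (4, 'Administration'), (5, 'Research');
    INSERT INTO employee VALUES
        ('111', 'Alice', 5), ('222', 'Bob', 5), ('333', 'Carol', 4);

    -- Denormalized table: the join is stored as a base relation.
    CREATE TABLE emp_dept_report AS
        SELECT e.ssn, e.ename, e.dnumber, d.dname
        FROM employee AS e JOIN department AS d ON e.dnumber = d.dnumber;
    """
)

# Reads no longer need a join, at the price of repeating dname per employee.
report = conn.execute(
    "SELECT ename, dname FROM emp_dept_report ORDER BY ssn"
).fetchall()
print(report)
```

The stored join is in a lower normal form, so the anomalies discussed earlier return for this table; the trade is made deliberately for faster reads.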

Normalization
• Improves maintenance for database changes
• Tends to slow down retrieval
• Better at finding problems than solving them
• Standard normalization procedures are subtle and may introduce BCNF or 4NF problems into tables

Intuitive (Accepted) View of Normalization
1NF: Tables represent entities
2NF: Each table represents only one entity
3NF: Tables do not contain attributes from embedded entities
4NF: Triple relationships should not represent a pair of dual relationships

Exercise
1. Given the Gradereport relation below and its functional dependencies, normalize the relation
Gradereport (StudNo, StudName, Major, Advisor, CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)
Functional Dependencies:
• StudNo -> StudName
• CourseNo -> Ctitle, InstrucName
• InstrucName -> InstrucLocn
• StudNo, CourseNo, Major -> Grade
• StudNo, Major -> Advisor
• Advisor -> Major
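A useful first step for this exercise is to find a candidate key by computing attribute closures. The sketch below encodes the functional dependencies listed above; the helper names closure and is_candidate_key are hypothetical, and the attribute spellings are taken from the FD list.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key determines all attributes and has no redundant attribute."""
    attrs, all_attrs = set(attrs), set(all_attrs)
    if closure(attrs, fds) != all_attrs:
        return False
    return all(closure(attrs - {a}, fds) != all_attrs for a in attrs)


ALL_ATTRS = {
    "StudNo", "StudName", "Major", "Advisor", "CourseNo",
    "Ctitle", "InstrucName", "InstrucLocn", "Grade",
}
FDS = [
    ({"StudNo"}, {"StudName"}),
    ({"CourseNo"}, {"Ctitle", "InstrucName"}),
    ({"InstrucName"}, {"InstrucLocn"}),
    ({"StudNo", "CourseNo", "Major"}, {"Grade"}),
    ({"StudNo", "Major"}, {"Advisor"}),
    ({"Advisor"}, {"Major"}),
]

key_ok = is_candidate_key({"StudNo", "CourseNo", "Major"}, ALL_ATTRS, FDS)
too_small = is_candidate_key({"StudNo", "CourseNo"}, ALL_ATTRS, FDS)
print(key_ok, too_small)
```

The same helper can be used to test other attribute sets, for example one that swaps Major for Advisor.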

Examples
1NF: Remove repeating groups
• Student (StudNo, StudName)
• StudMajor (StudNo, Major, Advisor)
• StudCourse (StudNo, Major, CourseNo, Ctitle, InstrucName, InstrucLocn, Grade)
2NF: Remove partial key dependencies
• Student (StudNo, StudName)
• StudMajor (StudNo, Major, Advisor)
• StudCourse (StudNo, Major, CourseNo, Grade)
• Course (CourseNo, Ctitle, InstrucName, InstrucLocn)
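The 2NF result can be checked against the 3NF condition with a short sketch. The function third_nf_violations is a hypothetical helper that only inspects the listed dependencies (not every implied one), and the Instructor relation at the end is one possible split, not taken from the slides.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(all_attrs, candidate_keys, fds):
    """List (lhs, attr) pairs where lhs is not a superkey and attr is nonprime."""
    all_attrs = set(all_attrs)
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= all_attrs:
            continue  # lhs is a superkey, so this FD is allowed in 3NF
        for attr in sorted(set(rhs) - set(lhs)):
            if attr not in prime:
                violations.append((tuple(sorted(lhs)), attr))
    return violations


# Course as left by the 2NF step above.
course_fds = [
    ({"CourseNo"}, {"Ctitle", "InstrucName"}),
    ({"InstrucName"}, {"InstrucLocn"}),
]
course_violations = third_nf_violations(
    {"CourseNo", "Ctitle", "InstrucName", "InstrucLocn"}, [{"CourseNo"}], course_fds
)

# One possible 3NF split (the relation name Instructor is an assumption).
course_after = third_nf_violations(
    {"CourseNo", "Ctitle", "InstrucName"}, [{"CourseNo"}], [course_fds[0]]
)
instructor_after = third_nf_violations(
    {"InstrucName", "InstrucLocn"}, [{"InstrucName"}], [course_fds[1]]
)
print(course_violations, course_after, instructor_after)
```

The check flags InstrucLocn, which depends on CourseNo only through InstrucName, so Course still needs a further step to reach 3NF.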

Example: Company Database
• The COMPANY database keeps track of a company's employees, departments, and projects. Suppose that after the requirements collection and analysis phase, the database designers provided the following description of the part of the company to be represented in the database:
• The company is organized into departments. Each department has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations.
• A department controls a number of projects, each of which has a unique name, a unique number, and a single location
• We store each employee's name, social security number, address, salary, sex, and birth date. An employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. We keep track of the number of hours per week that an employee works on each project. We also keep track of the direct supervisor of each employee
• We want to keep track of the dependents of each employee for insurance purposes. We keep each dependent's first name, sex, birth date, and relationship to the employee.

ER- Schema of Company Database

Relational Schemas of the Company Database

Reading Assignments
1. Discuss the correspondences between the ER model constructs and the
relational model constructs. Show how each ER model construct can be
mapped to the relational model, and discuss any alternative mappings
2. Discuss the options for mapping EER model constructs to relations.
3. Why should nulls in a relation be avoided as far as possible?
4. What are spurious tuples? Discuss the problem of spurious tuples and how we may prevent it
5. Discuss insertion, deletion, and modification anomalies. Why are they
considered bad? Illustrate with examples.
6. What does the term unnormalized relation refer to?
7. What undesirable dependencies are avoided when a relation is in 2NF?
8. What undesirable dependencies are avoided when a relation is in 3NF?
9. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a stronger form of 3NF?
