Chapter Four
Normalization
Logical Database Design
Normal Forms & Normalization
Normalization
 Normalization: the process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
 Normalization is carried out in practice so that the
resulting designs are of high quality and meet the
desirable properties
 The main goal of Database Normalization is to
restructure the logical data model of a database to:
 Eliminate redundancy
 Organize data efficiently
 Reduce the potential for data anomalies
Data Anomalies
 Mixing attributes of multiple entities may cause
problems
 Information is stored redundantly wasting storage
 Well structured relations contain minimal redundancy of data
 They allow modification, insertion and deletion of data in the
relation without error
 Data Anomalies are errors/inconsistencies that arise due to
redundantly stored data in a relation
 The three most common anomalies in relational database
design are:
Insertion anomalies
Deletion anomalies
Modification anomalies (update anomalies)
Anomalies…
• Let’s consider the EMP_DEPT relation
Data Anomalies: Insertion Anomalies
This type of anomaly occurs when we try to
insert new records into a relation.
Insertion anomalies fall into two
types:
1. To insert a new employee tuple into EMP_DEPT, we must
include either the attribute values for the department that the
employee works for, or nulls (if the employee does not work
for a department as yet)
2. It is difficult to insert a new department that has no
employees as yet in the EMP_DEPT relation.
 The only way to do this is to place null values in the
attributes for employee.
 This causes a problem because SSN is the primary key of
EMP_DEPT (and so cannot be null), and each tuple is supposed to
represent an employee entity, not a department entity
 Moreover, when the first employee is assigned to that
department, we do not need this tuple with null values
any more.
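Both insertion cases can be tried out with a small sketch using Python's built-in sqlite3 module. The column list of emp_dept is an assumption based on the attribute names used in these slides, and the sample values are invented.

```python
import sqlite3

# Hypothetical EMP_DEPT layout: the column list is an assumption built from
# the attribute names on the slides (SSN, DNUMBER, DNAME, DMGRSSN).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emp_dept (
        -- NOT NULL is spelled out because SQLite otherwise tolerates
        -- NULL in a non-integer primary key column.
        ssn     TEXT PRIMARY KEY NOT NULL,
        ename   TEXT,
        dnumber INTEGER,
        dname   TEXT,
        dmgrssn TEXT
    )
    """
)

# Case 1: an employee with no department yet forces NULLs into every
# department attribute of the tuple.
conn.execute(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?, ?)",
    ("111", "Sample Employee", None, None, None),
)


def try_insert_empty_department(connection):
    """Case 2: a department without employees has no SSN to use as the key."""
    try:
        connection.execute(
            "INSERT INTO emp_dept VALUES (?, ?, ?, ?, ?)",
            (None, None, 7, "Marketing", "999"),
        )
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
    return "accepted"


result = try_insert_empty_department(conn)
print(result)
```

The second insert is refused because the primary key would be null, which is the situation described in point 2.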
Data Anomalies: Deletion Anomalies
 This type of anomaly occurs when critical data is
unintentionally removed from the database as a side effect of
deleting other data
 E.g.: If we delete from EMP_DEPT an employee tuple that
happens to represent the last employee working for a
particular department, the information concerning that
department is lost from the database
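A minimal sketch of the deletion anomaly, using plain Python dictionaries as tuples. The rows and column names are invented for illustration and are not taken from the slides.

```python
# Invented sample rows; the column names are assumptions based on the slides.
emp_dept = [
    {"ssn": "111", "ename": "A", "dnumber": 5, "dname": "Research", "dmgrssn": "333"},
    {"ssn": "222", "ename": "B", "dnumber": 5, "dname": "Research", "dmgrssn": "333"},
    {"ssn": "444", "ename": "C", "dnumber": 4, "dname": "Administration", "dmgrssn": "555"},
]


def known_departments(rows):
    """Everything the relation can tell us about departments."""
    return {row["dnumber"]: row["dname"] for row in rows}


before = known_departments(emp_dept)

# Delete the only employee of department 4.
emp_dept = [row for row in emp_dept if row["ssn"] != "444"]

after = known_departments(emp_dept)
print(before)
print(after)
```

After the delete, department 4 no longer appears anywhere, although nobody asked for the department itself to be removed.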
Data Anomalies: Modification/Update Anomalies
 These anomalies arise when a single logical change must be
applied to multiple records because the same value is stored
redundantly
 Example:
 In EMP_DEPT, if we change the value of one of the
attributes of a particular department (say, the manager of
department 5), we must update the tuples of all employees
who work in that department; otherwise, the database will
become inconsistent.
 If we fail to update some tuples, the same department will
be shown to have two different values for manager in
different employee tuples, which would be wrong
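The inconsistency can be made visible with a short sketch. The rows are invented, and the helper names managers_by_department and inconsistent_departments are hypothetical.

```python
# Invented sample rows; the column names are assumptions based on the slides.
emp_dept = [
    {"ssn": "111", "dnumber": 5, "dmgrssn": "333"},
    {"ssn": "222", "dnumber": 5, "dmgrssn": "333"},
    {"ssn": "444", "dnumber": 4, "dmgrssn": "555"},
]


def managers_by_department(rows):
    """Collect every manager value recorded for each department."""
    managers = {}
    for row in rows:
        managers.setdefault(row["dnumber"], set()).add(row["dmgrssn"])
    return managers


def inconsistent_departments(rows):
    """Departments that are shown with more than one manager."""
    return sorted(
        dnumber
        for dnumber, managers in managers_by_department(rows).items()
        if len(managers) > 1
    )


# Change the manager of department 5, but update only the first matching tuple.
for row in emp_dept:
    if row["dnumber"] == 5:
        row["dmgrssn"] = "888"
        break

print(inconsistent_departments(emp_dept))
```

Because the manager is repeated in every employee tuple, a partial update leaves department 5 with two different managers.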
Practical Use of Normal Forms
 Normal form: a condition, stated in terms of the keys and
functional dependencies (FDs) of a relation, used to certify
whether a relation schema is in a particular normal form
 The practical utility of these normal forms becomes
questionable when the constraints on which they are
based are hard to understand or to detect
 Denormalization: the process of storing the join of
higher normal form relations as a base relation,
which is in a lower normal form
Normalization and Normal Forms
 The normalization process, as first proposed by Codd (1972a),
takes a relation schema through a series of tests to "certify"
whether it satisfies a certain normal form.
 Normalization helps to:
Eliminate redundancy
Organize data efficiently
Reduce the potential for anomalies during data operations,
and
Improve data consistency
 In the relational model, relation schemas are classified by how
well their structure avoids redundancy and anomalies.
 These classifications are called normal forms (or NF), and
there are algorithms for converting a given schema from a lower
normal form to a higher one
 Edgar F. Codd originally established three normal forms:
 1NF
 2NF and
 3NF
 Later, others like BCNF, 4NF and 5NF were introduced and
were generally accepted, but 3NF is widely considered to be
sufficient for most applications
 Most tables that reach 3NF are also in BCNF (Boyce-Codd
Normal Form)
Normal Forms: First Normal Form (1NF)
 A relation (table) R is in 1NF if and only if all underlying domains of
attributes contain only atomic values (simple, indivisible)
 Each attribute must be atomic
• No repeating columns within a row
• No multi-valued columns.
 1NF simplifies non-atomic attributes
• Queries become easier
 Normalization (Decomposition)
 There are three options to normalize a relation into 1NF (listed after the
example), but the best option is usually to form a new relation for each
non-atomic attribute or nested relation
 Example: Employee relation (unnormalized)
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C, Perl, Java
2       Barbara Jones  224      IT         Linux, Mac
3       Jake Rivera    201      R&D        DB2, Oracle, Java
 There are three techniques to achieve 1NF for such a relation:
1. Expand the key so that there is a separate tuple in the original Employee
relation for each skill of an employee. This option has the disadvantage of
introducing redundancy in the relation
2. Remove the attribute skills that violates 1NF and place it in a separate relation
Emp_Skills along with the primary key emp_no of Employee
 This decomposes the non-1NF relation into two 1NF relations with the
following schemas:
 Employee (emp_no, name, dept_no, dept_name)
 Emp_Skills (emp_no, skills)
3. If a maximum number of values is known for the non-atomic attribute (for
example, if it is known that at most three skills can exist for an employee),
replace the skills attribute by three atomic attributes: Skill1, Skill2 and
Skill3. This has the disadvantage of introducing null values if some
employees have fewer than three skills
Result of the first technique (one tuple per skill):
emp_no  name           dept_no  dept_name  skills
1       Kevin Jacobs   201      R&D        C
1       Kevin Jacobs   201      R&D        Perl
1       Kevin Jacobs   201      R&D        Java
2       Barbara Jones  224      IT         Linux
2       Barbara Jones  224      IT         Mac
3       Jake Rivera    201      R&D        DB2
3       Jake Rivera    201      R&D        Oracle
3       Jake Rivera    201      R&D        Java
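The second technique (moving skills into a separate Emp_Skills relation) can be sketched on the example data as follows. The function name to_first_normal_form is hypothetical, and the sketch assumes the skills are stored as one comma-separated string.

```python
# The unnormalized Employee relation from the example, with the
# multi-valued skills attribute held as a comma-separated string.
unnormalized = [
    (1, "Kevin Jacobs", 201, "R&D", "C, Perl, Java"),
    (2, "Barbara Jones", 224, "IT", "Linux, Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2, Oracle, Java"),
]


def to_first_normal_form(rows):
    """Split the relation into Employee and Emp_Skills (both in 1NF)."""
    employee = []
    emp_skills = []
    for emp_no, name, dept_no, dept_name, skills in rows:
        employee.append((emp_no, name, dept_no, dept_name))
        for skill in skills.split(","):
            # One (emp_no, skill) tuple per atomic skill value.
            emp_skills.append((emp_no, skill.strip()))
    return employee, emp_skills


employee, emp_skills = to_first_normal_form(unnormalized)
for row in emp_skills:
    print(row)
```

Each employee's name and department are now stored once, while Emp_Skills holds one row per skill, so neither the redundancy of the first technique nor the nulls of the third are introduced.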
Normal Forms: Second Normal Form (2NF)
 Second normal form (2NF) is based on the concept of full
functional dependency. A functional dependency X -> Y is a
full functional dependency if removal of any attribute A from X
means that the dependency does not hold any more
 A relation schema is in 2NF if every nonprime attribute is fully
functionally dependent on the primary key
 The test for 2NF involves testing for functional
dependencies whose left-hand side attributes are part of
the primary key. If the primary key contains a single
attribute, the test need not be applied at all
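The full-dependency test can be sketched with attribute closures. The helper names fd, closure and is_full_dependency are hypothetical. The dependencies are those of the EMP_PROJ example used in this chapter, where FD1 ({SSN, PNUMBER} -> HOURS) is an assumption because the schema figure is not reproduced in the text.

```python
def fd(lhs, rhs):
    """Build a functional dependency lhs -> rhs from two attribute strings."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_dependency(lhs, rhs, fds):
    """True if lhs -> rhs holds and stops holding once any attribute leaves lhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # lhs -> rhs does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# FD1 (with HOURS) is assumed; FD2 and FD3 follow the slide text.
EMP_PROJ_FDS = [
    fd("SSN PNUMBER", "HOURS"),
    fd("SSN", "ENAME"),
    fd("PNUMBER", "PNAME PLOCATION"),
]

key = {"SSN", "PNUMBER"}
for attr in ["HOURS", "ENAME", "PNAME", "PLOCATION"]:
    print(attr, is_full_dependency(key, {attr}, EMP_PROJ_FDS))
```

Only HOURS depends on the whole key. The other three attributes depend on part of it, which is what the 2NF test is meant to detect.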
 Example: Consider the EMP_PROJ (employee-project) relation schema below
 The relation is in 1NF
 But the functional dependencies FD2 and FD3 make ENAME, PNAME, and
PLOCATION partially dependent on the primary key {SSN, PNUMBER} of
EMP_PROJ, thus violating the 2NF test.
 Normalizing the relation into 2NF hence leads to the decomposition of
EMP_PROJ into the three relation schemas EP1, EP2, and EP3 as shown below:
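A sketch of this decomposition by projection. The attribute lists of EP1, EP2 and EP3 (including HOURS) are assumptions, since the schema figure is not reproduced in the text, and the sample rows are invented.

```python
# Assumed attribute order of EMP_PROJ; HOURS is not named on the slide.
ATTRS = ("SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION")

# Invented sample tuples.
emp_proj = [
    ("123", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("123", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("453", 1, 20.0, "English", "ProductX", "Bellaire"),
]


def project(rows, attrs):
    """Relational projection: keep the named columns and drop duplicate tuples."""
    positions = [ATTRS.index(a) for a in attrs]
    result = []
    for row in rows:
        item = tuple(row[p] for p in positions)
        if item not in result:
            result.append(item)
    return result


ep1 = project(emp_proj, ("SSN", "PNUMBER", "HOURS"))       # depends on the full key
ep2 = project(emp_proj, ("SSN", "ENAME"))                  # depends on SSN only
ep3 = project(emp_proj, ("PNUMBER", "PNAME", "PLOCATION")) # depends on PNUMBER only

print(ep1)
print(ep2)
print(ep3)
```

After projection each employee name and each project description is stored once, instead of once per (employee, project) pair.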
Normal Forms: Third Normal Form (3NF)
 Third normal form (3NF) is based on the concept of
transitive dependency
 A functional dependency X -> Y in a relation schema R is a
transitive dependency if there is a set of attributes Z that is
neither a candidate key nor a subset of any key of R, and
both X -> Z and Z -> Y hold.
 Example:
 The dependency SSN -> DMGRSSN is transitive through
DNUMBER in EMP_DEPT
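A simplified sketch of how such a dependency could be detected from a list of FDs. It only inspects the dependencies as listed, so it is not a complete 3NF test. The helper names are hypothetical, and ENAME is an assumed attribute of EMP_DEPT.

```python
def fd(lhs, rhs):
    """Build a functional dependency lhs -> rhs from two attribute strings."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(candidate_keys, fds):
    """Return (X, Z, Y) triples where key X determines Y through a non-key Z."""
    keys = [frozenset(k) for k in candidate_keys]
    found = []
    for key in keys:
        reachable = closure(key, fds)
        for z, y in fds:
            # Z must be neither a candidate key nor a subset of any key.
            part_of_a_key = any(z <= k for k in keys)
            if z <= reachable and not part_of_a_key and y - z:
                found.append((set(key), set(z), set(y - z)))
    return found


# Assumed FDs of EMP_DEPT; ENAME is not named on the slides.
EMP_DEPT_FDS = [
    fd("SSN", "ENAME DNUMBER"),
    fd("DNUMBER", "DNAME DMGRSSN"),
]

found = transitive_dependencies([{"SSN"}], EMP_DEPT_FDS)
for x, z, y in found:
    print(sorted(x), "->", sorted(z), "->", sorted(y))
```

The sketch reports that SSN reaches DNAME and DMGRSSN only through DNUMBER, which is not part of any key.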
 Example:
 The relation EMP_DEPT is in 2NF since no partial dependencies on a key exist
 However, EMP_DEPT is not in 3NF because of the transitive dependency of
DMGRSSN (and also DNAME) on SSN via DNUMBER.
 We can normalize EMP_DEPT by decomposing it into the two 3NF relation
schemas ED1 and ED2 as shown below
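A sketch of the decomposition, with a check that joining ED1 and ED2 on DNUMBER gives back the original tuples. The attribute lists of ED1 and ED2 are assumptions (the figure is not reproduced in the text), and the rows are invented.

```python
emp_dept = [
    # (SSN, ENAME, DNUMBER, DNAME, DMGRSSN), invented sample tuples
    ("111", "A", 5, "Research", "333"),
    ("222", "B", 5, "Research", "333"),
    ("444", "C", 4, "Administration", "555"),
]

# ED1 keeps the employee attributes plus DNUMBER as a foreign key.
ed1 = sorted({(ssn, ename, dnumber) for ssn, ename, dnumber, _, _ in emp_dept})

# ED2 stores each department exactly once.
ed2 = sorted({(dnumber, dname, mgr) for _, _, dnumber, dname, mgr in emp_dept})


def natural_join(ed1_rows, ed2_rows):
    """Join ED1 and ED2 on their common attribute DNUMBER."""
    departments = {dnumber: (dname, mgr) for dnumber, dname, mgr in ed2_rows}
    return [
        (ssn, ename, dnumber, *departments[dnumber])
        for ssn, ename, dnumber in ed1_rows
        if dnumber in departments
    ]


rejoined = natural_join(ed1, ed2)
print(ed2)
print(sorted(rejoined) == sorted(emp_dept))
```

With this design, changing the manager of department 5 means updating a single ED2 tuple, so the update anomaly described earlier cannot occur.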
Individual Assignment 10%
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
Denormalization
 Normalization is performed to reduce or eliminate Insertion, Deletion or
Update anomalies
 However, a completely normalized database may not be the most
efficient or effective implementation
 “Denormalization” is sometimes used to improve efficiency.
 Denormalization is the process of selectively taking normalized tables and
re-combining the data in them
 Usually driven by the need to improve query speed.
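A minimal sketch of denormalization with Python's built-in sqlite3 module: the join of two 3NF tables is stored as a base table so that later reads need no JOIN. The table names ed1, ed2 and emp_dept follow the chapter's example, but the column lists and the rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two normalized (3NF) tables; column lists are assumed.
    CREATE TABLE ed1 (ssn TEXT PRIMARY KEY, ename TEXT, dnumber INTEGER);
    CREATE TABLE ed2 (dnumber INTEGER PRIMARY KEY, dname TEXT, dmgrssn TEXT);
    INSERT INTO ed1 VALUES ('111', 'A', 5), ('222', 'B', 5), ('444', 'C', 4);
    INSERT INTO ed2 VALUES (5, 'Research', '333'), (4, 'Administration', '555');

    -- Denormalize: store the join of the two tables as a base table.
    CREATE TABLE emp_dept AS
        SELECT e.ssn, e.ename, e.dnumber, d.dname, d.dmgrssn
        FROM ed1 AS e JOIN ed2 AS d ON e.dnumber = d.dnumber;
    """
)

# Reads against the denormalized table need no join.
rows = conn.execute(
    "SELECT ename, dname FROM emp_dept ORDER BY ssn"
).fetchall()
print(rows)
```

The price of the faster read is that the department name and manager are stored redundantly again, so the anomalies discussed earlier have to be handled by the application.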
Normalization
 Improves maintenance for database changes
 Tends to slow down retrieval
 Better at finding problems than solving them
 Standard normalization procedures are subtle and
may introduce BCNF or 4NF problems into tables
Intuitive (Accepted) View of Normalization
1NF: Tables represent entities
2NF: Each table represents only one entity
3NF: Tables do not contain attributes from embedded entities
4NF: Triple relationships should not represent a pair of dual relationships
Exercise
1. Given the Gradereport relation below and its functional dependencies,
normalize the relation
Gradereport (StudNo, StudName, Major, Advisor, CourseNo, Ctitle, InstrucName,
InstrucLocn, Grade)
Functional Dependencies:
• StudNo -> StudName
• CourseNo -> Ctitle, InstrucName
• InstrucName -> InstrucLocn
• StudNo, CourseNo, Major -> Grade
• StudNo, Major -> Advisor
• Advisor -> Major
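Before normalizing, it helps to know the candidate keys of Gradereport. The sketch below finds them by brute force from the listed dependencies. The helper names are hypothetical, and the instructor-name attribute is assumed to be spelled InstrucName throughout.

```python
from itertools import combinations

ATTRIBUTES = [
    "StudNo", "StudName", "Major", "Advisor", "CourseNo",
    "Ctitle", "InstrucName", "InstrucLocn", "Grade",
]

# The functional dependencies listed in the exercise, as (lhs, rhs) pairs.
FDS = [
    ({"StudNo"}, {"StudName"}),
    ({"CourseNo"}, {"Ctitle", "InstrucName"}),
    ({"InstrucName"}, {"InstrucLocn"}),
    ({"StudNo", "CourseNo", "Major"}, {"Grade"}),
    ({"StudNo", "Major"}, {"Advisor"}),
    ({"Advisor"}, {"Major"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    everything = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a superset of a known key is not minimal
            if closure(candidate, fds) == everything:
                keys.append(candidate)
    return keys


keys = candidate_keys(ATTRIBUTES, FDS)
for key in keys:
    print(sorted(key))
```

Under these dependencies the search yields two candidate keys, {StudNo, CourseNo, Major} and {StudNo, CourseNo, Advisor}. Every attribute that depends on only part of such a key points to a 2NF violation.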
Examples
1NF: Remove repeating groups
 Student (StudNo, StudName)
 StudMajor (StudNo, Major, Advisor)
 StudCourse (StudNo, Major, CourseNo, Ctitle, InstrucName,
InstrucLocn, Grade)
2NF: Remove partial key dependencies
 Student (StudNo, StudName)
 StudMajor (StudNo, Major, Advisor)
 StudCourse (StudNo, Major, CourseNo, Grade)
 Course (CourseNo, Ctitle, InstrucName, InstrucLocn)
Example: Company Database
 The COMPANY database keeps track of a company's employees, departments,
and projects. Suppose that after the requirements collection and analysis phase,
the database designers provided the following description of the part of the
company to be represented in the database:
 The company is organized into departments. Each department has a unique
name, a unique number, and a particular employee who manages the department.
We keep track of the start date when that employee began managing the
department. A department may have several locations.
 A department controls a number of projects, each of which has a unique name, a
unique number, and a single location
 We store each employee's name, social security number, address, salary, sex, and
birth date. An employee is assigned to one department but may work on several
projects, which are not necessarily controlled by the same department. We keep
track of the number of hours per week that an employee works on each project.
We also keep track of the direct supervisor of each employee
 We want to keep track of the dependents of each employee for insurance
purposes. We keep each dependent's first name, sex, birth date, and relationship to
the employee.
ER Schema of the Company Database [figure]
Relational Schemas of the Company Database [figure]
Reading Assignments
1. Discuss the correspondences between the ER model constructs and the
relational model constructs. Show how each ER model construct can be
mapped to the relational model, and discuss any alternative mappings
2. Discuss the options for mapping EER model constructs to relations.
3. Why should nulls in a relation be avoided as far as possible?
4. What does the term spurious tuples refer to? Discuss the problem of spurious
tuples and how we may prevent it
5. Discuss insertion, deletion, and modification anomalies. Why are they
considered bad? Illustrate with examples.
6. What does the term unnormalized relation refer to?
7. What undesirable dependencies are avoided when a relation is in 2NF?
8. What undesirable dependencies are avoided when a relation is in 3NF?
9. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it
considered a stronger form of 3NF?
End of Chapter Four
Any questions?

Chapter Four Logical Database Design (Normalization).pptx


Editor's Notes

  • #12 Normalization splits database information across multiple tables. To retrieve complete information from a normalized database, the JOIN operation must be used. JOIN tends to be expensive in terms of processing time, and very large joins are very expensive.
  • #18 A transitive dependency in a relation indicates that the relation mixes attributes of more than one entity.