Normalization

Normalization For People Who Hate Normalization
Roger Carlson www.RogersAccessLibrary.com

About Me
 Roger Carlson started Roger's Access Library as a place to store
knowledge in all forms related to Access. It has grown into one of the
most popular sites on the web with an estimated 2 million downloads.
Roger's website (www.rogersaccesslibrary.com) and blog
(http://rogersaccessblog.blogspot.com) have been visited by nearly a
million visitors from 170 countries.
 Roger graduated from Western Michigan University with a BS in
Computer Science and taught database design and implementation
at Muskegon Community College for 12 years.
 Roger currently works at Spectrum Health, the largest hospital
system in out-state Michigan, as a Senior BI Analyst.

What’s the Big Deal about Normalization?
 What is normalization? Normalization is a
methodology for removing redundant data
from a database WITHOUT losing
information.
 So who cares? Why is redundant data bad?

Flat Files and Spreadsheets and Databases. Oh My!
 In a spreadsheet, it's acceptable to represent the data like
this
 One way to correct this would be to fill in the missing
information.

Repeated Columns
One way to solve the redundant data problem is with Repeated
Columns. This is a common solution in spreadsheets. With
repeated columns, the redundant information is stored in
additional columns of the same row.

The Problem of Repeated Columns
 How many repeated columns should I create?
 The structure becomes untenable to maintain (job desc, pay
grade, pay range, status, etc.).
 Adding new fields to the structure requires changes to all
queries, forms, reports, etc.
 Information is difficult to query.
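The querying difficulty can be sketched in a few lines. This is a minimal illustration in Python using the standard-library sqlite3 module rather than Access; the EmployeeFlat table, its JobTitle1 to JobTitle3 columns, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical flat table that stores job titles as repeated columns.
conn.execute("""
    CREATE TABLE EmployeeFlat (
        SSN       TEXT PRIMARY KEY,
        LastName  TEXT,
        JobTitle1 TEXT,
        JobTitle2 TEXT,
        JobTitle3 TEXT
    )
""")
conn.executemany(
    "INSERT INTO EmployeeFlat VALUES (?, ?, ?, ?, ?)",
    [
        ("111-11-1111", "Adams", "Analyst", "Sr. Analyst", None),
        ("222-22-2222", "Baker", "QA Administrator", None, None),
        ("333-33-3333", "Clark", "Clerk", "Analyst", "Sr. Analyst"),
    ],
)

# Every repeated column has to be tested separately. A fourth job would
# need a new column AND a rewrite of this query.
rows = conn.execute("""
    SELECT LastName
    FROM EmployeeFlat
    WHERE JobTitle1 = 'Analyst'
       OR JobTitle2 = 'Analyst'
       OR JobTitle3 = 'Analyst'
    ORDER BY LastName
""").fetchall()
analysts = [row[0] for row in rows]
print(analysts)
```

The WHERE clause grows with every repeated column, which is the maintenance cost the list above describes.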

Normalization
 The solution is to break the table into multiple
tables that preserve data integrity without using
repeated columns.
 And then relate the tables on one or more fields.
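As a rough sketch of that idea, the example below splits employee data and job history into two related tables. It uses Python's standard-library sqlite3 module instead of Access, and the table layout, column names, and sample rows are assumptions made for illustration (a natural key, SSN, relates the tables at this stage).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        SSN      TEXT PRIMARY KEY,
        LastName TEXT NOT NULL
    );
    -- One row per job held, related to Employee on SSN.
    CREATE TABLE JobHistory (
        SSN           TEXT NOT NULL REFERENCES Employee (SSN),
        JobTitle      TEXT NOT NULL,
        PromotionDate TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("111-11-1111", "Adams"), ("222-22-2222", "Baker")],
)
conn.executemany(
    "INSERT INTO JobHistory VALUES (?, ?, ?)",
    [
        ("111-11-1111", "Analyst", "2001-03-01"),
        ("111-11-1111", "Sr. Analyst", "2004-06-15"),
        ("222-22-2222", "QA Administrator", "2003-01-10"),
    ],
)

# Employee details are stored once; any number of jobs can be added
# as rows without changing the table structure.
history = conn.execute("""
    SELECT e.LastName, j.JobTitle
    FROM Employee AS e
    JOIN JobHistory AS j ON j.SSN = e.SSN
    ORDER BY e.LastName, j.PromotionDate
""").fetchall()
employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(history, employee_count)
```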

Decomposition Method vs. 12-Step Method
 Decomposition:
 Using the formal rules of normalization (Normal
Forms) to break non-normalized tables into
smaller normalized tables.
 12-Step Method:
 Starts with the business rules and builds the
database into properly normalized tables

The 12-Step Program
 Many developers are addicted to tables
designed as spreadsheets
 We call this "committing spreadsheet"
 The following is the 12-Step Program to
Better Databases

Additional Reading
 Database Design for Mere Mortals: A Hands-On Guide to
Relational Database Design, by Michael J. Hernandez
(Addison-Wesley)
 CASE*Method: Entity Relationship Modelling, by Richard
Barker (Addison-Wesley)

Step 1: Create a Narrative
 Create a narrative that accurately and in
some detail describes the business
 Collect input screens or paper forms
 Collect reports and other output
 Talk to managers
 Talk to end users
 Make the narrative as complete as possible.

Employee Database
 Narrative
ZYX Laboratories requires an employee tracking database.
They want to track information about employees, the employee's job
history, and their certifications. Employee information includes first
name, middle initial, last name, social security number, address, city,
state, zip, home phone, cell phone, email address.
Job history would include job title, job description, pay grade, pay
range, salary, and date of promotion.
For certifications, they want certification type and date achieved. An
employee can have multiple jobs over time (e.g., Analyst, Sr. Analyst,
QA Administrator). Employees can also earn certifications necessary
for their job.

Step 2: Underline the Nouns
ZYX Laboratories requires an employee tracking database.
They want to track information about employees, the
employee's job history, and their certifications. Employee
information includes first name, middle initial, last name, social
security number, address, city, state, zip, home phone, cell
phone, email address.
Job history would include job title, job description, pay grade,
pay range, salary, and date of promotion.
For certifications, they want certification type and date achieved.
An employee can have multiple jobs over time (e.g., Analyst, Sr.
Analyst, QA Administrator). Employees can also earn
certifications necessary for their job.

Entities and Attributes
 All of these nouns must be represented in the
database -- some as Entities and some as
Attributes.
 An Entity is a "thing" about which we store
information. (Table)
 An Attribute is the information that is being
stored. (Field)

Step 3: Create Noun List
 Make a list of all the nouns.
 Try to determine which are duplicates or are not
pertinent.
 This will be your Preliminary Noun List
Employee            First Name        Middle
Last Name           Address           City
State               Zip               SS#
Phone               Cell              Email
Job History         Job Title         Job Description
Promotion Date      Pay Range         Pay Grade
Salary              Certifications    Certification Type
Certification Date

Step 4: Flag the Entities
 Flag the nouns that are "subjects".
 This will be your Entity List
Employee *          First Name        Middle
Last Name           Address           City
State               Zip               SS#
Phone               Cell              Email
Job History *       Job Title         Job Description
Promotion Date      Pay Range         Pay Grade
Salary              Certifications *  Certification Type
Certification Date

Step 5: Group Attributes with Entities
 Place all the Entities across the top of a sheet
of paper and write the unflagged nouns in the
Preliminary Noun List below the appropriate
Entity. Check them off the list as you do.
 Do all of the nouns belong to an Entity in the
list?
 If not, you missed a subject so you should add
it or assign it to "Unassigned" for later
consideration.

Step 6: Revise Entity List
 Go through the Entity list with the customer, if possible,
to see if there is any data that you should be storing about
each entity but are not.
 If so, add it to the attribute list.

Step 7: Add Primary Keys
 A primary key is a field or set of fields that uniquely
identifies a record. At this point, use natural keys only.

Step 8: Evaluate Entities
 Each Entity:
 represents a single subject
 has a primary key
 DOES NOT contain unnecessary duplicate
attributes. (repeated columns)

Step 9: Evaluate Attributes
 Each Attribute:
 is a characteristic of the Entity
 contains only a single value
 CANNOT be deconstructed into smaller
components.
 DOES NOT contain a calculated or
concatenated value.
 is unique within the entire database structure.
 DOES NOT have attributes of its own.
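One way to read the "no calculated or concatenated value" rule: store the atomic parts and derive the combined value only when it is needed. The sketch below uses Python's standard-library sqlite3 module rather than Access; the column names and the sample row are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each attribute holds a single value that cannot be broken down further.
conn.execute("""
    CREATE TABLE Employee (
        FirstName     TEXT,
        MiddleInitial TEXT,
        LastName      TEXT
    )
""")
conn.execute("INSERT INTO Employee VALUES ('Ann', 'B', 'Adams')")

# The full name is concatenated in the query, never stored as a field.
full_names = [
    row[0]
    for row in conn.execute(
        "SELECT FirstName || ' ' || MiddleInitial || '. ' || LastName "
        "FROM Employee"
    )
]
print(full_names)
```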

Step 10: Determine Relationships
 Relationship Types
Many-to-Many: Common in real life,
but cannot be represented directly in a
relational database.
One-to-Many: The most common
relationship in a database.
One-to-One: Seldom used.

Employee-JobHistory
 Each Employee can have One or More Job
History instances
And
 Each Job History instance can be for One and
Only One Employee

Job-Job History
 Each Job can have One or More Job History
instances
And
 Each Job History instance can be for One and
Only One Job.

Employee-Job
 Each Employee can hold One or More Jobs
And
 Each Job can be held by One or More
Employees

Employee-Certifications
 Each Employee can attain One or More
Certifications
And
 Each Certification can be earned by One or
More Employees

Job-Certifications
 Each Job requires One or More Certifications
But
 Each Certification is for One and Only One
Job

Relationships Between Entities

Step 11: Resolve Many-Many Relationships
 To rationalize a many-to-many relationship between
two tables, you create a new entity -- an "intersection"
or "linking" entity. Then you create one-to-many
relationships between the linking entity and each of the
main entities, with the "many-side" of both relationships
on the linking entity.
 The Employee/Certification entity represents one
certification earned by one particular employee, which can
happen at only one time. Now I can see where to put my
unassigned CertificationDate attribute.
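The linking entity can be sketched as follows. This is an illustration in Python's standard-library sqlite3 module, not Access; the table name EmployeeCertification, the natural keys used here, and the sample certification types are assumptions. The point is only that CertificationDate lives on the linking entity, which sits on the "many" side of both relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        SSN      TEXT PRIMARY KEY,
        LastName TEXT NOT NULL
    );
    CREATE TABLE Certification (
        CertificationType TEXT PRIMARY KEY
    );
    -- Linking entity: the "many" side of both one-to-many relationships.
    CREATE TABLE EmployeeCertification (
        SSN               TEXT NOT NULL REFERENCES Employee (SSN),
        CertificationType TEXT NOT NULL
            REFERENCES Certification (CertificationType),
        CertificationDate TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("111-11-1111", "Adams"), ("222-22-2222", "Baker")],
)
conn.executemany(
    "INSERT INTO Certification VALUES (?)",
    [("Lab Safety",), ("Quality Audit",)],
)
conn.executemany(
    "INSERT INTO EmployeeCertification VALUES (?, ?, ?)",
    [
        ("111-11-1111", "Lab Safety", "2002-05-01"),
        ("111-11-1111", "Quality Audit", "2006-09-12"),
        ("222-22-2222", "Lab Safety", "2003-02-20"),
    ],
)

# One employee holds many certifications, and one certification is
# held by many employees, using only one-to-many relationships.
earned = conn.execute("""
    SELECT e.LastName, ec.CertificationType, ec.CertificationDate
    FROM EmployeeCertification AS ec
    JOIN Employee AS e ON e.SSN = ec.SSN
    ORDER BY e.LastName, ec.CertificationDate
""").fetchall()
print(earned)
```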

Real Entities vs. Pseudo Entities
 Real Entity to Resolve M:M
 Pseudo Entity to Resolve M:M

Step 12: Implementing the E-R Diagram
 So far, I've talked about Entities and Attributes to keep
myself from thinking about implementation issues
during the modeling phase.
 But at the implementation phase, entities become
tables and attributes become fields.

Add Surrogate Keys:
 Add an Autonumber, Primary Key field (Surrogate Key)
(tablename+"id")
 EmployeeID
 JobID
 Create a Unique Index on the Natural Key
 SS#
 Job Title
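A minimal sketch of a surrogate key plus a unique index on the natural key, written with Python's standard-library sqlite3 module as a stand-in for Access. SQLite's INTEGER PRIMARY KEY AUTOINCREMENT plays the role of an Autonumber field here; the table name tblEmployee, the index name idxEmployeeSSN, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEmployee (
        EmployeeID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        SSN        TEXT NOT NULL,                      -- natural key
        LastName   TEXT NOT NULL
    );
    -- The natural key still has to be unique.
    CREATE UNIQUE INDEX idxEmployeeSSN ON tblEmployee (SSN);
""")
insert = "INSERT INTO tblEmployee (SSN, LastName) VALUES (?, ?)"
conn.execute(insert, ("111-11-1111", "Adams"))
conn.execute(insert, ("222-22-2222", "Baker"))

# A second record with an existing SSN violates the unique index.
try:
    conn.execute(insert, ("111-11-1111", "Clark"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

ids = [
    row[0]
    for row in conn.execute(
        "SELECT EmployeeID FROM tblEmployee ORDER BY EmployeeID"
    )
]
print(ids, duplicate_rejected)
```

Without the unique index, the surrogate key alone would happily accept the same person twice under two different EmployeeID values.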

Tables with Surrogate Keys Added

Add Foreign Keys
 Now it's time to look at my relationships.
Relationships are created on fields holding
related information, Primary Key to Foreign
Key.
 In a One-to-Many (1:M) relationship, the
primary key of the table on the "One" side is
added to the table on the "Many" side
and becomes the foreign key.
 EmployeeID -> tblJobHistory
 JobID -> tblJobHistory
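The foreign keys above can be sketched like this, using Python's standard-library sqlite3 module in place of Access. The names tblEmployee and tblJob, the JobHistoryID column, and the sample rows are assumptions; only tblJobHistory, EmployeeID, and JobID come from the slides. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
conn.executescript("""
    CREATE TABLE tblEmployee (
        EmployeeID INTEGER PRIMARY KEY,
        LastName   TEXT NOT NULL
    );
    CREATE TABLE tblJob (
        JobID    INTEGER PRIMARY KEY,
        JobTitle TEXT NOT NULL
    );
    -- "Many" side: carries the primary keys of both "One" side tables.
    CREATE TABLE tblJobHistory (
        JobHistoryID  INTEGER PRIMARY KEY,
        EmployeeID    INTEGER NOT NULL REFERENCES tblEmployee (EmployeeID),
        JobID         INTEGER NOT NULL REFERENCES tblJob (JobID),
        PromotionDate TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO tblEmployee VALUES (1, 'Adams')")
conn.execute("INSERT INTO tblJob VALUES (1, 'Analyst')")

insert = (
    "INSERT INTO tblJobHistory (EmployeeID, JobID, PromotionDate) "
    "VALUES (?, ?, ?)"
)
conn.execute(insert, (1, 1, "2001-03-01"))

# EmployeeID 99 does not exist, so the relationship rejects the row.
try:
    conn.execute(insert, (99, 1, "2002-01-01"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

history_count = conn.execute(
    "SELECT COUNT(*) FROM tblJobHistory"
).fetchone()[0]
print(orphan_rejected, history_count)
```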

Create Unique Indexes
Table                     Unique Index
Employee                  FirstName/LastName/MI
Job                       JobTitle
JobHistory                EmployeeID/JobID/PromotionDate
Certifications            JobID/CertificationType
Employee/Certifications   EmployeeID/CertificationID/CertificationDate
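A multi-field unique index such as the JobHistory entry can be sketched as below, again in Python's standard-library sqlite3 module rather than Access. The index name idxJobHistoryNatural, the JobHistoryID column, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblJobHistory (
        JobHistoryID  INTEGER PRIMARY KEY,
        EmployeeID    INTEGER NOT NULL,
        JobID         INTEGER NOT NULL,
        PromotionDate TEXT NOT NULL
    );
    -- Uniqueness applies to the combination, not to each field alone.
    CREATE UNIQUE INDEX idxJobHistoryNatural
        ON tblJobHistory (EmployeeID, JobID, PromotionDate);
""")
insert = (
    "INSERT INTO tblJobHistory (EmployeeID, JobID, PromotionDate) "
    "VALUES (?, ?, ?)"
)
conn.execute(insert, (1, 1, "2001-03-01"))
conn.execute(insert, (1, 1, "2005-09-01"))  # same job, later date: allowed

# An exact repeat of all three fields is rejected.
try:
    conn.execute(insert, (1, 1, "2001-03-01"))
    exact_duplicate_rejected = False
except sqlite3.IntegrityError:
    exact_duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM tblJobHistory").fetchone()[0]
print(exact_duplicate_rejected, row_count)
```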

Completed Data Model
Unique Indexes

Questions
 Roger's Access Blog
