Normalization
For People Who
Hate
Normalization
Roger Carlson www.RogersAccessLibrary.com
About Me
 Roger Carlson started Roger's Access Library as a place to store
knowledge in all forms related to Access. It has grown into one of the
most popular sites on the web, with an estimated 2 million downloads.
Roger's website (www.rogersaccesslibrary.com) and blog
(http://rogersaccessblog.blogspot.com) have been visited by nearly a
million visitors from 170 countries.
 Roger graduated from Western Michigan University with a BS in
Computer Science and taught database design and implementation
at Muskegon Community College for 12 years.
 Roger currently works at Spectrum Health, the largest hospital
system in out-state Michigan, as a Senior BI Analyst.
What’s the Big Deal about
Normalization?
 What is normalization? Normalization is a
methodology for removing redundant data
from a database WITHOUT losing
information.
 So who cares? Why is redundant data bad?
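One way to see why redundant data is bad is to watch it go wrong. The following is a minimal sketch using Python's built-in sqlite3 module; the table name FlatEmployee and the SSN values are made up for illustration, and only the employee names come from the speaker notes.

```python
import sqlite3

# Hypothetical flat table: the same person's SSN is repeated on every row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FlatEmployee (FirstName TEXT, LastName TEXT, SSN TEXT, JobTitle TEXT)"
)
conn.executemany(
    "INSERT INTO FlatEmployee VALUES (?, ?, ?, ?)",
    [
        ("Gina", "Fawn", "111-11-1111", "Analyst"),
        ("Gina", "Fawn", "111-11-1111", "Sr. Analyst"),
        ("Steve", "Smith", "222-22-2222", "Analyst"),
    ],
)

# An accidental update touches only one of Gina's two rows.
conn.execute(
    "UPDATE FlatEmployee SET SSN = '222-22-2222' "
    "WHERE FirstName = 'Gina' AND JobTitle = 'Analyst'"
)

# Gina now has two different SSNs, and one of them also belongs to Steve.
gina_ssns = [row[0] for row in conn.execute(
    "SELECT DISTINCT SSN FROM FlatEmployee WHERE FirstName = 'Gina' ORDER BY SSN"
)]
owners = [row[0] for row in conn.execute(
    "SELECT DISTINCT FirstName FROM FlatEmployee "
    "WHERE SSN = '222-22-2222' ORDER BY FirstName"
)]
print(gina_ssns)
print(owners)
```

Nothing in the table itself can say which value is correct, which is the kind of data anomaly that normalization is meant to prevent.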
Flat Files and Spreadsheets and
Databases. Oh My!
 In a spreadsheet, it's acceptable to represent the data like
this
 One way to correct this would be to fill in the missing
information.
Repeated Columns
One way to solve the redundant data problem is with Repeated
Columns. This is a common solution in spreadsheets. With
repeated columns, the redundant information is stored as
additional columns.
The Problem of Repeated Columns
 How many repeated columns should I create?
 Structure becomes untenable to maintain (job desc, pay
grade, pay range, status, etc.).
 Adding new fields requires changes to all queries,
forms, reports, etc.
 Difficult to query information.
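The querying problem can be sketched as follows. This is an illustration only: the table EmployeeWide, its three date/salary column pairs, and the figures are assumptions, not the layout shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical repeated-column layout: one pair of columns per salary change.
conn.execute(
    "CREATE TABLE EmployeeWide ("
    "Name TEXT, Date1 TEXT, Salary1 REAL, "
    "Date2 TEXT, Salary2 REAL, Date3 TEXT, Salary3 REAL)"
)
conn.executemany(
    "INSERT INTO EmployeeWide VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Gina", "2001-01-01", 40000, "2003-01-01", 45000, None, None),
        ("Tony", "2002-06-01", 38000, None, None, None, None),
    ],
)

# Finding each person's latest salary means naming every column pair by hand.
# A fourth raise would need a new column pair AND a change to this query.
latest = dict(conn.execute(
    "SELECT Name, COALESCE(Salary3, Salary2, Salary1) FROM EmployeeWide"
))
print(latest)
```

Every query, form, and report that touches salaries would need the same column-by-column treatment.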
Normalization
 The solution is to break the table into multiple
tables that preserve data integrity without using
repeated columns.
 And then relate the tables on one or more fields.
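As a rough sketch of the idea, the flat data can be split into two related tables. The table and field names below are assumptions based on the employee example, and the SSN and dates are invented; the natural key (SSN) is used to relate the tables at this stage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (SSN TEXT PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE JobHistory (
    SSN TEXT REFERENCES Employee(SSN),
    JobTitle TEXT,
    PromotionDate TEXT
);
INSERT INTO Employee VALUES ('111-11-1111', 'Gina', 'Fawn');
INSERT INTO JobHistory VALUES ('111-11-1111', 'Analyst', '2001-01-01');
INSERT INTO JobHistory VALUES ('111-11-1111', 'Sr. Analyst', '2003-01-01');
""")

# Employee details are stored once; the join brings them back for every job row.
rows = conn.execute(
    "SELECT e.FirstName, e.LastName, j.JobTitle "
    "FROM Employee AS e JOIN JobHistory AS j ON j.SSN = e.SSN "
    "ORDER BY j.PromotionDate"
).fetchall()
employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(rows)
print(employee_count)
```

No information is lost: the join reproduces the flat view, but the name and SSN now live in exactly one row.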
Decomposition Method vs.
12-Step Method
 Decomposition:
 Using the formal rules of normalization (Normal
Forms) to break non-normalized tables into
smaller normalized tables.
 12-Step Method:
 Starts with the business rules and builds the
database into properly normalized tables
The 12-Step Program
 Many developers are addicted to tables
designed as spreadsheets
 We call this "committing spreadsheet"
 The following is the 12-Step Program to
Better Databases
Additional Reading
 Database Design for Mere Mortals: A
Hands-On Guide to Relational
Database Design
by Michael J. Hernandez (Addison-Wesley)
 CASE*Method Entity Relationship
Modelling
by Richard Barker (Addison-Wesley)
Step 1: Create a Narrative
 Create a narrative that accurately and in
some detail describes the business
 Collect input screens or paper forms
 Collect reports and other output
 Talk to managers
 Talk to end users
 Make the narrative as complete as possible.
Employee Database
 Narrative
ZYX Laboratories requires an employee tracking database.
They want to track information about employees, the employee's job
history, and their certifications. Employee information includes first
name, middle initial, last name, social security number, address, city,
state, zip, home phone, cell phone, email address.
Job history would include job title, job description, pay grade, pay
range, salary, and date of promotion.
For certifications, they want certification type and date achieved. An
employee can have multiple jobs over time (e.g., Analyst, Sr. Analyst,
QA Administrator). Employees can also earn certifications necessary
for their job.
Step 2: Underline the Nouns
ZYX Laboratories requires an employee tracking database.
They want to track information about employees, the
employee's job history, and their certifications. Employee
information includes first name, middle initial, last name, social
security number, address, city, state, zip, home phone, cell
phone, email address.
Job history would include job title, job description, pay grade,
pay range, salary, and date of promotion.
For certifications, they want certification type and date achieved.
An employee can have multiple jobs over time (e.g., Analyst, Sr.
Analyst, QA Administrator). Employees can also earn
certifications necessary for their job.
Entities and Attributes
 All of these nouns must be represented in the
database -- some as Entities and some as
Attributes.
 An Entity is a "thing" about which we store
information. (Table)
 An Attribute is the information that is being
stored. (Field)
Step 3: Create Noun List
 Make a list of all the nouns.
 Try to determine which are duplicates or are not
pertinent.
 This will be your Preliminary Noun List
Employee First Name Middle
Last Name Address City
State Zip SS#
Phone Cell Email
Job History Job Title Job Description
Promotion Date Pay Range Pay Grade
Salary Certifications Certification Type
Certification Date
Step 4: Flag the Entities
 Flag the nouns that are "subjects".
 This will be your Entity List
Employee * First Name Middle
Last Name Address City
State Zip SS#
Phone Cell Email
Job History * Job Title Job Description
Promotion Date Pay Range Pay Grade
Salary Certifications * Certification Type
Certification Date
Step 5: Group Attributes with
Entities
 Place all the Entities across the top of a sheet
of paper and write the unflagged nouns in the
Preliminary Noun List below the appropriate
Entity. Check them off the list as you do.
 Do all of the nouns belong to an Entity in the
list?
 If not, you missed a subject so you should add
it or assign it to "Unassigned" for later
consideration.
 Preliminary Groupings
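The Step 5 bookkeeping can be sketched in a few lines of Python. The noun list comes from Step 3; which noun sits under which entity is an assumed first pass, not necessarily the grouping on the slide. Certification Date is deliberately left out of every group so that it shows up as unassigned.

```python
# Unflagged nouns from the Preliminary Noun List (Step 3).
nouns = [
    "First Name", "Middle", "Last Name", "Address", "City", "State", "Zip",
    "SS#", "Phone", "Cell", "Email",
    "Job Title", "Job Description", "Promotion Date", "Pay Range", "Pay Grade",
    "Salary", "Certification Type", "Certification Date",
]

# Assumed first-pass grouping under the entities flagged in Step 4.
groupings = {
    "Employee": ["First Name", "Middle", "Last Name", "Address", "City",
                 "State", "Zip", "SS#", "Phone", "Cell", "Email"],
    "Job History": ["Job Title", "Job Description", "Promotion Date",
                    "Pay Range", "Pay Grade", "Salary"],
    "Certifications": ["Certification Type"],
}

# "Check them off the list": anything not placed is set aside for later.
assigned = {noun for attrs in groupings.values() for noun in attrs}
unassigned = [noun for noun in nouns if noun not in assigned]
print(unassigned)
```

An empty unassigned list means every noun found a home; anything left over signals a missing entity or a decision to revisit later.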
Step 6: Revise Entity List
 Go through the Entity list with the customer, if
possible, to see if there is any data that you should be
storing about that entity that you are not.
 If so, add it to the attribute list.
Step 7: Add Primary Keys
 A primary key is a field or fields which uniquely identify
a record. At this point, natural keys only.
Step 8: Evaluate Entities
 Each Entity:
 represents a single subject
 has a primary key
 DOES NOT contain unnecessary duplicate
attributes. (repeated columns)
Amended Grid
Step 9: Evaluate Attributes
 Each Attribute:
 is a characteristic of the Entity
 contains only a single value
 CANNOT be deconstructed into smaller
components.
 DOES NOT contain a calculated or
concatenated value.
 is unique within the entire database structure.
 DOES NOT have attributes of its own.
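The rule against calculated or concatenated values can be illustrated like this. It is a sketch under assumptions: the field name MiddleInitial and the sample value 'M' are hypothetical. The name parts are stored as separate single-value attributes, and the full name is built only when it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attribute holds one value that cannot be broken down further.
conn.execute(
    "CREATE TABLE Employee (FirstName TEXT, MiddleInitial TEXT, LastName TEXT)"
)
conn.execute("INSERT INTO Employee VALUES ('Gina', 'M', 'Fawn')")

# The concatenated value is derived in a query, never stored in the table.
full_name = conn.execute(
    "SELECT FirstName || ' ' || MiddleInitial || '. ' || LastName FROM Employee"
).fetchone()[0]
print(full_name)
```

Because the full name is never stored, a change to the last name cannot leave a stale copy behind.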
Step 10: Determine Relationships
 Relationship Types
Many-to-Many: Common in real life,
but cannot be represented directly in a
relational database.
One-to-Many: The most common
relationship in a database.
One-to-One: Seldom used.
Employee-JobHistory
 Each Employee can have One or More Job
History instances
And
 Each Job History instance can be for One and
Only One Employee
Job-Job History
 Each Job can have One or More Job History
instances
And
 Each Job History instance can be for One and
Only One Job.
Employee-Job
 Each Employee can hold One or More Jobs
And
 Each Job can be held by One or More
Employees
Employee-Certifications
 Each Employee can attain One or More
Certifications
And
 Each Certification can be earned by One or
More Employees
Job-Certifications
 Each Job requires One or More Certifications
But
 Each Certification is for One and Only One
Job
Relationships Between Entities
Step 11: Resolve Many-Many
Relationships
 To resolve a many-to-many relationship between
two entities, you create a new entity -- an "intersection"
or "linking" entity. Then you create one-to-many
relationships between the linking entity and each of the
main entities, with the "many-side" of both relationships
on the linking entity.
 The Employee/Certification entity represents a
certification earned by a particular employee, and each such
instance is awarded at only one time. Now I can see where to put my
unassigned CertificationDate attribute.
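A linking entity of this kind might be sketched as below. The table name EmployeeCertification, the integer ID fields (used here only for brevity, ahead of Step 12), and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Certification (CertificationID INTEGER PRIMARY KEY, CertificationType TEXT);
-- Linking entity: the "many" side of both one-to-many relationships.
CREATE TABLE EmployeeCertification (
    EmployeeID INTEGER REFERENCES Employee(EmployeeID),
    CertificationID INTEGER REFERENCES Certification(CertificationID),
    CertificationDate TEXT
);
INSERT INTO Employee VALUES (1, 'Fawn');
INSERT INTO Employee VALUES (2, 'Smith');
INSERT INTO Certification VALUES (10, 'Lab Safety');
INSERT INTO Certification VALUES (20, 'QA Auditor');
INSERT INTO EmployeeCertification VALUES (1, 10, '2004-05-01');
INSERT INTO EmployeeCertification VALUES (1, 20, '2005-02-15');
INSERT INTO EmployeeCertification VALUES (2, 10, '2004-06-30');
""")

# One employee, many certifications.
certs_for_fawn = [row[0] for row in conn.execute(
    "SELECT c.CertificationType FROM EmployeeCertification AS ec "
    "JOIN Certification AS c ON c.CertificationID = ec.CertificationID "
    "WHERE ec.EmployeeID = 1 ORDER BY ec.CertificationDate"
)]

# One certification, many employees.
holders = [row[0] for row in conn.execute(
    "SELECT e.LastName FROM EmployeeCertification AS ec "
    "JOIN Employee AS e ON e.EmployeeID = ec.EmployeeID "
    "WHERE ec.CertificationID = 10 ORDER BY e.LastName"
)]
print(certs_for_fawn)
print(holders)
```

CertificationDate belongs on the linking table because it describes the pairing of one employee with one certification, not either entity alone.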
Real Entities vs. Pseudo Entities
 Real Entity to Resolve M:M
 Pseudo Entity to Resolve M:M
Final Attribute Grid
Final E-R Diagram
Step 12: Implementing the E-R
Diagram
 So far, I've talked about Entities and Attributes to keep
myself from thinking about implementation issues
during the modeling phase.
 But at the implementation phase, entities become
tables and attributes become fields.
Add Surrogate Keys:
 Add an Autonumber, Primary Key field (Surrogate Key)
(tablename+"id")
 EmployeeID
 JobID
 Create a Unique Index on the Natural Key
 SS#
 Job Title
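In SQLite terms, a surrogate key plus a unique index on the natural key might look like the sketch below. The deck targets Access, so INTEGER PRIMARY KEY AUTOINCREMENT is standing in for an Autonumber field here, and the names tblJob and idxJobTitle are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblJob (
    JobID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    JobTitle TEXT NOT NULL                    -- natural key
);
CREATE UNIQUE INDEX idxJobTitle ON tblJob (JobTitle);
""")

# The database assigns the surrogate key values.
first_id = conn.execute(
    "INSERT INTO tblJob (JobTitle) VALUES ('Analyst')"
).lastrowid
second_id = conn.execute(
    "INSERT INTO tblJob (JobTitle) VALUES ('Sr. Analyst')"
).lastrowid

# The unique index still protects the natural key from duplicates.
try:
    conn.execute("INSERT INTO tblJob (JobTitle) VALUES ('Analyst')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(first_id, second_id, duplicate_rejected)
```

Without the unique index, the surrogate key alone would happily accept the same job title twice under two different JobID values.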
Tables with Surrogate Keys Added
Add Foreign Keys
 Now it's time to look at my relationships.
Relationships are created on fields holding
related information, Primary Key to Foreign
Key.
 In a One-to-Many (1:M) relationship, the
primary key of the table on the "One" side is
added to the table on the "Many" side
and becomes the foreign key.
 EmployeeID → tblJobHistory
 JobID → tblJobHistory
tblEmployee -- tblJobHistory
Enforce Referential Integrity
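Foreign keys and referential integrity can be sketched in SQLite as follows. This is not the Access relationship window the slides show: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and the JobHistoryID field, tblJob, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE tblEmployee (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE tblJob (JobID INTEGER PRIMARY KEY, JobTitle TEXT);
CREATE TABLE tblJobHistory (
    JobHistoryID INTEGER PRIMARY KEY,
    EmployeeID INTEGER NOT NULL REFERENCES tblEmployee(EmployeeID),
    JobID INTEGER NOT NULL REFERENCES tblJob(JobID),
    PromotionDate TEXT
);
INSERT INTO tblEmployee VALUES (1, 'Fawn');
INSERT INTO tblJob VALUES (1, 'Analyst');
INSERT INTO tblJobHistory (EmployeeID, JobID, PromotionDate)
    VALUES (1, 1, '2001-01-01');
""")

# A job history row for an employee that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO tblJobHistory (EmployeeID, JobID, PromotionDate) "
        "VALUES (99, 1, '2002-01-01')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

history_count = conn.execute("SELECT COUNT(*) FROM tblJobHistory").fetchone()[0]
print(orphan_rejected, history_count)
```

With enforcement on, the "Many" side can never point at a row that is missing from the "One" side.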
Create Unique Indexes
Table                    Unique Index
Employee                 FirstName/LastName/MI
Job                      JobTitle
JobHistory               EmployeeID/JobID/PromotionDate
Certifications           JobID/CertificationType
Employee/Certifications  EmployeeID/CertificationID/CertificationDate
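A multi-field unique index such as the JobHistory one can be sketched like this. The index name idxJobHistoryNatural, the JobHistoryID field, and the sample values are assumptions; the three indexed fields are the ones listed for JobHistory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblJobHistory (
    JobHistoryID INTEGER PRIMARY KEY,
    EmployeeID INTEGER NOT NULL,
    JobID INTEGER NOT NULL,
    PromotionDate TEXT NOT NULL
);
CREATE UNIQUE INDEX idxJobHistoryNatural
    ON tblJobHistory (EmployeeID, JobID, PromotionDate);
""")

insert = (
    "INSERT INTO tblJobHistory (EmployeeID, JobID, PromotionDate) "
    "VALUES (?, ?, ?)"
)
conn.execute(insert, (1, 1, "2001-01-01"))
conn.execute(insert, (1, 1, "2003-01-01"))  # same employee and job, new date: allowed

# Only an exact repeat of all three fields is rejected.
try:
    conn.execute(insert, (1, 1, "2001-01-01"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM tblJobHistory").fetchone()[0]
print(duplicate_rejected, row_count)
```

The index treats the combination as the key, so individual fields may repeat as long as the full combination does not.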
Completed Data Model
Unique Indexes
Questions
 Roger's Access Blog

More Related Content

Similar to Normalization

Please show a screenshot of the data model and database design School.pdf
Please show a screenshot of the data model and database design School.pdfPlease show a screenshot of the data model and database design School.pdf
Please show a screenshot of the data model and database design School.pdfinfo750646
 
Week 3~$ek3.iLab.Directions-1.docxWeek 3BITS.BusinessProce.docx
Week 3~$ek3.iLab.Directions-1.docxWeek 3BITS.BusinessProce.docxWeek 3~$ek3.iLab.Directions-1.docxWeek 3BITS.BusinessProce.docx
Week 3~$ek3.iLab.Directions-1.docxWeek 3BITS.BusinessProce.docxmelbruce90096
 
Design your own database
Design your own databaseDesign your own database
Design your own databaseFrank Katta
 
Database system the final assignment for this course is an eight to
Database system the final assignment for this course is an eight toDatabase system the final assignment for this course is an eight to
Database system the final assignment for this course is an eight tomehek4
 
Building a Data Quality Program from Scratch
Building a Data Quality Program from ScratchBuilding a Data Quality Program from Scratch
Building a Data Quality Program from Scratchdmurph4
 
Unit 3 Qualitative Data
Unit 3 Qualitative DataUnit 3 Qualitative Data
Unit 3 Qualitative DataSherry Bailey
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bijeffd00
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingAnimesh Srivastava
 
Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R NotesLakshmiSarvani6
 
Accelerate Data Discovery
Accelerate Data Discovery   Accelerate Data Discovery
Accelerate Data Discovery Attivio
 
A2 databases
A2 databasesA2 databases
A2 databasesc.west
 

Similar to Normalization (20)

Please show a screenshot of the data model and database design School.pdf
Please show a screenshot of the data model and database design School.pdfPlease show a screenshot of the data model and database design School.pdf
Please show a screenshot of the data model and database design School.pdf
 
DATA MODELING.pptx
DATA MODELING.pptxDATA MODELING.pptx
DATA MODELING.pptx
 
Week 3~$ek3.iLab.Directions-1.docxWeek 3BITS.BusinessProce.docx
Week 3~$ek3.iLab.Directions-1.docxWeek 3BITS.BusinessProce.docxWeek 3~$ek3.iLab.Directions-1.docxWeek 3BITS.BusinessProce.docx
Week 3~$ek3.iLab.Directions-1.docxWeek 3BITS.BusinessProce.docx
 
Design your own database
Design your own databaseDesign your own database
Design your own database
 
Database system the final assignment for this course is an eight to
Database system the final assignment for this course is an eight toDatabase system the final assignment for this course is an eight to
Database system the final assignment for this course is an eight to
 
2
22
2
 
Building a Data Quality Program from Scratch
Building a Data Quality Program from ScratchBuilding a Data Quality Program from Scratch
Building a Data Quality Program from Scratch
 
Unit 3 Qualitative Data
Unit 3 Qualitative DataUnit 3 Qualitative Data
Unit 3 Qualitative Data
 
Business analyst
Business analystBusiness analyst
Business analyst
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bi
 
Datamodelling
DatamodellingDatamodelling
Datamodelling
 
ER MODEL
ER MODELER MODEL
ER MODEL
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Measurement And Validation
Measurement And ValidationMeasurement And Validation
Measurement And Validation
 
Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R Notes
 
Accelerate Data Discovery
Accelerate Data Discovery   Accelerate Data Discovery
Accelerate Data Discovery
 
Dwbi Project
Dwbi ProjectDwbi Project
Dwbi Project
 
SA Chapter 10
SA Chapter 10SA Chapter 10
SA Chapter 10
 
A2 databases
A2 databasesA2 databases
A2 databases
 
PHP/MySQL Programming Class Lecture 03
PHP/MySQL Programming Class Lecture 03PHP/MySQL Programming Class Lecture 03
PHP/MySQL Programming Class Lecture 03
 

Recently uploaded

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Recently uploaded (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Normalization

  • 1. Normalization For People Who Hate Normalization Roger Carlson www.RogersAccessLibrary.com
  • 2. About Me  Roger Carlson started Roger's Access Library as a place to store knowledge in all forms related to Access It has grown one of the most popular sites on the web with an estimated 2 million downloads. Roger's website (www.rogersaccesslibrary.com) and blog (http://rogersaccessblog.blogspot.com) have been visited by nearly a million visitors from 170 countries.  Roger graduated from Western Michigan University with a BS in Computer Science and taught database design and implementation at Muskegon Community College for 12 years.  Roger currently works at Spectrum Health, the largest hospital system in out-state Michigan, as a Senior BI Analyst.
  • 3. What’s the Big Deal about Normalization?  What is normalization? Normalization is a methodology for removing redundant data from a database WITHOUT losing information.  So who cares? Why is redundant data bad?
  • 4. Flat Files and Spreadsheets and Databases. Oh My!  In a spreadsheet, it's acceptable to represent the data like this  One way to correct this, would be to fill in the missing information. 234-94-3894
  • 5. Repeated Columns One way to solve the redundant data problem is with Repeated Columns. This is a common solution in spreadsheets. With repeated columns, the redundant information are stored as columns.  How many repeated columns should I create?  Structure becomes untenable to maintain (job desc, pay grade, pay range, status, etc.  Structure adding new fields requires changes to all queries, forms, reports, etc.  Difficult to query information. The Problem of Repeated Columns
  • 6. Normalization  The solution is to break the table into multiple tables that preserves data integrity without using multiple columns.  And then relate the tables on one or more fields.
  • 7. Decomposition Method vs. 12-Step Method  Decomposition:  Using the formal rules of normalization (Normal Forms) to break non-normalized tables into smaller normalized tables.  12-Step Method:  Starts with the business rules and builds the database into properly normalized tables
  • 8. The 12-Step Program  Many developers are addicted to tables designed as spreadsheets  We call this "committing spreadsheet"  The following is the 12-Step Program to Better Databases
  • 9. Additional Reading  Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design by Michael J Hernandez (Addison- Wesley)  CASE*Method Entity Relationship Modelling by Richard Barker (Addison-Wesley)
  • 10. Step 1: Create a Narrative  Create a narrative that accurately and in some detail describes the business  Collect input screens or paper forms  Collect reports and other output  Talk to managers  Talk to end users  Make the narrative as complete as possible.
  • 11. Employee Database  Narrative ZYX Laboratories requires an employee tracking database. They want to track information about employees, the employee's job history, and their certifications. Employee information includes first name, middle initial, last name, social security number, address, city, state, zip, home phone, cell phone, email address. Job history would include job title, job description, pay grade, pay range, salary, and date of promotion. For certifications, they want certification type and date achieved. An employee can have multiple jobs over time, (ie, Analyst, Sr. Analyst, QA Administrator). Employees can also earn certifications necessary for their job.
  • 12. Step 2: Underline the Nouns ZYX Laboratories requires an employee tracking database. They want to track information about employees, the employee's job history, and their certifications. Employee information includes first name, middle initial, last name, social security number, address, city, state, zip, home phone, cell phone, email address. Job history would include job title, job description, pay grade, pay range, salary, and date of promotion. For certifications, they want certification type and date achieved. An employee can have multiple jobs over time, (ie, Analyst, Sr. Analyst, QA Administrator). Employees can also earn certifications necessary for their job.
  • 13. Entities and Attributes  All of these nouns must be represented in the database -- some as Entities and some as Attributes.  An Entity is a "thing" about which we store information. (Table)  An Attribute is the information that is being stored. (Field)
  • 14. Step 3: Create Noun List  Make a list of all the nouns.  Try to determine which are duplicates or are not pertinent.  This will be your Preliminary Noun List Employee First Name Middle Last Name Address City State Zip SS# Phone Cell Email Job History Job Title Job Description Promotion Date Pay Range Pay Grade Salary Certifications Certification Type Certification Date
  • 15. Step 4: Flag the Entities  Flag the nouns that are "subjects".  This will be your Entity List Employee * First Name Middle Last Name Address City State Zip SS# Phone Cell Email Job History * Job Title Job Description Promotion Date Pay Range Pay Grade Salary Certifications * Certification Type Certification Date
  • 16. Step 5: Group Attributes with Entities  Place all the Entities across the top of a sheet of paper and write the unflagged nouns in the Preliminary Noun List below the appropriate Entity. Check them off the list as you do.  Do all of the nouns belong to an Entity in the list?  If not, you missed a subject so you should add it or assign it to "Unassigned" for later consideration.
  • 18. Step 6: Revise Entity List  Go through the Entity list with the customer if possible  to see if there is any data that you should be storing about that entity that you are not.  If so, add it to the attribute list.
  • 19. Step 7: Add Primary Keys  A primary key is a field or fields which uniquely identify a record. At this point, natural keys only.
  • 20. Step 8: Evaluate Entities  Each Entity:  represents a single subject  has a primary key  DOES NOT contain unnecessary duplicate attributes. (repeated columns)
  • 21.
  • 23. Step 9: Evaluate Attributes  Each Attribute:  is a characteristic of the Entity  contains only a single value  CANNOT be deconstructed into smaller components.  DOES NOT contain a calculated or concatenated value.  is unique within the entire database structure.  DOES NOT have attributes of its own.
  • 24. Step 10: Determine Relationships  Relationship Types Many-to-Many: Common in real life, but cannot be represented in a database. One-to-Many: The most common relationship in a database. One-to-One: Seldom used.
  • 25. Employee-JobHistory  Each Employee can have One or More Job History instance And  Each Job History instance can be for One and Only One Employee
  • 26. Job-Job History  Each Job can have One or More Job History instance And  Each Job History instance can be for One and Only One Job.
  • 27. Employee-Job  Each Employee can hold One or More Jobs And  Each Job can be held by One or More Employees
  • 28. Employee-Certifications  Each Employee can attain One or More Certifications And  Each Certification can be earned by One or More Employees
  • 29. Job-Certifications  Each Job requires One or More Certification But  Each Certification is for One and Only One Job
  • 32. Step 11: Resolve Many-Many Relationships  To rationalize a many-to-many relationship between two tables, you create a entity table -- an "intersection" or "linking" entity. Then you create one-to-many relationships between the linking entity and each of the main entities, with the "many-side" of both relationships on the linking entity.  The Employee/Certification entity represents a certification for a particular employee and that can be given at only one time. Now I can see where to put my unassigned CertificationDate attribute.
  • 33. Real Entities vs. Pseudo Entities  Real Entity to Resolve M:M  Pseudo Entity to Resolve M:M
  • 36. Step 12: Implementing the E-R Diagram  So far, I've talked about Entities and Attributes to keep myself from thinking about implementation issues during the modeling phase.  But at the implementation phase, entities become tables and attributes become fields.
  • 37. Add Surrogate Keys:  Add an Autonumber, Primary Key field (Surrogate Key) (tablename+"id")  EmployeeID  JobID  Create a Unique Index on the Natural Key  SS#  Job Title
  • 38. Tables with Surrogate Keys Added
  • 39. Add Foreign Keys  Now it's time to look at my relationships. Relationships are created on fields holding related information, Primary Key to Foreign Key.  In a One-to-Many (1:M) relationship, the primary key of the table on the "One" side is added to the table on the "Many" side table and becomes the foreign key.  EmployeeID  tblJobHistory  JobID  tblJobHistory
  • 42. Create Unique Indexes
      Table                     Unique Index
      Employee                  FirstName/LastName/MI
      Job                       JobTitle
      JobHistory                EmployeeID/JobID/PromotionDate
      Certifications            JobID/CertificationType
      Employee/Certifications   EmployeeID/CertificationID/CertificationDate
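
Steps 11 and 12 above can be sketched in code. What follows is only a rough illustration, not the implementation from the slides: it assumes SQLite driven from Python's sqlite3 module rather than Access, so INTEGER PRIMARY KEY stands in for an Autonumber field. Apart from EmployeeID, JobID and tblJobHistory, the table, field and index names are my own guesses, and the sample rows are made up. The unique indexes follow the table on slide 42, plus the SS# natural key from slide 37.

```python
import sqlite3

# Hypothetical schema: names other than EmployeeID, JobID and tblJobHistory
# are illustrative guesses, not taken from the slides.
SCHEMA = """
CREATE TABLE tblEmployee (
    EmployeeID INTEGER PRIMARY KEY,  -- surrogate key (stand-in for Autonumber)
    SSN        TEXT NOT NULL,
    FirstName  TEXT NOT NULL,
    LastName   TEXT NOT NULL,
    MI         TEXT NOT NULL DEFAULT ''
);
CREATE UNIQUE INDEX uxEmployeeSSN  ON tblEmployee (SSN);
CREATE UNIQUE INDEX uxEmployeeName ON tblEmployee (FirstName, LastName, MI);

CREATE TABLE tblJob (
    JobID       INTEGER PRIMARY KEY,
    JobTitle    TEXT NOT NULL,
    Description TEXT,
    PayGrade    TEXT,
    PayRange    TEXT
);
CREATE UNIQUE INDEX uxJobTitle ON tblJob (JobTitle);

-- Linking table that resolves the Employee/Job many-to-many.
CREATE TABLE tblJobHistory (
    JobHistoryID  INTEGER PRIMARY KEY,
    EmployeeID    INTEGER NOT NULL REFERENCES tblEmployee (EmployeeID),
    JobID         INTEGER NOT NULL REFERENCES tblJob (JobID),
    PromotionDate TEXT NOT NULL,
    Salary        INTEGER
);
CREATE UNIQUE INDEX uxJobHistory
    ON tblJobHistory (EmployeeID, JobID, PromotionDate);

CREATE TABLE tblCertification (
    CertificationID   INTEGER PRIMARY KEY,
    JobID             INTEGER NOT NULL REFERENCES tblJob (JobID),
    CertificationType TEXT NOT NULL
);
CREATE UNIQUE INDEX uxCertification
    ON tblCertification (JobID, CertificationType);

-- Linking table that resolves the Employee/Certifications many-to-many.
CREATE TABLE tblEmployeeCertification (
    EmployeeCertificationID INTEGER PRIMARY KEY,
    EmployeeID        INTEGER NOT NULL REFERENCES tblEmployee (EmployeeID),
    CertificationID   INTEGER NOT NULL REFERENCES tblCertification (CertificationID),
    CertificationDate TEXT NOT NULL
);
CREATE UNIQUE INDEX uxEmployeeCertification
    ON tblEmployeeCertification (EmployeeID, CertificationID, CertificationDate);
"""


def build_database():
    """Create the sketch schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def load_sample_rows(conn):
    """Insert one made-up employee, job, job history, certification and link row."""
    conn.execute(
        "INSERT INTO tblEmployee (SSN, FirstName, LastName) "
        "VALUES ('000-00-0001', 'Pat', 'Example')"
    )
    conn.execute("INSERT INTO tblJob (JobTitle) VALUES ('Analyst')")
    conn.execute(
        "INSERT INTO tblJobHistory (EmployeeID, JobID, PromotionDate, Salary) "
        "VALUES (1, 1, '2020-01-01', 50000)"
    )
    conn.execute(
        "INSERT INTO tblCertification (JobID, CertificationType) "
        "VALUES (1, 'Sample Cert')"
    )
    conn.execute(
        "INSERT INTO tblEmployeeCertification "
        "(EmployeeID, CertificationID, CertificationDate) "
        "VALUES (1, 1, '2021-06-30')"
    )
```

Note where CertificationDate ended up: on the linking table, because the date belongs to one employee earning one certification. With this sketch, inserting the same employee/certification/date combination twice, or a job history row for an employee that does not exist, should be rejected by the database instead of silently stored.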

Editor's Notes

  1. There are many ways to represent data. Some of the most common are spreadsheets, flat files, and relational databases. Each of these has its own advantages and disadvantages. Unfortunately, filling in the missing information requires storing a lot of redundant data. What's the big deal? It's only a couple of fields, right? But that's only in the example shown. What if we were storing all of the demographic data (name, address, phone, city, state, etc.) for a lot of people? This would waste a lot of storage capacity. But wasted storage is not the worst problem. What if the SSN of Gina Fawn's first record was changed to 215-87-7854? Perhaps this was through operator error or maybe a programmatic update. It doesn't matter; the data has been changed. Now, which SSN is really Gina's? The database has no way of knowing. Worse still, the SSN now matches Steve Smith's. So, does that SSN represent Gina or Steve? Again, no way to know. This same problem holds true for all the fields which hold redundant data. This is called a Data Anomaly error. Once you start having data anomalies, you cannot trust the integrity of your database.
  2. Now we don't have problems with redundancy, but we have additional problems. First of all, we have to decide how many repeated columns to create. In Figure 3, I only show one salary increase for Gina and Tony, but is that reasonable? What if Gina has five wage increases and Tony has seven? Are seven sets of columns enough? Do I cap it at the largest record? Or do I add more columns to accommodate growth? If so, how many? Secondly, such a table structure requires a lot of manual modification and becomes untenable when you have a lot of data. Perhaps instead of just the date and salary, we are also storing the job description, pay grade, status, and so forth. The structure would become so large and unruly that it would be impossible to maintain.
  3. So far, I’ve approached normalization from a particular point of view. I put all the information into a single table, then removed redundancies into separate tables. This method is called decomposition. Decomposition is fine for understanding the theory of normalization and for creating small databases. However, it is less useful for large databases. At least, I've found it so. So I'm going to talk about another way to approach normalization that starts with the individual pieces (or business rules) and builds them up into properly normalized tables. This method is called the Entity-Relationship method and the final result is an Entity-Relationship Diagram. An E-R diagram is useful not just for creating the data model, but for documenting it as well. Since we've been working with the Employee Database in our other examples, let's stick with it. But since I claimed that the E-R method works for more complicated designs, let's make it a little more complex. I like to start with a short narrative of the requirements.
  4. It is useful at this point to put the entities in a grid and assign the rest of the attributes.
  5. Next, I need to assign primary keys to each entity. A primary key is a field or fields which uniquely identify a record. At this point, I'm dealing only with natural keys. Surrogate keys will come later in the process.
  6. So, for the Employee table, a person (as represented by the SS#) can have only one first name, last name, address, home phone, and so forth. That satisfies requirement #1. Secondly, if the value of the SS# changes, then so will all of those values. By that, I mean if we move to a different entity with a different SS#, that entity will have a different first name, last name, etc. (For our purposes here, we will assume that no two employees share any of these attributes.) Now, what about the Job History table? Any time an entity has a compound primary key, you should look at it very closely to make sure all the fields depend on the entire primary key. Any particular job can have only one description, pay grade, and pay range. However, none of those depend on the Promotion Date. I've got a problem here and I need to take another look. What I really have is information about two different "things". Job Title, Description, Pay Range and Pay Grade pertain to the Job as a category. Everyone who holds that position will have the same values. On the other hand, Salary and Promotion Date will be different for each person. So I really have two entities: 1) Job (information about the job itself), and 2) Job History (information about a particular employee's employment history). I need to take Job Title, Description, Pay Range and Pay Grade out of the Job History table and put them in the Job table. Lastly, in the Certification table, Certification Date is also not fully dependent on the Certification Type. Different individuals achieve the certification on different dates. I don't have an entity to put the date in, so I'll put that to the side and come back to it later. These last two problems are really a result of a poor narrative. If I had been more explicit, these would have been obvious.
  7. Many-to-Many: Common in real life, but cannot be represented in a database. One-to-Many: The most common relationship in a database. One-to-One: Seldom used. At this point, I should say that this is not a strictly linear process. That is, you can't always move smoothly from one step to the next. Sometimes you have to move back and forth between them as you discover more things about your system. That's what I'm going to do next. Because I have an unassigned attribute, I'm going to look at the relationships between my existing entities and see if something doesn't present itself. To look at the relationships, I'm going to ignore the attributes for a while. Attributes do not have relationships, only entities do. If you discover that an attribute does have a relationship with some other entity or attribute, that's an indication that it is really an entity and your grid must change.
  8. So how do I know what the relationships are for my Employee Database? For that I need to go back to the narrative. The second paragraph describes "business rules", that is, how the business actually works. I'll repeat the paragraph here. An employee can have multiple jobs over time (e.g., Analyst, Sr. Analyst, QA Administrator). Employees can also earn certifications necessary for their job. From this I can write out the relationships in full sentences, and I find it useful to write them in both directions. For instance, from the narrative, I can say:
  9. Unfortunately, I'm not done yet, for two reasons: 1) many-to-many relationships cannot be directly implemented in a relational database, and 2) I still have an unassigned attribute. So first I'll rationalize the many-to-many relationship and then take another look. To rationalize a many-to-many relationship between two tables, you create a third table -- an "intersection" or "linking" table. Then you create one-to-many relationships between the linking table and each of the main tables, with the "many-side" of both relationships on the linking table. As you can see above, Employee and Certifications have a many-to-many relationship, so I need to create a new entity (Employee/Certifications). Sometimes linking tables have logical names. Other times, they don't. In that case, I simply combine the names of the base tables.
  10. Now that I've got all the relationships between my entities identified and assigned all the attributes, I can put it all into one diagram.
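
The data anomaly described in the first note is easy to reproduce. Here is a small sketch of it, again assuming SQLite through Python's sqlite3 module. The FlatEmployee table, its columns, the helper function names, Gina's original SSN (111-11-1111), and all dates and salaries are invented for illustration; only the names Gina Fawn and Steve Smith and the SSN 215-87-7854 come from the note.

```python
import sqlite3


def build_flat_table():
    """Build a flat table where Gina's SSN is repeated on every salary row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FlatEmployee ("
        "SSN TEXT, FirstName TEXT, LastName TEXT, SalaryDate TEXT, Salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO FlatEmployee VALUES (?, ?, ?, ?, ?)",
        [
            # Made-up rows: only the names and Steve's SSN come from the note.
            ("111-11-1111", "Gina", "Fawn", "2001-01-01", 40000),
            ("111-11-1111", "Gina", "Fawn", "2002-01-01", 42000),
            ("215-87-7854", "Steve", "Smith", "2001-06-01", 38000),
        ],
    )
    return conn


def change_first_gina_row(conn):
    """Simulate the bad update: only Gina's first record gets the new SSN."""
    conn.execute(
        "UPDATE FlatEmployee SET SSN = '215-87-7854' "
        "WHERE rowid = (SELECT MIN(rowid) FROM FlatEmployee "
        "               WHERE FirstName = 'Gina' AND LastName = 'Fawn')"
    )


def ssns_for(conn, first, last):
    """Return every distinct SSN stored for one person, sorted."""
    rows = conn.execute(
        "SELECT DISTINCT SSN FROM FlatEmployee "
        "WHERE FirstName = ? AND LastName = ? ORDER BY SSN",
        (first, last),
    )
    return [ssn for (ssn,) in rows]


def names_for(conn, ssn):
    """Return every distinct person stored under one SSN, sorted."""
    rows = conn.execute(
        "SELECT DISTINCT FirstName, LastName FROM FlatEmployee "
        "WHERE SSN = ? ORDER BY FirstName, LastName",
        (ssn,),
    )
    return [f"{first} {last}" for first, last in rows]
```

After calling change_first_gina_row, ssns_for(conn, "Gina", "Fawn") returns two SSNs and names_for(conn, "215-87-7854") returns both Gina Fawn and Steve Smith. Nothing in the flat table can say which value is right. In the normalized design the SSN is stored once per employee, so the same update changes one row and the contradiction cannot arise.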