Why be Normal?
    An Introduction
    To Normalization




   http://joind.in/3436
About Me


• Ligaya Turmelle
• Senior Technical Support Engineer -
  MySQL
• 9 years
• I <3 Databases
Disclaimer
Questions

• Who works with databases?
• Who has had to design a database?
• Who has no idea what normalization is
  but has heard about it?
• ERD?
• Anyone heard of denormalization?
So what exactly is
 Normalization?
process of
organizing your
     data
So why do we use
 Normalization?
reduce
 redundancies
and organize the
 relationships
Advantages

• stored as small atomic pieces
• saves space
• increases speed
• reduces data anomalies
• makes for easier maintenance
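As a rough illustration of the "reduces data anomalies" point, the sketch below stores the same teacher twice in a flat layout and once in a normalized layout. The row layout, the variable names and the new spelling "Ms. Browne" are assumptions made up for this example; they are not from the slides.

```python
# Hypothetical rows modelled on the talk's school example: the teacher is
# repeated on every row that mentions the class.
flat_rows = [
    {"student": "John Smith Jr.", "classroom": "C110", "teacher": "Ms. Brown"},
    {"student": "April Smith", "classroom": "C110", "teacher": "Ms. Brown"},
]

# Update anomaly: changing the teacher on one row leaves the other row stale,
# so the same classroom now reports two different teachers.
flat_rows[0]["teacher"] = "Ms. Browne"
teachers_for_c110 = {r["teacher"] for r in flat_rows if r["classroom"] == "C110"}

# Normalized layout: the teacher is stored once and students refer to the class.
classes = {1: {"classroom": "C110", "teacher": "Ms. Brown"}}
enrolments = [("John Smith Jr.", 1), ("April Smith", 1)]

# A single update is seen by every student in the class.
classes[1]["teacher"] = "Ms. Browne"
teachers_after_fix = {classes[class_id]["teacher"] for _, class_id in enrolments}
```

In the flat layout a single missed row is enough to make the data disagree with itself; in the normalized layout there is only one place to update.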
So what then is this
1NF, 2NF, 3NF, etc.?
This refers to
Normal forms (NF)
Normal Forms


• Normal forms are “standardized” rules
• Each form builds off the last one
• The higher the number the more
  normalized the data is
So let's go over the
most common forms
Example Data
parents              student name    emergency contact  student age  classroom  teacher    school year  grade level
John and Mary Smith  John Smith Jr.  John Smith         10           C110       Ms. Brown  2010         6
                                     Mary Smith         9            C80        Mr Green   2009         5
                     April Smith     John Smith         10           C110       Ms. Brown  2010         6
                                     Mary Smith         9            A25        Mr Baker   2009         5
Dave Harris          Julie Harris    Dave Harris        6            A10        Mr Jones   2010         3
Base table we will start
         with
1NF
First Normal Form




• remove “repeating groups”
Example Data
parents      student name    emergency contact  student age  classroom  teacher    school year  grade level
John Smith   John Smith Jr.  John Smith         10           C110       Ms. Brown  2010         6
Mary Smith   John Smith Jr.  John Smith         10           C110       Ms. Brown  2010         6
John Smith   John Smith Jr.  Mary Smith         9            C80        Mr Green   2009         5
Mary Smith   John Smith Jr.  Mary Smith         9            C80        Mr Green   2009         5
John Smith   April Smith     John Smith         10           C110       Ms. Brown  2010         6
Mary Smith   April Smith     John Smith         10           C110       Ms. Brown  2010         6
John Smith   April Smith     Mary Smith         9            A25        Mr Baker   2009         5
Mary Smith   April Smith     Mary Smith         9            A25        Mr Baker   2009         5
Dave Harris  Julie Harris    Dave Harris        6            A10        Mr Jones   2010         3
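The step from the base table to the table above can be sketched in code. This is only an illustration: the nested structure and the function name to_first_normal_form are assumptions, chosen to mirror how the base table groups several parents and several yearly records under one student.

```python
# The un-normalized table: each student carries a list of parents and a list of
# (emergency contact, age, classroom, teacher, school year, grade) groups.
students = [
    {
        "parents": ["John Smith", "Mary Smith"],
        "student": "John Smith Jr.",
        "records": [
            ("John Smith", 10, "C110", "Ms. Brown", 2010, 6),
            ("Mary Smith", 9, "C80", "Mr Green", 2009, 5),
        ],
    },
    {
        "parents": ["John Smith", "Mary Smith"],
        "student": "April Smith",
        "records": [
            ("John Smith", 10, "C110", "Ms. Brown", 2010, 6),
            ("Mary Smith", 9, "A25", "Mr Baker", 2009, 5),
        ],
    },
    {
        "parents": ["Dave Harris"],
        "student": "Julie Harris",
        "records": [("Dave Harris", 6, "A10", "Mr Jones", 2010, 3)],
    },
]


def to_first_normal_form(students):
    """Expand every repeating group so each cell holds one atomic value."""
    rows = []
    for s in students:
        for record in s["records"]:
            for parent in s["parents"]:
                rows.append((parent, s["student"], *record))
    return rows


rows_1nf = to_first_normal_form(students)
```

Each student with two parents and two yearly records becomes four rows, which is why the flat table grows to nine rows.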
First Normal Form



• remove repeating groups
• a primary key can be defined
What is a primary
      key?
it can UNIQUELY
identify any row in
       a table
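A minimal sketch of what "uniquely identify" means in practice, using Python's built-in sqlite3 module rather than MySQL so it runs anywhere. The table name Enrolment and its trimmed-down column list are assumptions for illustration; the composite key follows the combination suggested in the speaker notes (student name, teacher, school year, grade level).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Enrolment (
        studentName TEXT    NOT NULL,
        teacher     TEXT    NOT NULL,
        schoolYear  INTEGER NOT NULL,
        gradeLevel  INTEGER NOT NULL,
        classroom   TEXT,
        -- composite primary key: the four columns together identify a row
        PRIMARY KEY (studentName, teacher, schoolYear, gradeLevel)
    )
    """
)
conn.execute("INSERT INTO Enrolment VALUES ('April Smith', 'Ms. Brown', 2010, 6, 'C110')")
# Same student, different teacher and year: a different key, so it is accepted.
conn.execute("INSERT INTO Enrolment VALUES ('April Smith', 'Mr Baker', 2009, 5, 'A25')")

try:
    # Same key as the first row (only the classroom differs): rejected.
    conn.execute("INSERT INTO Enrolment VALUES ('April Smith', 'Ms. Brown', 2010, 6, 'B12')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Enrolment").fetchone()[0]
```

The database refuses the third insert because a row with that key already exists, which is the guarantee a primary key gives you.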
2NF
Second Normal Form



• meet all requirements of 1NF
• isolate repeated subsets of data
Class
Contact
And that then leaves us
        with...
Second Normal Form



• meet all requirements of 1NF
• isolate related data
• create the relationships
Our Tables Right Now
Student to Contact
   relationship
Student to Class
  relationship
The tables and
relationships
Example Data

Student
studentName    studentAge
John Smith Jr  10
April Smith    10
Julie Harris   6

StuClass
studentName    class_id
John Smith Jr  1
John Smith Jr  2
April Smith    1
April Smith    3
Julie Harris   4

Class
class_id  classroom  teacher    schoolYear  gradeLevel
1         C110       Ms. Brown  2010        6
2         C80        Mr. Green  2009        5
3         A25        Mr. Baker  2009        5
4         A10        Mr. Jones  2010        3
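The three tables above could be declared and queried roughly as follows. This is a sketch using Python's sqlite3 module instead of MySQL; the column types and the foreign key declarations are assumptions, since the slides only show the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(
    """
    CREATE TABLE Student (
        studentName TEXT PRIMARY KEY,
        studentAge  INTEGER
    );
    CREATE TABLE Class (
        class_id   INTEGER PRIMARY KEY,
        classroom  TEXT,
        teacher    TEXT,
        schoolYear INTEGER,
        gradeLevel INTEGER
    );
    -- Link table: one row per (student, class) pair
    CREATE TABLE StuClass (
        studentName TEXT    REFERENCES Student(studentName),
        class_id    INTEGER REFERENCES Class(class_id),
        PRIMARY KEY (studentName, class_id)
    );
    """
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [("John Smith Jr", 10), ("April Smith", 10), ("Julie Harris", 6)],
)
conn.executemany(
    "INSERT INTO Class VALUES (?, ?, ?, ?, ?)",
    [
        (1, "C110", "Ms. Brown", 2010, 6),
        (2, "C80", "Mr. Green", 2009, 5),
        (3, "A25", "Mr. Baker", 2009, 5),
        (4, "A10", "Mr. Jones", 2010, 3),
    ],
)
conn.executemany(
    "INSERT INTO StuClass VALUES (?, ?)",
    [
        ("John Smith Jr", 1),
        ("John Smith Jr", 2),
        ("April Smith", 1),
        ("April Smith", 3),
        ("Julie Harris", 4),
    ],
)

# Who is in Ms. Brown's class in 2010? The relationship is followed with JOINs.
classmates = conn.execute(
    """
    SELECT s.studentName
    FROM StuClass sc
    JOIN Student s ON s.studentName = sc.studentName
    JOIN Class c   ON c.class_id = sc.class_id
    WHERE c.teacher = 'Ms. Brown' AND c.schoolYear = 2010
    ORDER BY s.studentName
    """
).fetchall()

# The relationship also protects the data: a link to an unknown student fails.
try:
    conn.execute("INSERT INTO StuClass VALUES ('Nobody Here', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The class details now live in one row each, and StuClass only records which student sat in which class.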
Example Data

Student
studentName    studentAge
John Smith Jr  10
April Smith    10
Julie Harris   6

StuContact
studentName    contactName
John Smith Jr  John Smith
John Smith Jr  Mary Smith
April Smith    John Smith
April Smith    Mary Smith
Julie Harris   Dave Harris

Contact
contactName  parent  contactPhone  contactEmail   contactAddress
John Smith   Y       123-555-9876  john@blah.com  1 Main St
Mary Smith   Y       123-555-2947  mary@blah.com  1 Main St
Dave Harris  Y       123-555-3456  dave@work.cm   5 Baker Rd
3NF
Third Normal Form



• meet all requirements of 2NF
• pull out data that is not dependent on
  the primary key
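The slides that show this step are diagrams and did not survive as text, so the sketch below is a hypothetical illustration and not the extraction made in the talk. It assumes that a classroom belongs to a teacher rather than to a particular class_id, which would make classroom a candidate to pull out of the Class table. The Teacher table and its columns are invented for this example.

```python
import sqlite3

# Class rows as they stand after 2NF.
class_rows = [
    (1, "C110", "Ms. Brown", 2010, 6),
    (2, "C80", "Mr. Green", 2009, 5),
    (3, "A25", "Mr. Baker", 2009, 5),
    (4, "A10", "Mr. Jones", 2010, 3),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Teacher (
        teacher   TEXT PRIMARY KEY,
        classroom TEXT  -- assumed to depend on the teacher, not on class_id
    );
    CREATE TABLE Class (
        class_id   INTEGER PRIMARY KEY,
        teacher    TEXT REFERENCES Teacher(teacher),
        schoolYear INTEGER,
        gradeLevel INTEGER
    );
    """
)
# Each teacher (and their classroom) is stored once.
conn.executemany(
    "INSERT OR IGNORE INTO Teacher VALUES (?, ?)",
    [(teacher, classroom) for _, classroom, teacher, _, _ in class_rows],
)
# Class keeps only the columns that describe the class itself.
conn.executemany(
    "INSERT INTO Class VALUES (?, ?, ?, ?)",
    [(cid, teacher, year, grade) for cid, _, teacher, year, grade in class_rows],
)

# Nothing is lost: a JOIN rebuilds the original rows.
rebuilt = conn.execute(
    """
    SELECT c.class_id, t.classroom, c.teacher, c.schoolYear, c.gradeLevel
    FROM Class c
    JOIN Teacher t ON t.teacher = c.teacher
    ORDER BY c.class_id
    """
).fetchall()
```

Whether this split is right depends on the real rules of the school; if teachers change rooms between years, classroom does depend on the class and should stay where it was.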
The tables and
relationships
reminder
We are looking for data that is not dependent
on the primary key, so we can extract that out
moving to 3NF
and the relationship
   between them
You can go to
higher NF, if you
   really want
Generally speaking
 there is no need
Knowing what you
   know now...
can you guess what
Denormalization is
       now?
Reasons for
      Denormalization

• SELECT with JOINs
  • Lots of JOINs
• Summarize information


  Normalize first and then only denormalize
  when you have to.
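One common shape of the "summarize information" case can be sketched as a summary table that stores the result of a JOIN so readers do not have to repeat it. The ClassSummary table and its studentCount column are assumptions for this illustration, and sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Class (class_id INTEGER PRIMARY KEY, teacher TEXT, schoolYear INTEGER);
    CREATE TABLE StuClass (
        studentName TEXT,
        class_id    INTEGER,
        PRIMARY KEY (studentName, class_id)
    );
    -- Denormalized on purpose: repeats the teacher and stores a derived count.
    CREATE TABLE ClassSummary (
        class_id     INTEGER PRIMARY KEY,
        teacher      TEXT,
        schoolYear   INTEGER,
        studentCount INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO Class VALUES (?, ?, ?)",
    [(1, "Ms. Brown", 2010), (2, "Mr. Green", 2009), (3, "Mr. Baker", 2009), (4, "Mr. Jones", 2010)],
)
conn.executemany(
    "INSERT INTO StuClass VALUES (?, ?)",
    [
        ("John Smith Jr", 1),
        ("John Smith Jr", 2),
        ("April Smith", 1),
        ("April Smith", 3),
        ("Julie Harris", 4),
    ],
)

# Pay for the JOIN once, when the summary is built...
conn.execute(
    """
    INSERT INTO ClassSummary
    SELECT c.class_id, c.teacher, c.schoolYear, COUNT(sc.studentName)
    FROM Class c
    LEFT JOIN StuClass sc ON sc.class_id = c.class_id
    GROUP BY c.class_id, c.teacher, c.schoolYear
    """
)

# ...so that reads are a plain single-table SELECT.
summary = conn.execute(
    "SELECT teacher, studentCount FROM ClassSummary ORDER BY class_id"
).fetchall()
```

The cost is that ClassSummary has to be refreshed whenever Class or StuClass changes, which is exactly the kind of anomaly normalization removed. That trade-off is why the advice is to normalize first.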
Questions?
Thank You
http://joind.in/3436


Editor's Notes

  1. http://joind.in/3436
  2. yeah yeah - Oracle I know.
     3 years with MySQL the company and about 5-6 years as a developer
  3. Ok, before we go any further I want to note a few things about this talk. When I first submitted this talk, I thought it would be quite an easy one to discuss. But as I dug deeper into the information (including my old textbooks and what is on the web), I found there is a lot of “discussion/argument” about it based on which relational theorist you prefer.

     I was originally taught these rules according to the theorist Codd. However, there have been advancements to Codd’s theory - most notably by Date - that many now consider to be just as valid. Depending on the theorist you prefer to follow, some of the information I give will be either valid or invalid.

     I am not here to argue over which theorist is more “valid” but rather to try and give you, the beginner, a handhold into the concepts of normalization. That is my primary objective.

     On the other hand, I want to make sure that you, the beginner, understand that there is a depth of technical information that I will not be covering, so as not to overwhelm you with technical details, terminology and arguments (atomicity, super keys, candidate keys, transitive dependencies, and partial dependencies, for example). If anyone would like to discuss these things in more detail after the session, I would be more than happy to do it in the hallway track.

     Finally, I would love to hear feedback on this talk. It has been much harder than I thought to balance the complexity of the theory with the need not to overwhelm a beginner. I would like to know how I did and what anyone thinks can be done to improve the talk.
  4.
  5.
  6. To put it simply - it is the process of organizing your data. This includes deciding what tables to create, what attributes/columns you have for each table, how you inter-relate those tables, and the data you put in the tables.
  7.
  8.
  9. - forces you to break down the data to its smallest form
     - small data saved only in one place == minimal space == increased speed (generally speaking).
     - because you are only saving the data in one place, you are less likely to “miss” data when you insert, update or delete the data
  10.
  11. So what the heck is that - I hear you say!
  12.
  13.
  14. DO NOT SHOW THIS SLIDE
      This is an elementary school table (grades 1-6) and shows some of the basic information we are going to have to handle. As we go we may choose to expand on some of the information we are going to keep.

      FYI - Software is Workbench. This draws the ERD (Entity-Relationship Diagram) for us.
  15. This is an elementary school table (grades 1-6) and shows some of the basic information we are going to have to handle. As we go we may choose to expand on some of the information we are going to keep.

      FYI - Software is Workbench. This draws the ERD (Entity-Relationship Diagram) for us.
  16. DO NOT SHOW THIS SLIDE!
      Our spreadsheet:
      John Jr and April are twins of John and Mary Smith.
      They have attended this school for 3 years.
      They have had the same teacher this year.
  17. Our spreadsheet:
      John Jr and April are twins of John and Mary Smith.
      They have attended this school for 2 years.
      They have had the same teacher this year.
  18. We have John and Mary Smith who have twin children in the school - John Jr and April. John Jr and April have been at the school for 2 years and are both in Ms Brown’s class this year.

      Dave Harris has recently moved into the area and only has 1 child - Julie - in the school this year.
  19.
  20.
  21.
  22. So where are the “repeating groups”?
      1) we have John *and* Mary Smith as parents for both John Jr and April Smith. We will need to isolate each parent to each child.
      2) I also see that John Jr and April both have been in the school for 2 years. So we will want to isolate the children to each year in school.

      Basically what we are working toward is making sure that each intersection between a row and a column contains only a single “atomic” value.
  23. When we break everything out, this is what we get. Notice that there is an entry for both parents, each child and each emergency contact.
  24. DO NOT SHOW THIS SLIDE!
      So we make sure each row has the (repeating) values
  25.
  26.
  27. This can be a single column value, or a combination of columns. Modern web systems now tend to use MySQL’s auto_increment ability to create a unique integer value for each row. But the primary key *DOES NOT* have to be this.

      There are a number of reasons why most systems now use auto_incs, from space being cheap (which historically was not always true), to the speed and ease of manipulation of an integer. But do not lock yourself into the thought that it must be.
  28. Primary Key is a column or set of columns that can uniquely identify any row. In this example the combination of the student name, grade level, school year and teacher will uniquely identify each row. (can handle repeated years, skipping a grade in the same year, changing teachers or the same teacher multiple years)
  29. This technically is in 1NF.

      However, depending upon how the term “repeating groups” is interpreted (and by whom), some could argue that there are additional changes we must make. For example, if you take the term “repeating groups” and think of it in terms of atomic data, we would have to break down the parents listing into individual names for each row. So John Jr and April would each then have 6 rows associated with them. I personally prefer it this way, but for brevity’s sake, I chose not to try and make a table that displays 12 rows and 13 columns for you to see.

      We could also drill it down further and require the separation of first and last names into individual columns... and it goes on and on.

      I told you it could drive you mad! :D
  30.
  31.
  32. So looking at our data - can we uniquely identify each row? I think so. I personally like the combination of student name with teacher, classroom, year and grade. This would allow a student to skip grades within the same school year and change classrooms or teachers without breaking the uniqueness.\n
  33. Now that we have first normal form, we can see what it takes to get to second.\n
  34. Second normal form builds off first normal form. In order to be in second normal form we must first meet all the requirements of 1NF. Once we have that, we will need to isolate subsets of data.\n\nThis sounds hard but let's do a few examples so you can see what we mean.\n
  35. Ok - so what related subsets of data do we have?\nFirst thing I see is the grades. \n\n
  36. Ok - so what repeating subsets of data do we have? Well we know that John Jr and April are in the same class now. So that repeats. Let's pull that out into its own table.\n\n
  37. So we break this out into its own table - grades. Also note I have added a grades_id column as the primary key for this table. I did this since I still have to maintain 1NF (requires PK) but I chose this time around to use an artificial primary key rather than a natural key made of a composite of multiple columns. This is mostly for convenience's sake as you will see later.\n
  38. So we break this out into its own table - grades. Also note I have added a grades_id column as the primary key for this table. I did this since I still have to maintain 1NF (requires PK) but I chose this time around to use an artificial primary key rather than a natural key made of a composite of multiple columns. This is mostly for convenience's sake as you will see later.\n
  39. So we break this out into its own table. Also note I have added a grades_id column. I did this since I still have to maintain 1NF - which means I have to have a primary key. I could hypothetically make the primary key out of a combination of all 4 quarters' grades, but since I know I may use it later - I chose in this case to make an artificial key to identify the row.\n
  40. So we break this out into its own table - grades. Also note I have added a grades_id column as the primary key for this table. I did this since I still have to maintain 1NF (requires PK) but I chose this time around to use an artificial primary key rather than a natural key made of a composite of multiple columns. This is mostly for convenience's sake as you will see later.\n
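A rough sketch of a grades table with an artificial key, again using Python's sqlite3 as a stand-in for MySQL. The quarter columns `q1` to `q4` and the grade values are assumptions for illustration; only `grades_id` is named in the notes. In SQLite an `INTEGER PRIMARY KEY` column is filled in automatically, much like MySQL's auto_increment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE grades (
        grades_id INTEGER PRIMARY KEY,  -- artificial key, assigned by SQLite
        q1 INTEGER NOT NULL,
        q2 INTEGER NOT NULL,
        q3 INTEGER NOT NULL,
        q4 INTEGER NOT NULL
    )
    """
)

# Two students can earn identical quarter grades (hypothetical values).
insert = "INSERT INTO grades (q1, q2, q3, q4) VALUES (?, ?, ?, ?)"
first_id = conn.execute(insert, (90, 85, 88, 92)).lastrowid
second_id = conn.execute(insert, (90, 85, 88, 92)).lastrowid
```

With the artificial key the two identical sets of grades stay distinct rows. A natural key made of all four quarter columns would have rejected the second insert.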
  41. OK now that we have removed the grades information - what other repeating subsets do we see? Hmm - John Jr and April are in the same class - so that is repeated. Let's break that out.\n
  42. OK now that we have removed the grades information - what other repeating subsets do we see? Hmm - John Jr and April are in the same class - so that is repeated. Let's break that out.\n
  43. \n
  44. \n
  45. \n
  46. I am creating an artificial primary key for the Class table. \n\n \n
  47. Ok - so is there anything else that repeats? Yep - we have the parent and emergency contact information.\n\n\n
  48. Ok - so is there anything else that repeats? Next thing I can think of is that John Jr and April share the same parents. So that repeats again. We can then pull that out.\n\n\n
  49. Now the way I chose to handle this is to make a Contact table. In it we will place the emergency contact information along with the parent information since it is not uncommon for them to be the same. \n\nI will also include a parent flag column. This will tell me if the name is a parent or a separate emergency contact (Ex: single parent who has the grandparent as the emergency contact.)\n\nIf this were a real-world situation we would also want to hold contact information like phone numbers, address, maybe even email.\n
  50. Now the way I chose to handle this is to make a Contact table. In it we will place the emergency contact information along with the parent information since it is not uncommon for them to be the same. \n\nI will also include a parent flag column. This will tell me if the name is a parent or a separate emergency contact (Ex: single parent who has the grandparent as the emergency contact.)\n\nIf this were a real-world situation we would also want to hold contact information like phone numbers, address, maybe even email.\n
  51. Now the way I chose to handle this is to make a Contact table. In it we will place the emergency contact information along with the parent information since it is not uncommon for them to be the same. \n\nI will also include a parent flag column. This will tell me if the name is a parent or a separate emergency contact (Ex: single parent who has the grandparent as the emergency contact.)\n\nIf this were a real-world situation we would also want to hold contact information like phone numbers, address, maybe even email. Later on you will see that I add this information to the table.\n
  52. Now the way I chose to handle this is to make a Contact table. In it we will place the emergency contact information along with the parent information since it is not uncommon for them to be the same. \n\nI will also include a parent flag column. This will tell me if the name is a parent or a separate emergency contact (Ex: single parent who has the grandparent as the emergency contact.)\n\nIf this were a real-world situation we would also want to hold contact information like phone numbers, address, maybe even email. \n
  53. \n
  54. \n\n
  55. Which we can rename to the Student table\n\n
  56. \n
  57. \n
  58. \n
  59. \n
  60. Now that we have isolated related data, it is time to create the relations between the data.\n
  61. So we have these tables, and now have to “reconnect” them to each other. This puts back in the relations.\n
  62. So we have these tables, and now have to “reconnect” them to each other. This puts back in the relations.\n
  63. The easiest relationship I see is the Student to the Emergency Contact.\n\nWe have to keep in mind how our relationships work. We are working under the understanding that each student can have many contacts (Ex: John and Mary Smith) and that each contact can be for 1 or more students (Ex: John Jr and April).\n\nBecause this is a many students to many contacts relationship we make a pivot/associative table so we can link/list an individual student to an individual contact.\n\n
  64. The easiest relationship I see is the Student to the Emergency Contact.\n\nWe have to keep in mind how our relationships work. We are working under the understanding that each student can have many contacts (Ex: John and Mary Smith) and that each contact can be for 1 or more students (Ex: John Jr and April).\n\nBecause this is a many students to many contacts relationship we make a pivot/associative table so we can link/list an individual student to an individual contact.\n\n
  65. The StuContact table now links the 2 tables. We can now find each contact for an individual student (We can search on John Jr and find Mary Smith and John Smith.)\nAnd each student associated with a specific contact (We can search on Mary Smith and find John Jr and April.)\n\nNow could I make artificial primary keys for the Student and the Contact tables so the pivot table is only working with integers - sure. This is now a common practice on the web to help indexes stay small and fast. However it should be noted that when you do this you are potentially taking up a lot of extra space to hold that unrelated value.\n\nFYI - this is called crow's foot notation. The crow's foot symbolizes a Many relationship. The other side stands for a singular relationship. So just by looking at the diagram we know that Student has a 1-N relationship to StuContact, and StuContact has an N-1 relationship with Contact.\n
  66. The StuContact table now links the 2 tables. We can now find each contact for an individual student (We can search on John Jr and find Mary Smith and John Smith.)\nAnd each student associated with a specific contact (We can search on Mary Smith and find John Jr and April.)\n\nNow could I make artificial primary keys for the Student and the Contact tables so the pivot table is only working with integers - sure. This is now a common practice on the web to help indexes stay small and fast. However it should be noted that when you do this you are potentially taking up a lot of extra space to hold that unrelated value.\n\nFYI - this is called crow's foot notation. The crow's foot symbolizes a Many relationship. The other side stands for a singular relationship. So just by looking at the diagram we know that Student has a 1-N relationship to StuContact, and StuContact has an N-1 relationship with Contact.\n
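The pivot table and the two lookups described above can be sketched as follows, using Python's sqlite3 in place of MySQL. The table and column names (`stu_contact`, `student_name`, `contact_name`, `is_parent`) and the helper functions are assumptions for illustration. The names themselves act as keys, as in the notes, and the `is_parent` flag follows the Contact table idea from earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE student (student_name TEXT PRIMARY KEY);
    CREATE TABLE contact (
        contact_name TEXT PRIMARY KEY,
        is_parent    INTEGER NOT NULL  -- 1 = parent, 0 = other emergency contact
    );
    -- Pivot table: one row per (student, contact) pair.
    CREATE TABLE stu_contact (
        student_name TEXT NOT NULL REFERENCES student (student_name),
        contact_name TEXT NOT NULL REFERENCES contact (contact_name),
        PRIMARY KEY (student_name, contact_name)
    );
    """
)
conn.executemany(
    "INSERT INTO student VALUES (?)",
    [("John Smith Jr.",), ("April Smith",), ("Julie Harris",)],
)
conn.executemany(
    "INSERT INTO contact VALUES (?, ?)",
    [("John Smith", 1), ("Mary Smith", 1), ("Dave Harris", 1)],
)
conn.executemany(
    "INSERT INTO stu_contact VALUES (?, ?)",
    [
        ("John Smith Jr.", "John Smith"),
        ("John Smith Jr.", "Mary Smith"),
        ("April Smith", "John Smith"),
        ("April Smith", "Mary Smith"),
        ("Julie Harris", "Dave Harris"),
    ],
)


def contacts_for(student_name):
    """Return every contact linked to one student, sorted by name."""
    rows = conn.execute(
        "SELECT contact_name FROM stu_contact "
        "WHERE student_name = ? ORDER BY contact_name",
        (student_name,),
    )
    return [row[0] for row in rows]


def students_for(contact_name):
    """Return every student linked to one contact, sorted by name."""
    rows = conn.execute(
        "SELECT student_name FROM stu_contact "
        "WHERE contact_name = ? ORDER BY student_name",
        (contact_name,),
    )
    return [row[0] for row in rows]
```

Calling `contacts_for("John Smith Jr.")` walks the relationship in one direction and `students_for("Mary Smith")` walks it in the other, which is the whole point of the pivot table.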
  67. Please note here that I have changed the primary key for the Class table. Originally I was using the classroom, teacher, school year combination to uniquely identify each row. And that was fine. But now I will be connecting the Class table to the Student. So rather than have all that data repeatedly listed, I chose to add an artificial primary key.\n
  68. \n
  69. Like the student and contact listing, the student and class listing is also many to many (many students are in a class and a student can be in many classes over time).\n\nSo again we make a pivot/associative table.\n
  70. Like the student and contact listing, the student and class listing is also many to many (many students are in a class and a student can be in many classes over time).\n\nSo again we make a pivot/associative table.\n
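A sketch of the student-to-class pivot, using Python's sqlite3 in place of MySQL. The names `class_id`, `stu_class` and the helper functions are assumptions for illustration. The `UNIQUE` constraint keeps the former natural key (classroom, teacher, school year) enforced even though the artificial `class_id` is now the primary key; that constraint is my addition, not something the notes state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE student (student_name TEXT PRIMARY KEY);
    CREATE TABLE class (
        class_id    INTEGER PRIMARY KEY,  -- artificial key
        classroom   TEXT    NOT NULL,
        teacher     TEXT    NOT NULL,
        school_year INTEGER NOT NULL,
        UNIQUE (classroom, teacher, school_year)  -- former natural key
    );
    -- Pivot table: one row per (student, class) pair.
    CREATE TABLE stu_class (
        student_name TEXT    NOT NULL REFERENCES student (student_name),
        class_id     INTEGER NOT NULL REFERENCES class (class_id),
        PRIMARY KEY (student_name, class_id)
    );
    """
)
conn.executemany(
    "INSERT INTO student VALUES (?)",
    [("John Smith Jr.",), ("April Smith",), ("Julie Harris",)],
)
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?, ?)",
    [
        (1, "C110", "Ms. Brown", 2010),
        (2, "C80", "Mr Green", 2009),
        (3, "A25", "Mr Baker", 2009),
        (4, "A10", "Mr Jones", 2010),
    ],
)
conn.executemany(
    "INSERT INTO stu_class VALUES (?, ?)",
    [
        ("John Smith Jr.", 1),
        ("John Smith Jr.", 2),
        ("April Smith", 1),
        ("April Smith", 3),
        ("Julie Harris", 4),
    ],
)


def classes_for(student_name):
    """Return (school_year, classroom, teacher) for one student, oldest first."""
    return conn.execute(
        "SELECT c.school_year, c.classroom, c.teacher "
        "FROM stu_class sc JOIN class c ON c.class_id = sc.class_id "
        "WHERE sc.student_name = ? ORDER BY c.school_year",
        (student_name,),
    ).fetchall()


def students_in(class_id):
    """Return the names of all students in one class, sorted by name."""
    rows = conn.execute(
        "SELECT student_name FROM stu_class "
        "WHERE class_id = ? ORDER BY student_name",
        (class_id,),
    )
    return [row[0] for row in rows]
```

The pivot rows carry only a name and an integer, so the classroom, teacher and school year are written once in `class` instead of once per student.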
  71. \n
  72. \n
  73. I chose to bring the Grades table in by also linking it into the student and the class. So if we have the name of the student and the class, we can then find the grades.\n
  74. I chose to bring the Grades table in by also linking it into the student and the class. So if we have the name of the student and the class, we can then find the grades.\n
  75. This is what all the tables and relationships look like so far.\n
  76. This is what all the tables and relationships look like so far.\n
  77. \n
  78. \n
  79. \n
  80. \n
  81. This is what all the tables and relationships look like so far.\n\nSo looking at any individual table - is there any data in a table that is not - or could not be linked to the primary key?\n\nNope there is nothing to improve here. So let's change things up a little bit.\n
  82. What about now?\n
  83. Does this help any?\n
  84. So what is here that is not dependent upon the primary key? The teacherSalary. \n\nSo since it is not dependent upon the primary key - to place this table into 3NF we will need to extract it (and any related data) out into its own table.\n
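One way the extraction could look, sketched with Python's sqlite3 instead of MySQL. The `teacher` table layout, the salary figures, and the second class for Ms. Brown in 2011 are all hypothetical, added only to show why the split helps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    -- The salary depends on the teacher, not on the class, so it lives here.
    CREATE TABLE teacher (
        teacher        TEXT PRIMARY KEY,
        teacher_salary INTEGER NOT NULL
    );
    CREATE TABLE class (
        class_id    INTEGER PRIMARY KEY,
        classroom   TEXT    NOT NULL,
        teacher     TEXT    NOT NULL REFERENCES teacher (teacher),
        school_year INTEGER NOT NULL
    );
    """
)
conn.execute("INSERT INTO teacher VALUES ('Ms. Brown', 50000)")  # hypothetical salary
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?, ?)",
    [
        (1, "C110", "Ms. Brown", 2010),
        (2, "C110", "Ms. Brown", 2011),  # hypothetical second class
    ],
)

# A raise is a single-row update; no class row needs to change.
conn.execute("UPDATE teacher SET teacher_salary = 52000 WHERE teacher = 'Ms. Brown'")

salary_by_class = conn.execute(
    "SELECT c.school_year, t.teacher_salary "
    "FROM class c JOIN teacher t ON t.teacher = c.teacher "
    "WHERE c.teacher = 'Ms. Brown' ORDER BY c.school_year"
).fetchall()
```

Had the salary stayed on the class rows, the raise would have needed one update per class, and missing a row would leave two different salaries for the same teacher.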
  85. \n
  86. \n
  87. All the tables in 3NF\n
  88. Let's start with the Grades table.\n
  89. Is there any data here that is not dependent on the primary key? Hmm - the final grade is a derived value (sum of 4 quarters' values divided by 4). So technically it is not dependent on the primary key. So we remove it.\n
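Once the final grade column is removed, it can still be computed on demand. A small sketch with Python's sqlite3 follows; the view name `grades_with_final`, the quarter column names and the grade values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE grades (
        grades_id INTEGER PRIMARY KEY,
        q1 INTEGER NOT NULL,
        q2 INTEGER NOT NULL,
        q3 INTEGER NOT NULL,
        q4 INTEGER NOT NULL
    );
    -- The final grade is derived at read time instead of being stored.
    CREATE VIEW grades_with_final AS
        SELECT grades_id, q1, q2, q3, q4,
               (q1 + q2 + q3 + q4) / 4.0 AS final_grade
        FROM grades;
    """
)
conn.execute("INSERT INTO grades (q1, q2, q3, q4) VALUES (90, 80, 85, 95)")

final_grade = conn.execute(
    "SELECT final_grade FROM grades_with_final WHERE grades_id = 1"
).fetchone()[0]
```

Dividing by `4.0` rather than `4` avoids integer division, so the average keeps its fractional part.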
  90. \n
  91. \n
  92. BCNF, 4NF, etc. \n
  93. But generally speaking 3NF is as high as most applications need to go. For applications I personally prefer to start at 3NF and then adjust as needed for my requirements.\n
  94. \n
  95. the process of attempting to optimize the performance of a database by adding redundant data or by grouping data.\n
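As an illustration of adding redundant data, the sketch below stores the final grade back in the grades table so that reads skip the calculation. It uses Python's sqlite3; the column names and grade values are assumptions, and this is only one of many forms denormalization can take.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE grades (
        grades_id   INTEGER PRIMARY KEY,
        q1 INTEGER NOT NULL,
        q2 INTEGER NOT NULL,
        q3 INTEGER NOT NULL,
        q4 INTEGER NOT NULL,
        final_grade REAL NOT NULL  -- redundant: derivable from q1..q4
    )
    """
)
conn.execute(
    "INSERT INTO grades (q1, q2, q3, q4, final_grade) VALUES (90, 80, 85, 95, 87.5)"
)

# Correcting one quarter leaves the stored average out of date...
conn.execute("UPDATE grades SET q4 = 75 WHERE grades_id = 1")
stale_final = conn.execute(
    "SELECT final_grade FROM grades WHERE grades_id = 1"
).fetchone()[0]

# ...until something recomputes it.
conn.execute(
    "UPDATE grades SET final_grade = (q1 + q2 + q3 + q4) / 4.0 WHERE grades_id = 1"
)
fresh_final = conn.execute(
    "SELECT final_grade FROM grades WHERE grades_id = 1"
).fetchone()[0]
```

The read is cheaper, but every write to a quarter column now has to remember to refresh `final_grade`. That is the trade-off denormalization makes.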
  96. \n
  97. \n
  98. \n