18. Scott’s Rules For Database Design
1. Normalize Your Database For Data Deduplication
2. Use The Database Engine to Keep Data Clean
3. Proactively Add Indexes to Keep Queries Performant
20. Users Table
• Email address
• Password
• Active state
• Hire Date
• Listing of previous passwords
• Office Name
• Office City
• Office Zip
21. Users Table
• Email address (string)
• Password (string)
• Active state (string)
• Hire Date (string)
• Listing of previous passwords (string)
• Office Name (string)
• Office City (string)
• Office Zip (string)
24. Normalize Your Database For Data Deduplication
“[T]he process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity.”
-“Database normalization” on Wikipedia
25. Normalize Your Database For Data Deduplication
• UNF: Unnormalized form
• 1NF: First normal form
• 2NF: Second normal form
• 3NF: Third normal form
• EKNF: Elementary key normal form
• BCNF: Boyce–Codd normal form
• 4NF: Fourth normal form
• ETNF: Essential tuple normal form
• 5NF: Fifth normal form
• DKNF: Domain-key normal form
• 6NF: Sixth normal form
27. Normalize Your Database For Data Deduplication
• Boyce–Codd Normal Form:
• X should be a superkey for every nontrivial functional dependency (FD) X → Y in a given relation.
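The superkey test can be checked mechanically. Below is a minimal sketch in plain Python (standard library only) that computes the closure of an attribute set and lists every nontrivial functional dependency whose left side is not a superkey. The attribute names and FDs are assumptions modeled on this deck's offices table: `id` determines every column, and `zip` determines `city`. Checking only the listed FDs is sufficient when testing the relation they are declared on.

```python
from typing import FrozenSet, List, Tuple

# A functional dependency X -> Y, stored as (X, Y).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names."""
    return frozenset(lhs.split(",")), frozenset(rhs.split(","))


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Knowing all of X means we also know all of Y.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(relation: FrozenSet[str], fds: List[FD]) -> List[FD]:
    """Return the nontrivial FDs X -> Y where X is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD: always allowed
        if not closure(lhs, fds) >= relation:
            violations.append((lhs, rhs))
    return violations


# Hypothetical FDs for the offices table: id -> everything, zip -> city.
offices = frozenset({"id", "name", "phone", "city", "zip"})
office_fds = [fd("id", "name,phone,city,zip"), fd("zip", "city")]

for lhs, rhs in bcnf_violations(offices, office_fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['zip'] -> ['city']
```

Because `zip` alone does not determine `name` or `phone`, `zip -> city` breaks the rule; moving city into its own table keyed by zip removes the violation.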
31. First Normal Form (1NF)
1. The table contains a unique identifier, also called the primary key, that is used to identify the row.
2. Each column contains atomic values (values that cannot be broken down).
32. 1NF - users
| email | password | active | hire_date | previous_password | office_name | office_phone | office_city | office_zip |
|---|---|---|---|---|---|---|---|---|
| alice@example.com | hash1 | 1 | 1/1/2024 | hash1, hash5, hash6 | Main Office | 555-555-5555 | Saginaw | 48609 |
| avery@example.com | NULL | 1 | 8/11/2024 | hash2, hash7, Hash8 | main office | 5555555555 | Saginaw | 48609 |
| scott@example.com | hash3 | 1 | May 11th, 23 | hash3 | Man office | (555)555-5555 | Saginaw | 48609 |
| scott@example.com | hash4 | 1 | Tuesday | hash4 | Main | 555/555/5555 | Saginaw | 48609 |
33. 1NF - users
• A unique identifier should be:
• Auto-incrementing int
• UUID
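Both key options can be tried side by side. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the `users_uuid` table name is hypothetical, and the MySQL spelling of the first option would be `INT AUTO_INCREMENT PRIMARY KEY`.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Option 1: auto-incrementing integer. In SQLite an INTEGER PRIMARY KEY column
# is filled in automatically; MySQL would use INT AUTO_INCREMENT PRIMARY KEY.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
first_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("alice@example.com",)
).lastrowid
second_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("avery@example.com",)
).lastrowid
print(first_id, second_id)  # 1 2

# Option 2: UUID generated by the application before inserting.
conn.execute(
    "CREATE TABLE users_uuid (id TEXT PRIMARY KEY NOT NULL, email TEXT NOT NULL)"
)
new_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO users_uuid (id, email) VALUES (?, ?)", (new_id, "scott@example.com")
)
stored_id = conn.execute("SELECT id FROM users_uuid").fetchone()[0]
print(stored_id == new_id)  # True
```

Integer keys are compact and let the database hand out the next value; UUIDs can be created by the application without asking the database first, at the cost of a wider key.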
34. 1NF - users
| id | email | password | active | hire_date | previous_password | office_name | office_phone | office_city | office_zip |
|---|---|---|---|---|---|---|---|---|---|
| 1 | alice@example.com | hash1 | 1 | 1/1/2024 | hash1, hash5, hash6 | Main Office | 555-555-5555 | Saginaw | 48609 |
| 2 | avery@example.com | NULL | 1 | 8/11/2024 | hash2, hash7, Hash8 | main office | 5555555555 | Saginaw | 48609 |
| 3 | scott@example.com | hash3 | 1 | May 11th, 23 | hash3 | Man office | (555)555-5555 | Saginaw | 48609 |
| 4 | scott@example.com | hash4 | 1 | Tuesday | hash4 | Main | 555/555/5555 | Saginaw | 48609 |
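The packed previous_password column is what breaks the atomic-values rule, and the fix is another table with one row per old password. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the `previous_passwords` table and the `was_used_before` helper are hypothetical names, not from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        password TEXT
    );
    -- One row per old password instead of a list packed into one column.
    -- (SQLite only enforces REFERENCES with PRAGMA foreign_keys = ON.)
    CREATE TABLE previous_passwords (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        password TEXT NOT NULL
    );
    """
)
conn.execute(
    "INSERT INTO users (id, email, password) VALUES (1, 'alice@example.com', 'hash1')"
)
conn.executemany(
    "INSERT INTO previous_passwords (user_id, password) VALUES (?, ?)",
    [(1, "hash1"), (1, "hash5"), (1, "hash6")],
)


def was_used_before(user_id, password_hash):
    """Plain lookup; no string-splitting of a packed column needed."""
    row = conn.execute(
        "SELECT 1 FROM previous_passwords WHERE user_id = ? AND password = ?",
        (user_id, password_hash),
    ).fetchone()
    return row is not None


print(was_used_before(1, "hash5"))  # True
print(was_used_before(1, "hash9"))  # False
```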
42. Second Normal Form (2NF)
1. Is already in 1NF
2. All non-key columns depend on the whole primary key of the table
43. Second Normal Form (2NF)
| id | email | password | active | hire_date | office_name | office_phone | office_city | office_zip |
|---|---|---|---|---|---|---|---|---|
| 1 | alice@example.com | hash1 | 1 | 1/1/2024 | Main Office | 555-555-5555 | Saginaw | 48609 |
| 2 | avery@example.com | NULL | 1 | 8/11/2024 | main office | 5555555555 | Saginaw | 48609 |
| 3 | scott@example.com | hash3 | 1 | May 11th, 23 | Man office | (555)555-5555 | Saginaw | 48609 |
| 4 | scott@example.com | hash4 | 1 | Tuesday | Main | 555/555/5555 | Saginaw | 48609 |
44. 2NF - offices
| id | name | phone | city | zip |
|---|---|---|---|---|
| 1 | Main Office | 555-555-5555 | Saginaw | 48609 |
| 2 | main office | 5555555555 | Saginaw | 48609 |
| 3 | Man office | (555)555-5555 | Saginaw | 48609 |
| 4 | Main | 555/555/5555 | Saginaw | 48609 |
46. 2NF - users
| id | email | password | active | hire_date | office_name | office_phone | office_city | office_zip | office_id |
|---|---|---|---|---|---|---|---|---|---|
| 1 | alice@example.com | hash1 | 1 | 1/1/2024 | Main Office | 555-555-5555 | Saginaw | 48609 | 1 |
| 2 | avery@example.com | NULL | 1 | 8/11/2024 | main office | 5555555555 | Saginaw | 48609 | 2 |
| 3 | scott@example.com | hash3 | 1 | May 11th, 23 | Man office | (555)555-5555 | Saginaw | 48609 | 3 |
| 4 | scott@example.com | hash4 | 1 | Tuesday | Main | 555/555/5555 | Saginaw | 48609 | 4 |
47. 2NF - users
| id | email | password | active | hire_date | office_id |
|---|---|---|---|---|---|
| 1 | alice@example.com | hash1 | 1 | 1/1/2024 | 1 |
| 2 | avery@example.com | NULL | 1 | 8/11/2024 | 2 |
| 3 | scott@example.com | hash3 | 1 | May 11th, 23 | 3 |
| 4 | scott@example.com | hash4 | 1 | Tuesday | 4 |
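Linking users to offices through `office_id` means office details are stored once and reached with a join. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; unlike the slide, where each user still has its own office row, it assumes the duplicate offices have already been merged into one, and the new office name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE offices (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        phone TEXT,
        city TEXT,
        zip TEXT
    );
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        office_id INTEGER REFERENCES offices(id)
    );
    INSERT INTO offices (id, name, phone, city, zip)
        VALUES (1, 'Main Office', '555-555-5555', 'Saginaw', '48609');
    -- Both users point at the same office row, so the office details
    -- are stored once instead of being repeated on every user.
    INSERT INTO users (id, email, office_id) VALUES
        (1, 'alice@example.com', 1),
        (2, 'avery@example.com', 1);
    """
)

JOIN_SQL = """
    SELECT users.email, offices.name
    FROM users JOIN offices ON offices.id = users.office_id
    ORDER BY users.id
"""
before = conn.execute(JOIN_SQL).fetchall()
print(before)  # [('alice@example.com', 'Main Office'), ('avery@example.com', 'Main Office')]

# Renaming the office is now a single-row update that every user sees.
conn.execute("UPDATE offices SET name = 'Saginaw Main Office' WHERE id = 1")
after = conn.execute(JOIN_SQL).fetchall()
print(after)  # [('alice@example.com', 'Saginaw Main Office'), ('avery@example.com', 'Saginaw Main Office')]
```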
49. Third Normal Form (3NF)
1. Is already in 2NF
2. Every non-key column depends on the primary key directly, not transitively through another non-key column
50. 3NF - offices
| id | name | phone | city | zip |
|---|---|---|---|---|
| 1 | Main Office | 555-555-5555 | Saginaw | 48609 |
| 2 | main office | 5555555555 | Saginaw | 48609 |
| 3 | Man office | (555)555-5555 | Saginaw | 48609 |
| 4 | Main | 555/555/5555 | Saginaw | 48609 |
51. 3NF - zips
| id | city |
|---|---|
| 48609 | Saginaw |
| 48640 | Midland |
| 48642 | Midland |
| 48901 | Lansing |
52. 3NF - zips
| id | city | state |
|---|---|---|
| 48609 | Saginaw | MI |
| 48640 | Midland | MI |
| 48642 | Midland | MI |
| 48901 | Lansing | MI |
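With city and state living on the zips table, an office only stores its zip and looks the rest up. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; storing the zip as text (to keep leading zeros) is a design choice of this sketch, not something the slides specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- city and state depend on the zip, so they live with the zip.
    CREATE TABLE zips (
        id TEXT PRIMARY KEY NOT NULL,  -- the ZIP code itself, text keeps leading zeros
        city TEXT NOT NULL,
        state TEXT NOT NULL
    );
    CREATE TABLE offices (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        phone TEXT,
        zip_id TEXT NOT NULL REFERENCES zips(id)
    );
    INSERT INTO zips (id, city, state) VALUES
        ('48609', 'Saginaw', 'MI'),
        ('48640', 'Midland', 'MI'),
        ('48642', 'Midland', 'MI'),
        ('48901', 'Lansing', 'MI');
    INSERT INTO offices (id, name, phone, zip_id)
        VALUES (1, 'Main Office', '555-555-5555', '48609');
    """
)

# City and state are no longer copied onto the office; a join supplies them.
row = conn.execute(
    """
    SELECT offices.name, zips.city, zips.state
    FROM offices JOIN zips ON zips.id = offices.zip_id
    WHERE offices.id = 1
    """
).fetchone()
print(row)  # ('Main Office', 'Saginaw', 'MI')
```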
66. Use Correct Column Types
| id | email | password | active | hire_date | office_id |
|---|---|---|---|---|---|
| 1 | alice@example.com | hash1 | 1 | 1/1/2024 | 1 |
| 2 | avery@example.com | NULL | 1 | 8/11/2024 | 2 |
| 3 | scott@example.com | hash3 | 1 | May 11th, 23 | 3 |
| 4 | scott@example.com | hash4 | 1 | Tuesday | 4 |
| 5 |  | Hash12 | 2 | 2024-04-01 | 1000 |
67. Use Correct Column Types
• Numeric: INT, TINYINT, BIGINT, FLOAT, REAL, etc.
• Date/Time: DATE, TIME, DATETIME, etc.
• String: CHAR, VARCHAR, TEXT, etc.
• Binary: BLOB, etc.
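A date column makes "who was hired in 2023?" a simple range query. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no native DATE type, so the sketch stores ISO-8601 text and rejects anything else with a CHECK constraint; in MySQL you would declare the column as DATE. It also assumes US month/day order for the slide's "1/1/2024"-style values and reads "May 11th, 23" as 2023-05-11.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
# date(x) returns x unchanged only for a valid YYYY-MM-DD value (NULL otherwise),
# so this CHECK stands in for a real DATE column type.
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        hire_date TEXT NOT NULL CHECK (hire_date IS date(hire_date))
    )
    """
)


def to_iso(value, fmt):
    """Reformat a messy date string once, on the way in."""
    return datetime.strptime(value, fmt).date().isoformat()


conn.executemany(
    "INSERT INTO users (email, hire_date) VALUES (?, ?)",
    [
        ("alice@example.com", to_iso("1/1/2024", "%m/%d/%Y")),
        ("avery@example.com", to_iso("8/11/2024", "%m/%d/%Y")),
        ("scott@example.com", "2023-05-11"),  # assumed reading of "May 11th, 23"
    ],
)

# Free-form text like "Tuesday" is now rejected by the database itself.
try:
    conn.execute(
        "INSERT INTO users (email, hire_date) VALUES ('x@example.com', 'Tuesday')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True

# "Everyone hired in 2023" becomes a range query.
hired_2023 = conn.execute(
    "SELECT email FROM users WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31'"
).fetchall()
print(hired_2023)  # [('scott@example.com',)]
```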
75. Use NOT NULL for Required Fields
mysql> insert into users (password) values ('just a password?');
ERROR 1364 (HY000): Field 'email' doesn't have a default value
77. Use NOT NULL for Required Fields
mysql> insert into users (email, password) values ('s@s', 'just a password?');
ERROR 1364 (HY000): Field 'active' doesn't have a default value
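The same two inserts can be replayed against a table whose required columns are NOT NULL with no default. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; treating password as required is an assumption of this sketch. SQLite words the error as "NOT NULL constraint failed" rather than MySQL's error 1364, which MySQL raises when strict SQL mode is on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Required columns are NOT NULL with no default, so leaving them out fails
# in the database instead of slipping through as blanks.
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        password TEXT NOT NULL,
        active INTEGER NOT NULL
    )
    """
)


def try_insert(sql, params):
    """Run an INSERT; return None on success or the error message on failure."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


missing_email = try_insert(
    "INSERT INTO users (password) VALUES (?)", ("just a password?",)
)
missing_active = try_insert(
    "INSERT INTO users (email, password) VALUES (?, ?)", ("s@s", "just a password?")
)
complete = try_insert(
    "INSERT INTO users (email, password, active) VALUES (?, ?, ?)", ("s@s", "hash", 1)
)
print(missing_email)   # NOT NULL constraint failed: users.email
print(missing_active)  # NOT NULL constraint failed: users.active
print(complete)        # None
```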
85. Use Foreign Keys For References To Other Tables
| id | name | phone | city | zip_id |
|---|---|---|---|---|
| 1 | Main Office | 555-555-5555 | Saginaw | 48609 |
| 2 | main office | 5555555555 | Saginaw | 48609 |
| 3 | Man office | (555)555-5555 | Saginaw | 48609 |
| 4 | Main | 555/555/5555 | Saginaw | 48609 |
86. Use Foreign Keys For References To Other Tables
| id | name | phone | city | zip_id |
|---|---|---|---|---|
| 1 | Main Office | 555-555-5555 | Saginaw | 48609 |
| 2 | main office | 5555555555 | Saginaw | 48609 |
| 3 | Man office | (555)555-5555 | Saginaw | 48609 |
87. Use Foreign Keys For References To Other Tables
| id | name | phone | city | zip_id |
|---|---|---|---|---|
| 1 | Main Office | 555-555-5555 | Saginaw | 48609 |
| 2 | main office | 5555555555 | Saginaw | 48609 |
88. Use Foreign Keys For References To Other Tables
| id | name | phone | city | zip_id |
|---|---|---|---|---|
| 1 | Main Office | 555-555-5555 | Saginaw | 48609 |
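A foreign key stops the deletions above from orphaning users. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` on each connection, whereas MySQL's InnoDB engine enforces them by default. The two-office data set is a trimmed, hypothetical version of the slide's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE offices (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        office_id INTEGER NOT NULL REFERENCES offices(id)
    );
    INSERT INTO offices (id, name) VALUES (1, 'Main Office'), (2, 'main office');
    INSERT INTO users (id, email, office_id) VALUES
        (1, 'alice@example.com', 1),
        (2, 'avery@example.com', 2);
    """
)


def attempt(sql):
    """Run one statement; return 'ok' or the constraint error message."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Deleting a "duplicate" office that a user still points at is refused...
blocked_delete = attempt("DELETE FROM offices WHERE id = 2")
# ...as is assigning a user to an office that does not exist.
blocked_update = attempt("UPDATE users SET office_id = 1000 WHERE id = 1")
# Re-point the user first, and then the cleanup delete succeeds.
repoint = attempt("UPDATE users SET office_id = 1 WHERE id = 2")
cleanup = attempt("DELETE FROM offices WHERE id = 2")

print(blocked_delete)    # FOREIGN KEY constraint failed
print(blocked_update)    # FOREIGN KEY constraint failed
print(repoint, cleanup)  # ok ok
```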
143. What You Need to Know
1. Normalize Your Database For Data Deduplication
2. Use The Database Engine to Keep Data Clean
3. Proactively Add Indexes to Keep Queries Performant
144. What You Need to Know
1. The table contains a unique identifier, also called the primary key, that is used to identify the row.
2. Each column contains atomic values (values that cannot be broken down).
3. All non-key columns depend on the whole primary key of the table.
4. Every non-key column depends on the primary key directly, not transitively through another non-key column.
145. What You Need to Know
• Make the DB Work With You
• Correct Column Types
• NOT NULL for Required Fields
• UNIQUE for Unique Values
• Foreign Keys For References To Other Tables
• Triggers For Complex Requirements
146. What You Need to Know
• Use indexes on commonly searched columns
• Start simple
• See recorded talks about how to add
Ask people for photos
Good morning all!
2 Shocking facts
Anyone else in this boat
Like to think I’m good at working with DBs
Know that there’s always something to learn
Early in my journey: just threw data into it and it spit it back
Might have been magic
Core piece of technology that I don’t understand
<slide>
Didn’t have a more senior level developer who could mentor
So I had to figure it out
Not necessarily a bad thing because that’s how I work best
Push a new feature
users are initially happy
But as usage grows we start finding problems
Angry people
Customers
My boss
Not ideal
Results in me fixing things under distress
Night/weekends
Once in a bathroom at a holiday inn
My goal is to have you learn from my trauma
For those of you who haven’t met me my name is …
Professional PHP Developer for 16 years
// team lead/CTO role for 11 of those 16
Currently Director of Technology at WeCare Connect
Use PHP and mysql for our backend
Also …
That being said, my goal for today is to give you <slide>
These are the rules I give new hires so they can understand our team's design
So we have to figure it out ourselves
All of these rules exist to prevent bugs or performance problems
Like examples
Today’s example is <slide> from a project
Initial version of this database as it existed when we took over project
Track everything using a string
Only going to talk about the first four forms today as the others are hard to understand and demo
1NF and 2NF give us a huge bang for our buck, and we start looking at diminishing returns around 3NF
A table doesn’t meet any of the conditions of normalization
Essentially a spreadsheet
The table contains a unique identifier, also called the primary key, that is used to identify the row.
Make it an auto-incrementing primary key so the database knows how to handle it
Each column contains atomic values (values that cannot be broken down)
To solve this we need to create another table
A lot of normalization is fixed with more tables
Now in 1NF
Still a lot of duplication and mismatched data
Review users table
Three sections
Primary key
Second section - all related to that
Third section - not related
First X columns are dependent
Fix? It’s a new table
Create offices
Link our offices table to the users table
Drop all the office columns
2NF
When columns are transitively dependent, one column's data relies on another column through a third column. For example, our offices' city column depends on the zip column, which depends on the office's id.
To fix this, we'll split the zip out into a new table.
As many validation rules as possible
<slide has a bunch>
Not going to prevent lazy me
Right to the DB
This is just hiding future bugs, and we want to prevent that
<slide>
Not the other way around
Let’s start with one of the most basic constraints
Looking back at our users table
Still issues: blank emails, date problem, duplicate emails, deleted sites problem
We can and should enforce rules at application level but …
Next thing: weird dates
Want dates in the “correct” format
Right now if someone asks for all the employees hired in 2023 getting that information will be a challenge
Especially the person who starts on Tuesday
List of all the types in mysql
SQL has a ton of types to best fit our needs
Switch this column to a date
Reformat a little and we get consistent values
Now it's easy to find everyone who started in 2023
Might have "required" on the form field, but
Show insert missing email
Embrace NOT NULL for required columns
Example insert 2 users with same email and password
Embrace unique constraints
Allows us to specify that this column is unique
Good for things we never ever want to see two of; email is the best example
Can specify multiple columns for uniqueness
Example: multi-tenant database could support email address uniqueness per office
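Both kinds of uniqueness, a single column and a combination of columns, can be declared in the schema. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the `tenant_users` table is a hypothetical multi-tenant layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Single-column rule: an email may appear only once in the table.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    -- Multi-column rule (hypothetical multi-tenant table): the same email
    -- may exist in different offices, but only once per office.
    CREATE TABLE tenant_users (
        id INTEGER PRIMARY KEY,
        office_id INTEGER NOT NULL,
        email TEXT NOT NULL,
        UNIQUE (office_id, email)
    );
    """
)


def insert(sql, params):
    """Run an INSERT; return 'ok' or the constraint error message."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


first = insert("INSERT INTO users (email) VALUES (?)", ("scott@example.com",))
duplicate = insert("INSERT INTO users (email) VALUES (?)", ("scott@example.com",))
print(first)      # ok
print(duplicate)  # UNIQUE constraint failed: users.email

sql = "INSERT INTO tenant_users (office_id, email) VALUES (?, ?)"
office_1 = insert(sql, (1, "scott@example.com"))  # ok
office_2 = insert(sql, (2, "scott@example.com"))  # ok: different office
office_1_again = insert(sql, (1, "scott@example.com"))
print(office_1_again)  # UNIQUE constraint failed: tenant_users.office_id, tenant_users.email
```

One difference to keep in mind: SQLite compares text case-sensitively by default, so "Scott@example.com" would not collide here, while MySQL's default collations are case-insensitive.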
Gave users access to clean up offices
So they started deleting the duplicates
Deleted locations so the values don’t match
This table is using a join which is breaking the results
Users assigned to locations that no longer exist
Users that belong to non-existent offices
Need some way to say what’s valid
Allow us to define the relationship of one column to another table
Performance
Not enough that I ever worry about it, but each FK requires lookups
“Magic” according to some developers
Active column can accept any integer value
I also like this for complex requirements that a standard column doesn’t cover
Ex: if a row is one type, different fields must be NOT NULL
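Both of those rules (an active flag that only allows 0 or 1, and fields that are required only for one type of row) can be enforced by a trigger. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, where the equivalent trigger would use `SIGNAL SQLSTATE '45000'`; the `user_type` column and its values are hypothetical. Only inserts are covered here, so a matching BEFORE UPDATE trigger would be needed as well, and a CHECK constraint could handle the simpler active rule on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        active INTEGER NOT NULL,
        user_type TEXT NOT NULL,   -- hypothetical: 'employee' or 'contractor'
        hire_date TEXT
    );

    -- Reject rows that break rules a plain column definition can't express.
    CREATE TRIGGER users_validate_insert
    BEFORE INSERT ON users
    BEGIN
        SELECT RAISE(ABORT, 'active must be 0 or 1')
        WHERE NEW.active NOT IN (0, 1);
        SELECT RAISE(ABORT, 'employees need a hire_date')
        WHERE NEW.user_type = 'employee' AND NEW.hire_date IS NULL;
    END;
    """
)


def insert(row):
    """Insert (email, active, user_type, hire_date); return 'ok' or the error."""
    try:
        conn.execute(
            "INSERT INTO users (email, active, user_type, hire_date) VALUES (?, ?, ?, ?)",
            row,
        )
        return "ok"
    except sqlite3.DatabaseError as exc:
        return str(exc)


print(insert(("a@example.com", 1, "employee", "2024-01-01")))  # ok
print(insert(("b@example.com", 2, "employee", "2024-01-01")))  # active must be 0 or 1
print(insert(("c@example.com", 1, "employee", None)))          # employees need a hire_date
print(insert(("d@example.com", 1, "contractor", None)))        # ok
```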
Indexes In Life
I love to cook
Love to try new recipes
Leftover food from recipe
Now get a neural network to figure out
But could use cookbooks
Option 1
Go through every page looking for matches
Slow as most don’t meet our criteria
Option 2
Go to the back of the book, to the index, and look up ingredients
Use that to look up recipes
Much faster
Same
Database is going to look at every row
Fine when you have 100 users
Slow when you have 10 million
We’re going to use indexes to tell the database common things we’re going to query on
<click>
For example, I’m going to search commonly on email and active so that’s a prime candidate
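The cookbook-index difference is visible in the query plan. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (whose EXPLAIN serves the same purpose); the table size and the index name `idx_users_email_active` are hypothetical. It asks SQLite how it would run the email-and-active lookup before and after adding a composite index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, active INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (email, active) VALUES (?, ?)",
    [(f"user{i}@example.com", i % 2) for i in range(1000)],
)

query = "SELECT id FROM users WHERE email = ? AND active = ?"
params = ("user42@example.com", 0)


def plan(sql, args):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, args)]


# Without an index, SQLite reports a SCAN: every row is checked.
before = plan(query, params)

# Composite index on the columns we filter by most often.
conn.execute("CREATE INDEX idx_users_email_active ON users (email, active)")

# Now the plan is a SEARCH that goes through idx_users_email_active.
after = plan(query, params)

print(before)
print(after)
```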
All of these rules exist to prevent bugs or performance problems