How to Fake a Database Design

How do I spell “normalization”?
OSCON 2014
Curtis "Ovid" Poe
http://allaroundtheworld.fr/
Copyright 2014, http://www.allaroundtheworld.fr/
March 18, 2022

Good Database Schemas
• Generally normalized
• Denormalized only as necessary
• No duplicate data

Typical Developer Schemas
• A steaming pile of ones and zeros
• … with a “family friendly” background
Source: http://commons.wikimedia.org/wiki/File:Spaghetti-prepared.jpg

Database Normalization
• Remove redundancy
• Create logical relations
• Decompose data into atomic elements

Only Covering 3NF
1. Remove repeating groups of data
2. Remove partial key dependencies
3. Remove data unrelated to key

How to Feel Stupid
“It is shown that if a relation schema is in third normal form and
every key is simple, then it is in projection-join normal form
(sometimes called fifth normal form), the ultimate normal form
with respect to projections and joins.”
Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases, C. J. Date and Ronald Fagin
http://commons.wikimedia.org/wiki/File:%22I_should_have_gone_to_the_pro_station%22_-_NARA_-_514564.tif

‘Nuff of that – Let’s Get Started
I’m going to discuss “how”, not “why”,
because I only have 50 minutes.

Faking a Database Design
• Forget everything you know about Excel
• Focus on nouns (sort of)
• Duplicate data is a design flaw

Real-World Problem
• Client wanted a rewrite of recipes site
• They sent us their Access (!) database
• Main objects:
– customers
– recipes
– orders

Our “DBA” Said This Was OK

Our “DBA” also lost his job shortly thereafter

Back to the plot …
• Customers
• Orders
• Recipes

Nouns == Tables(*)

Rule #1
1. Nouns == tables

What’s with the customer_id?

It’s a foreign key
One-to-many relationship

Our DDL (Data Definition Language)
CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date  TIMESTAMP WITH TIME ZONE NOT NULL,
    FOREIGN KEY (customer_id)
        REFERENCES customers(customer_id)
);

Rule #2
1. Nouns == tables
2. Another table’s ID must have an FK constraint
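A minimal sketch of what the FK constraint buys you. It uses Python's built-in sqlite3 module rather than the PostgreSQL-style DDL on the slide (an assumption, made so it runs anywhere), and the customers table with its name column is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to (PostgreSQL always does).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL          -- hypothetical column
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date  TEXT NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    );
""")

conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (customer_id, order_date) VALUES (1, '2014-07-22')")


def order_for_missing_customer_is_rejected():
    """Try to insert an order for a customer that does not exist."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, order_date) VALUES (999, '2014-07-22')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


rejected = order_for_missing_customer_is_rejected()
order_count = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
```

Without the constraint, the second insert would succeed and leave an order that belongs to nobody.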

Oh dog, no!

But “What if”?
1. fettuccinne
2. fettuchini
3. fettucini
4. fettucinne
5. fetuchine
6. fetuchinney
7. fetuchinni
8. fetucine
9. fetucini
10. fetucinni
https://www.flickr.com/photos/ykjc9/3485366680/sizes/l

Searching
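SELECT recipe_id, name FROM recipes
 WHERE ingredient1 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
                        'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    OR ingredient2 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
                        'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    -- ... the same list repeated for ingredient3 through ingredient7 ...
    OR ingredient8 IN ( 'fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine', 'fetuchinney',
                        'fetuchinni', 'fetucine', 'fetucini', 'fetucinni');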

It’s “fettuccine”, in case you were wondering

Searching
SELECT recipe_id, name FROM recipes
 WHERE ingredient1 = 'fettuccine'
    OR ingredient2 = 'fettuccine'
    -- ... and so on for ingredient3 through ingredient7 ...
    OR ingredient8 = 'fettuccine';

Ingredients Table

Rule #3
1. Nouns == tables
2. Another table’s ID must have an FK constraint
3. Lists of things get their own table

Lookup Table
Many-to-many relationship

Searching
SELECT r.recipe_id, r.name
  FROM recipes r
  JOIN recipes_ingredients ri ON ri.recipe_id = r.recipe_id
  JOIN ingredients i ON i.ingredient_id = ri.ingredient_id
 WHERE i.name = 'fettuccine';

CREATE TABLE recipes_ingredients (
    recipe_ingredient_id SERIAL PRIMARY KEY,
    recipe_id            INTEGER NOT NULL,
    ingredient_id        INTEGER NOT NULL,
    UNIQUE(recipe_id, ingredient_id),
    FOREIGN KEY (recipe_id)
        REFERENCES recipes(recipe_id),
    FOREIGN KEY (ingredient_id)
        REFERENCES ingredients(ingredient_id)
);

CREATE TABLE recipes_ingredients (
    recipe_id     INTEGER NOT NULL,
    ingredient_id INTEGER NOT NULL,
    PRIMARY KEY (recipe_id, ingredient_id),
    FOREIGN KEY (recipe_id)
        REFERENCES recipes(recipe_id),
    FOREIGN KEY (ingredient_id)
        REFERENCES ingredients(ingredient_id)
);

Rule #4
1. Nouns == tables
2. Another table’s ID must have an FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
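Here is a rough, runnable sketch of the lookup-table approach. It uses Python's built-in sqlite3 module in place of PostgreSQL (an assumption), and the sample recipes and ingredients are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
    CREATE TABLE recipes (
        recipe_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE ingredients (
        ingredient_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL UNIQUE   -- one spelling, stored once
    );
    CREATE TABLE recipes_ingredients (
        recipe_id     INTEGER NOT NULL,
        ingredient_id INTEGER NOT NULL,
        PRIMARY KEY (recipe_id, ingredient_id),
        FOREIGN KEY (recipe_id)     REFERENCES recipes(recipe_id),
        FOREIGN KEY (ingredient_id) REFERENCES ingredients(ingredient_id)
    );
    -- Sample data (made up)
    INSERT INTO recipes VALUES (1, 'Fettuccine Alfredo'), (2, 'Garlic Bread');
    INSERT INTO ingredients VALUES (1, 'fettuccine'), (2, 'butter'), (3, 'garlic');
    INSERT INTO recipes_ingredients VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")


def recipes_using(ingredient_name):
    """Return the names of all recipes that use the given ingredient."""
    rows = conn.execute(
        """
        SELECT r.name
          FROM recipes r
          JOIN recipes_ingredients ri ON ri.recipe_id = r.recipe_id
          JOIN ingredients i ON i.ingredient_id = ri.ingredient_id
         WHERE i.name = ?
         ORDER BY r.name
        """,
        (ingredient_name,),
    ).fetchall()
    return [name for (name,) in rows]


# The composite primary key stops the same pair being linked twice.
try:
    conn.execute("INSERT INTO recipes_ingredients VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

One recipe can have many ingredients and one ingredient can appear in many recipes, yet the search stays a single equality test no matter how many ingredients a recipe has.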

So How Do We Order Recipes?

Orders With Recipes

How Many of Which Ingredient?

Our simple “customers”, “orders”, and “recipes”
database has grown to seven tables.
And it will keep growing.

So Far
• Every noun has its own table (*)
• Lookup tables join related tables
• And generally have some kind of unique constraint
• Other tables’ IDs have foreign key constraints

Database Tips
• We’ve covered the main rules
• They only cover structure
• Now to dive deeper

Equality ≠ Identity
• No duplication == not duplicating identity
• Are identical twins the same person?
• Are two guys named “John” the same guy?
• This is important and easy to get wrong
• For example …

How do you get the total of an order?
• Assume each recipe has a price
• Store total in the order? (hint: no)
• Store price on the recipe? (hint: yes)
• Is that enough?

Orders Total

Calculating the Order Total?
SELECT o.order_id, sum(r.price)
  FROM orders o
  JOIN orders_recipes orr
    ON orr.order_id = o.order_id
  JOIN recipes r
    ON r.recipe_id = orr.recipe_id
 GROUP BY o.order_id;

What if the price changes?

Calculating the Order Total
SELECT o.order_id, sum(orr.price)
  FROM orders o
  JOIN orders_recipes orr
    ON orr.order_id = o.order_id
 GROUP BY o.order_id;

Equality is not Identity
• Order item price isn’t item price
• What if the item price changes?
• What if you give a discount on the order item?
• A subtle, but common bug

Rule #5
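1. Nouns == tables
2. Another table’s ID must have an FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical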
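A small sketch of why the order item needs its own price. It assumes SQLite via Python's sqlite3 module, prices stored as integer cents, and made-up sample data; none of that comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (
        recipe_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        price     INTEGER NOT NULL           -- current price, in cents
    );
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE orders_recipes (
        order_id  INTEGER NOT NULL REFERENCES orders(order_id),
        recipe_id INTEGER NOT NULL REFERENCES recipes(recipe_id),
        price     INTEGER NOT NULL,          -- price charged on this order, in cents
        PRIMARY KEY (order_id, recipe_id)
    );
    INSERT INTO recipes VALUES (1, 'Fettuccine Alfredo', 500), (2, 'Garlic Bread', 300);
    INSERT INTO orders VALUES (1);
    -- Copy the price at the moment the order is placed.
    INSERT INTO orders_recipes (order_id, recipe_id, price)
        SELECT 1, recipe_id, price FROM recipes;
""")


def total_from_order_items(order_id):
    """Sum the prices recorded on the order itself."""
    return conn.execute(
        "SELECT sum(price) FROM orders_recipes WHERE order_id = ?", (order_id,)
    ).fetchone()[0]


def total_from_current_recipe_prices(order_id):
    """Sum whatever the recipes cost right now (the buggy approach)."""
    return conn.execute(
        """
        SELECT sum(r.price)
          FROM orders_recipes orr
          JOIN recipes r ON r.recipe_id = orr.recipe_id
         WHERE orr.order_id = ?
        """,
        (order_id,),
    ).fetchone()[0]


total_when_ordered = total_from_order_items(1)
conn.execute("UPDATE recipes SET price = 700 WHERE recipe_id = 1")  # price goes up later
total_from_items = total_from_order_items(1)
total_from_recipes = total_from_current_recipe_prices(1)
```

After the price change, the total taken from the order items still matches what the customer paid, while the total computed from the recipes table has drifted.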

Naming
• Names are important
• Identical columns should have identical names
• Names should hint at use

Bad Naming
SELECT name, 'too cold'
FROM areas
WHERE temperature < 32;

ID Names
orders.order_id
versus
orders.id

ID Names
SELECT o.id, sum(r.price)
  FROM orders o
  JOIN orders_recipes orr
    ON orr.order_id = o.id
  JOIN recipes r
    ON r.id = o.id   -- joins a recipe id to an order id; easy to miss
 GROUP BY o.id

Conceptually Similar to …
SELECT name
FROM customer
WHERE id > weight;

ID Names
SELECT thread.*
FROM email thread
JOIN email selected ON selected.id = thread.id
JOIN character recipient ON recipient.id = thread.recipient_id
JOIN station_area sa ON sa.id = recipient.id
JOIN station st ON st.id = sa.id
JOIN star origin ON origin.id = thread.id
JOIN star destination ON destination.id = st.id
LEFT JOIN route
ON ( route.from_id = origin.id AND route.to_id = destination.id )
WHERE selected.id = ?
AND ( thread.sender_id = ?
OR ( thread.recipient_id = ?
AND ( origin.id = destination.id
OR ( route.distance IS NOT NULL
AND
now() >= thread.datesent
+ ( route.distance * interval '30 seconds' )
))))
ORDER BY datesent ASC, thread.parent_id ASC NULLS FIRST

Rule #6
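1. Nouns == tables
2. Another table’s ID must have an FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical
6. Name columns as descriptively as possible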

Summary
• Nouns == tables (*)
• FK constraints
• Proper naming is important
• Your DBAs will thank you
• Your apps will be more robust

?
http://www.slideshare.net/ovid/

Bonus Slides!
Super-duper important stuff I wasn’t
sure I had time to cover because it’s
going to make your head hurt.

Avoid NULL Values
• Every column should have a type
• NULLs, by definition, are unknown values
• Thus, their type is unknown
• But … every column should have a type?

Our employees Table
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
name CHARACTER VARYING(255) NOT NULL,
salary MONEY NULL
);

Giving Bonuses
• $1,000 bonus to all employees
• … if they make less than $40,000/year

Get Employees For Bonus
SELECT employee_id, name
  FROM employees
 WHERE salary < 40000;

Bad SQL
• Won’t return anyone with a NULL salary
• Why is the salary NULL?
– What if it’s confidential?
– What if they’re a contractor and in that table?
– What if they’re an unpaid intern?
– What if it’s unknown when the data was entered?
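To see the problem for yourself, here is a minimal sketch using Python's built-in sqlite3 module (an assumption; the slides use PostgreSQL, but the NULL comparison behaves the same way). The three employees are made up, and salary is a plain integer here instead of MONEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        salary      INTEGER NULL
    );
    -- Sample data (made up); Carol's salary is unknown.
    INSERT INTO employees (name, salary)
    VALUES ('Alice', 35000), ('Bob', 50000), ('Carol', NULL);
""")


def names_where(condition):
    """Return employee names matching a fixed SQL condition, sorted by name."""
    rows = conn.execute(
        f"SELECT name FROM employees WHERE {condition} ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


gets_bonus = names_where("salary < 40000")
no_bonus = names_where("salary >= 40000")
unknown = names_where("salary IS NULL")
```

Carol shows up in neither the bonus list nor the no-bonus list, so the two "opposite" queries together do not cover the whole table.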

NULLs tell you nothing
suppliers table:
  supplier_id  city
  s1           'London'
parts table:
  part_id  city
  p1       NULL
Example via “Database In Depth” by C.J. Date

parts table:
  part_id  city
  p1       NULL
SELECT part_id
  FROM parts;         -- returns p1
SELECT part_id
  FROM parts
 WHERE city = city;   -- returns nothing: NULL = NULL is not true

suppliers table:
  supplier_id  city
  s1           'London'
parts table:
  part_id  city
  p1       NULL
SELECT s.supplier_id, p.part_id
  FROM suppliers s, parts p
 WHERE p.city <> s.city    -- can’t compare NULL
    OR p.city <> 'Paris';  -- can’t compare NULL

NULLs tell you lies
SELECT s.supplier_id, p.part_id
  FROM suppliers s, parts p
 WHERE p.city <> s.city    -- can’t compare NULL
    OR p.city <> 'Paris';  -- can’t compare NULL
• We get no rows because we can’t compare a NULL city
• The unknown city is Paris or it isn't.
• If it’s Paris, the first condition is true
• If it’s not Paris, the second condition is true
• Thus, the WHERE clause must be true, but it’s not
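The argument above can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module rather than PostgreSQL, and the helper name pairs_when_part_city_is is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (supplier_id TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE parts     (part_id     TEXT PRIMARY KEY, city TEXT);
    INSERT INTO suppliers VALUES ('s1', 'London');
    INSERT INTO parts     VALUES ('p1', NULL);
""")


def pairs_when_part_city_is(city):
    """Set p1's city (None means NULL), then run the query from the slide."""
    conn.execute("UPDATE parts SET city = ? WHERE part_id = 'p1'", (city,))
    return conn.execute(
        """
        SELECT s.supplier_id, p.part_id
          FROM suppliers s, parts p
         WHERE p.city <> s.city
            OR p.city <> 'Paris'
        """
    ).fetchall()


# Any real city gives one row; only NULL gives none. NULL is tried last,
# so p1's city is NULL again for the final query below.
results = {city: pairs_when_part_city_is(city) for city in ("Paris", "Rome", None)}

# With city back to NULL, even "city = city" matches nothing.
self_equal = conn.execute("SELECT part_id FROM parts WHERE city = city").fetchall()
```

Whatever city you plug in, the query returns the pair; only the NULL makes it come back empty.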

Rule #7
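1. Nouns == tables
2. Another table’s ID must have an FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical
6. Name columns as descriptively as possible
7. Avoid NULL columns like the plague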
