[Diagram: a relation (table, view, rowset) is a set of tuples (rows). Each tuple is made up of attributes, and each attribute takes its value from a domain (type) such as INT, TEXT, or DATE.]
Simple User Table
● name (text)
● email (text)
● login (text)
● password (text)
● status (char) (user, inactive, admin)
[Diagram: Users, Admins]
Non-Atomic Attributes
● name (text)
● email (text)
● login (text)
● password (text)
● status (char) (user, inactive, admin)
[Diagram: Users, Admins]
What's Atomic?
The simplest form of a datum, which is not divisible without loss of information.

name: Josh Berkus
    SELECT SUBSTR(name, STRPOS(name, ' ')) ...

status: ai
    … WHERE status = 'a' ???
    … WHERE status = 'u' or ...
What's Atomic?
The simplest form of a datum, which is not divisible without loss of information.

first_name  last_name
Josh        Berkus

active  access
TRUE    a
Atomic, Shmomic. Who Cares?
● Atomic Values:
  – retain data
  – make joins easier
  – make constraints easier
● Non-atomic Values:
  – make data loss more likely
  – increase CPU usage
  – make you more likely to forget something
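A minimal sketch of why atomic columns make queries easier, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows and the helper name active_admin_last_names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        first_name TEXT,
        last_name  TEXT,
        active     BOOLEAN,
        access     TEXT
    )
""")
# Illustrative rows only; they are not taken from a real table.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [("Josh", "Berkus", True, "a"), ("Selena", "Deckelman", False, "u")],
)

def active_admin_last_names(db):
    """No SUBSTR/STRPOS parsing and no guessing at status codes:
    each question maps onto exactly one column."""
    rows = db.execute(
        "SELECT last_name FROM users "
        "WHERE active AND access = 'a' ORDER BY last_name"
    )
    return [r[0] for r in rows]

admins = active_admin_last_names(conn)
```

With a combined status column, the same question needs a rule for every code that happens to include "admin", and a new code silently breaks old queries.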
Splitting a Bulletin Board Thread the hard way
INSERT INTO threads VALUES ( .... );
if $dbh(success) then
    for $these_posts.date > $cutdate loop
        UPDATE posts SET thread = $newthread
            WHERE id = $these_posts.id;
        if not $dbh(success) then
            for $these_posts.id > $last_id loop
                UPDATE posts SET thread = $oldthread
                    WHERE id = $these_posts.id;
            DELETE FROM threads WHERE id = $newthread;
Splitting a Bulletin Board Thread the transactional way
BEGIN;
    INSERT INTO threads VALUES ( .... );
    $newthread = currval( .... );
    UPDATE posts SET thread = $newthread
        WHERE thread = $oldthread
        AND date > $cutdate;
END;
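The transactional version can be sketched as runnable code. This uses Python's built-in sqlite3 module rather than PostgreSQL, so cursor.lastrowid stands in for the sequence lookup. The table layout and the function name split_thread are assumptions, not part of the original slides.

```python
import sqlite3

def split_thread(db, old_thread, cut_date, new_title):
    """Move posts newer than cut_date into a new thread, all or nothing."""
    with db:  # commits on success, rolls back if any statement raises
        cur = db.execute("INSERT INTO threads (title) VALUES (?)", (new_title,))
        new_thread = cur.lastrowid
        db.execute(
            "UPDATE posts SET thread = ? WHERE thread = ? AND date > ?",
            (new_thread, old_thread, cut_date),
        )
    return new_thread

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE posts (
        id     INTEGER PRIMARY KEY,
        thread INTEGER NOT NULL REFERENCES threads (id),
        date   TEXT NOT NULL
    );
    INSERT INTO threads (id, title) VALUES (1, 'Dinner?');
    INSERT INTO posts (thread, date) VALUES
        (1, '2009-05-02 09:28'),
        (1, '2009-05-02 09:37'),
        (1, '2009-05-02 09:44');
""")

new_id = split_thread(conn, 1, "2009-05-02 09:30", "Dinner, part 2")
```

There is no loop and no hand-written undo logic: if the UPDATE fails, the new thread row disappears with it.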
Problem 2: Duplicate Data
[Figure: the same person stored under different spellings in the name, user_name, and admin columns: "Josh Berkus", "Joshua Berkus", "Berkus, Josh", "Josh", "Berkus".]
A Good Key
● Should have to be unique because the application requires it to be.
● Expresses a unique predicate which describes the tuple (row):
  – user with login “jberkus”
  – post from “jberkus” on “2009-05-02 13:41:22” in thread “Making your own wine”
● If you can't find a good key, your table design is missing data.
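As a rough illustration, the two predicates above can be declared as keys so the database enforces them. This sketch uses Python's sqlite3 module in place of PostgreSQL; the column names and the helper try_insert are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "user with login X": the login alone identifies the row
    CREATE TABLE users (
        login TEXT PRIMARY KEY
    );
    -- "post from X on T in thread Y": three attributes form the key
    CREATE TABLE posts (
        login  TEXT NOT NULL,
        posted TEXT NOT NULL,
        thread TEXT NOT NULL,
        PRIMARY KEY (login, posted, thread)
    );
""")

def try_insert(db, sql, params):
    """Return True if the row was stored, False if a key rejected it."""
    try:
        with db:
            db.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

post = ("jberkus", "2009-05-02 13:41:22", "Making your own wine")
first = try_insert(conn, "INSERT INTO posts VALUES (?, ?, ?)", post)
duplicate = try_insert(conn, "INSERT INTO posts VALUES (?, ?, ?)", post)
```

The second insert is refused because it describes the same post, which is exactly the kind of duplicate the key is meant to catch.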
Abby Normal

login    level  last_name
jberkus  u      Berkus
selena   a      Deckelman

login    title    posted  level
jberkus  Dinner?  09:28   u
selena   Dinner?  09:37   u
jberkus  Dinner?  09:44   a
How can I be “Normal”?
1. Each piece of data only appears in one relation
   – except as a “foreign key” attribute
● No “repeated” attributes

login    level  last_name
jberkus  u      Berkus
selena   a      Deckelman

login    title    posted
jberkus  Dinner?  09:28
selena   Dinner?  09:37
jberkus  Dinner?  09:44
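A small sketch of the normalized layout in use, again with Python's sqlite3 module standing in for PostgreSQL. The level lives only in users, and posts reach it through the login foreign key; the helper posts_with_level is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (login TEXT PRIMARY KEY, level TEXT, last_name TEXT);
    CREATE TABLE posts (
        login  TEXT REFERENCES users (login),
        title  TEXT,
        posted TEXT
    );
    INSERT INTO users VALUES
        ('jberkus', 'u', 'Berkus'),
        ('selena',  'a', 'Deckelman');
    INSERT INTO posts VALUES
        ('jberkus', 'Dinner?', '09:28'),
        ('selena',  'Dinner?', '09:37'),
        ('jberkus', 'Dinner?', '09:44');
""")

def posts_with_level(db):
    """Look the level up through the key instead of copying it into posts."""
    return db.execute("""
        SELECT p.login, p.posted, u.level
        FROM posts AS p
        JOIN users AS u ON u.login = p.login
        ORDER BY p.posted
    """).fetchall()

before = posts_with_level(conn)

# One UPDATE in one place; every post sees the new level.
with conn:
    conn.execute("UPDATE users SET level = 'a' WHERE login = 'jberkus'")

after = posts_with_level(conn)
```

In the denormalized version the same change would need an update per post, and missing one produces the contradictory rows shown on the "Abby Normal" slide.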
Problem 3: Wrong Data
Users Run Amok
● first_name (text)
● last_name (text)
● email (text)
● login (text)
● password (text)
● active (boolean)
● access (char)
Ensure Your Data is Consistent no matter where it came from

first_name  last_name  email                           login    password    active  level
Josh        Berkus     firstname.lastname@example.org  jberkus  jehosaphat  TRUE    a
NULL        Kelley     kelley@ucb                      k        NULL        TRUE    u
Mark        Twain      www.pm.org                      samuel   halleys     NULL    I
S           F          email@example.com               gavin    twitter     FALSE   x
Users Under Constraint
● first_name (text) length() > 1
● last_name (text) length() > 1
● email (text) ILIKE '%@%.%'
● login (text) length() > 5
● password (text) length() > 5
● active (boolean) NOT NULL
● access (char) IN ('a', 'u')
note: email and other validators would, of course, be more complex
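The constraints above could be written as CHECK clauses roughly like this. The sketch runs on Python's sqlite3 module, so LIKE (case-insensitive for ASCII in SQLite) stands in for PostgreSQL's ILIKE. The sample rows follow the "wrong data" table from the earlier slide, and the helper name accepts is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        first_name TEXT    CHECK (length(first_name) > 1),
        last_name  TEXT    CHECK (length(last_name) > 1),
        email      TEXT    CHECK (email LIKE '%@%.%'),
        login      TEXT    CHECK (length(login) > 5),
        password   TEXT    CHECK (length(password) > 5),
        active     BOOLEAN NOT NULL,
        access     TEXT    CHECK (access IN ('a', 'u'))
    )
""")

def accepts(db, row):
    """Return True if the table accepts the row, False if a constraint fires."""
    try:
        with db:
            db.execute("INSERT INTO users VALUES (?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

rows = [
    ("Josh", "Berkus", "firstname.lastname@example.org", "jberkus", "jehosaphat", True, "a"),
    (None, "Kelley", "kelley@ucb", "k", None, True, "u"),
    ("Mark", "Twain", "www.pm.org", "samuel", "halleys", None, "I"),
    ("S", "F", "email@example.com", "gavin", "twitter", False, "x"),
]
results = [accepts(conn, row) for row in rows]
```

One caveat worth knowing: a CHECK does not reject NULL on its own, so a column that must be present also needs NOT NULL, as active has here.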
Foreign Keys
[Diagram: the login attribute links Users to Posts and Admins.]
Posts Table
● title (text) NOT NULL REFERENCES threads ( title ) ON DELETE CASCADE ON UPDATE CASCADE
● posted (timestamp) NOT NULL
● user (text) NOT NULL REFERENCES users ( login ) ON DELETE CASCADE ON UPDATE CASCADE
● content (text) NOT NULL
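A sketch of the cascading foreign key on the user column, using Python's sqlite3 module in place of PostgreSQL. The threads reference is left out to keep the example short, and the sample rows and the helper remaining_posts are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless asked; PostgreSQL always enforces.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (login TEXT PRIMARY KEY);
    -- "user" is quoted because USER is a reserved word in PostgreSQL
    CREATE TABLE posts (
        posted  TEXT NOT NULL,
        "user"  TEXT NOT NULL REFERENCES users (login)
                    ON DELETE CASCADE ON UPDATE CASCADE,
        content TEXT NOT NULL
    );
    INSERT INTO users VALUES ('jberkus'), ('jerkyboy');
    INSERT INTO posts VALUES
        ('09:28', 'jberkus',  'What''s up?'),
        ('09:30', 'jerkyboy', 'spam link'),
        ('09:31', 'jerkyboy', 'another spam link');
""")

def remaining_posts(db):
    return [r[0] for r in db.execute("SELECT content FROM posts ORDER BY posted")]

# Deleting the user removes that user's posts in the same statement.
with conn:
    conn.execute("DELETE FROM users WHERE login = 'jerkyboy'")

left = remaining_posts(conn)
```

The application never has to remember to clean up posts; the relationship declared in the table does it.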
Beautiful Cascades
[Diagram: users.login values (jberkus, jerkyboy, selena) linked to their posts.content values ("What's up?", "I'm going crazy!", "Why?", "OSB!", "It's too much!", "I told you so ...", and several posts that contain only spam URLs).]
Problem 4: Crappy Performance
Things We've Already Done
● Atomicization
  – less CPU on parsing, calculations
● Normalization
  – less data duplication
  – smaller tables
● Transactions
  – more batches, less iteration
  – less locking
Denormalized Derived Relations
materialized views for the win
[Diagram: Users, Admins, and Posts, with a derived user_postcount relation built from Posts.]
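One possible way to keep a derived relation such as user_postcount is a plain table that is rebuilt from posts inside a transaction. This is a sketch on Python's sqlite3 module, not the author's implementation; the column names and the function refresh_user_postcount are assumptions.

```python
import sqlite3

def refresh_user_postcount(db):
    """Rebuild the derived table from posts; readers never see it half-filled."""
    with db:
        db.execute("DELETE FROM user_postcount")
        db.execute("""
            INSERT INTO user_postcount (login, post_count)
            SELECT login, count(*) FROM posts GROUP BY login
        """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (login TEXT NOT NULL, title TEXT NOT NULL);
    CREATE TABLE user_postcount (
        login      TEXT PRIMARY KEY,
        post_count INTEGER NOT NULL
    );
    INSERT INTO posts VALUES
        ('jberkus', 'Dinner?'),
        ('selena',  'Dinner?'),
        ('jberkus', 'Dinner?');
""")

refresh_user_postcount(conn)
counts = dict(conn.execute("SELECT login, post_count FROM user_postcount"))
```

The normalized posts table stays the source of truth; the derived table is a disposable cache that can be dropped and rebuilt at any time.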
Problem 5: Database Changes Cause Application Downtime and vice-versa
Stuff We've Already Done to make our data “agile”
● Atomicization
  – data isn't in specific interface version formats
● Normalization
  – where to extend data is more obvious
  – create a new table if you have to
● Transactions
  – prevent partial failures from changed schema
Extending the Users Table
● first_name (text)
● last_name (text)
● email (text)
● login (text)
● password (text)
● active (boolean)
● access (char)
● created (timestamp)
● last_login (timestamp)

CREATE VIEW oldapp.users AS
SELECT first_name || ' ' || last_name,
       email, login, password, active, access
FROM users;
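A runnable sketch of the compatibility view, using Python's sqlite3 module. SQLite has no schemas, so the view is called oldapp_users here instead of oldapp.users. The "AS name" alias is an assumption about what the old application expects, and the sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        first_name TEXT, last_name TEXT, email TEXT, login TEXT,
        password TEXT, active BOOLEAN, access TEXT,
        created TEXT, last_login TEXT
    );
    -- The old application keeps seeing a single name column
    -- and never sees created or last_login.
    CREATE VIEW oldapp_users AS
        SELECT first_name || ' ' || last_name AS name,
               email, login, password, active, access
        FROM users;
    INSERT INTO users VALUES
        ('Josh', 'Berkus', 'josh@example.org', 'jberkus', 'jehosaphat',
         1, 'a', '2009-05-02 13:41:22', NULL);
""")

cur = conn.execute("SELECT * FROM oldapp_users")
old_columns = [d[0] for d in cur.description]
old_row = cur.fetchone()
```

New code reads the extended table directly, while old code keeps working against the view until it can be retired.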
The Rest you already know
● Write Migrations
  – deploy these in transactions, if supported
  – if not, write rollback scripts
● Write Tests
  – use a realistic staging environment
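Deploying a migration in a transaction might look like the sketch below. It relies on SQLite's transactional DDL through Python's sqlite3 module; PostgreSQL also supports DDL inside transactions, but the function name migrate and the statements are hypothetical.

```python
import sqlite3

def migrate(db, statements):
    """Apply every statement or none of them."""
    db.execute("BEGIN")
    try:
        for sql in statements:
            db.execute(sql)
    except sqlite3.Error:
        db.execute("ROLLBACK")
        return False
    db.execute("COMMIT")
    return True

def user_columns(db):
    # Column 1 of each table_info row is the column name.
    return [row[1] for row in db.execute("PRAGMA table_info(users)")]

# isolation_level=None turns off the module's implicit transaction handling,
# so the explicit BEGIN/COMMIT/ROLLBACK above are the only ones issued.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (login TEXT)")

ok = migrate(conn, [
    "ALTER TABLE users ADD COLUMN created TEXT",
    "ALTER TABLE users ADD COLUMN last_login TEXT",
])
failed = migrate(conn, [
    "ALTER TABLE users ADD COLUMN nickname TEXT",
    "ALTER TABLE no_such_table ADD COLUMN x TEXT",  # deliberately fails
])
columns = user_columns(conn)
```

The failed migration leaves no trace: the nickname column added before the error is rolled back along with everything else.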
Some Other Tips
● Pick a naming scheme
  – and stick to it
● Don't do your joins in the application
  – the database does them better
● Repurposing fields will bite you
  – sure, you'll remember when that changed
● Don't micro-optimize
  – INT3, anyone?
More Information
● me
  – firstname.lastname@example.org
  – www.pgexperts.com
  – it.toolbox.com/blogs/database-soup
● postgresql: www.postgresql.org
● tutorial at OSCON
  – monday, 8:30 am!
  – see you in San Jose!

This presentation copyright 2009 Josh Berkus, licensed for distribution under the Creative Commons Attribution License, except for photos, most of which were stolen from other people's websites via images.google.com. Thanks, Google!