Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Building a SQL Database that Works

731 views

Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Building a SQL Database that Works

  1. 1. Building a SQL Database that works Josh Berkus PostgreSQL Experts, Inc. OpenSourceBridge 2009
  2. 2. How Not To Do Itfour popular methods
  3. 3. 1. One Big Spreadsheet
  4. 4. 2. EAV & E-BlobID Property Setting ID Properties407 Eyes Brown407 Height 73in 407 <eyes=”brown”><height=”73”> <married=”1”><smoker=”1”>407 Married? TRUE 408 <hair=”brown”><age=”49”>408 Married? FALSE <married=”0”><smoker=”0”>408 Smoker FALSE 409 <age=”37”><height=”66”>408 Age 37 <hat=”old”><teeth=”gold”>409 Height 66in
  5. 5. 3. Incremental Development
  6. 6. 4. Leave It to the ORM
  7. 7. E.F. CoddDatabase Engineer, IBM 1970
  8. 8. IBM Databases Run Amok 1.losing data 2.duplicate data 3.wrong data 4.crappy performance5.downtime for database redesign whenever anyone made an application change
  9. 9. The Relational Model
  10. 10. All the Relational Model You Need to Knowin less than 10 minutes
  11. 11. Set (Bag) Theory
  12. 12. Relations Relation(table, view, rowset)
  13. 13. Tuples Relation (table, view, rowset)Tuple (row) Tuple (row) Tuple (row) Tuple (row) Tuple (row)
  14. 14. Attributes Relation (table, view, rowset) Tuple (row) Tuple (row) Tuple (row)Attribute Attribute Attribute Attribute Attribute Attribute Attribute Attribute Attribute Tuple (row) Tuple (row) Attribute Attribute Attribute Attribute Attribute Attribute
  15. 15. Domains (types) Relation (table, view, rowset) Tuple (row) Tuple (row) Tuple (row)INT TEXT INT TEXT INT Attribute TEXT DATE DATE Attribute DATE Tuple (row) Tuple (row) INT Attribute TEXT INT Attribute TEXT Attribute DATE Attribute DATE
  16. 16. Keys Relation (table, view, rowset) Tuple (row) Tuple (row) Tuple (row)Attribute Key Attribute Key Attribute Key Attribute Attribute Attribute Attribute Tuple (row) Tuple (row) Key Attribute Key Attribute Attribute Attribute
  17. 17. Constraints Relation (table, view, rowset) Tuple (row) Tuple (row) Tuple (row) Attribute Attribute Attribute Attribute Attribute Attribute Attribute Attribute Attribute Tuple (row) Tuple (row) Attribute Attribute Attribute Attribute Attribute AttributeAttribute > 5
  18. 18. Foreign Key Constraint
  19. 19. Derived Relation (query)
  20. 20. ?Problem 1:Losing Data
  21. 21. The Atomic Age
  22. 22. Simple User Table● name (text)● email (text)● login (text)● password (text)● status (char) (user, inactive, admin) Users Admins
  23. 23. Non-Atomic Attributes● name (text)● email (text)● login (text)● password (text)● status (char) (user, inactive, admin) Users Admins
  24. 24. Whats Atomic?The simplest form of a datum, which is not divisible without loss of information. name Josh BerkusSELECT SUBSTR(name,STRPOS(name, )) ... Status ai … WHERE status = a???status = u ... … WHERE or ...
  25. 25. Whats Atomic?The simplest form of a datum, which is not divisible without loss of information. first_name last_name Josh Berkus active access TRUE a
  26. 26. Table Atomized!● first_name (text)● last_name (text)● email (text)● login (text)● password (text)● active (boolean) Users● access (char) Admins
  27. 27. Atomic, Shmomic. Who Cares?● Atomic Values: – retain data – make joins easier – make constraints easier● Non-atomic Values: – make data loss more likely – increase CPU usage – make you more likely to forget something
  28. 28. Transactions
  29. 29. Splitting a Bulletin Board Thread the hard wayINSERT INTO threads VALUES ( .... );If $dbh(success) then for $these_posts.date > $cutdate loop UPDATE posts SET thread = $newthread WHERE id = $these_posts.id; if not $dbh(success) then for $these_posts.id > $last_id loop UPDATE posts SET thread = $oldthread WHERE id = $these_posts.id; DELETE FROM threads WHERE id = $newthread;
  30. 30. Splitting a Bulletin Board Thread the transactional wayBEGIN; INSERT INTO threads VALUES ( .... ); $newthread = curval(); UPDATE posts SET thread = $newthread WHERE thread = $oldthread AND date > $cutdate;END;
  31. 31. name user_name admin Josh Berkus Josh Berkus BerkusJoshua Berkus Berkus, Josh user_name Josh Berkus Josh Joshua Berkus Problem 2: Duplicate Data
  32. 32. Where are my Keys?
  33. 33. Candidate (Natural) Keys● first_name (text)● last_name (text)● email (text)● login (text) Key● password (text)● active (boolean) Users● access (char) Admins
  34. 34. A Good Key● Should have to be unique because the application requires it to be.● Expresses a unique predicate which describes the tuple (row): – user with login “jberkus” – post from “jberkus” on “2009-05-02 13:41:22” in thread “Making your own wine”● If you cant find a good key, your table design is missing data.
  35. 35. Surrogate Key● first_name (text) ● user_id (serial)● last_name (text)● email (text)● login (text) Key● password (text)● active (boolean) Users● access (char) Admins
  36. 36. We All Just Want to Be Normal
  37. 37. Abby Normal login level last_namejberkus u Berkusselena a Deckelman login title posted level jberkus Dinner? 09:28 u selena Dinner? 09:37 u jberkus Dinner? 09:44 a
  38. 38. How can I be “Normal”?1. Each piece of data only appears in one relation – except as a “foreign key” attribute● No “repeated” attributes login level last_name jberkus u Berkus selena a Deckelman login title posted jberkus Dinner? 09:28 selena Dinner? 09:37 jberkus Dinner? 09:44
  39. 39. Problem 3:Wrong Data
  40. 40. Constraints
  41. 41. Users Run Amok● first_name (text)● last_name (text)● email (text)● login (text)● password (text)● active (boolean)● access (char)
  42. 42. Ensure Your Data is Consistent no matter where it came fromfirst_name last_name email login password active level Josh Berkus josh@pgexperts.com jberkus jehosaphat TRUE a NULL Kelley kelley@ucb k NULL TRUE u Mark Twain www.pm.org samuel halleys NULL I S F gavin@sf.gov gavin twitter FALSE x
  43. 43. Users Under Constraint● first_name (text) length() > 1● last_name (text) length() > 1● email (text) ILIKE %@%.%● login (text) length() > 5● password (text) length() > 5● active (boolean) NOT NULL● access (char) IN ( a,u ) note: email and other validators would, of course, be more complex
  44. 44. Foreign Keys attributeUsers login Posts Admins
  45. 45. Posts Table● title (text) NOT NULL REFERENCES threads ( title ) ON DELETE CASCADE ON UPDATE CASCADE● posted (timestamp) NOT NULL● user (text) NOT NULL REFERENCES users ( login ) ON DELETE CASCADE ON UPDATE CASCADE● content (text) NOT NULL
  46. 46. Beautiful Cascades posts.content Josh Berkus Whats up? users.login Im going crazy!Josh Berkus jberkus www.pornking.com jerkyboy Why? selena www.whitehouse.com OSB! Its too much! www.whiteslavery.com www.lolcats.com I told you so ...
  47. 47. Problem 4:Crappy Performance
  48. 48. Things Weve Already Done● Atomicization – less CPU on parsing, calculations● Normalization – less data duplication – smaller tables● Transactions – more batches, less iteration – less locking
  49. 49. Denormalized Derived Relations materialized views for the win Users Admins user_postcount Posts
  50. 50. Problem 5:Database Changes Cause Application Downtime and vice-versa
  51. 51. Stuff Weve Already Done to make our data “agile”● Atomicization – data isnt in specific interface version formats● Normalization – where to extend data is more obvious – create a new table if you have to● Transactions – prevent partial failures from changed schema
  52. 52. Views
  53. 53. Extending the Users Table● first_name (text) CREATE VIEW oldapp.users● last_name (text) AS SELECT● email (text) first_name || || last_name,● login (text) email, login, password, active, access● password (text) FROM users;● active (boolean)● access (char)● created (timestamp)● last_login (timestamp)
  54. 54. The Rest you already know● Write Migrations – deploy these in transactions, if supported – if not, write rollback scripts● Write Tests – use a realistic staging environment
  55. 55. Some Other Tips● Pick a naming scheme – and stick to it● Dont do your joins in the application – the database does them better● Repurposing fields will bite you – sure youll remember when that changed● Dont micro-optimize – INT3, anyone?
  56. 56. More Information● me – josh@pgexperts.com – www.pgexperts.com – it.toolbox.com/blogs/database-soup● postgresql: www.postgresql.org● tutorial at OSCON – monday, 8:30 am! – see you in San Jose! This presentation copyright 2009 Josh Berkus, licensed for distribution under the Creative Commons Attribution License, except for photos, most of which were stolen from other peoples websites via images.google.com. Thanks, Google!

×