BADCamp 2008 DB Sync
http://badcamp.net/session/database-synchronization

Published in: Technology

Transcript

  • 1. Database Synchronization Shaun Haber Warner Bros. Records
  • 2. What is it? • Merging content between a dev site and a production site
  • 3. Disclaimer • No single answer • No “Drupally” solution • Not exclusive to Drupal • Not magic
  • 4. Who the hell am I?
  • 5. Warner Music Group
  • 6. Warner Bros. Records • Subsidiary of Warner Music Group • Family of labels (Reprise, Sire, etc.) • Over 100 artists • Top-selling albums • It’s music biz after all!
  • 7. So what?
  • 8. WBR Tech • Only label with an in-house Tech team • “Start-up” mentality • Fast-paced, hectic, and fun! • We use Drupal... religiously
  • 9. 93 Drupal Sites 1 new site every week
  • 10. Launching like crazy!
  • 11. Websites in the wild! Source: http://flickr.com/photos/krosinsky/2848288562/
  • 12. Websites in the wild • Always collecting new data!
  • 13. [Graph: Data over Time, starting at Launch]
  • 14. Not a bad thing, obviously • Want websites to grow • More users + more data = PROFIT
  • 15. But... • How do we keep the site updated? - New content - New features - Code fixes - <insert your own update here>
  • 16. Minor updates Major updates Source: http://flickr.com/photos/nimboo/132386298
  • 17. Minor Updates • CSS tweak • template.php change • Add a new Block • Change settings on a View • Install a new module
  • 18. Major Updates • Schema changes • Information re-architecture • Significant configuration changes • User flow changes • New theme integration
  • 19. Strategy? Maintain a separate Dev site!
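  • 20. [Timeline diagram] Prod server: (nothing yet); Dev server: New; axis: Time
  • 21. [Timeline diagram] Prod server: (nothing yet); Dev server: New, QA; axis: Time
  • 22. [Timeline diagram] Prod server: Prod; Dev server: New, QA, Dev; axis: Time
  • 23. [Timeline diagram] Prod server: Prod, Prod; Dev server: New, QA, Dev; axis: Time
  • 24. [Timeline diagram] Prod server: Prod, Prod; Dev server: New, QA, Dev, Dev; axis: Time
  • 25. [Timeline diagram] Prod server: ?, Prod, Prod; Dev server: New, QA, Dev, Dev; axis: Time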
  • 26. Syncing Databases Sucks • Code: Easy • Files: Easy • Database: Hard
  • 27. [Timeline diagram] Prod server: Prod, Prod; Dev server: New, QA, Dev, Dev; axis: Time
  • 28. [Timeline diagram] Prod server: Prod 2.0, Prod, Prod; Dev server: New, QA, Dev, Dev; axis: Time
  • 29. Order of Events 1. Develop a new site 2. Launch site 3. Take snapshot of prod site 4. Develop on snapshot 5. Magic? => Relaunch new version of site
  • 30. But it’s not Magic! 1. Take dev site down 2. Shift sequenced IDs on Dev 3. Take prod site down 4. Merge content from Prod to Dev 5. QA “new” dev site 6. Copy dev site to prod site 7. Bring “new” prod site live
  • 31. It’s Database Surgery! Source: http://flickr.com/photos/interplast/6339098/
  • 32. 2 Step Process • Step 1 - Shift Sequenced IDs • Step 2 - Merge content
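  • 33. [Diagram: ID stack] 3 2 1
  • 34. [Diagram: ID stacks] 3 3 2 2 1 1
  • 35. [Diagram: ID stacks] 6 5 5 4 4 3 3 2 2 1 1
  • 36. [Diagram: ID stacks] 11 10 6 5 4 3 3 2 2 1 1
  • 37. [Diagram: ID stacks] 11 10 6 6 5 5 4 4 3 3 2 2 1 1
  • 38. [Diagram: ID stacks] 11 10 6 6 5 5 4 4 3 3 2 2 1 1
  • 39. [Diagram: ID stacks] 11 10 6 6 5 5 4 4 3 3 2a 2a 1 1
  • 40. [Diagram: ID stacks] 11 10 6 6 5 5 4 4 3 3 2a 2a 1 1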
  • 41. Step 1 - Shifting IDs • comments_cid • files_fid • node_revisions_vid • node_nid • users_uid
  • 42. Need to know • Highest common ID between Dev and Prod • Delta value to shift • Reference of known tables and fields
  • 43. Highest Common ID • Top item on the “stack” at time of the snapshot. [Diagram: ID stacks] 3 3 3 2 2 1 1
  • 44. Delta value • Amount to shift the conflicted items, with extra padding [Diagram: ID stack] 11 10 7 3
  • 45. UPDATE table SET id = id + $delta WHERE id > $common
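A rough sketch of how the delta might be chosen, written in Python rather than the PHP used in the talk. The names `choose_delta`, `shift_id`, and the `padding` parameter are illustrative assumptions, not part of the original scripts. The idea is that every shifted Dev ID should land above the highest ID in use on either site, with some headroom.

```python
def choose_delta(common: int, dev_max: int, prod_max: int, padding: int = 100) -> int:
    """Return a shift that moves every Dev ID above `common` past all IDs in use.

    Hypothetical helper: `padding` leaves headroom for rows that Prod may
    still create between choosing the delta and doing the merge.
    """
    if common > min(dev_max, prod_max):
        raise ValueError("common ID cannot exceed the highest ID on either site")
    return max(dev_max, prod_max) - common + padding


def shift_id(value: int, common: int, delta: int) -> int:
    """Mirror of: UPDATE table SET id = id + $delta WHERE id > $common."""
    return value + delta if value > common else value
```

Basing the delta on the larger of the two maximums also means no shifted ID can collide with an unshifted one partway through the UPDATE.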
  • 46. And that’s it for Step 1
  • 47. Actually, it’s MUCH more complicated...
  • 48. What tables have nid? comments.nid, content_field_*.nid, content_field_*.field_*_nid, content_type_*.nid, content_type_*.field_*_nid, files.nid, forum.nid, forward_log.nid, history.nid, node.nid, node_access.nid, node_comment_statistics.nid, node_counter.nid, node_revisions.nid, nodefamily.parent_nid, nodefamily.child_nid, panels_node.nid, poll.nid, poll_choices.nid, poll_votes.nid, term_node.nid, uc_cart_products.nid, uc_order_products.nid, uc_product_features.nid, uc_products.nid, uc_roles_products.nid, usernode.nid, webform.nid, webform_component.nid, webform_submissions.nid, webform_submitted_data.nid
  • 49. Also... • Special tables: • location, sequences, url_alias, etc. • node-nid.tpl.php • Serialized PHP variables in DB • PHP code in DB • URLs in DB or elsewhere (e.g., /node/123)
  • 50. Well shit!
  • 51. Do the best we can! • Reference of all known tables • Reference of all known sequence fields • Reference of all known “special cases” • Automate as much as possible
  • 52. Scripting Time!
  • 53. Check for unknown tables $rs = db_query("SHOW TABLES"); while ($row = db_fetch_row($rs)) { if (!is_known_table($row[0])) { log_unknown_table($row[0]); } } if (found_unknown_tables()) { print_unknown_tables(); exit; }
  • 54. Store all known tables in a txt file: access, accesslog, audio_widget_thumbnail, audio_widget_track, authmap, blocks, blocks_roles, boxes, buddylist, buddylist_buddy_group, buddylist_groups, buddylist_pending_requests, cache*, comments, contact, content_field_*, content_type_*, devel_queries, devel_times, ...
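The unknown-table check could look roughly like this in Python, with SQLite standing in for MySQL (`sqlite_master` plays the role of `SHOW TABLES`). The `KNOWN_TABLES` excerpt, the sample table names, and the choice of `fnmatchcase` for the `*` wildcards are assumptions made for illustration; the talk does not say how wildcard entries were matched.

```python
import sqlite3
from fnmatch import fnmatchcase

# Hypothetical excerpt of the known-tables index; entries may use * wildcards.
KNOWN_TABLES = ["access", "accesslog", "cache*", "comments", "content_field_*", "node"]


def is_known_table(name, known=KNOWN_TABLES):
    """True if `name` matches any entry in the known-tables index."""
    return any(fnmatchcase(name, pattern) for pattern in known)


def find_unknown_tables(conn, known=KNOWN_TABLES):
    """List tables in the database that the index does not cover."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows if not is_known_table(name, known)]


# Demo database with one table the index has never seen.
conn = sqlite3.connect(":memory:")
for table in ("node", "cache_menu", "content_field_photo", "mystery_module"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")

unknown = find_unknown_tables(conn)
if unknown:
    print("Unknown tables, update the index before shifting:", unknown)
```

Stopping on the first unknown table, as the slide does with `exit`, keeps a newly installed module from slipping through with unshifted IDs.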
  • 55. Store all fields in separate txt files: comments.nid, content_field_*.nid, content_field_*.field_*_nid, content_type_*.nid, content_type_*.field_*_nid, files.nid, forum.nid, forward_log.nid, history.nid, node.nid, node_access.nid, node_comment_statistics.nid, node_counter.nid, node_revisions.nid, nodefamily.parent_nid, nodefamily.child_nid, panels_node.nid, poll.nid, poll_choices.nid, poll_votes.nid, ...
  • 56. Now we can shift IDs! • Iterate thru DB tables • If table has known fields, shift IDs (remember that SQL command?) • Rinse and repeat for each sequenced ID
  • 57. UPDATE table SET id = id + $delta WHERE id > $common
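As a minimal sketch of the loop described above, the following Python applies the same UPDATE to every column listed in a field index. SQLite stands in for MySQL, and the `NID_FIELDS` mapping, the sample rows, and the function name `shift_ids` are assumptions for illustration only.

```python
import sqlite3

# Hypothetical excerpt of the field index: table -> columns that hold a node ID.
NID_FIELDS = {
    "node": ["nid"],
    "comments": ["nid"],
    "nodefamily": ["parent_nid", "child_nid"],
}


def shift_ids(conn, fields, common, delta):
    """Apply the shift to every indexed column of every table that exists."""
    existing = {
        name
        for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    for table, columns in fields.items():
        if table not in existing:
            continue  # module not installed on this site
        for column in columns:
            # Identifiers come from our own index; values are bound parameters.
            conn.execute(
                f'UPDATE "{table}" SET "{column}" = "{column}" + ? WHERE "{column}" > ?',
                (delta, common),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INTEGER)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
)
conn.executemany("INSERT INTO comments VALUES (?, ?)", [(1, 2), (2, 4), (3, 5)])

shift_ids(conn, NID_FIELDS, common=3, delta=7)
node_ids = [nid for (nid,) in conn.execute("SELECT nid FROM node ORDER BY nid")]
comment_nids = [nid for (nid,) in conn.execute("SELECT nid FROM comments ORDER BY cid")]
```

The same function would be called once per sequenced ID (nid, uid, cid, and so on), each with its own index, common ID, and delta.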
  • 58. Special Cases
  • 59. Sequences table • Simply reset the value to new highest ID • Do this after shifting IDs in the “primary” table (node.nid, users.uid, etc.)
  • 60. UPDATE sequences SET id = $max WHERE name = '$seq'
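A small sketch of the sequence reset, again in Python with SQLite. It assumes a `sequences` table with `name` and `id` columns, which is how I understand the Drupal 5 schema; treat that layout, the helper name `reset_sequence`, and the sample data as assumptions.

```python
import sqlite3


def reset_sequence(conn, seq_name, table, column):
    """Set the sequence counter to the highest ID now present in `table`."""
    (max_id,) = conn.execute(
        f'SELECT COALESCE(MAX("{column}"), 0) FROM "{table}"'
    ).fetchone()
    conn.execute("UPDATE sequences SET id = ? WHERE name = ?", (max_id, seq_name))
    conn.commit()
    return max_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, id INTEGER)")
conn.executemany("INSERT INTO node VALUES (?)", [(1,), (2,), (3,), (11,), (12,)])
conn.execute("INSERT INTO sequences VALUES ('node_nid', 5)")  # stale pre-shift value

reset_sequence(conn, "node_nid", "node", "nid")
new_value = conn.execute("SELECT id FROM sequences WHERE name = 'node_nid'").fetchone()[0]
```

Reading the maximum from the primary table after the shift, rather than adding the delta to the old counter, avoids drift if the counter was already out of step.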
  • 61. Location table • Stores ID val in column `eid` • Stores sequence type in column `type` • type = node, user
  • 62. UPDATE location SET `eid` = `eid` + $delta WHERE `eid` > $common AND `type` = '$type'
  • 63. Url_alias table • ID values are embedded as strings • Use pattern matching to parse the ID • node: node/nid • user: user/uid, blog/uid • Add the delta, update new alias
  • 64. Pseudo-code SELECT * FROM url_alias WHERE src LIKE 'node/%' preg_match('#^node/([0-9]+)#', $src, $matches) $id = $matches[1] $id = $id + $delta UPDATE url_alias SET src = 'node/$id' WHERE pid = $pid
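The alias rewrite can be sketched as a pure function in Python. Two things here are my assumptions rather than the speaker's: IDs at or below the highest common ID are left alone (consistent with the shift rule in Step 1), and any trailing path such as `/edit` is preserved. The name `shift_alias_src` is hypothetical.

```python
import re

# Path prefixes whose number is an ID, taken from the slide's
# node/nid, user/uid, and blog/uid examples.
ALIAS_PATTERN = re.compile(r"^(node|user|blog)/(\d+)(/.*)?$")


def shift_alias_src(src, prefixes, common, delta):
    """Return `src` with its embedded ID shifted, or unchanged if not applicable."""
    match = ALIAS_PATTERN.match(src)
    if not match or match.group(1) not in prefixes:
        return src
    entity_id = int(match.group(2))
    if entity_id <= common:
        return src  # existed before the snapshot, so it keeps its ID
    return f"{match.group(1)}/{entity_id + delta}{match.group(3) or ''}"
```

Passing `prefixes` explicitly lets node aliases and user aliases be processed in separate passes, since each has its own common ID and delta. For example, `shift_alias_src("node/4", {"node"}, 3, 7)` rewrites the path while `shift_alias_src("user/4", {"node"}, 3, 7)` leaves it untouched.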
  • 65. Manually • Rename any node-nid.tpl.php files • Search for ID vals in DB: • Eval’ed PHP code • Serialized PHP code • URLs • anything else?
  • 66. Step 1 Recap • Maintain indexes for tables and fields • Automate using the indexes • Review indexes before each shift • Inspect for manual cases after each shift • Document every new case you find!
  • 67. At least most of this can be automated!
  • 68. Step 2 - Merging Content
  • 69. Merging Content [Diagram: ID stacks] 10 6 6 5 5 4 4 3 3
  • 70. What to merge? • Content • Really, just the content • No variables, settings, etc.
  • 71. Need to know • Highest Common ID (same from Step 1) • Reference of tables
  • 72. Process • Iterate thru Prod tables: • Skip • INSERT IGNORE (I) • REPLACE (R) • DROP and INSERT (A)
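The per-table merge modes might be sketched like this in Python. SQLite is used so the example runs on its own, which means `INSERT OR IGNORE` and `INSERT OR REPLACE` stand in for MySQL's `INSERT IGNORE` and `REPLACE`, and `DELETE` stands in for the DROP in mode A. The function name `merge_table` and the sample rows are assumptions; the mode letters follow the slide.

```python
import sqlite3


def merge_table(dev, table, mode):
    """Copy rows of `table` from the attached prod database into Dev (main)."""
    if mode == "skip":
        return
    if mode == "I":    # keep Dev's row when the key already exists
        dev.execute(f'INSERT OR IGNORE INTO main."{table}" SELECT * FROM prod."{table}"')
    elif mode == "R":  # Prod's row wins when the key already exists
        dev.execute(f'INSERT OR REPLACE INTO main."{table}" SELECT * FROM prod."{table}"')
    elif mode == "A":  # discard Dev's rows and take all of Prod's
        dev.execute(f'DELETE FROM main."{table}"')
        dev.execute(f'INSERT INTO main."{table}" SELECT * FROM prod."{table}"')
    else:
        raise ValueError(f"unknown merge mode: {mode}")


# Dev is the main database; Prod is attached next to it for the demo.
dev = sqlite3.connect(":memory:")
dev.execute("ATTACH DATABASE ':memory:' AS prod")
for schema in ("main", "prod"):
    dev.execute(f"CREATE TABLE {schema}.node (nid INTEGER PRIMARY KEY, title TEXT)")
dev.executemany("INSERT INTO main.node VALUES (?, ?)", [(3, "dev copy"), (11, "dev new")])
dev.executemany("INSERT INTO prod.node VALUES (?, ?)", [(3, "prod copy"), (4, "prod new")])

merge_table(dev, "node", "I")
merged = dev.execute("SELECT nid, title FROM main.node ORDER BY nid").fetchall()
```

Because Dev's new rows were shifted in Step 1, Prod's new rows (nid 4 here) slot in without touching Dev's (nid 11).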
  • 73. Special Cases • Url_alias table • Sequences table • Some nodes
  • 74. Url_Alias table • Don’t go by pid • REPLACE INTO url_alias SET src = '$src', dst = '$dst'
  • 75. Sequences table • Manually inspect sequence values!
  • 76. Node timestamps • Get timestamp of Highest Common nid • Check for older nodes on Prod that have been modified recently
  • 77. Replace on Dev with SELECT nid FROM node WHERE changed > $timestamp AND nid <= $common
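A sketch of the timestamp check in Python with SQLite. It reads "older nodes" as those with nid at or below the highest common ID, and it uses the `created` time of the highest common node as the snapshot timestamp; both readings are my assumptions, as are the function name `modified_old_nids` and the sample rows.

```python
import sqlite3


def modified_old_nids(prod, common):
    """Return nids that predate the snapshot but were edited on Prod since."""
    row = prod.execute("SELECT created FROM node WHERE nid = ?", (common,)).fetchone()
    if row is None:
        raise LookupError(f"node {common} not found on Prod")
    (snapshot_time,) = row
    rows = prod.execute(
        "SELECT nid FROM node WHERE changed > ? AND nid <= ? ORDER BY nid",
        (snapshot_time, common),
    )
    return [nid for (nid,) in rows]


prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, created INTEGER, changed INTEGER)")
prod.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(1, 100, 100), (2, 110, 500), (3, 120, 120), (4, 300, 300)],
)

stale_on_dev = modified_old_nids(prod, common=3)
```

Node 4 is excluded because it is new content that the regular merge already carries over; only node 2, edited after the snapshot, needs to be replaced on Dev.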
  • 78. That’s it... for now.
  • 79. Future • Share sequences table between Dev and Prod • Even/odd IDs (Drupal 6+) • Macro recordings and playbacks
  • 80. Questions? • Shaun Haber shaun.haber@wbr.com http://srhaber.com Twitter: @srhaber
