Continuous Delivery: The Dirty Details

12,812 views
12,328 views

Published on

The practical implementation of Continuous Delivery at Etsy, and how it enables the engineering team to build features quickly, refactor and change architecture, and respond to problems in production.

Presented at GOTO Aarhus 2012.

Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.

http://www.etsy.com/careers

Published in: Technology
5 Comments
38 Likes
Statistics
Notes
No Downloads
Views
Total views
12,812
On SlideShare
0
From Embeds
0
Number of Embeds
590
Actions
Shares
0
Downloads
267
Comments
5
Likes
38
Embeds 0
No embeds

No notes for slide

Continuous Delivery: The Dirty Details

  1. 1. CONTINUOUS DELIVERY: THE DIRTY DETAILS Mike Brittain Etsy.com @mikebrittain mike@etsy.com
  2. 2. a.k.a. “Continuous Deployment”
  3. 3. www. .com
  4. 4. AUGUST 2012 1.4 Billion page views USD $76 Million in transactions 3.8 Million items soldhttp://www.etsy.com/blog/news/2012/etsy-statistics-august-2012-weather-report/
  5. 5. ~170 Committers, everyone deployscredit: martin_heigan (flickr)
  6. 6. 40302010 Very end of 2009 Today
  7. 7. Continuous delivery is a pattern language ingrowing use in software development toimprove the process of software delivery.Techniques such as automated testing,continuous integration, and continuousdeployment allow software to be developed toa high standard and easily packaged anddeployed to test environments, resulting in theability to rapidly, reliably and repeatedly pushout enhancements and bug fixes to customersat low risk and with minimal manualoverhead. The technique was one of theassumptions of extreme programming but atan enterprise level has developed into adiscipline of its own, with job descriptions forroles such as "buildmaster" calling for CDskills as mandatory. ~wikipedia
  8. 8. + DevOps+ Working on mainline, trunk, master+ Feature flags+ Branching in code
  9. 9. An Apology
  10. 10. An ApologyWe build primarily in PHP.Please don’t run away!
  11. 11. The Dirty Details of...“Continuous Deployment in Practice at Etsy”
  12. 12. Then Now 2009 2010-today Just before westarted using CD
  13. 13. Then Now 6-14 hours 15 mins“Deployment Army” 1 personHighly orchestrated Rapid release and infrequent cycle
  14. 14. Then NowSpecial event and Commonplace andhighly disruptive happens so often we cannot keep up
  15. 15. Then Now Blocked for Blocked for 6-14 hours, 15 minutes,plus minimum of next deploy will 6 hours to only take redeploy 15 minutes Config flags <5 mins
  16. 16. Then Now Release branch, Mainline, database schemas, minimal linking data transforms, and building, packaging, rsync, rolling restarts, site up cache purging,scheduled downtime
  17. 17. Then Now Slow FastComplex Simple Special Common
  18. 18. Deploying code is the very first thing engineers learn to do at Etsy.
  19. 19. 1st dayAdd your photo to Etsy.com.
  20. 20. 1st day Add your photo to Etsy.com. 2nd dayComplete tax, insurance, and benefits forms.
  21. 21. WARNING
  22. 22. Continuous Deployment Small, frequent changes.Constantly integrating into production. 30 deploys per day.
  23. 23. “Wow... 30 deploys a day.How do you build features so quickly?”
  24. 24. Software Deploy ≠ Product Launch
  25. 25. Deploys frequently gated by config flags (“dark” releases)
  26. 26. $cfg[‘new_search’] = array(enabled => off);$cfg[‘sign_in’] = array(enabled => on);$cfg[‘checkout’] = array(enabled => on);$cfg[‘homepage’] = array(enabled => on);
  27. 27. $cfg[‘new_search’] = array(enabled => off);
  28. 28. $cfg[‘new_search’] = array(enabled => off);// Meanwhile...# old and boring search$results = do_grep();
  29. 29. $cfg[‘new_search’] = array(enabled => off);// Meanwhile...if ($cfg[‘new_search’] == ‘on’) { # New and fancy search $results = do_solr();} else { # old and boring search $results = do_grep();}
  30. 30. $cfg[‘new_search’] = array(enabled => on);// or...$cfg[‘new_search’] = array(enabled => staff);// or...$cfg[‘new_search’] = array(enabled => 1%);// or...$cfg[‘new_search’] = array(enabled => users, user_list => mike,john,kellan);
  31. 31. Validate in production, hidden from public.
  32. 32. What’s in a deploy?Small incremental changes to the applicationNew classes, methods, controllersGraphics, stylesheets, templatesCopy/content changesTurning flags on/off, or ramping up
  33. 33. Quickly Responding to issuesSecurity, bugs, traffic, load shedding,adding/removing infrastructure.Tweaking config flags or releasing patches.
  34. 34. http://www.flickr.com/photos/flyforfun/2694158656/
  35. 35. Config flags Operator Metricshttp://www.flickr.com/photos/flyforfun/2694158656/
  36. 36. “How do you continuously deploy database schema changes?”
  37. 37. Code deploys: ~ every 15-20 minutes Schema changes: Thursday
  38. 38. Our web application is largely monolithic.
  39. 39. Etsy.com, support tools, developer API, back-office, analytics
  40. 40. External “services” are not deployed with the main application.
  41. 41. Databases, Search, Photo storage
  42. 42. For every config flag, there are two stateswe can support — forward and backward.
  43. 43. Expose multiple versions in each service.Expect multiple versions in the application.
  44. 44. Example: Changing a Database Schema
  45. 45. Prefer ADDs over ALTERs (“non-breaking expansions”)
  46. 46. Altering in-place requires coupling code and schema changes.
  47. 47. Merging “users” and “users_prefs”
  48. 48. 1. Write to both versions2. Backfill historical data3. Read from new version4. Cut-off writes to old version
  49. 49. 0. Add new version to schema1. Write to both versions2. Backfill historical data3. Read from new version4. Cut-off writes to old version
  50. 50. 0. Add new version to schemaSchema change to add prefs columns to “users” table.“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “off”“read_prefs_from_users_table” => “off”
  51. 51. 1. Write to both versionsWrite code for writing prefs to the “users” table.“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “off”
  52. 52. 2. Backfill historical dataOffline process to sync existing data from “user_prefs”to new columns in “users”
  53. 53. 3. Read from new versionData validation tests. Ensure consistency both internallyand in production.“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “staff”
  54. 54. 3. Read from new versionData validation tests. Ensure consistency both internallyand in production.“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “1%”
  55. 55. 3. Read from new versionData validation tests. Ensure consistency both internallyand in production.“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “5%”
  56. 56. 3. Read from new versionData validation tests. Ensure consistency both internallyand in production.“write_prefs_to_user_prefs_table” => “on”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “on”(“on” == “100%”)
  57. 57. 4. Cut-off writes to old versionAfter running on the new table for a significant amountof time, we can cut off writes to the old table.“write_prefs_to_user_prefs_table” => “off”“write_prefs_to_users_table” => “on”“read_prefs_from_users_table” => “on”
  58. 58. “Branch by Astraction” Controller Controller Users Model (Abstraction) “users” (old) “user_prefs” “users” old schema new schema http://paulhammant.com/blog/branch_by_abstraction.htmlhttp://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/
  59. 59. “The Migration 4-Step”1. Write to both versions2. Backfill historical data3. Read from new version4. Cut-off writes to old version
  60. 60. “The Migration 4-Step”1. Write to both versions2. Backfill historical data3. Read from new version4. Cut-off writes to old version5. Clean up flags, code, columns (when?)
  61. 61. Architecture and Process
  62. 62. Deploying is cheap.
  63. 63. Some philosophies on product development...
  64. 64. Gathering data should be cheap, too. staff, opt-in prototypes, 1%
  65. 65. Treat first iterations as experiments.
  66. 66. Get into code as quickly as possible.
  67. 67. Architecture largely doesn’t matter.
  68. 68. Kill things that don’t work.
  69. 69. “Terminate with extreme predjudice.”
  70. 70. Is the dumb solution enough to build a product? How long will the dumb solution last?
  71. 71. Your assumptions will be wrong once you’ve scaled 10x.
  72. 72. “We don’t optimize for being right. We optimize for quickly detecting when we’re wrong.” ~Kellan Elliott-McCrea, CTO
  73. 73. Become really good at changing your architecture.
  74. 74. Invest time in architecture by the 2nd or 3rd iteration.
  75. 75. Integration and Operations
  76. 76. Continuous Deployment Small, frequent changes.Constantly integrating into production. 30 deploys per day.
  77. 77. Code review before commit
  78. 78. Automated tests before deploy
  79. 79. Why Integrate with Production?
  80. 80. Dev ≠ Prod
  81. 81. Verify frequently and in small batches.
  82. 82. Integrating with production is a test in itself. We do this frequently and in small batches.
  83. 83. "Production is truly the only place you can validate your code."
  84. 84. "Production is truly the only place you can validate your code." ~ Michael Nygard, about 40 min ago
  85. 85. More database servers in prod.Bigger database hardware in prod.More web servers.Various replication schemes.Different versions of server and OS software.Schema changes applied at different times.Physical hardware in prod.More data in prod.Legacy data (7 years of odd user states).More traffic in prod.Wait, I mean MUCH more traffic in prod.Fewer elves.Faster disks (SSDs) in prod.
  86. 86. Using a MySQL database to test an application thatwill eventually be deployed on Oracle:
  87. 87. Using a MySQL database to test an application thatwill eventually be deployed on Oracle: Priceless.
  88. 88. Verify frequently and in small batches.
  89. 89. Dev ≠ Prod
  90. 90. Dev ⇾ QA ⇾ Staging ⇾ Prod
  91. 91. Dev ⇾ QA ⇾ Staging ⇾ Prod
  92. 92. Dev ⇾ Pre-Prod ⇾ Prod
  93. 93. Test and integrate where you’ll see value.
  94. 94. Config flags (again)off, on, staff, opt-in prototypes, user list, 0-100%
  95. 95. Config flags (again)off, on, staff, opt-in prototypes, user list, 0-100% “canary pools”
  96. 96. Automated tests after deploy
  97. 97. Real-time metrics and dashboardsNetwork & Servers, Application, Business
  98. 98. Release Managers: 0
  99. 99. Is it Broken?Or , is it just better?
  100. 100. Metrics + Configs ⇾ OODA Loop
  101. 101. “Theoretical” vs. “Practical”
  102. 102. Homepage (95th perc.) Surprise!!! Turning off multi- language support improves our page generation times by up to 25%.
  103. 103. Nope. It’s really broken.
  104. 104. Config flags Operator Metricshttp://www.flickr.com/photos/flyforfun/2694158656/
  105. 105. Thursday, Nov 22 - Thanksgiving Friday, Nov 23 - “Black Friday”Monday, Nov 26 - “Cyber Monday” ~30 days out from Christmas
  106. 106. 40302010
  107. 107. Thank you. Mike Brittain mike@etsy.com @mikebrittain

×