Continuous Deployment: The Dirty Details

Presented at ALM Summit 3 in Redmond, WA. January 2013.

Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.

http://www.etsy.com/careers

  1. CONTINUOUS DEPLOYMENT: The Dirty Details. Mike Brittain, Engineering Director. @mikebrittain, mike@etsy.com
  2. CONTINUOUS DEPLOYMENT: The Dirty Details. “OK, sounds cool. But I have some questions...”
  3. CD 100- & 200-levels: CI environment for automated tests; Committing to trunk; Branching in code; Config flags (a.k.a. feature flags); DevOps mentality; Metrics and alerting; Automated deploy scripts. credit: photobookgirl (flickr)
  4. CD 100- & 200-levels: CI environment for automated tests; Committing to trunk; Branching in code; Config flags (a.k.a. feature flags); DevOps mentality; Metrics and alerting; Automated deploy scripts. CD 300 level: Deploys vs. releases; Decoupled systems, schema changes; How we work: Arch. & Process; Integration and Operations. credit: photobookgirl (flickr)
  5. www. .com
  6. GROSS MERCHANDISE SALES http://www.etsy.com/blog/news/2013/notes-from-chad-2012-year-in-review/
  7. DECEMBER 2012: 1.5 Billion page views; $117 Million of goods sold; 6 Million items sold. Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet. http://www.etsy.com/blog/news/2013/etsy-statistics-december-2012-weather-report/
  8. 175+ Committers, everyone deploys. credit: martin_heigan (flickr)
  9. DEPLOYMENTS PER DAY: Very end of 2009; Today
  10. Continuous delivery is a pattern language in growing use in software development to improve the process of software delivery. Techniques such as automated testing, continuous integration, and continuous deployment allow software to be developed to a high standard and easily packaged and deployed to test environments, resulting in the ability to rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead. ~wikipedia. credit: Stewart, redgen (flickr)
  11. Architecture Stack: Linux, Apache, MySQL, PHP; Memcache, Gearman, Postgresql, Solr, Java, Traffic Server, Hadoop, HBase; Git, Jenkins. credit: Saire Elizabeth (flickr)
  12. Then: 2009, just before we started using CD. Now: 2010-today.
  13. Then: 6-14 hours; “Deployment Army”; Special event and highly orchestrated. Now: 15 mins; 1 person; Part of everyday workflow.
  14. Then: Blocked for 6-14 hours. 6+ hours to redeploy. Now: Blocked for 15 minutes. 15 minutes to redeploy.
  15. Then: Release branch, database schemas, data transforms, packaging, rolling restarts, cache purging, scheduled downtime. Now: Mainline, minimal linking and building, rsync, site up.
  16. 1st day: Put your face on Etsy.com.
  17. 2nd day: Complete tax, insurance, and benefits forms. credit: ktpupp (flickr)
  18. WARNING
  19. GROSS MERCHANDISE SALES
  20. Continuous Deployment: Small, frequent changes. Constantly integrating into production. 30+ deploys per day.
  21. “Wow... 30 deploys a day. How do you build features so quickly?”
  22. Software Deploy ≠ Product Launch
  23. Deploys frequently gated by config flags (“dark” releases)
  24. $cfg['new_search'] = array('enabled' => 'off');
      $cfg['sign_in']    = array('enabled' => 'on');
      $cfg['checkout']   = array('enabled' => 'on');
      $cfg['homepage']   = array('enabled' => 'on');
  25. $cfg['new_search'] = array('enabled' => 'off');
  26. $cfg['new_search'] = array('enabled' => 'off');
      // Meanwhile...
      # old and boring search
      $results = do_grep();
  27. $cfg['new_search'] = array('enabled' => 'off');
      // Meanwhile...
      if ($cfg['new_search']['enabled'] == 'on') {
        # New and fancy search
        $results = do_solr();
      } else {
        # old and boring search
        $results = do_grep();
      }
  28. $cfg['new_search'] = array('enabled' => 'on');
      // or...
      $cfg['new_search'] = array('enabled' => 'staff');
      // or...
      $cfg['new_search'] = array('enabled' => '1%');
      // or...
      $cfg['new_search'] = array('enabled' => 'users', 'user_list' => array('mike', 'john', 'kellan'));
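The slides show the flag settings but not how a request decides whether a flag applies to the current user. A minimal sketch of one way that check might work is below. The function name `flag_enabled`, the shape of the `$user` array, and the hash-based bucketing for percentage ramps are all assumptions for illustration; the deck does not describe Etsy's actual implementation.

```php
<?php
// Hypothetical helper: evaluates one flag for one user.
// Modes mirror the slide: 'on', 'off', 'staff', 'users' (with user_list), or 'N%'.
function flag_enabled(array $cfg, string $flag, array $user): bool
{
    $rule = $cfg[$flag] ?? array('enabled' => 'off'); // unknown flags count as off
    $mode = (string) ($rule['enabled'] ?? 'off');

    if ($mode === 'on') {
        return true;
    }
    if ($mode === 'off') {
        return false;
    }
    if ($mode === 'staff') {
        return !empty($user['is_staff']);
    }
    if ($mode === 'users') {
        return in_array($user['name'] ?? '', $rule['user_list'] ?? array(), true);
    }
    // Percentage ramp such as '1%': hash flag name + user id into a stable
    // bucket from 0 to 99, so the same user keeps the same answer per flag.
    if (preg_match('/^(\d{1,3})%$/', $mode, $m)) {
        $bucket = abs(crc32($flag . ':' . ($user['id'] ?? 0))) % 100;
        return $bucket < (int) $m[1];
    }
    return false; // unrecognized modes fail closed
}
```

Call sites would then read `if (flag_enabled($cfg, 'new_search', $user)) { ... }` in place of comparing against `'on'` directly. Hashing the flag name together with the user id is a design choice of this sketch: it keeps a user's experience stable across requests while giving different flags different 1% populations.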
  29. Validate in production, hidden from public.
  30. What’s in a deploy? Small incremental changes to the application; New classes, methods, controllers; Graphics, stylesheets, templates; Copy/content changes; Turning flags on, off, or % ramp up.
  31. Low MTTR (response times): Latent bugs and security holes; Traffic management, load shedding; Adding and removing infrastructure; Tweaking config flags or releasing patches.
  32. http://www.flickr.com/photos/flyforfun/2694158656/
  33. Config flags, Operator, Metrics. http://www.flickr.com/photos/flyforfun/2694158656/
  34. Favorites “on”
  35. Favorites “off”
  36. Many deploys eventually lead to a product launch.
  37. “How do you continuously deploy database schema changes?”
  38. Code deploys: ~15-20 minutes. Schema changes:
  39. Code deploys: ~15-20 minutes. Schema changes: THURSDAYS!
  40. Our web application is largely monolithic. Etsy.com, Support & Back-office tools, Developer API, Gearman (async work)
  41. Our web application is largely monolithic. Etsy.com, Support & Back-office tools, Developer API, Gearman (async work). PHP, Apache, Memcache
  42. External “services” are not deployed with the main application, e.g. Databases, Search, Photo storage, Payments
  43. External “services” are not deployed with the main application, e.g. Databases: MySQL (schema changes); Search: Solr, JVM (rolling restarts); Photo storage: proxy cache, filers, Amazon S3 (specialized infra.); Payments: PCI (controlled access)
  44. For every config flag, there are two states we can support: present and future.
  45. For every config flag, there are two states we can support: present and future. ... or past and present.
  46. “Non-Breaking Expansions”: Expose new version in a service interface; support multiple versions in the consumer.
  47. Example: Changing a Database Schema. Merging “users” and “users_prefs”
  48. RULE OF THUMB: Prefer ADDs over ALTERs (non-breaking expansion)
  49. 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  50. 0. Add new version to schema 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  51. 0. Add new version to schema. Schema change to add prefs columns to “users” table.
      "write_prefs_to_user_prefs_table" => "on"
      "write_prefs_to_users_table" => "off"
      "read_prefs_from_users_table" => "off"
  52. 1. Write to both versions. Write code for writing prefs to the “users” table.
      "write_prefs_to_user_prefs_table" => "on"
      "write_prefs_to_users_table" => "on"
      "read_prefs_from_users_table" => "off"
  53. 2. Backfill historical data. Offline process to sync existing data from “user_prefs” to new columns in “users”
  54. 3. Read from new version. Data validation tests. Ensure consistency both internally and in production.
      "write_prefs_to_user_prefs_table" => "on"
      "write_prefs_to_users_table" => "on"
      "read_prefs_from_users_table" => "staff"
  55. 3. Read from new version. Data validation tests. Ensure consistency both internally and in production.
      "write_prefs_to_user_prefs_table" => "on"
      "write_prefs_to_users_table" => "on"
      "read_prefs_from_users_table" => "1%"
  56. 3. Read from new version. Data validation tests. Ensure consistency both internally and in production.
      "write_prefs_to_user_prefs_table" => "on"
      "write_prefs_to_users_table" => "on"
      "read_prefs_from_users_table" => "5%"
  57. 3. Read from new version. Data validation tests. Ensure consistency both internally and in production.
      "write_prefs_to_user_prefs_table" => "on"
      "write_prefs_to_users_table" => "on"
      "read_prefs_from_users_table" => "on"
      ("on" == "100%")
  58. 4. Cut-off writes to old version. After running on the new table for a significant amount of time, we can cut off writes to the old table.
      "write_prefs_to_user_prefs_table" => "off"
      "write_prefs_to_users_table" => "on"
      "read_prefs_from_users_table" => "on"
  59. “Branch by Abstraction” [diagram: two Controllers call the Users Model (Abstraction), which sits over the old schema (“users” (old) and “user_prefs”) and the new schema (“users”)] http://paulhammant.com/blog/branch_by_abstraction.html http://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/
  60. “The Migration 4-Step”: 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
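The migration steps on slides 50-58 and the Users Model abstraction on slide 59 can be sketched together as follows. This is a toy illustration under stated assumptions: plain PHP arrays stand in for the “user_prefs” and “users” tables, the class and method names (`UsersModel`, `savePrefs`, `loadPrefs`, `backfill`, `setFlag`) are hypothetical, and only the `on`/`off` flag states are handled, not the `staff` or percentage ramps used during step 3.

```php
<?php
// Sketch only: arrays stand in for the real "user_prefs" and "users" tables.
class UsersModel
{
    public array $userPrefsTable = array(); // old schema: user_id => prefs
    public array $usersTable = array();     // new schema: user_id => row with a 'prefs' column
    private array $flags;

    public function __construct(array $flags)
    {
        $this->flags = $flags;
    }

    public function setFlag(string $name, string $state): void
    {
        $this->flags[$name] = $state;
    }

    // Steps 1 and 4: each write path is gated by its own flag, so dual
    // writes can be switched on, and old writes later switched off.
    public function savePrefs(int $userId, array $prefs): void
    {
        if ($this->flags['write_prefs_to_user_prefs_table'] === 'on') {
            $this->userPrefsTable[$userId] = $prefs;
        }
        if ($this->flags['write_prefs_to_users_table'] === 'on') {
            $this->usersTable[$userId]['prefs'] = $prefs;
        }
    }

    // Step 2: copy rows that predate dual writes into the new columns,
    // without overwriting anything already written by step 1.
    public function backfill(): void
    {
        foreach ($this->userPrefsTable as $userId => $prefs) {
            if (!isset($this->usersTable[$userId]['prefs'])) {
                $this->usersTable[$userId]['prefs'] = $prefs;
            }
        }
    }

    // Step 3: callers never learn which table served the read.
    public function loadPrefs(int $userId): ?array
    {
        if ($this->flags['read_prefs_from_users_table'] === 'on') {
            return $this->usersTable[$userId]['prefs'] ?? null;
        }
        return $this->userPrefsTable[$userId] ?? null;
    }
}
```

Because controllers only call `savePrefs` and `loadPrefs`, each step of the migration becomes a flag change rather than a coordinated code-and-schema release, and turning a flag back returns the previous behavior.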
  61. “When do you clean up all of those config flags?”
  62. We might remove config flags for the old version when... It is no longer valid for the business. It is no longer stable, maintained, or trusted. It has poor performance characteristics. The code is a mess, or difficult to read. We can afford to spend time on it.
  63. Promote “dev flags” to “feature flags”
  64. // Feature flag
      $cfg['mobilized_pages'] = array('enabled' => 'on');
      // Dev flags
      $cfg['mobile_templates_seller_tools']   = array('enabled' => 'on');
      $cfg['mobile_templates_account_tools']  = array('enabled' => 'on');
      $cfg['mobile_templates_member_profile'] = array('enabled' => 'on');
      $cfg['mobile_templates_search']         = array('enabled' => 'off');
      $cfg['mobile_templates_activity_feed']  = array('enabled' => 'off');
      ...
      if ($cfg['mobilized_pages']['enabled'] == 'on' &&
          $cfg['mobile_templates_search']['enabled'] == 'on') {
        // ...
        // ...
      }
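The two-part condition on slide 64 could be wrapped in a small helper so that every dev flag is checked against its parent feature flag in one place. This is a sketch only; `dev_flag_enabled` is a hypothetical name, and it assumes the `$cfg` layout shown on the slide with simple `on`/`off` states.

```php
<?php
// Hypothetical helper: a dev flag only counts while its parent feature flag is on.
function dev_flag_enabled(array $cfg, string $feature, string $devFlag): bool
{
    $featureOn = ($cfg[$feature]['enabled'] ?? 'off') === 'on';
    $devOn     = ($cfg[$devFlag]['enabled'] ?? 'off') === 'on';
    return $featureOn && $devOn;
}
```

Once every dev flag under a feature has been on for long enough to be trusted, the dev flags and these checks can be deleted, leaving only the feature flag as the operational switch.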
  65. // Feature flags
      $cfg['search']              = array('enabled' => 'on');
      $cfg['developer_api']       = array('enabled' => 'on');
      $cfg['seller_tools']        = array('enabled' => 'on');
      $cfg['the_entire_web_site'] = array('enabled' => 'on');
  66. // Feature flags
      $cfg['search']              = array('enabled' => 'on');
      $cfg['developer_api']       = array('enabled' => 'on');
      $cfg['seller_tools']        = array('enabled' => 'on');
      $cfg['the_entire_web_site'] = array('enabled' => 'on');
      $cfg['the_entire_web_site_no_really_i_mean_it'] = array('enabled' => 'on');
  67. Architecture and Process
  68. Deploying is cheap.
  69. Releasing is cheap.
  70. Some philosophies on product development...
  71. Gathering data should be cheap, too. staff, opt-in prototypes, 1%
  72. Treat first iterations as experiments.
  73. Get into code as quickly as possible.
  74. “Where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time. Hence plan to throw one away; you will, anyhow.” ~ Fred Brooks, The Mythical Man-Month
  75. Architecture largely doesn’t matter.
  76. Kill things that don’t work.
  77. Your assumptions will be wrong once you’ve scaled 10x.
  78. “We don’t optimize for being right. We optimize for quickly detecting when we’re wrong.” ~Kellan Elliott-McCrea, CTO
  79. Become really good at changing your architecture.
  80. Invest time in architecture by the 2nd or 3rd iteration.
  81. Integration and Operations
  82. WARNING: REMEMBER THIS?
  83. Continuous Deployment: Small, frequent changes. Constantly integrating into production. 30 deploys per day.
  84. Code review before commit
  85. Automated tests before deploy
  86. Why Integrate with Production?
  87. Dev ≠ Prod
  88. Verify frequently and in small batches.
  89. Integrating with production is a test in itself. We do this frequently and in small batches.
  90. More database servers in prod. Bigger database hardware in prod. More web servers. Various replication schemes. Different versions of server and OS software. Schema changes applied at different times. Physical hardware in prod. More data in prod. Legacy data (7 years of odd user states). More traffic in prod. Wait, I mean MUCH more traffic in prod. Fewer elves. Faster disks (SSDs) in prod.
  91. Using a MySQL database in dev for an application that will be running on Oracle in production: Priceless
  92. Verify frequently and in small batches.
  93. Dev ⇾ QA ⇾ Staging ⇾ Prod
  94. Dev ⇾ QA ⇾ Staging ⇾ Prod
  95. Dev ⇾ Pre-Prod ⇾ Prod
  96. Test and integrate where you’ll see value.
  97. Config flags (again): off, on, staff, opt-in prototypes, user list, 0-100%
  98. Config flags (again): off, on, staff, opt-in prototypes, user list, 0-100%, “canary pools”
  99. Automated tests after deploy
  100. Real-time metrics and dashboards: Network & Servers, Application, Business
  101. SERVER METRICS: Apache requests/sec, Busy processes, CPU utilization, Script exec time (med. & 95th). APPLICATION METRICS: Logins, Registrations, Checkouts, Listings created, Forum posts. Time and event correlated.
  102. Real humans reporting trouble!
  103. “Theoretical” vs. “Practical” Engineering
  104. Config flags, Operator, Metrics. http://www.flickr.com/photos/flyforfun/2694158656/
  105. Managing risk during Holiday Shopping season: Thanksgiving, “Black Friday,” “Cyber Monday” ➔ Christmas (~30 days). Code Freeze?
  106. DEPLOYMENTS PER DAY: “Code Slush”
  107. Tighten your feedback cycles. Integrate with production and validate early in cycle. Use tools that allow you to detect issues early. Optimize for quick response times. Applied to both feature development and operability.
  108. Thank you ... and questions? These slides will be available later today at http://mikebrittain.com/talks Mike Brittain, Engineering Director. @mikebrittain, mike@etsy.com