
Crossing the Production Barrier: Development at Scale


  1. Crossing the Production Barrier: Development at Scale (jgoulah@etsy.com / @johngoulah)
  2. The world's handmade marketplace: a platform for people to sell handmade, craft, and vintage goods
  3-6. 42MM unique visitors/mo. · 1.5B+ pageviews/mo. · 850K shops across 200 countries · 895MM sales in 2012
  7. Big cluster: 20 shards, adding 5 more
  8-13. 60K+ queries/sec avg · 4TB InnoDB buffer pool · 20TB+ data stored · 99.99% of queries under 1ms · ~1.2Gbps outbound (plaintext). Over 40% QPS increase from last year (25K last year); an additional 30K QPS moving over from Postgres; 1/3 of RAM is not dedicated to the buffer pool (OS, disk, network buffers, etc.)
  14. 50+ MySQL servers / 800 CPUs. Server spec: HP DL380 G7, 24-core, 96GB RAM, 16 spindles (16 x 146GB) in 1TB RAID 10
  15. The Problem: Etsy has been around since '05; we hit this a few years ago, and every big company probably has this issue
  16. DATA: sync prod to dev, until the prod data gets too big. http://www.flickr.com/photos/uwwresnet/6280880034/sizes/l/in/photostream/
  17-19. Some Approaches: subsets of data; generated data. Subsets have to end somewhere (a shop has favorites that are connected to people, connected to shops, etc.); generated data can be time-consuming to fake
  20. But... there is a problem with both of those approaches
  21. Edge Cases: what about testing edge cases and difficult-to-diagnose bugs? It is hard to model the same data set that produced a user-facing bug. http://www.flickr.com/photos/sovietuk/141381675/sizes/l/in/photostream/
  22. Perspective: another issue is testing problems at scale, with complex and large gobs of data. A real social-network ecosystem can be difficult to generate (favorites, follows); the activity feed and "similar items" search give better results with real data. http://www.flickr.com/photos/donsolo/2136923757/sizes/l/in/photostream/
  23. Prod → Dev? This is what most people do before the data gets too big: it takes almost 2 days to sync 20TB over a 1Gbps link, about 5 hours over 10Gbps. Bringing the prod dataset to dev was expensive (hardware and maintenance, keeping parity with prod), and applying schema changes would take at least as long
  24-25. Use Production (sometimes): so we did what we saw as the last resort and used production. Not for greenfield development; more for mature features and diagnosing bugs. We still have a dev database, but its data is sparse and unreliable
  26. It goes without saying that this can be dangerous, and it is also difficult to do right; we've been working on this for a year. http://www.flickr.com/photos/stuckincustoms/432361985/sizes/l/in/photostream/
  27. Approach: two big things, cultural and technical
  28. Solve Culture Issues First: part of figuring this out was exhausting all other options and getting buy-in from major stakeholders
  29. Two "Simple" Technical Issues
  30. Step 0: failure recovery
  31. Step 1: make it safe. How do you have test data in production and prevent stupid mistakes?
  32-35. Phased rollout: read-only → r/w on the dev shard only → full r/w
  36. How did we do it?
  37. Quick Overview: a high-level view. http://www.flickr.com/photos/h-k-d/7852444560/sizes/o/in/photostream/
  38-41. Architecture: tickets (unique IDs), index (shard lookup), shard1, shard2, ... shardN (store/retrieve data)
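The write path on these slides can be sketched as follows. This is a minimal illustration, not Etsy's implementation: the class, the modulo shard-assignment policy, and the in-memory dicts standing in for the tickets, index, and shard tiers are all hypothetical.

```python
# Hypothetical sketch of the tickets/index/shards path from the slides:
# "tickets" hands out unique IDs, "index" maps an object to its shard,
# and the shard itself stores and retrieves the data.

class ShardedStore:
    def __init__(self, num_shards):
        self.next_ticket = 0                                # tickets: unique ID source
        self.index = {}                                     # index: object id -> shard id
        self.shards = [dict() for _ in range(num_shards)]   # shard1..shardN: data

    def create(self, data):
        self.next_ticket += 1                     # tickets: generate a unique ID
        object_id = self.next_ticket
        shard_id = object_id % len(self.shards)   # assignment policy (illustrative)
        self.index[object_id] = shard_id          # index: remember the mapping
        self.shards[shard_id][object_id] = data   # shard: store the row
        return object_id

    def retrieve(self, object_id):
        shard_id = self.index[object_id]          # index: shard lookup
        return self.shards[shard_id][object_id]   # shard: fetch the row

store = ShardedStore(num_shards=4)
oid = store.create({"shop": "fred_test"})
assert store.retrieve(oid) == {"shop": "fred_test"}
```

The key property the slides rely on is that the index is the single source of truth for where an object lives, which is what makes it possible to add a special-purpose shard later.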
  42. Introducing... the dev shard: the shard used for the initial writes of data created from the dev environment
  43-44. tickets, index, shard1, shard2, ... shardN, plus the DEV shard
  45-47. Initial writes: www.etsy.com writes to shard1...shardN; www.goulah.vm writes to the DEV shard
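The routing rule implied by these slides, that production hosts write new objects to the normal shards while dev hosts write them to the DEV shard, might look like this sketch (the function name and shard names are illustrative, and the modulo choice of prod shard is an assumption):

```python
# Hypothetical sketch: route the *initial* write of a new object to the DEV
# shard when the request comes from a dev environment (e.g. www.goulah.vm),
# and to a normal production shard otherwise (e.g. www.etsy.com).

DEV_SHARD = "dev_shard"
PROD_SHARDS = ["shard1", "shard2", "shardN"]

def shard_for_initial_write(object_id, is_dev_env):
    if is_dev_env:
        return DEV_SHARD                                 # dev-created data lands here
    return PROD_SHARDS[object_id % len(PROD_SHARDS)]     # normal prod placement

assert shard_for_initial_write(42, is_dev_env=True) == "dev_shard"
assert shard_for_initial_write(42, is_dev_env=False) in PROD_SHARDS
```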
  48. mysql-proxy
  49. The proxy fronts all of the shards, the index, and tickets. http://www.oreillynet.com/pub/a/databases/2007/07/12/getting-started-with-mysql-proxy.html
  50-52. Dangerous/unnecessary queries: filter dangerous queries (queries without a WHERE clause); remove unnecessary queries (instead of DELETE, use a flag; ALTER statements don't run from dev). For example:
(DEV) etsy_rw@jgoulah [test]> select * from fred_test;
ERROR 9001 (E9001): Selects from tables must have where clauses
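mysql-proxy filter scripts are actually written in Lua; as a language-neutral sketch of the rules these slides describe (the function name and error strings beyond the slide's E9001 example are hypothetical), the checks might look like:

```python
import re

def check_query(sql):
    """Return an error string for a disallowed query, or None if it may run.

    Sketch of the dev-proxy rules from the slides: SELECT/UPDATE must have a
    WHERE clause, and DELETE/ALTER are not allowed from dev at all.
    """
    s = sql.strip().rstrip(";")
    verb = s.split(None, 1)[0].upper() if s else ""
    if verb == "DELETE":
        return "DELETEs are not allowed from dev; use a flag column instead"
    if verb == "ALTER":
        return "ALTER statements don't run from dev"
    if verb in ("SELECT", "UPDATE") and not re.search(r"\bWHERE\b", s, re.I):
        return "ERROR 9001 (E9001): queries must have where clauses"
    return None  # query is allowed through to the shard

assert check_query("select * from fred_test") is not None      # slide's example
assert check_query("SELECT id FROM shops WHERE id = 1") is None
assert check_query("DELETE FROM shops WHERE id = 1") is not None
```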
  53. Known ingress/egress funnel: we know where all of the queries from dev originate. http://www.flickr.com/photos/medevac71/4875526920/sizes/l/in/photostream/
  54. Explicitly enabled, not on all the time:
% dev_proxy on
Dev-Proxy config is now ON. Use 'dev_proxy off' to turn it off.
  55-56. Visual notifications: notify engineers that they are using the proxy and that this is read-only mode
  57-58. Read/write mode: needed for login and other things that write data
  59. Stealth data: hiding data from users (favorites go on both the dev and prod shards; making sure test users/shops don't show up). http://www.flickr.com/photos/davidyuweb/8063097077/sizes/h/in/photostream/
  60. Security. http://www.flickr.com/photos/sidelong/3878741556/sizes/l/in/photostream/
  61-62. PCI: off-limits. Token exchange only; locked down for most people
  63. Anomaly detection: another part of our security setup is detection
  64. Logging: the basis of anomaly detection is log collection
  65-72. Example log line:
2013-04-22 18:05:43 485370821 devproxy -- /* DEVPROXY source=10.101.194.19:40198 uuid=c309e8db-ca32-4171-9c4a-6c37d9dd3361 [htSp8458VmHlC] [etsy_index_B] [browse.php] */ SELECT id FROM table;
The fields are: date, thread id, source IP, unique id generated by the proxy, app request id, destination shard, and script
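A parser for these fields might look like the sketch below. The regex is inferred from the single example line on the slide, so treat the exact field layout as an assumption rather than the proxy's documented format.

```python
import re

# Field layout inferred from the sample devproxy log line on the slides:
# date, thread id, source IP, proxy-generated uuid, app request id,
# destination shard, and originating script, followed by the SQL itself.
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<thread_id>\d+) devproxy -- /\* DEVPROXY "
    r"source=(?P<source_ip>[\d.]+):\d+ "
    r"uuid=(?P<uuid>[0-9a-f-]+) "
    r"\[(?P<request_id>[^\]]+)\] \[(?P<shard>[^\]]+)\] \[(?P<script>[^\]]+)\] "
    r"\*/ (?P<sql>.*)$"
)

line = ("2013-04-22 18:05:43 485370821 devproxy -- /* DEVPROXY "
        "source=10.101.194.19:40198 uuid=c309e8db-ca32-4171-9c4a-6c37d9dd3361 "
        "[htSp8458VmHlC] [etsy_index_B] [browse.php] */ SELECT id FROM table;")

fields = LOG_RE.match(line).groupdict()
assert fields["source_ip"] == "10.101.194.19"
assert fields["shard"] == "etsy_index_B"
assert fields["script"] == "browse.php"
```

Once every dev-originated query carries these fields, anomaly detection reduces to aggregating on them (by source IP, script, shard, and so on).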
  73. login-as (read-only, logged with the reason for access)
  74. The reason is recorded and reviewed
  75. Recovery
  76-79. Sources of restore data: Hadoop, backups, delayed slaves
  80. Delayed Slaves: pt-slave-delay watches a slave and starts and stops its replication SQL thread as necessary to hold it behind the master. http://www.flickr.com/photos/xploded/141295823/sizes/o/in/photostream/
  81-84. Delayed Slaves: 4-hour delay behind the master; produce row-based binary logs; allow for quick recovery. The delayed slave is also a source of BCP (business continuity planning: prevention of and recovery from threats)
  85. pt-slave-delay --daemonize --pid /var/run/pt-slave-delay.pid --log /var/log/pt-slave-delay.log --delay 4h --interval 1m --nocontinue
The last 3 options are the most important: a 4h delay; the interval is how frequently it should check whether the slave should be started or stopped; --nocontinue means don't continue replication normally on exit. User/pass eliminated for brevity
  86-88. Shard pair: two R/W masters plus a slave held back by pt-slave-delay and producing row-based binlogs
  89. Shard pair → Parse/Transform → HDFS/Vertica. In addition, we can use the slaves to send data to other stores for offline queries: 1) parse each binlog file to generate a sequence file of row changes; 2) apply the row changes to a previous set to get the latest version
  90. Something bad happens... a bad query is run (a bad update, etc.). http://www.flickr.com/photos/focalintent/1332072795/sizes/o/in/photostream/
  91-94. Before restoration: 1) stop delayed-slave replication; 2) pull side A (step 2 could be flipping the physical box, for faster recovery, e.g. for index servers); 3) stop master-master replication. master.info should be pointing to the right place
  95. On the delayed slave, get the relay position:
> SHOW SLAVE STATUS
Relay_Log_File: dbslave-relay.007178
Relay_Log_Pos: 8666654
  96. On the delayed slave, SHOW RELAYLOG EVENTS shows the statements in the relay log; pass it the relay log and position to start from:
mysql> show relaylog events in "dbslave-relay.007178" from 8666654 limit 1\G
*************************** 1. row ***************************
Log_name: dbslave-relay.007178
Pos: 8666654
Event_type: Query
Server_id: 1016572
End_log_pos: 8666565
Info: use `etsy_shard`; /* [CVmkWxhD7gsatX8hLbkDoHk29iKo] [etsy_shard_001_B] [/your/activity/index.php] */ UPDATE `news_feed_stats` SET `time_last_viewed` = 1366406780, `update_time` = 1366406780 WHERE `owner_id` = 30793071 AND `owner_type_id` = 2 AND `feed_type` = 'owner'
1 row in set (0.00 sec)
  97. Filter bad queries: cycle through all the logs and analyze the Query events; Rotate events point to the next log file; the last relay log points to the master's binlog (the server_id is the master's, and the binlog coordinates match master_log_file/pos). http://www.flickr.com/photos/chriswaits/6607823843/sizes/l/in/photostream/
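In pseudocode terms, that replay pass might look like the sketch below. The event shapes, the dict-of-logs input, and the bad-query predicate are all hypothetical; a real implementation would walk mysqlbinlog output or use a binlog-parsing library.

```python
import re

# Hypothetical sketch of the "filter bad queries" pass: walk relay-log events
# in order, drop Query events matching the known-bad statement, follow Rotate
# events to the next log file, and keep everything else for replay.

def filter_bad_queries(logs, first_log, bad_pattern):
    """logs: {log_name: [(event_type, info), ...]}; yields statements to replay."""
    log_name = first_log
    while log_name is not None:
        next_log = None
        for event_type, info in logs[log_name]:
            if event_type == "Rotate":
                next_log = info                    # Rotate names the next log file
            elif event_type == "Query":
                if re.search(bad_pattern, info):
                    continue                       # skip the bad statement
                yield info                         # replay everything else
        log_name = next_log

logs = {
    "relay.001": [
        ("Query", "UPDATE shops SET name = 'ok' WHERE id = 1"),
        ("Query", "UPDATE shops SET name = 'oops'"),   # the bad query
        ("Rotate", "relay.002"),
    ],
    "relay.002": [("Query", "INSERT INTO shops VALUES (2, 'fine')")],
}
replay = list(filter_bad_queries(logs, "relay.001", r"SET name = 'oops'"))
assert replay == [
    "UPDATE shops SET name = 'ok' WHERE id = 1",
    "INSERT INTO shops VALUES (2, 'fine')",
]
```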
  98-102. After the delayed slave's data is restored: 1) stop mysql on A and on the slave; 2) copy the datafiles to A; 3) restart B-to-A replication and let A catch up to B; 4) restart A-to-B replication, put A back in, then pull B. master.info should be pointing to the right place
  103. Other Forms of Recovery: migrate a single object (user/shop/etc.) from the delayed slave (similar to a shard migration); generate deltas from Hadoop; or, if the delayed slave has already "played" the bad data, restore from last night's backup plus binlogs (slower)
  104. Use Cases: what are some use cases? http://www.flickr.com/photos/seatbelt67/502255276/sizes/o/in/photostream/
  105. A user reports a bug... a user files a bug, and I can trace the code for the exact page they're on right from my dev machine
  106. Testing "dry" writes: testing how the application would run a write without executing it. In r/o mode an exception is thrown with the exact query it would have attempted to run, the values it tried to use, etc.
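The dry-write behavior described here could be sketched as a wrapper that raises in read-only mode instead of executing, carrying the exact SQL and bound values. The class and exception names below are hypothetical, not Etsy's actual ORM layer.

```python
# Hypothetical sketch of a "dry" write: in read-only mode, instead of running
# the write, raise an exception carrying the exact query and values it would
# have used, so the engineer can inspect what *would* have happened.

class DryWriteError(Exception):
    def __init__(self, sql, params):
        super().__init__(f"dry write blocked: {sql} with {params}")
        self.sql, self.params = sql, params

class Connection:
    def __init__(self, read_only):
        self.read_only = read_only
        self.executed = []          # stands in for actually running the query

    def execute_write(self, sql, params):
        if self.read_only:
            raise DryWriteError(sql, params)   # surface the would-be write
        self.executed.append((sql, params))

conn = Connection(read_only=True)
try:
    conn.execute_write("UPDATE shops SET name = %s WHERE id = %s", ("x", 1))
except DryWriteError as e:
    assert e.params == ("x", 1)
```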
  107. Search ads campaign consistency: starting campaigns and maintaining consistency for the entire ad system is nearly impossible in dev. Search ads data is stored in more than a dozen DB tables, and state changes are driven by a combination of browsers triggering ads, sellers managing their campaigns, and a slew of crons running anywhere from once per 5 minutes to once a month. For example, to test pausing campaigns that run out of money mid-day, we can pull large numbers of campaigns from prod and operate on those to verify that the data will still be consistent
  108. Google Product Listing Ads: GPLA is where we syndicate our listings to Google for use in Google product search ads. We can test edge cases in GPLA syndication where it would be difficult to recreate the state in dev
  109. Testing prototypes: features like "similar items" search give better results in production because of the amount of data; this allowed us to test the quality of the listings a prototype was displaying
  110. Performance testing: we need a real data set to test pages like treasury search, with lots of threads, avatars, etc. The dev data is too sparse: xhprof traces don't mean anything, and missing avatars change the performance characteristics
  111. Hadoop-generated datasets: datasets produced from Hadoop (recommendations for users, or statistics about usage). Since Hadoop holds prod data, the results are for prod users/listings/shops, so we have to check them against prod; syncing to dev would fill the dev DBs, and the data wouldn't line up because it refers to prod
  112-113. Browse slices: browse slices have a complex population, so it's easier to test an experiment against prod data. There aren't enough listings in dev to populate the narrower subcategories, and it just takes too long
  114. Thank You. We're hiring: etsy.com/jobs
