Scaling Etsy: What Went Wrong, What Went Right

Software Engineer at Etsy
Oct. 5, 2011

  1. Scaling Etsy: What Went Wrong, What Went Right. Ross Snyder, ross@etsy.com, @beamrider9, Sept. 30, 2011
  2. Etsy is the world’s handmade marketplace (vintage and supplies, too).
  3. Etsy was founded in mid-2005 and is constantly growing. [Chart: Gross Merchandise Sales ($MM)]
  4. From humble beginnings... June 2005: Four employees, one web server*, one db, founder’s apartment. (* until getting slashdotted by a link from Boing Boing in Aug. 2005)
  5. ... to today’s handmade juggernaut. Sept. 2011: 250+ employees, multiple offices, billions of pageviews. (NYC Mayor Mike Bloomberg visited Etsy in June 2011)
  6. How’d we get here?
  7. Answer: with some difficulty. “There is no education like adversity.” - Benjamin Disraeli
  8. A few disclaimers
  9. Hindsight is 20/20
  10. “History is written by the victors”
  11. Etsy thrives today because of what its early employees accomplished
  12. Your narrator wasn’t present for most of the events covered in this talk
  13. Etsy Architecture: 2007
  14. Etsy Architecture: 2007 Operating System: Database: Webserver: Languages:
  15. Etsy Architecture: 2007 Most business logic in Postgres stored procedures
  16. Etsy Architecture: 2007 Front end / database interaction = stored procedure calls wrapped with PHP functions
  17. Etsy Architecture: 2007 Some database partitioning by feature, but still with a large central DB
  18. Etsy Architecture: 2007 Site uptime = not great
  19. Etsy Architecture: 2007 “How do we scale?”
  20. Etsy Architecture: 2007 “Let’s write some middleware!” (runners-up: “Let’s rewrite the site in Java!” and “Let’s rewrite the site in Python!”)
  21. Conway’s Law: “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.” - Melvin Conway, 1968
  22. Etsy Engineering: 2007 (Dev, DBA, Ops)
  23. Etsy Engineering: 2007 Devs write code
  24. Etsy Engineering: 2007 DBAs write SQL
  25. Etsy Engineering: 2007 Ops deploys code & touches prod
  26. SILOS
  27. Etsy’s big bet: “Sprouter” (the Stored Procedure Router)
  28. Sprouter: Web (PHP) → Sprouter (Python) → DB (Postgres). Runs on each webserver, listens on port 8010
  29. Sprouter: Maps name/arguments to a Postgres stored procedure, calls it, returns results
  30. Sprouter: Caches things
  31. Sprouter: Supports sharding (in theory)
  32. Sprouter: Devs write PHP, DBAs write SQL, meet somewhere in the middle
  33. SILOS
  34. Sprouter: The hope was that it would be easier to scale Sprouter than to scale the database itself
  35. Sprouter: (scaling the db when everything’s in stored procedures = somewhere between hard and impossible)
  36. Sprouter: Timeline. Fall ’07: idea first discussed; Spring ’08: alpha version debuts; Fall ’08: released in production
  37. Sprouter: Timeline. Fall ’07: idea first discussed; Spring ’08: alpha version debuts; Fall ’08: released in production; Spring ’09: Sprouter deprecated
  38. What happened?
  39. Sprouter: “Good” Parts. Forcibly centralizes database access
  40. Sprouter: “Good” Parts. Hides data store implementation from caller
  41. Sprouter: “Good” Parts. Opens the door for “clever” automatic caching
  42. Sprouter: “Good” Parts. Prevents developers from writing SQL (?)
  43.
  44. Sprouter: Not-As-Good Parts. Creates substantial developer friction
  45. Sprouter: Not-As-Good Parts. Homegrown daemon + dependencies for Ops to maintain
  46. Sprouter: Not-As-Good Parts. Lack of community support / provability
  47. Sprouter: Not-As-Good Parts. Complex synchronization required to deploy (due to tight coupling with Postgres)
  48. Sprouter: Not-As-Good Parts. Database remains a single point of failure (sharding features never fully formed)
  49. Sprouter: Summary. Extra barriers to development
  50. Sprouter: Summary. Extra barriers to development + negligible (negative?) effect on site reliability
  51. Sprouter: Summary. Extra barriers to development + negligible (negative?) effect on site reliability + deploys even more painful
  52. Sprouter: Summary. Extra barriers to development + negligible (negative?) effect on site reliability + deploys even more painful + requires extra Ops/Dev resources
  53. Sprouter: Summary. Extra barriers to development + negligible (negative?) effect on site reliability + deploys even more painful + requires extra Ops/Dev resources =
  54. How did attitudes change so quickly?
  55. Sprouter: Timeline. Fall ’07: idea first discussed; Spring ’08: alpha version debuts; Fall ’08: released in production; Spring ’09: Sprouter deprecated
  56. The Great Etsy Culture Shift
  57. The Great Etsy Culture Shift Just as Sprouter went live, many of its strongest proponents departed Etsy
  58. The Great Etsy Culture Shift Taking with them...
  59. The Great Etsy Culture Shift Devotion to Postgres stored procedures / types
  60. The Great Etsy Culture Shift Fear of developers writing SQL
  61. The Great Etsy Culture Shift Fear of developers touching prod
  62. The Great Etsy Culture Shift Infrequent / large deploys to production
  63. The Great Etsy Culture Shift “Not developed here”
  64. The Great Etsy Culture Shift: Then → Fall ’08 → Now
  65. DevOps
  66. DevOps Silos = bad
  67. DevOps Trust, cooperation, transparency, shared responsibility = good
  68. DevOps “We’re all in this together”
  69. The Way Forward: Part 1 Stabilize the site
  70. The Way Forward: Part 1 Stabilize the site Improve metrics & monitoring
  71. The Way Forward: Part 1 Stabilize the site StatsD http://github.com/etsy/statsd
  72. The Way Forward: Part 1 Stabilize the site Upgrade database hardware vertically as far as possible
  73. The Way Forward: Part 1 Stabilize the site Give developers production access to help troubleshoot problems
  74. The Way Forward: Part 2 Continuous Deployment
  75. The Way Forward: Part 2 Continuous Deployment Any engineer can deploy to prod (generally happens 25+ times per day)
  76. The Way Forward: Part 2 Continuous Deployment Deployinator http://github.com/etsy/deployinator
  77. The Way Forward: Part 2 Continuous Deployment One button that deploys the site
  78. The Way Forward: Part 2 Continuous Deployment Small changesets, deployed frequently
  79. The Way Forward: Part 2 Continuous Deployment Requires solid tests, good communication
  80. The Way Forward: Part 2 Continuous Deployment Distributed developer-driven QA
  81. The Way Forward: Part 3 Circumvent Sprouter
  82. The Way Forward: Part 3 Circumvent Sprouter Object-Relational Mapping (ORM)
  83. The Way Forward: Part 3 Circumvent Sprouter aka “The Vietnam of Computer Science” (Google it)
  84. The Way Forward: Part 3 Circumvent Sprouter Front-end PHP talks directly to database via ORM (also written in PHP)
  85. The Way Forward: Part 3 Circumvent Sprouter ORM can cache where appropriate (as can front end)
  86. The Way Forward: Part 4 Database Sharding
  87. The Way Forward: Part 4 Database Sharding Etsy has a lot of DNA from Flickr, including their DB sharding scheme
  88. The Way Forward: Part 4 Database Sharding Based on MySQL
  89. The Way Forward: Part 4 Database Sharding Battle-tested, well-known
  90. The Way Forward: Part 4 Database Sharding Scales horizontally to infinity (or close enough)
  91. The Way Forward: Part 4 Database Sharding No single points of failure (master-master replication)
  92. The Way Forward: Part 4 Database Sharding Gradually phase out Sprouter, phase in ORM / sharded data
  93. Sprouter: Timeline. Fall ’07: idea first discussed; Spring ’08: alpha version debuts; Fall ’08: released in production; Spring ’09: Sprouter deprecated
  94. Sprouter: Timeline. Fall ’07: idea first discussed; Spring ’08: alpha version debuts; Fall ’08: released in production; Spring ’09: Sprouter deprecated; Spring ’11: Sprouter turned off
  95.
  96. Lessons Learned
  97. Etsy Architecture: 2007 Operating System: Database: Webserver: Languages:
  98. Etsy Architecture: 2011 Operating System: Database: Webserver: Languages:
  99. Open & trusting > closed & afraid (DevOps DevOps DevOps)
  100. Front end/database interaction is too critical to take chances on novel/untested solutions
  101. Side corollary: If you’re doing something “clever”, you’re probably doing it wrong
  102. The architectural decisions you make today will have a large impact long after you’re gone
  103. No architectural hole is so deep that proven scaling strategies don’t exist for digging out
  104. Acknowledgement: We are probably making decisions today that will be the subject of a similar talk in 2015
  105. Learn More: http://codeascraft.etsy.com/ @codeascraft
  106. Etsy is hiring! http://www.etsy.com/careers @etsy
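Slides 28 to 32 describe what Sprouter did: take a name and arguments from the PHP front end, map them to a Postgres stored procedure, call it, return the results, and cache along the way. The following is a minimal, hypothetical Python sketch of that routing idea (Python being the language Sprouter itself was written in). It is not Sprouter's code: `StoredProcRouter`, `get_listing_by_id`, the TTL cache and the injected executor are all assumptions, and the executor is faked so the example runs without a database.

```python
import time
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical executor: (sql, params) -> rows. In a real deployment this would
# wrap a Postgres DB-API cursor; it is injected here so the sketch runs offline.
Executor = Callable[[str, Sequence[Any]], List[Any]]


class StoredProcRouter:
    """Toy stored-procedure router in the spirit of Sprouter (all names hypothetical)."""

    def __init__(self, execute: Executor, ttl_seconds: float = 60.0) -> None:
        self._execute = execute
        self._ttl = ttl_seconds
        self._procs: Dict[str, Tuple[str, bool]] = {}
        self._cache: Dict[Tuple[str, Tuple[Any, ...]], Tuple[float, List[Any]]] = {}

    def register(self, name: str, proc: str, cacheable: bool = False) -> None:
        # The DBA side: expose a stored procedure under a caller-facing name.
        self._procs[name] = (proc, cacheable)

    def call(self, name: str, *args: Any) -> List[Any]:
        # The developer side: callers know only the name and arguments, never the SQL.
        proc, cacheable = self._procs[name]  # unknown names raise KeyError
        key = (name, args)
        now = time.monotonic()
        if cacheable:
            hit = self._cache.get(key)
            if hit is not None and now - hit[0] < self._ttl:
                return hit[1]
        # Only registered procedure names are interpolated; arguments are passed
        # as bound parameters (%s is the psycopg2 placeholder style).
        placeholders = ", ".join(["%s"] * len(args))
        rows = self._execute(f"SELECT * FROM {proc}({placeholders})", args)
        if cacheable:
            self._cache[key] = (now, rows)
        return rows


# Usage with a fake executor that records every query it receives.
calls: List[Tuple[str, Tuple[Any, ...]]] = []


def fake_execute(sql: str, params: Sequence[Any]) -> List[Any]:
    calls.append((sql, tuple(params)))
    return [("listing", params[0])]


router = StoredProcRouter(fake_execute)
router.register("get_listing", "get_listing_by_id", cacheable=True)
first = router.call("get_listing", 42)
second = router.call("get_listing", 42)  # served from the cache; no second query
```

Even in this toy form, the developer friction from slide 44 is visible: no query can be issued until someone writes the stored procedure and registers it, so every new feature crosses the Dev/DBA boundary at least once.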
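Slide 71 points to StatsD, Etsy's open-source metrics daemon. Applications report to it with small plain-text UDP datagrams of the form `<bucket>:<value>|<type>`, where the type is `c` for a counter, `ms` for a timer or `g` for a gauge, optionally followed by `|@<rate>` when the event was sampled. The client below is a minimal sketch under those assumptions; the bucket names and the `127.0.0.1` host are hypothetical, and 8125 is StatsD's default UDP port.

```python
import random
import socket
from typing import Optional, Tuple, Union

# StatsD's default UDP port is 8125; the host is an assumption for this sketch.
STATSD_ADDR: Tuple[str, int] = ("127.0.0.1", 8125)

Number = Union[int, float]


def format_metric(bucket: str, value: Number, metric_type: str,
                  sample_rate: Optional[float] = None) -> bytes:
    """Build one StatsD datagram: '<bucket>:<value>|<type>[|@<rate>]'.

    metric_type is 'c' (counter), 'ms' (timer) or 'g' (gauge).
    """
    line = f"{bucket}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"
    return line.encode("utf-8")


def send_metric(sock: socket.socket, bucket: str, value: Number, metric_type: str,
                sample_rate: Optional[float] = None,
                addr: Tuple[str, int] = STATSD_ADDR) -> None:
    # Client-side sampling: send only a fraction of events and tag the rate
    # so StatsD can scale counters back up.
    if sample_rate is not None and sample_rate < 1 and random.random() >= sample_rate:
        return
    # Fire-and-forget over UDP: if no daemon is listening, the datagram is
    # typically just dropped and the application carries on.
    sock.sendto(format_metric(bucket, value, metric_type, sample_rate), addr)
```

For example, `send_metric(socket.socket(socket.AF_INET, socket.SOCK_DGRAM), "deploys", 1, "c")` would count one deploy. Because nothing waits for a reply, instrumenting a code path this way costs very little, which makes it practical to add metrics liberally while stabilizing a site.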
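Slides 86 to 92 describe moving to a Flickr-derived sharding scheme on MySQL, with master-master replication so that no single database host is a point of failure. The slides do not spell out the details, so the following Python sketch is only an illustration under assumptions: a lookup directory maps each owner ID to a shard, each shard is a pair of hosts replicating to each other, and each owner is pinned to one side of the pair, falling back to the other side if it is down. `ShardDirectory`, `ShardPair`, the round-robin placement and the hostnames are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ShardPair:
    """One shard: two MySQL hosts in master-master replication (hostnames hypothetical)."""
    side_a: str
    side_b: str


class ShardDirectory:
    """Hypothetical owner -> shard lookup; a real system would keep this in a database."""

    def __init__(self, shards: List[ShardPair]) -> None:
        self.shards = shards
        self._index: Dict[int, int] = {}  # owner_id -> shard number

    def assign(self, owner_id: int) -> int:
        # New owners are placed round-robin; existing owners keep their shard.
        if owner_id not in self._index:
            self._index[owner_id] = len(self._index) % len(self.shards)
        return self._index[owner_id]

    def host_for(self, owner_id: int, down: FrozenSet[str] = frozenset()) -> str:
        pair = self.shards[self.assign(owner_id)]
        # Pin each owner to one side of the pair so its writes don't conflict,
        # and fail over to the other side if the preferred host is down.
        if owner_id % 2 == 0:
            preferred, fallback = pair.side_a, pair.side_b
        else:
            preferred, fallback = pair.side_b, pair.side_a
        for host in (preferred, fallback):
            if host not in down:
                return host
        raise RuntimeError(f"both hosts for the shard of owner {owner_id} are down")


directory = ShardDirectory([ShardPair("db1a", "db1b"), ShardPair("db2a", "db2b")])
```

Keeping an explicit owner-to-shard directory, rather than computing the shard from a hash, means an owner's data can be relocated by updating a single directory entry. Pinning each owner's writes to one side of the pair is one common way to avoid conflicting writes under master-master replication while still letting either side take over when its partner fails.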