Scaling Etsy: What Went Wrong, What Went Right

Slides for the talk given at Surge 2011.

Transcript

  • 1. Scaling Etsy: What Went Wrong, What Went Right. Ross Snyder, ross@etsy.com, @beamrider9. Sept. 30, 2011
  • 2. Etsy is the world’s handmade marketplace. (vintage and supplies, too)
  • 3. Etsy was founded in mid-2005 and is constantly growing. [Chart: Gross Merchandise Sales ($MM)]
  • 4. From humble beginnings... June 2005: Four employees, one web*, one db, founder’s apartment (* until getting slashdotted by a link from Boing Boing in Aug. 2005)
  • 5. ... to today’s handmade juggernaut. Sept. 2011: 250+ employees, multiple offices, billions of pageviews (NYC Mayor Mike Bloomberg visited Etsy in June 2011)
  • 6. How’d we get here?
  • 7. Answer: with some difficulty. “There is no education like adversity.” - Benjamin Disraeli
  • 8. A few disclaimers
  • 9. Hindsight is 20/20
  • 10. “History is written by the victors”
  • 11. Etsy thrives today because of what its early employees accomplished
  • 12. Your narrator wasn’t present for most of the events covered in this talk
  • 13. Etsy Architecture: 2007
  • 14. Etsy Architecture: 2007. Operating System: Database: Webserver: Languages:
  • 15. Etsy Architecture: 2007. Most business logic in Postgres stored procedures
  • 16. Etsy Architecture: 2007. Front end / database interaction = stored procedure calls wrapped with PHP functions
  • 17. Etsy Architecture: 2007. Some database partitioning by feature, but still with a large central DB
  • 18. Etsy Architecture: 2007. Site uptime = not great
  • 19. Etsy Architecture: 2007. “How do we scale?”
  • 20. Etsy Architecture: 2007. “Let’s write some middleware!” (runners up: “Let’s rewrite the site in Java!” and “Let’s rewrite the site in Python!”)
  • 21. Conway’s Law: “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.” - Melvin Conway, 1968
  • 22. Etsy Engineering: 2007. Dev | DBA | Ops
  • 23. Etsy Engineering: 2007. Dev | DBA | Ops. Devs write code
  • 24. Etsy Engineering: 2007. Dev | DBA | Ops. DBAs write SQL
  • 25. Etsy Engineering: 2007. Dev | DBA | Ops. Ops deploys code & touches prod
  • 26. SILOS
  • 27. Etsy’s big bet: “Sprouter” (the Stored Procedure Router)
  • 28. Sprouter: Web (PHP) -> Sprouter (Python) -> DB (Postgres). Runs on each webserver, listens on port 8010
  • 29. Sprouter: Web (PHP) -> Sprouter (Python) -> DB (Postgres). Maps name/arguments to a Postgres stored procedure, calls it, returns results
  • 30. Sprouter: Web (PHP) -> Sprouter (Python) -> DB (Postgres). Caches things
  • 31. Sprouter: Web (PHP) -> Sprouter (Python) -> DB (Postgres). Supports sharding (in theory)
  • 32. Sprouter: Web (PHP) -> Sprouter (Python) -> DB (Postgres). Devs write PHP, DBAs write SQL, meet somewhere in the middle
  • 33. SILOS
  • 34. Sprouter: Web (PHP) -> Sprouter (Python) -> DB (Postgres). The hope: easier to scale Sprouter than to scale the database itself
  • 35. Sprouter: Web (PHP) -> Sprouter (Python) -> DB (Postgres). (scaling the db when everything’s in stored procedures = somewhere between hard and impossible)
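The slides describe Sprouter only at a high level: it maps a name and arguments to a stored procedure, calls it, returns the result, and caches things. The following is a minimal Python sketch of that general idea, not Sprouter's actual code. The class name, the procedure name `get_shop_name`, and the dictionary standing in for Postgres are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical stand-in for a database connection: maps a stored procedure
# name to a Python callable so the sketch runs without Postgres.
FakeDatabase = Dict[str, Callable[..., Any]]


class StoredProcedureRouter:
    """Toy router: looks up a procedure by name, calls it, caches the result."""

    def __init__(self, database: FakeDatabase) -> None:
        self._database = database
        self._cache: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}
        self.db_calls = 0  # counts how often the "database" is actually hit

    def call(self, name: str, *args: Any) -> Any:
        key = (name, args)
        if key in self._cache:
            return self._cache[key]  # cache hit: no database round trip
        if name not in self._database:
            raise KeyError(f"unknown stored procedure: {name}")
        self.db_calls += 1
        result = self._database[name](*args)
        self._cache[key] = result
        return result


# Hypothetical procedure name and data, for illustration only.
database: FakeDatabase = {
    "get_shop_name": lambda shop_id: {1: "knits", 2: "prints"}[shop_id],
}
router = StoredProcedureRouter(database)
first = router.call("get_shop_name", 1)
second = router.call("get_shop_name", 1)  # served from the cache
```

Even in this toy form, every new query needs a matching entry on the database side before front-end code can use it, which hints at the deploy coupling the talk goes on to criticize.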
  • 36. Sprouter: Timeline. Fall ’07: Idea first discussed; Spring ’08: Alpha version debuts; Fall ’08: Released in production
  • 37. Sprouter: Timeline. Fall ’07: Idea first discussed; Spring ’08: Alpha version debuts; Fall ’08: Released in production; Spring ’09: Sprouter deprecated
  • 38. What happened?
  • 39. Sprouter: “Good” Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Forcibly centralizes database access
  • 40. Sprouter: “Good” Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Hides data store implementation from caller
  • 41. Sprouter: “Good” Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Opens the door for “clever” automatic caching
  • 42. Sprouter: “Good” Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Prevents developers from writing SQL (?)
  • 43.
  • 44. Sprouter: Not-As-Good Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Creates substantial developer friction
  • 45. Sprouter: Not-As-Good Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Homegrown daemon + dependencies for Ops to maintain
  • 46. Sprouter: Not-As-Good Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Lack of community support / provability
  • 47. Sprouter: Not-As-Good Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Complex synchronization required to deploy (due to tight coupling with Postgres)
  • 48. Sprouter: Not-As-Good Parts. Web (PHP) -> Sprouter (Python) -> DB (Postgres). Database remains single point of failure (sharding features never fully formed)
  • 49. Sprouter: Summary. Extra barriers to development
  • 50. Sprouter: Summary. Extra barriers to development + Negligible (negative?) effect on site reliability
  • 51. Sprouter: Summary. Extra barriers to development + Negligible (negative?) effect on site reliability + Deploys even more painful
  • 52. Sprouter: Summary. Extra barriers to development + Negligible (negative?) effect on site reliability + Deploys even more painful + Requires extra Ops/Dev resources
  • 53. Sprouter: Summary. Extra barriers to development + Negligible (negative?) effect on site reliability + Deploys even more painful + Requires extra Ops/Dev resources =
  • 54. How did attitudes change so quickly?
  • 55. Sprouter: Timeline. Fall ’07: Idea first discussed; Spring ’08: Alpha version debuts; Fall ’08: Released in production; Spring ’09: Sprouter deprecated
  • 56. The Great Etsy Culture Shift
  • 57. The Great Etsy Culture Shift. Just as Sprouter went live, many of its strongest proponents departed Etsy
  • 58. The Great Etsy Culture Shift. Taking with them...
  • 59. The Great Etsy Culture Shift. Devotion to Postgres stored procedures / types
  • 60. The Great Etsy Culture Shift. Fear of developers writing SQL
  • 61. The Great Etsy Culture Shift. Fear of developers touching prod
  • 62. The Great Etsy Culture Shift. Infrequent / large deploys to production
  • 63. The Great Etsy Culture Shift. “Not developed here”
  • 64. The Great Etsy Culture Shift. Then / Now, Fall ’08
  • 65. DevOps
  • 66. DevOps. Silos = bad
  • 67. DevOps. Trust, cooperation, transparency, shared responsibility = good
  • 68. DevOps. “We’re all in this together”
  • 69. The Way Forward: Part 1. Stabilize the site
  • 70. The Way Forward: Part 1. Stabilize the site. Improve metrics & monitoring
  • 71. The Way Forward: Part 1. Stabilize the site. StatsD: http://github.com/etsy/statsd
  • 72. The Way Forward: Part 1. Stabilize the site. Upgrade database hardware vertically as far as possible
  • 73. The Way Forward: Part 1. Stabilize the site. Give developers production access to help troubleshoot problems
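The slides name StatsD without showing how it is used. As a rough illustration, StatsD accepts plain-text metrics of the form `name:value|type` over UDP, by default on port 8125. The sketch below builds such lines in Python; the metric names and helper functions are hypothetical, and a real application would more likely use an existing client library.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in StatsD's plain-text line format."""
    return f"{name}:{value}|c".encode("ascii")


def format_timer(name: str, milliseconds: int) -> bytes:
    """Encode a timing measurement, in milliseconds."""
    return f"{name}:{milliseconds}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric datagram; the host and port defaults are assumptions."""
    # UDP is connectionless, so the sender does not wait for a reply.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A call such as `send_metric(format_counter("logins"))` would record one login, assuming a StatsD daemon is listening at that address.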
  • 74. The Way Forward: Part 2. Continuous Deployment
  • 75. The Way Forward: Part 2. Continuous Deployment. Any engineer can deploy to prod (generally happens 25+ times per day)
  • 76. The Way Forward: Part 2. Continuous Deployment. Deployinator: http://github.com/etsy/deployinator
  • 77. The Way Forward: Part 2. Continuous Deployment. One button that deploys the site
  • 78. The Way Forward: Part 2. Continuous Deployment. Small changesets, deployed frequently
  • 79. The Way Forward: Part 2. Continuous Deployment. Requires solid tests, good communication
  • 80. The Way Forward: Part 2. Continuous Deployment. Distributed developer-driven QA
  • 81. The Way Forward: Part 3. Circumvent Sprouter
  • 82. The Way Forward: Part 3. Circumvent Sprouter. Object-Relational Mapping (ORM)
  • 83. The Way Forward: Part 3. Circumvent Sprouter. aka “The Vietnam of Computer Science” (Google it)
  • 84. The Way Forward: Part 3. Circumvent Sprouter. Front-end PHP talks directly to database via ORM (also written in PHP)
  • 85. The Way Forward: Part 3. Circumvent Sprouter. ORM can cache where appropriate (as can front end)
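Etsy's ORM was written in PHP and the slides do not show its interface. Purely to illustrate the pattern of application code talking to the database through a mapper that can cache, here is a small Python sketch. It uses an in-memory SQLite database as a stand-in, and the `Listing` and `ListingMapper` names and the table layout are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Listing:
    listing_id: int
    title: str


class ListingMapper:
    """Maps rows of a hypothetical `listings` table to Listing objects."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._cache: Dict[int, Listing] = {}
        self.queries = 0  # number of SELECTs actually sent to the database

    def find(self, listing_id: int) -> Optional[Listing]:
        if listing_id in self._cache:
            return self._cache[listing_id]  # read served from the cache
        self.queries += 1
        row = self._conn.execute(
            "SELECT listing_id, title FROM listings WHERE listing_id = ?",
            (listing_id,),
        ).fetchone()
        if row is None:
            return None
        listing = Listing(*row)
        self._cache[listing_id] = listing
        return listing

    def save(self, listing: Listing) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO listings (listing_id, title) VALUES (?, ?)",
            (listing.listing_id, listing.title),
        )
        # Drop any cached copy so the next read sees the new row.
        self._cache.pop(listing.listing_id, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, title TEXT)")
mapper = ListingMapper(conn)
mapper.save(Listing(1, "hand-knit scarf"))
```

Unlike the Sprouter arrangement, the query lives in the same codebase as the caller, so a developer can change both in one small deploy.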
  • 86. The Way Forward: Part 4. Database Sharding
  • 87. The Way Forward: Part 4. Database Sharding. Etsy has a lot of DNA from Flickr, including their DB sharding scheme
  • 88. The Way Forward: Part 4. Database Sharding. Based on MySQL
  • 89. The Way Forward: Part 4. Database Sharding. Battle-tested, well-known
  • 90. The Way Forward: Part 4. Database Sharding. Scales horizontally to infinity (or close enough)
  • 91. The Way Forward: Part 4. Database Sharding. No single points of failure (master-master replication)
  • 92. The Way Forward: Part 4. Database Sharding. Gradually phase out Sprouter, phase in ORM / sharded data
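The slides do not spell out the sharding scheme beyond its MySQL base and master-master pairs. One generic way to shard horizontally is a lookup directory that assigns each data owner to a shard once and remembers the choice, so that adding a shard never moves existing data. The Python sketch below illustrates that idea only; the host names, the placement policy (least-loaded shard), and the even/odd choice between the two masters are all assumptions.

```python
from typing import Dict, List, Tuple

# Each shard is a pair of hosts replicating to each other (master-master).
# Host names are hypothetical.
SHARDS: List[Tuple[str, str]] = [
    ("shard0-a.example", "shard0-b.example"),
    ("shard1-a.example", "shard1-b.example"),
]


class ShardDirectory:
    """Assigns each owner to a shard once and remembers the assignment."""

    def __init__(self, shards: List[Tuple[str, str]]) -> None:
        self._shards = shards
        self._assignment: Dict[int, int] = {}

    def shard_for(self, owner_id: int) -> int:
        if owner_id not in self._assignment:
            # Assumed placement policy: pick the shard with the fewest owners.
            counts = [0] * len(self._shards)
            for shard in self._assignment.values():
                counts[shard] += 1
            self._assignment[owner_id] = counts.index(min(counts))
        return self._assignment[owner_id]

    def host_for(self, owner_id: int) -> str:
        first, second = self._shards[self.shard_for(owner_id)]
        # Assumed policy: split owners between the two masters by id parity.
        return first if owner_id % 2 == 0 else second

    def add_shard(self, pair: Tuple[str, str]) -> None:
        # New capacity only affects owners assigned after this call.
        self._shards.append(pair)


directory = ShardDirectory(list(SHARDS))
```

Because assignments are stored rather than computed from the shard count, capacity can grow one pair at a time, and either host of a pair can serve its owners if the other fails.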
  • 93. Sprouter: Timeline. Fall ’07: Idea first discussed; Spring ’08: Alpha version debuts; Fall ’08: Released in production; Spring ’09: Sprouter deprecated
  • 94. Sprouter: Timeline. Fall ’07: Idea first discussed; Spring ’08: Alpha version debuts; Fall ’08: Released in production; Spring ’09: Sprouter deprecated; Spring ’11: Sprouter turned off
  • 95.
  • 96. Lessons Learned
  • 97. Etsy Architecture: 2007. Operating System: Database: Webserver: Languages:
  • 98. Etsy Architecture: 2011. Operating System: Database: Webserver: Languages:
  • 99. Open & trusting > closed & afraid (DevOps DevOps DevOps)
  • 100. Front end/database interaction is too critical to take chances on novel/untested solutions
  • 101. Side corollary: If you’re doing something “clever”, you’re probably doing it wrong
  • 102. The architectural decisions you make today will have large impact long after you’re gone
  • 103. No architectural hole is so deep that proven scaling strategies don’t exist for digging out
  • 104. Acknowledgement: We are probably making decisions today that will be the subject of a similar talk in 2015
  • 105. Learn More: http://codeascraft.etsy.com/ @codeascraft
  • 106. Etsy is hiring! http://www.etsy.com/careers @etsy