Scaling Etsy: What Went Wrong, What Went Right
 

Slides for the talk given at Surge 2011.

Presentation Transcript

  • Scaling Etsy: What Went Wrong, What Went Right. Ross Snyder, ross@etsy.com, @beamrider9. Sept. 30, 2011 1
  • Etsy is the world’s handmade marketplace. (vintage and supplies, too) 2
  • Etsy was founded in mid-2005 and is constantly growing. Gross Merchandise Sales ($MM) 3
  • From humble beginnings... June 2005: Four employees, one web*, one db, founder’s apartment (* until getting slashdotted by a link from Boing Boing in Aug. 2005) 4
  • ... to today’s handmade juggernaut. Sept. 2011: 250+ employees, multiple offices, billions of pageviews (NYC Mayor Mike Bloomberg visited Etsy in June 2011) 5
  • How’d we get here? 6
  • Answer: with some difficulty. “There is no education like adversity.” - Benjamin Disraeli 7
  • A few disclaimers 8
  • Hindsight is 20/20 9
  • “History is written by the victors” 10
  • Etsy thrives today because of what its early employees accomplished 11
  • Your narrator wasn’t present for most of the events covered in this talk 12
  • Etsy Architecture: 2007 13
  • Etsy Architecture: 2007 Operating System: Database: Webserver: Languages: 14
  • Etsy Architecture: 2007 Most business logic in Postgres stored procedures 15
  • Etsy Architecture: 2007 Front end / database interaction = stored procedure calls wrapped with PHP functions 16
  • Etsy Architecture: 2007 Some database partitioning by feature, but still with a large central DB 17
  • Etsy Architecture: 2007 Site uptime = not great 18
  • Etsy Architecture: 2007 “How do we scale?” 19
  • Etsy Architecture: 2007 “Let’s write some middleware!” (runners up: “Let’s rewrite the site in Java!” and “Let’s rewrite the site in Python!”) 20
  • Conway’s Law: “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.” - Melvin Conway, 1968 21
  • Etsy Engineering: 2007 Dev DBA Ops 22
  • Etsy Engineering: 2007 Dev DBA Ops Devs write code 23
  • Etsy Engineering: 2007 Dev DBA Ops DBAs write SQL 24
  • Etsy Engineering: 2007 Dev DBA Ops Ops deploys code & touches prod 25
  • SILOS 26
  • Etsy’s big bet: “Sprouter” (the Stored Procedure Router) 27
  • Sprouter: Web (PHP), Sprouter (Python), DB (Postgres). Runs on each webserver, listens on port 8010 28
  • Sprouter: Web (PHP), Sprouter (Python), DB (Postgres). Maps name/arguments to a Postgres stored procedure, calls it, returns results 29
  • Sprouter: Web (PHP), Sprouter (Python), DB (Postgres). Caches things 30
  • Sprouter: Web (PHP), Sprouter (Python), DB (Postgres). Supports sharding (in theory) 31
  • Sprouter: Web (PHP), Sprouter (Python), DB (Postgres). Devs write PHP, DBAs write SQL, meet somewhere in the middle 32
  • SILOS 33
  • Sprouter: Web (PHP), Sprouter (Python), DB (Postgres). The hope: easier to scale Sprouter than to scale the database itself 34
  • Sprouter: Web (PHP), Sprouter (Python), DB (Postgres). (scaling the db when everything’s in stored procedures = somewhere between hard and impossible) 35
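
The slides describe Sprouter only at a high level: it maps a name and arguments to a Postgres stored procedure, calls it, returns the results, and caches things. The Python sketch below illustrates that idea under stated assumptions. It is not Sprouter's actual code: the `PROCEDURES` registry, the procedure names, the `Router` class, and the injected `execute` callable are all hypothetical, and the database call is a stand-in so the example runs without Postgres.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Hypothetical registry: logical call name -> stored procedure name.
# Names are fixed here (never taken from callers), so interpolating
# them into the SQL text below does not open an injection hole.
PROCEDURES: Dict[str, str] = {
    "get_listing": "listing_get_by_id",
    "get_shop": "shop_get_by_user",
}

# Anything that runs (sql, params) against a database and returns rows.
Executor = Callable[[str, Tuple[Any, ...]], Any]


class Router:
    """Routes named calls to stored procedures, with an optional cache."""

    def __init__(self, execute: Executor) -> None:
        self._execute = execute
        self._cache: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}

    def build_query(self, name: str, args: Sequence[Any]) -> Tuple[str, Tuple[Any, ...]]:
        """Turn a logical name plus arguments into parameterized SQL."""
        proc = PROCEDURES[name]  # raises KeyError for unknown names
        placeholders = ", ".join(["%s"] * len(args))
        return f"SELECT * FROM {proc}({placeholders})", tuple(args)

    def call(self, name: str, args: Sequence[Any], cacheable: bool = False) -> Any:
        """Call the procedure, serving repeat cacheable calls from memory."""
        key = (name, tuple(args))
        if cacheable and key in self._cache:
            return self._cache[key]
        sql, params = self.build_query(name, args)
        result = self._execute(sql, params)
        if cacheable:
            self._cache[key] = result
        return result
```

A caller would construct `Router` with a function that runs the SQL on a real connection, then invoke `router.call("get_listing", [42], cacheable=True)`. Even in this toy form, the coupling the talk criticizes is visible: every new stored procedure needs a matching registry entry deployed in step with the database.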
  • Sprouter: Timeline. Fall ’07: Idea first discussed. Spring ’08: Alpha version debuts. Fall ’08: Released in production 36
  • Sprouter: Timeline. Fall ’07: Idea first discussed. Spring ’08: Alpha version debuts. Fall ’08: Released in production. Spring ’09: Sprouter deprecated 37
  • What happened? 38
  • Sprouter: “Good” Parts. Web (PHP), Sprouter (Python), DB (Postgres). Forcibly centralizes database access 39
  • Sprouter: “Good” Parts. Web (PHP), Sprouter (Python), DB (Postgres). Hides data store implementation from caller 40
  • Sprouter: “Good” Parts. Web (PHP), Sprouter (Python), DB (Postgres). Opens the door for “clever” automatic caching 41
  • Sprouter: “Good” Parts. Web (PHP), Sprouter (Python), DB (Postgres). Prevents developers from writing SQL (?) 42
  • 43
  • Sprouter: Not-As-Good Parts. Web (PHP), Sprouter (Python), DB (Postgres). Creates substantial developer friction 44
  • Sprouter: Not-As-Good Parts. Web (PHP), Sprouter (Python), DB (Postgres). Homegrown daemon + dependencies for Ops to maintain 45
  • Sprouter: Not-As-Good Parts. Web (PHP), Sprouter (Python), DB (Postgres). Lack of community support / provability 46
  • Sprouter: Not-As-Good Parts. Web (PHP), Sprouter (Python), DB (Postgres). Complex synchronization required to deploy (due to tight coupling with Postgres) 47
  • Sprouter: Not-As-Good Parts. Web (PHP), Sprouter (Python), DB (Postgres). Database remains single point of failure (sharding features never fully formed) 48
  • Sprouter: Summary. Extra barriers to development 49
  • Sprouter: Summary. Extra barriers to development + Negligible (negative?) effect on site reliability 50
  • Sprouter: Summary. Extra barriers to development + Negligible (negative?) effect on site reliability + Deploys even more painful 51
  • Sprouter: Summary. Extra barriers to development + Negligible (negative?) effect on site reliability + Deploys even more painful + Requires extra Ops/Dev resources 52
  • Sprouter: Summary. Extra barriers to development + Negligible (negative?) effect on site reliability + Deploys even more painful + Requires extra Ops/Dev resources = 53
  • How did attitudes change so quickly? 54
  • Sprouter: Timeline. Fall ’07: Idea first discussed. Spring ’08: Alpha version debuts. Fall ’08: Released in production. Spring ’09: Sprouter deprecated 55
  • The Great Etsy Culture Shift 56
  • The Great Etsy Culture Shift Just as Sprouter went live, many of its strongest proponents departed Etsy 57
  • The Great Etsy Culture Shift Taking with them... 58
  • The Great Etsy Culture Shift Devotion to Postgres stored procedures / types 59
  • The Great Etsy Culture Shift Fear of developers writing SQL 60
  • The Great Etsy Culture Shift Fear of developers touching prod 61
  • The Great Etsy Culture Shift Infrequent / large deploys to production 62
  • The Great Etsy Culture Shift “Not developed here” 63
  • The Great Etsy Culture Shift Then Now Fall ’08 64
  • DevOps 65
  • DevOps Silos = bad 66
  • DevOps Trust, cooperation, transparency, shared responsibility = good 67
  • DevOps “We’re all in this together” 68
  • The Way Forward: Part 1 Stabilize the site 69
  • The Way Forward: Part 1 Stabilize the site Improve metrics & monitoring 70
  • The Way Forward: Part 1 Stabilize the site StatsD: http://github.com/etsy/statsd 71
  • The Way Forward: Part 1 Stabilize the site Upgrade database hardware vertically as far as possible 72
  • The Way Forward: Part 1 Stabilize the site Give developers production access to help troubleshoot problems 73
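
StatsD accepts metrics as small plain-text UDP datagrams of the form `bucket:value|type`, for example `c` for counters and `ms` for timers, with an optional `|@rate` suffix for sampled counters. The following is a minimal Python sketch of a client, assuming a StatsD daemon listening on `127.0.0.1:8125` (the conventional default port). The helper functions and metric names are made up for illustration and are not Etsy's client code.

```python
import socket
from typing import Optional, Union

Number = Union[int, float]


def format_metric(bucket: str, value: Number, metric_type: str,
                  sample_rate: Optional[float] = None) -> str:
    """Build one StatsD line, e.g. 'web.logins:1|c' or 'web.render:320|ms'."""
    line = f"{bucket}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget: UDP means a missing daemon will not block the caller."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


def increment(bucket: str) -> None:
    """Count one occurrence of an event (hypothetical helper)."""
    send_metric(format_metric(bucket, 1, "c"))


def timing(bucket: str, milliseconds: int) -> None:
    """Record how long something took, in milliseconds (hypothetical helper)."""
    send_metric(format_metric(bucket, milliseconds, "ms"))
```

Because sending is a single UDP datagram, instrumenting a code path is cheap enough to do liberally, which suits the "improve metrics & monitoring" goal named in the slides.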
  • The Way Forward: Part 2 Continuous Deployment 74
  • The Way Forward: Part 2 Continuous Deployment Any engineer can deploy to prod (generally happens 25+ times per day) 75
  • The Way Forward: Part 2 Continuous Deployment Deployinator: http://github.com/etsy/deployinator 76
  • The Way Forward: Part 2 Continuous Deployment One button that deploys the site 77
  • The Way Forward: Part 2 Continuous Deployment Small changesets, deployed frequently 78
  • The Way Forward: Part 2 Continuous Deployment Requires solid tests, good communication 79
  • The Way Forward: Part 2 Continuous Deployment Distributed developer-driven QA 80
  • The Way Forward: Part 3 Circumvent Sprouter 81
  • The Way Forward: Part 3 Circumvent Sprouter Object-Relational Mapping (ORM) 82
  • The Way Forward: Part 3 Circumvent Sprouter aka “The Vietnam of Computer Science” (Google it) 83
  • The Way Forward: Part 3 Circumvent Sprouter Front-end PHP talks directly to database via ORM (also written in PHP) 84
  • The Way Forward: Part 3 Circumvent Sprouter ORM can cache where appropriate (as can front end) 85
  • The Way Forward: Part 4 Database Sharding 86
  • The Way Forward: Part 4 Database Sharding Etsy has a lot of DNA from Flickr, including their DB sharding scheme 87
  • The Way Forward: Part 4 Database Sharding Based on MySQL 88
  • The Way Forward: Part 4 Database Sharding Battle-tested, well-known 89
  • The Way Forward: Part 4 Database Sharding Scales horizontally to infinity (or close enough) 90
  • The Way Forward: Part 4 Database Sharding No single points of failure (master-master replication) 91
  • The Way Forward: Part 4 Database Sharding Gradually phase out Sprouter, phase in ORM / sharded data 92
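
The slides name the ingredients of the sharding scheme (Flickr-derived, MySQL-based, horizontally scalable, master-master pairs) but not its mechanics. One common way to build such a scheme is a lookup table that pins each owner to a shard, with each shard served by two masters. The Python sketch below illustrates that general idea under those assumptions. `ShardDirectory`, the host names, and the placement rule are hypothetical and are not taken from Etsy's or Flickr's code.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class ShardDirectory:
    """In-memory stand-in for a shard lookup table (an assumption)."""

    # Each shard is a pair of masters that replicate to each other.
    shards: List[Tuple[str, str]]
    # owner_id -> shard number; assigned once, then never changes.
    index: Dict[int, int] = field(default_factory=dict)

    def shard_for(self, owner_id: int) -> int:
        """Return the owner's shard, assigning one on first sight."""
        if owner_id not in self.index:
            # Illustrative placement rule: spread new owners round-robin.
            self.index[owner_id] = len(self.index) % len(self.shards)
        return self.index[owner_id]

    def host_for(self, owner_id: int,
                 down: FrozenSet[str] = frozenset()) -> str:
        """Pick a master for this owner, falling back to its partner."""
        pair = self.shards[self.shard_for(owner_id)]
        preferred = owner_id % 2  # keep one owner on one side of the pair
        for host in (pair[preferred], pair[1 - preferred]):
            if host not in down:
                return host
        raise RuntimeError(f"both masters down for owner {owner_id}")
```

Adding capacity in this model means appending another pair to `shards`; existing owners stay where the lookup table says they are, and losing one master of a pair only redirects its traffic to the partner.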
  • Sprouter: Timeline. Fall ’07: Idea first discussed. Spring ’08: Alpha version debuts. Fall ’08: Released in production. Spring ’09: Sprouter deprecated 93
  • Sprouter: Timeline. Fall ’07: Idea first discussed. Spring ’08: Alpha version debuts. Fall ’08: Released in production. Spring ’09: Sprouter deprecated. Spring ’11: Sprouter turned off 94
  • 95
  • Lessons Learned 96
  • Etsy Architecture: 2007 Operating System: Database: Webserver: Languages: 97
  • Etsy Architecture: 2011 Operating System: Database: Webserver: Languages: 98
  • Open & trusting > closed & afraid (DevOps DevOps DevOps) 99
  • Front end/database interaction is too critical to take chances on novel/untested solutions 100
  • Side corollary: If you’re doing something “clever”, you’re probably doing it wrong 101
  • The architectural decisions you make today will have large impact long after you’re gone 102
  • No architectural hole is so deep that proven scaling strategies don’t exist for digging out 103
  • Acknowledgement: We are probably making decisions today that will be the subject of a similar talk in 2015 104
  • Learn More: http://codeascraft.etsy.com/ @codeascraft 105
  • Etsy is hiring! http://www.etsy.com/careers @etsy 106