Spotify services (SDC 2013)

Published in: Technology

  1. The whole is greater than the sum of the parts. Spotify services. Niklas Gustavsson. Monday, 27 May 2013.
  2. Me: Distributed systems geek. Spotify since 2011. ngn@spotify.com, @protocol7.
  3. Last year: Architectural overview. Lots of questions!
  4. Today: Spotify has more than a hundred backend services. They handle enormous amounts of data. They should always be available. How are they built?
  5. In praise of small services
  6. In praise of small services: A small code base is simpler to understand and reason about. Doing one thing and one thing only means no compromises.
  7. “Rule of Modularity: Developers should build a program out of simple parts connected by well-defined interfaces, so problems are local, and parts of the program can be replaced in future versions to support new features. This rule aims to save time on debugging complex code that is complex, long, and unreadable.” Eric S. Raymond, The Art of Unix Programming
  8. Decouple: “Decouple until it breaks, and then back off just a little.” Strive to make services autonomous. Watch your latency, though the added cost is commonly not significant.
  9. Simple codebases: Use scaffolding to quickly get the basic service structure. Reuse in libraries. Don’t overuse patterns. Don’t use layers upon layers. Keep it simple.
  10. Languages and runtimes: We build services in Python and Java. Python is awesome for quick development and beautiful code. The JVM is stable, performant and transparent.
  11. Performance at scale
  12. Performance at scale: Care about your performance. Set clear goals. Measure, measure, measure. Have an architecture that allows for scale. Build out as needed. Measure, measure, measure. http://www.bbc.co.uk/programmes/b01qzdc1
  13. Prefer stateless services: Prefer stateless services when possible. They scale out linearly. Isolate mutating operations.
  14. Efficient protocols: Fast, efficient, RESTful protocols. Connection pools are hard. Overloaded TCP servers are complicated. Use queues. Proper pushback. Naturally asynchronous.
  15. Efficient payloads: Small payloads, fast marshaling. gzip. http://qconsf.com/dl/qcon-sanfran-2011/slides/SastryMalladi_DealingWithPerformanceChallengesOptimizedSerializationTechniques.pdf
  16. Hermes: ZeroMQ. Lightweight, fast as hell, queue based. Protobuf. Small, fast, schema-based, simple binary format. Request-reply and pub/sub.
  17. Drop requests: Don’t be afraid to drop requests (and replies) when overloaded. Use shallow queues. Use short timeouts. Use small thread pools. Use small connection pools.
  18. (No text on this slide.)
  19. Scaling storage: We use the best tool for each case from a small, carefully selected set of options. PostgreSQL as the default mutable storage. Cassandra for large scale (heavy writes) or multi-site services. Various read-only key-value stores. http://labs.spotify.com/2013/02/25/in-praise-of-boring-technology/
  20. Always fail, never fail
  21. Always fail, never fail: Stuff is always broken. Deal with it. Always design for redundancy. Always keep an eye on your world. Don’t DDoS yourself.
  22. Many commodity servers: Build your system to run on multiple servers. Use service discovery everywhere. We use DNS SRV records. Make deployment and configuration automated and repeatable. Make sure your service is actually running.
  23. Measure everything: Instrument your code with metrics everywhere. We use our own library for Python and http://metrics.codahale.com for Java. Monitor your infrastructure: JVMs, OS, network, storage.
  24. Graph: Graph your important metrics, and strive for latency measured in seconds. We use a heavily extended derivative of Munin.
  25. Log what’s important: It is hard to know beforehand what matters, so err on the side of logging too much (within reason). Use a structured format. Use syslog. Collect your logs in a central place. Store your logs and make them analyzable.
  26. Automate deployment: Consistently build to some form of package. Keep track of dependencies. We build everything* to Debian packages and use package dependencies. Debian is awesome. Use it. (* Except Maven dependencies.)
  27. Automate configuration: Keep everything under version control. Use a provisioning tool. We use Puppet and store every configuration in Git. Everything*. 250 modules, 880 classes. (* Everything.)
  28. Development: Trust your developers and ops. Let your teams be autonomous. Long-term ownership. Minimize interruptions (aka meetings). Favor asynchronous communication. We coordinate over IRC and use mail. Ship.
  29. Questions? We’re hiring → spotify.com/jobs (ngn@spotify.com)
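
Slide 17 recommends shallow queues and small thread pools, and dropping requests when overloaded. The following is a minimal Python sketch of that idea, not Spotify's implementation: the class name `SheddingWorkerPool`, its parameters, and the `dropped` counter are all hypothetical and chosen only for illustration. It covers the shallow queue and the small pool; timeouts and connection pools are left out.

```python
import queue
import threading


class SheddingWorkerPool:
    """A small thread pool fed by a shallow queue (hypothetical example).

    When the queue is full, new requests are rejected immediately
    instead of piling up behind work that is already late.
    """

    def __init__(self, handler, workers=2, queue_depth=4):
        self._handler = handler
        # Shallow on purpose: a deep queue only hides overload.
        self._queue = queue.Queue(maxsize=queue_depth)
        self._lock = threading.Lock()
        self.dropped = 0
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, request):
        """Return True if the request was queued, False if it was dropped."""
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def _run(self):
        # Each worker loops forever, handling one request at a time.
        while True:
            request = self._queue.get()
            try:
                self._handler(request)
            finally:
                self._queue.task_done()

    def drain(self):
        """Block until every accepted request has been handled."""
        self._queue.join()
```

A caller that receives `False` from `submit` can fail fast, for example by replying with an error right away, which gives upstream services the pushback that slide 14 asks for.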
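
Slide 22 says service discovery is done with DNS SRV records. As a rough illustration of how a client might choose an endpoint once the records have been resolved, here is a simplified Python sketch. It assumes the records are already in memory (no DNS lookup is performed), and `SrvRecord` and `pick_endpoint` are hypothetical names. The selection follows the general RFC 2782 idea of lowest priority first, then weighted random choice, but simplifies the RFC's handling of zero weights.

```python
import random
from collections import namedtuple

# Fields mirror the data carried by an SRV record.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def pick_endpoint(records, rng=random):
    """Pick (target, port) from already-resolved SRV records.

    Only records with the lowest priority value are considered;
    among those, the choice is random and proportional to weight.
    """
    if not records:
        raise LookupError("no SRV records available")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # Simplification: treat all-zero weights as a uniform choice.
        chosen = rng.choice(candidates)
    else:
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
    return chosen.target, chosen.port
```

Records with a higher priority value act as backups here: they are only selected if the caller removes the failed lower-priority records from the list and calls `pick_endpoint` again.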
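
Slide 25 advises logging in a structured format and sending logs to syslog. One possible way to do that with Python's standard `logging` module is sketched below. The JSON layout, the `fields` key passed through `extra`, and the helper names `JsonFormatter`, `build_logger` and `syslog_handler` are assumptions made for this example; the deck does not say which format Spotify uses.

```python
import json
import logging
import logging.handlers


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical layout)."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured key/value pairs passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name, handler):
    """Attach the JSON formatter to the given handler and return a logger."""
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def syslog_handler(address=("localhost", 514)):
    """Handler that ships records to a syslog daemon over UDP."""
    return logging.handlers.SysLogHandler(address=address)
```

With this in place, `build_logger("playlist", syslog_handler())` followed by `log.info("request served", extra={"fields": {"latency_ms": 12}})` would emit one JSON object per event, which a central collector can parse without regular expressions.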
