Spotify architecture - Pressing play

Slides from a talk I gave at Scandinavian Developers Conference 2012 on the architecture of Spotify. The slides follow the story of playing a track and the steps it takes to get there.

  1. Pressing play
     Niklas Gustavsson
     ngn@spotify.com
     @protocol7
  2. Who am I?
     • ngn@spotify.com
     • @protocol7
     • Spotify backend dev based in Göteborg
     • Mainly from a JVM background, working on various stuff over the years
     • Apache Software Foundation member
  3. What’s Spotify all about?
     • A big catalogue, tons of music
     • Available everywhere
     • Great user experience
     • More convenient than piracy
     • Fast, reliable, always available
     • Scalable for many, many users
     • Ad-supported or paid-for service
  4. Pressing play
  5. Where’s Spotify?
     • Let’s start the client, but where should it connect to?
  6. Aside: SRV records
     • Example SRV records (name, TTL, class, priority, weight, port, host):
       _spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 8 4070 C8.spotify.com.
       _spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 16 4070 C4.spotify.com.
     • GeoDNS is used
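A minimal sketch of the SRV lookup described on the slide above. The dnspython library is my choice for the example (it is not mentioned in the talk); it just shows how a client could resolve the record and read the priority/weight/port/host fields.

```python
# Resolve the client's SRV record with dnspython (>= 2.0) and print the
# fields shown on the slide: priority, weight, port and target host.
import dns.resolver

def lookup_service(name="_spotify-mac-client._tcp.spotify.com"):
    for record in dns.resolver.resolve(name, "SRV"):
        print(record.priority, record.weight, record.port, record.target)

lookup_service()
```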
  7. What does that record really point to?
     • accesspoint
     • Handles authentication state, logging, routing, rate limiting and much more
     • The protocol between client and AP uses a single, encrypted, multiplexed socket over TCP
     • Written in C++
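The real client/AP protocol is not described beyond the bullet above, so the following is only an illustration of the general idea of multiplexing: tag every frame with a channel id and a length prefix so many logical streams can share one TCP socket. The frame layout here is invented for the example.

```python
# Toy framing for a multiplexed connection: 2-byte channel id plus
# 4-byte payload length, followed by the payload itself.
import struct

def encode_frame(channel_id: int, payload: bytes) -> bytes:
    return struct.pack(">HI", channel_id, len(payload)) + payload

def decode_frame(buf: bytes):
    channel_id, length = struct.unpack(">HI", buf[:6])
    return channel_id, buf[6:6 + length]

frame = encode_frame(7, b"search request")
print(decode_frame(frame))  # (7, b'search request')
```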
  8. (diagram slide, no text)
  9. Find something to play
     • Let’s search
  10. Services
      • Probably close to 100 backend services, most of them small, each handling a single task
      • UNIX philosophy
      • Many are autonomous
      • Deployed on commodity servers
      • Always redundant
  11. Services
      • Mostly written in Python, a few in Java and C
      • Storage optimized for each service, mostly PostgreSQL, Cassandra and Tokyo Cabinet
      • Many services use in-memory caching, for example via /dev/shm or memcached
      • Usually a small daemon, talking HTTP or Hermes
      • We have our own supervisor which keeps services running
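A small caching sketch for the memcached case mentioned above. The slides only say "memcached"; the pymemcache client library, the local server at port 11211, and the storage stub are my assumptions for the example.

```python
# Cache service results in memcached, falling back to storage on a miss.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))  # assumes a local memcached instance

def load_from_storage(track_id: str) -> bytes:
    # Stand-in for the service's real backing store (PostgreSQL, Cassandra, ...).
    return b"metadata for " + track_id.encode()

def get_track_metadata(track_id: str) -> bytes:
    cached = cache.get(track_id)
    if cached is not None:
        return cached                          # served from memory
    metadata = load_from_storage(track_id)
    cache.set(track_id, metadata, expire=300)  # cache for five minutes
    return metadata
```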
  12. Aside: Hermes
      • ZeroMQ for transport, protobuf for envelope and payload
      • HTTP-like verbs and caching
      • Request-reply and publish/subscribe
      • Very performant and introspectable
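Hermes itself is not shown in the talk, so this is only the underlying ZeroMQ request-reply pattern it builds on, using pyzmq. A real Hermes message would be a protobuf envelope plus payload; the plain-byte request and its URI-like path here are purely illustrative.

```python
# Request-reply over ZeroMQ: a REQ client talking to a REP "service",
# both in one process to keep the sketch self-contained.
import zmq

context = zmq.Context()

server = context.socket(zmq.REP)
server.bind("tcp://127.0.0.1:5555")

client = context.socket(zmq.REQ)
client.connect("tcp://127.0.0.1:5555")

client.send(b"GET /metadata/example")  # HTTP-like verb, illustrative path
print(server.recv())                   # service receives the request
server.send(b"200 OK")                 # and replies
print(client.recv())                   # client gets the reply
```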
  13. How does the accesspoint find search?
      • Everything has an SRV DNS record:
        • One record with the same name for each service instance
        • Clients resolve it to find servers providing that service
        • The lowest-priority record is chosen with a weighted shuffle
        • Clients retry other instances in case of failures
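A sketch of the selection rule on the slide above: among the SRV records with the lowest priority, pick one at random, weighted by its weight field. The record tuples mirror the SRV fields shown earlier; the data is made up.

```python
# Weighted shuffle over the lowest-priority SRV records.
import random

def pick_instance(records):
    lowest = min(priority for priority, _, _, _ in records)
    candidates = [r for r in records if r[0] == lowest]
    weights = [weight for _, weight, _, _ in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

records = [
    (10, 8, 4070, "C8.spotify.com."),
    (10, 16, 4070, "C4.spotify.com."),
    (20, 1, 4070, "backup.spotify.com."),
]
print(pick_instance(records))  # usually C4, which has twice C8's weight
```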
  14. Read-only services
      • Stateless
      • Writes are hard
      • Simple to scale, just add more servers
      • Services can be restarted as needed
      • Indexes are prefabricated and distributed to live servers
  15. Read-write services
      • User-generated content, e.g. playlists
      • Hard to ensure consistency of data across instances
      • Solutions:
        • Eventual consistency: reads of just-written data are not guaranteed to be up to date
        • Locking, atomic operations: creating globally unique keys (e.g. usernames), transactions (e.g. billing)
  16. Sharding
      • Some services use Dynamo-inspired DHTs
      • Each request has a key
      • Each service node is responsible for a range of hash keys
      • Data is distributed among the service nodes
      • Redundancy is ensured by writing to a replica node
      • Data must be transitioned when the ring changes
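A toy consistent-hash ring in the spirit of the Dynamo-inspired DHTs on the slide above; the real implementation is not shown in the talk. Each node owns the key range up to its position on the ring, and the next node clockwise acts as the replica.

```python
# Minimal hash ring: place nodes by the hash of their name, then walk
# clockwise from a key's position to find its primary and replica nodes.
import bisect
import hashlib

class Ring:
    def __init__(self, nodes):
        self.points = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def nodes_for(self, key: str, replicas: int = 2):
        start = bisect.bisect(self.points, (self._hash(key),))
        return [self.points[(start + i) % len(self.points)][1]
                for i in range(replicas)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.nodes_for("spotify:track:example"))  # primary node, then replica
```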
  17. DHT example (diagram)
  18. search
      • Java service
      • Lucene storage
      • New index published daily
      • Doesn’t store any metadata in itself, returns a list of identifiers
      • (Search suggestions are served from a separate service, optimized for speed)
  19. Metadata services
      • Multiple read-only services
      • 60 GB indices
      • Respond to metadata requests
      • Decorate metadata onto other services’ responses
      • We’re most likely moving away from this model
  20. (diagram slide, no text)
  21. Another aside: how does stuff get into Spotify?
      • >15 million tracks, we can’t maintain all that ourselves
      • Ingest audio, images and metadata from labels
      • Receive, transform, transcode, merge
      • All ends up in a metadata database from which indices are generated and distributed to services
  22. (diagram slide, no text)
  23. The Kent bug
      • Much of the metadata lacks identifiers, which leaves us with heuristics
  24. Play
  25. Audio encodings and files
      • Spotify supports multiple audio encodings
      • Ogg Vorbis at 96 kbit/s (-q2), 160 kbit/s (-q5) and 320 kbit/s (-q9)
      • MP3 at 320 kbit/s (downloads)
      • For each track, a file for each encoding/bitrate is listed in the returned metadata
      • The client picks an appropriate one
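A sketch of the client-side choice described above. The metadata layout (a list of dicts with "format" and "bitrate") and the preference rule are hypothetical; the slides only say that one file per encoding/bitrate is listed per track and that the client picks one.

```python
# Pick the highest-bitrate Ogg Vorbis file at or below a preferred bitrate.
def pick_file(files, preferred_bitrate=320_000):
    vorbis = [f for f in files
              if f["format"] == "OGG_VORBIS" and f["bitrate"] <= preferred_bitrate]
    return max(vorbis, key=lambda f: f["bitrate"]) if vorbis else None

files = [
    {"format": "OGG_VORBIS", "bitrate": 96_000},
    {"format": "OGG_VORBIS", "bitrate": 160_000},
    {"format": "OGG_VORBIS", "bitrate": 320_000},
    {"format": "MP3", "bitrate": 320_000},
]
print(pick_file(files, preferred_bitrate=160_000))  # the 160 kbit/s Vorbis file
```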
  26. Get the audio data
      • The client now must fetch the actual audio data
      • Latency kills
  27. Cache
      • The player caches tracks it has played
      • Caches are large (56% are over 5 GB)
      • Least Recently Used policy for cache eviction
      • 50% of data comes from the local cache
      • Cached files are served in the P2P overlay
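A toy in-memory LRU cache illustrating the eviction policy on the slide above. The real player cache lives on disk and is far more involved; the capacity and sizes here are made up for the example.

```python
# Least Recently Used eviction with a byte-size cap, built on OrderedDict.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # track_id -> size in bytes

    def touch(self, track_id, size):
        if track_id in self.entries:
            self.entries.move_to_end(track_id)   # now most recently used
            return
        self.entries[track_id] = size
        self.used += size
        while self.used > self.capacity:         # evict least recently used
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size

cache = LRUCache(capacity_bytes=10_000_000)
cache.touch("track-1", 4_000_000)
cache.touch("track-2", 4_000_000)
cache.touch("track-1", 4_000_000)   # refresh track-1
cache.touch("track-3", 4_000_000)   # evicts track-2, the LRU entry
print(list(cache.entries))          # ['track-1', 'track-3']
```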
  28. Streaming
      • Request the first piece from Spotify storage
      • Meanwhile, search peer-to-peer (P2P) for the remainder
      • Switch back and forth between Spotify storage and peers as needed
      • Towards the end of a track, start prefetching the next one
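A rough sketch of the scheduling idea on the slide above: fetch the first piece from Spotify's storage for a fast start, try peers for the rest, and fall back to storage whenever the swarm can't deliver. All function names are hypothetical stand-ins, not the real client API.

```python
# Fetch piece 0 from storage, then prefer peers with storage as fallback.
def stream_track(track_id, num_pieces):
    pieces = [fetch_from_storage(track_id, 0)]      # fast start
    for index in range(1, num_pieces):
        piece = fetch_from_peers(track_id, index)   # try the P2P overlay
        if piece is None:                           # no peers / too slow
            piece = fetch_from_storage(track_id, index)
        pieces.append(piece)
    return b"".join(pieces)

# Stand-ins so the sketch runs; the real transports are networked.
def fetch_from_storage(track_id, index):
    return b"S"

def fetch_from_peers(track_id, index):
    return b"P" if index % 2 else None

print(stream_track("example-track", 5))  # b'SPSPS'
```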
  29. P2P
      • All peers are equal (no supernodes)
      • A user only downloads data she needs
      • A tracker service keeps the peers for each track
      • The P2P network becomes (weakly) clustered by interest
      • Oblivious to network architecture
      • Does not enforce fairness
      • Mobile clients do not participate in P2P
      See http://www.csc.kth.se/~gkreitz/spotify/kreitz-spotify_kth11.pdf
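A minimal sketch of the tracker service mentioned above: it only needs to remember which peers recently had a given track and hand a few of them back to a requesting client. The interface, limits and lack of expiry are my simplifications.

```python
# Track-to-peers bookkeeping with a capped, randomized lookup.
import random
from collections import defaultdict

class Tracker:
    def __init__(self, max_peers_returned=10):
        self.peers_by_track = defaultdict(set)
        self.max_peers_returned = max_peers_returned

    def announce(self, track_id, peer):
        # A client tells the tracker it can serve this track.
        self.peers_by_track[track_id].add(peer)

    def lookup(self, track_id, requester):
        # Return a random sample of other peers that have the track.
        peers = self.peers_by_track[track_id] - {requester}
        return random.sample(sorted(peers), min(len(peers), self.max_peers_returned))

tracker = Tracker()
tracker.announce("track-42", "peer-a")
tracker.announce("track-42", "peer-b")
print(tracker.lookup("track-42", requester="peer-a"))  # ['peer-b']
```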
  30. (diagram slide, no text)
  31. (diagram slide, no text)
  32. Success!
  33. YAA: Hadoop
      • We run analysis using Hadoop, which feeds back into the previously described process; e.g. track popularity is used for weighting search results and toplists
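A sketch of the kind of batch job implied above: counting plays per track so popularity can feed back into search ranking. It is written in the Hadoop Streaming mapper/reducer style; the log format ("user<TAB>track_id") is made up for the example.

```python
# Count plays per track, Hadoop Streaming style (mapper + reducer).
from itertools import groupby

def mapper(lines):
    for line in lines:
        _user, track_id = line.rstrip("\n").split("\t")
        yield track_id, 1

def reducer(pairs):
    # Streaming delivers mapper output sorted by key, so groupby works.
    for track_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield track_id, sum(count for _, count in group)

logs = ["alice\ttrack-1", "bob\ttrack-1", "alice\ttrack-2"]
print(list(reducer(sorted(mapper(logs)))))  # [('track-1', 2), ('track-2', 1)]
```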
  34. (diagram slide, no text)
  35. Development at Spotify
      • Uses almost exclusively open source software
      • Git, Debian, Munin, Zabbix, Puppet, TeamCity...
      • Developers use whatever development tools they are comfortable with
      • Scrum or Kanban in three-week iterations
      • DevOps heavy. Freaking awesome ops
      • Monitor and measure all the things!
  36. Development at Spotify
      • Development hubs in Stockholm, Göteborg and NYC
      • All in all, >220 people in tech
      • Very talented team
      • Hackdays and system owner days in each iteration
      • Hangs out on IRC
      • Growing and hiring
  37. Languages at Spotify (chart)
  38. Questions?
  39. Thank you
      Want to work at Spotify? http://www.spotify.com/jobs/
