Spotify architecture - Pressing play

Slides from a talk I gave at Scandinavian Developers Conference 2012 on the architecture of Spotify. The slides follow the story of playing a track and the steps to get there.

Transcript

  • 1. Pressing play Niklas Gustavsson ngn@spotify.com @protocol7
  • 2. Who am I? • ngn@spotify.com • @protocol7 • Spotify backend dev based in Göteborg • Mainly from a JVM background, working on various stuff over the years • Apache Software Foundation member
  • 3. What’s Spotify all about? • A big catalogue, tons of music • Available everywhere • Great user experience • More convenient than piracy • Fast, reliable, always available • Scalable for many, many users • Ad-supported or paid-for service
  • 4. Pressing play
  • 5. Where’s Spotify? • Let’s start the client, but where should it connect to?
  • 6. Aside: SRV records • Example SRV records: _spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 8 4070 C8.spotify.com. _spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 16 4070 C4.spotify.com. (fields: name, TTL, class, prio, weight, port, host) • GeoDNS used
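
A minimal sketch of resolving such an SRV record from Python with the dnspython package; the record name is taken from the slide, and whether it still resolves today is not guaranteed:

    # Sketch: look up the client SRV record with dnspython (pip install dnspython).
    import dns.resolver

    answers = dns.resolver.resolve("_spotify-mac-client._tcp.spotify.com", "SRV")
    for rr in answers:
        # Each record carries priority, weight, port and the target host.
        print(rr.priority, rr.weight, rr.port, rr.target)
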
  • 7. What does that record really point to? • accesspoint • Handles authentication state, logging, routing, rate limiting and much more • Protocol between client and AP uses a single, encrypted multiplexed socket over TCP • Written in C++
  • 8.
  • 9. Find something to play • Let’s search
  • 10. Services • Probably close to 100 backend services, most small, handling a single task • UNIX philosophy • Many autonomous • Deployed on commodity servers • Always redundant
  • 11. Services • Mostly written in Python, a few in Java and C • Storage optimized for each service, mostly PostgreSQL, Cassandra and Tokyo Cabinet • Many services use in-memory caching, for example /dev/shm or memcached • Usually a small daemon, talking HTTP or Hermes • We have our own supervisor which keeps services running
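
As a rough illustration of the "small daemon, talking HTTP" shape (a sketch only, not Spotify's service code), using nothing but the Python standard library:

    # Sketch: a tiny HTTP service daemon using only the standard library.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # A real service would dispatch on self.path and talk to its storage here.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok\n")

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
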
  • 12. Aside: Hermes • ZeroMQ for transport, protobuf for envelope and payload • HTTP-like verbs and caching • Request-reply and publish/subscribe • Very performant and introspectable
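
Hermes itself is internal to Spotify; the sketch below shows only the ZeroMQ request-reply transport it is described as building on, using pyzmq, with plain bytes standing in for the protobuf envelope and payload:

    # Sketch: ZeroMQ request-reply, the transport layer Hermes builds on.
    # Requires pyzmq (pip install pyzmq). Plain bytes stand in for protobuf messages.
    import zmq

    ctx = zmq.Context()

    server = ctx.socket(zmq.REP)          # the service side
    server.bind("tcp://127.0.0.1:5555")

    client = ctx.socket(zmq.REQ)          # the caller side
    client.connect("tcp://127.0.0.1:5555")

    client.send(b"GET metadata/track/123")   # HTTP-like verb; the URI is illustrative
    print(server.recv())                     # service reads the request...
    server.send(b"200 OK")                   # ...and replies
    print(client.recv())                     # caller sees the reply
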
  • 13. How does the accesspoint find search? • Everything has an SRV DNS record: • One record with same name for each service instance • Clients resolve to find servers providing that service • Lowest priority record is chosen with weighted shuffle • Clients retry other instances in case of failures
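
A minimal sketch of that selection rule as stated on the slide (keep the lowest-priority records, order them by a weighted shuffle, retry down the list on failure); the tuple layout mirrors the SRV example earlier:

    # Sketch: order SRV records for connection attempts: lowest priority only,
    # weighted shuffle within that group, retry the next entry on failure.
    import random

    def connection_order(records):
        """records: list of (priority, weight, port, host) tuples."""
        lowest = min(prio for prio, _, _, _ in records)
        pool = [r for r in records if r[0] == lowest]
        order = []
        while pool:
            weights = [max(r[1], 1) for r in pool]        # avoid zero weights
            choice = random.choices(pool, weights=weights, k=1)[0]
            pool.remove(choice)
            order.append(choice)
        return order

    records = [(10, 8, 4070, "C8.spotify.com."),
               (10, 16, 4070, "C4.spotify.com.")]
    print(connection_order(records))   # C4 comes first about twice as often as C8
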
  • 14. Read-only services • Stateless • Writes are hard • Simple to scale, just add more servers • Services can be restarted as needed • Indexes prefabricated, distributed to live servers
  • 15. Read-write services • User generated content, e.g. playlists • Hard to ensure consistency of data across instances • Solutions: • Eventual consistency: reads of just-written data are not guaranteed to be up-to-date • Locking, atomic operations • Creating globally unique keys, e.g. usernames • Transactions, e.g. billing
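
The slide lists techniques rather than code; as one concrete illustration of "creating globally unique keys" with an atomic operation, here is a sketch that lets a database UNIQUE constraint arbitrate. SQLite is used purely to keep the example runnable; the earlier slide names PostgreSQL and Cassandra as the real stores:

    # Sketch: claim a globally unique username atomically via a UNIQUE constraint,
    # instead of a racy check-then-insert.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")

    def claim_username(name):
        try:
            with db:  # one transaction, committed or rolled back atomically
                db.execute("INSERT INTO users (username) VALUES (?)", (name,))
            return True
        except sqlite3.IntegrityError:
            return False  # another writer already took the name

    print(claim_username("ngn"))  # True
    print(claim_username("ngn"))  # False, already taken
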
  • 16. Sharding • Some services use Dynamo-inspired DHTs • Each request has a key • Each service node is responsible for a range of hash keys • Data is distributed among service nodes • Redundancy is ensured by writing to a replica node • Data must be transitioned when the ring changes
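
A hedged sketch of the Dynamo-style idea on this slide: hash each request key onto a ring of nodes, and also write to the next node along the ring as a replica. The hash function, single token per node and replica count are illustrative choices, not Spotify's:

    # Sketch: a Dynamo-inspired hash ring mapping request keys to a node plus a replica.
    import bisect
    import hashlib

    def ring_position(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            # Each node takes one position; it owns the key range ending there.
            self.points = sorted((ring_position(n), n) for n in nodes)

        def nodes_for(self, key, replicas=2):
            """Primary node plus (replicas - 1) successors for redundancy."""
            h = ring_position(key)
            i = bisect.bisect(self.points, (h, ""))
            return [self.points[(i + k) % len(self.points)][1] for k in range(replicas)]

    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.nodes_for("user:alice:playlists"))   # primary and replica for this key
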
  • 17. DHT example
  • 18. search • Java service • Lucene storage • New index published daily • Doesn’t store any metadata in itself, returns a list of identifiers • (Search suggestions are served from a separate service, optimized for speed)
  • 19. Metadata services • Multiple read-only services • 60 GB indices • Responds to metadata requests • Decorates metadata onto other service responses • We’re most likely moving away from this model
  • 20.
  • 21. Another aside: How does stuff get into Spotify? • >15 million tracks, we can’t maintain all that ourselves • Ingest audio, images and metadata from labels • Receive, transform, transcode, merge • All ends up in a metadata database from which indices are generated and distributed to services
  • 22.
  • 23. The Kent bug • Much of the metadata lacks identifiers, which leaves us with heuristics.
  • 24. Play
  • 25. Audio encodings and files • Spotify supports multiple audio encodings • Ogg Vorbis at 96 (-q2), 160 (-q5) and 320 kbps (-q9) • MP3 at 320 kbps (downloads) • For each track, a file for each encoding/bitrate is listed in the returned metadata • The client picks an appropriate one
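
A small sketch of the client-side choice just described; the metadata layout and the preference order are assumptions for illustration only:

    # Sketch: pick an audio file for playback from the files listed in track metadata.
    PREFERRED = ["ogg_vorbis_320", "ogg_vorbis_160", "ogg_vorbis_96"]

    def pick_file(files, high_quality=True):
        """files: mapping of encoding name -> file id (layout is illustrative)."""
        order = PREFERRED if high_quality else PREFERRED[1:]
        for encoding in order:
            if encoding in files:
                return encoding, files[encoding]
        raise LookupError("no playable encoding listed for this track")

    files = {"ogg_vorbis_96": "f1", "ogg_vorbis_160": "f2", "ogg_vorbis_320": "f3"}
    print(pick_file(files, high_quality=False))   # ('ogg_vorbis_160', 'f2')
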
  • 26. Get the audio data • The client now must fetch the actual audio data • Latency kills
  • 27. Cache • Player caches tracks it has played • Caches are large (56% are over 5 GB) • Least Recently Used policy for cache eviction • 50% of data comes from local cache • Cached files are served in P2P overlay
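
A minimal sketch of the Least Recently Used eviction mentioned above, using the standard library; byte counts stand in for the actual cached audio files:

    # Sketch: least-recently-used eviction for a bounded local track cache.
    from collections import OrderedDict

    class TrackCache:
        def __init__(self, max_bytes):
            self.max_bytes = max_bytes
            self.used = 0
            self.entries = OrderedDict()   # track_id -> size, least recent first

        def get(self, track_id):
            if track_id not in self.entries:
                return False
            self.entries.move_to_end(track_id)   # mark as most recently used
            return True

        def put(self, track_id, size):
            self.used += size - self.entries.pop(track_id, 0)
            self.entries[track_id] = size
            while self.used > self.max_bytes and len(self.entries) > 1:
                _, evicted = self.entries.popitem(last=False)   # evict the LRU track
                self.used -= evicted

    cache = TrackCache(max_bytes=10_000_000)
    cache.put("track-1", 4_000_000)
    cache.put("track-2", 4_000_000)
    cache.put("track-3", 4_000_000)   # pushes track-1 out
    print(cache.get("track-1"))       # False
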
  • 28. Streaming • Request first piece from Spotify storage • Meanwhile, search peer-to-peer (P2P) for remainder • Switch back and forth between Spotify storage and peers as needed • Towards end of a track, start prefetching next one
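
A rough sketch of the scheduling idea only (first piece from storage for low latency, later pieces from peers whenever they have them); prefetching of the next track is left out, and fetch_from_storage, fetch_from_peers and peers_have are hypothetical stand-ins for the real transports:

    # Sketch of the streaming strategy described above.
    def stream_track(track_id, piece_count, fetch_from_storage, fetch_from_peers, peers_have):
        pieces = []
        for index in range(piece_count):
            if index == 0 or not peers_have(track_id, index):
                pieces.append(fetch_from_storage(track_id, index))   # low-latency fallback
            else:
                pieces.append(fetch_from_peers(track_id, index))
        return b"".join(pieces)

    # Toy demo with in-memory stand-ins:
    print(stream_track("track-42", 3,
                       fetch_from_storage=lambda t, i: b"S",
                       fetch_from_peers=lambda t, i: b"P",
                       peers_have=lambda t, i: i > 0))   # b'SPP'
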
  • 29. P2P • All peers are equals (no supernodes) • A user only downloads data she needs • tracker service keeps peers for each track • P2P network becomes (weakly) clustered by interest • Oblivious to network architecture • Does not enforce fairness • Mobile clients do not participate in P2P • http://www.csc.kth.se/~gkreitz/spotify/kreitz-spotify_kth11.pdf
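
A toy sketch of the tracker's bookkeeping role described above (which peers recently had which track); there is no networking here, and the bounded peer list per track is an assumption:

    # Sketch: the tracker service's core mapping of track -> peers that have it.
    from collections import defaultdict, deque

    class Tracker:
        def __init__(self, max_peers_per_track=20):
            self.max_peers = max_peers_per_track
            self.peers = defaultdict(deque)   # track_id -> most recent peers first

        def announce(self, track_id, peer):
            """A peer reports that it holds this track in its cache."""
            peers = self.peers[track_id]
            if peer in peers:
                peers.remove(peer)
            peers.appendleft(peer)
            while len(peers) > self.max_peers:
                peers.pop()   # forget the oldest announcement

        def lookup(self, track_id):
            """Peers a client can try before falling back to Spotify storage."""
            return list(self.peers[track_id])

    tracker = Tracker()
    tracker.announce("track-42", "peer-a")
    tracker.announce("track-42", "peer-b")
    print(tracker.lookup("track-42"))   # ['peer-b', 'peer-a']
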
  • 30.
  • 31.
  • 32. Success!
  • 33. YAA: Hadoop • We run analysis using Hadoop, which feeds back into the previously described process; e.g. track popularity is used for weighting search results and toplists
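
As a hedged illustration of the kind of job meant here (not Spotify's actual pipeline), a Hadoop-Streaming-style mapper and reducer in Python that count plays per track; the log line format is assumed:

    # Sketch: count plays per track, Hadoop Streaming style (map, sort, reduce).
    # Input lines are assumed to look like "user_id<TAB>track_id<TAB>timestamp".
    from itertools import groupby

    def mapper(lines):
        for line in lines:
            _, track_id, _ = line.rstrip("\n").split("\t")
            yield track_id, 1

    def reducer(pairs):
        for track_id, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
            yield track_id, sum(count for _, count in group)

    # Running both phases locally; Hadoop would shard them across many nodes.
    plays = ["alice\ttrack-1\t1334664000",
             "bob\ttrack-1\t1334664060",
             "alice\ttrack-2\t1334664120"]
    print(dict(reducer(mapper(plays))))   # {'track-1': 2, 'track-2': 1}
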
  • 34.
  • 35. Development at Spotify • Uses almost exclusively open source software • Git, Debian, Munin, Zabbix, Puppet, TeamCity... • Developers use whatever development tools they are comfortable with • Scrum or Kanban in three-week iterations • DevOps heavy. Freaking awesome ops • Monitor and measure all the things!
  • 36. Development at Spotify • Development hubs in Stockholm, Göteborg and NYC • All in all, >220 people in tech • Very talented team • Hackdays and system owner days in each iteration • Hangs out on IRC • Growing and hiring
  • 37. Languages at Spotify
  • 38. Questions?
  • 39. Thank you Want to work at Spotify? http://www.spotify.com/jobs/