
Spotify: behind the scenes


Slides from my guest lecture at Uppsala University, class of Distributed Systems, on the 10th of February 2011


  1. Spotify: behind the scenes. Ricardo Santos (@ricardovice)
  2. “spotifiera”, anyone?
  3. Main goals
     • A big catalogue, tons of music
     • Available everywhere
     • Great user experience
     • More convenient than piracy
     • Fast
     • Reliable, high availability
     • Scalable to many, many users
  4. Business idea
     • Free ad-funded version
     • Paid subscription where users get:
       • No advertisements
       • Mobile access
       • Offline playback
       • API access
  5. “Music itself is going to become like running water or electricity.” (David Bowie, 2002)
  6. Accessibility
     • People should always be able to access music
     • Whenever they want
     • Wherever they are
  7. The catalogue
     • All content is delivered by labels
     • Currently over 10 million tracks
     • Growing every day, around 10k per day
     • 96–320 kbps audio streams; most are Ogg Vorbis q5, 160 kbps
  8. That all sounds cool, but let’s talk engineering!
  9. “It’s Easy, Really.” (Blaine Cook, 2007)
  10. Handling Growth
      • Scaling is not an exact science
      • There is no such thing as a magic formula
      • Usage patterns differ
      • There is always a limit to what you can handle
      • Fail gracefully
      • Continuous evolution process
  11. Usage patterns
      Typically, some services are more demanding than others. This can be due to:
      • Higher popularity
      • Higher complexity
      • Both combined
  12. Decoupling
      • Divide and conquer!
      • Resources assigned individually
      • Using the right tools to address each problem
      • Organization and delegation
      • Problems are isolated
      • Easier to handle growth
  13. Decoupling
      Spotify’s internal services include:
      • Access Point
      • User
      • Playlist
      • Search
      • Browse
      Can you guess which one is the most complex?
  14. Playlist!
  15. Playlist!
      Though it may sound simple, it is by far the most demanding:
      • For each user there are several playlists
      • Push notifications
      • Offline writing
      • Conflict resolution without user interaction
  16. Metadata services
      Search and Browse allow users to find music:
      • Both handle read requests
      • But their usage and responses differ
      • Data sources should be optimized for each of these, called indices
      • These are hard to maintain, easier to regenerate
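The regenerate-rather-than-repair idea can be illustrated with a minimal inverted index, the classic data structure behind search. This is a sketch under assumed interfaces, not Spotify's actual index format: rebuilding it is a single cheap pass over the catalogue, which is exactly why regeneration beats maintenance.

```python
from collections import defaultdict

def build_index(tracks):
    """Build an inverted index: term -> set of track ids.

    'tracks' maps track id -> title string. Regenerating the whole
    index is one pass over the catalogue, so it is easy to rebuild.
    """
    index = defaultdict(set)
    for track_id, title in tracks.items():
        for term in title.lower().split():
            index[term].add(track_id)
    return index

def search(index, query):
    """Return the track ids matching every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

A Browse-style index would be built the same way but keyed and sorted differently (by genre, label, release date), which is why each service gets its own optimized data source.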
  17. Speed thrills
  18. Latency matters
      • High latency is a problem, and not only in First Person Shooters
      • Increasing the latency of Google searches by 100–400 ms decreased usage by 0.2–0.6% (Jake Brutlag, 2009)
      • Slow performance is one of the major reasons users abandon services
      • Users don’t come back
  19. Focus on low latency
      • Our SLA is maintained by monitoring latency on the client side
      • On average, the human notion of “instantly” is 200 ms
      • The current median latency to begin playing a track in Spotify is 265 ms
      • Due to disk lookup, at times it’s actually faster to start playing a track from the network than from disk
  20. Playing a track
      • Check the local cache
      • Request the first piece from Spotify servers
      • Meanwhile, search P2P for the remainder
      • Switch between servers and P2P as needed
      • Towards the end of a track, start pre-fetching the next one via P2P rather than our servers
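The source-selection steps above can be sketched in a few lines. The interfaces (`server_piece`, `p2p_pieces`) are hypothetical stand-ins, not Spotify's client API: the point is the ordering — cache first, then a fast server fetch of the first piece so playback starts immediately, then the bulk from peers.

```python
def fetch_track(track_id, cache, server_piece, p2p_pieces):
    """Select sources for a track (illustrative sketch).

    cache        -- dict of track_id -> bytes already on disk
    server_piece -- callable returning piece 0 from Spotify servers
    p2p_pieces   -- callable returning the remaining pieces via P2P
    """
    if track_id in cache:               # step 1: local cache
        return cache[track_id]
    first = server_piece(track_id)      # step 2: low-latency start
    rest = p2p_pieces(track_id)         # step 3: remainder from peers
    data = first + rest
    cache[track_id] = data              # later plays are cache hits
    return data
```

In the real client the server and P2P transfers run concurrently and the client switches between them as conditions change; this linear version only shows the priority order.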
  21. When to start playing?
      • Trade-off between stutter and latency
      • Look at the last 15 min of transfer rates
      • Model as a Markov chain and simulate
      • Coupled with some heuristics
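The Markov-chain simulation can be sketched as a small Monte Carlo estimate: given the current buffer, simulate the download rate hopping between observed rate states and count how often the buffer underruns before the track ends. All rates, probabilities, and parameter names here are illustrative assumptions, not Spotify's actual model.

```python
import random

def stutter_probability(buffered, track_len, rates, transitions,
                        play_rate=20_000, trials=200, seed=1):
    """Estimate the chance playback stutters if it starts now.

    buffered    -- bytes already buffered
    track_len   -- total track size in bytes
    rates       -- bytes/tick downloaded in each Markov state
    transitions -- transitions[i][j] = P(state i -> state j)
    play_rate   -- bytes/tick consumed by playback
    """
    rng = random.Random(seed)
    stutters = 0
    for _ in range(trials):
        buf, pos, state = buffered, 0, 0
        while pos < track_len:
            buf += rates[state]                 # download this tick
            need = min(play_rate, track_len - pos)
            if buf < need:                      # underrun: a stutter
                stutters += 1
                break
            buf -= need
            pos += need
            r, acc = rng.random(), 0.0          # next Markov state
            for nxt, p in enumerate(transitions[state]):
                acc += p
                if r < acc:
                    state = nxt
                    break
    return stutters / trials
```

The client would start playback once this estimate drops below some threshold, trading a little extra start-up latency for a low stutter risk — the heuristics mentioned above tune that threshold.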
  22. Production storage
      • Production storage is a cache with fast drives and lots of RAM
      • Serves the most popular content
      • A cache miss generates a request to master storage, with slightly higher latency
      • Production storage is available in several data centers to keep it close to the user (latency-wise)
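The cache-with-fallback behavior above amounts to a bounded LRU in front of master storage. A minimal sketch, with hypothetical interface names — popular content stays resident, and a miss transparently falls through to the slower tier:

```python
from collections import OrderedDict

class ProductionCache:
    """Bounded LRU cache in front of master storage (sketch)."""

    def __init__(self, capacity, master):
        self.capacity = capacity
        self.master = master            # callable: track_id -> bytes
        self.store = OrderedDict()      # insertion order = recency

    def get(self, track_id):
        if track_id in self.store:
            self.store.move_to_end(track_id)    # refresh recency
            return self.store[track_id]
        data = self.master(track_id)            # miss: slower fetch
        self.store[track_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)      # evict least recent
        return data
```

Running one such cache per data center is what keeps the popular content close to users; only the long tail pays the master-storage latency.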
  23. Master storage
      • Works as a DHT, with some redundancy
      • Contains all available tracks, but has slower drives and access
      • Tracks are kept in several formats, adding up to around 290 TB
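A DHT with redundancy can be illustrated with a consistent-hash ring: each track hashes to a point on the ring, and the next few nodes clockwise store its replicas. This is a generic sketch of the technique, not Spotify's implementation; node names and the replica count are made up.

```python
import hashlib
from bisect import bisect

class HashRing:
    """Consistent-hash ring with simple replication (sketch)."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def nodes_for(self, track_id):
        """Walk clockwise from the track's hash, collecting nodes."""
        keys = [h for h, _ in self.ring]
        i = bisect(keys, self._hash(track_id)) % len(self.ring)
        out = []
        while len(out) < min(self.replicas, len(self.ring)):
            node = self.ring[i % len(self.ring)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out
```

The redundancy means a track survives a node failure, and consistent hashing means adding a node only moves the keys adjacent to it on the ring rather than reshuffling all 290 TB.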
  24. P2P helps
      • Easier to scale
      • Fewer servers
      • Less bandwidth
      • Better uptime
      • Lower costs
      • Fun!
  25. P2P overview
      • Not a piracy network: all tracks are added by Spotify
      • Used on all desktop clients (not mobile)
      • Each client is connected to ≈ 60 others
      • All nodes are equals (no super nodes)
      • A track is downloaded from several peers
  26. P2P custom protocol
      • Ask for the most urgent pieces first
      • If a peer is slow, re-request from new peers
      • When buffers run low, download from central servers
      • If loading from servers, estimate at what point P2P will catch up
      • If buffers are very low, stop uploading
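"Most urgent pieces first" is the opposite of BitTorrent's rarest-first: what matters is the piece closest to the playback position. A sketch of that selection rule, with illustrative parameter names (`window` is a made-up pipelining limit):

```python
def next_requests(have, in_flight, total_pieces, play_pos, window=4):
    """Pick the next pieces to request, nearest-to-playhead first.

    have        -- set of piece indices already downloaded
    in_flight   -- set of piece indices already requested
    total_pieces-- number of pieces in the track
    play_pos    -- index of the piece currently being played
    window      -- max simultaneous new requests
    """
    wanted = []
    for piece in range(play_pos, total_pieces):
        if piece not in have and piece not in in_flight:
            wanted.append(piece)
        if len(wanted) == window:
            break
    return wanted
```

Re-requesting from a new peer when one is slow just means moving a piece out of `in_flight` so it becomes eligible again; the urgency ordering is what keeps the buffer ahead of the playhead.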
  27. P2P: finding peers
      • Partial central tracker (BitTorrent-style)
      • Broadcast query in a small neighborhood (Gnutella-style)
      • The two mechanisms combined result in higher availability
      • Limited broadcast for local (LAN) peer discovery (cherry on top...)
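Combining the two lookup mechanisms is just a merge of their result sets. A sketch with hypothetical interfaces — `tracker` stands in for the central tracker query and each element of `neighbors` for a Gnutella-style query to a directly connected peer:

```python
def find_peers(track_id, tracker, neighbors, limit=10):
    """Merge tracker results with neighborhood-broadcast results.

    tracker   -- callable: track_id -> iterable of peer ids
    neighbors -- list of callables, one per connected neighbor
    """
    peers = list(tracker(track_id))           # BitTorrent-style
    for neighbor in neighbors:
        for peer in neighbor(track_id):       # Gnutella-style
            if peer not in peers:
                peers.append(peer)
    return peers[:limit]
```

Either mechanism alone can miss peers (the tracker is only partial, and the broadcast only reaches a small neighborhood), which is why running both raises availability.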
  28. P2P security
      • The P2P network needs to be safe and trusted
      • All exchanged files have to come originally from Spotify
      • All peers should be trusted Spotify clients
  29. Security through obscurity
      • Our client needs to be able to read metadata and play music
      • At the same time, we have to prevent reverse engineers from doing the same
      • Therefore, we can’t openly discuss the details
  30. but…
      • Closed environment
      • Integrity of downloaded files is checked
      • Data transfers are encrypted
      • Usernames are not exposed in the P2P network; all peers are assigned a pseudonym
      • Software obfuscation makes life difficult for reverse engineers
  31. Software obfuscation
  32. So, what’s the outcome?
      At over 10 million users, the responses are:
      • 55.4% from client cache
      • 35.8% from the P2P network
      • 8.8% from the servers
  33. Oh, and we have cake as well!
  34. I’d like to know more...
      • Get in touch with us
      • Check out Gunnar Kreitz’s slides and academic papers on the subject:
  35. Thanks!