
Spotify: P2P music streaming



Slides from my talk at ISEL Tech in Lisbon, May 26th, 2011


  1. P2P music streaming
     Ricardo Santos, @ricardovice
  2. “spotifiera”, anyone?
  3. What is Spotify?
     • On-demand music streaming service
     • Available in 7 European countries
     • Over 13 million tracks
     • Over 1 million paying users
     • Over 10 million total users
     • Legal
     • Really fast
  4. Business idea
     • Free ad-funded version
     • Paid subscription where users get:
         • No advertisements
         • Mobile access
         • Offline playback
         • API access
  5. “Music itself is going to become like running water or electricity.”
     David Bowie, 2002
  6. Accessibility
     • People should be able to enjoy music
         • Whenever they want
         • Wherever they are
         • Whatever they’re doing
  7. That’s cool and all… but let’s talk engineering!
  8. Latency sucks
     • High latency can be a problem, and not only in first-person shooters
     • Slow performance is one of the major reasons users abandon services
     • Users don’t come back
     How to avoid it?
  9. Repeating songs?
     • Player caches tracks it has played
     • Default policy is to use 10% of free space (capped at 10 GB)
     • Caches are large (56% of caches are over 5 GB)
     • LRU policy for cache eviction
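The cache policy on this slide can be sketched in a few lines. This is a minimal illustration, not Spotify’s client code. It assumes the limit is computed as min(10% of free space, 10 GB) and that eviction is plain least-recently-used, bounded by total bytes. `default_cache_limit` and `LRUTrackCache` are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional

GB = 1024 ** 3


def default_cache_limit(free_bytes: int) -> int:
    """Default policy from the slide: 10% of free space, capped at 10 GB."""
    return min(free_bytes // 10, 10 * GB)


class LRUTrackCache:
    """Hypothetical byte-bounded cache that evicts the least recently used track."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit = limit_bytes
        self.used = 0
        self._tracks: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, track_id: str) -> Optional[bytes]:
        data = self._tracks.get(track_id)
        if data is not None:
            self._tracks.move_to_end(track_id)  # mark as most recently used
        return data

    def put(self, track_id: str, data: bytes) -> None:
        if len(data) > self.limit:
            return  # larger than the whole cache: don't store it
        if track_id in self._tracks:
            self.used -= len(self._tracks.pop(track_id))
        # Evict least recently used tracks until the new one fits.
        while self.used + len(data) > self.limit:
            _, evicted = self._tracks.popitem(last=False)
            self.used -= len(evicted)
        self._tracks[track_id] = data
        self.used += len(data)
```

For example, a disk with 50 GB free gets a 5 GB cache, while one with 500 GB free hits the 10 GB cap.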
  10. Streaming
     • Request first piece from Spotify servers
     • Meanwhile, search for peers with track
     • Download data in-order
     • When buffers are sufficient, switch to P2P
     • Towards end of a track, prefetch next one
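Below is a rough sketch of the source-selection rules from this slide. The thresholds `MIN_BUFFER_FOR_P2P_S` and `PREFETCH_WINDOW_S` are hypothetical, because the slides don’t give the real values or say how “sufficient” is measured. The other names are also illustrative.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the slides do not give the real values.
MIN_BUFFER_FOR_P2P_S = 10.0  # seconds of audio buffered before relying on peers
PREFETCH_WINDOW_S = 30.0     # start fetching the next track this close to the end


@dataclass
class PlaybackState:
    pieces_received: int
    peers_found: int
    buffered_s: float   # audio buffered ahead of the playhead, in seconds
    remaining_s: float  # time left in the current track, in seconds


def choose_source(state: PlaybackState) -> str:
    """Pick where to fetch the next in-order piece from."""
    if state.pieces_received == 0:
        return "server"  # first piece always comes from Spotify servers
    if state.peers_found > 0 and state.buffered_s >= MIN_BUFFER_FOR_P2P_S:
        return "p2p"     # buffers are sufficient and peers exist: switch to P2P
    return "server"


def should_prefetch_next(state: PlaybackState) -> bool:
    """Towards the end of a track, start fetching the next one."""
    return state.remaining_s <= PREFETCH_WINDOW_S
```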
  11. When to start playing?
     • Trade-off between stutter and latency
     • Look at the last 15 minutes of transfer rates
     • Model as a Markov chain and simulate
     • Coupled with some heuristics
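The following is a minimal Monte Carlo sketch of the idea, not Spotify’s actual model. It works in three steps:

1. It builds a first-order Markov chain from recent throughput samples, where each observed rate transitions to the rates that followed it.
2. It simulates downloading the track.
3. It returns the smallest start delay whose estimated stutter probability stays under a target.

The state space, the time step, the 1% target and all function names are assumptions. The heuristics mentioned on the slide are left out.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def build_chain(samples: List[float]) -> Dict[float, List[float]]:
    """Empirical Markov chain: for each observed rate, the rates that followed it."""
    chain: Dict[float, List[float]] = defaultdict(list)
    for cur, nxt in zip(samples, samples[1:]):
        chain[cur].append(nxt)
    return chain


def stutters(chain: Dict[float, List[float]], start_rate: float,
             track_bytes: float, bitrate: float, start_delay_s: float,
             dt: float, rng: random.Random) -> bool:
    """Simulate one download; True if the playhead overtakes the downloaded data."""
    downloaded, t, rate = 0.0, 0.0, start_rate
    while downloaded < track_bytes:
        downloaded += rate * dt
        t += dt
        # bitrate is in bytes/s; playback starts after start_delay_s.
        played = min(track_bytes, max(0.0, t - start_delay_s) * bitrate)
        if played > downloaded:
            return True
        followers = chain.get(rate)
        if followers:
            # Uniform choice over followers reproduces the empirical transition odds.
            rate = rng.choice(followers)
    return False


def choose_start_delay(samples: List[float], track_bytes: float, bitrate: float,
                       dt: float = 1.0, max_stutter: float = 0.01, runs: int = 500,
                       max_delay_s: float = 60.0, seed: int = 0) -> Optional[float]:
    """Smallest delay (a multiple of dt) whose simulated stutter rate is acceptable."""
    chain = build_chain(samples)
    rng = random.Random(seed)
    delay = 0.0
    while delay <= max_delay_s:
        failures = sum(
            stutters(chain, samples[-1], track_bytes, bitrate, delay, dt, rng)
            for _ in range(runs)
        )
        if failures / runs <= max_stutter:
            return delay
        delay += dt
    return None
```

In this model, a connection that is twice as fast as the bitrate can start immediately. One that is half as fast must wait until about half the track has downloaded.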
  12. How well does it work?
     • Current median latency to begin playing a track in Spotify is 265 ms
     • Due to disk lookup, at times it’s actually faster to start playing a track from the network than from disk
     • Fewer than 1% of playbacks experienced stutter
  13. Spotify is fast, we get it, but you must have cool infrastructure, right?
  14. Production storage
     • Production storage is a cache with fast drives and lots of RAM
     • Serves the most popular content
     • A cache miss generates a request to master storage, with slightly higher latency
     • Production storage is available in several data centers to stay close to the user (latency-wise)
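The split between production and master storage behaves like a read-through cache. Here is a minimal sketch that assumes LRU eviction and a capacity counted in tracks; the slides don’t say how production storage evicts content. `ProductionStorage` and `fetch_from_master` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class ProductionStorage:
    """Hypothetical read-through cache in front of (slower) master storage."""

    def __init__(self, capacity: int, fetch_from_master: Callable[[str], bytes]) -> None:
        self.capacity = capacity  # number of tracks kept in this data center
        self.fetch_from_master = fetch_from_master
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, track_id: str) -> bytes:
        if track_id in self._cache:
            self.hits += 1
            self._cache.move_to_end(track_id)  # keep popular tracks resident
            return self._cache[track_id]
        # Cache miss: go to master storage (higher latency), then keep a copy.
        self.misses += 1
        data = self.fetch_from_master(track_id)
        self._cache[track_id] = data
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # drop the least recently used track
        return data
```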
  15. Master storage
     • Works as a DHT, with some redundancy
     • Contains all available tracks but has slower drives and access
     • Tracks are kept in several formats, adding up to around 290 TB
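The slides don’t say how tracks are mapped to master storage nodes. One common way to build a DHT with redundancy is consistent hashing: each key is stored on the next few distinct nodes clockwise around a hash ring.

The sketch below assumes that scheme, with MD5 as the ring hash, 64 virtual nodes per server and a replication factor of 3. All of these choices are for illustration only.

```python
import bisect
import hashlib
from typing import List


def _h(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring: each track lives on `replicas` distinct nodes."""

    def __init__(self, nodes: List[str], replicas: int = 3, vnodes: int = 64) -> None:
        self.replicas = min(replicas, len(nodes))
        # Virtual nodes spread each server around the ring for better balance.
        self._ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def nodes_for(self, track_id: str) -> List[str]:
        """Walk clockwise from the track's hash, collecting distinct nodes."""
        start = bisect.bisect(self._keys, _h(track_id))
        owners: List[str] = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replicas:
                    break
        return owners
```

This scheme has a useful property: removing a server that does not hold a given track leaves that track’s replica set unchanged.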
  16. Why P2P?
     • Easier to scale
     • Fewer servers
     • Less bandwidth
     • Better uptime
     • Lower costs
     • Cool!
  17. P2P overview
     • Not a piracy network: all tracks are added by Spotify
     • Used on all desktop clients
     • All nodes are equal (no super nodes)
     • A track is downloaded from several peers
  18. P2P custom protocol
     • Ask for most urgent pieces first
     • If a peer is slow, re-request from new peers
     • When buffers run low, download from central servers
     • If loading from servers, estimate at what point P2P will catch up
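The first two rules can be sketched as a small scheduler. Pieces are requested in playback order, lowest index first, since that piece is needed soonest. A request with no answer after a timeout is re-sent to a different peer.

The timeout, the one-request-per-peer limit and the class names are assumptions. Server fallback and the catch-up estimate are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Request:
    piece: int
    peer: str
    sent_at: float


class PieceScheduler:
    """Hypothetical urgent-first piece scheduler with re-requests for slow peers."""

    def __init__(self, n_pieces: int, timeout_s: float = 2.0) -> None:
        self.missing: Set[int] = set(range(n_pieces))
        self.pending: Dict[int, Request] = {}
        self.timeout_s = timeout_s  # assumption: "slow" = no answer within this time

    def next_requests(self, peers: List[str], now: float) -> List[Request]:
        # Cancel requests that have been outstanding too long.
        slow = [r for r in self.pending.values() if now - r.sent_at > self.timeout_s]
        for r in slow:
            del self.pending[r.piece]
        avoid = {r.piece: r.peer for r in slow}  # don't retry a piece on the same slow peer
        busy = {r.peer for r in self.pending.values()}
        free = [p for p in peers if p not in busy]  # one outstanding request per peer
        out: List[Request] = []
        # Lowest index first: in-order playback needs it soonest.
        for piece in sorted(self.missing.difference(self.pending)):
            candidates = [p for p in free if p != avoid.get(piece)]
            if not candidates:
                continue
            peer = candidates[0]
            free.remove(peer)
            req = Request(piece, peer, now)
            self.pending[piece] = req
            out.append(req)
        return out

    def received(self, piece: int) -> None:
        self.missing.discard(piece)
        self.pending.pop(piece, None)
```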
  19. P2P resource usage
     • To ensure a good user experience, we:
         • Cap number of neighbors
         • Cap number of simultaneous uploads
         • Stop uploading if buffers are very low
         • Cap cache size
     • Mobile clients don’t participate in P2P
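These caps amount to a few admission checks. Below is a minimal sketch with made-up limits, since the real numbers aren’t in the slides. `ClientState` and both functions are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical limits; the slides do not give the real values.
MAX_NEIGHBORS = 50
MAX_UPLOADS = 4
LOW_BUFFER_S = 5.0


@dataclass
class ClientState:
    neighbors: int
    active_uploads: int
    playback_buffer_s: float
    is_mobile: bool


def may_accept_neighbor(s: ClientState) -> bool:
    """Mobile clients stay out of P2P; desktop clients cap their neighbor count."""
    return not s.is_mobile and s.neighbors < MAX_NEIGHBORS


def may_start_upload(s: ClientState) -> bool:
    if s.is_mobile:
        return False  # mobile clients do not take part in P2P
    if s.playback_buffer_s < LOW_BUFFER_S:
        return False  # own playback comes first: stop uploading on low buffers
    return s.active_uploads < MAX_UPLOADS
```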
  20. P2P finding peers
     • Partial central tracker (BitTorrent-style)
     • Broadcast query in small neighborhood (Gnutella-style)
     • Using both mechanisms results in higher availability
     • Limited broadcast for local (LAN) peer discovery (cherry on top...)
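The Gnutella-style half of peer discovery can be sketched as a breadth-first flood with a hop limit (TTL), with the tracker’s answer merged in. The in-memory dictionaries stand in for the overlay and the tracker. They, the function name and the default TTL of 2 are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def find_peers(start: str, track_id: str,
               neighbors: Dict[str, List[str]],
               has_track: Dict[str, Set[str]],
               tracker: Dict[str, Set[str]],
               ttl: int = 2) -> Set[str]:
    """Union of tracker answers and a TTL-limited flood through the overlay."""
    found = set(tracker.get(track_id, set()))  # BitTorrent-style: ask the tracker
    seen = {start}
    queue = deque([(start, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if node != start and track_id in has_track.get(node, set()):
            found.add(node)
        if hops_left == 0:
            continue  # the query does not travel further than ttl hops
        for nb in neighbors.get(node, []):
            if nb not in seen:  # don't forward the query to a node twice
                seen.add(nb)
                queue.append((nb, hops_left - 1))
    found.discard(start)
    return found
```

Combining the two sources means a track can still be found when the tracker knows only part of the picture.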
  21. Security
     • P2P network needs to be safe and trusted
     • All peers should be trusted Spotify clients
     • Our client needs to be able to read metadata and play music
     • But we have to prevent reverse engineers from doing the same
     Security through obscurity: we don’t openly discuss the details…
  22. …but here are a few tips
     • Closed environment
     • Integrity of downloaded files is checked
     • Data transfers are encrypted
     • Usernames are not exposed in the P2P network; all peers are assigned pseudonyms
     • Software obfuscation makes life difficult for reverse engineers
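A common way to check integrity is to compare a hash of each downloaded piece against a trusted digest. The slides don’t name the hash function or say where the digests come from. This sketch assumes SHA-256 digests supplied by a trusted source, and `VerifiedPieceStore` is a hypothetical name.

```python
import hashlib
from typing import Dict, Set


class VerifiedPieceStore:
    """Hypothetical store that keeps a piece only if its SHA-256 digest matches."""

    def __init__(self, expected: Dict[int, str]) -> None:
        self.expected = expected  # piece index -> trusted hex digest (assumed source)
        self.pieces: Dict[int, bytes] = {}
        self.bad_peers: Set[str] = set()

    def accept(self, index: int, data: bytes, peer: str) -> bool:
        if hashlib.sha256(data).hexdigest() != self.expected.get(index):
            # Corrupt or tampered data: distrust this peer and re-request elsewhere.
            self.bad_peers.add(peer)
            return False
        self.pieces[index] = data
        return True
```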
  23. Software obfuscation according to
  24. So, what’s the outcome?
     • With over 10 million users, responses are served:
         • 55.4% from the client cache
         • 35.8% from the P2P network
         • 8.8% from the servers
  25. Not enough info?
     If you would like to know more:
     • Get in touch with us
     • Check out Gunnar Kreitz’s slides and academic papers on the subject:
  26. Oh btw, we have a lotta fun as well!