2. Basic requirements
• Lots of music
• Available everywhere, with great user interfaces
• More convenient than piracy
• Fast start of playback
• High availability (enough nines to put CDs in the
basement)
• Large scale (many many users)
3. The music
• Over 10 million tracks
• Growing every day, around 10k per day
• 96-320 kbps audio streams, most are Ogg
Vorbis q5, 160kbps
9. Latency matters
• High latency can be a problem, not only in
First Person Shooters
• Increasing the latency of Google searches by 100
– 400ms decreased usage by 0.2 – 0.6% (Jake
Brutlag, 2009)
• Slow performance is one of the major
reasons users abandon services
• Users don't come back
10. Latency matters
• Focus on low latency
• On average, the human notion of “instantly” is 200ms
• The median latency to begin playing a track in Spotify is
265ms
• Due to disk lookup, at times it's actually faster to start
playing a track from network than from disk
• The SLA is maintained by monitoring latency in the client
11. Playing a track
• Check local cache
• Request first piece from Spotify servers
• Meanwhile, search P2P for remainder
• Switch between servers & P2P as needed
• Towards the end of a track, start
pre-fetching the next one via P2P rather than
our servers
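The steps above can be sketched as a per-piece source decision. This is a minimal illustration, not the actual client logic: the names `Source` and `choose_source` and the boolean inputs are assumptions made for the example.

```python
from enum import Enum


class Source(Enum):
    CACHE = "cache"
    SERVER = "server"
    P2P = "p2p"


def choose_source(piece_index, in_cache, peers_have_piece, buffer_low):
    """Pick where to fetch one piece of a track from (hypothetical policy)."""
    # The local cache is always checked first.
    if in_cache:
        return Source.CACHE
    # The first piece comes from the servers so playback can start quickly.
    if piece_index == 0:
        return Source.SERVER
    # Switch back to the servers when P2P cannot deliver in time.
    if buffer_low or not peers_have_piece:
        return Source.SERVER
    # Otherwise the remainder of the track is fetched from peers.
    return Source.P2P
```

For example, `choose_source(5, False, True, False)` picks `Source.P2P`, while the same piece with `buffer_low=True` falls back to `Source.SERVER`.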
12. When to start playing?
• Trade off between stutter & latency
• Look at last 15 min of transfer rates
• Model as Markov chain and simulate
• Coupled with some heuristics
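A rough sketch of the idea, under stated assumptions: recent transfer rates are bucketed into discrete kbit/s values sampled once per second, the Markov chain is built from observed transitions between consecutive samples, and playback is simulated many times to estimate the risk of stutter. The function names, the one-second step and the 160 kbps default are illustrative choices, and the heuristics mentioned above are not modelled.

```python
import random


def build_chain(rates):
    """Map each observed rate to the list of rates that followed it."""
    chain = {}
    for cur, nxt in zip(rates, rates[1:]):
        chain.setdefault(cur, []).append(nxt)
    return chain


def stutter_probability(rates, buffered_kbit, remaining_s,
                        bitrate_kbps=160, runs=1000, seed=0):
    """Estimate the chance of a buffer underrun if playback starts now."""
    rng = random.Random(seed)
    chain = build_chain(rates)
    stutters = 0
    for _ in range(runs):
        state = rates[-1]
        buffered = buffered_kbit
        for _ in range(remaining_s):
            # Step the chain; a state with no observed successor
            # falls back to a random historical sample.
            successors = chain.get(state)
            state = rng.choice(successors) if successors else rng.choice(rates)
            # One second downloaded minus one second played.
            buffered += state - bitrate_kbps
            if buffered < 0:
                stutters += 1
                break
    return stutters / runs
```

A client could start playback once the estimated probability drops below a chosen threshold, which is where the trade-off between stutter and latency shows up.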
14. Production storage
• Production storage is a cache with fast drives &
lots of RAM
• Serves the most popular content
• A cache miss will generate a request to master
storage
• User will experience longer latency
• Production storage is available in several data
centers to ensure closeness to the user (latency-wise)
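The cache-in-front-of-master arrangement can be illustrated with a small read-through cache. This is only a sketch: the slides do not say which eviction policy is used, so least-recently-used eviction is an assumption here, and `ProductionCache` and its `master` mapping are hypothetical names.

```python
from collections import OrderedDict


class ProductionCache:
    """Read-through cache in front of a slower master store (sketch)."""

    def __init__(self, master, capacity):
        self.master = master        # stand-in for master storage: track id -> bytes
        self.capacity = capacity    # maximum number of cached tracks
        self.items = OrderedDict()
        self.misses = 0

    def get(self, track_id):
        if track_id in self.items:
            # Cache hit: mark the track as recently used.
            self.items.move_to_end(track_id)
            return self.items[track_id]
        # Cache miss: fetch from master storage (the slower path).
        self.misses += 1
        data = self.master[track_id]
        self.items[track_id] = data
        if len(self.items) > self.capacity:
            # Assumed policy: evict the least recently used track.
            self.items.popitem(last=False)
        return data
```

Popular tracks stay in the cache because every hit refreshes them, so only rarely played tracks pay the extra latency of a master storage request.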
15. Master storage
• Works as a DHT, with some redundancy
• Contains all available tracks but has slower
drives and access
• Tracks are kept in several formats, adding
up to around 290TB
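The slides do not describe how the DHT places tracks, so the following is only one common way to get "a DHT with some redundancy": hash nodes and keys onto a ring and store each track on the next few nodes clockwise. The function `replica_nodes`, the node names and the use of SHA-1 are all assumptions for illustration.

```python
import hashlib
from bisect import bisect_right


def _ring_position(value):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


def replica_nodes(track_id, nodes, replicas=2):
    """Return the nodes that would hold copies of a track (sketch)."""
    ring = sorted(nodes, key=_ring_position)
    positions = [_ring_position(node) for node in ring]
    # First node clockwise from the key's position, wrapping around.
    start = bisect_right(positions, _ring_position(track_id)) % len(ring)
    count = min(replicas, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

With `replicas=2`, losing a single node still leaves one copy of every track reachable.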
17. P2P helps
• Easier to scale
• Fewer servers
• Less bandwidth
• Better uptime
• Lower costs
• Fun!
18. P2P overview
• Not a piracy network, all tracks are added
by Spotify
• Used on all desktop clients (no mobile)
• Each client connected to <= 60 others
• All nodes are equal (no super nodes)
• A track is downloaded from several peers
19. P2P custom protocol
• Ask for most urgent pieces first
• If a peer is slow, re-request from new peers
• When buffers run low, download from
central servers
• If loading from servers, estimate at what
point P2P will catch up
• If buffers are very low, stop uploading
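A simplified sketch of these rules: missing pieces are ordered by how soon playback needs them, and two buffer thresholds decide when to fall back to the servers and when to stop uploading. The function name `plan_requests` and the threshold values (3 s and 1 s) are hypothetical; the slides do not give the real numbers. Re-requesting from new peers and estimating when P2P will catch up are left out.

```python
def plan_requests(missing_pieces, playhead, buffered_s,
                  low_s=3.0, very_low_s=1.0):
    """Return (pieces in request order, use_servers, allow_upload)."""
    # Most urgent first: the pieces closest after the playhead.
    urgent_first = sorted(p for p in missing_pieces if p >= playhead)
    # When buffers run low, download from the central servers.
    use_servers = buffered_s < low_s
    # When buffers are very low, stop uploading to other peers.
    allow_upload = buffered_s >= very_low_s
    return urgent_first, use_servers, allow_upload
```

For instance, `plan_requests({7, 3, 5, 1}, playhead=3, buffered_s=0.5)` requests pieces 3, 5 and 7 in that order, uses the servers, and pauses uploads.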
20. P2P finding peers
• Partial central tracker (BitTorrent-style)
• Broadcast query in small neighborhood
(Gnutella-style)
• Using two mechanisms results in higher
availability
• Limited broadcast for local (LAN) peer
discovery (cherry on top...)
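Combining the two lookups might look roughly like this. The data shapes are assumptions: `tracker` maps a track id to peers the tracker knows about, `neighbours` maps each connected peer to the set of tracks it reports having, and `limit` is an illustrative cap. Real queries would be network messages rather than dictionary lookups.

```python
def find_peers(track_id, tracker, neighbours, limit=10):
    """Merge tracker results with a neighbourhood query (sketch)."""
    # Tracker lookup (BitTorrent-style); the tracker is only partial,
    # so it may know few or no peers for this track.
    found = list(tracker.get(track_id, []))
    # Query directly connected peers (Gnutella-style).
    for peer, tracks in neighbours.items():
        if track_id in tracks and peer not in found:
            found.append(peer)
    return found[:limit]
```

If the tracker has no entry for a track, the neighbourhood query can still find peers, which is the availability benefit of running both mechanisms.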
21. P2P security
• The client needs to be able to play music, but we have to
prevent reverse-engineered clients from doing the same
• Therefore we can't openly discuss the details (Security
Through Obscurity) but...
• Closed environment
• Verify integrity of downloaded files
• Data transfers are encrypted
• Usernames are not exposed in the P2P network, all peers
are assigned pseudonyms
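The details are deliberately not public, so the following only illustrates the general idea of integrity verification: compare a digest of the downloaded data against a value obtained from a trusted source. SHA-256 and the function name `verify_piece` are assumptions, not the actual scheme.

```python
import hashlib
import hmac


def verify_piece(data, expected_hex):
    """Check downloaded bytes against a trusted hex digest (sketch)."""
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_hex)
```

A piece that fails the check would be discarded and requested again from another peer or from the servers.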
22. So, what's the outcome?
• At over 10 million users, the responses come
• 55.4% from client cache
• 35.8% from the P2P network
• 8.8% from the servers
24. I'd like to know more...
• Get in touch with us
• Check out Gunnar Kreitz's slides and
academic papers on the subject:
http://www.csc.kth.se/~gkreitz/spotify-p2p10/