What is Spotify? • On-demand music streaming service • Available in 7 European countries • Over 13 million tracks • Over 1 million paying users • Over 10 million total users • Legal • Really fast
Business idea • Free ad-funded version • Paid subscription where users get: • No advertisements • Mobile access • Offline playback • API access
“music itself is going to become like running water or electricity” David Bowie, 2002
Accessibility • People should be able to enjoy music • Whenever they want • Wherever they are • Whatever they’re doing
That’s cool and all… but let’s talk engineering!
Latency sucks • High latency can be a problem, not only in first-person shooters • Slow performance is one of the major reasons users abandon services • Users don't come back • How to avoid it?
Repeating songs? • Player caches tracks it has played • Default policy is to use 10% of free space (capped at 10 GB) • Caches are large (56% are over 5 GB) • LRU policy for cache eviction
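The cache policy above can be sketched in a few lines. This is a minimal illustration, not the client's actual code: the names `cache_budget` and `TrackCache` are hypothetical, and only the 10% / 10 GB defaults and the LRU eviction rule come from the slide.

```python
from collections import OrderedDict

GB = 1024 ** 3


def cache_budget(free_bytes, percent=10, cap_bytes=10 * GB):
    """Default policy from the slide: 10% of free space, capped at 10 GB."""
    return min(free_bytes * percent // 100, cap_bytes)


class TrackCache:
    """Tracks sizes only; the oldest entry sits at the front of the dict."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self._tracks = OrderedDict()  # track_id -> size in bytes

    def touch(self, track_id):
        """Mark a cached track as most recently used. Returns True on a hit."""
        if track_id not in self._tracks:
            return False
        self._tracks.move_to_end(track_id)
        return True

    def add(self, track_id, size):
        """Store a track, evicting least recently used tracks as needed.

        Returns the list of evicted track ids. A track larger than the
        whole budget is simply not cached (an assumption of this sketch).
        """
        if self.touch(track_id) or size > self.budget:
            return []
        evicted = []
        while self.used + size > self.budget:
            old_id, old_size = self._tracks.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)
        self._tracks[track_id] = size
        self.used += size
        return evicted
```

With 200 GB free, `cache_budget` hits the 10 GB cap; with 50 GB free it returns 5 GB.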
Streaming • Request first piece from Spotify servers • Meanwhile, search for peers with track • Download data in-order • When buffers are sufficient, switch to P2P • Towards end of a track, prefetch next one
When to start playing? • Trade-off between stutter & latency • Look at last 15 min of transfer rates • Model as Markov chain and simulate • Coupled with some heuristics
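The slide gives no details of the model, so the following is only a rough sketch of the idea under stated assumptions: throughput is sampled once per second and bucketed, each distinct bucketed rate is a Markov state, and playback starts when the simulated risk of a buffer underrun is low enough. The function names and the `max_risk` threshold are hypothetical, and the heuristics mentioned on the slide are not modelled.

```python
import random
from collections import defaultdict


def fit_chain(rates):
    """Record, for each observed rate, the rates that followed it."""
    transitions = defaultdict(list)
    for cur, nxt in zip(rates, rates[1:]):
        transitions[cur].append(nxt)
    return transitions


def stutter_probability(rates, buffered, remaining, bitrate, runs=1000, seed=0):
    """Estimate the chance of a buffer underrun if playback starts now.

    rates     -- bucketed throughput history, bytes per second, one per second
    buffered  -- bytes of the track already downloaded
    remaining -- bytes of the track still to fetch
    bitrate   -- bytes consumed per second of playback (must be > 0)
    """
    rng = random.Random(seed)
    chain = fit_chain(rates)
    stutters = 0
    for _ in range(runs):
        state = rates[-1]  # start from the most recent observed rate
        buf, left = buffered, remaining
        while left > 0:
            got = min(state, left)
            buf += got - bitrate  # download one second, play one second
            left -= got
            if buf < 0:
                stutters += 1
                break
            # Sample the next rate from what followed this one historically;
            # fall back to the whole history for a state with no successor.
            state = rng.choice(chain.get(state) or rates)
    return stutters / runs


def should_start(rates, buffered, remaining, bitrate, max_risk=0.01):
    """Start playing once the simulated stutter risk is acceptable."""
    return stutter_probability(rates, buffered, remaining, bitrate) <= max_risk
```

Waiting longer raises `buffered` and lowers the estimated risk, which is exactly the stutter versus latency trade-off on the slide.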
How well does it work? • Current median latency to begin playing a track in Spotify is 265 ms • Because of disk lookup time, it's sometimes actually faster to start playing a track from the network than from disk • Fewer than 1% of playbacks experienced stutter
Spotify is fast, we get it, but you must have cool infrastructure, right?
Production storage • Production storage is a cache with fast drives & lots of RAM • Serves the most popular content • A cache miss generates a request to master storage, with slightly higher latency • Production storage is available in several data centers to ensure closeness to the user (latency-wise)
Master storage • Works as a DHT, with some redundancy • Contains all available tracks but has slower drives and access • Tracks are kept in several formats, adding up to around 290TB
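The slide only says that master storage works as a DHT with some redundancy. One common way to get that behaviour is a consistent-hashing ring where each track is stored on several consecutive nodes. The sketch below illustrates that general technique, not Spotify's actual layout; `Ring`, `owners`, the node names and the replica count are all assumptions.

```python
import bisect
import hashlib


def _h(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Sorted (position, node) pairs form the ring.
        self._points = sorted((_h(n), n) for n in nodes)

    def owners(self, track_id):
        """Return the nodes that should hold a copy of this track."""
        positions = [pos for pos, _ in self._points]
        start = bisect.bisect(positions, _h(track_id))
        count = min(self.replicas, len(self._points))
        return [
            self._points[(start + i) % len(self._points)][1]
            for i in range(count)
        ]
```

A lookup such as `Ring(["store-1", "store-2", "store-3"]).owners("some-track-id")` returns two distinct nodes, so a track stays reachable if one of them fails.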
Why P2P? • Easier to scale • Fewer servers • Less bandwidth • Better uptime • Lower costs • Cool!
P2P overview • Not a piracy network, all tracks are added by Spotify • Used on all desktop clients • All nodes are equal (no super nodes) • A track is downloaded from several peers
P2P custom protocol • Ask for most urgent pieces first • If a peer is slow, re-request from new peers • When buffers run low, download from central servers • If loading from servers, estimate at what point P2P will catch up
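Two of these rules, urgent pieces first and server fallback when buffers run low, can be sketched as a small request planner. This is an illustration only: `plan_requests`, the `low_water` threshold, and the fastest-peer-first assignment are assumptions, and re-requesting from slow peers and the catch-up estimate are left out.

```python
def plan_requests(missing, playhead, buffer_seconds, peer_rates, low_water=3.0):
    """Decide where to fetch each missing piece from.

    missing        -- set of piece indices not yet downloaded
    playhead       -- index of the piece currently being played
    buffer_seconds -- seconds of audio buffered ahead of the playhead
    peer_rates     -- dict of peer id -> measured download rate
    Returns a list of (source, piece) pairs, most urgent piece first.
    """
    # Pieces closest to the playhead are the most urgent.
    urgent_first = sorted(p for p in missing if p >= playhead)
    peers = sorted(peer_rates, key=peer_rates.get, reverse=True)

    if not peers:
        return [("server", piece) for piece in urgent_first]

    plan = []
    if urgent_first and buffer_seconds < low_water:
        # Buffers are low: take the most urgent piece from the central servers.
        plan.append(("server", urgent_first.pop(0)))
    for i, piece in enumerate(urgent_first):
        # Spread the rest over peers, fastest peer getting the most urgent piece.
        plan.append((peers[i % len(peers)], piece))
    return plan
```

For example, with pieces 3, 5 and 9 missing, a healthy buffer, and peers `a` (slow) and `b` (fast), the plan is `b` for 3, `a` for 5 and `b` for 9; with a nearly empty buffer, piece 3 comes from the server instead.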
P2P resource usage • To ensure user experience we: • Cap number of neighbors • Cap number of simultaneous uploads • If buffers are very low, stop uploading • Cap cache size • Mobile clients don’t participate in P2P
P2P finding peers • Partial central tracker (BitTorrent-style) • Broadcast query in small neighborhood (Gnutella-style) • Two mechanisms result in higher availability • Limited broadcast for local (LAN) peer discovery (cherry on top...)
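Combining the two lookup mechanisms can be sketched as follows. The data structures are stand-ins chosen for illustration: `tracker` is a plain dict playing the role of the partial central tracker, `neighbors` is the overlay graph, `caches` says which tracks each peer holds, and `max_hops` bounds the "small neighborhood". None of these names or values come from the actual protocol.

```python
from collections import deque


def neighborhood_query(start, neighbors, has_track, max_hops=2):
    """Hop-limited flood: ask peers up to max_hops away for the track."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if node != start and has_track(node):
            found.append(node)
        if hops < max_hops:
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return found


def find_peers(track_id, tracker, start, neighbors, caches, max_hops=2):
    """Merge tracker results with overlay results, dropping duplicates."""
    from_tracker = list(tracker.get(track_id, []))
    from_overlay = neighborhood_query(
        start, neighbors, lambda n: track_id in caches.get(n, ()), max_hops
    )
    merged = dict.fromkeys(from_tracker + from_overlay)  # keeps first-seen order
    return [peer for peer in merged if peer != start]
```

Because the tracker only knows some of the peers and the flood only reaches nearby ones, the union of the two finds peers that either mechanism alone would miss.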
Security • P2P network needs to be safe and trusted • All peers should be trusted Spotify clients • Our client needs to be able to read metadata and play music • But we have to prevent reverse engineers from doing the same • Security through obscurity: we don't openly discuss the details…
…but here are a few tips • Closed environment • Integrity of downloaded files is checked • Data transfers are encrypted • Usernames are not exposed in P2P network; every peer is assigned a pseudonym • Software obfuscation makes life difficult for reverse engineers
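The slides deliberately leave out how integrity is checked. As a generic illustration of the idea only, a client can compare the digest of downloaded data against a digest obtained from a trusted source. The choice of SHA-256 and the name `verify_piece` are assumptions of this sketch, not a description of Spotify's mechanism.

```python
import hashlib
import hmac


def verify_piece(data: bytes, expected_hex: str) -> bool:
    """Return True if the data matches the digest from a trusted source."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual_hex, expected_hex)
```

A piece that fails the check would be discarded and requested again from a different peer or from the central servers.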