Pressing play



                                        Niklas Gustavsson
                                               ngn@spotify.com
                                                    @protocol7

Tuesday, April 17, 12
Who am I?
      • ngn@spotify.com
      • @protocol7
      • Spotify backend dev based in Göteborg
      • Mainly from a JVM background, working on
        various stuff over the years
      • Apache Software Foundation member




What’s Spotify all about?
      •       A big catalogue, tons of music
      •       Available everywhere
      •       Great user experience
      •       More convenient than piracy
      •       Fast, reliable, always available
      •       Scalable for many, many users
      •       Ad-supported or paid-for service




Pressing play


Where’s Spotify?
      • Let’s start the client, but where should it connect
        to?




Aside: SRV records
      • Example SRV records
      name                                  TTL class type prio weight port host
      _spotify-mac-client._tcp.spotify.com. 242 IN    SRV  10   8      4070 C8.spotify.com.
      _spotify-mac-client._tcp.spotify.com. 242 IN    SRV  10   16     4070 C4.spotify.com.

      • GeoDNS is used
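
The fields of the example records can be pulled apart with a few lines of Python. This is only a sketch for illustration: `SrvRecord` and `parse_srv` are hypothetical names, and a real client would get these fields from its DNS resolver rather than from zone-file text.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    """One SRV record, fields in zone-file order (hypothetical helper type)."""
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    host: str


def parse_srv(line: str) -> SrvRecord:
    """Parse a zone-file style SRV line such as the ones on the slide."""
    name, ttl, rr_class, rr_type, prio, weight, port, host = line.split()
    if rr_class != "IN" or rr_type != "SRV":
        raise ValueError(f"not an IN SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(prio), int(weight), int(port), host)


record = parse_srv(
    "_spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 8 4070 C8.spotify.com."
)
print(record.host, record.port, record.priority, record.weight)
```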




What does that record really point to?
      • accesspoint
      • Handles authentication state, logging, routing,
        rate limiting and much more
      • Protocol between client and AP uses a single,
        encrypted multiplexed socket over TCP
      • Written in C++




Find something to play
      • Let’s search




Services
      • Probably close to 100 backend services, most
        small, handling a single task
      • UNIX philosophy
      • Many autonomous
      • Deployed on commodity servers
      • Always redundant




Services
      • Mostly written in Python, a few in Java and C
      • Storage optimized for each service, mostly
        PostgreSQL, Cassandra and Tokyo Cabinet
      • Many services use in-memory caching, for
        example /dev/shm or memcached
      • Usually a small daemon, talking HTTP or Hermes
        • We have our own supervisor, which keeps services
           running




Aside: Hermes
      •       ZeroMQ for transport, protobuf for envelope and payload
      •       HTTP-like verbs and caching
      •       Request-reply and publish/subscribe
      •       Very performant and introspectable




How does the accesspoint find search?
      • Everything has an SRV DNS record:
        • One record with same name for each service
          instance
        • Clients resolve to find servers providing that
          service
        • Lowest priority record is chosen with weighted
          shuffle
        • Clients retry other instances in case of failures
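
The selection and retry steps can be sketched roughly as follows. This is an illustration under assumptions, not Spotify's actual resolver code: the tuple layout, `weighted_shuffle`, `candidates`, `call_with_retry` and the caller-supplied `send` function are all hypothetical, and the handling of zero weights is a simplification.

```python
import random
from typing import Callable, List, Optional, Tuple

# (priority, weight, host, port): hypothetical in-memory form of resolved SRV records.
Srv = Tuple[int, int, str, int]


def weighted_shuffle(records: List[Srv], rng: random.Random) -> List[Srv]:
    """Order records randomly, with higher weights more likely to come first."""
    remaining = list(records)
    ordered: List[Srv] = []
    while remaining:
        weights = [r[1] for r in remaining]
        if sum(weights) > 0:
            pick = rng.choices(range(len(remaining)), weights=weights)[0]
        else:
            pick = rng.randrange(len(remaining))  # all weights zero: uniform pick
        ordered.append(remaining.pop(pick))
    return ordered


def candidates(records: List[Srv], rng: Optional[random.Random] = None) -> List[Srv]:
    """Instances in the order to try them: lowest priority value first,
    weighted shuffle within each priority level."""
    if rng is None:
        rng = random.Random()
    result: List[Srv] = []
    for prio in sorted({r[0] for r in records}):
        group = [r for r in records if r[0] == prio]
        result.extend(weighted_shuffle(group, rng))
    return result


def call_with_retry(records: List[Srv], send: Callable[[str, int], object],
                    rng: Optional[random.Random] = None) -> object:
    """Try each candidate in turn; `send` must raise OSError on failure."""
    last_error = None
    for _prio, _weight, host, port in candidates(records, rng):
        try:
            return send(host, port)
        except OSError as exc:
            last_error = exc  # this instance failed, move on to the next one
    raise ConnectionError("no service instance reachable") from last_error


records = [(10, 8, "C8.spotify.com.", 4070), (10, 16, "C4.spotify.com.", 4070)]
print([host for _, _, host, _ in candidates(records)])
```

With the two example records, C4 (weight 16) ends up first about twice as often as C8 (weight 8), which spreads load across instances in proportion to their weights.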




Read-only services
      •       Stateless
      •       Writes are hard
      •       Simple to scale, just add more servers
      •       Services can be restarted as needed
      •       Indexes prefabricated, distributed to live servers




Read-write services
      • User generated content, e.g. playlists
      • Hard to ensure consistency of data across instances

      Solutions:
      • Eventual consistency:
         • Reads of just written data not guaranteed to be up-to-date
      • Locking, atomic operations
          • Creating globally unique keys, e.g. usernames
          • Transactions, e.g. billing


Sharding
      • Some services use Dynamo-inspired DHTs
        (distributed hash tables)
        • Each request has a key
        • Each service node is responsible for a range of
          hash keys
        • Data is distributed among service nodes
        • Redundancy is ensured by writing to a replica
          node
        • Data must be transitioned when the ring changes
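
A minimal sketch of such a ring, assuming one position per node and the next node clockwise as the replica. `HashRing`, `ring_hash`, the node names and the use of MD5 are illustrative assumptions; the slides do not say how Spotify's services hash keys or place replicas.

```python
import bisect
import hashlib
from typing import List


def ring_hash(value: str) -> int:
    """Map a string to a ring position (MD5 used only for its even spread)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Each node owns the hash range ending at its own position on the ring."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def nodes_for(self, key: str, replicas: int = 2) -> List[str]:
        """Return the owner of `key`, then the next nodes clockwise as replicas."""
        start = bisect.bisect_left(self._positions, ring_hash(key))
        count = min(replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c"])
owner, replica = ring.nodes_for("playlist:1234")
print(owner, replica)
```

Adding or removing a node changes which ranges each node owns, so the keys in the affected ranges have to be copied to their new owner; this is the data transition the last bullet refers to.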




DHT example




search
      • Java service
      • Lucene storage
        • New index published daily
      • Doesn’t store any metadata in itself, returns a list
        of identifiers

      • (Search suggestions are served from a separate
        service, optimized for speed)




Metadata services
      •       Multiple read-only services
      •       60 GB indices
      •       Responds to metadata requests
      •       Decorates metadata onto other service responses
              • We’re most likely moving away from this model




Another aside: How does stuff get into Spotify?
      • >15 million tracks, we can’t maintain all that
        ourselves
      • Ingest audio, images and metadata from labels
        • Receive, transform, transcode, merge
      • All ends up in a metadata database from which
        indices are generated and distributed to services




The Kent bug
      • Much of the metadata lacks identifiers which
        leaves us with heuristics.




Play


Audio encodings and files
      • Spotify supports multiple audio encodings
        • Ogg Vorbis at 96 (-q2), 160 (-q5) and 320 kbps
          (-q9)
        • MP3 at 320 kbps (downloads)
      • For each track, a file for each encoding/bitrate is
        listed in the returned metadata
      • The client picks an appropriate choice
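
How the client makes its choice is not described on the slide. One plausible policy, sketched here purely as an assumption, is to take the highest bitrate that fits a limit (for example a user quality setting) and fall back to the lowest available file otherwise. `AudioFile`, `pick_file` and the file identifiers are hypothetical.

```python
from typing import List, NamedTuple, Optional


class AudioFile(NamedTuple):
    codec: str     # e.g. "vorbis" or "mp3"
    bitrate: int   # bits per second
    file_id: str   # hypothetical identifier listed in the track metadata


def pick_file(files: List[AudioFile], codec: str, max_bitrate: int) -> Optional[AudioFile]:
    """Pick the best file for `codec` without exceeding `max_bitrate` if possible."""
    usable = [f for f in files if f.codec == codec]
    if not usable:
        return None
    within = [f for f in usable if f.bitrate <= max_bitrate]
    if within:
        return max(within, key=lambda f: f.bitrate)
    # Nothing fits the limit: fall back to the smallest file we can play.
    return min(usable, key=lambda f: f.bitrate)


files = [
    AudioFile("vorbis", 96_000, "file-1"),
    AudioFile("vorbis", 160_000, "file-2"),
    AudioFile("vorbis", 320_000, "file-3"),
    AudioFile("mp3", 320_000, "file-4"),
]
choice = pick_file(files, "vorbis", 160_000)
print(choice)
```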




Get the audio data
      • The client now must fetch the actual audio data
      • Latency kills




Cache
      •       Player caches tracks it has played
      •       Caches are large (56% are over 5 GB)
      •       Least Recently Used policy for cache eviction
      •       50% of data comes from local cache
      •       Cached files are served in P2P overlay
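
Least Recently Used eviction can be sketched with an ordered dictionary and a byte budget. This is a toy in-memory model for illustration; `TrackCache` and its methods are hypothetical names, and the real client cache lives on disk.

```python
from collections import OrderedDict
from typing import Optional


class TrackCache:
    """Byte-budgeted LRU cache (illustrative only)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._tracks: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, track_id: str) -> Optional[bytes]:
        """Return cached data and mark the track as most recently used."""
        data = self._tracks.get(track_id)
        if data is not None:
            self._tracks.move_to_end(track_id)
        return data

    def put(self, track_id: str, data: bytes) -> None:
        """Store a track, evicting least recently used tracks if over budget."""
        if track_id in self._tracks:
            self.used_bytes -= len(self._tracks.pop(track_id))
        self._tracks[track_id] = data
        self.used_bytes += len(data)
        while self.used_bytes > self.capacity_bytes and self._tracks:
            _, evicted = self._tracks.popitem(last=False)  # oldest entry first
            self.used_bytes -= len(evicted)


cache = TrackCache(capacity_bytes=10)
cache.put("track-a", b"aaaa")
cache.put("track-b", b"bbbb")
cache.get("track-a")           # track-a is now the most recently used
cache.put("track-c", b"cccc")  # over budget, so track-b is evicted
print(cache.get("track-b"), cache.used_bytes)
```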




Streaming
      • Request first piece from Spotify storage
      • Meanwhile, search peer-to-peer (P2P) for
        remainder
      • Switch back and forth between Spotify storage
        and peers as needed
      • Towards end of a track, start prefetching next one




P2P
      • All peers are equals (no supernodes)
      • A user only downloads data she needs
      • tracker service keeps peers for each track
      • P2P network becomes (weakly) clustered by
        interest
      • Oblivious to network architecture
      • Does not enforce fairness
      • Mobile clients do not participate in P2P



                        http://www.csc.kth.se/~gkreitz/spotify/kreitz-spotify_kth11.pdf
Success!




Yet another aside: Hadoop
      • We run analysis using Hadoop which feeds back
        into the previously described process, e.g. track
        popularity is used for weighting search results and
        toplists




Development at Spotify
      • Uses almost exclusively open source software
        • Git, Debian, Munin, Zabbix, Puppet, TeamCity...
      • Developers use whatever development tools they are
        comfortable with
      • Scrum or Kanban in three-week iterations
      • DevOps heavy. Freaking awesome ops
      • Monitor and measure all the things!




Development at Spotify
      •        Development hubs in Stockholm, Göteborg and NYC
      •        All in all, >220 people in tech
      •        Very talented team
      •        Hackdays and system owner days in each iteration
      •        Hangs out on IRC
      •        Growing and hiring




Languages at Spotify




Questions?



Thank you

                           Want to work at Spotify?
                        http://www.spotify.com/jobs/


