Pressing play



                                        Niklas Gustavsson
                                               ngn@spotify.com
                                                    @protocol7

Tuesday, April 17, 12
Who am I?
      • ngn@spotify.com
      • @protocol7
      • Spotify backend dev based in Göteborg
      • Mainly from a JVM background, working on
        various stuff over the years
      • Apache Software Foundation member

What’s Spotify all about?
      •       A big catalogue, tons of music
      •       Available everywhere
      •       Great user experience
      •       More convenient than piracy
      •       Fast, reliable, always available
      •       Scalable for many, many users
      •       Ad-supported or paid-for service

Pressing play

Where’s Spotify?
      • Let’s start the client, but where should it connect
        to?

Aside: SRV records
      • Example SRV records:
      name                                  TTL class type prio weight port host
      _spotify-mac-client._tcp.spotify.com. 242 IN    SRV  10   8      4070 C8.spotify.com.
      _spotify-mac-client._tcp.spotify.com. 242 IN    SRV  10   16     4070 C4.spotify.com.

      • GeoDNS is used, so the records returned depend on where the client is located
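As a rough illustration of what the columns above mean, here is a minimal Python sketch that parses records in this zone-file layout into named fields. `SrvRecord` and `parse_srv` are hypothetical helpers for this example; a real client would get the same fields from a DNS resolver library rather than by parsing text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    """One SRV resource record (field names follow RFC 2782)."""
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    # Expected layout: name TTL class type priority weight port target
    name, ttl, rclass, rtype, prio, weight, port, target = line.split()
    if rclass != "IN" or rtype != "SRV":
        raise ValueError(f"not an IN SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(prio), int(weight), int(port), target)


records = [
    parse_srv("_spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 8 4070 C8.spotify.com."),
    parse_srv("_spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 16 4070 C4.spotify.com."),
]
for r in records:
    print(r.target, r.port, r.priority, r.weight)
```

Both example records share priority 10, so a client following RFC 2782 should pick C4 (weight 16) roughly twice as often as C8 (weight 8).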

What does that record really point to?
      • accesspoint
      • Handles authentication state, logging, routing,
        rate limiting and much more
      • Protocol between client and AP uses a single,
        encrypted multiplexed socket over TCP
      • Written in C++
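The slides do not describe the accesspoint wire format, so the following is only a sketch of the general idea of multiplexing many logical streams over one TCP connection, using a hypothetical frame layout (a 2-byte channel id and a 4-byte payload length) and leaving encryption out. The receiver splits the byte stream back into per-channel messages, even when a read returns only part of a frame.

```python
import struct
from collections import defaultdict

# Hypothetical frame header: 2-byte channel id, 4-byte payload length, big-endian.
HEADER = struct.Struct(">HI")


def encode_frame(channel: int, payload: bytes) -> bytes:
    return HEADER.pack(channel, len(payload)) + payload


def decode_frames(buffer: bytearray) -> list[tuple[int, bytes]]:
    """Pop all complete frames from the front of buffer; keep any partial tail."""
    frames = []
    while len(buffer) >= HEADER.size:
        channel, length = HEADER.unpack_from(buffer, 0)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # payload incomplete: wait for more bytes from the socket
        frames.append((channel, bytes(buffer[HEADER.size:end])))
        del buffer[:end]
    return frames


# Two hypothetical logical requests sharing one connection.
wire = encode_frame(1, b"search?q=kent") + encode_frame(2, b"metadata/track/42")
buf = bytearray(wire[:10])    # first read returns only part of frame 1
received = decode_frames(buf)  # nothing complete yet
buf += wire[10:]              # the rest arrives
received += decode_frames(buf)

by_channel = defaultdict(list)
for channel, payload in received:
    by_channel[channel].append(payload)
print(dict(by_channel))
```

Keeping one long-lived connection per client avoids a TCP (and encryption) handshake for every request, which is one reason to multiplex rather than open a socket per call.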

Find something to play
      • Let’s search

Services
      • Probably close to 100 backend services, most
        small, handling a single task
      • UNIX philosophy
      • Many are autonomous
      • Deployed on commodity servers
      • Always redundant

Services
      • Mostly written in Python, a few in Java and C
      • Storage optimized for each service, mostly
        PostgreSQL, Cassandra and Tokyo Cabinet
      • Many services use in-memory caching, for
        example /dev/shm or memcached
      • Usually a small daemon, talking HTTP or Hermes
        • We have our own supervisor that keeps services
           running
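The supervisor itself is not described in the slides, so this is only a minimal sketch of the general idea: run a service as a child process and restart it when it exits with an error, waiting a little longer before each restart. The `supervise` function and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 3, base_delay: float = 0.1) -> int:
    """Run cmd, restarting it each time it exits non-zero.

    Returns how many times the process was started. A clean exit (code 0)
    ends supervision; so does exceeding max_restarts restarts.
    """
    starts = 0
    delay = base_delay
    while True:
        starts += 1
        code = subprocess.run(cmd).returncode
        if code == 0 or starts > max_restarts:
            return starts
        print(f"{cmd[0]} exited with {code}; restarting in {delay:.2f}s", file=sys.stderr)
        time.sleep(delay)
        delay = min(delay * 2, 30.0)  # exponential backoff, capped at 30 s


# A "service" that always crashes: started once, then restarted twice.
crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(supervise(crashing, max_restarts=2, base_delay=0.01))
```

The growing delay keeps a service that crashes on startup from spinning in a tight restart loop.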

Aside: Hermes
      •       ZeroMQ for transport, protobuf for envelope and payload
      •       HTTP-like verbs and caching
      •       Request-reply and publish/subscribe
      •       Very performant and introspectable

How does the accesspoint find search?
      • Everything has an SRV DNS record:
        • One record with same name for each service
          instance
        • Clients resolve to find servers providing that
          service
        • Among the records with the lowest priority value,
          one is chosen by weighted shuffle
        • Clients retry other instances in case of failures
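The selection rule above matches the one in RFC 2782. The Python sketch below assumes that rule (the slides do not show Spotify's resolver code). Records are grouped by priority, lowest value first. Each group is ordered by a weighted shuffle. The caller then fails over down the resulting list. `connection_order`, `call_with_failover`, `flaky_call` and the `backup.example.` record are hypothetical.

```python
import random
from collections import namedtuple
from itertools import groupby

Srv = namedtuple("Srv", "priority weight port target")


def weighted_shuffle(group, rng):
    """Order equal-priority records as RFC 2782 describes: heavier records tend to come first."""
    # RFC 2782 puts zero-weight records first so they still get a small chance.
    remaining = sorted(group, key=lambda r: r.weight != 0)
    ordered = []
    while remaining:
        total = sum(r.weight for r in remaining)
        pick = rng.randint(0, total)  # inclusive on both ends
        running = 0
        for i, r in enumerate(remaining):
            running += r.weight
            if running >= pick:
                ordered.append(remaining.pop(i))
                break
    return ordered


def connection_order(records, rng=random):
    """Lowest priority value first; weighted shuffle within each priority."""
    ordered = []
    by_prio = sorted(records, key=lambda r: r.priority)
    for _, group in groupby(by_prio, key=lambda r: r.priority):
        ordered.extend(weighted_shuffle(list(group), rng))
    return ordered


def call_with_failover(records, call, rng=random):
    """Try instances in SRV order, moving on to the next one on failure."""
    last_error = None
    for r in connection_order(records, rng):
        try:
            return call(r.target, r.port)
        except ConnectionError as e:
            last_error = e
    raise ConnectionError("all instances failed") from last_error


records = [
    Srv(10, 8, 4070, "C8.spotify.com."),
    Srv(10, 16, 4070, "C4.spotify.com."),
    Srv(20, 0, 4070, "backup.example."),  # hypothetical fallback at a worse priority
]


def flaky_call(host, port):
    # Hypothetical remote call that always fails on C4, to exercise the retry path.
    if host == "C4.spotify.com.":
        raise ConnectionError(host)
    return f"connected to {host}:{port}"


print(call_with_failover(records, flaky_call, random.Random(7)))
```

With these weights C4 comes first about two times in three, yet the call still succeeds on C8 whenever C4 fails, and the priority-20 record is only tried after both priority-10 records.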

Read-only services
      •       Stateless
      •       Writes are hard
      •       Simple to scale, just add more servers
      •       Services can be restarted as needed
      •       Indexes prefabricated, distributed to live servers

Read-write services
      • User generated content, e.g. playlists
      • Hard to ensure consistency of data across instances

      Solutions:
      • Eventual consistency:
         • Reads of just-written data are not guaranteed to be up to date
      • Locking, atomic operations
          • Creating globally unique keys, e.g. usernames
          • Transactions, e.g. billing
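For globally unique keys such as usernames, the important part is that "check whether the name is free" and "claim it" happen as one atomic step. A minimal sketch of that idea follows. An in-memory SQLite database stands in for the real user store as an assumption, and `register` is a hypothetical helper. The database's primary-key constraint rejects the second claim instead of the application doing a racy SELECT followed by an INSERT.

```python
import sqlite3

# In-memory SQLite stands in for the real user database (assumption for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT NOT NULL)")


def register(username: str, email: str) -> bool:
    """Claim a username atomically; the PRIMARY KEY constraint rejects duplicates."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO users (username, email) VALUES (?, ?)", (username, email)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(register("kent", "a@example.com"))  # True
print(register("kent", "b@example.com"))  # False: name already taken
```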

Sharding
      • Some services use Dynamo-inspired DHTs
        • Each request has a key
        • Each service node is responsible for a range of
          hash keys
        • Data is distributed among service nodes
        • Redundancy is ensured by writing to replica
          nodes
        • Data must be moved between nodes when the ring changes
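A minimal consistent-hashing sketch of the scheme in these bullets. It assumes MD5-based ring positions, one token per node (no virtual nodes) and hypothetical node names, so it is a simplification rather than Spotify's implementation. Each node owns the hash range ending at its token, and the next node clockwise holds the replica.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Position of key on a 2**32 ring (first 4 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class Ring:
    """Minimal consistent-hash ring without virtual nodes.

    Each node owns the hash range that ends at its own token; the next
    nodes clockwise hold the replicas.
    """

    def __init__(self, nodes, replicas: int = 2):
        self.replicas = replicas
        self.tokens = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [token for token, _ in self.tokens]

    def nodes_for(self, key: str) -> list[str]:
        """Owner of key's range first, then the replica nodes."""
        start = bisect.bisect_left(self.positions, ring_hash(key)) % len(self.tokens)
        count = min(self.replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


# Hypothetical node names.
ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.nodes_for("user:ngn"))  # owner and one replica

# Adding a node only takes over keys from its clockwise neighbour; the rest stay put.
bigger = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
keys = [f"user:{i}" for i in range(1000)]
moved = [k for k in keys if ring.nodes_for(k)[0] != bigger.nodes_for(k)[0]]
print(f"{len(moved)} of {len(keys)} keys change owner")
```

The last lines show why data "must be moved when the ring changes": only keys in the range the new node takes over change owner, and that data has to be copied to it.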

DHT example

search
      • Java service
      • Lucene storage
        • New index published daily
      • Doesn’t store any metadata itself; returns a list
        of identifiers

      • (Search suggestions are served from a separate
        service, optimized for speed)
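The real service uses Lucene. The toy inverted index below only illustrates the design point that search returns identifiers and leaves the metadata to other services. The catalogue strings and ids are made up, and `build_index` and `search` are hypothetical helpers.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: lower-cased term -> ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query: str) -> list[str]:
    """Return ids of documents matching every query term; no metadata is returned."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Hypothetical catalogue snippet with made-up ids.
catalogue = {
    "track:1": "Kent first track",
    "track:2": "Kent second track",
    "track:3": "Another artist third track",
}
index = build_index(catalogue)
print(search(index, "kent"))  # ['track:1', 'track:2']
```

A caller would then ask the metadata services to decorate `track:1` and `track:2` with titles and artists.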

Metadata services
      •       Multiple read-only services
      •       60 GB indices
      •       Responds to metadata requests
      •       Decorates metadata onto other service responses
              • We’re most likely moving away from this model

Another aside: How does stuff get into Spotify?
      • >15 million tracks; we can’t maintain all that
        ourselves
      • Ingest audio, images and metadata from labels
        • Receive, transform, transcode, merge
      • All ends up in a metadata database from which
        indices are generated and distributed to services

The Kent bug
      • Much of the metadata lacks identifiers, which
        leaves us relying on heuristics.

Play

Audio encodings and files
      • Spotify supports multiple audio encodings
        • Ogg Vorbis at 96 kbps (-q2), 160 kbps (-q5) and
            320 kbps (-q9)
        • MP3 at 320 kbps (downloads)
      • For each track, a file for each encoding/bitrate is
        listed in the returned metadata
      • The client picks an appropriate choice
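The slides do not say how the client picks a file, so the rule below is purely an assumption for illustration. Take the highest-bitrate file of the wanted codec that fits within a bitrate limit (for example one derived from a quality setting), and fall back to the lowest bitrate if none fits. `AudioFile`, `pick_file`, the codec labels and the file ids are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFile:
    file_id: str
    codec: str        # e.g. "ogg-vorbis" or "mp3"
    bitrate_kbps: int


def pick_file(files, max_kbps: int, codec: str = "ogg-vorbis"):
    """Highest-bitrate file of the wanted codec that does not exceed max_kbps.

    Falls back to the lowest-bitrate file of that codec if none fits,
    and returns None if the codec is not available at all.
    """
    candidates = [f for f in files if f.codec == codec]
    if not candidates:
        return None
    fitting = [f for f in candidates if f.bitrate_kbps <= max_kbps]
    if fitting:
        return max(fitting, key=lambda f: f.bitrate_kbps)
    return min(candidates, key=lambda f: f.bitrate_kbps)


# Hypothetical metadata for one track: one file per encoding/bitrate.
files = [
    AudioFile("f96", "ogg-vorbis", 96),
    AudioFile("f160", "ogg-vorbis", 160),
    AudioFile("f320", "ogg-vorbis", 320),
    AudioFile("m320", "mp3", 320),
]
print(pick_file(files, max_kbps=160).file_id)  # f160
print(pick_file(files, max_kbps=320).file_id)  # f320
```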

Get the audio data
      • The client must now fetch the actual audio data
      • Latency kills

Cache
      •       Player caches tracks it has played
      •       Caches are large (56% are over 5 GB)
      •       Least Recently Used policy for cache eviction
      •       50% of data comes from local cache
      •       Cached files are served to other peers in the P2P overlay
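A minimal sketch of a size-bounded LRU cache, the eviction policy named above. The player caches files on disk; this in-memory version keyed by hypothetical track ids only shows the policy itself, using `collections.OrderedDict` to keep entries in recency order. `LruCache` is a hypothetical class.

```python
from collections import OrderedDict


class LruCache:
    """Size-bounded cache that evicts the least recently used entries first."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.size = 0
        self.items = OrderedDict()  # key -> cached bytes, least recently used first

    def get(self, key: str):
        data = self.items.get(key)
        if data is not None:
            self.items.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if key in self.items:
            self.size -= len(self.items.pop(key))
        self.items[key] = data
        self.size += len(data)
        while self.size > self.capacity:
            _, evicted = self.items.popitem(last=False)  # evict the oldest entry
            self.size -= len(evicted)


cache = LruCache(capacity_bytes=10)
cache.put("track:a", b"aaaa")
cache.put("track:b", b"bbbb")
cache.get("track:a")           # touch a, so b is now least recently used
cache.put("track:c", b"cccc")  # 12 bytes > 10, so b is evicted
print(list(cache.items))       # ['track:a', 'track:c']
```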

Streaming
      • Request first piece from Spotify storage
      • Meanwhile, search peer-to-peer (P2P) for
        remainder
      • Switch back and forth between Spotify storage
        and peers as needed
      • Towards end of a track, start prefetching next one

P2P
      • All peers are equal (no supernodes)
      • A user only downloads data she needs
      • A tracker service keeps a list of peers for each track
      • P2P network becomes (weakly) clustered by
        interest
      • Oblivious to network architecture
      • Does not enforce fairness
      • Mobile clients do not participate in P2P



                        http://www.csc.kth.se/~gkreitz/spotify/kreitz-spotify_kth11.pdf
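To make "a tracker service keeps peers for each track" concrete, here is a toy tracker. The bounded, most-recent-first peer list and the random sampling are assumptions for this sketch, not details taken from the slides. `Tracker`, `announce`, `lookup` and the peer names are hypothetical.

```python
import random
from collections import defaultdict


class Tracker:
    """Toy tracker: remembers which peers recently announced each track."""

    def __init__(self, max_peers_per_track: int = 20):
        self.max_peers = max_peers_per_track
        self.peers = defaultdict(list)  # track id -> peers, oldest announcement first

    def announce(self, track_id: str, peer: str) -> None:
        """Record that peer has (part of) track_id cached."""
        peers = self.peers[track_id]
        if peer in peers:
            peers.remove(peer)
        peers.append(peer)
        if len(peers) > self.max_peers:
            peers.pop(0)  # forget the least recently announced peer

    def lookup(self, track_id: str, requester: str, count: int = 5, rng=random):
        """Return up to count peers for track_id, excluding the requester."""
        candidates = [p for p in self.peers.get(track_id, []) if p != requester]
        return rng.sample(candidates, min(count, len(candidates)))


tracker = Tracker(max_peers_per_track=3)
for peer in ["peer1", "peer2", "peer3", "peer4"]:
    tracker.announce("track:42", peer)
print(sorted(tracker.lookup("track:42", requester="peer4")))  # ['peer2', 'peer3']
```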
Success!

Yet another aside: Hadoop
      • We run analysis using Hadoop, which feeds back
        into the process described earlier, e.g. track
        popularity is used for weighting search results and
        toplists
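How popularity is combined with search relevance is not described in the slides. The sketch below picks one plausible, hypothetical formula: boost relevance by log-damped play counts, so a hugely popular track cannot completely drown out a better text match. The play counts stand in for the output of a Hadoop job and are made up. `rank`, `toplist` and `alpha` are hypothetical names.

```python
import math


def rank(results, play_counts, alpha=0.5):
    """Order (track_id, relevance) pairs by relevance boosted by popularity."""
    def score(item):
        track_id, relevance = item
        plays = play_counts.get(track_id, 0)
        return relevance * (1 + alpha * math.log1p(plays))  # log-damped boost
    return [track_id for track_id, _ in sorted(results, key=score, reverse=True)]


def toplist(play_counts, n=3):
    """Most played tracks first."""
    return sorted(play_counts, key=play_counts.get, reverse=True)[:n]


# Made-up numbers standing in for Hadoop job output.
plays = {"track:1": 1_000_000, "track:2": 50, "track:3": 20_000}
search_hits = [("track:2", 1.2), ("track:1", 1.0), ("track:3", 1.0)]
print(rank(search_hits, plays))  # ['track:1', 'track:3', 'track:2']
print(toplist(plays, n=2))       # ['track:1', 'track:3']
```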

Development at Spotify
      • Uses almost exclusively open source software
        • Git, Debian, Munin, Zabbix, Puppet, TeamCity...
      • Developers use whatever development tools they are
        comfortable with
      • Scrum or Kanban in three-week iterations
      • DevOps-heavy. Freaking awesome ops
      • Monitor and measure all the things!

Development at Spotify
      •        Development hubs in Stockholm, Göteborg and NYC
      •        All in all, >220 people in tech
      •        Very talented team
      •        Hackdays and system owner days in each iteration
      •        Hangs out on IRC
      •        Growing and hiring

Languages at Spotify

Questions?

Thank you

                           Want to work at Spotify?
                        http://www.spotify.com/jobs/


Tuesday, April 17, 12

More Related Content

What's hot

Continuous Delivery, Continuous Integration
Continuous Delivery, Continuous Integration Continuous Delivery, Continuous Integration
Continuous Delivery, Continuous Integration Amazon Web Services
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWSSungmin Kim
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...Amazon Web Services
 
DevSecOps and the CI/CD Pipeline
 DevSecOps and the CI/CD Pipeline DevSecOps and the CI/CD Pipeline
DevSecOps and the CI/CD PipelineJames Wickett
 
DNS Security Presentation ISSA
DNS Security Presentation ISSADNS Security Presentation ISSA
DNS Security Presentation ISSASrikrupa Srivatsan
 
Microservices at Spotify
Microservices at SpotifyMicroservices at Spotify
Microservices at SpotifyKevin Goldsmith
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patternshadooparchbook
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using DatadogMukta Aphale
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusMarco Pas
 
Introduction to Open Telemetry as Observability Library
Introduction to Open  Telemetry as Observability LibraryIntroduction to Open  Telemetry as Observability Library
Introduction to Open Telemetry as Observability LibraryTonny Adhi Sabastian
 
GitHub Actions in action
GitHub Actions in actionGitHub Actions in action
GitHub Actions in actionOleksii Holub
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseAldrin Piri
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...confluent
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With PrometheusKnoldus Inc.
 

What's hot (20)

Continuous Delivery, Continuous Integration
Continuous Delivery, Continuous Integration Continuous Delivery, Continuous Integration
Continuous Delivery, Continuous Integration
 
Spotify: P2P music streaming
Spotify: P2P music streamingSpotify: P2P music streaming
Spotify: P2P music streaming
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
 
DevSecOps and the CI/CD Pipeline
 DevSecOps and the CI/CD Pipeline DevSecOps and the CI/CD Pipeline
DevSecOps and the CI/CD Pipeline
 
DNS Security Presentation ISSA
DNS Security Presentation ISSADNS Security Presentation ISSA
DNS Security Presentation ISSA
 
Microservices at Spotify
Microservices at SpotifyMicroservices at Spotify
Microservices at Spotify
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patterns
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using Datadog
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Infrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using PrometheusInfrastructure & System Monitoring using Prometheus
Infrastructure & System Monitoring using Prometheus
 
Introduction to Open Telemetry as Observability Library
Introduction to Open  Telemetry as Observability LibraryIntroduction to Open  Telemetry as Observability Library
Introduction to Open Telemetry as Observability Library
 
GitHub Actions in action
GitHub Actions in actionGitHub Actions in action
GitHub Actions in action
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 

Similar to Spotify architecture - Pressing play

Spotify: Playing for millions, tuning for more
Spotify: Playing for millions, tuning for moreSpotify: Playing for millions, tuning for more
Spotify: Playing for millions, tuning for moreNick Barkas
 
Spotify: P2P music-on-demand streaming
Spotify: P2P music-on-demand streamingSpotify: P2P music-on-demand streaming
Spotify: P2P music-on-demand streamingRicardo Vice Santos
 
Is Disk Now a Viable Solution for Archive - Jon Toigo
Is Disk Now a Viable Solution for Archive - Jon ToigoIs Disk Now a Viable Solution for Archive - Jon Toigo
Is Disk Now a Viable Solution for Archive - Jon Toigospectralogic
 
The Background Noise of the Internet
The Background Noise of the InternetThe Background Noise of the Internet
The Background Noise of the InternetAndrew Morris
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyJosh Baer
 
DNS in IR: Collection, Analysis and Response
DNS in IR: Collection, Analysis and ResponseDNS in IR: Collection, Analysis and Response
DNS in IR: Collection, Analysis and Responsepm123008
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataWes McKinney
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling PinterestC4Media
 
ProjectTox: Free as in freedom Skype replacement
ProjectTox: Free as in freedom Skype replacementProjectTox: Free as in freedom Skype replacement
ProjectTox: Free as in freedom Skype replacementWei-Ning Huang
 
Puppet Keynote
Puppet KeynotePuppet Keynote
Puppet KeynotePuppet
 
Coursera amazon cloudsearch presentation
Coursera amazon cloudsearch presentation Coursera amazon cloudsearch presentation
Coursera amazon cloudsearch presentation Michael Bohlig
 
How to Write the Fastest JSON Parser/Writer in the World
How to Write the Fastest JSON Parser/Writer in the WorldHow to Write the Fastest JSON Parser/Writer in the World
How to Write the Fastest JSON Parser/Writer in the WorldMilo Yip
 
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision ProblemUsing ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision ProblemAPNIC
 
PlayNice.ly: Using Redis to store all our data, hahaha (Redis London Meetup)
PlayNice.ly: Using Redis to store all our data, hahaha (Redis London Meetup)PlayNice.ly: Using Redis to store all our data, hahaha (Redis London Meetup)
PlayNice.ly: Using Redis to store all our data, hahaha (Redis London Meetup)Adam Charnock
 
20130714 php matsuri - highly available php
20130714   php matsuri - highly available php20130714   php matsuri - highly available php
20130714 php matsuri - highly available phpGraham Weldon
 

Similar to Spotify architecture - Pressing play (20)

Spotify: Playing for millions, tuning for more
Spotify: Playing for millions, tuning for moreSpotify: Playing for millions, tuning for more
Spotify: Playing for millions, tuning for more
 
Spotify: P2P music-on-demand streaming
Spotify: P2P music-on-demand streamingSpotify: P2P music-on-demand streaming
Spotify: P2P music-on-demand streaming
 
Data at Spotify
Data at SpotifyData at Spotify
Data at Spotify
 
Is Disk Now a Viable Solution for Archive - Jon Toigo
Is Disk Now a Viable Solution for Archive - Jon ToigoIs Disk Now a Viable Solution for Archive - Jon Toigo
Is Disk Now a Viable Solution for Archive - Jon Toigo
 
The Background Noise of the Internet
The Background Noise of the InternetThe Background Noise of the Internet
The Background Noise of the Internet
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
 
DNS in IR: Collection, Analysis and Response
DNS in IR: Collection, Analysis and ResponseDNS in IR: Collection, Analysis and Response
DNS in IR: Collection, Analysis and Response
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling Pinterest
 
Distributed "Web Scale" Systems
Distributed "Web Scale" SystemsDistributed "Web Scale" Systems
Distributed "Web Scale" Systems
 
ProjectTox: Free as in freedom Skype replacement
ProjectTox: Free as in freedom Skype replacementProjectTox: Free as in freedom Skype replacement
ProjectTox: Free as in freedom Skype replacement
 
Puppet Keynote
Puppet KeynotePuppet Keynote
Puppet Keynote
 
Compression talk
Compression talkCompression talk
Compression talk
 
ION Krakow - A Global IPv6 Deployment Update
ION Krakow - A Global IPv6 Deployment UpdateION Krakow - A Global IPv6 Deployment Update
ION Krakow - A Global IPv6 Deployment Update
 
Coursera amazon cloudsearch presentation
Coursera amazon cloudsearch presentation Coursera amazon cloudsearch presentation
Coursera amazon cloudsearch presentation
 
How to Write the Fastest JSON Parser/Writer in the World
How to Write the Fastest JSON Parser/Writer in the WorldHow to Write the Fastest JSON Parser/Writer in the World
How to Write the Fastest JSON Parser/Writer in the World
 
Spotify: Data center & Backend buildout
Spotify: Data center & Backend buildoutSpotify: Data center & Backend buildout
Spotify: Data center & Backend buildout
 
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision ProblemUsing ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
 
PlayNice.ly: Using Redis to store all our data, hahaha (Redis London Meetup)
PlayNice.ly: Using Redis to store all our data, hahaha (Redis London Meetup)PlayNice.ly: Using Redis to store all our data, hahaha (Redis London Meetup)
PlayNice.ly: Using Redis to store all our data, hahaha (Redis London Meetup)
 
20130714 php matsuri - highly available php
20130714   php matsuri - highly available php20130714   php matsuri - highly available php
20130714 php matsuri - highly available php
 

More from Niklas Gustavsson (11)

Spotify services - Leetspeak 2014
Spotify services - Leetspeak 2014Spotify services - Leetspeak 2014
Spotify services - Leetspeak 2014
 
Spotify services (SDC 2013)
Spotify services (SDC 2013)Spotify services (SDC 2013)
Spotify services (SDC 2013)
 
Real-time web
Real-time webReal-time web
Real-time web
 
RESTful web services
RESTful web servicesRESTful web services
RESTful web services
 
Not only SQL
Not only SQL Not only SQL
Not only SQL
 
HTML5
HTML5HTML5
HTML5
 
The future is bright
The future is brightThe future is bright
The future is bright
 
CouchDB
CouchDBCouchDB
CouchDB
 
Oredev 2009 JAX-RS
Oredev 2009 JAX-RSOredev 2009 JAX-RS
Oredev 2009 JAX-RS
 
Apachecon Eu 2008 Mina
Apachecon Eu 2008 MinaApachecon Eu 2008 Mina
Apachecon Eu 2008 Mina
 
REST made simple with Java
REST made simple with JavaREST made simple with Java
REST made simple with Java
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Spotify architecture - Pressing play

  • 1. Pressing play Niklas Gustavsson ngn@spotify.com @protocol7 Tuesday, April 17, 12
  • 2. Who am I?
      • ngn@spotify.com
      • @protocol7
      • Spotify backend dev based in Göteborg
      • Mainly from a JVM background, working on various stuff over the years
      • Apache Software Foundation member
  • 3. What’s Spotify all about?
      • A big catalogue, tons of music
      • Available everywhere
      • Great user experience
      • More convenient than piracy
      • Fast, reliable, always available
      • Scalable for many, many users
      • Ad-supported or paid-for service
  • 5. Where’s Spotify?
      • Let’s start the client, but where should it connect to?
  • 6. Aside: SRV records
      • Example SRV records:
          name                                  TTL  class type prio weight port host
          _spotify-mac-client._tcp.spotify.com. 242  IN    SRV  10   8      4070 C8.spotify.com.
          _spotify-mac-client._tcp.spotify.com. 242  IN    SRV  10   16     4070 C4.spotify.com.
      • GeoDNS is used
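To make the field layout of those records concrete, here is a minimal Python sketch that splits zone-file style SRV lines into named fields. `SrvRecord` and `parse_srv` are hypothetical names for illustration; a real client would receive these records from a DNS resolver rather than parse text.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse 'name TTL class type prio weight port host' (hypothetical helper)."""
    name, ttl, rclass, rtype, prio, weight, port, target = line.split()
    if rclass != "IN" or rtype != "SRV":
        raise ValueError(f"not an IN SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(prio), int(weight), int(port), target)


ZONE = """\
_spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 8 4070 C8.spotify.com.
_spotify-mac-client._tcp.spotify.com. 242 IN SRV 10 16 4070 C4.spotify.com.
"""

records = [parse_srv(line) for line in ZONE.splitlines() if line.strip()]
for r in records:
    print(f"{r.target}:{r.port} priority={r.priority} weight={r.weight}")
```

Both records share priority 10, so under the weighting rules of RFC 2782 C4 (weight 16) should be tried first in roughly two thirds of lookups and C8 (weight 8) in the rest.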
  • 7. What does that record really point to?
      • The accesspoint (AP)
      • Handles authentication state, logging, routing, rate limiting and much more
      • Protocol between client and AP uses a single, encrypted, multiplexed socket over TCP
      • Written in C++
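The slide does not describe the AP wire format. As a rough sketch of what "multiplexed" means here, the Python below shows how several logical channels can share one TCP byte stream by prefixing each frame with a channel id and length. The header layout (2-byte channel id, 4-byte length) and all names are assumptions for illustration, and encryption is omitted entirely.

```python
import struct
from collections import defaultdict

# Hypothetical frame header: 2-byte channel id + 4-byte payload length, big-endian.
HEADER = struct.Struct(">HI")


def encode_frame(channel: int, payload: bytes) -> bytes:
    return HEADER.pack(channel, len(payload)) + payload


class Demultiplexer:
    """Splits one byte stream back into per-channel data, however TCP chunks it."""

    def __init__(self):
        self._buffer = bytearray()
        self.channels = defaultdict(bytearray)

    def feed(self, data: bytes) -> None:
        self._buffer.extend(data)
        while len(self._buffer) >= HEADER.size:
            channel, length = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # frame incomplete; wait for more bytes
            self.channels[channel].extend(self._buffer[HEADER.size:end])
            del self._buffer[:end]


# Two logical channels interleaved on one connection.
stream = (encode_frame(1, b"search?q=kent")
          + encode_frame(2, b"<audio bytes>")
          + encode_frame(1, b"&limit=10"))
demux = Demultiplexer()
for i in range(0, len(stream), 5):  # deliver in arbitrary 5-byte chunks
    demux.feed(stream[i:i + 5])
print(bytes(demux.channels[1]))  # b'search?q=kent&limit=10'
```

Because framing is independent of how TCP splits the bytes, one slow channel (such as audio) never has to be torn down to send a request on another.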
  • 9. Find something to play
      • Let’s search
  • 10. Services
      • Probably close to 100 backend services, most of them small and handling a single task
      • UNIX philosophy
      • Many are autonomous
      • Deployed on commodity servers
      • Always redundant
  • 11. Services
      • Mostly written in Python, a few in Java and C
      • Storage optimized for each service, mostly PostgreSQL, Cassandra and Tokyo Cabinet
      • Many services use in-memory caching, for example via /dev/shm or memcached
      • Usually a small daemon, talking HTTP or Hermes
      • We have our own supervisor, which keeps services running
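Spotify's own supervisor is not described beyond keeping services running. A minimal Python sketch of the idea, assuming a simple restart-on-exit policy with exponential backoff, could look like this; `supervise` is a hypothetical helper, and the restart cap exists only to keep the example finite.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_s=0.5):
    """Run cmd and restart it each time it exits, up to max_restarts restarts.

    A production supervisor would loop forever and handle signals; the cap
    here just keeps the sketch finite. Returns the observed exit codes.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        exit_codes.append(proc.wait())
        if attempt < max_restarts:
            # Exponential backoff so a crash-looping service doesn't spin the CPU.
            time.sleep(backoff_s * 2 ** attempt)
    return exit_codes


if __name__ == "__main__":
    # A stand-in "service" that crashes immediately with exit code 3.
    crashing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(crashing, max_restarts=2, backoff_s=0.05))
```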
  • 12. Aside: Hermes
      • ZeroMQ for transport, protobuf for envelope and payload
      • HTTP-like verbs and caching
      • Request-reply and publish/subscribe
      • Very performant and introspectable
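Hermes' actual protobuf schema and frame layout are not given on the slide. The sketch below only illustrates the general shape: an envelope with an HTTP-like verb, split into multipart frames with the URI first so that publish/subscribe consumers can filter on a URI prefix, the way ZeroMQ SUB sockets filter on a prefix of the first frame. JSON stands in for protobuf and plain lists of bytes stand in for ZeroMQ messages so it runs with the standard library only; the field names and the `hm://` URIs are assumptions.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Envelope:
    uri: str                                     # what the message addresses (hypothetical scheme)
    verb: str = "GET"                            # HTTP-like verb
    headers: dict = field(default_factory=dict)  # e.g. caching hints
    payload: bytes = b""

    def to_frames(self) -> list:
        """URI first so subscribers can prefix-filter on it; JSON stands in for protobuf."""
        meta = json.dumps({"verb": self.verb, "headers": self.headers}).encode()
        return [self.uri.encode(), meta, self.payload]

    @classmethod
    def from_frames(cls, frames: list) -> "Envelope":
        uri, meta, payload = frames
        decoded = json.loads(meta)
        return cls(uri.decode(), decoded["verb"], decoded["headers"], payload)


def subscription_matches(prefix: bytes, frames: list) -> bool:
    """Mimic ZeroMQ SUB filtering: match a byte prefix of the first frame."""
    return frames[0].startswith(prefix)


msg = Envelope("hm://playlist/user/alice", "PUT", {"cache-control": "no-cache"}, b"\x08\x01")
frames = msg.to_frames()
print(Envelope.from_frames(frames) == msg)              # True
print(subscription_matches(b"hm://playlist/", frames))  # True
print(subscription_matches(b"hm://search/", frames))    # False
```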
  • 13. How does the accesspoint find search?
      • Everything has an SRV DNS record:
          • One record with the same name for each service instance
          • Clients resolve the name to find servers providing that service
          • The record with the lowest priority value is chosen, using a weighted shuffle among equal priorities
          • Clients retry other instances in case of failures
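The lookup-and-retry steps on this slide can be sketched in Python as follows, following the target-selection algorithm of RFC 2782: ascending priority, then a weighted random shuffle within each priority, then failover down the ordered list. This is not Spotify's client code; `order_srv`, `call_with_failover` and the `backup.example.com.` host are hypothetical.

```python
import random
from typing import NamedTuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=random):
    """Order records per RFC 2782: ascending priority, weighted shuffle within a priority."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # RFC 2782 puts zero-weight records first so they keep a small chance of selection.
        group = sorted((r for r in records if r.priority == prio), key=lambda r: r.weight > 0)
        while group:
            pick = rng.randint(0, sum(r.weight for r in group))  # inclusive bounds
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


def call_with_failover(records, request, rng=random):
    """Try instances in SRV order, moving on to the next one on connection errors."""
    errors = []
    for r in order_srv(records, rng):
        try:
            return request(r.target, r.port)
        except ConnectionError as exc:
            errors.append(f"{r.target}:{r.port}: {exc}")
    raise ConnectionError("all instances failed: " + "; ".join(errors))


records = [
    Srv(10, 8, 4070, "C8.spotify.com."),
    Srv(10, 16, 4070, "C4.spotify.com."),
    Srv(20, 1, 4070, "backup.example.com."),  # hypothetical lower-preference instance
]


def flaky_request(host, port):
    if host == "C8.spotify.com.":
        raise ConnectionError("refused")
    return host


print(call_with_failover(records, flaky_request))  # C4.spotify.com.
```

Whichever order the shuffle produces, the failing C8 instance is skipped and C4 answers, while the priority-20 record is only reached if both priority-10 instances fail.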
  • 14. Read-only services
      • Stateless
      • Writes are hard
      • Simple to scale, just add more servers
      • Services can be restarted as needed
      • Indexes are prefabricated and distributed to live servers
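The slide does not say how prefabricated indexes are swapped in on live servers. One common approach, sketched here in Python under that assumption, is to write the new index next to the live one and atomically rename it into place, so readers never see a half-written file. The JSON format, file name and helper names are hypothetical.

```python
import json
import os
import tempfile


def publish_index(index: dict, path: str) -> None:
    """Write a prebuilt index beside the live file, then atomically swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(index, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the swap
        # os.replace is atomic on POSIX when both paths are on the same filesystem:
        # readers opening `path` see either the old index or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_index(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


live_dir = tempfile.mkdtemp()
live_path = os.path.join(live_dir, "search-index.json")
publish_index({"version": 1, "terms": {"kent": ["t1"]}}, live_path)
publish_index({"version": 2, "terms": {"kent": ["t1", "t2"]}}, live_path)
print(load_index(live_path)["version"])  # 2
```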
  • 15. Read-write services
      • User-generated content, e.g. playlists
      • Hard to ensure consistency of data across instances
      • Solutions:
          • Eventual consistency: reads of just-written data are not guaranteed to be up to date
          • Locking and atomic operations, for example when:
              • Creating globally unique keys, e.g. usernames
              • Running transactions, e.g. billing
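For the globally-unique-key case, a minimal sketch is to let the database's uniqueness constraint act as the atomic operation, so two concurrent sign-ups cannot both claim the same name. The slides name PostgreSQL as common storage; Python's built-in `sqlite3` stands in here so the example is self-contained, and the table and function names are hypothetical.

```python
import sqlite3


def claim_username(conn: sqlite3.Connection, username: str) -> bool:
    """Atomically claim a globally unique username; False if it's already taken."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint, not application code, decides who wins a race.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT NOT NULL UNIQUE)")
print(claim_username(conn, "alice"))  # True
print(claim_username(conn, "alice"))  # False
```

Checking for the name first and inserting afterwards would leave a window where two clients both see it as free; pushing the check into the constraint closes that window.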
  • 16. Sharding
      • Some services use Dynamo-inspired DHTs
      • Each request has a key
      • Each service node is responsible for a range of hash keys
      • Data is distributed among service nodes
      • Redundancy is ensured by writing to a replica node
      • Data must be transitioned when the ring changes
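A minimal Python sketch of a Dynamo-style ring: each node owns the hash range ending at its position, a key goes to the first node clockwise from its hash, and the following node holds the replica. This is a generic illustration with one position per node and no virtual nodes; the node names and `HashRing` class are hypothetical, not Spotify's implementation.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a key or node name to a position on the ring (first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Dynamo-style ring: a key belongs to the first node clockwise from its hash."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def nodes_for(self, key: str) -> list:
        """Owner of key first, then the next nodes clockwise as replicas."""
        start = bisect.bisect(self._positions, ring_position(key))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + k) % len(self._ring)][1] for k in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=2)
print(ring.nodes_for("playlist:alice:starred"))
```

When the ring changes, only keys whose hash falls between the new node and its predecessor move, which is what keeps the data transition mentioned on the slide bounded.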
  • 18. search
      • Java service
      • Lucene storage
      • New index published daily
      • Doesn’t store any metadata itself, returns a list of identifiers
      • (Search suggestions are served from a separate service, optimized for speed)
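To illustrate "returns a list of identifiers", here is a toy inverted index in Python that maps terms to track ids and stores nothing else. It is not Lucene, which adds text analysis, relevance scoring and on-disk segments; the track ids and titles are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> set of track ids. Stores no metadata."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, track_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(track_id)

    def search(self, query: str) -> list:
        """Return ids of tracks containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


index = TinyIndex()
index.add("track:1", "Kent - Song A")
index.add("track:2", "Kent - Song B")
index.add("track:3", "Another Artist - Song A")
print(index.search("kent"))    # ['track:1', 'track:2']
print(index.search("song a"))  # ['track:1', 'track:3']
```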
  • 19. Metadata services
      • Multiple read-only services
      • 60 GB indices
      • Respond to metadata requests
      • Decorate metadata onto other service responses
      • We’re most likely moving away from this model
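"Decorating" a response can be sketched as attaching metadata to a list of ids returned by another service, such as search, while keeping that service's ordering. The metadata contents and the `decorate` helper are hypothetical; a real metadata service would look ids up in its index.

```python
def decorate(track_ids: list, metadata: dict) -> list:
    """Attach metadata to ids returned by another service, keeping their order.

    Ids with no metadata entry are dropped rather than returned half-empty.
    """
    return [{"id": tid, **metadata[tid]} for tid in track_ids if tid in metadata]


# Hypothetical metadata store contents.
METADATA = {
    "track:1": {"title": "Song A", "artist": "Kent"},
    "track:2": {"title": "Song B", "artist": "Kent"},
}
search_response = ["track:2", "track:1", "track:404"]
print(decorate(search_response, METADATA))
```

Dropping unknown ids is one design choice; a service could instead return them with a "not found" marker so the caller can tell missing metadata from an empty result.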
  • 21. Another aside: How does stuff get into Spotify?
      • >15 million tracks, we can’t maintain all that ourselves
      • Ingest audio, images and metadata from labels
      • Receive, transform, transcode, merge
      • All ends up in a metadata database from which indices are generated and distributed to services
  • 23. The Kent bug
      • Much of the metadata lacks identifiers, which leaves us relying on heuristics.
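The slide gives no further detail about the bug itself. As a generic illustration of why identifier-free matching is fragile, consider a hypothetical heuristic that merges deliveries by a normalized artist name: it correctly joins spelling variants, but it also silently merges different artists who share a name. All names below are hypothetical.

```python
import unicodedata


def match_key(name: str) -> str:
    """Hypothetical matching key: accents stripped, case-folded, whitespace collapsed."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def group_by_artist(deliveries: list) -> dict:
    """Merge deliveries that carry no artist identifier by their normalized artist name."""
    groups = {}
    for delivery in deliveries:
        groups.setdefault(match_key(delivery["artist"]), []).append(delivery["album"])
    return groups


# Two hypothetical deliveries: different artists who happen to share a name.
deliveries = [
    {"artist": "Kent", "album": "Album by the Swedish band"},
    {"artist": "KENT ", "album": "Album by an unrelated artist"},
]
print(group_by_artist(deliveries))  # both albums end up under one "kent" key
```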
  • 25. Audio encodings and files
      • Spotify supports multiple audio encodings:
          • Ogg Vorbis at 96 kbps (-q2), 160 kbps (-q5) and 320 kbps (-q9)
          • MP3 at 320 kbps (downloads)
      • For each track, a file for each encoding/bitrate is listed in the returned metadata
      • The client picks an appropriate one
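The slide leaves open how the client chooses among the listed files. A plausible policy, sketched in Python purely as an assumption, is to take the highest bitrate of the preferred codec that fits under a cap (for example, set by bandwidth or subscription tier) and fall back to the lowest bitrate otherwise. The file ids and `pick_file` are hypothetical; the bitrates mirror the encodings above.

```python
from typing import NamedTuple, Optional


class AudioFile(NamedTuple):
    file_id: str
    codec: str
    bitrate_kbps: int


def pick_file(files, max_kbps: int, codec: str = "vorbis") -> Optional[AudioFile]:
    """Highest bitrate of `codec` within max_kbps; else the lowest available; None if codec absent."""
    candidates = [f for f in files if f.codec == codec]
    if not candidates:
        return None
    within_budget = [f for f in candidates if f.bitrate_kbps <= max_kbps]
    if within_budget:
        return max(within_budget, key=lambda f: f.bitrate_kbps)
    return min(candidates, key=lambda f: f.bitrate_kbps)


# Hypothetical file list for one track, mirroring the encodings above.
files = [
    AudioFile("file-ogg-96", "vorbis", 96),
    AudioFile("file-ogg-160", "vorbis", 160),
    AudioFile("file-ogg-320", "vorbis", 320),
    AudioFile("file-mp3-320", "mp3", 320),
]
print(pick_file(files, max_kbps=200).file_id)  # file-ogg-160
```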
  • 26. Get the audio data
      • The client now must fetch the actual audio data
      • Latency kills
  • 27. Cache
      • The player caches tracks it has played
      • Caches are large (56% are over 5 GB)
      • Least Recently Used (LRU) policy for cache eviction
      • 50% of data comes from the local cache
      • Cached files are served in the P2P overlay
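Since caches are bounded by disk size rather than item count, an LRU policy is most naturally expressed in bytes. A minimal in-memory Python sketch using `collections.OrderedDict` follows; the real player cache lives on disk, and `LruByteCache` is a hypothetical name.

```python
from collections import OrderedDict


class LruByteCache:
    """Track cache bounded by total bytes; evicts least recently used tracks first."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.size = 0
        self._items = OrderedDict()  # track_id -> bytes, least recently used first

    def get(self, track_id):
        data = self._items.get(track_id)
        if data is not None:
            self._items.move_to_end(track_id)  # mark as most recently used
        return data

    def put(self, track_id, data: bytes) -> None:
        if track_id in self._items:
            self.size -= len(self._items.pop(track_id))
        if len(data) > self.capacity:
            return  # never cache something larger than the whole cache
        self._items[track_id] = data
        self.size += len(data)
        while self.size > self.capacity:
            _, evicted = self._items.popitem(last=False)
            self.size -= len(evicted)


cache = LruByteCache(capacity_bytes=10)
cache.put("a", b"aaaa")
cache.put("b", b"bbbb")
cache.get("a")           # "a" is now more recently used than "b"
cache.put("c", b"cccc")  # over capacity: evicts "b"
print(cache.get("b"))    # None
```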
  • 28. Streaming
      • Request the first piece from Spotify storage
      • Meanwhile, search peer-to-peer (P2P) for the remainder
      • Switch back and forth between Spotify storage and peers as needed
      • Towards the end of a track, start prefetching the next one
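The switching logic on this slide can be sketched as two small decisions: where to fetch the next piece from, and when to start prefetching the next track. The 3-second low-water mark and 30-second prefetch lead below are hypothetical numbers chosen for illustration, not Spotify's actual thresholds.

```python
def choose_source(buffered_s: float, peers_have_data: bool, low_water_s: float = 3.0) -> str:
    """Pick where to fetch the next piece of the current track from.

    Fall back to Spotify storage when the playback buffer is low (including at
    track start, when nothing is buffered) or no peer has the data; otherwise
    use peers to offload the servers.
    """
    if buffered_s < low_water_s or not peers_have_data:
        return "storage"
    return "p2p"


def should_prefetch_next(position_s: float, duration_s: float, lead_s: float = 30.0) -> bool:
    """Start fetching the next track once the current one is within lead_s of its end."""
    return duration_s - position_s <= lead_s


print(choose_source(0.0, peers_have_data=True))   # storage: just pressed play
print(choose_source(12.0, peers_have_data=True))  # p2p
print(should_prefetch_next(200.0, 215.0))         # True
```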
  • 29. P2P
      • All peers are equal (no supernodes)
      • A user only downloads data she needs
      • A tracker service keeps track of the peers for each track
      • The P2P network becomes (weakly) clustered by interest
      • Oblivious to network architecture
      • Does not enforce fairness
      • Mobile clients do not participate in P2P
      • http://www.csc.kth.se/~gkreitz/spotify/kreitz-spotify_kth11.pdf
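A minimal Python sketch of a tracker that remembers, per track, the peers that most recently announced having it and hands a random subset to a requesting client. The bounded list, the eviction rule and the `Tracker` API are assumptions for illustration; the slide only says that a tracker keeps peers for each track.

```python
import random
from collections import defaultdict


class Tracker:
    """Remembers which peers recently announced each track (hypothetical sketch)."""

    def __init__(self, max_peers_per_track: int = 100):
        self.max_peers = max_peers_per_track
        self._peers = defaultdict(list)  # track_id -> peer addresses, oldest first

    def announce(self, track_id: str, peer: str) -> None:
        peers = self._peers[track_id]
        if peer in peers:
            peers.remove(peer)  # re-announcing refreshes the peer's position
        peers.append(peer)
        if len(peers) > self.max_peers:
            del peers[0]  # forget the peer that announced longest ago

    def lookup(self, track_id: str, requester: str, n: int = 10, rng=random) -> list:
        """Return up to n random peers holding track_id, excluding the requester."""
        candidates = [p for p in self._peers.get(track_id, []) if p != requester]
        return rng.sample(candidates, min(n, len(candidates)))


tracker = Tracker(max_peers_per_track=3)
for peer in ["peer-1", "peer-2", "peer-3", "peer-4"]:
    tracker.announce("track:1", peer)
print(sorted(tracker.lookup("track:1", requester="peer-2")))  # ['peer-3', 'peer-4']
```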
  • 33. Yet another aside: Hadoop
      • We run analysis using Hadoop, which feeds back into the previously described process, e.g. track popularity is used for weighting search results and toplists
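The popularity feedback loop can be sketched as a map step that emits a count per play, a reduce step that sums the counts per track, and a ranking step that orders search hits by the result. This runs in a single process rather than on Hadoop, and the tab-separated log format is a hypothetical assumption.

```python
from collections import Counter


def map_plays(log_lines):
    """Map: emit (track_id, 1) per play. Assumes hypothetical 'user_id<TAB>track_id' lines."""
    for line in log_lines:
        _user_id, track_id = line.rstrip("\n").split("\t")
        yield track_id, 1


def reduce_counts(pairs) -> Counter:
    """Reduce: sum the emitted counts per track."""
    counts = Counter()
    for track_id, n in pairs:
        counts[track_id] += n
    return counts


def rank_by_popularity(track_ids: list, popularity: Counter) -> list:
    """Order search hits so more-played tracks come first (ties keep their original order)."""
    return sorted(track_ids, key=lambda t: popularity[t], reverse=True)


play_log = ["u1\ttrack:1", "u2\ttrack:2", "u3\ttrack:2"]
popularity = reduce_counts(map_plays(play_log))
print(rank_by_popularity(["track:1", "track:2", "track:3"], popularity))
# ['track:2', 'track:1', 'track:3']
```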
  • 35. Development at Spotify
      • Uses almost exclusively open source software
      • Git, Debian, Munin, Zabbix, Puppet, TeamCity...
      • Developers use whatever development tools they are comfortable with
      • Scrum or Kanban in three-week iterations
      • DevOps heavy. Freaking awesome ops
      • Monitor and measure all the things!
  • 36. Development at Spotify
      • Development hubs in Stockholm, Göteborg and NYC
      • All in all, >220 people in tech
      • Very talented team
      • Hackdays and system owner days in each iteration
      • Hangs out on IRC
      • Growing and hiring
  • 39. Thank you
      • Want to work at Spotify? http://www.spotify.com/jobs/