
Spotify: Data center & Backend buildout


In this talk Emil Fredriksson and David Poblador i Garcia explain how Spotify builds its infrastructure in order to deliver millions of songs to millions of users.

We explain how we support our development teams in building features by developing a highly scalable infrastructure.

Published in: Technology

Spotify: Data center & Backend buildout

  1. Data center & Backend buildout
     July 10, 2013
     Emil Fredriksson
     David Poblador i Garcia (@davidpoblador)
  2. • Some numbers about Spotify
     • Data centers, infrastructure and capacity
     • How Spotify works
     • What are we working on now?
  3. Some numbers
     • 1000M+ playlists
     • Over 24M active users
     • Over 20M songs (adding 20K every day)
     • Over 6M paying subscribers
     • Available in 28 markets
  4. Operations in numbers
     • 90+ backend systems
     • 23 SRE engineers
     • 2 locations: NYC and Stockholm
     • Around 15 teams building the Spotify Platform in Operations and Infrastructure
  5. Data centers, infrastructure and capacity
  6. Data centers: our factories
     • Input electricity, servers and software. Get the Spotify services as output
     • We have to scale it up as we grow our business
     • Where the software meets the real world and customers
     • If it does not work, the music stops playing
  7. The capacity challenge
     • Supporting our service for a growing number of users
     • New, more complex features require server capacity
     • Keeping up with very fast software development
  8. Delivering capacity
     • We operate four data centers with more than 5,000 servers and 140 Gbps of Internet capacity
     • In 2008 there were 20 servers
     • Renting space in large data center facilities
     • Owning and operating hardware and network
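The two fleet sizes on slide 8 imply a yearly growth rate, which can be worked out as a rough sketch. It assumes exactly 5,000 servers as the end figure (the slide says "more than") and five years between 2008 and the 2013 talk; the function name is made up for illustration.

```python
def annual_growth_factor(start: float, end: float, years: int) -> float:
    """Constant yearly multiplier that turns `start` into `end` after `years` years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1.0 / years)


# Assumption: 20 servers in 2008, 5,000 in 2013 (a lower bound per the slide).
factor = annual_growth_factor(start=20, end=5000, years=2013 - 2008)
print(f"fleet grew by a factor of about {factor:.2f} per year")
```

Because 5,000 is a lower bound, the real multiplier would be at least this large.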
  9. What we need in a data center
     • Reliable power supply
     • Air conditioning
     • Secure space
     • Network POPs
     • Remote hands
     • Shipping and handling
  10. Pods – standard data center units
      • Deploying a new data center takes a long time!
      • We need to be agile and fast to keep up with product development
      • We solve this by standardizing our data centers and networking into pods and pre-provisioning servers
      • Target is to keep 30% spare capacity at all times
  11. Pods – standard data center units
      • 44 racks in one pod, about 1500 servers
      • Racks redundantly connected with 10GE uplinks to core switches
      • Pod is directly connected to the Internet via multiple 10GE transit links
      • Build it the same way every time
      • Include the base infrastructure services
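The pod figures on slides 10 and 11 lend themselves to a small capacity calculation. The sketch below assumes "30% spare capacity" means that 30% of installed servers sit unused, which is only one possible reading of the slide; the function names are hypothetical.

```python
RACKS_PER_POD = 44       # from slide 11
SERVERS_PER_POD = 1500   # "about 1500" on slide 11
SPARE_PERCENT = 30       # slide 10 target; interpreted as share of installed servers


def servers_per_rack() -> float:
    """Average rack density implied by the pod figures."""
    return SERVERS_PER_POD / RACKS_PER_POD


def installed_needed(in_use: int, spare_percent: int = SPARE_PERCENT) -> int:
    """Smallest installed fleet that leaves `spare_percent` of it unused."""
    # Integer ceiling division avoids floating point rounding surprises.
    return -(-in_use * 100 // (100 - spare_percent))


def pods_needed(in_use: int) -> int:
    """Whole pods required to host `in_use` busy servers plus the spare margin."""
    return -(-installed_needed(in_use) // SERVERS_PER_POD)


print(f"{servers_per_rack():.1f} servers per rack on average")
print(f"{pods_needed(3500)} pods for 3500 busy servers")
```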
  13. Data center locations
      • You cannot go faster than light
      • Distance == Latency
      • Current locations: Stockholm, London, Ashburn (US east coast), San Jose (US west coast)
      • Static content on CDN. Dynamic content comes from our data centers
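"Distance == Latency" can be made concrete with a lower bound on round-trip time between the four sites. This is a back-of-the-envelope sketch: the coordinates are approximate city centres (not the actual facilities), the path is assumed to follow the great circle, and light in optical fibre is taken to travel at roughly 200 km per millisecond. Real routes are longer, so measured latency will be higher.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBRE_KM_PER_MS = 200.0  # roughly two thirds of the speed of light in vacuum

# Approximate (latitude, longitude) per city; assumptions for illustration only.
SITES = {
    "Stockholm": (59.33, 18.07),
    "London": (51.51, -0.13),
    "Ashburn": (39.04, -77.49),
    "San Jose": (37.34, -121.89),
}


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_rtt_ms(site_a: str, site_b: str) -> float:
    """Physical lower bound on round-trip time over fibre between two sites."""
    return 2 * great_circle_km(SITES[site_a], SITES[site_b]) / FIBRE_KM_PER_MS


for name in ("London", "Ashburn", "San Jose"):
    print(f"Stockholm <-> {name}: at least {min_rtt_ms('Stockholm', name):.0f} ms")
```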
  14. So what about the public clouds?
      • Commoditization of the data center is happening now; few companies will need to build data centers in the future
      • We already use both AWS S3 and EC2; usage will increase
      • Challenges that still remain:
        • Inter-node network performance
        • Cost (at large scale)
        • Flexible hardware configurations
  16. Automated installation
      • Information about servers goes into a database: MAC address, hardware configuration, location, networks, hostnames and state (available, in-use)
      • Automatic generation of DNS, DHCP and PXE records
      • Cobbler used as an installation server
      • Single command installs multiple servers in multiple data centers
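The idea of generating DNS, DHCP and PXE records from one inventory entry can be sketched as follows. The `Server` fields and helper names are hypothetical, not Spotify's schema or Cobbler's API; the output formats are a plain zone-file A record, an ISC dhcpd `host` stanza, and the per-MAC file name that PXELINUX looks up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """One inventory row; field names are assumptions for this sketch."""
    hostname: str
    mac: str
    ip: str
    location: str
    state: str  # "available" or "in-use"


def dns_a_record(server: Server, ttl: int = 3600) -> str:
    """Zone-file line mapping the hostname to its address."""
    return f"{server.hostname}. {ttl} IN A {server.ip}"


def dhcp_host_stanza(server: Server) -> str:
    """ISC dhcpd host declaration pinning the MAC to a fixed address."""
    short_name = server.hostname.split(".")[0]
    return (f"host {short_name} {{ hardware ethernet {server.mac.lower()}; "
            f"fixed-address {server.ip}; }}")


def pxelinux_config_name(server: Server) -> str:
    """File name PXELINUX requests for an Ethernet MAC: '01-' plus dashed MAC."""
    return "01-" + server.mac.lower().replace(":", "-")


example = Server("host1.example.net", "AA:BB:CC:DD:EE:FF", "10.0.0.11",
                 "sto", "available")
for render in (dns_a_record, dhcp_host_stanza, pxelinux_config_name):
    print(render(example))
```

Keeping all three renderers as pure functions of the inventory row is what makes a "single command" install plausible: the database stays the only place that is edited by hand.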
  17. How Spotify works
  18. [Architecture diagram; labels: access point; storage; search; playlist; user; web api; browse; ...; Backend services; Clients; www.spotify.com; ads; social; key; Facebook; Amazon S3; CDN; Content ingestion, indexing, and transcoding; Log analysis (hadoop); Record labels]
  19. DNS à la Spotify
      • Distribution of clients
      • Error reporting by clients
      • Service discovery
      • DHT ring configuration
  20. DNS: Service discovery
      • _playlist: service name
      • _http: protocol
      • 3600: ttl
      • 10: prio
      • 50: weight
      • 8081: port
      • host1.spotify.net: host
      _playlist._http.spotify.net 3600 SRV 10 50 8081 host1.spotify.net.
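A client consuming the SRV record from slide 20 has to split it into fields and then choose a target. The sketch below parses the record in the exact layout shown on the slide (no class field) and picks an instance the way RFC 2782 describes: lowest priority first, then weighted random among equals. The names `SrvRecord`, `parse_srv` and `pick` are made up for illustration, and this is not necessarily how Spotify's clients do it.

```python
import random
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse 'name ttl SRV prio weight port target.' as written on the slide."""
    name, ttl, rtype, priority, weight, port, target = line.split()
    if rtype != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port),
                     target.rstrip("."))


def pick(records: List[SrvRecord], rng=random) -> SrvRecord:
    """Lowest priority value wins; ties are broken by weighted random choice."""
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


record = parse_srv(
    "_playlist._http.spotify.net 3600 SRV 10 50 8081 host1.spotify.net.")
print(f"connect to {record.target}:{record.port}")
```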
  21. DNS: DHT rings
      Which service instance should I ask for a resource?
      • Configuration
        config._key._http.spotify.net 3600 TXT "slaves=0"
        config._key._http.spotify.net 3600 TXT "slaves=2 redundancy=host"
      • Mapping ring segment to service instance
        tokens.8081.host1.spotify.net 3600 TXT "00112233445566778899aabbccddeeff"
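The slide does not say how a resource is mapped onto a ring segment, so the following is only a sketch of one common convention: hash the resource key to a 128-bit number (MD5 is assumed here because the token is 32 hex digits) and route to the instance with the first token at or after that value, wrapping around at the end. The `Ring` class and the instance labels are hypothetical.

```python
import bisect
import hashlib
from typing import Mapping


class Ring:
    """Consistent-hash ring built from instance -> hex token pairs."""

    def __init__(self, tokens: Mapping[str, str]) -> None:
        if not tokens:
            raise ValueError("ring needs at least one instance")
        points = sorted((int(token, 16), instance)
                        for instance, token in tokens.items())
        self._tokens = [token for token, _ in points]
        self._instances = [instance for _, instance in points]

    def owner_of_hash(self, value: int) -> str:
        """Instance whose token is the first one >= value, wrapping to the start."""
        index = bisect.bisect_left(self._tokens, value) % len(self._tokens)
        return self._instances[index]

    def owner(self, resource: str) -> str:
        """Hash the resource key (MD5 is an assumption) and look up its owner."""
        digest = hashlib.md5(resource.encode("utf-8")).hexdigest()
        return self.owner_of_hash(int(digest, 16))


ring = Ring({
    "host1:8081": "00112233445566778899aabbccddeeff",  # token from the slide
    "host2:8081": "80000000000000000000000000000000",  # made-up second token
})
print(ring.owner("some-resource-key"))
```

Publishing the tokens as TXT records means any client that can resolve DNS can rebuild the same ring locally, with no extra coordination service.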
  22. Databases: Cassandra & Postgres
      • Critical data where consistency is important: PostgreSQL
      • Huge, growing fast, eventual consistency OK: Cassandra
  23. Storage: Production Storage
      • Read only
      • Large files
      • HTTP based
      • nginx + storage proxies + Amazon S3
  24. Other types of storage
      • Hadoop
      • Tokyo Cabinet
      • CDB
      • BDB
  25. Communication protocols between services: HTTP
      • Originally used by every system
      • Simple
      • Well known
      • Battle tested
      • Proper implementations in many languages
      • Each service defines its own RESTful protocol
  26. Communication protocols between services: Hermes
      Thin layer on top of ØMQ
      Data in messages is serialized as protobuf
      • Services define their APIs partly as protobuf
      Hermes is embedded in the client-AP (access point) protocol
      • AP doesn't need to translate protocols, it is just a message router
      In addition to request/reply, we get pub/sub
  27. Configuration management
      • We use Puppet
      • Installs Debian packages based on recipes
      • Teams developing a system write Puppet manifests
      • Hiera: simple hierarchical database for service parameters
      • Not the most scalable solution
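The "hierarchical database" idea behind Hiera can be illustrated in a few lines: a parameter is looked up level by level, most specific first, and the first level that defines it wins. This is a plain-Python sketch of the concept, not Hiera's actual implementation or API; the level names and parameters are invented.

```python
from typing import Any, Mapping, Sequence


def lookup(key: str,
           hierarchy: Sequence[str],
           data: Mapping[str, Mapping[str, Any]],
           default: Any = None) -> Any:
    """Return the value from the first (most specific) level that defines `key`."""
    for level in hierarchy:
        values = data.get(level, {})
        if key in values:
            return values[key]
    return default


# Hypothetical data: per-host overrides, then per-site values, then global defaults.
DATA = {
    "host/host1": {"playlist_port": 9090},
    "site/stockholm": {"ntp_server": "ntp.sto.example.net"},
    "common": {"playlist_port": 8081, "ntp_server": "ntp.example.net"},
}

hierarchy_for_host1 = ["host/host1", "site/stockholm", "common"]
hierarchy_for_host2 = ["host/host2", "site/stockholm", "common"]

print(lookup("playlist_port", hierarchy_for_host1, DATA))
print(lookup("playlist_port", hierarchy_for_host2, DATA))
```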
  28. Working on...
  29. Operational responsibility delegation
      • Each feature team takes responsibility for the entire stack: from developing a system to running and operating it
      • Mentality shift: from "it works" to "it scales"
      • Full responsibility: capacity planning, monitoring, incident management
      • Risk of reinventing square wheels. Closing the feedback loop is key
  30. Service Discovery
      • DNS will stay
      • We can't afford to rewrite every system
      • We like to be able to use standard tools (dig) to troubleshoot
      • We aim for hands-free zone file management
      • Automated registration and deregistration of nodes is a goal
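Hands-free zone file management with automated registration could look roughly like the sketch below: nodes register and deregister themselves in a registry, and the SRV lines are rendered from it instead of being edited by hand. The `Registry` class, its defaults and the `example.net` domain are all hypothetical; the slides describe this only as a goal, not as an existing tool.

```python
from typing import Dict, List, Tuple


class Registry:
    """In-memory set of service instances that renders SRV zone lines."""

    def __init__(self, domain: str, ttl: int = 3600) -> None:
        self.domain = domain
        self.ttl = ttl
        # (service, protocol, host, port) -> (priority, weight)
        self._nodes: Dict[Tuple[str, str, str, int], Tuple[int, int]] = {}

    def register(self, service: str, protocol: str, host: str, port: int,
                 priority: int = 10, weight: int = 50) -> None:
        self._nodes[(service, protocol, host, port)] = (priority, weight)

    def deregister(self, service: str, protocol: str, host: str, port: int) -> None:
        self._nodes.pop((service, protocol, host, port), None)

    def zone_lines(self) -> List[str]:
        """SRV records in the layout used on the service discovery slide."""
        return [
            f"_{service}._{protocol}.{self.domain} {self.ttl} SRV "
            f"{priority} {weight} {port} {host}."
            for (service, protocol, host, port), (priority, weight)
            in sorted(self._nodes.items())
        ]


registry = Registry("example.net")
registry.register("playlist", "http", "host1.example.net", 8081)
registry.register("playlist", "http", "host2.example.net", 8081)
registry.deregister("playlist", "http", "host2.example.net", 8081)
print("\n".join(registry.zone_lines()))
```

Because the output is still ordinary DNS, existing clients and tools such as dig keep working unchanged.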
  31. Unit of deployment (containers)
      • Runs on top of our OS platform
      • Consistency between different environments (testing, production, public cloud, development boxes...)
      • Version N always looks the same
      • Testability improves
      • Deployments are fast. Gradual rollouts FTW!
      • Rollbacks are easy
      • Configurations could be part of the bundle
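A gradual rollout with easy rollback can be sketched as a loop over growing fractions of the fleet with a health check after each step. Everything here is an assumption for illustration: the `deploy`, `healthy` and `rollback` callbacks, the step fractions, and the policy of rolling back every touched host on the first failed check. The slides do not describe the actual mechanism.

```python
import math
from typing import Callable, List, Sequence


def gradual_rollout(hosts: Sequence[str],
                    version: str,
                    deploy: Callable[[str, str], None],
                    healthy: Callable[[str], bool],
                    rollback: Callable[[str], None],
                    fractions: Sequence[float] = (0.1, 0.5, 1.0)) -> bool:
    """Deploy in growing batches; undo everything if any upgraded host is unhealthy."""
    done: List[str] = []
    for fraction in fractions:
        target = math.ceil(len(hosts) * fraction)
        # Only touch hosts that earlier, smaller steps have not upgraded yet.
        for host in hosts[len(done):target]:
            deploy(host, version)
            done.append(host)
        if not all(healthy(host) for host in done):
            for host in reversed(done):
                rollback(host)
            return False
    return True


fleet = [f"h{i}" for i in range(10)]
events = []
ok = gradual_rollout(
    fleet, "v2",
    deploy=lambda host, version: events.append(("deploy", host)),
    healthy=lambda host: True,
    rollback=lambda host: events.append(("rollback", host)),
)
print("rollout succeeded" if ok else "rolled back")
```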
  32. Incident management process improvements
      • Main objective: a type of incident happens only once
      • Streamline internal and external communication
      • Teams developing a system lead the process for incidents connected with it
      • SRE leads the process for incidents affecting multiple pieces that require a higher level of coordination
      • Mitigation > Post-mortem > Remediation > Resolution
  33. More stuff being done
      • Explaining our challenges to the world
      • Open-sourcing many of our tools
      • Self-service provisioning of capacity
      • Improvements in our continuous integration pipeline
      • Network platform
      • OS platform
      • Automation everywhere
      • Recruitment
  34. We are hiring
      spoti.fi/ops-jobs
  35. Thanks! Q & A
      spoti.fi/ops-jobs
      Emil Fredriksson / David Poblador i Garcia
