David Poblador i Garcia // @davidpoblador
Service Availability Lead
DevOps at Spotify:
There and Back Again
david poblador i garcia
@davidpoblador
involved in free software since 1999
started a free/open source software consultancy in 2000
free software in the public administration
leading infrastructure and operations teams
DevOps at Spotify:
from day 0
the beginning
a handful of…
engineers
a handful of…
systems
one sysadmin
datacenters servers monitoring
on-call operating systems conf management
the result was…
a lot of technical debt…
Not our DC
but it was awesome…
and we moved fast…
Spotify interest over time (2008)
kept growing in all dimensions:
datacenters
employees
systems
servers
around 2010
DevOps was fading away…
40ish engineers
4 people in ops
dozens of systems
ownership issues
teams were changing fast
priorities moving
Operations team not scaling
the result was…
technical debt
lack of proper tooling
2013 and onwards
Spotify in numbers:
55 markets
20M+ songs
1.5 billion playlists
24M active users
6M+ paying subscribers
Spotify in numbers:
50+ teams building products/features
around 100 backend systems
4+ datacenters
6000+ servers
DevOps is back
every team is responsible for their
systems, end to end:
it works
it runs
it scales
we embed SRE engineers
in teams as needed
we have a core-SRE team in
charge of coordination duties:
cross cutting incidents
core on-call
…
teams take on-call duties
teams take all operational duties:
provisioning
deployment
capacity planning
incident remediation
operations and infrastructure
merge
I/O builds the Spotify platform:
alerting
backups
monitoring
data pipelines
conf management
testing environment
service ecosystem
data collection
procurement
provisioning
network
…
we encourage teams to build
missing pieces
results so far
20% of our teams are taking full
operational responsibility
results so far
the availability of our systems has not
suffered (it has even improved)
results so far
we remove technical debt
faster than we create it
results so far
high adoption rate of our I/O platform
results so far
very promising results in our joint
ventures with feature teams
results so far
I/O teams are also taking full operational
responsibility for their systems
and some mistakes!
problems
lack of buy-in
hidden work (old ownership model)
Thanks!
DevOps at Spotify: There and Back Again
David Poblador i Garcia
@davidpoblador
dpoblador@spotify.com