From 1 to 100 developers
Scaling for developer productivity at Spotify

@dawhiting
HUG UK @ Strata
11/11/2013
Spotify: From 1 to 100 Hadoop developers

How Spotify scaled its Hadoop cluster and the people working on it, from 1 to over 100 developers and from 1 node to over 690 nodes, making it the largest Hadoop cluster in Europe.

Published in: Technology

Transcript of "Spotify: From 1 to 100 Hadoop developers"

  1. From 1 to 100 developers: Scaling for developer productivity at Spotify. @dawhiting, HUG UK @ Strata, 11/11/2013
  2. How do I scale?
     • How many nodes?
     • How much data?
     • How many records?
  3. How do I scale my development?
     • How many developers?
     • How many teams?
     • How many Hadoop jobs?
     • How much code?
     (Data Infrastructure - July 2013)
  4. A brief history of Hadoop development at Spotify
     2008 - Spotify launches in Sweden
     2009 - First Hadoop cluster for royalties, 2 developers
     2010 - Up to 37 nodes, BI team formed, 3 devs/3 analysts
     2011 - to Elastic MapReduce
     2012 - Back to own cluster, 60 -> 190 nodes, Infrastructure/Insights/Tools team split
     2013 - 6 teams just for data infrastructure, ~100 developers using Hadoop cluster
  5. Issues: What could possibly go wrong?
     • Contention for resources
     • Repetition of code, repetition of data
     • Poor code quality / technical debt
     • Disorganised HDFS
     • Data cataloguing
  6. Contention for resources
     Priority and isolation
       • What is important?
     Hadoop scheduler
       • Capacity scheduler
       • Queue isolation
     YARN
       • Resource allocation
  7. Don’t Repeat Yourself
     Refactor data, not just code
       • Make popular data available pre-joined
       • Analyse code to find jobs with the same dependencies
     Work at a higher level
       • MapReduce out, (S)Crunch in
       • Allow substitution of operations for cached data
  8. Code Quality & Technical Debt
     Stable platform
       • Python -> JVM
     Abolish custom infrastructure
       • Off-the-shelf is often good enough
       • E.g. Sqoop, Kafka, ...
     Testing
       • Make testing easier than running
  9. HDFS
     Retention policy
       • Automatic deletion of old intermediate data
       • Opt-out, not opt-in
     Establish convention
       • Can you correctly guess the path to the data you need?
     Enforce structure
       • Path literals are a code smell
  10. Data Library
      Core datasets
        • Identify
        • Catalogue
        • Document
        • Monitor
      Data library as code library
        • Easy to use
        • Synced with release cycles
  11. You can have it easier than us
      Act now
        • Big Data technical debt is worse than normal technical debt
        • Rewriting 10 jobs is easier than rewriting 300
      Plan to decentralise
        • At some point it won’t be enough to trust your developers
        • You won’t be able to review every job forever
      Make it simpler to do things the right way
        • Example: build tools
  12. Want to join the band? We’re hiring for Stockholm and NYC. Check out http://www.spotify.com/jobs for more information.
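
Slide 7 suggests analysing code to find jobs with the same dependencies. The slides do not say how this was done, so the following is only a minimal sketch of the idea, assuming each job's input datasets have already been extracted into a mapping. The job names, dataset names, and the `group_jobs_by_inputs` helper are all hypothetical.

```python
from collections import defaultdict


def group_jobs_by_inputs(job_inputs):
    """Group job names by their exact set of input datasets.

    job_inputs maps a job name to an iterable of dataset names (hypothetical).
    Only groups with more than one job are returned, since those are the
    candidates for sharing a single pre-joined dataset.
    """
    groups = defaultdict(list)
    for job, inputs in job_inputs.items():
        # frozenset ignores the order in which a job lists its inputs.
        groups[frozenset(inputs)].append(job)
    return {deps: sorted(jobs) for deps, jobs in groups.items() if len(jobs) > 1}


# Hypothetical jobs and datasets, for illustration only.
JOB_INPUTS = {
    "top_tracks": ["plays", "track_metadata"],
    "top_artists": ["track_metadata", "plays"],
    "signup_report": ["users"],
}

shared = group_jobs_by_inputs(JOB_INPUTS)
```

Here `top_tracks` and `top_artists` read the same two datasets, so they end up in one group and could both consume a pre-joined version of that data.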
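
Slide 9 combines three ideas: an opt-out retention policy, a guessable path convention, and avoiding path literals in job code. One way these might fit together is sketched below. The `/data/<dataset>/<date>` layout, the 30-day default, and the function names are assumptions made for illustration, not Spotify's actual conventions.

```python
from datetime import date, timedelta

# Hypothetical convention: /data/<dataset>/<YYYY-MM-DD>
DATA_ROOT = "/data"


def dataset_path(dataset, day):
    """Build the path of one daily partition, so jobs never hard-code paths."""
    return f"{DATA_ROOT}/{dataset}/{day.isoformat()}"


def expired_paths(partitions, today, max_age_days=30, opted_out=frozenset()):
    """Return the paths of partitions older than max_age_days.

    partitions is an iterable of (dataset, day) pairs. Every dataset is
    eligible for deletion unless it is listed in opted_out, which is what
    makes the policy opt-out rather than opt-in.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [
        dataset_path(dataset, day)
        for dataset, day in partitions
        if dataset not in opted_out and day < cutoff
    ]


# Hypothetical partitions, for illustration only.
partitions = [
    ("tmp_sessions", date(2013, 9, 1)),
    ("tmp_sessions", date(2013, 11, 1)),
    ("royalties", date(2013, 1, 1)),
]
to_delete = expired_paths(
    partitions, today=date(2013, 11, 11), opted_out={"royalties"}
)
```

Only the old `tmp_sessions` partition is selected: the recent one is within the retention window, and `royalties` is protected because it was explicitly opted out. The sketch only selects paths; actually deleting them from HDFS is left out.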