ML Infra @ Spotify
Lessons Learned
Romain Yon
Music Discovery in the 90s
Music Streaming Service
Launched in 2008
Premium and Free Tiers
Available in 78 Markets
Over 190M active users
More than 40M songs
Over 3B playlists
Over 1 billion plays per day
30% of these teams use ML in some capacity
Eng org map, blurred
Recommendation - What should appear in this user’s Discover Weekly?
Ranking - Which shelves should appear on the home page?
Classification - Which items in our catalog contain certain instruments?
Estimation - How likely is this user to skip an ad?
ML Use Cases
What is Machine Learning Infrastructure?
D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison, "Hidden Technical Debt in Machine Learning Systems," Proceedings of the 28th International Conference on Neural Information Processing Systems, pp. 2503-2511, December 7-12, 2015, Montreal, Canada.
Discover Weekly in 2016
Area of learning / iterative development:
20% of efforts
80% of time
"ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues."¹
[1] D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison, "Hidden Technical Debt in Machine Learning Systems," Proceedings of the 28th International Conference on Neural Information Processing Systems, pp. 2503-2511, December 7-12, 2015, Montreal, Canada.
ML Infra: Goal
80% of efforts
Part 1
(Supervised) ML System Example
Personalized Re-Ranking of the Browse Page (hypothetical)
Client Events and Browse Events are joined into labeled examples:

UserId  GenreId  IsClicked
UserA   Genre1   True
UserA   Genre2   False
UserB   Genre1   True
...     ...      ...
The joined events are then decorated with Entity Data (Genre Data, User Data, Playlist Data):

UserId  GenreId  UserAge  ...
UserA   Genre1   42       ...
UserA   Genre2   42       ...
UserB   Genre1   13       ...
...     ...      ...      ...
The decorated examples are turned into Normalized Features:

UserId  GenreId  UserAge  ...
UserA   Genre1   0.60     ...
UserA   Genre2   0.60     ...
UserB   Genre1   -1.13    ...
...     ...      ...      ...
The Normalized Features feed Model Training, which is checked with Offline Eval and exports two artifacts: a Featurizer and a Model.
At serving time, the Browse Ranking Service hosts the Featurizer and Model, fetches Entity Data from the Genre, Playlist, and User Services, and hands its predictions to the Business Logic behind the Browse Service.
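The serving path above (Entity Data → Featurizer → Model → Business Logic) can be sketched in a few lines of Python. Everything here is illustrative: the function names, the feature set, and the toy linear "model" are stand-ins, not the actual service code.

```python
# Hypothetical sketch of the Browse Ranking Service request path:
# entity data -> featurizer -> model -> business logic re-ranking.

def featurize(user, genre, entity_data):
    """Build the model's feature vector from decorated entity data."""
    age = entity_data["user_age"][user]
    ctr = entity_data["genre_ctr"][genre]
    # Age normalization reuses the statistics learned at training time.
    return [(age - 32.0) / 16.7, ctr]

def model_score(features):
    """Stand-in for the trained click-prediction model (toy linear score)."""
    return 0.1 * features[0] + features[1]

def rank_browse_page(user, genres, entity_data):
    """Score every genre shelf for this user; business logic may still
    reorder the result afterwards."""
    scored = [(model_score(featurize(user, g, entity_data)), g) for g in genres]
    scored.sort(reverse=True)
    return [g for _, g in scored]
```

The key structural point is that the featurizer and model are separate components inside one service, which is what the later lessons about sharing logic and weights build on.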
Part 2
Lessons Learned
Challenge:
Building Reusable ML Infra
Learning 1
Rely on data standards
Standard ML data format @ Spotify
● Input to an ML pipeline should use tf.Example stored inside TFRecord files
● Libraries serve as interfaces for reading the input data
● Tooling to create, share and discover ML datasets
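The value of a standard record format is that every stage reads and writes the same shape, so tools can be swapped freely. A minimal conceptual sketch, using a plain Python dict as a stand-in for a real tf.train.Example protobuf:

```python
# Conceptual sketch of the data standard: every pipeline stage reads and
# writes the same record layout. A dict stands in for tf.train.Example.

def make_example(user_id, genre_id, is_clicked):
    """One training record in a standard feature-map layout."""
    return {"features": {
        "user_id": [user_id],
        "genre_id": [genre_id],
        "is_clicked": [1 if is_clicked else 0],
    }}

def read_feature(example, name):
    """Shared reader used as the interface by every downstream tool."""
    return example["features"][name]
```

Any transformer or trainer that only talks to `read_feature` (or, in the real setup, to the tf.Example feature map) is interchangeable with any other.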
tf.example as Interface
Feature transformation (Featran, tf.Transform, ??) → Model (XGBoost, TensorFlow, ??) → Business Logic
Challenge:
Training & Serving
divergence
Learning 2
Share logic & weights
Looking again at the training pipeline, the feature transformation step is captured as a set of shareable artifacts:
FeatureSpec.jar:
  - StandardScaler(age)
  - ...
Featran-0.2.1.jar
settings.json:
  "age_mean": 32
  "age_stddev": 16.7
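The settings.json above is exactly the "weights" of the transformation stage: statistics computed once at training time and serialized so serving can replay the identical normalization. A small sketch (the helper name is illustrative; the values match the slide and reproduce the 0.60 / -1.13 rows of the normalized-features table):

```python
import json

# Transformation "weights": StandardScaler statistics for age, computed
# during training and shipped to serving as settings.json.
settings = json.loads('{"age_mean": 32, "age_stddev": 16.7}')

def normalize_age(age, s):
    """StandardScaler transform using the shared, serialized statistics."""
    return (age - s["age_mean"]) / s["age_stddev"]
```

With the shared statistics, age 42 maps to about 0.60 and age 13 to about -1.14 on both sides of the pipeline, so training and serving cannot drift apart on this feature.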
Likewise, the model training step is captured as shareable artifacts: the xgboost==0.81 package (PyPI) and the trained foo.model file.
At serving time, the Browse Ranking Service loads exactly the same artifacts:
Featurizer = Featran-0.2.1.jar + FeatureSpec.jar + settings.json
Model = xgboost-0.81.jar + foo.model
Entity Data still comes from the Genre, Playlist, and User Services.
Sharing logic & weights
Prediction setup:
  X' = f_features(X, Θ_features)
  Y  = f_model(X', Θ_model)
Here X is the input data, f_features and f_model are logic (code), and Θ_features and Θ_model are weights (state).
Sharing logic & weights
● Weights need to be shared for both the model and the transformation stages
● Sharing logic is very hard if training and serving use different stacks
● The fewer moving pieces, the fewer issues
⇨ Try to group (f_features, Θ_features, f_model & Θ_model) inside a single object
⇨ Google AI Blog: tf.Transform
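Grouping the four pieces in one object means training exports a single artifact and serving loads a single artifact. A minimal sketch of the idea, with an illustrative class rather than a real library API:

```python
# Sketch: group f_features, Θ_features, f_model and Θ_model inside one
# predictor object. Class and field names are illustrative only.

class Predictor:
    def __init__(self, feature_params, model_params):
        self.feature_params = feature_params  # Θ_features (weights/state)
        self.model_params = model_params      # Θ_model (weights/state)

    def _featurize(self, x):                  # f_features (logic/code)
        p = self.feature_params
        return (x["age"] - p["age_mean"]) / p["age_stddev"]

    def predict(self, x):                     # Y = f_model(f_features(X))
        return self.model_params["weight"] * self._featurize(x)
```

Serializing one `Predictor` instead of four loosely coupled files (FeatureSpec, settings.json, library version, model file) removes most of the ways training and serving can disagree.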
Learning 3
Share decoration logic
Recall the two sides of the system: the training pipeline joins events with Entity Data offline, while the serving path (Featurizer + Model + Entity Data inside the Browse Ranking Service) decorates each request online. Both must decorate identically.
ML Wisdom from Google
“The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.”
⇨ Martin Zinkevich - Rules of ML
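The rule above can be sketched in a few lines. Everything here is a stand-in: the in-memory list plays the role of a real log sink (such as GCS), and the scoring function is a toy.

```python
# Sketch of the "log served features" pattern: serving writes the exact
# feature vector it used, and training later consumes those logs
# verbatim, which rules out training/serving feature skew.

feature_log = []  # stand-in for a real log pipeline (e.g. to GCS)

def score(features):
    return sum(features)  # toy stand-in for the model

def serve(user_id, features):
    """Serving: log the features actually used, then score them."""
    feature_log.append({"user_id": user_id, "features": features})
    return score(features)

def training_examples():
    """Training: read the logged features instead of recomputing them."""
    return list(feature_log)
```

Because training never re-derives the features, there is no second decoration code path that could silently diverge from serving.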
At serving time, the Browse Ranking Service logs the decorated features it actually used for each request:

UserId  GenreId  UserAge  ...
UserA   Genre1   42       ...
UserA   Genre2   42       ...
UserB   Genre1   13       ...
...     ...      ...      ...

These feature logs are written to GCS.
The training pipeline then reads the Entity Data Logs in place of the offline entity-data join: Client Events + Browse Events + Logs → Normalized Features → Model Training → Offline Eval.
Challenge:
Sustain High Reliability
“Using ML in real-world production systems is complicated by a host of issues not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for assessing the production-readiness of an ML system.”
Breck et al.
⇨ What’s your ML Test Score? A rubric for ML production systems
Learning 4
Validate Your Data
Data Validation
Three main stages of data validation:
1. Validation of data against a schema (human curated) (tf.metadata.schema)
2. Validation of data against past data (tf.metadata.statistics)
3. Validation of serving data against training data
⇨ Ideally, all 3 should be used in tandem
tf-dv/.../validation_api.py
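The three stages can be illustrated with a small sketch. The dict-based schema and mean-drift check below are simplified stand-ins for the real tf.metadata schema/statistics protos and the TFDV validation API.

```python
# Sketch of the three validation stages on simplified records.

def validate_schema(records, schema):
    """Stage 1: every record has the required fields with the right types."""
    return all(
        field in r and isinstance(r[field], ftype)
        for r in records for field, ftype in schema.items()
    )

def validate_against_stats(records, past_mean, field, tolerance):
    """Stages 2 and 3: flag large drift of a field's mean versus past
    data (stage 2) or versus the training data (stage 3)."""
    mean = sum(r[field] for r in records) / len(records)
    return abs(mean - past_mean) <= tolerance
```

Stage 3 is the same comparison as stage 2, just with training-set statistics as the reference, which is why the serving-side checks later reuse training_stats.pb.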
In training, a Generate Statistics step runs over the Normalized Features to produce training_stats.pb; a Validate Data step then checks the new statistics against previous_stats.pb and the curated schema.pb.
On the serving side, the logged features in GCS feed a streaming job: over each Streaming Window, a Generate Statistics step produces fresh statistics, and a Data Validation step checks them against training_stats.pb and schema.pb.
Learning 5
Use “stateless” containers
The Browse Ranking Service holds its Featurizer and Model in memory. When a new model lands in GCS, should the running service "hot swap" it in?
Avoid model “hot swap”
ML is not a special snowflake:
● Avoid custom (model swap) logic
● Use (sealed) containers
● Use container management systems (e.g. Kubernetes)
Learning 6
Leverage CI/CD for ML
CI/CD for Model
Critical to keep both quality & velocity high
● Use Continuous Integration (Offline & Online metrics)
● Use Continuous Delivery
● Use low user-impact environments (canaries / shadow traffic)
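The continuous-integration part of the pipeline usually includes an offline-metrics gate before any canary rollout. A minimal sketch of such a gate; the function, metric name, and threshold are illustrative, not an actual Spotify check:

```python
# Sketch of a CI gate for model promotion: a candidate model is promoted
# toward canary only if its offline metric does not regress against the
# production baseline. Metric names and threshold are illustrative.

def should_promote(candidate, baseline, min_gain=0.0):
    """Gate: candidate offline AUC must meet or beat baseline + min_gain."""
    return candidate["auc"] >= baseline["auc"] + min_gain
```

Running this automatically on every trained model is what keeps velocity high without letting quality regress.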
Summary: our six learnings
● Rely on data standards
● Share logic & weights
● Share decoration logic
● Validate your data
● Use “stateless” containers
● Leverage CI/CD for ML
Appendix
A future challenge
Today the Browse Ranking Service fetches Entity Data from the Genre, Playlist, and User Services directly; a future direction is to put a single EntityDataService in front of them.
Sounds exciting?!
● We have several openings for ML Infra engineers
● Application link: bit.ly/spotify-ml-infra-engineer
● Check out the Spotify jobs page: spotifyjobs.com
● Questions? ⇨
Thanks!
Questions?

ML Infra @ Spotify: Lessons Learned - Romain Yon - NYC ML Meetup