ML Infra @ Spotify
Lessons Learned
Romain Yon
Music Discovery in the 90s
Music Streaming Service
Launched in 2008
Premium and Free Tiers
Available in 78 Markets
Over 190M active users
More than 40M songs
Over 3B playlists
Over 1 billion plays per day
30% of these teams use ML in some capacity
Eng org map, blurred
Recommendation - What should appear in this user’s Discover Weekly?
Ranking - Which shelves should appear on the home page?
Classification - Which items in our catalog contain certain instruments?
Estimation - How likely is this user to skip an ad?
ML Use Cases
What is Machine Learning Infrastructure?
D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison, "Hidden Technical Debt in Machine Learning Systems," Proceedings of the 28th International Conference on Neural Information Processing Systems, pp. 2503-2511, December 7-12, 2015, Montreal, Canada.
Discover Weekly in 2016
Area of learning / iterative development:
20% of efforts
80% of time
"ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues."¹
[1] D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison, "Hidden Technical Debt in Machine Learning Systems," Proceedings of the 28th International Conference on Neural Information Processing Systems, pp. 2503-2511, December 7-12, 2015, Montreal, Canada.
ML Infra: Goal
80% of efforts
Part 1
(Supervised) ML System Example
Personalized Re-Ranking of the Browse Page (hypothetical)
Client Events and Browse Events are joined into labeled examples:

UserId  GenreId  IsClicked
UserA   Genre1   True
UserA   Genre2   False
UserB   Genre1   True
...     ...      ...
The joined events are then decorated with Entity Data (Genre Data, User Data, Playlist Data):

UserId  GenreId  UserAge  ...
UserA   Genre1   42       ...
UserA   Genre2   42       ...
UserB   Genre1   13       ...
...     ...      ...      ...
The decorated examples are turned into Normalized Features:

UserId  GenreId  UserAge  ...
UserA   Genre1   0.60     ...
UserA   Genre2   0.60     ...
UserB   Genre1   -1.13    ...
...     ...      ...      ...
The Normalized Features feed Model Training, which is checked with Offline Eval and exports two artifacts: a Featurizer and a Model.
At serving time, the Browse Ranking Service hosts the Featurizer and Model, fetches Entity Data from the Genre, Playlist, and User Services, and hands its predictions to the Business Logic behind the Browse Service.
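The serving path above (Entity Data → Featurizer → Model → Business Logic) can be sketched in a few lines of Python. Everything here is illustrative: the function names, the feature set, and the toy linear "model" are stand-ins, not the actual service code.

```python
# Hypothetical sketch of the Browse Ranking Service request path:
# entity data -> featurizer -> model -> business logic re-ranking.

def featurize(user, genre, entity_data):
    """Build the model's feature vector from decorated entity data."""
    age = entity_data["user_age"][user]
    ctr = entity_data["genre_ctr"][genre]
    # Age normalization reuses the statistics learned at training time.
    return [(age - 32.0) / 16.7, ctr]

def model_score(features):
    """Stand-in for the trained click-prediction model (toy linear score)."""
    return 0.1 * features[0] + features[1]

def rank_browse_page(user, genres, entity_data):
    """Score every genre shelf for this user; business logic may still
    reorder the result afterwards."""
    scored = [(model_score(featurize(user, g, entity_data)), g) for g in genres]
    scored.sort(reverse=True)
    return [g for _, g in scored]
```

The key structural point is that the featurizer and model are separate components inside one service, which is what the later lessons about sharing logic and weights build on.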
Part 2
Lessons Learned
Challenge:
Building Reusable ML Infra
Learning 1
Rely on data standards
Standard ML data format @ Spotify
● Input to an ML pipeline should use tf.Example stored inside TFRecord files
● Libraries serve as interfaces for reading the input data
● Tooling to create, share and discover ML datasets
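The value of a standard record format is that every stage reads and writes the same shape, so tools can be swapped freely. A minimal conceptual sketch, using a plain Python dict as a stand-in for a real tf.train.Example protobuf:

```python
# Conceptual sketch of the data standard: every pipeline stage reads and
# writes the same record layout. A dict stands in for tf.train.Example.

def make_example(user_id, genre_id, is_clicked):
    """One training record in a standard feature-map layout."""
    return {"features": {
        "user_id": [user_id],
        "genre_id": [genre_id],
        "is_clicked": [1 if is_clicked else 0],
    }}

def read_feature(example, name):
    """Shared reader used as the interface by every downstream tool."""
    return example["features"][name]
```

Any transformer or trainer that only talks to `read_feature` (or, in the real setup, to the tf.Example feature map) is interchangeable with any other.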
tf.example as Interface
Feature transformation (Featran, tf.Transform, ??) → Model (XGBoost, TensorFlow, ??) → Business Logic
Challenge:
Training & Serving
divergence
Learning 2
Share logic & weights
Looking again at the training pipeline, the feature transformation step is captured as a set of shareable artifacts:
FeatureSpec.jar:
  - StandardScaler(age)
  - ...
Featran-0.2.1.jar
settings.json:
  "age_mean": 32
  "age_stddev": 16.7
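The settings.json above is exactly the "weights" of the transformation stage: statistics computed once at training time and serialized so serving can replay the identical normalization. A small sketch (the helper name is illustrative; the values match the slide and reproduce the 0.60 / -1.13 rows of the normalized-features table):

```python
import json

# Transformation "weights": StandardScaler statistics for age, computed
# during training and shipped to serving as settings.json.
settings = json.loads('{"age_mean": 32, "age_stddev": 16.7}')

def normalize_age(age, s):
    """StandardScaler transform using the shared, serialized statistics."""
    return (age - s["age_mean"]) / s["age_stddev"]
```

With the shared statistics, age 42 maps to about 0.60 and age 13 to about -1.14 on both sides of the pipeline, so training and serving cannot drift apart on this feature.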
Likewise, the model training step is captured as shareable artifacts: the xgboost==0.81 package (PyPI) and the trained foo.model file.
At serving time, the Browse Ranking Service loads exactly the same artifacts:
Featurizer = Featran-0.2.1.jar + FeatureSpec.jar + settings.json
Model = xgboost-0.81.jar + foo.model
Entity Data still comes from the Genre, Playlist, and User Services.
Sharing logic & weights
Prediction setup:
  X' = f_features(X, Θ_features)
  Y  = f_model(X', Θ_model)
Here X is the input data, f_features and f_model are logic (code), and Θ_features and Θ_model are weights (state).
Sharing logic & weights
● Weights need to be shared for both the model and the transformation stages
● Sharing logic is very hard if training and serving use different stacks
● The fewer moving pieces, the fewer issues
⇨ Try to group (f_features, Θ_features, f_model & Θ_model) inside a single object
⇨ Google AI Blog: tf.Transform
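Grouping the four pieces in one object means training exports a single artifact and serving loads a single artifact. A minimal sketch of the idea, with an illustrative class rather than a real library API:

```python
# Sketch: group f_features, Θ_features, f_model and Θ_model inside one
# predictor object. Class and field names are illustrative only.

class Predictor:
    def __init__(self, feature_params, model_params):
        self.feature_params = feature_params  # Θ_features (weights/state)
        self.model_params = model_params      # Θ_model (weights/state)

    def _featurize(self, x):                  # f_features (logic/code)
        p = self.feature_params
        return (x["age"] - p["age_mean"]) / p["age_stddev"]

    def predict(self, x):                     # Y = f_model(f_features(X))
        return self.model_params["weight"] * self._featurize(x)
```

Serializing one `Predictor` instead of four loosely coupled files (FeatureSpec, settings.json, library version, model file) removes most of the ways training and serving can disagree.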
Learning 3
Share decoration logic
Recall the two sides of the system: the training pipeline joins events with Entity Data offline, while the serving path (Featurizer + Model + Entity Data inside the Browse Ranking Service) decorates each request online. Both must decorate identically.
ML Wisdom from Google
“The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.”
⇨ Martin Zinkevich - Rules of ML
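The rule above can be sketched in a few lines. Everything here is a stand-in: the in-memory list plays the role of a real log sink (such as GCS), and the scoring function is a toy.

```python
# Sketch of the "log served features" pattern: serving writes the exact
# feature vector it used, and training later consumes those logs
# verbatim, which rules out training/serving feature skew.

feature_log = []  # stand-in for a real log pipeline (e.g. to GCS)

def score(features):
    return sum(features)  # toy stand-in for the model

def serve(user_id, features):
    """Serving: log the features actually used, then score them."""
    feature_log.append({"user_id": user_id, "features": features})
    return score(features)

def training_examples():
    """Training: read the logged features instead of recomputing them."""
    return list(feature_log)
```

Because training never re-derives the features, there is no second decoration code path that could silently diverge from serving.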
At serving time, the Browse Ranking Service logs the decorated features it actually used for each request:

UserId  GenreId  UserAge  ...
UserA   Genre1   42       ...
UserA   Genre2   42       ...
UserB   Genre1   13       ...
...     ...      ...      ...

These feature logs are written to GCS.
The training pipeline then reads the Entity Data Logs in place of the offline entity-data join: Client Events + Browse Events + Logs → Normalized Features → Model Training → Offline Eval.
Challenge:
Sustain High Reliability
“Using ML in real-world production systems is complicated by a host of issues not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for assessing the production-readiness of an ML system.”
Breck et al.
⇨ What’s your ML Test Score? A rubric for ML production systems
Learning 4
Validate Your Data
Data Validation
Three main stages of data validation:
1. Validation of data against a schema (human curated) (tf.metadata.schema)
2. Validation of data against past data (tf.metadata.statistics)
3. Validation of serving data against training data
⇨ Ideally, all 3 should be used in tandem
tf-dv/.../validation_api.py
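The three stages can be illustrated with a small sketch. The dict-based schema and mean-drift check below are simplified stand-ins for the real tf.metadata schema/statistics protos and the TFDV validation API.

```python
# Sketch of the three validation stages on simplified records.

def validate_schema(records, schema):
    """Stage 1: every record has the required fields with the right types."""
    return all(
        field in r and isinstance(r[field], ftype)
        for r in records for field, ftype in schema.items()
    )

def validate_against_stats(records, past_mean, field, tolerance):
    """Stages 2 and 3: flag large drift of a field's mean versus past
    data (stage 2) or versus the training data (stage 3)."""
    mean = sum(r[field] for r in records) / len(records)
    return abs(mean - past_mean) <= tolerance
```

Stage 3 is the same comparison as stage 2, just with training-set statistics as the reference, which is why the serving-side checks later reuse training_stats.pb.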
In training, a Generate Statistics step runs over the Normalized Features to produce training_stats.pb; a Validate Data step then checks the new statistics against previous_stats.pb and the curated schema.pb.
On the serving side, the logged features in GCS feed a streaming job: over each Streaming Window, a Generate Statistics step produces fresh statistics, and a Data Validation step checks them against training_stats.pb and schema.pb.
Learning 5
Use “stateless” containers
The Browse Ranking Service holds its Featurizer and Model in memory. When a new model lands in GCS, should the running service "hot swap" it in?
Avoid model “hot swap”
ML is not a special snowflake:
● Avoid custom (model swap) logic
● Use (sealed) containers
● Use container management systems (e.g. Kubernetes)
Learning 6
Leverage CI/CD for ML
CI/CD for Model
Critical to keep both quality & velocity high
● Use Continuous Integration (Offline & Online metrics)
● Use Continuous Delivery
● Use low user-impact environments (canaries / shadow traffic)
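The continuous-integration part of the pipeline usually includes an offline-metrics gate before any canary rollout. A minimal sketch of such a gate; the function, metric name, and threshold are illustrative, not an actual Spotify check:

```python
# Sketch of a CI gate for model promotion: a candidate model is promoted
# toward canary only if its offline metric does not regress against the
# production baseline. Metric names and threshold are illustrative.

def should_promote(candidate, baseline, min_gain=0.0):
    """Gate: candidate offline AUC must meet or beat baseline + min_gain."""
    return candidate["auc"] >= baseline["auc"] + min_gain
```

Running this automatically on every trained model is what keeps velocity high without letting quality regress.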
Summary: our six learnings
● Rely on data standards
● Share logic & weights
● Share decoration logic
● Validate your data
● Use “stateless” containers
● Leverage CI/CD for ML
Appendix
A future challenge
Today the Browse Ranking Service fetches Entity Data from the Genre, Playlist, and User Services directly; a future direction is to put a single EntityDataService in front of them.
Sounds exciting?!
● We have several openings for ML Infra engineers
● Application link: bit.ly/spotify-ml-infra-engineer
● Check out the Spotify jobs page: spotifyjobs.com
● Questions? ⇨
Thanks!
Questions?

ML Infra @ Spotify: Lessons Learned - Romain Yon - NYC ML Meetup