1
Recommendations for
Building Machine Learning Software
Justin Basilico
Page Algorithms Engineering November 13, 2015
@JustinBasilico
SF 2015
2
Introduction
3
Change of focus: 2006 → 2015
4
Netflix Scale
 • > 69M members
 • > 50 countries
 • > 1000 device types
 • > 3B hours/month
 • 36.4% of peak US downstream traffic
5
Goal
Help members find content to watch and enjoy
to maximize member satisfaction and retention
6
Everything is a Recommendation
 • Both the rows and the ranking within them are recommendations
 • Over 80% of what people watch comes from our recommendations
 • Recommendations are driven by Machine Learning
7
Machine Learning Approach
[Diagram: Problem → Data → Algorithm → Model → Metrics]
8
Models & Algorithms
 • Regression (linear, logistic, elastic net)
 • SVD and other Matrix Factorizations
 • Factorization Machines
 • Restricted Boltzmann Machines
 • Deep Neural Networks
 • Markov Models and Graph Algorithms
 • Clustering
 • Latent Dirichlet Allocation
 • Gradient Boosted Decision Trees/Random Forests
 • Gaussian Processes
 • …
9
Design Considerations
Recommendations
• Personal
• Accurate
• Diverse
• Novel
• Fresh
Software
• Scalable
• Responsive
• Resilient
• Efficient
• Flexible
10
Software Stack
http://techblog.netflix.com
11
Recommendations
12
Be flexible about where and when
computation happens
Recommendation 1
13
System Architecture
 • Offline: Process data
    • Batch learning
 • Nearline: Process events
    • Model evaluation
    • Online learning
    • Asynchronous
 • Online: Process requests
    • Real-time
[Architecture diagram: member interactions (play, rate, browse…) flow from the UI client into a user event queue and event distribution; offline data feeds model training; nearline computation evaluates models as events arrive (built on Netflix.Hermes and Netflix.Manhattan); online computation and the algorithm service answer queries and return recommendations in real time.]
More details on Netflix Techblog
14
Where to place components?
 • Example: Matrix Factorization
 • Offline:
    • Collect sample of play data
    • Run batch learning algorithm like SGD to produce factorization
    • Publish video factors
 • Nearline:
    • Solve user factors
    • Compute user-video dot products
    • Store scores in cache
 • Online:
    • Presentation-context filtering
    • Serve recommendations
[Same architecture diagram, annotated with the math: offline, factor the play matrix X ≈ UVᵗ and publish the video factors V; nearline, solve Auᵢ = b for a member's user factors and compute scores sᵢⱼ = uᵢ · vⱼ; online, filter to items with sᵢⱼ > t and serve.]
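In code, the offline/nearline/online split might look like the minimal sketch below (hypothetical names, not Netflix's implementation; it assumes a dense play matrix X as a NumPy array and ignores caching and infrastructure):

    import numpy as np

    # Offline: batch-learn the factorization X ≈ U Vᵗ with SGD over observed plays.
    def train_offline(X, k=20, epochs=10, lr=0.01, reg=0.1):
        n_users, n_videos = X.shape
        U = 0.1 * np.random.randn(n_users, k)
        V = 0.1 * np.random.randn(n_videos, k)
        rows, cols = X.nonzero()
        for _ in range(epochs):
            for i, j in zip(rows, cols):
                err = X[i, j] - U[i] @ V[j]
                U[i] += lr * (err * V[j] - reg * U[i])
                V[j] += lr * (err * U[i] - reg * V[j])
        return V  # publish the video factors

    # Nearline: refresh one member's factors by solving A u = b
    # (regularized least squares against the published video factors).
    def solve_user_factors(x_u, V, reg=0.1):
        seen = x_u.nonzero()[0]
        A = V[seen].T @ V[seen] + reg * np.eye(V.shape[1])
        b = V[seen].T @ x_u[seen]
        return np.linalg.solve(A, b)

    # Nearline: score all videos, s_ij = u_i · v_j, and cache the result.
    def score_all(u, V):
        return V @ u

    # Online: presentation-context filtering over cached scores, e.g. s_ij > t.
    def filter_scores(scores, threshold):
        return np.flatnonzero(scores > threshold)

The point is the placement: the expensive batch factorization runs offline, the cheap per-member solve and scoring run nearline as events arrive, and only the threshold filter sits in the request path.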
15
Think about distribution
starting from the outermost levels
Recommendation 2
16
Three levels of Learning Distribution/Parallelization
1. For each subset of the population (e.g. region)
    • Want independently trained and tuned models
2. For each combination of (hyper)parameters (see the sketch after this list)
    • Simple: Grid search
    • Better: Bayesian optimization using Gaussian Processes
3. For each subset of the training data
    • Distribute over machines (e.g. ADMM)
    • Multi-core parallelism (e.g. HogWild)
    • Or… use GPUs
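As a concrete example of level 2, here is a minimal grid-search sketch that distributes hyperparameter combinations across worker processes (grid_search and train_and_evaluate are hypothetical names; a cluster scheduler would replace the process pool to span machines):

    from concurrent.futures import ProcessPoolExecutor
    from itertools import product

    # Farm each hyperparameter combination out to its own worker process.
    def grid_search(train_and_evaluate, param_grid, max_workers=8):
        combos = [dict(zip(param_grid, values))
                  for values in product(*param_grid.values())]
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            scores = list(pool.map(train_and_evaluate, combos))
        # Assumes higher validation scores are better.
        return max(zip(scores, combos), key=lambda pair: pair[0])

    # Usage (train_and_evaluate is an assumed module-level function that
    # trains a model for one combination and returns a validation metric):
    # best_score, best_params = grid_search(
    #     train_and_evaluate,
    #     {"learning_rate": [0.01, 0.1], "regularization": [0.01, 0.1, 1.0]})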
17
Example: Training Neural Networks
 • Level 1: Machines in different AWS regions
 • Level 2: Machines in the same AWS region
    • Spearmint or MOE for parameter optimization
    • Mesos, etc. for coordination
 • Level 3: Highly optimized, parallel CUDA code on GPUs
18
Design application software for
experimentation
Recommendation 3
19
Example development process
[Flow diagram: Idea → Data → Offline Modeling (R, Python, MATLAB, …) → iterate → Implement in production system (Java, C++, …) → A/B test. Getting the final model from the experimentation environment into the production environment runs into data discrepancies, code discrepancies, missing post-processing logic, and performance issues before the actual output matches.]
20
Shared Engine
Avoid dual implementations
[Diagram: experiment code and production code both run against a shared engine containing the models, features, algorithms, …]
21
Solution: Share and lean towards production
 • Developing machine learning is iterative
    • Need a short pipeline to rapidly try ideas
    • Want to see output of the complete system
 • So make the application easy to experiment with
    • Share components between online, nearline, and offline
    • Use the real code whenever possible (see the sketch below)
 • Have well-defined interfaces and formats to allow you to go off-the-beaten-path
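For instance, a minimal sketch of sharing real code across environments (hypothetical names): one Scorer interface, implemented once, called by both the offline experimentation harness and the online service:

    from abc import ABC, abstractmethod

    class Scorer(ABC):
        @abstractmethod
        def score(self, user, video) -> float:
            ...

    class PopularityScorer(Scorer):
        # One implementation, shared by experiments and production.
        def score(self, user, video):
            return video.get("popularity", 0.0)

    def rank(scorer, user, videos):
        # The same ranking code runs in both environments.
        return sorted(videos, key=lambda v: scorer.score(user, v), reverse=True)

    # Usage:
    # ranked = rank(PopularityScorer(), user={"id": 1},
    #               videos=[{"id": 7, "popularity": 0.9},
    #                       {"id": 3, "popularity": 0.4}])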
22
Make algorithms extensible and modular
Recommendation 4
23
Make algorithms and models extensible and modular
 • Algorithms often need to be tailored for a specific application
 • Treating an algorithm as a black box is limiting
 • Better to make algorithms extensible and modular to allow for customization
 • Separate models and algorithms (see the sketch below)
    • Many algorithms can learn the same model (e.g. a linear binary classifier)
    • Many algorithms can be trained on the same types of data
 • Support composing algorithms
[Diagram: a single black box taking data and parameters straight to a model, vs. an explicit algorithm that consumes data and parameters and produces the model as a separate object]
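A minimal sketch of that separation (hypothetical code): two different training algorithms produce the same model type, so serving code depends only on the model:

    import numpy as np

    class LinearBinaryClassifier:
        def __init__(self, weights, bias):
            self.weights, self.bias = weights, bias

        def predict(self, x):
            return 1 if x @ self.weights + self.bias > 0 else -1

    def train_perceptron(X, y, epochs=10):
        # Algorithm 1: perceptron updates on misclassified examples.
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (xi @ w + b) <= 0:
                    w, b = w + yi * xi, b + yi
        return LinearBinaryClassifier(w, b)

    def train_logistic_sgd(X, y, epochs=10, lr=0.1):
        # Algorithm 2: SGD on the logistic loss, labels in {-1, +1}.
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                g = -yi / (1.0 + np.exp(yi * (xi @ w + b)))
                w, b = w - lr * g * xi, b - lr * g
        return LinearBinaryClassifier(w, b)

Because both functions return the same model type, serving, persistence, and metric code can be written once against the model and reused across algorithms.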
24
Provide building blocks
 • Don’t start from scratch
    • Linear algebra: Vectors, Matrices, …
    • Statistics: Distributions, tests, …
    • Models, features, metrics, ensembles, …
    • Loss, distance, kernel, … functions
    • Optimization, inference, …
    • Layers, activation functions, …
    • Initializers, stopping criteria, …
    • …
    • Domain-specific components
 • Build abstractions on familiar concepts
 • Make the software put them together
25
Example: Tailoring Random Forests
Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry
[Code screenshot with callouts: use a custom tree split, customize it to run for an hour, report a custom metric each iteration, and inspect the resulting ensemble]
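In generic Python (this is not the Cognitive Foundry API), those customizations might look like the following sketch, where the supplied train_tree carries the custom split criterion and metric is the custom per-iteration report:

    import random
    import time

    def train_forest(train_tree, metric, X, y, time_budget_s=3600):
        ensemble, start = [], time.time()
        while time.time() - start < time_budget_s:   # run for an hour
            # Bootstrap sample, as in standard bagging.
            idx = [random.randrange(len(X)) for _ in range(len(X))]
            ensemble.append(train_tree([X[i] for i in idx], [y[i] for i in idx]))
            # Report a custom metric each iteration.
            print(f"iteration {len(ensemble)}: metric = {metric(ensemble, X, y):.4f}")
        return ensemble  # returned whole, so callers can inspect each tree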
26
Describe your input and output
transformations with your model
Recommendation 5
27
Putting learning in an application
[Diagram: inside the application, a machine-learned model maps ℝᵈ ⟶ ℝᵏ; a feature-encoding step feeds it and an output-decoding step consumes it, with a question mark over where those steps live. Are they application code or model code?]
28
Example: Simple ranking system
 • High-level API: List<Video> rank(User u, List<Video> videos)
 • Example model description file:

    {
      "type": "ScoringRanker",
      "scorer": {
        "type": "FeatureScorer",
        "features": [
          {"type": "Popularity", "days": 10},
          {"type": "PredictedRating"}
        ],
        "function": {
          "type": "Linear",
          "bias": -0.5,
          "weights": {
            "popularity": 0.2,
            "predictedRating": 1.2,
            "predictedRating*popularity": 3.5
          }
        }
      }
    }

The description composes a Ranker around a Scorer, which feeds the declared features into a linear function; weight keys like "predictedRating*popularity" are feature transformations (cross features).
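A hypothetical sketch of interpreting such a description: build the linear scoring function from the declared weights, treating "*" in a weight key as the product of the named feature values:

    import json

    def build_scorer(description):
        function = json.loads(description)["scorer"]["function"]
        bias, weights = function["bias"], function["weights"]

        def score(features):
            total = bias
            for name, weight in weights.items():
                value = 1.0
                for part in name.split("*"):   # cross features multiply
                    value *= features[part]
                total += weight * value
            return total

        return score

    # Usage: rank videos by score, highest first.
    # score = build_scorer(model_json)
    # ranked = sorted(videos, key=lambda v: score(v.features), reverse=True)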
29
Don’t just rely on metrics for testing
Recommendation 6
30
Machine Learning and Testing
 • Temptation: Use validation metrics to test software
 • When things work and metrics go up, this seems great
 • When metrics don’t improve, was it the…
    • code?
    • data?
    • metric?
    • idea?
    • …?
31
Reality of Testing
 • Machine learning code involves intricate math and logic
    • Rounding issues, corner cases, …
    • Is that a + or a -? (The math or the paper could be wrong.)
 • Solution: Unit test (example below)
    • Testing of metric code is especially important
 • Test the whole system: unit testing alone is not enough
    • At a minimum, compare output for unexpected changes across versions
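For instance, a sketch of the kind of metric test meant here (precision@k is a stand-in example, not from the talk):

    import unittest

    def precision_at_k(ranked_ids, relevant_ids, k):
        if k <= 0:
            raise ValueError("k must be positive")
        return sum(1 for i in ranked_ids[:k] if i in relevant_ids) / k

    class PrecisionAtKTest(unittest.TestCase):
        def test_all_relevant(self):
            self.assertEqual(precision_at_k([1, 2, 3], {1, 2, 3}, k=3), 1.0)

        def test_none_relevant(self):
            self.assertEqual(precision_at_k([1, 2, 3], {9}, k=3), 0.0)

        def test_fewer_results_than_k(self):
            # Corner case: missing positions count against the score;
            # pinning this behavior down is exactly the point.
            self.assertEqual(precision_at_k([1], {1}, k=2), 0.5)

    if __name__ == "__main__":
        unittest.main()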
32
Conclusions
33
Two ways to solve computational problems
Software Development:
  Know solution → Write code → Compile code → Test code → Deploy code

Machine Learning (steps may involve Software Development):
  Know relevant data → Develop algorithmic approach → Train model on data using algorithm → Validate model with metrics → Deploy model
34
Take-aways for building machine learning software
 • Building machine learning is an iterative process
 • Make experimentation easy
 • Take a holistic view of the application where you are placing learning
 • Design your algorithms to be modular
 • Look for the easy places to parallelize first
 • Testing can be hard but is worthwhile
35
Thank You
Justin Basilico
jbasilico@netflix.com
@JustinBasilico
We’re hiring


Editor's Notes

  • #14 http://techblog.netflix.com/2013/03/system-architectures-for.html
  • #18 http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html