SlideShare a Scribd company logo
11
Recommendations for
Building Machine Learning Software
Justin Basilico
Page Algorithms Engineering November 13, 2015
@JustinBasilico
SF 2015
22
Introduction
3
Change of focus
2006 2015
4
Netflix Scale
 > 69M members
 > 50 countries
 > 1000 device types
 > 3B hours/month
 36.4% of peak US
downstream traffic
5
Goal
Help members find content to watch and enjoy
to maximize member satisfaction and retention
6
Everything is a Recommendation
Rows
Ranking
Over 80% of what
people watch
comes from our
recommendations
Recommendations
are driven by
Machine Learning
7
Machine Learning Approach
Problem
Data
AlgorithmModel
Metrics
8
Models & Algorithms
 Regression (linear, logistic, elastic net)
 SVD and other Matrix Factorizations
 Factorization Machines
 Restricted Boltzmann Machines
 Deep Neural Networks
 Markov Models and Graph Algorithms
 Clustering
 Latent Dirichlet Allocation
 Gradient Boosted Decision
Trees/Random Forests
 Gaussian Processes
 …
9
Design Considerations
Recommendations
• Personal
• Accurate
• Diverse
• Novel
• Fresh
Software
• Scalable
• Responsive
• Resilient
• Efficient
• Flexible
10
Software Stack
http://techblog.netflix.com
1111
Recommendations
12
Be flexible about where and when
computation happens
Recommendation 1
13
System Architecture
 Offline: Process data
 Batch learning
 Nearline: Process events
 Model evaluation
 Online learning
 Asynchronous
 Online: Process requests
 Real-time
Netflix.Hermes
Netflix.Manhattan
Nearline
Computation
Models
Online
Data Service
Offline Data
Model
training
Online
Computation
Event Distribution
User Event
Queue
Algorithm
Service
UI Client
Member
Query results
Recommendations
NEARLINE
Machine
Learning
Algorithm
Machine
Learning
Algorithm
Offline
Computation Machine
Learning
Algorithm
Play, Rate,
Browse...
OFFLINE
ONLINE
More details on Netflix Techblog
14
Where to place components?
 Example: Matrix Factorization
 Offline:
 Collect sample of play data
 Run batch learning algorithm like
SGD to produce factorization
 Publish video factors
 Nearline:
 Solve user factors
 Compute user-video dot products
 Store scores in cache
 Online:
 Presentation-context filtering
 Serve recommendations
Netflix.Hermes
Netflix.Manhattan
Nearline
Computation
Models
Online
Data Service
Offline Data
Model
training
Online
Computation
Event Distribution
User Event
Queue
Algorithm
Service
UI Client
Member
Query results
Recommendations
NEARLINE
Machine
Learning
Algorithm
Machine
Learning
Algorithm
Offline
Computation Machine
Learning
Algorithm
Play, Rate,
Browse...
OFFLINE
ONLINE
V
sij=uivj Aui=b
sij
X≈UVt
X
sij>t
15
Think about distribution
starting from the outermost levels
Recommendation 2
16
Three levels of Learning Distribution/Parallelization
1. For each subset of the population (e.g.
region)
 Want independently trained and tuned models
2. For each combination of (hyper)parameters
 Simple: Grid search
 Better: Bayesian optimization using Gaussian
Processes
3. For each subset of the training data
 Distribute over machines (e.g. ADMM)
 Multi-core parallelism (e.g. HogWild)
 Or… use GPUs
17
Example: Training Neural Networks
 Level 1: Machines in different
AWS regions
 Level 2: Machines in same AWS
region
 Spearmint or MOE for parameter
optimization
 Mesos, etc. for coordination
 Level 3: Highly optimized, parallel
CUDA code on GPUs
18
Design application software for
experimentation
Recommendation 3
19
Example development process
Idea Data
Offline
Modeling
(R, Python,
MATLAB, …)
Iterate
Implement in
production
system (Java,
C++, …)
Data
discrepancies
Missing post-
processing
logic
Performance
issues
Actual
output
Experimentation environment
Production environment
(A/B test) Code
discrepancies
Final
model
20
Shared Engine
Avoid dual implementations
Experiment
code
Production
code
ProductionExperiment • Models
• Features
• Algorithms
• …
21
Solution: Share and lean towards production
 Developing machine learning is iterative
 Need a short pipeline to rapidly try ideas
 Want to see output of complete system
 So make the application easy to experiment with
 Share components between online, nearline, and offline
 Use the real code whenever possible
 Have well-defined interfaces and formats to allow you to go
off-the-beaten-path
22
Make algorithms extensible and modular
Recommendation 4
23
Make algorithms and models extensible and modular
 Algorithms often need to be tailored for a
specific application
 Treating an algorithm as a black box is
limiting
 Better to make algorithms extensible and
modular to allow for customization
 Separate models and algorithms
 Many algorithms can learn the same model
(i.e. linear binary classifier)
 Many algorithms can be trained on the same
types of data
 Support composing algorithms
Data
Parameters
Data
Model
Parameters
Model
Algorithm
Vs.
24
Provide building blocks
 Don’t start from scratch
 Linear algebra: Vectors, Matrices, …
 Statistics: Distributions, tests, …
 Models, features, metrics, ensembles, …
 Loss, distance, kernel, … functions
 Optimization, inference, …
 Layers, activation functions, …
 Initializers, stopping criteria, …
 …
 Domain-specific components
Build abstractions on
familiar concepts
Make the software put
them together
25
Example: Tailoring Random Forests
Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry
Use a custom
tree split
Customize to
run it for an
hour
Report a
custom metric
each iteration
Inspect the
ensemble
26
Describe your input and output
transformations with your model
Recommendation 5
27
Application
Putting learning in an application
Feature
Encoding
Output
Decoding
?
Machine
Learned Model
Rd ⟶ Rk
Application or model code?
28
Example: Simple ranking system
 High-level API: List<Video> rank(User u, List<Video> videos)
 Example model description file:
{
“type”: “ScoringRanker”,
“scorer”: {
“type”: “FeatureScorer”,
“features”: [
{“type”: “Popularity”, “days”: 10},
{“type”: “PredictedRating”}
],
“function”: {
“type”: “Linear”,
“bias”: -0.5,
“weights”: {
“popularity”: 0.2,
“predictedRating”: 1.2,
“predictedRating*popularity”:
3.5
}
}
}
Ranker
Scorer
Features
Linear function
Feature transformations
29
Don’t just rely on metrics for testing
Recommendation 6
30
Machine Learning and Testing
 Temptation: Use validation metrics to test software
 When things work and metrics go up this seems great
 When metrics don’t improve was it the
 code
 data
 metric
 idea
 …?
31
Reality of Testing
 Machine learning code involves intricate math and logic
 Rounding issues, corner cases, …
 Is that a + or -? (The math or paper could be wrong.)
 Solution: Unit test
 Testing of metric code is especially important
 Test the whole system: Just unit testing is not enough
 At a minimum, compare output for unexpected changes across
versions
3232
Conclusions
33
Two ways to solve computational problems
Know
solution
Write code
Compile
code
Test code Deploy code
Know
relevant
data
Develop
algorithmic
approach
Train model
on data using
algorithm
Validate
model with
metrics
Deploy
model
Software Development
Machine Learning
(steps may involve Software Development)
34
Take-aways for building machine learning software
 Building machine learning is an iterative process
 Make experimentation easy
 Take a holistic view of application where you are placing
learning
 Design your algorithms to be modular
 Look for the easy places to parallelize first
 Testing can be hard but is worthwhile
35
Thank You Justin Basilico
jbasilico@netflix.com
@JustinBasilico
We’re hiring

More Related Content

What's hot

Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
Justin Basilico
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Vishal Chowdhary
 
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the CloudMMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
Xavier Amatriain
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
Natalia Díaz Rodríguez
 
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Sri Ambati
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model Analysis
Vivek Raja P S
 
Modern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesModern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and Practices
Will Gardella
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
Axel de Romblay
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Sri Ambati
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
Gianmario Spacagna
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
Samir Bessalah
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
QuantUniversity
 
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at NetflixMLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
Xavier Amatriain
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
Ning Jiang
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
Yuriy Guts
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
Gülden Bilgütay
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
Ning Jiang
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
Mark Peng
 

What's hot (20)

Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
 
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the CloudMMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
 
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model Analysis
 
Modern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesModern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and Practices
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at NetflixMLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 

Viewers also liked

Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
Justin Basilico
 
BSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data TransformationsBSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data Transformations
BigML, Inc
 
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
BigML, Inc
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Soft Computing
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and Apps
BigML, Inc
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
BigML, Inc
 
VSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic RegressionVSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic Regression
BigML, Inc
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
BigML, Inc
 
Séminaire Expérience Client
Séminaire Expérience ClientSéminaire Expérience Client
Séminaire Expérience Client
Soft Computing
 
Machine Learning et Intelligence Artificielle
Machine Learning et Intelligence ArtificielleMachine Learning et Intelligence Artificielle
Machine Learning et Intelligence Artificielle
Soft Computing
 
Données Personnelles
Données PersonnellesDonnées Personnelles
Données Personnelles
Soft Computing
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
Yves Raimond
 
Phygital
PhygitalPhygital
Phygital
Soft Computing
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Rahul Jain
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Lior Rokach
 

Viewers also liked (15)

Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
BSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data TransformationsBSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data Transformations
 
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and Apps
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
 
VSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic RegressionVSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic Regression
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
 
Séminaire Expérience Client
Séminaire Expérience ClientSéminaire Expérience Client
Séminaire Expérience Client
 
Machine Learning et Intelligence Artificielle
Machine Learning et Intelligence ArtificielleMachine Learning et Intelligence Artificielle
Machine Learning et Intelligence Artificielle
 
Données Personnelles
Données PersonnellesDonnées Personnelles
Données Personnelles
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
 
Phygital
PhygitalPhygital
Phygital
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Similar to Recommendations for Building Machine Learning Software

Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
Data Science Milan
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
Mark Tabladillo
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?
Matei Zaharia
 
Build 2019 Recap
Build 2019 RecapBuild 2019 Recap
Build 2019 Recap
Eran Stiller
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Spark Summit
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
Stepan Pushkarev
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Provectus
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
PyData
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
Adrian Cockcroft
 
Labview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLLabview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRL
Mohammad Sabouri
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & Validation
Elmar Selbach
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for Developers
Mark Tabladillo
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
Databricks
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
Mostafa Majidpour
 
Build, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at ScaleBuild, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at Scale
Amazon Web Services
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
James Anderson
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
Databricks
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
Mark Tabladillo
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
Antoine Sauray
 

Similar to Recommendations for Building Machine Learning Software (20)

Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?
 
Build 2019 Recap
Build 2019 RecapBuild 2019 Recap
Build 2019 Recap
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Labview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLLabview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRL
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & Validation
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for Developers
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
 
Build, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at ScaleBuild, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at Scale
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
 

More from Justin Basilico

Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Justin Basilico
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Justin Basilico
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
Justin Basilico
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Justin Basilico
 

More from Justin Basilico (6)

Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 

Recently uploaded

inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 

Recently uploaded (20)

inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 

Recommendations for Building Machine Learning Software

  • 1. 11 Recommendations for Building Machine Learning Software Justin Basilico Page Algorithms Engineering November 13, 2015 @JustinBasilico SF 2015
  • 4. 4 Netflix Scale  > 69M members  > 50 countries  > 1000 device types  > 3B hours/month  36.4% of peak US downstream traffic
  • 5. 5 Goal Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 6. 6 Everything is a Recommendation Rows Ranking Over 80% of what people watch comes from our recommendations Recommendations are driven by Machine Learning
  • 8. 8 Models & Algorithms  Regression (linear, logistic, elastic net)  SVD and other Matrix Factorizations  Factorization Machines  Restricted Boltzmann Machines  Deep Neural Networks  Markov Models and Graph Algorithms  Clustering  Latent Dirichlet Allocation  Gradient Boosted Decision Trees/Random Forests  Gaussian Processes  …
  • 9. 9 Design Considerations Recommendations • Personal • Accurate • Diverse • Novel • Fresh Software • Scalable • Responsive • Resilient • Efficient • Flexible
  • 12. 12 Be flexible about where and when computation happens Recommendation 1
  • 13. 13 System Architecture  Offline: Process data  Batch learning  Nearline: Process events  Model evaluation  Online learning  Asynchronous  Online: Process requests  Real-time Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE More details on Netflix Techblog
  • 14. 14 Where to place components?  Example: Matrix Factorization  Offline:  Collect sample of play data  Run batch learning algorithm like SGD to produce factorization  Publish video factors  Nearline:  Solve user factors  Compute user-video dot products  Store scores in cache  Online:  Presentation-context filtering  Serve recommendations Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE V sij=uivj Aui=b sij X≈UVt X sij>t
  • 15. 15 Think about distribution starting from the outermost levels Recommendation 2
  • 16. 16 Three levels of Learning Distribution/Parallelization 1. For each subset of the population (e.g. region)  Want independently trained and tuned models 2. For each combination of (hyper)parameters  Simple: Grid search  Better: Bayesian optimization using Gaussian Processes 3. For each subset of the training data  Distribute over machines (e.g. ADMM)  Multi-core parallelism (e.g. HogWild)  Or… use GPUs
  • 17. 17 Example: Training Neural Networks  Level 1: Machines in different AWS regions  Level 2: Machines in same AWS region  Spearmint or MOE for parameter optimization  Mesos, etc. for coordination  Level 3: Highly optimized, parallel CUDA code on GPUs
  • 18. 18 Design application software for experimentation Recommendation 3
  • 19. 19 Example development process Idea Data Offline Modeling (R, Python, MATLAB, …) Iterate Implement in production system (Java, C++, …) Data discrepancies Missing post- processing logic Performance issues Actual output Experimentation environment Production environment (A/B test) Code discrepancies Final model
  • 20. 20 Shared Engine Avoid dual implementations Experiment code Production code ProductionExperiment • Models • Features • Algorithms • …
  • 21. 21 Solution: Share and lean towards production  Developing machine learning is iterative  Need a short pipeline to rapidly try ideas  Want to see output of complete system  So make the application easy to experiment with  Share components between online, nearline, and offline  Use the real code whenever possible  Have well-defined interfaces and formats to allow you to go off-the-beaten-path
  • 22. 22 Make algorithms extensible and modular Recommendation 4
  • 23. 23 Make algorithms and models extensible and modular  Algorithms often need to be tailored for a specific application  Treating an algorithm as a black box is limiting  Better to make algorithms extensible and modular to allow for customization  Separate models and algorithms  Many algorithms can learn the same model (i.e. linear binary classifier)  Many algorithms can be trained on the same types of data  Support composing algorithms Data Parameters Data Model Parameters Model Algorithm Vs.
  • 24. 24 Provide building blocks  Don’t start from scratch  Linear algebra: Vectors, Matrices, …  Statistics: Distributions, tests, …  Models, features, metrics, ensembles, …  Loss, distance, kernel, … functions  Optimization, inference, …  Layers, activation functions, …  Initializers, stopping criteria, …  …  Domain-specific components Build abstractions on familiar concepts Make the software put them together
  • 25. 25 Example: Tailoring Random Forests Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry Use a custom tree split Customize to run it for an hour Report a custom metric each iteration Inspect the ensemble
  • 26. 26 Describe your input and output transformations with your model Recommendation 5
  • 27. 27 Application Putting learning in an application Feature Encoding Output Decoding ? Machine Learned Model Rd ⟶ Rk Application or model code?
  • 28. 28 Example: Simple ranking system  High-level API: List<Video> rank(User u, List<Video> videos)  Example model description file: { “type”: “ScoringRanker”, “scorer”: { “type”: “FeatureScorer”, “features”: [ {“type”: “Popularity”, “days”: 10}, {“type”: “PredictedRating”} ], “function”: { “type”: “Linear”, “bias”: -0.5, “weights”: { “popularity”: 0.2, “predictedRating”: 1.2, “predictedRating*popularity”: 3.5 } } } Ranker Scorer Features Linear function Feature transformations
  • 29. 29 Don’t just rely on metrics for testing Recommendation 6
  • 30. 30 Machine Learning and Testing  Temptation: Use validation metrics to test software  When things work and metrics go up this seems great  When metrics don’t improve was it the  code  data  metric  idea  …?
  • 31. 31 Reality of Testing  Machine learning code involves intricate math and logic  Rounding issues, corner cases, …  Is that a + or -? (The math or paper could be wrong.)  Solution: Unit test  Testing of metric code is especially important  Test the whole system: Just unit testing is not enough  At a minimum, compare output for unexpected changes across versions
  • 33. 33 Two ways to solve computational problems Know solution Write code Compile code Test code Deploy code Know relevant data Develop algorithmic approach Train model on data using algorithm Validate model with metrics Deploy model Software Development Machine Learning (steps may involve Software Development)
  • 34. 34 Take-aways for building machine learning software  Building machine learning is an iterative process  Make experimentation easy  Take a holistic view of application where you are placing learning  Design your algorithms to be modular  Look for the easy places to parallelize first  Testing can be hard but is worthwhile
  • 35. 35 Thank You Justin Basilico jbasilico@netflix.com @JustinBasilico We’re hiring

Editor's Notes

  1. http://techblog.netflix.com/2013/03/system-architectures-for.html
  2. http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html