SlideShare a Scribd company logo
1 of 34
1 
Lessons Learned from 
Building Machine Learning Software at Netflix 
Justin Basilico 
Page Algorithms Engineering December 13, 2014 
@JustinBasilico 
Workshop 2014
2 
Introduction
3 
Introduction 
2006 2014
4 
Netflix Scale 
 > 50M members 
 > 40 countries 
 > 1000 device types 
 Hours: > 2B/month 
 Plays: > 70M/day 
 Log 100B events/day 
 34.2% of peak US 
downstream traffic
5 
Goal 
Help members find content to watch and enjoy 
to maximize member satisfaction and retention
6 
Everything is a Recommendation 
Rows 
Ranking 
Over 75% of what 
people watch 
comes from our 
recommendations 
Recommendations 
are driven by 
Machine Learning
7 
Machine Learning Approach 
Problem 
Data 
Metrics 
Model Algorithm
8 
Models & Algorithms 
 Regression (Linear, logistic, elastic net) 
 SVD and other Matrix Factorizations 
 Factorization Machines 
 Restricted Boltzmann Machines 
 Deep Neural Networks 
 Markov Models and Graph Algorithms 
 Clustering 
 Latent Dirichlet Allocation 
 Gradient Boosted Decision 
Trees/Random Forests 
 Gaussian Processes 
 …
9 
Design Considerations 
Recommendations 
• Personal 
• Accurate 
• Diverse 
• Novel 
• Fresh 
Software 
• Scalable 
• Responsive 
• Resilient 
• Efficient 
• Flexible
10 
Software Stack 
http://techblog.netflix.com
11 
Lessons Learned
12 
Lesson 1: 
Be flexible about where and when 
computation happens.
13 
System Architecture 
 Offline: Process data 
 Nearline: Process events 
 Online: Process requests 
 Learning, Features, or Model 
evaluation can be done at any 
level 
Netflix.Hermes 
Netflix.Manhattan 
Nearline 
Computation 
Models 
Online 
Data Service 
Offline Data 
Model 
training 
Online 
Computation 
Event Distribution 
User Event 
Queue 
Algorithm 
Service 
UI Client 
Member 
Query results 
Recommendations 
NEARLINE 
Machine 
Learning 
Algorithm 
Machine 
Learning 
Algorithm 
Offline 
Computation Machine 
Learning 
Algorithm 
Play, Rate, 
Browse... 
OFFLINE 
ONLINE 
More details on Netflix Techblog
14 
Where to place components? 
 Example: Matrix Factorization 
 Offline: 
 Collect sample of play data 
 Run batch learning algorithm like 
SGD to produce factorization 
 Publish video factors 
 Nearline: 
 Solve user factors 
 Compute user-video dot products 
 Store scores in cache 
 Online: 
 Presentation-context filtering 
 Serve recommendations 
Netflix.Hermes 
Netflix.Manhattan 
X≈UVt 
Nearline 
Computation 
Models 
Online 
Data Service 
Offline Data 
Model 
training 
Online 
Computation 
Event Distribution 
User Event 
Queue 
Algorithm 
Service 
UI Client 
Member 
Query results 
Recommendations 
NEARLINE 
Machine 
Learning 
Algorithm 
Machine 
Learning 
Algorithm 
Offline 
Computation Machine 
Learning 
Algorithm 
Play, Rate, 
Browse... 
OFFLINE 
ONLINE 
V 
sij=uivj Aui=b 
sij 
X 
sij>t
15 
Lesson 2: 
Think about distribution starting from the 
outermost levels.
16 
Three levels of Learning Distribution/Parallelization 
1. For each subset of the population (e.g. 
region) 
 Want independently trained and tuned models 
2. For each combination of (hyper)parameters 
 Simple: Grid search 
 Better: Bayesian optimization using Gaussian 
Processes 
3. For each subset of the training data 
 Distribute over machines (e.g. ADMM) 
 Multi-core parallelism (e.g. HogWild) 
 Or… use GPUs
17 
Example: Training Neural Networks 
 Level 1: Machines in different 
AWS regions 
 Level 2: Machines in same AWS 
region 
 Spearmint or MOE for parameter 
optimization 
 Condor, StarCluster, Mesos, etc. for 
coordination 
 Level 3: Highly optimized, parallel 
CUDA code on GPUs
18 
Lesson 3: 
Design application software for 
experimentation.
19 
Example development process 
Idea Data 
Offline 
Modeling 
(R, Python, 
MATLAB, …) 
Iterate 
Implement in 
production 
system (Java, 
C++, …) 
Data 
discrepancies 
Missing post-processing 
logic 
Performance 
issues 
Actual 
output 
Experimentation environment 
Production environment 
(A/B test) Code 
discrepancies 
Final 
model
20 
Avoid dual implementations 
Shared Engine 
Experiment 
code 
Production 
code 
Experiment Production • Models 
• Features 
• Algorithms 
• …
21 
Solution: Share and lean towards production 
 Developing machine learning is an iterative process 
 Want a short pipeline to rapidly try ideas 
 Want to see output of complete system, not just learned component 
 Make application components easy to experiment with 
 Share them between online, nearline, and offline 
 Make it possible to run individual parts of the software 
 Use the real code whenever possible 
 Have well-defined interfaces and formats to allow you to go 
off-the-beaten path
22 
Lesson 4: 
Make algorithms extensible and modular.
23 
Make algorithms and models extensible and modular 
 Algorithms often need to be tailored for a 
specific application 
 Treating an algorithm as a black box is 
limiting 
 Better to make algorithms extensible and 
modular to allow for customization 
 Separate models and algorithms 
 Many algorithms can learn the same model 
(i.e. linear binary classifier) 
 Many algorithms can be trained on the same 
types of data 
 Support composing algorithms 
Data 
Parameters 
Data 
Model 
Parameters 
Model 
Algorithm 
Vs.
24 
Provide building blocks 
 Don’t start from scratch 
 Linear algebra: Vectors, Matrices, … 
 Statistics: Distributions, tests, … 
 Models, features, metrics, ensembles, … 
 Cost, distance, kernel, … functions 
 Optimization, inference, … 
 Layers, activation functions, … 
 Initializers, stopping criteria, … 
 … 
 Domain-specific components 
Build abstractions on 
familiar concepts 
Make the software put 
them together
25 
Example: Tailoring Random Forests 
Use a custom 
tree split 
Customize to 
run it for an 
hour 
Report a 
custom metric 
each iteration 
Inspect the 
ensemble 
Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry
26 
Lesson 5: 
Describe your input and output 
transformations with your model.
27 
Putting learning in an application 
Application 
Application or model code? 
Feature 
Encoding 
Output 
Decoding 
? Machine 
Learned Model 
Rd ⟶ Rk
28 
Example: Simple ranking system 
 High-level API: List<Video> rank(User u, List<Video> videos) 
 Example model description file: 
{ 
“type”: “ScoringRanker”, 
“scorer”: { 
“type”: “FeatureScorer”, 
“features”: [ 
{“type”: “Popularity”, “days”: 10}, 
{“type”: “PredictedRating”} 
], 
“function”: { 
“type”: “Linear”, 
“bias”: -0.5, 
“weights”: { 
“popularity”: 0.2, 
“predictedRating”: 1.2, 
“predictedRating*popularity”: 
3.5 
} 
} 
} 
} 
Ranker 
Scorer 
Features 
Linear function 
Feature transformations
29 
Lesson 6: 
Don’t just rely on metrics for testing.
30 
Importance of Testing 
 Temptation: Use validation metrics to test software 
 When things work this seems great 
 When metrics don’t improve: was it the code, data, metric, idea, …? 
 Machine learning code involves intricate math and logic 
 Rounding issues, corner cases, … 
 Is that a + or -? (The math or paper could be wrong.) 
 Solution: Unit test 
 Testing of metric code is especially important 
 Test the whole system 
 Compare output for unexpected changes across versions
31 
Conclusions
32 
Two ways to solve computational problems 
Know 
solution 
Write code 
Compile 
code 
Test code Deploy code 
Know 
relevant 
data 
Develop 
algorithmic 
approach 
Train model 
on data using 
algorithm 
Validate 
model with 
metrics 
Deploy 
model 
Software Development 
Machine Learning 
(steps may involve Software Development)
33 
Take-aways for building machine learning software 
 Building machine learning is an iterative process 
 Make experimentation easy 
 Take a holistic view of both the application and experimental 
environments 
 Optimize only what matters 
 Testing can be hard but is worthwhile
Thank You Justin Basilico 
jbasilico@netflix.com 
34 @JustinBasilico 
We’re hiring

More Related Content

What's hot

Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at NetflixJustin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsYves Raimond
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at NetflixLinas Baltrunas
 
Recommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixRecommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixJiangwei Pan
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Justin Basilico
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectiveXavier Amatriain
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at NetflixLinas Baltrunas
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelFaisal Siddiqi
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyChris Johnson
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsYves Raimond
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Spark Summit
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the WorldYves Raimond
 

What's hot (20)

Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
 
Recommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at NetflixRecommendation Modeling with Impression Data at Netflix
Recommendation Modeling with Impression Data at Netflix
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
 

Viewers also liked

Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Building Recommender Systems for Fashion
Building Recommender Systems for FashionBuilding Recommender Systems for Fashion
Building Recommender Systems for FashionNick Landia
 
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Justin Basilico
 
Personalization Challenges in E-Learning
Personalization Challenges in E-LearningPersonalization Challenges in E-Learning
Personalization Challenges in E-LearningRoberto Turrin
 
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Benjamin Le
 
Bootstrapping a Destination Recommendation Engine
Bootstrapping a Destination Recommendation EngineBootstrapping a Destination Recommendation Engine
Bootstrapping a Destination Recommendation EngineNeal Lathia
 

Viewers also liked (6)

Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Building Recommender Systems for Fashion
Building Recommender Systems for FashionBuilding Recommender Systems for Fashion
Building Recommender Systems for Fashion
 
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
Is that a Time Machine? Some Design Patterns for Real World Machine Learning ...
 
Personalization Challenges in E-Learning
Personalization Challenges in E-LearningPersonalization Challenges in E-Learning
Personalization Challenges in E-Learning
 
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
 
Bootstrapping a Destination Recommendation Engine
Bootstrapping a Destination Recommendation EngineBootstrapping a Destination Recommendation Engine
Bootstrapping a Destination Recommendation Engine
 

Similar to Lessons Learned from Building Machine Learning Software at Netflix

Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusoneDotNetCampus
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATADotNetCampus
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019Mark Tabladillo
 
Machine learning for sensor Data Analytics
Machine learning for sensor Data AnalyticsMachine learning for sensor Data Analytics
Machine learning for sensor Data AnalyticsMATLABISRAEL
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?Matei Zaharia
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 antimo musone
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Building an ML model with zero code
Building an ML model with zero codeBuilding an ML model with zero code
Building an ML model with zero codeNick Trogh
 
Deep Learning with CNTK
Deep Learning with CNTKDeep Learning with CNTK
Deep Learning with CNTKAshish Jaiman
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for DevelopersMark Tabladillo
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?Jordi Cabot
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & ValidationElmar Selbach
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Scaling up Machine Learning Development
Scaling up Machine Learning DevelopmentScaling up Machine Learning Development
Scaling up Machine Learning DevelopmentMatei Zaharia
 

Similar to Lessons Learned from Building Machine Learning Software at Netflix (20)

Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
 
Machine learning for sensor Data Analytics
Machine learning for sensor Data AnalyticsMachine learning for sensor Data Analytics
Machine learning for sensor Data Analytics
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Building an ML model with zero code
Building an ML model with zero codeBuilding an ML model with zero code
Building an ML model with zero code
 
Deep Learning with CNTK
Deep Learning with CNTKDeep Learning with CNTK
Deep Learning with CNTK
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for Developers
 
DEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNINGDEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNING
 
Build 2019 Recap
Build 2019 RecapBuild 2019 Recap
Build 2019 Recap
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Introduction to ML.NET
Introduction to ML.NETIntroduction to ML.NET
Introduction to ML.NET
 
Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & Validation
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Scaling up Machine Learning Development
Scaling up Machine Learning DevelopmentScaling up Machine Learning Development
Scaling up Machine Learning Development
 

Recently uploaded

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

Lessons Learned from Building Machine Learning Software at Netflix

  • 1. 1 Lessons Learned from Building Machine Learning Software at Netflix Justin Basilico Page Algorithms Engineering December 13, 2014 @JustinBasilico Workshop 2014
  • 4. 4 Netflix Scale  > 50M members  > 40 countries  > 1000 device types  Hours: > 2B/month  Plays: > 70M/day  Log 100B events/day  34.2% of peak US downstream traffic
  • 5. 5 Goal Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 6. 6 Everything is a Recommendation Rows Ranking Over 75% of what people watch comes from our recommendations Recommendations are driven by Machine Learning
  • 7. 7 Machine Learning Approach Problem Data Metrics Model Algorithm
  • 8. 8 Models & Algorithms  Regression (Linear, logistic, elastic net)  SVD and other Matrix Factorizations  Factorization Machines  Restricted Boltzmann Machines  Deep Neural Networks  Markov Models and Graph Algorithms  Clustering  Latent Dirichlet Allocation  Gradient Boosted Decision Trees/Random Forests  Gaussian Processes  …
  • 9. 9 Design Considerations Recommendations • Personal • Accurate • Diverse • Novel • Fresh Software • Scalable • Responsive • Resilient • Efficient • Flexible
  • 10. 10 Software Stack http://techblog.netflix.com
  • 12. 12 Lesson 1: Be flexible about where and when computation happens.
  • 13. 13 System Architecture  Offline: Process data  Nearline: Process events  Online: Process requests  Learning, Features, or Model evaluation can be done at any level Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE More details on Netflix Techblog
  • 14. 14 Where to place components?  Example: Matrix Factorization  Offline:  Collect sample of play data  Run batch learning algorithm like SGD to produce factorization  Publish video factors  Nearline:  Solve user factors  Compute user-video dot products  Store scores in cache  Online:  Presentation-context filtering  Serve recommendations Netflix.Hermes Netflix.Manhattan X≈UVt Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE V sij=uivj Aui=b sij X sij>t
  • 15. 15 Lesson 2: Think about distribution starting from the outermost levels.
  • 16. 16 Three levels of Learning Distribution/Parallelization 1. For each subset of the population (e.g. region)  Want independently trained and tuned models 2. For each combination of (hyper)parameters  Simple: Grid search  Better: Bayesian optimization using Gaussian Processes 3. For each subset of the training data  Distribute over machines (e.g. ADMM)  Multi-core parallelism (e.g. HogWild)  Or… use GPUs
  • 17. 17 Example: Training Neural Networks  Level 1: Machines in different AWS regions  Level 2: Machines in same AWS region  Spearmint or MOE for parameter optimization  Condor, StarCluster, Mesos, etc. for coordination  Level 3: Highly optimized, parallel CUDA code on GPUs
  • 18. 18 Lesson 3: Design application software for experimentation.
  • 19. 19 Example development process Idea Data Offline Modeling (R, Python, MATLAB, …) Iterate Implement in production system (Java, C++, …) Data discrepancies Missing post-processing logic Performance issues Actual output Experimentation environment Production environment (A/B test) Code discrepancies Final model
  • 20. 20 Avoid dual implementations Shared Engine Experiment code Production code Experiment Production • Models • Features • Algorithms • …
  • 21. 21 Solution: Share and lean towards production  Developing machine learning is an iterative process  Want a short pipeline to rapidly try ideas  Want to see output of complete system, not just learned component  Make application components easy to experiment with  Share them between online, nearline, and offline  Make it possible to run individual parts of the software  Use the real code whenever possible  Have well-defined interfaces and formats to allow you to go off-the-beaten path
  • 22. 22 Lesson 4: Make algorithms extensible and modular.
  • 23. 23 Make algorithms and models extensible and modular  Algorithms often need to be tailored for a specific application  Treating an algorithm as a black box is limiting  Better to make algorithms extensible and modular to allow for customization  Separate models and algorithms  Many algorithms can learn the same model (i.e. linear binary classifier)  Many algorithms can be trained on the same types of data  Support composing algorithms Data Parameters Data Model Parameters Model Algorithm Vs.
  • 24. 24 Provide building blocks  Don’t start from scratch  Linear algebra: Vectors, Matrices, …  Statistics: Distributions, tests, …  Models, features, metrics, ensembles, …  Cost, distance, kernel, … functions  Optimization, inference, …  Layers, activation functions, …  Initializers, stopping criteria, …  …  Domain-specific components Build abstractions on familiar concepts Make the software put them together
  • 25. 25 Example: Tailoring Random Forests Use a custom tree split Customize to run it for an hour Report a custom metric each iteration Inspect the ensemble Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry
  • 26. 26 Lesson 5: Describe your input and output transformations with your model.
  • 27. 27 Putting learning in an application Application Application or model code? Feature Encoding Output Decoding ? Machine Learned Model Rd ⟶ Rk
  • 28. 28 Example: Simple ranking system  High-level API: List<Video> rank(User u, List<Video> videos)  Example model description file: { “type”: “ScoringRanker”, “scorer”: { “type”: “FeatureScorer”, “features”: [ {“type”: “Popularity”, “days”: 10}, {“type”: “PredictedRating”} ], “function”: { “type”: “Linear”, “bias”: -0.5, “weights”: { “popularity”: 0.2, “predictedRating”: 1.2, “predictedRating*popularity”: 3.5 } } } } Ranker Scorer Features Linear function Feature transformations
  • 29. 29 Lesson 6: Don’t just rely on metrics for testing.
  • 30. 30 Importance of Testing  Temptation: Use validation metrics to test software  When things work this seems great  When metrics don’t improve: was it the code, data, metric, idea, …?  Machine learning code involves intricate math and logic  Rounding issues, corner cases, …  Is that a + or -? (The math or paper could be wrong.)  Solution: Unit test  Testing of metric code is especially important  Test the whole system  Compare output for unexpected changes across versions
  • 32. 32 Two ways to solve computational problems Know solution Write code Compile code Test code Deploy code Know relevant data Develop algorithmic approach Train model on data using algorithm Validate model with metrics Deploy model Software Development Machine Learning (steps may involve Software Development)
  • 33. 33 Take-aways for building machine learning software  Building machine learning is an iterative process  Make experimentation easy  Take a holistic view of both the application and experimental environments  Optimize only what matters  Testing can be hard but is worthwhile
  • 34. Thank You Justin Basilico jbasilico@netflix.com 34 @JustinBasilico We’re hiring

Editor's Notes

  1. http://techblog.netflix.com/2013/03/system-architectures-for.html
  2. http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html
  3. http://jobs.netflix.com/jobs.php?id=NFX01267