Making Netflix Machine Learning Algorithms Reliable
Tony Jebara & Justin Basilico
ICML Reliable ML in the Wild Workshop
2017-08-11
2006
6M members
US only
2017
> 100M members
> 190 countries
Goal:
Help members find content to watch and enjoy to
maximize member satisfaction and retention
Algorithm Areas
▪ Personalized Ranking
▪ Top-N Ranking
▪ Trending Now
▪ Continue Watching
▪ Video-Video Similarity
▪ Personalized Page Generation
▪ Search
▪ Personalized Image Selection
▪ ...
Models & Algorithms
▪ Regression (linear, logistic, ...)
▪ Matrix Factorization
▪ Factorization Machines
▪ Clustering & Topic Models
▪ Bayesian Nonparametrics
▪ Tree Ensembles (RF, GBDTs, …)
▪ Neural Networks (Deep, RBMs, …)
▪ Gaussian Processes
▪ Bandits
▪ …
A/B tests validate an overall approach works in
expectation
But they run in online production, so every A/B
tested model needs to be reliable
Innovation Cycle
[Cycle diagram: Idea → Offline Experiment → Online A/B Test → Full Online Deployment]
1) Collect massive data sets
2) Try billions of hypotheses to find* one(s) with support
*Find with computational and statistical efficiency
Batch Learning
[Figure: USERS vs. TIME, with phases Collect Data → Learn Model → A/B Test → Roll-out; cells labeled A and B, and a region labeled REGRET]
Explore and exploit → Less regret
Helps cold start models in the wild
Maintain some exploration for nonstationarity
Adapt reliably to changing and new data
Epsilon-greedy, UCB, Thompson Sampling, etc.
Bandit Learning
[Figure: USERS vs. TIME grid with x marks scattered across it]
1) Uniform population of hypotheses
2) Choose a random hypothesis h
3) Act according to h and observe outcome
4) Re-weight hypotheses
5) Go to 2
Thompson Sampling (TS)
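The five steps above can be sketched as a Beta-Bernoulli Thompson Sampling loop. This is a minimal illustration, not the production system: it assumes binary rewards and one independent success rate per arm, and the arm names and the `true_rates` simulator are hypothetical.

```python
import random

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling over a fixed set of arms.

    true_rates is a hypothetical simulator: arm name -> success probability.
    Returns how many times each arm was chosen.
    """
    rng = random.Random(seed)
    arms = list(true_rates)
    # 1) Uniform prior over hypotheses: Beta(1, 1) for every arm.
    wins = {a: 1 for a in arms}
    losses = {a: 1 for a in arms}
    pulls = {a: 0 for a in arms}
    for _ in range(n_rounds):
        # 2) Choose a random hypothesis: one sampled success rate per arm.
        sampled = {a: rng.betavariate(wins[a], losses[a]) for a in arms}
        # 3) Act according to it and observe a (simulated) outcome.
        chosen = max(arms, key=sampled.get)
        reward = rng.random() < true_rates[chosen]
        pulls[chosen] += 1
        # 4) Re-weight hypotheses: update the chosen arm's posterior.
        if reward:
            wins[chosen] += 1
        else:
            losses[chosen] += 1
        # 5) Go to 2 (next loop iteration).
    return pulls

pulls = thompson_sampling({"image_a": 0.05, "image_b": 0.30}, n_rounds=2000)
```

As the posterior sharpens, most traffic should shift to the better arm while the weaker arm still receives occasional exploratory pulls.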
Bandits for selecting images
Different preferences for genre/theme portrayed
Contextual bandits to personalize images
Different preferences for cast members
Contextual bandits to personalize images
Putting Machine
Learning In Production
“Typical” ML Pipeline: A tale of two worlds
[Diagram: two regions, Training Pipeline and Application, with Offline and Online labels. Boxes: Historical Data, Generate Features, Collect Labels, Train Models, Validate & Select Models, Evaluate Model, Publish Model, Load Model, Live Data, Application Logic, Experimentation]
Offline: Model trains reliably
Online: Model performs reliably
… plus all involved systems and data
What needs to be reliable?
Detection
Response
Prevention
Reliability Approach
Reliability in Training
To be reliable, first learning must be repeatable
Automate retraining of models
Akin to continuous deployment in software engineering
How often depends on application; typically daily
Detect problems and fail fast to avoid using a bad model
Retraining
Example Training Pipeline
● Periodic retraining to refresh models
● Workflow system to manage pipeline
● Each step is a Spark or Docker job
● Each step has checks in place
○ Response: Stop workflow and send alerts
What to check?
● Check both absolute values and changes relative to previous runs:
● Offline metrics on hold-out set
● Data size for train and test
● Feature distributions
● Feature importance
● Large (unexpected) changes in output between models
● Model coverage (e.g. number of items it can predict)
● Model integrity
● Error counters
● ...
Example 1: number of samples
[Plot: number of training samples over time]
Alarm fires and model is not published due to anomaly
Example 2: offline metrics
[Plot: lift w.r.t. baseline over time]
Alarm fires and model is not published due to anomaly
Absolute threshold: if lift < X then alarm also fires
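Both examples combine a relative check against previous runs with an optional absolute floor. A minimal sketch of such a publish gate follows; the function name, the 3-sigma rule, and the parameter names are assumptions for illustration, not the checks Netflix actually runs.

```python
from statistics import mean, stdev

def should_publish(history, current, abs_min=None, max_sigma=3.0):
    """Return (ok, reason) for one metric from the current training run.

    history:   values of the same metric from previous runs.
    abs_min:   optional absolute floor (the "X" in the lift example).
    max_sigma: allowed deviation from the historical mean, in std devs
               (an assumed anomaly rule).
    """
    # Absolute check: fire the alarm if the metric is below the floor.
    if abs_min is not None and current < abs_min:
        return False, f"absolute: {current} < {abs_min}"
    # Relative check: compare against the spread of previous runs.
    if len(history) >= 2:
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(current - mu) > max_sigma * sigma:
            return False, f"relative: {current} deviates from mean {mu:.3g}"
    return True, "ok"

sample_counts = [1_000_000, 1_020_000, 990_000, 1_010_000]
ok, reason = should_publish(sample_counts, 400_000)
```

When `ok` is false the workflow would stop and alert instead of publishing the model.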
Testing
- Unit testing and code coverage
- Integration testing
Improve quality and reliability of upstream data feeds
Reuse same data and code offline and online to avoid
training/serving skew
Run training from multiple random seeds
Preventing offline training failures
Reliability Online
Check for invariants in inputs and outputs
Can catch many unexpected changes, bad assumptions
and engineering issues
Examples:
- Feature values are out-of-range
- Output is NaN or Infinity
- Probabilities are < 0 or > 1 or don’t sum to 1
Basic sanity checks
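The example invariants above could be checked at serving time roughly as follows. This is a sketch under stated assumptions: `feature_ranges` is a hypothetical table of allowed ranges covering every feature, and the model output is assumed to be a list of class probabilities.

```python
import math

def check_invariants(features, probs, feature_ranges, tol=1e-6):
    """Return a list of violated invariants (empty list means all passed).

    Assumes every key in `features` has an entry in `feature_ranges`.
    """
    problems = []
    # Input invariant: feature values are finite and within range.
    for name, value in features.items():
        lo, hi = feature_ranges[name]
        if not math.isfinite(value) or not lo <= value <= hi:
            problems.append(f"feature {name}={value} outside [{lo}, {hi}]")
    # Output invariants: no NaN or Infinity, each probability in [0, 1].
    for i, p in enumerate(probs):
        if not math.isfinite(p):
            problems.append(f"output {i} is NaN or infinite")
        elif p < 0 or p > 1:
            problems.append(f"probability {i}={p} outside [0, 1]")
    # Output invariant: probabilities sum to 1 (only if all are finite).
    if all(math.isfinite(p) for p in probs) and abs(sum(probs) - 1.0) > tol:
        problems.append(f"probabilities sum to {sum(probs)}, not 1")
    return problems
```

A non-empty result is a signal to alert and, if needed, switch to a fallback.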
Online staleness checks can cover a wide range of failures in training and publishing
Example checks:
- How long ago was the most recent model published?
- How old is the data it was trained on?
- How old are the inputs it is consuming?
Model and input staleness
[Plot: age of model, with a stale model marked]
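The three staleness questions map directly onto timestamp comparisons. A minimal sketch follows; the function name and the age limits are hypothetical defaults, and real limits depend on how often the model is retrained.

```python
from datetime import datetime, timedelta, timezone

def staleness_report(now, published_at, train_data_end, newest_input_at,
                     max_model_age=timedelta(days=2),
                     max_data_age=timedelta(days=3),
                     max_input_age=timedelta(hours=6)):
    """Map each staleness check to True when its (assumed) limit is exceeded."""
    return {
        # How long ago was the most recent model published?
        "model_stale": now - published_at > max_model_age,
        # How old is the data it was trained on?
        "training_data_stale": now - train_data_end > max_data_age,
        # How old are the inputs it is consuming?
        "inputs_stale": now - newest_input_at > max_input_age,
    }

now = datetime(2017, 8, 11, 12, 0, tzinfo=timezone.utc)
report = staleness_report(
    now,
    published_at=now - timedelta(days=1),
    train_data_end=now - timedelta(days=5),
    newest_input_at=now - timedelta(hours=1),
)
```

Here only the training data is flagged, which would point at a problem upstream of training rather than in publishing.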
Track the quality of the model
- Compare prediction to actual
behavior
- Online equivalents of offline metrics
For online learning or bandits reserve a
fraction of data for a simple policy (e.g.
epsilon-greedy) as a sanity check
Online metrics
Your model isn’t working right or your input data
is bad: Now what?
Common approaches in personalization space:
- Use previous model?
- Use previous output?
- Use simplified model or heuristic?
- If calling system is resilient: turn off that
subsystem?
Response: Graceful Degradation
Want to choose personalized images per profile
- Image lookup has O(10M) requests per second (e.g.
“House of Cards” -> Image URL)
Approach:
- Precompute show-to-image mapping per user in a near-line system
- Store mapping in fast distributed cache
- At request time:
  - Look up user-specific mapping in cache
  - Fall back to unpersonalized results
  - Store mapping for request
  - Secondary fallback to default image for a missing show
Example: Image Precompute
[Diagram: fallback order: Personalized Selection (Precompute) → Unpersonalized → Default Image]
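The request-time fallback chain can be sketched as below. Plain dictionaries stand in for the distributed cache, all names are hypothetical, and the "store mapping for request" step is left out for brevity.

```python
def select_image(user_id, show_id, personalized_cache, unpersonalized,
                 default_image):
    """Return (image_url, source) using a three-level fallback chain."""
    # Primary: precomputed user-specific mapping from the cache.
    user_mapping = personalized_cache.get(user_id)
    if user_mapping and show_id in user_mapping:
        return user_mapping[show_id], "personalized"
    # Fallback: unpersonalized result for the show.
    if show_id in unpersonalized:
        return unpersonalized[show_id], "unpersonalized"
    # Secondary fallback: default image for a missing show.
    return default_image, "default"

cache = {"user_1": {"House of Cards": "hoc_personal.jpg"}}
unpersonalized = {"House of Cards": "hoc_generic.jpg"}
```

Returning the source alongside the URL makes it easy to count how often each fallback level is hit, which is itself a useful online metric.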
Netflix runs 100% in AWS cloud
Needs to be reliable on unreliable infrastructure
Want a service to operate when an AWS instance
fails?
Randomly terminate instances (Chaos Monkey)
Want Netflix to operate when an entire AWS region
is having a problem?
Disable entire AWS regions (Chaos Kong)
Prevention: Failure Injection
What failure scenarios do you want your model to be
robust to?
Models are very sensitive to their input data
- Can be noisy, corrupt, delayed, incomplete, missing,
unknown
Train model to be resilient by injecting these conditions
into the training and testing data
Failure Injection for ML
Want to add a feature for type of row on homepage
Genre, Because you watched, Top picks, ...
Problem: Model may see new types online before they appear in training data
Solution: Add an unknown category and perturb a small fraction of
training data to that type
Rule-of-thumb: Apply to all categorical features unless a new
category isn’t possible (days of week -> OK to skip, countries -> not)
Example: Categorical features
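A minimal sketch of the unknown-category trick follows. The token name, the 1% default fraction, and the dict-per-row data layout are assumptions for illustration.

```python
import random

UNKNOWN = "<unknown>"  # hypothetical reserved token for unseen categories

def inject_unknown(rows, feature, fraction=0.01, seed=0):
    """Return copies of rows with `feature` replaced by UNKNOWN in roughly
    `fraction` of them, so the model learns a weight for that category."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        row = dict(row)  # copy so the original training data is untouched
        if rng.random() < fraction:
            row[feature] = UNKNOWN
        out.append(row)
    return out

def encode(value, vocabulary):
    """At serving time, map categories unseen in training to UNKNOWN."""
    return value if value in vocabulary else UNKNOWN

rows = [{"row_type": "genre"} for _ in range(1000)]
perturbed = inject_unknown(rows, "row_type", fraction=0.05)
```

With this in place, a row type that first appears online is scored through the unknown category instead of hitting an unhandled value.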
Conclusions.
● Consider online learning and bandits
● Build off best practices from software engineering
● Automate as much as possible
● Adjust your data to cover the conditions you want
your model to be resilient to
● Detect problems and degrade gracefully
Takeaways
Thank you.
Tony Jebara & Justin Basilico
Yes, we are hiring!

 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Making Netflix Machine Learning Algorithms Reliable

  • 1. Making Netflix Machine Learning Algorithms Reliable Tony Jebara & Justin Basilico ICML Reliable ML in the Wild Workshop 2017-08-11
  • 3. 2017: > 100M members, > 190 countries
  • 4. Goal: Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 5. Algorithm Areas ▪ Personalized Ranking ▪ Top-N Ranking ▪ Trending Now ▪ Continue Watching ▪ Video-Video Similarity ▪ Personalized Page Generation ▪ Search ▪ Personalized Image Selection ▪ ...
  • 6. Models & Algorithms ▪ Regression (linear, logistic, ...) ▪ Matrix Factorization ▪ Factorization Machines ▪ Clustering & Topic Models ▪ Bayesian Nonparametrics ▪ Tree Ensembles (RF, GBDTs, …) ▪ Neural Networks (Deep, RBMs, …) ▪ Gaussian Processes ▪ Bandits ▪ …
  • 7. Innovation Cycle (Idea, Offline Experiment, Online A/B Test, Full Online Deployment): A/B tests validate that an overall approach works in expectation. But they run in online production, so every A/B tested model needs to be reliable.
  • 8. Batch Learning: 1) Collect massive data sets. 2) Try billions of hypotheses to find* one(s) with support. *Find with computational and statistical efficiency.
  • 9. Batch Learning [Diagram: users vs. time, with phases Collect Data, Learn Model, A/B Test (A, B), Roll-out]
  • 11. Bandit Learning: Explore and exploit → less regret. Helps cold start models in the wild. Maintain some exploration for nonstationarity. Adapt reliably to changing and new data. Epsilon-greedy, UCB, Thompson Sampling, etc. [Diagram: users vs. time]
  • 12. Bandit Learning, Thompson Sampling (TS): 1) Uniform population of hypotheses. 2) Choose a random hypothesis h. 3) Act according to h and observe outcome. 4) Re-weight hypotheses. 5) Go to 2. [Diagram: users vs. time]
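
The five Thompson Sampling steps on slide 12 can be sketched for the simplest case, a Bernoulli bandit with a Beta posterior per arm. This is an illustrative assumption, not the model described in the talk: the function name `thompson_sampling`, the arm success rates, and the Beta(1, 1) prior are all hypothetical.

```python
import random

def thompson_sampling(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling over a fixed set of arms (sketch)."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    # Step 1: uniform prior, Beta(1, 1), over each arm's success rate.
    successes = [1] * n_arms
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Step 2: draw one random hypothesis (a sampled rate per arm).
        sampled = [rng.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        # Step 3: act according to that hypothesis and observe the outcome.
        arm = max(range(n_arms), key=lambda a: sampled[a])
        reward = rng.random() < true_rates[arm]
        pulls[arm] += 1
        # Step 4: re-weight hypotheses via the posterior update.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        # Step 5: loop back to step 2.
    return pulls

# Hypothetical arms: three images with different (unknown) play rates.
pulls = thompson_sampling([0.05, 0.10, 0.30], rounds=5000)
print(pulls)
```

Over enough rounds most pulls should concentrate on the best arm, while the weaker arms still receive occasional exploratory traffic.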
  • 15. Contextual bandits to personalize images: Different preferences for genre/theme portrayed
  • 16. Contextual bandits to personalize images: Different preferences for cast members
  • 18. “Typical” ML Pipeline: A tale of two worlds [Diagram labels: Training Pipeline, Application, Offline, Online, Historical Data, Generate Features, Train Models, Validate & Select Models, Publish Model, Load Model, Live Data, Application Logic, Collect Labels, Evaluate Model, Experimentation]
  • 19. What needs to be reliable? Offline: Model trains reliably. Online: Model performs reliably. … plus all involved systems and data.
  • 22. Retraining: To be reliable, first learning must be repeatable. Automate retraining of models. Akin to continuous deployment in software engineering. How often depends on application; typically daily. Detect problems and fail fast to avoid using a bad model.
  • 23. Example Training Pipeline ● Periodic retraining to refresh models ● Workflow system to manage pipeline ● Each step is a Spark or Docker job ● Each step has checks in place ○ Response: Stop workflow and send alerts
  • 24. What to check? ● For both absolute and relative to previous runs... ● Offline metrics on hold-out set ● Data size for train and test ● Feature distributions ● Feature importance ● Large (unexpected) changes in output between models ● Model coverage (e.g. number of items it can predict) ● Model integrity ● Error counters ● ...
  • 25. Example 1: number of samples [Plot: number of training samples vs. time]
  • 26. Example 1: number of samples [Plot: number of training samples vs. time]. Alarm fires and model is not published due to anomaly.
  • 27. Example 2: offline metrics [Plot: lift w.r.t. baseline vs. time]
  • 28. Example 2: offline metrics [Plot: lift w.r.t. baseline vs. time]. Alarm fires and model is not published due to anomaly. Absolute threshold: if lift < X then alarm also fires.
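
The two examples above combine a check relative to previous runs with an absolute threshold. A minimal sketch of such a publish gate follows. The slides do not say how anomalies are detected, so the z-score rule, the function name `should_publish`, and all thresholds here are assumptions.

```python
from statistics import mean, stdev

def should_publish(history, current, min_value, max_sigma=3.0):
    """Return (ok, reason) for a new run's metric given previous runs."""
    # Absolute check: the metric must clear a fixed floor ("if lift < X").
    if current < min_value:
        return False, "below absolute threshold"
    # Relative check (assumed rule): flag values far from previous runs.
    if len(history) >= 2:
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(current - mu) > max_sigma * sigma:
            return False, "anomalous relative to previous runs"
    return True, "ok"

# Hypothetical daily training-sample counts from earlier runs.
sample_counts = [1_000_000, 1_020_000, 990_000, 1_010_000]
print(should_publish(sample_counts, 400_000, min_value=100_000))
print(should_publish(sample_counts, 1_005_000, min_value=100_000))
```

A workflow step would call this after training and, on a `False` result, stop the workflow and send an alert instead of publishing the model.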
  • 29. Preventing offline training failures: Testing (unit testing and code coverage; integration testing). Improve quality and reliability of upstream data feeds. Reuse same data and code offline and online to avoid training/serving skew. Running from multiple random seeds.
  • 31. Basic sanity checks: Check for invariants in inputs and outputs. Can catch many unexpected changes, bad assumptions and engineering issues. Examples: feature values are out-of-range; output is NaN or Infinity; probabilities are < 0 or > 1 or don’t sum to 1.
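
The invariant checks listed on slide 31 are simple to express in code. The sketch below is one possible shape. The helper names, the feature names, and the valid ranges are hypothetical.

```python
import math

def check_features(features, valid_ranges):
    """Return names of features that are missing, non-finite or out of range."""
    bad = []
    for name, (low, high) in valid_ranges.items():
        value = features.get(name)
        if value is None or not math.isfinite(value) or not (low <= value <= high):
            bad.append(name)
    return bad

def check_probabilities(probs, tol=1e-6):
    """True if every entry is a finite value in [0, 1] and they sum to 1."""
    if any(not math.isfinite(p) or p < 0.0 or p > 1.0 for p in probs):
        return False
    return abs(sum(probs) - 1.0) <= tol

# Hypothetical feature ranges for illustration.
ranges = {"days_since_signup": (0, 10_000), "play_rate": (0.0, 1.0)}
print(check_features({"days_since_signup": -3, "play_rate": 0.2}, ranges))
print(check_probabilities([0.2, 0.3, 0.5]))
```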
  • 32. Model and input staleness: Online staleness checks can cover a wide range of failures in training and publishing. Example checks: How long ago was the most recent model published? How old is the data it was trained on? How old are the inputs it is consuming? [Plot: age of model, with stale model flagged]
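
The three staleness questions on slide 32 translate directly into timestamp comparisons. A minimal sketch, in which the function name and the age limits are assumed for illustration:

```python
from datetime import datetime, timedelta, timezone

def staleness_alerts(now, published_at, training_data_end, newest_input_at,
                     max_model_age=timedelta(days=2),
                     max_data_age=timedelta(days=3),
                     max_input_age=timedelta(hours=6)):
    """Return a list of stale components; an empty list means all fresh."""
    alerts = []
    if now - published_at > max_model_age:        # model published too long ago
        alerts.append("model")
    if now - training_data_end > max_data_age:    # trained on old data
        alerts.append("training_data")
    if now - newest_input_at > max_input_age:     # consuming old inputs
        alerts.append("inputs")
    return alerts

now = datetime(2017, 8, 11, 12, 0, tzinfo=timezone.utc)
alerts = staleness_alerts(
    now,
    published_at=now - timedelta(days=5),
    training_data_end=now - timedelta(days=6),
    newest_input_at=now - timedelta(hours=1),
)
print(alerts)
```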
  • 33. Online metrics: Track the quality of the model. Compare prediction to actual behavior. Online equivalents of offline metrics. For online learning or bandits, reserve a fraction of data for a simple policy (e.g. epsilon-greedy) as a sanity check.
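
One way to reserve a fraction of traffic for a simple policy is a deterministic hash of the member identifier, so the same member always lands in the same group. This is a sketch under that assumption. The talk does not describe how the split is made, and the salt, policy names, and function names are hypothetical.

```python
import hashlib

def in_holdout(member_id, fraction=0.01, salt="simple-policy-holdout"):
    """Deterministically place roughly `fraction` of members in the holdout."""
    digest = hashlib.sha256(f"{salt}:{member_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return bucket < fraction

def choose_policy(member_id, fraction=0.01):
    # Holdout members are served by the simple baseline policy; its online
    # metrics act as a reference point for the production model.
    return "epsilon_greedy" if in_holdout(member_id, fraction) else "production_model"

print(choose_policy("member-123"))
```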
  • 34. Response: Graceful Degradation. Your model isn’t working right or your input data is bad: Now what? Common approaches in personalization space: Use previous model? Use previous output? Use simplified model or heuristic? If calling system is resilient: turn off that subsystem?
  • 35. Example: Image Precompute. Want to choose personalized images per profile; image lookup has O(10M) requests per second (e.g. “House of Cards” -> Image URL). Approach: Precompute show to image mapping per user in near-line system. Store mapping in fast distributed cache. At request time: look up user-specific mapping in cache; fall back to unpersonalized results; store mapping for request; secondary fallback to default image for a missing show. [Diagram: Personalized Selection (Precompute), Unpersonalized, Default Image]
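
The request-time fallback chain in this example can be sketched with plain dictionaries standing in for the distributed cache. Everything named here (`select_image`, `DEFAULT_IMAGE`, the cache layout) is hypothetical, and reading "store mapping for request" as recording what was served is an assumption.

```python
DEFAULT_IMAGE = "default.jpg"  # hypothetical placeholder image

def select_image(user_id, show_id, personalized_cache, unpersonalized, served_log):
    """Pick an image, degrading gracefully when a lookup misses."""
    # 1) Precomputed user-specific mapping from the cache.
    image = personalized_cache.get(user_id, {}).get(show_id)
    source = "personalized"
    # 2) Fall back to the unpersonalized result for the show.
    if image is None:
        image = unpersonalized.get(show_id)
        source = "unpersonalized"
    # 3) Secondary fallback: default image for a missing show.
    if image is None:
        image = DEFAULT_IMAGE
        source = "default"
    # Record what was served for this request (assumed interpretation).
    served_log.append((user_id, show_id, image, source))
    return image, source

cache = {"u1": {"house_of_cards": "hoc_cast.jpg"}}
fallback = {"house_of_cards": "hoc_generic.jpg"}
log = []
print(select_image("u1", "house_of_cards", cache, fallback, log))
print(select_image("u2", "house_of_cards", cache, fallback, log))
print(select_image("u2", "new_show", cache, fallback, log))
```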
  • 36. Prevention: Failure Injection. Netflix runs 100% in AWS cloud. Needs to be reliable on unreliable infrastructure. Want a service to operate when an AWS instance fails? Randomly terminate instances (Chaos Monkey). Want Netflix to operate when an entire AWS region is having a problem? Disable entire AWS regions (Chaos Kong).
  • 37. Failure Injection for ML: What failure scenarios do you want your model to be robust to? Models are very sensitive to their input data, which can be noisy, corrupt, delayed, incomplete, missing, unknown. Train model to be resilient by injecting these conditions into the training and testing data.
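
Injecting failure conditions into training and testing data might look like the sketch below, which randomly blanks out or adds noise to numeric feature values. The function name, the probabilities, and the Gaussian noise model are assumptions for illustration, not the procedure used in the talk.

```python
import random

def inject_failures(rows, missing_prob=0.05, noise_prob=0.05,
                    noise_scale=0.1, seed=0):
    """Return a copy of `rows` with some values made missing or noisy."""
    rng = random.Random(seed)
    corrupted = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            r = rng.random()
            if r < missing_prob:
                new_row[name] = None  # simulate a missing or delayed input
            elif r < missing_prob + noise_prob:
                new_row[name] = value + rng.gauss(0.0, noise_scale)  # noisy input
            else:
                new_row[name] = value
        corrupted.append(new_row)
    return corrupted

# Hypothetical feature rows.
rows = [{"play_rate": 0.4, "hours_watched": 12.0} for _ in range(1000)]
train_rows = inject_failures(rows)
```

The downstream model then has to handle `None` values explicitly, which is the point: it sees these conditions during training rather than for the first time in production.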
  • 38. Example: Categorical features. Want to add a feature for type of row on homepage (Genre, Because you watched, Top picks, ...). Problem: Model may see new types online before in training data. Solution: Add unknown category and perturb a small fraction of training data to that type. Rule-of-thumb: Apply for all categorical features unless new category isn’t possible (Days of week -> OK, countries -> Not).
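
The unknown-category solution can be sketched in two parts: perturb a small fraction of training values to an unknown token, then map any category not seen in training to that same token at serving time. The token, the 5% fraction, and the helper names are hypothetical.

```python
import random

UNKNOWN = "<unknown>"  # hypothetical token for unseen categories

def perturb_to_unknown(values, fraction, seed=0):
    """Replace roughly `fraction` of training values with UNKNOWN."""
    rng = random.Random(seed)
    return [UNKNOWN if rng.random() < fraction else v for v in values]

def build_vocab(values):
    """Map each category to an integer id; UNKNOWN is always id 0."""
    vocab = {UNKNOWN: 0}
    for v in values:
        vocab.setdefault(v, len(vocab))
    return vocab

def encode(value, vocab):
    """Encode a category, sending anything unseen in training to UNKNOWN."""
    return vocab.get(value, vocab[UNKNOWN])

row_types = ["genre", "because_you_watched", "top_picks"] * 100
train_values = perturb_to_unknown(row_types, fraction=0.05)
vocab = build_vocab(train_values)
print(encode("genre", vocab), encode("brand_new_row_type", vocab))
```

Because some training examples carry the unknown id, the model learns a usable weight for it instead of meeting an untrained category online.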
  • 40. ● Consider online learning and bandits ● Build off best practices from software engineering ● Automate as much as possible ● Adjust your data to cover the conditions you want your model to be resilient to ● Detect problems and degrade gracefully Take aways
  • 41. Thank you. Tony Jebara & Justin Basilico Yes, we are hiring!