SlideShare a Scribd company logo
Making Netflix Machine
Learning Algorithms
Reliable
Tony Jebara & Justin Basilico
ICML Reliable ML in the Wild Workshop
2017-08-11
2
2006
6M members
US only
3
2017
> 100M members
> 190 countries
Goal:
Help members find content to watch and enjoy to
maximize member satisfaction and retention
Algorithm Areas
▪ Personalized Ranking
▪ Top-N Ranking
▪ Trending Now
▪ Continue Watching
▪ Video-Video Similarity
▪ Personalized Page Generation
▪ Search
▪ Personalized Image Selection
▪ ...
Models & Algorithms
▪ Regression (linear, logistic, ...)
▪ Matrix Factorization
▪ Factorization Machines
▪ Clustering & Topic Models
▪ Bayesian Nonparametrics
▪ Tree Ensembles (RF, GBDTs, …)
▪ Neural Networks (Deep, RBMs, …)
▪ Gaussian Processes
▪ Bandits
▪ …
A/B tests validate an overall approach works in
expectation
But they run in online production, so every A/B
tested model needs to be reliable
Innovation Cycle
Idea
Offline
Experiment
Full Online
Deployment
Online A/B
Test
1) Collect massive data sets
2) Try billions of hypotheses to find* one(s) with support
*Find with computational and statistical efficiency
Batch Learning
USERS
TIME
Collect Learn A/B Test Roll-out
Data Model
A
B
Batch Learning
USERS
TIME
A
BREGRET
Batch Learning
Collect Learn A/B Test Roll-out
Data Model
Explore and exploit → Less regret
Helps cold start models in the wild
Maintain some exploration for nonstationarity
Adapt reliably to changing and new data
Epsilon-greedy, UCB, Thompson Sampling, etc.
USERS
TIME
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
Bandit Learning
USERS
TIME
Bandit Learning
1) Uniform population of hypotheses
2) Choose a random hypothesis h
3) Act according to h and observe outcome
4) Re-weight hypotheses
5) Go to 2
TS
Bandits for selecting images
Bandits for selecting images
Different preferences for genre/theme portrayed
Contextual bandits to personalize images
Different preferences for cast members
Contextual bandits to personalize images
Putting Machine
Learning In Production
18
Training Pipeline
Application
“Typical” ML Pipeline: A tale of two worlds
Historical
Data
Generate
Features
Train Models
Validate &
Select
Models
Application
Logic
Live
Data
Load
Model
Offline
Online
Collect
Labels
Publish
Model
Evaluate
Model
Experimentation
Offline: Model trains reliably
Online: Model performs reliably
… plus all involved systems and data
What needs to be reliable?
Detection
Response
Prevention
Reliability Approach
Reliability in Training
To be reliable, first learning must be repeatable
Automate retraining of models
Akin to continuous deployment in software engineering
How often depends on application; typically daily
Detect problems and fail fast to avoid using a bad model
Retraining
Example Training Pipeline
● Periodic retraining to refresh models
● Workflow system to manage pipeline
● Each step is a Spark or Docker job
● Each step has checks in place
○ Response: Stop workflow and send alerts
What to check?
● For both absolute and relative to previous runs...
● Offline metrics on hold-out set
● Data size for train and test
● Feature distributions
● Feature importance
● Large (unexpected) changes in output between models
● Model coverage (e.g. number of items it can predict)
● Model integrity
● Error counters
● ...
Example 1: number of samples
Numberoftrainingsamples
Time
Example 1: number of samples
Numberoftrainingsamples
Time
Alarm fires and
model is not
published due to
anomaly
Example 2: offline metrics
Liftw.r.tbaseline
Time
Example 2: offline metrics
Liftw.r.tbaseline
Time
Alarm fires and model is
not published due to
anomaly
Absolute threshold: if lift
< X then alarm also fires
Testing
- Unit testing and code coverage
- Integration testing
Improve quality and reliability of upstream data feeds
Reuse same data and code offline and online to avoid
training/serving skew
Running from multiple random seeds
Preventing offline training failures
Reliability Online
Check for invariants in inputs and outputs
Can catch many unexpected changes, bad assumptions
and engineering issues
Examples:
- Feature values are out-of-range
- Output is NaN or Infinity
- Probabilities are < 0 or > 1 or don’t sum to 1
Basic sanity checks
Online staleness checks can cover a wide range of failures in the training
and publishing
Example checks:
- How long ago was the most recent model published?
- How old is the data it was trained on?
- How old are the inputs it is consuming?
Model and input staleness
Stale
Model
Ageofmodel
Track the quality of the model
- Compare prediction to actual
behavior
- Online equivalents of offline metrics
For online learning or bandits reserve a
fraction of data for a simple policy (e.g.
epsilon-greedy) as a sanity check
Online metrics
Your model isn’t working right or your input data
is bad: Now what?
Common approaches in personalization space:
- Use previous model?
- Use previous output?
- Use simplified model or heuristic?
- If calling system is resilient: turn off that
subsystem?
Response: Graceful Degradation
Want to choose personalized images per profile
- Image lookup has O(10M) requests per second (e.g.
“House of Cards” -> Image URL)
Approach:
- Precompute show to image mapping per user
near-line system
- Store mapping in fast distributed cache
- At request time
- Lookup user-specific mapping in cache
- Fallback to unpersonalized results
- Store mapping for request
- Secondary fallback to default image for a
missing show
Example: Image Precompute Personalized Selection
(Precompute)
Unpersonalized
Default Image
Netflix runs 100% in AWS cloud
Needs to be reliable on unreliable infrastructure
Want a service to operate when an AWS instance
fails?
Randomly terminate instances (Chaos Monkey)
Want Netflix to operate when an entire AWS region
is having a problem?
Disable entire AWS regions (Chaos Kong)
Prevention: Failure Injection
What failure scenarios do you want your model to be
robust to?
Models are very sensitive to their input data
- Can be noisy, corrupt, delayed, incomplete, missing,
unknown
Train model to be resilient by injecting these conditions
into the training and testing data
Failure Injection for ML
Want to add a feature for type of row on homepage
Genre, Because you watched, Top picks, ...
Problem: Model may see new types online before in training data
Solution: Add unknown category and perturb a small fraction of
training data to that type
Rule-of-thumb: Apply for all categorical features unless new
category isn’t possible (Days of week -> OK, countries -> Not)
Example: Categorical features
Conclusions.
● Consider online learning and bandits
● Build off best practices from software engineering
● Automate as much as possible
● Adjust your data to cover the conditions you want
your model to be resilient to
● Detect problems and degrade gracefully
Take aways
Thank you.
Tony Jebara & Justin Basilico
Yes, we are hiring!

More Related Content

What's hot

Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
Justin Basilico
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Anoop Deoras
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
Justin Basilico
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Sudeep Das, Ph.D.
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
Justin Basilico
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
Linas Baltrunas
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
Jaya Kawale
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Förderverein Technische Fakultät
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
Sudeep Das, Ph.D.
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
Parmeshwar Khurd
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Personalization at Netflix - Making Stories Travel
Personalization at Netflix -  Making Stories Travel Personalization at Netflix -  Making Stories Travel
Personalization at Netflix - Making Stories Travel
Sudeep Das, Ph.D.
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
Anoop Deoras
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
Harald Steck
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
Grace T. Huang
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
Justin Basilico
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
Fernando Amat
 

What's hot (20)

Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 
Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Personalization at Netflix - Making Stories Travel
Personalization at Netflix -  Making Stories Travel Personalization at Netflix -  Making Stories Travel
Personalization at Netflix - Making Stories Travel
 
Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
 

Similar to Making Netflix Machine Learning Algorithms Reliable

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Anyscale
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
Databricks
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
Aly Abdelkareem
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
antimo musone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
DotNetCampus
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
DotNetCampus
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout Session
Splunk
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
Pierre Gutierrez
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout Session
Splunk
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
Hoa Le
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Shannon Cuthbertson
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Splunk
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
Anuj Gupta
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
Databricks
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
Splunk
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
Ivo Andreev
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence Workshop
David Tan
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
Rui Quintino
 

Similar to Making Netflix Machine Learning Algorithms Reliable (20)

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout Session
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout Session
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Splunk for Machine Learning and Analytics
Splunk for Machine Learning and AnalyticsSplunk for Machine Learning and Analytics
Splunk for Machine Learning and Analytics
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence Workshop
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
 

Recently uploaded

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
marufrahmanstratejm
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
Data Hops
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 

Recently uploaded (20)

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 

Making Netflix Machine Learning Algorithms Reliable

  • 1. Making Netflix Machine Learning Algorithms Reliable Tony Jebara & Justin Basilico ICML Reliable ML in the Wild Workshop 2017-08-11
  • 3. 3 2017 > 100M members > 190 countries
  • 4. Goal: Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 5. Algorithm Areas ▪ Personalized Ranking ▪ Top-N Ranking ▪ Trending Now ▪ Continue Watching ▪ Video-Video Similarity ▪ Personalized Page Generation ▪ Search ▪ Personalized Image Selection ▪ ...
  • 6. Models & Algorithms ▪ Regression (linear, logistic, ...) ▪ Matrix Factorization ▪ Factorization Machines ▪ Clustering & Topic Models ▪ Bayesian Nonparametrics ▪ Tree Ensembles (RF, GBDTs, …) ▪ Neural Networks (Deep, RBMs, …) ▪ Gaussian Processes ▪ Bandits ▪ …
  • 7. A/B tests validate an overall approach works in expectation But they run in online production, so every A/B tested model needs to be reliable Innovation Cycle Idea Offline Experiment Full Online Deployment Online A/B Test
  • 8. 1) Collect massive data sets 2) Try billions of hypotheses to find* one(s) with support *Find with computational and statistical efficiency Batch Learning
  • 9. USERS TIME Collect Learn A/B Test Roll-out Data Model A B Batch Learning
  • 11. Explore and exploit → Less regret Helps cold start models in the wild Maintain some exploration for nonstationarity Adapt reliably to changing and new data Epsilon-greedy, UCB, Thompson Sampling, etc. USERS TIME x x x x x x x x x x x x x x x x x Bandit Learning
  • 12. USERS TIME Bandit Learning 1) Uniform population of hypotheses 2) Choose a random hypothesis h 3) Act according to h and observe outcome 4) Re-weight hypotheses 5) Go to 2 TS
  • 15. Different preferences for genre/theme portrayed Contextual bandits to personalize images
  • 16. Different preferences for cast members Contextual bandits to personalize images
  • 18. 18 Training Pipeline Application “Typical” ML Pipeline: A tale of two worlds Historical Data Generate Features Train Models Validate & Select Models Application Logic Live Data Load Model Offline Online Collect Labels Publish Model Evaluate Model Experimentation
  • 19. Offline: Model trains reliably Online: Model performs reliably … plus all involved systems and data What needs to be reliable?
  • 22. To be reliable, first learning must be repeatable Automate retraining of models Akin to continuous deployment in software engineering How often depends on application; typically daily Detect problems and fail fast to avoid using a bad model Retraining
  • 23. Example Training Pipeline ● Periodic retraining to refresh models ● Workflow system to manage pipeline ● Each step is a Spark or Docker job ● Each step has checks in place ○ Response: Stop workflow and send alerts
  • 24. What to check? ● For both absolute and relative to previous runs... ● Offline metrics on hold-out set ● Data size for train and test ● Feature distributions ● Feature importance ● Large (unexpected) changes in output between models ● Model coverage (e.g. number of items it can predict) ● Model integrity ● Error counters ● ...
  • 25. Example 1: number of samples Numberoftrainingsamples Time
  • 26. Example 1: number of samples Numberoftrainingsamples Time Alarm fires and model is not published due to anomaly
  • 27. Example 2: offline metrics Liftw.r.tbaseline Time
  • 28. Example 2: offline metrics Liftw.r.tbaseline Time Alarm fires and model is not published due to anomaly Absolute threshold: if lift < X then alarm also fires
  • 29. Testing - Unit testing and code coverage - Integration testing Improve quality and reliability of upstream data feeds Reuse same data and code offline and online to avoid training/serving skew Running from multiple random seeds Preventing offline training failures
  • 31. Check for invariants in inputs and outputs Can catch many unexpected changes, bad assumptions and engineering issues Examples: - Feature values are out-of-range - Output is NaN or Infinity - Probabilities are < 0 or > 1 or don’t sum to 1 Basic sanity checks
  • 32. Online staleness checks can cover a wide range of failures in the training and publishing Example checks: - How long ago was the most recent model published? - How old is the data it was trained on? - How old are the inputs it is consuming? Model and input staleness Stale Model Ageofmodel
  • 33. Track the quality of the model - Compare prediction to actual behavior - Online equivalents of offline metrics For online learning or bandits reserve a fraction of data for a simple policy (e.g. epsilon-greedy) as a sanity check Online metrics
  • 34. Your model isn’t working right or your input data is bad: Now what? Common approaches in personalization space: - Use previous model? - Use previous output? - Use simplified model or heuristic? - If calling system is resilient: turn off that subsystem? Response: Graceful Degradation
  • 35. Want to choose personalized images per profile - Image lookup has O(10M) requests per second (e.g. “House of Cards” -> Image URL) Approach: - Precompute show to image mapping per user near-line system - Store mapping in fast distributed cache - At request time - Lookup user-specific mapping in cache - Fallback to unpersonalized results - Store mapping for request - Secondary fallback to default image for a missing show Example: Image Precompute Personalized Selection (Precompute) Unpersonalized Default Image
  • 36. Netflix runs 100% in AWS cloud Needs to be reliable on unreliable infrastructure Want a service to operate when an AWS instance fails? Randomly terminate instances (Chaos Monkey) Want Netflix to operate when an entire AWS region is having a problem? Disable entire AWS regions (Chaos Kong) Prevention: Failure Injection
  • 37. What failure scenarios do you want your model to be robust to? Models are very sensitive to their input data - Can be noisy, corrupt, delayed, incomplete, missing, unknown Train model to be resilient by injecting these conditions into the training and testing data Failure Injection for ML
  • 38. Want to add a feature for type of row on homepage Genre, Because you watched, Top picks, ... Problem: Model may see new types online before in training data Solution: Add unknown category and perturb a small fraction of training data to that type Rule-of-thumb: Apply for all categorical features unless new category isn’t possible (Days of week -> OK, countries -> Not) Example: Categorical features
  • 40. ● Consider online learning and bandits ● Build off best practices from software engineering ● Automate as much as possible ● Adjust your data to cover the conditions you want your model to be resilient to ● Detect problems and degrade gracefully Take aways
  • 41. Thank you. Tony Jebara & Justin Basilico Yes, we are hiring!