Machine Learning At
Netflix Scale
Aish Fenton
Manager - Research Engineering
@aishfenton
Everything is a
recommendation
Top Picks for Aish
Movies based on books
Because you watched Bob’s Burgers
Rank based on your taste
75% of plays come
from homepage
Back Story…
Proxy question:
▪ Accuracy in predicted rating
▪ Improve by 10% = $1million!
What we were interested in:
▪ High quality recommendations
predicted
actual
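The proxy question above, accuracy in predicted rating, was scored as the root mean squared error (RMSE) between predicted and actual ratings. A minimal sketch of that measurement, assuming plain Python lists of ratings; the function names and the sample numbers are made up for illustration and are not Netflix code:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement(baseline_rmse, new_rmse):
    """Relative improvement over a baseline, as a fraction (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical held-back ratings and two sets of predictions.
actual = [4.0, 3.0, 5.0, 2.0]
baseline = [3.0, 3.0, 3.0, 3.0]
model = [3.5, 3.0, 4.5, 2.5]

print(rmse(baseline, actual), rmse(model, actual))
print(improvement(rmse(baseline, actual), rmse(model, actual)))
```

Under this definition, the "improve by 10%" target means `improvement(...)` reaching 0.10 against the existing system's RMSE.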
SVD RBMs
Top two results still used in production!
>
2006 2013
• > 44M members
• > 40 countries
• > 5B hours in Q3 2013
• Log 100B events/day
• 31.62% of peak US downstream traffic
Data and Models
▪ > 40M subscribers
▪ Ratings: ~5M/day
▪ Searches: >3M/day
▪ Plays: > 50M/day
▪ Streamed hours:
o 5B hours in Q3 2013
Geo Info
Time
Impressions
Device Info
Metadata
Social
Ratings
Demographics
Member Behavior
Plays
Aish: Latent User Vector
House of Cards: Latent Item Vector
[Figure: ratings matrix R factored into U and M, with user vectors u1, u2, u3 and movie vectors m1, m2, m3; the highlighted entry for Aish and House of Cards is 3.53]
My rating for House of Cards = Mean Rating + My Bias + Movie Bias + Interaction
3.55 = 2.50 + (-1.5) + 1.2 + pq
[Figure: the same factorization extended with a time factor T (t1, t2, t3), so that R is factored into U, M, and T; values shown include 3.53, 2.35, and 1.34]
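The four terms on the slide (mean rating, my bias, movie bias, interaction) match the usual biased matrix factorization form, where the interaction term pq is the dot product of the latent user vector and the latent item vector. A small sketch under that assumption; the bias values come from the slide, while the two latent vectors are invented so that their dot product is 1.35, the value the slide's total implies:

```python
def predict_rating(mean_rating, user_bias, movie_bias, user_vec, movie_vec):
    """Biased matrix factorization: mean + user bias + movie bias + p . q."""
    interaction = sum(p * q for p, q in zip(user_vec, movie_vec))
    return mean_rating + user_bias + movie_bias + interaction

# Hypothetical two-dimensional latent vectors (not from the talk).
p_aish = [0.9, 0.5]            # latent user vector
q_house_of_cards = [1.0, 0.9]  # latent item vector

rating = predict_rating(2.50, -1.5, 1.2, p_aish, q_house_of_cards)
print(round(rating, 2))
```

In a trained model the biases and latent vectors are learned from observed ratings; here they are fixed by hand only to show how the terms add up.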
▪ Matrix/Tensor Factorization
▪ Regression models (Logistic, Linear, Elastic nets)
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Markov Chains & other graph models
▪ Clustering / Topic Models
▪ Neural Networks
▪ Association Rules
▪ GBDT/RF
▪ …
[Figure: bar chart of improvement over baseline (axis from 0% to 300%) for three configurations: Popularity; + Ratings; + More Features & Optimized Models]
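The chart measures everything against a plain popularity baseline. A minimal sketch of such a baseline and one way to score it, assuming a play log of (user, title) pairs; the `hit_rate_at_k` metric and the toy data are assumptions for illustration, not necessarily what Netflix measured:

```python
from collections import Counter

def popularity_ranking(play_log):
    """Rank titles by global play count, most played first."""
    counts = Counter(title for _user, title in play_log)
    # Sort by count descending, then by title, so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _count in ordered]

def hit_rate_at_k(ranking, held_out, k):
    """Fraction of held-out plays whose title is in the top k of the ranking."""
    top_k = set(ranking[:k])
    hits = sum(1 for _user, title in held_out if title in top_k)
    return hits / len(held_out)

# Hypothetical training plays and held-out plays.
train = [("u1", "A"), ("u2", "A"), ("u3", "A"),
         ("u1", "B"), ("u2", "B"), ("u3", "C")]
held_out = [("u4", "A"), ("u5", "B"), ("u6", "D")]

baseline_ranking = popularity_ranking(train)
baseline_score = hit_rate_at_k(baseline_ranking, held_out, k=2)
print(baseline_ranking, baseline_score)
```

Any personalized model would then be scored with the same metric on the same held-out plays, so that the comparison against popularity is like for like.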
Anatomy of a
Machine Learning
Platform
Problem → Data → Experiment Offline → Produce Model → Test / Metrics
[Figure: architecture blueprint with three layers (Near-line, Online, Offline). Components shown: UI Clients, Event Distribution, Online Algs, Model Trainer, Pre-compute, AB Test Metrics, API Layer, Monitoring, Hadoop / Data Warehouse, Experimentation Platform, S3 / HDFS, Offline Metrics, Query Tools, Models]
▪ App Logs
▪ User Actions
▪ Ratings
▪ Plays
▪ Queue Adds
▪ Algo Actions
▪ Impressions (Presentation Bias)
▪ Context
▪ Device Info
▪ User Demographics
▪ Social
▪ Time
▪ …
Many different types of data…
Embedded
Embedded
Weights
Real-time popularity of movie
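One way to read this slide: the weights are learned offline and embedded in the service, while the feature values they multiply (such as a movie's real-time popularity) are looked up at request time. A sketch under the assumption of a logistic model, which the slide does not specify; the weight values and feature names are hypothetical:

```python
import math

# Weights learned offline and embedded in the service (made-up values).
EMBEDDED_WEIGHTS = {"bias": -1.0, "taste_match": 2.0, "realtime_popularity": 1.5}

def score(features, weights=EMBEDDED_WEIGHTS):
    """Logistic score: fixed weights, feature values supplied at request time."""
    z = weights["bias"] + sum(weights[name] * value
                              for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Same model, same member taste match; only the live popularity signal differs.
quiet = score({"taste_match": 0.5, "realtime_popularity": 0.0})
trending = score({"taste_match": 0.5, "realtime_popularity": 1.0})
print(quiet, trending)
```

The model itself did not change between the two calls; only the real-time input did, which is why the score can react to a surge in popularity without retraining.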
Example: Neural Network Training
θ
Input · Hidden Layer · Output
Input · Hidden Layers · Output
Neural Network Training
1,536 cores
G2 Instances
$0.60 p/h
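Exploring the hyper-parameter space can be sketched as a search loop in which each trial trains and evaluates one configuration. The sketch below uses random search, a simple alternative to grid search, and runs trials sequentially; in a setup like the one on the slide each trial would instead be sent to its own GPU instance. The `validation_error` function is a synthetic stand-in for training a network, and the parameter ranges are assumptions:

```python
import random

def validation_error(learning_rate, hidden_layers):
    """Stand-in for 'train a network and evaluate it'; purely synthetic."""
    return (learning_rate - 0.01) ** 2 * 1e4 + (hidden_layers - 3) ** 2

def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "hidden_layers": rng.randint(1, 6),
        }
        error = validation_error(**config)
        if best is None or error < best[0]:
            best = (error, config)
    return best

best_error, best_config = random_search(n_trials=50)
print(best_error, best_config)
```

A smarter strategy would use the errors seen so far to decide where to sample next; the loop structure stays the same, only the sampling step changes.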
But… things can go astray
[Figure: R factored into U and M; M is pre-computed, and the user vectors u1, u2, u3 are solved online]
Aish played HoC
Publish new model
for Aish
Aish Fenton
@aishfenton
https://www.linkedin.com/profile/view?id=47917219

Editor's Notes

  1. Who in the audience has an ML background? Who has a big data background? Who’s an engineer? Going to cover a bit of everything: a few models, our approach to the architecture of ML systems, and how it all comes together. Feel free to ask questions as we go along.
  2. We use Machine Learning in many places at Netflix, but perhaps the place we’re best known for ML is in our recommender systems and our personalization. So I wanted to start with a quick overview of what personalization is at Netflix.
  3. If you’ve logged into Netflix before, this should look familiar. This is what it looks like when you log in to our website. What you might not realize, however, is that almost every element on this page is driven by an ML algorithm.
  4. There are the obvious recommendations. We have a row of explicit recommendations, where we pull together everything we know about you and present our “top picks” for you.
  5. You’ll also see “Genre” rows, which group shows around a particular theme. Movies are tagged in our system on a number of different aspects, and the tags are added editorially by our team of content experts. Which genres we pick, however, is personalized. So “Movies based on books” is shown to me based on my predicted likelihood of wanting to watch this genre. There’s also a level of personalization within the row itself. A genre like “Movies based on books” spans a lot of different tastes. For example, movies about Wall Street, documentaries on the GFC, and Young Adult fiction are all types of “Movies based on books”, but they serve different tastes. Based on what we know about you, we can construct a set of “Movies based on books” tailored to your particular view of what that means.
  6. We also do “Similar” rows. So, as the title says, because I last watched Bob’s Burgers, here are some choices that are similar to that.
  7. Even our marketing images are personalized. Many of the hero images and much of the marketing you see within Netflix are personalized to your taste. I see OITNB here because it fits with my tastes.
  8. Finally we put it all together. Unsurprisingly, most of what people play is from the top left-hand corner, and if they are forced to scroll further down, or right, then that means we failed to predict what they want to watch. So we also rank the entire page. I’ve already shown how we rank the titles within each row left-to-right. We also rank the rows themselves top-to-bottom, so that the rows most relevant to you are pushed to the top of the page.
  9. The net result of this personalization is that 75% of what our users watch is selected from the homepage and the rows I’ve just shown you. This means we’ve been able to provide a very personalized experience for our users, where what they see on the homepage when they log in to Netflix matches pretty well with what they want to watch.
  10. - Okay, I’m going to take a minute now to provide some back story.
  11. Who’s heard of the Netflix Prize? It ran from 2006 to 2009. The grand prize was won in 2009 by team BellKor’s Pragmatic Chaos, which included the AT&T researchers behind Team KorBell.
  12. The challenge was: we give you 100M anonymized user ratings to build a “rating prediction” model with. We then get you to predict 2.8M ratings where we already know what the users rated, but held it back. If you can improve on our predicted ratings by 10%, then we give you 1 million dollars. We measure this as the root mean square difference between your predicted rating and the real rating that we held back. Team KorBell (AT&T) won the 2007 Progress Prize by improving the predictions by 8.43%; the 2009 grand prize went to BellKor’s Pragmatic Chaos, which passed the 10% mark. http://mathurl.com/osuomvj
  13. Two significant algorithms came out of the Netflix Prize. SVD: Prize RMSE 0.8914. RBM: Prize RMSE 0.8990. They were known in academia already, but hadn’t made their way out into industry recommender systems. I talk through how SVD works at a high level in later slides. These two algorithms are still used in parts of the Netflix recommender system to this day.
  14. There are limitations though. Ratings != Plays. People’s ratings are somewhat “aspirational”. People may rate Citizen Kane 5 stars, but what they watch is Sharknado. For our use case, we’re interested in predicting what people actually want to watch, not predicting what they think are critically worthy movies.
  15. Also, Netflix has changed a lot since the start of the Netflix Prize. In 2006 we were mailing out DVDs. Now we’re more about streaming to devices. This also changed people’s viewing habits. The investment in selecting a great DVD that the entire family could watch was higher. Everyone had to agree on it, and getting it wrong might ruin your night. With streaming, people want content that is more personalized and more sensitive to the context of what they want to watch NOW.
  16. Also, Netflix has grown. A lot. The algorithms that worked in 2006 don’t necessarily work with the volume we now have.
  17. Okay, so let’s dive a little into the models and data we use to do our personalization.
  18. On the data side we have a lot to work with. There’s a lot of signal that we get beyond straight plays/ratings. If you think about it, the context in which someone chooses what to watch tells you a lot too.
  19. So I want to give you a quick overview of how SVD (aka Matrix Factorization) works. This is one of the classic algorithms used in the Netflix Prize, and was a big breakthrough at the time. This should give you a flavor of how these systems work. The basic model is: http://mathurl.com/pgux65w
  20. - To make that more visual
  21. http://mathurl.com/pgux65w
  22. http://mathurl.com/l4w5yd6
  23. http://mathurl.com/l4w5yd6
  24. So that’s one of the foundational algorithms used in recommender systems. But things have moved on a lot since then too. These days we’re mostly focused on ranking rather than rating prediction. This allows us to balance things like diversity, freshness, and global popularity against our prediction of how much a title fits your tastes. We are A/B testing (or have A/B tested) many of these. Which algorithm to use really depends on your application and what you’re trying to achieve. All have pros and cons. You’ll likely end up with a few different algorithms for different parts of the problem. The important thing is to test them in your production system.
  25. Over time we’ve been able to improve on the results we got from the Netflix Prize. It’s been a combination of adding more data and adding more sophisticated models. As you can see here, we’ve moved things on a lot. These are improvements to Netflix’s core business metrics, so even a 1% improvement equates to real benefits to the business. One quick note: always make sure you select a realistic baseline to test against. Just straight global popularity is usually pretty tough to beat. So you can fool yourself if you’re not testing against that, or your equivalent of that.
  26. So now you have an idea of what a recommender system algorithm looks like. Let’s see how you can productionize that.
  27. So here’s the core workflow you’ll need to support. Whatever decisions you make about your architecture, you’ll need to make the above process seamless. The machine learning approach: (1) define the problem (what you think needs solving, or a hypothesis of what can be improved); (2) gather data on which to train the model; (3) experiment offline to see if you can improve over the baseline; (4) produce the model/algorithm and deploy it; (5) track key metrics in production to see if the hypothesis is proven.
  28. - Here’s a blueprint for different layers you’ll need. - We’ll step through each area next.
  29. Okay, let’s start with the front-end (aka online). I won’t cover much here, except to point out that you’ll need an extremely good data pipeline. You’ll spend 90% of your time building this. It often needs to be built by an engineering team in collaboration with your researchers.
  30. There are many different types of data you’ll want to capture, including what your algorithms are doing (you’ll need to correct for presentation bias) and the context and behavior in which users interact with you.
  31. You need a backend service that can accept and aggregate all these disparate data sources. You’ll want to look at technologies like Suro, Kafka, etc. Stream to longer-term (cheap) storage (S3, HDFS).
  32. You need a common framework that makes it easier to instrument your code for events. Adopt it early and get it into every app as “standard”.
  33. Okay, let’s talk about where you (typically) define and train your models. Most of your models will be produced offline and embedded in production. You’ll need a platform that makes it easy to work across diverse tools: R, iPython, in-house. You’ll also need a common format (it can be code) that allows you to embed models once learned.
  34. Common confusion: models change less than you think. The values you’ll be plugging in can still be real-time. http://mathurl.com/kuxa5hw
  35. Let’s walk through an example of a model we train: Neural Networks.
  36. These days we use GPUs (CUDA) to train the network. Thousands of cores. Massively parallel. Computing power is what’s changed. ANNs are really an old idea.
  37. But you still need to explore the hyper-parameter space. Parameters: learning rate, theta, …
  38. Parameters: how many layers, and how deep.
  39. AWS offers GPU compute instances. Approach: conduct a search over many different architectures and parameters. Distribute a different architecture to each instance, train the model, and evaluate. You can get smarter about how you explore this space: rather than doing grid search, you search in the areas most likely to show improvement. 60 cents an hour is a comparative fortune next to other instances, but it only takes a few hours to train a model that is used in production for weeks (or months). Perfect for experimental work.
  40. Your offline models won’t reflect sudden changes in behavior that they haven’t seen before. Here’s OITNB, and House of Cards (as searched for in Google). These can represent massive shifts in global user behavior, which can throw the model off. Also, some models degrade faster than others. You see this especially with tree models.
  41. Another problem: The models themselves still run in production (even though they’re trained offline). This limits how sophisticated you can make your models. They still need to return results within your SLAs.
  42. One Solution. Near-line computing. Re-train models based on events from the system Pre-compute results where you can
  43. Now, you don’t always have to pre-compute the final results. The beauty of the near-line approach is that it lets you half-bake the model, so that the parts that are more static are pre-generated, and the parts that are more sensitive to changes get worked out on the fly. Remember our SVD model: U is users, M is movies, and R is ratings. It turns out that solving for U, if you know M and R, is a simple least squares solution. With modern linear algebra libraries we can compute that in milliseconds.
  44. Recomputes are event driven. No need to re-compute if nothing has changed So in this example, we re-compute the latent vectors representing my tastes, whenever there’s more information available about me to re-train that vector with.
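
The near-line step in notes 43 and 44 can be sketched as follows: the movie vectors M are pre-computed, and when a new event arrives for a member, that member's latent vector is re-solved by least squares against the movies they have rated. This is a minimal sketch, not Netflix's implementation. It omits the bias terms, the `ridge` parameter is an added assumption, and the small Gaussian elimination routine is hand-rolled only to keep the example free of dependencies; in practice a linear algebra library would do this step.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def solve_user_vector(movie_vectors, ratings, ridge=0.0):
    """Least-squares user vector given fixed movie vectors and one user's ratings."""
    k = len(movie_vectors[0])
    # Normal equations: (Q^T Q + ridge * I) p = Q^T r.
    # A positive ridge keeps the system solvable when ratings are few.
    gram = [[sum(q[i] * q[j] for q in movie_vectors) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(q[i] * r for q, r in zip(movie_vectors, ratings)) for i in range(k)]
    return solve_linear_system(gram, rhs)

# Hypothetical pre-computed movie vectors and one member's ratings of them.
movie_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ratings = [2.0, 1.0, 3.0]

# Event-driven: call this only when a new rating or play arrives for the member.
user_vector = solve_user_vector(movie_vectors, ratings)
print(user_vector)
```

Because the solve touches only one member's ratings and a small k-by-k system, it is cheap enough to run per event rather than in a nightly batch.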