Learning to Personalize

Justin Basilico
Research/Engineering Director at Netflix, Machine Learning and Recommendation Systems
1 
Learning to Personalize 
Justin Basilico, Page Algorithms Engineering
September 19, 2014
@JustinBasilico 
ATL 2014
2 
• Interested in high-quality recommendations
• Proxy question:
  • Accuracy in predicted rating
  • Measured by root mean squared error (RMSE); a minimal computation sketch follows below
• Improve by 10% = $1 million!
• Data size: 100M ratings (back then “almost massive”)
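As context, the metric itself is simple to compute; here is a minimal sketch of RMSE over made-up ratings (the Cinematch baseline to beat by 10% was roughly 0.95):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# Made-up illustrative ratings; the Netflix Prize measured this over ~100M of them.
predicted = [3.8, 2.1, 4.5, 3.0]
actual    = [4.0, 2.0, 5.0, 3.0]
print(round(rmse(predicted, actual), 4))  # 0.2739
```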
3 
Change of focus: 2006 → 2014
4 
Netflix Scale 
• > 50M members
• > 40 countries
• > 1,000 device types
• Hours: > 1B/month
• Plays: > 30M/day
• Log 100B events/day
• 34.2% of peak US downstream traffic
5 
Goal 
Help members find content to watch and enjoy, in order to maximize member satisfaction and retention
6 
“Emmy Winning” Approach to Recommendation
7 
Everything is a Recommendation 
Rows · Ranking
Over 75% of what people watch comes from our recommendations.
Recommendations are driven by machine learning.
8 
Top Picks 
Personalization awareness 
Diversity
9 
Personalized genres
• Genres focused on user interest
• Derived from tag combinations
• Provide context and evidence
• How are they generated?
  • Implicit: based on recent plays, ratings & other interactions
  • Explicit: taste preferences
10 
Similarity
• Recommend videos similar to one you’ve liked
• “Because you watched” rows
• Pivots:
  • Video information page
  • In response to user actions (search, list add, …)
11 
Support for Recommendations 
Behavioral Support · Social Support
12 
Learning to Recommend
13 
Machine Learning Approach 
Problem · Data · Metrics · Model · Algorithm
14 
Data 
• Plays: duration, bookmark, time, device, …
• Ratings
• Metadata: tags, synopsis, cast, …
• Impressions
• Interactions: search, list add, scroll, …
• Social
15 
Models & Algorithms 
• Regression (Linear, logistic, elastic net)
• SVD and other Matrix Factorizations
• Factorization Machines
• Restricted Boltzmann Machines
• Deep Neural Networks
• Markov Models and Graph Algorithms
• Clustering
• Latent Dirichlet Allocation
• Gradient Boosted Decision Trees / Random Forests
• Gaussian Processes
• …
16 
Rating Prediction 
• Based on the first-year progress prize
• Top 2 algorithms:
  • Matrix Factorization (SVD++)
  • Restricted Boltzmann Machines (RBM)
• Ensemble: linear blend of the two
[Diagram: rating matrix R (Users × Videos, 99% sparse) ≈ U (Users × d) × V (d × Videos)]
(A minimal matrix-factorization sketch follows below.)
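The sketch: plain SGD on the observed entries of R, so that U·Vᵀ approximates the 99%-sparse rating matrix. This is not the SVD++ variant used in the actual blend, and the dimensions, rates, and toy data are illustrative:

```python
import numpy as np

def factorize(ratings, n_users, n_videos, d=10, lr=0.05, reg=0.05, epochs=500):
    """Learn U (users x d) and V (videos x d) so that U @ V.T matches the
    sparse rating matrix R on its observed (user, video, rating) entries."""
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, d))
    V = rng.normal(scale=0.1, size=(n_videos, d))
    for _ in range(epochs):
        for u, v, r in ratings:
            pu, qv = U[u].copy(), V[v].copy()
            err = r - pu @ qv                      # error on this observed entry
            U[u] += lr * (err * qv - reg * pu)     # SGD step with L2 regularization
            V[v] += lr * (err * pu - reg * qv)
    return U, V

# Toy observed ratings: (user, video, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0), (2, 2, 4.5)]
U, V = factorize(ratings, n_users=3, n_videos=3)
print(np.round(U @ V.T, 2))  # dense predictions, including unobserved entries
```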
17 
Ranking by ratings 
[Example row ranked purely by predicted rating: 4.7, 4.6, 4.5, 4.5, 4.5, …]
Niche titles get high average ratings… but only from those who would watch them anyway.
18 
RMSE
19 
Learning to Rank 
• Approaches:
  • Point-wise: loss over items (classification, ordinal regression, MF, …)
  • Pair-wise: loss over preferences (RankSVM, RankNet, BPR, …); a BPR sketch follows below
  • List-wise: (smoothed) loss over ranking (LambdaMART, DirectRank, GAPfm, …)
• Ranking quality measures: Recall@N, Precision@N, NDCG, MRR, ERR, MAP, FCP, …
[Plot: importance of each rank position (0–1) vs. rank (0–100) under NDCG, MRR, and FCP]
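As a concrete instance of the pair-wise family listed above, here is a minimal BPR-style update (Bayesian Personalized Ranking: SGD on a log-sigmoid loss over preference triples). Factor sizes, rates, and the toy triple are illustrative:

```python
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    """One SGD step on a preference triple: user u prefers item i over item j.
    Widens the score gap s(u, i) - s(u, j) under the BPR log-sigmoid loss."""
    wu, vi, vj = U[u].copy(), V[i].copy(), V[j].copy()
    x_uij = wu @ (vi - vj)                 # current score difference
    g = 1.0 / (1.0 + np.exp(x_uij))       # gradient scale: sigmoid(-x_uij)
    U[u] += lr * (g * (vi - vj) - reg * wu)
    V[i] += lr * (g * wu - reg * vi)
    V[j] += lr * (-g * wu - reg * vj)

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 8))    # user factors
V = rng.normal(scale=0.1, size=(5, 8))    # item factors
for _ in range(200):
    bpr_step(U, V, u=0, i=2, j=4)         # user 0 played item 2 but not item 4
print(U[0] @ V[2] > U[0] @ V[4])          # True: preferred item now scores higher
```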
20 
Example: Two features, linear model 
[Plot: videos placed by popularity (x-axis) vs. predicted rating (y-axis); diagonal contours 1–5 show the final ranking implied by the linear model]
Linear model: f_rank(u, v) = w1 · p(v) + w2 · r(u, v)
where p(v) is the popularity of video v and r(u, v) is user u’s predicted rating for v.
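A sketch of that ranker in code; the weights and the popularity/predicted-rating values below are made up for illustration:

```python
# Hypothetical weights and feature values for illustration.
w1, w2 = 0.4, 0.6

videos = {            # video -> (popularity p(v), predicted rating r(u, v))
    "A": (0.9, 3.2),
    "B": (0.2, 4.8),
    "C": (0.6, 4.1),
}

def f_rank(p, r):
    """Final ranking score: weighted blend of popularity and predicted rating."""
    return w1 * p + w2 * r

ranking = sorted(videos, key=lambda v: f_rank(*videos[v]), reverse=True)
print(ranking)  # ['B', 'C', 'A']
```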
21 
Personalized Ranking
22 
“Learning to Row” 
Putting a page together
23 
Page-level algorithmic challenge 
10,000s of possible rows → select 10–40 rows per page
Variable number of possible videos per row (up to thousands)
1 personalized page per device
24 
Balancing a Personalized Page 
Accurate vs. Diverse 
Discovery vs. Continuation 
Depth vs. Coverage 
Freshness vs. Stability 
Recommendations vs. Tasks
25 
2D Navigational Modeling 
[Heatmap: items near the top-left of the page are more likely to be seen; likelihood falls off down and to the right]
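One way to make this concrete is a simple position-bias model in which the chance of an item being seen decays down and across the page. The exponential form and decay rates here are assumptions for illustration, not Netflix’s actual model:

```python
import math

def p_seen(row, col, row_decay=0.4, col_decay=0.15):
    """Assumed probability that a member sees the item at (row, col):
    attention decays going down the page and to the right within a row."""
    return math.exp(-row_decay * row) * math.exp(-col_decay * col)

for row in range(3):
    print([round(p_seen(row, col), 2) for col in range(5)])
# Top-left (0, 0) -> 1.0; values fall off in both directions.
```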
26 
Building a page algorithmically 
• Approaches:
  • Template: non-personalized layout
  • Row-independent: greedily rank rows by f(r | u, c)
  • Stage-wise: pick the next row by f(r | u, c, p1:n), given rows p1…pn already on the page (see the sketch below)
  • Page-wise: total page fitness f(p | u, c)
• Obey constraints:
  • Certain rows may be required (Continue Watching and My List)
  • Filter, de-duplicate
  • Format for device
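A minimal sketch of the stage-wise approach from the list above: place required rows first, then greedily add whichever row scores best given the page so far. The scorer with its genre-diversity penalty is a hypothetical stand-in for f(r | u, c, p1:n):

```python
def build_page(candidate_rows, score, n_rows=10, required=("Continue Watching", "My List")):
    """Greedy stage-wise page construction: required rows first, then
    repeatedly add the row that scores best given the rows picked so far."""
    page = [r for r in candidate_rows if r["name"] in required]
    pool = [r for r in candidate_rows if r["name"] not in required]
    while pool and len(page) < n_rows:
        best = max(pool, key=lambda r: score(r, page))
        page.append(best)
        pool.remove(best)
    return page

# Hypothetical scorer: base row quality minus a penalty for genres already
# on the page (a stand-in for f(r | u, c, p1:n)).
def score(row, page):
    overlap = sum(row["genre"] == p["genre"] for p in page)
    return row["quality"] - 0.5 * overlap

rows = [
    {"name": "Continue Watching", "quality": 1.0, "genre": None},
    {"name": "Top Picks", "quality": 0.9, "genre": "mixed"},
    {"name": "Gritty Dramas", "quality": 0.8, "genre": "drama"},
    {"name": "Critically-acclaimed Dramas", "quality": 0.75, "genre": "drama"},
    {"name": "Comedies", "quality": 0.7, "genre": "comedy"},
]
for r in build_page(rows, score, n_rows=4):
    print(r["name"])  # Comedies beats the second drama row via the diversity penalty
```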
27 
Row Features 
• Quality of items
• Features of items
• Quality of evidence
• User-row interactions
• Item/row metadata
• Recency
• Item-row affinity
• Row length
• Position on page
• Title
• Diversity
• Similarity
• Freshness
• …
28 
Page-level Metrics 
• How do you measure the quality of the homepage? Ease of discovery, diversity, novelty, …
• Challenges:
  • Position effects
  • Row-video generalization
• 2D versions of ranking quality metrics
  • Example: Recall @ row-by-column (a sketch follows below)
[Plot: recall vs. row position, 0–30]
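A sketch of that example metric: recall at a row-by-column cutoff, i.e. the fraction of a member’s held-out plays that land in the top-left region of the page. The page layout and plays here are made up:

```python
def recall_at(page, plays, max_row, max_col):
    """Fraction of held-out plays found in the top-left max_row x max_col
    region of the page (page = list of rows, each a list of video ids)."""
    shown = {v for row in page[:max_row] for v in row[:max_col]}
    hits = sum(v in shown for v in plays)
    return hits / len(plays) if plays else 0.0

# Hypothetical page (3 rows of video ids) and held-out plays.
page = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
plays = [2, 7, 11]
print(round(recall_at(page, plays, max_row=2, max_col=3), 2))  # 2 of 3 plays -> 0.67
```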
29 
Learning in the Cloud
30 
Three levels of Learning Distribution/Parallelization 
1. For each subset of the population (e.g. region):
   • Want independently trained and tuned models
2. For each combination of hyperparameters (see the sketch below):
   • Simple: grid search
   • Better: Bayesian optimization using Gaussian Processes
3. For each subset of the training data:
   • Distribute over machines (e.g. ADMM)
   • Multi-core parallelism (e.g. HogWild)
   • Or… use GPUs
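Level 2 is embarrassingly parallel: each hyperparameter combination trains independently. A minimal grid-search sketch with a process pool; the quadratic “validation error” stands in for real training, and Bayesian optimization would propose the combinations instead of enumerating a grid:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_and_validate(params):
    """Stand-in for training a model with these hyperparameters and
    returning its validation error (here a made-up quadratic bowl)."""
    lr, reg = params
    return (lr - 0.05) ** 2 + (reg - 0.01) ** 2

if __name__ == "__main__":
    grid = list(product([0.01, 0.05, 0.1], [0.001, 0.01, 0.1]))
    with ProcessPoolExecutor() as pool:      # one training run per worker
        errors = list(pool.map(train_and_validate, grid))
    best_error, best_params = min(zip(errors, grid))
    print("best params:", best_params, "validation error:", best_error)
```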
31 
Example: Training Neural Networks 
• Level 1: machines in different AWS regions
• Level 2: machines in the same AWS region
  • Spearmint or MOE for parameter optimization
  • Condor, StarCluster, Mesos, etc. for coordination
• Level 3: highly optimized, parallel CUDA code on GPUs
32 
Conclusions
33 
Evolution of Recommendation Approach 
Rating → Ranking → Page Generation
34 
Research Directions 
• Context awareness
• Full-page optimization
• Presentation effects
• Scaling
• Personalized learning to rank
• Cold start
35
Thank You
Justin Basilico
jbasilico@netflix.com
@JustinBasilico
We’re hiring