
Learning to Personalize


Talk from MLConf Atlanta on 2014-09-19 about how we use machine learning at Netflix to do personalized recommendations.


1. Learning to Personalize • Justin Basilico, Page Algorithms Engineering • September 19, 2014 • @JustinBasilico • MLconf ATL 2014
2. • Interested in high-quality recommendations • Proxy question: accuracy in predicted rating, measured by root mean squared error (RMSE) • Improve by 10% = $1 million! • Data size: 100M ratings (back then “almost massive”)
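The deck contains no code, but the proxy metric is simple enough to sketch. A minimal illustration of RMSE, assuming NumPy arrays of true and predicted ratings (the function name is mine, not the deck's):

```python
import numpy as np

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Root mean squared error between true and predicted ratings."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy check on three ratings: two predictions off by 0.5 give RMSE ~0.41.
print(rmse(np.array([4.0, 3.0, 5.0]), np.array([3.5, 3.0, 4.5])))
```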
3. Change of focus: 2006 → 2014
4. Netflix Scale • > 50M members • > 40 countries • > 1,000 device types • > 1B hours watched/month • > 30M plays/day • 100B events logged/day • 34.2% of peak US downstream traffic
5. Goal: Help members find content to watch and enjoy, to maximize member satisfaction and retention
6. “Emmy Winning” Approach to Recommendation
7. Everything is a Recommendation • Both the rows and the ranking within rows are personalized • Over 75% of what people watch comes from our recommendations • Recommendations are driven by machine learning
8. Top Picks • Personalization awareness • Diversity
9. Personalized genres • Genres focused on user interest • Derived from tag combinations • Provide context and evidence • How are they generated? Implicit: based on recent plays, ratings & other interactions; Explicit: taste preferences
10. Similarity • Recommend videos similar to one you’ve liked • “Because you watched” rows • Pivots • Video information page • In response to user actions (search, list add, …)
11. Support for Recommendations • Behavioral support • Social support
12. Learning to Recommend
13. Machine Learning Approach • Problem • Data • Metrics • Model • Algorithm
14. Data • Plays (duration, bookmark, time, device, …) • Ratings • Metadata (tags, synopsis, cast, …) • Impressions • Interactions (search, list add, scroll, …) • Social
15. Models & Algorithms • Regression (linear, logistic, elastic net) • SVD and other Matrix Factorizations • Factorization Machines • Restricted Boltzmann Machines • Deep Neural Networks • Markov Models and Graph Algorithms • Clustering • Latent Dirichlet Allocation • Gradient Boosted Decision Trees / Random Forests • Gaussian Processes • …
16. Rating Prediction • Based on first-year Progress Prize • Top 2 algorithms: Matrix Factorization (SVD++) and Restricted Boltzmann Machines (RBM) • Ensemble: linear blend • [Diagram: R (Users × Videos, 99% sparse) ≈ U (Users × d) × V (d × Videos)]
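As an illustration of the factorization pictured on this slide, here is a hedged sketch: plain SGD matrix factorization standing in for SVD++ (which adds implicit-feedback terms), fitting R ≈ U·Vᵀ on the observed entries only. All names and hyperparameters below are assumptions, not the deck's:

```python
import numpy as np

def factorize(ratings, n_users, n_videos, d=20, lr=0.01, reg=0.05, epochs=20):
    """ratings: list of (user, video, rating) triples (the ~1% observed)."""
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, d))   # user factors (Users x d)
    V = rng.normal(scale=0.1, size=(n_videos, d))  # video factors (Videos x d)
    for _ in range(epochs):
        for u, v, r in ratings:
            pu, qv = U[u].copy(), V[v].copy()
            err = r - pu @ qv                      # error on this rating
            U[u] += lr * (err * qv - reg * pu)     # SGD step on user factors
            V[v] += lr * (err * pu - reg * qv)     # SGD step on video factors
    return U, V

# Predicted rating for user u on video v: U[u] @ V[v]
```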
17. Ranking by ratings • [Figure: a row of niche titles, all with average ratings of 4.5–4.7] • High average ratings… by those who would watch it
18. RMSE
19. Learning to Rank • Approaches: Point-wise: loss over items (classification, ordinal regression, MF, …); Pair-wise: loss over preferences (RankSVM, RankNet, BPR, …); List-wise: (smoothed) loss over ranking (LambdaMART, DirectRank, GAPfm, …) • Ranking quality measures: Recall@N, Precision@N, NDCG, MRR, ERR, MAP, FCP, … • [Plot: importance as a function of rank for NDCG, MRR, and FCP weightings]
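To make one of the listed quality measures concrete, here is a small sketch of NDCG@N; the slide only names the metric, so the implementation details (linear gain, log2 discount) are standard choices, not the deck's:

```python
import numpy as np

def ndcg_at_n(relevances, n=10):
    """NDCG@N; `relevances` are graded gains of items in ranked order."""
    rel = np.asarray(relevances, dtype=float)[:n]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))  # 1 / log2(rank + 1)
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:n]
    idcg = float((ideal * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

# Pushing the one relevant item from rank 1 to rank 3 halves the score:
# ndcg_at_n([1, 0, 0]) == 1.0, ndcg_at_n([0, 0, 1]) == 0.5
```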
20. Example: two features, linear model • [Plot: videos placed by popularity (x) and predicted rating 1–5 (y); the linear model determines the final ranking] • f_rank(u, v) = w1 · p(v) + w2 · r(u, v)
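A literal reading of this slide's linear model as code; `p` and `r` are hypothetical feature lookups and the weights are arbitrary examples:

```python
def f_rank(user, video, p, r, w1=0.4, w2=0.6):
    """The slide's score: f_rank(u, v) = w1 * p(v) + w2 * r(u, v)."""
    return w1 * p(video) + w2 * r(user, video)

def rank_videos(user, videos, p, r):
    """Sort candidate videos by descending score to get the final ranking."""
    return sorted(videos, key=lambda v: f_rank(user, v, p, r), reverse=True)
```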
21. Personalized Ranking
22. “Learning to Row”: putting a page together
23. Page-level algorithmic challenge • 10,000s of possible rows • 10-40 rows per page • Variable number of possible videos per row (up to thousands) • 1 personalized page per device
24. Balancing a Personalized Page • Accurate vs. Diverse • Discovery vs. Continuation • Depth vs. Coverage • Freshness vs. Stability • Recommendations vs. Tasks
25. 2D Navigational Modeling • [Figure: grid of page positions shaded from “more likely to see” near the top-left to “less likely” farther down and to the right]
26. Building a page algorithmically • Approaches: Template: non-personalized layout; Row-independent: greedily rank rows by f(r | u, c); Stage-wise: pick next rows by f(r | u, c, p_1:n); Page-wise: total page fitness f(p | u, c) • Obey constraints: certain rows may be required (Continue Watching and My List) • Filter, de-duplicate • Format for device
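A hedged sketch of the stage-wise option from this slide: greedily pick the next row given the rows chosen so far, place required rows first, and de-duplicate. `Row` and `score_row` (standing in for f(r | u, c, p_1:n)) are hypothetical names, not Netflix's API:

```python
from dataclasses import dataclass

@dataclass
class Row:
    name: str
    videos: list  # ordered video ids

def build_page(user, context, candidates, score_row, max_rows=40,
               required=("Continue Watching", "My List")):
    """Greedy stage-wise page construction."""
    page = [r for r in candidates if r.name in required]     # obey constraints
    pool = [r for r in candidates if r.name not in required]
    while pool and len(page) < max_rows:
        # Score each remaining row against the page built so far.
        best = max(pool, key=lambda r: score_row(r, user, context, page))
        pool.remove(best)
        shown = {v for row in page for v in row.videos}
        deduped = [v for v in best.videos if v not in shown]  # de-duplicate
        if deduped:
            page.append(Row(best.name, deduped))
    return page
```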
27. Row Features • Quality of items • Features of items • Quality of evidence • User-row interactions • Item/row metadata • Recency • Item-row affinity • Row length • Position on page • Title • Diversity • Similarity • Freshness • …
28. Page-level Metrics • How do you measure the quality of the homepage? Ease of discovery, diversity, novelty, … • Challenges: position effects; row-video generalization • 2D versions of ranking quality metrics • Example: Recall @ row-by-column • [Plot: recall as a function of row position, rows 0-30]
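One plausible reading of “Recall @ row-by-column” as code (my construction; the deck does not define the metric precisely): treat the top `rows` × `cols` corner of the page as retrieved, and measure what fraction of the member's later plays it covers:

```python
def recall_at_row_by_column(page, played, rows=3, cols=10):
    """page: list of rows, each a list of video ids; played: set of video ids."""
    if not played:
        return 0.0
    visible = {v for row in page[:rows] for v in row[:cols]}
    return len(visible & played) / len(played)
```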
29. Learning in the Cloud
30. Three levels of Learning Distribution/Parallelization • 1. For each subset of the population (e.g. region): want independently trained and tuned models • 2. For each combination of hyperparameters: simple: grid search; better: Bayesian optimization using Gaussian Processes • 3. For each subset of the training data: distribute over machines (e.g. ADMM); multi-core parallelism (e.g. HogWild); or… use GPUs
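For level 2, a minimal grid-search sketch; `train_and_eval` is a placeholder callback returning a validation metric to maximize. The slide's “better” option, Bayesian optimization with Gaussian Processes, would replace the exhaustive loop with a model-guided search:

```python
from itertools import product

def grid_search(train_and_eval, grid):
    """grid: dict mapping hyperparameter name -> list of candidate values."""
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        score = train_and_eval(params)  # train a model, return validation score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Example: grid_search(fit_fn, {"lr": [0.01, 0.1], "d": [20, 50]})
```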
31. Example: Training Neural Networks • Level 1: machines in different AWS regions • Level 2: machines in the same AWS region; Spearmint or MOE for parameter optimization; Condor, StarCluster, Mesos, etc. for coordination • Level 3: highly optimized, parallel CUDA code on GPUs
32. Conclusions
33. Evolution of Recommendation Approach: rating prediction (e.g. 4.7 stars) → ranking → page generation
34. Research Directions • Context awareness • Full-page optimization • Presentation effects • Scaling • Personalized learning to rank • Cold start
35. Thank You • Justin Basilico • jbasilico@netflix.com • @JustinBasilico • We’re hiring
