
Recent Trends in Personalization at Netflix


Recommendation systems today are widely used across many applications, such as multimedia content platforms, social networks, and e-commerce, to provide suggestions that are most likely to fulfill users' needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don't directly translate into improvements in the real world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to shed light on challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.


Recent Trends in Personalization at Netflix

  1. 1. Recent Trends in Personalization at Netflix Anuj Shah @badshah79 https://www.linkedin.com/in/foranuj/
  2. 2. Why do we personalize?
  3. 3. Help members find entertainment to watch and enjoy to maximize member satisfaction and retention
  4. 4. Spark joy
  5. 5. What do we personalize?
  6. 6. Ranking: the ordering of videos is personalized, from how we rank ...
  7. 7. Rows: the selection and placement of rows is personalized, ... to how we construct a page
  8. 8. Search query & result recommendation: ... to how we respond to queries
  9. 9. Personalized instant choices: ... to how we cover different needs
  10. 10. Message personalization: ... to how we reach out
  11. 11. Everything is a recommendation!
  12. 12. Isn’t this solved yet?
  13. 13. No, personalization is hard! ○ Every person is unique with a variety of interests … and sometimes multiple people use the same profile ○ Help people find what they want when they’re not sure what they want ○ Non-stationary, context-dependent, mood-dependent, ... ○ Large datasets but small data per member … and potentially biased by the output of your system ○ Cold-start problems on all sides ○ More than just accuracy: diversity, novelty, freshness, fairness, ... ○ ...
  14. 14. So how are we going to solve this?
  15. 15. Trending Now: some recent avenues in approaching these challenges: 1. Deep Learning 2. Causality 3. Bandits & Reinforcement Learning 4. Objectives
  16. 16. Trend 1: Deep Learning for Recommendations
  17. 17. Timeline: ~2012, Deep Learning becomes popular in Machine Learning; ~2017, Deep Learning becomes popular in Recommender Systems (what took so long?); ~2019, traditional methods do as well or better than Deep Learning for Recommender Systems … wait, what?
  18. 18. Traditional Recommendations. Collaborative Filtering: recommend items that similar users have chosen [figure: binary user-item interaction matrix with Users and Items axes]
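
A minimal sketch of the collaborative-filtering idea on this slide, using the small binary matrix shown (the split into user rows and item columns is a guess from the figure residue). Cosine similarity over users is one common scoring choice, not necessarily what Netflix uses:

```python
import numpy as np

# Binary user-item interaction matrix from the slide (row/column split assumed).
R = np.array([
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
], dtype=float)

def recommend_for(user: int, k: int = 2) -> np.ndarray:
    """Recommend items that similar users have chosen (user-based CF, cosine similarity)."""
    norms = np.linalg.norm(R, axis=1, keepdims=True) + 1e-9
    sims = (R / norms) @ (R[user] / norms[user])   # similarity of every user to `user`
    sims[user] = 0.0                               # ignore the user themselves
    scores = sims @ R                              # similarity-weighted votes per item
    scores[R[user] > 0] = -np.inf                  # drop items already interacted with
    return np.argsort(-scores)[:k]

print(recommend_for(user=0))  # indices of the top-2 recommended items
```
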
  19. 19. A Matrix Factorization view [figure: the user-item matrix R approximated as the product of a user-embedding matrix U and an item-embedding matrix V, R ≈ U V]
  20. 20. A Feed-Forward Network view [figure: the same U and V drawn as embedding layers of a feed-forward network]
  21. 21. A (deeper) feed-forward view [figure: U and V feeding deeper layers, trained with a mean squared loss]
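
A hedged PyTorch sketch of the matrix-factorization-as-network view on slides 19-21: user and item embeddings whose dot product approximates an entry of R, trained with the mean squared loss from the slide. All sizes and hyperparameters are illustrative; the deeper variant would replace the dot product with an MLP over the concatenated embeddings.

```python
import torch
import torch.nn as nn

n_users, n_items, dim = 1000, 500, 32   # illustrative sizes

class MFNet(nn.Module):
    """Matrix factorization written as a (shallow) feed-forward network."""
    def __init__(self):
        super().__init__()
        self.U = nn.Embedding(n_users, dim)   # user embeddings
        self.V = nn.Embedding(n_items, dim)   # item embeddings

    def forward(self, users, items):
        # Dot product of user and item embeddings predicts the corresponding entry of R.
        return (self.U(users) * self.V(items)).sum(dim=-1)

model = MFNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# One training step on a toy batch of (user, item, observed) triples.
users = torch.randint(0, n_users, (64,))
items = torch.randint(0, n_items, (64,))
labels = torch.randint(0, 2, (64,)).float()
loss = nn.functional.mse_loss(model(users, items), labels)   # mean squared loss
opt.zero_grad(); loss.backward(); opt.step()
```
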
  22. 22. … the mean squared loss isn’t always the best choice. Also see [Dacrema et al., 2019], [Rendle et al., 2019], [Rendle et al., 2021]. Make sure you tune your baselines.
  23. 23. Understanding the relationships: from our forthcoming AI Magazine article “Deep Learning for Recommender Systems: A Netflix Case-Study”
  24. 24. EASE: Embarrassingly Shallow Auto-Encoders [Steck, 2019] ● Super efficient model to train in a collaborative filtering setting, inspired by SLIM ● Learn an item-by-item matrix X such that R·X is close to R and diag(X) is 0 ○ Avoids the trivial identity solution ● Closed-form solution ● More on that: auto-encoders that don’t overfit towards the identity
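
The closed-form solution mentioned on the EASE slide can be sketched in a few lines of numpy, following the formulation published in [Steck, 2019] (the item-item matrix is called X on the slide; here it is B, and the regularization strength is illustrative):

```python
import numpy as np

def ease(R: np.ndarray, lam: float = 500.0) -> np.ndarray:
    """Closed-form EASE: learn an item-item matrix B with R @ B close to R and diag(B) = 0."""
    G = R.T @ R + lam * np.eye(R.shape[1])   # regularized item-item Gram matrix
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                    # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)                 # enforce the zero diagonal (avoids the identity solution)
    return B

# Scores for every user and item are then simply R @ ease(R).
```
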
  25. 25. Modern Recommender Systems [figure: the binary user-item interaction matrix with Users and Items axes, again]
  26. 26. Modern Recommender Data Model: contextual event data = interactions + impressions + item data + profile settings
  27. 27. Contextual sequence data [figure: a time-ordered sequence of (context, item) events per member, with timestamps, and the next item to predict marked “?”]
  28. 28. Sequential Recommendation Network [figure: input interactions (X) with timestamps feed a DNN / RNN / CNN / TNN, aggregated via Avg / Stack / Sequence / Attention, followed by a softmax over items modeling p(Y|X)]
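
A minimal PyTorch sketch of one instance of this diagram: a GRU over the embedded interaction sequence with a softmax over items for p(Y|X). Context features and the CNN/attention variants named on the slide are omitted; catalog size and dimensions are made up.

```python
import torch
import torch.nn as nn

class SeqRecommender(nn.Module):
    """Toy sequential recommender: embed past items, run a GRU, softmax over the catalog."""
    def __init__(self, n_items: int = 10_000, dim: int = 64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim, padding_idx=0)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_items)              # logits over items; softmax gives p(Y|X)

    def forward(self, item_seq):                          # item_seq: (batch, seq_len) of item ids
        hidden, _ = self.rnn(self.item_emb(item_seq))
        return self.head(hidden[:, -1, :])                # predict the next item from the last state

model = SeqRecommender()
batch = torch.randint(1, 10_000, (8, 20))                 # 8 members, 20 past interactions each
next_items = torch.randint(1, 10_000, (8,))
loss = nn.functional.cross_entropy(model(batch), next_items)
```
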
  29. 29. Offline Ranking Improvements
  30. 30. Trend 2: Causality
  31. 31. From Correlation to Causation ● Most recommendation algorithms are correlational ○ Some early recommendation algorithms literally computed correlations between users and items ● Did you watch a movie because we recommended it to you? Or because you liked it? Or both? ● If you had to watch a movie, would you like it? [Wang et al., 2020] p(Y|X) → p(Y|X, do(R)) (from http://www.tylervigen.com/spurious-correlations)
  32. 32. Feedback loops: impression bias inflates plays, inflated plays lead to inflated item popularity, more plays, and more impressions. Feedback loops can cause biases to be reinforced by the recommendation system! [Chaney et al., 2018]: simulations showing that this can reduce the usefulness of the system. They’re real: we have seen oscillations in the distribution of genre recommendations.
  33. 33. Lots of feedback loops...
  34. 34. Closed Loop vs. Open Loop [diagram: closed loop: Training Data → Model → Recs → Watches, feeding back into the Training Data; open loop: Watches also arrive via Search, outside the recommender]
  35. 35. Closed Loop vs. Open Loop [same diagram, with the closed feedback loop marked as the Danger Zone]
  36. 36. Closed Loop vs. Open Loop [same diagram, with the closed feedback loop marked as the Danger Zone]
  37. 37. Propensity Correction [figure: the same sequential network over input interactions (X), now with two output heads: a policy softmax modeling p(Y|X, do(R)) and a propensity softmax modeling p(R|X)], e.g. [Chen et al., 2019]
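
A hedged sketch of the propensity-correction idea in the diagram: reweight the logged reward by the ratio of the policy's probability to the propensity (behavior) probability of the logged action. This is a simplified single-item importance-weighted loss, not the exact top-K correction of [Chen et al., 2019]; the weight cap is illustrative.

```python
import torch

def ips_policy_loss(policy_logits, propensity_logits, actions, rewards, cap: float = 10.0):
    """REINFORCE-style loss with an inverse-propensity correction pi(a|x) / beta(a|x).

    policy_logits, propensity_logits: (batch, n_actions); actions: (batch,) long; rewards: (batch,).
    """
    log_pi = torch.log_softmax(policy_logits, dim=-1).gather(1, actions[:, None]).squeeze(1)
    log_beta = torch.log_softmax(propensity_logits, dim=-1).gather(1, actions[:, None]).squeeze(1)
    weights = torch.exp(log_pi - log_beta).detach().clamp(max=cap)   # capped importance weights
    return -(weights * rewards * log_pi).mean()
```
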
  38. 38. Challenges in Causal Recommendations ● Handling unobserved confounders ● Coming up with the right causal graph ● High variance (especially propensity-based ones) ● Computational challenges (e.g. [Wong, 2020]) ● Off-policy evaluation ● When and how to introduce exploration
  39. 39. Trend 3: Bandits & Reinforcement Learning in Recommendations
  40. 40. Why contextual bandits for recommendations? ● Break feedback loops ● Want to explore to learn ● Uncertainty around member interests and new items ● Sparse and indirect feedback ● Changing trends ▶ Early news example: [Li et al., 2010]
  41. 41. Example: What to show first?
  42. 42. Recommendation as Contextual Bandit ● Environment: Netflix homepage ● Context: Member ● Arm: Display video at top of page ● Policy: Selects a video to recommend ● Reward: Member plays and enjoys video [diagram: a Video Selector choosing which video to display]
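
The slide's framing can be turned into a toy contextual bandit loop. This sketch uses a linear reward model per arm and epsilon-greedy exploration, which is one simple choice and not a description of Netflix's actual system; shapes and the learning rate are assumptions.

```python
import numpy as np

class EpsilonGreedyVideoSelector:
    """Toy contextual bandit: context = member features, arm = video to display at the top."""
    def __init__(self, n_arms: int, dim: int, eps: float = 0.1):
        self.W = np.zeros((n_arms, dim))   # one linear reward model per arm
        self.eps = eps

    def select(self, context: np.ndarray) -> int:
        if np.random.rand() < self.eps:                 # explore
            return np.random.randint(len(self.W))
        return int(np.argmax(self.W @ context))         # exploit: highest predicted reward

    def update(self, arm: int, context: np.ndarray, reward: float, lr: float = 0.05):
        """Reward = member played and enjoyed the video (1) or not (0)."""
        error = reward - self.W[arm] @ context
        self.W[arm] += lr * error * context             # SGD step on squared error
```
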
  43. 43. [diagram: bandit setup where features of the Member (context) and Video (arm) feed Models 1-4, each predicting a probability of enjoyment (predicted reward), and the bandit picks the winner]
  44. 44. Causality & Bandits [Dimakopoulou et al., 2021] ● Data collected from bandits is not IID ○ Bandits collect data adaptively ○ Initial noise may mean choosing an arm less often, which can keep its sample mean low ● Inverse Propensity Weighting? High variance ○ Take inspiration from Doubly Robust estimators ● Doubly Adaptive Thompson Sampling (DATS) ○ Thompson Sampling using the distribution of the Adaptive Doubly Robust estimator in place of the posterior ○ DATS performs better in practice and matches TS regret bound
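
For contrast with the slide, here is what plain Beta-Bernoulli Thompson Sampling looks like; DATS, as described above, replaces these posterior draws with draws from the distribution of the adaptive doubly robust estimator, which is not shown here.

```python
import numpy as np

def thompson_sample(successes: np.ndarray, failures: np.ndarray) -> int:
    """Standard Thompson Sampling: draw one plausible reward per arm from its Beta posterior, pick the max."""
    draws = np.random.beta(successes + 1, failures + 1)   # Beta(1, 1) prior on each arm
    return int(np.argmax(draws))

# After observing a play (reward 1) or no play (reward 0) for the chosen arm,
# increment that arm's success or failure count and repeat.
```
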
  45. 45. ● Designing good exploration is an art ○ Especially to support future algorithm innovation ○ Challenging to do member-level A/B tests comparing fully on-policy bandits at high scale ● Bandits over large action spaces: rankings and slates ● Layers of bandits that influence each other ● Handling delayed rewards Challenges with bandits in the real world
  46. 46. Going Long-Term ● Want to maximize long-term member joy ● Involves many member visits, recommendation actions and delayed reward ● … sounds like Reinforcement Learning
  47. 47. How long? Within a page: RL to optimize a ranking or slate. Within a session: RL to optimize multiple interactions in a session. Across sessions: RL to optimize interactions across multiple sessions.
  48. 48. Building simulators for evaluating recommenders: Ranking, Page-level, and Whole system (Accordion) [McInerney et al., 2021]
  49. 49. Many potential directions ● Embeddings for actions: List-wise [Zhao et al., 2017] or Page-wise recommendation [Zhao et al., 2018] based on [Dulac-Arnold et al., 2016] ● Adversarial model for user simulator: GAN-like model [Chen et al., 2019] ● Policy Gradient: Candidate generator using REINFORCE and TRPO [Chen et al., 2019] ● Multi-task: Additional model head or Actor-Critic [Xin et al., 2020], Auxiliary tasks for REINFORCE [Chen et al., 2021] ● Handling Diversity [Hansen et al., 2021], Slates [Ie et al., 2020], & Multiple Recommenders [Zhao et al., 2020] ● ...
  50. 50. Trend 4: Objectives
  51. 51. What is your recommender trying to optimize? ● We want to optimize long-term member joy ● While accounting for: ○ Avoiding “trust busters” ○ Cold-starting ○ Fairness ○ Findability ○ ...
  52. 52. Layers of Metrics: Training Objective → Offline Metric → Online Metric → Goal
  53. 53. Layers of Metrics, example case of misaligned metrics: Training Objective = RMSE → Offline Metric = NDCG on historical data → Online Metric = Member Engagement in A/B test → Goal = Joy
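
Since NDCG on historical data is named as the offline metric in this example, here is a minimal NDCG@k for a single ranked list of relevance labels, one standard definition among several:

```python
import numpy as np

def ndcg_at_k(ranked_relevances, k: int = 10) -> float:
    """NDCG@k for one ranked list (e.g. relevance 1 = the member played the title, 0 = did not)."""
    rel = np.asarray(ranked_relevances, dtype=float)
    top = rel[:k]
    discounts = 1.0 / np.log2(np.arange(2, top.size + 2))   # 1/log2(rank + 1)
    dcg = float((top * discounts).sum())
    ideal = np.sort(rel)[::-1][:k]                           # best possible ordering
    idcg = float((ideal * discounts[: ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([1, 0, 1, 0, 0], k=5))   # ~0.92 for this toy ranking
```
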
  54. 54. Our recommendations can only be as good as the metrics we measure them on
  55. 55. Recap [More et al., 2019] ● Bandit replay-style metrics can have high variance due to low number of matches with large action spaces ● Use a ranking approach: Good to rank high reward arms near top, low reward arms near bottom
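
The replay-style evaluation the slide refers to can be sketched as follows, in the spirit of the classic replay method: only logged events where the candidate policy picks the same arm as the logged policy contribute, which is exactly why large action spaces leave few matches and high variance. The ranking-based alternative mentioned on the slide is not shown.

```python
import numpy as np

def replay_metric(policy, logged_events):
    """Average reward over logged (context, logged_arm, reward) events where the
    candidate policy would have chosen the same arm as was actually shown."""
    matched = [reward for context, logged_arm, reward in logged_events
               if policy(context) == logged_arm]
    if not matched:
        return float("nan"), 0
    return float(np.mean(matched)), len(matched)   # estimate and number of matched events
```
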
  56. 56. Challenges in objectives ● Nuanced metrics: ○ Differences between what you want and what you can encapsulate in a metric ○ Where does enjoyment come from? How does that vary by person? ○ How do you measure that at scale? ● What about effects beyond the typical A/B time horizon? ● Incorporating fairness ○ Calibration to distribution of user tastes [Steck, 2018] ○ Item cold-start [Zhu et al., 2021] ● Beyond algorithms: Ensuring a positive impact on society
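
For the calibration bullet, a sketch of a calibration measure in the spirit of [Steck, 2018]: the KL divergence between the genre distribution of a member's history and that of their recommendations, with the recommendation distribution smoothed towards the history so the divergence stays finite. The smoothing weight and the example distributions are illustrative.

```python
import numpy as np

def calibration_kl(p_history: np.ndarray, q_recs: np.ndarray, alpha: float = 0.01) -> float:
    """KL(p || q~) between genre distributions of history (p) and recommendations (q)."""
    q_smoothed = (1.0 - alpha) * q_recs + alpha * p_history   # keep support wherever p > 0
    mask = p_history > 0
    return float(np.sum(p_history[mask] * np.log(p_history[mask] / q_smoothed[mask])))

# Example: history is 70% comedies / 30% dramas; recommendations are 99% comedies.
print(calibration_kl(np.array([0.7, 0.3]), np.array([0.99, 0.01])))   # large value = poorly calibrated
```
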
  57. 57. Conclusion
  58. 58. 1. Deep Learning 2. Causality 3. Bandits & Reinforcement Learning 4. Objectives A few recent trends in personalization
  59. 59. Sound interesting? Join us: research.netflix.com/jobs
  60. 60. Thank you Anuj Shah @badshah79 https://www.linkedin.com/in/foranuj/
