
A Multi-Armed Bandit Framework For Recommendations at Netflix

In this talk, we present a general multi-armed bandit framework for recommendations on the Netflix homepage. We present two example case studies using MABs at Netflix: a) artwork personalization, which recommends a personalized visual for each title for each of our members, and b) billboard recommendation, which recommends the right title to feature on the Billboard.



  1. A Multi-Armed Bandit Framework for Recommendations at Netflix. Jaya Kawale & Fernando Amat, PRS Workshop, June 2018.
  2. Quickly help members discover content they'll love.
  3. Global Members, Personalized Tastes: 125 million members in ~200 countries.
  4. 98% Match
  5. Spot the Algorithms! 98% Match
  6. Spot the Algorithms! 98% Match
  7. Case Study I: Artwork Optimization. Goal: recommend a personalized artwork or image for each title to help members decide whether they will enjoy it.
  8. Case Study II: Billboard Recommendation. Goal: successfully introduce content to the right members.
  9. Traditional Approaches for Recommendation: Collaborative Filtering. The idea is to use the "wisdom of the crowd" to recommend items; the approach is well understood and various algorithms exist (e.g., Matrix Factorization). [Figure: binary users-by-items interaction matrix.]
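A minimal sketch of the matrix-factorization flavor of collaborative filtering mentioned on this slide, assuming a tiny binary play matrix like the one pictured; the factor dimension, learning rate, and regularizer are illustrative choices, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny binary users-by-items play matrix, like the one on the slide.
R = np.array([[0, 1, 0, 1, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1]], dtype=float)

k, lr, reg = 3, 0.05, 0.01                       # latent dims, step size, L2 penalty
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))  # item factors

for _ in range(500):       # full-batch gradient descent on squared error;
    err = R - U @ V.T      # for simplicity, zeros are treated as weak negatives
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)

scores = U @ V.T                       # predicted affinities per user and item
top = scores.argsort(axis=1)[:, ::-1]  # recommend the highest-scoring items
```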
  10. Challenges for Traditional Approaches: scarce feedback; dynamic catalog; country availability; non-stationary member base; time sensitivity (content popularity changes, member interests evolve, and we must respond quickly to member feedback).
  11. Challenges for Traditional Approaches (continued): the same list, with the takeaway that continuous and fast learning is needed.
  12. Multi-Armed Bandits: increasingly successful in various practical settings where these challenges occur, such as clinical trials, network routing, online advertising, AI for games, and hyperparameter optimization.
  13. Multi-Armed Bandits: a gambler plays multiple slot machines with unknown reward distributions. Which machine should be played to maximize reward?
  14. Multi-Armed Bandit for Recommendation. Exploration-exploitation tradeoff: recommend the optimal title given the evidence (exploit), or recommend other titles to gather feedback (explore).
  15. Numerous Variants. Different strategies: ε-greedy, Thompson Sampling (TS), Upper Confidence Bound (UCB), etc. Different environments: stochastic and stationary (rewards are generated i.i.d. from a distribution specific to each action; no payoff drift) vs. adversarial (no assumptions on how rewards are generated). Different objectives: cumulative regret, tracking the best expert. Continuous or discrete action sets, finite vs. infinite. Extensions: varying sets of arms, contextual bandits, etc.
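As a hedged illustration of two of these strategies on a simple Bernoulli bandit (a sketch under assumed class and function names, not code from the talk): Thompson Sampling keeps a Beta posterior per arm and plays the best posterior draw, while UCB1 adds an exploration bonus to each arm's empirical mean.

```python
import math
import random

class ArmStats:
    """Success/failure counts for one Bernoulli arm."""
    def __init__(self):
        self.successes = 0
        self.failures = 0

def thompson_pick(arms):
    # Sample a play-rate from each arm's Beta(successes+1, failures+1)
    # posterior and play the arm with the best draw.
    return max(range(len(arms)),
               key=lambda a: random.betavariate(arms[a].successes + 1,
                                                arms[a].failures + 1))

def ucb1_pick(arms, t):
    # Play every arm once, then pick argmax of mean + sqrt(2 ln t / n),
    # where t is the current round (starting at 1).
    for a, s in enumerate(arms):
        if s.successes + s.failures == 0:
            return a
    def score(s):
        n = s.successes + s.failures
        return s.successes / n + math.sqrt(2 * math.log(t) / n)
    return max(range(len(arms)), key=lambda a: score(arms[a]))
```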
  16. Case Study I: Artwork Personalization
  17. Bandit Algorithms Setting. For each (user, show) request: the actions are the set of candidate images available; the reward is how many minutes the user played from that impression; the environment is the Netflix homepage on the user's device; and the learner's goal is to maximize the cumulative reward after N requests. [Diagram: learner and environment exchanging action, reward, and context.]
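A minimal sketch of this interaction loop; `learner` and `environment` are hypothetical interfaces standing in for the components named on the slide, and only the loop structure itself is the point:

```python
def run_bandit(learner, environment, n_requests):
    total_reward = 0.0
    for _ in range(n_requests):
        context = environment.next_request()                # (user, show) features
        action = learner.choose(context)                    # candidate image to show
        reward = environment.play_minutes(context, action)  # minutes played
        learner.update(context, action, reward)             # online learning step
        total_reward += reward
    return total_reward  # the learner tries to maximize this after N requests
```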
  18. Specific challenges. Play attribution and reward assignment: isolating the incremental effect of the image on top of the recommender system. Would you play because the movie is recommended, because of the artwork, or both? Also, only one image per title can be presented, although it is inherently a ranking problem.
  19. Specific challenges. Change effect: can changing images too often confuse users? [Diagram: two image sequences, A and B, across sessions 1 through N.]
  20. Actions. We have control over the set of actions: how many images per show, and the image design. What makes a good asset? Representative (no clickbait), differential, informative, engaging, and personal (i.e., contextual).
  21. Intuition for Personalized Assets. Emphasize themes through different artwork according to some context (user, viewing history, country, etc.). Example: preferences in genre.
  22. Intuition for Personalized Assets. Emphasize themes through different artwork according to some context (user, viewing history, country, etc.). Example: preferences in cast members.
  23. Epsilon-Greedy for MABs. With probability ε, explore: collect unbiased training data, like an A/B test across actions. With probability 1−ε, exploit: greedily select the optimal action.
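A sketch of this scheme, assuming a hypothetical `play_prob_model` scorer for the current context (not a named Netflix component):

```python
import random

def epsilon_greedy_choice(candidate_images, play_prob_model, context, epsilon=0.1):
    # Explore branch: uniform random image, logged as unbiased training data.
    if random.random() < epsilon:
        return random.choice(candidate_images), "explore"
    # Exploit branch: greedily pick the image the model currently favors.
    scores = {img: play_prob_model(context, img) for img in candidate_images}
    return max(scores, key=scores.get), "exploit"
```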
  24. Greedy Exploit Policy. Learn a binary classifier per image to predict the probability of play, then pick the winner (arg max). [Diagram: member (context) features feed one model per image in the pool; arg max selects the winner.]
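A sketch of the per-image classifier setup, using scikit-learn's logistic regression purely for illustration; the talk does not specify the model class, and the data layout here is assumed:

```python
from sklearn.linear_model import LogisticRegression

def train_per_image_models(explore_logs):
    # explore_logs: {image_id: (member_feature_matrix, played_labels)},
    # collected under the explore branch.
    return {image_id: LogisticRegression().fit(X, y)
            for image_id, (X, y) in explore_logs.items()}

def pick_winner(models, member_features):
    # member_features: 1-D numpy array of context features for one request.
    probs = {img: model.predict_proba(member_features.reshape(1, -1))[0, 1]
             for img, model in models.items()}
    return max(probs, key=probs.get)  # arg max over the image pool
```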
  25. Take Fraction. Example: Luke Cage. Of users A, B, and C, one played, so Take Fraction = 1/3.
  26. Offline metric: Replay [Li et al., 2010]. Unbiased offline evaluation from explore data. [Figure: six users with their random assignments, play outcomes, and model assignments; Offline Take Fraction = 2/3.]
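A sketch of the Replay estimator as described by Li et al. (2010): over logs collected under the uniformly random explore policy, keep only the impressions where the model would have chosen the same image that was actually shown, and compute the take fraction on that matched subset. The data layout is assumed for illustration.

```python
def replay_take_fraction(explore_logs, model_assign):
    # explore_logs: iterable of (context, shown_image, played) tuples logged
    # under the uniformly random policy; model_assign maps context -> image.
    matched = plays = 0
    for context, shown_image, played in explore_logs:
        if model_assign(context) == shown_image:  # model agrees with the log
            matched += 1
            plays += int(played)
    return plays / matched if matched else 0.0    # offline take fraction
```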
  27. Offline Replay. Context matters; artwork diversity matters; personalization wiggles around the most popular images. [Figure: lift in Replay for the various algorithms compared to the Random baseline.]
  28. Online results. Rolled out to our >125M member base. Most beneficial for lesser-known titles. Compression relative to title-level offline metrics due to cannibalization between titles.
  29. Case Study II: Billboard Recommendation
  30. Considerations for the greedy policy. Explore: bandwidth allocation and the cost of exploration; new vs. existing titles. Exploit: model synchronization; title availability; frequency of model updates; incremental updates vs. batch training (stationarity of title popularities).
  31. Greedy Exploit Policy. [Diagram: member features and a candidate pool feed one model per title, each producing a probability of play; the winner is selected.]
  32. Would the member have played the title anyway?
  33. Netflix Promotions. The Netflix homepage is expensive real estate (opportunity cost): so many titles to promote, so few opportunities to win a "moment of truth". [Figure: probability of play across days D1-D5, with promotion decisions.]
  34. Netflix Promotions (continued). Traditional (correlational) ML systems take action if the probability of positive reward is high, irrespective of the reward base rate, and don't model the incremental effect of taking the action.
  35. Incrementality from Advertising. Goal: measure ad effectiveness. Incrementality: the difference in outcome because the ad was shown, i.e., the causal effect of the ad. [Figure: random assignment into treatment (the ad) and control (other advertisers' ads); revenue of $1.1M vs. $1.0M, a $100k incremental difference.] Johnson, Garrett A., Lewis, Randall A., and Nubbemeyer, Elmar I., "Ghost Ads: Improving the Economics of Measuring Online Ad Effectiveness" (January 12, 2017). Simon Business School Working Paper No. FR 15-21. Available at SSRN: https://ssrn.com/abstract=2620078
  36. Incrementality Based Policy. Goal: select the title for promotion that benefits most from being shown on the billboard. A member can play a title from other sections of the homepage or from search, and popular titles are likely to appear on the homepage anyway (e.g., in Trending Now), so we should better utilize the most expensive real estate on the homepage. Define the policy to be incremental with respect to the probability of play.
  37. Incrementality Based Policy on Billboard. Goal: recommend the title with the largest additional benefit from being presented on the Billboard, i.e., recommend the title with the arg max of the incremental probability of play: P(play | title shown on billboard) − P(play | title not shown). A sketch follows below.
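A sketch of that selection rule, assuming two hypothetical models for the probability of play with and without the billboard impression:

```python
def pick_billboard_title(candidates, member_features,
                         p_play_with_billboard, p_play_without_billboard):
    # Both probability functions are assumed components: P(play | shown on
    # the billboard) and P(play | not shown). Pick the largest difference.
    def incremental_lift(title):
        return (p_play_with_billboard(member_features, title)
                - p_play_without_billboard(member_features, title))
    return max(candidates, key=incremental_lift)  # arg max of incremental lift
```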
  38. Which titles benefit from the Billboard? Title A benefits much more than Title C from being shown on the Billboard. [Figure: scatter plot of incremental vs. baseline probability of play for various members.]
  39. Offline & Online Results. The incrementality-based policy sacrifices some replay by selecting lesser-known titles that benefit from being shown on the Billboard. Our implementation of incrementality is able to shift engagement within the candidate pool. [Figure: lift in Replay for the various algorithms compared to a random baseline.]
  40. Research Directions
  41. Action selection orchestration. Neighboring image selections influence the result; title-level optimization is not enough. [Figure: Row A with diverse images vs. Row B, "the microphone row," where every stand-up comedy title shows similar artwork.]
  42. Automatic image selection. Generating new artwork is costly and time-consuming; develop an algorithm to predict asset quality from the raw image. [Figure: raw image vs. box art.]
  43. Long-term Reward: Road to RL. Maximize long-term reward via reinforcement learning: optimize for the user's long-term joy rather than play clicks or duration.
  44. Thank you. Jaya Kawale (jkawale@netflix.com), Fernando Amat (famat@netflix.com)
