
Artwork Personalization at Netflix

Talk from QCon SF on 2018-11-05

For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of each of our members at the right time. With a catalog spanning thousands of titles and a diverse member base spanning over a hundred million accounts, recommending the titles that are just right for each member is crucial. But the job of recommendation does not end there. Why should you care about any particular title we recommend? What can we say about a new and unfamiliar title that will pique your interest? How do we convince you that a title is worth watching? Answering these questions is critical in helping our members discover great content, especially for unfamiliar titles. One way to do this is to consider the artwork or imagery we use to visually portray each title. If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual “evidence” for why the title might be good for you. Selecting good artwork is important because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way. In this talk, we will present an approach for personalizing the artwork we show for each title on the Netflix homepage. We will look at how to frame this as a machine learning problem using contextual multi-armed bandits in a recommendation system setting. We will also describe the algorithmic and system challenges involved in getting this type of approach for artwork personalization to succeed at Netflix scale. Finally, we will discuss some of the future opportunities that we see to expand and improve upon this approach.


  1. 1. Artwork Personalization at Netflix. Justin Basilico, QCon SF 2018, 2018-11-05. @JustinBasilico
  2. 2. Which artwork to show?
  3. 3. A good image is... 1. Representative 2. Informative 3. Engaging 4. Differential
  4. 4. A good image is... 1. Representative 2. Informative 3. Engaging 4. Differential Personal
  5. 5. Intuition: Preferences in cast members
  6. 6. Intuition: Preferences in genre
  7. 7. Choose artwork so that members understand if they will likely enjoy a title to maximize satisfaction and retention
  8. 8. Challenges in Artwork Personalization
  9. 9. Everything is a Recommendation Over 80% of what people watch comes from our recommendations Rankings Rows
  10. 10. Attribution Pick only one Was it the recommendation or artwork? Or both? ▶
  11. 11. Change Effects Which one caused the play? Is change confusing? Day 1 Day 2 ▶
  12. 12. ● Creatives select the images that are available ● But algorithms must still be robust Adding meaning and avoiding clickbait
  13. 13. Scale Over 20M RPS for images at peak
  14. 14. Traditional Recommendations Collaborative Filtering: Recommend items that similar users have chosen (figure: binary users × items play matrix) Members can only play images we choose
  15. 15. Need something more
  16. 16. Bandit
  17. 17. Not that kind of Bandit
  18. 18. Image from Wikimedia Commons
  19. 19. Multi-Armed Bandits (MAB) ● Multiple slot machines with unknown reward distribution ● A gambler can play one arm at a time ● Which machine to play to maximize reward?
  20. 20. Bandit Algorithms Setting Each round: ● Learner chooses an action ● Environment provides a real-valued reward for action ● Learner updates to maximize the cumulative reward Learner (Policy) Environment Action Reward
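To make this loop concrete, here is a minimal sketch in Python. The policy, environment, and payout rates are illustrative stand-ins for the setting on the slide, not anything from the Netflix system.

```python
import random

# Minimal sketch of the bandit interaction loop described above.
# RandomPolicy and the payout rates are illustrative stand-ins.

class RandomPolicy:
    """Baseline learner: picks an arm uniformly at random and ignores feedback."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def choose(self):
        return random.randrange(self.n_actions)

    def update(self, action, reward):
        pass  # a real learner would update its reward estimates here


def run_bandit(policy, environment, n_rounds):
    """Each round: learner chooses an action, environment returns a reward, learner updates."""
    total = 0.0
    for _ in range(n_rounds):
        action = policy.choose()
        reward = environment(action)   # real-valued reward for the chosen action
        policy.update(action, reward)
        total += reward
    return total


# Example: three "slot machines" with payout rates unknown to the learner.
payout_rates = [0.2, 0.5, 0.3]
environment = lambda a: 1.0 if random.random() < payout_rates[a] else 0.0
print(run_bandit(RandomPolicy(3), environment, n_rounds=1000))
```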
  21. 21. Artwork Optimization as Bandit ● Environment: Netflix homepage ● Learner: Artwork selector for a show ● Action: Display specific image for show ● Reward: Member has positive engagement Artwork Selector ▶
  22. 22. ● What images should creatives provide? ○ Variety of image designs ○ Thematic and visual differences ● How many images? ○ Creating each image has a cost ○ Diminishing returns Images as Actions
  23. 23. ● What is a good outcome? ✓ Watching and enjoying the content ● What is a bad outcome? ✖ No engagement ✖ Abandoning or not enjoying the content Designing Rewards
  24. 24. Metric: Take Fraction ▶ Example: Altered Carbon Take Fraction: 1/3
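Spelled out, the metric on this slide is just the fraction of impressions of an image that led to a play (1 play out of 3 impressions in the Altered Carbon example):

```latex
\text{Take Fraction} = \frac{\text{plays from the image}}{\text{impressions of the image}}
```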
  25. 25. Minimizing Regret ● What is the best that a bandit can do? ○ Always choose optimal action ● Regret: Difference between optimal action and chosen action ● To maximize reward, minimize the cumulative regret
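In symbols, with \(\mu^{*}\) the expected reward of the optimal action and \(\mu_{a_t}\) the expected reward of the action chosen in round \(t\), the cumulative regret after \(T\) rounds (the quantity the algorithms below try to keep small) is:

```latex
R(T) = \sum_{t=1}^{T} \left( \mu^{*} - \mu_{a_t} \right)
```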
  26. 26. Bandit Example 1 0 1 0 ? 0 0 ? 0 1 0 ? Historical rewards Actions
  27. 27. Bandit Example 1 0 1 0 ? 0 0 ? 0 1 0 ? Historical rewards Actions Choose image
  28. 28. Bandit Example 1 0 1 0 ? 0 0 ? 0 1 0 ? Historical rewards Actions 2/4 0/2 1/3 Observed Take Fraction Overall: 3/9
  29. 29. Strategy Show current best image Try another image to learn if it is actually better Maximization vs. Exploration
  30. 30. Principles of Exploration ● Gather information to make the best overall decision in the long-run ● Best long-term strategy may involve short-term sacrifices
  31. 31. Common strategies 1. Naive Exploration 2. Optimism in the Face of Uncertainty 3. Probability Matching
  32. 32. Naive Exploration: 𝝐-greedy ● Idea: Add noise to the greedy policy ● Algorithm: ○ With probability 𝝐 ■ Choose one action uniformly at random ○ Otherwise ■ Choose the action with the best reward so far ● Pros: Simple ● Cons: Regret is unbounded
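A minimal sketch of 𝝐-greedy over 0/1 take rewards, matching the count-based bookkeeping in the slides; the names are illustrative:

```python
import random

# Sketch of an epsilon-greedy policy over 0/1 "take" rewards (illustrative).

class EpsilonGreedy:
    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.shows = [0] * n_actions   # times each image was shown
        self.takes = [0] * n_actions   # times showing the image led to a play

    def choose(self):
        if random.random() < self.epsilon:
            # explore: pick one action uniformly at random
            return random.randrange(len(self.shows))
        # exploit: pick the action with the best observed take fraction so far
        rates = [t / s if s > 0 else 0.0 for t, s in zip(self.takes, self.shows)]
        return max(range(len(rates)), key=rates.__getitem__)

    def update(self, action, reward):
        self.shows[action] += 1
        self.takes[action] += reward
```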
  33. 33. Epsilon-Greedy Example 1 0 1 0 ? 0 0 ? 0 1 0 ? 2/4 (greedy) 0/2 1/3 Observed Reward
  34. 34. Epsilon-Greedy Example 1 0 1 0 ? 0 0 ? 0 1 0 ? 𝝐 / 3 𝝐 / 3 1 - 2𝝐 / 3
  35. 35. Epsilon-Greedy Example 1 0 1 0 ? 0 0 ? 0 1 0 ?
  36. 36. Epsilon-Greedy Example 1 0 1 0 0 0 0 0 1 0 2/4 (greedy) 0/3 1/3 Observed Reward
  37. 37. Optimism: Upper Confidence Bound (UCB) ● Idea: Prefer actions with uncertain values ● Approach: ○ Compute confidence interval of observed rewards for each action ○ Choose action a with the highest 𝛂-percentile ○ Observe reward and update confidence interval for a ● Pros: Theoretical regret minimization properties ● Cons: Needs to update quickly from observed rewards
  38. 38. Beta-Bernoulli Distribution: Beta prior over p, Bernoulli likelihood with Pr(1) = p and Pr(0) = 1 - p (image from Wikipedia)
  39. 39. Bandit Example with Beta-Bernoulli Prior: 𝛽(1, 1) + Observed Take Fraction = Posterior A: 2/4 → 𝛽(3, 3) B: 0/2 → 𝛽(1, 3) C: 1/3 → 𝛽(2, 3)
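One way to realize the UCB idea with this Beta-Bernoulli model is Bayesian UCB: keep a 𝛽(1, 1) prior per image, update it with observed takes and non-takes, and serve the image whose posterior has the highest upper percentile. A sketch, using SciPy for the percentile and assuming 0/1 rewards (illustrative, not the production implementation):

```python
from scipy.stats import beta as beta_dist

# Sketch of Bayesian UCB over a Beta-Bernoulli model per action (illustrative).

class BayesUCB:
    def __init__(self, n_actions, percentile=0.95):
        self.percentile = percentile
        self.takes = [1] * n_actions      # alpha parameter: 1 (prior) + observed takes
        self.non_takes = [1] * n_actions  # beta parameter: 1 (prior) + observed non-takes

    def choose(self):
        # pick the action whose posterior has the highest upper percentile
        uppers = [beta_dist.ppf(self.percentile, a, b)
                  for a, b in zip(self.takes, self.non_takes)]
        return max(range(len(uppers)), key=uppers.__getitem__)

    def update(self, action, reward):
        # Bernoulli reward: 1 increments takes, 0 increments non-takes
        self.takes[action] += reward
        self.non_takes[action] += 1 - reward
```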
  40. 40. Bayesian UCB Example 1 0 1 0 ? 0 0 ? 0 1 0 ? [0.15, 0.85] [0.07, 0.81] Reward 95% Confidence [0.01, 0.71]
  41. 41. Bayesian UCB Example 1 0 1 0 ? 0 0 ? 0 1 0 ? [0.15, 0.85] [0.07, 0.81] Reward 95% Confidence [0.01, 0.71]
  42. 42. Bayesian UCB Example 1 0 1 0 0 0 0 0 1 0 [0.12, 0.78] [0.07, 0.81] Reward 95% Confidence [0.01, 0.71]
  43. 43. Bayesian UCB Example 1 0 1 0 0 0 0 0 1 0 [0.12, 0.78] [0.07, 0.81] Reward 95% Confidence [0.01, 0.71]
  44. 44. Probabilistic: Thompson Sampling ● Idea: Select the actions by the probability they are the best ● Approach: ○ Keep a distribution over model parameters for each action ○ Sample estimated reward value for each action ○ Choose action a with maximum sampled value ○ Observe reward for action a and update its parameter distribution ● Pros: Randomness continues to explore without update ● Cons: Hard to compute probabilities of actions
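A sketch of Thompson sampling with the same Beta-Bernoulli model per image: sample a plausible take fraction from each posterior and act greedily on the samples (illustrative):

```python
import random

# Sketch of Thompson sampling with a Beta-Bernoulli model per action (illustrative).

class ThompsonSampling:
    def __init__(self, n_actions):
        self.takes = [1] * n_actions      # 1 (prior) + observed takes
        self.non_takes = [1] * n_actions  # 1 (prior) + observed non-takes

    def choose(self):
        # sample an estimated take fraction from each posterior,
        # then choose greedily with respect to the samples
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.takes, self.non_takes)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, reward):
        self.takes[action] += reward
        self.non_takes[action] += 1 - reward
```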
  45. 45. Thompson Sampling Example 1 0 1 0 ? 0 0 ? 0 1 0 ? Distributions: A: 𝛽(3, 3) B: 𝛽(1, 3) C: 𝛽(2, 3)
  46. 46. Thompson Sampling Example 1 0 1 0 ? 0 0 ? 0 1 0 ? 0.38 0.59 Sampled values 0.18
  47. 47. Thompson Sampling Example 1 0 1 0 ? 0 0 ? 0 1 0 ? 0.38 0.59 Sampled values 0.18
  48. 48. Thompson Sampling Example 1 0 1 0 0 0 0 1 0 1 Distributions: A: 𝛽(3, 3) B: 𝛽(1, 3) C: 𝛽(3, 3)
  49. 49. Many Variants of Bandits ● Standard setting: Stochastic and stationary ● Drifting: Reward values change over time ● Adversarial: No assumptions on how rewards are generated ● Continuous action space ● Infinite set of actions ● Varying set of actions over time ● ...
  50. 50. What about personalization?
  51. 51. Contextual Bandits ● Let’s make this harder! ● Slot machines where payout depends on context ● E.g. time of day, blinking light on slot machine, ...
  52. 52. Contextual Bandit Learner (Policy) Environment Action Reward Context Each round: ● Environment provides context (feature) vector ● Learner chooses an action for context ● Environment provides a real-valued reward for action in context ● Learner updates to maximize the cumulative reward
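The interaction loop is the same as before except that the environment emits a context vector before each decision and the policy conditions on it; a minimal sketch with a hypothetical policy/environment interface:

```python
# Sketch of the contextual bandit loop; the policy and environment interfaces
# are hypothetical stand-ins.

def run_contextual_bandit(policy, environment, n_rounds):
    total = 0.0
    for _ in range(n_rounds):
        context = environment.context()               # feature vector x for this round
        action = policy.choose(context)               # a = pi(x)
        reward = environment.reward(context, action)  # reward for action in this context
        policy.update(context, action, reward)
        total += reward
    return total
```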
  53. 53. Supervised Learning: Input: Features (x ∈ ℝ^d), Output: Predicted label, Feedback: Actual label (y). Contextual Bandits: Input: Context (x ∈ ℝ^d), Output: Action (a = 𝜋(x)), Feedback: Reward (r ∈ ℝ)
  54. 54. Supervised Learning vs. Contextual Bandits Example (Chihuahua images from ImageNet): supervised learning observes the true label (Dog) for every image, while the bandit only observes the reward for the label it predicted (Dog ✓, Cat 0, Seal 0, Fox 0)
  55. 55. Artwork Personalization as Contextual Bandit ● Context: Member, device, page, etc. Artwork Selector ▶
  56. 56. Epsilon-Greedy Example: with probability 1 - 𝝐 choose the personalized image, with probability 𝝐 choose an image at random
  57. 57. ● Learn a supervised regression model per image to predict reward ● Pick image with highest predicted reward Member (context) Features Image Pool Model 1 Winner Model 2 Model 3 Model 4 arg max Greedy Policy Example
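A sketch of that greedy policy, assuming one fitted scikit-learn-style regressor per image; the `models` dict and the feature vector below are hypothetical:

```python
import numpy as np

# Greedy policy sketch: one regression model per image predicts reward from
# member/context features; serve the arg max (illustrative, not production code).

def choose_image_greedy(context_features, models):
    """context_features: 1-D numpy array; models: {image_id: fitted regressor}."""
    x = np.asarray(context_features).reshape(1, -1)
    predicted = {image_id: float(model.predict(x)[0]) for image_id, model in models.items()}
    return max(predicted, key=predicted.get)   # image with the highest predicted reward
```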
  58. 58. ● Linear model to calculate uncertainty in reward estimate ● Choose image with highest 𝛂-percentile predicted reward value Member (context) Features Image Pool Model 1 Winner Model 2 Model 3 Model 4 arg max LinUCB Example Li et al., 2010
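A sketch of the disjoint LinUCB algorithm from Li et al. (2010): one ridge-regression model per image, scored by predicted reward plus an uncertainty bonus for the current context (the 𝛂 parameter controls how optimistic the bonus is; all names here are illustrative):

```python
import numpy as np

# Sketch of disjoint LinUCB (Li et al., 2010): one linear model per image (illustrative).

class LinUCB:
    def __init__(self, n_actions, n_features, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(n_features) for _ in range(n_actions)]    # per-arm X^T X + I
        self.b = [np.zeros(n_features) for _ in range(n_actions)]  # per-arm X^T r

    def choose(self, x):
        x = np.asarray(x, dtype=float)
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # predicted reward plus an upper-confidence bonus for this context
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, x, action, reward):
        x = np.asarray(x, dtype=float)
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x
```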
  59. 59. ● Learn distribution over model parameters (e.g. Bayesian Regression) ● Sample a model, evaluate features, take arg max Thompson Sampling Example Member (context) Features Image Pool Sample 1 Winner Sample 2 Sample 3 Sample 4 arg max Chapelle & Li, 2011 Model 1 Model 2 Model 3 Model 4
  60. 60. Offline Metric: Replay Offline Take Fraction: 2/3 Logged Actions Model Assignments ▶ ▶ Li et al., 2011
  61. 61. Replay ● Pros ○ Unbiased metric when using logged probabilities ○ Easy to compute ○ Rewards observed are real ● Cons ○ Requires a lot of data ○ High variance if there are few matches ■ Techniques like Doubly-Robust estimation (Dudik, Langford & Li, 2011) can help
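A minimal sketch of the replay estimator (Li et al., 2011): only logged rounds where the evaluated policy agrees with the logged action count toward the offline take fraction. The unweighted form below is unbiased when logged actions were chosen uniformly at random; with non-uniform logging, each matched round would be weighted by the inverse of its logged probability. `logged` is a hypothetical list of (context, action, reward) records:

```python
# Sketch of replay evaluation (offline take fraction) over logged bandit data.

def replay_take_fraction(policy, logged):
    """logged: iterable of (context, logged_action, reward) tuples."""
    matches, takes = 0, 0
    for context, logged_action, reward in logged:
        if policy.choose(context) == logged_action:   # model assignment matches the log
            matches += 1
            takes += reward
    return takes / matches if matches else float("nan")
```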
  62. 62. Offline Replay Results ● Bandit finds good images ● Personalization is better ● Artwork variety matters ● Personalization wiggles around best images Lift in Replay for the various algorithms as compared to the Random baseline
  63. 63. Bandits in the Real World
  64. 64. ● Getting started ○ Need data to learn ○ Warm-starting via batch learning from existing data ● Closing the feedback loop ○ Only exposing bandit to its own output ● Algorithm performance depends on data volume ○ Need to be able to test bandits at large scale, head-to-head A/B testing Bandit Algorithms
  65. 65. Starting the Loop / Completing the Loop (diagrams: to start the loop, an exploration policy serves users; actions, contexts, and rewards are joined, logged to a data store, and used to train and publish a model in batch; to complete the loop, the published model serves users and the same logged data drives incremental updates and republishing)
  66. 66. ● Need to serve an image for any title in the catalog ○ Calls from homepage, search, galleries, etc. ○ > 20M RPS at peak ● Existing UI code written assuming image lookup is fast ○ In memory map of video ID to URL ○ Want to insert Machine Learned model ○ Don’t want a big rewrite across all UI code Scale Challenges
  67. 67. Live Compute vs. Online Precompute: Live Compute is synchronous computation to choose the image for a title in response to a member request; Online Precompute is asynchronous computation to choose the image for a title before the request, with the result stored in a cache
  68. 68. Live Compute Pros: ● Access to most fresh data ● Knowledge of full context ● Compute only what is necessary Cons: ● Strict Service Level Agreements ○ Must respond quickly in all cases ○ Requires high availability ● Restricted to simple algorithms. Online Precompute Pros: ● Can handle large data ● Can run moderate complexity algorithms ● Can average computational cost across users ● Change from actions Cons: ● Has some delay ● Done in event context ● Extra compute for users and items not served. See techblog for more details
  69. 69. System Architecture (diagram: the UI image request goes through the Edge service, which looks up images in EVCache populated by the Personalized Image Precompute running the bandit model; precompute logs plus play and impression logs feed ETL (aggregate data) and model training)
  70. 70. Precompute & Image Lookup ● Precompute ○ Run bandit for each title on each profile to choose personalized image ○ Store the title to image mapping in EVCache ● Image Lookup ○ Pull profile’s image mapping from EVCache once per request
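An illustrative sketch of that split; the dict-based cache and the `select_image` call below are hypothetical stand-ins for EVCache and the bandit, not the actual client:

```python
# Illustrative sketch of the precompute / lookup split. The cache and the
# select_image(profile_id, title_id) bandit call are hypothetical stand-ins.

def precompute_images(profile_ids, title_ids, select_image, cache):
    """Run the bandit ahead of time and store each profile's title -> image map."""
    for profile_id in profile_ids:
        mapping = {title_id: select_image(profile_id, title_id) for title_id in title_ids}
        cache[profile_id] = mapping               # e.g. one cache entry per profile

def lookup_image(profile_id, title_id, cache, fallback_image):
    """At request time the UI does a single, fast map lookup per profile."""
    return cache.get(profile_id, {}).get(title_id, fallback_image)   # graceful fallback
```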
  71. 71. Logging & Reward ● Precompute Logging ○ Selected image ○ Exploration Probability ○ Candidate pool ○ Snapshot facts for feature generation ● Reward Logging ○ Image rendered in UI & if played ○ Precompute ID Image via YouTube
  72. 72. Feature Generation & Training ● Join rewards with snapshotted facts ● Generate features using DeLorean ○ Feature encoders are shared online and offline ● Train the model using Spark ● Publish model to production DeLorean image by JMortonPhoto.com & OtoGodfrey.com
  73. 73. Monitoring and Resiliency: Track the quality of the model (compare predictions to actual behavior, online equivalents of offline metrics); reserve a fraction of data for a simple policy (e.g. 𝝐-greedy) to sanity check the bandits
  74. 74. ● Missing images greatly degrade the member experience ● Try to serve the best image possible Graceful Degradation Personalized Selection Unpersonalized Fallback Default Image (when all else fails)
  75. 75. Does it work?
  76. 76. Online results ● A/B test: It works! ● Rolled out to our >130M member base ● Most beneficial for lesser known titles ● Competition between titles for attention leads to compression of offline metrics More details in our blog post
  77. 77. Future Work
  78. 78. More dimensions to personalize Rows Trailer Evidence Synopsis Image Row Title Metadata Ranking
  79. 79. Automatic image selection ● Generating new artwork is costly and time consuming ● Can we predict performance from raw image?
  80. 80. Artwork selection orchestration ● Neighboring image selection influences result Row A (microphones) Example: Stand-up comedy Row B (more variety)
  81. 81. Long-term Reward: Road to Reinforcement Learning ● RL involves multiple actions and delayed reward ● Useful to maximize user long-term joy?
  82. 82. Thank you @JustinBasilico
