Artwork Personalization at Netflix
Fernando Amat
RecSys, Oct 2018
Quickly help members discover content they’ll love
Global Members, Personalized Tastes
130 Million Members
~180 Countries
Spot the Algorithms!
[Screenshot: Netflix homepage; the "98% Match" score is one of the visible algorithms]
Artwork Optimization
Goal: Recommend personalized artwork or imagery for a title to help members decide whether they will enjoy it.
Intuition for Personalized Assets
● Emphasize themes through different artwork according to some context (user, viewing history, country, etc.)
○ Example: preferences in genre
○ Example: preferences in cast members
Bandit Algorithms Setting
For each (user, show) request:
● Actions: the set of candidate images available
● Reward: how many minutes the user played from that impression
● Environment: the Netflix homepage on the user's device
● Learner: its goal is to maximize the cumulative reward after N requests
[Diagram: the learner sends an action to the environment; the environment returns a reward and context to the learner]
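A minimal sketch of this interaction loop (a toy simulator with illustrative names, not Netflix's system; the reward is simplified to a binary play rather than minutes played):

```python
import numpy as np

rng = np.random.default_rng(0)

n_images = 4                                         # candidate artwork for one title
true_play_prob = rng.uniform(0.05, 0.25, n_images)   # hidden environment parameters

def environment(action):
    """Simulated environment: did this impression lead to a play?"""
    return float(rng.random() < true_play_prob[action])

counts = np.zeros(n_images)      # learner state: impressions per image
rewards = np.zeros(n_images)     # learner state: plays per image
cumulative_reward = 0.0

for t in range(10_000):
    # A simple epsilon-greedy learner (one of many possible strategies).
    if rng.random() < 0.1 or counts.min() == 0:
        action = int(rng.integers(n_images))          # explore
    else:
        action = int(np.argmax(rewards / counts))     # exploit
    r = environment(action)                           # reward from environment
    counts[action] += 1
    rewards[action] += r
    cumulative_reward += r
```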
Numerous Variants
● Different Strategies: ε-Greedy, Thompson Sampling (TS), Upper Confidence Bound (UCB), etc. (sketched after this list)
● Different Environments:
○ Stochastic and stationary: Reward is generated i.i.d. from a distribution specific to the action. No payoff drift.
○ Adversarial: No assumptions on how rewards are generated.
● Different objectives: Cumulative regret, tracking the best expert
● Continuous or discrete set of actions, finite vs. infinite
● Extensions: Varying set of arms, Contextual Bandits, etc.
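How the three named strategies differ, reduced to their action-selection rules for a Bernoulli-reward bandit (a hedged sketch; function and variable names are illustrative):

```python
import numpy as np

def eps_greedy(means, rng, eps=0.1):
    """With probability eps pick a random arm, otherwise the empirical best."""
    if rng.random() < eps:
        return int(rng.integers(len(means)))
    return int(np.argmax(means))

def ucb1(means, counts, t):
    """Optimism in the face of uncertainty: empirical mean plus a bonus
    that shrinks as an arm accumulates observations."""
    bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
    return int(np.argmax(means + bonus))

def thompson(successes, failures, rng):
    """Sample a plausible play probability per arm from its Beta posterior
    and act greedily on the sample."""
    return int(np.argmax(rng.beta(successes + 1, failures + 1)))
```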
Specific challenges
● Play attribution and reward assignment
○ Incremental effect of the image on top of the recommender system
● Only one image per title can be presented
○ Although it is inherently a ranking problem
Would you play because the movie is recommended or because of the artwork? Or both?
Specific challenges
● Change effect
○ Can changing images too often confuse users?
[Illustration: two alternative image sequences (A and B) for the same title across Sessions 1 … N]
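One illustrative mitigation (an assumption for this sketch, not a method described in the talk) is to pin the served image per (profile, title) for a block of sessions, so it only changes at controlled intervals:

```python
import hashlib

def pinned_image(profile_id: str, title_id: str, session_index: int,
                 candidates: list, sessions_per_pin: int = 5):
    """Deterministically serve the same candidate image for blocks of
    `sessions_per_pin` consecutive sessions of a given (profile, title)."""
    block = session_index // sessions_per_pin
    key = f"{profile_id}:{title_id}:{block}".encode()
    idx = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(candidates)
    return candidates[idx]
```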
Actions
Personal (i.e. contextual)
● We have control over the set of actions
○ How many images per show
○ Image design
● What makes a good asset?
○ Representative (no clickbait)
○ Differential
○ Informative
○ Engaging
Epsilon Greedy Example
[Flowchart: with probability ε_profile the profile explores, and with probability ε_show the show explores, in which case an image is served at random; otherwise (1 − ε_profile or 1 − ε_show) the personalized image is served]
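A minimal sketch of that two-level decision, as read from the flowchart (the personalization call is a stand-in):

```python
import random

def select_image(profile, show, candidates, personalized_image,
                 eps_profile=0.1, eps_show=0.2):
    """Two-level epsilon-greedy: first decide whether this profile explores,
    then whether this show explores; exploration serves an image at random,
    everything else gets the personalized (contextual) image."""
    if random.random() < eps_profile and random.random() < eps_show:
        return random.choice(candidates)          # explore: image at random
    return personalized_image(profile, show)      # exploit: personalized image
```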
Greedy Policy Example
● Learn a binary classifier per image to predict probability of play
● Pick the winner (arg max)
[Diagram: member (context) features and the image pool feed Models 1–4, one per candidate image; the arg max over their predicted play probabilities selects the winner]
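A sketch of that greedy policy under the assumption of one logistic-regression play model per image (scikit-learn; the features and labels below are random placeholders for logged impression data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_features, n_images = 16, 4

# One binary classifier per image in the pool, trained on logged
# (member features, played?) pairs from impressions of that image.
models = []
for _ in range(n_images):
    X = rng.normal(size=(500, n_features))    # placeholder member features
    y = rng.integers(0, 2, size=500)          # placeholder play labels
    models.append(LogisticRegression().fit(X, y))

def greedy_image(member_features):
    """Score every candidate image for this member and pick the arg max."""
    probs = [m.predict_proba(member_features.reshape(1, -1))[0, 1]
             for m in models]
    return int(np.argmax(probs))

winner = greedy_image(rng.normal(size=n_features))
```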
Take Fraction Example: Luke Cage
[Illustration: three users (A, B, C) are shown the title; one plays and two do not, so Take Fraction = 1 / 3]
Offline metric: Replay [Li et al., 2010]
● Unbiased offline evaluation from explore data
[Illustration: six users with random image assignments; scoring only the impressions where the model's assignment matches the logged random one yields an Offline Take Fraction of 2 / 3]
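A sketch of the Replay estimator on logged explore data: keep only the impressions where the model would have chosen the same image the random policy actually showed, and compute the take fraction on that matched subset (field names are illustrative):

```python
def replay_take_fraction(logged, policy):
    """logged: iterable of (context, shown_image, played) tuples collected
    under the random explore policy; policy(context) returns the image the
    model would show. Returns the take fraction over matched impressions."""
    matched = [played for context, shown, played in logged
               if policy(context) == shown]
    return sum(matched) / len(matched) if matched else float("nan")

# Toy usage: the model always picks "img_a"; only users u1 and u3 match.
logged = [("u1", "img_a", 1), ("u2", "img_b", 0), ("u3", "img_a", 1)]
print(replay_take_fraction(logged, lambda ctx: "img_a"))  # -> 1.0
```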
Offline Replay
● Context matters
● Artwork diversity matters
● Personalization wiggles around the most popular images
[Chart: lift in Replay for the various algorithms compared to the Random baseline]
Online results
● Rollout to our >130M member base
● Most beneficial for lesser-known titles
● Online lift is compressed relative to title-level offline metrics due to cannibalization between titles
Research Directions
Action selection orchestration
● Neighboring image selection influences the result
● Title-level optimization is not enough
[Illustration: a stand-up comedy row rendered two ways: Row A with diverse images vs. Row B where every title's image is a comedian at a microphone ("the microphone row")]
Automatic image selection
● Generating new artwork is costly and time-consuming
● Develop an algorithm to predict asset quality from the raw image
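What such a quality predictor could look like, as a hedged sketch (a tiny PyTorch CNN regressor; the architecture, and the idea of training it on historical engagement labels, are assumptions rather than the method described in the talk):

```python
import torch
import torch.nn as nn

class AssetQualityNet(nn.Module):
    """Tiny CNN mapping a raw RGB image to a scalar quality score in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global average pool
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        h = self.features(x).flatten(1)       # (batch, 32)
        return torch.sigmoid(self.head(h))    # predicted quality per image

# Score two candidate 224x224 images (random tensors as stand-ins).
scores = AssetQualityNet()(torch.randn(2, 3, 224, 224))
```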
Long-term Reward: Road to RL
● Maximize long-term reward: reinforcement learning
○ Users' long-term joy rather than plays
Thank you.
Fernando Amat (famat@netflix.com)
Blogpost
We are hiring!
