Artwork
Personalization
at Netflix
Justin Basilico
QCon SF 2018
2018-11-05
@JustinBasilico
Which artwork to show?
A good image is...
1. Representative
2. Informative
3. Engaging
4. Differential
5. Personal
Intuition: Preferences in cast members
Intuition: Preferences in genre
Choose artwork so that members
understand whether they will likely enjoy a title,
to maximize satisfaction and retention
Challenges in Artwork
Personalization
Everything is a Recommendation
Over 80% of what
people watch
comes from our
recommendations
Rankings
Rows
Attribution
Pick
only one
Was it the recommendation or artwork?
Or both?
▶
Change Effects
Which one caused the play?
Is change confusing?
Day 1 Day 2
▶
● Creatives select the images that are available
● But algorithms must still be robust
Adding meaning and avoiding clickbait
Scale
Over 20M RPS
for images at
peak
Traditional Recommendations
Collaborative Filtering:
Recommend items that
similar users have chosen
0 1 0 1 0
0 0 1 1 0
1 0 0 1 1
0 1 0 0 0
0 0 0 0 1
Users
Items
Members can only play
images we choose
Need
something
more
Bandit
Not that kind
of Bandit
Image from Wikimedia Commons
Multi-Armed Bandits (MAB)
● Multiple slot machines with
unknown reward distribution
● A gambler can play one arm at a time
● Which machine to play to maximize
reward?
Bandit Algorithms Setting
Each round:
● Learner chooses an action
● Environment provides a real-valued reward for action
● Learner updates to maximize the cumulative reward
Learner
(Policy)
Environment
Action
Reward
Artwork Optimization as Bandit
● Environment: Netflix homepage
● Learner: Artwork selector for a show
● Action: Display specific image for show
● Reward: Member has positive engagement
Artwork Selector
▶
Images as Actions
● What images should creatives provide?
○ Variety of image designs
○ Thematic and visual differences
● How many images?
○ Creating each image has a cost
○ Diminishing returns
Designing Rewards
● What is a good outcome?
✓ Watching and enjoying the content
● What is a bad outcome?
✖ No engagement
✖ Abandoning or not enjoying the content
Metric: Take Fraction
▶
Example: Altered Carbon
Take Fraction: 1/3
Minimizing Regret
● What is the best that a bandit can do?
○ Always choose optimal action
● Regret: Difference between optimal
action and chosen action
● To maximize reward, minimize the
cumulative regret
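The regret bookkeeping above can be sketched in a few lines of Python; the arm indices and true take fractions here are illustrative, not Netflix data:

```python
def cumulative_regret(true_rates, chosen_arms):
    """Cumulative regret: running sum of the gap between the best arm's
    expected reward and the expected reward of each chosen arm."""
    best = max(true_rates)
    totals, running = [], 0.0
    for arm in chosen_arms:
        running += best - true_rates[arm]
        totals.append(running)
    return totals

# If image 0 truly has a 0.5 take fraction and image 1 has 0.2,
# playing [0, 1, 0] accumulates regret 0.0, then 0.3, then 0.3.
regret = cumulative_regret([0.5, 0.2], [0, 1, 0])
```

Minimizing reward-maximization is then equivalent to keeping this running total as flat as possible.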
Bandit Example
Actions and historical rewards:
● A: 1 0 1 0 ? → observed take fraction 2/4
● B: 0 0 ? → 0/2
● C: 0 1 0 ? → 1/3
Overall: 3/9
Choose which image to show next.
Strategy
Maximization: show current best image
vs.
Exploration: try another image to learn if it is actually better
Principles of Exploration
● Gather information to make the best overall decision
in the long-run
● Best long-term strategy may involve short-term
sacrifices
Common strategies
1. Naive Exploration
2. Optimism in the Face of Uncertainty
3. Probability Matching
Naive Exploration: 𝝐-greedy
● Idea: Add noise to the greedy policy
● Algorithm:
○ With probability 𝝐
■ Choose one action uniformly at random
○ Otherwise
■ Choose the action with the best reward so far
● Pros: Simple
● Cons: Regret is unbounded
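As a sketch (not the production implementation), 𝝐-greedy over the observed take fractions might look like this; `successes` and `trials` track each image's play history:

```python
import random

def epsilon_greedy_choice(successes, trials, epsilon, rng=random):
    """Pick an image: explore uniformly with probability epsilon,
    otherwise exploit the best observed take fraction."""
    if rng.random() < epsilon:
        return rng.randrange(len(trials))  # explore: any image
    # exploit: image with best observed take fraction (untried arms get 0.0)
    rates = [s / t if t > 0 else 0.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=rates.__getitem__)

# Running example: rewards 2/4, 0/2, 1/3 -> the greedy arm is image 0
arm = epsilon_greedy_choice([2, 0, 1], [4, 2, 3], epsilon=0.1)
```

With a fixed 𝝐 the policy keeps exploring forever, which is why its regret grows linearly (the "unbounded" con above).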
Epsilon-Greedy Example
Observed rewards:
● A: 1 0 1 0 ? → 2/4 (greedy)
● B: 0 0 ? → 0/2
● C: 0 1 0 ? → 1/3
Selection probabilities: A (greedy): 1 - 2𝝐/3; B: 𝝐/3; C: 𝝐/3
Suppose B is explored and no play is observed:
● A: 1 0 1 0 → 2/4 (greedy)
● B: 0 0 0 → 0/3
● C: 0 1 0 → 1/3
Optimism: Upper Confidence Bound (UCB)
● Idea: Prefer actions with uncertain values
● Approach:
○ Compute confidence interval of observed rewards
for each action
○ Choose action a with the highest 𝛂-percentile
○ Observe reward and update confidence interval
for a
● Pros: Theoretical regret minimization properties
● Cons: Needs to update quickly from observed rewards
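A minimal Bayesian UCB sketch over Beta-Bernoulli posteriors; it uses a normal approximation to the posterior's 95th percentile rather than an exact quantile (a simplifying assumption, not the deck's method):

```python
import math

def beta_ucb(alpha, beta, z=1.645):
    """Approximate upper bound of a Beta(alpha, beta) posterior via a
    normal approximation (z = 1.645 ~ the 95th percentile)."""
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean + z * math.sqrt(var)

def ucb_choice(successes, trials, prior=(1, 1)):
    """Play the arm whose posterior upper bound is highest."""
    bounds = [beta_ucb(prior[0] + s, prior[1] + t - s)
              for s, t in zip(successes, trials)]
    return max(range(len(bounds)), key=bounds.__getitem__)

# Running example 2/4, 0/2, 1/3: arm 0 has the highest upper bound.
best = ucb_choice([2, 0, 1], [4, 2, 3])
```

Note the optimism: an image with zero impressions keeps a wide Beta(1, 1) posterior and therefore a high upper bound, so it gets tried.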
Beta-Bernoulli Distribution
● Bernoulli likelihood: Pr(1) = p, Pr(0) = 1 - p
● Beta prior over p
Image from Wikipedia
Bandit Example with Beta-Bernoulli
Prior: 𝛽(1, 1); posterior = prior + observed plays/non-plays
● A: observed take fraction 2/4 → 𝛽(3, 3)
● B: observed take fraction 0/2 → 𝛽(1, 3)
● C: observed take fraction 1/3 → 𝛽(2, 3)
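The conjugate update behind these posteriors is a one-liner: plays add to 𝛼, non-plays add to 𝛽.

```python
def beta_posterior(prior, plays, impressions):
    """Conjugate Beta-Bernoulli update: plays increment alpha,
    non-plays increment beta."""
    alpha, beta = prior
    return (alpha + plays, beta + impressions - plays)

# Slide example with a Beta(1, 1) prior:
a = beta_posterior((1, 1), 2, 4)  # A: 2/4 -> (3, 3)
b = beta_posterior((1, 1), 0, 2)  # B: 0/2 -> (1, 3)
c = beta_posterior((1, 1), 1, 3)  # C: 1/3 -> (2, 3)
```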
Bayesian UCB Example
Reward 95% confidence intervals:
● A: 1 0 1 1 ? → [0.15, 0.85] (highest upper bound, so show A)
● B: 0 0 ? → [0.01, 0.71]
● C: 0 1 0 ? → [0.07, 0.81]
After observing a 0 on A, its interval tightens:
● A: 1 0 1 1 0 → [0.12, 0.78]
● B: 0 0 → [0.01, 0.71]
● C: 0 1 0 → [0.07, 0.81]
Probabilistic: Thompson Sampling
● Idea: Select the actions by the probability they are the best
● Approach:
○ Keep a distribution over model parameters for each action
○ Sample estimated reward value for each action
○ Choose action a with maximum sampled value
○ Observe reward for action a and update its parameter distribution
● Pros: Randomization continues to explore even without model updates
● Cons: Hard to compute probabilities of actions
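A Thompson sampling sketch over the same Beta-Bernoulli posteriors (illustrative, not Netflix's code): sample one plausible take fraction per image and play the arg max.

```python
import random

def thompson_choice(successes, trials, prior=(1, 1), rng=random):
    """Sample a take-fraction estimate from each arm's Beta posterior
    and play the arm with the largest sample."""
    samples = [rng.betavariate(prior[0] + s, prior[1] + t - s)
               for s, t in zip(successes, trials)]
    return max(range(len(samples)), key=samples.__getitem__)

# Over many rounds, the arm with the strongest posterior (2/4 here)
# is chosen most often, but weaker arms are still explored.
random.seed(0)
counts = [0, 0, 0]
for _ in range(2000):
    counts[thompson_choice([2, 0, 1], [4, 2, 3])] += 1
```

Because the choice is randomized by the posterior itself, exploration falls off naturally as the posteriors sharpen.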
Thompson Sampling Example
Posterior distributions:
● A: 1 0 1 0 ? → 𝛽(3, 3)
● B: 0 0 ? → 𝛽(1, 3)
● C: 0 1 0 ? → 𝛽(2, 3)
Sampled values: A: 0.38, B: 0.18, C: 0.59 → show C
After observing a play on C:
● A: 1 0 1 0 → 𝛽(3, 3)
● B: 0 0 → 𝛽(1, 3)
● C: 0 1 0 1 → 𝛽(3, 3)
Many Variants of Bandits
● Standard setting: Stochastic and stationary
● Drifting: Reward values change over time
● Adversarial: No assumptions on how rewards are generated
● Continuous action space
● Infinite set of actions
● Varying set of actions over time
● ...
What about personalization?
Contextual Bandits
● Let’s make this harder!
● Slot machines where payout depends on
context
● E.g. time of day, blinking light on slot
machine, ...
Contextual Bandit
Learner
(Policy)
Environment
Action
Reward
Context
Each round:
● Environment provides context (feature) vector
● Learner chooses an action for context
● Environment provides a real-valued reward for action in context
● Learner updates to maximize the cumulative reward
Supervised Learning vs. Contextual Bandits
Supervised Learning:
● Input: Features (x ∊ ℝᵈ)
● Output: Predicted label
● Feedback: Actual label (y)
Contextual Bandits:
● Input: Context (x ∊ ℝᵈ)
● Output: Action (a = 𝜋(x))
● Feedback: Reward (r ∊ ℝ)
Example (Chihuahua images from ImageNet): in supervised learning, every prediction (Dog, Cat, Seal, Fox, ...) is checked against the actual label; in the contextual bandit setting, only the reward (✓ or 0) for the chosen action is observed, and the feedback for unchosen actions stays unknown.
Artwork Personalization
as Contextual Bandit
● Context: Member, device, page, etc.
Artwork Selector
▶
Choose
Epsilon-Greedy Example
● With probability 1 - 𝝐: show the personalized image
● With probability 𝝐: show an image chosen at random
Greedy Policy Example
● Learn a supervised regression model per image to predict reward
● Pick image with highest predicted reward
(Diagram: member context → features → per-image models 1-4 → arg max → winning image)
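The per-image greedy policy can be sketched as follows; the two context features and the model weights are made up purely for illustration:

```python
def greedy_image_choice(context, models):
    """Score each candidate image's reward model on the member context
    and show the image with the highest predicted reward."""
    scores = [model(context) for model in models]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy per-image linear models over a 2-feature member context
# (say, affinity for comedy and affinity for drama) -- illustrative only.
models = [
    lambda x: 0.8 * x[0] + 0.1 * x[1],  # image 0: comedy-themed art
    lambda x: 0.1 * x[0] + 0.7 * x[1],  # image 1: drama-themed art
]
comedy_fan = (0.9, 0.2)
chosen = greedy_image_choice(comedy_fan, models)
```

Here the comedy-leaning context scores highest under the comedy-themed image's model, so image 0 wins the arg max.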
LinUCB Example (Li et al., 2010)
● Linear model to calculate uncertainty in reward estimate
● Choose image with highest 𝛂-percentile predicted reward value
(Diagram: member context → features → per-image models 1-4 → arg max → winning image)
Thompson Sampling Example (Chapelle & Li, 2011)
● Learn a distribution over model parameters (e.g. Bayesian regression)
● Sample a model per image, evaluate features, take the arg max
(Diagram: member context → features → sampled models 1-4 → arg max → winning image)
Offline Metric: Replay (Li et al., 2011)
● Match logged actions against the model's assignments and average the observed rewards over the matches
● Example offline take fraction: 2/3
Replay
● Pros
○ Unbiased metric when using logged probabilities
○ Easy to compute
○ Rewards observed are real
● Cons
○ Requires a lot of data
○ High variance if there are few matches
■ Techniques like Doubly-Robust estimation (Dudik, Langford
& Li, 2011) can help
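A minimal replay estimator, using a hypothetical log format (the real system also logs exploration probabilities and more):

```python
def replay_take_fraction(logged_rounds, policy):
    """Replay evaluation: keep only rounds where the evaluated policy
    matches the image that was actually shown, and average the
    observed rewards over those matched rounds."""
    matched = [reward for context, shown, reward in logged_rounds
               if policy(context) == shown]
    return sum(matched) / len(matched) if matched else None

# Hypothetical log of (member context, image shown, play observed):
log = [(0, "A", 1), (1, "A", 0), (2, "A", 1), (3, "B", 1)]
policy = lambda ctx: "A" if ctx < 2 else "B"
# Policy matches rounds 0, 1, and 3; rewards 1, 0, 1 average to 2/3.
offline_tf = replay_take_fraction(log, policy)
```

The "requires a lot of data" con is visible here: only a fraction of logged rounds match the new policy, and the estimate is built from those alone.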
Offline Replay Results
● Bandit finds good images
● Personalization is better
● Artwork variety matters
● Personalization wiggles
around best images
Lift in Replay for the various algorithms compared to the Random baseline
Bandits in the Real
World
● Getting started
○ Need data to learn
○ Warm-starting via batch learning from existing data
● Closing the feedback loop
○ Only exposing bandit to its own output
● Algorithm performance depends on data volume
○ Need to be able to test bandits at large scale: head-to-head A/B testing of bandit algorithms
Starting the Loop
(Diagram: an explore policy serves actions to users; context, action, and reward are joined, logged, and written to the data store)
Completing the Loop
(Diagram: the trained model serves actions; joined logs feed both incremental updates and batch training, each publishing models back to serving)
Scale Challenges
● Need to serve an image for any title in the catalog
○ Calls from homepage, search, galleries, etc.
○ > 20M RPS at peak
● Existing UI code written assuming image lookup is fast
○ In-memory map of video ID to URL
○ Want to insert a machine-learned model
○ Don't want a big rewrite across all UI code
Live Compute Online Precompute
Synchronous computation
to choose image for title in
response to a member
request
Asynchronous computation
to choose image for title
before request and stored in
cache
Live Compute Online Precompute
Pros:
● Access to most fresh data
● Knowledge of full context
● Compute only what is necessary
Cons:
● Strict Service Level Agreements
○ Must respond quickly in all
cases
○ Requires high availability
● Restricted to simple algorithms
Pros:
● Can handle large data
● Can run moderate complexity
algorithms
● Can average computational
cost across users
● Change from actions
Cons:
● Has some delay
● Done in event context
● Extra compute for users and
items not served
See techblog for more details
System Architecture
(Diagram: a UI image request hits the Edge, which serves the personalized image from EVCache; Precompute runs the bandit model, writing the cache and precompute logs; play and impression logs plus precompute logs flow through ETL, aggregate into model training, and publish the bandit model)
Precompute & Image Lookup
● Precompute
○ Run bandit for each title on each profile to
choose personalized image
○ Store the title to image mapping in EVCache
● Image Lookup
○ Pull profile’s image mapping from EVCache
once per request
Logging & Reward
● Precompute Logging
○ Selected image
○ Exploration Probability
○ Candidate pool
○ Snapshot facts for feature generation
● Reward Logging
○ Image rendered in UI & if played
○ Precompute ID
Image via YouTube
Feature Generation & Training
● Join rewards with snapshotted facts
● Generate features using DeLorean
○ Feature encoders are shared online and offline
● Train the model using Spark
● Publish model to production
DeLorean image by JMortonPhoto.com & OtoGodfrey.com
Monitoring and Resiliency
● Track the quality of the model
○ Compare predictions to actual behavior
○ Online equivalents of offline metrics
● Reserve a fraction of data for a simple policy (e.g. 𝝐-greedy) to sanity-check the bandits
Graceful Degradation
● Missing images greatly degrade the member experience
● Try to serve the best image possible:
○ Personalized selection
○ Unpersonalized fallback
○ Default image (when all else fails)
Does it work?
Online results
● A/B test: It works!
● Rolled out to our >130M member base
● Most beneficial for lesser-known titles
● Competition between titles for attention leads to compression of offline metrics
More details in our blog post
Future Work
More dimensions to personalize
Rows
Trailer
Evidence
Synopsis
Image
Row Title
Metadata
Ranking
Automatic image selection
● Generating new artwork is costly and time consuming
● Can we predict performance from raw image?
Artwork selection orchestration
● Neighboring image selections influence the result
● Example: stand-up comedy. Row A (all microphones) vs. Row B (more variety)
Long-term Reward: Road to
Reinforcement Learning
● RL involves multiple actions and delayed reward
● Useful to maximize user long-term joy?
Thank you
@JustinBasilico

Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
marufrahmanstratejm
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 

Recently uploaded (20)

Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 

Artwork Personalization at Netflix

  • 1. Artwork Personalization at Netflix Justin Basilico QCon SF 2018 2018-11-05 @JustinBasilico
  • 2.
  • 4. A good image is... 1. Representative 2. Informative 3. Engaging 4. Differential
  • 5. A good image is... 1. Representative 2. Informative 3. Engaging 4. Differential Personal
  • 8. Choose artwork so that members understand if they will likely enjoy a title to maximize satisfaction and retention
  • 10. Everything is a Recommendation Over 80% of what people watch comes from our recommendations Rankings Rows
  • 11. Attribution Pick only one Was it the recommendation or artwork? Or both? ▶
  • 12. Change Effects Which one caused the play? Is change confusing? Day 1 Day 2 ▶
  • 13. Adding meaning and avoiding clickbait ● Creatives select the images that are available ● But algorithms must still be robust
  • 14. Scale Over 20M RPS for images at peak
  • 15. Traditional Recommendations Collaborative Filtering: Recommend items that similar users have chosen (illustrated with a binary users × items play matrix) Members can only play images we choose
  • 20. Multi-Armed Bandits (MAB) ● Multiple slot machines with unknown reward distribution ● A gambler can play one arm at a time ● Which machine to play to maximize reward?
  • 21. Bandit Algorithms Setting Each round: ● Learner chooses an action ● Environment provides a real-valued reward for action ● Learner updates to maximize the cumulative reward Learner (Policy) Environment Action Reward
  • 22. Artwork Optimization as Bandit ● Environment: Netflix homepage ● Learner: Artwork selector for a show ● Action: Display specific image for show ● Reward: Member has positive engagement Artwork Selector ▶
  • 23. Images as Actions ● What images should creatives provide? ○ Variety of image designs ○ Thematic and visual differences ● How many images? ○ Creating each image has a cost ○ Diminishing returns
  • 24. Designing Rewards ● What is a good outcome? ✓ Watching and enjoying the content ● What is a bad outcome? ✖ No engagement ✖ Abandoning or not enjoying the content
  • 25. Metric: Take Fraction ▶ Example: Altered Carbon Take Fraction: 1/3
  • 26. Minimizing Regret ● What is the best that a bandit can do? ○ Always choose optimal action ● Regret: Difference between optimal action and chosen action ● To maximize reward, minimize the cumulative regret
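The regret bookkeeping on this slide is a one-liner; here is a minimal sketch, where the "true" take fractions are made-up numbers for illustration (in practice they are unknown, which is the whole point of the bandit):

```python
def cumulative_regret(optimal_mean, chosen_means):
    """Sum over rounds of (expected reward of the optimal action
    minus expected reward of the action actually chosen)."""
    return sum(optimal_mean - m for m in chosen_means)

# Hypothetical true take fractions for three images; image 0 is optimal.
true_means = [0.5, 0.0, 0.33]
# Expected rewards of a policy that played images 2, 0, 1, 0:
chosen = [true_means[a] for a in [2, 0, 1, 0]]
print(round(cumulative_regret(max(true_means), chosen), 2))  # 0.67
```

Rounds where the optimal image was played contribute zero regret; minimizing the cumulative sum is equivalent to maximizing cumulative reward.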
  • 27. Bandit Example: three candidate images (actions), each with a history of observed rewards
  • 28. Bandit Example: choose an image to show based on the rewards observed so far
  • 29. Bandit Example: observed take fractions of 2/4, 0/2, and 1/3 (overall: 3/9)
  • 30. Strategy: Maximization (show current best image) vs. Exploration (try another image to learn if it is actually better)
  • 31. Principles of Exploration ● Gather information to make the best overall decision in the long-run ● Best long-term strategy may involve short-term sacrifices
  • 32. Common strategies 1. Naive Exploration 2. Optimism in the Face of Uncertainty 3. Probability Matching
  • 33. Naive Exploration: 𝝐-greedy ● Idea: Add noise to the greedy policy ● Algorithm: ○ With probability 𝝐 ■ Choose one action uniformly at random ○ Otherwise ■ Choose the action with the best reward so far ● Pros: Simple ● Cons: Regret is unbounded
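A minimal Python sketch of the 𝝐-greedy rule above; the per-arm counts are the hypothetical 2/4, 0/2, 1/3 take fractions, and with 𝝐 = 0 the rule reduces to pure exploitation:

```python
import random

def epsilon_greedy(successes, trials, epsilon=0.1, rng=random):
    """With probability epsilon explore an arm uniformly at random;
    otherwise exploit the arm with the best observed success rate."""
    if rng.random() < epsilon:
        return rng.randrange(len(trials))
    rates = [s / t if t else 0.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=rates.__getitem__)

# Observed rewards 2/4, 0/2, 1/3: the greedy choice is image 0.
print(epsilon_greedy([2, 0, 1], [4, 2, 3], epsilon=0.0))  # 0
```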
  • 34. Epsilon-Greedy Example: observed rewards 2/4 (greedy), 0/2, 1/3
  • 35. Epsilon-Greedy Example: selection probabilities 1 - 2𝝐/3 for the greedy image and 𝝐/3 for each of the others
  • 36. Epsilon-Greedy Example: an exploratory image is selected
  • 37. Epsilon-Greedy Example: updated observed rewards 2/4 (greedy), 0/3, 1/3
  • 38. Optimism: Upper Confidence Bound (UCB) ● Idea: Prefer actions with uncertain values ● Approach: ○ Compute confidence interval of observed rewards for each action ○ Choose action a with the highest 𝛂-percentile ○ Observe reward and update confidence interval for a ● Pros: Theoretical regret minimization properties ● Cons: Needs to update quickly from observed rewards
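The slide's Bayesian UCB ranks arms by an 𝛂-percentile of a posterior; as a hedged stand-in that needs no distribution code, this sketch uses the classic closed-form UCB1 index, which expresses the same optimism-under-uncertainty idea:

```python
import math

def ucb1_choose(successes, trials):
    """Pick the arm maximizing empirical mean + exploration bonus.
    The bonus sqrt(2 ln t / n) widens for rarely tried arms, standing
    in for the upper end of a confidence interval."""
    t = sum(trials)
    for a, n in enumerate(trials):
        if n == 0:
            return a  # try every arm at least once
    scores = [s / n + math.sqrt(2 * math.log(t) / n)
              for s, n in zip(successes, trials)]
    return max(range(len(scores)), key=scores.__getitem__)

# Observed rewards 2/4, 0/2, 1/3 after t = 9 pulls:
print(ucb1_choose([2, 0, 1], [4, 2, 3]))  # 0
```

Note how the under-explored 0/2 arm scores nearly as high as the 2/4 arm: uncertainty keeps it in contention.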
  • 39. Beta-Bernoulli Distribution: Beta prior over p, Bernoulli likelihood Pr(1) = p, Pr(0) = 1 - p (image from Wikipedia)
  • 40. Bandit Example with Beta-Bernoulli: observed take fractions 2/4, 0/2, 1/3 + prior 𝛽(1, 1) = posteriors 𝛽(3, 3), 𝛽(1, 3), 𝛽(2, 3)
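The conjugate update behind these posteriors is one line of arithmetic; a minimal sketch reproducing the slide's take fractions (2/4, 0/2, 1/3) under a Beta(1, 1) prior:

```python
def beta_posterior(alpha, beta, successes, failures):
    """Beta(alpha, beta) prior + Bernoulli observations
    -> Beta(alpha + successes, beta + failures) posterior."""
    return (alpha + successes, beta + failures)

# Take fractions 2/4, 0/2, 1/3 with a uniform Beta(1, 1) prior:
print(beta_posterior(1, 1, 2, 2))  # (3, 3)
print(beta_posterior(1, 1, 0, 2))  # (1, 3)
print(beta_posterior(1, 1, 1, 2))  # (2, 3)
```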
  • 41. Bayesian UCB Example: 95% reward confidence intervals [0.15, 0.85], [0.07, 0.81], [0.01, 0.71]
  • 42. Bayesian UCB Example: the image with the highest upper bound is selected
  • 43. Bayesian UCB Example: after new observations, its interval narrows to [0.12, 0.78]
  • 44. Bayesian UCB Example: selection repeats with the updated intervals
  • 45. Probabilistic: Thompson Sampling ● Idea: Select the actions by the probability they are the best ● Approach: ○ Keep a distribution over model parameters for each action ○ Sample estimated reward value for each action ○ Choose action a with maximum sampled value ○ Observe reward for action a and update its parameter distribution ● Pros: Randomness continues to explore without update ● Cons: Hard to compute probabilities of actions
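A sketch of Thompson sampling with Beta-Bernoulli arms, seeded for reproducibility and using the 𝛽(3, 3), 𝛽(2, 3), 𝛽(1, 3) posteriors from the slides; `random.betavariate` supplies the per-arm draws:

```python
import random

def thompson_choose(posteriors, rng=random):
    """Draw one plausible take fraction from each image's Beta
    posterior and play the arg max; arms get traffic in proportion
    to the probability they are the best."""
    draws = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(draws)), key=draws.__getitem__)

rng = random.Random(0)
posteriors = [(3, 3), (2, 3), (1, 3)]  # posterior means 0.50, 0.40, 0.25
counts = [0, 0, 0]
for _ in range(1000):
    counts[thompson_choose(posteriors, rng)] += 1
# The highest-mean arm is played most, but every arm keeps some traffic.
print(counts)
```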
  • 46. Thompson Sampling Example: posterior distributions 𝛽(3, 3), 𝛽(2, 3), 𝛽(1, 3)
  • 47. Thompson Sampling Example: sampled values 0.38, 0.59, 0.18
  • 48. Thompson Sampling Example: the image with the highest sampled value (0.59) is selected
  • 49. Thompson Sampling Example: after a positive reward, that image's posterior updates from 𝛽(2, 3) to 𝛽(3, 3)
  • 50. Many Variants of Bandits ● Standard setting: Stochastic and stationary ● Drifting: Reward values change over time ● Adversarial: No assumptions on how rewards are generated ● Continuous action space ● Infinite set of actions ● Varying set of actions over time ● ...
  • 52. Contextual Bandits ● Let’s make this harder! ● Slot machines where payout depends on context ● E.g. time of day, blinking light on slot machine, ...
  • 53. Contextual Bandit Learner (Policy) Environment Action Reward Context Each round: ● Environment provides context (feature) vector ● Learner chooses an action for context ● Environment provides a real-valued reward for action in context ● Learner updates to maximize the cumulative reward
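The round structure above can be sketched as a tiny simulation; the environment here is hypothetical (two contexts, reward 1 only when the action matches the context) and the learner is a simple 𝝐-greedy policy over per-context averages:

```python
import random

def run_contextual_bandit(rounds=1000, epsilon=0.1, seed=0):
    """Each round: observe a context x, choose an action a = pi(x),
    observe a reward, update. Returns the average reward per round."""
    rng = random.Random(seed)
    n_contexts, n_actions = 2, 2
    s = [[0.0] * n_actions for _ in range(n_contexts)]  # reward sums
    n = [[0] * n_actions for _ in range(n_contexts)]    # play counts
    total = 0.0
    for _ in range(rounds):
        x = rng.randrange(n_contexts)          # environment: context
        if rng.random() < epsilon:             # learner: explore...
            a = rng.randrange(n_actions)
        else:                                  # ...or exploit per context
            a = max(range(n_actions),
                    key=lambda j: s[x][j] / n[x][j] if n[x][j] else 0.0)
        r = 1.0 if a == x else 0.0             # environment: reward
        s[x][a] += r
        n[x][a] += 1
        total += r
    return total / rounds

print(run_contextual_bandit())
```

Once the learner has explored each (context, action) pair, the greedy choice depends on the context, which is exactly what separates this setting from a plain multi-armed bandit.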
  • 54. Supervised Learning vs. Contextual Bandits. Supervised learning: input is features (x ∈ ℝᵈ), output is a predicted label, feedback is the actual label (y). Contextual bandits: input is a context (x ∈ ℝᵈ), output is an action (a = 𝜋(x)), feedback is a reward (r ∈ ℝ)
  • 55. Supervised Learning vs. Contextual Bandits Example: a supervised classifier is told the true label (Dog); a bandit only observes the reward of the label it predicted (✓ for Dog, 0 for Cat, Seal, or Fox) (Chihuahua images from ImageNet)
  • 56. Artwork Personalization as Contextual Bandit ● Context: Member, device, page, etc. Artwork Selector ▶
  • 57. Epsilon-Greedy Example: with probability 1 - 𝝐 choose the personalized image, with probability 𝝐 choose an image at random
  • 58. Greedy Policy Example ● Learn a supervised regression model per image to predict reward ● Pick the image with the highest predicted reward (diagram: member context features → per-image models → arg max → winner)
  • 59. LinUCB Example (Li et al., 2010) ● Linear model to calculate uncertainty in the reward estimate ● Choose the image with the highest 𝛂-percentile predicted reward value (diagram: member context features → per-image models → arg max → winner)
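A hedged, dependency-free sketch of one disjoint LinUCB arm in the style of Li et al., 2010: it maintains the inverse design matrix with the Sherman-Morrison identity instead of calling a linear-algebra library, and the toy contexts and rewards at the end are fabricated to show the mechanics:

```python
import math

class LinUCBArm:
    """One arm's state: inverse design matrix A^-1 (initially I) and
    reward-weighted feature sum b, as in disjoint LinUCB."""
    def __init__(self, d):
        self.A_inv = [[float(i == j) for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(len(x)))
                for row in self.A_inv]

    def score(self, x, alpha=1.0):
        """theta . x + alpha * sqrt(x^T A^-1 x): predicted reward
        plus an uncertainty bonus (the alpha-percentile idea)."""
        theta = self._matvec(self.b)  # theta = A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        var = sum(xi * vi for xi, vi in zip(x, self._matvec(x)))
        return mean + alpha * math.sqrt(var)

    def update(self, x, r):
        """Rank-1 update A += x x^T done on A^-1 via Sherman-Morrison,
        plus b += r x."""
        Ax = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, Ax))
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(d):
            self.b[i] += r * x[i]

# Toy training: arm 0 pays off for context [1, 0], arm 1 for [0, 1].
arms = [LinUCBArm(2), LinUCBArm(2)]
for _ in range(50):
    for x in ([1.0, 0.0], [0.0, 1.0]):
        arms[0].update(x, x[0])
        arms[1].update(x, x[1])
choose = lambda x: max(range(2), key=lambda a: arms[a].score(x, alpha=0.1))
print(choose([1.0, 0.0]), choose([0.0, 1.0]))  # 0 1
```

The Sherman-Morrison step keeps each update O(d²), which matters when a round arrives for every image impression.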
  • 60. Thompson Sampling Example (Chapelle & Li, 2011) ● Learn a distribution over model parameters (e.g. Bayesian regression) ● Sample a model, evaluate features, take the arg max (diagram: member context features → sampled per-image models → arg max → winner)
  • 61. Offline Metric: Replay (Li et al., 2011) ● Compare logged actions to model assignments; offline take fraction: 2/3
  • 62. Replay ● Pros ○ Unbiased metric when using logged probabilities ○ Easy to compute ○ Rewards observed are real ● Cons ○ Requires a lot of data ○ High variance if there are few matches ■ Techniques like Doubly-Robust estimation (Dudik, Langford & Li, 2011) can help
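A sketch of the replay estimator in the style of Li et al., 2011; the logged tuples below are fabricated to mirror the 2/3 example on the Replay slide:

```python
def replay(logged, policy):
    """Keep only the logged events where the new policy agrees with
    the logged action; average the real rewards on those matches."""
    matched = [r for ctx, action, r in logged if policy(ctx) == action]
    return sum(matched) / len(matched) if matched else None

# Hypothetical log of (member context, image shown, played or not):
log = [("m1", "imgA", 1), ("m2", "imgB", 0),
       ("m3", "imgA", 1), ("m4", "imgA", 0)]
new_policy = lambda ctx: "imgA"  # candidate policy evaluated offline
print(replay(log, new_policy))  # 3 matches -> offline take fraction 2/3
```

The "requires a lot of data" con is visible here: only the matching rows count, so a policy that rarely agrees with the logging policy is evaluated on very few events.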
  • 63. Offline Replay Results ● Bandit finds good images ● Personalization is better ● Artwork variety matters ● Personalization wiggles around the best images (figure: lift in Replay for the various algorithms compared to the Random baseline)
  • 64. Bandits in the Real World
  • 65. Bandit Algorithms ● Getting started ○ Need data to learn ○ Warm-starting via batch learning from existing data ● Closing the feedback loop ○ Only exposing the bandit to its own output ● Algorithm performance depends on data volume ○ Need to be able to test bandits at large scale, head-to-head A/B testing
  • 66. Starting the Loop / Completing the Loop: explore with users, log context, action, and reward, join them into the data store, train the model (batch at first, then incrementally once live), and publish it back to serve users
  • 67. Scale Challenges ● Need to serve an image for any title in the catalog ○ Calls from homepage, search, galleries, etc. ○ > 20M RPS at peak ● Existing UI code written assuming image lookup is fast ○ In-memory map of video ID to URL ○ Want to insert a machine-learned model ○ Don't want a big rewrite across all UI code
  • 68. Live Compute: synchronous computation to choose the image for a title in response to a member request. Online Precompute: asynchronous computation to choose the image for a title before the request, stored in a cache
  • 69. Live Compute ● Pros: ○ Access to most fresh data ○ Knowledge of full context ○ Compute only what is necessary ● Cons: ○ Strict Service Level Agreements ■ Must respond quickly in all cases ■ Requires high availability ○ Restricted to simple algorithms Online Precompute ● Pros: ○ Can handle large data ○ Can run moderate complexity algorithms ○ Can average computational cost across users ○ Change from actions ● Cons: ○ Has some delay ○ Done in event context ○ Extra compute for users and items not served (see techblog for more details)
  • 71. Precompute & Image Lookup ● Precompute ○ Run bandit for each title on each profile to choose personalized image ○ Store the title to image mapping in EVCache ● Image Lookup ○ Pull profile’s image mapping from EVCache once per request
  • 72. Logging & Reward ● Precompute Logging ○ Selected image ○ Exploration Probability ○ Candidate pool ○ Snapshot facts for feature generation ● Reward Logging ○ Image rendered in UI & if played ○ Precompute ID Image via YouTube
  • 73. Feature Generation & Training ● Join rewards with snapshotted facts ● Generate features using DeLorean ○ Feature encoders are shared online and offline ● Train the model using Spark ● Publish model to production DeLorean image by JMortonPhoto.com & OtoGodfrey.com
  • 74. Monitoring and Resiliency ● Track the quality of the model ○ Compare predictions to actual behavior ○ Online equivalents of offline metrics ● Reserve a fraction of data for a simple policy (e.g. 𝝐-greedy) to sanity-check the bandits
  • 75. ● Missing images greatly degrade the member experience ● Try to serve the best image possible Graceful Degradation Personalized Selection Unpersonalized Fallback Default Image (when all else fails)
  • 77. Online results ● A/B test: It works! ● Rolled out to our >130M member base ● Most beneficial for lesser known titles ● Competition between titles for attention leads to compression of offline metrics More details in our blog post
  • 79. More dimensions to personalize: rows, trailer, evidence, synopsis, image, row title, metadata, ranking
  • 80. Automatic image selection ● Generating new artwork is costly and time-consuming ● Can we predict performance from the raw image?
  • 81. Artwork selection orchestration ● Neighboring image selection influences the result ● Example: Stand-up comedy, Row A (microphones) vs. Row B (more variety)
  • 82. Long-term Reward: Road to Reinforcement Learning ● RL involves multiple actions and delayed reward ● Useful to maximize user long-term joy?