Recent Trends in
Personalization
at Netflix
Justin Basilico
RecSys 2020 Expo
2020-09-24
@JustinBasilico
Why do we personalize?
Help members find content to watch and enjoy to maximize member satisfaction and retention
Spark joy
What do we personalize?
From how we rank
Ranking: the ordering of videos is personalized
... to how we construct a page
Rows: the selection and placement of rows is personalized
... to how we respond to queries
Search: query & result recommendation
... to what images we suggest
Frame recommendation for artists
... and then select
Personalized artwork selection
... to how we reach out
Message personalization
Everything is a recommendation!
Isn’t this solved yet?
○ Every person is unique with a variety of interests
… and sometimes they share profiles
○ Help people find what they want when they’re not sure what they want
○ Large datasets but small data per user
… and potentially biased by the output of your system
○ Cold-start problems on all sides
○ Non-stationary, context-dependent, mood-dependent, ...
○ More than just accuracy: Diversity, novelty, freshness, fairness, ...
○ ...
No, personalization is hard!
So what are you doing about it?
Some recent avenues for approaching these challenges:
1. Causality
2. Bandits
3. Reinforcement Learning
4. Objectives
5. Fairness
6. Experience Personalization
Trending Now
Trend 1: Causality
From Correlation to Causation
● Most recommendation algorithms are correlational
○ Some early recommendation algorithms literally computed correlations between users and items
● Did you watch a movie because we recommended it to you? Or because you liked it? Or both?
● If you had to watch a movie, would you like it? [Wang et al., 2020]
p(Y|X) → p(Y|X, do(R)) (a rough sketch of estimating this from logged data follows below)
(from http://www.tylervigen.com/spurious-correlations)
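To make the p(Y|X) versus p(Y|X, do(R)) distinction concrete: when the logging policy's propensities are known, inverse propensity scoring is one standard way to estimate what would happen under a different recommendation policy from purely observational logs. A minimal sketch under those assumptions; the function and field names are illustrative, not Netflix's system:

```python
import numpy as np

def ips_value(logs, target_policy):
    """Inverse-propensity estimate of a target policy's average reward.

    logs: list of (context, shown_item, reward, logging_propensity) tuples,
          where logging_propensity is the probability that the logging
          policy showed that item in that context.
    target_policy: function(context) -> probability vector over items.
    """
    total = 0.0
    for context, shown_item, reward, propensity in logs:
        target_prob = target_policy(context)[shown_item]
        # Reweight the observed reward by how much more (or less) often
        # the target policy would have shown the same item.
        total += (target_prob / propensity) * reward
    return total / len(logs)

# Toy usage: 3 candidate items, uniform logging policy (propensity 1/3).
rng = np.random.default_rng(0)
logs = [(None, int(rng.integers(3)), int(rng.integers(2)), 1 / 3)
        for _ in range(1000)]
always_first = lambda context: np.array([1.0, 0.0, 0.0])
print(ips_value(logs, always_first))
```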
Feedback loops
[Diagram: impression bias inflates plays → inflated item popularity → more impressions → more plays, producing oscillations in the distribution of genre recommendations]
Feedback loops can cause biases to be
reinforced by the recommendation system!
[Chaney et al., 2018]: simulations showing that this can reduce the
usefulness of the system
Lots of feedback loops...
[Diagram: a closed loop in which recommendations drive watches, watches become training data, and the trained model produces the next recommendations; search acts as an open loop, bringing in watch data that the recommender did not cause]
Challenges in Causal Recommendations
● Handling unobserved confounders
● Coming up with the right causal graph for the model
● High variance in many causal models
● Computational challenges (e.g. [Wong, 2020])
● Connecting causal recommendations with other aspects like
off-policy reinforcement learning
● When and how to introduce randomization
Trend 2: Bandits in
Recommendations
Why contextual bandits for recommendations?
● Break feedback loops
● Want to explore to learn
● Uncertainty around user interests and new items
● Sparse and indirect feedback
● Changing trends
● Early news example: [Li et al., 2010]
Example:
Which artwork to show?
Artwork Personalization as
Contextual Bandit
● Environment: Netflix homepage
● Context: Member, device, page, etc.
● Learner: Artwork selector for a show
● Action: Display specific image for show
● Reward: Member has positive engagement
[Diagram: the artwork selector chooses which image to display for a title; a minimal bandit sketch follows below]
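To make the mapping above concrete, here is a minimal epsilon-greedy contextual bandit in the same spirit: running reward estimates per (context bucket, image), with occasional random exploration. This is a toy sketch, not the production artwork selector; the context bucketing, image names, and epsilon value are illustrative assumptions.

```python
import random
from collections import defaultdict

class EpsilonGreedyArtworkSelector:
    """Toy contextual bandit: choose an image for a title given a context."""

    def __init__(self, images, epsilon=0.1):
        self.images = images
        self.epsilon = epsilon
        self.counts = defaultdict(int)     # impressions per (context, image)
        self.rewards = defaultdict(float)  # summed reward per (context, image)

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < self.epsilon:
            return random.choice(self.images)
        def mean_reward(image):
            n = self.counts[(context, image)]
            return self.rewards[(context, image)] / n if n else 0.0
        return max(self.images, key=mean_reward)

    def update(self, context, image, reward):
        # Reward: e.g. 1 if the member had positive engagement, else 0.
        self.counts[(context, image)] += 1
        self.rewards[(context, image)] += reward

# Toy usage: the context is a coarse bucket such as (device type, country).
selector = EpsilonGreedyArtworkSelector(images=["image_a", "image_b", "image_c"])
context = ("tv", "US")
shown = selector.select(context)
selector.update(context, shown, reward=1)
```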
Offline Replay Results
● Bandit finds good images
● Personalization is better
● Artwork variety matters
● Personalization wiggles
around best images
[Chart: lift in replay for each algorithm relative to the random baseline; a minimal replay sketch follows below]
[More info in our blog post]
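The replay methodology (in the spirit of Li et al.) scores a candidate policy on logged data by keeping only the impressions where the candidate would have chosen the same image that was actually shown; with uniformly randomized logging, the average reward on that matching subset estimates the policy's online performance. A minimal sketch with made-up field names:

```python
def replay_reward(logs, policy):
    """Replay evaluation: average reward over logged events where the
    candidate policy agrees with the action that was actually shown.

    logs: iterable of dicts with keys 'context', 'shown', 'reward',
          collected under a uniformly random logging policy.
    policy: function(context) -> chosen action.
    """
    matched_rewards = [
        event["reward"]
        for event in logs
        if policy(event["context"]) == event["shown"]
    ]
    if not matched_rewards:
        return None  # the policy never matched the logged actions
    return sum(matched_rewards) / len(matched_rewards)

# Toy usage
logs = [
    {"context": "ctx1", "shown": "image_a", "reward": 1},
    {"context": "ctx1", "shown": "image_b", "reward": 0},
    {"context": "ctx2", "shown": "image_a", "reward": 0},
]
print(replay_reward(logs, policy=lambda ctx: "image_a"))  # 0.5
```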
Challenges with bandits in the real world
● Designing good exploration is an art
○ Especially to support future algorithm innovation
○ Challenging to do user-level A/B tests comparing fully on-policy bandits at high scale
● Bandits over large action spaces: rankings and slates
● Layers of bandits that influence each other
● Handling delayed rewards
Trend 3: Reinforcement
Learning in
Recommendations
Going Long-Term
● Want to maximize long-term member joy
● Involves many user visits, recommendation actions and delayed reward
● … sounds like Reinforcement Learning
How long?
● Within a page: RL to optimize a ranking or slate
● Within a session: RL to optimize multiple interactions in a session
● Across sessions: RL to optimize interactions across multiple sessions
Challenges of Reinforcement Learning for Recommendations
● High-dimensional: The action of recommending a single item is O(|C|); typically we want to do ranking or page construction, which is combinatorial. So are states such as user histories.
● Off-policy: Need to learn and evaluate from existing system actions
● Concurrent: Don't observe full trajectories; need to learn simultaneously from many interactions
● Evolving action space: New actions (items) become available and need to be cold-started. Non-stationary behavior for existing actions.
● Simulator paradox: A great simulator means you already have a great recommender
● Reward function design: Expressing the objective in a good way
Interested in more?
REVEAL Workshop 2020:
Bandit and Reinforcement Learning from User Interactions
Trend 4: Objectives
What is your recommender trying to optimize?
● We want to optimize long-term member joy
● While accounting for:
○ Avoiding “trust busters”
○ Coldstarting
○ Fairness
○ ...
Layers of Metrics
Training Objective → Offline Metric → Online Metric → Goal
Example case: Misaligned Metrics
RMSE (training objective) → NDCG on historical data (offline metric) → User engagement in A/B test (online metric) → Joy (goal)
Your recommendations can only be as good as the metrics you measure them on
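As a tiny, contrived illustration of how these layers can disagree: a model with lower RMSE can still order items worse than a model with higher RMSE. The numbers below are invented purely to show the effect:

```python
import math

true_ratings = {"A": 5.0, "B": 3.0}   # the member prefers A over B

pred_1 = {"A": 3.5, "B": 4.0}         # lower RMSE, wrong order
pred_2 = {"A": 5.0, "B": 1.0}         # higher RMSE, right order

def rmse(preds):
    return math.sqrt(sum((preds[k] - true_ratings[k]) ** 2
                         for k in true_ratings) / len(true_ratings))

def top_item(preds):
    return max(preds, key=preds.get)

print(rmse(pred_1), top_item(pred_1))  # ~1.27, ranks B first (wrong)
print(rmse(pred_2), top_item(pred_2))  # ~1.41, ranks A first (right)
```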
Many recommenders to optimize
● Same objective? Different ones?
● Can we train (some of) them together using multi-task learning?
● Is there a way to know a priori if combining tasks will be beneficial or not?
[Diagram: user history and context feed many recommendation tasks: ranking, page construction, rating, explanation, search, image selection, ...; a minimal multi-task sketch follows below]
[Some MTL examples: Zhao et al., 2015, Bansal et al., 2016, Lu et al., 2018, ...]
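One common multi-task pattern for this setup is a shared encoding of user history and context with task-specific heads, so related tasks can share statistical strength. A minimal PyTorch-style sketch under those assumptions; the layer sizes, task pair, and losses are illustrative, not a description of Netflix's models:

```python
import torch
import torch.nn as nn

class SharedTrunkMTL(nn.Module):
    """Shared user-history encoder with one head per task."""

    def __init__(self, input_dim=128, hidden_dim=64):
        super().__init__()
        # Shared trunk: encodes user history + context features.
        self.trunk = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task-specific heads, e.g. a ranking score and a rating prediction.
        self.ranking_head = nn.Linear(hidden_dim, 1)
        self.rating_head = nn.Linear(hidden_dim, 1)

    def forward(self, features):
        shared = self.trunk(features)
        return self.ranking_head(shared), self.rating_head(shared)

# Toy usage: combine both task losses into a single training objective.
model = SharedTrunkMTL()
features = torch.randn(32, 128)
rank_score, rating_pred = model(features)
rank_target = torch.randint(0, 2, (32, 1)).float()   # clicked / not clicked
rating_target = torch.rand(32, 1) * 5                # 0-5 star rating
loss = (nn.functional.binary_cross_entropy_with_logits(rank_score, rank_target)
        + nn.functional.mse_loss(rating_pred, rating_target))
loss.backward()
```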
Challenges in objectives
● Nuanced metrics:
○ Differences between what you want and what you can encapsulate in a metric
○ Where does enjoyment come from? How does that vary by person?
○ How do you measure that at scale?
● Ways of measuring improvements offline before going to A/B test?
● What about effects beyond the typical A/B time horizon?
● Avoiding introducing lots of parameters to tune
Trend 5: Fairness
Personalization has a big impact on people’s lives
How do we ensure that it is fair?
Calibrated Recommendations [Steck, 2018]
● Fairness as matching distribution of user interests
● Accuracy as an objective can lead to unbalanced predictions
● Simple example:
○ User: watched 70 romance and 30 action titles
○ Expectation: recommendations that are 70% romance and 30% action
○ Reality (maximizing accuracy): 100% romance
● Many recommendation algorithms exhibit this behavior of exaggerating the dominant interests and crowding out less frequent ones
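A minimal sketch of the calibration idea in code: measure the KL divergence between the genre distribution of a user's history and that of a recommendation list, and greedily rerank candidates to trade off relevance against that divergence, in the spirit of Steck's submodular reranker. The smoothing constant, trade-off weight, and toy data are illustrative assumptions:

```python
import math

def genre_distribution(items, alpha=0.01, genres=("romance", "action")):
    """Smoothed genre distribution of a list of (item, genre) pairs."""
    counts = {g: alpha for g in genres}
    for _, genre in items:
        counts[genre] += 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def calibration_kl(history, recs):
    """KL(history distribution || recommendation distribution); lower is more calibrated."""
    p = genre_distribution(history)
    q = genre_distribution(recs)
    return sum(p[g] * math.log(p[g] / q[g]) for g in p)

def calibrated_rerank(history, candidates, k, lam=0.5):
    """Greedy reranking: pick the item maximizing (1 - lam)*relevance - lam*KL."""
    selected = []
    pool = list(candidates)  # (item, genre, relevance) triples
    for _ in range(k):
        def gain(c):
            return ((1 - lam) * c[2]
                    - lam * calibration_kl(history, selected + [c[:2]]))
        best = max(pool, key=gain)
        pool.remove(best)
        selected.append(best[:2])
    return selected

# Toy usage: the history is 70% romance / 30% action.
history = [("h", "romance")] * 7 + [("h", "action")] * 3
candidates = ([(f"r{i}", "romance", 0.9) for i in range(10)]
              + [(f"a{i}", "action", 0.8) for i in range(10)])
recs = calibrated_rerank(history, candidates, k=10)
print(calibration_kl(history, recs))
```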
Calibration Results (MovieLens 20M)
● Baseline model (wMF): many users receive uncalibrated recommendations
● After reranking (submodular reranker): recommendations are much more calibrated (smaller KL divergence)
[Chart: user density vs. calibration (KL divergence) for the baseline and the reranked recommendations]
Challenges in fairness for recommenders
● Which definition of fairness to use in different recommendation scenarios? [Mehrabi et al., 2019 catalogues many types]
● Handling fairness without demographic information: both methods [Beutel et al., 2020] and metrics
● Relationship of fairness with explainability and trust
● Connecting fairness with all the prior areas
○ Bandits, RL, causality, …
● Beyond fairness of the algorithm: ensuring a positive impact on society
Trend 6:
Experience Personalization
Evolution of our Personalization Approach
[Diagram: Rating (e.g. a predicted 4.7) → Ranking → Pages → Experience]
Personalizing how we recommend
(not just what we recommend…)
● Algorithm level: Ideal balance of diversity, popularity,
novelty, freshness, etc. may depend on the person
● Display level: How you present items or explain
recommendations can also be personalized
● Interaction level: Balancing the needs of lean-back
users and power users
So many dimensions to personalize:
Rows, Trailer, Evidence, Synopsis, Image, Row Title, Metadata, Ranking
More Adaptive UI
Experience beyond the app
Recommendations, New Arrival, New Season Alert, Coming Soon
[Slides about messaging]
Challenges in Experience Personalization
● Novelty and learning effects for new experiences
● Cohesion across pages, devices, and time
● Dealing with indirect feedback
● Handling structures of components
○ See [Elahi & Chandrashekar, 2020] poster today
● Coldstarting new experiences
1. Causality
2. Bandits
3. Reinforcement Learning
4. Objectives
5. Fairness
6. Experience Personalization
Lots of opportunities to improve our
Personalization
Sound interesting? Join us
research.netflix.com/jobs
Interested in internship opportunities?
Follow @NetflixResearch
Thank you
Questions?
@JustinBasilico
Justin Basilico
