Recommendation Modeling with Impression Data at Netflix

Jiangwei Pan
Research Scientist at Netflix
LERI workshop, RecSys’23

Definition of impressions
An item appears in the viewport of the
application
● for at least x milliseconds
● partially visible can be OK
Impressions can be logged for different
entities on screen
● shows, rows, boxart images, etc

Goal of this presentation
Impression data is critical for building recommender models at
Netflix
● and other industry recommenders
How do we incorporate impression data into recommender
models?
● Impressions for label definition (training objectives)
● Impressions for feature definition
Share interesting learnings and challenges

Impressions for label definition

What do recommenders do?
Recommenders choose items and
display them to the user as
impressions

A simplified recommendation algorithm
Given a user:
for every item, predict
p(engage | user impression of item)
then choose the item with the highest
prediction
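The simplified algorithm above can be sketched in a few lines. This is only an illustration: `recommend`, `predict_engage`, and the toy scores are hypothetical names and values standing in for a trained model, not anything taken from the talk.

```python
from typing import Callable, Iterable


def recommend(user: str, items: Iterable[str],
              predict_engage: Callable[[str, str], float]) -> str:
    """Return the item with the highest predicted p(engage | user impression of item)."""
    return max(items, key=lambda item: predict_engage(user, item))


# Hypothetical scores standing in for a trained model.
toy_scores = {("u1", "a"): 0.10, ("u1", "b"): 0.65, ("u1", "c"): 0.30}
best = recommend("u1", ["a", "b", "c"], lambda u, i: toy_scores[(u, i)])
```

Scoring every item like this is only practical for a small catalog.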

How to train p(engage | impression)
Binary classification model: engage or no-engage?
Training data: take all user-item impressions
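As a rough sketch of how impressions become binary training examples, assume each logged impression records whether the user engaged. The `Impression` record and its field names are assumptions made for illustration, not a real logging schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Impression:
    user_id: str
    item_id: str
    engaged: bool  # hypothetical flag, e.g. the user played the item


def to_training_examples(impressions: List[Impression]) -> List[Tuple[str, str, int]]:
    """Turn every user-item impression into a (user, item, label) example."""
    return [(imp.user_id, imp.item_id, 1 if imp.engaged else 0)
            for imp in impressions]


log = [
    Impression("u1", "a", True),
    Impression("u1", "b", False),
    Impression("u2", "a", False),
]
examples = to_training_examples(log)
```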

If only “relevant” items are impressed
Training data concentrates on the most
relevant part of the item space
If we train classifier using this data
● relevance is not the main difference
between positives and negatives
● so it may be ignored by the model
The classifier will not generalize well to the
whole item space
● may over-predict for many non-relevant
items

Solution 1: Add item exploration
Display random items to each user
A user still cannot be shown every single item
● there can be millions of items
But a user can be shown most “types” of items
Model generalizes better!
Too much exploration may hurt user
experience or ads revenue
Explore volume needs to be limited
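One simple way to keep exploration volume limited is to explore with a small fixed probability. The sketch below uses an epsilon-greedy rule, which is a swapped-in textbook technique and not necessarily what is used in production; `choose_item` and `explore_rate` are hypothetical names.

```python
import random
from typing import Dict, Tuple


def choose_item(scores: Dict[str, float], explore_rate: float,
                rng: random.Random) -> Tuple[str, bool]:
    """Return (item, explored).

    With probability explore_rate, display a uniformly random item;
    otherwise display the item with the highest predicted score.
    """
    if rng.random() < explore_rate:
        return rng.choice(sorted(scores)), True
    return max(scores, key=scores.get), False


scores = {"a": 0.10, "b": 0.65, "c": 0.30}
exploit_choice = choose_item(scores, explore_rate=0.0, rng=random.Random(0))
explore_choice = choose_item(scores, explore_rate=1.0, rng=random.Random(0))
```

Logging the `explored` flag with each impression makes it possible to tell exploration traffic from regular traffic later.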

Solution 2: Add random negatives
Pseudo-impressions with no
engagement
May incorrectly mark a relevant item as
negative
● risk is small when item space is large
Random negatives are easy to classify
● little connection to user interests
But help a lot with model generalization
Challenges
● which distribution to sample negatives from?
● how to mix random negatives with
impressed negatives?
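A minimal sketch of mixing random negatives with impressed examples, assuming uniform sampling over the catalog and a fixed number of negatives per positive. Both choices are assumptions; the slide lists them as open questions. `add_random_negatives` and `negatives_per_positive` are hypothetical names.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str, int]  # (user, item, label)


def add_random_negatives(examples: List[Example], catalog: List[str],
                         negatives_per_positive: int,
                         rng: random.Random) -> List[Example]:
    """Append uniformly sampled pseudo-impressions with label 0.

    A sampled item may in fact be relevant to the user; as noted above,
    that risk is small when the item space is large.
    """
    mixed = list(examples)
    for user, _item, label in examples:
        if label == 1:
            for _ in range(negatives_per_positive):
                mixed.append((user, rng.choice(catalog), 0))
    return mixed


impressed = [("u1", "a", 1), ("u1", "b", 0)]
catalog = ["a", "b", "c", "d", "e"]
mixed = add_random_negatives(impressed, catalog, 2, random.Random(0))
```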

Popularity bias
Definition: popular items get higher predictions than they should
Model trained only using impressions (exploit + explore)
● no popularity bias as popular items get both more positives
and more negatives in training data
● some items can suffer from high variance if there is not
enough exploration
Adding uniform random negatives
● may increase popularity bias as we add the same number of
negatives for popular and non-popular items
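A small worked example may make the last point concrete. The counts below are invented for illustration: two items have the same 10% engagement rate in impression data, and adding the same number of random negatives to each lowers the less popular item's rate far more.

```python
def engagement_rate(positives: int, negatives: int) -> float:
    """Fraction of training examples for an item that are positive."""
    return positives / (positives + negatives)


# Invented counts: both items have a 10% engagement rate on impressions.
popular = {"positives": 100, "negatives": 900}
niche = {"positives": 1, "negatives": 9}
extra_negatives = 10  # same number of uniform random negatives for each item

before_popular = engagement_rate(popular["positives"], popular["negatives"])
before_niche = engagement_rate(niche["positives"], niche["negatives"])
after_popular = engagement_rate(popular["positives"],
                                popular["negatives"] + extra_negatives)
after_niche = engagement_rate(niche["positives"],
                              niche["negatives"] + extra_negatives)
```

Here the popular item goes from 100/1000 to 100/1010 while the niche item goes from 1/10 to 1/20, so the training data now favors the popular item.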

When item space is large (millions)
Too costly to compute p(engage |
impression) for every item
Efficiency optimization: use 2 passes
● both predict p(engage | impression)
● with different focuses
Candidate generation pass
● efficient model architectures (e.g. two-tower)
● millions → hundreds (loosely-relevant)
● care more about recall @ hundreds
Fine-grained ranking pass
● more sophisticated model architectures
● distinguish between good and excellent
● often trained only on impressed negatives as
it is applied on already relevant candidates
More passes can be used, e.g.
● adjusting the ranking for diversity
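The two passes can be sketched as a cheap retrieval step followed by a more expensive re-ranking step. In this toy version, a dot product between hand-written vectors stands in for a two-tower model and a lookup table stands in for the ranker; all names and numbers are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def generate_candidates(user_vec: Sequence[float],
                        item_vecs: Dict[str, Sequence[float]],
                        k: int) -> List[str]:
    """Pass 1: cheap scoring over the whole catalog, keep the top k."""
    return heapq.nlargest(k, item_vecs,
                          key=lambda item: dot(user_vec, item_vecs[item]))


def rank(candidates: List[str],
         fine_score: Callable[[str], float]) -> List[str]:
    """Pass 2: a more expensive scorer applied only to the candidates."""
    return sorted(candidates, key=fine_score, reverse=True)


user_vec = [1.0, 0.0]
item_vecs = {"a": [0.9, 0.1], "b": [0.8, 0.5], "c": [0.1, 0.9], "d": [-0.5, 0.2]}
fine_scores = {"a": 0.2, "b": 0.7, "c": 0.9, "d": 0.1}

candidates = generate_candidates(user_vec, item_vecs, k=2)
ranked = rank(candidates, fine_scores.get)
```

Item "c" has the best fine-grained score but is never ranked because the first pass dropped it, which is why recall matters most in candidate generation.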

Repeated impressions
User scrolls back and forth multiple times
Items at the top get repeated
impressions
Need to deduplicate the impressions per
session in the training data
Otherwise, top items get unfairly
penalized in the model as they have more
repeated impressions
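A possible deduplication step keeps one record per (user, session, item). The field names are hypothetical, and marking the merged record as engaged if any of the repeats led to engagement is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def dedupe_per_session(impressions: List[dict]) -> List[dict]:
    """Keep one impression per (user, session, item), in first-seen order."""
    merged: Dict[Tuple[str, str, str], dict] = {}
    for imp in impressions:
        key = (imp["user_id"], imp["session_id"], imp["item_id"])
        if key in merged:
            # Assumption: a repeat that led to engagement makes the example positive.
            merged[key]["engaged"] = merged[key]["engaged"] or imp["engaged"]
        else:
            merged[key] = dict(imp)
    return list(merged.values())


log = [
    {"user_id": "u1", "session_id": "s1", "item_id": "a", "engaged": False},
    {"user_id": "u1", "session_id": "s1", "item_id": "a", "engaged": True},
    {"user_id": "u1", "session_id": "s1", "item_id": "b", "engaged": False},
    {"user_id": "u1", "session_id": "s2", "item_id": "a", "engaged": False},
]
deduped = dedupe_per_session(log)
```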

Noisy impressions
Many items on screen at the same
time
Not clear if the user saw the item
If no engagement, is it because
● user is not interested?
● user didn’t see it?

Impressions may have long-term value
Impression of a Netflix show makes it more familiar
to the user
● even if the user did not play it
User may become more/less likely to play the show
at the next impression

Impressions for feature definition

Typical features
Frequency counts: number of past impressions
of the item
● can add different variations
Engagement rate: #engagements /
#impressions
● how to set the value if #impressions = 0?
● 0, average, 1, adding prior?
● this could affect cold-start performance
● we can also skip this feature to let model
learn directly from raw counts
Categorical features: user’s impressed item
ids
● can help model generalize better via id
embeddings
But a user can have hundreds of impressions
even in a single day
Need to reduce the noise
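For the engagement-rate feature, the "adding prior" option can be sketched as additive smoothing: the rate is pulled toward a prior when there are few impressions and equals the prior when there are none. The function name and the `prior_rate` and `prior_strength` parameters are hypothetical; how to choose them is not covered in the talk.

```python
def smoothed_engagement_rate(engagements: int, impressions: int,
                             prior_rate: float, prior_strength: float) -> float:
    """Engagement rate smoothed toward prior_rate.

    prior_strength acts like a number of pseudo-impressions at the prior rate,
    so the result is defined even when impressions == 0.
    """
    return (engagements + prior_rate * prior_strength) / (impressions + prior_strength)


cold_start = smoothed_engagement_rate(0, 0, prior_rate=0.1, prior_strength=20)
few_impressions = smoothed_engagement_rate(5, 10, prior_rate=0.1, prior_strength=20)
many_impressions = smoothed_engagement_rate(500, 1000, prior_rate=0.1, prior_strength=20)
```

With these toy numbers, an item with 5 engagements out of 10 impressions lands between the prior (0.1) and its raw rate (0.5), while an item with 1000 impressions stays close to its raw rate.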

Impression data volume
Impression data volume is huge
Logging is challenging
● heterogeneous client devices (TV,
mobile, web)
● need to process, sessionize and
summarize in real-time
● need to be available via multiple
channels (table, stream, API) for
different purposes
Handle volume in feature definition
● summary counts
● focus on most recent impressions
● increase minimum impression
duration requirement
● random sampling
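The volume-handling options above can be combined in a small preprocessing step. This is a sketch only: the record fields (`ts_s`, `duration_ms`), thresholds, and function names are all assumptions.

```python
import random
from collections import Counter
from typing import List


def reduce_impressions(impressions: List[dict], now_s: float, max_age_s: float,
                       min_duration_ms: float, sample_rate: float,
                       rng: random.Random) -> List[dict]:
    """Keep recent, sufficiently long impressions, then randomly sample them."""
    recent_and_long = [
        imp for imp in impressions
        if now_s - imp["ts_s"] <= max_age_s and imp["duration_ms"] >= min_duration_ms
    ]
    return [imp for imp in recent_and_long if rng.random() < sample_rate]


def summary_counts(impressions: List[dict]) -> Counter:
    """Summarize impressions as per-item counts."""
    return Counter(imp["item_id"] for imp in impressions)


log = [
    {"item_id": "a", "ts_s": 990, "duration_ms": 800},
    {"item_id": "a", "ts_s": 995, "duration_ms": 1200},
    {"item_id": "b", "ts_s": 100, "duration_ms": 2000},  # too old
    {"item_id": "c", "ts_s": 998, "duration_ms": 50},    # too brief
]
kept = reduce_impressions(log, now_s=1000, max_age_s=60, min_duration_ms=500,
                          sample_rate=1.0, rng=random.Random(0))
counts = summary_counts(kept)
```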

How do impression features help?
Correlation
Should we then recommend more items with
many prior impressions from the user? No
Correlation does not imply causation
● highly-impressed items probably have
higher quality and thus have higher avg
label
In an AB test, after adding impression features
● model recommends more items with few
prior impressions

Conclusion
● Overview of using impression data to build an unbiased
recommendation model at Netflix
● Label definition: we may need exploration and random
negative sampling to enrich the training data
● Feature definition: various ways to summarize and
denoise impression data
● Long-term value: impressions can have different
long-term values for different users/items

Challenges
● How to do efficient exploration that maximizes signal
collection and minimizes user experience impact?
● How to sample random negatives? How to mix
random negatives with impressed negatives?
● How to model long-term value of impressions?

Thank you!
Questions?
Contact: Jiangwei Pan, jpan@netflix.com
