Balancing Discovery and Continuation in Recommendations
Hossein Taghavi
With: Ashok Chandrashekar, Linas Baltrunas, and Justin Basilico
RecSysTV 2016
Outline
§ Background: Netflix recommendations
§ Recommending for different modes of watching
§ Case study: Continue Watching row
§ Conclusions
Evolution of Netflix
[Images: the Netflix experience in 2006 vs. 2016]
Netflix Scale
§ > 83M members
§ > 190 countries
§ > 1000 device types
§ > 3.7B hours of content streamed every month
§ 36% of peak US downstream traffic
The Netflix Prize
§ Recommendations through predicted star rating
§ Contest:
§ Accuracy measured by root mean squared error (RMSE)
§ Improve it by 10% = $1 million!
§ Data size:
§ 100M ratings (back then “almost massive”)
Recommendation System: Ideal State
Turn on Netflix, and the absolute best content for you would automatically start playing.
Meanwhile…
Create a page of recommendations where the titles you are most likely to watch and enjoy are shown on the most visible parts of the page.
Everything is a Recommendation
[Annotated homepage screenshot: row selection & ordering, and title ranking within each row]
§ Recommendations are driven by machine learning algorithms
§ Over 80% of what members watch comes from our recommendations
How the Homepage is Built
§ The titles are organized as rows
§ Ordering of titles within rows depends on the row type
§ Selection and ordering of rows:
§ Personalized page generation algorithm (toy sketch below)
§ Also some business rules and constraints
§ Balance thematic coherence, relevance, and diversity
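The page generation algorithm itself is not specified here; purely as an illustration of trading off relevance against thematic redundancy (the toy sketch referenced above), consider a greedy loop that discounts a candidate row's relevance by its overlap with rows already placed. All names and scores are made up.

def build_page(candidate_rows, num_rows, diversity_weight=0.5):
    # Toy greedy page construction: repeatedly pick the candidate row whose
    # relevance, discounted by similarity to already-chosen rows, is highest.
    # candidate_rows is a list of (name, relevance, set_of_themes) tuples.
    page = []
    remaining = list(candidate_rows)
    while remaining and len(page) < num_rows:
        def score(row):
            name, relevance, themes = row
            # Jaccard overlap with the most similar row already on the page.
            overlap = max((len(themes & t) / max(len(themes | t), 1)
                           for _, _, t in page), default=0.0)
            return relevance - diversity_weight * overlap
        best = max(remaining, key=score)
        page.append(best)
        remaining.remove(best)
    return [name for name, _, _ in page]

rows = [("Trending Now", 0.9, {"popular"}),
        ("Popular on Netflix", 0.85, {"popular"}),
        ("Critically-acclaimed Dramas", 0.7, {"drama"})]
print(build_page(rows, num_rows=2))  # the near-duplicate second row is demoted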
Various Types of Member Interactions/Feedback
§ Plays
§ How long, pause, rewind, skip, etc.
§ Rating and social
§ Rate, like, share
§ Context
§ Time, location, device, language
§ Interactions
§ Scrolling, opening a title page, search, list add
Building the Recommendations is Data Driven
§ Try an idea offline using historical data to see if it would have made better recommendations
§ Offline metrics: AUC, nDCG (sketched below), Recall, …
§ If it did, deploy a live A/B test to see if it performs well in production
§ Primary metric: Member retention
[Cycle: Idea/Problem → Data → Algorithm → Model → Metrics → A/B Testing]
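As a concrete example of one of these offline metrics, the sketch below computes nDCG for a single ranked list; this is a generic textbook implementation, not Netflix's evaluation code.

import math

def dcg(relevances):
    # Discounted cumulative gain: top positions count more (log2 discount).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Normalize by the DCG of the best possible ordering of the same labels.
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance labels (1 = played, 0 = not played) in the order the model ranked the titles.
print(ndcg([0, 1, 0, 1]))  # < 1.0 because the played titles are not at the top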
For More Reading
§ Netflix tech blog:
§ bit.ly/beyondfivestars
§ bit.ly/learnapage
§ bit.ly/sparktimetravel
Building recommendation algorithms that are balanced for different modes of watching
What Is the Most Likely Title You Will Watch?
The same one you watched last time!
§ A large portion of watching hours is spent in continue-watching mode
Different Modes of Watching
§ Continuation: Resume a recently watched TV show or movie
§ List: Play a title previously added to My List
§ Rewatch: Rewatch a title enjoyed in the past
§ Discovery: Discover a new title to watch
Recommending for Different Modes: Approach 1
§ Build one unified model for ranking the titles in each row and one for ranking rows
§ Optimized for the likelihood of play/enjoyment from the page
§ Benefits:
§ Fewer models to maintain
§ Fewer A/B tests
Approach 1: Challenges
§ Members behave differently in different modes
§ Different row types are designed for different behaviors
§ Hard to capture and balance all of that in one objective
§ E.g., simply ranking titles by likelihood of play will fill the page with already-watched titles → poor member experience
§ Recommendations for different modes have different sensitivities to member actions
§ Continuation recs may react immediately to watching activity, My List recs may react to My List add/remove activity, etc.
Approach 2: Dedicated Models + Blend
§ Build separate models for each mode
§ Blend the results on the page
§ Blending can be done through a model trained offline, or a parameter tuned online (see the sketch below)
§ E.g., one or more dedicated rows for each mode
§ Pro:
§ More modular, provides more intuitive knobs for balancing
§ Con:
§ Less elegant, more maintenance
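A minimal sketch of the blend step, assuming each mode's dedicated model has already scored its candidate rows; all names and weights here are hypothetical stand-ins for the kind of knobs the slide describes.

def blend_rows(mode_scores, mode_weights):
    # mode_scores maps each candidate row to {mode: score}; mode_weights maps
    # each mode to its blend weight (the parameter tuned offline or via A/B test).
    def blended(row):
        return sum(mode_weights.get(mode, 0.0) * s
                   for mode, s in mode_scores[row].items())
    # Order rows for the page by their blended score, best first.
    return sorted(mode_scores, key=blended, reverse=True)

scores = {
    "Continue Watching": {"continuation": 0.8},
    "My List":           {"list": 0.6},
    "Trending Now":      {"discovery": 0.7},
}
weights = {"continuation": 1.0, "list": 0.7, "discovery": 0.9}
print(blend_rows(scores, weights))  # ['Continue Watching', 'Trending Now', 'My List']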
Case Study: Continue Watching Row
Continue Watching Row: The Past
§ CW row was shown on some devices
§ Videos sorted by recency of last watch
§ Row appearance on the page governed by business rules
§ On the website, only a single CW title was shown
§ A very significant fraction of plays are continuations
§ CW deserved better treatment
Objective
§ Unify the CW row across devices
§ Optimize the row in two dimensions:
§ Row position on page: place it higher when the member is more likely to resume a video
§ Re-order the titles within the CW row by their likelihood of being resumed in the current session
Some Intuitive Patterns
§ A member may be more likely to want to
§ Resume a video if they:
§ Are in the middle of binging a TV show
§ Partially watched a movie recently
§ Often watched it around this time of day, in this location, or on the current device
§ Discover a new title if they:
§ Just finished a movie or completed all episodes of a show
§ Haven't watched anything recently
§ Are a relatively new member
Building a Recommendation Model for CW
§ Feature Brainstorm
§ Training Data
§ Models and Metrics
§ Implementation
Feature Ideas
§ Member-level:
§ Member’s subscription: tenure, country, language
§ How active has the member been recently
§ Member past ratings, genre preferences, etc.
Feature Ideas
§ Video and member’s previous interactions with it:
§ How recently was the video added to the catalog, watched, ...
§ How much of the movie/show watched
§ Video metadata:
§ Type and genre of video, # episodes
§ E.g., kids titles may be re-watched more
§ What else is in the catalog
§ Popularity and relevance of the video
§ How often do members resume this video
Feature Ideas
§ Contextual:
§ Time of the day and day of the week
§ Location at various resolutions
§ Device
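To make the three feature brainstorm slides concrete, here is a hypothetical feature record for one (member, title, context) triple; the field names are illustrative, not Netflix's actual schema.

from dataclasses import dataclass

@dataclass
class CWFeatures:
    # Illustrative features for scoring one title in the Continue Watching row.
    # Member-level
    tenure_days: int
    recent_activity_score: float
    # Video and the member's previous interactions with it
    hours_since_last_watch: float
    fraction_watched: float
    is_tv_show: bool
    resume_rate_in_population: float  # how often members resume this video
    # Contextual
    hour_of_day: int
    day_of_week: int
    device_type: str

example = CWFeatures(tenure_days=400, recent_activity_score=0.7,
                     hours_since_last_watch=18.0, fraction_watched=0.45,
                     is_tv_show=True, resume_rate_in_population=0.6,
                     hour_of_day=21, day_of_week=5, device_type="tv")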
Title Ranking Model
§ Training data
§ Continuation sessions
§ Look at which of the recently watched titles was played
§ Model
§ Learn-to-rank: linear models, ensembles, …
§ Optimize for how well we rank the played title among the other titles (toy sketch below)
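A toy pointwise stand-in for the linear learn-to-rank option named above, assuming scikit-learn; the features and data are made up for illustration, and the talk does not describe the actual model or pipeline at this level of detail.

import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per (continuation session, candidate title); the label is 1 for the
# title that was actually resumed in that session, 0 for the others.
# Illustrative features: fraction watched, recency rank, is-TV-show flag.
X = np.array([[0.9, 1.0, 1.0],
              [0.2, 2.0, 0.0],
              [0.5, 3.0, 1.0],
              [0.1, 4.0, 0.0]])
y = np.array([1, 0, 0, 0])

model = LogisticRegression().fit(X, y)

# At serving time, rank the row's candidate titles by score, best first.
scores = model.predict_proba(X)[:, 1]
print(np.argsort(-scores))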
Title Ranking Model: Performance
§ Baseline: ranking by recency of last play
§ Recency rank was also an important feature in the model
§ Metrics significantly higher than the baseline
§ E.g., significant lift in precision
§ A/B testing also showed improvements
Row Placement Model
§ Objective
§ Estimate the likelihood of continuation vs. discovery
§ Map that likelihood to a position on the page
§ Simplification (sketched below):
§ Fix two candidate positions on the page and apply a threshold
§ Tune the threshold to optimize some accuracy metric
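A minimal sketch of this simplification; the two candidate positions and the threshold value are made up, and per the next slides the threshold was ultimately tuned via A/B testing.

def cw_row_position(p_continuation, threshold=0.5, high_pos=0, low_pos=3):
    # Place the CW row high on the page when a continuation session looks
    # likely, low otherwise. The threshold trades off false positives (CW
    # hogs the top of the page) against false negatives (hard to find).
    return high_pos if p_continuation >= threshold else low_pos

print(cw_row_position(0.8))  # -> 0 (top of page)
print(cw_row_position(0.2))  # -> 3 (lower on the page)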
Row Placement Model: Training
§ Training data
§ Randomly select sessions with plays globally
§ Model
§ Binary classification of continuation vs. discovery sessions
§ Evaluated using classification and ranking metrics
Row Placement Model: Performance
§ Metrics
§ Achieved high classification metrics for predicting continuation vs. discovery
§ Error types:
§ False positives → CW occupies the top of the page unnecessarily
§ False negatives → Difficult for the member to find the CW title
§ Placing the row
§ Threshold trades off FP and FN → hard to tune offline
§ Tuned the threshold by A/B testing
Reusing the Title Ranking Model
§ Use the title-level scores
§ Calibrate scores to get a probability P_t of continuation for each CW title t
§ Aggregate into an overall probability of continuation
§ E.g., assuming independence: P_CW = 1 − ∏_{t ∈ CW} (1 − P_t)  (computed in the sketch below)
§ Pro: Avoids maintaining two separate models
§ Con: Not as accurate as a dedicated model
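In code, the independence assumption above makes the aggregation a one-liner: the probability that at least one Continue Watching title is resumed.

import math

def p_cw(title_probs):
    # P_CW = 1 - prod over t in CW of (1 - P_t): the chance that at least one
    # CW title is resumed, assuming independence across titles.
    return 1.0 - math.prod(1.0 - p for p in title_probs)

print(p_cw([0.5, 0.3, 0.2]))  # 1 - 0.5 * 0.7 * 0.8 = 0.72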
Context Awareness
§ A title ranks highest at the same time of day and on the same device as its last play
§ Experiment:
§ Played “Sid the Science Kid” on iPhone
§ Played “Narcos” on the website
§ → Different rankings on iPhone and the web
Serving the CW Row in Production
§ Scores cannot be precomputed → real-time or near-real-time computation
§ Some features are context dependent
§ Row should refresh each time a member watches a title
§ Need to push updates to clients to keep the row fresh
§ Latency bottleneck: data transfers from the cache to the computation backend
§ Requires careful backend engineering
§ Fallback strategy: if computation fails, fall back to recency ranking (sketched below)
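A sketch of the fallback guard; score_titles and the title objects are hypothetical stand-ins for the real-time scoring service and catalog entities.

def serve_cw_row(member_id, context, score_titles, recently_watched):
    # Score each recently watched title in real time; if anything fails
    # (e.g., a timeout fetching contextual features), fall back to recency
    # ordering, which matches the legacy Continue Watching behavior.
    try:
        scores = score_titles(member_id, context, recently_watched)
        return sorted(recently_watched, key=lambda t: -scores[t.id])
    except Exception:
        return sorted(recently_watched, key=lambda t: -t.last_watch_time)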
Conclusions and Future Directions
Conclusions
§ Important to understand different modes of behavior
§ Continuation is a key driver of streaming hours
§ Improving CW recommendations improves the member experience
§ A/B testing showed a significant boost in user engagement
§ Future:
§ Incorporate the placement of the CW row (and others) into the main page construction model
§ When can we automatically start resuming a title?
Questions?
Upcoming blog post on this topic at: techblog.netflix.com
Job openings: jobs.netflix.com