Since the Netflix $1 million Prize, announced in 2006, our company has been known for having personalization at the core of our product. Even then, the dataset we released was considered “large”, and it spurred innovation in the (Big) Data Mining research field. Our product offering is now focused on instant video streaming, and our data is many orders of magnitude larger. Not only do we have many more users in many more countries, but we also receive many more streams of data. Besides ratings, we now also use information such as what our members play, browse, or search for.
In this talk I will discuss the different approaches we follow to deal with these large streams of data in order to extract information for personalizing our service. I will describe some of the machine learning models used, as well as the architectures that allow us to combine complex offline batch processes with real-time data streams.
Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain
1. Big & Personal: the data and the models behind Netflix recommendations
2. Outline
1. The Netflix Prize & the Recommendation Problem
2. Anatomy of Netflix Personalization
3. Data & Models
4. More data or better models?
4. What we were interested in:
■ High-quality recommendations
Proxy question:
■ Accuracy in predicted rating
■ Improve RMSE by 10% = $1 million!
Results:
● Top 2 algorithms, SVD & RBM, still in production
5. What about the final Prize ensembles?
■ Our offline studies showed they were too computationally intensive to scale
■ Expected improvement was not worth the engineering effort
■ Plus… focus had already shifted to other issues that had more impact than rating prediction
12. Genre rows
■ Personalized genre rows focus on user interest
■ Also provide context and “evidence”
■ Important for member satisfaction: moving personalized rows to the top on devices increased retention
■ How are they generated?
■ Implicit: based on the user’s recent plays, ratings, & other interactions
■ Explicit taste preferences
■ Hybrid: combine the above (see the sketch after this list)
■ Also take into account:
■ Freshness: has this row been shown before?
■ Diversity: avoid repeating tags and genres, limit the number of TV genres, etc.
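A toy illustration of how such a hybrid score could combine these signals; all names, weights, and penalty terms here are hypothetical assumptions for illustration, not Netflix’s actual logic:

```python
from dataclasses import dataclass

@dataclass
class GenreRow:
    name: str
    tags: frozenset           # e.g. frozenset({"crime", "drama"})
    implicit_affinity: float  # from recent plays, ratings & other interactions
    explicit_affinity: float  # from declared taste preferences
    times_shown: int          # how often this row was shown to the member

def score_genre_row(row, tags_used, w_implicit=0.6, w_explicit=0.4,
                    freshness_decay=0.8, diversity_penalty=0.5):
    """Hybrid score: blend implicit and explicit affinity, then discount
    for staleness (freshness) and tag overlap with rows already chosen
    (diversity). All weights are made-up illustrative values."""
    base = w_implicit * row.implicit_affinity + w_explicit * row.explicit_affinity
    base *= freshness_decay ** row.times_shown                    # freshness
    return base - diversity_penalty * len(row.tags & tags_used)  # diversity

row = GenreRow("Critically-acclaimed Crime Dramas",
               frozenset({"crime", "drama"}), 0.8, 0.5, 2)
print(score_genre_row(row, tags_used={"drama"}))
```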
16. Big Data @Netflix
■ Almost 40M subscribers
■ Ratings: 4M/day
■ Searches: 3M/day
■ Plays: 30M/day
■ 2B hours streamed in Q4 2011
■ 1B hours in June 2012
■ > 4B hours in Q1 2013
■ Data sources: member behavior, geo-information, time, impressions, device info, metadata, social
18. SVD
$X_{[m \times n]} = U_{[m \times r]} \; S_{[r \times r]} \; (V_{[n \times r]})^T$
■ X: m x n matrix (e.g., m users, n videos)
■ U: m x r matrix (m users, r factors)
■ S: r x r diagonal matrix (strength of each ‘factor’; r = rank of the matrix)
■ V: n x r matrix (n videos, r factors), so V^T is r x n
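To make the dimensions concrete, here is a small sketch using numpy’s SVD on a toy ratings matrix (the numbers are made up):

```python
import numpy as np

# Toy ratings matrix X: 4 users x 3 videos (values are made up).
X = np.array([[5., 4., 1.],
              [4., 5., 1.],
              [1., 1., 5.],
              [2., 1., 4.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# U: 4 x 3 (users x factors), s: 3 singular values, Vt: 3 x 3 (factors x videos)

r = 2  # keep only the two strongest factors
X_approx = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
print(np.round(X_approx, 2))  # rank-2 reconstruction of X
```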
19. SVD for Rating Prediction
■ User factor vectors $p_u$ and item factor vectors $q_i$
■ Baseline (bias): $b_{ui} = \mu + b_u + b_i$ (user & item deviation from average)
■ Predict rating as $\hat{r}_{ui} = b_{ui} + p_u^T q_i$
■ SVD++ (Koren et al.): asymmetric variation with implicit feedback
$\hat{r}_{ui} = b_{ui} + q_i^T \left( |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} (r_{uj} - b_{uj}) \, x_j + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right)$
■ where $q_i$, $x_j$, $y_j$ are three item factor vectors
■ Users are not parametrized, but rather represented by:
■ R(u): items rated by user u
■ N(u): items for which the user has given implicit preference (e.g., rated vs. not rated)
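A minimal sketch of the prediction rules above, assuming the biases and factor vectors have already been learned and that R(u) and N(u) are non-empty; variable names mirror the notation ($\mu$, $b_u$, $b_i$, $p_u$, $q_i$, $x_j$, $y_j$):

```python
import numpy as np

def predict_rating(mu, b_u, b_i, p_u, q_i):
    """Biased matrix-factorization prediction: mu + b_u + b_i + q_i . p_u"""
    return mu + b_u + b_i + q_i @ p_u

def predict_asymmetric(mu, b_u, b_i, q_i, x, y, r, b, R_u, N_u):
    """Asymmetric (SVD++-style) prediction: the user is represented through
    the item vectors x_j of rated items R(u) and y_j of implicitly preferred
    items N(u). x, y: dicts item -> factor vector; r, b: dicts item ->
    observed rating / baseline prediction."""
    p = sum((r[j] - b[j]) * x[j] for j in R_u) / np.sqrt(len(R_u))
    p = p + sum(y[j] for j in N_u) / np.sqrt(len(N_u))
    return mu + b_u + b_i + q_i @ p
```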
20. Simon Funk’s SVD
■ One of the most interesting findings during the Netflix Prize came out of a blog post
■ An incremental, iterative, and approximate way to compute the SVD using gradient descent
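A compact sketch of the idea, assuming ratings come as (user, item, rating) triples; note Funk originally trained one factor at a time, while this simplified version updates all r factors jointly:

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, r=20, lr=0.005, reg=0.02, epochs=20):
    """Funk-style approximate SVD: learn user/item factor matrices by
    stochastic gradient descent on the observed ratings only."""
    rng = np.random.default_rng(0)
    P = rng.normal(0, 0.1, (n_users, r))  # user factors
    Q = rng.normal(0, 0.1, (n_items, r))  # item factors
    for _ in range(epochs):
        for u, i, rating in ratings:
            err = rating - P[u] @ Q[i]
            # Simultaneous regularized SGD update of both factor vectors.
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return P, Q

# Example: three users, three items, a few observed ratings.
P, Q = funk_svd([(0, 0, 5), (0, 1, 3), (1, 0, 4), (2, 2, 1)], 3, 3, r=2)
print(P[0] @ Q[2])  # predicted rating of user 0 for item 2
```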
21. Restricted Boltzmann Machines
■ Restrict the connectivity in an ANN to make learning easier
■ Only one layer of hidden units
■ Although multiple layers are possible
■ No connections between hidden units
■ Hidden units are independent given the visible states
■ RBMs can be stacked to form Deep Belief Networks (DBN), the 4th generation of ANNs
(Diagram: bipartite graph of visible units i and hidden units j)
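A generic contrastive-divergence (CD-1) training step for a binary RBM, sketched in numpy; the Netflix Prize variant used softmax visible units over the five rating values, which this sketch omits:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, rng, lr=0.1):
    """One CD-1 update for a binary RBM. v0: batch of visible vectors,
    shape (batch, n_visible); W: weights (n_visible, n_hidden)."""
    # Up-pass: hidden units are conditionally independent given the
    # visibles, so p(h_j = 1 | v) factorizes.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down-pass and second up-pass: one step of Gibbs sampling.
    pv1 = sigmoid(h0 @ W.T + b_v)
    ph1 = sigmoid(pv1 @ W + b_h)
    # Gradient approximation: data statistics minus reconstruction statistics.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b_v += lr * (v0 - pv1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

rng = np.random.default_rng(0)
v = rng.integers(0, 2, (8, 6)).astype(float)  # toy binary training data
W = rng.normal(0, 0.01, (6, 4)); b_v = np.zeros(6); b_h = np.zeros(4)
for _ in range(100):
    cd1_step(v, W, b_v, b_h, rng)
```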
24. Ranking
■ Ranking = Scoring + Sorting + Filtering bags of movies for presentation to a user (sketched below)
■ Goal: find the best possible ordering of a set of videos for a user within a specific context, in real time
■ Objective: maximize consumption
■ Aspiration: played & “enjoyed” titles have the best scores
■ Akin to CTR forecasting for ads/search results
■ Factors:
■ Accuracy
■ Novelty
■ Diversity
■ Freshness
■ Scalability
■ …
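A deliberately tiny illustration of the Scoring + Sorting + Filtering decomposition; the scoring function and the filter are assumptions for illustration, not production logic:

```python
def rank_videos(candidates, score, already_played, max_results=40):
    """Ranking = Scoring + Sorting + Filtering: drop titles the member
    already played, score each remaining candidate, sort by score."""
    filtered = [v for v in candidates if v not in already_played]
    return sorted(filtered, key=score, reverse=True)[:max_results]

# Usage with a toy scoring function (weights are made up):
scores = {"a": 0.9, "b": 0.4, "c": 0.7}
print(rank_videos(["a", "b", "c"], scores.get, already_played={"a"}))
# -> ['c', 'b']
```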
30. Learning to rank
■ Machine learning problem: the goal is to construct a ranking model from training data
■ Training data can have partial-order or binary judgments (relevant/not relevant)
■ The resulting order of the items is typically induced from a numerical score
■ Learning to rank is a key element for personalization
■ You can treat the problem as a standard supervised classification problem, as in the sketch below
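For instance, a pointwise treatment with binary relevance labels reduces to ordinary classification; scikit-learn’s LogisticRegression is an arbitrary choice here, and the features and labels are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.2]])  # item features
y = np.array([1, 0, 1, 0])  # relevant / not relevant judgments

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]  # ranking score per item
ranking = np.argsort(-scores)          # order induced by the scores
print(ranking)
```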
31. Learning to Rank Approaches
1. Pointwise
■ The ranking function minimizes a loss function defined on individual relevance judgments
■ Ranking score based on regression or classification
■ Ordinal regression, logistic regression, SVM, GBDT, …
2. Pairwise
■ The loss function is defined on pairwise preferences
■ Goal: minimize the number of inversions in the ranking
■ The ranking problem is thereby transformed into a binary classification problem (see the sketch below)
■ RankSVM, RankBoost, RankNet, FRank, …
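A sketch of the pairwise transformation in the RankSVM style: each preference pair becomes a binary example on feature differences, and the learned linear weights then score and rank items (data here is made up):

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, pairs):
    """Turn preference pairs (i preferred over j) into a binary
    classification problem on feature differences."""
    diffs, labels = [], []
    for i, j in pairs:
        diffs.append(X[i] - X[j]); labels.append(1)
        diffs.append(X[j] - X[i]); labels.append(0)
    return np.array(diffs), np.array(labels)

X = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
Xd, yd = pairwise_transform(X, [(0, 1), (2, 1)])  # item 0 > 1, item 2 > 1
clf = LinearSVC().fit(Xd, yd)
scores = X @ clf.coef_.ravel()  # the linear weights induce the ranking
print(np.argsort(-scores))
```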
32. Learning to rank - metrics
■ Quality of a ranking is measured using metrics such as:
■ Normalized Discounted Cumulative Gain (NDCG)
■ Mean Reciprocal Rank (MRR)
■ Fraction of Concordant Pairs (FCP)
■ Others…
■ But it is hard to optimize machine-learned models directly on these measures (they are not differentiable)
■ Recent research looks at models that directly optimize ranking measures
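For example, NDCG (here in its linear-gain formulation) is easy to compute, but the implicit sort makes it non-differentiable in the model scores:

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain of relevance grades in ranked order."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, len(rel) + 2))  # log2(rank + 1)
    return np.sum(rel / discounts)

def ndcg(relevances):
    """DCG normalized by the ideal (sorted) ordering; 1.0 = perfect."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Relevance grades listed in the order the model ranked the items:
print(ndcg([3, 2, 0, 1]))
```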
33. Learning to Rank Approaches
3. Listwise
a. Indirect loss function
■ RankCosine: similarity between the ranked list and the ground truth as loss function
■ ListNet: KL-divergence as loss function, by defining a probability distribution over rankings
■ Problem: optimizing a listwise loss function may not optimize IR metrics
b. Directly optimizing IR measures (difficult since they are not differentiable)
■ Directly optimize IR measures through Genetic Programming or Simulated Annealing
■ Gradient descent on a smoothed version of the objective function (e.g., CLiMF at RecSys 2012 or TFMAP at SIGIR 2012)
■ SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
■ AdaRank uses boosting to optimize NDCG
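As an illustration of the listwise idea, a ListNet-style top-one loss compares the probability-of-being-ranked-first distributions induced by the ground-truth relevances and the model scores; the cross-entropy form below equals the KL divergence up to a constant:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())  # shift for numerical stability
    return e / e.sum()

def listnet_top1_loss(scores, relevances):
    """Cross-entropy between the top-one probability distributions
    induced by ground-truth relevances and by model scores."""
    p_true = softmax(np.asarray(relevances, dtype=float))
    p_model = softmax(np.asarray(scores, dtype=float))
    return -np.sum(p_true * np.log(p_model))

print(listnet_top1_loss([2.0, 1.0, 0.1], [3, 1, 0]))  # lower = better
```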
34. Other research questions we are interested in
● Row selection
○ How to select and rank lists of “related” items, imposing inter-group diversity and avoiding duplicates
● Diversity
○ Can we increase diversity while preserving relevance, in a way that optimizes user response? (see the sketch below)
● Similarity
○ How to compute optimal, personalized similarity between items using data ranging from play histories to item metadata
● Context-aware recommendations
● Mood and session intent inference
● ...
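One standard, generic way to trade relevance against diversity is maximal marginal relevance (MMR); this greedy sketch is not Netflix’s row-selection logic, just a common baseline for the diversity question above:

```python
def mmr(candidates, relevance, similarity, k=10, lam=0.7):
    """Greedily select items maximizing
    lam * relevance(c) - (1 - lam) * max similarity to items already picked."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def marginal(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance(c) - (1 - lam) * redundancy
        best = max(pool, key=marginal)
        selected.append(best)
        pool.remove(best)
    return selected

# Toy usage: two near-duplicate crime titles and one comedy.
rel = {"crime A": 0.9, "crime B": 0.85, "comedy C": 0.5}
sim = lambda a, b: 1.0 if a.split()[0] == b.split()[0] else 0.0
print(mmr(rel, rel.get, sim, k=2))  # -> ['crime A', 'comedy C']
```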
38. [Banko and Brill, 2001]
Norvig: “Google does not have better algorithms, only more data”
Many features / low-bias models
More data or better models?
39. More data or better models?
Sometimes, it’s not about more data
43. The Personalization Problem
■ The Netflix Prize simplified the recommendation problem to predicting ratings
■ But…
■ User ratings are only one of the many data inputs we have
■ Rating predictions are only part of our solution
■ Other algorithms such as ranking or similarity are very important
■ We can reformulate the recommendation problem
■ Function to optimize: the probability that a user chooses something and enjoys it enough to come back to the service
44. More data +
Better models +
More accurate metrics +
Better approaches & architectures
Lots of room for improvement!