Netflix Recommendations
Beyond the 5 Stars
ACM SF Bay Area
October 22, 2012
Xavier Amatriain
Personalization Science and Engineering - Netflix
@xamat
Outline
1. The Netflix Prize & the Recommendation
Problem
2. Anatomy of Netflix Personalization
3. Data & Models
4. And…
a) Consumer (Data) Science
b) Or Software Architectures
3
What we were interested in:
 High quality recommendations
Proxy question:
 Accuracy in predicted rating
 Improve by 10% = $1 million!
Results
• Top 2 algorithms still in
production
SVD
RBM
What about the final prize ensembles?
 Our offline studies showed they were too computationally
intensive to scale
 Expected improvement not worth the engineering effort
 Plus… focus had already shifted to other issues that
had more impact than rating prediction.
5
Change of focus
6
2006 → 2012
Anatomy of Netflix Personalization
Everything is a Recommendation
Everything is personalized
Note: recommendations are per household, not per individual user
Rows
Ranking
8
Top 10
Personalization awareness
Diversity
9
[Figure: the Top 10 row as seen by different household profiles — All, Dad, Dad&Mom, Daughter, Mom, Son]
Support for Recommendations
10
Social Support
Social Recommendations
11
Watch again & Continue Watching
12
Genres
13
Genre rows
 Personalized genre rows focus on user interest
 Also provide context and “evidence”
 Important for member satisfaction – moving personalized rows to top on
devices increased retention
 How are they generated?
 Implicit: based on user’s recent plays, ratings, & other interactions
 Explicit taste preferences
 Hybrid: combine the above
 Also take into account:
 Freshness - has this been shown before?
 Diversity – avoid repeating tags and genres, limit number of TV genres, etc.
Genres - personalization
15
Genres - personalization
16
17
Genres – explanations
Genres – explanations
18
19
Genres – user involvement
Genres – user involvement
20
Similars
 Displayed in many different contexts
 In response to user actions/context (search, queue add…)
 More like… rows
Anatomy of a Personalization - Recap
 Everything is a recommendation: not only rating
prediction, but also ranking, row selection, similarity…
 We strive to make it easy for the user, but…
 We want the user to be aware and be involved in the
recommendation process
 Deal with implicit/explicit and hybrid feedback
 Add support/explanations for recommendations
 Consider issues such as diversity or freshness
22
Data
&
Models
Big Data @Netflix
 Almost 30M subscribers
 Plays: 30M/day
 Ratings: 4M/day
 Searches: 3M/day
 2B hours streamed in Q4 2011
 1B hours in June 2012
24
Smart Models
25
 Logistic/linear regression
 Elastic nets
 SVD and other MF models
 Restricted Boltzmann Machines
 Markov Chains
 Different clustering approaches
 LDA
 Association Rules
 Gradient Boosted Decision Trees
 …
SVD
X[m × n] = U[m × r] S[r × r] (V[n × r])^T
 X: m × n matrix (e.g., m users, n videos)
 U: m × r matrix (m users, r concepts)
 S: r × r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
 V: n × r matrix (n videos, r concepts)
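To make the dimensions concrete, here is a minimal illustrative sketch (not Netflix code) of a rank-r decomposition with numpy; the toy matrix and the choice to treat unrated cells as 0 are assumptions for the example only:

    import numpy as np

    # Toy user x video rating matrix (m=4 users, n=5 videos); 0 stands in for "unrated",
    # which plain SVD cannot distinguish from a real rating -- a limitation that motivates
    # the gradient-descent formulations on the following slides.
    X = np.array([[5, 3, 0, 1, 0],
                  [4, 0, 0, 1, 1],
                  [1, 1, 0, 5, 4],
                  [0, 0, 5, 4, 0]], dtype=float)

    r = 2  # number of latent "concepts" to keep
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_r, S_r, V_r = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

    # Rank-r reconstruction: X ≈ U_r (m x r) · S_r (r x r) · V_r^T (r x n)
    X_hat = U_r @ S_r @ V_r.T
    print(np.round(X_hat, 2))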
Simon Funk’s SVD
 One of the most interesting findings during the Netflix Prize came out of a blog post
 An incremental, iterative, and approximate way to compute the SVD using gradient descent
27
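A minimal sketch of the idea from that blog post, assuming a list of observed (user, item, rating) triples; the hyperparameters are illustrative defaults, not the values Funk or Netflix used:

    import numpy as np

    def funk_svd(ratings, n_users, n_items, n_factors=20,
                 lr=0.005, reg=0.02, n_epochs=20):
        """Approximate matrix factorization by SGD over observed ratings only."""
        P = np.random.normal(0, 0.1, (n_users, n_factors))  # user factors
        Q = np.random.normal(0, 0.1, (n_items, n_factors))  # item factors
        for _ in range(n_epochs):
            for u, i, r in ratings:
                err = r - P[u] @ Q[i]                    # error on one observed rating
                pu = P[u].copy()
                P[u] += lr * (err * Q[i] - reg * P[u])   # regularized gradient steps
                Q[i] += lr * (err * pu - reg * Q[i])
        return P, Q

    # Predicted rating for user u and item i: P[u] @ Q[i]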
SVD for Rating Prediction
 User factor vectors p_u ∈ R^f and item factor vectors q_v ∈ R^f
 Baseline b_uv = μ + b_u + b_v (user & item deviation from the overall average)
 Predict rating as r'_uv = b_uv + p_u^T q_v
 SVD++ (Koren et al.): asymmetric variation with implicit feedback
   r'_uv = b_uv + q_v^T ( |R(u)|^(−1/2) Σ_{j∈R(u)} (r_uj − b_uj) x_j + |N(u)|^(−1/2) Σ_{j∈N(u)} y_j )
 where q_v, x_v, y_v ∈ R^f are three item factor vectors
 Users are not parametrized, but rather represented by:
 R(u): items rated by user u
 N(u): items for which the user has given implicit preference (e.g. rated vs. not rated)
28
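The following is an illustrative sketch of the prediction step of that asymmetric model, assuming the factor matrices and baselines have already been learned (the SGD training loop is omitted); all variable names are invented for the example:

    import numpy as np

    def predict_asymmetric(v, rated, implicit, mu, bu, bi, Q, X, Y):
        """Score item v for one user.
        rated:    dict {item j: rating r_uj} = R(u)
        implicit: set of items with implicit feedback = N(u)
        Q, X, Y:  item-factor matrices (rows q_v, x_j, y_j); mu, bu, bi: baselines."""
        b_uv = mu + bu + bi[v]
        profile = np.zeros(Q.shape[1])
        if rated:                          # |R(u)|^(-1/2) * sum of (r_uj - b_uj) x_j
            w = len(rated) ** -0.5
            for j, r_uj in rated.items():
                profile += w * (r_uj - (mu + bu + bi[j])) * X[j]
        if implicit:                       # |N(u)|^(-1/2) * sum of y_j
            w = len(implicit) ** -0.5
            for j in implicit:
                profile += w * Y[j]
        return b_uv + Q[v] @ profile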
Artificial Neural Networks – 4 generations
 1st - Perceptrons (~60s)
 Single layer of hand-coded features
 Linear activation function
 Fundamentally limited in what they can learn to do.
 2nd - Back-propagation (~80s)
 Back-propagate error signal to get derivatives for learning
 Non-linear activation function
 3rd - Belief Networks (~90s)
 Directed acyclic graph composed of (visible & hidden) stochastic variables
with weighted connections.
 Infer the states of the unobserved variables & learn interactions between
variables to make network more likely to generate observed data.
29
Restricted Boltzmann Machines
 Restrict the connectivity to make learning easier.
 Only one layer of hidden units.
 Although multiple layers are possible
 No connections between hidden units.
 Hidden units are independent given the visible states.
 So we can quickly get an unbiased sample from
the posterior distribution over hidden “causes”
when given a data-vector
 RBMs can be stacked to form Deep Belief
Nets (DBN) – 4th generation of ANNs
[Diagram: RBM — a layer of hidden units (i) connected to a layer of visible units (j), with no connections within a layer]
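Because the hidden units are conditionally independent given the visible vector, their posterior factorizes and can be sampled in one matrix product. A small illustrative sketch (shapes and names are assumptions, not the Netflix Prize model):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample_hidden(v, W, b_hidden, rng):
        """v: visible vector (n_visible,); W: (n_visible, n_hidden) weights.
        Returns a binary sample of the hidden layer and P(h_j = 1 | v)."""
        p_h = sigmoid(v @ W + b_hidden)            # every hidden unit in parallel
        h = (rng.random(p_h.shape) < p_h).astype(float)
        return h, p_h

    # rng = np.random.default_rng(0)
    # h, p_h = sample_hidden(v, W, b_hidden, rng)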
RBM for the Netflix Prize
31
Ranking: key algorithm, sorts titles in most contexts
Ranking
 Ranking = Scoring + Sorting + Filtering
bags of movies for presentation to a user
 Goal: Find the best possible ordering of a
set of videos for a user within a specific
context in real-time
 Objective: maximize consumption
 Aspirations: Played & “enjoyed” titles have
best score
 Akin to CTR forecast for ads/search results
 Factors
 Accuracy
 Novelty
 Diversity
 Freshness
 Scalability
 …
Ranking
 Popularity is the obvious baseline
 Ratings prediction is a clear secondary data
input that allows for personalization
 We have added many other features (and tried
many more that have not proved useful)
 What about the weights?
 Based on A/B testing
 Machine-learned
Example: Two features, linear model
35
Linear Model: f_rank(u,v) = w1 · p(v) + w2 · r(u,v) + b
[Figure: example titles plotted by popularity and predicted rating (1–5); the two features are combined into a single score that determines the final ranking]
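A toy version of this two-feature ranker, just to show the mechanics; the weights are made up, not the production values:

    def f_rank(popularity, predicted_rating, w1=0.4, w2=0.6, b=0.0):
        """f_rank(u, v) = w1 * p(v) + w2 * r(u, v) + b."""
        return w1 * popularity + w2 * predicted_rating + b

    # (title, popularity p(v), predicted rating r(u,v)) -- higher score is shown first
    candidates = [("Title A", 0.9, 3.2), ("Title B", 0.4, 4.7), ("Title C", 0.7, 4.1)]
    ranked = sorted(candidates, key=lambda t: f_rank(t[1], t[2]), reverse=True)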
Learning to rank
 Machine learning problem: goal is to construct ranking
model from training data
 Training data can have partial order or binary judgments
(relevant/not relevant).
 Resulting order of the items typically induced from a
numerical score
 Learning to rank is a key element for personalization
 You can treat the problem as a standard supervised
classification problem
40
Learning to Rank Approaches
1. Pointwise
 Ranking function minimizes loss function defined on individual
relevance judgment
 Ranking score based on regression or classification
 Ordinal regression, Logistic regression, SVM, GBDT, …
2. Pairwise
 Loss function is defined on pair-wise preferences
 Goal: minimize number of inversions in ranking
 Ranking problem is then transformed into the binary classification
problem
 RankSVM, RankBoost, RankNet, FRank…
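As an illustration of the pairwise idea (a RankNet-style logistic loss, not Netflix's ranker): minimizing inversions becomes minimizing a loss over preference pairs. The data layout is assumed for the example:

    import numpy as np

    def pairwise_logistic_loss(score_preferred, score_other):
        """Penalizes inversions: large when the less-relevant item outscores the preferred one."""
        return np.log1p(np.exp(-(score_preferred - score_other)))

    def total_pairwise_loss(scores, pairs):
        """scores: model score per item; pairs: (preferred_index, other_index) preferences."""
        return sum(pairwise_logistic_loss(scores[i], scores[j]) for i, j in pairs)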
Learning to rank - metrics
 Quality of ranking measured using metrics such as
 Normalized Discounted Cumulative Gain
 Mean Reciprocal Rank (MRR)
 Fraction of Concordant Pairs (FCP)
 Others…
 But, it is hard to optimize machine-learned
models directly on these measures (they are
not differentiable)
 Recent research on models that directly
optimize ranking measures
42
NDCG = DCG / IDCG, where DCG = relevance_1 + Σ_{i=2..n} relevance_i / log2(i)
MRR = (1 / |H|) Σ_{h_i∈H} 1 / rank(h_i)
FCP = Σ_{i<j} CP(x_i, x_j) / (n(n−1)/2)
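Illustrative implementations of these three metrics (one common convention each; the DCG discount and the FCP tie handling vary across papers):

    import numpy as np

    def dcg(relevances):
        rel = np.asarray(relevances, dtype=float)
        return rel[0] + np.sum(rel[1:] / np.log2(np.arange(2, len(rel) + 1)))

    def ndcg(relevances):
        ideal = dcg(sorted(relevances, reverse=True))
        return dcg(relevances) / ideal if ideal > 0 else 0.0

    def mrr(first_hit_ranks):
        """first_hit_ranks: 1-based rank of the first relevant item, one per query."""
        return float(np.mean([1.0 / r for r in first_hit_ranks]))

    def fcp(predicted, actual):
        """Fraction of concordant pairs; ties count as non-concordant here."""
        n = len(actual)
        concordant = sum(1 for i in range(n) for j in range(i + 1, n)
                         if (predicted[i] - predicted[j]) * (actual[i] - actual[j]) > 0)
        return concordant / (n * (n - 1) / 2)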
Learning to Rank Approaches
3. Listwise
a. Indirect Loss Function
 RankCosine: similarity between ranking list and ground truth as loss function
 ListNet: KL-divergence as loss function by defining a probability distribution
 Problem: optimization of listwise loss function may not optimize IR metrics
b. Directly optimizing IR measures (difficult since they are not differentiable)
 Directly optimize IR measures through Genetic Programming
 Directly optimize measures with Simulated Annealing
 Gradient descent on smoothed version of objective function (e.g. CLiMF
presented at Recsys 2012 or TFMAP at SIGIR 2012)
 SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
 AdaRank uses boosting to optimize NDCG
44
Similars
 Different similarities computed
from different sources: metadata,
ratings, viewing data…
 Similarities can be treated as
data/features
 Machine Learned models
improve our concept of “similarity”
Data & Models - Recap
 All sorts of feedback from the user can help generate better
recommendations
 Need to design systems that capture and take advantage of
all this data
 The right model is as important as the right data
 It is important to come up with new theoretical models, but
also need to think about application to a domain, and practical
issues
 Rating prediction models are only part of the solution to
recommendation (think about ranking, similarity…)
45
More data or better models?
46
Really?
Anand Rajaraman: Stanford & Senior VP at
Walmart Global eCommerce (former Kosmix)
Sometimes, it’s not
about more data
47
More data or better models?
[Banko and Brill, 2001]
48
Norvig: “Google does not
have better Algorithms,
only more Data”
Many features/
low-bias models
More data or better models?
49
[Plot: model performance vs. sample size, from the actual Netflix system]
More data or better models?
Sometimes, it’s not
about more data
50
More data or better models?
Data without a sound approach = noise
Consumer (Data) Science
Consumer Science
52
 Main goal is to effectively innovate for customers
 Innovation goals
 “If you want to increase your success rate, double
your failure rate.” – Thomas Watson, Sr., founder of
IBM
 The only real failure is the failure to innovate
 Fail cheaply
 Know why you failed/succeeded
Consumer (Data) Science
53
1. Start with a hypothesis:
 Algorithm/feature/design X will increase member engagement
with our service, and ultimately member retention
2. Design a test
 Develop a solution or prototype
 Think about dependent & independent variables, control,
significance…
3. Execute the test
4. Let data speak for itself
Offline/Online testing process
[Flow: Offline testing (days) → [success] → Online A/B testing (weeks to months) → [success] → Rollout feature to all users; [fail] loops back]
54
Offline testing
55
 Optimize algorithms offline
 Measure model performance, using metrics such as:
 Mean Reciprocal Rank, Normalized Discounted Cumulative Gain, Fraction of
Concordant Pairs, Precision/Recall & F-measures, AUC, RMSE, Diversity…
 Offline performance used as an indication to make informed
decisions on follow-up A/B tests
 A critical (and unsolved) issue is how offline metrics can
correlate with A/B test results.
 Extremely important to define a coherent offline evaluation
framework (e.g. How to create training/testing datasets is not
trivial)
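One simple way to build such train/test sets without leaking the future into the model is a time-based split; this is a generic sketch and the field name is an assumption:

    def temporal_split(interactions, cutoff_ts):
        """Everything up to the cutoff trains the model; only later events are used for
        testing, mimicking how the deployed system only ever sees the past."""
        train = [x for x in interactions if x["timestamp"] <= cutoff_ts]
        test = [x for x in interactions if x["timestamp"] > cutoff_ts]
        return train, test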
Executing A/B tests
56
 Many different metrics, but ultimately trust user
engagement (e.g. hours of play and customer retention)
 Think about significance and hypothesis testing
 Our tests usually have thousands of members and 2-20 cells
 A/B Tests allow you to try radical ideas or test many
approaches at the same time.
 We typically have hundreds of customer A/B tests running
 Decisions on the product always data-driven
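For a metric like retention, significance between a control cell and a test cell can be checked with a standard two-proportion z-test; this is a textbook sketch, not Netflix's testing framework, and the example numbers are invented:

    from math import sqrt
    from statistics import NormalDist

    def two_proportion_ztest(retained_a, n_a, retained_b, n_b):
        """Two-sided z-test on retention rates; a small p-value suggests the lift is not noise."""
        p_a, p_b = retained_a / n_a, retained_b / n_b
        p_pool = (retained_a + retained_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return z, p_value

    # z, p = two_proportion_ztest(retained_a=8400, n_a=10000, retained_b=8550, n_b=10000)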
What to measure
57
 OEC: Overall Evaluation Criteria
 In an AB test framework, the measure of success is key
 Short-term metrics do not always align with long term
goals
 E.g. CTR: generating more clicks might mean that our
recommendations are actually worse
 Use long term metrics such as LTV (Life time value)
whenever possible
 At Netflix, we use member retention
What to measure
 Short-term metrics can sometimes be informative, and
may allow for faster decision-making
 At Netflix we use many such as hours streamed by users or
%hours from a given algorithm
 But, be aware of several caveats of using early decision
mechanisms
Initial effects appear to trend.
See “Trustworthy Online
Controlled Experiments: Five
Puzzling Outcomes
Explained” [Kohavi et al., KDD 2012]
58
Consumer Data Science - Recap
59
 Consumer Data Science aims to innovate for the
customer by running experiments and letting data speak
 This is mainly done through online AB Testing
 However, we can speed up innovation by experimenting
offline
 But, both for online and offline experimentation, it is
important to choose the right metric and experimental
framework
Architectures
60
Technology
61
http://techblog.netflix.com
62
Event & Data
Distribution
63
• UI devices should broadcast many
different kinds of user events
• Clicks
• Presentations
• Browsing events
• …
• Events vs. data
• Some events only need to be
propagated and trigger an action
(low latency, low information per
event)
• Others need to be processed and
“turned into” data (higher latency,
higher information quality).
• And… there are many in between
• Real-time event flow managed
through internal tool (Manhattan)
• Data flow mostly managed through
Hadoop.
Event & Data Distribution
64
Offline Jobs
65
• Two kinds of offline jobs
• Model training
• Batch offline computation of
recommendations/
intermediate results
• Offline queries either in Hive or
Pig
• Need a publishing mechanism
that solves several issues
• Notify readers when result of
query is ready
• Support different repositories
(s3, cassandra…)
• Handle errors, monitoring…
• We do this through Hermes
Offline Jobs
66
Computation
67
• Two ways of computing personalized
results
• Batch/offline
• Online
• Each approach has pros/cons
• Offline
+ Allows more complex computations
+ Can use more data
- Cannot react to quick changes
- May result in staleness
• Online
+ Can respond quickly to events
+ Can use most recent data
- May fail because of SLA
- Cannot deal with “complex”
computations
• It’s not an either/or decision
• Both approaches can be combined
68
Computation
Signals & Models
69
• Both offline and online algorithms are
based on three different inputs:
• Models: previously trained from
existing data
• (Offline) Data: previously
processed and stored information
• Signals: fresh data obtained from
live services
• User-related data
• Context data (session, date,
time…)
70
Signals & Models
Results
71
• Recommendations can be serviced
from:
• Previously computed lists
• Online algorithms
• A combination of both
• The decision on where to service the
recommendation from can respond to
many factors including context.
• Also, important to think about the
fallbacks (what if plan A fails)
• Previously computed lists/intermediate
results can be stored in a variety of
ways
• Cache
• Cassandra
• Relational DB
72
Results
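A sketch of the serving decision with fallbacks, purely to illustrate the "plan A / plan B" point; the service and store interfaces are invented for the example:

    def get_recommendations(user_id, online_ranker, precomputed_store, timeout_s=0.05):
        """Prefer the online algorithm (fresh signals); fall back to the precomputed offline
        list if it fails or misses its SLA, and to an unpersonalized list as a last resort."""
        try:
            return online_ranker.rank(user_id, timeout=timeout_s)  # hypothetical interface
        except Exception:
            pass
        recs = precomputed_store.get(user_id)
        return recs if recs else precomputed_store.get("most_popular", [])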
Alerts and Monitoring
73
 A non-trivial concern in large-scale recommender
systems
 Monitoring: continuously observe quality of system
 Alert: fast notification if quality of system goes below a
certain pre-defined threshold
 Questions:
 What do we need to monitor?
 How do we know something is “bad enough” to alert?
What to monitor
 Staleness
 Monitor time since last data update
Did something go
wrong here?
74
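A minimal staleness check of the kind described above; the 24-hour threshold is an arbitrary illustrative value:

    import time

    def check_staleness(last_update_ts, threshold_hours=24):
        """Return an alert message if the data has not been updated within the threshold."""
        age_hours = (time.time() - last_update_ts) / 3600.0
        if age_hours > threshold_hours:
            return f"ALERT: data is {age_hours:.1f}h old (threshold {threshold_hours}h)"
        return "OK"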
What to monitor
 Algorithmic quality
 Monitor different metrics by comparing what users do and what
your algorithm predicted they would do
75
What to monitor
 Algorithmic quality
 Monitor different metrics by comparing what users do and what
your algorithm predicted they would do
Did something go
wrong here?
76
What to monitor
 Algorithmic source for users
 Monitor how users interact with different algorithms
Algorithm X
New version
Did something go
wrong here?
77
When to alert
78
 Alerting thresholds are hard to tune
 Avoid unnecessary alerts (the “learn-to-ignore problem”)
 Avoid important issues being noticed by people before the alert fires
 Rules of thumb
 Alert on anything that will impact user experience significantly
 Alert on issues that are actionable
 If a noticeable event happens without an alert… add a new alert
for next time
Conclusions
79
The Personalization Problem
80
 The Netflix Prize simplified the recommendation problem
to predicting ratings
 But…
 User ratings are only one of the many data inputs we have
 Rating predictions are only part of our solution
 Other algorithms such as ranking or similarity are very important
 We can reformulate the recommendation problem
 Function to optimize: probability a user chooses something and
enjoys it enough to come back to the service
81
More data +
Better models +
More accurate metrics +
Better approaches & architectures
Lots of room for improvement!
Thanks!
We’re hiring!
Xavier Amatriain (@xamat)
xamatriain@netflix.com