Netflix Recommendations - Beyond the 5 Stars

ACM SF-Bay Area
October 22, 2012

Xavier Amatriain
Personalization Science and Engineering - Netflix
@xamat
Outline
1.  The Netflix Prize & the Recommendation Problem
2.  Anatomy of Netflix Personalization
3.  Data & Models
4.  And…
  a)  Consumer (Data) Science
  b)  Or Software Architectures
What we were interested in:
§  High quality recommendations
Proxy question:
§  Accuracy in predicted rating
§  Improve by 10% = $1 million!
Results:
§  Top 2 algorithms (SVD & RBM) still in production
What about the final prize ensembles?
§  Our offline studies showed they were too computationally intensive to scale
§  Expected improvement not worth the engineering effort
§  Plus… focus had already shifted to other issues that had more impact than rating prediction
Change of focus
2006 → 2012
Anatomy of Netflix Personalization
Everything is a Recommendation

Everything is personalized
§  Ranking
§  Rows
Note: Recommendations are per household, not individual user
Top 10
Personalization awareness
(Household members targeted across the row: All | Dad | Dad&Mom | Daughter | All | All? | Daughter | Son | Mom | Mom)
Diversity
Support for Recommendations
Social Support

Social Recommendations
Watch again & Continue Watching

Genres
Genre rows
§  Personalized genre rows focus on user interest
   §  Also provide context and “evidence”
   §  Important for member satisfaction - moving personalized rows to top on devices increased retention
§  How are they generated?
   §  Implicit: based on user’s recent plays, ratings, & other interactions
   §  Explicit taste preferences
   §  Hybrid: combine the above
   §  Also take into account:
      §  Freshness - has this been shown before?
      §  Diversity - avoid repeating tags and genres, limit number of TV genres, etc.
Genres - personalization

Genres - explanations

Genres - user involvement
Similars
§  Displayed in many different contexts
   §  In response to user actions/context (search, queue add…)
   §  More like… rows
Anatomy of Netflix Personalization - Recap
§  Everything is a recommendation: not only rating prediction, but also ranking, row selection, similarity…
§  We strive to make it easy for the user, but…
§  We want the user to be aware of and involved in the recommendation process
§  Deal with implicit/explicit and hybrid feedback
§  Add support/explanations for recommendations
§  Consider issues such as diversity or freshness
Data & Models

Big Data @Netflix
§  Almost 30M subscribers
§  Ratings: 4M/day
§  Searches: 3M/day
§  Plays: 30M/day
§  2B hours streamed in Q4 2011
§  1B hours in June 2012
Smart Models
§  Logistic/linear regression
§  Elastic nets
§  SVD and other MF models
§  Restricted Boltzmann Machines
§  Markov Chains
§  Different clustering approaches
§  LDA
§  Association Rules
§  Gradient Boosted Decision Trees
§  …
SVD
X[m × n] = U[m × r] S[r × r] (V[n × r])^T

§  X: m × n matrix (e.g., m users, n videos)
§  U: m × r matrix (m users, r concepts)
§  S: r × r diagonal matrix (strength of each ‘concept’; r: rank of the matrix)
§  V: n × r matrix (n videos, r concepts)
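As a quick illustration of this decomposition (not from the talk), here is a minimal numpy sketch: factorize a toy ratings matrix and keep only the top-r “concepts”.

```python
import numpy as np

# Toy ratings matrix X: m users x n videos (dense here for illustration)
X = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Full SVD: X = U S V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the top-r concepts (rank-r truncation)
r = 2
X_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
print(np.round(X_hat, 2))  # low-rank reconstruction of the ratings
```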
Simon Funk’s SVD
§  One of the most interesting findings during the Netflix Prize came out of a blog post
§  Incremental, iterative, and approximate way to compute the SVD using gradient descent
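A minimal sketch of the idea behind Funk’s approach (illustrative hyperparameters, not the actual Netflix Prize code): loop over observed ratings only and take regularized gradient steps on the user and item factors.

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, f=20, lr=0.005, reg=0.02, epochs=30):
    """Approximate the SVD via SGD over observed ratings only.

    ratings: list of (user, item, rating) triples.
    Returns user factors P (n_users x f) and item factors Q (n_items x f).
    """
    rng = np.random.default_rng(0)
    P = rng.normal(0, 0.1, (n_users, f))
    Q = rng.normal(0, 0.1, (n_items, f))
    for _ in range(epochs):
        for u, v, r in ratings:
            pu, qv = P[u].copy(), Q[v].copy()
            err = r - pu @ qv                    # error on this single rating
            P[u] += lr * (err * qv - reg * pu)   # regularized gradient step
            Q[v] += lr * (err * pu - reg * qv)
    return P, Q

# Toy usage
P, Q = funk_svd([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], n_users=2, n_items=2)
print(P[0] @ Q[1])  # predicted rating for user 0, item 1
```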
SVD for Rating Prediction
§  User factor vectors p_u ∈ ℜ^f and item factor vectors q_v ∈ ℜ^f
§  Baseline: b_uv = µ + b_u + b_v (user & item deviation from average)
§  Predict rating as: r̂_uv = b_uv + p_u^T q_v
§  SVD++ (Koren et al.): asymmetric variation with implicit feedback

      r̂_uv = b_uv + q_v^T ( |R(u)|^{-1/2} ∑_{j∈R(u)} (r_uj − b_uj) x_j + |N(u)|^{-1/2} ∑_{j∈N(u)} y_j )

§  Where:
   §  q_v, x_v, y_v ∈ ℜ^f are three item factor vectors
   §  Users are not parametrized, but rather represented by:
      §  R(u): items rated by user u
      §  N(u): items for which the user has given implicit preference (e.g. rated vs. not rated)
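The asymmetric model above can be scored in a few lines of numpy. This is a hypothetical sketch of the reconstructed equation; all names (Q, X, Y, R_u, N_u, ratings_u) are illustrative.

```python
import numpy as np

def predict_asymmetric(u, v, mu, b_user, b_item, Q, X, Y, R_u, N_u, ratings_u):
    """Score r_hat_uv for the asymmetric SVD++ variant above.

    Q, X, Y: (n_items x f) item factor matrices; R_u: items rated by u;
    N_u: items with implicit feedback from u; ratings_u: item -> rating.
    Assumes R_u and N_u are non-empty.
    """
    b_uv = mu + b_user[u] + b_item[v]
    # Explicit part: residuals (r_uj - b_uj) weight the x_j factors
    explicit = sum((ratings_u[j] - (mu + b_user[u] + b_item[j])) * X[j] for j in R_u)
    # Implicit part: sum of y_j factors over implicitly preferred items
    implicit = sum(Y[j] for j in N_u)
    profile = explicit / np.sqrt(len(R_u)) + implicit / np.sqrt(len(N_u))
    return b_uv + Q[v] @ profile
```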
Artificial Neural Networks - 4 generations
§  1st - Perceptrons (~60s)
   §  Single layer of hand-coded features
   §  Linear activation function
   §  Fundamentally limited in what they can learn to do
§  2nd - Back-propagation (~80s)
   §  Back-propagate error signal to get derivatives for learning
   §  Non-linear activation function
§  3rd - Belief Networks (~90s)
   §  Directed acyclic graph composed of (visible & hidden) stochastic variables with weighted connections
   §  Infer the states of the unobserved variables & learn interactions between variables to make the network more likely to generate observed data
Restricted Boltzmann Machines
§  Restrict the connectivity to make learning easier
   §  Only one layer of hidden units
      §  Although multiple layers are possible
   §  No connections between hidden units
§  Hidden units are independent given the visible states
   §  So we can quickly get an unbiased sample from the posterior distribution over hidden “causes” when given a data-vector
§  RBMs can be stacked to form Deep Belief Nets (DBN) - 4th generation of ANNs
(Diagram: hidden layer units j, visible layer units i)
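Because the hidden units are conditionally independent given the visible layer, sampling the posterior is a single vectorized step. A toy sketch (shapes and names are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b_hidden, rng):
    """One vectorized posterior sample: hidden units are conditionally
    independent given the visible vector, so p(h_j = 1 | v) factorizes."""
    p_h = sigmoid(b_hidden + v @ W)   # activation probability of each hidden unit j
    return (rng.random(p_h.shape) < p_h).astype(float), p_h

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, size=(6, 3))          # 6 visible units i, 3 hidden units j
v = np.array([1., 0., 1., 1., 0., 0.])       # a data-vector on the visible layer
h_sample, h_prob = sample_hidden(v, W, np.zeros(3), rng)
```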
RBM for the Netflix Prize
Ranking
Key algorithm, sorts titles in most contexts
Ranking
§  Ranking = Scoring + Sorting + Filtering bags of movies for presentation to a user
§  Goal: Find the best possible ordering of a set of videos for a user within a specific context in real-time
§  Objective: maximize consumption
§  Aspirations: played & “enjoyed” titles have best score
§  Akin to CTR forecast for ads/search results
§  Factors:
   §  Accuracy
   §  Novelty
   §  Diversity
   §  Freshness
   §  Scalability
   §  …
Ranking
§  Popularity is the obvious baseline
§  Ratings prediction is a clear secondary data input that allows for personalization
§  We have added many other features (and tried many more that have not proved useful)
§  What about the weights?
   §  Based on A/B testing
   §  Machine-learned
Example: Two features, linear model
(Scatter plot: Popularity on the x-axis, Predicted Rating on the y-axis; candidate titles 1-5 are ordered into the Final Ranking)

Linear Model:  f_rank(u,v) = w1 · p(v) + w2 · r(u,v) + b
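A sketch of what this two-feature model computes (the weights w1, w2, b and the candidate values are made up for illustration):

```python
def frank(p_v, r_uv, w1=0.6, w2=0.4, b=0.0):
    """Two-feature linear ranking score: w1*p(v) + w2*r(u,v) + b."""
    return w1 * p_v + w2 * r_uv + b

# (video, popularity p(v), predicted rating r(u,v)); values are made up
candidates = [("A", 0.9, 3.1), ("B", 0.4, 4.8), ("C", 0.7, 4.0)]
final_ranking = sorted(candidates, key=lambda c: frank(c[1], c[2]), reverse=True)
print([title for title, _, _ in final_ranking])
```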
Learning to rank
§  Machine learning problem: goal is to construct a ranking model from training data
§  Training data can have partial order or binary judgments (relevant/not relevant)
§  Resulting order of the items typically induced from a numerical score
§  Learning to rank is a key element for personalization
§  You can treat the problem as a standard supervised classification problem
Learning to Rank Approaches
1.  Pointwise
   §  Ranking function minimizes loss function defined on individual relevance judgments
   §  Ranking score based on regression or classification
   §  Ordinal regression, Logistic regression, SVM, GBDT, …
2.  Pairwise
   §  Loss function is defined on pair-wise preferences (see the sketch below)
   §  Goal: minimize number of inversions in ranking
   §  Ranking problem is then transformed into a binary classification problem
   §  RankSVM, RankBoost, RankNet, FRank…
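To make the pairwise idea concrete, a RankNet-style logistic loss on a single (relevant, not relevant) pair might look like this sketch:

```python
import numpy as np

def pairwise_logistic_loss(s_pos, s_neg):
    """RankNet-style loss on one preference pair: low when the relevant
    item outscores the non-relevant one, high on an inversion."""
    return np.log1p(np.exp(-(s_pos - s_neg)))

print(pairwise_logistic_loss(2.0, 0.5))  # correctly ordered pair -> ~0.20
print(pairwise_logistic_loss(0.5, 2.0))  # inverted pair -> ~1.70
```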
Learning to rank - metrics
§  Quality of ranking measured using metrics such as:
   §  Normalized Discounted Cumulative Gain:
      NDCG = DCG / IDCG,  where  DCG = relevance_1 + ∑_{i=2}^{n} relevance_i / log_2(i)
   §  Mean Reciprocal Rank (MRR):
      MRR = (1/|H|) ∑_{h∈H} 1 / rank(h)
   §  Fraction of Concordant Pairs (FCP):
      FCP = ( ∑_{i≠j} CP(x_i, x_j) ) / ( n(n−1)/2 )
   §  Others…
§  But, it is hard to optimize machine-learned models directly on these measures (they are not differentiable)
§  Recent research on models that directly optimize ranking measures
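These metrics are straightforward to compute offline; a small sketch following the formulas above:

```python
import numpy as np

def dcg(relevances):
    """DCG = relevance_1 + sum_{i=2..n} relevance_i / log2(i)."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.ones(len(rel))
    if len(rel) > 1:
        discounts[1:] = np.log2(np.arange(2, len(rel) + 1))
    return float(np.sum(rel / discounts))

def ndcg(relevances):
    """Normalize by the DCG of the ideal (sorted) ordering."""
    return dcg(relevances) / dcg(sorted(relevances, reverse=True))

def mrr(first_hit_ranks):
    """MRR = (1/|H|) * sum over queries of 1/rank of first relevant hit."""
    return float(np.mean([1.0 / r for r in first_hit_ranks]))

print(ndcg([3, 2, 3, 0, 1]))  # quality of one ranked list of relevances
print(mrr([1, 3, 2]))         # averaged over three queries -> ~0.61
```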
Learning to Rank Approaches
3.  Listwise
   a.  Indirect Loss Function
      §  RankCosine: similarity between ranking list and ground truth as loss function
      §  ListNet: KL-divergence as loss function by defining a probability distribution
      §  Problem: optimization of listwise loss function may not optimize IR metrics
   b.  Directly optimizing IR measures (difficult since they are not differentiable)
      §  Directly optimize IR measures through Genetic Programming
      §  Directly optimize measures with Simulated Annealing
      §  Gradient descent on a smoothed version of the objective function (e.g. CLiMF presented at RecSys 2012 or TFMAP at SIGIR 2012)
      §  SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
      §  AdaRank uses boosting to optimize NDCG
Similars
§  Different similarities computed from different sources: metadata, ratings, viewing data…
§  Similarities can be treated as data/features
§  Machine-learned models improve our concept of “similarity”
Data & Models - Recap
§  All sorts of feedback from the user can help generate better recommendations
§  Need to design systems that capture and take advantage of all this data
§  The right model is as important as the right data
§  It is important to come up with new theoretical models, but also need to think about application to a domain, and practical issues
§  Rating prediction models are only part of the solution to recommendation (think about ranking, similarity…)
More data or better models?
Really?
Anand Rajaraman: Stanford & Senior VP at Walmart Global eCommerce (former Kosmix)
More data or better models?
Sometimes, it’s not about more data
More data or better models?
Norvig: “Google does not have better Algorithms, only more Data”
(Plot from [Banko and Brill, 2001]: many features / low-bias models)
More data or better models?
(Plot: model performance vs. sample size for an actual Netflix system, x-axis 0-6M samples)
Sometimes, it’s not about more data
More data or better models?
Data without a sound approach = noise
Consumer (Data) Science

Consumer Science
§  Main goal is to effectively innovate for customers
§  Innovation goals
   §  “If you want to increase your success rate, double your failure rate.” - Thomas Watson, Sr., founder of IBM
   §  The only real failure is the failure to innovate
   §  Fail cheaply
   §  Know why you failed/succeeded
Consumer (Data) Science
1.  Start with a hypothesis:
   §  Algorithm/feature/design X will increase member engagement with our service, and ultimately member retention
2.  Design a test
   §  Develop a solution or prototype
   §  Think about dependent & independent variables, control, significance…
3.  Execute the test
4.  Let data speak for itself
Offline/Online testing process
Offline testing (days) → [success] → Online A/B testing (weeks to months) → [success] → Rollout feature to all users
([fail] loops back to offline testing)
Offline testing
§  Optimize algorithms offline
§  Measure model performance, using metrics such as:
   §  Mean Reciprocal Rank, Normalized Discounted Cumulative Gain, Fraction of Concordant Pairs, Precision/Recall & F-measures, AUC, RMSE, Diversity…
§  Offline performance used as an indication to make informed decisions on follow-up A/B tests
§  A critical (and unsolved) issue is how well offline metrics correlate with A/B test results
§  Extremely important to define a coherent offline evaluation framework (e.g. how to create training/testing datasets is not trivial; see the sketch below)
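For instance, one common way to build such train/test datasets is a temporal split, sketched here with hypothetical column names (timestamps assumed stored as ISO strings):

```python
import pandas as pd

def temporal_split(log: pd.DataFrame, cutoff: str):
    """Train on interactions before the cutoff, test on what follows,
    so no future information leaks into training."""
    train = log[log["timestamp"] < cutoff]
    test = log[log["timestamp"] >= cutoff]
    # Evaluate only on users the model could have learned about
    test = test[test["user"].isin(train["user"].unique())]
    return train, test

# Hypothetical usage, given a log with columns [user, item, rating, timestamp]:
# train, test = temporal_split(log, cutoff="2012-01-01")
```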
Executing A/B tests
§  Many different metrics, but ultimately trust user engagement (e.g. hours of play and customer retention)
§  Think about significance and hypothesis testing (sketched below)
   §  Our tests usually have thousands of members and 2-20 cells
§  A/B tests allow you to try radical ideas or test many approaches at the same time
   §  We typically have hundreds of customer A/B tests running
§  Decisions on the product are always data-driven
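As a concrete example of the significance testing mentioned above, a two-proportion z-test comparing retention between a control and a test cell could look like this sketch (cell sizes and counts are made up):

```python
from math import sqrt
from statistics import NormalDist

def retention_z_test(retained_a, n_a, retained_b, n_b):
    """Two-proportion z-test comparing retention between control (A)
    and test (B) cells; returns z statistic and two-sided p-value."""
    p_a, p_b = retained_a / n_a, retained_b / n_b
    p_pool = (retained_a + retained_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

print(retention_z_test(8300, 10000, 8450, 10000))
```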
What to measure
§  OEC: Overall Evaluation Criteria
§  In an A/B test framework, the measure of success is key
§  Short-term metrics do not always align with long-term goals
   §  E.g. CTR: generating more clicks might mean that our recommendations are actually worse
§  Use long-term metrics such as LTV (lifetime value) whenever possible
   §  At Netflix, we use member retention
What to measure
§  Short-term metrics can sometimes be informative, and may allow for faster decision-making
   §  At Netflix we use many, such as hours streamed by users or % hours from a given algorithm
§  But be aware of several caveats of using early decision mechanisms
   §  Initial effects appear to trend; see “Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained” [Kohavi et al., KDD 2012]
Consumer Data Science - Recap
§  Consumer Data Science aims to innovate for the customer by running experiments and letting data speak
§  This is mainly done through online A/B testing
§  However, we can speed up innovation by experimenting offline
§  But, both for online and offline experimentation, it is important to choose the right metric and experimental framework
Architectures

Technology
http://techblog.netflix.com
Event & Data Distribution
•  UI devices should broadcast many different kinds of user events
   •  Clicks
   •  Presentations
   •  Browsing events
   •  …
•  Events vs. data
   •  Some events only need to be propagated and trigger an action (low latency, low information per event)
   •  Others need to be processed and “turned into” data (higher latency, higher information quality)
   •  And… there are many in between
•  Real-time event flow managed through an internal tool (Manhattan)
•  Data flow mostly managed through Hadoop
Offline Jobs
•  Two kinds of offline jobs
   •  Model training
   •  Batch offline computation of recommendations/intermediate results
•  Offline queries either in Hive or Pig
•  Need a publishing mechanism that solves several issues
   •  Notify readers when the result of a query is ready
   •  Support different repositories (S3, Cassandra…)
   •  Handle errors, monitoring…
   •  We do this through Hermes
Computation
•  Two ways of computing personalized results
   •  Batch/offline
   •  Online
•  Each approach has pros/cons
   •  Offline
      +  Allows more complex computations
      +  Can use more data
      -  Cannot react to quick changes
      -  May result in staleness
   •  Online
      +  Can respond quickly to events
      +  Can use most recent data
      -  May fail because of SLA
      -  Cannot deal with “complex” computations
•  It’s not an either/or decision
   •  Both approaches can be combined
Signals & Models
•  Both offline and online algorithms are based on three different inputs:
   •  Models: previously trained from existing data
   •  (Offline) Data: previously processed and stored information
   •  Signals: fresh data obtained from live services
      •  User-related data
      •  Context data (session, date, time…)
Results
•  Recommendations can be served from:
   •  Previously computed lists
   •  Online algorithms
   •  A combination of both
•  The decision on where to serve the recommendation from can respond to many factors, including context
•  Also important to think about the fallbacks (what if plan A fails? see the sketch below)
•  Previously computed lists/intermediate results can be stored in a variety of ways
   •  Cache
   •  Cassandra
   •  Relational DB
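A sketch of this serving decision with fallbacks (all names are illustrative, not Netflix’s actual code):

```python
def get_recommendations(user_id, context, cache, online_ranker, fallback_list):
    """Serve from a precomputed list when available, fall back to an
    online algorithm, and finally to an unpersonalized list if both
    plan A and plan B fail."""
    precomputed = cache.get(user_id)            # e.g. cache / Cassandra lookup
    if precomputed is not None:
        return precomputed
    try:
        return online_ranker(user_id, context)  # fresh, but bound by an SLA
    except TimeoutError:
        return fallback_list                    # e.g. a popularity ranking
```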
Alerts and Monitoring
§  A non-trivial concern in large-scale recommender systems
§  Monitoring: continuously observe quality of system
§  Alert: fast notification if quality of system goes below a certain pre-defined threshold
§  Questions:
   §  What do we need to monitor?
   §  How do we know something is “bad enough” to alert on?
What to monitor
§  Staleness
   §  Monitor time since last data update
(Chart annotation: “Did something go wrong here?”)
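A minimal staleness check of this kind might look like the following sketch (the threshold value is made up):

```python
import time

STALENESS_THRESHOLD_S = 6 * 3600   # illustrative threshold: 6 hours

def check_staleness(last_update_ts, now=None):
    """Alert if time since the last data update exceeds a pre-defined
    threshold (the monitoring rule described on the slide)."""
    now = now if now is not None else time.time()
    age = now - last_update_ts
    if age > STALENESS_THRESHOLD_S:
        return f"ALERT: data is {age / 3600:.1f}h old"
    return "OK"
```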
What to monitor
§  Algorithmic quality
   §  Monitor different metrics by comparing what users do and what your algorithm predicted they would do
(Chart annotation: “Did something go wrong here?”)
What to monitor
§  Algorithmic source for users
   §  Monitor how users interact with different algorithms
(Chart annotations: Algorithm X, New version - “Did something go wrong here?”)
When to alert
§  Alerting thresholds are hard to tune
   §  Avoid unnecessary alerts (the “learn-to-ignore problem”)
   §  Avoid important issues being noticed before the alert happens
§  Rules of thumb
   §  Alert on anything that will impact user experience significantly
   §  Alert on issues that are actionable
   §  If a noticeable event happens without an alert… add a new alert for next time
Conclusions
The Personalization Problem
§  The Netflix Prize simplified the recommendation problem to predicting ratings
§  But…
   §  User ratings are only one of the many data inputs we have
   §  Rating predictions are only part of our solution
      §  Other algorithms such as ranking or similarity are very important
§  We can reformulate the recommendation problem
   §  Function to optimize: probability a user chooses something and enjoys it enough to come back to the service
More data +
         Better models +
     More accurate metrics +
Better approaches & architectures
  Lots of room for improvement!
Thanks!



       We’re hiring!
Xavier Amatriain (@xamat)
 xamatriain@netflix.com
1 of 82

Recommended

Recent Trends in Personalization: A Netflix Perspective by
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveJustin Basilico
30.3K views64 slides
Recommendation at Netflix Scale by
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix ScaleJustin Basilico
21.6K views42 slides
Past, Present & Future of Recommender Systems: An Industry Perspective by
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
65K views20 slides
Sequential Decision Making in Recommendations by
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
2.1K views32 slides
Personalizing "The Netflix Experience" with Deep Learning by
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
1.1K views41 slides
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... by
 Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
13K views32 slides

More Related Content

What's hot

Déjà Vu: The Importance of Time and Causality in Recommender Systems by
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
11.8K views45 slides
Deep Learning for Recommender Systems by
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsJustin Basilico
21K views35 slides
Learning a Personalized Homepage by
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized HomepageJustin Basilico
6.5K views34 slides
Deep Learning for Recommender Systems by
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsYves Raimond
15.4K views33 slides
Session-Based Recommender Systems by
Session-Based Recommender SystemsSession-Based Recommender Systems
Session-Based Recommender SystemsEötvös Loránd University
2.4K views22 slides
Tutorial on Deep Learning in Recommender System, Lars summer school 2019 by
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Anoop Deoras
2.2K views102 slides

What's hot(20)

Déjà Vu: The Importance of Time and Causality in Recommender Systems by Justin Basilico
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Justin Basilico11.8K views
Deep Learning for Recommender Systems by Justin Basilico
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico21K views
Learning a Personalized Homepage by Justin Basilico
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
Justin Basilico6.5K views
Deep Learning for Recommender Systems by Yves Raimond
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Yves Raimond15.4K views
Tutorial on Deep Learning in Recommender System, Lars summer school 2019 by Anoop Deoras
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Anoop Deoras2.2K views
Recommending for the World by Yves Raimond
Recommending for the WorldRecommending for the World
Recommending for the World
Yves Raimond2.7K views
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys... by Xavier Amatriain
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain16.5K views
Time, Context and Causality in Recommender Systems by Yves Raimond
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
Yves Raimond5.9K views
A Multi-Armed Bandit Framework For Recommendations at Netflix by Jaya Kawale
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
Jaya Kawale11.1K views
Crafting Recommenders: the Shallow and the Deep of it! by Sudeep Das, Ph.D.
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
Sudeep Das, Ph.D.1.8K views
Machine Learning at Netflix Scale by Aish Fenton
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix Scale
Aish Fenton1.9K views
Recent Trends in Personalization at Netflix by Justin Basilico
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Justin Basilico24.2K views
Personalized Page Generation for Browsing Recommendations by Justin Basilico
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico5.3K views
Past, present, and future of Recommender Systems: an industry perspective by Xavier Amatriain
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
Xavier Amatriain11K views
Netflix Recommendations Feature Engineering with Time Travel by Faisal Siddiqi
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
Faisal Siddiqi3.5K views
Lessons Learned from Building Machine Learning Software at Netflix by Justin Basilico
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
Justin Basilico14.5K views
Data council SF 2020 Building a Personalized Messaging System at Netflix by Grace T. Huang
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
Grace T. Huang447 views
Artwork Personalization at Netflix Fernando Amat RecSys2018 by Fernando Amat
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
Fernando Amat3.5K views

Similar to Netflix Recommendations - Beyond the 5 Stars

Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial by
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorialBuilding Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorialXavier Amatriain
20K views82 slides
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix by
MLConf - Emmys, Oscars & Machine Learning Algorithms at NetflixMLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at NetflixXavier Amatriain
4K views47 slides
Xavier amatriain, dir algorithms netflix m lconf 2013 by
Xavier amatriain, dir algorithms netflix m lconf 2013Xavier amatriain, dir algorithms netflix m lconf 2013
Xavier amatriain, dir algorithms netflix m lconf 2013MLconf
3.4K views47 slides
acmsigtalkshare-121023190142-phpapp01.pptx by
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxdongchangim30
4 views82 slides
Facets and Pivoting for Flexible and Usable Linked Data Exploration by
Facets and Pivoting for Flexible and Usable Linked Data ExplorationFacets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data ExplorationRoberto García
1K views24 slides
Reward constrained interactive recommendation with natural language feedback ... by
Reward constrained interactive recommendation with natural language feedback ...Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...Jeong-Gwan Lee
42 views39 slides

Similar to Netflix Recommendations - Beyond the 5 Stars(20)

Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial by Xavier Amatriain
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorialBuilding Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Xavier Amatriain20K views
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix by Xavier Amatriain
MLConf - Emmys, Oscars & Machine Learning Algorithms at NetflixMLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
Xavier Amatriain4K views
Xavier amatriain, dir algorithms netflix m lconf 2013 by MLconf
Xavier amatriain, dir algorithms netflix m lconf 2013Xavier amatriain, dir algorithms netflix m lconf 2013
Xavier amatriain, dir algorithms netflix m lconf 2013
MLconf3.4K views
acmsigtalkshare-121023190142-phpapp01.pptx by dongchangim30
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptx
dongchangim304 views
Facets and Pivoting for Flexible and Usable Linked Data Exploration by Roberto García
Facets and Pivoting for Flexible and Usable Linked Data ExplorationFacets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data Exploration
Roberto García1K views
Reward constrained interactive recommendation with natural language feedback ... by Jeong-Gwan Lee
Reward constrained interactive recommendation with natural language feedback ...Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...
Jeong-Gwan Lee42 views
Understanding content using Deep learning for NLP by Jaya Kawale
Understanding content using Deep learning for NLPUnderstanding content using Deep learning for NLP
Understanding content using Deep learning for NLP
Jaya Kawale269 views
Diversity versus accuracy: solving the apparent dilemma facing recommender sy... by Aliaksandr Birukou
Diversity versus accuracy: solving the apparent dilemma facing recommender sy...Diversity versus accuracy: solving the apparent dilemma facing recommender sy...
Diversity versus accuracy: solving the apparent dilemma facing recommender sy...
Aliaksandr Birukou454 views
Efficient Filtering in Pub-Sub Systems using BDD by Nabeel Yoosuf
Efficient Filtering in Pub-Sub Systems using BDDEfficient Filtering in Pub-Sub Systems using BDD
Efficient Filtering in Pub-Sub Systems using BDD
Nabeel Yoosuf2.1K views
[SOCRS2013]Differential Context Modeling in Collaborative Filtering by YONG ZHENG
[SOCRS2013]Differential Context Modeling in Collaborative Filtering[SOCRS2013]Differential Context Modeling in Collaborative Filtering
[SOCRS2013]Differential Context Modeling in Collaborative Filtering
YONG ZHENG1.4K views
2011-02-03 LA RubyConf Rails3 TDD Workshop by Wolfram Arnold
2011-02-03 LA RubyConf Rails3 TDD Workshop2011-02-03 LA RubyConf Rails3 TDD Workshop
2011-02-03 LA RubyConf Rails3 TDD Workshop
Wolfram Arnold708 views
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale by Xavier Amatriain
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Xavier Amatriain15.1K views
Recommender system algorithm and architecture by Liang Xiang
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
Liang Xiang40.7K views
项亮 推荐系统实践 从入门到精通 by topgeek
项亮 推荐系统实践 从入门到精通 项亮 推荐系统实践 从入门到精通
项亮 推荐系统实践 从入门到精通
topgeek1.6K views
Neo4j MeetUp - Graph Exploration with MetaExp by Adrian Ziegler
Neo4j MeetUp - Graph Exploration with MetaExpNeo4j MeetUp - Graph Exploration with MetaExp
Neo4j MeetUp - Graph Exploration with MetaExp
Adrian Ziegler197 views
Computer Vision descriptors by Wael Badawy
Computer Vision descriptorsComputer Vision descriptors
Computer Vision descriptors
Wael Badawy60 views

More from Xavier Amatriain

Data/AI driven product development: from video streaming to telehealth by
Data/AI driven product development: from video streaming to telehealthData/AI driven product development: from video streaming to telehealth
Data/AI driven product development: from video streaming to telehealthXavier Amatriain
434 views50 slides
AI-driven product innovation: from Recommender Systems to COVID-19 by
AI-driven product innovation: from Recommender Systems to COVID-19AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19Xavier Amatriain
864 views77 slides
AI for COVID-19 - Q42020 update by
AI for COVID-19 - Q42020 updateAI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 updateXavier Amatriain
1.3K views29 slides
AI for COVID-19: An online virtual care approach by
AI for COVID-19: An online virtual care approachAI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approachXavier Amatriain
1.8K views15 slides
Lessons learned from building practical deep learning systems by
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsXavier Amatriain
71.8K views61 slides
AI for healthcare: Scaling Access and Quality of Care for Everyone by
AI for healthcare: Scaling Access and Quality of Care for EveryoneAI for healthcare: Scaling Access and Quality of Care for Everyone
AI for healthcare: Scaling Access and Quality of Care for EveryoneXavier Amatriain
2.5K views29 slides

More from Xavier Amatriain(20)

Data/AI driven product development: from video streaming to telehealth by Xavier Amatriain
Data/AI driven product development: from video streaming to telehealthData/AI driven product development: from video streaming to telehealth
Data/AI driven product development: from video streaming to telehealth
Xavier Amatriain434 views
AI-driven product innovation: from Recommender Systems to COVID-19 by Xavier Amatriain
AI-driven product innovation: from Recommender Systems to COVID-19AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19
Xavier Amatriain864 views
AI for COVID-19: An online virtual care approach by Xavier Amatriain
AI for COVID-19: An online virtual care approachAI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approach
Xavier Amatriain1.8K views
Lessons learned from building practical deep learning systems by Xavier Amatriain
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
Xavier Amatriain71.8K views
AI for healthcare: Scaling Access and Quality of Care for Everyone by Xavier Amatriain
AI for healthcare: Scaling Access and Quality of Care for EveryoneAI for healthcare: Scaling Access and Quality of Care for Everyone
AI for healthcare: Scaling Access and Quality of Care for Everyone
Xavier Amatriain2.5K views
Towards online universal quality healthcare through AI by Xavier Amatriain
Towards online universal quality healthcare through AITowards online universal quality healthcare through AI
Towards online universal quality healthcare through AI
Xavier Amatriain1.7K views
From one to zero: Going smaller as a growth strategy by Xavier Amatriain
From one to zero: Going smaller as a growth strategyFrom one to zero: Going smaller as a growth strategy
From one to zero: Going smaller as a growth strategy
Xavier Amatriain1.8K views
Medical advice as a Recommender System by Xavier Amatriain
Medical advice as a Recommender SystemMedical advice as a Recommender System
Medical advice as a Recommender System
Xavier Amatriain5.6K views
Past present and future of Recommender Systems: an Industry Perspective by Xavier Amatriain
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry Perspective
Xavier Amatriain7.2K views
Staying Shallow & Lean in a Deep Learning World by Xavier Amatriain
Staying Shallow & Lean in a Deep Learning WorldStaying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning World
Xavier Amatriain7.6K views
Machine Learning for Q&A Sites: The Quora Example by Xavier Amatriain
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
Xavier Amatriain6.2K views
BIG2016- Lessons Learned from building real-life user-focused Big Data systems by Xavier Amatriain
BIG2016- Lessons Learned from building real-life user-focused Big Data systemsBIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
Xavier Amatriain3.4K views
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems by Xavier Amatriain
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain5.9K views
Barcelona ML Meetup - Lessons Learned by Xavier Amatriain
Barcelona ML Meetup - Lessons LearnedBarcelona ML Meetup - Lessons Learned
Barcelona ML Meetup - Lessons Learned
Xavier Amatriain3.2K views
10 more lessons learned from building Machine Learning systems - MLConf by Xavier Amatriain
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
Xavier Amatriain375.2K views
10 more lessons learned from building Machine Learning systems by Xavier Amatriain
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
Xavier Amatriain180.1K views

Recently uploaded

Microsoft Power Platform.pptx by
Microsoft Power Platform.pptxMicrosoft Power Platform.pptx
Microsoft Power Platform.pptxUni Systems S.M.S.A.
52 views38 slides
Democratising digital commerce in India-Report by
Democratising digital commerce in India-ReportDemocratising digital commerce in India-Report
Democratising digital commerce in India-ReportKapil Khandelwal (KK)
15 views161 slides
Attacking IoT Devices from a Web Perspective - Linux Day by
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day Simone Onofri
15 views68 slides
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdfDr. Jimmy Schwarzkopf
16 views29 slides
PRODUCT PRESENTATION.pptx by
PRODUCT PRESENTATION.pptxPRODUCT PRESENTATION.pptx
PRODUCT PRESENTATION.pptxangelicacueva6
13 views1 slide
Melek BEN MAHMOUD.pdf by
Melek BEN MAHMOUD.pdfMelek BEN MAHMOUD.pdf
Melek BEN MAHMOUD.pdfMelekBenMahmoud
14 views1 slide

Recently uploaded(20)

Attacking IoT Devices from a Web Perspective - Linux Day by Simone Onofri
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day
Simone Onofri15 views
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by Dr. Jimmy Schwarzkopf
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf
Transcript: The Details of Description Techniques tips and tangents on altern... by BookNet Canada
Transcript: The Details of Description Techniques tips and tangents on altern...Transcript: The Details of Description Techniques tips and tangents on altern...
Transcript: The Details of Description Techniques tips and tangents on altern...
BookNet Canada135 views
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院 by IttrainingIttraining
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab19 views
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ... by Jasper Oosterveld
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker33 views
SAP Automation Using Bar Code and FIORI.pdf by Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
Voice Logger - Telephony Integration Solution at Aegis by Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma31 views
STPI OctaNE CoE Brochure.pdf by madhurjyapb
STPI OctaNE CoE Brochure.pdfSTPI OctaNE CoE Brochure.pdf
STPI OctaNE CoE Brochure.pdf
madhurjyapb13 views
Empathic Computing: Delivering the Potential of the Metaverse by Mark Billinghurst
Empathic Computing: Delivering  the Potential of the MetaverseEmpathic Computing: Delivering  the Potential of the Metaverse
Empathic Computing: Delivering the Potential of the Metaverse
Mark Billinghurst476 views

Netflix Recommendations - Beyond the 5 Stars

  • 1. Ne#lix  Recommenda/ons   Beyond  the  5  Stars         ACM  SF-­‐Bay  Area   October  22,  2012     Xavier  Amatriain   Personaliza?on  Science  and  Engineering  -­‐  NeDlix   @xamat  
  • 2. Outline 1.  The Netflix Prize & the Recommendation Problem 2.  Anatomy of Netflix Personalization 3.  Data & Models 4.  And… a)  Consumer (Data) Science b)  Or Software Architectures
  • 3. 3
  • 4. SVD What we were interested in: §  High quality recommendations Proxy question: Results §  Accuracy in predicted rating •  Top 2 algorithms still in production §  Improve by 10% = $1million! RBM
  • 5. What about the final prize ensembles? §  Our offline studies showed they were too computationally intensive to scale §  Expected improvement not worth the engineering effort §  Plus…. Focus had already shifted to other issues that had more impact than rating prediction. 5
  • 6. Change of focus 2006 2012 6
  • 7. Anatomy of Netflix Personalization Everything is a Recommendation
  • 8. Everything is personalized Ranking Note: Recommendations Rows are per household, not individual user 8
  • 9. Top 10 Personalization awareness All Dad Dad&Mom Daughter All All? Daughter Son Mom Mom Diversity 9
  • 10. Support for Recommendations Social Support 10
  • 12. Watch again & Continue Watching 12
  • 14. Genre rows §  Personalized genre rows focus on user interest §  Also provide context and “evidence” §  Important for member satisfaction – moving personalized rows to top on devices increased retention §  How are they generated? §  Implicit: based on user’s recent plays, ratings, & other interactions §  Explicit taste preferences §  Hybrid:combine the above §  Also take into account: §  Freshness - has this been shown before? §  Diversity– avoid repeating tags and genres, limit number of TV genres, etc.
  • 19. Genres – user involvement 19
  • 20. Genres – user involvement 20
  • 21. Similars §  Displayed in many different contexts §  In response to user actions/ context (search, queue add…) §  More like… rows
  • 22. Anatomy of a Personalization - Recap §  Everything is a recommendation: not only rating prediction, but also ranking, row selection, similarity… §  We strive to make it easy for the user, but… §  We want the user to be aware and be involved in the recommendation process §  Deal with implicit/explicit and hybrid feedback §  Add support/explanations for recommendations §  Consider issues such as diversity or freshness 22
  • 24. Big Data @Netflix §  Almost 30M subscribers §  Ratings: 4M/day §  Searches: 3M/day §  Plays: 30M/day §  2B hours streamed in Q4 2011 §  1B hours in June 2012 24
  • 25. Smart Models §  Logistic/linear regression §  Elastic nets §  SVD and other MF models §  Restricted Boltzmann Machines §  Markov Chains §  Different clustering approaches §  LDA §  Association Rules §  Gradient Boosted Decision Trees §  … 25
  • 26. SVD X[n x m] = U[n x r] S [ r x r] (V[m x r])T §  X: m x n matrix (e.g., m users, n videos) §  U: m x r matrix (m users, r concepts) §  S: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix) §  V: r x n matrix (n videos, r concepts)
  • 27. Simon Funk’s SVD §  One of the most interesting findings during the Netflix Prize came out of a blog post §  Incremental, iterative, and approximate way to compute the SVD using gradient descent 27
  • 28. SVD for Rating Prediction f §  User factor vectors pu ∈ ℜ f and item-factors vector qv ∈ ℜ §  Baseline buv = µ + bu + bv (user & item deviation from average) ' T §  Predict rating as ruv = buv + pu qv §  SVD++ (Koren et. Al) asymmetric variation w. implicit feedback $ − 1 − 1 ' ' T & R(u) 2 r = buv + q & uv v ∑ (ruj − buj )x j + N(u) 2 ∑ yj ) ) % ( §  Where j∈R(u) j∈N (u) §  qv , xv , yv ∈ ℜ f are three item factor vectors §  Users are not parametrized, but rather represented by: §  R(u): items rated by user u §  N(u): items for which the user has given implicit preference (e.g. rated vs. not rated) 28
  • 29. Artificial Neural Networks – 4 generations §  1st - Perceptrons (~60s) §  Single layer of hand-coded features §  Linear activation function §  Fundamentally limited in what they can learn to do. §  2nd - Back-propagation (~80s) §  Back-propagate error signal to get derivatives for learning §  Non-linear activation function §  3rd - Belief Networks (~90s) §  Directed acyclic graph composed of (visible & hidden) stochastic variables with weighted connections. §  Infer the states of the unobserved variables & learn interactions between variables to make network more likely to generate observed data. 29
  • 30. Restricted Boltzmann Machines §  Restrict the connectivity to make learning easier. §  Only one layer of hidden units. §  Although multiple layers are possible hidden §  No connections between hidden units. j §  Hidden units are independent given the visible states.. §  So we can quickly get an unbiased sample from the posterior distribution over hidden “causes” i when given a data-vector visible §  RBMs can be stacked to form Deep Belief Nets (DBN) – 4th generation of ANNs
  • 31. RBM for the Netflix Prize 31
  • 32. Ranking Key algorithm, sorts titles in most contexts
  • 33. Ranking §  Ranking = Scoring + Sorting + Filtering §  Factors bags of movies for presentation to a user §  Accuracy §  Goal: Find the best possible ordering of a §  Novelty set of videos for a user within a specific §  Diversity context in real-time §  Freshness §  Objective: maximize consumption §  Scalability §  Aspirations: Played & “enjoyed” titles have §  … best score §  Akin to CTR forecast for ads/search results
  • 34. Ranking §  Popularity is the obvious baseline §  Ratings prediction is a clear secondary data input that allows for personalization §  We have added many other features (and tried many more that have not proved useful) §  What about the weights? §  Based on A/B testing §  Machine-learned
  • 35. Example: Two features, linear model 1   Predicted Rating 2   Final  Ranking   3   4   Linear  Model:   frank(u,v)  =  w1  p(v)  +  w2  r(u,v)  +  b   5   Popularity 35
  • 40. Learning to rank §  Machine learning problem: goal is to construct ranking model from training data §  Training data can have partial order or binary judgments (relevant/not relevant). §  Resulting order of the items typically induced from a numerical score §  Learning to rank is a key element for personalization §  You can treat the problem as a standard supervised classification problem 40
  • 41. Learning to Rank Approaches 1.  Pointwise §  Ranking function minimizes loss function defined on individual relevance judgment §  Ranking score based on regression or classification §  Ordinal regression, Logistic regression, SVM, GBDT, … 2.  Pairwise §  Loss function is defined on pair-wise preferences §  Goal: minimize number of inversions in ranking §  Ranking problem is then transformed into the binary classification problem §  RankSVM, RankBoost, RankNet, FRank…
• 42. Learning to rank - metrics
§ Quality of ranking measured using metrics such as:
§ Normalized Discounted Cumulative Gain:
$$DCG = relevance_1 + \sum_{i=2}^{n} \frac{relevance_i}{\log_2 i} \qquad NDCG = \frac{DCG}{IDCG}$$
§ Mean Reciprocal Rank (MRR):
$$MRR = \frac{1}{|H|} \sum_{h \in H} \frac{1}{rank(h)}$$
§ Fraction of Concordant Pairs (FCP):
$$FCP = \frac{\sum_{i \neq j} CP(x_i, x_j)}{n(n-1)/2}$$
§ Others…
§ But it is hard to optimize machine-learned models directly on these measures (they are not differentiable)
§ Recent research on models that directly optimize ranking measures
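These metrics are straightforward to compute offline. A minimal sketch of NDCG and MRR following the definitions above (the example relevance values are made up):

```python
import numpy as np

def dcg(relevance):
    """DCG as defined above: rel_1 + sum_{i>=2} rel_i / log2(i)."""
    rel = np.asarray(relevance, dtype=float)
    pos = np.arange(1, len(rel) + 1)
    discount = np.where(pos == 1, 1.0, np.log2(np.maximum(pos, 2)))
    return float(np.sum(rel / discount))

def ndcg(relevance):
    """Normalize by the DCG of the ideal (best possible) ordering."""
    return dcg(relevance) / dcg(sorted(relevance, reverse=True))

def mrr(first_relevant_ranks):
    """Mean of 1/rank over each query's first relevant hit."""
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

print(ndcg([3, 2, 3, 0, 1]))  # ~0.94: close to the ideal ordering
print(mrr([1, 3, 2]))         # ~0.61
```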
• 43. Learning to Rank Approaches
3. Listwise
a. Indirect loss function
§ RankCosine: similarity between the ranking list and the ground truth as loss function
§ ListNet: KL-divergence as loss function, by defining a probability distribution over rankings
§ Problem: optimizing a listwise loss function may not optimize IR metrics
b. Directly optimizing IR measures (difficult since they are not differentiable)
§ Directly optimize IR measures through Genetic Programming
§ Directly optimize measures with Simulated Annealing
§ Gradient descent on a smoothed version of the objective function (e.g. CLiMF presented at RecSys 2012 or TFMAP at SIGIR 2012)
§ SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
§ AdaRank uses boosting to optimize NDCG
• 44. Similars
§ Different similarities computed from different sources: metadata, ratings, viewing data…
§ Similarities can be treated as data/features
§ Machine-learned models improve our concept of "similarity"
• 45. Data & Models - Recap
§ All sorts of feedback from the user can help generate better recommendations
§ Need to design systems that capture and take advantage of all this data
§ The right model is as important as the right data
§ It is important to come up with new theoretical models, but also to think about application to a domain and practical issues
§ Rating prediction models are only part of the solution to recommendation (think about ranking, similarity…)
• 46. More data or better models?
Really?
Anand Rajaraman: Stanford & Senior VP at Walmart Global eCommerce (former Kosmix)
• 47. More data or better models?
Sometimes, it's not about more data
• 48. More data or better models?
[Figure: learning curves from Banko and Brill, 2001 - many features / low-bias models keep improving as training data grows]
Norvig: "Google does not have better Algorithms, only more Data"
• 49. More data or better models?
Sometimes, it's not about more data
[Figure: model performance vs. sample size in an actual Netflix system; x-axis 0 to 6M training examples, y-axis 0 to 0.09]
• 50. More data or better models?
Data without a sound approach = noise
• 52. Consumer Science
§ Main goal is to effectively innovate for customers
§ Innovation goals:
§ "If you want to increase your success rate, double your failure rate." – Thomas Watson, Sr., founder of IBM
§ The only real failure is the failure to innovate
§ Fail cheaply
§ Know why you failed/succeeded
• 53. Consumer (Data) Science
1. Start with a hypothesis:
§ Algorithm/feature/design X will increase member engagement with our service, and ultimately member retention
2. Design a test:
§ Develop a solution or prototype
§ Think about dependent & independent variables, control, significance…
3. Execute the test
4. Let the data speak for itself
• 54. Offline/Online testing process
Offline testing (days) -> [success] -> Online A/B testing (weeks to months) -> [success] -> Rollout feature to all users
([fail] at either stage sends the idea back for iteration)
• 55. Offline testing
§ Optimize algorithms offline
§ Measure model performance using metrics such as:
§ Mean Reciprocal Rank, Normalized Discounted Cumulative Gain, Fraction of Concordant Pairs, Precision/Recall & F-measures, AUC, RMSE, Diversity…
§ Offline performance is used as an indication to make informed decisions on follow-up A/B tests
§ A critical (and unsolved) issue is how well offline metrics correlate with A/B test results
§ Extremely important to define a coherent offline evaluation framework (e.g. how to create training/testing datasets is not trivial; see the sketch below)
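One reason dataset construction is not trivial: a random split leaks future behavior into training. A minimal sketch of a time-based split instead; the column names are hypothetical:

```python
import pandas as pd

def temporal_split(events: pd.DataFrame, cutoff):
    """Split an interaction log at a time cutoff, so the model never
    trains on interactions that happen after the ones it is evaluated
    on. Assumes (hypothetical) columns: user_id, item_id, timestamp."""
    train = events[events["timestamp"] < cutoff]
    test = events[events["timestamp"] >= cutoff]
    # Evaluate only on users the model has actually seen in training
    return train, test[test["user_id"].isin(train["user_id"].unique())]
```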
• 56. Executing A/B tests
§ Many different metrics, but ultimately we trust user engagement (e.g. hours of play and customer retention)
§ Think about significance and hypothesis testing (a minimal sketch follows below)
§ Our tests usually have thousands of members and 2-20 cells
§ A/B tests allow you to try radical ideas or test many approaches at the same time
§ We typically have hundreds of customer A/B tests running
§ Decisions on the product are always data-driven
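For a two-cell test on a rate metric (e.g. retention), a standard significance check is the two-proportion z-test; this is generic textbook statistics, not Netflix's actual experimentation framework, and the example counts are made up:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare conversion/retention rates of control (a) vs. cell (b)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

# Hypothetical: 900/10000 retained in control vs. 980/10000 in the cell
z, p = two_proportion_z_test(900, 10_000, 980, 10_000)  # p ~ 0.05
```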
• 57. What to measure
§ OEC: Overall Evaluation Criteria
§ In an A/B test framework, the measure of success is key
§ Short-term metrics do not always align with long-term goals
§ E.g. CTR: generating more clicks might mean that our recommendations are actually worse
§ Use long-term metrics such as LTV (lifetime value) whenever possible
§ At Netflix, we use member retention
• 58. What to measure
§ Short-term metrics can sometimes be informative, and may allow for faster decision-making
§ At Netflix we use many, such as hours streamed by users or % of hours from a given algorithm
§ But be aware of several caveats of using early decision mechanisms: initial effects can appear to trend. See "Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained" [Kohavi et al., KDD 2012]
• 59. Consumer Data Science - Recap
§ Consumer Data Science aims to innovate for the customer by running experiments and letting the data speak
§ This is mainly done through online A/B testing
§ However, we can speed up innovation by experimenting offline
§ But, both for online and offline experimentation, it is important to choose the right metric and experimental framework
• 61. Technology
http://techblog.netflix.com
• 64. Event & Data Distribution
• UI devices should broadcast many different kinds of user events:
• Clicks
• Presentations
• Browsing events
• …
• Events vs. data:
• Some events only need to be propagated and trigger an action (low latency, low information per event)
• Others need to be processed and "turned into" data (higher latency, higher information quality)
• And… there are many in between
• Real-time event flow is managed through an internal tool (Manhattan)
• Data flow is mostly managed through Hadoop
• 66. Offline Jobs
• Two kinds of offline jobs:
• Model training
• Batch offline computation of recommendations/intermediate results
• Offline queries run either in Hive or Pig
• Need a publishing mechanism that solves several issues:
• Notify readers when the result of a query is ready
• Support different repositories (S3, Cassandra…)
• Handle errors, monitoring…
• We do this through Hermes
• 68. Computation
• Two ways of computing personalized results:
• Batch/offline
• Online
• Each approach has pros/cons:
• Offline
+ Allows more complex computations
+ Can use more data
- Cannot react to quick changes
- May result in staleness
• Online
+ Can respond quickly to events
+ Can use the most recent data
- May fail because of SLA
- Cannot deal with "complex" computations
• It's not an either/or decision: both approaches can be combined
• 70. Signals & Models
• Both offline and online algorithms are based on three different inputs:
• Models: previously trained from existing data
• (Offline) Data: previously processed and stored information
• Signals: fresh data obtained from live services
• User-related data
• Context data (session, date, time…)
• 71. Results
• 72. Results
• Recommendations can be serviced from:
• Previously computed lists
• Online algorithms
• A combination of both
• The decision on where to service the recommendation from can respond to many factors, including context
• Also important to think about the fallbacks (what if plan A fails?) - see the sketch below
• Previously computed lists/intermediate results can be stored in a variety of ways:
• Cache
• Cassandra
• Relational DB
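A sketch of the fallback idea; every component name here is hypothetical, standing in for whatever online ranker, store, and default list a system actually has:

```python
def serve_recommendations(user_id, context, online_ranker, store, default_list):
    """Plan A: online algorithm using fresh signals.
    Plan B: a previously computed list from a store/cache.
    Plan C: an unpersonalized default (e.g. popularity)."""
    try:
        # The online path may miss its SLA or fail outright
        return online_ranker.rank(user_id, context)
    except Exception:
        pass
    precomputed = store.get(user_id)
    if precomputed is not None:
        return precomputed
    return default_list
```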
• 73. Alerts and Monitoring
§ A non-trivial concern in large-scale recommender systems
§ Monitoring: continuously observe the quality of the system
§ Alert: fast notification if the quality of the system goes below a certain pre-defined threshold
§ Questions:
§ What do we need to monitor?
§ How do we know something is "bad enough" to alert on?
• 74. What to monitor
§ Staleness
§ Monitor the time since the last data update (a minimal check is sketched below)
[Figure: data-update timeline annotated "Did something go wrong here?"]
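A minimal staleness check; the dataset names and refresh budgets are made-up placeholders:

```python
import time

# Hypothetical refresh budgets per dataset, in seconds
MAX_AGE = {"batch_recs": 24 * 3600, "popularity": 6 * 3600}

def stale_datasets(last_update):
    """Return the datasets whose last update exceeds their budget.
    `last_update` maps dataset name -> unix timestamp of last refresh."""
    now = time.time()
    return [name for name, budget in MAX_AGE.items()
            if now - last_update.get(name, 0.0) > budget]

# Popularity data last refreshed 8 hours ago -> should trigger an alert
print(stale_datasets({"batch_recs": time.time() - 3600,
                      "popularity": time.time() - 8 * 3600}))
```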
• 75. What to monitor
§ Algorithmic quality
§ Monitor different metrics by comparing what users do with what your algorithm predicted they would do
• 76. What to monitor
§ Algorithmic quality (continued)
[Figure: the same quality metric over time, annotated "Did something go wrong here?"]
• 77. What to monitor
§ Algorithmic source for users
§ Monitor how users interact with different algorithms
[Figure: user-interaction share for Algorithm X and a new version over time, annotated "Did something go wrong here?"]
• 78. When to alert
§ Alerting thresholds are hard to tune:
§ Avoid unnecessary alerts (the "learn-to-ignore problem")
§ Avoid important issues being noticed by people before the alert fires
§ Rules of thumb:
§ Alert on anything that will impact the user experience significantly
§ Alert on issues that are actionable
§ If a noticeable event happens without an alert… add a new alert for next time
• 80. The Personalization Problem
§ The Netflix Prize simplified the recommendation problem to predicting ratings
§ But…
§ User ratings are only one of the many data inputs we have
§ Rating predictions are only part of our solution
§ Other algorithms such as ranking or similarity are very important
§ We can reformulate the recommendation problem
§ Function to optimize: the probability that a user chooses something and enjoys it enough to come back to the service
• 81. More data
+ Better models
+ More accurate metrics
+ Better approaches & architectures
Lots of room for improvement!
  • 82. Thanks! We’re hiring! Xavier Amatriain (@xamat) xamatriain@netflix.com