Recommender System Introduction
xiangliang@hulu.com

What is a good recommender system?
Outline
• What is a recommender system?
  – Mission
  – History
  – Problems
• What is a good recommender system?
  – Experiment Methods
  – Evaluation Metrics
Information Overload
How to solve information overload?
• Catalogs
  – Yahoo, DMOZ
• Search Engines
  – Google, Bing
Mission
• Help users find items of interest to them.
• Help item providers deliver their items to the right users.
• Help websites improve user engagement.
Recommender System
Search Engine vs. Recommender System
• Users will try a search engine if
  – they have specific needs
  – they can describe those needs with keywords
• Users will try a recommender system if
  – they do not yet know what they want
  – they cannot describe their needs with keywords
History: Before 1992
• Content Filtering
  – An architecture for large scale information systems [1985] (Gifford, D. K.)
  – MAFIA: An active mail-filter agent for an intelligent document processing support [1990] (Lutz, E.)
  – A rule-based message filtering system [1988] (Pollock, S.)
History: 1992-1998
• Tapestry by Xerox PARC [1992]
  – First system designed around collaborative filtering
• GroupLens [1994]
  – First recommender system using rating data
• MovieLens [1997]
  – First movie recommender system
  – Provides a well-known dataset for researchers
History: 1992-1998
• Fab: content-based, collaborative recommendation [1997]
  – First unified (content-based + collaborative) recommender system
• Empirical Analysis of Predictive Algorithms for Collaborative Filtering [1998] (John S. Breese)
  – Systematically evaluates user-based collaborative filtering
History: 1999-2005
• Amazon proposed item-based collaborative filtering (patent filed in 1998, issued in 2001) [link]
• Thomas Hofmann proposed pLSA [1999] and applied a similar method to collaborative filtering [2004]
• Pandora began the Music Genome Project [2000]
History: 1999-2005
• Last.fm uses Audioscrobbler to build user taste profiles for music.
• Evaluating collaborative filtering recommender systems [2004] (Jonathan L. Herlocker)
History: 2005-2009
• Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions [2005] (Alexander Tuzhilin)
• Netflix Prize [link]
  – Latent Factor Models (SVD, RSVD, NSVD, SVD++)
  – Temporal Dynamic Collaborative Filtering
  – Yehuda Koren's [link] team won the prize
History: 2005-2009
• ACM Conference on Recommender Systems (RecSys) [2007] (Minneapolis, Minnesota, USA)
• Digg and YouTube tried recommender systems.
History: 2010-now
•   Context-Aware Recommender Systems
•   Music Recommendation and Discovery
•   Recommender Systems and the Social Web
•   Information Heterogeneity and Fusion in
    Recommender Systems
•   Human Decision Making in Recommender Systems
•   Personalization in Mobile Applications
•   Novelty and Diversity in Recommender Systems
•   User-Centric Evaluation
History: 2010-now
• Facebook launches instant personalization
  [2010]
  – Clicker
  – Bing
  – Trip Advisor
  – Rotten Tomatoes
  – Pandora
  – ……
Problems
• Main Problems
  – Top-N Recommendation
  – Rating Prediction
Problems
• Top-N Recommendation
  – Input: (user, item) interaction records

        user    item
        A       a
        B       a
        B       b
        …       …

  – Output: a ranked list of N recommended items per user
    (see the sketch below)
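A minimal sketch of this input/output contract, using a naive most-popular baseline as a stand-in for a real algorithm; the function name and toy data are illustrative assumptions, not part of the original slides.

    # Illustrative only: Top-N recommendation from (user, item) interaction records.
    from collections import Counter, defaultdict

    interactions = [("A", "a"), ("B", "a"), ("B", "b")]   # input: (user, item) pairs

    def top_n_most_popular(interactions, n=2):
        """Return up to n ranked items per user, skipping items the user already has."""
        popularity = Counter(item for _, item in interactions)
        seen = defaultdict(set)
        for user, item in interactions:
            seen[user].add(item)
        ranked = [item for item, _ in popularity.most_common()]
        return {u: [i for i in ranked if i not in seen[u]][:n] for u in seen}

    print(top_n_most_popular(interactions))   # {'A': ['b'], 'B': []}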
Problems
• Rating Prediction
  – Input: (user, item, rating) records

        user    item    rating
        A       a
        B       a
        B       b
        …       …       …

  – Output: predicted ratings for (user, item) pairs not yet rated
    (see the sketch below)
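A minimal sketch of the rating prediction contract, using a trivial user-mean predictor just to make the data shapes concrete; names and toy ratings are illustrative assumptions.

    # Illustrative only: rating prediction from (user, item, rating) records,
    # predicting unseen pairs with the user's mean rating.
    from collections import defaultdict

    ratings = [("A", "a", 4.0), ("B", "a", 3.0), ("B", "b", 5.0)]   # observed ratings

    def predict_user_mean(ratings, pairs):
        """Predict a rating for each (user, item) pair as that user's mean rating."""
        sums, counts = defaultdict(float), defaultdict(int)
        for user, _, r in ratings:
            sums[user] += r
            counts[user] += 1
        global_mean = sum(r for _, _, r in ratings) / len(ratings)
        return {(u, i): (sums[u] / counts[u] if counts[u] else global_mean)
                for u, i in pairs}

    print(predict_user_mean(ratings, [("A", "b"), ("C", "a")]))   # {('A', 'b'): 4.0, ('C', 'a'): 4.0}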
What is a good recommender system?
Experiment Methods
• Offline Experiment
• User Survey
• Online Experiment
  – A/B Testing
Experiment Methods
• Offline Experiment

  [Figure: the dataset is split into a training set and a test set]

  – Advantage:
    • Relies only on an existing dataset
  – Disadvantage:
    • Offline metrics cannot fully reflect business goals
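A minimal sketch of the offline experiment setup above: randomly hold out part of the interaction data as a test set and evaluate against it. The 80/20 ratio and function names are illustrative assumptions.

    # Illustrative only: split interaction records into train/test for offline evaluation.
    import random

    def split_train_test(interactions, test_ratio=0.2, seed=42):
        """Randomly split interaction records into a train set and a test set."""
        rng = random.Random(seed)
        shuffled = interactions[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_ratio))
        return shuffled[:cut], shuffled[cut:]

    # Train a model on `train`, generate recommendations, then score them
    # against the held-out `test` interactions with an offline metric.
    data = [("A", "a"), ("B", "a"), ("B", "b"), ("C", "b"), ("C", "c")]
    train, test = split_train_test(data)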
Experiment Methods
• User Survey
  – Advantage:
    • Can get subjective metrics
    • Lower risk than online testing
  – Disadvantage:
    • Higher cost than offline experiments
    • Some results may not reach statistical significance
    • Users may behave differently in a testing environment than in the real environment
    • It is difficult to design double-blind experiments.
Experiment Methods
• Online Experiments (A/B Testing)
  – Advantage:
     • Can get metrics related to business goals
  – Disadvantage:
     • High risk/cost
     • Needs a large user set to get statistically significant results
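A minimal sketch of A/B bucketing for the online experiment above: hash each user id so the same user consistently sees either the control or the new algorithm. The experiment name and 50/50 split are illustrative assumptions.

    # Illustrative only: deterministic assignment of users to A/B buckets.
    import hashlib

    def ab_bucket(user_id, experiment="rec_algo_v2", treatment_share=0.5):
        """Assign a user to 'treatment' or 'control' stably across sessions."""
        digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
        fraction = int(digest[:8], 16) / 0xFFFFFFFF
        return "treatment" if fraction < treatment_share else "control"

    print(ab_bucket("user_123"))   # the same user always gets the same bucket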
Experiment Metrics
•   User Satisfaction
•   Prediction Accuracy
•   Coverage
•   Diversity
•   Novelty
•   Serendipity
•   Trust
•   Robustness
•   Real-time
Experiment Metrics
• User Satisfaction
  – Subjective metric
  – Measured by user survey or online experiments
Experiment Metrics
• Prediction Accuracy
  – Measured by offline experiments
  – Top-N Recommendation
     • Precision / Recall


  – Rating Prediction
     • MAE, RMSE
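A minimal sketch of these accuracy metrics: precision/recall for a Top-N list scored against held-out relevant items, and MAE/RMSE for rating prediction. Argument names and toy numbers are illustrative.

    # Illustrative only: precision/recall at N, and MAE/RMSE.
    import math

    def precision_recall(recommended, relevant):
        """recommended: the Top-N list; relevant: set of held-out items the user liked."""
        hits = len(set(recommended) & set(relevant))
        precision = hits / len(recommended) if recommended else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    def mae_rmse(predicted, actual):
        """predicted, actual: parallel lists of ratings."""
        errors = [p - a for p, a in zip(predicted, actual)]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        return mae, rmse

    print(precision_recall(["a", "b", "c"], {"b", "d"}))   # (0.33..., 0.5)
    print(mae_rmse([4.0, 3.5], [5.0, 3.0]))                # (0.75, 0.79...)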
Experiment Metrics
• Coverage
  – Measures the ability of the recommender system to recommend long-tail items.

  Coverage = | ∪_{u ∈ U} R(u, N) | / |I|
  where R(u, N) is the Top-N recommendation list for user u, U is the user set,
  and I is the full item set.

  – Entropy, Gini Index
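A minimal sketch of the coverage formula above: the share of the full catalog I that appears in at least one user's Top-N list R(u, N). Names and toy data are illustrative.

    # Illustrative only: catalog coverage of a set of Top-N recommendation lists.
    def coverage(recommendations, all_items):
        """recommendations: dict user -> list of recommended items; all_items: catalog."""
        recommended = set()
        for items in recommendations.values():
            recommended.update(items)
        return len(recommended) / len(all_items)

    recs = {"A": ["a", "b"], "B": ["a", "c"]}
    print(coverage(recs, {"a", "b", "c", "d", "e"}))   # 3 / 5 = 0.6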
Experiment Metrics
• Diversity
  – Measures the ability of the recommender system to cover a user's different interests.
  – Different similarity metrics yield different diversity metrics (see the sketch below).
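One common way to turn this into a number (an assumption, not stated on the slide) is intra-list diversity: 1 minus the average pairwise similarity of the recommended items, with the similarity function left pluggable.

    # Illustrative only: intra-list diversity with a pluggable item similarity.
    from itertools import combinations

    def intra_list_diversity(items, similarity):
        """items: a user's recommendation list; similarity: function(i, j) -> [0, 1]."""
        pairs = list(combinations(items, 2))
        if not pairs:
            return 0.0
        avg_sim = sum(similarity(i, j) for i, j in pairs) / len(pairs)
        return 1.0 - avg_sim

    genres = {"a": "comedy", "b": "comedy", "c": "drama"}        # toy item metadata
    same_genre = lambda i, j: 1.0 if genres[i] == genres[j] else 0.0
    print(intra_list_diversity(["a", "b", "c"], same_genre))     # 1 - 1/3 = 0.66...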
Experiment Metrics
• Diversity (Example)
  – [Figure: a user's watch history shown alongside the related items recommended from it]
Experiment Metrics
• Novelty
  – Measures the ability of the recommender system to introduce long-tail items to users.
  – International Workshop on Novelty and
    Diversity in Recommender Systems [link]
  – Music Recommendation and Discovery in the
    Long Tail [Oscar Celma]
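One common novelty measure in the long-tail spirit above (an assumption, not stated on the slide) scores a recommendation list by the mean self-information of its items: rarely consumed items contribute more than popular ones.

    # Illustrative only: novelty as mean self-information, -log2(item popularity).
    import math

    def novelty(recommended, item_popularity, num_users):
        """item_popularity: item -> number of users who have interacted with it."""
        scores = [-math.log2(item_popularity[i] / num_users) for i in recommended]
        return sum(scores) / len(scores) if scores else 0.0

    popularity = {"a": 900, "b": 50, "c": 5}                  # toy interaction counts
    print(novelty(["b", "c"], popularity, num_users=1000))    # higher than for ["a"]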
Experiment Metrics
• Serendipity
  – A recommendation is serendipitous if:
     • it is not related to the user's historical interests
     • it is novel to the user
     • the user finds it interesting after viewing it
Experiment Metrics
• Trust
  – If users trust the recommender system, they will interact with it.
  – Ways to improve trust:
     • Transparency
     • Social
     • Trust systems (Epinions)
Experiment Metrics
• Robustness
  – The ability of the recommender system to withstand attacks.
  – Neil Hurley. Tutorial on Robustness of Recommender Systems. ACM RecSys 2011.
Experiment Metrics
• Real-time
  – Generate new recommendations immediately when a user exhibits new behavior.
Too many metrics!
Which is most important?
How to make trade-offs
• Business goals
• Our beliefs
• Evaluate new algorithms through a three-step experiment process:
  – Offline testing
  – User survey
  – Online testing
Thanks!
