Temporal Diversity in Recommender Systems
  Neal Lathia¹, Stephen Hailes¹, Licia Capra¹, Xavier Amatriain²
       ¹ Dept. Computer Science, University College London
       ² Telefonica Research, Barcelona

                    ACM SIGIR 2010, Geneva

                       n.lathia@cs.ucl.ac.uk
                  @neal_lathia, @xamat




             EU i-Tour Project
recommender systems

●   many examples over different web domains
●   a lot of research: accuracy
●   multiple dimensions of usage that equate to user
    satisfaction
evaluating collaborative filtering over time

●   design a methodology to evaluate recommender systems
    that are iteratively updated; explore temporal dimension
    of filtering algorithms¹




    ¹ N. Lathia, S. Hailes, L. Capra. Temporal Collaborative Filtering with
    Adaptive Neighbourhoods. ACM SIGIR 2009, Boston, USA.
temporal diversity

●   ...is not concerned with the diversity of a single set of
    recommendations (e.g., are you recommended all six Star
    Wars movies at once?)
●   ...is concerned with the sequence of recommendations
    that users see (are you recommended the same items
    every week?)
contributions

●   is temporal recommendation diversity important?
●   how to measure temporal diversity and novelty?
●   how much temporal diversity do state-of-the-art CF
    algorithms provide?
●   how to improve temporal diversity?
is diversity important?
data perspective: growth & activity
demographics (in paper): ~104 respondents
procedure

●   claim: recommender system for “popular movies”
●   rate week 1's recommendations
     ●     movie titles, links to IMDB, DVD Covers
●   (click through buffer screen)
●   rate week 2's recommendations
●   (click through buffer screen)
●   ....
overview of the surveys
Survey 3: Random Movies

[figure: the recommendation lists shown to participants in weeks W1–W5]
Survey 2: Popular Movies, Change Each Week

[figure: the recommendation lists shown to participants in weeks W1–W5]
Survey 1: Popular Movies – No Change

[figure: the same recommendation list shown to participants in weeks W1–W5]
Closing Questions

[figure: closing survey questions and response distributions]
●   free-text reactions: surprise, unrest, rude comments; compliments, “spot on”
●   74% important / very important, 23% neutral
●   86% important / very important
●   95% important / very important
how did this affect the way people rated?

[figure: mean rating per week for each survey]
●   S3 Random: always bad
●   S2 Popular: quite good
●   S1: starts off quite good, ends badly

...ANOVA details in paper...
is diversity important? (yes)
how to measure temporal diversity?
measuring temporal diversity

[figure: two consecutive top-10 recommendation lists; 3 of the 10
items in the second list did not appear in the first]

diversity = 3/10
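
A minimal sketch of this set-difference metric in Python: diversity compares this week's top-N list with last week's, and novelty (also raised in the contributions) compares it with everything recommended to the user so far; the list contents below are illustrative.

```python
def temporal_diversity(current, previous, n):
    """Fraction of the top-n current list absent from the previous list."""
    return len(set(current[:n]) - set(previous[:n])) / n

def temporal_novelty(current, history, n):
    """Fraction of the top-n current list never recommended before."""
    seen = set()
    for past_list in history:
        seen.update(past_list[:n])
    return len(set(current[:n]) - seen) / n

# the example above: 3 of the 10 items change between weeks
week1 = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
week2 = ["a", "b", "c", "d", "e", "f", "g", "x", "y", "z"]
print(temporal_diversity(week2, week1, 10))  # 0.3, i.e. 3/10
```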
how much temporal diversity do state-of-the-art
CF algorithms provide?
3 algorithms – 3 influential factors


●   baseline – popularity ranking
●   item-based kNN
●   singular value decomposition


●   profile size vs. diversity
●   ratings added vs. diversity
●   time between sessions vs. diversity
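
As a hedged sketch of how one of these factors could be measured over iterative updates, assuming a generic recommend(user, t) function and time-ordered sessions (the names here are illustrative, not from the paper):

```python
from collections import defaultdict

def diversity_by_profile_size(sessions, recommend, n=10):
    """Average week-to-week diversity, binned by user profile size.

    sessions: time-ordered (user, t, profile_size) tuples.
    recommend(user, t): returns a ranked list of item ids (assumed).
    """
    last = {}                 # user -> previously delivered top-n list
    bins = defaultdict(list)  # profile size -> observed diversity values
    for user, t, size in sessions:
        top_n = recommend(user, t)[:n]
        if user in last:
            bins[size].append(len(set(top_n) - set(last[user])) / n)
        last[user] = top_n
    return {size: sum(vals) / len(vals) for size, vals in bins.items()}
```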
profile size vs. diversity

[figure: diversity vs. profile size for the baseline, kNN, and SVD]
main results


●   as profile size increases, diversity decreases
●   the more ratings added in the current session, the more
    diversity will be experienced in the next session
●   more time between sessions leads to more diversity
consequences


●   want to avoid profiles that grow too large
●   (conflict #1) want to encourage users to rate as much as
    possible
●   (conflict #2) want users to visit often, but diversity
    increases if they don't


●   how does this relate back to traditional evaluation metrics?
accuracy vs. diversity

[figure: accuracy (x-axis, more accurate →) vs. diversity (y-axis,
more diverse ↑); kNN, SVD, and the baseline plotted, with kNN the
most diverse]
how to improve temporal diversity?
3 methods


●   temporal switching
●   temporal user-based switching
●   re-ranking frequent visitors' lists
temporal switching

[figure: switching between algorithms week by week]
●   “jump” between algorithms each week
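
A minimal sketch of both switching flavours: plain weekly rotation, plus a per-user rule. The specific user-based rule below (frequent visitors get the more diverse kNN) is an illustrative assumption, not necessarily the paper's exact policy.

```python
from datetime import timedelta

def temporal_switch(week_index, algorithms):
    """Temporal switching: rotate through the algorithms each week."""
    return algorithms[week_index % len(algorithms)]

def user_based_switch(last_visit, now, knn, svd, window=timedelta(days=7)):
    """Temporal user-based switching (illustrative rule): users who
    visited recently have already seen the last list, so serve them
    the more diverse algorithm; infrequent visitors get the more
    accurate one."""
    return knn if now - last_visit <= window else svd
```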
re-ranking visitors' lists

●   (like we did in survey 2; Amazon did this in 1998!)
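
A minimal sketch of re-ranking a frequent visitor's list, under the assumption that already-shown items are simply demoted below not-yet-shown ones (the paper's exact re-ranking scheme may differ):

```python
def rerank_for_frequent_visitor(ranked_items, already_shown, n=10):
    """Move not-yet-shown items ahead of previously shown ones,
    preserving the original score order within each group."""
    fresh = [i for i in ranked_items if i not in already_shown]
    stale = [i for i in ranked_items if i in already_shown]
    return (fresh + stale)[:n]

# items h and i were in last week's list, so they drop out of the top 5
ranked = ["a", "h", "b", "i", "c", "d", "e"]
print(rerank_for_frequent_visitor(ranked, {"h", "i"}, n=5))
# ['a', 'b', 'c', 'd', 'e']
```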
contributions/summary

●   temporal diversity is important
●   defined (simple, extendable) metric to measure temporal
    recommendation diversity
●   analysed factors that influence diversity; most accurate
    algorithm is not the most diverse
●   hybrid-switching/re-ranking can improve diversity
Temporal Diversity in Recommender Systems
  Neal Lathia¹, Stephen Hailes¹, Licia Capra¹, Xavier Amatriain²
       ¹ Dept. Computer Science, University College London
       ² Telefonica Research, Barcelona

                    ACM SIGIR 2010, Geneva

                      n.lathia@cs.ucl.ac.uk
                     @neal_lathia, @xamat

            Supported by: EU FP7 i-Tour, Grant 234239
