ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems

Slides of the paper presentation at RecSys 2011.

Abstract: The Recommender Systems community is paying increasing attention to novelty and diversity as key qualities beyond accuracy in real recommendation scenarios. Despite the rise in interest and work on the topic in recent years, we find that a clear common methodological and conceptual ground for the evaluation of these dimensions has yet to be consolidated. Different evaluation metrics have been reported in the literature, but the precise relation, distinction or equivalence between them has not been explicitly studied. Furthermore, the metrics reported so far miss important properties, such as taking into consideration the ranking of recommended items, or whether items are relevant or not, when assessing the novelty and diversity of recommendations.
We present a formal framework for the definition of novelty and diversity metrics that unifies and generalizes several state of the art metrics. We identify three essential ground concepts at the roots of novelty and diversity: choice, discovery and relevance, upon which the framework is built. Item rank and relevance are introduced through a probabilistic recommendation browsing model, building upon the same three basic concepts. Based on the combination of ground elements, and the assumptions of the browsing model, different metrics and variants unfold. We report experimental observations which validate and illustrate the properties of the proposed metrics.

ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems Presentation Transcript

  • 1. 5th ACM International Conference on Recommender Systems – RecSys 2011. Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems. Saúl Vargas and Pablo Castells, Universidad Autónoma de Madrid, IR Group @ UAM, http://ir.ii.uam.es. Chicago, IL, 23-27 October 2011.
  • 2. Beyond accuracy: novelty and diversity. You bought (or browsed) Revolver, Rubber Soul, With The Beatles, Let It Be, Help!, Beatles for Sale, A Hard Day's Night, Sgt. Pepper's Lonely Hearts Club Band, Yellow Submarine, Magical Mystery Tour, The White Album, Abbey Road, Please Please Me... so you are recommended 1962-1966 (Red), 1967-1970 (Blue), Past Masters, Past Masters Vol 2: more Beatles albums. The recommended items are:
    – Very similar to each other
    – Very similar to what the user has already seen
    – Very widely known
    (The slide also shows Dark Side of the Moon, Some Girls and Bob Dylan as contrasting, more novel choices.)
  • 3. Novelty and diversity in Recommender Systems.
    Algorithms to enhance novelty and diversity: greedy optimization of objective functions (accuracy + diversity), promotion of long-tail items, etc. (Ziegler 2005, Zhang 2008, Celma 2008).
    Metrics and methodologies to measure and evaluate novelty and diversity (a minimal code sketch of the first two follows below):
    – Inverse popularity / mean self-information (Zhou 2010), which rewards recommending in the long tail (novelty): MSI(R) = -\frac{1}{|R|} \sum_{i \in R} \log_2 p(i)
    – Intra-list diversity / average pairwise distance (Ziegler 2005, Zhang 2008) (diversity): ILD(R) = \frac{2}{|R|(|R|-1)} \sum_{i_k, i_l \in R,\, k < l} d(i_k, i_l)
    – Other: temporal diversity (Lathia 2010), diversity relative to other users and to other systems (Bellogín 2010), aggregate diversity (Adomavicius 2011), unexpectedness (Adamopoulos 2011), etc.
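A minimal Python sketch (not the authors' code) of these two baseline metrics, assuming `popularity[i]` gives p(i) and `dist(i, j)` returns a normalized item distance; all names are illustrative.

```python
import math
from itertools import combinations

def msi(recommendation, popularity):
    """Mean self-information: average of -log2 p(i) over the recommended items."""
    return sum(-math.log2(popularity[i]) for i in recommendation) / len(recommendation)

def ild(recommendation, dist):
    """Intra-list diversity: average pairwise distance (assumes at least two items)."""
    pairs = list(combinations(recommendation, 2))
    return sum(dist(i, j) for i, j in pairs) / len(pairs)
```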
  • 4. Some limitations. Metrics are insensitive to the order of recommended items: two rankings R1 and R2 over the same items (one placing the diverse items at the top, the other at the bottom) receive the same measured value. Same item sets ⟹ same measured diversity/novelty.
  • 5. Some limitations. Accuracy and diversity/novelty are measured independently: on accuracy alone, method A is better than method B, but once diversity is also considered, which one is better?
  • 6. Our research goals:
    1. Further formalize recommendation novelty and diversity metrics based on a few basic fundamental principles.
    2. Build a unified metric framework where as many state of the art novelty and diversity metrics as possible are related and generalized, and new metrics can be defined.
    3. Enhance the novelty and diversity metrics with rank sensitivity and relevance awareness.
  • 7. Basic fundamental principles to build metrics upon.
    Our approach: define and formalize novelty and diversity metrics based on models of how users interact with items.
    Three basic fundamental principles in user-item interaction:
    – Discovery: an item is seen by a user.
    – Relevance: an item would be liked by (or useful for, etc.) a user.
    – Choice: an item is actually accepted (bought, consumed, etc.) by a user.
    Formalized as binary random variables seen, rel, choose, taking values in {true, false}.
    Simplifying assumptions: seen and rel are mutually independent, and if a user sees an item that is relevant for her, she chooses it, hence p(choose) = p(seen) · p(rel).
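A short derivation of that factorization, spelling out the slide's assumptions (reading choice as occurring exactly when a seen item is relevant):

```latex
P(\mathrm{choose}) = P(\mathrm{seen} \wedge \mathrm{rel})
                   = P(\mathrm{seen}) \cdot P(\mathrm{rel})
                   \quad \text{(independence of } \mathrm{seen} \text{ and } \mathrm{rel}\text{)}
```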
  • 8. Proposed metric framework. The metric is the expected effective novelty of items when a user interacts with a ranked list R of recommended items in a context θ:
    m(R|θ) = C \sum_{i \in R} p(choose|i, u, R) · nov(i|θ)
    Novelty is relative: item novelty nov(i|θ) is defined with respect to (what we know about) what someone has seen sometime somewhere.
    – Someone: the target user, a set of users, all users…
    – Sometime: a specific past time period, an ongoing session, "ever"…
    – Somewhere: past recommendations, the current recommendation R, recommendations by other systems, "anywhere"…
    – "What we know about that": θ is the context of observation, i.e. the available observations.
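A minimal sketch of this general form, where `p_choose` and `novelty` stand for the choice and item novelty models instantiated on the following slides; names and the default C = 1 are illustrative assumptions, not the authors' implementation.

```python
def expected_effective_novelty(recommendation, p_choose, novelty, C=1.0):
    # m(R|theta) = C * sum over ranked items of p(choose|i,u,R) * nov(i|theta)
    return C * sum(p_choose(item, rank) * novelty(item)
                   for rank, item in enumerate(recommendation, start=1))
```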
  • 9. Metric framework components: m(R|θ) = C \sum_{i \in R} p(choose|i, u, R) · nov(i|θ)
    – Item novelty model nov(i|θ)
    – Choice model p(choose|i, u, R)
  • 10. Item novelty models nov(i|θ) (sketched in code below):
    Discovery-based (negative popularity):
    – Popularity complement (forced discovery): nov(i|θ) = 1 - p(seen|i, θ)
    – Self-information / surprisal (free discovery): nov(i|θ) = -\log_2 p(i|seen, θ)
    Distance-based (θ here represents a set of items):
    – Expected item distance: nov(i|θ) = \sum_{j \in θ} p(j|choose, i, θ) · d(i, j)
    – Minimum item distance: nov(i|θ) = \min_{j \in θ} d(i, j)
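A minimal Python sketch of the four item novelty models on this slide; how the probabilities are estimated is discussed on the "Implementation" slide, and all names are illustrative.

```python
import math

def popularity_complement(p_seen_i):
    # Forced discovery: nov(i|theta) = 1 - p(seen|i, theta)
    return 1.0 - p_seen_i

def self_information(p_i_given_seen):
    # Free discovery: nov(i|theta) = -log2 p(i|seen, theta)
    return -math.log2(p_i_given_seen)

def expected_item_distance(i, context_items, p_choose_given_i, dist):
    # nov(i|theta) = sum over j of p(j|choose, i, theta) * d(i, j), theta a set of items
    return sum(p_choose_given_i[j] * dist(i, j) for j in context_items)

def minimum_item_distance(i, context_items, dist):
    # nov(i|theta) = min over j of d(i, j)
    return min(dist(i, j) for j in context_items)
```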
  • 11. Metric framework components (recap): m(R|θ) = C \sum_{i \in R} p(choose|i, u, R) · nov(i|θ). The item novelty model nov(i|θ) has just been covered; next comes the choice model p(choose|i, u, R).
  • 12. Choice model p(choose|i, u, R). From p(choose) = p(seen) · p(rel):
    p(choose|i, u, R) = p(seen|i, u, R) · p(rel|i, u)
    where p(seen|i, u, R) is the browsing model and p(rel|i, u) is the relevance model, independent from R.
  • 13. Browsing model. For a ranked list R, p(seen|i_k, u, R) should decrease with the rank position k. It can be formalized as different probabilistic discount functions (see e.g. Carterette 2011). In general, p(seen|i_k, u, R) = disc(k), for example:
    – disc(k) = p^{k-1}: exponential, as in RBP (Moffat 2008)
    – disc(k) = 1 / \log_2(k + 1): as in nDCG
    – disc(k) = 1/k: Zipfian, as in MRR, MAP, etc.
    – disc(k) = 1: no discount
    – … many others …
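A minimal sketch of these discount functions, where disc(k) models the probability that the item at rank k is seen; the patience parameter default p = 0.85 matches the value used later in the experiments, but it is just a parameter.

```python
import math

def disc_exponential(k, p=0.85):
    # Exponential discount, as in RBP (Moffat 2008): disc(k) = p^(k-1)
    return p ** (k - 1)

def disc_logarithmic(k):
    # Logarithmic discount, as in nDCG: disc(k) = 1 / log2(k + 1)
    return 1.0 / math.log2(k + 1)

def disc_zipfian(k):
    # Zipfian discount, as in MRR/MAP-style rank weighting: disc(k) = 1 / k
    return 1.0 / k

def disc_none(k):
    # No discount: every rank position is equally likely to be browsed
    return 1.0
```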
  • 14. Wrapping up: resulting metric scheme.
    m(R|θ) = C \sum_{i_k \in R} disc(k) · p(rel|i_k, u) · nov(i_k|θ)
    combining a rank discount, an item relevance term and an item novelty term. Normalization by the expected number of browsed items (the expected browsing depth) turns the metric into a novelty ratio:
    C = 1 / \sum_{i_k \in R} disc(k)
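A minimal sketch of this final scheme, with `disc`, `p_rel` and `novelty` as the pluggable components from the previous slides (illustrative names, not the authors' code).

```python
def metric(recommendation, disc, p_rel, novelty):
    ranked = list(enumerate(recommendation, start=1))
    # Normalize by the expected browsing depth: C = 1 / sum over k of disc(k)
    C = 1.0 / sum(disc(k) for k, _ in ranked)
    return C * sum(disc(k) * p_rel(item) * novelty(item) for k, item in ranked)
```

With disc(k) = 1, p_rel ≡ 1 and the self-information novelty model, this expression reduces to MSI, as noted on slide 17.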
  • 15. Implementation. Ground model estimates: θ is the observed interaction between all users and items in the system.
    – Discovery distributions can be estimated from rating data or access records: forced discovery p(seen|i, θ) from IUF (the ratio of users who have interacted with i); free discovery p(i|seen, θ) from ICF (the ratio of interactions involving i).
    – The relevance distribution p(rel|i, u) is estimated by a mapping from ratings to relevance (see the definition of ERR in Chapelle 2009).
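A minimal sketch of these estimates, following the parenthetical definitions on the slide; `interactions` is assumed to be a list of (user, item) pairs, `n_users` the total number of users, and `max_rating` a hypothetical maximum rating value.

```python
from collections import Counter

def discovery_estimates(interactions, n_users):
    users_per_item = {}
    for user, item in interactions:
        users_per_item.setdefault(item, set()).add(user)
    item_counts = Counter(item for _, item in interactions)
    total = len(interactions)
    # Forced discovery: ratio of users who have interacted with i
    p_seen = {i: len(us) / n_users for i, us in users_per_item.items()}
    # Free discovery: ratio of interactions involving i
    p_item_given_seen = {i: c / total for i, c in item_counts.items()}
    return p_seen, p_item_given_seen

def p_rel_from_rating(rating, max_rating=5):
    # ERR-style rating-to-relevance mapping (Chapelle 2009): (2^g - 1) / 2^g_max
    return (2 ** rating - 1) / (2 ** max_rating)
```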
  • 16. Novelty and diversity metrics. Putting it all together: some metric framework instantiations.
  • 17. Putting it all together: metric framework instantiations. Discovery-based metrics, with θ being the observed interaction between all users and items in the system:
    – Expected popularity complement (novelty): EPC(R) = C \sum_{i_k \in R} disc(k) · p(rel|i_k, u) · (1 - p(seen|i_k, θ))
    – Expected free discovery (novelty): EFD(R) = -C \sum_{i_k \in R} disc(k) · p(rel|i_k, u) · \log_2 p(i_k|seen, θ)
    Without rank and relevance, EFD reduces to MSI(R) = -\frac{1}{|R|} \sum_{i \in R} \log_2 p(i|seen, θ).
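A minimal sketch of EPC and EFD as instantiated above; `disc`, `p_rel`, `p_seen` and `p_item_given_seen` are the components sketched on the earlier slides (illustrative names).

```python
import math

def epc(recommendation, disc, p_rel, p_seen):
    ranked = list(enumerate(recommendation, start=1))
    C = 1.0 / sum(disc(k) for k, _ in ranked)
    # Expected popularity complement: novelty term is 1 - p(seen|i)
    return C * sum(disc(k) * p_rel(i) * (1.0 - p_seen[i]) for k, i in ranked)

def efd(recommendation, disc, p_rel, p_item_given_seen):
    ranked = list(enumerate(recommendation, start=1))
    C = 1.0 / sum(disc(k) for k, _ in ranked)
    # Expected free discovery: novelty term is -log2 p(i|seen)
    return C * sum(disc(k) * p_rel(i) * -math.log2(p_item_given_seen[i]) for k, i in ranked)
```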
  • 18. Putting it all together: metric framework instantiations. Distance-based metrics:
    – Expected profile distance (unexpectedness, user-specific), with θ_u being the observed interaction of the target user only:
      EPD(R) = C_u \sum_{i_k \in R} \sum_{j \in θ_u} disc(k) · p(rel|i_k, u) · p(rel|j, u) · d(i_k, j)
    – Expected intra-list diversity (diversity), with θ being the recommended items the target user can see in R:
      EILD(R) = \sum_{i_k \in R} C_k \sum_{i_l \in R,\, l \neq k} disc(k) · disc(l|k) · p(rel|i_k, u) · p(rel|i_l, u) · d(i_k, i_l)
    Without rank and relevance, EILD reduces to ILD(R) = \frac{2}{|R|(|R|-1)} \sum_{i_k, i_l \in R,\, k < l} d(i_k, i_l).
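A minimal sketch of EPD and EILD as reconstructed above. The normalizers C_u and C_k and the conditional discount disc(l|k) are left as caller-supplied pieces; their exact form is an assumption of this sketch, not the authors' reference implementation.

```python
def epd(recommendation, user_profile, disc, p_rel, dist, C_u):
    # Expected profile distance: distance of each recommended item to the user's profile
    return C_u * sum(disc(k) * p_rel(i) * p_rel(j) * dist(i, j)
                     for k, i in enumerate(recommendation, start=1)
                     for j in user_profile)

def eild(recommendation, disc, disc_cond, p_rel, dist, C_k):
    # disc_cond(l, k) plays the role of disc(l|k): the chance of reaching position l
    # given that the item at position k has been browsed; C_k(k) is the per-item normalizer.
    ranked = list(enumerate(recommendation, start=1))
    return sum(C_k(k) * disc(k) * disc_cond(l, k) * p_rel(i) * p_rel(j) * dist(i, j)
               for k, i in ranked for l, j in ranked if l != k)
```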
  • 19. Novelty and diversity metrics. Some experiments.
  • 20. Experiments  Datasets  Recommender algorithms – MovieLens 1M – CB Content-based (ML only) – Last.fm data by Òscar Celma – UB User-based kNN  Experiment design – MF Matrix factorization – Run baseline recommenders – AVG Average rating – Rerank top 500 recommended items – RND Random by diversification algorithms – Measure metrics on top 50 items  Diversification algorithms  Metrics – MMR Greedy optimization of relevance + diversity – EPC@50 Novelty (Zhang 2008) (popularity complement) – IA-Select Adaptation of IR – EPD@50 Unexpectednes diversity algorithm (profile distance) (Agrawal 2008) – EILD@50 Intra-list diversity – NGD Greedy optimization Distance function: complement of Jaccard of relevance + novelty (MovieLens genres) and Pearson (Last.fm) – RandomIRG Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems 5th ACM International Conference on Recommender Systems (RecSys 2011)IR Group @ UAM Chicago, IL, 23-27 October 2011
  • 21. Experimental results on baseline recommenders (no rank discount). [Bar charts of EPC@50, EPD@50 and EILD@50 for CB, MF, UB, AVG and RND on MovieLens 1M and Last.fm, without and with relevance.]
    Without relevance:
    – CB is good for the long tail, not so good at unexpectedness and diversity.
    – AVG rating and RND stand out, especially on Last.fm.
    With relevance:
    – MF stands out on MovieLens.
    – UB stands out on Last.fm.
    – AVG rating and RND drop drastically.
  • 22. Experimental results with diversification algorithms (significance tested with Wilcoxon, p < 0.001; each metric is reported with disc(k) = 1 and disc(k) = 0.85^(k-1)).

    Without relevance, MovieLens 1M:

    | Algorithm | EPC@50 (1) | EPC@50 (0.85^(k-1)) | EPD@50 (1) | EPD@50 (0.85^(k-1)) | EILD@50 (1) | EILD@50 (0.85^(k-1)) |
    |-----------|------------|---------------------|------------|---------------------|-------------|----------------------|
    | MF        | 0.9124 | 0.8876 | 0.7632 | 0.7466 | 0.7164 | 0.6191 |
    | IA-Select | 0.9045 | 0.8886 | 0.8080 | 0.7577 | 0.8289 | 0.7483 |
    | MMR       | 0.9063 | 0.8769 | 0.7605 | 0.7428 | 0.7191 | 0.6247 |
    | NGD       | 0.9851 | 0.9795 | 0.7725 | 0.7551 | 0.6563 | 0.5430 |
    | Random    | 0.9525 | 0.9527 | 0.7699 | 0.7699 | 0.7283 | 0.6719 |

    Without relevance, Last.fm:

    | Algorithm | EPC@50 (1) | EPC@50 (0.85^(k-1)) | EPD@50 (1) | EPD@50 (0.85^(k-1)) | EILD@50 (1) | EILD@50 (0.85^(k-1)) |
    |-----------|------------|---------------------|------------|---------------------|-------------|----------------------|
    | MF        | 0.8754 | 0.8481 | 0.8949 | 0.8895 | 0.8862 | 0.7954 |
    | IA-Select | 0.8840 | 0.9089 | 0.8912 | 0.8909 | 0.8878 | 0.8274 |
    | MMR       | 0.9068 | 0.8903 | 0.9133 | 0.9107 | 0.9166 | 0.8398 |
    | NGD       | 0.9722 | 0.9571 | 0.9423 | 0.9398 | 0.9485 | 0.8784 |
    | Random    | 0.9359 | 0.9357 | 0.9278 | 0.9279 | 0.9318 | 0.8619 |

    With relevance, MovieLens 1M:

    | Algorithm | EPC@50 (1) | EPC@50 (0.85^(k-1)) | EPD@50 (1) | EPD@50 (0.85^(k-1)) | EILD@50 (1) | EILD@50 (0.85^(k-1)) |
    |-----------|------------|---------------------|------------|---------------------|-------------|----------------------|
    | MF        | 0.0671 | 0.1043 | 0.0580 | 0.0944 | 0.0471 | 0.0551 |
    | IA-Select | 0.0705 | 0.1161 | 0.0639 | 0.1032 | 0.0537 | 0.0648 |
    | MMR       | 0.0719 | 0.1131 | 0.0620 | 0.1020 | 0.0510 | 0.0610 |
    | NGD       | 0.0155 | 0.0223 | 0.0128 | 0.0200 | 0.0067 | 0.0017 |
    | Random    | 0.0222 | 0.0218 | 0.0182 | 0.0179 | 0.0117 | 0.0058 |

    With relevance, Last.fm:

    | Algorithm | EPC@50 (1) | EPC@50 (0.85^(k-1)) | EPD@50 (1) | EPD@50 (0.85^(k-1)) | EILD@50 (1) | EILD@50 (0.85^(k-1)) |
    |-----------|------------|---------------------|------------|---------------------|-------------|----------------------|
    | MF        | 0.2501 | 0.2115 | 0.2671 | 0.2587 | 0.2518 | 0.1900 |
    | IA-Select | 0.3343 | 0.4752 | 0.3462 | 0.3994 | 0.3343 | 0.4154 |
    | MMR       | 0.2351 | 0.1936 | 0.2439 | 0.2340 | 0.2360 | 0.1759 |
    | NGD       | 0.2286 | 0.3077 | 0.2212 | 0.2593 | 0.2165 | 0.2656 |
    | Random    | 0.1362 | 0.1368 | 0.1407 | 0.1405 | 0.1342 | 0.1113 |

    Observations:
    – Improvement w.r.t. random re-ranking is clearer with relevance.
    – Rank sensitivity uncovers further improvements by the diversification algorithms.
    – Different metrics appreciate different diversification algorithms consistently.
    (On the slide, values are additionally highlighted as best, above random, or below the baseline.)
  • 23. Experimental results  The metrics behave consistently – E.g. content-based recommender scores high on novelty (long-tail) but low on unexpectedness and diversity – Diversified recommendations score higher than baselines – Different diversification strategies met their specific target  Relevance makes a large difference – Probe recommenders such as random and average rating score high without relevance and rank discount –and they drop with relevance – Same effect for random diversification  Rank sensitiveness uncovers further improvements by diversification algorithms which otherwise go unnoticedIRG Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems 5th ACM International Conference on Recommender Systems (RecSys 2011)IR Group @ UAM Chicago, IL, 23-27 October 2011
  • 24. Conclusion.
    – A general metric framework for recommendation novelty and diversity evaluation.
    – Flexible and configurable, supporting a fair range of variants and configurations; key configuration components: item novelty models, context θ, rank and relevance.
    – Unifies and generalizes state of the art metrics; further metrics can be unified by taking alternative θ: temporal novelty/diversity, inter-system diversity, inter-user diversity.
    – Provides rank sensitivity and relevance awareness (as an option).
    – Provides a single metric assessing accuracy and diversity/novelty.
    – Further ongoing empirical testing, with wide space for further exploration!