Diversity versus accuracy: solving the apparent dilemma facing recommender systems
Joe Wakeling, "Diversity versus accuracy: solving the apparent dilemma facing recommender systems", 28th October 2009, University of Trento

1. Diversity versus accuracy: solving the apparent dilemma facing recommender systems
   Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo, Joseph R. Wakeling & Yi-Cheng Zhang
2. Overview
   ● Background: recommender systems, accuracy and diversity
   ● Recommendation algorithms, old and new
   ● Datasets: Netflix, RateYourMusic, Delicious
   ● Measures for accuracy and diversity
   ● Solving the apparent ‘dilemma’
3. Background
   Recommender systems use data on past user preferences to predict possible future likes and interests
   ● Most methods are based on similarity, of either users or objects
   ● PROBLEM: more and more users are exposed to a narrowing band of popular objects
   ● ... when the real value is in diversity and novelty: ‘finding what you don’t know’
   ● DILEMMA: choose between accuracy and diversity of your recommendations ...
4. Similarity-focused vs. diversity-focused recommendation
   [Diagram with two panels: ‘Similarity-focused recommendation’ shows users 1 and 2 receiving near-identical lists of popular objects; ‘Diversity-focused recommendation’ shows each user receiving their own distinct, personalized list]
5. Recommendation algorithms (I)
   ● Input: unary data
     – u users, o objects, and links between the two
     – more explicit ratings can be mapped to this form easily
     – the converse is not true!
   ● Two possible representations (see the sketch below):
     – an o × u adjacency matrix A, where a_{αi} = 1 if object α is collected by user i, and 0 otherwise
     – a bipartite user–object network, where the degrees of user and object nodes, k_i and k_α, give the number of objects collected by user i and the number of users who have collected object α
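As a minimal sketch of this input representation (my own illustration, not from the slides; the toy dataset of 4 users and 5 objects is invented, and all sketches below use Python/NumPy):

    import numpy as np

    u, o = 4, 5  # u users, o objects

    # o x u adjacency matrix A: A[alpha, i] = 1 if object alpha is
    # collected by user i, 0 otherwise (values invented for illustration)
    A = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 1, 1],
                  [0, 0, 1, 0],
                  [1, 0, 0, 1]])

    # Degrees in the equivalent bipartite user-object network:
    k_user = A.sum(axis=0)  # k_i: objects collected by user i      -> [3 2 3 2]
    k_obj = A.sum(axis=1)   # k_alpha: users collecting object alpha -> [2 2 3 1 2]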
6. Recommendation algorithms (II)
   ● Algorithms calculate recommendation scores for each user and each of their uncollected objects. Some widely used examples:
   ● GRank: rank objects according to popularity
     – objects sorted by degree k_α (no personalization)
   ● USim: recommend objects collected by ‘taste mates’, using the user similarity

       s_{ij} = \frac{1}{k_i k_j} \sum_{\alpha=1}^{o} a_{\alpha i} a_{\alpha j}

     and the recommendation score

       v_{\alpha i} = \frac{\sum_{j=1}^{u} s_{ij} a_{\alpha j}}{\sum_{j=1}^{u} s_{ij}}

     (a code sketch follows)
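A rough NumPy rendering of these two baselines (not the authors' code; grank and usim_scores are hypothetical helper names, and A is the adjacency matrix from the sketch above):

    def grank(A):
        """GRank: one global list, objects sorted by popularity k_alpha."""
        return np.argsort(-A.sum(axis=1))

    def usim_scores(A):
        """USim scores v_{alpha i} from the slide's formulas."""
        k_user = A.sum(axis=0)
        S = (A.T @ A) / np.outer(k_user, k_user)  # s_ij: u x u user similarity
        V = (A @ S) / S.sum(axis=0)               # v_{alpha i}: o x u scores
        return np.where(A == 1, -np.inf, V)       # rank only uncollected objects

    # Top-L recommendation list for user i:
    # np.argsort(-usim_scores(A)[:, i])[:L]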
7. Recommendation algorithms (III)
   ● HeatS and ProbS: assign collected objects an initial level of ‘resource’ denoted by a vector f, and then redistribute it as f' = Wf, where

       W^{H}_{\alpha\beta} = \frac{1}{k_\alpha} \sum_{j=1}^{u} \frac{a_{\alpha j} a_{\beta j}}{k_j}   (HeatS: heat diffusion)

       W^{P}_{\alpha\beta} = \frac{1}{k_\beta} \sum_{j=1}^{u} \frac{a_{\alpha j} a_{\beta j}}{k_j}   (ProbS: random walk)

   ● Recommend items according to the scores f'_α (a code sketch follows)
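The two matrices differ only in which object degree normalizes the sum, which a sketch makes concrete (again my own rendering of the formulas above, not the authors' implementation):

    def heats_probs(A):
        """Return (W_H, W_P) for an o x u adjacency matrix A."""
        k_user = A.sum(axis=0)    # k_j
        k_obj = A.sum(axis=1)     # k_alpha
        M = (A / k_user) @ A.T    # M[a, b] = sum_j a_{aj} a_{bj} / k_j
        W_H = M / k_obj[:, None]  # HeatS: divide row alpha by k_alpha
        W_P = M / k_obj[None, :]  # ProbS: divide column beta by k_beta
        return W_H, W_P

    def scores(W, A, i):
        """f' = W f, with initial resource f = user i's collected objects."""
        f_prime = W @ A[:, i].astype(float)
        return np.where(A[:, i] == 1, -np.inf, f_prime)  # uncollected only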
8. Datasets
   [Table of statistics for the Netflix, RateYourMusic and Delicious datasets – not preserved in this transcript]
9. Measures of accuracy
   ● Remove 10% of the links from the dataset to generate a test set.
     – The relative rank r_{αi} of object α in user i’s recommendation list should be lower if α is a deleted link. Average over all deleted links for all users to measure the mean recovery of deleted links.
     – If d_i(L) and D_i are the number of deleted links in the top L places and the total number of deleted links for user i, then precision and recall are given by d_i(L)/L and d_i(L)/D_i. Average over all users with at least 1 deleted link and compare with the expected values for random lists to get precision and recall enhancement. (A code sketch follows.)
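A sketch of these per-user quantities (illustrative only; the function names and the example ranked list are invented, and the final enhancement step of dividing by the random-list expectation is omitted):

    def recovery(ranked, deleted, n_uncollected):
        """Mean relative rank r_{alpha i} of the deleted links (lower = better)."""
        pos = {obj: p + 1 for p, obj in enumerate(ranked)}
        return sum(pos[d] / n_uncollected for d in deleted) / len(deleted)

    def precision_recall(ranked, deleted, L):
        """d_i(L)/L and d_i(L)/D_i for one user's top-L list."""
        d_L = len(set(ranked[:L]) & set(deleted))
        return d_L / L, d_L / len(deleted)

    # Example: 10 uncollected objects; deleted links 3 and 7 ranked 1st and 3rd:
    ranked = [3, 5, 7, 0, 9, 1, 2, 4, 6, 8]
    recovery(ranked, [3, 7], 10)         # (1/10 + 3/10) / 2 = 0.2
    precision_recall(ranked, [3, 7], 5)  # (0.4, 1.0)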
10. Measures of diversity
   – If q_{ij}(L) is the number of common objects in the top L places of users i and j’s recommendation lists, then the personalization of lists can be given by the mean of the inter-list distance, h_{ij}(L) = 1 − q_{ij}(L)/L, calculated over all pairs ij of users with at least one deleted link.
   – The novelty or unexpectedness of an object can be given by its self-information, I_α = log₂(u/k_α). Averaging over all top-L objects for all users, we obtain the mean self-information or ‘surprisal’. (A code sketch follows.)
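Both measures are simple to compute per pair of users or per list (an illustrative sketch; the helper names are invented):

    import math

    def inter_list_distance(list_i, list_j, L):
        """h_ij(L) = 1 - q_ij(L)/L, with q_ij(L) the shared top-L objects."""
        q = len(set(list_i[:L]) & set(list_j[:L]))
        return 1.0 - q / L

    def mean_self_information(top_L, k_obj, u):
        """Average surprisal I_alpha = log2(u / k_alpha) over a top-L list."""
        return sum(math.log2(u / k_obj[a]) for a in top_L) / len(top_L)

    inter_list_distance([1, 2, 3], [3, 4, 5], L=3)  # q = 1  ->  h = 2/3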
11. Applying the algorithms
   ● ProbS offers optimal performance for accuracy
   ● HeatS is not accurate, but has exceptionally high personalization and novelty
     – Does this confirm the dilemma? Must we choose between accuracy and diversity, or is there a way to get the best of both worlds?
12. HeatS+ProbS hybrid
   ● The HeatS and ProbS methods are intimately linked – their recommendation processes are just different normalizations of the same underlying matrix
   ● By incorporating a hybridization parameter λ ∊ [0,1] into the normalization, we obtain an elegant blend of the two methods:

       W^{H+P}_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda} k_\beta^{\lambda}} \sum_{j=1}^{u} \frac{a_{\alpha j} a_{\beta j}}{k_j}

     – ... with λ = 0 corresponding to pure HeatS and λ = 1 to pure ProbS (a code sketch follows)
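Because only the normalization changes, the hybrid is a one-line variation on the HeatS/ProbS sketch above (my own rendering; hybrid_matrix is a hypothetical name):

    def hybrid_matrix(A, lam):
        """W^{H+P} with hybridization parameter lam in [0, 1]:
        lam = 0 gives pure HeatS, lam = 1 gives pure ProbS."""
        k_user = A.sum(axis=0)
        k_obj = A.sum(axis=1).astype(float)
        M = (A / k_user) @ A.T  # sum_j a_{alpha j} a_{beta j} / k_j
        return M / (k_obj[:, None] ** (1 - lam) * k_obj[None, :] ** lam)

    # Sweeping lam between 0 and 1 trades diversity off against accuracy.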
13. Conclusions
   ● The dilemma is false – by creating a tunable hybrid of accuracy- and diversity-focused methods, we can produce simultaneous gains in both the accuracy and the diversity of recommendations
   ● These methods do not rely on semantic or context-specific information – they are applicable to virtually any dataset
   ● ... and we expect the approach to be general, i.e. not limited to these particular algorithms
   ● Tuning is simple enough to permit individual users to customize the recommendation service
14. Thanks ...
   ● ... to my co-workers: Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo & Yi-Cheng Zhang
   ● ... to Yi-Kuo Yu for lots of good advice
   ● ... to Ting Lei for the nice lens/focus diagram
   ● ... to LiquidPub
   ● ... and to you for listening :-)
