
- 1. Diversity versus accuracy: solving the apparent dilemma facing recommender systems Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo, Joseph R. Wakeling & Yi-Cheng Zhang
- 2. Overview ● Background: recommender systems, accuracy and diversity ● Recommendation algorithms, old and new ● Datasets: Netflix, RateYourMusic, Delicious ● Measures for accuracy and diversity ● Solving the apparent ‘dilemma’
- 3. Background Recommender systems use data on past user preferences to predict possible future likes and interests ● Most methods based on similarity, of either users or objects ● PROBLEM: more and more users exposed to a narrowing band of popular objects ● ... when real value is in diversity and novelty: ‘finding what you don’t know’ ● DILEMMA: choose between accuracy and diversity of your recommendations ...
- 4. [Diagram comparing similarity-focused and diversity-focused recommendation: for each approach, the recommendation lists produced for user 1 and user 2]
- 5. Recommendation algorithms (I) ● Input: unary data – u users, o objects, and links between the two – more explicit ratings can be mapped to this form easily – the converse is not true! ● Two possible representations: – an o×u adjacency matrix A, where a_{αi} = 1 if object α is collected by user i and 0 otherwise – a bipartite user-object network, where the degrees k_i and k_α of the user and object nodes give the number of objects collected by user i and the number of users who have collected object α
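Both representations are easy to build in code. A minimal sketch (Python/NumPy, with an invented toy link list – the names `links`, `A`, `k_user`, and `k_obj` are illustrative, not from the slides):

```python
import numpy as np

# Toy unary data: (user, object) pairs meaning "user i collected object alpha".
links = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 3)]
u, o = 3, 4  # u users, o objects

# o x u adjacency matrix: A[alpha, i] = 1 if object alpha is collected by user i.
A = np.zeros((o, u), dtype=int)
for i, alpha in links:
    A[alpha, i] = 1

# Degrees of the bipartite network's nodes:
k_user = A.sum(axis=0)  # k_i: number of objects collected by user i
k_obj = A.sum(axis=1)   # k_alpha: number of users who collected object alpha
```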
- 6. Recommendation algorithms (II) ● Algorithms calculate recommendation scores for each user and each of their uncollected objects. Some widely used examples: ● GRank: rank objects according to popularity – objects sorted by degree k_α (no personalization) ● USim: recommend objects collected by ‘taste mates’, via the user similarity and recommendation score

    s_{ij} = \frac{\sum_{\alpha=1}^{o} a_{\alpha i} a_{\alpha j}}{k_i k_j}, \qquad v_{\alpha i} = \frac{\sum_{j=1}^{u} s_{ij} a_{\alpha j}}{\sum_{j=1}^{u} s_{ij}}
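A rough illustration of both scoring rules above (a sketch using the toy matrix `A` from before; `grank` and `usim_scores` are hypothetical helper names, and user degrees are assumed nonzero):

```python
import numpy as np

def grank(A):
    # GRank: score every object by its popularity k_alpha; no personalization.
    return A.sum(axis=1)

def usim_scores(A, i):
    # USim: s_ij = sum_alpha a_{alpha i} a_{alpha j} / (k_i k_j),
    #       v_{alpha i} = sum_j s_ij a_{alpha j} / sum_j s_ij.
    k = A.sum(axis=0).astype(float)    # user degrees k_j (assumed nonzero)
    s = (A[:, i] @ A) / (k[i] * k)     # similarity of user i to every user j
    s[i] = 0.0                         # a user is not their own taste mate
    v = (A * s).sum(axis=1) / s.sum()  # recommendation scores v_{alpha i}
    v[A[:, i] == 1] = -np.inf          # rank only uncollected objects
    return v
```

Objects are then recommended to user i in decreasing order of their scores v.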
- 7. Recommendation algorithms (III) ● HeatS and ProbS: assign collected objects an initial level of ‘resource’ denoted by a vector f, and then redistribute via f′ = Wf, where

    W^{H}_{\alpha\beta} = \frac{1}{k_\alpha} \sum_{j=1}^{u} \frac{a_{\alpha j} a_{\beta j}}{k_j} \quad \text{(HeatS: heat diffusion)}

    W^{P}_{\alpha\beta} = \frac{1}{k_\beta} \sum_{j=1}^{u} \frac{a_{\alpha j} a_{\beta j}}{k_j} \quad \text{(ProbS: random walk)}

● Recommend items according to the scores f′_α
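A compact sketch of both redistribution rules (same toy `A` as above; the formulas follow the slide's W^H and W^P definitions, but the function name and the dense-matrix implementation are illustrative choices, not the authors'):

```python
import numpy as np

def diffusion_scores(A, i, method="ProbS"):
    # f' = W f, with f the indicator vector of user i's collected objects.
    # W^H_{ab} = (1/k_a) sum_j a_{aj} a_{bj} / k_j   (HeatS, heat diffusion)
    # W^P_{ab} = (1/k_b) sum_j a_{aj} a_{bj} / k_j   (ProbS, random walk)
    k_user = A.sum(axis=0).astype(float)   # k_j (assumed nonzero)
    k_obj = A.sum(axis=1).astype(float)    # k_alpha (assumed nonzero)
    core = (A / k_user) @ A.T              # core[a, b] = sum_j a_{aj} a_{bj} / k_j
    if method == "HeatS":
        W = core / k_obj[:, None]          # normalize row a by k_alpha
    else:
        W = core / k_obj[None, :]          # normalize column b by k_beta
    f = A[:, i].astype(float)              # initial resource vector
    return W @ f                           # scores f'_alpha
```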
- 8. Datasets [table: statistics of the Netflix, RateYourMusic and Delicious datasets]
- 9. Measures of accuracy ● Remove 10% of the links from the dataset to generate a test set. – The relative rank r_{αi} of object α in user i’s recommendation list should be lower if α is a deleted link. Average over all deleted links for all users to measure the mean recovery of deleted links. – If d_i(L) and D_i are the number of deleted links in the top L places and the total number of deleted links for user i, then precision and recall are given by d_i(L)/L and d_i(L)/D_i. Average over all users with at least one deleted link and compare with the expected values for random lists to get the precision and recall enhancement.
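A sketch of the precision/recall enhancement computation (assuming `rank_lists[i]` holds user i's objects sorted best-first and `deleted[i]` the user's test-set links – both names are illustrative; the random-list baselines D_i/o and L/o follow from placing the deleted links uniformly at random among the o objects):

```python
import numpy as np

def pr_enhancement(rank_lists, deleted, L, o):
    # Precision d_i(L)/L and recall d_i(L)/D_i in the top L places,
    # divided by their expected values for a random list, then averaged
    # over users with at least one deleted link.
    eP, eR = [], []
    for ranking, dels in zip(rank_lists, deleted):
        D = len(dels)
        if D == 0:
            continue
        d = sum(1 for obj in ranking[:L] if obj in dels)
        eP.append((d / L) / (D / o))   # random-list expected precision: D/o
        eR.append((d / D) / (L / o))   # random-list expected recall: L/o
    return np.mean(eP), np.mean(eR)
```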
- 10. Measures of diversity – If q_{ij}(L) is the number of common objects in the top L places of users i and j’s recommendation lists, then the personalization of lists can be measured by the mean inter-list distance h_{ij}(L) = 1 − q_{ij}(L)/L, calculated over all pairs ij of users with at least one deleted link. – The novelty or unexpectedness of an object can be measured by its self-information I_α = log₂(u/k_α). Averaging over all top-L objects for all users, we obtain the mean self-information or ‘surprisal’.
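Both diversity measures reduce to a few lines (a sketch with the same assumed `rank_lists`; for brevity it averages h over all user pairs, whereas the slide restricts to users with at least one deleted link):

```python
import numpy as np
from itertools import combinations

def personalization(rank_lists, L):
    # Mean inter-list distance h_ij(L) = 1 - q_ij(L)/L over user pairs.
    h = [1 - len(set(a[:L]) & set(b[:L])) / L
         for a, b in combinations(rank_lists, 2)]
    return np.mean(h)

def surprisal(rank_lists, k_obj, u, L):
    # Mean self-information I_alpha = log2(u / k_alpha) of top-L objects.
    I = np.log2(u / np.asarray(k_obj, dtype=float))
    return np.mean([I[obj] for ranking in rank_lists for obj in ranking[:L]])
```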
- 11. Applying the algorithms ● ProbS offers optimal performance for accuracy ● HeatS is not accurate, but has exceptionally high personalization and novelty – Does this confirm the dilemma? Must we choose between accuracy and diversity, or is there a way to get the best of both worlds?
- 12. HeatS+ProbS hybrid ● The HeatS and ProbS methods are intimately linked – their recommendation processes are just different normalizations of the same underlying matrix ● By incorporating a hybridization parameter λ ∊ [0,1] into the normalization, we obtain an elegant blend of the two methods:

    W^{H+P}_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda}\, k_\beta^{\lambda}} \sum_{j=1}^{u} \frac{a_{\alpha j} a_{\beta j}}{k_j}

– ... with λ = 0 corresponding to pure HeatS and λ = 1 to pure ProbS
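The hybrid is a one-line change to the normalization in the earlier diffusion sketch (the formula is the slide's; the function name and dense implementation are illustrative):

```python
import numpy as np

def hybrid_scores(A, i, lam):
    # W^{H+P}_{ab} = sum_j a_{aj} a_{bj} / k_j, normalized by
    # k_a**(1 - lam) * k_b**lam; lam=0 is pure HeatS, lam=1 pure ProbS.
    k_user = A.sum(axis=0).astype(float)
    k_obj = A.sum(axis=1).astype(float)
    core = (A / k_user) @ A.T
    W = core / (k_obj[:, None] ** (1 - lam) * k_obj[None, :] ** lam)
    return W @ A[:, i].astype(float)
```

Sweeping λ over [0,1] and evaluating with the accuracy and diversity measures above is the tuning step: intermediate values can beat both endpoints at once.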
- 13. Conclusions ● The dilemma is false – by creating a hybrid of accuracy- and diversity-focused methods we can tune it to produce simultaneous gains in accuracy and diversity of recommendations ● These methods do not rely on semantic or context-specific information – they are applicable to virtually any dataset ● ... but we expect the approach to be general, i.e. not limited to these algorithms ● Tuning is simple enough to permit individual users to customize the recommendation service
- 14. Thanks ... ● ... to my co-workers: Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo & Yi-Cheng Zhang ● ... to Yi-Kuo Yu for lots of good advice ● ... to Ting Lei for the nice lens/focus diagram ● ... to LiquidPub ● ... and to you for listening :-)
