Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
 


Recommender systems aim to predict the content that a user would like, based on observations of the online behaviour of the system's users. Research in the Information Access group addresses different aspects of this problem, ranging from how to measure recommendation results and how recommender systems relate to information retrieval models to how to build effective recommender systems (note: last Friday, we won the ACM RecSys 2013 News Recommender Systems challenge). We would like to develop a general methodology to diagnose the strengths and weaknesses of recommender systems. In this talk, I discuss the initial results of an analysis of the core component of collaborative filtering recommenders: the similarity metric used to find the most similar users (neighbours), who provide the basis for the recommendation to be made. The purpose is to shed light on the question of why certain user similarity metrics have been found to perform better than others. We have studied statistics computed over the distance distribution in the neighbourhood as well as properties of the nearest neighbour graph. The features identified correlate strongly with measured prediction performance; however, we have not yet discovered how to deploy this knowledge to actually improve the recommendations made.

  • This is the target user, or the user we want to present recommendations to
  • It is important to consider the preferences of the rest of the users in the system
  • Of all the users
  • The final goal of the system is to detect new items the user may like
  • One point for each fold

Presentation Transcript

  • Similarity & Recommendation Arjen P. de Vries arjen@cwi.nl CWI Scientific Meeting September 27th 2013
  • Recommendation • Informally: – Search for information “without a query” • Three types: – Content-based recommendation – Collaborative filtering (CF), today’s focus • Memory-based • Model-based – Hybrid approaches
  • Collaborative Filtering • Collaborative filtering (originally introduced by Pattie Maes as “social information filtering”) 1. Compare user judgments 2. Recommend differences between similar users • Leading principle: People’s tastes are not randomly distributed – A.k.a. “You are what you buy”
  • Collaborative Filtering • Benefits over content-based approach – Overcomes problems with finding suitable features to represent e.g. art, music – Serendipity – Implicit mechanism for qualitative aspects like style • Problems: large groups, broad domains
  • Context • Recommender systems – Users interact (rate, purchase, click) with items
  • Context • Nearest-neighbour recommendation methods – The item prediction is based on “similar” users
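  • Similarity • $s(u, i) = \sum_{v} \mathrm{sim}(u, v)\, s(v, i)$, with target user $u$, neighbour $v$ and item $i$ in place of the slide's icons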
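The neighbour-based scoring described on these slides can be sketched in a few lines. This is a minimal illustration and not the implementation used in the talk: the function names, the toy rating dictionaries, the choice of cosine similarity, and the normalisation by the sum of similarities are all assumptions made for the example.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse profiles (dict: item -> rating)."""
    common = set(a) & set(b)
    num = sum(a[i] * b[i] for i in common)
    den = sqrt(sum(r * r for r in a.values())) * sqrt(sum(r * r for r in b.values()))
    return num / den if den else 0.0

def predict(ratings, user, item, k=2, sim=cosine):
    """Score `item` for `user` from the k most similar users who rated it.

    Assumption: the weighted sum is normalised by the total similarity,
    which is one common variant of the neighbour-based formula.
    """
    candidates = [(sim(ratings[user], ratings[v]), v)
                  for v in ratings if v != user and item in ratings[v]]
    neighbours = sorted(candidates, reverse=True)[:k]
    norm = sum(abs(s) for s, _ in neighbours)
    if norm == 0:
        return None  # no usable neighbour for this item
    return sum(s * ratings[v][item] for s, v in neighbours) / norm

# Hypothetical toy data: alice is the target user, item "c" is unseen by her.
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob":   {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
score = predict(ratings, "alice", "c", k=2)
```

Because bob's profile is closer to alice's than carol's is, the predicted score lands nearer to bob's rating of item "c" than to carol's.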
  • Research Question • How does the choice of similarity measure determine the quality of the recommendations?
  • Sparseness • Too many items exist, so many ratings will be missing • A user’s neighbourhood is likely to extend to include “not-so-similar” users and/or items
  • “Best” similarity? • Consider cosine similarity vs. Pearson similarity • Most existing studies report that Pearson correlation leads to superior recommendation accuracy
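To make the contrast between the two measures concrete, here is a small sketch that computes both over the items two users have in common. The helper names and the toy profiles are assumptions for illustration; in particular, computing the Pearson means over the overlap only is one of several conventions.

```python
from math import sqrt

def cosine_overlap(a, b):
    """Cosine similarity restricted to co-rated items."""
    common = set(a) & set(b)
    num = sum(a[i] * b[i] for i in common)
    den = sqrt(sum(a[i] ** 2 for i in common)) * sqrt(sum(b[i] ** 2 for i in common))
    return num / den if den else 0.0

def pearson(a, b):
    """Pearson correlation over co-rated items (means taken over the overlap)."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = (sqrt(sum((a[i] - mean_a) ** 2 for i in common))
           * sqrt(sum((b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0

# Hypothetical profiles: u rates everything high, v low, with opposite preferences.
u = {"a": 5, "b": 4, "c": 5}
v = {"a": 1, "b": 2, "c": 1}
cos_uv = cosine_overlap(u, v)
pear_uv = pearson(u, v)
```

On these profiles cosine reports a high similarity because all ratings are positive, whereas Pearson, which centres each profile on its mean, reports a perfect negative correlation. The two measures can therefore select quite different neighbourhoods.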
  • “Best” similarity? • Common variations to deal with sparse observations: – Item selection: • Compare full profiles, or only on overlap – Imputation: • Impute default value for unrated items – Filtering: • Threshold on minimal similarity value
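The three variations listed above can be expressed as options on a single similarity function. The sketch below is an assumption-laden illustration: the parameter names `overlap_only`, `default` and `min_sim` are hypothetical, and cosine is used only as an example base measure.

```python
from math import sqrt

def cosine(a, b, overlap_only=False, default=None):
    """Cosine similarity between sparse profiles (dict: item -> rating).

    overlap_only: item selection, norms computed on co-rated items only.
    default:      imputation, unrated items receive this value.
    """
    if default is not None:
        items = set(a) | set(b)
        a = {i: a.get(i, default) for i in items}
        b = {i: b.get(i, default) for i in items}
    common = set(a) & set(b)
    items_a = common if overlap_only else a
    items_b = common if overlap_only else b
    num = sum(a[i] * b[i] for i in common)
    den = (sqrt(sum(a[i] ** 2 for i in items_a))
           * sqrt(sum(b[i] ** 2 for i in items_b)))
    return num / den if den else 0.0

def neighbours(ratings, user, min_sim=0.0, **options):
    """Filtering: keep only users whose similarity reaches min_sim."""
    scored = [(cosine(ratings[user], ratings[v], **options), v)
              for v in ratings if v != user]
    return sorted([(s, v) for s, v in scored if s >= min_sim], reverse=True)

# Hypothetical profiles that share a single item.
u = {"a": 4, "b": 2}
v = {"a": 4, "c": 5}
full = cosine(u, v)                        # full profiles
overlap = cosine(u, v, overlap_only=True)  # co-rated items only
imputed = cosine(u, v, default=3)          # missing ratings imputed as 3
```

With only one co-rated item, the overlap variant declares the two users identical, while the full-profile variant penalises the unshared items. This gives a feel for why results across settings are hard to compare.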
  • “Best” similarity? • Cosine superior (!), but not for all settings – No consistent results
  • Analysis
  • Distance Distribution • In high dimensions, nearest-neighbour search is unstable: a query is unstable if the distance from the query point to most data points is less than (1 + ε) times the distance from the query point to its nearest neighbour (Beyer et al., “When is ‘nearest neighbour’ meaningful?”, ICDT 1999)
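The instability condition can be checked directly for a given query point. The sketch below assumes Euclidean distance and interprets "most data points" as more than a fraction `most` of them; both the function name and that threshold parameter are assumptions, not part of the cited definition.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def is_unstable(query, points, eps=0.1, most=0.5):
    """True if more than `most` of the points lie within (1 + eps) times
    the nearest-neighbour distance of the query."""
    distances = [dist(query, p) for p in points]
    nearest = min(distances)
    within = sum(d <= (1 + eps) * nearest for d in distances)
    return within / len(distances) > most

# Hypothetical 2-D data: in the first set almost every point is about as
# far away as the nearest one, in the second the nearest one stands out.
crowded = [(1.0, 0.0), (0.0, 1.05), (-1.02, 0.0), (10.0, 10.0)]
spread = [(1.0, 0.0), (5.0, 0.0), (0.0, 7.0), (9.0, 9.0)]
crowded_unstable = is_unstable((0.0, 0.0), crowded)
spread_unstable = is_unstable((0.0, 0.0), spread)
```

When a query is unstable in this sense, a small perturbation of the data could change which point counts as "nearest", so the neighbour ranking carries little information.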
  • Distance Distribution • Quality q(n, f): Fraction of users for which the similarity function has ranked at least n percent of the user community within a factor f of the nearest neighbour’s similarity value (well... its corresponding distance)
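A possible reading of q(n, f) in code, assuming similarities have already been converted to distances (for example as 1 minus the similarity) and that each user comes with the list of distances to all other users. The data layout and function name are assumptions for illustration.

```python
def quality(distances, n, f):
    """Fraction of users for whom at least n percent of the community
    lies within a factor f of the nearest neighbour's distance.

    distances: dict mapping each user to a list of distances to other users.
    """
    hits = 0
    for dists in distances.values():
        nearest = min(dists)
        share = 100.0 * sum(d <= f * nearest for d in dists) / len(dists)
        if share >= n:
            hits += 1
    return hits / len(distances)

# Hypothetical distances for two users.
toy = {
    "u1": [1.0, 1.1, 1.2, 5.0],  # three of four users within 1.5 x nearest
    "u2": [1.0, 3.0, 4.0, 5.0],  # only the nearest neighbour itself
}
q_50 = quality(toy, n=50, f=1.5)
```

Here only u1 has at least half of the community within a factor 1.5 of its nearest neighbour, so q(50, 1.5) is 0.5 on this toy data.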
  • NNk Graph • Graph associated with the top k nearest neighbours • Analysis focusing on the binary relation of whether a user does or does not belong to a neighbourhood – Ignore similarity values (already included in the distance distribution analysis)
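A sketch of how such a graph might be built from a similarity table, keeping only the binary "is in the top k" relation. The two statistics computed here (in-degree and reciprocity) are illustrative choices; the transcript does not say which graph properties were actually studied, and all names are assumptions.

```python
def nn_k_graph(sims, k):
    """Directed NNk graph: user -> set of its k most similar users.

    sims: dict mapping each user to {other user: similarity}.
    Similarity values are used for ranking only and then discarded.
    """
    return {u: set(sorted(others, key=others.get, reverse=True)[:k])
            for u, others in sims.items()}

def in_degrees(graph):
    """How often each user appears in someone else's neighbourhood."""
    degree = {u: 0 for u in graph}
    for nbrs in graph.values():
        for v in nbrs:
            degree[v] = degree.get(v, 0) + 1
    return degree

def reciprocity(graph):
    """Share of edges u -> v for which v -> u also exists."""
    edges = [(u, v) for u, nbrs in graph.items() for v in nbrs]
    if not edges:
        return 0.0
    mutual = sum(u in graph.get(v, ()) for u, v in edges)
    return mutual / len(edges)

# Hypothetical similarity table for three users.
sims = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.8, "c": 0.2},
    "c": {"a": 0.7, "b": 0.3},
}
graph = nn_k_graph(sims, k=1)
```

In this toy graph user "a" is chosen as nearest neighbour by both other users, the kind of hub structure that a purely binary view of neighbourhoods makes visible.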
  • MRR vs. Features • Quality: – If most of the user population is far away, high similarity correlates with effectiveness – If most of the user population is close, high similarity correlates with ineffectiveness
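Assuming MRR here denotes mean reciprocal rank, the usual ranking-effectiveness measure of that name, it could be computed as in the sketch below. The data layout, with one ranked recommendation list and one set of relevant items per user, is an assumption for illustration.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average over users of 1 / (rank of the first relevant item).

    rankings: dict user -> ranked list of recommended items.
    relevant: dict user -> set of items the user actually liked.
    Users without any relevant item in their list contribute 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for user, ranked in rankings.items():
        liked = relevant.get(user, ())
        for position, item in enumerate(ranked, start=1):
            if item in liked:
                total += 1.0 / position
                break
    return total / len(rankings)

# Hypothetical recommendation lists and ground truth.
rankings = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"], "u3": ["p", "q"]}
relevant = {"u1": {"a"}, "u2": {"z"}, "u3": set()}
mrr = mean_reciprocal_rank(rankings, relevant)
```

A per-metric value like this is what the similarity features would then be correlated against, for instance one point per evaluation fold.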
  • Conclusions (so far) • “Similarity features” correlate with recommendation effectiveness – “Stability” of a metric (as defined in database literature on k-NN search in high dimensions) is related to its ability to discriminate between good and bad neighbours
  • Future Work • How to exploit this knowledge to now improve recommendation systems?
  • News Recommendation Challenge
  • Thanks • Alejandro Bellogín – ERCIM fellow in the Information Access group Details: Bellogín and De Vries, ICTIR 2013.