Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013


Recommender systems aim to predict the content that a user would like, based on observations of the online behaviour of the system's users. Research in the Information Access group addresses different aspects of this problem, ranging from how to measure recommendation results and how recommender systems relate to information retrieval models, to how to build effective recommender systems (note: last Friday, we won the ACM RecSys 2013 News Recommender Systems challenge). We would like to develop a general methodology to diagnose the strengths and weaknesses of recommender systems. In this talk, I discuss the initial results of an analysis of the core component of collaborative filtering recommenders: the similarity metric used to find the most similar users (neighbours), who provide the basis for the recommendation. The purpose is to shed light on the question of why certain user similarity metrics have been found to perform better than others. We have studied statistics computed over the distance distribution in the neighbourhood, as well as properties of the nearest neighbour graph. The features identified correlate strongly with measured prediction performance; however, we have not yet discovered how to deploy this knowledge to actually improve the recommendations made.


Similarity & Recommendation
Arjen P. de Vries
arjen@cwi.nl
CWI Scientific Meeting
September 27th 2013

Recommendation
• Informally:
– Search for information “without a query”
• Three types:
– Content-based recommendation
– Collaborative filtering (CF)
• Memory-based
• Model-based
– Hybrid approaches

Recommendation
• Informally:
– Search for information “without a query”
• Three types:
– Content-based recommendation
– Collaborative filtering
• Memory-based
• Model-based
– Hybrid approaches
Today’s focus!

Collaborative Filtering
• Collaborative filtering (originally introduced by
Pattie Maes as “social information filtering”)
1. Compare user judgments
2. Recommend differences between
similar users
• Leading principle:
People’s tastes are not randomly
distributed
– A.k.a. “You are what you buy”

Collaborative Filtering
• Benefits over content-based approach
– Overcomes problems with finding suitable
features to represent e.g. art, music
– Serendipity
– Implicit mechanism for qualitative aspects like
style
• Problems: large groups, broad domains

Context
• Recommender systems
– Users interact (rate, purchase, click) with items

Context
• Nearest-neighbour recommendation methods
– The item prediction is based on “similar” users
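
The idea of predicting from “similar” users can be sketched as follows. This is a minimal illustration, not the method used in the talk: the rating data, the function names `cosine` and `predict`, the choice of k, and the similarity-weighted average are all assumptions made for the example.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings, target, item, k=2, sim=cosine):
    """Similarity-weighted average of the ratings that the k most
    similar users (who rated `item`) gave to `item`."""
    candidates = [(sim(ratings[target], r), r[item])
                  for user, r in ratings.items()
                  if user != target and item in r]
    neighbours = sorted(candidates, reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total <= 0:
        return None  # no usable neighbours for this item
    return sum(s * r for s, r in neighbours) / total

# Hypothetical toy data: alice has not rated item "c".
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob":   {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
prediction = predict(ratings, "alice", "c")
```

Because bob's profile is closer to alice's than carol's is, the prediction is pulled towards bob's rating of 4.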

Research Question
• How does the choice of similarity measure
determine the quality of the
recommendations?

Sparseness
• Too many items exist, so many ratings will
be missing
• A user’s neighbourhood is likely to extend
to include “not-so-similar” users and/or
items

“Best” similarity?
• Consider cosine similarity vs. Pearson
similarity
• Most existing studies report that Pearson
correlation leads to superior
recommendation accuracy
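
To make the difference between the two measures concrete, here is a small sketch that computes both over the items two users have rated in common. The profiles and the function names are hypothetical; restricting both measures to co-rated items is an assumption of this example.

```python
from math import sqrt

def cosine_overlap(u, v):
    """Cosine similarity restricted to co-rated items."""
    common = sorted(set(u) & set(v))
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pearson(u, v):
    """Pearson correlation over co-rated items: cosine after
    subtracting each user's mean rating on those items."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    du = [u[i] - mean_u for i in common]
    dv = [v[i] - mean_v for i in common]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return num / den if den else 0.0

# Hypothetical profiles: same preference pattern, different rating scale.
u = {"a": 5, "b": 4, "c": 5}
v = {"a": 2, "b": 1, "c": 2}
cos_uv = cosine_overlap(u, v)
pearson_uv = pearson(u, v)
```

Pearson removes each user's rating offset, so the generous and the strict rater come out as perfectly correlated, while cosine still sees a small angle between the raw vectors.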

“Best” similarity?
• Common variations to deal with sparse
observations:
– Item selection:
• Compare full profiles, or only on overlap
– Imputation:
• Impute default value for unrated items
– Filtering:
• Threshold on minimal similarity value
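
The three variations can be sketched as switches on a single similarity function. This is an illustrative sketch only: the names `cosine_variant` and `filter_neighbours`, the parameters `overlap_only` and `default`, and the toy profiles are assumptions, not taken from the talk.

```python
from math import sqrt

def cosine_variant(u, v, overlap_only=True, default=0.0):
    """Cosine similarity with two of the variations from the slide.

    overlap_only=True  -> item selection: compare only co-rated items.
    overlap_only=False -> compare full profiles; unrated items are
                          imputed with `default` (0.0 means "no imputation").
    """
    if overlap_only:
        items = sorted(set(u) & set(v))
    else:
        items = sorted(set(u) | set(v))
    a = [u.get(i, default) for i in items]
    b = [v.get(i, default) for i in items]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def filter_neighbours(sims, threshold):
    """Filtering: keep only neighbours at or above a minimal similarity."""
    return {user: s for user, s in sims.items() if s >= threshold}

# Hypothetical profiles that share only item "a".
u = {"a": 5, "b": 1}
v = {"a": 5, "c": 1}
on_overlap = cosine_variant(u, v)                                  # only "a"
full_profile = cosine_variant(u, v, overlap_only=False)            # missing = 0
imputed = cosine_variant(u, v, overlap_only=False, default=3.0)    # missing = 3
kept = filter_neighbours({"x": 0.9, "y": 0.2}, threshold=0.5)
```

The same pair of users receives three different similarity values depending on the variation, which is one reason results across studies are hard to compare.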

“Best” similarity?
• Cosine superior (!), but not for all settings
– No consistent results

Distance Distribution
• In high dimensions, nearest neighbour search becomes unstable:
A query is unstable if the distance from the query point to most data
points is less than (1 + ε) times the distance from the query point to
its nearest neighbour
Beyer et al. When is “nearest neighbour” meaningful? ICDT 1999
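
The instability condition can be checked directly for a given query. The sketch below assumes Euclidean distance and interprets “most” as more than a fraction `most` of the points (0.5 by default); both are assumptions of this example, as are the function name and the toy points. It uses “less than or equal” so that the nearest neighbour itself always counts.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def is_unstable(query, points, eps, most=0.5):
    """True if more than a fraction `most` of the points lie within
    (1 + eps) times the nearest-neighbour distance of the query."""
    distances = [dist(query, p) for p in points]
    d_nn = min(distances)
    within = sum(d <= (1 + eps) * d_nn for d in distances)
    return within / len(distances) > most

# Hypothetical 2-D example: three points at almost the same distance.
query = (0.0, 0.0)
points = [(1.0, 0.0), (0.0, 1.05), (1.1, 0.0), (5.0, 5.0)]
loose = is_unstable(query, points, eps=0.2)    # 3 of 4 points within 1.2
strict = is_unstable(query, points, eps=0.01)  # only the nearest neighbour
```

With ε = 0.2 the query counts as unstable, because the nearest neighbour is barely closer than most other points; with ε = 0.01 it does not.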

Distance Distribution
Beyer et al. When is “nearest neighbour” meaningful? ICDT 1999

Distance Distribution
• Quality q(n, f):
Fraction of users for which the similarity
function has ranked at least n percent of
the user community within a factor f of the
nearest neighbour’s similarity value (well... its
corresponding distance)
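
A possible reading of q(n, f) in code, assuming the similarity values have already been converted to distances (the conversion itself is not specified on the slide). The function name, the input layout and the toy numbers are hypothetical.

```python
def quality(distances, n, f):
    """Fraction of users for whom at least n percent of the other users
    lie within a factor f of the nearest neighbour's distance.

    distances: dict mapping each user to the list of distances
               from that user to every other user.
    """
    hits = 0
    for ds in distances.values():
        d_nn = min(ds)
        share = 100.0 * sum(d <= f * d_nn for d in ds) / len(ds)
        if share >= n:
            hits += 1
    return hits / len(distances)

# Hypothetical distances: u1 has a crowded neighbourhood, u2 does not.
distances = {
    "u1": [1.0, 1.1, 1.2, 4.0],
    "u2": [1.0, 3.0, 5.0, 9.0],
}
q_50 = quality(distances, n=50, f=1.5)
q_25 = quality(distances, n=25, f=1.5)
```

For u1, 75 percent of the community falls within 1.5 times the nearest-neighbour distance, for u2 only 25 percent, so q(50, 1.5) is 0.5 and q(25, 1.5) is 1.0.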

NNk Graph
• Graph associated with the top k nearest
neighbours
• Analysis focusing on the binary relation of
whether a user does or does not belong to
a neighbourhood
– Ignore similarity values (already included in
the distance distribution analysis)
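
A sketch of how such a graph could be built from pairwise similarities, keeping only the binary “is in the top k” relation. The slide does not say which graph properties were analysed; the in-degree computed here is just one illustrative choice, and all names and numbers are hypothetical.

```python
def nnk_graph(sims, k):
    """Directed graph: each user points to their k most similar users.

    sims: dict user -> dict of other user -> similarity.
    The similarity values are used for ranking only and then dropped.
    """
    return {
        user: set(sorted(others, key=others.get, reverse=True)[:k])
        for user, others in sims.items()
    }

def in_degree(graph):
    """How many neighbourhoods each user belongs to."""
    counts = {user: 0 for user in graph}
    for neighbours in graph.values():
        for v in neighbours:
            counts[v] = counts.get(v, 0) + 1
    return counts

# Hypothetical similarities between three users.
sims = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.8, "c": 0.7},
    "c": {"a": 0.2, "b": 0.6},
}
graph = nnk_graph(sims, k=1)
degrees = in_degree(graph)
```

Here user b appears in two neighbourhoods and user c in none, which is the kind of structural information that the similarity values alone do not show.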

MRR vs. Features
• Quality:
– If most of the user population is far away, high
similarity correlates with effectiveness
– If most of the user population is close, high
similarity correlates with ineffectiveness
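
Assuming MRR here stands for mean reciprocal rank (the slide does not expand the abbreviation), effectiveness could be measured as in the sketch below. The function name, input layout and toy rankings are hypothetical.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average over users of 1 / rank of the first relevant item.

    rankings: dict user -> ranked list of recommended items.
    relevant: dict user -> set of items the user actually liked.
    Users with no relevant item in their ranking contribute 0.
    """
    total = 0.0
    for user, ranked in rankings.items():
        liked = relevant.get(user, set())
        for position, item in enumerate(ranked, start=1):
            if item in liked:
                total += 1.0 / position
                break
    return total / len(rankings) if rankings else 0.0

# Hypothetical recommendations for two users.
rankings = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
relevant = {"u1": {"b"}, "u2": {"x"}}
mrr = mean_reciprocal_rank(rankings, relevant)
```

User u1's first relevant item is at rank 2 and u2's at rank 1, giving an MRR of (1/2 + 1) / 2 = 0.75.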

Conclusions (so far)
• “Similarity features” correlate with
recommendation effectiveness
– “Stability” of a metric (as defined in database
literature on k-NN search in high dimensions)
is related to its ability to discriminate between
good and bad neighbours

Future Work
• How can this knowledge be exploited to
improve recommender systems?

Thanks
• Alejandro Bellogín – ERCIM fellow in the
Information Access group
Details: Bellogín and De Vries, ICTIR 2013.



Editor's Notes

This is the target user, or the user we want to present recommendations to
It is important to consider the preferences of the rest of the users in the system
Of all the users
The final goal of the system is to detect new items the user may like
One point for each fold
