Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap

Transcript

  • 1. IRG, IR Group @ UAM. Open Research Areas in Information Retrieval (OAIR 2013)
    Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap
    Alejandro Bellogín*, Pablo Castells, Iván Cantador
    Information Retrieval Group – Universidad Autónoma de Madrid
    *Information Access Group – Centrum Wiskunde & Informatica
  • 2. Recommender Systems
    A recommender system aims to find and suggest items of likely interest based on the users’ preferences.
    Examples:
    • Amazon – products
    • Netflix – TV shows and movies
    • LinkedIn – jobs and colleagues
    • Last.fm – music artists and tracks
  • 3. Collaborative Filtering
    “You may like classical music if you like heavy metal”
  • 4. Collaborative Filtering (in the real world)
  • 5. Neighbour selection
    Neighbours are selected according to how similar they are.
  • 6. Neighbour-based collaborative filtering
    Depends on user similarity metrics:
    • Pearson’s correlation
    • Spearman’s correlation
    • Cosine
    Advantages:
    • Simplicity
    • Intuitiveness
    • Efficiency
    Disadvantages:
    • Lower accuracy than other methods
    • Limited coverage
  • 7. Research question
    Can we find a good surrogate of user similarity for neighbour selection?
    We need an alternative to similarity that provides equivalent (or better!) results.
    We focus on user preference overlap.
  • 8. Relation between similarity and overlap
    Overlap: $|I(u) \cap I(v)|$
    Pearson’s correlation, $\mathrm{cor}(u,v)$:
    $\mathrm{sim}(u,v) = \frac{\sum_i \left(r(u,i) - \bar{r}(u)\right)\left(r(v,i) - \bar{r}(v)\right)}{\sqrt{\sum_i \left(r(u,i) - \bar{r}(u)\right)^2}\,\sqrt{\sum_i \left(r(v,i) - \bar{r}(v)\right)^2}}$
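A minimal sketch of the two quantities on this slide, written in Python for illustration. It assumes that each user profile is a dictionary from item to rating, that the sums run over the co-rated items, and that the user means are taken over each user's full profile; the slide does not state these details, and the names `overlap`, `pearson`, `alice`, and `bob` are hypothetical.

```python
from math import sqrt

def overlap(ratings_u, ratings_v):
    """Items rated by both users, i.e. I(u) ∩ I(v)."""
    return set(ratings_u) & set(ratings_v)

def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items.

    Assumption: user means are computed over each user's full profile.
    Returns 0.0 when the correlation is undefined (no overlap or zero variance).
    """
    common = overlap(ratings_u, ratings_v)
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)

# Made-up profiles, for illustration only.
alice = {"a": 5, "b": 3, "c": 4, "d": 1}
bob = {"a": 4, "b": 2, "c": 5, "e": 3}

print(len(overlap(alice, bob)))  # size of the preference overlap
print(pearson(alice, bob))       # similarity over the co-rated items
```

The overlap size and the similarity are computed from the same co-rated set, which is why the two can be compared pair by pair.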
  • 9. Relation between similarity and overlap
    The larger the overlap, the more likely the users would agree on their ratings.
  • 10. Relation between similarity and overlap
    The larger the overlap, the more likely the users would agree on their ratings.
    Low (negative) similarity values are obtained with small overlap sizes.
  • 11. Relation between similarity and overlap
    The larger the overlap, the more likely the users would agree on their ratings.
    Low (negative) similarity values are obtained with small overlap sizes.
    The highest similarity values are obtained by users with tiny overlap.
    • This is coincidental; such similarities have low reliability
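The point about tiny overlaps can be illustrated with a small sketch. It assumes a common Pearson variant in which the means are taken over the co-rated items only, which may differ from the exact variant used in the talk; the function name `pearson_pairs` and all ratings are made up for illustration.

```python
from math import sqrt

def pearson_pairs(xs, ys):
    """Pearson correlation of two equally long lists of co-rated ratings.

    Assumption: means are computed over the co-rated items only.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0

# Two users who share only two items: any two distinct points lie on a line.
tiny = pearson_pairs([5, 1], [4, 2])

# Two users who share six items and agree only partially.
larger = pearson_pairs([5, 1, 4, 2, 3, 5], [4, 2, 2, 4, 3, 3])

print(tiny, larger)
```

With only two co-rated items the correlation is forced to an extreme value (or is undefined), so a maximal similarity says little about real agreement, whereas the six-item pair gives a more moderate and more informative value.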
  • 12. User overlap for neighbour selection
    User-based collaborative filtering:
    $\hat{r}(u,i) = \sum_{v \in V} \mathrm{sim}(u,v)\, r(v,i)$
    Select top-k neighbours V according to:
    • Similarity
    • Intersection (overlap)
    • Herlocker’s weighting: $H(u,v) = \frac{\min(|I(u) \cap I(v)|, n)}{n}$
    • McLaughlin’s weighting: $ML(u,v) = \frac{\max(|I(u) \cap I(v)|, n)}{n}$
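The selection criteria on this slide can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it ranks candidates on the overlap-based score alone, breaks ties by user id, and uses hypothetical names (`herlocker`, `mclaughlin`, `top_k`) and made-up profiles.

```python
def overlap_size(profile_u, profile_v):
    """|I(u) ∩ I(v)|: number of items rated by both users."""
    return len(set(profile_u) & set(profile_v))

def herlocker(size, n):
    """Herlocker's weighting as written on the slide: min(overlap, n) / n."""
    return min(size, n) / n

def mclaughlin(size, n):
    """McLaughlin's weighting as written on the slide: max(overlap, n) / n."""
    return max(size, n) / n

def top_k(target, profiles, k, score):
    """Return the k users with the highest score(target_profile, other_profile).

    Assumption: ties are broken alphabetically by user id.
    """
    scores = {
        v: score(profiles[target], profiles[v])
        for v in profiles
        if v != target
    }
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

# Made-up profiles (item -> rating), for illustration only.
profiles = {
    "u": {"a": 5, "b": 3, "c": 4, "d": 1},
    "v1": {"a": 4, "b": 2, "c": 5},
    "v2": {"a": 1},
    "v3": {"a": 3, "b": 3, "x": 2, "y": 5},
}

n = 2  # threshold parameter of the weighting functions
by_intersection = top_k("u", profiles, 2, overlap_size)
by_herlocker = top_k("u", profiles, 2, lambda pu, pv: herlocker(overlap_size(pu, pv), n))
print(by_intersection, by_herlocker)
```

Note that Herlocker's weighting saturates at 1 once the overlap reaches n, so with a small n several candidates tie and the tie-breaking rule decides the ranking.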
  • 13. User overlap for neighbour selection
    Performance comparison:
    • RMSE: error-based evaluation (the lower, the better)
      $\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left(\hat{r}(u,i) - r(u,i)\right)^2}$
    • P@10: precision-based evaluation (the higher, the better)
      $P@N = \frac{|\mathrm{Rel} \cap \mathrm{TopN}|}{N}$
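The two metrics can be sketched in a few lines of Python. The data layout is an assumption made for illustration: predictions and true ratings are dictionaries keyed by (user, item) pairs from the test set T, and a recommendation is a ranked list of items for one user.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error over the test pairs in `actual`."""
    squared_errors = [(predicted[key] - actual[key]) ** 2 for key in actual]
    return sqrt(sum(squared_errors) / len(squared_errors))

def precision_at_n(ranked_items, relevant_items, n=10):
    """Fraction of the top-n ranked items that are relevant: |Rel ∩ TopN| / N."""
    top_n = ranked_items[:n]
    return len(set(top_n) & set(relevant_items)) / n

# Made-up values, for illustration only.
actual = {("u", "a"): 4.0, ("u", "b"): 2.0}
predicted = {("u", "a"): 3.0, ("u", "b"): 4.0}
error = rmse(predicted, actual)

precision = precision_at_n(["a", "b", "c", "d"], {"a", "d", "z"}, n=4)
print(error, precision)
```

RMSE only looks at items with a known rating, while P@N judges the ranked list as a whole, which is why the two metrics can disagree about which method is best.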
  • 14. User overlap for neighbour selection
    Performance comparison:
    • RMSE: error-based evaluation (the lower, the better)
    • P@10: precision-based evaluation (the higher, the better)
    ↑ Performance as good as (or better than) the baseline
    ↓ Lower coverage for some methods
    [Figure: P@10 and RMSE versus neighbourhood size (10, 20, 50, 100, 200, 1000, 2000) for Similarity filtering, Intersection filtering, Herlocker filtering, and McLaughlin filtering]
  • 15. User overlap for neighbour selection
    Performance comparison: different values for n
    $H(u,v) = \frac{\min(|I(u) \cap I(v)|, n)}{n}$, $ML(u,v) = \frac{\max(|I(u) \cap I(v)|, n)}{n}$
    [Figure: P@10 and RMSE versus neighbourhood size (10, 20, 50, 100, 200, 1000, 2000); panels for n = 10 and n = 100; legend: Similarity filtering, Trust filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  • 16. User overlap for neighbour selection
    Performance comparison: different values for n
    $H(u,v) = \frac{\min(|I(u) \cap I(v)|, n)}{n}$, $ML(u,v) = \frac{\max(|I(u) \cap I(v)|, n)}{n}$
    [Figure: P@10 and RMSE versus neighbourhood size (10, 20, 50, 100, 200, 1000, 2000); panels for n = 10, 20, 80, and 100; legend: Similarity filtering, Trust filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  • 17. User overlap for neighbour selection
    Performance comparison: different values for n
    $H(u,v) = \frac{\min(|I(u) \cap I(v)|, n)}{n}$, $ML(u,v) = \frac{\max(|I(u) \cap I(v)|, n)}{n}$
    [Figure: P@10 and RMSE versus neighbourhood size (10, 20, 50, 100, 200, 1000, 2000); panels for n = 10, 20, 40, 60, 80, and 100; legend: Similarity filtering, Trust filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  • 18. Conclusions
    We compare three approaches based on user preference overlap:
    • Intersection
    • Herlocker’s weighting
    • McLaughlin’s weighting
    Overlap-based approaches improve accuracy (RMSE) and precision (P@10):
    • Especially for small neighbourhoods
    • Stable with respect to the threshold parameter n
  • 19. Future work
    Extend our analysis to other similarity functions:
    • Cosine
    • Spearman
    Exploit other recommendation input dimensions:
    • User logs
    • Social networks
    Understand why the overlap is special, to improve other recommendation algorithms.
  • 20. Thank you!
    Improving Memory-Based Collaborative Filtering by Neighbour Selection Based on User Preference Overlap
    Alejandro Bellogín, Pablo Castells, Iván Cantador
    @abellogin
    alejandro.bellogin@cwi.nl
  • 21. User overlap for neighbour selection
    Performance comparison: different values for n
    [Figure: P@10 and RMSE versus neighbourhood size (10, 20, 50, 100, 200, 1000, 2000); panels for n = 100, 80, 60, 40, 20, and 10; legend: Similarity filtering, Trust filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  • 22. Experiments
    Dataset: MovieLens 1M
    • Users: 6,040
    • Items: 3,600
    • Number of ratings: 1 million
    • Available at http://www.grouplens.org/node/73
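As a rough sketch of how such a dataset can be turned into user profiles: the MovieLens 1M ratings file stores one rating per line as `UserID::MovieID::Rating::Timestamp`. The function name `parse_ratings` and the sample lines below are made up for illustration; in practice the lines would be read from the downloaded ratings file.

```python
def parse_ratings(lines):
    """Build {user_id: {movie_id: rating}} from 'UserID::MovieID::Rating::Timestamp' lines."""
    profiles = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, _timestamp = line.split("::")
        profiles.setdefault(int(user_id), {})[int(movie_id)] = float(rating)
    return profiles

# Made-up sample lines in the MovieLens 1M format, for illustration only.
sample = [
    "1::10::5::0",
    "1::20::3::0",
    "2::10::4::0",
]
profiles = parse_ratings(sample)

# Overlap between users 1 and 2: items rated by both.
shared = set(profiles[1]) & set(profiles[2])
print(profiles, shared)
```

Storing each profile as a dictionary keeps the overlap computation a simple set intersection of the keys.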
  • 23. Neighbour selection in collaborative filtering
    Two main alternatives:
    • Top-k most similar users
    • Thresholding: those users whose similarity is above a threshold
      In [Herlocker et al, 2002], thresholding obtains worse accuracy (higher error)
    Compatible with:
    • Clustering of users: neighbourhood ~ users in the same cluster [Xue et al, 2005; Bellogin & Parapar, 2013]
    • Probabilistic relevance models: p(v | Ru) [Submitted to RecSys ‘13]
    • Rating normalisation [Desrosiers & Karypis, 2012]
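The two alternatives can be contrasted with a short sketch. It assumes the similarities to the target user are already available as a dictionary; the function names and the values are hypothetical.

```python
def top_k_neighbours(similarities, k):
    """The k users with the highest similarity (ties broken by user id)."""
    return sorted(similarities, key=lambda v: (-similarities[v], v))[:k]

def threshold_neighbours(similarities, threshold):
    """All users whose similarity is strictly above the threshold."""
    return sorted(v for v, s in similarities.items() if s > threshold)

# Made-up similarities to a target user, for illustration only.
similarities = {"v1": 0.9, "v2": 0.2, "v3": 0.6, "v4": -0.1}

print(top_k_neighbours(similarities, 2))         # always returns k users if available
print(threshold_neighbours(similarities, 0.5))   # size depends on the data
print(threshold_neighbours(similarities, 0.95))  # may return no neighbours at all
```

Top-k fixes the neighbourhood size, while thresholding fixes the minimum quality; a strict threshold can leave a user with no neighbours and therefore no predictions.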
  • 24. References
    • [Bellogin & Parapar, 2013] Using Graph Partitioning Techniques for Neighbour Selection in User-Based Collaborative Filtering. RecSys
    • [Desrosiers & Karypis, 2012] A Comprehensive Survey of Neighborhood-based Recommendation Methods. Recommender Systems Handbook
    • [Herlocker et al, 2002] An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms. Information Retrieval Journal
    • [Xue et al, 2005] Scalable Collaborative Filtering Using Cluster-Based Smoothing. SIGIR