Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap



  1. IR Group @ UAM · Alejandro Bellogín · Open Research Areas in Information Retrieval (OAIR 2013)
     Improving Memory-Based Collaborative Filtering by Neighbour Selection based on User Preference Overlap
     Alejandro Bellogín*, Pablo Castells, Iván Cantador
     Information Retrieval Group, Universidad Autónoma de Madrid
     *Information Access Group, Centrum Wiskunde & Informatica
  2. Recommender Systems
     • A recommender system aims to find and suggest items of likely interest based on the users' preferences
     • Examples:
         • Amazon: products
         • Netflix: TV shows and movies
         • LinkedIn: jobs and colleagues
         • Last.fm: music artists and tracks
  3. Collaborative Filtering
     "You may like classical music if you like heavy metal"
  4. Collaborative Filtering (in the real world)
  5. Neighbour selection
     Neighbours are selected according to how similar they are
  6. Neighbour-based collaborative filtering
     • Depends on user similarity metrics:
         • Pearson's correlation
         • Spearman's correlation
         • Cosine
     • Advantages:
         • Simplicity
         • Intuitiveness
         • Efficiency
     • Disadvantages:
         • Lower accuracy than other methods
         • Limited coverage
  7. Research question
     Can we find a good surrogate for user similarity in neighbour selection?
     • We need an alternative to similarity that provides equivalent (or better!) results
     • We focus on user preference overlap
  8. Relation between similarity and overlap
     Overlap: $|I(u) \cap I(v)|$, the number of items rated by both users
     Pearson's correlation:
     $$\mathrm{sim}(u,v) = \frac{\sum_{i \in I(u) \cap I(v)} \left(r(u,i) - \bar{r}(u)\right)\left(r(v,i) - \bar{r}(v)\right)}{\sqrt{\sum_{i \in I(u) \cap I(v)} \left(r(u,i) - \bar{r}(u)\right)^2 \; \sum_{i \in I(u) \cap I(v)} \left(r(v,i) - \bar{r}(v)\right)^2}}$$
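A minimal Python sketch of Pearson's correlation restricted to the items both users have rated, returned together with the overlap size. The function name, the dictionary-based profiles, and the choice of taking each user's mean over their full profile are assumptions made for illustration; the slide does not say how the means are computed.

```python
from math import sqrt


def pearson_overlap(ratings_u, ratings_v):
    """Pearson's correlation over co-rated items, plus the overlap size.

    ratings_u, ratings_v: dicts mapping item id -> rating.
    Returns (similarity, overlap); similarity is 0.0 when it is undefined.
    """
    common = set(ratings_u) & set(ratings_v)  # I(u) ∩ I(v)
    if not common:
        return 0.0, 0
    # Assumption: user means are taken over each user's full profile.
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    den_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if den_u == 0 or den_v == 0:
        return 0.0, len(common)  # no variance on the overlap
    return num / sqrt(den_u * den_v), len(common)


u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2, "d": 3}
similarity, overlap = pearson_overlap(u, v)
```

Returning the overlap alongside the similarity makes it easy to check how reliable a given similarity value is before using it.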
  9. Relation between similarity and overlap
     • The larger the overlap, the more likely the users are to agree on their ratings
  10. Relation between similarity and overlap
      • The larger the overlap, the more likely the users are to agree on their ratings
      • Low (negative) similarity values are obtained with small overlap sizes
  11. Relation between similarity and overlap
      • The larger the overlap, the more likely the users are to agree on their ratings
      • Low (negative) similarity values are obtained with small overlap sizes
      • The highest similarity values are obtained by users with tiny overlap
          • This is coincidental; such similarities have low reliability
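A short worked derivation shows why extreme similarity values at tiny overlaps are coincidental. It assumes, purely for illustration, that the user means in Pearson's correlation are computed over the co-rated items, and that users u and v share exactly two items, i and j, which neither of them rated identically. The symbols d_u and d_v are introduced here and do not appear on the slides.

```latex
% Deviations from the overlap mean are symmetric when only two items are shared:
%   r(u,i) - \bar{r}(u) = d_u,  r(u,j) - \bar{r}(u) = -d_u,  with d_u = (r(u,i) - r(u,j)) / 2
%   and likewise d_v for user v.
\mathrm{sim}(u,v)
  = \frac{d_u d_v + (-d_u)(-d_v)}{\sqrt{(d_u^2 + d_u^2)\,(d_v^2 + d_v^2)}}
  = \frac{2\, d_u d_v}{2\, |d_u|\, |d_v|}
  = \operatorname{sign}(d_u d_v) \in \{-1, +1\}
```

Under these assumptions every pair of users with an overlap of two gets a similarity of exactly +1 or -1, whatever the size of their rating differences, so the value says little about how alike the users really are.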
  12. User overlap for neighbour selection
      • User-based collaborative filtering:
        $$\hat{r}(u,i) = \sum_{v \in V} \mathrm{sim}(u,v)\, r(v,i)$$
      • Select the top-k neighbours V according to:
          • Similarity
          • Intersection (overlap)
          • Herlocker's weighting:
            $$H(u,v) = \frac{\min\left(|I(u) \cap I(v)|,\, n\right)}{n}$$
          • McLaughlin's weighting:
            $$ML(u,v) = \frac{\max\left(|I(u) \cap I(v)|,\, n\right)}{n}$$
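The selection step on the slide above can be sketched in Python as follows. All function names and the toy profiles are hypothetical; the weights follow the min/max-over-n form shown on the slide, and whether they are also multiplied by the similarity before ranking is not stated there, so the sketch ranks by the overlap-based score alone.

```python
def overlap_size(profile_u, profile_v):
    """Number of items rated by both users, |I(u) ∩ I(v)|."""
    return len(set(profile_u) & set(profile_v))


def herlocker_weight(overlap, n):
    """min(overlap, n) / n: grows with the overlap and saturates at 1."""
    return min(overlap, n) / n


def mclaughlin_weight(overlap, n):
    """max(overlap, n) / n: equals 1 up to n, then grows with the overlap."""
    return max(overlap, n) / n


def select_neighbours(u, profiles, k, score):
    """Top-k users by score(profile_u, profile_v), excluding u itself."""
    candidates = [(score(profiles[u], profiles[v]), v) for v in profiles if v != u]
    candidates.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [v for _, v in candidates[:k]]


def predict(u, item, neighbours, profiles, sim):
    """Sum of sim(u, v) * r(v, item) over neighbours who rated the item."""
    return sum(sim(profiles[u], profiles[v]) * profiles[v][item]
               for v in neighbours if item in profiles[v])


# Toy data (made up for illustration).
profiles = {
    "u": {"a": 5, "b": 3, "c": 4},
    "v1": {"a": 4},
    "v2": {"a": 5, "b": 2, "c": 4},
    "v3": {"a": 1, "b": 1},
}
by_intersection = select_neighbours("u", profiles, 2, overlap_size)
by_herlocker = select_neighbours(
    "u", profiles, 2, lambda p, q: herlocker_weight(overlap_size(p, q), 2))
```

Passing the scoring function as a parameter keeps the neighbour selection separate from the similarity used in the prediction, which is the separation the slides rely on.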
  13. User overlap for neighbour selection
      • Performance comparison:
          • RMSE: error-based evaluation (the lower, the better)
            $$\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left(\hat{r}(u,i) - r(u,i)\right)^2}$$
          • P@10: precision-based evaluation (the higher, the better)
            $$P@N = \frac{|\mathrm{Rel} \cap \mathrm{TopN}|}{N}$$
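Both metrics are short to compute. The sketch below assumes, for illustration, that predictions and held-out ratings are stored as dictionaries keyed by (user, item) pairs, and that a recommendation list is a ranked Python list; these data layouts and the function names are not taken from the slides.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error over the test pairs T (the keys of `actual`)."""
    squared_errors = [(predicted[key] - actual[key]) ** 2 for key in actual]
    return sqrt(sum(squared_errors) / len(squared_errors))


def precision_at_n(ranked_items, relevant_items, n):
    """Fraction of the top-n ranked items that are relevant."""
    hits = sum(1 for item in ranked_items[:n] if item in relevant_items)
    return hits / n


# Toy data (made up for illustration).
actual = {("u", "a"): 5.0, ("u", "b"): 2.0}
predicted = {("u", "a"): 4.0, ("u", "b"): 2.0}
error = rmse(predicted, actual)
precision = precision_at_n(["a", "b", "c", "d"], {"a", "d", "z"}, 2)
```

Note that `precision_at_n` divides by n even when fewer than n items are recommended, so a method with limited coverage is penalised rather than rewarded for short lists.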
  14. User overlap for neighbour selection
      • Performance comparison:
          • RMSE: error-based evaluation (the lower, the better)
          • P@10: precision-based evaluation (the higher, the better)
      ↑ Performance as good as (or better than) the baseline
      ↓ Lower coverage for some methods
      [Figure: P@10 and RMSE against neighbourhood size (10, 20, 50, 100, 200, 1000, 2000). Legend: AR, Similarity filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  15. User overlap for neighbour selection
      • Performance comparison: different values for n
        $$H(u,v) = \frac{\min\left(|I(u) \cap I(v)|,\, n\right)}{n} \qquad ML(u,v) = \frac{\max\left(|I(u) \cap I(v)|,\, n\right)}{n}$$
      [Figure: P@10 and RMSE against neighbourhood size (10 to 2000), one pair of panels each for n = 10 and n = 100. Legend: AR, Similarity filtering, Trust filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  16. User overlap for neighbour selection
      • Performance comparison: different values for n
        $$H(u,v) = \frac{\min\left(|I(u) \cap I(v)|,\, n\right)}{n} \qquad ML(u,v) = \frac{\max\left(|I(u) \cap I(v)|,\, n\right)}{n}$$
      [Figure: P@10 and RMSE against neighbourhood size (10 to 2000), one pair of panels each for n = 10, 20, 80 and 100. Legend: AR, Similarity filtering, Trust filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  17. User overlap for neighbour selection
      • Performance comparison: different values for n
        $$H(u,v) = \frac{\min\left(|I(u) \cap I(v)|,\, n\right)}{n} \qquad ML(u,v) = \frac{\max\left(|I(u) \cap I(v)|,\, n\right)}{n}$$
      [Figure: P@10 and RMSE against neighbourhood size (10 to 2000), one pair of panels each for n = 10, 20, 40, 60, 80 and 100. Legend: AR, Similarity filtering, Trust filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  18. Conclusions
      • We compare three approaches based on user preference overlap:
          • Intersection
          • Herlocker's weighting
          • McLaughlin's weighting
      • Overlap-based approaches improve accuracy (RMSE) and precision (P@10)
          • Especially for small neighbourhoods
          • Stable with respect to the threshold parameter n
  19. Future work
      • Extend our analysis to other similarity functions
          • Cosine
          • Spearman
      • Exploit other recommendation input dimensions
          • User logs
          • Social networks
      • Understand why the overlap is special, in order to improve other recommendation algorithms
  20. Thank you!
      Improving Memory-Based Collaborative Filtering by Neighbour Selection Based on User Preference Overlap
      Alejandro Bellogín, Pablo Castells, Iván Cantador
      @abellogin
      alejandro.bellogin@cwi.nl
  21. User overlap for neighbour selection
      • Performance comparison: different values for n
      [Figure: P@10 and RMSE against neighbourhood size (10 to 2000), one pair of panels each for n = 100, 80, 60, 40, 20 and 10. Legend: AR, Similarity filtering, Trust filtering, Intersection filtering, Herlocker filtering, McLaughlin filtering]
  22. Experiments
      • Dataset: MovieLens 1M
          • Users: 6,040
          • Items: 3,600
          • Number of ratings: 1 million
          • Available at http://www.grouplens.org/node/73
  23. Neighbour selection in collaborative filtering
      • Two main alternatives:
          • Top-k most similar users
          • Thresholding: those users whose similarity is above a threshold
            In [Herlocker et al, 2002], thresholding obtains worse accuracy (higher error)
      • Compatible with:
          • Clustering of users: neighbourhood ~ users in the same cluster [Xue et al, 2005; Bellogin & Parapar, 2013]
          • Probabilistic relevance models: p(v | Ru) [Submitted to RecSys '13]
          • Rating normalisation [Desrosiers & Karypis, 2012]
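The two selection alternatives differ only in how the candidate list is cut. A minimal Python sketch, assuming the similarities to the target user are already available as a dictionary from neighbour id to similarity (the function names and values are hypothetical):

```python
def top_k(similarities, k):
    """The k users with the highest similarity to the target user."""
    ranked = sorted(similarities.items(), key=lambda pair: pair[1], reverse=True)
    return [user for user, _ in ranked[:k]]


def above_threshold(similarities, threshold):
    """All users whose similarity is strictly above the threshold."""
    return [user for user, sim in similarities.items() if sim > threshold]


# Toy similarities to one target user (made up for illustration).
similarities = {"a": 0.9, "b": 0.2, "c": 0.5}
k_nearest = top_k(similarities, 2)
thresholded = above_threshold(similarities, 0.4)
empty = above_threshold(similarities, 0.95)
```

Top-k always returns a neighbourhood of fixed size when enough candidates exist, whereas a strict threshold can leave a user with no neighbours at all, as the last call shows.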
  24. References
      • [Bellogin & Parapar, 2013] Using Graph Partitioning Techniques for Neighbour Selection in User-Based Collaborative Filtering. RecSys
      • [Desrosiers & Karypis, 2012] A Comprehensive Survey of Neighborhood-based Recommendation Methods. Recommender Systems Handbook
      • [Herlocker et al, 2002] An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms. Information Retrieval Journal
      • [Xue et al, 2005] Scalable Collaborative Filtering Using Cluster-Based Smoothing. SIGIR
