SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases

Presentation slides of full paper at SIGIR 2017, Tokyo, Japan, 8 August 2017

  1. A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases
     Rocío Cañamares and Pablo Castells
     IRG, IR Group @UAM, Autónoma University of Madrid, http://ir.ii.uam.es
     40th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017)
     Tokyo, Japan, 8 August 2017
  2. The recommender systems task
     A recommender system:
     1. Observes users as they carry out activities in the system
     2. Detects behavior patterns, identifies evidence of interests
     3. Predicts and suggests choices of potential interest
     [Figure: users $u$ and $v$, item $i$; example artists Clara Sanabras, The Beatles, Vanessa Da Mata.]
  3. The $k$ nearest neighbors approach (user-based)
     kNN user-based:
     $$\hat{r}(u,i) = \sum_{v} w_v\, r(v,i)$$
     [Figure: target user $u$, neighbor users $v$, target item $i$.]
  4. kNN user-based with cosine similarity
     $$\hat{r}(u,i) = C \sum_{v} \mathrm{sim}(u,v)\, r(v,i)$$
     $$\mathrm{sim}(u,v) = \cos(\vec{u},\vec{v}) = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|\,\|\vec{v}\|} = \frac{\sum_{j\in\mathcal{I}} r(u,j)\, r(v,j)}{\sqrt{\sum_{j\in\mathcal{I}} r(u,j)^2}\,\sqrt{\sum_{j\in\mathcal{I}} r(v,j)^2}}$$
     $$C = \frac{1}{\sum_{v} \mathrm{sim}(u,v)} \quad\text{or}\quad C = 1$$
     [Figure: target user $u$, neighbor user $v$, target item $i$.]
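The cosine-weighted user-based scheme on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `cosine_user_knn_scores`, the dense matrix `R` (users by items), the neighborhood cut `k`, and the `normalize` flag (switching between $C = 1$ and $C = 1/\sum_v \mathrm{sim}(u,v)$) are all assumptions made for the example.

```python
import numpy as np

def cosine_user_knn_scores(R, u, k=None, normalize=False):
    """Score every item for user u with heuristic user-based kNN (cosine)."""
    R = np.asarray(R, dtype=float)
    norms = np.linalg.norm(R, axis=1)
    # Users with no interactions have zero norm; avoid dividing by zero.
    safe = np.where(norms > 0, norms, 1.0)
    sim = (R @ R[u]) / (safe * safe[u])
    sim[u] = 0.0  # the target user is not counted as their own neighbor
    if k is not None and 0 < k < sim.size:
        # Zero out everyone except the k most similar users.
        sim[np.argsort(sim)[:-k]] = 0.0
    scores = sim @ R  # sum_v sim(u, v) * r(v, i) for every item i
    if normalize:
        total = sim.sum()
        if total > 0:
            scores = scores / total  # C = 1 / sum_v sim(u, v)
    return scores

# Toy interaction matrix: 3 users x 3 items (made-up numbers).
R = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 3, 0]])
print(cosine_user_knn_scores(R, u=0, k=1, normalize=True))
```

In the toy matrix only user 1 shares an item with user 0, so with `k=1` and normalization the scores are simply user 1's row.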
  5. The kNN scheme
     • Has been around since the early 90's
     • Is easy to understand, implement, explain
     • Is competitive and broadly used in industry today
     • Is heuristic
       – Many variants, not clear which one is better → trial and error
     • Why a probabilistic reformulation?
       – For the sake of it :)
       – May help better understand, explain and configure kNN
     Heuristic → Probabilistic
  6. Probability space: item choice
     What item would the user choose?
     [Figure: target user, items, user choice.]
  7. Probability space: item choice
     What item would the user choose?
     Given a target user $u$, rank items by decreasing value of $p(I = i \mid U = u)$.
     [Figure: future user choices "urn", with random variables $I$ and $U$.]
  8. The main idea: marginalization
     Given a target user $u$, rank items by decreasing value of
     $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(I = i \mid U = u, V = v, J = I)$$
     [Figure: past user choices "urn" ($J$, $V$) and future user choices "urn" ($I$, $U$), linked by $J = I$.]
  9. The main idea: marginalization
     Given a target user $u$, rank items by decreasing value of
     $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(J = i \mid V = v)$$
     [Figure: past user choices "urn" ($J$, $V$) and future user choices "urn" ($I$, $U$), linked by $J = I$.]
  10. Probability estimation
      Given a target user $u$, rank items by decreasing value of
      $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(J = i \mid V = v)$$
      Use past choices as a sample of the future choices distribution.
      [Figure: past user choices "urn" ($J$, $V$), linked to $I$, $U$ by $J = I$.]
  11. Probability estimation
      Given a target user $u$, rank items by decreasing value of
      $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(J = i \mid V = v)$$
      $r(v,i) \equiv$ # times $v$ has interacted with $i$
      $$p(J = i \mid V = v) = \frac{r(v,i)}{\sum_{j\in\mathcal{I}} r(v,j)}$$
      [Figure: past user choices "urn" ($J$, $V$), linked to $I$, $U$ by $J = I$.]
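The estimate of $p(J = i \mid V = v)$ on this slide is a row normalization of the interaction counts. A minimal sketch, assuming a dense users-by-items count matrix `R`; the function name `item_given_user` and the choice to return all zeros for users with no interactions are assumptions made for the example, not something the slides specify.

```python
import numpy as np

def item_given_user(R):
    """Estimate p(J = i | V = v) as r(v, i) / sum_j r(v, j), row by row."""
    R = np.asarray(R, dtype=float)
    totals = R.sum(axis=1, keepdims=True)
    # Assumption: a user with no interactions gets an all-zero row.
    safe = np.where(totals > 0, totals, 1.0)
    return R / safe

# Toy counts: 3 users x 3 items (made-up numbers).
R = np.array([[2, 0, 2],
              [0, 0, 0],
              [1, 3, 0]])
print(item_given_user(R))
```

Each row with at least one interaction sums to 1, so it can be read directly as a distribution over items for that user.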
  12. Probability estimation
      Given a target user $u$, rank items by decreasing value of
      $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; \frac{r(v,i)}{\sum_{j\in\mathcal{I}} r(v,j)}$$
      $r(v,i) \equiv$ # times $v$ has interacted with $i$
      $$p(J = i \mid V = v) = \frac{r(v,i)}{\sum_{j\in\mathcal{I}} r(v,j)}$$
      [Figure: past user choices "urn" ($J$, $V$), linked to $I$, $U$ by $J = I$.]
  13. Probability estimation
      Given a target user $u$, rank items by decreasing value of
      $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; \frac{r(v,i)}{\sum_{j\in\mathcal{I}} r(v,j)}$$
      $r(v,i) \equiv$ # times $v$ has interacted with $i$
      $$p(V = v \mid U = u, J = I) = \frac{\sum_{j\in\mathcal{I}} r(u,j)\, r(v,j)}{\sum_{w\in\mathcal{U}} \sum_{j\in\mathcal{I}} r(u,j)\, r(w,j)}$$
      [Figure: past user choices "urn" ($J$, $V$), linked to $I$, $U$ by $J = I$.]
  14. Putting all together…
      Given a target user $u$, rank items by decreasing value of
      $$p(I = i \mid U = u) \propto \sum_{v\in\mathcal{U}} \frac{\sum_{j\in\mathcal{I}} r(u,j)\, r(v,j)}{\sum_{j\in\mathcal{I}} r(u,j)\; \sum_{j\in\mathcal{I}} r(v,j)}\, r(v,i) = \sum_{v\in\mathcal{U}} \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|_1\, \|\vec{v}\|_1}\, r(v,i)$$
      Quite the same as the heuristic user-based kNN scheme!
      [Figure: past user choices "urn" ($J$, $V$), linked to $I$, $U$ by $J = I$.]
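To make the "putting all together" step concrete, the sketch below computes the score in two ways: as the marginalization $\sum_v p(V = v \mid U = u, J = I)\, p(J = i \mid V = v)$ with the count-based estimates from the slides, and as the dot product over L1 norms. The function names `prob_user_knn` and `l1_form` and the toy matrix are assumptions for illustration; the code also assumes non-negative counts and that every user has at least one interaction.

```python
import numpy as np

def prob_user_knn(R, u):
    """p(I = i | U = u) via marginalization over neighbor users v."""
    R = np.asarray(R, dtype=float)
    overlap = R @ R[u]                 # sum_j r(u, j) r(w, j) for every user w
    p_v = overlap / overlap.sum()      # p(V = v | U = u, J = I)
    p_i_given_v = R / R.sum(axis=1, keepdims=True)  # p(J = i | V = v)
    return p_v @ p_i_given_v

def l1_form(R, u):
    """sum_v (u . v) / (|u|_1 |v|_1) r(v, i), the kNN-like closed form."""
    R = np.asarray(R, dtype=float)
    l1 = R.sum(axis=1)
    weights = (R @ R[u]) / (l1[u] * l1)
    return weights @ R

# Toy counts: 3 users x 3 items (made-up numbers, no empty rows).
R = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 3, 1]])
p = prob_user_knn(R, u=0)
s = l1_form(R, u=0)
print(p, s / s.sum())  # the two vectors should match
```

The two forms differ only by a factor that depends on $u$ alone, so they induce the same item ranking for a given target user; rescaling the second to sum to 1 recovers the first.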
  15. Other variants
      • Item-based
        $$p(I = i \mid U = u) \propto C \sum_{j\in\mathcal{I}} \frac{\vec{i}\cdot\vec{j}}{\|\vec{j}\|_1}\, r(u,j), \qquad C = \frac{\sum_{v\in\mathcal{U}} r(v,i)}{\sum_{v\in\mathcal{U}} r(v,i)\, \|\vec{v}\|_1}$$
      • Normalized variants
        – User-based
          $$p(I = i \mid U = u) \propto C \sum_{v\in\mathcal{U}} \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|_1\, \|\vec{v}\|_1}\, r(v,i), \qquad C = \frac{1}{\sum_{v\in\mathcal{U},\; r(v,i)>0} \vec{u}\cdot\vec{v}}$$
        – Item-based
          $$p(I = i \mid U = u) \propto C \sum_{j\in\mathcal{I}} \frac{\vec{i}\cdot\vec{j}}{\|\vec{j}\|_1}\, r(u,j), \qquad C = \frac{\|\vec{i}\|}{\sum_{j\in\mathcal{I},\; r(v,i)>0} \vec{i}\cdot\vec{j}}$$
  16. Popularity bias
      • If pairwise user independence: $p(V = v \mid U = u, J = I) = p(V = v)$
        $$p(I = i \mid U = u) \sim \sum_{v\in\mathcal{U}} p(J = i \mid V = v)\, p(V = v) = p(J = i)$$
      • $p(J = i)$ is the popularity of item $i$:
        $$p(J = i) \propto \sum_{v\in\mathcal{U}} r(v,i)$$
      • Therefore kNN:
        – Is biased towards popular items
        – Needs pairwise user-user dependence to work properly
      • Other kNN variants
        – Normalized user-based kNN is biased to the average rating
        – Item-based kNN (normalized or not) is also biased to popularity
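The collapse to popularity under user independence can be checked numerically. In the sketch below, $p(V = v)$ is estimated as user $v$'s share of all recorded interactions; that estimate is an assumption made for the example (the slide does not spell it out), as are the function names and the toy matrix. Every user is assumed to have at least one interaction.

```python
import numpy as np

def popularity(R):
    """p(J = i) estimated as item i's share of all interactions."""
    R = np.asarray(R, dtype=float)
    return R.sum(axis=0) / R.sum()

def independent_users_score(R):
    """sum_v p(J = i | V = v) p(V = v): the neighbor weight ignores the target user."""
    R = np.asarray(R, dtype=float)
    totals = R.sum(axis=1)
    p_v = totals / totals.sum()          # assumed estimate of p(V = v)
    p_i_given_v = R / totals[:, None]    # p(J = i | V = v)
    return p_v @ p_i_given_v

# Toy counts: 3 users x 3 items (made-up numbers, no empty rows).
R = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 3, 1]])
print(independent_users_score(R))
print(popularity(R))  # same vector: the score no longer depends on the target user
```

Because the weight on each neighbor no longer involves $u$, every target user receives the same ranking, which is the popularity ranking.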
  17. Experiments
      • Test probabilistic against heuristic variants
      • Check popularity biases
      • Datasets

        Dataset        Domain    # users    # items      # ratings
        MovieLens 1M   Movies      6,040      3,706      1,000,209
        Netflix        Movies    480,189     17,770    100,480,507
        Last.fm        Music         992    174,091        898,073
        Crowd random   Music       1,054      1,084        103,584

        MovieLens 1M, Netflix and Last.fm are public; Crowd random has a flat rating distribution over items.
      • Random rating split: 80% training, 20% test
      • Parameter tuning by grid search
      • Dirichlet smoothing for probabilistic kNN
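The slides mention Dirichlet smoothing without giving a formula. The sketch below uses the standard form from language modeling, $(r(v,i) + \mu\, p(i)) / (\sum_j r(v,j) + \mu)$, applied to $p(J = i \mid V = v)$ with the item popularity distribution as background. Which probabilities the authors actually smooth, the background they use, and the value of `mu` are not stated in the slides; all of these are assumptions here, as is the function name `dirichlet_smoothed`.

```python
import numpy as np

def dirichlet_smoothed(R, mu=10.0):
    """Dirichlet-smoothed estimate of p(J = i | V = v) (assumed form)."""
    R = np.asarray(R, dtype=float)
    background = R.sum(axis=0) / R.sum()       # item popularity as the prior
    totals = R.sum(axis=1, keepdims=True)
    # Each row blends the user's own counts with mu pseudo-counts of the prior.
    return (R + mu * background) / (totals + mu)

# Toy counts: 3 users x 3 items, one user with no interactions (made-up numbers).
R = np.array([[2, 0, 2],
              [0, 0, 0],
              [1, 3, 0]])
P = dirichlet_smoothed(R, mu=5.0)
print(P)
```

With `mu > 0`, every row is a proper distribution, items a user never touched get non-zero probability, and a user with no interactions falls back to the background. In an experiment `mu` would be one of the parameters tuned by grid search.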
  18. Public datasets – Results
      [Figure: nDCG@10 bar charts for MovieLens 1M, Netflix and Last.fm, comparing heuristic and probabilistic kNN; user-based and item-based, not normalized and normalized.]
      Similar accuracy overall
      Some improvements on item-based
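The results are reported as nDCG@10. For readers unfamiliar with the metric, here is a minimal sketch for a single user, assuming linear gains and a $1/\log_2(\text{rank}+1)$ discount; the slides do not say which nDCG variant was used, so this is one common definition rather than the authors' exact setup. The function name `ndcg_at_k` and the dictionary-based `relevance` argument are assumptions for the example.

```python
import numpy as np

def ndcg_at_k(ranked_items, relevance, k=10):
    """nDCG@k for one user; relevance maps item -> gain in the test set."""
    gains = np.array([relevance.get(i, 0.0) for i in ranked_items[:k]], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = float(gains @ discounts)
    # Ideal ranking: the test items sorted by decreasing gain.
    ideal = np.sort(np.array(list(relevance.values()), dtype=float))[::-1][:k]
    idcg = float(ideal @ (1.0 / np.log2(np.arange(2, ideal.size + 2))))
    return dcg / idcg if idcg > 0 else 0.0

# Toy test set: two relevant items (made-up identifiers).
relevance = {"a": 1.0, "b": 1.0}
print(ndcg_at_k(["a", "b", "c"], relevance))  # relevant items ranked first
print(ndcg_at_k(["c", "a", "b"], relevance))  # relevant items pushed down one rank
```

The reported figure would then be the mean of this value over all test users.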
  19. Crowdsourced dataset – Results
      [Figure: nDCG@10 bar chart for the crowdsourced dataset, comparing heuristic and probabilistic kNN; user-based and item-based, not normalized and normalized.]
      With flat ratings distribution…
      Heuristic item-based: as good as not normalized!
  20. Public datasets – Popularity biases
      User-based kNN (MovieLens 1M)
      [Figure: scatter plots for probabilistic and heuristic kNN, not normalized and normalized; axes: popularity and average rating.]
      Quite the same trends
  21. Public datasets – Popularity biases
      Item-based kNN (MovieLens 1M)
      [Figure: scatter plots for probabilistic and heuristic kNN, not normalized and normalized; axes: popularity and average rating.]
      Not quite the same trends
  22. Conclusion
      • Full probabilistic reformulation of kNN scheme
        – With classic variants
      • The probabilistic formulation…
        – Provides a precise explanation why kNN works, under what condition
        – Explains why kNN tends to recommend popular items
        – Has the advantages of a probabilistic formulation
      • Equivalent accuracy and behavior to heuristic formulations
        – More so for user-based variants
        – Probabilistic item-based is more consistent than heuristic
        – Accuracy of normalized kNN might be misrepresented on common datasets
      • Future work: explore further empirical optimization, inter-user dependency analysis, other collaborative filtering methods…
