
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation [ECIR '16 Slides]

Slides of the presentation given at ECIR 2016 for the following paper:

Daniel Valcarce, Javier Parapar, Alvaro Barreiro: Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation. ECIR 2016: 602-613

http://dx.doi.org/10.1007/978-3-319-30671-1_44



  1. 1. ECIR 2016, PADUA, ITALY EFFICIENT PSEUDO-RELEVANCE FEEDBACK METHODS FOR COLLABORATIVE FILTERING RECOMMENDATION Daniel Valcarce, Javier Parapar, Álvaro Barreiro @dvalcarce @jparapar @AlvaroBarreiroG Information Retrieval Lab @IRLab_UDC University of A Coruña Spain
  2. 2. Outline 1. Pseudo-Relevance Feedback (PRF) 2. Collaborative Filtering (CF) 3. PRF Methods for CF 4. Experiments 5. Conclusions and Future Work 1/28
  3. 3. PSEUDO-RELEVANCE FEEDBACK (PRF)
  4. 4. Pseudo-Relevance Feedback (I) Pseudo-Relevance Feedback provides an automatic method for query expansion: Assumes that the top documents retrieved with the original query are relevant (the pseudo-relevant set). The query is expanded with the most representative terms from this set. The expanded query is expected to yield better results than the original one. 3/28
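The expansion loop described on this slide can be sketched in a few lines of Python. This is a toy illustration only, not the retrieval models used in the paper: plain term frequency in the pseudo-relevant set stands in for a proper term-scoring function, and all names are invented.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k=3, n_terms=2):
    """Toy pseudo-relevance feedback: treat the top-k retrieved
    documents as relevant, pick the most frequent terms in them,
    and append those terms to the original query."""
    pseudo_relevant = ranked_docs[:k]          # assume top-k are relevant
    counts = Counter(t for doc in pseudo_relevant for t in doc)
    expansion = [t for t, _ in counts.most_common()
                 if t not in query_terms][:n_terms]
    return query_terms + expansion

docs = [["census", "population", "state"],
        ["most", "populated", "state", "census"],
        ["weather", "state"]]
print(expand_query(["populated", "state"], docs, k=2, n_terms=2))
```

The expanded query keeps the original terms and adds the strongest terms from the pseudo-relevant set, which is exactly the behaviour the next slides depict as a diagram.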
  5. 5. Pseudo-Relevance Feedback (II) [Diagram, built up over slides 5-12: an information need becomes a query, the retrieval system returns a ranking whose top documents form the pseudo-relevant set, and query expansion produces an expanded query that is submitted again] 4/28
  13. 13. Pseudo-Relevance Feedback (III) Some popular PRF approaches: Based on Rocchio’s model (Rocchio, 1971 & Carpineto et al., ACM TOIS 2001) Relevance-Based Language Models (Lavrenko & Croft, SIGIR 2001) Divergence Minimization Model (Zhai & Lafferty, SIGIR 2006) Mixture Models (Tao & Zhai, SIGIR 2006) 5/28
  14. 14. COLLABORATIVE FILTERING (CF)
  15. 15. Recommender Systems Notation: The set of users U The set of items I The rating that the user u gave to the item i is ru,i The set of items rated by user u is denoted by Iu The set of users that rated item i is denoted by Ui The neighbourhood of user u is denoted by Vu Top-N recommendation: create a ranked list containing relevant and unknown items for each user u ∈ U. 7/28
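The notation on this slide maps naturally onto a nested-dict rating matrix. A minimal sketch, with made-up users, items and ratings:

```python
# r[u][i] stored as a dict of dicts; all names are invented.
ratings = {
    "u1": {"Titanic": 2, "Avatar": 3, "Matrix": 5},
    "u2": {"Titanic": 4, "Matrix": 4},
    "u3": {"Avatar": 1, "Matrix": 2},
}

def items_of(u):
    """I_u: the set of items rated by user u."""
    return set(ratings[u])

def users_of(i):
    """U_i: the set of users that rated item i."""
    return {u for u, ru in ratings.items() if i in ru}
```

Top-N recommendation then amounts to ranking, for each user u, the items in I \ I_u by some scoring function.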
  16. 16. Collaborative Filtering (I) Collaborative Filtering (CF) employs past interactions between users and items to generate recommendations. Idea: if a user who is similar to you likes an item, maybe you will also like it. Different input data: Explicit feedback: ratings, reviews... Implicit feedback: clicks, purchases... It is perhaps the most popular approach to recommendation, given the increasing amount of information available about users. 8/28
  17. 17. Collaborative Filtering (II) Collaborative Filtering (CF) techniques can be classified in: Model-based methods: learn a predictive model from the user-item ratings. ◦ Matrix factorisation (e.g., SVD) Neighbourhood-based (or memory-based) methods: compute recommendations using directly part of the ratings. ◦ k-NN approaches 9/28
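As a concrete example of the neighbourhood-based family, the neighbourhood V_u can be formed with a plain k-NN over rating vectors. A sketch with made-up data and cosine similarity (the paper's experiments use Pearson's correlation; this is just the simplest runnable variant):

```python
import math

ratings = {  # invented toy ratings r[u][i]
    "u1": {"Titanic": 2, "Avatar": 3, "Matrix": 5},
    "u2": {"Titanic": 4, "Matrix": 4},
    "u3": {"Avatar": 1, "Matrix": 2},
}

def cosine(ru, rv):
    """Cosine similarity between two users' rating dicts."""
    dot = sum(ru[i] * rv[i] for i in set(ru) & set(rv))
    norm = math.sqrt(sum(x * x for x in ru.values())) * \
           math.sqrt(sum(x * x for x in rv.values()))
    return dot / norm if norm else 0.0

def knn(u, k=2):
    """V_u as the k users most similar to u (soft clustering)."""
    sims = sorted(((cosine(ratings[u], ratings[v]), v)
                   for v in ratings if v != u), reverse=True)
    return [v for _, v in sims[:k]]
```

Model-based methods would instead fit a factorisation of this matrix offline; memory-based methods like the one sketched here use the ratings directly at recommendation time.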
  18. 18. PRF METHODS FOR CF
  19. 19. PRF for CF
      PRF                             CF
      User's query                    User's profile
      mostˆ1, populatedˆ2, stateˆ2    Titanicˆ2, Avatarˆ3, Matrixˆ5
      Documents                       Neighbours
      Terms                           Items
      11/28
  21. 21. Previous Work on Adapting PRF Methods to CF Relevance-Based Language Models Originally devised for PRF (Lavrenko & Croft, SIGIR 2001). Adapted to CF (Parapar et al., Inf. Process. Manage. 2013). Two models: RM1 and RM2. High precision figures in recommendation. ... but high computational cost!
      RM1: $p(i|R_u) \propto \sum_{v \in V_u} p(v)\, p(i|v) \prod_{j \in I_u} p(j|v)$
      RM2: $p(i|R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} p(i|v)\, \frac{p(v)}{p(i)}\, p(j|v)$
      12/28
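A direct transcription of the RM2 formula shows where the cost comes from: scoring a single item multiplies over every j in I_u and sums over every v in V_u. A sketch with toy data, maximum-likelihood estimates, and a uniform user prior p(v) (the paper may estimate p(v) differently; all names are illustrative):

```python
ratings = {  # invented toy ratings r[u][i]
    "u1": {"Titanic": 2, "Avatar": 3, "Matrix": 5},
    "u2": {"Titanic": 4, "Matrix": 4},
    "u3": {"Avatar": 1, "Matrix": 2},
}

def p_item_given_user(i, v):
    """ML estimate p(i|v): v's rating of i over the sum of v's ratings."""
    return ratings[v].get(i, 0) / sum(ratings[v].values())

def p_item(i):
    """Collection prior p(i) over all ratings."""
    num = sum(r.get(i, 0) for r in ratings.values())
    den = sum(sum(r.values()) for r in ratings.values())
    return num / den

def rm2(i, u, neighbours):
    """p(i|R_u) ∝ p(i) · Π_{j∈I_u} Σ_{v∈V_u} p(i|v) (p(v)/p(i)) p(j|v)."""
    p_v = 1 / len(neighbours)          # uniform user prior (assumption)
    score = p_item(i)
    for j in ratings[u]:               # one factor per j ∈ I_u
        score *= sum(p_item_given_user(i, v) * p_v / p_item(i)
                     * p_item_given_user(j, v) for v in neighbours)
    return score
```

The nested product-of-sums is O(|I_u| · |V_u|) per candidate item, which is what makes the relevance models expensive compared with the Rocchio-style weights proposed next.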
  22. 22. Our Proposals based on Rocchio's Framework
      Rocchio's Weights: $p_{Rocchio}(i|u) \propto \frac{\sum_{v \in V_u} r_{v,i}}{|V_u|}$
      Robertson Selection Value: $p_{RSV}(i|u) \propto \frac{\sum_{v \in V_u} r_{v,i}}{|V_u|}\, p(i|V_u)$
      CHI-2: $p_{CHI2}(i|u) \propto \frac{(p(i|V_u) - p(i|C))^2}{p(i|C)}$
      Kullback-Leibler Divergence: $p_{KLD}(i|u) \propto p(i|V_u) \log \frac{p(i|V_u)}{p(i|C)}$
      13/28
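Given maximum-likelihood estimates of p(i|V_u) and p(i|C), each of the four weighting functions is only a few lines of code. A sketch over a made-up rating dict, where `neighbours` plays the role of V_u:

```python
import math

ratings = {  # invented toy ratings r[u][i]
    "u1": {"Titanic": 2, "Avatar": 3, "Matrix": 5},
    "u2": {"Titanic": 4, "Matrix": 4},
    "u3": {"Avatar": 1, "Matrix": 2},
}

def p_vu(i, neighbours):
    """MLE of p(i|V_u) over the neighbourhood's ratings."""
    num = sum(ratings[v].get(i, 0) for v in neighbours)
    den = sum(sum(ratings[v].values()) for v in neighbours)
    return num / den

def p_c(i):
    """MLE of p(i|C) over the whole collection."""
    num = sum(r.get(i, 0) for r in ratings.values())
    den = sum(sum(r.values()) for r in ratings.values())
    return num / den

def rocchio(i, neighbours):
    return sum(ratings[v].get(i, 0) for v in neighbours) / len(neighbours)

def rsv(i, neighbours):
    return rocchio(i, neighbours) * p_vu(i, neighbours)

def chi2(i, neighbours):
    return (p_vu(i, neighbours) - p_c(i)) ** 2 / p_c(i)

def kld(i, neighbours):
    pv = p_vu(i, neighbours)
    return pv * math.log(pv / p_c(i)) if pv > 0 else 0.0
```

Each score touches every neighbour's ratings exactly once, so a single pass over V_u suffices per item, in contrast with the nested product-of-sums of RM1/RM2.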
  25. 25. Probability Estimation Maximum Likelihood Estimate under a Multinomial Distribution over the ratings:
      $p_{mle}(i|V_u) = \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u} \sum_{j \in I} r_{v,j}}$
      $p_{mle}(i|C) = \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U} \sum_{j \in I} r_{u,j}}$
      14/28
  27. 27. Neighbourhood Length Normalisation (I) Neighbourhoods are computed using clustering algorithms: Hard clustering: every user is in only one cluster. Clusters may have different sizes. Example: k-means. Soft clustering: each user has its own neighbours. When we set k to a high value, we may find different numbers of neighbours. Example: k-NN. Idea: consider the variability of the neighbourhood lengths: A big neighbourhood is equivalent to a query with many results: the collection model is close to the target user. A small neighbourhood implies that the neighbours are highly specific: the collection is very different from the target user. 15/28
  28. 28. Neighbourhood Length Normalisation (II) We bias the MLE to perform neighbourhood length normalisation:
      $p_{nmle}(i|V_u) \overset{rank}{\propto} \frac{1}{|V_u|} \cdot \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u} \sum_{j \in I} r_{v,j}}$
      $p_{nmle}(i|C) \overset{rank}{\propto} \frac{1}{|U|} \cdot \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U} \sum_{j \in I} r_{u,j}}$
      16/28
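The biased estimate amounts to dividing the MLE by the neighbourhood (or collection) size. For a fixed user this factor is constant, so each estimate on its own is rank-equivalent to the MLE; what changes is the interplay between p(i|V_u) and p(i|C) inside weights like CHI-2 and KLD. A sketch over a made-up rating dict:

```python
ratings = {  # invented toy ratings r[u][i]
    "u1": {"Titanic": 2, "Avatar": 3, "Matrix": 5},
    "u2": {"Titanic": 4, "Matrix": 4},
    "u3": {"Avatar": 1, "Matrix": 2},
}

def p_mle_vu(i, neighbours):
    """Plain MLE of p(i|V_u)."""
    num = sum(ratings[v].get(i, 0) for v in neighbours)
    den = sum(sum(ratings[v].values()) for v in neighbours)
    return num / den

def p_nmle_vu(i, neighbours):
    """Length-normalised MLE: the MLE divided by |V_u|. Penalises
    large neighbourhoods, whose model is close to the collection
    and thus less specific to the target user."""
    return p_mle_vu(i, neighbours) / len(neighbours)
```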
  29. 29. EXPERIMENTS
  30. 30. Experimental settings Baselines: UB: traditional user-based neighbourhood approach. SVD: matrix factorisation. UIR-Item: probabilistic approach. RM1 and RM2: Relevance-Based Language Models. Our algorithms: Rocchio’s Weights (RW) Robertson Selection Value (RSV) CHI-2 Kullback-Leibler Divergence (KLD) 18/28
  31. 31. Efficiency [Figure: recommendation time per user in seconds (log scale, 0.01 to 10) on ML 100k, ML 1M and ML 10M for UIR, RM1, RM2, SVD++, RSV, UB, RW, CHI-2 and KLD] 19/28
  32. 32. Accuracy (nDCG@10)
      Algorithm          ML 100k     ML 1M      R3-Yahoo!   LibraryThing
      UB                 0.0468      0.0313     0.0108      0.0055b
      SVD                0.0936a     0.0608a    0.0101      0.0015
      UIR-Item           0.2188ab    0.1795abd  0.0174abd   0.0673abd
      RM1                0.2473abc   0.1402ab   0.0146ab    0.0444ab
      RM2                0.3323abcd  0.1992abd  0.0207abcd  0.0957abcd
      Rocchio's Weights  0.2604abcd  0.1557abd  0.0194abcd  0.0892abcd
      RSV                0.2604abcd  0.1557abd  0.0194abcd  0.0892abcd
      KLD (MLE)          0.2693abcd  0.1264ab   0.0197abcd  0.1576abcde
      KLD (NMLE)         0.3120abcd  0.1546ab   0.0201abcd  0.1101abcde
      CHI-2 (MLE)        0.0777a     0.0709ab   0.0149ab    0.0939abcd
      CHI-2 (NMLE)       0.3220abcd  0.1419ab   0.0204abcd  0.1459abcde
      Table: Values of nDCG@10. Pink = best algorithm. Blue = not significantly different from the best (Wilcoxon two-sided p < 0.01). 20/28
  33. 33. Diversity (Gini@10)
      Algorithm    ML 100k  ML 1M   R3-Yahoo!  LibraryThing
      UIR-Item     0.0124   0.0050  0.0137     0.0005
      RM2          0.0256   0.0069  0.0207     0.0019
      CHI-2 NMLE   0.0450   0.0106  0.0506     0.0539
      Table: Values of the complement of the Gini index at 10. Pink = best algorithm. 21/28
  34. 34. Novelty (MSI@10)
      Algorithm    ML 100k  ML 1M     R3-Yahoo!  LibraryThing
      UIR-Item     5.2337e  8.3713e   3.7186e    17.1229e
      RM2          6.8273c  8.9481c   4.9618c    19.27343c
      CHI-2 NMLE   8.1711ec 10.0043ec 7.5555ec   8.8563
      Table: Values of Mean Self-Information at 10. Pink = best algorithm. 22/28
  35. 35. Trade-off Accuracy-Diversity [Figure: G-measure of nDCG@10 and Gini@10 on MovieLens 100k, varying the number of neighbours k from 200 to 900 using Pearson's correlation similarity; curves for RM2 and CHI-2 NMLE] 23/28
  36. 36. Trade-off Accuracy-Novelty [Figure: G-measure of nDCG@10 and MSI@10 on MovieLens 100k, varying the number of neighbours k from 200 to 900 using Pearson's correlation similarity; curves for RM2 and CHI-2 NMLE] 24/28
  37. 37. CONCLUSIONS AND FUTURE WORK
  38. 38. Conclusions We proposed to use fast PRF methods (Rocchio's Weights, RSV, KLD and CHI-2): They are orders of magnitude faster than the Relevance Models (up to 200x). They generate quite accurate recommendations. Good novelty and diversity figures, with a better trade-off than RM2. They are parameter-free (apart from the clustering parameters). 26/28
  40. 40. Future Work Other approaches for computing neighbourhoods: Posterior Probability Clustering (a non-negative matrix factorisation). Normalised Cut (spectral clustering). Explore other PRF methods: Divergence Minimization Models. Mixture Models. 27/28
  41. 41. THANK YOU! @DVALCARCE http://www.dc.fi.udc.es/~dvalcarce
