The Wisdom of the Few @SIGIR09


Presenting The Wisdom of the Few, a collaborative filtering approach based on expert opinions from the web. This presentation was given at SIGIR 2009 (July 2009, Boston, MA).



1. The Wisdom of the Few
   - A Collaborative Filtering Approach Based on Expert Opinions from the Web
   - Xavier Amatriain (@xamat), Josep M. Pujol, Nuria Oliver (Telefonica Research, Barcelona)
   - Neal Lathia (UCL, London)
2. First, a little quiz
   - Name that book... "It is really only experts who can reliably account for their reactions"
3. Crowds are not always wise
   - Collaborative filtering (CF) is the preferred approach for recommender systems
     - Recommendations are drawn from your past behavior and that of similar users in the system
     - Standard CF approach:
       - Find your neighbors in the set of other users
       - Recommend things that your neighbors liked and you have not "seen"
   - Problem: predictions are based on a large dataset that is sparse and noisy
4. Overview of the Approach
   - Expert: an individual we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain
   - Expert-based Collaborative Filtering: find neighbors in a reduced set of experts instead of among regular users
     - Identify domain experts with reliable ratings
     - For each user, compute "expert neighbors"
     - Compute recommendations as in standard kNN CF
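The neighbor-selection steps above can be sketched in Python. The deck does not name a similarity measure, so cosine similarity over co-rated items is assumed here for illustration; all names and the data layout are hypothetical:

```python
import math

def similarity(user_ratings, expert_ratings):
    """Cosine similarity over items both the user and the expert rated.
    Ratings are dicts mapping item -> score (illustrative layout)."""
    common = set(user_ratings) & set(expert_ratings)
    if not common:
        return 0.0
    dot = sum(user_ratings[i] * expert_ratings[i] for i in common)
    nu = math.sqrt(sum(user_ratings[i] ** 2 for i in common))
    ne = math.sqrt(sum(expert_ratings[i] ** 2 for i in common))
    return dot / (nu * ne) if nu and ne else 0.0

def expert_neighbors(user_ratings, experts, delta):
    """Keep only experts whose similarity to the user exceeds delta,
    mirroring the 'compute expert neighbors' step."""
    return {e: s for e, s in
            ((e, similarity(user_ratings, r)) for e, r in experts.items())
            if s > delta}
```

Because the expert set is small (169 critics in this work), this filtering pass is cheap enough to run per user at request time.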
5. Advantages of the Approach
   - Noise: experts introduce less natural noise
   - Malicious ratings: the dataset can be monitored to avoid shilling
   - Data sparsity: a reduced set of domain experts can be motivated to rate items
   - Cold-start problem: experts rate items as soon as they are available
   - Scalability: the dataset is several orders of magnitude smaller
   - Privacy: recommendations can be computed locally
6. Take-home message
   - Expert Collaborative Filtering:
     - Is a new approach to recommendation, but it builds on standard CF
     - Addresses many of standard CF's shortcomings
     - At least in some conditions, users prefer it over standard CF approaches
7. User Study
8. User Study
   - 57 participants, only 14.5 ratings per participant
   - 50% of the users consider expert-based CF to be good or very good
   - Expert-based CF is the only algorithm with an average rating over 3 (on a 0-4 scale)
9. User Study
   - Results for the questions "The recommendation list includes movies I like/dislike" (1-4 Likert scale)
   - Expert-based CF clearly outperforms the other methods
10. Expert Collaborative Filtering
11. Expert-based CF
   - Given a user u ∈ U and a similarity threshold δ, find the set of experts E' ⊆ E such that ∀e ∈ E': sim(u, e) ≥ δ
   - Confidence threshold τ: the minimum number of expert neighbors who must have rated the item in order to trust their prediction
     - Given an item i, find E'' ⊆ E' such that ∀e ∈ E'': r_ei ≠ unrated, and let n = |E''|
       - if n < τ, no prediction can be made and the user's mean rating is returned
       - if n ≥ τ, the rating is predicted as the similarity-weighted average of the ratings input from each expert e in E''
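The prediction rule on this slide can be written as a short Python sketch. The deck gives only the rule, not an implementation, so the function names and data layout below are illustrative:

```python
def predict(user_ratings, neighbors, experts, item, tau):
    """Predict a rating for `item` as the similarity-weighted average of
    the ratings of expert neighbors who rated it (the set E'' above).
    Falls back to the user's mean rating when fewer than tau such
    experts exist (n < tau)."""
    # E'': the expert neighbors who actually rated this item
    rated = [(sim, experts[e][item]) for e, sim in neighbors.items()
             if item in experts[e]]
    if len(rated) < tau:
        # Not enough expert opinions to trust a prediction
        return sum(user_ratings.values()) / len(user_ratings)
    total_sim = sum(sim for sim, _ in rated)
    return sum(sim * r for sim, r in rated) / total_sim
```

Note that the fallback to the user mean is what keeps coverage high even when the expert set is tiny compared to the user base.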
12. Experts vs. Users Analysis
13. Mining the Web for Expert Ratings
   - Collections of expert ratings can be obtained almost directly on the web: we crawled the Rotten Tomatoes movie-critics mash-up
     - Only those critics (169) with more than 250 ratings matched in the Netflix dataset were used
14. Dataset Analysis (number of ratings)
   - Sparsity coefficient: 0.01 (users) vs. 0.07 (experts)
   - The average movie has ~1K user ratings vs. ~100 expert ratings
   - The average expert rated 400 movies; 10% rated more than 1K
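The sparsity coefficient on this slide is the filled fraction of the rater-by-item matrix, which is simple arithmetic to reproduce (the catalog size in the example below is assumed purely for illustration; the slide does not state it):

```python
def sparsity_coefficient(num_ratings, num_raters, num_items):
    """Filled fraction of the rater-by-item rating matrix; the slide
    reports 0.01 for Netflix users vs. 0.07 for the 169 experts."""
    return num_ratings / (num_raters * num_items)

# e.g. 169 experts averaging 400 ratings each, over an assumed
# catalog of 8,000 movies, gives a coefficient of 400/8000 = 0.05
coeff = sparsity_coefficient(169 * 400, 169, 8000)
```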
15. Dataset Analysis (averages)
   - Users: average movie rating ~0.55 (3.2⋆)
     - for 10% of movies it is ≤ 0.45 (2.8⋆), for 10% it is ≥ 0.7 (3.8⋆)
   - Experts: average movie rating ~0.6 (3.4⋆)
     - for 10% of movies it is ≤ 0.4 (2.6⋆), for 10% it is ≥ 0.8 (4.2⋆)
   - Per-user average ratings are centered around 0.7 (3.8⋆)
   - Per-expert average ratings are centered around 0.6 (3.4⋆), with small variability
     - only 10% of the experts have a mean score ≤ 0.55 (3.2⋆) and another 10% ≥ 0.7 (3.8⋆)
16. Dataset Analysis (standard deviation)
   - Users:
     - std per movie centered around 0.25 (1⋆), with little variation
     - std per user centered around 0.25, with larger variability
   - Experts:
     - lower std per movie (0.15), with larger variation
     - average std per expert = 0.2, with small variability
17. Dataset Analysis: Summary
   - Experts...
     - are much less sparse
     - rate movies all over the rating scale instead of being biased towards rating only "good" movies (different incentives)
     - nevertheless seem to consistently agree on the good movies
     - have a lower overall standard deviation per movie: they tend to agree more than regular users do
     - tend to deviate less from their personal average rating
18. Experimental Results
19. Evaluation Procedure
   - Use the 169 experts to predict ratings for 10,000 users sampled from the Netflix dataset
   - Prediction MAE using an 80-20 holdout procedure (5-fold cross-validation)
   - Top-N precision, classifying items as "recommendable" given a rating threshold
   - Still, take the results with a grain of salt... we have a user study backing up the approach
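The 80-20 holdout MAE step can be sketched as follows (one fold shown; the protocol averages five such folds). The data layout and `predict_fn` API are assumptions for illustration, not the study's code:

```python
import random

def holdout_mae(ratings, predict_fn, test_frac=0.2, seed=0):
    """Mean Absolute Error on a random held-out test split.
    `ratings` is a list of (user, item, true_rating) triples and
    `predict_fn(user, item)` is any rating predictor."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))  # 80% train / 20% test
    test = shuffled[cut:]
    errors = [abs(predict_fn(u, i) - r) for u, i, r in test]
    return sum(errors) / len(errors)
```

Repeating this with five disjoint test splits and averaging gives the cross-validated MAE figures reported on the next slides.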
20. Results: Prediction MAE
   - Setting our parameters to τ = 10 and δ = 0.01, we obtain an MAE of 0.781 and a coverage of 97.7%
     - Expert-based CF yields a significant accuracy improvement over using the experts' average
     - Accuracy is worse than standard CF (though coverage is better)
21. Role of the Thresholds
   - MAE decreases as the similarity threshold (δ) increases, up to the 0.06 mark, where it starts to increase again for higher δ values
     - below δ = 0.0 it degrades rapidly: too many experts are included
   - Coverage decreases as we increase δ
     - At the MAE-optimal point of δ = 0.06, coverage is still above 70%
   - MAE as a function of the confidence threshold (τ), for δ = 0.0 and δ = 0.01 (optimal around τ = 9)
22. Comparison to Standard CF
   - Standard NN CF has an MAE around 10% better, but its coverage is also around 10% lower
   - Expert-based CF only performs worse for the 10% of users with the lowest MAE
23. Results 2: Top-N Precision
   - Precision of the top-N recommendations as a function of the "recommendable" threshold
   - For a threshold of 4, NN CF outperforms expert-based CF, but if we lower it to 3 they are almost equal
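The precision metric above counts how many recommended items meet the "recommendable" rating threshold in the held-out data. A minimal sketch, with illustrative names and data layout:

```python
def topn_precision(recommended, true_ratings, threshold):
    """Fraction of the recommended items whose held-out rating meets the
    'recommendable' threshold (4 or 3 stars on this slide)."""
    hits = sum(1 for item in recommended
               if true_ratings.get(item, 0) >= threshold)
    return hits / len(recommended)
```

Lowering the threshold from 4 to 3 enlarges the set of items counted as hits, which is why the two methods converge at the looser setting.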
24. Conclusions
   - A different approach to the recommendation problem
   - At least in some conditions, users prefer recommendations from similar experts over those from similar users
   - Expert-based CF has the potential to address many of standard CF's shortcomings
25. Future/Current Work
   - We are currently exploring its performance in other domains and implementing a distributed expert-based CF application (work with Jae-Wook Ahn, University of Pittsburgh)
26. The Wisdom of the Few
   - Thanks!
   - Questions?
