
The Wisdom of the Few @SIGIR09


Presenting The Wisdom of the Few, a collaborative filtering approach based on expert opinions from the Web. This presentation was given at SIGIR 2009 (July 2009, Boston, MA).



  1. The Wisdom of the Few
     - A Collaborative Filtering Approach Based on Expert Opinions from the Web
     - Xavier Amatriain (@xamat), Josep M. Pujol, Nuria Oliver (Telefonica Research, Barcelona)
     - Neal Lathia (UCL, London)
  2. First, a little quiz
     - Name that book... "It is really only experts who can reliably account for their reactions"
  3. Crowds are not always wise
     - Collaborative filtering is the preferred approach for recommender systems
       - Recommendations are drawn from your past behavior and that of similar users in the system
       - Standard CF approach:
         - Find your neighbors from the set of other users
         - Recommend things that your neighbors liked and you have not "seen"
     - Problem: predictions are based on a large dataset that is sparse and noisy
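The standard kNN CF approach on this slide can be sketched in a few lines. This is not code from the paper, just a minimal illustration: ratings are dicts mapping item to score, similarity is plain cosine, and the neighborhood size `k` is an arbitrary choice.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, others, k=2, n=3):
    """Recommend up to n items the user has not 'seen',
    scored by the similarity-weighted ratings of the k nearest neighbors."""
    neighbors = sorted(others, key=lambda o: cosine_sim(user, o), reverse=True)[:k]
    scores = {}
    for other in neighbors:
        w = cosine_sim(user, other)
        for item, rating in other.items():
            if item not in user:  # only unseen items are candidates
                scores[item] = scores.get(item, 0.0) + w * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

The same skeleton carries over to expert-based CF: only the pool `others` changes, from all users to the expert set.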
  4. Overview of the Approach
     - Expert = an individual we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain
     - Expert-based Collaborative Filtering: find neighbors from a reduced set of experts instead of regular users
       - Identify domain experts with reliable ratings
       - For each user, compute "expert neighbors"
       - Compute recommendations as in standard kNN CF
  5. Advantages of the Approach
     - Noise: experts introduce less natural noise
     - Malicious ratings: the dataset can be monitored to avoid shilling
     - Data sparsity: a reduced set of domain experts can be motivated to rate items
     - Cold-start problem: experts rate items as soon as they are available
     - Scalability: the dataset is several orders of magnitude smaller
     - Privacy: recommendations can be computed locally
  6. Take-home message
     - Expert collaborative filtering:
       - Is a new approach to recommendation, but builds on standard CF
       - Addresses many of standard CF's shortcomings
       - At least in some conditions, users prefer it over standard CF approaches
  7. User study
  8. User Study
     - 57 participants, only 14.5 ratings/participant
     - 50% of the users consider expert-based CF to be good or very good
     - Expert-based CF was the only algorithm with an average rating over 3 (on a 0-4 scale)
  9. User Study
     - Responses to the questions "The recommendation list includes movies I like/dislike" (1-4 Likert scale)
     - Expert-based CF clearly outperforms the other methods
  10. Expert Collaborative Filtering
  11. Expert-based CF
     - Given a user u ∈ U and a similarity threshold δ, find the set of experts E' ⊆ E such that ∀e ∈ E', sim(u, e) ≥ δ
     - Confidence threshold τ = the minimum number of expert neighbors who must have rated the item in order to trust their prediction
       - Given an item i, find E'' ⊆ E' s.t. ∀e ∈ E'', r_ei ≠ unrated; let n = |E''|
         - If n < τ ⇒ no prediction; the user mean is returned
         - If n ≥ τ ⇒ the rating is predicted as the similarity-weighted average of the ratings from each expert e in E''
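The prediction rule above translates directly into code. A minimal sketch, assuming similarities are precomputed and using the slide's fallback to the user mean (variable names `delta` and `tau` mirror the thresholds; they are not identifiers from the paper):

```python
def predict(user_sims, expert_ratings, item, user_mean, delta=0.01, tau=10):
    """Predict a rating for `item` from expert opinions.

    user_sims: {expert_id: sim(u, e)} precomputed similarities for user u.
    expert_ratings: {expert_id: {item: rating}}.
    Returns the similarity-weighted average over experts with sim >= delta
    who rated the item; falls back to user_mean if fewer than tau exist.
    """
    # E': experts similar enough to the user (sim >= delta)
    e_prime = {e: s for e, s in user_sims.items() if s >= delta}
    # E'': the subset of E' that actually rated the item
    e_pp = {e: s for e, s in e_prime.items() if item in expert_ratings[e]}
    if len(e_pp) < tau:
        return user_mean  # confidence too low: no prediction, return user mean
    weighted = sum(s * expert_ratings[e][item] for e, s in e_pp.items())
    return weighted / sum(e_pp.values())
```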
  12. Experts vs. Users Analysis
  13. Mining the Web for Expert Ratings
     - Collections of expert ratings can be obtained almost directly on the Web: we crawled the Rotten Tomatoes movie-critics mash-up
       - Only those critics (169) with more than 250 ratings in the Netflix dataset were used
  14. Dataset Analysis (# ratings)
     - Sparsity coefficient: 0.01 (users) vs. 0.07 (experts)
     - The average movie has ~1K user ratings vs. ~100 expert ratings
     - The average expert rated ~400 movies; 10% rated > 1K
  15. Dataset Analysis (average)
     - Users: average movie rating ~0.55 (3.2⋆)
       - 10% ≤ 0.45 (2.8⋆), 10% ≥ 0.7 (3.8⋆)
     - Experts: average movie rating ~0.6 (3.4⋆)
       - 10% ≤ 0.4 (2.6⋆), 10% ≥ 0.8 (4.2⋆)
     - Per-user rating means are centered around 0.7 (3.8⋆)
     - Per-expert rating means are centered around 0.6 (3.4⋆), with small variability
       - Only 10% of the experts have a mean score ≤ 0.55 (3.2⋆) and another 10% ≥ 0.7 (3.8⋆)
  16. Dataset Analysis (std)
     - Users:
       - Std per movie centered around 0.25 (1⋆), with little variation
       - Std per user centered around 0.25, with larger variability
     - Experts:
       - Lower std per movie (0.15) and larger variation
       - Average std per expert = 0.2, with small variability
  17. Dataset Analysis: Summary
     - Experts...
       - are much less sparse
       - rate movies all over the rating scale instead of being biased towards rating only "good" movies (different incentives)
       - but seem to consistently agree on the good movies
       - have a lower overall standard deviation per movie: they tend to agree more than regular users
       - tend to deviate less from their personal average rating
  18. Experimental Results
  19. Evaluation Procedure
     - Use the 169 experts to predict ratings for 10,000 users sampled from the Netflix dataset
     - Prediction MAE measured with an 80-20 holdout procedure (5-fold cross-validation)
     - Top-N precision measured by classifying items as "recommendable" given a rating threshold
     - Still, take these results with a grain of salt... we also have a user study backing up the approach
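The two evaluation ingredients on this slide, MAE and an 80-20 holdout split, are simple to state in code. A minimal sketch (not the paper's evaluation harness; the fixed seed is just for reproducibility of the illustration):

```python
import random

def mae(pairs):
    """Mean absolute error over (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

def holdout_split(ratings, test_frac=0.2, seed=0):
    """Random holdout split of (user, item, rating) triples.

    With test_frac=0.2 this is the 80-20 split; repeating it with
    different seeds gives the cross-validation folds.
    """
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]
```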
  20. Results: Prediction MAE
     - Setting our parameters to τ = 10 and δ = 0.01, we obtain a MAE of 0.781 and a coverage of 97.7%
       - Expert-based CF yields a significant accuracy improvement with respect to using the experts' average
       - Accuracy is worse than standard CF (but coverage is better)
  21. Role of the Thresholds
     - MAE decreases as the similarity threshold (δ) grows, up to the 0.06 mark, where it starts to increase for higher δ values
       - For very low δ it degrades rapidly: too many experts are included
     - Coverage decreases as we increase δ
       - At the optimal MAE point of δ = 0.06, coverage is still above 70%
     - MAE as a function of the confidence threshold (τ), with δ = 0.01: the optimum is around τ = 9
  22. Comparison to Standard CF
     - Standard NN CF has a MAE around 10% better, but its coverage is also 10% lower
     - Expert-based CF only performs worse for the 10% of users with the lowest MAE
  23. Results 2: Top-N Precision
     - Precision of the top-N recommendations as a function of the "recommendable" threshold
     - For a threshold of 4, NN-CF outperforms expert-based CF, but if we lower it to 3 they are almost equal
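The top-N precision metric used here can be sketched as follows, assuming the slide's definition: an item counts as "recommendable" when its true rating meets the threshold, and precision is the recommendable fraction of the top-N predicted items (an illustrative helper, not the paper's code):

```python
def precision_at_n(predicted, actual, n=10, threshold=4):
    """Fraction of the top-n predicted items that are 'recommendable'.

    predicted: {item: predicted rating}; actual: {item: true rating}.
    An item is recommendable if its true rating >= threshold.
    """
    top = sorted(predicted, key=predicted.get, reverse=True)[:n]
    hits = sum(1 for item in top if actual.get(item, 0) >= threshold)
    return hits / len(top)
```

Raising the threshold makes "recommendable" stricter, which is why the ranking of NN-CF vs. expert-based CF can flip between thresholds 3 and 4.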
  24. Conclusions
     - A different approach to the recommendation problem
     - At least in some conditions, users prefer recommendations from similar experts over those from similar users
     - Expert-based CF has the potential to address many of standard CF's shortcomings
  25. Future/Current Work
     - We are currently exploring its performance in other domains and implementing a distributed expert-based CF application (work with Jae-Wook Ahn, University of Pittsburgh)
  26. The Wisdom of the Few
     - Thanks!
     - Questions?