The Science and the Magic of User Feedback for Recommender Systems

Slides I used as the basis for a set of invited talks at Bay Area companies such as Netflix and LinkedIn in March 2011.



  1. The Science and the Magic of User Feedback for Recommender Systems. Xavier Amatriain, Bay Area, March 2011
  2. But first... About Telefonica and Telefonica R&D
  3. Telefonica is a fast-growing Telecom

                    1989                         2000                               2008
     Clients        About 12 million subscribers About 68 million customers         About 260 million customers
     Services       Basic telephone and          Wireline and mobile voice,         Integrated ICT solutions
                    data services                data and Internet services         for all customers
     Geographies    Operations in Spain          Operations in 16 countries         Operations in 25 countries
     Staff          About 71,000 professionals   About 149,000 professionals        About 257,000 professionals
     Finances       Rev: 4,273 M€                Rev: 28,485 M€                     Rev: 57,946 M€
                    EPS(1): 0.45 €               EPS(1): 0.67 €                     EPS: 1.63 €

     (1) EPS: Earnings per share
  4. Currently among the largest in the world. Telco sector worldwide ranking by market cap (US$ bn); source: Bloomberg, 06/12/09. Just announced 2010 results: record net earnings, first Spanish company ever to make > 10B €.
  5. Leader in South America. Data as of March '09 (figure: wireline and mobile market rank per country).
     Accesses: Argentina 20.9 million, Brazil 61.4 million, Central America 6.1 million, Colombia 12.6 million,
     Chile 10.1 million, Ecuador 3.3 million, Mexico 15.7 million, Peru 15.2 million, Uruguay 1.5 million,
     Venezuela 12.0 million. Total accesses (as of March '09): 159.5 million.
     Notes: Central America includes Guatemala, Panama, El Salvador and Nicaragua. The total accesses figure
     includes narrowband Internet accesses of Terra Brasil and Terra Colombia, and broadband Internet accesses
     of Terra Brasil, Telefónica de Argentina, Terra Guatemala and Terra México.
  6. And a significant footprint in Europe. Data as of March '09 (figure: wireline and mobile market rank per country).
     Accesses: Spain 47.2 million, UK 20.8 million, Germany 16.0 million, Ireland 1.7 million,
     Czech Republic 7.7 million, Slovakia 0.4 million. Total accesses (as of March '09): 93.8 million.
  7. Scientific Research areas: Mobile and Ubicomp, Multimedia Core, User Modelling & Data Mining (HCIR, Data Mining), Wireless Systems, Content Distribution & P2P, Social Networks.
  8. Projects, grouped under Recommendation Algorithms and User Analysis & Modeling: tourist routes, social contacts, music, movies, contextual recommendations, The Wisdom of the Few, noise in users' ratings, mobile tourist behavior, microprofiles, implicit user feedback, Multiverse Tensor Factorization, IPTV viewing habits.
  9. Projects (same project map as the previous slide).
  10. And about the world we live in...
  11. Information Overload
  12. More is Less (figure labels: "Worse Decisions", "Less Decisions")
  13. Analysis Paralysis is making headlines
  14. Search engines don’t always hold the answer
  15. What about discovery?
  16. What about curiosity?
  17. What about information to help take decisions?
  18. The Age of Search has come to an end... long live the Age of Recommendation! ● Chris Anderson in "The Long Tail": "We are leaving the age of information and entering the age of recommendation" ● CNN Money, "The race to create a smart Google": "The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you."
  19. Recommender Systems (figure: example recommendations, e.g. "Read this", "Attend this conference")
  20. Data mining + all those other things ● User Interface ● User modeling ● System requirements (efficiency, scalability, privacy...) ● Business Logic ● Serendipity ● ...
  21. Approaches to Recommendation ● Collaborative Filtering: recommend items based only on the users' past behavior ● Content-based: recommend based on features inherent to the items ● Social recommendations (trust-based)
  22. What works ● It depends on the domain and the particular problem ● As a general rule, it is usually a good idea to combine approaches: Hybrid Recommender Systems ● However, in the general case it has been demonstrated that (currently) the best isolated approach is CF ● Item-based CF is in general more efficient and better, but mixing CF approaches can improve results ● Other approaches can improve results in specific cases (cold-start problem...)
  23. The CF Ingredients ● List of m users and a list of n items ● Each user has a list of items with an associated opinion ● Explicit opinion: a rating score (numerical scale) ● Implicit feedback: purchase records or listening history ● Active user for whom the prediction task is performed ● A metric for measuring similarity between users ● A method for selecting a subset of neighbors ● A method for predicting a rating for items not rated by the active user (a minimal sketch follows below)
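To make the ingredient list concrete, here is a minimal user-based kNN sketch in Python. The 0-means-unrated encoding, the cosine similarity over co-rated items, and the mean-centred weighted average are illustrative textbook choices, not the specific algorithms evaluated later in the talk.

```python
import numpy as np

def predict_rating(R, user, item, k=10):
    """Minimal user-based kNN CF sketch.
    R: m x n rating matrix with 0 meaning 'not rated' (assumed encoding)."""
    def cosine(a, b):
        mask = (a > 0) & (b > 0)                 # similarity over co-rated items only
        if not mask.any():
            return 0.0
        denom = np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])
        return float(np.dot(a[mask], b[mask]) / denom) if denom else 0.0

    candidates = np.where(R[:, item] > 0)[0]     # users who rated this item
    candidates = candidates[candidates != user]
    if candidates.size == 0:
        return float(R[R > 0].mean())            # nothing to work with: global mean

    sims = np.array([cosine(R[user], R[v]) for v in candidates])
    order = np.argsort(sims)[-k:]                # keep the k most similar neighbours
    top, top_sims = candidates[order], sims[order]
    if top_sims.sum() <= 0:
        return float(R[R > 0].mean())

    # Prediction: active user's mean plus similarity-weighted, mean-centred deviations
    user_mean = R[user][R[user] > 0].mean() if (R[user] > 0).any() else R[R > 0].mean()
    neigh_means = np.array([R[v][R[v] > 0].mean() for v in top])
    return float(user_mean + np.dot(top_sims, R[top, item] - neigh_means) / top_sims.sum())
```

Swapping the similarity metric or the neighbour-selection rule changes only the middle two steps, which is exactly the modularity the ingredient list describes.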
  24. The Netflix Prize ● 500K users x 17K movie titles = 100M ratings = $1M (if you "only" improve the existing system by 10%, from 0.95 to 0.85 RMSE) ● 49K contestants on 40K teams from 184 countries ● 41K valid submissions from 5K teams; 64 submissions per day ● The winning approach uses hundreds of predictors from several teams
  25. But ...
  26. User Feedback is Noisy (cartoon: "DID YOU HEAR WHAT I LIKE??!!") ... and limits our prediction accuracy
  27. The Magic Barrier● Magic Barrier = Limit on prediction accuracy due to noise in original data● Natural Noise = involuntary noise introduced by users when giving feedback ● Due to (a) mistakes, and (b) lack of resolution in personal rating scale● Magic Barrier >= Natural Noise Threshold ● Our prediction error cannot be smaller than the error in the original data
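One way to make the last bullet precise, as a sketch under the simplifying assumption (not stated on the slide) that an observed rating is the user's true preference plus zero-mean noise that is independent of the prediction error:

```latex
% Assumption: r_{ui} = p_{ui} + \varepsilon_{ui}, with E[\varepsilon_{ui}] = 0 and
% \varepsilon_{ui} independent of (\hat{r}_{ui} - p_{ui}).
\mathbb{E}\big[(\hat{r}_{ui} - r_{ui})^2\big]
  = \mathbb{E}\big[(\hat{r}_{ui} - p_{ui})^2\big] + \mathbb{E}\big[\varepsilon_{ui}^2\big]
  \;\ge\; \mathbb{E}\big[\varepsilon_{ui}^2\big]
\qquad\Longrightarrow\qquad
\mathrm{RMSE}(\hat{r}) \;\ge\; \sigma_{\varepsilon}
```

Under this reading, the standard deviation of the natural noise is exactly the "magic barrier" on achievable RMSE, whatever the algorithm.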
  28. Our related research questions. X. Amatriain, J.M. Pujol, N. Oliver (2009), "I like It... I like It Not: Measuring Users' Ratings Noise in Recommender Systems", UMAP '09 ● Q1. Are users inconsistent when providing explicit feedback to recommender systems via the common rating procedure? ● Q2. How large is the prediction error due to these inconsistencies? ● Q3. What factors affect user inconsistencies?
  29. Experimental Setup ● 100 movies selected from the Netflix dataset via stratified random sampling on popularity ● Ratings on a 1-to-5 star scale, with a special "not seen" symbol ● Trials 1 and 3: random order; trial 2: ordered by popularity
  30. User Feedback is Noisy ● Users are inconsistent ● Inconsistencies are not random and depend on many factors
  31. User Feedback is Noisy ● Users are inconsistent ● Inconsistencies are not random and depend on many factors ● More inconsistencies for mild opinions
  32. User Feedback is Noisy ● Users are inconsistent ● Inconsistencies are not random and depend on many factors ● More inconsistencies for mild opinions ● More inconsistencies for negative opinions
  33. Users' ratings are far from ground truth. Pairwise comparison between trials:

      Trials    #Ti     #Tj     #(Ti ∩ Tj)   #(Ti ∪ Tj)   RMSE (∩)   RMSE (∪)
      T1, T2    2185    1961    1838         2308         0.573      0.707
      T1, T3    2185    1909    1774         2320         0.637      0.765
      T2, T3    1969    1909    1730         2140         0.557      0.694

      Between any two trials the RMSE is already > 0.55 or > 0.69 (the Netflix Prize goal was to get below 0.85!)
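For reference, the RMSE columns are just the root-mean-square difference between the ratings the same user gave the same movie in two trials. A short sketch of that computation (the DataFrame columns are illustrative, not the study's actual schema):

```python
import numpy as np
import pandas as pd

def pairwise_trial_rmse(trial_a: pd.DataFrame, trial_b: pd.DataFrame) -> float:
    """RMSE between ratings of the same (user, movie) pair in two trials.
    Both frames are assumed to have columns: user, movie, rating."""
    merged = trial_a.merge(trial_b, on=["user", "movie"], suffixes=("_a", "_b"))
    diff = merged["rating_a"] - merged["rating_b"]
    return float(np.sqrt((diff ** 2).mean()))
```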
  34. Algorithm Robustness to Natural Noise. RMSE for different recommendation algorithms when predicting each of the trials (Trial 2 is consistently the least noisy):

      Algorithm / Trial   T1       T2       T3       Δ Tworst/Tbest
      User Average        1.2011   1.1469   1.1945   4.7%
      Item Average        1.0555   1.0361   1.0776   4%
      User-based kNN      0.9990   0.9640   1.0171   5.5%
      Item-based kNN      1.0429   1.0031   1.0417   4%
      SVD                 1.0244   0.9861   1.0285   4.3%
  35. Rate it Again. X. Amatriain et al. (2009), "Rate it Again: Increasing Recommendation Accuracy by User re-Rating", ACM RecSys 2009 ● Given that users are noisy... can we benefit from asking them to rate the same movie more than once? ● We propose an algorithm that allows multiple ratings of the same <user, item> tuple ● The algorithm is subject to two fairness conditions: it should remove as few ratings as possible (i.e., only when there is some certainty that a rating is only adding noise), and it should not make up new ratings but decide which of the existing ones are valid.
  36. Re-rating Algorithm ● One-source re-rating case. Examples: {3, 1} → Ø; {4} → 4; {3, 4} → 3. Two-source case: {3, 4, 5} → 3 ● Given the milding function shown on the slide (see the sketch below).
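The milding function is only shown graphically on the slide, so the sketch below hedges: it encodes just the one-source rule the examples suggest, with "milder" assumed to mean closer to the midpoint of the 1-5 scale, and a rating discarded when the two trials disagree by two or more stars. The exact function and the two-source case are in the RecSys '09 paper.

```python
from typing import Optional

def milder(a: int, b: int, midpoint: float = 3.0) -> int:
    """Assumed milding criterion: keep the rating closer to the scale midpoint."""
    return a if abs(a - midpoint) <= abs(b - midpoint) else b

def denoise_one_source(original: int, rerating: Optional[int]) -> Optional[int]:
    """One-source re-rating rule inferred from the slide's examples:
    {3, 1} -> None (discard), {4} -> 4, {3, 4} -> 3."""
    if rerating is None:                 # item was rated only once: keep it as-is
        return original
    if abs(original - rerating) >= 2:    # too inconsistent: treat as noise and drop
        return None
    return milder(original, rerating)

assert denoise_one_source(3, 1) is None
assert denoise_one_source(4, None) == 4
assert denoise_one_source(3, 4) == 3
```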
  37. Results ● One-source re-rating (notation: Denoised ⊚ Denoising):

      Algorithm        T1⊚T2    ΔT1     T1⊚T3    ΔT1     T2⊚T3    ΔT2
      User-based kNN   0.8861   11.3%   0.8960   10.3%   0.8984   6.8%
      SVD              0.9121   11.0%   0.9274   9.5%    0.9159   7.1%

      ● Two-source re-rating (denoising T1 with the other two trials):

      Algorithm        T1⊚(T2, T3)   ΔT1
      User-based kNN   0.8647        13.4%
      SVD              0.8800        14.1%
  38. Rate it again ● By asking users to rate items again we can remove noise from the dataset ● Improvements of up to 14% in accuracy! ● Because we don't want all users to re-rate all items, we design ways to do partial denoising ● Data-dependent: only denoise extreme ratings ● User-dependent: detect "noisy" users
  39. Denoising only noisy users ● Figure: improvement in RMSE for one-source denoising as a function of the percentage of denoised ratings and users, when selecting only noisy users and extreme ratings
  40. The value of a re-rating: adding new ratings increases the performance of the CF algorithm
  41. The value of a re-rating: but you are better off collecting re-ratings than new ratings!
  42. The value of a re-rating: and much better if you know which ratings to re-rate!
  43. Let's recap ● Users are inconsistent ● Inconsistencies can depend on many things, including how the items are presented ● Inconsistencies produce natural noise ● Natural noise reduces our prediction accuracy independently of the algorithm ● By asking (some) users to re-rate (some) items we can remove noise and improve accuracy ● Having users repeat existing ratings may have more value than adding new ones
  44. Crowds are not always wise. Conditions that are needed to guarantee the wisdom of a crowd: ● Diversity of opinion ● Independence ● Decentralization ● Aggregation
  45. Crowds are not always wise (figure: "vs. Who won?")
  46. The Wisdom of the Few X. Amatriain et al. "The wisdom of the few: a collaborative filtering approach based on expert opinions from the web", SIGIR 09
  47. “It is really only experts who can reliably account  for their reactions”
  48. Expert-based CF● expert = individual that we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain● Expert-based Collaborative Filtering ● Find neighbors from a reduced set of experts instead of regular users. 1. Identify domain experts with reliable ratings 2. For each user, compute “expert neighbors” 3. Compute recommendations similar to standard kNN CF
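A rough sketch of steps 1-3, assuming experts and users rate on a common scale. The expert filter (a minimum rating overlap) and the inverse-distance similarity are illustrative placeholders, not the paper's exact expert-selection or similarity criteria.

```python
import numpy as np

def expert_cf_predict(expert_R, user_ratings, item, k=20, min_overlap=5):
    """Expert-based CF sketch: neighbours come from a small expert matrix
    instead of the full user base. 0 means 'not rated' (assumed encoding).
    expert_R: e x n matrix of expert ratings; user_ratings: length-n vector."""
    sims = np.full(expert_R.shape[0], -np.inf)
    for e in range(expert_R.shape[0]):
        overlap = (expert_R[e] > 0) & (user_ratings > 0)
        if overlap.sum() < min_overlap or expert_R[e, item] == 0:
            continue                                          # expert unusable for this prediction
        diff = expert_R[e, overlap] - user_ratings[overlap]
        sims[e] = 1.0 / (1.0 + np.sqrt(np.mean(diff ** 2)))   # similarity from rating distance
    top = np.argsort(sims)[-k:]                               # k most similar experts
    top = top[np.isfinite(sims[top])]
    if top.size == 0:
        return float("nan")                                   # could fall back to the experts' average
    weights = sims[top]
    return float(np.dot(weights, expert_R[top, item]) / weights.sum())
```

Because the expert matrix is small and static, the whole similarity computation can run on the client, which is what the scalability and privacy claims a few slides later rely on.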
  49. User Study ● 57 participants, only 14.5 ratings/participant ● 50% of the users consider Expert-based CF to be good or very good ● Expert-based CF: only algorithm with an average rating over 3 (on a 0-4 scale)
  50. Advantages of the Approach ● Noise: experts introduce less natural noise ● Malicious ratings: the dataset can be monitored to avoid shilling ● Data sparsity: a reduced set of domain experts can be motivated to rate items ● Cold-start problem: experts rate items as soon as they are available ● Scalability: the dataset is several orders of magnitude smaller ● Privacy: recommendations can be computed locally
  51. So... ● Can we generate meaningful and personalized recommendations while ensuring 100% privacy? YES! ● Can we have a recommendation algorithm efficient enough to run on a phone? YES! ● Can we have a recommender system that works even if there is only one user? YES!
  52. Architecture of the approach
  53. Some implementations ● A distributed Music Recommendation engine
  54. Some implementations (II) ● A geo-localized Mobile Movie Recommender iPhone App
  55. Geo-localized Expert Movie Recommendations. Powered by...
  56. Expert CF... ● Recreates the old paradigm of manually finding your favorite experts in magazines, but in a fully automatic, non-supervised manner.
  57. What if we don't have ratings? The fascinating world of implicit user feedback. Examples of implicit feedback: ● Movies you watched ● Links you visited ● Songs you listened to ● Items you bought ● ...
  58. Main features of implicit feedback ● Our starting hypotheses are different from those in previous works: 1. Implicit feedback can contain negative feedback: given the right granularity and diversity, low feedback = negative feedback 2. The numerical value of implicit feedback can be mapped to preference, given the appropriate mapping 3. Once we have a trustworthy mapping, we can evaluate implicit-feedback predictions the same way as explicit-feedback ones.
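As an illustration of hypothesis 2, one simple way to map raw play counts to a 1-5 preference value is per-user percentile bucketing. This particular quantization is an assumption made for the sketch, not necessarily the mapping used in the study.

```python
import numpy as np

def implicit_to_preference(play_counts, levels: int = 5) -> np.ndarray:
    """Quantize one user's per-item play counts into 1..levels preference buckets.
    Per-user percentile thresholds keep heavy and light listeners comparable;
    items falling into the lowest bucket act as (weak) negative feedback."""
    counts = np.asarray(play_counts, dtype=float)
    cut_points = np.percentile(counts, np.linspace(0, 100, levels + 1)[1:-1])
    return np.digitize(counts, cut_points) + 1

# Example: implicit_to_preference([0, 1, 3, 7, 40]) -> array([1, 2, 3, 4, 5])
```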
  59. Our questions ● Q1. Is it possible to predict ratings a user would give to items given their implicit feedback? ● Q2. Are there other variables that affect this mapping?
  60. Experimental setup ● Online user study in the music domain ● Users were required to have a music profile on lastfm ● Goal: compare explicit ratings with users' listening history, taking into account a number of controlled variables
  61. Results. Do explicit ratings relate to implicit feedback? Almost perfect linear  relation between ratings  and quantized implicit  feedback
  62. Results. Do explicit ratings relate to implicit feedback?Extreme ratings have clear  ascending/descending  trend, but mild ratings  respond more to changes  in one direction
  63. Results. Do other variables affect? Albums listened to more  recently tend to receive  more positive ratings
  64. Results. Do other variables affect? Contrary to our expectations,  global album popularity does  not affect ratings
  65. Results. What about user variables? ● We obtained many demographic (age, sex, location...) and usage variables (hours of music per week, concerts, music magazines, ways of buying music...) in the study ● We performed an ANOVA analysis on the data to understand which variables explained some of its variance ● Only one of the usage variables contributed significantly (sig. value < 0.05): "Listening Style", which encoded whether the user listened preferably to tracks, full albums, or both.
  66. Results. Regression Analysis
      Model 1: r_iu = β0 + β1 · if_iu
      Model 2: r_iu = β0 + β1 · if_iu + β2 · re_iu
      Model 3: r_iu = β0 + β1 · if_iu + β2 · re_iu + β3 · gp_i
      Model 4: r_iu = β0 + β1 · if_iu + β2 · re_iu + β3 · if_iu · re_iu

      Model   R2       F-value               p-value          β0      β1      β2      β3
      1       0.125    F(1, 10120) = 1146    < 2.2 · 10^-16   2.726   0.499
      2       0.1358   F(2, 10019) = 794.8   < 2.2 · 10^-16   2.491   0.484   0.133
      3       0.1362   F(3, 10018) = 531.8   < 2.2 · 10^-16   2.435   0.486   0.134   0.0285
      4       0.1368   F(3, 10018) = 534.7   < 2.2 · 10^-16   2.677   0.379   0.038   0.053

      All models meaningfully explain the data. Introducing "recentness" improves the fit by about 10%, but "global popularity" and the interaction between variables do not make much difference.
  67. Results. Predictive power. Error in predicting 20% of the ratings, having trained the regression model on the other 80% (non-rated items excluded):

      Model          RMSE
      User Average   1.131
      1              1.026
      2              1.017
      3              1.016
      4              1.016
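As a reading aid for the two tables above, here is a sketch of fitting Model 2 (rating explained by quantized implicit feedback plus recentness) with an 80/20 train/test split. The DataFrame column names are invented for the example, and the study's data is not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def fit_model_2(df: pd.DataFrame):
    """Fit r_iu = b0 + b1 * if_iu + b2 * re_iu and report held-out RMSE.
    Assumed columns: 'implicit' (quantized implicit feedback), 'recentness',
    'rating' (explicit 1-5 rating)."""
    X = df[["implicit", "recentness"]].to_numpy()
    y = df["rating"].to_numpy()
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)          # ordinary least squares
    rmse = float(np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2)))
    return model.intercept_, model.coef_, rmse
```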
  68. Conclusions ● Recommender systems and similar applications usually focus on having more data ● But... many times it is not about having more data, but rather better data ● User feedback cannot always be treated as ground truth and needs to be processed ● Crowds are not always wise, and sometimes we are better off using experts ● Implicit feedback is a good alternative for understanding users, but the mapping to preferences is not trivial
  69. Colleagues ● Josep M. Pujol and Nuria Oliver (Telefonica) worked on the Natural Noise and Wisdom of the Few projects ● Nava Tintarev (Telefonica) worked on Natural Noise. External Collaborators ● Neal Lathia (UCL, London), Haewook Ahn (KAIST, Korea), Jaewook Ahn (Pittsburgh Univ.), and Josep Bachs (UPF, Barcelona) worked on Wisdom of the Few ● Denis Parra (Pittsburgh Univ.) worked on the implicit-explicit feedback study
  70. Thanks! Questions? Xavier Amatriain, xar@tid.es, http://xavier.amatriain.net, http://technocalifornia.blogspot.com, @xamat
