Telefonica is a fast-growing Telecom

                1989                2000                  2008
Clients         About 12 million    About 68 million      About 260 million
                subscribers         customers             customers
Services        Basic telephone     Wireline and mobile   Integrated ICT
                and data services   voice, data and       solutions for all
                                    Internet services     customers
Geographies     Operations in       Operations in         Operations in
                Spain               16 countries          25 countries
Staff           About 71,000        About 149,000         About 257,000
                professionals       professionals         professionals
Finances        Rev: 4,273 M€       Rev: 28,485 M€        Rev: 57,946 M€
                EPS(1): 0.45 €      EPS(1): 0.67 €        EPS: 1.63 €

(1) EPS: Earnings per share
Currently among the largest in the world

[Chart: Telco sector worldwide ranking by market cap (US$ bn). Source: Bloomberg, 06/12/09]

● Just announced 2010 results: record net earnings; first Spanish company ever to make more than 10B €
Leader in South America (data as of March '09)

Country           Accesses         Wireline rank   Mobile rank
Argentina         20.9 million     1               2
Brazil            61.4 million     2               1
Central America    6.1 million     -               2
Colombia          12.6 million     1               2
Chile             10.1 million     1               1
Ecuador            3.3 million     -               2
Mexico            15.7 million     -               2
Peru              15.2 million     1               1
Uruguay            1.5 million     -               1
Venezuela         12.0 million     -               2

Total accesses (as of March '09): 159.5 million

Notes:
- Central America includes Guatemala, Panama, El Salvador and Nicaragua.
- The total accesses figure includes narrowband Internet accesses of Terra Brasil and Terra Colombia, and broadband Internet accesses of Terra Brasil, Telefónica de Argentina, Terra Guatemala and Terra México.
And a significant footprint in Europe (data as of March '09)

Country          Accesses        Wireline rank   Mobile rank
Spain            47.2 million    1               1
UK               20.8 million    -               1
Germany          16.0 million    -               4
Ireland           1.7 million    -               2
Czech Republic    7.7 million    1               2
Slovakia          0.4 million    -               3

Total accesses (as of March '09): 93.8 million
The Age of Search has come to an end... long live the Age of Recommendation!

● Chris Anderson in "The Long Tail":
  ● "We are leaving the age of information and entering the age of recommendation"
● CNN Money, "The race to create a 'smart' Google":
  ● "The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you."
What works

● It depends on the domain and the particular problem.
● As a general rule, it is usually a good idea to combine approaches: Hybrid Recommender Systems.
● However, in the general case it has been demonstrated that (currently) the best isolated approach is CF.
  ● Item-based CF is in general more efficient and more accurate, but mixing CF approaches can improve results.
  ● Other approaches can improve results in specific cases (e.g. the cold-start problem).
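The combination mentioned above can be as simple as a weighted hybrid: blend the scores of two recommenders linearly. A minimal sketch, assuming hypothetical `cf_score` and `content_score` predictors and an illustrative weight `alpha` (none of these names come from the slides):

```python
# Hedged sketch of a weighted hybrid recommender: linearly blend a
# collaborative-filtering prediction with a content-based prediction.
# alpha is an illustrative mixing weight, typically tuned on held-out data.

def hybrid_score(cf_score, content_score, alpha=0.7):
    """Blend two predicted ratings; alpha weights the CF component."""
    return alpha * cf_score + (1 - alpha) * content_score

# Example: CF predicts 4.2 stars, the content-based model predicts 3.0
print(hybrid_score(4.2, 3.0))  # ≈ 3.84
```

In practice the weight can also be made adaptive, e.g. leaning on the content-based score when the CF neighborhood is too sparse (the cold-start case mentioned above).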
The CF Ingredients

● A list of m users and a list of n items
● Each user has a list of items with an associated opinion
  ● Explicit opinion: a rating score (numerical scale)
  ● Implicit feedback: purchase records or listening history
● An active user for whom the prediction task is performed
● A metric for measuring similarity between users
● A method for selecting a subset of neighbors
● A method for predicting a rating for items not rated by the active user
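The ingredients above can be wired together in a few lines. A minimal user-based kNN sketch with toy data (the ratings, user names, and the choice of cosine similarity over co-rated items are illustrative assumptions, not from the slides):

```python
import math

# Toy user-based CF: cosine similarity over co-rated items, and a
# similarity-weighted average of the neighbors' ratings as the prediction.
ratings = {  # user -> {item: rating}; illustrative data
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 5},
    "u3": {"A": 1, "B": 5},
}

def cosine(u, v):
    """Cosine similarity between two users, over items both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    num = sum(ratings[u][i] * ratings[v][i] for i in common)
    du = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    dv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return num / (du * dv)

def predict(user, item, k=2):
    """Predict user's rating for item from the k most similar raters."""
    neighbors = [(cosine(user, v), v) for v in ratings
                 if v != user and item in ratings[v]]
    neighbors = sorted(neighbors, reverse=True)[:k]
    num = sum(sim * ratings[v][item] for sim, v in neighbors)
    den = sum(abs(sim) for sim, _ in neighbors)
    return num / den if den else None

print(round(predict("u3", "C"), 2))
```

Real systems replace the raw weighted average with mean-centered ratings and shrinkage on the similarities, but the ingredient list is exactly the one above.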
The Netflix Prize

● 500K users × 17K movie titles = 100M ratings = $1M (if you "only" improve the existing system by 10%: from 0.95 to 0.85 RMSE)
● 49K contestants on 40K teams from 184 countries
● 41K valid submissions from 5K teams; 64 submissions per day
● Winning approach uses hundreds of predictors from several teams
The Magic Barrier

● Magic Barrier = the limit on prediction accuracy due to noise in the original data
● Natural Noise = involuntary noise introduced by users when giving feedback
  ● Due to (a) mistakes, and (b) lack of resolution in the personal rating scale
● Magic Barrier >= Natural Noise Threshold
  ● Our prediction error cannot be smaller than the error in the original data
Our related research questions

X. Amatriain, J.M. Pujol, N. Oliver (2009). "I Like It... I Like It Not: Evaluating User Ratings Noise in Recommender Systems". UMAP '09.

● Q1. Are users inconsistent when providing explicit feedback to recommender systems via the common rating procedure?
● Q2. How large is the prediction error due to these inconsistencies?
● Q3. What factors affect user inconsistencies?
Experimental Setup

● 100 movies selected from the Netflix dataset via stratified random sampling on popularity
● Ratings on a 1-to-5 star scale
  ● Special "not seen" symbol
● Trials 1 and 3: random order; trial 2: ordered by popularity
User Feedback is Noisy

● Users are inconsistent
● Inconsistencies are not random and depend on many factors
  ● More inconsistencies for mild opinions
  ● More inconsistencies for negative opinions
Users' ratings are far from ground truth

Trials    #Ti     #Tj     #common   #total    RMSE      RMSE
T1, T2    2185    1961    1838      2308      0.573     0.707
T1, T3    2185    1909    1774      2320      0.637     0.765
T2, T3    1969    1909    1730      2140      0.557     0.694

In the pairwise comparison between trials, the RMSE is already > 0.55 (or > 0.69), and the Netflix Prize goal was to get below 0.85!
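The pairwise numbers above are just the RMSE between two passes over the same items. A small sketch of that computation on toy data (the trial dictionaries are illustrative, not the study's ratings):

```python
import math

# Estimating natural noise: RMSE between two rating trials of the same
# user/item pairs, computed over the pairs rated in both trials.
trial1 = {("u1", "m1"): 4, ("u1", "m2"): 2, ("u2", "m1"): 5}  # toy data
trial2 = {("u1", "m1"): 3, ("u1", "m2"): 2, ("u2", "m1"): 5}

common = set(trial1) & set(trial2)
rmse = math.sqrt(sum((trial1[k] - trial2[k]) ** 2 for k in common) / len(common))
print(round(rmse, 3))  # sqrt(1/3) ≈ 0.577
```

Since the same user disagrees with themselves by 0.55+ RMSE in the study, no algorithm evaluated against a single trial can hope to do much better: that is the magic barrier in action.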
Algorithm Robustness to Natural Noise

Algorithm        T1        T2        T3        T_worst/T_best
User Average     1.2011    1.1469    1.1945    4.7%
Item Average     1.0555    1.0361    1.0776    4%
User-based kNN   0.9990    0.9640    1.0171    5.5%
Item-based kNN   1.0429    1.0031    1.0417    4%
SVD              1.0244    0.9861    1.0285    4.3%

RMSE for different recommendation algorithms when predicting each of the trials. Trial 2 is consistently the least noisy.
Rate it Again

X. Amatriain et al. (2009). "Rate it Again: Increasing Recommendation Accuracy by User re-Rating". ACM RecSys 2009.

● Given that users are noisy... can we benefit from asking them to rate the same movie more than once?
● We propose an algorithm that allows multiple ratings of the same <user, item> tuple.
● The algorithm is subject to two fairness conditions:
  - It should remove as few ratings as possible (i.e. only when there is some certainty that the rating is only adding noise)
  - It should not make up new ratings, but decide which of the existing ones are valid
Rate it again

● By asking users to rate items again we can remove noise in the dataset
  ● Improvements of up to 14% in accuracy!
● Because we don't want all users to re-rate all items, we design ways to do partial denoising
  ● Data-dependent: only denoise extreme ratings
  ● User-dependent: detect "noisy" users
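To make the fairness conditions concrete, here is one plausible denoising rule for a pair of ratings of the same <user, item> tuple. This is a hedged sketch, not the paper's exact algorithm: the agreement/removal thresholds are illustrative.

```python
# Sketch of re-rating denoising (illustrative rule, not the paper's exact
# algorithm): keep agreeing ratings, drop strongly contradictory pairs,
# and otherwise trust the re-rating. Note it never invents a new value
# (fairness condition 2) and only removes clear contradictions
# (fairness condition 1).

def denoise(r1, r2, max_gap=2):
    """Return the rating to keep for a re-rated pair, or None to remove it."""
    if r1 == r2:
        return r1                  # consistent: keep as-is
    if abs(r1 - r2) >= max_gap:
        return None                # contradictory: treat as noise, remove
    return r2                      # mild disagreement: keep the re-rating

print(denoise(4, 4))  # 4
print(denoise(1, 4))  # None (removed)
print(denoise(3, 4))  # 4
```

The partial-denoising variants above simply restrict which tuples get re-rated at all: only extreme ratings (data-dependent) or only ratings by users flagged as noisy (user-dependent).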
Let's recap

● Users are inconsistent
● Inconsistencies can depend on many things, including how the items are presented
● Inconsistencies produce natural noise
● Natural noise reduces our prediction accuracy independently of the algorithm
● By asking (some) users to re-rate (some) items we can remove noise and improve accuracy
● Having users repeat existing ratings may have more value than adding new ones
Crowds are not always wise

Conditions needed to guarantee the wisdom of a crowd:
● Diversity of opinion
● Independence
● Decentralization
● Aggregation
Expert-based CF

● Expert = an individual we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain
● Expert-based Collaborative Filtering
  ● Find neighbors in a reduced set of experts instead of among regular users:
    1. Identify domain experts with reliable ratings
    2. For each user, compute "expert neighbors"
    3. Compute recommendations as in standard kNN CF
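The three steps above can be sketched in a few lines: it is ordinary kNN CF, except the neighbor pool is a small, fixed set of expert profiles. The expert data and the user's local profile below are illustrative assumptions, not from the study:

```python
import math

# Sketch of expert-based CF: predictions come only from a small pool of
# expert raters. The user's profile never leaves the device; only the
# (public) expert ratings are needed, which is why this can run locally.
experts = {  # expert -> {item: rating}; toy data
    "e1": {"A": 5, "B": 4, "C": 4},
    "e2": {"A": 2, "B": 5, "C": 3},
}

def cosine(profile_u, profile_v):
    """Cosine similarity between two rating profiles over common items."""
    common = set(profile_u) & set(profile_v)
    if not common:
        return 0.0
    num = sum(profile_u[i] * profile_v[i] for i in common)
    den = (math.sqrt(sum(profile_u[i] ** 2 for i in common)) *
           math.sqrt(sum(profile_v[i] ** 2 for i in common)))
    return num / den

def expert_predict(user_profile, item):
    """Similarity-weighted average of the experts' ratings for item."""
    scored = [(cosine(user_profile, p), p[item])
              for p in experts.values() if item in p]
    den = sum(sim for sim, _ in scored)
    return sum(sim * r for sim, r in scored) / den if den else None

# A regular user who has rated only two items, kept locally:
print(round(expert_predict({"A": 5, "B": 3}, "C"), 2))
```

Because the expert set is orders of magnitude smaller than the user base, the whole neighbor computation fits comfortably on a phone, which is what enables the privacy and scalability claims that follow.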
User Study

● 57 participants, only 14.5 ratings per participant
● 50% of the users consider expert-based CF to be good or very good
● Expert-based CF: the only algorithm with an average rating over 3 (on a 0-4 scale)
Advantages of the Approach

● Noise: experts introduce less natural noise
● Malicious ratings: the dataset can be monitored to avoid shilling
● Data sparsity: a reduced set of domain experts can be motivated to rate items
● Cold start: experts rate items as soon as they are available
● Scalability: the dataset is several orders of magnitude smaller
● Privacy: recommendations can be computed locally
So...

● Can we generate meaningful and personalized recommendations while ensuring 100% privacy?
  ● YES!
● Can we have a recommendation algorithm efficient enough to run on a phone?
  ● YES!
● Can we have a recommender system that works even if there is only one user?
  ● YES!
What if we don't have ratings?

The fascinating world of implicit user feedback. Examples of implicit feedback:
● Movies you watched
● Links you visited
● Songs you listened to
● Items you bought
● ...
Main features of implicit feedback

● Our starting hypotheses differ from those in previous works:
  1. Implicit feedback can contain negative feedback: given the right granularity and diversity, low feedback = negative feedback
  2. The numerical value of implicit feedback can be mapped to preference, given the appropriate mapping
  3. Once we have a trustworthy mapping, we can evaluate implicit-feedback predictions the same way as explicit feedback
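Hypothesis 2 can be illustrated with a crude mapping from play counts to a 1-5 preference scale. The thresholds below are illustrative assumptions, not the study's actual mapping:

```python
# Sketch of mapping implicit feedback (play counts) to a 1-5 preference
# scale by thresholding. Per hypothesis 1, low counts are read as negative
# feedback rather than as missing data. The thresholds are illustrative;
# a real mapping would be fitted per user (e.g. on count quantiles).

def counts_to_rating(count, thresholds=(1, 3, 10, 30)):
    """Map a play count to a 1-5 preference level."""
    rating = 1
    for t in thresholds:
        if count >= t:
            rating += 1
    return rating

for c in (0, 2, 15, 100):
    print(c, "->", counts_to_rating(c))
```

With such a mapping in place, hypothesis 3 follows: the derived pseudo-ratings can be evaluated with the same RMSE machinery used for explicit ratings.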
Our questions

● Q1. Is it possible to predict the ratings a user would give to items from their implicit feedback?
● Q2. Are there other variables that affect this mapping?
Experimental setup

● Online user study in the music domain
● Users were required to have a music profile on last.fm
● Goal: compare explicit ratings with listening history, taking into account a number of controlled variables
Results. What about user variables?

● We obtained many demographic variables (age, sex, location...) and usage variables (hours of music per week, concerts, music magazines, ways of buying music...) in the study.
● We performed an ANOVA on the data to understand which variables explained some of its variance.
● Only one of the usage variables contributed (sig. value < 0.05): "listening style", which encoded whether the user listened preferably to tracks, full albums, or both.
Results. Regression Analysis

- Model 1: r_iu = β0 + β1·if_iu
- Model 2: r_iu = β0 + β1·if_iu + β2·re_iu
- Model 3: r_iu = β0 + β1·if_iu + β2·re_iu + β3·gp_i
- Model 4: r_iu = β0 + β1·if_iu + β2·re_iu + β3·if_iu·re_iu

Model   R²       F-value               p-value         β0      β1      β2      β3
1       0.125    F(1, 10120) = 1146    < 2.2·10⁻¹⁶     2.726   0.499
2       0.1358   F(2, 10019) = 794.8   < 2.2·10⁻¹⁶     2.491   0.484   0.133
3       0.1362   F(3, 10018) = 531.8   < 2.2·10⁻¹⁶     2.435   0.486   0.134   0.0285
4       0.1368   F(3, 10018) = 534.7   < 2.2·10⁻¹⁶     2.677   0.379   0.038   0.053

All models meaningfully explain the data. Introducing "recentness" (re) improves the fit by about 10%, but "global popularity" (gp) and the interaction between variables do not make much difference.
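Model 1 is plain ordinary least squares of the rating on the implicit-feedback score. A sketch of that fit using the closed-form slope/intercept on synthetic data (the generated points are illustrative, deliberately seeded near Model 1's reported coefficients; they are not the study's data):

```python
import random

# OLS fit of Model 1 (r = b0 + b1 * if) via the closed-form estimators.
# Synthetic data generated around b0 ≈ 2.7, b1 ≈ 0.5 for illustration
# (cf. Model 1's reported 2.726 and 0.499); not the study's data.
random.seed(0)
xs = [random.uniform(0, 5) for _ in range(500)]           # implicit score
data = [(x, 2.7 + 0.5 * x + random.gauss(0, 0.3)) for x in xs]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
b1 = (sum((x - mx) * (y - my) for x, y in data)
      / sum((x - mx) ** 2 for x, _ in data))               # slope
b0 = my - b1 * mx                                          # intercept
print(round(b0, 1), round(b1, 1))
```

Models 2-4 extend the same least-squares machinery with extra regressors (recentness, global popularity, interaction), which is why their coefficients can be read off the table in exactly the same way.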
Results. Predictive power

Model          RMSE
User Average   1.131
1              1.026
2              1.017
3              1.016
4              1.016

Error in predicting 20% of the ratings, having trained the regression model on the other 80% (excluding non-rated items).
Conclusions

● Recommender systems and similar applications usually focus on having more data
● But... it is often not about having more data, but about having better data
● User feedback cannot always be treated as ground truth and needs to be processed
● Crowds are not always wise; sometimes we are better off using experts
● Implicit feedback is a good alternative for understanding users, but the mapping to preference is not trivial
Colleagues

● Josep M. Pujol and Nuria Oliver (Telefonica) worked on the Natural Noise and Wisdom of the Few projects
● Nava Tintarev (Telefonica) worked on Natural Noise

External collaborators

● Neal Lathia (UCL, London), Haewook Ahn (KAIST, Korea), Jaewook Ahn (Pittsburgh Univ.), and Josep Bachs (UPF, Barcelona) on Wisdom of the Few
● Denis Parra (Pittsburgh Univ.) worked on the implicit-explicit feedback mapping