3. A More Clear Outline
• This presentation has, IMHO, 3 sections:
1. Good News
2. Not that Good News
3. Good News
• Which represent, respectively
1. Results of first study on last.fm presented in UMAP
2011
2. Initial results of the study we present here
3. Expected Results after analysis of 2. (once I finish my
comps) – and your feedback!
4. Introduction
• Most of recommender system approaches rely on
explicit information of the users, but…
• Explicit feedback: scarce (people are not especially
eager to rate or to provide personal info)
• Implicit feedback: Is less scarce, but (Hu et al., 2008)
There’s no negative feedback … and if you watch a TV program just
once or twice?
Noisy … but explicit feedback is also noisy
(Amatriain et al., 2009)
Preference & Confidence … we aim to map the I.F. to
preference (our main goal)
Lack of evaluation metrics … if we can map I.F. and E.F., we can
have a comparable evaluation
7/12/2011 Parra, Amatriain "Walk the Talk" 4
5. Section I : Good News
• Last.fm User Study
• Linear regression results
6. Recalling the 1st study (1/5)
• Last.fm users (114 in total after filtering)
• For each user, we crawled all the albums they
listened to send them a personalized survey
7. Recalling the 1st study (2/5)
• What items should they rate? Item (album) sampling:
– Implicit Feedback (IF): playcount for a user on a given album.
Changed to scale [1-3], 3 means being more listened to.
– Global Popularity (GP): global playcount for all users on a given
album [1-3]. Changed to scale [1-3], 3 means being more listened
to.
– Recentness (R) : time elapsed since user played a given album.
Changed to scale [1-3], 3 means being listened to more recently.
7/12/2011 Parra, Amatriain "Walk the Talk" 7
8. Recalling the 1st study (3/5)
• Demographics Survey + Rating 100 albums
7/12/2011 Parra, Amatriain "Walk the Talk" 8
9. Recalling the 1st study (4/5)
• Gender
• Age
• Country
• Hours per week spent on internet [int_hrs_per_week]
• Hours per week listening to music online [msc_hrs_per_week]
• Number of concerts per year [conc_per_year]
• Do you read specialized music blogs or magazines? [blogs_mag]
• Do you have experience evaluating music online? [rate_music]
• How frequently do you buy physical music records? [buy_records]
• How frequently do you buy music online? [buy_online]
• Do you prefer listening to single tracks, whole albums or either
way? [track_or_CD]
7/12/2011 Parra, Amatriain "Walk the Talk" 9
10. Recalling the 1st study (5/5)
• Prediction of rating by multiple Linear Regression
evaluated with RMSE.
• Results showed that Implicit feedback (play
count of the album by a specific user) and
recentness (how recently an album was listened
to) were important factors, global popularity had
a weaker effect.
• Results also showed that listening style (if user
preferred to listen to single tracks, CDs, or either)
was also an important factor, and not the other
ones.
11. ... but
• Linear Regression didn’t account for the
nested nature of ratings
User 1 User n
...
1 3 5 3 0 4 5 2 2 1 5 4 3 2 3 2 1 0 4 5 25 4 3 21 3 5
• And ratings were treated as continuous, when
they are actually ordinal.
12. So, Ordinal Logistic Regression!
• Actually Mixed-Effects Ordinal Multinomial
Logistic Regression
• Mixed-effects: Nested nature of ratings
• We obtain a distribution over ratings (ordinal
multinomial) per each pair USER, ITEM -> we
predict the rating using the expected value.
• … And we can compare the inferred ratings with
a method that directly uses implicit information
(playcounts) to recommend ( by Hu, Koren et al.
2007)
20. Lessons / Challenges (1/2)
Problem/ Challenge
1. Ground truth: Playcounts of albums or tracks?
2. Quantization of playcounts (implicit feedback),
recentness, and overall number of listeners of an album
(global popularity) [1-3] scale v/s raw playcounts
3. Defining Relevancy of recommended elements (to
compare with the raw playcounts)
21. Lessons / Challenges (2/2)
Problem/ Challenge
4. Additional/Alternative metrics for evaluation [MAP
and nDCG used in the paper]
5. New Survey (In order to deal with the issues of
identifying “actual” relevancy in dataset2)
6. Significance of level-2 variables: track_or_CD (study
1 v/s 2, where concerts_per_year was significant)
22. … so
• Lots of work to do [after my comps]
• Questions, Suggestions?
23. Is this the end?
7/12/2011 Parra, Amatriain "Walk the Talk" 23
30. M2:
implicit
feedback &
recentness
4 Regression Analysis
M1: implicit feedback
M4:
M3: implicit
Interaction of
feedback,
implicit
recentness,
feedback &
global
recentness
popularity
• Including Recentness increases R2 in more than 10% [ 1 -> 2]
• Including GP increases R2, not much compared to RE + IF [ 1 -> 3]
• Not Including GP, but including interaction between IF and RE
improves the variance of the DV explained by the regression model.
[ 2 -> 4 ]
7/12/2011 Parra, Amatriain "Walk the Talk" 30
31. 4.1 Regression Analysis
Model RMSE1 RMSE2
User average 1.5308 1.1051
M1: Implicit feedback 1.4206 1.0402
M2: Implicit feedback + recentness 1.4136 1.034
M3: Implicit feedback + recentness + global popularity 1.4130 1.0338
M4: Interaction of Implicit feedback * recentness 1.4127 1.0332
• We tested conclusions of regression analysis
by predicting the score, checking RMSE in 10-
fold cross validation.
• Results of regression analysis are supported.
7/12/2011 Parra, Amatriain "Walk the Talk" 31
32. 4.2 Regression Analysis – Track or Album
Model Tracks Tracks/ Albums
Albums
User average 1.1833 1.1501 1.1306
M1: Implicit feedback 1.0417 1.0579 1.0257
M2: Implicit feedback + recentness 1.0383 1.0512 1.0169
M3: Implicit feedback + recentness + global 1.0386 1.0507 1.0159
popularity
M4: Interaction of Implicit feedback * recentness 1.0384 1.049 1.0159
• Including this variable that seemed to have an
effect in the general analysis, helped to
improve accuracy of the model
7/12/2011 Parra, Amatriain "Walk the Talk" 32
33. Is this the end?
7/12/2011 Parra, Amatriain "Walk the Talk" 33
Editor's Notes
Implicit Feedback Recommendation via Implicit-to-Explicit Ordinal Logistic Regression Mapping*Xavier Amatrian works currently at Netflix, but by the time we were working on this research, he was still at Telefonica I+D
.. And why CARS? The reviewers of the paper criticized that this article was interesting but did not contribute extensively to the area of Context-aware Recommender Systems.However, of the different workshops so far, I considered this one the most relevant although the music recommendation was another interesting one.Still think that we contribute to this area by showing the minimal effects of several variables in the results, that I would expect other researchers could investigate and compare results.
- Our main motivation is the deal with the fact that the largest amount of recommender systems approaches rely on explicit information such as ratings, or metadata of users and items, butAlthough mostrecsys rely on explicit data, we face the situation that explicit feedback is scarcer than implicit feedback.- Hu, Koren in 2008 on the paper “Implicit feedback for Recommender Systems”, which is one of the few works in this area, defined some characteristics of implicit data that make difficult to work with it.
During September and October of the last year we run a user study on last.fm users.In a web page, they had to provide their last.fm username, we checked some
Sampling strategy – 100 albums were rated by each users. We discretized 3 variables to be able to do a more
Image on the left is the screenshot of the 1st part of the survey (demographics and consumption information) the image on the right is a sample of one of the 100 albums that each user had to rate from 0 to 5
Change description to the right side and change color
This model should account for larger variance of the DV.
Beta represents the coefficients for the fixed factors (if, re, gp)G_u represents the random factors
- Why the dataset 1 wasn’t enough- We needed to crawl more users …. But at the expense of missing some important information (we w
MERL stands for Mixed-Effects Linear Regression Model
Check which kind of popularity is used here (global or user) and check if it is an appropriate baseline.
Replace equations by words (if: implicit feedback, Re: recentness, gp: global popularity)