Keynote at the Chilean Week of Computer Science (JCC 2014). I present a brief overview of algorithms for recommender systems, and then my work on tag-based recommendation, implicit feedback, and visual interactive interfaces.
Research on Recommender Systems: Beyond Ratings and Lists
1. Research on Recommender Systems: Beyond Ratings and Lists
Denis Parra, Ph.D. Information Sciences
Assistant Professor, CS Department
School of Engineering
Pontificia Universidad Católica de Chile
Tuesday, November 11th, 2014
2. Outline
• Personal Introduction
• Quick Overview of Recommender Systems
• My Work on Recommender Systems
– Tag-Based Recommendation
– Implicit-Feedback (time allowing …)
– Visual Interactive Interfaces
• Summary, Current & Future Work
3. Personal Introduction
• I’m from Valdivia!
• There are many reasons to love Valdivia
The City
4. Personal Introduction
• I’m from Valdivia!
• There are many reasons to love Valdivia
The Sports
5. Personal Introduction
• I’m from Valdivia!
• There are many reasons to love Valdivia
The Animals
6. Personal Introduction
• B.Eng. and professional title of Civil Engineer in
Informatics from Universidad Austral de Chile
(2004), Valdivia, Chile
• Ph.D. in Information Sciences at University of
Pittsburgh (2013), Pittsburgh, PA, USA
8. Recommender Systems
INTRODUCTION
* Danboard (Danbo): Amazon's cardboard robot; in these slides it represents a recommender system
9. Recommender Systems (RecSys)
Systems that help (groups of) people to find relevant items in a crowded item or information space (McNee et al., 2006)
10. Why do we care about RecSys?
• RecSys have gained popularity due to several
domains & applications that require people to make
decisions among a large set of items.
11. A lil’ bit of History
• The first recommender systems were built in the early '90s (Tapestry, GroupLens, Ringo)
• Online contests, such as the Netflix Prize (2006-2009), drew attention to recommender systems beyond Computer Science
12. The Recommendation Problem
• The most popular formulation of the recommendation problem is rating prediction:
Predict!

        Item 1   Item 2   …   Item m
User 1    1        5             4
User 2    5        1             ?
…
User n    2        5             ?
• How good is my prediction?
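A minimal sketch of the usual answer, assuming a held-out set of ratings with hypothetical values: RMSE, the rating-prediction metric popularized by the Netflix Prize.

```python
import numpy as np

# Hypothetical held-out ratings and the model's predictions for them.
actual = np.array([4.0, 1.0, 5.0, 3.0])
predicted = np.array([3.5, 2.0, 4.5, 3.2])

# RMSE: root mean squared error between predictions and actual ratings.
rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(f"RMSE = {rmse:.3f}")
```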
13. Recommendation Methods
• Without covering all possible methods, the two most common classifications of recommender algorithms are:
Classification 1:
- Collaborative Filtering
- Content-based Filtering
- Hybrid

Classification 2:
- Memory-based
- Model-based
14. Collaborative Filtering (User-based KNN)
• Step 1: Finding Similar Users (Pearson Corr.)
[Figure: a ratings matrix showing the ratings of the active user and of User_1, User_2, User_3 over the same items]
15. Collaborative Filtering (User-based KNN)
• Step 1: Finding Similar Users (Pearson Corr.)
[Figure: the same ratings matrix, now annotated with the similarity between the active user and each candidate neighbor]

$$\mathrm{sim}(u,n)=\frac{\sum_{i\in CR_{u,n}}(r_{ui}-\bar{r}_{u})(r_{ni}-\bar{r}_{n})}{\sqrt{\sum_{i\in CR_{u,n}}(r_{ui}-\bar{r}_{u})^{2}}\sqrt{\sum_{i\in CR_{u,n}}(r_{ni}-\bar{r}_{n})^{2}}}$$

sim(active user, user_1) = 0.4472136
sim(active user, user_2) = 0.49236596
sim(active user, user_3) = -0.91520863
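A minimal NumPy sketch of Step 1 under hypothetical toy ratings (the exact values from the slide figure are not reproduced); NaN marks items a user has not rated.

```python
import numpy as np

def pearson_sim(r_u, r_n):
    """Pearson correlation over the items co-rated by users u and n.

    r_u, r_n: rating arrays aligned by item; np.nan marks unrated items.
    For simplicity the user means are taken over the co-rated items.
    """
    co = ~np.isnan(r_u) & ~np.isnan(r_n)   # co-rated items CR(u, n)
    if co.sum() < 2:
        return 0.0
    du = r_u[co] - r_u[co].mean()          # deviations from the user means
    dn = r_n[co] - r_n[co].mean()
    denom = np.sqrt((du ** 2).sum() * (dn ** 2).sum())
    return float((du * dn).sum() / denom) if denom else 0.0

# Hypothetical toy ratings (not the exact values from the slide figure).
active = np.array([5.0, 4.0, 4.0, np.nan, 2.0])
user_1 = np.array([4.0, np.nan, 5.0, 1.0, 2.0])
print(pearson_sim(active, user_1))
```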
16. Collaborative Filtering (User-based KNN)
• Step 2: Ranking the items to recommend
[Figure: the ratings of neighbors User_1 and User_2 on Item 1, Item 2, and Item 3 are combined to rank the items for the active user]
17. Collaborative Filtering (User-based KNN)
• Step 2: Ranking the items to recommend
[Figure: the same setup, with the prediction formula applied to the neighbors' ratings]

$$\mathrm{pred}(u,i)=\bar{r}_{u}+\frac{\sum_{n\in \mathrm{neighbors}(u)}\mathrm{userSim}(u,n)\cdot(r_{ni}-\bar{r}_{n})}{\sum_{n\in \mathrm{neighbors}(u)}\mathrm{userSim}(u,n)}$$
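A sketch of Step 2 with hypothetical numbers; the absolute value in the denominator is a common safeguard, a slight variant of the formula above.

```python
import numpy as np

def predict(r_u_mean, sims, neighbor_ratings, neighbor_means):
    """pred(u, i) = r̄_u + Σ sim(u,n)·(r_ni − r̄_n) / Σ |sim(u,n)|.

    sims: similarities of the neighbors of u that rated item i;
    neighbor_ratings / neighbor_means: each neighbor's rating of i
    and that neighbor's mean rating.
    """
    sims = np.asarray(sims, dtype=float)
    devs = np.asarray(neighbor_ratings, float) - np.asarray(neighbor_means, float)
    denom = np.abs(sims).sum()
    return r_u_mean if denom == 0 else r_u_mean + (sims * devs).sum() / denom

# Hypothetical example: two neighbors (similarities from Step 1) rated the item.
print(predict(3.2, [0.447, 0.492], [4.0, 4.0], [2.3, 3.0]))
```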
18. Pros/Cons of CF
PROS:
• Very simple to implement
• Content-agnostic
• More accurate than other techniques, such as content-based filtering
CONS:
• Sparsity
• Cold-start
• New Item
19. Content-Based Filtering
• Can be traced back to techniques from IR, where
the User Profile represents a query.
Doc_1 = {w_1, w_2, …, w_n}
Doc_2 = {w_1, w_2, …, w_n}
Doc_3 = {w_1, w_2, …, w_n}
Doc_n = {w_1, w_2, …, w_n}
user_profile = {w_1, w_2, …, w_n}, weighted using TF-IDF
[Figure: the user's ratings (5, 4, 5) mark the documents whose terms build the user profile]
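A minimal scikit-learn sketch of this idea with hypothetical documents and profile: the TF-IDF-weighted user profile acts as a query, ranked against the documents by cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical candidate documents.
docs = [
    "collaborative filtering for recommender systems",
    "probabilistic topic models for text collections",
    "deep learning methods for image recognition",
]
# Hypothetical user profile: terms from the items the user liked.
profile = "evaluation of collaborative filtering recommender systems"

vec = TfidfVectorizer()
doc_vectors = vec.fit_transform(docs)    # TF-IDF document vectors
query = vec.transform([profile])         # the profile acts as a query
scores = cosine_similarity(query, doc_vectors).ravel()
ranking = scores.argsort()[::-1]         # recommend the top documents
print(list(zip(ranking, scores[ranking].round(3))))
```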
20. PROS/CONS of Content-Based Filtering
PROS:
• New items can be matched without previous
feedback
• It can also exploit techniques such as LSA or LDA
• It can use semantic data (ConceptNet, WordNet,
etc.)
CONS:
• Less accurate than collaborative filtering
• Tends to overspecialize
21. Hybridization
• Combine previous methods to overcome their
weaknesses (Burke, 2002)
22. C2. Model/Memory Classification
• Memory-based methods use the whole dataset for training and prediction. User- and item-based CF are examples.
• Model-based methods build a model during training and use only this model during prediction. This makes prediction much faster and more scalable.
23. Model-based: Matrix Factorization
[Figure: SVD ~ Singular Value Decomposition factorizes the ratings matrix into a latent vector for each user and a latent vector for each item]
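A NumPy sketch of the latent-factor idea via a plain SVD on a small hypothetical matrix. This is a simplification: real systems factorize only the observed ratings (e.g., with SGD or ALS) instead of treating missing entries as zeros.

```python
import numpy as np

# Hypothetical ratings matrix; 0 stands in for "unobserved" in this sketch.
R = np.array([[5, 4, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 5, 4]], dtype=float)

k = 2                                 # number of latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :k] * np.sqrt(s[:k])         # latent vectors of the users
Q = Vt[:k, :].T * np.sqrt(s[:k])      # latent vectors of the items
R_hat = P @ Q.T                       # predicted rating: p_u · q_i
print(np.round(R_hat, 2))
```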
24. PROS/CONS of MF and latent factors model
PROS:
• So far, state of the art in terms of accuracy (these methods won the Netflix Prize)
• Performance-wise, the best option nowadays: slow at training time, O((m+n)³), compared to correlation, O(m²n), but linear at prediction time, O(m+n)
CONS:
• Recommendations are opaque: how do you explain that certain "latent factors" produced the recommendation?
25. Rethinking the Recommendation Problem
• Ratings are scarce: need for exploiting other sources
of user preference
• User-centric recommendation takes the problem
beyond ratings and ranked lists: evaluate user
engagement and satisfaction, not only RMSE
• Several other dimensions to consider in the
evaluation: novelty of the results, diversity,
coverage (user and catalog), serendipity
• Study the effect of interface characteristics: user control, explainability
26. My Take on RecSys Research
27. My Work on RecSys
• Traditional RecSys: accurate prediction and Top-N algorithms
• In my research I have contributed to RecSys by:
– Utilizing other sources of user preference (Social Tags)
– Exploiting implicit feedback for recommendation and for
mapping explicit feedback
– Studying user-centric evaluation: the effect of user
controllability on user satisfaction in interactive
interfaces
• And nowadays: Studying whether Virtual Worlds are a
good proxy for real world recommendation tasks
28. This is not only my work ☺
• Dr. Peter Brusilovsky
University of Pittsburgh, PA, USA
• Dr. Alexander Troussov
IBM Dublin and TCD, Ireland
• Dr. Xavier Amatriain
TID / Netflix, CA, USA
• Dr. Christoph Trattner
NTNU, Norway
• Dr. Katrien Verbert
KU Leuven, Belgium
• Dr. Leandro Balby-Marinho
UFCG, Brazil
30. Tag-based Recommendation
• D. Parra, P. Brusilovsky. Improving Collaborative
Filtering in Social Tagging Systems for the
Recommendation of Scientific Articles. Web
Intelligence 2010, Toronto, Canada
• D. Parra, P. Brusilovsky. Collaborative Filtering
for Social Tagging Systems: an Experiment with
CiteULike. ACM Recsys 2009, New York, NY, USA
31. Motivation
• Ratings are scarce. Find another source of user
preference: Social Tagging Systems
[Figure: a tagging instance links a User, a Resource, and Tags]
32. A Folksonomy
• When a user u adds an item i using one or more tags t1, …, tn, there is a tagging instance.
• The collection of tagging instances produces a
folksonomy
33. Applying CF over the Folksonomy
• In the first step: Calculate user similarity
– Traditional CF: Pearson correlation over ratings
– Tag-based CF: BM25 over social tags
• In the second step: incorporate the number of raters to rank the items (NwCF)
34. Tag-based CF
[Figure: the active user's tag profile acts as a query, matched with BM25 against the tag profiles (Doc_1, Doc_2, Doc_3) of other users]
35. Okapi BM25
BM25: we obtain the similarity between users (neighbors) by using their sets of tags as "documents" and performing an Okapi BM25 (probabilistic IR model) Retrieval Status Value calculation:

$$\mathrm{sim}(u,v)=RSV_{d}=\sum_{t\in q}\mathrm{IDF}_{t}\cdot\frac{(k_{1}+1)\,tf_{td}}{k_{1}\big((1-b)+b\,(L_{d}/L_{ave})\big)+tf_{td}}\cdot\frac{(k_{3}+1)\,tf_{tq}}{k_{3}+tf_{tq}}$$

where tf_td is the tag frequency in the neighbor (v) profile and tf_tq is the tag frequency in the active user (u) profile.

NwCF then weights the prediction by the number of raters of the item:

$$\mathrm{pred}'(u,i)=\log_{10}\big(1+nbr(i)\big)\cdot\mathrm{pred}(u,i)$$
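A sketch of the tag-based similarity, assuming typical IR defaults for the constants (k1 = 1.2, k3 = 8, b = 0.75; the values used in the study are not in the slides) and a simple N/df IDF.

```python
import math
from collections import Counter

def bm25_sim(u_tags, v_tags, profiles, k1=1.2, k3=8.0, b=0.75):
    """RSV of neighbor v's tag profile against active user u's tags.

    u_tags, v_tags: tag lists of the two users; profiles: every user's
    tag list, used for the IDF and the average profile length.
    """
    N = len(profiles)
    L_ave = sum(len(p) for p in profiles) / N
    tf_v, tf_u = Counter(v_tags), Counter(u_tags)
    score = 0.0
    for t, tq in tf_u.items():
        df = sum(1 for p in profiles if t in p)
        if df == 0:
            continue
        idf = math.log10(N / df)                       # simple IDF variant
        td = tf_v[t]                                   # tag freq. in v's profile
        norm = k1 * ((1 - b) + b * len(v_tags) / L_ave) + td
        score += idf * ((k1 + 1) * td / norm) * ((k3 + 1) * tq / (k3 + tq))
    return score

# Hypothetical tiny folksonomy: three users' tag profiles.
profiles = [["recsys", "tags", "cf"], ["recsys", "ir"], ["biology", "genes"]]
print(bm25_sim(profiles[0], profiles[1], profiles))
```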
36. Evaluation
• Crawled for 38 days during June-July 2009

Phase 2 dataset (before filtering):
# users: 5,849
# articles: 574,907
# tags: 139,993
# tagging instances: 2,337,571

Dataset after the filtering process:
# users: 784
# items: 26,599
# tags: 26,009
# posts: 71,413
# annotations: 218,930
avg # items per user: 91
avg # users per item: 2.68
avg # tags per user: 88.02
avg # users per tag: 2.65
avg # tags per item: 7.07
avg # items per tag: 7.23
37. Cross-validation
• Train-validation-test sets, 10-fold cross-validation
• Training used to tune the parameter K: the neighborhood size
• One run of the experiment: ~12 hours
38. Results & Statistical Significance
• BM25 is intended to bring in more neighbors, at the cost of more noise (less similar neighbors)
• NwCF helps to decrease noise, so it was natural to combine the two and also try that option
                 CCF      NwCF     BM25+CCF  BM25+NwCF
MAP@10           0.12875  0.1432*  0.1876**  0.1942***
K (neigh. size)  20       22       21        29
Ucov             81.12%   81.12%   99.23%    99.23%
Significance over the baseline: *p < 0.236, ** p < 0.033, *** p < 0.001
39. Take-aways
• We can exploit tags as a source for user similarity in
recommendation algorithms
• Tag-based (BM25) similarity can be considered an alternative to Pearson correlation for calculating user similarity in social tagging systems (STS).
• Incorporating the number of raters helped to
decrease the noise produced by items with too few
ratings
41. Implicit-Feedback
• Slides are based on two articles:
– Parra-Santander, D., & Amatriain, X. (2011). Walk the
Talk: Analyzing the relation between implicit and
explicit feedback for preference elicitation. Proceedings
of UMAP 2011, Girona, Spain
– Parra, D., Karatzoglou, A., Amatriain, X., & Yavuz, I.
(2011). Implicit feedback recommendation via implicit-to-
explicit ordinal logistic regression mapping.
Proceedings of the CARS Workshop, Chicago, IL, USA,
2011.
42. Introduction (1/2)
• Most recommender system approaches rely on explicit information from users, but…
• Explicit feedback is scarce (people are not especially eager to rate or to provide personal info)
• Implicit feedback is less scarce, but has known issues (Hu et al., 2008):
– There is no negative feedback … and what if you watch a TV program just once or twice?
– It is noisy … but explicit feedback is also noisy (Amatriain et al., 2009)
– It mixes preference & confidence … we aim to map implicit feedback to preference (our main goal)
– Lack of evaluation metrics … if we can map implicit to explicit feedback, we can have a comparable evaluation
43. Introduction (2/2)
• Is it possible to map implicit behavior to explicit
preference (ratings)?
• Which variables best account for the number of times a user listens to online albums? [Baltrunas & Amatriain, CARS workshop at RecSys 2009]
• OUR APPROACH: Study with Last.fm users
– Part I: Ask users to rate 100 albums (how to sample)
– Part II: Build a model to map collected implicit feedback
and context to explicit feedback
44. Walk the Talk (2011)
• Albums they listened to during the last: 7 days, 3 months, 6 months, year, and overall
• For each album in the list we obtained: # user plays (in each period), # global listeners, and # global plays
45. Walk the Talk - 2
• Requirements: at least 18 years old, more than 5,000 scrobbles
46. Quantization of Data for Sampling
• What items should they rate? Item (album) sampling:
– Implicit Feedback (IF): playcount for a user on a given album.
Changed to scale [1-3], 3 means being more listened to.
– Global Popularity (GP): global playcount for all users on a given
album [1-3]. Changed to scale [1-3], 3 means being more listened to.
– Recentness (R) : time elapsed since user played a given album.
Changed to scale [1-3], 3 means being listened to more recently.
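A sketch of the [1-3] quantization using terciles; the exact cut points used in the study are an assumption here.

```python
import numpy as np

def to_terciles(values):
    """Map raw values to a 1-3 scale by tercile (3 = highest).

    A sketch of the [1-3] quantization; the study's exact cut points
    are an assumption here.
    """
    v = np.asarray(values, dtype=float)
    lo, hi = np.percentile(v, [100 / 3, 200 / 3])
    return 1 + (v > lo).astype(int) + (v > hi).astype(int)

playcounts = np.array([3, 12, 250, 40, 7, 90])   # hypothetical per-album plays
print(to_terciles(playcounts))                   # -> [1 2 3 2 1 3]
```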
47. Regression Analysis
• M1: implicit feedback
• M2: implicit feedback & recentness
• M3: implicit feedback, recentness & global popularity
• M4: interaction of implicit feedback & recentness
• Including recentness increases R² by more than 10% [1 → 2]
• Including GP increases R², though not much compared to RE + IF [1 → 3]
• Not including GP, but including the interaction between IF and RE, improves the variance of the DV explained by the regression model [2 → 4]
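A statsmodels sketch of the four models on hypothetical toy data (the study's data is not reproduced); in the formula syntax, `ifq * re` expands to both main effects plus the IF × RE interaction.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical quantized data: implicit feedback (ifq), recentness (re),
# global popularity (gp) on the 1-3 scale, plus the explicit rating.
df = pd.DataFrame({
    "ifq":    [1, 2, 3, 3, 2, 1, 3, 2],
    "re":     [1, 1, 2, 3, 3, 2, 3, 1],
    "gp":     [2, 3, 3, 1, 2, 2, 3, 1],
    "rating": [1, 2, 4, 5, 3, 2, 5, 2],
})

m1 = smf.ols("rating ~ ifq", data=df).fit()            # M1: IF only
m2 = smf.ols("rating ~ ifq + re", data=df).fit()       # M2: IF + recentness
m3 = smf.ols("rating ~ ifq + re + gp", data=df).fit()  # M3: + global popularity
m4 = smf.ols("rating ~ ifq * re", data=df).fit()       # M4: IF × RE interaction
print([round(m.rsquared, 3) for m in (m1, m2, m3, m4)])
```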
48. Regression Analysis (cont.)
Model RMSE1 RMSE2
User average 1.5308 1.1051
M1: Implicit feedback 1.4206 1.0402
M2: Implicit feedback + recentness 1.4136 1.034
M3: Implicit feedback + recentness + global popularity 1.4130 1.0338
M4: Interaction of Implicit feedback * recentness 1.4127 1.0332
• We tested the conclusions of the regression analysis by predicting the score and checking RMSE under 10-fold cross-validation.
• The results of the regression analysis are supported.
49. Conclusions of Part I
• Using a linear model, Implicit feedback and
recentness can help to predict explicit feedback (in
the form of ratings)
• Global popularity doesn’t show a significant
improvement in the prediction task
• Our model can help to relate implicit and explicit
feedback, helping to evaluate and compare explicit
and implicit recommender systems.
50. Part II: Extension of Walk the Talk
• Implicit Feedback Recommendation via Implicit-to-
Explicit OLR Mapping (Recsys 2011, CARS
Workshop)
– Consider ratings as ordinal variables
– Use mixed-models to account for non-independence of
observations
– Compare with state-of-the-art implicit feedback
algorithm
51. Recalling the 1st study (5/5)
• Prediction of rating by multiple Linear Regression
evaluated with RMSE.
• Results showed that Implicit feedback (play count
of the album by a specific user) and recentness (how
recently an album was listened to) were important
factors, global popularity had a weaker effect.
• Results also showed that listening style (whether the user preferred to listen to single tracks, CDs, or either) was an important factor, unlike the other user variables.
52. ... but
• Linear Regression didn’t account for the nested
nature of ratings
[Figure: each user's ratings are nested within that user, User 1 … User n]
• And ratings were treated as continuous, when they
are actually ordinal.
53. So, Ordinal Logistic Regression!
• Actually, Mixed-Effects Ordinal Multinomial Logistic Regression
• Mixed effects: account for the nested nature of ratings
• We obtain a distribution over ratings (ordinal multinomial) for each (user, item) pair → we predict the rating using the expected value.
• … And we can compare the inferred ratings with a method that directly uses implicit information (playcounts) to recommend (Hu, Koren et al., 2008)
54. Ordinal Regression for Mapping
• Model and predicted value (the equations appeared as figures in the original slide)
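A minimal sketch of the prediction step, assuming the fitted model yields a probability distribution over the rating levels for a (user, item) pair; the rating is predicted as the expected value of that distribution.

```python
import numpy as np

# Hypothetical distribution over ratings 1..5 for one (user, item) pair,
# as produced by the mixed-effects ordinal logistic regression.
probs = np.array([0.05, 0.10, 0.20, 0.40, 0.25])
ratings = np.arange(1, 6)
predicted_rating = (probs * ratings).sum()   # expected value = 3.7
print(predicted_rating)
```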
55. Datasets
• D1: users, albums, IF, RE, GP, ratings, demographics/consumption
• D2: users, albums, IF, RE, GP, NO ratings
57. Conclusions & Current Work
Problems / Challenges:
1. Ground truth: how many playcounts imply relevance? → sensitivity analysis needed
2. Quantization of playcounts (implicit feedback), recentness, and overall number of listeners of an album (global popularity) to a [1-3] scale vs. raw playcounts → modify and compare
3. Additional/alternative metrics for evaluation [MAP and nDCG were used in the paper]
58. Part of this work with Katrien Verbert
VISUALIZATION + USER CONTROLLABILITY
59. Visualization & User Controllability
• Motivation: Can user controllability and
explainability improve user engagement and
satisfaction with a recommender system?
• Specific research question: how can intersections of contexts of relevance (of recommendation algorithms) be better represented to improve the user experience with the recommender?
60. The Concept of Controllability
MovieLens: example of traditional recommender list
62. Research Platform
• The studies were conducted using Conference
Navigator, a Conference Support System
• Our goal was recommending conference talks
http://halley.exp.sis.pitt.edu/cn3/
[Screenshot: Conference Navigator showing the Program, Proceedings, Author List, and Recommendations sections]
64. TalkExplorer – IUI 2013
• Adaptation of Aduna Visualization to CN
• Main research question: Does fusion (intersection) of
contexts of relevance improve user experience?
65. TalkExplorer - I
[Screenshot: the entities panel shows tags, recommender agents, and users]
66. TalkExplorer - II
• Canvas Area: Intersections of Different Entities
[Screenshot: the canvas shows two recommenders and a user; clusters of talks are associated either to a single entity or to an intersection of entities]
67. TalkExplorer - III
[Screenshot: items, the talks explored by the user]
68. Our Assumptions
• Items which are relevant in more than one aspect could be more valuable to the users
• Displaying multiple aspects of relevance visually is important for users in the process of item exploration
69. TalkExplorer Studies I & II
• Study I
– Controlled Experiment: Users were asked to discover
relevant talks by exploring the three types of entities: tags,
recommender agents and users.
– Conducted at Hypertext and UMAP 2012 (21 users)
– Subjects familiar with Visualizations and Recsys
• Study II
– Field Study: Users were left free to explore the interface.
– Conducted at LAK 2012 and ECTEL 2013 (18 users)
– Subjects familiar with visualizations, but not much with
RecSys
70. Evaluation: Intersections & Effectiveness
• What do we call an “Intersection”?
• We used #explorations on intersections and their
effectiveness, defined as:
Effectiveness = |bookmarked items| / |intersections explored|
71. Results of Studies I & II
• Effectiveness increases
with intersections of
more entities
• Effectiveness wasn’t
affected in the field
study (study 2)
• … but exploration
distribution was
affected
74. SetFusion I
Traditional ranked list: papers sorted by relevance. It combines 3 recommendation approaches.
75. SetFusion - II
Sliders
Allow the user to control the importance of
each data source or recommendation method
Interactive Venn Diagram
Allows the user to inspect and filter the recommended papers. Actions available:
- Filter the item list by clicking on an area
- Highlight a paper by hovering over a circle
- Scroll to a paper by clicking on a circle
- Indicate bookmarked papers
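A sketch of slider-style fusion under a simple assumption: each method's scores enter a normalized weighted sum (SetFusion's actual combination may differ).

```python
import numpy as np

def fuse(scores, weights):
    """Combine per-method item scores with user-controlled slider weights.

    scores: dict method -> array of scores over the same items;
    weights: dict method -> slider value in [0, 1].
    """
    total = sum(weights.values()) or 1.0
    fused = np.zeros(len(next(iter(scores.values()))))
    for method, s in scores.items():
        fused += (weights.get(method, 0.0) / total) * np.asarray(s, float)
    return fused.argsort()[::-1]          # item indices, best first

# Hypothetical scores from three recommendation methods A, B, C.
scores = {"A": np.array([0.9, 0.1, 0.4]),
          "B": np.array([0.2, 0.8, 0.5]),
          "C": np.array([0.3, 0.3, 0.9])}
print(fuse(scores, {"A": 1.0, "B": 0.5, "C": 0.2}))
```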
76. SetFusion – UMAP 2012
• Field Study: let users freely explore the interface
- ~50% (50 users) tried the SetFusion recommender
- 28% (14 users) bookmarked at least one paper
- Users explored on average 14.9 talks and bookmarked on average 7.36 talks
Distribution of bookmarks per method or combination of methods:

     A     AB    ABC   AC    B     BC    C
     15    7     9     26    18    4     17
     16%   7%    9%    27%   19%   4%    18%
77. TalkExplorer vs. SetFusion
• Comparing distributions of explorations
Between studies I and II of TalkExplorer we observed an important change in the distribution of explorations.
78. TalkExplorer vs. SetFusion
• Comparing distributions of explorations
Comparing the field studies:
- In TalkExplorer, 84% of the explorations over intersections were performed over clusters of 1 item
- In SetFusion it was only 52%, compared to 48% (18% + 30%) over multiple intersections; the difference is not statistically significant
79. Summary & Conclusions
• We presented two implementations of visual interactive interfaces that tackle exploration in a recommendation setting
• We showed that intersections of several contexts of
relevance help to discover relevant items
• The visual paradigm used can have a strong effect on user behavior: we need to keep working on visual representations that promote exploration without increasing the cognitive load on users
80. Limitations & Future Work
• Apply our approach to other domains (fusion of
data sources or recommendation algorithms)
• For SetFusion, find alternatives to scale the
approach to more than 3 sets, potential alternatives:
– Clustering
– Radial sets
• Consider other factors that interact with the user
satisfaction:
– Controllability by itself vs. minimum level of accuracy
81. More Details on SetFusion?
• Effect of other variables: gender, age, experience in the domain, or familiarity with the system
• Check our upcoming IJHCS paper "User-controllable Personalization: A Case Study with SetFusion": a controlled laboratory study of SetFusion versus a traditional ranked list
83. Challenges in Recommender Systems
• Recommendation to groups
• Cross-Domain recommendation
• User-centric evaluation
• Interactive interfaces and visualization
• Improve Evaluation for comparison (P. Campos of U.
of Bio-Bio on doing fair evaluations considering time)
• ML: Active learning, multi-armed bandits (exploration,
exploitation)
• Prevent the “Filter Bubble”
• Make algorithms resistant to attacks
84. Are Virtual Worlds Good Proxies for the Real World?
• Why? We have a Second Life dataset with 3
connected dimensions of information
Social Network
Marketplace
Virtual World
• 2 ongoing projects: Entrepreneurship and LBSN
85. Entrepreneurship
• Can we predict whether a user will create a store, and how successful she/he will be? Literature on this topic is extremely scarce.
Social Network Marketplace
James Gaskin
SEM, Causal models
BYU, USA
Stephen Zhang
Entrepreneurship
PUC Chile
Christoph Trattner
Social Networks
NTNU, Norway
86. Location-Based Social Networks (LBSN)
• How similar are the patterns of mobility in the real world and the virtual world?
Social Network
Virtual World
Christoph Trattner
Social Networks
NTNU, Norway
Leandro Balby-Marinho
LBSN and RecSys
UFCG, Brazil
87. Other RecSys Activities
• I am part of the Program Committee of the 2015
RecSys challenge. Don’t miss it!
» Is the user going to buy items in this session? Yes|No
» If yes, what are the items that are going to be bought?
• I am part of the team creating the upcoming RecSys Forum (like SIGIR Forum). Coming soon! (Alan Said, Cataldo Musto, Alejandro Bellogín, etc.)