PhD Thesis
Information Retrieval Models
for Recommender Systems
Author: Daniel Valcarce
Advisors: Álvaro Barreiro & Javier Parapar
A Coruña, May 8th, 2019
Information Retrieval Lab
Computer Science Department
University of A Coruña
Outline
1. Introduction
2. Evaluation
3. Top-N recommendation
4. Other recommendation tasks
5. Pseudo-relevance feedback
6. Conclusions
1
Introduction
Research aim
Recommender Systems are active Information Filtering systems that
present items that their users may be interested in.
Information Retrieval systems obtain items of information relevant
to the users’ information needs.
Both Information Retrieval and Information Filtering fields:
⊚ cope with enormous amounts of information,
⊚ provide relevant information to their users,
⊚ can offer personalization.
This PhD Thesis revolves around the idea of exploiting Information
Retrieval models in Recommender Systems.
3
Information Retrieval vs Information Filtering
Information Retrieval (IR)
⊚ Goal: retrieve documents
relevant to the users’
information needs.
⊚ Systems: search engines
(web, multimedia...).
⊚ Input: the user’s query
(explicit).
Information Filtering (IF)
⊚ Goal: select relevant items
for the users from an
information stream.
⊚ Systems: spam filters,
recommender systems.
⊚ Input: the user’s profile
(implicit).
4
IR and IF: two sides of the same coin?
Some people consider them different fields:
⊚ U. Hanani, B. Shapira and P. Shoval: Information Filtering:
Overview of Issues, Research and Systems. User Modeling and
User-Adapted Interaction (2001).
While others consider them the same thing:
⊚ N. J. Belkin and W. B. Croft: Information filtering and information
retrieval: two sides of the same coin? Communications of the
ACM (1992).
What is undeniable is that they are closely related.
⊚ Why not apply techniques from one field to the other?
5
Overview of thesis contributions
Information Retrieval (IR)
⊚ Evaluation within the
Cranfield paradigm
⊚ Ad hoc retrieval
⊚ Pseudo-relevance feedback
Recommender Systems (RS)
⊚ Evaluation of top-N
recommendation
⊚ Neighborhood computation
⊚ Recommendation
Ranking metrics are commonly used in IR and RS.
Following previous work in IR, we study the robustness and
discriminative power of these metrics in recommendation.
Neighborhood-based techniques are a family of RS.
We show that ad hoc retrieval models can compute neighborhoods
effectively.
Pseudo-relevance feedback (PRF) provides automatic query
expansion.
We adapt PRF techniques to diverse recommendation tasks.
Sparse linear methods are very effective recommenders.
We propose a PRF model based on sparse linear methods that
achieves state-of-the-art effectiveness.
6
Evaluation
Top-N Recommendation
[Figure illustrating the top-N recommendation task.]
Recommender Systems evaluation
Online evaluation (e.g., A/B testing)
⊚ expensive,
⊚ measures real user behavior.
Offline evaluation (the approach followed in this thesis)
⊚ cheap,
⊚ highly reproducible,
⊚ usually constitutes the first step before deploying a
recommender system.
9
Offline evaluation of RS
When evaluating RS, which metric should we use?
⊚ Many types: error, ranking accuracy, diversity, novelty, etc.
⊚ Ranking accuracy metrics are the most popular.
⊚ These metrics have been traditionally used in IR.
⊚ However, IR and RS evaluation assumptions are quite different:
Information Retrieval
⊚ relevance is independent
of users,
⊚ relevance judgments are
(almost) complete.
Recommender Systems
⊚ relevance depends
on the users,
⊚ relevance judgments are
far from complete.
10
Evaluation
Study of metrics
Ranking metrics study
Precision, Recall, MAP, NDCG, MRR, BPref, InfAP...
Many ranking accuracy metrics have been studied in IR.
We now study their behavior in top-N recommendation.
Two perspectives:
⊚ discriminative power,
⊚ robustness to incompleteness:
◦ sparsity bias,
◦ popularity bias.
12
Robustness to incompleteness
Sparsity bias
⊚ Sparsity is intrinsic to the
recommendation task.
⊚ We take random
subsamples from the test
set to increase the bias.
Popularity bias
⊚ Missing-not-at-random
(long tail distribution).
⊚ We remove the most
popular items to study
the bias.
We measure the robustness of a metric by computing Kendall's correlation between system rankings when changing the amount of bias.
13
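As an illustration of this protocol, here is a minimal sketch (not taken from the thesis; the nDCG values and system names are made up) that computes Kendall's τ between the system ranking obtained on the full test set and the ranking obtained on a biased subsample:

```python
import numpy as np
from scipy.stats import kendalltau

# Hypothetical nDCG@100 values of five recommenders on the full test set
# and on a random 10% subsample of the test ratings (sparsity bias).
full = {"A": 0.49, "B": 0.48, "C": 0.42, "D": 0.30, "E": 0.22}
subsampled = {"A": 0.47, "B": 0.48, "C": 0.40, "D": 0.28, "E": 0.21}

systems = sorted(full)
# Kendall's tau between the orderings induced by the two sets of values.
tau, _ = kendalltau([full[s] for s in systems],
                    [subsampled[s] for s in systems])
print(f"Kendall's tau between system rankings: {tau:.2f}")
```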
Discriminative power
⊚ A metric is discriminative when its differences in value are
statistically significant.
⊚ We use the permutation test with difference in means as test
statistic.
⊚ We run the statistical test between all possible system pairs.
⊚ We plot the obtained p-values sorted by decreasing value.
14
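A minimal sketch of the paired permutation test with the difference in means as the test statistic; the per-user metric values are synthetic and only illustrate how each system pair is compared:

```python
import numpy as np

def permutation_test(a, b, n_perm=10000, seed=7):
    """Two-sided paired permutation test with the difference in means as the
    test statistic; a[i] and b[i] are the per-user metric values of two
    recommenders for the same user."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(a) - np.asarray(b)
    observed = abs(diff.mean())
    hits = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=diff.size)  # randomly swap each pair
        if abs((signs * diff).mean()) >= observed:
            hits += 1
    return hits / n_perm  # p-value

# Hypothetical per-user nDCG@100 values of two systems.
rng = np.random.default_rng(0)
ndcg_a = rng.uniform(0.2, 0.6, size=500)
ndcg_b = ndcg_a + rng.normal(0.01, 0.05, size=500)
print("p-value:", permutation_test(ndcg_a, ndcg_b))
```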
Evaluation
Experiments
Comparing cut-offs of the same metric (nDCG) 1/4
        @5   @10  @20  @30  @40  @50  @60  @70  @80  @90  @100
@5     1.00 0.95 0.93 0.92 0.92 0.92 0.92 0.91 0.90 0.90 0.90
@10    0.95 1.00 0.98 0.97 0.97 0.97 0.97 0.96 0.95 0.95 0.95
@20    0.93 0.98 1.00 0.99 0.99 0.99 0.99 0.98 0.97 0.97 0.97
@30    0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98
@40    0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98
@50    0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98
@60    0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98
@70    0.91 0.96 0.98 0.99 0.99 0.99 0.99 1.00 0.99 0.99 0.99
@80    0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00
@90    0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00
@100   0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00
Correlation between cut-offs of nDCG.
16
Comparing cut-offs of the same metric (nDCG) 2/4
[Line plot: Kendall's τ (y-axis, 0.85-1.00) vs % of ratings in the test set (x-axis, 100 down to 0), one curve per nDCG cut-off from @5 to @100.]
Kendall’s correlation among systems evaluated with nDCG when
increasing the sparsity bias.
17
Comparing cut-offs of the same metric (nDCG) 3/4
[Line plot: Kendall's τ (y-axis, 0.0-1.0) vs % of least popular items in the test set (x-axis, 100 down to 80), one curve per nDCG cut-off from @5 to @100.]
Kendall’s correlation among systems evaluated with nDCG when
changing the popularity bias.
18
Comparing cut-offs of the same metric (nDCG) 4/4
[p-value curves: p-value (y-axis, 0.0-1.0) vs pairs of recommender systems (x-axis, 0-25), one curve per nDCG cut-off from @5 to @100.]
Discriminative power of nDCG measured with p-value curves.
19
Comparing metrics with cut-off @100 1/4
           Precision Recall  MAP   nDCG  MRR   Bpref  InfAP
Precision    1.00     0.89   0.87  0.89  0.71  0.89   0.91
Recall       0.89     1.00   0.87  0.90  0.72  0.90   0.92
MAP          0.87     0.87   1.00  0.96  0.84  0.92   0.92
nDCG         0.89     0.90   0.96  1.00  0.82  0.94   0.96
MRR          0.71     0.72   0.84  0.82  1.00  0.80   0.80
Bpref        0.89     0.90   0.92  0.94  0.80  1.00   0.96
InfAP        0.91     0.92   0.92  0.96  0.80  0.96   1.00
Correlation between metrics at cut-off @100.
20
Comparing metrics with cut-off @100 2/4
[Line plot: Kendall's τ (0.85-1.00) vs % of ratings in the test set (100 down to 0), one curve per metric: Precision, Recall, MAP, nDCG, MRR, Bpref, InfAP.]
Kendall’s correlation among systems when increasing
the sparsity bias.
21
Comparing metrics with cut-off @100 3/4
[Line plot: Kendall's τ (0.0-1.0) vs % of least popular items in the test set (100 down to 80), one curve per metric.]
Kendall’s correlation among systems when increasing
the popularity bias.
22
Comparing metrics with cut-off @100 4/4
[p-value curves: p-value vs pairs of recommender systems, one curve per metric.]
Discriminative power measured with p-value curves.
23
Evaluation
Implications
Findings
⊚ Deep cut-offs offer greater robustness and discriminative power
than shallow cut-offs.
⊚ Precision offers high robustness to sparsity and popularity
biases and good discriminative power.
⊚ nDCG provides the best discriminative power, high robustness to the sparsity bias, and moderate robustness to the popularity bias.
25
Experimental settings: metrics
We measure three recommendation dimensions:
⊚ Ranking accuracy: nDCG@100.
◦ nDCG is robust and discriminative.
◦ nDCG models graded relevance.
⊚ Diversity: Gini@100.
◦ The Gini index measures item recommendation inequality.
⊚ Novelty: MSI@100.
◦ Mean self-information quantifies the unexpectedness of the recommendations.
26
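To make the three metrics concrete, here is a small sketch of one possible implementation; the exact discount, the Gini orientation and the self-information normalization are assumptions of the sketch and may differ from the thesis' exact definitions (items are assumed to be integer ids):

```python
import numpy as np
from collections import Counter

def ndcg_at_k(recommended, relevant, k=100):
    """Binary-relevance nDCG@k for one user (discount 1/log2(rank+1) assumed)."""
    dcg = sum(1.0 / np.log2(r + 2)
              for r, item in enumerate(recommended[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(r + 2) for r in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def gini_at_k(recommendations, n_items, k=100):
    """Gini index of the item exposure counts over all top-k lists
    (0 = every item recommended equally often, 1 = maximal concentration)."""
    counts = Counter(i for recs in recommendations for i in recs[:k])
    x = np.sort(np.array([counts.get(i, 0) for i in range(n_items)], float))
    cum = np.cumsum(x)
    return (len(x) + 1 - 2 * (cum / cum[-1]).sum()) / len(x) if cum[-1] > 0 else 0.0

def msi_at_k(recommendations, item_popularity, n_users, k=100):
    """Mean self-information: -log2 of the items' popularity, summed over each
    top-k list and averaged over users (one possible formulation)."""
    per_user = [sum(-np.log2(item_popularity[i] / n_users) for i in recs[:k])
                for recs in recommendations]
    return float(np.mean(per_user))
```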
Experimental settings: datasets
Dataset Users Items Ratings Density
MovieLens 100k 943 1682 100 000 6.305 %
MovieLens 1M 6040 3706 1 000 209 4.468 %
MovieLens 10M 71 567 10 681 10 000 054 1.308 %
R3-Yahoo 15 400 1000 365 703 2.375 %
LibraryThing 7279 37 232 749 401 0.277 %
BeerAdvocate 33 388 66 055 1 571 808 0.071 %
Ta-Feng 32 266 23 812 817 741 0.106 %
27
Top-N recommendation
Recommender Systems
Recommendation algorithms can be classified into:
⊚ Content-based: find items similar to those the target user liked,
using the item descriptions.
⊚ Collaborative filtering: relies on user-item interactions.
⊚ Hybrid: combination of content-based and collaborative
filtering approaches.
29
Collaborative filtering
Collaborative filtering (CF) exploits user-item feedback:
⊚ Explicit: ratings, reviews, etc.
⊚ Implicit: clicks, purchases, check-ins, etc.
Two main families of CF methods:
⊚ Model-based: learn a predictive model from the data.
⊚ Neighborhood-based (or memory-based): directly use the
user-item feedback to compute recommendations.
30
Neighborhood-based methods
Two perspectives:
⊚ User-based: recommend items that users with common
interests liked.
⊚ Item-based: recommend items similar to those you liked.
Similarity between items is computed using common users
among items (not the content!).
Two phases:
⊚ neighborhood computation,
⊚ recommendation generation.
31
Top-N recommendation
Pseudo-relevance feedback models
for recommendation
Pseudo-relevance feedback (PRF)

[Diagram: the user's information need is formulated as a query and submitted to the retrieval system; a query expansion step takes the query and the top retrieved results and produces an expanded query, which is submitted to the retrieval system again.]
PRF for Recommendation
Pseudo-relevance feedback        Neighborhood-based recommenders
User's query                     User's profile
most^1, populated^1, state^2     Titanic^2, Avatar^3, Watchmen^5
Documents                        Neighbors
Terms                            Items
34
Top-N recommendation
Relevance models
Relevance models
Relevance-based language models or, simply, relevance models (RM)
are state-of-the-art PRF methods [Lavrenko & Croft, SIGIR ’01]:
⊚ RM1: i.i.d. sampling,
⊚ RM2: conditional sampling.
RM has been adapted to user-based CF [Parapar et al., IPM ’13].
36
Relevance models for CF

RM2: p(i|R_u) ∝ p(i) ∏_{j∈I_u} ∑_{v∈V_u} [p(i|v) p(v) / p(i)] p(j|v)

⊚ I_u is the set of items rated by the user u
⊚ V_u is the neighborhood of the user u, computed with kNN and cosine similarity
⊚ the conditional probabilities p(i|v) are computed by smoothing the maximum likelihood estimate with the probability in the collection
⊚ p(i) and p(v) are the item and user priors
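A minimal sketch of how this estimate can be evaluated in log space once the smoothed probabilities and the priors are available; the names and data structures below are illustrative, not the thesis' implementation:

```python
import numpy as np

def rm2_scores(target_items, neighbors, p_item_given_user, p_user, p_item, candidates):
    """Minimal RM2 scoring for one target user (a sketch).

    target_items:       I_u, items rated by the target user
    neighbors:          V_u, the target user's neighborhood
    p_item_given_user:  dict (item, user) -> smoothed p(i|v)
    p_user, p_item:     user and item priors, e.g. uniform dictionaries
    candidates:         items to score (typically those not rated by u)
    """
    scores = {}
    for i in candidates:
        log_score = np.log(p_item[i])
        for j in target_items:
            s = sum(p_item_given_user.get((i, v), 0.0)
                    * p_user[v] / p_item[i]
                    * p_item_given_user.get((j, v), 0.0)
                    for v in neighbors)
            if s <= 0.0:          # the whole product vanishes for this candidate
                log_score = -np.inf
                break
            log_score += np.log(s)
        scores[i] = log_score     # log of p(i|R_u) up to a constant
    return scores
```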
Smoothing in RM2

RM2: p(i|R_u) ∝ p(i) ∏_{j∈I_u} ∑_{v∈V_u} [p(i|v) p(v) / p(i)] p(j|v)

To compute the conditional probabilities, we smooth the maximum likelihood estimate (MLE):

p_mle(i|u) = r(u,i) / ∑_{j∈I_u} r(u,j)

with the probability in the collection:

p(i|C) = ∑_{v∈U} r(v,i) / ∑_{j∈I} ∑_{v∈U} r(v,j)
38
Why use smoothing?
In IR [Zhai & Lafferty, TOIS 2004], smoothing provides:
⊚ a way to deal with data sparsity,
⊚ inverse document frequency (IDF) effect,
⊚ document length normalization.
In RS, we have the same problems:
⊚ data sparsity,
⊚ item popularity/specificity,
⊚ user profiles with different sizes.
39
Smoothing techniques

Jelinek-Mercer smoothing (JMS): linear interpolation controlled by λ.
p_λ(i|u) = (1 − λ) p_mle(i|u) + λ p(i|C)

Dirichlet priors smoothing (DPS): Bayesian analysis with parameter µ.
p_µ(i|u) = [r(u,i) + µ p(i|C)] / [µ + ∑_{j∈I_u} r(u,j)]

Absolute discounting smoothing (ADS): subtract a constant δ.
p_δ(i|u) = [max(r(u,i) − δ, 0) + δ |I_u| p(i|C)] / ∑_{j∈I_u} r(u,j)

Additive smoothing (AS): increase all the ratings by γ > 0.
p_γ(i|u) = [r(u,i) + γ] / [∑_{j∈I_u} r(u,j) + γ |I|]
40
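These four estimates translate directly into code; a sketch with the hyperparameters as keyword arguments (the default values are arbitrary):

```python
def p_jms(r_ui, user_sum, p_ic, lam=0.5):
    """Jelinek-Mercer: interpolate the MLE with the collection model."""
    mle = r_ui / user_sum if user_sum > 0 else 0.0
    return (1 - lam) * mle + lam * p_ic

def p_dps(r_ui, user_sum, p_ic, mu=100.0):
    """Dirichlet priors smoothing."""
    return (r_ui + mu * p_ic) / (mu + user_sum)

def p_ads(r_ui, user_sum, p_ic, profile_size, delta=0.1):
    """Absolute discounting: subtract delta and give that mass to the collection model."""
    return (max(r_ui - delta, 0.0) + delta * profile_size * p_ic) / user_sum

def p_as(r_ui, user_sum, n_items, gamma=0.1):
    """Additive smoothing: increase every rating by gamma."""
    return (r_ui + gamma) / (user_sum + gamma * n_items)
```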
IDF effect
In IR, the IDF effect:
⊚ measures term specificity in most weighting schemes,
⊚ was born as a heuristic but was given theoretical justification.
In RS, item specificity is related to item novelty.
IDF effect in recommendation
⊚ Let u be a user from the set of users U;
⊚ let Vu be their neighborhood;
⊚ given two items i1 and i2 with:
◦ the same ratings r(v, i1) = r(v, i2) ∀ v ∈ Vu,
◦ different popularity p(i1|C) < p(i2|C);
⊚ a recommender system that outputs p(i1|Ru) > p(i2|Ru) is said to
support the IDF effect.
41
Smoothing: axiomatic analysis of the IDF effect
We analyze axiomatically the IDF effect in RM2 when using different
smoothing methods:
Smoothing method IDF effect?
Jelinek-Mercer ×
Dirichlet priors ×
Absolute discounting ×
Additive ✓
We expect additive smoothing to offer better figures of novelty.
42
Smoothing: ranking accuracy
[Plot: nDCG@100 (0.28-0.50) vs smoothing parameter (δ, λ, µ × 10^3 on the lower x-axis; γ on the upper x-axis), one curve per smoothing method.]
Figure: nDCG@100 values of RM2 varying the smoothing method on
MovieLens 100k. Also evaluated in MovieLens 1M, R3-Yahoo and LibraryThing.
43
Smoothing: diversity
[Plot: Gini@100 (0.08-0.26) vs smoothing parameter, one curve per smoothing method.]
Figure: Gini@100 values of RM2 varying the smoothing method on
MovieLens 100k. Also evaluated in MovieLens 1M, R3-Yahoo and LibraryThing.
44
Smoothing: novelty
[Plot: MSI@100 (130-180) vs smoothing parameter, one curve per smoothing method.]
Figure: MSI@100 values of RM2 varying the smoothing method on MovieLens
100k. Also evaluated in MovieLens 1M, R3-Yahoo and LibraryThing.
45
Priors in RM2

RM2: p(i|R_u) ∝ p(i) ∏_{j∈I_u} ∑_{v∈V_u} [p(i|v) p(v) / p(i)] p(j|v)

p(i) and p(v) are the item and user priors:
⊚ they allow us to introduce a priori information into the model,
⊚ they provide a principled way of modeling business rules,
⊚ they are similar to document priors used in IR, such as:
◦ the linear document length prior [Kraaij et al., SIGIR ’02],
◦ the probabilistic document length prior [Blanco & Barreiro, ECIR ’08].
46
User prior estimators

Uniform:
p_U(u) = 1 / |U|

Linear:
p_L(u) = p(u|C) = ∑_{i∈I_u} r(u,i) / ∑_{v∈U} ∑_{j∈I_v} r(v,j)

Probabilistic prior using Jelinek-Mercer smoothing:
p_PJMS(u) = (1 − λ) + λ ∑_{i∈I_u} p(i|C)

Probabilistic prior using Dirichlet priors smoothing:
p_PDPS(u) = [∑_{i∈I_u} r(u,i) + µ ∑_{i∈I_u} p(i|C)] / [µ + ∑_{i∈I_u} r(u,i)]

Probabilistic prior using absolute discounting smoothing:
p_PADS(u) = [∑_{i∈I_u} max(r(u,i) − δ, 0) + δ |I_u| ∑_{i∈I_u} p(i|C)] / ∑_{j∈I_u} r(u,j)

Probabilistic prior using additive smoothing:
p_PAS(u) = [∑_{i∈I_u} r(u,i) + γ |I_u|] / [∑_{j∈I_u} r(u,j) + γ |I|]
47
Item prior estimators

Uniform:
p_U(i) = 1 / |I|

Linear:
p_L(i) = p(i|C) = ∑_{u∈U_i} r(u,i) / ∑_{j∈I} ∑_{v∈U_j} r(v,j)

Probabilistic prior using Jelinek-Mercer smoothing:
p_PJMS(i) = (1 − λ) + λ ∑_{u∈U_i} p(u|C)

Probabilistic prior using Dirichlet priors smoothing:
p_PDPS(i) = [∑_{u∈U_i} r(u,i) + µ ∑_{u∈U_i} p(u|C)] / [µ + ∑_{u∈U_i} r(u,i)]

Probabilistic prior using absolute discounting smoothing:
p_PADS(i) = [∑_{u∈U_i} max(r(u,i) − δ, 0) + δ |U_i| ∑_{u∈U_i} p(u|C)] / ∑_{v∈U_i} r(v,i)

Probabilistic prior using additive smoothing:
p_PAS(i) = [∑_{u∈U_i} r(u,i) + γ |U_i|] / [∑_{v∈U_i} r(v,i) + γ |U|]
48
Priors: evaluation
RM2 Metric ML 100k ML 1M R3-Yahoo LibraryThing
U-U
nDCG 0.4936 0.4242 0.0706 0.2206
Gini 0.2470 0.1352 0.3006 0.0390
MSI 175.94 172.14 303.87 331.05
U-PJMS
nDCG 0.4953* 0.4296* 0.0717* 0.2385*
Gini 0.2637 0.1637 0.4769 0.0319
MSI 180.45* 182.75* 339.65* 417.57*
Table: Comparison of RM2 method using uniform user and item priors (U-U)
or a uniform user prior and a probabilistic item prior estimate with
Jelinek-Mercer smoothing (U-PJMS). Best values in pink. Statistically
significant improvements (permutation test p < 0.05) with a *.
49
Top-N recommendation
Rocchio framework
Previous Work on Adapting PRF Methods to CF

Relevance models are very effective recommenders but have:
⊚ high computational cost,
⊚ several hyperparameters to tune,
⊚ different smoothing and prior choices to be made.

RM1: p(i|R_u) ∝ ∑_{v∈V_u} p(v) p(i|v) ∏_{j∈I_u} p(j|v)

RM2: p(i|R_u) ∝ p(i) ∏_{j∈I_u} ∑_{v∈V_u} [p(i|v) p(v) / p(i)] p(j|v)
51
Popular approaches to pseudo-relevance feedback
⊚ Relevance models
[Lavrenko & Croft, SIGIR ’01]
⊚ Scoring functions based on the Rocchio framework
[Rocchio, 1971; Carpineto et al., ACM TOIS ’01]
⊚ Divergence minimization model
[Zhai & Lafferty, SIGIR ’06]
⊚ Mixture models
[Tao & Zhai, SIGIR ’06]
52
Scoring functions from Rocchio framework

Rocchio Weights (RW):
p_RW(i|u) = ∑_{v∈V_u} r(v,i) / |V_u|

Robertson Selection Value (RSV):
p_RSV(i|u) = p(i|V_u) ∑_{v∈V_u} r(v,i) / |V_u|

CHI2:
p_CHI2(i|u) = [p(i|V_u) − p(i|C)]² / p(i|C)

Kullback–Leibler Divergence (KLD):
p_KLD(i|u) = p(i|V_u) log [ p(i|V_u) / p(i|C) ]
53
Probability estimators

Maximum likelihood estimate (MLE)
MLE of a multinomial distribution over the ratings:

p_mle(i|V_u) = ∑_{v∈V_u} r(v,i) / ∑_{v∈V_u} ∑_{j∈I} r(v,j)

p_mle(i|C) = ∑_{u∈U} r(u,i) / ∑_{u∈U} ∑_{j∈I} r(u,j)
54
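Using these estimates, the four Rocchio-style scoring functions reduce to a few lines; a sketch in which the probabilities and the neighborhood statistics are assumed to be precomputed:

```python
import numpy as np

def rocchio_score(p_i_Vu, p_i_C, r_sum_i, n_neighbors, scheme="KLD"):
    """One of the four scoring functions for an item i and a target user u.

    p_i_Vu:      p(i|V_u), MLE of i over the neighborhood ratings
    p_i_C:       p(i|C), MLE of i over the whole collection
    r_sum_i:     sum of the neighbors' ratings for i
    n_neighbors: |V_u|
    """
    if scheme == "RW":       # Rocchio weights
        return r_sum_i / n_neighbors
    if scheme == "RSV":      # Robertson selection value
        return p_i_Vu * r_sum_i / n_neighbors
    if scheme == "CHI2":
        return (p_i_Vu - p_i_C) ** 2 / p_i_C
    if scheme == "KLD":      # pointwise Kullback-Leibler divergence
        return p_i_Vu * np.log(p_i_Vu / p_i_C) if p_i_Vu > 0 else 0.0
    raise ValueError(f"unknown scheme: {scheme}")
```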
Neighborhood size normalization (I)
Neighborhoods are computed using clustering algorithms:
⊚ Hard clustering: every user appears in only one cluster. Clusters
may have different sizes. Example: k-means.
⊚ Soft clustering: each user has its own neighbors. When we set k
to a high value, we may find different amounts of neighbors.
Example: kNN algorithm.
Idea: why not consider the variability of neighborhood sizes?
⊚ Large neighborhoods are equivalent to queries with many results: the collection model is closer to the target user.
⊚ Small neighborhoods imply that neighbors are highly specific:
the collection is very different from the target user.
55
Neighborhood size normalization (II)

Normalized MLE (NMLE)
We bias the MLE to perform neighborhood size normalization:

p_nmle(i|V_u) ∝ (1/|V_u|) · ∑_{v∈V_u} r(v,i) / ∑_{v∈V_u} ∑_{j∈I} r(v,j)

p_nmle(i|C) ∝ (1/|U|) · ∑_{u∈U} r(u,i) / ∑_{u∈U} ∑_{j∈I} r(u,j)
56
Rocchio: efficiency
[Bar chart: recommendation time per user in seconds, log scale from 0.001 to 1, for RM2, RW, RSV, KLD and CHI2 on ML 100k, ML 1M and ML 10M.]
Figure: Recommendation time per user (in logarithmic scale) using RM2, RW,
RSV, CHI2 and KLD algorithms on the MovieLens 100k, 1M and 10M datasets.
57
Rocchio: ranking accuracy

Method       ML 100k        ML 1M          R3-Yahoo     LibraryThing
RM2          0.4953^bcdefg  0.4296^bcdefg  0.0717^bcd   0.2385^bcg
RW           0.4827^cdef    0.4114^cdef    0.0704^d     0.2182^c
RSV          0.4825^def     0.4112^def     0.0703^d     0.2180
CHI2-MLE     0.2916         0.2775         0.0628       0.2605^abcfg
CHI2-NMLE    0.4639^df      0.3966^df      0.0726^bcdf  0.2610^abcfg
KLD-MLE      0.4207^d       0.3393^d       0.0709^d     0.2543^abcg
KLD-NMLE     0.4839^def     0.4195^bcdef   0.0715^bcd   0.2337^bc

Table: Values of nDCG@100. Statistically significant improvements (permutation test p < 0.05) with respect to RM2, RW, RSV, CHI2-MLE, CHI2-NMLE, KLD-MLE and KLD-NMLE are superscripted with a, b, c, d, e, f and g, respectively. Best values in pink.
58
Rocchio: diversity
Method ML 100k ML 1M R3-Yahoo LibraryThing
RM2 0.2637 0.1637 0.4769 0.0319
RW 0.2341 0.1331 0.2937 0.0348
RSV 0.2338 0.1329 0.2940 0.0346
CHI2-MLE 0.3745 0.3895 0.4429 0.1496
CHI2-NMLE 0.2947 0.1677 0.4136 0.1128
KLD-MLE 0.3168 0.3190 0.6064 0.0891
KLD-NMLE 0.2806 0.1540 0.3037 0.0669
Table: Values of Gini@100. Best values in pink.
59
Rocchio: novelty
Method ML 100k ML 1M R3-Yahoo LibraryThing
RM2 180.45 182.75 339.65 417.57
RW 172.72 171.87 302.82 326.95
RSV 172.60 171.80 302.91 326.69
CHI2-MLE 233.63 262.21 333.12 442.55
CHI2-NMLE 190.77 188.34 327.74 400.18
KLD-MLE 199.23 237.88 371.56 396.31
KLD-NMLE 185.27 179.59 306.48 359.25
Table: Values of MSI@100. Best values in pink.
60
Top-N recommendation
Improving neighborhoods
Neighborhood-based methods
Neighborhood-based methods usually are:
⊚ simple,
⊚ efficient,
⊚ explainable.
But their effectiveness relies largely on the quality of the neighbors.
The most common approach is to compute the k nearest neighbors
(kNN algorithm) using a pairwise similarity.
62
Weighted sum recommender (WSR)

NNCosNgbr [Cremonesi et al., RecSys ’10]
r̂_{u,i} = b_{u,i} + ∑_{j∈J_i} shrunk_cosine(i,j) (r(u,j) − b_{u,i})

Item-based weighted sum recommender (WSR-IB)
r̂_{u,i} = ∑_{j∈J_i} cos(i,j) r(u,j)

User-based weighted sum recommender (WSR-UB)
r̂_{u,i} = ∑_{v∈V_u} cos(u,v) r(v,i)
63
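A sketch of WSR-UB on a dense rating matrix (the item-based variant is symmetric); a real implementation would use sparse matrices and a precomputed cosine similarity, so the names and layout below are only illustrative:

```python
import numpy as np

def wsr_user_based(R, sim_users, u, k=100):
    """User-based weighted sum: r_hat(u, i) = sum_v cos(u, v) * r(v, i),
    restricted to the k nearest neighbours of u.

    R:         |U| x |I| rating matrix (numpy array, 0 = unrated)
    sim_users: |U| x |U| cosine similarity matrix between user profiles
    """
    sims = sim_users[u].copy()
    sims[u] = 0.0                          # exclude the target user
    neighbors = np.argsort(-sims)[:k]      # the k most similar users (V_u)
    scores = sims[neighbors] @ R[neighbors, :]
    scores[R[u] > 0] = -np.inf             # do not re-recommend rated items
    return np.argsort(-scores)             # item ranking for user u
```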
Experiments with WSR
Method Metric ML 100k ML 1M R3-Yahoo LibraryThing
NNCosNgbr
nDCG 0.2227 0.1980 0.0567 0.0852
Gini 0.3438 0.2407 0.2341 0.0659
MSI 230.14 228.00 386.78 546.47
WSR-UB
nDCG 0.4857* 0.4138* 0.0705* 0.2213*
Gini 0.2375 0.1356 0.3208 0.0768
MSI 173.86 172.76 309.52 364.70
WSR-IB
nDCG 0.4833* 0.4035* 0.0727* 0.3085*
Gini 0.2560 0.1516 0.3356 0.2768
MSI 177.34 178.95 315.05 461.73
Table: Statistically significant improvements in nDCG@100 (permutation test
p < 0.05) with respect to NNCosNgbr are indicated with *. Best values of
nDCG@100 in pink.
64
Top-N recommendation
Improving cosine with an oracle
Room for improvement
WSR with kNN cosine works well in top-N recommendation.
What is the room for improvement of this similarity measure?
Let’s build an oracle that generates ideal neighborhoods:
⊚ Finding the best neighborhood is an NP-hard problem.
⊚ We build an approximate oracle using a greedy approach.
66
Greedy neighborhood oracle
[Two panels: nDCG@100 vs k (0-300) for WSR with the greedy-oracle neighborhoods (around 0.80-0.90) and with kNN cosine neighborhoods (around 0.40-0.50).]
Figure: Values of nDCG@100 of WSR when using the neighborhoods
produced by the greedy oracle and by kNN using cosine similarity on
MovieLens 100k.
67
Cosine-based neighborhood oracle
The neighborhoods produced by the greedy oracle may be
impossible to achieve with similarities based on co-occurrence.
We develop a simpler oracle based on cosine similarity:
⊚ We find the best neighborhoods that cosine similarity can
provide by tuning the value k for each user.
⊚ This oracle can be seen as an adaptive kNN algorithm that
uses the optimal k for each user.
68
Comparison against oracles
Method nDCG@100 Gini@100 MSI@100
kNN Cosine 0.4857 0.2375 173.86
Cosine-based Oracle 0.5298 0.2508 174.97
Greedy Oracle 0.8631 0.2664 168.08
Table: Values of nDCG@100, Gini@100 and MSI@100 using WSR with cosine
similarity and the two oracles on the MovieLens 100k dataset.
69
Cosine similarity improvements
By studying the properties of the neighborhoods provided by the
oracles, we modify cosine similarity:
⊚ We penalize the cosine similarity to add user profile size
normalization.
◦ Similar to the pivoted document length normalization in IR
[Singhal et al., SIGIR ’96].
⊚ We add the IDF effect to cosine similarity to increase the user
profile overlap of the neighbors.
◦ The IDF is a fundamental term specificity measure in IR.
70
Cosine similarity improvements: results

Method                     Metric  ML 100k   ML 1M     R3-Yahoo  LibraryThing
Cosine                     nDCG    0.4857    0.4138    0.0704    0.2255
                           Gini    0.2375    0.1356    0.3107    0.0417
                           MSI     173.86    172.76    305.26    333.50
Penalized Cosine           nDCG    0.4889*   0.4194*   0.0709    0.2266
                           Gini    0.2516    0.1446    0.2863    0.0471
                           MSI     177.97*   176.41*   302.39    339.05*
Penalized Cosine with IDF  nDCG    0.4927*†  0.4281*†  0.0721*†  0.2422*†
                           Gini    0.2517    0.1551    0.3376    0.0596
                           MSI     178.65*   180.41*†  312.08*†  354.46*†

Table: Statistically significant improvements in nDCG@100 and MSI@100 (permutation test p < 0.05) with respect to cosine and penalized cosine are indicated with * and †, respectively. Best values in pink.
71
Top-N recommendation
Language models for computing
neighborhoods
Alternatives to cosine similarity
So far, we have improved cosine similarity with ideas from IR.
Can we do better than with cosine similarity?
Let’s study cosine similarity from an IR perspective.
73
Cosine similarity and the vector space model
Recommender Systems
⊚ Target user
⊚ Rest of users
⊚ Items
Information Retrieval
⊚ Query
⊚ Documents
⊚ Terms
Computing neighborhoods using cosine similarity is equivalent to
search in the vector space model.
If we swap users and items, we can derive an analogous item-based
approach.
We can use sophisticated search techniques for finding neighbors!
74
Language models

Statistical language models are a state-of-the-art ad hoc retrieval framework [Ponte & Croft, SIGIR ’98].

Documents are ranked according to their posterior probability given the query:

p(d|q) = p(q|d) p(d) / p(q) ∝ p(q|d) p(d)

The query likelihood, p(q|d), is based on a unigram model:

p(q|d) = ∏_{t∈q} p(t|d)^{c(t,q)}

The document prior, p(d), is usually considered uniform.
75
Language models for finding neighborhoods (I)

Ad hoc retrieval:
p(d|q) ∝ p(d) ∏_{t∈q} p(t|d)^{c(t,q)}

User-based collaborative filtering:
p(v|u) ∝ p(v) ∏_{i∈I_u} p(i|v)^{r(u,i)}

Item-based collaborative filtering:
p(j|i) ∝ p(j) ∏_{u∈U_i} p(u|j)^{r(u,i)}
76
Language models for finding neighborhoods (II)

User-based collaborative filtering:
p(v|u) ∝ p(v) ∏_{i∈I_u} p(i|v)^{r(u,i)}

We assume a multinomial distribution over the counts of ratings:

p_mle(i|v) = r(v,i) / ∑_{j∈I_v} r(v,j)

However, it suffers from sparsity. We need smoothing!
⊚ Jelinek-Mercer smoothing (JMS)
⊚ Dirichlet priors smoothing (DPS)
⊚ Absolute discounting smoothing (ADS)
⊚ Additive smoothing (AS)
77
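For user-based neighborhoods with Jelinek-Mercer smoothing, the candidate-scoring step can be sketched as follows (log space, uniform user prior omitted; this assumes the target user's ratings play the role of the query term counts and is not the thesis' implementation):

```python
import numpy as np

def lm_jms_neighbor_score(u, v, R, lam=0.5):
    """log p(v|u) up to the user prior, with Jelinek-Mercer smoothing.
    R is a dense |U| x |I| rating matrix (0 = unrated); a sketch only."""
    p_i_c = R.sum(axis=0) / R.sum()              # collection model p(i|C)
    p_i_v = R[v] / R[v].sum()                    # MLE of the candidate neighbor v
    smoothed = (1 - lam) * p_i_v + lam * p_i_c
    rated = R[u] > 0                             # I_u, the "query terms"
    return float((R[u, rated] * np.log(smoothed[rated])).sum())
```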
Language models: ranking accuracy
[Plot: nDCG@100 (0.24-0.44) vs smoothing parameter (δ, λ, µ × 4 × 10^3; γ), one curve per smoothing method plus the cosine baseline.]
Figure: nDCG@100 values of WSR-UB varying the smoothing method on
MovieLens 1M. Also evaluated in MovieLens 100k, R3-Yahoo and LibraryThing.
78
Language models: diversity
[Plot: Gini@100 (0.04-0.20) vs smoothing parameter, one curve per smoothing method plus the cosine baseline.]
Figure: Gini@100 values of WSR-UB varying the smoothing method on
MovieLens 1M. Also evaluated in MovieLens 100k, R3-Yahoo and LibraryThing.
79
Language models: novelty
[Plot: MSI@100 (140-195) vs smoothing parameter, one curve per smoothing method plus the cosine baseline.]
Figure: MSI@100 values of WSR-UB varying the smoothing method on
MovieLens 1M. Also evaluated in MovieLens 100k, R3-Yahoo and LibraryThing.
80
Language models: ranking accuracy

Method          ML 100k      ML 1M         R3-Yahoo    LibraryThing
Cosine WSR-UB   0.4857^b     0.4138^b      0.0703      0.2255
Cosine WSR-IB   0.4790       0.4035        0.0727^a    0.3085^acdf
Cosine RM2      0.4953^ab    0.4322^abe    0.0717^a    0.2384^a
LM-JMS WSR-UB   0.4990^abc   0.4329^abe    0.0719^a    0.2370^a
LM-JMS WSR-IB   0.4989^abc   0.4232^ab     0.0731^a    0.3118^abcdf
LM-JMS RM2      0.5021^abcd  0.4392^abcde  0.0731^acd  0.2406^ad

Table: Ranking accuracy figures measured in nDCG@100. Statistically significant improvements (permutation test p < 0.05) indicated with superscripts. Best values in pink.
81
Language models: diversity
Method ML 100k ML 1M R3-Yahoo LibraryThing
Cosine WSR-UB 0.2375 0.1356 0.3107 0.0417
Cosine WSR-IB 0.2738 0.1516 0.3309 0.2768
Cosine RM2 0.2637 0.1533 0.4769 0.1278
LM-JMS WSR-UB 0.2645 0.1731 0.3566 0.0570
LM-JMS WSR-IB 0.2952 0.1854 0.3520 0.3368
LM-JMS RM2 0.2794 0.1825 0.4281 0.1285
Table: Diversity figures measured in Gini@100. Best values in pink.
82
Language models: novelty

Method          ML 100k       ML 1M         R3-Yahoo      LibraryThing
Cosine WSR-UB   173.86        172.76        305.26        333.50
Cosine WSR-IB   181.59^ac     178.95^a      314.12^a      461.74^acdf
Cosine RM2      180.45^a      179.39^a      339.64^abdef  417.56^ad
LM-JMS WSR-UB   180.59^a      186.15^abc    314.23^a      352.80^a
LM-JMS WSR-IB   190.23^abcdf  191.34^abcdf  318.00^abd    499.73^abcdf
LM-JMS RM2      184.29^abcd   189.27^abcd   332.49^abde   418.39^ad

Table: Novelty figures measured in MSI@100. Statistically significant improvements (permutation test p < 0.05) indicated with superscripts. Best values in pink.
83
Why does LM with JMS work?

Why do language models with Jelinek-Mercer smoothing work better than cosine similarity?
To explain this, we perform an axiomatic analysis.
We define the user specificity and item specificity properties.
84
User specificity
User specificity
⊚ Given the target user u,
⊚ and the candidate neighbors v and w such that:
◦ Iu ∩ Iv = Iu ∩ Iw ,
◦ r(u, i) = r(v, i) = r(w, i) ∀i ∈ Iu ∩ Iv ,
◦ |v| < |w|;
⊚ the user specificity property enforces sim(u, v) > sim(u, w).
85
Item specificity
Item specificity
⊚ Let u be the target user;
⊚ let v and w be two candidate users such that |v| = |w|;
⊚ let j and k be two items from the set of items I such that:
◦ j ∈ Iu ∩ Iv ,
◦ k ∈ Iu ∩ Iw ;
⊚ given:
◦ (I_u ∩ I_v) \ {j} = (I_u ∩ I_w) \ {k},
◦ r(u, j) = r(v, j) = r(u, k) = r(w, k),
◦ r(u, i) = r(v, i) = r(w, i) ∀i ∈ Iu ∩ Iv ∩ Iw ;
⊚ if |j| < |k|, then the item specificity property enforces
sim(u, v) > sim(u, w).
86
Language models: axiomatic analysis
We analyze axiomatically the user specificity and item specificity
properties in cosine similarity and in language models with
Jelinek-Mercer smoothing:
Neighborhood method User specificity Item specificity
Cosine similarity ∼ ∼
LM-JMS ✓ ✓
We think differences in effectiveness may be related to these
properties.
87
Other recommendation tasks
Other recommendation problems
Top-N recommendation is the most prominent task in RS.
However, recommendation technologies are used in many
industrial scenarios.
In this part, we focus on two less popular recommendation
problems:
⊚ long tail liquidation,
⊚ user-item group formation.
89
Other recommendation tasks
Long tail liquidation
Long tail liquidation
Item popularity follows a long tail distribution.
The excess of inventory or overstock is a source of revenue loss.
We formulate a recommendation task centered on the liquidation of
long tail items.
We propose an item-based adaptation of relevance models to deal
with this novel task.
91
Long tail liquidation problem

Let I′ ⊂ I be the items we want to liquidate. We aim to find a scoring function s′ : I′ × U → ℝ such that:
⊚ for each item i ∈ I′,
⊚ we can build a ranked list of n users L^n_i ∈ U^n,
⊚ that are most likely interested in such item i.
92
Long tail estimation

Least rated products:
I′ = { i ∈ I : |U_i| < c_1 }

Lowest rated products:
I′ = { i ∈ I : (∑_{u∈U_i} r_{u,i}) / |U_i| < c_2 }

Least recommended products:
I′ = { i ∈ I : i ∉ L^{c_3}_u, ∀u ∈ U }
93
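The first two estimations can be computed directly from the rating data; a small sketch (the third one requires running a recommender to obtain the lists L^{c_3}_u; thresholds and names are illustrative):

```python
def long_tail_items(ratings_by_item, c1=5, c2=2.5):
    """Two of the long-tail estimations above, as a sketch.
    ratings_by_item[i] is the list of ratings received by item i;
    c1 and c2 are illustrative threshold values."""
    least_rated = {i for i, rs in ratings_by_item.items() if len(rs) < c1}
    lowest_rated = {i for i, rs in ratings_by_item.items()
                    if rs and sum(rs) / len(rs) < c2}
    return least_rated, lowest_rated
```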
Item relevance models

IRM2:
p(u|R_i) ∝ p(u) ∏_{v∈U_i} ∑_{j∈J_i} p(v|j) [p(u|j) p(j) / p(u)]

MLE with additive smoothing:
p_γ(u|i) = [r(u,i) + γ] / [∑_{v∈U_i} r(v,i) + γ |U|]

Item neighborhoods:
J_i is computed using the kNN algorithm with cosine similarity.

User and item priors:
We use uniform estimators.
94
Long tail liquidation: results on LibraryThing

Method    Least rated       Lowest rated     Least recommended
Random    0.0024            0.0002           0.0030
Pop       0.0408^acd        0.0499^acd       0.0455^acd
kNN-UB    0.0018            0.0039           0.0026
kNN-IB    0.0255^ac         0.0061           0.0169^ac
UIR-IB    0.0890^abcd       0.0894^abcd      0.0876^abcd
HT        0.1431^abcdeg     0.1451^abcdeg    0.1477^abcdeg
PureSVD   0.0879^abcd       0.0919^abcd      0.1065^abcde
SLIM      0.2004^abcdefg    0.2029^abcdefg   0.2495^abcdefg
IRM2      0.2120^abcdefgh   0.2108^abcdefg   0.2522^abcdefg

Table: Values of nDCG@100 on LibraryThing for each long tail estimation. Superscripts indicate significant improvements. Best values in pink.
95
Long tail liquidation: results on Ta-Feng
[Plot: nDCG@100 (0.000-0.010) vs #buyers (1-10), one curve per method: Random, Pop, kNN-UB, kNN-IB, UIR-Item, HT, PureSVD, SLIM, IRM2.]
Figure: Values of nDCG@100 on the Ta-Feng dataset for liquidating long tail
items (those with no more than n buyers).
96
Other recommendation tasks
User-item group formation
User-item group formation
The user-item group formation (UIGF) problem aims to find the best
companions for a given item and a target user [Brilhante et al.,
ICMDM ’16].
IRM2:
⊚ estimates the relevance of a user given an item;
⊚ deals with long tail item liquidation with uniform priors.
We can model the user relationships with different prior estimators.
98
User-item group formation problem

UIGF as an item relevance modeling problem:
⊚ Given the target user u ∈ U,
⊚ the recommended item i ∈ I,
⊚ and an integer k;
⊚ the UIGF problem seeks the set F^G_{u,i} ⊆ U such that:

F^G_{u,i} = argmax_{F*} ∑_{v∈F*} p(v|R_i)   s.t. F* ⊆ U, |F*| = k
99
UIGF priors

Uniform prior (U):
p_U(v) = 1 / |F_u|

Common Friends (CF):
p_CF(v) ∝ 1 / |F_u ∩ F_v|

Common group friends (CGF):
p_CGF(v) ∝ 1 / |(∪_{w∈F^G_{u,i}} F_w) ∩ F_v|

Group closeness (GC):
p_GC(v) ∝ 1 / |F^G_{u,i} ∩ F_v|
100
UIGF evaluation
We used ground truth groups to evaluate UIGF approaches:
⊚ users who checked in the same place within 4 hours,
⊚ groups of at least 4 members,
⊚ each user must be friends with at least one other group member.
Evaluation protocol:
⊚ for each group, we select a random member as the target user
and the place where the group registered as the target item;
⊚ we ask the UIGF model to form a group of k friends for this
specific user and item;
⊚ we evaluate the precision of the recommended group against
the ground truth groups.
101
UIGF datasets
Dataset Users Items Links Check-ins Ratings
FS 2 138 367 83 999 27 098 472 1 021 966 2 809 580
FS-NYC 103 663 7813 1 890 844 157 064 330 043
Gowalla 196 591 1 280 969 1 900 654 6 442 892 −
Brightkite 58 228 772 966 428 156 4 747 281 −
Weeplaces 15 799 971 307 114 131 7 369 712 −
Table: Statistics of location-based social network datasets.
102
UIGF: results in Foursquare
[Plot: Precision (0.05-0.55) vs group size (4-12) on Foursquare, one curve per method: k-Top, DkSP (PAV), DkSP (PLM), GREEDY (PAV), GREEDY (PLM), k-NN (PAV), k-NN (PLM), IRM2-U, IRM2-CF, IRM2-CGF, IRM2-GC.]
103
UIGF: results in Brightkite
[Plot: Precision (0.10-0.40) vs group size (4-12) on Brightkite, same methods as above.]
104
Pseudo-relevance feedback
LiMe: Linear Methods for PRF
Linear methods such as SLIM have been successfully used in
recommendation [Ning & Karypis, ICDM ’11].
We adapt them to PRF. Our proposal LiMe:
⊚ models the PRF task as a matrix decomposition problem,
⊚ employs linear methods to provide a solution,
⊚ is able to learn inter-term or inter-document similarities,
⊚ jointly models the query and the pseudo-relevant set,
⊚ admits different feature schemes,
⊚ is agnostic to the retrieval model.
106
LiMe variants
Two variants:
⊚ DLiMe: learns inter-document similarities.
⊚ TLiMe: learns inter-term similarities.
107
TLiMe: matrix formulation

Let X ∈ ℝ^{m×n} be the extended pseudo-relevant set matrix. We aim to find an inter-term similarity matrix W ∈ ℝ^{n×n}_+ such that:

X = X × W

where the rows of X are the query Q and the pseudo-relevant documents D_1, ..., D_{m−1} (an m × n matrix) and W = (w_{kl}) is an n × n matrix,
s.t. diag(W) = 0, W ≥ 0.
108
LiMe: feature schemes

How do we fill the matrix X = [Q; D_1; ...; D_{m−1}] (m × n)?

x_{ij} = s(t_j, Q)         if i = 1 and f(t_j, Q) > 0,
         s(t_j, D_{i−1})   if i > 1 and f(t_j, D_{i−1}) > 0,
         0                 otherwise

⊚ s_tf-idf(t, D) = (1 + log_2 f(t, D)) × log_2 (|C| / df(t))
⊚ f(t, D): number of occurrences of term t in D (or Q)
109
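A sketch of how the matrix X can be assembled with the tf-idf scheme above; term-to-frequency dictionaries and a fixed vocabulary are assumptions made for simplicity:

```python
import numpy as np

def build_X(query_bag, doc_bags, vocab, n_collection_docs, df):
    """Build the extended pseudo-relevant set matrix X (m x n): the first row
    is the query Q, the remaining rows are the pseudo-relevant documents.

    query_bag / doc_bags: term -> frequency mappings (f(t, Q), f(t, D))
    vocab:                list of the n terms indexing the columns
    df:                   collection document frequency of each term
    """
    def s_tfidf(freq, term):
        return (1 + np.log2(freq)) * np.log2(n_collection_docs / df[term])

    rows = [query_bag] + list(doc_bags)
    X = np.zeros((len(rows), len(vocab)))
    for i, bag in enumerate(rows):
        for j, term in enumerate(vocab):
            freq = bag.get(term, 0)
            if freq > 0:
                X[i, j] = s_tfidf(freq, term)
    return X
```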
LiMe: optimization problem

Matrix optimization problem:

W* = argmin_W (1/2) ‖X − X W‖²_F + β_1 ‖W‖_{1,1} + (β_2/2) ‖W‖²_F
     s.t. diag(W) = 0, W ≥ 0                                          (1)

Column by column, this is a bound-constrained least squares problem with an elastic net (ℓ1 and ℓ2 regularization) penalty:

w*_{·j} = argmin_{w_{·j}} (1/2) ‖x_{·j} − X w_{·j}‖²_2 + β_1 ‖w_{·j}‖_1 + (β_2/2) ‖w_{·j}‖²_2
          s.t. w_{jj} = 0, w_{·j} ≥ 0                                 (2)
110
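Problem (2) can be solved with any non-negative elastic-net solver; a sketch using scikit-learn, where the mapping from (β1, β2) to sklearn's (alpha, l1_ratio) accounts for the 1/m factor in sklearn's objective, and the diag(W) = 0 constraint is enforced by zeroing the j-th column of the design matrix (an implementation assumption, not necessarily the solver used in the thesis):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def fit_lime_W(X, beta1=0.1, beta2=0.01):
    """Solve problem (2) column by column with a non-negative elastic net."""
    m, n = X.shape
    alpha = (beta1 + beta2) / m          # sklearn divides the loss by n_samples
    l1_ratio = beta1 / (beta1 + beta2)
    W = np.zeros((n, n))
    for j in range(n):
        A = X.copy()
        A[:, j] = 0.0                    # forces w_jj = 0 (diag(W) = 0)
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                           positive=True, fit_intercept=False, max_iter=5000)
        model.fit(A, X[:, j])
        W[:, j] = model.coef_
    return W
```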
LiMe: query expansion

To expand the original query, we reconstruct the first row of X:

(Q′)_{1×n} = (Q)_{1×n} × W*,   i.e.,   x̂_{1·} = x_{1·} W*            (3)

We compute a probabilistic estimate of a term t_j given the feedback model θ_F:

p(t_j|θ_F) = x̂_{1j} / ∑_{t_v ∈ V_{F′}} x̂_{1v}   if t_j ∈ V_{F′},
             0                                    otherwise           (4)
111
LiMe: second retrieval

The second retrieval is performed by interpolating the original query model with the feedback model:

p(t|θ′_Q) = (1 − α) p(t|θ_Q) + α p(t|θ_F)                             (5)

⊚ The hyperparameter α controls the interpolation.
⊚ This is a standard procedure in state-of-the-art PRF techniques.
112
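Equations (3)-(5) chain together as follows; a sketch over dense vectors, where the vocabulary V_F′ is approximated by keeping the strongest expansion terms (this cut-off and the function names are assumptions of the sketch):

```python
import numpy as np

def expanded_query_model(x_query, W, query_model, alpha=0.5, top_terms=None):
    """Reconstruct the query row (eq. 3), normalize it into a feedback model
    (eq. 4) and interpolate it with the original query model (eq. 5)."""
    x_hat = x_query @ W                                  # eq. (3)
    if top_terms is not None:                            # approximate V_F'
        keep = np.argsort(-x_hat)[:top_terms]
        mask = np.zeros_like(x_hat)
        mask[keep] = 1.0
        x_hat = x_hat * mask
    total = x_hat.sum()
    p_feedback = x_hat / total if total > 0 else x_hat   # eq. (4)
    return (1 - alpha) * query_model + alpha * p_feedback  # eq. (5)
```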
LiMe: test collections

Collection   #docs     Avg doc length   Training topics   Test topics
AP88-89      165k      284.7            51-100            101-150
TREC-678     528k      297.1            301-350           351-400
Robust-04    528k      28.3             301-450           601-700
WT10G        1,692k    399.3            451-500           501-550
GOV2         25,205k   647.9            701-750           751-800
113
LiMe: results
Method Metric AP88-89 TREC-678 Robust-04 WT10G GOV2
LM
nDCG 0.5637 0.4518 0.5830 0.5212 0.6325
RI − − − − −
RFMF
nDCG 0.5749 0.4746 0.5884 0.5262 0.6453
RI 0.42 0.23 0.07 0.30 0.42
MEDMM
nDCG 0.5955 0.5115 0.6227 0.5324 0.6653
RI 0.42 0.26 0.32 0.36 0.66
RM3
nDCG 0.6005 0.4987 0.6251 0.5352 0.6618
RI 0.50 0.40 0.37 0.20 0.60
DLiMe
nDCG 0.6058 0.4936 0.6247 0.5290 0.6588
RI 0.52 0.44 0.32 0.26 0.72
TLiMe
nDCG 0.6085 0.5198 0.6294 0.5398 0.6698
RI 0.52 0.46 0.37 0.30 0.62
114
Conclusions
Conclusions (I)
We explored cross-pollination of ideas between IR and RS:
⊚ We studied the robustness and discriminative power of ranking
accuracy metrics. These findings influenced the evaluation of
this thesis.
⊚ We adapted different pseudo-relevance feedback models to
top-N recommendation as memory-based recommenders:
◦ relevance models offer highly accurate recommendations;
◦ techniques from the Rocchio framework are a very cost-effective
alternative.
⊚ We used ad hoc retrieval models to compute better
neighborhoods in collaborative filtering:
◦ neighborhood oracles provide insights for improvements;
◦ language models outperform cosine similarity.
116
Conclusions (II)
We explored cross-pollination of ideas between IR and RS:
⊚ We adapted relevance models to novel recommendation tasks:
◦ item-based relevance models can tackle long tail item liquidation;
◦ specific priors can be used to deal with the user-item group
formation problem.
⊚ We proposed a novel PRF framework inspired by a
recommendation method.
117
Conclusions
Future directions
Future directions
⊚ Extend our robustness and discriminative power analysis to other types of metrics, such as diversity or novelty metrics.
⊚ Study the adaptation of different pseudo-relevance feedback
models to top-N recommendation or other tasks.
⊚ Analyze other neighborhood computation techniques using the
methodology based on oracles.
⊚ Examine other ad hoc retrieval models to compute
neighborhoods.
⊚ Extend LiMe with richer features (based on Wikipedia, query
logs, etc.).
119
Conclusions
Publications
Conferences (I)
A. Landin, D. Valcarce, J. Parapar, Á. Barreiro. “PRIN: A Probabilistic
Recommender with Item Priors and Neural Models”. ECIR ’19, pp.
133-147, 2019.
D. Valcarce, A. Bellogín, J. Parapar, P. Castells. “On the Robustness and
Discriminative Power of IR Metrics for Top-N Recommendation”.
ACM RecSys ’18, pp. 260-268, 2018.
D. Valcarce, J. Parapar, Á. Barreiro. “LiMe: Linear Methods for
Pseudo-Relevance Feedback”. ACM SAC ’18, pp. 678-687, 2018.
D. Valcarce, J. Parapar, Á. Barreiro. “Combining Top-N Recommenders
with Metasearch Algorithms”. ACM SIGIR ’17, pp. 805-808, 2017.
D. Valcarce, J. Parapar, Á. Barreiro. “Additive Smoothing for
Relevance-Based Language Modelling of Recommender Systems”.
Conferences (II)
D. Valcarce, J. Parapar, Á. Barreiro. “Efficient Pseudo-Relevance
Feedback Methods for Collaborative Filtering Recommendation”.
ECIR ’16, pp. 602-613, 2016.
D. Valcarce, J. Parapar, Á. Barreiro. “Language Models for Collaborative
Filtering Neighbourhoods”. ECIR ’16, pp. 614-625, 2016.
D. Valcarce. “Exploring Statistical Language Models for Recommender
Systems”. ACM RecSys ’15, pp. 375-378, 2015.
D. Valcarce, J. Parapar, Á. Barreiro. “A Study of Priors for
Relevance-Based Language Modelling of Recommender Systems”.
ACM RecSys ’15, pp. 237-240, 2015.
D. Valcarce, J. Parapar, Á. Barreiro. “A Study of Smoothing Methods for
Relevance-Based Language Modelling of Recommender Systems”.
Journals
D. Valcarce, J. Parapar, Á. Barreiro. “Document-based and Term-based Linear Methods
for Pseudo-Relevance Feedback”. Applied Computing Review 18(4), pp. 5-17, 2018.
D. Valcarce, I. Brilhante, J.A. Macedo, F.M. Nardini, R. Perego, C. Renso. “Item-driven
group formation”. Online Social Networks and Media 8, pp. 17-31, 2018.
D. Valcarce, J. Parapar, Á. Barreiro. “Finding and Analysing Good Neighbourhoods to
Improve Collaborative Filtering”. Knowledge-Based Systems 159, pp. 193-202, 2018.
D. Valcarce, J. Parapar, Á. Barreiro. “A MapReduce implementation of posterior
probability clustering and relevance models for recommendation”. Engineering
Applications of Artificial Intelligence 75, pp. 114-124, 2018.
D. Valcarce, J. Parapar, Á. Barreiro. “Axiomatic Analysis of Language Modelling of
Recommender Systems”. International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems 25(2), pp. 113-128, 2017.
D. Valcarce, J. Parapar, Á. Barreiro. “Item-Based Relevance Modelling of
Recommendations for Getting Rid of Long Tail Products”. Knowledge-Based Systems
103, pp. 41-51, 2016.
PhD Thesis
Information Retrieval Models
for Recommender Systems
Author: Daniel Valcarce
Advisors: Álvaro Barreiro & Javier Parapar
A Coruña, May 8th, 2019
Information Retrieval Lab
Computer Science Department
University of A Coruña
