DOCTORAL SYMPOSIUM
Exploring Statistical Language Models for
Recommender Systems
RecSys 2015
16 - 20 September, Vienna, Austria
Daniel Valcarce
@dvalcarce
Information Retrieval Lab
University of A Coruña
Spain
Motivation
Information Retrieval vs Information Filtering (1)

Information Retrieval (IR)
  Goal: Retrieve relevant documents according to the information need of a user
  Examples: search engines (web, multimedia...)
  Input: the user's query (explicit)

Information Filtering (IF)
  Goal: Select relevant items from an information stream for a given user
  Examples: spam filters, recommender systems
  Input: the user's history (implicit)
Information Retrieval vs Information Filtering (2)

Some people consider them different fields:
  U. Hanani, B. Shapira and P. Shoval: Information Filtering: Overview of
  Issues, Research and Systems in User Modeling and User-Adapted
  Interaction (2001)
While others consider them the same thing:
  N. J. Belkin and W. B. Croft: Information filtering and information
  retrieval: two sides of the same coin? in Communications of the ACM (1992)
What is undeniable is that they are closely related:
  Why not apply techniques from one field to the other?
  It has already been done!
Information Retrieval vs Information Filtering (3)

Information Retrieval (IR). Some retrieval techniques are:
  Vector: Vector Space Model
  MF: Latent Semantic Indexing (LSI)
  Probabilistic: LDA, Language Models (LM)

Information Filtering (IF). Some CF techniques are:
  Vector: pairwise similarities (cosine, Pearson)
  MF: SVD, NMF
  Probabilistic: LDA and other PGMs
Language Models for Recommendation: Research goals

Language Models (LM) represented a breakthrough in Information Retrieval:
  State-of-the-art technique for text retrieval
  Solid statistical foundation
Maybe they can also be useful in RecSys:
  Are LM a good framework for Collaborative Filtering?
  Can LM be adapted to deal with temporal (TARS) and/or contextual
  information (CARS)?
  Can a principled formulation of LM combine Content-Based and
  Collaborative Filtering?
Language Models for Recommendation: Related work

There is little work on using Language Models for CF:
  J. Wang, A. P. de Vries and M. J. Reinders: A User-Item Relevance Model
  for Log-based Collaborative Filtering in ECIR 2006
  A. Bellogín, J. Wang and P. Castells: Bridging Memory-Based Collaborative
  Filtering and Text Retrieval in Information Retrieval (2013)
  J. Parapar, A. Bellogín, P. Castells and Á. Barreiro: Relevance-Based
  Language Modelling for Recommender Systems in Information Processing &
  Management (2013)
Relevance-Based Language Models for Collaborative Filtering
Relevance-Based Language Models
Relevance-Based Language Models or Relevance Models (RM)
are a pseudo-relevance feedback technique from IR.
Pseudo-relevance feedback is an automatic query expansion
technique.
The expanded query is expected to yield better results than the
original one.
Pseudo-relevance feedback

[Diagram: the user's information need is expressed as a query; the query is
run through the retrieval system; the top-ranked results feed a query
expansion step; and the expanded query is submitted again.]
Relevance-Based Language Models for CF Recommendation (1)

  IR                              RecSys
  User's query                    User's profile
  most^1, populated^1, state^2    Titanic^2, Avatar^3, Shark^5
  Documents                       Neighbours
  Terms                           Items
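The analogy above can be made concrete: a user's profile plays the same role as a weighted query, mapping items to ratings just as an IR query maps terms to frequencies. A minimal sketch (the query terms and movie ratings are the toy values from the slide, the helper is ours):

```python
# IR query: term -> frequency; RecSys profile: item -> rating.
ir_query = {"most": 1, "populated": 1, "state": 2}
user_profile = {"Titanic": 2, "Avatar": 3, "Shark": 5}

# Both are bags of weighted tokens, so the same language-model
# machinery applies to either representation.
def total_mass(bag):
    """'Length' of the query/profile: the sum of its weights."""
    return sum(bag.values())

assert total_mass(ir_query) == 4
assert total_mass(user_profile) == 10
```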
Relevance-Based Language Models for CF Recommendation (2)

Parapar et al. (2013):

  RM2:  p(i|R_u) ∝ p(i) · ∏_{j ∈ I_u} Σ_{v ∈ V_u} p(i|v) · (p(v)/p(i)) · p(j|v)

where:
  I_u is the set of items rated by the user u
  V_u is the neighbourhood of the user u, computed using a clustering
  algorithm
  p(i|u) is computed by smoothing the maximum likelihood estimate with the
  probability in the collection
  p(i) and p(v) are the item and user priors
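A minimal sketch of RM2 scoring on toy data may help fix ideas. The ratings, the Jelinek-Mercer smoothing choice, and the fixed neighbourhood below are all our own illustrative assumptions (the paper computes neighbourhoods by clustering); with uniform priors p(i) and p(v), the prior factors are constant across candidates and can be dropped from the ranking:

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 2},
    "u3": {"B": 1, "C": 4, "D": 5},
}
items = {i for r in ratings.values() for i in r}
total = sum(sum(r.values()) for r in ratings.values())

def p_c(i):
    """Collection probability p(i|C)."""
    return sum(r.get(i, 0) for r in ratings.values()) / total

def p_smoothed(i, v, lam=0.5):
    """Jelinek-Mercer smoothed p(i|v) (one possible smoothing choice)."""
    length = sum(ratings[v].values())
    return (1 - lam) * ratings[v].get(i, 0) / length + lam * p_c(i)

def rm2_score(u, i, neighbours):
    """log p(i|R_u) up to a constant, assuming uniform priors."""
    score = 0.0
    for j in ratings[u]:  # product over I_u becomes a sum of logs
        s = sum(p_smoothed(i, v) * p_smoothed(j, v) for v in neighbours)
        score += math.log(s)
    return score

# Rank u1's unseen items, using the other users as its neighbourhood.
neighbours = ["u2", "u3"]
unseen = [i for i in items if i not in ratings["u1"]]
ranking = sorted(unseen, key=lambda i: rm2_score("u1", i, neighbours),
                 reverse=True)
```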
Smoothing methods
Smoothing in RM2

  RM2:  p(i|R_u) ∝ p(i) · ∏_{j ∈ I_u} Σ_{v ∈ V_u} p(i|v) · (p(v)/p(i)) · p(j|v)

p(i|u) is computed by smoothing the maximum likelihood estimate:

  p_ml(i|u) = r_{u,i} / Σ_{j ∈ I_u} r_{u,j}

with the probability in the collection:

  p(i|C) = Σ_{v ∈ U} r_{v,i} / Σ_{j ∈ I} Σ_{v ∈ U} r_{v,j}
Why use smoothing?

In Information Retrieval, smoothing provides:
  A way to deal with data sparsity
  The inverse document frequency (IDF) role
  Document length normalisation
In RecSys, we have the same problems:
  Data sparsity
  Item popularity vs item specificity
  Profiles with different lengths
Smoothing techniques

Jelinek-Mercer (JM): linear interpolation. Parameter λ.

  p_λ(i|u) = (1 − λ) p_ml(i|u) + λ p(i|C)

Dirichlet priors (DP): Bayesian analysis. Parameter µ.

  p_µ(i|u) = (r_{u,i} + µ p(i|C)) / (µ + Σ_{j ∈ I_u} r_{u,j})

Absolute Discounting (AD): subtract a constant δ.

  p_δ(i|u) = (max(r_{u,i} − δ, 0) + δ |I_u| p(i|C)) / Σ_{j ∈ I_u} r_{u,j}
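The three estimates translate directly into code. A sketch on a toy profile (the profile, the collection model and the default parameter values are illustrative assumptions, not values from the experiments):

```python
# r maps the user's rated items to ratings; coll[i] plays p(i|C)
# and sums to 1 over the item vocabulary.
def p_jm(i, r, coll, lam=0.5):
    """Jelinek-Mercer: linear interpolation with the collection model."""
    ml = r.get(i, 0) / sum(r.values())
    return (1 - lam) * ml + lam * coll[i]

def p_dp(i, r, coll, mu=100):
    """Dirichlet priors: mu pseudo-counts drawn from the collection."""
    return (r.get(i, 0) + mu * coll[i]) / (mu + sum(r.values()))

def p_ad(i, r, coll, delta=0.1):
    """Absolute discounting: subtract delta from seen ratings and
    redistribute the discounted mass via p(i|C)."""
    kept = max(r.get(i, 0) - delta, 0)
    return (kept + delta * len(r) * coll[i]) / sum(r.values())

profile = {"A": 5, "B": 3}                        # toy r_{u,i}
coll = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}   # toy p(i|C)
```

All three assign non-zero probability to the unrated items C and D, which is exactly what the sparsity argument above calls for.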
Experiments with smoothing
Smoothing: ranking accuracy

[Plot: nDCG@10 of RM2 + AD, RM2 + JM and RM2 + DP as the smoothing
parameter varies (µ from 0 to 1000 for DP; λ, δ from 0.1 to 1 for JM and
AD); nDCG@10 ranges roughly from 0.20 to 0.35.]

Figure: nDCG@10 values of RM2 varying the smoothing method, using 400
nearest neighbours according to Pearson's correlation on the MovieLens 100k
dataset
Smoothing: diversity

[Plot: Gini@10 of RM2 + AD, RM2 + JM and RM2 + DP as the smoothing
parameter varies (µ from 0 to 1000 for DP; λ, δ from 0.1 to 1 for JM and
AD); Gini@10 ranges roughly from 0.010 to 0.030.]

Figure: Gini@10 values of RM2 varying the smoothing method, using 400
nearest neighbours according to Pearson's correlation on the MovieLens 100k
dataset
Smoothing: novelty

[Plot: MSI@10 of RM2 + AD, RM2 + JM and RM2 + DP as the smoothing
parameter varies (µ from 0 to 1000 for DP; λ, δ from 0.1 to 1 for JM and
AD); MSI@10 ranges roughly from 7.5 to 9.5.]

Figure: MSI@10 values of RM2 varying the smoothing method, using 400
nearest neighbours according to Pearson's correlation on the MovieLens 100k
dataset
More about smoothing in RM2 for CF

More details about smoothing methods in:
  D. Valcarce, J. Parapar, Á. Barreiro: A Study of Smoothing Methods for
  Relevance-Based Language Modelling of Recommender Systems in ECIR 2015
Priors
Priors in RM2

  RM2:  p(i|R_u) ∝ p(i) · ∏_{j ∈ I_u} Σ_{v ∈ V_u} p(i|v) · (p(v)/p(i)) · p(j|v)

p(i) and p(v) are the item and user priors:
  They enable introducing a priori information into the model
  They provide a principled way of modelling business rules!
Prior estimates

              Uniform (U)        Linear (L)
  User prior  p_U(u) = 1/|U|     p_L(u) = Σ_{i ∈ I_u} r_{u,i} / Σ_{v ∈ U} Σ_{j ∈ I_v} r_{v,j}
  Item prior  p_U(i) = 1/|I|     p_L(i) = Σ_{u ∈ U_i} r_{u,i} / Σ_{j ∈ I} Σ_{v ∈ U_j} r_{v,j}
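The two estimates are straightforward to compute from the ratings matrix. A sketch on hypothetical toy data (the users, items and rating values are ours):

```python
# Toy ratings matrix: user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 2},
}
items = sorted({i for r in ratings.values() for i in r})
total = sum(sum(r.values()) for r in ratings.values())  # all rating mass

def p_user_uniform(u):
    return 1 / len(ratings)

def p_user_linear(u):
    """Users with more rating mass get a larger prior."""
    return sum(ratings[u].values()) / total

def p_item_uniform(i):
    return 1 / len(items)

def p_item_linear(i):
    """Popular (heavily rated) items get a larger prior."""
    return sum(r.get(i, 0) for r in ratings.values()) / total
```

The linear item prior is what pushes popular items up the ranking, which explains the accuracy/diversity trade-off in the table that follows.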
Experiments with priors
Priors on MovieLens 100k

  User prior  Item prior  nDCG@10  Gini@10  MSI@10
  Linear      Linear      0.0922   0.4603   28.4284
  Uniform     Linear      0.2453   0.2027   16.4022
  Uniform     Uniform     0.3296   0.0256   6.8273
  Linear      Uniform     0.3423   0.0264   6.7848

Table: nDCG@10, Gini@10 and MSI@10 values of RM2 varying the prior
estimates, using 400 nearest neighbours according to Pearson's correlation
on the MovieLens 100k dataset with Absolute Discounting (δ = 0.1)

More priors in:
  D. Valcarce, J. Parapar and Á. Barreiro: A Study of Priors for
  Relevance-Based Language Modelling of Recommender Systems in RecSys 2015!
Comparison with other CF algorithms
Comparison on MovieLens 100k

  Algorithm    nDCG@10  Gini@10  MSI@10
  SVD          0.0946   0.0109   14.6129
  SVD++        0.1113   0.0126   14.9574
  NNCosNgbr    0.1771   0.0344   16.8222
  UIR-Item     0.2188   0.0124   5.2337
  PureSVD      0.3595   0.1364   11.8841
  RM2-JM       0.3175   0.0232   9.1087
  RM2-DP       0.3274   0.0251   9.2181
  RM2-AD       0.3296   0.0256   9.2409
  RM2-AD-L-U   0.3423   0.0264   9.2004

Table: nDCG@10, Gini@10 and MSI@10 values of different CF recommendation
algorithms
Conclusions and future directions
Conclusions

IR techniques can be employed in RecSys:
  Not only methods such as SVD...
  but also Language Models!
Language Models provide a principled and interpretable framework for
recommendation.
Relevance-Based Language Models are competitive, but there is room for
improvement:
  More sophisticated priors
  Neighbourhood computation
  ◦ Different similarity metrics: cosine, Kullback–Leibler divergence
  ◦ Matrix factorisation: NMF, SVD
  ◦ Spectral clustering: NC
Future work

Improve novelty and diversity figures:
  RM2 performance is similar to PureSVD in terms of nDCG, but it falls
  short in terms of diversity and novelty
Introduce more evidence in the LM framework apart from ratings:
  Content-based information (hybrid recommender)
  Temporal and contextual information (TARS & CARS)
Thank you!
@dvalcarce
http://www.dc.fi.udc.es/~dvalcarce
Time and Context in Language Models
Time:
X. Li and W. B. Croft: Time-based Language Models in
CIKM 2003
K. Berberich, S. Bedathur, O. Alonso and G. Weikum: A
language modeling approach for temporal information
needs in ECIR 2010
Context:
H. Rode and D. Hiemstra: Conceptual Language Models
for Context-Aware Text Retrieval in TREC 2004
L. Azzopardi: Incorporating Context within the
Language Modeling Approach for ad hoc Information
Retrieval. PhD Thesis (2005)
