DOCTORAL SYMPOSIUM
Exploring Statistical Language Models for
Recommender Systems
RecSys 2015
16 - 20 September, Vienna, Austria
Daniel Valcarce
@dvalcarce
Information Retrieval Lab
University of A Coruña
Spain
Motivation
Information Retrieval vs Information Filtering (1)

Information Retrieval (IR)
  Goal: Retrieve relevant documents according to the information need of a user
  Examples: search engines (web, multimedia...)
  Input: the user's query (explicit)

Information Filtering (IF)
  Goal: Select relevant items from an information stream for a given user
  Examples: spam filters, recommender systems
  Input: the user's history (implicit)
Information Retrieval vs Information Filtering (2)

Some people consider them different fields:
  U. Hanani, B. Shapira and P. Shoval: Information Filtering: Overview of
  Issues, Research and Systems in User Modeling and User-Adapted
  Interaction (2001)
While others consider them the same thing:
  N. J. Belkin and W. B. Croft: Information filtering and information
  retrieval: two sides of the same coin? in Communications of the ACM (1992)
What is undeniable is that they are closely related:
  Why not apply techniques from one field to the other?
  It has already been done!
Information Retrieval vs Information Filtering (3)

Information Retrieval (IR). Some retrieval techniques are:
  Vector: Vector Space Model
  MF: Latent Semantic Indexing (LSI)
  Probabilistic: LDA, Language Models (LM)

Information Filtering (IF). Some CF techniques are:
  Vector: pairwise similarities (cosine, Pearson)
  MF: SVD, NMF
  Probabilistic: LDA and other PGMs
Language Models for Recommendation: Research goals

Language Models (LM) represented a breakthrough in Information Retrieval:
  State-of-the-art technique for text retrieval
  Solid statistical foundation
Maybe they can also be useful in RecSys:
  Are LM a good framework for Collaborative Filtering?
  Can LM be adapted to deal with temporal (TARS) and/or contextual
  information (CARS)?
  Can a principled formulation of LM combine Content-Based and
  Collaborative Filtering?
Language Models for Recommendation: Related work

There is little work on using Language Models for CF:
  J. Wang, A. P. de Vries and M. J. Reinders: A User-Item Relevance Model
  for Log-based Collaborative Filtering in ECIR 2006
  A. Bellogín, J. Wang and P. Castells: Bridging Memory-Based Collaborative
  Filtering and Text Retrieval in Information Retrieval (2013)
  J. Parapar, A. Bellogín, P. Castells and Á. Barreiro: Relevance-Based
  Language Modelling for Recommender Systems in Information Processing &
  Management (2013)
Relevance-Based Language Models for Collaborative Filtering
Relevance-Based Language Models
Relevance-Based Language Models or Relevance Models (RM)
are a pseudo-relevance feedback technique from IR.
Pseudo-relevance feedback is an automatic query expansion
technique.
The expanded query is expected to yield better results than the
original one.
Pseudo-relevance feedback

[Diagram: the user's information need is expressed as a query; the query is
run through the retrieval system; the top-ranked results feed a query
expansion step; and the expanded query is submitted again.]
Relevance-Based Language Models for CF Recommendation (1)

  IR                              RecSys
  User's query                    User's profile
  most^1, populated^1, state^2    Titanic^2, Avatar^3, Shark^5
  Documents                       Neighbours
  Terms                           Items
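The analogy above can be made concrete: a user's profile plays the same role as a weighted query, mapping items to ratings just as an IR query maps terms to frequencies. A minimal sketch (the query terms and movie ratings are the toy values from the slide, the helper is ours):

```python
# IR query: term -> frequency; RecSys profile: item -> rating.
ir_query = {"most": 1, "populated": 1, "state": 2}
user_profile = {"Titanic": 2, "Avatar": 3, "Shark": 5}

# Both are bags of weighted tokens, so the same language-model
# machinery applies to either representation.
def total_mass(bag):
    """'Length' of the query/profile: the sum of its weights."""
    return sum(bag.values())

assert total_mass(ir_query) == 4
assert total_mass(user_profile) == 10
```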
Relevance-Based Language Models for CF Recommendation (2)

Parapar et al. (2013):

  RM2:  p(i|R_u) ∝ p(i) · ∏_{j ∈ I_u} Σ_{v ∈ V_u} p(i|v) · (p(v)/p(i)) · p(j|v)

where:
  I_u is the set of items rated by the user u
  V_u is the neighbourhood of the user u, computed using a clustering
  algorithm
  p(i|u) is computed by smoothing the maximum likelihood estimate with the
  probability in the collection
  p(i) and p(v) are the item and user priors
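A minimal sketch of RM2 scoring on toy data may help fix ideas. The ratings, the Jelinek-Mercer smoothing choice, and the fixed neighbourhood below are all our own illustrative assumptions (the paper computes neighbourhoods by clustering); with uniform priors p(i) and p(v), the prior factors are constant across candidates and can be dropped from the ranking:

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 2},
    "u3": {"B": 1, "C": 4, "D": 5},
}
items = {i for r in ratings.values() for i in r}
total = sum(sum(r.values()) for r in ratings.values())

def p_c(i):
    """Collection probability p(i|C)."""
    return sum(r.get(i, 0) for r in ratings.values()) / total

def p_smoothed(i, v, lam=0.5):
    """Jelinek-Mercer smoothed p(i|v) (one possible smoothing choice)."""
    length = sum(ratings[v].values())
    return (1 - lam) * ratings[v].get(i, 0) / length + lam * p_c(i)

def rm2_score(u, i, neighbours):
    """log p(i|R_u) up to a constant, assuming uniform priors."""
    score = 0.0
    for j in ratings[u]:  # product over I_u becomes a sum of logs
        s = sum(p_smoothed(i, v) * p_smoothed(j, v) for v in neighbours)
        score += math.log(s)
    return score

# Rank u1's unseen items, using the other users as its neighbourhood.
neighbours = ["u2", "u3"]
unseen = [i for i in items if i not in ratings["u1"]]
ranking = sorted(unseen, key=lambda i: rm2_score("u1", i, neighbours),
                 reverse=True)
```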
Smoothing methods
Smoothing in RM2

  RM2:  p(i|R_u) ∝ p(i) · ∏_{j ∈ I_u} Σ_{v ∈ V_u} p(i|v) · (p(v)/p(i)) · p(j|v)

p(i|u) is computed by smoothing the maximum likelihood estimate:

  p_ml(i|u) = r_{u,i} / Σ_{j ∈ I_u} r_{u,j}

with the probability in the collection:

  p(i|C) = Σ_{v ∈ U} r_{v,i} / Σ_{j ∈ I} Σ_{v ∈ U} r_{v,j}
Why use smoothing?

In Information Retrieval, smoothing provides:
  A way to deal with data sparsity
  The inverse document frequency (IDF) role
  Document length normalisation
In RecSys, we have the same problems:
  Data sparsity
  Item popularity vs item specificity
  Profiles with different lengths
Smoothing techniques

Jelinek-Mercer (JM): linear interpolation. Parameter λ.

  p_λ(i|u) = (1 − λ) p_ml(i|u) + λ p(i|C)

Dirichlet priors (DP): Bayesian analysis. Parameter µ.

  p_µ(i|u) = (r_{u,i} + µ p(i|C)) / (µ + Σ_{j ∈ I_u} r_{u,j})

Absolute Discounting (AD): subtract a constant δ.

  p_δ(i|u) = (max(r_{u,i} − δ, 0) + δ |I_u| p(i|C)) / Σ_{j ∈ I_u} r_{u,j}
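The three estimates translate directly into code. A sketch on a toy profile (the profile, the collection model and the default parameter values are illustrative assumptions, not values from the experiments):

```python
# r maps the user's rated items to ratings; coll[i] plays p(i|C)
# and sums to 1 over the item vocabulary.
def p_jm(i, r, coll, lam=0.5):
    """Jelinek-Mercer: linear interpolation with the collection model."""
    ml = r.get(i, 0) / sum(r.values())
    return (1 - lam) * ml + lam * coll[i]

def p_dp(i, r, coll, mu=100):
    """Dirichlet priors: mu pseudo-counts drawn from the collection."""
    return (r.get(i, 0) + mu * coll[i]) / (mu + sum(r.values()))

def p_ad(i, r, coll, delta=0.1):
    """Absolute discounting: subtract delta from seen ratings and
    redistribute the discounted mass via p(i|C)."""
    kept = max(r.get(i, 0) - delta, 0)
    return (kept + delta * len(r) * coll[i]) / sum(r.values())

profile = {"A": 5, "B": 3}                        # toy r_{u,i}
coll = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}   # toy p(i|C)
```

All three assign non-zero probability to the unrated items C and D, which is exactly what the sparsity argument above calls for.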
Experiments with smoothing
Smoothing: ranking accuracy

[Plot: nDCG@10 of RM2 + AD, RM2 + JM and RM2 + DP as the smoothing
parameter varies (µ from 0 to 1000 for DP; λ, δ from 0.1 to 1 for JM and
AD); nDCG@10 ranges roughly from 0.20 to 0.35.]

Figure: nDCG@10 values of RM2 varying the smoothing method, using 400
nearest neighbours according to Pearson's correlation on the MovieLens 100k
dataset
Smoothing: diversity

[Plot: Gini@10 of RM2 + AD, RM2 + JM and RM2 + DP as the smoothing
parameter varies (µ from 0 to 1000 for DP; λ, δ from 0.1 to 1 for JM and
AD); Gini@10 ranges roughly from 0.010 to 0.030.]

Figure: Gini@10 values of RM2 varying the smoothing method, using 400
nearest neighbours according to Pearson's correlation on the MovieLens 100k
dataset
Smoothing: novelty

[Plot: MSI@10 of RM2 + AD, RM2 + JM and RM2 + DP as the smoothing
parameter varies (µ from 0 to 1000 for DP; λ, δ from 0.1 to 1 for JM and
AD); MSI@10 ranges roughly from 7.5 to 9.5.]

Figure: MSI@10 values of RM2 varying the smoothing method, using 400
nearest neighbours according to Pearson's correlation on the MovieLens 100k
dataset
More about smoothing in RM2 for CF

More details about smoothing methods in:
  D. Valcarce, J. Parapar, Á. Barreiro: A Study of Smoothing Methods for
  Relevance-Based Language Modelling of Recommender Systems in ECIR 2015
Priors
Priors in RM2

  RM2:  p(i|R_u) ∝ p(i) · ∏_{j ∈ I_u} Σ_{v ∈ V_u} p(i|v) · (p(v)/p(i)) · p(j|v)

p(i) and p(v) are the item and user priors:
  They enable introducing a priori information into the model
  They provide a principled way of modelling business rules!
Prior estimates

              Uniform (U)        Linear (L)
  User prior  p_U(u) = 1/|U|     p_L(u) = Σ_{i ∈ I_u} r_{u,i} / Σ_{v ∈ U} Σ_{j ∈ I_v} r_{v,j}
  Item prior  p_U(i) = 1/|I|     p_L(i) = Σ_{u ∈ U_i} r_{u,i} / Σ_{j ∈ I} Σ_{v ∈ U_j} r_{v,j}
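The two estimates are straightforward to compute from the ratings matrix. A sketch on hypothetical toy data (the users, items and rating values are ours):

```python
# Toy ratings matrix: user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 2},
}
items = sorted({i for r in ratings.values() for i in r})
total = sum(sum(r.values()) for r in ratings.values())  # all rating mass

def p_user_uniform(u):
    return 1 / len(ratings)

def p_user_linear(u):
    """Users with more rating mass get a larger prior."""
    return sum(ratings[u].values()) / total

def p_item_uniform(i):
    return 1 / len(items)

def p_item_linear(i):
    """Popular (heavily rated) items get a larger prior."""
    return sum(r.get(i, 0) for r in ratings.values()) / total
```

The linear item prior is what pushes popular items up the ranking, which explains the accuracy/diversity trade-off in the table that follows.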
Experiments with priors
Priors on MovieLens 100k

  User prior  Item prior  nDCG@10  Gini@10  MSI@10
  Linear      Linear      0.0922   0.4603   28.4284
  Uniform     Linear      0.2453   0.2027   16.4022
  Uniform     Uniform     0.3296   0.0256   6.8273
  Linear      Uniform     0.3423   0.0264   6.7848

Table: nDCG@10, Gini@10 and MSI@10 values of RM2 varying the prior
estimates, using 400 nearest neighbours according to Pearson's correlation
on the MovieLens 100k dataset with Absolute Discounting (δ = 0.1)

More priors in:
  D. Valcarce, J. Parapar and Á. Barreiro: A Study of Priors for
  Relevance-Based Language Modelling of Recommender Systems in RecSys 2015!
Comparison with other CF algorithms
Comparison on MovieLens 100k

  Algorithm    nDCG@10  Gini@10  MSI@10
  SVD          0.0946   0.0109   14.6129
  SVD++        0.1113   0.0126   14.9574
  NNCosNgbr    0.1771   0.0344   16.8222
  UIR-Item     0.2188   0.0124   5.2337
  PureSVD      0.3595   0.1364   11.8841
  RM2-JM       0.3175   0.0232   9.1087
  RM2-DP       0.3274   0.0251   9.2181
  RM2-AD       0.3296   0.0256   9.2409
  RM2-AD-L-U   0.3423   0.0264   9.2004

Table: nDCG@10, Gini@10 and MSI@10 values of different CF recommendation
algorithms
Conclusions and future directions
Conclusions

IR techniques can be employed in RecSys:
  Not only methods such as SVD...
  but also Language Models!
Language Models provide a principled and interpretable framework for
recommendation.
Relevance-Based Language Models are competitive, but there is room for
improvement:
  More sophisticated priors
  Neighbourhood computation
  ◦ Different similarity metrics: cosine, Kullback–Leibler divergence
  ◦ Matrix factorisation: NMF, SVD
  ◦ Spectral clustering: NC
Future work

Improve novelty and diversity figures:
  RM2 performance is similar to PureSVD in terms of nDCG, but it falls
  short in terms of diversity and novelty
Introduce more evidence in the LM framework apart from ratings:
  Content-based information (hybrid recommender)
  Temporal and contextual information (TARS & CARS)
Thank you!
@dvalcarce
http://www.dc.fi.udc.es/~dvalcarce
Time and Context in Language Models
Time:
X. Li and W. B. Croft: Time-based Language Models in
CIKM 2003
K. Berberich, S. Bedathur, O. Alonso and G. Weikum: A
language modeling approach for temporal information
needs in ECIR 2010
Context:
H. Rode and D. Hiemstra: Conceptual Language Models
for Context-Aware Text Retrieval in TREC 2004
L. Azzopardi: Incorporating Context within the
Language Modeling Approach for ad hoc Information
Retrieval. PhD Thesis (2005)
