Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence

Presented at RecSys 2016


1. Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence
   Dawen Liang (Columbia University/Netflix), Jaan Altosaar, Laurent Charlin, David Blei
2. A simple trick to boost the performance of your recommender system without using any additional data
   Dawen Liang (Columbia University/Netflix), Jaan Altosaar, Laurent Charlin, David Blei
3. Motivation
   • User-item interaction is commonly encoded in a user-by-item matrix
   • In the form of (user, item, preference) triplets
   • Matrix factorization is the standard method to infer latent user preferences
4. Motivation
   • Alternatively, we can model item co-occurrence across users
   • Analogy: modeling a set of documents (users) as a bag of co-occurring words (items), e.g., "Pluto" and "planet"
5. Can we combine these two views in a single model? YES
6. [Figure: click matrix Y (#users × #items) ≈ user latent factors θ (#users × K) * item latent factors β (K × #items)]
   Weighted matrix factorization (WMF) objective:
     L_{\mathrm{mf}} = \sum_{u,i} c_{ui} \left( y_{ui} - \theta_u^\top \beta_i \right)^2
   "Collaborative filtering for implicit feedback datasets", Y. Hu, Y. Koren, C. Volinsky, ICDM 08.
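For concreteness, here is a minimal NumPy sketch of the weighted MF objective on this slide. It only illustrates the formula, not the authors' implementation; the array names (Y, C, theta, beta) are my own.

```python
import numpy as np

def wmf_loss(Y, C, theta, beta):
    """Weighted MF loss: sum_{u,i} c_ui * (y_ui - theta_u^T beta_i)^2.

    Y     : (n_users, n_items) click matrix (e.g., binarized implicit feedback)
    C     : (n_users, n_items) confidence weights c_ui
    theta : (n_users, K) user latent factors
    beta  : (n_items, K) item latent factors
    """
    pred = theta @ beta.T               # predicted preference theta_u^T beta_i
    return np.sum(C * (Y - pred) ** 2)  # confidence-weighted squared error
```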
7. Word embedding
   • Skip-gram word2vec
   • Learn a low-dimensional word embedding in a continuous space
   • Predict context words given the current word
8. Item embedding
   • Skip-gram word2vec
   • Learn a low-dimensional word embedding in a continuous space
   • Predict context words given the current word
   • We can embed item sequences in the same fashion
9. Levy & Goldberg show that skip-gram word2vec with a negative sampling value of k is implicitly factorizing (some variation of) the pointwise mutual information (PMI) matrix, shifted by log k
   • PMI between a word i and its context word j is defined as:
     \mathrm{PMI}(i, j) = \log \frac{P(i, j)}{P(i)\,P(j)}
   • Empirically, it is estimated as:
     \mathrm{PMI}(i, j) = \log \frac{\#(i, j) \cdot D}{\#(i) \cdot \#(j)}
     where #(i, j) is the number of times word j appears in the context of word i, D is the total number of word-context pairs, #(i) = \sum_j #(i, j), and #(j) = \sum_i #(i, j).
   • Word embeddings can then be obtained by spectral dimensionality reduction (e.g., singular value decomposition) of the shifted positive PMI (SPPMI) matrix:
     \mathrm{SPPMI}(i, j) = \max\{\max\{\mathrm{PMI}(i, j), 0\} - \log k,\ 0\}
   • The same construction applies to a co-occurrence matrix of current items and context items
   • PMI("Pluto", "planet") > PMI("Pluto", "RecSys")
   "Neural Word Embedding as Implicit Matrix Factorization", Levy & Goldberg, NIPS 14.
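As an illustration of the SPPMI construction above (my own sketch, not code from the paper or the released repository), one might compute the shifted positive PMI matrix from raw co-occurrence counts like this:

```python
import numpy as np

def sppmi(counts, k=1):
    """Shifted positive PMI matrix from co-occurrence counts.

    counts[i, j] = #(i, j): number of times item j appears in the context of item i.
    k is the negative-sampling value; the positive PMI is shifted down by log k.
    """
    counts = np.asarray(counts, dtype=float)
    D = counts.sum()                                           # total number of item-context pairs
    denom = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True)  # #(i) * #(j)
    pmi = np.full_like(counts, -np.inf)                        # PMI treated as -inf where #(i, j) = 0
    nz = counts > 0
    pmi[nz] = np.log(counts[nz] * D / denom[nz])
    return np.maximum(np.maximum(pmi, 0.0) - np.log(k), 0.0)   # clip, shift by log k, clip again
```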
10. CoFactor
    • Jointly factorize both the click matrix and the co-occurrence PMI matrix with a shared item representation/embedding
11. CoFactor
    • Item representation must account for both user-item interactions and item-item co-occurrence
    • Alternative interpretation: regularizing the traditional MF objective with item embeddings learned by factorizing the item co-occurrence matrix
    L_{\mathrm{co}} = \sum_{u,i} c_{ui} \left( y_{ui} - \theta_u^\top \beta_i \right)^2 + \sum_{m_{ij} \neq 0} \left( m_{ij} - \beta_i^\top \gamma_j - w_i - c_j \right)^2
    (first term: matrix factorization; second term: item embedding; the item representation β_i is shared between the two)
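A rough NumPy sketch of the joint objective as written on this slide; gamma (context embeddings) and w, c (item/context biases) follow the equation above, but the function itself is my own assumption, not the released CoFactor code:

```python
import numpy as np

def cofactor_loss(Y, C, M, theta, beta, gamma, w, c):
    """CoFactor objective: weighted MF term + item co-occurrence term sharing beta.

    Y, C  : (n_users, n_items) clicks and confidence weights
    M     : (n_items, n_items) SPPMI matrix; zero entries are skipped
    theta : (n_users, K) user factors;  beta: (n_items, K) shared item factors
    gamma : (n_items, K) context embeddings;  w, c: (n_items,) item/context biases
    """
    mf_term = np.sum(C * (Y - theta @ beta.T) ** 2)
    pred_m = beta @ gamma.T + w[:, None] + c[None, :]  # beta_i^T gamma_j + w_i + c_j
    mask = M != 0                                      # only nonzero m_ij contribute
    embed_term = np.sum((M[mask] - pred_m[mask]) ** 2)
    return mf_term + embed_term
```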
12. How to define "co-occur" (problem/application-specific)
    • Define context as the entire user click history
    • #(i, j) is the number of users who clicked on both item i and item j
    • Does not require any additional information beyond the standard MF model
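Because context is the entire click history, the co-occurrence counts #(i, j) can be read off the click matrix itself; a minimal sketch, assuming a binary scipy.sparse click matrix Y:

```python
import numpy as np
import scipy.sparse as sp

def item_cooccurrence(Y):
    """#(i, j): number of users who clicked on both item i and item j.

    Y is a binary (n_users, n_items) sparse click matrix; no extra data is needed.
    """
    counts = (Y.T @ Y).toarray().astype(float)  # (n_items, n_items) co-click counts
    np.fill_diagonal(counts, 0.0)               # drop self co-occurrence (a modeling choice, not from the slide)
    return counts
```

For large catalogs the dense conversion should be skipped and the counts kept sparse; the dense form is used here only for readability.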
13. Empirical study
    • Data preparation: 70/20/10 train/test/validation
    • Make sure train/validation do not overlap in time with test
    • Metrics: Recall@20, Recall@50, NDCG@100, MAP@100
    Table 1: Attributes of datasets after preprocessing. Interactions are non-zero entries (listening counts, watches, and clicks). "% interactions" refers to the density of the user-item interaction matrix Y. For datasets with timestamps, we ensure there is no overlap in time between the training and test sets.
                       ArXiv     ML-20M     TasteProfile
    # of users         25,057    111,148    221,830
    # of items         63,003    11,711     22,781
    # interactions     1.8M      8.2M       14.0M
    % interactions     0.12%     0.63%      0.29%
    with timestamps    yes       yes        no
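As a reference point for the metrics listed above, here is one common way to compute Recall@M per user (a sketch under my assumptions about the evaluation protocol, not the authors' evaluation script):

```python
def recall_at_M(ranked_items, heldout_items, M=20):
    """Recall@M for one user: fraction of held-out items recovered in the top M.

    ranked_items  : items sorted by predicted preference theta_u^T beta_i (best first)
    heldout_items : non-empty collection of items the user consumed in the test set
    Normalizing by min(M, |heldout|) lets a perfect ranking reach 1.0.
    """
    top_M = set(ranked_items[:M])
    hits = len(top_M & set(heldout_items))
    return hits / min(M, len(heldout_items))
```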
14. Quantitative results
                   ArXiv                ML-20M               TasteProfile
                   WMF      CoFactor    WMF      CoFactor    WMF      CoFactor
    Recall@20      0.063    0.067       0.133    0.145       0.198    0.208
    Recall@50      0.108    0.110       0.165    0.177       0.286    0.300
    NDCG@100       0.076    0.079       0.160    0.172       0.257    0.268
    MAP@100        0.019    0.021       0.047    0.055       0.103    0.111
    Table 2: Comparison between the widely-used weighted matrix factorization (WMF) model and our CoFactor model. CoFactor significantly outperforms WMF on all the datasets across all metrics. The improvement is most pronounced on the movie watching (ML-20M) and music listening (TasteProfile) datasets.
    • We get better results by simply re-using the data
    • Item co-occurrence is in principle available to the MF model, but the MF model (bilinear) has limited modeling capacity to make use of it
15. [Figure: average NDCG@100 for CoFactor vs. WMF on TasteProfile, broken down by user activity (number of songs the user has listened to: < 50; ≥ 50, < 100; ≥ 100, < 150; ≥ 150, < 500; ≥ 500)]
    • We observe a similar trend for the other datasets as well
16. Example user (numbers in parentheses are the number of users who watched the movie in the training set)
    • User's watch history: Toy Story (24659), Fight Club (18728), Kill Bill: Vol. 1 (8728), Mouchette (32), Army of Shadows (L'armée des ombres) (96)
    • Top recommendations by CoFactor: The Silence of the Lambs (37217), Pulp Fiction (37445), Finding Nemo (9290), L'Atalante (90), Diary of a Country Priest (Journal d'un curé de campagne) (68)
    • Top recommendations by WMF: Rain Man (11862), Pulp Fiction (37445), Finding Nemo (9290), The Godfather: Part II (15325), That Obscure Object of Desire (Cet obscur objet du désir) (300)
17. How important is joint learning?
                   WMF      CoFactor    word2vec + reg
    Recall@20      0.063    0.067       0.052
    Recall@50      0.108    0.110       0.095
    NDCG@100       0.076    0.079       0.065
    MAP@100        0.019    0.021       0.016
    Table 3: Comparison between joint learning (CoFactor) and learning from a separate two-stage (word2vec + reg) process on ArXiv. Even though they make similar modeling assumptions, CoFactor provides superior performance.
    • The two-stage baseline uses the item embeddings learned by word2vec as the item latent factors β_i in the MF model and only learns the user latent factors θ_u
18. Extensions
    • User-user co-occurrence
    • Higher-order co-occurrence patterns
    • Add the same type of item-item co-occurrence regularization to other collaborative filtering methods, e.g., BPR, factorization machines, or SLIM
19. Conclusion
    • We present the CoFactor model:
      • Jointly factorizes both the user-item click matrix and the item-item co-occurrence matrix
      • Motivated by the recent success of word embedding models (e.g., word2vec)
    • We explore the results both quantitatively and qualitatively to investigate the pros/cons
    Source code available: https://github.com/dawenl/cofactor
20. Thank you
    • We present the CoFactor model:
      • Jointly factorizes both the user-item click matrix and the item-item co-occurrence matrix
      • Motivated by the recent success of word embedding models (e.g., word2vec)
    • We explore the results both quantitatively and qualitatively to investigate the pros/cons
    Source code available: https://github.com/dawenl/cofactor
