Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence

Presented at RecSys 2016


1. Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence
   Dawen Liang (Columbia University/Netflix), Jaan Altosaar, Laurent Charlin, David Blei
2. A simple trick to boost the performance of your recommender system without using any additional data
   Dawen Liang (Columbia University/Netflix), Jaan Altosaar, Laurent Charlin, David Blei
3. Motivation
   • User-item interaction is commonly encoded in a user-by-item matrix
   • In the form of (user, item, preference) triplets
   • Matrix factorization is the standard method to infer latent user preferences
4. Motivation
   • Alternatively, we can model item co-occurrence across users
   • Analogy: modeling a set of documents (users) as a bag of co-occurring words (items), e.g., "Pluto" and "planet"
5. Can we combine these two views in a single model? YES
6. [Figure: click matrix Y (#users × #items) ≈ user latent factors θ (#users × K) * item latent factors β (K × #items)]
   Weighted matrix factorization (WMF) objective:
     L_{\mathrm{mf}} = \sum_{u,i} c_{ui} \left( y_{ui} - \theta_u^\top \beta_i \right)^2
   "Collaborative filtering for implicit feedback datasets", Y. Hu, Y. Koren, C. Volinsky, ICDM 08.
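For concreteness, here is a minimal NumPy sketch of the weighted MF objective on this slide. It only illustrates the formula, not the authors' implementation; the array names (Y, C, theta, beta) are my own.

```python
import numpy as np

def wmf_loss(Y, C, theta, beta):
    """Weighted MF loss: sum_{u,i} c_ui * (y_ui - theta_u^T beta_i)^2.

    Y     : (n_users, n_items) click matrix (e.g., binarized implicit feedback)
    C     : (n_users, n_items) confidence weights c_ui
    theta : (n_users, K) user latent factors
    beta  : (n_items, K) item latent factors
    """
    pred = theta @ beta.T               # predicted preference theta_u^T beta_i
    return np.sum(C * (Y - pred) ** 2)  # confidence-weighted squared error
```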
7. Word embedding
   • Skip-gram word2vec
   • Learn a low-dimensional word embedding in a continuous space
   • Predict context words given the current word
8. Item embedding
   • Skip-gram word2vec
   • Learn a low-dimensional word embedding in a continuous space
   • Predict context words given the current word
   • We can embed item sequences in the same fashion
9. Levy & Goldberg show that skip-gram word2vec with a negative sampling value of k is implicitly factorizing (some variation of) the pointwise mutual information (PMI) matrix, shifted by log k
   • PMI between a word i and its context word j is defined as:
     \mathrm{PMI}(i, j) = \log \frac{P(i, j)}{P(i)\,P(j)}
   • Empirically, it is estimated as:
     \mathrm{PMI}(i, j) = \log \frac{\#(i, j) \cdot D}{\#(i) \cdot \#(j)}
     where #(i, j) is the number of times word j appears in the context of word i, D is the total number of word-context pairs, #(i) = \sum_j #(i, j), and #(j) = \sum_i #(i, j).
   • Word embeddings can then be obtained by spectral dimensionality reduction (e.g., singular value decomposition) of the shifted positive PMI (SPPMI) matrix:
     \mathrm{SPPMI}(i, j) = \max\{\max\{\mathrm{PMI}(i, j), 0\} - \log k,\ 0\}
   • The same construction applies to a co-occurrence matrix of current items and context items
   • PMI("Pluto", "planet") > PMI("Pluto", "RecSys")
   "Neural Word Embedding as Implicit Matrix Factorization", Levy & Goldberg, NIPS 14.
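As an illustration of the SPPMI construction above (my own sketch, not code from the paper or the released repository), one might compute the shifted positive PMI matrix from raw co-occurrence counts like this:

```python
import numpy as np

def sppmi(counts, k=1):
    """Shifted positive PMI matrix from co-occurrence counts.

    counts[i, j] = #(i, j): number of times item j appears in the context of item i.
    k is the negative-sampling value; the positive PMI is shifted down by log k.
    """
    counts = np.asarray(counts, dtype=float)
    D = counts.sum()                                           # total number of item-context pairs
    denom = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True)  # #(i) * #(j)
    pmi = np.full_like(counts, -np.inf)                        # PMI treated as -inf where #(i, j) = 0
    nz = counts > 0
    pmi[nz] = np.log(counts[nz] * D / denom[nz])
    return np.maximum(np.maximum(pmi, 0.0) - np.log(k), 0.0)   # clip, shift by log k, clip again
```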
10. CoFactor
    • Jointly factorize both the click matrix and the co-occurrence PMI matrix with a shared item representation/embedding
11. CoFactor
    • Item representation must account for both user-item interactions and item-item co-occurrence
    • Alternative interpretation: regularizing the traditional MF objective with item embeddings learned by factorizing the item co-occurrence matrix
    L_{\mathrm{co}} = \sum_{u,i} c_{ui} \left( y_{ui} - \theta_u^\top \beta_i \right)^2 + \sum_{m_{ij} \neq 0} \left( m_{ij} - \beta_i^\top \gamma_j - w_i - c_j \right)^2
    (first term: matrix factorization; second term: item embedding; the item representation β_i is shared between the two)
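A rough NumPy sketch of the joint objective as written on this slide; gamma (context embeddings) and w, c (item/context biases) follow the equation above, but the function itself is my own assumption, not the released CoFactor code:

```python
import numpy as np

def cofactor_loss(Y, C, M, theta, beta, gamma, w, c):
    """CoFactor objective: weighted MF term + item co-occurrence term sharing beta.

    Y, C  : (n_users, n_items) clicks and confidence weights
    M     : (n_items, n_items) SPPMI matrix; zero entries are skipped
    theta : (n_users, K) user factors;  beta: (n_items, K) shared item factors
    gamma : (n_items, K) context embeddings;  w, c: (n_items,) item/context biases
    """
    mf_term = np.sum(C * (Y - theta @ beta.T) ** 2)
    pred_m = beta @ gamma.T + w[:, None] + c[None, :]  # beta_i^T gamma_j + w_i + c_j
    mask = M != 0                                      # only nonzero m_ij contribute
    embed_term = np.sum((M[mask] - pred_m[mask]) ** 2)
    return mf_term + embed_term
```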
12. How to define "co-occur" (problem/application-specific)
    • Define context as the entire user click history
    • #(i, j) is the number of users who clicked on both item i and item j
    • Does not require any additional information beyond the standard MF model
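Because context is the entire click history, the co-occurrence counts #(i, j) can be read off the click matrix itself; a minimal sketch, assuming a binary scipy.sparse click matrix Y:

```python
import numpy as np
import scipy.sparse as sp

def item_cooccurrence(Y):
    """#(i, j): number of users who clicked on both item i and item j.

    Y is a binary (n_users, n_items) sparse click matrix; no extra data is needed.
    """
    counts = (Y.T @ Y).toarray().astype(float)  # (n_items, n_items) co-click counts
    np.fill_diagonal(counts, 0.0)               # drop self co-occurrence (a modeling choice, not from the slide)
    return counts
```

For large catalogs the dense conversion should be skipped and the counts kept sparse; the dense form is used here only for readability.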
13. Empirical study
    • Data preparation: 70/20/10 train/test/validation
    • Make sure train/validation do not overlap in time with test
    • Metrics: Recall@20, Recall@50, NDCG@100, MAP@100
    Table 1: Attributes of datasets after preprocessing. Interactions are non-zero entries (listening counts, watches, and clicks). "% interactions" refers to the density of the user-item interaction matrix Y. For datasets with timestamps, we ensure there is no overlap in time between the training and test sets.
                       ArXiv     ML-20M     TasteProfile
    # of users         25,057    111,148    221,830
    # of items         63,003    11,711     22,781
    # interactions     1.8M      8.2M       14.0M
    % interactions     0.12%     0.63%      0.29%
    with timestamps    yes       yes        no
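As a reference point for the metrics listed above, here is one common way to compute Recall@M per user (a sketch under my assumptions about the evaluation protocol, not the authors' evaluation script):

```python
def recall_at_M(ranked_items, heldout_items, M=20):
    """Recall@M for one user: fraction of held-out items recovered in the top M.

    ranked_items  : items sorted by predicted preference theta_u^T beta_i (best first)
    heldout_items : non-empty collection of items the user consumed in the test set
    Normalizing by min(M, |heldout|) lets a perfect ranking reach 1.0.
    """
    top_M = set(ranked_items[:M])
    hits = len(top_M & set(heldout_items))
    return hits / min(M, len(heldout_items))
```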
14. Quantitative results
                   ArXiv                ML-20M               TasteProfile
                   WMF      CoFactor    WMF      CoFactor    WMF      CoFactor
    Recall@20      0.063    0.067       0.133    0.145       0.198    0.208
    Recall@50      0.108    0.110       0.165    0.177       0.286    0.300
    NDCG@100       0.076    0.079       0.160    0.172       0.257    0.268
    MAP@100        0.019    0.021       0.047    0.055       0.103    0.111
    Table 2: Comparison between the widely-used weighted matrix factorization (WMF) model and our CoFactor model. CoFactor significantly outperforms WMF on all the datasets across all metrics. The improvement is most pronounced on the movie watching (ML-20M) and music listening (TasteProfile) datasets.
    • We get better results by simply re-using the data
    • Item co-occurrence is in principle available to the MF model, but the MF model (bilinear) has limited modeling capacity to make use of it
15. [Figure: average NDCG@100 for CoFactor vs. WMF on TasteProfile, broken down by user activity (number of songs the user has listened to: < 50; ≥ 50, < 100; ≥ 100, < 150; ≥ 150, < 500; ≥ 500)]
    • We observe a similar trend for the other datasets as well
16. Example user (numbers in parentheses are the number of users who watched the movie in the training set)
    • User's watch history: Toy Story (24659), Fight Club (18728), Kill Bill: Vol. 1 (8728), Mouchette (32), Army of Shadows (L'armée des ombres) (96)
    • Top recommendations by CoFactor: The Silence of the Lambs (37217), Pulp Fiction (37445), Finding Nemo (9290), L'Atalante (90), Diary of a Country Priest (Journal d'un curé de campagne) (68)
    • Top recommendations by WMF: Rain Man (11862), Pulp Fiction (37445), Finding Nemo (9290), The Godfather: Part II (15325), That Obscure Object of Desire (Cet obscur objet du désir) (300)
17. How important is joint learning?
                   WMF      CoFactor    word2vec + reg
    Recall@20      0.063    0.067       0.052
    Recall@50      0.108    0.110       0.095
    NDCG@100       0.076    0.079       0.065
    MAP@100        0.019    0.021       0.016
    Table 3: Comparison between joint learning (CoFactor) and learning from a separate two-stage (word2vec + reg) process on ArXiv. Even though they make similar modeling assumptions, CoFactor provides superior performance.
    • The two-stage baseline uses the item embeddings learned by word2vec as the item latent factors β_i in the MF model and only learns the user latent factors θ_u
18. Extensions
    • User-user co-occurrence
    • Higher-order co-occurrence patterns
    • Add the same type of item-item co-occurrence regularization to other collaborative filtering methods, e.g., BPR, factorization machines, or SLIM
19. Conclusion
    • We present the CoFactor model:
      • Jointly factorizes both the user-item click matrix and the item-item co-occurrence matrix
      • Motivated by the recent success of word embedding models (e.g., word2vec)
    • We explore the results both quantitatively and qualitatively to investigate the pros/cons
    Source code available: https://github.com/dawenl/cofactor
20. Thank you
    • We present the CoFactor model:
      • Jointly factorizes both the user-item click matrix and the item-item co-occurrence matrix
      • Motivated by the recent success of word embedding models (e.g., word2vec)
    • We explore the results both quantitatively and qualitatively to investigate the pros/cons
    Source code available: https://github.com/dawenl/cofactor
