Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation
Yuya Kanemoto
Vasile F et al. RecSys 2016
Neural embedding: Word2Vec (Skip-gram)
• A method for learning distributed vector representations that capture a large
number of syntactic and semantic word relationships
• Example: Tokyo - Japan + Germany = Berlin
• Word2Vec is essentially a two-layer neural network
• Objective function:
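For a training corpus w_1, …, w_T and context window size c, Skip-gram maximises the average log-probability of the words surrounding each word:

$$\frac{1}{T}\sum_{t=1}^{T}\ \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)$$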
Mikolov T et al. 2013
Skip-gram with negative sampling
• Datasets are often too large for plain SGD, because the softmax denominator of the conditional probability requires a sum over the entire vocabulary at every step
• Instead, the task can be recast as distinguishing observed target–context co-occurrences from k negative samples
Mikolov T et al. 2013
Objective function (full softmax):

$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}$$

Objective function with negative sampling:

$$\log \sigma\left({v'_{w_O}}^{\top} v_{w_I}\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left[\log \sigma\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right]$$
Embedding and Matrix Factorisation
• The objective of the embedding is closely related to matrix
factorisation
• Skip-gram with negative sampling can be viewed as implicitly
factorising the SPMI (shifted pointwise mutual information) matrix
Levy O et al. 2014
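Concretely, Levy & Goldberg (2014) show that Skip-gram with k negative samples implicitly factorises the word–context matrix

$$M_{ij} = \mathrm{PMI}(w_i, c_j) - \log k, \qquad \mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)}$$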
Neural embedding: Prod2Vec
• A method that applies the Skip-gram model to product recommendation,
treating users' purchase sequences as sentences and products as words
• When a user buys a product, products with similar vector
representations are recommended
Grbovic M et al. 2015
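Prod2Vec can be prototyped directly with an off-the-shelf Word2Vec implementation. Below is a minimal, hypothetical sketch using gensim: the product IDs and negative=5 are illustrative, while vector_size=50 and window=3 mirror the parameters listed later in this deck.

```python
# Prod2Vec sketch: treat each user's purchase sequence as a "sentence"
# and product IDs as "words", then train Skip-gram with negative sampling.
from gensim.models import Word2Vec

purchase_sequences = [
    ["p1", "p2", "p3"],        # user A's purchases, in order
    ["p2", "p3", "p4", "p5"],  # user B
    ["p1", "p4", "p5"],        # user C
]

model = Word2Vec(
    purchase_sequences,
    vector_size=50,  # embedding dimension (the deck uses 50)
    window=3,        # context window size (the deck uses 3)
    sg=1,            # Skip-gram
    negative=5,      # negative sampling
    min_count=1,     # keep every product, even if seen once
)

# Recommend products whose vectors are closest to the one just bought.
print(model.wv.most_similar("p2", topn=3))
```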
Prod2Vec for popular songs
[Figure: Prod2Vec recommendations for the popular songs “Shake It Off” and “All About That Bass”]
Vasile F et al. 2016
Prod2Vec in cold start case
[Figure: Prod2Vec recommendations for the cold-start songs “You’re Not Sorry” and “Du Hast”]
Vasile F et al. 2016
Meta-Prod2Vec constraints
• Meta-Prod2Vec = Prod2Vec + product meta-data
• The aim is to alleviate the cold-start problem
Vasile F et al. 2016
Loss function of Prod2Vec
Vasile F et al. 2016
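In the paper's notation (I: input products, J: output/context products, defined on a later slide), the Prod2Vec loss is the Skip-gram negative-sampling loss over co-occurring product pairs; a sketch consistent with the Word2Vec objective above:

$$L_{P2V} = L_{J|I} = \sum_{(i,j)} \left[ -\log \sigma\left(w_j^{\top} w_i\right) - \sum_{n=1}^{k} \mathbb{E}_{j_n \sim P_N}\left[\log \sigma\left(-w_{j_n}^{\top} w_i\right)\right] \right]$$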
Negative sampling for Meta-Prod2Vec
Vasile F et al. 2016
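Meta-Prod2Vec reuses the same negative-sampling scheme for the additional pair types that involve meta-data, embedding meta-data values in the same space as products. A sketch of one such side term, predicting the meta-data of context products from the input product:

$$L_{M|I} = \sum_{(i,m)} \left[ -\log \sigma\left(w_m^{\top} w_i\right) - \sum_{n=1}^{k} \mathbb{E}_{m_n \sim P_N}\left[\log \sigma\left(-w_{m_n}^{\top} w_i\right)\right] \right]$$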
Loss function of Meta-Prod2Vec
Vasile F et al. 2016
I: input
J: output
M: meta-data
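Combining the terms, with λ weighting the side-information losses (see the parameter list below), a sketch of the full loss:

$$L_{MP2V} = L_{J|I} + \lambda \left( L_{M|I} + L_{J|M} + L_{M|M} + L_{I|M} \right)$$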
Evaluation metrics
Vasile F et al. 2016
• Hit ratio at K (HR@K): whether the test product appears in the top-K list of
recommended products (the rank of the test product within the list is ignored)
• Normalised discounted cumulative gain (NDCG@K): measures the quality of a
recommendation list based on the graded relevance of the recommended entities.
It ranges from 0 to 1, with 1 representing the ideal ranking of the entities.
IDCG is the maximum possible (ideal) DCG for a given set of queries
rel: graded relevance of the result at position i
k: maximum number of entities that can be recommended
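With the legend above, the standard definitions (in one common form) are:

$$\mathrm{DCG@K} = \sum_{i=1}^{K} \frac{rel_i}{\log_2(i+1)}, \qquad \mathrm{NDCG@K} = \frac{\mathrm{DCG@K}}{\mathrm{IDCG@K}}$$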
Methods for comparison
Vasile F et al. 2016
• BestOf: based on popularity
• CoCounts: based on cosine similarity (basic collaborative filtering)
• Prod2Vec
• Meta-Prod2Vec
• Mix(Prod2Vec, CoCounts): a linear blend of the two scores, with weight α on the embedding model (see the sketch after the parameter list)
• Mix(Meta-Prod2Vec, CoCounts): the same blend, with Meta-Prod2Vec as the embedding model
Parameters
Number of songs: 433k
Number of artists: 67k
Embedding dimension: 50
Context window size: 3
λ (weight of the side-information terms): 1
α (weight of the embedding model in the Mix methods): 0.15
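As a rough illustration of the Mix(·, CoCounts) methods above, here is a hypothetical Python sketch that blends an embedding-based score with a co-occurrence score; α = 0.15 is taken from the parameter list, and the candidate scores are made up for illustration.

```python
# Hypothetical sketch of the Mix(*, CoCounts) ensemble: linear blend of
# an embedding similarity score and a co-occurrence (CoCounts) score.
def mix_score(embedding_score: float, cocounts_score: float,
              alpha: float = 0.15) -> float:
    """Linear blend, with `alpha` weighting the embedding model."""
    return alpha * embedding_score + (1.0 - alpha) * cocounts_score

# Rank candidate products by the blended score.
candidates = {"p4": (0.9, 0.2), "p5": (0.4, 0.7)}  # id -> (embedding, cocounts)
ranked = sorted(candidates, key=lambda p: mix_score(*candidates[p]), reverse=True)
print(ranked)  # ['p5', 'p4']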
Relative importance of meta-data
Vasile F et al. 2016
Improvement in cold start
Vasile F et al. 2016
[Figure: results for the cold-start subset]
Improvement in cold start (continued)
Vasile F et al. 2016
Better performance in ensemble model
Vasile F et al. 2016
Discussion
• Meta-data was informative, especially in the cold-start case
• The ensemble method (with 15% weight on Meta-Prod2Vec) worked well
• No comparison with matrix-factorisation methods or with other
Word2Vec variants that utilise meta-data
