In recent years, Word2Vec and its extensions (Doc2Vec, Paragraph2Vec, etc.) have been receiving a lot of attention in the NLP field.
In these slides, we introduce our approach for applying Doc2Vec to an item recommender system, and we report the results of a performance evaluation of the Doc2Vec-based recommender using Rakuten Singapore EC data.
Recommender System with Distributed Representation
Construction and Evaluation of an Item Recommender System Using Distributed Representations
(分散表現を用いた商品レコメンダーシステムの構築と評価)
Thuy Phi Van (1,2), Chen Liu (2), and Yu Hirate (2)
1. Computational Linguistics Laboratory, NAIST
2. Rakuten Institute of Technology, Rakuten, Inc.
{ar-thuy.phivan, chen.liu, yu.hirate}@rakuten.com
Distributed Representations for Words
• Similar words are projected to similar vectors.
• Relationships between words can be expressed as simple vector arithmetic. [T. Mikolov et al., NIPS 2013]
• Analogy: v(“woman”) – v(“man”) + v(“king”) ≈ v(“queen”)
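The analogy property can be reproduced with plain vector arithmetic. The 3-dimensional vectors below are hand-crafted for illustration only — real word2vec embeddings are learned from data and have hundreds of dimensions:

```python
import math

# Hand-crafted toy vectors (hypothetical; real embeddings are learned).
# Dimensions roughly encode: [is-person, is-female, is-royal].
vecs = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "apple": [0.0, 0.2, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# v("woman") - v("man") + v("king")
query = [w - m + k for w, m, k in zip(vecs["woman"], vecs["man"], vecs["king"])]

# The nearest remaining word to the query vector should be "queen".
best = max((w for w in vecs if w not in ("woman", "man", "king")),
           key=lambda w: cosine(query, vecs[w]))
print(best)  # queen
```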
Two models in word2vec
[Figure: CBoW and Skip-gram architectures — input, projection, and output layers over context vectors v(t−2), v(t−1), v(t+1), v(t+2) and target vector v(t)]
• CBoW: given the context words, predict the probability of the target word.
• Skip-gram: given the target word, predict the probability of the context words.
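The two objectives differ only in which side of each (target, context) pair the model must predict. A sketch of how the training pairs are generated — illustrative only, not word2vec's actual implementation:

```python
def skipgram_pairs(tokens, window=2):
    # Skip-gram pairs: (target, context) — the model is given the target
    # word and predicts each surrounding context word.
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(max(0, i - window), min(len(tokens), i + window + 1))
            if j != i]

def cbow_pairs(tokens, window=2):
    # CBoW pairs: (context words, target) — the model is given the
    # context and predicts the target word.
    return [([tokens[j]
              for j in range(max(0, i - window), min(len(tokens), i + window + 1))
              if j != i],
             tokens[i])
            for i in range(len(tokens))]

sent = ["the", "cat", "sat"]
print(skipgram_pairs(sent, window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_pairs(sent, window=1))
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```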
Doc2Vec (Paragraph2Vec) [Q. Le et al., ICML 2014]
[Figure: PV-DM and PV-DBoW architectures — PV-DM predicts the target word v(t) from the document vector v(doc) plus context words; PV-DBoW predicts the context words from v(doc) alone]
• Assigns a “document vector” to each document.
• The document vector can be used as:
  • a feature of the document
  • a basis for measuring similarity between documents
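A minimal, illustrative PV-DBoW-style trainer in pure Python. This is a sketch only — real Doc2Vec implementations (e.g. gensim's) use frequency-weighted negative sampling or hierarchical softmax and many optimizations:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))

def train_pv_dbow(docs, dim=8, epochs=50, lr=0.05, negative=2):
    """Sketch of PV-DBoW: each document vector is trained to score the
    words it contains above randomly drawn "noise" words (negative
    sampling). Returns one vector per document and one per word."""
    vocab = sorted({w for d in docs for w in d})
    wvec = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    dvec = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in docs]
    for _ in range(epochs):
        for di, doc in enumerate(docs):
            for w in doc:
                # One positive sample plus `negative` uniform noise samples.
                samples = [(w, 1.0)] + [(random.choice(vocab), 0.0)
                                        for _ in range(negative)]
                for sw, label in samples:
                    score = sigmoid(sum(a * b for a, b in zip(dvec[di], wvec[sw])))
                    g = lr * (label - score)
                    for k in range(dim):
                        dk = dvec[di][k]
                        dvec[di][k] += g * wvec[sw][k]
                        wvec[sw][k] += g * dk
    return dvec, wvec

# Invented toy corpus: three tiny "documents".
docs = [["red", "apple", "fruit"], ["green", "apple", "fruit"], ["dog", "barks"]]
dvec, wvec = train_pv_dbow(docs)
```

The resulting `dvec` rows can then be fed to any vector-similarity measure, exactly as the slide describes.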
Category2Vec [Marui et al., NLP 2015]
https://github.com/rakuten-nlp/category2vec
• Assigns a “category vector” to each category.
• Each document carries its own category information.
[Figure: CV-DM and CV-DBoW architectures — PV-DM/PV-DBoW extended with an additional category vector v(cat) alongside v(doc)]
Recommender Systems in EC Services
Item2Item recommender
• Given an item, show items relevant to that item.
User2Item recommender
• Given a user, show items relevant to that user.
Distributed Representation for Users and Items
The analogy between text and user behavior:
• Document: a sequence of words with context. ↔ User: a sequence of item views with the user’s intention.
• Set of documents ↔ set of user behaviors
• Vectors for words ↔ vectors for items
• Vectors for documents ↔ vectors for users
• sim{word, word} ↔ sim{item, item}
• sim{doc, word} ↔ sim{user, item}
• sim{doc, doc} ↔ sim{user, user}
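This analogy maps directly onto Doc2Vec's input format: each user's item-view sequence plays the role of a document, and each item ID plays the role of a word. A minimal sketch of that conversion (the user and item IDs are invented):

```python
# Hypothetical behavior log: user ID -> ordered sequence of viewed item IDs.
behaviors = {
    "user1": ["itemA", "itemB", "itemC"],
    "user2": ["itemB", "itemD"],
}

# One (tag, words) pair per user — the shape a Doc2Vec-style trainer
# expects, e.g. gensim's TaggedDocument(words=items, tags=[user]).
corpus = [(user, items) for user, items in sorted(behaviors.items())]
print(corpus[0])  # ('user1', ['itemA', 'itemB', 'itemC'])
```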
Dataset Preparation
• Service:
• Rakuten Singapore www.rakuten.com.sg
• Rakuten’s EC service in Singapore
• Launched in 2014.
• Data Source
• Purchase History Data
• Click Through Data
• Period
• Jan. 2015 – Oct. 2015
Dataset Preparation (Purchase History Data)
• A set of items purchased by the same user.
User ID  | Set of purchased items
user #1  | {item_{1,1}, item_{1,2}}
user #2  | {item_{2,1}, item_{2,2}, item_{2,3}}
⋮        | ⋮
user #N  | {item_{N,1}}
Dataset Preparation (Click Through Data)
• A set of users’ sessions.
• Session:
  • A sequence of page views with the same cookie.
  • A sequence is split wherever the interval between consecutive page views exceeds 2 hours.
User ID  | Set of sessions
user #1  | {{item_{1,1,1}, item_{1,1,2}, ⋯, item_{1,1,n}}, {item_{1,2,1}, ⋯}}
user #2  | {{item_{2,1,1}, item_{2,1,2}}}
⋮        | ⋮
user #N  | {{item_{N,1,1}, item_{N,1,2}, ⋯, item_{N,1,n}}, {item_{N,2,1}, ⋯}}
[Figure: timeline showing Session A and Session B separated by a gap longer than 2 hours]
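The 2-hour splitting rule can be sketched as follows (a minimal illustration; the timestamps and item IDs are invented):

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=2)

def split_sessions(page_views, gap=SESSION_GAP):
    """Split one cookie's (timestamp, item) stream into sessions: a new
    session starts whenever consecutive page views are more than `gap`
    apart (2 hours in our preprocessing)."""
    sessions = []
    last_ts = None
    for ts, item in sorted(page_views):
        if last_ts is None or ts - last_ts > gap:
            sessions.append([])
        sessions[-1].append(item)
        last_ts = ts
    return sessions

views = [
    (datetime(2015, 1, 1, 10, 0), "itemA"),
    (datetime(2015, 1, 1, 10, 30), "itemB"),
    (datetime(2015, 1, 1, 13, 0), "itemC"),  # 2.5 h after the previous view
]
print(split_sessions(views))  # [['itemA', 'itemB'], ['itemC']]
```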
Dataset Property
• More than 60% of sessions end after a single page request.
• More than X% of users visited rakuten.com.sg only once.
[Figures: distribution of session length; distribution of session count]
Evaluations
1. Parameter optimization
• Find an optimal parameter set.
• Identify which parameters matter most for building a good model.
2. Performance comparison with conventional recommender algorithms
• Item Similarity
• Matrix Factorization
1. Parameter Optimization
Parameter | Values | Explanation
Size      | [50, 100, 200, 300, 400, 500] | Dimensionality of the vectors
Window    | [1, 3, 5, 8, 10, 15] | Maximum number of context items the training algorithm takes into account
Negative  | [0, 5, 10, 15, 20, 25] | Number of “noise words” to draw (usually between 5 and 20)
Sample    | [0, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8] | Sub-sampling threshold for frequent items
Min-count | [1, ..., 20] | Items appearing fewer than min-count times are ignored
Iteration | [10, 15, 20, 25, 30] | Number of iterations for building the model
• Best parameter setting:
Size | Window | Negative | Sample | min_count | Iteration | hit-rate
300  | 8      | 10       | 1e-5   | 3         | 20        | 0.1821
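A sweep of this kind is a plain grid search over the Cartesian product of the candidate values. The sketch below uses only two of the parameters and a placeholder `evaluate` function — in the real experiments each candidate would train a Doc2Vec model and measure its hit-rate:

```python
from itertools import product

# Candidate values for two of the parameters from the table above.
grid = {
    "size": [50, 100, 200, 300, 400, 500],
    "window": [1, 3, 5, 8, 10, 15],
}

def evaluate(params):
    # Placeholder scoring function for illustration only; it peaks at the
    # setting the table reports as best (size=300, window=8). A real run
    # would train a model with `params` and return its hit-rate.
    return -abs(params["size"] - 300) - abs(params["window"] - 8)

best = max(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=evaluate,
)
print(best)  # {'size': 300, 'window': 8}
```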
2. Performance Comparison with Conventional Recommender Algorithms
• Item Similarity: Jaccard similarity of the user sets of two items.
• Matrix Factorization: dim = 32, max iterations = 25.
[Figure: matrix factorization diagram — the user–item matrix decomposed into U × I]
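The Item Similarity baseline can be sketched directly: each item is represented by the set of users who interacted with it, and two items are compared by the Jaccard similarity of those sets (the item and user IDs below are invented):

```python
def jaccard(users_a, users_b):
    """Jaccard similarity of two user sets:
    |A ∩ B| / |A ∪ B|, with 0.0 for two empty sets."""
    if not users_a and not users_b:
        return 0.0
    return len(users_a & users_b) / len(users_a | users_b)

item_users = {
    "itemA": {"u1", "u2", "u3"},
    "itemB": {"u2", "u3", "u4"},
}
print(jaccard(item_users["itemA"], item_users["itemB"]))  # 0.5
```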
2. Performance Comparison with Conventional Algorithms
[Figure: bar chart of hit-rate (%) for Item Similarity, Matrix Factorization, and Doc2Vec]
• The Doc2Vec-based algorithm performed the best.
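The slides report hit-rate without defining it; one common definition — the fraction of users for whom at least one held-out item appears in the top-k recommendation list — can be sketched as follows (this definition and all data below are assumptions, not taken from the slides):

```python
def hit_rate(recommendations, held_out, k=10):
    """Fraction of users with at least one held-out item in their
    top-k recommendation list (a common hit-rate definition,
    assumed here for illustration)."""
    hits = sum(1 for user, recs in recommendations.items()
               if set(recs[:k]) & held_out.get(user, set()))
    return hits / len(recommendations)

recs = {"u1": ["a", "b", "c"], "u2": ["d", "e"]}
truth = {"u1": {"c"}, "u2": {"z"}}
print(hit_rate(recs, truth, k=3))  # 0.5
```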
Conclusion and Future Work
• Conclusion
• Developed a distributed-representation-based recommender system (RS).
• Applied it to a dataset generated from Rakuten Singapore click through data.
• Confirmed that the distributed-representation-based RS performed better than conventional RS algorithms.
• Future work
• Distributed-representation-based RS on other datasets:
  • Rakuten Singapore product data
  • Rakuten (Japan) Ichiba click through data
• Hybrid model (content-based RS × user-behavior-based RS)
• Testing in the real service.