Building and Evaluating a Product Recommender System Using Distributed Representations
Recommender System with Distributed Representation
Thuy Phi Van 1,2, Chen Liu 2 and Yu Hirate 2
1. Computational Linguistics Laboratory, NAIST
2. Rakuten Institute of Technology, Rakuten, Inc.
{ar-thuy.phivan, chen.liu, yu.hirate}@rakuten.com
1. Distributed Representation for words, docs and categories
Distributed Representations for Words
• Distributed representations for words
• Similar words are projected into similar vectors.
• A relationship between words can be expressed as a simple vector calculation. [T. Mikolov et al., NIPS 2013]
• Analogy
• v(“woman”) – v(“man”) + v(“king”) = v(“queen”)
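The analogy can be reproduced with any trained word-vector model; below is a minimal sketch using gensim's KeyedVectors (gensim and the vector file name are illustrative assumptions, not something the slides specify).

```python
# Minimal sketch of the analogy query, assuming gensim 4.x and a word2vec model
# trained beforehand and saved as "wiki.vectors" (both are illustrative choices).
from gensim.models import KeyedVectors

wv = KeyedVectors.load("wiki.vectors")

# v("woman") - v("man") + v("king") ~ v("queen")
print(wv.most_similar(positive=["woman", "king"], negative=["man"], topn=3))
```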
Two models in word2vec
[Architecture diagrams: CBoW and Skip-gram; each has input, projection and output layers over context vectors v(t-2), v(t-1), v(t+1), v(t+2) and the target vector v(t)]
• CBoW: given the context words, predict the probability of the target word.
• Skip-gram: given the target word, predict the probability of the context words.
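As an illustration, both architectures are available in gensim's word2vec implementation via the sg flag (gensim and the toy sentences are assumptions, not part of the slides).

```python
# Minimal sketch: CBoW vs. Skip-gram in gensim 4.x (toy corpus for illustration only).
from gensim.models import Word2Vec

sentences = [["nagoya", "is", "a", "city", "in", "aichi"],
             ["coffee", "and", "tea", "are", "popular", "drinks"]]

cbow = Word2Vec(sentences, sg=0, vector_size=100, window=5, min_count=1)      # CBoW: context -> target
skipgram = Word2Vec(sentences, sg=1, vector_size=100, window=5, min_count=1)  # Skip-gram: target -> context
```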
Sample results of word2vec trained on Wikipedia data
query: nagoya
• osaka 0.799002
• chiba 0.762829
• fukuoka 0.755166
• sendai 0.731760
• yokohama 0.729205
• kobe 0.726732
• shiga 0.705707
• niigata 0.699777
• aichi 0.692371
• hyogo 0.687128
• saitama 0.685672
• tokyo 0.671428
• sapporo 0.670466
• kumamoto 0.660786
• japan 0.658769
• kitakyushu 0.654265
• wakayama 0.652783
• shizuoka 0.624380
query: coffee
• cocoa 0.603515
• robusta 0.565269
• beans 0.565232
• bananas 0.565207
• cinnamon 0.556771
• citrus 0.547495
• espresso 0.542120
• caff 0.542082
• infusions 0.538069
• tea 0.532565
• cassava 0.524657
• pineapples 0.523557
• coffea 0.512420
• tapioca 0.510727
• sugarcane 0.508203
• yams 0.507347
• avocados 0.507072
• arabica 0.506231
Doc2Vec (Paragraph2Vec) [Q. Le et al., ICML 2014]
[Architecture diagrams: PV-DM and PV-DBoW; a document vector v(doc) is trained alongside the word vectors v(t-2), v(t-1), v(t), v(t+1)]
• Assign a “Document Vector” to each document.
• The document vector can be used as
• a feature of the document
• a measure of document similarity
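A minimal sketch of the two Doc2Vec variants, assuming gensim 4.x (the document IDs and toy corpus are illustrative):

```python
# PV-DM (dm=1) and PV-DBoW (dm=0) with gensim 4.x Doc2Vec.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["great", "coffee", "beans"], tags=["doc_0"]),
        TaggedDocument(words=["instant", "noodle", "pack"], tags=["doc_1"])]

pv_dm   = Doc2Vec(docs, dm=1, vector_size=100, min_count=1, epochs=20)
pv_dbow = Doc2Vec(docs, dm=0, vector_size=100, min_count=1, epochs=20)

feature = pv_dm.dv["doc_0"]              # document vector used as a feature
print(pv_dm.dv.most_similar("doc_0"))    # document-to-document similarity
```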
Category2Vec [Marui et al., NLP 2015]
https://github.com/rakuten-nlp/category2vec
• Assign a “Category Vector” to each category.
• Each document has its own category information.
[Architecture diagrams: CV-DM and CV-DBoW, which extend PV-DM / PV-DBoW by training a category vector v(cat) together with the document vector v(doc) and word vectors]
2. Applying Doc2Vec to Item Recommender
Recommender Systems in an EC service
Item2Item recommender
• Given an item, show items relevant to that item.
User2Item recommender
• Given a user, show items relevant to that user.
Distributed Representation for Users and Items
• Document: a sequence of words with context.
• User: a sequence of item views with the user’s intention.
• Text side: a set of documents yields vectors for words and vectors for documents, giving sim{word, word}, sim{doc, word} and sim{doc, doc}.
• Behavior side: a set of user behaviors yields vectors for items and vectors for users, giving sim{item, item}, sim{user, item} and sim{user, user} (see the sketch below).
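A minimal sketch of this mapping, assuming gensim's Doc2Vec: each user's item sequence is treated as a document of item-ID tokens, so item vectors fall out as "word" vectors and user vectors as "document" vectors (the library choice and IDs are illustrative).

```python
# Users as documents, items as words: Doc2Vec over item-ID sequences (gensim 4.x assumed).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

user_behaviors = {
    "user_1": ["item_101", "item_205", "item_311"],
    "user_2": ["item_205", "item_927"],
}

corpus = [TaggedDocument(words=items, tags=[user])
          for user, items in user_behaviors.items()]
model = Doc2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=20)

print(model.wv.most_similar("item_205"))  # sim{item, item}
print(model.dv.most_similar("user_1"))    # sim{user, user}
```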
Dataset Preparation
• Service:
• Rakuten Singapore www.rakuten.com.sg
• Rakuten’s EC service in Singapore
• Launched in 2014.
• Data Source
• Purchase History Data
• Click Through Data
• Period
• Jan. 2015 – Oct. 2015
Dataset Preparation (Purchase History Data)
• A set of items purchased by the same user (one way to build these sets is sketched below).

User ID | Set of purchased items
user #1 | {item_{1,1}, item_{1,2}}
user #2 | {item_{2,1}, item_{2,2}, item_{2,3}}
⋮ | ⋮
user #N | {item_{N,1}}
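One way to build such sets from a raw purchase log is a simple group-by on user ID; the sketch below assumes pandas and hypothetical column names (user_id, item_id).

```python
# Sketch: per-user purchased-item lists from a purchase log (pandas and column names assumed).
import pandas as pd

log = pd.DataFrame({
    "user_id": ["user_1", "user_1", "user_2"],
    "item_id": ["item_A", "item_B", "item_C"],
})

purchased = log.groupby("user_id")["item_id"].apply(list).to_dict()
print(purchased)  # {'user_1': ['item_A', 'item_B'], 'user_2': ['item_C']}
```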
Dataset Preparation (Click Through Data)
• A set of users’ sessions.
• Session:
• A sequence of page views with the same cookie.
• A sequence is split whenever the time interval between page views exceeds 2 hours (see the sketch below the table).

User ID | Set of sessions
user #1 | {item_{1,1,1}, item_{1,1,2}, ⋯, item_{1,1,n}}, {item_{1,2,1}, ⋯}
user #2 | {item_{2,1,1}, item_{2,1,2}}
⋮ | ⋮
user #N | {item_{N,1,1}, item_{N,1,2}, ⋯, item_{N,1,n}}, {item_{N,2,1}, ⋯}

[Timeline figure: page views with a gap longer than 2 hours are split into Session A and Session B]
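A minimal sketch of the 2-hour session split described above (the data structures and names are illustrative, not from the slides):

```python
# Split one user's page views into sessions whenever the gap between consecutive
# views exceeds 2 hours.
from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=2)

def split_sessions(page_views):
    """page_views: list of (timestamp, item_id) tuples sorted by timestamp."""
    sessions, current, prev_ts = [], [], None
    for ts, item in page_views:
        if prev_ts is not None and ts - prev_ts > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append(item)
        prev_ts = ts
    if current:
        sessions.append(current)
    return sessions

views = [(datetime(2015, 1, 1, 10, 0), "item_A"),
         (datetime(2015, 1, 1, 10, 5), "item_B"),
         (datetime(2015, 1, 1, 14, 0), "item_C")]  # gap > 2 hours -> new session
print(split_sessions(views))  # [['item_A', 'item_B'], ['item_C']]
```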
Dataset Property
• More than 60% of sessions finish with one page request.
• More than X% of users visited rakuten.com.sg one time only.
[Figures: Distribution of Session Length; Distribution of Session Count]
Item2Item Recommender (Example)
[Screenshots: example item-to-item recommendations built from Click Through Data and from Purchase History Data]
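For item-to-item recommendation, one plausible lookup is the nearest neighbours of the query item's vector in a trained model, e.g. the Doc2Vec model sketched earlier (the saved-model file name and item ID below are hypothetical).

```python
# Sketch: item-to-item recommendations as the 20 nearest item vectors.
from gensim.models.doc2vec import Doc2Vec

model = Doc2Vec.load("item_doc2vec.model")   # hypothetical model trained on item-ID sequences
for item_id, score in model.wv.most_similar("item_205", topn=20):
    print(item_id, round(score, 3))
```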
3. Evaluation
Evaluation Metrics
• Training data: 2015/01/01 – 2015/08/31
• Test data: 2015/09/01 – 2015/10/31
• N is the total number of users common to the training and test data.
• Hit-rate of the recommender system (RS): hit-rate = Number of hits / N
• For each user, the RS predicts the top-20 items.
• “Hit”: at least one of the predicted items for that user appears in the test data (see the sketch below).
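A minimal sketch of this hit-rate computation (variable names are illustrative):

```python
# hit-rate = (# users with at least one predicted item in their test items) / N,
# where N is the number of users present in both training and test data.
def hit_rate(predictions, test_items):
    """predictions / test_items: dicts mapping user_id -> list of item_ids."""
    common_users = set(predictions) & set(test_items)
    hits = sum(1 for u in common_users
               if set(predictions[u]) & set(test_items[u]))
    return hits / len(common_users) if common_users else 0.0

preds = {"user_1": ["item_A", "item_B"], "user_2": ["item_C"]}
test  = {"user_1": ["item_B"],           "user_2": ["item_X"]}
print(hit_rate(preds, test))  # 0.5
```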
Evaluations
1. Parameter Optimization
• Find an optimal parameter set.
• Find the important parameters for building a good model.
2. Performance Comparison with Conventional Recommender Algorithms
• Item Similarity
• Matrix Factorization
1. Parameter Optimization

Parameter | Values | Explanation
Size | [50, 100, 200, 300, 400, 500] | Dimensionality of the vectors
Window | [1, 3, 5, 8, 10, 15] | Maximum number of context items the training algorithm takes into account
Negative | [0, 5, 10, 15, 20, 25] | Number of “noise words” to be drawn (usually between 5 and 20)
Sample | [0, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8] | Sub-sampling of frequent words
Min-count | [1, ..., 20] | Items appearing fewer times than min-count are ignored
Iteration | [10, 15, 20, 25, 30] | Number of iterations for building the model

• Best parameter setting (a code sketch of this configuration follows):
Size | Window | Negative | Sample | min_count | Iteration | hit-rate
300 | 8 | 10 | 1e-5 | 3 | 20 | 0.1821
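If the model is built with gensim's Doc2Vec, the best setting above would map onto its arguments roughly as follows (gensim itself is an assumption; Size corresponds to vector_size, Iteration to epochs, and the toy corpus is illustrative).

```python
# Sketch: the slide's best parameter setting expressed as gensim 4.x Doc2Vec arguments.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [TaggedDocument(words=["item_101", "item_205"], tags=[f"user_{i}"])
          for i in range(5)]  # toy corpus; each item occurs often enough for min_count=3

model = Doc2Vec(corpus,
                vector_size=300,  # Size
                window=8,         # Window
                negative=10,      # Negative
                sample=1e-5,      # Sample
                min_count=3,      # Min-count
                epochs=20)        # Iteration
```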
1. Parameter Optimization (hit-rate in % by parameter value)
• Size: 50 → 13.7, 100 → 15.5, 200 → 17.7, 300 → 18.2, 400 → 17.8, 500 → 17.2
• Window: 1 → 15.4, 3 → 16.9, 5 → 17.8, 8 → 18.2, 10 → 18.0, 15 → 18.0
• Negative: 0 → 15.9, 5 → 17.9, 10 → 18.2, 15 → 17.6, 20 → 17.4, 25 → 17.3
• Sample: 0 → 16.2, 1e-2 → 16.5, 1e-3 → 16.4, 1e-4 → 16.7, 1e-5 → 18.2, 1e-6 → 15.1, 1e-7 → 2.0, 1e-8 → 0.3
• Min_count: 1 → 16.8, 3 → 18.2, 5 → 18.9, 7 → 18.8, 9 → 18.9, 11 → 19.0, 13 → 18.8, 15 → 18.7, 17 → 18.9, 19 → 18.9
• Iteration: 10 → 16.8, 15 → 17.8, 20 → 18.2, 25 → 18.2, 30 → 18.2
2. Performance Comparison with Conventional Recommender Algorithms
• Item Similarity: Jaccard similarity of the user sets associated with each item (see the sketch below).
• Matrix Factorization: the user–item matrix is factorized as U × I, with dim = 32 and max iteration = 25.
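A minimal sketch of the Item Similarity baseline's scoring function (the per-item user sets are illustrative):

```python
# Jaccard similarity of the user sets associated with two items.
def jaccard(users_a, users_b):
    users_a, users_b = set(users_a), set(users_b)
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

item_users = {
    "item_A": {"user_1", "user_2", "user_3"},
    "item_B": {"user_2", "user_3"},
}
print(jaccard(item_users["item_A"], item_users["item_B"]))  # 0.666...
```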
2. Performance Comparison with Conventional Algorithms
[Bar chart: hit-rate (%) of Item Similarity, Matrix Factorization and Doc2Vec]
• The Doc2Vec-based algorithm performed the best.
Conclusion and Future Work
• Conclusion
• Developed a distributed-representation-based RS.
• Applied it to a dataset generated from Rakuten Singapore click through data.
• Confirmed that the distributed-representation-based RS performed better than conventional RS algorithms.
• Future Work
• Distributed-representation-based RS on other datasets
• Rakuten Singapore Product Data
• Rakuten (Japan) Ichiba Click Through Data
• Hybrid model (content-based RS × user-behavior-based RS)
• Testing in the real service.
Thank you
