Factorizing the PMI Matrix
Catching co-occurrence information using word2vec-inspired matrix factorization

이현성
Factor-model-based recommender systems (Y. Hu, Y. Koren et al. '08)

- Represent users and items numerically in a latent space.
  Example:
  - Represent a user U as the vector $u = (0.7, 1.3, -0.5, 0.6)^T$
  - Represent an item I as the vector $i = (2.05, 1.2, 2.6, 3.9)^T$
- Targets (what we want to predict) are computed from these numerical representations of users, items, and other content information.
- Representing users and items well as vectors is the most important part!
  Example: the predicted rating of user U on item I is
  $r_{u,i} = \mathrm{dot}(u, i) = u^T i = 0.7 \cdot 2.05 + 1.3 \cdot 1.2 - 0.5 \cdot 2.6 + 0.6 \cdot 3.9 = 4.035$
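As a minimal sketch of this prediction rule (the vectors are the illustrative examples from above, not learned factors):

```python
import numpy as np

u = np.array([0.7, 1.3, -0.5, 0.6])   # latent vector of user U
i = np.array([2.05, 1.2, 2.6, 3.9])   # latent vector of item I

r_ui = u @ i   # predicted rating = inner product of the two latent vectors
print(r_ui)    # 4.035
```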
Factor model as matrix completion
Explicit feedback (rating prediction)

Suppose there are four users and five items. The rating matrix R is defined as:

      i1   i2   i3   i4   i5
u1     ?    ?    3    4    ?
u2     2    ?    5    ?    ?
u3     ?    1    ?    ?    ?
u4     2    ?    4    ?    3

Rating matrix with missing cells

From the known entries, such as $R_{u_1,i_3} = 3$, $R_{u_3,i_2} = 1$, ..., we want to predict the missing values such as $R_{u_1,i_1}$.
Factor model as matrix completion
Explicit feedback (rating prediction)

With a few assumptions, we can make predictions on unseen interactions:
1. Users have constant preferences, and these can be represented numerically.
2. Items have features, and these can be represented numerically.
   (These numerical representations are called latent vectors, features, representations, or embeddings.)
3. Implicit feedback (whether the user clicked an item or not) or explicit feedback (a rating on a movie) can be expressed as a linear combination of user and item features:
   - User u's preferences: $u = (0.2, 0.3, -1.0, \ldots)$
   - Item i's features: $i = (-1.0, -0.3, 1.3, \ldots)$
   - Rating on item i by user u: $r_{ui} = \sum_k u_k i_k = 0.2 \cdot (-1.0) + 0.3 \cdot (-0.3) + \cdots = 4.1$
If we find latent vectors that fit the known ratings well…

R is the known information:

$R := \{(u_1, i_3, 3), (u_1, i_4, 4), \ldots, (u_4, i_5, 3)\}$
(only cells with an observed rating are included in R)

If we find user vectors $\theta_u$ and item vectors $\beta_i$ that fit the known data well (that is, vectors such that $\theta_u^T \beta_i \simeq r_{ui}$), then we should be able to predict the data we do not yet know.

Our objective will be:

Minimize $\sum_{(u,i) \in R} (r_{ui} - \theta_u^T \beta_i)^2 + \text{regularization}$

      i1   i2   i3   i4   i5
u1     ?    ?    3    4    ?
u2     2    ?    5    ?    ?
u3     ?    1    ?    ?    ?
u4     2    ?    4    ?    3

Rating matrix with missing cells
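A minimal sketch of evaluating this objective on the rating matrix above, assuming small dense arrays and hypothetical random initial factors of dimension K = 4 (a real solver would then minimize this quantity):

```python
import numpy as np

# Observed (user, item, rating) triples from the rating matrix above.
R = [(0, 2, 3.0), (0, 3, 4.0), (1, 0, 2.0), (1, 2, 5.0),
     (2, 1, 1.0), (3, 0, 2.0), (3, 2, 4.0), (3, 4, 3.0)]
K, n_users, n_items, lam = 4, 4, 5, 0.1
rng = np.random.default_rng(0)
theta = rng.normal(size=(n_users, K))   # user vectors theta_u
beta = rng.normal(size=(n_items, K))    # item vectors beta_i

# Squared error over observed cells only, plus L2 regularization.
loss = sum((r - theta[u] @ beta[i]) ** 2 for u, i, r in R)
loss += lam * ((theta ** 2).sum() + (beta ** 2).sum())
print(loss)
```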
We can fill in the missing elements of the matrix

      i1   i2   i3   i4   i5
u1     ?    ?    3    4    ?
u2     2    ?    5    ?    ?
u3     ?    1    ?    ?    ?
u4     2    ?    4    ?    3

Rating matrix with missing cells

      i1       i2       i3   i4       i5
u1    θ1ᵀβ1    θ1ᵀβ2    3    4        …
u2    2        …        5    …        …
u3    ?        1        ?    ?        ?
u4    2        θ4ᵀβ2    4    θ4ᵀβ4    3

Rating matrix with missing values filled

This is how a factor model fills the empty cells of the matrix:
$\theta_u$ := latent vector of user u
$\beta_i$ := latent vector of item i

This is matrix completion (rating prediction).
Summary

Parameter | Meaning | Definition
$R$ | Interaction matrix | $R$ contains $(u, i)$ if user $u$ has interacted with item $i$
$r_{ui}$ | Explicit feedback (ratings) | $r_{ui} \in \{1, 2, 3, 4, 5\}$

MF parameters:
$\theta_u$ | Latent factor for user $u$
$\beta_i$ | Latent factor for item $i$
$\lambda_\omega$ | Regularizer for parameter $\omega$ (any parameter)

$objective := \sum_{(u,i)\in R} (r_{ui} - \theta_u^T \beta_i)^2 + \text{regularization}$

What the objective does:
- Find vectors $\theta_u$ and $\beta_i$ that approximate the nonempty cells of $R$ well.
- What do we use $\theta_u$ and $\beta_i$ for once we have found them?
- Predicting the empty cells (the ratings of movies a user has not seen).
If we find latent vectors that fit the known interactions well…

R is the known information:

$R := \{(u_1,i_1,0), (u_1,i_2,0), (u_1,i_3,1), \ldots, (u_2,i_3,1), (u_2,i_4,0), \ldots, (u_4,i_5,1)\}$
(cells without a click are also included in R)

We only know whether each user has interacted with each item or not (1 or 0). From the items a user has interacted with (items with label 1), we want to find the items (among those with label 0) that the user will like.

Our objective will be:

Minimize $\sum_{(u,i)\in R} c_{ui} (y_{ui} - \theta_u^T \beta_i)^2 + \text{regularization}$

$c_{ui}$ := confidence weight term

      i1   i2   i3   i4   i5
u1     0    0    1    1    0
u2     1    0    1    0    0
u3     0    1    0    1    0
u4     1    0    1    0    1

Interaction matrix
Implicit feedback case (predicting whether a user will click an item or not)

      i1   i2   i3   i4   i5
u1     0    0    1    1    0
u2     1    0    1    0    0
u3     0    1    0    1    0
u4     1    0    1    0    1

Interaction matrix

      i1     i2     i3     i4    i5
u1    0.05   0.1    1      1     0.7
u2    1      0.33   1      0.6   0
u3    0.9    1      0.02   1     0.4
u4    1      0.5    1      0.4   1

Interaction matrix with predicted values filled in

We can recommend item i5 to user u1, item i4 to user u2, and so on.
Summary

$objective := \sum_{(u,i)\in R} c_{ui}(y_{ui} - \theta_u^T \beta_i)^2 + \text{regularization}$

What the objective does:
- Find vectors $\theta_u$ and $\beta_i$ that approximate $R$ well.
- What do we use $\theta_u$ and $\beta_i$ for once we have found them?
- Predicting the probability (between 0 and 1) that a user will watch a movie they have not seen.

Parameter | Meaning | Definition
$R$ | Interaction matrix | $R$ contains $(u, i)$ if user $u$ has interacted with item $i$
$y_{ui}$ | Implicit feedback (clicks) | $y_{ui} \in \{0, 1\}$
$r_{ui}$ | Explicit feedback (ratings) | $r_{ui} \in \{1, 2, 3, 4, 5\}$
$c_{ui}$ | Feedback confidence (we give strong confidence to items with high ratings) | $1 + \alpha r_{ui}$

MF parameters:
$\theta_u$ | Latent factor for user $u$
$\beta_i$ | Latent factor for item $i$
$\lambda_\omega$ | Regularizer for parameter $\omega$ (any parameter)
Weighted Matrix Factorization (Hu et al. '08)

(Parameters are as defined in the summary table above.)

Loss function for Weighted Matrix Factorization:

$L_{WMF} = \sum_{(u,i)\in R} c_{ui}\big(y_{ui} - \theta_u^T\beta_i\big)^2 + \lambda_\theta \sum_u \|\theta_u\|^2 + \lambda_\beta \sum_i \|\beta_i\|^2$

This solves the implicit feedback problem (item recommendation).
Alternating Least Squares: Example

$A = \begin{pmatrix} 1 & 2 \\ -1 & 1 \end{pmatrix}$, and we approximate A by $UV$ with $U \in \mathbb{R}^{2\times 1}$ and $V \in \mathbb{R}^{1\times 2}$.

Initialize V as $(1 \;\; {-1})$ and update U:

$UV = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}\begin{pmatrix} 1 & -1 \end{pmatrix} = \begin{pmatrix} u_1 & -u_1 \\ u_2 & -u_2 \end{pmatrix}$

$Loss = \mathrm{Sum}\big((A - UV)^2\big) = (u_1-1)^2 + (-u_1-2)^2 + (u_2+1)^2 + (-u_2-1)^2 = 2u_1^2 + 2u_1 + 2u_2^2 + 4u_2 + 7$

Differentiate the loss with respect to u and solve for U:

$\frac{\partial Loss}{\partial u_1} = 4u_1 + 2 = 0, \quad \frac{\partial Loss}{\partial u_2} = 4u_2 + 4 = 0$

$u_1 = -\tfrac{1}{2}, \; u_2 = -1$, so $U = \begin{pmatrix} -1/2 \\ -1 \end{pmatrix}$

With the U we just found, solve for V:

$UV = \begin{pmatrix} -1/2 \\ -1 \end{pmatrix}\begin{pmatrix} v_1 & v_2 \end{pmatrix} = \begin{pmatrix} -\tfrac{1}{2}v_1 & -\tfrac{1}{2}v_2 \\ -v_1 & -v_2 \end{pmatrix}$

$Loss = \mathrm{Sum}\big((A - UV)^2\big) = \big(1 + \tfrac{1}{2}v_1\big)^2 + \big(2 + \tfrac{1}{2}v_2\big)^2 + (-1 + v_1)^2 + (1 + v_2)^2 = \tfrac{5}{4}v_1^2 - v_1 + \tfrac{5}{4}v_2^2 + 4v_2 + 7$

Differentiate the loss with respect to v and solve for V:

$\frac{\partial Loss}{\partial v_1} = \tfrac{5}{2}v_1 - 1 = 0, \quad \frac{\partial Loss}{\partial v_2} = \tfrac{5}{2}v_2 + 4 = 0$

$v_1 = \tfrac{2}{5}, \; v_2 = -\tfrac{8}{5}$, so $V = \begin{pmatrix} \tfrac{2}{5} & -\tfrac{8}{5} \end{pmatrix}$

Repeating this process yields U and V that approximate A well.
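The same alternating updates can be written with a few numpy calls; a minimal sketch of the unregularized example above, using np.linalg.lstsq as the per-step least-squares solver:

```python
import numpy as np

A = np.array([[1.0, 2.0], [-1.0, 1.0]])
V = np.array([[1.0, -1.0]])               # 1x2, initialized as above

for step in range(10):
    # Fix V and solve min_U ||A - U V||^2 (a least-squares problem).
    U = np.linalg.lstsq(V.T, A.T, rcond=None)[0].T   # 2x1
    # Fix U and solve min_V ||A - U V||^2.
    V = np.linalg.lstsq(U, A, rcond=None)[0]         # 1x2
    if step == 0:
        print(U.ravel(), V.ravel())  # [-0.5 -1.], [0.4 -1.6] as derived above

print(U @ V)  # rank-1 approximation of A after repeating the process
```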
Alternating Least Squares

$L_{WMF} = \sum_{(u,i)\in R} c_{ui}\big(y_{ui} - \theta_u^T\beta_i\big)^2 + \lambda_\theta \sum_u \|\theta_u\|^2 + \lambda_\beta \sum_i \|\beta_i\|^2$

The objective has two sets of parameters, $\theta$ and $\beta$. We can view the objective as a function of $\theta$ or as a function of $\beta$, and then find the $\theta^*$ and $\beta^*$ that minimize $L_{WMF}(\theta)$ and $L_{WMF}(\beta)$, respectively.

The $\theta_u^*$ that makes $\frac{\partial L}{\partial \theta_u} = 0$:

$\theta_u^* = \Big(\sum_i c_{ui}\beta_i\beta_i^T + \lambda_\theta I_K\Big)^{-1} \sum_i c_{ui}\, y_{ui}\, \beta_i$

The $\beta_i^*$ that makes $\frac{\partial L}{\partial \beta_i} = 0$:

$\beta_i^* = \Big(\sum_u c_{ui}\theta_u\theta_u^T + \lambda_\beta I_K\Big)^{-1} \sum_u c_{ui}\, y_{ui}\, \theta_u$

We first update $\theta = \theta^*$, then update $\beta = \beta^*$. This method is called Alternating Least Squares (ALS) and is used to find the latent factors of users and items. The algorithms introduced hereafter are trained using ALS.
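A minimal sketch of these closed-form updates on small dense arrays (Y, C, alpha, and all sizes here are hypothetical; real implementations exploit the sparsity structure described in Hu et al.):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, K, lam = 4, 5, 3, 0.1
Y = (rng.random((n_users, n_items)) < 0.4).astype(float)  # implicit feedback
C = 1.0 + 40.0 * Y   # confidence weights (here 1 + alpha * y_ui, alpha = 40)
theta = rng.normal(scale=0.1, size=(n_users, K))
beta = rng.normal(scale=0.1, size=(n_items, K))

def update_users(theta, beta):
    for u in range(n_users):
        A = (beta.T * C[u]) @ beta + lam * np.eye(K)  # sum_i c_ui b_i b_i^T + lam I
        b = beta.T @ (C[u] * Y[u])                    # sum_i c_ui y_ui b_i
        theta[u] = np.linalg.solve(A, b)

def update_items(theta, beta):
    for i in range(n_items):
        A = (theta.T * C[:, i]) @ theta + lam * np.eye(K)
        b = theta.T @ (C[:, i] * Y[:, i])
        beta[i] = np.linalg.solve(A, b)

for _ in range(20):                       # alternate until the loss stabilizes
    update_users(theta, beta)
    update_items(theta, beta)
print(np.round(theta @ beta.T, 2))        # reconstructed preference scores
```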
Sharing parameters in matrix factorization

$A = UX$, $B = UY$

We want to factorize both matrices while sharing the factor matrix U:

$Loss = |A - UX|^2 + |B - UY|^2 = \sum_{u,i \in A} \big(a_{ui} - u_u^T x_i\big)^2 + \sum_{u,j \in B} \big(b_{uj} - u_u^T y_j\big)^2$
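A minimal sketch of this joint objective with hypothetical shapes; the two reconstruction errors are coupled only through the shared factor U:

```python
import numpy as np

# A and B share their row entities (e.g. users); U is the shared factor.
rng = np.random.default_rng(0)
n, p, q, K = 4, 5, 6, 3
A, B = rng.random((n, p)), rng.random((n, q))
U = rng.normal(size=(n, K))   # shared factor matrix
X = rng.normal(size=(K, p))   # factor specific to A
Y = rng.normal(size=(K, q))   # factor specific to B

loss = ((A - U @ X) ** 2).sum() + ((B - U @ Y) ** 2).sum()
print(loss)  # ALS-style training would alternate over U, X, and Y
```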
Relation between the PMI matrix and word2vec
Neural Word Embedding as Implicit Matrix Factorization (NIPS '14)

PMI = pointwise mutual information

$PMI(i,j) = \log \frac{p(i,j)}{p(i)\,p(j)} \simeq \log \frac{\#(i,j) \cdot D}{\#(i) \cdot \#(j)}$

$PMI(i,j) > 0$ := item $i$ and item $j$ co-occur more often than they would by chance

$M^{PMI}$ := the PMI matrix, a (#items × #items) square matrix with $m_{ij}^{pmi} := PMI(i,j)$

$\#(i)$ := the number of buckets (out of all buckets) that contain i
$\#(i,j)$ := the number of buckets that contain both i and j
$D$ := the total number of buckets (e.g., the number of users in a recommender system, the number of sentences in NLP)
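A minimal sketch of building the PMI matrix (and the shifted positive variant defined below) from a tiny hypothetical dataset, where each bucket is the set of items one user has interacted with:

```python
import numpy as np

# Each bucket is the set of items one (hypothetical) user interacted with.
buckets = [{0, 2, 3}, {0, 2}, {1, 3}, {0, 2, 4}]
n_items, k = 5, 1.0          # k: negative-sampling shift (k = 1 means no shift)
D = len(buckets)             # total number of buckets

cnt = np.zeros(n_items)               # #(i)
co = np.zeros((n_items, n_items))     # #(i, j)
for b in buckets:
    for i in b:
        cnt[i] += 1
        for j in b:
            if i != j:
                co[i, j] += 1

with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(co * D / np.outer(cnt, cnt))   # log(#(i,j) * D / (#(i) #(j)))
sppmi = np.maximum(pmi - np.log(k), 0)          # shift by log k, clip below 0
sppmi[~np.isfinite(sppmi)] = 0                  # pairs that never co-occur
print(np.round(sppmi, 2))
```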
PMI matrix

- Skip-gram with negative sampling (with k negative samples) is equivalent to factorizing S, where $s_{ij} = m_{ij}^{pmi} - \log k$.
- S is called the k-shifted PMI matrix. Factorizing S:

$objective := \sum_{i,j \in items} \big(s_{ij} - \theta_i^T \beta_j\big)^2$

After solving this objective we obtain item latent representations $\theta_i$ that are, in expectation, equal to those trained with word2vec.

- k-shifted positive PMI matrix $S^{Positive}$:

$s_{ij}^{positive} = \max\big(0, \; m_{ij}^{pmi} - \log k\big)$

$objective := \sum_{i,j \in items,\; s_{ij} \neq 0} \big(s_{ij}^{positive} - \theta_i^T \beta_j\big)^2$
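Levy and Goldberg obtain embeddings from the symmetric SPPMI matrix via truncated SVD; a minimal sketch, assuming the sppmi array from the snippet above and a hypothetical embedding dimension K:

```python
import numpy as np

K = 2                                     # embedding dimension (hypothetical)
U, s, Vt = np.linalg.svd(sppmi)           # sppmi from the previous snippet
theta = U[:, :K] * np.sqrt(s[:K])         # item vectors (left factors)
beta = Vt[:K].T * np.sqrt(s[:K])          # context vectors (right factors)
print(np.round(theta @ beta.T, 2))        # rank-K reconstruction of sppmi
```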
Cofactor (Liang et al. '16)

($R$, $y_{ui}$, $r_{ui}$, $c_{ui}$, $\lambda_\omega$, $\theta_u$, and $\beta_i$ are as defined earlier; $\beta_i$ is shared by both factorizations.)

Embedding parameters:
$\#(i)$ | The number of users who have interacted with item $i$
$\#(i,j)$ | The number of users who have interacted with both item $i$ and item $j$
$S^K$ | k-shifted positive pointwise mutual information matrix | $s_{ij}^K = \max\big(\log\frac{\#(i,j)\,|U|}{\#(i)\,\#(j)} - \log k, \; 0\big)$
$\gamma_j$ | Latent vector for item $j$ in the PMI factorization (right factor matrix)
$w_i, c_j$ | Bias terms for item $i$ and context item $j$ in the PMI factorization

Loss function for the Cofactor model:

$L_{CO} = \text{MF Term} + \text{Embedding Term} + \text{Regularization Term}$

$\text{MF Term} = \sum_{(u,i)\in R} c_{ui}\big(y_{ui} - \theta_u^T\beta_i\big)^2$

$\text{Embedding Term} = \sum_{(i,j)\in S} \big(s_{ij}^K - (\beta_i^T\gamma_j + w_i + c_j)\big)^2$

$\text{Reg Term} = \lambda_\theta \sum_u \|\theta_u\|_2^2 + \lambda_\beta \sum_i \|\beta_i\|_2^2 + \lambda_\gamma \sum_j \|\gamma_j\|_2^2$

- The item latent vector $\beta_i$ is shared between the two terms.
- Factorizing the PMI matrix regularizes the MF term.
- Five parameter sets, $\beta, \theta, \gamma, w,$ and $c$, are trained using ALS.
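A minimal sketch of evaluating $L_{CO}$, reusing the hypothetical Y, C arrays from the WMF sketch and the sppmi matrix from the PMI sketch (all factor matrices below are random initializations, not trained values):

```python
import numpy as np

# Assumes Y, C (4 users x 5 items) from the WMF sketch and sppmi (5 x 5)
# from the PMI sketch are in scope.
rng = np.random.default_rng(1)
n_users, n_items, K, lam = 4, 5, 3, 0.1
theta = rng.normal(scale=0.1, size=(n_users, K))   # user factors
beta = rng.normal(scale=0.1, size=(n_items, K))    # item factors (shared)
gamma = rng.normal(scale=0.1, size=(n_items, K))   # context-item factors
w, c = np.zeros(n_items), np.zeros(n_items)        # item / context biases

mf_term = (C * (Y - theta @ beta.T) ** 2).sum()
pred = beta @ gamma.T + w[:, None] + c[None, :]    # beta_i^T gamma_j + w_i + c_j
emb_term = (((sppmi - pred) ** 2)[sppmi != 0]).sum()
reg = lam * ((theta**2).sum() + (beta**2).sum() + (gamma**2).sum())
print(mf_term + emb_term + reg)                    # L_CO
# CEMF keeps the same structure but ties gamma to beta (and drops w, c),
# exploiting the symmetry of the SPPMI matrix.
```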
CEMF (Nguyen et al. '17)

(Parameters are as in the Cofactor model, except that there are no separate context factors: the embedding term reuses $\beta$.)

Loss function for the CEMF model:

$L_{CEMF} = \text{MF Term} + \text{Embedding Term} + \text{Regularization Term}$

$\text{MF Term} = \sum_{(u,i)\in R} c_{ui}\big(y_{ui} - \theta_u^T\beta_i\big)^2$

$\text{Embedding Term} = \sum_{(i,j)\in S} \big(s_{ij}^K - \beta_i^T\beta_j\big)^2$

$\text{Reg Term} = \lambda_\theta \sum_u \|\theta_u\|_2^2 + \lambda_\beta \sum_i \|\beta_i\|_2^2$

- Simplifies the Cofactor model by using the fact that $S^K$ is symmetric (it can be factorized as $S^K = Y^T Y$ with a single factor matrix Y).
- Simpler than the Cofactor model -> better generalization and more efficient training.
- Two parameter sets, $\beta$ and $\theta$, are trained using ALS.
Experimental results with these three MF algorithms

Statistical information of the datasets:

                  | ML-20M  | TasteProfile | OnlineRetail
# of users        | 138,493 | 629,113      | 3,704
# of items        | 26,308  | 98,486       | 3,643
# of interactions | 18M     | 35.5M        | 235K
Sparsity (%)      | 99.5%   | 99.94%       | 98.25%
Precision@N of WMF, Cofactor, and CEMF

Dataset      | Model    | Pre@5  | Pre@10 | Pre@20 | Pre@50 | Pre@100
ML-20M       | WMF      | 0.2176 | 0.1818 | 0.1443 | 0.0974 | 0.0677
             | Cofactor | 0.2249 | 0.1835 | 0.1416 | 0.0926 | 0.0635
             | CEMF     | 0.2369 | 0.1952 | 0.1523 | 0.1007 | 0.0690
TasteProfile | WMF      | 0.1152 | 0.0950 | 0.0755 | 0.0525 | 0.0378
             | Cofactor | 0.1076 | 0.0866 | 0.0071 | 0.0525 | 0.0378
             | CEMF     | 0.1181 | 0.0966 | 0.0760 | 0.0523 | 0.0353
OnlineRetail | WMF      | 0.0870 | 0.0713 | 0.0582 | 0.0406 | 0.0294
             | Cofactor | 0.0927 | 0.0728 | 0.0552 | 0.0381 | 0.0273
             | CEMF     | 0.0959 | 0.0779 | 0.0619 | 0.0425 | 0.0302
Recall@N of WMF, Cofactor, and CEMF (latent factor dimension: 30)

Dataset      | Model    | Recall@5 | Recall@10 | Recall@20 | Recall@50 | Recall@100
ML-20M       | WMF      | 0.2366   | 0.2601    | 0.3233    | 0.4553    | 0.5788
             | Cofactor | 0.2420   | 0.2550    | 0.3022    | 0.4101    | 0.5194
             | CEMF     | 0.2563   | 0.2750    | 0.3331    | 0.4605    | 0.5896
TasteProfile | WMF      | 0.1186   | 0.1148    | 0.1377    | 0.2129    | 0.2960
             | Cofactor | 0.1106   | 0.1060    | 0.1256    | 0.1947    | 0.2741
             | CEMF     | 0.1215   | 0.1159    | 0.1369    | 0.2092    | 0.2891
OnlineRetail | WMF      | 0.1142   | 0.1463    | 0.2136    | 0.3428    | 0.4638
             | Cofactor | 0.1160   | 0.1384    | 0.1891    | 0.3020    | 0.4159
             | CEMF     | 0.1232   | 0.1550    | 0.2191    | 0.3466    | 0.4676
Performance by user interaction count

- low: users with 20 or fewer interactions
- medium: users with between 20 and 100 interactions
- high: users with 100 or more interactions

Models that factorize the PMI matrix, such as Cofactor and CEMF, make more accurate predictions for users with many interactions. (This could probably be controlled by regularizing the embedding term as a whole.)

[Figure: change in precision and recall by user interaction count on the ML-20M dataset.]
CF with Implicit and Explicit Feedback data (Nguyen et al. '17)

($R$, $r_{ui}$, $\lambda_\omega$, $\theta_u$, and $\beta_i$ are as defined earlier.)

MF parameters:
$\mu$ | Global bias (used in factorizing the rating matrix)
$b_u$ | Bias for user $u$ (used in factorizing the rating matrix)
$b_i$ | Bias for item $i$ (used in factorizing the rating matrix)

Embedding parameters:
$\rho_i$ | Latent factor for item $i$ in the PMI factorization (left factor matrix)
$\gamma_j$ | Latent vector for item $j$ in the PMI factorization (right factor matrix)
$\#(i)$ | The number of users who have interacted with item $i$
$\#(i,j)$ | The number of users who have interacted with both item $i$ and item $j$
$S^K$ | k-shifted positive pointwise mutual information matrix | $s_{ij}^K = \max\big(\log\frac{\#(i,j)\,|U|}{\#(i)\,\#(j)} - \log k, \; 0\big)$

Loss function for the CFIE model:

$L_{CFIE} = \text{MF Term} + \text{Embedding Term} + \text{Regularization Term}$

$\text{MF Term} = \sum_{(u,i)\in R} \big(r_{ui} - (b_u + b_i + \mu + \theta_u^T\beta_i)\big)^2$

$\text{Embedding Term} = \lambda_{embedding} \sum_{(i,j)\in S} \big(s_{ij}^K - \rho_i^T\gamma_j\big)^2$

$\text{Reg Term} = \lambda_\theta \sum_u \|\theta_u\|_2^2 + \lambda_\beta \sum_i \|\beta_i - \rho_i\|_2^2 + \lambda_\rho \sum_i \|\rho_i\|_2^2 + \lambda_\gamma \sum_j \|\gamma_j\|_2^2 + \lambda_b \sum_u b_u^2 + \lambda_b \sum_i b_i^2$

- The parameters $\beta, \theta, \rho, \gamma, \mu, b_u,$ and $b_i$ are trained using ALS.
- Improves rating prediction using implicit feedback, which is hard to capture with linear MF.
- The effect of the embedding term is controlled with $\lambda_{embedding}$.
Evaluation results: RMSE

Methods | ML-1M RMSE | ML-20M RMSE
PMF     | 0.0995     | 0.9373
NMF     | 1.0051     | 0.9891
SVD++   | 0.9421     | 0.8721
CFIE    | 0.9004     | 0.8528

80% of the data for training (10% of the training data for validation), 20% for testing. Latent factor dimension: 30.

Statistical information of the datasets:

                  | ML-1M | ML-20M
# of users        | 6,040 | 138,493
# of items        | 3,706 | 26,308
# of interactions | 1M    | 20M
Sparsity (%)      | 95.5% | 99.5%
Editor's Notes

1. 80% training, 20% test; 10% of the training data was used for validation. ML has only explicit feedback, so…
2. 80% training, 20% test; 10% of the training data was used for validation. ML has only explicit feedback, so 20% of the data was used as explicit feedback and the rest as implicit.