2. Factor model based recommender system (Y. Hu, Y. Koren et al. '08)
Represent users and items numerically in a latent space.
EX)
Represent user U as the vector u = (0.7, 1.3, −0.5, 0.6)^T
Represent item I as the vector i = (2.05, 1.2, 2.6, 3.9)^T
Targets (what we want to predict) are computed from these numeric representations of users, items, and other content information.
Representing users and items well as vectors is the most important part!
EX)
Predicted rating of user U on item I:
r_{u,i} = dot(u, i) = u^T i = 0.7·2.05 + 1.3·1.2 − 0.5·2.6 + 0.6·3.9 = 4.035
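As a quick sanity check, the dot-product prediction above can be reproduced in a few lines of Python (the vectors are the made-up example values from this slide):

```python
# Example latent vectors from the slide (4 latent dimensions).
u = [0.7, 1.3, -0.5, 0.6]    # user U's latent vector
i = [2.05, 1.2, 2.6, 3.9]    # item I's latent vector

def predict_rating(user_vec, item_vec):
    """Predicted rating r_{u,i} = dot(u, i) = sum_k u_k * i_k."""
    return sum(uk * ik for uk, ik in zip(user_vec, item_vec))

print(predict_rating(u, i))  # ≈ 4.035
```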
3. Factor model as Matrix Completion
Explicit feedback (rating prediction)
Suppose that there are four users and five items.
The rating matrix R is defined as shown below.
From known entries such as R(u1, i3) = 3, R(u3, i2) = 2, …,
we want to predict R(u1, i1) (a missing value).
i1 i2 i3 i4 i5
u1 ? ? 3 4 ?
u2 2 ? 5 ? ?
u3 ? 1 ? ? ?
u4 2 ? 4 ? 3
4. Factor model as Matrix Completion
Explicit feedback (rating prediction)
With a few assumptions, we can make predictions on unseen interactions:
1. Users have constant preferences, and these can be represented numerically.
2. Items have features, and these can be represented numerically.
   These numeric representations are called latent vectors, features, representations, or embeddings.
3. Implicit feedback (whether an item was clicked or not) or explicit feedback (ratings on movies) can be expressed as a linear combination of user and item features.
   1. User u's preferences: u = (0.2, 0.3, −1.0, …)
   2. Item i's features: i = (−1.0, −0.3, 1.3, …)
   3. Rating on item i by user u: r_ui = Σ_k u_k·i_k = 0.2·(−1.0) + 0.3·(−0.3) + ⋯ = 4.1
5. If we find latent vectors that fit well with known ratings…
R is the known information:
R := {(u1, i3, 3), (u1, i4, 4), …, (u4, i5, 3)}
(only entries that have a rating are included in R)
If we find user vectors θ_u and item vectors β_i that fit the known data well,
in other words, vectors such that θ_u^T β_i ≃ r_ui,
then we should be able to predict the still-unknown entries.
Our objective will be:
Minimize Σ_{(u,i)∈R} (r_ui − θ_u^T β_i)² + regularization
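A minimal sketch of this objective in Python, using made-up latent vectors; the slide leaves the regularization term unspecified, so a plain L2 penalty with a single weight `lam` is assumed:

```python
# Observed ratings: only (user, item, rating) triples that exist in R.
R = [("u1", "i3", 3), ("u1", "i4", 4), ("u4", "i5", 3)]

# Hypothetical 2-dimensional latent vectors.
theta = {"u1": [1.0, 1.0], "u4": [0.5, 1.0]}                     # users
beta = {"i3": [1.5, 1.5], "i4": [2.0, 2.0], "i5": [1.0, 2.0]}    # items

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(R, theta, beta, lam=0.1):
    """Sum of squared errors over known ratings plus an L2 penalty."""
    err = sum((r - dot(theta[u], beta[i])) ** 2 for u, i, r in R)
    reg = lam * (sum(dot(v, v) for v in theta.values())
                 + sum(dot(v, v) for v in beta.values()))
    return err + reg

print(objective(R, theta, beta))
```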
i1 i2 i3 i4 i5
u1 ? ? 3 4 ?
u2 2 ? 5 ? ?
u3 ? 1 ? ? ?
u4 2 ? 4 ? 3
Rating matrix with missing cells
6. We can fill in the missing elements of the matrix
i1 i2 i3 i4 i5
u1 θ1^Tβ1 θ1^Tβ2 3 4 …
u2 2 … 5 … …
u3 ? 1 ? ? ?
u4 2 θ4^Tβ2 4 θ4^Tβ4 3
Rating matrix with missing values filled
This is how a factor model fills the empty cells of the matrix.
θ_u := latent vector of user u
β_i := latent vector of item i
Matrix completion (rating prediction)
i1 i2 i3 i4 i5
u1 ? ? 3 4 ?
u2 2 ? 5 ? ?
u3 ? 1 ? ? ?
u4 2 ? 4 ? 3
Rating matrix with missing cells
7. Summary
Parameter | Meaning | Definition
R | Interaction matrix | R contains (u, i) if user u has interacted with item i
r_ui | Explicit feedback (ratings) | r_ui ∈ {1, 2, 3, 4, 5}
MF parameters
θ_u | Latent factor for user u
β_i | Latent factor for item i
λ_ω | Regularizer for parameter ω (any)
objective := Σ_{(u,i)∈R} (r_ui − θ_u^T β_i)² + regularization
What the objective does:
Find vectors θ_u and β_i that closely approximate the nonempty cells of R.
What are the vectors θ_u and β_i used for once we have them?
Predicting the empty cells (the ratings of movies a user has not seen).
8. If we find latent vectors that fit well with known interactions…
R is the known information:
R := {(u1, i1, 0), (u1, i2, 0), (u1, i3, 1), …, (u2, i3, 1), (u2, i4, 0), …, (u4, i5, 1)}
(entries without a click are also included in R)
We only know whether each user has interacted with each item or not (1 or 0).
Given the items a user has interacted with (items with label 1),
we want to find the items (with label 0) that the user will like.
Our objective will be:
Minimize Σ_{(u,i)∈R} c_ui (r_ui − θ_u^T β_i)² + regularization
c_ui := confidence weight term
i1 i2 i3 i4 i5
u1 0 0 1 1 0
u2 1 0 1 0 0
u3 0 1 0 1 0
u4 1 0 1 0 1
Interaction matrix
9. Implicit feedback case (predicting whether a user will click an item or not)
We can recommend item i5 to user u1, item i4 to user u2, …
i1 i2 i3 i4 i5
u1 0 0 1 1 0
u2 1 0 1 0 0
u3 0 1 0 1 0
u4 1 0 1 0 1
Interaction matrix
i1 i2 i3 i4 i5
u1 0.05 0.1 1 1 0.7
u2 1 0.33 1 0.6 0
u3 0.9 1 0.02 1 0.4
u4 1 0.5 1 0.4 1
Interaction matrix (values in red are predictions)
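The recommendation rule sketched above (rank each user's unseen items by predicted score) might look like this in Python; `seen` and `predicted` hold the 1-entries and the predicted values from the two matrices on this slide:

```python
# Items each user has already interacted with (the 1-entries).
seen = {
    "u1": {"i3", "i4"},
    "u2": {"i1", "i3"},
}
# Predicted interaction scores for the unseen items (the red values).
predicted = {
    "u1": {"i1": 0.05, "i2": 0.1, "i5": 0.7},
    "u2": {"i2": 0.33, "i4": 0.6, "i5": 0.0},
}

def recommend(user):
    """Return the unseen item with the highest predicted score."""
    candidates = {i: s for i, s in predicted[user].items() if i not in seen[user]}
    return max(candidates, key=candidates.get)

print(recommend("u1"))  # i5
print(recommend("u2"))  # i4
```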
10. Summary
objective := Σ_{(u,i)∈R} c_ui (r_ui − θ_u^T β_i)² + regularization
What the objective does:
Find vectors θ_u and β_i that closely approximate R.
What are the vectors θ_u and β_i used for once we have them?
Predicting the probability (between 0 and 1) that a user will watch a movie they have not seen.
Parameter | Meaning | Definition
R | Interaction matrix | R contains (u, i) if user u has interacted with item i
y_ui | Implicit feedback (clicks) | y_ui ∈ {0, 1}
r_ui | Explicit feedback (ratings) | r_ui ∈ {1, 2, 3, 4, 5}
c_ui | Feedback confidence (we give strong confidence to items with high ratings) | c_ui = 1 + α·r_ui
MF parameters
θ_u | Latent factor for user u
β_i | Latent factor for item i
λ_ω | Regularizer for parameter ω (any)
11. Weighted Matrix Factorization (Hu et al. '08)
Parameter | Meaning | Definition
R | Interaction matrix | R contains (u, i) if user u has interacted with item i
y_ui | Implicit feedback (clicks) | y_ui ∈ {0, 1}
r_ui | Explicit feedback (ratings) | r_ui ∈ {1, 2, 3, 4, 5}
c_ui | Feedback confidence (we give strong confidence to items with high ratings) | c_ui = 1 + α·r_ui
MF parameters
θ_u | Latent factor for user u
β_i | Latent factor for item i
λ_ω | Regularizer for parameter ω (any)
Loss function for Weighted Matrix Factorization:
L_WMF = Σ_{(u,i)∈R} c_ui (y_ui − θ_u^T β_i)² + λ_θ Σ_u ‖θ_u‖² + λ_β Σ_i ‖β_i‖²
Solving the implicit feedback problem (item recommendation).
13. Alternating Least Squares
L_WMF = Σ_{(u,i)∈R} c_ui (y_ui − θ_u^T β_i)² + λ_θ Σ_u ‖θ_u‖² + λ_β Σ_i ‖β_i‖²
The objective function has two sets of parameters, θ and β.
We can view the objective as a function of θ alone, or of β alone;
we can then find the θ* and β* that minimize L_WMF(θ) and L_WMF(β), respectively.
The θ_u* that makes ∂L/∂θ_u = 0 is
θ_u* = (Σ_i c_ui β_i β_i^T + λ_θ I_K)^{−1} Σ_i c_ui y_ui β_i
The β_i* that makes ∂L/∂β_i = 0 is
β_i* = (Σ_u c_ui θ_u θ_u^T + λ_β I_K)^{−1} Σ_u c_ui y_ui θ_u
We first update θ = θ*, then update β = β*, and repeat.
This method is called Alternating Least Squares (ALS) and is used to find the latent factors of users and items.
The algorithms introduced hereafter are trained using ALS.
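The two closed-form updates above can be sketched as a small dense-matrix ALS loop with NumPy. The interaction matrix is the toy one from slide 8; the confidence rule c_ui = 1 + α·y_ui with α = 40 and a single shared regularizer `lam` are assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary interaction matrix y_ui (toy example from slide 8).
Y = np.array([[0, 0, 1, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 1]], dtype=float)
C = 1.0 + 40.0 * Y            # confidence weights c_ui (alpha = 40 assumed)
K, lam = 2, 0.1               # latent dimension, shared regularizer

n_users, n_items = Y.shape
theta = 0.1 * rng.standard_normal((n_users, K))  # user factors
beta = 0.1 * rng.standard_normal((n_items, K))   # item factors

def loss():
    sq = C * (Y - theta @ beta.T) ** 2
    return sq.sum() + lam * ((theta ** 2).sum() + (beta ** 2).sum())

def als_sweep():
    # theta_u* = (sum_i c_ui b_i b_i^T + lam I_K)^-1 sum_i c_ui y_ui b_i
    for u in range(n_users):
        A = (beta.T * C[u]) @ beta + lam * np.eye(K)
        theta[u] = np.linalg.solve(A, beta.T @ (C[u] * Y[u]))
    # beta_i* = (sum_u c_ui t_u t_u^T + lam I_K)^-1 sum_u c_ui y_ui t_u
    for i in range(n_items):
        A = (theta.T * C[:, i]) @ theta + lam * np.eye(K)
        beta[i] = np.linalg.solve(A, theta.T @ (C[:, i] * Y[:, i]))

before = loss()
for _ in range(10):
    als_sweep()
print(before, loss())
```

Each half-sweep solves its least-squares subproblem exactly while the other factor is held fixed, so the loss decreases monotonically.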
15. Relation between the PMI matrix and word2vec
Neural Word Embedding as Implicit Matrix Factorization (NIPS '14)
PMI = pointwise mutual information
PMI(i, j) = log( p(i, j) / (p(i)·p(j)) ) ≃ log( #(i, j)·|D| / (#(i)·#(j)) )
PMI(i, j) > 0 := items i and j co-occur more often than expected by chance
M^PMI := the PMI matrix, a square matrix of dimension (#items × #items)
m_ij^PMI := PMI(i, j)
#(i) := the number of buckets (out of all buckets) that contain i
#(i, j) := the number of buckets that contain both i and j
|D| := the total number of buckets (e.g., the number of users in a recommender system, the number of sentences in NLP)
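The empirical PMI estimate above is easy to compute directly from bucket counts; a minimal sketch:

```python
import math

def pmi(n_ij, n_i, n_j, n_buckets):
    """Empirical PMI(i, j) ≈ log( #(i, j) * |D| / (#(i) * #(j)) )."""
    return math.log(n_ij * n_buckets / (n_i * n_j))

# 4 of 10 buckets contain i, 5 contain j; under independence we would expect
# 4 * 5 / 10 = 2 buckets to contain both.  Exactly 2 do, so PMI is 0.
print(pmi(2, 4, 5, 10))  # 0.0
```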
16. PMI matrix
Skip-gram with negative sampling with k negative samples is equivalent to factorizing S, where s_ij = m_ij^PMI − log k.
S is called the k-shifted PMI matrix. Factorizing S:
objective := Σ_{i,j ∈ items} (s_ij − θ_i^T β_j)²
After solving this objective, we obtain item latent representations θ_i whose expected values equal those trained with word2vec.
k-shifted positive PMI matrix S^positive:
s_ij^positive = max(0, m_ij^PMI − log k)
objective := Σ_{i,j ∈ items, s_ij ≠ 0} (s_ij^positive − θ_i^T β_j)²
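Building the k-shifted positive PMI matrix from raw buckets (e.g. one bucket per user, holding the items that user interacted with) can be sketched as follows; only the nonzero entries are kept, matching the second objective above:

```python
import math
from collections import Counter
from itertools import combinations

def sppmi(buckets, k=1):
    """k-shifted positive PMI matrix from co-occurrence buckets.
    Returns a dict {(i, j): s_ij} holding only the nonzero entries."""
    D = len(buckets)
    n = Counter()        # #(i): buckets containing item i
    n_pair = Counter()   # #(i, j): buckets containing both i and j
    for bucket in buckets:
        items = sorted(set(bucket))
        n.update(items)
        n_pair.update(combinations(items, 2))
    S = {}
    for (i, j), nij in n_pair.items():
        s = math.log(nij * D / (n[i] * n[j])) - math.log(k)
        if s > 0:
            S[(i, j)] = S[(j, i)] = s  # the matrix is symmetric
    return S

S = sppmi([{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}])
print(sorted(S))
```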
17. Cofactor (Liang et al. '16)
Parameter | Meaning | Definition
R | Interaction matrix | R contains (u, i) if user u has interacted with item i
y_ui | Implicit feedback (clicks) | y_ui ∈ {0, 1}
r_ui | Explicit feedback (ratings) | r_ui ∈ {1, 2, 3, 4, 5}
c_ui | Feedback confidence (we give strong confidence to items with high ratings) | c_ui = 1 + α·r_ui
λ_ω | Regularizer for parameter ω (any)
MF parameters
θ_u | Latent factor for user u
β_i | Latent factor for item i (shared between the two factorizations)
Embedding parameters
#(i) | The number of users who have interacted with item i
#(i, j) | The number of users who have interacted with both item i and item j
S^K | k-shifted positive pointwise mutual information matrix | s_ij^K = max( log( #(i, j)·|U| / (#(i)·#(j)) ) − log k, 0 )
γ_j | Latent vector for item j in factorizing the PMI matrix (right factor matrix)
w_i, c_j | Bias terms for item i and context item j in factorizing the PMI matrix
Loss function for the Cofactor model:
L_CO = MF Term + Embedding Term + Regularization Term
MF Term = Σ_{(u,i)∈R} c_ui (y_ui − θ_u^T β_i)²
Embedding Term = Σ_{(i,j): s_ij^K ≠ 0} (s_ij^K − β_i^T γ_j − w_i − c_j)²
Reg Term = λ_θ Σ_u ‖θ_u‖₂² + λ_β Σ_i ‖β_i‖₂² + λ_γ Σ_j ‖γ_j‖₂²
The item latent vector β_i is shared between the two terms:
the MF term is regularized by factorizing the PMI matrix.
The five sets of parameters β, θ, γ, w, and c are trained using ALS.
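A toy evaluation of L_CO with NumPy, using random stand-in values for S^K and all parameters (the shapes and the confidence rule are assumptions). The point is only to show how the three terms combine, and that β appears in both the MF term and the embedding term:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, K, lam = 3, 4, 2, 0.1

Y = rng.integers(0, 2, size=(n_users, n_items)).astype(float)  # clicks y_ui
C = 1.0 + 40.0 * Y                                             # confidence c_ui
S = np.maximum(rng.standard_normal((n_items, n_items)), 0.0)   # stand-in for S^K
S = (S + S.T) / 2                                              # S^K is symmetric

theta = rng.standard_normal((n_users, K))   # user factors
beta = rng.standard_normal((n_items, K))    # item factors (shared)
gamma = rng.standard_normal((n_items, K))   # context factors
w = rng.standard_normal(n_items)            # item biases
c = rng.standard_normal(n_items)            # context biases

mf_term = (C * (Y - theta @ beta.T) ** 2).sum()
mask = S != 0                               # sum over nonzero entries of S^K only
embed_term = ((S - beta @ gamma.T - w[:, None] - c[None, :]) ** 2)[mask].sum()
reg_term = lam * ((theta ** 2).sum() + (beta ** 2).sum() + (gamma ** 2).sum())
L_CO = mf_term + embed_term + reg_term
print(L_CO)
```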
18. CEMF (Nguyen et al. '17)
Parameter | Meaning | Definition
R | Interaction matrix | R contains (u, i) if user u has interacted with item i
y_ui | Implicit feedback (clicks) | y_ui ∈ {0, 1}
r_ui | Explicit feedback (ratings) | r_ui ∈ {1, 2, 3, 4, 5}
c_ui | Feedback confidence (we give strong confidence to items with high ratings) | c_ui = 1 + α·r_ui
λ_ω | Regularizer for parameter ω (any)
MF parameters
θ_u | Latent factor for user u
β_i | Latent factor for item i
Embedding parameters
#(i) | The number of users who have interacted with item i
#(i, j) | The number of users who have interacted with both item i and item j
S^K | k-shifted positive pointwise mutual information matrix | s_ij^K = max( log( #(i, j)·|U| / (#(i)·#(j)) ) − log k, 0 )
Loss function for the CEMF model:
L_CEMF = MF Term + Embedding Term + Regularization Term
MF Term = Σ_{(u,i)∈R} c_ui (y_ui − θ_u^T β_i)²
Embedding Term = Σ_{(i,j): s_ij^K ≠ 0} (s_ij^K − β_i^T β_j)²
Reg Term = λ_θ Σ_u ‖θ_u‖₂² + λ_β Σ_i ‖β_i‖₂²
CEMF simplifies the Cofactor model by using the fact that S^K is symmetric (S^K can be factorized as S^K = Y^T Y).
Simpler than the Cofactor model → better generalization and more efficient training.
Only the two sets of parameters β and θ are trained using ALS.
19. Experimental results with these three MF algorithms
Dataset | ML-20M | TasteProfile | OnlineRetail
# of users | 138,493 | 629,113 | 3,704
# of items | 26,308 | 98,486 | 3,643
# of interactions | 18M | 35.5M | 235K
Sparsity (%) | 99.5% | 99.94% | 98.25%
Statistical information of the datasets
22. Performance as a function of users' interaction counts
- low: users with 20 or fewer interactions
- medium: users with between 20 and 100 interactions
- high: users with 100 or more interactions
Models that factorize the PMI matrix, such as Cofactor and CEMF, predict more accurately for users with many interactions.
(This could probably be controlled by regularizing the Embedding Term as a whole.)
Precision and recall as a function of users' interaction counts on the ML-20M dataset.
23. CF with Implicit and Explicit Feedback data (Nguyen et al. '17)
Parameter | Meaning | Definition
R | Interaction matrix | R contains (u, i) if user u has interacted with item i
r_ui | Explicit feedback (ratings) | r_ui ∈ {1, 2, 3, 4, 5}
λ_ω | Regularizer for parameter ω (any)
MF parameters
θ_u | Latent factor for user u
β_i | Latent factor for item i
μ | Global bias (used in factorizing the rating matrix)
b_u | Bias for user u (used in factorizing the rating matrix)
b_i | Bias for item i (used in factorizing the rating matrix)
Embedding parameters
ρ_i | Latent factor for item i in factorizing the PMI matrix (left factor matrix)
γ_j | Latent vector for item j in factorizing the PMI matrix (right factor matrix)
#(i) | The number of users who have interacted with item i
#(i, j) | The number of users who have interacted with both item i and item j
S^K | k-shifted positive pointwise mutual information matrix | s_ij^K = max( log( #(i, j)·|U| / (#(i)·#(j)) ) − log k, 0 )
Loss function for the CFIE model:
L_CFIE = MF Term + Embedding Term + Regularization Term
MF Term = Σ_{(u,i)∈R} (r_ui − (b_u + b_i + μ + θ_u^T β_i))²
Embedding Term = λ_embedding Σ_{(i,j)∈S} (s_ij^K − ρ_i^T γ_j)²
Reg Term = λ_θ Σ_u ‖θ_u‖₂² + λ_β Σ_i ‖β_i − ρ_i‖₂² + λ_ρ Σ_i ‖ρ_i‖₂² + λ_γ Σ_j ‖γ_j‖₂² + λ_b Σ_u b_u² + λ_b Σ_i b_i²
The parameters θ, β, ρ, γ, μ, b_u, and b_i are trained using ALS.
CFIE improves rating prediction by using implicit feedback, which is hard to capture with linear MF.
The effect of the embedding term is controlled by λ_embedding.
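The biased prediction inside the MF term, r̂_ui = μ + b_u + b_i + θ_u^T β_i, can be sketched as follows (all values are made up for illustration):

```python
# Hypothetical parameters: global mean, per-user and per-item biases,
# and 2-dimensional latent vectors.
mu = 3.5
b_u = {"u1": 0.2}
b_i = {"i1": -0.3}
theta = {"u1": [0.5, -0.1]}
beta = {"i1": [1.0, 2.0]}

def predict(u, i):
    """Biased MF prediction: mu + b_u + b_i + theta_u . beta_i."""
    inner = sum(a * b for a, b in zip(theta[u], beta[i]))
    return mu + b_u[u] + b_i[i] + inner

print(predict("u1", "i1"))  # ≈ 3.5 + 0.2 - 0.3 + (0.5 - 0.2) = 3.7
```

The bias terms absorb the global rating level and per-user/per-item offsets, so the latent vectors only have to model the residual interaction.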
24. Evaluation results: RMSE
Methods | ML-1M | ML-20M (RMSE)
PMF | 0.0995 | 0.9373
NMF | 1.0051 | 0.9891
SVD++ | 0.9421 | 0.8721
CFIE | 0.9004 | 0.8528
80% of the data for training (10% of the training data for validation), 20% for testing.
Latent dimension: 30
Dataset | ML-1M | ML-20M
# of users | 6,040 | 138,493
# of items | 3,706 | 26,308
# of interactions | 1M | 20M
Sparsity (%) | 95.5% | 99.5%
Statistical information of the datasets
Editor's Notes
80% training, 20% test; 10% of the training data is used for validation.
MovieLens only has explicit feedback, so 20% of the data was used as explicit feedback and the rest as implicit.