Matrix Factorizations for Recommender Systems
Dmitriy Selivanov
2017-08-26
Recommender systems are everywhere
[Figures 1-4]
Goals
Personalized offers
recommended items for a customer given history of activities
(transactions, browsing history, favourites)
Similar items
substitutions
frequently bought together
. . .
Exploration
Live demo
http://94.204.253.34/reco-playlist/
http://94.204.253.34/reco-similar-artists/
Main approaches
Content based
good for cold start
not personalized
Collaborative filtering
vanilla collaborative filtering
matrix factorizations
. . .
Hybrid and context aware recommender systems
best of both worlds
Collaborative filtering
Trivial algorithm:
1. take customers who also bought item i0
2. check other items they’ve bought - i1, i2, ...
3. calculate similarity with other items sim(i0, i1), sim(i0, i2), . . .
just frequency
similarity of the descriptions
correlation
. . .
4. sort by similarity (a short code sketch follows below)
Cons:
recommendations are trivial - usually most popular items
not personalized
cold start - how to recommend new items?
need to keep and work on whole matrix
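A minimal sketch of steps 1-3 using cosine similarity on a toy binary purchase matrix (an assumed illustration, not code from the talk):

```r
# rows = users, columns = items (i0..i3); 1 = purchased
R = matrix(c(1, 1, 0, 0,
             1, 0, 1, 0,
             1, 1, 1, 0,
             0, 0, 1, 1,
             1, 1, 0, 1), nrow = 5, byrow = TRUE)
colnames(R) = paste0("i", 0:3)

co = crossprod(R)                      # item-item co-occurrence counts
norms = sqrt(diag(co))
sim = co / outer(norms, norms)         # cosine similarity between items
sort(sim["i0", -1], decreasing = TRUE) # items most similar to i0
```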
User-based collaborative filtering
1. for a user u0 calculate sim(u0, U) and take top K
2. aggregate their opinions about items
weighted sum of their ratings
Cons:
cold start
nothing to recommend to new/untypical users
need to keep and work on whole matrix
Item-based collaborative filtering
1. for an item i0 calculate sim(i0, I) and take top K
2. show most similar items
Cons:
not personalized
cold start
Latent methods
Users can be described by a small number of latent factors $p_{uk}$
Items can be described by a small number of latent factors $q_{ki}$
Netflix prize
Explicit feedback - rating prediction
~ 480k users, 18k movies, 100m ratings
sparsity ~ 99%
goal is to reduce RMSE by 10% - from 0.9514 to 0.8563
$RMSE^2 = \frac{1}{|D|} \sum_{(u,i) \in D} (r_{ui} - \hat{r}_{ui})^2$
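For reference, a one-line helper matching this definition (it assumes vectors of observed ratings and their predictions):

```r
# RMSE over the observed (rating, prediction) pairs
rmse = function(r, r_hat) sqrt(mean((r - r_hat)^2))
rmse(c(4, 3, 5), c(3.8, 3.4, 4.1))
```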
Sparse data
[Figure: sparse users × items rating matrix]
Low rank matrix factorization
$R = P \cdot Q$, where $P$ is a (users × factors) matrix and $Q$ is a (factors × items) matrix
Reconstruction
[Figure: the users × items matrix and its reconstruction from the factors]
SVD
For any matrix $X \in \mathbb{R}^{m \times n}$:
$X = U S V^T$
$U$, $V$ - columns are orthonormal bases (dot product of any 2 columns is zero, unit norm)
$S$ - diagonal matrix with singular values on the diagonal
Truncated SVD
Take the k largest singular values:
$X \approx U_k S_k V_k^T$
Truncated SVD is the best rank-k approximation of the matrix X in terms of the Frobenius norm $\|X - U_k S_k V_k^T\|_F$
$P = U_k \sqrt{S_k}$, $Q = \sqrt{S_k} V_k^T$
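A small base-R sketch of this factorization (it assumes a dense ratings matrix with zeros for missing entries, which is exactly the issue discussed on the next slide):

```r
# P and Q from a truncated SVD of a dense matrix R with k factors
truncated_svd_factors = function(R, k) {
  s = svd(R, nu = k, nv = k)           # base R SVD
  sqrt_S = diag(sqrt(s$d[1:k]), k, k)  # square roots of top-k singular values
  P = s$u %*% sqrt_S                   # users x factors
  Q = sqrt_S %*% t(s$v)                # factors x items
  list(P = P, Q = Q)
}

# usage on a toy matrix
set.seed(1)
R = matrix(sample(0:5, 20, replace = TRUE), nrow = 4)
f = truncated_svd_factors(R, k = 2)
R_hat = f$P %*% f$Q  # rank-2 reconstruction
```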
Issue with truncated SVD
Optimal in terms of the Frobenius norm - takes the zeros in place of missing ratings into account:
$RMSE^2 = \frac{1}{|users| \cdot |items|} \sum_{u \in users,\, i \in items} (r_{ui} - \hat{r}_{ui})^2$
Overfits the data.
Our goal is the error only on "observed" ratings:
$RMSE^2 = \frac{1}{|Observed|} \sum_{(u,i) \in Observed} (r_{ui} - \hat{r}_{ui})^2$
SVD-like matrix factorization
$J = \sum_{(u,i) \in Observed} (r_{ui} - p_u \cdot q_i)^2 + \lambda (\|P\|_F^2 + \|Q\|_F^2)$
Non-convex - hard to optimize, but SGD and ALS work well in practice
Alternating Least Squares
Fix $Q$; for each user $u$ solve
$\min_{p_u} \sum_{i \in Observed(u)} (r_{ui} - p_u \cdot q_i)^2 + \lambda \sum_{j=1}^{k} p_{uj}^2$
Fix $P$; for each item $i$ solve
$\min_{q_i} \sum_{u \in Observed(i)} (r_{ui} - p_u \cdot q_i)^2 + \lambda \sum_{j=1}^{k} q_{ij}^2$
Each step is ridge regression: $p_u = (Q^T Q + \lambda I)^{-1} Q^T r_u$, $q_i = (P^T P + \lambda I)^{-1} P^T r_i$
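A minimal dense-matrix ALS sketch for this objective (assumptions not in the slides: NA marks missing ratings, a fixed number of sweeps; illustrative only, real data would be sparse):

```r
als_explicit = function(R, k = 10, lambda = 0.1, n_iter = 10) {
  n_u = nrow(R); n_i = ncol(R)
  P = matrix(rnorm(n_u * k, sd = 0.01), n_u, k)  # users x factors
  Q = matrix(rnorm(n_i * k, sd = 0.01), n_i, k)  # items x factors
  for (iter in seq_len(n_iter)) {
    # fix Q, solve a ridge regression for every user
    for (u in seq_len(n_u)) {
      obs = which(!is.na(R[u, ]))
      if (length(obs) == 0) next
      Qo = Q[obs, , drop = FALSE]
      P[u, ] = solve(crossprod(Qo) + lambda * diag(k), crossprod(Qo, R[u, obs]))
    }
    # fix P, solve a ridge regression for every item
    for (i in seq_len(n_i)) {
      obs = which(!is.na(R[, i]))
      if (length(obs) == 0) next
      Po = P[obs, , drop = FALSE]
      Q[i, ] = solve(crossprod(Po) + lambda * diag(k), crossprod(Po, R[obs, i]))
    }
  }
  list(P = P, Q = Q)
}
```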
Types of feedback
Explicit
Ratings, likes/dislikes, purchases
cleaner data
smaller
hard to collect
Implicit
Browsing, clicks, purchases, . . .
dirty data
larger datasets
generally gives better results
Implicit feedback
missing entries in the matrix are a mix of negative and positive preferences
consider them negative with low confidence
observed entries are positive preferences
and should have high confidence
Model - "Collaborative Filtering for Implicit Feedback Datasets" (Hu, Koren, Volinsky, 2008)
Preferences: $P_{ui} = 1$ if $R_{ui} > 0$, $0$ otherwise
Confidence: $C_{ui} = 1 + f(R_{ui})$
Objective:
$J = \sum_{u} \sum_{i} C_{ui} (P_{ui} - x_u^T y_i)^2 + \lambda (\|X\|_F^2 + \|Y\|_F^2)$
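A tiny sketch of how preferences and confidences could be built from implicit counts ($f(R) = \alpha R$ and the value of $\alpha$ are assumptions, not fixed by the slides):

```r
# toy dense matrix of implicit counts; real data would use a sparse Matrix
R = matrix(c(5, 0, 0,
             0, 2, 0,
             1, 0, 7), nrow = 3, byrow = TRUE)
P = 1 * (R > 0)      # binary preferences
alpha = 40           # assumed confidence scaling, f(R) = alpha * R
C = 1 + alpha * R    # confidence; equals 1 for unobserved entries
# note: C is conceptually dense; implementations keep (C - 1) sparse instead
```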
Alternating Least Squares for implicit feedback
For fixed $Y$:
$\frac{dL}{dx_u} = -2 \sum_{i} c_{ui} (p_{ui} - x_u^T y_i) y_i + 2 \lambda x_u = -2 Y^T C^u p(u) + 2 Y^T C^u Y x_u + 2 \lambda x_u$
Setting $\frac{dL}{dx_u} = 0$ for the optimal solution gives
$(Y^T C^u Y + \lambda I) x_u = Y^T C^u p(u)$
so $x_u$ can be obtained by solving a system of linear equations:
$x_u = solve(Y^T C^u Y + \lambda I,\ Y^T C^u p(u))$
Alternating Least Squares for implicit feedback
Similarly, for fixed $X$:
$\frac{dL}{dy_i} = -2 X^T C^i p(i) + 2 X^T C^i X y_i + 2 \lambda y_i$
$y_i = solve(X^T C^i X + \lambda I,\ X^T C^i p(i))$
Another optimization:
$X^T C^i X = X^T X + X^T (C^i - I) X$
$Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y$
$X^T X$ and $Y^T Y$ can be precomputed
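A dense-matrix sketch of the whole procedure following the equations above (assumptions: $C = 1 + \alpha R$, a fixed number of sweeps; production implementations work on sparse matrices and parallelize over users and items):

```r
als_implicit = function(R, k = 10, lambda = 0.1, alpha = 40, n_iter = 10) {
  n_u = nrow(R); n_i = ncol(R)
  P = 1 * (R > 0)                      # binary preference matrix
  C = 1 + alpha * R                    # confidence matrix
  X = matrix(rnorm(n_u * k, sd = 0.01), n_u, k)  # user factors
  Y = matrix(rnorm(n_i * k, sd = 0.01), n_i, k)  # item factors
  for (iter in seq_len(n_iter)) {
    YtY = crossprod(Y)                 # precompute Y^T Y once per sweep
    for (u in seq_len(n_u)) {
      cu = C[u, ]
      # Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y, reusing the precomputed Y^T Y
      A = YtY + crossprod(Y, (cu - 1) * Y) + lambda * diag(k)
      b = crossprod(Y, cu * P[u, ])    # Y^T C^u p(u)
      X[u, ] = solve(A, b)
    }
    XtX = crossprod(X)                 # precompute X^T X once per sweep
    for (i in seq_len(n_i)) {
      ci = C[, i]
      A = XtX + crossprod(X, (ci - 1) * X) + lambda * diag(k)
      b = crossprod(X, ci * P[, i])    # X^T C^i p(i)
      Y[i, ] = solve(A, b)
    }
  }
  list(X = X, Y = Y)
}
```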
Evaluation
We only care about producing a small number of highly relevant items
RMSE is not the best measure!
MAP@K - Mean Average Precision
$AveragePrecision = \frac{\sum_{k=1}^{n} P(k) \times rel(k)}{\text{number of relevant documents}}$
## index relevant precision_at_k
## 1: 1 0 0.0000000
## 2: 2 0 0.0000000
## 3: 3 1 0.3333333
## 4: 4 0 0.2500000
## 5: 5 0 0.2000000
map@5 = 0.1566667
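The table above can be reproduced with a short snippet (an assumed reconstruction, not the original code; note that here map@5 averages precision@k over all K positions for a single user):

```r
library(data.table)
relevant = c(0, 0, 1, 0, 0)  # which of the top-5 recommendations were relevant
dt = data.table(index = 1:5,
                relevant = relevant,
                precision_at_k = cumsum(relevant) / seq_along(relevant))
dt
# map@5: mean precision@k over the 5 positions, averaged over users
mean(dt$precision_at_k)  # 0.1566667
```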
Evaluation
NDCG@K - Normalized Discounted Cumulative Gain
Intuition is the same as for MAP@K, but it also takes the graded value of relevance into account.
$DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$
$nDCG_p = \frac{DCG_p}{IDCG_p}$
$IDCG_p = \sum_{i=1}^{|REL|} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$
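A small helper implementing these formulas for one user (a sketch, not from the slides; `rel` holds graded relevance of the recommended items in ranked order):

```r
dcg_at_k = function(rel, k = length(rel)) {
  rel = rel[seq_len(min(k, length(rel)))]
  sum((2^rel - 1) / log2(seq_along(rel) + 1))
}
ndcg_at_k = function(rel, k = length(rel)) {
  ideal = sort(rel, decreasing = TRUE)  # best possible ordering
  dcg_at_k(rel, k) / dcg_at_k(ideal, k)
}
ndcg_at_k(c(0, 2, 1, 0, 3), k = 5)
```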
Questions?
http://dsnotes.com/tags/recommender-systems/
http://94.204.253.34/reco-playlist/
http://94.204.253.34/reco-similar-artists/
Contacts:
selivanov.dmitriy@gmail.com
https://github.com/dselivanov
