Fast ALS-based matrix factorization for explicit and implicit feedback datasets. István Pilászy, Dávid Zibriczky, Domonkos Tikk. Gravity R&D Ltd., www.gravityrd.com. 28 September 2010.
Collaborative filtering
Problem setting. A partially observed user-item rating matrix (example ratings shown on the slide).
Ridge Regression. Given an n×K example matrix X and targets y, find the weight vector w minimizing ||Xw - y||² + λ·||w||².
Ridge Regression. Optimal solution: w = (XᵀX + λI)⁻¹ Xᵀy.
Ridge Regression. Computing the optimal solution: matrix inversion is costly (cubic in K). Sum of squared errors of the optimal solution: 0.055.
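To make the closed-form step concrete, here is a minimal numpy sketch (my illustration, not code from the talk; X, y and lam are placeholder names) that solves the regularized normal equations rather than forming an explicit inverse:

    import numpy as np

    def ridge_closed_form(X, y, lam):
        """Exact ridge regression: solve (X^T X + lam*I) w = X^T y."""
        K = X.shape[1]
        A = X.T @ X + lam * np.eye(K)   # K x K Gramian plus regularization
        b = X.T @ y                     # K-dimensional right-hand side
        return np.linalg.solve(A, b)    # cheaper and more stable than inv(A) @ b

Building A costs about n·K² operations and the solve about K³, which is the "matrix inversion is costly" part when K is large.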
RR1: RR with coordinate descent. Idea: optimize only one variable of w at a time, starting from w = 0.
Sum of squared errors along the walkthrough: w = 0: 24.6; after optimizing w1: 7.5; then w2: 6.2; w3: 5.7; w4: 5.4; w5: 5.0; w1 again: 3.4; w2 again: 2.9; w3 again: 2.7; ... after a while: 0.055, no remarkable difference from the exact solution.
Cost: n examples, e epochs (roughly e·n·K operations).
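A minimal sketch of the RR1 idea described above (my illustration, not the authors' code): cycle through the weights and re-optimize one coordinate at a time while keeping the others fixed, maintaining the residual incrementally:

    import numpy as np

    def rr1_coordinate_descent(X, y, lam, epochs=1, w=None):
        """Ridge regression by coordinate descent: optimize one weight at a time."""
        n, K = X.shape
        w = np.zeros(K) if w is None else w.copy()   # start from zero, as on the slides
        r = y - X @ w                                # current residual
        for _ in range(epochs):
            for k in range(K):
                x_k = X[:, k]
                r += x_k * w[k]                      # remove w_k's contribution
                w[k] = x_k @ r / (x_k @ x_k + lam)   # exact solution of the 1-D ridge problem
                r -= x_k * w[k]                      # put the updated contribution back
        return w

One epoch costs on the order of n·K operations, versus roughly n·K² + K³ for the exact solution; warm-starting w across sweeps is what makes a single epoch sufficient later on.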
Matrix factorization. The rating matrix R (M×N) is approximated as the product of two lower-rank matrices, R ≈ P·Qᵀ, where P is the user feature matrix (M×K), Q is the item (movie) feature matrix (N×K), and K is the number of features.
Matrix factorization for explicit feedback. Example: a sparse rating matrix R and its approximation by the product of P and Q (numeric example shown on the slide).
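As a toy illustration (hypothetical numbers, not the ones on the slide), a predicted rating is simply the dot product of a user's row of P and an item's row of Q:

    import numpy as np

    P = np.array([[1.3, 0.8],          # user feature vectors, M x K
                  [0.7, 1.6]])
    Q = np.array([[2.1, 0.4],          # item feature vectors, N x K
                  [1.7, 1.4],
                  [0.3, 2.2]])
    R_hat = P @ Q.T                    # predicted rating matrix, M x N
    print(R_hat[0, 1])                 # prediction for user 0, item 1: p_0 . q_1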
Finding P and Q. Initialize Q randomly; then find p1 (worked example shown on the slide).
Finding p1 with RR. Optimal solution: p1 = (Q1ᵀQ1 + λI)⁻¹ Q1ᵀr1, where Q1 holds the feature vectors of the items rated by user 1 and r1 her ratings.
Finding p1 with RR. Example: the computed p1 becomes the first row of P (numbers shown on the slide).
Alternating Least Squares (ALS). Initialize Q randomly. Repeat: recompute P (compute p1 with RR, compute p2 with RR, ..., for each user); then recompute Q (compute q1 with RR, ..., for each item).
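A compact sketch of this ALS loop (illustrative only; the ratings are assumed to arrive as (user, item, value) triples, and ridge is the exact RR solve from the earlier sketch):

    import numpy as np

    def als(ratings, M, N, K, lam, sweeps=10, rng=np.random.default_rng(0)):
        """ALS for explicit feedback; ratings is a list of (user, item, value) triples."""
        P = np.zeros((M, K))
        Q = 0.1 * rng.standard_normal((N, K))          # initialize Q randomly
        by_user = [[] for _ in range(M)]
        by_item = [[] for _ in range(N)]
        for u, i, r in ratings:
            by_user[u].append((i, r))
            by_item[i].append((u, r))

        def ridge(X, y):                               # exact RR, as above
            return np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)

        for _ in range(sweeps):
            for u in range(M):                         # recompute P, one user at a time
                if by_user[u]:
                    items, r = zip(*by_user[u])
                    P[u] = ridge(Q[list(items)], np.array(r))
            for i in range(N):                         # recompute Q, one item at a time
                if by_item[i]:
                    users, r = zip(*by_item[i])
                    Q[i] = ridge(P[list(users)], np.array(r))
        return P, Q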
ALS1: ALS with RR1. ALS relies on RR: vectors are recomputed with RR, so when recomputing p1 the previously computed value is ignored. ALS1 relies on RR1: it optimizes the previously computed p1 one scalar at a time, so the previous value is not lost, and RR1 is run for only one epoch. ALS is just an approximation method; likewise ALS1.
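ALS1 keeps the same alternating structure but replaces the exact per-user solve with a single warm-started RR1 epoch; a sketch reusing the rr1_coordinate_descent function from above (the helper name and signature are mine, not the authors'):

    def als1_update_user(P, Q, u, items, ratings_u, lam):
        """One ALS1 user update: one RR1 epoch, warm-started from the current P[u]."""
        # items: indices of the items rated by user u; ratings_u: her ratings (numpy array).
        P[u] = rr1_coordinate_descent(Q[items], ratings_u, lam, epochs=1, w=P[u])

The per-user cost drops from about N_u·K² + K³ (exact RR) to about N_u·K (one RR1 epoch), and the previous value of P[u] is not thrown away.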
Implicit feedback. Example: a binary (watched / not watched) matrix R approximated by the product of P and Q (numeric example shown on the slide).
Implicit feedback: IALS. The matrix is fully specified: each user either watched or did not watch each item. Zeros are less important, but still important; there are many 0s and few 1s. Recall that the RR solution only needs XᵀX and Xᵀy. Idea (Hu, Koren, Volinsky): consider a user who watched nothing (the null user) and compute XᵀX and Xᵀy for her; when recomputing p1, compare the user to the null user and update the cached XᵀX and Xᵀy according to the differences. In this way only the number of 1s affects performance, not the number of 0s. IALS: alternating least squares with this trick.
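A sketch of the null-user trick as I read it from the slide (the confidence weights w0 for unwatched and w1 for watched items are my assumptions, not values from the talk):

    import numpy as np

    def ials_user_update(Q, watched, lam, w0=0.01, w1=1.0):
        """Recompute one user's vector; 'watched' lists the items she watched."""
        K = Q.shape[1]
        G = Q.T @ Q                          # null-user's X^T X over all items
        # (in practice G is cached once per sweep; recomputed here for brevity)
        A = w0 * G + lam * np.eye(K)
        b = np.zeros(K)                      # null-user's X^T y: she watched nothing
        for i in watched:                    # correct only where this user differs from the null user
            q = Q[i]
            A += (w1 - w0) * np.outer(q, q)  # difference in X^T X
            b += w1 * q                      # difference in X^T y (target 1 instead of 0)
        return np.linalg.solve(A, b)

Only the watched items enter the loop, so the per-user cost depends on the number of 1s, never on the number of 0s.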
Implicit feedback: IALS1. At first sight the RR1 trick cannot be applied here... but wait!
Implicit feedback: IALS1. XᵀX is just a matrix: no matter how many items we have, its dimension is the same, K×K. If we are lucky, we can find K items that generate this matrix. What if we are unlucky? We can still create synthetic items and assume that the null user did not watch these K items. XᵀX and Xᵀy are unchanged if the synthetic items are created appropriately.
Implicit feedback: IALS1. Can we find a matrix Z that is small (K×K) and satisfies ZᵀZ = XᵀX? We can, by eigenvalue decomposition.
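A minimal sketch of how such a Z can be built (w0 is the same assumed weight of unwatched items as in the IALS sketch above):

    import numpy as np

    def synthetic_items(Q, w0=0.01):
        """Return a K x K matrix Z whose rows act as K synthetic items: Z^T Z == w0 * Q^T Q."""
        G = w0 * (Q.T @ Q)                   # the null-user's X^T X
        vals, vecs = np.linalg.eigh(G)       # G is symmetric positive semi-definite
        vals = np.clip(vals, 0.0, None)      # guard against tiny negative round-off
        return np.sqrt(vals)[:, None] * vecs.T   # row j is sqrt(vals[j]) * eigenvector j, so Z^T Z == G

Since the synthetic items carry target 0 (the null user did not watch them), they contribute nothing to Xᵀy, so both XᵀX and Xᵀy of the null user are reproduced exactly.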
Implicit feedback: IALS1. If a user watched N items, we can run RR1 with N+K examples. Recomputing pu then takes on the order of (N+K)·K steps (assuming 1 epoch). Is it better in practice than the roughly N·K² + K³ steps of IALS?
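Putting the pieces together, a per-user IALS1 update can be sketched as one warm-started RR1 epoch over the N watched items plus the K synthetic items; this is my composition from the earlier sketches, not the authors' exact code, with the watched rows scaled so that the resulting normal equations match the weighted objective (same assumed weights and helpers as above):

    import numpy as np

    def ials1_user_update(P, Q, u, watched, Z, lam, w0=0.01, w1=1.0):
        """One IALS1 user update: one RR1 epoch over N watched + K synthetic examples."""
        Qw = Q[watched]                                    # N x K rows of the watched items
        Xw = np.sqrt(w1 - w0) * Qw                         # reproduces the (w1 - w0) * q q^T terms
        yw = np.full(len(watched), w1 / np.sqrt(w1 - w0))  # reproduces the w1 * q terms in X^T y
        X = np.vstack([Xw, Z])                             # (N + K) x K design matrix
        y = np.concatenate([yw, np.zeros(Z.shape[0])])     # synthetic items have target 0
        P[u] = rr1_coordinate_descent(X, y, lam, epochs=1, w=P[u])

One RR1 epoch over N+K examples costs on the order of (N+K)·K operations, which is where the comparison above comes from.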
Evaluation of ALS vs. ALS1. Probe10 RMSE on the Netflix Prize dataset, after 25 epochs.
Evaluation of ALS vs. ALS1. Time-accuracy tradeoff.
Evaluation of IALS vs. IALS1. Average Relative Position on the test subset of a proprietary implicit feedback dataset, after 20 epochs; lower is better.
Evaluation of IALS vs. IALS1. Time-accuracy tradeoff.
Conclusions. We learned two tricks: ALS1, where RR1 is used instead of RR in ALS; and IALS1, where a few synthetic examples replace the non-watching of many examples. ALS and IALS are approximation algorithms anyway, so why not make them even more approximate? ALS1 and IALS1 offer better time-accuracy tradeoffs, especially when K is large; they can be 10x faster (or even 100x faster, for unrealistically large K values). TODO: precision, recall, other datasets.
Thank you for your attention. Questions?
