Fast ALS-based matrix factorization for explicit and implicit feedback datasets
If you want to learn more, get in touch with us through http://www.gravityrd.com

Presentation Transcript

  • Fast ALS-based matrix factorization for explicit and implicit feedback datasets. István Pilászy, Dávid Zibriczky, Domonkos Tikk, Gravity R&D Ltd., www.gravityrd.com, 28 September 2010
  • Collaborative filtering
  • Problem setting (slide: a sparse user-item rating matrix with a few known ratings)
    • Ridge Regression
    • Optimal solution: w = (X^T X + λI)^(-1) X^T y
    • Ridge Regression
    • Computing the optimal solution: form X^T X and X^T y, then solve the K x K system
    • Matrix inversion is costly: O(K^3)
    • Sum of squared errors of the optimal solution: 0.055 (see the sketch below)
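A minimal numpy sketch of the closed-form RR solution above; the names ridge_regression, X, y, and lam are illustrative, not from the slides:

```python
import numpy as np

def ridge_regression(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    K = X.shape[1]
    A = X.T @ X + lam * np.eye(K)     # K x K system matrix
    b = X.T @ y
    return np.linalg.solve(A, b)      # solving the system avoids an explicit inverse

# toy usage: 6 examples, 5 features
rng = np.random.default_rng(0)
X, y = rng.normal(size=(6, 5)), rng.normal(size=6)
w = ridge_regression(X, y, lam=0.1)
print(((X @ w - y) ** 2).sum())       # sum of squared errors at the optimum
```

Solving the K x K system still costs O(K^3), which is exactly the cost the coordinate-descent variant below tries to avoid.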
    • Ridge Regression
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • Start with zero:
    • Sum of squared errors: 24.6
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • Start with zero, then optimize w1
    • Sum of squared errors: 7.5
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • Start with zero, then optimize w1, then optimize w2
    • Sum of squared errors: 6.2
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • Start with zero, then optimize w1, then w2, then w3
    • Sum of squared errors: 5.7
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • … w4
    • Sum of squared errors: 5.4
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • … w5
    • Sum of squared errors: 5.0
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • … w1 again
    • Sum of squared errors: 3.4
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • … w2 again
    • Sum of squared errors: 2.9
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • … w3 again
    • Sum of squared errors: 2.7
    • RR1: RR with coordinate descent
    • Idea: optimize only one variable at a time
    • … after a while:
    • Sum of squared errors: 0.055
    • No remarkable difference from the exact RR solution
    • Cost: n examples, e epochs (see the RR1 sketch below)
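A sketch of RR1 (coordinate descent for ridge regression) matching the walkthrough above; rr1 and its argument names are illustrative:

```python
import numpy as np

def rr1(X, y, lam, epochs=1, w=None):
    """RR1: optimize one coordinate of w at a time, keeping the others fixed."""
    n, K = X.shape
    w = np.zeros(K) if w is None else w.copy()   # start from zero, or from a warm start
    r = y - X @ w                                # residual for the current w
    for _ in range(epochs):
        for j in range(K):
            r += X[:, j] * w[j]                  # remove w_j's contribution
            w[j] = X[:, j] @ r / (X[:, j] @ X[:, j] + lam)   # 1-D ridge optimum
            r -= X[:, j] * w[j]                  # add the new contribution back
    return w
```

One epoch over n examples and K coordinates costs O(n·K); after enough epochs the sum of squared errors approaches the 0.055 of the exact solution.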
    • The rating matrix R of size (M x N) is approximated as the product of two lower-rank matrices:
    • P: user feature matrix of size (M x K)
    • Q: item (movie) feature matrix of size (N x K)
    • K: number of features
    • Matrix factorization (diagram: R approximated by the product of P and Q)
  • Matrix factorization for explicit feedback (slide: an example R with known ratings approximated by numeric P and Q matrices; see the sketch below)
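A tiny sketch of the model itself, with illustrative sizes; the predicted rating is the dot product of a user row of P and an item row of Q:

```python
import numpy as np

M, N, K = 4, 6, 2                    # users, items, features (illustrative sizes)
rng = np.random.default_rng(0)
P = rng.normal(size=(M, K))          # user feature matrix, M x K
Q = rng.normal(size=(N, K))          # item feature matrix, N x K
R_hat = P @ Q.T                      # R_hat[u, i] = p_u . q_i approximates r_ui
```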
  • Finding P and Q (slide: R with known ratings, a randomly initialized Q, and the unknown rows of P marked with question marks)
    • Init Q randomly
    • Find p1
  • Finding p1 with RR
    • Optimal solution: p1 = (X^T X + λI)^(-1) X^T y, where the rows of X are the feature vectors of the items rated by user 1 and y holds user 1's ratings
  • Finding p1 with RR (slide: the computed p1 written back into P)
    • Initialize Q randomly
    • Repeat
      • Recompute P
        • Compute p1 with RR
        • Compute p2 with RR
        • … (for each user)
      • Recompute Q
        • Compute q1 with RR
        • … (for each item)
    • Alternating Least Squares (ALS); see the sketch below
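A minimal ALS sketch, assuming explicit feedback stored as a dense M x N array R with NaN marking missing ratings; the function and parameter names are illustrative:

```python
import numpy as np

def als(R, K, lam, iters=10):
    """ALS: alternately recompute every p_u and every q_i with ridge regression."""
    M, N = R.shape
    rng = np.random.default_rng(0)
    P = np.zeros((M, K))
    Q = rng.normal(scale=0.1, size=(N, K))         # init Q randomly
    reg = lam * np.eye(K)
    for _ in range(iters):
        for u in range(M):                         # recompute P, one user at a time
            rated = ~np.isnan(R[u])
            X, y = Q[rated], R[u, rated]
            P[u] = np.linalg.solve(X.T @ X + reg, X.T @ y)
        for i in range(N):                         # recompute Q, one item at a time
            rated = ~np.isnan(R[:, i])
            X, y = P[rated], R[rated, i]
            Q[i] = np.linalg.solve(X.T @ X + reg, X.T @ y)
    return P, Q
```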
    • ALS relies on RR:
      • recomputation of the vectors with RR
        • when recomputing p1, its previously computed value is ignored
      • ALS1 relies on RR1:
        • optimize the previously computed p1, one scalar at a time
        • the previously computed value is not lost
        • run RR1 for only one epoch
      • ALS is just an approximation method.
      • Likewise ALS1.
    • ALS1: ALS with RR1 (see the sketch below)
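ALS1 keeps the same outer loop but replaces each exact RR solve with a single RR1 epoch warm-started from the previous value. A sketch of the P-update, reusing the rr1 helper from the earlier sketch (the Q-update is analogous):

```python
import numpy as np

def als1_update_P(R, P, Q, lam):
    """One ALS1 pass over the users: 1 epoch of RR1 per user, warm-started."""
    for u in range(R.shape[0]):
        rated = ~np.isnan(R[u])
        # unlike ALS, the previously computed p_u is not thrown away
        P[u] = rr1(Q[rated], R[u, rated], lam, epochs=1, w=P[u])
```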
  • Implicit feedback (slide: a binary R of 0s and 1s approximated by numeric P and Q matrices)
    • The matrix is fully specified: for every user-item pair we know whether the user watched the item.
    • Zeros are less important, but still important. Many 0s, few 1s.
    • Recall that RR only needs X^T X and X^T y.
    • Idea (Hu, Koren, Volinsky):
      • consider a user who watched nothing
      • compute X^T X and X^T y for this user (the null-user)
      • when recomputing p1, compare her to the null-user
      • based on the cached X^T X and X^T y, update them according to the differences
      • In this way, only the number of 1s affects performance, not the number of 0s
    • IALS: alternating least squares with this trick (see the sketch below).
    • Implicit feedback: IALS
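A sketch of the IALS P-update with the null-user trick. Following Hu, Koren and Volinsky, it assumes watched items get an extra confidence weight alpha on top of the weight 1 given to every cell; alpha, R01 and the function name are illustrative assumptions:

```python
import numpy as np

def ials_update_P(R01, P, Q, lam, alpha=40.0):
    """IALS P-update: cache the null-user's X^T X (= Q^T Q), then correct per user."""
    K = Q.shape[1]
    QtQ = Q.T @ Q                                # null-user's X^T X, computed once
    reg = lam * np.eye(K)
    for u in range(R01.shape[0]):
        watched = np.flatnonzero(R01[u])         # the few 1s of this user
        Qw = Q[watched]
        A = QtQ + alpha * (Qw.T @ Qw) + reg      # correction uses only watched items
        b = (1.0 + alpha) * Qw.sum(axis=0)       # null-user's X^T y is the zero vector
        P[u] = np.linalg.solve(A, b)
```

Only the 1s enter the per-user correction, so the cost no longer depends on the number of 0s.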
    • The RR1 trick cannot be applied here.
    • Implicit feedback: IALS1
    • The RR1 trick cannot be applied here.
    • But, wait…!
    • Implicit feedback: IALS1
    • X^T X is just a matrix.
    • No matter how many items we have, its dimension is the same (K x K)
    • If we are lucky, we can find K items which generate this matrix
    • What if we are unlucky? We can still create synthetic items.
    • Assume that the null-user did not watch these K items
    • X^T X and X^T y stay the same if the synthetic items are created appropriately
    • Implicit feedback: IALS1
    • Can we find a Z matrix such that
      • Z is small (K x K) and Z^T Z = X^T X?
    • We can, by eigenvalue decomposition (see the sketch below)
    • Implicit feedback: IALS1
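A sketch of building the K synthetic items: find a K x K matrix Z with Z^T Z = Q^T Q by eigenvalue decomposition (the names are illustrative):

```python
import numpy as np

def synthetic_items(Q):
    """Return Z (K x K) with Z^T Z = Q^T Q, so K rows can stand in for all N items."""
    G = Q.T @ Q                              # K x K Gram matrix over all items
    vals, vecs = np.linalg.eigh(G)           # symmetric PSD: real, non-negative spectrum
    vals = np.clip(vals, 0.0, None)          # guard against tiny negative round-off
    return np.sqrt(vals)[:, None] * vecs.T   # Z = diag(sqrt(lambda)) V^T

# sanity check: Z^T Z reproduces Q^T Q up to floating-point error
rng = np.random.default_rng(0)
Q = rng.normal(size=(1000, 10))
Z = synthetic_items(Q)
print(np.allclose(Z.T @ Z, Q.T @ Q))
```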
    • If a user watched N items, we can run RR1 with N+K examples
    • To recompute pu, we need O((N+K)·K) steps (assume 1 epoch)
    • Is it better in practice than the O(N·K^2 + K^3) of IALS? (see the sketch below)
    • Implicit feedback: IALS1
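One way to set up the N+K examples for the per-user RR1 run, consistent with the confidence-weighted objective assumed in the IALS sketch above. It reuses the rr1 and synthetic_items helpers from earlier sketches, and the re-weighting of the watched items is an assumption for illustration, not taken from the slides:

```python
import numpy as np

def ials1_user_update(p_u, Q_watched, Z, lam, alpha=40.0):
    """IALS1 per-user update: one RR1 epoch over N+K examples.
    The K rows of Z (target 0) stand in for the unwatched items; each watched
    item is rescaled so the combined least-squares objective matches IALS."""
    # weight alpha and target (1+alpha)/alpha, folded in by scaling with sqrt(alpha)
    Xw = np.sqrt(alpha) * Q_watched
    yw = np.full(len(Q_watched), (1.0 + alpha) / np.sqrt(alpha))
    X = np.vstack([Z, Xw])                            # K + N examples
    y = np.concatenate([np.zeros(Z.shape[0]), yw])
    return rr1(X, y, lam, epochs=1, w=p_u)            # warm-started, one epoch
```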
    • Evaluation of ALS vs. ALS1
    • Probe10 RMSE on Netflix Prize dataset, after 25 epochs
    • Evaluation of ALS vs. ALS1
    • Time-accuracy tradeoff
    • Evaluation of IALS vs. IALS1
    • Average Relative Position on the test subset of a proprietary implicit feedback dataset, after 20 epochs. Lower is better.
    • Evaluation of IALS vs. IALS1
    • Time-accuracy tradeoff.
  • Conclusions
    • We learned two tricks:
      • ALS1: RR1 can be used instead of RR in ALS
      • IALS1: we can create a few synthetic examples to replace the many items a user did not watch
    • ALS and IALS are approximation algorithms, so why not change them to be even more approximative?
    • ALS1 and IALS1 offer better time-accuracy tradeoffs, especially when K is large.
    • They can be 10x faster (or even 100x faster, for unrealistically large K values)
    • TODO:
    • Precision, recall, other datasets.
  • Thank you for your attention!