
Intro to Factorization Machines

Introduction to Factorization Machines, a talk given at St. Petersburg Data Science Meetup #2, May 29th, 2015.


  1. Factorization Machines. St. Petersburg Data Science Meetup, May 29th, 2015. @facultyofwonder, Rutarget/Segmento
  2. Credits: https://twitter.com/ejlbell/status/559772240544563201
  3. Q: What, say, 3 recent papers in machine learning do you think will be influential to directing the cutting edge of research these days? Peter Norvig: "I've never been able to pick lasting papers in the past, so don't trust me now, but here are a few: ● Rendle's "Factorization Machines" ● Wang et al., "Bayesian optimization in high dimensions via random embeddings" ● Dean et al., "Fast, Accurate Detection of 100,000 Object Classes on a Single Machine"" (http://blog.teamleada.com/2014/08/ask-peter-norvig/)
  4. Criteo Dataset: http://labs.criteo.com/downloads/download-terabyte-click-logs/
  5. Data: lots of categorical features, sparse settings, pairwise interactions. Hashing trick?
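
The hashing trick maps high-cardinality categorical values into a fixed-size index space, so no vocabulary has to be stored. A minimal sketch (the `hash_features` helper and the bucket count are illustrative, not from the slides):

```python
# Hashing trick: (field, value) pairs -> indices into a fixed-size sparse vector.
import hashlib

def hash_features(pairs, n_buckets=2**20):
    """Return one index per categorical (field, value) pair."""
    indices = []
    for field, value in pairs:
        key = f"{field}={value}".encode("utf-8")
        digest = int(hashlib.md5(key).hexdigest(), 16)
        indices.append(digest % n_buckets)
    return indices

# One click-log row becomes a sparse binary vector with x[i] = 1 at these indices.
print(hash_features([("user", "Alice"), ("movie", "Titanic"), ("month", "2010-1")]))
```

Collisions are possible but rare for a large enough bucket count, and the dimensionality stays fixed no matter how many distinct values appear.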
  6. Linear model
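
For p input features, the linear model on this slide is presumably the standard one:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{p} w_i x_i
```

It is cheap and robust in sparse settings, but it cannot capture interactions between features.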
  7. Polynomial features: independent interactions
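
The degree-2 polynomial model adds one weight per feature pair:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{p} w_i x_i
  + \sum_{i=1}^{p} \sum_{j=i+1}^{p} w_{ij}\, x_i x_j
```

Each w_{ij} is estimated independently, which means O(p²) parameters, and in sparse data most pairs (i, j) never co-occur, so their weights cannot be estimated at all.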
  8. Factorization: breaking the independence of the interaction parameters
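
Factorization Machines (Rendle, 2010) replace each independent w_{ij} with the inner product of k-dimensional factor vectors, so all interactions involving feature i share the same v_i:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{p} w_i x_i
  + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f}\, v_{j,f}
```

A pair that never co-occurs in training can still get a sensible interaction weight, because v_i and v_j are learned from all the other pairs they do appear in.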
  9. Example. U = {Alice (A), Bob (B), Charlie (C), ...}; I = {Titanic (TI), Notting Hill (NH), Star Wars (SW), Star Trek (ST), ...}
  10. Example. Observed (user, movie, time, rating) tuples: {(A, TI, 2010-1, 5), (A, NH, 2010-2, 3), (A, SW, 2010-4, 1), (B, SW, 2009-5, 4), (B, ST, 2009-8, 5), (C, TI, 2009-9, 1), (C, SW, 2009-12, 5)}. How do we predict the rating for the interaction between Alice and Star Trek, a pair that never occurs in the data? With independent interaction weights it would be zero. With factorization: B-SW and C-SW are similar, A and C are different, and ST and SW are similar, so the A-SW and A-ST interactions come out similar.
  11. Complexity. Number of parameters: 1 + p + k·p, linear in the input size p and the factorization size k.
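
Prediction is linear-time as well: the pairwise sum can be rewritten (Rendle, 2010) so that no explicit loop over pairs is needed:

```latex
\sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j
= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{p} v_{i,f}\, x_i \right)^{2}
  - \sum_{i=1}^{p} v_{i,f}^{2}\, x_i^{2} \right]
```

This is O(k·p), and under sparsity only the nonzero x_i contribute, giving O(k·N_z(x)).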
  12. Regularization. Many parameters, prone to overfitting; use L2 regularization.
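
A common form of the L2-regularized objective, as used in libFM (λ_θ is a per-parameter or per-group regularization strength):

```latex
\min_{\Theta}\ \sum_{(\mathbf{x},\, y) \in S} \ell\big(\hat{y}(\mathbf{x}),\, y\big)
  + \sum_{\theta \in \Theta} \lambda_{\theta}\, \theta^{2}
```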
  13. LR + FM + SGD
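
Training with SGD only needs the gradient of the model output with respect to each parameter (from Rendle, 2010); the logistic-loss part is the usual one:

```latex
\frac{\partial \hat{y}(\mathbf{x})}{\partial \theta} =
\begin{cases}
1 & \text{if } \theta = w_0 \\[2pt]
x_i & \text{if } \theta = w_i \\[2pt]
x_i \sum_{j=1}^{p} v_{j,f}\, x_j - v_{i,f}\, x_i^{2} & \text{if } \theta = v_{i,f}
\end{cases}
```

The inner sum over j does not depend on i, so it can be computed once per example and reused for every v_{i,f} update.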
  14. Learning
  15. Learning: bit.ly/pyfm_demo
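
The demo link presumably points to a pyFM notebook; a minimal sketch in the style of pyFM's README (the toy data and hyperparameter values here are assumptions, not the demo's):

```python
# Fit a small FM with pyFM: one-hot encode categorical dicts, then train with SGD.
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from pyfm import pylibfm

train = [
    {"user": "A", "item": "TI", "month": "2010-1"},
    {"user": "A", "item": "NH", "month": "2010-2"},
    {"user": "B", "item": "SW", "month": "2009-5"},
    {"user": "C", "item": "SW", "month": "2009-12"},
]
y_train = np.array([5.0, 3.0, 4.0, 5.0])

v = DictVectorizer()
X_train = v.fit_transform(train)  # sparse one-hot design matrix

fm = pylibfm.FM(num_factors=10, num_iter=50, task="regression",
                initial_learning_rate=0.001, learning_rate_schedule="optimal")
fm.fit(X_train, y_train)

# Predict a rating for a pair never seen together in training (cf. slide 10).
print(fm.predict(v.transform([{"user": "A", "item": "ST", "month": "2010-5"}])))
```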
  16. Hyperparameters: number of factors, regularization, learning rate, initial weights.
  17. FFM: ideas. Features can be grouped into fields: users, movies, context, SSPs, publishers, whatever. Better to use this information: one factor vector per field.
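
In a Field-aware Factorization Machine, feature i keeps one factor vector per field, and the interaction between i and j uses i's vector for j's field f(j), and vice versa:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{p} w_i x_i
  + \sum_{i=1}^{p} \sum_{j=i+1}^{p}
    \langle \mathbf{v}_{i,\, f(j)},\ \mathbf{v}_{j,\, f(i)} \rangle\, x_i x_j
```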
  18. Summary. Factorized interactions; high sparsity is OK for parameter estimation.
  19. Papers, papers: bit.ly/factorization_machines_2010, bit.ly/libfm, bit.ly/field_aware_FM
