Recommendation as Classification Max Lin @m4xl1n NYC Predictive Analytics Meetup March 2...

Max Lin presented at the NYC Predictive Analytics meetup on his submission to the R recommendation engine competition at Kaggle.

Published in: Technology

  1. Recommendation as Classification
     Max Lin, @m4xl1n, NYC Predictive Analytics Meetup, March 2011
  2. R Recommendation Engine Competition
     • Kaggle.com
     • http://www.kaggle.com/R
     • “Record Me Men” placed 2nd with AUC 0.9832
       • < 0.9881, > 0.9812
  3. Recommendation as Classification
     • Input: (User, Package)
     • Output: Recommend the package or not
     • Recommend ≈ Package is installed by User
  4. Classification
     • Features
     • Classifier training algorithms
     • Training: Minimize loss + regularizers: $J(\theta) = \sum_i L(y_i, f(u_i, p_i; \theta)) + \lambda R(\theta)$
     • Stochastic gradient descent
     • Choose parameters by cross validation
  5. Classification Models
     • Model 1: Baseline
     • Model 2: Latent factor models
     • Model 3: Package LDA topic
     • Model 4: Package task view
     • Ensemble Learning
  6. M1: Baseline
     • Provided by the contest organizer
     • Strong baseline: AUC of ~0.94
     • 7 package features + User factors
     • Logistic Regression
  7. M2: Factor Models
     • Features: user factors, package factors, latent user and package factors
     • Classifier: $f(u, p) = \mu + \mu_u + \mu_p + \beta_u^T \beta_p$
     • Minimize exponential loss + L2 regularizers
  8. Model Expressiveness
  9. M3: Package LDA topic
     • Features: user factors, package factors, package LDA topics
     • Classifier: Similar to M2: $f(u, v) = \mu + \mu_u + \mu_p + t_u$
  10. M4: Package task view
     • Features: user factors, package factors, package task views
       • e.g., high-performance computing
     • Classifier: Similar to M3: $f(u, v) = \mu + \mu_u + \mu_p + t_u$
  11. Ensemble Learning
     • Combine predictions from individual models
     • Logistic Regression
  12. Code & More
     • Github https://github.com/m4xl1n
     • Python + R
     • Blog post: http://bit.ly/hWmQyM
  13. Lessons
     • Features, Features, Features
       • User factors, package factors
     • Data cleaning
     • Domain knowledge
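
The M2 slide gives the factor model's form, and the Classification slide says training minimizes loss plus regularizers with stochastic gradient descent. The following is a minimal sketch of that combination, not the code from the talk. It assumes labels y in {-1, +1}, plain SGD on the exponential loss exp(-y f(u, p)) with L2 penalties, and a tiny made-up install matrix. The function names, the dictionary layout, the hyperparameters, and the cap on the exponent are all illustrative choices.

```python
import math
import random


def score(params, u, p):
    """f(u, p) = mu + mu_u + mu_p + beta_u^T beta_p."""
    dot = sum(a * b for a, b in zip(params["beta_u"][u], params["beta_p"][p]))
    return params["mu"] + params["mu_u"][u] + params["mu_p"][p] + dot


def mean_exp_loss(params, samples):
    """Average exponential loss exp(-y * f(u, p)) over the samples."""
    return sum(math.exp(-y * score(params, u, p)) for u, p, y in samples) / len(samples)


def init_params(n_users, n_packages, k, rng):
    """Zero biases and small random latent factors (an assumed initialization)."""
    return {
        "mu": 0.0,
        "mu_u": [0.0] * n_users,
        "mu_p": [0.0] * n_packages,
        "beta_u": [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)],
        "beta_p": [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_packages)],
    }


def train(samples, n_users, n_packages, k=2, lr=0.05, lam=0.01, epochs=200, seed=0):
    """SGD on exponential loss; L2 penalty on user/package biases and factors."""
    rng = random.Random(seed)
    params = init_params(n_users, n_packages, k, rng)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, p, y in data:
            # dL/df for L = exp(-y f); the exponent is capped (assumption)
            # so a badly misclassified sample cannot cause a huge step.
            g = -y * math.exp(min(-y * score(params, u, p), 3.0))
            bu, bp = params["beta_u"][u], params["beta_p"][p]
            params["mu"] -= lr * g
            params["mu_u"][u] -= lr * (g + lam * params["mu_u"][u])
            params["mu_p"][p] -= lr * (g + lam * params["mu_p"][p])
            for j in range(k):
                bu_j, bp_j = bu[j], bp[j]  # use pre-update values for both steps
                bu[j] -= lr * (g * bp_j + lam * bu_j)
                bp[j] -= lr * (g * bu_j + lam * bp_j)
    return params


# Made-up toy data: y = +1 if the user installed the package, -1 otherwise.
installed = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)}
samples = [(u, p, 1 if (u, p) in installed else -1) for u in range(3) for p in range(3)]
params = train(samples, n_users=3, n_packages=3)
```

The toy install pattern cannot be fit by the bias terms alone, so any correct separation here has to come from the latent factor product, which is the extra expressiveness that M2 adds over a purely additive model.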

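
The Ensemble Learning slide combines the individual models' predictions with logistic regression, and the competition is scored by AUC. A rough sketch of both ideas follows, assuming two hypothetical models whose scores are made up for illustration. The helper names `auc` and `fit_stacker`, the learning rate, and the epoch count are assumptions, and the AUC here is computed on the training rows only to keep the example short; a real ensemble would be fit and evaluated on held-out predictions.

```python
import math


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_stacker(model_scores, labels, lr=0.1, epochs=500):
    """Logistic regression over per-model scores, trained with plain SGD."""
    w = [0.0] * len(model_scores[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(model_scores, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # gradient of log loss w.r.t. z
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b


# Made-up scores from two hypothetical models; each row is (model A, model B).
labels = [1, 1, 1, 0, 0, 0]
model_scores = [(0.9, 0.4), (0.8, 0.9), (0.3, 0.8), (0.4, 0.1), (0.2, 0.5), (0.1, 0.2)]

w, b = fit_stacker(model_scores, labels)
stacked = [b + sum(wi * xi for wi, xi in zip(w, x)) for x in model_scores]

print("AUC model A:", auc(labels, [x[0] for x in model_scores]))
print("AUC model B:", auc(labels, [x[1] for x in model_scores]))
print("AUC stacked:", auc(labels, stacked))
```

In this toy set each model misranks a different pair, so a weighted combination can order all positives above all negatives even though neither model does so alone.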