Collaborative Filtering and Recommender Systems By Navisro Analytics



Recommender system overview, types of recommender systems, and available open-source tools and libraries.



  1. ACM Data Mining Hackathon, 8/18/2012: Recommender Systems. Navisro Analytics (@navisro)
  2. Capturing the Long Tail…
  3. Recommender Approaches
     • Model-based training: SVM, LDA, SVD for implicit features
     • Collaborative filtering, item-item similarity (You like The Godfather, so you will like Scarface; Netflix)
     • Attribute-based recommendations (You like action movies starring Clint Eastwood, so you might like "The Good, the Bad and the Ugly"; Netflix)
     • Social + interest graph based (Your friends like Lady Gaga, so you will like Lady Gaga; People You May Know (PYMK) on Facebook and LinkedIn)
     • Collaborative filtering, user-user similarity (People like you who bought beer also bought diapers; Target)
     • Item hierarchy (You bought a printer, so you will also need ink; BestBuy)
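The attribute-based approach in the list above can be illustrated with a small content-based scorer. This is a minimal sketch, not any vendor's method: it assumes each item carries a hypothetical set of attribute tags (genre, lead actor, and so on), builds a user profile by counting the tags of the items the user liked, and ranks unseen items by how many profile tags they share. The class name, tags, and titles are illustrative only.

```java
import java.util.*;

/** Hypothetical attribute-based (content-based) recommender: scores unseen items
 *  by how often their tags appear among the items the user already liked. */
final class AttributeRecommender {

    /** Builds a tag -> count profile from the items the user liked. */
    static Map<String, Integer> profile(Map<String, Set<String>> itemTags, Set<String> liked) {
        Map<String, Integer> counts = new HashMap<>();
        for (String item : liked) {
            for (String tag : itemTags.getOrDefault(item, Set.of())) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to topN unseen items with a positive score, best first (ties by name). */
    static List<String> recommend(Map<String, Set<String>> itemTags, Set<String> liked, int topN) {
        Map<String, Integer> prof = profile(itemTags, liked);
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : itemTags.entrySet()) {
            if (liked.contains(e.getKey())) continue;            // skip items already liked
            int score = 0;
            for (String tag : e.getValue()) score += prof.getOrDefault(tag, 0);
            if (score > 0) scores.put(e.getKey(), score);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int c = Integer.compare(scores.get(b), scores.get(a)); // higher score first
            return c != 0 ? c : a.compareTo(b);                    // then alphabetical
        });
        return ranked.subList(0, Math.min(topN, ranked.size()));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> itemTags = new HashMap<>();
        itemTags.put("Unforgiven", Set.of("western", "action", "clint-eastwood"));
        itemTags.put("The Good, the Bad and the Ugly", Set.of("western", "clint-eastwood"));
        itemTags.put("Dirty Harry", Set.of("action", "clint-eastwood"));
        itemTags.put("Notting Hill", Set.of("romance", "comedy"));
        System.out.println(recommend(itemTags, Set.of("Unforgiven"), 2));
    }
}
```

Running `main` should print `[Dirty Harry, The Good, the Bad and the Ugly]`: both titles share two tags with the liked film, and the tie is broken alphabetically. Real systems usually weight tags (for example with TF-IDF) rather than simply counting them.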
  4. Other/Model-based Approaches
     • Slope One recommender
     • Latent factor models for web data
       – Matrix factorization using SVD or ALS, with regularization
       – LDA, SVM, Bayesian clustering
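Slope One predicts a rating from the average rating difference between pairs of items. The sketch below assumes the weighted Slope One scheme of Lemire and Maclachlan and a small dense user × item matrix in which `Double.NaN` marks a missing rating; the `SlopeOne` class is hypothetical and is not the API of any library on the Tools slide.

```java
/** Hypothetical weighted Slope One over a dense users x items matrix (NaN = missing). */
final class SlopeOne {
    private final double[][] dev;   // dev[j][i]: average of (r_j - r_i) over users who rated both
    private final int[][] count;    // count[j][i]: number of users who rated both j and i

    SlopeOne(double[][] ratings) {
        int nItems = ratings[0].length;
        dev = new double[nItems][nItems];
        count = new int[nItems][nItems];
        for (double[] user : ratings) {
            for (int j = 0; j < nItems; j++) {
                if (Double.isNaN(user[j])) continue;
                for (int i = 0; i < nItems; i++) {
                    if (i == j || Double.isNaN(user[i])) continue;
                    dev[j][i] += user[j] - user[i];  // accumulate pairwise differences
                    count[j][i]++;
                }
            }
        }
        for (int j = 0; j < nItems; j++)
            for (int i = 0; i < nItems; i++)
                if (count[j][i] > 0) dev[j][i] /= count[j][i];  // sum -> average
    }

    /** Predicts the user's rating for item j; NaN if no co-rated item exists. */
    double predict(double[] user, int j) {
        double num = 0, den = 0;
        for (int i = 0; i < user.length; i++) {
            if (i == j || Double.isNaN(user[i]) || count[j][i] == 0) continue;
            num += (dev[j][i] + user[i]) * count[j][i];  // weight by number of co-raters
            den += count[j][i];
        }
        return den == 0 ? Double.NaN : num / den;
    }

    public static void main(String[] args) {
        double na = Double.NaN;
        double[][] ratings = { {5, 3, 2}, {3, 4, na}, {na, 2, 5} };
        System.out.println(new SlopeOne(ratings).predict(ratings[2], 0));
    }
}
```

With the three-user example in `main` (a common textbook illustration), the prediction for the third user's first item is 13/3, about 4.33. Storing a deviation for every item pair costs O(items²) memory, which is one reason larger systems turn to latent factor models such as those above.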
  5. General Steps
     • Data Prep: problem definition (user-based, item-based, ratings/binary…); Map-Reduce, cleansing, and massaging data into the input matrix; training set and validation set
     • Normalize: bias removal (Z-score, mean-centering, log)
     • Similarity weights/Neighbors: Pearson correlation coefficient, cosine similarity, k-nearest neighbors
     • Train: training the model (only in model-based approaches)
     • Predict: predict missing ratings; top-N predictions for every user
     • Denormalize: reverse of normalization
     • Evaluate Accuracy: accuracy, precision, recall, F1, ROC
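The Normalize and Similarity weights steps can be sketched with a few static helpers. This assumes both arrays hold ratings for the same co-rated items in the same order; `SimilarityMeasures` is a hypothetical name. Pearson is computed here as the cosine similarity of the mean-centered vectors, which equals the usual Pearson formula when the means are taken over those co-rated items. Some implementations center on each user's mean over all of their ratings instead, which gives slightly different values.

```java
/** Hypothetical helpers for the Normalize and Similarity steps.
 *  Arrays are the co-rated ratings of two users (same items, same order). */
final class SimilarityMeasures {

    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    /** Mean-centering: subtract the mean rating from every rating. */
    static double[] meanCenter(double[] x) {
        double m = mean(x);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] - m;
        return out;
    }

    /** Z-score: mean-center, then divide by the population standard deviation. */
    static double[] zScore(double[] x) {
        double[] c = meanCenter(x);
        double ss = 0;
        for (double v : c) ss += v * v;
        double sd = Math.sqrt(ss / x.length);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = sd == 0 ? 0 : c[i] / sd;
        return out;
    }

    /** Cosine similarity: dot(a, b) / (|a| |b|); 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Pearson correlation as the cosine of the mean-centered vectors. */
    static double pearson(double[] a, double[] b) {
        return cosine(meanCenter(a), meanCenter(b));
    }
}
```

Pearson ignores a constant offset between two users (someone who rates everything one star higher is still perfectly correlated), while raw cosine does not; that is why mean-centering usually comes before the similarity step.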
  6. User-based CF (Reference: recommenderlab vignette)
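The user-based CF idea on this slide predicts a user's rating from the ratings of similar users. The following is a minimal sketch under stated assumptions, not recommenderlab's implementation: ratings sit in a dense matrix with `Double.NaN` for missing values, similarity is Pearson over co-rated items, only positively correlated neighbours are kept (a design choice of this sketch), and the prediction is the user's mean plus a similarity-weighted average of the neighbours' mean-centered ratings. `UserBasedCF` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical user-based CF sketch. Ratings: users x items, Double.NaN = missing. */
final class UserBasedCF {

    /** Mean of a user's observed ratings (0 if none). */
    static double userMean(double[] r) {
        double s = 0;
        int n = 0;
        for (double v : r) {
            if (!Double.isNaN(v)) { s += v; n++; }
        }
        return n == 0 ? 0 : s / n;
    }

    /** Pearson correlation over items both users rated, centered on those items' means. */
    static double pearson(double[] a, double[] b) {
        double sa = 0, sb = 0;
        int n = 0;
        for (int i = 0; i < a.length; i++) {
            if (!Double.isNaN(a[i]) && !Double.isNaN(b[i])) { sa += a[i]; sb += b[i]; n++; }
        }
        if (n < 2) return 0;                               // too little overlap to correlate
        double ma = sa / n, mb = sb / n, dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            if (Double.isNaN(a[i]) || Double.isNaN(b[i])) continue;
            double da = a[i] - ma, db = b[i] - mb;
            dot += da * db; na += da * da; nb += db * db;
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Predicts r[u][item] from the k most similar users who rated the item. */
    static double predict(double[][] r, int u, int item, int k) {
        List<double[]> neighbours = new ArrayList<>();     // each entry: {similarity, centered rating}
        for (int v = 0; v < r.length; v++) {
            if (v == u || Double.isNaN(r[v][item])) continue;
            double sim = pearson(r[u], r[v]);
            if (sim > 0) neighbours.add(new double[] {sim, r[v][item] - userMean(r[v])});
        }
        neighbours.sort((x, y) -> Double.compare(y[0], x[0]));  // most similar first
        double num = 0, den = 0;
        for (double[] nbr : neighbours.subList(0, Math.min(k, neighbours.size()))) {
            num += nbr[0] * nbr[1];
            den += Math.abs(nbr[0]);
        }
        double base = userMean(r[u]);                      // Denormalize: add u's own mean back
        return den == 0 ? base : base + num / den;
    }
}
```

The final `base + num / den` line is the Denormalize step from the General Steps slide: neighbour offsets are computed on mean-centered ratings and then mapped back onto the target user's own scale. When no usable neighbour exists, the sketch falls back to the user's mean.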
  7. Challenges
     • Dimensionality reduction (e.g. use PCA)
     • Input data sparsity, and the related cold-start problem (new users or items with few or no ratings)
     • Overfitting to the training data set (use regularization)
     • Data wrangling, in general…
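Two of these challenges, dimensionality reduction and overfitting, can be seen together in a small matrix factorization. This sketch fits R ≈ P Q^T on the observed ratings only, using a low rank `k` and an L2 penalty `lambda`. It is a swapped-in technique for illustration (plain stochastic gradient descent rather than the ALS mentioned earlier, with no bias terms), and `RegularizedMF`, the learning rate, and the epoch count are hypothetical choices.

```java
import java.util.Random;

/** Hypothetical sketch: rank-k factorization R ~ P Q^T fitted by SGD with L2 regularization.
 *  A small k is the dimensionality reduction; lambda penalizes large factors. */
final class RegularizedMF {
    final double[][] p;   // user factors (users x k)
    final double[][] q;   // item factors (items x k)

    RegularizedMF(double[][] r, int k, double lr, double lambda, int epochs, long seed) {
        Random rnd = new Random(seed);
        p = new double[r.length][k];
        q = new double[r[0].length][k];
        for (double[] row : p) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
        for (double[] row : q) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
        for (int e = 0; e < epochs; e++) {
            for (int u = 0; u < r.length; u++) {
                for (int i = 0; i < r[u].length; i++) {
                    if (Double.isNaN(r[u][i])) continue;        // train on observed ratings only
                    double err = r[u][i] - predict(u, i);
                    for (int f = 0; f < k; f++) {
                        double pu = p[u][f], qi = q[i][f];
                        p[u][f] += lr * (err * qi - lambda * pu);  // gradient step + L2 shrinkage
                        q[i][f] += lr * (err * pu - lambda * qi);
                    }
                }
            }
        }
    }

    /** Predicted rating: dot product of user and item factor vectors. */
    double predict(int u, int i) {
        double s = 0;
        for (int f = 0; f < p[u].length; f++) s += p[u][f] * q[i][f];
        return s;
    }

    /** Root mean squared error over the observed entries of r. */
    double rmse(double[][] r) {
        double se = 0;
        int n = 0;
        for (int u = 0; u < r.length; u++) {
            for (int i = 0; i < r[u].length; i++) {
                if (Double.isNaN(r[u][i])) continue;
                double d = r[u][i] - predict(u, i);
                se += d * d;
                n++;
            }
        }
        return Math.sqrt(se / n);
    }
}
```

A larger `lambda` shrinks the factors toward zero, trading some training error for better generalization; it is normally tuned on the validation set from the Data Prep step. Entries are visited in a fixed order here for simplicity, whereas SGD implementations usually shuffle them each epoch.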
  8. Just How Good Is Your Recommender?
     • Evaluation of predicted ratings: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE)
     • Evaluation of top-N recommendations
       – Accuracy
       – Precision & Recall (F1 score)
       – ROC curve
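The two evaluation families above can be computed with a few helpers. This is a minimal sketch with hypothetical names: rating predictions are compared with MAE and RMSE, and a top-N list is compared with the set of items the user actually found relevant (for example, held-out items they rated highly). ROC curves are not shown; they are typically built by varying N and plotting the true positive rate against the false positive rate.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical evaluation helpers for predicted ratings and top-N lists. */
final class RecEval {

    /** Mean Absolute Error between predicted and actual ratings. */
    static double mae(double[] predicted, double[] actual) {
        double s = 0;
        for (int i = 0; i < actual.length; i++) s += Math.abs(predicted[i] - actual[i]);
        return s / actual.length;
    }

    /** Root Mean Squared Error; penalizes large errors more than MAE does. */
    static double rmse(double[] predicted, double[] actual) {
        double s = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = predicted[i] - actual[i];
            s += d * d;
        }
        return Math.sqrt(s / actual.length);
    }

    /** Precision@N: share of the top-N list that is relevant. */
    static double precision(List<String> topN, Set<String> relevant) {
        if (topN.isEmpty()) return 0;
        long hits = topN.stream().filter(relevant::contains).count();
        return (double) hits / topN.size();
    }

    /** Recall@N: share of the relevant items that made it into the top-N list. */
    static double recall(List<String> topN, Set<String> relevant) {
        if (relevant.isEmpty()) return 0;
        long hits = topN.stream().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    /** F1: harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        return (precision + recall == 0) ? 0 : 2 * precision * recall / (precision + recall);
    }
}
```

For a top-4 list containing two of three relevant items, precision is 0.5, recall is 2/3, and F1 is 4/7.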
  9. Tools
  10. Open Source Tools (software: description, language)
     • Mahout: Hadoop ML library that includes collaborative filtering (Java)
     • Cofi: Collaborative filtering library (Java)
     • Crab: Components to create recommender systems (Python)
     • Recommender for web pages (Java)
     • LensKit: Collaborative filtering algorithms from GroupLens Research (Java)
     • Recommender system algorithms (C#/Mono)
     • SVDFeature: Toolkit for feature-based matrix factorization (C++)
     • Vogoo PHP LIB: Collaborative filtering for personalized web sites (PHP)
     • R library for developing and testing collaborative filtering systems (R): http://cran.r-…lab/index.html
     • Scikit-learn: Python module integrating classic ML algorithms in the scientific Python packages (numpy, scipy, matplotlib) (Python)
  11. recommenderlab (Reference: recommenderlab vignette)
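  12. Mahout

      import java.io.File;
      import java.util.Collection;
      import java.util.List;
      import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
      import org.apache.mahout.cf.taste.model.DataModel;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

      // Load preferences from a text file (one userID,itemID,preference per line)
      DataModel model = new FileDataModel(new File("data.txt"));
      // Construct the list of pre-computed correlations
      Collection<GenericItemSimilarity.ItemItemSimilarity> correlations = ...;
      ItemSimilarity itemSimilarity = new GenericItemSimilarity(correlations);
      Recommender recommender = new GenericItemBasedRecommender(model, itemSimilarity);
      // Wrap the recommender so repeated requests are served from a cache
      Recommender cachingRecommender = new CachingRecommender(recommender);
      ...
      // Top 10 recommendations for user 1234
      List<RecommendedItem> recommendations = cachingRecommender.recommend(1234, 10);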
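For readers without a Mahout setup, the item-based idea behind this snippet can be shown in plain Java. This is a simplified sketch, not Mahout's code: it assumes a dense matrix with `Double.NaN` for missing ratings, computes cosine similarity between item columns over users who rated both, and estimates a rating as the similarity-weighted average of the user's own ratings. Mahout's `GenericItemBasedRecommender` is built around a similar weighted estimate but may differ in details such as candidate selection and caching; the `ItemBasedCF` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java item-based CF: item-item cosine similarity plus a
 *  similarity-weighted average of the user's own ratings. NaN = missing. */
final class ItemBasedCF {

    /** Cosine similarity between item columns i and j, over users who rated both. */
    static double similarity(double[][] r, int i, int j) {
        double dot = 0, ni = 0, nj = 0;
        for (double[] user : r) {
            if (Double.isNaN(user[i]) || Double.isNaN(user[j])) continue;
            dot += user[i] * user[j];
            ni += user[i] * user[i];
            nj += user[j] * user[j];
        }
        return (ni == 0 || nj == 0) ? 0 : dot / Math.sqrt(ni * nj);
    }

    /** Estimates user u's rating of item i from the items u has already rated. */
    static double estimate(double[][] r, int u, int i) {
        double num = 0, den = 0;
        for (int j = 0; j < r[u].length; j++) {
            if (j == i || Double.isNaN(r[u][j])) continue;
            double s = similarity(r, i, j);
            num += s * r[u][j];
            den += Math.abs(s);
        }
        return den == 0 ? Double.NaN : num / den;
    }

    /** Up to n unrated items for user u, highest estimate first. */
    static List<Integer> recommend(double[][] r, int u, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < r[u].length; i++) {
            if (Double.isNaN(r[u][i]) && !Double.isNaN(estimate(r, u, i))) candidates.add(i);
        }
        candidates.sort((a, b) -> Double.compare(estimate(r, u, b), estimate(r, u, a)));
        return candidates.subList(0, Math.min(n, candidates.size()));
    }
}
```

Plain cosine on raw ratings is high for almost any pair of items rated by the same users; mean-centering the ratings first (adjusted cosine) usually separates items better, and precomputing the similarities, as the Mahout snippet does, avoids recomputing them for every request.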
  13. Peter Harrington’s Sample Python Code
  14. References & Reading
     • High-level reading
       – Programming Collective Intelligence by Toby Segaran. Chapter 2 gives a good introduction to collaborative filtering with Python examples (non-SVD).
       – Matrix Factorization Techniques for Recommender Systems, Yehuda Koren, Robert Bell, Chris Volinsky, IEEE Computer, vol. 42, no. 8, 2009
     • Singular Value Decomposition (SVD) reading
       – The Singular Value Decomposition, by Jody Hourigan and Lynn McIndoo, Linear Algebra, Math 45. JodLynn/report2.pdf (with Matlab and image examples)
       – Numerical Recipes, 3rd Edition, Press et al., 2007, pp. 65–75
  15. References & Reading (continued)
     • Collaborative filtering reading
       – See papers on
       – Collaborative Filtering for Implicit Feedback Datasets, Yifan Hu, Yehuda Koren, Chris Volinsky, IEEE International Conference on Data Mining (ICDM 2008), IEEE, 2008
       – Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, Yehuda Koren, ACM Int. Conference on Knowledge Discovery and Data Mining (KDD’08), 2008
       – Collaborative Filtering with Temporal Dynamics, Yehuda Koren, KDD 2009, ACM, 2009
       – James Thornton’s CF Blog
       – Apache Mahout Recommender documentation.html
       – Flexible Collaborative Filtering In Java With Mahout Taste, Philippe Adjiman
       – Books, Articles and Tutorials on Mahout/Cofi
  16. Questions?