Collaborative Filtering and Recommender Systems By Navisro Analytics

ACM Data Mining Hackathon
8/18/2012

Recommender Systems
Navisro Analytics
@navisro
info@navisro.com
http://www.navisro.com

Recommender Approaches
• Collaborative Filtering – Item-Item Similarity (You like Godfather so you will like Scarface - Netflix)
• Collaborative Filtering – User-User Similarity (People like you who bought beer also bought diapers - Target)
• Attribute-based recommendations (You like action movies, starring Clint Eastwood, you might like “Good, Bad and the Ugly” - Netflix)
• Social+Interest Graph Based (Your friends like Lady Gaga so you will like Lady Gaga, PYMK – Facebook, LinkedIn)
• Item Hierarchy (You bought Printer you will also need ink - BestBuy)
• Model Based (Training SVM, LDA, SVD for implicit features)

Other/Model-based Approaches
• Slope one recommender
• Latent factor Models for Web Data
– Matrix factorization using SVD, ALS, with Regularization
– LDA, SVM, Bayesian Clustering
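
As an illustration of the Slope One recommender listed above, here is a minimal sketch of the weighted variant in plain Python. The function names, the nested-dict input format, and the toy ratings are assumptions made for this example and are not taken from any particular library.

```python
from collections import defaultdict

def slope_one_fit(ratings):
    """Compute average rating deviations between item pairs.

    ratings: {user: {item: rating}} (hypothetical input format).
    Returns (diffs, counts) where diffs[i][j] is the mean of r_i - r_j
    over users who rated both items, and counts[i][j] is how many did.
    """
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diffs[i][j] += r_i - r_j
                    counts[i][j] += 1
    for i in diffs:
        for j in diffs[i]:
            diffs[i][j] /= counts[i][j]
    return diffs, counts

def slope_one_predict(user_ratings, item, diffs, counts):
    """Weighted Slope One prediction for one user, or None if no overlap."""
    numerator = 0.0
    denominator = 0
    for j, r_j in user_ratings.items():
        c = counts[item].get(j, 0) if item in counts else 0
        if c:
            numerator += (diffs[item][j] + r_j) * c
            denominator += c
    return numerator / denominator if denominator else None

# Toy data, invented for illustration only
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
diffs, counts = slope_one_fit(ratings)
lucy_a = slope_one_predict(ratings["lucy"], "A", diffs, counts)
```

Each pairwise deviation is weighted by the number of users who rated both items, so pairs with more evidence influence the prediction more.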

General Steps

Data Prep
• Problem definition (user-based, item-based, ratings/binary…)
• Map-Reduce, cleansing, massaging data (input matrix)
• Training Set, Validation Set

Normalize
• Bias removal - Z-score, Mean-centering, Log

Similarity weights/Neighbors
• Pearson Correlation Coefficient
• Cosine Similarity
• K-nearest neighbor

Train
• Training model (only in model-based approaches)

Predict
• Predict missing ratings
• top-N predictions for every user

Denormalize
• Reverse of normalization

Evaluate Accuracy
• Accuracy, Precision, Recall, F1, ROC
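
The normalize, similarity, and denormalize steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming ratings are stored as a nested dict with at least one rating per user, and that cosine similarity is computed over co-rated items only; the helper names and the toy data are hypothetical.

```python
import math

def mean_center(ratings):
    """Return (centered ratings, per-user means) for {user: {item: rating}}."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    centered = {u: {i: v - means[u] for i, v in r.items()}
                for u, r in ratings.items()}
    return centered, means

def denormalize(value, user, means):
    """Reverse mean-centering by adding the user's mean back."""
    return value + means[user]

def cosine(a, b):
    """Cosine similarity of two {item: value} dicts over co-rated items."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy data, invented for illustration only
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 4},
    "ben": {"m1": 4, "m2": 2, "m3": 3},
}
centered, means = mean_center(ratings)
similarity = cosine(centered["ann"], centered["ben"])
```

In this toy data ben rates every item one point lower than ann. Mean-centering removes that offset, so the two users come out as near-identical in taste.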

User-based CF

Reference: Recommenderlab vignette, http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
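
For readers without the vignette at hand, user-based CF can be sketched as follows. This is a simplified illustration, not the recommenderlab implementation: it assumes Pearson correlation over co-rated items, keeps only positively correlated neighbors, and predicts a mean-centered weighted average from the k most similar users. All names and data are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two {item: rating} dicts over co-rated items."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    dev_a = [a[i] - mean_a for i in common]
    dev_b = [b[i] - mean_b for i in common]
    norm = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / norm

def predict(ratings, user, item, k=2):
    """Predict user's rating of item from the k most similar users who rated it."""
    mean_user = sum(ratings[user].values()) / len(ratings[user])
    neighbors = []
    for other, other_ratings in ratings.items():
        if other != user and item in other_ratings:
            sim = pearson(ratings[user], other_ratings)
            if sim > 0:
                neighbors.append((sim, other))
    neighbors.sort(reverse=True)
    top = neighbors[:k]
    if not top:
        return mean_user  # fall back to the user's own average
    weighted = 0.0
    for sim, other in top:
        mean_other = sum(ratings[other].values()) / len(ratings[other])
        weighted += sim * (ratings[other][item] - mean_other)
    return mean_user + weighted / sum(sim for sim, _ in top)

# Toy data, invented for illustration only
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
alice_m4 = predict(ratings, "alice", "m4")
```

Here carol is negatively correlated with alice and is ignored, so the prediction comes from bob's above-average rating of m4 applied on top of alice's own mean.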

Challenges
• Dimensionality reduction (e.g. use PCA)
• Input data sparsity, including the cold start problem for new users and items
• Overfitting to the training data set (use regularization)
• Data wrangling, in general…
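
To make the regularization point concrete, here is a minimal sketch of matrix factorization trained by stochastic gradient descent with an L2 penalty. It is an assumption-laden illustration, not the method of any specific library: the hyperparameters (n_factors, lr, reg, epochs), the small positive initialization, and the toy ratings are all chosen for the example. It also uses SGD, not the ALS solver mentioned earlier in the deck.

```python
import math
import random

def dot(p, q):
    return sum(x * y for x, y in zip(p, q))

def factorize(ratings, n_factors=2, lr=0.01, reg=0.05, epochs=500, seed=0):
    """Fit user factors P and item factors Q to (user, item, rating) triples."""
    rng = random.Random(seed)
    P, Q = {}, {}
    for user, item, _rating in ratings:
        if user not in P:
            P[user] = [rng.uniform(0.0, 0.1) for _ in range(n_factors)]
        if item not in Q:
            Q[item] = [rng.uniform(0.0, 0.1) for _ in range(n_factors)]
    for _epoch in range(epochs):
        for user, item, rating in ratings:
            err = rating - dot(P[user], Q[item])
            for f in range(n_factors):
                p_f, q_f = P[user][f], Q[item][f]
                # Gradient step; the reg term shrinks factors toward zero
                P[user][f] += lr * (err * q_f - reg * p_f)
                Q[item][f] += lr * (err * p_f - reg * q_f)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on the given triples."""
    total = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))

# Toy data, invented for illustration only
train = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 4), ("u2", "c", 1),
    ("u3", "b", 2), ("u3", "c", 1),
]
P0, Q0 = factorize(train, epochs=0)  # untrained baseline
P, Q = factorize(train)
```

Raising reg trades training fit for smaller factor values, which is the usual lever against overfitting on sparse rating data. A held-out validation set, as in the Data Prep step, would be the place to tune it.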

Just How Good is your Recommender?
• Evaluation of predicted ratings (Mean Absolute Error, Root Mean Squared Error)

• Evaluation of top-N recommendations
– Mean Absolute Error
– Accuracy
– Precision & Recall (F1 score)
– ROC curve
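
The metrics above are short enough to write out directly. The following is a minimal sketch in plain Python; the function names and sample values are hypothetical, and top-N quality is measured against a set of items assumed to be relevant for the user (for example, held-out items the user actually liked).

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length rating lists."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 of a top-N list against the relevant items."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Sample values, invented for illustration only
rating_mae = mae([4, 3, 5], [3, 3, 3])
rating_rmse = rmse([4, 3, 5], [3, 3, 3])
precision, recall, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
```

RMSE penalizes large errors more heavily than MAE, so comparing the two on the same predictions gives a rough sense of whether the errors are evenly spread or dominated by a few bad misses.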

Open Source Tools
Software | Description | Language | URL
Apache Mahout | Hadoop ML library that includes Collaborative Filtering | Java | http://mahout.apache.org/
Cofi | Collaborative Filtering Library | Java | http://www.nongnu.org/cofi/
Crab | Components to create recommender systems | Python | https://github.com/muricoca/crab
easyrec | Recommender for web pages | Java | http://easyrec.org/
LensKit | Collaborative Filtering algorithms from GroupLens Research | Java | http://lenskit.grouplens.org/
MyMediaLite | Recommender system algorithms | C#/Mono | http://mloss.org/software/view/282/
SVDFeature | Toolkit for Feature based Matrix Factorization | C++ | http://mloss.org/software/view/333/
Vogoo PHP LIB | Collaborative Filtering for personalized web sites | PHP | http://sourceforge.net/projects/vogoo/
recommenderlab | R library for developing and testing collaborative filtering systems | R | http://cran.r-project.org/web/packages/recommenderlab/index.html
Scikit-learn | Python module integrating classic ML algorithms in scientific Python packages (numpy, scipy, matplotlib) | Python | http://scikit-learn.org/stable/

recommenderlab

Reference: Recommenderlab vignette, http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf

Mahout
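import java.io.File;
import java.util.Collection;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// Load userID,itemID,preference records from a file
DataModel model = new FileDataModel(new File("data.txt"));
// Construct the list of pre-computed correlations
Collection<GenericItemSimilarity.ItemItemSimilarity> correlations = ...;
ItemSimilarity itemSimilarity = new GenericItemSimilarity(correlations);
// Item-based recommender, wrapped in a cache
Recommender recommender = new GenericItemBasedRecommender(model, itemSimilarity);
Recommender cachingRecommender = new CachingRecommender(recommender);
...
// Top 10 recommendations for user ID 1234
List<RecommendedItem> recommendations = cachingRecommender.recommend(1234, 10);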

Peter Harrington’s Sample Python Code

References & Reading
• High Level Reading
– Programming Collective Intelligence by Toby Segaran. The 2nd chapter gives a good introduction to collaborative filtering with Python examples (non-SVD).
– Matrix Factorization Techniques for Recommender Systems, Yehuda Koren; Robert Bell; Chris Volinsky, IEEE Computer 42(8), 2009
• Singular Value Decomposition (SVD) Reading
– The Singular Value Decomposition, by Jody Hourigan and Lynn McIndoo, Linear Algebra – Math 45. http://online.redwoods.edu/INSTRUCT/darnold/LAPROJ/Fall98/JodLynn/report2.pdf (with Matlab and image examples)
– Numerical Recipes, 3rd Edition, Press et al., 2007, pp. 65-75.

References & Reading (continued)
• Collaborative Filtering Reading
– See papers on research.yahoo.com/Yehuda_Koren
– Collaborative Filtering for Implicit Feedback Datasets, Yifan Hu; Yehuda Koren; Chris Volinsky, IEEE International Conference on Data Mining (ICDM 2008), IEEE, 2008
– Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, Yehuda Koren, ACM Int. Conference on Knowledge Discovery and Data Mining (KDD’08), 2008
– Collaborative Filtering with Temporal Dynamics, Yehuda Koren, KDD 2009, ACM, 2009
– James Thornton’s CF Blog http://original.jamesthornton.com/cf/
– Apache Mahout Recommender https://cwiki.apache.org/MAHOUT/recommender-documentation.html
– Flexible Collaborative Filtering In Java With Mahout Taste - Philippe Adjiman
– Books, Articles and Tutorials on Mahout/Cofi
