ACM Data Mining Hackathon
          8/18/2012




Recommender Systems
       Navisro Analytics
            @navisro
       info@navisro.com
    http://www.navisro.com
Capturing the Long Tail…
Recommender Approaches
• Attribute-based recommendations
  (You like action movies, starring Clint Eastwood, you might like
  “Good, Bad and the Ugly” - Netflix)
• Item Hierarchy
  (You bought Printer you will also need ink - BestBuy)
• Collaborative Filtering – Item-Item similarity
  (You like Godfather so you will like Scarface - Netflix)
• Collaborative Filtering – User-User Similarity
  (People like you who bought beer also bought diapers - Target)
• Social+Interest Graph Based
  (Your friends like Lady Gaga so you will like Lady Gaga,
  PYMK – Facebook, LinkedIn)
• Model Based
  (Training SVM, LDA, SVD for implicit features)
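
As a rough illustration of the item-item flavor of collaborative filtering, the sketch below scores items by how many buyers they share. The basket data and the function names (item_users, cosine_binary, similar_items) are hypothetical and only for illustration; production systems typically use a library rather than hand-rolled code like this.

```python
from math import sqrt

# Hypothetical purchase baskets: user -> set of items bought.
baskets = {
    "u1": {"beer", "diapers", "chips"},
    "u2": {"beer", "diapers"},
    "u3": {"printer", "ink"},
    "u4": {"printer", "ink", "paper"},
    "u5": {"beer", "chips"},
}


def item_users(baskets):
    """Invert user -> items into item -> set of users who bought it."""
    index = {}
    for user, items in baskets.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def cosine_binary(users_a, users_b):
    """Cosine similarity between two items represented as sets of buyers."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))


def similar_items(item, baskets, top_n=3):
    """Rank the other items by cosine similarity to `item` (ties by name)."""
    index = item_users(baskets)
    scores = [
        (other, cosine_binary(index[item], users))
        for other, users in index.items()
        if other != item
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


top_for_beer = similar_items("beer", baskets)
top_for_printer = similar_items("printer", baskets, top_n=1)
```

Items bought by the same people score close to 1, and items with no shared buyers score 0.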
Other/Model-based
           Approaches
• Slope one recommender
• Latent factor Models for Web Data
  – Matrix factorization using SVD, ALS,
    with Regularization
  – LDA, SVM, Bayesian Clustering
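
The Slope One recommender listed above can be sketched in a few lines. This is a minimal version of the weighted Slope One scheme, assuming ratings held in a plain dictionary; the user names, ratings and function names are made up for the example.

```python
def slope_one_deviations(ratings):
    """Return dev[i][j], the average of (r_ui - r_uj), and freq[i][j], the co-rating count."""
    dev, freq = {}, {}
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i == j:
                    continue
                dev.setdefault(i, {}).setdefault(j, 0.0)
                freq.setdefault(i, {}).setdefault(j, 0)
                dev[i][j] += r_i - r_j
                freq[i][j] += 1
    # Turn the summed differences into averages.
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= freq[i][j]
    return dev, freq


def slope_one_predict(user_ratings, item, dev, freq):
    """Weighted Slope One prediction of `item` for one user; None if no overlap."""
    num, den = 0.0, 0
    for j, r_j in user_ratings.items():
        if j != item and item in dev and j in dev[item]:
            num += (dev[item][j] + r_j) * freq[item][j]
            den += freq[item][j]
    return num / den if den else None


# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, freq = slope_one_deviations(ratings)
lucy_a = slope_one_predict(ratings["lucy"], "A", dev, freq)
mark_c = slope_one_predict(ratings["mark"], "C", dev, freq)
```

Each prediction is the user's own rating of another item shifted by the average difference between the two items, weighted by how many users rated both.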
General Steps
• Data Prep
  – Problem definition (user-based, item-based, ratings/binary…)
  – Map-Reduce, cleansing, massaging data (input matrix)
  – Training Set, Validation Set
• Normalize
  – Bias removal - Z-score, Mean-centering, Log
• Similarity weights/Neighbors
  – Pearson Correlation Coefficient
  – Cosine Similarity
  – K-nearest neighbor
• Train
  – Training model (only in model-based approaches)
• Predict
  – Predict missing ratings
  – Top-N predictions for every user
• Denormalize
  – Reverse of normalization
• Evaluate Accuracy
  – Accuracy, Precision, Recall, F1, ROC
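
The normalize, similarity, predict and denormalize steps above can be sketched for a user-based, memory-based recommender as follows. This is only an illustration on hypothetical toy ratings: it uses cosine similarity on mean-centered ratings (close to, but not identical to, the Pearson correlation), and all names are made up for the example.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
    "dan": {"A": 5, "B": 5, "D": 4},
}


def mean_center(ratings):
    """Normalize: subtract each user's mean rating to remove rating bias."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    centered = {u: {i: v - means[u] for i, v in r.items()} for u, r in ratings.items()}
    return centered, means


def cosine(a, b):
    """Cosine similarity over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user, item, centered, means, k=2):
    """Predict a missing rating from the k most similar users who rated `item`."""
    neighbors = sorted(
        ((cosine(centered[user], centered[v]), v)
         for v in centered if v != user and item in centered[v]),
        reverse=True,
    )[:k]
    neighbors = [(s, v) for s, v in neighbors if s > 0]
    if not neighbors:
        return means[user]  # fall back to the user's own mean
    weighted = sum(s * centered[v][item] for s, v in neighbors)
    total = sum(s for s, _ in neighbors)
    # Denormalize: add the user's mean back to the predicted deviation.
    return means[user] + weighted / total


def top_n(user, ratings, n=2, k=2):
    """Rank the user's unrated items by predicted rating."""
    centered, means = mean_center(ratings)
    all_items = {i for r in ratings.values() for i in r}
    unseen = all_items - set(ratings[user])
    scored = [(predict(user, i, centered, means, k), i) for i in unseen]
    return [(i, round(p, 2)) for p, i in sorted(scored, reverse=True)][:n]


ann_recommendations = top_n("ann", ratings)
```

The evaluation step is left out here; in practice the predictions would be compared against a held-out validation set.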
User-based CF




Reference: Recommenderlab vignette, http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
Challenges
• High dimensionality (reduce it with, e.g., PCA)
• Input data sparsity (closely related to the cold start
  problem for new users and items)
• Overfitting to the training data set (use
  regularization)
• Data wrangling, in general…
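
To make the regularization point concrete, here is a minimal sketch of matrix factorization trained by stochastic gradient descent with an L2 penalty. The ratings, hyperparameters (n_factors, lr, reg, epochs) and function names are assumptions chosen for illustration, not values taken from any of the referenced papers or tools.

```python
import random


def factorize(ratings, n_factors=2, lr=0.01, reg=0.05, epochs=500, seed=0):
    """Fit r_ui ~ p_u . q_i by SGD with L2 regularization on both factor sets."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # The reg * value term shrinks factors toward zero,
                # which is what limits overfitting to the observed ratings.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating is the dot product of user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[user], Q[item]))


def rmse(P, Q, ratings):
    """Root mean squared error over a list of (user, item, rating) triples."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


# Hypothetical (user, item, rating) triples.
data = [
    ("ann", "A", 5), ("ann", "B", 4), ("ann", "C", 1),
    ("bob", "A", 4), ("bob", "B", 5), ("bob", "C", 2), ("bob", "D", 5),
    ("cat", "A", 1), ("cat", "C", 5), ("cat", "D", 1),
    ("dan", "A", 5), ("dan", "D", 4),
]
P, Q = factorize(data)
train_rmse = rmse(P, Q, data)
```

A larger reg value trades a worse fit on the training set for smaller factors; the right value would normally be chosen on a validation set.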
Just How Good is your
          Recommender?
• Evaluation of predicted ratings (Mean
  Absolute Error, Root Mean Squared Error)

• Evaluation of top-N recommendations
  – Mean Absolute Error
  – Accuracy
  – Precision & Recall (F1 score)
  – ROC curve
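
A small sketch of these measures, assuming predicted ratings are compared against held-out actual ratings and a top-N list is compared against a set of items the user really liked. The function names and sample values are hypothetical.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean Absolute Error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error; penalizes large errors more than MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 of a top-N list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical held-out ratings and predictions.
actual = [4, 3, 5, 1]
predicted = [3.5, 3, 4, 2]
rating_mae = mae(actual, predicted)
rating_rmse = rmse(actual, predicted)

# Hypothetical top-4 list versus the items the user actually liked.
precision, recall, f1 = precision_recall_f1(["A", "B", "C", "D"], ["A", "C", "E"])
```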
Tools
Open Source Tools
Software        Description                                                 Language  URL
Apache Mahout   Hadoop ML library that includes Collaborative Filtering     Java      http://mahout.apache.org/
Cofi            Collaborative Filtering Library                             Java      http://www.nongnu.org/cofi/
Crab            Components to create recommender systems                    Python    https://github.com/muricoca/crab
easyrec         Recommender for web pages                                   Java      http://easyrec.org/
LensKit         Collaborative Filtering algorithms from GroupLens Research  Java      http://lenskit.grouplens.org/
MyMediaLite     Recommender system algorithms                               C#/Mono   http://mloss.org/software/view/282/
SVDFeature      Toolkit for Feature based Matrix Factorization              C++       http://mloss.org/software/view/333/
Vogoo PHP LIB   Collaborative Filtering for personalized web sites          PHP       http://sourceforge.net/projects/vogoo/
recommenderlab  R library for developing and testing                        R         http://cran.r-project.org/web/packages/recommenderlab/index.html
                collaborative filtering systems
Scikit-learn    Python module integrating classic ML algorithms in          Python    http://scikit-learn.org/stable/
                scientific Python packages (numpy, scipy, matplotlib)
recommenderlab




Reference: Recommenderlab vignette, http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
Mahout
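import java.io.File;
import java.util.Collection;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// Load the preference data from a file
DataModel model = new FileDataModel(new File("data.txt"));

// Construct the list of pre-computed correlations
Collection<GenericItemSimilarity.ItemItemSimilarity> correlations = ...;
ItemSimilarity itemSimilarity = new GenericItemSimilarity(correlations);

// Item-based recommender, wrapped in a caching layer
Recommender recommender = new GenericItemBasedRecommender(model, itemSimilarity);
Recommender cachingRecommender = new CachingRecommender(recommender);
...
// Top 10 recommendations for user 1234
List<RecommendedItem> recommendations = cachingRecommender.recommend(1234, 10);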
Peter Harrington’s Sample Python Code
2. References & Reading
• High Level Reading
  – Programming Collective Intelligence by Toby Segaran. The 2nd
    chapter gives a good introduction to collaborative filtering with Python
    examples (non-SVD).
  – Matrix Factorization Techniques for Recommender Systems,
    Yehuda Koren; Robert Bell; Chris Volinsky, IEEE Computer,
    42(8), 2009
• Singular Value Decomposition (SVD) Reading
  – The Singular Value Decomposition, by Jody Hourigan and Lynn
    McIndoo, Linear Algebra – Math 45 (with Matlab & image examples).
    http://online.redwoods.edu/INSTRUCT/darnold/LAPROJ/Fall98/JodLynn/report2.pdf
  – Numerical Recipes, 3rd Edition, Press et al., 2007, pp. 65-75.
References & Reading (continued)
• Collaborative Filtering Reading
   – See papers on research.yahoo.com/Yehuda_Koren
   – Collaborative Filtering for Implicit Feedback Datasets, Yifan Hu;
     Yehuda Koren; Chris Volinsky, IEEE International Conference on
     Data Mining (ICDM 2008), IEEE, 2008
   – Factorization Meets the Neighborhood: a Multifaceted Collaborative
     Filtering Model, Yehuda Koren, ACM Int. Conference on
     Knowledge Discovery and Data Mining (KDD’08), 2008
   – Collaborative Filtering with Temporal Dynamics, Yehuda Koren,
     KDD 2009, ACM, 2009
   – James Thornton’s CF Blog http://original.jamesthornton.com/cf/
   – Apache Mahout Recommender
     https://cwiki.apache.org/MAHOUT/recommender-documentation.html
   – Flexible Collaborative Filtering In Java With Mahout Taste - Philippe
     Adjiman
   – Books, Articles and Tutorials on Mahout/Cofi
Questions?

Collaborative Filtering and Recommender Systems By Navisro Analytics
