Next directions in Mahout's recommenders


Slides from my talk "Next directions in Mahout's recommenders" given at the Bay Area Mahout Meetup

Published in: Technology, Education

  1. Next Directions in Mahout’s Recommenders. Sebastian Schelter, Apache Software Foundation. Bay Area Mahout Meetup.
  2. About me
      - PhD student at the Database Systems and Information Management Group of Technische Universität Berlin
      - member of the Apache Software Foundation, committer on Mahout and Giraph
      - currently interning at IBM Research Almaden
  3. Next Directions? Mahout in Action is the prime source of information for using Mahout in practice. As it is more than two years old (and only covers Mahout 0.5), it is missing a lot of recent developments. This talk describes what has been added to the recommenders of Mahout since then and gives suggestions on directions for future versions of Mahout.
  4. Collaborative Filtering 101
  5. Collaborative Filtering
      - Problem: given a user’s interactions with items, guess which other items would be highly preferred
      - Collaborative Filtering: infer recommendations from patterns found in the historical user-item interactions
      - the data can be explicit feedback (ratings) or implicit feedback (clicks, pageviews), represented in the interaction matrix $A$:

        $$A = \begin{array}{c|cccc} & \text{item}_1 & \cdots & \text{item}_3 & \cdots \\ \hline \text{user}_1 & 3 & \cdots & 4 & \cdots \\ \text{user}_2 & - & \cdots & 4 & \cdots \\ \text{user}_3 & 5 & \cdots & 1 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{array}$$
  6. Neighborhood Methods
      - User-based:
          - for each user, compute a “jury” of users with similar taste
          - pick the recommendations from the “jury’s” items
      - Item-based:
          - for each item, compute a set of items with a similar interaction pattern
          - pick the recommendations from those similar items
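
To make the item-based variant concrete, here is a minimal Python sketch on binary interaction data. It assumes Jaccard similarity between the sets of users that interacted with two items; the function names and the toy data are illustrative only and are not Mahout's API.

```python
from collections import defaultdict


def item_users(interactions):
    """Invert {user: set(items)} into {item: set(users)}."""
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)
    return users_by_item


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend_item_based(user, interactions, how_many=3):
    """Score each unseen item by its summed similarity to the user's items."""
    users_by_item = item_users(interactions)
    seen = interactions[user]
    scores = {}
    for candidate, candidate_users in users_by_item.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(
            jaccard(candidate_users, users_by_item[item]) for item in seen
        )
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:how_many]


# Made-up toy data: which user interacted with which items.
toy = {
    "alice": {"a", "b"},
    "bob": {"a", "b", "c"},
    "carol": {"b", "c"},
    "dave": {"a", "d"},
}
print(recommend_item_based("alice", toy))
```

For "alice", item "c" ranks above item "d" because "c" shares users with both of her items, which also yields the kind of explanation mentioned on the slides ("we recommend c because you liked a and b").
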
  7. Neighborhood Methods: the item-based variant is the most popular
      - simple and intuitively understandable
      - additionally gives non-personalized, per-item recommendations (people who like X might also like Y)
      - recommendations for new users without model retraining
      - comprehensible explanations (we recommend Y because you liked X)
  8. Latent factor models
      - Idea: interactions are deeply influenced by a set of factors that are very specific to the domain (e.g. amount of action or complexity of characters in movies)
      - these factors are in general not obvious and need to be inferred from the interaction data
      - both users and items can be described in terms of these factors
  9. Matrix factorization
      - Computing a latent factor model: approximately factor $A$ into the product of two rank-$k$ feature matrices $U$ and $M$ such that $A \approx UM$
      - $U$ models the latent features of the users, $M$ models the latent features of the items
      - the dot product $u_i^\top m_j$ in the latent feature space predicts the strength of interaction between user $i$ and item $j$
      - dimensions: $A$ is users × items, $U$ is users × $k$, $M$ is $k$ × items
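
As a small illustration of how a factorization is used for prediction, the following Python sketch computes the dot product of a user's feature vector and an item's feature vector. The matrices and the function name `predict` are made up for this example.

```python
def predict(U, M, user, item):
    """Predicted interaction strength: dot product of user row and item column."""
    k = len(M)  # number of latent features
    return sum(U[user][f] * M[f][item] for f in range(k))


# Hypothetical factors: 2 users x 2 features and 2 features x 3 items.
U = [[1.0, 0.5],
     [0.2, 2.0]]
M = [[4.0, 1.0, 0.0],
     [0.0, 1.0, 2.0]]

print(predict(U, M, 0, 0))  # 1.0 * 4.0 + 0.5 * 0.0 = 4.0
print(predict(U, M, 0, 1))  # 1.0 * 1.0 + 0.5 * 1.0 = 1.5
```
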
  10. Single machine recommenders
  11. Taste
      - based on Sean Owen’s Taste framework (started in 2005)
      - mature and stable codebase
      - Recommender implementations encapsulate recommender algorithms
      - DataModel implementations handle interaction data in memory, files, databases, key-value stores
      - but the focus was mostly on neighborhood methods
      - lack of implementations for latent factor models
      - little support for scientific use cases (e.g. recommender contests)
  12. Collaboration
      - MyMediaLite, a scientific library of recommender system algorithms
      - Mahout now features a couple of popular latent factor models, mostly ported by Zeno Gantner
  13. Lots of different Factorizers for our SVDRecommender
      - RatingSGDFactorizer, biased matrix factorization. Koren et al.: Matrix Factorization Techniques for Recommender Systems, IEEE Computer ’09
      - SVDPlusPlusFactorizer, SVD++. Koren: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD ’08
      - ALSWRFactorizer, matrix factorization using Alternating Least Squares. Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08; Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08
      - ParallelSGDFactorizer, parallel version of biased matrix factorization (contributed by Peng Cheng). Takács et al.: Scalable Collaborative Filtering Approaches for Large Recommender Systems, JMLR ’09; Niu et al.: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, NIPS ’11
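
To give a feel for what biased matrix factorization with SGD does, here is a rough Python sketch. It is not the code of RatingSGDFactorizer; the update rule follows the commonly described form (global mean plus user bias, item bias and a dot product of factors), and all names, hyperparameters and ratings are assumptions chosen for illustration.

```python
import random


def train_biased_mf(ratings, num_features=2, epochs=200,
                    learning_rate=0.01, regularization=0.02, seed=42):
    """Fit mean + user bias + item bias + p_u . q_i by stochastic gradient descent."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mean = sum(r for _, _, r in ratings) / len(ratings)
    user_bias = {u: 0.0 for u in users}
    item_bias = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(num_features)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(num_features)] for i in items}

    def predict(u, i):
        return mean + user_bias[u] + item_bias[i] + sum(
            pf * qf for pf, qf in zip(p[u], q[i]))

    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the ratings in random order each epoch
        for u, i, r in order:
            err = r - predict(u, i)
            user_bias[u] += learning_rate * (err - regularization * user_bias[u])
            item_bias[i] += learning_rate * (err - regularization * item_bias[i])
            for f in range(num_features):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += learning_rate * (err * qi - regularization * pu)
                q[i][f] += learning_rate * (err * pu - regularization * qi)
    return predict


def rmse(predict, ratings):
    """Root mean squared error of a prediction function on (user, item, rating) triples."""
    return (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# Made-up explicit ratings as (user, item, rating).
ratings = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4),
           ("u2", "i3", 1), ("u3", "i2", 2), ("u3", "i3", 1)]
predict = train_biased_mf(ratings)
print(rmse(predict, ratings))
```

The error printed here is the training error only; judging a factorizer properly needs held-out data.
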
  14. Next directions
      - better tooling for cross-validation and hold-out tests (e.g. time-based splits of interactions)
      - memory-efficient DataModel implementations tailored to specific use cases (e.g. matrix factorization with SGD)
      - better support for computing recommendations for “anonymous” users
      - online recommenders
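
A time-based hold-out split, as mentioned in the first point, could look roughly like the following Python sketch. The tuple layout `(user, item, value, timestamp)` and the function name are assumptions made for this example and do not describe an existing Mahout tool.

```python
def time_based_split(interactions, train_fraction=0.8):
    """Split interactions so that the test part is strictly the most recent part."""
    ordered = sorted(interactions, key=lambda x: x[3])  # oldest first
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Made-up interactions as (user, item, value, timestamp).
log = [
    ("u1", "a", 4.0, 105),
    ("u2", "b", 3.0, 101),
    ("u1", "c", 5.0, 109),
    ("u3", "a", 2.0, 103),
    ("u2", "c", 1.0, 107),
]
train, test = time_based_split(log)
print(len(train), len(test))
```

Unlike a random split, this never trains on interactions that happened after the ones used for evaluation.
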
  15. Usage
      - researchers at TU Berlin and CWI Amsterdam regularly use Mahout for their recommender research published at international conferences
      - “Bayerischer Rundfunk”, one of Germany’s largest public TV broadcasters, uses Mahout to help users discover TV content in its online media library
      - the Berlin-based company plista runs a live contest for the best news recommender algorithm and provides Mahout-based “skeleton code” to participants
      - the Dutch Institute of Sound and Vision runs a web platform that uses Mahout for recommending content from its archive of Dutch audio-visual heritage collections of the 20th century
  16. Parallel processing
  17. Distribution
      - difficult environment:
          - data is partitioned and stored in a distributed filesystem
          - algorithms must be expressed in MapReduce
      - our distributed implementations focus on two popular methods:
          - item-based collaborative filtering
          - matrix factorization with Alternating Least Squares
  18. Scalable neighborhood methods
  19. Cooccurrences
      - start with a simplified view: imagine the interaction matrix $A$ was binary → we look at cooccurrences only
      - item similarity computation becomes the matrix multiplication $S = A^\top A$
      - scale-out of the item-based approach reduces to finding an efficient way to compute this item similarity matrix
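
As a small worked example (the matrix is made up for illustration), take three users and three items with binary interactions:

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \qquad S = A^\top A = \begin{pmatrix} 2 & 2 & 1 \\ 2 & 3 & 2 \\ 1 & 2 & 2 \end{pmatrix}$$

Entry $S_{jk}$ counts the users who interacted with both item $j$ and item $k$. For instance, $S_{12} = 2$ because the first and the third user interacted with both item 1 and item 2, and the diagonal entry $S_{22} = 3$ is simply the number of users who interacted with item 2.
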
  20. Parallelizing $S = A^\top A$
      - the standard approach of computing item cooccurrences requires random access to both users and items:

            foreach item f do
              foreach user i who interacted with f do
                foreach item j that i also interacted with do
                  S_fj = S_fj + 1

        → not efficiently parallelizable on partitioned data
      - the row outer product formulation of matrix multiplication is efficiently parallelizable on a row-partitioned $A$: $S = A^\top A = \sum_{i \in A} a_i a_i^\top$, where $a_i$ is the $i$-th row of $A$ written as a column vector
      - mappers compute the outer products of rows of $A$ and emit the results row-wise, reducers sum these up to form $S$
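
The row outer product formulation can be simulated on a single machine with a few lines of Python. This is only a sketch of the idea under simplifying assumptions (binary data, everything in memory, no combiner); the function names are made up and it is not the code of Mahout's MapReduce jobs.

```python
from collections import defaultdict


def map_row(row_items):
    """Mapper: emit the outer product of one binary row of A, one output row at a time."""
    for j in row_items:
        yield j, {k: 1 for k in row_items}


def reduce_rows(emitted):
    """Reducer: sum all partial rows that share the same row index of S."""
    S = defaultdict(lambda: defaultdict(int))
    for j, partial_row in emitted:
        for k, count in partial_row.items():
            S[j][k] += count
    return S


def cooccurrence(rows):
    """Each row is processed independently, so rows could live on different machines."""
    emitted = (pair for row in rows for pair in map_row(row))
    return reduce_rows(emitted)


# Rows of a binary A, given as the item indices each user interacted with.
A_rows = [[0, 1], [1, 2], [0, 1, 2]]
S = cooccurrence(A_rows)
dense = [[S[j][k] for k in range(3)] for j in range(3)]
print(dense)  # [[2, 2, 1], [2, 3, 2], [1, 2, 2]]
```

Note that a mapper only ever needs one row of A at a time, which is what makes this formulation a good fit for row-partitioned data.
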
  21. Parallel similarity computation: many more details in the implementation
      - support for various similarity measures
      - various optimizations (e.g. for symmetric similarity measures)
      - downsampling of skewed interaction data
      - in-depth description available in: Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce, ACM RecSys 2012
  22. Implementation in Mahout
      - o.a.m.math.hadoop.similarity.cooccurrence.RowSimilarityJob computes the top-k pairwise similarities for each row of a matrix using some similarity measure
      - computes the top-k similar items per item using RowSimilarityJob
      - computes recommendations and similar items using RowSimilarityJob
  23. Scalable Neighborhood Methods: Experiments
      - Setup: 6 machines running Java 7 and Hadoop 1.0.4; two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives per machine
      - Results: on the Yahoo Songs dataset (700M datapoints, 1.8M users, 136K items), the similarity computation takes less than 100 minutes
  24. Scalable matrix factorization
  25. Alternating Least Squares
      - ALS rotates between fixing $U$ and $M$. When $U$ is fixed, the system recomputes $M$ by solving a least-squares problem per item, and vice versa.
      - easy to parallelize, as all users (and vice versa, items) can be recomputed independently
      - additionally, ALS can be applied to use cases with implicit data (pageviews, clicks)
      - dimensions: $A$ (users × items) $\approx$ $U$ (users × $k$) × $M$ ($k$ × items)
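
The per-user least-squares problem can be written out as follows. This is a sketch that assumes a plain L2-regularized squared-error objective over the set $\Omega$ of observed interactions; the regularization weight $\lambda$ is an assumption of this example, and the cited ALS variants differ in details such as how the regularization is weighted.

$$\min_{U, M} \sum_{(i,j) \in \Omega} \left(a_{ij} - u_i^\top m_j\right)^2 + \lambda \left( \sum_i \lVert u_i \rVert^2 + \sum_j \lVert m_j \rVert^2 \right)$$

With $M$ fixed, the objective decouples over users, and each user vector has the closed-form solution

$$u_i = \Big( \sum_{j \in \Omega_i} m_j m_j^\top + \lambda I \Big)^{-1} \sum_{j \in \Omega_i} a_{ij}\, m_j$$

where $\Omega_i$ is the set of items user $i$ interacted with. The update for an item vector $m_j$ with $U$ fixed is symmetric. Each solve only involves a $k \times k$ system, which is why all users (or all items) can be recomputed independently.
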
  26. Scalable Matrix Factorization: Implementation. Recompute the user feature matrix $U$ using a broadcast-join:
      1. run a map-only job using multithreaded mappers
      2. load the item-feature matrix $M$ into memory from HDFS to share it among the individual mappers
      3. mappers read the interaction histories of the users
      4. multithreaded: solve a least squares problem per user to recompute its feature vector

      [Figure: the item features M are broadcast to machine 1, machine 2 and machine 3. On each machine, a map task performs a hash-join of the user histories A with M plus the re-computation, and forwards the resulting user features U locally.]
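
The four steps can be mimicked on one machine with a thread pool. The Python sketch below is a structural illustration only: the shared dictionary stands in for the broadcast item-feature matrix, the small Gaussian-elimination solver stands in for a proper linear algebra library, and the regularization weight `lam` as well as all names are assumptions of this example.

```python
from concurrent.futures import ThreadPoolExecutor


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A must be non-singular)."""
    n = len(b)
    aug = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def recompute_user(history, item_features, lam):
    """Least-squares solve for one user: (sum m m^T + lam I) u = sum a m."""
    k = len(next(iter(item_features.values())))
    lhs = [[lam if r == c else 0.0 for c in range(k)] for r in range(k)]
    rhs = [0.0] * k
    for item, value in history.items():
        m = item_features[item]  # hash-join: look up the item's features in memory
        for r in range(k):
            rhs[r] += value * m[r]
            for c in range(k):
                lhs[r][c] += m[r] * m[c]
    return solve(lhs, rhs)


def recompute_all_users(histories, item_features, lam=0.0, threads=4):
    """'Map-only' pass: item_features is shared read-only by all worker threads."""
    users = list(histories)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        vectors = pool.map(
            lambda u: recompute_user(histories[u], item_features, lam), users)
        return dict(zip(users, vectors))


# Made-up item features (k = 2) and user histories generated from known user vectors.
item_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
histories = {"u1": {"a": 2.0, "b": 3.0, "c": 5.0}, "u2": {"a": 1.0, "c": 1.0}}
print(recompute_all_users(histories, item_features))
```

Because the toy histories were generated exactly from the user vectors (2, 3) and (1, 0), the solver recovers them when `lam` is 0. In practice a positive `lam` keeps the system solvable for users with very short histories.
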
  27. Implementation in Mahout
      - different solvers for explicit and implicit data. Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08; Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08
      - computes recommendations from a factorization
      - in-depth description available in: Sebastian Schelter, Christoph Boden, Martin Schenck, Alexander Alexandrov, Volker Markl: Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins, to appear at ACM RecSys 2013
  28. Scalable Matrix Factorization: Experiments
      - Cluster: 26 machines, two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives each
      - Hadoop configuration: reuse JVMs, used JBlas as solver, run multithreaded mappers
      - Datasets: Netflix (0.5M users, 100M datapoints), Yahoo Songs (1.8M users, 700M datapoints), Bigflix (25M users, 5B datapoints)

      [Figure: average duration per job (seconds) by number of features r (10, 20, 50, 100) for recomputing U and M on Yahoo Songs and Netflix.]

      [Figure: average duration per job (seconds) by number of machines (5 to 25) for recomputing M and U on Bigflix.]
  29. Next directions
      - better tooling for cross-validation and hold-out tests (e.g. to find parameters for ALS)
      - integration of more efficient solver libraries like JBlas
      - it should be easier to modify and adjust the MapReduce code
  30. A selection of users
      - Mendeley, a data platform for researchers (2.5M users, 50M research articles): Mendeley Suggest for discovering relevant research publications
      - ResearchGate, the world’s largest social network for researchers (3M users)
      - a German online retailer with several million customers across Europe
      - German online marketplaces for real estate and pre-owned cars with millions of users
  31. Deployment
  32. “Small data, low load”
      - use GenericItemBasedRecommender or GenericUserBasedRecommender, feed it with interaction data stored in a file, database or key-value store
      - have it load the interaction data in memory and compute recommendations on request
      - collect new interactions into your files or database and periodically refresh the recommender
      - in order to improve performance, try to:
          - have your recommender look at fewer interactions by using SamplingCandidateItemsStrategy
          - cache computed similarities with a CachingItemSimilarity
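
The effect of caching computed similarities can be illustrated with a short Python sketch. It shows the general idea only and is not how CachingItemSimilarity is implemented; the Jaccard measure, the toy data and the function names are assumptions of this example.

```python
from functools import lru_cache

# Made-up data: the users that interacted with each item.
USERS_BY_ITEM = {
    "a": frozenset({"alice", "bob", "dave"}),
    "b": frozenset({"alice", "bob", "carol"}),
    "c": frozenset({"bob", "carol"}),
}


@lru_cache(maxsize=10_000)
def similarity_ordered(first, second):
    """The expensive computation; runs only on a cache miss."""
    a, b = USERS_BY_ITEM[first], USERS_BY_ITEM[second]
    return len(a & b) / len(a | b)


def cached_similarity(item_a, item_b):
    """Symmetric lookup: order the pair so (a, b) and (b, a) share one cache entry."""
    first, second = sorted((item_a, item_b))
    return similarity_ordered(first, second)


print(cached_similarity("a", "c"))
print(cached_similarity("c", "a"))  # answered from the cache
print(similarity_ordered.cache_info())
```

A cache like this has to be cleared whenever the recommender is refreshed with new interactions, otherwise it keeps serving similarities from the old data.
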
  33. “Medium data, high load”
      - Assumption: interaction data still fits into main memory
      - use a recommender that is able to leverage a precomputed model, e.g. GenericItemBasedRecommender or SVDRecommender
      - load the interaction data and the model in memory and compute recommendations on request
      - collect new interactions into your files or database and periodically recompute the model and refresh the recommender
      - use BatchItemSimilarities or ParallelSGDFactorizer for precomputing the model using multiple threads on a single machine
  34. “Lots of data, high load”
      - Assumption: interaction data does not fit into main memory
      - use a recommender that is able to leverage a precomputed model, e.g. GenericItemBasedRecommender or SVDRecommender
      - keep the interaction data in a (potentially partitioned) database or in a key-value store
      - load the model into memory; the recommender will only use one (cacheable) query per recommendation request to retrieve the user’s interaction history
      - collect new interactions into your files or database and periodically recompute the model offline
      - use ItemSimilarityJob or ParallelALSFactorizationJob to precompute the model with Hadoop
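
The serving pattern described here (model in memory, one query for the user's history) can be sketched in Python as follows. The store class is a hypothetical stand-in for a database or key-value store, and the model values are made up; none of the names correspond to Mahout classes.

```python
class CountingHistoryStore:
    """Stand-in for a database or key-value store; counts the queries it receives."""

    def __init__(self, histories):
        self._histories = histories
        self.queries = 0

    def history(self, user):
        self.queries += 1
        return self._histories.get(user, set())


def recommend_from_model(user, store, similar_items, how_many=2):
    """Use the in-memory model plus exactly one store query per request."""
    seen = store.history(user)  # the only access to the interaction data
    scores = {}
    for item in seen:
        for neighbor, similarity in similar_items.get(item, []):
            if neighbor not in seen:
                scores[neighbor] = scores.get(neighbor, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


# Model precomputed offline: top similar items per item (values are made up).
MODEL = {
    "a": [("c", 0.25), ("d", 0.33)],
    "b": [("c", 0.67)],
}
store = CountingHistoryStore({"alice": {"a", "b"}})
print(recommend_from_model("alice", store, MODEL))
print(store.queries)
```

Since the model is small compared to the raw interactions, it can be replicated to every serving machine while the interaction data stays in the external store.
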
  35. “Precompute everything”
      - use RecommenderJob to precompute recommendations for all users with Hadoop
      - directly serve those recommendations
      - successfully employed by Mendeley for their research paper recommender “Suggest”
      - allowed them to run their recommender infrastructure serving 2 million users for less than $100 per month in AWS
  36. Next directions: “Search engine based recommender infrastructure” (work in progress driven by Pat Ferrel)
      - use RowSimilarityJob to find anomalously co-occurring items using Hadoop
      - index those item pairs with a distributed search engine such as Apache Solr
      - query based on a user’s interaction history and the search engine will answer with recommendations
      - gives us an easy-to-use, scalable serving layer for free (Apache Solr)
      - allows complex recommendation queries containing filters, geo-location, etc.
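
The idea of answering recommendation requests with a search query can be sketched with a tiny in-memory inverted index in Python. This is a toy stand-in for a search engine, not Solr's API: scoring is a plain count of matching terms, whereas a real engine applies its own relevance scoring, and all names and data are made up.

```python
from collections import defaultdict


def build_index(indicators):
    """indicators: {item: [items that co-occur anomalously with it]} -> inverted index."""
    index = defaultdict(set)
    for item, terms in indicators.items():
        for term in terms:
            index[term].add(item)
    return index


def query(index, history, exclude_seen=True, item_filter=None, how_many=3):
    """Rank documents (items) by how many history terms hit their indicator field."""
    hits = defaultdict(int)
    for term in history:
        for item in index.get(term, ()):
            hits[item] += 1
    candidates = [
        (item, n) for item, n in hits.items()
        if not (exclude_seen and item in history)
        and (item_filter is None or item_filter(item))
    ]
    candidates.sort(key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in candidates[:how_many]]


# Each item is a "document" whose indicator field lists its co-occurring items.
indicators = {
    "tablet": ["phone", "laptop"],
    "case": ["phone"],
    "mouse": ["laptop"],
    "phone": ["case", "tablet"],
}
index = build_index(indicators)
print(query(index, ["phone", "laptop"]))
print(query(index, ["phone", "laptop"], item_filter=lambda item: item != "case"))
```

The optional `item_filter` hints at why this design is attractive: restrictions such as category or location become ordinary query filters instead of custom recommender code.
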
  37. The shape of things to come
      - MapReduce is not well suited for certain ML use cases, e.g. when the algorithms to apply are iterative and the dataset fits into the aggregate main memory of the cluster
      - Mahout has always stated that it is not tied to Hadoop; however, there were no production-quality alternatives in the past
      - with the advent of YARN and the maturing of alternative systems, this situation is changing and we should embrace this change
      - personally, I would love to see an experimental port of our distributed recommenders to another Apache-supported system such as Spark or Giraph
  38. Thanks for listening! Follow me on Twitter. Join Mahout’s mailing lists. Picture on slide 3 by Tim Abott, picture on slide 21 by Crimson Diabolics.