New Directions in Mahout's Recommenders



  1. New Directions in Mahout’s Recommenders
     Sebastian Schelter, Apache Software Foundation
     Recommender Systems Get-together Berlin
  2. New Directions?
     Mahout in Action is the prime source of information for using Mahout in practice. As it is more than two years old, it is missing a lot of recent developments.
     This talk describes what has been added to the recommenders of Mahout since then.
  3. Single machine recommenders
  4. MyMediaLite, scientific library of recommender system algorithms
     Mahout now features a couple of popular latent factor models, mostly ported by Zeno Gantner.
  5. New recommenders and factorizers
     - BiasedItemBasedRecommender, item-based kNN with user-item-bias estimation
       Koren: Factor in the Neighbors: Scalable and Accurate Collaborative Filtering, TKDD ’09
     - RatingSGDFactorizer, biased matrix factorization
       Koren et al.: Matrix Factorization Techniques for Recommender Systems, IEEE Computer ’09
     - SVDPlusPlusFactorizer, SVD++
       Koren: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD ’08
     - ALSWRFactorizer, matrix factorization using Alternating Least Squares
       Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08
       Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08
  6. Batch Item-Similarities on a single machine
     Simple but powerful way to deploy Mahout: use item-based collaborative filtering with periodically precomputed item similarities.
     Mahout now supports multithreaded item similarity computation on a single machine for data sizes that don’t require a Hadoop-based solution.

         // k, numThreads, maxDurationInHours and resultFile are supplied by the caller
         DataModel dataModel = new FileDataModel(new File("movielens.csv"));
         ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
         ItemBasedRecommender recommender =
             new GenericItemBasedRecommender(dataModel, similarity);
         // precompute the k most similar items per item
         BatchItemSimilarities batch =
             new MultithreadedBatchItemSimilarities(recommender, k);
         // run with numThreads threads and write the results to resultFile
         batch.computeItemSimilarities(numThreads, maxDurationInHours,
             new FileSimilarItemsWriter(resultFile));
  7. Parallel processing
  8. Collaborative Filtering
     - idea: infer recommendations from patterns found in the historical user-item interactions
     - data can be explicit feedback (ratings) or implicit feedback (clicks, pageviews), represented in the interaction matrix $A$

                item1   ···   item3   ···
         user1    3     ···     4     ···
         user2    −     ···     4     ···
         user3    5     ···     1     ···
          ···    ···    ···    ···    ···

     - row $a_i$ denotes the interaction history of user $i$
     - we target use cases with millions of users and hundreds of millions of interactions
  9. MapReduce
     - paradigm for data-intensive parallel processing
     - data is partitioned in a distributed file system
     - computation is moved to data
     - system handles distribution, execution, scheduling, failures
     - fixed processing pipeline where user specifies two functions:
       map : (k1, v1) → list(k2, v2)
       reduce : (k2, list(v2)) → list(v2)
     [Figure: DFS input → map → shuffle → reduce → DFS output]
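
The two-function contract can be imitated in memory. The following is a minimal single-JVM sketch, not Hadoop code: the class `MapReduceSketch`, its `run` helper and the example job `interactionsPerItem` are hypothetical names introduced here for illustration, and the "shuffle" is just a grouping into a hash map.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical in-memory imitation of the map/shuffle/reduce pipeline. */
class MapReduceSketch {

  static <K1, V1, K2, V2> Map<K2, List<V2>> run(
      Map<K1, V1> input,
      BiFunction<K1, V1, List<Map.Entry<K2, V2>>> map,
      BiFunction<K2, List<V2>, List<V2>> reduce) {
    // map phase followed by the shuffle: group all emitted values by key
    Map<K2, List<V2>> shuffled = new HashMap<>();
    for (Map.Entry<K1, V1> record : input.entrySet()) {
      for (Map.Entry<K2, V2> emitted : map.apply(record.getKey(), record.getValue())) {
        shuffled.computeIfAbsent(emitted.getKey(), key -> new ArrayList<>())
            .add(emitted.getValue());
      }
    }
    // reduce phase: one call per key with the list of its values
    Map<K2, List<V2>> output = new HashMap<>();
    for (Map.Entry<K2, List<V2>> group : shuffled.entrySet()) {
      output.put(group.getKey(), reduce.apply(group.getKey(), group.getValue()));
    }
    return output;
  }

  /** Example job: count how many users interacted with each item. */
  static Map<String, List<Integer>> interactionsPerItem(Map<String, List<String>> histories) {
    return MapReduceSketch.<String, List<String>, String, Integer>run(
        histories,
        (user, items) -> {
          List<Map.Entry<String, Integer>> out = new ArrayList<>();
          for (String item : items) {
            out.add(new SimpleEntry<>(item, 1)); // emit (item, 1) per interaction
          }
          return out;
        },
        (item, ones) ->
            Collections.singletonList(ones.stream().mapToInt(Integer::intValue).sum()));
  }
}
```

In a real cluster the grouping step is the expensive part, because it moves data between machines; the cost discussion later in the talk is about exactly that step.
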
  10. Scalable neighborhood methods
  11. Neighborhood Methods
      Item-Based Collaborative Filtering is one of the most deployed CF algorithms, because:
      - simple and intuitively understandable
      - additionally gives non-personalized, per-item recommendations (people who like X might also like Y)
      - recommendations for new users without model retraining
      - comprehensible explanations (we recommend Y because you liked X)
  12. Cooccurrences
      - start with a simplified view: imagine interaction matrix $A$ was binary
        → we look at cooccurrences only
      - item similarity computation becomes matrix multiplication
        $r_i = (A^\top A)\, a_i$
      - scale-out of the item-based approach reduces to finding an efficient way to compute the item similarity matrix
        $S = A^\top A$
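
A toy example may make the matrix view concrete. The sketch below assumes a dense binary `int[][]` matrix with users as rows and items as columns; the class and method names are made up for illustration and have nothing to do with Mahout's implementation.

```java
import java.util.Arrays;

/** Toy illustration of S = A^T A and r_i = S a_i on a small binary matrix. */
class CooccurrenceSketch {

  /** S[f][j] is the dot product of item columns f and j, i.e. their cooccurrence count. */
  static int[][] cooccurrences(int[][] a) {
    int numItems = a[0].length;
    int[][] s = new int[numItems][numItems];
    for (int f = 0; f < numItems; f++) {
      for (int j = 0; j < numItems; j++) {
        for (int[] userRow : a) {
          s[f][j] += userRow[f] * userRow[j];
        }
      }
    }
    return s;
  }

  /** r = S a_i: unnormalized preference scores of one user for all items. */
  static int[] scores(int[][] s, int[] userRow) {
    int[] r = new int[s.length];
    for (int f = 0; f < s.length; f++) {
      for (int j = 0; j < s.length; j++) {
        r[f] += s[f][j] * userRow[j];
      }
    }
    return r;
  }

  public static void main(String[] args) {
    int[][] a = {
      {1, 1, 0}, // user 0 interacted with items 0 and 1
      {0, 1, 1},
      {1, 1, 1}
    };
    int[][] s = cooccurrences(a);
    System.out.println(Arrays.deepToString(s)); // [[2, 2, 1], [2, 3, 2], [1, 2, 2]]
    System.out.println(Arrays.toString(scores(s, a[0]))); // [4, 5, 3]
  }
}
```

A recommender would typically drop the items the user already interacted with before ranking by score; that filtering is left out here.
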
  13. Parallelizing $S = A^\top A$
      - standard approach of computing item cooccurrences requires random access to both users and items

            foreach item f do
              foreach user i who interacted with f do
                foreach item j that i also interacted with do
                  S_fj = S_fj + 1

        → not efficiently parallelizable on partitioned data
      - row outer product formulation of matrix multiplication is efficiently parallelizable on a row-partitioned $A$
        $S = A^\top A = \sum_{i \in A} a_i a_i^\top$
      - mappers compute the outer products of rows of $A$, emit the results row-wise, reducers sum these up to form $S$
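
The row outer product formulation can be imitated on one machine as follows. This is a sketch under simplifying assumptions: user histories are plain `int[]` lists of item ids, every emitted pair carries an implicit count of 1, and `OuterProductSketch` with its `map` and `reduce` methods are hypothetical names, not Mahout or Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory imitation of S = sum_i a_i a_i^T for binary interactions. */
class OuterProductSketch {

  /** "Mapper": needs only one user's row and emits all (f, j) pairs of its outer product. */
  static List<int[]> map(int[] itemsOfUser) {
    List<int[]> emitted = new ArrayList<>();
    for (int f : itemsOfUser) {
      for (int j : itemsOfUser) {
        emitted.add(new int[] {f, j});
      }
    }
    return emitted;
  }

  /** "Reducer": sums the emitted pairs into the sparse matrix S, keyed by row f. */
  static Map<Integer, Map<Integer, Integer>> reduce(List<int[]> emitted) {
    Map<Integer, Map<Integer, Integer>> s = new TreeMap<>();
    for (int[] pair : emitted) {
      s.computeIfAbsent(pair[0], row -> new TreeMap<>()).merge(pair[1], 1, Integer::sum);
    }
    return s;
  }

  public static void main(String[] args) {
    int[][] histories = {{0, 1}, {1, 2}, {0, 1, 2}};
    List<int[]> shuffled = new ArrayList<>();
    for (int[] history : histories) {
      shuffled.addAll(map(history)); // each row is processed independently of all others
    }
    System.out.println(reduce(shuffled));
  }
}
```

The point of the formulation is visible in `map`: it never looks at another user's data, so the rows of $A$ can live on different machines.
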
  14. Parallel similarity computation
      - real datasets are not binary and we want to use a variety of similarity measures, e.g. Pearson correlation
      - express similarity measures by 3 canonical functions, which can be efficiently embedded into the computation (cf. VectorSimilarityMeasure)
      - preprocess adjusts an item rating vector
        $\hat{f} = \mathrm{preprocess}(f) \qquad \hat{j} = \mathrm{preprocess}(j)$
      - norm computes a single number from the adjusted vector
        $n_f = \mathrm{norm}(\hat{f}) \qquad n_j = \mathrm{norm}(\hat{j})$
      - similarity computes the similarity of two vectors from the norms and their dot product
        $S_{fj} = \mathrm{similarity}(\mathrm{dot}_{fj}, n_f, n_j)$
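
As an illustration of the three functions, cosine similarity could be expressed roughly like this. The interface `MeasureSketch` is a hypothetical stand-in written for this example; it only mirrors the function names on the slide and is not the signature of Mahout's `VectorSimilarityMeasure`. Missing ratings are assumed to be stored as 0.

```java
/** Hypothetical interface with the three functions named on the slide. */
interface MeasureSketch {
  double[] preprocess(double[] itemVector);
  double norm(double[] preprocessed);
  double similarity(double dot, double normF, double normJ);
}

/** Cosine similarity expressed through the three functions. */
class CosineSketch implements MeasureSketch {
  @Override
  public double[] preprocess(double[] itemVector) {
    return itemVector.clone(); // cosine needs no adjustment of the ratings
  }

  @Override
  public double norm(double[] preprocessed) {
    double sumOfSquares = 0;
    for (double x : preprocessed) {
      sumOfSquares += x * x;
    }
    return Math.sqrt(sumOfSquares); // Euclidean length
  }

  @Override
  public double similarity(double dot, double normF, double normJ) {
    return dot / (normF * normJ);
  }
}

class SimilaritySketch {
  static double dot(double[] x, double[] y) {
    double sum = 0;
    for (int u = 0; u < x.length; u++) {
      sum += x[u] * y[u];
    }
    return sum;
  }

  /** Applies preprocess and norm per item, then similarity on the dot product. */
  static double similarity(MeasureSketch measure, double[] f, double[] j) {
    double[] fHat = measure.preprocess(f);
    double[] jHat = measure.preprocess(j);
    return measure.similarity(dot(fHat, jHat), measure.norm(fHat), measure.norm(jHat));
  }

  public static void main(String[] args) {
    double[] f = {3, 0, 4};
    double[] j = {3, 0, 4};
    System.out.println(similarity(new CosineSketch(), f, j));
  }
}
```

The split matters for the parallel setting: `preprocess` and `norm` need only one item vector, while `similarity` needs only three numbers per item pair, so neither requires both full vectors in one place.
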
  15. Example: Jaccard coefficient
      - preprocess binarizes the rating vectors
        $f = (3, -, 5)^\top \quad j = (4, 4, 1)^\top \quad \hat{f} = \mathrm{bin}(f) = (1, 0, 1)^\top \quad \hat{j} = \mathrm{bin}(j) = (1, 1, 1)^\top$
      - norm computes the number of users that rated each item
        $n_f = \|\hat{f}\|_1 = 2 \qquad n_j = \|\hat{j}\|_1 = 3$
      - similarity finally computes the Jaccard coefficient from the norms and the dot product of the vectors
        $\mathrm{jaccard}(f, j) = \frac{|f \cap j|}{|f \cup j|} = \frac{\mathrm{dot}_{fj}}{n_f + n_j - \mathrm{dot}_{fj}} = \frac{2}{2 + 3 - 2} = \frac{2}{3}$
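
The same numbers can be reproduced with a few lines of code. In this sketch the missing rating is encoded as 0, which is an assumption of the example; `JaccardSketch` and its methods are illustrative names only.

```java
/** Jaccard coefficient split into preprocess, norm and similarity. */
class JaccardSketch {

  /** Binarizes a rating vector; a 0 entry is treated as "no rating". */
  static int[] preprocess(double[] ratings) {
    int[] binarized = new int[ratings.length];
    for (int u = 0; u < ratings.length; u++) {
      binarized[u] = ratings[u] != 0 ? 1 : 0;
    }
    return binarized;
  }

  /** L1 norm of a binary vector: the number of users that rated the item. */
  static int norm(int[] binarized) {
    int count = 0;
    for (int b : binarized) {
      count += b;
    }
    return count;
  }

  /** Intersection size divided by union size, using only the dot product and the norms. */
  static double similarity(int dot, int normF, int normJ) {
    int union = normF + normJ - dot;
    return union == 0 ? 0.0 : (double) dot / union;
  }

  static double jaccard(double[] f, double[] j) {
    int[] fHat = preprocess(f);
    int[] jHat = preprocess(j);
    int dot = 0;
    for (int u = 0; u < fHat.length; u++) {
      dot += fHat[u] * jHat[u];
    }
    return similarity(dot, norm(fHat), norm(jHat));
  }

  public static void main(String[] args) {
    // the vectors from the slide, with the missing rating stored as 0
    System.out.println(jaccard(new double[] {3, 0, 5}, new double[] {4, 4, 1}));
  }
}
```
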
  16. Implementation in Mahout
      - o.a.m.math.hadoop.similarity.cooccurrence.RowSimilarityJob computes the top-k pairwise similarities for each row of a matrix using some similarity measure
      - […] the top-k similar items per item
      - […] recommendations and similar items using RowSimilarityJob
  17. MapReduce pass 1
      - data partitioned by items (row-partitioned $A^\top$)
      - invokes preprocess and norm for each item vector
      - transposes input to form $A$
      [Figure: map, combine, shuffle and reduce stages of pass 1, turning $A^\top$ (pointing from items to users) into the binarized $A$ (pointing from users to items) and the item “norms”]
  18. MapReduce pass 2
      - data partitioned by users (row-partitioned $A$)
      - computes dot products of columns
      - loads norms and invokes similarity
      - implementation contains several optimizations (sparsification, exploit symmetry and thresholds)
      [Figure: map, combine, shuffle and reduce stages of pass 2, combining the binarized $A$ and the item “norms” into “$A^\top A$” holding item similarities]
  19. Cost of the algorithm
      - major cost in our algorithm is the communication in the second MapReduce pass: for each user, we have to process the square of the number of their interactions
        $S = \sum_{i \in A} a_i a_i^\top$
        → cost is dominated by the densest rows of $A$ (the users with the highest number of interactions)
      - distribution of interactions per user is usually heavy tailed
        → a small number of power users with a disproportionately high number of interactions drastically increases the runtime
      - if a user has more than $p$ interactions, only use a random sample of size $p$ of their interactions
      - saw negligible effect on prediction quality for moderate $p$
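
The sampling step is simple to sketch. The code below assumes a user history is a list of item ids and uses uniform sampling without replacement via a shuffle; whether Mahout samples in exactly this way is not stated on the slide, and `InteractionCutSketch` is a name invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Caps the history of power users at p interactions. */
class InteractionCutSketch {

  /** Returns the history itself if it has at most p entries, else a random sample of size p. */
  static List<Integer> sample(List<Integer> history, int p, Random random) {
    if (history.size() <= p) {
      return history;
    }
    List<Integer> shuffled = new ArrayList<>(history);
    Collections.shuffle(shuffled, random);
    return new ArrayList<>(shuffled.subList(0, p));
  }

  /** Number of (f, j) pairs the outer product of one row of this size produces. */
  static long emittedPairs(int historySize) {
    return (long) historySize * historySize;
  }

  public static void main(String[] args) {
    List<Integer> powerUser = new ArrayList<>();
    for (int item = 0; item < 10_000; item++) {
      powerUser.add(item);
    }
    List<Integer> capped = sample(powerUser, 500, new Random(42));
    System.out.println(emittedPairs(powerUser.size()) + " pairs before the cut");
    System.out.println(emittedPairs(capped.size()) + " pairs after the cut");
  }
}
```

Because the per-user cost is quadratic, cutting a history from 10,000 to 500 interactions shrinks that user's contribution by a factor of 400, while users below the threshold are untouched.
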
  20. Scalable Neighborhood Methods: Experiments
      Setup
      - 26 machines running Java 7 and Hadoop 1.0.4
      - two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives per machine
      Results
      - Yahoo Songs dataset (700M datapoints, 1.8M users, 136K items), 26 machines, similarity computation takes less than 40 minutes
  21. Scalable matrix factorization
  22. Latent factor models: idea
      - interactions are deeply influenced by a set of factors that are very specific to the domain (e.g. amount of action or complexity of characters in movies)
      - these factors are in general not obvious; we might be able to think of some of them, but it’s hard to estimate their impact on the interactions
      - need to infer those so-called latent factors from the interaction data
  23. Low-rank matrix factorization
      - approximately factor $A$ into the product of two rank $r$ feature matrices $U$ and $M$ such that $A \approx UM$
      - $U$ models the latent features of the users, $M$ models the latent features of the items
      - dot product $u_i^\top m_j$ in the latent feature space predicts strength of interactions between user $i$ and item $j$
      - to obtain a factorization, minimize regularized squared error over the observed interactions, e.g.:
        $\min_{U,M} \sum_{(i,j) \in A} (a_{ij} - u_i^\top m_j)^2 + \lambda \left( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2 \right)$
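
To see what the objective measures, it can be evaluated directly on a toy factorization. The sketch assumes that unobserved entries of $A$ are stored as `NaN`, that the item features are kept as one row per item (the transpose of $M$), and that $n_{u_i}$ and $n_{m_j}$ count the observed interactions of user $i$ and item $j$; `AlsObjectiveSketch` is an illustrative name.

```java
/** Evaluates the regularized squared error of a factorization on observed entries only. */
class AlsObjectiveSketch {

  /** Predicted interaction strength: dot product in the latent feature space. */
  static double predict(double[] userFeatures, double[] itemFeatures) {
    double sum = 0;
    for (int d = 0; d < userFeatures.length; d++) {
      sum += userFeatures[d] * itemFeatures[d];
    }
    return sum;
  }

  /** a[i][j] is NaN for unobserved entries; u[i] and m[j] are the feature vectors. */
  static double objective(double[][] a, double[][] u, double[][] m, double lambda) {
    double squaredError = 0;
    int[] interactionsOfUser = new int[u.length];
    int[] interactionsOfItem = new int[m.length];
    for (int i = 0; i < a.length; i++) {
      for (int j = 0; j < a[i].length; j++) {
        if (!Double.isNaN(a[i][j])) {
          double error = a[i][j] - predict(u[i], m[j]);
          squaredError += error * error;
          interactionsOfUser[i]++;
          interactionsOfItem[j]++;
        }
      }
    }
    // each squared norm is weighted by the number of observed interactions
    double penalty = 0;
    for (int i = 0; i < u.length; i++) {
      penalty += interactionsOfUser[i] * predict(u[i], u[i]);
    }
    for (int j = 0; j < m.length; j++) {
      penalty += interactionsOfItem[j] * predict(m[j], m[j]);
    }
    return squaredError + lambda * penalty;
  }

  public static void main(String[] args) {
    double[][] a = {{2, Double.NaN}, {Double.NaN, 3}};
    double[][] u = {{1}, {1}};
    double[][] m = {{2}, {3}};
    System.out.println(objective(a, u, m, 0.1));
  }
}
```
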
  24. Alternating Least Squares
      - ALS rotates between fixing $U$ and $M$. When $U$ is fixed, the system recomputes $M$ by solving a least-squares problem per item, and vice versa.
      - easy to parallelize, as all users (and vice versa, items) can be recomputed independently
      - additionally, ALS is able to solve non-sparse models from implicit data
      [Figure: $A$ ($u \times i$) ≈ $U$ ($u \times k$) × $M$ ($k \times i$)]
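
For fixed $M$, the per-user problem has a closed-form solution. The derivation below is a sketch for the explicit-feedback case with a squared-error objective whose regularization is weighted by the number of interactions, as in Zhou et al.; the symbols $I_i$ (the set of items user $i$ interacted with, so that $n_{u_i} = |I_i|$) and $E_k$ (the $k \times k$ identity matrix) are notation introduced here and do not appear on the slides.

```latex
\begin{align*}
u_i &= \arg\min_{u} \sum_{j \in I_i} (a_{ij} - u^\top m_j)^2 + \lambda\, n_{u_i} \|u\|^2 \\
0 &= -2 \sum_{j \in I_i} (a_{ij} - u_i^\top m_j)\, m_j + 2 \lambda\, n_{u_i}\, u_i \\
u_i &= \Big( \sum_{j \in I_i} m_j m_j^\top + \lambda\, n_{u_i} E_k \Big)^{-1} \sum_{j \in I_i} a_{ij}\, m_j
\end{align*}
```

Each $u_i$ depends only on the feature vectors of the items in $I_i$, which is why all users can be recomputed independently; the update for an item vector $m_j$ is symmetric, with the roles of users and items exchanged.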
  25. Implementation in Mahout
      - […] a factorization using Alternating Least Squares, has different solvers for explicit and implicit data
        Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08
        Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08
      - […] computes the prediction error of a factorization on a test set
      - […] computes recommendations from a factorization
  26. Scalable Matrix Factorization: Implementation
      Recompute user feature matrix $U$ using a broadcast-join:
      1. run a map-only job using multithreaded mappers
      2. load item-feature matrix $M$ into memory from HDFS to share it among the individual mappers
      3. mappers read the interaction histories of the users
      4. multithreaded: solve a least squares problem per user to recompute its feature vector
      [Figure: on each of machines 1 to 3, a map task performs a hash-join and re-computation; user histories $A$ and user features $U$ are forwarded locally, item features $M$ are broadcast]
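
The four steps can be imitated on a single machine. The sketch below is not Mahout's job: it keeps the item features in a shared in-memory array (standing in for the broadcast copy of $M$), uses a parallel stream in place of multithreaded mappers, and solves each user's regularized normal equations $(\sum_j m_j m_j^\top + \lambda n E)\, u = \sum_j a_j m_j$ with plain Gaussian elimination. All names are hypothetical, and every history is assumed to be non-empty with $\lambda > 0$ so that the system is solvable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Single-machine imitation of recomputing U with a broadcast-join. */
class BroadcastJoinSketch {

  /** Least squares problem of one user: history maps item id to interaction value. */
  static double[] solveUser(Map<Integer, Double> history, double[][] itemFeatures, double lambda) {
    int k = itemFeatures[0].length;
    double[][] lhs = new double[k][k];
    double[] rhs = new double[k];
    for (Map.Entry<Integer, Double> interaction : history.entrySet()) {
      double[] m = itemFeatures[interaction.getKey()]; // the "hash-join" lookup into M
      for (int r = 0; r < k; r++) {
        rhs[r] += interaction.getValue() * m[r];
        for (int c = 0; c < k; c++) {
          lhs[r][c] += m[r] * m[c];
        }
      }
    }
    for (int d = 0; d < k; d++) {
      lhs[d][d] += lambda * history.size(); // regularization weighted by number of interactions
    }
    return gaussianElimination(lhs, rhs);
  }

  /** Solves a x = b in place with partial pivoting and back substitution. */
  static double[] gaussianElimination(double[][] a, double[] b) {
    int n = b.length;
    for (int col = 0; col < n; col++) {
      int pivot = col;
      for (int row = col + 1; row < n; row++) {
        if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
          pivot = row;
        }
      }
      double[] swapRow = a[col]; a[col] = a[pivot]; a[pivot] = swapRow;
      double swapValue = b[col]; b[col] = b[pivot]; b[pivot] = swapValue;
      for (int row = col + 1; row < n; row++) {
        double factor = a[row][col] / a[col][col];
        b[row] -= factor * b[col];
        for (int c = col; c < n; c++) {
          a[row][c] -= factor * a[col][c];
        }
      }
    }
    double[] x = new double[n];
    for (int row = n - 1; row >= 0; row--) {
      double sum = b[row];
      for (int c = row + 1; c < n; c++) {
        sum -= a[row][c] * x[c];
      }
      x[row] = sum / a[row][row];
    }
    return x;
  }

  /** All users are independent, so they are recomputed in parallel against the shared M. */
  static Map<Integer, double[]> recomputeUsers(
      Map<Integer, Map<Integer, Double>> histories, double[][] itemFeatures, double lambda) {
    Map<Integer, double[]> userFeatures = new ConcurrentHashMap<>();
    histories.entrySet().parallelStream().forEach(user ->
        userFeatures.put(user.getKey(), solveUser(user.getValue(), itemFeatures, lambda)));
    return userFeatures;
  }

  public static void main(String[] args) {
    double[][] itemFeatures = {{1, 0}, {0, 1}};
    Map<Integer, Double> history = new HashMap<>();
    history.put(0, 2.0);
    history.put(1, 3.0);
    Map<Integer, Map<Integer, Double>> histories = new HashMap<>();
    histories.put(7, history);
    System.out.println(java.util.Arrays.toString(recomputeUsers(histories, itemFeatures, 0.5).get(7)));
  }
}
```

The design choice the slide describes is that only $M$, which is small enough to fit in memory, has to be shipped to every machine; the much larger user histories are read where they are stored and never shuffled.
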
  27. Scalable Matrix Factorization: Experiments
      Setup
      - 26 machines running Java 7 and Hadoop 1.0.4
      - two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives per machine
      - configured Hadoop to reuse JVMs, ran multithreaded mappers
      Results
      - Yahoo Songs dataset (700M datapoints), 26 machines, single iteration (two map-only jobs) takes less than 2 minutes
  28. Thanks for listening!
      Follow me on twitter at […]
      Mahout’s mailinglists at […]
      […] on slide 3 by Tim Abott, on slide 21 by Crimson Diabolics, […]