A QUICK TUTORIAL ON MAHOUT’S RECOMMENDATION ENGINE (V 0.4)
Jee Vang, Ph.D.
A Quick Tutorial on Mahout's Recommendation Engine is licensed under a Creative Commons Attribution 3.0 Unported License.
Slide Version 3.1
What is recommendation?
  • Recommendation is the prediction of which new items a user would like or dislike, based on the user's preferences for, or associations with, previous items
  • (Made-up) Example: a user, John Doe, likes the following books (items):
      A Tale of Two Cities
      The Great Gatsby
      For Whom the Bell Tolls
  • Recommendation predicts which new books (items) John Doe will like:
      Jane Eyre
      The Adventures of Tom Sawyer
What is Mahout?
  • Mahout is a machine learning application programming interface (API) built on Hadoop
      MapReduce (MR or M/R)
      Hadoop Distributed File System (HDFS)
  • Mahout is written in Java
  • Mahout has machine learning algorithms in the following areas:
      Clustering
      Pattern mining
      Classification
      Regression
      Evolutionary algorithms
      Recommenders/Collaborative filtering
How does Mahout’s Recommendation Engine Work?
  S x U = R
  • S is the similarity matrix between items
  • U is the user’s preferences for items
  • R is the predicted recommendations
What is the similarity matrix, S?
  • S is an n x n (square) matrix
  • Each element, e, in S is indexed by row (j) and column (k): e_jk
  • Each e_jk in S holds a value that describes how similar the j-th and k-th items are
  • In this example, the similarity of the j-th and k-th items is determined by the frequency of their co-occurrence (when the j-th item is seen, the k-th item is seen as well)
  • In general, any similarity measure may be used to produce these values
  • We see in this example that Items 1 and 2 co-occur 3 times, Items 1 and 3 co-occur 4 times, and so on…
  [Figure: the 7 x 7 matrix S, with rows and columns labeled Item 1 through Item 7; cell values not preserved]
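As a rough illustration of co-occurrence counting, the sketch below builds a small S in plain Python. The function name, the 0-based item indices, and the sample data are all hypothetical, and the diagonal is simply left at zero, which is an assumption rather than something the slide specifies.

```python
from itertools import combinations

def cooccurrence_matrix(baskets, n_items):
    """Count, for every pair of distinct items, how many users have both."""
    s = [[0] * n_items for _ in range(n_items)]
    for items in baskets:
        # Each unordered pair of items seen by the same user co-occurs once.
        for j, k in combinations(sorted(set(items)), 2):
            s[j][k] += 1
            s[k][j] += 1  # co-occurrence is symmetric
    return s

# Hypothetical data: each inner list holds the item indices one user has seen.
baskets = [[0, 1, 2], [0, 2], [0, 2], [1, 2]]
S = cooccurrence_matrix(baskets, 3)
```

Here items 0 and 2 appear together for three users, so `S[0][2]` and `S[2][0]` are both 3.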
What are the user’s preferences, U?
  • The user’s preferences are represented as a column vector
  • Each value in the vector represents the user’s preference for the j-th item
  • In general, this column vector is sparse
  • A value of zero (0) represents no recorded preference for the j-th item
  [Figure: the column vector U, with rows labeled Item 1 through Item 7; values not preserved]
What is the recommendation, R?
  • R is a column vector whose j-th value is the predicted recommendation score of the j-th item for the user
  • R is computed by multiplying S and U: S x U = R
  • In this running example, the user has already expressed positive preferences for Items 1, 4, 5, and 7, so we look only at Items 2, 3, and 6
  • We would recommend Items 3, 2, and 6 to the user, in this order
  [Figure: the column vector R, with rows labeled Item 1 through Item 7; values not preserved]
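The multiplication and ranking can be sketched in a few lines of Python. The matrix and vector values below are made up for illustration (the slide's own numbers are not available), and the function name is hypothetical.

```python
def recommend(S, U):
    """Compute R = S x U, then rank the items the user has no preference for."""
    R = [sum(s_jk * u_k for s_jk, u_k in zip(row, U)) for row in S]
    # Only items with no recorded preference (0) are candidates.
    candidates = [j for j, u in enumerate(U) if u == 0]
    ranked = sorted(candidates, key=lambda j: R[j], reverse=True)
    return ranked, R

# Hypothetical 4-item example (0-based indices).
S = [[0, 1, 3, 2],
     [1, 0, 2, 0],
     [3, 2, 0, 1],
     [2, 0, 1, 0]]
U = [2.0, 0, 0, 4.0]  # the user has preferences for items 0 and 3 only
ranked, R = recommend(S, U)
```

With these values item 2 scores higher than item 1, so it is recommended first.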
What data format does Mahout’s recommendation engine expect?
  • For Mahout v0.4, look at RecommenderJob (org.apache.mahout.cf.taste.hadoop.item.RecommenderJob)
  • Each line of the input file should have the following format:
      userID,itemID[,preferencevalue]
  • userID is parsed as a long
  • itemID is parsed as a long
  • preferencevalue is parsed as a double and is optional
  Format 1 (without preference values):
      123,345
      123,456
      123,789
      …
      789,458
  Format 2 (with preference values):
      123,345,1.0
      123,456,2.2
      123,789,3.4
      …
      789,458,1.2
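A minimal parser for this line format might look like the following Python sketch. The function name is hypothetical, and the default of 1.0 for a missing preference value is an assumption made for illustration.

```python
def parse_preference(line, default=1.0):
    """Parse 'userID,itemID[,preferencevalue]' into (int, int, float)."""
    fields = line.strip().split(",")
    user_id = int(fields[0])   # a long in Java; Python ints are unbounded
    item_id = int(fields[1])
    # The preference value is optional; the default used here is an assumption.
    pref = float(fields[2]) if len(fields) > 2 else default
    return user_id, item_id, pref

without_value = parse_preference("123,345")
with_value = parse_preference("123,456,2.2\n")
```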
How do you run Mahout’s recommendation engine?
  • Requirements
      Hadoop cluster on GNU/Linux
      Java 1.6.x
      SSH
  • Assuming you have a Hadoop cluster installed and configured correctly, with the data loaded into HDFS, run:
      $HADOOP_INSTALL$/bin/hadoop jar $TARGET$/mahout-core-0.4-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=$INPUT$ -Dmapred.output.dir=$OUTPUT$
  • $HADOOP_INSTALL$ is the location where you installed Hadoop
  • $TARGET$ is the directory that holds the Mahout jar file
  • $INPUT$ is the HDFS path of the input file
  • $OUTPUT$ is the HDFS path where the output is written
  • There are plenty of runtime options (check the javadocs):
      --userFile (path): optional; a file containing userIDs; recommendations are computed only for these userIDs
      --itemsFile (path): optional; a file containing itemIDs; only these items are used in the recommendation predictions
      --numRecommendations (integer): number of recommendations to compute per user; default 10
      --booleanData (boolean): treat input data as having no preference values; default false
      --maxPrefsPerUser (integer): maximum number of preferences considered per user in the final recommendation phase; default 10
      --similarityClassname (classname): similarity measure (co-occurrence, Euclidean, log-likelihood, Pearson, Tanimoto coefficient, uncentered cosine, cosine)
What are the mechanics of Mahout’s recommendation engine?
  • Mahout is built on Hadoop’s MapReduce (MR) API
      map: <K1,V1> -> <K2,V2>
      reduce: <K2,List(V2)> -> <K3,V3>
  • A series of MR phases (jobs) is run to accomplish the task of predicting recommendations
      ItemIDIndexMapper, ItemIDIndexReducer
      ToItemPrefsMapper, ToUserVectorReducer
      CountUsersMapper, CountUsersReducer
      …
      PartialMultiplyMapper, AggregateAndRecommendReducer
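To make the map and reduce signatures concrete, here is a small in-memory sketch in Python. It is not Hadoop code: `run_map_reduce`, `count_mapper`, and `count_reducer` are hypothetical names, and the example job (counting preferences per user) is chosen only to show how keys and values flow through one phase.

```python
from collections import defaultdict

def run_map_reduce(records, mapper, reducer):
    """Run one map/reduce phase in memory: map, group by key, then reduce."""
    groups = defaultdict(list)
    for k1, v1 in records:                      # input pairs <K1,V1>
        for k2, v2 in mapper(k1, v1):           # map emits <K2,V2>
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):                   # reduce sees <K2,List(V2)>
        output.extend(reducer(k2, groups[k2]))  # reduce emits <K3,V3>
    return output

def count_mapper(offset, line):
    # Key is the line's byte offset (ignored); value is the line text.
    yield int(line.split(",")[0]), 1

def count_reducer(user_id, ones):
    yield user_id, sum(ones)

records = [(0, "123,345"), (8, "123,456"), (16, "789,458")]
prefs_per_user = run_map_reduce(records, count_mapper, count_reducer)
```

The output of one phase can be fed in as the `records` of the next, which is how a chain of jobs is composed.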
Mahout’s Recommender Engine: Phase 1, Generate List of ItemIDs
  Mapper: ItemIDIndexMapper
  • Input: <LongWritable,Text>
  • Output: <VarIntWritable,VarLongWritable>
  • Parses out the itemID as a long, itemID_long
  • Converts itemID_long to an int, itemID_int
  • Emits <itemID_int, itemID_long>
  Reducer: ItemIDIndexReducer
  • Input: <VarIntWritable,List(VarLongWritable)>
  • Output: <VarIntWritable,VarLongWritable>
  • Finds the smallest value in the list of values, itemID_long_min
  • Emits <itemID_int, itemID_long_min>
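The reason the reducer takes a minimum is that several long itemIDs can map to the same int. The Python sketch below illustrates this with a folding hash; that particular hash is an assumption chosen for illustration and is not confirmed by the slides, and both function names are hypothetical. It assumes non-negative itemIDs.

```python
def id_to_index(item_id):
    """Fold a 64-bit ID into a non-negative 31-bit index (illustrative hash)."""
    low = item_id & 0xFFFFFFFF
    high = (item_id >> 32) & 0xFFFFFFFF
    return 0x7FFFFFFF & (low ^ high)

def build_item_index(item_ids):
    """Map step: emit (index, id). Reduce step: keep the smallest id per index."""
    index = {}
    for item_id in item_ids:
        key = id_to_index(item_id)
        index[key] = min(item_id, index.get(key, item_id))
    return index

# A made-up 64-bit ID that folds to the same index as itemID 345.
colliding = (1 << 32) | 344
item_index = build_item_index([colliding, 345, 456])
```

Both `colliding` and 345 fold to index 345, and the smaller ID wins.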
Mahout’s Recommender Engine: Phase 2, Create Preference Vector
  Mapper: ToItemPrefsMapper
  • Input: <LongWritable,Text>
  • Output: <VarLongWritable,VarLongWritable>
  • Parses out userID and itemID
  • Emits <userID,itemID>
  Reducer: ToUserVectorReducer
  • Input: <VarLongWritable,List(VarLongWritable)>
  • Output: <VarLongWritable,VectorWritable>
  • Creates the preferences vector, U
  • U is a sparse Vector
  • Emits <userID, U>
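In plain Python, grouping input lines into one sparse preference vector per user could be sketched as below. A dict keyed by itemID stands in for the sparse Vector; the function name and the default preference of 1.0 are assumptions for illustration.

```python
from collections import defaultdict

def to_user_vectors(lines):
    """Group 'userID,itemID[,pref]' lines into {userID: {itemID: pref}}."""
    vectors = defaultdict(dict)
    for line in lines:
        fields = line.strip().split(",")
        user_id, item_id = int(fields[0]), int(fields[1])
        # Missing preference values default to 1.0 (an assumption).
        pref = float(fields[2]) if len(fields) > 2 else 1.0
        vectors[user_id][item_id] = pref  # only non-zero entries are stored
    return dict(vectors)

user_vectors = to_user_vectors(["123,345,1.0", "123,456,2.2", "789,458,1.2"])
```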
Mahout’s Recommender Engine: Phase 3, Count Unique Users
  Mapper: CountUsersMapper
  • Input: <LongWritable,Text>
  • Output: <CountUsersKeyWritable,VarLongWritable>
  • Parses out userID
  • Emits <userID,userID>
  Reducer: CountUsersReducer
  • Input: <CountUsersKeyWritable,List(VarLongWritable)>
  • Output: <VarIntWritable,NullWritable>
  • Counts all unique users, numUsers
  • Emits <numUsers, null>
Mahout’s Recommender Engine: Phase 4, Transpose Preferences Vectors
  Mapper: MaybePruneRowsMapper
  • Input: <VarLongWritable,VectorWritable> (uses MR output from Phase 2)
  • Output: <IntWritable,DistributedRowMatrix.MatrixEntryWritable>
  • Transposes the MR output from Phase 2
  • The Phase 2 output had users as rows and items as columns; now items are rows and users are columns
  • Each element, e_jk, is transposed to e_kj
  • Emits <k, e_kj>
  Reducer: ToItemVectorsReducer
  • Input: <IntWritable,List(DistributedRowMatrix.MatrixEntryWritable)>
  • Output: <IntWritable,VectorWritable>
  • Writes the transposed user preferences vectors, U’
  • Emits <row, U’>
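The transpose can be pictured with nested dicts standing in for the sparse vectors. This is only a sketch: the function name and sample data are hypothetical, and any pruning the real mapper may perform is left out.

```python
def transpose(user_vectors):
    """Turn {userID: {itemID: pref}} into {itemID: {userID: pref}}."""
    item_vectors = {}
    for user_id, prefs in user_vectors.items():
        for item_id, pref in prefs.items():
            # Entry e_jk (user j, item k) becomes e_kj (item k, user j).
            item_vectors.setdefault(item_id, {})[user_id] = pref
    return item_vectors

user_vectors = {123: {345: 1.0, 456: 2.2}, 789: {345: 3.0}}
item_vectors = transpose(user_vectors)
```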
Mahout’s Recommender Engine: Phase 5.1, RowSimilarityJob, Compute Weights
  Mapper: RowWeightMapper
  • Input: <IntWritable,VectorWritable> (uses MR output from Phase 4)
  • Output: <VarIntWritable,WeightedOccurrences>
  • For each element, e_jk, computes its weighted occurrence, w_jk
  • Emits <k, w_jk>
  Reducer: WeightedOccurrencesPerColumnReducer
  • Input: <VarIntWritable,List(WeightedOccurrences)>
  • Output: <VarIntWritable,WeightedOccurrenceArray>
  • Transfers the weighted occurrences to an array and writes the results
  • Emits <k, w_jk>
Mahout’s Recommender Engine: Phase 5.2, RowSimilarityJob, Compute Similarities
  Mapper: CooccurrencesMapper
  • Input: <VarIntWritable,WeightedOccurrenceArray> (uses MR output from Phase 5.1)
  • Output: <WeightedRowPair,Cooccurrence>
  • For each pair of rows, p, writes its column co-occurrences, c
  • Emits <p, c>
  Reducer: SimilarityReducer
  • Input: <WeightedRowPair,List(Cooccurrence)>
  • Output: <SimilarityMatrixEntryKey,MatrixEntryWritable>
  • Computes the similarity between row a and row b, and writes it to the corresponding position in the matrix
  • Emits <row j, matrix entry>
Mahout’s Recommender Engine: Phase 5.3, RowSimilarityJob, Similarity Matrix
  Mapper: Mapper
  • Input: <SimilarityMatrixEntryKey,MatrixEntryWritable> (uses MR output from Phase 5.2)
  • Output: <SimilarityMatrixEntryKey,MatrixEntryWritable>
  • Writes the similarity matrix entry key, sme, and the matrix entry, me, as is
  • sme is basically each row
  • me is basically each row-column entry of the similarity matrix
  • Emits <sme, me>
  Reducer: EntriesToVectorsReducer
  • Input: <SimilarityMatrixEntryKey,List(MatrixEntryWritable)>
  • Output: <IntWritable,VectorWritable>
  • Writes out the row and its associated vector
  • Emits <row, vector>
Mahout’s Recommender Engine: Phase 6, Pre-partial multiply, Similarity Matrix
  Mapper: SimilarityMatrixRowWrapperMapper
  • Input: <IntWritable,VectorWritable> (uses MR output from Phase 5.3)
  • Output: <IntWritable,VectorOrPrefWritable>
  • Wraps the similarity vector, v_1, into a different vector format, v_2
  • Emits <row, v_2>
  Reducer: Reducer
  • Input: <IntWritable,List(VectorOrPrefWritable)>
  • Output: <IntWritable,VectorOrPrefWritable>
  • Writes out the row and each of its associated vectors
  • Emits <row, vector>
Mahout’s Recommender Engine: Phase 7, Pre-partial multiply, Preferences
  Mapper: UserVectorSplitterMapper
  • Input: <VarLongWritable,VectorWritable> (uses MR output from Phase 2)
  • Output: <VarIntWritable,VectorOrPrefWritable>
  • Maps userID and preference vector, U
  • Emits <userID, U>
  Reducer: Reducer
  • Input: <IntWritable,List(VectorOrPrefWritable)>
  • Output: <IntWritable,VectorOrPrefWritable>
  • Writes out the row and each of its associated vectors
  • Emits <row, vector>
Mahout’s Recommender Engine: Phase 8, Partial Multiply
  Mapper: Mapper
  • Input: <VarLongWritable,VectorWritable> (uses MR outputs from Phases 6 and 7)
  • Output: <VarIntWritable,VectorOrPrefWritable>
  • Maps row and vector, v
  • Emits <row, v>
  Reducer: ToVectorAndPrefReducer
  • Input: <VarIntWritable,List(VectorOrPrefWritable)>
  • Output: <IntWritable,VectorOrPrefWritable>
  • Writes out the row with its associated similarity vector, userIDs, and preference values
  • Emits <row, vector>
Mahout’s Recommender Engine: Phase 9, Filters Items
  Mapper: ItemFilterMapper
  • Input: <LongWritable,Text>
  • Output: <VarLongWritable,VarLongWritable>
  • Parses userID and itemID
  • Emits <itemID, userID>
  Reducer: ItemFilterAsVectorAndPrefReducer
  • Input: <VarLongWritable,List(VarLongWritable)>
  • Output: <VarIntWritable,VectorOrPrefWritable>
  • Writes itemID and a vector of userIDs and preferences
  • Emits <itemID, vector>
Mahout’s Recommender Engine: Phase 10, Aggregate and Recommend
  Mapper: PartialMultiplyMapper
  • Input: <VarIntWritable,VectorAndPrefsWritable> (uses MR outputs from Phases 8 and 9)
  • Output: <VarLongWritable,PrefAndSimilarityColumnWritable>
  • Writes userID and recommendations
  • Emits <userID, recommendation>
  Reducer: AggregateAndRecommendReducer
  • Input: <VarLongWritable,List(PrefAndSimilarityColumnWritable)>
  • Output: <VarLongWritable,RecommendedItemsWritable>
  • Writes userID and a vector of recommendations
  • Emits <userID, vector>
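One way to picture this last phase is as S x U computed column by column: each item's similarity column is scaled by a user's preference for that item, and the scaled columns are summed per user. The Python sketch below follows that reading. It is an interpretation rather than Mahout's actual code; the function names, the dict-based columns, and the sample data are all hypothetical.

```python
import heapq
from collections import defaultdict

def partial_multiply(columns_and_prefs):
    """Mapper side: pair each user's preference for item k with column k of S."""
    for column, user_prefs in columns_and_prefs:
        for user_id, pref in user_prefs:
            yield user_id, (pref, column)

def aggregate_and_recommend(pairs, known, n=10):
    """Reducer side: sum pref * column per user, drop known items, keep top n."""
    totals = defaultdict(lambda: defaultdict(float))
    for user_id, (pref, column) in pairs:
        for item_id, similarity in column.items():
            totals[user_id][item_id] += pref * similarity
    result = {}
    for user_id, scores in totals.items():
        unseen = [(s, i) for i, s in scores.items() if i not in known[user_id]]
        result[user_id] = [i for s, i in heapq.nlargest(n, unseen)]
    return result

# Hypothetical input: (similarity column of an item, [(userID, preference)]).
columns_and_prefs = [
    ({2: 3.0, 3: 1.0}, [(7, 2.0), (8, 1.0)]),  # column for item 1
    ({1: 3.0, 3: 2.0}, [(8, 1.0)]),            # column for item 2
]
known = {7: {1}, 8: {1, 2}}  # items each user already has a preference for
recommendations = aggregate_and_recommend(partial_multiply(columns_and_prefs), known)
```

User 7 gets items 2 and 3 in that order; user 8 already knows items 1 and 2, so only item 3 remains.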
Summary and Conclusion
  • Mahout is a machine learning API built on top of Hadoop; it includes clustering, pattern mining, classification, regression, evolutionary algorithms, and recommenders
  • Mahout’s recommender engine transforms an expected input format into predicted recommendations
  • It uses a series of MR phases to predict recommendations
