Apache Mahout Tutorial - Recommendation - 2013/2014

  • 12,756 views
Uploaded on

SWAP Research Group - Mahout Tutorial on Recommendation. Based on Apache Mahout 0.8

SWAP Research Group - Mahout Tutorial on Recommendation. Based on Apache Mahout 0.8

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
No Downloads

Views

Total Views
12,756
On Slideshare
0
From Embeds
0
Number of Embeds
18

Actions

Shares
Downloads
676
Comments
1
Likes
38

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Apache Mahout – Tutorial (2014) Cataldo Musto, Ph.D. Corso di Accesso Intelligente all’Informazione ed Elaborazione del Linguaggio Naturale Università degli Studi di Bari – Dipartimento di Informatica – A.A. 2013/2014 08/01/2014
  • 2. Outline • What is Mahout ? – Overview • How to use Mahout ? – Hands-on session
  • 3. Part 1 What is Mahout?
  • 4. Goal • Mahout is a Java library – Implementing Machine Learning techniques
  • 5. Goal • Mahout is a Java library – Implementing Machine Learning techniques • • • • Clustering Classification Recommendation Frequent ItemSet
  • 6. What can we do? • Currently Mahout supports mainly four use cases: – Recommendation - takes users' behavior and tries to find items users might like. – Clustering - takes e.g. text documents and groups them into groups of topically related documents. – Classification - learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. – Frequent itemset mining - takes a set of item groups (terms in a query session, shopping cart content) and identifies, which individual items usually appear together.
  • 7. Why Mahout? • Mahout is not the only ML framework – Weka (http://www.cs.waikato.ac.nz/ml/weka/) – R (http://www.r-project.org/) • Why do we prefer Mahout?
  • 8. Why Mahout? • Why do we prefer Mahout? – Apache License – Good Community – Good Documentation
  • 9. Why Mahout? • Why do we prefer Mahout? – Apache License – Good Community – Good Documentation –Scalable
  • 10. Why Mahout? • Why do we prefer Mahout? – Apache License – Good Community – Good Documentation –Scalable • Based on Hadoop (not mandatory!)
  • 11. Why do we need a scalable framework? Big Data!
  • 12. Use Cases
  • 13. Use Cases Recommendation Engine on Foursquare
  • 14. Use Cases User Interest Modeling on Twitter
  • 15. Use Cases Pattern Mining on Yahoo! (as anti-spam)
  • 16. Algorithms • Recommendation – User-based Collaborative Filtering – Item-based Collaborative Filtering – SlopeOne Recommenders – Singular Value Decomposition-based CF
  • 17. Algorithms • Clustering – – – – – Canopy K-Means Fuzzy K-Means Latent Dirichlet Allocation (LDA) MinHash Clustering – Draft algorithms • Hierarchical Clustering • (and much more…)
  • 18. Algorithms • Classification – – – – Logistic Regression Bayes Random Forests Hidden Markov Models – Draft algorithms • • • • • Support Vector Machines Perceptrons Neural Networks Restricted Boltzmann Machines (and much more…)
  • 19. Algorithms • Other – Evolutionary Algorithms • Genetic Algorithms – Dimensionality Reduction techniques • Principal Component Analysis (PCA) • Singular Value Decomposition (SVD) – Frequent ItemSet Pattern Mining – (and much more.)
  • 20. Mahout in the Apache Software Foundation
  • 21. Mahout in the Apache Software Foundation Original Mahout Project
  • 22. Mahout in the Apache Software Foundation Taste: collaborative filtering framework
  • 23. Mahout in the Apache Software Foundation Lucene: information retrieval software library
  • 24. Mahout in the Apache Software Foundation Hadoop: framework for distributed storage and programming based on MapReduce
  • 25. General Architecture Three-tiers architecture (Application, Algorithms and Shared Libraries)
  • 26. General Architecture Data Storage and Shared Libraries
  • 27. General Architecture Business Logic
  • 28. General Architecture External Applications invoking Mahout APIs
  • 29. In this tutorial we will focus on Recommendation
  • 30. Recommendation • Mahout implements a Collaborative Filtering framework – Popularized by Amazon and others – Uses historical data (ratings, clicks, and purchases) to provide recommendations • User-based: Recommend items by finding similar users. This is often harder to scale because of the dynamic nature of users. • Item-based: Calculate similarity between items and make recommendations. Items usually don't change much, so this often can be computed offline. • Slope-One: A very fast and simple item-based recommendation approach applicable when users have given ratings (and not just boolean preferences).
  • 31. Slope-One Recommender • Brief Recap • Sample Rating database (from Wikipedia)
  • 32. Slope-One Recommender • • • • Brief Recap Sample Rating database (from Wikipedia) Is a variant of item-based CF Compute predictions by taking into account the average differences between item’s ratings
  • 33. Slope-One Recommender • Step One: simplification X – Just two items – Goal: to compute Lucy’s rating for item A – Algorithm: compute the average difference between ratings fro Item A and B – difference: +0.5 for Item A – Rating(Lucy,ItemA) = Rating(ItemB) + difference = 2.5
  • 34. Slope-One Recommender X • Step One: simplification – Just two items – Goal: to compute Lucy’s rating for item A – Algorithm: compute the average difference between ratings fro Item A and C – difference: +3 for Item A – Rating(Lucy,ItemA) = Rating(ItemC) + difference = 8
  • 35. Slope-One Recommender • Step One: combining partial computations – Lucy’s rating for item A is the weighted sum of the estimation based on item B and the estimation based on item C
  • 36. Slope-One Recommender • Step One: combining partial computations – Weight: #users that voted both items • John and Mark voted both Item A and Item B • Just John voted both Item A and Item C
  • 37. (comin’ back to Mahout)
  • 38. Recommendation - Architecture
  • 39. Recommendation - Architecture Inceptive Idea: A Java/J2EE application invokes a Mahout Recommender whose DataModel is based on a set of User Preferences that are built on the ground of a physical DataStore
  • 40. Recommendation - Architecture Physical Storage (database, files, etc.)
  • 41. Recommendation - Architecture Data Model Physical Storage (database, files, etc.)
  • 42. Recommendation - Architecture Recommender Data Model Physical Storage (database, files, etc.)
  • 43. Recommendation - Architecture External Application Recommender Data Model Physical Storage (database, files, etc.)
  • 44. Recommendation in Mahout • Input: raw data (user preferences) • Output: preferences estimation • Step 1 – Mapping raw data into a DataModel Mahout-compliant • Step 2 – Tuning recommender components • Similarity measure, neighborhood, etc. • Step 3 – Computing rating estimations • Step 4 – Evaluating recommendation
  • 45. Recommendation Components • Mahout key abstractions are implemented through Java interfaces : – DataModel interface • Methods for mapping raw data to a Mahout-compliant form – UserSimilarity interface • Methods to calculate the degree of correlation between two users – ItemSimilarity interface • Methods to calculate the degree of correlation between two items – UserNeighborhood interface • Methods to define the concept of ‘neighborhood’ – Recommender interface • Methods to implement the recommendation step itself
  • 46. Recommendation Components • Mahout key abstractions are implemented through Java interfaces : – example: DataModel interface • Each methods for mapping raw data to a Mahoutcompliant form is an implementation of the generic interface • e.g. MySQLJDBCDataModel feeds a DataModel from a MySQL database • (and so on)
  • 47. Components: DataModel • A DataModel is the interface to draw information about user preferences. • Which sources is it possible to draw? – Database • MySQLJDBCDataModel, PostgreSQLDataModel • NoSQL databases supported: MongoDBDataModel, CassandraDataModel – External Files • FileDataModel – Generic (preferences directly feed through Java code) • GenericDataModel (They are all implementations of the DataModel interface)
  • 48. Components: DataModel • GenericDataModel – Feed through Java calls • FileDataModel – CSV (Comma Separated Values) • JDBCDataModel – JDBC Driver – Standard database structure
  • 49. FileDataModel – CSV input
  • 50. Components: DataModel • Regardless the source, they all share a common implementation. • Basic object: Preference – Preference is a triple (user,item,score) – Stored in UserPreferenceArray
  • 51. Components: DataModel • Basic object: Preference – Preference is a triple (user,item,score) – Stored in UserPreferenceArray • Two implementations – GenericUserPreferenceArray • It stores numerical preference, as well. – BooleanUserPreferenceArray • It skips numerical preference values.
  • 52. Components: UserSimilarity • UserSimilarity defines a notion of similarity between two Users. – (respectively) ItemSimilarity defines a notion of similarity between two Items. • Which definition of similarity are available? – – – – – Pearson Correlation Spearman Correlation Euclidean Distance Tanimoto Coefficient LogLikelihood Similarity – Already implemented!
  • 53. Example: TanimotoDistance
  • 54. Example: CosineSimilarity
  • 55. Different Similarity definitions influence neighborhood formation
  • 56. Pearson’s vs. Euclidean distance
  • 57. Pearson’s vs. Euclidean distance
  • 58. Components: UserNeighborhood • Which definition of neighborhood are available? – Nearest N users • The first N users with the highest similarity are labeled as ‘neighbors’ – Thresholds • Users whose similarity is above a threshold are labeled as ‘neighbors’ – Already implemented!
  • 59. Components: Recommender • Given a DataModel, a definition of similarity between users (items) and a definition of neighborhood, a recommender produces as output an estimation of relevance for each unseen item • Which recommendation algorithms are implemented? – – – – User-based CF Item-based CF SVD-based CF SlopeOne CF (and much more…)
  • 60. Recommendation Engines
  • 61. Recap • Many implementations of a CF-based recommender! – 6 different recommendation algorithms – 2 different neighborhood definitions – 5 different similarity definitions • Evaluation fo the different implementations is actually very time-consuming – The strength of Mahout lies in that it is possible to save time in the evaluation of the different combinations of the parameters! – Standard interface for the evaluation of a Recommender System
  • 62. Evaluation • Mahout provides classes for the evaluation of a recommender system – Prediction-based measures • Mean Average Error • RMSE (Root Mean Square Error) – IR-based measures • Precision, Recall, F1-measure, F1@n • NDCG (ranking measure)
  • 63. Evaluation • Prediction-based Measures – Class: AverageAbsoluteDifferenceEvaluator – Method: evaluate() – Parameters: • • • • Recommender implementation DataModel implementation TrainingSet size (e.g. 70%) % of the data to use in the evaluation (smaller % for fast prototyping)
  • 64. Evaluation • IR-based Measures – Class: GenericRecommenderIRStatsEvaluator – Method: evaluate() – Parameters: • • • • Recommender implementation DataModel implementation Relevance Threshold (mean+standard deviation) % of the data to use in the evaluation (smaller % for fast prototyping)
  • 65. Part 2 How to use Mahout? Hands-on
  • 66. Download Mahout • Download – The latest Mahout release is 0.8 – Available at: http://apache.fastbull.org/mahout/0.8/mahoutdistribution-0.8.zip - Extract all the libraries and include them in a new NetBeans (Eclipse) project • Requirement: Java 1.6.x or greater. • Hadoop is not mandatory!
  • 67. Important JavaDoc https://builds.apache.org/job/Mahout-Quality/javadoc/
  • 68. Exercise 1 • Create a Preference object • Set preferences through some simple Java call • Print some statistics about preferences (how many preferences, on which items the user has expressed ratings, etc.)
  • 69. Exercise 1 • Create a Preference object • Set preferences through some simple Java call • Print some statistics about preferences (how many preferences, on which items the user has expressed ratings, etc.) • Hints about objects to be used: – Preference – GenericUserPreferenceArray
  • 70. Exercise 1: preferences import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray; import org.apache.mahout.cf.taste.model.Preference; import org.apache.mahout.cf.taste.model.PreferenceArray; class CreatePreferenceArray { private CreatePreferenceArray() { } public static void main(String[] args) { PreferenceArray user1Prefs = new GenericUserPreferenceArray(2); user1Prefs.setUserID(0, 1L); user1Prefs.setItemID(0, 101L); user1Prefs.setValue(0, 2.0f); user1Prefs.setItemID(1, 102L); user1Prefs.setValue(1, 3.0f); Preference pref = user1Prefs.get(1); System.out.println(pref); } }
  • 71. Exercise 1: preferences import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray; import org.apache.mahout.cf.taste.model.Preference; import org.apache.mahout.cf.taste.model.PreferenceArray; class CreatePreferenceArray { private CreatePreferenceArray() { } public static void main(String[] args) { PreferenceArray user1Prefs = new GenericUserPreferenceArray(2); user1Prefs.setUserID(0, 1L); user1Prefs.setItemID(0, 101L); user1Prefs.setValue(0, 2.0f); user1Prefs.setItemID(1, 102L); user1Prefs.setValue(1, 3.0f); Preference pref = user1Prefs.get(1); System.out.println(pref); } } Score 2 for Item 101
  • 72. Exercise 2 • Create a DataModel • Feed the DataModel through some simple Java calls • Print some statistics about data (how many users, how many items, maximum ratings, etc.)
  • 73. Exercise 2 • Create a DataModel • Feed the DataModel through some simple Java calls • Print some statistics about data (how many users, how many items, maximum ratings, etc.) • Hints about objects to be used: – FastByIdMap – Model
  • 74. Exercise 2: data model • PreferenceArray stores the preferences of a single user • Where do the preferences of all the users are stored? – An HashMap? No. – Mahout introduces data structures optimized for recommendation tasks • HashMap are replaced by FastByIDMap
  • 75. Exercise 2: data model import import import import import org.apache.mahout.cf.taste.impl.common.FastByIDMap; org.apache.mahout.cf.taste.impl.model.GenericDataModel; org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray; org.apache.mahout.cf.taste.model.DataModel; org.apache.mahout.cf.taste.model.PreferenceArray; class CreateGenericDataModel { private CreateGenericDataModel() { } public static void main(String[] args) { FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>(); PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(10); prefsForUser1.setUserID(0, 1L); prefsForUser1.setItemID(0, 101L); prefsForUser1.setValue(0, 3.0f); prefsForUser1.setItemID(1, 102L); prefsForUser1.setValue(1, 4.5f); preferences.put(1L, prefsForUser1); DataModel model = new GenericDataModel(preferences); System.out.println(model); } }
  • 76. Exercise 3 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – CSV file should contain enough data!
  • 77. Exercise 3 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – CSV file should contain enough data! • Hints about objects to be used: – FileDataModel – PearsonCorrelationSimilarity, TanimotoCoefficientSimilarity, etc.
  • 78. Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.impl.model.*; import org.apache.mahout.cf.taste.impl.model.file.FileDatModel; class Example3_Similarity { public static void main(String[] args) throws Exception { // Istanzia il DataModel e crea alcune statistiche DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity pearson = new PearsonCorrelationSimilarity(model); UserSimilarity euclidean = new EuclideanDistanceSimilarity(model); System.out.println("Pearson:"+pearson.userSimilarity(1, 2)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2)); System.out.println("Pearson:"+pearson.userSimilarity(1, 3)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3)); } }
  • 79. Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.impl.model.*; import org.apache.mahout.cf.taste.impl.model.file.FileDatModel; class Example3_Similarity { public static void main(String[] args) throws Exception { // Istanzia il DataModel e crea alcune statistiche DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity pearson = new PearsonCorrelationSimilarity(model); UserSimilarity euclidean = new EuclideanDistanceSimilarity(model); System.out.println("Pearson:"+pearson.userSimilarity(1, 2)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2)); System.out.println("Pearson:"+pearson.userSimilarity(1, 3)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3)); } } FileDataModel
  • 80. Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.impl.model.*; import org.apache.mahout.cf.taste.impl.model.file.FileDatModel; class Example3_Similarity { public static void main(String[] args) throws Exception { // Istanzia il DataModel e crea alcune statistiche DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity pearson = new PearsonCorrelationSimilarity(model); UserSimilarity euclidean = new EuclideanDistanceSimilarity(model); System.out.println("Pearson:"+pearson.userSimilarity(1, 2)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2)); System.out.println("Pearson:"+pearson.userSimilarity(1, 3)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3)); } } Similarity Definitions
  • 81. Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.impl.model.*; import org.apache.mahout.cf.taste.impl.model.file.FileDatModel; class Example3_Similarity { public static void main(String[] args) throws Exception { // Istanzia il DataModel e crea alcune statistiche DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity pearson = new PearsonCorrelationSimilarity(model); UserSimilarity euclidean = new EuclideanDistanceSimilarity(model); System.out.println("Pearson:"+pearson.userSimilarity(1, 2)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2)); System.out.println("Pearson:"+pearson.userSimilarity(1, 3)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3)); } } Output
  • 82. Exercise 4 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – CSV file should contain enough data! • Generate neighboorhood • Generate recommendations
  • 83. Exercise 4 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – CSV file should contain enough data! • Generate neighboorhood • Generate recommendations – Compare different combinations of parameters!
  • 84. Exercise 4 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – CSV file should contain enough data! • Generate neighboorhood • Generate recommendations – Compare different combinations of parameters! • Hints about objects to be used: – NearestNUserNeighborhood – GenericUserBasedRecommender • Parameters: data model, neighborhood, similarity measure
  • 85. Exercise 4: First Recommender import import import import import import import import org.apache.mahout.cf.taste.impl.model.file.*; org.apache.mahout.cf.taste.impl.neighborhood.*; org.apache.mahout.cf.taste.impl.recommender.*; org.apache.mahout.cf.taste.impl.similarity.*; org.apache.mahout.cf.taste.model.*; org.apache.mahout.cf.taste.neighborhood.*; org.apache.mahout.cf.taste.recommender.*; org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List<RecommendedItem> recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } } }
  • 86. Exercise 4: First Recommender import import import import import import import import org.apache.mahout.cf.taste.impl.model.file.*; org.apache.mahout.cf.taste.impl.neighborhood.*; org.apache.mahout.cf.taste.impl.recommender.*; org.apache.mahout.cf.taste.impl.similarity.*; org.apache.mahout.cf.taste.model.*; org.apache.mahout.cf.taste.neighborhood.*; org.apache.mahout.cf.taste.recommender.*; org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { FileDataModel DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List<RecommendedItem> recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } } }
  • 87. Exercise 4: First Recommender import import import import import import import import org.apache.mahout.cf.taste.impl.model.file.*; org.apache.mahout.cf.taste.impl.neighborhood.*; org.apache.mahout.cf.taste.impl.recommender.*; org.apache.mahout.cf.taste.impl.similarity.*; org.apache.mahout.cf.taste.model.*; org.apache.mahout.cf.taste.neighborhood.*; org.apache.mahout.cf.taste.recommender.*; org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List<RecommendedItem> recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } } } 2 neighbours
  • 88. Exercise 4: First Recommender import import import import import import import import org.apache.mahout.cf.taste.impl.model.file.*; org.apache.mahout.cf.taste.impl.neighborhood.*; org.apache.mahout.cf.taste.impl.recommender.*; org.apache.mahout.cf.taste.impl.similarity.*; org.apache.mahout.cf.taste.model.*; org.apache.mahout.cf.taste.neighborhood.*; org.apache.mahout.cf.taste.recommender.*; org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List<RecommendedItem> recommendations = recommender.recommend(1, 1); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } } } Top-1 Recommendation for User 1
  • 89. Exercise 5: MovieLens Recommender • Download the GroupLens dataset (100k) – Its format is already Mahout compliant – http://files.grouplens.org/datasets/movielens/ml100k.zip • Preparatory Exercise: repeat exercise 3 (similarity calculations) with a bigger dataset • Next: now we can run the recommendation framework against a state-of-the-art dataset
  • 90. Exercise 5: MovieLens Recommender import import import import import import import import org.apache.mahout.cf.taste.impl.model.file.*; org.apache.mahout.cf.taste.impl.neighborhood.*; org.apache.mahout.cf.taste.impl.recommender.*; org.apache.mahout.cf.taste.impl.similarity.*; org.apache.mahout.cf.taste.model.*; org.apache.mahout.cf.taste.neighborhood.*; org.apache.mahout.cf.taste.recommender.*; org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("ua.base")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List<RecommendedItem> recommendations = recommender.recommend(1, 20); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } } }
  • 91. Exercise 5: MovieLens Recommender import import import import import import import import org.apache.mahout.cf.taste.impl.model.file.*; org.apache.mahout.cf.taste.impl.neighborhood.*; org.apache.mahout.cf.taste.impl.recommender.*; org.apache.mahout.cf.taste.impl.similarity.*; org.apache.mahout.cf.taste.model.*; org.apache.mahout.cf.taste.neighborhood.*; org.apache.mahout.cf.taste.recommender.*; org.apache.mahout.cf.taste.similarity.*; class RecommenderIntro { private RecommenderIntro() { } public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("ua.base")); UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); Recommender recommender = new GenericUserBasedRecommender( model, neighborhood, similarity); List<RecommendedItem> recommendations = recommender.recommend(10, 50); for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation); } } } We can play with parameters!
  • 92. Exercise 5: MovieLens Recommender • Analyze Recommender behavior with different combination of parameters – Do the recommendations change with a different similarity measure? – Do the recommendations change with different neighborhood sizes? – Which one is the best one? • …. Let’s go to the next exercise!
  • 93. Exercise 6: Recommender Evaluation • Evaluate different CF recommender configurations on MovieLens data • Metrics: RMSE, MAE, Precision
  • 94. Exercise 6: Recommender Evaluation • Evaluate different CF recommender configurations on MovieLens data • Metrics: RMSE, MAE • Hints: useful classes – Implementations of RecommenderEvaluator interface • AverageAbsoluteDifferenceRecommenderEvaluator • RMSRecommenderEvaluator
  • 95. Exercise 6: Recommender Evaluation • Further Hints: – Use RandomUtils.useTestSeed()to ensure the consistency among different evaluation runs – Invoke the evaluate() method • Parameters – RecommenderBuilder: recommender istance (as in previous exercises. – DataModelBuilder: specific criterion for training – Split Training-Test: double value (e.g. 0.7 for 70%) – Amount of data to use in the evaluation: double value (e.g 1.0 for 100%)
  • 96. Example 6: evaluation class EvaluatorIntro { private EvaluatorIntro() { } Ensures the consistency between different evaluation runs. public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); System.out.println(score); } }
  • 97. Example 6: evaluation class EvaluatorIntro { private EvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); System.out.println(score); } }
  • 98. Exercise 6: evaluation class EvaluatorIntro { private EvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); System.out.println(score); } } 70%training (evaluation on the whole dataset)
  • 99. Exercise 6: evaluation class EvaluatorIntro { private EvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); System.out.println(score); } } Recommendation Engine
  • 100. Example 6: evaluation class EvaluatorIntro { private EvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator(); RecommenderEvaluator rmse = new RMSEEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); System.out.println(score); } } We can add more measures
  • 101. Example 6: evaluation class EvaluatorIntro { private EvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator(); RecommenderEvaluator rmse = new RMSEEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); double rmse = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); System.out.println(score); System.out.println(rmse); } }
  • 102. Mahout Strengths • Fast-prototyping and evaluation – To evaluate a different configuration of the same algorithm we just need to update a parameter and run again. – Example • Different Neighborhood Size
  • 103. 5 minutes to look for the best configuration 
  • 104. Exercise 7: Recommender Evaluation • Evaluate different CF recommender configurations on MovieLens data • Metrics: Precision, Recall
  • 105. Exercise 7: Recommender Evaluation • Evaluate different CF recommender configurations on MovieLens data • Metrics: Precision, Recall • Hints: useful classes – GenericRecommenderIRStatsEvaluator – Evaluate() method • Same parameters of exercise 6
  • 106. Exercise 7: IR-based evaluation class IREvaluatorIntro { private IREvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0); System.out.println(stats.getPrecision()); System.out.println(stats.getRecall()); System.out.println(stats.getF1()); } } Precision@5 , Recall@5, etc.
  • 107. Exercise 7: IR-based evaluation class IREvaluatorIntro { private IREvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0); System.out.println(stats.getPrecision()); System.out.println(stats.getRecall()); System.out.println(stats.getF1()); } } Precision@5 , Recall@5, etc.
  • 108. Exercise 7: IR-based evaluation class IREvaluatorIntro { private IREvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new PearsonCorrelationSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0); System.out.println(stats.getPrecision()); System.out.println(stats.getRecall()); System.out.println(stats.getF1()); } } Set Neighborhood to 500
  • 109. Exercise 7: IR-based evaluation class IREvaluatorIntro { private IREvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { UserSimilarity similarity = new EuclideanDistanceSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model); return new GenericUserBasedRecommender(model, neighborhood, similarity); } }; IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0); System.out.println(stats.getPrecision()); System.out.println(stats.getRecall()); System.out.println(stats.getF1()); } } Set Euclidean Distance
  • 110. Exercise 8: item-based recommender • Mahout provides Java classes for building an item-based recommender system – Amazon-like – Recommendations are based on similarities among items (generally pre-computed offline) – Evaluate it with the MovieLens dataset!
  • 111. Example 8: item-based recommender class IREvaluatorIntro { private IREvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { ItemSimilarity similarity = new PearsonCorrelationSimilarity(model); return new GenericItemBasedRecommender(model, similarity); } }; IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0); System.out.println(stats.getPrecision()); System.out.println(stats.getRecall()); System.out.println(stats.getF1()); } }
  • 112. Example 8: item-based recommender class IREvaluatorIntro { private IREvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { ItemSimilarity similarity = new PearsonCorrelationSimilarity(model); return new GenericItemBasedRecommender(model, similarity); } }; IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0); System.out.println(stats.getPrecision()); System.out.println(stats.getRecall()); System.out.println(stats.getF1()); } } ItemSimilarity
  • 113. Example 8: item-based recommender class IREvaluatorIntro { private IREvaluatorIntro() { } public static void main(String[] args) throws Exception { RandomUtils.useTestSeed(); DataModel model = new FileDataModel(new File("ua.base")); RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator(); // Build the same recommender for testing that we did last time: RecommenderBuilder recommenderBuilder = new RecommenderBuilder() { @Override public Recommender buildRecommender(DataModel model) throws TasteException { ItemSimilarity similarity = new PearsonCorrelationSimilarity(model); return new GenericItemBasedRecommender(model, similarity); } }; IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1,0); System.out.println(stats.getPrecision()); System.out.println(stats.getRecall()); System.out.println(stats.getF1()); } } No Neighborhood definition for itembased recommenders
  • 114. Exercise 9: Recommender Evaluation • Write a class that automatically runs evaluation with different parameters – e.g. fixed neighborhood sizes from an Array of values – Print the best scores and the configuration
  • 115. Exercise 10: more datasets! • Find the best configuration for several datasets – Download datasets from http://mahout.apache.org/users/basics/collections.html – Write classes to transform input data in a Mahout-compliant form – Extend exercise 9!
  • 116. End. Do you want more?
  • 117. Do you want more? • Recommendation – Deploy of a Mahout-based Web Recommender – Integration with Hadoop – Integration of content-based information – Custom similarities, Custom recommenders, Rescoring functions • Classification, Clustering and Pattern Mining