Apache Mahout – Tutorial (2014)
Cataldo Musto, Ph.D.

Apache Mahout Tutorial - Recommendation - 2013/2014


SWAP Research Group - Mahout Tutorial on Recommendation. Based on Apache Mahout 0.8

Published in: Technology, Education

Transcript of "Apache Mahout Tutorial - Recommendation - 2013/2014"

  1. Apache Mahout – Tutorial (2014)
     Cataldo Musto, Ph.D.
     Course on Intelligent Information Access and Natural Language Processing
     Università degli Studi di Bari – Dipartimento di Informatica – Academic Year 2013/2014
     08/01/2014
  2. Outline
     • What is Mahout?
       – Overview
     • How to use Mahout?
       – Hands-on session
  3. Part 1: What is Mahout?
  5. Goal
     • Mahout is a Java library
       – Implementing Machine Learning techniques
         • Clustering
         • Classification
         • Recommendation
         • Frequent ItemSet
  6. What can we do?
     • Currently Mahout supports mainly four use cases:
       – Recommendation: takes users' behavior and tries to find items users might like.
       – Clustering: takes e.g. text documents and groups them into groups of topically related documents.
       – Classification: learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.
       – Frequent itemset mining: takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.
  7. Why Mahout?
     • Mahout is not the only ML framework
       – Weka (http://www.cs.waikato.ac.nz/ml/weka/)
       – R (http://www.r-project.org/)
     • Why do we prefer Mahout?
  10. Why Mahout?
     • Why do we prefer Mahout?
       – Apache License
       – Good Community
       – Good Documentation
       – Scalable
         • Based on Hadoop (not mandatory!)
  11. Why do we need a scalable framework? Big Data!
  13. Use Cases
     Recommendation Engine on Foursquare
  14. Use Cases
     User Interest Modeling on Twitter
  15. Use Cases
     Pattern Mining on Yahoo! (as anti-spam)
  16. Algorithms
     • Recommendation
       – User-based Collaborative Filtering
       – Item-based Collaborative Filtering
       – SlopeOne Recommenders
       – Singular Value Decomposition-based CF
  17. Algorithms
     • Clustering
       – Canopy
       – K-Means
       – Fuzzy K-Means
       – Latent Dirichlet Allocation (LDA)
       – MinHash Clustering
       – Draft algorithms
         • Hierarchical Clustering
         • (and much more…)
  18. Algorithms
     • Classification
       – Logistic Regression
       – Bayes
       – Random Forests
       – Hidden Markov Models
       – Draft algorithms
         • Support Vector Machines
         • Perceptrons
         • Neural Networks
         • Restricted Boltzmann Machines
         • (and much more…)
  19. Algorithms
     • Other
       – Evolutionary Algorithms
         • Genetic Algorithms
       – Dimensionality Reduction techniques
         • Principal Component Analysis (PCA)
         • Singular Value Decomposition (SVD)
       – Frequent ItemSet Pattern Mining
       – (and much more…)
  21. Mahout in the Apache Software Foundation
     Original Mahout Project
  22. Mahout in the Apache Software Foundation
     Taste: collaborative filtering framework
  23. Mahout in the Apache Software Foundation
     Lucene: information retrieval software library
  24. Mahout in the Apache Software Foundation
     Hadoop: framework for distributed storage and programming based on MapReduce
  25. General Architecture
     Three-tier architecture (Application, Algorithms and Shared Libraries)
  26. General Architecture
     Data Storage and Shared Libraries
  27. General Architecture
     Business Logic
  28. General Architecture
     External Applications invoking Mahout APIs
  29. In this tutorial we will focus on Recommendation
  30. Recommendation
     • Mahout implements a Collaborative Filtering framework
       – Popularized by Amazon and others
       – Uses historical data (ratings, clicks, and purchases) to provide recommendations
         • User-based: Recommend items by finding similar users. This is often harder to scale because of the dynamic nature of users.
         • Item-based: Calculate similarity between items and make recommendations. Items usually don't change much, so this often can be computed offline.
         • Slope-One: A very fast and simple item-based recommendation approach applicable when users have given ratings (and not just boolean preferences).
  32. Slope-One Recommender
     • Brief Recap
     • Sample Rating database (from Wikipedia)
     • Is a variant of item-based CF
     • Computes predictions by taking into account the average differences between items’ ratings
  33. Slope-One Recommender
     • Step One: simplification
       – Just two items
       – Goal: to compute Lucy’s rating for item A
       – Algorithm: compute the average difference between ratings for Item A and Item B
       – difference: +0.5 for Item A
       – Rating(Lucy,ItemA) = Rating(Lucy,ItemB) + difference = 2.5
  34. Slope-One Recommender
     • Step One: simplification
       – Just two items
       – Goal: to compute Lucy’s rating for item A
       – Algorithm: compute the average difference between ratings for Item A and Item C
       – difference: +3 for Item A
       – Rating(Lucy,ItemA) = Rating(Lucy,ItemC) + difference = 8
  35. Slope-One Recommender
     • Step One: combining partial computations
       – Lucy’s rating for item A is the weighted sum of the estimation based on item B and the estimation based on item C
  36. Slope-One Recommender
     • Step One: combining partial computations
       – Weight: number of users that rated both items
         • John and Mark rated both Item A and Item B
         • Only John rated both Item A and Item C
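The Slope-One steps above can be sketched in plain Java, without Mahout. The rating table is an assumption: the slide shows it only as an image, so the values below are the ones from the Wikipedia Slope One example (John: A=5, B=3, C=2; Mark: A=3, B=4; Lucy: B=2, C=5). They are consistent with the differences (+0.5, +3) and partial estimates (2.5, 8) quoted on the slides. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SlopeOneSketch {

    // Assumed sample ratings (Wikipedia Slope One example); not taken from Mahout.
    static Map<String, Map<String, Double>> sampleRatings() {
        Map<String, Map<String, Double>> ratings = new LinkedHashMap<>();
        ratings.put("John", Map.of("A", 5.0, "B", 3.0, "C", 2.0));
        ratings.put("Mark", Map.of("A", 3.0, "B", 4.0));
        ratings.put("Lucy", Map.of("B", 2.0, "C", 5.0));
        return ratings;
    }

    // Weighted Slope-One estimate of the rating of `user` for item `target`.
    static double predict(Map<String, Map<String, Double>> ratings, String user, String target) {
        double weightedSum = 0.0;
        int totalWeight = 0;
        for (Map.Entry<String, Double> own : ratings.get(user).entrySet()) {
            String other = own.getKey();
            if (other.equals(target)) {
                continue;
            }
            // Average difference (target - other) over users who rated both items.
            double diffSum = 0.0;
            int coRaters = 0;
            for (Map<String, Double> r : ratings.values()) {
                if (r.containsKey(target) && r.containsKey(other)) {
                    diffSum += r.get(target) - r.get(other);
                    coRaters++;
                }
            }
            if (coRaters == 0) {
                continue;
            }
            double partialEstimate = own.getValue() + diffSum / coRaters;
            // Weight each partial estimate by the number of co-raters.
            weightedSum += coRaters * partialEstimate;
            totalWeight += coRaters;
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        System.out.println(predict(sampleRatings(), "Lucy", "A"));
    }
}
```

With these ratings the estimate based on item B is 2.5 (weight 2) and the one based on item C is 8 (weight 1), so the combined prediction is (2 × 2.5 + 1 × 8) / 3 ≈ 4.33.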
  37. (coming back to Mahout)
  39. Recommendation - Architecture
     Core idea: a Java/J2EE application invokes a Mahout Recommender whose DataModel is based on a set of User Preferences that are built on top of a physical DataStore
  43. Recommendation - Architecture (layers, top to bottom)
     External Application
     Recommender
     Data Model
     Physical Storage (database, files, etc.)
  44. Recommendation in Mahout
     • Input: raw data (user preferences)
     • Output: preference estimations
     • Step 1
       – Mapping raw data into a Mahout-compliant DataModel
     • Step 2
       – Tuning recommender components
         • Similarity measure, neighborhood, etc.
     • Step 3
       – Computing rating estimations
     • Step 4
       – Evaluating recommendations
  45. Recommendation Components
     • Mahout key abstractions are implemented through Java interfaces:
       – DataModel interface
         • Methods for mapping raw data to a Mahout-compliant form
       – UserSimilarity interface
         • Methods to calculate the degree of correlation between two users
       – ItemSimilarity interface
         • Methods to calculate the degree of correlation between two items
       – UserNeighborhood interface
         • Methods to define the concept of ‘neighborhood’
       – Recommender interface
         • Methods to implement the recommendation step itself
  46. Recommendation Components
     • Mahout key abstractions are implemented through Java interfaces:
       – example: DataModel interface
         • Each method for mapping raw data to a Mahout-compliant form is an implementation of the generic interface
         • e.g. MySQLJDBCDataModel feeds a DataModel from a MySQL database
         • (and so on)
  47. Components: DataModel
     • A DataModel is the interface used to draw information about user preferences.
     • Which sources can it draw from?
       – Database
         • MySQLJDBCDataModel, PostgreSQLJDBCDataModel
         • NoSQL databases supported: MongoDBDataModel, CassandraDataModel
       – External Files
         • FileDataModel
       – Generic (preferences fed directly through Java code)
         • GenericDataModel
     (They are all implementations of the DataModel interface)
  48. Components: DataModel
     • GenericDataModel
       – Fed through Java calls
     • FileDataModel
       – CSV (Comma Separated Values)
     • JDBCDataModel
       – JDBC Driver
       – Standard database structure
  49. FileDataModel – CSV input
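The CSV slide is an image in the original deck, so the layout is restated here as an assumption: each line holds a user ID, an item ID and a preference value separated by commas, which is the layout FileDataModel reads. The plain-Java sketch below only illustrates that layout; it does not use Mahout, the class name is hypothetical, and the sample rows are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvPreferencesSketch {

    // Parses "userID,itemID,value" lines into user -> (item -> value).
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> prefs = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // this sketch simply skips blank lines
            }
            String[] fields = trimmed.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            prefs.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        // Made-up rows: user 1 rated items 101 and 102, user 2 rated item 101.
        List<String> lines = List.of("1,101,5.0", "1,102,3.0", "2,101,2.0");
        System.out.println(parse(lines));
    }
}
```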
  51. Components: DataModel
     • Regardless of the source, they all share a common implementation.
     • Basic object: Preference
       – Preference is a triple (user,item,score)
       – Stored in UserPreferenceArray
     • Two implementations
       – GenericUserPreferenceArray
         • It also stores numerical preference values.
       – BooleanUserPreferenceArray
         • It skips numerical preference values.
  52. Components: UserSimilarity
     • UserSimilarity defines a notion of similarity between two Users.
       – (respectively) ItemSimilarity defines a notion of similarity between two Items.
     • Which definitions of similarity are available?
       – Pearson Correlation
       – Spearman Correlation
       – Euclidean Distance
       – Tanimoto Coefficient
       – LogLikelihood Similarity
       – Already implemented!
  53. Example: TanimotoDistance
  54. Example: CosineSimilarity
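The two example slides above are images in the original deck. As a rough illustration, the textbook definitions can be sketched in plain Java: the Tanimoto coefficient compares the sets of items two users expressed a preference for (intersection size over union size), while cosine similarity compares two rating vectors by the angle between them. These are the standard formulas, not Mahout's own implementation, and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class SetAndVectorSimilaritySketch {

    // Tanimoto coefficient: |A ∩ B| / |A ∪ B| over the sets of rated item IDs.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    // Cosine similarity between two rating vectors of equal length.
    static double cosine(double[] x, double[] y) {
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        if (normX == 0.0 || normY == 0.0) {
            return 0.0; // undefined for a zero vector; this sketch returns 0
        }
        return dot / (Math.sqrt(normX) * Math.sqrt(normY));
    }

    public static void main(String[] args) {
        System.out.println(tanimoto(Set.of(101L, 102L, 103L), Set.of(102L, 103L, 104L)));
        System.out.println(cosine(new double[] {1, 2}, new double[] {2, 4}));
    }
}
```

Tanimoto ignores the rating values entirely, which is why it suits boolean preferences; cosine uses the values but not their absolute scale.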
  55. Different Similarity definitions influence neighborhood formation
  56. Pearson’s vs. Euclidean distance
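To make the Pearson vs. Euclidean comparison concrete, here is a plain-Java sketch under stated assumptions: Pearson is the usual correlation coefficient over co-rated items, and the Euclidean-based similarity is the textbook form 1 / (1 + distance), which may differ from the exact formula Mahout's EuclideanDistanceSimilarity uses. The class name and the sample vectors are hypothetical.

```java
class PearsonVsEuclideanSketch {

    // Pearson correlation between two rating vectors over the same items.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) {
            return Double.NaN; // undefined when one user gives constant ratings
        }
        return cov / Math.sqrt(varX * varY);
    }

    // Textbook distance-to-similarity mapping (an assumption, see lead-in).
    static double euclideanSimilarity(double[] x, double[] y) {
        double sumSquares = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sumSquares += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    public static void main(String[] args) {
        // Same taste, but the second user rates everything two points higher.
        double[] u1 = {1, 2, 3};
        double[] u2 = {3, 4, 5};
        System.out.println("Pearson:   " + pearson(u1, u2));
        System.out.println("Euclidean: " + euclideanSimilarity(u1, u2));
    }
}
```

For these two users Pearson sees identical trends and returns a perfect correlation, while the Euclidean-based score is low because of the constant offset. That difference is one way the choice of similarity can change which users end up in a neighborhood.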
  58. Components: UserNeighborhood
     • Which definitions of neighborhood are available?
       – Nearest N users
         • The first N users with the highest similarity are labeled as ‘neighbors’
       – Thresholds
         • Users whose similarity is above a threshold are labeled as ‘neighbors’
       – Already implemented!
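The two neighborhood definitions can be sketched in plain Java. This is only an illustration of the selection rules described on the slide, not Mahout's NearestNUserNeighborhood or ThresholdUserNeighborhood code; the class name, method names and similarity values are hypothetical, and treating a similarity equal to the threshold as a neighbor is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class NeighborhoodSketch {

    // Nearest-N: the N users with the highest similarity to the target user.
    static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarities.entrySet());
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            neighbors.add(e.getKey());
        }
        return neighbors;
    }

    // Threshold: every user whose similarity reaches the threshold, sorted by ID.
    static List<Long> aboveThreshold(Map<Long, Double> similarities, double threshold) {
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (e.getValue() >= threshold) {
                neighbors.add(e.getKey());
            }
        }
        Collections.sort(neighbors);
        return neighbors;
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 2, 3 and 4 to the target user.
        Map<Long, Double> sims = Map.of(2L, 0.9, 3L, 0.1, 4L, 0.5);
        System.out.println(nearestN(sims, 2));
        System.out.println(aboveThreshold(sims, 0.4));
    }
}
```

Nearest-N always returns a fixed-size neighborhood even when the neighbors are weak matches, whereas a threshold keeps quality constant but can return very few (or no) neighbors on sparse data.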
  59. Components: Recommender
     • Given a DataModel, a definition of similarity between users (items) and a definition of neighborhood, a recommender produces as output an estimation of relevance for each unseen item
     • Which recommendation algorithms are implemented?
       – User-based CF
       – Item-based CF
       – SVD-based CF
       – SlopeOne CF
       – (and much more…)
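As a rough sketch of what "an estimation of relevance for each unseen item" can mean for user-based CF, the plain-Java example below uses the common similarity-weighted average of the neighbors' ratings. This is the textbook formula and is offered as an assumption; it is not presented as Mahout's exact code. The class name is hypothetical, similarities are assumed to be positive, and a neighbor who has not rated the item is marked with NaN.

```java
class UserBasedEstimateSketch {

    // Estimated rating = sum(sim_i * rating_i) / sum(sim_i) over neighbors who rated the item.
    static double estimate(double[] neighborSimilarities, double[] neighborRatings) {
        double weightedSum = 0.0;
        double similaritySum = 0.0;
        for (int i = 0; i < neighborSimilarities.length; i++) {
            if (Double.isNaN(neighborRatings[i])) {
                continue; // this neighbor has not rated the item
            }
            weightedSum += neighborSimilarities[i] * neighborRatings[i];
            similaritySum += neighborSimilarities[i];
        }
        return similaritySum == 0.0 ? Double.NaN : weightedSum / similaritySum;
    }

    public static void main(String[] args) {
        // Two hypothetical neighbors: similarity 0.9 rated the item 4.0, similarity 0.5 rated it 2.0.
        System.out.println(estimate(new double[] {0.9, 0.5}, new double[] {4.0, 2.0}));
    }
}
```

The more similar neighbor pulls the estimate toward its own rating: here (0.9 × 4.0 + 0.5 × 2.0) / 1.4 ≈ 3.29.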
  60. Recommendation Engines
  61. Recap
     • Many implementations of a CF-based recommender!
       – 6 different recommendation algorithms
       – 2 different neighborhood definitions
       – 5 different similarity definitions
     • Evaluation of the different implementations is actually very time-consuming
       – The strength of Mahout is that it saves time when evaluating the different combinations of parameters!
       – Standard interface for the evaluation of a Recommender System
  62. Evaluation
     • Mahout provides classes for the evaluation of a recommender system
       – Prediction-based measures
         • MAE (Mean Absolute Error)
         • RMSE (Root Mean Square Error)
       – IR-based measures
         • Precision, Recall, F1-measure, F1@n
         • NDCG (ranking measure)
  63. Evaluation
     • Prediction-based Measures
       – Class: AverageAbsoluteDifferenceRecommenderEvaluator
       – Method: evaluate()
       – Parameters:
         • Recommender implementation
         • DataModel implementation
         • TrainingSet size (e.g. 70%)
         • % of the data to use in the evaluation (smaller % for fast prototyping)
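The two prediction-based measures can be written out in a few lines of plain Java. This sketch only shows the arithmetic on a list of held-out ratings and the corresponding predictions; it is not the Mahout evaluator, and the class name and sample values are hypothetical.

```java
class PredictionErrorSketch {

    // Mean Absolute Error: average of |actual - predicted|.
    static double mae(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    // Root Mean Square Error: square root of the average squared error.
    static double rmse(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            sum += error * error;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] actual = {4, 3, 5};
        double[] predicted = {3, 3, 3};
        System.out.println("MAE:  " + mae(actual, predicted));
        System.out.println("RMSE: " + rmse(actual, predicted));
    }
}
```

With errors of 1, 0 and 2 the MAE is 1.0 while the RMSE is about 1.29: squaring makes RMSE penalize the single large error more heavily. For both measures, lower is better.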
  64. Evaluation
     • IR-based Measures
       – Class: GenericRecommenderIRStatsEvaluator
       – Method: evaluate()
       – Parameters:
         • Recommender implementation
         • DataModel implementation
         • Relevance Threshold (mean+standard deviation)
         • % of the data to use in the evaluation (smaller % for fast prototyping)
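The IR-based measures treat recommendation as retrieval: a recommended item counts as a hit when it belongs to the user's set of relevant items (for instance, items rated above the relevance threshold). The plain-Java sketch below computes precision, recall and F1 for the top-n recommendations. It illustrates the definitions only and does not reproduce GenericRecommenderIRStatsEvaluator; the class name and the item IDs are hypothetical.

```java
import java.util.List;
import java.util.Set;

class IrMetricsSketch {

    // Number of relevant items among the first n recommendations.
    static int hitsAt(List<Long> recommended, Set<Long> relevant, int n) {
        int hits = 0;
        for (Long item : recommended.subList(0, Math.min(n, recommended.size()))) {
            if (relevant.contains(item)) {
                hits++;
            }
        }
        return hits;
    }

    // Precision@n: fraction of the top-n recommendations that are relevant.
    static double precisionAt(List<Long> recommended, Set<Long> relevant, int n) {
        int considered = Math.min(n, recommended.size());
        return considered == 0 ? 0.0 : (double) hitsAt(recommended, relevant, n) / considered;
    }

    // Recall@n: fraction of the relevant items that appear in the top n.
    static double recallAt(List<Long> recommended, Set<Long> relevant, int n) {
        return relevant.isEmpty() ? 0.0 : (double) hitsAt(recommended, relevant, n) / relevant.size();
    }

    // F1: harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        return precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        List<Long> recommended = List.of(101L, 102L, 103L, 104L);
        Set<Long> relevant = Set.of(102L, 104L, 105L);
        double p = precisionAt(recommended, relevant, 2);
        double r = recallAt(recommended, relevant, 2);
        System.out.println("P@2=" + p + " R@2=" + r + " F1=" + f1(p, r));
    }
}
```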
  65. Part 2: How to use Mahout? Hands-on
  66. Download Mahout
     • Download
       – The latest Mahout release is 0.8
       – Available at: http://apache.fastbull.org/mahout/0.8/mahout-distribution-0.8.zip
       – Extract all the libraries and include them in a new NetBeans (Eclipse) project
     • Requirement: Java 1.6.x or greater.
     • Hadoop is not mandatory!
  67. Important
     JavaDoc: https://builds.apache.org/job/Mahout-Quality/javadoc/
  69. Exercise 1
     • Create a Preference object
     • Set preferences through some simple Java calls
     • Print some statistics about preferences (how many preferences, on which items the user has expressed ratings, etc.)
     • Hints about objects to be used:
       – Preference
       – GenericUserPreferenceArray
  70. Exercise 1: preferences
     import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
     import org.apache.mahout.cf.taste.model.Preference;
     import org.apache.mahout.cf.taste.model.PreferenceArray;

     class CreatePreferenceArray {

       private CreatePreferenceArray() {
       }

       public static void main(String[] args) {
         // Preferences of a single user, sized for two entries
         PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
         // The user ID is shared by every entry of the array
         user1Prefs.setUserID(0, 1L);
         // Score 2 for item 101
         user1Prefs.setItemID(0, 101L);
         user1Prefs.setValue(0, 2.0f);
         // Score 3 for item 102
         user1Prefs.setItemID(1, 102L);
         user1Prefs.setValue(1, 3.0f);
         // Read back the second preference and print it
         Preference pref = user1Prefs.get(1);
         System.out.println(pref);
       }
     }
  73. Exercise 2
     • Create a DataModel
     • Feed the DataModel through some simple Java calls
     • Print some statistics about data (how many users, how many items, maximum ratings, etc.)
     • Hints about objects to be used:
       – FastByIDMap
       – Model
  74. Exercise 2: data model
     • PreferenceArray stores the preferences of a single user
     • Where are the preferences of all the users stored?
       – A HashMap? No.
       – Mahout introduces data structures optimized for recommendation tasks
         • HashMaps are replaced by FastByIDMap
  75. Exercise 2: data model
     import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
     import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
     import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
     import org.apache.mahout.cf.taste.model.DataModel;
     import org.apache.mahout.cf.taste.model.PreferenceArray;

     class CreateGenericDataModel {

       private CreateGenericDataModel() {
       }

       public static void main(String[] args) {
         // Maps each user ID to that user's PreferenceArray
         FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();
         // Sized to the number of preferences actually set below
         // (the slide allocates 10, which leaves unused empty entries)
         PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(2);
         prefsForUser1.setUserID(0, 1L);
         prefsForUser1.setItemID(0, 101L);
         prefsForUser1.setValue(0, 3.0f);
         prefsForUser1.setItemID(1, 102L);
         prefsForUser1.setValue(1, 4.5f);
         preferences.put(1L, prefsForUser1);
         // Build the in-memory DataModel from the map
         DataModel model = new GenericDataModel(preferences);
         System.out.println(model);
       }
     }
  77. Exercise 3
     • Create a DataModel
     • Feed the DataModel through a CSV file
     • Calculate similarities between users
       – CSV file should contain enough data!
     • Hints about objects to be used:
       – FileDataModel
       – PearsonCorrelationSimilarity, TanimotoCoefficientSimilarity, etc.
  78. Exercise 3: similarity
     import java.io.File;
     import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
     import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
     import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
     import org.apache.mahout.cf.taste.model.DataModel;
     import org.apache.mahout.cf.taste.similarity.UserSimilarity;

     class Example3_Similarity {

       public static void main(String[] args) throws Exception {
         // Instantiate the DataModel from the CSV file (FileDataModel)
         DataModel model = new FileDataModel(new File("intro.csv"));
         // Similarity definitions
         UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
         UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
         // Output: similarity between users 1 and 2, then users 1 and 3
         System.out.println("Pearson:" + pearson.userSimilarity(1, 2));
         System.out.println("Euclidean:" + euclidean.userSimilarity(1, 2));
         System.out.println("Pearson:" + pearson.userSimilarity(1, 3));
         System.out.println("Euclidean:" + euclidean.userSimilarity(1, 3));
       }
     }
  84. Exercise 4
     • Create a DataModel
     • Feed the DataModel through a CSV file
     • Calculate similarities between users
       – CSV file should contain enough data!
     • Generate neighborhood
     • Generate recommendations
       – Compare different combinations of parameters!
     • Hints about objects to be used:
       – NearestNUserNeighborhood
       – GenericUserBasedRecommender
         • Parameters: data model, neighborhood, similarity measure
  85. Exercise 4: First Recommender

      import java.io.File;
      import java.util.List;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;

      class RecommenderIntro {
        private RecommenderIntro() { }

        public static void main(String[] args) throws Exception {
          DataModel model = new FileDataModel(new File("intro.csv"));
          UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
          UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
          Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
          // recommend(userID, howMany)
          List<RecommendedItem> recommendations = recommender.recommend(1, 1);
          for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
          }
        }
      }
  86. Exercise 4: First Recommender (same code as slide 85; callout on the DataModel line: FileDataModel)
  87. Exercise 4: First Recommender (same code as slide 85; callout on NearestNUserNeighborhood(2, similarity, model): 2 neighbours)
  88. Exercise 4: First Recommender (same code as slide 85; callout on recommender.recommend(1, 1): Top-1 Recommendation for User 1)
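What a user-based recommender does with the neighborhood can be sketched in plain Java: the estimated rating of an unseen item is a similarity-weighted average of the ratings given by the neighbours. This is a simplified sketch, not Mahout's code. UserBasedPredictionSketch and estimate are hypothetical names, and skipping neighbours with non-positive similarity is a simplifying assumption made here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Mahout.
class UserBasedPredictionSketch {

    // similarityToNeighbor: neighbour user ID -> similarity to the target user.
    // neighborRatingOfItem: neighbour user ID -> that neighbour's rating of the
    // candidate item (no entry if the neighbour has not rated it).
    static double estimate(Map<Long, Double> similarityToNeighbor,
                           Map<Long, Double> neighborRatingOfItem) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (Map.Entry<Long, Double> e : similarityToNeighbor.entrySet()) {
            double sim = e.getValue();
            Double rating = neighborRatingOfItem.get(e.getKey());
            // Assumption: ignore neighbours without a rating or with sim <= 0.
            if (rating == null || Double.isNaN(sim) || sim <= 0.0) {
                continue;
            }
            weightedSum += sim * rating;
            weightTotal += sim;
        }
        return weightTotal == 0.0 ? Double.NaN : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = new HashMap<>();
        sims.put(2L, 0.5); // made-up similarities of users 2 and 3 to user 1
        sims.put(3L, 1.0);
        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(2L, 4.0); // made-up ratings of the candidate item
        ratings.put(3L, 1.0);
        System.out.println("Estimated rating: " + estimate(sims, ratings));
    }
}
```

The item with the highest estimate among those the target user has not rated yet would be the top-1 recommendation.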
  89. Exercise 5: MovieLens Recommender
      • Download the GroupLens MovieLens dataset (100k)
        – Its format is already Mahout compliant
        – http://files.grouplens.org/datasets/movielens/ml-100k.zip
      • Preparatory exercise: repeat Exercise 3 (similarity calculations) with a bigger dataset
      • Next: run the recommendation framework against a widely used benchmark dataset
  90. Exercise 5: MovieLens Recommender

      import java.io.File;
      import java.util.List;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;

      class RecommenderIntro {
        private RecommenderIntro() { }

        public static void main(String[] args) throws Exception {
          // ua.base is one of the training splits shipped with MovieLens 100k
          DataModel model = new FileDataModel(new File("ua.base"));
          UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
          UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
          Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
          // Top-20 recommendations for user 1
          List<RecommendedItem> recommendations = recommender.recommend(1, 20);
          for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
          }
        }
      }
  91. Exercise 5: MovieLens Recommender (same code as slide 90, except recommender.recommend(10, 50), i.e. the top-50 recommendations for user 10; callout: We can play with parameters!)
  92. Exercise 5: MovieLens Recommender
      • Analyze the recommender's behavior with different combinations of parameters
        – Do the recommendations change with a different similarity measure?
        – Do the recommendations change with different neighborhood sizes?
        – Which configuration is the best one?
      • ... Let's go to the next exercise!
  93. Exercise 6: Recommender Evaluation
      • Evaluate different CF recommender configurations on MovieLens data
      • Metrics: RMSE, MAE, Precision
  94. Exercise 6: Recommender Evaluation
      • Evaluate different CF recommender configurations on MovieLens data
      • Metrics: RMSE, MAE
      • Hints: useful classes
        – Implementations of the RecommenderEvaluator interface
          • AverageAbsoluteDifferenceRecommenderEvaluator
          • RMSRecommenderEvaluator
  95. Exercise 6: Recommender Evaluation
      • Further hints:
        – Use RandomUtils.useTestSeed() to ensure consistency among different evaluation runs
        – Invoke the evaluate() method
          • Parameters
            – RecommenderBuilder: builds the recommender instance (as in previous exercises)
            – DataModelBuilder: specific criterion for building the DataModel from the training data (null selects the default)
            – DataModel: the data to evaluate on
            – Training/test split: double value (e.g. 0.7 for 70% training)
            – Amount of data to use in the evaluation: double value (e.g. 1.0 for 100%)
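  96. Example 6: evaluation

      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.*;
      import org.apache.mahout.cf.taste.impl.eval.*;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;
      import org.apache.mahout.common.RandomUtils;

      class EvaluatorIntro {
        private EvaluatorIntro() { }

        public static void main(String[] args) throws Exception {
          // Ensures the consistency between different evaluation runs.
          RandomUtils.useTestSeed();
          DataModel model = new FileDataModel(new File("ua.base"));
          RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
          // Build the same recommender for testing that we did last time:
          RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
              UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
              UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
              return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
          };
          double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
          System.out.println(score);
        }
      }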
  97. Example 6: evaluation (same code as slide 96, shown without callouts)
  98. Exercise 6: evaluation (same code as slide 96; callout on evaluate(recommenderBuilder, null, model, 0.7, 1.0): 70% training, evaluation on the whole dataset)
  99. Exercise 6: evaluation (same code as slide 96; callout on the body of buildRecommender: Recommendation Engine)
  100. Example 6: evaluation (same code as slide 96, plus a second evaluator declared right after the first: RecommenderEvaluator rmse = new RMSRecommenderEvaluator(); callout: We can add more measures)
  101. Example 6: evaluation

      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.*;
      import org.apache.mahout.cf.taste.impl.eval.*;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;
      import org.apache.mahout.common.RandomUtils;

      class EvaluatorIntro {
        private EvaluatorIntro() { }

        public static void main(String[] args) throws Exception {
          RandomUtils.useTestSeed();
          DataModel model = new FileDataModel(new File("ua.base"));
          // MAE and RMSE evaluators
          RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
          RecommenderEvaluator rmse = new RMSRecommenderEvaluator();
          // Build the same recommender for testing that we did last time:
          RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
              UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
              UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
              return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
          };
          double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
          double rmseScore = rmse.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
          System.out.println(score);
          System.out.println(rmseScore);
        }
      }
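The two scores printed above are error measures over held-out ratings: MAE averages the absolute differences between predicted and actual ratings, and RMSE takes the square root of the mean squared difference. A minimal plain-Java sketch of the two formulas follows; ErrorMetricsSketch is a hypothetical class and the ratings are made up for illustration, it does not reproduce how Mahout splits the data.

```java
// Hypothetical helper class for illustration; not part of Mahout.
class ErrorMetricsSketch {

    // Mean absolute error between predicted and actual ratings.
    static double mae(double[] predicted, double[] actual) {
        checkSameLength(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    // Root mean squared error between predicted and actual ratings.
    static double rmse(double[] predicted, double[] actual) {
        checkSameLength(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    private static void checkSameLength(double[] predicted, double[] actual) {
        if (predicted.length == 0 || predicted.length != actual.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        double[] predicted = {3.0, 4.0, 5.0}; // made-up predictions
        double[] actual = {4.0, 4.0, 3.0};    // made-up held-out ratings
        System.out.println("MAE:  " + mae(predicted, actual));
        System.out.println("RMSE: " + rmse(predicted, actual));
    }
}
```

For both measures lower is better; RMSE penalizes large individual errors more heavily than MAE.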
  102. Mahout Strengths
      • Fast prototyping and evaluation
        – To evaluate a different configuration of the same algorithm, we just need to update a parameter and run again.
        – Example
          • Different neighborhood size
  103. 5 minutes to look for the best configuration
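  105. Exercise 7: Recommender Evaluation
      • Evaluate different CF recommender configurations on MovieLens data
      • Metrics: Precision, Recall
      • Hints: useful classes
        – GenericRecommenderIRStatsEvaluator
        – evaluate() method
          • Same RecommenderBuilder, DataModelBuilder and DataModel as in Exercise 6, followed by an IDRescorer (may be null), the cutoff N, the relevance threshold and the evaluation percentage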
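  106. Exercise 7: IR-based evaluation

      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.*;
      import org.apache.mahout.cf.taste.impl.eval.*;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;
      import org.apache.mahout.common.RandomUtils;

      class IREvaluatorIntro {
        private IREvaluatorIntro() { }

        public static void main(String[] args) throws Exception {
          RandomUtils.useTestSeed();
          DataModel model = new FileDataModel(new File("ua.base"));
          RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
          // Build the same recommender for testing that we did last time:
          RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
              UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
              UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
              return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
          };
          // Cutoff 5: Precision@5, Recall@5, etc.
          IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
              GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
          System.out.println(stats.getPrecision());
          System.out.println(stats.getRecall());
          System.out.println(stats.getF1Measure());
        }
      }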
  108. Exercise 7: IR-based evaluation (same code as slide 106, with a larger neighborhood: new NearestNUserNeighborhood(500, similarity, model); callout: Set Neighborhood to 500)
  109. Exercise 7: IR-based evaluation (same code as slide 108, with a different similarity: UserSimilarity similarity = new EuclideanDistanceSimilarity(model); callout: Set Euclidean Distance)
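Precision@N, Recall@N and F1 can be sketched in plain Java for a single user: precision is the share of the top-N recommendations that are relevant, recall is the share of the relevant items that appear in the top N. PrecisionRecallSketch is a hypothetical class with made-up item IDs; how Mahout decides which items count as relevant (the threshold argument) is not modeled here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; not part of Mahout.
class PrecisionRecallSketch {

    // Number of relevant items among the first n entries of the ranked list.
    static int hitsAt(List<Long> ranked, Set<Long> relevant, int n) {
        int hits = 0;
        int limit = Math.min(n, ranked.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    // Precision@n: hits divided by the cutoff n.
    static double precisionAt(List<Long> ranked, Set<Long> relevant, int n) {
        return n <= 0 ? Double.NaN : (double) hitsAt(ranked, relevant, n) / n;
    }

    // Recall@n: hits divided by the number of relevant items.
    static double recallAt(List<Long> ranked, Set<Long> relevant, int n) {
        return relevant.isEmpty() ? Double.NaN
                : (double) hitsAt(ranked, relevant, n) / relevant.size();
    }

    // Harmonic mean of precision and recall; 0 when both are 0.
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        List<Long> ranked = Arrays.asList(10L, 20L, 30L, 40L, 50L);   // made-up top-5 list
        Set<Long> relevant = new HashSet<>(Arrays.asList(20L, 50L, 60L)); // made-up relevant items
        double p = precisionAt(ranked, relevant, 5);
        double r = recallAt(ranked, relevant, 5);
        System.out.println("Precision@5: " + p);
        System.out.println("Recall@5:    " + r);
        System.out.println("F1:          " + f1(p, r));
    }
}
```

An evaluator would average these per-user values over all evaluated users.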
  110. Exercise 8: item-based recommender
      • Mahout provides Java classes for building an item-based recommender system
        – Amazon-like
        – Recommendations are based on similarities among items (generally pre-computed offline)
        – Evaluate it with the MovieLens dataset!
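  111. Example 8: item-based recommender

      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.*;
      import org.apache.mahout.cf.taste.impl.eval.*;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;
      import org.apache.mahout.common.RandomUtils;

      class IREvaluatorIntro {
        private IREvaluatorIntro() { }

        public static void main(String[] args) throws Exception {
          RandomUtils.useTestSeed();
          DataModel model = new FileDataModel(new File("ua.base"));
          RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
          // Build the item-based recommender to evaluate:
          RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
              ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
              return new GenericItemBasedRecommender(model, similarity);
            }
          };
          IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
              GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
          System.out.println(stats.getPrecision());
          System.out.println(stats.getRecall());
          System.out.println(stats.getF1Measure());
        }
      }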
  112. Example 8: item-based recommender (same code as slide 111; callout on ItemSimilarity similarity = new PearsonCorrelationSimilarity(model): ItemSimilarity)
  113. Example 8: item-based recommender (same code as slide 111; callout: No Neighborhood definition for item-based recommenders)
  114. Exercise 9: Recommender Evaluation
      • Write a class that automatically runs the evaluation with different parameters
        – e.g. fixed neighborhood sizes from an array of values
        – Print the best scores and the configuration
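One possible skeleton for this exercise, sketched in plain Java: loop over an array of neighborhood sizes, score each one, and keep the size with the lowest error. NeighborhoodSweepSketch and bestSize are hypothetical names. In a Mahout version the scoring function would presumably build a recommender with the given size and return the value of evaluator.evaluate(...) as in Example 6; the scoring function in main is made up so that the sketch runs on its own.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical helper class for illustration; not part of Mahout.
class NeighborhoodSweepSketch {

    // Returns the neighborhood size with the lowest error, or -1 if none is usable.
    static int bestSize(int[] sizes, IntToDoubleFunction errorFor) {
        int best = -1;
        double bestError = Double.POSITIVE_INFINITY;
        for (int size : sizes) {
            double error = errorFor.applyAsDouble(size);
            System.out.println("neighborhood=" + size + " error=" + error);
            // NaN scores (e.g. nothing could be estimated) never win.
            if (!Double.isNaN(error) && error < bestError) {
                bestError = error;
                best = size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 50, 100, 500};
        // Made-up stand-in for a real evaluation run.
        IntToDoubleFunction madeUpError = n -> 0.8 + Math.abs(n - 100) / 1000.0;
        System.out.println("Best neighborhood size: " + bestSize(sizes, madeUpError));
    }
}
```

The same loop extends to several parameters (similarity measure, training percentage) by nesting it or by iterating over a list of configurations.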
  115. Exercise 10: more datasets!
      • Find the best configuration for several datasets
        – Download datasets from http://mahout.apache.org/users/basics/collections.html
        – Write classes to transform the input data into a Mahout-compliant format
        – Extend Exercise 9!
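A transformation class for this exercise might look like the following plain-Java sketch. It assumes an input where each line holds user ID, item ID and rating separated by some delimiter (the "::" and tab delimiters in the example are assumptions about the source files), and it emits the userID,itemID,rating lines that FileDataModel reads. RatingsCsvSketch and toCsvLine are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of Mahout.
class RatingsCsvSketch {

    // Converts one "user<delim>item<delim>rating[<delim>...]" line to "user,item,rating".
    static String toCsvLine(String line, String delimiterRegex) {
        String[] fields = line.trim().split(delimiterRegex);
        if (fields.length < 3) {
            throw new IllegalArgumentException("expected at least 3 fields: " + line);
        }
        long userId = Long.parseLong(fields[0].trim());
        long itemId = Long.parseLong(fields[1].trim());
        double rating = Double.parseDouble(fields[2].trim());
        return userId + "," + itemId + "," + rating;
    }

    // Converts all non-blank lines, keeping their order.
    static List<String> convert(List<String> lines, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                out.add(toCsvLine(line, delimiterRegex));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up input lines using "::" as the delimiter.
        List<String> input = Arrays.asList("1::1193::5::978300760", "1::661::3::978302109");
        for (String csv : convert(input, "::")) {
            System.out.println(csv);
        }
    }
}
```

Writing the converted lines to a file and passing that file to FileDataModel would let the classes from the previous exercises run unchanged.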
  116. End. Do you want more?
  117. Do you want more?
      • Recommendation
        – Deployment of a Mahout-based web recommender
        – Integration with Hadoop
        – Integration of content-based information
        – Custom similarities, custom recommenders, rescoring functions
      • Classification, Clustering and Pattern Mining