Recommendation Engine using Apache Mahout

Exploring a weighted ensemble of different recommendation engines such as User based, Item based, Slope-one based and Content based for the MovieLens 100K, 1M, 10M datasets. Achieved an improvement of 11.59% with the ensemble. Also implemented the item based recommender in a distributed manner using Apache Mahout.


Transcript

  • 1. MovieLens Recommendation Engine
      Outline:
      • Task & Dataset
      • Techniques
      • Results
      • Scalability
      • Conclusion
      Authors: Ambarish Hazarnis, Vibhor Mathur
  • 2. Task
      Predict the rating a user will give to a movie they have not yet seen. Recommend the movies with the highest predicted ratings.
  • 3. Dataset
      MovieLens 100k
      • 100,000 ratings by 943 users on 1682 items. Each user has rated at least 20 movies.
      • Movies can be in several genres at once.
      • Demographic information about the users (age, gender, occupation).
      Evaluation
      • Root Mean Squared Error (RMSE)
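The evaluation metric named on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code; the function name `rmse` and the sample ratings are assumptions made for the example.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions vs. held-out ratings (not taken from the slides).
print(rmse([4.0, 3.5, 2.0], [5.0, 3.0, 2.0]))
```

Lower values are better: RMSE is zero only when every predicted rating matches the held-out rating exactly.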
  • 4. Techniques
      • Collaborative
          • User Based
          • Item Based
          • Slope One
      • Content Based
          • User Based: Age, Occupation, Gender
          • Item Based: Genre
      • Ensemble
          • Committee
          • Weighted
      • Distributed
  • 5. Results (RMSE)
      Recommender            Error
      User Based             1.227
      Item Based             0.664
      Slope One              0.587
      User Content Based     0.649
      Item Content Based     0.639
  • 6. Ensemble
      Committee
      Recommender                RMSE
      Collaborative Based        0.595
      Content Based              0.612
      Collaborative + Content    0.594

      Weighted
      Recommender                RMSE
      Collaborative Based        0.747
      Content Based              0.612
      Collaborative + Content    0.663
  • 7. Slope One
      • Principle: The preference for a new item is based on the average difference in preference value between the new item and the other items the user prefers.
      • For two items I1 and I2, the rating of user1 for I2, given that user1 has rated I1, is estimated from user1's rating of I1 and the average difference between I2 and I1.
      • Count weighting: Weight more heavily those differences that are based on more data.
      • Standard deviation: A low standard deviation of the differences translates to a higher weight.
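The Slope One principle with count weighting can be sketched as follows. This is a minimal in-memory illustration, not the Mahout implementation the authors used; the function names and the three-user rating table are assumptions, and the standard-deviation weighting mentioned on the slide is left out.

```python
from collections import defaultdict


def build_diffs(ratings):
    """Return (avg_diff, count): avg_diff[j][i] is the mean of r_j - r_i over
    users who rated both items; count[j][i] is the number of such users."""
    total = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    total[j][i] += r_j - r_i
                    count[j][i] += 1
    avg_diff = {j: {i: total[j][i] / count[j][i] for i in total[j]} for j in total}
    return avg_diff, count


def predict(ratings, user, target, avg_diff, count):
    """Count-weighted Slope One prediction of `user`'s rating for `target`."""
    numerator = denominator = 0.0
    for i, r_i in ratings[user].items():
        c = count.get(target, {}).get(i, 0)
        if i != target and c:
            # Shift the known rating by the average difference, weighted by support.
            numerator += (r_i + avg_diff[target][i]) * c
            denominator += c
    return numerator / denominator if denominator else None


# Hypothetical ratings: user2 has rated I1 and I2 but not I3.
ratings = {
    "user1": {"I1": 5.0, "I2": 3.0, "I3": 2.0},
    "user2": {"I1": 3.0, "I2": 4.0},
    "user3": {"I2": 2.0, "I3": 5.0},
}
avg_diff, count = build_diffs(ratings)
print(predict(ratings, "user2", "I3", avg_diff, count))
```

In this toy table the I3 vs. I2 difference is backed by two users and the I3 vs. I1 difference by one, so the former counts twice as much in the prediction.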
  • 8. User Content Based
      User: Gender, Occupation, Age
      • Principle: Two users of similar gender, occupation or age group share similar taste.
      • Similarity: Takes advantage of user-specific knowledge through a custom similarity metric for users.
      • Gender, occupation and age similarities are assigned different weights and combined into this custom similarity.
      • The custom similarity metric can be paired with a standard GenericUserBasedRecommender.
      • All rating-related information is discarded from the metric computation.
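A weighted demographic similarity of the kind described on this slide might look like the sketch below. The slides do not give the weights or the age grouping, so `WEIGHTS`, the ten-year `age_group` buckets, and the `User` class are all hypothetical; the authors' metric was written against Mahout's Java API, not Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    gender: str
    occupation: str
    age: int


# Hypothetical weights; the values used by the authors are not stated.
WEIGHTS = {"gender": 0.2, "occupation": 0.3, "age": 0.5}


def age_group(age, width=10):
    """Bucket an age into groups of `width` years (an assumed grouping)."""
    return age // width


def user_similarity(a, b, weights=WEIGHTS):
    """Similarity in [0, 1] from demographics only; no ratings are used."""
    score = 0.0
    if a.gender == b.gender:
        score += weights["gender"]
    if a.occupation == b.occupation:
        score += weights["occupation"]
    if age_group(a.age) == age_group(b.age):
        score += weights["age"]
    return score / sum(weights.values())


alice = User("F", "engineer", 27)
bob = User("M", "engineer", 24)
print(user_similarity(alice, bob))
```

Because the function never touches ratings, it can score pairs of users who have no rated movies in common.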
  • 9. Item Content Based
      Item: Multiple genres
      • Principle: Two movies that share several genres will be similar.
      • Similarity: Takes advantage of item-specific knowledge through a custom similarity metric for movies.
      • Similarity is deduced from the degree of overlap between the movies' genres.
      • The custom movie similarity metric can be paired with a standard GenericItemBasedRecommender.
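One plausible way to measure the "degree of similarity of genres" is the Jaccard index over genre sets. The slides do not say which formula was used, so treat the choice of Jaccard, the function name, and the sample genres as assumptions.

```python
def genre_similarity(genres_a, genres_b):
    """Jaccard index of two genre collections: shared genres / all genres."""
    a, b = set(genres_a), set(genres_b)
    if not a and not b:
        return 0.0  # no genre information for either movie
    return len(a & b) / len(a | b)


# Hypothetical genre assignments for two movies.
print(genre_similarity({"Action", "Sci-Fi"}, {"Action", "Thriller"}))
```

Movies with identical genre sets score 1.0 and movies with no genre in common score 0.0, which suits a dataset where a movie can belong to several genres at once.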
  • 10. Ensemble
      • Uses the 'wisdom of crowds' phenomenon.
      • Committee: Unweighted average of the predicted ratings of all recommenders.
      • Weighted: Higher weights for better recommenders. If Ei is the error of recommender i, let Ai and Wi denote its accuracy and weight respectively.
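The two ensemble schemes can be sketched as below. The transcript does not preserve the formula linking Ei, Ai and Wi, so the inverse-error rule used here (Ai = 1/Ei, Wi = Ai divided by the sum of all Aj) is an assumption chosen only because it gives better recommenders higher weights. The error values are the Item Based, Slope One and Item Content Based figures from slide 5; the predicted ratings are made up.

```python
def committee(predictions):
    """Unweighted average of the ratings predicted by each recommender."""
    return sum(predictions) / len(predictions)


def inverse_error_weights(errors):
    """Assumed weighting: accuracy A_i = 1 / E_i, normalised so weights sum to 1."""
    accuracies = [1.0 / e for e in errors]
    total = sum(accuracies)
    return [a / total for a in accuracies]


def weighted(predictions, weights):
    """Weighted average of predictions; weights are expected to sum to 1."""
    return sum(p * w for p, w in zip(predictions, weights))


errors = [0.664, 0.587, 0.639]   # RMSE per recommender, from slide 5
predictions = [3.8, 4.1, 3.5]    # hypothetical ratings for one (user, movie) pair
weights = inverse_error_weights(errors)
print(committee(predictions), weighted(predictions, weights))
```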
  • 11. Scalability-1
      • Case study: Item-based recommender using co-occurrence as similarity.
      • Example score for one item (co-occurrence count times the user's preference value, summed): 4(2.0) + 3(0.0) + 4(0.0) + 3(4.0) + 1(4.5) + 2(0.0) + 0(5.0) = 24.5
      • Distributed computation helps by breaking up a problem that is too big for one server into pieces that several smaller servers can handle.
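The arithmetic on this slide is a dot product between one row of the co-occurrence matrix and the user's preference vector. The numbers below are copied from the slide; the variable names are illustrative.

```python
# One row of the co-occurrence matrix (counts for a single candidate item).
cooccurrence_row = [4, 3, 4, 3, 1, 2, 0]
# The user's preference values for the same seven items (0.0 = not rated).
user_preferences = [2.0, 0.0, 0.0, 4.0, 4.5, 0.0, 5.0]

score = sum(c * p for c, p in zip(cooccurrence_row, user_preferences))
print(score)  # 24.5
```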
  • 12. Scalability-2
      • The recommender sums the products of co-occurrences and preference values.
      • Why is it suitable for distributed computation? Computing the resulting recommendation vector only requires loading one row or column of the matrix at a time.
      • Diagram labels: User's Ratings, Co-occurrence Matrix, Item-Based Recommender, Top N Recommendations
      • Apache Mahout: Provides scalable machine learning libraries.
      • Class: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob (5 MapReduce jobs)
      • Recommendations for User 122: [ 9 : 5.0, 546 : 5.0, 568 : 5.0, 527 : 5.0, 515 : 5.0, 514 : 5.0, 511 : 5.0, 498 : 5.0]
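The flow from ratings to a co-occurrence matrix to top-N recommendations, processed one matrix column at a time, can be sketched on a single machine as below. This is a toy illustration of the idea and not Mahout's RecommenderJob; the function names, item IDs and preference values are assumptions.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence_columns(user_items):
    """Map each item to its matrix column: how many users have it together
    with each other item."""
    columns = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in permutations(sorted(items), 2):
            columns[a][b] += 1
    return columns


def recommend(columns, preferences, top_n=3):
    """Accumulate the recommendation vector one column at a time, then return
    the top-N items the user has not rated yet."""
    scores = defaultdict(float)
    for item, pref in preferences.items():
        # Only the column for this one item is needed at this step.
        for other, count in columns.get(item, {}).items():
            scores[other] += count * pref
    unseen = {i: s for i, s in scores.items() if i not in preferences}
    return sorted(unseen.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical data: which items each user has rated.
user_items = {1: {101, 102, 103}, 2: {101, 102}, 3: {102, 103, 104}}
columns = cooccurrence_columns(user_items)
print(recommend(columns, {101: 5.0, 102: 3.0}))
```

Each loop iteration in `recommend` touches a single column and adds a partial result to the running scores, which is why the work can be split across machines and the partial sums merged afterwards.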
  • 13. Conclusion
      • The Slope One recommender worked best, but it is also computationally very expensive.
      • The content-based approach gave better results than the plain collaborative approach. However, the former is domain-specific.
      • An ensemble of simple learners gave comparable results.
      • More learners in an ensemble result in better predictions.
  • 14. Thank You