Recommendation Engine using Apache Mahout

Exploring a weighted ensemble of different recommendation engines such as User based, Item based, Slope-one based and Content based for the MovieLens 100K, 1M, 10M datasets. Achieved an improvement of 11.59% with the ensemble. Also implemented the item based recommender in a distributed manner using Apache Mahout.

Published in: Technology

Transcript of "Recommendation Engine using Apache Mahout"

  1. MovieLens Recommendation Engine
     Outline:
     • Task & Dataset
     • Techniques
     • Results
     • Scalability
     • Conclusion
     Ambarish Hazarnis, Vibhor Mathur
  2. Task
     Predict the rating a user will give to a movie they have not yet seen. Recommend the movies with the highest predicted scores.
  3. Dataset: MovieLens 100k
     • 100,000 ratings by 943 users on 1,682 items. Each user has rated at least 20 movies.
     • Movies can be in several genres at once.
     • Demographic information about the users (age, gender, occupation).
     Evaluation
     • Root Mean Squared Error (RMSE)
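As an illustration of the evaluation metric named on this slide, here is a minimal Python sketch of RMSE. The function name `rmse` and the sample values are made up for the example and do not come from the deck.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical predictions and held-out ratings, for illustration only.
print(rmse([4.0, 2.0], [3.0, 4.0]))
```

Lower values mean the predicted ratings sit closer to the held-out ratings.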
  4. Techniques
     • Collaborative
       - User Based
       - Item Based
       - Slope One
     • Content Based
       - User Based: Age, Occupation, Gender
       - Item Based: Genre
     • Ensemble
       - Committee
       - Weighted
     • Distributed
  5. Results (RMSE)
     Recommender           RMSE
     User Based            1.227
     Item Based            0.664
     Slope One             0.587
     User Content Based    0.649
     Item Content Based    0.639
  6. Ensemble
     • Committee
       Recommender                RMSE
       Collaborative Based        0.595
       Content Based              0.612
       Collaborative + Content    0.594
     • Weighted
       Recommender                RMSE
       Collaborative Based        0.747
       Content Based              0.612
       Collaborative + Content    0.663
  7. Slope One
     • Principle: the preference for a new item is based on the average difference in preference value between the new item and the other items the user prefers.
     • For two items I1 and I2, the rating of user1 for I2, given that user1 has rated I1, is user1's rating of I1 plus the average difference between the ratings of I2 and I1.
     • Count weighting: weight more heavily those differences that are based on more data.
     • Standard deviation: a low standard deviation translates to a higher weight.
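The Slope One principle with count weighting can be sketched in plain Python as follows. This is an illustrative sketch, not the Mahout implementation used in the deck: the function name, the data layout (a dict of per-user rating dicts) and the toy ratings are assumptions, and the standard-deviation weighting mentioned on the slide is left out.

```python
from collections import defaultdict

def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating of `target` with count-weighted Slope One.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns None when no co-rated item is available.
    """
    diff_sum = defaultdict(float)  # sum of (target - item) rating differences
    count = defaultdict(int)       # number of users who rated both items
    for other_ratings in ratings.values():
        if target not in other_ratings:
            continue
        for item, r in other_ratings.items():
            if item != target:
                diff_sum[item] += other_ratings[target] - r
                count[item] += 1

    numerator, denominator = 0.0, 0
    for item, r in ratings[user].items():
        if item == target or count[item] == 0:
            continue
        avg_diff = diff_sum[item] / count[item]
        # Count weighting: differences backed by more users weigh more.
        numerator += (r + avg_diff) * count[item]
        denominator += count[item]
    return numerator / denominator if denominator else None

# Toy data, invented for illustration.
toy = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
print(slope_one_predict(toy, "lucy", "A"))  # (2.5 * 2 + 8 * 1) / 3
```

In the toy data, the A-B difference is backed by two users and the A-C difference by one, so the estimate derived from B counts twice as much.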
  8. User Content Based
     User: Gender, Occupation, Age
     • Principle: two users of similar gender, occupation or age group share similar taste.
     • Similarity: takes advantage of user-specific knowledge.
       - A custom similarity metric for user similarity.
       - Gender, occupation and age similarities are given different weights and combined into this custom similarity.
       - The custom similarity metric can be paired with a standard GenericUserBasedRecommender.
       - All rating-related information is discarded from the metric computation.
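A weighted demographic similarity of this kind might look like the Python sketch below. The weights, the ten-year age buckets, the exact-match comparison and all names are assumptions chosen for illustration; the deck does not state the values or formula it used.

```python
def age_group(age, width=10):
    """Bucket an age into a decade-sized group (assumed bucket width)."""
    return age // width

def user_similarity(u, v, w_gender=0.25, w_occupation=0.5, w_age=0.25):
    """Weighted sum of exact-match demographic similarities, in [0, 1].

    u, v: dicts with "gender", "occupation" and "age" keys.
    The weights are hypothetical and sum to 1. No ratings are used.
    """
    gender_sim = 1.0 if u["gender"] == v["gender"] else 0.0
    occupation_sim = 1.0 if u["occupation"] == v["occupation"] else 0.0
    age_sim = 1.0 if age_group(u["age"]) == age_group(v["age"]) else 0.0
    return w_gender * gender_sim + w_occupation * occupation_sim + w_age * age_sim

# Hypothetical users, for illustration only.
alice = {"gender": "F", "occupation": "engineer", "age": 34}
bob = {"gender": "M", "occupation": "engineer", "age": 37}
print(user_similarity(alice, bob))  # occupation and age group match
```

A function like this plays the role of the custom user similarity that the slide pairs with a user-based recommender.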
  9. Item Content Based
     Item: Multiple genres
     • Principle: two movies that share several genres are similar.
     • Similarity: takes advantage of item-specific knowledge.
       - A custom similarity metric for movie similarity.
       - Similarity is deduced from the degree of overlap between genres.
       - The custom movie similarity metric can be paired with a standard GenericItemBasedRecommender.
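One plausible way to measure the degree of genre overlap is the Jaccard (Tanimoto) coefficient over genre sets, sketched below in Python. The deck does not say which measure was used, so the choice of Jaccard, the function name and the sample genres are assumptions.

```python
def genre_similarity(genres_a, genres_b):
    """Jaccard similarity of two genre collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(genres_a), set(genres_b)
    union = a | b
    if not union:
        return 0.0  # neither movie has genre information
    return len(a & b) / len(union)

# Hypothetical genre lists, for illustration only.
print(genre_similarity(["Action", "Sci-Fi"], ["Action", "Thriller"]))
```

Two movies with identical genre sets score 1.0 and movies with no genre in common score 0.0.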
  10. Ensemble
      • Ensemble
        - Uses the 'wisdom of crowds' phenomenon
      • Committee
        - Unweighted average of the predicted ratings of all recommenders
      • Weighted
        - Higher weights for better recommenders
        - If Ei is the error of recommender i, let Ai and Wi denote its accuracy and weight respectively.
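The two ensemble schemes can be sketched in Python as below. The relation between Ei, Ai and Wi is not given in the transcript, so this sketch assumes Ai = 1 / Ei and Wi = Ai / sum(A); that choice, the function names and the sample numbers are illustrative assumptions only.

```python
def committee(predictions):
    """Unweighted average of the ratings predicted by all recommenders."""
    return sum(predictions) / len(predictions)

def weighted(predictions, errors):
    """Weighted average that favours recommenders with lower error.

    Assumption (not from the deck): accuracy A_i = 1 / E_i and
    weight W_i = A_i / sum of all accuracies.
    """
    accuracies = [1.0 / e for e in errors]
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    return sum(w * p for w, p in zip(weights, predictions))

# Hypothetical predictions from three recommenders and their errors.
preds = [3.5, 4.0, 4.5]
errs = [1.2, 0.7, 0.6]
print(committee(preds), weighted(preds, errs))
```

With equal errors the weighted scheme reduces to the committee average.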
  11. Scalability-1
      • Case study: item-based recommender using co-occurrence as similarity.
        4(2.0) + 3(0.0) + 4(0.0) + 3(4.0) + 1(4.5) + 2(0.0) + 0(5.0) = 24.5
      • Distributed computation helps by breaking up a problem that’s too big for one server into pieces that several smaller servers can handle.
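The arithmetic on this slide is a dot product between one row of the co-occurrence matrix and the user's preference vector. A short Python sketch reproduces it; the variable names are illustrative, and the numbers are the ones shown on the slide.

```python
# One row of the co-occurrence matrix (counts for a single candidate item).
cooccurrence_row = [4, 3, 4, 3, 1, 2, 0]
# The user's preference values for the same seven items (0.0 = not rated).
user_preferences = [2.0, 0.0, 0.0, 4.0, 4.5, 0.0, 5.0]

# Sum of the products of co-occurrences and preference values.
score = sum(c * p for c, p in zip(cooccurrence_row, user_preferences))
print(score)  # 8.0 + 12.0 + 4.5 = 24.5
```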
  12. Scalability-2
      • Sums the products of co-occurrences and preference values.
      • How is it suitable for distributed computation? Computing the resulting recommendation vector only requires loading one row or column of the matrix at a time.
      • Diagram: User's Ratings and Co-occurrence Matrix → Item Based Recommender → Top N Recommendations
      • Apache Mahout: provides scalable machine learning libraries.
        Job class: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob (5 MapReduce jobs)
      • Recommendations for User 122: [ 9 : 5.0, 546 : 5.0, 568 : 5.0, 527 : 5.0, 515 : 5.0, 514 : 5.0, 511 : 5.0, 498 : 5.0]
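To show why only one column is needed at a time, here is a small single-machine Python sketch that builds co-occurrence columns and accumulates the recommendation vector column by column. It is not Mahout's RecommenderJob and does not mirror its MapReduce stages; the function names and toy data are assumptions for illustration.

```python
from collections import defaultdict
from itertools import permutations

def cooccurrence_columns(user_items):
    """Return {item: {other_item: count}}: how often two items share a user."""
    columns = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in permutations(items, 2):
            columns[a][b] += 1
    return columns

def recommend(columns, preferences, top_n=3):
    """Accumulate scores one co-occurrence column at a time.

    Each rated item contributes its column scaled by the user's preference,
    so the columns can be processed independently and summed afterwards.
    """
    scores = defaultdict(float)
    for item, pref in preferences.items():
        for other, count in columns.get(item, {}).items():
            scores[other] += count * pref
    unseen = [(i, s) for i, s in scores.items() if i not in preferences]
    return sorted(unseen, key=lambda pair: (-pair[1], pair[0]))[:top_n]

# Toy data, invented for illustration.
history = {"u1": ["A", "B"], "u2": ["A", "B", "C"], "u3": ["B", "C"]}
columns = cooccurrence_columns(history)
print(recommend(columns, {"A": 4.0}))
```

Because each column's contribution is independent of the others, the partial score vectors can be computed on separate workers and added together.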
  13. Conclusion
      • The Slope One recommender worked best, but it is also computationally very expensive.
      • The content-based approach gave better results than the plain collaborative approach. However, the former is domain-specific.
      • An ensemble of simple learners gave comparable results.
      • More learners in an ensemble result in better predictions.
  14. Thank You