Recommendation Engine using Apache Mahout

Explores a weighted ensemble of recommendation engines (user-based, item-based, Slope One and content-based) on the MovieLens 100K, 1M and 10M datasets. The ensemble achieved an improvement of 11.59%. The item-based recommender was also implemented in a distributed manner using Apache Mahout.

MovieLens Recommendation Engine

Outline:
• Task & Dataset
• Techniques
• Results
• Scalability
• Conclusion

Ambarish Hazarnis
Vibhor Mathur
Task
Predict the rating a user will give to a movie they have not yet seen.
Recommend the movies with the highest predicted ratings.

Dataset
MovieLens 100k
• 100,000 ratings by 943 users on 1,682 items. Each user has rated at least 20 movies.
• Movies can be in several genres at once.
• Demographic information about the users (age, gender, occupation).

Evaluation
Root Mean Squared Error
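
As a rough illustration of the evaluation metric, the following is a minimal sketch of RMSE in plain Python. The function name `rmse` and the sample ratings are assumptions made for this example; the slides do not show how the metric was computed in the project.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions vs. held-out ratings on the 1-5 MovieLens scale.
print(rmse([4.0, 3.0, 5.0], [5.0, 3.0, 4.0]))
```

Lower values are better: a recommender whose predictions match the held-out ratings exactly would score 0.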

Techniques

• Collaborative
  • User Based
  • Item Based
  • Slope One
• Content Based
  • User Based – Age, Occupation, Gender
  • Item Based – Genre
• Ensemble
  • Committee
  • Weighted
• Distributed

Results

RMSE by recommender:

Recommender           RMSE
User Based            1.227
Item Based            0.664
Slope One             0.587
User Content Based    0.649
Item Content Based    0.639

Ensemble

Committee

Recommender               RMSE
Collaborative Based       0.595
Content Based             0.612
Collaborative + Content   0.594

Weighted

Recommender               RMSE
Collaborative Based       0.747
Content Based             0.612
Collaborative + Content   0.663

Slope One

Principle:
The preference for a new item is estimated from the average difference in preference value between the new item and the other items the user has rated.

For two items I1 and I2, the predicted rating of user1 for I2, given that user1 has rated I1, is user1's rating of I1 plus the average difference (I2 minus I1) observed among users who rated both.

Count weighting: weight more heavily those differences that are based on more data.

Standard deviation weighting: a low standard deviation of the differences translates to a higher weight.
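
The principle and the count weighting described above can be sketched as follows. This is a plain-Python illustration, not the Mahout implementation used in the project; the function name `slope_one_predict`, the data layout and the toy ratings are assumptions, and standard deviation weighting is left out.

```python
from collections import defaultdict


def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating of `target` with count-weighted Slope One.

    ratings maps user -> {item: rating}. Returns None when no other user
    has rated both `target` and an item that `user` has rated.
    """
    diff_sum = defaultdict(float)  # sum of (target - other) over co-raters
    diff_count = defaultdict(int)  # number of co-raters per other item
    for prefs in ratings.values():
        if target not in prefs:
            continue
        for other, rating in prefs.items():
            if other != target:
                diff_sum[other] += prefs[target] - rating
                diff_count[other] += 1

    numerator = 0.0
    denominator = 0.0
    for other, rating in ratings[user].items():
        count = diff_count.get(other, 0)
        if other == target or count == 0:
            continue
        average_diff = diff_sum[other] / count
        # Count weighting: differences backed by more co-raters weigh more.
        numerator += (rating + average_diff) * count
        denominator += count
    return numerator / denominator if denominator else None


# Hypothetical toy data: three users, three items.
toy = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}
print(slope_one_predict(toy, "lucy", "A"))
```

In the toy data, two users co-rated A and B (average difference 0.5) and one co-rated A and C (difference 3.0), so the estimate for lucy is ((2 + 0.5) * 2 + (5 + 3) * 1) / 3 = 13/3, about 4.33.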

User Content Based
User: Gender, Occupation, Age
Principle - Two users of similar gender, occupation or age group are assumed to share similar taste.
Similarity -
• Takes advantage of user-specific knowledge.
• Uses a custom similarity metric for user similarity.
• Assigns different weights to the gender, occupation and age similarities to derive this custom similarity.
• The custom similarity metric can be paired with a standard GenericUserBasedRecommender.
• All rating-related information is discarded from the metric computation.
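
One possible shape for such a demographic similarity is sketched below in plain Python. The weights (0.2, 0.3, 0.5), the 10-year age span and the dictionary layout are all assumptions chosen for illustration; the slides do not give the actual weights, and in the project the metric was plugged into Mahout rather than written this way.

```python
def user_similarity(u, v, w_gender=0.2, w_occupation=0.3, w_age=0.5, age_span=10):
    """Weighted demographic similarity in [0, 1]; ratings are not used.

    u and v are dicts with "gender", "occupation" and "age" keys.
    The default weights and age_span are hypothetical values.
    """
    gender = 1.0 if u["gender"] == v["gender"] else 0.0
    occupation = 1.0 if u["occupation"] == v["occupation"] else 0.0
    # Age similarity decays linearly and reaches 0 at `age_span` years apart.
    age = max(0.0, 1.0 - abs(u["age"] - v["age"]) / age_span)
    return w_gender * gender + w_occupation * occupation + w_age * age


alice = {"gender": "F", "occupation": "engineer", "age": 30}
bob = {"gender": "M", "occupation": "engineer", "age": 35}
print(user_similarity(alice, bob))
```

Keeping the weights as parameters makes it easy to tune how much each attribute contributes, which is the kind of adjustment the slide describes.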

Item Content Based
Item: Multiple genres
Principle - Two movies that share several genres are assumed to be similar.
Similarity -
• Takes advantage of item-specific knowledge.
• Uses a custom similarity metric for movie similarity.
• Similarity is derived from the degree of overlap between the movies' genres.
• The custom movie similarity metric can be paired with a standard GenericItemBasedRecommender.
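
A genre-overlap similarity could look like the sketch below. The Jaccard index is used here as a stand-in, since the slides do not say which measure of genre overlap was used; the function name and the sample genres are likewise assumptions.

```python
def genre_similarity(genres_a, genres_b):
    """Jaccard overlap of two genre collections, in [0, 1]."""
    a, b = set(genres_a), set(genres_b)
    if not a and not b:
        return 0.0  # no genre information for either movie
    return len(a & b) / len(a | b)


# Hypothetical movies: one shared genre out of three distinct genres.
print(genre_similarity({"Action", "Sci-Fi"}, {"Action", "Thriller"}))
```

Because MovieLens movies can carry several genres at once, a set-based measure like this rewards movies that share more of their genres.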

Ensemble

Uses the 'wisdom of crowds' phenomenon.

Committee
Unweighted average of the predicted ratings of all recommenders.

Weighted
Higher weights for better recommenders.
If Ei is the error of recommender i, let Ai and Wi denote its accuracy and weight respectively.
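
The two combination schemes can be sketched as below. The slide's formula for Ai and Wi is not preserved, so this example assumes inverse-error weighting (each weight proportional to 1/Ei, normalized to sum to 1); that choice and the function names are assumptions, not the authors' definition.

```python
def committee(predictions):
    """Unweighted average of the recommenders' predicted ratings."""
    return sum(predictions) / len(predictions)


def inverse_error_weights(errors):
    """Assumed scheme: weight proportional to 1/error, normalized to sum to 1."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [x / total for x in inverse]


def weighted(predictions, errors):
    """Weighted average that favors recommenders with lower error."""
    weights = inverse_error_weights(errors)
    return sum(w * p for w, p in zip(weights, predictions))


# RMSE values for user-based, item-based and Slope One from the results table;
# the three predicted ratings are hypothetical.
errors = [1.227, 0.664, 0.587]
predictions = [3.0, 4.0, 4.5]
print(committee(predictions), weighted(predictions, errors))
```

With these inputs the weighted estimate is pulled toward the Slope One and item-based predictions, since those recommenders have the lowest error.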

Scalability-1

Case Study: Item Based Recommender using co-occurrence as similarity.

A user's score for one item is the dot product of that item's co-occurrence row with the user's preference vector, for example:
4(2.0) + 3(0.0) + 4(0.0) + 3(4.0) + 1(4.5) + 2(0.0) + 0(5.0) = 24.5

Distributed computation helps by breaking up a problem that's too big for one server into pieces that several smaller servers can handle.
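
The arithmetic in the example above can be reproduced with a short plain-Python sketch. The variable names are assumptions; the numbers are the ones on the slide.

```python
def dot(row, vector):
    """Sum of element-wise products of two equal-length sequences."""
    return sum(c * p for c, p in zip(row, vector))


# One row of the co-occurrence matrix and one user's preference values,
# taken from the worked example on the slide.
cooccurrence_row = [4, 3, 4, 3, 1, 2, 0]
user_preferences = [2.0, 0.0, 0.0, 4.0, 4.5, 0.0, 5.0]

score = dot(cooccurrence_row, user_preferences)
print(score)  # 24.5
```

Repeating this for every row of the matrix yields the user's full recommendation vector.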

Scalability-2

The recommender sums the products of co-occurrences and preference values.

How is it suitable for distributed computation?
Computing the resulting recommendation vector only requires loading one row or column of the matrix at a time.

[Figure: the user's ratings and the co-occurrence matrix feed the item-based recommender, which outputs the top-N recommendations.]

Apache Mahout: provides scalable machine learning libraries.
Job class: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob (5 MapReduce jobs)

Recommendations for User 122:
[ 9 : 5.0, 546 : 5.0, 568 : 5.0, 527 : 5.0, 515 : 5.0, 514 : 5.0, 511 : 5.0, 498 : 5.0]
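
To illustrate why only one column is needed at a time, here is a single-machine sketch that accumulates partial products column by column and then picks the top N unrated items. It is not Mahout's RecommenderJob; the function name `recommend`, the dictionary layout and the toy item IDs are assumptions for this example.

```python
def recommend(cooccurrence, preferences, top_n):
    """Score unrated items by accumulating one co-occurrence column at a time.

    cooccurrence maps item -> {other_item: co-occurrence count}.
    preferences maps item -> rating for a single user.
    """
    scores = {}
    for item, rating in preferences.items():
        # Only the column for `item` is touched in this iteration, so each
        # column's partial products could be computed on a different worker.
        for other, count in cooccurrence.get(item, {}).items():
            scores[other] = scores.get(other, 0.0) + count * rating
    candidates = [(i, s) for i, s in scores.items() if i not in preferences]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_n]


# Hypothetical co-occurrence counts for three items.
toy_cooccurrence = {
    101: {102: 3, 103: 4},
    102: {101: 3, 103: 1},
    103: {101: 4, 102: 1},
}
print(recommend(toy_cooccurrence, {101: 5.0, 102: 2.0}, top_n=1))
```

Since the partial products are simply added together, the order in which columns are processed does not matter, which is what makes the computation easy to split across machines.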

Conclusion

• The Slope One recommender worked best, but it is also computationally very expensive.
• The content-based approach gave better results than the plain collaborative approach. However, the former is domain-specific.
• An ensemble of simple learners gave comparable results.
• More learners in an ensemble result in better predictions.
