Retail Products Recommendations Using
Machine Learning
Products Recommendations
Business Case – Why?
When you use YouTube, Netflix or other online media services, you may have noticed “recommended for you” lists of videos, movies or music. As consumers, we like a personalized list that gives easy access to products and services and saves us time. The more videos we watch, the more accurate and relevant those recommendations become. A more satisfied user is a winning factor for a company.
Big data makes this possible through its scalability and its power to process huge volumes of structured or unstructured data. With big data tools, data developers can analyze a company's catalogue of potentially billions of products and process it with machine learning to provide more narrowly targeted recommendations for each user.
Products Recommendations - Business Case
PRODUCT RECOMMENDATIONS
Customers of these brands are delighted by the huge variety of products to choose from, but often find it difficult to sift through that variety and identify the things they would like.
RECOMMENDATIONS HELP USERS
- Navigate the maze of product catalogues
- Find what they are looking for
- Find products they might like but didn't know about
RECOMMENDATIONS HELP THESE BRANDS
SOLVE the Problem of DISCOVERY
HOW?
HOW? USING DATA
A RECOMMENDATION ENGINE combines what users bought, what users browsed and what users rated to produce suggestions such as:
- "Top picks for you!"
- "If you like this, you'll love that"
Products Recommendations
How?
RECOMMENDATION ENGINE
OBJECTIVE: Filter relevant products
TASKS PERFORMED BY RECOMMENDATION ENGINES
- Predict what rating the user would give a product
- Predict whether a user would buy a product
- Rank products based on their relevance to the user
Products Recommendations - How?
Most RECOMMENDATION ENGINES use a technique called COLLABORATIVE FILTERING (here, its latent factor variant).
How does that work?
The basic premise is that if two users have the same opinion about a bunch of products, they are likely to have the same opinion about other products too.
COLLABORATIVE FILTERING represents users by their ratings for different products. Its algorithms normally predict users' ratings for products they haven't yet rated.
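The premise above can be illustrated with a small plain-Scala sketch (no Spark needed). It measures how closely two users agree by computing cosine similarity over only the products both of them have rated. This is a neighborhood-style similarity chosen purely for illustration; it is not the latent-factor method the deck uses later, and the names `UserSimilarity`, `Ratings` and the sample users are hypothetical.

```scala
object UserSimilarity {
  // Hypothetical representation of one user's opinions: productId -> rating
  type Ratings = Map[Int, Double]

  // Cosine similarity computed only over products that both users rated.
  // Returns 0.0 when the users share no products.
  def cosineSimilarity(a: Ratings, b: Ratings): Double = {
    // toSeq so that the mapped values below are not de-duplicated as they would be in a Set
    val common = (a.keySet intersect b.keySet).toSeq
    if (common.isEmpty) 0.0
    else {
      val dot   = common.map(p => a(p) * b(p)).sum
      val normA = math.sqrt(common.map(p => a(p) * a(p)).sum)
      val normB = math.sqrt(common.map(p => b(p) * b(p)).sum)
      if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
    }
  }

  def main(args: Array[String]): Unit = {
    val alice: Ratings = Map(1 -> 5.0, 2 -> 4.0, 3 -> 1.0)
    val bob: Ratings   = Map(1 -> 5.0, 2 -> 4.0, 3 -> 1.0, 4 -> 5.0)
    val carol: Ratings = Map(1 -> 1.0, 2 -> 1.0, 3 -> 5.0)
    println(f"alice vs bob:   ${cosineSimilarity(alice, bob)}%.3f")
    println(f"alice vs carol: ${cosineSimilarity(alice, carol)}%.3f")
  }
}
```

Bob agrees with Alice on every product they share, so his high rating for product 4 is a reasonable hint for Alice; Carol's tastes diverge. Because raw positive ratings always give a positive cosine, practical systems often mean-center each user's ratings first.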
Products are represented using these descriptors (figure: sweaters, jeans, shirts, outerwear).
Users are represented using the same descriptors. For example, Joe likes lightweight skinny-fit jeans and linen-cotton short-sleeve standard-fit shirts.
RECOMMENDATIONS FOR Joe
(figure: sweaters, jeans, shirts, outerwear)
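The descriptor idea can be sketched in plain Scala: give Joe and each product a weight per descriptor, score each product by the dot product of the two vectors, and rank by score. The descriptor names, weights and product names below are hypothetical; in a latent-factor model such as ALS, the "descriptors" are learned from the ratings rather than chosen by hand.

```scala
object DescriptorScoring {
  // Hypothetical descriptor axes; the deck does not list the real ones
  val descriptors = Vector("lightweight", "skinnyFit", "linenCotton", "shortSleeve")

  final case class Item(name: String, weights: Vector[Double])

  // Dot product of two descriptor vectors of equal length
  def dot(a: Vector[Double], b: Vector[Double]): Double = {
    require(a.length == b.length, "descriptor vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Rank items by how well they match the user's descriptor profile, best first
  def rank(user: Vector[Double], items: Seq[Item]): Seq[(String, Double)] =
    items.map(i => (i.name, dot(user, i.weights))).sortBy(-_._2)

  def main(args: Array[String]): Unit = {
    val joe = Vector(0.9, 0.8, 0.7, 0.6)
    val catalogue = Seq(
      Item("Skinny lightweight jeans", Vector(1.0, 1.0, 0.0, 0.0)),
      Item("Linen short-sleeve shirt", Vector(0.8, 0.0, 1.0, 1.0)),
      Item("Heavy wool sweater", Vector(0.0, 0.0, 0.0, 0.0))
    )
    rank(joe, catalogue).foreach { case (n, s) => println(f"$n%-26s $s%.2f") }
  }
}
```

With these made-up weights the shirt scores 2.02, the jeans 1.70 and the sweater 0.00, so the shirt is recommended first.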
USER-PRODUCT-RATING MATRIX
Rows are users (User1, User2, ... User n, identified by User ID) and columns are products (Prod1, Prod2, ... Prod n, identified by Product ID). Each cell holds the rating the user gave the product; "-" marks a product the user has not rated.
4 - 4 - - - - - - -
- 3 4 - - - - - - -
5 3 2 - - - - 3 - -
2 - 3 - - - - - - -
5 - - - 4 2 3 - - -
- - - - - 4 - - - -
- - - 4 - 5 - - - -
- 3 3 - - - - - 4 -
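Because most cells are empty, such a matrix is normally stored sparsely as (user, product, rating) triples, which is also the shape of the ratings file used later. A minimal plain-Scala sketch, assuming the ten matrix columns are products 1 to 10; `RatingMatrix` and `Entry` are hypothetical names:

```scala
object RatingMatrix {
  // One observed cell of the user-product-rating matrix (hypothetical type)
  final case class Entry(user: Int, product: Int, rating: Double)

  // Keep only the known cells; any (user, product) key that is absent is a "-"
  def toSparse(entries: Seq[Entry]): Map[(Int, Int), Double] =
    entries.map(e => (e.user, e.product) -> e.rating).toMap

  // Fraction of the users x products grid that actually holds a rating
  def density(entries: Seq[Entry]): Double = {
    val users    = entries.map(_.user).distinct.size
    val products = entries.map(_.product).distinct.size
    if (users == 0 || products == 0) 0.0
    else toSparse(entries).size.toDouble / (users.toDouble * products)
  }

  // Products seen in the data that the given user has not rated yet:
  // these are the cells a recommender tries to fill in
  def unrated(entries: Seq[Entry], user: Int): Seq[Int] = {
    val rated = entries.filter(_.user == user).map(_.product).toSet
    entries.map(_.product).distinct.filterNot(rated).sorted
  }

  def main(args: Array[String]): Unit = {
    // First three rows of the matrix, assuming its columns are products 1..10
    val sample = Seq(
      Entry(1, 1, 4.0), Entry(1, 3, 4.0),
      Entry(2, 2, 3.0), Entry(2, 3, 4.0),
      Entry(3, 1, 5.0), Entry(3, 2, 3.0), Entry(3, 3, 2.0), Entry(3, 8, 3.0)
    )
    println(f"density = ${density(sample)}%.2f")
    println(s"User1 has not rated: ${unrated(sample, 1).mkString(", ")}")
  }
}
```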
Products Recommendations
Implementation
We will use Spark MLlib's ALS algorithm to learn the latent factors that can be used to predict missing entries in the user-product association matrix.
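For reference, the textbook regularized objective behind ALS can be written as below. Here Ω is the set of observed (user, product) pairs, r_ui is the rating, x_u and y_i are the rank-k latent factor vectors of user u and product i, and λ is the regularization parameter. With the product factors fixed, each x_u has the closed-form ridge-regression solution shown (Y_u stacks the factor vectors of the products u rated, r_u holds u's ratings), and symmetrically for y_i (X_i stacks the factors of the users who rated i). ALS alternates between these two updates. This is the standard formulation; MLlib's implementation may scale λ differently, for example by the number of ratings per user or product.

```latex
\min_{X,Y} \sum_{(u,i)\in\Omega} \bigl(r_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr)

x_u \leftarrow \bigl(Y_u^{\top} Y_u + \lambda I_k\bigr)^{-1} Y_u^{\top} r_u,
\qquad
y_i \leftarrow \bigl(X_i^{\top} X_i + \lambda I_k\bigr)^{-1} X_i^{\top} r_i
```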
First we split the ratings data into training data (80%) and test data (20%). We build the model on the training data, then evaluate its predictions against the test data. Building a model on one subset of the data and verifying it on the remaining data estimates how accurately a predictive model will perform in practice. A single split like this is called holdout validation; repeating the process over different subsets, as in k-fold cross-validation, gives a more reliable estimate. To keep things simple, we do it only once.
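The single 80/20 split can be generalized to k-fold cross-validation. A minimal plain-Scala sketch, assuming the records fit in memory as an indexed collection; `KFold`, `folds` and `trainTestPairs` are hypothetical names. On an RDD, `randomSplit` with k equal weights would give a similar (randomly sized) partitioning.

```scala
import scala.util.Random

object KFold {
  // Shuffle record indices with a fixed seed, then deal them round-robin into k folds
  def folds[T](data: IndexedSeq[T], k: Int, seed: Long): IndexedSeq[IndexedSeq[T]] = {
    require(k >= 2 && k <= data.size, "k must be between 2 and the number of records")
    val shuffled = new Random(seed).shuffle(data.indices.toVector)
    (0 until k).map { f =>
      shuffled.zipWithIndex.collect { case (idx, pos) if pos % k == f => data(idx) }
    }
  }

  // Each fold is used once as the test set while the other k - 1 folds form the training set
  def trainTestPairs[T](data: IndexedSeq[T], k: Int, seed: Long): IndexedSeq[(IndexedSeq[T], IndexedSeq[T])] = {
    val fs = folds(data, k, seed)
    fs.indices.map { i =>
      val train = fs.zipWithIndex.collect { case (fold, j) if j != i => fold }.flatten
      (train, fs(i))
    }
  }

  def main(args: Array[String]): Unit = {
    val ratingIds = (1 to 10).toVector
    trainTestPairs(ratingIds, k = 5, seed = 0L).zipWithIndex.foreach { case ((train, test), i) =>
      println(s"fold $i: train=${train.size} test=${test.mkString(",")}")
    }
  }
}
```

Averaging the evaluation metric over the k train/test pairs gives the more reliable estimate; the cost is training the model k times.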
Products Recommendations – Implementation
Using ALTERNATING LEAST SQUARES (ALS) to Build a Matrix Factorization Model
All ratings are contained in the file "ratings.dat" and are in the following format:
UserID::ProductID::Rating::Timestamp
1::1193::5::978300760
- UserIDs range between 1 and 6040
- ProductIDs range between 1 and 3952
- Ratings are made on a 5-star scale
- Timestamp is represented in seconds since the epoch
User information is in the file "users.dat" and is in the following format:
UserID::Gender::Age::Occupation::Zip-code
1::F::1::10::48067
20::M::25::14::55113
- Gender is denoted by a "M" for male and "F" for female
- Age is chosen from the following ranges:
* 1: "Under 18"
* 18: "18-24"
* 25: "25-34"
* 35: "35-44"
* 45: "45-49"
* 50: "50-55"
* 56: "56+"
- Occupation is chosen from the following choices:
* 0: "other" or not specified
* 1: "academic/educator"
* 2: "artist"
* 3: "clerical/admin"
* 4: "college/grad student"
* 5: "customer service"
* 6: "doctor/health care"
* 7: "executive/managerial"
* 8: "farmer"
* 9: "homemaker"
* 10: "K-12 student"
* 11: "lawyer"
* 12: "programmer"
* 13: "retired"
* 14: "sales/marketing"
* 15: "scientist"
* 16: "self-employed"
* 17: "technician/engineer"
* 18: "tradesman/craftsman"
* 19: "unemployed"
* 20: "writer"
Product information is in the file "products.dat" and is in the following format:
ProductID::Name::Category
1::Product1::Pants|Baby|Stripe
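The deck's parseProduct function keeps only the ID and the name and drops the category field. If the categories are needed, for example to filter recommendations by department, a sketch like the following parses them too. It assumes the '|'-separated category format in the sample line above; `ProductParsing` and `ProductWithCategories` are hypothetical names, not part of the deck's code.

```scala
import scala.util.Try

object ProductParsing {
  // Hypothetical variant of the deck's Product case class that also keeps the categories
  final case class ProductWithCategories(productId: Int, name: String, categories: Seq[String])

  // Parse "ProductID::Name::Category", where Category is a '|'-separated list.
  // Malformed lines yield None instead of failing an assertion.
  def parse(line: String): Option[ProductWithCategories] = line.split("::") match {
    case Array(id, name, cats) =>
      Try(id.trim.toInt).toOption.map { pid =>
        // '|' is a regex metacharacter, so it is escaped for String.split
        ProductWithCategories(pid, name, cats.split("\\|").toSeq.filter(_.nonEmpty))
      }
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("1::Product1::Pants|Baby|Stripe"))
    println(parse("not a product line"))
  }
}
```

Returning an Option lets a Spark job drop bad lines with `flatMap` instead of aborting on the first malformed record.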
The Sample Data Sets
The Infrastructure – Amazon Web Services: EMR
Load Data into Spark DataFrames
First we will import some packages and instantiate a sqlContext, which is the entry point for working with structured data
(rows and columns) in Spark and allows the creation of DataFrame objects.
// SQLContext entry point for working with structured data
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
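// Import Spark SQL classes and the built-in functions (min, avg, max) used in the aggregations below
import org.apache.spark.sql._
import org.apache.spark.sql.functions._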
// Import MLLIB data types
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
// define the schemas using case classes
// input format ProductID::Name::Category
case class Product(productId: Int, name: String)
// input format UserID::Gender::Age::Occupation::Zip-code
case class User(userId: Int, gender: String, age: Int, occupation: Int, zip: String)
// function to parse input into Product class
def parseProduct(str: String): Product = {
  val fields = str.split("::")
  assert(fields.size == 3)
  Product(fields(0).toInt, fields(1))
}
// function to parse input into User class
def parseUser(str: String): User = {
  val fields = str.split("::")
  assert(fields.size == 5)
  User(fields(0).toInt, fields(1), fields(2).toInt, fields(3).toInt, fields(4))
}
// function to parse input UserID::ProductID::Rating::Timestamp (the timestamp is ignored)
// and pass it into the constructor of the org.apache.spark.mllib.recommendation.Rating class
def parseRating(str: String): Rating = {
  val fields = str.split("::")
  Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
}
// load the data into an RDD
val ratingText = sc.textFile("/user/hadoop/data/ratings.dat")
val ratingsRDD = ratingText.map(parseRating).cache()
// count number of total ratings
val numRatings = ratingsRDD.count()
// count number of users who rated a product
val numUsers = ratingsRDD.map(_.user).distinct().count()
// count number of products rated
val numProducts = ratingsRDD.map(_.product).distinct().count()
println(s"Got $numRatings ratings from $numUsers users on $numProducts products.")
// load the data into DataFrames
val productsDF = sc.textFile("/user/hadoop/data/products.dat").map(parseProduct).toDF()
val usersDF = sc.textFile("/user/hadoop/data/users.dat").map(parseUser).toDF()
// create a DataFrame from ratingsRDD
val ratingsDF = ratingsRDD.toDF()
ratingsDF.registerTempTable("ratings")
productsDF.registerTempTable("products")
usersDF.registerTempTable("users")
ratingsDF.select("product").distinct.count //res7: Long = 3706
ratingsDF.groupBy("product", "rating").count.show
ratingsDF.groupBy("product").count.agg(min("count"), avg("count"),max("count")).show
ratingsDF.select("product", "rating").groupBy("product", "rating").count.agg(min("count"), avg("count"),max("count")).show
// Get the max and min ratings along with the number of users who have rated each product.
// Display the name, max rating, min rating and number of users.
val results = sqlContext.sql("""
  select products.name, productrates.maxr, productrates.minr, productrates.cntu
  from (SELECT ratings.product, max(ratings.rating) as maxr, min(ratings.rating) as minr,
               count(distinct user) as cntu
        FROM ratings group by ratings.product) productrates
  join products on productrates.product = products.productId
  order by productrates.cntu desc""")
// DataFrame show() displays the top 20 rows in tabular form
results.show()
// Show the 10 most active users and how many ratings each one gave
val mostActiveUsersSchemaRDD = sqlContext.sql("""
  SELECT ratings.user, count(*) as ct from ratings
  group by ratings.user order by ct desc limit 10""")
mostActiveUsersSchemaRDD.take(20).foreach(println)
// Find the products that user 4169 rated higher than 4
val results = sqlContext.sql("""
  SELECT ratings.user, ratings.product, ratings.rating, products.name
  FROM ratings JOIN products ON products.productId = ratings.product
  where ratings.user = 4169 and ratings.rating > 4
  order by ratings.rating desc""")
results.show()
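To make the first SQL query above concrete (max rating, min rating and number of distinct users per product, most-rated products first), here is the same aggregation over a plain Scala collection, handy for understanding or unit-testing the logic without a cluster. It is a sketch: the types are hypothetical, the join with product names is omitted, and the product-ID tie-break is an addition (the SQL only orders by user count).

```scala
object ProductStats {
  // Hypothetical in-memory stand-ins for the ratings table and the query's result rows
  final case class Entry(user: Int, product: Int, rating: Double)
  final case class Stats(product: Int, maxRating: Double, minRating: Double, distinctUsers: Int)

  // Same aggregation as the "group by ratings.product" subquery:
  // max rating, min rating and count(distinct user), ordered by user count descending,
  // with ties broken by product id so the output order is deterministic
  def perProduct(ratings: Seq[Entry]): Seq[Stats] =
    ratings.groupBy(_.product).toSeq
      .map { case (product, rs) =>
        val values = rs.map(_.rating)
        Stats(product, values.max, values.min, rs.map(_.user).distinct.size)
      }
      .sortWith((a, b) =>
        a.distinctUsers > b.distinctUsers ||
          (a.distinctUsers == b.distinctUsers && a.product < b.product))

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Entry(1, 10, 5.0), Entry(2, 10, 3.0), Entry(3, 10, 4.0),
      Entry(1, 20, 2.0), Entry(1, 20, 1.0)
    )
    perProduct(sample).foreach(println)
  }
}
```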
We run ALS on the input trainingRatingsRDD of Rating(user, product, rating) objects with the rank and iterations parameters:
- Rank is the number of latent factors in the model.
- Iterations is the number of iterations to run.
ALS.train(trainingRatingsRDD, rank, iterations), or equivalently the ALS run(trainingRatingsRDD) method, builds and returns a MatrixFactorizationModel, which can be used to make product predictions for users.
// Randomly split ratings RDD into training data RDD (80%) and test data RDD (20%)
val splits = ratingsRDD.randomSplit(Array(0.8, 0.2), 0L)
val trainingRatingsRDD = splits(0).cache()
val testRatingsRDD = splits(1).cache()
val numTraining = trainingRatingsRDD.count()
val numTest = testRatingsRDD.count()
println(s"Training: $numTraining, test: $numTest.")
// Build the recommendation model using ALS with rank=20, iterations=10
val model = ALS.train(trainingRatingsRDD, 20, 10)
// Alternatively, configure an ALS instance and call run()
val model = new ALS().setRank(20).setIterations(10).run(trainingRatingsRDD)
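To show the idea behind ALS training, here is a deliberately tiny plain-Scala sketch with a single latent factor (rank 1). It alternately holds the product factors fixed and solves each user's regularized least-squares problem in closed form, then swaps roles. This is not MLlib's implementation, which handles rank k, partitions the data into blocks and distributes the work; `RankOneALS`, `Entry` and the sample data are hypothetical, and `lambda` only plays the role of MLlib's regularization parameter.

```scala
object RankOneALS {
  // Hypothetical observed rating triple, mirroring MLlib's Rating(user, product, rating)
  final case class Entry(user: Int, product: Int, rating: Double)

  // Rank-1 alternating least squares with L2 regularization `lambda`.
  // With a single latent factor, each regularized least-squares subproblem has a
  // closed-form scalar solution: x_u = sum_i(r_ui * y_i) / (sum_i(y_i^2) + lambda).
  def fit(data: Seq[Entry], iterations: Int, lambda: Double): (Map[Int, Double], Map[Int, Double]) = {
    require(lambda > 0.0, "lambda must be positive")
    val byUser    = data.groupBy(_.user)
    val byProduct = data.groupBy(_.product)
    var userF: Map[Int, Double] = byUser.keys.map(u => u -> 1.0).toMap
    var prodF: Map[Int, Double] = byProduct.keys.map(p => p -> 1.0).toMap
    for (_ <- 1 to iterations) {
      // Step 1: hold product factors fixed and solve for every user factor
      userF = byUser.map { case (u, rs) =>
        val num = rs.map(r => r.rating * prodF(r.product)).sum
        val den = rs.map(r => prodF(r.product) * prodF(r.product)).sum + lambda
        u -> num / den
      }
      // Step 2: hold user factors fixed and solve for every product factor
      prodF = byProduct.map { case (p, rs) =>
        val num = rs.map(r => r.rating * userF(r.user)).sum
        val den = rs.map(r => userF(r.user) * userF(r.user)).sum + lambda
        p -> num / den
      }
    }
    (userF, prodF)
  }

  // Predicted rating is the product of the two one-dimensional latent factors
  def predict(userF: Map[Int, Double], prodF: Map[Int, Double], user: Int, product: Int): Double =
    userF(user) * prodF(product)

  def main(args: Array[String]): Unit = {
    // User 2 has not rated product 3; the learned factors fill in that cell
    val train = Seq(
      Entry(1, 1, 1.0), Entry(1, 2, 2.0), Entry(1, 3, 2.0),
      Entry(2, 1, 2.0), Entry(2, 2, 4.0)
    )
    val (userF, prodF) = fit(train, iterations = 20, lambda = 0.01)
    println(f"predicted rating for user 2, product 3: ${predict(userF, prodF, 2, 3)}%.2f")
  }
}
```

The sample ratings follow a rank-1 pattern (user 2 rates everything twice as high as user 1), so the missing cell should come out close to 4. The rank and iterations parameters in the deck's ALS.train call control the length of the factor vectors and the number of these alternating rounds.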
Making Predictions with the MatrixFactorizationModel
Now we can use the MatrixFactorizationModel to make predictions. First we get product predictions for the most active user, 4169, with the recommendProducts() method, which takes as input the user ID and the number of products to recommend. Then we print out the recommended product names.
// Make product predictions for user 4169
val topRecsForUser = model.recommendProducts(4169, 10)
// get product names to show with recommendations
val productNames = productsDF.rdd.map(row => (row.getInt(0), row.getString(1))).collectAsMap()
// print out top recommendations for user 4169 with products
topRecsForUser.map(rating => (productNames(rating.product), rating.rating)).foreach(println)
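In a matrix factorization model, the predicted rating for a (user, product) pair is the dot product of their latent factor vectors, and recommending products amounts to ranking those scores. The plain-Scala sketch below does this on in-memory arrays. The names are hypothetical, and skipping products the user has already rated is a design choice of this sketch rather than a description of what recommendProducts does.

```scala
object TopN {
  // Score every product as the dot product of the user's and the product's latent
  // factor vectors, skip products the user already rated, and keep the k best.
  // Sorted by score descending, with ties broken by product id for a stable order.
  def recommend(
      userFactor: Array[Double],
      productFactors: Map[Int, Array[Double]],
      alreadyRated: Set[Int],
      k: Int): Seq[(Int, Double)] =
    productFactors.toSeq
      .collect { case (product, f) if !alreadyRated.contains(product) =>
        product -> userFactor.zip(f).map { case (a, b) => a * b }.sum
      }
      .sortWith { case ((p1, s1), (p2, s2)) => if (s1 != s2) s1 > s2 else p1 < p2 }
      .take(k)

  def main(args: Array[String]): Unit = {
    val user = Array(1.0, 0.0)
    val products = Map(1 -> Array(0.9, 0.1), 2 -> Array(0.5, 0.5), 3 -> Array(0.99, 0.0))
    recommend(user, products, alreadyRated = Set(3), k = 2).foreach(println)
  }
}
```

Product 3 would score highest here, but because the user has already rated it, products 1 and 2 are returned instead.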
Evaluating the Model
Next we compare predictions from the model with the actual ratings in testRatingsRDD. First we extract the (user, product) pairs from testRatingsRDD and pass them to the MatrixFactorizationModel predict() method, which takes an RDD of (user, product) pairs and returns predictions as Rating(user, product, rating) objects.
// get predicted ratings to compare to test ratings
val predictionsForTestRDD = model.predict(testRatingsRDD.map{case Rating(user, product, rating) => (user, product)})
println(predictionsForTestRDD.take(10).mkString("\n"))
Now we will compare the test predictions to the actual test ratings. First we put the predictions and the test RDDs in this key,
value pair format for joining: ((user, product), rating). Then we print out the (user, product), (test rating, predicted rating) for
comparison.
// prepare the predictions for comparison
val predictionsKeyedByUserProductRDD = predictionsForTestRDD.map{
case Rating(user, product, rating) => ((user, product), rating)
}
// prepare the test for comparison
val testKeyedByUserProductRDD = testRatingsRDD.map{
case Rating(user, product, rating) => ((user, product), rating)
}
//Join the test with the predictions
val testAndPredictionsJoinedRDD = testKeyedByUserProductRDD.join(predictionsKeyedByUserProductRDD)
println(testAndPredictionsJoinedRDD.take(10).mkString("\n"))
The example below finds false positives by finding predicted ratings which were >= 4 when the actual test rating was <= 1.
val falsePositives = testAndPredictionsJoinedRDD.filter {
  case ((user, product), (ratingT, ratingP)) => ratingT <= 1 && ratingP >= 4
}
falsePositives.take(2)
falsePositives.count
Next we evaluate the model using Mean Absolute Error (MAE): the mean of the absolute differences between the predicted and actual ratings.
// Evaluate the model using Mean Absolute Error (MAE) between test and predictions
val meanAbsoluteError = testAndPredictionsJoinedRDD.map {
  case ((user, product), (testRating, predRating)) =>
    val err = testRating - predRating
    Math.abs(err)
}.mean()
println(s"Mean Absolute Error = $meanAbsoluteError")
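MAE is often reported together with Root Mean Squared Error (RMSE), which penalizes large errors more heavily. The plain-Scala sketch below computes both, plus the false-positive rule used above, over (actual, predicted) pairs like the values of testAndPredictionsJoinedRDD; the object and method names are hypothetical. On Spark, `org.apache.spark.mllib.evaluation.RegressionMetrics` provides `meanAbsoluteError` and `rootMeanSquaredError` for an RDD of (prediction, observation) pairs; note that its pair order is the reverse of the pairs here.

```scala
object ErrorMetrics {
  // Mean of |actual - predicted| over (actual, predicted) pairs
  def mae(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    pairs.map { case (a, p) => math.abs(a - p) }.sum / pairs.size
  }

  // Square root of the mean squared error; large misses weigh more than in MAE
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    math.sqrt(pairs.map { case (a, p) => (a - p) * (a - p) }.sum / pairs.size)
  }

  // Same rule as the false-positive filter above: actual <= 1 but predicted >= 4
  def falsePositives(pairs: Seq[(Double, Double)]): Int =
    pairs.count { case (a, p) => a <= 1 && p >= 4 }

  def main(args: Array[String]): Unit = {
    // Hypothetical (actual, predicted) pairs
    val pairs = Seq((5.0, 4.0), (3.0, 3.0), (1.0, 4.5), (4.0, 2.0))
    println(f"MAE = ${mae(pairs)}%.3f, RMSE = ${rmse(pairs)}%.3f, false positives = ${falsePositives(pairs)}")
  }
}
```

In this sample the single badly wrong prediction (actual 1.0, predicted 4.5) pushes RMSE noticeably above MAE, which is why the two are usually read together.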
Amazon Web Services: EMR Cluster Monitoring
CLOSING THOUGHTS
- The goal of a collaborative filtering algorithm is to take preference data from users and create a model that can be used for recommendations or predictions.
- Collaborative filtering algorithms recommend items based on preference information from many users. The approach is based on similarity: people who liked similar items in the past are likely to like similar items in the future.
- Machine learning algorithms are complex to implement and to run at scale.
- Apache Spark's MLlib has built-in algorithms for classification, regression, clustering, recommendations and more. Under the hood, the library runs these algorithms across a cluster, which frees the programmer from implementing the algorithms and from the intricacies of distributing them.
- Latent factor analysis with ALS is remarkably effective; the main requirement is a good dataset of user-product ratings.
Thank You
hkbhadraa@gmail.com

 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Retail products - machine learning recommendation engine

  • 1. Retail Products Recommendations Using Machine Learning
  • 3. When you use YouTube, Netflix or other online media services, you may have noticed "recommended for you" lists of videos, movies or music. As consumers, we like a personalized list that gives us easy access to products and services and saves us time. As we watch more videos, those recommendations typically become more accurate and relevant. A more satisfied user is a winning factor for a company. Big data makes these services possible through its scalability and its power to process huge volumes of data, whether structured or unstructured. With big data tools, developers can analyze a company's very large product catalogue and use machine learning to provide more narrowly targeted recommendations for each user. Products Recommendations - Business Case PRODUCT RECOMMENDATIONS
  • 4. Customers of these brands are delighted by the huge variety of products to choose from, but often find it difficult to sift through that variety and identify the things they would like. RECOMMENDATIONS HELP USERS: navigate the maze of the product catalogues; find what they are looking for; find PRODUCTS they might like but didn't know of. Products Recommendations - Business Case
  • 5. RECOMMENDATIONS HELP THESE BRANDS SOLVE the Problem of DISCOVERY. HOW? Products Recommendations - Business Case
  • 6. HOW? USING DATA: what users bought, what users browsed and what users rated feed a RECOMMENDATION ENGINE, which produces "Top picks for you!" and "If you like this, you'll love that". Products Recommendations - Business Case
  • 8. RECOMMENDATION ENGINE OBJECTIVE: filter relevant products. Tasks performed by RECOMMENDATION ENGINES: predict what rating the user would give a product; predict whether a user would buy a product; rank products based on their relevance to the user. Products Recommendations - How?
  • 9. Many RECOMMENDATION ENGINES use a technique called COLLABORATIVE FILTERING, here in its latent factor form. How does that work? The basic premise is that if two users have the same opinion about a set of products, they are likely to have the same opinion about other products too. Collaborative filtering represents users by their ratings for different products, and collaborative filtering algorithms normally predict users' ratings for products they haven't yet rated. Products Recommendations - How?
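The premise above can be illustrated with a small neighbourhood-style comparison. Note that this is not the latent factor method the deck goes on to use; it is a minimal plain-Scala sketch, assuming ratings are held in an in-memory Map per user, that measures how closely two users agree on the products they have both rated (cosine similarity over co-rated products). The user names and ratings are made up for illustration.

```scala
// Hypothetical ratings: user -> (product ID -> rating on a 1-5 scale)
val ratings: Map[String, Map[Int, Double]] = Map(
  "alice" -> Map(1 -> 5.0, 2 -> 4.0, 3 -> 1.0),
  "bob"   -> Map(1 -> 5.0, 2 -> 5.0, 3 -> 1.0, 4 -> 4.0),
  "carol" -> Map(1 -> 1.0, 2 -> 2.0, 3 -> 5.0)
)

// Cosine similarity computed only over the products both users rated
def cosineSimilarity(a: Map[Int, Double], b: Map[Int, Double]): Double = {
  val common = a.keySet.intersect(b.keySet).toSeq // Seq, so equal products are not collapsed
  if (common.isEmpty) 0.0
  else {
    val dot   = common.map(p => a(p) * b(p)).sum
    val normA = math.sqrt(common.map(p => a(p) * a(p)).sum)
    val normB = math.sqrt(common.map(p => b(p) * b(p)).sum)
    dot / (normA * normB)
  }
}

val aliceBob   = cosineSimilarity(ratings("alice"), ratings("bob"))
val aliceCarol = cosineSimilarity(ratings("alice"), ratings("carol"))
println(f"alice~bob: $aliceBob%.3f, alice~carol: $aliceCarol%.3f")
```

Alice and Bob agree closely, so Bob's rating for product 4 (which Alice has not rated) is a reasonable hint for Alice. Because raw ratings are all positive, cosine scores stay above zero; mean-centring each user's ratings first separates agreement from disagreement more sharply.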
  • 10. Products are represented using descriptors such as sweaters, jeans, shirts and outerwear, and users are represented using the same descriptors. For example, Joe likes lightweight skinny-fit jeans and linen-cotton short-sleeve standard-fit shirts. 9, 7 Products Recommendations - How?
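In a latent factor model, each user and each product is a short vector of numbers over the same descriptors (the factors), and a predicted rating is the dot product of the two vectors. A minimal sketch with made-up two-factor vectors; the factor meanings in the comments are hypothetical labels, since factors learned by ALS usually have no human-readable meaning.

```scala
// Hypothetical 2-factor vectors. Labels are illustrative only:
// factor 0 ~ "likes lightweight fabrics", factor 1 ~ "likes slim/skinny fits"
val joe           = Array(0.9, 0.8)
val skinnyJeans   = Array(2.5, 2.5)
val chunkySweater = Array(0.5, 0.5)

// Predicted rating = dot product of the user vector and the product vector
def predict(user: Array[Double], product: Array[Double]): Double =
  user.zip(product).map { case (u, p) => u * p }.sum

println(f"Joe / skinny jeans:   ${predict(joe, skinnyJeans)}%.2f")
println(f"Joe / chunky sweater: ${predict(joe, chunkySweater)}%.2f")
```

Joe's vector points in the same direction as the jeans' vector, so that product gets the higher predicted rating.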
  • 12. USER-PRODUCT-RATING MATRIX (figure): rows User1, User2, ... User n; columns Prod1, Prod2, ... Prod n; each cell holds the rating a user gave a product, or "-" where the user has not rated it. Each filled cell corresponds to one input record of (User ID, Product ID, Rating). Products Recommendations - How?
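Most cells of this matrix are empty, so it is normally stored as a list of (user ID, product ID, rating) triples rather than as a dense grid. A minimal sketch with made-up triples that measures how full the matrix is and looks up individual cells; the `Triple` class is a hypothetical stand-in for MLlib's `Rating`.

```scala
// Hypothetical (userId, productId, rating) triples: only the observed cells are stored
case class Triple(user: Int, product: Int, rating: Double)
val observed = Seq(
  Triple(1, 1, 4.0), Triple(1, 3, 4.0),
  Triple(2, 2, 3.0), Triple(2, 3, 4.0),
  Triple(3, 1, 5.0), Triple(3, 4, 2.0)
)

val numUsers    = observed.map(_.user).distinct.size
val numProducts = observed.map(_.product).distinct.size
// Fraction of the users x products grid that actually has a rating
val density = observed.size.toDouble / (numUsers * numProducts)
println(f"$numUsers users x $numProducts products, ${observed.size} ratings, density = $density%.2f")

// Cell lookup keyed by (user, product); None plays the role of "-" in the matrix
val cells: Map[(Int, Int), Double] = observed.map(t => (t.user, t.product) -> t.rating).toMap
println(cells.get((1, 2)))
```

Real catalogues are far sparser than this toy example, which is why the model's job is to fill in the missing cells.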
  • 14. Products Recommendations – Implementation: Using ALTERNATING LEAST SQUARES (ALS) to Build a Matrix Factorization Model. We will use Spark MLlib's ALS algorithm to learn the latent factors that can be used to predict missing entries in the user-product association matrix. First we separate the ratings data into training data (80%) and test data (20%). We train the model on the training data, then evaluate its predictions against the test data. Building the model on one subset of the data and verifying it on the held-back remainder is called hold-out validation; the goal is to estimate how accurately a predictive model will perform in practice. Cross-validation repeats this process several times with different subsets to get a more reliable estimate; here we do it only once.
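Spark's `randomSplit` assigns each record to a split independently at random, so the 80/20 proportions are approximate rather than exact. A plain-Scala sketch of the same idea, assuming an in-memory `Seq` and a fixed seed (the function and data here are illustrative, not Spark's implementation):

```scala
import scala.util.Random

// Assign each record independently: with probability trainFraction it goes to training.
// As with Spark's randomSplit, the resulting sizes are only approximately 80/20.
def randomSplit[A](data: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
  val rng = new Random(seed) // fixed seed makes the split reproducible
  val tagged = data.map(x => (x, rng.nextDouble() < trainFraction))
  (tagged.collect { case (x, true) => x }, tagged.collect { case (x, false) => x })
}

val records = 1 to 1000
val (train, test) = randomSplit(records, 0.8, seed = 0L)
println(s"Training: ${train.size}, test: ${test.size}")
```

Repeating the split with different seeds (or partitioning the data into k folds) and averaging the test error is what turns this single hold-out estimate into cross-validation.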
  • 15. Products Recommendations – Implementation: The Sample Data Sets
All ratings are contained in the file "ratings.dat" and are in the following format:
UserID::ProductID::Rating::Timestamp
1::1193::5::978300760
- UserIDs range between 1 and 6040
- ProductIDs range between 1 and 3952
- Ratings are made on a 5-star scale
- Timestamp is represented in seconds since the epoch
User information is in the file "users.dat" and is in the following format:
UserID::Gender::Age::Occupation::Zip-code
1::F::1::10::48067
20::M::25::14::55113
- Gender is denoted by "M" for male and "F" for female
- Age is chosen from the following ranges:
  * 1: "Under 18"
  * 18: "18-24"
  * 25: "25-34"
  * 35: "35-44"
  * 45: "45-49"
  * 50: "50-55"
  * 56: "56+"
- Occupation is chosen from the following choices:
  * 0: "other" or not specified
  * 1: "academic/educator"
  * 2: "artist"
  * 3: "clerical/admin"
  * 4: "college/grad student"
  * 5: "customer service"
  * 6: "doctor/health care"
  * 7: "executive/managerial"
  * 8: "farmer"
  * 9: "homemaker"
  * 10: "K-12 student"
  * 11: "lawyer"
  * 12: "programmer"
  * 13: "retired"
  * 14: "sales/marketing"
  * 15: "scientist"
  * 16: "self-employed"
  * 17: "technician/engineer"
  * 18: "tradesman/craftsman"
  * 19: "unemployed"
  * 20: "writer"
Product information is in the file "products.dat" and is in the following format:
ProductID::Name::Category
1::Product1::Pants|Baby|Stripe
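The deck's parsers use `assert`, so a single malformed line throws and would typically fail the Spark job. A minimal alternative sketch that returns `Option` and decodes the age bucket using the table above; `UserRecord`, `parseUserLine` and the sample lines (including the malformed one) are hypothetical illustrations.

```scala
import scala.util.Try

// Age buckets as listed for users.dat
val ageLabels = Map(1 -> "Under 18", 18 -> "18-24", 25 -> "25-34", 35 -> "35-44",
                    45 -> "45-49", 50 -> "50-55", 56 -> "56+")

case class UserRecord(userId: Int, gender: String, age: Int, occupation: Int, zip: String)

// Returns None instead of throwing when a line has the wrong shape or bad numbers
def parseUserLine(line: String): Option[UserRecord] = line.split("::") match {
  case Array(id, gender, age, occupation, zip) =>
    Try(UserRecord(id.toInt, gender, age.toInt, occupation.toInt, zip)).toOption
  case _ => None
}

val sample = Seq("1::F::1::10::48067", "20::M::25::14::55113", "malformed line")
val parsed = sample.flatMap(parseUserLine) // the malformed line is silently dropped
parsed.foreach { u =>
  println(s"user ${u.userId}: ${u.gender}, age ${ageLabels.getOrElse(u.age, "unknown")}, zip ${u.zip}")
}
```

In Spark the same function can be applied with `flatMap` on the text RDD so that bad lines are skipped instead of aborting the load; whether to skip or count them is a design choice.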
  • 16-18. Products Recommendations – Implementation: The Infrastructure – Amazon Web Services: EMR
  • 19. Products Recommendations – Implementation: Load Data into Spark DataFrames
First we import some packages and instantiate a sqlContext, which is the entry point for working with structured data (rows and columns) in Spark and allows the creation of DataFrame objects.
// SQLContext: entry point for working with structured data
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// used to implicitly convert an RDD to a DataFrame
import sqlContext.implicits._
// Import Spark SQL classes
import org.apache.spark.sql._
// Import MLlib recommendation classes
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
// define the schemas using case classes
// input format ProductID::Name::Category (only the ID and name are kept)
case class Product(productId: Int, name: String)
  • 20. Products Recommendations – Implementation
// input format UserID::Gender::Age::Occupation::Zip-code
case class User(userId: Int, gender: String, age: Int, occupation: Int, zip: String)
// function to parse input into Product class
def parseProduct(str: String): Product = {
  val fields = str.split("::")
  assert(fields.size == 3)
  Product(fields(0).toInt, fields(1))
}
// function to parse input into User class
def parseUser(str: String): User = {
  val fields = str.split("::")
  assert(fields.size == 5)
  User(fields(0).toInt, fields(1), fields(2).toInt, fields(3).toInt, fields(4))
}
// function to parse input UserID::ProductID::Rating::Timestamp (the timestamp is ignored)
// and pass it into the constructor of org.apache.spark.mllib.recommendation.Rating
def parseRating(str: String): Rating = {
  val fields = str.split("::")
  Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
}
  • 21. Products Recommendations – Implementation
// load the ratings data into an RDD
val ratingText = sc.textFile("/user/hadoop/data/ratings.dat")
val ratingsRDD = ratingText.map(parseRating).cache()
// count the total number of ratings
val numRatings = ratingsRDD.count()
// count the number of users who rated a product
val numUsers = ratingsRDD.map(_.user).distinct().count()
// count the number of products rated
val numProducts = ratingsRDD.map(_.product).distinct().count()
println(s"Got $numRatings ratings from $numUsers users on $numProducts products.")
// load the product and user data into DataFrames
val productsDF = sc.textFile("/user/hadoop/data/products.dat").map(parseProduct).toDF()
val usersDF = sc.textFile("/user/hadoop/data/users.dat").map(parseUser).toDF()
// create a DataFrame from ratingsRDD
val ratingsDF = ratingsRDD.toDF()
// register the DataFrames as temporary tables so they can be queried with SQL
ratingsDF.registerTempTable("ratings")
productsDF.registerTempTable("products")
usersDF.registerTempTable("users")
  • 22. Products Recommendations – Implementation
// aggregate functions (min, avg, max) used below
import org.apache.spark.sql.functions._
ratingsDF.select("product").distinct.count
//res7: Long = 3706
ratingsDF.groupBy("product", "rating").count.show
ratingsDF.groupBy("product").count.agg(min("count"), avg("count"), max("count")).show
ratingsDF.select("product", "rating").groupBy("product", "rating").count.agg(min("count"), avg("count"), max("count")).show
// Compute the max and min ratings along with the number of users who have rated each product.
// Display the name, max rating, min rating and number of users.
val results = sqlContext.sql("""
  SELECT products.name, productrates.maxr, productrates.minr, productrates.cntu
  FROM (SELECT ratings.product, max(ratings.rating) AS maxr, min(ratings.rating) AS minr,
               count(DISTINCT user) AS cntu
        FROM ratings GROUP BY ratings.product) productrates
  JOIN products ON productrates.product = products.productId
  ORDER BY productrates.cntu DESC""")
// DataFrame show() displays the top 20 rows in tabular form
results.show()
// Show the top 10 most active users and how many times they rated a product
val mostActiveUsersSchemaRDD = sqlContext.sql("SELECT ratings.user, count(*) AS ct FROM ratings GROUP BY ratings.user ORDER BY ct DESC LIMIT 10")
mostActiveUsersSchemaRDD.take(10).foreach(println)
// Find the products that user 4169 rated higher than 4
val results = sqlContext.sql("""
  SELECT ratings.user, ratings.product, ratings.rating, products.name
  FROM ratings JOIN products ON products.productId = ratings.product
  WHERE ratings.user = 4169 AND ratings.rating > 4
  ORDER BY ratings.rating DESC""")
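To make the SQL aggregations concrete, here is a minimal sketch of the "most active users" and per-product statistics queries on a small in-memory collection using plain Scala `groupBy`. The `R` records are made up; this is an illustration of the query semantics, not how Spark executes them.

```scala
// Hypothetical (user, product, rating) records standing in for the ratings table
case class R(user: Int, product: Int, rating: Double)
val rs = Seq(R(1, 10, 5.0), R(1, 11, 3.0), R(2, 10, 4.0),
             R(3, 10, 2.0), R(3, 12, 5.0), R(1, 12, 4.0))

// Like: SELECT user, count(*) AS ct FROM ratings GROUP BY user ORDER BY ct DESC LIMIT 10
val mostActive = rs.groupBy(_.user).toSeq
  .map { case (user, rows) => (user, rows.size) }
  .sortBy { case (_, ct) => -ct }
  .take(10)

// Like the per-product query: (product, max rating, min rating, distinct users), most-rated first
val productStats = rs.groupBy(_.product).toSeq
  .map { case (product, rows) =>
    (product, rows.map(_.rating).max, rows.map(_.rating).min, rows.map(_.user).distinct.size)
  }
  .sortBy { case (_, _, _, users) => -users }

mostActive.foreach(println)
productStats.foreach(println)
```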
  • 24. Products Recommendations – Implementation
results.show
We run ALS on the input trainingRatingsRDD of Rating (user, product, rating) objects with the rank and iterations parameters:
- Rank is the number of latent factors in the model.
- Iterations is the number of iterations to run.
The ALS run(trainingRatingsRDD) method builds and returns a MatrixFactorizationModel, which can be used to make product predictions for users.
// Randomly split the ratings RDD into training data (80%) and test data (20%), with seed 0
val splits = ratingsRDD.randomSplit(Array(0.8, 0.2), 0L)
val trainingRatingsRDD = splits(0).cache()
val testRatingsRDD = splits(1).cache()
val numTraining = trainingRatingsRDD.count()
val numTest = testRatingsRDD.count()
println(s"Training: $numTraining, test: $numTest.")
// Build the recommendation model using ALS with rank = 20, iterations = 10
val model = ALS.train(trainingRatingsRDD, 20, 10)
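To make rank and iterations concrete, here is a minimal single-machine sketch of the alternating least squares idea in plain Scala: with the product factors held fixed, each user's factor vector (of length `rank`) is the exact solution of a small regularised least-squares problem, and then the roles swap. It assumes dense 0-based IDs, a plain `lambda` penalty and random initial product factors. Spark's implementation additionally scales the regularisation by each user's or product's rating count and distributes the work in blocks across the cluster, so treat this as an illustration of the technique, not of MLlib's internals. All names and data are hypothetical.

```scala
import scala.util.Random

// One observed cell of the user-product matrix (IDs assumed dense and 0-based)
case class Obs(user: Int, product: Int, rating: Double)

def dot(x: Array[Double], y: Array[Double]): Double =
  x.zip(y).map { case (p, q) => p * q }.sum

// Solve the small k x k system a * x = b (Gaussian elimination, partial pivoting)
def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
  val n = b.length
  val m = Array.tabulate(n)(i => a(i) :+ b(i)) // augmented matrix [a | b]
  for (col <- 0 until n) {
    val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
    val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp
    for (r <- col + 1 until n) {
      val f = m(r)(col) / m(col)(col)
      for (c <- col to n) m(r)(c) -= f * m(col)(c)
    }
  }
  val x = new Array[Double](n)
  for (r <- n - 1 to 0 by -1)
    x(r) = (m(r)(n) - (r + 1 until n).map(c => m(r)(c) * x(c)).sum) / m(r)(r)
  x
}

// Minimise sum over observed (r - u.v)^2 + lambda * (sum |u|^2 + sum |v|^2)
def als(obs: Seq[Obs], numUsers: Int, numProducts: Int, rank: Int, iterations: Int,
        lambda: Double, seed: Long): (Array[Array[Double]], Array[Array[Double]]) = {
  val rng = new Random(seed)
  val userF = Array.fill(numUsers, rank)(0.0)
  val prodF = Array.fill(numProducts, rank)(rng.nextDouble())
  val byUser = obs.groupBy(_.user)
  val byProduct = obs.groupBy(_.product)

  // Normal equations (sum v v^T + lambda I) x = sum r v for one vector, other side fixed
  def update(rows: Seq[(Array[Double], Double)]): Array[Double] = {
    val a = Array.tabulate(rank, rank)((i, j) => if (i == j) lambda else 0.0)
    val b = new Array[Double](rank)
    for ((v, r) <- rows; i <- 0 until rank) {
      b(i) += r * v(i)
      for (j <- 0 until rank) a(i)(j) += v(i) * v(j)
    }
    solve(a, b)
  }

  for (_ <- 1 to iterations) {
    for (u <- 0 until numUsers)
      userF(u) = update(byUser.getOrElse(u, Seq.empty[Obs]).map(o => (prodF(o.product), o.rating)))
    for (p <- 0 until numProducts)
      prodF(p) = update(byProduct.getOrElse(p, Seq.empty[Obs]).map(o => (userF(o.user), o.rating)))
  }
  (userF, prodF)
}

// The regularised squared error that ALS minimises
def objective(obs: Seq[Obs], userF: Array[Array[Double]], prodF: Array[Array[Double]],
              lambda: Double): Double = {
  val sqErr = obs.map(o => math.pow(o.rating - dot(userF(o.user), prodF(o.product)), 2)).sum
  sqErr + lambda * (userF ++ prodF).map(v => dot(v, v)).sum
}

val obs = Seq(
  Obs(0, 0, 5.0), Obs(0, 1, 4.0), Obs(0, 3, 1.0), Obs(1, 0, 4.0), Obs(1, 1, 5.0), Obs(1, 2, 1.0),
  Obs(2, 1, 1.0), Obs(2, 2, 5.0), Obs(2, 3, 4.0), Obs(3, 0, 1.0), Obs(3, 2, 4.0), Obs(3, 3, 5.0))
val (u1, p1) = als(obs, numUsers = 4, numProducts = 4, rank = 2, iterations = 1, lambda = 0.1, seed = 42L)
val (u10, p10) = als(obs, numUsers = 4, numProducts = 4, rank = 2, iterations = 10, lambda = 0.1, seed = 42L)
val obj1 = objective(obs, u1, p1, 0.1)
val obj10 = objective(obs, u10, p10, 0.1)
val rmse10 = math.sqrt(obs.map(o => math.pow(o.rating - dot(u10(o.user), p10(o.product)), 2)).sum / obs.size)
println(f"objective after 1 iteration: $obj1%.4f, after 10: $obj10%.4f, training RMSE: $rmse10%.4f")
println(f"predicted rating of user 0 for unrated product 2: ${dot(u10(0), p10(2))}%.2f")
```

Because every half-step solves its least-squares problem exactly, the objective cannot increase from one iteration to the next; more iterations trade compute time for a closer fit, and a larger rank gives each vector more capacity.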
  • 26. Products Recommendations – Implementation
// alternatively, build the model with the builder API
val model = new ALS().setRank(20).setIterations(10).run(trainingRatingsRDD)
Making Predictions with the MatrixFactorizationModel
Now we can use the MatrixFactorizationModel to make predictions. First we get product predictions for the most active user, 4169, with the recommendProducts() method, which takes as input the user ID and the number of products to recommend. Then we print out the recommended product names.
// Make product predictions for user 4169
val topRecsForUser = model.recommendProducts(4169, 10)
// get product names to show with the recommendations (productId -> name)
val productNames = productsDF.rdd.map(row => (row.getInt(0), row.getString(1))).collectAsMap()
// print out the top recommendations for user 4169 with product names
topRecsForUser.map(rating => (productNames(rating.product), rating.rating)).foreach(println)
Evaluating the Model
Next we compare predictions from the model with actual ratings in testRatingsRDD. First we get the (user, product) pairs from testRatingsRDD and pass them to the MatrixFactorizationModel predict method (the overload that takes an RDD of (user, product) pairs), which returns predictions as Rating (user, product, rating) objects.
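`recommendProducts(user, num)` ranks products by predicted score. A minimal plain-Scala sketch of that ranking step, assuming the factors are already learned (the values below are made up). Excluding products the user has already rated is an extra choice made in this sketch, not something the deck's code does.

```scala
// Hypothetical learned factors: userId -> vector, productId -> vector
val userFactors = Map(4169 -> Array(1.2, 0.3))
val productFactors = Map(
  1 -> Array(3.0, 1.0), 2 -> Array(0.5, 4.0), 3 -> Array(2.5, 2.5), 4 -> Array(3.5, 0.2))
// Products the user has already rated (hypothetical)
val alreadyRated = Set(1)

// Score every candidate product by dot product and keep the top `num`
def recommend(user: Int, num: Int, exclude: Set[Int]): Seq[(Int, Double)] = {
  val u = userFactors(user)
  productFactors.toSeq
    .filterNot { case (p, _) => exclude.contains(p) }
    .map { case (p, v) => (p, u.zip(v).map { case (a, b) => a * b }.sum) }
    .sortBy { case (_, score) => -score }
    .take(num)
}

recommend(4169, 2, alreadyRated).foreach(println)
```

Scoring every product is fine for a small catalogue; at scale the same ranking is computed in a distributed way, which is what the MLlib method handles for you.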
  • 27. Products Recommendations – Implementation
// get predicted ratings to compare to test ratings
val predictionsForTestRDD = model.predict(testRatingsRDD.map { case Rating(user, product, rating) => (user, product) })
predictionsForTestRDD.take(10).foreach(println)
Now we compare the test predictions to the actual test ratings. First we put the predictions and the test RDDs in this key-value pair format for joining: ((user, product), rating). Then we print out ((user, product), (test rating, predicted rating)) for comparison.
// prepare the predictions for comparison
val predictionsKeyedByUserProductRDD = predictionsForTestRDD.map { case Rating(user, product, rating) => ((user, product), rating) }
// prepare the test ratings for comparison
val testKeyedByUserProductRDD = testRatingsRDD.map { case Rating(user, product, rating) => ((user, product), rating) }
// join the test ratings with the predictions on the (user, product) key
val testAndPredictionsJoinedRDD = testKeyedByUserProductRDD.join(predictionsKeyedByUserProductRDD)
testAndPredictionsJoinedRDD.take(10).foreach(println)
  • 28. Products Recommendations – Implementation
The example below finds false positives: predicted ratings that were >= 4 when the actual test rating was <= 1.
val falsePositives = testAndPredictionsJoinedRDD.filter {
  case ((user, product), (ratingT, ratingP)) => ratingT <= 1 && ratingP >= 4
}
falsePositives.take(2)
falsePositives.count
Next we evaluate the model using Mean Absolute Error (MAE): the mean of the absolute differences between the predicted and actual ratings.
// Evaluate the model using Mean Absolute Error (MAE) between test ratings and predictions
val meanAbsoluteError = testAndPredictionsJoinedRDD.map {
  case ((user, product), (testRating, predRating)) => math.abs(testRating - predRating)
}.mean()
println(s"Mean Absolute Error = $meanAbsoluteError")
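To make the metrics concrete, a minimal plain-Scala sketch on a handful of made-up (user, product) pairs: an inner join on the key, MAE, the same false-positive filter, and root mean squared error (RMSE). RMSE is an addition here, not something the deck computes; it is a common companion metric that penalises large misses more heavily than MAE.

```scala
// Hypothetical test ratings and predictions, keyed by (user, product)
val testRatings = Map((1, 10) -> 4.0, (1, 11) -> 1.0, (2, 10) -> 5.0, (3, 12) -> 3.0)
val predictions = Map((1, 10) -> 3.5, (1, 11) -> 4.2, (2, 10) -> 4.5, (3, 12) -> 3.0)

// Inner join on the (user, product) key: ((user, product), (testRating, predictedRating))
val joined = testRatings.toSeq.collect {
  case (key, t) if predictions.contains(key) => (key, (t, predictions(key)))
}

val errors = joined.map { case (_, (t, p)) => t - p }
val mae  = errors.map(e => math.abs(e)).sum / errors.size  // mean absolute error
val rmse = math.sqrt(errors.map(e => e * e).sum / errors.size) // root mean squared error
// Same rule as the deck: actual rating <= 1 but predicted >= 4
val falsePositives = joined.filter { case (_, (t, p)) => t <= 1 && p >= 4 }

println(f"MAE = $mae%.3f, RMSE = $rmse%.3f, false positives = ${falsePositives.size}")
```

The single bad miss on (1, 11) raises RMSE well above MAE, which is why the two are often reported together.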
  • 30-34. Products Recommendations – Implementation: Amazon Web Services: EMR Cluster Monitoring
  • 35. Products Recommendations – Implementation: CLOSING THOUGHTS
- The goal of a collaborative filtering algorithm is to take preference data from users and create a model that can be used for recommendations or predictions.
- Collaborative filtering algorithms recommend items based on preference information from many users. The approach is based on similarity: people who liked similar items in the past are likely to like similar items in the future.
- Machine learning algorithms are pretty complicated.
- Apache Spark's MLlib has built-in modules for classification, regression, clustering, recommendation and other algorithms. Under the hood, the library takes care of running these algorithms across a cluster. This abstracts the programmer away from both implementing the ML algorithm and the intricacies of running it across a cluster.
- Latent factor analysis and ALS are pretty magical: we just need a good dataset of user-product ratings.