Retail Products Recommendations Using
Machine Learning
Products Recommendations
Business Case – Why?
When you use YouTube, Netflix or other online media services, you may have
noticed “recommendation for you” lists of videos, movies or music. As
consumers, we like a personalized list because it gives easy access to products
and services and saves time. As we watch more videos, those recommendations
become more accurate and more relevant. A more satisfied user is a
winning factor for a company.
Big data makes this possible through its scalability and its
power to process huge volumes of structured or unstructured data. With big
data tools, developers can analyze billions of a company's products and process
them with the help of machine learning to provide ever more targeted
recommendations for the user.
Products Recommendations - Business Case
Customers of these brands would be delighted with the huge variety of products to choose from,
but they often find it difficult to sift through that variety and identify things they would like.
Recommendations help users:
- Navigate the maze of the product catalogues
- Find what they are looking for
- Find products they might like but didn't know of
Recommendations help these brands solve the problem of discovery.
How? Using data:
- What users bought
- What users browsed
- What users rated
A recommendation engine turns this data into suggestions such as "Top picks for you!" and "If you like this, you'll love that".
Recommendation Engine Objective: Filter Relevant Products
Tasks performed by recommendation engines:
- Predict what rating the user would give a product
- Predict whether a user would buy a product
- Rank products based on their relevance to the user
Products Recommendations - How?
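Most recommendation engines use a technique called
collaborative filtering, here in its latent factor form.
How does that work?
The basic premise is that
if two users have the same opinion
about a bunch of products,
they are likely to have the same
opinion about other products too.
Collaborative filtering represents users by their ratings for different products.
Collaborative filtering algorithms normally predict
users' ratings for products they haven't yet rated.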
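The premise above can be made concrete with a small similarity calculation. The following is a minimal sketch in plain Scala, not the latent factor method used later in this deck: the user names, products and ratings are all hypothetical, and cosine similarity over co-rated products is just one common way to measure "same opinion".

```scala
object UserSimilarityDemo {
  // Hypothetical ratings: user -> (product -> rating on a 1-5 scale)
  val ratings: Map[String, Map[String, Double]] = Map(
    "alice" -> Map("jeans" -> 5.0, "shirt" -> 4.0, "sweater" -> 1.0),
    "bob"   -> Map("jeans" -> 5.0, "shirt" -> 4.0, "jacket" -> 5.0),
    "carol" -> Map("jeans" -> 1.0, "shirt" -> 2.0, "sweater" -> 5.0)
  )

  // Cosine similarity computed only over the products both users have rated
  def similarity(a: Map[String, Double], b: Map[String, Double]): Double = {
    val common = a.keySet.intersect(b.keySet).toSeq
    if (common.isEmpty) 0.0
    else {
      val dot = common.map(p => a(p) * b(p)).sum
      val normA = math.sqrt(common.map(p => a(p) * a(p)).sum)
      val normB = math.sqrt(common.map(p => b(p) * b(p)).sum)
      dot / (normA * normB)
    }
  }
}
```

Here alice and bob agree on everything they have both rated, so bob's high rating for the jacket is a plausible recommendation for alice; carol's tastes point the other way.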
Products are represented using
these descriptors: sweaters, jeans, shirts, outerwear.
Users are represented using the same descriptors.
Joe likes lightweight skinny-fit jeans and
linen-cotton short-sleeve standard-fit shirts.
9, 7
10, 9
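One way to read the descriptor idea: if a user and a product are both vectors over the same descriptors, their dot product gives an affinity score. The sketch below is illustrative only; the descriptor weights for Joe and for the two products are made-up values, not taken from the slides.

```scala
object LatentFactorDemo {
  // Descriptor order: jeans, shirts, sweaters, outerwear (all weights are hypothetical)
  val joe: Vector[Double]         = Vector(0.9, 0.7, 0.1, 0.2)
  val skinnyJeans: Vector[Double] = Vector(1.0, 0.0, 0.0, 0.0)
  val parka: Vector[Double]       = Vector(0.0, 0.0, 0.1, 0.9)

  // Affinity is the dot product of the user vector and the product vector
  def affinity(user: Vector[Double], product: Vector[Double]): Double = {
    require(user.length == product.length, "vectors must share the same descriptors")
    user.zip(product).map { case (u, p) => u * p }.sum
  }
}
```

With these numbers Joe scores higher on the jeans than on the parka, which matches the stated preference. In a latent factor model the descriptors are not hand-picked like this; they are learned from the ratings.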
User-Product-Rating Matrix
(rows are users User1 ... User n, columns are products Prod1 ... Prod n; "-" means the user has not rated the product)
4 - 4 - - - - - - -
- 3 4 - - - - - - -
5 3 2 - - - - 3 - -
2 - 3 - - - - - - -
5 - - - 4 2 3 - - -
- - - - - 4 - - - -
- - - 4 - 5 - - - -
- 3 3 - - - - - 4 -
Each known entry is a (User ID, Product ID, Rating) triple.
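Because most cells are empty, such a matrix is usually stored as a list of triples rather than a full grid. A minimal sketch of that conversion, using the first three rows of the matrix above and assuming they correspond to users 1 to 3 and that columns are numbered from 1:

```scala
object RatingMatrixDemo {
  // First three rows of the matrix; "-" marks a missing rating
  val rows: Seq[String] = Seq(
    "4 - 4 - - - - - - -",
    "- 3 4 - - - - - - -",
    "5 3 2 - - - - 3 - -"
  )

  // Convert the grid into (userId, productId, rating) triples, skipping empty cells
  def toTriples(rows: Seq[String]): Seq[(Int, Int, Int)] =
    for {
      (row, u)  <- rows.zipWithIndex
      (cell, p) <- row.split(" ").toSeq.zipWithIndex
      if cell != "-"
    } yield (u + 1, p + 1, cell.toInt)
}
```

Thirty cells collapse into eight triples here; on a real catalogue the saving is far larger.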
We will use Spark's MLlib ALS algorithm to learn the latent factors that can be used to predict missing entries in the user-product association matrix.
First we separate the ratings data into training data (80%) and test data (20%). We will train the model on the training
data, then evaluate its predictions against the test data. Taking a subset of the data to build the model
and verifying the model with the remaining data is the hold-out form of cross validation; the goal is to estimate how accurately a
predictive model will perform in practice.
To get a more reliable estimate, this process is often repeated with different subsets; we will only do it once.
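For readers curious what "multiple times with different subsets" looks like, here is a minimal k-fold sketch in plain Scala. It is not part of the Spark walkthrough that follows; the helper name kFold is hypothetical, and a real run would shuffle the data before assigning folds.

```scala
object KFoldDemo {
  // Split items into k folds by index; each fold serves exactly once as the test set.
  // Returns a sequence of (training items, test items) pairs.
  def kFold[A](items: Seq[A], k: Int): Seq[(Seq[A], Seq[A])] = {
    require(k >= 2 && k <= items.size, "k must be between 2 and the number of items")
    val indexed = items.zipWithIndex
    (0 until k).map { fold =>
      val (test, train) = indexed.partition { case (_, i) => i % k == fold }
      (train.map(_._1), test.map(_._1))
    }
  }
}
```

With k = 5 each round trains on 80% of the data and tests on the remaining 20%, the same proportions as the single split used in this deck.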
Products Recommendations – Implementation
Using ALTERNATING LEAST SQUARES (ALS) to Build a Matrix Factorization Model
The Sample Data Sets
All ratings are contained in the file "ratings.dat" and are in the following format:
UserID::ProductID::Rating::Timestamp
1::1193::5::978300760
- UserIDs range between 1 and 6040
- ProductIDs range between 1 and 3952
- Ratings are made on a 5-star scale
- Timestamp is represented in seconds since the epoch
User information is in the file "users.dat" and is in the following format:
UserID::Gender::Age::Occupation::Zip-code
1::F::1::10::48067
20::M::25::14::55113
- Gender is denoted by an "M" for male and "F" for female
- Age is chosen from the following ranges:
* 1: "Under 18"
* 18: "18-24"
* 25: "25-34"
* 35: "35-44"
* 45: "45-49"
* 50: "50-55"
* 56: "56+"
- Occupation is chosen from the following choices:
* 0: "other" or not specified
* 1: "academic/educator"
* 2: "artist"
* 3: "clerical/admin"
* 4: "college/grad student"
* 5: "customer service"
* 6: "doctor/health care"
* 7: "executive/managerial"
* 8: "farmer"
* 9: "homemaker"
* 10: "K-12 student"
* 11: "lawyer"
* 12: "programmer"
* 13: "retired"
* 14: "sales/marketing"
* 15: "scientist"
* 16: "self-employed"
* 17: "technician/engineer"
* 18: "tradesman/craftsman"
* 19: "unemployed"
* 20: "writer"
Product information is in the file "products.dat" and is in the following format:
ProductID::Name::Category
1::Product1::Pants|Baby|Stripe
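As a quick illustration of the ratings layout, the sketch below parses one "::"-separated line of the form UserID::ProductID::Rating::Timestamp in plain Scala and converts the epoch seconds into an Instant. The names TimedRating and parse are hypothetical; the Spark code later in this deck uses MLlib's own Rating class and ignores the timestamp.

```scala
import java.time.Instant
import scala.util.Try

object RatingLineDemo {
  // Hypothetical record type that also keeps the timestamp
  final case class TimedRating(userId: Int, productId: Int, rating: Double, ratedAt: Instant)

  // Returns None for lines that do not have exactly four well-formed fields
  def parse(line: String): Option[TimedRating] =
    line.split("::") match {
      case Array(u, p, r, ts) =>
        Try(TimedRating(u.toInt, p.toInt, r.toDouble, Instant.ofEpochSecond(ts.toLong))).toOption
      case _ => None
    }
}
```

Returning an Option instead of asserting keeps one malformed line from aborting a whole load; whether that is preferable to failing fast depends on how clean the input is expected to be.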
The Infrastructure – Amazon Web Services: EMR
Load Data into Spark DataFrames
First we will import some packages and instantiate a sqlContext, which is the entry point for working with structured data
(rows and columns) in Spark and allows the creation of DataFrame objects.
// SQLContext is the entry point for working with structured data
// (sc is the SparkContext that spark-shell provides)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// This is used to implicitly convert an RDD to a DataFrame
import sqlContext.implicits._
// Import Spark SQL data types
import org.apache.spark.sql._
// Import the aggregate functions (min, avg, max) used in the DataFrame queries
import org.apache.spark.sql.functions._
// Import MLlib data types
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
// Define the schemas using case classes
// Input format: ProductID::Name::Category
case class Product(productId: Int, name: String)
// Input format: UserID::Gender::Age::Occupation::Zip-code
case class User(userId: Int, gender: String, age: Int, occupation: Int, zip: String)
// Function to parse input into the Product class
def parseProduct(str: String): Product = {
  val fields = str.split("::")
  assert(fields.size == 3)
  Product(fields(0).toInt, fields(1))
}
// Function to parse input into the User class
def parseUser(str: String): User = {
  val fields = str.split("::")
  assert(fields.size == 5)
  User(fields(0).toInt, fields(1), fields(2).toInt, fields(3).toInt, fields(4))
}
// Function to parse input UserID::ProductID::Rating
// and pass into constructor for org.apache.spark.mllib.recommendation.Rating class
def parseRating(str: String): Rating = {
  val fields = str.split("::")
  Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
}
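// Load the ratings data into an RDD
val ratingText = sc.textFile("/user/hadoop/data/ratings.dat")
// Parse each line into a Rating and cache the result, since it is reused several times
val ratingsRDD = ratingText.map(parseRating).cache()
// Count the total number of ratings
val numRatings = ratingsRDD.count()
// Count the number of users who rated a product
val numUsers = ratingsRDD.map(_.user).distinct().count()
// Count the number of products rated
val numProducts = ratingsRDD.map(_.product).distinct().count()
println(s"Got $numRatings ratings from $numUsers users on $numProducts products.")
// Load the data into DataFrames
val productsDF = sc.textFile("/user/hadoop/data/products.dat").map(parseProduct).toDF()
val usersDF = sc.textFile("/user/hadoop/data/users.dat").map(parseUser).toDF()
// Create a DataFrame from ratingsRDD
val ratingsDF = ratingsRDD.toDF()
// Register the DataFrames as temporary tables so they can be queried with SQL
ratingsDF.registerTempTable("ratings")
productsDF.registerTempTable("products")
usersDF.registerTempTable("users")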
// Count the distinct products that have been rated
ratingsDF.select("product").distinct.count // res7: Long = 3706
// Count how often each (product, rating) combination occurs
ratingsDF.groupBy("product", "rating").count.show
// Min, average and max number of ratings per product
ratingsDF.groupBy("product").count.agg(min("count"), avg("count"), max("count")).show
// Min, average and max count per (product, rating) combination
ratingsDF.select("product", "rating").groupBy("product", "rating").count.agg(min("count"), avg("count"), max("count")).show
// Count the max, min ratings along with the number of users who have rated a product.
// Display the name, max rating, min rating, number of users.
val results = sqlContext.sql("""select products.name, productrates.maxr, productrates.minr, productrates.cntu from (SELECT
  ratings.product, max(ratings.rating) as maxr, min(ratings.rating) as minr, count(distinct user) as cntu FROM ratings group by
  ratings.product) productrates join products on productrates.product=products.productId order by productrates.cntu desc""")
// DataFrame show() displays the top 20 rows in tabular form
results.show()
// Show the top 10 most-active users and how many times they rated a product
val mostActiveUsersSchemaRDD = sqlContext.sql("""SELECT ratings.user, count(*) as ct from ratings group by ratings.user order by
  ct desc limit 10""")
mostActiveUsersSchemaRDD.take(20).foreach(println)
// Find the products that user 4169 rated higher than 4
// (spark-shell allows results to be redefined here)
val results = sqlContext.sql("""SELECT ratings.user, ratings.product, ratings.rating, products.name FROM ratings JOIN products ON
  products.productId=ratings.product where ratings.user=4169 and ratings.rating > 4 order by ratings.rating desc""")
results.show()
We run ALS on the input trainingRatingsRDD of Rating(user, product, rating) objects with the rank and iterations parameters:
• Rank is the number of latent factors in the model.
• Iterations is the number of iterations to run.
ALS.train(), or equivalently the ALS run(trainingRatingsRDD) method, builds and returns a MatrixFactorizationModel, which can be used to make product
predictions for users.
// Randomly split ratings RDD into training data RDD (80%) and test data RDD (20%)
val splits = ratingsRDD.randomSplit(Array(0.8, 0.2), 0L)
val trainingRatingsRDD = splits(0).cache()
val testRatingsRDD = splits(1).cache()
val numTraining = trainingRatingsRDD.count()
val numTest = testRatingsRDD.count()
println(s"Training: $numTraining, test: $numTest.")
// Build the recommendation model using ALS with rank=20, iterations=10
val model = ALS.train(trainingRatingsRDD, 20, 10)
// Same configuration through the builder-style API (use one form or the other)
val model = (new ALS().setRank(20).setIterations(10).run(trainingRatingsRDD))
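To give a feel for what "alternating" means, here is a deliberately tiny sketch of ALS in plain Scala with rank 1, so each update has a closed form and no linear solver is needed. It is an illustration under stated assumptions, not how MLlib implements ALS: the ratings, the regularization value lambda and all function names are hypothetical.

```scala
object RankOneAlsDemo {
  // Observed (user, product, rating) triples with zero-based, hypothetical indices
  val ratings: Seq[(Int, Int, Double)] = Seq(
    (0, 0, 4.0), (0, 2, 4.0), (1, 1, 3.0), (1, 2, 4.0),
    (2, 0, 5.0), (2, 1, 3.0), (2, 2, 2.0)
  )
  val numUsers = 3
  val numProducts = 3
  val lambda = 0.1 // regularization strength (assumed value)

  // Regularized squared error over the observed ratings only
  def loss(p: Vector[Double], q: Vector[Double]): Double =
    ratings.map { case (u, i, r) => math.pow(r - p(u) * q(i), 2) }.sum +
      lambda * (p.map(x => x * x).sum + q.map(x => x * x).sum)

  // Least-squares update for one side while the other side is held fixed:
  // factor(k) = sum(r * fixed) / (sum(fixed^2) + lambda) over the ratings that involve k
  def solve(n: Int, fixed: Vector[Double],
            own: ((Int, Int, Double)) => Int,
            other: ((Int, Int, Double)) => Int): Vector[Double] =
    Vector.tabulate(n) { k =>
      val mine = ratings.filter(t => own(t) == k)
      val num = mine.map(t => t._3 * fixed(other(t))).sum
      val den = mine.map(t => fixed(other(t)) * fixed(other(t))).sum + lambda
      num / den
    }

  // Alternate between solving for user factors and product factors
  def train(iterations: Int): (Vector[Double], Vector[Double]) = {
    var p = Vector.fill(numUsers)(1.0)
    var q = Vector.fill(numProducts)(1.0)
    for (_ <- 1 to iterations) {
      p = solve(numUsers, q, _._1, _._2)    // fix products, solve for users
      q = solve(numProducts, p, _._2, _._1) // fix users, solve for products
    }
    (p, q)
  }

  // Predicted rating is the product of the two rank-1 factors
  def predict(p: Vector[Double], q: Vector[Double], user: Int, product: Int): Double =
    p(user) * q(product)
}
```

Each half-step minimizes the loss exactly for one side, so the loss never increases from one iteration to the next. A higher rank replaces each scalar factor with a vector and each division with a small linear solve.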
Making Predictions with the MatrixFactorizationModel
Now we can use the MatrixFactorizationModel to make predictions. First we will get product predictions for the most active
user, 4169, with the recommendProducts() method, which takes as input the user ID and the number of products to
recommend. Then we print out the recommended product names.
// Make product predictions for user 4169
val topRecsForUser = model.recommendProducts(4169, 10)
// Get product names to show with recommendations
val productNames = productsDF.rdd.map(row => (row.getInt(0), row.getString(1))).collectAsMap()
// Print out top recommendations for user 4169 with product names
topRecsForUser.map(rating => (productNames(rating.product), rating.rating)).foreach(println)
Evaluating the Model
Next we will compare predictions from the model with actual ratings in the testRatingsRDD. First we get the (user, product) pairs
from the testRatingsRDD and pass them as an RDD to the MatrixFactorizationModel predict() method, which will return
predictions as Rating(user, product, rating) objects.
// Get the (user, product) pairs from the test ratings
val testUserProductRDD = testRatingsRDD.map { case Rating(user, product, rating) => (user, product) }
// Get predicted ratings to compare to test ratings
val predictionsForTestRDD = model.predict(testUserProductRDD)
predictionsForTestRDD.take(10).mkString("\n")
Now we will compare the test predictions to the actual test ratings. First we put the predictions and the test RDDs in this
key-value pair format for joining: ((user, product), rating). Then we print out the (user, product), (test rating, predicted rating) for
comparison.
// Prepare the predictions for comparison
val predictionsKeyedByUserProductRDD = predictionsForTestRDD.map {
  case Rating(user, product, rating) => ((user, product), rating)
}
// Prepare the test ratings for comparison
val testKeyedByUserProductRDD = testRatingsRDD.map {
  case Rating(user, product, rating) => ((user, product), rating)
}
// Join the test ratings with the predictions
val testAndPredictionsJoinedRDD = testKeyedByUserProductRDD.join(predictionsKeyedByUserProductRDD)
testAndPredictionsJoinedRDD.take(10).mkString("\n")
The example below finds false positives by finding predicted ratings which were >= 4 when the actual test rating was <= 1.
val falsePositives = testAndPredictionsJoinedRDD.filter {
  case ((user, product), (ratingT, ratingP)) => ratingT <= 1 && ratingP >= 4
}
falsePositives.take(2)
falsePositives.count
Next we evaluate the model using Mean Absolute Error (MAE). MAE is the mean of the absolute differences between the predicted and
actual ratings.
// Evaluate the model using Mean Absolute Error (MAE) between test and predictions
val meanAbsoluteError = testAndPredictionsJoinedRDD.map {
  case ((user, product), (testRating, predRating)) =>
    val err = (testRating - predRating)
    Math.abs(err)
}.mean()
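The same calculation on a handful of numbers may make the metric easier to read. This is a plain Scala sketch with hypothetical (actual, predicted) pairs, separate from the Spark job above.

```scala
object MaeDemo {
  // (actual rating, predicted rating) pairs; the values are hypothetical
  val pairs: Seq[(Double, Double)] = Seq((5.0, 4.5), (3.0, 3.5), (1.0, 2.0), (4.0, 4.0))

  // Mean of the absolute differences between actual and predicted ratings
  def meanAbsoluteError(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    pairs.map { case (actual, predicted) => math.abs(actual - predicted) }.sum / pairs.size
  }
}
```

For these four pairs the absolute errors are 0.5, 0.5, 1.0 and 0.0, so the MAE is 0.5: on average a prediction is half a star away from the true rating.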
Amazon Web Services: EMR Cluster Monitoring
Closing Thoughts
• The goal of a collaborative filtering algorithm is to take preference data from users and create a model that can be used
for recommendations or predictions.
• Collaborative filtering algorithms recommend items based on preference information from many users. The collaborative
filtering approach is based on similarity: people who liked similar items in the past are likely to like similar items in the future.
• Machine learning algorithms are pretty complicated.
• Apache Spark's MLlib has built-in modules for classification, regression, clustering, recommendation and other algorithms. Under the
hood the library takes care of running these algorithms across a cluster. This shields the programmer from both
implementing the ML algorithm and the intricacies of running it across a cluster.
• Latent factor analysis and ALS are pretty magical. We just need to have a good dataset with user-product ratings.
Thank You