Recommender Systems with Apache Spark's ALS Function

A quick visual guide to recommender systems (user-based, item-based, and matrix factorization) and the code behind building an Apache Spark MatrixFactorizationModel with the ALS function.

  1. Building a Recommender System in PySpark
  2. Will Johnson
      - Uline
      - DePaul
      LearnByMarketing.com
  3. AGENDA
      - RecSys
        * Basics
        * MF
        * Evaluation
        * Advanced
      - PySpark
        * Basics
        * ALS
  4. User Based Collaborative Filtering (figure: ratings grid showing 4.5, 4.0, 5.0, 4.5, 3.0, 4.0, 2.0, 1.0, 2.0, 1.5, 4.5)
  5. User Based Collaborative Filtering (figure: the same ratings grid with one additional value, 3.8)
  6. Item Based Collaborative Filtering
  7. Item Based Collaborative Filtering
  8. Matrix Factorization
  9. Matrix Factorization
  10. Evaluation
      $\mathrm{RMSE} = \sqrt{\frac{\sum (\mathrm{Predicted} - \mathrm{Actual})^2}{n}}$
      $\mathrm{Precision} = \frac{|hits_u|}{|RecoSet_u|}$
      $\mathrm{Recall} = \frac{|hits_u|}{|TestSet_u|}$
      Expert Review: Novelty, Context
  11. CRISP-DM
  12. Data Understanding
      movielens = sc.textFile("../in/ml-100k/u.data")
  13. Data Understanding
      movielens.first()   # u'196\t242\t3\t881250949'
      movielens.count()   # 100,000
  14. Data Understanding
      clean_data = movielens.map(lambda x: x.split('\t'))
      rate = clean_data.map(lambda y: int(y[2]))
      rate.mean()   # 3.52986
      users = clean_data.map(lambda y: int(y[0]))
      users.distinct().count()   # 943
      clean_data.map(lambda y: int(y[1])).distinct().count()   # 1,682
  15. Data Preparation
      from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
      mls = movielens.map(lambda l: l.split('\t'))
      ratings = mls.map(lambda x: Rating(int(x[0]), int(x[1]), float(x[2])))
      # Example element: Rating(user=196, product=242, rating=3.0)
  16. Data Preparation
      train, test = ratings.randomSplit([0.7, 0.3], 7856)
      train.count()   # 70,005
      test.count()    # 29,995
      train.cache()
      test.cache()
  17. Modeling
      rank = 5             # Latent Factors to be made
      numIterations = 10   # Times to repeat process
      # Create the model on the training data
      model = ALS.train(train, rank, numIterations)
  18. Modeling / Evaluation
      model.userFeatures()
      model.productFeatures()
  19. Modeling / Evaluation
      # For Product X, Find N Users to Sell To
      model.recommendUsers(242, 100)
      # For User Y, Find N Products to Promote
      model.recommendProducts(196, 10)
      # Predict Single Product for Single User
      model.predict(196, 242)
  20. Modeling / Evaluation
      # Predict Multi Users and Multi Products
      # Pre-Processing
      pred_input = train.map(lambda x: (x[0], x[1]))   # e.g. (196, 242)
      # Lots of Predictions
      pred = model.predictAll(pred_input)
      # Returns Ratings(user, item, prediction), e.g. Rating(user=894, product=1560, rating=3.845)
  21. Evaluation
      User  Item  Actual  Pred
      196   242   3.0     3.91
      186   302   3.0     3.29
      22    377   1.0     1.09
      244   51    2.0     3.66
      298   474   4.0     4.11
      TRAINING RMSE: 0.763
  22. Evaluation
      # Organize the data to make (user, product) the key
      true_reorg = train.map(lambda x: ((x[0], x[1]), x[2]))   # e.g. ((196, 242), 3.0)
      pred_reorg = pred.map(lambda x: ((x[0], x[1]), x[2]))
      # Do the actual join
      true_pred = true_reorg.join(pred_reorg)   # e.g. ((582, 1014), (4.0, 3.397))
      from math import sqrt
      MSE = true_pred.map(lambda r: (r[1][0] - r[1][1])**2).mean()
      RMSE = sqrt(MSE)   # Results in 0.7629908117414474
  23. Evaluation
      test_input = test.map(lambda x: (x[0], x[1]))
      pred_test = model.predictAll(test_input)
      test_reorg = test.map(lambda x: ((x[0], x[1]), x[2]))
      pred_reorg = pred_test.map(lambda x: ((x[0], x[1]), x[2]))
      test_pred = test_reorg.join(pred_reorg)
      test_MSE = test_pred.map(lambda r: (r[1][0] - r[1][1])**2).mean()
      test_RMSE = sqrt(test_MSE)
      TEST RMSE: 1.0145
  24. CRISP-DM
  25. RECAP
      RecSys are Nearest Neighbors or MF Based
      ALS is Implemented in Spark
  26. RECAP
      rank = 5; numIterations = 10
      # Create the model on the training data
      model = ALS.train(train, rank, numIterations)
      # Lots of Predictions
      pred = model.predictAll(pred_input)
      # Examine Model Features
      model.productFeatures()
      # Save your model!
      model.save(sc, "../out/ml-model")
  27. Questions?
      LearnByMarketing.com
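
The evaluation formulas on slide 10 and the join-based RMSE on slides 22 and 23 can be tried out without a Spark cluster. The following is a minimal plain-Python sketch, not the deck's own code: the helper names `parse_rating`, `rmse`, and `precision_recall` are hypothetical, dictionaries keyed by (user, item) stand in for the keyed RDDs, and the item IDs in the precision/recall example are made up for illustration. The five rating pairs are the sample rows from the table on slide 21.

```python
from math import sqrt


def parse_rating(line):
    """Split one tab-separated u.data line into (user, item, rating)."""
    user, item, rating, _timestamp = line.split("\t")
    return int(user), int(item), float(rating)


def rmse(actual, predicted):
    """RMSE over the (user, item) keys present in both dicts (an inner join)."""
    shared = actual.keys() & predicted.keys()
    if not shared:
        raise ValueError("no overlapping (user, item) pairs")
    squared_error = sum((actual[k] - predicted[k]) ** 2 for k in shared)
    return sqrt(squared_error / len(shared))


def precision_recall(recommended, held_out):
    """Per-user precision and recall: |hits| / |RecoSet| and |hits| / |TestSet|."""
    reco_set, test_set = set(recommended), set(held_out)
    hits = reco_set & test_set
    precision = len(hits) / len(reco_set) if reco_set else 0.0
    recall = len(hits) / len(test_set) if test_set else 0.0
    return precision, recall


# Sample rows from the slide 21 table, keyed by (user, item).
actual = {(196, 242): 3.0, (186, 302): 3.0, (22, 377): 1.0,
          (244, 51): 2.0, (298, 474): 4.0}
predicted = {(196, 242): 3.91, (186, 302): 3.29, (22, 377): 1.09,
             (244, 51): 3.66, (298, 474): 4.11}
sample_rmse = rmse(actual, predicted)

# Hypothetical recommendation list and held-out items for one user.
precision, recall = precision_recall([242, 302, 51, 9], [242, 51, 377])

parsed = parse_rating("196\t242\t3\t881250949")
```

The RMSE of these five rows alone differs from the 0.763 training RMSE on the slide, which is computed over the full training set. The gap between that training figure and the 1.0145 test RMSE is why the held-out split is worth scoring separately.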