This presentation covers Apache Spark's Machine Learning Library (MLlib) and how to use it to build recommender systems.
MLlib is a library built on top of Spark's engine that lets us train, test, validate, and operationalize machine learning models on large volumes of data, made convenient by its robust abstractions over datasets.
Find out how you can use MLlib to build product recommendation systems with both traditional ML techniques, such as collaborative filtering, and newer deep-learning approaches based on neural networks.
Spark for Recommender Systems
1. Using Spark’s Machine Learning Library to Make Product Recommendations
Sorin Pește
Technology Solutions Professional, Data & AI
Microsoft
source: xkcd.com
3. APACHE SPARK
A unified, distributed, open-source engine for large-scale data processing
Diagram: the Spark stack
• Libraries: Spark SQL (interactive queries), Spark Structured Streaming and Spark Streaming (stream processing), Spark MLlib (machine learning), GraphX (graph computation)
• Spark Core Engine
• Cluster managers: YARN, Mesos, Standalone Scheduler
5. SPARK DATAFRAMES
A distributed collection of data organized into named columns, conceptually equivalent to a table in a relational database
6. SPARK MACHINE LEARNING (MLLIB)
Offers a set of parallelized machine learning algorithms
Supports model selection (hyperparameter tuning) using cross-validation and train-validation split
Supports Java, Scala, and Python applications through a DataFrame-based API
Enables parallel, distributed ML on large datasets across Spark clusters
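MLlib's `CrossValidator` (parameter `numFolds`) and `TrainValidationSplit` (parameter `trainRatio`) in `pyspark.ml.tuning` do this partitioning for you. To make the difference concrete, here is a minimal pure-Python sketch, not MLlib's code: cross-validation fits every candidate once per fold and averages the metric, while train-validation split fits each candidate only once, which is cheaper but noisier. The helper names and the `fit_and_score` callback are hypothetical.

```python
import random
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[Any], List[Any]]  # (training rows, validation rows)


def k_fold_splits(rows: Sequence[Any], num_folds: int, seed: int = 42) -> List[Split]:
    """Cross-validation: each row appears in exactly one validation fold."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::num_folds] for i in range(num_folds)]
    splits = []
    for i, validation in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        splits.append((train, validation))
    return splits


def train_validation_split(rows: Sequence[Any], train_ratio: float = 0.75,
                           seed: int = 42) -> List[Split]:
    """Train-validation split: one random split, so one fit per candidate."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return [(shuffled[:cut], shuffled[cut:])]


def select_best(param_grid: List[Dict[str, Any]],
                fit_and_score: Callable[[Dict[str, Any], List[Any], List[Any]], float],
                splits: List[Split]) -> Dict[str, Any]:
    """Pick the candidate with the lowest validation error averaged over all splits."""
    def avg_error(params: Dict[str, Any]) -> float:
        return mean(fit_and_score(params, train, val) for train, val in splits)
    return min(param_grid, key=avg_error)
```

In Spark itself, the equivalent is to pass an estimator (for example `ALS`), a grid built with `ParamGridBuilder`, and an evaluator such as `RegressionEvaluator` to `CrossValidator` or `TrainValidationSplit`.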
7. SPARK MLLIB ALGORITHMS
13. ALS: EXPLICIT VS IMPLICIT FEEDBACK
Explicit feedback — user rates items
Implicit feedback — system records user activity
Browses a product page
Watches a movie trailer
Plays a song
Shares on social media
etc.
Implicit feedback is more common in real-world implementations, because users rarely rate items explicitly but constantly generate activity
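Spark's ALS handles implicit feedback (`implicitPrefs=True`) with the approach of Hu, Koren and Volinsky's "Collaborative Filtering for Implicit Feedback Datasets": activity strengths r are not treated as ratings but turned into a binary preference p (did the user interact at all?) and a confidence c = 1 + alpha * r. The sketch below shows only that preprocessing idea in plain Python; the per-event weights are hypothetical values chosen for illustration, not something MLlib defines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weights: how strongly each kind of activity signals interest.
EVENT_WEIGHTS = {"browse": 1.0, "trailer": 2.0, "play": 3.0, "share": 5.0}


def aggregate_events(events: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], float]:
    """Sum weighted (user, item, event) records into one strength r_ui per pair."""
    strength: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, item, event in events:
        strength[(user, item)] += EVENT_WEIGHTS.get(event, 0.0)  # unknown events add 0
    return dict(strength)


def preference_and_confidence(r_ui: float, alpha: float = 1.0) -> Tuple[float, float]:
    """Hu et al.: p_ui = 1 if r_ui > 0 else 0, and c_ui = 1 + alpha * r_ui."""
    preference = 1.0 if r_ui > 0 else 0.0
    confidence = 1.0 + alpha * r_ui
    return preference, confidence
```

With Spark, the aggregated strengths would become the `ratingCol` of `ALS(implicitPrefs=True, alpha=...)`, which applies the preference and confidence transformation internally.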
14. ALS: HYPERPARAMETER TUNING
Hyperparameters that can be adjusted:
rank = the number of latent factors in the model
maxIter = the maximum number of iterations to run
regParam = the regularization parameter
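To show what these three hyperparameters control, here is a small single-machine ALS in plain Python: `rank` sets the length of each factor vector, `max_iter` the number of alternating rounds, and `reg_param` the ridge penalty in each least-squares solve. Following the ALS-WR scheme that Spark's documentation describes, the penalty is scaled by each user's or item's rating count. This is a sketch for intuition, assuming integer ids and a ratings list that fits in memory; MLlib's `ALS` distributes the same alternating solves across a cluster.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Factors = Dict[int, List[float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update(fixed: Factors, groups: Dict[int, List[Tuple[int, float]]],
           rank: int, reg_param: float) -> Factors:
    """Hold `fixed` constant and solve one ridge regression per row in `groups`."""
    new: Factors = {}
    for key, pairs in groups.items():
        a = [[0.0] * rank for _ in range(rank)]
        b = [0.0] * rank
        for other, rating in pairs:
            v = fixed[other]
            for p in range(rank):
                b[p] += rating * v[p]
                for q in range(rank):
                    a[p][q] += v[p] * v[q]
        for p in range(rank):
            a[p][p] += reg_param * len(pairs)  # ALS-WR: lambda scaled by rating count
        new[key] = solve(a, b)
    return new


def als(ratings: List[Tuple[int, int, float]], rank: int = 2, max_iter: int = 10,
        reg_param: float = 0.1, seed: int = 0) -> Tuple[Factors, Factors]:
    rng = random.Random(seed)
    by_user: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    by_item: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    item_f: Factors = {i: [rng.uniform(0, 1) for _ in range(rank)] for i in by_item}
    user_f: Factors = {}
    for _ in range(max_iter):
        user_f = update(item_f, by_user, rank, reg_param)  # items fixed, solve users
        item_f = update(user_f, by_item, rank, reg_param)  # users fixed, solve items
    return user_f, item_f


def rmse(ratings: List[Tuple[int, int, float]], user_f: Factors, item_f: Factors) -> float:
    errors = [(r - sum(x * y for x, y in zip(user_f[u], item_f[i]))) ** 2
              for u, i, r in ratings]
    return math.sqrt(sum(errors) / len(errors))
```

Raising `reg_param` shrinks the factors and increases training error, which is exactly the trade-off a grid search over `regParam` explores on held-out data.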
16. ALS: WHAT ABOUT REAL-TIME?
Recomputing ALS in near real time may be infeasible, because each run is a batch job over all ratings
• Streaming variant of ALS using Stochastic Gradient Descent (SGD):
https://github.com/brkyvz/streaming-matrix-factorization
• The Oryx framework (http://oryx.io) also offers streaming ALS
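Rather than re-running ALS, a streaming variant can apply a cheap Stochastic Gradient Descent step to the two factor vectors touched by each incoming rating. The sketch below is a plain-Python illustration of that idea under assumed names (`StreamingMF`, `observe`, learning rate `lr`, penalty `reg`); it is not the API of the streaming-matrix-factorization project or of Oryx.

```python
import random
from typing import Dict, Hashable, List


class StreamingMF:
    """Matrix factorization updated one rating at a time with SGD (hypothetical API)."""

    def __init__(self, rank: int = 2, lr: float = 0.05, reg: float = 0.05, seed: int = 0):
        self.rank, self.lr, self.reg = rank, lr, reg
        self.rng = random.Random(seed)
        self.users: Dict[Hashable, List[float]] = {}
        self.items: Dict[Hashable, List[float]] = {}

    def _vector(self, table: Dict[Hashable, List[float]], key: Hashable) -> List[float]:
        # New users/items get small random factors the first time they appear.
        if key not in table:
            table[key] = [self.rng.uniform(0.0, 0.5) for _ in range(self.rank)]
        return table[key]

    def predict(self, user: Hashable, item: Hashable) -> float:
        u, v = self._vector(self.users, user), self._vector(self.items, item)
        return sum(a * b for a, b in zip(u, v))

    def observe(self, user: Hashable, item: Hashable, rating: float) -> float:
        """Apply one SGD step for a single event; returns the error before the step."""
        u, v = self._vector(self.users, user), self._vector(self.items, item)
        err = rating - sum(a * b for a, b in zip(u, v))
        for k in range(self.rank):
            u_k, v_k = u[k], v[k]  # old values, so both updates use the same point
            u[k] += self.lr * (err * v_k - self.reg * u_k)
            v[k] += self.lr * (err * u_k - self.reg * v_k)
        return err
```

Each event costs O(rank) work, so the model can follow a live stream; the price is that SGD needs a tuned learning rate and converges less predictably than batch ALS.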
17. BEYOND ALS
Latent factors learned by ALS can be useful as input features for other algorithms, such as similarity search or downstream models
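For example, a fitted `ALSModel` in Spark exposes its learned vectors as the `userFactors` and `itemFactors` DataFrames (columns `id` and `features`). One simple downstream use is item-to-item similarity over the item factors. The sketch below uses made-up two-dimensional factors and plain cosine similarity; in practice the vectors would be collected from `itemFactors` or used as feature columns for other models.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(item: str, item_factors: Dict[str, List[float]],
                 top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank the other items by cosine similarity of their latent factors."""
    target = item_factors[item]
    scored = [(other, cosine(target, vec))
              for other, vec in item_factors.items() if other != item]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up factors standing in for rows of ALSModel.itemFactors.
factors = {"laptop": [0.9, 0.1], "mouse": [0.8, 0.2], "novel": [0.1, 0.9]}
print(most_similar("laptop", factors, top_n=2))
```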
18. DEEP LEARNING
A set of machine learning techniques that use multiple layers of non-linear processing units to learn useful representations of the input data
19. DEEP LEARNING WITH SPARK
Integrations with existing DL libraries
• Microsoft CNTK (mmlspark)
• TensorFlow (TensorFlowOnSpark)
• DeepLearning4J
• Caffe (CaffeOnSpark)
• Keras (Elephas)
• mxnet
• Paddle
• and more…
Implementations of DL on Spark
• BigDL
• DeepDist
• SparkCL
• SparkNet
• Deep Learning Pipelines (Databricks)
• and more…
Distributed Hyperparameter Tuning
20. DEEP LEARNING FOR RECOMMENDERS
• Neural Collaborative Filtering (He et al., 2017)
https://arxiv.org/abs/1708.05031
https://github.com/hexiangnan/neural_collaborative_filtering
Neural Collaborative Filtering
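The NCF paper's NeuMF model fuses two branches: Generalized Matrix Factorization (GMF), the element-wise product of a user and an item embedding, and a Multi-Layer Perceptron (MLP) over the concatenated embeddings; a final sigmoid layer turns both into an interaction probability. The plain-Python sketch below shows only the forward pass with small random, untrained weights. Layer sizes and initialization are assumptions, and training (log loss with sampled negative instances in the paper) is omitted.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def dense(layer: Layer, x: List[float]) -> List[float]:
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class NeuMF:
    """Forward pass of a NeuMF-style model with random (untrained) parameters."""

    def __init__(self, users: Sequence[Hashable], items: Sequence[Hashable],
                 gmf_dim: int = 4, mlp_dim: int = 4, hidden: Sequence[int] = (8, 4),
                 seed: int = 0):
        rng = random.Random(seed)

        def vec(n: int) -> List[float]:
            return [rng.gauss(0.0, 0.1) for _ in range(n)]

        # Separate embeddings for the GMF and MLP branches.
        self.p_gmf: Dict[Hashable, List[float]] = {u: vec(gmf_dim) for u in users}
        self.q_gmf: Dict[Hashable, List[float]] = {i: vec(gmf_dim) for i in items}
        self.p_mlp: Dict[Hashable, List[float]] = {u: vec(mlp_dim) for u in users}
        self.q_mlp: Dict[Hashable, List[float]] = {i: vec(mlp_dim) for i in items}
        self.layers: List[Layer] = []
        in_dim = 2 * mlp_dim  # MLP input: concatenated user and item embeddings
        for out_dim in hidden:
            self.layers.append(([vec(in_dim) for _ in range(out_dim)], [0.0] * out_dim))
            in_dim = out_dim
        self.h = vec(gmf_dim + in_dim)  # weights of the final prediction layer

    def score(self, user: Hashable, item: Hashable) -> float:
        gmf = [p * q for p, q in zip(self.p_gmf[user], self.q_gmf[item])]
        x = self.p_mlp[user] + self.q_mlp[item]  # list concatenation
        for layer in self.layers:
            x = relu(dense(layer, x))
        return sigmoid(sum(w * z for w, z in zip(self.h, gmf + x)))
```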
21. DEEP LEARNING FOR RECOMMENDERS
• Predict the next item the user will want to interact with
Recommendations as sequence prediction
[a] -> b
[a, b] -> c
[a, b, c] -> d
[0, 0, 0, a] -> b
[0, 0, a, b] -> c
[0, a, b, c] -> d
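The padded examples above can be generated mechanically from a user's interaction history: every prefix becomes an input and the item that follows it becomes the label, with prefixes left-padded to a fixed length (and, as an assumption here, truncated to the most recent `max_len` items). A minimal Python sketch, using 0 as the padding token as on the slide, which assumes real item ids are never 0:

```python
from typing import List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def next_item_examples(history: Sequence[Item], max_len: int) -> List[Tuple[list, Item]]:
    """Turn one interaction sequence into (left-padded prefix, next item) pairs."""
    examples = []
    for t in range(1, len(history)):
        prefix = list(history[max(0, t - max_len):t])  # keep the most recent items
        padded = [0] * (max_len - len(prefix)) + prefix  # 0 is the padding token
        examples.append((padded, history[t]))
    return examples


for inputs, label in next_item_examples(["a", "b", "c", "d"], max_len=4):
    print(inputs, "->", label)
```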
23. DEEP LEARNING FOR RECOMMENDERS
• Session-based Recommendations with Recurrent Neural Networks (Hidasi et al., 2015)
https://arxiv.org/abs/1511.06939
https://github.com/hidasib/GRU4Rec
Recommendations as sequence prediction
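GRU4Rec feeds the items of a session, one at a time, into a Gated Recurrent Unit (GRU) and scores candidate next items from the hidden state. The sketch below implements one GRU step with the gate equations used in the paper (update gate z, reset gate r, candidate state) in plain Python, encoding each item as a one-hot vector. Weights are random and untrained, biases are omitted, and the output layer is a hypothetical linear projection; the paper's session-parallel mini-batches and ranking losses are not shown.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


class GRUSessionEncoder:
    """Encodes a session of item ids with one GRU layer (random, untrained weights)."""

    def __init__(self, num_items: int, hidden_size: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.num_items, self.hidden_size = num_items, hidden_size
        # Input-to-hidden (w_*) and hidden-to-hidden (u_*) weights per gate.
        self.w_z = random_matrix(hidden_size, num_items, rng)
        self.u_z = random_matrix(hidden_size, hidden_size, rng)
        self.w_r = random_matrix(hidden_size, num_items, rng)
        self.u_r = random_matrix(hidden_size, hidden_size, rng)
        self.w_h = random_matrix(hidden_size, num_items, rng)
        self.u_h = random_matrix(hidden_size, hidden_size, rng)
        # Hypothetical linear output layer: one score per candidate item.
        self.out = random_matrix(num_items, hidden_size, rng)

    def step(self, item: int, h: List[float]) -> List[float]:
        x = [1.0 if k == item else 0.0 for k in range(self.num_items)]  # one-hot input
        z = [sigmoid(a + b) for a, b in zip(matvec(self.w_z, x), matvec(self.u_z, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.w_r, x), matvec(self.u_r, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to old state
        candidate = [math.tanh(a + b)
                     for a, b in zip(matvec(self.w_h, x), matvec(self.u_h, reset_h))]
        # New state interpolates between the old state and the candidate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]

    def encode(self, session: List[int]) -> List[float]:
        h = [0.0] * self.hidden_size
        for item in session:
            h = self.step(item, h)
        return h

    def next_item_scores(self, session: List[int]) -> List[float]:
        return matvec(self.out, self.encode(session))
```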
24. DEEP LEARNING FOR RECOMMENDERS
• Featurize product images
https://arxiv.org/pdf/1510.01784.pdf