Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Spark Technology Center
Oct /
27 /
16
Creating an end-to-end
Recommender System
with Apache Spark
and Elasticsearch
Jean-F...
Spark Technology Center
§ @JFPuget
§ Distinguished Engineer, IBM Machine
Learning & Optimization
§ @MLnick
§ Principal Eng...
Spark Technology Center
§ Recommender systems & the machine
learning workflow
§ Data modelling for recommender
systems
§ W...
Spark Technology Center
Recommender
Systems & the ML
Workflow
Spark Technology Center
Recommender
Systems
Overview
Spark Technology Center
The Machine
Learning
Workflow
Perception
Data ???
Machine
Learning
??? $$$
Spark Technology Center
The Machine
Learning
Workflow
Reality
Data
• Historical
• Streaming
Ingest
Data
Processing
• Featu...
Spark Technology Center
The Machine
Learning
Workflow
Recommender Version
Data Ingest
Data
Processing
• Aggregation
• Hand...
Spark Technology Center
Data Modeling for
Recommender
Systems
Spark Technology Center
Data modelUser and Item
Metadata
! !
Spark Technology Center
System RequirementsUser and Item
Metadata
! !
Filtering &
Grouping
Business
Rules
Spark Technology Center
User interactions
Implicit preference data
• Page view
• eCommerce - cart, purchase
• Media – prev...
Spark Technology Center
Data modelAnatomy of a
User Event
!
!
! !! !
!
Spark Technology Center
How to handle implicit feedback?Anatomy of a
User Event
!
!
! !! !
!
!
Spark Technology Center
Why Spark &
Elasticsearch?
Spark Technology Center
DataFrames
§ Events & metadata are “lightly
structured” data
§ Suited to DataFrames
§ Pluggable ex...
Spark Technology Center
Storage
§ Native JSON
§ Scalable
§ Good support for time-series / event data
§ Kibana for data vis...
Spark Technology Center
Spark ML for
Collaborative
Filtering
Spark Technology Center
Matrix FactorizationCollaborative
Filtering
3 4
1
5 2
1 3
2 1
!
!
−1.1 3.2 4.3
0.2 1.4 3.1
2.5 0.3...
Spark Technology Center
PredictionCollaborative
Filtering
3 4
1
5 2
1 3
2 1
!
!
−1.1 3.2 4.3
0.2 1.4 3.1
2.5 0.3 2.3
4.3 −...
Spark Technology Center
Loading DataAlternating Least
Squares
Spark Technology Center
Implicit Preference DataAlternating Least
Squares
Spark Technology Center
Deploying &
Scoring
Recommendation
Models
Spark Technology Center
Full-text Search & SimilarityPrelude: Search
“cat videos”
!
!cat videos
0 0 ⋯ 0 1 ⋯
0 1 ⋯ 1 1 ⋯
1 ...
Spark Technology Center
Can we use the same machinery?Recommendation
!
0 0 ⋯ 0 1 ⋯
0 1 ⋯ 1 1 ⋯
1 1 ⋯ 0 0 ⋯
1 0 ⋯ 0 1 ⋯
Sor...
Spark Technology Center
Delimited Payload FilterElasticsearch
Term Vectors
Raw vector
1.2 ⋯ −0.2 0.3
Term vector with payl...
Spark Technology Center
Custom scoring function
• Native script (Java), compiled for speed
• Scoring function computes dot...
Spark Technology Center
Can we use the same machinery?Recommendation
! Sort
results
1.2 ⋯ −0.2 0.3
Scoring RankingAnalysis...
Spark Technology Center
We get search engine functionality for free!Elasticsearch
Scoring
Spark Technology Center
Deploying to ElasticsearchAlternating Least
Squares
Spark Technology Center
Monitoring &
Feedback
Spark Technology Center
Demo
Spark Technology Center
Elasticsearch
Elasticsearch Spark Integration
Spark ML ALS for Collaborative Filtering
Collaborati...
Spark Technology Center
Thanks!
https://github.com/MLnick/sseu16-meetup
https://github.com/MLnick/elasticsearch-vector-sco...
Upcoming SlideShare
Loading in …5
×

Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

4,343 views

Published on

At the sold-out Spark & Machine Learning Meetup in Brussels on October 27, 2016, Nick Pentreath of the Spark Technology Center teamed up with Jean-François Puget of IBM Analytics to deliver a talk called Creating an end-to-endRecommender System with Apache Spark and Elasticsearch.

Jean-François and Nick started with a look at the workflow for recommender systems and machine learning, then moved on to data modeling and using Spark ML for collaborative filtering. They closed with a discussion of deploying and scoring the recommender models, including a demo.

Published in: Technology

Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

  1. 1. Spark Technology Center Oct / 27 / 16 Creating an end-to-end Recommender System with Apache Spark and Elasticsearch Jean-François Puget Nick Pentreath
  2. 2. Spark Technology Center § @JFPuget § Distinguished Engineer, IBM Machine Learning & Optimization § @MLnick § Principal Engineer, IBM Spark Technology Center § Apache Spark PMC About
  3. 3. Spark Technology Center § Recommender systems & the machine learning workflow § Data modelling for recommender systems § Why Spark & Elasticsearch? § Spark ML for collaborative filtering § Deploying & scoring recommender models § Demo Agenda
  4. 4. Spark Technology Center Recommender Systems & the ML Workflow
  5. 5. Spark Technology Center Recommender Systems Overview
  6. 6. Spark Technology Center The Machine Learning Workflow Perception Data ??? Machine Learning ??? $$$
  7. 7. Spark Technology Center The Machine Learning Workflow Reality Data • Historical • Streaming Ingest Data Processing • Feature transformation & engineering Model Training • Model selection & evaluation Deploy • Pipelines, not just models • Versioning Live System • Predict given new data • Monitoring & live evaluation Feedback Loop Spark DataFrames Spark ML Various ??? Stream (Kafka) Missing piece!
  8. 8. Spark Technology Center The Machine Learning Workflow Recommender Version Data Ingest Data Processing • Aggregation • Handle implicit data Model Training • ALS • Ranking-style evaluation Deploy • Model size & complexity Live System •User & item recommendations •Monitoring, filters Feedback => another Event Type Spark DataFrames Spark ML Elasticsearch • User & Item Metadata • Events Elasticsearch Stream (Kafka)
  9. 9. Spark Technology Center Data Modeling for Recommender Systems
  10. 10. Spark Technology Center Data modelUser and Item Metadata ! !
  11. 11. Spark Technology Center System RequirementsUser and Item Metadata ! ! Filtering & Grouping Business Rules
  12. 12. Spark Technology Center User interactions Implicit preference data • Page view • eCommerce - cart, purchase • Media – preview, watch, listen Intent data • Search query Anatomy of a User Event Explicit preference data • Rating • Review Social network interactions • Like • Share • Follow User Interactions ! ! ! ! ! ! ! !
  13. 13. Spark Technology Center Data modelAnatomy of a User Event ! ! ! !! ! !
  14. 14. Spark Technology Center How to handle implicit feedback?Anatomy of a User Event ! ! ! !! ! ! !
  15. 15. Spark Technology Center Why Spark & Elasticsearch?
  16. 16. Spark Technology Center DataFrames § Events & metadata are “lightly structured” data § Suited to DataFrames § Pluggable external data source support Spark ML § Spark ML pipelines § Scalable ALS algorithm, supporting implicit feedback & NMF § Cross-validation § Custom transformers & algorithms Why Spark?
  17. 17. Spark Technology Center Storage § Native JSON § Scalable § Good support for time-series / event data § Kibana for data visualisation § Integration with Spark DataFrames Scoring § Full-text search § Filtering § Aggregations (grouping) § Search ~== recommendation (more later) Why Elasticsearch?
  18. 18. Spark Technology Center Spark ML for Collaborative Filtering
  19. 19. Spark Technology Center Matrix FactorizationCollaborative Filtering 3 4 1 5 2 1 3 2 1 ! ! −1.1 3.2 4.3 0.2 1.4 3.1 2.5 0.3 2.3 4.3 −2.4 0.5 3.6 0.3 1.2 0.2 1.7 2.3 1.9 0.4 0.8 1.5 −1.2 0.3 −0.4 2.1 0.6 2.7 0.8 1.4 ! !
  20. 20. Spark Technology Center PredictionCollaborative Filtering 3 4 1 5 2 1 3 2 1 ! ! −1.1 3.2 4.3 0.2 1.4 3.1 2.5 0.3 2.3 4.3 −2.4 0.5 3.6 0.3 1.2 0.2 1.7 2.3 1.9 0.4 0.8 1.5 −1.2 0.3 −0.4 2.1 0.6 2.7 0.8 1.4 ! !
  21. 21. Spark Technology Center Loading DataAlternating Least Squares
  22. 22. Spark Technology Center Implicit Preference DataAlternating Least Squares
  23. 23. Spark Technology Center Deploying & Scoring Recommendation Models
  24. 24. Spark Technology Center Full-text Search & SimilarityPrelude: Search “cat videos” ! !cat videos 0 0 ⋯ 0 1 ⋯ 0 1 ⋯ 1 1 ⋯ 1 1 ⋯ 0 0 ⋯ 1 0 ⋯ 0 1 ⋯ Sort results 0 1 ⋯ 1 0 ⋯ Scoring RankingAnalysis Term vectors Similarity
  25. 25. Spark Technology Center Can we use the same machinery?Recommendation ! 0 0 ⋯ 0 1 ⋯ 0 1 ⋯ 1 1 ⋯ 1 1 ⋯ 0 0 ⋯ 1 0 ⋯ 0 1 ⋯ Sort results 1.2 ⋯ −0.2 0.3 Dot product & cosine similarity … the same as we need for recommendations! Scoring RankingAnalysis Term vectors ! !!! SimilarityUser (or item) vector ?
  26. 26. Spark Technology Center Delimited Payload FilterElasticsearch Term Vectors Raw vector 1.2 ⋯ −0.2 0.3 Term vector with payloads 0|1.2 ⋯ 3|-0.2 4|0.3 Custom analyzer
  27. 27. Spark Technology Center Custom scoring function • Native script (Java), compiled for speed • Scoring function computes dot product by: § For each document vector index (“term”), retrieve payload § score += payload * query(i) • Normalize with query vector norm and document vector norm for cosine similarity (“similar items”) Elasticsearch Scoring
  28. 28. Spark Technology Center Can we use the same machinery?Recommendation ! Sort results 1.2 ⋯ −0.2 0.3 Scoring RankingAnalysis Term vectors !! Custom scoring function !! Delimited payload filter −1.1 1.3 ⋯ 0.4 1.2 −0.2 ⋯ 0.3 0.5 0.7 ⋯ −1.3 0.9 1.4 ⋯ −0.8 ! User (or item) vector
  29. 29. Spark Technology Center We get search engine functionality for free!Elasticsearch Scoring
  30. 30. Spark Technology Center Deploying to ElasticsearchAlternating Least Squares
  31. 31. Spark Technology Center Monitoring & Feedback
  32. 32. Spark Technology Center Demo
  33. 33. Spark Technology Center Elasticsearch Elasticsearch Spark Integration Spark ML ALS for Collaborative Filtering Collaborative Filtering for Implicit Feedback Datasets Elasticsearch Term Vectors & Payloads Delimited Payload Filter Vector Scoring Plugin Kibana References
  34. 34. Spark Technology Center Thanks! https://github.com/MLnick/sseu16-meetup https://github.com/MLnick/elasticsearch-vector-scoring

×