Successfully reported this slideshow.
Your SlideShare is downloading. ×

Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 34 Ad

Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

Download to read offline

At the sold-out Spark & Machine Learning Meetup in Brussels on October 27, 2016, Nick Pentreath of the Spark Technology Center teamed up with Jean-François Puget of IBM Analytics to deliver a talk called Creating an end-to-endRecommender System with Apache Spark and Elasticsearch.

Jean-François and Nick started with a look at the workflow for recommender systems and machine learning, then moved on to data modeling and using Spark ML for collaborative filtering. They closed with a discussion of deploying and scoring the recommender models, including a demo.

At the sold-out Spark & Machine Learning Meetup in Brussels on October 27, 2016, Nick Pentreath of the Spark Technology Center teamed up with Jean-François Puget of IBM Analytics to deliver a talk called Creating an end-to-endRecommender System with Apache Spark and Elasticsearch.

Jean-François and Nick started with a look at the workflow for recommender systems and machine learning, then moved on to data modeling and using Spark ML for collaborative filtering. They closed with a discussion of deploying and scoring the recommender models, including a demo.

Advertisement
Advertisement

More Related Content

Slideshows for you (20)

Similar to Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget (20)

Advertisement

More from sparktc (12)

Recently uploaded (20)

Advertisement

Creating an end-to-end Recommender System with Apache Spark and Elasticsearch - Nick Pentreath & Jean-François Puget

  1. 1. Spark Technology Center Oct / 27 / 16 Creating an end-to-end Recommender System with Apache Spark and Elasticsearch Jean-François Puget Nick Pentreath
  2. 2. Spark Technology Center § @JFPuget § Distinguished Engineer, IBM Machine Learning & Optimization § @MLnick § Principal Engineer, IBM Spark Technology Center § Apache Spark PMC About
  3. 3. Spark Technology Center § Recommender systems & the machine learning workflow § Data modelling for recommender systems § Why Spark & Elasticsearch? § Spark ML for collaborative filtering § Deploying & scoring recommender models § Demo Agenda
  4. 4. Spark Technology Center Recommender Systems & the ML Workflow
  5. 5. Spark Technology Center Recommender Systems Overview
  6. 6. Spark Technology Center The Machine Learning Workflow Perception Data ??? Machine Learning ??? $$$
  7. 7. Spark Technology Center The Machine Learning Workflow Reality Data • Historical • Streaming Ingest Data Processing • Feature transformation & engineering Model Training • Model selection & evaluation Deploy • Pipelines, not just models • Versioning Live System • Predict given new data • Monitoring & live evaluation Feedback Loop Spark DataFrames Spark ML Various ??? Stream (Kafka) Missing piece!
  8. 8. Spark Technology Center The Machine Learning Workflow Recommender Version Data Ingest Data Processing • Aggregation • Handle implicit data Model Training • ALS • Ranking-style evaluation Deploy • Model size & complexity Live System •User & item recommendations •Monitoring, filters Feedback => another Event Type Spark DataFrames Spark ML Elasticsearch • User & Item Metadata • Events Elasticsearch Stream (Kafka)
  9. 9. Spark Technology Center Data Modeling for Recommender Systems
  10. 10. Spark Technology Center Data modelUser and Item Metadata ! !
  11. 11. Spark Technology Center System RequirementsUser and Item Metadata ! ! Filtering & Grouping Business Rules
  12. 12. Spark Technology Center User interactions Implicit preference data • Page view • eCommerce - cart, purchase • Media – preview, watch, listen Intent data • Search query Anatomy of a User Event Explicit preference data • Rating • Review Social network interactions • Like • Share • Follow User Interactions ! ! ! ! ! ! ! !
  13. 13. Spark Technology Center Data modelAnatomy of a User Event ! ! ! !! ! !
  14. 14. Spark Technology Center How to handle implicit feedback?Anatomy of a User Event ! ! ! !! ! ! !
  15. 15. Spark Technology Center Why Spark & Elasticsearch?
  16. 16. Spark Technology Center DataFrames § Events & metadata are “lightly structured” data § Suited to DataFrames § Pluggable external data source support Spark ML § Spark ML pipelines § Scalable ALS algorithm, supporting implicit feedback & NMF § Cross-validation § Custom transformers & algorithms Why Spark?
  17. 17. Spark Technology Center Storage § Native JSON § Scalable § Good support for time-series / event data § Kibana for data visualisation § Integration with Spark DataFrames Scoring § Full-text search § Filtering § Aggregations (grouping) § Search ~== recommendation (more later) Why Elasticsearch?
  18. 18. Spark Technology Center Spark ML for Collaborative Filtering
  19. 19. Spark Technology Center Matrix FactorizationCollaborative Filtering 3 4 1 5 2 1 3 2 1 ! ! −1.1 3.2 4.3 0.2 1.4 3.1 2.5 0.3 2.3 4.3 −2.4 0.5 3.6 0.3 1.2 0.2 1.7 2.3 1.9 0.4 0.8 1.5 −1.2 0.3 −0.4 2.1 0.6 2.7 0.8 1.4 ! !
  20. 20. Spark Technology Center PredictionCollaborative Filtering 3 4 1 5 2 1 3 2 1 ! ! −1.1 3.2 4.3 0.2 1.4 3.1 2.5 0.3 2.3 4.3 −2.4 0.5 3.6 0.3 1.2 0.2 1.7 2.3 1.9 0.4 0.8 1.5 −1.2 0.3 −0.4 2.1 0.6 2.7 0.8 1.4 ! !
  21. 21. Spark Technology Center Loading DataAlternating Least Squares
  22. 22. Spark Technology Center Implicit Preference DataAlternating Least Squares
  23. 23. Spark Technology Center Deploying & Scoring Recommendation Models
  24. 24. Spark Technology Center Full-text Search & SimilarityPrelude: Search “cat videos” ! !cat videos 0 0 ⋯ 0 1 ⋯ 0 1 ⋯ 1 1 ⋯ 1 1 ⋯ 0 0 ⋯ 1 0 ⋯ 0 1 ⋯ Sort results 0 1 ⋯ 1 0 ⋯ Scoring RankingAnalysis Term vectors Similarity
  25. 25. Spark Technology Center Can we use the same machinery?Recommendation ! 0 0 ⋯ 0 1 ⋯ 0 1 ⋯ 1 1 ⋯ 1 1 ⋯ 0 0 ⋯ 1 0 ⋯ 0 1 ⋯ Sort results 1.2 ⋯ −0.2 0.3 Dot product & cosine similarity … the same as we need for recommendations! Scoring RankingAnalysis Term vectors ! !!! SimilarityUser (or item) vector ?
  26. 26. Spark Technology Center Delimited Payload FilterElasticsearch Term Vectors Raw vector 1.2 ⋯ −0.2 0.3 Term vector with payloads 0|1.2 ⋯ 3|-0.2 4|0.3 Custom analyzer
  27. 27. Spark Technology Center Custom scoring function • Native script (Java), compiled for speed • Scoring function computes dot product by: § For each document vector index (“term”), retrieve payload § score += payload * query(i) • Normalize with query vector norm and document vector norm for cosine similarity (“similar items”) Elasticsearch Scoring
  28. 28. Spark Technology Center Can we use the same machinery?Recommendation ! Sort results 1.2 ⋯ −0.2 0.3 Scoring RankingAnalysis Term vectors !! Custom scoring function !! Delimited payload filter −1.1 1.3 ⋯ 0.4 1.2 −0.2 ⋯ 0.3 0.5 0.7 ⋯ −1.3 0.9 1.4 ⋯ −0.8 ! User (or item) vector
  29. 29. Spark Technology Center We get search engine functionality for free!Elasticsearch Scoring
  30. 30. Spark Technology Center Deploying to ElasticsearchAlternating Least Squares
  31. 31. Spark Technology Center Monitoring & Feedback
  32. 32. Spark Technology Center Demo
  33. 33. Spark Technology Center Elasticsearch Elasticsearch Spark Integration Spark ML ALS for Collaborative Filtering Collaborative Filtering for Implicit Feedback Datasets Elasticsearch Term Vectors & Payloads Delimited Payload Filter Vector Scoring Plugin Kibana References
  34. 34. Spark Technology Center Thanks! https://github.com/MLnick/sseu16-meetup https://github.com/MLnick/elasticsearch-vector-scoring

×