DBG / June 6, 2018 / © 2018 IBM Corporation
Deep Learning for Recommender Systems
Nick Pentreath
Principal Engineer
@MLnick
About
@MLnick on Twitter & Github
Principal Engineer, IBM
CODAIT - Center for Open-Source Data & AI
Technologies
Machine Learning & AI
Apache Spark committer & PMC
Author of Machine Learning with Spark
Various conferences & meetups
Center for Open Source Data and AI Technologies
CODAIT
codait.org
CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
Improving Enterprise AI Lifecycle in Open Source
Agenda
Recommender systems overview
Deep learning overview
Deep learning for recommendations
Challenges and future directions
Recommender Systems
Users and Items
Recommender Systems
Events
Recommender Systems
Implicit preference data
▪ Online – page view, click, app interaction
▪ Commerce – cart, purchase, return
▪ Media – preview, watch, listen
Explicit preference data
▪ Ratings, reviews
Intent
▪ Search query
Social
▪ Like, share, follow, unfollow, block
Context
Recommender Systems
Prediction
Recommender Systems
Prediction is ranking
– Given a user and context, rank the available items in
order of likelihood that the user will interact with them
Sort items
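As a minimal illustration, ranking with a factor model boils down to scoring every candidate item for the user and sorting by score. The sketch below assumes precomputed user and item factor vectors; all names and sizes are illustrative.

```python
import numpy as np

def recommend_top_k(user_vector, item_matrix, k=10, exclude=None):
    """Score every item for one user and return the k highest-scoring item ids.

    user_vector: (d,) latent factors for the user (or a context representation)
    item_matrix: (num_items, d) latent factors for all candidate items
    exclude:     optional set of item ids the user has already interacted with
    """
    scores = item_matrix @ user_vector            # one predicted score per item
    if exclude:
        scores[list(exclude)] = -np.inf           # never re-recommend seen items
    top_k = np.argsort(-scores)[:k]               # sort items by descending score
    return top_k, scores[top_k]

# toy usage with random factors
rng = np.random.default_rng(0)
user = rng.normal(size=32)
items = rng.normal(size=(1000, 32))
print(recommend_top_k(user, items, k=5, exclude={3, 17}))
```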
Cold Start
Recommender Systems
New items
– No historical interaction data
– Typically use baselines or item content
New (or unknown) users
– Previously unseen or anonymous users have no user
profile or historical interactions
– Have context data (but possibly very limited)
– Cannot directly use collaborative filtering models
• Item-similarity for current item
• Represent user as aggregation of items
• Contextual models can incorporate short-term history
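One of the workarounds listed above, representing a new or anonymous user as an aggregation of the items they have just interacted with, can be sketched as a simple average of item factor vectors (an illustrative sketch, not from the slides):

```python
import numpy as np

def pseudo_user_vector(session_item_ids, item_matrix):
    """Stand in for a missing user profile by averaging the factor vectors of the
    items a new / anonymous user has interacted with in the current session."""
    if len(session_item_ids) == 0:
        return np.zeros(item_matrix.shape[1])     # no history at all: neutral vector
    return item_matrix[list(session_item_ids)].mean(axis=0)

# the result can then be ranked against all items exactly like a normal
# user factor vector (e.g. with the top-k ranking sketch above)
```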
Deep Learning
Overview
Deep Learning
Original theory from 1940s; computer models
originated around 1960s; fell out of favor in
1980s/90s
Recent resurgence due to
– Bigger (and better) data; standard datasets (e.g. ImageNet)
– Better hardware (GPUs)
– Improvements to algorithms, architectures and optimization
Leading to new state-of-the-art results in
computer vision (images and video); speech/
text; language translation and more
Source: Wikipedia
Modern Neural Networks
Deep Learning
Deep (multi-layer) networks
Computer vision
– Convolutional neural networks (CNNs)
– Image classification, object detection, segmentation
Sequences and time-series
– Recurrent neural networks (RNNs)
– Machine translation, text generation
– LSTMs, GRUs
Embeddings
– Text, categorical features
Deep learning frameworks
– Flexibility, computation graphs, auto-differentiation, GPUs
Source: Stanford CS231n
Deep Learning as Feature Extraction
Source: Stanford CS231n; M. Zeiler, R. Fergus
Deep Learning for Recommendations
Evolution of Recommendation Models
Deep Learning for Recommendations
2003 – Scalable models: Amazon’s item-to-item collaborative filtering
2006-2009 – Netflix Prize: the rise of matrix factorization
2010 – Factorization Machines: general factor models
2013 – Deep content-based models: music recommendation
2016/2017 – Gaining momentum: DLRS Workshops @ RecSys conferences
– The Netflix Prize spurred significant progress (not unlike ImageNet)
– Matrix factorization
techniques became SotA
– Restricted Boltzmann
Machine (a class of NN) was
one of the strongest single
models
– Mostly tweaks until
introduction of Factorization
Machines
– Initial work on applying DL
focused on feature
extraction for content
– Huge momentum in DL
techniques over the past two
years
Prelude: Matrix Factorization
Deep Learning for Recommendations
The de facto standard model
– Represent user ratings as a user-item matrix
– Find two smaller matrices (called the factor
matrices) that approximate the full matrix
– Minimize the reconstruction error (i.e. rating
prediction / completion)
– Efficient, scalable algorithms
• Gradient Descent
• Alternating Least Squares (ALS)
– Prediction is simple
– Can handle implicit data
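Concretely, the objective being minimized is the standard regularized squared reconstruction error over the observed ratings (textbook formulation):

```latex
\min_{P,\,Q} \sum_{(u,i)\in\mathcal{K}} \left( r_{ui} - \mathbf{p}_u^{\top}\mathbf{q}_i \right)^2
  + \lambda \left( \lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2 \right)
```

where K is the set of observed (user, item) pairs and p_u, q_i are rows of the user and item factor matrices; ALS alternately solves for one factor matrix while holding the other fixed.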
Prelude: Matrix Factorization
Deep Learning for Recommendations
Can be re-created easily in Deep Learning
frameworks, e.g. Keras
– Use raw weight matrices in low-level frameworks
– Or use embeddings for user and item identifiers
– Prediction: dot product of user and item embeddings (loss e.g. mean squared error)
– Can add user and item biases using single-
dimensional embeddings
– What about implicit data?
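A minimal Keras sketch of this formulation, using embeddings for the user and item identifiers, a dot product for the predicted rating, single-dimensional embeddings for the biases, and mean squared error as the loss (all sizes and names are illustrative):

```python
import numpy as np
from tensorflow.keras import layers, Model

num_users, num_items, dim = 10_000, 5_000, 32          # illustrative sizes

user_in = layers.Input(shape=(1,), name="user_id")
item_in = layers.Input(shape=(1,), name="item_id")

u = layers.Embedding(num_users, dim)(user_in)          # user factor vectors
v = layers.Embedding(num_items, dim)(item_in)          # item factor vectors
u_bias = layers.Embedding(num_users, 1)(user_in)       # single-dimensional bias terms
v_bias = layers.Embedding(num_items, 1)(item_in)

dot = layers.Dot(axes=2)([u, v])                       # predicted rating: p_u . q_i
pred = layers.Flatten()(layers.Add()([dot, u_bias, v_bias]))

model = Model([user_in, item_in], pred)
model.compile(optimizer="adam", loss="mse")            # squared reconstruction error

# toy (user, item, rating) triples
users = np.random.randint(0, num_users, 1024)
items = np.random.randint(0, num_items, 1024)
ratings = np.random.uniform(1, 5, 1024)
model.fit([users, items], ratings, epochs=1, batch_size=128)
```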
Matrix Factorization for Implicit Data
Deep Learning for Recommendations
Flexibility of deep learning frameworks
means we can deal with implicit data in many
ways
– Frame it as a classification problem and use the
built-in loss functions
– Frame it as a ranking problem and create custom
loss functions (typically with negative item
sampling)
– Frame it as a weighted regression / classification
problem (similar to weighted ALS for implicit data)
Source: Maciej Kula
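As one example of the ranking framing above, a BPR-style pairwise loss with one sampled negative item per positive interaction might look like this in Keras (a hand-rolled sketch, not taken from any particular library):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

num_users, num_items, dim = 10_000, 5_000, 32

user_in = layers.Input(shape=(1,), name="user_id")
pos_in = layers.Input(shape=(1,), name="positive_item")
neg_in = layers.Input(shape=(1,), name="sampled_negative_item")

user_emb = layers.Embedding(num_users, dim)
item_emb = layers.Embedding(num_items, dim)            # shared by positive and negative items

u = layers.Flatten()(user_emb(user_in))
p = layers.Flatten()(item_emb(pos_in))
n = layers.Flatten()(item_emb(neg_in))

pos_score = layers.Dot(axes=1)([u, p])
neg_score = layers.Dot(axes=1)([u, n])
diff = layers.Subtract()([pos_score, neg_score])       # how much higher the positive scores

def bpr_loss(_, score_diff):
    # maximize P(positive ranked above negative) = sigmoid(score difference)
    return -tf.math.log(tf.sigmoid(score_diff) + 1e-10)

model = Model([user_in, pos_in, neg_in], diff)
model.compile(optimizer="adam", loss=bpr_loss)

# toy implicit data: one uniformly sampled negative item per observed interaction
users = np.random.randint(0, num_users, 1024)
pos_items = np.random.randint(0, num_items, 1024)
neg_items = np.random.randint(0, num_items, 1024)
model.fit([users, pos_items, neg_items], np.zeros(1024), epochs=1, batch_size=128)
```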
Deep content-based music recommendation
Deep Learning for Recommendations
Example of using a neural network model to
act as feature extractor for item content /
metadata
– CNN with audio spectrogram as input data
– Filters capture lower-level audio characteristics,
progressing to high-level features (akin to image
problems)
– Max pooling and global pooling
– Fully connected layers with ReLU activations
– Output layer is the factor vector for the track from
a trained collaborative filtering model
– Models trained separately
Source: Spotify / Sander Dieleman
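A much-simplified Keras sketch of the idea: a small CNN over a spectrogram regressing onto a track's collaborative-filtering factor vector. Layer sizes, input shape and the number of factors are illustrative, not those of the original paper.

```python
from tensorflow.keras import layers, Model

n_mel_bands, n_frames, n_factors = 128, 599, 40        # illustrative spectrogram size / factor dim

spec_in = layers.Input(shape=(n_mel_bands, n_frames, 1), name="audio_spectrogram")
x = layers.Conv2D(32, (3, 3), activation="relu")(spec_in)   # low-level audio characteristics
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), activation="relu")(x)         # progressively higher-level features
x = layers.GlobalAveragePooling2D()(x)                      # global pooling
x = layers.Dense(256, activation="relu")(x)                 # fully connected + ReLU
factors = layers.Dense(n_factors)(x)                        # regress the track's CF factor vector

model = Model(spec_in, factors)
model.compile(optimizer="adam", loss="mse")   # targets come from a separately trained CF model
```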
Wide and Deep Learning
Deep Learning for Recommendations
Combines the strengths of shallow linear
models with deep learning models
– Used for Google Play app store recommendations
– Linear model captures sparse feature interactions,
while deep model captures higher-order
interactions
– Sparsity of the deep model handled by using
embeddings
– Both models trained at the same time
Source: Google Research
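A compact Keras sketch of the joint wide-and-deep structure (feature sizes and layer widths are illustrative):

```python
from tensorflow.keras import layers, Model

n_wide_features = 10_000          # e.g. hashed cross-product features (illustrative)
n_items, emb_dim = 5_000, 32

wide_in = layers.Input(shape=(n_wide_features,), name="sparse_cross_features")
item_in = layers.Input(shape=(1,), name="item_id")

# wide part: a linear model over sparse feature interactions
wide = layers.Dense(1)(wide_in)

# deep part: embeddings handle sparsity, MLP captures higher-order interactions
deep = layers.Flatten()(layers.Embedding(n_items, emb_dim)(item_in))
deep = layers.Dense(64, activation="relu")(deep)
deep = layers.Dense(32, activation="relu")(deep)
deep = layers.Dense(1)(deep)

# both parts are combined and trained jointly on the same objective
out = layers.Activation("sigmoid")(layers.Add()([wide, deep]))
model = Model([wide_in, item_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```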
DeepFM
Deep Learning for Recommendations
Wide and Deep architecture aiming to
leverage strengths of Factorization Machines
for the linear component
– Models trained together; the FM and deep components share the
same input embeddings
– More flexible handling of mixed real / categorical
variables
– Reduces need for manually engineering interaction
features
Source: Guo, Tang, et al
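A sketch of the distinguishing piece: the FM second-order interaction computed from the same embeddings the deep component uses, via the standard (sum squared minus sum-of-squares)/2 identity. Sizes are illustrative, and the first-order linear term is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

n_fields, vocab_size, emb_dim = 10, 100_000, 16     # illustrative categorical fields / vocabulary

feat_in = layers.Input(shape=(n_fields,), dtype="int32", name="feature_ids")
emb = layers.Embedding(vocab_size, emb_dim)(feat_in)    # (batch, n_fields, emb_dim), shared with deep part

def fm_interaction(v):
    # second-order FM term: 0.5 * sum_k [ (sum_f v_fk)^2 - sum_f v_fk^2 ]
    sum_sq = tf.square(tf.reduce_sum(v, axis=1))
    sq_sum = tf.reduce_sum(tf.square(v), axis=1)
    return 0.5 * tf.reduce_sum(sum_sq - sq_sum, axis=1, keepdims=True)

fm_term = layers.Lambda(fm_interaction)(emb)

# deep component reuses exactly the same embeddings
deep = layers.Flatten()(emb)
deep = layers.Dense(64, activation="relu")(deep)
deep = layers.Dense(1)(deep)

out = layers.Activation("sigmoid")(layers.Add()([fm_term, deep]))
model = Model(feat_in, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```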
Content2Vec
Deep Learning for Recommendations
Specialize content embeddings for
recommendation
– Combine modular sets of feature extractors into
one item embedding
– e.g. CNN-based model for images (AlexNet)
– e.g. Word2Vec, sentence CNN, RNN for text
– e.g. Prod2Vec for embedding collaborative filtering
(co-occurrences)
– Modules mostly pre-trained in some form
– Final training step then similar to “transfer
learning”
– Use pair-wise item similarity metric (loss)
Source: Criteo Research
Session-based recommendation
Deep Learning for Recommendations
Apply the advances in sequence modeling
from deep learning
– RNN architectures trained on the sequence of user
events in a session (e.g. products viewed,
purchased) to predict next item in session
– Adjustments for domain
• Item encoding (1-of-N, weighted average)
• Parallel mini-batch processing
• Ranking losses – BPR, TOP1
• Negative item sampling per mini-batch
– Report 20-30% accuracy gain over baselines
Source: Hidasi, Karatzoglou, Baltrunas, Tikk
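A minimal Keras sketch of the core idea: a GRU over the session's item sequence predicting the next item. This uses plain softmax cross-entropy rather than the BPR/TOP1 ranking losses of the paper, and all sizes are illustrative.

```python
import numpy as np
from tensorflow.keras import layers, Model

n_items, emb_dim, max_session_len = 5_000, 64, 20     # illustrative sizes

seq_in = layers.Input(shape=(max_session_len,), name="session_item_ids")
x = layers.Embedding(n_items, emb_dim, mask_zero=True)(seq_in)   # id 0 reserved for padding
x = layers.GRU(100)(x)                                           # session representation
next_item = layers.Dense(n_items, activation="softmax")(x)       # score every item as the next one

model = Model(seq_in, next_item)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# toy data: predict the next item from each (padded) session prefix
sessions = np.random.randint(1, n_items, size=(256, max_session_len))
next_items = np.random.randint(1, n_items, size=(256,))
model.fit(sessions, next_items, epochs=1, batch_size=64)
```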
Contextual Session-based models
Deep Learning for Recommendations
Add contextual data to the RNN architecture
– Context included time, time since last event, event
type
– Combine context data with input / output layer
– Also combine context with the RNN layers
– About 3-6% improvement (in Recall@10 metric)
over simple RNN baseline
– Importantly, model is even better at predicting
sales (vs view, add to cart events) and at predicting
new / fresh items (vs items the user has already
seen)
Source: Smirnova, Vasile
Challenges
Deep Learning for Recommendations
Challenges particular to recommendation
models
– Data size and dimensionality (input & output)
– Extreme sparsity
• Embeddings & compressed representations
– Wide variety of specialized settings
– Combining session, content, context and
preference data
– Model serving is difficult – ranking, large number of
items, computationally expensive
– Metrics – model accuracy and its relation to real-
world outcomes and behaviors
– Need for standard, open, large-scale, datasets that
have time / session data and are content- and
context-rich
– Evaluation – watch your baselines!
• When Recurrent Neural Networks meet the
Neighborhood for Session-Based Recommendation
Future Directions
Deep Learning for Recommendations
Most recent and future directions in research
& industry
– Improved RNNs
• Cross-session models
• Further research on contextual models, as well
as content and metadata
– Combine sequence and historical models (long-
and short-term user behavior)
– Improving and understanding user and item
embeddings
– Applications at scale
• Dimensionality reduction techniques (e.g. feature hashing, Bloom embeddings for large input/output spaces; see the hashing sketch after this list)
• Compression / quantization techniques
• Distributed training
• Efficient model serving for complex
architectures
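A minimal sketch of the feature-hashing idea mentioned above, mapping an unbounded sparse feature vocabulary into a fixed-size input vector (the bucket count is illustrative):

```python
import numpy as np

def hashed_features(tokens, num_buckets=2**18):
    """Map an unbounded sparse feature vocabulary into a fixed-size vector via
    the hashing trick, so the input layer does not grow with the vocabulary.
    (In practice use a stable hash such as MurmurHash so the mapping is
    consistent across runs; Python's built-in hash is per-process.)"""
    vec = np.zeros(num_buckets, dtype=np.float32)
    for tok in tokens:
        vec[hash(tok) % num_buckets] += 1.0
    return vec

x = hashed_features(["user:123", "item:42", "device:mobile", "country:DE"])
print(x.shape, x.sum())   # (262144,) 4.0
```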
Summary
Deep Learning for Recommendations
DL for recommendation is just getting started
(again)
– Huge increase in interest and research papers; already many new models and approaches
– DL approaches have generally yielded incremental
% gains
• But that can translate to significant $$$
• More pronounced in e.g. session-based
– Cold start scenarios benefit from multi-modal
nature of DL models
– Flexibility of DL frameworks helps a lot
– Benefits from advances in DL for images, video,
NLP
– Open-source libraries appearing (e.g. Spotlight)
– Check out DLRS workshops & tutorials @ RecSys 2016 / 2017, and the upcoming workshop in Oct 2018
Thank you
codait.org
twitter.com/MLnick
github.com/MLnick
developer.ibm.com/code
FfDL
Sign up for IBM Cloud and try Watson Studio!
https://datascience.ibm.com/
MAX
Links & References
Wikipedia: Perceptron
Amazon’s Item-to-item Collaborative Filtering Engine
Matrix Factorization for Recommender Systems (Netflix Prize)
Deep Learning for Recommender Systems Workshops @ RecSys
Deep Learning for Recommender Systems Tutorial @ RecSys 2017
Fashion MNIST Dataset
Deep Content-based Music Recommendation
Google’s Wide and Deep Learning Model
Spotlight: Recommendation models in PyTorch
Visualizing and Understanding Convolutional Networks
Stanford CS231n Convolutional Neural Networks for Visual Recognition
Restricted Boltzmann Machines for Collaborative Filtering
Specializing Joint Representations for the task of Product Recommendation
Session-based Recommendations with Recurrent Neural Networks
Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks
Deep Neural Networks for YouTube Recommendations
Ranking loss with Keras
3D Convolutional Networks for Session-based Recommendation with Content Features