Recommender Systems from A to Z
Part 1: The Right Dataset
Part 2: Model Training
Part 3: Model Evaluation
Part 4: Real-Time Deployment
1. Introduction
Optimization problem, linear regression and Stochastic Gradient Descent (SGD)
2. Baseline models
Global average, user average and item-item models
3. Basic linear models
Least Squares (LS) and Regularized Least Squares (RLS)
4. Matrix factorization
Matrix Factorization: analytical and numerical solutions
5. Non-linear models
Basic and complex Deep Learning models
Introduction
Model training – Introduction
Explicit vs Implicit feedback

                   | Explicit feedback (users' ratings) | Implicit feedback (users' clicks)
Example Domains    | Movies, TV shows, music            | Marketplaces, businesses
Example Data type  | Like/Dislike, stars                | Clicks, play-time, purchases
Complexity         | Clean, costly, easy to interpret   | Noisy, cheap, difficult to interpret
Model training – Introduction
Recommendation engine types

Recommendation engine
├── Content-based
├── Collaborative filtering
│   ├── Memory-based: Item-Item, User-User
│   └── Model-based: User-Item
└── Hybrid engine

Model          | When?               | Solution strategies
Content-based  | Item cold start     | Least Squares, Deep Learning
Item-Item      | n_users >> n_items  | Affinity Matrix
User-User      | n_users << n_items  | KNN, Affinity Matrix
User-Item      | Better performance  | Matrix Factorization, Deep Learning
Model training – Introduction - Optimization
Optimization problem (definitions)

R ∈ R^(m×n): sparse matrix of ratings, with m users and n items (the available dataset)
U ∈ R^(m×k): dense matrix of user embeddings (unknown)
I ∈ R^(k×n): dense matrix of item embeddings (unknown)

Each row of R holds one user's ratings, from the ratings of User #1 down to the ratings of User #m for Items #1 to #n; each row of U is the embedding of one user, and each column of I is the embedding of one item, so that R ≈ U·I.
Model training – Introduction - Optimization
Optimization problem (basic formulation with RMSE)
Our goal is to find U and I such that the difference between each known rating in R and the product of the corresponding user and item embeddings is minimal:

$\min_{U, I} \sqrt{\tfrac{1}{|\mathcal{R}|} \sum_{(u,i) \in \mathcal{R}} \left( r_{ui} - U_u \cdot I_i \right)^2}$

where $\mathcal{R}$ is the set of observed (user, item) ratings.
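As a minimal NumPy sketch of this objective (the triple-list representation of the observed ratings is illustrative, not the deck's code):

import numpy as np

def rmse(observed, U, I):
    # observed: list of (user, item, rating) triples; U is m×k, I is k×n
    errors = [r - U[u] @ I[:, i] for u, i, r in observed]
    return np.sqrt(np.mean(np.square(errors)))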
Model training – Introduction - Optimization
Optimization problem (more complex formulation)

Content-based: $\min_{U} \| R - U I \|^2$, where I (the available data) holds the item features.
Content-based with Regularization: $\min_{U} \| R - U I \|^2 + \lambda \| U \|^2$, where the $\lambda$ term is regularization to avoid overfitting.

Take home
● In content-based models we already know I (the item features)
● We can find a linear solution to this problem using Least Squares
Model training – Introduction - Optimization
Optimization problem (more complex formulation)

Collaborative-filtering: $\min_{U, I} \| R - U I \|^2$, where only the observed ratings in R are the available data.
Collaborative-filtering with Regularization: $\min_{U, I} \| R - U I \|^2 + \lambda \left( \| U \|^2 + \| I \|^2 \right)$, where the $\lambda$ terms are regularization to avoid overfitting.

Take home
● In collaborative-filtering we want to find both U and I (the user and item embeddings)
● We can find a linear solution to this problem using Matrix Factorization and SGD
Model training – Introduction - Optimization
How to analytically solve an optimization problem?
Let's start with the simplest optimization problem: linear regression without regularization.
With $X \in \mathbb{R}^{m \times n}$, $y \in \mathbb{R}^{m}$ and $m > n$, we want to find $W$ such that:

$\min_{W} \| y - X W \|^2 \quad \Rightarrow \quad W = (X^\top X)^{-1} X^\top y$

(Add a column of ones to X to support $w_0$; the targets in $y$ are scalar numbers.)
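As a minimal NumPy sketch (the data here is made up for illustration), the closed-form solution can be computed without explicitly inverting $X^\top X$:

import numpy as np

# Made-up regression problem: m=100 samples, n=3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = np.hstack([np.ones((100, 1)), X])        # add a column of ones to support w0
w_true = np.array([0.5, 2.0, -1.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=100)  # noisy scalar targets

# Solve the normal equation (X^T X) w = X^T y; never compute the inverse
w = np.linalg.solve(X.T @ X, X.T @ y)
# Equivalent and more numerically robust:
# w, *_ = np.linalg.lstsq(X, y, rcond=None)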
Model training – Introduction - Optimization
How to numerically solve an optimization problem?
Gradient descent: start with random values for W and repeatedly move in the opposite direction of the gradient of the cost J(W):

$W \leftarrow W - \alpha \nabla J(W)$

Stochastic Gradient Descent performs the same update, but estimates the gradient by taking just one sample at a time.
Model training – Introduction - Optimization

Gradient Descent algorithm
for epoch in n_epochs:
● compute the predictions for all the samples
● compute the error between truth and predictions
● compute the gradient using all the samples
● update the parameters of the model

Stochastic Gradient Descent algorithm
for epoch in n_epochs:
● shuffle the samples
● for sample in n_samples:
○ compute the predictions for the sample
○ compute the error between truth and predictions
○ compute the gradient using the sample
○ update the parameters of the model

Mini-Batch Gradient Descent algorithm
for epoch in n_epochs:
● shuffle the batches
● for batch in n_batches:
○ compute the predictions for the batch
○ compute the error for the batch
○ compute the gradient for the batch
○ update the parameters of the model
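As a minimal NumPy sketch of the mini-batch variant applied to the linear regression above (toy hyper-parameters are assumptions):

import numpy as np

def minibatch_gd(X, y, lr=0.01, n_epochs=100, batch_size=16, seed=0):
    """Mini-batch gradient descent for least-squares linear regression."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])          # start with random values for W
    for epoch in range(n_epochs):
        idx = rng.permutation(len(y))        # shuffle the samples into batches
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            error = X[batch] @ w - y[batch]             # error between predictions and truth
            grad = 2 * X[batch].T @ error / len(batch)  # gradient for the batch
            w -= lr * grad                              # move against the gradient
    return w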
Model training – Introduction - Optimization
Gradient Descent comparison

                   | Gradient Descent         | Stochastic Gradient Descent | Mini-Batch Gradient Descent
Gradient           | Exact, over all samples  | Estimated from one sample   | Estimated over one batch
Speed              | Very fast (vectorized)   | Slow (sample by sample)     | Fast (vectorized)
Memory             | O(dataset)               | O(1)                        | O(batch)
Convergence        | Needs more epochs        | Needs fewer epochs          | Between GD and SGD
Gradient Stability | Smooth parameter updates | Noisy parameter updates     | Between GD and SGD
Model training – Introduction - Optimization
A Problem with Implicit Feedback
With datasets containing only unary positive feedback (e.g. click history), every observed entry is positive, so the model never sees a negative example to discriminate against.

Negative Sampling
Common fix: add randomly drawn (user, item) pairs with rating r=0, sampled from a uniform distribution over the dataset.
● Expresses "unknown items" for users
● Acts as a regularizer
● Also works for explicit feedback
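A minimal NumPy sketch of uniform negative sampling (the rejection loop and function name are illustrative, not the deck's exact code):

import numpy as np

def add_negative_samples(rows, cols, n_users, n_items, n_neg, seed=0):
    """Draw random (user, item) pairs not in the positive set; label them r=0."""
    rng = np.random.default_rng(seed)
    positives = set(zip(rows.tolist(), cols.tolist()))
    neg_rows, neg_cols = [], []
    while len(neg_rows) < n_neg:
        u = int(rng.integers(n_users))   # user drawn from a uniform distribution
        i = int(rng.integers(n_items))   # item drawn from a uniform distribution
        if (u, i) not in positives:      # keep only unobserved pairs
            neg_rows.append(u)
            neg_cols.append(i)
    return np.array(neg_rows), np.array(neg_cols), np.zeros(n_neg)  # ratings = 0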
Baseline models
Model Training – Baseline models
Introduction
● Before starting to train models, always compute a baseline
● Baselines are very useful for debugging more complex models
● As a general rule:
○ Very basic models can't capture all the details in the training data and tend to underfit
○ Very complex models capture every detail in the training data and tend to overfit
● Note: throughout this presentation we will use RMSE to compare model performance
Model Training – Baseline models
Global Average
Average = 3.64
Predict 3.64 for every validation rating.
RMSE = sqrt(mean((2 - 3.64)^2 + (1 - 3.64)^2 + …)) = sqrt(4.13)
Model Training – Baseline models
Global average - Numpy code
import numpy as np
from scipy.sparse import csr_matrix

rows = np.array([0,0,0,1,1,2,2,2,2,3,3,3,4,4,5,5,5])
cols = np.array([0,1,5,3,5,0,1,2,4,0,3,5,0,2,1,3,4])
data = np.array([2,5,4,1,5,2,4,5,4,4,5,1,5,2,1,4,2])
ratings = csr_matrix((data, (rows, cols)), shape=(6, 6))

# 80/20 train/validation split over the observed ratings
idx = np.random.permutation(data.size)
idx_train = idx[0:int(idx.size*0.8)]
idx_valid = idx[int(idx.size*0.8):]

# Predict the training-set mean for every validation rating
global_avg = data[idx_train].mean()
rmse = np.sqrt(((data[idx_valid] - global_avg)**2).mean())
Model Training – Baseline models
User average
Average u1 = 4.50, u2 = 5.00, u3 = 3.67, u4 = 2.50, u5 = 5.00, u6 = 2.50
Predict each user's own average for their validation ratings.
RMSE = sqrt(mean((2 - 4.5)^2 + (1 - 5.0)^2 + …)) = sqrt(6.15)
Model Training – Baseline models
User average - Numpy code
rows = np.array([0,0,0,1,1,2,2,2,2,3,3,3,4,4,5,5,5])
cols = np.array([0,1,5,3,5,0,1,2,4,0,3,5,0,2,1,3,4])
data = np.array([2,5,4,1,5,2,4,5,4,4,5,1,5,2,1,4,2])
ratings = csr_matrix((data, (rows, cols)), shape=(6, 6))

idx = np.random.permutation(data.size)
idx_train = idx[0:int(idx.size*0.8)]
idx_valid = idx[int(idx.size*0.8):]
ratings_train = csr_matrix((data[idx_train], (rows[idx_train], cols[idx_train])), shape=(6, 6))

# Per-user average over the training ratings only
count_per_row = (ratings_train > 0).sum(axis=1).A1
sum_per_row = ratings_train.sum(axis=1).A1.astype('float32')
user_avg = sum_per_row / count_per_row   # NaN if a user has no training ratings

# Index data/rows with the same permutation so predictions and truths stay aligned
rmse = np.sqrt(((data[idx_valid] - user_avg[rows[idx_valid]])**2).mean())
Model Training – Baseline models
Item-Item
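The original slide presents this model only as a figure; below is a minimal sketch of one common item-item approach, a cosine-similarity affinity matrix (the exact weighting is an assumption, not the slide's method):

import numpy as np

def item_item_predict(R):
    """Score items via a cosine-similarity affinity matrix built from R."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-9
    sim = (R / norms).T @ (R / norms)        # item-item cosine affinity matrix
    # Predict each user's score for every item as a similarity-weighted
    # average of the ratings that user has already given.
    weights = np.abs(sim).sum(axis=0, keepdims=True) + 1e-9
    return R @ sim / weights

R = np.array([[2., 5., 0., 0., 0., 4.],
              [0., 0., 0., 1., 0., 5.]])    # toy dense ratings (0 = unknown)
print(item_item_predict(R).round(2))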
Basic linear models
Model Training – Basic linear models
Content Based - Standard Least Squares model
● Goal: a very basic linear model
● Data: the matrix of item features I (may be sparse)
● Pre-processing: use PCA to reduce the dimension of I
● Solve: $\min_{U} \| R - U I \|^2$
● Solution is Least Squares: $U = R I^\top (I I^\top)^{-1}$

Never compute the inverse!
(1) Use numpy: numpy.linalg.solve(I @ I.T, I @ R.T)
(2) Use Cholesky decomposition: (I @ I.T) is a positive definite matrix!
Model Training – Basic linear models
Content Based - Regularized Least Squares model
● Goal: avoid overfitting
● Method: Tikhonov Regularization (a.k.a. Ridge Regression)
● Solve: $\min_{U} \| R - U I \|^2 + \lambda \| U \|^2$
● Solution is Regularized Least Squares: $U = R I^\top (I I^\top + \lambda \mathbb{1})^{-1}$, with $\mathbb{1}$ the identity matrix
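A minimal NumPy sketch of both solutions on toy data (shapes and the λ value are illustrative):

import numpy as np

rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(20, 8)).astype(float)   # toy ratings: 20 users x 8 items
I = rng.normal(size=(4, 8))                          # toy item features: 4 features x 8 items

# Least Squares: solve (I I^T) U^T = I R^T instead of inverting
U_ls = np.linalg.solve(I @ I.T, I @ R.T).T

# Regularized Least Squares (Tikhonov / ridge) with lambda = 0.1
lam = 0.1
U_rls = np.linalg.solve(I @ I.T + lam * np.eye(I.shape[0]), I @ R.T).T

print(np.abs(R - U_rls @ I).mean())   # reconstruction error on the toy data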
Matrix factorization
Model Training – Matrix Factorization
Matrix Factorization
● If we don't have I, we need to use Matrix Factorization techniques to find a linear solution to our problem.
● Now we want to solve the following optimization problem:

$\min_{U, I} \| R - U I \|^2 + \lambda \left( \| U \|^2 + \| I \|^2 \right)$

Solutions:
● Analytical: SVD
● Numerical: ALS, SGD
Model Training – Matrix Factorization
Matrix Factorization - Graphical interpretation
The sparse m×n ratings matrix R is approximated by the product of two thin dense factors: U (m×k, user embeddings) times I (k×n, item embeddings), i.e. $R \approx U I$.
Model Training – Matrix Factorization
Analytical solution - Singular Value Decomposition (SVD)
● Optimal Solution
● Closed Form, readily available in scikit-learn
● O(n^3) algorithm, does not scale
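For instance, a sketch using scikit-learn's TruncatedSVD on the `ratings` csr_matrix from the baseline slides (note it treats missing entries as zeros, a known caveat of applying plain SVD to ratings):

from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=2, random_state=0)
U = svd.fit_transform(ratings)     # (m, k) user embeddings
I = svd.components_                # (k, n) item embeddings
R_hat = U @ I                      # dense low-rank reconstruction of R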
Model Training – Matrix Factorization
Numerical solution - Alternating Least Square (ALS)
Initialize: U and I with small random values
Iterate: fix I and solve the least-squares problem for U, then fix U and solve for I
● Solving least squares is easy
● Scales to big datasets
● Distributed implementations are available (e.g. on Spark)
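A minimal dense NumPy sketch of ALS (it treats R as fully observed, which is a simplification; production ALS solves per-row over the observed entries only):

import numpy as np

def als(R, k=2, lam=0.1, n_iters=20, seed=0):
    """Alternate two regularized least-squares solves until convergence."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    I = rng.normal(scale=0.1, size=(k, n))
    for _ in range(n_iters):
        # Fix I, solve for U:  (I I^T + lam*Id) U^T = I R^T
        U = np.linalg.solve(I @ I.T + lam * np.eye(k), I @ R.T).T
        # Fix U, solve for I:  (U^T U + lam*Id) I = U^T R
        I = np.linalg.solve(U.T @ U + lam * np.eye(k), U.T @ R)
    return U, I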
Model Training – Matrix Factorization
Numerical solution - Stochastic Gradient Descent (SGD)
We are using SGD -> one sample at a time. For each observed rating $r_{ui}$, compute the error $e_{ui} = r_{ui} - U_u \cdot I_i$ and update:

$U_u \leftarrow U_u + \alpha (e_{ui} I_i - \lambda U_u)$
$I_i \leftarrow I_i + \alpha (e_{ui} U_u - \lambda I_i)$

(The original slides show the resulting training curve over 100 epochs.)
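A minimal NumPy sketch of these updates, looping over the observed ratings one sample at a time (hyper-parameters are illustrative):

import numpy as np

def sgd_mf(rows, cols, data, shape, k=2, lr=0.01, lam=0.1, n_epochs=100, seed=0):
    """SGD matrix factorization over the observed ratings only."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(shape[0], k))
    I = rng.normal(scale=0.1, size=(k, shape[1]))
    for _ in range(n_epochs):
        for s in rng.permutation(data.size):     # one sample at a time
            u, i, r = rows[s], cols[s], data[s]
            e = r - U[u] @ I[:, i]               # prediction error for this sample
            U[u] += lr * (e * I[:, i] - lam * U[u])
            I[:, i] += lr * (e * U[u] - lam * I[:, i])  # uses the fresh U[u], a common simplification
    return U, I

# Reusing the toy arrays from the baseline slides:
# U, I = sgd_mf(rows, cols, data, shape=(6, 6))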
Non-linear models
Model Training – Non-linear models
Simple Deep Learning model for collaborative filtering
Model Training – Complex Deep Learning models
More complex Deep Learning model for collaborative filtering
Model Training – Complex Deep Learning models
Training with Deep Learning
● Use a Deep Learning framework (e.g. PyTorch, TensorFlow)
● ...or at least an analytical-gradient library (e.g. Theano, Chainer)
● Acceleration heuristics (e.g. AdaGrad, Nesterov, RMSProp, Adam, NAdam)
● DropOut / BatchNorm
● Watch out for sparse momentum updates! Most Deep Learning frameworks don't support them
● Hyper-parameter optimization and architecture search (e.g. Gaussian Processes)
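A minimal PyTorch sketch of the simple embedding model (the dot-product-plus-biases architecture is an assumption, since the original slides show it only as diagrams):

import torch
import torch.nn as nn

class MFNet(nn.Module):
    """Learned user and item embeddings combined by a dot product plus biases."""
    def __init__(self, n_users, n_items, k=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, k)
        self.item_emb = nn.Embedding(n_items, k)
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)

    def forward(self, users, items):
        dot = (self.user_emb(users) * self.item_emb(items)).sum(dim=1)
        return dot + self.user_bias(users).squeeze(1) + self.item_bias(items).squeeze(1)

model = MFNet(n_users=6, n_items=6)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
users = torch.tensor([0, 0, 1])
items = torch.tensor([0, 1, 3])
ratings = torch.tensor([2., 5., 1.])
for _ in range(100):                  # minimize MSE on the observed ratings
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(users, items), ratings)
    loss.backward()
    opt.step()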
Conclusions
Model Training – Conclusions
Conclusions

                 | Global Avg | User Avg | Item-Item      | Linear      | Linear + Reg    | Matrix Fact    | Deep Learning
Domains          | Baseline   | Baseline | users >> items | Known "I"   | Known "I"       | Unknown "I"    | Extra datasets
Model Complexity | Trivial    | Trivial  | Simple         | Linear      | Linear          | Linear         | Non-linear
Time Complexity  | +          | +        | +++            | ++++        | ++++            | ++++           | ++
Overfit/Underfit | Underfit   | Underfit | May Underfit   | May Overfit | May Perform Bad | May Overfit    | Can Overfit
Hyper-Params     | 0          | 0        | 0              | 1           | 2               | 2–3            | many
Implementation   | Numpy      | Numpy    | Numpy          | Numpy       | Numpy           | LightFM, Spark | NNet libraries
Model Training – Conclusions
Take home
● Always start with the simplest, stupidest models
● Spend time on simple interpretable models to debug your codebase and clean your data
● Gradually increase the complexity of your models
● Add more regularization as soon as a complex model performs worse than a simpler model
Questions
Thank YOU!
