Recommendation System
with Machine Learning and Deep Learning
Ding Li 2021.11
Charu C. Aggarwal, Recommender Systems: The Textbook,
Springer Publishing Company, Incorporated, 2016
In the age of Machine Learning
Goals of Recommendation Systems, Rating Types
Operational & Technical Goals
• Relevance (most important)
• Novelty
• Serendipity (surprising)
• Diversity
Business Goals
• Improve user satisfaction
• Improve user loyalty
• Increase sales
• Provide insights into users’ needs
• Help customize the user experience further
Rating Types (from most to least expressive)
• Continuous
• Interval-based
• Ordinal
• Binary
• Unary (interaction or not)
Prediction Types
• Rating value of a user-item combination
• Top-k items or top-k users
Basic Models of Recommendation Systems
Collaborative Filtering Models
• Use the collaborative power of the ratings provided by multiple users to make recommendations
• Observed ratings are often highly correlated across various users and items
Content-Based Recommender Systems
• Descriptive attributes of items are used to make recommendations
• Ratings and buying behavior of users are combined with the content
information available in the items
Knowledge-Based Recommender Systems
• Allow the users to explicitly specify what they want
• Based on the similarities between customer requirements and item
descriptions, or the use of constraints specifying user requirements.
• Constraint-based vs Case-based
• Conversational vs Search-based vs Navigation-based
Demographic Recommender Systems
• Map specific demographics to ratings or buying propensities
• Combined with additional context to guide the recommendation
Context-Based Recommender Systems
• Time, Location, Social (structural recommendation)
Hybrid and Ensemble-Based Recommender Systems
Generalization of classification/regression modeling in
which the prediction is performed in entry-wise fashion
rather than row-wise fashion.
Neighborhood-Based Collaborative Filtering Models
User-based collaborative filtering
• Similar users display similar patterns of rating behavior
• Predict using the ratings of neighboring users
• Provide diverse recommendations
Item-based collaborative filtering
• Similar items receive similar ratings
• Predict using the user’s own ratings on neighboring items
• Provide relevant recommendations
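Mean rating of user u, where I_u is the set of items rated by u:
\mu_u = \frac{\sum_{k \in I_u} r_{uk}}{|I_u|}
Mean-centered rating:
s_{uj} = r_{uj} - \mu_u
Prediction function, where P_u(j) is the set of closest users to target user u who have specified ratings for item j:
\hat{r}_{uj} = \mu_u + \frac{\sum_{v \in P_u(j)} \mathrm{Sim}(u, v) \cdot s_{vj}}{\sum_{v \in P_u(j)} |\mathrm{Sim}(u, v)|}
Example: select users (1, 2) as peer group to predict user 3’s ratings on item 1 and 3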
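The user-based prediction function can be sketched in a few lines of NumPy. This is a minimal illustration, not the deck's implementation: the ratings matrix R is hypothetical, the names pearson and predict_user_based are made up for this example, and keeping only positively correlated peers is an assumption.

```python
import numpy as np

# Hypothetical ratings (rows: users, columns: items); np.nan marks a missing rating.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [np.nan, 4.0, 1.0, 2.0],
    [1.0, 2.0, 5.0, 4.0],
])

def pearson(R, u, v):
    """Pearson similarity over co-rated items, centered on each user's overall mean."""
    both = ~np.isnan(R[u]) & ~np.isnan(R[v])
    if both.sum() < 2:
        return 0.0
    su = R[u, both] - np.nanmean(R[u])
    sv = R[v, both] - np.nanmean(R[v])
    denom = np.sqrt((su ** 2).sum() * (sv ** 2).sum())
    return float(su @ sv / denom) if denom > 0 else 0.0

def predict_user_based(R, u, j, k=2):
    """Predict r_uj from the k most similar users who have rated item j."""
    mu = np.nanmean(R, axis=1)
    candidates = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, j])]
    sims = {v: pearson(R, u, v) for v in candidates}
    # Assumption: only positively correlated users are allowed into the peer group.
    peers = sorted((v for v in candidates if sims[v] > 0),
                   key=lambda v: sims[v], reverse=True)[:k]
    if not peers:
        return float(mu[u])  # fall back to the user's mean rating
    num = sum(sims[v] * (R[v, j] - mu[v]) for v in peers)
    den = sum(abs(sims[v]) for v in peers)
    return float(mu[u] + num / den)

prediction = predict_user_based(R, u=2, j=0)
```

Here the two users whose tastes match user 2 both rated item 0 above their own means, so the prediction lands above user 2's mean rating.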
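Adjusted cosine similarity between the items (columns) i and j. Each row of the ratings matrix is first centered to a mean of zero, s_{ui} is the mean-centered rating of user u for item i, and U_i is the set of users who rated item i:
\mathrm{AdjustedCosine}(i, j) = \frac{\sum_{u \in U_i \cap U_j} s_{ui} \cdot s_{uj}}{\sqrt{\sum_{u \in U_i \cap U_j} s_{ui}^2} \cdot \sqrt{\sum_{u \in U_i \cap U_j} s_{uj}^2}}
Prediction function, where Q_t(u) is the set of top-k matching items to item t that are rated by user u:
\hat{r}_{ut} = \frac{\sum_{j \in Q_t(u)} \mathrm{AdjustedCosine}(j, t) \cdot r_{uj}}{\sum_{j \in Q_t(u)} |\mathrm{AdjustedCosine}(j, t)|}
Example: items (2, 3) are similar to item 1; items (4, 5) are similar to item 6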
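A rough sketch of item-based prediction with the adjusted cosine follows. The matrix R and the function names are hypothetical, and restricting the neighborhood to positively similar items is an assumption made for this illustration.

```python
import numpy as np

# Hypothetical ratings (rows: users, columns: items); np.nan marks a missing rating.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [np.nan, 4.0, 1.0, 2.0],
    [1.0, 2.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

def adjusted_cosine(R, i, j):
    """Cosine between item columns i and j after centering each user row at zero."""
    S = R - np.nanmean(R, axis=1, keepdims=True)
    both = ~np.isnan(S[:, i]) & ~np.isnan(S[:, j])
    if not both.any():
        return 0.0
    a, b = S[both, i], S[both, j]
    denom = np.sqrt((a ** 2).sum()) * np.sqrt((b ** 2).sum())
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_item_based(R, u, t, k=2):
    """Predict r_ut from user u's own ratings on the items most similar to t."""
    rated = [j for j in range(R.shape[1]) if j != t and not np.isnan(R[u, j])]
    sims = {j: adjusted_cosine(R, j, t) for j in rated}
    # Assumption: only positively similar items are used as neighbors.
    top = sorted((j for j in rated if sims[j] > 0),
                 key=lambda j: sims[j], reverse=True)[:k]
    if not top:
        return float(np.nanmean(R[u]))  # fall back to the user's mean rating
    num = sum(sims[j] * R[u, j] for j in top)
    den = sum(abs(sims[j]) for j in top)
    return float(num / den)

prediction = predict_item_based(R, u=2, t=0)
```

In this toy matrix only item 1 is positively similar to item 0, so the prediction for user 2 reduces to that user's own rating of item 1.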
Neighborhood-Based Collaborative Filtering Optimization
Impact of the Long Tail
Some movies may be very popular, and they may repeatedly occur as
commonly rated items by different users. Such ratings can sometimes
worsen the quality of the recommendations because they tend to be less
discriminative across different users.
If m_j is the number of ratings of item j, and m is the total number of users, then the weight w_j of item j is:
w_j = \log\left(\frac{m}{m_j}\right)
Less popular items receive larger weights.
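As a small illustration of this weighting, the sketch below computes w_j for three hypothetical items. The function name and the counts are made up for the example.

```python
import math

def inverse_user_frequency(ratings_per_item, num_users):
    """w_j = log(m / m_j); items with no ratings get weight 0 by convention here."""
    return [math.log(num_users / m_j) if m_j > 0 else 0.0 for m_j in ratings_per_item]

# Hypothetical counts: a blockbuster, a mid-tail item, and a niche item among 1000 users.
weights = inverse_user_frequency([1000, 100, 10], num_users=1000)
```

An item rated by every user gets weight 0, so it contributes nothing to a weighted similarity, while the niche item gets the largest weight.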
Clustering and Neighborhood-Based Methods
• The users or items are divided into k clusters first
• Top-k closest peers within the same cluster are used to perform prediction
• Computation is significantly more efficient
• The m x n matrix is incomplete, so only the subset of specified dimensions is used in the distance calculation
Dimensionality Reduction
• Principal Component Analysis (PCA)
Rating matrix: R (m x n) → R’ (m x d), d ≪ n
• Singular Value Decomposition (SVD)
Fill in the missing values in R (col/row average) → R_f
S = R_f^T R_f: n × n similarity matrix between pairs of items
S = P \Delta P^T
P (n x n): columns contain the orthonormal eigenvectors of S
∆: diagonal matrix containing the non-negative eigenvalues of S along its diagonal
P_d (n x d): contains only the columns of P corresponding to the d largest eigenvalues
R_f (m x n) is represented by R_f P_d (m x d): each user is represented in a d-dimensional space
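The reduction steps can be sketched as follows. This is an illustration under stated assumptions: missing entries are filled with column (item) means, the matrix R is hypothetical, and reduce_users is a name invented for this example.

```python
import numpy as np

def reduce_users(R, d):
    """Project each user onto the top-d eigenvectors of the item-item similarity matrix."""
    col_mean = np.nanmean(R, axis=0)
    Rf = np.where(np.isnan(R), col_mean, R)      # fill missing values with item means
    S = Rf.T @ Rf                                # n x n similarity between items
    eigvals, P = np.linalg.eigh(S)               # eigh: S is symmetric; ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:d]          # indices of the d largest eigenvalues
    Pd = P[:, top]                               # n x d basis
    return Rf @ Pd, Pd                           # m x d user representation

# Hypothetical 4 x 4 ratings matrix with two missing entries.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [np.nan, 4.0, 1.0, 2.0],
    [1.0, 2.0, 5.0, 4.0],
])
users_2d, Pd = reduce_users(R, d=2)
```

Neighbors can then be searched in the dense 2-dimensional space instead of the sparse 4-dimensional one.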
Filling missing values with the mean introduces errors
• The maximum likelihood estimate of the covariance between each pair of items is computed from only the specified entries.
• The incomplete matrix R, rather than the filled matrix R_f, can be projected directly onto the reduced basis P_d.
• Direct Matrix Factorization of Incomplete Data
R = Q \Sigma P^T
Q (m x m): columns contain the orthonormal eigenvectors of R R^T
P (n x n): columns contain the orthonormal eigenvectors of R^T R
Σ (m x n): only the diagonal entries are nonzero; they contain the square roots of the nonzero eigenvalues of R R^T (or equivalently R^T R)
The squared error of factorization can be optimized only over the
observed entries of the ratings matrix.
A Regression Modeling View of Neighborhood Methods
Use the observed ratings in the matrix to set up a least-squares optimization
problem over the unknown values of w in order to minimize the overall error
Graph Models for Neighborhood-Based Methods
Defining Neighborhoods with Random Walks
In the case of the Pearson’s correlation coefficient, two users need to be connected
directly to a set of common items for the neighborhood to be defined meaningfully. In
sparse user-item graphs, such direct connectivity may not exist for many nodes. On the
other hand, a random-walk method also considers indirect connectivity, because a walk
from one node to another may use any number of steps.
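Katz measure: the weighted number of walks between a pair of nodes
\mathrm{Katz}(i, j) = \sum_{t=1}^{\infty} \beta^t \cdot n_{ij}^{(t)}
K = \sum_{t=1}^{\infty} (\beta A)^t = (I - \beta A)^{-1} - I
β: discount factor (the series converges when β is smaller than the inverse of the largest eigenvalue of A)
A: adjacency matrix
The Katz measure is used to compute the affinity between pairs of users.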
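A minimal sketch of the Katz measure in closed matrix form is given below. The three-node path graph and the function name katz_matrix are hypothetical, chosen only to show that directly connected nodes score higher than nodes two steps apart.

```python
import numpy as np

def katz_matrix(A, beta):
    """K = sum_{t>=1} (beta * A)^t, computed as (I - beta * A)^{-1} - I."""
    radius = np.max(np.abs(np.linalg.eigvals(A)))
    if beta * radius >= 1:
        raise ValueError("beta must be smaller than 1 / (largest eigenvalue of A)")
    identity = np.eye(A.shape[0])
    return np.linalg.inv(identity - beta * A) - identity

# Hypothetical undirected path graph: 0 - 1 - 2.
A = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
K = katz_matrix(A, beta=0.1)

# Cross-check against a truncated version of the series.
series = sum(np.linalg.matrix_power(0.1 * A, t) for t in range(1, 60))
```

Nodes 0 and 2 share no edge, yet K[0, 2] is positive because walks of length two and longer still connect them. This is the indirect connectivity that Pearson-style neighborhoods miss in sparse graphs.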
User-User Graphs
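Horting: a user u is said to hort user v at level (F, G) if either of the following is true, where I_u is the set of items rated by u:
|I_u \cap I_v| \geq F
|I_u \cap I_v| / |I_u| \geq G
Predictability: user v predicts user u if u horts v and there exists a linear transformation function f(·) such that the following is true for a threshold U:
\frac{\sum_{k \in I_u \cap I_v} |r_{uk} - f(r_{vk})|}{|I_u \cap I_v|} \leq U
Let f_1 ... f_r represent the sequence of linear transformations along the directed path starting from node u to this user v. The rating of user u for item k predicted through v is:
\hat{r}_{uk}^{(v)} = (f_1 \circ f_2 \circ \ldots \circ f_r)(r_{vk})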
Item-Item Graphs
The weights on edges correspond to random-walk probabilities.
n_{ij}^{(t)}: the number of walks of length t between nodes i and j (used in the Katz measure)
Rule-Based & Naïve Bayes Collaborative Filtering
Support of an itemset X ⊆ I is the fraction of transactions in T of which X is a subset.
• If the support of an itemset is at least equal to a predefined threshold s, then the itemset is said to be frequent.
• This threshold is referred to as the minimum support.
Example: {Bread, Butter, Milk} and {Fish, Beef, Ham} each have a support of 2/7.
Confidence of the rule X → Y: the conditional probability that a transaction in T contains Y, given that it also contains X:
\mathrm{conf}(X \rightarrow Y) = \frac{\mathrm{sup}(X \cup Y)}{\mathrm{sup}(X)}
Example rule: {Bread, Milk} → {Butter}
Association Rules
A rule X → Y is said to be an association rule at a minimum support of s and minimum confidence of c, if the following two conditions are satisfied:
• The support of X ∪ Y is at least s.
• The confidence of X → Y is at least c.
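Support, confidence, and the association-rule test can be sketched directly from these definitions. The five transactions below are hypothetical and are not the seven-transaction table from the slide; the function names are invented for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, X, Y):
    """sup(X ∪ Y) / sup(X); defined as 0 when X never occurs."""
    sup_x = support(transactions, X)
    return support(transactions, set(X) | set(Y)) / sup_x if sup_x > 0 else 0.0

def is_association_rule(transactions, X, Y, min_support, min_confidence):
    """True when X -> Y meets both the minimum support and the minimum confidence."""
    return (support(transactions, set(X) | set(Y)) >= min_support
            and confidence(transactions, X, Y) >= min_confidence)

# Hypothetical market-basket data.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk", "Eggs"},
    {"Fish", "Beef"},
    {"Eggs", "Milk"},
]
sup = support(transactions, {"Bread", "Butter", "Milk"})
conf = confidence(transactions, {"Bread", "Milk"}, {"Butter"})
```

In rule-based collaborative filtering, a rule such as {Bread, Milk} → {Butter} that passes both thresholds would be fired for users whose baskets contain the left-hand side.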
Rule-Based Collaborative Filtering Naïve Bayes Collaborative Filtering
Latent Factor Model and Matrix Factorization
Factorization is a general way of
approximating a matrix when it is prone
to dimensionality reduction because of
correlations between columns (or rows).
R (m x n) ≈ U V^T, with U (m x k), V (n x k), k ≪ min(m, n)
Approximation error: \|R - U V^T\|^2
The key usefulness of the approach arises when
the matrix R is not fully specified, but one can still
robustly estimate all entries of the latent factors U
and V, respectively.
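Unconstrained Matrix Factorization
Minimize J = \frac{1}{2} \|R - U V^T\|^2 with no constraints on U and V.
In the context of a matrix with missing entries, let S be the set of observed (i, j) pairs:
J = \frac{1}{2} \sum_{(i,j) \in S} e_{ij}^2, \quad e_{ij} = r_{ij} - \sum_{s=1}^{k} u_{is} v_{js}
Stochastic Gradient Descent
Let u_i be the ith row of U and v_j be the jth row of V. For each observed entry (i, j), with step size α:
u_i \Leftarrow u_i + \alpha e_{ij} v_j
v_j \Leftarrow v_j + \alpha e_{ij} u_i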
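The stochastic gradient descent updates over observed entries can be sketched as below. The ratings matrix, the hyperparameters (k, alpha, epochs), and the function names are assumptions for illustration, and no regularization is applied.

```python
import numpy as np

def train_mf_sgd(R, k=2, alpha=0.01, epochs=1000, seed=0):
    """Factorize R ≈ U V^T using SGD over the observed (non-NaN) entries only."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    observed = [(i, j) for i in range(m) for j in range(n) if not np.isnan(R[i, j])]
    for _ in range(epochs):
        for idx in rng.permutation(len(observed)):   # visit entries in random order
            i, j = observed[idx]
            e_ij = R[i, j] - U[i] @ V[j]             # error on this single entry
            u_i = U[i].copy()                        # keep old u_i for the v_j update
            U[i] += alpha * e_ij * V[j]
            V[j] += alpha * e_ij * u_i
    return U, V

def observed_rmse(R, U, V):
    """Root-mean-squared error over observed entries."""
    mask = ~np.isnan(R)
    return float(np.sqrt(np.mean((R - U @ V.T)[mask] ** 2)))

# Hypothetical ratings with two missing entries.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [np.nan, 4.0, 1.0, 2.0],
    [1.0, 2.0, 5.0, 4.0],
    [2.0, 1.0, np.nan, 5.0],
])
U, V = train_mf_sgd(R)
baseline = observed_rmse(R, np.zeros((5, 2)), np.zeros((4, 2)))
fitted = observed_rmse(R, U, V)
predicted = U @ V.T   # includes estimates for the two missing entries
```

Because the loss only touches observed entries, the missing ratings never need to be filled in before training. In practice a regularization term is usually added to limit overfitting.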
Incorporating User and Item Biases
\hat{r}_{ij} = o_i + p_j + \sum_{s=1}^{k} u_{is} v_{js}
o_i: general bias of user i
p_j: general bias of item j
In fact, it has been shown that using only the bias variables (i.e., k = 0) can often provide reasonably good rating predictions.
Incorporating Implicit Feedback
Even in cases in which users explicitly rate items, the identity of the items they rate can be viewed as implicit feedback.
R ≈ (U + F Y) V^T
Y (n x k): implicit item-factor matrix
F (m x n): implicit feedback matrix; it provides the linear combination coefficients that create a user-factor matrix F Y from Y
Other Matrix Factorization Methods
Singular Value Decomposition (SVD)
• Columns of U and V are constrained to be mutually orthogonal.
Non-negative Matrix Factorization
• Provides high-level interpretability
Probabilistic Latent Semantic Analysis (PLSA)
• A probabilistic variant of non-negative matrix factorization.
Maximum Margin Factorization
• Borrows ideas from support vector machines to add a maximum margin regularizer to the objective function
• Some of its variants are particularly effective for discrete ratings
Content-Based Recommender Systems
1. Preprocessing and feature extraction
Assume the features are keywords of each item
V (n x d) Feature matrix, n items, d keywords
Feature Representation (can be normalized)
• Unary
• Term Frequency tf(k): count of keyword k in a content item
• Weighted Term Frequency (title counts more than body)
• TF-IDF (Term Frequency - Inverse Document Frequency)
• idf(k) = \log(N / n_k), N: total number of content items, n_k: number of content items containing keyword k
• tf-idf(k) = tf(k) \times \log(N / n_k)
2. Content-based learning of user profiles (can be normalized)
R (m x n)
V normalized first
U (m x d) = R x V User Profile Matrix, m users, d keywords
3. Filtering and recommendation
V’ (n’ x d): feature matrix, n’ testing items, d keywords
R’ (m x n’) = U x V’^T: prediction matrix, m users, n’ testing items
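The three steps above can be sketched end to end with NumPy. Everything concrete here is an assumption for illustration: the keyword counts, the ratings, the use of 0 for "unrated", L2 row normalization as the normalization step, and the helper names.

```python
import numpy as np

def inverse_document_frequency(counts):
    """idf(k) = log(N / n_k) from an (items x keywords) count matrix."""
    N = counts.shape[0]
    n_k = np.count_nonzero(counts, axis=0)
    return np.log(N / np.maximum(n_k, 1))   # guard against keywords that never occur

def l2_normalize_rows(M):
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0, 1.0, norms)

# Step 1: features. Hypothetical keyword counts for 4 training items, d = 4 keywords.
counts_train = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)
idf = inverse_document_frequency(counts_train)
V = l2_normalize_rows(counts_train * idf)          # n x d feature matrix (tf-idf)

# Step 2: user profiles. Hypothetical ratings, 0 meaning "not rated" (an assumption).
R = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 4.0],
])
U = l2_normalize_rows(R @ V)                       # m x d user profile matrix

# Step 3: score unseen items with the same idf weights.
counts_test = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)
V_test = l2_normalize_rows(counts_test * idf)      # n' x d
scores = U @ V_test.T                              # m x n' prediction matrix
```

With both U and V_test row-normalized, each entry of scores is the cosine similarity between a user profile and a test item, so each user's row can be sorted to produce a top-k list.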
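Cosine Similarity
\mathrm{CosineSimilarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}
A = (1, 1, 1, 0)
B = (0, 1, 0, 1)
\|A\| = \sqrt{1^2 + 1^2 + 1^2 + 0^2} = \sqrt{3}
\|B\| = \sqrt{0^2 + 1^2 + 0^2 + 1^2} = \sqrt{2}
A \cdot B = 1 \cdot 0 + 1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1 = 1
\mathrm{CosineSimilarity}(A, B) = \frac{1}{\sqrt{3} \cdot \sqrt{2}} \approx 0.41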
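The worked example can be checked with a few lines of plain Python. The function name cosine_similarity is chosen for this sketch; returning 0 for a zero vector is a convention assumed here.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

A = [1, 1, 1, 0]
B = [0, 1, 0, 1]
similarity = cosine_similarity(A, B)   # 1 / (sqrt(3) * sqrt(2)), about 0.41
```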
Content Similarity (n x n) = V x V^T
R (m x n): rating matrix, m users, n items
Each cell represents the similarity between a user and an item
In the age of Deep Learning
Neural Collaborative Filtering (NCF) He 2017 GitHub
Figure panels: Learning from Implicit Data; Generalized Matrix Factorization (GMF); Neural Collaborative Filtering (Multi-layer Perceptron); Neural Matrix Factorization Model (MLP + GMF)
Hit Ratio (HR@10): whether the held-out test item is ranked in the top 10 among itself and 99 sampled negative items
NDCG: assigns higher scores to hits at top ranks
One negative sample per positive instance is insufficient; the optimal sampling ratio is around 3 to 6.
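The two leave-one-out metrics can be sketched for a single user as follows. The scores are hypothetical, the helper names are invented for the example, and counting ties against the positive item is an assumption.

```python
import math

def rank_of_positive(pos_score, neg_scores):
    """0-based rank of the held-out item among the negatives (ties count against it)."""
    return sum(s >= pos_score for s in neg_scores)

def hit_ratio_at_k(rank, k=10):
    """1 if the held-out item appears in the top k, else 0."""
    return 1.0 if rank < k else 0.0

def ndcg_at_k(rank, k=10):
    """With one relevant item, NDCG reduces to 1 / log2(rank + 2) inside the top k."""
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0

# Hypothetical model scores: one held-out positive and 99 sampled negatives.
pos_score = 0.90
neg_scores = [0.95, 0.92] + [0.50] * 97
rank = rank_of_positive(pos_score, neg_scores)
hr, ndcg = hit_ratio_at_k(rank), ndcg_at_k(rank)
```

Averaging these per-user values over all test users gives the reported HR@10 and NDCG@10. Both metrics register the hit, but NDCG discounts it because two negatives outrank the positive.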
PinSage, Ying 2018
I (2 billion pins) ↔ C (1 billion boards)
Method | Hit-rate
Visual embeddings (4,096 dimensions, from CNN) | 17%
Annotation embeddings (256 dim, title & description -> Word2Vec) | 14%
Combined embeddings (2-layer MLP on visual and annotation embeddings) | 27%
Pixie (random-walk-based, closeness only from graph structure) | -
PinSage (graph convolution with visual and annotation features) | 67%
Hit-rate: probability that positive samples were ranked among the top 500 among the 5M negative samples
Importance pooling: neighbor features are weighted by random-walk similarity during aggregation, leading to a 46% performance gain in offline evaluation metrics.
Curriculum training: the algorithm is fed harder-and-harder examples (ranked by PageRank score) during training, resulting in a 12% performance gain.
A/B tests show 30% to 100% improvements in user engagement across various settings after deploying PinSage.
Neural Graph Collaborative Filtering (NGCF) Wang 2020 GitHub
High-order connectivity contains rich semantics that carry the collaborative signal.
NGCF explicitly incorporates the collaborative signal into the embedding function of model-based CF by leveraging high-order connectivity in the user-item integration graph.
LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
He 2020 GitHub
Feature transformation and nonlinear activation contribute
little to the performance of collaborative filtering
• NGCF-f, which removes the feature transformation matrices W1 and W2.
• NGCF-n, which removes the non-linear activation function σ.
• NGCF-fn, which removes both the feature transformation matrices and
non-linear activation function.
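Light Graph Convolution (LGC): keep neighborhood aggregation only
e_u^{(k+1)} = \sum_{i \in N_u} \frac{1}{\sqrt{|N_u|} \sqrt{|N_i|}} e_i^{(k)}
e_i^{(k+1)} = \sum_{u \in N_i} \frac{1}{\sqrt{|N_i|} \sqrt{|N_u|}} e_u^{(k)}
Layer Combination and Model Prediction
e_u = \sum_{k=0}^{K} \alpha_k e_u^{(k)}, \quad e_i = \sum_{k=0}^{K} \alpha_k e_i^{(k)}, \quad \hat{y}_{ui} = e_u^T e_i
Matrix form, where R (M x N) is the user-item interaction matrix and A is the (M+N) x (M+N) adjacency matrix:
A = \begin{pmatrix} 0 & R \\ R^T & 0 \end{pmatrix}, \quad E^{(k+1)} = D^{-1/2} A D^{-1/2} E^{(k)}
D is an (M+N) x (M+N) diagonal matrix, in which each entry D_ii denotes the number of nonzero entries in the ith row vector of the adjacency matrix A (also called the degree matrix).
Bayesian Personalized Ranking (BPR) loss
L_{BPR} = -\sum_{u=1}^{M} \sum_{i \in N_u} \sum_{j \notin N_u} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda \|E^{(0)}\|^2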
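The propagation and layer combination can be sketched with dense NumPy matrices. This is only a toy forward pass under stated assumptions: the interaction matrix and the random layer-0 embeddings are hypothetical, the layer weights are uniform (α_k = 1/(K+1)), and the BPR term is shown for a single (user, positive item, negative item) triple without the regularizer or any training loop.

```python
import numpy as np

def lightgcn_embeddings(R, E0, num_layers=3):
    """Propagate E0 with the normalized adjacency and average all layers."""
    M, N = R.shape
    A = np.zeros((M + N, M + N))
    A[:M, M:] = R                      # user -> item edges
    A[M:, :M] = R.T                    # item -> user edges
    degree = A.sum(axis=1)             # equals the nonzero count when R is binary
    d_inv_sqrt = np.zeros_like(degree)
    d_inv_sqrt[degree > 0] = degree[degree > 0] ** -0.5
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    layers = [E0]
    for _ in range(num_layers):
        layers.append(A_hat @ layers[-1])   # no feature transform, no nonlinearity
    E = np.mean(layers, axis=0)             # uniform layer combination
    return E[:M], E[M:]

def bpr_loss(user_emb, item_emb, u, i, j):
    """-ln sigmoid(y_ui - y_uj) for one observed item i and one unobserved item j."""
    x = user_emb[u] @ item_emb[i] - user_emb[u] @ item_emb[j]
    return float(np.log1p(np.exp(-x)))

# Hypothetical binary interactions: 3 users x 4 items, embedding size 8.
R = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)
rng = np.random.default_rng(0)
E0 = 0.1 * rng.standard_normal((3 + 4, 8))
user_emb, item_emb = lightgcn_embeddings(R, E0, num_layers=2)
loss = bpr_loss(user_emb, item_emb, u=0, i=0, j=3)
```

The only trainable parameters in this formulation are the layer-0 embeddings E0. A real implementation would use sparse matrices and an automatic-differentiation framework to optimize them.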
In all cases, LightGCN outperforms NGCF by a large margin
Temporal Graph Networks (TGN) for Deep Learning on Dynamic Graphs Rossi 2020
Deep learning on static graphs
Dynamic Graphs
• A node-wise event is represented by vi(t)
• An interaction event between nodes i and j is
represented by a (directed) temporal edge eij(t)
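For each time t, the embedding of the graph nodes is Z(t) = (z_1(t), …, z_{n(t)}(t))
Message Function: m_i(t) = \mathrm{msg}(s_i(t^-), s_j(t^-), \Delta t, e_{ij}(t)), where s_i(t^-) is the memory of node i just before time t
Message Aggregator: \bar{m}_i(t) = \mathrm{agg}(m_i(t_1), \ldots, m_i(t_b))
Memory Updater: s_i(t) = \mathrm{mem}(\bar{m}_i(t), s_i(t^-))
Embedding (aggregation over graph): z_i(t) = \sum_{j \in n_i^k([0, t])} h(s_i(t), s_j(t), e_{ij}, v_i(t), v_j(t))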
“Do not quench your inspiration
and your imagination; do not
become the slave of your
– Vincent van Gogh

Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling

Recommendation system

  • 1. Recommendation System with Machine Learning and Deep Learning Ding Li 2021.11
  • 2. Recommendation in the Age of Machine Learning. Reference: Charu C. Aggarwal, Recommender Systems: The Textbook, Springer Publishing Company, Incorporated, 2016
  • 3. Goals of Recommendation Systems, Rating Types
      Operational & Technical Goals: • Relevance (most important) • Novelty • Serendipity (surprising) • Diversity
      Business Goals: • Improve user satisfaction • Improve user loyalty • Increase sales • Provide insights into users’ needs • Help customize the user experience further
      Rating Types (listed from more to less expressive): • Continuous • Interval-based • Ordinal • Binary • Unary
      Explicit ratings: like/dislike. Implicit ratings: interaction or not.
      Prediction Types: • Rating value of a user-item combination • Top-k items or top-k users
  • 4. Basic Models of Recommendation Systems
      Collaborative Filtering Models: • Use the collaborative power of the ratings provided by multiple users to make recommendations • Observed ratings are often highly correlated across various users and items • A generalization of classification/regression modeling in which the prediction is performed in entry-wise fashion rather than row-wise fashion
      Content-Based Recommender Systems: • Descriptive attributes of items are used to make recommendations • Ratings and buying behavior of users are combined with the content information available in the items
      Knowledge-Based Recommender Systems: • Allow the users to explicitly specify what they want • Based on the similarities between customer requirements and item descriptions, or the use of constraints specifying user requirements • Constraint-based vs Case-based • Conversational vs Search-based vs Navigation-based
      Demographic Recommender Systems: • Map specific demographics to ratings or buying propensities • Combined with additional context to guide the recommendation process
      Context-Based Recommender Systems: • Time, Location, Social (structural recommendation)
      Hybrid and Ensemble-Based Recommender Systems
  • 5. Neighborhood-Based Collaborative Filtering Models
      User-based collaborative filtering: • Similar users display similar patterns of rating behavior • Predict using the ratings of neighboring users • Provides diverse recommendations
      Item-based collaborative filtering: • Similar items receive similar ratings • Predict using the user’s own ratings on neighboring items • Provides relevant recommendations
      User-based quantities: the mean rating of user u; I_u, the items rated by u; the mean-centered rating; the prediction function. P_u(j): the closest users to target user u who have specified ratings for item j. Example: select users (1, 2) as the peer group to predict user 3’s ratings on items 1 and 3.
      Item-based quantities: the adjusted cosine similarity between the items (columns) i and j, where each row of the ratings matrix is first centered to a mean of zero. Q_t(u): the top-k matching items to item t, rated by user u. Example: items (2, 3) are similar to item 1; items (4, 5) are similar to item 6.
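The user-based prediction described on this slide can be sketched as follows. This is a minimal illustration, not the author's code: it assumes Pearson correlation over commonly rated items as the similarity and the usual mean-centered weighted average as the prediction function. The function name, the `nan` encoding of missing ratings, and the toy matrix are all hypothetical.

```python
import numpy as np

def predict_user_based(R, u, j, k=2):
    """Predict user u's rating of item j from the k most similar users who rated j.

    R is a users x items array with np.nan marking missing ratings (assumed encoding).
    """
    observed = ~np.isnan(R)
    # Mean rating of each user over the items that user actually rated.
    means = np.array([R[v, observed[v]].mean() for v in range(R.shape[0])])

    def pearson(a, b):
        common = observed[a] & observed[b]
        if common.sum() < 2:
            return 0.0
        x = R[a, common] - means[a]
        y = R[b, common] - means[b]
        denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
        return float(x @ y / denom) if denom > 0 else 0.0

    # Peer group: the k most similar users who have rated item j.
    peers = [(pearson(u, v), v) for v in range(R.shape[0])
             if v != u and observed[v, j]]
    peers = sorted(peers, reverse=True)[:k]
    norm = sum(abs(s) for s, _ in peers)
    if norm == 0:
        return float(means[u])
    # Weighted average of the peers' mean-centered ratings, shifted by u's mean.
    offset = sum(s * (R[v, j] - means[v]) for s, v in peers) / norm
    return float(means[u] + offset)

# Hypothetical toy data: 3 users x 3 items, user 0 has not rated item 2.
R_demo = np.array([[5.0, 3.0, np.nan],
                   [4.0, 2.0, 5.0],
                   [1.0, 5.0, 2.0]])
pred = predict_user_based(R_demo, u=0, j=2, k=1)
```

With `k=1` the single peer is user 1, so the prediction is user 0's mean rating plus user 1's mean-centered rating of item 2.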
  • 6. Neighborhood-Based Collaborative Filtering Optimization
      Impact of the Long Tail: Some movies may be very popular, and they may repeatedly occur as commonly rated items by different users. Such ratings can sometimes worsen the quality of the recommendations because they tend to be less discriminative across different users. If m_j is the number of ratings of item j and m is the total number of users, item j is given a weight w_j that depends on both, so that less popular items receive more weight.
      Clustering and Neighborhood-Based Methods: • The users or items are first divided into k clusters • The top-k closest peers within the same cluster are used to perform the prediction • Computation is significantly more efficient • The m x n matrix is incomplete, so only a subset of dimensions is used in the calculation
      Dimensionality Reduction: rating matrix R (m x n) → R’ (m x d), d ≪ n
      • Principal Component Analysis (PCA): Fill in the missing values in R (column/row average) → R_f. S: n x n similarity matrix between pairs of items. P (n x n): its columns contain the orthonormal eigenvectors of S. ∆: diagonal matrix containing the non-negative eigenvalues of S along its diagonal. P_d (n x d): contains only the columns of P corresponding to the d largest eigenvalues. R_f (m x n) is represented by R_f P_d (m x d), so each user is represented in a d-dimensional space.
      • Singular Value Decomposition (SVD): Q (m x m): its columns contain the orthonormal eigenvectors of R R^T. P (n x n): its columns contain the orthonormal eigenvectors of R^T R. ∑ (m x n): only the diagonal entries are nonzero, and they contain the square roots of the eigenvalues of R^T R (or equivalently R R^T).
      • Filling missing values with the mean introduces errors. Alternatives: the maximum likelihood estimate of the covariance between each pair of items is computed as the covariance between only the specified entries; the incomplete matrix R, rather than the filled matrix R_f, can be directly projected on the reduced matrix P_d.
      • Direct Matrix Factorization of Incomplete Data: the squared error of factorization can be optimized only over the observed entries of the ratings matrix.
      A Regression Modeling View of Neighborhood Methods: Use the observed ratings in the matrix to set up a least-squares optimization problem over the unknown values of w in order to minimize the overall error.
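The long-tail weighting above can be sketched in a few lines. The transcript does not preserve the slide's formula, so this sketch assumes the inverse-user-frequency form w_j = log(m / m_j), which matches the stated behavior (less popular items get more weight). The function name and the toy matrix are hypothetical.

```python
import numpy as np

def inverse_user_frequency(R):
    """Per-item weights w_j = log(m / m_j) (assumed form, see lead-in).

    R is a users x items array with np.nan marking missing ratings.
    Items with no ratings get weight 0 to avoid dividing by zero.
    """
    m = R.shape[0]                        # total number of users
    m_j = (~np.isnan(R)).sum(axis=0)      # number of ratings per item
    weights = np.zeros(R.shape[1])
    rated = m_j > 0
    weights[rated] = np.log(m / m_j[rated])
    return weights

# Hypothetical toy data: item 0 is rated by everyone, item 2 by one user only.
R_demo = np.array([[5.0, 3.0, np.nan],
                   [4.0, np.nan, np.nan],
                   [1.0, 5.0, 2.0],
                   [3.0, np.nan, np.nan]])
w = inverse_user_frequency(R_demo)
```

An item rated by every user gets weight 0 here, so it no longer contributes to a weighted similarity between users.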
  • 7. Graph Models for Neighborhood-Based Methods
      Defining Neighborhoods with Random Walks: In the case of Pearson’s correlation coefficient, two users need to be connected directly to a set of common items for the neighborhood to be defined meaningfully. In sparse user-item graphs, such direct connectivity may not exist for many nodes. A random-walk method, on the other hand, also considers indirect connectivity, because a walk from one node to another may use any number of steps.
      Katz measure: the weighted number of walks between a pair of nodes. n_ij^(t): the number of walks of length t between nodes i and j. β: discount factor. A: adjacency matrix. The Katz measure is used to compute the affinity between pairs of users.
      User-User Graphs: Horting: a user u is said to hort user v at level (F, G) if either of two conditions on the items they have rated holds. Predictability: the user v predicts user u if u horts v and there exists a linear transformation function f(·) that maps v’s ratings close to u’s ratings. Let f_1 ... f_r represent the sequence of linear transformations along the directed path starting from node u to this user v.
      Item-Item Graphs: The weights on edges correspond to random-walk probabilities.
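As a sketch of the Katz measure described above: summing β^t · n_ij^(t) over all walk lengths t ≥ 1 gives the matrix series Σ_t (βA)^t, which has the closed form (I − βA)^(-1) − I when β is smaller than the reciprocal of the largest absolute eigenvalue of A. The closed form is a standard identity rather than something stated on the slide, and the function name and the small graph are hypothetical.

```python
import numpy as np

def katz_matrix(A, beta):
    """Katz affinities K = sum_{t>=1} beta^t A^t = (I - beta*A)^(-1) - I."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))
    if beta * spectral_radius >= 1:
        raise ValueError("beta is too large for the series to converge")
    identity = np.eye(n)
    return np.linalg.inv(identity - beta * A) - identity

# Hypothetical path graph 0 - 1 - 2: nodes 0 and 2 are only indirectly connected.
A_demo = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
K = katz_matrix(A_demo, beta=0.1)
```

Nodes 0 and 2 share no edge, yet `K[0, 2]` is positive because walks of length two and longer connect them. This is the indirect connectivity that Pearson-style neighborhoods miss.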
  • 8. Rule-Based & Naïve Bayes Collaborative Filtering
      Support of an itemset X ⊆ I: the fraction of transactions in T of which X is a subset. • If the support of an itemset is at least equal to a predefined threshold s, then the itemset is said to be frequent. • This threshold is referred to as the minimum support. In the slide’s example, {Bread, Butter, Milk} and {Fish, Beef, Ham} each have a support of 2/7.
      Confidence of the rule X → Y: the conditional probability that a transaction in T contains Y, given that it also contains X. Example rule: {Bread, Milk} → {Butter}.
      Association Rules: A rule X → Y is said to be an association rule at a minimum support of s and a minimum confidence of c if the support of X ∪ Y is at least s and the confidence of X → Y is at least c.
      Rule-Based Collaborative Filtering
      Naïve Bayes Collaborative Filtering
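Support and confidence as defined above can be computed directly. This is a small sketch: the seven transactions are hypothetical (the slide's own table is not in the transcript), chosen only so that {Bread, Butter, Milk} has support 2/7 as in the slide's example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """Conditional probability that a transaction contains Y given it contains X."""
    support_x = support(X, transactions)
    if support_x == 0:
        return 0.0
    return support(set(X) | set(Y), transactions) / support_x

def is_association_rule(X, Y, transactions, min_support, min_confidence):
    """Check both conditions: support of X ∪ Y and confidence of X -> Y."""
    return (support(set(X) | set(Y), transactions) >= min_support
            and confidence(X, Y, transactions) >= min_confidence)

# Hypothetical transaction database (not the table from the slide).
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter", "Milk", "Eggs"},
    {"Bread", "Milk"},
    {"Fish", "Beef", "Ham"},
    {"Fish", "Beef", "Ham", "Eggs"},
    {"Eggs", "Milk"},
    {"Bread"},
]
s = support({"Bread", "Butter", "Milk"}, transactions)
c = confidence({"Bread", "Milk"}, {"Butter"}, transactions)
```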
  • 9. Latent Factor Model and Matrix Factorization
      Factorization is a general way of approximating a matrix when it is prone to dimensionality reduction because of correlations between columns (or rows).
      R (m x n) ≈ U V^T, with U (m x k), V (n x k), and k ≪ min{m, n}. The approximation error is measured between R and U V^T.
      The key usefulness of the approach arises when the matrix R is not fully specified, but one can still robustly estimate all entries of the latent factors U and V.
  • 10. Unconstrained Matrix Factorization
      Stochastic Gradient Descent: in the context of a matrix with missing entries, let u_i be the ith row of U and v_j be the jth row of V.
      Regularization
      Incorporating User and Item Biases: o_i: general bias of user i. p_j: general bias of item j. In fact, it has been shown that using only the bias variables (i.e., k = 0) can often provide reasonably good rating predictions.
      Incorporating Implicit Feedback: Even in cases in which users explicitly rate items, the identity of the items they rate can be viewed as implicit feedback. Y (n x k): implicit item-factor matrix. F (m x n): provides the linear combination coefficients to create a user-factor matrix from Y.
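A minimal sketch of stochastic gradient descent for matrix factorization with missing entries and L2 regularization, in the spirit of this slide. The update rule is the standard one for the regularized squared error over observed entries. The learning rate, regularization weight, initialization scale, and toy ratings are assumptions chosen for illustration, and biases and implicit feedback are left out.

```python
import numpy as np

def sgd_matrix_factorization(ratings, m, n, k=2, lr=0.02, reg=0.05,
                             epochs=1000, seed=0):
    """Fit R ≈ U V^T using only the observed (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((m, k))   # user factors, row i is u_i
    V = 0.1 * rng.standard_normal((n, k))   # item factors, row j is v_j
    ratings = list(ratings)
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            i, j, r = ratings[idx]
            error = r - U[i] @ V[j]
            u_old = U[i].copy()              # use the pre-update u_i for v_j's step
            U[i] += lr * (error * V[j] - reg * U[i])
            V[j] += lr * (error * u_old - reg * V[j])
    return U, V

def rmse(ratings, U, V):
    errors = [r - U[i] @ V[j] for i, j, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical observed entries of a 3 x 3 ratings matrix.
observed = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0),
            (1, 2, 2.0), (2, 1, 4.0), (2, 2, 2.0)]
U, V = sgd_matrix_factorization(observed, m=3, n=3)
train_rmse = rmse(observed, U, V)
predicted_missing = float(U[0] @ V[2])   # estimate for an unobserved entry
```

Because the loss only touches observed triples, unobserved entries such as (0, 2) are never filled in beforehand; they are read off the learned factors afterwards.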
  • 11. Other Matrix Factorization Methods
      Singular Value Decomposition (SVD): columns of U and V are constrained to be mutually orthogonal.
      Non-negative Matrix Factorization: provides high-level interpretability.
      Probabilistic Latent Semantic Analysis (PLSA): a probabilistic variant of non-negative matrix factorization.
      Maximum Margin Factorization: • borrows ideas from support vector machines to add a maximum margin regularizer to the objective function and some of its variants • particularly effective for discrete ratings
  • 12. Content-Based Recommender Systems
      1. Preprocessing and feature extraction. Assume the features are keywords of each item. V (n x d): feature matrix, n items, d keywords. Feature representation (can be normalized): • Unary • Term Frequency tf(k): count of keyword k in a content • Weighted Term Frequency (title counts more than body) • TF-IDF (Term Frequency – Inverse Document Frequency): $idf(k) = \log(N / n_k)$, where N is the total number of contents and $n_k$ is the number of contents with keyword k; $\text{tf-idf}(k) = tf(k) \times \log(N / n_k)$
      2. Content-based learning of user profiles (can be normalized). R (m x n): rating matrix, m users, n items. With V normalized first, U (m x d) = R x V: user profile matrix, m users, d keywords.
      3. Filtering and recommendation. V’ (n’ x d): feature matrix, n’ testing items, d keywords. R’ (m x n’) = U x V’^T: prediction matrix, m users, n’ testing items. Each cell represents the similarity between a user and an item.
      Cosine Similarity: $\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$. Example: $A = (1, 1, 1, 0)$, $B = (0, 1, 0, 1)$, so $\|A\| = \sqrt{1^2 + 1^2 + 1^2 + 0^2} = \sqrt{3}$, $\|B\| = \sqrt{0^2 + 1^2 + 0^2 + 1^2} = \sqrt{2}$, $A \cdot B = 1 \cdot 0 + 1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1 = 1$, and $\cos(A, B) = \frac{1}{\sqrt{3} \cdot \sqrt{2}} \approx 0.41$.
      Content Similarity (n x n) = V x V^T
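The three steps on this slide can be sketched with NumPy. This is an illustration under assumptions: keyword counts, ratings, and test-item features are hypothetical, unrated entries are encoded as 0 so that U = R x V ignores them, and the test-item features are given already weighted instead of being derived from raw counts.

```python
import numpy as np

def tf_idf(counts):
    """Weight raw keyword counts (n items x d keywords) by idf(k) = log(N / n_k)."""
    counts = np.asarray(counts, dtype=float)
    N = counts.shape[0]                              # total number of contents
    n_k = np.count_nonzero(counts, axis=0)           # contents containing keyword k
    idf = np.log(N / np.maximum(n_k, 1))
    return counts * idf

def l2_normalize(X):
    """Scale each row to unit length (rows of zeros are left unchanged)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def cosine_similarity(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The worked example from the slide.
sim_ab = cosine_similarity([1, 1, 1, 0], [0, 1, 0, 1])

# Step 1: hypothetical keyword counts for 3 training items and 3 keywords.
counts = np.array([[2, 0, 1],
                   [0, 1, 1],
                   [1, 1, 0]])
V = l2_normalize(tf_idf(counts))

# Step 2: user profiles from a hypothetical ratings matrix (0 = unrated).
R = np.array([[5.0, 0.0, 1.0],
              [0.0, 4.0, 0.0]])
U = R @ V                                            # m users x d keywords

# Step 3: score 2 hypothetical test items against every user profile.
V_new = l2_normalize(np.array([[1.0, 0.0, 1.0],
                               [0.0, 1.0, 0.0]]))
R_pred = U @ V_new.T                                 # m users x n' test items
content_similarity = V @ V.T                         # n x n item-item similarity
```

Since the rows of `V` are unit length, the diagonal of `content_similarity` is 1 and the off-diagonal entries are cosine similarities between items.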
  • 13. Recommendation in the Age of Deep Learning
  • 14. Neural Collaborative Filtering (NCF), He 2017, GitHub
      Models: Generalized Matrix Factorization (GMF); Neural Collaborative Filtering (Multi-layer Perceptron); Neural Matrix Factorization Model (MLP + GMF)
      Learning from Implicit Data: one negative sample per positive instance is insufficient; the optimal sampling ratio is around 3 to 6.
      Hit Ratio (HR): whether the test item is ranked in the top 10 among 99 other negative samples
      NDCG: assigns higher scores to hits at top ranks
  • 15. PinSage, Ying 2018
      Graph: I (2 billion pins) ↔ C (1 billion boards)
      Method                                                                      Hit-rate
      Visual embeddings (4,096 dimensions, from CNN)                              17%
      Annotation embeddings (256 dim, title & description -> Word2Vec)            14%
      Combined embeddings (2-layer MLP on visual and annotation embeddings)       27%
      Pixie (random-walk-based, closeness only from graph structure)              -
      PinSage (graph convolution with visual and annotation features)             67%
      Hit-rate: the probability that positive samples were ranked among the top 500 among the 5M negative samples.
      Importance pooling: based on random-walk similarity to choose positive samples, leading to a 46% performance gain in offline evaluation metrics.
      Curriculum training: the algorithm is fed harder and harder examples (from PageRank score) during training, resulting in a 12% performance gain.
      Pinterest A/B tests show 30% to 100% improvements in user engagement across various settings after deploying PinSage.
  • 16. Neural Graph Collaborative Filtering (NGCF), Wang 2020, GitHub
      High-order connectivity contains rich semantics carrying collaborative signal.
      NGCF explicitly incorporates collaborative signal into the embedding function of model-based CF by leveraging high-order connectivity in the user-item integration graph.
  • 17. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation, He 2020, GitHub
      Finding on NGCF: feature transformation and nonlinear activation contribute little to the performance of collaborative filtering. Ablations: • NGCF-f, which removes the feature transformation matrices W1 and W2 • NGCF-n, which removes the non-linear activation function σ • NGCF-fn, which removes both the feature transformation matrices and the non-linear activation function
      Light Graph Convolution (LGC): keeps only neighborhood aggregation.
      Layer Combination and Model Prediction
      Matrix form: R is the user-item interaction matrix. A is the (M+N) x (M+N) adjacency matrix. D is a (M+N) x (M+N) diagonal matrix (also called the degree matrix), in which each entry D_ii denotes the number of nonzero entries in the ith row vector of the adjacency matrix A.
      Loss: Bayesian Personalized Ranking (BPR) loss
      In all cases, LightGCN outperforms NGCF by a large margin.
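The matrix form above can be sketched in NumPy. This is a dense toy version under stated assumptions: symmetric normalization D^(-1/2) A D^(-1/2), uniform layer weights (a plain average over layers 0..K), and a BPR loss without the regularization term. Function names and inputs are hypothetical, and no training loop is shown.

```python
import numpy as np

def lightgcn_embeddings(R, E0, num_layers=3):
    """Propagate embeddings with neighborhood aggregation only.

    R  : M x N binary user-item interaction matrix.
    E0 : (M+N) x d initial embeddings, users first, then items.
    """
    M, N = R.shape
    A = np.zeros((M + N, M + N))
    A[:M, M:] = R                            # user -> item edges
    A[M:, :M] = R.T                          # item -> user edges
    degree = np.count_nonzero(A, axis=1).astype(float)   # D_ii
    d_inv_sqrt = np.zeros(M + N)
    connected = degree > 0
    d_inv_sqrt[connected] = degree[connected] ** -0.5
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    layers = [np.asarray(E0, dtype=float)]
    for _ in range(num_layers):
        # No feature transformation and no nonlinearity: aggregation only.
        layers.append(A_norm @ layers[-1])
    E = np.mean(layers, axis=0)              # layer combination with equal weights
    return E[:M], E[M:]

def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    """Mean of -log(sigmoid(score_pos - score_neg)) over sampled triples."""
    diff = np.sum(user_emb * (pos_item_emb - neg_item_emb), axis=1)
    return float(np.mean(np.log1p(np.exp(-diff))))

# Hypothetical smallest case: one user who interacted with one item.
users, items = lightgcn_embeddings(np.array([[1.0]]),
                                   np.array([[1.0, 0.0], [0.0, 1.0]]),
                                   num_layers=1)
score = float(users[0] @ items[0])           # model prediction: inner product
```

In practice the adjacency matrix would be sparse and the initial embeddings would be the only trainable parameters, updated by minimizing the BPR loss.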
  • 18. Temporal Graph Networks (TGN) for Deep Learning on Dynamic Graphs, Rossi 2020
      Deep learning on static graphs
      Dynamic Graphs: • A node-wise event is represented by v_i(t) • An interaction event between nodes i and j is represented by a (directed) temporal edge e_ij(t)
      For each time t, the embedding of the graph nodes is Z(t) = (z_1(t), …, z_n(t)(t)).
      Modules: Message Function, Message Aggregator, Memory Updater, Embedding, Aggregation over graph
  • 19. “Do not quench your inspiration and your imagination; do not become the slave of your model.” – Vincent van Gogh