Recommender Systems represent one of the most widespread and impactful applications of predictive machine learning models.
Amazon, YouTube, Netflix, Facebook and many other companies generate a significant share of their revenue thanks to their ability to model and accurately predict users' ratings and preferences.
In this presentation we cover the following points:
→ introduction to recommender systems
→ working with explicit vs implicit feedback
→ content-based vs collaborative filtering approaches
→ user-based and item-item methods
→ machine learning and deep learning models
→ pros & cons of the methods: scalability, accuracy, explainability
2. Francesco Casalegno – Recommender Systems
Problem Statement
● Users cannot evaluate overwhelming numbers of alternatives
○ YouTube: 5 B videos watched every day, 2 B users
○ Amazon: 3 B products across 11 marketplaces, 200 M users per month
● Recommender Systems to the rescue!
○ Predict the rating or preference a user would give to an item
○ Provide users with suggestions for items likely to be of use to them
● Many challenges are involved
○ Accuracy of recommendations
○ Scalability (> 2 B users on YouTube)
○ Serendipity (surprising + fortunate discoveries)
○ Explainability
○ Cold start
○ Interests evolve over time
3. Explicit Feedback: Ratings
● In many cases, users give explicit feedback to some items they viewed/purchased
● We can then define the rating matrix by $r_{ui}$ = rating given by user u to item i
○ The matrix $r_{ui}$ is usually large and sparse, as users view only few items and rate even fewer
○ We denote by K the set of pairs (u, i) s.t. $r_{ui}$ is known, i.e. user u gave a rating to item i
➝ Problem Statement: Predict $r_{ui}$ for unobserved items
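As a minimal sketch of this setup, the sparse rating matrix can be stored as a dictionary keyed by (user, item), so that the set K of known pairs is just its key set. The user names, item names, and ratings below are made up for illustration.

```python
# Hypothetical toy ratings: (user, item) -> rating on a 1-5 scale.
ratings = {
    ("alice", "matrix"): 5.0,
    ("alice", "titanic"): 2.0,
    ("bob", "matrix"): 4.0,
    ("carol", "titanic"): 5.0,
    ("carol", "amelie"): 4.0,
}

users = sorted({u for u, _ in ratings})
items = sorted({i for _, i in ratings})

# K = set of (u, i) pairs for which r_ui is known.
K = set(ratings)

# Pairs whose rating the recommender has to predict.
unobserved = [(u, i) for u in users for i in items if (u, i) not in K]

# Fraction of the full user x item matrix that is actually filled in.
density = len(K) / (len(users) * len(items))

print(f"{len(K)} known ratings, {len(unobserved)} to predict, density {density:.2f}")
```

Storing only the known entries keeps memory proportional to |K| rather than to the number of users times the number of items.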
4. Implicit Feedback: Confidence
● Explicit feedback (rate 1 to 5, like/dislike, …) is not always available, at least not in large quantities
○ But we can use implicit feedback, indirectly reflecting opinions through behavior
○ Examples: purchase history, browsing history, search patterns, mouse movements, …
● Implicit feedback is much more abundant, but also more difficult to use.
○ No negative feedback. A user did not watch a movie: does she dislike it, or does she not even know it exists?
○ Noise. A user searches for a product: he may be buying a gift; he may end up disappointed; …
○ Appreciation vs Confidence. Unlike the explicit case, $r_{ui}$ here measures confidence
→ if you watch/search something many times or for a long time, you probably liked it
■ For how much time did the user watch the show?
■ How many times did the user search for an item?
○ Evaluation metric: the choice is not straightforward
● Example. TV shows, $r_{ui}$ = how many times u fully watched show i
○ $r_{ui}$ = 0.5 → user (got bored?) stopped watching halfway through the show
○ $r_{ui}$ = 2 → user (loved it? fell asleep and played in loop?) watched the show twice
10. Motivation
11. Classes of Recommender Systems
● Recommender Systems
○ Content-Based Filtering
○ Collaborative Filtering
■ Memory-Based: User-Based, Item-Based
■ Model-Based
12. Classes of Recommender Systems
● Recommender Systems
○ Content-Based Filtering: Information Retrieval Methods
○ Collaborative Filtering
■ Memory-Based, User-Based: User-Based k-NN
■ Memory-Based, Item-Based: Item-Based k-NN, Slope One
■ Model-Based: Co-Clustering, SVD, Neural Networks
14. Content-Based VS Collaborative Filtering
Content-Based Filtering
● Recommend items having features similar to those of the items liked by the user in the past
→ extract features + use information retrieval
● Similarity of items is based on discrete features
○ text: word counts / tf-idf
○ movies: “comedy”, “horror”, … tags
○ songs: Music Genome Project attributes (e.g. “aggressive drumming”)
➕ Needs little info on the user to start
➕ Leverages item info (e.g. genre) if available
➖ Proposes items too similar to those liked by the user
➖ Requires describing features for each item
Collaborative Filtering
● Recommend items by assuming that users who agreed in the past will agree in the future
● Tracks and compares user activity
○ explicit: like/dislike, star ratings
○ implicit: viewing times, purchased items
➕ Works without needing any knowledge of items
➕ More variety in recommendations
➖ Cold start: needs much data to become accurate
➖ Shilling attacks
… obviously the two approaches can be combined (hybrid methods)
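The content-based side can be illustrated with a small sketch: items are described by text, turned into tf-idf vectors, and ranked by cosine similarity to an item the user liked. The item names and descriptions are hypothetical, and the idf variant used here, log(N / df), is only one of several common choices.

```python
import math
from collections import Counter

# Hypothetical item descriptions (tags or text), not taken from the slides.
docs = {
    "film_a": "space alien war space",
    "film_b": "alien space ship",
    "film_c": "romantic comedy paris",
}

def tfidf_vectors(docs):
    """Return {item: {term: tf-idf weight}} with idf = log(N / df)."""
    n = len(docs)
    counts = {name: Counter(text.split()) for name, text in docs.items()}
    # Document frequency: in how many items each term appears.
    df = Counter(term for c in counts.values() for term in c)
    return {
        name: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
        for name, c in counts.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = tfidf_vectors(docs)

def recommend(liked, k=1):
    """Rank the other items by similarity to one item the user liked."""
    scores = {o: cosine(vecs[liked], vecs[o]) for o in vecs if o != liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("film_a"))
```

Note that no other user's data is involved: the recommendation depends only on item features and on what this one user liked.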
15. Content-Based VS Collaborative Filtering
[Figure, two panels. Content-Based Filtering: an item liked by user 1 and an item with similar features (taste, ingredients, …) that is recommended to user 1. Collaborative Filtering: user 1 and user 2 liked the same item; another item liked by user 2 is recommended to user 1.]
17. Memory-Based VS Model-Based
Memory-Based
● Uses users’ ratings to compute the similarity between users or items
● Typically uses
○ Similarity (cosine similarity, Pearson correlation)
○ Predict a weighted average of ratings
● Examples
○ k-Nearest Neighbors
○ Slope One
➕ Easy to implement
➕ Explainable results
➖ Scalability issues for sparse data
➖ Slow predictions (has to find similar items/users)
Model-Based
● Uses the given ratings as a training set to fit a model that predicts users' ratings of unrated items
● Typically uses
○ Embedding / dimensionality reduction / matrix factorization
○ Machine Learning models to train & predict
● Examples
○ Co-Clustering
○ SVD
➕ Scales well with sparse data
➕ ML models can capture more complex relations
➕ Fast prediction
➕ Usually better predictions than memory-based
➖ Learning/Training phase required
20. User-Based VS Item-Based
● Memory-Based models predict the rating $r_{ui}$ given by user u to item i in different ways.
○ User-Based models look at users v∊V that are similar to u.
■ $r_{ui}$ prediction is based on ratings $r_{vi}$ given by similar users to the same item
○ Item-Based models look at items j∊J that are similar to i.
■ $r_{ui}$ prediction is based on ratings $r_{uj}$ given by the same user to similar items
[Figure, two panels. Item-based: user u gave $r_{uj}$ = 6.0, 8.0 and 3.0 to three similar items, predicted $r_{ui}$ = 6.5 for item i. User-based: three similar users gave $r_{vi}$ = 5.0, 9.5 and 8.5 to item i, predicted $r_{ui}$ = 8.0 for user u.]
21. k-Nearest Neighbors Method
● Simplest class of methods, based on looking at the ratings of the most similar users/items (the neighbors).
● First, we represent users and items by simply considering rows and columns of the rating matrix:
○ user u is represented by the vector [$r_{u1}$, $r_{u2}$, $r_{u3}$, ...]
○ item i is represented by the vector [$r_{1i}$, $r_{2i}$, $r_{3i}$, ...]
● Then, we compute the similarity between vectors
○ sim(u, v) / sim(i, j) can be: cosine similarity, Pearson’s correlation, …
○ But our vectors have unknown entries! → consider only the indexes where both ratings are known:
sim([?, ?, 4, 5, ?, 2], [?, ?, ?, 3, 4, 1]) ⟶ sim([5, 2], [3, 1])
● Finally, we pick a number of nearest neighbors k and we predict the rating $r_{ui}$ as a similarity-weighted average
(1) $$\hat{r}_{ui} = \frac{\sum_{v \in N^k_u(i)} \mathrm{sim}(u, v) \, r_{vi}}{\sum_{v \in N^k_u(i)} \mathrm{sim}(u, v)}$$
or
(2) $$\hat{r}_{ui} = \frac{\sum_{j \in N^k_i(u)} \mathrm{sim}(i, j) \, r_{uj}}{\sum_{j \in N^k_i(u)} \mathrm{sim}(i, j)}$$
→ The first formula corresponds to the user-based k-NN, the second to the item-based k-NN.
$N^k_u(i)$ = k users most similar to u that have rated item i
$N^k_i(u)$ = k items most similar to i that are rated by user u
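A minimal sketch of the user-based variant follows, assuming cosine similarity computed only on co-rated items and a prediction equal to the similarity-weighted average of the neighbors' ratings. The toy ratings and the function names are illustrative, not taken from the slides.

```python
import math

# Hypothetical toy data: ratings[user][item] = rating.
ratings = {
    "u": {"a": 4.0, "b": 5.0, "c": 2.0},
    "v": {"a": 4.0, "b": 5.0, "c": 2.0, "d": 5.0},
    "w": {"a": 2.0, "b": 1.0, "c": 5.0, "d": 1.0},
    "x": {"a": 5.0, "d": 3.0},
}

def cosine_sim(r_u, r_v):
    """Cosine similarity restricted to the items rated by both users."""
    common = set(r_u) & set(r_v)
    if not common:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in common)
    norm_u = math.sqrt(sum(r_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_based(ratings, u, i, k=2):
    """Weighted average of the ratings of item i by the k users most similar to u."""
    candidates = [
        (cosine_sim(ratings[u], r_v), r_v[i])
        for v, r_v in ratings.items()
        if v != u and i in r_v
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    total_sim = sum(s for s, _ in neighbors)
    if total_sim == 0:
        return None  # no usable neighbor
    return sum(s * r for s, r in neighbors) / total_sim

print(predict_user_based(ratings, "u", "d"))
```

In this toy data user x shares a single item with u, so their cosine similarity is trivially 1. Practical implementations often require a minimum number of co-rated items before trusting a similarity, for this reason.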
22. Slope One Method
● Simple yet powerful item-based method, with good scalability and less prone to overfitting.
● Idea: we could fit a linear model y = ax + b for every pair of items, with x = $r_{uj}$ and y = $r_{ui}$
■ For 1,000 items, that means 2 M coefficients to learn!
■ Prone to overfitting
● So, instead we use a simplified (slope-one) linear regression of the form y = x + b
■ More robust to overfitting
■ Coefficients can be computed very easily, and we get:
$$\hat{r}_{ui} = m_u + \frac{1}{|R_i(u)|} \sum_{j \in R_i(u)} \mathrm{dev}(i, j)$$
$m_u$ = avg rating of user u
$R_i(u)$ = items j rated by u also having at least one common user with i
dev(i, j) = average items deviation = $\frac{1}{|U_{ij}|} \sum_{u \in U_{ij}} (r_{ui} - r_{uj})$
$U_{ij}$ = users having rated both i and j
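The Slope One prediction can be sketched in a few lines, assuming the rule "user average plus mean deviation between item i and the relevant items rated by u", with dev(i, j) the average of $r_{ui} - r_{uj}$ over users who rated both. The toy ratings and function names are hypothetical.

```python
# Hypothetical toy data: ratings[user][item] = rating.
ratings = {
    "ann": {"a": 5.0, "b": 3.0, "c": 2.0},
    "ben": {"a": 3.0, "b": 4.0},
    "cat": {"b": 2.0, "c": 5.0},
}

def dev(ratings, i, j):
    """Average of r_ui - r_uj over users who rated both i and j (None if no such user)."""
    diffs = [r[i] - r[j] for r in ratings.values() if i in r and j in r]
    return sum(diffs) / len(diffs) if diffs else None

def predict_slope_one(ratings, u, i):
    """User average plus the mean deviation of i with respect to the relevant items of u."""
    r_u = ratings[u]
    m_u = sum(r_u.values()) / len(r_u)
    devs = [dev(ratings, i, j) for j in r_u if j != i]
    devs = [d for d in devs if d is not None]  # keeps only the items j in R_i(u)
    if not devs:
        return m_u  # fall back to the user average
    return m_u + sum(devs) / len(devs)

print(predict_slope_one(ratings, "ben", "c"))
```

For ben and item c: ben's average is 3.5, dev(c, a) = -3.0 (from ann only) and dev(c, b) = 1.0 (from ann and cat), so the prediction is 3.5 + (-3.0 + 1.0) / 2 = 2.5.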
24. Co-clustering Method
● Cluster = subset of rows (columns) with similar behavior across the set of all columns (rows)
● Co-cluster = subset of rows + subset of columns, where rows have similar behavior across columns, and vice-versa
[Figure, three panels of the rating matrix $r_{ui}$: item clustering, user clustering, co-clustering]
● We can then base our model on these clusters and predict ratings as
$$\hat{r}_{ui} = C_{ui} + (m_u - C_u) + (m_i - C_i)$$
$C_u$ = avg rating of cluster user u belongs to
$C_i$ = avg rating of cluster item i belongs to
$C_{ui}$ = avg rating of co-cluster user u and item i belong to
$m_u$ = avg rating of user u
$m_i$ = avg rating of item i
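The sketch below evaluates a co-clustering prediction of the form $C_{ui} + (m_u - C_u) + (m_i - C_i)$, the rule used in George and Merugu's framework. It assumes the cluster assignments are already given (learning them is the hard part and is not shown); all names and numbers are illustrative.

```python
# Hypothetical known ratings and pre-computed cluster assignments.
ratings = {
    ("u1", "i1"): 5.0, ("u1", "i2"): 4.0,
    ("u2", "i1"): 4.0, ("u2", "i3"): 2.0,
    ("u3", "i2"): 3.0, ("u3", "i3"): 1.0,
}
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
item_cluster = {"i1": 0, "i2": 0, "i3": 1}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def predict_cocluster(ratings, user_cluster, item_cluster, u, i):
    """Co-cluster average, corrected by user and item offsets from their cluster averages."""
    cu, ci = user_cluster[u], item_cluster[i]
    m_u = mean(r for (v, _), r in ratings.items() if v == u)
    m_i = mean(r for (_, j), r in ratings.items() if j == i)
    C_u = mean(r for (v, _), r in ratings.items() if user_cluster[v] == cu)
    C_i = mean(r for (_, j), r in ratings.items() if item_cluster[j] == ci)
    C_ui = mean(
        r for (v, j), r in ratings.items()
        if user_cluster[v] == cu and item_cluster[j] == ci
    )
    return C_ui + (m_u - C_u) + (m_i - C_i)

print(predict_cocluster(ratings, user_cluster, item_cluster, "u2", "i2"))
```

The sketch assumes every user, item, and co-cluster involved has at least one known rating; a real implementation needs fallbacks (for example the global mean) for empty groups.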
25. Matrix Factorization Methods
● A popular set of methods is based on matrix factorization of the rating matrix X = {$r_{ui}$} ∊ $\mathbb{R}^{n \times m}$
○ Remark: A ∊ $\mathbb{R}^{n \times m}$ has rank ≤ k ↔ A = $Q^T P$ for some Q ∊ $\mathbb{R}^{k \times n}$ and P ∊ $\mathbb{R}^{k \times m}$
○ Remark: A ∊ $\mathbb{R}^{n \times m}$ has SVD decomposition A = $U \Sigma V^T$ and truncated SVD $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$
● In particular, recommender systems focus on these two different low-rank approximations:
○ Truncated SVD: $A_k$ coincides with the solution to the problem ($\|\cdot\|_F$ = Frobenius norm)
$$\min_{Q, P} \| A - Q^T P \|_F$$
This result is known as the Eckart-Young-Mirsky Theorem.
○ NMF (non-negative matrix factorization) is defined as the solution to the problem
$$\min_{Q, P} \| A - Q^T P \|_F \quad \text{s.t. } Q, P \text{ have all coefficients} \ge 0$$
● Idea: factorize the matrix {$r_{ui}$} (with SVD or NMF), then predict $r_{ui}$ as $q_i^T p_u$ … but {$r_{ui}$} has unknown entries!
○ We could fill {$r_{ui}$} with 0 for unknown entries → old approach, not really meaningful…
○ … or instead solve the minimization problem only on known entries → much better!
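The Eckart-Young-Mirsky statement can be checked numerically with NumPy: the rank-k truncated SVD has a Frobenius error equal to the square root of the sum of the discarded squared singular values, and no other rank-k product $Q^T P$ does better. The matrix here is random and fully observed, which is an assumption of this sketch and exactly the condition that rating matrices fail to satisfy.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))   # hypothetical dense matrix, all entries known
k = 2

# Thin SVD, then keep only the k largest singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Error of the truncated SVD vs. the value predicted by the theorem.
err = np.linalg.norm(A - A_k, "fro")
expected = np.sqrt(np.sum(s[k:] ** 2))

# Any other rank-k factorization Q^T P cannot beat the truncated SVD.
Q = rng.normal(size=(k, 6))
P = rng.normal(size=(k, 4))
other_err = np.linalg.norm(A - Q.T @ P, "fro")

print(err, expected, other_err)
```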
26. SVD Method
● One of the most popular methods, equivalent to Probabilistic Matrix Factorization.
● Idea: if we had a representation $x_u$ ∊ $\mathbb{R}^f$ for user u, then we could solve a linear regression problem for each item i:
$$\min_{q_i} \sum_{u : (u, i) \in K} (r_{ui} - q_i^T x_u)^2$$
○ ...so we try to learn representations $q_i$ for items and $p_u$ for users:
$$\min_{q, p} \sum_{(u, i) \in K} \left[ (r_{ui} - q_i^T p_u)^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 \right) \right]$$
○ notice that if λ = 0 and if all ratings $r_{ui}$ are in K (i.e. are known) this is exactly the SVD decomposition!
● How to minimize this loss function?
○ GD (gradient descent) → scalability issues + loss is non-convex!
○ ALS (alternating least squares) → 2-step iterative method, solves 2 convex problems:
1. fix $p_u$, solve optimization problem for $q_i$
2. fix $q_i$, solve optimization problem for $p_u$
● Much of the variation in ratings is due to effects, called biases, associated with users or items.
So, most SVD-based methods modify the model to include item biases $b_i$ and user biases $b_u$:
$$\hat{r}_{ui} = m + b_u + b_i + q_i^T p_u$$
where m is the overall average rating.
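The two ALS steps can be sketched with NumPy as alternating ridge regressions on the known ratings only. This sketch assumes the regularizer sits inside the sum over known pairs, so each vector is penalized by λ times its number of ratings (the weighting used by Zhou et al.); biases are left out for brevity, and the toy data, function names, and hyperparameters are illustrative.

```python
import numpy as np

def als(ratings, n_users, n_items, f=2, lam=0.1, n_iters=20, seed=0):
    """ratings: list of (u, i, r). Returns (P, Q) with rows p_u and q_i in R^f."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, f))
    Q = rng.normal(scale=0.1, size=(n_items, f))
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    def solve(fixed, obs):
        # Ridge regression on the observed entries: (X^T X + lam * n * I) w = X^T y
        if not obs:
            return np.zeros(f)
        idx, y = zip(*obs)
        X = fixed[list(idx)]
        return np.linalg.solve(X.T @ X + lam * len(obs) * np.eye(f), X.T @ np.array(y))

    for _ in range(n_iters):
        for i in range(n_items):   # step 1: fix p_u, solve for q_i
            Q[i] = solve(P, by_item[i])
        for u in range(n_users):   # step 2: fix q_i, solve for p_u
            P[u] = solve(Q, by_user[u])
    return P, Q

def loss(ratings, P, Q, lam):
    """Regularized squared error summed over the known ratings."""
    return float(sum(
        (r - Q[i] @ P[u]) ** 2 + lam * (Q[i] @ Q[i] + P[u] @ P[u])
        for u, i, r in ratings
    ))

# Hypothetical toy ratings: (user index, item index, rating).
data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
        (2, 1, 2), (2, 2, 5), (3, 0, 5), (3, 2, 1)]
P, Q = als(data, n_users=4, n_items=3)
print(loss(data, P, Q, lam=0.1))
```

Each sub-step is an exact minimizer of a convex problem, so the loss never increases from one sweep to the next, even though the joint problem is non-convex.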
27. SVD Method for Implicit Feedback
● For implicit feedback, $r_{ui}$ (e.g. how many times u watched show i) is a measure of confidence
○ Define binary preferences $b_{ui}$ = 1 if $r_{ui}$ > 0, and 0 otherwise
○ Define confidence variables as $c_{ui}$ = 1 + α $r_{ui}$ (typically α = 40)
● The problem is then formulated in terms of trying to predict preferences as
$$\min_{q, p} \sum_{u, i} c_{ui} \, (b_{ui} - q_i^T p_u)^2 + \lambda \Big( \sum_u \|p_u\|^2 + \sum_i \|q_i\|^2 \Big)$$
where the sum runs over all pairs (u, i), not only the observed ones
○ The minimization of the loss function can be efficiently done using ALS
● SVD methods presented here (both explicit and implicit) are very scalable
○ Spark ML uses these two methods to implement recommender systems
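To make the preference/confidence split concrete, here is a small pure-Python sketch that computes $b_{ui}$, $c_{ui} = 1 + \alpha r_{ui}$ and the confidence-weighted loss over all (u, i) pairs, following the formulation of Hu, Koren and Volinsky. The dense list-of-lists layout and the function names are assumptions made for readability; a scalable implementation would exploit sparsity.

```python
ALPHA = 40.0  # value quoted on the slide

def preference(r_ui):
    """Binary preference b_ui: 1 if any interaction was observed, else 0."""
    return 1.0 if r_ui > 0 else 0.0

def confidence(r_ui, alpha=ALPHA):
    """Confidence c_ui = 1 + alpha * r_ui."""
    return 1.0 + alpha * r_ui

def implicit_loss(R, P, Q, lam):
    """R[u][i] = raw implicit counts; P[u], Q[i] = factor vectors (lists of floats)."""
    total = 0.0
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            pred = sum(q * p for q, p in zip(Q[i], P[u]))
            total += confidence(r) * (preference(r) - pred) ** 2
    reg = sum(x * x for p in P for x in p) + sum(x * x for q in Q for x in q)
    return total + lam * reg

# One user who watched show 0 twice and never watched show 1.
R = [[2, 0]]
print(implicit_loss(R, P=[[1.0]], Q=[[1.0], [0.0]], lam=0.0))  # perfect fit
print(implicit_loss(R, P=[[1.0]], Q=[[0.0], [1.0]], lam=0.0))  # both entries wrong
```

Getting the watched show wrong costs 81 times more than getting the unwatched one wrong, which is how the model encodes that a zero is weak evidence of dislike.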
29. Neural Net for Explicit Feedback
● Idea: start by writing the SVD Method as a Neural Net
● How can we improve this network?
○ Learn a generic function (with fully-connected layers) instead of a simple dot-product
○ Include users metadata (age, sex, ...) and items metadata (cost, class, ...) as inputs to the network
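The two architectures can be contrasted with an untrained forward pass in NumPy: the SVD method is two embedding lookups followed by a dot product, while the improved network feeds the concatenated embeddings through fully-connected layers. Sizes, initialization, and the single ReLU hidden layer are arbitrary choices for this sketch; training is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, f, hidden = 5, 7, 4, 8   # hypothetical sizes

user_emb = rng.normal(scale=0.1, size=(n_users, f))   # rows play the role of p_u
item_emb = rng.normal(scale=0.1, size=(n_items, f))   # rows play the role of q_i

def predict_dot(u, i):
    """SVD method seen as a network: two embedding lookups and a dot product."""
    return float(item_emb[i] @ user_emb[u])

# Fully-connected layers replacing the dot product.
W1 = rng.normal(scale=0.1, size=(2 * f, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, 1))
b2 = np.zeros(1)

def predict_mlp(u, i):
    """Concatenate the embeddings and apply a small MLP with one ReLU layer."""
    x = np.concatenate([user_emb[u], item_emb[i]])
    h = np.maximum(0.0, x @ W1 + b1)
    return float((h @ W2 + b2)[0])

print(predict_dot(1, 2), predict_mlp(1, 2))
```

User and item metadata would enter the second model as extra entries appended to the vector x, with W1 enlarged accordingly.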
30. Triplet Loss and Siamese Networks
● Idea: have a Neural Net learn how close a sample is to an anchor
○ Output of NN is a learned distance between anchor and sample: $d_{NN}(a, x)$ (smaller = closer)
○ Train using triplet loss and siamese network:
■ we want $d_{NN}(a, x^+) < d_{NN}(a, x^-)$ for a positive sample $x^+$ and a negative sample $x^-$
■ enforced with a margin α > 0: $d_{NN}(a, x^+) - d_{NN}(a, x^-) + \alpha \le 0$
● Applications
○ Face Recognition
○ Learn to Rate: say x is preferred by a over y if $d_{NN}(a, x) < d_{NN}(a, y)$
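A minimal sketch of the triplet loss, assuming the usual convention that d is a true distance (smaller means closer) and that the loss is the hinge max(0, d(a, x+) - d(a, x-) + α). A plain Euclidean distance stands in for the learned network here; the function names are illustrative.

```python
import math

def euclidean(a, b):
    """Stand-in for the learned distance d_NN."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0, dist=euclidean):
    """Zero once the positive is closer to the anchor than the negative by at least `margin`."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

anchor = (0.0, 0.0)
near = (0.0, 1.0)   # distance 1 from the anchor
far = (3.0, 4.0)    # distance 5 from the anchor

print(triplet_loss(anchor, near, far))   # constraint satisfied
print(triplet_loss(anchor, far, near))   # constraint violated
```

The loss only pushes on triplets that violate the margin, so well-separated triplets contribute no gradient.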
31. Neural Net for Implicit Feedback
● Idea: use a siamese network to learn users’ preferences
○ Train with triplet loss: users prefer shows they watched over shows they have not watched (yet)
■ i+ (positive samples) = shows watched by u
■ i- (negative samples) = shows not watched by u
○ Then, sort all unseen shows using the predicted user-item distance $d_{NN}(u, i)$
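Building the training triplets can be sketched as follows: for each user, a positive is drawn from the shows they watched and a negative from the shows they did not. Uniform random sampling of negatives is an assumption of this sketch (other sampling schemes exist), and the data and function name are hypothetical.

```python
import random

def sample_triplets(watched, all_items, n_per_user=2, seed=0):
    """Return (user, positive item, negative item) triplets from implicit feedback."""
    rng = random.Random(seed)
    triplets = []
    for u, seen in watched.items():
        seen_list = sorted(seen)
        unseen = sorted(set(all_items) - set(seen))
        if not seen_list or not unseen:
            continue  # cannot build a triplet for this user
        for _ in range(n_per_user):
            triplets.append((u, rng.choice(seen_list), rng.choice(unseen)))
    return triplets

# Hypothetical viewing history.
watched = {"u1": {"a", "b"}, "u2": {"c"}}
all_items = ["a", "b", "c", "d"]
triplets = sample_triplets(watched, all_items, n_per_user=3)
print(triplets)
```

Because an unwatched show is not necessarily a disliked one, these negatives are noisy, which is the "no negative feedback" caveat of implicit data.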
32. Neural Net with Hybrid Approach
● Finally, here is a more complex hybrid approach for YouTube recommendations (Covington et al., 2016).
34. Take-Home Messages
1. Understand which kind of data you have
○ explicit feedback (users' ratings) = easy to use, available in limited amounts
○ implicit feedback (users' activities) = difficult to use, available in greater quantity
2. Decide which approach works best in your case
○ content-based = good if you can extract features (e.g. tf-idf), ignores the other users
○ collaborative filtering = leverages all users' ratings, but suffers from cold start and shilling attacks
3. Choose the method considering several factors
○ scalability = working with 1,000 items or with 1 B items is not the same
○ ease of update = adding users and items to the system should not require rebuilding from scratch
○ cold start = for a new user or item, no interactions involving them are available yet
○ accuracy = make relevant recommendations
○ interpretability = “why am I seeing this ad?”
35. References
● Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for YouTube recommendations." Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.
● George, Thomas, and Srujana Merugu. "A scalable collaborative filtering framework based on co-clustering." Fifth IEEE International Conference on Data Mining. IEEE, 2005.
● Grisel, Olivier. Presentation at dotAI Conference, Paris, 2017.
● Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." Eighth IEEE International Conference on Data Mining (ICDM'08). IEEE, 2008.
● Hug, Nicolas. "Surprise, a Python library for recommender systems." (2017).
● Ricci, Francesco, Lior Rokach, Bracha Shapira, and Paul B. Kantor. "Recommender Systems Handbook." (2010).
● Zhou, Yunhong, et al. "Large-scale parallel collaborative filtering for the Netflix Prize." International Conference on Algorithmic Applications in Management. Springer, Berlin, Heidelberg, 2008.