An introduction to recommendation algorithms
Collaborative filtering: how does it work?
Arnaud de Myttenaere
About me
Data Scientist, PhD
Founder of Uchidata
Consultant at Octo Technology, Sydney
Several projects on recommendation algorithms (Viadeo social network, e-commerce, news website, ...)
How do recommendation algorithms work?
Context
Available data
Personal information
Historical behavior
Objective
→ If a young man is a fan of Daft Punk, what’s the best artist we
could recommend to him?
The different approaches
  Model Based
  Memory Based
Collaborative filtering using graph libraries
  Context
  A simple example
  Cosine similarity
  R code
More formally
  Notations
  Similarity function
  Cosine similarity
Conclusion
Model Based approach
1. Build a dataset which summarizes the data

UserId  Like  Gender  Age  Artist       Style    Country
1       1     M       25   Daft Punk    Electro  France
1       0     M       25   Lady Gaga    Pop      USA
2       1     F       20   The Beatles  Rock     UK

(Like: target; Gender, Age: user information; Artist, Style, Country: item information)
2. Learn a model to predict the target variable using your favorite algorithm: Linear Regression, Random Forest, XGBoost, ...
3. For each new customer, apply the model on a set of artists and recommend the ones with the highest scores.
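The three steps above can be sketched in base R. This is only an illustration under stated assumptions: the training rows are made up (they are not the deck's data), only Age and Style are used as features, and a plain linear regression stands in for "your favorite algorithm".

```r
# Hypothetical toy data in the spirit of the table above (values are made up).
styles <- c("Electro", "Pop", "Rock")
train <- data.frame(
  Like  = c(1, 0, 1, 0, 1, 1, 0, 1),
  Age   = c(25, 25, 20, 20, 30, 30, 35, 22),
  Style = factor(c("Electro", "Pop", "Rock", "Electro",
                   "Electro", "Pop", "Rock", "Pop"), levels = styles)
)

# Step 2: learn a model predicting the target (Like) from user and item features.
model <- lm(Like ~ Age + Style, data = train)

# Step 3: score every candidate style for a new 25-year-old customer.
candidates <- data.frame(Age = 25, Style = factor(styles, levels = styles))
scores <- predict(model, newdata = candidates)
names(scores) <- styles

# Candidates ordered from highest to lowest predicted score.
ranking <- names(sort(scores, decreasing = TRUE))
```

The first element of `ranking` would be the recommendation. In a real setting the model, the features and the candidate set would all need to be chosen and validated on actual data.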
Memory Based approach
How to recommend items to a particular customer or user?
For each new customer:
1. Search for similar customers in historical data
2. Recommend popular items among similar customers
Example: Collaborative Filtering.
Summary
Model based
Memory based
Collaborative filtering: why?
Collaborative Filtering algorithms [1]:
are intuitive
are simple to implement
scale relatively well
capture many implicit signals
are hard to beat
But Collaborative Filtering algorithms also have some limitations [1]:
do not scale that well in practice
do not capture temporal signals
do not solve cold start
do not address exploration in the long tail
[1] Criteo Product Recommendation Beyond Collaborative Filtering - welcome to the Twilight Zone!, Olivier Koch, RecSys Meetup London, Sept 20, 2017
Context
Let us consider the following example:
John likes Rock
Mike likes Pop and Electro
Dan likes Pop, R&B and Rock
Lea likes Pop
This information can be loaded in the following dataset:
Customer  Item
John      Rock
Mike      Pop
Mike      Electro
Dan       Pop
Dan       R&B
Dan       Rock
Lea       Pop
Objective: find the best recommendation for Lea.
Graph visualization
The data can be visualized as a (bipartite) graph.
Code : R
Library : igraph
[Figure: bipartite graph linking the customers (John, Mike, Dan, Lea) to the items (Rock, Pop, Electro, R&B)]
    library(igraph)
    d = read.csv("data.csv")                # Load data
    g = graph.data.frame(d)                 # Load data into graph
    V(g)$type <- V(g)$name %in% d$Item      # Set graph as bipartite
    plot(g, layout=layout.bipartite, vertex.color=c("green", "cyan")[V(g)$type+1])
Incidence matrix
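This graph can be represented by a matrix...
\[
\begin{array}{l|cccc}
 & \text{Rock} & \text{Pop} & \text{Electro} & \text{R\&B} \\ \hline
\text{John} & 1 & 0 & 0 & 0 \\
\text{Mike} & 0 & 1 & 1 & 0 \\
\text{Dan}  & 1 & 1 & 0 & 1 \\
\text{Lea}  & 0 & 1 & 0 & 0
\end{array}
\]
    A = get.incidence(g, sparse=TRUE)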
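For readers without igraph, the same incidence matrix can be sketched in base R. This is an alternative to the deck's `get.incidence` call, not the author's method; the data frame is typed inline because `data.csv` itself is not shown in the slides.

```r
# Same seven (Customer, Item) pairs as in the example, built inline.
d <- data.frame(
  Customer = c("John", "Mike", "Mike", "Dan", "Dan", "Dan", "Lea"),
  Item     = c("Rock", "Pop", "Electro", "Pop", "R&B", "Rock", "Pop")
)

# Fix the row/column order so the result matches the matrix displayed above.
customers <- c("John", "Mike", "Dan", "Lea")
items     <- c("Rock", "Pop", "Electro", "R&B")

# table() counts each (Customer, Item) pair, which yields the 0/1 incidence matrix.
A <- unclass(table(factor(d$Customer, levels = customers),
                   factor(d$Item, levels = items)))
```

This produces a dense matrix, which is fine for a toy example; the sparse representation used in the deck is likely preferable for large datasets.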
Incidence matrices
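... or by two matrices:
\[
A_{\text{train}} =
\begin{array}{l|cccc}
 & \text{Rock} & \text{Pop} & \text{Electro} & \text{R\&B} \\ \hline
\text{John} & 1 & 0 & 0 & 0 \\
\text{Mike} & 0 & 1 & 1 & 0 \\
\text{Dan}  & 1 & 1 & 0 & 1
\end{array}
\]
\[
A_{\text{test}} =
\begin{array}{l|cccc}
 & \text{Rock} & \text{Pop} & \text{Electro} & \text{R\&B} \\ \hline
\text{Lea} & 0 & 1 & 0 & 0
\end{array}
\]
    A_train = A[which(rownames(A) != "Lea"), ]
    A_test = A[which(rownames(A) == "Lea"), ]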
If similarity is the number of items in common (1/2)
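The similarity vector is given by:
\[
\text{SimMatrix} = A_{\text{train}} \otimes t(A_{\text{test}})
\]
where ⊗ denotes the matrix product (%*% in R), i.e.
\[
\text{SimMatrix} =
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}
\otimes
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
\begin{array}{l} \text{John} \\ \text{Mike} \\ \text{Dan} \end{array}
\]
Indeed, Lea does not have any item in common with John, but has 1 item in common with Mike and Dan (Pop).
    sim_matrix = A_train %*% A_test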
If similarity is the number of items in common (2/2)
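Then the recommendation scores are given by
\[
\text{scoreMatrix} = t(\text{SimMatrix}) \otimes A_{\text{train}}
\]
i.e.
\[
\text{scoreMatrix} =
\begin{pmatrix} 0 & 1 & 1 \end{pmatrix}
\otimes
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}
\]
So
\[
\text{scoreMatrix} =
\begin{array}{cccc}
\text{Rock} & \text{Pop} & \text{Electro} & \text{R\&B} \\ \hline
1 & 2 & 1 & 1
\end{array}
\]
    scoreMatrix = t(as.matrix(sim_matrix)) %*% A_train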
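The two matrix products above can be checked with a small self-contained base R sketch. The matrices are typed in by hand here rather than derived from the graph, and the variable names are chosen for illustration.

```r
items <- c("Rock", "Pop", "Electro", "R&B")

# Training users (rows) and their items (columns), as in the example.
A_train <- matrix(c(1, 0, 0, 0,
                    0, 1, 1, 0,
                    1, 1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("John", "Mike", "Dan"), items))

# Lea's item vector.
A_test <- setNames(c(0, 1, 0, 0), items)

# Similarity = number of items each training user has in common with Lea.
sim_matrix <- A_train %*% A_test            # 3 x 1 column: John, Mike, Dan

# Scores = similarity-weighted sum of the training users' item vectors.
score_matrix <- t(sim_matrix) %*% A_train   # 1 x 4 row: one score per item
```

Here `sim_matrix` holds 0, 1, 1 and `score_matrix` holds 1, 2, 1, 1, matching the slides.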
Comments
If similarity is the number of items in common...
→ not optimal, since users with a lot of items will be very similar to (almost) every user.
→ hard to use, because it leads to a lot of items with the same recommendation score.
Better similarity metric: cosine similarity
→ Idea: normalize the similarity using the number of items associated to each user.
Cosine similarity (1/3)
Using the same data:
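\[
A_{\text{train}} =
\begin{array}{l|cccc}
 & \text{Rock} & \text{Pop} & \text{Electro} & \text{R\&B} \\ \hline
\text{John} & 1 & 0 & 0 & 0 \\
\text{Mike} & 0 & 1 & 1 & 0 \\
\text{Dan}  & 1 & 1 & 0 & 1
\end{array}
\]
The normalization vector is:
\[
N_{\text{train}} =
\begin{pmatrix} 1 \\ 1/\sqrt{2} \\ 1/\sqrt{3} \end{pmatrix}
\]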
John is associated to 1 item (Rock)
Mike is associated to 2 items (Pop, Electro)
Dan is associated to 3 items
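Cosine similarity (2/3)
Then the normalized similarity matrix is given by:
\[
\text{SimMatrix}_{\text{norm}} =
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & 0 \\ 0 & 0 & 1/\sqrt{3} \end{pmatrix}
\otimes
\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{3} \end{pmatrix}
\approx
\begin{pmatrix} 0 \\ 0.71 \\ 0.58 \end{pmatrix}
\begin{array}{l} \text{John} \\ \text{Mike} \\ \text{Dan} \end{array}
\]
→ Lea is more similar to Mike than to Dan: she has one item in common with each of them, but Mike is associated to fewer items than Dan, so the link with Mike is stronger.
    N_train = 1/sqrt(A_train %*% rep(1, ncol(A)))   # 1/sqrt(number of items per user)
    M_norm = diag(as.vector(N_train))               # Diagonal normalization matrix
    sim_matrix_norm = M_norm %*% sim_matrix         # Normalized similarities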
Cosine similarity (3/3)
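The matrix of scores is given by:
\[
\text{scoreMatrix} = t(\text{SimMatrix}_{\text{norm}}) \otimes A_{\text{train}}
\]
i.e.
\[
\text{scoreMatrix} =
\begin{pmatrix} 0 & 0.71 & 0.58 \end{pmatrix}
\otimes
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}
\]
So
\[
\text{scoreMatrix} =
\begin{array}{cccc}
\text{Rock} & \text{Pop} & \text{Electro} & \text{R\&B} \\ \hline
0.58 & 1.28 & 0.71 & 0.58
\end{array}
\]
(The Pop score is 1/√2 + 1/√3 ≈ 1.28.)
    scoreMatrix = t(as.matrix(sim_matrix_norm)) %*% A_train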
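The normalized version can be reproduced with the following base R sketch. It assumes hand-typed dense matrices instead of the igraph output and uses `rowSums` for the item counts, which is equivalent to multiplying by a vector of ones.

```r
items <- c("Rock", "Pop", "Electro", "R&B")
A_train <- matrix(c(1, 0, 0, 0,
                    0, 1, 1, 0,
                    1, 1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("John", "Mike", "Dan"), items))
A_test <- setNames(c(0, 1, 0, 0), items)   # Lea

# Normalization: 1 / sqrt(number of items) for each training user.
N_train <- 1 / sqrt(rowSums(A_train))
M_norm  <- diag(N_train)

# Normalized similarities between Lea and John, Mike, Dan.
sim_matrix_norm <- M_norm %*% (A_train %*% A_test)

# Recommendation scores, one per item.
score_matrix <- t(sim_matrix_norm) %*% A_train
round(score_matrix, 2)   # approximately 0.58 1.28 0.71 0.58
```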
Comments
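For
\[
A_{\text{test}} =
\begin{array}{l|cccc}
 & \text{Rock} & \text{Pop} & \text{Electro} & \text{R\&B} \\ \hline
\text{Lea} & 0 & 1 & 0 & 0
\end{array}
\]
the recommendation scores are
\[
\text{scoreMatrix} =
\begin{array}{cccc}
\text{Rock} & \text{Pop} & \text{Electro} & \text{R\&B} \\ \hline
0.58 & 1.28 & 0.71 & 0.58
\end{array}
\]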
1. Pop is the best recommendation for Lea, but she is already
associated to Pop.
2. If the objective is to recommend new items, Electro is the
best recommendation for Lea.
3. Rock and R&B have the same score and can be ordered by
frequencies or randomly.
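The three comments above can be turned into a short ranking step. The sketch below is one possible reading: it drops the items Lea already has and breaks ties by item frequency in the training data, which is one of the two options mentioned. Variable names are illustrative.

```r
items <- c("Rock", "Pop", "Electro", "R&B")
A_train <- matrix(c(1, 0, 0, 0,
                    0, 1, 1, 0,
                    1, 1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("John", "Mike", "Dan"), items))
A_test <- setNames(c(0, 1, 0, 0), items)   # Lea

# Scores from the cosine-similarity example.
scores <- setNames(c(1/sqrt(3), 1/sqrt(2) + 1/sqrt(3), 1/sqrt(2), 1/sqrt(3)), items)

# Keep only items Lea is not yet associated to.
new_scores <- scores[A_test == 0]

# Item frequency among training users, used only to break ties.
popularity <- colSums(A_train)[names(new_scores)]

# Sort by decreasing score, then by decreasing frequency.
ranking <- names(new_scores)[order(-new_scores, -popularity)]
ranking   # "Electro" "Rock" "R&B"
```

Rock and R&B have the same score, but Rock appears twice in the training data and R&B once, so Rock is ranked first under this tie-breaking rule.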
R code
Collaborative filtering in 10 lines of code.
    library(igraph)                                           # Load graph library
    g = graph.data.frame(d <- read.csv("data.csv"))           # Read and convert data into graph
    V(g)$type <- V(g)$name %in% d$Item                        # Set graph as bipartite
    A = get.incidence(g, sparse=TRUE)                         # Compute incidence matrix
    A_train = A[which(rownames(A) != "Lea"), ]                # Historical users
    A_test = A[which(rownames(A) == "Lea"), ]                 # Target user
    N_train = 1/sqrt(A_train %*% rep(1, ncol(A)))             # Normalization vector
    M_norm = diag(as.vector(N_train))                         # Diagonal normalization matrix
    sim_matrix_norm = M_norm %*% (A_train %*% A_test)         # Normalized similarities
    scoreMatrix = t(as.matrix(sim_matrix_norm)) %*% A_train   # Recommendation scores
In practice
Can be precomputed and does not need to be updated in real time:
A_train and M_norm
Must be computed in real time:
A_test and scoreMatrix (matrix calculation)
Optimal number of users:
too small → bad performance,
too big → too slow (unless computation is parallelized).
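The split between precomputed and real-time work could be organized as below. This is a sketch under assumptions: the `recommend` function and its arguments are hypothetical names, the matrices are dense, and every training user is assumed to have at least one item (otherwise the normalization would divide by zero).

```r
items <- c("Rock", "Pop", "Electro", "R&B")

# Offline: computed once from historical data and reused for every request.
A_train <- matrix(c(1, 0, 0, 0,
                    0, 1, 1, 0,
                    1, 1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("John", "Mike", "Dan"), items))
M_norm <- diag(1 / sqrt(rowSums(A_train)))

# Online: called per request with the current user's item vector (hypothetical helper).
recommend <- function(a_test, A_train, M_norm, n = 1) {
  sim    <- M_norm %*% (A_train %*% a_test)        # normalized similarities
  scores <- as.vector(t(sim) %*% A_train)          # one score per item
  names(scores) <- colnames(A_train)
  scores <- scores[a_test == 0]                    # keep new items only
  names(sort(scores, decreasing = TRUE))[seq_len(min(n, length(scores)))]
}

lea <- setNames(c(0, 1, 0, 0), items)
recommend(lea, A_train, M_norm)   # "Electro"
```

Only the two matrix products run at request time, so the cost per request grows with the number of rows kept in `A_train`, which is the trade-off described above.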
Collaborative Filtering
Notations
Let $I_u(t)$ be the vector of items associated to a user u at time t:
\[
I_u(t) = (0, 0, \ldots, 1, \ldots, 0)
\]
where the kth coefficient is equal to 1 if item k is associated to user u at time t, and 0 otherwise.
Example: in music recommendation an "item" could be an artist (or a song), and coefficient k is equal to 1 if the user u likes the artist (or song) k.
Collaborative Filtering
Then, for $t' > t$, the collaborative filtering algorithm estimates $I_u(t')$ (the future vector of items associated to the user u) by:
\[
\hat{I}_u(t') = \sum_{v \neq u} \mathrm{sim}(v, u) \cdot I_v(t)
\]
where sim(v, u) represents the similarity between users u and v.
→ The most relevant items for the user u are the ones with the
highest score.
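The sum over all users v ≠ u can be computed for every user at once with two matrix products. The sketch below assumes, for illustration only, that sim(v, u) is the number of items in common; any other similarity matrix could be substituted for `S`.

```r
items <- c("Rock", "Pop", "Electro", "R&B")
A <- matrix(c(1, 0, 0, 0,
              0, 1, 1, 0,
              1, 1, 0, 1,
              0, 1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("John", "Mike", "Dan", "Lea"), items))

# S[u, v] = number of items shared by users u and v (assumed similarity).
S <- A %*% t(A)

# Exclude v = u from the sum.
diag(S) <- 0

# Row u of I_hat = sum over v != u of sim(v, u) * I_v.
I_hat <- S %*% A
I_hat["Lea", ]   # Rock 1, Pop 2, Electro 1, R&B 1
```

The row for Lea matches the scores obtained earlier with the train/test split, since removing her own row and zeroing the diagonal have the same effect.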
Similarity function
The similarity between two users can be defined as the number of items in common. Then
\[
\mathrm{sim}(u, v) = \langle I_u | I_v \rangle
\]
where $\langle \cdot | \cdot \rangle$ is the classical scalar product.
→ not optimal, since users with a lot of items will be very similar to every user.
Cosine similarity
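One can normalize the similarity by the number of items associated to users u and v:
\[
\mathrm{sim}(u, v) = \frac{\langle I_u | I_v \rangle}{\|I_u\| \cdot \|I_v\|}
\]
However, as
\[
\hat{I}_u(t') = \sum_{v \neq u} \frac{\langle I_u | I_v \rangle}{\|I_u\| \cdot \|I_v\|} \cdot I_v
= \frac{1}{\|I_u\|} \cdot \sum_{v \neq u} \frac{\langle I_u | I_v \rangle}{\|I_v\|} \cdot I_v
\]
the factor $1/\|I_u\|$ is the same for every item, so the order of recommendations for the user u is the same as the one obtained with
\[
\mathrm{sim}(u, v) = \frac{\langle I_u | I_v \rangle}{\|I_v\|}
\]
→ In practice we can use $\mathrm{sim}(u, v) = \langle I_u | I_v \rangle / \|I_v\|$, where, for binary vectors, $\|I_v\|$ is the square root of the number of items associated to v.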
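The claim that both similarities give the same ordering can be checked numerically. The sketch below uses the example's training matrix and a hypothetical target user who likes Rock and Pop (not a user from the slides), so that the norm of the target vector differs from 1.

```r
items <- c("Rock", "Pop", "Electro", "R&B")
A_train <- matrix(c(1, 0, 0, 0,
                    0, 1, 1, 0,
                    1, 1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("John", "Mike", "Dan"), items))

# Hypothetical target user u who likes Rock and Pop.
I_u <- setNames(c(1, 1, 0, 0), items)

dot    <- as.vector(A_train %*% I_u)    # <I_u | I_v> for each training user v
norm_v <- sqrt(rowSums(A_train^2))      # ||I_v||
norm_u <- sqrt(sum(I_u^2))              # ||I_u||

# Scores with the full cosine similarity and with the one-sided normalization.
scores_cosine   <- as.vector((dot / (norm_u * norm_v)) %*% A_train)
scores_onesided <- as.vector((dot / norm_v) %*% A_train)

# Both rankings coincide because the scores differ only by the factor 1 / ||I_u||.
order(scores_cosine, decreasing = TRUE)
order(scores_onesided, decreasing = TRUE)
```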
Conclusion
Two different approaches :
Model Based
Memory Based
→ Choose the number of users in Atrain to fit your practical
constraints,
→ The definition of similarity between users can be modified to
consider users and context.
Conclusion
However
This algorithm is based on past behavior, so it never suggests new content.
→ The training set must be refreshed often.
Thanks!

More Related Content

What's hot

Minicourse on Network Science
Minicourse on Network ScienceMinicourse on Network Science
Minicourse on Network SciencePavel Loskot
 
Algorithms - a brief introduction
Algorithms - a brief introductionAlgorithms - a brief introduction
Algorithms - a brief introductionGiacomo Belocchi
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data ScienceAlbert Bifet
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxRuby Shrestha
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.Albert Bifet
 
Network analysis lecture
Network analysis lectureNetwork analysis lecture
Network analysis lectureSara-Jayne Terp
 
Machine Learning - Introduction to Tensorflow
Machine Learning - Introduction to TensorflowMachine Learning - Introduction to Tensorflow
Machine Learning - Introduction to TensorflowAndrew Ferlitsch
 
Novel Machine Learning Methods for Extraction of Features Characterizing Data...
Novel Machine Learning Methods for Extraction of Features Characterizing Data...Novel Machine Learning Methods for Extraction of Features Characterizing Data...
Novel Machine Learning Methods for Extraction of Features Characterizing Data...Velimir (monty) Vesselinov
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsKen Kuroki
 
Solving travelling salesman problem using firefly algorithm
Solving travelling salesman problem using firefly algorithmSolving travelling salesman problem using firefly algorithm
Solving travelling salesman problem using firefly algorithmishmecse13
 
Multi-Armed Bandits:
 Intro, examples and tricks
Multi-Armed Bandits:
 Intro, examples and tricksMulti-Armed Bandits:
 Intro, examples and tricks
Multi-Armed Bandits:
 Intro, examples and tricksIlias Flaounas
 
Contextual Bandit Survey
Contextual Bandit SurveyContextual Bandit Survey
Contextual Bandit SurveySangwoo Mo
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance charlesmartin14
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmKaniska Mandal
 
CF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyVidhya Murali
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsAlbert Bifet
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013MLconf
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationTravis Oliphant
 
Firefly Algorithm, Stochastic Test Functions and Design Optimisation
 Firefly Algorithm, Stochastic Test Functions and Design Optimisation Firefly Algorithm, Stochastic Test Functions and Design Optimisation
Firefly Algorithm, Stochastic Test Functions and Design OptimisationXin-She Yang
 

What's hot (20)

Minicourse on Network Science
Minicourse on Network ScienceMinicourse on Network Science
Minicourse on Network Science
 
Algorithms - a brief introduction
Algorithms - a brief introductionAlgorithms - a brief introduction
Algorithms - a brief introduction
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data Science
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptx
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.
 
Network analysis lecture
Network analysis lectureNetwork analysis lecture
Network analysis lecture
 
Machine Learning - Introduction to Tensorflow
Machine Learning - Introduction to TensorflowMachine Learning - Introduction to Tensorflow
Machine Learning - Introduction to Tensorflow
 
Novel Machine Learning Methods for Extraction of Features Characterizing Data...
Novel Machine Learning Methods for Extraction of Features Characterizing Data...Novel Machine Learning Methods for Extraction of Features Characterizing Data...
Novel Machine Learning Methods for Extraction of Features Characterizing Data...
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and Physics
 
Solving travelling salesman problem using firefly algorithm
Solving travelling salesman problem using firefly algorithmSolving travelling salesman problem using firefly algorithm
Solving travelling salesman problem using firefly algorithm
 
Multi-Armed Bandits:
 Intro, examples and tricks
Multi-Armed Bandits:
 Intro, examples and tricksMulti-Armed Bandits:
 Intro, examples and tricks
Multi-Armed Bandits:
 Intro, examples and tricks
 
Contextual Bandit Survey
Contextual Bandit SurveyContextual Bandit Survey
Contextual Bandit Survey
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning Algorithm
 
CF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At Spotify
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft Presentation
 
Firefly Algorithm, Stochastic Test Functions and Design Optimisation
 Firefly Algorithm, Stochastic Test Functions and Design Optimisation Firefly Algorithm, Stochastic Test Functions and Design Optimisation
Firefly Algorithm, Stochastic Test Functions and Design Optimisation
 

Similar to An introduction to recommendation algorithms using collaborative filtering

Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Erik Bernhardsson
 
Chapter8-Link_Analysis.pptx
Chapter8-Link_Analysis.pptxChapter8-Link_Analysis.pptx
Chapter8-Link_Analysis.pptxAmenahAbbood
 
Chapter8-Link_Analysis (1).pptx
Chapter8-Link_Analysis (1).pptxChapter8-Link_Analysis (1).pptx
Chapter8-Link_Analysis (1).pptxAmenahAbbood
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Charles Martin
 
Page rank
Page rankPage rank
Page rankCarlos
 
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)dnac
 
Perform brute force
Perform brute forcePerform brute force
Perform brute forceSHC
 
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Florent Renucci
 
Anomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsAnomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsCynthia Freeman
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues listsJames Wong
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
StacksqueueslistsFraboni Ec
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsTony Nguyen
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsHarry Potter
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsYoung Alista
 
lecture 1
lecture 1lecture 1
lecture 1sajinsc
 
Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringArthur Mensch
 

Similar to An introduction to recommendation algorithms using collaborative filtering (20)

Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014
 
Chapter8-Link_Analysis.pptx
Chapter8-Link_Analysis.pptxChapter8-Link_Analysis.pptx
Chapter8-Link_Analysis.pptx
 
Chapter8-Link_Analysis (1).pptx
Chapter8-Link_Analysis (1).pptxChapter8-Link_Analysis (1).pptx
Chapter8-Link_Analysis (1).pptx
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3
 
Q
QQ
Q
 
Data Analysis Homework Help
Data Analysis Homework HelpData Analysis Homework Help
Data Analysis Homework Help
 
Page rank
Page rankPage rank
Page rank
 
08 Exponential Random Graph Models (2016)
08 Exponential Random Graph Models (2016)08 Exponential Random Graph Models (2016)
08 Exponential Random Graph Models (2016)
 
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)
 
Perform brute force
Perform brute forcePerform brute force
Perform brute force
 
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
 
Anomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsAnomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language Models
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
lecture 1
lecture 1lecture 1
lecture 1
 
Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filtering
 

Recently uploaded

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 

Recently uploaded (20)

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 

An introduction to recommendation algorithms using collaborative filtering

  • 1. An introduction to recommendation algorithms Collaborative filtering: how does it work? Arnaud de Myttenaere
  • 2. About me: Data Scientist, PhD. Founder of Uchidata. Consultant at Octo Technology, Sydney. Several projects on recommendation algorithms (Viadeo social network, e-commerce, news website, ...).
  • 3. How do recommendation algorithms work?
  • 4. Context. Available data: personal information and historical behavior. Objective → If a young man is a fan of Daft Punk, what's the best artist we could recommend to him?
  • 5. Outline. The different approaches (Model Based, Memory Based). Collaborative filtering using graph libraries (Context, A simple example, Cosine similarity, R code). More formally (Notations, Similarity function, Cosine similarity). Conclusion.
  • 6. Model Based approach
      1. Build a dataset which summarizes the data:
         UserId | Like | Gender | Age | Artist      | Style   | Country
         1      | 1    | M      | 25  | Daft Punk   | Electro | France
         1      | 0    | M      | 25  | Lady Gaga   | Pop     | USA
         2      | 1    | F      | 20  | The Beatles | Rock    | UK
         (Like is the target; Gender and Age are user information; Artist, Style and Country are item information.)
      2. Learn a model to predict the target variable using your favorite algorithm: Linear Regression, Random Forest, XGBoost, ...
      3. For each new customer, apply the model on a set of artists and recommend the ones with the highest scores.
  • 8. Memory Based approach. How to recommend items to a particular customer or user? For each new customer: (1) search for similar customers in historical data; (2) recommend popular items among similar customers. Example: Collaborative Filtering.
  • 10. Collaborative filtering: why? Collaborative Filtering algorithms [1]: are intuitive; are simple to implement; scale relatively well; capture many implicit signals; are hard to beat. [1] Criteo Product Recommendation Beyond Collaborative Filtering - welcome to the Twilight Zone!, Olivier Koch, RecSys Meetup London, Sept 20, 2017.
  • 11. But Collaborative Filtering algorithms also have some limitations [2]: they do not scale that well in practice; do not capture temporal signals; do not solve cold start; do not address exploration in the long tail. [2] Criteo Product Recommendation Beyond Collaborative Filtering - welcome to the Twilight Zone!, Olivier Koch, RecSys Meetup London, Sept 20, 2017.
  • 13. Context. Let us consider the following example: John likes Rock; Mike likes Pop and Electro; Dan likes Pop, R&B and Rock; Lea likes Pop. This information can be loaded in the following dataset:
      Customer | Item
      John     | Rock
      Mike     | Pop
      Mike     | Electro
      Dan      | Pop
      Dan      | R&B
      Dan      | Rock
      Lea      | Pop
      Objective: find the best recommendation for Lea.
  • 14. Graph visualization. The data can be visualized as a (bipartite) graph, with the customers (John, Mike, Dan, Lea) on one side and the items (Rock, Pop, Electro, R&B) on the other. Code: R. Library: igraph.
      library(igraph)
      d = read.csv("data.csv")              # Load data
      g = graph.data.frame(d)               # Load data into graph
      V(g)$type <- V(g)$name %in% d$Item    # Set graph as bipartite
      plot(g, layout=layout.bipartite, vertex.color=c("green", "cyan")[V(g)$type + 1])
  • 15. Incidence matrix. This graph can be represented by a matrix...
             | Rock | Pop | Electro | R&B
      John:  | 1    | 0   | 0       | 0
      Mike:  | 0    | 1   | 1       | 0
      Dan:   | 1    | 1   | 0       | 1
      Lea:   | 0    | 1   | 0       | 0
      A = get.incidence(g, sparse=TRUE)
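For readers without igraph, the same incidence matrix can be sketched in base R. This is an illustration rather than the author's code. It assumes the data frame is typed inline from slide 13 instead of being read from data.csv. Note that `table()` orders rows and columns alphabetically, so the layout differs from the slide.

```r
# Toy data from slide 13, typed inline (assumption: no data.csv on disk)
d <- data.frame(
  Customer = c("John", "Mike", "Mike", "Dan", "Dan", "Dan", "Lea"),
  Item     = c("Rock", "Pop", "Electro", "Pop", "R&B", "Rock", "Pop"),
  stringsAsFactors = FALSE
)

# Cross-tabulate customers against items: 1 if the customer likes the item
A <- unclass(table(d$Customer, d$Item))
print(A)
```

The result is a dense matrix, which is fine for a toy example. A sparse representation such as the one returned by `get.incidence(g, sparse=TRUE)` is likely preferable on real data.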
  • 16. Incidence matrices. ... or by two matrices:
      A_train:
             | Rock | Pop | Electro | R&B
      John:  | 1    | 0   | 0       | 0
      Mike:  | 0    | 1   | 1       | 0
      Dan:   | 1    | 1   | 0       | 1
      A_test:
             | Rock | Pop | Electro | R&B
      Lea:   | 0    | 1   | 0       | 0
      A_train = A[which(rownames(A) != "Lea"), ]
      A_test = A[which(rownames(A) == "Lea"), ]
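  • 18. If similarity is the number of items in common (1/2). The similarity vector is given by $\mathrm{SimMatrix} = A_{\text{train}} \otimes t(A_{\text{test}})$ (here $\otimes$ denotes the matrix product, `%*%` in R), i.e.
      $$\mathrm{SimMatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \begin{matrix} \text{John} \\ \text{Mike} \\ \text{Dan} \end{matrix}$$
      Indeed, Lea does not have any item in common with John, but has 1 item in common with Mike and with Dan (Pop).
      sim_matrix = A_train %*% A_test
  • 19. If similarity is the number of items in common (2/2). Then the recommendation scores are given by $\mathrm{scoreMatrix} = t(\mathrm{SimMatrix}) \otimes A_{\text{train}}$, i.e.
      $$\mathrm{scoreMatrix} = \begin{pmatrix} 0 & 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}$$
      So, with columns (Rock, Pop, Electro, R&B), $\mathrm{scoreMatrix} = \begin{pmatrix} 1 & 2 & 1 & 1 \end{pmatrix}$.
      scoreMatrix = t(as.matrix(sim_matrix)) %*% A_train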
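The two matrix products of slides 18 and 19 can be checked in base R without igraph. This is a minimal sketch, assuming the toy matrices are typed in by hand. The names `A_train`, `A_test`, `sim_matrix` and `scoreMatrix` mirror the slides, while `items` is a helper introduced here.

```r
items <- c("Rock", "Pop", "Electro", "R&B")

# Training users (rows) against items (columns), as on slide 16
A_train <- matrix(c(1, 0, 0, 0,
                    0, 1, 1, 0,
                    1, 1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("John", "Mike", "Dan"), items))
A_test <- c(0, 1, 0, 0)  # Lea likes Pop only

# Similarity = number of items each training user shares with Lea
sim_matrix <- A_train %*% A_test               # John 0, Mike 1, Dan 1

# Score of an item = sum of the similarities of the users who like it
scoreMatrix <- t(sim_matrix) %*% A_train       # Rock 1, Pop 2, Electro 1, R&B 1
print(scoreMatrix)
```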
  • 20. Comments. If similarity is the number of items in common... → it is not optimal, since users with a lot of items will be very similar to (almost) every user; → it is hard to use, because it leads to a lot of items with the same recommendation score. Better similarity metric: cosine similarity → Idea: normalize the similarity using the number of items associated with each user.
  • 22. Cosine similarity (1/3). Using the same data:
      A_train:
             | Rock | Pop | Electro | R&B
      John:  | 1    | 0   | 0       | 0
      Mike:  | 0    | 1   | 1       | 0
      Dan:   | 1    | 1   | 0       | 1
      The normalization vector is $N_{\text{train}} = \begin{pmatrix} 1 \\ 1/\sqrt{2} \\ 1/\sqrt{3} \end{pmatrix}$: John is associated with 1 item (Rock), Mike with 2 items (Pop, Electro), Dan with 3 items.
  • 23. Cosine similarity (2/3). Then the normalized similarity matrix is given by:
      $$\mathrm{SimMatrix}_{\text{norm}} = \begin{pmatrix} 1 & & \\ & \frac{1}{\sqrt{2}} & \\ & & \frac{1}{\sqrt{3}} \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{3} \end{pmatrix} \approx \begin{pmatrix} 0 \\ 0.71 \\ 0.58 \end{pmatrix} \begin{matrix} \text{John} \\ \text{Mike} \\ \text{Dan} \end{matrix}$$
      → Lea is more similar to Mike than to Dan: she has one item in common with each of them, but Mike is associated with fewer items than Dan, so the link with Mike is stronger.
      N_train = 1/sqrt(A_train %*% rep(1, ncol(A)))
      M_norm = diag(as.vector(N_train))
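  • 24. Cosine similarity (3/3). The matrix of scores is given by $\mathrm{scoreMatrix} = t(\mathrm{SimMatrix}_{\text{norm}}) \otimes A_{\text{train}}$, i.e.
      $$\mathrm{scoreMatrix} = \begin{pmatrix} 0 & 0.71 & 0.58 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}$$
      So, with columns (Rock, Pop, Electro, R&B), $\mathrm{scoreMatrix} = \begin{pmatrix} 0.58 & 1.29 & 0.71 & 0.58 \end{pmatrix}$.
      scoreMatrix = t(as.matrix(sim_matrix_norm)) %*% A_train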
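The three cosine-similarity slides can be reproduced in base R. This sketch assumes the toy matrices are entered by hand and uses `rowSums()` in place of the product with a vector of ones. Variable names follow the slides, and `items` and `score` are helpers introduced here.

```r
items <- c("Rock", "Pop", "Electro", "R&B")
A_train <- matrix(c(1, 0, 0, 0,
                    0, 1, 1, 0,
                    1, 1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("John", "Mike", "Dan"), items))
A_test <- c(0, 1, 0, 0)  # Lea likes Pop only

# One weight per training user: 1 / sqrt(number of items the user likes)
N_train <- 1 / sqrt(rowSums(A_train))
M_norm <- diag(as.vector(N_train))

# Normalized similarity of each training user to Lea
sim_matrix_norm <- M_norm %*% (A_train %*% A_test)
print(round(sim_matrix_norm[, 1], 2))   # 0.00 0.71 0.58 for John, Mike, Dan

# Item scores: similarity-weighted sum of the training rows
score <- drop(t(sim_matrix_norm) %*% A_train)
print(score)
```

Pop still gets the highest score, but Electro now ranks above Rock and R&B because Mike's weight exceeds Dan's.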
  • 25. Comments. For A_test = (Lea: Rock 0, Pop 1, Electro 0, R&B 0), the recommendation scores are scoreMatrix = (Rock 0.58, Pop 1.29, Electro 0.71, R&B 0.58).
      1. Pop is the best recommendation for Lea, but she is already associated with Pop.
      2. If the objective is to recommend new items, Electro is the best recommendation for Lea.
      3. Rock and R&B have the same score and can be ordered by frequency or randomly.
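The three comments above translate into a small ranking step. In this sketch the scores are copied from the slide. The frequency tie-break is one possible reading of "ordered by frequency" (how many training users like each item). The names `freq`, `candidates` and `ranking` are introduced here for illustration.

```r
# Scores and Lea's known items, copied from the slide
score  <- c(Rock = 0.58, Pop = 1.29, Electro = 0.71, "R&B" = 0.58)
A_test <- c(Rock = 0, Pop = 1, Electro = 0, "R&B" = 0)

# Item frequency among the training users John, Mike and Dan
freq <- c(Rock = 2, Pop = 2, Electro = 1, "R&B" = 1)

# Keep only the items Lea does not already have
candidates <- names(score)[A_test == 0]

# Sort by decreasing score, then by decreasing frequency to break ties
ranking <- candidates[order(-score[candidates], -freq[candidates])]
print(ranking)   # "Electro" "Rock" "R&B"
```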
  • 27. R code. Collaborative filtering in 10 lines of code.
      library(igraph)                                      # Load graph library
      d = read.csv("data.csv"); g = graph.data.frame(d)    # Read and convert data into graph
      V(g)$type <- V(g)$name %in% d$Item                   # Set graph as bipartite
      A = get.incidence(g, sparse=TRUE)                    # Compute incidence matrix
      A_train = A[which(rownames(A) != "Lea"), ]           # Historical users
      A_test = A[which(rownames(A) == "Lea"), ]            # User to score
      N_train = 1/sqrt(A_train %*% rep(1, ncol(A)))        # 1 / sqrt(number of items per user)
      M_norm = diag(as.vector(N_train))                    # Normalization matrix
      sim_matrix_norm = M_norm %*% (A_train %*% A_test)    # Normalized similarities
      scoreMatrix = t(as.matrix(sim_matrix_norm)) %*% A_train  # Recommendation scores
  • 28. In practice. Can be precomputed and do not need to be updated in real time: A_train and M_norm. Must be computed in real time: A_test and scoreMatrix (matrix calculation). Optimal number of users: too small → bad performance; too big → too slow (unless the computation is parallelized).
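The split between precomputed and real-time work can be sketched as two functions. The function names `precompute` and `score_user`, and the list layout of `model`, are assumptions made for this illustration and are not from the slides. The sketch also assumes every training user has at least one item.

```r
# Offline step: store the training matrix and the normalization matrix
precompute <- function(A_train) {
  n_items <- rowSums(A_train)
  stopifnot(all(n_items > 0))  # assumption: no empty training user
  list(A_train = A_train,
       M_norm  = diag(1 / sqrt(n_items), nrow = nrow(A_train)))
}

# Online step: two matrix products per incoming user
score_user <- function(model, a_test) {
  sim <- model$M_norm %*% (model$A_train %*% a_test)
  drop(t(sim) %*% model$A_train)
}

items <- c("Rock", "Pop", "Electro", "R&B")
A_train <- matrix(c(1, 0, 0, 0,
                    0, 1, 1, 0,
                    1, 1, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("John", "Mike", "Dan"), items))

model  <- precompute(A_train)               # run once, offline
scores <- score_user(model, c(0, 1, 0, 0))  # run per request (here: Lea)
print(scores)
```

The online cost grows with the number of rows of `A_train`, which is the trade-off the slide describes for the number of users.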
  • 30. Collaborative Filtering: Notations. Let $I_u(t)$ be the vector of items associated with a user $u$ at time $t$: $I_u(t) = (0, 0, \dots, 1, \dots, 0)$, where the $k$-th coefficient is equal to 1 if item $k$ is associated with user $u$ at time $t$, and 0 otherwise. Example: in music recommendation an "item" could be an artist (or a song), and coefficient $k$ is equal to 1 if the user $u$ likes the artist (or song) $k$.
  • 31. Collaborative Filtering. Then, for $t' > t$, the collaborative filtering algorithm estimates $I_u(t')$ (the future vector of items associated with the user $u$) by: $$I_u(t') = \sum_{v \neq u} \mathrm{sim}(v, u) \cdot I_v(t)$$ where $\mathrm{sim}(v, u)$ represents the similarity between users $u$ and $v$. → The most relevant items for the user $u$ are the ones with the highest score.
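The estimate above can be written as a small generic function that takes the similarity as an argument. This is a sketch under stated assumptions: `predict_items` and `common_items` are hypothetical names, and the data is the toy example from the earlier slides, with a single time step.

```r
items <- c("Rock", "Pop", "Electro", "R&B")
A <- matrix(c(1, 0, 0, 0,
              0, 1, 1, 0,
              1, 1, 0, 1,
              0, 1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("John", "Mike", "Dan", "Lea"), items))

# Sum over all users v != u of sim(v, u) * I_v
predict_items <- function(A, u, sim) {
  others  <- setdiff(rownames(A), u)
  weights <- vapply(others, function(v) sim(A[v, ], A[u, ]), numeric(1))
  drop(weights %*% A[others, , drop = FALSE])
}

# Simplest similarity: number of items in common (scalar product)
common_items <- function(x, y) sum(x * y)

scores <- predict_items(A, "Lea", common_items)
print(scores)   # Rock 1, Pop 2, Electro 1, R&B 1
```

Passing a different `sim` function changes the weighting without touching the scoring loop.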
  • 33. Similarity function. The similarity between two users can be defined as the number of items in common. Then $\mathrm{sim}(u, v) = \langle I_u | I_v \rangle$, where $\langle \cdot | \cdot \rangle$ is the classical scalar product. → This is not optimal, since users with a lot of items will be very similar to every user.
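  • 35. Cosine similarity. One can normalize the similarity by the number of items associated with users $u$ and $v$: $$\mathrm{sim}(u, v) = \frac{\langle I_u | I_v \rangle}{\|I_u\| \cdot \|I_v\|}$$ where $\|I_v\| = \sqrt{\langle I_v | I_v \rangle}$ is the square root of the number of items associated with $v$. However, as $$I_u(t') = \sum_{v \neq u} \frac{\langle I_u | I_v \rangle}{\|I_u\| \cdot \|I_v\|} \cdot I_v = \frac{1}{\|I_u\|} \cdot \sum_{v \neq u} \frac{\langle I_u | I_v \rangle}{\|I_v\|} \cdot I_v$$ the factor $1/\|I_u\|$ is the same for every item, so the order of the recommendations for the user $u$ is the same as the one obtained with $\mathrm{sim}(u, v) = \frac{\langle I_u | I_v \rangle}{\|I_v\|}$. → In practice we can use $\mathrm{sim}(u, v) = \frac{\langle I_u | I_v \rangle}{\|I_v\|}$.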
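The claim that dropping the target user's norm leaves the ranking unchanged can be checked numerically. This sketch scores Dan rather than Lea, because Lea's norm is 1 and the two similarities would coincide trivially. The helper names `norm2`, `cosine`, `half_norm` and `predict_items` are assumptions introduced here.

```r
items <- c("Rock", "Pop", "Electro", "R&B")
A <- matrix(c(1, 0, 0, 0,
              0, 1, 1, 0,
              1, 1, 0, 1,
              0, 1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("John", "Mike", "Dan", "Lea"), items))

norm2 <- function(x) sqrt(sum(x^2))

# In both functions, x is the other user v and y is the target user u
cosine    <- function(x, y) sum(x * y) / (norm2(x) * norm2(y))
half_norm <- function(x, y) sum(x * y) / norm2(x)

# Sum over all users v != u of sim(v, u) * I_v
predict_items <- function(A, u, sim) {
  others  <- setdiff(rownames(A), u)
  weights <- vapply(others, function(v) sim(A[v, ], A[u, ]), numeric(1))
  drop(weights %*% A[others, , drop = FALSE])
}

full    <- predict_items(A, "Dan", cosine)
partial <- predict_items(A, "Dan", half_norm)

# The two score vectors differ only by the constant factor 1 / ||I_Dan||
print(all.equal(full * norm2(A["Dan", ]), partial))
print(identical(order(-full), order(-partial)))
```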
  • 37. Conclusion. Two different approaches: Model Based and Memory Based. → Choose the number of users in A_train to fit your practical constraints. → The definition of similarity between users can be modified to take user and context information into account.
  • 38. Conclusion. However, this algorithm is based on past behavior, so it never suggests new content. → It is necessary to refresh the training set often.