Recommender Systems
Collaborative Filtering &
Dimensionality Reduction
Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University
*Adapted by Gustavo Coutinho
Note to other teachers and users of these slides: We would be delighted if you found our
material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://www.mmds.org
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Collaborative Filtering
Harnessing quality judgments of other users

Previously - Content-Based

Utility Matrix
Users have preferences for certain items, and these preferences must be teased out of the data.
Let's represent them with a utility matrix!
Example:
[Figure: example utility matrix]
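A minimal sketch of how a utility matrix might be held in code. The users, items, and ratings here are hypothetical and are not taken from the slide's figure; `None` marks a rating the user has not given.

```python
# Hypothetical utility matrix: rows are users, columns are items.
# None marks an unknown rating (the user has not rated that item).
utility = {
    "A": [4, None, 5, None],
    "B": [None, 3, None, 1],
    "C": [2, None, None, 5],
}

# Count how many cells are actually filled in.
known = sum(r is not None for row in utility.values() for r in row)
total = sum(len(row) for row in utility.values())
density = known / total
print(f"{known} of {total} cells are known (density {density:.2f})")
# prints: 6 of 12 cells are known (density 0.50)
```

Real utility matrices are usually far sparser than this toy one, which is why the unknown entries are modeled explicitly rather than stored as 0.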

Collaborative Filtering
Consider user x
Find a set N of other users whose ratings are “similar” to x’s ratings
Estimate x’s ratings based on the ratings of the users in N

Collaborative Filtering
Different from content-based filtering
We don’t need to understand the content of a specific item!
Different users share their experiences

Finding “Similar” Users
Let rx and ry be the rating vectors of users x and y, respectively
Let's try Jaccard similarity as a measure
rx = [*, _, _, *, ***]
ry = [*, _, **, **, _]

Finding “Similar” Users
Now rx and ry are treated as sets of rated items
rx = {1, 4, 5}
ry = {1, 3, 4}
Problem: ignores the value of the rating!
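As a small sketch of the set-based measure, the function below computes Jaccard similarity for the two sets of rated items shown on the slide. The function name `jaccard` is just an illustrative choice.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


rx = {1, 4, 5}  # items rated by user x
ry = {1, 3, 4}  # items rated by user y
print(jaccard(rx, ry))  # 2 shared items out of 4 distinct items -> 0.5
```

Only which items were rated enters the result; the one-star and three-star ratings count the same.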

Finding “Similar” Users
How do we bring the rating values into the formula?
Cosine similarity measure
Now rx and ry are treated as points (vectors)
$$\mathrm{similarity} = \cos(\theta) = \frac{r_x \cdot r_y}{\lVert r_x \rVert \cdot \lVert r_y \rVert}$$
rx = [1, 0, 0, 1, 3]
ry = [1, 0, 2, 2, 0]
Problem: treats missing ratings as “negative”!
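A short sketch of the cosine measure applied to the two vectors above, with missing ratings stored as 0 exactly as on the slide. Returning 0.0 for an all-zero vector is an assumption of this sketch, not something the slides specify.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: no ratings means no measurable similarity
    return dot / (norm_u * norm_v)


rx = [1, 0, 0, 1, 3]
ry = [1, 0, 2, 2, 0]
print(round(cosine(rx, ry), 3))  # 3 / (sqrt(11) * 3) -> 0.302
```

The zeros contribute nothing to the dot product, so an unrated item behaves like the lowest possible rating, which is the problem noted above.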

Finding “Similar” Users
How do we account for the missing values?
Pearson correlation coefficient
Sxy = items rated by both users x and y
$$\mathrm{sim}(x, y) = \frac{\sum_{s \in S_{xy}} (r_{xs} - \bar{r}_x)(r_{ys} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{xs} - \bar{r}_x)^2} \, \sqrt{\sum_{s \in S_{xy}} (r_{ys} - \bar{r}_y)^2}}$$
$\bar{r}_x$ and $\bar{r}_y$ = average ratings of users x and y
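The sketch below applies the Pearson formula to the same two users, with `None` for unrated items. It assumes each user's mean is taken over all of that user's ratings, while the sums run only over the co-rated items Sxy; returning 0.0 when the denominator vanishes is also an assumption.

```python
import math


def pearson(rx, ry):
    """Pearson similarity over co-rated items; None marks an unrated item."""
    known_x = [r for r in rx if r is not None]
    known_y = [r for r in ry if r is not None]
    mean_x = sum(known_x) / len(known_x)
    mean_y = sum(known_y) / len(known_y)
    # S_xy: items rated by both users
    common = [(a, b) for a, b in zip(rx, ry) if a is not None and b is not None]
    num = sum((a - mean_x) * (b - mean_y) for a, b in common)
    den = math.sqrt(sum((a - mean_x) ** 2 for a, _ in common)) * math.sqrt(
        sum((b - mean_y) ** 2 for _, b in common)
    )
    return num / den if den else 0.0  # assumption: undefined case maps to 0


rx = [1, None, None, 1, 3]
ry = [1, None, 2, 2, None]
print(round(pearson(rx, ry), 3))  # 2 / sqrt(40) -> 0.316
```

Missing ratings no longer pull the similarity down, because they are simply left out of every sum.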

Similarity Metric
Let's consider the following utility matrix of users and ratings
[Figure: utility matrix for users A, B, and C]
Intuitively we want: sim(A,B) > sim(A,C)
Using Jaccard: sim(A,B) = 1/5 < sim(A,C) = 2/4 (the wrong order)
Using cosine: sim(A,B) = 0.386 > sim(A,C) = 0.322

Similarity Metric
Now we use the Pearson correlation
Subtract the (row) mean from each known rating
Using Pearson: sim(A,B) = 0.092 > sim(A,C) = -0.559
Notice that cosine similarity is a correlation when the data is centered at 0

Rating Predictions
How can we go from similarity metrics to recommendations?
Let rx be the vector of user x’s ratings
Let N be the set of the k users most similar to x who have rated item i
Prediction for item i of user x:
▪ Simple average: $$r_{xi} = \frac{1}{k} \sum_{y \in N} r_{yi}$$
▪ Similarity-weighted average: $$r_{xi} = \frac{\sum_{y \in N} s_{xy} \cdot r_{yi}}{\sum_{y \in N} s_{xy}}$$
where sxy = sim(x, y)
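Both prediction rules can be sketched in a few lines. The neighbor list below is hypothetical: each pair holds a similarity sxy and the neighbor's rating ryi. Returning `None` when there are no neighbors, or when the weights sum to zero, is a choice made for this sketch.

```python
def predict(neighbors, weighted=True):
    """Predict r_xi from a list of (s_xy, r_yi) pairs for the users y in N."""
    if not neighbors:
        return None  # assumption: no neighbors means no prediction
    if not weighted:
        # Simple average over the k neighbors
        return sum(r for _, r in neighbors) / len(neighbors)
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None  # assumption: avoid dividing by a zero weight sum
    # Similarity-weighted average
    return sum(s * r for s, r in neighbors) / total


N = [(0.75, 5), (0.25, 2)]  # hypothetical (similarity, rating) pairs
print(predict(N, weighted=False))  # (5 + 2) / 2 -> 3.5
print(predict(N))                  # (0.75*5 + 0.25*2) / 1.0 -> 4.25
```

The weighted version leans toward the rating of the more similar neighbor, which is the point of weighting by sxy.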

Item-Item Collaborative Filtering
Until now we have used a user-user approach.
What about an item-item approach?
▪ For item i, find other similar items
▪ Estimate the rating for item i based on the ratings for similar items
▪ Can use the same similarity metrics and prediction functions as in the user-user model

Item-Item CF (|N|=2)
Rows are movies, columns are users; "-" is an unknown rating, numbers are ratings between 1 and 5.

Movie \ User   1   2   3   4   5   6   7   8   9  10  11  12
     1         1   -   3   -   -   5   -   -   5   -   4   -
     2         -   -   5   4   -   -   4   -   -   2   1   3
     3         2   4   -   1   2   -   3   -   4   3   5   -
     4         -   2   4   -   5   -   -   4   -   -   2   -
     5         -   -   4   3   4   2   -   -   -   -   2   5
     6         1   -   3   -   3   -   -   2   -   -   4   -

Item-Item CF (|N|=2)
Goal: estimate the rating of movie 1 by user 5 (marked "?").

Movie \ User   1   2   3   4   5   6   7   8   9  10  11  12
     1         1   -   3   -   ?   5   -   -   5   -   4   -
     2         -   -   5   4   -   -   4   -   -   2   1   3
     3         2   4   -   1   2   -   3   -   4   3   5   -
     4         -   2   4   -   5   -   -   4   -   -   2   -
     5         -   -   4   3   4   2   -   -   -   -   2   5
     6         1   -   3   -   3   -   -   2   -   -   4   -

Item-Item CF (|N|=2)
Goal: estimate the rating of movie 1 by user 5 (marked "?"). The last column is sim(1,m), the similarity of each movie m to movie 1.

Movie \ User   1   2   3   4   5   6   7   8   9  10  11  12   sim(1,m)
     1         1   -   3   -   ?   5   -   -   5   -   4   -     1.00
     2         -   -   5   4   -   -   4   -   -   2   1   3    -0.18
     3         2   4   -   1   2   -   3   -   4   3   5   -     0.41
     4         -   2   4   -   5   -   -   4   -   -   2   -    -0.10
     5         -   -   4   3   4   2   -   -   -   -   2   5    -0.31
     6         1   -   3   -   3   -   -   2   -   -   4   -     0.59

Item-Item CF (|N|=2)
Neighbour selection: identify movies similar to movie 1 that were rated by user 5
Here we use Pearson correlation as the similarity:
▪ Subtract the mean rating mi from each movie i
  m1 = (1+3+5+5+4)/5 = 3.6
  row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
▪ Compute cosine similarities between the rows

Item-Item CF (|N|=2)
Compute similarity weights: s1,3 = 0.41, s1,6 = 0.59
sim(1,m) for m = 1..6: 1.00, -0.18, 0.41, -0.10, -0.31, 0.59
Movies 3 and 6 are the two movies most similar to movie 1 among those rated by user 5

Item-Item CF (|N|=2)
Predict by taking the weighted average of user 5's ratings for movies 3 and 6:
r1,5 = (0.41*2 + 0.59*3) / (0.41+0.59) = 2.6
Movie 1's row with the estimate filled in (users 1..12): 1 - 3 - 2.6 5 - - 5 - 4 -
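The whole worked example can be reproduced with a short script. This is a sketch, not the authors' code: the positions of the unknown entries in `R` are an assumption, laid out so that movie 1's centered row and the six similarities match the values on the slides. Rows are movies 1..6 and columns are users 1..12, both stored 0-based.

```python
import math

NA = None  # unknown rating

# Assumed layout of the ratings matrix: rows = movies 1..6, columns = users 1..12.
R = [
    [1, NA, 3, NA, NA, 5, NA, NA, 5, NA, 4, NA],
    [NA, NA, 5, 4, NA, NA, 4, NA, NA, 2, 1, 3],
    [2, 4, NA, 1, 2, NA, 3, NA, 4, 3, 5, NA],
    [NA, 2, 4, NA, 5, NA, NA, 4, NA, NA, 2, NA],
    [NA, NA, 4, 3, 4, 2, NA, NA, NA, NA, 2, 5],
    [1, NA, 3, NA, 3, NA, NA, 2, NA, NA, 4, NA],
]


def centered(row):
    """Subtract the row mean from known ratings; unknown ratings become 0."""
    known = [r for r in row if r is not None]
    mean = sum(known) / len(known)
    return [r - mean if r is not None else 0.0 for r in row]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(R, movie, user, k=2):
    """Weighted average over the k movies most similar to `movie` that `user` rated."""
    target = centered(R[movie])
    candidates = [
        (cosine(target, centered(R[j])), R[j][user])
        for j in range(len(R))
        if j != movie and R[j][user] is not None
    ]
    top = sorted(candidates, reverse=True)[:k]
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


sims = [round(cosine(centered(R[0]), centered(row)), 2) for row in R]
print(sims)                                   # [1.0, -0.18, 0.41, -0.1, -0.31, 0.59]
print(round(predict(R, movie=0, user=4), 1))  # 2.6
```

The sketch does not guard against neighbors with negative or zero total similarity; a practical implementation would need a rule for that case.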

CF: Common Practice
Define similarity sij of items i and j
Select the k nearest neighbors N(i; x)
▪ Items most similar to i that were rated by x
Estimate rating rxi as the weighted average of deviations from a baseline:
$$r_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)} s_{ij} \cdot (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}}$$
where bxi = µ + bx + bi is the baseline estimate for rxi
▪ µ = overall mean movie rating
▪ bx = rating deviation of user x = (avg. rating of user x) - µ
▪ bi = rating deviation of movie i
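A minimal sketch of the baseline-corrected estimate, using made-up numbers. Each neighbor is a hypothetical triple (sij, rxj, bxj); falling back to the baseline alone when the weights sum to zero is an assumption of this sketch.

```python
def baseline_predict(mu, b_x, b_i, neighbors):
    """Estimate r_xi = b_xi + weighted average of (r_xj - b_xj) over N(i; x).

    neighbors: list of (s_ij, r_xj, b_xj) triples for the items j in N(i; x).
    """
    b_xi = mu + b_x + b_i  # baseline estimate for r_xi
    total = sum(s for s, _, _ in neighbors)
    if total == 0:
        return b_xi  # assumption: with no usable neighbors, use the baseline
    adjustment = sum(s * (r - b) for s, r, b in neighbors) / total
    return b_xi + adjustment


# Hypothetical values: overall mean 3.5, user x rates 0.5 above average,
# movie i is rated 0.25 below average.
mu, b_x, b_i = 3.5, 0.5, -0.25
neighbors = [(0.75, 5, 4.0), (0.25, 3, 4.0)]
print(baseline_predict(mu, b_x, b_i, neighbors))
# 3.75 + (0.75*1 + 0.25*(-1)) / 1.0 -> 4.25
```

Only the deviations from each neighbor's own baseline are averaged, so a generous user or a generally popular movie does not distort the estimate.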
 

Item-Item vs. User-User
In practice, it has been observed that item-item often works better than user-user
Why? Items are simpler; users have multiple tastes
Avatar LOTR Matrix Pirates
Alice 1 0.8
Bob 0.5 0.3
Carol 0.9 1 0.8
David 1 0.4

Pros/Cons of Collaborative Filtering
Pros:
▪ Works for any kind of item
  No feature selection needed
▪ Unexpected recommendations
  A user may receive recommendations that differ from their own active searches
▪ Groups with similar ratings
  Users may connect with each other and form groups with similar interests

Pros/Cons of Collaborative Filtering
Cons:
▪ Cold start
  Need enough users in the system to find a match
▪ Sparsity
  The user/ratings matrix is sparse
  Hard to find users that have rated the same items
▪ First rater
  Cannot recommend an item that has not been previously rated
  New items, esoteric items
▪ Popularity bias
  Cannot recommend items to someone with unique taste
  Tends to recommend popular items

Hybrid Methods
Implement two or more different recommenders and combine their predictions
▪ Perhaps using a linear model
Add content-based methods to collaborative filtering
▪ Item profiles for the new-item problem
▪ Demographics to deal with the new-user problem

Remarks & Practical Tips
- Evaluation
- Error metrics
- Complexity / Speed

Evaluation
[Figure: users × movies utility matrix of known ratings]

Evaluation
[Figure: the same users × movies matrix with five known ratings (2, 2, 5, 1, 3) hidden as "?"; the hidden ratings form the test data set]
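One common way to score predictions against a withheld test set is the root-mean-square error (RMSE). The slides do not say which error metric they use, so treat the choice of RMSE as an assumption. The `actual` values are the five ratings withheld on the slide; the `predicted` values are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true test ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


actual = [2, 2, 5, 1, 3]     # ratings withheld as the test data set
predicted = [3, 2, 4, 1, 5]  # hypothetical output of a recommender
print(round(rmse(predicted, actual), 3))  # sqrt(6 / 5) -> 1.095
```

Because the errors are squared, one prediction that is off by two stars costs as much as four predictions that are off by one.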

Collaborative Filtering: Complexity
The expensive step is finding the k most similar customers: O(|X|), where X is the set of customers
▪ Too expensive to do at runtime
▪ Could pre-compute
Naïve pre-computation takes time O(k·|X|)
We already know how to do this!
▪ Near-neighbor search in high dimensions (LSH)
▪ Clustering
▪ Dimensionality reduction

Tip: Add Data
Leverage all the data
▪ Don’t try to reduce data size in an effort to make fancy algorithms work
▪ Simple methods on large data do best
Add more data
▪ e.g., add IMDB data on genres
More data beats better algorithms
http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

Questions

Bando de Dados Avançados - Recommender Systems

  • 1. Recommender Systems Collaborative Filtering & Dimensionality Reduction Mining of Massive Datasets Jure Leskovec,Anand Rajaraman, Jeff Ullman Stanford University *Adapted by Gustavo Coutinho Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
  • 2. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Collaborative Filtering Harnessing quality judgments of other users 2
  • 3. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Previously - Content-Based 3J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
  • 4. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Previously - Content-Based 4J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
  • 5. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Previously - Content-Based 5J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
  • 6. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Utility Matrix Users have preferences for certain items, and these preferences must be teased out of the data. Lets represent it with an Utility Matrix! Example: 6
  • 7. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Collaborative Filtering Consider user x Find set N of other 
 users whose ratings 
 are “similar” to 
 x’s ratings Estimate x’s ratings 
 based on ratings 
 of users in N 7 x N J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
  • 8. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Collaborative Filtering Different from Content-Based Filtering We don’t need to understand the content of an specific item! Different user share their experiences 8J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
  • 9. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Let rx and rx vectors of users x and y ratings, respectively Lets try to use the Jaccard Similarity as a measure 9 Finding “Similar” Users rx = [*, _, _, *, ***] ry = [*, _, **, **, _]
  • 10. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Now, rx and ry are considered as sets Problem: Ignores the value of the rating! 10 Finding “Similar” Users rx = { 1, 4, 5} ry = { 1, 3, 4}
  • 11. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org How to put the rating factor under a formula? Cosine Similarity measure Now, rx and ry are considered as points Problem: Treats missing ratings as “negative”! 11 Finding “Similar” Users similarity = cos(Θ) = rx · ry ||rx|| · ||ry|| rx = { 1, 0, 0, 1, 3} ry = { 1, 0, 2, 2, 0}
  • 12. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org How do we balance the missing values? Pearson correlation coefficient Sxy= items rated by both users x and y 12 Finding “Similar” Users sim(x, y) = s∈Sxy (rxs − rx)(rys − ry) s∈Sxy (rxs − rx)2 s∈Sxy (rys − ry)2 rx and ry = average rating of “x” and “y”
  • 13. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Similarity Metric Lets consider de following Utility Matrix of users and ratings Intuitively we want: sim(A,B)>sim(A,C) Using Jaccard: 1/5 < 2/4 Using Cosine: 0.386 > 0.322 13
  • 14. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Similarity Metric Now, we’re going to use Pearson Correlation Subtracting the (row) mean Using Pearson: 0.092 > -0.559 Notice that Cosine Similarity is a correlation when data is centered at 0 14
  • 15. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Rating Predictions How can we go from similarity metrics to recommendations? Let rx be the vector of user x’s ratings Let N be the set of k users most similar to x who have rated item i Prediction for item s of user x: Where sxy=sim(x,y) 15 rxi = y∈N sxy · ryi y∈N sxy rxi = 1 k · y∈N ryi
  • 16. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Item-Item Collaborative Filtering Until now we have used an User-User approach. What about an Item-Item? ▪ For item i, find other similar items ▪ Estimate rating for item i based on ratings for similar items ▪ Can use the same similarity metrics and predictions functions as in user-user model 16
  • 17. Item-Item CF (|N|=2)
    Utility matrix: rows are movies 1-6, columns are users 1-12.
    Entries are ratings between 1 and 5; “.” marks an unknown rating.

                users
                 1  2  3  4  5  6  7  8  9 10 11 12
      movie 1    1  .  3  .  .  5  .  .  5  .  4  .
      movie 2    .  .  5  4  .  .  4  .  .  2  1  3
      movie 3    2  4  .  1  2  .  3  .  4  3  5  .
      movie 4    .  2  4  .  5  .  .  4  .  .  2  .
      movie 5    .  .  4  3  4  2  .  .  .  .  2  5
      movie 6    1  .  3  .  3  .  .  2  .  .  4  .

  • 18. Item-Item CF (|N|=2)
    Goal: estimate the rating of movie 1 by user 5 (the “?” cell in row 1, column 5 of the matrix).
  • 19. Item-Item CF (|N|=2)
    Similarity sim(1,m) of every movie m to movie 1:

      movie m     1      2      3      4      5      6
      sim(1,m)   1.00  -0.18   0.41  -0.10  -0.31   0.59

  • 20. Item-Item CF (|N|=2)
    Neighbour selection: identify movies similar to movie 1 that were rated by user 5.
    Here we use Pearson correlation as similarity:
    1) Subtract the mean rating mi from each movie i
       m1 = (1+3+5+5+4)/5 = 3.6
       row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
    2) Compute cosine similarities between rows
  • 21. Item-Item CF (|N|=2)
    Compute similarity weights: s1,3 = 0.41, s1,6 = 0.59
    (movies 3 and 6 are the two movies most similar to movie 1 among those rated by user 5)
  • 22. Item-Item CF (|N|=2)
    Predict by taking the weighted average of user 5's ratings for movies 3 and 6:
    r1,5 = (0.41*2 + 0.59*3) / (0.41+0.59) = 2.6
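The whole worked example can be replayed in code. The sketch below encodes the 6 x 12 utility matrix with `None` for unknown ratings, centers each movie's row, picks the |N| = 2 most similar movies that user 5 has rated, and takes the weighted average. The cell positions are reconstructed from the flattened slide text, and the function names are illustrative; indices in the code are 0-based, so movie 1 is row 0 and user 5 is column 4.

```python
import math

_ = None  # unknown rating
R = [  # rows: movies 1-6, columns: users 1-12
    [1, _, 3, _, _, 5, _, _, 5, _, 4, _],
    [_, _, 5, 4, _, _, 4, _, _, 2, 1, 3],
    [2, 4, _, 1, 2, _, 3, _, 4, 3, 5, _],
    [_, 2, 4, _, 5, _, _, 4, _, _, 2, _],
    [_, _, 4, 3, 4, 2, _, _, _, _, 2, 5],
    [1, _, 3, _, 3, _, _, 2, _, _, 4, _],
]

def center(row):
    """Subtract the row mean from known ratings; unknown ratings become 0."""
    known = [r for r in row if r is not None]
    mean = sum(known) / len(known)
    return [r - mean if r is not None else 0.0 for r in row]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def predict(R, item, user, k=2):
    """Item-item CF estimate of R[item][user] from the k most similar rated items."""
    rows = [center(row) for row in R]
    rated = [j for j in range(len(R)) if j != item and R[j][user] is not None]
    sims = {j: cosine(rows[item], rows[j]) for j in rated}
    neighbours = sorted(rated, key=lambda j: sims[j], reverse=True)[:k]
    # Sketch only: assumes the selected similarities do not sum to zero.
    estimate = (sum(sims[j] * R[j][user] for j in neighbours)
                / sum(sims[j] for j in neighbours))
    return estimate, neighbours, sims

estimate, neighbours, sims = predict(R, item=0, user=4)
```

With unrounded similarities the estimate is about 2.59, which rounds to the 2.6 shown on the slide, and the selected neighbours are movies 6 and 3.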
  • 23. CF: Common Practice
    Define similarity sij of items i and j
    Select k nearest neighbors N(i; x)
    ▪ Items most similar to i that were rated by x
    Estimate rating rxi as the baseline plus the weighted average of the neighbours' deviations from their baselines:
    $$r_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)} s_{ij} \cdot (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}}$$
    Baseline estimate for rxi: $b_{xi} = \mu + b_x + b_i$
    ▪ µ = overall mean movie rating
    ▪ bx = rating deviation of user x = (avg. rating of user x) - µ
    ▪ bi = rating deviation of movie i
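A small sketch of the baseline-corrected estimate, assuming the neighbours N(i; x) and their similarities are already known. All numbers are hypothetical, and the function names are not from the slides.

```python
def baseline(mu, b_x, b_i):
    """b_xi = mu + b_x + b_i: overall mean plus user and movie deviations."""
    return mu + b_x + b_i

def predict_with_baseline(b_xi, neighbours):
    """neighbours: (s_ij, r_xj, b_xj) triples for each item j in N(i; x)."""
    total_sim = sum(s for s, _, _ in neighbours)
    if total_sim == 0:
        return b_xi  # assumption: fall back to the baseline alone
    deviation = sum(s * (r - b) for s, r, b in neighbours) / total_sim
    return b_xi + deviation

# Hypothetical values: mean rating 3.7, a slightly harsh user, a well-liked movie.
b_xi = baseline(mu=3.7, b_x=-0.2, b_i=0.5)     # about 4.0
neighbours = [(0.5, 5, 4.0), (0.5, 3, 3.5)]     # (s_ij, r_xj, b_xj)
estimate = predict_with_baseline(b_xi, neighbours)  # about 4.25
```

The neighbours only contribute how far the user's ratings sit above or below what the baseline already expects, so a generous or harsh rater no longer distorts the average.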
  • 24. Item-Item vs. User-User
    In practice, it has been observed that item-item often works better than user-user
    Why? Items are simpler; users have multiple tastes
    Example ratings (columns: Avatar, LOTR, Matrix, Pirates; the positions of the blank cells did not survive extraction):
    Alice: 1, 0.8
    Bob: 0.5, 0.3
    Carol: 0.9, 1, 0.8
    David: 1, 0.4
  • 25. Pros/Cons of Collaborative Filtering
    Works for any kind of item
    ▪ No feature selection needed
    Unexpected recommendations
    ▪ A user may receive recommendations that differ from their own active searches
    Groups with similar ratings
    ▪ Users may connect with each other and form groups with similar interests
  • 26. Pros/Cons of Collaborative Filtering
    Cold start
    ▪ Need enough users in the system to find a match
    Sparsity
    ▪ The user/ratings matrix is sparse
    ▪ Hard to find users that have rated the same items
    First rater
    ▪ Cannot recommend an item that has not been previously rated
    ▪ New items, esoteric items
    Popularity bias
    ▪ Cannot recommend items to someone with unique taste
    ▪ Tends to recommend popular items
  • 27. Hybrid Methods
    Implement two or more different recommenders and combine predictions
    ▪ Perhaps using a linear model
    Add content-based methods to collaborative filtering
    ▪ Item profiles for new item problem
    ▪ Demographics to deal with new user problem
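One way to read "combine predictions, perhaps using a linear model" is a weighted sum of the individual recommenders' outputs. The sketch below is only an illustration: the weights are hypothetical placeholders that would normally be fitted on held-out ratings, and falling back to the content-based score when collaborative filtering has nothing to say is an assumption about how the new-item case might be handled.

```python
def hybrid_predict(cf_pred, content_pred, w_cf=0.7, w_content=0.3, bias=0.0):
    """Linear blend of a collaborative-filtering and a content-based prediction."""
    if cf_pred is None:
        # New item or new user: no neighbours, so rely on the content-based score.
        return content_pred
    return bias + w_cf * cf_pred + w_content * content_pred

blended = hybrid_predict(cf_pred=4.0, content_pred=3.0)      # about 3.7
cold_start = hybrid_predict(cf_pred=None, content_pred=3.0)  # 3.0
```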
  • 28. Remarks & Practical Tips
    - Evaluation
    - Error metrics
    - Complexity / Speed
  • 29. Evaluation
    [Figure: utility matrix of known ratings (1-5), with axes Users and Movies]
  • 30. Evaluation
    [Figure: the same Users and Movies utility matrix with five of the known ratings replaced by “?”; these withheld ratings form the Test Data Set]
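The evaluation idea is to hide some known ratings, predict them, and compare. The slides do not name a specific error metric at this point, so the sketch below uses root-mean-square error (RMSE) as an assumed choice. The five withheld values are read off by comparing this slide with the previous one; the predictions are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between withheld ratings and their predictions."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

withheld = [2, 2, 5, 1, 3]                          # ratings hidden as "?"
hypothetical_predictions = [2.5, 1.5, 4.0, 2.0, 3.0]
test_error = rmse(withheld, hypothetical_predictions)  # sqrt(0.5), about 0.71
```

A lower value means the recommender reconstructs the hidden ratings more closely; the same split can be reused to compare user-user against item-item variants.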
  • 31. Collaborative Filtering: Complexity
    Expensive step is finding the k most similar customers: O(|X|)
    Too expensive to do at runtime
    ▪ Could pre-compute
    Naïve pre-computation takes time O(k·|X|)
    We already know how to do this!
    ▪ Near-neighbor search in high dimensions (LSH)
    ▪ Clustering
    ▪ Dimensionality reduction
  • 32. Tip: Add Data
    Leverage all the data
    ▪ Don’t try to reduce data size in an effort to make fancy algorithms work
    ▪ Simple methods on large data do best
    Add more data
    ▪ e.g., add IMDB data on genres
    More data beats better algorithms
    http://anand.typepad.com/datawocky/2008/03/more-data-usual.html
  • 33. Questions