2. User-Based or Memory-Based Filtering
• We compare a user with every other user to find the closest matches
• Also called Memory-Based Filtering because we need to store all ratings
in order to make recommendations
• It’s a 3-step process. Let’s say we are trying to find item recommendations for User X (see the sketch after this list):
o Find past item ratings from User X
o Find the “most similar” User Y (based on similarity of item ratings) from the remaining user corpus
o Recommend those items to User X that the “most similar” User Y has rated and that User X hasn’t used yet
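A minimal sketch of this 3-step flow, assuming ratings are stored as a dict of dicts (user → {item: rating}) and using Manhattan distance (introduced on a later slide) as the similarity stand-in; the data layout and function names here are illustrative, not from the slides:

```python
def manhattan(x, y):
    """Manhattan distance over the items rated by both users."""
    shared = set(x) & set(y)
    return sum(abs(x[k] - y[k]) for k in shared)

def recommend(target, ratings):
    """User-based CF in three steps."""
    # Step 1: past item ratings from the target user
    target_ratings = ratings[target]
    # Step 2: the "most similar" user = smallest distance to the target
    nearest = min((u for u in ratings if u != target),
                  key=lambda u: manhattan(ratings[u], target_ratings))
    # Step 3: items the nearest neighbor rated that the target hasn't, best first
    return sorted(((item, r) for item, r in ratings[nearest].items()
                   if item not in target_ratings),
                  key=lambda pair: pair[1], reverse=True)
```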
3. What measures of similarity can we use
between two users?
1) Distance Based (dis)similarity measures
a. Manhattan distance
b. Euclidean distance
c. Minkowski distance
2) Cosine-based similarity measure
3) Pearson Correlation-based similarity measure
4) K-nearest neighbor
4. Running Example for this Section
Let’s say we are trying to find item recommendations for Veronica:
• We already have Veronica’s past item ratings.
• Now if we can find the user who is “most similar” to Veronica based on their item ratings,
• then we can recommend those items to Veronica that are highly rated by that “most
similar” user, and that Veronica hasn’t already discovered.
5. (1) Distance Based Measures
Distance-based (dis)similarity measures between User X and User Y based on
n item ratings:
• Distance >= 0
• Most Similar ⇔ Shortest Distance
• An item rating is considered in the distance measure only if it exists for both users
• Three different distance-based measures:
a) Manhattan Distance: $\sum_{k=1}^{n} |x_k - y_k|$
b) Euclidean Distance: $\left( \sum_{k=1}^{n} (x_k - y_k)^2 \right)^{1/2}$
c) Minkowski Distance: $\left( \sum_{k=1}^{n} |x_k - y_k|^r \right)^{1/r}$
Note: Minkowski is a generalization of both Manhattan (r=1) and Euclidean (r=2)
[Figure: example comparing 3 users across 2 items]
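The three measures collapse into a single parameterized helper; a sketch under the same dict-of-dicts rating layout assumed above:

```python
def minkowski(x, y, r):
    """Minkowski distance of order r over the items rated by both users.
    r=1 gives Manhattan, r=2 gives Euclidean."""
    shared = set(x) & set(y)
    return sum(abs(x[k] - y[k]) ** r for k in shared) ** (1 / r)
```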
6. a) Manhattan Distance: $\sum_{k=1}^{n} |x_k - y_k|$
We want to find item recommendations for Veronica.
• Manhattan Distances between Veronica and all other users (only item ratings that exist for both users are included in the distance measure):
o Angelica and Veronica = |3.5-3| + |4.5-5| + |5-4| + |1.5-2.5| + |2.5-3| = 3.5
o Bill and Veronica = |2-3| + |2-4| + |3.5-2.5| = 4
o Chan and Veronica = |5-3| + |3-5| + |5-4| + |1-2.5| = 6.5
o Dan and Veronica = |3-3| + |3-4| + |4.5-2.5| + |4-3| = 4
o Hailey and Veronica = |4-5| + |4-3| = 2
o Jordyn and Veronica = |5-5| + |5-4| + |4.5-2.5| + |4-3| = 4
o Sam and Veronica = |5-3| + |3-5| + |5-4| + |4-2.5| + |5-3| = 8.5
• User most similar (shortest distance) to Veronica: Hailey (Manhattan Distance 2)
o Hailey has rated three items that Veronica hasn’t: Broken Bells (Rating 4), Deadmau5 (Rating 1), Vampire Weekend (Rating 1)
o So we can make the following recommendation to Veronica: [('Broken Bells', 4.0), ('Deadmau5', 1.0), ('Vampire Weekend', 1.0)]
These are items rated by the most similar user Hailey that Veronica hasn’t discovered yet, sorted by Hailey’s rating (note that only Broken Bells is actually rated highly)
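These numbers can be reproduced in code. The ratings table below is reconstructed to be consistent with the worked distances above (the slide’s actual table isn’t shown in this excerpt), so treat the exact values as assumptions:

```python
def manhattan(x, y):
    """Manhattan distance over the items rated by both users."""
    shared = set(x) & set(y)
    return sum(abs(x[k] - y[k]) for k in shared)

# Reconstructed ratings (assumed): user -> {band: rating}
users = {
    "Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5,
                 "Phish": 5.0, "Slightly Stoopid": 1.5, "The Strokes": 2.5,
                 "Vampire Weekend": 2.0},
    "Bill": {"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0,
             "Phish": 2.0, "Slightly Stoopid": 3.5, "Vampire Weekend": 3.0},
    "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0,
             "Norah Jones": 3.0, "Phish": 5.0, "Slightly Stoopid": 1.0},
    "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5,
            "Phish": 3.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0,
            "Vampire Weekend": 2.0},
    "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0,
               "The Strokes": 4.0, "Vampire Weekend": 1.0},
    "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0,
               "Phish": 5.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0,
               "Vampire Weekend": 4.0},
    "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0,
            "Phish": 5.0, "Slightly Stoopid": 4.0, "The Strokes": 5.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phish": 4.0,
                 "Slightly Stoopid": 2.5, "The Strokes": 3.0},
}

for name in sorted(users):
    if name != "Veronica":
        print(name, manhattan(users[name], users["Veronica"]))
# Hailey is nearest (2.0); recommend her items that Veronica hasn't rated:
recs = sorted(((b, r) for b, r in users["Hailey"].items()
               if b not in users["Veronica"]), key=lambda p: -p[1])
print(recs)  # [('Broken Bells', 4.0), ('Deadmau5', 1.0), ('Vampire Weekend', 1.0)]
```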
8. b) Euclidean Distance: $\left( \sum_{k=1}^{n} (x_k - y_k)^2 \right)^{1/2}$
We want to find item recommendations for Veronica.
• Euclidean Distances between Veronica and all other users (only item ratings that exist for both users are included in the distance measure):
o Angelica and Veronica = (|3.5-3|^2 + |4.5-5|^2 + |5-4|^2 + |1.5-2.5|^2 + |2.5-3|^2)^(1/2) = 1.7
o Bill and Veronica = (|2-3|^2 + |2-4|^2 + |3.5-2.5|^2)^(1/2) = 2.4
o Chan and Veronica = (|5-3|^2 + |3-5|^2 + |5-4|^2 + |1-2.5|^2)^(1/2) = 3.4
o Dan and Veronica = (|3-3|^2 + |3-4|^2 + |4.5-2.5|^2 + |4-3|^2)^(1/2) = 2.4
o Hailey and Veronica = (|4-5|^2 + |4-3|^2)^(1/2) = 1.4
o Jordyn and Veronica = (|5-5|^2 + |5-4|^2 + |4.5-2.5|^2 + |4-3|^2)^(1/2) = 2.4
o Sam and Veronica = (|5-3|^2 + |3-5|^2 + |5-4|^2 + |4-2.5|^2 + |5-3|^2)^(1/2) = 3.9
• User most similar (shortest distance) to Veronica: Hailey (Euclidean Distance 1.4)
o Hailey has rated three items that Veronica hasn’t: Broken Bells (Rating 4), Deadmau5 (Rating 1), Vampire Weekend (Rating 1)
o So we can make the following recommendation to Veronica: [('Broken Bells', 4.0), ('Deadmau5', 1.0), ('Vampire Weekend', 1.0)]
These are items rated by the most similar user Hailey that Veronica hasn’t discovered yet, sorted by Hailey’s rating (again, only Broken Bells is actually rated highly)
10. c) Minkowski Distance: $\left( \sum_{k=1}^{n} |x_k - y_k|^r \right)^{1/r}$
Try this for yourself to make recommendations for Veronica (with r=3)
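Using the minkowski helper and the reconstructed users table from the sketches above, the r=3 exercise is one call per user (output deliberately not shown, so you can try it yourself):

```python
# Order-3 Minkowski distances to Veronica, rounded to two decimals
for name in sorted(users):
    if name != "Veronica":
        print(name, round(minkowski(users[name], users["Veronica"], 3), 2))
```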
11. Which Distance-Based Measure to Use?
• For higher-dimensional vectors, you might find that lower-order Minkowski
Distances (Manhattan r=1, Euclidean r=2) work better than higher-order
Minkowski Distances (r > 2).
• This is because the higher the order, the more the distance is going to be
dominated by the dimension with the highest difference.
• At lower orders, all dimensions get to play a substantial role in the distance
measure.
12. When to use Distance-Based Measures?
• If your data is dense (not too many zero or missing attribute values) and the
magnitude of the attribute values is important, use distance measures such as
Euclidean or Manhattan; if the data is sparse, they can produce spurious results.
o For instance, when you compute the distance between Hailey and Veronica, you notice they
only rated two bands in common (Norah Jones and The Strokes), whereas when you compute
the distance between Hailey and Jordyn, you notice they rated five bands in common.
o This will skew our distance measurement, since the Hailey-Veronica distance is in 2 dimensions
while the Hailey-Jordyn distance is in 5 dimensions.
o Adding 0s to missing ratings typically just exacerbates the problem.
• Use smaller r (r=1 or r=2) if you don't want the measure to be dominated by larger
differences
• You may need to scale the data if the attributes are on very different scales (see the sketch below)
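For example, here is a minimal min-max rescaling of one attribute’s values to [0, 1] (a common normalization choice; the list-of-values layout is an assumption):

```python
def min_max_scale(column):
    """Rescale a list of numeric values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:                 # constant column: no spread to normalize
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

print(min_max_scale([1, 2.5, 4, 5]))  # [0.0, 0.375, 0.75, 1.0]
```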
13. (2) Cosine Similarity Measure
Motivation:
• Let’s say we were trying to make song recommendations for a user based on
what other “similar” users have played. Most users would have played most
songs 0 times, and very few songs a non-zero number of times. So we wouldn’t want our
similarity measure to be based on the number of shared 0 values, since any two users are
likely to have “not played” many of the same songs. So what we’d like is a Jaccard-style
measure, but for non-binary vectors.
• Similarly, we’d only want to consider a match when both users have played a
song, rather than when one has and the other hasn’t. Because otherwise, we’d
be letting the similarity measure be overwhelmed by non-matches rather than
matches.*
• Cosine similarity accounts for both of these by considering product terms (so
the 0s fall off naturally).
[Figure: two rating vectors separated by angle θ]
* Unless we want to “self-dampen” the similarity measure based on amount of overlap…
14. Cosine-based similarity measure between User X and User Y based on n
item ratings:
• Cosine similarity lies between -1 and 1
(-1 total opposites, 0 independent, 1 perfectly similar)
• Most Similar ⇔ Highest Cosine
• An item rating is considered in the cosine measure only if it exists for both
users (that is, we consider intersection).*
• $\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}}$
* Or we could consider the union if we’d like the similarity measure to self-dampen
based on amount of overlap.
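A minimal cosine-similarity sketch over the intersection of rated items, under the same dict-of-dicts assumption as earlier:

```python
from math import sqrt

def cosine(x, y):
    """Cosine similarity over items rated by both users (intersection).
    Taking the norms over each user's full ratings instead would give
    the self-dampening union variant from the footnote."""
    shared = set(x) & set(y)
    if not shared:
        return 0.0
    dot = sum(x[k] * y[k] for k in shared)
    norm_x = sqrt(sum(x[k] ** 2 for k in shared))
    norm_y = sqrt(sum(y[k] ** 2 for k in shared))
    return dot / (norm_x * norm_y)
```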
16. When to use Cosine Measure?
• If the data is sparse (too many zero or missing attribute values)
consider using Cosine Similarity since it ignores 0 matches.
17. (3) Pearson Correlation Measure
Motivation:
• Users often have different rating patterns. For instance, Bill seems to avoid
extreme ratings; his ratings range from 2 to 4. Jordyn seems to like everything;
her ratings range from 4 to 5. Hailey is a binary person, giving ratings of either
1 or 4.
• In other words, users often anchor their ratings at different scales. One user
might rate <bad, good, great> as <1, 2, 3>, whereas another user might rate
<bad, good, great> as <3, 4, 5>.
• So we need a way to be able to base similarity on similar trending of ratings,
rather than similar absolute ratings.
18. Pearson Correlation-based similarity measure between User X and User Y based on
n item ratings:
• Correlation lies between -1 and 1
(-1 perfectly negatively correlated, 0 uncorrelated, 1 perfectly positively correlated)
• Most Similar ⇔ Highest Correlation
• An item rating is considered in the Pearson measure only if it exists for both users
(intersection).
• Original formula:
$\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$
• Modified formula (more efficient since it only requires a single pass through the data):
$\frac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sqrt{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \, \sqrt{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}}$
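The single-pass version translates directly to code; a sketch under the same dict-of-dicts rating layout:

```python
from math import sqrt

def pearson(x, y):
    """Single-pass Pearson correlation over items rated by both users."""
    shared = set(x) & set(y)
    n = len(shared)
    if n == 0:
        return 0.0
    sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0.0
    for k in shared:                  # one pass accumulates all five sums
        sum_x += x[k]
        sum_y += y[k]
        sum_xy += x[k] * y[k]
        sum_x2 += x[k] ** 2
        sum_y2 += y[k] ** 2
    denom = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
    return 0.0 if denom == 0 else (sum_xy - sum_x * sum_y / n) / denom
```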
19. NOTE
Recall:
• Cosine Similarity:
$\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}}$
• Pearson Correlation:
$\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$
Mean-centered or mean-adjusted cosine:
• If we center the data before taking the cosine similarity, and only consider
items rated by both users, then it’s exactly the same as Pearson
• If we center the data before taking the cosine similarity, make unknowns 0, and
consider all items, then the denominator will dampen the effect of too few
overlaps. This in some sense factors in the confidence associated with the degree
of overlap and can sometimes perform better than Pearson Correlation (see the
sketch after the link).
https://grouplens.org/blog/similarity-functions-for-user-user-collaborative-filtering/
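A sketch of that second variant (center each user’s ratings on their own mean, treat unknowns as 0, and take the cosine over all items); the layout is the same assumed dict of dicts:

```python
from math import sqrt

def mean_centered_cosine(x, y):
    """Cosine after per-user mean-centering, with unknowns treated as 0.
    The dot product only gets contributions from the overlap, but the norms
    cover all of each user's ratings -- that is what dampens low overlap."""
    mean_x = sum(x.values()) / len(x)
    mean_y = sum(y.values()) / len(y)
    cx = {k: v - mean_x for k, v in x.items()}
    cy = {k: v - mean_y for k, v in y.items()}
    dot = sum(cx[k] * cy[k] for k in set(cx) & set(cy))
    norm_x = sqrt(sum(v ** 2 for v in cx.values()))
    norm_y = sqrt(sum(v ** 2 for v in cy.values()))
    return 0.0 if norm_x * norm_y == 0 else dot / (norm_x * norm_y)
```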
21. When to use Pearson Correlation Measure?
• If the data is subject to grade-inflation (different users may be using
different scales) use Pearson Similarity.
• Recall that Pearson correlation considers an item rating in the similarity
measure only if it exists for both users. If you’d like to factor in the
confidence associated with the degree of overlap, you can additionally
“dampen” the similarity by multiplying it by a weighting factor such as
$\frac{\min(|I_u \cap I_v|, 50)}{50}$
where $I_u$ and $I_v$ are the sets of items rated by users u and v (see the sketch below)
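As a short sketch (pearson is the single-pass function sketched earlier; the cutoff of 50 is the slide’s choice):

```python
def damped_pearson(x, y, cutoff=50):
    """Pearson similarity shrunk toward 0 when users share few rated items."""
    overlap = len(set(x) & set(y))
    return pearson(x, y) * min(overlap, cutoff) / cutoff
```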
22. (4) K-Nearest Neighbor Recommendation
Motivation:
• Note that in all of the above methods, we have relied on a single “most
similar” person.
• So any quirk that person has is passed on as a recommendation.
• One way of getting around this is to base our recommendations on more than
one person who is similar to our user.
23. K-Nearest Neighbor recommendation, using the Pearson Correlation-based similarity
measure between users over their n shared item ratings:
• The projected rating for each item is calculated using the weighted average influence
(based on similarity) of the k-nearest neighbors.
• So let’s say the k-nearest neighbors to Ann are Sally, Eric and Amanda with Pearson
Scores of 0.8, 0.7 and 0.5.
o Then their influence (based on Pearson similarity) is 0.8/(0.8+0.7+0.5)=0.4, 0.7/(0.8+0.7+0.5)=0.35, and
0.5/(0.8+0.7+0.5)=0.25.
o Now suppose Sally, Eric, and Amanda rated the band Grey Wardens as 3.5, 5, 4.5, then the projected rating for
Ann would be 3.5*0.4+5*0.35+4.5*0.25=4.275.
• Note on calculating the weights: since Pearson Correlation can be negative, the above
method of calculating weights can create some interesting challenges. So when
calculating the weights, it is customary to first transform the Pearson Coefficient to a 0-1
scale using the following transformation: (pc + 1)/2.
• The best value for k is application specific—you will need to do some experimentation.
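A numeric sketch of the projected rating, reproducing the Sally/Eric/Amanda example above (the names and values are from the slide; the function shape is an assumption):

```python
def to_weight(pc):
    """Map a Pearson coefficient from [-1, 1] onto [0, 1] before weighting."""
    return (pc + 1) / 2

def project_rating(neighbors):
    """Weighted average of the k nearest neighbors' ratings.
    neighbors: list of (similarity, rating) pairs."""
    total = sum(sim for sim, _ in neighbors)
    return sum(sim / total * rating for sim, rating in neighbors)

# Sally, Eric, Amanda with Pearson scores 0.8, 0.7, 0.5 and ratings 3.5, 5, 4.5
print(project_rating([(0.8, 3.5), (0.7, 5.0), (0.5, 4.5)]))  # ≈ 4.275
```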