Speaker pham cong dinh

A quick introduction to item-based
collaborative filtering

Pham Cong Dinh
@pcdinh
PHPDay Saigon 2012

Outline
● PHP's popularity and the challenge of producing
engaging content
● Recommendation engines at work
● How to build an item-based collaborative
filtering recommendation engine

PHP is everywhere
● W3Techs report in 2012
[Charts not reproduced]

PHP website distribution
● Reported by builtwith.com in 2012 (more than
28 million sites in PHP)
[Charts not reproduced]

Information overload, or no engaging content?

From http://bethesignal.org/

Why a recommendation system?

Build a recommendation system
● Collaborative filtering: user-based and item-based
– Filtering: making automatic predictions about the interests
of a user
– Collaborative: drawing on preference or taste information
from many users

Item-based collaborative filtering
● Model-based
– The similarities between different items in the data
set are calculated
– Predict ratings for user-item pairs not present in
the data set

Steps to build item-based
collaborative filtering
● Data collection and representations (preferences/taste
…)
● Finding relationships and determining similarity
● Recommendation computations: producing
recommendations/suggestions/discoveries (engaging
content)

Collaborative filtering: data
collection
● Data collection and representations
(preferences/taste …) as (user, item) pairs, from
signals such as:
– Clicks
– Likes, favorites
– Watch, read
– Survey
– Ratings
– Others …
[Figure: example (user, item) pairs (X,1), (X,2), (Y,1), (Y,2), (Z,2), (Z,3)]
● E.g.: find the set of movies that user X likes
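The (user, item) pairs in the figure can be grouped into one item set per user, which directly answers "which items does user X like?". This is a minimal Python sketch. It assumes each pair simply records that the user interacted with the item (click, like, watch or rating). The meaning of the ✗ marks in the original figure cannot be recovered, and `items_by_user` is a hypothetical helper name.

```python
from collections import defaultdict

# (user, item) pairs from the slide's figure; each pair is assumed to mean
# "this user interacted with this item" (click, like, watch, rating, ...)
interactions = [("X", 1), ("X", 2), ("Y", 1), ("Y", 2), ("Z", 2), ("Z", 3)]


def items_by_user(pairs):
    """Group (user, item) pairs into a dict: user -> set of items."""
    profile = defaultdict(set)
    for user, item in pairs:
        profile[user].add(item)
    return dict(profile)


profiles = items_by_user(interactions)
print(sorted(profiles["X"]))  # [1, 2]
```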

Collaborative filtering: Similarity
(1)
● Finding relationships and determining
similarity
– The similarity between two items is measured
by looking at all the users who have
interacted with (rated) both items
● E.g.: find a group of movies similar to the
set of movies we know user X likes
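Item-item similarity is measured only over users who interacted with both items, so a useful first step is an inverted index from each item to its users. This minimal sketch reuses the (user, item) pairs from the data-collection figure. `users_by_item` and `co_raters` are hypothetical helper names.

```python
from collections import defaultdict

interactions = [("X", 1), ("X", 2), ("Y", 1), ("Y", 2), ("Z", 2), ("Z", 3)]


def users_by_item(pairs):
    """Invert (user, item) pairs into a dict: item -> set of users."""
    index = defaultdict(set)
    for user, item in pairs:
        index[item].add(user)
    return dict(index)


def co_raters(index, item_a, item_b):
    """Users who interacted with both items; similarity is computed over these only."""
    return index.get(item_a, set()) & index.get(item_b, set())


index = users_by_item(interactions)
print(sorted(co_raters(index, 1, 2)))  # ['X', 'Y']
print(sorted(co_raters(index, 1, 3)))  # []
```

Items 1 and 3 share no users in this data, so no similarity can be computed for that pair.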

(2)
● Manhattan distance between points (x1, y1) and (x2, y2):
|x1 – x2| + |y1 – y2|
[Figure: users plotted on X and Y axes]

User(x, y)
Amy(5, 5)
Bill(2, 5)
Jim(1, 4)

Item(x1, x2, x3) → Ratings
Snow Crash(5, 2, 1)
Girl with the Dragon Tattoo(5, 5, 1)

Manhattan distance
→ Amy – Bill: |5 – 2| + |5 – 5| = 3
→ Snow Crash – Girl with the Dragon Tattoo: |5 – 5| + |2 – 5| + |1 – 1| = 3
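The distances on this slide can be checked with a short Python sketch. It uses the user points and item rating vectors exactly as the slide gives them. The `manhattan` function name is illustrative.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences; 0 means identical vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


users = {"Amy": (5, 5), "Bill": (2, 5), "Jim": (1, 4)}
items = {"Snow Crash": (5, 2, 1), "Girl with the Dragon Tattoo": (5, 5, 1)}

print(manhattan(users["Amy"], users["Bill"]))  # 3
print(manhattan(items["Snow Crash"], items["Girl with the Dragon Tattoo"]))  # 3
```

Unlike the similarity scores on the following slides, a smaller Manhattan distance means the two users or items are more alike.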

(3)
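● Cosine similarity: the cosine of the angle between two rating
vectors. Values range from -1 (opposite) through 0 (unrelated)
to 1 (same direction); with non-negative ratings the result lies
between 0 and 1.

Item(x1, x2, x3) → Ratings
Snow Crash(5, 2, 1)
Girl with the Dragon Tattoo(5, 5, 1)

Cosine similarity
→ Snow Crash – Girl with the Dragon Tattoo:
(5×5 + 2×5 + 1×1) / (√(5×5 + 2×2 + 1×1) × √(5×5 + 5×5 + 1×1)) = 36 / √1530 ≈ 0.92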

PHP: https://github.com/aoiaoi/CosineSimilarity/blob/master/CosineSimilarity.php
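The slide's calculation can also be written as a Python sketch. It is not a port of the linked PHP file. Returning 0.0 for an all-zero vector is an assumption, since the cosine is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as unrelated
    return dot / (norm_a * norm_b)


snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)
print(round(cosine_similarity(snow_crash, dragon_tattoo), 4))  # 0.9204
```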

(4)
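● Pearson Correlation Coefficient: from -1 (perfect negative
correlation) through 0 (no linear relationship) to +1 (perfect
positive correlation)
● Measures how much the ratings given by common users to a
pair of items deviate from the average ratings for those items
● Correlation is basically the average product of the
standardized deviations (z-scores) of the two items' ratings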
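In this minimal Python sketch, the two item vectors are ratings by the same set of common users:
r = Σ(a_u − ā)(b_u − b̄) / (√Σ(a_u − ā)² × √Σ(b_u − b̄)²).
The sketch makes three assumptions:
- The means ā and b̄ are taken over those co-rating users. Some formulations use each item's overall average instead.
- The slide's Snow Crash / Dragon Tattoo vectors are reused as example data.
- Returning 0.0 for constant ratings is a convention, not part of the definition.
The second function checks the slide's "average product" description.

```python
import math


def pearson(a, b):
    """Pearson correlation of two items' ratings by the same (co-rating) users."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 ratings")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(dev_a, dev_b))
    den = math.sqrt(sum(x * x for x in dev_a)) * math.sqrt(sum(y * y for y in dev_b))
    return num / den if den else 0.0  # assumption: constant ratings -> 0.0


def pearson_as_average_product(a, b):
    """Same value, computed as the average product of z-scores (population std dev)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    return sum(((x - mean_a) / sd_a) * ((y - mean_b) / sd_b) for x, y in zip(a, b)) / n


snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)
print(round(pearson(snow_crash, dragon_tattoo), 4))  # 0.6934
print(round(pearson_as_average_product(snow_crash, dragon_tattoo), 4))  # 0.6934
```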

(5)
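● Euclidean distance: the "ordinary" straight-line distance
between two points, √((x1 – x2)² + (y1 – y2)²) in two dimensions
● Values: the distance itself runs from 0 (identical) upward;
converted to a similarity such as 1 / (1 + distance), it runs
from near 0 (unrelated) to 1 (identical)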
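This Python sketch computes the distance and turns it into a score between 0 and 1. The mapping 1 / (1 + distance) is an assumption: it is one common choice, and the slide does not say which mapping it uses. The item vectors are the slide's Snow Crash / Dragon Tattoo ratings.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: sqrt of the sum of squared coordinate differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a, b):
    """Assumed mapping of a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)
print(euclidean_distance(snow_crash, dragon_tattoo))    # 3.0
print(euclidean_similarity(snow_crash, dragon_tattoo))  # 0.25
```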

(6)
● Spearman distance: the square of the Euclidean distance between
two rank vectors, Σd², where d is the difference between the
ranks of each pair of values
● Spearman rank correlation: ranges from -1 (perfect negative
correlation) to +1 (perfect positive correlation, when one ranking
is a perfectly increasing function of the other); without ties,
ρ = 1 – 6Σd² / (n(n² – 1))

(7)
● Adjusted Euclidean distance: takes the length of the
vectors into account

Collaborative filtering:
Recommendation computations
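● Calculate the similarity between each item A that user X
has watched/bought/liked and each item that user X has
not watched/bought/liked
● Score every candidate item (e.g. apply a weighted
algorithm: average user X's scores for the other items,
weighted by their similarity to the candidate)
● Sort the items by score
● Return the top-N items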
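The steps above can be sketched end to end in Python. The sketch makes these assumptions:
- The ratings data is hypothetical.
- Similarity is cosine over co-rating users, which is one choice among the measures above.
- A candidate item i is scored for user u with the weighted average Σ sim(i, j) × r(u, j) / Σ |sim(i, j)| over the items j that u has rated.
- All function names are illustrative.

```python
import math


def cosine_over_corated(ratings, item_a, item_b):
    """Cosine similarity of two items, using only users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    a = [ratings[u][item_a] for u in common]
    b = [ratings[u][item_b] for u in common]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_item_similarities(ratings):
    """The 'model': similarity for every pair of items, computable offline."""
    items = sorted({i for r in ratings.values() for i in r})
    sims = {}
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            sims[(a, b)] = sims[(b, a)] = cosine_over_corated(ratings, a, b)
    return items, sims


def recommend(ratings, user, n=10):
    """Score items the user has not rated, sort by score, return the top n."""
    items, sims = build_item_similarities(ratings)
    rated = ratings[user]
    scores = []
    for candidate in items:
        if candidate in rated:
            continue
        num = sum(sims[(candidate, j)] * r for j, r in rated.items())
        den = sum(abs(sims[(candidate, j)]) for j in rated)
        if den > 0:
            scores.append((candidate, num / den))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Hypothetical ratings: user -> {item: rating}
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "D": 2},
    "u3": {"A": 1, "C": 5, "D": 4},
    "u4": {"B": 4, "C": 2},
}
for item, score in recommend(ratings, "u4", n=2):
    print(item, round(score, 2))
# A 3.43
# D 3.0
```

Item D gets a similarity of 1.0 to both B and C because each pair has only one co-rating user. Real systems often require a minimum number of co-raters before trusting a similarity.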

Collaborative filtering: Other
issues
● Accuracy of predicting ratings. To evaluate
accuracy when predicting ratings of unrated items for the
active user, use Mean Absolute Error (MAE).
● Accuracy of recommendations. To evaluate the
accuracy of recommendations, use Mean Average
Precision (MAP), defined as the mean of the Average
Precision (AP) values over a set of queries (a query
can be viewed as a user asking the recommender
system for recommended items).
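Both metrics are short to compute. In this Python sketch, AP for one query is the average of precision@k over each rank k that holds a relevant item, divided by the total number of relevant items. That is a common IR convention; some variants normalize differently. All predictions, recommendation lists and relevance sets are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def average_precision(recommended, relevant):
    """AP for one query: mean of precision@k at each rank k holding a relevant item."""
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """MAP: mean AP over (recommended list, relevant set) pairs, one per user query."""
    return sum(average_precision(rec, rel) for rec, rel in queries) / len(queries)


print(mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 1]))  # 0.5
queries = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y", "z"], {"z"})]
print(round(mean_average_precision(queries), 4))  # 0.5833
```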
