Speaker pham cong dinh


  1. 1. A quick introduction to item-based collaborative filtering. Pham Cong Dinh (@pcdinh), PHPDay Saigon 2012
  2. 2. Outline ● PHP popularity and the challenge of producing engaging content ● Recommendation engines at work ● How to build an item-based collaborative filtering recommendation engine
  3. 3. PHP is everywhere ● W3Techs report in 2012
  4. 4. PHP website distribution ● Reported by builtwith.com in 2012 (more than 28 million sites in PHP)
  5. 5. You have a website. Now what?
  6. 6. Information overload OR no engaging content? From http://bethesignal.org/
  7. 7. Why a recommendation system?
  8. 8. Recommendation engine at work
  9. 9. Recommendation engine at work
  10. 10. Build a recommendation system ● Collaborative filtering: user and item – Filtering: automatic predictions about the interests of a user – Collaborative: the predictions draw on preferences or taste information collected from many users
  11. 11. Item-based collaborative filtering ● Model-based – The similarities between different items in the data set are calculated – Those similarities are then used to predict ratings for user-item pairs not present in the data set
  12. 12. Steps in item-based collaborative filtering ● Data collection and representation (preferences, tastes, …) ● Finding relationships between items and determining their similarity ● Recommendation computation: recommendations, suggestions, discoveries (producing engaging content)
  13. 13. Collaborative filtering: data collection ● Data collection and representation (preferences, tastes, …) – Clicks – Likes, favorites – Watch, read – Survey – Ratings – Others … ● Example (user, item) interactions: (X, 1), (X, 2), (Y, 1), (Y, 2), (Z, 2), (Z, 3) ● E.g.: find the set of movies that user X likes
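
A minimal sketch of the data-collection step, assuming each interaction is stored as a (user, item) pair like the ones on the slide. The names `interactions` and `items_liked_by` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# (user, item) pairs from the slide's example; each pair means the user
# clicked, liked, watched, or rated that item.
interactions = [("X", 1), ("X", 2), ("Y", 1), ("Y", 2), ("Z", 2), ("Z", 3)]


def items_liked_by(pairs):
    """Group item ids by user: {user: {item, ...}}."""
    by_user = defaultdict(set)
    for user, item in pairs:
        by_user[user].add(item)
    return dict(by_user)


liked = items_liked_by(interactions)
print(sorted(liked["X"]))  # [1, 2]
```

A real system would usually keep a rating or weight alongside each pair; plain sets are enough to answer "which items does user X like?".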
  14. 14. Collaborative filtering: Similarity (1) ● Finding relationships and determining similarity – The similarity between two items is measured over all the users who have interacted with (rated) both items ● E.g.: find a group of movies similar to the set of movies that we know user X likes
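
One way to restrict a comparison to users who rated both items might look like the sketch below. The `ratings` table is hypothetical sample data (here Jim is assumed not to have rated the second book), and `co_rated` is an illustrative helper, not part of any library.

```python
# Hypothetical ratings table: {item: {user: rating}}.
ratings = {
    "Snow Crash": {"Amy": 5, "Bill": 2, "Jim": 1},
    "Girl with the Dragon Tattoo": {"Amy": 5, "Bill": 5},
}


def co_rated(ratings, item_a, item_b):
    """Return two aligned rating lists covering only users who rated both items."""
    common = sorted(set(ratings[item_a]) & set(ratings[item_b]))
    return (
        [ratings[item_a][user] for user in common],
        [ratings[item_b][user] for user in common],
    )


a, b = co_rated(ratings, "Snow Crash", "Girl with the Dragon Tattoo")
print(a, b)  # [5, 2] [5, 5]
```

The two aligned lists can then be passed to any similarity measure.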
  15. 15. Collaborative filtering: Similarity (2) ● Manhattan distance: |x1 – x2| + |y1 – y2| ● User(x, y): Amy(5, 5), Bill(2, 5), Jim(1, 4) ● Item(x1, x2, x3) → ratings: Snow Crash(5, 2, 1), Girl with the Dragon Tattoo(5, 5, 1) ● Manhattan distance → Amy – Bill: |5 – 2| + |5 – 5| = 3 → Snow Crash – Girl with the Dragon Tattoo: |5 – 5| + |2 – 5| + |1 – 1| = 3
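
The Manhattan distance from the slide can be sketched in a few lines. The function name is an assumption; the vectors are the slide's own values.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


amy, bill = (5, 5), (2, 5)
snow_crash, dragon_tattoo = (5, 2, 1), (5, 5, 1)

print(manhattan_distance(amy, bill))                  # 3
print(manhattan_distance(snow_crash, dragon_tattoo))  # 3
```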
  16. 16. Collaborative filtering: Similarity (3) ● Cosine similarity: the cosine of the angle between the two rating vectors. Value: -1 (opposite) through 0 (not related) to 1 (same direction) ● Item(x1, x2, x3) → ratings: Snow Crash(5, 2, 1), Girl with the Dragon Tattoo(5, 5, 1) ● Cosine similarity → Snow Crash – Girl with the Dragon Tattoo: (5×5 + 2×5 + 1×1) / (√(5×5 + 2×2 + 1×1) × √(5×5 + 5×5 + 1×1)) = 36 / (√30 × √51) ≈ 0.92 ● PHP: https://github.com/aoiaoi/CosineSimilarity/blob/master/CosineSimilarity.php
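
A minimal Python sketch of cosine similarity follows. It is not a port of the linked PHP class; the function name and the choice to return 0.0 for a zero-length vector are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as unrelated
    return dot / (norm_a * norm_b)


snow_crash, dragon_tattoo = (5, 2, 1), (5, 5, 1)
print(round(cosine_similarity(snow_crash, dragon_tattoo), 2))  # 0.92
```

With ratings that are never negative, the result stays between 0 and 1.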
  17. 17. Collaborative filtering: Similarity (4) ● Pearson correlation coefficient: from -1 (perfect negative correlation) through 0 (not related) to +1 (perfect positive correlation) ● Measures how much the ratings by common users for a pair of items deviate together from the average ratings for those items ● Correlation is essentially the average product of the standardized deviations
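
The Pearson coefficient can be sketched as below, using the standard textbook formula since the slide's formula image is not in the transcript. The function name and the 0.0 fallback for constant ratings are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of deviations from each item's average rating.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: constant ratings carry no correlation signal
    return cov / math.sqrt(var_a * var_b)


print(round(pearson((5, 2, 1), (5, 5, 1)), 3))  # 0.693
```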
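  18. 18. Collaborative filtering: Similarity (5) ● Euclidean distance: the "ordinary" straight-line distance between two points ● Values: the distance itself ranges from 0 upward; rescaled to a similarity, e.g. 1 / (1 + distance), it runs from near 0 (not related) to 1 (identical)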
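
A short sketch of Euclidean distance and a similarity derived from it. The 1 / (1 + d) rescaling is one common convention, assumed here because the slide's formula is not in the transcript; both function names are illustrative.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a, b):
    """Map a distance in [0, inf) to a similarity in (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


snow_crash, dragon_tattoo = (5, 2, 1), (5, 5, 1)
print(euclidean_distance(snow_crash, dragon_tattoo))    # 3.0
print(euclidean_similarity(snow_crash, dragon_tattoo))  # 0.25
```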
  19. 19. Collaborative filtering: Similarity (6) ● Spearman distance: the square of the Euclidean distance between two rank vectors ● Spearman rank correlation: ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation)
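
Both Spearman measures can be sketched by ranking the ratings first. The helper names are hypothetical, and giving tied values their average rank is an assumption (a common choice).

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_distance(a, b):
    """Squared Euclidean distance between the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    return sum((x - y) ** 2 for x, y in zip(ra, rb))


def spearman_correlation(a, b):
    """Pearson correlation computed on the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    mean = (len(ra) + 1) / 2  # average ranks always have this mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var_a = sum((x - mean) ** 2 for x in ra)
    var_b = sum((y - mean) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: all-tied ranks carry no signal
    return cov / math.sqrt(var_a * var_b)


print(spearman_distance((5, 2, 1), (5, 5, 1)))  # 0.5
```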
  20. 20. Collaborative filtering: Similarity (7) ● Adjusted Euclidean distance: takes the length of the vectors into account
  21. 21. Collaborative filtering: Recommendation computations ● Calculate the similarity between each item A that user X has watched/bought/liked and the items that user X has not watched/bought/liked ● Score all the candidate items (e.g. apply a weighted algorithm: average the user's ratings weighted by similarity) ● Sort by score ● Return the top-N items
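
The scoring, sorting, and top-N steps could be sketched as follows, assuming item-to-item similarities have already been computed. The `recommend` function, the similarity table, and the similarity-weighted average are illustrative assumptions; the slide does not fix a particular weighting scheme.

```python
def recommend(user_ratings, similarity, top_n=2):
    """Rank unseen items by the similarity-weighted average of the user's ratings.

    user_ratings: {item: rating} for items the user already rated.
    similarity:   {(rated_item, candidate_item): similarity value}.
    """
    candidates = {
        b for (a, b) in similarity if a in user_ratings and b not in user_ratings
    }
    scores = {}
    for candidate in candidates:
        weighted_sum = 0.0
        weight_total = 0.0
        for item, rating in user_ratings.items():
            sim = similarity.get((item, candidate), 0.0)
            weighted_sum += sim * rating
            weight_total += abs(sim)
        if weight_total > 0:
            scores[candidate] = weighted_sum / weight_total
    # Highest score first; ties broken by item name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical data: user X rated A and B; C, D, and E are unseen.
user_x = {"A": 5, "B": 2}
sims = {
    ("A", "C"): 0.9, ("B", "C"): 0.1,
    ("A", "D"): 0.2, ("B", "D"): 0.8,
    ("A", "E"): 0.5, ("B", "E"): 0.5,
}
print([item for item, _ in recommend(user_x, sims)])  # ['C', 'E']
```

Item C ranks first because it is most similar to A, the item user X rated highest.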
  22. 22. Collaborative filtering: Other issues ● Accuracy of predicted ratings: to evaluate accuracy when predicting ratings of unrated items for the active user, use Mean Absolute Error (MAE) ● Accuracy of recommendations: to evaluate the accuracy of recommendations, use Mean Average Precision (MAP), defined as the mean of the Average Precision (AP) values over a set of queries (in a recommender system, a query can be thought of as one user's request for recommended items)
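
Both metrics can be sketched briefly. The function names are hypothetical, and dividing AP by the number of relevant items is an assumption (one common convention; others divide by the number of hits).

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def average_precision(recommended, relevant):
    """AP for one ranked list; `relevant` is the set of items the user liked."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """Mean of AP over (recommended, relevant) pairs, one pair per user."""
    if not queries:
        return 0.0
    return sum(average_precision(rec, rel) for rec, rel in queries) / len(queries)


print(mean_absolute_error([4, 3, 5], [5, 3, 3]))  # 1.0
```

For example, `average_precision(["a", "b", "c"], {"a", "c"})` averages the precision at rank 1 (1/1) and at rank 3 (2/3).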
  23. 23. The End ● Q&A
