A quick introduction to item-based
collaborative filtering



                     Pham Cong Dinh
                     @pcdinh
                     PHPDay Saigon 2012
Outline
●   PHP popularity and challenges to produce
    engaging content
●   Recommendation engine at work
●   How to build a recommendation engine with
    item-based collaborative filtering
PHP is everywhere
●   W3Techs report in 2012 (chart not reproduced)
PHP website distribution
●   Reported by builtwith.com in 2012: more than
    28 million sites run PHP (chart not reproduced)
You have a website. Now what?
Information overload OR no engaging content?
(Image from http://bethesignal.org/)
Why a recommendation system?
Recommendation engine at work
Build a recommendation system
●   Collaborative filtering: user-based and item-based
    –   Filtering: automatic predictions about the interests
        of a user
    –   Collaborative: built from the preferences or taste
        information of many users
Item-based collaborative filtering
●   Model-based
    –   Calculate the similarities between different items
        in the data set
    –   Predict ratings for user-item pairs not present in
        the data set
Steps to do item-based collaborative filtering
●   Collect and represent the data (preferences/taste …)
●   Find the relationships and determine the similarity
●   Compute the recommendations (suggestions, discoveries)
    to produce engaging content
Collaborative filtering: data collection
●   Data collection and representations (preferences/taste …)
    –   Clicks
    –   Likes, favorites
    –   Watch, read
    –   Survey
    –   Ratings
    –   Others …
●   Recorded as (user, item) pairs: (X, 1), (X, 2), (Y, 1),
    (Y, 2), (Z, 2), (Z, 3)
●   Example: find the set of movies that user X likes
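
The (user, item) pairs can be held in a simple lookup structure before any similarity is computed. The following is a minimal Python sketch, assuming implicit feedback where an interaction either happened or did not; the helper names `build_item_users` and `items_of` are illustrative and do not come from the talk.

```python
from collections import defaultdict

# (user, item) interactions as listed on the slide.
interactions = [("X", 1), ("X", 2), ("Y", 1), ("Y", 2), ("Z", 2), ("Z", 3)]


def build_item_users(pairs):
    """Map each item to the set of users who interacted with it."""
    item_users = defaultdict(set)
    for user, item in pairs:
        item_users[item].add(user)
    return dict(item_users)


def items_of(user, pairs):
    """Return the set of items a given user interacted with."""
    return {item for u, item in pairs if u == user}


item_users = build_item_users(interactions)
liked_by_x = items_of("X", interactions)
```

Keying the structure by item rather than by user keeps the later item-to-item comparisons cheap, since each item already carries its own set of users.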
Collaborative filtering: Similarity (1)
●   Find the relationships and determine the similarity
    –   The similarity between two items is measured
        over all the users who have interacted with
        (rated) both items
●   Example: find a group of movies similar to the
    set of movies that we know user X likes
Collaborative filtering: Similarity (2)
●   Manhattan distance: |x1 – x2| + |y1 – y2|
●   User(x, y), plotted on the X and Y axes
    –   Amy(5, 5)
    –   Bill(2, 5)
    –   Jim(1, 4)
●   Item(x1, x2, x3) → Ratings
    –   Snow Crash(5, 2, 1)
    –   Girl with the Dragon Tattoo(5, 5, 1)
●   Manhattan distance
    –   Amy – Bill: |5 – 2| + |5 – 5| = 3
    –   Snow Crash – Girl with the Dragon Tattoo:
        |5 – 5| + |2 – 5| + |1 – 1| = 3
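
The Manhattan distance generalizes to any number of dimensions by summing the absolute differences coordinate by coordinate. A minimal Python sketch using the numbers from this slide; the function name `manhattan_distance` is an illustrative choice.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Users as (x, y) points and items as rating vectors, as given on the slide.
amy, bill = (5, 5), (2, 5)
snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)

user_distance = manhattan_distance(amy, bill)
item_distance = manhattan_distance(snow_crash, dragon_tattoo)
```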
Collaborative filtering: Similarity (3)
●   Cosine similarity: the cosine of the angle between
    two rating vectors. Value: -1 (opposite) through
    0 (not related) to 1 (same direction); with
    non-negative ratings it stays between 0 and 1
●   Item(x1, x2, x3) → Ratings
    –   Snow Crash(5, 2, 1)
    –   Girl with the Dragon Tattoo(5, 5, 1)
●   Snow Crash – Girl with the Dragon Tattoo:
    (5x5 + 2x5 + 1x1) / (sqrt(5x5 + 2x2 + 1x1) x sqrt(5x5 + 5x5 + 1x1))
    = 36 / (sqrt(30) x sqrt(51)) ≈ 0.92
●   PHP: https://github.com/aoiaoi/CosineSimilarity/blob/master/CosineSimilarity.php
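
For comparison with the linked PHP class, here is a minimal Python sketch of the same measure: the dot product of two rating vectors divided by the product of their lengths. The function name and the choice to return 0.0 for an all-zero vector are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as "not related".
        return 0.0
    return dot / (norm_a * norm_b)


snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)
similarity = cosine_similarity(snow_crash, dragon_tattoo)
```

With the slide's ratings the result is about 0.92, that is 36 divided by the product of sqrt(30) and sqrt(51).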
Collaborative filtering: Similarity (4)
●   Pearson Correlation Coefficient: from -1 (perfect
    negative correlation) through 0 (not related) to
    +1 (perfect positive correlation)
●   Measures how much the ratings by common users for a
    pair of items deviate together from the average
    ratings for those items
●   Correlation is basically the average product of
    the standardized ratings
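
A minimal Python sketch of the Pearson coefficient for two items, computed over the ratings of the users they have in common. The function name `pearson` and the fallback of 0.0 when one item's ratings have no variance are assumptions made for this example.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of the deviations from each item's average rating.
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    if spread_a == 0 or spread_b == 0:
        # Assumption: constant ratings carry no correlation signal.
        return 0.0
    return covariance / math.sqrt(spread_a * spread_b)


snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)
r = pearson(snow_crash, dragon_tattoo)
```

For the two books used on the earlier slides this gives roughly 0.69, lower than their cosine similarity because subtracting each item's mean removes the effect of generally high ratings.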
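Collaborative filtering: Similarity (5)
●   Euclidean distance: the "ordinary" straight-line
    distance between two points
●   The distance d is 0 for identical ratings and grows
    as ratings diverge; it is commonly converted to a
    similarity score such as 1 / (1 + d)
●   Values of that score: near 0 (not related) to 1
    (identical)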
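
A minimal Python sketch of the Euclidean distance and of one common way to turn it into a score between 0 and 1. The conversion 1 / (1 + d) is an assumption chosen for this example; other conversions exist.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a, b):
    """Map the distance into (0, 1]: 1 means identical, near 0 means far apart."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


snow_crash = (5, 2, 1)
dragon_tattoo = (5, 5, 1)
distance = euclidean_distance(snow_crash, dragon_tattoo)
score = euclidean_similarity(snow_crash, dragon_tattoo)
```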
Collaborative filtering: Similarity (6)
●   Spearman distance: the squared Euclidean distance
    between two rank vectors
●   Spearman Rank Correlation: the Pearson correlation
    computed on ranks instead of raw ratings. It ranges
    from -1 (perfect negative correlation) to +1
    (perfect positive correlation)
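
Both Spearman measures start by replacing ratings with their ranks. A minimal Python sketch, assuming tied ratings share the average of their rank positions; all function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_distance(a, b):
    """Squared Euclidean distance between the two rank vectors."""
    return sum((x - y) ** 2 for x, y in zip(average_ranks(a), average_ranks(b)))


def spearman_correlation(a, b):
    """Pearson correlation computed on ranks; 0.0 if either side is constant."""
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)
```

Because only the ordering matters, `spearman_correlation((1, 2, 3, 4), (10, 20, 30, 40))` is a perfect +1 even though the raw values differ by a factor of ten.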
Collaborative filtering: Similarity (7)
●   Adjusted Euclidean distance: takes the length of
    the vectors into account
Collaborative filtering: Recommendation computations
●   Calculate the similarity between each item that user X
    has watched/bought/liked and the items that user X
    has not watched/bought/liked
●   Score all the candidate items (e.g. with a weighted
    average: user X's ratings weighted by item similarity)
●   Sort by score
●   Return the top-N items
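
The four steps can be sketched end to end in Python. This is an illustration under stated assumptions, not the speaker's implementation: the ratings are made-up sample data, similarity is cosine over the users who rated both items, and a candidate's score is the similarity-weighted average of the ratings the user already gave.

```python
import math

# Hypothetical sample data: user -> {item: rating}.
ratings = {
    "X": {"A": 5, "B": 4},
    "Y": {"A": 4, "B": 5, "C": 5},
    "Z": {"A": 1, "B": 2, "C": 1, "D": 5},
    "W": {"A": 2, "D": 4},
}


def item_similarity(item_a, item_b, ratings):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Return up to top_n (item, score) pairs for items the user has not rated."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scored = []
    for candidate in all_items - set(seen):
        weighted_sum = 0.0
        weight_total = 0.0
        for item, rating in seen.items():
            sim = item_similarity(candidate, item, ratings)
            if sim <= 0:
                continue  # ignore unrelated items
            weighted_sum += sim * rating
            weight_total += sim
        if weight_total > 0:
            scored.append((candidate, weighted_sum / weight_total))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


top_items = recommend("X", ratings, top_n=2)
```

In a real system the item-to-item similarities would usually be precomputed and stored, since they change far more slowly than any single user's history.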
Collaborative filtering: Other issues
●   Accuracy of predicted ratings. To evaluate accuracy
    when predicting an unrated item for the active user,
    use Mean Absolute Error (MAE).
●   Accuracy of recommendations. To evaluate the
    accuracy of recommendations, use Mean Average
    Precision (MAP), defined as the mean of the Average
    Precision (AP) values over a set of queries (in a
    recommender system, a query is a user's request for
    recommended items).
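
Both metrics are short to compute. A minimal Python sketch, assuming AP is normalized by the number of relevant items (one common convention; others divide by the list length or a cutoff); the function names are illustrative.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def average_precision(recommended, relevant):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(recommended_by_user, relevant_by_user):
    """Mean of AP over all users (each user is one query)."""
    users = list(recommended_by_user)
    total = sum(
        average_precision(recommended_by_user[u], relevant_by_user.get(u, set()))
        for u in users
    )
    return total / len(users)
```

For example, recommending `["a", "b", "c"]` when only `a` and `c` are relevant gives an AP of (1/1 + 2/3) / 2, about 0.83, because the miss at rank 2 lowers the precision measured at rank 3.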
The End

●   Q&A
