Recommender Systems
Collaborative Filtering &
Content-Based Recommending
1
Recommender Systems
 Systems for recommending items (e.g.
books, movies, CD’s, web pages,
newsgroup messages) to users based
on examples of their preferences.
 Many on-line stores provide
recommendations (e.g. Amazon,
CDNow).
 Recommenders have been shown to
substantially increase sales at on-line
stores.
2
Book Recommender
3
Red
Mars
Juras-
sic
Park
Lost
World
2001
Found
ation
Differ-
ence
Engine
Machine
Learning
User
Profile
Neuro-
mancer
2010
Utility matrix
4
In a recommendation-system application there are two
classes of entities, which we shall refer to as users and
items.
Users have preferences for certain items, and these
preferences must be teased out of the data. The data
itself is represented as a utility matrix, giving for each
user-item pair, a value that represents what is known
about the degree of preference of that user for that
item.
We assume that the matrix is sparse, meaning that most
entries are “unknown.” An unknown rating implies that
we have no explicit information about the user’s
preference for the item.
Example
5
The goal of a recommendation system is to
predict the blanks in the utility matrix.
Long Tail Phenomena
6
The distinction between the physical and on-
line worlds has been called the long tail
phenomenon.
The long-tail phenomenon forces on-line
institutions to recommend items to
individual users. It is not possible to present
all available items to the user, the way
physical institutions can.
Populating the Utility Matrix
7
There are two way:
1. We can ask users to rate items.
2. We can make inferences from users’
behavior.
Approaches of recommender
system
8
There are two basic approaches to
recommending:
1. Collaborative Filtering
2. Content-based Filtering
Collaborative Filtering
 Maintain a database of many users’
ratings of a variety of items.
 For a given user, find other similar users
whose ratings strongly correlate with the
current user.
 Recommend items rated highly by these
similar users, but not rated by the current
user.
 Almost all existing commercial
recommenders use this approach (e.g.
Amazon).
9
Collaborative Filtering
10
A 9
B 3
C
: :
Z 5
A
B
C 9
: :
Z 10
A 5
B 3
C
: :
Z 7
A
B
C 8
: :
Z
A 6
B 4
C
: :
Z
A 10
B 4
C 8
. .
Z 1
User
Database
Active
User
Correlation
Match
A 9
B 3
C
. .
Z 5
A 9
B 3
C
: :
Z 5
A 10
B 4
C 8
. .
Z 1
Extract
Recommendations
C
Collaborative Filtering
Method
 Weight all users with respect to similarity with the
active user.
 Select a subset of the users (neighbors) to use as
predictors.
 Normalize ratings and compute a prediction from a
weighted combination of the selected neighbors’
ratings.
 Present items with highest predicted ratings as
recommendations.
11
Measuring Similarity
The first question we must deal with is how
to measure similarity of users or items from
their rows or columns in the utility matrix.
Jaccard Distance: lets assume A and B have
an intersection of size 1 and a union of size
5.Thus, their Jaccard similarity is 1/5, and
their Jaccard distance is 4/5; i.e.,they are
very far apart.
12
Conti….
Cosine Distance:
We can treat blanks as a 0 value. This choice is
questionable, since it has the effect of treating the
lack of a rating as more similar to disliking the movie
than liking it.
Rounding the Data:
We could try to eliminate the apparent similarity
between movies a user rates highly and those with
low scores by rounding the ratings. For instance, we
could consider ratings of 3, 4, and 5 as a “1” and
consider ratings 1 and 2 as unrated.
13
Conti….
Normalizing the rating:
If we normalize ratings, by subtracting from
each rating the average rating of that user,
we turn low ratings into negative numbers
and high ratings into positive numbers.
14
User based similarity
Let say,
Sim(User1, User2)= Σj | rating (User1, Item
j) – rating|(User2, Item j) /Num. of items
15
Problems with Collaborative
Filtering
 Cold Start: There needs to be enough other users
already in the system to find a match.
 Sparsity: If there are many items to be
recommended, even if there are many users, the
user/ratings matrix is sparse, and it is hard to find
users that have rated the same items.
 First Rater: Cannot recommend an item that has
not been previously rated.
 New items
 Esoteric items
 Popularity Bias: Cannot recommend items to
someone with unique tastes.
 Tends to recommend popular items.
16
Content-Based
Recommending
 Recommendations are based on
information on the content of items
rather than on other users’ opinions.
 Uses a machine learning algorithm to
induce a profile of the users
preferences.
 Some previous applications:
 Newsweeder (Lang, 1995)
 Syskill and Webert (Pazzani et al., 1996)
17
Item profile
In a content-based system, we must construct for
each item a profile, which is a record or collection of
records representing important characteristics of that
item.
 1. The set of actors of the movie. Some viewers
prefer movies with their favorite actors.
 2. The director. Some viewers have a preference
for the work of certain directors.
 3. The year in which the movie was made. Some
viewers prefer old movies; others watch only the
latest releases.
 4. The genre or general type of movie. Some
viewers like only comedie, others dramas or
romances. 18
Discovering Features of
Documents
There are other classes of items where it is not
immediately apparent what the values of
features should be. We shall consider two of
them: document collections and images.
 First, eliminate stop words –the several
hundred most common words, which tend to
say little about the topic of a document.
 For the remaining words, compute the
TF.IDF score for each word in the document.
 The ones with the highest scores are the
words that characterize the document.
19
Obtaining Item Features From
Tags
 The problem with images is that their
data, typically an array of pixels, does
not tell us anything useful about their
features.
 There have been a number of attempts to
obtain information about features of
items by inviting users to tag the items by
entering words or phrases that describe
the item.
20
User Profiles
We need to create vectors with the same
components that describe the user’s
preferences. We have the utility matrix
representing the connection between users
and items.
21
Recommending Items to Users
Based on Content
With profile vectors for both users and
items, we can estimate the degree to which
a user would prefer an item by computing
the cosine distance between the user’s and
item’s vectors.
22
Advantages of Content-Based
Approach
 No need for data on other users.
 No cold-start or sparsity problems.
 Able to recommend to users with unique
tastes.
 Able to recommend new and unpopular items
 No first-rater problem.
 Can provide explanations of recommended
items by listing content-features that caused
an item to be recommended.
23
Disadvantages of Content-
Based Method
 Requires content that can be encoded as
meaningful features.
 Users’ tastes must be represented as a
learnable function of these content
features.
 Unable to exploit quality judgments of
other users.
 Unless these are somehow included in
the content features.
24
LIBRA
Learning Intelligent Book Recommending
Agent
 Content-based recommender for books using
information about titles extracted from
Amazon.
 Uses information extraction from the web to
organize text into fields:
 Author
 Title
 Editorial Reviews
 Customer Comments
 Subject terms
 Related authors
 Related titles
25
LIBRA System
26
Amazon Pages
Rated
Examples
User Profile
Machine Learning
Learner
Information
Extraction
LIBRA
Database
Recommendations
1.~~~~~~
2.~~~~~~~
3.~~~~~
:
:
: Predictor
Sample Amazon Page
27
Age of Spiritual Machines
Sample Extracted Information
28
Title: <The Age of Spiritual Machines: When Computers Exceed Human Intelligence>
Author: <Ray Kurzweil>
Price: <11.96>
Publication Date: <January 2000>
ISBN: <0140282025>
Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind>
Author: <Hans Moravec> >
…
Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans…> >
…
Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has …> >
…
Related Authors: <Hans P. Moravec> <K. Eric Drexler>…
Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence> …
Libra Content Information
 Libra uses this extracted information to form “bags of
words” for the following slots:
 Author
 Title
 Description (reviews and comments)
 Subjects
 Related Titles
 Related Authors
29
Implementation
 Stopwords removed from all bags.
 A book’s title and author are added to its own related title and related
author slots.
 All probabilities are smoothed using Laplace estimation to account for
small sample size.
 Lisp implementation is quite efficient:
 Training: 20 exs in 0.4 secs, 840 exs in 11.5 secs
 Test: 200 books per second
30
Experimental Data
 Amazon searches were used to find books in various
genres.
 Titles that have at least one review or comment were
kept.
 Data sets:
 Literature fiction: 3,061 titles
 Mystery: 7,285 titles
 Science: 3,813 titles
 Science Fiction: 3.813 titles
31
Rated Data
 4 users rated random examples within a genre by
reviewing the Amazon pages about the title:
 LIT1 936 titles
 LIT2 935 titles
 MYST 500 titles
 SCI 500 titles
 SF 500 titles
32
Combining Content and
Collaboration
 Content-based and collaborative methods
have complementary strengths and
weaknesses.
 Combine methods to obtain the best of both.
 Various hybrid approaches:
 Apply both methods and combine
recommendations.
 Use collaborative data as content.
 Use content-based predictor as another
collaborator.
 Use content-based predictor to complete
collaborative data.
33
Content-Boosted
Collaborative Filtering
34
IMDbEachMovie Web Crawler
Movie
Content
Database
Full User
Ratings Matrix
Collaborative
Filtering
Active
User Ratings
User Ratings
Matrix (Sparse)
Content-based
Predictor
Recommendations
Content-Boosted CF - I
35
Content-Based
Predictor
Training Examples
Pseudo User-ratings Vector
Items with Predicted Ratings
User-ratings Vector
User-rated Items
Unrated Items
Conclusions
 Recommending and personalization are important
approaches to combating information over-load.
 Machine Learning is an important part of systems for
these tasks.
 Collaborative filtering has problems.
 Content-based methods address these problems (but
have problems of their own).
 Integrating both is best.
36

Recommenders Systems

  • 1.
    Recommender Systems Collaborative Filtering& Content-Based Recommending 1
  • 2.
    Recommender Systems  Systemsfor recommending items (e.g. books, movies, CD’s, web pages, newsgroup messages) to users based on examples of their preferences.  Many on-line stores provide recommendations (e.g. Amazon, CDNow).  Recommenders have been shown to substantially increase sales at on-line stores. 2
  • 3.
  • 4.
    Utility matrix 4 In arecommendation-system application there are two classes of entities, which we shall refer to as users and items. Users have preferences for certain items, and these preferences must be teased out of the data. The data itself is represented as a utility matrix, giving for each user-item pair, a value that represents what is known about the degree of preference of that user for that item. We assume that the matrix is sparse, meaning that most entries are “unknown.” An unknown rating implies that we have no explicit information about the user’s preference for the item.
  • 5.
    Example 5 The goal ofa recommendation system is to predict the blanks in the utility matrix.
  • 6.
    Long Tail Phenomena 6 Thedistinction between the physical and on- line worlds has been called the long tail phenomenon. The long-tail phenomenon forces on-line institutions to recommend items to individual users. It is not possible to present all available items to the user, the way physical institutions can.
  • 7.
    Populating the UtilityMatrix 7 There are two way: 1. We can ask users to rate items. 2. We can make inferences from users’ behavior.
  • 8.
    Approaches of recommender system 8 Thereare two basic approaches to recommending: 1. Collaborative Filtering 2. Content-based Filtering
  • 9.
    Collaborative Filtering  Maintaina database of many users’ ratings of a variety of items.  For a given user, find other similar users whose ratings strongly correlate with the current user.  Recommend items rated highly by these similar users, but not rated by the current user.  Almost all existing commercial recommenders use this approach (e.g. Amazon). 9
  • 10.
    Collaborative Filtering 10 A 9 B3 C : : Z 5 A B C 9 : : Z 10 A 5 B 3 C : : Z 7 A B C 8 : : Z A 6 B 4 C : : Z A 10 B 4 C 8 . . Z 1 User Database Active User Correlation Match A 9 B 3 C . . Z 5 A 9 B 3 C : : Z 5 A 10 B 4 C 8 . . Z 1 Extract Recommendations C
  • 11.
    Collaborative Filtering Method  Weightall users with respect to similarity with the active user.  Select a subset of the users (neighbors) to use as predictors.  Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings.  Present items with highest predicted ratings as recommendations. 11
  • 12.
    Measuring Similarity The firstquestion we must deal with is how to measure similarity of users or items from their rows or columns in the utility matrix. Jaccard Distance: lets assume A and B have an intersection of size 1 and a union of size 5.Thus, their Jaccard similarity is 1/5, and their Jaccard distance is 4/5; i.e.,they are very far apart. 12
  • 13.
    Conti…. Cosine Distance: We cantreat blanks as a 0 value. This choice is questionable, since it has the effect of treating the lack of a rating as more similar to disliking the movie than liking it. Rounding the Data: We could try to eliminate the apparent similarity between movies a user rates highly and those with low scores by rounding the ratings. For instance, we could consider ratings of 3, 4, and 5 as a “1” and consider ratings 1 and 2 as unrated. 13
  • 14.
    Conti…. Normalizing the rating: Ifwe normalize ratings, by subtracting from each rating the average rating of that user, we turn low ratings into negative numbers and high ratings into positive numbers. 14
  • 15.
    User based similarity Letsay, Sim(User1, User2)= Σj | rating (User1, Item j) – rating|(User2, Item j) /Num. of items 15
  • 16.
    Problems with Collaborative Filtering Cold Start: There needs to be enough other users already in the system to find a match.  Sparsity: If there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.  First Rater: Cannot recommend an item that has not been previously rated.  New items  Esoteric items  Popularity Bias: Cannot recommend items to someone with unique tastes.  Tends to recommend popular items. 16
  • 17.
    Content-Based Recommending  Recommendations arebased on information on the content of items rather than on other users’ opinions.  Uses a machine learning algorithm to induce a profile of the users preferences.  Some previous applications:  Newsweeder (Lang, 1995)  Syskill and Webert (Pazzani et al., 1996) 17
  • 18.
    Item profile In acontent-based system, we must construct for each item a profile, which is a record or collection of records representing important characteristics of that item.  1. The set of actors of the movie. Some viewers prefer movies with their favorite actors.  2. The director. Some viewers have a preference for the work of certain directors.  3. The year in which the movie was made. Some viewers prefer old movies; others watch only the latest releases.  4. The genre or general type of movie. Some viewers like only comedie, others dramas or romances. 18
  • 19.
    Discovering Features of Documents Thereare other classes of items where it is not immediately apparent what the values of features should be. We shall consider two of them: document collections and images.  First, eliminate stop words –the several hundred most common words, which tend to say little about the topic of a document.  For the remaining words, compute the TF.IDF score for each word in the document.  The ones with the highest scores are the words that characterize the document. 19
  • 20.
    Obtaining Item FeaturesFrom Tags  The problem with images is that their data, typically an array of pixels, does not tell us anything useful about their features.  There have been a number of attempts to obtain information about features of items by inviting users to tag the items by entering words or phrases that describe the item. 20
  • 21.
    User Profiles We needto create vectors with the same components that describe the user’s preferences. We have the utility matrix representing the connection between users and items. 21
  • 22.
    Recommending Items toUsers Based on Content With profile vectors for both users and items, we can estimate the degree to which a user would prefer an item by computing the cosine distance between the user’s and item’s vectors. 22
  • 23.
    Advantages of Content-Based Approach No need for data on other users.  No cold-start or sparsity problems.  Able to recommend to users with unique tastes.  Able to recommend new and unpopular items  No first-rater problem.  Can provide explanations of recommended items by listing content-features that caused an item to be recommended. 23
  • 24.
    Disadvantages of Content- BasedMethod  Requires content that can be encoded as meaningful features.  Users’ tastes must be represented as a learnable function of these content features.  Unable to exploit quality judgments of other users.  Unless these are somehow included in the content features. 24
  • 25.
    LIBRA Learning Intelligent BookRecommending Agent  Content-based recommender for books using information about titles extracted from Amazon.  Uses information extraction from the web to organize text into fields:  Author  Title  Editorial Reviews  Customer Comments  Subject terms  Related authors  Related titles 25
  • 26.
    LIBRA System 26 Amazon Pages Rated Examples UserProfile Machine Learning Learner Information Extraction LIBRA Database Recommendations 1.~~~~~~ 2.~~~~~~~ 3.~~~~~ : : : Predictor
  • 27.
    Sample Amazon Page 27 Ageof Spiritual Machines
  • 28.
    Sample Extracted Information 28 Title:<The Age of Spiritual Machines: When Computers Exceed Human Intelligence> Author: <Ray Kurzweil> Price: <11.96> Publication Date: <January 2000> ISBN: <0140282025> Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind> Author: <Hans Moravec> > … Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans…> > … Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has …> > … Related Authors: <Hans P. Moravec> <K. Eric Drexler>… Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence> …
  • 29.
    Libra Content Information Libra uses this extracted information to form “bags of words” for the following slots:  Author  Title  Description (reviews and comments)  Subjects  Related Titles  Related Authors 29
  • 30.
    Implementation  Stopwords removedfrom all bags.  A book’s title and author are added to its own related title and related author slots.  All probabilities are smoothed using Laplace estimation to account for small sample size.  Lisp implementation is quite efficient:  Training: 20 exs in 0.4 secs, 840 exs in 11.5 secs  Test: 200 books per second 30
  • 31.
    Experimental Data  Amazonsearches were used to find books in various genres.  Titles that have at least one review or comment were kept.  Data sets:  Literature fiction: 3,061 titles  Mystery: 7,285 titles  Science: 3,813 titles  Science Fiction: 3.813 titles 31
  • 32.
    Rated Data  4users rated random examples within a genre by reviewing the Amazon pages about the title:  LIT1 936 titles  LIT2 935 titles  MYST 500 titles  SCI 500 titles  SF 500 titles 32
  • 33.
    Combining Content and Collaboration Content-based and collaborative methods have complementary strengths and weaknesses.  Combine methods to obtain the best of both.  Various hybrid approaches:  Apply both methods and combine recommendations.  Use collaborative data as content.  Use content-based predictor as another collaborator.  Use content-based predictor to complete collaborative data. 33
  • 34.
    Content-Boosted Collaborative Filtering 34 IMDbEachMovie WebCrawler Movie Content Database Full User Ratings Matrix Collaborative Filtering Active User Ratings User Ratings Matrix (Sparse) Content-based Predictor Recommendations
  • 35.
    Content-Boosted CF -I 35 Content-Based Predictor Training Examples Pseudo User-ratings Vector Items with Predicted Ratings User-ratings Vector User-rated Items Unrated Items
  • 36.
    Conclusions  Recommending andpersonalization are important approaches to combating information over-load.  Machine Learning is an important part of systems for these tasks.  Collaborative filtering has problems.  Content-based methods address these problems (but have problems of their own).  Integrating both is best. 36

Editor's Notes

  • #6 But, It is not necessary to predict every blank entry in a utility matrix. Rather, it is only necessary to discover some entries in each row that are likely to be high.
  • #12 Normalize: