Recommender systems are a subclass of information filtering systems that seek to
predict the 'rating' or 'preference' that a user would give to an item.
They help in deciding what to wear, what to buy, what stocks to purchase, etc.
Applied in a variety of domains such as movies, books, and research articles.
Traditionally, people relied on recommendations from their peers.
This method doesn’t take the personal preferences of the user into account.
It also limits the search space.
Computer-based recommender systems overcome this by expanding the search
space and providing more fine-tuned results.
Tasks of Recommender Systems
Predict Task - Predict the user’s preference for an item.
Recommend Task - Produce the best ranked list of n items for the user’s need.
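The two tasks are closely related: a recommend task can be built on top of a predict task by scoring candidate items and keeping the n best. The sketch below illustrates this under stated assumptions; `recommend_top_n`, `toy_predict`, and the toy scores are hypothetical names and values, not part of any particular system.

```python
def recommend_top_n(user, candidate_items, predict, n):
    """Recommend task: rank candidates by predicted preference, keep the n best.

    `predict(user, item)` stands in for any predict-task model (assumption).
    """
    scored = [(item, predict(user, item)) for item in candidate_items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:n]]


# Toy predict task: a lookup table of made-up predicted ratings.
toy_scores = {("alice", "A"): 4.5, ("alice", "B"): 2.0, ("alice", "C"): 3.8}


def toy_predict(user, item):
    return toy_scores[(user, item)]


top_two = recommend_top_n("alice", ["A", "B", "C"], toy_predict, 2)
```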
Collaborative filtering is the process of filtering for information or patterns using
techniques involving collaboration among multiple agents, viewpoints, and data sources.
For recommender systems, collaborative filtering is a method of making automatic
predictions about the interests of a user by collecting preference information
from many users.
Based on the idea that people who agreed in their evaluation of certain items in
the past are likely to agree again in the future.
User - User Collaborative Filtering
Basic Idea- find other users whose past rating behavior is similar to that of the
current user and use their ratings on other items to predict what the current user
will like.
Required: Ratings matrix and similarity function that computes the similarity
between two users.
The selection of neighbors can be random or based on a threshold value.
User u’s prediction for item i is given by p_{u,i}
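A minimal sketch of user-user prediction follows. It assumes one common formulation: Pearson correlation over co-rated items as the similarity function, a similarity threshold for neighbor selection, and a mean-centered weighted average for the prediction. The function names and toy ratings are hypothetical, and other similarity functions or aggregation rules are equally valid.

```python
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = mean(ratings_a[i] for i in common)
    mean_b = mean(ratings_b[i] for i in common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((ratings_b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict_user_user(ratings, user, item, threshold=0.0):
    """p_{u,i} = mean(u) + sum(s * (r_{v,i} - mean(v))) / sum(|s|) over neighbors v."""
    user_mean = mean(ratings[user].values())
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim <= threshold:  # threshold-based neighbor selection
            continue
        num += sim * (other_ratings[item] - mean(other_ratings.values()))
        den += abs(sim)
    return user_mean if den == 0 else user_mean + num / den


# Toy ratings matrix (made-up values): user -> {item: rating}.
toy_ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 5},
    "u3": {"a": 1, "b": 5, "c": 2, "d": 1},
}
prediction = predict_user_user(toy_ratings, "u1", "d")
```

In this toy data only u2 passes the threshold, so the prediction is u1's mean plus u2's offset for item d. The result can fall outside the rating scale, so a real system would probably clip it.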
Item-Item Collaborative Filtering
Basic Idea- Recommend items that are similar to the user’s highly preferred items.
Provides performance gains by lending itself well to pre-computing similarity
User u’s prediction for item i is given by p_{u,i}
Cosine similarity or conditional probability is used to compute item-item
similarity.
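The item-item idea can be sketched as below, assuming cosine similarity between item rating vectors and a similarity-weighted average of the user's own ratings. All names and the toy ratings are hypothetical. In practice the item-item similarities would likely be precomputed offline rather than recomputed on every call as done here.

```python
from math import sqrt


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            items.setdefault(item, {})[user] = rating
    return items


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse rating vectors."""
    common = set(vec_a) & set(vec_b)
    dot = sum(vec_a[u] * vec_b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_item_item(ratings, user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    items = item_vectors(ratings)
    target_vector = items.get(item, {})
    num = den = 0.0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        sim = cosine(target_vector, items[other_item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


# Toy ratings matrix (made-up values): user -> {item: rating}.
toy_item_ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 5},
    "u3": {"a": 1, "b": 5, "c": 2, "d": 1},
}
item_prediction = predict_item_item(toy_item_ratings, "u1", "d")
```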
User-User or Item-Item CF: The user-items ratings domain is a vector space. Thus redundancy
Information Retrieval: term-document matrix thus high dimensional representation of terms
Synonymy, Polysemy, noise
Can we reduce the number of dimensions to a constant k?
Truncated SVD – dimensionality reduction by singular value decomposition
Information retrieval: latent semantic analysis / indexing (LSA/LSI).
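As an illustration of reducing to k dimensions, the sketch below keeps only the k largest singular values of a small, fully observed ratings matrix using NumPy's `numpy.linalg.svd`. The matrix values and the helper name `truncated_svd` are made up for the example. Real ratings matrices are sparse, so the missing entries would have to be handled (for example by imputation) before a plain SVD applies.

```python
import numpy as np


def truncated_svd(matrix, k):
    """Return the rank-k factors (U_k, s_k, V_k^T) of a dense matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


# Toy dense user-item ratings (rows: users, columns: items); values are made up.
dense_ratings = np.array(
    [
        [5, 4, 1, 1],
        [4, 5, 1, 2],
        [1, 1, 5, 4],
        [2, 1, 4, 5],
    ],
    dtype=float,
)

u_k, s_k, vt_k = truncated_svd(dense_ratings, k=2)

# Rank-2 approximation: each user and item is now described by 2 latent dimensions.
approximation = u_k @ np.diag(s_k) @ vt_k
```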
The core idea of probabilistic methods is to compute either P(i|u), the probability
that user u will purchase or view item i, or the probability distribution P(r_{u,i}|u) over
user u’s rating of item i.
Uses pairwise conditional probabilities with the naïve Bayes assumption to do
recommendation in unary e-commerce domains.
Based on user purchase histories, the algorithm estimates P(a|b) (the probability that a
user purchases a given that they have purchased b) for each pair of items a, b. The
user’s currently-viewed item or shopping basket is combined with these pairwise
probabilities to recommend items optimizing the expected value of site-defined
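The estimation step can be sketched as follows, assuming P(a|b) is estimated as the fraction of users who purchased b that also purchased a. The sketch covers only the single currently-viewed-item case and ranks by P(a|b) directly. It does not implement the basket combination or the site-defined objective. Function names and purchase histories are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pairwise_conditional_probabilities(histories):
    """Estimate P(a|b) = (#users who bought a and b) / (#users who bought b)."""
    item_counts = Counter()
    pair_counts = Counter()
    for items in histories:
        unique = set(items)  # unary data: only whether an item was purchased
        item_counts.update(unique)
        pair_counts.update(permutations(unique, 2))  # ordered pairs (a, b)
    return {(a, b): pair_counts[(a, b)] / item_counts[b] for (a, b) in pair_counts}


def recommend_for_viewed(viewed, probabilities, n):
    """Rank other items a by P(a | viewed) and return the n most probable."""
    candidates = [(a, p) for (a, b), p in probabilities.items() if b == viewed]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [a for a, _ in candidates[:n]]


# Made-up purchase histories, one list per user.
toy_histories = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
toy_probabilities = pairwise_conditional_probabilities(toy_histories)
suggestions = recommend_for_viewed("butter", toy_probabilities, 2)
```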
Probabilistic Matrix Factorization
Probabilistic latent semantic analysis/indexing (PLSA/PLSI)
PLSA decomposes the probability P(i|u) by introducing a set Z of latent factors.
Here z is a factor on the basis of which user (u) decides which item (i) to view or
purchase.
P(i|u) is therefore the sum over all z in Z of P(i|z) P(z|u).
Thus basically users are represented as a mixture of preference profiles or feature
preferences and attributes the item preference by user, to the preference profiles
rather than directly to the users.
Û is the matrix of the mixtures of preference profiles for each user
T̂ is the matrix of preference profile probabilities of selecting various items.
Σ is a diagonal matrix such that σ_z = P(z)
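The mixture idea can be made concrete with a small numeric sketch: a user is a distribution over preference profiles z, each profile is a distribution over items, and P(i|u) is the profile-weighted sum. The profile names, item names, and probabilities below are invented for illustration. In PLSA these parameters would be learned from data (typically with expectation-maximization), which this sketch does not attempt.

```python
def item_probabilities(p_z_given_u, p_i_given_z):
    """P(i|u) = sum over z of P(i|z) * P(z|u)."""
    items = next(iter(p_i_given_z.values())).keys()
    return {
        item: sum(p_z * p_i_given_z[z][item] for z, p_z in p_z_given_u.items())
        for item in items
    }


# Hypothetical user: mostly profile "z1", partly profile "z2".
toy_p_z_given_u = {"z1": 0.8, "z2": 0.2}

# Hypothetical item distributions for each profile (each row sums to 1).
toy_p_i_given_z = {
    "z1": {"item_a": 0.7, "item_b": 0.1, "item_c": 0.2},
    "z2": {"item_a": 0.1, "item_b": 0.8, "item_c": 0.1},
}

toy_p_i_given_u = item_probabilities(toy_p_z_given_u, toy_p_i_given_z)
```

Because both inputs are proper distributions, the resulting P(i|u) values also sum to 1.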
Hybrids can be particularly beneficial when the algorithms involved cover different use cases or
different aspects of the data set.
7 Classes of Hybrid Recommenders
Weighted – takes scores produced by several recommenders and combines them
Switching – switch between different algorithms according to the context
Mixed – present the results of several recommenders without combining them into a single list.
Feature-combining – Use multiple recommendation data sources to get a single meta-recommender
Cascading – chain the algorithms (output of one to other as input)
Feature-augmenting – Uses output of one algo as one of the inputs to other algo
Meta-level – Train a model using one algo and give it as input to another algo
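Two of the simpler classes, weighted and switching, can be sketched in a few lines. The recommender names, weights, and the `min_ratings` cut-off are hypothetical choices for illustration. Real systems would tune or learn them.

```python
def weighted_hybrid(predictions, weights):
    """Weighted: combine the scores of several recommenders with fixed weights."""
    total_weight = sum(weights[name] for name in predictions)
    return sum(weights[name] * score for name, score in predictions.items()) / total_weight


def switching_hybrid(n_user_ratings, cf_prediction, fallback_prediction, min_ratings=5):
    """Switching: use collaborative filtering only once the user has enough ratings."""
    return cf_prediction if n_user_ratings >= min_ratings else fallback_prediction


# Made-up predictions for one (user, item) pair from three recommenders.
toy_predictions = {"user_user": 4.0, "item_item": 3.0, "svd": 5.0}
toy_weights = {"user_user": 0.5, "item_item": 0.3, "svd": 0.2}

blended = weighted_hybrid(toy_predictions, toy_weights)
new_user_score = switching_hybrid(2, cf_prediction=4.2, fallback_prediction=3.5)
```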
Example: Netflix Prize – feature-weighted linear stacking, which uses
functions g_j of item meta-features, such as number of ratings or genre, to alter the blending ratio of the
various algorithms’ predictions on an item-by-item basis
User-based Algo: more tractable when there are more items than users
Item-based Algo: more tractable when there are more users than items
Minimal offline computation but higher online computation
Matrix Factorization methods:
- Require expensive offline model
+ Fast for online use
+ Reduced impact of ratings noise
+ Reduced impact of users’ ratings on each other’s ratings
Probabilistic Models: when recommendation process should follow models of user
Evaluating Recommender Systems
It can be costly to try algorithms on real sets of users and measure the effects.
Offline Algorithmic Evaluations:
Pre-test algorithms in order to understand user testing.
It is beneficial for performing direct, objective comparison of different algorithms in a
EachMovie: by DEC Systems Research Center – 2.8M user ratings of movies
MovieLens: three data sets: 100K timestamped user ratings; 1M ratings; and 10M ratings with 100K
timestamped records of users tagging movies.
Jester: ratings of 100 jokes from 73,421 users between April 1999 and May 2003, and
ratings of 150 jokes from 63,974 users between November 2006 and May 2009
BookCrossing: 1.1M ratings from 279K users for 271K books
Netflix: 100M datestamped ratings of 17K movies from 480K users.
Offline Evaluation Structure
The users in the data set are split into two groups: training set and test set.
A recommender model is built against the training set.
The users in the test set are then split into two parts: query set and target set.
The recommender is given the query set as a user history and asked to
recommend items or to predict ratings for the items in the target set;
it is then evaluated on how well its recommendations or predictions match
those held out in the target set.
This whole process is frequently repeated as in k-fold cross-validation by splitting
the users into k equal sets.
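The splitting procedure can be sketched as below, assuming users are shuffled into k roughly equal folds and each test user's ratings are divided at random into a query part and a held-out target part. The function names, the 20% target fraction, and the fixed seeds are illustrative assumptions.

```python
import random


def k_fold_user_splits(users, k, seed=0):
    """Yield (training_users, test_users) pairs, one per fold."""
    shuffled = sorted(users)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal sets
    for test_index in range(k):
        test = folds[test_index]
        train = [u for i, fold in enumerate(folds) if i != test_index for u in fold]
        yield train, test


def split_query_target(user_ratings, target_fraction=0.2, seed=0):
    """Split one test user's ratings into a query set and a held-out target set."""
    items = sorted(user_ratings)
    random.Random(seed).shuffle(items)
    n_target = max(1, int(len(items) * target_fraction))
    target = {item: user_ratings[item] for item in items[:n_target]}
    query = {item: user_ratings[item] for item in items[n_target:]}
    return query, target


demo_users = [f"u{i}" for i in range(10)]
demo_splits = list(k_fold_user_splits(demo_users, k=5))
demo_query, demo_target = split_query_target({"a": 5, "b": 3, "c": 4, "d": 2, "e": 1})
```

Each user appears in exactly one test fold, so every user's held-out ratings are evaluated once across the k runs.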
Prediction Accuracy: MAE
(MAE) Mean Absolute Error:
Example: 5-star scale [1, 5], an MAE of 0.7 means that the algorithm, on average,
was off by 0.7 stars.
This is useful for understanding the results in a particular context, but makes it
difficult to compare results across data sets as they have differing rating ranges
(NMAE) Normalized mean absolute error: Divides the MAE by the range of possible ratings,
giving a common metric range of [0, 1]
Prediction Accuracy: RMSE
(RMSE) Root Mean Square Error: Amplifies the larger absolute errors
Netflix Prize: $1M prize was awarded for a 10% improvement in RMSE over Netflix’s
Further, RMSE can also be normalized like NMAE by dividing by the range of the rating scale.
Out of the three techniques, which one to use depends on how the results are to be
Mostly these metrics are used for evaluation of predict tasks.
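The three metrics can be computed as in the sketch below, using their standard definitions over (predicted, actual) rating pairs. The example pairs are made up and assume a 5-star scale of [1, 5].

```python
from math import sqrt


def mae(pairs):
    """Mean absolute error: average of |predicted - actual|."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


def nmae(pairs, r_low, r_high):
    """Normalized MAE: MAE divided by the range of possible ratings."""
    return mae(pairs) / (r_high - r_low)


def rmse(pairs):
    """Root mean square error: squaring amplifies the larger errors."""
    return sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))


# Made-up (predicted, actual) pairs on a 1-5 star scale.
toy_pairs = [(4.0, 5.0), (3.0, 3.0), (2.0, 4.0), (5.0, 4.0)]

toy_mae = mae(toy_pairs)           # absolute errors 1, 0, 2, 1
toy_nmae = nmae(toy_pairs, 1, 5)   # MAE divided by a range of 4
toy_rmse = rmse(toy_pairs)         # squared errors 1, 0, 4, 1
```

Here the single 2-star miss pushes RMSE above MAE, which is the amplification effect described above.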
Accuracy over time
Temporal versions of MAE and RMSE introduced to measure the accuracy of
recommender systems over time as and when more users are added to the
Hence the timestamped datasets prove to be very useful for measuring accuracy
n_t - number of ratings computed up through time t
t_{u,i} - the time of rating r_{u,i}.
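One plausible reading of the temporal metrics, given the definitions of n_t and the rating times, is to average the error over only those ratings made up through time t. The sketch below assumes that form for MAE. The record layout `(timestamp, predicted, actual)` and the values are hypothetical.

```python
def temporal_mae(records, t):
    """MAE over the n_t ratings whose timestamp is at or before time t.

    `records` is an iterable of (timestamp, predicted, actual) tuples (assumed layout).
    Returns None when no rating has been made up through time t.
    """
    errors = [abs(p - r) for timestamp, p, r in records if timestamp <= t]
    n_t = len(errors)
    return sum(errors) / n_t if n_t else None


# Made-up timestamped predictions; timestamps are simple integers for illustration.
toy_records = [(1, 4.0, 5.0), (2, 3.0, 3.0), (3, 2.0, 4.0), (4, 5.0, 4.0)]

early_error = temporal_mae(toy_records, t=2)
final_error = temporal_mae(toy_records, t=4)
```

Evaluating at increasing values of t gives an accuracy curve over the life of the data set.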
Decision Support Metrics
This framework examines the capacity for a retrieval system to accurately identify
resources relevant to a query, measuring separately its capacity to find all relevant
items and avoid finding irrelevant items.
A confusion matrix is used for measuring this.
High Precision System: Example - Movie Recommendation
High Recall System: Example – Legal precedent needs
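Precision and recall follow directly from the confusion-matrix counts. The sketch below uses the standard definitions, treating the recommended list as the retrieved set. The item identifiers are made up.

```python
def precision_recall(recommended, relevant):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    false_positives = len(recommended - relevant)
    false_negatives = len(relevant - recommended)
    retrieved = true_positives + false_positives
    wanted = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / wanted if wanted else 0.0
    return precision, recall


# Made-up example: 4 items recommended, 3 items actually relevant.
toy_precision, toy_recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

A movie recommender mostly needs the few items it shows to be good (precision), while a legal precedent search must not miss relevant items (recall).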
Offline evaluation, though useful, is limited to operating on past data.
Recommender systems with similar metric performance can still give different
results and a decrease in the error may or may not make the system better at
meeting the user needs.
For this, online user testing is needed.
Field Trials: Here the recommender is deployed in the live system and users’
interactions with the system are recorded.
Virtual Lab Studies: They generally have a small user base who are invited to
participate instead of live user base.
Building a Data Set
The need for preference data can be decomposed into two types of information
User information: user’s preferences
Item information: what kinds of users like or dislike each item
User–item preferences: Set of characteristics, user preferences for those
characteristics, and those characteristics’ applicability to various items.
Item–item model: What items are liked by the same users as well as the current
Cold start: the problem of providing recommendations when there is not yet data available
Item cold-start: A new item has been added to the database (e.g., when a new
movie or book is released) but has not yet received enough ratings to be
User cold-start: A new user has joined the system but their preferences are not yet
Sources of Preference Data
Preference data (ratings) comes from two primary sources.
Explicit ratings: Preferences the user has explicitly stated for particular items.
Implicit ratings: Preferences inferred by the system from observable user activity, such as
purchases or clicks.
Many recommender systems obtain ratings by having users explicitly state their
preferences for items. These stated preferences are then used to estimate the
user’s preference for items they have not rated.
Drawback: There can, for many reasons, be a discrepancy between what the users say
and what they do.
Preferences can also be inferred from user
Usenet – Reading
Time spent reading
Saving or replying
Copying text into new articles
Mentions of URLs.
Infers the user’s preference for
various songs in their library as they
skip them or allow them to play to
Item purchases as gifts or personal
Shared accounts can be misleading
GroupLens used a 5-star scale
Jester uses a semi-continuous −10 to +10 graphical scale
Ringo used a 7-star scale
Pandora music uses a “like”/“dislike” method
Dealing with Noise
Noise in ratings can be introduced by normal human error and other factors.
Natural noise in ratings can be detected by asking users to re-rate items.
Another approach is detecting and ignoring noisy ratings by comparing each
rating to the user’s predicted preference for that item and discarding ratings
whose differences exceed some threshold from the prediction and