UNIT V
RECOMMENDER SYSTEM
5. Recommender Systems
Recommender Systems (RSs) are software tools and techniques providing suggestions for items to
be of use to a user. The suggestions relate to various decision-making processes, such as what
items to buy, what music to listen to, or what online news to read.
“Item” is the general term used to denote what the system recommends to users. A RS normally
focuses on a specific type of item (e.g., CDs, or news) and accordingly its design, its graphical
user interface, and the core recommendation technique used to generate the recommendations are
all customized to provide useful and effective suggestions for that specific type of item.
The popular Web site Amazon.com, for example, employs a RS to personalize the online store for
each customer.
Since recommendations are usually personalized, different users or user groups receive diverse
suggestions.
In addition there are also non-personalized recommendations. These are much simpler to generate
and are normally featured in magazines or newspapers.
Typical examples include the top ten selections of books, CDs etc. While they may be useful and
effective in certain situations, these types of non-personalized recommendations are not typically
addressed by RS research.
People often rely on recommendations from others when making routine decisions. In seeking to
mimic this behavior, the first RSs applied algorithms to leverage recommendations
produced by a community of users to deliver recommendations to an active user, i.e., a user
looking for suggestions.
The recommendations were for items that similar users (those with similar tastes) had liked.
This approach is termed collaborative filtering, and its rationale is that if the active user agreed in
the past with some users, then the other recommendations coming from these similar users should
be relevant as well and of interest to the active user.
As e-commerce Web sites began to develop, a pressing need emerged for providing
recommendations derived from filtering the whole range of available alternatives.
Users were finding it very difficult to arrive at the most appropriate choices from the immense
variety of items (products and services) that these Web sites were offering.
The explosive growth and variety of information available on the Web and the rapid introduction
of new e-business services (buying products, product comparison, auction, etc.) frequently
overwhelmed users, leading them to make poor decisions.
The availability of choices, instead of producing a benefit, started to decrease users’ well-being. It
was understood that while choice is good, more choice is not always better.
RSs have proved in recent years to be a valuable means for coping with the information overload
problem. Ultimately a RS addresses this phenomenon by pointing a user towards new, not-yet-
experienced items that may be relevant to the user’s current task.
Recommender systems emerged as an independent research area in the mid-1990s. In recent years,
the interest in recommender systems has dramatically increased, as the following facts indicate:
Recommender systems play an important role in such highly rated Internet sites as Amazon.com,
YouTube, Netflix, Yahoo, Tripadvisor, Last.fm, and IMDb.
Moreover many media companies are now developing and deploying RSs as part of the services
they provide to their subscribers.
For example Netflix, the online movie rental service, awarded a million dollar prize to the team
that first succeeded in improving substantially the performance of its recommender system.
There are dedicated conferences and workshops related to the field. We refer specifically to ACM
Recommender Systems (RecSys), established in 2007 and now the premier annual event in
recommender technology research and applications.
At institutions of higher education around the world, undergraduate and graduate courses are now
dedicated entirely to RSs; tutorials on RSs are very popular at computer science conferences; and
recently a book introducing RSs techniques was published.
There have been several special issues in academic journals covering research and developments
in the RS field. Among the journals that have dedicated issues to RS are: AI Communications
(2008); IEEE Intelligent Systems (2007); International Journal of Electronic Commerce (2006);
International Journal of Computer Science and Applications (2006); ACM Transactions on
Computer-Human Interaction (2005); and ACM Transactions on Information Systems (2004).
5.1 Recommender Systems Functions
In fact, there are various reasons as to why service providers may want to exploit this technology:
o Increase the number of items sold. This is probably the most important function for a
commercial RS, i.e., to be able to sell an additional set of items compared to those usually sold
without any kind of recommendation.
o Sell more diverse items. Another major function of a RS is to enable the user to select items
that might be hard to find without a precise recommendation. For instance, in a movie RS such
as Netflix, the service provider is interested in renting all the DVDs in the catalogue, not just
the most popular ones.
o Increase the user satisfaction. A well designed RS can also improve the experience of the
user with the site or the application. The user will find the recommendations interesting,
relevant and, with a properly designed human-computer interaction, she will also enjoy using
the system. The combination of effective, i.e., accurate, recommendations and a usable
interface will increase the user’s subjective evaluation of the system.
o Increase user fidelity. A user should be loyal to a Web site which, when visited, recognizes
the old customer and treats him as a valuable visitor.
o Better understand what the user wants. Another important function of a RS, which can be
leveraged to many other applications, is the description of the user’s preferences, either
collected explicitly or predicted by the system. The service provider may then decide to re-use
this knowledge for a number of other goals such as improving the management of the item’s
stock or production.
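A RS must also be useful to its users, who may approach it with different tasks in mind. A Web search
engine offers a useful comparison: its primary function is to locate documents that are relevant to the
user’s information need, but it can also be used to check the importance of a Web page (looking at the
position of the page in the result list of a query) or to discover the various usages of a word in a
collection of documents. The tasks that a RS can help a user to carry out include: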
o Find Some Good Items: Recommend to a user some items as a ranked list along with predictions
of how much the user would like them (e.g., on a one- to five-star scale).
o Find all good items: Recommend all the items that can satisfy some user needs.
o Annotation in context: Given an existing context, e.g., a list of items, emphasize some of them
depending on the user’s long-term preferences. For example, a TV recommender system might
annotate which TV shows displayed in the electronic program guide (EPG) are worth watching.
o Recommend a sequence: Instead of focusing on the generation of a single recommendation, the
idea is to recommend a sequence of items that is pleasing as a whole.
o Recommend a bundle: Suggest a group of items that fits well together. For instance, a travel
plan may be composed of various attractions, destinations, and accommodation services that are
located in a delimited area.
o Just browsing: In this task, the user browses the catalog without any imminent intention of
purchasing an item. The task of the recommender is to help the user to browse the items that are
more likely to fall within the scope of the user’s interests for that specific browsing session. This is a
task that has been also supported by adaptive hypermedia techniques.
o Find credible recommender: Some users do not trust recommender systems, so they play with
them to see how good they are at making recommendations.
o Improve the profile: This relates to the capability of the user to provide (input) information to the
recommender system about what he likes and dislikes.
o Express self: Some users may not care about the recommendations at all.
o Help others: Some users are happy to contribute with information, e.g., their evaluation of items
(ratings), because they believe that the community benefits from their contribution.
o Influence others: In Web-based RSs, there are users whose main goal is to explicitly influence
other users into purchasing particular products.
5.2 Data and Knowledge Sources
RSs are information processing systems that actively gather various kinds of data in order to build
their recommendations.
Data is primarily about the items to suggest and the users who will receive these recommendations.
But, since the data and knowledge sources available for recommender systems can be very
diverse, ultimately, whether they can be exploited or not depends on the recommendation
technique.
In general, there are recommendation techniques that are knowledge poor, i.e., they use very
simple and basic data, such as user ratings/evaluations for items.
Other techniques are much more knowledge dependent, e.g., using ontological descriptions of the
users or the items, or constraints, or social relations and activities of the users.
In any case, as a general classification, data used by RSs refers to three kinds of objects: items,
users, and transactions, i.e., relations between users and items.
1. Items.
Items are the objects that are recommended.
Items may be characterized by their complexity and their value or utility.
The value of an item may be positive if the item is useful for the user, or negative if the item is not
appropriate and the user made a wrong decision when selecting it.
We note that when a user is acquiring an item she will always incur a cost, which includes the
cognitive cost of searching for the item and the real monetary cost eventually paid for the item.
For instance, the designer of a news RS must take into account the complexity of a news item, i.e.,
its structure, the textual representation, and the time-dependent importance of any news item.
Items with low complexity and value are: news, Web pages, books, CDs, movies. Items with larger
complexity and value are: digital cameras, mobile phones, PCs, etc.
The most complex items that have been considered are insurance policies, financial investments,
travels, jobs.
RSs, according to their core technology, can use a range of properties and features of the items.
For example, in a movie recommender system, the genre (such as comedy, thriller, etc.) is one such feature.
2. Users.
Users of a RS, as mentioned above, may have very diverse goals and characteristics.
In order to personalize the recommendations and the human-computer interaction, RSs exploit a
range of information about the users.
This information can be structured in various ways and again the selection of what information to
model depends on the recommendation technique.
For instance, in collaborative filtering, users are modeled as a simple list containing the ratings
provided by the user for some items. In a demographic RS, socio-demographic attributes such as
age, gender, profession, and education, are used.
Users can also be described by their behavior pattern data, for example, site browsing patterns (in
a Web-based recommender system), or travel search patterns (in a travel recommender system).
Moreover, user data may include relations between users, such as the level of trust between
them. A RS might utilize this information to recommend items to users that were
preferred by similar or trusted users.
3. Transactions.
We generically refer to a transaction as a recorded interaction between a user and the RS.
Transactions are log-like data that store important information generated during the human-
computer interaction and which are useful for the recommendation generation algorithm that the
system is using.
For instance, a transaction log may contain a reference to the item selected by the user and a
description of the context (e.g., the user goal/query) for that particular recommendation.
If available, that transaction may also include explicit feedback the user has provided, such as
the rating for the selected item.
In fact, ratings are the most popular form of transaction data that a RS collects.
These ratings may be collected explicitly or implicitly.
In the explicit collection of ratings, the user is asked to provide her opinion about an item on a
rating scale. Accordingly, ratings can take on a variety of forms:
o Numerical ratings such as the 1-5 stars provided in the book recommender associated with
Amazon.com.
o Ordinal ratings, such as “strongly agree, agree, neutral, disagree, strongly disagree” where the
user is asked to select the term that best indicates her opinion regarding an item (usually via
questionnaire).
o Binary ratings that model choices in which the user is simply asked to decide if a certain item
is good or bad.
o Unary ratings can indicate that a user has observed or purchased an item, or otherwise rated the
item positively. In such cases, the absence of a rating indicates that we have no information
relating the user to the item (perhaps she purchased the item somewhere else).
In transactions collecting implicit ratings, the system aims to infer the user’s opinion based on the
user’s actions.
For example, if a user enters the keyword “Yoga” at Amazon.com she will be provided with a long
list of books. In return, the user may click on a certain book on the list in order to receive additional
information.
At this point, the system may infer that the user is somewhat interested in that book.
5.3 Recommendation Techniques
Too much information: information overload – consumers have too many options
A recommender system is a system which provides recommendations to a user
Information used for recommendations can come from different sources:
browsing and searching data
purchase data
feedback explicitly provided by the users
textual comments
expert recommendations
demographic data
Recommendations can take the following forms:
Attribute-based recommendations: based on syntactic attributes of products
(e.g. science fiction books)
Item-to-item correlation (as in shopping basket recommendations)
User-to-user correlation (finding users with similar tastes)
Non-personalized recommendations (as in traditional stores, i.e. dish of the day, generic
book recommendations etc.)
Recommendation technologies: IR systems
allow users to express queries to retrieve information relevant to a topic of
interest or fulfill an information need
they are not useful in the actual recommendation process
they cannot capture any information about the users’ preferences
they cannot retrieve documents based on opinions or quality as they are text-based
To address these issues two techniques have been developed:
o Content-based filtering (Information filtering)
o Collaborative-based filtering
5.4 Content-based filtering
The system processes information from various sources and tries to extract useful elements about
its content
o keyword-based search (keywords sometimes in Boolean form)
o semantic-information extraction by using associative networks of keywords, or directed
graphs of words
Each user is assumed to act independently and the system requires a profile of the user’s needs or
preferences
The user has to provide information on her personal interests on starting to use the system for the
profile to be built
The profile includes information about the items of interest, i.e. movies, books, CDs etc.
Content-based filtering techniques try to identify similar items which are returned as
recommendations
They do not depend on having other users in the system
Issues:
o Pure content-based filtering systems are not capable of exploring new items and topics
o Over-specialization: one is restricted in viewing similar items
o Difficult to apply in situations where the desirability of an item is determined in part by aesthetic
qualities that are difficult to quantify
The user profiles
For the system to produce accurate recommendations, the user has to provide constant
feedback on the returned suggestions – users do not like providing feedback
Consist entirely of ratings of items and topics of interest: the fewer the ratings, the more
limited the set of possible recommendations
As the user’s interests change, these changes need to be tracked
5.4.1 High Level Architecture:
The high-level architecture of content-based filtering is shown in Fig. 5.1.
Fig. 5.1 Content-based Filtering Architecture
CONTENT ANALYZER
When information has no structure (e.g. text), some kind of pre-processing step is needed to
extract structured relevant information.
The main responsibility of the component is to represent the content of items (e.g. documents,
Web pages, news, product descriptions, etc.) coming from information sources in a form suitable
for the next processing steps.
Data items are analyzed by feature extraction techniques in order to shift item representation from
the original information space to the target one (e.g. Web pages represented as keyword vectors).
This representation is the input to the PROFILE LEARNER and FILTERING COMPONENT.
PROFILE LEARNER
This module collects data representative of the user preferences and tries to generalize this data, in
order to construct the user profile.
Usually, the generalization strategy is realized through machine learning techniques, which are
able to infer a model of user interests starting from items liked or disliked in the past.
For instance, the PROFILE LEARNER of a Web page recommender can implement a relevance
feedback method in which the learning technique combines vectors of positive and negative
examples into a prototype vector representing the user profile.
Training examples are Web pages on which positive or negative feedback has been provided by
the user.
FILTERING COMPONENT
This module exploits the user profile to suggest relevant items by matching the profile
representation against that of items to be recommended.
The result is a binary or continuous relevance judgment (computed using some similarity metric),
the latter case resulting in a ranked list of potentially interesting items.
In the above mentioned example, the matching is realized by computing the cosine similarity
between the prototype vector and the item vectors.
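The interplay of PROFILE LEARNER and FILTERING COMPONENT described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the weights `beta` and `gamma`, and the three-keyword toy vectors are all hypothetical, and the prototype is built with a simple Rocchio-style rule (mean of liked vectors minus a damped mean of disliked vectors).

```python
import math

def prototype_profile(liked, disliked, beta=1.0, gamma=0.5):
    """Combine positive and negative example vectors into one prototype vector.

    beta and gamma are illustrative weights, not values taken from the notes.
    """
    dim = len((liked or disliked)[0])
    profile = [0.0] * dim
    for vectors, weight in ((liked, beta), (disliked, -gamma)):
        for vec in vectors:
            for k, value in enumerate(vec):
                profile[k] += weight * value / len(vectors)
    return profile

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_items(profile, items):
    """Return item ids sorted by decreasing cosine similarity to the profile."""
    return sorted(items, key=lambda item_id: cosine(profile, items[item_id]), reverse=True)

# Hypothetical keyword vectors over the vocabulary (yoga, cooking, football).
liked = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
disliked = [[0.0, 0.0, 1.0]]
candidates = {"yoga_blog": [0.9, 0.1, 0.0], "match_report": [0.0, 0.1, 0.9]}

profile = prototype_profile(liked, disliked)
ranking = rank_items(profile, candidates)
```

Here `ranking` lists the page about yoga first, because its vector points in nearly the same direction as the prototype while the football page is pushed down by the negative example.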
5.4.2 Advantages and Drawbacks of Content-based Filtering
Advantages
The model doesn't need any data about other users, since the recommendations are specific to this
user. This makes it easier to scale to a large number of users.
The model can capture the specific interests of a user, and can recommend niche items that very
few other users are interested in.
Drawbacks
Since the feature representation of the items is hand-engineered to some extent, this technique
requires a lot of domain knowledge. Therefore, the model can only be as good as the hand-
engineered features.
The model can only make recommendations based on existing interests of the user. In other words,
the model has limited ability to expand on the users' existing interests.
5.5 Collaborative filtering
Collaborative-based filtering systems can produce recommendations by computing the similarity
between a user’s preferences and the preferences of other people
Such systems do not attempt to analyze or understand the content of the items being recommended
They are able to suggest new items to users who have similar preferences to others
Match people with similar interests as a basis for recommendation.
Many people must participate to make it likely that a person with similar interests will be found.
There must be a simple way for people to express their interests.
There must be an efficient algorithm to match people with similar interests
How does CF Work?
o Users rate items – user interests recorded. Ratings may be:
Explicit, e.g. buying or rating an item
Implicit, e.g. browsing time, no. of mouse clicks
o Nearest neighbour matching used to find people with similar interests
o Items that neighbours rate highly but that you have not rated are recommended to you
o User can then rate recommended items
5.5.1 Basic mechanism:
A large group of people’s preferences are registered
A subgroup of people is located whose preferences are similar to those of the user who seeks the
recommendation
An average of the preferences for that group is calculated
The resulting preference function is used to recommend options to the user who seeks the
recommendation
The concept of similarity needs to be defined in some way
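The basic mechanism above can be sketched as follows. This is an illustrative sketch only: the ratings are hypothetical toy data (not the matrix from the original slides), similarity is defined here as Pearson correlation over co-rated items, and the "average of the preferences" is taken as a similarity-weighted mean over the `k` most similar users. All three are assumptions made for the example.

```python
import math

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "A": {"i1": 5, "i2": 4, "i3": 1, "i4": 5},
    "B": {"i1": 4, "i2": 5, "i3": 2, "i4": 4},
    "C": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
    "D": {"i1": 5, "i2": 5, "i3": 1},
}

def pearson(u, v):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den = math.sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common)) * \
          math.sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item, k=2):
    """Similarity-weighted average rating of the k most similar users who rated the item."""
    scored = sorted(
        ((pearson(user, v), v) for v in ratings if v != user and item in ratings[v]),
        reverse=True,
    )[:k]
    neighbours = [(s, v) for s, v in scored if s > 0]  # ignore dissimilar users
    if not neighbours:
        return None
    return sum(s * ratings[v][item] for s, v in neighbours) / sum(s for s, _ in neighbours)

prediction = predict("D", "i4")
```

In this toy data users A and B agree with D on the co-rated items while C disagrees, so the predicted rating of `i4` for D is a weighted mean of the ratings given by A and B.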
Example user-item matrix
What would be the recommendation for user D?
CF Techniques:
Case Study – Amazon.com
Customers who bought this item also bought:
Item-to-item collaborative filtering
o Find similar items rather than similar customers.
Record pairs of items bought by the same customer and their similarity.
o This computation is done offline for all items.
Use this information to recommend similar or popular books bought by others.
o This computation is fast and done online.
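The offline/online split described above can be sketched as below. This is a simplified illustration and not Amazon's actual algorithm: the purchase data is hypothetical, and item similarity is assumed to be the cosine of co-purchase counts, which is one common choice among several.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_item_similarities(baskets):
    """Offline step: similarity between every pair of items bought by the same customer."""
    bought = defaultdict(int)    # item -> number of customers who bought it
    together = defaultdict(int)  # (item_a, item_b) -> number of customers who bought both
    for items in baskets.values():
        unique = sorted(set(items))
        for item in unique:
            bought[item] += 1
        for a, b in combinations(unique, 2):
            together[(a, b)] += 1
    sims = defaultdict(dict)
    for (a, b), n in together.items():
        score = n / math.sqrt(bought[a] * bought[b])  # cosine over binary purchase vectors
        sims[a][b] = score
        sims[b][a] = score
    return sims

def also_bought(item, sims, top_n=3):
    """Online step: a cheap lookup in the precomputed table."""
    neighbours = sims.get(item, {})
    return sorted(neighbours, key=neighbours.get, reverse=True)[:top_n]

# Hypothetical purchase histories: customer -> items bought.
baskets = {
    "c1": ["book_a", "book_b"],
    "c2": ["book_a", "book_b", "book_c"],
    "c3": ["book_a", "book_c"],
    "c4": ["book_b", "book_d"],
}
sims = build_item_similarities(baskets)
suggestions = also_bought("book_d", sims)
```

The expensive pairwise pass runs once over all purchase histories, while serving a "customers who bought this item also bought" list only needs a dictionary lookup and a sort over one item's neighbours.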
Challenges for CF
Sparsity problem – when many of the items have not been rated by many people, it may be hard to
find ‘like minded’ people.
First rater problem – what happens if an item has not been rated by anyone?
Privacy problems.
Can combine CF with CB recommenders
o Use CB approach to score some unrated items.
o Then use CF for recommendations.
Content-Based (CB) – use personal preferences to match and filter items
o E.g. what sort of books do I like?
Serendipity - recommend to me something I do not know already
o Oxford dictionary: the occurrence and development of events by chance in a happy or
beneficial way.
Issues
A critical mass of users is needed in order to create a database of preferences: first-rater or cold
start problem
New items cannot be recommended until someone has rated them
The scarcity of ratings (the user profiles are sparse vectors of ratings) also presents a problem
Recommendations will come only from users with whom the active user shares ratings (or votes); this
presents a problem for methods such as Pearson’s correlation coefficient. A potential solution is
default voting
Scalability: in systems with a large number of items and users, computation grows linearly;
appropriate algorithms that scale up are needed
Reliability, especially in reputation systems: content providers inflate their ratings
Lack of transparency: the user is given no indication whether to trust a recommendation –
incorporating explanation systems would help address this concern
Privacy – once a system has built your profile, who else can have access to it?
Advantages
No domain knowledge necessary
Serendipity
Great starting point
Disadvantages
Cannot handle fresh items
Hard to include side features for query/item
Combining collaborative and content-based filtering
The underlying idea is that the content is also taken into account when attempting to identify
similar users for collaborative recommendations
A number of systems have been developed: Fab, Tango, the Recommender system,
GroupLens approach
5.6 Matrix Factorization Models
Matrix factorization is a class of collaborative filtering algorithms used in recommender systems.
Matrix factorization algorithms work by decomposing the user-item interaction matrix into the
product of two lower dimensionality rectangular matrices.
Examples of latent factor models include Latent Dirichlet Allocation and models that are induced
by factorization of the user-item ratings matrix (also known as SVD-based models).
However, applying SVD to explicit ratings in the CF domain raises difficulties due to the high
portion of missing values.
Conventional SVD is undefined when knowledge about the matrix is incomplete.
Moreover, carelessly addressing only the relatively few known entries is highly prone to
overfitting.
Earlier works relied on imputation which fills in missing ratings and makes the rating matrix
dense.
However, imputation can be very expensive as it significantly increases the amount of data. In
addition, the data may be considerably distorted due to inaccurate imputation.
5.6.1. SVD
Matrix factorization models map both users and items to a joint latent factor space of
dimensionality $f$, such that user-item interactions are modeled as inner products in that space.
The latent space tries to explain ratings by characterizing both products and users on factors
automatically inferred from user feedback.
For example, when the products are movies, factors might measure obvious dimensions such as
comedy vs. drama, amount of action, or orientation to children; less well defined dimensions such
as depth of character development or “quirkiness”; or completely uninterpretable dimensions.
Accordingly, each item $i$ is associated with a vector $q_i \in \mathbb{R}^f$, and each user $u$ is associated with a
vector $p_u \in \mathbb{R}^f$. Thus, a rating is predicted by the rule
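The rule itself did not survive in these notes. The form below is the standard biased matrix factorization rule from the literature, consistent with the user bias, item bias, user factors and item factors named in this section. The symbol $\mu$ (the overall average rating) is an assumption introduced here, since the notes do not define it.

```latex
\hat{r}_{ui} = \mu + b_i + b_u + q_i^{T} p_u
```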
In order to learn the model parameters ($b_u$, $b_i$, $p_u$ and $q_i$) we minimize the regularized squared error
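The objective is missing from these notes. A standard form consistent with the parameters and the regularization constant mentioned here is given below. The set $\mathcal{K}$ of (user, item) pairs with known ratings and the overall average rating $\mu$ are symbols assumed for this sketch.

```latex
\min_{b_*, q_*, p_*} \sum_{(u,i) \in \mathcal{K}}
  \left( r_{ui} - \mu - b_i - b_u - q_i^{T} p_u \right)^2
  + \lambda_4 \left( b_i^2 + b_u^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
```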
The constant $\lambda_4$, which controls the extent of regularization, is usually determined by
cross-validation. Minimization is typically performed by either stochastic gradient descent or alternating
least squares.
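A minimal sketch of the stochastic gradient descent option is shown below, assuming the usual biased model (global mean plus user bias, item bias and an inner product of factor vectors). The function names, the toy ratings, and the hyperparameter values (`lr`, `reg`, `n_factors`, `n_epochs`) are illustrative assumptions, not values from the notes; `reg` plays the role of the regularization constant.

```python
import math
import random

def train_biased_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit biases and factor vectors by SGD; returns a predict(user, item) function."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + b_u[u] + b_i[i] + sum(x * y for x, y in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Move each parameter against the gradient of the regularized squared error.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for k in range(n_factors):
                pu, qi = p[u][k], q[i][k]
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return predict

def rmse(predict, ratings):
    """Root mean squared error of a predictor on (user, item, rating) triples."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical (user, item, rating) triples.
data = [("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m3", 1), ("u2", "m1", 4),
        ("u2", "m3", 2), ("u3", "m1", 5), ("u3", "m2", 5), ("u3", "m3", 1)]

predict = train_biased_mf(data)
mean_rating = sum(r for _, _, r in data) / len(data)
rmse_mean = rmse(lambda u, i: mean_rating, data)
rmse_model = rmse(predict, data)
```

On this toy data the trained model fits the known ratings more closely than always predicting the global mean. On real data the comparison should be made on held-out ratings, since training error says nothing about overfitting.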
5.6.2. SVD++
Prediction accuracy is improved by considering also implicit feedback, which provides an
additional indication of user preferences.
This is especially helpful for those users that provided much more implicit feedback than explicit
feedback.
As explained earlier, even in cases where independent implicit feedback is absent, one can capture
a significant signal by accounting for which items users rate, regardless of their rating value.
This led to several methods that modeled a user factor by the identity of the items he/she has rated.
Here we focus on the SVD++ method which was shown to offer accuracy superior to SVD.
To this end, a second set of item factors is added, relating each item $i$ to a factor vector $y_i \in \mathbb{R}^f$.
Those new item factors are used to characterize users based on the set of items that they rated. The
exact model is as follows:
The set R(u) contains the items rated by user u.
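The model equation is missing from these notes. The usual statement of SVD++ in the literature, consistent with the second set of item factors and the set of items rated by the user described here, is given below. The overall average rating $\mu$ and the normalization by $|R(u)|^{-1/2}$ come from that literature rather than from the text above.

```latex
\hat{r}_{ui} = \mu + b_i + b_u
  + q_i^{T} \left( p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j \right)
```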
Several types of implicit feedback can be simultaneously introduced into the model by using extra
sets of item factors. For example, if a user $u$ has a certain kind of implicit preference to the items
in $N^1(u)$ (e.g., she rented them), and a different type of implicit feedback to the items in $N^2(u)$
(e.g., she browsed them), we could use the model
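The equation is missing from these notes. A form consistent with the two implicit-feedback sets just described is sketched below, with one extra set of item factors per feedback type. The symbols $y^{(1)}_j$, $y^{(2)}_j$ and $\mu$ are notation assumed here.

```latex
\hat{r}_{ui} = \mu + b_i + b_u + q_i^{T} \left( p_u
  + |N^1(u)|^{-1/2} \sum_{j \in N^1(u)} y^{(1)}_j
  + |N^2(u)|^{-1/2} \sum_{j \in N^2(u)} y^{(2)}_j \right)
```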
The relative importance of each source of implicit feedback will be automatically learned by the
algorithm by its setting of the respective values of model parameters.
5.6.3. Time-aware factor model
The matrix-factorization approach lends itself well to modeling temporal effects, which can
significantly improve its accuracy.
Decomposing ratings into distinct terms allows us to treat different temporal aspects separately.
Specifically, we identify the following effects that each vary over time: (1) user biases $b_u(t)$, (2)
item biases $b_i(t)$, and (3) user preferences $p_u(t)$.
On the other hand, we specify static item characteristics, $q_i$, because we do not expect significant
temporal variation for items, which, unlike humans, are static in nature.
We start with a detailed discussion of the temporal effects that are contained within the baseline
predictors.
5.6.3.1 Time changing baseline predictors
Much of the temporal variability is included within the baseline predictors, through two major
temporal effects.
The first addresses the fact that an item’s popularity may change over time. For example,
movies can go in and out of popularity as triggered by external events such as the appearance of an
actor in a new movie. This is manifested in our models by treating the item bias $b_i$ as a function of
time.
The second major temporal effect allows users to change their baseline ratings over time. For
example, a user who tended to rate an average movie “4 stars”, may now rate such a movie “3
stars”. This may reflect several factors including a natural drift in a user’s rating scale, the fact that
ratings are given in relationship to other ratings that were given recently and also the fact that the
identity of the rater within a household can change over time.
Hence, in our models we take the parameter $b_u$ as a function of time. This induces a template for a
time sensitive baseline predictor for $u$’s rating of $i$ at day $t_{ui}$:
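The template itself is missing from these notes. Given that both biases are taken as functions of time, it can be written as below, where $\mu$ (the overall average rating) is a symbol assumed here.

```latex
b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui})
```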
Here, $b_u(\cdot)$ and $b_i(\cdot)$ are real valued functions that change over time.
We start with our choice of time-changing item biases $b_i(t)$.
One simple modeling choice uses a linear function to capture a possible gradual drift of user bias.
For each user $u$, we denote the mean date of rating by $t_u$. Now, if $u$ rated a movie on day $t$, then the
associated time deviation of this rating is defined as
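The definition is missing from these notes. The form commonly given in the literature on time-aware factor models is sketched below, together with the linear user bias built on it. The exponent $\beta$ and the per-user slope $\alpha_u$ are symbols taken from that literature and are assumptions here, not values stated in the text.

```latex
\mathrm{dev}_u(t) = \operatorname{sign}(t - t_u) \cdot |t - t_u|^{\beta}
\qquad
b_u(t) = b_u + \alpha_u \cdot \mathrm{dev}_u(t)
```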
Definition of a time dependent user-bias
We designate $k_u$ time points, $\{t^u_1, \ldots, t^u_{k_u}\}$, spaced uniformly across the dates of $u$’s ratings as
kernels that control the following function:
The time-linear model becomes
Similarly, the spline-based model becomes
For example, in our actual implementation we adopt rule for modeling the drifting user bias, thus
arriving at the baseline predictor
Table 5.1: Comparing baseline predictors capturing main movie and user effects. As temporal modeling
becomes more accurate, prediction accuracy improves (lowering RMSE).
We code the predictors
This way, the item bias of
Similarly, recurring user effects may be modeled by modifying
Thus, the baseline predictor
5.6.3.2 Time changing factor model
However, as hinted earlier, temporal dynamics go beyond this: they also affect user preferences
and thereby the interaction between users and items.
Users change their preferences over time. For example, a fan of the “psychological thrillers” genre
may become a fan of “crime dramas” a year later.
Similarly, humans change their perception of certain actors and directors. This type of evolution is
modeled by taking the user factors (the vector $p_u$) as a function of time.
Once again, we need to model those changes at the very fine level of a daily basis, while facing the
built in scarcity of user ratings.
In fact, these temporal effects are the hardest to capture, because preferences are not as
pronounced as main effects (user-biases), but are split over many factors.
movie-rating dataset, we have found modeling
At this point, we can tie all pieces together and extend the SVD++ factor model by incorporating
the time changing parameters. The resulting model will be denoted as timeSVD++, where the
prediction rule is as follows:
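The rule is missing from these notes. Combining the time-dependent biases and user factors with the implicit-feedback term of SVD++ gives the form below, which matches the usual statement of timeSVD++ in the literature. The symbol $\mu$ (overall average rating) is assumed here.

```latex
\hat{r}_{ui} = \mu + b_i(t_{ui}) + b_u(t_{ui})
  + q_i^{T} \left( p_u(t_{ui}) + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j \right)
```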
5.6.4. Comparison
We compare results of the three algorithms discussed in this section.
First is SVD, the plain matrix factorization algorithm.
Second is the SVD++ method, which improves upon SVD by incorporating a kind of implicit
feedback.
Finally, there is timeSVD++, which accounts for temporal effects.
The three methods are compared over a range of factorization dimensions ($f$).
All benefit from a growing number of factor dimensions that enables them to better express
complex movie-user interactions.
5.7 Neighborhood Models
The most common approach to CF is based on neighborhood models.
Its original form is user-user based.
User-user methods estimate unknown ratings based on recorded ratings of like-minded users.
Later, an analogous item-item approach became popular. In those methods, a rating is estimated
using known ratings made by the same user on similar items. Better scalability and improved
accuracy make the item-item approach more favorable in many cases.
In addition, item-item methods are more amenable to explaining the reasoning behind predictions.
This is because users are familiar with items previously preferred by them, but do not know those
allegedly like-minded users.
We focus mostly on item-item approaches, but the same techniques can be directly applied within
a user-user approach.
In general, latent factor models offer high expressive ability to describe various aspects of the data.
Thus, they tend to provide more accurate results than neighborhood models. However, most of the
literature and most commercial systems (e.g., those of Amazon and TiVo) are based on
neighborhood models.
5.7.1. Similarity measures
Central to most item-item approaches is a similarity measure between items.
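Frequently, it is based on the Pearson correlation coefficient, $\rho_{ij}$, which measures the
tendency of users to rate items i and j similarly.
Since many ratings are unknown, some items may share only a handful of common observed
raters.
The empirical correlation coefficient, $\hat{\rho}_{ij}$, is based only on the common user support.
It is advised to work with residuals from the baseline predictors (the $b_{ui}$ values) to
compensate for user- and item-specific deviations.
Thus the approximated correlation coefficient is given by
$$\hat{\rho}_{ij} = \frac{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})(r_{uj} - b_{uj})}{\sqrt{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})^2 \cdot \sum_{u \in U(i,j)} (r_{uj} - b_{uj})^2}}$$
The set U(i, j) contains the users who rated both items i and j.
Because estimated correlations based on a greater user support are more reliable, an appropriate
similarity measure, denoted by $s_{ij}$, is a shrunk correlation coefficient of the form
$$s_{ij} = \frac{n_{ij} - 1}{n_{ij} - 1 + \lambda_8} \, \rho_{ij}$$
The variable $n_{ij} = |U(i, j)|$ denotes the number of users that rated both i and j. A typical
value for $\lambda_8$ is 100.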
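The shrunk similarity can be sketched in a few lines. The function below is an illustrative
assumption, not a reference implementation: it takes two vectors of residuals (rating minus
baseline), one entry per user, with `np.nan` marking users who did not rate the item, and the
names `shrunk_similarity` and `lam8` are hypothetical.

```python
import numpy as np

def shrunk_similarity(res_i, res_j, lam8=100.0):
    """Shrunk correlation between items i and j computed on baseline residuals.

    res_i, res_j : 1-D arrays, one entry per user, np.nan where the item is unrated
    lam8         : shrinkage constant (the text suggests a value around 100)
    """
    res_i = np.asarray(res_i, dtype=float)
    res_j = np.asarray(res_j, dtype=float)

    # U(i, j): users who rated both items.
    common = ~np.isnan(res_i) & ~np.isnan(res_j)
    n_ij = int(common.sum())
    if n_ij == 0 or (n_ij - 1 + lam8) == 0:
        return 0.0

    zi, zj = res_i[common], res_j[common]
    denom = np.sqrt(np.sum(zi ** 2) * np.sum(zj ** 2))
    if denom == 0.0:
        return 0.0  # no variation in the residuals, correlation undefined

    rho_hat = np.sum(zi * zj) / denom
    # Pull the estimate toward zero when few users support it.
    return float((n_ij - 1) / (n_ij - 1 + lam8) * rho_hat)
```

With only two common raters and `lam8 = 100`, even a perfect empirical correlation is shrunk to
about 0.01, which is the intended effect: pairs with little support should rarely be picked as
neighbors.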
5.7.2. Similarity-based interpolation
This is one of the most popular approaches to neighborhood modeling, and apparently also to CF in
general.
Our goal is to predict $r_{ui}$, the unobserved rating by user u for item i.
Using the similarity measure, we identify the k items rated by u that are most similar to i.
This set of k neighbors is denoted by $S^k(i;u)$.
The predicted value of $r_{ui}$ is taken as a weighted average of the ratings of neighboring items,
while adjusting for user and item effects through the baseline predictors:
$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i;u)} s_{ij}}$$
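A minimal sketch of this interpolation step follows. It assumes the similarities and baseline
predictors are already available as plain dictionaries keyed by item id, and that similarities are
positive; the function name and data layout are hypothetical choices made for illustration.

```python
def predict_item_knn(user_ratings, baselines_u, sims_to_i, b_ui, k=20):
    """Similarity-weighted average of baseline-adjusted ratings of the k nearest items.

    user_ratings : dict item_id -> rating given by user u
    baselines_u  : dict item_id -> baseline predictor for user u and that item
    sims_to_i    : dict item_id -> similarity between that item and the target item i
    b_ui         : baseline predictor for user u and the target item i
    k            : number of neighbors to use
    """
    # Candidate neighbors: items rated by u for which a similarity to i is known.
    candidates = [j for j in user_ratings if j in sims_to_i]
    # Keep the k items most similar to i.
    neighbors = sorted(candidates, key=lambda j: sims_to_i[j], reverse=True)[:k]

    num = sum(sims_to_i[j] * (user_ratings[j] - baselines_u[j]) for j in neighbors)
    den = sum(sims_to_i[j] for j in neighbors)
    if den == 0:
        return b_ui  # no usable neighbors: fall back to the baseline
    return b_ui + num / den
```

For instance, with ratings `{"a": 5, "b": 3, "c": 1}`, a baseline of 3.0 for every item,
similarities `{"a": 0.5, "b": 0.3, "c": 0.1}` and `k=2`, only items `a` and `b` contribute, and the
prediction moves above the baseline because the most similar item was rated highly.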
Similarity-based methods became very popular because they are intuitive and relatively simple to
implement.
They also offer the following two useful properties:
o Explainability
o New ratings
However, standard neighborhood-based methods raise some concerns:
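o The similarity function ($s_{ij}$), which directly defines the interpolation weights, is arbitrary.
o Neighborhood-based methods do not account for interactions among neighbors.
o By definition, the interpolation weights sum to one, which may cause overfitting.
o Neighborhood methods may not work well if variability of ratings differs substantially among
neighbors.
For example, the third item, dealing with the sum-to-one constraint, can be alleviated by using the
following prediction rule:
$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i;u)} s_{ij} (r_{uj} - b_{uj})}{\lambda_9 + \sum_{j \in S^k(i;u)} s_{ij}}$$
The constant $\lambda_9$ penalizes the neighborhood portion when there is not much neighborhood
information, that is, when the sum of the $s_{ij}$ values is much smaller than $\lambda_9$.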
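The effect of relaxing the sum-to-one constraint can be sketched as follows. The idea is to add a
constant to the denominator of the weighted average so that weak neighborhoods pull the prediction
less far from the baseline. The function name, the `(similarity, residual)` pair layout and the
default value of `lam9` are assumptions for illustration only.

```python
def predict_damped(neighbors, b_ui, lam9=1.0):
    """Weighted average whose denominator is padded by a constant lam9.

    neighbors : list of (s_ij, r_uj - b_uj) pairs for the items in the neighborhood
    b_ui      : baseline predictor for the target user and item
    lam9      : damping constant; lam9 = 0 gives the plain weighted average
    """
    num = sum(s * resid for s, resid in neighbors)
    den = lam9 + sum(s for s, _ in neighbors)
    if den == 0:
        return b_ui  # nothing to interpolate from
    return b_ui + num / den
```

With a single weak neighbor, `predict_damped([(0.05, 2.0)], 3.0, lam9=0.0)` follows that neighbor
completely, while a positive `lam9` keeps the prediction close to the baseline.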
5.7.3. Jointly derived interpolation weights
This subsection describes a more accurate neighborhood model that overcomes the difficulties
discussed above, while retaining the known merits of item-item models.
As above, we use the similarity measure to define neighbors for each prediction.
However, we search for optimum interpolation weights without regard to values of the similarity
measure.
Given a set of neighbors $S^k(i;u)$ we need to compute interpolation weights
$\{\theta^u_{ij} \mid j \in S^k(i;u)\}$ that enable the best prediction rule of the form
$$\hat{r}_{ui} = b_{ui} + \sum_{j \in S^k(i;u)} \theta^u_{ij} (r_{uj} - b_{uj})$$
Typical values of k (number of neighbors) lie in the range of 20–50; see [2]. Throughout this
subsection we assume that baseline predictors have already been removed, and we write
$z_{ui} = r_{ui} - b_{ui}$ for the residual ratings.
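5.7.3.1 Formal model
To start, we consider a hypothetical dense case, where all users but u rated both i and all its
neighbors in $S^k(i;u)$.
In that case, we could learn the interpolation weights by modeling the relationships between item i
and its neighbors through a least squares problem
$$\min_{\theta^u} \sum_{v \neq u} \Big( z_{vi} - \sum_{j \in S^k(i;u)} \theta^u_{ij} z_{vj} \Big)^2$$
where $z_{vi} = r_{vi} - b_{vi}$ denotes a residual rating. The optimal weights solve a linear
system $Aw = b$ whose entries are sums of products of residuals.
In practice ratings are sparse, so each entry is estimated as an average over the users who rated
the relevant pair of items, $\bar{A}_{jl} = \sum_{v \in U(j,l)} z_{vj} z_{vl} / |U(j,l)|$ and
$\bar{b}_j = \sum_{v \in U(i,j)} z_{vj} z_{vi} / |U(i,j)|$, and then shrunk toward a baseline value
avg, because averages over few users are unreliable.
Accordingly, we define the corresponding k×k matrix $\hat{A}$ and the vector $\hat{b} \in R^k$:
$$\hat{A}_{jl} = \frac{|U(j,l)| \cdot \bar{A}_{jl} + \beta \cdot avg}{|U(j,l)| + \beta}, \qquad \hat{b}_j = \frac{|U(i,j)| \cdot \bar{b}_j + \beta \cdot avg}{|U(i,j)| + \beta}$$
The interpolation weights are then taken as the solution of the linear system $\hat{A} w = \hat{b}$.
o The parameter β controls the extent of the shrinkage. A typical value would be β = 500.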
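One way to sketch the computation of the interpolation weights is shown below: build the k×k matrix
and the k-vector from sums of residual products over co-rating users, shrink every entry toward a
baseline value, and solve the resulting linear system. The function name, the NaN-based encoding of
missing ratings, and the use of a single baseline value `avg` are assumptions for illustration. The
sketch also uses an unconstrained least-squares solve, so nothing forces the weights to be
non-negative.

```python
import numpy as np

def interpolation_weights(Z_nbrs, z_i, avg, beta=500.0):
    """Solve A_hat w = b_hat for the interpolation weights of item i.

    Z_nbrs : (n_users, k) residuals of the k neighbor items, np.nan where unrated;
             the row of the target user is assumed to be excluded already
    z_i    : (n_users,) residuals of the target item i, np.nan where unrated
    avg    : baseline value that entries are shrunk toward (assumed given)
    beta   : shrinkage strength; beta = 0 requires every pair to have co-raters
    """
    Z_nbrs = np.asarray(Z_nbrs, dtype=float)
    z_i = np.asarray(z_i, dtype=float)

    seen = ~np.isnan(Z_nbrs)
    seen_i = ~np.isnan(z_i)
    Z0 = np.where(seen, Z_nbrs, 0.0)   # zeros drop unrated entries from the sums
    z0 = np.where(seen_i, z_i, 0.0)
    S = seen.astype(float)

    sums_A = Z0.T @ Z0                 # sum of z_vj * z_vl over users who rated j and l
    n_A = S.T @ S                      # number of users who rated both j and l
    sums_b = Z0.T @ z0                 # sum of z_vj * z_vi over users who rated i and j
    n_b = S.T @ seen_i.astype(float)   # number of users who rated both i and j

    # (count * average + beta * avg) / (count + beta), with count * average = sum.
    A_hat = (sums_A + beta * avg) / (n_A + beta)
    b_hat = (sums_b + beta * avg) / (n_b + beta)

    # lstsq also copes with a singular A_hat, which heavy shrinkage can produce.
    return np.linalg.lstsq(A_hat, b_hat, rcond=None)[0]
```

In a dense toy case with `beta=0`, where the target item's residuals are an exact linear
combination of the neighbors' residuals, the function recovers the combination coefficients.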
5.7.3.2 Computational issues
Efficient computation of an item-item neighborhood method requires pre-computing certain values
associated with each item-item pair for rapid retrieval.
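First, we need quick access to all item-item similarities, by pre-computing all $s_{ij}$ values, as
explained in the subsection on similarity measures.
Second, we pre-compute all possible entries of $\hat{A}$ and $\hat{b}$.
To this end, for each two items i and j, we compute
$$\bar{A}_{ij} = \frac{\sum_{v \in U(i,j)} z_{vi} z_{vj}}{|U(i,j)|}$$
where $z_{vi}$ is the rating of user v for item i after removing the baseline predictor.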
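The pre-computation over all item pairs can be sketched with two matrix products: one over the
residuals with missing entries set to zero, and one over the 0/1 indicator of observed ratings. The
dense NaN-encoded input and the function name are assumptions for illustration; a real catalogue
would likely need sparse storage.

```python
import numpy as np

def precompute_pair_averages(Z):
    """Average product of residuals for every item pair over their common raters.

    Z : (n_users, n_items) residuals (rating minus baseline), np.nan where unrated
    Returns (A_bar, counts): A_bar[i, j] is the mean of z_vi * z_vj over users who
    rated both items (np.nan if there are none); counts[i, j] is that number of users.
    """
    Z = np.asarray(Z, dtype=float)
    observed = ~np.isnan(Z)
    Z0 = np.where(observed, Z, 0.0)    # unrated entries contribute nothing to sums
    O = observed.astype(float)

    sums = Z0.T @ Z0                   # pairwise sums of residual products
    counts = O.T @ O                   # pairwise numbers of common raters

    A_bar = np.full_like(sums, np.nan)
    np.divide(sums, counts, out=A_bar, where=counts > 0)
    return A_bar, counts
```

The returned `counts` matrix is kept alongside the averages because the shrinkage step needs the
support of each pair as well as its average.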
5.7.4. Summary
Collaborative filtering through neighborhood-based interpolation is probably the most popular way
to create a recommender system. Three major components characterize the neighborhood
approach: (1) data normalization, (2) neighbor selection, and (3) determination of interpolation
weights.
Normalization is essential to collaborative filtering in general and in particular to the more local
neighborhood methods. Otherwise, even more sophisticated methods are bound to fail, as they mix
incompatible ratings pertaining to different unnormalized users or items. We described a suitable
approach to data normalization, based around baseline predictors.
Neighborhood selection is another important component. It is directly related to the employed
similarity measure. Here, we emphasized the importance of shrinking unreliable similarities, in
order to avoid detection of neighbors with a low rating support.
Finally, the success of neighborhood methods depends on the choice of the interpolation weights,
which are used to estimate unknown ratings from neighboring known ones. Nevertheless, most
known methods lack a rigorous way to derive these weights. We showed how the interpolation
weights can be computed as a global solution to an optimization problem that precisely reflects
their role.