Recommender Engines Seminar Paper

RWTH Aachen University
University of Bonn
Fraunhofer FIT
E-Commerce Seminar WT 08/09

Thomas Hess (289222)

February 1, 2009

Abstract Recommender engines are used by more and more e-commerce businesses to help consumers
find products they are interested in. This paper describes what recommender engines are and what
role they play in e-commerce. Recommender engines use various techniques that draw on different
knowledge sources to make recommendations. The paper explains these techniques together with their
strengths and weaknesses. Some of the common issues that recommender systems face are discussed
and possible solutions are presented. Finally, examples of recommender engines in e-commerce are
described, showing which techniques they use and how the e-businesses utilize recommendations on
their websites.

Contents

1 Introduction

2 Recommender Techniques
2.1 Non-Personalized Recommendation
2.2 Demographic Recommendation
2.3 Content-Based Recommendation
2.4 Collaborative Filtering
2.4.1 User-Based Approach
2.4.2 Item-Based Approach
2.4.3 Model-Based Approach
2.5 Hybrid Approaches

3 Issues And Solutions
3.1 Data Collection
3.2 Cold Start
3.3 Stability vs. Plasticity
3.4 Sparsity
3.5 Performance & Scalability
3.6 User Input Consistency
3.7 Privacy

4 Recommender Engine Examples
4.1 ChoiceStream
4.2 Amazon.com
4.3 Digg

5 Conclusion


List of Figures

2.1 Knowledge Sources of Recommender Engines
2.2 Non-Personalized Recommendation
2.3 Demographic Recommendation
2.4 Content-Based Recommendation
2.5 User-Based Collaborative Filtering
2.6 User-Based Collaborative Filtering Example
2.7 Item-Based Collaborative Filtering
2.8 Item-Based Collaborative Filtering Example
2.9 Model-Based Collaborative Filtering

4.1 ChoiceStream Recommender Engine
4.2 Amazon – Item With Recommendations
4.3 Amazon – Shopping Cart With Recommendations
4.4 Amazon – Your Recommendations
4.5 Amazon – Recommendation Details
4.6 Amazon – Your Purchases
4.7 Digg – Story
4.8 Digg – Topic Settings
4.9 Digg – Homepage
4.10 Digg – Recommendations
4.11 Digg – Correlated User


1 Introduction

Recommender engines are personalized information agents that attempt to predict which items out of
a large pool a user may be interested in. These items can be of any type, like movies, music, books,
websites, or news articles. The user’s interest in an item is expressed through the rating the user gives
the item. A recommendation system has to predict the ratings for items that the user has not yet seen.
With these estimated ratings the system can recommend the items that have the highest estimated
rating.

Recommender engines have become an integral part of many e-commerce businesses [1, 2]. They are
a serious business tool used by an ever-increasing number of online stores. Recommender systems
are a unique feature of e-commerce, as websites, in contrast to physical stores, are able to track
everything their customers do. The knowledge learned from the customers’ behaviour is the basis for
the recommendations. Because online businesses have no real space constraint, they can offer much
larger stocks, providing their customers with more choices. These large stocks become impossible to
search through by hand, so e-commerce stores must provide personalized versions with reduced
choices to the individual users. One way to achieve this is the use of recommender engines.

For e-commerce vendors, recommender engines provide multiple benefits. Good recommender systems
present customers with products they are interested in but did not plan to buy, leading them to
purchase more items [2, 3, 4]. Such unplanned purchases do not yet happen as often in online stores
as in traditional stores [2]. Recommender engines can help to gain consumers’ loyalty, which is an
essential business strategy in e-commerce as the competitor is always just “one click away” [4].
Because recommender systems make it easier and faster to find new items, customers come back more
often [2]. The more a user uses a website and purchases items, the more the recommender engine
learns about the user and the better the recommendations get. This helps to build a “value-added
relationship” between the website and the user [4]. Recommender systems are also a way to promote
older or low-demand items, such as niche products [2].


2 Recommender Techniques

The techniques used by recommender engines can be classified based on the information sources they
use [5, 2]. The available sources are the user features (demographics) (e.g. age, gender, profession,
income, location), the item features (e.g. keywords, genres), and the user-item ratings (gathered
through questionnaires, explicit ratings, transaction data). See figure 2.1.

2.1 Non-Personalized Recommendation

Non-personalized recommendations are identical for each user. The recommendations are either man-
ually selected (e.g. editor choices) or based on the popularity of items (e.g. average ratings, sales data).
See figure 2.2.

Figure 2.1: Knowledge Sources of Recommender Engines (From [5])



Figure 2.2: Non-Personalized Recommendation (From [5])

Because non-personalized recommendations are easy to compute, they are popular among e-commerce
businesses. They are also an option for websites that offer no personalization.
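
The popularity-based variant can be sketched in a few lines. This is only an illustration, assuming
ratings are available as (user, item, rating) tuples; the function name `top_rated`, the sample
data, and the minimum-ratings threshold are assumptions made for the example and are not taken from
any of the cited systems.

```python
from collections import defaultdict

def top_rated(ratings, n=2, min_ratings=2):
    """Return the n items with the highest average rating.

    ratings: iterable of (user, item, rating) tuples.
    Items with fewer than min_ratings ratings are skipped (assumed threshold).
    """
    by_item = defaultdict(list)
    for _user, item, rating in ratings:
        by_item[item].append(rating)
    averages = {
        item: sum(values) / len(values)
        for item, values in by_item.items()
        if len(values) >= min_ratings
    }
    # Sort by average rating (descending), then by item name for stable output.
    return sorted(averages, key=lambda item: (-averages[item], item))[:n]

# Hypothetical sample data.
ratings = [
    ("ann", "book_a", 5), ("bob", "book_a", 4),
    ("ann", "book_b", 3), ("bob", "book_b", 2), ("cat", "book_b", 4),
    ("cat", "book_c", 5),  # only one rating, so it is skipped by default
    ("ann", "book_d", 4), ("cat", "book_d", 4),
]
print(top_rated(ratings))
```

Every user receives the same list, which is why no user profile appears anywhere in the function.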

2.2 Demographic Recommendation

Demographic recommendation methods use only the information about the users. The users are
categorized based on the attributes of their demographic profiles in order to find users with similar
features. The engine then recommends items that are preferred by these similar users. See figure 2.3.

Advantages

• Because user-item ratings are not used, new users can get recommendations before they have
rated any item.
• Knowledge about the items and their features is not needed, therefore the technique is domain-
independent.

Figure 2.3: Demographic Recommendation (From [5])



Figure 2.4: Content-Based Recommendation (From [5])

Problems

• Gathering the required demographic data leads to privacy issues, see 3.7.
• Demographic classification is too crude for highly personalized recommendations [5, 3]. The
generalisations created from the classification are often false, especially when it comes to cul-
tural items like books, music, or movies [6, 3].
• Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).
• Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

2.3 Content-Based Recommendation

Content-based recommendation methods use the information about item features and the ratings a
user has given to items. The technique combines these ratings to a profile of the user’s interests based
on the features of the rated items. The engine then can find items with the preferred features and
recommend the items with the highest similarity to the ones preferred in the past. See figure 2.4. The
recommendations of a content-based system are based on individual information and ignore contribu-
tions from other users.

The profiles of the users’ interests are often represented as vectors of weights on item features. But if
automatic learning methods, like a rule induction algorithm, are used to generate them, they can also
be rule-based [7].
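
A minimal sketch of the vector representation, assuming item features and the user profile are
stored as sparse dictionaries of weights and that similarity is measured with the cosine. The
rating-weighted sum used to build the profile, the function names, and the sample data are
assumptions for illustration; a real system would likely normalize ratings first.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_profile(user_ratings, item_features):
    """Sum the feature vectors of the rated items, weighted by the rating."""
    profile = {}
    for item, rating in user_ratings.items():
        for feature, weight in item_features[item].items():
            profile[feature] = profile.get(feature, 0.0) + rating * weight
    return profile

def recommend(user_ratings, item_features, n=1):
    """Rank the items the user has not rated by similarity to the profile."""
    profile = build_profile(user_ratings, item_features)
    candidates = [i for i in item_features if i not in user_ratings]
    candidates.sort(key=lambda i: (-cosine(profile, item_features[i]), i))
    return candidates[:n]

# Hypothetical catalog with manually assigned genre features.
item_features = {
    "alien": {"scifi": 1.0, "horror": 1.0},
    "solaris": {"scifi": 1.0, "drama": 1.0},
    "notebook": {"romance": 1.0, "drama": 1.0},
    "moon": {"scifi": 1.0},
}
user_ratings = {"alien": 5, "notebook": 1}
print(recommend(user_ratings, item_features, n=2))
```

Note that only this one user's ratings enter the computation, which mirrors the statement that
content-based systems ignore contributions from other users.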

Content-based recommendation works well if the items can be properly represented as a set of
features. The quality of the recommendations depends directly on the quality of the available
descriptive data. In order to have a sufficient set of features, the item descriptions must either
be in a form from which features can be extracted automatically with information retrieval
techniques (e.g. text), or the features must be assigned manually, which takes a lot of
resources [8]. Besides objective categorizations, systems can also use (user-generated) tags
associated with items that provide a subjective view.

Problems

• Content analysis is necessary to determine the item features.
• The technique depends not only on the quality of the item metadata but also on the homogeneity
of the stock, so items can be categorized.
• The quality of items cannot be evaluated. The similarity computation is limited to the item
features [5].
• The technique suffers from the cold start problem for new users, see 3.2.

2.4 Collaborative Filtering

Collaborative filtering techniques use user behaviour in the form of user-item ratings as their
information source. The concept is to find correlations between users or between items.
Collaborative filtering is widely implemented and is the most mature recommendation technique.
Three main approaches to collaborative filtering can be distinguished: user-based, item-based, and
model-based.

Advantages

• As with demographic recommendation, no knowledge about the item features is needed.
Collaborative filtering works completely independently of machine-readable item representations.
It is therefore domain-independent.
• The quality (not just the relevancy) of items can be evaluated, as it is also expressed through
user-item ratings [5].
• Collaborative filtering techniques are able to make recommendations “outside the box” because
they look outside the preferences of the individual user [1].



Figure 2.5: User-Based Collaborative Filtering (From [5])

Problems

• The quality of the recommendations depends on the size of the historical rating data set.
• The technique suffers from the cold start problem for new users and new items, see 3.2.
• Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).

2.4.1 User-Based Approach

The user-based approach is based on the assumption that users who rated the same items similarly
probably have the same taste. It makes user-to-user correlations by using the rating profiles of
different users to find highly correlated users. These users form like-minded neighbourhoods based
on their shared item preferences. The engine can then recommend the items preferred by the other
users in the neighbourhood. See figure 2.5.

Figure 2.6 shows an example of user-based collaborative recommendation.

If there are few overlapping ratings across users in the data set, however, the user-based approach
runs into the sparsity problem, see 3.4.

User-based collaborative filtering does not scale well for many users and items, because the analysis
and comparison processes become more complex, see 3.5.
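
The user-to-user correlation and neighbourhood idea of this subsection can be sketched as follows.
The choice of the Pearson coefficient, the neighbourhood size `k`, and the similarity-weighted
average are common conventions assumed here for illustration; the text does not prescribe a
particular formula, and the names and data are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def predict(ratings, user, item, k=2):
    """Predict a rating as the similarity-weighted average over the k most
    similar users (the neighbourhood) who have rated the item."""
    candidates = [
        (pearson(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user and item in ratings[other]
    ]
    # Keep only positively correlated users, most similar first.
    neighbours = sorted(((s, o) for s, o in candidates if s > 0), reverse=True)[:k]
    if not neighbours:
        return None
    total = sum(s for s, _ in neighbours)
    return sum(s * ratings[o][item] for s, o in neighbours) / total

# Hypothetical rating profiles: {user: {item: rating}}.
ratings = {
    "ann": {"a": 5, "b": 3, "c": 1},
    "bob": {"a": 4, "b": 3, "c": 2, "d": 5},
    "cat": {"a": 1, "b": 3, "c": 5, "d": 1},
}
print(predict(ratings, "ann", "d"))
```

Every prediction compares the target user against all other users at request time, which is the
source of the scaling problem mentioned above.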

2.4.2 Item-Based Approach

Figure 2.6: User-Based Collaborative Filtering Example (From [5])

The item-based approach focuses on items, assuming that items rated similarly are probably similar.
It compares items based on the shared appreciation of users, in order to create neighbourhoods of
similar items. The engine then recommends the neighbouring items of the user’s known preferred
ones. See figure 2.7.

Figure 2.8 shows an example of item-based collaborative recommendation.

Item-based collaborative filtering is more scalable than the user-based approach, as the
correlations are drawn among a limited number of products instead of a potentially very large
number of users. Items are also easy to categorize, while users’ activities must be examined and
analyzed. See 3.5.

Also, because the number of items is usually smaller than the number of users, the item-based
approach has a reduced sparsity problem (see 3.4) in comparison to the user-based approach.
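
As a rough sketch of the item-based approach, the following example compares items by the ratings
they received and scores unrated items by their similarity to the items a user has already rated.
Cosine similarity and the weighted average are assumptions chosen for illustration; the names and
data are hypothetical.

```python
import math
from collections import defaultdict

def item_vectors(ratings):
    """Invert {user: {item: rating}} into {item: {user: rating}}."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors[item][user] = rating
    return vectors

def cosine(u, v):
    """Cosine similarity of two items' rating vectors (dicts keyed by user)."""
    dot = sum(r * v[user] for user, r in u.items() if user in v)
    norm = math.sqrt(sum(r * r for r in u.values())) * math.sqrt(
        sum(r * r for r in v.values())
    )
    return dot / norm if norm else 0.0

def recommend(ratings, user, n=1):
    """Score each unrated item by the user's ratings of similar items."""
    vectors = item_vectors(ratings)
    rated = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in rated:
            continue
        sims = [(cosine(vectors[candidate], vectors[i]), r) for i, r in rated.items()]
        total = sum(s for s, _ in sims)
        if total > 0:
            scores[candidate] = sum(s * r for s, r in sims) / total
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]

# Hypothetical rating data: {user: {item: rating}}.
ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 5, "d": 1},
    "cat": {"a": 4, "c": 5, "d": 2},
    "dan": {"b": 5, "d": 5},
}
print(recommend(ratings, "ann", n=2))
```

Item "c" is rated like the item ann likes ("a"), while "d" is rated like the item she dislikes
("b"), so "c" ranks first.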

Figure 2.7: Item-Based Collaborative Filtering (From [5])



Figure 2.8: Item-Based Collaborative Filtering Example (From [5])

2.4.3 Model-Based Approach

For huge data sets, the quadratic complexity of processing the user-item rating matrix becomes
prohibitive [7]. In real applications, however, predictions must be made quickly. Model-based
approaches address this problem by deriving a prediction model from historical user-item rating
data, in order to make the online prediction process faster. To build the model, learning
techniques like Bayesian networks, neural networks, or latent semantic indexing are used. An
accurate model requires a large amount of data. The engine then makes the online recommendations
by using the model. See figure 2.9.

As the model is built in advance of the online recommendation process, this approach performs
better than the memory-based (user-based and item-based) approaches and avoids the scalability
problem, see 3.5. Depending on the learning techniques used to create the model, this approach can
lead to higher recommendation accuracy and a reduced sparsity problem [5].

The major drawback of the model-based approach is that the recommendation results do not adapt
automatically to data changes. Instead, the model must be rebuilt to reflect updated data.

Figure 2.9: Model-Based Collaborative Filtering (From [5])

2.5 Hybrid Approaches

Hybrid approaches combine collaborative and demographic or content-based methods in order to
overcome their drawbacks. Collaborative filtering systems often result in better predictive
performance but have problems when limited user-item ratings are available [7]. Demographic and
content-based recommendation systems work without rating data and therefore can compensate for the
cold start problem [1].

There are various methods to combine recommender techniques in a hybrid system [1, 9]:

Weighted Hybridization The scores of the different recommendation components are combined
numerically. Each component of the hybrid system scores a given item and the scores are
combined using a linear formula.

Switching Hybridization The system chooses among recommendation components based on the
situation and applies the selected one. Some reliable criterion must be available on which to
base the switching decision.

Mixed Hybridization Recommendations from different recommenders are presented side-by-side
in a combined list. The results of the recommender systems are not combined.

Feature Combination Features derived from different knowledge sources are combined together
and then injected into a single recommendation algorithm.

Feature Augmentation One recommendation technique is used to compute a feature or set of fea-
tures, which is then part of the input to the next technique.

Cascaded Hybridization Recommenders are given strict priority, with the lower priority ones break-
ing ties in the scoring of the higher ones.

Meta-Level Hybridization One recommendation technique is applied to produce a model, which is
then used as the input for another technique.
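
Two of the combination methods listed above, weighted and switching hybridization, are simple
enough to sketch directly. The component scores, the equal weights, and the switching criterion
(number of ratings a user has given) are assumptions made for this illustration only.

```python
def weighted_hybrid(scores_by_component, weights):
    """Weighted hybridization: linear combination of component scores.

    scores_by_component: {component_name: {item: score}}
    weights: {component_name: weight}
    An item missing from a component contributes nothing for that component.
    """
    combined = {}
    for name, scores in scores_by_component.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return combined

def switching_hybrid(num_user_ratings, collaborative, content_based, min_ratings=5):
    """Switching hybridization: select one component based on a criterion.

    The criterion (enough ratings for collaborative filtering) is assumed.
    """
    return collaborative if num_user_ratings >= min_ratings else content_based

# Hypothetical scores produced by two recommendation components.
collaborative = {"a": 0.9, "b": 0.2}
content_based = {"a": 0.4, "b": 0.8, "c": 0.6}

combined = weighted_hybrid(
    {"cf": collaborative, "cb": content_based},
    {"cf": 0.5, "cb": 0.5},
)
ranking = sorted(combined, key=lambda item: (-combined[item], item))
print(ranking)
```

In the weighted variant, item "c" can still be ranked although the collaborative component has no
score for it; in the switching variant, a user with few ratings is served entirely by the
content-based component.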


3 Issues And Solutions

3.1 Data Collection

The data used by recommender engines can be categorized into explicit and implicit data [2].

Explicit data is all data that users themselves feed into the system, such as demographic data,
information about their preferences (e.g. collected through questionnaires), search terms, and
explicit ratings and reviews of items (wisdom of the crowds). The collection of explicit data must
not be intrusive or time-consuming. The way explicit data is collected can affect the quality and
amount of data the users will provide [10].

Recommendation systems should not rely completely on explicit data. Websites are able to track
their users’ activities in order to acquire implicit data. The most important implicit data source
in e-commerce is the transaction data, including the purchase information. Other sources are web
usage patterns like click sequences or reading times, or search engine referrers. Implicit data
must be analyzed before it can be used to describe user features or user-item ratings.

3.2 Cold Start

The cold start problem occurs when too little rating data is available in the initial state. The rec-
ommender system then lacks data to produce appropriate recommendations. A distinction is made
between the new user and new item problem.

New User Problem When recommendations follow from user-to-user correlations based on the
accumulation of ratings, a user with few ratings is difficult to categorize.



New Item Problem An item with few ratings cannot easily be recommended. This problem occurs
particularly in domains with many new items (e.g. news articles). As the problem also occurs for
long tail items, it is also called the “long tail problem” [10].

A solution to the cold start problem is the combination of the collaborative technique with demo-
graphic (for the new user problem) or content-based (for the new item problem) techniques in a hybrid
recommender engine, see 2.5. That way the cold start problem gets compensated by techniques that
don’t rely on user-item ratings.

Other solutions to reduce the cold start problem are the use of default ratings (e.g. from the average
rating of all users) [6, 10] or the use of active learning techniques in model-based recommendation
techniques [5].

3.3 Stability vs. Plasticity

The converse of the cold start problem is the stability vs. plasticity problem. When users have rated a
lot of items, their preferences in the established user profiles are difficult to change [1, 9]. But because
in reality taste evolves, this becomes a problem.

The solution is to gradually discount older ratings so that they have less influence. But by doing
so, engines risk losing information about long-term interests [1, 9].
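
One possible way to discount older ratings is exponential decay with a half-life. The sketch below
assumes this scheme and a half-life of 30 days purely for illustration; the cited sources do not
specify a particular discounting function, and the names and data are hypothetical.

```python
def decayed_weight(age_days, half_life_days=30.0):
    """Weight of a rating that is age_days old; halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def decayed_average(timed_ratings, half_life_days=30.0):
    """Weighted average of (rating, age_days) pairs, favouring recent ratings."""
    weights = [decayed_weight(age, half_life_days) for _, age in timed_ratings]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * r for w, (r, _) in zip(weights, timed_ratings)) / total

# Hypothetical history: a strong old preference and a recent change of taste.
history = [(5, 360), (5, 330), (1, 10), (2, 0)]
print(decayed_average(history))
```

The unweighted average of this history is 3.25, while the decayed average is dominated by the two
recent ratings. The same effect illustrates the risk named above: the year-old ratings, which may
express a long-term interest, contribute almost nothing.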

Related to this problem is that users may use a website with different intentions. For example one day
a customer buys books for himself, but the next day he is looking for a present for someone else.

3.4 Sparsity

In most use cases for recommender systems, due to the catalog sizes of e-business vendors, the number
of ratings already obtained is usually very small compared to the number of ratings that need to be
predicted. But collaborative filtering techniques depend on an overlap in ratings across users and have
difficulties when the space of ratings is sparse (few users have rated the same items). Sparsity in the
user-item rating matrix degrades the quality of the recommendations.



To reduce the sparsity the rating data needs to be adjusted by either adding additional ratings or
reducing the dimensionality of the matrix. Ratings can be augmented by inserting simulated values
on behalf of the users. These can be ratings derived from other (implicit) data sources, like item views
or clicks, or default values [6].

The dimensionality of the rating matrix can be reduced by techniques such as singular value decompo-
sition [1]. Singular value decomposition is a well-known method for matrix factorization that provides
the best lower rank approximations of the original matrix. Dimensionality reduction techniques are
often used in model-based collaborative filtering approaches [1].
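
The low-rank approximation can be sketched with NumPy's `numpy.linalg.svd`. Because the
decomposition needs a complete matrix, missing ratings are filled with the item mean beforehand;
this filling strategy, the rank `k=2`, and the sample matrix are assumptions for illustration.

```python
import numpy as np

def low_rank_approximation(matrix, k):
    """Return the rank-k approximation of matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Rows are users, columns are items; NaN marks a missing rating.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, np.nan],
    [np.nan, 1.0, 4.0, 5.0],
])

# Assumed pre-processing: replace missing ratings with the item's mean.
item_means = np.nanmean(ratings, axis=0)
filled = np.where(np.isnan(ratings), item_means, ratings)

approx = low_rank_approximation(filled, k=2)
print(np.round(approx, 2))
```

The entries of `approx` at the originally missing positions can be read as predicted ratings based
on the two strongest latent dimensions.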

3.5 Performance & Scalability

Performance and scalability are important issues for recommender systems as e-commerce websites
must be able to determine recommendations in real-time and often deal with huge data sets of millions
of customers and items. The big growth rates of e-businesses are making the sets even larger in the
user dimension [6].

The performance is determined by the computational complexity of a recommendation technique.
Techniques that calculate correlation coefficients for M users over N items have a complexity of
O(M × N) in the worst case. Due to the common sparsity of the user-item rating matrix, the
performance tends to be closer to O(M + N) [11]. For large data sets, however, this still leads to
performance and scaling issues.

Techniques that can perform the most expensive calculations offline scale better than techniques where
everything must be calculated online, in real time [11]. Demographic and content-based recommen-
dation as well as item- and model-based collaborative filtering can utilize offline computation. But
user-based collaborative filtering can do little or no offline computing, which makes it impractical for
large data sets [11].

In addition to performing calculations offline, all methods that help reduce the size of the data
set improve the performance and scalability of a recommendation technique [6]. For example, users
with very few ratings or very popular or unpopular items could be discarded [11]. But these methods
also reduce the recommendation quality.



3.6 User Input Consistency

Recommender techniques that work with user-to-user correlations, like demographic or collaborative
filtering, depend on high correlation coefficients between the users in a data set.

Users can be split into three classes based on their correlation coefficients with other users [6]. The
majority of users fall into the class of “white sheep”, which have a high rating correlation with many
other users. Engines can easily find recommendations for these users. The opposite type are the
“black sheep”. For them there are only few or no correlating users. This makes it very difficult to find
recommendations for them. But when the number of overall users in a data set increases, the chance
to find similar users increases as well.

The bigger problem is the “gray sheep” problem. These users have different opinions or an unusual
taste, which results in low correlation coefficients with many users. They fall on a border between
user cliques. Recommendations for them are very difficult to find and they also cause odd
recommendations for their correlated users.

3.7 Privacy

Privacy is an important issue in recommender systems. In order to provide personalized recommen-
dations, recommender systems must know something about the users. In fact, the more the systems
know, the more accurate the recommendations can get. Users are reasonably concerned about what
information is collected, how it is used, and if it is stored.

These privacy concerns affect the collection of both explicit and implicit data. Regarding explicit
data, users are reluctant to disclose information about themselves and their interests [2, 4]. If
questionnaires get too personal, users may provide false information in order to protect their
privacy [4]. Recommender engines should be able to deal with privacy-concerned users and not rely
solely on explicit data or on recommender techniques that do, like demographic recommendation.

Regarding implicit data that gets acquired by tracking users’ behaviour, there are concerns that per-
sonal taste or private actions get revealed through the recommendations [5]. Users fear that extensive
consumer profiles get created.



To address these concerns, e-commerce businesses must provide privacy protection mechanisms [5]
and make transparent which data is acquired and analyzed. Usage and storage restrictions must be
assured through privacy policies [4].


4 Recommender Engine Examples

Recommender engines are developed and run by independent technology vendors and by e-commerce
businesses themselves.

The business model of recommendation technology vendors is either to offer the recommender engine
as a hosted service or to license their engines to e-commerce businesses. Examples of technology
vendors are ChoiceStream¹, Baynote², ExpertMaker³, Loomia⁴, Criteo⁵, SourceLight⁶, and Collarity⁷.

Bigger e-commerce businesses in particular develop their own recommender solutions because they
have unique requirements, want unique features, or deal with items that third-party products are
not suited for. Examples are Amazon.com⁸, Netflix⁹, Digg¹⁰, The Internet Movie Database (IMDb)¹¹,
Pandora¹², and Last.fm¹³.

In the following the techniques and usages of the recommender engines of ChoiceStream, Ama-
zon.com, and Digg are described in detail.

1 http://www.choicestream.com
2 http://www.baynote.com
3 http://www.expertmaker.com
4 http://www.loomia.com
5 http://www.criteo.com
6 http://www.sourcelight.com
7 http://www.collarity.com
8 http://www.amazon.com
9 http://www.netflix.com
10 http://digg.com
11 http://www.imdb.com
12 http://www.pandora.com
13 http://www.last.fm



4.1 ChoiceStream

ChoiceStream is a personalisation company that offers its recommendation technology “RealRelevance
Recommendations” as a fully-hosted service for e-commerce vendors.

Because the different recommendation techniques all have their drawbacks and are not suited for all
fields of application, ChoiceStream uses a hybrid system based on a variety of techniques that are
chosen and combined depending on the concrete recommendation use case at hand [10]. The use
cases that ChoiceStream distinguishes are listed in table 4.1.

The recommendation techniques used by the ChoiceStream recommender engine are [10]:

Collaborative Filtering Both user-based and item-based collaborative filtering are used.

Collaborative Filtering Using Multiple Correlation Tables Use of multiple correlation tables
(e.g. item views or clicks in addition to transactions) to overcome the cold start problem
(see 3.2).

Cohort Analysis Creation of groups of similar users, called cohorts, in order to make better recom-
mendations for users with sparse rating data.

Use Case                    Definition
--------------------------  ------------------------------------------------------------------
Rich Profile User           Users for whom you have a lot of data (e.g. more than 5
                            transactions).
Sparse Profile User         Users for whom you have little data (e.g. fewer than 1 to 4
                            transactions).
Anonymous / New User        Users for whom you have no data.
Popular Content             Items in your catalog that you can determine are “most popular”.
                            Typically these will be few in number, but very high volume.
Mainstream Content          Items for which you have recorded patterns of behavior (e.g. more
                            than 20 transactions per the items).
New Content                 Items for which there are no past transactions.
Long Tail Content           Items in a catalog which are less well known, but still profitable,
                            and for which there are few past transactions.
Business Goal Optimization  The requirement to maximize a metric other than the number of
                            transactions, such as revenue, margin, or order size.

Table 4.1: ChoiceStream – Common Use Cases Requiring Different Algorithms (From [10])



Selective Filtering By selective filtering the most popular items are taken out of the recommenda-
tions, so they don’t dominate and customers can find less popular items.

Attribute Correlations Item attributes are used to make content-based recommendations to over-
come the cold start problems of collaborative filtering.

Default Recommendations Default recommendations are the fallback function if all other tech-
niques fail to determine recommendations.

Business Goal Optimization With a multi-term scoring function, the recommendation algorithm
can be adjusted, for example, to favour higher-priced items in order to increase revenue.

Figure 4.1 shows which techniques the ChoiceStream recommender engine uses for which use cases.

Figure 4.1: ChoiceStream Recommender Engine (From [10])



4.2 Amazon.com

Amazon.com, founded in 1994, is the largest online retailer worldwide and one of the best-known
examples of an e-commerce business utilizing a recommender engine. Amazon uses its recommendation
engines extensively to personalize its website.

Amazon’s recommender engine is based on item-based collaborative filtering [5, 6, 11]. It looks for
items correlating to the ones purchased and rated and combines the highly correlated items into a
recommendation list [11].

The recommendation engine consists of an online and an offline component. The offline component
creates an item-to-item matrix with all similar items. The online component can then look up
recommendations in the matrix when they are needed [11]. To build the item-to-item matrix, a
similarity function is used that determines the correlation coefficient between item pairs that
customers tend to purchase together. This expensive calculation is done offline [11, 6]. The online
component then only has to look up items similar to the ones a user has already purchased or rated.
This is a very easy and fast operation that can be done online in real time. Its complexity depends
only on the number of items a customer is associated with [11].
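
The split into an offline and an online component can be sketched as follows. This is not Amazon's
implementation; it is a simplified illustration of the idea described in [11], assuming that
similarity is the cosine between binary purchase vectors and that recommendations are ranked by
summed similarity. All names and data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_similar_items(purchases):
    """Offline step: build {item: [(similarity, other_item), ...]}.

    purchases: {customer: set of purchased items}
    Similarity of a and b: co_purchases / sqrt(buyers(a) * buyers(b)).
    """
    buyers = defaultdict(int)
    together = defaultdict(int)
    for items in purchases.values():
        for item in items:
            buyers[item] += 1
        for a, b in combinations(sorted(items), 2):
            together[(a, b)] += 1
    table = defaultdict(list)
    for (a, b), count in together.items():
        sim = count / math.sqrt(buyers[a] * buyers[b])
        table[a].append((sim, b))
        table[b].append((sim, a))
    for item in table:
        table[item].sort(key=lambda pair: (-pair[0], pair[1]))
    return dict(table)

def recommend(table, owned, n=2):
    """Online step: look up the neighbours of the items a customer owns.

    The cost depends on the number of owned items and the length of their
    neighbour lists, not on the number of customers.
    """
    scores = defaultdict(float)
    for item in owned:
        for sim, other in table.get(item, []):
            if other not in owned:
                scores[other] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]

# Hypothetical purchase histories.
purchases = {
    "c1": {"camera", "sd_card", "tripod"},
    "c2": {"camera", "sd_card"},
    "c3": {"camera", "tripod"},
    "c4": {"novel", "bookmark"},
    "c5": {"camera", "sd_card", "bag"},
}
table = build_similar_items(purchases)  # expensive, done offline
print(recommend(table, {"camera"}))     # cheap lookup, done online
```

Only `build_similar_items` iterates over all customers; `recommend` touches nothing but the
precomputed table, which is what allows the online step to stay fast as the customer base grows.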

By performing the most expensive calculations offline Amazon’s recommendation system can deal
with the huge data set of approximately 50 million customers per month (only from the U.S.) and
several million catalog items. The online component scales independently of the catalog size and the
number of customers [11]. Another benefit of the created similar-items table is that the algorithm
produces higher quality recommendations for users with little user-item rating data than traditional
collaborative filtering [11].

Customers Who Bought On the information page for every item, Amazon shows the “Customers
Who Bought” feature that recommends items frequently purchased by customers who purchased the
selected item, see Figure 4.2.

As figure 4.3 shows, the feature is also used on the shopping cart page. This works as the equivalent
to the impulse items in a supermarket checkout line [11], but here the impulse items are personalized
for each customer.



Figure 4.2: Amazon – Item With Recommendations



Figure 4.3: Amazon – Shopping Cart With Recommendations



Your Recommendations On the page “Your Recommendations”, all recommendations are listed,
with the ones derived from recent purchases first, see Figure 4.4. They can be filtered by product
line and subject area. Users can mark the recommended items as already owned or as not interesting,
and can rate them in order to provide the recommender engine with further rating data that influences
what gets recommended. It is also shown why an item is recommended, that is, which purchased item
is correlated with the recommended item.

Additionally, the user can view a detail page for every recommendation that lists all correlations to
purchased or otherwise rated items, see Figure 4.5.

Amazon encourages users to refine their user-item rating data by giving them the option to rate purchased
items on a 5-point scale. On a page that lists all previous purchases, the items can be rated and also
excluded from the recommendation calculation, see Figure 4.6.

Figure 4.4: Amazon – Your Recommendations
1 Recommended items can be marked as owned or not interested in and be rated
2 It is shown why items are recommended.

Figure 4.5: Amazon – Recommendation Details

Figure 4.6: Amazon – Your Purchases
1 Items can be rated
2 Items can be excluded from the recommendation engine

4.3 Digg

Digg is a social news site, launched in 2004, where users can submit links to websites. Users can rate
these links, called stories, by “digging” or “burying” them. Stories can also be favorited, shared,
and commented on, see Figure 4.7. The stories are categorized into various topics. A user can
configure which topics they are interested in and will then only see stories in these categories throughout
the website, see Figure 4.8.

On the Digg homepage the most popular stories are shown, see Figure 4.9. Popularity is measured
by the number of recent “diggs”, so the homepage uses non-personalized recommendation.
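
A popularity ranking of this kind can be sketched in a few lines. The following is only an illustration: the 24-hour window, the function name, and the sample data are hypothetical, since the source only speaks of "recent" diggs.

```python
from collections import Counter
from datetime import datetime, timedelta


def popular_stories(diggs, now, window=timedelta(hours=24), top_n=3):
    """Rank stories by the number of diggs inside a recent time window.

    diggs is a list of (story_id, timestamp) pairs; window is a hypothetical cutoff.
    """
    cutoff = now - window
    counts = Counter(story for story, ts in diggs if ts >= cutoff)
    # Sort by count descending, then by story id for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [story for story, _ in ranked][:top_n]


# Hypothetical toy data.
now = datetime(2009, 2, 1, 12, 0)
diggs = [
    ("s1", now - timedelta(hours=1)),
    ("s1", now - timedelta(hours=2)),
    ("s2", now - timedelta(hours=3)),
    ("s3", now - timedelta(days=3)),  # outside the window, ignored
]
print(popular_stories(diggs, now))  # ['s1', 's2']
```

The ranking is the same for every visitor, which is what makes it non-personalized.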

For registered users Digg provides personalized recommendations through its own recommendation
engine, which is based on user-based collaborative filtering. The engine relies solely on the user-item
ratings expressed by the “digg” function. It works without knowledge about the content of the
stories [12].

The recommendation engine uses the user’s history of “dugg” stories in the last thirty days to make
recommendations [13]. This short time span is appropriate for fast moving internet news, avoids the
stability vs. plasticity problem, and helps to keep the size of the ratings matrix within limits.

Every time a user “diggs” a story, the engine associates the user with all other users who have also
“dugg” the story. From these associations the recommender system calculates a correlation coefficient
between the users. The coefficient is based on the number of “dugg” stories in common in
relation to the total number of stories “dugg” by each of the associated users [13]. The coefficient has
a value between zero and one: zero if the two users have never “dugg” the same story, one if the
users share all their “dugg” stories. The calculation automatically accounts for the overall
level of user activity. If a user “diggs” a lot of stories, the number of common “dugg” stories must be
high to yield a high correlation coefficient. If a user “diggs” rarely, a small amount of agreement can
suffice.
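
The exact formula used by Digg is not given here. As an illustration, the Jaccard index (shared stories divided by all stories either user has “dugg”) is one measure with the described properties; using it is an assumption, and the function and variable names are hypothetical.

```python
def correlation(dugg_a, dugg_b):
    """Jaccard index of two sets of dugg story ids (an assumed stand-in).

    Returns 0.0 if the users share no story and 1.0 if their sets are identical.
    """
    union = dugg_a | dugg_b
    if not union:
        return 0.0  # two users without any diggs are treated as uncorrelated
    return len(dugg_a & dugg_b) / len(union)


# A rarely active user needs little agreement for a high value ...
light_user = {1, 2}
print(correlation(light_user, {1, 2, 3}))    # 2 shared of 3 total

# ... while the same two shared stories count for little for a very active user.
heavy_user = set(range(1, 21))               # 20 dugg stories
print(correlation(heavy_user, {1, 2, 100}))  # 2 shared of 21 total
```

Because the denominator grows with the number of stories each user has “dugg”, the measure accounts for the overall level of activity in the way described above.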

The users highly correlated to a user are called “Diggers Like You”. The engine recommends the
upcoming stories that have been “dugg” by these users, minus the stories the user has already “dugg”
or buried. Stories are upcoming if they are newly submitted and have not made it to the homepage yet.
The “Diggers Like You” therefore work as a filter for all the upcoming stories. On average,
this means that more than 17,000 submissions per day are boiled down to about 300
recommendations [12].
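
The filtering step can be sketched as follows. This is a simplified illustration, not Digg's implementation: the Jaccard index as similarity measure, the threshold of 0.3 for "highly correlated", and all names and data are assumptions.

```python
def jaccard(a, b):
    """Assumed similarity measure: shared stories over all stories of both users."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_upcoming(user, dugg, buried, upcoming, threshold=0.3):
    """Return upcoming stories dugg by users who are similar to `user`.

    dugg and buried map user ids to sets of story ids; upcoming is the set of
    newly submitted stories not yet on the homepage. threshold is hypothetical.
    """
    mine = dugg.get(user, set())
    similar = [
        other for other, stories in dugg.items()
        if other != user and jaccard(mine, stories) >= threshold
    ]
    # Stories the user has already dugg or buried are never recommended.
    seen = mine | buried.get(user, set())
    candidates = set()
    for other in similar:
        candidates |= dugg[other] & upcoming
    return sorted(candidates - seen)


# Hypothetical toy data.
dugg = {
    "me": {"a", "b", "c"},
    "ann": {"a", "b", "x", "y"},  # similarity 0.4, counts as similar
    "ben": {"q", "z"},            # similarity 0.0, ignored
    "cat": {"b", "c", "w"},       # similarity 0.5, counts as similar
}
buried = {"me": {"y"}}
upcoming = {"x", "y", "z", "w"}
print(recommend_upcoming("me", dugg, buried, upcoming))  # ['w', 'x']
```

Story "z" is upcoming but is only “dugg” by an uncorrelated user, and story "y" has been buried by the user, so neither is recommended.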

Figure 4.7: Digg – Story
1 Users can “Digg” Stories
2 Users can Share and Favorite Stories
3 Recommendations by the Recommender Engine

Figure 4.8: Digg – Topic Settings

Figure 4.9: Digg – Homepage
1 Non-Personalized Recommendations
2 Personalized Recommendations from the Recommender Engine

A user’s recommended upcoming stories are displayed on the recommendations page, see Figure 4.10.
The right pane of the page shows a list of the most highly correlated users together with their
compatibility percentage, which represents the correlation coefficient. This allows the
user to explore the correlated users. In addition, for every recommended story the correlated users who
have “dugg” this story are shown, including their compatibility percentage. Clicking on the
compatibility percentage of a correlated user opens a page that displays the correlation to this user in detail,
see Figure 4.11. It lists which stories both users have “dugg” and which stories are currently
recommended through this correlation. The user can also remove the correlation to this user
from his recommendation calculation.

The recommender engine works in real time without prediction models or batch processing. To
achieve this for more than 2 million users, Digg uses its own graph database [12].

As a social platform Digg enables users to create social networks by designating other users as friends.
Users can explore the stories their friends found interesting, which makes Digg also a social recom-
mendation engine.

Figure 4.10: Digg – Recommendations
1 Recommendations by the Recommender Engine
2 Correlated User with Compatibility Percentage
3 Highly Correlated Users with Compatibility Percentage

Figure 4.11: Digg – Correlated User
1 Remove User from the Recommender Engine
2 Shared “Dugg” Stories

5 Conclusion

Recommender systems are a powerful technology for personalization. Used in the right way, they
can benefit both consumers and businesses. Consumers profit by finding new interesting products and
businesses can increase their sales.

As e-commerce continues to grow, recommender engines are challenged to deal
with ever greater amounts of data. Systems must therefore be developed further to meet this challenge in
terms of recommendation accuracy, scalability, and performance.

Item-based collaborative filtering proves to be the best recommendation technique in terms of recom-
mendation quality, scalability, performance, and learning capability [7]. Combined in a hybrid system
with content-based techniques in order to overcome the cold start problem, this is the state of the art
of recommender systems used today.

There are many fields of application for recommender engines and many have their own requirements
that get fulfilled by different techniques. So which recommendation technique works best always
depends on the concrete use case.

Bibliography

[1] Burke, R. (2002): Hybrid Recommender Systems: Survey and Experiments.
In: User Modeling and User-Adapted Interaction, Volume 12, Issue 4 (November 2002), Kluwer
Academic Publishers, pp. 331–370

[2] Leavitt, N. (2006): Recommendation Technology: Will It Boost E-Commerce?
In: Computer Journal, Volume 39, Issue 5 (May 2006), IEEE Computer Society Press, pp. 13–16

[3] Thompson, C. (2008): If You Liked This, You’re Sure to Love That.
In: The New York Times Magazine (November 21, 2008), http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html

[4] Schafer, J. B. et al. (2001): E-Commerce Recommendation Applications.
In: Data Mining and Knowledge Discovery, Volume 5, Issue 1-2 (January–April 2001), pp. 115–153

[5] Kim, J. (2006): What is a recommender system?
In: Proceedings of Recommenders06.com (2006), pp. 1–21

[6] McCrae, J. et al. (2004): Collaborative Filtering.
http://www.imperialviolet.org/suprema.pdf

[7] Candillier, L. et al. (2009): State-of-the-Art Recommender Systems.
In: Collaborative and Social Information Retrieval and Access (2009), Idea Group Inc, pp. 1–22

[8] Adomavicius, G.; Tuzhilin, A. (2004): Recommendation Technologies: Survey of Current Methods and Possible Extensions.
Working paper, Stern School of Business, New York University

[9] Burke, R. (2007): Hybrid Web Recommender Systems.
In: Lecture Notes in Computer Science (2007), Springer Berlin/Heidelberg, pp. 377–408

[10] ChoiceStream, Inc.: Personalization Technology Brief.
http://www.choicestream.com/resources/

[11] Linden, G. et al. (2003): Amazon.com Recommendations: Item-to-Item Collaborative Filtering.
In: IEEE Internet Computing, Volume 7, Issue 1 (January/February 2003), pp. 76–80

[12] Rose, K. (2008): Recommendation Engine Announcement.
http://blog.digg.com/?p=127

[13] Kast, A. (2008): Digg Recommendation Engine White Paper.
http://digg.com/whitepapers/recommendationengine
