Recommender Engines Seminar Paper

  • 1. RWTH Aachen University, University of Bonn, Fraunhofer FIT. E-Commerce Seminar WT 08/09. Recommender Engines: Seminar Paper. Thomas Hess (289222). February 1, 2009
  • 2. Abstract
    Recommender engines are used by more and more e-commerce businesses to help consumers find products they are interested in. The paper describes what recommender engines are and what role they play in e-commerce. Recommender engines use various techniques that draw on different knowledge sources to make recommendations. The paper explains these techniques and their strengths and weaknesses. Some of the common issues that recommender systems face are discussed and possible solutions presented. Finally, examples of recommender engines in e-commerce are described, showing which techniques they use and how the e-businesses employ recommendations on their websites.
  • 3. Contents
    1 Introduction
    2 Recommender Techniques
      2.1 Non-Personalized Recommendation
      2.2 Demographic Recommendation
      2.3 Content-Based Recommendation
      2.4 Collaborative Filtering
        2.4.1 User-Based Approach
        2.4.2 Item-Based Approach
        2.4.3 Model-Based Approach
      2.5 Hybrid Approaches
    3 Issues And Solutions
      3.1 Data Collection
      3.2 Cold Start
      3.3 Stability vs. Plasticity
      3.4 Sparsity
      3.5 Performance & Scalability
      3.6 User Input Consistency
      3.7 Privacy
    4 Recommender Engine Examples
      4.1 ChoiceStream
      4.2 Amazon.com
      4.3 Digg
    5 Conclusion
  • 4. List of Figures
    2.1 Knowledge Sources of Recommender Engines
    2.2 Non-Personalized Recommendation
    2.3 Demographic Recommendation
    2.4 Content-Based Recommendation
    2.5 User-Based Collaborative Filtering
    2.6 User-Based Collaborative Filtering Example
    2.7 Item-Based Collaborative Filtering
    2.8 Item-Based Collaborative Filtering Example
    2.9 Model-Based Collaborative Filtering
    4.1 ChoiceStream Recommender Engine
    4.2 Amazon – Item With Recommendations
    4.3 Amazon – Shopping Cart With Recommendations
    4.4 Amazon – Your Recommendations
    4.5 Amazon – Recommendation Details
    4.6 Amazon – Your Purchases
    4.7 Digg – Story
    4.8 Digg – Topic Settings
    4.9 Digg – Homepage
    4.10 Digg – Recommendations
    4.11 Digg – Correlated User
  • 5. 1 Introduction
    Recommender engines are personalized information agents that attempt to predict which items out of a large pool a user may be interested in. These items can be of any type, such as movies, music, books, websites, or news articles. The user's interest in an item is expressed through the rating the user gives the item. A recommender system has to predict the ratings for items that the user has not yet seen. Using these estimated ratings, the system can recommend the items with the highest estimated rating.

    Recommender engines have become an integral part of many e-commerce businesses [1, 2]. They are a serious business tool used by an ever-increasing number of online stores. Recommender systems are a feature unique to e-commerce because, unlike physical stores, websites are able to track everything their customers do. The knowledge learned from the customers' behaviour is the basis for the recommendations. Because online businesses have no physical space constraints, they can offer much larger stocks, giving their customers more choice. Such large stocks become impossible for customers to search exhaustively, so e-commerce stores must offer individual users personalized views with a reduced choice. One way to achieve this is the use of recommender engines.

    For e-commerce vendors, recommender engines provide multiple benefits. Good recommender systems present customers with products they are interested in but did not plan to buy, leading them to purchase more items [2, 3, 4]. Such unplanned purchases do not yet happen as often in online stores as in traditional stores [2]. Recommender engines can help to gain consumers' loyalty, which is an essential business strategy in e-commerce, as the competitor is always just “one click away” [4]. Because recommender systems make it easier and faster to find new items, customers come back more often [2].
    The more a user uses a website and purchases items, the more the recommender engine learns about the user and the better the recommendations get. This helps to build a “value-added relationship” between the website and the user [4]. Recommender systems are also a way to promote older or low-demand items, such as niche products [2].
  • 6. 2 Recommender Techniques
    The techniques used by recommender engines can be classified by the information sources they use [5, 2]. The available sources are the user features (demographics, e.g. age, gender, profession, income, location), the item features (e.g. keywords, genres), and the user-item ratings (gathered through questionnaires, explicit ratings, or transaction data). See figure 2.1.

    2.1 Non-Personalized Recommendation
    Non-personalized recommendations are identical for every user. The recommendations are either manually selected (e.g. editor's choices) or based on the popularity of items (e.g. average ratings, sales data). See figure 2.2.

    Figure 2.1: Knowledge Sources of Recommender Engines (From [5])
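    For the popularity-based variant of non-personalized recommendation in 2.1, a minimal sketch might rank items by average rating and hand every user the same list. The (user, item, rating) triple format, the function name `popular_items` and the `min_ratings` cutoff are assumptions for illustration, not part of the paper.

```python
from collections import defaultdict


def popular_items(ratings, n, min_ratings=1):
    """Rank items by average rating; every user receives the same list.

    ratings: iterable of (user, item, rating) triples (hypothetical format).
    min_ratings: hypothetical cutoff so that a single enthusiastic rating
    cannot push an item to the top.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    averages = {
        item: totals[item] / counts[item]
        for item in totals
        if counts[item] >= min_ratings
    }
    # Highest average first; more ratings, then item id, break ties.
    ranked = sorted(averages, key=lambda i: (-averages[i], -counts[i], i))
    return ranked[:n]


ratings = [
    ("alice", "book_a", 5), ("bob", "book_a", 4),
    ("alice", "book_b", 3), ("carol", "book_b", 2),
    ("bob", "book_c", 5),
]
print(popular_items(ratings, n=2, min_ratings=2))
```

    Because the result does not depend on who is asking, it can be computed once and cached, which is what makes this approach cheap.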
  • 7. Figure 2.2: Non-Personalized Recommendation (From [5])
    Because non-personalized recommendations are easy to compute, they are popular among e-commerce businesses. They are also an option for websites that offer no personalization.

    2.2 Demographic Recommendation
    Demographic recommendation methods use only the information about the users. The users are categorized by the attributes of their demographic profiles in order to find users with similar features. The engine then recommends items that are preferred by these similar users. See figure 2.3.

    Advantages
      • Because no ratings from the user are required, new users can get recommendations before they have rated any item.
      • Knowledge about the items and their features is not needed; therefore the technique is domain-independent.

    Figure 2.3: Demographic Recommendation (From [5])
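    The demographic approach of 2.2 can be sketched by grouping users on coarse profile attributes and counting what the other members of the group liked. The attribute schema (age and location), the age bands and all names below are hypothetical; a real system would use its own classification.

```python
from collections import Counter


def age_band(age):
    # Hypothetical coarse bands; real systems would choose their own categories.
    return "under_30" if age < 30 else "30_to_49" if age < 50 else "50_plus"


def demographic_recommend(target, profiles, liked_items, n=3):
    """profiles: user -> {"age": int, "location": str} (hypothetical schema).
    liked_items: user -> set of items that user rated positively."""
    key = (age_band(profiles[target]["age"]), profiles[target]["location"])
    counts = Counter()
    for user, profile in profiles.items():
        if user == target:
            continue
        # Users in the same demographic cell count as "similar users".
        if (age_band(profile["age"]), profile["location"]) == key:
            counts.update(liked_items.get(user, set()))
    # The target needs no ratings of their own; only their profile is used.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


profiles = {
    "new_user": {"age": 25, "location": "Aachen"},
    "u1": {"age": 27, "location": "Aachen"},
    "u2": {"age": 22, "location": "Aachen"},
    "u3": {"age": 61, "location": "Aachen"},
}
liked = {"u1": {"jazz_cd", "novel"}, "u2": {"novel"}, "u3": {"garden_tools"}}
print(demographic_recommend("new_user", profiles, liked))
```

    The crudeness criticised in the paper is visible here: everyone in the same cell gets the same list, however different their tastes.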
  • 8. Figure 2.4: Content-Based Recommendation (From [5])
    Problems
      • Gathering the required demographic data leads to privacy issues, see 3.7.
      • Demographic classification is too crude for highly personalized recommendations [5, 3]. The generalisations derived from the classification are often false, especially for cultural items like books, music, or movies [6, 3].
      • Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).
      • Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

    2.3 Content-Based Recommendation
    Content-based recommendation methods use the information about item features and the ratings a user has given to items. The technique combines these ratings into a profile of the user's interests based on the features of the rated items. The engine can then find items with the preferred features and recommend those most similar to the items the user preferred in the past. See figure 2.4. The recommendations of a content-based system are based on individual information and ignore contributions from other users. The profiles of the users' interests are often represented as vectors of weights on item features. If automatic learning methods, such as a rule induction algorithm, are used to generate them, they can also be rule-based [7].

    Content-based recommendation works well if the items can be properly represented as a set of features. The quality of the recommendations depends directly on the quality of the available descriptive data. In order to have a sufficient set of features, the item descriptions must either be in a form from which features can be extracted automatically with information retrieval techniques (e.g. text), or
  • 9. the features must be assigned manually, which takes a lot of resources [8]. Besides objective categorizations, systems can also use (user-generated) tags associated with items, which provide a subjective view.

    Problems
      • Content analysis is necessary to determine the item features.
      • The technique depends not only on the quality of the item metadata but also on the homogeneity of the stock, so that items can be categorized.
      • The quality of items cannot be evaluated; the similarity computation is limited to the item features [5].
      • The technique suffers from the cold start problem for new users, see 3.2.
      • Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

    2.4 Collaborative Filtering
    Collaborative filtering techniques use the user behaviour, in the form of user-item ratings, as their information source. The concept is to find correlations between users or between items. Collaborative filtering is widely implemented and the most mature recommendation technique. Three main approaches of collaborative filtering can be distinguished: user-based, item-based, and model-based approaches.

    Advantages
      • As with demographic recommendation, no knowledge about the item features is needed. Collaborative filtering works completely independently of machine-readable item representations and is therefore domain-independent.
      • The quality (not just the relevance) of items can be evaluated, as it is also expressed through user-item ratings [5].
      • Collaborative filtering techniques are able to make recommendations “outside the box” because they look beyond the preferences of the individual user [1].
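    The content-based method of section 2.3 (a profile as a vector of weights on item features, recommending the most similar unseen items) can be sketched as follows. Items are described by hypothetical keyword sets, profile weights are ratings centred on the midpoint of an assumed 1 to 5 scale, and cosine similarity is used as one common similarity measure; none of these choices is prescribed by the paper.

```python
import math
from collections import defaultdict


def build_profile(user_ratings, item_features):
    """Profile = sum of the rated items' feature vectors, weighted by the
    rating minus 3 (assumed 1-5 scale), so disliked items count negatively."""
    profile = defaultdict(float)
    for item, rating in user_ratings.items():
        for feature in item_features[item]:
            profile[feature] += rating - 3
    return profile


def cosine(profile, features):
    # Items are binary feature vectors, so the dot product sums shared features.
    dot = sum(profile.get(f, 0.0) for f in features)
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_i = math.sqrt(len(features))
    return dot / (norm_p * norm_i) if norm_p and norm_i else 0.0


def content_based_recommend(user_ratings, item_features, n=2):
    profile = build_profile(user_ratings, item_features)
    candidates = [i for i in item_features if i not in user_ratings]
    return sorted(candidates, key=lambda i: (-cosine(profile, item_features[i]), i))[:n]


item_features = {
    "dune": {"scifi", "classic"},
    "emma": {"romance", "classic"},
    "foundation": {"scifi", "classic"},
    "neuromancer": {"scifi", "cyberpunk"},
    "persuasion": {"romance"},
}
print(content_based_recommend({"dune": 5, "emma": 1}, item_features))
```

    Only this one user's ratings are involved, which matches the paper's point that content-based systems ignore contributions from other users.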
  • 10. Figure 2.5: User-Based Collaborative Filtering (From [5])
    Problems
      • The quality of the recommendations depends on the size of the historical rating data set.
      • The technique suffers from the cold start problem for new users and new items, see 3.2.
      • Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).
      • Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

    2.4.1 User-Based Approach
    The user-based approach rests on the assumption that users who rated the same items similarly probably have the same taste. It builds user-to-user correlations, using the rating profiles of different users to find highly correlated users. These users form like-minded neighbourhoods based on their shared item preferences. The engine can then recommend the items preferred by the other users in the neighbourhood. See figure 2.5. Figure 2.6 shows an example of user-based collaborative recommendation.

    If there are few overlapping ratings across users in the data set, however, the user-based approach runs into the sparsity problem, see 3.4. User-based collaborative filtering does not scale well to many users and items, because the analysis and comparison processes become more complex, see 3.5.

    2.4.2 Item-Based Approach
    The item-based approach focuses on items, assuming that items rated similarly are probably similar. It compares items based on the shared appreciation of users in order to create neighbourhoods of similar
  • 11. items. The engine then recommends the items neighbouring the ones the user is known to prefer. See figure 2.7. Figure 2.8 shows an example of item-based collaborative recommendation.

    Item-based collaborative filtering is more scalable than the user-based approach, as the correlations are drawn among a limited number of products instead of a potentially very large number of users. Items are also easy to categorize, while users' activities must be examined and analyzed. See 3.5. Also, because the number of items is typically smaller than the number of users, the item-based approach suffers less from the sparsity problem (see 3.4) than the user-based approach.

    Figure 2.6: User-Based Collaborative Filtering Example (From [5])
    Figure 2.7: Item-Based Collaborative Filtering (From [5])
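    The user-based approach of section 2.4.1 can be sketched with two common choices that the paper does not prescribe: Pearson correlation over co-rated items to find like-minded neighbours, and a mean-centred weighted average of the neighbours' ratings as the prediction. The data and function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(ra, rb):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    ma = mean([ra[i] for i in common])
    mb = mean([rb[i] for i in common])
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)
                    * sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def predict_user_based(ratings, user, item, k=2):
    """Predict ratings[user][item] from the k most correlated users who rated it."""
    candidates = [(pearson(ratings[user], ratings[v]), v)
                  for v in ratings if v != user and item in ratings[v]]
    # Keep only positively correlated neighbours, i.e. the like-minded ones.
    neighbours = [(w, v) for w, v in sorted(candidates, reverse=True)[:k] if w > 0]
    base = mean(list(ratings[user].values()))
    if not neighbours:
        return base  # no neighbourhood: fall back to the user's own mean
    num = sum(w * (ratings[v][item] - mean(list(ratings[v].values())))
              for w, v in neighbours)
    return base + num / sum(w for w, _ in neighbours)


ratings = {
    "alice": {"a": 5, "b": 1, "c": 3},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 3, "d": 1},
}
print(predict_user_based(ratings, "alice", "d"))
```

    Carol's negative correlation keeps her out of Alice's neighbourhood, so only Bob's opinion shapes the prediction. Computing `pearson` against every candidate user at request time is also why this approach scales poorly (see 3.5).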
  • 12. Figure 2.8: Item-Based Collaborative Filtering Example (From [5])

    2.4.3 Model-Based Approach
    For huge data sets, the cost of working on the user-item rating matrix, which grows quadratically, becomes very high [7]. In real applications, however, predictions must be made quickly. Model-based approaches address this problem by deriving a model for prediction from historical user-item rating data in order to make the online prediction process faster. To build the model, learning techniques like Bayesian networks, neural networks, or latent semantic indexing are used. For an accurate model, a large amount of data must be available. The engine then makes the online recommendations by using the model. See figure 2.9.

    As the model is built in advance of the online recommendation process, this approach performs better than the memory-based approaches (the user- and item-based approaches, which work directly on the stored rating data) and avoids the scalability problem, see 3.5. Depending on the learning techniques used to create the model, this approach can lead to higher recommendation accuracy and a reduced sparsity problem [5].

    Figure 2.9: Model-Based Collaborative Filtering (From [5])

    The major drawback of the model-based approach is that the recommendation results do not adapt
  • 13. automatically to data changes. Instead, the model must be rebuilt to reflect updated data.

    2.5 Hybrid Approaches
    Hybrid approaches combine collaborative and demographic or content-based methods in order to overcome their drawbacks. Collaborative filtering systems often achieve better predictive performance but have problems when few user-item ratings are available [7]. Demographic and content-based recommendation systems work without rating data and can therefore compensate for the cold start problem [1].

    There are various methods to combine recommender techniques in a hybrid system [1, 9]:
      Weighted Hybridization: The scores of the different recommendation components are combined numerically. Each component of the hybrid system scores a given item, and the scores are combined using a linear formula.
      Switching Hybridization: The system chooses among recommendation components based on the situation and applies the selected one. A reliable criterion must be available on which to base the switching decision.
      Mixed Hybridization: Recommendations from different recommenders are presented side by side in a combined list. The individual results are shown together, but their scores are not merged.
      Feature Combination: Features derived from different knowledge sources are combined and then fed into a single recommendation algorithm.
      Feature Augmentation: One recommendation technique is used to compute a feature or set of features, which then becomes part of the input to the next technique.
      Cascaded Hybridization: Recommenders are given strict priority, with the lower-priority ones breaking ties in the scoring of the higher ones.
      Meta-Level Hybridization: One recommendation technique is applied to produce a model, which is then used as the input for another technique.
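    Two of the hybridization methods listed in 2.5 are simple enough to sketch directly: weighted hybridization as a linear combination of component scores, and switching hybridization as a rule that picks one component. The component names, the weights, the assumption that scores are normalised to [0, 1], and the "at least 5 ratings" switching criterion are all hypothetical.

```python
def weighted_hybrid(scores_by_component, weights):
    """Weighted hybridization: combine each component's item scores linearly.

    scores_by_component: component name -> {item: score}, scores assumed in [0, 1].
    weights: component name -> weight in the linear formula.
    """
    combined = {}
    for name, scores in scores_by_component.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return sorted(combined, key=lambda i: (-combined[i], i))


def switching_hybrid(num_user_ratings, collaborative, content_based, min_ratings=5):
    """Switching hybridization: apply exactly one component, chosen by a
    (hypothetical) criterion, here the number of ratings the user has made."""
    return collaborative if num_user_ratings >= min_ratings else content_based


scores = {
    "collaborative": {"x": 0.9, "y": 0.2},
    "content": {"x": 0.1, "y": 0.8, "z": 0.5},
}
print(weighted_hybrid(scores, {"collaborative": 0.6, "content": 0.4}))
```

    A switching rule like this one is also one way of addressing the cold start problem of 3.2: users with too few ratings are served by the component that does not depend on them.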
  • 14. 3 Issues And Solutions
    3.1 Data Collection
    The data used by recommender engines can be categorized into explicit and implicit data [2]. Explicit data is all data that users themselves feed into the system, such as demographic data, information about their preferences (e.g. collected through questionnaires), search terms, and explicit ratings and reviews of items (wisdom of the crowds). The collection of explicit data must not be intrusive or time-consuming. The way explicit data is collected can affect the quality and amount of data users will provide [10]. Recommender systems should therefore not rely completely on explicit data.

    Websites are able to track their users' activities in order to acquire implicit data. The most important implicit data source in e-commerce is transaction data, including purchase information. Other sources are web usage patterns like click sequences or reading times, or search engine referrers. Implicit data needs to be analyzed before it can be used to describe user features or user-item ratings.

    3.2 Cold Start
    The cold start problem occurs when too little rating data is available in the initial state. The recommender system then lacks the data to produce appropriate recommendations. A distinction is made between the new user problem and the new item problem.
      New User Problem: When recommendations follow from user-to-user correlations based on accumulated ratings, a user with few ratings is difficult to categorize.
  • 15.   New Item Problem: An item with few ratings cannot easily be recommended. This problem occurs particularly in domains with many new items (e.g. news articles). As the problem also occurs for long tail items, it is also called the “long tail problem” [10].

    A solution to the cold start problem is to combine the collaborative technique with demographic (for the new user problem) or content-based (for the new item problem) techniques in a hybrid recommender engine, see 2.5. That way the cold start problem is compensated by techniques that do not rely on user-item ratings. Other solutions to reduce the cold start problem are the use of default ratings (e.g. the average rating of all users) [6, 10] or the use of active learning techniques in model-based recommendation [5].

    3.3 Stability vs. Plasticity
    The converse of the cold start problem is the stability vs. plasticity problem. When users have rated a lot of items, their preferences in the established user profiles are difficult to change [1, 9]. Because taste evolves in reality, this becomes a problem. The solution is to gradually discount older ratings so that they have less influence. By doing so, however, engines risk losing information about long-term interests [1, 9].

    A related problem is that users may use a website with different intentions. For example, one day a customer buys books for himself, but the next day he is looking for a present for someone else.

    3.4 Sparsity
    In most use cases for recommender systems, due to the catalog sizes of e-business vendors, the number of ratings already obtained is usually very small compared to the number of ratings that need to be predicted. Collaborative filtering techniques, however, depend on an overlap in ratings across users and have difficulties when the space of ratings is sparse (few users have rated the same items). Sparsity in the user-item rating matrix degrades the quality of the recommendations.
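    The remedy for the stability vs. plasticity problem in 3.3, gradually discounting older ratings, can be sketched with exponential time decay. Exponential decay is only one possible discounting scheme; the (genre, rating, day) tuple format, the 180-day half-life and the function name are assumptions for illustration.

```python
import math


def decayed_interest(history, now, half_life_days=180.0):
    """Aggregate a user's ratings per genre, discounting older ratings.

    history: list of (genre, rating, day) tuples, day as a day number
    (hypothetical format). A rating loses half its weight every
    half_life_days, so recent behaviour dominates the profile.
    """
    decay = math.log(2) / half_life_days
    interest = {}
    for genre, rating, day in history:
        weight = math.exp(-decay * (now - day))
        interest[genre] = interest.get(genre, 0.0) + rating * weight
    return interest


history = [("fantasy", 5, 0), ("fantasy", 5, 10), ("cooking", 4, 360)]
print(decayed_interest(history, now=360))
```

    The half-life is the knob for the trade-off described above: a long half-life favours stability (long-term interests survive), a short one favours plasticity (new interests take over quickly).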
  • 16. To reduce sparsity, the rating data needs to be adjusted either by adding ratings or by reducing the dimensionality of the matrix. Ratings can be augmented by inserting simulated values on behalf of the users. These can be ratings derived from other (implicit) data sources, like item views or clicks, or default values [6]. The dimensionality of the rating matrix can be reduced by techniques such as singular value decomposition [1]. Singular value decomposition is a well-known method of matrix factorization that provides the best lower-rank approximations of the original matrix. Dimensionality reduction techniques are often used in model-based collaborative filtering approaches [1].

    3.5 Performance & Scalability
    Performance and scalability are important issues for recommender systems, as e-commerce websites must be able to determine recommendations in real time and often deal with huge data sets of millions of customers and items. The high growth rates of e-businesses make these sets even larger, particularly in the user dimension [6].

    The computational complexity of a recommendation technique is decisive for its performance. Techniques that calculate correlation coefficients for M users over N items have a complexity of O(M × N) in the worst case. Due to the common sparsity of the user-item rating matrix, the performance tends to be closer to O(M + N) [11]. For large data sets, however, this still leads to performance and scaling issues.

    Techniques that can perform the most expensive calculations offline scale better than techniques where everything must be calculated online, in real time [11]. Demographic and content-based recommendation as well as item- and model-based collaborative filtering can make use of offline computation. User-based collaborative filtering, however, can do little or no offline computation, which makes it impractical for large data sets [11].
    In addition to performing calculations offline, all methods that reduce the size of the data set improve the performance and scalability of a recommendation technique [6]. For example, users with very few ratings, or very popular or unpopular items, could be discarded [11]. These methods, however, also reduce the recommendation quality.
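    The two sparsity remedies in 3.4, filling gaps with default values and reducing dimensionality with singular value decomposition, can be combined in a small sketch: missing ratings are replaced by the user's mean, and the matrix is reconstructed from its top-k singular components. To stay dependency-free it computes the singular vectors by power iteration with deflation, a swapped-in numerical method suitable only for toy sizes; in practice a library routine such as NumPy's `numpy.linalg.svd` would be used. All names and data are hypothetical.

```python
import math
import random


def top_singular_triplets(A, k, iters=200, seed=0):
    """Top-k (sigma, u, v) of a dense matrix A (list of rows) via power
    iteration on A^T A with deflation. Only meant for small examples."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    R = [list(row) for row in A]  # residual, deflated after each component
    triplets = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]
            v = [sum(R[i][j] * u[i] for i in range(m)) for j in range(n)]
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break
            v = [x / norm for x in v]
        u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma < 1e-12:
            break  # remaining matrix is (numerically) zero
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(n)] for i in range(m)]
    return triplets


def low_rank_predict(ratings, k):
    """ratings: rows of user ratings with None for missing entries.
    Missing entries get the user's mean (a default value, assuming every
    user rated something), then the matrix is rebuilt from k components."""
    filled = []
    for row in ratings:
        known = [r for r in row if r is not None]
        default = sum(known) / len(known)
        filled.append([default if r is None else r for r in row])
    m, n = len(filled), len(filled[0])
    triplets = top_singular_triplets(filled, k)
    return [[sum(s * u[i] * v[j] for s, u, v in triplets) for j in range(n)]
            for i in range(m)]


ratings = [
    [5, 4, None, 1],
    [4, None, 4, 1],
    [1, 1, None, 5],
]
approx = low_rank_predict(ratings, k=1)
print(round(approx[0][2], 2))  # rank-1 estimate for user 0, item 2
```

    Keeping k small is what smooths the sparse, partly defaulted ratings into a few latent dimensions; computing these components in advance is what makes it a model-based approach.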
  • 17. 3.6 User Input Consistency
    Recommender techniques that work with user-to-user correlations, like demographic recommendation or collaborative filtering, depend on high correlation coefficients between the users in a data set. Users can be split into three classes based on their correlation coefficients with other users [6]. The majority of users fall into the class of “white sheep”, who have a high rating correlation with many other users. Engines can easily find recommendations for these users. The opposite type are the “black sheep”, for whom there are only few or no correlating users. This makes it very difficult to find recommendations for them. When the number of users in a data set increases, however, the chance of finding similar users increases as well.

    The bigger problem is the “gray sheep”. These users have different opinions or an unusual taste, which results in low correlation coefficients with many users. They fall on the border between user cliques. Recommendations for them are very difficult to find, and they also cause odd recommendations for the users they are correlated with.

    3.7 Privacy
    Privacy is an important issue in recommender systems. In order to provide personalized recommendations, recommender systems must know something about the users. In fact, the more the systems know, the more accurate the recommendations can get. Users are understandably concerned about what information is collected, how it is used, and whether it is stored.

    These privacy concerns affect the collection of both explicit and implicit data. Regarding explicit data, users are reluctant to disclose information about themselves and their interests [2, 4]. If questionnaires get too personal, users may provide false information in order to protect their privacy [4]. Recommender engines should be able to deal with privacy-conscious users and not rely solely on explicit data, or on recommender techniques that do, such as demographic recommendation.
    Regarding implicit data acquired by tracking users' behaviour, there are concerns that personal taste or private actions are revealed through the recommendations [5]. Users fear that extensive consumer profiles are created.
  • 18. To address these concerns, e-commerce businesses must provide privacy protection mechanisms [5] and make transparent which data is acquired and analyzed. Usage and storage restrictions must be guaranteed through privacy policies [4].
  • 19. 4 Recommender Engine Examples
    Recommender engines are developed and run both by independent technology vendors and by e-commerce businesses themselves. The business model of recommendation technology vendors is either to offer the recommender engine as a hosted service or to license their engines to e-commerce businesses. Examples of technology vendors are ChoiceStream¹, Baynote², ExpertMaker³, Loomia⁴, Criteo⁵, SourceLight⁶, and Collarity⁷.

    Bigger e-commerce businesses in particular develop their own recommender solutions because they have unique requirements, want unique features, or deal with items for which third-party products are not suited. Examples are Amazon.com⁸, Netflix⁹, Digg¹⁰, The Internet Movie Database (IMDb)¹¹, Pandora¹², and Last.fm¹³.

    The following sections describe in detail the techniques and uses of the recommender engines of ChoiceStream, Amazon.com, and Digg.

    1 http://www.choicestream.com
    2 http://www.baynote.com
    3 http://www.expertmaker.com
    4 http://www.loomia.com
    5 http://www.criteo.com
    6 http://www.sourcelight.com
    7 http://www.collarity.com
    8 http://www.amazon.com
    9 http://www.netflix.com
    10 http://digg.com
    11 http://www.imdb.com
    12 http://www.pandora.com
    13 http://www.last.fm
  • 20. 4.1 ChoiceStream
    ChoiceStream is a personalisation company that offers its recommendation technology “RealRelevance Recommendations” as a fully hosted service for e-commerce vendors. Because each recommendation technique has its drawbacks and none is suited to all fields of application, ChoiceStream uses a hybrid system based on a variety of techniques that are chosen and combined depending on the concrete recommendation use case at hand [10]. The use cases that ChoiceStream distinguishes are listed in table 4.1.

    The recommendation techniques used by the ChoiceStream recommender engine are [10]:
      Collaborative Filtering: Both user-based and item-based collaborative filtering are used.
      Collaborative Filtering Using Multiple Correlation Tables: Multiple correlation tables (e.g. item views or clicks in addition to transactions) are used to overcome the cold start problem (see 3.2).
      Cohort Analysis: Groups of similar users, called cohorts, are created in order to make better recommendations for users with sparse rating data.

    Use Case                     | Definition
    -----------------------------+---------------------------------------------------------------
    Rich Profile User            | Users for whom you have a lot of data (e.g. more than 5 transactions).
    Sparse Profile User          | Users for whom you have little data (e.g. 1 to 4 transactions).
    Anonymous / New User         | Users for whom you have no data.
    Popular Content              | Items in your catalog that you can determine are “most popular”. Typically these will be few in number, but very high volume.
    Mainstream Content           | Items for which you have recorded patterns of behavior (e.g. more than 20 transactions per item).
    New Content                  | Items for which there are no past transactions.
    Long Tail Content            | Items in a catalog which are less well known, but still profitable, and for which there are few past transactions.
    Business Goal Optimization   | The requirement to maximize a metric other than the number of transactions, such as revenue, margin, or order size.
    Table 4.1: ChoiceStream – Common Use Cases Requiring Different Algorithms (From [10])
  • 21.   Selective Filtering: Selective filtering removes the most popular items from the recommendations so that they do not dominate and customers can find less popular items.
      Attribute Correlations: Item attributes are used to make content-based recommendations that overcome the cold start problems of collaborative filtering.
      Default Recommendations: Default recommendations are the fallback if all other techniques fail to determine recommendations.
      Business Goal Optimization: With a multi-term scoring function, the recommendation algorithm can be adjusted to, for example, preferably recommend higher-priced items in order to increase revenue.

    Figure 4.1 shows which techniques the ChoiceStream recommender engine uses for which use cases.

    Figure 4.1: ChoiceStream Recommender Engine (From [10])
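    ChoiceStream's actual scoring function is not described in the source. The following hypothetical sketch only shows how three of the listed ideas could fit together: selective filtering that drops the best sellers, a two-term score that also rewards price (business goal optimization), and a default list as fallback. All parameter names and weights are assumptions.

```python
def choose_recommendations(candidates, popularity, price, n=3,
                           drop_top_popular=1, price_weight=0.2, defaults=()):
    """Hypothetical combination of selective filtering, a multi-term score
    and default recommendations.

    candidates: item -> relevance score from some upstream recommender.
    popularity: item -> sales rank (1 = best seller).
    price: item -> price normalised to [0, 1].
    """
    # Selective filtering: drop the most popular items so they do not dominate.
    kept = {i: s for i, s in candidates.items() if popularity[i] > drop_top_popular}
    # Business goal optimization: relevance plus a revenue-oriented price term.
    score = {i: (1 - price_weight) * s + price_weight * price[i]
             for i, s in kept.items()}
    ranked = sorted(score, key=lambda i: (-score[i], i))[:n]
    # Default recommendations: fallback when nothing else is left.
    return ranked if ranked else list(defaults)[:n]


candidates = {"a": 0.9, "b": 0.8, "c": 0.7}
popularity = {"a": 1, "b": 2, "c": 3}
price = {"a": 0.1, "b": 0.1, "c": 0.9}
print(choose_recommendations(candidates, popularity, price))
```

    Raising `price_weight` shifts the list toward expensive items at the expense of relevance, which is exactly the trade-off a vendor would tune against its business goal.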
  • 22. 4.2 Amazon.com
    Amazon.com, founded in 1994, is the largest online retailer worldwide and one of the best-known examples of e-commerce businesses using a recommender engine. Amazon uses its recommendation engines extensively to personalize its website.

    Amazon's recommender engine is based on item-based collaborative filtering [5, 6, 11]. It looks for items that correlate with the ones purchased and rated and combines the highly correlated items into a recommendation list [11].

    The recommendation engine consists of an online and an offline component. The offline component creates an item-to-item matrix of all similar items. The online component can then look up recommendations in the matrix when they are needed [11]. To build the item-to-item matrix, a similarity function is used that determines the correlation coefficient between item pairs that customers tend to purchase together. This expensive calculation is done offline [11, 6]. The online component then only has to look up items similar to the ones a user has already purchased or rated. This is a very simple and fast operation that can be done online in real time. Its complexity depends only on the number of items a customer is associated with [11].

    By performing the most expensive calculations offline, Amazon's recommendation system can deal with the huge data set of approximately 50 million customers per month (from the U.S. alone) and several million catalog items. The online component scales independently of the catalog size and the number of customers [11]. Another benefit of the similar-items table is that the algorithm produces higher-quality recommendations for users with little user-item rating data than traditional collaborative filtering does [11].

    Customers Who Bought
    On the information page for every item, Amazon shows the “Customers Who Bought” feature, which recommends items frequently purchased by customers who purchased the selected item, see Figure 4.2.
As Figure 4.3 shows, the feature is also used on the shopping cart page. There it is the equivalent of the impulse items in a supermarket checkout line [11], except that here the impulse items are personalized for each customer.
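The offline/online split described above can be illustrated with a minimal Python sketch. It is not Amazon’s implementation: the similarity function is assumed to be cosine similarity between binary purchase vectors (the number of customers who bought both items, divided by the square root of the product of each item’s buyer count), and the names build_similar_items and recommend are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similar_items(purchases):
    """Offline step: build the item-to-item similar-items table.

    purchases maps a customer id to the set of item ids that customer bought.
    Assumed similarity: cosine between binary purchase vectors,
    i.e. co_purchases(a, b) / sqrt(buyers(a) * buyers(b)).
    """
    buyers = defaultdict(int)     # item -> number of customers who bought it
    co_counts = defaultdict(int)  # (item_a, item_b) -> customers who bought both
    for items in purchases.values():
        for item in items:
            buyers[item] += 1
        # Only pairs that co-occur in some customer's history are ever visited.
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    table = defaultdict(list)  # item -> [(similar_item, score), ...], best first
    for (a, b), both in co_counts.items():
        score = both / sqrt(buyers[a] * buyers[b])
        table[a].append((b, score))
        table[b].append((a, score))
    for neighbours in table.values():
        neighbours.sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(table)


def recommend(table, owned, n=5):
    """Online step: sum the similarities of the neighbours of the customer's items.

    The work grows with the number of items the customer owns (times the length
    of their neighbour lists), not with the catalog size or the customer count.
    """
    scores = defaultdict(float)
    for item in owned:
        for other, score in table.get(item, []):
            if other not in owned:
                scores[other] += score
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:n]]
```

For example, with purchases = {"alice": {"book", "lamp"}, "bob": {"book", "lamp", "pen"}, "carol": {"book", "pen"}}, recommend(build_similar_items(purchases), {"lamp"}) returns ["book", "pen"]. Passing the contents of a shopping cart as owned produces cart-page suggestions in the same way. Because the offline loop only visits item pairs that some customer actually bought together, it avoids comparing every pair in the catalog when most pairs are never co-purchased.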
Figure 4.2: Amazon – Item With Recommendations

Figure 4.3: Amazon – Shopping Cart With Recommendations
Your Recommendations

The page “Your Recommendations” lists all recommendations, with the ones derived from recent purchases first (see Figure 4.4). They can be filtered by product line and subject area. Users can mark recommended items as already owned or as not interesting, and they can rate them, which gives the recommender engine further rating data that influences what gets recommended. The page also shows why an item is recommended, that is, which purchased item it is correlated with. Additionally, the user can view a detail page for every recommendation that lists all its correlations to purchased or otherwise rated items (see Figure 4.5).

Amazon encourages users to refine their user-item rating data by offering the option to rate purchased items on a 5-point scale. On a page that lists all previous purchases, items can be rated and also excluded from the recommendation calculation (see Figure 4.6).
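The feedback options above can be illustrated with a hedged Python sketch that builds a “Your Recommendations” list on top of a precomputed similar-items table. Everything beyond what the text states is an assumption: the function name your_recommendations and its parameters are hypothetical, and the rule that a purchase’s star rating scales its influence (with unrated purchases counted as 3 stars) is illustrative, since the sources cited here do not describe how Amazon uses the ratings internally. The reasons list mirrors the “why is this recommended” explanation.

```python
from collections import defaultdict


def your_recommendations(similar_items, purchases, ratings, owned,
                         not_interested, excluded, n=10):
    """Rank recommendations and record which purchases explain each one.

    similar_items: item -> [(similar_item, similarity), ...] from an offline table.
    purchases:     items the customer bought.
    ratings:       item -> star rating on a 1-5 scale (missing = unrated).
    owned, not_interested: recommended items the customer marked on the page.
    excluded:      purchases the customer removed from the calculation.
    Returns a list of (item, score, reasons), best first.
    """
    hidden = set(purchases) | set(owned) | set(not_interested)
    scores = defaultdict(float)
    reasons = defaultdict(list)
    for source in purchases:
        if source in excluded:
            continue  # excluded purchases no longer produce recommendations
        # Assumption: an unrated purchase counts as 3 stars; ratings scale its weight.
        weight = ratings.get(source, 3) / 3
        for other, similarity in similar_items.get(source, []):
            if other in hidden:
                continue
            scores[other] += weight * similarity
            reasons[other].append(source)
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [(item, scores[item], sorted(reasons[item])) for item in ranked[:n]]
```

Marking an item as owned or not interesting simply hides it, while excluding a purchase removes it as a source of recommendations, so items that were recommended only because of that purchase disappear from the list.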
Figure 4.4: Amazon – Your Recommendations
(1) Recommended items can be marked as owned or as not interesting, and can be rated.
(2) It is shown why items are recommended.

Figure 4.5: Amazon – Recommendation Details

Figure 4.6: Amazon – Your Purchases
(1) Items can be rated.
(2) Items can be excluded from the recommendation engine.
4.3 Digg

Digg is a social news site, launched in 2004, where users submit links to websites. Users can rate these links, called stories, by “digging” or “burying” them. Stories can also be favorited, shared, and commented on (see Figure 4.7). The stories are categorized into various topics. Users can configure which topics they are interested in and will then only see stories in these categories throughout the website (see Figure 4.8).

The Digg homepage shows the most popular stories (see Figure 4.9), with popularity measured by the number of recent “diggs”. The homepage thus uses non-personalized recommendation.

For registered users, Digg provides personalized recommendations through its own recommendation engine, which is based on user-based collaborative filtering. The engine relies solely on the user-item ratings expressed through the “digg” function and works without any knowledge of the content of the stories [12].

The recommendation engine uses the user’s history of “dugg” stories from the last thirty days to make recommendations [13]. This short time span suits fast-moving internet news, avoids the stability vs. plasticity problem, and helps keep the size of the ratings matrix within limits.

Every time a user “diggs” a story, the engine associates the user with all other users who have also “dugg” that story. From these associations, the recommender system calculates a correlation coefficient between the users. The coefficient is based on the number of “dugg” stories the two users have in common relative to the total number of stories “dugg” by each of them [13]. Its value lies between zero and one: zero if the two users have never “dugg” the same story, and one if they share all their “dugg” stories. The coefficient calculation automatically accounts for the overall level of user activity.
If a user “diggs” a lot of stories, the number of common “dugg” stories must be high to yield a high correlation coefficient; if a user “diggs” rarely, a small amount of agreement can suffice. The users most highly correlated with a user are called “Diggers Like You”. The engine recommends the upcoming stories that these users have “dugg”, minus the stories the user has already “dugg” or buried. Stories are upcoming if they are newly submitted and have not yet made it to the homepage. The “Diggers Like You” therefore act as a filter over all upcoming stories: on average, more than 17,000 submissions per day are boiled down to about 300 recommendations [12].
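The user-based procedure above can be sketched in Python. The exact coefficient from [13] is not reproduced in this paper, so the sketch assumes c(A, B) = |A ∩ B| / √(|A| · |B|), where A and B are the sets of stories two users “dugg” in the last thirty days. This assumed formula has the stated properties: it is zero when the users share no stories and one when they share all of them. The cut-off min_compat and all function names are hypothetical.

```python
from datetime import datetime, timedelta
from math import sqrt

WINDOW = timedelta(days=30)  # only the last thirty days of diggs are used [13]


def recent_diggs(diggs, user, now):
    """Stories `user` dugg within the window.

    diggs is a list of (user, story, timestamp) with datetime timestamps.
    """
    return {story for u, story, ts in diggs if u == user and now - ts <= WINDOW}


def compatibility(a, b):
    """Assumed coefficient |A & B| / sqrt(|A| * |B|), which lies in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend_upcoming(diggs, buried, user, upcoming, now, min_compat=0.3):
    """Return (recommended story ids, {digger_like_you: coefficient})."""
    mine = recent_diggs(diggs, user, now)
    # Linear scans keep the sketch short; a real system would index diggs by user.
    others = {u for u, _, _ in diggs if u != user}
    like_you = {}
    for other in others:
        c = compatibility(mine, recent_diggs(diggs, other, now))
        if c >= min_compat:  # hypothetical cut-off for "Diggers Like You"
            like_you[other] = c
    candidates = set()
    for other in like_you:
        candidates |= recent_diggs(diggs, other, now) & upcoming
    # Drop stories the user has already dugg or buried.
    return candidates - mine - buried.get(user, set()), like_you
```

With this assumed coefficient, two users who each “dugg” 100 stories and share 10 of them score 10 / √(100 · 100) = 0.1, while two users who each “dugg” 4 stories and share 2 score 2 / √(4 · 4) = 0.5. This illustrates why heavy users need more overlap before they count as “Diggers Like You”.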
Figure 4.7: Digg – Story
(1) Users can “Digg” Stories
(2) Users can Share and Favorite Stories
(3) Recommendations by the Recommender Engine

Figure 4.8: Digg – Topic Settings

Figure 4.9: Digg – Homepage
(1) Non-Personalized Recommendations
(2) Personalized Recommendations from the Recommender Engine
A user’s recommended upcoming stories are displayed on the recommendations page (see Figure 4.10). The right pane of the page lists the most highly correlated users together with their compatibility percentage, which represents the correlation coefficient. This allows the user to explore the correlated users. In addition, every recommended story shows the correlated users who have “dugg” it, including their compatibility percentage. Clicking on a correlated user’s compatibility percentage opens a page that displays the correlation with this user in detail (see Figure 4.11). It lists which stories both users have “dugg” and which stories are currently recommended through this correlation. The user can also remove the correlation with this user from their recommendation calculation.

The recommender engine works in real time, without prediction models or batch processing. To achieve this for more than 2 million users, Digg uses its own graph database [12].

As a social platform, Digg enables users to build social networks by designating other users as friends. Users can explore the stories their friends found interesting, which also makes Digg a social recommendation engine.
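The real-time behaviour described above can be approximated by updating co-digg counts the moment a digg arrives, instead of recomputing a model in batch. The Python sketch below is only an in-memory stand-in for Digg’s graph database, whose design is not described here. The class DiggGraph and its methods, the coefficient |A ∩ B| / √(|A| · |B|) behind the compatibility percentage, and the omission of the thirty-day window are all assumptions.

```python
from collections import defaultdict
from math import sqrt


class DiggGraph:
    """Hypothetical in-memory graph: users and stories are nodes, diggs are edges."""

    def __init__(self):
        self.diggers = defaultdict(set)  # story -> users who dugg it
        self.dugg = defaultdict(set)     # user -> stories the user dugg
        self.common = defaultdict(lambda: defaultdict(int))  # user -> other -> shared diggs
        self.removed = defaultdict(set)  # user -> correlated users they removed

    def digg(self, user, story):
        """Update associations incrementally; cost grows with the story's diggers."""
        if story in self.dugg[user]:
            return  # repeated diggs are ignored
        for other in self.diggers[story]:
            self.common[user][other] += 1
            self.common[other][user] += 1
        self.diggers[story].add(user)
        self.dugg[user].add(story)

    def compatibility_percent(self, user, other):
        """Assumed coefficient |A & B| / sqrt(|A| * |B|), shown as a whole percentage."""
        shared = self.common.get(user, {}).get(other, 0)
        if shared == 0:
            return 0
        return round(100 * shared / sqrt(len(self.dugg[user]) * len(self.dugg[other])))

    def remove_correlation(self, user, other):
        """Exclude `other` from `user`'s recommendation calculation."""
        self.removed[user].add(other)

    def correlated_users(self, user, limit=10):
        """Most compatible users first, skipping those the user removed."""
        pairs = [
            (other, self.compatibility_percent(user, other))
            for other in self.common.get(user, {})
            if other not in self.removed[user]
        ]
        pairs.sort(key=lambda pair: (-pair[1], pair[0]))
        return pairs[:limit]
```

For example, if ann “diggs” s1 and s2, bob “diggs” s1, s2 and s3, and cat “diggs” s1, then correlated_users("ann") returns [("bob", 82), ("cat", 71)]. After remove_correlation("ann", "bob") only cat remains. In this sketch each digg costs time proportional to the number of users who already “dugg” the story, so the associations stay current without a batch job.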
Figure 4.10: Digg – Recommendations
(1) Recommendations by the Recommender Engine
(2) Correlated User with Compatibility Percentage
(3) Highly Correlated Users with Compatibility Percentage

Figure 4.11: Digg – Correlated User
(1) Remove User from the Recommender Engine
(2) Shared “Dugg” Stories
5 Conclusion

Recommender systems are a powerful technology for personalization. Used in the right way, they benefit both consumers and businesses: consumers discover interesting new products, and businesses can increase their sales. As e-commerce continues to grow, recommender engines must deal with ever larger amounts of data. The systems must therefore be developed further to meet this challenge in terms of recommendation accuracy, scalability, and performance.

Item-based collaborative filtering has been shown to be the best recommendation technique in terms of recommendation quality, scalability, performance, and learning capability [7]. Combined with content-based techniques in a hybrid system to overcome the cold-start problem, it represents the state of the art of the recommender systems in use today.

Recommender engines have many fields of application, many of them with their own requirements that are met by different techniques. Which recommendation technique works best therefore always depends on the concrete use case.
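The hybrid combination mentioned above can be illustrated with a switching hybrid in the sense of Burke [1]. Here the collaborative similar-items table is used wherever it has data, and a content-based measure takes over for items too new to have any co-ratings. The Python sketch is illustrative only. The tag-based Jaccard content similarity, the min_neighbours threshold, and the function names are assumptions, not a description of any system discussed in this paper.

```python
from collections import defaultdict


def content_similarity(tags_a, tags_b):
    """Content-based similarity: Jaccard overlap of two items' descriptive tags."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def hybrid_scores(liked, similar_items, item_tags, min_neighbours=1):
    """Score candidate items for a user who liked the items in `liked`.

    similar_items: item -> [(similar_item, similarity)] from collaborative filtering.
    item_tags:     item -> set of descriptive tags (genre, category, ...).
    """
    scores = defaultdict(float)
    for item in liked:
        neighbours = similar_items.get(item, [])
        if len(neighbours) >= min_neighbours:
            # Enough collaborative data: use item-based CF.
            for other, similarity in neighbours:
                scores[other] += similarity
        else:
            # Cold start: switch to content-based similarity.
            own_tags = item_tags.get(item, set())
            for other, tags in item_tags.items():
                if other != item:
                    similarity = content_similarity(own_tags, tags)
                    if similarity > 0:
                        scores[other] += similarity
    for item in liked:
        scores.pop(item, None)  # never recommend what the user already liked
    return dict(scores)
```

For a customer who liked a brand-new item without collaborative neighbours, hybrid_scores falls back to tag overlap and still returns candidates, whereas a purely collaborative recommender would return nothing for that item.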
Bibliography

[1] Burke, R. (2002): Hybrid Recommender Systems: Survey and Experiments. In: User Modeling and User-Adapted Interaction, Volume 12, Issue 4 (November 2002), Kluwer Academic Publishers, pp. 331–370
[2] Leavitt, N. (2006): Recommendation Technology: Will It Boost E-Commerce? In: Computer Journal, Volume 39, Issue 5 (May 2006), IEEE Computer Society Press, pp. 13–16
[3] Thompson, C. (2008): If You Liked This, You’re Sure to Love That. In: The New York Times Magazine (November 21, 2008), http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html
[4] Schafer, J. B. et al. (2001): E-Commerce Recommendation Applications. In: Data Mining and Knowledge Discovery, Volume 5, Issue 1–2 (January–April 2001), pp. 115–153
[5] Kim, J. (2006): What is a recommender system? In: Proceedings of Recommenders06.com (2006), pp. 1–21
[6] McCrae, J. et al. (2004): Collaborative Filtering. http://www.imperialviolet.org/suprema.pdf
[7] Candillier, L. et al. (2009): State-of-the-Art Recommender Systems. In: Collaborative and Social Information Retrieval and Access (2009), Idea Group Inc, pp. 1–22
[8] Adomavicius, G.; Tuzhilin, A. (2004): Recommendation Technologies: Survey of Current Methods and Possible Extensions. Working paper, Stern School of Business, New York University
[9] Burke, R. (2007): Hybrid Web Recommender Systems. In: Lecture Notes in Computer Science (2007), Springer Berlin/Heidelberg, pp. 377–408
[10] ChoiceStream, Inc.: Personalization Technology Brief. http://www.choicestream.com/resources/
[11] Linden, G. et al. (2003): Amazon.com Recommendations: Item-to-Item Collaborative Filtering. In: IEEE Internet Computing, Volume 7, Issue 1 (January/February 2003), pp. 76–80
[12] Rose, K. (2008): Recommendation Engine Announcement. http://blog.digg.com/?p=127
[13] Kast, A. (2008): Digg Recommendation Engine White Paper. http://digg.com/whitepapers/recommendationengine