Your SlideShare is downloading. ×
Analysis of recommendation algorithms for e-commerce
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Saving this for later?

Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime - even offline.

Text the download link to your phone

Standard text messaging rates apply

Analysis of recommendation algorithms for e-commerce

370
views

Published on

Published in: Technology, Education

3 Comments
1 Like
Statistics
Notes
No Downloads
Views
Total Views
370
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
19
Comments
3
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Analysis of Recommendation Algorithms for E-Commerce Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl {sarwar, karypis, konstan, riedlg}@cs.umn.edu GroupLens Research Group / Army HPC Research Center Department of Computer Science and Engineering University of Minnesota Minneapolis, MN 55455 ESOE R99525045 郭羿呈 1
  • 2. Outline • Why this paper? • Abstract • Introduction – Problem Statement – Contributions • Recommender systems – Traditional Data Mining: Association Rules – Recommender Systems Based on Collaborative Filtering • Experimental evaluation – Data sets – Evaluation Metrics – Experimental Results • Conclusion 2
  • 3. Outline • Why this paper? • Abstract • Introduction – Problem Statement – Contributions • Recommender systems – Traditional Data Mining: Association Rules – Recommender Systems Based on Collaborative Filtering • Experimental evaluation – Data sets – Evaluation Metrics – Experimental Results • Conclusion 3
  • 4. 評價推薦演算法是十分困難的 • 不同演算法在不同資料集上的表現不同 • 評價的目的也不盡相同 • 是否需要線上使用者的測試 • 選擇哪些指標進行綜合評價 劉建國,周濤,郭強,汪秉宏. 個性化推薦系統評價方法綜述. 複雜系統與複雜性科學, 2009 4
  • 5. 推薦系統的評價指標 • 準確度指標 – 預測準確度 – 分類準確度 – 排序準確度 – 預測打分關聯 • 準確度以外的評價指標 – 推薦列表的流行性與多樣性 – 覆蓋率 – 新鮮性和意外姓 – 用戶的滿意度 劉建國,周濤,郭強,汪秉宏. 個性化推薦系統評價方法綜述. 複雜系統與複雜性科學, 2009 5
  • 6. 分類準確度評價方法 (源於此論文) 劉建國,周濤,郭強,汪秉宏. 個性化推薦系統評價方法綜述. 複雜系統與複雜性科學, 2009 6
  • 7. Outline • Why this paper? • Abstract • Introduction – Problem Statement – Contributions • Recommender systems – Traditional Data Mining: Association Rules – Recommender Systems Based on Collaborative Filtering • Experimental evaluation – Data sets – Evaluation Metrics – Experimental Results • Conclusion 7
  • 8. Abstract • In this paper, Author investigate several techniques for the purpose of producing useful recommendations to customer. • Author apply a collection of algorithms – Data mining – Nearest-neighbor collaborative filtering – Dimensionality reduction • Applied to two different data sets – The web-purchasing transaction of a large E- commerce – MovieLens movie recommendation site. 8
  • 9. Outline • Why this paper? • Abstract • Introduction – Problem Statement – Contributions • Recommender systems – Traditional Data Mining: Association Rules – Recommender Systems Based on Collaborative Filtering • Experimental evaluation – Data sets – Evaluation Metrics – Experimental Results • Conclusion 9
  • 10. Introduction • There remain important research questions in overcoming two fundamental challenges for collaborative filtering – To improve the scalability – To improve the quality of the recommendations • In some ways these two challenges are in conflict. 10
  • 11. Problem Statement • Author research these two challenges together. • There has been little work on experimental validation of recommender systems against a set of real-world dataset. More experimental validation is needed against real-world datasets and it is important. • Author seek to investigate how the quality of their recommendations compares to other algorithms under different practical circumstances. 11
  • 12. The focus of this paper is two-fold • Author provide a systematic experimental evaluation of different techniques. • Author present new algorithms that are particularly suited for sparse data sets. 12
  • 13. Contributions • An analysis of the effectiveness of recommender systems on actual customer data from an e-commerce site. • A comparison of the performance of several different recommender algorithms, including original collaborative filtering algorithms, algorithms based on dimensionality reduction, and classical data mining algorithms. • A new approach to forming recommendations that has online efficiency advantage versus, and that also has quality advantages in the presence of very sparse dataset. 13
  • 14. Outline • Why this paper? • Abstract • Introduction – Problem Statement – Contributions • Recommender systems – Traditional Data Mining: Association Rules – Recommender Systems Based on Collaborative Filtering • Experimental evaluation – Data sets – Evaluation Metrics – Experimental Results • Conclusion 14
  • 15. Recommender systems • They apply data analysis techniques to the problem of helping customers find which products they would like to purchase at E- Commerce sites by producing a list of top recommended products for a given customer. • In this section we discuss some traditional data mining techniques, particularly, we discuss the association rule technique and how this technique can be effectively utilized to produce top-n recommendation generation. 15
  • 16. Traditional Data Mining: Association Rules • One of the most commonly used data mining techniques for E-commerce is finding association rules between a set of co- purchased products. • More formally, 16
  • 17. The quality of association rules is commonly evaluated by looking at their support and confidence. • Support • Confidence 17
  • 18. Association rules can be used to develop top-N recommender systems in the following way. 1. We then use an association rule discovery algorithm to find all the rules that satisfy given minimum support and minimum confidence constraints. 2. We find all the rules that are supported by the customer. Let P be the set of unique products that are being predicted by all these rules and have not yet been purchased by customer u. 3. We sort these product them, so that products predicted by rules that have a higher confidence are ranked first. 4. Finally, we select the first N highest ranked products as the recommended set. 18
  • 19. Recommender Systems Based on Collaborative Filtering • Collaborative filtering (CF) is the most successful recommender system technology to date, and is used in many of the most successful recommender systems on the Web. • CF systems recommend products to a target customer based on the opinions of other customers. 19
  • 20. Author divide the entire process of CF- based recommendation generation 20
  • 21. Representation • In a typical CF-based recommender system, the input data is a collection of historical purchasing transactions of n customers on m products. We term this m x n representation of the input data set as original representation. 21
  • 22. Alternate methods for representing the input data • A natural way of representing sparse data sets is to compute a lower dimensional representation using Latent semantic indexing (LSI). • Essentially, this approach takes the n x m customer-product matrix and uses a truncated singular value decomposition to obtain a rank-k approximation of the original matrix. • Author will refer to this as the reduced dimensional representation. This representation has a number of advantages. 23
  • 23. Alternate representation has a number of advantages. 1. It alleviates the sparsity problem as all the entries in the n x k matrix are nonzero, which means that all n customers now have their opinions on the k meta-products. 2. The scalability problem also gets better as k << n, the processing time and storage requirement both improve dramatically. 3. This reduced representation captures latent association between customers and products in the reduced feature space and thus can potentially remove the synonymy problem. 24
  • 24. Author divide the entire process of CF- based recommendation generation 25
  • 25. Neighborhood Formation 26
  • 26. Proximity Measure • Correlation • Cosine 27
  • 27. Cosine Similarity (補充說明) http://www.douban.com/note/208193209/ 28
  • 28. Two main schemes for neighborhood formation 29
  • 29. Author divide the entire process of CF- based recommendation generation 30
  • 30. Generation of Recommendation • Most-frequent Item Recommendation – looks into the neighborhood N and for each neighbor scans through his/her purchase data and performs a frequency count of the products. • Association Rule-based Recommendation – is based on the association rule-based top-N recommendation. However, instead of using the entire population of customers to generate rules, this technique only considers the l neighbors while generating the rules. 31
  • 31. Outline • Why this paper? • Abstract • Introduction – Problem Statement – Contributions • Recommender systems – Traditional Data Mining: Association Rules – Recommender Systems Based on Collaborative Filtering • Experimental evaluation – Data sets – Evaluation Metrics – Experimental Results • Conclusion 32
  • 32. Sparsity level of data sets • For author’s experiments, author also take another factor into consideration, sparsity level of data sets. • For the data matrix R This is defined as 33
  • 33. The first data set • Movie data (ML) – Author used data from our MovieLens recommender system, MovieLens is a web-based research recommender system that debuted in Fall 1997. – The site now has over 35000 users who have expressed opinions on 3000+ different movies. – We randomly selected enough user to obtain 100,000 ratings from the database (we only considered users that had rated 20 or more movies). – We divided the database into 80% training set and 20% test set. – The data set was converted into a binary user-movie matrix R that had 943 rows (i.e., 943 users) and 1682 columns (i.e., 1682 movies that were rated by at least one of the users). – We compute the sparsity level for this data set and found it to be 0.9369. 34
  • 34. The second data set • E-Commerce data (EC) – Author use historical e-commerce purchase data from Fingerhut Inc., a large e-commerce company. – This data set contains purchase information of 6,502 customers on 23,554 catalog products. In total, this data set contains 97,045 purchase records. – As before, we divided the data set into a train set and a test set by using the same 80%/20% train/test ratio. – We compute the sparsity level for this data set and found it to be 0.9994. 35
  • 35. Two metrics widely used in the IR community 36
  • 36. Author use the standard F1 metric • These two measures are, however, often conflicting in nature. For instance, increasing the number N tends to increase recall but decrease precision. • The fact that both are critical for the quality judgment leads us to use a combination of the two. • In particular, we use the standard F1 metric that gives equal weight to them both and is computed as 37
  • 37. Experimental Results • Our main goal is to explore the possibilities of combining different subtasks to formulate an efficient recommendation algorithm. • In all the CF-based experiments the proximity between customers was measured by using cosine metric. The cosine metric was selected because it is applicable both in original and lower dimensional representations. • Finally, in all author’s experiments author fixed the number of recommendations at 10 (i.e., top- 10). 38
  • 38. Experiments with neighborhood size 39
  • 39. Experiments with number of dimension P.S. There is no direct analytical method to determine the value of the optimal number of dimensions so the optimal value has to be experimentally evaluated. For the rest of the experiments we fixed the number of dimensions to 20 for the ML and 300 for the EC data set. 40
  • 40. Summarizes in both high and low dimensional settings 41
  • 41. Density Sensitivity Analysis 42
  • 42. Outline • Why this paper? • Abstract • Introduction – Problem Statement – Contributions • Recommender systems – Traditional Data Mining: Association Rules – Recommender Systems Based on Collaborative Filtering • Experimental evaluation – Data sets – Evaluation Metrics – Experimental Results • Conclusion 43
  • 43. Conclusion • In this paper we presented and experimentally evaluate various algorithmic choices for CF-based recommender systems. • Author’s results show that dimensionality reduction techniques hold the promise of allowing CF-based algorithms to scale to large data sets and at the same time produce high-quality recommendations. • Future work is required to understand exactly why low dimensional representation works well for some recommender applications, and less well for others. 44