Analysis of recommendation algorithms for e-commerce

  1. Analysis of Recommendation Algorithms for E-Commerce. Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, {sarwar, karypis, konstan, riedl}@cs.umn.edu. GroupLens Research Group / Army HPC Research Center, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455. ESOE R99525045 郭羿呈
  2. Outline • Why this paper? • Abstract • Introduction – Problem Statement – Contributions • Recommender systems – Traditional Data Mining: Association Rules – Recommender Systems Based on Collaborative Filtering • Experimental evaluation – Data sets – Evaluation Metrics – Experimental Results • Conclusion
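  4. Evaluating recommendation algorithms is very difficult • Different algorithms perform differently on different data sets • The goals of evaluation also differ • Whether testing with online users is needed • Which metrics to combine for an overall evaluation. Source: Liu Jianguo, Zhou Tao, Guo Qiang, Wang Binghong. A survey of evaluation methods for personalized recommender systems (個性化推薦系統評價方法綜述). Complex Systems and Complexity Science (複雜系統與複雜性科學), 2009
  5. Evaluation metrics for recommender systems • Accuracy metrics – Prediction accuracy – Classification accuracy – Ranking accuracy – Prediction-rating correlation • Metrics beyond accuracy – Popularity and diversity of the recommendation list – Coverage – Novelty and serendipity – User satisfaction. Source: Liu et al., 2009
  6. Classification accuracy evaluation methods (originating from this paper). Source: Liu et al., 2009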
  8. Abstract • The authors investigate several techniques for producing useful recommendations to customers. • They apply a collection of algorithms – Traditional data mining – Nearest-neighbor collaborative filtering – Dimensionality reduction • The algorithms are applied to two different data sets – The web-purchasing transactions of a large e-commerce company – Data from the MovieLens movie recommendation site
  10. Introduction • Important research questions remain in overcoming two fundamental challenges for collaborative filtering – Improving scalability – Improving the quality of the recommendations • In some ways these two challenges are in conflict.
  11. Problem Statement • The authors study these two challenges together. • There has been little experimental validation of recommender systems against real-world data sets, and more of it is needed. • The authors investigate how the quality of their recommendations compares to that of other algorithms under different practical circumstances.
  12. The focus of this paper is two-fold • The authors provide a systematic experimental evaluation of different techniques. • The authors present new algorithms that are particularly suited for sparse data sets.
  13. Contributions • An analysis of the effectiveness of recommender systems on actual customer data from an e-commerce site. • A comparison of the performance of several different recommender algorithms, including original collaborative filtering algorithms, algorithms based on dimensionality reduction, and classical data mining algorithms. • A new approach to forming recommendations that has online efficiency advantages over previously studied algorithms, and that also has quality advantages in the presence of very sparse data sets.
  15. Recommender systems • They apply data analysis techniques to the problem of helping customers find the products they would like to purchase at e-commerce sites, by producing a list of top recommended products for a given customer. • This section discusses traditional data mining techniques, in particular the association rule technique and how it can be used to generate top-N recommendations.
  16. Traditional Data Mining: Association Rules • One of the most commonly used data mining techniques for e-commerce is finding association rules between sets of co-purchased products. • More formally, an association rule X ⇒ Y between two disjoint sets of products X and Y states that the presence of the products in X in a transaction indicates a strong likelihood that the products in Y are present in the same transaction.
  17. The quality of association rules is commonly evaluated by looking at their support and confidence. • Support: the fraction of all transactions that contain both X and Y • Confidence: the fraction of the transactions containing X that also contain Y
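Support and confidence can be illustrated with a short sketch. It uses the usual definitions (support is the fraction of all transactions that contain both X and Y, confidence is the fraction of transactions containing X that also contain Y); the function names and the toy transactions are assumptions made for illustration and do not come from the slides.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Fraction of all transactions that contain every product in X and in Y."""
    both = x | y
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions: List[FrozenSet[str]], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    containing_x = [t for t in transactions if x <= t]
    if not containing_x:
        return 0.0  # the rule never fires, so report zero confidence
    return sum(1 for t in containing_x if y <= t) / len(containing_x)


# Hypothetical toy data: one transaction per customer.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule {bread} => {milk}
rule_support = support(transactions, frozenset({"bread"}), frozenset({"milk"}))
rule_confidence = confidence(transactions, frozenset({"bread"}), frozenset({"milk"}))
```

In this toy data two of the four transactions contain both bread and milk, and two of the three bread transactions also contain milk.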
  18. Association rules can be used to develop top-N recommender systems in the following way. 1. We use an association rule discovery algorithm to find all the rules that satisfy given minimum support and minimum confidence constraints. 2. For a customer u, we find all the rules that are supported by the customer, that is, rules whose left-hand side contains only products the customer has purchased. Let P be the set of unique products that are predicted by these rules and have not yet been purchased by customer u. 3. We sort these products by the confidence of the rules that predict them, so that products predicted by rules with a higher confidence are ranked first. 4. Finally, we select the N highest-ranked products as the recommended set.
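The ranking steps above can be sketched as follows, assuming the rules have already been mined and are given as (left-hand side, right-hand side, confidence) triples. The `Rule` type, the function name, and the sample rules are hypothetical; when several rules predict the same product, this sketch keeps the highest confidence, which is an assumption the slide does not state.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# (left-hand side, right-hand side, confidence); an illustrative layout
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def recommend_from_rules(rules: List[Rule], purchased: Set[str], n: int) -> List[str]:
    """Rank not-yet-purchased products predicted by rules the customer supports."""
    best: Dict[str, float] = {}
    for lhs, rhs, conf in rules:
        if lhs <= purchased:  # the customer bought everything on the left-hand side
            for product in rhs - purchased:
                # Assumption: keep the highest confidence among competing rules.
                best[product] = max(conf, best.get(product, 0.0))
    # Highest confidence first; ties broken alphabetically for determinism.
    ranked = sorted(best, key=lambda p: (-best[p], p))
    return ranked[:n]


rules = [
    (frozenset({"tent"}), frozenset({"sleeping bag"}), 0.8),
    (frozenset({"tent", "stove"}), frozenset({"fuel"}), 0.9),
    (frozenset({"boots"}), frozenset({"socks"}), 0.7),
    (frozenset({"stove"}), frozenset({"fuel", "tent"}), 0.6),
]
top = recommend_from_rules(rules, {"tent", "stove"}, 2)
```

Here the rule about boots is ignored because the customer has not bought boots, and the tent is never recommended because it has already been purchased.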
  19. Recommender Systems Based on Collaborative Filtering • Collaborative filtering (CF) is the most successful recommender system technology to date, and is used in many of the most successful recommender systems on the Web. • CF systems recommend products to a target customer based on the opinions of other customers.
  20. The authors divide the entire process of CF-based recommendation generation into three sub-tasks: representation, neighborhood formation, and recommendation generation.
  21. Representation • In a typical CF-based recommender system, the input data is a collection of historical purchasing transactions of n customers on m products, stored as an n x m customer-product matrix whose entry is 1 if the customer purchased the product and 0 otherwise. The authors call this n x m representation of the input data set the original representation.
  22. Alternate methods for representing the input data • A natural way of representing sparse data sets is to compute a lower dimensional representation using latent semantic indexing (LSI). • Essentially, this approach takes the n x m customer-product matrix and uses a truncated singular value decomposition to obtain a rank-k approximation of the original matrix. • The authors refer to this as the reduced dimensional representation.
  23. The reduced dimensional representation has a number of advantages. 1. It alleviates the sparsity problem, because all the entries in the n x k matrix are nonzero, which means that all n customers now have opinions on the k meta-products. 2. It eases the scalability problem: since k << n, both processing time and storage requirements improve dramatically. 3. It captures latent associations between customers and products in the reduced feature space and can thus potentially remove the synonymy problem.
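A minimal sketch of the reduced dimensional representation, using NumPy's `numpy.linalg.svd`. The slides do not say how the n x k customer matrix is scaled, so multiplying the first k left singular vectors by their singular values is an assumption here; the function names and the toy matrix are also illustrative.

```python
import numpy as np


def reduce_dimensions(R: np.ndarray, k: int) -> np.ndarray:
    """Return an n x k representation of the n customers (assumed scaling: U_k * s_k)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] * s[:k]  # scale each of the k columns by its singular value


def rank_k_approximation(R: np.ndarray, k: int) -> np.ndarray:
    """Return the rank-k approximation of R from the truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Hypothetical binary customer-product matrix: 4 customers, 5 products.
R = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

reduced = reduce_dimensions(R, 2)     # 4 x 2: each customer described by 2 meta-products
approx = rank_k_approximation(R, 2)   # 4 x 5: dense approximation of R
```

Neighborhoods can then be formed on the rows of `reduced` instead of the rows of `R`, which is where the storage and processing savings come from when k is much smaller than the number of products.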
  25. Neighborhood Formation
  26. Proximity Measure • Correlation • Cosine
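Both proximity measures can be sketched for two customer vectors of equal length. The correlation shown is the Pearson correlation, computed here over all products (an assumption; some systems restrict it to co-rated items). The function names and the two sample vectors are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two customer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a customer with no purchases is similar to no one
    return dot / (norm_a * norm_b)


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation, i.e. the cosine of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical binary purchase vectors over five products.
alice = [1, 1, 0, 1, 0]
bob = [1, 0, 0, 1, 1]

cos_ab = cosine_similarity(alice, bob)
corr_ab = pearson_correlation(alice, bob)
```

The two customers share two of their three purchases, so the cosine is fairly high, while the correlation is lower because it also accounts for each customer's overall purchase rate.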
  27. Cosine Similarity (supplementary note) http://www.douban.com/note/208193209/
  28. Two main schemes for neighborhood formation: center-based and aggregate neighborhood
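As a rough illustration of neighborhood formation, the sketch below implements the simpler of the two schemes, usually called center-based: it selects the l customers most similar to the target customer. Cosine is used as the proximity measure; the function names, the tie-breaking rule, and the sample data are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def center_based_neighborhood(
    vectors: Dict[str, Sequence[float]], target: str, l: int
) -> List[str]:
    """Return the l customers closest to the target customer."""
    scored = [
        (cosine(vectors[target], vector), name)
        for name, vector in vectors.items()
        if name != target
    ]
    # Most similar first; ties broken by customer name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:l]]


# Hypothetical customers described by binary purchase vectors.
customers = {
    "u1": [1, 1, 0, 0],
    "u2": [1, 1, 1, 0],
    "u3": [0, 0, 1, 1],
    "u4": [1, 0, 0, 0],
}
neighbors = center_based_neighborhood(customers, "u1", 2)
```

The same function works unchanged on reduced dimensional vectors, since cosine is defined in both representations.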
  30. Generation of Recommendation • Most-frequent Item Recommendation – looks into the neighborhood N and, for each neighbor, scans through his/her purchase data and performs a frequency count of the products. • Association Rule-based Recommendation – is based on association rule-based top-N recommendation. However, instead of using the entire population of customers to generate rules, this technique considers only the l neighbors while generating the rules.
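Most-frequent item recommendation can be sketched as a frequency count over the neighbors' purchases. The slide stops at the counting step; returning the N most frequent products and skipping those the target customer already owns are assumptions here, as are the function name and the sample data.

```python
from collections import Counter
from typing import Dict, List, Set


def most_frequent_items(
    purchases: Dict[str, Set[str]], neighbors: List[str], target: str, n: int
) -> List[str]:
    """Count how often each product appears among the neighbors and return the top n."""
    counts: Counter = Counter()
    for neighbor in neighbors:
        # Assumption: products the target already bought are not recommended.
        counts.update(purchases[neighbor] - purchases[target])
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda product: (-counts[product], product))
    return ranked[:n]


# Hypothetical purchase histories.
purchases = {
    "u1": {"tent"},
    "u2": {"tent", "stove", "lamp"},
    "u3": {"stove", "boots"},
    "u4": {"stove", "lamp"},
}
recommended = most_frequent_items(purchases, ["u2", "u3", "u4"], "u1", 2)
```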
  32. Sparsity level of data sets • In their experiments, the authors also take another factor into consideration: the sparsity level of the data sets. • For the data matrix R, this is defined as 1 − (number of nonzero entries / total number of entries).
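The sparsity level, 1 − (nonzero entries / total entries), can be checked against the figures given for the two data sets later in the transcript. The sketch assumes that every rating or purchase record corresponds to one distinct nonzero entry of the matrix; the function name is illustrative.

```python
def sparsity_level(nonzero_entries: int, rows: int, columns: int) -> float:
    """Return 1 - (nonzero entries / total entries) for a rows x columns matrix."""
    return 1.0 - nonzero_entries / (rows * columns)


# MovieLens (ML): 100,000 ratings, 943 users, 1682 movies.
ml_sparsity = sparsity_level(100_000, 943, 1682)

# E-commerce (EC): 97,045 purchase records, 6,502 customers, 23,554 products.
ec_sparsity = sparsity_level(97_045, 6_502, 23_554)
```

Under that assumption both values land within 0.0001 of the reported sparsity levels (0.9369 for ML and 0.9994 for EC).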
  33. The first data set • Movie data (ML) – The authors used data from their MovieLens recommender system, a web-based research recommender system that debuted in Fall 1997. – At the time of the paper, the site had over 35,000 users who had expressed opinions on 3,000+ different movies. – The authors randomly selected enough users to obtain 100,000 ratings from the database (only users who had rated 20 or more movies were considered). – The database was divided into an 80% training set and a 20% test set. – The data set was converted into a binary user-movie matrix R that had 943 rows (i.e., 943 users) and 1682 columns (i.e., 1682 movies that were rated by at least one of the users). – The sparsity level computed for this data set was 0.9369.
  34. The second data set • E-Commerce data (EC) – The authors use historical e-commerce purchase data from Fingerhut Inc., a large e-commerce company. – This data set contains purchase information of 6,502 customers on 23,554 catalog products. In total, it contains 97,045 purchase records. – As before, the data set was divided into a training set and a test set using the same 80%/20% ratio. – The sparsity level computed for this data set was 0.9994.
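  35. Two metrics widely used in the IR community: recall and precision • Recall is the fraction of the products in the test set that appear in the top-N list • Precision is the fraction of the products in the top-N list that appear in the test set
  36. The authors use the standard F1 metric • Recall and precision are, however, often conflicting in nature. For instance, increasing the number N tends to increase recall but decrease precision. • Because both are critical for judging quality, the authors use a combination of the two. • In particular, they use the standard F1 metric, which gives equal weight to both and is computed as F1 = 2 × recall × precision / (recall + precision).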
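The three metrics can be sketched for a single customer as follows. A hit is a recommended product that also appears in the customer's test set; the function name and the sample lists are assumptions for illustration.

```python
from typing import List, Set, Tuple


def precision_recall_f1(recommended: List[str], test_items: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of a top-N list against a customer's test set."""
    hits = len(set(recommended) & test_items)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(test_items) if test_items else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0  # avoid dividing by zero when there are no hits
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical top-4 list and test set: two of the four recommendations are hits.
precision, recall, f1 = precision_recall_f1(["a", "b", "c", "d"], {"b", "d", "e"})
```

Lengthening the list to cover "e" as well would raise recall, but it would lower precision unless the added items were all hits, which is the tension F1 is meant to balance.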
  37. Experimental Results • The main goal is to explore the possibilities of combining different subtasks to formulate an efficient recommendation algorithm. • In all the CF-based experiments, the proximity between customers was measured using the cosine metric, which was selected because it is applicable in both the original and the lower dimensional representations. • Finally, in all experiments the authors fixed the number of recommendations at 10 (i.e., top-10).
  38. Experiments with neighborhood size
  39. Experiments with the number of dimensions. Note: there is no direct analytical method to determine the optimal number of dimensions, so the optimal value has to be evaluated experimentally. For the rest of the experiments, the authors fixed the number of dimensions at 20 for the ML data set and 300 for the EC data set.
  40. Summary of results in both high and low dimensional settings
  41. Density Sensitivity Analysis
  43. Conclusion • The paper presents and experimentally evaluates various algorithmic choices for CF-based recommender systems. • The authors' results show that dimensionality reduction techniques hold the promise of allowing CF-based algorithms to scale to large data sets while still producing high-quality recommendations. • Future work is required to understand exactly why the low dimensional representation works well for some recommender applications and less well for others.