Summer Seminar Presentation 4xp
Transcript

  • 1. Badrul Sarwar et al., "Item-Based Collaborative Filtering Recommendation Algorithms", WWW 2001
       Deguchi Lab. Takashi UMEDA
       Mail: umeda07[at]cs.dis.titech.ac.jp
       Web: http://umekoumeda.net/
       Summer Seminar 2008 @Susukakedai
  • 2. Outline
       • Introduction
       • Item-Based CF
       • Experimental Procedure
       • Experimental Results
       • Conclusions
  • 3. Chap.1 INTRODUCTION
  • 4. 1-1. My Research Domain
       • Evaluating recommendation algorithms by agent-based modeling (ABM)
         – Recommendation approaches:
           • rule-based approach
           • content-based approach
           • Collaborative Filtering (CF)
           • Bayesian networks
         – Why CF? It is the approach most widely used on real websites.
         – Why ABM? With ABM, algorithms can be optimized according to the market environment.
  • 5. 1-2. What's CF? (1/3)
       • Have you used Amazon.com?
  • 6. 1-3. What's CF? (2/3)
       • Collaborative Filtering (CF) algorithms are commonly used on e-commerce websites to generate recommendations.
  • 7. 1-4. What's CF? (3/3)
       • CF will recommend the following book to Prof. Deguchi based on people who are similar to him (e.g. Prof. Kizima).
       • They own the same books, so they are assumed to have similar preferences.
  • 8. 1-5. Contribution of this paper
       • Problems of the basic CF algorithm (nearest neighbors)
         – Scalability (performance): with many users, a high-scalability system can still produce recommendations quickly
         – Accuracy (quality): even if the data are sparse, a high-accuracy system can still recommend items a user may like
       • In this paper, the authors propose a new algorithm
         – Item-Based CF
         – Both performance and quality can be improved
  • 9. 1-6. Collaborative Filtering Process
       Input data: a user–item matrix
         • U = {u1, u2, ..., um}: the set of users
         • I = {i1, i2, ..., in}: the set of items
         • I_ui ⊆ I: the items that user ui has rated
         • a_{i,j}: the rating of item ij by user ui
       CF-algorithm outputs:
         • Prediction P_{a,j}: the predicted degree of likeness of item ij by the active user ua
         • Top-N recommendation: a list of the N items Ir ⊂ I that the user will like the most, with Ir ∩ I_ua = ∅
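A minimal sketch of the CF input/output just described. The matrix, user names, and ratings below are invented for illustration; `None` marks an unrated entry.

```python
# User-item matrix: rows are users, columns are items;
# None marks an unrated item (the matrix is typically sparse).
matrix = {
    "u1": {"i1": 5, "i2": None, "i3": 3},
    "u2": {"i1": None, "i2": 4, "i3": None},
}

def rated_items(user):
    """I_ui: the set of items the user has rated."""
    return {i for i, r in matrix[user].items() if r is not None}

def top_n_candidates(user):
    """Candidate items for top-N recommendation: Ir must not
    intersect I_ua (never recommend what the user already rated)."""
    all_items = {"i1", "i2", "i3"}
    return all_items - rated_items(user)

print(sorted(top_n_candidates("u1")))  # ['i2']
```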
  • 10. 1-7. Variations of the CF-Algorithm
       • Memory-based approach (e.g. nearest neighbor)
         Procedure (online):
           1. The system defines a set of users known as neighbors
           2. The system produces a prediction or a top-N recommendation
       • Model-based approach
         Procedure:
           1. The system builds a model of user ratings offline
           2. Using the model, the system produces a prediction or a top-N recommendation
         How is the model built? E.g. Bayesian networks, clustering.
  • 11. 1-8. What's online and offline?
       • Offline computation: performed automatically at a suitable interval
       • Online computation: performed quickly when a user uses the system
       Example (Google): crawling, indexing, and ranking are done offline; when you input a query, the search engine outputs the result online.
  • 12. 1-9. The problems of the basic CF
       Weaknesses of nearest neighbor:
       • Accuracy: the user–item matrix is sparse (many users may have purchased well under 1% of all items), so the accuracy of the nearest-neighbor algorithm may be poor
       • Scalability: with millions of users and items, the nearest-neighbor algorithm may suffer serious scalability problems
       We need new CF algorithms.
  • 13. Chap.2 ITEM-BASED CF
  • 14. 2-1. Overview of Item-Based CF
       • Offline computation: item similarity computation
         – S_{i,j}: similarity between items ii and ij
       • Online computation: prediction computation
         – P_{u,i}: the degree of likeness of item i by user u, based on the item similarities S
  • 15. 2-2. Item Similarity Computation
       • Cosine-based similarity
       • Correlation-based similarity
       • Adjusted cosine similarity
         – corrects for the difference in rating scale between different users by subtracting each user's average rating
  • 16. 2-3. Prediction Computation
       • Weighted sum
         – N: the set of items most similar to item i (|N| is the neighborhood size)
         – the similarity-weighted sum of ratings is divided by the sum of similarities (a normalization coefficient)
       • Regression
         – R_{u,n} is estimated by a regression model
         – R_i: the target item's ratings (explanatory variable)
         – R_n: the similar item's ratings (response variable)
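The weighted-sum prediction can be sketched as below. Function names and data are illustrative assumptions; the neighborhood N is the k most similar items the user has already rated, and the denominator normalizes by the summed similarity magnitudes.

```python
def predict(user_ratings, similarities, k=2):
    """Weighted-sum prediction for one target item.

    user_ratings: {item: rating} for items the user has rated
    similarities: {item: similarity to the target item}
    k: neighborhood size |N| (use the k most similar rated items)
    """
    # Keep only rated neighbors, take the k most similar.
    neighbors = sorted(
        (item for item in similarities if item in user_ratings),
        key=lambda item: similarities[item],
        reverse=True,
    )[:k]
    num = sum(similarities[n] * user_ratings[n] for n in neighbors)
    den = sum(abs(similarities[n]) for n in neighbors)  # normalization
    return num / den if den else 0.0

# Invented example: predict the target item's rating for one user.
user = {"i1": 4, "i2": 2, "i3": 5}
sims_to_target = {"i1": 0.9, "i2": 0.1, "i3": 0.6}
print(round(predict(user, sims_to_target), 3))
```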
  • 17. 2-4. Time Complexity (1/2)
       Time complexity of nearest neighbor (all online):
       • User similarity computation: computing one user–user similarity scans n ratings → O(n); the system must compute m × m user–user similarities → O(m²n)
       • Prediction computation: computing one P_{i,j} value scans m user–user similarities → O(m)
       Total: O(m²n) + O(m), all performed online.
  • 18. 2-4. Time Complexity (2/2)
       The time complexity of Item-Based CF is better than nearest neighbor:
       • Offline: item similarity computation. Item–item similarity is static, as opposed to user–user similarity, so it can be precomputed (this is the model).
       • Online: prediction computation. Computing one P_{i,j} value scans n item–item similarities → O(n).
  • 19. Chap.3 EXPERIMENTAL PROCEDURE
  • 20. 3-1. Experimental Procedure
       1. Data dividing: the data set is divided into a train portion (for parameter learning) and a test portion (for evaluation), e.g. rating triples (u1, i2, 3), (u2, i1, 2), (u6, i3, 3)
       2. Fixing the optimal parameter values: the following parameters are decided
          • similarity algorithm
          • train/test ratio x (sparsity level of the data)
          • neighborhood size
       3. Full experiment: to evaluate Item-Based CF, the following are measured
          • performance
          • quality
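Step 1 above (dividing the ratings into train and test portions at ratio x) can be sketched as follows; the rating triples and the fixed seed are illustrative assumptions.

```python
import random

# Invented rating triples (user, item, rating).
ratings = [("u1", "i2", 3), ("u2", "i1", 2), ("u6", "i3", 3),
           ("u3", "i1", 4), ("u4", "i2", 5)]

def split(data, x, seed=0):
    """Shuffle and split: a fraction x of the ratings goes to the
    train portion, the remainder to the test portion."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * x)
    return shuffled[:cut], shuffled[cut:]

train, test = split(ratings, x=0.8)
print(len(train), len(test))  # 4 1
```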
  • 21. 3-2. Data Sets
       • Data from the website "MovieLens"
         – MovieLens is a web-based recommender system
         – Hundreds of users visit MovieLens to rate and receive recommendations for movies
         – The data set was converted into a user–item matrix (943 users × 1682 items)
  • 22. 3-3. Evaluation Metrics
       • To evaluate the quality of a recommender system, we use MAE as the evaluation metric
       • MAE: Mean Absolute Error
         – p_i: predicted rating for item i (predicted from the train data)
         – q_i: true rating for item i (from the test data)
         – The lower the MAE, the more accurately the recommendation engine predicts user ratings
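The MAE defined above is simply the mean of |p_i − q_i| over the test ratings; the predicted and true values below are invented for illustration.

```python
def mae(predicted, actual):
    """Mean Absolute Error; lower is better."""
    assert len(predicted) == len(actual)
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)

p = [3.5, 4.0, 2.0]  # predictions made from the train portion
q = [4, 4, 3]        # true ratings from the test portion
print(mae(p, q))  # 0.5
```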
  • 23. Chap.4 EXPERIMENTAL RESULTS
  • 24. 4-1. Optimal Parameter Values (1/2)
       • Item-similarity algorithm: adjusted cosine gives the best quality
       • Train/test ratio: x = 0.8 is the optimum value
  • 25. 4-1. Optimal Parameter Values (2/2)
       • Considering both trends, the optimal choice of neighborhood size is 30
       • In the full experiment, the basic parameters are therefore:
         – similarity algorithm: adjusted cosine
         – train/test ratio: 0.8
         – neighborhood size: 30
  • 26. 4-2. Quality
       • Item-Based CF (weighted sum) outperforms nearest neighbor
       • Item-Based CF (regression) outperforms the other two cases at low values of x and at small neighborhood sizes
  • 27. 4-3. Performance (1/2)
       • Model size:
         – full model: at item similarity computation, all item–item similarities (1682 × 1682) are computed
         – model size = 200: only 200 × 200 item–item similarities are computed
       • If the model size is small, is good quality retained?
         – If so, online performance is higher than in the full-model case
       • Result: with a model size of 100 to 200, it is possible to obtain reasonably good prediction quality
       → Even without using all item–item similarities, prediction accuracy does not drop and performance improves.
  • 28. Chap.5 CONCLUSIONS
  • 29. 5. Conclusion
       • Quality
         – Item-Based CF provides better prediction quality than the nearest-neighbor algorithm, independent of neighborhood size and train/test ratio
         – The improvement in quality is not large
       • Performance
         – Item similarities are static, so the item similarity computation can be done offline, giving high online performance
         – It is possible to retain only a small subset of items and still obtain good prediction quality and high performance
  • 30. THANK YOU