Collaborative filtering20081111

Published in: Education, Technology
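  • Avoids the problems of relying on a centralized point; removes the need for expensive equipment; peers can communicate directly
  • Excludes noisy users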

    1. DCFLA: A Distributed Collaborative-Filtering Neighbor-Locating Algorithm
       Authors: Bo Xie, Peng Han, Fan Yang, Rui-Min Shen, Hua-Jun Zeng, and Zheng Chen
       Source: Information Sciences, vol. 177, no. 6, pp. 1349-1363, 2007
       Professor: Dr. Shu-Ching Wang (王淑卿)
       Speaker: Yu-Chien Chou (周裕健)
       Date: Nov. 11, 2008
    2. Outline
       • Introduction
       • Related Work
       • Novel Collaborative-Filtering
       • Recommendation Systems
       • Experiments
       • Conclusions
    3. Introduction (1/2)
       • Motivation
         • Information overload
         • Comparison of recommendation approaches
           • CB (content-based): information filtering
           • CF (collaborative filtering): memory-based
           • CF: model-based
         • Focus on the scalability issue of CF
           • Time and space complexity of the similarity calculation
         • Design and implement a decentralized CF
    4. Introduction (2/2)
       • Purpose
         • Reduce the time complexity
           • $O(M^2 N) \rightarrow O(N + M \log M)$
         • Increase the algorithm's scalability
         • Propose the concept of most same opinion (MSO)
         • Propose the average rating normalization (ARN) technique to improve MSO
         • Use a normalized rating instead of the raw rating
    5. Related work (1/5)
       • Proposing a recommendation system
         • Uses the memory-based CF algorithm
         • Built on a peer-to-peer (P2P) architecture
           • Distributed hash table (DHT) routing algorithms
       • Memory-based CF algorithm (a)
         • Average of the votes
       Symbols: $P_{a,j}$ is the predicted vote of active user $a$ on item $j$; $\bar{v}_a$ is the mean vote of user $a$; $N$ is the number of users in the user database; $v_{i,j}$ is the vote of user $i$ on item $j$; $\bar{v}_i$ is the mean vote of user $i$; $I_i$ is the set of items user $i$ has voted on.
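The formula images on this slide did not survive in the transcript. As a reading aid, the standard memory-based CF prediction that fits the symbols listed on the slide is sketched below, with $\bar{v}$ denoting a mean vote. The similarity weight $w(a,i)$ and the normalizing factor $\kappa$ are names assumed here for illustration; they do not appear in the slide text.

```latex
\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j},
\qquad
P_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{N} w(a,i)\,\bigl(v_{i,j} - \bar{v}_i\bigr)
```

Under this reading, $w(a,i)$ would be supplied by a similarity measure such as the Pearson correlation coefficient or vector similarity, and $\kappa$ is usually chosen so that the absolute values of the weights sum to one.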
    6. Related work (2/5)
       • Memory-based CF algorithm (b)
         • 2. Pearson correlation coefficient
         • 3. Vector similarity
         • Symbols (Pearson correlation): $j$ ranges over all items; $a$ and $i$ are users
         • Symbols (vector similarity): $j$ ranges over all items; $k$ is the target item; $I$ denotes all users; $a$ and $i$ are users
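The two similarity measures named on this slide can be sketched in a few lines. This is a minimal illustration, not the presenters' code: it assumes each user's votes are held in a dict mapping item ID to vote, that every user has at least one vote, and that mean votes are taken over all of a user's votes. The function names are hypothetical.

```python
import math


def mean_vote(votes):
    """Mean of a user's votes; `votes` maps item ID -> vote and is non-empty."""
    return sum(votes.values()) / len(votes)


def pearson(votes_a, votes_i):
    """Pearson correlation, summing over items both users have voted on."""
    common = set(votes_a) & set(votes_i)
    mean_a, mean_i = mean_vote(votes_a), mean_vote(votes_i)
    num = sum((votes_a[j] - mean_a) * (votes_i[j] - mean_i) for j in common)
    den = math.sqrt(
        sum((votes_a[j] - mean_a) ** 2 for j in common)
        * sum((votes_i[j] - mean_i) ** 2 for j in common)
    )
    return num / den if den else 0.0


def vector_similarity(votes_a, votes_i):
    """Cosine of the angle between the two vote vectors."""
    common = set(votes_a) & set(votes_i)
    num = sum(votes_a[j] * votes_i[j] for j in common)
    norm_a = math.sqrt(sum(v * v for v in votes_a.values()))
    norm_i = math.sqrt(sum(v * v for v in votes_i.values()))
    return num / (norm_a * norm_i) if norm_a and norm_i else 0.0


# Two users whose votes differ only by a constant offset.
a = {1: 4, 2: 4, 3: 5, 4: 5, 5: 6, 6: 6}
b = {1: 3, 2: 3, 3: 4, 4: 4, 5: 5, 6: 5}
print(pearson(a, b), vector_similarity(a, b))
```

Both measures need the two full vote vectors, which is why computing them for every pair of users becomes expensive as the user base grows.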
    7. Related work (3/5)
       • P2P system and DHT routing algorithm
         • DHT routing algorithm
           • Decentralization
           • Scalability
           • Fault tolerance
       • P2P benefits
         • Avoiding dependence on a centralized point
         • Allowing direct communication
         • Aggregating the resources of many peers
    8. Related work (4/5)
       • P2P applications
         • Parallelizable applications
           • Split a large computation-intensive task
         • Content and file management applications
           • Storing information
           • Retrieving information
         • Collaborative applications
           • Allow users to collaborate in real time
    9. Related work (5/5)
       • The primary goals of DHT
         • Provide an efficient, scalable, and robust routing algorithm
           • Reduce the number of P2P hops
         • Reduce the amount of routing state that must be kept at each peer
       • Distributed collaborative filtering (DCF) systems
         • Advantage in terms of scalability
    10. Novel collaborative-filtering (1/10)
       • Distributed collaborative-filtering neighbor-locating algorithm (DCFLA)
         • Applies to DCF recommendation systems
         • Most same opinion (MSO)
         • Average rating normalization (ARN)
       • The main goal
         • Reduce the network traffic and time costs
    11. Novel collaborative-filtering (2/10)
       • Basic DHT-based CF algorithm
         • Divide the original centralized user database into fractions (buckets)
         • Gets rid of some noisy users
    12. Novel collaborative-filtering (3/10)
       • Locating neighbors in DHT-based CF
         • Most same opinion (MSO) (a)
           • Inverse preference frequency (IPF)
       Symbols: $M$ is the total number of users in the system; $n_{i,v}$ is the number of users who voted for item $i$ with rating $v$.
    13. Novel collaborative-filtering (4/10)
       • Locating neighbors in DHT-based CF
         • Most same opinion (MSO) (b)
           • The consistency of users $i$ and $j$
       Symbols: $C_{i,j}$ is the consistency of users $i$ and $j$; $v_{i,k}$ is the vote of user $i$ for item $k$; $v_{j,k}$ is the vote of user $j$ for item $k$; $N$ is the total number of items in the system.
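The IPF and consistency formulas are missing from the transcript, so the sketch below rests on two loudly stated assumptions: that IPF has an IDF-like form, $\log(M / n_{i,v})$, and that the consistency $C_{i,j}$ sums the IPF of every item on which the two users gave the same vote. Neither form is confirmed by the slide text, and the function names are hypothetical.

```python
import math
from collections import Counter


def inverse_preference_frequency(ratings):
    """Return {(item, vote): IPF}. ASSUMED form: log(M / n_{i,v}).

    `ratings` maps user -> {item: vote}; M is the number of users and
    n_{i,v} is the number of users who gave item i the rating v.
    """
    m = len(ratings)
    counts = Counter(
        (item, vote) for votes in ratings.values() for item, vote in votes.items()
    )
    return {key: math.log(m / n) for key, n in counts.items()}


def consistency(votes_i, votes_j, ipf):
    """ASSUMED form: sum of IPF over items where both users voted the same."""
    return sum(
        ipf.get((k, v), 0.0) for k, v in votes_i.items() if votes_j.get(k) == v
    )


ratings = {
    "u1": {1: 5, 2: 3},
    "u2": {1: 5, 2: 4},
    "u3": {1: 5, 2: 3},
    "u4": {1: 2},
}
ipf = inverse_preference_frequency(ratings)
print(consistency(ratings["u1"], ratings["u3"], ipf))
```

Under these assumptions, agreeing on a rare opinion contributes more to consistency than agreeing on a popular one, which is the intuition the term "inverse preference frequency" suggests.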
    14. Novel collaborative-filtering (5/10)
       • Time complexity of the similarity calculation for traditional memory-based CF
       Program SimilarityCalculation (Output)
         For User = 1:M
           For OtherUser = 1:M
             Calculate similarity between User and OtherUser
             by "Pearson correlation coefficient or Vector similarity"
           End
         End.
       • Time complexity: $O(M^2 N)$
       • $M$ is the number of users, $N$ is the number of items
       • When $M$ or $N$ grows to millions, it is almost impossible to make a real-time prediction using the traditional centralized method.
    15. Novel collaborative-filtering (6/10)
       • Time complexity of the similarity calculation for the DHT-MSO-based CF neighbor-locating algorithm
       Program DHT_MSO_NeighborLocating (Output)
         For each rated item,
           fetch vector <USERID, IPF> from bucket;
         End
         Merge vectors to get consistency by IPF
       • Time complexity: $O(N + M \log M)$
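The neighbor-locating pseudocode above can be made concrete with an in-memory stand-in for the DHT. In this sketch the buckets live in a plain dict keyed by (item, vote), and the IPF weight is assumed to be $\log(M / n_{i,v})$; both choices, along with the function names and the `top_k` parameter, are illustrative assumptions rather than details taken from the slides.

```python
import math
from collections import defaultdict


def build_buckets(ratings):
    """Group users into buckets keyed by (item, vote)."""
    buckets = defaultdict(list)
    for user, votes in ratings.items():
        for item, vote in votes.items():
            buckets[(item, vote)].append(user)
    return buckets


def locate_neighbors(active_user, ratings, buckets, top_k=2):
    """Fetch one bucket per rated item, then merge the results by IPF."""
    m = len(ratings)
    scores = defaultdict(float)
    for item, vote in ratings[active_user].items():  # one fetch per rated item
        users = buckets.get((item, vote), [])
        ipf = math.log(m / len(users)) if users else 0.0  # ASSUMED IPF form
        for user in users:
            if user != active_user:
                scores[user] += ipf  # accumulate consistency
    # Sorting the merged candidates is the M log M part of the cost.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


ratings = {
    "u1": {1: 5, 2: 3},
    "u2": {1: 5, 2: 4},
    "u3": {1: 5, 2: 3},
    "u4": {1: 2},
}
print(locate_neighbors("u1", ratings, build_buckets(ratings)))
```

The active user never compares full vote vectors with every other user; only users who share at least one identical vote are ever touched, which is where the saving over the all-pairs loop comes from.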
    16. Novel collaborative-filtering (7/10)
       • Average rating normalization (ARN) (1/3)
         • Example: a and b will not be in the same bucket

       Item ID   1   2   3   4   5   6
       Vote(a)   4   4   5   5   6   6
       Vote(b)   3   3   4   4   5   5
    17. Novel collaborative-filtering (8/10)
       • Average rating normalization (ARN) (2/3)
         • Example

       Item ID   1   2   3   4   5   6
       Vote(a)   4   4   5   5   6   6
       Vote(b)   3   3   4   4   5   5
    18. Novel collaborative-filtering (9/10)
       • Average rating normalization (ARN) (3/3)
         • a and b will never be in the same bucket (based on the basic DHT-based CF algorithm)
         • <ITEM_ID, VOTE> vs. <ITEM_ID, ARN_VOTE>
         • Example: a and b will be in the same bucket

       New approach
       Item ID   1   2   3   4   5   6
       Vote(a)  -1  -1   0   0   1   1
       Vote(b)  -1  -1   0   0   1   1

       Former approach
       Item ID   1   2   3   4   5   6
       Vote(a)   4   4   5   5   6   6
       Vote(b)   3   3   4   4   5   5
    19. Novel collaborative-filtering (10/10)
       • ARN_VOTE
       • The ARN approach constructs $N$ buckets instead of the $N \times C$ buckets of the basic DHT-based CF algorithm
         • $N$ is the number of items
         • $C$ is the number of possible ratings for each item
       Symbols: $v'_{i,j}$ is the ARN_VOTE of user $i$ on item $j$; $v_{i,j}$ is the vote of user $i$ on item $j$; $\bar{v}_i$ is the mean vote of user $i$. $v_{req}$: retrieve similar neighbors; $v_{ack}$: returns the users' degree of satisfaction; $\delta$: threshold.
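The ARN_VOTE formula itself is missing from the transcript, but the worked example on the preceding slides (votes 4 4 5 5 6 6 becoming -1 -1 0 0 1 1) is consistent with subtracting the user's mean vote from each vote. The sketch below assumes exactly that and nothing more; the function name is hypothetical, and how the threshold $\delta$ enters the lookup is not modeled.

```python
def arn_votes(votes):
    """Subtract the user's mean vote from each vote (ASSUMED ARN form)."""
    mean = sum(votes.values()) / len(votes)
    return {item: vote - mean for item, vote in votes.items()}


# The two users from the slides: b rates every item one point lower than a.
a = {1: 4, 2: 4, 3: 5, 4: 5, 5: 6, 6: 6}
b = {1: 3, 2: 3, 3: 4, 4: 4, 5: 5, 6: 5}

print(arn_votes(a))
print(arn_votes(a) == arn_votes(b))
```

With raw votes, a and b never share an <ITEM_ID, VOTE> key; after normalization their <ITEM_ID, ARN_VOTE> keys coincide on every item, so they land in the same buckets.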
    20. Recommendation systems (1/7)
       • Architecture of DCF system
       [Figure: DCF system architecture with peers User1 through User5]
    21. Recommendation systems (2/7)
       • DCF system vs. traditional centralized CF system
         • Difference
           • Maintenance of the user database
           • Complex computation task of making predictions
         • Similarity
           • Unique key
             • Each user has V key
             • Construct a DHT overlay network
    22. Recommendation systems (3/7)
       • DHT-MSO-based CF algorithm
         • MSO is used as the distributed neighbor-locating algorithm
       Input: training set, test set, target item
       Output: mean absolute error of prediction
         1. Construct the DHT overlay network, which provides put(key) and lookup(key)
         2. For the training set, execute the put(key) function with key <ITEM_ID, VOTE>
         3. Fetch similar neighbors by executing the lookup(key) function
         4. Compute the corresponding prediction
    23. Recommendation systems (4/7)
       • DCF puts the vote vector for peer P into the DHT overlay network
       Input: test set (P's vote vector)
       Output: NULL
         1. P generates a unique 128-bit DHT key $k_{local}$
         2. P hashes one <ITEM_ID, VOTE> combination to key $K$
         3. Finding the similar neighbor: the peer whose local key $k_{local}$ is most similar to $K$ receives the put message with $K$
         4. Repeats steps 2 and 3
    24. Recommendation systems (5/7)
       • DCF looks up similar users for peer P
       Input: test set (P's vote vector)
       Output: training set (vote vectors retrieved for similar users)
         1. P generates a unique 128-bit DHT key $k_{local}$
         2. P hashes one <ITEM_ID, VOTE> combination to key $K$
         3. Finding the similar neighbor: the peer whose local key $k_{local}$ is most similar to $K$ receives the lookup message with $K$
         4. Repeats steps 2 and 3
         5. The vote vectors are received from the similar users
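The put and lookup flows on the preceding slides can be imitated in a single process. The sketch below is a toy stand-in, not a real DHT: there is no network or hop-by-hop routing, MD5 is used only because its digest happens to be 128 bits, and "most similar key" is assumed to mean numerically closest. `ToyDHT`, `dht_key`, and the peer names are all hypothetical.

```python
import hashlib


def dht_key(text):
    """Hash a string to a 128-bit integer key (an MD5 digest is 128 bits)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


class ToyDHT:
    """In-process stand-in for a DHT overlay; no real networking."""

    def __init__(self, peer_names):
        # Each peer gets a 128-bit local key and its own bucket storage.
        self.peers = {dht_key(name): name for name in peer_names}
        self.store = {k_local: {} for k_local in self.peers}

    def _closest_peer(self, key):
        # ASSUMPTION: "most similar" means numerically closest local key.
        return min(self.peers, key=lambda k_local: abs(k_local - key))

    def put(self, user, item_id, vote):
        """Hash <ITEM_ID, VOTE> to K and store the user at the closest peer."""
        key = dht_key(f"{item_id}:{vote}")
        self.store[self._closest_peer(key)].setdefault(key, set()).add(user)

    def lookup(self, item_id, vote):
        """Return the users stored under the key for <ITEM_ID, VOTE>."""
        key = dht_key(f"{item_id}:{vote}")
        return set(self.store[self._closest_peer(key)].get(key, set()))


dht = ToyDHT(["peer-1", "peer-2", "peer-3"])
for user, votes in {"alice": {1: 5, 2: 3}, "bob": {1: 5}}.items():
    for item_id, vote in votes.items():
        dht.put(user, item_id, vote)
print(dht.lookup(1, 5))
```

Because put and lookup hash the same <ITEM_ID, VOTE> pair to the same key, a peer looking up its own votes reaches precisely the buckets where users with identical votes were stored.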
    25. Recommendation systems (6/7)
       • DCF put algorithm
         • Construct a DHT overlay network
         • Fill data into the DHT
       • DCF lookup algorithm
         • Fetch similar users with high consistency
         • Construct a local training set to make recommendations
    26. Recommendation systems (7/7)
       • DHT-MSO-ARN-based CF
         • Almost the same as the DHT-MSO-based algorithm
       Input: training set, test set, target item
       Output: mean absolute error of prediction
         1. Construct the DHT overlay network, which provides put(key) and lookup(key)
         2. For the training set, execute the put(key) function with key <ITEM_ID, ARN_VOTE>
         3. Fetch similar neighbors by executing the lookup(key) function
         4. Compute the corresponding prediction
    27. Experiments (1/10)
       • CF algorithms
         • Traditional memory-based CF algorithm (baseline)
         • Basic DHT-based CF
         • DHT-based CF with MSO
         • DHT-based CF with MSO and ARN
       • Data set
         • EachMovie data set
         • 72,916 users, 1,628 movies
         • 2,811,983 ratings ranging from 0 to 5
    28. Experiments (2/10)
       • Metrics and methodology
         • MAE (mean absolute error)
       Symbols: $v_{a,j}$ is the actual rating user $a$ gives to item $j$; $p_{a,j}$ is the predicted value; $A$ is the active user set; $T$ is the test item set.
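The MAE formula image is missing, so the sketch below uses the common definition: the average of $|v_{a,j} - p_{a,j}|$ over every (active user, test item) pair. Whether the presenters averaged per pair or per user first is not recoverable from the transcript, so treat the normalization as an assumption; the function name and sample values are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average |actual - predicted| over all (user, item) test pairs.

    Both arguments map (user, item) -> rating; every key in `actual`
    must also be present in `predicted`.
    """
    errors = [abs(actual[pair] - predicted[pair]) for pair in actual]
    return sum(errors) / len(errors)


# Made-up ratings for two active users on their test items.
actual = {("a", 1): 4, ("a", 2): 2, ("b", 1): 5}
predicted = {("a", 1): 3.5, ("a", 2): 2.0, ("b", 1): 4.0}
print(mean_absolute_error(actual, predicted))
```

A lower MAE means predictions sit closer to the ratings users actually gave.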
    29. Experiments (3/10)
       • Experimental results (1/8)
         • Efficiency of neighbor selection (a)
    30. Experiments (4/10)
       • Experimental results (2/8)
         • Efficiency of neighbor selection (b)
    31. Experiments (5/10)
       • Experimental results (3/8)
         • Comparison of the prediction accuracy of the four CF algorithms (all-but-one protocol)
    32. Experiments (6/10)
       • Experimental results (4/8)
         • Comparison of the prediction accuracy of the four CF algorithms (given 5 protocol)
    33. Experiments (7/10)
       • Experimental results (5/8)
         • Comparison of the fetches made by the four algorithms (all-but-one protocol)
    34. Experiments (8/10)
       • Experimental results (6/8)
         • Comparison of the fetches made by the four algorithms (given 5 protocol)
    35. Experiments (9/10)
       • Experimental results (7/8)
         • Comparison of different threshold values for ARN (all-but-one protocol)
    36. Experiments (10/10)
       • Experimental results (8/8)
         • Comparison of different threshold values for ARN (given 5 protocol)
    37. Conclusions
       • Proposed a new algorithm
         • Based on a DHT peer-to-peer routing method
         • Distributed collaborative-filtering neighbor-locating algorithm (DCFLA)
           • Most same opinion (MSO)
           • Average rating normalization (ARN)
         • Reduced the network traffic and time cost
    38. Q & A
       • Thanks for listening!
