Collaborative filtering20081111
 

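  • Mitigates the problems of relying on a centralized point; removes the need for expensive equipment; allows direct communication
  • Filters out noisy users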

Presentation Transcript

  • DCFLA: A Distributed Collaborative-Filtering Neighbor-Locating Algorithm
    • Authors: Bo Xie, Peng Han, Fan Yang, Rui-Min Shen, Hua-Jun Zeng, and Zheng Chen
    • Source: Information Sciences, vol. 177, no. 6, pp. 1349-1363, 2007
    • Professor: Dr. Shu-Ching Wang (王淑卿)
    • Speaker: Yu-Chien Chou (周裕健)
    • Date: Nov. 11, 2008
  • Outline
    • Introduction
    • Related Work
    • Novel Collaborative-Filtering
    • Recommendation Systems
    • Experiments
    • Conclusions
  • Introduction (1/2)
    • Motivation
      • Overload of information
      • Compare the recommendation approaches
        • CB (content-based) – information filtering
        • CF (collaborative filtering) – memory-based
        • CF (collaborative filtering) – model-based
      • Focus on the scalability issue (CF)
        • Time and space complexity of the similarity calculation
      • Design and implement a decentralized CF
  • Introduction (2/2)
    • Purpose
      • Reduce the time complexity
        • $O(M^2 N) \rightarrow O(N + M \log M)$
      • Increase the algorithm’s scalability
      • Propose the concept of most same opinion (MSO)
      • Propose the average rating normalization (ARN) technique to improve the MSO
      • Use a normalized rating instead of the raw rating
  • Related work (1/5)
    • Proposing a recommendation system
      • Use the memory-based CF algorithm
      • Built on the peer-to-peer (P2P) architecture
        • Distributed hash table (DHT) routing algorithms
    • Memory-based CF algorithm (a)
      • Average of the votes
        • $p_{a,j}$ is the predicted vote of active user $a$ on item $j$
        • $\bar{v}_a$ is the mean vote of user $a$, and $\bar{v}_i$ is the mean vote of user $i$
        • $N$ is the number of users in the user database
        • $v_{i,j}$ is the vote of user $i$ on item $j$
        • $I_i$ is the set of items on which user $i$ has voted
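The formulas themselves are not preserved in this transcript. The definitions above match the standard memory-based formulation from the CF literature (Breese, Heckerman, and Kadie), sketched here under that assumption. The weight $w(a,i)$ and the normalizing factor $\kappa$ are symbols from that formulation, not from the slide.

```latex
% Mean vote of user i over the items I_i that the user has voted on
\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}

% Predicted vote of active user a on item j:
% w(a,i) is the similarity weight between users a and i (assumed symbol),
% \kappa normalizes the absolute weights to sum to one (assumed symbol)
p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{N} w(a,i) \, (v_{i,j} - \bar{v}_i)
```

The weight $w(a,i)$ is where the Pearson correlation coefficient or the vector similarity would plug in.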
  • Related work (2/5)
    • Memory-based CF algorithm (b)
      • 2. Pearson correlation coefficient
        • j ranges over all items
        • a and i are the users
      • 3. Vector similarity
        • j ranges over all items
        • k is the target item
        • I is the total number of users
        • a and i are the users
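The two similarity weights can be sketched in code. This is a minimal illustration that assumes the textbook definitions of the Pearson correlation coefficient and cosine (vector) similarity, since the slide's formulas are not preserved in the transcript. The function names and the dict-based vote representation (item ID mapped to vote) are hypothetical.

```python
import math


def pearson(votes_a, votes_i):
    """Pearson correlation between two users, summed over co-rated items.

    Each argument maps item ID -> vote. Means are taken over all items the
    user has voted on (an assumption following the standard formulation).
    """
    common = set(votes_a) & set(votes_i)
    if not common:
        return 0.0
    mean_a = sum(votes_a.values()) / len(votes_a)
    mean_i = sum(votes_i.values()) / len(votes_i)
    num = sum((votes_a[j] - mean_a) * (votes_i[j] - mean_i) for j in common)
    den_a = sum((votes_a[j] - mean_a) ** 2 for j in common)
    den_i = sum((votes_i[j] - mean_i) ** 2 for j in common)
    den = math.sqrt(den_a * den_i)
    return num / den if den else 0.0


def vector_similarity(votes_a, votes_i):
    """Cosine of the angle between the two users' vote vectors."""
    norm_a = math.sqrt(sum(v * v for v in votes_a.values()))
    norm_i = math.sqrt(sum(v * v for v in votes_i.values()))
    if not norm_a or not norm_i:
        return 0.0
    common = set(votes_a) & set(votes_i)
    return sum(votes_a[j] * votes_i[j] for j in common) / (norm_a * norm_i)
```

Both functions return 0.0 when the users share no rated items, which is a design choice of this sketch rather than something the slides specify.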
  • Related work (3/5)
    • P2P system and DHT routing algorithm
      • DHT routing algorithm
        • Decentralization
        • Scalability
        • Fault tolerance
    • P2P benefits
      • Avoiding dependence on a centralized point
      • Allowing direct communication
      • Making it possible to aggregate resources
  • Related work (4/5)
    • P2P application
      • Parallelizable application
        • Split a large computation-intensive task
      • Content and file management applications
        • Storing information
        • Retrieving information
      • Collaborative application
        • Allow users to collaborate in real time
  • Related work (5/5)
    • The primary goals of DHT
      • Provide an efficient, scalable, and robust routing algorithm
        • Reduce the number of P2P hops
      • Reduce the amount of routing states that should be preserved at each peer
    • The distributed collaborative filtering systems (DCF)
      • Advantage in terms of scalability
  • Novel collaborative-filtering (1/10)
    • Distributed collaborative-filtering neighbor-locating algorithm (DCFLA)
      • Applies to DCF recommendation systems
      • Most same opinion (MSO)
      • Average rating normalization (ARN)
    • The main goal
      • Reduce the network traffic and time costs
  • Novel collaborative-filtering (2/10)
    • Basic DHT-based CF algorithm
      • Divide the original centralized user database into fractions (buckets)
      • Gets rid of some noisy users
  • Novel collaborative-filtering (3/10)
    • Locating neighbors in DHT-based CF
      • Most same opinion (MSO) (a)
        • Inverse preference frequency (IPF)
    $M$ is the total number of users in the system, and $n_{i,v}$ is the number of users who voted for item $i$ with rating $v$.
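The IPF formula itself is not preserved in the transcript. A minimal sketch, assuming IPF takes the familiar inverse-frequency form $\log(M / n_{i,v})$ suggested by the two definitions above (by analogy with inverse user frequency). The function name and the triple-based input format are hypothetical.

```python
import math
from collections import Counter


def inverse_preference_frequency(ratings, total_users):
    """Return {(item_id, vote): IPF} for every observed item/vote pair.

    ratings: iterable of (user_id, item_id, vote) triples.
    total_users: M, the total number of users in the system.
    Assumption: IPF = log(M / n_{i,v}), so rarer opinions weigh more.
    """
    counts = Counter((item_id, vote) for _, item_id, vote in ratings)
    return {key: math.log(total_users / n) for key, n in counts.items()}
```

Under this assumption, a vote shared by every user has an IPF of zero and carries no information about similarity.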
  • Novel collaborative-filtering (4/10)
    • Locating neighbors in DHT-based CF
      • Most same opinion (MSO) (b)
        • The consistency of user i and j
    $C_{i,j}$ is the consistency of users $i$ and $j$, $v_{i,k}$ is the vote of user $i$ for item $k$, $v_{j,k}$ is the vote of user $j$ for item $k$, and $N$ is the total number of items in the system.
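The consistency formula is likewise missing from the transcript. A minimal sketch, assuming consistency sums the IPF weight of every item on which both users gave exactly the same vote. The function name, the dict-based votes, and the precomputed `ipf` table are all hypothetical.

```python
def consistency(votes_i, votes_j, ipf):
    """Sum the IPF weights of the items on which both users cast the same vote.

    votes_i, votes_j: dicts mapping item ID -> vote.
    ipf: dict mapping (item_id, vote) -> IPF weight (assumed precomputed).
    """
    return sum(
        ipf.get((item, vote), 0.0)
        for item, vote in votes_i.items()
        if votes_j.get(item) == vote
    )
```

Items rated by only one of the two users, or rated differently, contribute nothing in this sketch.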
  • Novel collaborative-filtering (5/10)
    • Time complexity of the similarity calculation for traditional memory-based CF
    Program SimilarityCalculation (Output)
      For User = 1:M
        For OtherUser = 1:M
          Calculate similarity between User and OtherUser by "Pearson correlation coefficient or Vector similarity"
        End
      End.
      • Time complexity: $O(M^2 N)$
      • M is the number of users, N is the number of items.
      • When M or N grows to millions, it is almost impossible to make a real-time prediction using the traditional centralized method.
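The nested loop can be sketched as runnable code to show where the cost comes from: there are M × (M − 1) user pairs, and each similarity call can touch up to N items. The stand-in similarity `count_same_votes` is hypothetical and only keeps the example self-contained; Pearson correlation or vector similarity would be used in its place.

```python
def count_same_votes(votes_u, votes_w):
    """Hypothetical stand-in similarity: number of items with identical votes."""
    return sum(1 for item, vote in votes_u.items() if votes_w.get(item) == vote)


def all_pairs_similarity(votes, similarity=count_same_votes):
    """Compare every user with every other user.

    votes: dict mapping user ID -> {item ID: vote}.
    Returns {(user, other): similarity}; M * (M - 1) entries in total.
    """
    table = {}
    for user in votes:          # M users
        for other in votes:     # M users again
            if user != other:
                # Each call scans up to N items.
                table[(user, other)] = similarity(votes[user], votes[other])
    return table
```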
  • Novel collaborative-filtering (6/10)
    • Time complexity of the similarity calculation for DHT-MSO-based CF neighbor-locating algorithm
    Program DHT_MSO_NeighborLocating (Output)
      For each rated item,
        fetch vector <USERID, IPF> from bucket;
      End
      Merge vectors to get consistency by IPF
      • Time complexity: $O(N + M \log M)$
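A minimal sketch of the fetch-and-merge idea, using an in-memory dict as a stand-in for the DHT buckets. The function name, the `(item_id, vote)` bucket key, and the `top_k` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def locate_neighbors(active_votes, buckets, top_k=10):
    """Rank candidate neighbors by accumulated IPF (consistency).

    active_votes: dict mapping item ID -> vote for the active user.
    buckets: dict mapping (item_id, vote) -> list of (user_id, ipf) pairs,
             standing in for the vectors fetched from the DHT.
    """
    merged = defaultdict(float)
    for item, vote in active_votes.items():        # one fetch per rated item
        for user_id, ipf in buckets.get((item, vote), []):
            merged[user_id] += ipf                  # merge step
    # Sorting the candidate users is the M log M part of the cost.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Unlike the all-pairs loop, this never compares the active user against users who share no identical vote with them.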
  • Novel collaborative-filtering (7/10)
    • Average rating normalization (ARN) (1/3)
      • For example: a and b will not be in the same bucket
    Item ID   1  2  3  4  5  6
    Vote(a)   4  4  5  5  6  6
    Vote(b)   3  3  4  4  5  5
  • Novel collaborative-filtering (8/10)
    • Average rating normalization (ARN) (2/3)
      • For example
    Item ID   1  2  3  4  5  6
    Vote(a)   4  4  5  5  6  6
    Vote(b)   3  3  4  4  5  5
  • Novel collaborative-filtering (9/10)
    • Average rating normalization (ARN) (3/3)
      • a and b will never be in the same bucket (based on basic DHT-based CF algorithm)
      • <ITEM_ID, VOTE> vs. <ITEM_ID, ARN_VOTE>
      • For example: a and b will be in the same bucket
    Former approach
    Item ID   1  2  3  4  5  6
    Vote(a)   4  4  5  5  6  6
    Vote(b)   3  3  4  4  5  5
    New approach
    Item ID   1  2  3  4  5  6
    Vote(a)  -1 -1  0  0  1  1
    Vote(b)  -1 -1  0  0  1  1
  • Novel collaborative-filtering (10/10)
    • ARN_VOTE
    • ARN approach constructs N buckets instead of N * C buckets for the basic DHT-based CF algorithm
      • N is the number of items
      • C is the number of possible ratings for every item
    $v'_{i,j}$ is the ARN_VOTE of user $i$ on item $j$, $v_{i,j}$ is the vote of user $i$ on item $j$, and $\bar{v}_i$ is the mean vote of user $i$.
    $v_{req}$: retrieves similar neighbors; $v_{ack}$: returns the users' degree of satisfaction; $\delta$: threshold.
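The ARN formula is not preserved in the transcript. A minimal sketch, assuming ARN_VOTE is simply the vote minus the user's mean vote, which is what the definitions and the worked example with users a and b suggest. The function name is hypothetical.

```python
def arn_votes(votes):
    """Subtract the user's mean vote from each vote (assumed ARN definition).

    votes: dict mapping item ID -> raw vote.
    Returns a dict mapping item ID -> normalized vote.
    """
    if not votes:
        return {}
    mean = sum(votes.values()) / len(votes)
    return {item: vote - mean for item, vote in votes.items()}


# Users a and b from the slides: different raw votes, same rating pattern.
vote_a = {1: 4, 2: 4, 3: 5, 4: 5, 5: 6, 6: 6}
vote_b = {1: 3, 2: 3, 3: 4, 4: 4, 5: 5, 6: 5}
same_buckets = arn_votes(vote_a) == arn_votes(vote_b)  # True
```

With real data the mean is rarely an integer, so a practical version would need to round or bin the normalized vote before using it as a bucket key; how the paper handles that is not stated in the slides.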
  • Recommendation systems (1/7)
    • Architecture of DCF system
    [Figure: architecture of the DCF system, showing peers User1 through User5]
  • Recommendation systems (2/7)
    • DCF system vs. traditional centralized CF system
      • Difference
        • Maintenance of the user database
        • Complex computation task of making predictions
      • Similarity
        • Unique key
          • Each user has V key
          • Construct a DHT overlay network
  • Recommendation systems (3/7)
    • DHT-MSO-based CF algorithm
      • MSO is used as the distributed neighbor-locating algorithm
    Input: training set, test set, target item
    Output: mean absolute error of prediction
      • Construct the DHT overlay network, which provides put(key) and lookup(key)
      • For the training set, execute the put(key) function with key <ITEM_ID, VOTE>
      • Fetch similar neighbors by executing the lookup(key) function
      • Compute the corresponding prediction
  • Recommendation systems (4/7)
    • DCF puts the vote vector for peer P into the DHT overlay network
    Input: test set (P's vote vector)
    Output: NULL
      1. P generates a unique 128-bit DHT key k_local
      2. P hashes one <ITEM_ID, VOTE> combination to key K and finds the similar neighbor, the peer whose local key k_local is most similar to K
      3. That peer receives the put message with K
      4. Repeat steps 2 and 3
  • Recommendation systems (5/7)
    • DCF looks up similar users for peer P
    Input: test set (P's vote vector)
    Output: training set (vote vectors retrieved for similar users, received from those users)
      1. P generates a unique 128-bit DHT key k_local
      2. P hashes one <ITEM_ID, VOTE> combination to key K and finds the similar neighbor, the peer whose local key k_local is most similar to K
      3. That peer receives the lookup message with K
      4. Repeat steps 2 and 3
  • Recommendation systems (6/7)
    • DCF put algorithm
      • Construct a DHT overlay network
      • Fill data into DHT
    • DCF lookup algorithm
      • Fetch similar users with high consistency
      • Construct a local training set to make recommendations
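The put and lookup steps can be illustrated with a toy, single-process stand-in for the DHT. This is a sketch under stated assumptions: a plain dict replaces the overlay network and its routing, MD5 is chosen only because it yields a 128-bit key (the slides do not name a hash function), and the `ToyDHT` class and `dht_key` function are hypothetical names.

```python
import hashlib
from collections import defaultdict


def dht_key(item_id, vote):
    """Hash one <ITEM_ID, VOTE> pair to a 128-bit key (MD5 is an assumption)."""
    return hashlib.md5(f"{item_id}:{vote}".encode("utf-8")).hexdigest()


class ToyDHT:
    """In-memory stand-in for the DHT overlay: one dict instead of many peers."""

    def __init__(self):
        self._buckets = defaultdict(dict)  # key -> {user_id: vote vector}

    def put(self, user_id, votes):
        """Store the user's vote vector under every <ITEM_ID, VOTE> key."""
        for item_id, vote in votes.items():
            self._buckets[dht_key(item_id, vote)][user_id] = dict(votes)

    def lookup(self, votes):
        """Fetch the vote vectors of users sharing at least one identical vote."""
        found = {}
        for item_id, vote in votes.items():
            found.update(self._buckets.get(dht_key(item_id, vote), {}))
        return found
```

The dict returned by `lookup` plays the role of the local training set from which a peer would compute its prediction.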
  • Recommendation systems (7/7)
    • DHT-MSO-ARN-based CF
      • Almost the same as a DHT-MSO-based algorithm
    Input: training set, test set, target item
    Output: mean absolute error of prediction
      • Construct the DHT overlay network, which provides put(key) and lookup(key)
      • For the training set, execute the put(key) function with key <ITEM_ID, ARN_VOTE>
      • Fetch similar neighbors by executing the lookup(key) function
      • Compute the corresponding prediction
  • Experiments (1/10)
    • CF algorithms
      • Traditional memory-based CF algorithm (baseline)
      • Basic DHT-based CF
      • DHT-based CF with MSO
      • DHT-based CF with MSO and ARN
    • Data set
      • EachMovie data set
      • 72,916 users, 1,628 movies
      • 2,811,983 ratings ranging from 0 to 5
  • Experiments (2/10)
    • Metrics and methodology
      • MAE (mean absolute error)
    $v_{a,j}$ is the actual rating user $a$ gives to item $j$, $p_{a,j}$ is the predicted value, $A$ is the active user set, and $T$ is the test item set.
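The MAE formula is not preserved in the transcript. A minimal sketch, assuming the usual definition: the average of $|v_{a,j} - p_{a,j}|$ over every tested (user, item) pair. The function name and the dict-based inputs keyed by `(user, item)` are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings.

    actual, predicted: dicts mapping (user, item) -> rating.
    Only pairs present in both dicts are scored.
    """
    keys = actual.keys() & predicted.keys()
    if not keys:
        raise ValueError("no (user, item) pairs to evaluate")
    return sum(abs(actual[k] - predicted[k]) for k in keys) / len(keys)
```

A lower MAE means the predictions are closer to the ratings the users actually gave.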
  • Experiments (3/10)
    • Experimental results (1/8)
      • The efficiency of neighbor selection (a)
  • Experiments (4/10)
    • Experimental results (2/8)
      • The efficiency of neighbor selection (b)
  • Experiments (5/10)
    • Experimental results (3/8)
      • Comparison of the prediction accuracy of four CF algorithms (all-but-one protocol)
  • Experiments (6/10)
    • Experimental results (4/8)
      • Comparison of the prediction accuracy of four CF algorithms (given 5 protocol)
  • Experiments (7/10)
    • Experimental results (5/8)
      • Comparison of the fetch by four algorithms (all-but-one protocol)
  • Experiments (8/10)
    • Experimental results (6/8)
      • Comparison of the fetch by four algorithms (given 5 protocol)
  • Experiments (9/10)
    • Experimental results (7/8)
      • Comparison of different threshold values for ARN (all-but-one protocol)
  • Experiments (10/10)
    • Experimental results (8/8)
      • Comparison of different threshold values for ARN (given 5 protocol)
  • Conclusions
    • Proposed a new algorithm
      • Based on a DHT peer-to-peer routing method
      • Distributed collaborative filtering neighbor locating algorithm (DCFLA)
        • Most same opinion (MSO)
        • Average rating normalization (ARN)
      • Reduced the network traffic and time cost
  • Q & A
    • Thank you for listening!