The document discusses recommendation algorithms used by Amazon, including traditional collaborative filtering, cluster models, and search-based methods. It focuses on Amazon's item-to-item collaborative filtering algorithm, which builds a similar-items table offline by finding items that customers tend to purchase together. The approach scales well to large data sets, provides high-quality recommendations even with limited user data, and produces recommendations quickly.
Amazon.com Recommendations: Item-to-Item Collaborative Filtering
Authors: Greg Linden, Brent Smith, and Jeremy York
Origin: January/February 2003, published by the IEEE Computer Society
Reporter: 朱韋恩
Date: 2008/11/3
Some problems
- Many applications require the result set to be returned in real time
- New customers typically have extremely limited information
- Customer data is volatile
Three common approaches to solving the problem
- Traditional collaborative filtering
- Cluster models
- Search-based methods
Amazon.com Item-to-Item CF Algorithm
Traditional Collaborative Filtering
- Nearest-neighbor CF algorithm
- Cosine measure: each customer is represented as an N-dimensional vector of items, and the similarity of two customers A and B is the cosine of the angle between their vectors
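The cosine measure named on this slide is commonly written as similarity(A, B) = cos(A, B) = (A · B) / (‖A‖ ‖B‖). A minimal sketch of it follows, assuming each customer is a plain list with one entry per catalog item (1 = purchased, 0 = not purchased). The function name, the four-item catalog, and the handling of empty vectors are illustrative assumptions, not taken from the slides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length purchase vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a customer with no purchases is similar to no one.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical catalog of 4 items; 1 = purchased, 0 = not purchased.
customer_a = [1, 1, 0, 0]
customer_b = [1, 0, 1, 0]
similarity = cosine_similarity(customer_a, customer_b)
```

Here the two customers share one of their two purchases each, so the similarity comes out at roughly 0.5. Two customers with no items in common would score 0.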
Traditional Collaborative Filtering: Disadvantages
1. Examines only a small customer sample...
2. Item-space partitioning...
3. If it discards the most popular or unpopular items...
Cluster Models
- Goal: divide the customer base into many segments and assign the user to the segment containing the most similar customers
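The assignment step can be sketched as follows, assuming each segment is summarized by a centroid vector and that "most similar" is judged with the cosine measure. Both are assumptions for illustration: the slide does not say how segments are represented, and the clustering that produces the centroids is not shown.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_segment(user, centroids):
    """Return the name of the segment whose centroid is most similar to the user."""
    return max(centroids, key=lambda name: cosine(user, centroids[name]))


# Hypothetical segment centroids over a 4-item catalog, assumed to have been
# computed offline by a clustering step that is not shown here.
centroids = {
    "segment_a": [0.9, 0.8, 0.1, 0.0],
    "segment_b": [0.0, 0.1, 0.7, 0.9],
}
user = [1, 0, 0, 0]
segment = assign_segment(user, centroids)
```

The online work is one comparison per segment rather than one per customer, which is where the scalability benefit of this approach comes from.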
Cluster Models
- Advantage: the user is compared against a small number of segments rather than the entire customer base, so online scalability and performance are better; the complex and expensive clustering computation is run offline
- Disadvantage: recommendation quality is low
Search-Based Methods
- Given the user's purchased and rated items, construct a search query to find other popular items
- For example, items by the same author, artist, or director, or with similar keywords
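A minimal sketch of this idea follows, assuming a small in-memory catalog where each item has an author, a keyword list, and a sales count used as a stand-in for popularity. The catalog layout, field names, and function names are all hypothetical; a real system would issue the query against a search index.

```python
def build_query(purchased, catalog):
    """Collect the authors and keywords of the items the user already owns."""
    authors = {catalog[item_id]["author"] for item_id in purchased}
    keywords = set()
    for item_id in purchased:
        keywords.update(catalog[item_id]["keywords"])
    return authors, keywords


def search(purchased, catalog):
    """Return unpurchased items matching the query, most popular first."""
    authors, keywords = build_query(purchased, catalog)
    owned = set(purchased)
    matches = [
        item_id
        for item_id, item in catalog.items()
        if item_id not in owned
        and (item["author"] in authors or keywords & set(item["keywords"]))
    ]
    return sorted(matches, key=lambda item_id: catalog[item_id]["sales"], reverse=True)


# Hypothetical catalog; "sales" stands in for popularity.
catalog = {
    "book1": {"author": "Author X", "keywords": ["python"], "sales": 50},
    "book2": {"author": "Author X", "keywords": ["cooking"], "sales": 10},
    "book3": {"author": "Author Y", "keywords": ["python"], "sales": 90},
    "book4": {"author": "Author Z", "keywords": ["gardening"], "sales": 70},
}
results = search(["book1"], catalog)
```

With a single purchase the query stays small. The query grows with every item the user owns, which is the scaling problem for customers with long purchase histories.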
Search-Based Methods
- If the user has few purchases or ratings, search-based recommendation algorithms scale and perform well
- For users with thousands of purchases, it is impractical to base a query on all the items
Item-to-Item Collaborative Filtering
- Rather than matching the user to similar customers, build a similar-items table by finding items that customers tend to purchase together
- Amazon.com used this method
Item-to-Item CF Algorithm
For each item in product catalog, I1
    For each customer C who purchased I1
        For each item I2 purchased by customer C
            Record that a customer purchased I1 and I2
    For each item I2
        Compute the similarity between I1 and I2
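The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not Amazon's implementation: purchases are assumed to be a dict mapping each customer to a set of item names, and similarity is assumed to be the cosine measure applied to binary item vectors, which reduces to co-purchase count divided by the square root of the product of the two items' buyer counts. The function name and sample data are hypothetical.

```python
import math
from collections import defaultdict


def build_similar_items(purchases):
    """Build {item: [(similar_item, score), ...]} from {customer: set_of_items}."""
    # Invert the data: which customers bought each item?
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = {}
    for i1 in buyers:                        # for each item in the catalog, I1
        co_counts = defaultdict(int)
        for customer in buyers[i1]:          # for each customer C who purchased I1
            for i2 in purchases[customer]:   # for each item I2 purchased by C
                if i2 != i1:
                    co_counts[i2] += 1       # record that a customer bought I1 and I2
        scores = []
        for i2, both in co_counts.items():   # for each item I2, compute the similarity
            score = both / math.sqrt(len(buyers[i1]) * len(buyers[i2]))
            scores.append((i2, score))
        # Highest score first; ties broken by item name for a stable order.
        table[i1] = sorted(scores, key=lambda pair: (-pair[1], pair[0]))
    return table


# Hypothetical purchase histories.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "carol": {"book", "pen"},
    "dave": {"lamp"},
}
similar_items = build_similar_items(purchases)
```

Only item pairs that share at least one customer are ever scored, so the work is driven by actual co-purchases rather than by every possible pair in the catalog.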
Scalability: A Comparison
- Traditional CF: impractical on large data sets
- Cluster models: perform much of the computation offline, but recommendation quality is relatively poor
- Search-based models: scale poorly for customers with numerous purchases and ratings
Scalability: A Comparison
Item-to-Item CF:
- creates the similar-items table offline
- is fast for extremely large data sets
- recommendation quality is excellent
- performs well with limited user data
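The slides do not spell out the online step, so the sketch below is one plausible reading: look up each of the user's items in the precomputed similar-items table, sum the similarity scores of the candidates, and return the top few. Summing scores, the table layout, and all names and values here are assumptions for illustration.

```python
from collections import defaultdict


def recommend(user_items, similar_items, top_n=3):
    """Aggregate precomputed similarities for the user's items and rank the candidates."""
    owned = set(user_items)
    totals = defaultdict(float)
    for item in owned:
        for other, score in similar_items.get(item, []):
            if other not in owned:           # never recommend what the user already has
                totals[other] += score
    # Highest total first; ties broken by item name for a stable order.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical table, assumed to have been built offline.
similar_items = {
    "book": [("pen", 0.8), ("lamp", 0.7)],
    "pen": [("book", 0.8), ("ink", 0.6), ("lamp", 0.4)],
}
recommendations = recommend(["book", "pen"], similar_items)
```

The online cost depends only on how many items the user has purchased or rated, not on the size of the catalog or the number of customers, and a single purchase is already enough to produce a result.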