This document provides an overview of data mining and machine learning techniques for filtering and making recommendations, specifically focusing on collaborative filtering. It discusses how collaborative filtering works by finding patterns in user behaviors and ratings to predict what individual users might like based on opinions of other similar users. Memory-based and model-based collaborative filtering algorithms are described as well as common approaches like user-based, item-based and cluster models. Challenges with collaborative filtering and limitations are also outlined.
1. DATA MINING AND MACHINE LEARNING
IN A NUTSHELL
COLLECTIVE INTELLIGENCE
Mohammad-Ali Abbasi
http://www.public.asu.edu/~mabbasi2/
SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING
ARIZONA STATE UNIVERSITY
Arizona State University
http://dmml.asu.edu/
Data Mining and Machine Learning Lab
Data Mining and Machine Learning- in a nutshell Collective Intelligence 1
2. Filtering & Making Recommendations
• Recommendation Systems
• Collaborative filtering
• Content Based Filtering
5. Collaborative Filtering- Example
8. Collaborative Filtering- Example
10. Collaborative Filtering
• Collaborative Filtering is a method of making
personalized suggestions for other products,
based on your previous shopping habits.
• The method of making automatic predictions
(filtering) about the interests of a user by
collecting taste information from many users.
• In most cases the goal is to predict user
preferences on items by learning their
aggregated relationships through the
historical records
11. Recommender Systems (RS): What Are They and Why Use Them
• RS as a problem of information filtering
• RS as a problem of machine learning
• Enhance user experience
– Assist users in finding information
– Reduce search and navigation time
• Increase productivity
• Increase credibility
• Mutually beneficial proposition
12. Personalization
• Recommenders are instances of personalization
software.
• Personalization concerns adapting to the individual
needs, interests, and preferences of each user.
• Includes:
– Recommending
– Filtering
– Predicting (e.g. form or calendar appt. completion)
• From a business perspective, it is viewed as part of
Customer Relationship Management (CRM).
13. Netflix Prize
• The Netflix Prize sought to substantially
improve the accuracy of predictions about
how much someone is going to enjoy a movie
based on their movie preferences.
• On September 21, 2009, the “BellKor’s Pragmatic Chaos”
team won the $1M Grand Prize.
14. Types of Collaborative Filtering
• Memory-Based
– This mechanism uses user rating data to compute
similarity between users or items then uses this
similarity to make a recommendation
• Similarity methods: Pearson correlation, vector cosine
• Model-Based
– Models are developed using data mining and machine
learning algorithms that find patterns in training
data and are then used to make predictions on real data.
• Model-based algorithms: Bayesian networks, clustering
models, latent semantic models (SVD), probabilistic latent
semantic analysis, latent Dirichlet allocation, Markov decision processes
15. Memory Based Algorithms
• Customer Based Algorithms
• Item Based Algorithms
• Cluster Models
16. Recommendation Algorithms- Customer Based Algorithm
• Most algorithms start by finding a set of
customers whose purchased and rated items
overlap the user’s purchased and rated items.
• The algorithm aggregates items from these
similar customers, eliminates items the user
has already purchased or rated, and
recommends the remaining items to the user.
• Two popular versions of these algorithms:
– collaborative filtering
– cluster models.
17. Collaborative Filtering
• A traditional collaborative filtering algorithm
represents a customer as an N-dimensional
vector of items, where N is the number of
distinct catalog items. The components of the
vector are positive for purchased or positively
rated items and negative for negatively rated
items.
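The vector representation described above can be sketched in a few lines. This is a minimal illustration, not the deck's own code: the catalog, the item names, and the choice of +1 / -1 / 0 as component values are assumptions made for the example.

```python
import numpy as np

# Hypothetical catalog with N = 5 distinct items.
CATALOG = ["book_a", "book_b", "dvd_c", "cd_d", "game_e"]

def customer_vector(purchased, ratings, catalog=CATALOG):
    """Return an N-dimensional vector for one customer.

    purchased: set of item ids bought without a rating (encoded as +1).
    ratings:   dict of item id -> +1 (liked) or -1 (disliked).
    Items the customer never touched stay at 0.
    """
    index = {item: k for k, item in enumerate(catalog)}
    vec = np.zeros(len(catalog))
    for item in purchased:
        vec[index[item]] = 1.0
    for item, score in ratings.items():
        vec[index[item]] = float(score)  # an explicit rating overrides a bare purchase
    return vec

alice = customer_vector({"book_a", "dvd_c"}, {"dvd_c": -1, "cd_d": 1})
print(alice)  # book_a=+1, book_b=0, dvd_c=-1, cd_d=+1, game_e=0
```

Because most customers touch only a tiny fraction of the catalog, real systems would typically store such vectors sparsely; the dense array here is only for readability.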
18. The user-oriented neighborhood method
• Joe likes the three
movies on the left.
• To make a prediction for
him, the system finds
similar users who also
liked those movies, and
then determines which
other movies they liked.
• In this case, all three
liked Saving Private
Ryan, so that is the first
recommendation.
• Two of them liked
Dune, so that is next,
and so on.
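The neighborhood procedure for Joe can be sketched as follows. The data is hypothetical: the slide's figure is not reproduced in the text, so "Movie A" to "Movie D" and the user names are placeholders, and only Saving Private Ryan and Dune come from the slide. Treating a neighbor as "someone who liked everything Joe liked" is also a simplifying assumption.

```python
from collections import Counter

# Placeholder data standing in for the slide's figure.
joe_likes = {"Movie A", "Movie B", "Movie C"}
likes = {
    "user1": {"Movie A", "Movie B", "Movie C", "Saving Private Ryan", "Dune"},
    "user2": {"Movie A", "Movie B", "Movie C", "Saving Private Ryan", "Dune"},
    "user3": {"Movie A", "Movie B", "Movie C", "Saving Private Ryan"},
    "user4": {"Movie D"},
}

def recommend(target_likes, all_likes):
    # Neighbors: users who liked everything the target user liked.
    neighbors = [u for u, liked in all_likes.items() if target_likes <= liked]
    # Count how many neighbors liked each movie the target has not seen yet.
    votes = Counter(m for u in neighbors for m in all_likes[u] - target_likes)
    # Most frequently liked movies come first.
    return [movie for movie, _ in votes.most_common()]

print(recommend(joe_likes, likes))  # ['Saving Private Ryan', 'Dune']
```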
19. Recommendation Algorithms, Item Based Algorithm
• These algorithms focus on finding similar
items, not similar customers.
• For each of the user’s purchased and rated
items, the algorithm attempts to find similar
items. It then aggregates the similar items and
recommends them.
– search-based methods
– Amazon’s item-to-item collaborative filtering
20. Cluster Models
• To find customers who are similar to the user,
cluster models divide the customer base into
many segments and treat the task as a
classification problem.
• The algorithm’s goal is to assign the user to
the segment containing the most similar
customers.
• It then uses the purchases and ratings of the
customers in the segment to generate
recommendations.
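A rough sketch of the cluster-model steps, under stated assumptions: the purchase matrix is invented, the segment labels are assumed to come from an offline clustering step (for example k-means), and nearest-centroid assignment with Euclidean distance stands in for whatever classifier a real system would use.

```python
import numpy as np

# Hypothetical data: rows are customers, columns are 4 catalog items (1 = purchased).
purchases = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)
# Segment label per customer, assumed to be produced offline.
segments = np.array([0, 0, 1, 1])

def recommend_from_segment(user_vec, purchases, segments, top_n=2):
    labels = np.unique(segments)
    # One centroid per segment: the mean purchase vector of its members.
    centroids = np.array([purchases[segments == s].mean(axis=0) for s in labels])
    # Classification step: assign the user to the nearest centroid.
    nearest = labels[np.argmin(np.linalg.norm(centroids - user_vec, axis=1))]
    # Score items by popularity inside the segment, skipping items already owned.
    scores = purchases[segments == nearest].sum(axis=0)
    scores[user_vec > 0] = -np.inf
    ranked = np.argsort(-scores, kind="stable")[:top_n]
    return int(nearest), [int(i) for i in ranked if np.isfinite(scores[i])]

new_user = np.array([0.0, 0.0, 0.0, 1.0])
print(recommend_from_segment(new_user, purchases, segments))  # (1, [2, 1])
```

Only the comparison against a handful of centroids happens at request time, which is why cluster models scale well, at the cost of coarser recommendations than a per-customer neighbor search.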
21. Clustering Example
• Clustering based on Gender and Genre
22. Amazon Item Based- Collaborative Filtering
• Rather than matching the user to similar
customers, the item-to-item method matches
each of the user’s purchased and rated items
to similar items, then combines those similar
items into a recommendation list.
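The item-to-item idea can be sketched as below. This is an illustration only, not Amazon's production algorithm: the purchase matrix is invented, and cosine similarity between item columns is one common choice of similarity measure assumed here.

```python
import numpy as np

# Hypothetical purchase matrix: rows are customers, columns are items 0..3.
purchases = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
], dtype=float)

def item_similarity(matrix):
    # Cosine similarity between item columns (items bought by the same customers).
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody bought
    sim = (matrix.T @ matrix) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)  # an item should not recommend itself
    return sim

def recommend(user_items, sim, top_n=2):
    owned = list(user_items)
    # Combine the similar-item lists by summing similarity to each owned item.
    scores = sim[owned].sum(axis=0)
    scores[owned] = -np.inf  # drop items the user already has
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order[:top_n]]

sim = item_similarity(purchases)
print(recommend({0}, sim))  # [1, 2]
```

The similarity table can be computed offline, so the online step is just a lookup and a merge over the user's own items.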
23. Inside of the algorithms, Customer Based Algorithms
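• $v_{i,j}$ = vote of user $i$ on item $j$
• $I_i$ = items for which user $i$ has voted
• Mean vote for $i$ is $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
• Predicted vote for “active user” $a$ is a weighted sum:
$p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$,
where $\kappa$ is a normalizer and $w(a,i)$ are the weights of the $n$ similar users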
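A small sketch of the weighted-sum prediction: the active user's mean vote plus a normalized, weighted sum of the other users' deviations from their own means. The vote matrix and the weights are made up, and the normalizer is assumed to be one over the sum of absolute weights, which is one standard choice.

```python
import numpy as np

# Hypothetical vote matrix: rows are users, columns are items; np.nan = no vote.
votes = np.array([
    [5.0, 3.0, np.nan],   # active user a (row 0) has not voted on item 2
    [4.0, 2.0, 5.0],
    [1.0, 5.0, 2.0],
])

def predict_vote(votes, a, j, weights):
    """p_{a,j} = mean_a + kappa * sum_i w(a,i) * (v_{i,j} - mean_i)."""
    means = np.nanmean(votes, axis=1)  # mean vote of each user over the items they voted on
    others = [i for i in range(len(votes)) if i != a and not np.isnan(votes[i, j])]
    total = sum(weights[i] * (votes[i, j] - means[i]) for i in others)
    norm = sum(abs(weights[i]) for i in others)  # kappa = 1 / sum of |weights| (assumed)
    return float(means[a]) if norm == 0 else float(means[a] + total / norm)

# Assumed weights w(a, i); in practice these come from a similarity measure.
w = {1: 1.0, 2: -0.5}
print(predict_vote(votes, a=0, j=2, weights=w))
```

Note that the negatively weighted user still contributes: that user rated item 2 below their own average, which nudges the prediction for the active user upward.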
24. Customer Based Algorithms, Computing the weights
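• K-nearest neighbor:
$w(a,i) = 1$ if $i \in \mathrm{neighbors}(a)$, and $0$ otherwise
• Pearson correlation coefficient (Resnick ’94,
Grouplens):
$w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \sum_j (v_{i,j} - \bar{v}_i)^2}}$,
with sums over the items $j$ both users have voted on
• Cosine similarity (from IR):
$w(a,i) = \sum_j \frac{v_{a,j}}{\sqrt{\sum_{k \in I_a} v_{a,k}^2}} \cdot \frac{v_{i,j}}{\sqrt{\sum_{k \in I_i} v_{i,k}^2}}$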
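The Pearson and cosine weights can be sketched like this. The function names are hypothetical, missing votes are encoded as np.nan by assumption, and returning 0 for degenerate cases (too little overlap, zero variance) is a convention chosen for the example.

```python
import numpy as np

def pearson_weight(va, vi):
    """Pearson correlation over the items both users have voted on (np.nan = no vote)."""
    both = ~np.isnan(va) & ~np.isnan(vi)
    if both.sum() < 2:
        return 0.0
    # Deviations from each user's mean vote (mean over all of that user's votes).
    da = va[both] - np.nanmean(va)
    di = vi[both] - np.nanmean(vi)
    denom = np.sqrt((da ** 2).sum() * (di ** 2).sum())
    return 0.0 if denom == 0 else float((da * di).sum() / denom)

def cosine_weight(va, vi):
    """Cosine of the angle between vote vectors, treating missing votes as 0."""
    a, b = np.nan_to_num(va), np.nan_to_num(vi)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)

u1 = np.array([1.0, 2.0, 3.0])
u2 = np.array([2.0, 4.0, 6.0])   # same taste as u1 on a wider scale
u3 = np.array([3.0, 2.0, 1.0])   # opposite taste to u1

print(pearson_weight(u1, u2), pearson_weight(u1, u3))
print(cosine_weight(u1, u2), cosine_weight(u1, u3))
```

The example shows why the two measures can disagree: Pearson assigns opposite tastes a weight of -1, while cosine on raw votes still reports a fairly high similarity because all votes are positive.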
25. Customer Based Algorithms, Computing the weights
• Cosine with “inverse user frequency” $f_j = \log(n/n_j)$, where $n$ is the
number of users and $n_j$ is the number of users voting for item $j$; each vote
$v_{i,j}$ is multiplied by $f_j$ before the cosine is computed
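A minimal sketch of inverse user frequency, with f_j = log(n / n_j): items that nearly everyone votes on carry little information about taste and are down-weighted. The vote matrix is hypothetical, 0 is assumed to mean "no vote", and applying the weights by multiplying each vote by f_j before taking the cosine is the variant assumed here.

```python
import numpy as np

# Hypothetical vote matrix: rows are users, columns are items; 0 = no vote.
votes = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [1.0, 5.0, 2.0],
    [2.0, 0.0, 0.0],
])

def inverse_user_frequency(votes):
    n = votes.shape[0]               # number of users
    n_j = (votes != 0).sum(axis=0)   # number of users who voted on item j
    # f_j = log(n / n_j); items nobody voted on get weight 0.
    return np.where(n_j > 0, np.log(n / np.maximum(n_j, 1)), 0.0)

def cosine_iuf(votes, a, i):
    f = inverse_user_frequency(votes)
    va, vi = votes[a] * f, votes[i] * f  # transformed votes f_j * v
    denom = np.linalg.norm(va) * np.linalg.norm(vi)
    return 0.0 if denom == 0 else float(va @ vi / denom)

f = inverse_user_frequency(votes)
print(f)  # item 0, voted on by all 4 users, gets weight log(4/4) = 0
print(cosine_iuf(votes, 0, 2))
```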
26. Customer Based Algorithms, Evaluation
• Split users into train/test sets
• For each user a in the test set:
– split a’s votes into observed (I) and to-predict (P)
– predict the votes in P from the observed votes
– accuracy metric: measure the average absolute deviation
between predicted and actual votes in P
– ranking metric: form a ranked list from the predictions;
assume (a) the utility of the k-th item in the list is
$\max(v_{a,j} - d, 0)$, where $d$ is a “default vote”, and
(b) the probability of reaching rank k drops exponentially in k.
Score the list by its expected utility $R_a$
• Average $R_a$ over all test users
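Both evaluation measures can be sketched for a single test user. The held-out votes are invented, and the exponential decay is parameterized by an assumed `half_life` (the rank at which the chance the user still looks at the item has dropped to one half); the slide only says the probability drops exponentially in k.

```python
def mean_absolute_deviation(predicted, actual):
    """Average |predicted - actual| over the to-predict items P."""
    return sum(abs(predicted[j] - actual[j]) for j in actual) / len(actual)

def expected_utility(predicted, actual, default_vote, half_life=5):
    """Ranked-list score R_a with exponentially decaying rank weights."""
    # Rank the items in P by predicted vote, best first.
    ranked = sorted(actual, key=lambda j: predicted[j], reverse=True)
    score = 0.0
    for k, j in enumerate(ranked, start=1):
        utility = max(actual[j] - default_vote, 0)       # max(v_{a,j} - d, 0)
        score += utility / 2 ** ((k - 1) / (half_life - 1))  # assumed decay form
    return score

# Hypothetical held-out votes and predictions for one test user.
actual = {"item1": 5, "item2": 2, "item3": 4}
predicted = {"item1": 4.5, "item2": 3.0, "item3": 3.5}

print(mean_absolute_deviation(predicted, actual))
print(expected_utility(predicted, actual, default_vote=3))
```

Averaging the second score over all test users gives the overall ranking metric; the first score is averaged the same way for the accuracy metric.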
27. Collaborative Filtering Systems- Review
• User-based approach:
– Look for users who share the same rating
patterns with the active user (the user whom
the prediction is for).
– Use the ratings from those like-minded users
to calculate a prediction for the active user.
• Item-based approach:
– Build an item-item matrix determining
relationships between pairs of items.
– Using the matrix and the data on the current
user, infer the user’s taste.
28. Collaborative Filtering highlights
• Use other users’ recommendations (ratings) to
judge an item’s utility
• Key is to find users/user groups whose interests
match with the current user
• Vector Space model widely used (directions of
vectors are user specified ratings)
• More users, more ratings: better results
• Can account for items dissimilar to the ones seen
in the past too
• Example: Movielens.org
29. Collaborative Filtering Limitations
• Different users might use different scales.
Possible solution: weighted ratings, i.e.
deviations from average rating
• Finding similar users/user groups isn’t very
easy
• New user: No preferences available
• New item: No ratings available
• Demographic filtering is required
• Multi-criteria ratings are required
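The "deviations from average rating" fix for differing scales, mentioned in the first bullet above, can be illustrated briefly. The two rating vectors are hypothetical: both users order the items identically, but one rates generously and the other strictly.

```python
import numpy as np

# Hypothetical ratings on the same three items.
generous = np.array([5.0, 4.0, 3.0])
strict = np.array([3.0, 2.0, 1.0])

def deviations_from_mean(ratings):
    # Replace each rating by its deviation from the user's own average rating.
    return ratings - ratings.mean()

print(deviations_from_mean(generous))  # [ 1.  0. -1.]
print(deviations_from_mean(strict))    # [ 1.  0. -1.]
```

After centering, the two users look identical, so a similarity measure applied to the deviations no longer penalizes the difference in scale.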
30. Challenges in Recommendation algorithms
• A large retailer might have huge amounts of data, tens of
millions of customers and millions of distinct catalog items.
• Many applications require the result set to be returned in
real time, in no more than half a second, while still
producing high-quality recommendations.
• New customers typically have extremely limited
information, based on only a few purchases or product
ratings.
• Older customers can have a glut of information, based on
thousands of purchases and ratings.
• Customer data is volatile: Each interaction provides
valuable customer data, and the algorithm must respond
immediately to new information.
31. Model Based Algorithms
32. Model Based Methods
• Model or content-based methods treat the
recommendations problem as a search for
related items.
• Given the user’s purchased and rated items, the
algorithm constructs a search query to find other
popular items by the same author, artist, or
director, or with similar keywords or subjects.
– If a customer buys the Godfather DVD Collection, for
example, the system might recommend other crime
drama titles, other titles starring Marlon Brando, or
other movies directed by Francis Ford Coppola.
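The Godfather example can be sketched as a search over item metadata. The catalog below is a tiny illustrative stand-in (with simplified genre labels) for a retailer's database, and scoring one point per shared director, genre, or star is an assumption made for the example, not a method stated in the slides.

```python
# Illustrative catalog; a real system would query the retailer's item metadata.
CATALOG = {
    "The Godfather": {"director": "Francis Ford Coppola",
                      "stars": {"Marlon Brando", "Al Pacino"}, "genre": "crime drama"},
    "Apocalypse Now": {"director": "Francis Ford Coppola",
                       "stars": {"Marlon Brando", "Martin Sheen"}, "genre": "war"},
    "On the Waterfront": {"director": "Elia Kazan",
                          "stars": {"Marlon Brando"}, "genre": "crime drama"},
    "Goodfellas": {"director": "Martin Scorsese",
                   "stars": {"Robert De Niro", "Ray Liotta"}, "genre": "crime drama"},
    "Toy Story": {"director": "John Lasseter",
                  "stars": {"Tom Hanks", "Tim Allen"}, "genre": "animation"},
}

def related_titles(title, catalog=CATALOG):
    seed = catalog[title]
    scored = []
    for other, meta in catalog.items():
        if other == title:
            continue
        # One point per shared attribute: director, genre, and each shared star.
        score = (meta["director"] == seed["director"]) + (meta["genre"] == seed["genre"])
        score += len(meta["stars"] & seed["stars"])
        if score > 0:
            scored.append((score, other))
    # Highest score first; ties broken alphabetically for a stable order.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]

print(related_titles("The Godfather"))
# ['Apocalypse Now', 'On the Waterfront', 'Goodfellas']
```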
33. Content based RS highlights
• Recommend items similar to those users
preferred in the past
• User profiling is the key
• Items/content usually denoted by keywords
• Matching “user preferences” with “item
characteristics” … works for textual
information
• Vector Space Model widely used
34. Content based RS - Limitations
• Not all content is well represented by
keywords, e.g. images
• Items represented by same set of features are
indistinguishable
• Overspecialization: unrated items not shown
• Users with thousands of purchases are a
problem
• New user: No history available
• Shouldn’t show items that are too different,
or too similar
35. Other issues, not addressed much
• Combining and weighting different types of information
sources
– How much is a web page link worth vs a link in a newsgroup?
• Spamming—how to prevent vendors from biasing
results?
• Efficiency issues—how to handle a large community?
• What do we measure when we evaluate CF?
– Predicting actual rating may be useless!
– Example: music recommendations:
• Beatles, Eric Clapton, Stones, Elton John, Led Zep, the Who, ...
– What’s useful and new? Answering this needs a model of the user’s prior
knowledge, not just the user’s tastes.
• Subjectively better recs result from “poor” distance metrics
36. References
• http://www.cs.duke.edu/csed/socialnet/workshop/2006/assign/cf-4up.pdf
• http://www.deitel.com/ResourceCenters/Web20/RecommenderSystems/RecommenderSystemsandCollaborativeFiltering/tabid/1318/Default.aspx
• http://public.research.att.com/~volinsky/netflix/RecSys08tutorial.pdf
• http://www.grouplens.org/papers/pdf/www10_sarwar.pdf
• http://web4.cs.ucl.ac.uk/staff/jun.wang/blog/topics/research-resources/collaborative-filtering/
• http://webwhompers.com/collaborative-filtering.html
37. Mohammad-Ali Abbasi (Ali)
Ali is a Ph.D. student at the Data Mining
and Machine Learning Lab, Arizona
State University.
His research interests include Data
Mining, Machine Learning, Social
Computing, and Social Media Behavior
Analysis.
http://www.public.asu.edu/~mabbasi2/
The underlying assumption of the collaborative filtering (CF) approach is that those who agreed in the past tend to agree again in the future. For example, a collaborative filtering or recommendation system for television tastes could predict which television shows a user is likely to enjoy, given a partial list of that user's tastes (likes or dislikes)[1]. Note that these predictions are specific to the user, but use information gleaned from many users. This differs from the simpler approach of giving an average (non-specific) score for each item of interest, for example based on its number of votes. [wiki]