Chapter -11 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber

Data Mining:
Concepts and Techniques

— Chapter 11 —
Additional Theme: Collaborative Filtering & Data
Mining

Jiawei Han and Micheline Kamber
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber. All rights reserved
04/18/13 Data Mining: Principles and Algorithms
1

2

Outline
 Motivation
 Systems in Action
 A Conceptual Framework
 User-User Methods
 Item-Item Methods
 Recent Advances and Open Problems

3

Motivation
 User Perspective
 Lots of online products, books, movies, etc.

 Reduce my choices…please…

 Manager Perspective

“If I have 3 million customers on the web, I should have 3 million stores on the web.”
CEO of Amazon.com [SCH01]

4

Example: Recommendation

Customers who bought this book also bought:

•Data Preparation for Data Mining: by Dorian Pyle (Author)
•The Elements of Statistical Learning: by T. Hastie, et al
•Data Mining: Introductory and Advanced Topics: by Margaret H. Dunham
•Mining the Web: Analysis of Hypertext and Semi Structured Data

5

Example: Personalization

6

Other Examples
 Movielens: movies
 Moviecritic: movies again
 My launch: music
 Gustos starrater: web pages
 Jester: Jokes
 TV Recommender: TV shows
 Suggest 1.0 : different products
 And much more…

7

How it Works?
 Each user has a profile
 Users rate items
 Explicitly: score from 1..5

 Implicitly: web usage mining

 Time spent viewing the item
 Navigation path
 Etc…
 System does the rest, How?
 This is what we will show today

8

Basic Approaches
 Collaborative Filtering (CF)
 Look at users' collective behavior

 Look at the active user history

 Combine!

 Content-based Filtering
 Recommend items based on key-words

 More appropriate for information retrieval

9

Collaborative Filtering: A Framework

[Figure: m × n ratings matrix with users U = {u1, …, um} as rows and items I = {i1, …, in} as columns; a few ratings are known (e.g. 3, 1.5, 2) and entries such as rij = ? are unknown]

Unknown function f: U × I → R

The task:
Q1: Find unknown ratings?
Q2: Which items should we recommend to this user?

10

Collaborative Filtering Road Map
 Identify like-minded users

 Memory-based: K-NN

 Model-based: Clustering

 Item-Item Method
 Identify buying patterns

 Correlation Analysis

 Linear Regression

 Belief Network

 Association Rule Mining

11

User-User Similarity: Intuition

[Figure: target customer]

Q1: How to measure similarity?
Q2: How to select neighbors?
Q3: How to combine?
12

How to Measure Similarity?
 Pearson correlation coefficient

$$w_p(a,i) = \frac{\sum_{j \in \text{commonly rated items}} (r_{aj} - \bar{r}_a)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in \text{commonly rated items}} (r_{aj} - \bar{r}_a)^2} \, \sqrt{\sum_{j \in \text{commonly rated items}} (r_{ij} - \bar{r}_i)^2}}$$

 Cosine measure
 Users are vectors in product-dimension space

$$w_c(a,i) = \frac{r_a \cdot r_i}{\lVert r_a \rVert_2 \, \lVert r_i \rVert_2}$$
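The two similarity measures can be sketched in a few lines of Python. This is only an illustrative sketch: the dict-based data layout, the function names, and the choice to take each user's mean over all of that user's ratings are assumptions, not details given in the slides.

```python
from math import sqrt

def pearson(ra, ri):
    """Pearson correlation over the items both users rated.

    ra, ri: dicts mapping item id -> rating (hypothetical layout).
    Assumption: each mean is taken over that user's full rating set.
    """
    common = set(ra) & set(ri)
    if not common:
        return 0.0  # assumption: no overlap is treated as zero similarity
    mean_a = sum(ra.values()) / len(ra)
    mean_i = sum(ri.values()) / len(ri)
    num = sum((ra[j] - mean_a) * (ri[j] - mean_i) for j in common)
    den = (sqrt(sum((ra[j] - mean_a) ** 2 for j in common))
           * sqrt(sum((ri[j] - mean_i) ** 2 for j in common)))
    return num / den if den else 0.0

def cosine(ra, ri):
    """Cosine between two users' rating vectors (unrated items count as 0)."""
    dot = sum(ra[j] * ri[j] for j in set(ra) & set(ri))
    norm_a = sqrt(sum(v * v for v in ra.values()))
    norm_i = sqrt(sum(v * v for v in ri.values()))
    return dot / (norm_a * norm_i) if norm_a and norm_i else 0.0

# Toy users (made-up ratings)
alice = {"i1": 5, "i2": 3, "i3": 4}
bob = {"i1": 4, "i2": 2, "i3": 3}
carol = {"i1": 1, "i2": 5, "i3": 2}
```

With these toy ratings, alice and bob deviate from their own means in the same way, so their Pearson weight is close to 1, while alice and carol get a negative weight.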

13

Nearest Neighbor Approaches
[SAR00a]
 Offline phase:
 Do nothing…just store transactions

 Online phase:
 Identify highly similar users to the active one

 Best K ones
 All with a measure greater than a threshold
 Prediction

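$$r_{aj} = \bar{r}_a + \frac{\sum_i w(a,i)\,(r_{ij} - \bar{r}_i)}{\sum_i w(a,i)}$$

where \bar{r}_a is user a's neutral (mean) rating, r_{ij} - \bar{r}_i is user i's deviation, and the fraction is user a's estimated deviation.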
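A minimal Python sketch of this prediction step follows. It assumes the neighbors have already been selected and weighted; the `(weight, ratings)` pair layout and the fallback to the user's mean are illustrative assumptions. The denominator follows the formula as written in the slide; many implementations sum absolute weights instead.

```python
def predict(active, weighted_neighbors, item):
    """Predict the active user's rating for `item`.

    active: dict item -> rating for user a.
    weighted_neighbors: list of (w, ratings) pairs, where w = w(a, i)
    and ratings is neighbor i's dict item -> rating (hypothetical layout).
    """
    mean_a = sum(active.values()) / len(active)
    num = den = 0.0
    for w, ratings in weighted_neighbors:
        if item not in ratings:
            continue  # this neighbor has no opinion on the item
        mean_i = sum(ratings.values()) / len(ratings)
        num += w * (ratings[item] - mean_i)  # neighbor's weighted deviation
        den += w
    if den == 0:
        return mean_a  # assumption: fall back to the user's mean rating
    return mean_a + num / den

active_user = {"i1": 4, "i2": 2}
neighbors = [
    (1.0, {"i1": 5, "i2": 3, "i3": 5}),
    (0.5, {"i1": 4, "i2": 2, "i3": 2}),
]
prediction = predict(active_user, neighbors, "i3")
```

Here the first neighbor rates i3 above their own mean and the second below it, so the prediction ends up slightly above the active user's mean of 3.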
14

Horting Method [ AGG99 ]
 K-NN is not transitive
 Horting takes advantage of transitivity
 Uses new similarity measure: Predictability
 User i predicts user a if
 They have rated sufficiently common items

 There is an error-bounded linear transformation from user i’s ratings to user a’s

15

How Horting Works?
 Offline phase: build neighborhood graph
 Online phase: Compute raj

1- Identify users who predict ua
2- Identify users who rated j
3- Find shortest paths from group 1 to group 2
4- Backward propagation and averaging

- Better for sparse environments
- Not well evaluated

16

Clustering [BRE98]
 Offline phase:
 Build clusters: k-means, k-medoids, etc.

 Online phase:
 Identify the nearest cluster to the active user

 Prediction:

 Use the center of the cluster (faster)
 Weighted average between cluster members (slower but a little more accurate)
 Weights depend on the active user
17

Clustering vs. k-NN Approaches
 k-NN using the Pearson measure is slower but more accurate
 Clustering is more scalable

[Figure: active user; bad recommendations]

We can use soft clustering, but we will lose the computational edge
18

Did We Answer the Questions?

[Figure: target customer]

Q1: How to measure similarity?
Q2: How to select neighbors?
Q3: How to combine?
19

Are We Done?
 Q1: How to measure similarity? Done... Really?

What about sparsity?
 Not enough common items implies spurious neighbors and hence bad recommendations

Sparsity results from the poor representation! Example from [SAR00P]:
 U1 rates recycled letter pads High
 U2 rates recycled memo pads High
 Both of them like recycled office products
 They are similar, but the math won’t work for that: the sum over commonly rated items in w_p(a, i) has no terms

By working at the right level of abstraction we can eliminate sparsity
20

The Power of Representation [UNG98]

Action Foreign Classic

Q1-B: How can we formalize this intuition?
21

How to Abstract?
 Semi-manual Methods
 Use product features

 Cluster products first, then cluster users

 Works only if we have descriptive features

 Automatic Methods
 Adjusted Product Taxonomy

 Latent Semantic Indexing

22

Adjusted Product Taxonomy [CHO04]
• Input: product taxonomy
• Output: modified taxonomy with even distribution

23

Adjusted Product Taxonomy (2)

[Figure: number of transactions having each category, using the original taxonomy vs. using the adjusted taxonomy]

24

Latent Semantic Indexing [SAR00b]

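$$R_{m \times n} = U_{m \times r} \, S_{r \times r} \, I'_{r \times n}$$

$$R_k = U_k \, S_k \, I_k', \quad U_k: m \times k, \quad S_k: k \times k, \quad I_k': k \times n$$

The reconstructed matrix R_k = U_k S_k I_k' is the closest rank-k matrix to the original matrix R.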

• Captures latent associations
• Reduced space is less-noisy
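The rank-k reconstruction can be sketched with NumPy's SVD. This is a minimal sketch under stated assumptions: the ratings matrix is treated as dense (how missing ratings are filled in is not covered here), and the function name and toy matrix are made up for illustration.

```python
import numpy as np

def rank_k_approximation(R, k):
    """Return R_k = U_k S_k I_k', keeping only the k largest singular values."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy 4 x 3 user-item matrix (made-up ratings, assumed fully observed)
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [2.0, 1.0, 4.0]])

R2 = rank_k_approximation(R, 2)
# In the Frobenius norm, the error of the best rank-2 approximation
# equals the discarded third singular value.
error = np.linalg.norm(R - R2)
```

Predictions would then be read from `R2` instead of `R`, so that users are compared in the reduced, less noisy space.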

25

Are We Done? (2)
 Q2: How to select neighbors? Not adequately answered
 We don’t expect to use the same neighbors for all products
 Neighbors should be product-category specific

Q2-B: How can we determine whether or not a user is relevant to a given product?

26

Selecting Relevant Instances
[YU01]

[Figure: user-movie ratings table; the Batman rating is the one to predict]

 Superman and Batman are correlated
 Titanic and Batman are negatively correlated
 “Dances with Wolves” has nothing to do with Batman’s rating
 Karen is not a good instance to consider

How can we formalize this?
 Mutual Information
 MI(X;Y) = H(X) – H(X|Y)
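The definition MI(X;Y) = H(X) – H(X|Y) can be computed directly from paired observations. The sketch below assumes binary like/dislike votes and uses made-up data; the variable names simply echo the movies in the example.

```python
from collections import Counter
from math import log2

def entropy(values):
    """H(X) in bits, estimated from a list of observations."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def conditional_entropy(xs, ys):
    """H(X|Y) = sum over y of p(y) * H(X | Y = y)."""
    n = len(xs)
    total = 0.0
    for y, count in Counter(ys).items():
        xs_given_y = [x for x, yy in zip(xs, ys) if yy == y]
        total += count / n * entropy(xs_given_y)
    return total

def mutual_information(xs, ys):
    return entropy(xs) - conditional_entropy(xs, ys)

# Made-up votes of eight users (1 = like, 0 = dislike)
batman   = [1, 1, 0, 0, 1, 0, 1, 0]
superman = [1, 1, 0, 0, 1, 0, 1, 0]  # agrees with batman
titanic  = [0, 0, 1, 1, 0, 1, 0, 1]  # always the opposite of batman
wolves   = [1, 1, 1, 1, 0, 0, 0, 0]  # unrelated to batman
```

In this toy data both the positively and the negatively correlated movie carry one full bit of information about the Batman vote, while the unrelated movie carries none, which is why mutual information is a reasonable relevance measure here.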

27

Selecting Relevant Instances (2)
 Offline phase:
 Estimate mutual information between items

 For each item:
 Find users who rated it
 Compute their strength (how many relevant items they also rated)
 Retain a subset of them (10% works fine)
 Online phase:
 To predict the target item’s rating, run k-NN on its reduced instance space

Better results with less data: quality, not quantity, is what matters

28

Are We Done? (3)
 Q3:How to combine?
 Weighted average
 Discover association rules in neighbors’ transactions
[LEE01, WAN04]
 For every x in this group:
like(x, Item1) ^ like(x, Item2) ⇒ like(x, Item3)
 Use confidence and support to judge the quality of the
prediction
 Prediction is done on the binary level (like, dislike)
 Costly to run online

29

User-User Methods Evaluation
 Achieve good quality in practice
 The more processing we push offline, the better the method scales
 However:
 User preference is dynamic

 High update frequency of offline-calculated
information
 No recommendation for new users
 We don’t know much about them yet

30

Collaborative Filtering Road Map
 Identify like-minded users

 Memory-based: K-NN

 Model-based: Clustering

 Item-Item Method
 Identify buying patterns
 Correlation Analysis
 Linear Regression
 Belief Network
 Association Rule Mining
31

Item-Item Similarity: The Intuition
 Search for similarities among items
 All computations can be done offline
 Item-Item similarity is more stable than user-user similarity
 No need for frequent updates

 First Order Models
 Correlation Analysis
 Linear Regression
 Higher Order Models
 Belief Network
 Association Rule Mining

32

Correlation-based Methods [SAR01]

 Same as in user-user similarity but on item vectors
 Pearson correlation coefficient
 Look for users who rated both items

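[Figure: user-item ratings matrix with users u1 … um as rows and item columns i1 … ii … ij … in]

$$s_{ij} = \frac{\sum_{u} (r_{uj} - \bar{r}_j)(r_{ui} - \bar{r}_i)}{\sqrt{\sum_{u} (r_{uj} - \bar{r}_j)^2} \, \sqrt{\sum_{u} (r_{ui} - \bar{r}_i)^2}}$$

where every sum runs over the users u who rated both items i and j.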

33

Correlation-based Methods (2)
 Offline phase:
 Calculate n(n-1) similarity measures

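 For each item
 Determine its k most similar items
 Online phase:
 Predict the rating for a given user-item pair as a weighted sum over the similar items that the user rated

$$r_{aj} = \frac{\sum_{i \in \text{similar items}} s_{ij} \, r_{ai}}{\sum_{i \in \text{similar items}} s_{ij}}$$

[Figure: user ua's row of ratings 2, 3, ?, 4, where ? is the unknown rating for item j]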
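The online step of the item-item method reduces to a weighted average. A minimal sketch, assuming the k most similar items and their similarities to the target item were computed offline; the dict layout and the values are made up for illustration.

```python
def predict_item_based(user_ratings, similar_items):
    """Weighted sum over the similar items that the user has rated.

    user_ratings: dict item -> rating by the active user a.
    similar_items: dict item i -> similarity s_ij to the target item j
                   (hypothetical precomputed neighbor list).
    """
    num = den = 0.0
    for i, s in similar_items.items():
        if i in user_ratings:  # only items the user actually rated count
            num += s * user_ratings[i]
            den += s
    return num / den if den else None  # None: no rated neighbor item

ratings_a = {"i1": 2, "i2": 3, "i4": 4}
neighbors_of_j = {"i1": 0.2, "i2": 0.5, "i4": 0.8, "i5": 0.9}
estimate = predict_item_based(ratings_a, neighbors_of_j)
```

Item i5 is similar to the target but unrated by the user, so it is skipped; the estimate is pulled toward the rating of i4, the most similar rated item.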

34

Regression Based Methods [VUC00]
 Offline phase:
 Fit n(n-1) linear regressions

 f_ij(x) is a linear transformation of a user's rating on item i to his rating on item j
 Online phase:
 Same as previous method
 The weights are inversely proportional to the regression error rates

$$r_{aj} = \frac{\sum_{i \in \text{items rated by } a} w_{ij} \, f_{ij}(r_{ai})}{\sum_{i \in \text{items rated by } a} w_{ij}}$$
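A rough sketch of the regression-based scheme in Python. The least-squares fit, the use of mean squared error as the "error rate", and the small `eps` that avoids division by zero are all assumptions made for illustration; [VUC00] may define these details differently.

```python
def fit_line(xs, ys):
    """Least-squares fit y ~ slope * x + intercept; returns (slope, intercept, mse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n
    return slope, intercept, mse

def predict_regression(user_ratings, models, eps=1e-6):
    """models: dict item i -> (slope, intercept, mse) describing f_ij for target j."""
    num = den = 0.0
    for i, (slope, intercept, mse) in models.items():
        if i in user_ratings:
            w = 1.0 / (mse + eps)  # weight inversely proportional to the error
            num += w * (slope * user_ratings[i] + intercept)
            den += w
    return num / den if den else None

# Offline: ratings on item i (xs) and on target j (ys) by users who rated both
models_for_j = {
    "i1": fit_line([1, 2, 3, 4], [2, 3, 4, 5]),  # perfect linear relation
    "i2": fit_line([1, 2, 3, 4], [5, 1, 4, 2]),  # noisy relation
}
# Online: combine the per-item predictions for one user
estimate_j = predict_regression({"i1": 3, "i2": 5}, models_for_j)
```

Because the i1 model fits perfectly, it dominates the weighted average and the estimate lands very close to its prediction of 4.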

35

Higher Order Models
 Previous approaches used the Naïve Bayes
assumption
 Item effects on a given one are independent

 Not always true
 Higher order models can do better
 Belief Network
 Association Rule Mining
36

Bayesian Belief Network: introduction

 Bayesian belief network allows a subset of the variables to
be conditionally independent
 A graphical model of causal relationships
 Represents dependency among the variables

 Gives a specification of joint probability distribution

Nodes: random variables
Links: dependency

[Figure: network with nodes X, Y, Z, P]
 X and Y are the parents of Z, and Y is the parent of P
 No dependency between Z and P
 Has no loops or cycles
37

Bayesian Belief Network: An Example

[Figure: network with nodes FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea]

The conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of its parents:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC      0.8       0.5        0.7        0.1
~LC     0.2       0.5        0.3        0.9

Bayesian Belief Networks:

$$P(z_1, \ldots, z_n) = \prod_{i=1}^{n} P(z_i \mid \mathrm{Parents}(Z_i))$$
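As a worked instance of this factorization, take the four-node network described earlier, in which X and Y are the parents of Z and Y is the parent of P. X and Y have no parents, so under that structure the joint distribution would factor as:

```latex
P(x, y, z, p) = P(x)\, P(y)\, P(z \mid x, y)\, P(p \mid y)
```

In the LungCancer example, the factor for LungCancer is read directly from its conditional probability table, e.g. P(LC | FH, ~S) = 0.5 and P(~LC | ~FH, ~S) = 0.9.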

38

Belief Network for CF [BRE98]
 Every item is a node
 Binary rating (like, dislike)
 Learn offline a belief network over the training data
 CPT table at each node is represented as a decision tree
 Use greedy algorithms to determine the best network
structure
 Use probabilistic inference for online prediction

39

Belief Network for CF: An Example

[Figure: CPT of node M.P, drawn as a decision tree over the nodes Friends and B.H, with probabilities at the leaves]

Decision tree for the random variable “Melrose Place” in the movie domain

40

Association Rule Mining
 Offline processing
 Work on the binary level (like, dislike)

 View user as market basket containing items liked by user
 Discover association rules between items

 Online processing:
 Match the items that the active user likes against the rules' left-hand sides
 Recommend the rules’ consequents based on support and confidence
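The offline and online phases can be sketched as follows. This is a deliberately tiny brute-force miner (left-hand sides of at most two items), not an Apriori implementation; the basket contents, thresholds, and function names are made up for illustration.

```python
from itertools import combinations

def mine_rules(baskets, min_support, min_confidence):
    """Offline: find rules lhs => rhs with |lhs| <= 2 over liked items."""
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    rules = []
    for size in (1, 2):
        for lhs in combinations(items, size):
            lhs = frozenset(lhs)
            s_lhs = support(lhs)
            if s_lhs == 0:
                continue
            for rhs in items:
                if rhs in lhs:
                    continue
                s = support(lhs | {rhs})
                conf = s / s_lhs
                if s >= min_support and conf >= min_confidence:
                    rules.append((lhs, rhs, s, conf))
    return rules

def recommend(liked, rules):
    """Online: fire rules whose left-hand side is contained in the liked items."""
    scores = {}
    for lhs, rhs, s, conf in rules:
        if lhs <= liked and rhs not in liked:
            scores[rhs] = max(scores.get(rhs, (0, 0)), (conf, s))
    # rank consequents by confidence, then support
    return sorted(scores, key=lambda item: scores[item], reverse=True)

# Each basket holds the items one user likes (made-up data)
baskets = [
    {"Book1", "Book2", "Book3"}, {"Book1", "Book2", "Book3"},
    {"Book1", "Book2"}, {"Movie1", "Movie2"},
    {"Movie1", "Movie2"}, {"Book1", "Movie1"},
]
rules = mine_rules(baskets, min_support=0.3, min_confidence=0.6)
```

A user who likes Book1 and Book2 is then matched only by book rules, and a user who likes Movie1 only by movie rules.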

41

Association Rule Mining : Problems
 High support threshold leads to low coverage and may
eliminate important, but infrequent items from
consideration

 Low support thresholds result in very large model sizes,
computationally expensive offline pattern discovery phase
and slower online matching phase

 Solution:
 Adaptive Association Rule Mining

42

Adaptive Association Rule Mining [LIN01]

 Given:
 transaction dataset
 target item
 desired range for the number of rules
 specified minimum confidence

 Find: set S of association rules for the target item such that
 the number of rules in S is in the given range
 rules in S satisfy the minimum confidence constraint
 rules in S have higher support than rules not in S that satisfy the above constraints

[Figure: minConfidence, desired number of rules, minSupport]

43

Adaptive Association Rule Mining (2)

 Discover rules with one item in the head
 Like(x, item1) ^ Like(x, item2) ⇒ Like(x, target)

 The miner discovers association rules iteratively
(for each target item) until the desired number of
rules are extracted

 Support is adjusted per-item
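The idea of adjusting support per target item can be illustrated with a simple loop that lowers the support threshold until enough rules are found. This is a simplified sketch and not the algorithm of [LIN01]: the brute-force rule search, the linear scan over thresholds, and all names are assumptions made for illustration.

```python
from itertools import combinations

def rules_for_target(baskets, target, min_support, min_confidence, max_lhs=2):
    """Rules lhs => target with at most max_lhs items on the left-hand side."""
    n = len(baskets)
    items = sorted(set().union(*baskets) - {target})
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(items, size):
            lhs = frozenset(lhs)
            n_lhs = sum(lhs <= b for b in baskets)
            n_both = sum(lhs <= b and target in b for b in baskets)
            if (n_lhs and n_both / n >= min_support
                    and n_both / n_lhs >= min_confidence):
                found.append((lhs, n_both / n, n_both / n_lhs))
    return found

def adaptive_rules(baskets, target, min_rules, max_rules, min_confidence):
    """Lower the support threshold step by step until enough rules appear."""
    n = len(baskets)
    rules = []
    for count in range(n, 0, -1):  # candidate thresholds count / n, high to low
        rules = rules_for_target(baskets, target, count / n, min_confidence)
        if len(rules) >= min_rules:
            break
    # if the last step overshot the range, keep the highest-support rules
    rules.sort(key=lambda r: r[1], reverse=True)
    return rules[:max_rules]

baskets = [
    {"Book1", "Book2", "Book3"}, {"Book1", "Book2", "Book3"},
    {"Book1", "Book2"}, {"Movie1", "Movie2"},
    {"Movie1", "Movie2"}, {"Book1", "Movie1"},
]
book3_rules = adaptive_rules(baskets, "Book3", min_rules=2, max_rules=3,
                             min_confidence=0.6)
```

For the target Book3 the loop stops at a support of 2/6, where two rules meet the confidence constraint; a different target item may stop at a different threshold, which is the per-item adjustment.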

44

Item-Item Methods: Why It Works?

Like(x, Book1) ^ Like(x, Book2) ⇒ Like(x, Book3)
 Support: Book1, Book2 (the book gang)

Like(x, Movie1) ⇒ Like(x, Movie2)
 Support: Movie1 (the movie gang)

 We use the right neighbors for each item
 Without discovering the groups themselves, thus eliminating costly online matching
 In general better quality than user-user methods and better response time [LIN03]
45

Recent Work and Open Problems
 Order-based methods
 Ordering items is more informative than rating them

 [KAM03] developed k-o’means to work on orders

 Preference-based methods
 Total ordering of items is not feasible

 Work on partial orders (preferences) [COH99]

 Integrating background knowledge
 User demographic information, item-features, etc..

 Modeling time
 Sequential patterns

46

References (1)
 Charu C. Aggarwal, Joel L. Wolf, Kun-Lung Wu, Philip S. Yu: Horting Hatches
an Egg: A New Graph-Theoretic Approach to Collaborative Filtering. KDD
1999: 201-212
 J. Breese, D. Heckerman, C. Kadie: Empirical Analysis of Predictive
Algorithms for Collaborative Filtering. In Proc. 14th Conf. Uncertainty in
Artificial Intelligence, Madison, July 1998.
 Yoon Ho Cho and Jae Kyeong Kim: Application of Web usage mining and
product taxonomy to collaborative recommendations in e-commerce. Expert
Systems with Applications, 26(2), 2004
 William W. Cohen, Robert E. Schapire, and Yoram Singer. Learning to order
things. In Advances in Neural Information Processing Systems 10, Denver, CO, 1997
 Jiawei Han, Fall 2003 online course notes available at:
http://www-courses.cs.uiuc.edu/~cs397han/slides/05.ppt
 Toshihiro Kamishima: Nantonac collaborative filtering: recommendation
based on order responses. KDD 2003: 583-588
 Lee, C.-H, Kim, Y.-H., Rhee, P.-K. Web personalization expert with combining
collaborative filtering and association rule mining technique. Expert Systems
with Applications, v 21, n 3, October, 2001, p 131-137

47

References (2)
 W. Lin, 2001P, online presentation available at:
http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/LinAlvarezRuiz_WebKDD2000.ppt
 Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. Efficient adaptive-support
association rule mining for recommender systems. Data Mining and
Knowledge Discovery, 6:83--105, 2002
 G. Linden, B. Smith, and J. York, "Amazon.com Recommendations: Item-to-item
collaborative filtering", IEEE Internet Computing, Vol. 7, No. 1, pp. 76-80,
Jan. 2003.
 Badrul M. Sarwar, George Karypis, Joseph A. Konstan, John Riedl: Analysis
of recommendation algorithms for e-commerce. ACM Conf. Electronic
Commerce 2000: 158-167
 B. Sarwar, G. Karypis, J. Konstan, and J. Riedl: Application of dimensionality
reduction in recommender systems--a case study. In ACM WebKDD 2000
Web Mining for E-Commerce Workshop, 2000.
 B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based
collaborative filtering recommendation algorithms. WWW’01

48

References (3)
 B. Sarwar, 2000P, online presentation available at:
http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/badrul.ppt
 J. Ben Schafer, Joseph A. Konstan, John Riedl: E-Commerce
Recommendation Applications. Data Mining and Knowledge Discovery 5(1/2):
115-153, 2001
 L.H. Ungar and D.P. Foster: Clustering Methods for Collaborative Filtering,
AAAI Workshop on Recommendation Systems, 1998.
 Yi-Fan Wang, Yu-Liang Chuang, Mei-Hua Hsu and Huan-Chao Keh: A
personalized recommender system for the cosmetic business. Expert
Systems with Applications, v 26, n 3, April, 2004 Pages 427-434
 S. Vucetic and Z. Obradovic. A regression-based approach for scaling-up
personalized recommender systems in e-commerce. In ACM WebKDD 2000
Web Mining for E-Commerce Workshop, 2000.
 Kai Yu, Xiaowei Xu, Martin Ester, and Hans-Peter Kriegel: Selecting relevant
instances for efficient accurate collaborative filtering. In Proceedings of the
10th CIKM, pages 239--246. ACM Press, 2001.
 Cheng Zhai, Spring 2003 online course notes available at:
http://sifaka.cs.uiuc.edu/course/2003-497CXZ/loc/cf.ppt
