Overview of Clustering
Rong Jin
Outline
- K-means for clustering
- Expectation-Maximization (EM) algorithm for clustering
- Spectral clustering (if time permits)
Clustering
(Figure: scatter plot of data points with axes "age" and "$$$")
- Find the underlying structure of the given data points
Application (I): Search Result Clustering
Application (II): Navigation
Application (III): Google News
Application (IV): Visualization
Islands of Music (Pampalk et al., KDD '03)
Application (V): Image Compression
http://www.ece.neu.edu/groups/rpl/kmeans/
How to Find a Good Clustering?
- Minimize the sum of distances within clusters
(Figure: data points grouped into clusters C1-C5)

$$\{m_{i,j},\, C_j\} = \arg\min_{m,\,C} \sum_{j=1}^{k} \sum_{i=1}^{n} m_{i,j}\,\|x_i - C_j\|^2$$

$$m_{i,j} = \begin{cases} 1 & x_i \in \text{the } j\text{-th cluster} \\ 0 & x_i \notin \text{the } j\text{-th cluster} \end{cases}
\qquad \sum_{j} m_{i,j} = 1 \;\;(\text{any } x_i \text{ belongs to a single cluster})$$
How to Cluster Data Efficiently?

$$\{m_{i,j},\, C_j\} = \arg\min_{m,\,C} \sum_{j=1}^{k} \sum_{i=1}^{n} m_{i,j}\,\|x_i - C_j\|^2$$

- Memberships $m_{i,j}$ and centers $C_j$ are correlated, so optimize them alternately:
- Given memberships $m_{i,j}$:
$$C_j = \frac{\sum_{i=1}^{n} m_{i,j}\, x_i}{\sum_{i=1}^{n} m_{i,j}}$$
- Given centers $\{C_j\}$:
$$m_{i,j} = \begin{cases} 1 & j = \arg\min_{k'} \|x_i - C_{k'}\|^2 \\ 0 & \text{otherwise} \end{cases}$$
K-means for Clustering
- K-means:
  - Start with a random guess of the cluster centers
  - Determine the membership of each data point
  - Adjust the cluster centers
  - (The last two steps are repeated until the assignments stop changing)
K-means
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each data point finds out which center it is closest to (thus each center "owns" a set of data points).
4. Each center finds the centroid of the points it owns.
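These four steps translate almost line for line into code. Below is a minimal NumPy sketch, not code from the slides: the function name `kmeans`, the convergence test, and the toy data are illustrative choices. Steps 3 and 4 are exactly the two closed-form updates from the "How to Cluster Data Efficiently?" slide.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: random initial centers, then alternate the
    membership step (3) and the center step (4)."""
    rng = np.random.default_rng(seed)
    # Step 2: randomly guess k cluster center locations (here: k random data points)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step 3: each data point finds the center it is closest to
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Step 4: each center moves to the centroid of the points it owns
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # stop when the centers no longer move
            break
        centers = new_centers
    return centers, labels

# Toy usage: two well-separated blobs, k = 2
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
centers, labels = kmeans(X, k=2)
```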
Any Computational Problem?
Computational complexity: O(N), where N is the number of points.
Improve K-means
- Group points by region
  - KD-tree
  - SR-tree
- Key difference
  - Find the closest center for each rectangle
  - Assign all the points within a rectangle to one cluster
Improved K-means
- Find the closest center for each rectangle (see the sketch below)
- Assign all the points within a rectangle to one cluster
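A rough sketch of that rectangle test, assuming axis-aligned bounding boxes such as KD-tree leaf cells. This is a simplified illustration of the pruning idea rather than the exact algorithm cited above; the function names are mine. A center may claim an entire rectangle only when its farthest possible distance to the rectangle is still smaller than every other center's nearest possible distance.

```python
import numpy as np

def box_min_sq_dist(c, lo, hi):
    # Squared distance from center c to the nearest point of the box [lo, hi]
    return ((c - np.clip(c, lo, hi)) ** 2).sum()

def box_max_sq_dist(c, lo, hi):
    # Squared distance from center c to the farthest corner of the box [lo, hi]
    farthest = np.where(np.abs(c - lo) > np.abs(c - hi), lo, hi)
    return ((c - farthest) ** 2).sum()

def assign_rectangle(centers, lo, hi):
    """Return the index of a center that is provably closest to every point
    inside the rectangle [lo, hi], or None if no single center dominates."""
    min_d = np.array([box_min_sq_dist(c, lo, hi) for c in centers])
    max_d = np.array([box_max_sq_dist(c, lo, hi) for c in centers])
    best = int(min_d.argmin())
    others = np.delete(min_d, best)
    # If the worst case for `best` still beats every other center's best case,
    # all points within the rectangle can be assigned to `best` at once.
    return best if max_d[best] < others.min() else None

# Usage: the rectangle near (9, 9)-(9.5, 9.6) is owned entirely by center 1
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
print(assign_rectangle(centers, lo=np.array([9.0, 9.0]), hi=np.array([9.5, 9.6])))  # -> 1
```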
A Gaussian Mixture Model for Clustering
- Assume that the data are generated from a mixture of Gaussian distributions
- For each Gaussian distribution:
  - Center: $\mu_i$
  - Variance: $\sigma_i$ (ignored here)
- For each data point:
  - Determine the membership $z_{ij}$: whether $x_i$ belongs to the $j$-th cluster
Learning a Gaussian Mixture (with known covariance)
- Probability $p(x_i)$, where $\phi_j$ denotes the $j$-th Gaussian component:
$$p(x_i) = \sum_j p(x_i,\, x_i \leftarrow \phi_j) = \sum_j p(\phi_j)\, p(x_i \mid x_i \leftarrow \phi_j)
 = \sum_j p(\phi_j)\, \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\|x_i - \mu_j\|^2}{2\sigma^2}\right)$$
Learning a Gaussian Mixture (with known covariance)
- Log-likelihood of the data:
$$\log \prod_i p(x_i) = \sum_i \log \sum_j p(\phi_j)\, \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\|x_i - \mu_j\|^2}{2\sigma^2}\right)$$
- Apply MLE to find the optimal parameters $\{p(\phi_j),\, \mu_j\}$
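As a concreteness check, the log-likelihood above can be evaluated directly. A minimal NumPy sketch, assuming spherical Gaussians with a shared known variance `sigma2`; the function name and toy numbers are illustrative only.

```python
import numpy as np

def gmm_log_likelihood(X, mu, prior, sigma2):
    """log prod_i p(x_i) for a mixture of spherical Gaussians with
    known shared variance sigma2; prior[j] = p(phi_j), mu[j] = mu_j."""
    n, d = X.shape
    # ||x_i - mu_j||^2 for every (i, j) pair: shape (n, K)
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # p(phi_j) * N(x_i; mu_j, sigma2 * I): shape (n, K)
    comp = prior * np.exp(-sq / (2 * sigma2)) / (2 * np.pi * sigma2) ** (d / 2)
    return float(np.log(comp.sum(axis=1)).sum())

X = np.array([[0.0, 0.0], [0.2, -0.1], [4.0, 4.0]])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
print(gmm_log_likelihood(X, mu, prior=np.array([0.5, 0.5]), sigma2=1.0))
```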
Learning a Gaussian Mixture (with known covariance)
- E-Step:
$$E[z_{ij}] = p(x_i \leftarrow \phi_j \mid x_i)
 = \frac{p(x_i \mid x_i \leftarrow \phi_j)\, p(\phi_j)}{\sum_{n=1}^{k} p(x_i \mid x_i \leftarrow \phi_n)\, p(\phi_n)}
 = \frac{\exp\!\left(-\frac{1}{2\sigma^2}\|x_i - \mu_j\|^2\right) p(\phi_j)}{\sum_{n=1}^{k} \exp\!\left(-\frac{1}{2\sigma^2}\|x_i - \mu_n\|^2\right) p(\phi_n)}$$
Learning a Gaussian Mixture (with known covariance)
- M-Step:
$$\mu_j = \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}
\qquad\qquad
p(\phi_j) = \frac{1}{m}\sum_{i=1}^{m} E[z_{ij}]$$
Gaussian Mixture Example: Start
After First Iteration
After 2nd Iteration
After 3rd Iteration
After 4th Iteration
After 5th Iteration
After 6th Iteration
After 20th Iteration
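The progression shown in these figures can be reproduced with a compact EM loop that alternates the E-step and M-step above. A minimal NumPy sketch, assuming a shared known variance `sigma2` and random initial centers; the function and variable names are my own.

```python
import numpy as np

def em_gmm(X, k, sigma2=1.0, n_iter=20, seed=0):
    """EM for a mixture of spherical Gaussians with known variance sigma2.
    Estimates the centers mu_j and the priors p(phi_j)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]      # initial guess of the centers
    prior = np.full(k, 1.0 / k)                       # initial p(phi_j)
    for _ in range(n_iter):
        # E-step: E[z_ij] proportional to p(phi_j) * exp(-||x_i - mu_j||^2 / (2 sigma2))
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        resp = prior * np.exp(-sq / (2 * sigma2))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted means and updated priors
        mu = (resp.T @ X) / resp.sum(axis=0)[:, None]
        prior = resp.mean(axis=0)
    return mu, prior

# Toy usage: two Gaussian blobs, 20 EM iterations (as in the figures above)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(100, 2)), rng.normal(size=(100, 2)) + 4.0])
mu, prior = em_gmm(X, k=2)
```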
Mixture Model for Document Clustering
- A set of language models $\Theta = \{\theta_1, \theta_2, \ldots, \theta_K\}$, where each
$$\theta_i = \{\,p(w_1 \mid \theta_i),\; p(w_2 \mid \theta_i),\; \ldots,\; p(w_V \mid \theta_i)\,\}$$
Mixture Model for Document Clustering
- Probability of a document $d_i$:
$$p(d_i) = \sum_j p(d_i,\, d_i \leftarrow \theta_j) = \sum_j p(\theta_j)\, p(d_i \mid d_i \leftarrow \theta_j)
 = \sum_j p(\theta_j) \prod_{k=1}^{V} p(w_k \mid \theta_j)^{tf(w_k,\, d_i)}$$
Mixture Model for Document Clustering
- Introduce the hidden variable $z_{ij}$
- $z_{ij}$: document $d_i$ is generated by the $j$-th language model $\theta_j$
Learning a Mixture Model
- E-Step (K: number of language models):
$$E[z_{ij}] = p(d_i \leftarrow \theta_j \mid d_i)
 = \frac{p(d_i \mid d_i \leftarrow \theta_j)\, p(\theta_j)}{\sum_{n=1}^{K} p(d_i \mid d_i \leftarrow \theta_n)\, p(\theta_n)}
 = \frac{\prod_{m=1}^{V} p(w_m \mid \theta_j)^{tf(w_m,\, d_i)}\, p(\theta_j)}{\sum_{n=1}^{K} \prod_{m=1}^{V} p(w_m \mid \theta_n)^{tf(w_m,\, d_i)}\, p(\theta_n)}$$
Learning a Mixture Model
- M-Step (N: number of documents, $|d_i|$: length of document $d_i$):
$$p(\theta_j) = \frac{1}{N}\sum_{i=1}^{N} E[z_{ij}]
\qquad\qquad
p(w_k \mid \theta_j) = \frac{\sum_{i=1}^{N} E[z_{ij}]\, tf(w_k,\, d_i)}{\sum_{i=1}^{N} E[z_{ij}]\, |d_i|}$$
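The E-step/M-step pair for the document mixture can be written compactly over a term-frequency matrix. A minimal NumPy sketch, working in log space for numerical stability; the small smoothing constant and all function/variable names are my own additions, not from the slides.

```python
import numpy as np

def em_doc_mixture(tf, K, n_iter=50, seed=0, smooth=1e-3):
    """EM for a mixture of unigram language models.
    tf[i, k] = tf(w_k, d_i); K = number of language models."""
    rng = np.random.default_rng(seed)
    N, V = tf.shape
    word_prob = rng.dirichlet(np.ones(V), size=K)     # p(w_k | theta_j), each row sums to 1
    prior = np.full(K, 1.0 / K)                       # p(theta_j)
    for _ in range(n_iter):
        # E-step: log p(d_i | theta_j) = sum_k tf(w_k, d_i) * log p(w_k | theta_j)
        log_post = tf @ np.log(word_prob).T + np.log(prior)    # shape (N, K)
        log_post -= log_post.max(axis=1, keepdims=True)        # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)                 # E[z_ij]
        # M-step
        prior = resp.mean(axis=0)                               # p(theta_j)
        counts = resp.T @ tf + smooth                           # expected word counts per model
        word_prob = counts / counts.sum(axis=1, keepdims=True)  # p(w_k | theta_j)
    return prior, word_prob, resp

# Toy usage: four documents over a 4-word vocabulary, two "topics"
tf = np.array([[5, 3, 0, 0], [4, 2, 1, 0], [0, 0, 6, 2], [1, 0, 5, 3]], dtype=float)
prior, word_prob, resp = em_doc_mixture(tf, K=2)
```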
Examples of Mixture Models
Other Mixture Models
- Probabilistic latent semantic indexing (PLSI)
- Latent Dirichlet Allocation (LDA)
Problems (I)
- Both k-means and mixture models need to compute cluster centers and an explicit distance measure
- Given an unusual distance measure, the cluster centers can be hard to compute
- E.g., $\|x - x'\| = \max\left(\,|x_1 - x'_1|,\, |x_2 - x'_2|,\, \ldots,\, |x_n - x'_n|\,\right)$
(Figure: example points x, y, z under this distance)
Problems (II)
- Both k-means and mixture models look for compact clustering structures
- In some cases, connected clustering structures are more desirable
Graph Partition
- MinCut: split the graph into two parts with the minimal number of cut edges
(Figure: example graph with CutSize = 2)
2-way Spectral Graph Partitioning
- Weight matrix W; $w_{i,j}$ is the weight between vertices i and j
- Membership vector q:
$$q_i = \begin{cases} +1 & \text{vertex } i \in \text{Cluster A} \\ -1 & \text{vertex } i \in \text{Cluster B} \end{cases}$$
$$\mathbf{q} = \arg\min_{\mathbf{q} \in \{-1,+1\}^n} \text{CutSize},
\qquad \text{CutSize} = J = \frac{1}{4}\sum_{i,j} w_{i,j}\,(q_i - q_j)^2$$
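The cut-size objective is easy to evaluate for a given ±1 membership vector. A tiny NumPy sketch of the formula above; the graph and names are illustrative. Note that with a symmetric W every undirected edge appears twice in the double sum, so the value comes out proportional to the number of cut edges.

```python
import numpy as np

def cut_size(W, q):
    """J = 1/4 * sum_{i,j} w_ij (q_i - q_j)^2 for q_i in {-1, +1}."""
    diff = q[:, None] - q[None, :]
    return 0.25 * float((W * diff ** 2).sum())

# 4-node path graph 0-1-2-3; put {0, 1} in cluster A and {2, 3} in cluster B
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
q = np.array([1, 1, -1, -1])
print(cut_size(W, q))   # 2.0: one cut edge, counted once in each direction
```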
Solving the Optimization Problem
- Directly solving the above problem requires a combinatorial search, i.e., exponential complexity
- How can the computational complexity be reduced?
$$\mathbf{q}^* = \arg\min_{\mathbf{q} \in \{-1,+1\}^n} \frac{1}{4}\sum_{i,j} w_{i,j}\,(q_i - q_j)^2$$
Relaxation Approach
- Key difficulty: each $q_i$ has to be either $-1$ or $+1$
- Relax $q_i$ to be any real number
- Impose the constraint $\sum_{i=1}^{n} q_i^2 = n$

$$J = \frac{1}{4}\sum_{i,j} (q_i - q_j)^2\, w_{i,j}
 = \frac{1}{4}\sum_{i,j} \left(q_i^2 + q_j^2 - 2 q_i q_j\right) w_{i,j}
 = \frac{1}{2}\sum_i q_i^2\, d_i - \frac{1}{2}\sum_{i,j} q_i q_j\, w_{i,j}$$
where $d_i = \sum_j w_{i,j}$ and $D = \mathrm{diag}(d_1, \ldots, d_n)$, so
$$J = \frac{1}{2}\,\mathbf{q}^T (D - W)\,\mathbf{q} \;\propto\; \mathbf{q}^T (D - W)\,\mathbf{q}$$
Relaxation Approach
$$\mathbf{q}^* = \arg\min_{\mathbf{q}} J = \arg\min_{\mathbf{q}} \mathbf{q}^T (D - W)\,\mathbf{q}
\qquad \text{subject to } \sum_k q_k^2 = n$$
Relaxation Approach
- Solution: the eigenvector of $D - W$ with the second smallest eigenvalue
$$(D - W)\,\mathbf{q} = \lambda_2\, \mathbf{q}$$
Graph Laplacian
$$L = D - W, \qquad W = [\,w_{i,j}\,], \qquad D = \mathrm{diag}(d_1, \ldots, d_n), \quad d_i = \sum_j w_{i,j}$$
- L is a positive semi-definite matrix
- For any x, $\mathbf{x}^T L\, \mathbf{x} \ge 0$ (why?)
- The minimum eigenvalue is $\lambda_1 = 0$ (what is the corresponding eigenvector?)
$$0 = \lambda_1 \le \lambda_2 \le \lambda_3 \le \ldots \le \lambda_n$$
- The second smallest eigenvalue $\lambda_2$ gives the best bipartition of the graph
Recovering Partitions
- Due to the relaxation, the entries of q can be any real number (not just $-1$ and $+1$)
- How do we construct a partition from the eigenvector? Simple strategy:
$$A = \{\,i \mid q_i \ge 0\,\}, \qquad B = \{\,i \mid q_i < 0\,\}$$
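Putting the last few slides together, a two-way spectral partition needs only the Laplacian $D - W$, its second smallest eigenvector, and the sign-based split. A minimal NumPy sketch under those assumptions; the toy graph and function name are mine.

```python
import numpy as np

def spectral_bipartition(W):
    """Split a graph into two parts using the eigenvector of L = D - W
    associated with the second smallest eigenvalue (the relaxed q)."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
    q = eigvecs[:, 1]                      # eigenvector for the 2nd smallest eigenvalue
    A = np.where(q >= 0)[0]
    B = np.where(q < 0)[0]
    return A, B

# Toy usage: two triangles {0,1,2} and {3,4,5} joined by the single edge (2,3)
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(spectral_bipartition(W))             # expected split: {0,1,2} vs {3,4,5}
```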
Spectral Clustering
- Minimum cut does not balance the sizes of the two parts
Normalized Cut (Shi & Malik, 1997)
- Minimize the similarity between clusters and, at the same time, maximize the similarity within clusters
$$s(A,B) = \sum_{i \in A,\, j \in B} w_{i,j}, \qquad d_A = \sum_{i \in A} d_i, \qquad d_B = \sum_{i \in B} d_i, \qquad d = \sum_j d_j$$
$$J = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B} = \sum_{i \in A,\, j \in B} w_{i,j}\, \frac{d_A + d_B}{d_A\, d_B}$$
- Define
$$q_i = \begin{cases} +\sqrt{d_B \,/\, (d_A\, d)} & \text{if } i \in A \\ -\sqrt{d_A \,/\, (d_B\, d)} & \text{if } i \in B \end{cases}$$
- Then
$$J = \sum_{i \in A,\, j \in B} w_{i,j}\,(q_i - q_j)^2$$
Normalized Cut
$$J = \sum_{i \in A,\, j \in B} w_{i,j}\,(q_i - q_j)^2 = \mathbf{q}^T (D - W)\,\mathbf{q},
\qquad q_i = \begin{cases} +\sqrt{d_B \,/\, (d_A\, d)} & \text{if } i \in A \\ -\sqrt{d_A \,/\, (d_B\, d)} & \text{if } i \in B \end{cases}$$
Normalized Cut
- Relax q to real values under the constraints
$$\mathbf{q}^T D\, \mathbf{q} = 1, \qquad \mathbf{q}^T D\, \mathbf{e} = 0$$
$$\min_{\mathbf{q}}\; J = \mathbf{q}^T (D - W)\,\mathbf{q}$$
- Solution: the generalized eigenvalue problem $(D - W)\,\mathbf{q} = \lambda D\, \mathbf{q}$
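The normalized-cut relaxation differs from the previous sketch only in the eigenproblem: $(D - W)\mathbf{q} = \lambda D \mathbf{q}$ instead of $(D - W)\mathbf{q} = \lambda \mathbf{q}$. One standard way to solve it, sketched below with NumPy, is to symmetrize via $D^{-1/2}(D - W)D^{-1/2}$ and map the eigenvector back; this assumes every vertex has positive degree, and the names are again my own.

```python
import numpy as np

def normalized_cut(W):
    """Two-way normalized cut: solve (D - W) q = lambda * D q by the
    substitution q = D^(-1/2) v, which gives the symmetric problem
    D^(-1/2) (D - W) D^(-1/2) v = lambda * v."""
    d = W.sum(axis=1)                      # assumes d_i > 0 for every vertex
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    q = D_inv_sqrt @ eigvecs[:, 1]         # generalized eigenvector for the 2nd smallest eigenvalue
    return np.where(q >= 0)[0], np.where(q < 0)[0]

# Same toy graph as before: two triangles joined by one edge
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(normalized_cut(W))
```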
Image Segmentation
Non-negative Matrix Factorization