Overview of Clustering
Rong Jin
Outline
 K-means for clustering
 Expectation-Maximization (EM) algorithm for clustering
 Spectral clustering (if time permits)
Clustering
[Figure: scatter plot of data points, axes: $$$ vs. age]
 Find out the underlying structure of the given data points
Application (I): Search Result Clustering
Application (II): Navigation
Application (III): Google News
Application (IV): Visualization
Islands of Music
(Pampalk et al., KDD '03)
Application (V): Image Compression
http://www.ece.neu.edu/groups/rpl/kmeans/
How to Find a Good Clustering?
 Minimize the sum of squared distances within clusters
[Figure: data points grouped into clusters C1, C2, C3, C4, C5]

$$\{m_{i,j}\}, \{C_j\} = \arg\min_{\{m_{i,j}\},\,\{C_j\}} \sum_{j=1}^{K} \sum_{i=1}^{n} m_{i,j}\, \|x_i - C_j\|^2$$

$$m_{i,j} = \begin{cases} 1 & x_i \text{ belongs to the } j\text{-th cluster} \\ 0 & x_i \text{ does not belong to the } j\text{-th cluster} \end{cases}
\qquad \sum_{j=1}^{K} m_{i,j} = 1 \quad (\text{each } x_i \text{ belongs to a single cluster})$$
How to Cluster Data Efficiently?

$$\{m_{i,j}\}, \{C_j\} = \arg\min_{\{m_{i,j}\},\,\{C_j\}} \sum_{j=1}^{K} \sum_{i=1}^{n} m_{i,j}\, \|x_i - C_j\|^2$$

 Memberships m_{i,j} and centers C_j are correlated, so optimize them alternately:

Given memberships m_{i,j}:
$$C_j = \frac{\sum_{i=1}^{n} m_{i,j}\, x_i}{\sum_{i=1}^{n} m_{i,j}}$$

Given centers \{C_j\}:
$$m_{i,j} = \begin{cases} 1 & j = \arg\min_k \|x_i - C_k\|^2 \\ 0 & \text{otherwise} \end{cases}$$
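A minimal NumPy sketch of these alternating updates (illustrative only; the initialization and stopping rule below are my own assumptions, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate the membership and center updates above."""
    rng = np.random.default_rng(seed)
    # Start with a random guess of the cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Membership step: assign each point to its closest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Center step: move each center to the centroid of the points it owns.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```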
K-means for Clustering
 K-means
  Start with a random guess of cluster centers
  Determine the membership of each data point
  Adjust the cluster centers
K-means
1. Ask user how many clusters they'd like. (e.g., k = 5)
2. Randomly guess k cluster center locations
3. Each datapoint finds out which center it's closest to. (Thus each center "owns" a set of datapoints.)
4. Each center finds the centroid of the points it owns
Any computational problem?
Each iteration costs O(N) distance computations per center, where N is the number of points, which is expensive when N is large.
Improve K-means
 Group points by region
  KD tree
  SR tree
 Key difference
  Find the closest center for each rectangle
  Assign all the points within a rectangle to one cluster
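The rectangle-pruning algorithm from the slides is not spelled out in code here; as a hedged sketch of the same idea (use a spatial index instead of a brute-force distance scan), the snippet below indexes the current centers with a SciPy KD-tree and answers all nearest-center queries at once. Names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def assign_with_kdtree(X, centers):
    """Nearest-center assignment accelerated by a KD-tree built over the centers."""
    tree = cKDTree(centers)        # spatial index over the k current centers
    _, labels = tree.query(X)      # index of the closest center for every point
    return labels
```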
A Gaussian Mixture Model for Clustering
 Assume that data are generated from a mixture of Gaussian distributions
 For each Gaussian distribution
  Center: μi
  Variance: σi² (ignored here)
 For each data point
  Determine membership zij: zij = 1 if xi belongs to the j-th cluster
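A small NumPy sketch of this generative story; the mixing weights, means, and shared variance below are made-up illustrative values, not anything from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])                     # p(theta_j), illustrative
means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])  # mu_j, illustrative
sigma = 1.0                                             # shared, known std. dev.

def sample_mixture(n):
    """Draw n points: pick a component z_i, then x_i ~ N(mu_{z_i}, sigma^2 I)."""
    z = rng.choice(len(weights), size=n, p=weights)     # hidden memberships
    x = means[z] + sigma * rng.standard_normal((n, means.shape[1]))
    return x, z

X, z_true = sample_mixture(500)
```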
Learning a Gaussian Mixture
(with known covariance)
 Probability of generating a data point x:

$$p(x) = \sum_j p(x, \theta_j) = \sum_j p(\theta_j)\, p(x \mid \theta_j)
       = \sum_j p(\theta_j)\, \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\|x - \mu_j\|^2}{2\sigma^2}\right)$$
Learning a Gaussian Mixture
(with known covariance)
 Log-likelihood of the data:

$$\log p(X) = \sum_i \log p(x_i)
            = \sum_i \log\left[\sum_j p(\theta_j)\, \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\|x_i - \mu_j\|^2}{2\sigma^2}\right)\right]$$

 Apply MLE to find the optimal parameters $\theta_j = \big(\mu_j,\; p(\theta_j)\big)$
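A short NumPy/SciPy sketch for evaluating this log-likelihood; the log-sum-exp form is my own numerical-stability choice, not from the slides:

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood(X, means, weights, sigma):
    """log p(X) for a spherical Gaussian mixture with known, shared sigma."""
    n, d = X.shape
    # log N(x_i | mu_j, sigma^2 I) for every (i, j) pair
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_gauss = -0.5 * d * np.log(2 * np.pi * sigma**2) - d2 / (2 * sigma**2)
    # sum_i log sum_j p(theta_j) N(x_i | mu_j, sigma^2 I)
    return logsumexp(log_gauss + np.log(weights)[None, :], axis=1).sum()
```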
Learning a Gaussian Mixture
(with known covariance)
E-Step:

$$E[z_{ij}] = p(\theta_j \mid x_i)
            = \frac{p(x_i \mid \theta_j)\, p(\theta_j)}{\sum_{n=1}^{K} p(x_i \mid \theta_n)\, p(\theta_n)}
            = \frac{\exp\!\left(-\frac{1}{2\sigma^2}\|x_i - \mu_j\|^2\right) p(\theta_j)}
                   {\sum_{n=1}^{K} \exp\!\left(-\frac{1}{2\sigma^2}\|x_i - \mu_n\|^2\right) p(\theta_n)}$$
Learning a Gaussian Mixture
(with known covariance)
M-Step:

$$\mu_j = \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]},
\qquad
p(\theta_j) = \frac{1}{m} \sum_{i=1}^{m} E[z_{ij}]$$

m: number of data points
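A compact NumPy sketch of one E-step plus M-step for this spherical mixture with a known, shared σ (initialization and convergence checks are omitted; names are illustrative):

```python
import numpy as np

def em_step(X, means, weights, sigma):
    """One EM iteration for a spherical Gaussian mixture with known sigma."""
    # E-step: responsibilities E[z_ij] = p(theta_j | x_i)
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    unnorm = np.exp(-d2 / (2 * sigma**2)) * weights[None, :]
    resp = unnorm / unnorm.sum(axis=1, keepdims=True)
    # M-step: re-estimate the means and the mixing weights
    new_means = (resp.T @ X) / resp.sum(axis=0)[:, None]
    new_weights = resp.mean(axis=0)
    return new_means, new_weights, resp
```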
Gaussian Mixture Example: Start
After First Iteration
After 2nd Iteration
After 3rd Iteration
After 4th Iteration
After 5th Iteration
After 6th Iteration
After 20th Iteration
Mixture Model for Doc Clustering
 A set of language models $\Theta = \{\theta_1, \theta_2, \ldots, \theta_K\}$, where
  $\theta_j = \{p(w_1 \mid \theta_j),\; p(w_2 \mid \theta_j),\; \ldots,\; p(w_V \mid \theta_j)\}$
 Probability of generating a document $d_i$:

$$p(d_i) = \sum_j p(d_i, \theta_j) = \sum_j p(\theta_j)\, p(d_i \mid \theta_j)
         = \sum_j p(\theta_j) \prod_{k=1}^{V} p(w_k \mid \theta_j)^{tf(w_k,\, d_i)}$$

 Introduce hidden variable $z_{ij}$
  $z_{ij} = 1$: document $d_i$ is generated by the j-th language model $\theta_j$
Learning a Mixture Model
E-Step:

$$E[z_{ij}] = p(\theta_j \mid d_i)
            = \frac{p(d_i \mid \theta_j)\, p(\theta_j)}{\sum_{n=1}^{K} p(d_i \mid \theta_n)\, p(\theta_n)}
            = \frac{p(\theta_j) \prod_{m=1}^{V} p(w_m \mid \theta_j)^{tf(w_m,\, d_i)}}
                   {\sum_{n=1}^{K} p(\theta_n) \prod_{m=1}^{V} p(w_m \mid \theta_n)^{tf(w_m,\, d_i)}}$$

K: number of language models
Learning a Mixture Model
M-Step:

$$p(\theta_j) = \frac{1}{N} \sum_{i=1}^{N} E[z_{ij}],
\qquad
p(w_k \mid \theta_j) = \frac{\sum_{i=1}^{N} E[z_{ij}]\, tf(w_k,\, d_i)}{\sum_{i=1}^{N} E[z_{ij}]\, |d_i|}$$

N: number of documents; $|d_i| = \sum_k tf(w_k, d_i)$ is the length of document $d_i$
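A rough NumPy sketch of these two steps on a term-frequency matrix; the log-space E-step and the small smoothing constant are my own numerical choices, and all names are illustrative:

```python
import numpy as np
from scipy.special import logsumexp

def em_step_docs(TF, log_pw, log_prior):
    """One EM iteration for a multinomial mixture of documents.

    TF:        (N, V) term-frequency matrix
    log_pw:    (K, V) log p(w | theta_j)
    log_prior: (K,)   log p(theta_j)
    """
    # E-step: E[z_ij] = p(theta_j | d_i), computed in log space to avoid underflow
    log_post = TF @ log_pw.T + log_prior[None, :]               # (N, K)
    resp = np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))
    # M-step: re-estimate priors and per-cluster word distributions
    new_prior = resp.mean(axis=0)
    counts = resp.T @ TF + 1e-10                                # (K, V), smoothed
    new_pw = counts / counts.sum(axis=1, keepdims=True)
    return np.log(new_pw), np.log(new_prior), resp
```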
Examples of Mixture Models
Other Mixture Models
 Probabilistic latent semantic indexing (PLSI)
 Latent Dirichlet Allocation (LDA)
Problems (I)
 Both k-means and mixture models need to compute cluster centers and use an explicit distance measure
 Under an unusual distance measure, the cluster centers can be hard to compute
 E.g., the max (Chebyshev) distance:

$$\|x - x'\| = \max\big(|x_1 - x'_1|,\; |x_2 - x'_2|,\; \ldots,\; |x_n - x'_n|\big)$$

[Figure: points x, y, z with $\|x - y\| = \|x - z\|$ under this distance]
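A tiny illustration of this distance; the points are made up, and the closing comment restates the slide's point rather than a formal claim:

```python
import numpy as np

def d_inf(a, b):
    """Max (Chebyshev) distance between two vectors."""
    return np.max(np.abs(np.asarray(a) - np.asarray(b)))

x, y, z = [0.0, 0.0], [3.0, 1.0], [1.0, 3.0]
print(d_inf(x, y), d_inf(x, z))  # 3.0 and 3.0: y and z are equally far from x
# Unlike the squared Euclidean case, the point minimizing the sum of these
# distances within a cluster has no simple closed-form "mean" update.
```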
Problems (II)
 Both k-means and mixture models look for compact clustering structures
 In some cases, connected clustering structures are more desirable
Graph Partition
 MinCut: a bipartition of the graph with the minimal number of cut edges
[Figure: example graph with CutSize = 2]
2-way Spectral Graph Partitioning
 Weight matrix W
  wi,j: the weight between two vertices i and j
 Membership vector q:

$$q_i = \begin{cases} +1 & \text{vertex } i \in \text{Cluster } A \\ -1 & \text{vertex } i \in \text{Cluster } B \end{cases}$$

$$\mathbf{q}^* = \arg\min_{\mathbf{q} \in \{-1,+1\}^n} \mathrm{CutSize},
\qquad
\mathrm{CutSize} = J = \frac{1}{4} \sum_{i,j} w_{i,j}\,(q_i - q_j)^2$$
Solving the Optimization Problem

$$\mathbf{q}^* = \arg\min_{\mathbf{q} \in \{-1,+1\}^n} \frac{1}{4} \sum_{i,j} w_{i,j}\,(q_i - q_j)^2$$

 Directly solving the above problem requires combinatorial search, i.e., exponential complexity
 How can the computational complexity be reduced?
Relaxation Approach
 Key difficulty: each qi has to be either -1 or +1
 Relax qi to be any real number
 Impose the constraint $\sum_{i=1}^{n} q_i^2 = n$

$$J = \frac{1}{4} \sum_{i,j} w_{i,j}\,(q_i - q_j)^2
   = \frac{1}{4} \sum_{i,j} w_{i,j}\,\big(q_i^2 + q_j^2 - 2 q_i q_j\big)
   = \frac{1}{4}\left(2 \sum_i d_i\, q_i^2 - 2 \sum_{i,j} w_{i,j}\, q_i q_j\right)$$

where $d_i = \sum_j w_{i,j}$ and $D = \mathrm{diag}(d_1, \ldots, d_n)$, so

$$J = \frac{1}{2}\, \mathbf{q}^T (D - W)\, \mathbf{q} \;\propto\; \mathbf{q}^T (D - W)\, \mathbf{q}$$
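A quick numerical check of this identity with random symmetric weights and a random ±1 membership vector (the names and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)  # symmetric weights
D = np.diag(W.sum(axis=1))
q = rng.choice([-1.0, 1.0], size=n)

lhs = 0.25 * sum(W[i, j] * (q[i] - q[j]) ** 2 for i in range(n) for j in range(n))
rhs = 0.5 * q @ (D - W) @ q
print(np.isclose(lhs, rhs))  # True
```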
Relaxation Approach

$$\mathbf{q}^* = \arg\min_{\mathbf{q}} J = \arg\min_{\mathbf{q}} \mathbf{q}^T (D - W)\, \mathbf{q}
\quad \text{subject to} \quad \sum_k q_k^2 = n$$

 Solution: the eigenvector of D - W with the second smallest eigenvalue:

$$(D - W)\, \mathbf{q} = \lambda_2\, \mathbf{q}$$
Graph Laplacian

$$L = D - W, \qquad W_{i,j} = w_{i,j}, \quad D = \mathrm{diag}(d_1, \ldots, d_n), \quad d_i = \sum_j w_{i,j}$$

 L is a positive semi-definite matrix
  For any x, we have $x^T L x \ge 0$ (why?)
 The minimum eigenvalue is $\lambda_1 = 0$ (what is the corresponding eigenvector?)

$$0 = \lambda_1 \le \lambda_2 \le \lambda_3 \le \ldots \le \lambda_k$$

 The second smallest eigenvalue $\lambda_2$ gives the best bipartition of the graph
Recovering Partitions
 Due to the relaxation, the entries of q can be any real numbers (not just -1 and +1)
 How to construct a partition based on the eigenvector?
 Simple strategy: $A = \{\, i \mid q_i \ge 0 \,\}, \quad B = \{\, i \mid q_i < 0 \,\}$
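A hedged end-to-end sketch of this 2-way spectral partition (unnormalized Laplacian, thresholding the second eigenvector at 0); the toy graph at the bottom is made up for illustration:

```python
import numpy as np

def spectral_bipartition(W):
    """Split a graph in two using the second smallest eigenvector of L = D - W."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    vals, vecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
    q = vecs[:, 1]                   # eigenvector for the second smallest eigenvalue
    A = np.where(q >= 0)[0]
    B = np.where(q < 0)[0]
    return A, B

# Toy graph: two triangles joined by a single edge (CutSize = 1).
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(spectral_bipartition(W))       # expected: {0, 1, 2} vs {3, 4, 5}
```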
Spectral Clustering
 Minimum cut does not balance the sizes of the two parts of the partition
Normalized Cut (Shi & Malik, 1997)
 Minimize the similarity between clusters and meanwhile maximize the similarity within clusters

$$s(A, B) = \sum_{i \in A,\, j \in B} w_{i,j},
\qquad d_A = \sum_{i \in A} d_i, \quad d_B = \sum_{i \in B} d_i, \quad d = \sum_j d_j$$

$$J = \frac{s(A, B)}{d_A} + \frac{s(A, B)}{d_B}
   = \sum_{i \in A,\, j \in B} w_{i,j}\, \frac{d_B + d_A}{d_A\, d_B}
   = \frac{d}{d_A\, d_B} \sum_{i \in A,\, j \in B} w_{i,j}$$

Defining

$$q_i = \begin{cases} +\sqrt{d_B / (d\, d_A)} & \text{if } i \in A \\ -\sqrt{d_A / (d\, d_B)} & \text{if } i \in B \end{cases}$$

this objective can be written as $J = \frac{1}{2} \sum_{i,j} w_{i,j}\,(q_i - q_j)^2$.
Normalized Cut

$$J = \frac{1}{2} \sum_{i,j} w_{i,j}\,(q_i - q_j)^2 = \mathbf{q}^T (D - W)\, \mathbf{q},
\qquad
q_i = \begin{cases} +\sqrt{d_B / (d\, d_A)} & \text{if } i \in A \\ -\sqrt{d_A / (d\, d_B)} & \text{if } i \in B \end{cases}$$
Normalized Cut
 Relax q to real values under the constraints

$$\mathbf{q}^T D\, \mathbf{q} = 1, \qquad \mathbf{q}^T D\, \mathbf{e} = 0$$

 Solution: the generalized eigenvalue problem

$$(D - W)\, \mathbf{q} = \lambda D\, \mathbf{q}$$
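A hedged sketch of this step using SciPy's generalized symmetric eigensolver; it assumes a connected graph with strictly positive degrees so that D is positive definite, and all names are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cut(W):
    """2-way normalized cut via the generalized problem (D - W) q = lambda D q."""
    d = W.sum(axis=1)
    D = np.diag(d)
    vals, vecs = eigh(D - W, D)   # generalized eigenproblem, eigenvalues ascending
    q = vecs[:, 1]                # skip the trivial constant eigenvector (lambda = 0)
    A = np.where(q >= 0)[0]
    B = np.where(q < 0)[0]
    return A, B
```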
Image Segmentation
Non-negative Matrix Factorization