Cluster Forest

  1. Cluster Forests
     Presented by: Romit Singhai
  2. References
     Cluster Forests
     Donghui Yan, Department of Statistics, University of California, Berkeley
     Aiyou Chen (Google)
     Michael I. Jordan (U.C. Berkeley)
  3. Overview
     • Clustering aims to partition a set of data such that points are “similar” within the same cluster while “dissimilar” across clusters.
       ◦ One of the fundamental tasks in machine learning and pattern classification
       ◦ Applicable in a wide range of scientific and business domains
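To make the definition concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common way to produce such a partition: it alternates between assigning each point to its nearest center and moving each center to the mean of its points, which lowers the within-cluster sum of squares. The slides do not name a base clusterer, so the choice of k-means, squared Euclidean distance, and the function name `kmeans` are assumptions for illustration.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means (illustrative sketch): split `points` (equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(x, centers[c])))
            for x in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


labels, centers = kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
```

In the usage line, the three points near the origin end up sharing one label and the three near (10, 10) share the other; which group is called 0 depends on the random start.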
  4. Challenges
     • Modern data poses additional challenges:
       ◦ High dimensionality
       ◦ Huge number of observations
       ◦ Increasing complexity
  5. Motivation
     • Use an ensemble to achieve the best performance
     • Can we develop a clustering analogue of random forests (RF)?
     • A unifying view of clustering and classification
  6. General Approach
     Cluster ensemble methods generally consist of two stages:
       ◦ Generation of clustering instances
       ◦ Aggregation of multiple clustering instances
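The two stages can be illustrated with a generic cluster ensemble. This is a hypothetical sketch, not the CF algorithm described in the referenced paper. Generation runs k-means (farthest-first initialization, an assumption) on random feature subsets. Aggregation averages the instances into a co-association matrix (the fraction of instances in which two points share a cluster), links pairs co-clustered in more than half of the instances, and reports the connected components. All function names and parameters (`n_instances`, `n_features`, `threshold`) are illustrative.

```python
import random
from itertools import combinations


def _sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _kmeans(points, k, iters=50):
    """Base clusterer (assumed): Lloyd's k-means with farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(list(max(points, key=lambda x: min(_sq(x, c) for c in centers))))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: _sq(x, centers[c])) for x in points]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def generate(data, k, n_instances, n_features, seed=0):
    """Stage 1 (generation): cluster random feature subsets to get diverse instances."""
    rng = random.Random(seed)
    d = len(data[0])
    instances = []
    for _ in range(n_instances):
        feats = rng.sample(range(d), n_features)  # random subset of the features
        projected = [[row[f] for f in feats] for row in data]
        instances.append(_kmeans(projected, k))
    return instances


def co_association(instances, n):
    """Stage 2a: fraction of instances in which each pair of points is co-clustered."""
    m = [[0.0] * n for _ in range(n)]
    for labels in instances:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                m[i][j] += 1 / len(instances)
                m[j][i] = m[i][j]
    return m


def aggregate(m, threshold=0.5):
    """Stage 2b: link pairs above `threshold`; final clusters are connected components."""
    n = len(m)
    parent = list(range(n))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


data = [[0, 0, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0],
        [10, 10, 10, 10], [10, 11, 10, 11], [11, 10, 11, 10]]
instances = generate(data, k=2, n_instances=5, n_features=2)
labels = aggregate(co_association(instances, len(data)))
```

Thresholding plus connected components is chosen here only because it is short; a real ensemble method may use a different aggregation step (for example, a graph-partitioning algorithm on the co-association matrix).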
  7. Algorithmic description of CF
  8. Experiments
     Dataset     # Features   # Classes
     Soybean             35           4
     ImageSeg            19           7
     SPECT               22           2
     Heart               13           2
     Wine                13           3
     WDBC                30           2
     Robot              164           5
     Madelon            500           2
  9. Performance Metrics
     • Proportion of pairs of points with “correct” co-cluster membership:
       Pr = (# correctly clustered pairs / total # pairs) × 100%
     • Clustering accuracy:
       Pc = (# points with “correct” cluster membership / total # points) × 100%
     • Both metrics assume that “true” labels are available for the datasets
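The two metrics can be computed as in the sketch below, under two stated assumptions that the slides do not spell out. First, a pair counts as “correctly clustered” when the clustering and the true labels agree on whether the two points belong together, which makes Pr the Rand index expressed as a percentage. Second, for Pc each cluster is matched to a distinct class using the best one-to-one assignment, found by brute force (so it only suits a small number of clusters). The function names are hypothetical.

```python
from itertools import combinations, permutations


def pair_accuracy(pred, truth):
    """Pr: percentage of point pairs whose co-cluster status agrees with the true labels."""
    pairs = list(combinations(range(len(pred)), 2))
    agree = sum((pred[i] == pred[j]) == (truth[i] == truth[j]) for i, j in pairs)
    return 100.0 * agree / len(pairs)


def clustering_accuracy(pred, truth):
    """Pc: percentage of points whose cluster maps to their true class under the
    best one-to-one cluster-to-class matching (brute force over assignments)."""
    clusters = sorted(set(pred))
    classes = sorted(set(truth))
    # Pad with None so surplus clusters map to "no class" and count as wrong.
    classes += [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == t for p, t in zip(pred, truth)))
    return 100.0 * best / len(pred)


print(pair_accuracy([0, 0, 0, 1], ["a", "a", "b", "b"]))        # 50.0
print(clustering_accuracy([0, 0, 0, 1], ["a", "a", "b", "b"]))  # 75.0
```

Because both metrics only compare labels within the clustering, renaming the clusters (for example, swapping 0 and 1) leaves Pr and Pc unchanged.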
  10. Results under Pr
      Experiments on eight UC Irvine datasets
  11. Results under Pc
  12. Conclusions
      • CF is a cluster ensemble method that incorporates model selection
      • Good empirical performance
