1. Machine Learning: Clustering
   Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo
2. Clustering
   - Grouping data into (hopefully useful) sets.
   [Figure: example points in two groups, labeled "Things on the left" and "Things on the right"]
3. Clustering
   - Unsupervised Learning
     - No labels
   - Why do clustering?
     - Hypothesis generation / data understanding
       - Clusters might suggest natural groups.
     - Visualization
     - Data pre-processing, e.g.:
       - Medical diagnosis
       - Text classification (e.g., search engines, Google Sets)
4. Some definitions
   - Let X = {x_1, x_2, ..., x_N} be the dataset.
   - An m-clustering of X is a partition of X into m sets (clusters) C_1, ..., C_m such that:
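The conditions themselves were an equation image in the original deck and did not survive extraction; they are the standard partition requirements (each cluster is non-empty, the clusters cover X, and no point belongs to two clusters):

```latex
C_i \neq \emptyset, \quad i = 1, \dots, m
\qquad
\bigcup_{i=1}^{m} C_i = X
\qquad
C_i \cap C_j = \emptyset, \quad i \neq j
```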
5. How many possible clusters? (Stirling numbers)
   [Table: the number of possible clusterings, indexed by size of dataset and number of clusters]
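To make the growth concrete, here is a small Python helper (mine, not from the slides) that computes Stirling numbers of the second kind via the standard recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind: the number of ways to
    partition a set of n items into k non-empty clusters."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    # Either the n-th item joins one of the existing k clusters,
    # or it starts a new cluster by itself.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(15, 3))   # 2375101 -- already in the millions
print(stirling2(100, 5))  # a 68-digit number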
6. What does this mean?
   - We can't try all possible clusterings.
   - Clustering algorithms look at a small fraction of all partitions of the data.
   - The exact partitions tried depend on the kind of clustering used.
7. Who is right?
   - Different techniques cluster the same data set DIFFERENTLY.
   - Who is right? Is there a "right" clustering?
8. Classic Example: Half Moons
   - From Batra et al., http://www.cs.cmu.edu/~rahuls/pub/bmvc2008-clustering-rahuls.pdf
9. Steps in Clustering
   - Select Features
   - Define a Proximity Measure
   - Define a Clustering Criterion
   - Define a Clustering Algorithm
   - Validate the Results
   - Interpret the Results
10. Kinds of Clustering
   - Sequential
     - Fast
     - Results depend on data order
   - Hierarchical
     - Start with many clusters
     - Join clusters at each step
   - Cost Optimization
     - Fixed number of clusters (typically)
     - Boundary detection, probabilistic classifiers
11. A Sequential Clustering Method
   - Basic Sequential Algorithmic Scheme (BSAS)
     - S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, London, England, 1999
   - Assumption: the number of clusters is not known in advance.
   - Let:
     - d(x, C) be the distance between feature vector x and cluster C
     - Θ be the threshold of dissimilarity
     - q be the maximum number of clusters
12. BSAS Pseudo Code
   [The pseudocode appeared as an image and is not in the transcript.]
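As a stand-in for the missing image, here is a minimal Python sketch of BSAS as described by Theodoridis and Koutroumbas, assuming Euclidean distance from a point to a cluster's running mean; the function name and these particular choices are mine:

```python
import numpy as np

def bsas(X, theta, q):
    """Basic Sequential Algorithmic Scheme (sketch).

    X     : (n, d) array of feature vectors, processed in order
    theta : dissimilarity threshold
    q     : maximum number of clusters
    Returns a list of clusters, each a list of row indices into X.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[0]]           # the first point starts the first cluster
    means = [X[0].copy()]      # running mean of each cluster

    for i in range(1, len(X)):
        # Distance from x to each cluster: here, distance to the cluster mean.
        dists = [np.linalg.norm(X[i] - m) for m in means]
        k = int(np.argmin(dists))
        if dists[k] > theta and len(clusters) < q:
            # Too dissimilar from every existing cluster: start a new one.
            clusters.append([i])
            means.append(X[i].copy())
        else:
            # Assign to the nearest cluster and update its running mean.
            clusters[k].append(i)
            means[k] += (X[i] - means[k]) / len(clusters[k])
    return clusters
```

Because points are assigned as they arrive and the means drift as clusters grow, the result depends on the data order, which is exactly the sensitivity noted on slide 10.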
13. A Cost-Optimization Method
   - K-means clustering
     - J. B. MacQueen (1967): "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297
   - A greedy algorithm
   - Partitions n samples into k clusters
   - Minimizes the sum of the squared distances to the cluster centers
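Written out, the cost being minimized is the within-cluster sum of squared distances:

```latex
J(C_1, \dots, C_k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad \mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
```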
14. The K-means algorithm
   1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids (means).
   2. Assign each object to the group that has the closest centroid (mean).
   3. When all objects have been assigned, recalculate the positions of the K centroids (means).
   4. Repeat steps 2 and 3 until the centroids no longer move.
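A minimal NumPy sketch of those four steps, assuming centroids are initialized from K random data points (one of the options discussed on the next slide); names are illustrative:

```python
import numpy as np

def kmeans(X, k, max_iter=100, rng=None):
    """Lloyd's algorithm for K-means (sketch).

    X : (n, d) data matrix; k : number of clusters.
    Returns (centroids, labels).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # Step 1: place K initial centroids (here: K distinct random data points).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to the group with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate each centroid as the mean of its points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```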
15. K-means clustering
   - The way to initialize the mean values is not specified.
     - Randomly choose k samples?
   - Results depend on the initial means.
     - Try multiple starting points?
   - Assumes K is known.
     - How do we choose it?
16. [Image-only slide; no transcript text.]
17. [Image-only slide; no transcript text.]
18. EM Algorithm
   - General probabilistic approach to dealing with missing data
   - "Parameters" (model)
     - For MMs: cluster distributions P(x | c_i)
       - For MoGs: mean μ_i and variance σ_i² of each c_i
   - "Variables" (data)
     - For MMs: assignments of data points to clusters
       - Probabilities of these represented as P(c_i | x_j)
   - Idea: alternately optimize parameters and variables
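As a concrete (hypothetical) instance of that alternation, here is a compact EM sketch for a one-dimensional mixture of Gaussians; the responsibilities r play the role of the "variables" P(c_i | x_j), and (w, mu, var) are the "parameters":

```python
import numpy as np

def em_mog_1d(x, k, n_iter=50, rng=None):
    """EM for a 1-D mixture of Gaussians (sketch).

    Alternates:
      E-step: fill in the variables r[j, i] = P(c_i | x_j) given parameters
      M-step: re-fit the parameters (weights w, means mu, variances var)
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = np.full(k, 1.0 / k)                      # mixing weights P(c_i)
    mu = rng.choice(x, size=k, replace=False)    # initial means
    var = np.full(k, x.var() + 1e-6)             # initial variances

    for _ in range(n_iter):
        # E-step: posterior responsibility of each cluster for each point.
        dens = (w / np.sqrt(2 * np.pi * var)
                * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of the parameters.
        n_i = r.sum(axis=0)
        w = n_i / n
        mu = (r * x[:, None]).sum(axis=0) / n_i
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_i + 1e-6
    return w, mu, var
```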
19. [Image-only slide; no transcript text.]
20. [Image-only slide; no transcript text.]
21. [Image-only slide; no transcript text.]
22. Mixture Models for Documents
   - Learn simultaneously P(w | topic), P(topic | doc)
   - From Blei et al., 2003 (http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf)
23. Greedy Hierarchical Clustering
   - Initialize one cluster for each data point
   - Until done:
     - Merge the two nearest clusters
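A naive Python sketch of this loop, assuming single-link distance between clusters (the slide does not specify a linkage) and stopping when a target number of clusters remains:

```python
import numpy as np

def agglomerative(X, target_k):
    """Greedy hierarchical (agglomerative) clustering, naive version.

    Starts with one cluster per point and repeatedly merges the two
    nearest clusters until target_k clusters remain.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]

    def single_link(a, b):
        # Cluster distance = distance between the two closest members.
        return min(np.linalg.norm(X[i] - X[j]) for i in a for j in b)

    while len(clusters) > target_k:
        # Rescan every pair of clusters for the nearest two.
        pairs = [(single_link(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]          # b > a, so index a is unaffected
    return clusters
```

Rescanning all pairs at every merge is what drives the quadratic-or-worse runtime noted in the summary slide, and the stopping rule (here, a target cluster count) is exactly the "hard to tell when you're done" problem.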
24. Hierarchical Clustering on Strings
   - Features = contexts in which strings appear
25. [Image-only slide; no transcript text.]
26. Summary
   - Algorithms:
     - Sequential clustering
       - Requires a key distance threshold; sensitive to data order
     - K-means clustering
       - Requires the number of clusters; sensitive to initial conditions
       - Special case of mixture modeling
     - Greedy agglomerative clustering
       - Naive implementations take at least O(n²) time per merge
       - Hard to tell when you're "done"
