
CS221 Lecture 6 (Fall 2011)


  1. CS 221: Artificial Intelligence, Lecture 6: Advanced Machine Learning. Sebastian Thrun and Peter Norvig. Slide credit: Mark Pollefeys, Dan Klein, Chris Manning
  2. Outline
     - Clustering
       - K-Means
       - EM
       - Spectral Clustering
     - Dimensionality Reduction
  3. The unsupervised learning problem: many data points, no labels
  4. Unsupervised Learning? Google Street View
  5. K-Means: many data points, no labels
  6. K-Means
     - Choose a fixed number of clusters.
     - Choose cluster centers and point-cluster allocations to minimize error.
     - We can't do this by exhaustive search, because there are too many possible allocations.
     - Algorithm:
       - Fix the cluster centers; allocate each point to the closest cluster.
       - Fix the allocation; compute the best cluster centers.
     - x could be any set of features for which we can compute a distance (careful about scaling). A code sketch follows below.
     * From Marc Pollefeys, COMP 256, 2003
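A minimal NumPy sketch of the alternating procedure described on this slide (fix centers, allocate points; fix allocation, recompute centers). The random initialization and the convergence test are simplifying assumptions of mine, not taken from the lecture.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Alternate between allocating points to the closest center and
    recomputing each center as the mean of its allocated points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # assumed: random init from the data
    for _ in range(n_iters):
        # Allocation step: assign every point to its closest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (keep the old center if a cluster ends up empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

As the slide warns, the features should be scaled comparably before clustering, since the Euclidean distance here weights every coordinate equally.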
  7. K-Means
  8. K-Means. * From Marc Pollefeys, COMP 256, 2003
  9. Results of K-Means Clustering: K-means clustering using intensity alone and color alone (panels: image, clusters on intensity, clusters on color). * From Marc Pollefeys, COMP 256, 2003
  10. K-Means
     - K-Means is an approximation to EM.
       - Model (hypothesis space): mixture of N Gaussians
       - Latent variables: correspondence of data points and Gaussians
     - We notice:
       - Given the mixture model, it's easy to calculate the correspondence.
       - Given the correspondence, it's easy to estimate the mixture model.
  11. Expectation Maximization: Idea
     - Data generated from a mixture of Gaussians
     - Latent variables: correspondence between data items and Gaussians
  12. Generalized K-Means (EM)
  13. Gaussians
  14. ML Fitting of Gaussians
  15. Learning a Gaussian Mixture (with known covariance): the E-step and M-step update equations (shown on the slide). A code sketch follows below.
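A sketch of those two steps for a mixture of k spherical Gaussians with a known, shared variance, matching the "known covariance" setting of the slide. Equal mixing weights and the variable names (mu, resp, sigma2) are my assumptions for illustration.

```python
import numpy as np

def em_gmm_known_variance(X, k, sigma2=1.0, n_iters=50, seed=0):
    """EM for a mixture of k spherical Gaussians with known variance sigma2
    and (assumed) equal mixing weights."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]  # initial means drawn from the data
    for _ in range(n_iters):
        # E-step: posterior responsibility of each Gaussian for each point
        # (the expected value of the latent correspondence variable).
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        unnorm = np.exp(-sq_dists / (2 * sigma2))
        resp = unnorm / (unnorm.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate each mean as a responsibility-weighted average.
        mu = (resp.T @ X) / (resp.sum(axis=0)[:, None] + 1e-12)
    return mu, resp
```

If the responsibilities are hardened to 0/1 (each point fully assigned to its closest mean), the two steps reduce to the K-Means updates above, which is the sense in which K-Means approximates EM.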
  16. Expectation Maximization
     - Converges!
     - Proof [Neal & Hinton; McLachlan & Krishnan]:
       - The E and M steps never decrease the data likelihood.
       - It converges to a local maximum or saddle point of the likelihood.
     - But it is subject to local optima.
  17. EM Clustering: Results. http://www.ece.neu.edu/groups/rpl/kmeans/
  18. Practical EM
     - The number of clusters is unknown.
     - Suffers (badly) from local optima.
     - Algorithm:
       - Start a new cluster center if many points are "unexplained".
       - Kill a cluster center that doesn't contribute.
       - Use an AIC/BIC criterion for all of this, if you want to be formal (see the sketch below).
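One way to make the "use AIC/BIC" point concrete: fit mixtures of increasing size and keep the one with the lowest BIC. This uses scikit-learn's GaussianMixture, which is my choice of tool, not something referenced in the lecture.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_k_by_bic(X, k_max=10, seed=0):
    """Fit GMMs with 1..k_max components and return the one with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for k in range(1, k_max + 1):
        # Multiple restarts (n_init) partially guard against the local-optima problem above.
        gmm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model
```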
  19. Spectral Clustering
  20. Spectral Clustering
  21. The Two Spiral Problem
  22. Spectral Clustering: Overview. Pipeline: Data, Similarities, Block Detection. * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
  23. Eigenvectors and Blocks
     - Block matrices have block eigenvectors. For example, the block matrix with rows [1 1 0 0], [1 1 0 0], [0 0 1 1], [0 0 1 1] has eigenvalues λ1 = λ2 = 2 and λ3 = λ4 = 0, with block eigenvectors such as [.71 .71 0 0] and [0 0 .71 .71].
     - Near-block matrices have near-block eigenvectors [Ng et al., NIPS 02]: slightly perturbing the off-diagonal blocks (entries ±0.2) gives eigenvalues λ1 = λ2 = 2.02 and λ3 = λ4 = -0.02, with near-block eigenvectors such as [.71 .69 .14 0] and [0 -.14 .69 .71].
     * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
  24. Spectral Space
     - Items can be put into blocks by their eigenvector coordinates: plot each item using its entries in the top eigenvectors e1 and e2.
     - The resulting clusters are independent of row ordering: permuting the rows and columns of the affinity matrix permutes the eigenvector entries in the same way.
     * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
  25. The Spectral Advantage: the key advantage of spectral clustering is the spectral space representation. * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
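Putting slides 22–25 together, a compact sketch of the pipeline: build a Gaussian affinity matrix, normalize it, take its top eigenvectors as the spectral-space coordinates, and run a simple clustering there. This follows the generic recipe in the spirit of Ng et al., not code from the lecture; the normalization choice and the sklearn KMeans call are my assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Cluster by the top eigenvectors of a normalized Gaussian affinity matrix."""
    # Affinity: A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with a zero diagonal.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalization: L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Spectral space: represent each item by its coordinates in the top-k
    # eigenvectors (rows normalized to unit length).
    eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = eigvecs[:, -k:]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # Near-block affinities give near-block eigenvectors, so the clusters
    # separate easily in this space.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```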
  26. Measuring Affinity: intensity, texture, and distance. * From Marc Pollefeys, COMP 256, 2003
  27. Scale affects affinity. * From Marc Pollefeys, COMP 256, 2003
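A tiny illustration of that point, assuming the usual Gaussian affinity exp(-d^2 / (2 sigma^2)) over some per-pixel feature (intensity here); the feature values are made up. With a small sigma, only near-identical intensities look similar; with a large sigma, everything looks alike.

```python
import numpy as np

def gaussian_affinity(features, sigma):
    """Pairwise affinity exp(-||f_i - f_j||^2 / (2 sigma^2)) between feature vectors."""
    sq = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2 * sigma ** 2))

# Three 1-D intensities: two close together, one far away.
intensities = np.array([[0.10], [0.15], [0.90]])
print(np.round(gaussian_affinity(intensities, sigma=0.05), 3))  # sharp separation
print(np.round(gaussian_affinity(intensities, sigma=2.0), 3))   # everything similar
```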
  28. (Figure slide.) * From Marc Pollefeys, COMP 256, 2003
  29. Dimensionality Reduction
  30. The Space of Digits (in 2D). M. Brand, MERL
  31. Dimensionality Reduction with PCA
  32. Linear: Principal Components
     - Fit a multivariate Gaussian.
     - Compute the eigenvectors of its covariance matrix.
     - Project onto the eigenvectors with the largest eigenvalues.
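A minimal sketch of exactly those three steps, using a plain eigendecomposition of the sample covariance; the function name and return values are my choices.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top principal components of its covariance."""
    mean = X.mean(axis=0)                     # fit the Gaussian: mean ...
    Xc = X - mean
    cov = np.cov(Xc, rowvar=False)            # ... and covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvectors of the covariance
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]            # eigenvectors with the largest eigenvalues
    return Xc @ components, components, mean  # projected data, basis, mean
```

The eigenfaces on the next slides are these top eigenvectors when each row of X is a flattened, aligned face image, and the mean face is the `mean` computed in the first step.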
  33. Other examples of unsupervised learning: the mean face (after alignment). Slide credit: Santiago Serrano
  34. Eigenfaces. Slide credit: Santiago Serrano
  35. Non-Linear Techniques
     - Isomap
     - Local Linear Embedding (LLE)
     Isomap: M. Balasubramanian and E. Schwartz, Science.
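Both techniques are available off the shelf; a hedged usage example with scikit-learn (my choice of library, not one referenced in the lecture) on a synthetic swiss roll rather than data from the slides.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)  # 3-D points on a curled 2-D sheet

# Isomap: preserve geodesic (graph shortest-path) distances in the 2-D embedding.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# LLE: preserve each point's local linear reconstruction weights.
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)

print(X_iso.shape, X_lle.shape)  # both (1000, 2)
```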
  36. SCAPE (Drago Anguelov et al.)
