Data-driven modeling: Lecture 10

Published in: Education, Technology

  1. Data-driven modeling, APAM E4990. Jake Hofman, Columbia University. April 9, 2012.
  2. Clustering images. Clustering is an unsupervised learning task by which we look for structure in the data, grouping similar examples together, e.g., find groups of similar pixels within a single image.
  3. Clustering images. Clustering is an unsupervised learning task by which we look for structure in the data, grouping similar examples together, e.g., find groups of similar images across a collection of images.
  4. K-means clustering. K-means: represent each cluster by the average of its points. Learn by iteratively updating cluster means and point assignments.
  5. K-means clustering. K-means:
       Choose number of clusters
       Initialize cluster centers
       While not converged:
         Assign each point to closest cluster
         Update cluster centers
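The loop on the K-means slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the lecture: the function name `kmeans`, the random choice of K data points as initial centers, the squared Euclidean distance, and the "no assignment changed" convergence test are all assumptions made here for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct points drawn from the data.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Treat the run as converged once no point changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels

# Toy data: two well-separated blobs of 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 0.1, size=(50, 2))])
centers, labels = kmeans(X, k=2)
print(centers)
```

Different initializations can lead to different local optima on harder data, so in practice the loop is often restarted several times and the best result kept.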
  10. Clustering pixels. Find groups of similar pixels within a single image (e.g. “the bright red circles”). Represent each pixel as a separate example with its (R,G,B) value as a 3-d feature vector.
  11. Images as arrays. Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities:
        import matplotlib.image as mpimg
        I = mpimg.imread('chairs.jpg')
  13. Clustering pixels. Group pixels within candy.jpg into 7 clusters: ./cluster_pixels.py candy.jpg 7
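The contents of cluster_pixels.py are not shown on the slides, so the following is only a sketch of the idea under stated assumptions: a small synthetic array stands in for the image returned by `mpimg.imread`, K is set to 2 rather than 7, and a fixed number of K-means iterations replaces a proper convergence check. The key step is reshaping the M × N × 3 array into an (M·N) × 3 matrix so that each pixel is one example with an (R,G,B) feature vector.

```python
import numpy as np

# Hypothetical stand-in for a loaded image: left half reddish, right half bluish.
rng = np.random.default_rng(0)
I = np.zeros((20, 30, 3))
I[:, :15] = [0.9, 0.1, 0.1]
I[:, 15:] = [0.1, 0.1, 0.9]
I = np.clip(I + rng.normal(0.0, 0.02, I.shape), 0.0, 1.0)

M, N, _ = I.shape
X = I.reshape(M * N, 3)  # one row per pixel, columns are (R, G, B)

K = 2
# Assumed initialization: K distinct pixels chosen at random.
centers = X[rng.choice(M * N, size=K, replace=False)].copy()
for _ in range(20):  # fixed iteration count, a simplification
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    for j in range(K):
        if np.any(labels == j):
            centers[j] = X[labels == j].mean(axis=0)

label_image = labels.reshape(M, N)            # cluster index of each pixel
quantized = centers[labels].reshape(M, N, 3)  # each pixel replaced by its cluster mean
print(np.round(centers, 2))
```

Displaying `quantized` (for example with `matplotlib.pyplot.imshow`) shows the image redrawn with only K colors, which is a quick visual check that the clusters correspond to meaningful color groups.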
  14. Clustering images. Find groups of similar images within a collection of images (e.g. “warm” photos). Represent each image with a binned RGB intensity histogram.
  15. Intensity histograms. Disregard all spatial information, simply count pixels by intensities (e.g. lots of pixels with bright green and dark blue).
  16. Intensity histograms. How many bins for pixel intensities? Too many bins gives a noisy, overly complex representation of the data, while using too few bins results in an overly simple one.
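One way to build such a feature vector is sketched below. The slides do not say whether the histogram is a joint 3-d histogram over (R,G,B) or three separate per-channel histograms; this example assumes the joint version, assumes float intensities in [0, 1], and uses a hypothetical helper name `rgb_histogram`. It also prints how the feature size grows with the number of bins, which is the trade-off the slide describes.

```python
import numpy as np

def rgb_histogram(image, bins):
    """Flattened, normalized bins x bins x bins color histogram.

    Assumes an M x N x 3 array of float intensities in [0, 1].
    Spatial layout is discarded: only color counts remain.
    """
    pixels = image.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=bins, range=[(0, 1)] * 3)
    return hist.ravel() / hist.sum()

# Hypothetical stand-in for a photo: 32 x 32 pixels of random color.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))

for bins in (2, 8, 32):
    h = rgb_histogram(image, bins)
    # Feature length is bins**3; with many bins most cells are empty.
    print(bins, h.size, np.count_nonzero(h))
```

With 32 bins per channel the vector has 32,768 entries but the image has only 1,024 pixels, so nearly all cells are zero and two similar images may share few non-empty cells. With 2 bins there are only 8 cells and very different images can look alike.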
  17. Clustering images. Group ’vivid’ images into 3 clusters: ./cluster_flickr.py flickr_vivid 7 10
