Data-driven modeling: Lecture 10

Clustering images

Clustering is an unsupervised learning task by which we look for
structure in the data, grouping similar examples together

e.g., find groups of similar pixels within a single image, or groups of similar images across a collection of images

Jake Hofman (Columbia University) Data-driven modeling April 9, 2012 2 / 11

K-means clustering
K-means: represent each cluster by the average of its points

Learn by iteratively updating cluster means and point assignments
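
The two alternating updates can be written compactly as below. The notation is introduced here for illustration and does not appear on the slides: x_i is the i-th point, z_i its cluster assignment, and mu_k the mean of cluster k. Squared Euclidean distance is assumed as the measure of closeness.

```latex
z_i = \arg\min_{k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{\lvert \{ i : z_i = k \} \rvert} \sum_{i \,:\, z_i = k} x_i
```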

K-means clustering

K-means:
  Choose number of clusters
  Initialize cluster centers
  While not converged:
    Assign each point to closest cluster
    Update cluster centers
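
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the course's code: the function name `kmeans`, the choice to initialize centers from randomly selected data points, the use of squared Euclidean distance, and the "no assignment changed" convergence test are all assumptions made here.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize centers as k distinct data points (one common choice).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assign each point to its closest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Converged when no assignment changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update each center to the mean of its assigned points;
        # an empty cluster keeps its previous center.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels


# Two well-separated blobs in 2-d as toy data.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.1, size=(50, 2)),
               data_rng.normal(5.0, 0.1, size=(50, 2))])
centers, labels = kmeans(X, k=2)
```

K-means can converge to different solutions from different initializations, so in practice it is common to run it several times and keep the best result.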

Clustering pixels

Find groups of similar pixels within a single image
(e.g. “the bright red circles”)

Represent each pixel as a separate example
with its (R,G,B) value as a 3-d feature vector
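
A small sketch of this representation, using a made-up 2 × 3 image rather than a real file: reshaping the M × N × 3 array to (M·N) × 3 gives one row per pixel and one column per color channel.

```python
import numpy as np

# Hypothetical 2 x 3 RGB image with intensities in [0, 1].
I = np.zeros((2, 3, 3))
I[0, 0] = [1.0, 0.0, 0.0]   # one bright red pixel in the top-left corner

M, N, _ = I.shape
# One row per pixel, one column per color channel.
X = I.reshape(M * N, 3)

# The reshape is lossless: the image can be recovered from the rows.
I_back = X.reshape(M, N, 3)
```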

Images as arrays
Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities

import matplotlib.image as mpimg
I = mpimg.imread('chairs.jpg')  # array of shape (M, N, 3)
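
To make the array layout concrete without needing an image file, the sketch below builds a synthetic stand-in for a loaded image and indexes into it. The array contents and variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in for a loaded image: 4 rows, 5 columns, 3 channels.
I = np.zeros((4, 5, 3), dtype=np.uint8)
I[1, 2] = (255, 0, 0)            # set one pixel to pure red

rows, cols, channels = I.shape   # M, N, 3
red = I[:, :, 0]                 # M x N array of red intensities
pixel = I[1, 2]                  # length-3 RGB vector for a single pixel
```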

Clustering pixels

Group pixels within candy.jpg into 7 clusters
./cluster_pixels.py candy.jpg 7
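
The script itself is not shown on the slides. As a rough sketch of what pixel clustering might look like, the code below runs a compact K-means on the pixels of a synthetic image and recolors every pixel with its cluster center. The functions `kmeans` and `cluster_pixels` are hypothetical helpers written for this example.

```python
import numpy as np


def kmeans(X, k, max_iter=50, seed=0):
    """Compact K-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


def cluster_pixels(I, k):
    """Cluster the pixels of an M x N x 3 image; recolor each by its center."""
    M, N, C = I.shape
    X = I.reshape(M * N, C).astype(float)
    centers, labels = kmeans(X, k)
    return centers[labels].reshape(M, N, C), labels.reshape(M, N)


# Synthetic image: left half red, right half blue.
I = np.zeros((4, 6, 3))
I[:, :3] = [1.0, 0.0, 0.0]
I[:, 3:] = [0.0, 0.0, 1.0]
quantized, label_img = cluster_pixels(I, k=2)
```

Replacing each pixel by its cluster center yields an image drawn with only k colors, which makes the clusters easy to inspect visually.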

Clustering images

Find groups of similar images within a collection of images
(e.g. “warm” photos)

Represent each image with a binned RGB intensity histogram

Intensity histograms

Disregard all spatial information and simply count pixels by intensity
(e.g. lots of pixels with bright green and dark blue)
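
One way to compute such a histogram is sketched below. It assumes a joint 3-d histogram over (R, G, B) with the same number of bins per channel and 8-bit intensities; the slides do not say whether the histogram is joint or per channel, and `rgb_histogram` is a name made up for this example. Shuffling the pixels leaves the histogram unchanged, which is what discarding spatial information means.

```python
import numpy as np


def rgb_histogram(I, bins=4):
    """Flattened, normalized joint histogram (bins**3 entries) of 8-bit RGB values."""
    X = I.reshape(-1, 3).astype(float)
    H, _ = np.histogramdd(X, bins=(bins,) * 3, range=((0, 256),) * 3)
    return H.ravel() / X.shape[0]


rng = np.random.default_rng(0)
# Synthetic 8 x 8 image with random 8-bit colors.
I = rng.integers(0, 256, size=(8, 8, 3))
# Same pixels in a different spatial arrangement.
shuffled = rng.permutation(I.reshape(-1, 3)).reshape(I.shape)

h1 = rgb_histogram(I)
h2 = rgb_histogram(shuffled)
```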

Intensity histograms

How many bins for pixel intensities?

Too many bins give a noisy, overly complex representation of
the data, while too few bins give an overly simple one
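
The trade-off can be made concrete by counting features. Assuming a joint RGB histogram with b bins per channel, the representation has b³ entries, so with many bins most entries are empty for any single image. The sketch below uses a synthetic random image as a stand-in for a photo.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 50 x 50 image (2,500 pixels) with uniformly random 8-bit colors.
I = rng.integers(0, 256, size=(50, 50, 3))
X = I.reshape(-1, 3)

results = {}
for bins in (2, 8, 32):
    H, _ = np.histogramdd(X, bins=(bins,) * 3, range=((0, 256),) * 3)
    # Record the feature-vector length and how many bins contain any pixel.
    results[bins] = (H.size, int(np.count_nonzero(H)))
    print(f"{bins:2d} bins per channel: {H.size:5d} features, "
          f"{results[bins][1]} non-empty")
```

With 32 bins per channel there are far more bins than pixels, so the histogram is mostly zeros; with 2 bins per channel every image collapses to just 8 numbers.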

Clustering images

Group “vivid” images into 3 clusters
./cluster_flickr.py flickr_vivid 7 10
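
The script and the meaning of its arguments are not shown on the slides. As a hedged sketch of the overall idea, the code below represents each image in a small synthetic collection by a joint RGB histogram and clusters those histograms with K-means. All names here (`rgb_histogram`, `kmeans`, `fake_photo`) are hypothetical, and the "warm" and "cool" images are generated rather than downloaded.

```python
import numpy as np


def rgb_histogram(I, bins=4):
    """Flattened, normalized joint histogram of 8-bit RGB values."""
    X = I.reshape(-1, 3).astype(float)
    H, _ = np.histogramdd(X, bins=(bins,) * 3, range=((0, 256),) * 3)
    return H.ravel() / X.shape[0]


def kmeans(X, k, max_iter=50, seed=0):
    """Compact K-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


photo_rng = np.random.default_rng(0)


def fake_photo(r, g, b, size=16):
    """Random image whose channels are drawn from the given (low, high) ranges."""
    return np.stack([photo_rng.integers(lo, hi, size=(size, size))
                     for lo, hi in (r, g, b)], axis=-1)


# Three red-dominated ("warm") and three blue-dominated ("cool") images.
warm = [fake_photo((200, 256), (0, 50), (0, 50)) for _ in range(3)]
cool = [fake_photo((0, 50), (0, 50), (200, 256)) for _ in range(3)]

# One histogram (feature vector) per image, then cluster the images.
features = np.array([rgb_histogram(img) for img in warm + cool])
centers, labels = kmeans(features, k=2)
```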

Data-driven modeling (APAM E4990), Jake Hofman, Columbia University, April 9, 2012
