SlideShare a Scribd company logo
Data-driven modeling
                                           APAM E4990


                                           Jake Hofman

                                          Columbia University


                                         January 30, 2012




Jake Hofman   (Columbia University)        Data-driven modeling   January 30, 2012   1 / 23
Outline



      1 Digit recognition



      2 Image classification



      3 Acquiring image data




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   2 / 23
Digit recognition

          Classification is an supervised learning task by which we aim to
             predict the correct label for an example given its features




                                               ↓
                                  0 5 4 1 4 9
              e.g. determine which digit {0, 1, . . . , 9} is in depicted in each
                                         image


Jake Hofman    (Columbia University)    Data-driven modeling            January 30, 2012   3 / 23
Digit recognition

          Determine which digit {0, 1, . . . , 9} is in depicted in each image




Jake Hofman   (Columbia University)   Data-driven modeling          January 30, 2012   4 / 23
Images as arrays


               Grayscale images ↔ 2-d arrays of M × N pixel intensities




Jake Hofman   (Columbia University)   Data-driven modeling       January 30, 2012   5 / 23
Images as arrays

               Grayscale images ↔ 2-d arrays of M × N pixel intensities




          Represent each image as a “vector of pixels”, flattening the 2-d
                          array of pixels to a 1-d vector

Jake Hofman   (Columbia University)   Data-driven modeling       January 30, 2012   5 / 23
k-nearest neighbors classification
         k-nearest neighbors: memorize training examples, predict labels
                   using labels of the k closest training points




                          Intuition: nearby points have similar labels

Jake Hofman   (Columbia University)      Data-driven modeling            January 30, 2012   6 / 23
k-nearest neighbors classification




              Small k gives a complex boundary, large k results in coarse
                                     averaging


Jake Hofman    (Columbia University)   Data-driven modeling      January 30, 2012   7 / 23
k-nearest neighbors classification




                Evaluate performance on a held-out test set to assess
                                generalization error


Jake Hofman   (Columbia University)   Data-driven modeling      January 30, 2012   7 / 23
Digit recognition


       Simple digit classifer with k=1 nearest neighbors
          ./ classify_digits . py




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   8 / 23
Outline



      1 Digit recognition



      2 Image classification



      3 Acquiring image data




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   9 / 23
Image classification


                      Determine if an image is a landscape or headshot




                                            ↓                 ↓
                                       ’landscape’        ’headshot’

              Represent each image with a binned RGB intensity histogram




Jake Hofman    (Columbia University)         Data-driven modeling      January 30, 2012   10 / 23
Images as arrays
          Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities




       import matplotlib . image as mpimg
       I = mpimg . imread ( ' chairs . jpg ')

Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   11 / 23
Images as arrays
          Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities




       import matplotlib . image as mpimg
       I = mpimg . imread ( ' chairs . jpg ')


Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   11 / 23
Intensity histograms

        Disregard all spatial information, simply count pixels by intensities
               (e.g. lots of pixels with bright green and dark blue)




Jake Hofman   (Columbia University)   Data-driven modeling       January 30, 2012   12 / 23
Intensity histograms

                                How many bins for pixel intensities?




         Too many bins gives a noisy, overly complex representation of
                                  the data,
           while using too few bins results in an overly simple one


Jake Hofman   (Columbia University)        Data-driven modeling        January 30, 2012   13 / 23
Image classification

       Classify
           ./ classify_flickr . py 16 9
               flickr_headshot flickr_landscape




       Change in performance on test set with number of neighbors
       k      =   1,    accuracy      =   0.7125
       k      =   3,    accuracy      =   0.7425
       k      =   5,    accuracy      =   0.7725
       k      =   7,    accuracy      =   0.7650
       k      =   9,    accuracy      =   0.7500


Jake Hofman   (Columbia University)       Data-driven modeling   January 30, 2012   14 / 23
Outline



      1 Digit recognition



      2 Image classification



      3 Acquiring image data




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   15 / 23
Simple screen scraping




       One-liner to download ESL digit data
          wget - Nr -- level =1 --no - parent http :// www -
            stat . stanford . edu /~ tibs / ElemStatLearn /
            datasets / zip . digits




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   16 / 23
Simple screen scraping


       One-liner to scrape images from a webpage
          wget -O - http :// bit . ly / zxy0jN |
            tr ' ' ' ' " = ' 'n ' |
            egrep '^ http .*( png | jpg | gif ) ' |
            xargs wget




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   17 / 23
Simple screen scraping


       One-liner to scrape images from a webpage
          wget -O - http :// bit . ly / zxy0jN |
            tr ' ' ' ' " = ' 'n ' |
            egrep '^ http .*( png | jpg | gif ) ' |
            xargs wget



              • get page source
              • translate quotes and = to newlines
              • match urls with image extensions
              • download qualifying images



Jake Hofman    (Columbia University)   Data-driven modeling   January 30, 2012   17 / 23
“cat flickr xargs wget”?




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   18 / 23
Flickr API




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   19 / 23
YQL: SELECT * FROM Internet1




                                   http://developer.yahoo.com/yql




              1
                  http://oreillynet.com/pub/e/1369
Jake Hofman       (Columbia University)      Data-driven modeling   January 30, 2012   20 / 23
YQL: Console




                       http://developer.yahoo.com/yql/console


Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   21 / 23
YQL + Python
       Python function for public YQL queries
        def yql_public ( query , env = False ):
            # build dictionary of GET parameters
            params = { 'q ': query , ' format ': ' json '}
            if env :
                params [ ' env '] = env

                 # escape query
                 query_str = urlencode ( params )

                 # fetch results
                 url = '% s ?% s ' % ( YQL_PUBLIC , query_str )
                 result = urlopen ( url )

                 # parse json and return
                 return json . load ( result )[ ' query ' ][ ' results ']
Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   22 / 23
YQL + Python + Flickr


       Fetch info for “interestingness” photos
          ./ simpleyql . py ' select * from flickr . photos
            . interestingness (20) where api_key = " ... " '



       Download thumbnails for photos tagged with “vivid”
          ./ download_flickr . py vivid 500 < api_key >




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   23 / 23

More Related Content

Viewers also liked

Chapter 1 market & marketing
Chapter 1 market & marketingChapter 1 market & marketing
Chapter 1 market & marketing
Ho Cao Viet
 
Maherprofessional c vbio216
Maherprofessional c vbio216Maherprofessional c vbio216
Maherprofessional c vbio216
Pat Maher
 
产品早期市场推广探路实践 by XDash
产品早期市场推广探路实践 by XDash产品早期市场推广探路实践 by XDash
产品早期市场推广探路实践 by XDash
Zengzhangheike.com (Uther Growth Hacking Consulting)
 
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Rijksdienst voor Ondernemend Nederland
 
Creativity
CreativityCreativity
Creativity
filipj2000
 
Reuters2
Reuters2Reuters2
Reuters2
filipj2000
 
1172. su.-.pps_angelines
1172.  su.-.pps_angelines1172.  su.-.pps_angelines
1172. su.-.pps_angelines
filipj2000
 
K state candidacy presentation
K state candidacy presentationK state candidacy presentation
K state candidacy presentation
Teague Allen
 
Cost Reductions Process - Results
Cost Reductions Process - ResultsCost Reductions Process - Results
Cost Reductions Process - Results
oferyuval
 
Successful Employee Engagement through Sharepoint OOTB tools
Successful Employee Engagement through Sharepoint OOTB toolsSuccessful Employee Engagement through Sharepoint OOTB tools
Successful Employee Engagement through Sharepoint OOTB tools
Prageeth Sandakalum
 
Towers of today
Towers of todayTowers of today
Towers of today
filipj2000
 
Letter to arvind kejriwal dec 31 13 water issue
Letter to arvind kejriwal dec 31 13   water issueLetter to arvind kejriwal dec 31 13   water issue
Letter to arvind kejriwal dec 31 13 water issue
Madhukar Varshney
 
PropSafe - property renting & mgmt solution
PropSafe - property renting & mgmt solutionPropSafe - property renting & mgmt solution
PropSafe - property renting & mgmt solution
Ratnesh1979
 
Tearsofawoman
TearsofawomanTearsofawoman
Tearsofawoman
filipj2000
 
Praias παραλίες
Praias   παραλίεςPraias   παραλίες
Praias παραλίεςfilipj2000
 
Grecia linda
Grecia lindaGrecia linda
Grecia linda
filipj2000
 
sibgrapi2015
sibgrapi2015sibgrapi2015
sibgrapi2015
Waner Miranda
 
Imp local
Imp localImp local
Internet Message
Internet MessageInternet Message
Internet Message
bearobo
 

Viewers also liked (20)

Chapter 1 market & marketing
Chapter 1 market & marketingChapter 1 market & marketing
Chapter 1 market & marketing
 
Orosz fotosok
Orosz fotosokOrosz fotosok
Orosz fotosok
 
Maherprofessional c vbio216
Maherprofessional c vbio216Maherprofessional c vbio216
Maherprofessional c vbio216
 
产品早期市场推广探路实践 by XDash
产品早期市场推广探路实践 by XDash产品早期市场推广探路实践 by XDash
产品早期市场推广探路实践 by XDash
 
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
 
Creativity
CreativityCreativity
Creativity
 
Reuters2
Reuters2Reuters2
Reuters2
 
1172. su.-.pps_angelines
1172.  su.-.pps_angelines1172.  su.-.pps_angelines
1172. su.-.pps_angelines
 
K state candidacy presentation
K state candidacy presentationK state candidacy presentation
K state candidacy presentation
 
Cost Reductions Process - Results
Cost Reductions Process - ResultsCost Reductions Process - Results
Cost Reductions Process - Results
 
Successful Employee Engagement through Sharepoint OOTB tools
Successful Employee Engagement through Sharepoint OOTB toolsSuccessful Employee Engagement through Sharepoint OOTB tools
Successful Employee Engagement through Sharepoint OOTB tools
 
Towers of today
Towers of todayTowers of today
Towers of today
 
Letter to arvind kejriwal dec 31 13 water issue
Letter to arvind kejriwal dec 31 13   water issueLetter to arvind kejriwal dec 31 13   water issue
Letter to arvind kejriwal dec 31 13 water issue
 
PropSafe - property renting & mgmt solution
PropSafe - property renting & mgmt solutionPropSafe - property renting & mgmt solution
PropSafe - property renting & mgmt solution
 
Tearsofawoman
TearsofawomanTearsofawoman
Tearsofawoman
 
Praias παραλίες
Praias   παραλίεςPraias   παραλίες
Praias παραλίες
 
Grecia linda
Grecia lindaGrecia linda
Grecia linda
 
sibgrapi2015
sibgrapi2015sibgrapi2015
sibgrapi2015
 
Imp local
Imp localImp local
Imp local
 
Internet Message
Internet MessageInternet Message
Internet Message
 

More from jakehofman

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
jakehofman
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
jakehofman
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
jakehofman
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
jakehofman
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
jakehofman
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
jakehofman
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
jakehofman
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
jakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
jakehofman
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
jakehofman
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
jakehofman
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
jakehofman
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
jakehofman
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
jakehofman
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
jakehofman
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
jakehofman
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
jakehofman
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
jakehofman
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
jakehofman
 
Computational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part IComputational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part I
jakehofman
 

More from jakehofman (20)

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
 
Computational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part IComputational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part I
 

Recently uploaded

The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
ak6969907
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
Kavitha Krishnan
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
WaniBasim
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
Bisnar Chase Personal Injury Attorneys
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 

Recently uploaded (20)

The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 

Data-driven modeling: Lecture 02

  • 1. Data-driven modeling APAM E4990 Jake Hofman Columbia University January 30, 2012 Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 1 / 23
  • 2. Outline 1 Digit recognition 2 Image classification 3 Acquiring image data Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 2 / 23
  • 3. Digit recognition Classification is an supervised learning task by which we aim to predict the correct label for an example given its features ↓ 0 5 4 1 4 9 e.g. determine which digit {0, 1, . . . , 9} is in depicted in each image Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 3 / 23
  • 4. Digit recognition Determine which digit {0, 1, . . . , 9} is in depicted in each image Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 4 / 23
  • 5. Images as arrays Grayscale images ↔ 2-d arrays of M × N pixel intensities Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 5 / 23
  • 6. Images as arrays Grayscale images ↔ 2-d arrays of M × N pixel intensities Represent each image as a “vector of pixels”, flattening the 2-d array of pixels to a 1-d vector Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 5 / 23
  • 7. k-nearest neighbors classification k-nearest neighbors: memorize training examples, predict labels using labels of the k closest training points Intuition: nearby points have similar labels Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 6 / 23
  • 8. k-nearest neighbors classification Small k gives a complex boundary, large k results in coarse averaging Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 7 / 23
  • 9. k-nearest neighbors classification Evaluate performance on a held-out test set to assess generalization error Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 7 / 23
  • 10. Digit recognition Simple digit classifer with k=1 nearest neighbors ./ classify_digits . py Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 8 / 23
  • 11. Outline 1 Digit recognition 2 Image classification 3 Acquiring image data Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 9 / 23
  • 12. Image classification Determine if an image is a landscape or headshot ↓ ↓ ’landscape’ ’headshot’ Represent each image with a binned RGB intensity histogram Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 10 / 23
  • 13. Images as arrays Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities import matplotlib . image as mpimg I = mpimg . imread ( ' chairs . jpg ') Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 11 / 23
  • 14. Images as arrays Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities import matplotlib . image as mpimg I = mpimg . imread ( ' chairs . jpg ') Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 11 / 23
  • 15. Intensity histograms Disregard all spatial information, simply count pixels by intensities (e.g. lots of pixels with bright green and dark blue) Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 12 / 23
  • 16. Intensity histograms How many bins for pixel intensities? Too many bins gives a noisy, overly complex representation of the data, while using too few bins results in an overly simple one Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 13 / 23
  • 17. Image classification Classify ./ classify_flickr . py 16 9 flickr_headshot flickr_landscape Change in performance on test set with number of neighbors k = 1, accuracy = 0.7125 k = 3, accuracy = 0.7425 k = 5, accuracy = 0.7725 k = 7, accuracy = 0.7650 k = 9, accuracy = 0.7500 Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 14 / 23
  • 18. Outline 1 Digit recognition 2 Image classification 3 Acquiring image data Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 15 / 23
  • 19. Simple screen scraping One-liner to download ESL digit data wget - Nr -- level =1 --no - parent http :// www - stat . stanford . edu /~ tibs / ElemStatLearn / datasets / zip . digits Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 16 / 23
  • 20. Simple screen scraping One-liner to scrape images from a webpage wget -O - http :// bit . ly / zxy0jN | tr ' ' ' ' " = ' 'n ' | egrep '^ http .*( png | jpg | gif ) ' | xargs wget Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 17 / 23
  • 21. Simple screen scraping One-liner to scrape images from a webpage wget -O - http :// bit . ly / zxy0jN | tr ' ' ' ' " = ' 'n ' | egrep '^ http .*( png | jpg | gif ) ' | xargs wget • get page source • translate quotes and = to newlines • match urls with image extensions • download qualifying images Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 17 / 23
  • 22. “cat flickr xargs wget”? Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 18 / 23
  • 23. Flickr API Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 19 / 23
  • 24. YQL: SELECT * FROM Internet1 http://developer.yahoo.com/yql 1 http://oreillynet.com/pub/e/1369 Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 20 / 23
  • 25. YQL: Console http://developer.yahoo.com/yql/console Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 21 / 23
  • 26. YQL + Python Python function for public YQL queries def yql_public ( query , env = False ): # build dictionary of GET parameters params = { 'q ': query , ' format ': ' json '} if env : params [ ' env '] = env # escape query query_str = urlencode ( params ) # fetch results url = '% s ?% s ' % ( YQL_PUBLIC , query_str ) result = urlopen ( url ) # parse json and return return json . load ( result )[ ' query ' ][ ' results '] Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 22 / 23
  • 27. YQL + Python + Flickr Fetch info for “interestingness” photos ./ simpleyql . py ' select * from flickr . photos . interestingness (20) where api_key = " ... " ' Download thumbnails for photos tagged with “vivid” ./ download_flickr . py vivid 500 < api_key > Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 23 / 23