Machine Learning in
 Computer Vision

      Fei-Fei Li
What is (computer) vision?
• When we “see” something, what does it
  involve?
• Take a picture with a camera, it is just a
  bunch of colored dots (pixels)
• Want to make computers understand
  images
• Looks easy, but not really…

Image (or video)   Sensing device   Interpreting device    Interpretations


                                                          Corn/mature corn
                                                          in a cornfield/
                                                          plant/blue sky in
                                                          the background
                                                          Etc.
What is it related to?
[Diagram: Computer Vision at the center, surrounded by related fields:
Biology, Psychology, Neuroscience, Cognitive sciences, Information
Engineering, Computer Science, Robotics, Speech, Information retrieval,
Machine learning, Physics, Maths]
Quiz?
What about this?
A picture is worth a thousand words.
                          --- Confucius
                    or Printers’ Ink Ad (1921)
A picture is worth a thousand words.
                                       --- Confucius
                                 or Printers’ Ink Ad (1921)



[Image annotated with low-level descriptions: horizontal lines, textured,
blue on the top, vertical, white, shadow to the left, porous, oblique,
large green patches]
A picture is worth a thousand words.
                                       --- Confucius
                                 or Printers’ Ink Ad (1921)



[Image annotated with high-level descriptions: clear sky, autumn leaves,
building, courtyard, bicycles, trees, campus, day time, talking, people,
outdoor]
Today: machine learning methods
     for object recognition
outline
• Intro to object categorization
• Brief overview
  – Generative
  – Discriminative
• Generative models
• Discriminative models
How many object categories are there?




                                Biederman 1987
Challenges 1: viewpoint variation




Michelangelo 1475-1564
Challenges 2: illumination




                             slide credit: S. Ullman
Challenges 3: occlusion




         Magritte, 1957
Challenges 4: scale
Challenges 5: deformation




                            Xu, Beihong 1943
Challenges 6: background clutter




      Klimt, 1913
History: single object recognition




     • Lowe, et al. 1999, 2003
     • Mahamud and Herbert, 2000
     • Ferrari, Tuytelaars, and Van Gool, 2004
     • Rothganger, Lazebnik, and Ponce, 2004
     • Moreels and Perona, 2005
     •…
Challenges 7: intra-class variation
Object categorization:
           the statistical viewpoint
                  p(zebra | image)  vs.  p(no zebra | image)

• Bayes rule:

    p(zebra | image)       p(image | zebra)       p(zebra)
  --------------------- = --------------------- · -------------
   p(no zebra | image)    p(image | no zebra)    p(no zebra)

     posterior ratio        likelihood ratio      prior ratio
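As a toy illustration of this decision rule (all numbers invented for illustration, not taken from the slides), the posterior ratio can be computed directly from the likelihood ratio and the prior ratio:

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra, prior_no_zebra):
    """Posterior ratio = likelihood ratio * prior ratio (Bayes rule above)."""
    return (lik_zebra / lik_no_zebra) * (prior_zebra / prior_no_zebra)

# Invented numbers: the image looks 4x more like a zebra, but zebras are rare.
ratio = posterior_ratio(0.02, 0.005, 0.1, 0.9)
print(round(ratio, 3))                         # 0.444
print("zebra" if ratio > 1 else "no zebra")    # no zebra
```

Here the likelihood ratio favors "zebra", but the prior ratio pulls the decision the other way.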
Object categorization:
           the statistical viewpoint

    p(zebra | image)       p(image | zebra)       p(zebra)
  --------------------- = --------------------- · -------------
   p(no zebra | image)    p(image | no zebra)    p(no zebra)

     posterior ratio        likelihood ratio      prior ratio


• Discriminative methods model posterior

• Generative methods model likelihood and
  prior
Discriminative
• Direct modeling of the posterior ratio
  p(zebra | image) / p(no zebra | image)

      Decision               Zebra
      boundary
                             Non-zebra
Generative
• Model p(image | zebra) and   p (image | no zebra)




        p(image | zebra)          p(image | no zebra)

        [example images scored under each model:
         low vs. middle, high vs. middle-low]
Three main issues
• Representation
  – How to represent an object category


• Learning
  – How to form the classifier, given training data


• Recognition
  – How the classifier is to be used on novel data
Representation
– Generative /
  discriminative / hybrid
– Appearance only or
  location and appearance
– Invariances
   •   Viewpoint
   •   Illumination
   •   Occlusion
   •   Scale
   •   Deformation
   •   Clutter
   •   etc.
– Part-based or global
  w/ sub-window
– Use a set of features or
  each pixel in the image
Learning
– Unclear how to model categories, so we
  learn what distinguishes them rather than
  manually specify the difference -- hence
  current interest in machine learning
– Methods of training: generative vs.
  discriminative

  [Plots: class-conditional densities p(x|C1), p(x|C2) on the left;
   posterior probabilities p(C1|x), p(C2|x) on the right]

– What are you maximizing? Likelihood
  (Gen.) or performance on a train/validation
  set (Disc.)
– Level of supervision
    • Manual segmentation; bounding box; image
      labels; noisy labels ("Contains a motorbike")
– Batch/incremental (on category and image
  level; user feedback)
– Training images:
    • Issue of overfitting
    • Negative images for discriminative methods
– Priors
Recognition
– Scale / orientation range to search over
– Speed
Bag-of-words models
Object   Bag of ‘words’
Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual
experiences are the dominant ones. Our perception of the world around us
is based essentially on the messages that reach the brain from our eyes.
For a long time it was thought that the retinal image was transmitted
point by point to visual centers in the brain; the cerebral cortex was a
movie screen, so to speak, upon which the image in the eye was projected.
Through the discoveries of Hubel and Wiesel we now know that behind the
origin of the visual perception in the brain there is a considerably more
complicated course of events. By following the visual impulses along
their path to the various cell layers of the optical cortex, Hubel and
Wiesel have been able to demonstrate that the message about the image
falling on the retina undergoes a step-wise analysis in a system of nerve
cells stored in columns. In this system each cell has its specific
function and is responsible for a specific detail in the pattern of the
retinal image.

   Extracted "words": sensory, visual, perception, retinal, cerebral
   cortex, eye, cell, optical, nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this
year, a threefold increase on 2004's $32bn. The Commerce Ministry said
the surplus would be created by a predicted 30% jump in exports to
$750bn, compared with a 18% rise in imports to $660bn. The figures are
likely to further annoy the US, which has long argued that China's
exports are unfairly helped by a deliberately undervalued yuan. Beijing
agrees the surplus is too high, but says the yuan is only one factor.
Bank of China governor Zhou Xiaochuan said the country also needed to do
more to boost domestic demand so more goods stayed within the country.
China increased the value of the yuan against the dollar by 2.1% in July
and permitted it to trade within a narrow band, but the US wants the yuan
to be allowed to trade freely. However, Beijing has made it clear that it
will take its time and tread carefully before allowing the yuan to rise
further in value.

   Extracted "words": China, trade, surplus, commerce, exports, imports,
   US, yuan, bank, domestic, foreign, increase, value
learning                   recognition



                       codewords dictionary
   feature detection
   & representation


image representation




 category models                              category
(and/or) classifiers                          decision
Representation


                                    2.
                          codewords dictionary
 1.   feature detection
      & representation


 image representation


3.
1. Feature detection and representation



Detect patches
                         [Mikolajczyk and Schmid '02]
                         [Matas et al. '02]
                         [Sivic et al. '03]

Normalize patch

Compute SIFT descriptor [Lowe '99]




                                                Slide credit: Josef Sivic
1. Feature detection and representation

              …
2. Codewords dictionary formation

            …




                      Vector quantization


                             Slide credit: Josef Sivic
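Vector quantization here is typically k-means over patch descriptors: the cluster centers become the codewords. A minimal pure-Python sketch of Lloyd's algorithm (toy 2-D "descriptors" standing in for 128-D SIFT vectors; function names are my own):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: the final cluster centers are the codewords."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[quantize(p, centers)].append(p)
        # Move each center to the mean of its assigned descriptors.
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*members)) if members
                   else centers[c] for c, members in enumerate(clusters)]
    return centers

def quantize(p, centers):
    """Index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

# Toy 2-D "descriptors" in two obvious clusters.
descriptors = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
               (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
codebook = kmeans(descriptors, k=2)
```

At recognition time, `quantize` maps every new patch descriptor to its nearest codeword index.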
2. Codewords dictionary formation




                             Fei-Fei et al. 2005
3. Image representation
frequency




                           …..
               codewords
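The codeword-frequency histogram above is just a count over the image's quantized patches. A minimal sketch (the codeword ids are hypothetical):

```python
from collections import Counter

def bow_histogram(codeword_ids, vocab_size):
    """Count how often each codeword occurs among an image's patches."""
    counts = Counter(codeword_ids)
    return [counts[i] for i in range(vocab_size)]

# Hypothetical quantized patches of one image, vocabulary of 5 codewords.
print(bow_histogram([0, 2, 2, 4, 2, 0], vocab_size=5))  # [2, 0, 3, 0, 1]
```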
Representation


                                    2.
                          codewords dictionary
 1.   feature detection
      & representation


 image representation


3.
Learning and Recognition



                codewords dictionary




 category models                       category
(and/or) classifiers                   decision
2 case studies

1. Naïve Bayes classifier
  –   Csurka et al. 2004




2. Hierarchical Bayesian text models
   (pLSA and LDA)
  –   Background: Hofmann 2001, Blei et al. 2004
  –   Object categorization: Sivic et al. 2005, Sudderth et
      al. 2005
  –   Natural scene categorization: Fei-Fei et al. 2005
First, some notation


• wn: each patch in an image
  – wn = [0,0,…1,…,0,0]T
• w: a collection of all N patches in an image
  – w = [w1,w2,…,wN]
• dj: the jth image in an image collection
• c: category of the image
• z: theme or topic of the patch
Case #1: the Naïve Bayes model


                      c                 w
                                                N

                                                  N
  c* = argmax  p(c | w) ∝ p(c) p(w | c) = p(c) ∏  p(wn | c)
           c                                     n=1



Object class        Prior prob. of     Image likelihood
 decision         the object classes    given the class



                                                              Csurka et al. 2004
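A minimal log-space sketch of this classifier over bag-of-codewords images (Laplace smoothing is my own addition for stability; this is an illustration, not Csurka et al.'s implementation):

```python
import math
from collections import Counter

def train_nb(docs_by_class, vocab_size, alpha=1.0):
    """Estimate log p(c) and log p(w | c) with Laplace smoothing (alpha)."""
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for c, docs in docs_by_class.items():
        counts = Counter(w for doc in docs for w in doc)
        total = sum(counts.values())
        log_prior = math.log(len(docs) / total_docs)
        log_lik = [math.log((counts[w] + alpha) / (total + alpha * vocab_size))
                   for w in range(vocab_size)]
        model[c] = (log_prior, log_lik)
    return model

def classify(patches, model):
    """c* = argmax_c [ log p(c) + sum_n log p(w_n | c) ]."""
    return max(model, key=lambda c: model[c][0] + sum(model[c][1][w] for w in patches))

# Toy data: codewords 0-1 dominate "face" images, 2-3 dominate "background".
docs_by_class = {
    "face": [[0, 0, 1], [1, 0, 1]],
    "background": [[2, 3, 3], [3, 2, 2]],
}
model = train_nb(docs_by_class, vocab_size=4)
```

The Naïve Bayes assumption is visible in `classify`: patch likelihoods are simply summed in log space, i.e. patches are treated as independent given the class.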
Case #2: Hierarchical Bayesian
                  text models
Probabilistic Latent Semantic Analysis (pLSA)


     d         z           w
                               N
 D                                 Hofmann, 2001


Latent Dirichlet Allocation (LDA)


     c     π           z           w
                   N
 D                                     Blei et al., 2001
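The pLSA panel above factors p(w | d) = Σz p(w | z) p(z | d) and is fit by EM. A toy pure-Python sketch (an illustration, not Hofmann's implementation; the smoothing constant in `normalize` is my own addition):

```python
import random

def normalize(v, eps=1e-12):
    """Normalize to a probability vector; eps guards all-zero rows."""
    s = sum(v) + eps * len(v)
    return [(x + eps) / s for x in v]

def plsa(counts, n_topics, iters=50, seed=0):
    """Fit p(w|z) and p(z|d) by EM so that p(w|d) = sum_z p(w|z) p(z|d).
    counts[d][w] = occurrences of codeword w in document (image) d."""
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() for _ in range(W)]) for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(D)]
    for _ in range(iters):
        # E-step: responsibilities p(z | d, w) ∝ p(w|z) p(z|d)
        resp = [[normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)])
                 for w in range(W)] for d in range(D)]
        # M-step: re-estimate both conditional tables from weighted counts
        p_w_z = [normalize([sum(counts[d][w] * resp[d][w][z] for d in range(D))
                            for w in range(W)]) for z in range(n_topics)]
        p_z_d = [normalize([sum(counts[d][w] * resp[d][w][z] for w in range(W))
                            for z in range(n_topics)]) for d in range(D)]
    return p_w_z, p_z_d

# Two documents about codewords {0,1}, two about codewords {2,3}.
counts = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3], [0, 0, 3, 4]]
p_w_z, p_z_d = plsa(counts, n_topics=2)
```

After fitting, p(z | d) plays the role of the per-image theme mixture used for categorization.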
Case #2: Hierarchical Bayesian
                  text models
Probabilistic Latent Semantic Analysis (pLSA)


     d       z      w
                        N
 D




                            “face”

                                                Sivic et al. ICCV 2005
Case #2: Hierarchical Bayesian
                  text models



 “beach”


Latent Dirichlet Allocation (LDA)


     c     π         z     w
                 N
 D
                                    Fei-Fei et al. ICCV 2005
Another application
• Human action classification
Invariance issues
• Scale and rotation
  – Implicit
  – Detectors and descriptors




                                Kadir and Brady. 2003
Invariance issues
• Scale and rotation
• Occlusion
  – Implicit in the models
  – Codeword distribution: small variations
  – (In theory) Theme (z) distribution: different
    occlusion patterns
Invariance issues
• Scale and rotation
• Occlusion
• Translation
  – Encode (relative) location information




                                        Sudderth et al. 2005
Invariance issues
•   Scale and rotation
•   Occlusion
•   Translation
•   Viewpoint (in theory)
    – Codewords: detector
      and descriptor
    – Theme distributions:
      different view points



                   Fergus et al. 2005
Model properties
• Intuitive
   – Analogy to documents

   [The annotated vision-text document from the earlier
    bag-of-words analogy]
Model properties



• Intuitive
• (Could use)
  generative models
  – Convenient for weakly-
    or un-supervised
    training
  – Prior information
  – Hierarchical Bayesian
    framework                Sivic et al., 2005, Sudderth et al., 2005
Model properties



• Intuitive
• (Could use)
  generative models
• Learning and
  recognition relatively
  fast
   – Compare to other
     methods
Weakness of the model

• No rigorous geometric information
  of the object components
• It’s intuitive to most of us that
  objects are made of parts – no
  such information
• Not extensively tested yet for
  – View point invariance
  – Scale invariance
• Segmentation and localization
  unclear
part-based models


   Slides courtesy of Rob Fergus for “part-based models”
One-shot learning
Fei-Fei et al. ‘03, ‘04, ‘06
                               of object categories
P. Bruegel, 1562




                                One-shot learning
Fei-Fei et al. ‘03, ‘04, ‘06
                               of object categories
model representation




                                One-shot learning
Fei-Fei et al. ‘03, ‘04, ‘06
                               of object categories
X (location)

(x,y) coords. of region center


     A (appearance)

         normalize 11x11 patch

         projection onto PCA basis → c1, c2, …, c10
The Generative Model
     μX ΓX       μA ΓA

              h

       X           A
  I

  X (location): (x,y) coords. of region center
  A (appearance): normalized 11x11 patch projected onto a
  PCA basis → c1, c2, …, c10
The Generative Model



     μX ΓX       μA ΓA   parameters


                         hidden variable
             h

      X           A      observed variables
 I
The Generative Model


                                                        ML/MAP
        μX ΓX                   μA ΓA



                      h                                                     θ1

          X                       A
    I                                                                  θ2
                                          θn
                                               where θ = {µX, ΓX, µA, ΓA}


Weber et al. ’98 ’00, Fergus et al. ’03
The Generative Model


 μX ΓX                               ML/MAP
shape model




                                                         θ1
  μA ΓA
appearance model                                    θ2
                       θn
                            where θ = {µX, ΓX, µA, ΓA}
The Generative Model


                                      ML/MAP




 μX       ΓX       μA       ΓA
                                                θ1
               h

                                               θ2
 I    X                 A
                                 θn
The Generative Model
                                                           Bayesian
     m0X       a0X             m0A       a0A
     β0X       B0X             β0A       B0A


      μX       ΓX              μA        ΓA
                           P
                                                                                θ1
                         h

                                                                               θ2
     I     X                         A
                                                θn


                                Parameters to estimate: {mX, βX, aX, BX, mA, βA, aA, BA}
Fei-Fei et al. ‘03, ‘04, ‘06    i.e. parameters of Normal-Wishart distribution
The Generative Model


     m0X       a0X             m0A       a0A   priors
     β0X       B0X             β0A       B0A


      μX       ΓX              μA        ΓA    parameters
                           P

                         h

     I     X                         A




Fei-Fei et al. ‘03, ‘04, ‘06
The Generative Model


     m0X       a0X             m0A       a0A    priors
     β0X       B0X             β0A       B0A


      μX       ΓX              μA        ΓA
                           P

                         h

     I     X                         A         Prior distribution




Fei-Fei et al. ‘03, ‘04, ‘06
1. human vision

                             3. learning
                            & inferences


            One-shot learning
2. model   of object categories
representation
                            4. evaluation
                              & dataset
                            & application
learning & inferences


    No labeling                   No segmentation   No alignment




                                              One-shot learning
Fei-Fei et al. 2003, 2004, 2006
                                             of object categories
learning & inferences

                                  Bayesian




                                                  θ1


                                                 θ2
                           θn

                                              One-shot learning
Fei-Fei et al. 2003, 2004, 2006
                                             of object categories
Variational EM                  Random initialization



          M-Step                E-Step




           new estimate
            of p(θ|train)

                              prior knowledge of p(θ)
Attias, Jordan, Hinton etc.
evaluation & dataset




                                     One-shot learning
Fei-Fei et al. 2004, 2006a, 2006b
                                    of object categories
evaluation & dataset -- Caltech 101 Dataset




                                     One-shot learning
Fei-Fei et al. 2004, 2006a, 2006b
                                    of object categories
evaluation & dataset -- Caltech 101 Dataset




                                     One-shot learning
Fei-Fei et al. 2004, 2006a, 2006b
                                    of object categories
Part 3: discriminative methods
Discriminative methods
 Object detection and recognition is formulated as a classification problem.
 The image is partitioned into a set of overlapping windows
 … and a decision is taken at each window about whether it contains a target object or not.
[Figure: a scene image (“Where are the screens?”) decomposed into a bag
of image patches; in some feature space, a decision boundary separates
“Computer screen” from “Background”]
Discriminative vs. generative
• Generative model
  (The artist): models the class density over x = data
  [plot over x = data]

• Discriminative model
  (The lousy painter): models the posterior over x = data
  [plot over x = data]

• Classification function: outputs +1 or -1
  [plot over x = data]
Discriminative methods
Nearest neighbor                      Neural networks




          10^6 examples

Shakhnarovich, Viola, Darrell 2003    LeCun, Bottou, Bengio, Haffner 1998
Berg, Berg, Malik 2005                Rowley, Baluja, Kanade 1998
…                                     …


Support Vector Machines and Kernels   Conditional Random Fields




Guyon, Vapnik                         McCallum, Freitag, Pereira 2000
Heisele, Serre, Poggio, 2001          Kumar, Hebert 2003
…                                     …
Formulation
• Formulation: binary classification
                                      …
Features x =    x1      x2      x3 … xN                xN+1 xN+2 … xN+M
Labels    y=     -1     +1      -1        -1           ?        ?              ?

          Training data: each image patch is labeled          Test data
          as containing the object or background

• Classification function f(x),
                          where f belongs to some family of functions

• Minimize misclassification error
(Not that simple: we need some guarantee of generalization)
Overview of section
• Object detection with classifiers

• Boosting
  – Gentle boosting
  – Weak detectors
  – Object model
  – Object detection

• Multiclass object detection
Why boosting?
• A simple algorithm for learning robust classifiers
   – Freund & Schapire, 1995
   – Friedman, Hastie, Tibshirani, 1998

• Provides an efficient algorithm for sparse visual
  feature selection
   – Tieu & Viola, 2000
   – Viola & Jones, 2003

• Easy to implement; requires no external
  optimization tools.
Boosting
• Defines a classifier using an additive model:

     H(x) = Σt αt ht(x)

  H: strong classifier; ht: weak classifier;
  αt: weight; x: feature vector
Boosting
• Defines a classifier using an additive model:

     H(x) = Σt αt ht(x)

  H: strong classifier; ht: weak classifier;
  αt: weight; x: feature vector

• We need to define a family of weak classifiers
  from which to draw each ht(x)
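One concrete family of weak classifiers is threshold "stumps". A minimal discrete-AdaBoost sketch over 1-D toy data, showing the additive model and the weight update wt ← wt·exp{-αt yt ht(xt)} in the spirit of the slides (an illustration, not the Viola-Jones system):

```python
import math

def stump(threshold, sign):
    """Weak classifier on a scalar: sign * (+1 if x > threshold else -1)."""
    return lambda x: sign * (1 if x > threshold else -1)

def adaboost(xs, ys, weak, rounds=5):
    """Builds H(x) = sign(sum_t alpha_t h_t(x)); after each round the
    weights of misclassified points grow."""
    w = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        # Pick the weak classifier with the lowest weighted error.
        errs = [sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y) for h in weak]
        t = min(range(len(weak)), key=lambda i: errs[i])
        err = max(errs[t], 1e-10)          # clip to avoid log(0)
        if err >= 0.5:                     # no weak classifier beats chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, weak[t]))
        # Reweight: w_i <- w_i * exp(-alpha * y_i * h(x_i)), then renormalize.
        w = [wi * math.exp(-alpha * y * weak[t](x)) for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

# Toy 1-D data: negatives below 2.5, positives above.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-1, -1, -1, 1, 1, 1]
weak = [stump(th, s) for th in (0.5, 1.5, 2.5, 3.5, 4.5) for s in (1, -1)]
H = adaboost(xs, ys, weak)
```

After reweighting, the previously chosen weak classifier performs at chance on the new weighted problem, which is exactly the mechanism the toy example below walks through.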
Boosting
• It is a sequential procedure:

         xt=1                     Each data point has
                    xt
                                  a class label:
         xt=2
                                           +1 ( )
                                    yt =
                                           -1 ( )

                                  and a weight:
                                       wt =1
Toy example
Weak learners from the family of lines



                                                              Each data point has
                                                              a class label:
                                                                          +1 ( )
                                                                   yt =
                                                                          -1 ( )

                                                              and a weight:
                                                                   wt =1




                             Any single line h has p(error) = 0.5: it is at chance
Toy example

                                                     Each data point has
                                                     a class label:
                                                              +1 ( )
                                                       yt =
                                                              -1 ( )

                                                     and a weight:
                                                          wt =1




                 This one seems to be the best
This is a ‘weak classifier’: It performs slightly better than chance.
Toy example

                                                            Each data point has
                                                            a class label:
                                                                     +1 ( )
                                                              yt =
                                                                     -1 ( )

                                                            We update the weights:
                                                               wt    wt exp{-yt Ht}




We set a new problem for which the previous weak classifier performs at chance again
Toy example (weak classifiers f1, f2, f3, f4)

The strong (non-linear) classifier is built as the combination of
all the weak (linear) classifiers.
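The combination can be sketched as a weighted vote, H(x) = sign(Σ_t α_t h_t(x)); the stumps and weights below are illustrative, showing how linear stumps combine into a non-linear decision rule:

```python
import numpy as np

def strong_classifier(X, weak_classifiers, alphas):
    """H(x) = sign( sum_t alpha_t * h_t(x) ): a weighted vote of weak classifiers."""
    score = sum(a * h(X) for a, h in zip(alphas, weak_classifiers))
    return np.sign(score)

# Three linear stumps combine into a rule no single stump can express:
# +1 inside the interval (-1, 1), -1 outside.
h1 = lambda x: np.where(x > -1.0, 1.0, -1.0)
h2 = lambda x: np.where(x < 1.0, 1.0, -1.0)
h3 = lambda x: np.full_like(x, -1.0)   # constant "bias" stump
```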
From images to features:
            Weak detectors
We will now define a family of visual
 features that can be used as weak
 classifiers (“weak detectors”)




    Each feature takes an image as input and produces a binary response;
    such a thresholded feature is a weak detector.
Weak detectors
 Textures of textures
 Tieu and Viola, CVPR 2000




Every combination of three filters
generates a different feature


This gives thousands of features. Boosting selects a sparse subset, so computations
at test time are very efficient. Boosting also avoids overfitting to some extent.
Weak detectors
Haar filters and integral image
Viola and Jones, ICCV 2001




                             The average intensity in the
                             block is computed from four
                             lookups into the integral image,
                             independently of the block size.
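A sketch of the integral-image trick (the zero-padding convention used here is one common choice, an assumption rather than the paper's exact layout):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; padded with a zero row/column so
    block sums need no boundary special-casing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def block_sum(ii, top, left, height, width):
    """Sum over img[top:top+height, left:left+width] using four lookups,
    independent of the block size: D - B - C + A."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])
```

A Haar-like filter response is then just the difference of two or more adjacent block sums.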
Weak detectors
Other weak detectors:
• Carmichael, Hebert 2004
• Yuille, Snow, Nitzbert, 1998
• Amit, Geman 1998
• Papageorgiou, Poggio, 2000
• Heisele, Serre, Poggio, 2001
• Agarwal, Awan, Roth, 2004
• Schneiderman, Kanade 2004
• …
Weak detectors
Part-based: similar to part-based generative
 models. We create weak detectors by
 using parts that vote for the object-center
 location.




         Car model                                   Screen model

     These features are used for the detector on the course web site.
Weak detectors
First we collect a set of part templates from a set of training
objects.
Vidal-Naquet, Ullman (2003)




                                                           …
Weak detectors
We now define a family of “weak detectors” as:



                   response = part template * image   (cross-correlation, then thresholded)




                                              Better than chance
Weak detectors
We can do a better job using filtered images


               response = part template * (filtered image)




                                               Still a weak detector
                                               but better than before
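Such a detector can be sketched as a thresholded cross-correlation of a part template with the (possibly filtered) image. This is a naive loop implementation for clarity; real systems would use FFT-based or otherwise optimized correlation:

```python
import numpy as np

def correlate2d_valid(img, template):
    """Plain 2-D cross-correlation, 'valid' region only."""
    th, tw = template.shape
    out = np.empty((img.shape[0] - th + 1, img.shape[1] - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + th, c:c + tw] * template)
    return out

def weak_detector(img, template, threshold):
    """Binary response map: 1 wherever the part template correlates strongly."""
    return (correlate2d_valid(img, template) > threshold).astype(int)
```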
Training
First we evaluate all the N features on all the training images.




 Then, we sample the feature outputs at the object center and at random
 locations in the background:
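A sketch of assembling such a training set, assuming the N feature responses have already been evaluated densely for each image (the names and array shapes here are illustrative):

```python
import numpy as np

def sample_feature_vectors(feature_maps, centers, n_background, rng):
    """Assemble a boosting training set from dense feature responses.

    feature_maps: list of (N, H, W) arrays, one per training image -- the
    N feature outputs evaluated at every location (assumed precomputed).
    centers: list of (row, col) object-center locations, one per image.
    """
    X, y = [], []
    for fmap, (r, c) in zip(feature_maps, centers):
        _, H, W = fmap.shape
        X.append(fmap[:, r, c]); y.append(+1)            # object center
        for _ in range(n_background):
            rr, cc = rng.integers(0, H), rng.integers(0, W)
            if (rr, cc) != (r, c):                       # skip the center itself
                X.append(fmap[:, rr, cc]); y.append(-1)  # background
    return np.array(X), np.array(y)
```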
Representation and object model
Selected features for the screen detector

    (features 1, 2, 3, 4, …, 10, …, 100 shown)


                         Lousy painter
Representation and object model
Selected features for the car detector


    (features 1, 2, 3, 4, …, 10, …, 100 shown)
Overview of section
• Object detection with classifiers

• Boosting
  – Gentle boosting
  – Weak detectors
  – Object model
  – Object detection

• Multiclass object detection
Example: screen detection
    Feature
     output
Example: screen detection
    Feature         Thresholded
     output            output




                   Weak ‘detector’
              Produces many false alarms.
Example: screen detection
    Feature   Thresholded   Strong classifier
     output      output       at iteration 1
Example: screen detection
    Feature       Thresholded            Strong
     output          output             classifier




             Second weak ‘detector’
          Produces a different set of
          false alarms.
Example: screen detection
    Feature   Thresholded    Strong
     output      output     classifier




                                +

                                         Strong classifier
                                         at iteration 2
Example: screen detection
    Feature   Thresholded    Strong
     output      output     classifier




                                +




                                …
                                         Strong classifier
                                         at iteration 10
Example: screen detection
    Feature         Thresholded    Strong
     output            output     classifier




                                      +




                                      …
                                               Adding
                                               features

                                                      Final
                                                  classification


              Strong classifier
              at iteration 200
Applications
Document Analysis




 Digit recognition, AT&T labs
http://www.research.att.com/~yann/
Medical Imaging
Robotics
Toys and robots
Finger prints
Surveillance
Security
Searching the web

Machine Learning in Computer Vision

  • 1. Machine Learning in Computer Vision Fei-Fei Li
  • 2. What is (computer) vision? • When we “see” something, what does it involve? • Take a picture with a camera, it is just a bunch of colored dots (pixels) • Want to make computers understand images • Looks easy, but not really… Image (or video) Sensing device Interpreting device Interpretations Corn/mature corn jn a cornfield/ plant/blue sky in the background Etc.
  • 3. What is it related to? Biology Psychology Neuroscience Information Cognitive sciences Engineering Computer Robotics Science Computer Vision Speech Information retrieval Machine learning Physics Maths
  • 6. A picture is worth a thousand words. --- Confucius or Printers’ Ink Ad (1921)
  • 7. A picture is worth a thousand words. --- Confucius or Printers’ Ink Ad (1921) horizontal lines textured blue on the top vertical white shadow to the left porous oblique large green patches
  • 8. A picture is worth a thousand words. --- Confucius or Printers’ Ink Ad (1921) clear sky autumn leaves building court yard bicycles trees campus day time talking outdoor people
  • 9. Today: machine learning methods for object recognition
  • 10. outline • Intro to object categorization • Brief overview – Generative – Discriminative • Generative models • Discriminative models
  • 11. How many object categories are there? Biederman 1987
  • 12. Challenges 1: view point variation Michelangelo 1475-1564
  • 13. Challenges 2: illumination slide credit: S. Ullman
  • 14. Challenges 3: occlusion Magritte, 1957
  • 16. Challenges 5: deformation Xu, Beihong 1943
  • 17. Challenges 6: background clutter Klimt, 1913
  • 18. History: single object recognition
  • 19. History: single object recognition • Lowe, et al. 1999, 2003 • Mahamud and Herbert, 2000 • Ferrari, Tuytelaars, and Van Gool, 2004 • Rothganger, Lazebnik, and Ponce, 2004 • Moreels and Perona, 2005 •…
  • 21. Object categorization: the statistical viewpoint p ( zebra | image) vs. p (no zebra|image) • Bayes rule: p ( zebra | image) p (image | zebra ) p ( zebra) = ⋅ p (no zebra | image) p (image | no zebra) p (no zebra ) posterior ratio likelihood ratio prior ratio
  • 22. Object categorization: the statistical viewpoint p ( zebra | image) p (image | zebra ) p ( zebra) = ⋅ p (no zebra | image) p (image | no zebra) p (no zebra ) posterior ratio likelihood ratio prior ratio • Discriminative methods model posterior • Generative methods model likelihood and prior
  • 23. Discriminative p( zebra | image) • Direct modeling of p (no zebra | image) Decision Zebra boundary Non-zebra
  • 24. Generative • Model p(image | zebra) and p (image | no zebra) p(image | zebra) p (image | no zebra) Low Middle High Middle Low
  • 25. Three main issues • Representation – How to represent an object category • Learning – How to form the classifier, given training data • Recognition – How the classifier is to be used on novel data
  • 26. Representation – Generative / discriminative / hybrid
  • 27. Representation – Generative / discriminative / hybrid – Appearance only or location and appearance
  • 28. Representation – Generative / discriminative / hybrid – Appearance only or location and appearance – Invariances • View point • Illumination • Occlusion • Scale • Deformation • Clutter • etc.
  • 29. Representation – Generative / discriminative / hybrid – Appearance only or location and appearance – invariances – Part-based or global w/sub-window
  • 30. Representation – Generative / discriminative / hybrid – Appearance only or location and appearance – invariances – Parts or global w/sub- window – Use set of features or each pixel in image
  • 31. Learning – Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning
  • 32. Learning – Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) – Methods of training: generative vs. discriminative 5 p(C1|x) p(C2|x) p(x|C ) 2 1 4 posterior probabilities class densities 0.8 3 0.6 2 p(x|C1) 0.4 1 0.2 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x
  • 33. Learning – Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) – What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) – Level of supervision • Manual segmentation; bounding box; image labels; noisy labels Contains a motorbike
  • 34. Learning – Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) – What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) – Level of supervision • Manual segmentation; bounding box; image labels; noisy labels – Batch/incremental (on category and image level; user-feedback )
  • 35. Learning – Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) – What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) – Level of supervision • Manual segmentation; bounding box; image labels; noisy labels – Batch/incremental (on category and image level; user-feedback ) – Training images: • Issue of overfitting • Negative images for discriminative methods
  • 36. Learning – Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning) – What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.) – Level of supervision • Manual segmentation; bounding box; image labels; noisy labels – Batch/incremental (on category and image level; user-feedback ) – Training images: • Issue of overfitting • Negative images for discriminative methods – Priors
  • 37. Recognition – Scale / orientation range to search over – Speed
  • 39. Object Bag of ‘words’
  • 40. Analogy to documents Of all the sensory impressions proceeding to China is forecasting a trade surplus of $90bn the brain, the visual experiences are the (£51bn) to $100bn this year, a threefold dominant ones. Our perception of the world increase on 2004's $32bn. The Commerce around us is based essentially on the Ministry said the surplus would be created by messages that reach the brain from our eyes. a predicted 30% jump in exports to $750bn, For a long time it was thought that the retinal compared with a 18% rise in imports to image was transmitted pointbrain, to visual sensory, by point China, trade, $660bn. The figures are likely to further visual, perception, centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the surplus, commerce, annoy the US, which has long argued that China's exports are unfairly helped by a retinal, cerebral cortex, image in the eye was projected. Through the exports, imports, US, deliberately undervalued yuan. Beijing discoveries of Hubelcell, optical eye, and Wiesel we now yuan, bank, domestic, agrees the surplus is too high, but says the know that behind the origin of the visual yuan is only one factor. Bank of China nerve, image perception in the brain there is a considerably foreign, increase, governor Zhou Xiaochuan said the country Hubel, Wiesel more complicated course of events. By also needed to do more tovalue trade, boost domestic following the visual impulses along their path demand so more goods stayed within the to the various cell layers of the optical cortex, country. China increased the value of the Hubel and Wiesel have been able to yuan against the dollar by 2.1% in July and demonstrate that the message about the permitted it to trade within a narrow band, but image falling on the retina undergoes a step- the US wants the yuan to be allowed to trade wise analysis in a system of nerve cells freely. However, Beijing has made it clear that stored in columns. 
In this system each cell it will take its time and tread carefully before has its specific function and is responsible for allowing the yuan to rise further in value. a specific detail in the pattern of the retinal image.
  • 41.
  • 42. learning recognition codewords dictionary feature detection & representation image representation category models category (and/or) classifiers decision
  • 43. Representation 2. codewords dictionary 1. feature detection & representation image representation 3.
  • 44. 1.Feature detection and representation
  • 45. 1.Feature detection and representation Compute SIFT Normalize descriptor patch [Lowe’99] Detect patches [Mikojaczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03] Slide credit: Josef Sivic
  • 46. 1.Feature detection and representation …
  • 47. 2. Codewords dictionary formation …
  • 48. 2. Codewords dictionary formation … Vector quantization Slide credit: Josef Sivic
  • 49. 2. Codewords dictionary formation Fei-Fei et al. 2005
  • 51. Representation 2. codewords dictionary 1. feature detection & representation image representation 3.
  • 52. Learning and Recognition codewords dictionary category models category (and/or) classifiers decision
  • 53. 2 case studies 1. Naïve Bayes classifier – Csurka et al. 2004 2. Hierarchical Bayesian text models (pLSA and LDA) – Background: Hoffman 2001, Blei et al. 2004 – Object categorization: Sivic et al. 2005, Sudderth et al. 2005 – Natural scene categorization: Fei-Fei et al. 2005
  • 54. First, some notations • wn: each patch in an image – wn = [0,0,…1,…,0,0]T • w: a collection of all N patches in an image – w = [w1,w2,…,wN] • dj: the jth image in an image collection • c: category of the image • z: theme or topic of the patch
  • 55. Case #1: the Naïve Bayes model c w N N c ∗ = arg max p (c | w) ∝ p (c) p ( w | c) = p(c)∏ p( wn | c) c n =1 Object class Prior prob. of Image likelihood decision the object classes given the class Csurka et al. 2004
  • 56. Case #2: Hierarchical Bayesian text models Probabilistic Latent Semantic Analysis (pLSA) d z w N D Hoffman, 2001 Latent Dirichlet Allocation (LDA) c π z w N D Blei et al., 2001
  • 57. Case #2: Hierarchical Bayesian text models Probabilistic Latent Semantic Analysis (pLSA) d z w N D “face” Sivic et al. ICCV 2005
  • 58. Case #2: Hierarchical Bayesian text models “beach” Latent Dirichlet Allocation (LDA) c π z w N D Fei-Fei et al. ICCV 2005
  • 59. Another application • Human action classification
  • 60. Invariance issues • Scale and rotation – Implicit – Detectors and descriptors Kadir and Brady. 2003
  • 61. Invariance issues • Scale and rotation • Occlusion – Implicit in the models – Codeword distribution: small variations – (In theory) Theme (z) distribution: different occlusion patterns
  • 62. Invariance issues • Scale and rotation • Occlusion • Translation – Encode (relative) location information Sudderth et al. 2005
  • 63. Invariance issues • Scale and rotation • Occlusion • Translation • View point (in theory) – Codewords: detector and descriptor – Theme distributions: different view points Fergus et al. 2005
  • 64. Model properties Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted pointbrain, to visual sensory, by point • Intuitive visual, perception, centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the retinal, cerebral cortex, image in the eye was projected. Through the – Analogy to documents discoveries of Hubelcell, optical eye, and Wiesel we now know that behind the origin of the visual nerve, image perception in the brain there is a considerably Hubel, Wiesel more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step- wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
  • 65. Model properties • Intuitive • (Could use) generative models – Convenient for weakly- or un-supervised training – Prior information – Hierarchical Bayesian framework Sivic et al., 2005, Sudderth et al., 2005
  • 66. Model properties • Intuitive • (Could use) generative models • Learning and recognition relatively fast – Compare to other methods
  • 67. Weakness of the model • No rigorous geometric information of the object components • It’s intuitive to most of us that objects are made of parts – no such information • Not extensively tested yet for – View point invariance – Scale invariance • Segmentation and localization unclear
• 68. Part-based models. Slides courtesy of Rob Fergus for “part-based models”
  • 69. One-shot learning Fei-Fei et al. ‘03, ‘04, ‘06 of object categories
  • 70. P. Bruegel, 1562 One-shot learning Fei-Fei et al. ‘03, ‘04, ‘06 of object categories
  • 71. model representation One-shot learning Fei-Fei et al. ‘03, ‘04, ‘06 of object categories
• 72. X (location): (x,y) coords. of region center. A (appearance): normalize the 11x11 patch, then project onto a PCA basis to get coefficients c1, c2, …, c10.
• 73. The Generative Model: parameters μX, ΓX (location) and μA, ΓA (appearance) generate hidden variable h and observations X, A in image I. X (location): (x,y) coords. of region center. A (appearance): normalized 11x11 patch projected onto a PCA basis (c1, c2, …, c10).
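A sketch of the appearance representation on these slides, with invented stand-ins (random patches, plain SVD-based PCA, a guessed zero-mean/unit-norm normalization) rather than the authors' actual training data or preprocessing:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in training set: 500 11x11 patches, flattened to 121-d vectors.
patches = rng.normal(size=(500, 121))

def normalize(p):
    # Zero-mean, unit-norm patch; the exact normalization is an assumption.
    p = p - p.mean()
    return p / (np.linalg.norm(p) + 1e-8)

X = np.array([normalize(p) for p in patches])

# PCA basis via SVD: the top 10 principal directions of the patch set.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
basis = Vt[:10]                          # shape (10, 121)

# Appearance descriptor A = (c1, ..., c10) for a new region's patch.
new_patch = normalize(rng.normal(size=121))
A = basis @ new_patch
print(A.shape)
```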
• 74. The Generative Model: μX, ΓX, μA, ΓA are parameters; h is a hidden variable; X and A are observed variables; I is the image.
• 75. The Generative Model (ML/MAP): one parameter set per category, θ1, θ2, …, θn, where θ = {µX, ΓX, µA, ΓA}. Weber et al. ’98 ’00, Fergus et al. ’03
• 76. The Generative Model (ML/MAP): shape model (µX, ΓX) and appearance model (µA, ΓA), where θ = {µX, ΓX, µA, ΓA}
• 77. The Generative Model (ML/MAP): each θi generates h and the observations X, A in image I
• 78. The Generative Model (Bayesian): hyperpriors m0X, β0X, a0X, B0X and m0A, β0A, a0A, B0A sit above the parameters μX, ΓX, μA, ΓA, which generate h and the observations X, A in image I. Parameters to estimate: {mX, βX, aX, BX, mA, βA, aA, BA}, i.e. parameters of the Normal-Wishart distribution. Fei-Fei et al. ‘03, ‘04, ‘06
• 79. The Generative Model: priors (m0X, β0X, a0X, B0X; m0A, β0A, a0A, B0A) → parameters (μX, ΓX, μA, ΓA) → h, X, A, I. Fei-Fei et al. ‘03, ‘04, ‘06
• 80. The Generative Model: the prior distribution sits at the top of the hierarchy, above the parameters. Fei-Fei et al. ‘03, ‘04, ‘06
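For reference, the Normal-Wishart prior named on slide 78 has the standard conjugate form (writing Γ for a precision matrix); the exact hyperparameter conventions in the papers may differ:

```latex
p(\mu, \Gamma \mid m_0, \beta_0, a_0, B_0)
  = \mathcal{N}\!\big(\mu \mid m_0, (\beta_0 \Gamma)^{-1}\big)\,
    \mathcal{W}\!\big(\Gamma \mid a_0, B_0\big)
```

Conjugacy means the posterior keeps this same functional form with updated {m, β, a, B}, which is why those are the "parameters to estimate" on slide 78.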
• 81. One-shot learning of object categories: 1. human vision 2. model representation 3. learning & inferences 4. evaluation & dataset & application
  • 82. learning & inferences No labeling No segmentation No alignment One-shot learning Fei-Fei et al. 2003, 2004, 2006 of object categories
  • 83. learning & inferences Bayesian θ1 θ2 θn One-shot learning Fei-Fei et al. 2003, 2004, 2006 of object categories
• 84. Variational EM: random initialization → E-Step ↔ M-Step → new estimate of p(θ|train), combining the data with prior knowledge of p(θ). Attias, Jordan, Hinton etc.
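To make the E-step/M-step alternation concrete, here is plain maximum-likelihood EM for a 1-D two-component Gaussian mixture; the variational Bayesian version on the slide additionally carries a prior p(θ) and a distribution over θ, but the loop has the same shape. All data and settings are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two Gaussians with means -2 and +3.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu = rng.normal(size=2)          # random initialization, as in the diagram
sigma = np.ones(2)
pi = np.array([0.5, 0.5])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibilities r[i, k] = p(component k | x_i).
    lik = np.stack([gauss(x, mu[k], sigma[k]) for k in range(2)], axis=1)
    r = pi * lik
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixture parameters from the responsibilities.
    nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(x)

print(sorted(np.round(mu, 1)))
```

The recovered means should land close to the true -2 and +3.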
  • 85. evaluation & dataset One-shot learning Fei-Fei et al. 2004, 2006a, 2006b of object categories
  • 86. evaluation & dataset -- Caltech 101 Dataset One-shot learning Fei-Fei et al. 2004, 2006a, 2006b of object categories
  • 87. evaluation & dataset -- Caltech 101 Dataset One-shot learning Fei-Fei et al. 2004, 2006a, 2006b of object categories
• 89. Discriminative methods. Object detection and recognition is formulated as a classification problem: the image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not (e.g., “Where are the screens?”). Each window is represented as a bag of image patches in some feature space, and a decision boundary separates the computer-screen class from the background.
• 90. Discriminative vs. generative (plots over x = data) • Generative model, p(x|y) (“the artist”) • Discriminative model, p(y|x) (“the lousy painter”) • Classification function, f(x) ∈ {-1, +1}
• 91. Discriminative methods. Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; … Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; … Support Vector Machines and Kernels: Guyon, Vapnik; Heisele, Serre, Poggio, 2001; … Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
• 92. Formulation: binary classification. Features: x = x1, x2, x3, …, xN (training data) and xN+1, xN+2, …, xN+M (test data). Labels: y = -1, +1, -1, -1 for training; ?, ?, ? for test. Training data: each image patch is labeled as containing the object or background. • Classification function: a function belonging to some family of functions • Minimize misclassification error (Not that simple: we need some guarantees that there will be generalization)
  • 93. Overview of section • Object detection with classifiers • Boosting – Gentle boosting – Weak detectors – Object model – Object detection • Multiclass object detection
• 94. Why boosting? • A simple algorithm for learning robust classifiers – Freund & Schapire, 1995 – Friedman, Hastie, Tibshirani, 1998 • Provides an efficient algorithm for sparse visual feature selection – Tieu & Viola, 2000 – Viola & Jones, 2003 • Easy to implement; does not require external optimization tools.
• 95. Boosting • Defines a classifier using an additive model: the strong classifier is a weighted sum of weak classifiers applied to the feature vector, H(x) = Σt αt ht(x)
• 96. Boosting • Defines a classifier using an additive model: H(x) = Σt αt ht(x) • We need to define a family of weak classifiers ht to choose from
• 97. Boosting • It is a sequential procedure. Each data point xt has a class label yt ∈ {+1, -1} and a weight wt = 1.
• 98. Toy example. Weak learners from the family of lines. Each data point has a class label yt ∈ {+1, -1} and a weight wt = 1. For a line h that splits the data arbitrarily, p(error) = 0.5: it is at chance.
• 99. Toy example. Each data point has a class label yt ∈ {+1, -1} and a weight wt = 1. This one seems to be the best. This is a ‘weak classifier’: it performs slightly better than chance.
• 100. Toy example. Each data point has a class label yt ∈ {+1, -1}. We update the weights: wt ← wt exp{-yt Ht}. We set a new problem for which the previous weak classifier performs at chance again.
• 101. Toy example. Each data point has a class label yt ∈ {+1, -1}. We update the weights: wt ← wt exp{-yt Ht}. We set a new problem for which the previous weak classifier performs at chance again.
• 102. Toy example. Each data point has a class label yt ∈ {+1, -1}. We update the weights: wt ← wt exp{-yt Ht}. We set a new problem for which the previous weak classifier performs at chance again.
• 103. Toy example. Each data point has a class label yt ∈ {+1, -1}. We update the weights: wt ← wt exp{-yt Ht}. We set a new problem for which the previous weak classifier performs at chance again.
• 104. Toy example. The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f1, f2, f3, f4.
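The toy example above can be reproduced in a few lines of AdaBoost with axis-aligned threshold ("line") weak learners; the dataset and round count are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two Gaussian blobs: class +1 around (1, 1), class -1 around (-1, -1).
X = np.vstack([rng.normal(1.0, 0.8, size=(100, 2)),
               rng.normal(-1.0, 0.8, size=(100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])

w = np.ones(len(y)) / len(y)          # uniform initial weights
stumps = []                           # (feature, threshold, sign, alpha)

for _ in range(20):
    best = None
    # Weak learner: the best axis-aligned threshold under current weights.
    for f in range(2):
        for t in np.linspace(-2.5, 2.5, 21):
            for s in (1, -1):
                pred = s * np.where(X[:, f] > t, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, f, t, s)
    err, f, t, s = best
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    stumps.append((f, t, s, alpha))
    # Reweight: misclassified points become heavier (wt <- wt exp{-a yt ht}).
    pred = s * np.where(X[:, f] > t, 1, -1)
    w = w * np.exp(-alpha * y * pred)
    w = w / w.sum()

# Strong (non-linear) classifier: weighted vote of the weak (linear) ones.
H = sum(a * s * np.where(X[:, f] > t, 1, -1) for f, t, s, a in stumps)
train_acc = (np.sign(H) == y).mean()
print(round(train_acc, 2))
```

Note the two ingredients from the slides: the exponential weight update after each round, and the final strong classifier as a weighted combination of weak ones.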
• 105. From images to features: weak detectors. We will now define a family of visual features that can be used as weak classifiers (“weak detectors”). Each one takes an image as input and outputs a binary response; such a feature is a weak detector.
• 106. Weak detectors. Textures of textures, Tieu and Viola, CVPR 2000. Every combination of three filters generates a different feature. This gives thousands of features. Boosting selects a sparse subset, so computations at test time are very efficient. Boosting also avoids overfitting to some extent.
  • 107. Weak detectors Haar filters and integral image Viola and Jones, ICCV 2001 The average intensity in the block is computed with four sums independently of the block size.
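The integral-image trick on this slide is easy to verify directly; a minimal sketch with a random image (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.int64)

# Integral image with a zero row/column pad: ii[r, c] = img[:r, :c].sum(),
# so the four-corner lookup below needs no boundary cases.
ii = np.zeros((65, 65), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def block_sum(r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] via four lookups, independent of block size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

print(block_sum(10, 10, 30, 30) == img[10:30, 10:30].sum())  # prints True
```

Dividing the four-lookup sum by the block area gives the average intensity, which is what makes Haar-like filters cheap at every scale.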
  • 108. Weak detectors Other weak detectors: • Carmichael, Hebert 2004 • Yuille, Snow, Nitzbert, 1998 • Amit, Geman 1998 • Papageorgiou, Poggio, 2000 • Heisele, Serre, Poggio, 2001 • Agarwal, Awan, Roth, 2004 • Schneiderman, Kanade 2004 • …
  • 109. Weak detectors Part based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location Car model Screen model These features are used for the detector on the course web site.
  • 110. Weak detectors First we collect a set of part templates from a set of training objects. Vidal-Naquet, Ullman (2003) …
• 111. Weak detectors. We now define a family of “weak detectors”: correlate the image with a part template and threshold the response. Better than chance.
• 112. Weak detectors. We can do a better job using filtered images: filter the image first, then correlate with the template. Still a weak detector, but better than before.
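A sketch of such a correlation-based weak detector, with an invented random template planted into a noise image; real weak detectors would use learned part templates (and, as the slide says, filtered images):

```python
import numpy as np

rng = np.random.default_rng(0)
template = rng.normal(size=(5, 5))        # hypothetical part template
img = 0.5 * rng.normal(size=(40, 40))
img[20:25, 12:17] += 2.0 * template       # plant the part at row 20, col 12

tz = template - template.mean()
tz /= np.linalg.norm(tz)

th, tw = template.shape
resp = np.zeros((img.shape[0] - th + 1, img.shape[1] - tw + 1))
for r in range(resp.shape[0]):
    for c in range(resp.shape[1]):
        pz = img[r:r + th, c:c + tw]
        pz = pz - pz.mean()
        # Normalized cross-correlation of template and window.
        resp[r, c] = (tz * pz).sum() / (np.linalg.norm(pz) + 1e-8)

# Thresholding the response map turns it into a binary weak detector.
detections = resp > 0.8
peak = np.unravel_index(resp.argmax(), resp.shape)
print(peak)
```

The response peaks where the part was planted; everywhere else the normalized correlation of random windows hovers near zero, so the threshold fires rarely but better than chance.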
  • 113. Training First we evaluate all the N features on all the training images. Then, we sample the feature outputs on the object center and at random locations in the background:
• 114. Representation and object model. Selected features for the screen detector (features 1, 2, 3, 4, …, 10, …, 100): the “lousy painter” at work.
• 115. Representation and object model. Selected features for the car detector (features 1, 2, 3, 4, …, 10, …, 100).
  • 116. Overview of section • Object detection with classifiers • Boosting – Gentle boosting – Weak detectors – Object model – Object detection • Multiclass object detection
  • 117. Example: screen detection Feature output
  • 118. Example: screen detection Feature Thresholded output output Weak ‘detector’ Produces many false alarms.
  • 119. Example: screen detection Feature Thresholded Strong classifier output output at iteration 1
  • 120. Example: screen detection Feature Thresholded Strong output output classifier Second weak ‘detector’ Produces a different set of false alarms.
  • 121. Example: screen detection Feature Thresholded Strong output output classifier + Strong classifier at iteration 2
  • 122. Example: screen detection Feature Thresholded Strong output output classifier + … Strong classifier at iteration 10
  • 123. Example: screen detection Feature Thresholded Strong output output classifier + … Adding features Final classification Strong classifier at iteration 200
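Putting the pieces together, the scanning-window loop from slide 89 scored by a strong classifier looks roughly like this; the `strong_classifier` here is a trivial stand-in (window mean on a synthetic image), not the boosted sum of weak-detector responses used in the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(60, 60))
img[24:40, 30:46] += 1.5          # a bright 16x16 "object"

def strong_classifier(window):
    # Stand-in score: window mean. In the slides this would be the
    # boosted sum H(x) of weak-detector responses for the window.
    return window.mean()

win, stride = 16, 4
scores = []
for r in range(0, img.shape[0] - win + 1, stride):
    for c in range(0, img.shape[1] - win + 1, stride):
        scores.append((strong_classifier(img[r:r + win, c:c + win]), r, c))

# Any window whose score clears the decision threshold is a detection.
detections = [(r, c) for s, r, c in scores if s > 0.75]
best_score, best_r, best_c = max(scores)
print(best_r, best_c)
```

The top-scoring window lands on the planted object; in a real detector this scan runs at multiple scales, followed by non-maximum suppression over overlapping detections.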
  • 128. Document Analysis Digit recognition, AT&T labs http://www.research.att.com/~yann/