Object Recognition with
     Deformable Models
            Pedro F. Felzenszwalb
        Department of Computer Science
            University of Chicago



Joint work with: Dan Huttenlocher, Joshua Schwartz,
         David McAllester, Deva Ramanan.
Example Problems
•   Detecting rigid objects
•   PASCAL challenge
•   Detecting non-rigid objects
•   Medical image analysis
•   Segmenting cells
Deformable Models
•   Significant challenge:
    - Handling variation in appearance within object classes
    - Non-rigid objects, generic categories, etc.
•   Deformable models approach:
    - Consider each object as a deformed version of a template
    - Compact representation
    - Leads to interesting modeling and algorithmic problems
Overview
•   Part I: Pictorial Structures
    - Deformable part models
    - Highly efficient matching algorithms
•   Part II: Deformable Shapes
    - Triangulated polygons
    - Hierarchical models
•   Part III: The PASCAL Challenge
    - Recognizing 20 object categories in realistic scenes
    - Discriminatively trained, multiscale, deformable part models
Part I: Pictorial Structures

•   Introduced by Fischler and Elschlager in 1973

•   Part-based models:
    - Each part represents local visual properties
    - “Springs” capture spatial relationships

•   Matching model to image involves joint optimization of part locations
    - “Stretch and fit”
Local Evidence + Global Decision

•   Parts have a match quality at each image location

•   Local evidence is noisy
    - Parts are detected in the context of the whole model

[Figure: part, test image, match quality]
Matching Problem

•   Model is represented by a graph $G = (V, E)$
    - $V = \{v_1, \ldots, v_n\}$ are the parts
    - $(v_i, v_j) \in E$ indicates a connection between parts

•   $m_i(l_i)$ is a cost for placing part $i$ at location $l_i$

•   $d_{ij}(l_i, l_j)$ is a deformation cost

•   Optimal configuration for the object is $L = (l_1, \ldots, l_n)$ minimizing

    $$E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$
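
A minimal sketch of evaluating E(L) for one configuration. The toy costs, the chain of three parts, and the quadratic `deform_cost` are illustrative assumptions, not values from the talk.

```python
def energy(locations, match_cost, deform_cost, edges):
    """E(L) = sum_i m_i(l_i) + sum over edges (i, j) of d_ij(l_i, l_j)."""
    unary = sum(match_cost[i][l] for i, l in enumerate(locations))
    pairwise = sum(deform_cost(i, j, locations[i], locations[j])
                   for i, j in edges)
    return unary + pairwise

# Hypothetical toy model: 3 parts, 4 candidate locations on a 1-D grid.
match_cost = [[3, 0, 2, 5],   # m_0(l)
              [4, 4, 1, 3],   # m_1(l)
              [6, 2, 2, 0]]   # m_2(l)
edges = [(0, 1), (1, 2)]      # chain 0 - 1 - 2

def deform_cost(i, j, li, lj):
    # Assumed spring: connected parts prefer an offset of +1.
    return (lj - li - 1) ** 2

e_good = energy([1, 2, 3], match_cost, deform_cost, edges)
e_bad = energy([0, 0, 0], match_cost, deform_cost, edges)
```

The second configuration pays both for poor local matches and for stretched springs, which is the trade-off the energy encodes.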
Matching Problem
    $$E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$

•   Assume $n$ parts, $k$ possible locations for each part
    - There are $k^n$ configurations $L$

•   If graph is a tree we can use dynamic programming
    - $O(nk^2)$ algorithm

•   If $d_{ij}(l_i, l_j) = g(l_i - l_j)$ we can use min-convolutions
    - $O(nk)$ algorithm
    - As fast as matching each part separately!
Dynamic Programming on Trees
    $$E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$

[Figure: part v_1 connected to leaf part v_2]

•   For each $l_1$ find best $l_2$:
    - $\mathrm{Best}_2(l_1) = \min_{l_2} \left[ m_2(l_2) + d_{12}(l_1, l_2) \right]$

•   “Delete” $v_2$ and solve problem with smaller model

•   Keep removing leaves until there is a single part left
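
The leaf-removal procedure can be sketched as a recursion over a rooted tree. This is an illustrative O(nk²) version under assumed data structures (`match_cost` as a list of lists, `children` as adjacency lists, a callable `deform_cost`); the toy numbers are hypothetical.

```python
def match_tree(match_cost, children, deform_cost, root=0):
    """Minimize E(L) over a tree-structured model by dynamic programming.

    match_cost[i][l]          : m_i(l)
    children[i]               : child parts of part i
    deform_cost(i, j, li, lj) : d_ij(li, lj)
    Returns (minimum energy, optimal location of every part).
    """
    n, k = len(match_cost), len(match_cost[0])
    arg = [None] * n  # arg[c][lp]: best location of child c given parent at lp

    def solve(i):
        # total[l] = m_i(l) + sum over children c of Best_c(l)
        total = list(match_cost[i])
        for c in children[i]:
            sub = solve(c)
            arg[c] = []
            for lp in range(k):
                costs = [sub[lc] + deform_cost(i, c, lp, lc) for lc in range(k)]
                lc_best = min(range(k), key=costs.__getitem__)
                arg[c].append(lc_best)
                total[lp] += costs[lc_best]
        return total

    root_total = solve(root)
    locations = [None] * n
    locations[root] = min(range(k), key=root_total.__getitem__)
    stack = [root]
    while stack:  # trace the stored argmins back down the tree
        i = stack.pop()
        for c in children[i]:
            locations[c] = arg[c][locations[i]]
            stack.append(c)
    return root_total[locations[root]], locations

# Hypothetical toy model: chain 0 -> 1 -> 2, 4 locations per part.
match_cost = [[3, 0, 2, 5], [4, 4, 1, 3], [6, 2, 2, 0]]
children = [[1], [2], []]

def deform_cost(i, j, li, lj):
    return (lj - li - 1) ** 2  # assumed spring with rest offset +1

cost, locations = match_tree(match_cost, children, deform_cost)
```

Each child's table is built once with a k-by-k inner loop, which is where the k² factor comes from.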
Min-Convolution Speedup
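    $$\mathrm{Best}_2(l_1) = \min_{l_2} \left[ m_2(l_2) + d_{12}(l_1, l_2) \right]$$

[Figure: part v_1 connected to leaf part v_2]

•   Brute force: $O(k^2)$, where $k$ is the number of locations

•   Suppose $d_{12}(l_1, l_2) = g(l_1 - l_2)$:
    - $\mathrm{Best}_2(l_1) = \min_{l_2} \left[ m_2(l_2) + g(l_1 - l_2) \right]$

•   Min-convolution: $O(k)$ if $g$ is convex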
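
As one concrete instance, here is a sketch of a linear-time min-convolution for the special case g(x) = x² on a 1-D grid, using a lower envelope of parabolas. The choice of a quadratic g and the function names are assumptions for illustration; the input costs must be finite.

```python
from math import inf

def min_convolution_quadratic(m):
    """Return D with D[p] = min_q (m[q] + (p - q)**2) in O(k) time."""
    k = len(m)
    v = [0] * k            # grid positions of parabolas on the lower envelope
    z = [0.0] * (k + 1)    # z[j]..z[j+1] is the range where parabola v[j] is lowest
    z[0], z[1] = -inf, inf
    j = 0
    for q in range(1, k):
        while True:
            p = v[j]
            # Intersection of the parabolas rooted at p and q.
            s = ((m[q] + q * q) - (m[p] + p * p)) / (2 * q - 2 * p)
            if s <= z[j]:
                j -= 1     # parabola v[j] is hidden by the new one
            else:
                break
        j += 1
        v[j] = q
        z[j] = s
        z[j + 1] = inf
    out = [0] * k
    j = 0
    for p in range(k):
        while z[j + 1] < p:
            j += 1
        out[p] = (p - v[j]) ** 2 + m[v[j]]
    return out

def min_convolution_brute(m):
    """Reference O(k^2) version of the same minimization."""
    k = len(m)
    return [min(m[q] + (p - q) ** 2 for q in range(k)) for p in range(k)]

m2 = [5, 0, 7, 3, 9, 1]   # hypothetical match costs m_2(l_2)
best2 = min_convolution_quadratic(m2)
```

Every parabola is pushed onto and popped from the envelope at most once, so the total work is linear in the number of locations.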
Finding Motorbikes

Model with 6 parts:
    - 2 wheels
    - 2 headlights
    - Front and back of seat
Human Pose Estimation
Human Tracking




Ramanan, Forsyth, Zisserman. Tracking People by Learning their Appearance.
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Jan 2007.
Part II: Deformable Shapes
•   Shape is a fundamental cue for recognizing objects

•   Many objects have no well defined parts
    - We can capture their outlines using deformable models
Triangulated Polygons




•   Polygonal templates

•   Delaunay triangulation gives natural decomposition of an object

•   Consider deforming each triangle “independently”
    - Rabbit ear can be bent by changing shape of a single triangle
Structure of Triangulated Polygons


•   There are 2 graphs associated with a triangulated polygon

•   If the polygon is simple (no holes):
    - Dual graph is a tree
    - Graphical structure of triangulation is a 2-tree
Deformable Matching
•   Consider piecewise affine maps from model to image (taking triangles to triangles)

•   Find globally optimal deformation using dynamic programming over 2-tree

[Figure: model; matching to MRI data]
Hierarchical Shape Model
•   Shape-tree of curve from a to b:
    -   Select midpoint c, store relative location c | a,b.
    -   Left child is a shape-tree of sub-curve from a to c.
    -   Right child is a shape-tree of sub-curve from c to b.

[Figure: curve from a to b through sample points a, f, e, g, c, h, d, i, b, and its shape-tree]

    c | a,b
        e | a,c
            f | a,e
            g | e,c
        d | c,b
            h | c,d
            i | d,b
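
A minimal sketch of building a shape-tree and reconstructing the curve from it. Points are Python complex numbers, and the relative location "c | a,b" is encoded as the ratio (c - a) / (b - a); that encoding is an assumption chosen for illustration because it is unchanged by translation, rotation and uniform scaling. The sample curve is hypothetical.

```python
def build_shape_tree(points, lo=0, hi=None):
    """Shape-tree of the open sub-curve points[lo..hi].

    Assumes the endpoints of every sub-curve are distinct.
    """
    if hi is None:
        hi = len(points) - 1
    if hi - lo < 2:
        return None                      # no interior sample point left
    mid = (lo + hi) // 2
    a, b, c = points[lo], points[hi], points[mid]
    return {
        "rel": (c - a) / (b - a),        # assumed encoding of c | a,b
        "left": build_shape_tree(points, lo, mid),
        "right": build_shape_tree(points, mid, hi),
    }

def reconstruct(tree, a, b):
    """Interior points of the curve from a to b described by the tree."""
    if tree is None:
        return []
    c = a + tree["rel"] * (b - a)
    return reconstruct(tree["left"], a, c) + [c] + reconstruct(tree["right"], c, b)

# Nine sample points, in the order a, f, e, g, c, h, d, i, b.
curve = [0 + 0j, 1 + 1j, 2 + 3j, 3 + 2j, 4 + 4j, 5 + 3j, 6 + 1j, 7 + 2j, 8 + 0j]
tree = build_shape_tree(curve)
rebuilt = [curve[0]] + reconstruct(tree, curve[0], curve[-1]) + [curve[-1]]

# Placing the endpoints elsewhere yields a rotated and scaled copy.
moved = [2j * p + (1 + 1j) for p in curve]
rebuilt_moved = [moved[0]] + reconstruct(tree, moved[0], moved[-1]) + [moved[-1]]
```

Because only relative locations are stored, the same tree reproduces the curve for any placement of its two endpoints.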
Deformations

•   Independently perturb relative locations stored in a shape-tree
    -   Local and global properties are preserved
    -   Reconstructed curve is perceptually similar to original
Matching

[Figure: model curve and its shape-tree, with node w = e | a,c and children v = f | a,e and u = g | e,c; image curve with points p, q, r]

Match(v, [p,q]) = w1
Match(u, [q,r]) = w2
Match(w, [p,r]) = w1 + w2 + dif((e|a,c), (q|p,r))

         similar to parsing with the CKY algorithm
Recognizing Leaves




Nearest neighbor classification
(15 species, 75 examples per species: 25 training, 50 test)

    Method            Result
    Shape-tree        96.28
    Inner distance    94.13
    Shape context     88.12
Part III: PASCAL Challenge
•   ~10,000 images, with ~25,000 target objects
    - Objects from 20 categories (person, car, bicycle, cow, table...)
    - Objects are annotated with labeled bounding boxes
Model Overview




[Figure: detection, root filter, part filters, deformation models]

Model has a root filter plus deformable parts
Histogram of Gradient (HOG) Features




•   Image is partitioned into 8x8 pixel blocks

•   In each block we compute a histogram of gradient orientations
    - Invariant to changes in lighting, small deformations, etc.
•   We compute features at different resolutions (pyramid)
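
A simplified sketch of per-block orientation histograms in pure Python. Unsigned orientations, 9 bins, hard binning and the absence of normalization are all assumptions made to keep the example short; this is not the feature pipeline used in the talk.

```python
import math

def hog_cells(image, cell=8, bins=9):
    """Histogram of gradient orientations for each cell x cell pixel block.

    image: list of rows of grayscale values.
    Returns hist[cy][cx][bin], weighted by gradient magnitude.
    """
    h, w = len(image), len(image[0])
    ny, nx = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(nx)] for _ in range(ny)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # centered differences
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            cy, cx = y // cell, x // cell
            if mag == 0 or cy >= ny or cx >= nx:
                continue
            angle = math.atan2(gy, gx) % math.pi      # unsigned, in [0, pi)
            b = min(int(angle / math.pi * bins), bins - 1)
            hist[cy][cx][b] += mag
    return hist

# Hypothetical 16x16 test image: intensity increases left to right.
ramp = [[x for x in range(16)] for _ in range(16)]
brighter = [[v + 50 for v in row] for row in ramp]
cells = hog_cells(ramp)
cells_brighter = hog_cells(brighter)
```

Since only differences of neighboring pixels enter the histograms, adding a constant brightness offset leaves them unchanged.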
Filters

•   Filters are rectangular templates defining weights for features

•   Score is dot product of filter and subwindow of HOG pyramid


•   Score of H at this location is H ⋅ W

[Figure: HOG pyramid, with H and W marked at one location]
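
A small sketch of scoring a filter against every subwindow of one feature map (a single pyramid level). The nested-list layout `features[y][x][d]` and the names `filt`, `filter_score` and `score_map` are assumptions for illustration.

```python
def filter_score(features, filt, x0, y0):
    """Dot product of the filter with the subwindow whose top-left cell is (x0, y0)."""
    return sum(
        weight * features[y0 + dy][x0 + dx][d]
        for dy, row in enumerate(filt)
        for dx, cell_weights in enumerate(row)
        for d, weight in enumerate(cell_weights)
    )

def score_map(features, filt):
    """Filter score at every placement that fits inside the feature map."""
    fh, fw = len(filt), len(filt[0])
    height, width = len(features), len(features[0])
    return [[filter_score(features, filt, x, y)
             for x in range(width - fw + 1)]
            for y in range(height - fh + 1)]

# Hypothetical 3x3 feature map with 2 features per cell, and a 2x2 filter.
features = [[[y, x] for x in range(3)] for y in range(3)]
filt = [[[1, 0], [0, 1]],
        [[1, 1], [0, 0]]]
scores = score_map(features, filt)
```

The resulting map plays the role of the per-location match quality used in Part I.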
Object Hypothesis




•   Score is sum of filter scores plus deformation scores

[Figure: image pyramid and HOG feature pyramid]

•   Multiscale model captures features at two resolutions
Training
•   Training data consists of images with labeled bounding boxes

•   Need to learn the model structure, filters and deformation costs




Connection With Linear Classifiers
 •   Score of model is sum of filter scores plus deformation scores
     - Bounding box in training data specifies that score should be
       high for some placement in a range


 •   Notation:
     - w is a model
     - x is a detection window
     - z are filter placements

 •   [Equation not recovered; its two factors are labeled “concatenation of filters and deformation parameters” and “concatenation of features and part displacements”]
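
The score formula on this slide was an image and did not survive extraction. One reading consistent with the labels on this slide, where the names $f_w$ and $\Phi$ are assumptions rather than symbols taken from the slide text:

```latex
f_w(x) = \max_{z} \; w \cdot \Phi(x, z)
```

Under this reading, $w$ is the concatenation of filters and deformation parameters, $\Phi(x, z)$ is the concatenation of features and part displacements for placement $z$, and the maximum over $z$ expresses that the score should be high for some placement.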
Latent SVMs


•   Linear in w if z is fixed

•   [Equation not recovered; its two terms are labeled “Regularization” and “Hinge loss”]
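
A sketch of an objective with a regularization term and a hinge-loss term on a score maximized over latent placements. The exact form used here (½‖w‖² plus C times the summed hinge loss), and the names `phi`, `placements` and `C`, are assumptions in the style of a standard SVM, not a transcription of the slide.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def latent_score(w, x, placements, phi):
    """Score of x: best w . phi(x, z) over the allowed latent placements z."""
    return max(dot(w, phi(x, z)) for z in placements(x))

def latent_svm_objective(w, examples, placements, phi, C=1.0):
    """Assumed objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    regularization = 0.5 * dot(w, w)
    hinge = sum(max(0.0, 1.0 - y * latent_score(w, x, placements, phi))
                for x, y in examples)
    return regularization + C * hinge

# Hypothetical toy problem: x is a list of responses, z picks one of them.
def phi(x, z):
    return [x[z], 1.0]          # feature at the chosen placement, plus a bias

def placements(x):
    return range(len(x))

w = [1.0, -1.0]
examples = [([0.0, 3.0, 1.0], +1),   # positive example
            ([0.5, 0.2], -1)]        # negative example
objective = latent_svm_objective(w, examples, placements, phi)
```

With z held fixed, `dot(w, phi(x, z))` is linear in w, so each term reduces to an ordinary hinge loss.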
Learned Models
[Figure: learned models for Bicycle, Sofa, Car, Bottle]
Example Results
More Results
Overall Results

•   9 systems competed in the 2007 challenge

•   Out of 20 classes we get:
    - First place in 10 classes
    - Second place in 6 classes
•   Some statistics:
    - It takes ~2 seconds to evaluate a model in one image
    - It takes ~3 hours to train a model
    - MUCH faster than most systems
Component Analysis

[Figure: precision vs. recall, PASCAL2006 Person]
    - Root (0.18)
    - Root+Latent (0.24)
    - Parts+Latent (0.29)
    - Root+Parts+Latent (0.34)
Summary

•   Deformable models provide an elegant framework for object
    detection and recognition

    - Efficient algorithms for matching models to images
    - Applications: pose estimation, medical image analysis,
      object recognition, etc.

•   We can learn models from partially labeled data

    - Generalized standard ideas from machine learning
    - Leads to state-of-the-art results in PASCAL challenge
•   Future work: hierarchical models, grammars, 3D objects

Fish and Chips - have they had their chips
GeoBlogs
 

Recently uploaded (20)

Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 

Object Recognition with Deformable Models

  • 1. Object Recognition with Deformable Models Pedro F. Felzenszwalb Department of Computer Science University of Chicago Joint work with: Dan Huttenlocher, Joshua Schwartz, David McAllester, Deva Ramanan.
  • 2. Example Problems • Detecting rigid objects • PASCAL challenge • Detecting non-rigid objects • Medical image analysis • Segmenting cells
  • 3. Deformable Models • Significant challenge: - Handling variation in appearance within object classes - Non-rigid objects, generic categories, etc. • Deformable models approach: - Consider each object as a deformed version of a template - Compact representation - Leads to interesting modeling and algorithmic problems
  • 4. Overview • Part I: Pictorial Structures - Deformable part models - Highly efficient matching algorithms • Part II: Deformable Shapes - Triangulated polygons - Hierarchical models • Part III: The PASCAL Challenge - Recognizing 20 object categories in realistic scenes - Discriminatively trained, multiscale, deformable part models
  • 5. Part I: Pictorial Structures • Introduced by Fischler and Elschlager in 1973 • Part-based models: - Each part represents local visual properties - “Springs” capture spatial relationships • Matching model to image involves joint optimization of part locations: “stretch and fit”
  • 6. Local Evidence + Global Decision • Parts have a match quality at each image location • Local evidence is noisy - Parts are detected in the context of the whole model [Figure: part, test image, match quality]
  • 7. Matching Problem • Model is represented by a graph $G = (V, E)$ - $V = \{v_1, \ldots, v_n\}$ are the parts - $(v_i, v_j) \in E$ indicates a connection between parts • $m_i(l_i)$ is a cost for placing part i at location $l_i$ • $d_{ij}(l_i, l_j)$ is a deformation cost • Optimal configuration for the object is $L = (l_1, \ldots, l_n)$ minimizing $E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$
  • 8. Matching Problem $E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$ • Assume n parts, k possible locations for each part - There are $k^n$ configurations L • If graph is a tree we can use dynamic programming - $O(nk^2)$ algorithm • If $d_{ij}(l_i, l_j) = g(l_i - l_j)$ we can use min-convolutions - $O(nk)$ algorithm - As fast as matching each part separately!
  • 9. Dynamic Programming on Trees $E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$ • For each $l_1$ find best $l_2$: - $\mathrm{Best}_2(l_1) = \min_{l_2} [m_2(l_2) + d_{12}(l_1, l_2)]$ • “Delete” $v_2$ and solve problem with smaller model • Keep removing leaves until there is a single part left [Figure: parts $v_1$ and $v_2$]
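
The leaf-removal recursion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: locations are taken to be the integers 0..k-1, the tree is given as a hypothetical `children` map, and `match_tree`, `m`, and `d` are names introduced here. Each child is folded into its parent by computing Best_c(l_parent) for every parent location, which costs O(k^2) per edge and O(nk^2) overall.

```python
from typing import Callable, Dict, List, Tuple


def match_tree(
    children: Dict[int, List[int]],
    root: int,
    k: int,
    m: Callable[[int, int], float],
    d: Callable[[int, int, int, int], float],
) -> Tuple[float, Dict[int, int]]:
    """Minimize E(L) = sum_i m(i, l_i) + sum_edges d(i, j, l_i, l_j) on a tree."""
    best: Dict[int, List[float]] = {}            # best[v][l]: cost of v's subtree with v at l
    arg: Dict[Tuple[int, int], List[int]] = {}   # arg[(p, c)][lp]: best location of child c

    def solve(v: int) -> None:
        cost = [m(v, l) for l in range(k)]
        for c in children.get(v, []):
            solve(c)
            choice = []
            for lp in range(k):
                # Best_c(lp) = min over lc of [subtree cost of c at lc + d(lp, lc)]
                lc = min(range(k), key=lambda x: best[c][x] + d(v, c, lp, x))
                choice.append(lc)
                cost[lp] += best[c][lc] + d(v, c, lp, lc)
            arg[(v, c)] = choice
        best[v] = cost

    solve(root)
    l_root = min(range(k), key=lambda l: best[root][l])
    placement = {root: l_root}
    stack = [root]
    while stack:                                 # trace the stored choices back down the tree
        v = stack.pop()
        for c in children.get(v, []):
            placement[c] = arg[(v, c)][placement[v]]
            stack.append(c)
    return best[root][l_root], placement


# Hypothetical 3-part chain 0 - 1 - 2 with 4 locations and quadratic "springs".
unary = [[3, 0, 2, 5], [4, 4, 0, 1], [0, 6, 6, 6]]
energy, placement = match_tree(
    {0: [1], 1: [2]}, 0, 4,
    m=lambda i, l: unary[i][l],
    d=lambda i, j, li, lj: (li - lj) ** 2,
)
```

The recursion depth equals the depth of the tree, which is fine for models with a handful of parts; a very deep model would need an iterative traversal.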
  • 10. Min-Convolution Speedup $\mathrm{Best}_2(l_1) = \min_{l_2} [m_2(l_2) + d_{12}(l_1, l_2)]$ • Brute force: $O(k^2)$, where k is the number of locations • Suppose $d_{12}(l_1, l_2) = g(l_1 - l_2)$: - $\mathrm{Best}_2(l_1) = \min_{l_2} [m_2(l_2) + g(l_1 - l_2)]$ • Min-convolution: $O(k)$ if g is convex [Figure: parts $v_1$ and $v_2$]
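
To make the O(k) claim concrete, here is a minimal sketch for one particular convex cost, g(x) = c·|x| with c ≥ 0, on a 1-D grid of locations. The choice of g, the 1-D setting, and the name `min_convolution_linear` are assumptions made for illustration; the slide only requires g to be convex, and other convex costs such as the quadratic need a different linear-time algorithm that is not shown here.

```python
from typing import List


def min_convolution_linear(f: List[float], c: float) -> List[float]:
    """Return D with D[p] = min over q of (f[q] + c * |p - q|), in O(k) time.

    Assumes c >= 0. Two sweeps suffice because the best q for p lies either
    to the left (forward pass) or to the right (backward pass) of p.
    """
    D = list(f)
    for p in range(1, len(D)):             # forward pass: candidates q <= p
        D[p] = min(D[p], D[p - 1] + c)
    for p in range(len(D) - 2, -1, -1):    # backward pass: candidates q >= p
        D[p] = min(D[p], D[p + 1] + c)
    return D


result = min_convolution_linear([5, 1, 7, 3, 9], 2)
print(result)  # → [3, 1, 3, 3, 5]
```

Plugging a routine like this into the tree recursion in place of the inner minimization over child locations is what brings the per-edge cost from O(k^2) down to O(k).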
  • 11. Finding Motorbikes • Model with 6 parts: 2 wheels, 2 headlights, front & back of seat
  • 13. Human Tracking • Ramanan, Forsyth, Zisserman, “Tracking People by Learning their Appearance,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Jan 2007
  • 14. Part II: Deformable Shapes • Shape is a fundamental cue for recognizing objects • Many objects have no well defined parts - We can capture their outlines using deformable models
  • 15. Triangulated Polygons • Polygonal templates • Delaunay triangulation gives natural decomposition of an object • Consider deforming each triangle “independently” - Rabbit ear can be bent by changing shape of a single triangle
  • 16. Structure of Triangulated Polygons • There are 2 graphs associated with a triangulated polygon • If the polygon is simple (no holes): - Dual graph is a tree - Graphical structure of triangulation is a 2-tree
  • 17. Deformable Matching • Consider piecewise affine maps from model to image (taking triangles to triangles) • Find globally optimal deformation using dynamic programming over 2-tree [Figure: model; matching to MRI data]
  • 18. Hierarchical Shape Model • Shape-tree of curve from a to b: - Select midpoint c, store relative location c | a,b. - Left child is a shape-tree of sub-curve from a to c. - Right child is a shape-tree of sub-curve from c to b. [Figure: curve through points a, f, e, g, c, h, d, i, b and its shape-tree: root c | a,b; children e | a,c and d | c,b; grandchildren f | a,e, g | e,c, h | c,d, i | d,b]
  • 19. Deformations • Independently perturb relative locations stored in a shape-tree - Local and global properties are preserved - Reconstructed curve is perceptually similar to original
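
The shape-tree construction and the reconstruction step behind these deformations can be sketched as follows. Several things here are assumptions for illustration rather than details from the slides: points are stored as complex numbers, the "relative location c | a,b" is encoded as the coordinate of c in the frame that maps a to 0 and b to 1, the midpoint is picked by index, and the names `ShapeTree`, `build`, and `reconstruct` are hypothetical. The sketch also assumes an open curve whose sub-curve endpoints never coincide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShapeTree:
    rel: complex                    # location of midpoint c relative to endpoints a and b
    left: Optional["ShapeTree"]     # shape-tree of the sub-curve from a to c
    right: Optional["ShapeTree"]    # shape-tree of the sub-curve from c to b


def relative_location(a: complex, b: complex, c: complex) -> complex:
    # Assumed encoding: coordinates of c in the frame where a -> 0 and b -> 1.
    return (c - a) / (b - a)


def build(points: List[complex], i: int, j: int) -> Optional[ShapeTree]:
    """Shape-tree of the sub-curve points[i..j]; None if it has no interior point."""
    if j - i < 2:
        return None
    mid = (i + j) // 2
    return ShapeTree(
        relative_location(points[i], points[j], points[mid]),
        build(points, i, mid),
        build(points, mid, j),
    )


def reconstruct(tree: Optional[ShapeTree], a: complex, b: complex) -> List[complex]:
    """Interior points of the curve from a to b, in order along the curve."""
    if tree is None:
        return []
    c = a + tree.rel * (b - a)
    return reconstruct(tree.left, a, c) + [c] + reconstruct(tree.right, c, b)


curve = [0 + 0j, 1 + 1j, 2 + 0j, 3 + 2j, 4 + 0j]
tree = build(curve, 0, len(curve) - 1)
interior = reconstruct(tree, curve[0], curve[-1])
```

Under this encoding, a deformation amounts to perturbing the `rel` values before calling `reconstruct`; changing one node's `rel` moves that midpoint and carries its whole sub-curve along with it.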
  • 20. Matching • Match(v, [p,q]) = w1 • Match(u, [q,r]) = w2 • Match(w, [p,r]) = w1 + w2 + dif((e|a,c), (q|p,r)) • Similar to parsing with the CKY algorithm [Figure: model shape-tree (root c | a,b; children e | a,c and d | c,b; grandchildren f | a,e, g | e,c, h | c,d, i | d,b) with nodes u, v, w marked, and a curve with points p, q, r]
  • 21. Recognizing Leaves • Nearest neighbor classification • 15 species, 75 examples per species (25 training, 50 test) • Results by method: Shape-tree 96.28, Inner distance 94.13, Shape context 88.12
  • 22. Part III: PASCAL Challenge • ~10,000 images, with ~25,000 target objects - Objects from 20 categories (person, car, bicycle, cow, table...) - Objects are annotated with labeled bounding boxes
  • 24. Model Overview • Model has a root filter plus deformable parts [Figure: detection, root filter, part filters, deformation models]
  • 25. Histogram of Gradient (HOG) Features • Image is partitioned into 8x8 pixel blocks • In each block we compute a histogram of gradient orientations - Invariant to changes in lighting, small deformations, etc. • We compute features at different resolutions (pyramid)
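
As a rough illustration of the per-block histogram step, the sketch below accumulates gradient-magnitude votes into orientation bins for each 8x8 block of a grayscale image. The 9 unsigned orientation bins, the central-difference gradients, and the function name `hog_cells` are assumptions not stated on the slide. Normalization and the multi-resolution pyramid are left out, so this shows only the histogram computation and does not by itself give the lighting invariance mentioned above.

```python
import math
from typing import List


def hog_cells(img: List[List[float]], cell: int = 8, bins: int = 9) -> List[List[List[float]]]:
    """Per-cell histograms of gradient orientation, weighted by gradient magnitude."""
    h, w = len(img), len(img[0])
    rows, cols = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for y in range(rows * cell):
        for x in range(cols * cell):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            theta = math.atan2(gy, gx) % math.pi           # unsigned orientation in [0, pi)
            b = min(int(theta / math.pi * bins), bins - 1)  # guard against rounding up to `bins`
            hist[y // cell][x // cell][b] += mag
    return hist


# Hypothetical 16x16 test image: a horizontal intensity ramp.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
features = hog_cells(ramp)
```

For the ramp image every gradient points along the x axis, so all of the mass in each cell lands in the first orientation bin.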
  • 26. Filters • Filters are rectangular templates defining weights for features • Score is dot product of filter and subwindow of HOG pyramid - Score of H at this location is $H \cdot W$ [Figure: H and W in the HOG pyramid]
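
The filter score is an ordinary dot product between a template and the features under it. A small sketch, assuming one level of the feature pyramid is stored as a 2-D grid of per-cell feature vectors; `filter_score` and `score_map` are names introduced here for illustration:

```python
from typing import List

Cell = List[float]        # feature vector of one cell
Grid = List[List[Cell]]   # 2-D grid of cells (one pyramid level, or a filter)


def filter_score(filt: Grid, feats: Grid, top: int, left: int) -> float:
    """Dot product of the filter with the subwindow of feats whose corner is (top, left)."""
    return sum(
        weight * feats[top + r][left + c][i]
        for r, row in enumerate(filt)
        for c, cell in enumerate(row)
        for i, weight in enumerate(cell)
    )


def score_map(filt: Grid, feats: Grid) -> List[List[float]]:
    """Scores for every placement where the filter fits entirely inside feats."""
    fh, fw = len(filt), len(filt[0])
    return [
        [filter_score(filt, feats, t, l) for l in range(len(feats[0]) - fw + 1)]
        for t in range(len(feats) - fh + 1)
    ]


# Hypothetical 2x3 feature grid with 1-D cells and a 1x2 difference filter.
feats = [[[1], [2], [3]], [[4], [5], [6]]]
filt = [[[1], [-1]]]
scores = score_map(filt, feats)
```

Evaluating the filter at every placement and pyramid level yields the dense "match quality" maps that the matching stage then combines with deformation costs.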
  • 27. Object Hypothesis • Score is sum of filter scores plus deformation scores • Multiscale model captures features at two resolutions [Figure: image pyramid, HOG feature pyramid]
  • 28. Training • Training data consists of images with labeled bounding boxes • Need to learn the model structure, filters and deformation costs
  • 29. Connection With Linear Classifiers • Score of model is sum of filter scores plus deformation scores - Bounding box in training data specifies that score should be high for some placement in a range • w is a model: concatenation of filters and deformation parameters • x is a detection window • z are filter placements • Feature vector for (x, z): concatenation of features and part displacements
  • 30. Latent SVMs • Linear in w if z is fixed • Objective combines a regularization term and a hinge loss
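
The formulas on these two slides did not survive extraction. One common way to write a latent SVM that matches the surviving labels ("linear in w if z is fixed", "regularization", "hinge loss") is given below. The symbols $\Phi$, $\lambda$, and the labels $y_i \in \{-1, +1\}$ are assumptions introduced here, and the exact constants on the original slide may differ.

```latex
% Score of a detection window x: maximize over latent filter placements z.
f_w(x) = \max_{z} \; w \cdot \Phi(x, z)

% Training objective over labeled examples (x_i, y_i).
w^{*} = \arg\min_{w} \;
    \underbrace{\lambda \lVert w \rVert^{2}}_{\text{regularization}}
    + \sum_{i} \underbrace{\max\bigl(0,\; 1 - y_i \, f_w(x_i)\bigr)}_{\text{hinge loss}}
```

With z held fixed, $w \cdot \Phi(x, z)$ is linear in w, so each term reduces to the hinge loss of a standard linear SVM.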
  • 31. Learned Models: Bicycle, Sofa, Car, Bottle
  • 34. Overall Results • 9 systems competed in the 2007 challenge • Out of 20 classes we get: - First place in 10 classes - Second place in 6 classes • Some statistics: - It takes ~2 seconds to evaluate a model in one image - It takes ~3 hours to train a model - MUCH faster than most systems
  • 35. Component Analysis [Figure: precision-recall curves, PASCAL2006 Person. Legend: Root (0.18), Root+Latent (0.24), Parts+Latent (0.29), Root+Parts+Latent (0.34). Axes: recall (x) and precision (y), each from 0 to 1]
  • 36. Summary • Deformable models provide an elegant framework for object detection and recognition - Efficient algorithms for matching models to images - Applications: pose estimation, medical image analysis, object recognition, etc. • We can learn models from partially labeled data - Generalized standard ideas from machine learning - Leads to state-of-the-art results in PASCAL challenge • Future work: hierarchical models, grammars, 3D objects