Object Recognition
Elsayed Hemayed
Outline
▪ What is recognition?
▪ Challenges
▪ Object recognition approaches
  • 2D-based recognition (viewer-centered)
    • Using histograms
    • Using local features
▪ Bag-of-words (BoW)
▪ Conclusion
What is recognition?
▪ Given a scene (image) and a library of models, object recognition tries to answer the following questions:
  • What objects are we looking at? (n objects × m models)
    • Are they bottles, cars, kings? And where are they in the scene?
    • We want to identify all objects in view. [Object Detection]
  • Is this part of the scene an instance of model X? (1 object × 1 model)
    • We have a model X and try to match it to a precise part of the scene. [Object Verification]
  • What is this part of the scene? (1 object × m models)
    • We want to determine the identity of a part of the scene.
  • Are there any instances of model X in the scene? (n objects × 1 model)
    • We pick a model X and try to find instances of it in the whole scene.
  • Also: object, scene, and context categorization
▪ Note that face, gesture, and activity recognition fall under the above scenarios too.
Detection: are there people?
Verification: is that a lamp?
Identification: is that Potala Palace?
Object categorization
(Figure: example scene with labeled categories: mountain, building, tree, banner, vendor, people, street lamp)
Scene and context categorization
• outdoor
• city
• …
Challenges
▪ Viewpoint changes
  • Translation
  • Rotation
  • Scale changes
▪ Illumination
▪ Clutter
▪ Occlusion
▪ Noise
▪ Deformation
▪ Intra-class variations
(Figure: a model and the corresponding object in a scene)
Clutter and Occlusion
▪ Clutter
  • Real-world surface data contains multiple objects ("extra data") besides the one being recognized.
  • These additional objects clutter the scene around the target.
▪ Occlusion
  • Surface data can have missing components.
  • Missing components alter the global properties of surfaces.
Challenges: deformation
(Painting: Xu Beihong, 1943)
Challenges: intra-class variation
Object Recognition Approaches
▪ 2D-based recognition (appearance-based recognition)
  • Viewer-centered
  • Global vs. local shape descriptors
▪ 3D-based recognition (model-based recognition)
  • Object-centered
  • Global vs. local shape descriptors
Viewer-centered vs. Object-centered
▪ Viewer-centered
  • Depends on the surface view
  • Easy to construct
  • Surface description changes with the viewpoint
  • Surfaces have to be aligned before comparison
  • Separate representations must be stored for each viewpoint
▪ Object-centered
  • Object is described in a coordinate system fixed to the object
  • View-independent
  • No alignment required
  • More compact: a single representation
  • But finding such an object-fixed coordinate system is difficult
2D-based Recognition: Appearance-Based Recognition
▪ Basic assumptions
  • Objects can be represented by a set of images ("appearances").
  • For recognition, it is sufficient to compare the 2D appearances.
  • No 3D model is needed.
Global Representation
▪ Represent each object (view) by a global descriptor.
▪ To recognize objects, match the (global) descriptors.
▪ A histogram can be used as a global descriptor.
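As a minimal sketch of a histogram used as a global descriptor, assume the image is given as a plain list of (r, g, b) pixel tuples and is summarized by a coarsely binned joint RGB histogram. The slides do not fix the histogram type, bin count, or matching score, so these are assumptions; histogram intersection is used here as one common similarity score. `color_histogram` and `histogram_intersection` are hypothetical names.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of (r, g, b) tuples in 0..255.

    Assumes bins_per_channel divides 256 evenly.
    """
    step = 256 // bins_per_channel  # width of each bin along one channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    # Flatten into a fixed-length vector so histograms of different images line up bin by bin.
    return [
        counts[(rb, gb, bb)] / total
        for rb in range(bins_per_channel)
        for gb in range(bins_per_channel)
        for bb in range(bins_per_channel)
    ]

def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 for identical, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy image: 90 reddish pixels and 10 bluish pixels.
image = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
hist = color_histogram(image)
```

Because the histogram discards pixel positions, it tolerates translation and rotation in the image plane, but it cannot tell apart two objects with the same color distribution.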
Recognition Using Histograms
(Figure: database with multiple training views per object)
Source: Bernt Schiele, TU Darmstadt
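Recognition against a database with several training views per object can be sketched as choosing the stored view whose histogram best matches the query. This assumes the histograms are already computed and normalized and uses histogram intersection as the score; `recognize` and the toy database are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 for identical, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def recognize(query_hist, database):
    """Return (object_name, score) for the stored training view that best matches the query.

    `database` maps each object name to a list of normalized histograms, one per training view.
    """
    best_name, best_score = None, -1.0
    for name, views in database.items():
        for view_hist in views:
            score = histogram_intersection(query_hist, view_hist)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score

# Toy 3-bin histograms: two views of a mug, one view of a can.
database = {
    "mug": [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]],
    "can": [[0.1, 0.1, 0.8]],
}
name, score = recognize([0.6, 0.3, 0.1], database)
```

Storing several views per object is what lets this viewer-centered scheme tolerate some viewpoint change: a query only needs to resemble one of the stored views.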
Application Example: Brand Identification
Local Descriptors: SIFT
▪ These statistical approaches characterise some aspects of the appearance of an object that can be used to recognise it.
▪ But this means they are (largely) view-dependent: a different statistical model has to be learned for each view.
▪ e.g. SIFT-based recognition (David Lowe, UBC)
  • Find interest points in scale space
  • Re-describe the interest points so that they are:
    • Invariant to image translation, scaling, and rotation
    • Partially invariant to illumination changes and to affine and 3D projection changes
Feature matching
▪ For each feature in image A, find its nearest neighbor in image B.
(Figure: features matched between image A and image B)
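The nearest-neighbor matching step can be sketched in plain Python, assuming each descriptor is a numeric tuple and Euclidean distance is used. The ratio test (accept a match only if the nearest neighbor is clearly closer than the second nearest) is an addition not on the slide; it follows Lowe's SIFT paper, which rejects matches whose distance ratio exceeds 0.8. `match_features` is a hypothetical name.

```python
import math

def match_features(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in A to its nearest neighbor in B (Euclidean distance).

    A match is kept only if it passes the ratio test:
    nearest distance < ratio * second-nearest distance.
    Returns a list of (index_in_A, index_in_B) pairs.
    """
    matches = []
    if not desc_b:
        return matches
    for i, da in enumerate(desc_a):
        # Distances from this descriptor to every descriptor in B, closest first.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best_dist, best_j = dists[0]
        # With a single candidate there is no second neighbor to compare against.
        if len(dists) == 1 or best_dist < ratio * dists[1][0]:
            matches.append((i, best_j))
    return matches

desc_a = [(0, 0), (5, 5)]
desc_b = [(0.1, 0), (10, 10), (5, 5.1)]
```

An ambiguous descriptor, equally close to two features in B, is rejected rather than matched arbitrarily.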
Feature matching
▪ Example: 3D object recognition
3D object recognition
▪ Training images
3D object recognition
▪ Only 3 keys (matched keypoints) are needed for recognition, so extra keys provide robustness.
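Because only a few consistent keys are needed, an object hypothesis can be checked by requiring at least three matches to agree on the same pose. The sketch below is a simplified stand-in: it assumes the pose is a pure 2D translation and lets each match vote for a translation bin, whereas Lowe's method votes over location, scale, and orientation and then fits an affine model. `verify_object`, `bin_size`, and `min_votes` are hypothetical.

```python
from collections import defaultdict

def verify_object(matches, bin_size=10.0, min_votes=3):
    """Accept an object hypothesis if enough matches agree on one 2D translation.

    `matches` is a list of ((x_model, y_model), (x_image, y_image)) keypoint pairs.
    Each match votes for the quantized translation that maps its model point onto
    its image point; returns (accepted, matches_in_the_winning_bin).
    """
    votes = defaultdict(list)
    for (xm, ym), (xi, yi) in matches:
        bin_key = (round((xi - xm) / bin_size), round((yi - ym) / bin_size))
        votes[bin_key].append(((xm, ym), (xi, yi)))
    # The most popular translation bin is the pose hypothesis (empty if there were no matches).
    best = max(votes.values(), key=len, default=[])
    return len(best) >= min_votes, best

# Three matches agree on a shift of about (50, 20); the last one is an outlier.
matches = [((0, 0), (50, 20)), ((10, 5), (60, 25)), ((3, 7), (53, 27)), ((1, 1), (200, 5))]
accepted, support = verify_object(matches)
```

Occluded keys simply cast no votes, so the object is still accepted as long as three consistent keys survive, which is why extra keys add robustness.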
Recognition under occlusion
Bag-of-words models
by Li Fei-Fei (UIUC)
Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain; the
cerebral cortex was a movie screen, so to speak,
upon which the image in the eye was projected.
Through the discoveries of Hubel and Wiesel we
now know that behind the origin of the visual
perception in the brain there is a considerably
more complicated course of events. By following
the visual impulses along their path to the various
cell layers of the optical cortex, Hubel and Wiesel
have been able to demonstrate that the message
about the image falling on the retina undergoes a
step-wise analysis in a system of nerve cells
stored in columns. In this system each cell has its
specific function and is responsible for a specific
detail in the pattern of the retinal image.
Key words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn
(£51bn) to $100bn this year, a threefold increase
on 2004's $32bn. The Commerce Ministry said
the surplus would be created by a predicted 30%
jump in exports to $750bn, compared with a 18%
rise in imports to $660bn. The figures are likely to
further annoy the US, which has long argued that
China's exports are unfairly helped by a
deliberately undervalued yuan. Beijing agrees the
surplus is too high, but says the yuan is only one
factor. Bank of China governor Zhou Xiaochuan
said the country also needed to do more to boost
domestic demand so more goods stayed within
the country. China increased the value of the yuan
against the dollar by 2.1% in July and permitted it
to trade within a narrow band, but the US wants
the yuan to be allowed to trade freely. However,
Beijing has made it clear that it will take its time
and tread carefully before allowing the yuan to
rise further in value.
Key words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
Bag-of-words models
US Presidential Speeches Tag Cloud
http://chir.ag/phernalia/preztags/
▪ Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
What is a bag-of-words representation?
▪ For a text document:
▪ Build a dictionary of non-common words (common "stop words" such as "the" are left out).
▪ Count the occurrences of each dictionary word in the document.
▪ Make a histogram of the counts.
▪ Normalize the histogram by dividing each count by the sum of all the counts.
▪ The normalized histogram is the representation.
(Figure: example histogram over the dictionary words apple, worm, tree, dog, joint, leaf, grass, bush, fence)
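The text recipe above can be sketched as follows, assuming a simple lowercase word tokenizer and a fixed, ordered dictionary; words outside the dictionary (such as common stop words) are ignored. `bag_of_words` is a hypothetical helper.

```python
import re
from collections import Counter

def bag_of_words(text, dictionary):
    """Normalized histogram of dictionary-word counts in `text`; word order is discarded."""
    vocab = set(dictionary)
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer: runs of letters
    counts = Counter(t for t in tokens if t in vocab)  # words outside the dictionary are ignored
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(dictionary)
    return [counts[word] / total for word in dictionary]

dictionary = ["apple", "worm", "tree", "dog"]
hist = bag_of_words("The worm ate the apple; the apple fell from the tree.", dictionary)
```

Shuffling the words of the document leaves the histogram unchanged, which is exactly the "orderless" property of the representation.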
Bags of features for image classification
▪ Extract features
▪ Learn a “visual vocabulary”
▪ Quantize features using the visual vocabulary
▪ Represent images by frequencies of “visual words”
Object as a bag of ‘words’
(Figure: overview of the bag-of-words pipeline, divided into learning and recognition. Stages: feature detection & representation; codewords dictionary; image representation; category models (and/or) classifiers; category decision.)
Representation
1. Feature detection & representation
2. Codewords dictionary
3. Image representation
1. Feature detection and representation
1. Feature extraction
▪ Regular grid: every grid square is a feature
  • Vogel & Schiele, 2003
  • Fei-Fei & Perona, 2005
▪ Interest point detector: the region around each point is a feature
  • Csurka et al. 2004
  • Fei-Fei & Perona, 2005
  • Sivic et al. 2005
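A minimal sketch of the regular-grid option, assuming a grayscale image stored as a list of rows: each grid square becomes one feature. The raw pixel values of the square stand in for a descriptor here; in practice each patch would be described with something like SIFT. `grid_patches` and its `stride` parameter are hypothetical.

```python
def grid_patches(image, patch_size, stride=None):
    """Cut a grayscale image (list of equal-length rows) into square patches on a regular grid.

    Returns a list of ((top, left), flattened_patch) pairs. `stride` defaults to
    `patch_size`, giving non-overlapping grid squares; border pixels that do not
    fill a whole patch are skipped.
    """
    stride = stride or patch_size
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(((top, left), patch))
    return patches

image = [[4 * r + c for c in range(4)] for r in range(4)]  # 4x4 toy image
patches = grid_patches(image, patch_size=2)
```

A smaller stride gives overlapping patches, which samples the image more densely at the cost of more features.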
1. Feature extraction
(Figure: Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla, ’02] [Sivic & Zisserman, ’03] → Normalize patch → Compute SIFT descriptor [Lowe ’99])
Slide credit: Josef Sivic
1. Feature extraction
(Figure: lots of feature descriptors for the whole image or set of images)
2. Discovering the visual vocabulary
(Figure: descriptors as points in the feature vector space)
What is the dimensionality? 128D for SIFT
2. Discovering the visual vocabulary
(Figure: clustering the descriptors in feature space; the clusters form the visual vocabulary)
Slide credit: Josef Sivic
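The clustering step can be sketched with Lloyd's k-means, a common choice for building a visual vocabulary. The slides only say "clustering", so the algorithm, k, the iteration count, and the random initialization are assumptions; `kmeans` is a hypothetical helper. Each resulting centroid is one visual word.

```python
import math
import random

def kmeans(features, k, iterations=20, seed=0):
    """Lloyd's k-means on a list of feature vectors; returns k centroids (the visual words)."""
    rng = random.Random(seed)
    # Initialize the centroids with k randomly chosen features.
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(iterations):
        # Assignment step: each feature joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda c: math.dist(f, centroids[c]))
            clusters[nearest].append(f)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vocabulary = sorted(kmeans(features, k=2))
```

In practice the features would be 128-dimensional SIFT descriptors and k would be much larger; 2D points keep the example readable.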
2. Codewords dictionary formation
Image patch examples of codewords
Example codebook
(Figure: appearance codebook)
Source: B. Leibe
Another codebook
(Figure: appearance codebook)
Source: B. Leibe
3. Image representation
(Figure: histogram of codeword frequencies; x-axis: codewords, y-axis: frequency)
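Given a vocabulary, the image representation can be sketched as assigning each descriptor to its nearest visual word and then counting, mirroring the text bag-of-words recipe. Hard nearest-word assignment by Euclidean distance is an assumption; `bow_histogram` is a hypothetical helper.

```python
import math

def bow_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word; return normalized word frequencies."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        word = min(range(len(vocabulary)), key=lambda w: math.dist(d, vocabulary[w]))
        counts[word] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]

vocabulary = [(0, 0), (10, 10)]
hist = bow_histogram([(1, 1), (9, 9), (11, 10), (0, 2)], vocabulary)
```

Normalizing by the number of descriptors makes histograms comparable across images that yield different numbers of features.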
Learning and Recognition
(Figure: codewords dictionary, category models (and/or) classifiers, category decision)
Image classification
▪ Given the bag-of-features representations of images from different classes, learn a classifier using machine learning.
Distance Computation
▪ Families of distance measures surveyed by Sung-Hyuk Cha (2007)
Sung-Hyuk Cha, “Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions”, International Journal of Mathematical Models and Methods in Applied Sciences, Issue 4, Volume 1, 2007, pp. 300-307.
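A few measures from different families in Cha's survey can be written directly for two normalized histograms p and q. These are standard definitions picked for illustration, not a reproduction of the survey's tables; the function names are hypothetical.

```python
import math

def l1_distance(p, q):
    """City-block distance (Minkowski family, order 1)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean_distance(p, q):
    """Euclidean distance (Minkowski family, order 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def intersection_distance(p, q):
    """1 minus histogram intersection; equals half the L1 distance for normalized histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))

def squared_chi2_distance(p, q):
    """Symmetric chi-square distance (squared-L2 family); bins empty in both are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def squared_chord_distance(p, q):
    """Squared-chord distance (fidelity family), computed on square roots of the bins."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
```

Chi-square and squared-chord down-weight differences in heavily populated bins relative to L1 and Euclidean distances, which is one reason they are popular for comparing histograms.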
Pattern Classification Techniques
▪ K-Nearest Neighbors
▪ Naïve Bayes Classifier
▪ Neural Network
▪ Support Vector Machine (SVM)
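Among the listed techniques, k-nearest neighbors is the simplest to sketch: a query histogram takes the majority label of its k closest training histograms. Euclidean distance (`math.dist`) is an assumed default; any of the distance measures surveyed by Cha could be passed in instead. `knn_classify` and the toy training data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, train_hists, train_labels, k=3, distance=math.dist):
    """Label a bag-of-words histogram by majority vote among its k nearest training histograms."""
    # Rank all training examples by their distance to the query.
    ranked = sorted(zip(train_hists, train_labels),
                    key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train_hists = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.15, 0.85]]
train_labels = ["face", "face", "car", "car", "car"]
```

k-NN needs no training beyond storing the labeled histograms, but every query is compared with the whole training set, which is why SVMs and other learned classifiers are often preferred for large datasets.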
Conclusion
▪ Given a scene (image) and a library of models, object recognition tries to answer a generic question: what objects are we looking at?
▪ Many techniques have been developed to answer this question and similar ones.
▪ Some build their model library from 2D images only; others use a library of 3D models.
▪ Many challenges remain before we can fully understand what we are looking at.

12 cie552 object_recognition
