12 cie552 object_recognition

1
Object Recognition
Elsayed Hemayed

2
Object Recognition
Outline
 What is recognition
 Challenges
 Object Recognition Approach
• 2D-based recognition (Viewer-centered)
• Using histogram
• Using local features
 Bag-of-Words (BoW)
 Conclusion

3
Object Recognition
What is recognition
 Given a scene (image) and a library of models, Object Recognition tries to answer the following questions:
• What objects are we looking at? (n-objects X m-models)
• Are they bottles, cars, kings? And where are they in the scene?
• We want to identify all objects in view. [Object Detection]
• Is this part of the scene an instance of model X? (1-object X 1-model)
• We have a model X and we are trying to match it to a precise part of the scene. [Object Verification]
• What is this part of the scene? (1-object X m-models)
• We want to determine the identity of a part of the scene.
• Are there any instances of model X in the scene? (n-objects X 1-model)
• We picked a model X and we are trying to find instances of it in the whole scene.
• Also object, scene, and context categorization
 Note that face, gesture, and activity recognition fall under the above scenarios too.

4
Detection: are there people?
Object Recognition

5
Verification: is that a lamp?
Object Recognition

6
Identification: is that Potala Palace?
Object Recognition

7
Object categorization
mountain, building, tree, banner, vendor, people, street lamp
Object Recognition

8
Scene and context categorization
• outdoor
• city
• …
Object Recognition

9
Object Recognition
Challenges
 Viewpoint changes
• Translation
• Rotation
• Scale changes
 Illumination
 Clutter
 Occlusion
 Noise
 Deformation
 Intra-class variations
model object

10
Object Recognition
Clutter and Occlusion
 Clutter
• Real-world surface data contains multiple objects, so the scene includes “extra data” that does not belong to the object of interest
 Occlusion
• Surface data can have missing components
• Missing data alters the global properties of surfaces

11
Challenges: deformation
Xu, Beihong 1943
Object Recognition

12
Challenges: intra-class variation
Object Recognition

13
Object Recognition
Object Recognition Approach
 2D-based recognition (Appearance-based recognition)
• Viewer-Centered
• Global Vs Local Shape Descriptor
 3D-based recognition (Model-based recognition)
• Object-Centered
• Global Vs Local Shape Descriptor

14
Object Recognition
Viewer-centered Vs Object-centered
 Viewer-centered
• Dependent on surface view
• Easy to construct
• Surface description changes with the viewpoint
• Surfaces have to be aligned before comparison
• Separate representations must be stored for each viewpoint
 Object-centered
• Object is described in a coordinate system fixed to the object
• View-independent
• No alignment required
• More compact: single representation
• But finding a suitable object-fixed coordinate system is difficult

15
Object Recognition
2D-based Recognition
Appearance-Based Recognition
 Basic assumption
• Objects can be represented by a set of images (“appearances”).
• For recognition, it is sufficient to just compare the 2D appearances.
• No 3D model is needed.

16
Object Recognition
Global Representation
 Represent each object (view) by a global descriptor.
 For recognizing objects, just match the (global) descriptors.
 A histogram can be used as a global descriptor.
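
The idea of matching global histogram descriptors can be sketched as follows. This is a minimal illustration, not the method behind the slides: it assumes grayscale images stored as nested lists of 0..255 integers, a hypothetical bin count `bins`, and histogram intersection as the similarity measure (the slides do not name one). The labels and pixel values are made up.

```python
def gray_histogram(image, bins=8):
    """Normalized intensity histogram of a grayscale image (rows of 0..255 ints)."""
    counts = [0] * bins
    for row in image:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def recognize(query, database, bins=8):
    """Return the label of the stored view whose histogram best matches the query."""
    q = gray_histogram(query, bins)
    best_label, best_score = None, -1.0
    for label, view in database:
        score = intersection(q, gray_histogram(view, bins))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy database: two objects, two training views each.
database = [
    ("mug", [[10, 20], [30, 40]]),
    ("mug", [[40, 30], [20, 10]]),
    ("lamp", [[200, 220], [240, 250]]),
    ("lamp", [[210, 230], [250, 190]]),
]
query = [[15, 35], [25, 45]]
print(recognize(query, database))
```

Because the descriptor discards pixel positions, the two "mug" views above (same pixels, different order) produce identical histograms.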

17
Object Recognition
Recognition Using Histogram
Database with multiple training views per object
Bernt Schiele - TU Darmstadt

18
Object Recognition
Application Example: Brand Identification

19
Local Descriptors: SIFT
 These statistical approaches characterise some aspects of the appearance of an object that can be used to recognise it
 But this means they are (largely) view dependent: a different statistical model has to be learned for each view
 e.g. SIFT-based recognition (David Lowe, UBC)
• Find interest points in scale space
• Re-describe the interest points so that they are:
• Invariant to image translation, scaling, and rotation
• Partially invariant to illumination changes and to affine and 3D projection changes
Object Recognition

20
Feature matching
 For each feature in A, find nearest neighbor in B
A B
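
A brute-force version of this nearest-neighbor search might look like the sketch below. It assumes each feature is a plain tuple of numbers and uses Euclidean distance; the descriptors in the example are invented 2D points rather than real 128D SIFT vectors.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_neighbor_matches(features_a, features_b):
    """For each descriptor in A, return (index_in_a, index_in_b, distance)
    of its nearest neighbor in B."""
    matches = []
    for i, fa in enumerate(features_a):
        best_j, best_d = None, math.inf
        for j, fb in enumerate(features_b):
            d = euclidean(fa, fb)
            if d < best_d:
                best_j, best_d = j, d
        matches.append((i, best_j, best_d))
    return matches


# Hypothetical toy descriptors.
features_a = [(0.0, 0.0), (5.0, 5.0)]
features_b = [(5.0, 4.0), (0.0, 1.0), (9.0, 9.0)]
matches = nearest_neighbor_matches(features_a, features_b)
print(matches)
```

The search costs one distance computation per pair of features, which is fine for a sketch but slow for large images.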
Object Recognition

21
Feature matching
 Example: 3D object recognition
Object Recognition

22
3D object recognition
 Training images
Object Recognition

23
3D object recognition
 Only 3 keys are needed for recognition, so extra keys provide robustness
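
The slide does not say why three keys suffice. One common reading, assumed here, is that the object's pose in the image is modelled as a 2D affine transform with six unknowns, so three non-collinear point matches (two equations each) determine it exactly. The sketch below solves for that transform with Cramer's rule; the function names and points are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 linear system m * x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three model points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_three(model_pts, scene_pts):
    """Affine map (x, y) -> (a*x + b*y + tx, c*x + d*y + ty) from 3 matches."""
    m = [[x, y, 1.0] for x, y in model_pts]
    row_x = solve3(m, [p[0] for p in scene_pts])  # a, b, tx
    row_y = solve3(m, [p[1] for p in scene_pts])  # c, d, ty
    return row_x, row_y


def apply_affine(params, point):
    """Map a model point into the scene with the estimated transform."""
    (a, b, tx), (c, d, ty) = params
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


# Hypothetical matches: the scene shows the model scaled by 2 and shifted by (3, 4).
model_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
scene_pts = [(3.0, 4.0), (5.0, 4.0), (3.0, 6.0)]
params = affine_from_three(model_pts, scene_pts)
print(apply_affine(params, (1.0, 1.0)))
```

With more than three matches the system is overdetermined, and the additional keys can be used to check or refine the estimate, which is one way to understand the robustness the slide mentions.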
Object Recognition

24
Recognition under occlusion
Object Recognition

25
Bag-of-words models
by Li Fei-Fei (UIUC)

26
Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain; the
cerebral cortex was a movie screen, so to speak,
upon which the image in the eye was projected.
Through the discoveries of Hubel and Wiesel we
now know that behind the origin of the visual
perception in the brain there is a considerably
more complicated course of events. By following
the visual impulses along their path to the various
cell layers of the optical cortex, Hubel and Wiesel
have been able to demonstrate that the message
about the image falling on the retina undergoes a
step-wise analysis in a system of nerve cells
stored in columns. In this system each cell has its
specific function and is responsible for a specific
detail in the pattern of the retinal image.
sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn
(£51bn) to $100bn this year, a threefold increase
on 2004's $32bn. The Commerce Ministry said
the surplus would be created by a predicted 30%
jump in exports to $750bn, compared with a 18%
rise in imports to $660bn. The figures are likely to
further annoy the US, which has long argued that
China's exports are unfairly helped by a
deliberately undervalued yuan. Beijing agrees the
surplus is too high, but says the yuan is only one
factor. Bank of China governor Zhou Xiaochuan
said the country also needed to do more to boost
domestic demand so more goods stayed within
the country. China increased the value of the yuan
against the dollar by 2.1% in July and permitted it
to trade within a narrow band, but the US wants
the yuan to be allowed to trade freely. However,
Beijing has made it clear that it will take its time
and tread carefully before allowing the yuan to
rise further in value.
China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
Object Recognition

27
Bag-of-words models
US Presidential Speeches Tag Cloud
http://chir.ag/phernalia/preztags/
 Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
Object Recognition

30
What is a bag-of-words
representation?
 For a text document
 Have a dictionary of non-common words
 Count the occurrences of each dictionary word in the document
 Make a histogram of the counts
 Normalize the histogram by dividing each count by the sum of all the counts
 The normalized histogram is the representation.
Example histogram bins: apple, worm, tree, dog, joint, leaf, grass, bush, fence
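
The steps above can be sketched in a few lines. The dictionary reuses the nine words shown on the slide; the example sentence and the punctuation handling are assumptions made only for illustration.

```python
from collections import Counter


def bag_of_words(text, dictionary):
    """Normalized histogram of dictionary-word counts for one document."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    counts = Counter(words)
    raw = [counts[w] for w in dictionary]  # words outside the dictionary are ignored
    total = sum(raw)
    return [c / total if total else 0.0 for c in raw]


dictionary = ["apple", "worm", "tree", "dog", "joint", "leaf", "grass", "bush", "fence"]
# Hypothetical document.
doc = "The dog chased the worm under the apple tree; the tree lost a leaf."
histogram = bag_of_words(doc, dictionary)
print(histogram)
```

Word order plays no role: shuffling the words of the document leaves the histogram unchanged.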
Object Recognition

31
Bags of features for image classification
 Extract features
 Learn “visual vocabulary”
 Quantize features using visual vocabulary
 Represent images by frequencies of “visual words”
Object Recognition

35
Object Bag of ‘words’
Object Recognition

37
Learning: feature detection & representation, codewords dictionary, image representation, category models (and/or) classifiers
Recognition: feature detection & representation, image representation, category decision
Object Recognition

38
Representation
1. Feature detection & representation
2. Codewords dictionary formation
3. Image representation
Object Recognition

39
1. Feature detection and representation
Object Recognition

40
1. Feature extraction
 Regular grid: every grid square is a feature
• Vogel & Schiele, 2003
• Fei-Fei & Perona, 2005
 Interest point detector: the region around each point
• Csurka et al. 2004
• Sivic et al. 2005
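
The regular-grid option is the simpler of the two to sketch. The example below assumes a grayscale image stored as a list of rows and uses the raw pixel values of each grid square as its feature vector; that choice of descriptor is an assumption for illustration and is not taken from the cited papers.

```python
def grid_patches(image, patch_size):
    """Split an image (list of rows) into non-overlapping square patches,
    each flattened into one feature vector. Incomplete border patches are skipped."""
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            features.append(patch)
    return features


# Hypothetical 4x4 image with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
features = grid_patches(image, 2)
print(features)
```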
Object Recognition

42
1. Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla ’02] [Sivic & Zisserman ’03]
2. Normalize patch
3. Compute SIFT descriptor [Lowe ’99]
Slide credit: Josef Sivic
Object Recognition

43
…
Lots of feature descriptors for the whole image or set of images.
Object Recognition

44
2. Discovering the visual vocabulary
…
Feature vector space: what is the dimensionality? 128-D for SIFT
Object Recognition

45
Clustering
…
Object Recognition

46
Clustering
…
Visual vocabulary
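
The slides do not name the clustering algorithm. A minimal sketch, assuming k-means (a common choice for this step), is given below; the cluster centers it returns play the role of the visual vocabulary. The farthest-point initialization is also an assumption, chosen only to keep the example deterministic, and the 2D descriptors are invented stand-ins for real 128-D vectors.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_point_init(descriptors, k):
    """Start from the first descriptor, then repeatedly add the descriptor
    farthest from all centers chosen so far."""
    centers = [descriptors[0]]
    while len(centers) < k:
        centers.append(max(descriptors,
                           key=lambda d: min(squared_distance(d, c) for c in centers)))
    return centers


def kmeans(descriptors, k, iterations=20):
    """Return k cluster centers (the codewords of the visual vocabulary)."""
    centers = farthest_point_init(descriptors, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda i: squared_distance(d, centers[i]))
            clusters[nearest].append(d)
        # Keep the old center when a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Hypothetical descriptors forming two well-separated groups.
descriptors = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
vocabulary = kmeans(descriptors, k=2)
print(vocabulary)
```

The vocabulary size `k` is a free parameter; the slides do not give a value.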
Object Recognition

47
2. Codewords dictionary formation
Object Recognition

48
Image patch examples of codewords
Object Recognition

49
Example codebook
…
Source: B. Leibe
Appearance codebook
Object Recognition

50
Another codebook
Appearance codebook
…
…
…
…
…
Source: B. Leibe
Object Recognition

51
3. Image representation
Histogram of codeword frequencies (axes: codewords, frequency)
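
Building this histogram amounts to assigning every descriptor of an image to its nearest codeword and counting the assignments. The sketch below assumes squared Euclidean distance for the assignment and normalizes the counts to frequencies; the codebook and descriptors are hypothetical 2D values.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[i])))


def bow_histogram(descriptors, codebook):
    """Frequency of each codeword among the descriptors of one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Hypothetical codebook with three codewords and four image descriptors.
codebook = [(0.5, 0.5), (10.5, 10.5), (20.0, 0.0)]
image_descriptors = [(0, 1), (1, 1), (10, 11), (19, 1)]
image_histogram = bow_histogram(image_descriptors, codebook)
print(image_histogram)
```

Every image is thus reduced to a vector whose length equals the vocabulary size, regardless of how many features were detected in it.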
Object Recognition

53
Learning and Recognition
category models (and/or) classifiers
category decision
Object Recognition

54
Image classification
• Given the bag-of-features representations of images from
different classes, learn a classifier using machine learning
Object Recognition

55
Distance Computation
 Distance families surveyed by Sung-Hyuk Cha (2007)
Sung-Hyuk Cha, “Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions”, International Journal of Mathematical Models and Methods in Applied Sciences, Issue 4, Volume 1, 2007, pp. 300-307.
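
A few widely used histogram distances are sketched below as an illustration. The selection (city block, Euclidean, intersection, and a symmetric chi-square form) is an assumption about what is useful for comparing bag-of-words histograms; it is not the survey's full taxonomy, and the two histograms are invented.

```python
import math


def city_block(p, q):
    """L1 distance: sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """L2 distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def intersection_distance(p, q):
    """One minus the histogram intersection; 0 for identical normalized histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))


def squared_chi_square(p, q):
    """Symmetric chi-square form: sum of (p - q)^2 / (p + q) over non-empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Hypothetical normalized histograms.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
for name, fn in [("city block", city_block), ("euclidean", euclidean),
                 ("intersection", intersection_distance),
                 ("squared chi-square", squared_chi_square)]:
    print(name, fn(p, q))
```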
Object Recognition

58
Pattern Classification Techniques
 K-Nearest Neighbors
 Naïve Bayes Classifier
 Neural Network
 Support Vector Machine (SVM)
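
Of the techniques listed, K-Nearest Neighbors is the shortest to sketch on bag-of-features histograms. The example below assumes L1 distance between histograms and a majority vote among the `k` closest training images; the training histograms and the class labels are made up.

```python
from collections import Counter


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_classify(query, training, k=3):
    """training: list of (histogram, label) pairs.
    Returns the majority label among the k nearest training histograms."""
    ranked = sorted(training, key=lambda item: l1_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set of labelled bag-of-features histograms.
training = [
    ([0.8, 0.1, 0.1], "car"),
    ([0.7, 0.2, 0.1], "car"),
    ([0.1, 0.8, 0.1], "face"),
    ([0.2, 0.7, 0.1], "face"),
    ([0.1, 0.1, 0.8], "tree"),
]
query_histogram = [0.75, 0.15, 0.10]
print(knn_classify(query_histogram, training))
```

The other classifiers in the list would consume the same histogram vectors; only the decision rule changes.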
Object Recognition

59
Object Recognition
Conclusion
 Given a scene (image) and a library of models, Object Recognition tries to answer a generic question: What objects are we looking at?
 Many techniques have been developed to answer this question and similar ones.
 Some build their model library from 2D images only; others use a library of 3D models.
 Many challenges remain before we can fully understand what we are looking at.
