
Contextless Object Recognition with Shape-enriched SIFT and Bags of Features

Thesis report and full details: https://imatge.upc.edu/web/publications/contextless-object-recognition-shape-enriched-sift-and-bags-features

Author: Marcel Tella
Advisors: Xavier Giró-i-Nieto (UPC) and Matthias Zeppelzauer (TU Wien)

Degree: Telecommunications Engineering (5 years) at Telecom BCN-ETSETB (UPC)

Abstract:
Currently, there are highly competitive results in the field of object recognition based on the aggregation of point-based features. The aggregation process, typically an average or max-pooling of the features, generates a single vector that represents the image or region that contains the object.
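
As a minimal sketch of the aggregation step described above, the following Python snippet pools a set of local descriptors into a single vector by element-wise average and by element-wise maximum. The function names and the toy 4-dimensional descriptors are illustrative assumptions, not taken from the thesis; real SIFT descriptors have 128 dimensions.

```python
from typing import List, Sequence


def average_pool(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean over all local descriptors (hypothetical helper)."""
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]


def max_pool(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over all local descriptors (hypothetical helper)."""
    return [max(column) for column in zip(*descriptors)]


# Three toy 4-dimensional descriptors standing in for point-based features.
toy_descriptors = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]

pooled_avg = average_pool(toy_descriptors)  # [2.0, 1.0, 2.0, 2.0]
pooled_max = max_pool(toy_descriptors)      # [3.0, 3.0, 2.0, 4.0]
```

Either pooled vector has the same length as one descriptor, regardless of how many points were extracted from the image or region.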

The aggregated point-based features typically describe the texture around the points with descriptors such as SIFT. These descriptors present limitations for wiry and textureless objects. A possible solution is the addition of shape-based information. Shape descriptors have previously been used to encode shape information and thus recognise those types of objects. However, an alignment step is generally required to match every point of one shape to the points of another, which makes the similarity assessment computationally expensive.

We propose to enrich location and texture-based features with shape-based ones. Two main architectures are explored. The first enriches the SIFT descriptors with shape information before they are aggregated. The second creates the standard Bag of Words histogram and concatenates a shape histogram to it, classifying the result as a single vector.
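
The difference between the two architectures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the thesis implementation: the vocabularies are hand-picked instead of learned with k-means, the histograms are normalised to sum to 1, the shape features are assumed to lie in `[0, max_value]`, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index of the closest visual word (squared Euclidean distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda k: sq_dist(descriptor, vocabulary[k]))


def bow_histogram(descriptors: Sequence[Vector],
                  vocabulary: Sequence[Vector]) -> List[float]:
    """Normalised count of visual-word assignments (normalisation is assumed)."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def shape_histogram(values: Sequence[float], bins: int,
                    max_value: float) -> List[float]:
    """Normalised histogram of one scalar shape feature over [0, max_value]."""
    counts = [0.0] * bins
    for v in values:
        counts[min(int(v / max_value * bins), bins - 1)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def esift_vector(texture, shape, esift_vocabulary) -> List[float]:
    """Architecture 1: append shape features to each descriptor, then quantise."""
    enriched = [list(t) + list(s) for t, s in zip(texture, shape)]
    return bow_histogram(enriched, esift_vocabulary)


def bow_plus_shape_vector(texture, shape, vocabulary, bins, max_value) -> List[float]:
    """Architecture 2: texture-only Bag of Words, then concatenate shape histograms."""
    vector = bow_histogram(texture, vocabulary)
    for j in range(len(shape[0])):  # one histogram per shape feature
        vector += shape_histogram([s[j] for s in shape], bins, max_value)
    return vector


# Toy data: four points with 2-D "texture" descriptors and one shape feature each.
texture = [[0.0, 0.1], [0.9, 1.0], [1.0, 0.9], [0.1, 0.0]]
shape = [[0.2], [0.8], [0.9], [0.1]]

esift = esift_vector(texture, shape, [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
bow_s = bow_plus_shape_vector(texture, shape, [[0.0, 0.0], [1.0, 1.0]],
                              bins=2, max_value=1.0)
```

In the first case the vocabulary lives in the enriched (texture plus shape) space, so the final vector has one entry per visual word. In the second case the final vector has one entry per visual word plus `bins` entries per shape feature.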

We evaluate the proposed techniques and the novel features on the Caltech-101 dataset.

Results show that shape features increase the final performance. Our extension of the Bag of Words with a shape-based histogram (BoW+S) performs better than enriched SIFT. However, for a high number of shape features, the BoW+S and enriched SIFT architectures tend to converge.

Published in: Technology

  1. Contextless Object Recognition with Shape-enriched SIFT and Bags of Features. Marcel Tella Amo. Directed by Dr. Matthias Zeppelzauer (TU Wien). Codirected by Dr. Xavier Giró-i-Nieto (UPC).
  2. Motivation: Object Recognition and Classification. Categories: Ball, Airplane, Chair, Beaver, … [Figure labels: Ball, Airplane, Chair; Shape information, Texture information.]
  3. Index: Requirements, State of the Art, Design, Results.
  4. Requirements
  5. Requirements: Design shape features that can be used in an aggregated framework, like Bag of Words, with no need of matching or alignment. Take a successful method (SIFT) and add shape information.
  6. Requirements: Analyse the implication of the vocabulary size with respect to the size of the shape features. [Figure labels: SIFT, Shape.]
  7. Requirements: The proposed features should be at least scale, rotation and translation invariant and, if possible, flip invariant as well.
  8. Requirements: Need for segmentation to codify the shape. Study the limitations of shape coding when using a state-of-the-art segmentation: manual annotations vs. automatic segmentation.
  9. State of the Art
  10. State of the Art: Object candidate algorithms. Multiscale Combinatorial Grouping (MCG) ranks candidates by object plausibility, from high to low. Arbelaez, P., Pont-Tuset, J., Barron, J. T., Marques, F., Malik, J. (2014). Multiscale Combinatorial Grouping. CVPR.
  11. State of the Art: Shape Context. G. Mori, S. Belongie, and J. Malik. Efficient shape matching using shape contexts. PAMI, 27(11), 2005.
  12. State of the Art: Interest point descriptors: the SIFT descriptor. Simplified example; typically 4x4 divisions * 8 bins/hist = 128 features. Dense SIFT vs. sparse SIFT. David G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2004), no. 2, 91-110.
  13. State of the Art: Enrichment of SIFT (128-dimensional SIFT descriptor + extra features). Extra features: absolute spatial location (X,Y) or angle and distance. Rene Grzeszick, Leonard Rothacker, and Gernot A. Fink, "Bag-of-features representations using spatial visual vocabularies for object classification," in IEEE Intl. Conf. on Image Processing, Melbourne, Australia, 2013. Extra features: relative position + aspect ratio + scale ratio + color space. Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
  14. Bag of Words
  15. State of the Art: Bags of Words pipeline. Get Descriptors → Clustering (K-means) → Create histograms → Train Model (SVM). Image → Create histogram → Evaluate (SVM).
  16. Design
  17. Design: Why dense SIFT?
  18. Design: Main principle: combination of dense SIFT and Object Candidates.
  19. Design: Distance to the nearest border (DNB). Logarithmic distance to the nearest border (LDNB): less influence of big distances. Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
  20. Design: Distance and Angle to the nearest border (DANB). Problem: really similar in 2D but very different values. Solution: codify them in two separate features.
  21. Design: Rotation Invariant Angle to the nearest border.
  22. Design: Distance to the center (DC).
  23. Design: η-Angular Scan (ηAS). WINNER!
  24. Design: Shape Context from a dense SIFT (DSC). Note: it crosses the contour of the region, like Shape Context; ηAS does not.
  25. Design: Rotation Invariant Region Quantization (RIRQ). Main idea: get spatial information. Easily extensible to a pyramid! Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 2169-2178). IEEE.
  26. Design: Achieving flip invariance (RIRQ). [Figure: region labels 1-4 shown in several arrangements; a SORT step maps the orderings "4 2" and "2 4" to the same sequence, "2 4".]
  27. Design: Where do we integrate our features? Two main architectures. Enriched SIFT (eSIFT): SIFT + Shape features → Visual Vocabulary → Bag of eSIFT visual words. BoW+Shape: SIFT → Visual Vocabulary → Bag of Words, concatenated with a Shape histogram.
  28. Design: BoW+Shape, creation of the shape histograms. SIFT → Visual Vocabulary → Bag of Words; accumulation of features → Shape histogram. Steps: (1) Accumulate the same feature for all points. (2) Create a histogram of X bins for that feature. (3) Concatenate histograms to create the final one. Example: 8-Angular Scan gives 8 distances (different angles) for each of the # SIFT keypoints.
  29. Results and conclusions
  30. Results: The dataset: Caltech-101. Well recognized dataset; 101 different categories of images; ground truth annotations available; from 40 to 800 images per category.
  31. Results: Metrics: Accuracy (%) = Correct Classifications / (Correct + Incorrect Classifications).
  32. Results: Experiments setup. 30 images per category in train and 30-50 in test; 101 categories + background category; different vocabulary sizes on the X axis; accuracy (%) on the Y axis. Experiments and analysis: eSIFT; BoW+S; eSIFT vs BoW+S; performance achieved; comparison between adding features before or after quantization; number of bins per histogram; ground truth vs MCG Object Candidates; context vs shape.
  33. Results: enriched SIFT.
  34. Results: BoW+S.
  35. Results: Performance achieved. Conclusion: With Angular Scan, there is an increase of performance from 16% to around 41%.
  36. Results: Comparison between adding features after and before quantization. Conclusion: In Angular Scan, if the number of shape features is high, both architectures tend to converge.
  37. Results: Number of bins per histogram. Conclusion: In Angular Scan, 8 bins is the value that gives the best performance.
  38. Results: Ground truth vs MCG Object Candidates. Conclusions: (1) Higher vocabulary values lead to a more robust approach in terms of segmentation errors. (2) Shape-based methods are more sensitive to segmentation errors than texture-based ones.
  39. Results: Context gain vs Shape gain. [Figure labels: Object, Context.] Conclusion: Codifying the shape gives better performance than codifying the context of the image.
  40. Future Work: Comparison between our work and Second Order Pooling. PhD thesis of Carles Ventura. Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
  41. Future Work: Distance to the nearest border (DNB).
  42. Conclusions: 1. Increase of performance from 16% to around 41%. 2. In Angular Scan, if the number of shape features is high, both architectures tend to converge. 3. In Angular Scan, 8 bins is the value that gives the best performance. 4. Higher vocabulary values lead to a more robust approach in terms of segmentation errors. 5. Shape-based methods are more sensitive to segmentation errors than texture-based ones. 6. Codifying the shape gives better performance than codifying the context of the image. Thank you! Questions?
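
The slides name the η-Angular Scan (ηAS) feature and describe it only as η distances at different angles that, unlike Shape Context, do not cross the region contour. The Python sketch below is one possible reading of that description, not the thesis implementation. It assumes the region is given as a binary mask, that the point lies inside the region, and that each of the η rays is marched in unit steps until it leaves the region. The function name, the `step` parameter and the toy mask are hypothetical, and the sketch does not attempt the rotation or flip invariance listed in the requirements.

```python
import math
from typing import List, Sequence


def angular_scan(mask: Sequence[Sequence[int]], row: int, col: int,
                 eta: int = 8, step: float = 1.0) -> List[float]:
    """Distance from (row, col) to the region border along eta directions.

    Assumption: each ray stops at the first sample that falls outside the
    region (or the image), so it never crosses the contour.
    """
    height, width = len(mask), len(mask[0])
    distances = []
    for k in range(eta):
        angle = 2.0 * math.pi * k / eta
        d_row, d_col = -math.sin(angle), math.cos(angle)
        dist = 0.0
        while True:
            dist += step
            r = int(round(row + dist * d_row))
            c = int(round(col + dist * d_col))
            in_image = 0 <= r < height and 0 <= c < width
            if not in_image or mask[r][c] == 0:
                break
        distances.append(dist)
    return distances


# Toy region: a 5x5 square of ones inside a 7x7 image.
toy_mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)]
            for r in range(7)]

scan_4 = angular_scan(toy_mask, 3, 3, eta=4)  # [3.0, 3.0, 3.0, 3.0]
scan_8 = angular_scan(toy_mask, 3, 3, eta=8)  # axis rays 3.0, diagonal rays 4.0
```

Under this reading, each point contributes η values, which matches the "8 distances (different angles)" example on slide 28 for η = 8.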
