An accurate retrieval through R-MAC+ descriptors for landmark recognition

The landmark recognition problem is far from being solved, but features extracted from intermediate layers of Convolutional Neural Networks (CNNs) have produced excellent results. In this work, we propose some improvements to the creation of R-MAC descriptors that make the newly proposed R-MAC+ descriptors more representative than the previous ones. The main contribution of this paper, however, is a novel retrieval technique that exploits the fine representativeness of the MAC descriptors of the database images. Using these descriptors, called "db regions", during the retrieval stage greatly improves performance. The proposed method is tested on three public datasets: Oxford5k, Paris6k and Holidays. It outperforms the state-of-the-art results on Holidays and reaches excellent results on Oxford5k and Paris6k, where it is surpassed only by approaches based on fine-tuning strategies.

Published in: Engineering

  1. 1. An accurate retrieval through R-MAC+ descriptors for landmark recognition. Federico Magliani, Andrea Prati. ICDSC 2018 – Eindhoven, Netherlands – 3-4 September 2018
  2. 2. Agenda ➢ Motivations ➢ Summary of contributions ➢ Related works ➢ Introduction to R-MAC descriptors ➢ Proposed approach (R-MAC+) ➢ Experimental results ➢ Conclusions
  3. 3. Motivations: the landmark recognition problem ➢ Understand what is in front of you and retrieve similar images. ➢ Semantic gap: for a human this task is fairly simple thanks to personal experience, but a computer can use only the information available in the images. ➢ The problem is far from being solved (viewpoint, illumination conditions, image resolution, ...).
  4. 4. Motivations ➢ Challenges ○ High retrieval accuracy (precision) ○ Fast search (response time per query) ○ Low memory footprint (mobile friendly) ○ Scalability to large datasets (>1M images) ➢ Possible applications ○ Augmented reality (tourism) ○ Person Re-ID (video surveillance) ○ Online clothes search (fashion)
  6. 6. Summary of contributions ➢ A new region detector for CNN feature maps, implemented through grids that respect the aspect ratio of the images. ➢ An improvement in the effectiveness of the multi-resolution approach for R-MAC descriptors. ➢ A novel retrieval method that checks the similarities between query descriptors and regions of database R-MAC descriptors. It outperforms the results of R-MAC descriptors on Oxford5k and Paris6k by +7% and +3%, respectively.
  8. 8. Related works ➢ Bag of Words (BoW): the first method for solving the problem (with different techniques: vocabulary tree, …). ➢ VLAD: similar to BoW, but it uses the residuals of the descriptors (residual = feature descriptor − closest center in the vocabulary). ➢ CNN based: extract features from intermediate layers of CNN architectures and then apply the previous embedding techniques (BLCF, ...). ➢ MAC: max pooling applied to CNN features. ➢ R-MAC: regional MAC descriptors created by applying a rigid-grid mechanism.
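The residual idea behind VLAD mentioned on the slide above can be illustrated with a minimal NumPy sketch. The function name `vlad`, the brute-force nearest-center search and the final global l2-normalization are assumptions made for illustration; the slides do not describe a specific implementation.

```python
import numpy as np

def vlad(descriptors, centers):
    """Illustrative VLAD encoding (hypothetical helper, not from the slides).

    descriptors: (N, D) local descriptors of one image.
    centers:     (K, D) vocabulary (e.g. k-means centers).
    Returns a flattened, l2-normalized (K * D,) vector.
    """
    # Squared Euclidean distance from every descriptor to every center: (N, K).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)

    out = np.zeros(centers.shape, dtype=np.float64)
    for k in range(centers.shape[0]):
        members = descriptors[nearest == k]
        if len(members):
            # Accumulate residuals: descriptor minus its closest center.
            out[k] = (members - centers[k]).sum(axis=0)

    v = out.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

BoW would only count how many descriptors fall on each center; the sketch keeps the direction and magnitude of the offsets instead, which is the difference the slide points out.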
  10. 10. R-MAC (Regional MAC) descriptors. Consider a rectangular region $R \subseteq \Omega = [1, W] \times [1, H]$ and define the regional feature vector $f_R = [f_{R,1} \dots f_{R,i} \dots f_{R,K}]^\top$, where $f_{R,i} = \max_{p \in R} X_i(p)$ is the maximum activation of the $i$-th channel over the considered region. We calculate the feature vector associated with each region and post-process it with l2-normalization, PCA-whitening and l2-normalization. We then combine the collection of regional feature vectors into a single image vector by summing them and l2-normalizing the result. The regions are defined on the response maps by sampling square regions at L different scales: ➢ at the largest scale (l=1), the region size is as large as possible (height = width = min(W,H)); ➢ at every other scale l, we uniformly sample l × (l+m−1) regions of width 2·min(W,H)/(l+1), with m=2.
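The construction described on the slide above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the reference implementation: the helper names are hypothetical, regions are placed on an evenly spaced grid, m is fixed to 2 as on the slide, and the PCA-whitening step (which needs a learned projection) is omitted.

```python
import numpy as np

def l2n(v, eps=1e-12):
    """l2-normalize a vector (eps avoids division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def sample_regions(H, W, L=3, m=2):
    """Square regions (y, x, side, side) on an H x W feature map.

    At scale l: l regions along the shorter dimension and l + m - 1
    along the longer one, each of side 2 * min(W, H) / (l + 1).
    """
    regions = []
    for l in range(1, L + 1):
        side = max(1, int(2 * min(H, W) / (l + 1)))
        n_y, n_x = (l, l + m - 1) if H <= W else (l + m - 1, l)
        # Evenly spaced top-left corners that keep every region in bounds.
        ys = np.rint(np.linspace(0, H - side, n_y)).astype(int)
        xs = np.rint(np.linspace(0, W - side, n_x)).astype(int)
        regions += [(int(y), int(x), side, side) for y in ys for x in xs]
    return regions

def rmac(fmap, L=3, m=2):
    """fmap: (K, H, W) activations of one image. Returns a (K,) vector."""
    K, H, W = fmap.shape
    agg = np.zeros(K)
    for y, x, h, w in sample_regions(H, W, L, m):
        # f_R: per-channel maximum over the region (a MAC descriptor).
        f_R = fmap[:, y:y + h, x:x + w].max(axis=(1, 2))
        # PCA-whitening and the second l2-normalization are skipped here.
        agg += l2n(f_R)
    return l2n(agg)
```

With L=3 and m=2 the grid yields 1·2 + 2·3 + 3·4 = 20 regions. On a square feature map the two regions at l=1 coincide in this sketch, because m is kept fixed rather than derived from the aspect ratio.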
  11. 11. R-MAC (Regional MAC) descriptors. Settings: ➢ Fully convolutional off-the-shelf VGG16 ➢ Pool5 ➢ Spatial max pooling ➢ High-resolution images ➢ Global descriptor based on aggregating region vectors ➢ Sliding window approach. Reference: Tolias et al. Particular object retrieval with integral max-pooling of CNN activations. arXiv 2015.
  13. 13. Proposed approach: R-MAC+. New multi-resolution approach: the images are resized by +25%, 0% and −25% along the largest side, preserving the aspect ratio of the image. ➢ This strategy is an alternative to the first multi-resolution approach, which resized the image to a fixed size of 550px, 800px and 1050px along the largest side, retaining the aspect ratio of the image. ➢ This strategy should enlarge the feature maps, yielding more features, and therefore more local maxima, than the previous multi-resolution R-MAC. This approach is connected to the new region detector, which detects a reduced number of regions (15) instead of the 20 of the original one.
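The two resizing strategies compared on the slide above differ only in how the target size is chosen, which a short sketch can make concrete. The function names and the rounding to whole pixels are assumptions for illustration.

```python
def relative_sizes(width, height, factors=(0.75, 1.0, 1.25)):
    """Sizes for the relative strategy: -25%, 0% and +25%.

    Scaling both sides by the same factor scales the largest side by that
    factor and preserves the aspect ratio.
    """
    return [(max(1, round(width * f)), max(1, round(height * f)))
            for f in factors]

def fixed_size(width, height, target):
    """Size for the fixed strategy: the largest side becomes `target` pixels."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))

# Example with a hypothetical 1024 x 768 input image.
print(relative_sizes(1024, 768))
print([fixed_size(1024, 768, t) for t in (550, 800, 1050)])
```

With the relative strategy the three resolutions follow the native size of each image, whereas the fixed strategy maps every image to the same three sizes regardless of its original resolution.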
  14. 14. Proposed approach: R-MAC+. A new mechanism for region detection in the CNN feature maps (15 regions) ● l=0 → 1 region covering the entire image; ● l=1 → 2 square regions (widthRegion = heightRegion = min(H,W)); ● l=2 → 6 rectangular regions (widthRegion = heightRegion = ⌈2·min(W,H)/(l+1)⌉), arranged along the horizontal axis (the width and height of the regions are adapted to cover the whole image); ● l=3 → 6 rectangular regions (widthRegion = heightRegion = ⌈2·min(W,H)/(l+2)⌉), arranged along the vertical axis (the width and height of the regions are adapted to cover the whole image).
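As a quick check of the numbers on the slide above, the sketch below lists the region count and the nominal side length at each level. It only evaluates the formulas as written; how the regions are positioned and how their width and height are adapted to cover the image is not specified on the slide, so it is left out. The function name is hypothetical.

```python
def rmac_plus_levels(H, W):
    """Region count and nominal size per level for an H x W feature map.

    Returns a list of (level, count, (height, width)) tuples. The sizes for
    l = 2 and l = 3 are nominal: the slide states that they are later
    adapted so that the regions cover the whole image.
    """
    m = min(H, W)

    def ceil_div(a, b):
        # Integer ceiling division, avoids floating-point rounding.
        return -(-a // b)

    side2 = ceil_div(2 * m, 2 + 1)   # l = 2: ceil(2*min(W,H) / (l+1))
    side3 = ceil_div(2 * m, 3 + 2)   # l = 3: ceil(2*min(W,H) / (l+2))
    return [
        (0, 1, (H, W)),              # one region covering the whole map
        (1, 2, (m, m)),              # two square regions
        (2, 6, (side2, side2)),      # six regions along the horizontal axis
        (3, 6, (side3, side3)),      # six regions along the vertical axis
    ]

levels = rmac_plus_levels(20, 30)
total = sum(count for _, count, _ in levels)
```

For a 20 × 30 feature map the nominal sides are 20, 14 and 8 at levels 1, 2 and 3, and the counts add up to the 15 regions stated on the slide.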
  15. 15. Proposed approach: R-MAC+. A new retrieval method based on db regions (the MAC descriptors of the database images) and the R-MAC descriptors of the query images (+7% on Oxford5k and +4% on Paris6k over previous results).
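One way to picture retrieval against db regions is sketched below. It assumes that every database image keeps its regional MAC descriptors, that the query is a single R-MAC vector, and that an image is scored by its best-matching region under cosine similarity. The slide does not state how the region similarities are combined, so the `max` aggregation and the function name are assumptions.

```python
import numpy as np

def rank_by_db_regions(query, db_regions):
    """Rank database images by their best-matching region.

    query:      (D,) R-MAC descriptor of the query image.
    db_regions: list with one (R_i, D) array of regional MAC descriptors
                per database image.
    Returns (order, scores): image indices sorted best first, and the score
    of every image in database order.
    """
    q = query / np.linalg.norm(query)
    scores = []
    for regs in db_regions:
        regs = regs / np.linalg.norm(regs, axis=1, keepdims=True)
        # Cosine similarity of the query with each region; keep the best
        # one (assumed aggregation, not confirmed by the slides).
        scores.append(float((regs @ q).max()))
    scores = np.array(scores)
    return np.argsort(-scores), scores
```

Compared with matching one global vector per image, this keeps several descriptors per database image, so memory use and the number of comparisons grow with the number of regions.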
  17. 17. Datasets and evaluation metric. Datasets: ➢ Holidays (1491 images, 500 classes, 500 queries). ➢ Oxford5k (5063 images, 11 classes, 55 queries). ➢ Paris6k (6412 images, 11 classes, 55 queries). Evaluation metric: ➢ mAP (mean Average Precision) → the mean over all queries of the Average Precision score, which depends on the positions of the correct results in the ranking.
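The metric on the slide above can be computed as in the following sketch, which uses the plain textbook definition of Average Precision. The official evaluation scripts of the benchmarks may differ in details (for example in how ignored images are handled), so treat this as an illustration with hypothetical function names.

```python
def average_precision(ranked_relevance):
    """AP for one query.

    ranked_relevance: sequence of booleans (or 0/1), one per returned image
    in ranking order, True where the image is a correct result.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision measured at the position of each correct result.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """mAP: mean of the per-query AP scores."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For a ranking with correct results at positions 1 and 3, the AP is (1/1 + 2/3) / 2 ≈ 0.83; a correct result that appears later in the ranking lowers the score.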
  18. 18. Results

      | Method | Network | Holidays (original/rotated) | Oxf5k | Paris6k |
      |---|---|---|---|---|
      | MAC | VGG19 | 76.26 % | 57.44 % | 73.15 % |
      | R-MAC | VGG19 | 87.65 % | 65.56 % | 82.80 % |
      | R-MAC | ResNet50 | 92.55 % | 71.77 % | 83.31 % |
      | M-R R-MAC+ | ResNet50 | 94.63 % / 95.58 % | 78.88 % | 88.63 % |
      | M-R R-MAC+ with retrieval based on db regions | ResNet50 | 94.37 % / 95.87 % | 85.39 % | 91.90 % |
  19. 19. Results after QE application

      | Method | Network | Holidays (original/rotated) | Oxf5k | Paris6k |
      |---|---|---|---|---|
      | M-R R-MAC+ | ResNet50 | 94.97 % / 95.97 % | 86.45 % | 92.01 % |
      | M-R R-MAC+ with retrieval based on db regions | ResNet50 | 94.42 % / 96.05 % | 87.92 % | 93.64 % |
      | M-R R-MAC+ with retrieval based on db regions and query expansion based on db regions | ResNet50 | 94.28 % / 95.91 % | 88.78 % | 92.30 % |
  20. 20. Comparison with the state of the art
  22. 22. Conclusions ➢ We propose several improvements to R-MAC descriptors in order to make the retrieval very accurate. ○ A multi-resolution approach that uses bigger feature maps than the previous one. ○ A new region detector based on adaptable grids, which captures more local maxima. ○ A novel retrieval method based on db regions that greatly boosts the performance on Oxford5k and Paris6k. ➢ The proposed method outperforms the state of the art on Holidays, on both the original and the rotated version. On other public benchmarks, it also outperforms the state-of-the-art methods that do not apply fine-tuning.
  23. 23. Thank you for your attention! Questions? http://implab.ce.unipr.it
