
A location-aware embedding technique for accurate landmark recognition


The current state of research in landmark recognition highlights the good accuracy that can be achieved by embedding techniques such as Fisher vectors and VLAD. None of these techniques exploits spatial information, i.e., they consider all the features and the corresponding descriptors without embedding their location in the image. This paper presents a new variant of the well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique which accounts, to a certain degree, for the location of features. The driving motivation comes from the observation that, usually, the most interesting part of an image (e.g., the landmark to be recognized) lies close to the center of the image, while the features at the borders are irrelevant features which do not depend on the landmark. The proposed variant, called locVLAD (location-aware VLAD), computes the mean of two global descriptors: the VLAD computed on the entire original image, and the one computed on a cropped image which removes a certain percentage of the image borders. This simple variant achieves accuracy greater than existing state-of-the-art approaches. Experiments are conducted on two public datasets (ZuBuD and Holidays), which are used both for training and testing. Moreover, a more balanced version of ZuBuD is proposed.


A location-aware embedding technique for accurate landmark recognition

  1. 1. A location-aware embedding technique for accurate landmark recognition Federico Magliani, Navid Mahmoudian Bidgoli, Andrea Prati ICDSC 2017 – Stanford, USA – 5-7 September 2017
  2. 2. Agenda 2 ➢ Motivations ➢ Summary of contribution ➢ Related works ➢ Introduction to VLAD ➢ Proposed approach (locVLAD) ➢ Experimental results ➢ Conclusions and Future Works
  3. 3. Motivations 3 Landmark Recognition problem ➢ try to understand what is in front of you ➢ using client-server communication ➢ helping with geolocalization (GPS)
  4. 4. Motivations 4 ➢ Challenges ○ high retrieval accuracy (precision) ○ fast retrieval (quick response to the query) ○ reduced memory footprint (mobile friendly) ○ working well with big data (>100k items) ➢ Possible applications ○ augmented reality (tourism) ➢ Why mobile based? ○ everyone owns a mobile phone ○ mobile phones have powerful hardware that can run such applications
  5. 5. Motivations 5 “Changes in the image resolution, illumination conditions, viewpoint and the presence of distractors such as trees or traffic signs (just to mention some) make the task of matching features between a query image and the database rather difficult.” ➢ In order to mitigate these problems, the existing approaches rely on feature description with a certain degree of invariance to scale, orientation and illumination changes.
  6. 6. Agenda 6 ➢ Motivations ➢ Summary of contribution ➢ Related works ➢ Introduction to VLAD ➢ Proposed approach (locVLAD) ➢ Experimental results ➢ Conclusions and Future Works
  7. 7. Summary of contribution 7 ➢ A location-aware version of VLAD, called locVLAD, which outperforms the state of the art on the intra-dataset problem. It tries to overcome a weakness of VLAD by reducing the noise of the features at the borders of the images ➢ The time for vocabulary creation is significantly reduced, using only a random ⅕ of the detected features ➢ A new, balanced version of the public ZuBuD dataset is proposed and made available to the scientific community (ZuBuD+)
  8. 8. Agenda 8 ➢ Motivations ➢ Summary of contribution ➢ Related works ➢ Introduction to VLAD ➢ Proposed approach (locVLAD) ➢ Experimental results ➢ Conclusions and Future Works
  9. 9. Related work 9 ➢ Bag of Words (BoW): first method for solving the problem (with several variants, e.g. vocabulary tree) ➢ Fisher vector: embedding based on the Fisher kernel ➢ VLAD and its variants: simplified version of the Fisher vector ➢ Hamming embedding: embedding based on binarized descriptors ➢ CNN based: deep neural networks with classification layers at the end
  10. 10. 10 Proposed Pipeline
  11. 11. Agenda 11 ➢ Motivations ➢ Summary of contribution ➢ Related works ➢ Introduction to VLAD ➢ Proposed approach (locVLAD) ➢ Experimental results ➢ Conclusions and Future Works
  12. 12. VLAD (Vector of Locally Aggregated Descriptors) 12 C = {c1, …, ck}: codebook of k visual words (built with K-means clustering) 1. Every local descriptor xj extracted from the image is assigned to the closest cluster center of the codebook: ci = NN(xj) 2. vi = ∑_{xj : NN(xj) = ci} (xj − ci) (sum of residuals; each vi is d-dimensional) 3. The VLAD vector is the concatenation of the vi vectors (i = 1, …, k) 4. VLAD normalization to counter the burstiness problem. With 16 centroids and 128-d SIFT descriptors → D = 128 × 16 = 2048
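As a reading aid, the following is a minimal Python/NumPy sketch of the VLAD aggregation described on this slide. The use of scikit-learn K-means, the choice of signed square rooting as the final normalization, and the function names are our own assumptions, not prescribed by the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, k=16):
    """Cluster local descriptors (N x 128 SIFT) into k visual words."""
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(descriptors)

def vlad(descriptors, codebook):
    """Aggregate local descriptors into a single D = k*d VLAD vector."""
    k, d = codebook.cluster_centers_.shape
    assignments = codebook.predict(descriptors)        # NN(x_j) for each descriptor
    v = np.zeros((k, d), dtype=np.float32)
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members) > 0:
            v[i] = (members - codebook.cluster_centers_[i]).sum(axis=0)  # residuals
    v = v.flatten()                                     # e.g. 16 * 128 = 2048 dimensions
    v = np.sign(v) * np.sqrt(np.abs(v))                 # signed square rooting (one of the normalizations below)
    return v / (np.linalg.norm(v) + 1e-12)              # final L2 normalization
```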
  13. 13. VLAD normalization 13 ➢ Signed Square Rooting normalization: sign(xi)·sqrt(|xi|), followed by L2 norm ➢ Residual normalization: independent L2 norm of each residual, followed by L2 norm ➢ Z-Score normalization: residual normalization, followed by subtraction of the mean from every vector and division by the standard deviation ➢ Power normalization: sign(xi)·|xi|^α (usually α = 0.2), followed by L2 norm
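A small sketch of the four normalization variants listed on this slide, assuming the input is a flattened k×d VLAD vector; the function names and the epsilon guards are ours.

```python
import numpy as np

def ssr_normalize(v):
    """Signed square rooting followed by global L2 normalization."""
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + 1e-12)

def power_normalize(v, alpha=0.2):
    """Power normalization sign(x)|x|^alpha followed by global L2 normalization."""
    v = np.sign(v) * np.abs(v) ** alpha
    return v / (np.linalg.norm(v) + 1e-12)

def residual_normalize(v, k, d):
    """L2-normalize each of the k per-centroid residuals, then the whole vector."""
    v = v.reshape(k, d)
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-12)
    v = v.flatten()
    return v / (np.linalg.norm(v) + 1e-12)

def zscore_normalize(v, k, d):
    """Residual normalization, then subtract the mean and divide by the standard deviation."""
    v = residual_normalize(v, k, d)
    return (v - v.mean()) / (v.std() + 1e-12)
```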
  14. 14. Agenda 14 ➢ Motivations ➢ Summary of contribution ➢ Related works ➢ Introduction to VLAD ➢ Proposed approach (locVLAD) ➢ Experimental results ➢ Conclusions and Future Works
  15. 15. Proposed approach: locVLAD ➢ This method improves the performance of VLAD vectors on the recognition problem. ➢ It tackles this problem by reducing the influence of features found at the borders of the image. How does it work? It consists of a new global descriptor, which is the mean of the VLAD descriptor of the original query image (v̇) and the VLAD descriptor computed on a cropped query image (v̇cropped). 15
  16. 16. Proposed approach: locVLAD 16 The size of the cropped image is a parameter that depends on the dataset ➢ ZuBuD → 90% of the original query images ➢ Holidays → 70% of the original query images. (Figure: 424 features detected on the full image vs. 367 features detected on the cropped image.)
  17. 17. Proposed approach: locVLAD 17 Why does it increase the performance? Because, usually, the features important for recognition are located in the center of the image, while the features close to the borders are noisy. Why not apply VLAD encoding directly on the cropped image? Because useful information might be lost: there is no guarantee that the features at the borders are only noise. Why not create a cropped vocabulary? Experiments were conducted, but the results were poor.
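A hedged sketch of the locVLAD idea described on slides 15-17: average the VLAD of the full image with the VLAD of a central crop (90% for ZuBuD, 70% for Holidays). The `vlad` helper is the one sketched earlier; `extract_descriptors`, the exact crop geometry, and the final re-normalization of the averaged vector are our assumptions.

```python
import cv2
import numpy as np

def extract_descriptors(image):
    """Detect SIFT keypoints and return their 128-d descriptors (OpenCV)."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if image.ndim == 3 else image
    _, desc = cv2.SIFT_create().detectAndCompute(gray, None)
    return desc

def center_crop(image, keep=0.9):
    """Keep the central `keep` fraction of each dimension (our reading of the crop percentage)."""
    h, w = image.shape[:2]
    dh, dw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    return image[dh:h - dh, dw:w - dw]

def loc_vlad(image, codebook, keep=0.9):
    """locVLAD: mean of the VLAD of the full image and the VLAD of its central crop."""
    v_full = vlad(extract_descriptors(image), codebook)                    # vlad() as sketched earlier
    v_crop = vlad(extract_descriptors(center_crop(image, keep)), codebook)
    v = 0.5 * (v_full + v_crop)
    return v / (np.linalg.norm(v) + 1e-12)                                 # re-normalization is our assumption
```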
  18. 18. Agenda 18 ➢ Motivations ➢ Summary of contribution ➢ Related works ➢ Introduction to VLAD ➢ Proposed approach (locVLAD) ➢ Experimental results ➢ Conclusions and Future Works
  19. 19. Datasets ➢ INRIA Holidays (1491 images in 2448x3264: 500 classes, 500 queries) ➢ ZuBuD (1005 images in 640x480: 201 classes, 115 queries in 320x240) ➢ ZuBuD+ (1005 images in 640x480: 201 classes, 1005 queries in 320x240) 19
  20. 20. Holidays 20
  21. 21. ZuBuD 21
  22. 22. ZuBuD+ 22 It is the balanced version of ZuBuD ➢ 1005 queries in 320x240 instead of 115 queries ➢ The new query images are randomly chosen database images, distinct from the existing query images, transformed by either: ○ rotation (±90°) and resize ○ resize only Download: http://implab.ce.unipr.it/?page_id=194
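A purely hypothetical sketch of how a balanced query in the spirit of ZuBuD+ could be generated from a database image as described above (random ±90° rotation or no rotation, then resize to the 320x240 query resolution); the exact procedure used to build ZuBuD+ is not specified on the slide.

```python
import random
import cv2

def make_balanced_query(db_image, target_size=(320, 240)):
    """Turn a database image into a query: optional +/-90 degree rotation, then resize."""
    rotation = random.choice([None, cv2.ROTATE_90_CLOCKWISE, cv2.ROTATE_90_COUNTERCLOCKWISE])
    img = cv2.rotate(db_image, rotation) if rotation is not None else db_image
    return cv2.resize(img, target_size)   # query resolution used in ZuBuD / ZuBuD+
```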
  23. 23. Evaluation Metrics 23 Different evaluation metrics are used to compare with the state-of-the-art approaches: ➢ Top1 → retrieval accuracy, evaluating only the first position of the ranking ➢ 5 x Recall in Top5 → average number of correct images among the top 5 results of the ranking ➢ mAP (mean Average Precision) → mean of the Average Precision scores of each query, based on the positions of the correct results in the ranking
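For clarity, here is a sketch of the three metrics, assuming the results of each query are given as a ranked list of image ids and the ground truth as a set of relevant ids per query. The data layout and the reading of "5 x Recall in Top5" as the average number of correct images in the top 5 (ZuBuD provides 5 views per building) are our interpretation of the slide.

```python
import numpy as np

def top1_accuracy(rankings, relevant):
    """Fraction of queries whose first-ranked result is a correct match."""
    return float(np.mean([ranking[0] in relevant[q] for q, ranking in rankings.items()]))

def five_x_recall_in_top5(rankings, relevant):
    """Average number of correct images among the top 5 results of each query."""
    return float(np.mean([sum(r in relevant[q] for r in ranking[:5])
                          for q, ranking in rankings.items()]))

def mean_average_precision(rankings, relevant):
    """Mean of the Average Precision of each query, based on the ranks of correct results."""
    aps = []
    for q, ranking in rankings.items():
        hits, precisions = 0, []
        for pos, result in enumerate(ranking, start=1):
            if result in relevant[q]:
                hits += 1
                precisions.append(hits / pos)          # precision at each correct result
        aps.append(sum(precisions) / max(len(relevant[q]), 1))
    return float(np.mean(aps))
```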
  24. 24. Results on ZuBuD (and ZuBuD+) 24
  25. 25. Results on ZuBuD (and ZuBuD+) 25

| Method | Descriptor size | Top1 | 5 x Recall in Top5 |
|---|---|---|---|
| Tree histogram (ZuBuD) [7] | 10M | 98.00 % | - |
| Decision tree (ZuBuD) [9] | n/a | 91.00 % | - |
| Sparse coding (ZuBuD) [22] | 8k*64+1k*36 | - | 4.538 |
| VLAD (ZuBuD) [12] | 4281*128 | 99.00 % | 4.416 |
| VLAD (ZuBuD+) [12] | 4281*128 | 99.00 % | 4.526 |
| locVLAD (ZuBuD) | 4281*128 | 100.00 % | 4.469 |
| locVLAD (ZuBuD+) | 4281*128 | 100.00 % | 4.543 |

It is worth noting that on ZuBuD the method based on sparse coding slightly outperforms the proposed one. This is due to the unbalanced query set and, probably, to the use of color information.
  26. 26. Results on Holidays 26
  27. 27. Results on Holidays 27

| Method | Descriptor size | mAP |
|---|---|---|
| Sparse coding [22] | 8k*64+1k*36 | 76.51 % |
| VLAD [12] | 4281*128 | 74.43 % |
| locVLAD | 4281*128 | 77.20 % |
| Sparse coding [4] | 20k*128 | 79.00 % |
| VLAD [12] | 20k*128 | 78.78 % |
| locVLAD | 20k*128 | 80.89 % |
  28. 28. Vocabulary creation 28
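The vocabulary-creation speed-up mentioned in the contributions (clustering only a random ⅕ of the detected features) could look like the following sketch; the codebook size, the use of scikit-learn K-means, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook_subsampled(all_descriptors, k=16, fraction=0.2, seed=0):
    """Fit the visual vocabulary on a random subset (default 1/5) of the database descriptors."""
    rng = np.random.default_rng(seed)
    n = all_descriptors.shape[0]
    idx = rng.choice(n, size=max(k, int(n * fraction)), replace=False)   # keep at least k samples
    return KMeans(n_clusters=k, n_init=1, random_state=seed).fit(all_descriptors[idx])
```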
  29. 29. Agenda 29 ➢ Motivations ➢ Summary of contribution ➢ Related works ➢ Introduction to VLAD ➢ Proposed approach (locVLAD) ➢ Experimental results ➢ Conclusions and Future Works
  30. 30. Conclusions ➢ The proposed locVLAD technique includes, to a certain degree, information on the location of the features, mitigating the negative effects of distractors found at the image borders. ➢ Experiments are performed on two public datasets, namely ZuBuD and Holidays, and demonstrate superior recognition accuracy w.r.t. the state of the art. 30
  31. 31. Future works ➢ Compression: reduce the dimension of the descriptors while keeping the same retrieval accuracy (mobile friendly). ➢ Indexing: create a system for evaluation in a large-scale setting (adding up to 1M distractors), moving from the Nearest Neighbor problem to the Approximate Nearest Neighbor problem. We are working with kd-trees and permutation-based methods. ➢ Sparse coding: new methods for the creation of the vocabulary and the assignment of the features to the VLAD vector. 31
  32. 32. Thank you for your attention! questions? http://implab.ce.unipr.it 32
