Deep Image Retrieval: Learning global representations for image search

Slides by Albert Jiménez about the following paper:

Gordo, Albert, Jon Almazan, Jerome Revaud, and Diane Larlus. "Deep Image Retrieval: Learning global representations for image search." arXiv preprint arXiv:1604.01325 (2016).

We propose a novel approach for instance-level image retrieval. It produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors. In contrast to previous works employing pre-trained deep networks as a black box to produce features, our method leverages a deep architecture trained for the specific task of image retrieval. Our contribution is twofold: (i) we introduce a ranking framework to learn convolution and projection weights that are used to build the region features; and (ii) we employ a region proposal network to learn which regions should be pooled to form the final global descriptor. We show that using clean training data is key to the success of our approach. To that aim, we leverage a large scale but noisy landmark dataset and develop an automatic cleaning approach. The proposed architecture produces a global image representation in a single forward pass. Our approach significantly outperforms previous approaches based on global descriptors on standard datasets. It even surpasses most prior works based on costly local descriptor indexing and spatial verification. We intend to release our pre-trained model.

  1. Deep Image Retrieval: Learning global representations for image search
     Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus
     Slides by Albert Jiménez [GDoc], Computer Vision Reading Group (10/05/2016) [arXiv]
  2. 1. Introduction
  3. Instance Retrieval + Ranking
     [Pipeline diagram: a query image goes through image retrieval, then ranking of the results. Slide credit: Amaia]
  4. Related Work: CNN-based retrieval
     ● CNNs trained for classification tasks
     ● Features are very robust to intra-class variability
     ● Lack of robustness to scaling, cropping and image clutter
     We are interested in distinguishing between particular objects from the same class!
  5. Related Work: R-MAC
     ● Regional Maximum Activation of Convolutions
     ● Compact feature vectors encode image regions
     Reference: Giorgos Tolias, Ronan Sicre, Hervé Jégou, "Particular object retrieval with integral max-pooling of CNN activations" (submitted to ICLR 2016)
  6. Related Work: R-MAC
     ● Regions selected using a rigid grid
     ● Compute a feature vector per region
     ● Combine all region feature vectors
       ○ Dimension → 256 / 512
     [Figure: the last ConvNet layer yields K feature maps of size W x H; rigid region grids at different scales; maximum activation pooled per region]
     Reference: Giorgos Tolias, Ronan Sicre, Hervé Jégou, "Particular object retrieval with integral max-pooling of CNN activations" (submitted to ICLR 2016)
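To make the pooling concrete, here is a minimal NumPy sketch of R-MAC-style aggregation, assuming the convolutional feature maps and the grid of regions are already available; the function name and the region format are illustrative, not code from the paper.

    import numpy as np

    def rmac_descriptor(feature_maps, regions):
        # feature_maps: K x H x W activations from the last convolutional layer
        # regions: list of (x, y, w, h) boxes on the feature-map grid
        region_vectors = []
        for (x, y, w, h) in regions:
            crop = feature_maps[:, y:y + h, x:x + w]     # K x h x w window
            v = crop.max(axis=(1, 2))                    # max-activation per channel
            v /= np.linalg.norm(v) + 1e-12               # L2-normalize the region vector
            region_vectors.append(v)
        # the full pipeline also PCA-whitens each region vector before summing
        global_desc = np.sum(region_vectors, axis=0)     # aggregate regions by summation
        return global_desc / (np.linalg.norm(global_desc) + 1e-12)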
  7. 2. Methodology
  8. 1st Contribution
     ● Three-stream siamese network
     ● PCA implemented as a shift + fully connected layer (sketched below)
     ● Optimize the weights (CNN + PCA) of the R-MAC representation with a triplet loss function
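The "PCA as a shift + fully connected layer" point amounts to rewriting the PCA projection as an affine map, so it can be initialized from a fitted PCA and then fine-tuned with the rest of the network. A minimal sketch with illustrative names only:

    import numpy as np

    def pca_as_fc_layer(pca_mean, pca_components):
        # PCA projection y = W (x - mean) is just an affine (fully connected) layer:
        # weight matrix W and bias b = -W @ mean, so it can be fine-tuned end to end.
        W = pca_components                  # shape (d_out, d_in)
        b = -pca_components @ pca_mean      # the shift is absorbed into the bias
        return W, b

    # usage: project an R-MAC region vector x through the equivalent FC layer
    # y = W @ x + b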
  9. 1st Contribution: Ranking Loss Function
     [Loss equation shown on the slide; a reconstruction follows below]
     where:
     ● m is a scalar that controls the margin
     ● q, d+, d- are the descriptors for the query, positive and negative images
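The loss itself was an image on the slide and did not survive the transcript; from the paper it is the margin-based triplet ranking loss, which in LaTeX notation reads:

    L(q, d^{+}, d^{-}) = \tfrac{1}{2}\,\max\left(0,\; m + \lVert q - d^{+}\rVert^{2} - \lVert q - d^{-}\rVert^{2}\right)

The loss is zero once the negative descriptor is farther from the query than the positive one by at least the margin m, which is what the later hard-negative triplet selection exploits.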
  10. 2nd Contribution
      ● Localize regions of interest (ROIs)
      ● Train a Region Proposal Network with bounding boxes (similar to Faster R-CNN, [arXiv])
      ● The rigid grid used in R-MAC is replaced by the Region Proposal Network
  11. 2nd Contribution: RPN in a nutshell
      ● Predict, for a set of candidate boxes of various sizes and aspect ratios, and at all possible image locations, a score describing how likely each box is to contain an object of interest.
      ● Simultaneously, for each candidate box, perform a regression to improve its location.
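For readers unfamiliar with RPNs, the "candidate boxes of various sizes and aspect ratios at all image locations" are usually generated as anchors on the convolutional feature map. A small illustrative sketch with typical default values, not the paper's exact settings:

    import numpy as np

    def generate_anchors(feat_h, feat_w, stride=16,
                         scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
        # Place candidate boxes of several scales and aspect ratios (height/width)
        # centred on every feature-map cell; the RPN scores and regresses each one.
        anchors = []
        for y in range(feat_h):
            for x in range(feat_w):
                cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
                for s in scales:
                    for r in ratios:
                        h, w = s * np.sqrt(r), s / np.sqrt(r)
                        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
        return np.array(anchors)   # shape: (feat_h * feat_w * len(scales) * len(ratios), 4)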
  12. Summary
      ● Able to encode one image into a compact feature vector in a single forward pass
      ● Images can be compared using the dot product
      ● Very efficient at test time
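This is why test time is cheap: with L2-normalized global descriptors, ranking a database against a query is a single matrix-vector product. A minimal sketch, with illustrative names:

    import numpy as np

    def rank_database(query_desc, db_descs):
        # query_desc: L2-normalized descriptor, shape (d,)
        # db_descs:   L2-normalized database descriptors, shape (N, d)
        scores = db_descs @ query_desc      # dot product equals cosine similarity here
        order = np.argsort(-scores)         # most similar images first
        return order, scores[order]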
  13. 3. Experiments
  14. Datasets
      ● Training: Landmarks dataset, 214k images from 672 landmark sites
      ● Testing: Oxford 5k, Paris 6k, Oxford 105k, Paris 106k, INRIA Holidays
      ● Remove all images contained in the Oxford 5k and Paris 6k datasets
        ○ Landmarks-full: 200k images from 592 landmarks
      ● Clean the Landmarks dataset (select the most relevant images, discard incorrect ones)
        ○ SIFT + Hessian-Affine keypoint detection → construct a graph of similar images
        ○ Landmarks-clean: 52k images from 592 landmarks
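One plausible reading of the graph-based cleaning step, sketched below: images of a landmark become nodes, pairwise keypoint matches become edges, and images outside a large connected component are discarded. This is an illustrative reconstruction that assumes the pairwise matches are already computed, not the authors' script.

    from collections import defaultdict

    def largest_connected_component(num_images, matched_pairs):
        # Build an undirected graph: an edge links two images with enough keypoint matches.
        adj = defaultdict(set)
        for i, j in matched_pairs:
            adj[i].add(j)
            adj[j].add(i)

        seen, best = set(), []
        for start in range(num_images):
            if start in seen:
                continue
            component, stack = [], [start]   # depth-first traversal of one component
            seen.add(start)
            while stack:
                node = stack.pop()
                component.append(node)
                for nxt in adj[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            if len(component) > len(best):
                best = component
        return best   # indices of the images kept for this landmark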
  15. Bounding Box Estimation
      ● RPN trained using automatically estimated bounding box annotations (see the sketch after this slide):
        1. Define the initial bounding box as the minimal rectangle that encloses all matched keypoints
        2. For a pair (i, j), predict the bounding box Bj from Bi and the affine transform Aij
        3. Update by merging the estimates with the geometric mean
        4. Iterate until convergence
      [Figure: bounding box projections; initial vs. final estimations]
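An illustrative sketch of that iterative refinement, under the assumption that the matched pairs and their affine transforms are given; all names and the fixed iteration count are placeholders rather than the authors' implementation.

    import numpy as np

    def refine_boxes(boxes, pairs, affine, n_iters=10):
        # boxes:  dict image_id -> np.array([x1, y1, x2, y2]) initial estimates
        # pairs:  list of matched image pairs (i, j)
        # affine: dict (i, j) -> function projecting a box from image i to image j
        for _ in range(n_iters):            # a convergence test would replace the fixed count
            new_boxes = {}
            for img, box in boxes.items():
                projections = [affine[(i, j)](boxes[i]) for (i, j) in pairs if j == img]
                projections.append(box)
                # merge the current estimate and all projections with the geometric mean
                # (pixel coordinates are assumed positive)
                new_boxes[img] = np.exp(np.mean(np.log(np.stack(projections)), axis=0))
            boxes = new_boxes
        return boxes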
  16. Experimental Details
      ● VGG-16 network pre-trained on ImageNet
      ● Fine-tune with the Landmarks dataset
      ● Select triplets in an efficient manner (sketched below)
        ○ Forward pass to obtain image representations
        ○ Select hard negatives (large loss)
      ● Dimension of the feature vector = 512
      ● Evaluation: mean Average Precision (mAP)
      [Figure: VGG-16 architecture]
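A minimal sketch of hard-negative triplet selection, assuming descriptors have already been produced by a forward pass; the margin, the number of triplets kept, and the function name are illustrative choices.

    import numpy as np

    def mine_hard_triplets(query, positives, negatives, margin=0.1, top_k=5):
        # Keep the (positive, negative) pairs that currently incur the largest ranking loss,
        # since only triplets with non-zero loss produce a gradient.
        triplets = []
        for p_idx, d_pos in enumerate(positives):
            pos_dist = np.sum((query - d_pos) ** 2)
            for n_idx, d_neg in enumerate(negatives):
                neg_dist = np.sum((query - d_neg) ** 2)
                loss = max(0.0, margin + pos_dist - neg_dist)
                if loss > 0:
                    triplets.append((loss, p_idx, n_idx))
        triplets.sort(key=lambda t: -t[0])   # hardest triplets first
        return triplets[:top_k]              # (loss, positive index, negative index)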
  17. 1st Experiment
      Comparison between the original R-MAC and the authors' implementations
      [Results table] C: classification network; R: ranking (trained with triplets)
  18. 2nd Experiment
      Comparison of the fixed grid vs. the number of region proposals
      16-32 proposals already outperform the rigid grid!
  19. 2nd Experiment
      [Plots: mAP vs. number of triplets; recall vs. number of region proposals]
  20. 2nd Experiment
      Heatmap vs. bounding box estimation [figure]
  21. Comparison with state of the art
  22. Comparison with state of the art
  23. Top Retrieval Results
  24. 4. Conclusions
  25. Conclusions
      ● The authors propose an effective and scalable method for image retrieval that encodes images into compact global signatures that can be compared with the dot product.
      ● They propose a siamese network architecture trained for the specific task of image retrieval using a ranking (triplet) loss function.
      ● They demonstrate the benefit of predicting the ROIs of images during encoding by using a Region Proposal Network.
  26. Thank You! Any Questions?
