Saliency Weighted Convolutional Features for Instance Search

Eva Mohedano, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor

Contents
Instance Search task
Motivation
Proposed Method
Results
Conclusions and Future Work

Visual Instance Retrieval
[Figure: visual query (“This dog”), image database, and expected outcome]

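The Classic Retrieval Pipeline
[Figure: the query image and every dataset image are mapped to image representations, $\mathbf{v} = (v_1, \dots, v_n)$ for the query and $\mathbf{v}_1 = (v_{11}, \dots, v_{1n}), \dots, \mathbf{v}_k = (v_{k1}, \dots, v_{kn})$ for the dataset; image matching compares them with a similarity metric (Euclidean distance or cosine similarity) and returns a ranked list of images sorted by similarity score (0.98, 0.97, …, 0.10, 0.01)]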
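As a minimal sketch of the matching step, the Python below ranks dataset vectors against a query with either of the two metrics named on the slide. The function names (`euclidean_distance`, `cosine_similarity`, `rank_dataset`) are illustrative and not taken from the authors' code; plain lists stand in for real image representations.

```python
import math

def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_dataset(query, dataset, metric="cosine"):
    """Return dataset indices ordered from most to least similar to the query."""
    if metric == "cosine":
        scores = [cosine_similarity(query, v) for v in dataset]
        return sorted(range(len(dataset)), key=lambda k: -scores[k])
    # For a distance, smaller means more similar, so sort ascending.
    dists = [euclidean_distance(query, v) for v in dataset]
    return sorted(range(len(dataset)), key=lambda k: dists[k])

query = [1.0, 0.0, 1.0]
dataset = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
print(rank_dataset(query, dataset, "cosine"), rank_dataset(query, dataset, "euclidean"))
```

For L2-normalized vectors both metrics produce the same ranking, since $\|a - b\|^2 = 2 - 2\cos(a, b)$.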

Initial Search
Bag of Visual Words
● Variable number of feature vectors per image: $\mathbf{v}_1 = (v_{11}, \dots, v_{1n}), \dots, \mathbf{v}_k = (v_{k1}, \dots, v_{kn})$
● Local features in an N-dimensional feature space are assigned to M visual words (M clusters)
● Inverted file:
word | Image ID
1 | 1, 12
2 | 1, 30, 102
3 | 10, 12
4 | 2, 3
6 | 10
... | ...
● Large vocabularies (50k-1M)
● Very fast!
● Typically used with SIFT features
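The inverted file can be sketched in Python as follows, assuming a tiny toy vocabulary and brute-force nearest-centroid assignment (vocabularies of 50k-1M words normally need approximate nearest-neighbor search instead). Scoring images by the number of distinct query words they share is a simplification; practical systems usually add tf-idf weighting. All names are hypothetical.

```python
from collections import defaultdict

def nearest_word(feature, centroids):
    """Index of the closest centroid (visual word) by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda w: sum((f - c) ** 2 for f, c in zip(feature, centroids[w])))

def build_inverted_file(images, centroids):
    """images: {image_id: [local feature vectors]} -> {word: set of image ids}."""
    index = defaultdict(set)
    for image_id, features in images.items():
        for f in features:
            index[nearest_word(f, centroids)].add(image_id)
    return index

def initial_search(query_features, index, centroids):
    """Vote for every image that shares a visual word with the query; best first."""
    votes = defaultdict(int)
    for word in {nearest_word(f, centroids) for f in query_features}:
        # Only the posting lists of the query's words are visited.
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return sorted(votes, key=lambda i: (-votes[i], i))

centroids = [[0, 0], [10, 0], [0, 10]]
images = {1: [[1, 1], [9, 1]], 2: [[0, 9]], 3: [[1, 0], [11, 1], [1, 9]]}
index = build_inverted_file(images, centroids)
print(initial_search([[0, 1], [10, 1]], index, centroids))
```

The speed comes from visiting only the posting lists of words present in the query, instead of comparing the query with every image.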

Spatial re-ranking
Re-ranking the top-ranked results using spatial constraints
RAndom SAmple Consensus (RANSAC)
● Estimates a homography between the query and a dataset image
● Re-ranks based on the number of inlier local features
● Improves the quality of the initial search
● Expensive to compute
Philbin, James, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. "Object retrieval with large vocabularies and fast
spatial matching." In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on, pp. 1-8. IEEE, 2007.
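A compact sketch of RANSAC-based re-ranking, written in plain Python so it runs without OpenCV. Each candidate image is assumed to come with putative feature matches `((x, y), (u, v))` to the query. A homography (with its last entry fixed to 1) is fitted exactly to random 4-match samples, and candidates are re-sorted by their best inlier count. The threshold, iteration count, and helper names are assumptions for illustration, not Philbin et al.'s implementation.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting; None if singular."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # e.g. a sample with collinear points
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_homography(sample):
    """Fit H (h33 = 1) exactly to 4 matches; returns its 9 entries row by row."""
    a, b = [], []
    for (x, y), (u, v) in sample:
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v]); b.append(v)
    h = solve_linear(a, b)
    return None if h is None else h + [1.0]

def count_inliers(h, matches, thresh):
    """Number of matches whose reprojection error is at most thresh pixels."""
    n = 0
    for (x, y), (u, v) in matches:
        w = h[6] * x + h[7] * y + h[8]
        if abs(w) < 1e-12:
            continue
        dx = (h[0] * x + h[1] * y + h[2]) / w - u
        dy = (h[3] * x + h[4] * y + h[5]) / w - v
        if dx * dx + dy * dy <= thresh * thresh:
            n += 1
    return n

def ransac_inliers(matches, iters=200, thresh=3.0, seed=0):
    """Best inlier count over random minimal samples."""
    if len(matches) < 4:
        return 0
    rng = random.Random(seed)
    best = 0
    for _ in range(iters):
        h = fit_homography(rng.sample(matches, 4))
        if h is not None:
            best = max(best, count_inliers(h, matches, thresh))
    return best

def spatial_rerank(shortlist, iters=200, thresh=3.0):
    """shortlist: [(image_id, matches)] from the initial search -> ids by inlier count."""
    scored = [(ransac_inliers(m, iters, thresh), img) for img, m in shortlist]
    return [img for _, img in sorted(scored, key=lambda t: -t[0])]
```

Only the top of the initial ranking is re-ranked, because every candidate requires its own RANSAC run; that is why the slide lists the step as expensive.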

Deep Learning Approaches in CBMI
Zheng, Liang, Yi Yang, and Qi Tian. "SIFT meets CNN: A decade survey of instance retrieval." TPAMI 2018.

Features from pre-trained CNNs
- Giving more importance to the center region (content-independent)
[Figure: convolutional features → Gaussian weighting → sum-pooled features]
Babenko, Artem, and Victor Lempitsky. "Aggregating local deep features for image retrieval." CVPR 2015.
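The center-prior weighting on this slide can be sketched as a Gaussian weight per feature-map cell, followed by weighted sum-pooling and L2 normalization. Choosing sigma as one third of the distance from the center to the nearest border is an assumption here, as are the function names. Feature maps are nested lists of C-dimensional vectors.

```python
import math

def gaussian_weights(h, w, sigma=None):
    """Content-independent center prior for each cell (i, j) of an h x w feature map."""
    ci, cj = (h - 1) / 2, (w - 1) / 2
    if sigma is None:
        # Assumption: one third of the distance from the center to the nearest border.
        sigma = min(h, w) / 6
    return [[math.exp(-((i - ci) ** 2 + (j - cj) ** 2) / (2 * sigma ** 2))
             for j in range(w)] for i in range(h)]

def weighted_sum_pool(fmap, weights):
    """Sum-pool C-dim local features with per-cell weights, then L2-normalize."""
    c = len(fmap[0][0])
    pooled = [0.0] * c
    for frow, wrow in zip(fmap, weights):
        for feat, wgt in zip(frow, wrow):
            for k in range(c):
                pooled[k] += wgt * feat[k]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm else pooled

# 3x3 toy map: the center cell has a different feature from its neighbors.
fmap = [[[0.0, 1.0] if (i, j) == (1, 1) else [1.0, 0.0] for j in range(3)] for i in range(3)]
pooled = weighted_sum_pool(fmap, gaussian_weights(3, 3))
```

The weights do not depend on the image content, so the same map is reused for every image of a given resolution.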

- Giving more importance to the most active regions of a convolutional layer (content-dependent)
[Figure: convolutional features → spatial weighting from the sum across conv channels → sum-pooled features]
Kalantidis, Yannis, Clayton Mellina, and Simon Osindero. "Cross-dimensional weighting for aggregated deep convolutional features." ECCV 2016.
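A sketch of the content-dependent variant: the spatial weight of each cell is derived from the sum of its activations across channels. The normalization `(S / ||S||_a) ** (1/b)` with `a = b = 2` follows our reading of the CroW paper and should be treated as an assumption. CroW's separate channel weighting is omitted, and the function names are hypothetical.

```python
import math

def channel_sum_weights(fmap, a=2.0, b=2.0):
    """Spatial weights from the per-cell sum of activations across channels.
    Assumes non-negative (post-ReLU) activations, so the powers stay real."""
    s = [[sum(feat) for feat in row] for row in fmap]
    norm = sum(v ** a for row in s for v in row) ** (1.0 / a)
    if norm == 0:
        return [[0.0] * len(row) for row in s]
    return [[(v / norm) ** (1.0 / b) for v in row] for row in s]

def weighted_sum_pool(fmap, weights):
    """Sum-pool C-dim local features with per-cell weights, then L2-normalize."""
    c = len(fmap[0][0])
    pooled = [0.0] * c
    for frow, wrow in zip(fmap, weights):
        for feat, wgt in zip(frow, wrow):
            for k in range(c):
                pooled[k] += wgt * feat[k]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm else pooled

fmap = [[[1, 1], [0, 0]], [[0, 2], [0, 0]]]  # 2x2 map, 2 channels
weights = channel_sum_weights(fmap)
pooled = weighted_sum_pool(fmap, weights)
```

Cells with no activation get zero weight, so the pooled descriptor is driven by the most active regions of the image.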

- Regional Maximum Activations of Convolutions (R-MAC)
[Figure: region 1, region 2, …, region N → max-pooling per region → region normalization]
Tolias, Giorgos, Ronan Sicre, and Hervé Jégou. "Particular object retrieval with integral max-pooling of CNN activations." ICLR 2016.
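The R-MAC aggregation can be sketched as follows: max-pool every channel inside each region, L2-normalize each regional vector, sum the results, and L2-normalize the sum. Regions are given as hypothetical `(top, left, height, width)` boxes in feature-map cells. The PCA-whitening that the original method applies to the regional vectors is left out of this sketch.

```python
import math

def l2n(v):
    """L2-normalize a vector (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def rmac(fmap, regions):
    """fmap[i][j] is a C-dim activation vector; regions are (top, left, height, width)."""
    c = len(fmap[0][0])
    total = [0.0] * c
    for top, left, h, w in regions:
        # Regional max-pooling: one maximum per channel inside the box.
        r = [max(fmap[i][j][k] for i in range(top, top + h) for j in range(left, left + w))
             for k in range(c)]
        r = l2n(r)  # per-region normalization (whitening omitted)
        total = [t + x for t, x in zip(total, r)]
    return l2n(total)

fmap = [[[1, 0], [0, 0]], [[0, 0], [0, 3]]]
two_regions = rmac(fmap, [(0, 0, 1, 1), (1, 1, 1, 1)])
whole_image = rmac(fmap, [(0, 0, 2, 2)])
```

Normalizing each region before summing keeps a single strongly activated region from dominating: the two toy regions contribute equally, whereas max-pooling over the whole image is dominated by the activation of 3.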

- Regional Maximum Activations of Convolutions (R-MAC) (content-independent)
[Figure: R-MAC spatial weight, from a fixed set of locations and window scales]
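To see why a fixed set of windows acts as a content-independent spatial weight, the sketch below generates square windows at a few scales and counts how many windows cover each cell. The sampling rule (side `2*min(h, w)/(l+1)` at scale `l`, roughly 40% overlap between neighboring windows) is a simplified assumption and does not reproduce the exact R-MAC region sampling. The function names are hypothetical.

```python
import math

def window_starts(length, side, step):
    """Evenly spaced start offsets so that windows of `side` cells span `length`."""
    n = math.ceil((length - side) / step) + 1
    if n == 1:
        return [0]
    return [round(k * (length - side) / (n - 1)) for k in range(n)]

def rmac_like_regions(h, w, levels=3, overlap=0.4):
    """Square (top, left, height, width) windows at `levels` fixed scales."""
    regions = []
    for l in range(1, levels + 1):
        side = max(1, 2 * min(h, w) // (l + 1))
        step = side * (1 - overlap)
        for top in window_starts(h, side, step):
            for left in window_starts(w, side, step):
                regions.append((top, left, side, side))
    return regions

def coverage_map(h, w, regions):
    """Number of windows covering each cell: the implicit spatial weight."""
    cov = [[0] * w for _ in range(h)]
    for top, left, rh, rw in regions:
        for i in range(top, top + rh):
            for j in range(left, left + rw):
                cov[i][j] += 1
    return cov

regions = rmac_like_regions(16, 21, levels=3)
cov = coverage_map(16, 21, regions)
```

On a 16 x 21 map with three scales, each corner cell is covered once per scale, while central cells are covered at least as often, so the fixed grid implicitly favors the image center regardless of content.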

Using human-based saliency models
[Figure: human-based saliency]

Saliency weighting for retrieval
- Traditionally explored with SIFT-based BoW approaches to:
  - Prune the number of local descriptors [1]
  - Weight the contribution of the background [2]
We investigate traditional and data-driven saliency models to weight the contribution of visual words assigned to local convolutional features for the Visual Instance Search task.
[1] Awad, Dounia, Vincent Courboulay, and Arnaud Revel. "Saliency filtering of SIFT detectors: Application to CBIR." ACIVS 2012.
[2] de Carvalho Soares, Robson, Ilmerio Reis da Silva, and Denise Guliato. "Spatial locality weighting of features using saliency map with a bag-of-visual-words approach." ICTAI 2012.

Bag of Local Convolutional Features
[Figure: image at 336x256 resolution → conv5_1 features from VGG16 (21x16) → assignment to 25K centroids (visual vocabulary) → 25K-D bag-of-words vector (sparse feature representation)]
Mohedano, Eva, Kevin McGuinness, Noel E. O'Connor, Amaia Salvador, Ferran Marqués, and Xavier Giro-i-Nieto. "Bags of local convolutional
features for scalable instance search." ICMR 2016.
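The encoding on this slide can be sketched as a nearest-centroid assignment of every local feature in the conv5_1 grid, which yields an assignment map, followed by a sparse histogram of visual words. The toy centroids and function names are hypothetical. Feature extraction with VGG16, the 25K-word vocabulary, and any tf-idf weighting or normalization are outside this sketch.

```python
from collections import Counter

def assign_words(fmap, centroids):
    """Assignment map: index of the nearest centroid for every spatial cell of fmap."""
    def nearest(feat):
        return min(range(len(centroids)),
                   key=lambda w: sum((f - c) ** 2 for f, c in zip(feat, centroids[w])))
    return [[nearest(feat) for feat in row] for row in fmap]

def bag_of_words(assignment_map):
    """Sparse BoW {visual word: count}; words that never occur are simply absent."""
    return Counter(w for row in assignment_map for w in row)

centroids = [[0, 0], [1, 1], [5, 5]]  # toy vocabulary of 3 words
fmap = [[[0, 0.1], [1, 0.9], [0.2, 0]],
        [[5, 4], [1, 1], [0, 0]]]     # toy 2x3 grid of 2-D local features
amap = assign_words(fmap, centroids)
bow = bag_of_words(amap)
```

With a 21x16 grid, at most 336 distinct words can appear in one image, which is why the 25K-D vector is sparse.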

Masking the relevant region (encoding the query)
[Figure: image at 336x256 resolution → conv5_1 features from VGG16 (21x16) → assignment maps over 25K centroids (visual vocabulary) → 25K-D bag-of-words vector]
Mohedano, Eva, Kevin McGuinness, Noel E. O'Connor, Amaia Salvador, Ferran Marqués, and Xavier Giro-i-Nieto. "Bags of local convolutional
features for scalable instance search." ICMR 2016.
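Encoding the query from the relevant region only can be sketched by counting the visual words that fall inside a bounding box of the assignment map. Mapping a pixel box to feature cells with a stride of 16 (336/21 = 256/16 = 16) is an assumption based on the resolutions on the slide, and the helper names are hypothetical.

```python
from collections import Counter

def pixel_box_to_cells(box, stride=16):
    """(top, left, bottom, right) in pixels -> feature cells; bottom/right exclusive."""
    top, left, bottom, right = box
    # Floor the start and ceil the end so partially covered cells are kept.
    return (top // stride, left // stride, -(-bottom // stride), -(-right // stride))

def masked_bag_of_words(assignment_map, cell_box):
    """Query BoW built only from assignment-map cells inside the box."""
    top, left, bottom, right = cell_box
    return Counter(assignment_map[i][j]
                   for i in range(top, bottom) for j in range(left, right))

amap = [[0, 1, 0],
        [2, 1, 0],
        [3, 3, 1]]
query_bow = masked_bag_of_words(amap, (0, 1, 2, 3))
```

Words outside the box (here the background words 2 and 3) do not enter the query representation.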

General Framework
Pan, Junting, Cristian Canton Ferrer, Kevin McGuinness, Noel E. O'Connor, Jordi Torres, Elisa Sayrol, and Xavier Giro-i-Nieto. "Salgan: Visual
saliency prediction with generative adversarial networks." arXiv preprint arXiv:1701.01081 (2017).

Different Saliency models
Gaussian, Conv features, Itti-Koch, BMS, SalNet, SalGAN, SAM-VGG, SAM-ResNet

Encoding relevant areas based on saliency prediction (dataset image)
[Figure: saliency map used as spatial weighting; unweighted BoW vs. weighted BoW, both 25K-D vectors]
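A sketch of saliency-weighted encoding for a dataset image. The saliency map is average-pooled to the assignment-map resolution, and each cell adds its saliency value, rather than 1, to the bin of its visual word. Average pooling by an integer factor is an assumed resizing step; the released code may resize differently. The function names are hypothetical.

```python
def downsample_mean(sal, factor):
    """Average-pool a saliency map (list of rows) by an integer factor; assumes divisibility."""
    h, w = len(sal) // factor, len(sal[0]) // factor
    return [[sum(sal[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(w)] for i in range(h)]

def weighted_bag_of_words(assignment_map, weights):
    """Each cell adds its saliency weight (instead of 1) to the bin of its visual word."""
    bow = {}
    for arow, wrow in zip(assignment_map, weights):
        for word, wgt in zip(arow, wrow):
            bow[word] = bow.get(word, 0.0) + wgt
    return bow

# Toy 4x4 saliency map: only the top-left quadrant is salient.
sal = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
weights = downsample_mean(sal, 2)  # 2x2, same size as the assignment map
amap = [[5, 5],
        [7, 5]]
weighted = weighted_bag_of_words(amap, weights)
```

With uniform weights this reduces to the unweighted BoW ({5: 3, 7: 1} here), so the saliency map only redistributes how much each occurrence counts.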

Effect of different spatial weighting methods
[Figure: hand-crafted saliency models vs. deep-learning-based saliency models]

The salient region lies ‘within’ the instance, which is not beneficial for retrieval datasets based on buildings

Comparison: Sum-pooling vs. BLCF
● BLCF is a better baseline (vocabulary learning can be seen as unsupervised domain adaptation)
● Saliency weighting is effective with both the sum-pooling and the BLCF approach on the instance search dataset Instre

Comparison with the State-of-the-art
High-dimensional (25,000-D) representations with ~200 non-zero entries on average
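Because only about 200 of the 25,000 dimensions are non-zero, such vectors are naturally stored sparsely. A minimal sketch, assuming `{dimension: value}` dictionaries, computes cosine similarity with a cost proportional to the number of non-zeros rather than to the full dimensionality. The function name is hypothetical.

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {dimension: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

a = {3: 1.0, 24999: 2.0}
b = {3: 2.0, 10: 1.0, 24999: 4.0}
print(sparse_cosine(a, b))
```

Only dimensions present in both dictionaries contribute to the dot product, so the 25,000-D ambient size never enters the cost.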

Gomez P, Mohedano E, McGuinness K, Giró-i-Nieto X, O'Connor N, “Demonstration of an Open Source Framework for Qualitative
Evaluation of CBIR Systems”, ACM Multimedia 2018
Dockerized visualization tool

Conclusions
● Demonstrated the application of modern saliency models to the instance search task
● Achieved state-of-the-art performance on an instance search benchmark (Instre) with an off-the-shelf CNN model
Future Work
● Investigate better post-processing for ranking refinement
● Scale the method to large-scale datasets

Thanks for your attention!
Questions?
Software available @ https://github.com/imatge-upc/salbow
