Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)

Published on https://telecombcn-dl.github.io/2017-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.


  1. 1. [course site] #DLUPC Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University Image retrieval Day 4 Lecture 5
  2. 2. Overview ● What is content-based image retrieval? ● The classic SIFT retrieval pipeline ● Using off-the-shelf CNN features for retrieval ● Learning representations for retrieval 2
  3. 3. Overview ● What is content-based image retrieval? ● The classic SIFT retrieval pipeline ● Using off-the-shelf CNN features for retrieval ● Learning representations for retrieval 3
  4. 4. The problem: query by example Given: ● An example query image that illustrates the user's information need ● A very large dataset of images Task: ● Rank all images in the dataset according to how likely they are to fulfil the user's information need 4
  5. 5. Challenges Similarity: How to compare two images for similarity? What does it mean for two images to be similar? Speed: How to guarantee sub-second query response times? Scalability: How to handle millions of images and multiple simultaneous users? How to ensure image representations are sufficiently compact? 5
  6. 6. Challenges: comparing images Distance: 0.0 Distance: 0.654 Distance: 0.661 Image representation: 150,528-dimensional vector formed by concatenating the flattened color channels. The final vector is L2-normalized. 6
  7. 7. Challenges: comparing images Distance: 0.0 Distance: 0.718 Distance: 0.080 Image representation: 256-dimensional vector based on the histogram of the grayscale pixels. The vector is L2-normalized. 7
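As a rough illustration of these two baseline representations, the NumPy sketch below (function names are mine, not from the slides) builds both descriptors and compares them with Euclidean distance, assuming images are resized to 224x224 so the flattened vector has 224x224x3 = 150,528 dimensions.

    import numpy as np

    def flatten_descriptor(img):
        # img: 224x224x3 uint8 array -> 150528-D L2-normalized vector
        v = img.astype(np.float32).ravel()
        return v / (np.linalg.norm(v) + 1e-10)

    def histogram_descriptor(img):
        # 256-bin histogram of grayscale intensities, L2-normalized
        gray = img.astype(np.float32).mean(axis=2)
        counts, _ = np.histogram(gray, bins=256, range=(0, 255))
        h = counts.astype(np.float32)
        return h / (np.linalg.norm(h) + 1e-10)

    def euclidean_distance(a, b):
        return float(np.linalg.norm(a - b))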
  8. 8. Classification 8 Query: This chair Results from dataset classified as “chair”
  9. 9. Retrieval 9 Query: This chair Results from dataset ranked by similarity to the query
  10. 10. Overview ● What is content-based image retrieval? ● The classic SIFT retrieval pipeline ● Using off-the-shelf CNN features for retrieval ● Learning representations for retrieval 10
  11. 11. The retrieval pipeline 11 Image representations: the query image and each dataset image are mapped to feature vectors, v = (v1, …, vn) for the query and v1 = (v11, …, v1n), …, vk = (vk1, …, vkn) for the dataset. Image matching: a similarity metric (Euclidean distance or cosine similarity) gives a similarity score for each dataset image, and the images are returned as a ranked list (e.g. 0.98, 0.97, …, 0.10, 0.01).
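A minimal sketch of the matching stage, assuming descriptors have already been extracted and stacked one per row (names are illustrative): score the query against every dataset descriptor with cosine similarity and sort to obtain the ranked list.

    import numpy as np

    def rank_by_cosine(query, dataset):
        # query: (d,) descriptor; dataset: (k, d) matrix, one row per dataset image
        q = query / (np.linalg.norm(query) + 1e-10)
        D = dataset / (np.linalg.norm(dataset, axis=1, keepdims=True) + 1e-10)
        scores = D @ q                  # cosine similarity per dataset image
        order = np.argsort(-scores)     # highest similarity first
        return order, scores[order]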
  12. 12. The classic SIFT retrieval pipeline 12 Each image yields a variable number of local feature vectors v1 = (v11, …, v1n), …, vk = (vk1, …, vkn), typically SIFT descriptors. A bag of visual words quantizes the N-dimensional feature space into M visual words (M clusters), and an inverted file maps each visual word to the IDs of the images containing it (e.g. word 1 → images 1, 12; word 2 → images 1, 30, 102; word 3 → images 10, 12; word 4 → images 2, 3; word 6 → image 10; …). Large vocabularies (50k-1M words) make matching very fast. Typically used with SIFT features.
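The sketch below shows, under simplifying assumptions (a pre-trained vocabulary of M cluster centres and brute-force nearest-centroid assignment rather than the approximate search used in practice), how the bag-of-visual-words quantization and the inverted file in this pipeline could be built.

    import numpy as np
    from collections import defaultdict

    def quantize(descriptors, vocabulary):
        # descriptors: (n, 128) SIFT vectors; vocabulary: (M, 128) cluster centres
        d2 = ((descriptors ** 2).sum(1)[:, None]
              - 2.0 * descriptors @ vocabulary.T
              + (vocabulary ** 2).sum(1)[None, :])
        return d2.argmin(axis=1)        # visual word id per descriptor

    def build_inverted_file(image_descriptors, vocabulary):
        # image_descriptors: dict image_id -> (n_i, 128) array of local descriptors
        inverted = defaultdict(set)     # visual word id -> set of image ids
        for image_id, desc in image_descriptors.items():
            for word in quantize(desc, vocabulary):
                inverted[int(word)].add(image_id)
        return inverted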
  13. 13. Convolutional neural networks and retrieval 13 Classification Object Detection Segmentation
  14. 14. Overview ● What is content-based image retrieval? ● The classic SIFT retrieval pipeline ● Using off-the-shelf CNN features for retrieval ● Learning representations for retrieval 14
  15. 15. Off-the-shelf CNN representations 15 FC layers as global feature representation. Razavian et al. CNN features off-the-shelf: an astounding baseline for recognition. CVPR Workshops 2014. Babenko et al. Neural codes for image retrieval. ECCV 2014
  16. 16. Off-the-shelf CNN representations Neural codes for retrieval [1] ● FC7 layer (4096D) ● L2 norm + PCA whitening + L2 norm ● Euclidean distance ● Only better than the traditional SIFT approach after fine-tuning on a dataset from a similar domain CNN features off-the-shelf: an astounding baseline for recognition [2] ● Extends Babenko's approach with spatial search ● Several features extracted per image (sliding window approach) ● Very good results, but too computationally expensive for practical use [1] Babenko et al. Neural codes for image retrieval. ECCV 2014 [2] Razavian et al. CNN features off-the-shelf: an astounding baseline for recognition. CVPR Workshops 2014 16
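The post-processing of the neural codes [1] can be sketched as below, assuming FC7 activations have already been extracted for the dataset; scikit-learn's PCA with whitening stands in for the whitening step, and the same fitted PCA must then be applied to query codes.

    import numpy as np
    from sklearn.decomposition import PCA

    def neural_code_postprocess(features, n_components=256):
        # features: (k, 4096) FC7 activations, one row per dataset image
        X = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-10)
        pca = PCA(n_components=n_components, whiten=True).fit(X)
        Z = pca.transform(X)                                    # PCA whitening
        Z /= np.linalg.norm(Z, axis=1, keepdims=True) + 1e-10   # second L2 normalization
        return Z, pca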
  17. 17. Off-the-shelf CNN representations 17 sum/max pool conv features across filters Babenko and Lempitsky, Aggregating local deep features for image retrieval. ICCV 2015 Tolias et al. Particular object retrieval with integral max-pooling of CNN activations. arXiv:1511.05879. Kalantidis et al. Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv:1512.04065.
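Reading the pooling as one value per filter, aggregated over spatial locations, a minimal sketch of the sum- and max-pooled descriptors (in the spirit of the cited papers; exact details vary per paper) is:

    import numpy as np

    def sum_pool(fmap):
        # fmap: (C, H, W) conv activations -> (C,) sum-pooled, L2-normalized descriptor
        v = fmap.sum(axis=(1, 2))
        return v / (np.linalg.norm(v) + 1e-10)

    def max_pool(fmap):
        # fmap: (C, H, W) conv activations -> (C,) max-pooled, L2-normalized descriptor
        v = fmap.max(axis=(1, 2))
        return v / (np.linalg.norm(v) + 1e-10)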
  18. 18. Off-the-shelf CNN representations 18 Descriptors from convolutional layers
  19. 19. Off-the-shelf CNN representations 19 Pooling features from conv layers allows us to describe specific parts of an image
  20. 20. Off-the-shelf CNN representations Apply spatial weighting to the features before the sum/max pooling operation on a conv layer: weighting based on the 'strength' of the local features, or on the distance to the center of the image [1, 2]. [1] Babenko and Lempitsky, Aggregating local deep features for image retrieval. ICCV 2015 [2] Kalantidis et al. Cross-dimensional Weighting for Aggregated Deep Convolutional Features. ECCV 2016 20
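A sketch of spatial weighting before pooling, with two illustrative weight maps: a Gaussian center prior and a weight proportional to the total activation ('strength') at each location. The exact weighting schemes in [1] and [2] differ in detail; this only shows the mechanism.

    import numpy as np

    def center_prior_weights(h, w, sigma=0.3):
        # Gaussian weight map peaked at the image center
        ys, xs = np.mgrid[0:h, 0:w]
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        d2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def strength_weights(fmap):
        # weight each location by its total activation across channels
        s = fmap.sum(axis=0)
        return s / (s.max() + 1e-10)

    def weighted_sum_pool(fmap, weights):
        # fmap: (C, H, W); weights: (H, W) -> (C,) weighted, L2-normalized descriptor
        v = (fmap * weights[None, :, :]).sum(axis=(1, 2))
        return v / (np.linalg.norm(v) + 1e-10)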
  21. 21. Off-the-shelf CNN representations 21 BoW, VLAD encoding of conv features Ng et al. Exploiting local features from deep networks for image retrieval. CVPR Workshops 2015 Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search. ICMR 2016
  22. 22. Off-the-shelf CNN representations 22 Descriptors from convolutional layers: an image at 336x256 resolution gives conv5_1 activations from VGG16 at 42x32 spatial resolution; the local features are quantized against 25K centroids to build a 25K-D BoW vector. Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search. ICMR 2016
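A rough sketch of this BoW encoding step, assuming the 25K centroids have already been learned and each of the 42x32 spatial locations of conv5_1 is treated as a 512-D local feature (normalization details follow common practice rather than the exact paper recipe):

    import numpy as np

    def bow_encode(fmap, centroids):
        # fmap: (512, 42, 32) conv5_1 activations; centroids: (25000, 512)
        locs = fmap.reshape(fmap.shape[0], -1).T.astype(np.float32)  # (42*32, 512) local features
        locs /= np.linalg.norm(locs, axis=1, keepdims=True) + 1e-10
        d2 = ((locs ** 2).sum(1)[:, None]
              - 2.0 * locs @ centroids.T
              + (centroids ** 2).sum(1)[None, :])
        words = d2.argmin(axis=1)                                    # visual word per location
        hist = np.bincount(words, minlength=centroids.shape[0]).astype(np.float32)
        return hist / (np.linalg.norm(hist) + 1e-10)                 # sparse 25K-D BoW vector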
  23. 23. Off-the-shelf CNN representations 23 (Same pipeline as the previous slide: 336x256 input, conv5_1 activations from VGG16 at 42x32, 25K centroids, 25K-D vector.)
  24. 24. Off-the-shelf CNN representations 24 Evaluation datasets: Paris Buildings 6k, Oxford Buildings 5k, TRECVID Instance Search 2013 (subset of 23k frames). Kalantidis et al. Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv:1512.04065. Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search. ICMR 2016
  25. 25. Off-the-shelf CNN representations CNN representations ● L2 normalization + PCA whitening + L2 normalization ● Cosine similarity ● Convolutional features work better than fully connected features ● Convolutional features keep spatial information → retrieval + object localization ● Convolutional layers allow custom input sizes ● If data labels are available, fine-tuning the network on the image domain improves the CNN representations 25
  26. 26. CNN as a feature extractor Diagram from SIFT Meets CNN: A Decade Survey of Instance Retrieval 26
  27. 27. Overview ● What is content-based image retrieval? ● The classic SIFT retrieval pipeline ● Using off-the-shelf CNN features for retrieval ● Learning representations for retrieval 27
  28. 28. Learning representations for retrieval Siamese network: a network that learns a function mapping input patterns into a target space such that the L2 distance in the target space approximates the semantic distance in the input space. Applied in: ● Dimensionality reduction [1] ● Face verification [2] ● Learning local image representations [3] 28 [1] Song et al. Deep metric learning via lifted structured feature embedding. CVPR 2016 [2] Chopra et al. Learning a similarity metric discriminatively, with application to face verification. CVPR 2005 [3] Simo-Serra et al. Fracking deep convolutional image descriptors. CoRR abs/1412.6537, 2014
  29. 29. Learning representations for retrieval Siamese network: a network that learns a function mapping input patterns into a target space such that the L2 distance in the target space approximates the semantic distance in the input space. 29 Image credit: Simo-Serra et al. Fracking deep convolutional image descriptors. CoRR 2014
  30. 30. Hinge loss with margin m: negative pairs that end up nearer than the margin pay a linear penalty. The margin m is a hyperparameter. 30
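One way to write this pairwise margin loss as code (the referenced papers use slightly different forms, e.g. squared terms; this follows the linear penalty described on the slide, with illustrative names):

    import numpy as np

    def pair_hinge_loss(d, y, margin=1.0):
        # d: (k,) distances between embedded pairs; y: 1 for matching pairs, 0 otherwise
        pos = y * d                                   # pull positive pairs together
        neg = (1 - y) * np.maximum(0.0, margin - d)   # penalize negatives nearer than the margin
        return float((pos + neg).mean())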
  31. 31. Learning representations for retrieval Siamese network with triplet loss: three weight-sharing CNNs embed the anchor a, positive p, and negative n into an L2 embedding space; the loss minimizes the distance between the query and the positive while maximizing the distance between the query and the negative. 31 Schroff et al. FaceNet: A Unified Embedding for Face Recognition and Clustering. CVPR 2015
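A minimal NumPy version of the triplet loss in this form (hinge on the difference of squared distances, as in FaceNet; names are illustrative):

    import numpy as np

    def triplet_loss(a, p, n, margin=0.2):
        # a, p, n: (k, d) L2-normalized embeddings of anchor, positive, negative
        d_ap = ((a - p) ** 2).sum(axis=1)   # squared distance query-positive
        d_an = ((a - n) ** 2).sum(axis=1)   # squared distance query-negative
        return float(np.maximum(0.0, d_ap - d_an + margin).mean())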
  32. 32. Learning representations for retrieval 32 Deep Image Retrieval: Learning global representations for image search. Gordo et al. (Xerox Research Centre), ECCV 2016 ● R-MAC representation ● Learning descriptors for retrieval with a three-stream siamese network and a ranking (triplet) objective ● Learning where to pool within an image: predicting object locations ● Local features (from predicted ROIs) pooled into a more discriminative space (learned FC layer) ● Building and cleaning a dataset to generate triplets
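To make the R-MAC representation concrete, here is a simplified sketch of regional max pooling and aggregation; the actual R-MAC also PCA-whitens each regional vector before summing and uses a multi-scale grid of regions, both omitted here.

    import numpy as np

    def rmac(fmap, regions):
        # fmap: (C, H, W) conv activations; regions: list of (y0, y1, x0, x1) windows
        vecs = []
        for y0, y1, x0, x1 in regions:
            v = fmap[:, y0:y1, x0:x1].max(axis=(1, 2))    # max pooling within the region
            vecs.append(v / (np.linalg.norm(v) + 1e-10))  # per-region L2 normalization
        agg = np.sum(vecs, axis=0)                        # aggregate regional vectors
        return agg / (np.linalg.norm(agg) + 1e-10)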
  33. 33. Learning representations for retrieval 33
  34. 34. Learning representations for retrieval 34 “Manually” generated dataset used to create the training triplets. Dataset: Landmarks dataset ● 214K images of 672 famous landmark sites ● Dataset processing based on a matching baseline: SIFT + Hessian-affine keypoint detector ● It is important to select the “useful” triplets. Gordo et al. Deep Image Retrieval: Learning global representations for image search. ECCV 2016
  35. 35. Learning representations for retrieval 35 Comparison between R-MAC from an off-the-shelf network and R-MAC retrained for retrieval. Gordo et al. Deep Image Retrieval: Learning global representations for image search. ECCV 2016
  36. 36. Summary 36 ● Pre-trained CNNs are useful for generating image descriptors for retrieval ● Convolutional layers allow us to encode local information ● Knowing how to rank by similarity is the primary task in retrieval ● CNN architectures can be designed to learn how to rank
  37. 37. Questions?
