Image Analysis & Retrieval
CS/EE 5590 Special Topics (Class Ids: 44873, 44874)
Fall 2016, M/W 4-5:15pm@Bloch 0012
Lec 07
Feature Aggregation and Image Retrieval System
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346.
http://l.web.umkc.edu/lizhu
p.1Image Analysis & Retrieval, 2016
Outline
• Recap of Lecture 06
  – SIFT
  – Box Filter
• Image Retrieval System
• Why Aggregation?
• Aggregation Schemes
• Summary
Scale Space Theory - Lindeberg
• Scale-space response via Laplacian of Gaussian (LoG)
• The scale is controlled by 𝜎
• Characteristic scale:
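$$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}, \qquad g = e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
[Figure: an image blob of radius r filtered at 𝜎 = 0.8r, 1.2r, 2r, …; the scale with the strongest response is the characteristic scale]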
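Here is a quick numerical check of the characteristic-scale idea. This is an illustrative sketch, not part of the lecture code. It evaluates the scale-normalised LoG, 𝜎²∇²g (with g normalised to unit mass), at the centre of a synthetic binary disk of radius r, and returns the 𝜎 with the strongest response. For an ideal disk the continuous answer is 𝜎 = r/√2, so the sampled search should land close to it. The disk, the 𝜎 grid and the function names are assumptions made for the example.

```python
import math

def normalised_log(x, y, sigma):
    """sigma^2 * Laplacian of a unit-mass 2-D Gaussian, evaluated at (x, y)."""
    rho2 = x * x + y * y
    g = math.exp(-rho2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    return (rho2 - 2.0 * sigma ** 2) / sigma ** 2 * g

def center_response(radius, sigma):
    """LoG response at the centre of a binary disk (1 inside, 0 outside) of the given radius."""
    total = 0.0
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            if x * x + y * y <= radius * radius:   # pixel lies inside the disk
                total += normalised_log(x, y, sigma)
    return total

def characteristic_scale(radius, sigmas):
    """The sigma whose |scale-normalised LoG| response at the disk centre is largest."""
    return max(sigmas, key=lambda s: abs(center_response(radius, s)))

sigmas = [2.0 + 0.25 * k for k in range(73)]   # 2.0 ... 20.0
print(characteristic_scale(10, sigmas), 10 / math.sqrt(2.0))
```

The response at the centre of a bright disk is negative, which is why the search compares magnitudes.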
SIFT
• Use DoG to approximate LoG
  – Separable Gaussian filter
  – Difference of images instead of difference of Gaussian kernels
[Figure: scale-space construction by Gaussian filtering and image differencing (DoG approximating LoG)]
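The two implementation tricks above can be sketched in plain Python. The sketch uses a separable Gaussian blur (rows first, then columns) and computes the DoG as the difference of two blurred images. Because G(k𝜎) − G(𝜎) ≈ (k − 1)𝜎²∇²G, the result approximates a scale-normalised LoG. The 3𝜎 kernel radius, the border clamping and k = √2 are assumptions for illustration, not values taken from the lecture.

```python
import math

def gaussian_kernel_1d(sigma):
    """Sampled, normalised 1-D Gaussian truncated at radius ceil(3 * sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def filter_rows(img, kernel):
    """Filter every row with a symmetric 1-D kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    w = len(img[0])
    return [[sum(kv * row[min(max(x + i - r, 0), w - 1)] for i, kv in enumerate(kernel))
             for x in range(w)] for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: filter the rows, then the columns (via transposition)."""
    k = gaussian_kernel_1d(sigma)
    return transpose(filter_rows(transpose(filter_rows(img, k)), k))

def difference_of_gaussians(img, sigma, k=math.sqrt(2.0)):
    """blur(img, k*sigma) - blur(img, sigma): a difference of images, not of kernels."""
    wide, narrow = gaussian_blur(img, k * sigma), gaussian_blur(img, sigma)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(wide, narrow)]
```

In a full SIFT pyramid the image blurred at one scale is reused as the input for the next, so each octave needs only a few extra 1-D passes.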
Peak Strength & Edge Removal
• Peak strength:
  – Interpolate the true DoG response and the sub-pixel location of the extremum by a Taylor expansion
• Edge removal:
  – Re-run a Harris-type (Hessian curvature) test on the much-reduced set of candidate pixels to remove edge responses
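Below is a 1-D version of the Taylor refinement, together with the principal-curvature test that Lowe's SIFT uses to reject edges. SIFT fits the quadratic jointly in (x, y, 𝜎). Restricting the fit to one axis, using finite-difference derivatives, and choosing the curvature ratio r = 10 (Lowe's suggested value) are simplifications assumed here. The interpolated value is the peak strength that would be compared against a contrast threshold.

```python
def refine_peak_1d(d_minus, d_0, d_plus):
    """Fit D(x) ~ D + D' x + 0.5 D'' x^2 through samples at x = -1, 0, 1.
    Returns (sub-sample offset of the extremum, interpolated peak value)."""
    grad = 0.5 * (d_plus - d_minus)        # central first difference
    hess = d_plus - 2.0 * d_0 + d_minus    # second difference
    if hess == 0.0:                        # no curvature: keep the sampled value
        return 0.0, d_0
    offset = -grad / hess                  # zero of the fitted derivative
    value = d_0 + 0.5 * grad * offset      # D(x_hat) = D + 0.5 * D' * x_hat
    return offset, value

def is_edge_like(dxx, dyy, dxy, r=10.0):
    """Principal-curvature ratio test on the 2x2 Hessian: True means reject as an edge."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:                         # curvatures of opposite sign
        return True
    return trace * trace / det >= (r + 1.0) ** 2 / r
```

For samples of the parabola 5 − (x − 0.3)², the refinement recovers offset 0.3 and peak value 5.0.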
Rotation Invariance through Dominant Orientation Coding
• Vote for the dominant orientation
  – Votes are weighted by a Gaussian window to give more emphasis to the gradients closer to the center
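Here is a sketch of the Gaussian-weighted orientation vote. Every pixel adds |∇I| times a Gaussian weight to the histogram bin of its gradient direction, and the fullest bin gives the dominant orientation. The 36-bin histogram follows Lowe's paper. The central-difference gradients, the rounding to bin centres, and the omission of peak interpolation and secondary peaks are simplifications. Angles are measured with y increasing down the rows.

```python
import math

def dominant_orientation(patch, sigma, n_bins=36):
    """Return the dominant gradient orientation (degrees) of a patch centred on a keypoint."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 360.0
            # Gaussian window: gradients near the centre count more
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            hist[int(round(ang * n_bins / 360.0)) % n_bins] += weight * mag
    best = max(range(n_bins), key=lambda b: hist[b])
    return best * 360.0 / n_bins
```

Rotating the descriptor window by the returned angle before building the 128-bin histogram is what makes SIFT rotation invariant.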
SIFT Matching and Repeatability Prediction
• SIFT distance
Not all SIFT features are created equal…
• Peak strength (DoG response at the interpolated position)
Combined scale/peak strength pmf
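$$\frac{d(s_1^1, s_{k^*}^2)}{d(s_1^1, s_k^2)} \le \theta$$
(distance ratio between the best match k* and a competing candidate k in image 2 for the SIFT feature s_1^1 of image 1; the match is accepted when the ratio is at most 𝜃)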
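Read as a nearest/second-nearest ratio test, the inequality above gives a simple brute-force matcher. Euclidean distance and 𝜃 = 0.8 (the value suggested in Lowe's paper) are assumptions. A large-scale system would replace the linear scan with an approximate nearest-neighbour index.

```python
import math

def ratio_test_matches(desc1, desc2, theta=0.8):
    """Return (i, j) pairs where desc2[j] is desc1[i]'s nearest neighbour and
    d(nearest) / d(second nearest) <= theta. desc2 needs at least two entries."""
    matches = []
    for i, d in enumerate(desc1):
        ranked = sorted((math.dist(d, e), j) for j, e in enumerate(desc2))
        (d_best, j_best), (d_second, _) = ranked[0], ranked[1]
        if d_second > 0 and d_best / d_second <= theta:
            matches.append((i, j_best))
    return matches
```

Ambiguous features, whose two best candidates are almost equally close, are dropped even if the nearest one is close in absolute terms.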
Box Filter – CABOX work
• Basic idea:
  – Approximate DoG with a linear combination of box filters
$$\min_{\mathbf{h}} \; \|\mathbf{g} - B\mathbf{h}\|_{2}^{2} + \|\mathbf{h}\|_{1}$$
  – Solution by LASSO
[Figure: DoG kernel = h1 × (box filter 1) + h2 × (box filter 2) + …]
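To illustrate the LASSO step, here is a generic coordinate-descent solver on a 1-D toy problem. The columns of B are centred box filters of different widths, and g is a DoG-like profile. The solver minimises ½‖g − Bh‖² + λ‖h‖₁; the objective above is the λ = ½ case after dividing by two. This is not the CABOX code. The dictionary, the toy target and the iteration count are assumptions.

```python
def box(n, center, half_width):
    """Length-n box filter: 1 on [center - half_width, center + half_width], else 0."""
    return [1.0 if abs(i - center) <= half_width else 0.0 for i in range(n)]

def soft_threshold(z, t):
    return max(z - t, 0.0) - max(-z - t, 0.0)

def lasso_cd(columns, g, lam, n_iter=500):
    """Coordinate descent for min_h 0.5*||g - B h||^2 + lam*||h||_1.
    `columns` holds the columns of B, each a list the same length as g."""
    h = [0.0] * len(columns)
    residual = list(g)                                  # g - B h, with h = 0
    col_sq = [sum(v * v for v in c) for c in columns]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if col_sq[j] == 0.0:
                continue
            # correlation with the residual once column j's own contribution is added back
            rho = sum(c * r for c, r in zip(col, residual)) + col_sq[j] * h[j]
            new_hj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_hj - h[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                h[j] = new_hj
    return h

# Toy example: a DoG-like profile (narrow positive box minus a wider box)
n, c = 33, 16
dictionary = [box(n, c, w) for w in range(1, 9)]
g = [2.0 * a - 1.0 * b for a, b in zip(box(n, c, 1), box(n, c, 4))]
print([round(v, 3) for v in lasso_cd(dictionary, g, lam=0.05)])
```

The L1 term drives most coefficients to exactly zero, so only a few box filters (and their cheap integral-image evaluations) are needed.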
Outline
• Recap of Lecture 06
  – SIFT
  – Box Filter
• Image Retrieval System
• Why Aggregation?
• Aggregation Schemes
• Summary
Image Matching/Retrieval System
SIFT is a sub-image-level feature; what we actually care about is how SIFT matches translate into image-level matching/retrieval accuracy.
Suppose we can compute a single distance from a collection of features:
• Then for a database of n images, we can compute an n × n distance matrix
• This gives us full information on the performance of this feature/distance system
• How do we characterize the performance of such an image matching and retrieval system?
$$d(I_1, I_2) = \sum_k \alpha_k \, d(F_k^1, F_k^2), \qquad D_{j,k} = d(I_j, I_k)$$
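The n × n matrix D can be filled from the weighted image-level distance as sketched below. The sketch assumes every image contributes features already paired by index k. It uses the Euclidean distance for d(·,·) and treats the weights 𝛼k as given. The slide does not specify how features are paired or weighted.

```python
import math

def image_distance(feats1, feats2, alpha):
    """d(I1, I2) = sum_k alpha_k * d(F_k^1, F_k^2), Euclidean d, features paired by index k."""
    return sum(a * math.dist(f1, f2) for a, f1, f2 in zip(alpha, feats1, feats2))

def distance_matrix(images, alpha):
    """D[j][k] = d(I_j, I_k) for every pair of images in the database."""
    n = len(images)
    return [[image_distance(images[j], images[k], alpha) for k in range(n)] for j in range(n)]
```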
Thresholding for Matching
• For any pair of images (documents, in IR jargon), we declare the pair a match if its distance is below a threshold t, and a non-match otherwise
• Then, for every image pair we care about and a given threshold t, there are four possible outcomes:
  – TP pair: {Ij, Ik} is a true matching pair and d(Ij, Ik) < t;
  – FP pair: {Ij, Ik} is a true non-matching pair but d(Ij, Ik) < t;
  – TN pair: {Ij, Ik} is a true non-matching pair and d(Ij, Ik) >= t;
  – FN pair: {Ij, Ik} is a true matching pair but d(Ij, Ik) >= t.
$$\{I_j, I_k\} \text{ are } \begin{cases} \text{a match}, & \text{if } d(I_j, I_k) < t \\ \text{not a match}, & \text{otherwise} \end{cases}$$
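For a single threshold t, the four outcomes can be tallied from D and a ground-truth matrix that says which pairs truly match. The boolean `is_match` matrix, and counting each unordered pair once (j < k), are assumptions made for this sketch.

```python
def confusion_counts(D, is_match, t):
    """Count TP, FP, TN, FN over unordered image pairs (j < k) at threshold t.
    A pair is declared a match when D[j][k] < t; is_match[j][k] is the ground truth."""
    tp = fp = tn = fn = 0
    n = len(D)
    for j in range(n):
        for k in range(j + 1, n):
            declared = D[j][k] < t
            actual = is_match[j][k]
            if declared and actual:
                tp += 1
            elif declared:          # declared a match, truly a non-match
                fp += 1
            elif not actual:        # rejected, truly a non-match
                tn += 1
            else:                   # rejected, truly a match
                fn += 1
    return tp, fp, tn, fn
```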
Matching System Performance
• True Positive Rate (TPR, also called recall, not precision):
  – Out of all true matching pairs, how many are retrieved, i.e., have distance < t
• False Positive Rate (FPR):
  – Out of all true non-matching pairs, how many are wrongly retrieved as matches (distance < t)
$$TPR = \frac{tp}{tp + fn}, \qquad FPR = \frac{fp}{fp + tn}$$
TPR-FPR
Definition (from the point of view of the actual, ground-truth values):
TP rate = TP/(TP+FN)
FP rate = FP/(FP+TN)
ROC Curve (1)
ROC = receiver operating characteristic
Y-axis: TP rate
X-axis: FP rate
ROC Curve (2)
Which method (A or B) is better?
Compute the ROC area, i.e., the area under the ROC curve (AUC); the method with the larger area is better.
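Given (FPR, TPR) operating points from a threshold sweep, the ROC area can be approximated with the trapezoidal rule. The sketch adds the (0, 0) and (1, 1) end points, which is an assumption about how the curve is closed.

```python
def roc_auc(fpr, tpr):
    """Trapezoidal area under an ROC curve given matching lists of FPR and TPR values."""
    pts = sorted(set(zip(fpr, tpr)) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # vertical steps (x1 == x0) add nothing
    return area
```

A random classifier (the diagonal) scores 0.5 and a perfect one scores 1.0.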
Precision, Recall, F-measure
Precision = TP/(TP + FP),
Recall = TP/(TP + FN)
F-measure = 2*(precision*recall)/(precision + recall)
Precision: the probability that a retrieved document is relevant.
Recall: the probability that a relevant document is retrieved in a search.
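The three formulas translate directly into code. Returning 0 when a denominator is empty is a convention assumed here; the slide does not say how to handle that case.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For example, 8 true positives, 2 false positives and 8 false negatives give precision 0.8, recall 0.5 and F = 8/13.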
Matlab Implementation
• We compute all image-pair distances D(j,k)
• How do we compute the TPR-FPR plot?
  – TPR and FPR are both functions of the threshold t
  – We just need to parameterize TPR(t) and FPR(t), evaluate them at a range of meaningful thresholds to obtain operating points, and plot them
• Matlab implementation:
  – [tp, fp, tn, fn]=getPrecisionRecall()
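function [tp, fp, tn, fn] = getPrecisionRecall(d0, d1, npt, dbg)
% d0: distances of ground-truth matching pairs
% d1: distances of ground-truth non-matching pairs
% npt: number of thresholds; dbg: plot the TPR-FPR curve if true
% (argument list assumed here; the slide elides it)
d0 = d0(:); d1 = d1(:);
d_min = min(min(d0), min(d1));
d_max = max(max(d0), max(d1));
delta = (d_max - d_min) / (npt - 1);   % npt thresholds from d_min up to d_max
[tp, fp, tn, fn] = deal(zeros(1, npt));
for k = 1:npt
    thres = d_min + (k-1)*delta;
    tp(k) = sum(d0 <= thres);   % matching pairs accepted
    fp(k) = sum(d1 <= thres);   % non-matching pairs accepted
    tn(k) = sum(d1 > thres);    % non-matching pairs rejected
    fn(k) = sum(d0 > thres);    % matching pairs rejected
end
if dbg
    figure(22); grid on; hold on;
    plot(fp./(tn+fp), tp./(tp+fn), '.-r', 'DisplayName', 'tpr-fpr'); legend();
end
end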
TPR-FPR
• Image matching performance is characterized by the function TPR(FPR)
• For the final retrieval set we want high precision; for a short list we want high recall.
Outline
• Recap of Lecture 06
  – SIFT
  – Box Filter
• Image Retrieval System
• Why Aggregation?
• Aggregation Schemes
• Summary
Why Aggregation?
• What do (local) interest-point features bring us?
  – Scale and rotation invariance, in the form of an n_k × d matrix (one row S_k per keypoint)
  – Uncertainty in the number of detected features n_k at query time
  – Any permutation of the rows gives the same representation
• Problems:
  – The feature set has no fixed size or order, so we cannot directly draw decision boundaries
  – Not directly indexable/hashable
  – Typically very high dimensionality
$$S_k = [x_k, y_k, \theta_k, \sigma_k, h_1, h_2, \ldots, h_{128}], \quad k = 1, \ldots, n$$
Decision Boundary in Matching
• Can we define a decision-boundary function for an interest-point-based representation?
Curse of Dimensionality in Retrieval
• What does feature dimensionality do to retrieval efficiency?
  – Plot the total volume covered by a query neighborhood that keeps 99% of the range in each dimension, as the dimension grows
  – Matlab: showDimensionCurse.m
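The arithmetic behind such a plot is short. Assume the features live in a unit hypercube. A neighbourhood that keeps 99% of the range in every dimension then covers only 0.99^d of the volume. Conversely, a neighbourhood that should hold a fixed fraction of the volume needs a per-dimension edge of fraction^(1/d). This is an independent sketch; showDimensionCurse.m itself is not reproduced.

```python
def covered_volume(per_dim_fraction, d):
    """Fraction of the unit hypercube covered when each of d axes keeps per_dim_fraction of its range."""
    return per_dim_fraction ** d

def edge_for_volume(volume_fraction, d):
    """Per-axis edge length of a sub-cube that holds volume_fraction of the unit cube."""
    return volume_fraction ** (1.0 / d)

for d in (1, 2, 16, 128, 1024):
    print(d, round(covered_volume(0.99, d), 4), round(edge_for_volume(0.01, d), 4))
```

For d = 128 (the SIFT dimension), a neighbourhood spanning 99% of every axis covers only about 28% of the cube. A neighbourhood holding just 1% of the volume needs about 96% of every axis, so "local" neighbourhoods are no longer local.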
Aggregation – 30,000 ft View
• Bag of Words
  – Compute k centroids in feature space, called visual words
  – Compute a histogram of visual-word occurrences
  – k × 1 feature, hard assignment
• VLAD
  – Compute centroids in feature space
  – Compute the aggregated differences w.r.t. the centroids
  – k × d feature; soft assignment in this lecture's implementation (the original VLAD uses hard, nearest-centroid assignment)
• Fisher Vector
  – Compute a Gaussian Mixture Model (GMM) with 2nd-order information
  – Compute the aggregated feature w.r.t. the means and covariances of the GMM
  – 2 × k × d feature
• AKULA
  – Adaptive centroids and feature count
  – Improved with covariance?
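As a rough illustration of where the 2 × k × d size of the Fisher Vector comes from, the sketch below works with a diagonal-covariance GMM whose weights, means and standard deviations are given. For each component it computes the normalised gradients with respect to the mean and the variance, following the formulas of Perronnin et al. GMM training, power/L2 normalisation and the weight gradients are omitted, and the list-of-lists parameter layout is an assumption.

```python
import math

def fisher_vector(X, weights, means, sigmas):
    """Fisher Vector of descriptors X under a diagonal GMM (weights, means, sigmas given).
    Returns [G_mu_1 .. G_mu_K, G_sigma_1 .. G_sigma_K], i.e. 2 * K * d values."""
    K, d, N = len(weights), len(means[0]), len(X)
    g_mu = [[0.0] * d for _ in range(K)]
    g_sigma = [[0.0] * d for _ in range(K)]
    for x in X:
        # log(w_k * N(x | mu_k, diag(sigma_k^2))) for every component k
        log_p = []
        for k in range(K):
            s = math.log(weights[k])
            for j in range(d):
                z = (x[j] - means[k][j]) / sigmas[k][j]
                s -= 0.5 * z * z + math.log(sigmas[k][j]) + 0.5 * math.log(2.0 * math.pi)
            log_p.append(s)
        top = max(log_p)                          # log-sum-exp for numerical stability
        post = [math.exp(v - top) for v in log_p]
        total = sum(post)
        gamma = [p / total for p in post]         # posterior (soft assignment) of x
        for k in range(K):
            for j in range(d):
                z = (x[j] - means[k][j]) / sigmas[k][j]
                g_mu[k][j] += gamma[k] * z                 # first-order statistic
                g_sigma[k][j] += gamma[k] * (z * z - 1.0)  # second-order statistic
    fv = []
    for k in range(K):
        fv.extend(v / (N * math.sqrt(weights[k])) for v in g_mu[k])
    for k in range(K):
        fv.extend(v / (N * math.sqrt(2.0 * weights[k])) for v in g_sigma[k])
    return fv
```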
Visual Key Words: main idea
Extract some local features from a number of
images …
e.g., SIFT descriptor space: each point is 128-dimensional
Slide credit: D. Nister
Visual Key Words: main idea
Slide credit: D. Nister
Visual words: main idea
Slide credit: D. Nister
Visual words: main idea
Slide credit: D. Nister
Slide credit: D. Nister
Visual Key Words
Each point is a local
descriptor, e.g. SIFT
vector.
Slide credit: D. Nister
Visual words
Example: each group of patches belongs to the
same visual word
Figure from Sivic & Zisserman, ICCV 2003
Visual words
Source credit: K. Grauman, B. Leibe
• More recently used for describing scenes and
objects for the sake of indexing or classification.
Sivic & Zisserman 2003;
Csurka, Bray, Dance, & Fan
2004; many others.
Object Bag of ‘words’
ICCV 2005 short course, L. Fei-Fei
Bag of Words
BoW Examples
• Illustration
Bags of visual words
Summarize an entire image by its distribution (histogram) of visual-word occurrences.
Analogous to the bag-of-words representation commonly used for text documents.
Image credit: Fei-Fei Li
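Here is a minimal bag-of-words encoder: hard-assign each descriptor to its nearest visual word and count. The codebook is assumed to be given (e.g. learned with k-means). The final L1 normalisation is an optional choice, not something the lecture prescribes.

```python
import math

def bow_histogram(descriptors, codebook, normalize=True):
    """k x 1 histogram of nearest-visual-word counts (hard assignment)."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        nearest = min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))
        hist[nearest] += 1.0
    if normalize and descriptors:
        total = sum(hist)
        hist = [h / total for h in hist]
    return hist
```

The output has length k whatever the number n of descriptors, and it does not depend on their order. This addresses the variable-count and permutation issues raised under "Why Aggregation?".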
Texture Retrieval
Textons…
[Figure: universal texton dictionary and the resulting texton histogram]
Source: Lana Lazebnik
BoW Distance Metrics
Rank images by the normalized scalar product between their (possibly weighted) occurrence counts, i.e., a nearest-neighbor search for similar images.
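[Figure: example occurrence-count vectors d_j = [5 1 1 0] and q = [1 8 1 4]]
$$\mathrm{sim}(d_j, q) = \frac{\langle d_j, q \rangle}{\|d_j\| \, \|q\|}$$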
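The normalised scalar product is the cosine of the angle between two (possibly tf-idf weighted) word-count vectors, and ranking is a sort on that score. The zero-vector convention and the function names are assumptions. For the two count vectors from the slide, [5 1 1 0] and [1 8 1 4], it gives 14/√(27·82) ≈ 0.30.

```python
import math

def normalized_scalar_product(d, q):
    """Cosine similarity <d, q> / (||d|| * ||q||); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(d, q))
    norm = math.sqrt(sum(a * a for a in d)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def rank_database(query, database):
    """Indices of database vectors, sorted from most to least similar to the query."""
    return sorted(range(len(database)),
                  key=lambda j: normalized_scalar_product(database[j], query), reverse=True)

print(normalized_scalar_product([5, 1, 1, 0], [1, 8, 1, 4]))
```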
Inverted List
• Image retrieval via an inverted list
Image credit: A. Zisserman
[Figure: inverted list mapping each visual word number to the list of image numbers that contain it]
When will this give us a significant gain in efficiency?
Indexing local features: inverted file index
For text documents, an
efficient way to find all pages
on which a word occurs is to
use an index…
We want to find all images in
which a feature occurs.
We need to index each feature by the images it appears in, and also keep its number of occurrences.
Source credit: K. Grauman, B. Leibe
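A toy inverted file maps each visual word to the images that contain it, together with the occurrence count. Scoring here simply accumulates products of counts over the words the query shares with each image (an unnormalised dot product). Only images that share at least one word with the query are ever touched. That is where the efficiency gain comes from when each image uses a small fraction of a large vocabulary. The dictionary-based layout is a hypothetical choice for illustration.

```python
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """image_words: {image_id: list of visual-word ids}. Returns {word: {image_id: count}}."""
    index = defaultdict(dict)
    for image_id, words in image_words.items():
        for word, count in Counter(words).items():
            index[word][image_id] = count
    return index

def query_index(index, query_words):
    """Score only the images that share a word with the query (unnormalised dot product)."""
    scores = defaultdict(float)
    for word, q_count in Counter(query_words).items():
        for image_id, count in index.get(word, {}).items():
            scores[image_id] += q_count * count
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```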
TF-IDF Weighting
Term Frequency – Inverse Document Frequency
• Describe each image by the frequency of each visual word within it, and down-weight words that appear often in the database (the standard weighting for text retrieval)
$$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$$
where n_id = number of occurrences of word i in document d, n_d = number of words in document d, n_i = number of occurrences of word i in the whole database, and N = total number of words in the database.
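Below is a sketch of the weighting t_i = (n_id / n_d) · log(N / n_i), with the four quantities exactly as labelled on the slide: N is the total number of words in the database and n_i the number of occurrences of word i in the whole database. Many text-retrieval systems instead take N as the number of documents and n_i as the number of documents containing word i. Swap the two counters if that variant is wanted. The input layout (one list of word ids per image) is an assumption.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocab_size):
    """docs: one list of visual-word ids per image. Returns the weights t_i for every image."""
    db_counts = Counter(w for doc in docs for w in doc)   # n_i: occurrences in the database
    N = sum(db_counts.values())                            # total number of words
    vectors = []
    for doc in docs:
        vec = [0.0] * vocab_size
        n_d = len(doc)                                     # number of words in document d
        for i, n_id in Counter(doc).items():               # n_id: occurrences of word i in d
            vec[i] = (n_id / n_d) * math.log(N / db_counts[i])
        vectors.append(vec)
    return vectors
```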
BoW Use Case with Spatial Localization
Collecting words within a query region
Query region:
pull out only the SIFT
descriptors whose
positions are within the
polygon
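Selecting the descriptors that fall inside the query polygon amounts to a point-in-polygon test on each keypoint's (x, y). The even-odd ray-casting rule below is one standard choice. The (x, y, word_id) keypoint tuple is an assumption made for the sketch.

```python
def inside_polygon(x, y, polygon):
    """Even-odd ray casting: True if (x, y) lies inside polygon [(x0, y0), (x1, y1), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > y) != (y1 > y):                      # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:                           # crossing lies to the right of the point
                inside = not inside
    return inside

def words_in_region(keypoints, polygon):
    """keypoints: list of (x, y, word_id); keep the words whose position is inside the polygon."""
    return [w for (x, y, w) in keypoints if inside_polygon(x, y, polygon)]
```

The selected words can then be fed to the same BoW/tf-idf scoring as a whole-image query.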
BoW Patch Search
Localizing the BoW representation
Localization with BoW
Hierarchical Assignment of Histogram
Tree construction:
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
Training: Filling the tree
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
Training: Filling the tree
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
Training: Filling the tree
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
Training: Filling the tree
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
Training: Filling the tree
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
Recognition
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
[Figure: recognition pipeline, with RANSAC verification]
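The recognition step descends the tree. At each level, the descriptor is compared only with the k children of the current node and follows the nearest one. Quantising into k^L leaves therefore costs about k·L distance computations instead of k^L. The nested-dict tree layout and the tiny hand-built two-level tree below are hypothetical, and training the tree (hierarchical k-means) is not shown.

```python
import math

def quantize(tree, x):
    """Descend a vocabulary tree; return (leaf_id, number of distance computations).
    Each node is {"children": [(centroid, child_node), ...]} or {"leaf": id}."""
    node, comparisons = tree, 0
    while "leaf" not in node:
        comparisons += len(node["children"])
        _, node = min(node["children"], key=lambda child: math.dist(x, child[0]))
    return node["leaf"], comparisons

def leaf(i):
    return {"leaf": i}

# k = 2 branches, L = 2 levels -> 4 leaves (visual words)
tree = {"children": [
    ([0.0, 0.0], {"children": [([-1.0, 0.0], leaf(0)), ([1.0, 0.0], leaf(1))]}),
    ([10.0, 0.0], {"children": [([9.0, 0.0], leaf(2)), ([11.0, 0.0], leaf(3))]}),
]}
print(quantize(tree, [10.8, 0.3]))
```

For this toy tree, the query [10.8, 0.3] reaches leaf 3 after 4 distance computations.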
Vocabulary Tree: Performance
Evaluated on large databases
• Indexing with up to 1M images
Online recognition for a database of 50,000 CD covers
• Retrieval in ~1 s
Experiments show that large vocabularies can be beneficial for recognition
[Nister & Stewenius, CVPR’06]
Larger vocabularies can be advantageous… but what happens if the vocabulary is too large?
Visual Word Vocabulary Size
• Performance w.r.t. vocabulary size
Bags of words: pros and cons
Good:
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides vector representation for sets
+ inverted-list implementation offers a practical solution for large repositories
Bad:
- loss of information at quantization and histogram generation
- basic model ignores geometry – must verify afterwards,
or encode via features
- background and foreground mixed when bag covers
whole image
- interest points or sampling: no guarantee to capture
object-level parts
Source credit: K. Grauman, B. Leibe
Can we improve BoW?
• E.g. Why isn’t our Bag of Words classifier at 90%
instead of 70%?
• Training Data
– Huge issue, but not necessarily a variable you can manipulate.
• Learning method
– BoW can sit on top of any feature scheme
• Representation
– Are we losing too much information in the process?
Standard K-means Bag of Words
• BoW revisited
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
Motivation
Bag of Visual Words is only about counting the number
of local descriptors assigned to each Voronoi region
Why not include other statistics/information?
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
We already looked at the Spatial Pyramid/Pooling
Spatial Pooling
[Figure: spatial pyramid, level 0: 1x1, level 1: 2x2, level 2: 4x4]
Key takeaway: multiple assignment? Soft assignment?
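Spatial pooling keeps some layout information. It computes a BoW histogram per grid cell at several pyramid levels and concatenates them, giving (1 + 4 + 16)·k values for levels 0 to 2. The sketch below omits the per-level weights used in Lazebnik et al.'s spatial pyramid match kernel. The (x, y, word_id) keypoint tuple is an assumption.

```python
def spatial_pyramid(keypoints, width, height, k, levels=(0, 1, 2)):
    """Concatenate per-cell visual-word histograms for 1x1, 2x2, 4x4 grids.
    keypoints: list of (x, y, word_id) with 0 <= x < width and 0 <= y < height."""
    feature = []
    for level in levels:
        cells = 2 ** level
        hists = [[0] * k for _ in range(cells * cells)]
        for x, y, w in keypoints:
            cx = min(int(x * cells / width), cells - 1)    # grid column of the keypoint
            cy = min(int(y * cells / height), cells - 1)   # grid row of the keypoint
            hists[cy * cells + cx][w] += 1
        for h in hists:
            feature.extend(h)
    return feature
```

Level 0 is exactly the plain BoW histogram; the finer levels add coarse spatial layout at the cost of a longer vector.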
Motivation
Bag of Visual Words is only about counting the number
of local descriptors assigned to each Voronoi region
Why not include other statistics? For instance:
• mean of local descriptors
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
Motivation
Bag of Visual Words is only about counting the number
of local descriptors assigned to each Voronoi region
Why not include other statistics? For instance:
• mean of local descriptors
• (co)variance of local descriptors
http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf
Simple case: Soft Assignment
Called “Kernel codebook encoding” by Chatfield et al.
2011. Cast a weighted vote into the most similar
clusters.
Simple case: Soft Assignment
Called “Kernel codebook encoding” by Chatfield et al.
2011. Cast a weighted vote into the most similar
clusters.
This is fast and easy to implement (try it for Project 3!)
but it does have some downsides for image retrieval –
the inverted file index becomes less sparse.
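Kernel-codebook (soft) assignment replaces the single vote of hard BoW with Gaussian-kernel weights over the whole codebook. The sketch below normalises each descriptor's weights so they sum to one. The bandwidth σ is a free parameter assumed here, and so is the choice to skip descriptors whose weights all underflow to zero.

```python
import math

def kernel_codebook_histogram(descriptors, codebook, sigma):
    """Soft-assignment BoW: each descriptor spreads a unit vote over all visual words,
    with weights proportional to exp(-||x - c||^2 / (2 sigma^2))."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        w = [math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2)) for c in codebook]
        total = sum(w)
        if total == 0.0:            # descriptor far from every word: skip it
            continue
        for i, wi in enumerate(w):
            hist[i] += wi / total
    return hist
```

Every word now receives some (possibly tiny) vote, which is exactly why the inverted file becomes less sparse. In practice, the votes are often restricted to the few nearest words.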
A first example: the VLAD
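Given a codebook $C = \{\mu_1, \ldots, \mu_k\}$, e.g. learned with K-means, and a set of local descriptors $X = \{x_1, \ldots, x_n\}$:
• assign: $\mathrm{NN}(x) = \arg\min_{\mu_i} \|x - \mu_i\|$
• compute: $v_i = \sum_{x : \mathrm{NN}(x) = \mu_i} (x - \mu_i)$
• concatenate the $v_i$'s + normalize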
Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.
[Figure: descriptors x in the Voronoi cells of centroids 𝜇1…𝜇5, with VLAD components v1…v5. Steps: ① assign descriptors, ② compute x − 𝜇i, ③ vi = sum of x − 𝜇i over cell i]
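The three steps (assign, accumulate residuals, concatenate and normalise) can be sketched with hard nearest-centroid assignment, as in Jégou et al. A global L2 normalisation is assumed for "normalize". The power normalisation used in later VLAD variants is left out.

```python
import math

def vlad(descriptors, codebook):
    """VLAD with hard assignment: v_i = sum of (x - mu_i) over descriptors whose
    nearest centroid is mu_i; the k*d concatenation is then L2-normalised."""
    k, d = len(codebook), len(codebook[0])
    v = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        i = min(range(k), key=lambda c: math.dist(x, codebook[c]))   # (1) assign
        for j in range(d):
            v[i][j] += x[j] - codebook[i][j]                          # (2) accumulate x - mu_i
    flat = [val for row in v for val in row]                          # (3) concatenate
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm else flat
```

Unlike BoW, the result is a dense vector of fixed length k·d that can be compared directly with Euclidean distance or a dot product, which is why no inverted list is needed.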
A first example: the VLAD
A graphical representation of
Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.
VL_FEAT Implementation
• Matlab:
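function [vc] = vladSiftEncoding(sift, codebook)
% VLAD encoding of n x kd SIFT descriptors w.r.t. an m x kd codebook,
% using heat-kernel soft assignment. Called with no arguments, it runs a demo.
dbg = (nargin < 2);
if dbg
    if (0) % init VL_FEAT, only need to do once
        run('../../tools/vlfeat-0.9.20/toolbox/vl_setup.m');
    end
    im = imread('../pics/flarsheim-2.jpg');
    [f, sift] = vl_sift(single(rgb2gray(im)));
    sift = single(sift');
    [indx, codebook] = kmeans(sift, 16);
    % make sift # smaller (guard against images with fewer than 800 features)
    sift = sift(1:min(800, size(sift, 1)), :);
end
sift = single(sift); codebook = single(codebook);
kd = size(sift, 2);
m = size(codebook, 1);
% compute assignment: m x n distances between codewords and descriptors
dist = pdist2(codebook, sift);
mdist = mean(mean(dist));
% normalize the heat kernel s.t. mean dist is mapped to 0.5
a = -log(0.5)/mdist;
indx = single(exp(-a*dist));   % m x n soft-assignment weights
vc = vl_vlad(sift', codebook', indx);
if dbg
    figure(41); colormap(gray);
    subplot(2,2,1); imshow(im); title('image');
    subplot(2,2,2); imagesc(dist); title('m x n distance');
    subplot(2,2,3); imagesc(indx); title('m x n assignment');
    % vl_vlad stacks one kd-dim residual per codeword: reshape to kd x m, then transpose
    subplot(2,2,4); imagesc(reshape(vc, [kd, m])'); title('vlad code');
end
end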
VLAD Code
• What are the tweaks?
  – Codebook design
  – Soft-assignment options
References
• Vocabulary Tree:
  – David Nistér, Henrik Stewénius: Scalable Recognition with a Vocabulary Tree. CVPR (2) 2006: 2161-2168
• VLAD:
  – Herve Jegou, Matthijs Douze, Cordelia Schmid: Improving Bag-of-Features for Large Scale Image Search. International Journal of Computer Vision 87(3): 316-336 (2010)
• Fisher Vector:
  – Florent Perronnin, Jorge Sánchez, Thomas Mensink: Improving the Fisher Kernel for Large-Scale Image Classification. ECCV (4) 2010: 143-156
• AKULA:
  – Abhishek Nagar, Zhu Li, Gaurav Srivastava, Kyungmo Park: AKULA - Adaptive Cluster Aggregation for Visual Search. DCC 2014: 13-22
Lec 07 Summary
• Image retrieval system metrics
  – What are true positives, false positives, true negatives, and false negatives?
  – What are precision, recall, and F-score?
• Why aggregation?
  – Decision boundary
  – Indexing/hashing
• Bag of Words
  – A histogram whose bins are visual words
  – Variation: hierarchical assignment with a vocabulary tree
  – Implementation: inverted list
• VLAD
  – Richer encoding of aggregated information
  – Soft assignment of features to codebook bins
  – Vectorized representation, so no inverted list is needed