We introduce a novel loss for learning local feature descriptors, inspired by Lowe's matching criterion for SIFT. We show that the proposed loss, which maximizes the distance between the closest positive and the closest negative patch in the batch, outperforms complex regularization methods; it works well for both shallow and deep convolutional network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor: it has the same dimensionality as SIFT (128) and shows state-of-the-art performance on wide-baseline stereo, patch verification, and instance retrieval benchmarks. It is fast: computing a descriptor takes about 1 millisecond on a low-end GPU.
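The batch-based losses in this deck all operate on a pairwise distance matrix between unit-length descriptors. A minimal NumPy sketch (illustrative names, not the authors' code; the toy data is synthetic) of that shared ingredient:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each 128-D descriptor row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def distance_matrix(a, p):
    """d[i, j] = Euclidean distance between anchor i and positive j.
    For unit vectors: d = sqrt(2 - 2 * <a_i, p_j>), clipped for stability."""
    dot = a @ p.T
    return np.sqrt(np.clip(2.0 - 2.0 * dot, 0.0, None))

rng = np.random.default_rng(0)
a = l2_normalize(rng.standard_normal((8, 128)))            # anchor descriptors
p = l2_normalize(a + 0.1 * rng.standard_normal((8, 128)))  # noisy matching positives
D = distance_matrix(a, p)  # diagonal holds the positive-pair distances
```

The diagonal entries `D[i, i]` are the matching-pair distances; everything off the diagonal is a candidate negative, which is what both the L2Net and HardNet loss terms later in the deck exploit.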
TRAINING DATA
Discriminant Learning of Local Image Descriptors, Brown et al., PAMI 2010.
3 sets, 400k patches each:
• Liberty (shown)
• Notredame
• Yosemite
Size: 64x64, grayscale.
Obtained from an SfM model: 3D point → DoG keypoints.
Used in all learned descriptors mentioned in this presentation.
L2NET: LOSS TERMS
• Softmax over row/column of the distance matrix
• Softmax over row/column of the distance matrix of intermediate features
• Penalty on correlation between descriptor components
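The first L2Net term can be sketched as follows (NumPy, not the authors' code): each row of the distance matrix is treated as a classification problem in which the true match (the diagonal entry) should be the nearest, so a softmax over negated distances is followed by cross-entropy on the diagonal.

```python
import numpy as np

def l2net_row_loss(D):
    """D: (n, n) distance matrix with D[i, i] = positive-pair distance.
    Returns the mean -log softmax probability of the correct match per row."""
    logits = -D                                  # smaller distance -> larger logit
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diagonal(probs) + 1e-12).mean()

# Toy example: matches at distance 0.1, non-matches at 2.0.
D = np.full((4, 4), 2.0)
np.fill_diagonal(D, 0.1)
row_term = l2net_row_loss(D)
col_term = l2net_row_loss(D.T)  # the symmetric column term
```

The column term is the same computation on the transposed matrix, and the intermediate-feature term applies the same loss to a distance matrix built from an earlier layer's features.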
HARDNET
• Triplet margin loss with the hardest in-batch negative
• Penalty on correlation between descriptor channels
• Softmax over row/column of the distance matrix of intermediate features
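The HardNet triplet term can be sketched as follows (NumPy, illustrative only): for each positive pair (i, i), mine the hardest, i.e. closest, non-matching descriptor in the batch, and require it to be at least `margin` farther away than the positive.

```python
import numpy as np

def hardnet_loss(D, margin=1.0):
    """D: (n, n) distance matrix over anchors x positives; D[i, i] is the
    positive-pair distance. The hardest negative for pair i is the smallest
    off-diagonal entry in row i or column i."""
    n = D.shape[0]
    pos = np.diagonal(D)
    off = D + np.eye(n) * 1e6  # mask the diagonal out of the negative search
    hardest = np.minimum(off.min(axis=1), off.min(axis=0))
    return np.maximum(0.0, margin + pos - hardest).mean()

# Toy example: matches at distance 0.1, all non-matches at 2.0,
# so the margin of 1.0 is already satisfied and the loss is zero.
D = np.full((4, 4), 2.0)
np.fill_diagonal(D, 0.1)
easy = hardnet_loss(D)  # -> 0.0
```

Mining only the single hardest in-batch negative per pair is what lets this loss replace the auxiliary regularization terms above.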
RESULTS: MATCHING WITH VIEW SYNTHESIS
• Some datasets are already saturated
• On par with RootSIFT
• Still challenging due to multiple nuisance factors
References: Zitnick and Ramnath, 2011; Mishkin et al., 2015; Mikolajczyk et al., 2013; Hauagge and Snavely, 2012; Kelman et al., 2007; Fernando et al., 2014.