HARDNET: CONVOLUTIONAL NETWORK FOR LOCAL IMAGE DESCRIPTION
Anastasiia Mishchuk,
Dmytro Mishkin,
Filip Radenovic,
Jiri Matas
OUTLINE
• Short review of methods for learning local descriptors
• The L2-Net
• HardNet loss and architecture
• Benchmarks
TRAINING DATA
Discriminant Learning of Local Image Descriptors
Brown et al., PAMI 2010
3 sets, 400k patches each:
• Liberty (shown)
• Notredame
• Yosemite
Size: 64x64, grayscale.
Obtained from an SfM model: 3D point → DoG keypoints.
Used in all learned descriptors mentioned in this presentation.
CONVEXOPT (SIMONYAN ET AL., 2012)
Simonyan et al., ECCV 2012
Global margin loss; learning is formulated as a convex optimization problem.
MATCHNET
Han et al., CVPR 2015
Works well, but relies on a metric network; approximate kNN methods, e.g. FLANN, cannot be applied directly.
[Figure: MatchNet architecture — a feature network of 7x7, 5x5 and 3x3 convolutions with ReLU and 2x2 max pooling, followed by a metric network of 1x1 convolutions ending in a two-way softmax.]
DEEPCOMPARE
Zagoruyko and Komodakis, CVPR 2015
Works well, but relies on a metric network; approximate kNN methods, e.g. FLANN, cannot be applied directly.
[Figure: DeepCompare architecture — a feature network of 7x7, 5x5 and 3x3 convolutions with ReLU and 2x2 max pooling, followed by 8x8 and 1x1 convolutions ending in a sigmoid similarity output.]
DEEPDESC AND TFEAT
Simo-Serra et al., ICCV 2015; Balntas et al., BMVC 2016

DeepDesc (Simo-Serra et al., 2015)
• Relatively shallow and fast CNN
• Hard negative mining
• Contrastive loss on L2 distance
[Figure: DeepDesc architecture — 7x7, 6x6 and 8x8 convolutions with TanH and 2x2 max pooling, producing a 128-D descriptor.]

TFeat (Balntas et al., 2016)
• Even shallower and faster CNN
• Hard-negative mining by anchor swap in the triplet
• Triplet margin loss on L2 distance
[Figure: TFeat architecture — 7x7, 6x6 and 5x5 convolutions with TanH and L2 pooling, producing a 128-D descriptor.]
DESCRIPTOR COMPARISON

Descr.     #layers w/params  Loss            Hard mining  Kd-tree ready
ConvexOpt  1                 Global margin   -            +
DeepDesc   3                 Contrastive     +            +
TFeat      3                 Triplet margin  +/-          +
MatchNet   8                 Cross entropy   -            -
DeepComp   5                 Hinge           -            -

Balntas et al., BMVC 2016
L2NET (TIAN ET AL., CVPR 2017)
Architecture (32x32 grayscale input → 128-D descriptor):
• 3x3 Conv, pad 1, 32 channels, BN + ReLU
• 3x3 Conv, pad 1, 32 channels, BN + ReLU
• 3x3 Conv, pad 1, stride 2, 64 channels, BN + ReLU
• 3x3 Conv, pad 1, 64 channels, BN + ReLU
• 3x3 Conv, pad 1, stride 2, 128 channels, BN + ReLU
• 3x3 Conv, pad 1, 128 channels, BN + ReLU
• 8x8 Conv, 128 channels, BN + L2 normalization → 1x1x128 descriptor
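As a sanity check, the spatial sizes in the diagram can be reproduced with the standard convolution output-size formula (a small illustrative sketch, not code from the paper):

```python
def conv_out(n, k, pad=0, stride=1):
    """Spatial size after a k x k convolution on an n x n input:
    floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

# Walk a 32x32 patch through the seven L2Net/HardNet layers.
n = 32
n = conv_out(n, 3, pad=1)             # 3x3, pad 1          -> 32
n = conv_out(n, 3, pad=1)             # 3x3, pad 1          -> 32
n = conv_out(n, 3, pad=1, stride=2)   # 3x3, pad 1, /2      -> 16
n = conv_out(n, 3, pad=1)             # 3x3, pad 1          -> 16
n = conv_out(n, 3, pad=1, stride=2)   # 3x3, pad 1, /2      -> 8
n = conv_out(n, 3, pad=1)             # 3x3, pad 1          -> 8
n = conv_out(n, 8)                    # 8x8, no padding     -> 1
print(n)  # the descriptor is a single 128-channel "pixel"
```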
L2NET: LOSS TERMS
• Softmax over row/column of the distance matrix
• Softmax over row/column of the distance matrix of intermediate features
• Penalty on the correlation of descriptor components
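A minimal NumPy sketch of the first and last terms, assuming a batch of matching descriptor pairs so that pair i sits on the diagonal of the distance matrix (function names are ours, not from the L2Net code):

```python
import numpy as np

def softmax_loss(desc_a, desc_p):
    """Softmax over rows/columns of the pairwise distance matrix;
    matching pairs lie on the diagonal and should get the highest
    probability (i.e. the smallest distance)."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_p[None, :, :], axis=2)
    s = np.exp(-d)                               # closer -> more similar
    p_row = s / s.sum(axis=1, keepdims=True)     # softmax over rows
    p_col = s / s.sum(axis=0, keepdims=True)     # softmax over columns
    i = np.arange(len(desc_a))
    return -0.5 * (np.log(p_row[i, i]).mean() + np.log(p_col[i, i]).mean())

def decorrelation_penalty(desc):
    """Penalty on correlation between descriptor components:
    mean squared off-diagonal entry of the correlation matrix
    computed over the batch (rows = samples, columns = dimensions)."""
    c = np.corrcoef(desc, rowvar=False)
    n = c.shape[0]
    off = c - np.diag(np.diag(c))
    return (off ** 2).sum() / (n * (n - 1))
```

The softmax term drives matching distances below non-matching ones; the penalty discourages redundant descriptor dimensions.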
HARDNET
• Triplet margin loss with the hardest in-batch negative
• Penalty on correlation of descriptor channels
• Softmax over row/column of the distance matrix of intermediate features
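The triplet margin loss with hardest-in-batch negative mining can be sketched in NumPy as follows (our simplification of the sampling described in the paper; `hardnet_loss` is an illustrative name):

```python
import numpy as np

def hardnet_loss(anchors, positives, margin=1.0):
    """Triplet margin loss with "hardest-in-batch" negative mining.

    anchors[i] and positives[i] are descriptors of the same 3D point;
    every other descriptor in the batch is a candidate negative, and
    the closest one is used."""
    n = len(anchors)
    d = np.linalg.norm(anchors[:, None, :] - positives[None, :, :], axis=2)
    d_pos = np.diag(d).copy()           # distances of matching pairs
    off = d + 1e10 * np.eye(n)          # exclude matching pairs
    # Hardest negative for pair i: the closest non-matching descriptor,
    # searched over both row i (a_i vs p_j) and column i (a_j vs p_i).
    d_neg = np.minimum(off.min(axis=1), off.min(axis=0))
    return np.maximum(0.0, margin + d_pos - d_neg).mean()
```

Mining inside the batch keeps the cost of finding hard negatives at a single distance-matrix computation per iteration.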
HARDNET (OURS)
Architecture (same as L2Net; 32x32 grayscale input → 128-D descriptor):
• 3x3 Conv, pad 1, 32 channels, BN + ReLU
• 3x3 Conv, pad 1, 32 channels, BN + ReLU
• 3x3 Conv, pad 1, stride 2, 64 channels, BN + ReLU
• 3x3 Conv, pad 1, 64 channels, BN + ReLU
• 3x3 Conv, pad 1, stride 2, 128 channels, BN + ReLU
• 3x3 Conv, pad 1, 128 channels, BN + ReLU
• 8x8 Conv, 128 channels, BN + L2 normalization → 1x1x128 descriptor
BATCH SIZE INFLUENCE
DESCRIPTOR COMPARISON

Descr.     #layers w/params  Loss            Hard mining  Kd-tree ready
ConvexOpt  1                 Global margin   -            +
DeepDesc   3                 Contrastive     +            +
TFeat      3                 Triplet margin  +/-          +
MatchNet   8                 Cross entropy   -            -
DeepComp   5                 Hinge           -            -
L2Net      7                 SoftMax         +            +
HardNet    7                 Triplet margin  +            +

Loss comparison on patch triplets
LOSSES COMPARISON, DERIVATIVES
[Figure: derivatives of the compared losses. Annotations: "No gradient from negative example"; "Small gradients".]
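For reference, a plausible reconstruction of the three compared losses (margin m = 1 on L2 distances; notation ours — verify against the paper before quoting):

```latex
% d_+ : distance of the matching (positive) pair
% d_- : distance of the closest non-matching (negative) pair
\begin{align*}
L_{\text{contrastive}} &= d_+ + \max(0,\; m - d_-)\\
L_{\text{softmax}}     &= -\log\frac{e^{-d_+}}{e^{-d_+} + e^{-d_-}}\\
L_{\text{triplet}}     &= \max(0,\; m + d_+ - d_-)
\end{align*}
% Inside the margin the triplet loss has constant derivatives
% (dL/dd_+ = 1, dL/dd_- = -1); the contrastive loss gets no gradient
% from d_- once d_- > m, and the softmax gradients shrink as the
% loss saturates.
```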
LOSSES COMPARISON

                  Contrastive  Softmax (L2Net)  Triplet margin
FPR, Brown Yos    0.009        0.009            0.006
mAUC, W1BS        0.072        0.083            0.083
mAUC, HP-T        0.153        0.157            0.164
Results 
RESULTS: BROWN DATASET
RESULTS: W1BS DATASET
Mishkin et al., BMVC 2015
Nuisance factors: appearance, geometry, lighting, sensor.
HPATCHES DATASET
Balntas et al., CVPR 2017
DoG, Hessian and Harris keypoints detected in the reference image; ~1300 patches per image kept.
Patches are reprojected to the other images with 3 levels of "affine frame noise" added.
I: 57 image sixplets with photometric (illumination) changes
V: 59 image sixplets with geometric (viewpoint) changes
RESULTS: HPATCHES
RESULTS: MATCHING WITH VIEW SYNTHESIS
Some datasets are already saturated, with performance on par with RootSIFT.
Others are still challenging due to multiple nuisance factors.
Zitnick and Ramnath, 2011; Mishkin et al., 2015; Mikolajczyk et al., 2013; Hauagge and Snavely, 2012; Kelman et al., 2007; Fernando et al., 2014
RESULTS: BOW, OXFORD5K & PARIS6K
Philbin et al., 2007; Philbin et al., 2008
RESULTS: HQE, OXFORD5K & PARIS6K
Thank you for your attention!
PDF: https://arxiv.org/abs/1705.10872
Source and models: https://github.com/DagnyT/hardnet
