We introduce a novel loss for learning local feature descriptors, inspired by Lowe's matching criterion for SIFT. We show that the proposed loss, which maximizes the distance between the closest positive and the closest negative patch in the batch, outperforms complex regularization methods; it works well for both shallow and deep convolutional network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor: it has the same dimensionality as SIFT (128) and shows state-of-the-art performance on wide-baseline stereo, patch verification, and instance retrieval benchmarks. It is fast: computing a descriptor takes about 1 millisecond on a low-end GPU.
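The batch-based losses in this deck all operate on a pairwise distance matrix between unit-length descriptors. A minimal NumPy sketch (illustrative names, not the authors' code; the toy data is synthetic) of that shared ingredient:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each 128-D descriptor row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def distance_matrix(a, p):
    """d[i, j] = Euclidean distance between anchor i and positive j.
    For unit vectors: d = sqrt(2 - 2 * <a_i, p_j>), clipped for stability."""
    dot = a @ p.T
    return np.sqrt(np.clip(2.0 - 2.0 * dot, 0.0, None))

rng = np.random.default_rng(0)
a = l2_normalize(rng.standard_normal((8, 128)))            # anchor descriptors
p = l2_normalize(a + 0.1 * rng.standard_normal((8, 128)))  # noisy matching positives
D = distance_matrix(a, p)  # diagonal holds the positive-pair distances
```

The diagonal entries `D[i, i]` are the matching-pair distances; everything off the diagonal is a candidate negative, which is what both the L2Net and HardNet loss terms later in the deck exploit.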
TRAINING DATA
Discriminant Learning of Local Image Descriptors, Brown et al., PAMI 2010.
3 sets, 400k patches each:
• Liberty (shown)
• Notredame
• Yosemite
Size: 64x64, grayscale.
Obtained from an SfM model: 3D point → DoG keypoints.
Used in all learned descriptors mentioned in this presentation.
L2NET: LOSS TERMS
• Softmax over row/column of the distance matrix
• Softmax over row/column of the distance matrix of intermediate features
• Penalty on correlation between descriptor components
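The first L2Net term can be sketched as follows (NumPy, not the authors' code): each row of the distance matrix is treated as a classification problem in which the true match (the diagonal entry) should be the nearest, so a softmax over negated distances is followed by cross-entropy on the diagonal.

```python
import numpy as np

def l2net_row_loss(D):
    """D: (n, n) distance matrix with D[i, i] = positive-pair distance.
    Returns the mean -log softmax probability of the correct match per row."""
    logits = -D                                  # smaller distance -> larger logit
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diagonal(probs) + 1e-12).mean()

# Toy example: matches at distance 0.1, non-matches at 2.0.
D = np.full((4, 4), 2.0)
np.fill_diagonal(D, 0.1)
row_term = l2net_row_loss(D)
col_term = l2net_row_loss(D.T)  # the symmetric column term
```

The column term is the same computation on the transposed matrix, and the intermediate-feature term applies the same loss to a distance matrix built from an earlier layer's features.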
HARDNET
• Triplet margin loss with the hardest in-batch negative
• Penalty on correlation between descriptor channels
• Softmax over row/column of the distance matrix of intermediate features
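The HardNet triplet term can be sketched as follows (NumPy, illustrative only): for each positive pair (i, i), mine the hardest, i.e. closest, non-matching descriptor in the batch, and require it to be at least `margin` farther away than the positive.

```python
import numpy as np

def hardnet_loss(D, margin=1.0):
    """D: (n, n) distance matrix over anchors x positives; D[i, i] is the
    positive-pair distance. The hardest negative for pair i is the smallest
    off-diagonal entry in row i or column i."""
    n = D.shape[0]
    pos = np.diagonal(D)
    off = D + np.eye(n) * 1e6  # mask the diagonal out of the negative search
    hardest = np.minimum(off.min(axis=1), off.min(axis=0))
    return np.maximum(0.0, margin + pos - hardest).mean()

# Toy example: matches at distance 0.1, all non-matches at 2.0,
# so the margin of 1.0 is already satisfied and the loss is zero.
D = np.full((4, 4), 2.0)
np.fill_diagonal(D, 0.1)
easy = hardnet_loss(D)  # -> 0.0
```

Mining only the single hardest in-batch negative per pair is what lets this loss replace the auxiliary regularization terms above.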
RESULTS: MATCHING WITH VIEW SYNTHESIS
• Some datasets are already saturated
• On par with RootSIFT
• Still challenging due to multiple nuisance factors
References: Zitnick and Ramnath, 2011; Mishkin et al., 2015; Mikolajczyk et al., 2013; Hauagge and Snavely, 2012; Kelman et al., 2007; Fernando et al., 2014.