Visual Geometry with Deep Learning

  1. 1. Visual Geometry with Deep Learning Kwang Moo Yi, University of Victoria
  2. 2. Data
  3. 3. Data !4 “make use of the best ally we have: the unreasonable effectiveness of data.” Alon Halevy, Peter Norvig, and Fernando Pereira, The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8-12. 2009
  4. 4. Effectiveness of data in deep learning !5 Sun C., Shrivastava A., Singh S., Gupta A. Revisiting unreasonable effectiveness of data in deep learning era. In Computer Vision (ICCV), 2017 IEEE International Conference on, pp. 843-852. IEEE, 2017. Image from arXiv preprint version. [Charts: object detection performance on MS COCO and PASCAL VOC 2007]
  5. 5. Why is data useful? !6 “… perhaps when it comes to natural language processing … will never have the elegance of physical equations…” Alon Halevy, Peter Norvig, and Fernando Pereira, The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8-12. 2009
  6. 6. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  7. 7. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  8. 8. Multi-view Geometry !9
  9. 9. Multi-view Geometry Hotel Images are in the public domain. Modified to simulate 3D rotation !10
  10. 10. C1 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry !11
  11. 11. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry !12
  12. 12. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation How did the camera move? Multi-view Geometry !13
  13. 13. Hotel Images are in the public domain. Modified to simulate 3D rotation Drone image is from Parrot. Reproduced for educational purposes. Multi-view Geometry !14
  14. 14. Hotel Images are in the public domain. Modified to simulate 3D rotation Drone image is from Parrot. Reproduced for educational purposes. Multi-view Geometry !15 Car image is CC0
  15. 15. Camera Pose !16 [Crivelaro et al., TPAMI, 2019]
  16. 16. Camera Pose !17 [Klein and Murray, ISMAR, 2007]
  17. 17. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry How did the camera move? !18
  18. 18. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry Find corresponding points and triangulate! !19
  19. 19. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry Find corresponding points and triangulate! !20
  20. 20. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry Find corresponding points and triangulate! !21
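Once correspondences are found, triangulation itself is standard. A minimal sketch with OpenCV, where the intrinsics K, the relative motion (R, t), and the matched pixel coordinates are all hypothetical placeholders:

```python
import numpy as np
import cv2

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])                        # assumed intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # camera C1 at the origin
R, _ = cv2.Rodrigues(np.array([0., 0.2, 0.]))       # C2 rotated ~11 degrees
t = np.array([[1.], [0.], [0.]])                    # C2 translated along x
P2 = K @ np.hstack([R, t])

pts1 = np.array([[100., 120.], [300., 200.]]).T     # 2xN matches in image 1
pts2 = np.array([[ 90., 118.], [290., 205.]]).T     # 2xN matches in image 2

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)     # 4xN homogeneous points
X = (X_h[:3] / X_h[3]).T                            # Nx3 3D points
print(X)
```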
  21. 21. Interest Points !22 The best tool for matching points across images. SIFT (Lowe, ICCV’99) started the trend: ~68k citations.
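For reference, the classical SIFT matching baseline is a few lines of OpenCV. A minimal sketch, assuming two grayscale images on disk (paths are placeholders), with Lowe's ratio test to filter matches:

```python
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test: keep a match only if it is clearly better than the
# second-best candidate.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.8 * n.distance]
print(f"{len(good)} putative matches")
```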
  22. 22. LIFT: Learned Invariant Feature Transform !23 [LIFT pipeline: DET → score map → softargmax → Crop → ORI → Rot → DESC → description vector] Y. Verdie, K.M. Yi, P. Fua, V. Lepetit: "TILDE: A Temporally Invariant Learned DEtector", CVPR 2015. K.M. Yi, Y. Verdie, V. Lepetit, P. Fua: "Learning to Assign Orientations to Feature Points", CVPR 2016 (Oral). K.M. Yi, E. Trulls, V. Lepetit, P. Fua: "LIFT: Learned Invariant Feature Transform", ECCV 2016 (Spotlight).
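The softargmax that turns the detector's score map into a differentiable keypoint location can be sketched in a few lines. A minimal NumPy version, where the temperature beta is a hypothetical choice:

```python
import numpy as np

def softargmax2d(score_map, beta=10.0):
    """Differentiable expected (x, y) over a softmaxed score map."""
    h, w = score_map.shape
    weights = np.exp(beta * (score_map - score_map.max()))
    weights /= weights.sum()               # softmax over all pixels
    ys, xs = np.mgrid[0:h, 0:w]
    return (weights * xs).sum(), (weights * ys).sum()

print(softargmax2d(np.random.rand(32, 32)))
```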
  23. 23. Quantitative results !24 [Bar charts: average matching score on ‘Strecha’, ‘DTU’, and ‘Webcam’ for SIFT, SURF, ORB, Daisy, sGLOH, MROGH, LIOP, BiCE, BRISK, FREAK, VGG, DeepDesc, PN-Net, KAZE, LIFT (pic), LIFT (rf)] • Best performance on all datasets, with either ‘pic’ or ‘rf’. • Surprising? SIFT remains #3 overall (#1: ours, #2: VGG).
  24. 24. LF-Net: Inference !25
  25. 25. LF-Net: Training !26
  26. 26. Quantitative results on outdoor scenes !27
  27. 27. Quantitative results on outdoor scenes !28 Simply training for scale invariance gave best results
  28. 28. Camera Pose? !29 [Bar chart: mAP at 20° for SIFT+RANSAC, SIFT+CVPR'18, SIFT+arXiv'19, LF-Net+arXiv'19]
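The SIFT+RANSAC baseline in this chart corresponds to the classical two-view pose recipe. A minimal sketch with OpenCV, where the intrinsics and the correspondences are hypothetical placeholders:

```python
import numpy as np
import cv2

K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
pts1 = np.random.rand(50, 2) * 480 + 80        # placeholder correspondences
pts2 = pts1 + np.array([5.0, 0.0])             # shifted, as if the camera moved

# Robustly fit the essential matrix, then decompose it into (R, t).
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print(R)          # relative rotation
print(t.ravel())  # translation direction (scale is unrecoverable)
```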
  29. 29. TL;DR • End-to-end pipeline for local feature matching • Learning with non-differentiable components within Deep Learning • Tighter formulation → better performance !30
  30. 30. TL;DR • End-to-end pipeline for local feature matching • Learning with non-differentiable components within Deep Learning • Tighter formulation → better performance !31 Beyond?
  31. 31. Towards practical benchmarks Beyond !32 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  32. 32. Towards practical benchmarks Beyond !33 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  33. 33. Image Matching: Local Features and Beyond https://image-matching-workshop.github.io Vassileios Balntas (Scape), Vincent Lepetit (U. Bordeaux), Johannes Schönberger (Microsoft), Eduard Trulls (Google), Kwang Moo Yi (U. Victoria)
  34. 34. Image Matching Challenge !35
  35. 35. The phototourism challenge: Data 36
  36. 36. The phototourism challenge: Data 37
  37. 37. The phototourism challenge: Data ● 25k images in total for training. ● “Quasi” ground truth data is generated by performing SfM with COLMAP with all images. ○ Assumption: Images registered in COLMAP are accurate given enough images. ● Valid pairs are generated via simple visibility check. 38
  38. 38. The phototourism challenge: Data ● 4k images in total for testing. ● Random bags of images are subsampled to form test subsets (size: 3, 5, 10, 25). 39
  39. 39. The phototourism challenge: local features Hotel Images are in the public domain. Modified to simulate 3D rotation ● Submission: Features ● IMW evaluates them via a typical stereo/SfM pipeline ○ Nearest neighbor matching ○ 1-to-1 matching ○ RANSAC_F ○ COLMAP 40
  40. 40. The phototourism challenge: matches Hotel Images are in the public domain. Modified to simulate 3D rotation ● Submission: Features + Matches ● IMW evaluates them via a typical stereo/SfM pipeline ○ Nearest neighbor matching ○ 1-to-1 matching ○ RANSAC_F ○ COLMAP 41
  41. 41. The phototourism challenge: poses Hotel Images are in the public domain. Modified to simulate 3D rotation ● Submission: Poses ● IMW evaluates them via a typical stereo/SfM pipeline ○ Nearest neighbor matching ○ 1-to-1 matching ○ RANSAC_F ○ COLMAP 42
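As a sketch of what the stereo part of that evaluation looks like in code: a minimal, hypothetical version of the nearest-neighbour matching, 1-to-1 check, and RANSAC_F stages with OpenCV (the COLMAP step is omitted), assuming kp/des come from a submitted feature method:

```python
import numpy as np
import cv2

def evaluate_pair(kp1, des1, kp2, des2):
    """Nearest-neighbour matching, mutual 1-to-1 check, RANSAC on F."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)  # 1-to-1 matches
    matches = matcher.match(des1, des2)                    # nearest neighbour
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    F, inlier_mask = cv2.findFundamentalMat(
        pts1, pts2, cv2.FM_RANSAC, ransacReprojThreshold=1.0)
    n_inliers = 0 if inlier_mask is None else int(inlier_mask.sum())
    return F, n_inliers
```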
  42. 42. Improving with descriptors (multi-view task) [Bar chart: relative gains of +12%, +23%, +26%, +28%, +30%, and +32%] Full results: https://image-matching-workshop.github.io/leaderboard 43
  43. 43. Improving with matching (multi-view task) [Bar chart: relative gains of +11%, +37%, +14%, and +35%] SuperPoint: Self-Supervised Interest Point Detection and Description. DeTone et al., 2018. ContextDesc: Local Descriptor Augmentation with Cross-Modality Context. Luo et al., CVPR'19. Learning to Find Good Correspondences. Yi et al., CVPR'18. 44
  44. 44. End-to-end pipelines SuperPoint: Self-Supervised Interest Point Detection and Description. DeTone et al., 2018. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Dusmanu et al., CVPR'19 45
  45. 45. Image Matching: Local Features and Beyond https://image-matching-workshop.github.io Vassileios Balntas (Scape), Vincent Lepetit (U. Bordeaux), Johannes Schönberger (Microsoft), Eduard Trulls (Google), Kwang Moo Yi (U. Victoria)
  46. 46. Towards practical benchmarks Beyond !47 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  47. 47. Towards practical benchmarks Beyond !48 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  48. 48. LF-Net: Inference !49
  49. 49. LF-Net: Inference Image-level Scale-space Heatmap Learning !50
  50. 50. LF-Net: Inference Image-level Scale-space Heatmap Learning Extract top-K patches !51
  51. 51. LF-Net: Inference Back propagation breaks Extract top-K patches !52
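Why the break happens: hard top-K selection returns integer indices, so no gradient can flow back to the score map through the chosen positions. An illustrative PyTorch sketch (not LF-Net code):

```python
import torch

score = torch.rand(1, 16 * 16, requires_grad=True)   # toy flattened score map
vals, idx = torch.topk(score, k=4)   # idx is integer-valued: no gradient
xy = torch.stack([idx % 16, idx // 16], dim=-1)  # keypoint coords, detached

vals.sum().backward()                # gradients flow through values only
print(score.grad.count_nonzero())    # nonzero at just the K selected pixels
```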
  52. 52. LF-Net: Training !53 Back propagation until here
  53. 53. LF-Net: Training !54
  54. 54. LF-Net: Training !55 Back prop. with results from other branch
  55. 55. LF-Net: Training !56 Apply score map cleaning, etc. (traditional heuristics)
  56. 56. LF-Net: Training !57 Can we simply back propagate without requiring the second branch?
  57. 57. !58 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Learning to localize & understand is easy when there are only single instances of the object in the scene
  58. 58. !59 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Non-trivial when multiple instances exist
  59. 59. !60 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Non-trivial when multiple instances exist
  60. 60. Key Idea Lifting via slack variable !61
  61. 61. !62 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks
  62. 62. !63 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Lifting the optimization to circumvent top-K
  63. 63. !64 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Treat intermediate heatmap as slack variable
  64. 64. !65 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Back propagate in two stages
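A conceptual PyTorch sketch of that two-stage scheme, under my reading of the slides (the network, the toy task loss, and the step size are all hypothetical): the heatmap H is treated as a free slack variable, optimized directly for the task, and the heatmap network is then regressed onto the improved H.

```python
import torch

heatmap_net = torch.nn.Conv2d(1, 1, 3, padding=1)   # stand-in detector
image = torch.rand(1, 1, 32, 32)

# Stage 1: treat the heatmap H as a slack variable and optimize it directly
# for the (toy) downstream task -- top-K is applied to H itself, so no
# gradient ever has to cross the hard selection back into the network.
H = heatmap_net(image).detach().requires_grad_(True)
task_loss = (H.flatten().topk(4).values - 1.0).pow(2).sum()
task_loss.backward()
with torch.no_grad():
    H_star = H - 0.1 * H.grad                       # one gradient step on H

# Stage 2: regress the heatmap network onto the improved heatmap.
opt = torch.optim.SGD(heatmap_net.parameters(), lr=0.01)
(heatmap_net(image) - H_star).pow(2).mean().backward()
opt.step()
```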
  65. 65. !66 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Learning digits with supervision on “number of things”
  66. 66. !67 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Learning basis kernel with supervision on “number of things”
  67. 67. !68 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Learning to find digits without locational supervision
  68. 68. !69 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks
  69. 69. !70 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Better than with supervision?!
  70. 70. !71 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Back propagate in two stages
  71. 71. Towards practical benchmarks Beyond !72 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  72. 72. Towards practical benchmarks Beyond !73 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  73. 73. !74 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling [Figure: gradients w.r.t. crop location, bilinear sampling vs. our method; they should point towards the centre]
  74. 74. !75 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  75. 75. !76 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  76. 76. Key Idea Linearize !77
  77. 77. !78 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  78. 78. !79 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling Intensities
  79. 79. !80 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling Intensities Coordinates
  80. 80. !81 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling Intensities Coordinates Plane equation — dY/dX
  81. 81. !82 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling Intensities Coordinates Plane equation — dY/dX
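A minimal NumPy sketch of that linearization as I read the slides (the sample count and jitter scale are hypothetical): sample intensities at a few jittered coordinates, fit a local plane I(x, y) ≈ a·x + b·y + c by least squares, and use the plane slopes (a, b) as smooth gradients with respect to the sampling location, in place of the noisy bilinear ones.

```python
import numpy as np

def plane_fit_gradient(image, x, y, n=8, sigma=0.5, rng=np.random):
    """Fit I(x, y) ~ a*x + b*y + c around (x, y); return (a, b) = dI/dx, dI/dy."""
    xs = x + sigma * rng.randn(n)
    ys = y + sigma * rng.randn(n)
    xi = np.clip(np.round(xs).astype(int), 0, image.shape[1] - 1)
    yi = np.clip(np.round(ys).astype(int), 0, image.shape[0] - 1)
    I = image[yi, xi]                           # sampled intensities
    A = np.stack([xs, ys, np.ones(n)], axis=1)  # coordinates [x, y, 1]
    (a, b, c), *_ = np.linalg.lstsq(A, I, rcond=None)
    return a, b

img = np.outer(np.arange(16.0), np.ones(16))    # intensity grows with y
print(plane_fit_gradient(img, 8.0, 8.0))        # approx (0.0, 1.0)
```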
  82. 82. !83 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  83. 83. Qualitative Highlights: Image alignment Blue: bounding-box of the target region Red: bounding-box from bilinear sampling Green: bounding-box from our method Target image Bilinear sampling [14] Our method [Jiang et al., arXiv, 2019]
  84. 84. Qualitative Highlights: Image alignment Blue: bounding-box of the target region Red: bounding-box from bilinear sampling Green: bounding-box from our method Target image Bilinear sampling [14] Our method [Jiang et al., arXiv, 2019]
  85. 85. Qualitative Highlights: Image alignment Blue: bounding-box of the target region Red: bounding-box from bilinear sampling Green: bounding-box from our method Target image Bilinear sampling [14] Our method [Jiang et al., arXiv, 2019]
  86. 86. !87 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  87. 87. !88 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  88. 88. !89 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  89. 89. Towards practical benchmarks Beyond !90 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  90. 90. Towards practical benchmarks Beyond !91 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  91. 91. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  92. 92. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  93. 93. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  94. 94. Magnetic Resonance Imaging !96 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction; highlighted: the sampling pattern can be learned from data]
  95. 95. !97 [Jin et al., arXiv, 2019] Accelerated MRI [Figure: sampling pattern in Fourier space; (reconstructed) image; residual]
  96. 96. Magnetic Resonance Imaging !98 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction; highlighted: the sampling pattern can be learned from data]
  97. 97. Magnetic Resonance Imaging !99 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction; highlighted: the sampling pattern can be learned from data]
  98. 98. !100 [Jin et al., arXiv, 2019] Accelerated MRI [Figure: sampling pattern in Fourier space; (reconstructed) image; residual]
  99. 99. !101 [Jin et al., arXiv, 2019] Accelerated MRI Learning both to acquire data and use data [Figure: sampling pattern in Fourier space; (reconstructed) image; residual]
  100. 100. Magnetic Resonance Imaging !102 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction]
  101. 101. Magnetic Resonance Imaging !103 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction; the sampling replaced by a Sampler (Deep Net)]
  102. 102. Magnetic Resonance Imaging !104 [Diagram: acquisitions → Sampler (Deep Net) → FT⁻¹ → Reconstructor (Deep Net)]
  103. 103. Magnetic Resonance Imaging !105 [Diagram: acquisitions → Sampler (Deep Net) → FT⁻¹ → Reconstructor (Deep Net); the sampling step is non-differentiable]
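To make the forward model concrete: a minimal NumPy sketch of the acquisition pipeline from these slides, with a toy phantom and a hypothetical sampling mask, reconstructing by zero-filled inverse FFT:

```python
import numpy as np

img = np.zeros((64, 64)); img[20:44, 20:44] = 1.0    # toy phantom
kspace = np.fft.fftshift(np.fft.fft2(img))           # full k-space acquisition

mask = np.zeros((64, 64), dtype=bool)                # hypothetical pattern:
mask[:, ::4] = True                                  #  every 4th column, plus
mask[:, 28:36] = True                                #  the low frequencies

# Zero-filled reconstruction: keep only the sampled k-space entries.
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace * mask)))
print(f"mean reconstruction error: {np.abs(recon - img).mean():.4f}")
```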
  104. 104. Key Idea Self-supervision Reinforcement Learning !106
  105. 105. Key Idea !107
  106. 106. !108 [Jin et al., arXiv, 2019] Accelerated MRI Progressive sampling Decompose & Simplify • ReconNet learns to reconstruct • SampleNet learns to predict the next best sample position
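A conceptual sketch of that progressive loop, with hypothetical stand-ins for SampleNet and ReconNet (the real ones are trained deep networks): at each step the sampler scores the unacquired k-space columns, the best one is acquired, and the reconstructor runs on everything gathered so far.

```python
import numpy as np

def sample_net(mask, recon):
    """Hypothetical SampleNet stand-in: score unacquired k-space columns."""
    scores = np.random.rand(mask.shape[1])
    return np.where(mask.any(axis=0), -np.inf, scores)

def recon_net(kspace, mask):
    """Hypothetical ReconNet stand-in: zero-filled inverse FFT."""
    return np.abs(np.fft.ifft2(kspace * mask))

img = np.random.rand(64, 64)
kspace = np.fft.fft2(img)
mask = np.zeros((64, 64), dtype=bool)

for step in range(16):                      # progressively acquire 16 columns
    recon = recon_net(kspace, mask)
    best = int(np.argmax(sample_net(mask, recon)))
    mask[:, best] = True                    # "acquire" the chosen column
print(f"sampled {mask.any(axis=0).sum()} of {mask.shape[1]} columns")
```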
  107. 107. !109 [Jin et al., arXiv, 2019] Accelerated MRI Self-supervision through MCTS with implicit minimax Enhance via self-supervision • MCTS provides better direction • Supervision to improve, not ground truth
  108. 108. !110 [Jin et al., arXiv, 2019] Accelerated MRI Progressive sampling Self-supervision through MCTS with implicit minimax
  109. 109. !111 [Jin et al., arXiv, 2019] Accelerated MRI Performs best when using both components of our method together.
  110. 110. !112 [Jin et al., arXiv, 2019] Accelerated MRI When reconstructing via a simple zero-filling inverse Fourier transform, learned sampling does not perform well. Performs best when using both components of our method together.
  111. 111. !113 [Jin et al., arXiv, 2019] Accelerated MRI When reconstructing via a simple zero-filling inverse Fourier transform, learned sampling does not perform well. Neither does the learned reconstruction when used with other sampling patterns. Performs best when using both components of our method together.
  112. 112. !114 [Jin et al., arXiv, 2019] Accelerated MRI When reconstructing via a simple zero-filling inverse Fourier transform, learned sampling does not perform well. Neither does the learned reconstruction when used with other sampling patterns. Performs best when using both components of our method together.
  113. 113. !115 Accelerated MRI [Jin et al., arXiv, 2019] [Figure: sampling pattern in Fourier space; (reconstructed) image; residual]
  114. 114. Accelerated MRI [Jin et al., arXiv, 2019] [Figure: sampling pattern in Fourier space; (reconstructed) image; residual] !116
  115. 115. !117 [Jin et al., arXiv, 2019] Accelerated MRI Progressive sampling Self-supervision through MCTS with implicit minimax
  116. 116. Towards practical benchmarks Beyond !118 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  117. 117. Data !119 “make use of the best ally we have: the unreasonable effectiveness of data.” Alon Halevy, Peter Norvig, and Fernando Pereira, The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8-12. 2009
  118. 118. Thank you! People behind our research (in the order of appearance) Code and Datasets: https://github.com/vcg-uvic
