We present a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images.
To deal with intra-class appearance and shape variations that commonly exist among different instances within the same object category,
we leverage a pyramidal model where affine transformation fields are progressively estimated in a coarse-to-fine manner so that the smoothness constraint is naturally imposed within deep networks.
PARN estimates residual affine transformations at each level and composes them to estimate final affine transformations.
Furthermore, to overcome the limitation of insufficient training data for semantic correspondence, we propose a novel weakly-supervised training scheme that generates progressive supervisions by leveraging correspondence consistency across image pairs.
Our method is fully learnable in an end-to-end manner and does not require quantizing the infinite, continuous space of affine transformation fields.
1. PYRAMIDAL AFFINE REGRESSION NETWORKS
FOR DENSE SEMANTIC CORRESPONDENCE
Sangryul Jeon
School of Electrical and Electronic Engineering
Yonsei University
Feb. 19, 2019
2. Contents
I. Introduction
II. Problem Formulation and Overview
III. Pyramidal Affine Regression Networks
IV. Training
V. Experimental Results
VI. Conclusion
5. Introduction
Dense Correspondence
• Establishing dense correspondences between visually similar images, i.e., taken
from similar viewpoints or at similar times
• Classical tasks: stereo matching (to obtain 3D depth information) and optical flow
(to obtain motion information)
• Are they enough to deal with challenging scenarios?
6. Introduction
Dense Semantic Correspondence
• Establishing dense correspondences between semantically similar images, i.e.,
different instances within the same object or scene category
• For example, the wheels of two different cars, the bodies of people and animals, etc.
[Figure: semantic correspondence examples]
7. Introduction
Dense Semantic Correspondence: Applications
Shape by-Example [Hassner & Basri ’13]
Label Transfer / Scene Parsing [Liu et al. ’11]
Depth Transfer [Karsch et al. ’14]
Face Recognition [Liu et al. ’11]
View Synthesis [Hassner et al. ’13]
[slide courtesy: T. Hassner]
11. Problem Formulation and Overview
Estimating local transformation across semantically similar images
• Affine Transformation Fields
• Non-rigid image deformations can be locally well approximated by affine
transformations
• Establishing dense affine transformation fields between images
12. Problem Formulation and Overview
Estimating local transformation across semantically similar images
• Affine Transformation Fields (2 × 3 matrix)
• A field T_i is defined at every pixel i = [i_x, i_y]^T and maps i to its
correspondence i' = T_i [i, 1]^T in homogeneous coordinates (see the sketch below)
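To make the notation concrete, here is a minimal NumPy sketch of applying a dense affine field to pixel coordinates (the function name and array layout are ours, not the paper's):

```python
import numpy as np

def warp_with_affine_field(T, H, W):
    """Map every pixel i = [i_x, i_y] through its own 2x3 affine matrix T_i.

    T: (H, W, 2, 3) dense affine transformation field.
    Returns an (H, W, 2) array of correspondences i' = T_i @ [i_x, i_y, 1]^T.
    """
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous coordinates [i_x, i_y, 1] at every pixel, shape (H, W, 3)
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float32)
    # Per-pixel matrix-vector product: i' = T_i [i, 1]^T
    return np.einsum('hwij,hwj->hwi', T, coords)

# Sanity check: an identity field leaves every pixel in place
T = np.tile(np.array([[1, 0, 0], [0, 1, 0]], np.float32), (4, 4, 1, 1))
assert np.allclose(warp_with_affine_field(T, 4, 4)[2, 3], [3, 2])
```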
13. Problem Formulation and Overview
1. Smoothness constraints within pyramidal graph model
• J. Hur et al., “Generalized Deformable Spatial Pyramid: Geometry-Preserving
Dense Correspondence Estimation”, CVPR’2015
• Major weaknesses
1. Still a tremendous solution space
2. Relies on handcrafted descriptors and optimization techniques
14. Problem Formulation and Overview
2. Transformation parameter regression through CNN architecture
• Traditional matching pipeline
Histogram of Oriented Gradients (HOG) [Dalal et al., CVPR’05]
SIFT Flow [Liu et al., ECCV’08]
DAISY [Tola et al., CVPR’08]
Pipeline: Handcrafted Feature Representation → Feature Matching / Optimization → Parameter Estimator
15. Problem Formulation and Overview
2. Transformation parameter regression through CNN architecture
• CNN architecture for geometric matching
CNNgeometric [Rocco et al., CVPR’17]
CNNgeometric with supervision from inliers [Rocco et al., CVPR’18]
Attentive Semantic Alignment Networks [Seo et al., ECCV’18]
Pipeline: CNN Feature Representation → Feature Matching / Correlation Layer → Transformation Parameter Regressor
16. Problem Formulation and Overview
2. Transformation parameter regression through CNN architecture
• Major weaknesses
1. Assumes a single global transformation
2. Training data is synthesized in a self-supervised manner
18. Pyramidal Affine Regression Networks
Visualization of our PARN results
• Dense affine transformation fields are progressively estimated in a coarse-to-fine
manner, so that the smoothness is naturally imposed within deep networks
[Figure: image pair and warped results at levels 1–4]
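The abstract notes that residual affine transformations are estimated per level and composed into the final field. Below is a minimal NumPy sketch of one plausible per-pixel composition rule (standard homogeneous-coordinate composition; PARN's exact rule may differ):

```python
import numpy as np

def compose_affine(T_res, T_prev):
    """Compose per-pixel 2x3 affines: first apply T_prev, then T_res.

    Both fields have shape (H, W, 2, 3); returns the composed field.
    """
    A1, b1 = T_res[..., :2], T_res[..., 2]    # residual at the current level
    A2, b2 = T_prev[..., :2], T_prev[..., 2]  # (upsampled) coarser-level field
    A = np.einsum('hwij,hwjk->hwik', A1, A2)            # linear parts multiply
    b = np.einsum('hwij,hwj->hwi', A1, b2) + b1         # translations compose
    return np.concatenate([A, b[..., None]], axis=-1)
```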
20. Pyramidal Affine Regression Networks
Network Architecture
1. Hierarchical Feature Extraction
• Leverage the feature hierarchies within CNNs
• Convolutional activations are extracted by a siamese network with shared parameters W^c (sketch below)
→ Handles the trade-off between semantic robustness and matching precision
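A minimal PyTorch sketch of siamese hierarchical feature extraction, tapping the conv3-3/conv4-3/conv5-3 activations named on the experiments slide (the wrapper class and the choice of VGG-16 are our assumptions):

```python
import torch
import torchvision

class HierarchicalFeatures(torch.nn.Module):
    """Tap VGG-16 activations after conv3-3, conv4-3, conv5-3 (shared weights)."""
    # Indices of the ReLU following conv3-3 / conv4-3 / conv5-3 in vgg16.features
    TAPS = (15, 22, 29)

    def __init__(self):
        super().__init__()
        self.backbone = torchvision.models.vgg16(weights='DEFAULT').features

    def forward(self, x):
        feats = []
        for idx, layer in enumerate(self.backbone):
            x = layer(x)
            if idx in self.TAPS:
                feats.append(x)  # coarse-to-fine hierarchy of activations
        return feats

net = HierarchicalFeatures().eval()
src_feats = net(torch.randn(1, 3, 224, 224))  # the same weights are reused for
tgt_feats = net(torch.randn(1, 3, 224, 224))  # the target image (siamese)
```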
21. Pyramidal Affine Regression Networks
Network Architecture
2. Constrained cost volume construction
• The cost volume between two extracted features is computed with a rectified
cosine similarity
[Figure: image pair and cost volumes at levels 1–4]
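A minimal PyTorch sketch of the rectified cosine similarity stated above (the constrained search window is omitted for brevity, and the function name is ours):

```python
import torch
import torch.nn.functional as F

def rectified_cosine_cost_volume(feat_a, feat_b):
    """Cost volume C[p, q] = max(0, <f_a(p), f_b(q)> / (|f_a(p)| |f_b(q)|)).

    feat_a, feat_b: (B, C, H, W) feature maps from the siamese extractor.
    Returns (B, H*W, H, W): similarity of every source location p (channel
    dimension) to every target location q (spatial dimensions).
    """
    B, C, H, W = feat_a.shape
    a = F.normalize(feat_a.view(B, C, -1), dim=1)   # L2-normalize features
    b = F.normalize(feat_b.view(B, C, -1), dim=1)
    corr = torch.bmm(a.transpose(1, 2), b)          # (B, H*W, H*W) cosines
    return F.relu(corr).view(B, H * W, H, W)        # rectification: clamp at 0
```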
22. Pyramidal Affine Regression Networks
Network Architecture
3. Locally-varying affine transformation field
• Progressively divide each grid into four rectangular grids, yielding a
2^(k−1) × 2^(k−1) grid of affine fields T at level k (worked example below)
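A quick worked reading of the subdivision rule, assuming level 1 is the single global grid:

```python
# Each level splits every cell into 2 x 2, so level k has 2^(k-1) x 2^(k-1)
# cells, each regressing its own 2x3 affine matrix (6 parameters).
for k in range(1, 5):
    n = 2 ** (k - 1)
    print(f"level {k}: {n}x{n} grid -> {6 * n * n} affine parameters")
# level 1: 1x1 grid -> 6 affine parameters
# level 2: 2x2 grid -> 24 affine parameters, and so on
```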
23. Pyramidal Affine Regression Networks
Network Architecture
3. Locally-varying affine transformation field
• Discontinuities between nearby affine fields result in blocky artifacts around grid
boundaries
[Figure: image pair and blocky artifacts at levels 1–3]
24. Pyramidal Affine Regression Networks
Network Architecture
3. Locally-varying affine transformation field
• To alleviate this, a bilinear upsampler is applied at the end of the successive CNNs
to smooth the affine field across grid boundaries (sketch below)
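A minimal PyTorch sketch of that upsampling step, assuming the six affine parameters are laid out as channels (our layout, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def upsample_affine_field(field, size):
    """Bilinearly upsample a dense affine field to smooth grid boundaries.

    field: (B, 6, H, W) per-pixel 2x3 affine parameters stored as channels.
    size:  (H2, W2) target resolution for the next, finer level.
    """
    return F.interpolate(field, size=size, mode='bilinear',
                         align_corners=False)
```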
27. Training
Generating Progressive Supervisions
• Challenge: the lack of ground-truth semantic correspondences
• How can the network be learned without pixel-level ground-truth annotations?
• Our solution: correspondence consistency
→ weakly-supervised learning using tentative training samples (see the sketch below)
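A minimal NumPy sketch of a forward-backward consistency check for mining tentative samples (the threshold and the dense-correspondence representation are our assumptions):

```python
import numpy as np

def consistent_matches(fwd, bwd, thresh=1.0):
    """Keep pixels whose forward match, mapped back, lands near itself.

    fwd: (H, W, 2) source->target correspondences (absolute x, y coords).
    bwd: (H, W, 2) target->source correspondences.
    Returns a boolean (H, W) mask of consistency-verified training samples.
    """
    H, W, _ = fwd.shape
    x = np.clip(np.round(fwd[..., 0]).astype(int), 0, W - 1)
    y = np.clip(np.round(fwd[..., 1]).astype(int), 0, H - 1)
    roundtrip = bwd[y, x]                  # follow each match back to source
    ys, xs = np.mgrid[0:H, 0:W]
    grid = np.stack([xs, ys], axis=-1)     # original source coordinates
    err = np.linalg.norm(roundtrip - grid, axis=-1)
    return err < thresh                    # tentative positive samples
```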
28. Training
Generating Progressive Supervisions
• Correspondence consistency in computer vision
• Shape Matching [Huang et al., SGP’13]
• Co-segmentation [Wang et al., ICCV’13]
• Structure from Motion (SfM) [Zach et al., CVPR’10]
• Collections of Correspondences [Zhou et al., CVPR’15; Zhou et al., ICCV’15]
[Slide courtesy: Tinghui Zhou]
32. Experimental Results
Experimental Settings
• Three grid-level modules (K = 3)
• Feature maps M(k) sampled after the intermediate pooling layers: ‘conv5-3’, ‘conv4-3’, ‘conv3-3’
• r(k) set as a ratio of the whole search space: {1/10, 1/10, 1/15, 1/15}
Comparison to the latest methods on semantic correspondence
• “Convolutional Neural Network Architecture for Geometric Matching” (CNNgeo), CVPR’17
• “SCNet: Learning Semantic Correspondence” (SCNet), ICCV’17
• “DCTM: Discrete-Continuous Transformation Matching” (DCTM), ICCV’17
39. Conclusion
• We proposed a CNN architecture that estimates locally-varying affine
transformation fields across semantically similar images
• Our network is trained in a weakly-supervised manner, using
correspondence consistency across training image pairs
• We believe PARN can potentially benefit instance-level object
detection and segmentation, thanks to its robustness to severe
geometric variations