Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
Spatial transformer networks
Slides by Víctor Campos [GDoc]
Computer Vision Reading Group (28/03/2016)
1. Introduction
Why do we need spatial transformer networks?
Are Convolutional Neural Networks invariant to…
▪ Scale? No
▪ Rotation? No
▪ Translation? Partially
CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)
A. W. Harley, "An Interactive Node-Link Visualization of Convolutional Neural Networks," in ISVC, pages 867-877, 2015
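The "partially" for translation can be illustrated with a toy 1-D max-pooling example. This is only a sketch under simple assumptions (non-overlapping windows, zero padding); the helper names `max_pool_1d` and `shift_right` are made up for illustration. A shift that keeps the activation inside the same pooling window leaves the pooled output unchanged, while a larger shift changes it.

```python
def max_pool_1d(signal, size):
    """Non-overlapping max pooling over a 1-D list (stride equals window size)."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


def shift_right(signal, k):
    """Shift a 1-D list right by k positions, padding with zeros on the left."""
    return [0] * k + signal[:len(signal) - k]


# A single activation at index 2 (the left edge of the second pooling window).
signal = [0, 0, 9, 0, 0, 0, 0, 0]

pooled = max_pool_1d(signal, 2)
pooled_shift1 = max_pool_1d(shift_right(signal, 1), 2)  # stays in the same window
pooled_shift2 = max_pool_1d(shift_right(signal, 2), 2)  # moves to the next window

print(pooled, pooled_shift1, pooled_shift2)
```

Whether a one-pixel shift is absorbed depends on where the activation sits relative to the window boundaries, which is why pooling gives only limited, local invariance.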
2. Spatial transformers
Understanding how they work
Intuition behind spatial transformers
Sampling!
Formulating spatial transformers
Three main differentiable blocks:
▪ Localisation network
▪ Grid generator
▪ Sampler
Grid generator: Examples
▪ Affine transform
▪ Attention model
(x_i^t, y_i^t): coordinates in the target (output) feature map
(x_i^s, y_i^s): coordinates in the source (input) feature map
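A minimal sketch of the two grid-generator examples, assuming normalised coordinates in [-1, 1] and a 2x3 parameter matrix theta that maps each target point (x_t, y_t, 1) to a source point (x_s, y_s). The function names are hypothetical, and the attention model is written here as an affine transform constrained to isotropic scale plus translation.

```python
def target_grid(height, width):
    """Regular grid of target coordinates (x_t, y_t), normalised to [-1, 1]."""
    def norm(i, n):
        return 0.0 if n == 1 else -1.0 + 2.0 * i / (n - 1)
    return [(norm(col, width), norm(row, height))
            for row in range(height) for col in range(width)]


def affine_grid(theta, height, width):
    """Map every target coordinate to a source coordinate with a 2x3 matrix."""
    (a, b, tx), (c, d, ty) = theta
    return [(a * xt + b * yt + tx, c * xt + d * yt + ty)
            for xt, yt in target_grid(height, width)]


def attention_theta(scale, tx, ty):
    """Constrained transform: isotropic scale and translation only (crop/zoom)."""
    return [[scale, 0.0, tx], [0.0, scale, ty]]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
grid_identity = affine_grid(identity, 3, 3)                     # source == target
grid_zoom = affine_grid(attention_theta(0.5, 0.0, 0.0), 3, 3)   # central crop

print(grid_identity[0], grid_zoom[0])
```

With a scale below 1 the source points cover only the central part of the input, so the output is a zoomed-in crop; in a real network the values of theta would come from the localisation network rather than being fixed by hand.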
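Sampler: Mathematical formulation
V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^c \, k(x_i^s - m; \Phi_x) \, k(y_i^s - n; \Phi_y), \quad \forall i \in [1 \ldots H'W'], \ \forall c \in [1 \ldots C]
▪ k(·; Φ): generic sampling kernel
▪ (x_i^s, y_i^s): from the grid generator
▪ ∀ i ∈ [1 … H'W']: all pixels in the output feature map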
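As an illustration of the sampler, here is a sketch that uses a bilinear kernel, one common choice, where each input pixel (m, n) contributes with weight max(0, 1 - |x - m|) * max(0, 1 - |y - n|). To keep it short, this sketch assumes source points given in pixel coordinates (not normalised), a single channel, and zeros outside the image; the function names are hypothetical.

```python
import math


def bilinear_sample(image, xs, ys):
    """Sample a 2-D list `image` at one source point (xs, ys) in pixel coordinates.

    Pixels outside the image are treated as zero.
    """
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(xs), math.floor(ys)
    value = 0.0
    # Only the four neighbouring pixels have a non-zero bilinear weight.
    for n in (y0, y0 + 1):
        for m in (x0, x0 + 1):
            if 0 <= n < height and 0 <= m < width:
                weight = max(0.0, 1.0 - abs(xs - m)) * max(0.0, 1.0 - abs(ys - n))
                value += image[n][m] * weight
    return value


def sample_grid(image, source_points, out_height, out_width):
    """Produce the output feature map, one sampled value per output pixel."""
    values = [bilinear_sample(image, xs, ys) for xs, ys in source_points]
    return [values[r * out_width:(r + 1) * out_width] for r in range(out_height)]


image = [[0.0, 10.0],
         [20.0, 30.0]]
centre = bilinear_sample(image, 0.5, 0.5)   # equal mix of all four pixels
corner = bilinear_sample(image, 1.0, 0.0)   # lands exactly on pixel (row 0, col 1)
row = sample_grid(image, [(0.0, 0.0), (1.0, 1.0)], 1, 2)

print(centre, corner, row)
```

Because the weights are piecewise linear in the sample coordinates, (sub-)gradients can flow both to the input values and back to the coordinates, which is what allows the whole module to be trained by backpropagation.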
3. Experiments
Evaluating spatial transformer networks
Experiment #1: Distorted MNIST
Distortions: Rotation, Translation, Projective, Elastic
Transformations: Affine, Projective, Thin Plate Spline (TPS)
Experiment #2: Multiple spatial transformers
Experiment #2: Adding two digits in an image
Other experiments: Applications of spatial transformers
▪ Street View House Numbers
▪ Fine-grained classification
4. Conclusions
Conclusions: Spatial transformer networks
▪ The paper presents a module that applies spatial transformations to feature maps
▪ It is differentiable and can be trained end-to-end
▪ No modifications to the loss function are needed
▪ It outperforms the state of the art on some tasks
Thanks!
Any questions?
