SPPNet: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition


Dae Hyun Nam

A paper review of SPPNet: a document summarizing the SPPNet paper.


Dae Hyun Nam
@dev_strender
<Paper Review>
SPPNet

Introduction
- initially released as a technical report for ECCV 2014
- revised version published on arXiv in April 2015
"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition"
https://arxiv.org/abs/1406.4729

Introduction
Object detection lineage
R-CNN -> Spatial Pyramid Pooling (SPP-Net) -> Fast R-CNN -> Faster R-CNN -> Mask R-CNN

Background
Conventional CNNs only accept a fixed-size input image, e.g. 224 x 224.
To feed images of different sizes into such a network, the images first had to be preprocessed, e.g. cropped or warped, to match that input size.

Background
However,
a cropped image may not contain the whole object (figure, left),
and warping can introduce geometric distortion.

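Background
Then why have networks always been designed for a fixed input size?
The cause is the "Fully Connected Layer".
Convolutional layers are computed in a sliding-window manner, so the number of output neurons does not need to be fixed; the output map simply scales with the input.
A fully connected (FC) layer, however, needs its number of input neurons fixed in advance, and that is where the problem begins.
The paper uses SPP to extract a fixed number of features without resizing the image.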
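A small sketch of the arithmetic behind this argument, assuming a single hypothetical conv layer (96 filters, 11 x 11 kernel, stride 4, no padding; numbers chosen only for illustration). The conv layer itself runs on any input size, but the flattened length of its output changes, so an FC weight matrix sized for one input cannot accept the others.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv/pool window sliding over `size` pixels."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_len(side: int, channels: int, kernel: int, stride: int) -> int:
    """Number of values an FC layer would receive after flattening one square conv output."""
    s = out_size(side, kernel, stride)
    return channels * s * s


# Hypothetical conv layer (illustrative parameters): 96 filters, 11x11 kernel, stride 4.
# Suppose the FC layer's weights were sized for 224 x 224 inputs.
FC_IN = flattened_len(224, channels=96, kernel=11, stride=4)

for side in (224, 180, 300):
    n = flattened_len(side, channels=96, kernel=11, stride=4)
    print(side, n, "ok" if n == FC_IN else "shape mismatch for the FC layer")
```

The convolution never complains about the input size; only the fixed `FC_IN` does, which is exactly the constraint SPP removes.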

3 Advantages
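1) Produces a fixed-length output regardless of the input size.
2) Uses multi-level spatial bins (ordinary sliding-window pooling uses a single window size).
3) Can pool features extracted at various scales.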

What does normal pooling do?
Sliding-window pooling
The output size changes with the input size.
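A minimal sketch of ordinary sliding-window max pooling on a single-channel 2-D map (nested lists, no padding; the function name `max_pool2d` is hypothetical). The same 2 x 2 / stride-2 setting gives a 2 x 2 output for a 4 x 4 input but a 3 x 3 output for a 6 x 6 input.

```python
from typing import List

Grid = List[List[float]]


def max_pool2d(x: Grid, window: int, stride: int) -> Grid:
    """Ordinary sliding-window max pooling (no padding) on one 2-D map."""
    h, w = len(x), len(x[0])
    out_h = (h - window) // stride + 1  # output size depends on the input size
    out_w = (w - window) // stride + 1
    return [
        [
            max(x[i * stride + di][j * stride + dj]
                for di in range(window) for dj in range(window))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Same pooling settings, two input sizes -> two different output sizes.
small = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
large = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
print(len(max_pool2d(small, 2, 2)), len(max_pool2d(large, 2, 2)))  # 2 3
```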

Proposed Pooling
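output => k * M
k: number of filters in the last convolutional layer
M: number of spatial bins (fixed); the size of each bin is set dynamically from the input image size
Global pooling (the single-bin 1x1 level) matters too!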
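Counting the bins makes the fixed length explicit. With L pyramid levels, where level l is an n_l x n_l grid of bins, and k filters in the last convolutional layer:

$$
M = \sum_{l=1}^{L} n_l^{2}, \qquad \dim(\text{SPP output}) = k \cdot M
$$

For example, levels 4x4, 2x2, 1x1 with k = 256 give M = 16 + 4 + 1 = 21 and a 21 · 256 = 5376-dimensional output, whatever the input image size.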

Spatial Pyramid Pooling Layer
3 levels of SPP (bins: 4x4, 2x2, 1x1)
=> output dim: 21 * 256 = 5376
[Reference]
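A minimal pure-Python sketch of an SPP layer, assuming max pooling over a k x h x w feature map stored as nested lists. Bin boundaries here use floor(i*h/n) to ceil((i+1)*h/n), the "adaptive pooling" style that always yields exactly n x n non-empty bins; this is a swapped-in boundary rule, not the window/stride rule on the next slide. The output ordering (level, then filter, then bin) is also an assumption, and `spp`, `ceil_div` and `toy_map` are hypothetical names.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def spp(fmap: FeatureMap, levels: Sequence[int] = (4, 2, 1)) -> List[float]:
    """Max-pool a k x h x w map into a fixed-length vector of k * sum(n*n) values."""
    k, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out: List[float] = []
    for n in levels:                       # one pyramid level = an n x n grid of bins
        for c in range(k):                 # each filter is pooled independently
            for i in range(n):
                r0, r1 = (i * h) // n, ceil_div((i + 1) * h, n)
                for j in range(n):
                    c0, c1 = (j * w) // n, ceil_div((j + 1) * w, n)
                    out.append(max(fmap[c][r][col]
                                   for r in range(r0, r1)
                                   for col in range(c0, c1)))
    return out


def toy_map(k: int, a: int) -> FeatureMap:
    """Synthetic k x a x a feature map with distinct values, for trying spp()."""
    return [[[float(c + r * a + col) for col in range(a)] for r in range(a)]
            for c in range(k)]


# A 13x13 and a 10x10 conv5 map (256 filters) give the same vector length.
print(len(spp(toy_map(256, 13))), len(spp(toy_map(256, 10))))
```

The last k entries come from the 1x1 level, i.e. the global max of each filter's map.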

How to get a "FIXED" output?
Pre-compute the bin sizes needed for spatial pyramid pooling.
e.g. feature map size after conv5: 13 x 13 (a x a)
-> for an n x n bin level: window size = ceil(a/n)
-> for an n x n bin level: stride = floor(a/n)
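A sketch of the window/stride rule above, assuming a square a x a conv5 map, plain sliding-window pooling without padding, and the 4x4, 2x2, 1x1 levels from the SPP layer slide (`spp_pool_params` is a hypothetical name). It also reports how many bins per side that pooling actually produces.

```python
import math
from typing import Dict, Sequence, Tuple


def spp_pool_params(a: int, levels: Sequence[int] = (4, 2, 1)) -> Dict[int, Tuple[int, int, int]]:
    """Return {n: (window, stride, bins_per_side)} for each pyramid level.

    window = ceil(a / n) and stride = floor(a / n), as on the slide;
    bins_per_side is what sliding-window pooling with those settings yields.
    """
    params: Dict[int, Tuple[int, int, int]] = {}
    for n in levels:
        win, stride = math.ceil(a / n), a // n
        bins = (a - win) // stride + 1
        params[n] = (win, stride, bins)
    return params


print(spp_pool_params(13))  # {4: (4, 3, 4), 2: (7, 6, 2), 1: (13, 13, 1)}
```

For a = 13 every level yields exactly n bins per side. The rule is not exact for every a, though: `spp_pool_params(7, (4,))` gives `{4: (2, 1, 6)}`, i.e. 6 bins per side instead of 4, so an implementation should check the resulting size or use adaptive bin boundaries instead.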

Training
1. Single-size training
proposed) training with 224 x 224 images
enables the multi-level pooling behavior
2. Multi-size training (e.g. 180 x 180 in addition to 224 x 224, switching sizes after each full epoch)
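A sketch of why multi-size training can share one set of FC weights, assuming conv5 side lengths of 13 for 224 x 224 inputs and 10 for 180 x 180 inputs, the window = ceil(a/n), stride = floor(a/n) rule, and 256 filters with 4x4, 2x2, 1x1 levels. `pyramid_len` and `epoch_schedule` are hypothetical helpers; the alternating schedule is only an illustration.

```python
import math
from typing import Dict, List, Sequence

# Assumed conv5 side length for each training input size.
CONV5_SIDE: Dict[int, int] = {224: 13, 180: 10}


def pyramid_len(a: int, k: int = 256, levels: Sequence[int] = (4, 2, 1)) -> int:
    """SPP output length for an a x a map with window=ceil(a/n), stride=floor(a/n)."""
    total = 0
    for n in levels:
        win, stride = math.ceil(a / n), a // n
        bins = (a - win) // stride + 1  # bins per side actually produced
        total += bins * bins
    return k * total


def epoch_schedule(num_epochs: int, sizes: Sequence[int] = (224, 180)) -> List[int]:
    """Hypothetical schedule: one full epoch per input size, then switch."""
    return [sizes[e % len(sizes)] for e in range(num_epochs)]


for epoch, size in enumerate(epoch_schedule(4)):
    print(epoch, size, pyramid_len(CONV5_SIDE[size]))
```

Because both sizes produce a 5376-long vector, the two "networks" differ only in input size and can reuse the same weights.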

Drawbacks
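Back-propagation through the SPP layer is hard to do efficiently when training samples (RoIs) come from different images, so in SPP-net's detection pipeline only the fully connected layers are fine-tuned.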
