Dae Hyun Nam
@dev_strender
<Paper Review>
SPPNet
Introduction
- initial technical report for ECCV 2014
- first posted on arXiv in Jun. 2014, latest revision in Apr. 2015
“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”
https://arxiv.org/abs/1406.4729
Introduction
Object Detection History(?)
R-CNN - Spatial Pyramid Pooling (SPP-Net) - Fast R-CNN - Faster R-CNN - Mask R-CNN
Backgrounds
A typical CNN only accepts fixed-size input images, e.g. 224 x 224.
To feed images of arbitrary size into such a network, the input had to be preprocessed by cropping or warping so that it matched that size.
Backgrounds
However,
a cropped region may not contain the entire object (left of the figure), and
a warped image may suffer from geometric distortion.
Backgrounds
So why have networks always been designed for a fixed input size?
The cause is the “Fully Connected Layer”.
A convolutional layer is computed in a sliding-window manner, so the number of its output neurons does not need to be fixed.
An FC layer, however, needs its number of input neurons fixed in advance, and that is where the problem starts.
The paper uses SPP to extract a fixed number of features without resizing the image.
3 Advantages
1) It produces a fixed-length output.
2) It uses multi-level spatial bins.
3) It can extract features at multiple levels.
What does normal pooling do?
sliding window pooling
The output size changes with the input size.
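A minimal sketch of why sliding-window pooling breaks the FC layer. The 3x3 window with stride 2 and the 10 * 10 feature map are assumptions chosen for illustration; only the 13 * 13 conv5 size and the 256 filters appear in these slides.

```python
def pooled_size(in_size: int, window: int, stride: int) -> int:
    """Output length along one axis for sliding-window pooling without padding."""
    return (in_size - window) // stride + 1


K = 256  # number of conv5 filters, as in the slides

# 13 is the conv5 size from the slides; 10 is a hypothetical smaller map.
for a in (13, 10):
    side = pooled_size(a, window=3, stride=2)  # assumed pooling settings
    flat = K * side * side                     # what the FC layer would receive
    print(f"{a} x {a} -> {side} x {side} -> {flat} FC inputs")
```

The two feature maps give different flattened lengths, so one FC weight matrix cannot serve both.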
Proposed Pooling
output => k * M
k : number of filters in the last convolutional layer
M : number of bins (fixed). The window size of each bin is set dynamically from the input image size.
Global pooling matters too!
Ref) AlexNet
Spatial Pyramid Pooling Layer
3 levels of SPP (bin : 4x4, 2x2, 1x1)
=> output dim: 21 * 256
[Reference]
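The output dimension above is simple arithmetic: count the bins over all pyramid levels and multiply by the number of filters. A short check, where the helper name spp_output_dim is hypothetical:

```python
def spp_output_dim(k: int, levels: tuple) -> int:
    """Length of the SPP output vector: k filters times the total bin count M."""
    m = sum(n * n for n in levels)  # 4x4 + 2x2 + 1x1 = 16 + 4 + 1 = 21 bins
    return k * m


print(spp_output_dim(256, (4, 2, 1)))  # 21 * 256 = 5376
```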
how to get “FIXED”?
pre-compute the bin sizes needed for spatial pyramid pooling
ex) feature map size after conv5 : 13 * 13 (a * a)
-> n x n bin level => window size == ceil(a/n)
-> n x n bin level => stride == floor(a/n)
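The window and stride rule above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names, the nested-list feature map layout (channel, row, column), and the 10 * 10 size are assumptions, and it expects a >= n at every level.

```python
import math


def spp_window_stride(a, n):
    """Window and stride for an n x n bin level on an a x a feature map."""
    return math.ceil(a / n), a // n


def spatial_pyramid_pool(fmap, levels=(4, 2, 1)):
    """Max-pool a feature map given as fmap[channel][row][col] (square maps).

    Returns a flat list of length k * sum(n * n for n in levels),
    regardless of the spatial size a.
    """
    a = len(fmap[0])
    out = []
    for n in levels:
        win, stride = spp_window_stride(a, n)
        for i in range(n):
            for j in range(n):
                r0, c0 = i * stride, j * stride  # top-left corner of this bin
                for channel in fmap:
                    out.append(max(
                        channel[r][c]
                        for r in range(r0, r0 + win)
                        for c in range(c0, c0 + win)
                    ))
    return out


def toy_map(k, a):
    """Hypothetical k-channel a x a feature map filled with simple values."""
    return [[[(ch + 1) * (r * a + c) for c in range(a)] for r in range(a)]
            for ch in range(k)]


for a in (13, 10):  # 13 is from the slides; 10 is an assumed second size
    print(a, len(spatial_pyramid_pool(toy_map(2, a))))
```

Both sizes give a vector of the same length (2 channels * 21 bins = 42), which is what lets the FC layer stay unchanged.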
Training
1. Single size training
proposed) training with 224 * 224 images
enable the multi-level pooling behavior
2. Multi size training
Drawbacks
hard to implement back-propagation
Closing Remarks
