FCN to DeepLab.v3+
Whi Kwon
TL;DR
1. In semantic segmentation, FCN introduced a new paradigm: the Encoder (CNN)-Decoder structure.
2. U-Net followed. It added skip connections and gradual down/up-sampling, and for whatever reason many papers now describe their segmentation network as a "U-Net architecture".
3. FCN (and U-Net) beat the older algorithms, but their biggest problem (a.k.a. room for improvement) is pooling.
The role of pooling: exponential expansion of the receptive field.
The problems with pooling: the feature map shrinks and location information is lost.
4. Let's replace what pooling does! → Dilated (atrous) convolution.
It achieves exponential expansion of the receptive field through a structural change, with no loss in performance.
The feature-map shrinkage problem is solved!
5. Doesn't the amount of location information lost in pooling depend on the filter size? Then let's pool at several sizes and combine the results. → Spatial Pyramid Pooling.
6. Use everything above together (skip connections, dilated convolution, spatial pyramid pooling) plus a good encoder → DeepLab.v3+ (ranked 1st on PASCAL VOC 2012 at the time of writing).
7. This deck covers only a small part of the field, so please treat it as a starting point and read the papers themselves.
Outline
Part 1: What is an Encoder-Decoder?
Part 2: How can location information be preserved?
Part 3: Ingredients for end-to-end semantic segmentation
Part 1. What is an Encoder-Decoder?
Outline: Part 1
1. What is an Encoder-Decoder?
2. CNN as an encoder
3. What kind of decoder recovers location information?
4. The arrival of the Fully Convolutional Network (FCN)
Encoder / Decoder
"hello world" → Encoder → [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100] → Decoder → "hello world"
The encoder produces transformed data from the original data.
The decoder recovers the original data from the transformed data.
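The "hello world" example can be reproduced in a few lines. This is only a sketch of the encoder/decoder idea using character codes; the function names `encode` and `decode` are made up for illustration and have nothing to do with a neural network.

```python
def encode(text):
    """Encoder: turn the original string into a list of character codes."""
    return [ord(ch) for ch in text]


def decode(codes):
    """Decoder: recover the original string from the character codes."""
    return "".join(chr(code) for code in codes)


codes = encode("hello world")
print(codes)          # [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
print(decode(codes))  # hello world
```

Here the transformation is lossless, so decoding is exact. A CNN encoder is lossy, which is why the rest of the deck is about keeping location information.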
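Source: https://unsplash.com/photos/EcsCeS6haJ8
[Figure: cat photo → Encoder (CNN) → feature map → output (0, 0, 0, 1)]
Extracting features from an input cat photo can be viewed as encoding, and here the encoder is a CNN.
We cannot say exactly what each value means, but together the values hold a transformed version of the information in the cat photo.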
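Source: https://cdn-images-1.medium.com/max/1600/1*bGTawFxQwzc5yV1_szDrwQ.png
Predictions made from data transformed by the CNN encoder are very accurate. The data seems to be transformed well and to retain much of the information in the photo!
Couldn't we put this information to good use?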
Fully Convolutional Network!
Encoder / Decoder
A CNN works well as an encoder, so suppose the feature map holds compressed information about each pixel.
If that compressed information is passed through a decoder, can we recover per-pixel location information? Yes!
Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
Part 2. How can location information be preserved?
Outline: Part 2
1. What is wrong with FCN?
2. En--coder De--coder structure, i.e. gradual encoding and decoding (U-Net)
3. Dilated Convolution (Dilated Net, DeepLab.v2)
4. Spatial Pyramid Pooling (PSPNet, DeepLab.v3, v3+)
Fully Convolutional Network?
[Figure: FCN, x32]
Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
Each value in the feature map corresponds to too many pixels! Fine-grained location information is hard to preserve.
Problem: the pooling layers!
VGG-19 (FCN encoder): Image → Conv/Pool → Conv/Pool → Conv/Pool → Conv/Pool → Conv/Pool → FC
Role of pooling:
- Exponential expansion of the receptive field
- Translation invariance
Problems with pooling:
- The feature map shrinks
- Location information is lost
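The shrinkage can be put in numbers. The sketch below assumes each of the five Conv/Pool stages ends in a stride-2 pooling (the usual VGG setting) and that the convolutions themselves keep the spatial size. The 224-pixel input and the helper name `size_after_pools` are illustrative only.

```python
def size_after_pools(size, num_pools, stride=2):
    """Spatial size (one side) after `num_pools` pooling layers of the given stride."""
    for _ in range(num_pools):
        size //= stride
    return size


input_size = 224
final_size = size_after_pools(input_size, num_pools=5)
print(final_size)                # 7
print(input_size // final_size)  # 32: one feature-map cell covers a 32x32 pixel block
```

This factor of 32 is the "x32" gap that the decoder has to bridge.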
En--coder De--coder (a.k.a. U-Net architecture)
Gradual encoding
Gradual decoding
Pass the earlier information forward! (skip connection)
Ronneberger et al. U-net: Convolutional networks for biomedical image segmentation. MICCAI, 2015.
These are good techniques, but the final feature map is still far too small compared with the original image.
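A skip connection can be sketched without any framework. In the toy below a feature map is a list of channels and each channel is a 2D list. The decoder feature is upsampled 2x and concatenated channel-wise with the encoder feature of the same resolution. Nearest-neighbour upsampling is a simplification chosen here for brevity (U-Net itself learns the upsampling), and both function names are hypothetical.

```python
def upsample2x(channel):
    """Nearest-neighbour 2x upsampling of one 2D channel."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                              # repeat each row
    return out


def skip_concat(encoder_feat, decoder_feat):
    """Upsample the decoder channels and append them to the encoder channels."""
    upsampled = [upsample2x(channel) for channel in decoder_feat]
    same_height = len(upsampled[0]) == len(encoder_feat[0])
    same_width = len(upsampled[0][0]) == len(encoder_feat[0][0])
    if not (same_height and same_width):
        raise ValueError("encoder and upsampled decoder sizes must match")
    return encoder_feat + upsampled


encoder_feat = [[[1, 2], [3, 4]]]  # 1 channel, 2x2 (fine detail from the encoder)
decoder_feat = [[[9]]]             # 1 channel, 1x1 (coarse, deeper feature)
print(skip_concat(encoder_feat, decoder_feat))
# [[[1, 2], [3, 4]], [[9, 9], [9, 9]]]
```

The concatenated result carries both the fine encoder detail and the coarse decoder context into the next decoding step.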
Dilated (Atrous) Convolution
Perone et al. Spinal cord gray matter segmentation using deep dilated convolutions. arXiv, 2017.
What is dilated convolution?
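One way to answer the question in code: a dilated convolution uses the same kernel weights but leaves `dilation - 1` input positions between neighbouring taps. The 1D sketch below is plain Python with no padding. The function name and the choice of "valid" output are assumptions for illustration, not something taken from the cited papers.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D dilated convolution (cross-correlation, as in deep learning frameworks)."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        taps = (w * signal[start + k * dilation] for k, w in enumerate(kernel))
        out.append(sum(taps))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
print(dilated_conv1d(signal, [1, 1, 1], dilation=1))  # [6, 9, 12, 15, 18]
print(dilated_conv1d(signal, [1, 1, 1], dilation=2))  # [9, 12, 15]
```

With dilation 2 the same three weights cover five input positions, so the receptive field grows without adding parameters.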
Dilated (Atrous) Convolution
Yu and Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR, 2016.
Receptive field comparison (normal vs. dilated)
Normal convolution:
Layer            1     2     3
Convolution      3x3   3x3   3x3
Dilation         1     1     1
Receptive field  3x3   5x5   7x7
Dilated convolution:
Layer            1     2     3
Convolution      3x3   3x3   3x3
Dilation         1     2     4
Receptive field  3x3   7x7   15x15
Exponential expansion of the receptive field!
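The receptive-field numbers follow from a simple recurrence: with stride 1, each layer adds (kernel - 1) x dilation to the receptive field. A short sketch, where the stride-1 assumption and the function name are choices made for this example:

```python
def receptive_fields(kernel_sizes, dilations):
    """Receptive field (one side) after each stacked stride-1 convolution."""
    field = 1
    fields = []
    for kernel, dilation in zip(kernel_sizes, dilations):
        field += (kernel - 1) * dilation
        fields.append(field)
    return fields


print(receptive_fields([3, 3, 3], [1, 1, 1]))  # [3, 5, 7]   linear growth
print(receptive_fields([3, 3, 3], [1, 2, 4]))  # [3, 7, 15]  exponential growth
```

Doubling the dilation at every layer roughly doubles the receptive field per layer, which is the effect pooling was providing.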
Dilated (Atrous) Convolution
Feature map comparison (normal vs. dilated)
Normal: final feature map / input = 1/32
Dilated: final feature map / input = 1/8
The final feature map is 4 times larger per side than before!
Chen et al. Rethinking atrous convolution for semantic image segmentation. arXiv, 2017.
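Why dilation keeps the feature map large can be checked with the usual convolution output-size formula, out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1. The sketch below compares a stride-2 layer with a stride-1 layer that uses dilation 2 and matching padding. The 64-pixel input is an arbitrary example.

```python
def conv_out_size(size, kernel=3, stride=1, padding=0, dilation=1):
    """Output size (one side) of a convolution, using the standard formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Downsampling layer: stride 2 halves the resolution.
print(conv_out_size(64, kernel=3, stride=2, padding=1, dilation=1))  # 32

# Dilated replacement: stride 1, dilation 2, padding 2 keeps the resolution.
print(conv_out_size(64, kernel=3, stride=1, padding=2, dilation=2))  # 64
```

Replacing the last two downsampling steps this way is what moves the final feature map from 1/32 to 1/8 of the input.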
Spatial Pyramid Pooling
He et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV, 2014.
Doesn't the amount of location information lost by pooling depend on the filter size?
Let's extract information at several filter sizes and then combine it to minimize the loss of location information.
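The idea can be sketched on a single-channel feature map: max-pool it into 1x1, 2x2 and 4x4 grids of bins and concatenate the results. The pyramid levels, the bin-boundary rule and the function name are illustrative assumptions. Real implementations work on multi-channel tensors.

```python
def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a 2D map into n x n bins for each level and concatenate the maxima."""
    height, width = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin i covers rows [floor(i*h/n), ceil((i+1)*h/n)); needs height >= n.
            r0, r1 = (i * height) // n, ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0, c1 = (j * width) // n, ((j + 1) * width + n - 1) // n
                pooled.append(max(fmap[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


fmap = [[row * 4 + col for col in range(4)] for row in range(4)]  # values 0..15
vector = spatial_pyramid_pool(fmap)
print(len(vector))  # 21 = 1 + 4 + 16 bins
print(vector[:5])   # [15, 5, 7, 13, 15]
```

The output length depends only on the pyramid levels, not on the input size, so coarse and fine spatial summaries are always available side by side.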
Atrous Convolution + Spatial Pyramid Pooling!
Chen et al. Rethinking atrous convolution for semantic image segmentation. arXiv, 2017.
Spatial Pyramid Pooling!
Zhao et al. Pyramid scene parsing network. CVPR, 2017.
Encoder/Decoder + Atrous Conv + Spatial Pyramid Pooling → DeepLab.v3+
Chen et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv, 2018.
Let's find what we have learned so far in this architecture!
PASCAL VOC2012 Leaderboard
Model                  Mean Average Precision (%)  Base CNN model
DeepLab.v3+            87.8                        Xception
DeepLab.v3             85.7                        ResNet-101
PSPNet                 85.4                        ResNet-101
DeepLab.v2-CRF         79.7                        ResNet-101
FCN-2s-Dilated_VGG19   69.0                        VGG-19
FCN-8s                 62.2                        VGG-19
SegNet                 59.9                        VGG-19
VOC Score: http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6&submid=6103
Annotations on the table: progress in the encoder, SPP, dilated conv, encoder/decoder.
Part 3. Additional ingredients for end-to-end semantic segmentation
Outline: Part 3
1. Data preparation and preprocessing
2. Model selection
3. Choosing the loss and optimizer
4. Evaluation (metrics)
Data preprocessing
- Preprocessing is nothing special compared with classification. However, augmentation must be applied to each image-mask pair together!
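"As a pair" means one random decision drives both transforms, so every pixel keeps its label. A minimal sketch with a horizontal flip on nested lists. The function name and the `p` parameter are hypothetical, and a real pipeline would do the same thing on arrays or tensors.

```python
import random


def paired_random_hflip(image, mask, p=0.5):
    """Flip image and mask together with probability p (one shared coin toss)."""
    if random.random() < p:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask


image = [[1, 2, 3], [4, 5, 6]]
mask = [[0, 0, 1], [1, 0, 0]]
flipped_image, flipped_mask = paired_random_hflip(image, mask, p=1.0)
print(flipped_image)  # [[3, 2, 1], [6, 5, 4]]
print(flipped_mask)   # [[1, 0, 0], [0, 0, 1]]
```

Drawing separate random numbers for the image and the mask would silently misalign the labels, which is the mistake this slide warns about.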
Loss
- Cross Entropy Loss
Optimizer
- SGD with momentum (+ Nesterov)
Learning rate
- Poly learning rate policy (PSPNet, DeepLab.v2 through v3+)
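The poly policy decays the learning rate as lr = base_lr * (1 - iter / max_iter) ^ power. A small sketch follows. The power of 0.9 is the value commonly used with this policy, while the base rate and iteration counts are arbitrary example numbers.

```python
def poly_lr(base_lr, iteration, max_iterations, power=0.9):
    """Poly learning-rate policy: smooth decay from base_lr down to 0."""
    return base_lr * (1 - iteration / max_iterations) ** power


for step in (0, 5000, 10000, 15000, 20000):
    print(step, round(poly_lr(0.01, step, 20000), 6))
```

The schedule starts at `base_lr`, decreases monotonically, and reaches exactly zero at `max_iterations`.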
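Evaluation (pixel level)
- IoU: B / (A + C - B)
- Pixel accuracy: B / A
[Figure: two overlapping regions A and C, one the prediction and one the ground truth. Their overlap B is the correctly predicted area.]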
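Both formulas can be computed from two binary masks. The sketch below takes A to be the ground-truth region, which is an assumption (the transcript does not say which of A and C is which), and uses flat lists of 0/1 instead of image arrays.

```python
def pixel_metrics(pred, truth):
    """Return (IoU, pixel accuracy) for two flat binary masks of equal length."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)  # B
    pred_area = sum(1 for p in pred if p)                     # C (assumed)
    truth_area = sum(1 for t in truth if t)                   # A (assumed)
    union = pred_area + truth_area - overlap                  # A + C - B
    iou = overlap / union if union else 0.0
    accuracy = overlap / truth_area if truth_area else 0.0
    return iou, accuracy


pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 1, 0]
print(pixel_metrics(pred, truth))  # (0.4, 0.5)
```

IoU penalizes both missed pixels and false alarms, while B / A only measures how much of region A was recovered.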
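Evaluation (object level)
- Precision/Recall: a prediction counts as correct when IoU >= 0.5
- AP: area under the Precision/Recall curve for a given IoU criterion (0 to 1.0)
- mAP: mean of the AP over all classes
[Figure: boxes A, C and A', C'. A match with IoU = 0.7 is a Success (TP), a match with IoU = 0.2 is a Fail (FN). AP is computed per class, then AP → mAP.]
Source: https://github.com/Cartucho/mAP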
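The object-level metrics can be sketched in two steps: box IoU decides whether a detection is a hit, and AP summarizes the precision/recall curve over detections ranked by confidence. The code uses the all-point interpolated area, which is one common AP variant and not necessarily the one behind the figure. The box format (x1, y1, x2, y2) and the function names are assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def average_precision(is_tp, num_gt):
    """AP from hit flags of detections sorted by descending confidence."""
    if num_gt == 0 or not is_tp:
        return 0.0
    precisions, recalls, hits = [], [], 0
    for rank, hit in enumerate(is_tp, start=1):
        hits += 1 if hit else 0
        precisions.append(hits / rank)
        recalls.append(hits / num_gt)
    # Make precision non-increasing from right to left (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    area, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


print(box_iou((0, 0, 10, 10), (0, 0, 10, 7)) >= 0.5)  # True  -> counts as TP
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)) >= 0.5)     # False -> not a match
print(average_precision([True, False, True], num_gt=2))
```

mAP is then the plain mean of `average_precision` over the classes.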
What was left out
1. Post-processing: CRF, ...
2. A detailed look at dilated convolution and upsampling
3. Research that combines segmentation with other areas (e.g. pix2pix)
... please fill in the rest!
Reference
1. Models
- He et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV, 2014.
- Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
- Ronneberger et al. U-net: Convolutional networks for biomedical image segmentation. MICCAI, 2015.
- Yu and Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR, 2016.
- Zhao et al. Pyramid scene parsing network. CVPR, 2017.
- Chen et al. Rethinking atrous convolution for semantic image segmentation. arXiv, 2017.
- Chen et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv, 2018.
2. Further reading
- PyTorch implementations, FCN through PSPNet (https://github.com/ZijunDeng/pytorch-semantic-segmentation)
- Python implementation of the evaluation metrics (https://github.com/martinkersner/py_img_seg_eval)
- DeepLab PyTorch implementation (https://github.com/doiken23/DeepLab_pytorch)
- Deconvolution explained, Distill (https://distill.pub/2016/deconv-checkerboard/)
- Blog review covering FCN to DeepLab.v3 (http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review)
- PASCAL VOC 2012 semantic segmentation evaluation results (http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6&submid=8284)
- Dilated convolution explained (https://stackoverflow.com/questions/41178576/whats-the-use-of-dilated-convolutions)
- Spatial pyramid pooling explained (https://www.quora.com/What-is-the-difference-between-simple-max-Pooling-and-spatial-pyramid-pooling-Im-seeing-these-terms-a-lot-lately-In-papers-where-the-authors-need-to-get-a-feature-vector)
- Receptive field explained (https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807)
- Performance with and without dilated convolution, and a fix for the resulting gridding artifact (https://arxiv.org/abs/1705.09914)
