This document summarizes the DeepLab models for semantic image segmentation: DeepLab v1 used atrous convolution with VGG-16 as the backbone network. DeepLab v2 improved on this with atrous spatial pyramid pooling and added ResNet-101 as an option. DeepLab v3 removed dense CRFs and introduced multi-grid atrous convolution and bootstrapping. DeepLab v3+ uses an encoder-decoder architecture with Xception or ResNet-101 as the backbone and atrous separable convolutions.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/auvizsystems/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nagesh Gupta, Founder and CEO of Auviz Systems, presents the "Semantic Segmentation for Scene Understanding: Algorithms and Implementations" tutorial at the May 2016 Embedded Vision Summit.
Recent research in deep learning provides powerful tools that begin to address the daunting problem of automated scene understanding. Modifying deep learning methods, such as CNNs, to classify pixels in a scene with the help of the neighboring pixels has provided very good results in semantic segmentation. This technique provides a good starting point towards understanding a scene. A second challenge is how such algorithms can be deployed on embedded hardware at the performance required for real-world applications. A variety of approaches are being pursued for this, including GPUs, FPGAs, and dedicated hardware.
This talk provides insights into deep learning solutions for semantic segmentation, focusing on current state of the art algorithms and implementation choices. Gupta discusses the effect of porting these algorithms to fixed-point representation and the pros and cons of implementing them on FPGAs.
This document discusses techniques for super-resolution image reconstruction from multiple low-resolution images. There are three main approaches: interpolation-based, example-learning-based, and multi-image-based. Multi-image super resolution attempts to reconstruct the original high-resolution image using information from a set of observed low-resolution images. Key steps include image registration to determine displacements between images, modeling the imaging process, and using techniques like the Irani and Peleg algorithm to estimate the blurring function and reconstruct the high-resolution image.
Semantic Segmentation Methods using Deep Learning (Sungjoon Choi)
This document discusses semantic segmentation, the task of assigning each pixel in an image to a semantic class. It introduces semantic segmentation and provides a leaderboard of top-performing models. It then details the results of various semantic segmentation models on benchmark datasets, including PSPNet, DeepLab v3+, and DeepLab v3. The models are evaluated on metrics like mean intersection over union.
DeepLab V3+: Encoder-Decoder with Atrous Separable Convolution for Semantic I... (Joonhyung Lee)
A presentation introducing DeepLab V3+, the state-of-the-art architecture for semantic segmentation. It also includes detailed descriptions of how 2D multi-channel convolutions function and a detailed explanation of depth-wise separable convolutions.
This document provides an introduction to blind source separation and non-negative matrix factorization. It describes blind source separation as a method to estimate original signals from observed mixed signals. Non-negative matrix factorization is introduced as a constraint-based approach to solving blind source separation using non-negativity. The alternating least squares algorithm is described for solving the non-negative matrix factorization problem. Experiments applying these methods to artificial and real image data are presented and discussed.
Faster R-CNN improves object detection by introducing a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals. The RPN slides over the feature maps and predicts object bounds and objectness at each position. During training, anchors are assigned positive or negative labels based on their Intersection over Union with ground-truth boxes. The RPN and the Fast R-CNN detector are combined into a single end-to-end network, achieving state-of-the-art detection speed and accuracy while eliminating the computationally expensive selective search for proposals.
This document provides an introduction to computer vision. It summarizes the state of the field, including popular challenges like PASCAL VOC and SRVC. It describes commonly used algorithms like SIFT for feature extraction and bag-of-words models. It also discusses machine learning methods applied to computer vision like support vector machines, randomized forests, boosting, and Viola-Jones face detection. Examples of results from applying these techniques to object classification problems are also provided.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sept-2014-member-meeting-scottkrig
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Scott Krig, author of the book "Computer Vision Metrics: Survey, Taxonomy, and Analysis," delivers the presentation "Introduction to Feature Descriptors in Vision: From Haar to SIFT" at the September 2014 Embedded Vision Alliance Member Meeting.
The document discusses image segmentation techniques. It begins by defining segmentation as partitioning an image into distinct regions that correlate with objects or features of interest. The goal of segmentation is to find meaningful groups of pixels. Several segmentation techniques are described, including region growing/shrinking, clustering methods, and boundary detection. Region growing uses homogeneity tests to merge neighboring regions, while clustering divides space based on similarity within groups. Boundary detection finds boundaries between objects. The document provides examples and details of applying these segmentation methods.
Image processing involves manipulating digital images through algorithms implemented on computers. A digital image is composed of picture elements called pixels arranged in a grid. Each pixel represents a color or intensity value. Common image processing tasks include computer vision, optical character recognition, medical imaging, and more. Key concepts in image processing include pixels, resolution, color depth, and filtering/manipulating pixel values.
Slides by Amaia Salvador at the UPC Computer Vision Reading Group.
Source document on GDocs with clickable links:
https://docs.google.com/presentation/d/1jDTyKTNfZBfMl8OHANZJaYxsXTqGCHMVeMeBe5o1EL0/edit?usp=sharing
Based on the original work:
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
This document discusses morphological image processing using mathematical morphology. It begins with an introduction to morphology in biology and its application to image analysis using set theory. The key concepts of dilation, erosion, opening and closing are explained. Dilation expands object boundaries while erosion shrinks them. Opening performs erosion followed by dilation to smooth contours, and closing performs dilation followed by erosion to fill small holes. Structuring elements determine the shape and size of operations. Morphological operations are useful for tasks like boundary extraction, noise removal, and feature detection.
Focal loss was proposed to address the problems of one-stage networks (YOLO, SSD, etc.): the fundamental imbalance in which hard positives (objects) are vastly outnumbered by easy negatives (background), compounded by cases such as detecting large and small objects at the same time, i.e., extreme class imbalance coexisting with differences in difficulty. This is a summary of experiments applying focal loss to classification problems with extreme class imbalance (e.g., 1:10 or 1:100). The upshot: the results are very sensitive to hyperparameter settings, but used well, focal loss yields good results without data-level sampling schemes or any special handling at the classifier level to correct the class imbalance.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
A Three-Dimensional Representation method for Noisy Point Clouds based on Gro...Sergio Orts-Escolano
Slides used for the thesis defense of the PhD candidate Sergio Orts-Escolano.
The research described in this thesis was motivated by the need for a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered, as these sensors can provide a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model, in particular the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition, and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been computed primarily offline, and their application to 3D data has mainly focused on noise-free models without considering time constraints. A hardware implementation is proposed that leverages the computing power of modern GPUs under the paradigm of General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problems and applications in computer vision, such as object recognition and localization, visual surveillance, and 3D reconstruction.
- Image classification involves training a classifier on labeled images, validating hyperparameters, and testing on unlabeled images.
- Nearest neighbor classification predicts labels of nearest training examples while linear classification learns weights to separate classes with a hyperplane.
- Loss functions like cross-entropy measure how well the classifier's predicted scores match the true labels and are minimized during training.
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation (岳華 杜)
This document discusses several semantic segmentation methods using deep learning, including fully convolutional networks (FCNs), U-Net, and SegNet. FCNs were among the first to use convolutional networks for dense, pixel-wise prediction by converting classification networks to fully convolutional form and combining coarse and fine feature maps. U-Net and SegNet are encoder-decoder architectures that extract high-level semantic features from the input image and then generate pixel-wise predictions, with U-Net copying and cropping features and SegNet using pooling indices for upsampling. These methods demonstrate that convolutional networks can effectively perform semantic segmentation through dense prediction.
Machine Learning - Convolutional Neural Network (Richard Kuo)
The document provides an overview of convolutional neural networks (CNNs) for visual recognition. It discusses the basic concepts of CNNs such as convolutional layers, activation functions, pooling layers, and network architectures. Examples of classic CNN architectures like LeNet-5 and AlexNet are presented. Modern architectures such as Inception and ResNet are also discussed. Code examples for image classification using TensorFlow, Keras, and Fastai are provided.
MLPfit is a tool for designing and training multi-layer perceptrons (MLPs) for tasks like function approximation and classification. It implements stochastic minimization as well as more powerful methods like conjugate gradients and BFGS. MLPfit is designed to be simple, precise, fast and easy to use for both standalone and integrated applications. Documentation and source code are available online.
Artificial Intelligence applications such as Machine Learning and Deep Learning have become an important part of our lives. The products we buy, whether or not we qualify for a bank loan, the movies or series Netflix recommends to us, self-driving cars, object recognition, and so on: all of that information is directed at us by these algorithms.
Today these fields of study are among the most exciting and challenging in computing, given their high complexity and strong market demand. In this presentation we will get to know these concepts and learn to tell them apart, as they are unavoidable tools for improving human life.
Some of the specific topics covered:
- The context of ML and DL within Artificial Intelligence.
- Machine Learning.
- Supervised Learning.
- Unsupervised Learning.
- Deep Learning.
- Artificial Neural Network.
- Convolutional Neural Networks.
- Applications of ML and DL.
Machine Learning using Support Vector Machine (Mohsin Ul Haq)
This document provides an overview of machine learning using support vector machines (SVM). It first defines machine learning as a field that allows computers to learn without explicit programming. It then describes the main types of machine learning: supervised learning using labelled training data, unsupervised learning to find hidden patterns in unlabelled data, and reinforcement learning to maximize rewards. SVM is introduced as a classification algorithm that finds the optimal separating hyperplane between classes with the largest margin. Kernels are discussed as functions that enable SVMs to operate in high-dimensional implicit feature spaces without explicitly computing coordinates.
This document discusses digital image processing concepts including:
- Image acquisition and representation, including sampling and quantization of images. CCD arrays are commonly used in digital cameras to capture images as arrays of pixels.
- A simple image formation model where the intensity of a pixel is a function of illumination and reflectance at that point. Typical ranges of illumination and reflectance are provided.
- Image interpolation techniques like nearest neighbor, bilinear, and bicubic interpolation which are used to increase or decrease the number of pixels in a digital image. Examples of applying these techniques are shown.
- Basic relationships between pixels, including adjacency, paths, regions, boundaries, and distance measures like Euclidean, city block, and chessboard distance.
This document provides an overview of digital image processing and image compression techniques. It defines what a digital image is, discusses the advantages and disadvantages of digital images over analog images. It describes the fundamental steps in digital image processing as well as types of data redundancy that can be exploited for image compression, including coding, interpixel, and psychovisual redundancy. Common image compression models and lossless compression techniques like Lempel-Ziv-Welch coding are also summarized.
Presentation slides I made while studying for an internal study group. There may be mistakes, so please let me know and I will correct them.
*In the classical CNN architecture on slide 6 (it also recurs later), the trailing ReLU in "ReLU - Pool - ReLU" is wrong: computing ReLU again after ReLU - Pool is redundant (thanks to Kyung Mo Kweon for the feedback).
Deep Learning Into Advance - 1. Image, ConvNet (Hyojun Kim)
[This material was produced as part of the AB180 internal study group.]
It was made to build a basic understanding of deep learning, walk through applied examples, and share insights. This first installment covers how deep learning is applied to image processing and the basics of the Convolutional Neural Network (ConvNet).
* This study material draws on the Stanford course CS231n (http://cs231n.stanford.edu).
2. TL;DR
1. In semantic segmentation, FCN established a new paradigm: an Encoder (CNN)-Decoder architecture.
2. U-Net followed, adding skip connections and gradual up/down sampling to that structure; for whatever reason, many papers have since used segmentation networks under the name "U-Net architecture".
3. FCN (+U-Net) beats the older algorithms, but its biggest problem (a.k.a. room for improvement) is pooling. Pooling's role is the exponential expansion of the receptive field; its downsides are the shrinking of the feature map and the loss of positional information.
4. So replace what pooling does! → Dilated (atrous) convolution achieves the exponential expansion of the receptive field through a structural change, with no drop in performance, solving the feature-map shrinkage problem.
5. Might the positional information lost during pooling differ with the filter size? Then pool at several sizes and combine the results. → Spatial Pyramid Pooling.
6. Use all of the above (skip connections, dilated convolution, spatial pyramid pooling) together with a good encoder → DeepLab v3+ (currently first on PASCAL VOC 2012).
7. Only the barest outline is covered here, so please take it as a pointer toward the many papers worth reading.
3. Outline
Part 1: What is an encoder-decoder?
Part 2: How can positional information be preserved?
Part 3: Ingredients for end-to-end semantic segmentation
9. Fully Convolutional Network!
Encoder → Decoder
A CNN works well as an encoder, so suppose the information of each pixel is compressed into the feature map. If that compressed information is passed through a decoder, shouldn't the positional information of the pixels be recoverable? Yes!
Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
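To make the encoder-decoder reading concrete, here is a minimal PyTorch sketch of the FCN-32s idea (not code from the slides): a VGG16 encoder compresses the image by 32x, a 1x1 convolution scores each feature-map cell per class, and bilinear upsampling plays the role of the decoder. The 21-class count (PASCAL VOC) and the use of a recent torchvision VGG16 are assumptions for illustration.

import torch
import torch.nn as nn
import torchvision

class TinyFCN(nn.Module):
    """FCN-32s-style head: encode, score per class, upsample x32."""
    def __init__(self, num_classes=21):   # 21 = PASCAL VOC classes (assumed)
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None)
        self.encoder = vgg.features                   # five conv/pool stages, stride 32
        self.score = nn.Conv2d(512, num_classes, 1)   # per-cell class scores

    def forward(self, x):
        h, w = x.shape[2:]
        f = self.encoder(x)                           # (N, 512, h/32, w/32)
        s = self.score(f)
        # "Decoder": recover per-pixel predictions by upsampling x32
        return nn.functional.interpolate(s, size=(h, w),
                                         mode="bilinear", align_corners=False)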
11. Outline - Part 2
1. What are FCN's problems?
2. A gradual en--------coder de--------coder structure (U-Net)
3. Dilated Convolution (DilatedNet, DeepLab v2)
4. Spatial Pyramid Pooling (PSPNet, DeepLab v3/v3+)
12. Fully Convolutional Network?
[Figure: the FCN output is upsampled x32]
Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
Each value in the final feature map corresponds to far too many pixels, so positional information is hard to preserve in fine detail.
13. The problem: pooling layers!
VGG-19 (the FCN encoder): Image → Conv/Pool → Conv/Pool → Conv/Pool → Conv/Pool → Conv/Pool → FC
Pooling's role:
- Exponential expansion of the receptive field
- Translation invariance
Pooling's problems:
- Shrinking of the feature map
- Loss of positional information
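The shrinkage is easy to see in code. A toy five-stage conv/pool stack (a stand-in for the VGG encoder, not the exact VGG-19) halves the feature map at every stage, so a 224x224 input comes out 7x7, i.e. 1/32 per side:

import torch
import torch.nn as nn

def stage(c_in, c_out):
    # one conv/pool stage: convolve, then halve the spatial size
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(),
                         nn.MaxPool2d(2))

encoder = nn.Sequential(stage(3, 64), stage(64, 128), stage(128, 256),
                        stage(256, 512), stage(512, 512))
x = torch.randn(1, 3, 224, 224)
print(encoder(x).shape)   # torch.Size([1, 512, 7, 7]): 224 / 2^5 = 7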
14. En--------coder De--------coder (a.k.a. the U-Net architecture)
Gradual encoding and gradual decoding, with skip connections that pass earlier features forward.
Ronneberger et al. U-net: Convolutional networks for biomedical image segmentation. MICCAI, 2015.
These are good techniques, but the core problem remains: the final feature map is still far too small compared to the original image.
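A minimal sketch of one U-Net-style level (the channel widths are arbitrary assumptions): the encoder's high-resolution features are kept aside and concatenated with the upsampled decoder features, so fine positional detail skips past the bottleneck.

import torch
import torch.nn as nn

class SkipLevel(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)   # gradual upsampling
        self.dec = nn.Conv2d(64 + 64, 64, 3, padding=1)      # skip + upsampled

    def forward(self, x):
        skip = self.enc(x)                         # fine, high-resolution features
        deep = self.mid(self.down(skip))           # coarse, semantic features
        up = self.up(deep)                         # back to the skip's resolution
        return self.dec(torch.cat([up, skip], 1))  # skip connection: concatenate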
16. Dilated (Atrous) Convolution
Yu and Koltun. Multi-scale context aggregation by dilated convolutions. ICLR, 2016.
Receptive field comparison (normal vs. dilated), three stacked 3x3 convolutions:
- Normal: dilation 1 / 1 / 1 → receptive field 3x3 / 5x5 / 7x7
- Dilated: dilation 1 / 2 / 4 → receptive field 3x3 / 7x7 / 15x15
Exponential expansion of the receptive field!
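The numbers above follow from simple receptive-field arithmetic: each stride-1 layer adds (kernel - 1) x dilation to the receptive field, so doubling the dilation per layer gives exponential growth. A quick check (a hypothetical helper, not from the slides):

def receptive_fields(dilations, kernel=3):
    # each stride-1 layer adds (kernel - 1) * dilation to the receptive field
    rf, sizes = 1, []
    for d in dilations:
        rf += (kernel - 1) * d
        sizes.append(rf)
    return sizes

print(receptive_fields([1, 1, 1]))  # [3, 5, 7]   normal convolutions
print(receptive_fields([1, 2, 4]))  # [3, 7, 15]  dilated convolutions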
17. Dilated (Atrous) Convolution
Feature map comparison (normal vs. dilated): with normal convolutions the final feature map is 1/32 the size of the input; with dilated convolutions it is 1/8, preserving a feature map 4x larger than before.
Chen et al. Rethinking atrous convolution for semantic image segmentation. arXiv, 2017.
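A toy PyTorch comparison of the two output strides (an illustrative sketch, not the DeepLab network): keeping the first three halvings but swapping the last two strided layers for dilated, stride-1 ones leaves the feature map at 1/8 instead of 1/32.

import torch
import torch.nn as nn

def block(c, stride, dilation):
    return nn.Conv2d(c, c, 3, stride=stride, padding=dilation, dilation=dilation)

x = torch.randn(1, 8, 256, 256)
normal  = nn.Sequential(*[block(8, 2, 1) for _ in range(5)])     # five halvings
dilated = nn.Sequential(*[block(8, 2, 1) for _ in range(3)],     # three halvings,
                        block(8, 1, 2), block(8, 1, 4))          # then dilate instead
print(normal(x).shape)   # torch.Size([1, 8, 8, 8])    -> 1/32
print(dilated(x).shape)  # torch.Size([1, 8, 32, 32])  -> 1/8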
18. Spatial Pyramid Pooling
He et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV, 2014.
Might the positional information lost during pooling differ with the filter size? Then extract information at each filter size and combine the results, minimizing the loss of positional information.
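A minimal sketch of the pooling-pyramid idea (the bin sizes 1/2/4 are assumptions): the same feature map is pooled over several grid sizes, so coarse bins keep global context while fine bins keep more positional detail, and the results are concatenated.

import torch
import torch.nn.functional as F

def spp(features, bins=(1, 2, 4)):
    # pool the same map at several grid sizes, flatten, and concatenate
    pooled = [F.adaptive_max_pool2d(features, b).flatten(1) for b in bins]
    return torch.cat(pooled, dim=1)

f = torch.randn(2, 512, 13, 13)
print(spp(f).shape)   # torch.Size([2, 10752]) = 512 * (1 + 4 + 16)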
19. Atrous Convolution + Spatial Pyramid Pooling!
Combining the two ideas gives atrous spatial pyramid pooling.
Chen et al. Rethinking atrous convolution for semantic image segmentation. arXiv, 2017.
Zhao et al. Pyramid scene parsing network. CVPR, 2017.
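A compact sketch of an ASPP-style module (the rates 1/6/12/18 follow the DeepLab v3 paper's ASPP; its extra image-level pooling branch is omitted here): parallel atrous convolutions read the same feature map through receptive fields of different sizes, and a 1x1 convolution fuses the concatenated branches.

import torch
import torch.nn as nn

class MiniASPP(nn.Module):
    def __init__(self, c_in=512, c_out=256, rates=(1, 6, 12, 18)):
        super().__init__()
        # one 3x3 branch per atrous rate; padding = rate keeps the spatial size
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(c_out * len(rates), c_out, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

print(MiniASPP()(torch.randn(1, 512, 32, 32)).shape)  # (1, 256, 32, 32)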
24. Outline - Part 3
1. Data preparation and preprocessing
2. Model selection
3. Choosing the loss and optimizer
4. Evaluation (metrics)
25. Data preprocessing
Preprocessing is nothing special compared to classification, with one exception: augmentation must be applied to image-mask pairs, as in the sketch below.
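A minimal sketch of paired augmentation with torchvision's functional transforms (the flip probability and rotation range are arbitrary assumptions): the point is that every random decision is drawn once and applied to both the image and the mask.

import random
import torchvision.transforms.functional as TF

def paired_augment(image, mask):
    if random.random() < 0.5:        # one coin flip decides for both
        image, mask = TF.hflip(image), TF.hflip(mask)
    angle = random.uniform(-10, 10)  # one angle for both
    image = TF.rotate(image, angle)
    mask = TF.rotate(mask, angle)    # default nearest interpolation keeps labels discrete
    return image, mask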
26. Loss
- Cross-entropy loss
Optimizer
- SGD with momentum (+ Nesterov)
Learning rate
- The "poly" learning rate policy (PSPNet, DeepLab v2-v3+)
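The poly policy itself is one line: lr = base_lr * (1 - iter / max_iter) ^ power, with power = 0.9 in those papers. A quick sketch:

def poly_lr(base_lr, step, max_steps, power=0.9):
    # decays smoothly from base_lr to zero over training
    return base_lr * (1 - step / max_steps) ** power

for step in (0, 5000, 9999):
    print(poly_lr(0.01, step, 10000))   # 0.01 -> ~0.0054 -> ~0.0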
27. Evaluation (pixel level)
- IoU: B / (A + C - B)
- Pixel accuracy: B / A
[Diagram: A = ground-truth region, C = predicted region, B = their overlap, i.e. the correctly predicted pixels]
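In code, with the slide's notation (A = ground-truth pixels, C = predicted pixels, B = their overlap), both metrics fall out of three counts; a sketch for a single class:

import numpy as np

def pixel_scores(gt_mask, pred_mask):
    A = gt_mask.sum()                             # ground-truth pixels
    C = pred_mask.sum()                           # predicted pixels
    B = np.logical_and(gt_mask, pred_mask).sum()  # correctly predicted pixels
    return B / (A + C - B), B / A                 # IoU, pixel accuracy

gt = np.zeros((4, 4), bool);   gt[:2] = True      # 8 ground-truth pixels
pred = np.zeros((4, 4), bool); pred[1:3] = True   # 8 predicted, 4 overlapping
print(pixel_scores(gt, pred))                     # (0.333..., 0.5)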
28. Evaluation (object level)
- Precision/Recall: a detection counts as correct when IoU >= 0.5
- AP: the area under the precision/recall curve for a given IoU criterion (0-1.0)
- mAP: the AP averaged over all classes
[Diagram: prediction A' overlaps ground truth A with IoU = 0.7, a success (TP); prediction C' overlaps ground truth C with IoU = 0.2, a fail (FN). AP per class → mAP.]
Source: https://github.com/Cartucho/mAP
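The diagram's matching rule in a few lines (a sketch assuming, as in the slide's example, one prediction per ground-truth object): a prediction whose IoU reaches the threshold is a true positive, otherwise its ground truth counts as missed.

def match_detections(ious, threshold=0.5):
    # ious: best overlap per predicted object, e.g. [0.7, 0.2] from the slide
    tp = sum(iou >= threshold for iou in ious)
    fn = len(ious) - tp   # each low-IoU match leaves its ground truth missed
    return tp, fn

print(match_detections([0.7, 0.2]))   # (1, 1): one success (TP), one fail (FN)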
29. Omitted topics
1. Post-processing (CRF, ...)
2. A closer look at dilated convolution and upsampling
3. Results combining segmentation with other areas (e.g., pix2pix)
... please fill these in!
30. References
1. Models
- He et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV, 2014.
- Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
- Ronneberger et al. U-net: Convolutional networks for biomedical image segmentation. MICCAI, 2015.
- Yu and Koltun. Multi-scale context aggregation by dilated convolutions. ICLR, 2016.
- Zhao et al. Pyramid scene parsing network. CVPR, 2017.
- Chen et al. Rethinking atrous convolution for semantic image segmentation. arXiv, 2017.
- Chen et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv, 2018.
31. References
2. Supplementary material
- FCN-PSPNet PyTorch implementations (https://github.com/ZijunDeng/pytorch-semantic-segmentation)
- Evaluation metrics in Python (https://github.com/martinkersner/py_img_seg_eval)
- DeepLab PyTorch implementation (https://github.com/doiken23/DeepLab_pytorch)
- Deconvolution explained, on Distill (https://distill.pub/2016/deconv-checkerboard/)
- A review blog covering FCN to DeepLab v3 (http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review)
- PASCAL VOC 2012 semantic segmentation leaderboard (http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6&submid=8284)
32. References
- Dilated convolution explained (https://stackoverflow.com/questions/41178576/whats-the-use-of-dilated-convolutions)
- Spatial pyramid pooling explained (https://www.quora.com/What-is-the-difference-between-simple-max-pooling-and-spatial-pyramid-pooling-Im-seeing-these-terms-a-lot-lately-in-papers-where-the-authors-need-to-get-a-feature-vector)
- Receptive fields explained (https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807)
- A performance comparison with and without dilated convolutions, and a fix for the gridding artifact they introduce (https://arxiv.org/abs/1705.09914)