To briefly introduce Auto-DeepLab: it is a model for the semantic segmentation task, and the authors set out to generate the segmentation network itself through machine learning. Architecture search is a representative AutoML method, which is also why the paper is titled Auto-DeepLab. On the AutoML side the authors build on the DARTS paper, and on the segmentation side they draw heavily on DeepLab V3. Sun-ok Kim of the image processing team kindly prepared this detailed paper review!
https://youtu.be/2886fuyKo9g
4. Auto-DeepLab
Contribution
• Efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images).
• The first attempt to extend NAS beyond image classification to dense image prediction.
• The architecture for semantic image segmentation attains state-of-the-art performance without any ImageNet pretraining.
6. Auto-DeepLab
Semantic Segmentation
DeepLab V1 (2015)1
DeepLab V2 (2017)2
DeepLab V3 (2017)3
DeepLab V3+ (2018)4
1Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv:1412.7062, 2015.
2DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv:1606.00915v2, 2017
3Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv:1706.05587v3, 2017
4Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv:1802.02611v3, 2018
7. Auto-DeepLab
ASPP(Atrous Spatial Pyramid Pooling)
ASPP1
ASPP is a semantic segmentation module that resamples a given feature layer at multiple rates prior to convolution.
1DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv:1606.00915v2, 2017
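As a rough illustration of the multi-rate resampling idea (a NumPy sketch under simplifying assumptions, not the authors' implementation), the same small kernel can be applied at several atrous rates in parallel and the branches fused:

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """Naive single-channel 2D atrous (dilated) convolution, 'same' padding.
    A rate-r kernel samples the input with gaps of r-1 pixels, enlarging
    the receptive field without adding parameters."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample a k x k grid with stride `rate` around position (i, j).
            patch = xp[i:i + rate * k:rate, j:j + rate * k:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def aspp(x, kernel, rates=(6, 12, 18)):
    """ASPP-style resampling: run the same feature map through several
    atrous rates in parallel, then fuse the branches (here: simple sum)."""
    return sum(atrous_conv2d(x, kernel, r) for r in rates)

feat = np.random.rand(32, 32)
kern = np.ones((3, 3)) / 9.0
out = aspp(feat, kern)
print(out.shape)  # (32, 32)
```

A real ASPP module uses learned multi-channel kernels and concatenates the branches before a 1×1 convolution; the sketch only shows the multi-rate sampling pattern.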
8. Auto-DeepLab
NAS (Neural Architecture Search)
NAS aims at automatically designing neural network architectures, hence minimizing human hours and efforts. The
authors’ work follows the differentiable NAS formulation and extends it into the more general hierarchical setting.
DARTS (2018)1
1Darts: Differentiable architecture search. arXiv:1806.09055, 2018
9. Auto-DeepLab
Network Level and Cell Level Search Space
The authors propose to search the network level structure in addition to the cell level structure, which forms a
hierarchical architecture search space.
10. Auto-DeepLab
Motivation
• Existing works often focus on searching the repeatable cell structure, while the outer network-level structure is fixed by hand, which simplifies the search space.
• Image classification should not be the end point for NAS.
• Optimal architectures for semantic segmentation must operate on high-resolution imagery.

[Figure: MnasNet (2018)1; Cityscapes dataset2, 1024 × 2048 resolution]

1MnasNet: Platform-Aware Neural Architecture Search for Mobile. arXiv:1807.11626, 2018
2www.cityscapes-dataset.com
14. Auto-DeepLab
Cell Level Search Space
The set of possible layer types, O, consists of the following 8 operators, all prevalent in modern CNNs:
• 3 × 3 depthwise-separable conv
• 5 × 5 depthwise-separable conv
• 3 × 3 atrous conv with rate 2
• 5 × 5 atrous conv with rate 2
• 3 × 3 average pooling
• 3 × 3 max pooling
• skip connection
• no connection (zero)
1Darts: Differentiable architecture search. arXiv:1806.09055, 2018
[Figure: cell structure before vs. after architecture search]
15. Auto-DeepLab
Cell Architecture
$$H_i^l = \sum_{H_j^l \in \mathcal{I}_i^l} O_{j \to i}(H_j^l)$$

$$O_{j \to i}(H_j^l) = \sum_{O^k \in \mathcal{O}} \alpha_{j \to i}^{k} \, O^k(H_j^l)$$

$$\sum_{k=1}^{|\mathcal{O}|} \alpha_{j \to i}^{k} = 1, \qquad \alpha_{j \to i}^{k} \ge 0$$

$$H^l = \mathrm{Cell}(H^{l-1}, H^{l-2}; \alpha)$$

where
$H_i^l$ : output tensor of block i at layer l
$\mathcal{O}$ : the set of candidate operators, with $O^k$ an individual operator
$\alpha_{j \to i}^{k}$ : normalized scalars weighting the operators

Every block's output tensor $H_i^l$ is connected to all hidden states in $\mathcal{I}_i^l$.
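The continuous relaxation above can be sketched in a few lines of NumPy (toy stand-in operators and illustrative names, not the paper's code): each connection j→i mixes the candidate operators with softmax-normalized scalars α, and a block output sums its mixed inputs:

```python
import numpy as np

# Toy candidate operator set O (stand-ins for conv/pool/skip/zero).
OPS = [
    lambda x: x,                 # skip connection
    lambda x: np.zeros_like(x),  # no connection (zero)
    lambda x: 0.5 * x,           # stand-in for a conv-like op
]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixed_op(x, alpha_logits):
    """O_{j->i}(H_j^l) = sum_k alpha^k O^k(H_j^l), with alpha = softmax(logits),
    so sum_k alpha^k = 1 and alpha^k >= 0 hold by construction."""
    alpha = softmax(alpha_logits)
    return sum(a * op(x) for a, op in zip(alpha, OPS))

def block_output(inputs, logits_per_input):
    """H_i^l = sum over input tensors H_j^l of the mixed O_{j->i}(H_j^l)."""
    return sum(mixed_op(h, lg) for h, lg in zip(inputs, logits_per_input))

h_prev = np.ones(4)                  # H^{l-1}
h_prev2 = 2 * np.ones(4)             # H^{l-2}
logits = [np.zeros(3), np.zeros(3)]  # uniform alpha = 1/3 per operator
print(block_output([h_prev, h_prev2], logits))  # [1.5 1.5 1.5 1.5]
```

After search, the discrete cell is recovered by keeping the operator with the largest α on each connection, which is what the softmax relaxation is approximating.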
17. Auto-DeepLab
Network Level Search Space
Two principles:
• The spatial resolution of the next layer is either twice as large, half as large, or the same.
• The smallest spatial resolution is downsampled by a factor of 32.
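The two principles can be sketched as a tiny helper enumerating the legal downsampling factors for the next layer (a sketch of the search-space constraint, not the authors' code; the minimum rate of 4 is an assumption taken from the paper's trellis figure, while the text above only fixes the maximum at 32):

```python
def next_downsample_rates(s, max_rate=32, min_rate=4):
    """From current downsample rate s, the next layer may halve the
    resolution (rate 2s), keep it (rate s), or double it (rate s/2);
    rates stay within [min_rate, max_rate]."""
    candidates = (s // 2, s, s * 2)
    return [r for r in candidates if min_rate <= r <= max_rate]

print(next_downsample_rates(4))   # [4, 8]
print(next_downsample_rates(8))   # [4, 8, 16]
print(next_downsample_rates(32))  # [16, 32]
```

Searching the network level then amounts to choosing one such transition per layer, which yields the trellis of candidate paths that the β parameters weight.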
19. Auto-DeepLab
Optimization
The authors use 321 × 321 random image crops from half-resolution (512 × 1024) images in the train fine set and partition the training data into two disjoint sets, trainA and trainB. They then alternate:
1. Update network weights $w$ by $\nabla_w L_{trainA}(w, \alpha, \beta)$
2. Update architecture parameters $\alpha, \beta$ by $\nabla_{\alpha,\beta} L_{trainB}(w, \alpha, \beta)$
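The alternating scheme can be illustrated on a toy problem (purely illustrative quadratic losses standing in for L_trainA and L_trainB, not the paper's objective): one gradient step on the weights w against trainA's loss, then one on the architecture parameter against trainB's loss:

```python
import numpy as np

def loss_A(w, alpha):   # stand-in for L_trainA(w, alpha)
    return (w - alpha) ** 2

def loss_B(w, alpha):   # stand-in for L_trainB(w, alpha)
    return (alpha - 1.0) ** 2 + 0.1 * (alpha - w) ** 2

def grad(f, x, eps=1e-6):
    """Central finite-difference gradient, sufficient for the sketch."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

w, alpha, lr = 0.0, 1.0, 0.1
for step in range(200):
    # 1) update network weights w on the trainA loss
    w -= lr * grad(lambda v: loss_A(v, alpha), w)
    # 2) update architecture parameter alpha on the trainB loss
    alpha -= lr * grad(lambda a: loss_B(w, a), alpha)

print(round(w, 3), round(alpha, 3))  # 1.0 1.0
```

Using two disjoint subsets prevents the architecture parameters from overfitting the same data the weights are trained on, which is the point of the first-order bi-level scheme inherited from DARTS.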
22. Auto-DeepLab
Architecture Search Implementation Details
1https://github.com/tensorflow/models/tree/master/research/deeplab
• A total of L = 12 layers in the network
• Architecture search on the Cityscapes dataset for semantic image segmentation
• 321 × 321 random image crops from half-resolution (512 × 1024) images in the train fine set
23. Auto-DeepLab
Semantic Segmentation Results
The authors report their architecture search implementation details as well as the search results. They then report semantic segmentation results on benchmark datasets with their best found architecture.
29. Auto-DeepLab
Conclusion
• The first attempt to extend Neural Architecture Search beyond image classification to dense image prediction problems.
• A differentiable formulation that allows efficient (about 1000× faster than DPC) architecture search.
• Auto-DeepLab significantly outperforms the previous state-of-the-art.