FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference
Sungchul Kim
Contents
1. Introduction
2. Related Work
3. Proposed Method
4. Experiments
5. Conclusions
Introduction
• Pixel-level annotation
→ Fully supervised semantic segmentation
• Training a segmentation network from image-level annotations is hard!
→ Weakly labeled data only indicates whether each class is present or absent
→ Such methods rely on localization maps obtained from a classification network
→ But localization maps do not delineate exact object boundaries and focus on small discriminative parts
Introduction
• FickleNet
→ Generates many different localization maps using random combinations of the hidden units of a CNN
→ Hidden units are selected at random at each sliding window position
Related Work
• Grad-CAM
→ Classifies pixels from image-level annotations
→ Focuses on small discriminative regions of the object
https://arxiv.org/abs/1610.02391
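As a quick reference (summarizing the cited paper, not these slides), Grad-CAM weights each feature map $A^k$ of the final convolutional layer by the spatially pooled gradient of the class score $y^c$, keeping only positive contributions:

$$\alpha_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}, \qquad L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\Big(\sum_k \alpha_k^c A^k\Big)$$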
Related Work
• SRG (Seeded Region Growing)
→ Region growing from seed cues and the image itself
→ Clustering performed at the pixel level (see the sketch below)
https://blog.lunit.io/2019/04/25/ficklenetweakly-and-semi-supervised-semantic-image-segmentation-using-stochastic-inference/
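A minimal sketch of seeded region growing, assuming a grayscale NumPy image and a simple intensity criterion standing in for whatever similarity measure a given method uses:

```python
from collections import deque

import numpy as np

def region_grow(image, seeds, tau=0.1):
    """image: (H, W) float array; seeds: list of (row, col) seed pixels.
    Grows one region: adds 4-neighbours whose intensity is within tau
    of the running region mean."""
    h, w = image.shape
    grown = np.zeros((h, w), dtype=bool)
    queue = deque(seeds)
    for r, c in seeds:
        grown[r, c] = True
    total, count = sum(image[r, c] for r, c in seeds), len(seeds)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not grown[nr, nc]:
                if abs(image[nr, nc] - total / count) < tau:
                    grown[nr, nc] = True
                    total += image[nr, nc]
                    count += 1
                    queue.append((nr, nc))
    return grown

mask = region_grow(np.random.rand(32, 32), [(16, 16)], tau=0.2)
```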
Related Work
• DSRG (Deep Seeded Region Growing)
→ Region growing from seed cues and CAMs, used for segmentation training
→ Seeding loss + boundary loss against a fully connected CRF
(Figure: CAM → segmentation probability map $H_{u,c}$)
https://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_Weakly-Supervised_Semantic_Segmentation_CVPR_2018_paper.pdf
Related Work
• DSRG (Deep Seeded Region Growing)
https://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_Weakly-Supervised_Semantic_Segmentation_CVPR_2018_paper.pdf
Proposed Method
1. Train a multi-class classification network using stochastic selection of hidden units
2. Generate localization maps for the training images
3. Use the localization maps as pseudo-labels to train the segmentation network
Proposed Method
1. Stochastic Hidden Unit Selection
① Feature Map Expansion
- The feature map is expanded so that the stride equals the convolutional kernel size, maximizing GPU utilization
- Random selection is then performed in a single pass, enabling fast computation (see the sketch below)
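A sketch of the expansion step under the assumption of a PyTorch implementation (function and variable names are illustrative, not from the paper's code): each k x k sliding window of the feature map is copied out into its own non-overlapping block, so a subsequent convolution with stride = k processes every window independently.

```python
import torch
import torch.nn.functional as F

def expand_feature_map(x, kernel_size=3):
    """Expand (B, C, H, W) so each k x k window becomes a
    non-overlapping block, i.e. stride == kernel size afterwards."""
    b, c, h, w = x.shape
    k = kernel_size
    # Extract all k x k sliding windows (stride 1; padding keeps H x W positions)
    patches = F.unfold(x, kernel_size=k, padding=k // 2)   # (B, C*k*k, H*W)
    patches = patches.view(b, c, k, k, h, w)
    # Rearrange so the windows tile a (H*k, W*k) map side by side
    expanded = patches.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * k, w * k)
    return expanded

x = torch.randn(2, 512, 41, 41)
y = expand_feature_map(x, kernel_size=3)   # (2, 512, 123, 123)
```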
Proposed Method
1. Stochastic Hidden Unit Selection
② Center-preserving Spatial Dropout
- The kernel center at each sliding window position is never dropped (see the sketch after this list)
→ Relationships between the center and the other positions can be discovered!
- Dropout is applied during both training and inference
③ Classification
- Global Average Pooling + Sigmoid
- Cross-entropy loss
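A minimal sketch of center-preserving spatial dropout applied to the expanded map (again assuming PyTorch; note that, unlike standard dropout, it stays active at inference):

```python
import torch

def center_preserving_dropout(expanded, kernel_size=3, drop_rate=0.9):
    """Randomly zero spatial positions in each k x k block, but never
    the block center, so every window keeps its central activation."""
    b, c, hk, wk = expanded.shape
    k = kernel_size
    # One mask per spatial location, shared across channels (spatial dropout)
    mask = (torch.rand(b, 1, hk, wk, device=expanded.device) > drop_rate).float()
    # Force the center of every k x k block to stay active
    center = k // 2
    mask[:, :, center::k, center::k] = 1.0
    return expanded * mask

out = center_preserving_dropout(torch.randn(2, 512, 123, 123), 3, 0.9)
```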
Proposed Method
2. Inference of Localization Maps
① Grad-CAM
② Aggregating Localization Maps
- Many localization maps are generated from a single image
→ Classification scores are computed from the various hidden-unit combinations
- If the activation score at a pixel exceeds a threshold, that class is taken to be present at the pixel
→ Pixels to which no class is assigned are excluded from training
→ If several classes are assigned to one pixel, the per-class scores are averaged and the class with the highest mean score is chosen (aggregation sketch below)
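A sketch of one plausible reading of the aggregation rule, assuming NumPy and that `maps` holds per-class localization scores in [0, 1] (the 200-map count and 0.35 threshold match the experimental setup later in the slides):

```python
import numpy as np

def aggregate_maps(maps, threshold=0.35, ignore_index=255):
    """maps: (N, num_classes, H, W) stochastic localization maps.
    Returns an (H, W) pseudo-label; unassigned pixels get ignore_index."""
    n, num_classes, h, w = maps.shape
    mean_score = maps.mean(axis=0)               # per-class average score
    present = (maps > threshold).any(axis=0)     # class fires in any map
    label = np.full((h, w), ignore_index, dtype=np.int64)
    assigned = present.any(axis=0)               # pixel has >= 1 class
    # Among classes that fired at a pixel, take the highest mean score
    masked = np.where(present, mean_score, -np.inf)
    label[assigned] = masked.argmax(axis=0)[assigned]
    return label

pseudo = aggregate_maps(np.random.rand(200, 21, 41, 41))
```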
Proposed Method
3. Training the Segmentation Network
- The generated localization maps are used to train the semantic segmentation network
Weakly supervised: $L = L_{seed} + L_{boundary}$
Semi-supervised: $L = L_{seed} + L_{boundary} + \alpha L_{full}$

$$L_{seed} = -\frac{1}{\sum_{c\in C}|S_c|}\sum_{c\in C}\sum_{u\in S_c}\log H_{u,c} \;-\; \frac{1}{\sum_{c\in\bar{C}}|S_c|}\sum_{c\in\bar{C}}\sum_{u\in S_c}\log H_{u,c}$$

$$L_{boundary} = -\frac{1}{n}\sum_{u=1}^{n}\sum_{c\in C} Q_{u,c}(X, f(X))\,\log\frac{Q_{u,c}(X, f(X))}{f_{u,c}(X)}$$

$$L_{full} = -\frac{1}{\sum_{c\in C}|F_c|}\sum_{c\in C}\sum_{u\in F_c}\log H_{u,c}$$

Here $C$ is the set of foreground classes and $\bar{C}$ the background class, $S_c$ the seed cues for class $c$, $H_{u,c}$ the predicted segmentation map (probability of class $c$ at pixel $u$), $f(X)$ the network output for image $X$, $Q_{u,c}(X, f(X))$ the normalized output of a fully connected conditional random field applied to $f(X)$, and $F_c$ the ground-truth pixels of class $c$ in the fully annotated images.
https://m.blog.naver.com/laonple/221017461464
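A sketch of how the seeding loss $L_{seed}$ might look in PyTorch (hypothetical helper; `probs` plays the role of $H_{u,c}$ and `seeds` of the binary cues $S_c$):

```python
import torch

def seed_loss(probs, seeds, fg_classes, bg_classes):
    """probs: (B, num_classes, H, W) predicted probabilities H_{u,c};
    seeds: (B, num_classes, H, W) binary seed cues S_c."""
    eps = 1e-8

    def term(class_idx):
        p = probs[:, class_idx]             # keep only these classes
        s = seeds[:, class_idx]
        count = s.sum().clamp(min=1.0)      # sum_c |S_c|, avoid divide-by-0
        return -(s * torch.log(p + eps)).sum() / count

    return term(fg_classes) + term(bg_classes)

# Toy usage: 21 Pascal VOC classes, background at index 0 (assumption)
probs = torch.softmax(torch.randn(2, 21, 41, 41), dim=1)
seeds = (torch.rand(2, 21, 41, 41) > 0.95).float()
loss = seed_loss(probs, seeds, fg_classes=list(range(1, 21)), bg_classes=[0])
```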
Experiments
1. Experimental Setup
- Dataset: Pascal VOC 2012
- Network details
- Classification: pre-trained VGG-16
- Segmentation: DeepLab-CRF-LargeFOV
- Experimental details
- Batch size: 10
- Image size: 321 x 321
- Learning rate: 0.001, halved every 10 epochs; Optimizer: Adam
- Number of localization maps: 200
- Threshold: 0.35; α for semi-supervised learning: 2
Experiments
2. Comparison to the State of the Art
- Weakly supervised segmentation
Experiments
2. Comparison to the State of the Art
- Semi-supervised segmentation
Experiments
2. Comparison to the State of the Art
Experiments
3. Ablation studies
① Effects of the Map Expansion Technique
→ Training and CAM extraction times are reduced by factors of 15.4 and 14.2, respectively
→ GPU memory usage increases by 12% (due to the size of the expanded feature map)
Experiments
3. Ablation studies
② Analysis of Iterative Inference
→ mIoU tends to increase as N grows
→ The standard deviation of the metric decreases as N grows
Experiments
3. Ablation studies
③ Analysis of Dropout
Effects of dropout rate
- With a dropout rate of 0.9, a wider region is covered than DSRG achieves
- A high rate drops the discriminative parts of the object
→ forcing the network to rely on non-discriminative parts
- With a low rate, the discriminative parts may never be dropped
→ classification can then succeed on those parts alone,
so the non-discriminative parts are not used
Experiments
3. Ablation studies
③ Analysis of Dropout
Comparison to general dropout
- Feature maps produced with general dropout look noisy
→ general dropout applies no dropout at inference (and no center preserving)
Effectiveness of each step
Conclusions
1. Many localization maps are obtained via stochastic selection and then aggregated into one
→ yields a wider activation map than previous methods
2. The feature map is expanded to match the kernel size
→ efficient GPU usage shortens training and CAM extraction time
3. Applicable to both weakly supervised and semi-supervised segmentation
Thank you