Structured Knowledge Distillation
for Semantic Segmentation
2019. 4. 10
Sang Jun Lee
https://arxiv.org/abs/1903.04197
(CVPR 2019)
Knowledge Distillation
[Figure] A deep teacher network turns its logits z into class probabilities via the softmax σ(z_i) = exp(z_i) / Σ_j exp(z_j). The student network is trained against both the hard label (e.g., [0, 1, 0, 0]) and the teacher's soft label (e.g., [0.01, 0.83, 0.15, 0.01]); the loss on the soft label is the distillation term.
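As a concrete illustration of the figure above, here is a minimal PyTorch sketch of a soft-label distillation loss. The temperature T, the weight alpha, and the function name are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.5):
    """Combine the hard-label loss with a KL term on temperature-softened outputs (sketch)."""
    # Hard-label term: standard cross-entropy against the ground-truth class.
    ce = F.cross_entropy(student_logits, hard_labels)
    # Soft-label term: KL divergence between softened teacher and student distributions,
    # sigma(z_i) = exp(z_i / T) / sum_j exp(z_j / T).  T and alpha are assumptions.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    return alpha * ce + (1.0 - alpha) * kd
```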
Knowledge Distillation
 Model compression (network minimization)
 Network ensemble
 Self-distillation
Structured Knowledge Distillation for Semantic Segmentation
 Pixel-wise distillation: use the class probability corresponding to each individual pixel in the teacher network's soft-max output (a sketch follows this slide)
 Pair-wise distillation: distill the similarity between pairs of feature vectors in a feature map (see the similarity-map sketch under Method below)
 Distillation of holistic knowledge: adversarial learning between the teacher network's output and the student network's output so that whole-image information is used (see the Wasserstein-distance sketch under Method below)
Structured knowledge
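The pixel-wise distillation term described above can be written as a per-pixel KL divergence between the teacher's and the student's class probabilities. This is a minimal PyTorch sketch assuming both networks output logits of shape (N, C, H, W) at the same resolution; the function name is an assumption.

```python
import torch.nn.functional as F

def pixel_wise_distillation(student_logits, teacher_logits):
    """KL divergence between teacher and student class probabilities at every pixel (sketch)."""
    log_p_student = F.log_softmax(student_logits, dim=1)  # (N, C, H, W)
    p_teacher = F.softmax(teacher_logits, dim=1)          # (N, C, H, W)
    # Sum the KL over the class dimension, then average over all pixels in the batch.
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)  # (N, H, W)
    return kl.mean()
```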
Method
Pixel-wise distillation
Pair-wise distillation
Similarity map
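A minimal sketch of the similarity map and the pair-wise distillation term, assuming feature maps of shape (N, C, H, W) and cosine similarity between spatial feature vectors; the exact similarity measure and the function names are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def similarity_map(features):
    """Cosine similarity between every pair of spatial feature vectors (sketch)."""
    n, c, h, w = features.shape
    f = features.view(n, c, h * w)          # (N, C, HW)
    f = F.normalize(f, p=2, dim=1)          # unit-length feature vectors
    return torch.bmm(f.transpose(1, 2), f)  # (N, HW, HW) pair-wise similarity map

def pair_wise_distillation(student_feat, teacher_feat):
    """Mean squared difference between teacher and student similarity maps (sketch)."""
    return (similarity_map(student_feat) - similarity_map(teacher_feat)).pow(2).mean()
```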
Method
Wasserstein distance: evaluates the difference between the real and the fake distribution
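One way to estimate this Wasserstein distance is with a critic network D that scores a segmentation map conditioned on the input image: the difference of mean scores on the teacher's output ("real") and the student's output ("fake") approximates the distance. The critic D and the concatenation of probability maps with the image are assumptions of this sketch, not the paper's exact architecture.

```python
import torch

def wasserstein_distance(D, image, teacher_probs, student_probs):
    """Critic-based estimate E[D(real)] - E[D(fake)], with the teacher output as 'real' (sketch)."""
    real_score = D(torch.cat([teacher_probs, image], dim=1)).mean()
    fake_score = D(torch.cat([student_probs, image], dim=1)).mean()
    return real_score - fake_score
```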
Training
 Discriminator: maximize the Wasserstein objective (score the teacher's output above the student's)
 Student network: minimize the total loss
Total loss: the segmentation cross-entropy combined with the pixel-wise, pair-wise, and adversarial distillation terms (a training sketch follows below)
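A hedged sketch of one alternating training step: the discriminator is updated to maximize the Wasserstein objective, then the student is updated to minimize the combined loss. It reuses pixel_wise_distillation and pair_wise_distillation from the earlier sketches; the loss weights, the optimizer handling, and the assumption that both networks return (logits, features) are illustrative choices, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def train_step(student, teacher, D, opt_student, opt_D, image, label,
               lambda_pi=10.0, lambda_pa=1.0, lambda_adv=0.1):
    # Teacher predictions are fixed targets, so no gradient is tracked for them.
    with torch.no_grad():
        t_logits, t_feat = teacher(image)
    s_logits, s_feat = student(image)
    t_probs = F.softmax(t_logits, dim=1)
    s_probs = F.softmax(s_logits, dim=1)

    # 1) Discriminator step: maximize E[D(teacher)] - E[D(student)]
    #    (minimize the negated objective); the student output is detached here.
    d_loss = -(D(torch.cat([t_probs, image], dim=1)).mean()
               - D(torch.cat([s_probs.detach(), image], dim=1)).mean())
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # 2) Student step: task loss plus distillation terms; the adversarial term
    #    rewards student outputs that the critic scores highly.
    task = F.cross_entropy(s_logits, label)
    pi = pixel_wise_distillation(s_logits, t_logits)  # from the earlier sketch
    pa = pair_wise_distillation(s_feat, t_feat)       # from the earlier sketch
    adv = -D(torch.cat([s_probs, image], dim=1)).mean()
    total = task + lambda_pi * pi + lambda_pa * pa + lambda_adv * adv
    opt_student.zero_grad()
    total.backward()
    opt_student.step()
    return total.item()
```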
Experiment
Teacher: PSPNet
Students: ESPNet, ESPNet-C, MobileNetV2Plus, ResNet18

[5-Minute Paper Summary] Structured Knowledge Distillation for Semantic Segmentation