Knowledge distillation is the process of transferring knowledge from a teacher network to a student network. It was originally proposed to make complex models deployable on resource-limited devices by distilling their knowledge into a smaller student model, and it is now commonly used to compress models for more efficient inference or training. The NIPS 2014 paper "Distilling the Knowledge in a Neural Network" first defined knowledge distillation, using a temperature-softened softmax distribution over the teacher's outputs ("dark knowledge") to train the student alongside the hard targets. Since then, many approaches have been proposed that apply different distillation losses and procedures to transfer knowledge from large pretrained models to more compact student models while maintaining high performance.
🧪 Knowledge Distillation
Presenter: 유용상
Domain: Other
Date: February 23, 2023
What is Knowledge Distillation?
Why is it needed?
Distilling the Knowledge in a Neural Network (NIPS 2014)
Procedure
Soft Label
Distillation Loss
Various KD Models
DistilBERT (NeurIPS 2019)
TinyBERT (EMNLP 2020)
SEED: Self-Supervised Distillation for Visual Representation (ICLR 2021)
References
What is Knowledge Distillation?
Knowledge + Distillation
→ the process of transferring knowledge distilled from a Teacher Network to a Student Network.
Why is it needed?
When it first appeared → argued to be necessary from a model deployment perspective.
Now → an area studied for a variety of reasons, such as building lightweight models and reducing the resources required during training!
Distilling the Knowledge in a Neural Network (NIPS 2014)
The first paper to define the concept of KD.
Since deploying a complex model (e.g., an ensemble) to users is difficult, the knowledge it has learned is transferred to a small model via KD, and the performance of the receiving model is evaluated.
Dataset used: MNIST (multi-class classification)
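For reference, the Soft Labels (dark knowledge) that appear in the procedure below are, in the Hinton et al. paper, the teacher's logits passed through a softmax with a temperature; the notation (logits z_i, temperature T) follows that paper rather than the slides. A higher temperature gives a softer distribution whose small probabilities carry the dark knowledge:

```latex
q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
```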
Procedure
Train the Teacher Network
▼
Extract Soft Labels (soft outputs, dark knowledge) from the Teacher Network
▼
Form the Distillation Loss by combining the loss against the extracted knowledge with the CE loss between the Student model's predictions and the ground-truth labels
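A minimal sketch of how this Distillation Loss could be computed, assuming PyTorch; the function name, the temperature T, and the mixing weight alpha are illustrative assumptions rather than values taken from the slides:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style KD loss: CE on hard labels + KL on temperature-softened logits."""
    # Hard-label term: ordinary cross-entropy between the student's predictions
    # and the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft-label term: match the teacher's softened distribution (dark knowledge).
    # The KL term is scaled by T^2 so its gradient magnitude stays comparable to
    # the hard-label term, as suggested in the 2014 paper.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

    return alpha * hard_loss + (1.0 - alpha) * soft_loss

# Example usage with a dummy MNIST-sized batch (10 classes); in practice the
# teacher logits would come from the frozen, already-trained Teacher Network.
student_logits = torch.randn(32, 10)
teacher_logits = torch.randn(32, 10)
labels = torch.randint(0, 10, (32,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```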