Explaining knowledge distillation

taeseon ryu

Slides for the "Explaining knowledge distillation" presentation used in the video at https://youtu.be/3q3iHhgKOY0

김동희
Fundamental Team
김창연, 송헌, 이재윤
CVPR 2020

Knowledge Distillation
The teacher model's soft labels are used as the target labels for the student model.
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
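
The soft-label idea can be sketched as a loss function. The following is a minimal NumPy sketch, assuming the temperature-softened formulation from the cited Hinton et al. paper; the function names and the default temperature of 4.0 are illustrative choices and do not come from the slides.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis, with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) between softened distributions, averaged over the batch.

    The T^2 factor keeps gradient magnitudes comparable across temperatures
    (an assumption taken from the cited paper, not from the slides).
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)  # the teacher's soft labels
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student), axis=-1)
    return float(kl.mean() * temperature ** 2)

if __name__ == "__main__":
    teacher = np.array([[5.0, 2.0, 0.1]])
    student = np.array([[0.1, 2.0, 5.0]])
    print(distillation_loss(student, teacher))
```

In practice this term is usually combined with an ordinary cross-entropy loss on the ground-truth labels; the weighting between the two is a hyperparameter.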

Knowledge Distillation
https://github.com/HobbitLong/RepDistiller

Why Is Knowledge Distillation Successful?
Hypotheses:
1. The distilled student learns the foreground more than the background.
2. It tends to learn the visual concepts of each object simultaneously.
3. It initially also learns unreliable visual concepts, but discards them later.
Three metrics are proposed to test these three hypotheses.

Quantification of Information Discarding

Hypothesis 1
The distilled student learns the foreground more than the background.
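
One way to make this hypothesis measurable is to count learned visual concepts separately in the foreground and the background. The sketch below assumes that a boolean map marking which pixels count as visual concepts is already available, together with a foreground mask; how the concept map is obtained is outside this sketch, and `foreground_ratio` is a hypothetical helper name.

```python
import numpy as np

def foreground_ratio(concept_mask, foreground_mask):
    """Count visual concepts inside and outside the foreground.

    concept_mask:    boolean array, True where a pixel counts as a visual concept
                     (assumed to be given; not computed here).
    foreground_mask: boolean array of the same shape, True on the object.
    Returns (n_foreground, n_background, foreground share of all concepts).
    """
    concept_mask = np.asarray(concept_mask, dtype=bool)
    foreground_mask = np.asarray(foreground_mask, dtype=bool)
    n_fg = int(np.sum(concept_mask & foreground_mask))
    n_bg = int(np.sum(concept_mask & ~foreground_mask))
    total = n_fg + n_bg
    ratio = n_fg / total if total else 0.0
    return n_fg, n_bg, ratio

if __name__ == "__main__":
    concepts = [[1, 1, 0], [0, 1, 0]]
    foreground = [[1, 1, 0], [0, 0, 0]]
    print(foreground_ratio(concepts, foreground))
```

Under this reading, a higher foreground share for the distilled student than for a baseline trained on raw labels would support the hypothesis.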

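Hypothesis 2
The distilled student tends to learn the visual concepts of each object simultaneously.
=> How quickly are the visual concepts learned?
=> How diverse are the learned visual concepts?
Smaller is better for both D_mean and D_std.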
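
The two quantities can be illustrated with a short sketch. It rests on assumptions not stated on the slide: that each image's "learning time" is the weight-space path length travelled up to the epoch where its foreground concept count peaks, normalized by the norm of the initial weights, and that D_mean and D_std are the mean and standard deviation of that value over images. All function names are hypothetical.

```python
import numpy as np

def learning_path_length(weights_per_epoch, fg_concepts_per_epoch):
    """Normalized weight-space distance travelled until foreground concepts peak.

    weights_per_epoch:     flattened weight vectors w_0, w_1, ..., w_M
    fg_concepts_per_epoch: foreground concept counts for epochs 1..M of one image
    """
    w = [np.asarray(x, dtype=float).ravel() for x in weights_per_epoch]
    peak_epoch = int(np.argmax(fg_concepts_per_epoch)) + 1  # epochs are 1-indexed
    steps = [np.linalg.norm(w[k] - w[k - 1]) for k in range(1, peak_epoch + 1)]
    return float(sum(steps) / np.linalg.norm(w[0]))

def d_mean_d_std(path_lengths):
    """Mean and standard deviation of the per-image path lengths."""
    values = np.asarray(path_lengths, dtype=float)
    return float(values.mean()), float(values.std())

if __name__ == "__main__":
    weights = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0]]       # w_0, w_1, w_2
    image_a = learning_path_length(weights, [5, 3])      # peaks at epoch 1
    image_b = learning_path_length(weights, [2, 7])      # peaks at epoch 2
    print(d_mean_d_std([image_a, image_b]))
```

Under these assumptions a small mean indicates that concepts are picked up early, and a small spread indicates that different images reach their peak at about the same time, which is what "simultaneously" would predict.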

Hypothesis 3
The distilled student initially also learns unreliable visual concepts, but discards them later.
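
A simple way to quantify how much is learned and then discarded is to compare the concepts kept at the end of training with everything that was learned at any point. The sketch below assumes the visual concepts of one image are available per epoch as sets of identifiers (for example pixel indices); `stability_ratio` is a hypothetical helper name and this ratio is one possible formulation.

```python
def stability_ratio(concepts_per_epoch):
    """Share of ever-learned visual concepts that survive to the final epoch.

    concepts_per_epoch: list of sets, one per epoch, holding concept identifiers.
    A value near 1.0 means few concepts were learned and later discarded.
    """
    ever_learned = set().union(*concepts_per_epoch)
    if not ever_learned:
        return 0.0
    final = set(concepts_per_epoch[-1])
    return len(final) / len(ever_learned)

if __name__ == "__main__":
    history = [{1, 2, 3}, {2, 3, 4}, {3, 4}]
    print(stability_ratio(history))  # 2 concepts kept out of 4 ever learned
```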

Q & A

Result – Monocular Depth