🧪 Knowledge Distillation
Presenter: YongSang Yoo
Presentation date: February 23, 2023
Domain: Other
What is Knowledge Distillation?
Why is it needed?
Distilling the Knowledge in a Neural Network (NIPS 2014)
Process
Soft Label
Distillation Loss
Various KD models
DistilBERT (NeurIPS 2019)
TinyBERT (EMNLP 2020)
SEED: Self-Supervised Distillation for Visual Representation (ICLR 2021)
References
Knowledge + Distillation
→ The process of transferring knowledge distilled from a Teacher Network to a Student Network
Why is it needed?
When first introduced → argued to be necessary from the standpoint of model deployment
Now → an active research area for a variety of reasons, such as building lightweight models and reducing the resources required for training
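Distilling the Knowledge in a Neural Network (NIPS 2014)
The paper that first defined the concept of KD
Because deploying a complex model (e.g., an ensemble) to users is difficult, KD is used to transfer what the complex model has learned to a small model, and the performance of that small model is then evaluated
Dataset used: MNIST (multi-class classification)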
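Process
Train the Teacher Network
▼
Extract Soft Labels (soft outputs, Dark Knowledge) from the Teacher Network
▼
Build the Distillation Loss by combining the extracted knowledge with the CE Loss between the Student model's predictions and the ground truth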
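Soft Label
Suppose an ordinary classifier has to distinguish 🐮, 🐶, 😺, and 🚗.
Ground truth (Hard Label, Original Target): [shown as a figure in the original slides]
Classifier's inference result: [shown as a figure in the original slides]
The paper focuses on the values other than the probability of the correct class and calls them Dark Knowledge.
However, the softmax function commonly used in classification tends to make large values larger and small values smaller.
Therefore, to extract the Teacher Model's Dark Knowledge well, the output distribution needs to be made somewhat softer.
A temperature T is added to the standard softmax: a higher T gives a softer distribution, a lower T a harder one.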
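The temperature-scaled softmax described above can be written as q_i = exp(z_i / T) / Σ_j exp(z_j / T), where z_i are the logits. The following is a minimal sketch in plain Python; the function name and the example logits are made up for illustration and are not taken from the slides.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for the classes (cow, dog, cat, car).
teacher_logits = [1.0, 8.0, 5.0, -3.0]

# Higher temperatures spread probability mass onto the non-target classes.
for t in (1.0, 4.0, 10.0):
    probs = softmax_with_temperature(teacher_logits, t)
    print(t, [round(p, 4) for p in probs])
```

With T = 1 almost all the mass sits on the top class; as T grows, the relative similarity between classes (the Dark Knowledge) becomes visible in the output.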
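Distillation Loss
🧐: How is the knowledge extracted from the Teacher Model taught to the Student Model?
→ Make the Student Model output the Teacher Model's Soft Labels (KD Loss)
🧐: But if we train only on Soft Labels, aren't we building a model that merely imitates the 'distribution' of the Teacher Model's outputs rather than predicting the correct label?
→ Add a CE Loss that pulls the Student model's predictions toward the ground truth (Hard Label)
The final loss function is defined as the sum of the two losses above (in the paper, a weighted average)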
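The two-term loss above can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: the weight `alpha`, the default temperature, and the function names are hypothetical, and the T² factor on the soft term follows the scaling suggested in the original paper.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """CE between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    # KD term: student's softened output vs. the teacher's Soft Label.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    kd = cross_entropy(soft_teacher, soft_student)
    # CE term: student's ordinary output (T = 1) vs. the one-hot Hard Label.
    hard = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    ce = cross_entropy(hard, softmax(student_logits))
    # alpha is an assumed mixing weight; T^2 keeps the soft term's gradient scale.
    return alpha * temperature ** 2 * kd + (1.0 - alpha) * ce

teacher = [1.0, 8.0, 5.0, -3.0]
print(distillation_loss([0.5, 6.0, 4.0, -2.0], teacher, label=1))
```

Setting `alpha=0.0` reduces the function to the plain hard-label CE, which makes it easy to compare a distilled student against a baseline trained without the teacher.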
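Experimental results
The digit 3 was removed from the MNIST training data and the student model was trained with knowledge distillation → although it never saw a 3 during training, it learned from the information carried by the soft labels alone and reached 98.6% accuracy on the test 3s (in the paper, after adjusting the bias for class 3)
The student model shows accuracy similar to an ensemble of 10 models; considering the cost of running a 10-model ensemble, knowledge distillation is highly effective
PyTorch implementation blog
https://deep-learning-study.tistory.com/700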
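Various KD models
DistilBERT (NeurIPS 2019)
Teacher: pretrained BERT model (distillation uses dynamic masking, as in RoBERTa)
Student: token-type embeddings and pooler removed + number of layers halved
Three losses are used
1. Distillation Loss
: CE Loss between the soft targets and the soft predictions
2. Masked Language Modeling Loss
: CE Loss between the hard targets and the hard predictions
3. Cosine Embedding Loss
: uses the distance between the Teacher's and Student's hidden state vectors so that the two models' states point in the same direction
→ Half as many layers as base BERT (model size 207MB) + comparable performance (97%) + 60% faster inference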
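The third term, the cosine embedding loss, and the way the three terms are combined can be sketched as below. The weights `w_distill`, `w_mlm`, and `w_cos` and both function names are hypothetical; the slides do not state how the terms are weighted.

```python
import math

def cosine_embedding_loss(student_hidden, teacher_hidden, eps=1e-12):
    """1 - cos(student, teacher): 0 when aligned, 2 when opposite."""
    dot = sum(s * t for s, t in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(s * s for s in student_hidden))
    norm_t = math.sqrt(sum(t * t for t in teacher_hidden))
    return 1.0 - dot / max(norm_s * norm_t, eps)  # eps guards against zero vectors

def distilbert_style_loss(distill_loss, mlm_loss, cos_loss,
                          w_distill=1.0, w_mlm=1.0, w_cos=1.0):
    """Weighted sum of the three losses; the weights are assumed knobs."""
    return w_distill * distill_loss + w_mlm * mlm_loss + w_cos * cos_loss

# Toy hidden state vectors (same dimensionality for teacher and student).
student_h = [0.2, 0.9, -0.1]
teacher_h = [0.3, 1.1, 0.0]
cos_term = cosine_embedding_loss(student_h, teacher_h)
print(distilbert_style_loss(distill_loss=1.2, mlm_loss=2.5, cos_loss=cos_term))
```

Because the loss depends only on the angle between the vectors, it aligns the direction of the hidden states without forcing their magnitudes to match.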
TinyBERT (EMNLP 2020)
Three losses
1. Transformer Distillation
: learns the attention matrices (before normalization) of the Teacher Model's Transformer layers
+ learns the outputs (= Hidden States) of the Transformer layers
2. Embedding-layer Distillation
: learns the Teacher Model's embedding outputs
3. Prediction-layer Distillation
: Soft CE loss on the outputs of the final layer
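As a rough illustration of the Transformer Distillation term, the sketch below uses mean squared error for both the attention matrices and the hidden states, with a projection matrix that maps the student's smaller hidden size to the teacher's. The function names, the toy matrices, and the fixed projection are assumptions for illustration; in practice the projection would be a learned parameter.

```python
def mse(a, b):
    """Mean squared error between two equally shaped matrices (lists of rows)."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transformer_layer_loss(student_attn, teacher_attn,
                           student_hidden, teacher_hidden, projection):
    # Attention term: match the (unnormalized) attention matrices.
    attn_loss = mse(student_attn, teacher_attn)
    # Hidden term: project the student's hidden states into the teacher's space.
    hidden_loss = mse(matmul(student_hidden, projection), teacher_hidden)
    return attn_loss + hidden_loss

# Toy example: 2 tokens, student hidden size 2, teacher hidden size 3.
attn = [[0.5, 0.5], [0.2, 0.8]]
student_hidden = [[1.0, 2.0], [3.0, 4.0]]
projection = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # hypothetical projection
teacher_hidden = [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]
print(transformer_layer_loss(attn, attn, student_hidden, teacher_hidden, projection))
```

The Embedding-layer term can reuse the same projected MSE, applied to the embedding outputs instead of a Transformer layer's hidden states.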
Two-stage distillation
1. General Distillation
: performs [Transformer Distillation, Embedding-layer Distillation] from the Teacher Model
2. Task-Specific Distillation (addresses over-parameterization)
a. Data Augmentation
b. Task-Specific Distillation (Fine-Tuning)
→ 4-layer version: 7.5x smaller and 9.4x faster than BERT_base + 96.8% of its performance
→ 6-layer version: 40% fewer parameters + 2x faster + performance maintained
SEED: Self-Supervised Distillation for Visual Representation (ICLR 2021)
→ Introduces KD into Contrastive Learning
Freezes a pretrained Teacher Model and distills it into a smaller model
After randomly augmenting an image, the similarity between the two models' probability scores over the computed features is measured with CE
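One way to read the step above: each model's feature for the augmented image is compared against a shared set of anchor features, the similarities are turned into a probability distribution with a softmax, and the student's distribution is pulled toward the teacher's with CE. The sketch below follows that reading; the anchor queue, the temperature values, and all names are assumptions for illustration rather than details given in the slides.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(scores, temperature):
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_distribution(feature, anchors, temperature):
    """Softmax over dot-product similarities between a feature and each anchor."""
    scores = [sum(f * a for f, a in zip(feature, anchor)) for anchor in anchors]
    return softmax(scores, temperature)

def seed_style_loss(student_feature, teacher_feature, queue,
                    t_student=0.2, t_teacher=0.07):
    z_s = l2_normalize(student_feature)
    z_t = l2_normalize(teacher_feature)
    # Anchors: queued features plus the frozen teacher's own feature for this image.
    anchors = [l2_normalize(q) for q in queue] + [z_t]
    p_teacher = similarity_distribution(z_t, anchors, t_teacher)
    p_student = similarity_distribution(z_s, anchors, t_student)
    # CE between the teacher's and the student's probability scores.
    return -sum(pt * math.log(ps + 1e-12) for pt, ps in zip(p_teacher, p_student))

queue = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical anchor features
print(seed_style_loss([0.9, 1.1], [1.0, 1.0], queue))
```

Only the student would receive gradients from this loss, since the teacher is frozen.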
References
https://baeseongsu.github.io/posts/knowledge-distillation/
https://deep-learning-study.tistory.com/699
https://deep-learning-study.tistory.com/700
https://velog.io/@dldydldy75/지식-증류-Knowledge-Distillation
https://syj9700.tistory.com/38
https://3months.tistory.com/436
https://facerain.club/distilbert-paper/
https://littlefoxdiary.tistory.com/64
A good article on representation learning, a concept that encompasses KD, SSL, and related methods:
https://89douner.tistory.com/339
