LeapMind ICML2019 Reading Session
LIT: Learned Intermediate Representation Training for Model Compression
Reader: Joel Nicholls, DL Researcher, LeapMind
LeapMind Inc. © 2019 2
Paper Info
Published: Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia; Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3509-3518, 2019.
→ Diagrams, equations, and figures from the paper have been used in these slides to explain LIT.
Also available: the authors' ICML 2019 slides and the NIPS 2018 CDNNRIA workshop open review.
LIT adds computation during training, but arguably it is more important to compress/accelerate inference.
For that reason, LIT is a useful technique for deep learning.
What is it?
● Similar to knowledge distillation, it uses a teacher network to improve the accuracy
of a student network.
● At test time, the teacher is cut away from the student, so there is no increase in
computation for the inference stage, compared to training from scratch.
● (I think) the theory behind methods like this is still not well established (Lopez-Paz
et al. [4] is one of the few theoretical works). But if the method gets results, it is still useful.
Related works - Knowledge distillation
The knowledge distillation (KD) loss [1] combines two losses:
1. The usual cross-entropy loss between the student outputs and the ground truth.
2. A second cross-entropy loss between the (trained) teacher outputs and the student outputs.
Image by Ujjwal U. of Intel: https://software.intel.com/en-us/articles/knowledge-distillation-with-keras
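As a rough illustration, the two-term loss can be sketched in pure Python as below. The function names, the choice to put alpha on the ground-truth term, and the tau² scaling of the soft term (from Hinton et al. [1]) are assumptions of this sketch; implementations differ on these details. Tau and alpha are the hyperparameters described on the next slide.

```python
import math
from typing import List


def softmax(logits: List[float], tau: float = 1.0) -> List[float]:
    """Softmax of logits / tau, shifted by the max for numerical stability."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: List[float], pred: List[float]) -> float:
    """H(target, pred) = -sum_k target_k * log(pred_k), skipping zero targets."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)


def kd_loss(student_logits: List[float], teacher_logits: List[float],
            label: int, tau: float = 4.0, alpha: float = 0.5) -> float:
    """Sketch of a Hinton-style KD loss (conventions assumed, not from LIT).

    alpha weights the ground-truth term and (1 - alpha) the softened teacher
    term; the tau**2 factor keeps the soft term's gradient scale comparable.
    """
    n = len(student_logits)
    one_hot = [1.0 if k == label else 0.0 for k in range(n)]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft = cross_entropy(softmax(teacher_logits, tau), softmax(student_logits, tau))
    return alpha * hard + (1.0 - alpha) * tau ** 2 * soft


print(kd_loss([2.0, 0.5, -1.0], [3.0, 1.0, -2.0], label=0))
```

Setting alpha to 1.0 recovers ordinary training on the labels, which is a quick sanity check for any implementation.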
Related works - Knowledge distillation
Knowledge distillation introduces two new hyperparameters.
The authors of LIT (Koratana et al.) say that both must be tuned to get good results.
1. Tau (temperature), which softens the targets.
2. Alpha, the weighting between the two losses.
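Alpha is just a mixing weight, but the effect of tau is easier to see numerically. This hypothetical sketch softens a made-up teacher output at several temperatures; as tau grows, the top class loses probability mass and the entropy rises, which is what "softening the targets" means.

```python
import math
from typing import List


def softmax_t(logits: List[float], tau: float) -> List[float]:
    """Temperature-scaled softmax: exp(z_k / tau) / sum_j exp(z_j / tau)."""
    m = max(logits)
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats; higher means a softer distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


teacher_logits = [5.0, 2.0, 0.5]  # hypothetical teacher outputs for 3 classes
for tau in (1.0, 2.0, 4.0, 8.0):
    probs = softmax_t(teacher_logits, tau)
    print(f"tau={tau}: probs={[round(q, 3) for q in probs]}, entropy={entropy(probs):.3f}")
```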
Related works - others
Born again neural networks [3]
They ran distillation experiments in which the student and teacher
have the same size.
FitNets [2]
A form of hint training: it distills one of the teacher's
intermediate feature maps into the student.
- The most similar work to LIT.
Image by Romero et al. from the paper “FitNets: Hints for Thin Deep Nets”,
published as a conference paper at ICLR 2015 [2]
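The FitNets hint loss can be sketched as 0.5 · ||u_h − r(v_g)||², where u_h is the teacher's hint layer, v_g the student's guided layer, and r a regressor that maps the thinner student features to the hint's size. FitNets uses a convolutional regressor; this sketch swaps in a plain linear map on flat vectors, and all names and values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear_regressor(w: Matrix, v: Vector) -> Vector:
    """r(v) = W v: maps the student's guided-layer features to the hint's size."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def hint_loss(teacher_hint: Vector, student_guided: Vector, w_r: Matrix) -> float:
    """FitNets-style hint loss: 0.5 * ||u_h - r(v_g)||^2."""
    pred = linear_regressor(w_r, student_guided)
    if len(pred) != len(teacher_hint):
        raise ValueError("regressor output must match the hint's size")
    return 0.5 * sum((h - p) ** 2 for h, p in zip(teacher_hint, pred))


# Hypothetical sizes: a thin student layer (2 features), a wider teacher hint (3 features).
u_h = [1.0, 0.0, 2.0]
v_g = [1.0, 2.0]
w_r = [[1.0, 0.0],
       [0.0, 0.0],
       [0.0, 1.0]]
print(hint_loss(u_h, v_g, w_r))
```

The regressor is what lets FitNets link layers of different widths; LIT drops it and instead requires matching shapes.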
Top-down view of LIT
● Combines both knowledge distillation (KD) loss AND intermediate representation (IR) loss.
● IR loss is an L2 loss between intermediate feature maps (the maps must be the same size).
● In the training forward pass, each student block receives the output of the preceding teacher block as its input.
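The training-time wiring can be sketched as follows, treating each network as a list of sections (plain Python functions on flat vectors, a hypothetical simplification of real feature maps). The squared L2 distance and the summation over sections are assumptions of this sketch. At test time only the student sections run, which is why inference cost does not grow.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Section = Callable[[Vec], Vec]


def squared_l2(a: Vec, b: Vec) -> float:
    """Squared L2 distance; the two feature maps must have identical shapes."""
    if len(a) != len(b):
        raise ValueError("IR loss needs same-size feature maps")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lit_training_pass(x: Vec, teacher: Sequence[Section],
                      student: Sequence[Section]) -> Tuple[Vec, float]:
    """One LIT-style training forward pass (sketch).

    Student section i is fed the teacher's output of section i-1 and is
    penalised for deviating from teacher section i's output.
    Returns the last student section's output and the summed IR loss.
    """
    ir_loss = 0.0
    teacher_in = x
    student_out = x
    for t_sec, s_sec in zip(teacher, student):
        t_out = t_sec(teacher_in)
        student_out = s_sec(teacher_in)  # uses the teacher's features, not its own
        ir_loss += squared_l2(t_out, student_out)
        teacher_in = t_out
    return student_out, ir_loss


def student_inference(x: Vec, student: Sequence[Section]) -> Vec:
    """At test time the teacher is dropped and the student runs end to end."""
    for s_sec in student:
        x = s_sec(x)
    return x


# Hypothetical two-section networks on 2-element "feature maps".
teacher = [lambda v: [2 * e for e in v], lambda v: [e + 1 for e in v]]
student = [lambda v: [2 * e for e in v], lambda v: [e + 0.5 for e in v]]
out, ir = lit_training_pass([1.0, 2.0], teacher, student)
print(out, ir)
```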
LIT loss equation
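The equation itself was an image in the original slide. A hedged reconstruction consistent with the surrounding bullet points is given below; it is not copied from the paper. T_i and S_i denote the i-th teacher and student sections, x_i the teacher's intermediate output, z_T and z_S the teacher and student logits, σ the softmax, and H the cross-entropy. Which term alpha multiplies, the τ² factor, the squared L2 norm, and the convex β mix are all assumptions.

```latex
\begin{align*}
\mathcal{L}_{\mathrm{KD}} &= \alpha\, H\!\left(y,\ \sigma(z_S)\right)
  + (1-\alpha)\,\tau^{2}\, H\!\left(\sigma(z_T/\tau),\ \sigma(z_S/\tau)\right) \\
\mathcal{L}_{\mathrm{IR}} &= \sum_{i=1}^{k} \left\lVert T_i(x_{i-1}) - S_i(x_{i-1}) \right\rVert_2^{2},
  \qquad x_0 = x,\quad x_i = T_i(x_{i-1}) \\
\mathcal{L}_{\mathrm{LIT}} &= \beta\,\mathcal{L}_{\mathrm{KD}} + (1-\beta)\,\mathcal{L}_{\mathrm{IR}}
\end{align*}
```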
More details for LIT
● Their student has fewer layers than the teacher, but the same width (channels).
● They put the IR loss before downsampling points (but not at every downsample).
● A new hyperparameter, beta, weights the KD loss against the IR loss.
● After the main training, they fine-tune with the KD loss only.
● They tune tau, alpha, and beta separately for each (architecture, dataset) pair.
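The two training phases can be sketched as a loss selector. The convex beta mix, the phase names, and the HPARAMS table are hypothetical: the values are placeholders, not the paper's tuned settings, and tau and alpha would feed into the KD term.

```python
from typing import Dict


def lit_objective(kd: float, ir: float, beta: float) -> float:
    """Main-phase objective: beta weights KD against IR (convex mix assumed)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta is assumed to lie in [0, 1]")
    return beta * kd + (1.0 - beta) * ir


def loss_for_phase(phase: str, kd: float, ir: float, hparams: Dict[str, float]) -> float:
    """'main' uses the LIT mix; 'finetune' uses the KD loss alone."""
    if phase == "main":
        return lit_objective(kd, ir, hparams["beta"])
    if phase == "finetune":
        return kd
    raise ValueError(f"unknown phase: {phase}")


# Hypothetical per-(architecture, dataset) settings; placeholder values only.
HPARAMS = {
    ("resnet", "cifar100"): {"tau": 4.0, "alpha": 0.5, "beta": 0.5},
}

hp = HPARAMS[("resnet", "cifar100")]
print(loss_for_phase("main", kd=2.0, ir=4.0, hparams=hp))
print(loss_for_phase("finetune", kd=2.0, ir=4.0, hparams=hp))
```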
Experiments
Now moving on to experiments!
In their experiments, the authors compare LIT with:
Knowledge distillation [1], FitNets [2], born again neural networks [3], and training from scratch.
● They report that LIT outperforms all of these.
● They also run ablation studies, which is good, but I won't cover them here.
Experiments
How much improvement is that?
Keeping to just one example: ResNet-20 for CIFAR-100 classification
(values read roughly from the graph).
                                From scratch   KD                   LIT
Test error (%)                  30.6           28.1                 27.39
Absolute improvement (points)   0              30.6 - 28.1 = 2.5    30.6 - 27.39 = 3.21
Relative improvement            0              2.5 / 30.6 = 8.2%    3.21 / 30.6 = 10.5%
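The arithmetic in the table can be reproduced in a few lines. The error values are the ones read from the graph; the function name is illustrative.

```python
from typing import Tuple


def improvements(baseline: float, error: float) -> Tuple[float, float]:
    """Return (absolute gain in points, relative gain in % of the baseline error)."""
    gain = baseline - error
    return gain, 100.0 * gain / baseline


ERRORS = {"From scratch": 30.6, "KD": 28.1, "LIT": 27.39}  # test error (%), from the graph
for method, err in ERRORS.items():
    gain, rel = improvements(ERRORS["From scratch"], err)
    print(f"{method:<12} error {err:5.2f}%  gain {gain:4.2f} points  relative {rel:4.1f}%")
```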
Additional experiments
Good performance on sentiment analysis.
LIT can also be combined with pruning.
Can also be used for GANs
The main points, from my overall impression
● They compare against training from scratch. Some other distillation/pruning papers
omit this baseline, but it is essential for seeing how large the improvement really is.
● Performs a bit better than knowledge distillation in terms of relative improvement
over training from scratch (8.2% → 10.5% for ResNet-20 on CIFAR-100).
● Can compress GANs, to which output-based knowledge distillation cannot be applied
directly (a generator has no class probabilities to match).
● Needs same-size intermediate feature maps at the points where the student
and teacher are linked. For this reason, it is best suited to student/teacher
pairs with the same width (channels) and different depth (layers).
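Because the IR loss is an L2 distance, every link point needs teacher and student feature maps of identical shape. A hypothetical check over (channels, height, width) shapes makes the constraint concrete: a shallower student of the same width passes, while a half-width student fails at every link point. The shapes below are made up for illustration.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def incompatible_links(teacher_shapes: List[Shape], student_shapes: List[Shape]) -> List[int]:
    """Return indices of link points whose feature-map shapes differ.

    An L2 IR loss is only defined where the shapes match, so every
    returned index is a link point that would need a different design.
    """
    if len(teacher_shapes) != len(student_shapes):
        raise ValueError("teacher and student need the same number of link points")
    return [i for i, (t, s) in enumerate(zip(teacher_shapes, student_shapes)) if t != s]


# Hypothetical link-point shapes for a CIFAR-sized ResNet-like network.
teacher = [(16, 32, 32), (32, 16, 16), (64, 8, 8)]
shallower_same_width = [(16, 32, 32), (32, 16, 16), (64, 8, 8)]
half_width = [(8, 32, 32), (16, 16, 16), (32, 8, 8)]
print(incompatible_links(teacher, shallower_same_width))
print(incompatible_links(teacher, half_width))
```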
References
[1] Hinton et al., “Distilling the Knowledge in a Neural Network”, https://arxiv.org/abs/1503.02531
[2] Romero et al., “FitNets: Hints for Thin Deep Nets” (ICLR 2015), https://arxiv.org/abs/1412.6550
[3] Furlanello et al., “Born Again Neural Networks” (ICML 2018), http://proceedings.mlr.press/v80/furlanello18a.html
[4] Lopez-Paz et al., “Unifying Distillation and Privileged Information” (ICLR 2016), https://arxiv.org/abs/1511.03643