SlideShare a Scribd company logo
1 of 21
Download to read offline
김 성 철
Contents
1. Introduction
2. Methods
3. Results
4. Conclusion
2
Introduction
• Terminologies
• Privileged provision : input features or computational resources
• Privileged information : input features
• Example about the privileged provision between training and inference
• Image tag recommendation / face recognition
3
→Resources are only
accessible for training!
Introduction
• Knowledge Distillation (KD)
• Classifier (student) / Teacher
• Generative Adversarial Networks (GAN)
• Classifier (Generator) / Discriminator
4https://light-tree.tistory.com/196
https://pathmind.com/kr/wiki/generative-adversarial-network-gan
Introduction
• Knowledge Distillation with Generative Adversarial Networks (KDGAN)
• KD to train a lightweight classifier.
• GAN to teach the classifier the true data distribution.
5
Methods
• Naïve GAN (NaGAN)
• The advantages and disadvantages of KD and NaGAN
• KD : requires a small number of training instances and epochs, but ~.
• NaGAN : ensures the equilibrium where 𝑝𝑐 𝒚 𝒙 = 𝑝 𝑢(𝒚|𝒙), but ~.
6
Methods
• KDGAN
• a classifier 𝐶, a teacher 𝑇, a discriminator 𝐷 with privileged provision 𝜚
• 𝐷 : maximize the probability of correctly distinguishing the true and pseudo labels.
• 𝐶, 𝑇 : minimize the probability that 𝐷 rejects their generated pseudo labels.
• 𝐶 : learn from 𝑇 by mimicking the learned distribution of 𝑇.
• 𝑻 also learn from 𝑪, because a teacher’s ability can also be enhanced by interacting
with students. → reduce the probability of generating different pseudo labels.
7
Methods
• KDGAN
8
Methods
• KDGAN
• The value function 𝑈 𝑐, 𝑡, 𝑑 (𝛼 ∈ 0, 1 , 𝛽 ∈ 0, +∞ , 𝛾 ∈ 0, +∞ )
• Distillation losses : L2 loss, Kullback-Leibler divergence, …
• ℒ 𝐷𝑆
𝑐
, ℒDS
t
are used to train the classifier and the teacher, respectively.
9
Distillation lossesAdversarial losses
Methods
• KDGAN
• The classifier perfectly learns the true data distribution.
→ 𝑝 𝛼
𝜚
𝒚 𝒙 = 𝛼𝑝𝑐 𝒚 𝒙 + 1 − 𝛼 𝑝𝑡
𝜚
(𝒚|𝒙)
10
Methods
• KDGAN
• ℒ 𝑀𝐷 = 𝛽ℒ 𝐷𝑆
𝑐
𝑝𝑐 𝒚 𝒙 , 𝑝𝑡
𝜚
𝒚 𝒙 + 𝛾ℒ 𝐷𝑆
𝑡
𝑝𝑡
𝜚
𝒚 𝒙 , 𝑝𝑐 𝒚 𝒙
• ℒ 𝐽𝑆 : the Jensen-Shannon divergence
* JSD 𝑝, 𝑞 =
1
2
𝐷 𝐾𝐿(𝑝||
𝑝+𝑞
2
) +
1
2
𝐷 𝐾𝐿(𝑞||
𝑝+𝑞
2
)
11
Methods
• KDGAN
12
Methods
• KDGAN
• ℒ 𝐽𝑆 reaches the minimum if and only if 𝑝 𝛼
𝜚
𝒚 𝒙 = 𝑝 𝑢(𝒚|𝒙)
• ℒ 𝐷𝑆
𝑐
(or ℒ 𝐷𝑆
𝑡
) reaches the minimum if and only if 𝑝𝑐 𝒚 𝒙 = 𝑝𝑡
𝜚
𝒚 𝒙
→KDGAN equilibrium
𝒑 𝒄 𝒚 𝒙 = 𝒑 𝒕
𝝔
𝒚 𝒙 = 𝒑 𝒖(𝒚|𝒙) (𝑪 learns the true data distribution.)
13
Methods
• KDGAN Training
• The training speed is closely related to the variance of gradients.
• The KDGAN framework by design
→ can reduce the variance of gradients!!
→ use a continuous space by relaxing the discrete samples!!
14
Methods
• KDGAN Training
15
Methods
• KDGAN Training
16
Methods
• KDGAN Training
• Gumbel-Max trick
17https://hulk89.github.io/machine%20learning/2017/11/20/reparametrization-trick/
http://openresearch.ai/t/categorical-reparameterization-with-gumbel-softmax/386
Results
• Experiments
• 𝛼 in [0.0, 1.0]
• 𝛽 in [0.001, 1000]
• 𝛾 in [0.0001, 100]
• The temperature parameter 𝜏
→ start with a large value (1.0) and exponentially decay it to a small value (0.1)
18
Results
• Deep Model Compression
19
Results
• Image Tag Recommendation
20
감 사 합 니 다
21

More Related Content

Similar to KDGAN: Knowledge Distillation with Generative Adversarial Networks

DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
AIML2 DNN 3.5hr (111-1).pdf
AIML2 DNN  3.5hr (111-1).pdfAIML2 DNN  3.5hr (111-1).pdf
AIML2 DNN 3.5hr (111-1).pdfssuserb4d806
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceHansol Kang
 
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”Lviv Startup Club
 
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...sleepy_yoshi
 
Generative Adversarial Network (+Laplacian Pyramid GAN)
Generative Adversarial Network (+Laplacian Pyramid GAN)Generative Adversarial Network (+Laplacian Pyramid GAN)
Generative Adversarial Network (+Laplacian Pyramid GAN)NamHyuk Ahn
 
Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five
Multi Task DPP for Basket Completion by Romain WARLOP, Fifty FiveMulti Task DPP for Basket Completion by Romain WARLOP, Fifty Five
Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Fiverecsysfr
 
Ch 1-2 NN classifier
Ch 1-2 NN classifierCh 1-2 NN classifier
Ch 1-2 NN classifierZahra Amini
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsNithyananthSengottai
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptxPrabhuSelvaraj15
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsankit_ppt
 
adversarial robustness lecture
adversarial robustness lectureadversarial robustness lecture
adversarial robustness lectureMuhammadAhmedShah2
 

Similar to KDGAN: Knowledge Distillation with Generative Adversarial Networks (20)

DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
gan.pdf
gan.pdfgan.pdf
gan.pdf
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
 
AIML2 DNN 3.5hr (111-1).pdf
AIML2 DNN  3.5hr (111-1).pdfAIML2 DNN  3.5hr (111-1).pdf
AIML2 DNN 3.5hr (111-1).pdf
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent space
 
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
 
BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
 
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularizat...
 
Generative Adversarial Network (+Laplacian Pyramid GAN)
Generative Adversarial Network (+Laplacian Pyramid GAN)Generative Adversarial Network (+Laplacian Pyramid GAN)
Generative Adversarial Network (+Laplacian Pyramid GAN)
 
Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five
Multi Task DPP for Basket Completion by Romain WARLOP, Fifty FiveMulti Task DPP for Basket Completion by Romain WARLOP, Fifty Five
Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five
 
PPT s10-machine vision-s2
PPT s10-machine vision-s2PPT s10-machine vision-s2
PPT s10-machine vision-s2
 
Ch 1-2 NN classifier
Ch 1-2 NN classifierCh 1-2 NN classifier
Ch 1-2 NN classifier
 
ICML2015 Slides
ICML2015 SlidesICML2015 Slides
ICML2015 Slides
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
07 learning
07 learning07 learning
07 learning
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topics
 
adversarial robustness lecture
adversarial robustness lectureadversarial robustness lecture
adversarial robustness lecture
 

More from Sungchul Kim

PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo SupervisionPR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo SupervisionSungchul Kim
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksSungchul Kim
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersSungchul Kim
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningSungchul Kim
 
Score based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsScore based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsSungchul Kim
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningSungchul Kim
 
Revisiting the Sibling Head in Object Detector
Revisiting the Sibling Head in Object DetectorRevisiting the Sibling Head in Object Detector
Revisiting the Sibling Head in Object DetectorSungchul Kim
 
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...Sungchul Kim
 
Deeplabv1, v2, v3, v3+
Deeplabv1, v2, v3, v3+Deeplabv1, v2, v3, v3+
Deeplabv1, v2, v3, v3+Sungchul Kim
 
Going Deeper with Convolutions
Going Deeper with ConvolutionsGoing Deeper with Convolutions
Going Deeper with ConvolutionsSungchul Kim
 
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based LocalizationGrad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based LocalizationSungchul Kim
 
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...Sungchul Kim
 
Panoptic Segmentation
Panoptic SegmentationPanoptic Segmentation
Panoptic SegmentationSungchul Kim
 
On the Variance of the Adaptive Learning Rate and Beyond
On the Variance of the Adaptive Learning Rate and BeyondOn the Variance of the Adaptive Learning Rate and Beyond
On the Variance of the Adaptive Learning Rate and BeyondSungchul Kim
 
A Benchmark for Interpretability Methods in Deep Neural Networks
A Benchmark for Interpretability Methods in Deep Neural NetworksA Benchmark for Interpretability Methods in Deep Neural Networks
A Benchmark for Interpretability Methods in Deep Neural NetworksSungchul Kim
 
Designing Network Design Spaces
Designing Network Design SpacesDesigning Network Design Spaces
Designing Network Design SpacesSungchul Kim
 
Search to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSearch to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSungchul Kim
 
Supervised Constrastive Learning
Supervised Constrastive LearningSupervised Constrastive Learning
Supervised Constrastive LearningSungchul Kim
 
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Bootstrap Your Own Latent: A New Approach to Self-Supervised LearningBootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Bootstrap Your Own Latent: A New Approach to Self-Supervised LearningSungchul Kim
 
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...Sungchul Kim
 

More from Sungchul Kim (20)

PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo SupervisionPR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural Networks
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 
Score based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsScore based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential Equations
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Revisiting the Sibling Head in Object Detector
Revisiting the Sibling Head in Object DetectorRevisiting the Sibling Head in Object Detector
Revisiting the Sibling Head in Object Detector
 
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
 
Deeplabv1, v2, v3, v3+
Deeplabv1, v2, v3, v3+Deeplabv1, v2, v3, v3+
Deeplabv1, v2, v3, v3+
 
Going Deeper with Convolutions
Going Deeper with ConvolutionsGoing Deeper with Convolutions
Going Deeper with Convolutions
 
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based LocalizationGrad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
 
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
 
Panoptic Segmentation
Panoptic SegmentationPanoptic Segmentation
Panoptic Segmentation
 
On the Variance of the Adaptive Learning Rate and Beyond
On the Variance of the Adaptive Learning Rate and BeyondOn the Variance of the Adaptive Learning Rate and Beyond
On the Variance of the Adaptive Learning Rate and Beyond
 
A Benchmark for Interpretability Methods in Deep Neural Networks
A Benchmark for Interpretability Methods in Deep Neural NetworksA Benchmark for Interpretability Methods in Deep Neural Networks
A Benchmark for Interpretability Methods in Deep Neural Networks
 
Designing Network Design Spaces
Designing Network Design SpacesDesigning Network Design Spaces
Designing Network Design Spaces
 
Search to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSearch to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the Eyes
 
Supervised Constrastive Learning
Supervised Constrastive LearningSupervised Constrastive Learning
Supervised Constrastive Learning
 
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Bootstrap Your Own Latent: A New Approach to Self-Supervised LearningBootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
 
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
 

Recently uploaded

SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 

Recently uploaded (20)

SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 

KDGAN: Knowledge Distillation with Generative Adversarial Networks

  • 2. Contents 1. Introduction 2. Methods 3. Results 4. Conclusion 2
  • 3. Introduction • Terminologies • Privileged provision : input features or computational resources • Privileged information : input features • Example about the privileged provision between training and inference • Image tag recommendation / face recognition 3 →Resources are only accessible for training!
  • 4. Introduction • Knowledge Distillation (KD) • Classifier (student) / Teacher • Generative Adversarial Networks (GAN) • Classifier (Generator) / Discriminator 4https://light-tree.tistory.com/196 https://pathmind.com/kr/wiki/generative-adversarial-network-gan
  • 5. Introduction • Knowledge Distillation with Generative Adversarial Networks (KDGAN) • KD to train a lightweight classifier. • GAN to teach the classifier the true data distribution. 5
  • 6. Methods • Naïve GAN (NaGAN) • The advantages and disadvantages of KD and NaGAN • KD : requires a small number of training instances and epochs, but ~. • NaGAN : ensures the equilibrium where 𝑝𝑐 𝒚 𝒙 = 𝑝 𝑢(𝒚|𝒙), but ~. 6
  • 7. Methods • KDGAN • a classifier 𝐶, a teacher 𝑇, a discriminator 𝐷 with privileged provision 𝜚 • 𝐷 : maximize the probability of correctly distinguishing the true and pseudo labels. • 𝐶, 𝑇 : minimize the probability that 𝐷 rejects their generated pseudo labels. • 𝐶 : learn from 𝑇 by mimicking the learned distribution of 𝑇. • 𝑻 also learn from 𝑪, because a teacher’s ability can also be enhanced by interacting with students. → reduce the probability of generating different pseudo labels. 7
  • 9. Methods • KDGAN • The value function 𝑈 𝑐, 𝑡, 𝑑 (𝛼 ∈ 0, 1 , 𝛽 ∈ 0, +∞ , 𝛾 ∈ 0, +∞ ) • Distillation losses : L2 loss, Kullback-Leibler divergence, … • ℒ 𝐷𝑆 𝑐 , ℒDS t are used to train the classifier and the teacher, respectively. 9 Distillation lossesAdversarial losses
  • 10. Methods • KDGAN • The classifier perfectly learns the true data distribution. → 𝑝 𝛼 𝜚 𝒚 𝒙 = 𝛼𝑝𝑐 𝒚 𝒙 + 1 − 𝛼 𝑝𝑡 𝜚 (𝒚|𝒙) 10
  • 11. Methods • KDGAN • ℒ 𝑀𝐷 = 𝛽ℒ 𝐷𝑆 𝑐 𝑝𝑐 𝒚 𝒙 , 𝑝𝑡 𝜚 𝒚 𝒙 + 𝛾ℒ 𝐷𝑆 𝑡 𝑝𝑡 𝜚 𝒚 𝒙 , 𝑝𝑐 𝒚 𝒙 • ℒ 𝐽𝑆 : the Jensen-Shannon divergence * JSD 𝑝, 𝑞 = 1 2 𝐷 𝐾𝐿(𝑝|| 𝑝+𝑞 2 ) + 1 2 𝐷 𝐾𝐿(𝑞|| 𝑝+𝑞 2 ) 11
  • 13. Methods • KDGAN • ℒ 𝐽𝑆 reaches the minimum if and only if 𝑝 𝛼 𝜚 𝒚 𝒙 = 𝑝 𝑢(𝒚|𝒙) • ℒ 𝐷𝑆 𝑐 (or ℒ 𝐷𝑆 𝑡 ) reaches the minimum if and only if 𝑝𝑐 𝒚 𝒙 = 𝑝𝑡 𝜚 𝒚 𝒙 →KDGAN equilibrium 𝒑 𝒄 𝒚 𝒙 = 𝒑 𝒕 𝝔 𝒚 𝒙 = 𝒑 𝒖(𝒚|𝒙) (𝑪 learns the true data distribution.) 13
  • 14. Methods • KDGAN Training • The training speed is closely related to the variance of gradients. • The KDGAN framework by design → can reduce the variance of gradients!! → use a continuous space by relaxing the discrete samples!! 14
  • 17. Methods • KDGAN Training • Gumbel-Max trick 17https://hulk89.github.io/machine%20learning/2017/11/20/reparametrization-trick/ http://openresearch.ai/t/categorical-reparameterization-with-gumbel-softmax/386
  • 18. Results • Experiments • 𝛼 in [0.0, 1.0] • 𝛽 in [0.001, 1000] • 𝛾 in [0.0001, 100] • The temperature parameter 𝜏 → start with a large value (1.0) and exponentially decay it to a small value (0.1) 18
  • 19. Results • Deep Model Compression 19
  • 20. Results • Image Tag Recommendation 20
  • 21. 감 사 합 니 다 21