Network Recasting: A Universal Method
for Network Architecture Transformation
Speaker: Joonsang Yu
Speaker
2
Joonsang Yu
Education
• Dept. of EE at POSTECH (B.S.)
• Dept. of ECE at SNU (Ph.D. student)
• Advisor: Prof. Kiyoung Choi
Research Interests
• Hardware-friendly DL optimization
• Efficient hardware for DL
Publications
• Hardware: ICCD, DAC, ISOCC
• Machine learning: AAAI
• Related works
• Network Recasting
• Training Methods
• Experiments
• Conclusion
Outline
3
Related Works
Related Works
5
Hardware architecture
Intel Skylake architecture [1]
NVIDIA Turing architecture [2]
• Traditional computer architectures are not efficient for DNNs.
• NVIDIA introduced Tensor Cores to accelerate DNNs.
Related Works
6
DL accelerator
DianNao architecture [3] ZeNa architecture [4]
• To accelerate neural networks, several dedicated accelerators have also been introduced.
• DNNs consist of simple operations (multiply-accumulate, MAC), so they are easy to accelerate.
• In addition, pruning makes conditional memory access possible.
Related Works
7
Network architecture
Big-Little architecture [5]
ShuffleNet v2 architecture [6]
• Many network architectures have been introduced to improve performance.
• In addition, much research also focuses on lightweight,
low-computation CNN architectures.
Related Works
8
Compression (pruning)
Example of pruning method: ThiNet [7]
• Pruning-based network compression methods have been introduced.
• After training, we can remove weak weights or filters.
Related Works
9
Compression (distillation)
Knowledge distillation [8]
• By distilling knowledge from a cumbersome model, a small network can
achieve higher accuracy than with conventional training.
Related Works
10
Compression (distillation)
Deep mutual learning [9]
• By distilling knowledge from a cumbersome model, a small network can
achieve higher accuracy than with conventional training.
Related Works
11
Compression (distillation)
A gift from knowledge distillation [10]
• In addition, knowledge distillation enables reducing network depth and
transforming the network architecture.
Network Recasting
Network Recasting
13
Network Recasting
• We transform pretrained blocks (source) into new blocks (target).
• The transformation is done by training the target block to generate output
activations (i.e., feature maps) similar to those of the source block.
• After training, the source block can be replaced with the target block.
Basic concept of network recasting.
Teacher network Student network
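As a rough illustration of this training step, here is a minimal PyTorch sketch (the appendix notes the method was implemented in PyTorch). The names here (recast_block, feature_extractor) are hypothetical, not the authors' actual API:

import torch
import torch.nn as nn

def recast_block(source_block, target_block, feature_extractor, data_loader,
                 epochs=1, lr=1e-3):
    # Train the target block to reproduce the source block's output
    # activations (feature maps) under an MSE loss; the source stays frozen.
    source_block.eval()
    target_block.train()
    optimizer = torch.optim.Adam(target_block.parameters(), lr=lr)
    for _ in range(epochs):
        for images, _ in data_loader:           # labels are not needed here
            x = feature_extractor(images)       # activations entering the block
            with torch.no_grad():
                reference = source_block(x)     # teacher-side activations
            loss = nn.functional.mse_loss(target_block(x), reference)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return target_block

After training, target_block can stand in for source_block in the student network.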
Network Recasting
14
Network Recasting
• Select an arbitrary block and recast it.
Network Recasting
15
Network Recasting
• Train the target block to generate the output activations of the source block.
Network Recasting
16
Network Recasting
• Replace the source block with the trained target block.
Network Recasting
17
Network Recasting
• We can use the network after recasting.
“Dog”
Image via wikipedia
Network Recasting
18
Network Recasting
• The source block can be recast into any kind of block.
Network Recasting
19
Network Recasting
Source Target
Network Recasting
22
Network Recasting
• We can recast an arbitrary source block into an arbitrary target block.
Source Target
Network Recasting
23
Network Recasting
Mixed-architecture network
Teacher Student
• We can recast all of the blocks or only some of them.
DenseNet ConvNet
Training Methods
24
Mixed-architecture network
• When we recast partially, we obtain a mixed-architecture network.
• The mixed-architecture network has the advantages of both kinds of constituent blocks.
Mixed-architecture network
Image via wikipedia
Bottom Top
Training Methods
Training Methods
26
Block Training
• To avoid the dimension mismatch problem when training a target block, we
train it together with the next block, approximating the
output activations of the next block.
256-d 64-d
Training Methods
27
Block Training
• To avoid the dimension mismatch problem when training a target block, we
train it together with the next block, approximating the
output activations of the next block.
Dimension mismatch!
256-d 64-d
Training Methods
28
Block Training
• To avoid the dimension mismatch problem when training a target block, we
train it together with the next block, approximating the
output activations of the next block.
Dimension mismatch!
256-d 64-d
256-d 256-d
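When the target block changes the channel dimension (e.g., 256-d to 64-d), the MSE cannot be computed at the block boundary, so it is computed one block later, where both paths are 256-d again. A hedged sketch of this loss; the helper names are assumptions, and student_next_block is a trainable copy of the teacher's next block:

def block_training_loss(source_block, teacher_next_block,
                        target_block, student_next_block, x):
    # Compare activations after the *next* block, where teacher and student
    # dimensions agree again (256-d vs. 256-d), instead of at the mismatched
    # 256-d vs. 64-d boundary. Gradients flow into both student-side blocks.
    with torch.no_grad():
        reference = teacher_next_block(source_block(x))    # frozen teacher path
    approximation = student_next_block(target_block(x))    # trainable student path
    return nn.functional.mse_loss(approximation, reference)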
Training Methods
29
Sequential Recasting
• To recast the entire network, we recast the blocks sequentially.
Teacher Student
Training Methods
32
Sequential Recasting
• Sequential recasting can alleviate the vanishing gradient problem.
Teacher Student
Training Methods
33
Sequential Recasting
• Sequential recasting can alleviate the vanishing gradient problem.
Teacher Student
Gradient path
is very long
Training Methods
34
Sequential Recasting
• Sequential recasting can alleviate the vanishing gradient problem.
Teacher Student
Very short
gradient path
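Putting the pieces together, a sketch of the sequential procedure, reusing the hypothetical recast_block from the earlier sketch. Each step trains only one target block, so no gradient ever has to travel through the whole network; whether each step reads its inputs from the student prefix (as here) or from the teacher is an implementation detail the slides do not specify:

def sequential_recasting(teacher_blocks, target_blocks, data_loader):
    # Recast blocks front to back; the already-recast prefix of the student
    # feeds each step, and every gradient path is only one block long.
    student_blocks = list(teacher_blocks)            # start from the teacher
    for i, target in enumerate(target_blocks):
        prefix = nn.Sequential(*student_blocks[:i]).eval()
        recast_block(student_blocks[i], target,
                     feature_extractor=prefix, data_loader=data_loader)
        student_blocks[i] = target                   # source replaced by target
    return nn.Sequential(*student_blocks)

Stopping this loop early, or recasting only selected indices, yields the mixed-architecture network described above.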
Training Methods
37
Fine-tuning
• After finishing sequential recasting, we use the knowledge distillation
approach to fine-tune the student network.
• We train the student network with the logits of the teacher network and the
ground truth.
MSE loss for the logits; cross-entropy loss between the given label and the softmax output.
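A minimal sketch of this fine-tuning loss; the balancing weight alpha is an assumption, since the slide does not specify how the two terms are combined:

def finetune_loss(student_logits, teacher_logits, labels, alpha=0.5):
    # MSE between student and teacher logits (distillation term), plus
    # cross-entropy between the softmax output and the ground-truth label.
    kd_term = nn.functional.mse_loss(student_logits, teacher_logits)
    ce_term = nn.functional.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term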
Experiments
Experiments
39
Experiments
• Filter reduction
• Vanishing gradient problem
• Actual speed-up on GPU
Experiments
40
Experiments
✓ Filter reduction
• Vanishing gradient problem
• Actual speed-up on GPU
Experiments
41
Filter reduction (Compression)
• Recast a given source block into a smaller target block of the same type.
• Network recasting automatically removes redundant filters while reconstructing
the output activations of the source block.
Source Target
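For instance, a convolutional block might be recast into a narrower block of the same type; the widths below (64 to 48 filters) are purely illustrative. The resulting channel mismatch is absorbed by training together with the next block, as described earlier:

source = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1),
                       nn.BatchNorm2d(64), nn.ReLU())   # pretrained source block
target = nn.Sequential(nn.Conv2d(3, 48, 3, padding=1),
                       nn.BatchNorm2d(48), nn.ReLU())   # narrower target block
# recast_block(source, target, ...) then drops the redundancy implicitly:
# the 48 target filters learn to reconstruct the downstream activations,
# with no explicit filter-similarity or importance criterion.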
Experiments
42
Visualization of Filter Reduction
• We recast the first block of AlexNet to visualize the filter reduction.
• Our method can remove redundant filters without any explicit similarity or
effectiveness criteria.
Visualization of filters in the first layer of AlexNet
Experiments
43
Experiments
• Filter reduction
✓ Vanishing gradient problem
• Actual speed-up on GPU
Experiments
44
Vanishing gradient
• We compare network recasting with knowledge distillation and plain backpropagation.
KD & Backprop Network recasting
Gradient
path
Gradient
path
Experiments
45
Vanishing gradient
• Our method achieved much higher accuracy despite using a deep plain network.

Method       Type   # layers   C10+    C100+
ResNet-56
  Baseline    -      56         7.02   30.89
  Recasting   Conv   29         6.75   32.14
  KD          Conv   29         9.43   33.22
  Backprop    Conv   29        10.61   37.85

Recasting results on the CIFAR datasets (C10+/C100+: error rate, %, on CIFAR-10/CIFAR-100).
Experiments
46
Experiments
• Filter reduction
• Vanishing gradient problem
✓ Actual speed-up on GPU
Experiments
47
Activation load
• Generally, 1x1 convolutions are used to reduce the number of multiplications and parameters.
• However, 1x1 convolutions actually increase activation loads from main
memory, and thus inference time.
Comparison of # multiplications. Comparison of inference time.
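A back-of-the-envelope illustration of this trade-off, with assumed dimensions (a 56x56, 256-channel feature map and a 64-channel bottleneck): the bottleneck needs roughly 8x fewer multiplications than a plain 3x3 block, but writes about 1.5x more activation data per block:

H = W = 56          # spatial size (assumed)
C, mid = 256, 64    # block width and bottleneck width (assumed)

mults_bottleneck = H * W * (C * mid + 9 * mid * mid + mid * C)  # 1x1 -> 3x3 -> 1x1
mults_plain      = H * W * (9 * C * C)                          # single 3x3 conv
acts_bottleneck  = H * W * (mid + mid + C)  # three feature maps written per block
acts_plain       = H * W * C                # one feature map written per block

print(mults_bottleneck / mults_plain)       # ~0.12: far fewer multiplications
print(acts_bottleneck / acts_plain)         # 1.5: more activation traffic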
Experiments
48
Activation load
• To reduce the activation load, we recast the source block into a different block type.
• By transforming the network architecture, we can reduce the inference time.
Smaller
activation
Source Target
Experiments
49
Recasting Results on ILSVRC2012
Recasting results on ILSVRC2012 (batch size 64, NVIDIA Titan X (Pascal); Top-1: error rate, %).

Method            Type          Top-1   Act/batch          Time/batch
ResNet-50
  Baseline         Bottle       23.85     740.48M (1.0x)   107.17ms (1.0x)
  Recasting(C)     Conv         30.74     161.92M (4.6x)    37.21ms (2.9x)
  Recasting(C+R)   Conv+Bottle  25.00     236.16M (3.1x)    49.97ms (2.1x)
DenseNet-121
  Baseline         Dense        25.57   1,057.28M (1.0x)   111.31ms (1.0x)
  Recasting(R)     Basic        26.42     340.48M (3.1x)    81.17ms (1.4x)
  Recasting(R+D)   Basic+Bottle 24.87     585.60M (1.8x)    88.94ms (1.3x)

Basic: basic residual block. Bottle: bottleneck residual block.
Experiments
53
Previous works
• Many previous works use weight/filter pruning to reduce the number of multiplications and parameters.
• The network architecture is not changed, so many 1x1 convolutions still exist.
• Thus, activation loads remain large.
Limitation of weight/filter pruning.
Experiments
54
Comparison with Previous Works
• Compared with previous works, network recasting achieved the lowest error
rate and the highest actual speed-up.
Comparison with previous works. (batch size is 64, NVIDIA Titan X (pascal))
Conclusion
Conclusion
58
• Network recasting enables transforming a network into a different
architecture type.
• Sequential training of the student network gives better results by
alleviating the vanishing gradient problem.
• Network recasting can remove redundant filters and also
accelerate inference effectively.
→ We achieved up to a 2.1x inference time reduction on ResNet-50.
→ We also achieved up to a 3.2x reduction on VGG-16.
Question
59
Thank you!
If you have questions, please contact me:
shorm21@dal.snu.ac.kr
Reference
60
• [1] https://wccftech.com/idf15-intel-skylake-analysis-cpu-gpu-microarchitecture-ddr4-memory-impact/3/
• [2] https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/
• [3] Chen, Tianshi, et al. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine learning. In ASPLOS, 2014.
• [4] Kim, Dongyoung, et al. ZeNA: Zero-aware neural network accelerator. IEEE Design & Test 35.1 (2018): 39-46.
• [5] Chen, Chun-Fu, et al. Big-Little Net: An efficient multi-scale feature representation for visual and speech recognition. In ICLR, 2019.
• [6] Ma, Ningning, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In ECCV, 2018.
• [7] Luo, J.-H., et al. ThiNet: A filter level pruning method for deep neural network compression. In ICCV, 2017.
• [8] Hinton, G., et al. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2014.
• [9] Zhang, Ying, et al. Deep mutual learning. In CVPR, 2018.
• [10] Yim, Junho, et al. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In CVPR, 2017.
Appendix
Parameter & Activation load
Appendix
Inference time
Appendix
63
Block Training
Block training method.
Dimension mismatch!
256-d 64-d
256-d 256-d
A: activation
W_T, W_S: parameters of the teacher and the student
N: number of elements in the activation
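The loss formula itself did not survive extraction from this slide. Given the legend above and the MSE objective described in the main slides, it is presumably the per-element mean squared error between teacher and student activations, along the lines of:

$\mathcal{L}(W_S) = \frac{1}{N}\,\lVert A(W_T) - A(W_S) \rVert_2^2$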
Appendix
64
Experimental Setup
• The network recasting was implemented on the PyTorch framework.
• We adopted batch normalization for all networks.
• We used the Xavier initializer in all experiments.
• We used SGD with Nesterov momentum to train the teacher network and used
Adam optimizer for the network recasting.
• We used the pre-trained ResNet-50, DenseNet-121, and VGG-16 models available
from torchvision.
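A sketch of this setup in PyTorch; all hyperparameter values (learning rates, momentum) are assumptions, since the slide names only the optimizers and the initializer:

import torch
import torch.nn as nn
from torchvision import models

teacher = models.resnet50(pretrained=True)      # pre-trained model from torchvision

target_block = nn.Conv2d(64, 64, 3, padding=1)  # an example target block
nn.init.xavier_uniform_(target_block.weight)    # Xavier initialization

# SGD with Nesterov momentum for training the teacher, Adam for recasting.
teacher_optimizer = torch.optim.SGD(teacher.parameters(), lr=0.1,
                                    momentum=0.9, nesterov=True)
recast_optimizer = torch.optim.Adam(target_block.parameters(), lr=1e-3)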
Appendix
Mixed-architecture Network
Appendix
66
Recasting Results on CIFAR
[Table lost in extraction: recasting results on the CIFAR datasets (error rates, %).]