Neural Network Pruning
with Residual-Connections and Limited-Data
Dong Min Choi
Yonsei University Severance Hospital CCIDS
Introduction
• Filter level pruning

- an effective method to accelerate the inference speed

- Problems

1) how to prune residual connection?

2) how to prune with limited data? (direct pruning is usually less accurate than fine-tuning)
S Han et al. Learning both Weights and Connections for Efficient Neural Networks. arXiv:1506.02626
Introduction
• Pruning residual connection

- Most methods only prune filters inside the residual connection, leaving the
number of output channels unchanged

- The pruned block then becomes an hourglass (the middle layers are handicapped)

- Therefore, pruning channels both inside and outside the block is preferable

(the block either stays a bottleneck or takes an opened-wallet shape)
Introduction
• Pruning residual connection

- The advantages of wallet structure compared with hourglass

1) more accurate thanks to a larger pruning space

2) faster even with the same number of FLOPs

3) saves more storage because more weights are pruned
Introduction
• Pruning with limited data

- Method 1 : Fine-tuning

- Method 2 : directly prune the model without the large dataset



⇨ Method 2 usually has a significantly lower accuracy than Method 1
Introduction
CURL (Compression Using Residual-connections and Limited-data)
• Pruning Residual Connection

- prune not only channels inside the residual branch, but also channels of its output
activation maps (both the identity branch and the residual branch)

- The resulting wallet-shaped structure has advantages over the hourglass structure

• Pruning with limited data

- Combining data augmentation and knowledge distillation

- A label refinement strategy



Method
1. Pruning Residual-Connections
• Most previous studies only focus on reducing channels inside the residual block

• To prune the residual block, a new criterion that can evaluate multiple filters simultaneously should be designed
Method
1. Pruning Residual-Connections
• Idea : to (1) remove the channels one by one
and (2) calculate the information loss

- (1) : set γ and β of the BN layers to 0

- (2) : randomly select 256 images from the training
dataset and compare the similarity of the two prediction
probabilities (before and after removal) using KL-divergence
Let’s reduce the output channels !
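The scoring idea above can be sketched on a toy model. This is a minimal illustration, not the authors' implementation: the tiny linear "network", the data, and the helper names (`predict`, `channel_importance`) are hypothetical, and the KL direction (original prediction vs. masked prediction) is an assumption because the slide does not state it.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def predict(features, gamma, beta, weights):
    """Toy model: per-channel scale/shift (stand-in for BN), then a linear classifier."""
    scaled = [g * f + b for f, g, b in zip(features, gamma, beta)]
    logits = [sum(w * s for w, s in zip(row, scaled)) for row in weights]
    return softmax(logits)

def channel_importance(batch, gamma, beta, weights):
    """Score each channel by the mean KL divergence caused by zeroing its gamma and beta."""
    scores = []
    for c in range(len(gamma)):
        g_masked, b_masked = list(gamma), list(beta)
        g_masked[c] = 0.0  # "remove" channel c
        b_masked[c] = 0.0
        loss = 0.0
        for x in batch:
            p = predict(x, gamma, beta, weights)        # original prediction
            q = predict(x, g_masked, b_masked, weights)  # prediction without channel c
            loss += kl_divergence(p, q)
        scores.append(loss / len(batch))
    return scores

# Hypothetical toy data: 3 channels, 2 classes. Channel 2 has zero classifier weight.
gamma = [1.0, 1.0, 1.0]
beta = [0.0, 0.0, 0.0]
weights = [[2.0, -1.0, 0.0],
           [-2.0, 1.0, 0.0]]
batch = [[1.0, 0.5, 3.0], [0.2, 1.5, -1.0], [-1.0, 0.3, 2.0]]

scores = channel_importance(batch, gamma, beta, weights)
print(scores)
```

In this toy setup the third channel never reaches the classifier, so masking it leaves the prediction unchanged and its score is zero, while the other two channels get positive scores.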
Method
1. Pruning Residual-Connections
• Idea : to (1) remove the channels one by one
and (2) calculate the information loss
• Repeat this step 256 times, resulting in 256 importance scores, one for each channel

• For channels inside the residual block, only one filter needs to be erased at each step

• The top k least important filters are then removed, leading to a pruned small model
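The selection step can be sketched as follows. This assumes that a small score (little information loss) marks an unimportant channel; the function names and the example scores are hypothetical.

```python
def select_channels_to_prune(scores, k):
    """Return the indices of the k channels with the smallest importance scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:k])

def prune(values, removed):
    """Drop the entries of a per-channel list whose index is in `removed`."""
    removed = set(removed)
    return [v for i, v in enumerate(values) if i not in removed]

# Hypothetical importance scores (mean KL divergence per channel).
scores = [0.40, 0.02, 0.35, 0.00, 0.10]
removed = select_channels_to_prune(scores, k=2)

# Hypothetical per-channel BN scales; the pruned channels are dropped.
gamma = [1.1, 0.9, 1.0, 0.8, 1.2]
pruned_gamma = prune(gamma, removed)

print(removed)       # [1, 3]
print(pruned_gamma)  # [1.1, 1.0, 1.2]
```

In a real network the same indices would have to be removed from every tensor that shares the channel, which for an output channel of a residual block includes both the identity branch and the residual branch.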
Method
2. Prune with Limited Data
• The pruned small model is then fine-tuned on the target dataset

• Fine-tuning

- Data augmentation

- Knowledge distillation
Method
2. Prune with Limited Data
• Data Augmentation
Motivation : Most discriminative information often lies in local image patches rather than global information
Method
2. Prune with Limited Data
• Label Refinement

- Fine-tuning & Knowledge Distillation (KD)

- Problem : Because the teacher model has not seen the new data, its output (logits) may be noisy

- Update the noisy logits during training via SGD

- Two Steps

1) Fine-tuning on original small dataset with KD plus mixup

2) Fine-tuning on expanded dataset with label refinement
Method
2. Prune with Limited Data
• Label Refinement

Step 1) Knowledge Distillation with Mixup

- A new input via mixup : 

- Knowledge distillation w/ the new input : 



- With these two techniques, the pruned small model can converge
to a good local minimum
https://blog.airlab.re.kr/2019/11/mixup
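The two formulas on this slide did not survive extraction, so the sketch below only shows the general pattern. The mixup step uses the standard blend with a Beta-distributed weight; the distillation loss (mean squared error between student and teacher logits) is an assumption, not necessarily the loss on the slide. The toy linear models and all weights are hypothetical.

```python
import random

def mixup(x_i, x_j, alpha=1.0, rng=random):
    """Blend two inputs with a weight lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    return mixed, lam

def linear_logits(x, weights):
    """Toy model: one linear layer producing one logit per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def distillation_loss(student_logits, teacher_logits):
    """Assumed loss: mean squared error between student and teacher logits."""
    n = len(student_logits)
    return sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits)) / n

# Hypothetical teacher (large model) and student (pruned model) weights.
teacher_w = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]]
student_w = [[0.8, -0.9, 0.0], [-0.7, 1.1, 0.0]]

rng = random.Random(0)
x_mixed, lam = mixup([1.0, 0.0, 2.0], [0.0, 1.0, -2.0], alpha=1.0, rng=rng)

# The teacher labels the mixed input; the student is trained to match it.
loss = distillation_loss(linear_logits(x_mixed, student_w),
                         linear_logits(x_mixed, teacher_w))
print(lam, x_mixed, loss)
```

Because the teacher produces the target for the mixed input directly, no ground-truth label is needed for the blended image.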
Method
2. Prune with Limited Data
• Label Refinement

Step 2) Knowledge Distillation with Label Refinement

- Fine-tune the small model on the expanded dataset and update the
logits to remove label noise

- The soft target of each image is extracted first and stored

- The soft targets are then updated via SGD
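One way to picture the soft-target update is a gradient step on the stored target itself. The sketch below assumes a squared-error loss between student logits and the stored target; the slide does not give the loss, and the store, image id, and learning rate are hypothetical.

```python
def refine_soft_target(target, student_logits, lr=0.1):
    """One SGD step on the stored soft target under the assumed loss
    0.5 * sum((student - target)^2); its gradient w.r.t. target is (target - student)."""
    return [t - lr * (t - s) for t, s in zip(target, student_logits)]

# Hypothetical store: image id -> teacher logits extracted before fine-tuning.
soft_targets = {"img_0": [2.0, -1.0, 0.5]}
initial = list(soft_targets["img_0"])

# Hypothetical student output for the same image at the current training step.
student_logits = [1.0, 0.0, 0.5]

soft_targets["img_0"] = refine_soft_target(soft_targets["img_0"], student_logits, lr=0.1)
print(soft_targets["img_0"])
```

Under this assumption each step moves the stored target a small distance toward the student's current output, so a noisy teacher logit is corrected gradually rather than overwritten.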
Experiments
1. Pruning ResNet50 on ImageNet
* Actual inference speed test (on an NVIDIA Tesla M40 GPU) with a mini-batch of 256

- Hourglass (AutoPruner) : 0.21s

- Wallet (CURL, MACs : 1.39G, #Params : 7.83M) : 0.19s
Experiments
2. Pruning on Small-scale Datasets
Ablation Studies
1. Impact of Fine-Tuning Strategy
Ablation Studies
2. Impact of Pruning Criterion
Ablation Studies
3. Impact of Label Refinement
Thank you

My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Neural network pruning with residual connections and limited-data review [cdm]

  • 1. Neural Network Pruning with Residual-Connections and Limited-Data Dong Min Choi Yonsei University Severance Hospital CCIDS
  • 2. Introduction • Filter-level pruning
 - an effective method to accelerate inference
 - Problems
 1) How should residual connections be pruned?
 2) Pruning with limited data (pruning is usually less accurate than fine-tuning)
 S. Han et al. Learning both Weights and Connections for Efficient Neural Networks. arXiv:1506.02626
  • 3. Introduction • Pruning residual connection
 - Most methods prune only the filters inside the residual connection, leaving the number of output channels unchanged
 - The pruned block becomes an hourglass (the middle layers are handicapped)
 - Therefore, pruning channels both inside and outside the block is preferred
 (the block keeps a bottleneck shape, or resembles an opened wallet)
  • 4. Introduction • Pruning residual connection
 - Advantages of the wallet structure over the hourglass
 1) more accurate, thanks to a larger pruning space
 2) faster, even with the same number of FLOPs
 3) smaller in storage, because more weights are pruned
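
To make the hourglass and wallet shapes concrete, here is a minimal sketch that counts the convolution weights of a 1x1 -> 3x3 -> 1x1 bottleneck block. The function name and all channel widths are hypothetical values chosen for illustration; they are not taken from the slides, and biases and BN parameters are ignored.

```python
def bottleneck_weights(c_out: int, c_mid: int) -> int:
    """Convolution weights in a 1x1 -> 3x3 -> 1x1 bottleneck block.

    Assumes the block input has c_out channels (identity shortcut)
    and ignores biases and batch-norm parameters.
    """
    reduce = c_out * c_mid            # 1x1 conv: block input -> middle
    spatial = 3 * 3 * c_mid * c_mid   # 3x3 conv: middle -> middle
    expand = c_mid * c_out            # 1x1 conv: middle -> block output
    return reduce + spatial + expand


# Hypothetical widths, for illustration only.
original = bottleneck_weights(c_out=256, c_mid=64)
hourglass = bottleneck_weights(c_out=256, c_mid=32)  # only the inside is pruned
wallet = bottleneck_weights(c_out=128, c_mid=32)     # the output channels are pruned too

print(original, hourglass, wallet)
```

With the same middle width, the wallet variant can also shrink the two 1x1 convolutions, which is one way to read the "larger pruning space" point above. In a real network the output width is shared by every block that adds into the same identity path, so it cannot be chosen per block as freely as in this sketch.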
  • 5. Introduction • Pruning with limited data
 - Method 1 : Fine-tuning
 - Method 2 : directly prune the model without the large dataset
 
 ⇨ Method 2 usually has a significantly lower accuracy than Method 1
  • 6. Introduction CURL (Compression Using Residual-connections and Limited-data) • Pruning Residual Connection
 - prune not only channels inside the residual branch, but also channels of its output activation maps (both the identity branch and the residual branch)
 - The resulting wallet-shaped structure shows more advantages 
 • Pruning with limited data
 - Combining data augmentation and knowledge distillation
 - A label refinement strategy
 

  • 7. Method 1. Pruning Residual-Connections • Most previous studies focus only on reducing channels inside the residual block
 • To prune the residual block, a new criterion that can evaluate multiple filters simultaneously should be designed
  • 8. Method 1. Pruning Residual-Connections • Idea: (1) remove the channels one by one and (2) calculate the information loss
 - (1): set γ and β of the BN layers to 0
 - (2): randomly select 256 images from the training dataset and compare the two prediction probabilities (before and after removal) using KL-divergence
 Let's reduce the output channels!
  • 10. Method 1. Pruning Residual-Connections • Idea: (1) remove the channels one by one and (2) calculate the information loss
 • Repeat this step 256 times, resulting in 256 importance scores, one for each channel
 Let's reduce the output channels!
  • 11. Method 1. Pruning Residual-Connections • Idea: (1) remove the channels one by one and (2) calculate the information loss
 • Repeat this step 256 times, resulting in 256 importance scores, one for each channel
 • For channels inside the residual block, only one filter needs to be erased at each step
 • The top k filters will be removed, leading to a pruned small model
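
The scoring loop described in Method 1 can be sketched as follows. This is a toy NumPy stand-in, not the authors' implementation: the "network" is a single linear layer followed by a BN-style scale and shift, ReLU and a linear classifier, and all names, sizes and random weights are hypothetical. It also assumes that a lower KL-divergence means less information is lost, so the lowest-scoring channels are pruned first.

```python
import numpy as np

EPS = 1e-12  # avoids log(0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict(x, w1, gamma, beta, w2):
    """Toy model: linear -> BN-style normalization -> ReLU -> linear -> softmax."""
    h = x @ w1
    h = (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-5)
    h = np.maximum(gamma * h + beta, 0.0)
    return softmax(h @ w2)


def channel_scores(x, w1, gamma, beta, w2):
    """KL(original prediction || prediction with one channel zeroed), per channel."""
    p = predict(x, w1, gamma, beta, w2)
    scores = np.zeros(gamma.shape[0])
    for c in range(gamma.shape[0]):
        g, b = gamma.copy(), beta.copy()
        g[c] = 0.0  # step (1): set gamma and beta of channel c to 0
        b[c] = 0.0
        q = predict(x, w1, g, b, w2)
        kl = np.sum(p * (np.log(p + EPS) - np.log(q + EPS)), axis=1)
        scores[c] = kl.mean()  # step (2): information loss on the sampled images
    return scores


rng = np.random.default_rng(0)
n_images, n_in, n_channels, n_classes = 256, 32, 8, 10  # 256 images, as on the slide
x = rng.normal(size=(n_images, n_in))
w1 = rng.normal(size=(n_in, n_channels))
w2 = rng.normal(size=(n_channels, n_classes))
gamma = rng.uniform(0.5, 1.5, size=n_channels)
beta = rng.normal(scale=0.1, size=n_channels)
gamma[0] = 0.0  # channel 0 is already dead, so removing it loses nothing
beta[0] = 0.0

scores = channel_scores(x, w1, gamma, beta, w2)
k = 2
pruned = np.argsort(scores)[:k]  # assumption: the k lowest scores are removed
print(scores.round(4), pruned)
```

Zeroing γ and β rather than deleting the filter keeps all tensor shapes intact, so every channel can be scored with the same forward pass.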
  • 12. Method 2. Prune with Limited Data • The pruned small model is then fine-tuned on the target dataset
 • Fine-tuning
 - Data augmentation
 - Knowledge distillation
  • 13. Method 2. Prune with Limited Data • Data Augmentation
 - Motivation: the most discriminative information often lies in local image patches rather than in global information
  • 14. Method 2. Prune with Limited Data • Label Refinement
 - Fine-tuning & Knowledge Distillation (KD)
 - Problem : Because the teacher model has not seen the new data,
 its output (logits) may be noisy
 - Update the noisy logits during training via SGD
 - Two Steps
 1) Fine-tuning on original small dataset with KD plus mixup
 2) Fine-tuning on expanded dataset with label refinement
  • 15. Method 2. Prune with Limited Data • Label Refinement
 Step 1) Knowledge Distillation with Mixup
 - A new input is created via mixup
 - Knowledge distillation is applied with the new input
 - With these two techniques, the pruned small model can converge to a good local minimum
 https://blog.airlab.re.kr/2019/11/mixup
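
Step 1 can be sketched as below. The mixup rule is the standard convex combination of two inputs. The distillation loss is the common KL form between teacher and student softmax outputs, used here as an assumption because the exact loss on the slide is not preserved in this transcript. The linear "teacher" and "student", the tensor sizes and the Beta(1, 1) parameter are hypothetical.

```python
import numpy as np


def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mixup(x_a, x_b, lam):
    """Convex combination of two input batches."""
    return lam * x_a + (1.0 - lam) * x_b


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over the batch (assumed loss form)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)
    return float(kl.mean())


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 8, 8))   # a batch of 4 tiny stand-in "images"
lam = rng.beta(1.0, 1.0)            # mixing coefficient in [0, 1]
x_mixed = mixup(x, x[::-1], lam)    # pair each image with another one from the batch

# Hypothetical linear teacher and student, evaluated on the mixed input.
flat = x_mixed.reshape(len(x_mixed), -1)
w_teacher = rng.normal(scale=0.1, size=(flat.shape[1], 5))
w_student = rng.normal(scale=0.1, size=(flat.shape[1], 5))
loss = distillation_loss(flat @ w_student, flat @ w_teacher)
print(round(loss, 4))
```

Because the teacher labels the mixed image itself, no ground-truth label is needed for the new input, which fits the limited-data setting.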
  • 16. Method 2. Prune with Limited Data • Label Refinement
 Step 2) Knowledge Distillation with Label Refinement
 - Fine-tune the small model on the expanded dataset and update the logits to remove label noise
 - The soft target of each image is first extracted and stored
 - The soft targets are then updated via SGD
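
One way to picture the soft-target update is to treat the stored soft targets as trainable values that receive their own gradient step. The sketch below assumes a squared-error loss between student logits and stored targets, which is an illustrative choice and not confirmed by the slides. The function name, learning rate and data are hypothetical, and the student output is held fixed here, whereas in training it would change as well.

```python
import numpy as np


def refine_soft_targets(soft_targets, student_logits, lr):
    """One SGD step on the stored soft targets.

    Assumed loss per image: 0.5 * ||student_logits - soft_targets||^2,
    whose gradient with respect to the soft targets is
    (soft_targets - student_logits).
    """
    grad = soft_targets - student_logits
    return soft_targets - lr * grad


rng = np.random.default_rng(0)
student = rng.normal(size=(6, 5))  # stand-in student logits, held fixed
stored = student + rng.normal(scale=0.5, size=student.shape)  # noisy stored targets

before = np.linalg.norm(stored - student)
for _ in range(10):
    stored = refine_soft_targets(stored, student, lr=0.1)
after = np.linalg.norm(stored - student)
print(before, after)
```

Each step shrinks the gap between stored target and student output by a factor of (1 - lr), so noisy targets drift toward what the model has learned instead of staying fixed at the teacher's first guess.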
  • 17. Experiments 1. Pruning ResNet50 on ImageNet
 * Actual inference speed test (on an NVIDIA Tesla M40 GPU) for a mini-batch of 256
 - Hourglass (AutoPruner): 0.21 s
 - Wallet (CURL, MACs: 1.39G, #Params: 7.83M): 0.19 s
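
The two timings can be turned into throughput and a relative speedup with a few lines of arithmetic. This sketch assumes that "256" is the number of images in one mini-batch and that each timing covers one such batch.

```python
batch_size = 256      # assumption: images per mini-batch
hourglass_s = 0.21    # AutoPruner (hourglass), seconds per mini-batch
wallet_s = 0.19       # CURL (wallet), seconds per mini-batch

hourglass_ips = batch_size / hourglass_s   # images per second
wallet_ips = batch_size / wallet_s
speedup = hourglass_s / wallet_s

print(f"{hourglass_ips:.0f} vs {wallet_ips:.0f} images/s, speedup {speedup:.3f}x")
```

Under that assumption the wallet model is roughly 10% faster than the hourglass model.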
  • 18. Experiments 2. Pruning on Small-scale Datasets
  • 19. Ablation Studies 1. Impact of Fine-Tuning Strategy
  • 20. Ablation Studies 2. Impact of Pruning Criterion
  • 21. Ablation Studies 3. Impact of Label Refinement