ResNeSt: Split-Attention Networks
Hwang seung hyun
Yonsei University Severance Hospital CCIDS
University of California, Davis & Amazon
CVPR 2020
2020.05.31
Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
ResNeSt
Introduction – Proposal
• Image classification models continue to advance, yet most downstream
applications still employ “ResNet” as the backbone network.
• NAS-derived models are usually not optimized for
training efficiency or memory usage.
• Recent image classification networks rely heavily on group or
depth-wise convolution. These methods do not transfer well to
other tasks because they lack cross-channel relationships.
[Figures: Depth-wise Convolution; Neural Architecture Search]
ResNeSt
Introduction – Contributions
• Explored a simple architectural modification of ResNet.
→ Requires no additional computation and is easy to adopt as a backbone for
other vision tasks.
• Set large-scale benchmarks on image classification and transfer learning applications.
→ Tested on image classification, object detection, instance segmentation, and
semantic segmentation.
• ResNeSt outperforms all existing ResNet variants at the same computational
efficiency, and even achieves better speed-accuracy trade-offs than SOTA NAS-derived
models.
Related Work
Multi-path and Feature-Map Attention
• Multi-path representation has shown
success in “GoogLeNet”.
• “ResNeXt” adopted group convolution in
the ResNet bottleneck block, which converts the
multi-path structure into a unified operation.
• “SE-Net” introduced a channel-attention
mechanism.
• “SK-Net” brings feature-map attention
across two network branches.
[Figures: Group Convolution; Inception Block; Squeeze-and-Excitation Block]
Related Work
• ResNeSt generalizes channel-wise attention into a feature-map
group representation.
[Figure: Split Attention]
Methods and Experiments
Split-Attention Networks
• Features are divided into several groups
- Cardinality hyperparameter: K
- Radix hyperparameter: R
- Total number of feature groups: G = RK
• Element-wise summation across multiple
splits → feature-map groups with the same
cardinality index but different radix indices
are fused together
• Global contextual information with
embedded channel-wise statistics is
gathered with global average pooling (GAP)
• Two consecutive FC layers are added to
predict the attention weights for each split
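The steps listed above can be sketched for a single cardinal group. This is a minimal pure-Python illustration, not the reference implementation: the weight matrices `w1` and `w2` are hypothetical stand-ins for the two FC layers, the ReLU between them and the softmax across the radix dimension are assumptions, and bias terms and normalization layers are left out.

```python
import math

def split_attention(splits, w1, w2):
    """Fuse R feature-map splits of one cardinal group.

    splits: R splits, each a list of C channels, each channel a flat list
            of H*W spatial values.
    w1:     hidden x C weights of the first FC layer (hypothetical).
    w2:     (R*C) x hidden weights of the second FC layer (hypothetical).
    """
    R, C = len(splits), len(splits[0])
    n = len(splits[0][0])

    # 1) Element-wise summation across the R splits.
    fused = [[sum(splits[r][c][i] for r in range(R)) for i in range(n)]
             for c in range(C)]

    # 2) Global average pooling: one statistic per channel.
    s = [sum(ch) / n for ch in fused]

    # 3) Two consecutive FC layers (assumed ReLU in between, no bias).
    hidden = [max(0.0, sum(w * x for w, x in zip(row, s))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    # 4) Softmax across the radix dimension, separately for each channel.
    #    logits[r * C + c] is the score of split r for channel c.
    attn = [[0.0] * C for _ in range(R)]
    for c in range(C):
        col = [logits[r * C + c] for r in range(R)]
        m = max(col)
        exps = [math.exp(v - m) for v in col]
        z = sum(exps)
        for r in range(R):
            attn[r][c] = exps[r] / z

    # 5) Attention-weighted sum of the splits.
    out = [[sum(attn[r][c] * splits[r][c][i] for r in range(R))
            for i in range(n)] for c in range(C)]
    return out, attn


K, R = 1, 2   # cardinality and radix
G = R * K     # total number of feature groups

splits = [
    [[1.0, 2.0], [3.0, 4.0]],   # split 0: C = 2 channels, 2 spatial positions
    [[5.0, 6.0], [7.0, 8.0]],   # split 1
]
w1 = [[0.1, 0.2]]                       # hidden size 1
w2 = [[0.0], [0.0], [0.0], [0.0]]       # zero weights give uniform attention
out, attn = split_attention(splits, w1, w2)
```

With all-zero second-layer weights, every split receives the same attention weight, so the output reduces to the plain average of the two splits.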
Methods and Experiments
Network Tweaks
• Average Downsampling
→ Zero padding is suboptimal for preserving spatial
information. Instead of using a strided
convolution at the transitioning block, use an average
pooling layer.
• Tweaks from ResNet-D
→ The first 7x7 convolutional layer is replaced with
three consecutive 3x3 layers, which have the same
receptive field size at a similar computational cost
→ A 2x2 average pooling layer is added to the shortcut
connection prior to the 1x1 convolutional layer for
the transitioning blocks.
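The receptive-field claim for the stem can be checked with a short calculation. The sketch below assumes unit stride in every layer, so the strides used in the actual stem are not modeled; `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Receptive field of a stack of layers given (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= stride              # stride multiplies the step between outputs
    return rf


single_7x7 = receptive_field([(7, 1)])                   # 7
three_3x3 = receptive_field([(3, 1), (3, 1), (3, 1)])    # 7
```

Passing non-unit strides to the same helper shows how downsampling inside the stack enlarges the receptive field of the later layers.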
Methods and Experiments
Training Strategy
• Large Mini-batch Distributed Training
→ Used cosine learning-rate scheduling and linearly scaled up the initial learning rate
with the mini-batch size B (η = B/256 × 0.1)
• Label Smoothing
• Auto Augmentation
→ Introduces 16 different types of image transformations and builds 24 different
combinations (policies) from them. One of the 24 policies is randomly chosen and applied
to each sample image during training
• Mixup Training
→ Trains on weighted combinations of random image pairs from the training data.
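Three of the items above reduce to short formulas. The sketch below is illustrative only: the function names are hypothetical, warm-up is omitted from the schedule, the smoothing factor `eps=0.1` is an assumed value, and the mixing weight `lam` is passed in explicitly (in mixup it is usually drawn from a Beta distribution).

```python
import math

def scaled_lr(batch_size, base_lr=0.1):
    """Linear scaling rule: eta = B / 256 * base_lr."""
    return batch_size / 256 * base_lr

def cosine_lr(step, total_steps, initial_lr):
    """Cosine decay from initial_lr at step 0 to 0 at total_steps (no warm-up)."""
    return 0.5 * initial_lr * (1.0 + math.cos(math.pi * step / total_steps))

def smooth_labels(label, num_classes, eps=0.1):
    """Label smoothing: spread eps of the probability mass uniformly over all classes."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
            for k in range(num_classes)]

def mixup(x_a, y_a, x_b, y_b, lam):
    """Mixup: convex combination of two samples and of their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


eta0 = scaled_lr(1024)                     # initial rate for a mini-batch of 1024
halfway = cosine_lr(50, 100, eta0)         # rate halfway through training
y_cat = smooth_labels(0, num_classes=4)    # smoothed one-hot for class 0
y_dog = smooth_labels(1, num_classes=4)
x_mix, y_mix = mixup([0.0, 1.0], y_cat, [1.0, 0.0], y_dog, lam=0.5)
```

The mixed label vector still sums to one, so it can be used directly as a soft target for a cross-entropy loss.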
Methods and Experiments
Training Strategy
• Large Crop Size
→ EfficientNet has demonstrated that increasing the input image size for a deeper and
wider network may give a better accuracy vs. FLOPS trade-off
→ Used crop sizes of 224 and 256 for the input image
• Regularization
→ Dropout with a probability of 0.2 is applied.
→ DropBlock layers are also applied to the convolutional layers at the last two stages of the
network; DropBlock is more effective than dropout for regularizing convolutional layers specifically.
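To make the difference from element-wise dropout concrete, here is a minimal sketch of the DropBlock idea on a single channel. It is a simplification under stated assumptions: the block centers are passed in explicitly through the hypothetical `drop_centers` argument (in training they would be sampled at random), `block_size` is assumed to be odd, and the rescaling by the kept fraction is one common normalization choice.

```python
def drop_block(feature, block_size, drop_centers):
    """Zero out block_size x block_size squares centered at drop_centers.

    feature:      2D list (H x W) holding one channel.
    drop_centers: iterable of (row, col) block centers (hypothetical input).
    """
    h, w = len(feature), len(feature[0])
    mask = [[1.0] * w for _ in range(h)]
    half = block_size // 2
    for r, c in drop_centers:
        # Drop a contiguous square, clipped at the feature-map border.
        for i in range(max(0, r - half), min(h, r + half + 1)):
            for j in range(max(0, c - half), min(w, c + half + 1)):
                mask[i][j] = 0.0
    kept = sum(map(sum, mask))
    # Rescale surviving activations to compensate for the dropped area.
    scale = (h * w) / kept if kept else 0.0
    return [[feature[i][j] * mask[i][j] * scale for j in range(w)]
            for i in range(h)]


feature = [[1.0] * 4 for _ in range(4)]
dropped = drop_block(feature, block_size=3, drop_centers=[(1, 1)])
```

Dropping a contiguous region removes spatially correlated activations together, which independent per-element dropout does not do.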
Methods and Experiments
Main Results – Image Classification
* Input sizes: ResNeSt-200 uses 256 x 256; ResNeSt-269 uses 320 x 320
* Bicubic upsampling is employed for input sizes greater than 256
* The results indicate that depth-wise convolution is not optimized for inference speed.
Methods and Experiments
Main Results – Ablation Studies
* Increasing the radix from 0 to 4 continuously improved the top-1 accuracy, while also
increasing latency and memory usage.
* The 2s1x64d setting was finally employed for a good trade-off between speed and accuracy.
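The setting name can be unpacked programmatically. The sketch below assumes the naming convention `<radix>s<cardinality>x<width>d`, which the slides do not spell out; `parse_setting` is a hypothetical helper.

```python
import re

def parse_setting(name):
    """Parse a name such as '2s1x64d' into (radix, cardinality, width)."""
    m = re.fullmatch(r"(\d+)s(\d+)x(\d+)d", name)
    if m is None:
        raise ValueError(f"unrecognized setting: {name!r}")
    radix, cardinality, width = map(int, m.groups())
    return radix, cardinality, width


radix, cardinality, width = parse_setting("2s1x64d")
groups = radix * cardinality   # G = RK feature groups
```

Under this reading, 2s1x64d corresponds to a radix of 2, a cardinality of 1, and a width of 64.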
Methods and Experiments
Main Results – Object Detection
* Tested on the MS-COCO validation set
Methods and Experiments
Main Results – Instance Segmentation
Methods and Experiments
Main Results – Semantic Segmentation
Conclusion
• ResNeSt proposes a novel Split-Attention block that
universally improves the learned feature representations and boosts
performance.
• In downstream tasks, simply switching the backbone network to
ResNeSt yields substantially better results.
• Depth-wise convolution is not optimal for training and inference
efficiency on GPUs.
• Model accuracy saturates on ImageNet with a fixed input image size.
• Increasing the input image size gives a better accuracy vs. FLOPS trade-off.
15

More Related Content

What's hot

Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Dongmin Choi
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Basit Rafiq
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)Sanjay Saha
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...Jinwon Lee
 
DeepWalk: Online Learning of Representations
DeepWalk: Online Learning of RepresentationsDeepWalk: Online Learning of Representations
DeepWalk: Online Learning of RepresentationsBryan Perozzi
 
Resnet for image processing (3)
Resnet for image processing (3)Resnet for image processing (3)
Resnet for image processing (3)devikarb
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowOswald Campesato
 
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural NetworksIRJET Journal
 
Real World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineReal World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineSrivatsan Srinivasan
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSangwoo Mo
 
Liver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TFLiver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TFWonjoongCheon
 
UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxNoorUlHaq47
 
A survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applicationsA survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applicationsJoseph Paul Cohen PhD
 
Relational knowledge distillation
Relational knowledge distillationRelational knowledge distillation
Relational knowledge distillationNAVER Engineering
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkFerdous ahmed
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...changedaeoh
 

What's hot (20)

Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]
 
lecun-01.ppt
lecun-01.pptlecun-01.ppt
lecun-01.ppt
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
Objects as points
Objects as pointsObjects as points
Objects as points
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
 
DeepWalk: Online Learning of Representations
DeepWalk: Online Learning of RepresentationsDeepWalk: Online Learning of Representations
DeepWalk: Online Learning of Representations
 
Resnet for image processing (3)
Resnet for image processing (3)Resnet for image processing (3)
Resnet for image processing (3)
 
Centernet
CenternetCenternet
Centernet
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlow
 
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural Networks
 
Real World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineReal World End to End machine Learning Pipeline
Real World End to End machine Learning Pipeline
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture Note
 
Liver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TFLiver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TF
 
UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptx
 
A survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applicationsA survey of deep learning approaches to medical applications
A survey of deep learning approaches to medical applications
 
Relational knowledge distillation
Relational knowledge distillationRelational knowledge distillation
Relational knowledge distillation
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
 

Similar to ResNeSt: Split-Attention Networks

Learning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted DropoutLearning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted DropoutSeunghyun Hwang
 
How useful is self-supervised pretraining for Visual tasks?
How useful is self-supervised pretraining for Visual tasks?How useful is self-supervised pretraining for Visual tasks?
How useful is self-supervised pretraining for Visual tasks?Seunghyun Hwang
 
Mix Conv: Mixed Depthwise Convolutional Kernels
Mix Conv: Mixed Depthwise Convolutional KernelsMix Conv: Mixed Depthwise Convolutional Kernels
Mix Conv: Mixed Depthwise Convolutional KernelsSeunghyun Hwang
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
 
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...ssuser4b1f48
 
[2020 CVPR Efficient DET paper review]
[2020 CVPR Efficient DET paper review][2020 CVPR Efficient DET paper review]
[2020 CVPR Efficient DET paper review]taeseon ryu
 
TIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdfTIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdfBoahKim2
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...Dongmin Choi
 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxssuser2624f71
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersSungchul Kim
 
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...dyyjkd
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentShaleen Kumar Gupta
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.Sunghoon Joo
 
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...Seunghyun Hwang
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdfFEG
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_papershanullah3
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptxZainULABIDIN496386
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionSafaa Alnabulsi
 

Similar to ResNeSt: Split-Attention Networks (20)

Learning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted DropoutLearning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted Dropout
 
How useful is self-supervised pretraining for Visual tasks?
How useful is self-supervised pretraining for Visual tasks?How useful is self-supervised pretraining for Visual tasks?
How useful is self-supervised pretraining for Visual tasks?
 
Mix Conv: Mixed Depthwise Convolutional Kernels
Mix Conv: Mixed Depthwise Convolutional KernelsMix Conv: Mixed Depthwise Convolutional Kernels
Mix Conv: Mixed Depthwise Convolutional Kernels
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
 
[2020 CVPR Efficient DET paper review]
[2020 CVPR Efficient DET paper review][2020 CVPR Efficient DET paper review]
[2020 CVPR Efficient DET paper review]
 
TIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdfTIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdf
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
 
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdf
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_paper
 
Mnist soln
Mnist solnMnist soln
Mnist soln
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit Recognition
 

More from Seunghyun Hwang

An annotation sparsification strategy for 3D medical image segmentation via r...
An annotation sparsification strategy for 3D medical image segmentation via r...An annotation sparsification strategy for 3D medical image segmentation via r...
An annotation sparsification strategy for 3D medical image segmentation via r...Seunghyun Hwang
 
Do wide and deep networks learn the same things? Uncovering how neural networ...
Do wide and deep networks learn the same things? Uncovering how neural networ...Do wide and deep networks learn the same things? Uncovering how neural networ...
Do wide and deep networks learn the same things? Uncovering how neural networ...Seunghyun Hwang
 
Deep Learning-based Fully Automated Detection and Quantification of Acute Inf...
Deep Learning-based Fully Automated Detection and Quantification of Acute Inf...Deep Learning-based Fully Automated Detection and Quantification of Acute Inf...
Deep Learning-based Fully Automated Detection and Quantification of Acute Inf...Seunghyun Hwang
 
Diagnosis of Maxillary Sinusitis in Water’s view based on Deep learning model
Diagnosis of Maxillary Sinusitis in Water’s view based on Deep learning model Diagnosis of Maxillary Sinusitis in Water’s view based on Deep learning model
Diagnosis of Maxillary Sinusitis in Water’s view based on Deep learning model Seunghyun Hwang
 
Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...
Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...
Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...Seunghyun Hwang
 
End-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersEnd-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersSeunghyun Hwang
 
Deep Generative model-based quality control for cardiac MRI segmentation
Deep Generative model-based quality control for cardiac MRI segmentation Deep Generative model-based quality control for cardiac MRI segmentation
Deep Generative model-based quality control for cardiac MRI segmentation Seunghyun Hwang
 
Segmenting Medical MRI via Recurrent Decoding Cell
Segmenting Medical MRI via Recurrent Decoding CellSegmenting Medical MRI via Recurrent Decoding Cell
Segmenting Medical MRI via Recurrent Decoding CellSeunghyun Hwang
 
Progressive learning and Disentanglement of hierarchical representations
Progressive learning and Disentanglement of hierarchical representationsProgressive learning and Disentanglement of hierarchical representations
Progressive learning and Disentanglement of hierarchical representationsSeunghyun Hwang
 
A Simple Framework for Contrastive Learning of Visual Representations
A Simple Framework for Contrastive Learning of Visual RepresentationsA Simple Framework for Contrastive Learning of Visual Representations
A Simple Framework for Contrastive Learning of Visual RepresentationsSeunghyun Hwang
 
DeepStrip: High Resolution Boundary Refinement
DeepStrip: High Resolution Boundary RefinementDeepStrip: High Resolution Boundary Refinement
DeepStrip: High Resolution Boundary RefinementSeunghyun Hwang
 
Your Classifier is Secretly an Energy based model and you should treat it lik...
Your Classifier is Secretly an Energy based model and you should treat it lik...Your Classifier is Secretly an Energy based model and you should treat it lik...
Your Classifier is Secretly an Energy based model and you should treat it lik...Seunghyun Hwang
 
A Probabilistic U-Net for Segmentation of Ambiguous Images
A Probabilistic U-Net for Segmentation of Ambiguous ImagesA Probabilistic U-Net for Segmentation of Ambiguous Images
A Probabilistic U-Net for Segmentation of Ambiguous ImagesSeunghyun Hwang
 
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisLarge Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisSeunghyun Hwang
 

More from Seunghyun Hwang (14)

An annotation sparsification strategy for 3D medical image segmentation via r...
An annotation sparsification strategy for 3D medical image segmentation via r...An annotation sparsification strategy for 3D medical image segmentation via r...
An annotation sparsification strategy for 3D medical image segmentation via r...
 
Do wide and deep networks learn the same things? Uncovering how neural networ...
Do wide and deep networks learn the same things? Uncovering how neural networ...Do wide and deep networks learn the same things? Uncovering how neural networ...
Do wide and deep networks learn the same things? Uncovering how neural networ...
 
Deep Learning-based Fully Automated Detection and Quantification of Acute Inf...
Deep Learning-based Fully Automated Detection and Quantification of Acute Inf...Deep Learning-based Fully Automated Detection and Quantification of Acute Inf...
Deep Learning-based Fully Automated Detection and Quantification of Acute Inf...
 
Diagnosis of Maxillary Sinusitis in Water’s view based on Deep learning model
Diagnosis of Maxillary Sinusitis in Water’s view based on Deep learning model Diagnosis of Maxillary Sinusitis in Water’s view based on Deep learning model
Diagnosis of Maxillary Sinusitis in Water’s view based on Deep learning model
 
Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...
Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...
Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...
 
End-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersEnd-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
 
Deep Generative model-based quality control for cardiac MRI segmentation
Deep Generative model-based quality control for cardiac MRI segmentation Deep Generative model-based quality control for cardiac MRI segmentation
Deep Generative model-based quality control for cardiac MRI segmentation
 
Segmenting Medical MRI via Recurrent Decoding Cell
Segmenting Medical MRI via Recurrent Decoding CellSegmenting Medical MRI via Recurrent Decoding Cell
Segmenting Medical MRI via Recurrent Decoding Cell
 
Progressive learning and Disentanglement of hierarchical representations
Progressive learning and Disentanglement of hierarchical representationsProgressive learning and Disentanglement of hierarchical representations
Progressive learning and Disentanglement of hierarchical representations
 
A Simple Framework for Contrastive Learning of Visual Representations
A Simple Framework for Contrastive Learning of Visual RepresentationsA Simple Framework for Contrastive Learning of Visual Representations
A Simple Framework for Contrastive Learning of Visual Representations
 
DeepStrip: High Resolution Boundary Refinement
DeepStrip: High Resolution Boundary RefinementDeepStrip: High Resolution Boundary Refinement
DeepStrip: High Resolution Boundary Refinement
 
Your Classifier is Secretly an Energy based model and you should treat it lik...
Your Classifier is Secretly an Energy based model and you should treat it lik...Your Classifier is Secretly an Energy based model and you should treat it lik...
Your Classifier is Secretly an Energy based model and you should treat it lik...
 
A Probabilistic U-Net for Segmentation of Ambiguous Images
A Probabilistic U-Net for Segmentation of Ambiguous ImagesA Probabilistic U-Net for Segmentation of Ambiguous Images
A Probabilistic U-Net for Segmentation of Ambiguous Images
 
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisLarge Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image Synthesis
 

ResNeSt: Split-Attention Networks

  • 1. ResNeSt: Split-Attention Networks. Hwang seung hyun, Yonsei University Severance Hospital CCIDS. University of California, Davis & Amazon, CVPR 2020. 2020.05.31
  • 2. Contents: 01 Introduction, 02 Related Work, 03 Methods and Experiments, 04 Conclusion
  • 3. Introduction – Proposal
    • While image classification models have continued to advance, most downstream applications still employ “ResNet” as the backbone network.
    • NAS-derived models are usually not optimized for training efficiency or memory usage.
    • Recent image classification networks have focused more on group or depth-wise convolution, but these methods do not transfer well to other tasks (no cross-channel relationships).
    • Figures: Depth-wise Convolution; Neural Architecture Search
  • 4. Introduction – Contributions
    • Explored a simple architectural modification of ResNet.
      → Requires no additional computation and is easy to adopt as a backbone for other vision tasks.
    • Set large-scale benchmarks on image classification and transfer learning applications.
      → Tested on image classification, object detection, instance segmentation, and semantic segmentation.
    • ResNeSt outperforms all existing ResNet variants at the same computational efficiency, and even achieves better speed-accuracy trade-offs than SOTA NAS-derived models.
  • 5. Related Work: Multi-path and Feature-Map Attention
    • Multi-path representation has shown success in “GoogLeNet”.
    • “ResNeXt” adopted group convolution in the ResNet bottleneck block, which converts the multi-path structure into a unified operation.
    • “SE-Net” introduced a channel-attention mechanism.
    • “SK-Net” brings feature-map attention across two network branches.
    • Figures: Group Convolution; Inception Block; Squeeze-and-Excitation Block
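To make the group-convolution bullet concrete, here is a minimal sketch that counts convolution weights for different group settings. The function name and the 64-channel, 3x3 example are illustrative assumptions, not values from the slides; bias terms are ignored.

```python
def conv_weight_count(c_in, c_out, kernel_size, groups=1):
    """Number of weights (bias excluded) in a 2D convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    per_group = (c_in // groups) * (c_out // groups) * kernel_size * kernel_size
    return groups * per_group


# Hypothetical layer sizes, chosen only for illustration.
dense = conv_weight_count(64, 64, 3)                 # ordinary convolution
grouped = conv_weight_count(64, 64, 3, groups=32)    # ResNeXt-style grouping
depthwise = conv_weight_count(64, 64, 3, groups=64)  # one filter per channel
```

With `groups` equal to the channel count (depth-wise convolution), each output channel sees a single input channel, which is the "no cross-channel relationships" limitation noted in the introduction.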
  • 6. Related Work
    • ResNeSt generalizes channel-wise attention into a feature-map group representation.
    • Figure: Split Attention
  • 7. Methods and Experiments: Split-Attention Networks
    • Features are divided into several groups
      - Cardinality hyperparameter: K
      - Radix hyperparameter: R
      - Total number of feature groups: G = RK
    • Element-wise summation across multiple splits
      → Feature-map groups with the same cardinality index but different radix indices are fused together.
    • Global contextual information with embedded channel-wise statistics can be gathered with global average pooling (GAP).
    • Two consecutive FC layers are added to predict the attention weights for each split.
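The steps on this slide can be sketched for a single cardinal group in plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: batch normalization is omitted, the weight matrices `w1` and `w2` are hypothetical names supplied by the caller, and the attention is normalized with a softmax across the R splits (a sigmoid when R = 1).

```python
import math


def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def split_attention(splits, w1, w2):
    """Fuse the R splits of one cardinal group.

    splits: list of R feature maps, each indexed [channel][row][col]
    w1:     first FC layer, shape [hidden][channels]      (hypothetical name)
    w2:     second FC layer, shape [R * channels][hidden] (hypothetical name)
    Returns (fused_output, attention) with attention indexed [split][channel].
    """
    radix = len(splits)
    channels = len(splits[0])
    height, width = len(splits[0][0]), len(splits[0][0][0])

    # 1. Element-wise summation across the splits.
    summed = [[[sum(s[c][i][j] for s in splits) for j in range(width)]
               for i in range(height)] for c in range(channels)]

    # 2. Global average pooling: one statistic per channel.
    pooled = [sum(sum(row) for row in summed[c]) / (height * width)
              for c in range(channels)]

    # 3. Two consecutive FC layers (ReLU in between; batch norm omitted).
    hidden = [max(0.0, h) for h in matvec(w1, pooled)]
    logits = matvec(w2, hidden)  # split r owns logits[r*channels:(r+1)*channels]

    # 4. Per channel: softmax over the splits (sigmoid when there is one split).
    attention = [[0.0] * channels for _ in range(radix)]
    for c in range(channels):
        column = [logits[r * channels + c] for r in range(radix)]
        if radix == 1:
            attention[0][c] = 1.0 / (1.0 + math.exp(-column[0]))
            continue
        peak = max(column)
        exps = [math.exp(x - peak) for x in column]
        total = sum(exps)
        for r in range(radix):
            attention[r][c] = exps[r] / total

    # 5. Attention-weighted combination of the original splits.
    fused = [[[sum(attention[r][c] * splits[r][c][i][j] for r in range(radix))
               for j in range(width)] for i in range(height)]
             for c in range(channels)]
    return fused, attention
```

With K cardinal groups, the same routine would be applied to each group independently and the K outputs concatenated along the channel axis, giving G = RK feature groups in total.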
  • 8. Methods and Experiments: Network Tweaks
    • Average Downsampling
      → Zero padding is suboptimal for preserving spatial information. Instead of using strided convolution at the transitioning block, use an average pooling layer.
    • Tweaks from ResNet-D
      → The first 7x7 convolutional layer is replaced with three consecutive 3x3 layers, which have the same receptive field size at a similar computational cost.
      → A 2x2 average pooling layer is added to the shortcut connection prior to the 1x1 convolutional layer for the transitioning blocks.
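The receptive-field claim for the ResNet-D stem can be checked with the standard recurrence for stacked convolutions. The sketch below assumes stride 1 in every layer; the helper name is illustrative, and strided layers in a real stem would change the result.

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutions.

    layers: list of (kernel_size, stride) tuples, first layer first.
    """
    field, jump = 1, 1
    for kernel_size, stride in layers:
        # Each layer widens the field by (kernel - 1) input-pixel steps of size `jump`.
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


single_7x7 = receptive_field([(7, 1)])
three_3x3 = receptive_field([(3, 1), (3, 1), (3, 1)])
```

Under the stride-1 assumption both calls return the same value, which matches the "same receptive field size" statement on the slide.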
  • 9. Methods and Experiments: Training Strategy
    • Large Mini-batch Distributed Training
      → Used cosine scheduling and linearly scaled up the initial learning rate with the mini-batch size B: η = (B / 256) × 0.1
    • Label Smoothing
    • Auto Augmentation
      → Introduces 16 different types of image transformations and builds 24 different combinations (policies) of those transformations. One of the 24 policies is randomly chosen and applied to each sample image during training.
    • Mixup Training
      → Weighted combinations of random image pairs from the training data.
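Several of these training components are small enough to sketch directly. The code below is an illustration under assumptions: the function names are hypothetical, the label-smoothing strength `eps=0.1` and the mixup parameter `alpha=0.2` are common defaults rather than values stated on the slide, and images are represented as flat lists of floats.

```python
import math
import random


def scaled_initial_lr(batch_size, base_lr=0.1, base_batch=256):
    """Linear scaling rule from the slide: lr = B / 256 * 0.1."""
    return batch_size / base_batch * base_lr


def cosine_lr(step, total_steps, initial_lr):
    """Cosine schedule that decays from initial_lr to 0 over total_steps."""
    return 0.5 * initial_lr * (1.0 + math.cos(math.pi * step / total_steps))


def smooth_labels(target, num_classes, eps=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution (eps is assumed)."""
    uniform = eps / num_classes
    return [uniform + (1.0 - eps if k == target else 0.0)
            for k in range(num_classes)]


def mixup(x1, y1, x2, y2, alpha=0.2, lam=None):
    """Mixup: convex combination of two samples and their label vectors.

    lam is drawn from Beta(alpha, alpha) unless given explicitly (alpha is assumed).
    """
    if lam is None:
        lam = random.betavariate(alpha, alpha)
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y, lam
```

For example, `scaled_initial_lr(1024)` gives the starting rate for a mini-batch of 1024, which `cosine_lr` then decays over the run; `mixup` would typically be applied to the smoothed label vectors rather than to hard class indices.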
  • 10. Methods and Experiments: Training Strategy
    • Large Crop Size
      → EfficientNet has demonstrated that increasing the input image size for a deeper and wider network may give a better trade-off between accuracy and FLOPS.
      → Used different crop sizes for the input image: 224 and 256.
    • Regularization
      → Dropout with a probability of 0.2 is applied.
      → DropBlock layers are also applied to the convolutional layers at the last two stages of the network; DropBlock is more effective than dropout for regularizing convolutional layers specifically.
  • 11. Methods and Experiments: Main Results – Image Classification
  • 12. Methods and Experiments: Main Results – Image Classification
    * ResNeSt-200: 256 x 256, ResNeSt-269: 320 x 320
    * Bicubic upsampling is employed for input sizes greater than 256.
    * The results indicate that depth-wise convolution is not optimized for inference speed.
  • 13. Methods and Experiments: Main Results – Ablation Studies
    * Increasing the radix from 0 to 4 continuously improved top-1 accuracy, while also increasing latency and memory usage.
    * The 2s1x64d setting was finally employed as a good trade-off between speed and accuracy.
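The slide does not spell out the setting notation. Assuming it follows the `<radix>s<cardinality>x<width>d` convention, so that 2s1x64d means radix 2, cardinality 1, and width 64, a small hypothetical helper can decode such strings:

```python
import re


def parse_setting(tag):
    """Decode a setting string such as '2s1x64d' (assumed format: <radix>s<cardinality>x<width>d)."""
    match = re.fullmatch(r"(\d+)s(\d+)x(\d+)d", tag)
    if match is None:
        raise ValueError(f"unrecognized setting string: {tag!r}")
    radix, cardinality, width = (int(g) for g in match.groups())
    return {"radix": radix, "cardinality": cardinality, "width": width}


chosen = parse_setting("2s1x64d")
```

Under this reading, a radix of 0 corresponds to the baseline block without Split-Attention, which is why the ablation starts from 0.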
  • 14. Methods and Experiments: Main Results – Object Detection
    * Tested on the MS-COCO validation set
  • 15. Methods and Experiments: Main Results – Instance Segmentation
  • 16. Methods and Experiments: Main Results – Semantic Segmentation
  • 17. Conclusion
    • The ResNeSt architecture proposes a novel Split-Attention block that universally improves the learned feature representations to boost performance.
    • In downstream tasks, simply switching the backbone network to ResNeSt showed substantially better results.
    • Depth-wise convolution is not optimal for training and inference efficiency on GPU.
    • Model accuracy saturates on ImageNet with a fixed input image size.
    • Increasing the input image size can give a better accuracy and FLOPS trade-off.