Bag of tricks for image classification
with convolutional neural networks
Choi Dongmin
Yonsei University Severance Hospital CCIDS
Contents
• Abstract

• Introduction

• Training Procedures

• Efficient Training

• Model Tweaks

• Training Refinements

• Transfer Learning

• Conclusion
Abstract
• Focus on training refinements, such as data augmentations and optimization methods

• Tips on how to apply these refinements

• Raised ResNet-50's top-1 accuracy from 73.5% to 79.29% on ImageNet

• These improvements also lead to better transfer learning performance
Introduction
• Image classification performance has been raised with various architectures including AlexNet, VGG, ResNet, DenseNet and NASNet.

• However, these advancements did not solely come from improved model architectures
- Training refinements (ex. loss functions, data preprocessing, optimization) have also played a major role
Introduction
Training Procedures
• 1. Baseline Training Procedure

• 2. Experiment Results
Training Procedures
1. Baseline Training Procedure - During Training
• (1) Randomly sample an image and decode it into 32-bit floating point raw pixel values in [0, 255].

• (2) Randomly crop a rectangular region whose aspect ratio is randomly sampled in [3/4, 4/3] and area randomly sampled in [8%, 100%], then resize the cropped region into a 224-by-224 square image.

• (3) Flip horizontally with 0.5 probability.

• (4) Scale hue, saturation, and brightness with coefficients uniformly drawn from [0.6, 1.4].

• (5) Add PCA noise with a coefficient sampled from a normal distribution N(0, 0.1).

• (6) Normalize RGB channels by subtracting 123.68, 116.779, 103.939 and dividing by 58.393, 57.12, 57.375, respectively.
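For reference, a minimal torchvision sketch of steps (1)-(6). This is an assumption-laden translation, not the paper's code (which used MXNet/Gluon): ToTensor rescales pixels to [0, 1], so the means and std devs are divided by 255, and torchvision has no built-in PCA noise, so step (5) is only marked.

    from torchvision import transforms

    # RGB means / std devs from step (6), rescaled because ToTensor maps pixels to [0, 1]
    MEAN = [123.68 / 255, 116.779 / 255, 103.939 / 255]
    STD = [58.393 / 255, 57.12 / 255, 57.375 / 255]

    train_transform = transforms.Compose([
        # (2) aspect ratio in [3/4, 4/3], area in [8%, 100%], resized to 224x224
        transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(3/4, 4/3)),
        # (3) horizontal flip with 0.5 probability
        transforms.RandomHorizontalFlip(p=0.5),
        # (4) approximates the [0.6, 1.4] scaling of hue/saturation/brightness
        transforms.ColorJitter(brightness=0.4, saturation=0.4, hue=0.4),
        # (1) decode to 32-bit floats (in [0, 1] here rather than [0, 255])
        transforms.ToTensor(),
        # (5) PCA noise omitted: torchvision has no built-in equivalent
        # (6) per-channel normalization
        transforms.Normalize(mean=MEAN, std=STD),
    ])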
Training Procedures
1. Baseline Training Procedure - During Validation
• Resize each image's shorter edge to 256 pixels while keeping its aspect ratio

• Crop out a 224-by-224 region in the center

• Normalize RGB channels as in training

• No random data augmentations
Training Procedures
1. Baseline Training Procedure - Weights Initialization
• Convolutional and fully-connected layers : Xavier algorithm

• All biases : 0

• For batch normalization : γ vectors initialized to 1, β vectors to 0
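A sketch of this initialization in PyTorch (a hypothetical helper; the paper's implementation is in MXNet):

    import torch.nn as nn

    def init_weights(model: nn.Module) -> None:
        for m in model.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.xavier_uniform_(m.weight)   # Xavier algorithm
                if m.bias is not None:
                    nn.init.zeros_(m.bias)          # all biases : 0
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)             # gamma vectors : 1
                nn.init.zeros_(m.bias)              # beta vectors : 0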
Training Procedures
1. Baseline Training Procedure - Optimizer and Hyper-parameters
• Optimizer : Nesterov Accelerated Gradient (NAG) descent

• Epochs : 120

• Batch size : 256

• GPU : 8 NVIDIA V100 GPUs

• Learning rate : 0.1, divided by 10 at epochs 30, 60, and 90
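The same recipe as a PyTorch sketch (momentum 0.9 is an assumption; model is assumed to be defined elsewhere):

    import torch

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, nesterov=True)   # NAG descent
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[30, 60, 90], gamma=0.1)         # decay at epochs 30/60/90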
Training Procedures
2. Experiment Results
Efļ¬cient Training
ā€¢ 1. Large-batch Training

ā€¢ 2. Low-precision Training

ā€¢ 3. Experiment Result
Efļ¬cient Training
1. Large-batch Training
ā€¢ It is more eļ¬ƒcient to use larger batch size

ā€¢ But using large batch size decreases convergence rate for convex
problems and degrades validation accuracy.

ā€¢ Four heuristics that help scale the batch size upā€Ø
- Linear scaling learning rateā€Ø
- Learning rate warmupā€Ø
- Zero ā€Ø
- No bias decay
Ī³
Efļ¬cient Training
1. Large-batch Training : Linear Scaling Learning Rate
ā€¢ A large batch size reduces the noise in the gradient (= reduces variance)

ā€¢ It is possible to increase the learning rate with a large batch size

ā€¢ In Goyal et al*, linearly increasing the learning rate with the batch size
works empirically for ResNet-50 training.

ā€¢ In He et al**, set the initial learning rate to 0.1 x b/256 (b = batch size)
Goyal et al*, Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR, abs/1706.02677, 2017
He et al**, Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770ā€“778, 2016
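A one-line sketch of this rule (a hypothetical helper, not code from the paper):

    def scaled_lr(batch_size: int, base_lr: float = 0.1, base_batch: int = 256) -> float:
        # Linear scaling rule: 0.1 x b/256
        return base_lr * batch_size / base_batch

    scaled_lr(1024)  # -> 0.4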
Efļ¬cient Training
1. Large-batch Training : Learning Rate Warmup
ā€¢ Initial network parameters are typically far away from the ļ¬nal solutionā€Ø
- Using a too large learning rate may result in numerical instability

ā€¢ Using a small learning rate at the beginning and thenā€Ø
switch back to the initial learning rate when the training process is stable

ā€¢ In Goyal et al*, set the learning rate as belowā€Ø
- Use the ļ¬rst batches to warm upā€Ø
- Initial learning rate : ā€Ø
- At batch , the learning rate :
m
Ī·
i(1 ā‰¤ i ā‰¤ m) iĪ·/m
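A minimal sketch of this linear warmup (a hypothetical helper):

    def warmup_lr(i: int, m: int, eta: float) -> float:
        # Linearly ramp the learning rate from eta/m up to eta over the first m batches
        return eta * i / m if i <= m else eta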
Efļ¬cient Training
1. Large-batch Training : Zero Ī³
ā€¢ In batch normalization, the output is ā€Ø
( : standardized input / : learnable parameters initialized to 1s and 0s)

ā€¢ Zero initialization heuristicā€Ø
- Initialize for all BN layers that sit at the end of a residual blockā€Ø
- All residual blocks just return their inputs [ output = + block ]ā€Ø
- Mimics network that has less number of layers and is easier to trainā€Ø
at the initial stage
Ī³ Ģ‚x + Ī²
Ģ‚x x Ī³, Ī²
Ī³
Ī³ = 0
x (x)
https://mc.ai/bag-of-tricks-for-image-classiļ¬cation-with-convolutional-neural-networks-in-keras/
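A sketch of the heuristic for torchvision ResNets, assuming bn3/bn2 is the last BN of each residual branch (this mirrors, but is not, the paper's MXNet code):

    import torch.nn as nn
    import torchvision

    def zero_init_last_bn(model: nn.Module) -> None:
        # Set gamma = 0 in the last BN of every residual block,
        # so each block initially returns its input.
        for m in model.modules():
            if isinstance(m, torchvision.models.resnet.Bottleneck):
                nn.init.zeros_(m.bn3.weight)
            elif isinstance(m, torchvision.models.resnet.BasicBlock):
                nn.init.zeros_(m.bn2.weight)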
Efļ¬cient Training
1. Large-batch Training : No bias decay
ā€¢ Weight decay is often applied to all learnable parameters in order to ā€Ø
avoid overļ¬tting

ā€¢ According to Jia et al*, itā€™s recommended to only apply weight decay to
weights (not to bias)

ā€¢ That is, other parameters (biases and and in BN layers) are left
unregularized
Ī³ Ī²
Jia et al*, Highly scalable deep learning training system with mixed-precision: Training imagenet in four minutes. arXiv preprint arXiv:1807.11205, 2018. ā€Ø
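A sketch of the corresponding optimizer parameter groups (a hypothetical helper; the 1-D test catches both biases and BN γ/β):

    import torch.nn as nn

    def no_bias_decay_groups(model: nn.Module, weight_decay: float = 1e-4):
        # Apply weight decay only to conv/linear weights;
        # biases and BN gamma/beta (all 1-D tensors) stay unregularized.
        decay, no_decay = [], []
        for p in model.parameters():
            (no_decay if p.ndim == 1 else decay).append(p)
        return [{"params": decay, "weight_decay": weight_decay},
                {"params": no_decay, "weight_decay": 0.0}]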
Efļ¬cient Training
2. Low-precision training
ā€¢ Neural networks are commonly trained with 32-bit-ļ¬‚oating-point (FP32)

ā€¢ However, new hardware support lower precision data types

ā€¢ 2 to 3 times faster after switchingā€Ø
from FP32 to FP16 on V100

ā€¢ Micikevicius et al* suggestedā€Ø
Mixed precision trainingā€Ø
( Forward and backward in FP16,ā€Ø
Weight update in FP32)
Micikevicius et al*, Mixed precision training. arXiv preprint arXiv:1710.03740, 2017.
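A minimal mixed-precision loop with torch.cuda.amp, assuming model, criterion, optimizer, and loader are defined elsewhere (a sketch of the idea, not the paper's implementation):

    import torch

    scaler = torch.cuda.amp.GradScaler()
    for images, labels in loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():      # forward pass runs in FP16 where safe
            loss = criterion(model(images), labels)
        scaler.scale(loss).backward()        # scale loss to avoid FP16 gradient underflow
        scaler.step(optimizer)               # unscale and update FP32 master weights
        scaler.update()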
Efļ¬cient Training
3. Experiment Results
Model Tweaks
• A minor adjustment to the network architecture, such as changing the stride

• Such a tweak often barely changes the computational complexity but might have a non-negligible effect on accuracy
Model Tweaks
Original ResNet-50
Input Stem
Downsampling Block
Model Tweaks
Original Downsampling Block
ResNet-B Downsampling Block
Using a kernel size of 1x1 with a stride of 2
→ ignores three-quarters of the input feature map
First appeared in a Torch implementation
Model Tweaks
Original Input Stem
ResNet-C Input Stem
A 7 x 7 convolution is 5.4 times more expensive than a 3 x 3 convolution
Originally proposed in Inception-v2
Model Tweaks
Original Downsampling Block
ResNet-D Downsampling Block
Path B also ignores 3/4 of the input feature maps because of the stride size
Proposed model tweak
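A sketch of the ResNet-D fix for path B: average-pool first, then convolve with stride 1, so no activations are skipped (a hypothetical PyTorch module, not the paper's code):

    import torch.nn as nn

    def resnet_d_shortcut(in_ch: int, out_ch: int) -> nn.Sequential:
        # 2x2 average pooling with stride 2 does the downsampling, so the
        # following 1x1 convolution can use stride 1 and see every input value.
        return nn.Sequential(
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )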
Model Tweaks
Training Reļ¬nements
ā€¢ 1. Cosine Learning Rate Decay

ā€¢ 2. Label Smoothing

ā€¢ 3. Knowledge Distillation

ā€¢ 4. Mixup Training 

ā€¢ 5. Experiment Results
Training Reļ¬nements
1. Cosine Learning Rate Decay
ā€¢ Cosine Annealing Strategy proposed by Loshchilov et al*
Ī·t =
1
2 (
1 + cos
(
tĻ€
T ))
Ī·
- : Learning rate at batch
- : The number of total batches
Ī·t t
T
Loshchilov et al*, SGDR: stochastic gradient de- scent with restarts. CoRR, abs/1608.03983, 2016.
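A direct translation of the formula (a hypothetical helper):

    import math

    def cosine_lr(t: int, T: int, eta: float) -> float:
        # eta_t = 1/2 * (1 + cos(t * pi / T)) * eta
        return 0.5 * (1 + math.cos(t * math.pi / T)) * eta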
Training Reļ¬nements
1. Cosine Learning Rate Decay
Remains large ā†’ Potentially improves training process
Training Reļ¬nements
2. Label Smoothing Proposed by Inception-v2*
Inception-v2* : C.Szegedy et al. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.
Ground Truth Probability :
Optimal Solution :
- : the number of labelsK
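A sketch of a cross-entropy loss with these smoothed targets (a hypothetical PyTorch helper):

    import torch
    import torch.nn.functional as F

    def label_smoothing_loss(logits: torch.Tensor, target: torch.Tensor,
                             eps: float = 0.1) -> torch.Tensor:
        # Smoothed targets: 1 - eps on the true class, eps/(K - 1) on the others.
        K = logits.size(-1)
        log_probs = F.log_softmax(logits, dim=-1)
        smooth = torch.full_like(log_probs, eps / (K - 1))
        smooth.scatter_(-1, target.unsqueeze(-1), 1.0 - eps)
        return -(smooth * log_probs).sum(dim=-1).mean()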
Training Reļ¬nements
2. Label Smoothing Proposed by Inception-v2*
Inception-v2* : C.Szegedy et al. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.
centers at the theoretical value
fewer extreme values
Training Reļ¬nements
3. Knowledge Distillation
G Hinton et al. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015. ā€Ø
Image
Teacher Modelā€Ø
(Large; ex. ResNet-152)
Student Modelā€Ø
(Small; ex. ResNet-50)
r
z
Output
Transfer
loss(p, softmax(z)) + T2
loss(softmax(r/T), softmax(z/T))
To improve accuracy of the student model while keeping its complexity the same
ā€» : the temperature hyper-parameter to make the softmax outputs smootherT
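A sketch of this objective in PyTorch, assuming cross-entropy for the hard term and KL divergence for the soft term:

    import torch
    import torch.nn.functional as F

    def distillation_loss(z: torch.Tensor, r: torch.Tensor,
                          target: torch.Tensor, T: float = 20.0) -> torch.Tensor:
        # Hard-label cross-entropy on the student output z ...
        hard = F.cross_entropy(z, target)
        # ... plus T^2-weighted divergence between temperature-softened
        # teacher (r) and student (z) distributions.
        soft = F.kl_div(F.log_softmax(z / T, dim=-1),
                        F.softmax(r / T, dim=-1), reduction="batchmean")
        return hard + (T ** 2) * soft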
Training Reļ¬nements
4. Mixup Training
H Zhang et al. mixup: Beyond empirical risk minimization. CoRR, abs/1710.09412, 2017.
https://hoya012.github.io/blog/Bag-of-Tricks-for-Image-Classiļ¬cation-with-Convolutional-Neural-Networks-Review/
Only use , generated by a weighted linear interpolation of two example( Ģ‚x, Ģ‚y) (xi, yi), (xj, yj)
drawn from the Beta distribution(Ī±, Ī±)
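A batch-level sketch of mixup, assuming one-hot labels so they can be interpolated the same way as the images:

    import numpy as np
    import torch

    def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
        # Blend each example with a randomly chosen partner from the same batch.
        lam = float(np.random.beta(alpha, alpha))   # lambda ~ Beta(alpha, alpha)
        idx = torch.randperm(x.size(0))
        x_hat = lam * x + (1 - lam) * x[idx]
        y_hat = lam * y + (1 - lam) * y[idx]        # y must be one-hot / soft labels
        return x_hat, y_hat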
Training Reļ¬nements
5. Experiment Results
- for label smoothing
- for knowledge distillation
- for mixup training
Ļµ = 0.1
T = 20
Ī± = 0.2
Transfer Learning
Object Detection
Transfer Learning
Semantic Segmentation
Thank you
Yonsei University Severance Hospital CCIDS