Visions of Scaling: An Empirical Study of Training and Regularization Methods for Vision Models
PR-373
NeurIPS 2021
주성훈, VUNO Inc.
2022. 2. 27.
1. Research Background
The performance of a vision model
• The paper points out that new architectures trained with the latest training methodology are often compared against older architectures trained with outdated methodology.
• Three factors jointly determine performance: Architecture, Training methods, Scaling strategy.
Objective
• Provide new perspectives and practical advice on scaling vision architectures.
• Apply modern training and regularization techniques to ResNet, together with longer training (more epochs).
• Re-scale ResNet along depth, width, and resolution: ResNet-RS.
Previous works
• Architecture
  • VGG, ResNet, Inception, ResNeXt
  • Architectures found via automated architecture search [67 (NASNet), 41 (AmoebaNet: 83.9%), 55 (EfficientNet-B7: 84.4%, 2019)]
  • Adapting self-attention to the visual domain: AA-ResNet-152 (79.1%, 2019), ViT-L/16 (87.76±0.03%, 2020), LambdaResNet200 (84.3%, 2021)
Previous works
• Architecture - EfficientNet (compound scaling)
  • Shortcomings this paper points out: few training epochs, and a baseline network (EfficientNet-B0) designed by optimizing FLOPs
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML 2019 (PR-169)
Previous works
• Training/regularization methodology
  • Innovations in training (e.g. improved learning rate schedules)
  • Regularization methods: dropout, label smoothing, stochastic depth, DropBlock, data augmentation
  • Regularization methods have become especially useful to prevent overfitting when training increasingly large models on limited data (e.g. the 1.2M ImageNet images).
  • (Figure: label smoothing, from https://arxiv.org/pdf/2011.12562.pdf)
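To make one of these regularizers concrete, here is a minimal label smoothing sketch in PyTorch. The smoothing value eps=0.1 matches the setting used later in the deck, but the function itself is our own illustration (recent PyTorch also exposes this via the label_smoothing argument of nn.CrossEntropyLoss).

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross-entropy with label smoothing: the one-hot target is mixed
    with a uniform distribution over all classes."""
    log_probs = F.log_softmax(logits, dim=-1)
    # (1 - eps) weight on the true class...
    nll = -log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
    # ...and eps spread uniformly over the K classes.
    smooth = -log_probs.mean(dim=-1)
    return ((1.0 - eps) * nll + eps * smooth).mean()

# Usage: logits from a classifier, integer class targets.
logits = torch.randn(8, 1000)
targets = torch.randint(0, 1000, (8,))
loss = label_smoothing_ce(logits, targets, eps=0.1)
```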
Previous works
• Additional training data: pre-training on large-scale datasets
  • NFNets: 89.2% (Brock, A. et al. High-Performance Large-Scale Image Recognition Without Normalization. 2021)
    • ResNet-based, pre-trained on JFT-300M
    • Replaces batch normalization with Adaptive Gradient Clipping (AGC)
  • ViT: 88.6% (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021)
    • Transformer, pre-trained on ImageNet-21k (14M images, roughly 10x ImageNet) and JFT-300M
  • 76.4 -> 79.2 with large-scale pre-training (Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, ICCV 2017)
Previous works
• Additional training data: semi-supervised learning
  • Noisy Student (EfficientNet-L2): 88.4% (Self-training with Noisy Student improves ImageNet classification, CVPR 2020)
  • Meta Pseudo Labels (EfficientNet-L2): 90.2% (PR-336)
2. Experimental Results & Methodology
Additive Study of Improvements
• Top-1 accuracy of 82.2% (+3.2%) achieved without any architecture change.
• Baseline ResNet-200: 256x256 input, 90 epochs, stepwise learning rate decay.
• Hyperparameter selection: a held-out validation set comprising 2% of the ImageNet training set (minival-set).
• Increasing training epochs (90 -> 350) is only useful once regularization is applied.
• Top-1 accuracy: ImageNet validation-set, averaged over 2 runs. (Table groups improvements into Training / Regularization / Architecture.)
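A sketch of how such a 2% minival split could be carved out of the ImageNet training set; the deck specifies only the fraction, so the shuffling scheme and function name here are our own.

```python
import numpy as np

def minival_split(num_train: int = 1_281_167, frac: float = 0.02, seed: int = 0):
    """Hold out `frac` of the ImageNet training indices as a minival set
    for hyperparameter selection."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_train)
    n_val = int(num_train * frac)
    return idx[n_val:], idx[:n_val]  # train indices, minival indices

train_idx, minival_idx = minival_split()
print(len(minival_idx))  # 25623 images held out
```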
Additive Study of Improvements
• Baseline ResNet-200: 256x256, 90 epochs, stepwise learning rate decay.
Regularization and Data Augmentation
• Dropout: applied after the global average pooling layer.
• Results differ depending on whether decreased weight decay is applied.
• Stochastic depth (G. Huang et al., ECCV 2016).
• RandAugment: random image transformations (e.g. translate, shear, color distortions).
• Momentum optimizer with a cosine learning rate schedule (see the sketch below).
• Top-1 accuracy: ImageNet validation-set, averaged over 2 runs.
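A minimal sketch of that optimizer setup in PyTorch; the learning rate and momentum values are illustrative assumptions, while the weight decay of 4e-5 matches the value reported later in the deck.

```python
import torch

model = torch.nn.Linear(512, 1000)  # stand-in for ResNet-200
epochs = 350

# SGD with momentum; weight decay is decreased when regularization is stacked.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=4e-5)
# Cosine learning rate schedule over the full training run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one training epoch over ImageNet here ...
    scheduler.step()  # decay the LR along a cosine curve each epoch
```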
Additive Study of Improvements
• Adding simple architecture changes (SE, ResNet-D) on top of the improved training recipe raises top-1 accuracy from 82.2% (+3.2%) to 83.4% (+4.4%).
• Baseline ResNet-200: 256x256, 90 epochs, stepwise learning rate decay.
Architecture changes
• ResNet-D (PR-201)
• Squeeze-and-Excitation (Hu, J. et al., 2020), sketched below
• Top-1 accuracy: ImageNet validation-set, averaged over 2 runs.
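For reference, a minimal Squeeze-and-Excitation block; the reduction ratio of 4 (i.e. SE ratio 0.25) is a common choice and an assumption here, not a value stated in this deck.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-average-pool the feature map
    ('squeeze'), then gate each channel with a learned sigmoid weight
    ('excite')."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: (B, C)
        w = self.fc(w).view(b, c, 1, 1)  # excite: per-channel gates
        return x * w                     # re-weight the channels

# Usage inside a residual block: out = SEBlock(256)(features)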
Importance of decreasing weight decay when combining regularization methods
(RA: RandAugment, LS: Label Smoothing, DO: Dropout on FC, SD: Stochastic Depth)
• When only RA or LS is applied, the default weight decay (1e-4) does not need to change.
• When more regularization methods are added, performance does not improve unless the weight decay is lowered.
  -> When combining several regularization methods, they must be balanced so that the model is not over-regularized.
• Top-1 accuracy: ImageNet validation-set, averaged over 2 runs.
Extensive search on ImageNet over scaling strategies
• Search over width multipliers in [0.25, 0.5, 1.0, 1.5, 2.0], depths in [26, 50, 101, 200, 300, 350, 400], and resolutions in [128, 160, 224, 320, 448].
• 350 epochs; regularization is increased with model size in an effort to limit overfitting.
• Smaller models follow a power-law trend; at equal FLOPs, accuracy depends on the image resolution -> this motivates slow image resolution scaling.
• RandAugment: magnitude 10 for filter multipliers in [0.25, 0.5] or image resolutions in [64, 160]; 15 for image resolutions in [224, 320]; 20 otherwise.
• Stochastic depth: drop rate of 0.2 for image resolutions 224 and above; not applied for filter multiplier 0.25 (or images smaller than 224).
• All models use a label smoothing of 0.1 and a weight decay of 4e-5. (These rules are collected in the sketch below.)
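The per-configuration rules above can be written down directly; this helper is our own packaging of them (function name and dict keys are assumptions, and the listed resolution sets are approximated with thresholds).

```python
def regularization_config(width_mult: float, resolution: int) -> dict:
    """Regularization settings used in the scaling search, per the rules above."""
    if width_mult in (0.25, 0.5) or resolution <= 160:
        ra_magnitude = 10
    elif resolution <= 320:
        ra_magnitude = 15
    else:
        ra_magnitude = 20
    # Stochastic depth only for large images and non-tiny widths.
    sd_rate = 0.2 if (resolution >= 224 and width_mult > 0.25) else 0.0
    return {
        "randaugment_magnitude": ra_magnitude,
        "stochastic_depth_rate": sd_rate,
        "label_smoothing": 0.1,
        "weight_decay": 4e-5,
    }

print(regularization_config(1.0, 224))  # {'randaugment_magnitude': 15, ...}
```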
Depth Scaling in Regimes Where Overfitting Can Occur
• Depth scaling improves performance in longer-epoch regimes; width scaling helps in short-epoch regimes.
• Width scaling is prone to overfitting: widening the model grows the parameter count more sharply.
• The paper points out that prior work chose width scaling even in ~40-epoch training regimes:
  • BiT (2019) scales the width of ResNet-152 with a 4x filter multiplier
  • NFNet (2021) scales the width with a ~1.5x filter multiplier
• Search space: image resolution [128, 160, 224, 320]; depth [101, 200, 300, 400]; width multiplier [1.0, 1.5, 2.0]
Comparing EfficientNets against ResNet-RS (re-scaled) on a speed-accuracy Pareto curve
• FLOPs do not reflect actual training and inference latency.
• On GPUs/TPUs, memory access cost has a larger effect on latency, and FLOPs fail to capture it.
• EfficientNet's depthwise convolutions operate on larger activations, so it pays a latency penalty on GPUs/TPUs relative to ResNet: more memory consumed -> more memory accesses -> higher latency.
• ResNet-RS-350 has 1.8x more FLOPs, yet 2.7x lower TPU-v3 latency.
• ResNet-RS-350 has 3.8x more parameters, yet uses 2.3x less memory.
Comparing EfficientNets against ResNet-RS (re-scaled) on a speed-accuracy Pareto curve
• The key to ResNet-RS: scale image resolution more slowly than width and depth, reducing the number of activations and, as a result, memory usage.
Improving scaling of EfficientNet
• Improving the performance of EfficientNets (EfficientNet-RS).
• The paper argues that EfficientNet's strategy of jointly increasing model depth, width and resolution at a constant rate is suboptimal.
• The slow image resolution scaling strategy is applied to EfficientNets, with no changes to width or depth (contrast sketched below).
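To make the contrast concrete, a toy comparison of EfficientNet's compound scaling rule with a slowed resolution schedule; the coefficients α=1.2, β=1.1, γ=1.15 come from the EfficientNet paper, while the 0.5 slowdown factor is purely illustrative, not the paper's schedule.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # EfficientNet depth/width/resolution coefficients

def compound_scaling(phi: int):
    """EfficientNet: depth, width and resolution all grow at constant rates."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

def slow_resolution_scaling(phi: int, slowdown: float = 0.5):
    """Same depth/width growth, but resolution is scaled up more slowly
    (the idea behind slow image resolution scaling)."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** (phi * slowdown)

for phi in (1, 3, 6):
    d, w, r = slow_resolution_scaling(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```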
ResNet-RS performance in a large scale semi-supervised learning setup
• In a semi-supervised learning setup, ResNet-RS reaches high accuracy while offering 4.7x - 5.5x faster inference than EfficientNet.
• ResNet-RS is trained on 1.3M labeled ImageNet images plus 130M pseudo-labeled images (as in Noisy Student); see the sketch below.
• The 130M pseudo-labeled images are generated with the Noisy Student paper's EfficientNet-L2 (88.4%).
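A minimal sketch of the pseudo-labeling step in this setup; the loop structure and names are our own, and the use of hard labels is an assumption rather than a detail stated in the deck.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(teacher, unlabeled_loader, device="cuda"):
    """Label unlabeled images with a trained teacher model
    (EfficientNet-L2 at 88.4% in the paper); the student ResNet-RS is then
    trained on labeled ImageNet plus these pseudo-labeled images."""
    teacher.eval().to(device)
    for images in unlabeled_loader:
        logits = teacher(images.to(device))
        yield images, logits.argmax(dim=-1)  # hard pseudo-labels (assumption)
```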
Transfer Learning to Downstream Tasks with ResNet-RS
• Supervised learning with the re-scaled recipe outperforms SimCLR/SimCLRv2 (self-supervised learning).
• This contradicts the notion that self-supervised learning learns more universal representations than supervised learning.
• Setup for a fair comparison with SimCLR:
  • ResNet-RS: 400 epochs with cosine learning rate decay, data augmentation (RandAugment), label smoothing, dropout and decreased weight decay, but no stochastic depth or exponential moving average (EMA) of the weights.
  • SimCLR/SimCLRv2: longer training (800 epochs) with cosine learning rate decay, a tailored data augmentation strategy, a tuned temperature parameter in the contrastive loss and a tuned weight decay.
  • Vanilla ResNet architecture pre-trained on ImageNet.
Revised 3D ResNet for Video Classification
• Kinetics-400 video classification task.
• Results mirror image classification: large gains even without architecture changes.
3. Conclusions
• Scaling and training strategies alone can deliver performance gains on par with those obtained from complex architectural changes.
• An empirical study of regularization techniques and training strategies that achieve strong performance (e.g. +3.2% top-1 ImageNet accuracy, +4.0% top-1 Kinetics-400 accuracy) without changing the model architecture.
• A new network scaling strategy:
  • (1) Scale depth when overfitting can occur (scaling width can be preferable otherwise).
  • (2) Scale the image resolution more slowly.
• FLOPs do not reflect actual training and inference latency.
• From the paper: "Our work suggests that the field has myopically overemphasized architectural innovations at the expense of experimental diligence, and we hope it encourages further scrutiny in maintaining consistent methodology for both proposed innovations and baselines alike."
