Designing Network Design Spaces
Ilija Radosavovic, et al., “Designing Network Design Spaces”
3rd May, 2020
PR12 Paper Review
JinWon Lee
Samsung Electronics
Introduction
• Over the past several years, better architectures have resulted in
considerable progress in a wide range of visual recognition tasks.
 Ex) VGG, ResNet, MobileNet, EfficientNet, etc.
• While manual network design has led to large advances, finding
well-optimized networks manually can be challenging, especially as the
number of design choices increases.
• A popular approach to address this limitation is neural architecture
search (NAS).
• However, NAS does not enable discovery of network design principles
that deepen our understanding and allow us to generalize to new
settings.
Introduction
• In this work, the authors present a new network design paradigm
that combines the advantages of manual design and NAS.
• Instead of focusing on designing individual network instances, they
design design spaces that parametrize populations of networks.
Exploring Randomly Wired Neural Networks for
Image Recognition (PR-155)
• Design a network generator, not an
individual network!
Introduction
• The authors start with a relatively unconstrained design space called
AnyNet and apply a human-in-the-loop methodology to arrive at a
low-dimensional design space consisting of simple "regular"
networks, RegNet.
• The RegNet design space generalizes to various compute regimes,
schedule lengths, and network block types.
• They analyze the RegNet design space and arrive at interesting
findings that do not match the current practice of network design.
Tools for Design Space Design
• Rather than designing or searching for a single best model under
specific settings, the authors study the behavior of populations of
models.
• They rely on the concept of network design spaces introduced in
Radosavovic et al., "On Network Design Spaces for Visual
Recognition," ICCV 2019.
• The core idea of the paper is that we can quantify the quality of a design
space by sampling a set of models from that design space and
characterizing the resulting model error distribution.
Tools for Design Space Design
• To obtain a distribution of models, sample and train n models from a
design space.
• A primary tool for analyzing design space quality is the error
empirical distribution function (EDF). The error EDF of 𝑛 models with
errors 𝑒𝑖 is given by:
𝐹(𝑒) = (1/𝑛) Σ𝑖 1[𝑒𝑖 < 𝑒]  (sum over 𝑖 = 1, …, 𝑛)
• F(e) gives the fraction of models with
error less than 𝑒.
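As a concrete illustration, the EDF is easy to compute from the errors of the sampled models. A minimal sketch (NumPy/Matplotlib assumed; the error values are placeholders, not results from the paper):

```python
import numpy as np
import matplotlib.pyplot as plt

def error_edf(errors):
    """Empirical distribution function of model errors:
    F(e) = fraction of sampled models with error below e."""
    e = np.sort(np.asarray(errors, dtype=float))
    # After sorting, the k-th smallest error is reached by k/n of the models.
    f = np.arange(1, len(e) + 1) / len(e)
    return e, f

# Errors of n sampled-and-trained models (placeholder values).
errors = [34.2, 31.7, 40.1, 36.5, 33.0, 38.8]
e, f = error_edf(errors)
plt.step(e, f, where="post")
plt.xlabel("error")
plt.ylabel("cumulative prob.")
plt.show()
```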
Tools for Design Space Design
• Given a population of trained models, we can plot and analyze
various network properties versus network error.
• For these plots, an empirical bootstrap is applied to estimate the
likely range in which the best models fall.
The blue shaded regions are ranges containing the best models with 95% confidence, and the black vertical
line marks the most likely best value.
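A minimal sketch of such an empirical bootstrap (the 25% sample fraction and 10⁴ trials follow the paper's description; variable names are illustrative): repeatedly resample (parameter, error) pairs with replacement, record the parameter of the lowest-error model in each resample, and take the 95% range of those values.

```python
import numpy as np

def bootstrap_best_range(params, errors, frac=0.25, trials=10_000, ci=0.95, seed=0):
    """Empirical bootstrap for the likely range of the parameter value
    at which the best (minimum-error) models fall."""
    params, errors = np.asarray(params), np.asarray(errors)
    rng = np.random.default_rng(seed)
    k = max(1, int(frac * len(params)))
    best = np.empty(trials)
    for t in range(trials):
        idx = rng.integers(0, len(params), size=k)     # resample pairs with replacement
        best[t] = params[idx[np.argmin(errors[idx])]]  # parameter of the best model
    lo, hi = np.quantile(best, [(1 - ci) / 2, (1 + ci) / 2])
    return lo, hi, np.median(best)                     # 95% range + most likely value
```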
Tools for Design Space Design
• To summarize:
1. generate distributions of models obtained by sampling and
training n models from a design space.
2. compute and plot error EDFs to summarize design space quality.
3. visualize various properties of a design space and use an
empirical bootstrap to gain insight.
4. use these insights to refine the design space.
The AnyNet Design Space
• Given an input image, a network consists of a simple stem, followed by the
network body that performs the bulk of the computation, and a final network
head that predicts the output classes.
• Keep the stem and head fixed and as simple as possible, and instead focus on
the structure of the network body.
• The network body consists of 4 stages operating at progressively reduced
resolution; each stage consists of a sequence of identical blocks.
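This stem/body/head split can be written down directly. A minimal PyTorch sketch (the stride-2 3×3 stem and average-pool + linear head follow the paper's simple defaults; `stages` and `w_final` are placeholders to be filled in by a concrete design):

```python
import torch.nn as nn

class AnyNet(nn.Module):
    """Stem -> body (4 stages) -> head skeleton (a sketch)."""
    def __init__(self, stages, w_final, num_classes=1000, w_stem=32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, w_stem, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(w_stem),
            nn.ReLU(inplace=True),
        )
        self.body = nn.Sequential(*stages)  # each stage reduces resolution
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(w_final, num_classes),
        )

    def forward(self, x):
        return self.head(self.body(self.stem(x)))
```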
AnyNetX
• Most of the experiments use the standard residual bottleneck block
with group convolution. They refer to this as the X block, and the
AnyNet design space built on it as AnyNetX.
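A sketch of the X block in PyTorch, under two common assumptions worth flagging: the stride sits on the 3×3 group conv, and 𝑔 is the group width (channels per group), so the number of groups is w_b / g. An AnyNetX stage is then a sequence of such blocks, with the first block carrying stride 2.

```python
import torch.nn as nn

class XBlock(nn.Module):
    """Residual bottleneck block with group convolution (sketch).
    w_in/w_out: block widths, b: bottleneck ratio, g: group width."""
    def __init__(self, w_in, w_out, stride=1, b=1, g=1):
        super().__init__()
        w_b = w_out // b  # bottleneck width; assumes b divides w_out and g divides w_b
        self.f = nn.Sequential(
            nn.Conv2d(w_in, w_b, 1, bias=False),
            nn.BatchNorm2d(w_b), nn.ReLU(inplace=True),
            nn.Conv2d(w_b, w_b, 3, stride=stride, padding=1,
                      groups=w_b // g, bias=False),
            nn.BatchNorm2d(w_b), nn.ReLU(inplace=True),
            nn.Conv2d(w_b, w_out, 1, bias=False),
            nn.BatchNorm2d(w_out),
        )
        # Projection shortcut when the shape changes, identity otherwise.
        self.proj = None
        if w_in != w_out or stride != 1:
            self.proj = nn.Sequential(
                nn.Conv2d(w_in, w_out, 1, stride=stride, bias=False),
                nn.BatchNorm2d(w_out),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = x if self.proj is None else self.proj(x)
        return self.relu(self.f(x) + shortcut)
```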
AnyNetX
• The AnyNetX design space has 16 degrees of freedom as each
network consists of 4 stages and each stage 𝑖 has 4 parameters: the
number of blocks 𝑑𝑖, block width 𝑤𝑖, bottleneck ratio 𝑏𝑖, and group
width 𝑔𝑖.
• Resolution 𝑟 = 224 (fixed)
• To obtain valid models, they perform log-uniform sampling of 𝑑𝑖 ≤ 16,
𝑤𝑖 ≤ 1024 and divisible by 8, 𝑏𝑖 ∈ {1, 2, 4}, and 𝑔𝑖 ∈ {1, 2, …, 32}.
• There are (16 ⋅ 128 ⋅ 3 ⋅ 6)⁴ ≈ 10¹⁸ possible model configurations in
the AnyNetX design space.
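A sampler for this space might look like the sketch below. One assumption to flag: the factor of 6 in the configuration count suggests group widths are drawn from the six powers of two {1, 2, 4, 8, 16, 32}. A real sampler must also reject or adjust invalid combinations (e.g. 𝑔𝑖 must divide the bottleneck width 𝑤𝑖/𝑏𝑖) and discard models outside the target flop regime.

```python
import math
import random

DEPTHS = list(range(1, 17))          # d_i <= 16
WIDTHS = list(range(8, 1025, 8))     # w_i <= 1024, divisible by 8 (128 values)
RATIOS = [1, 2, 4]                   # b_i (3 values)
GROUPS = [1, 2, 4, 8, 16, 32]        # g_i (6 values; assumed powers of two)

def log_uniform(values, rng=random):
    """Pick from a discrete set, approximately uniform in log-space."""
    u = rng.uniform(math.log(values[0]), math.log(values[-1]))
    return min(values, key=lambda v: abs(math.log(v) - u))

def sample_anynetx(rng=random):
    """One AnyNetX configuration: 16 degrees of freedom over 4 stages."""
    return [dict(d=log_uniform(DEPTHS, rng),
                 w=log_uniform(WIDTHS, rng),
                 b=rng.choice(RATIOS),
                 g=log_uniform(GROUPS, rng))
            for _ in range(4)]

print(sample_anynetx(random.Random(0)))
```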
Design Space Design Aims
1. To simplify the structure of the design.
2. To improve the interpretability of the design space.
3. To improve or maintain the design space quality.
4. To maintain model diversity in the design space.
AnyNetX(A, B, C)
• Refer to the unconstrained AnyNet design space as AnyNetXA.
• Shared bottleneck ratio 𝑏𝑖 = 𝑏 for all stages 𝑖: AnyNetXA  AnyNetXB.
• Shared group width 𝑔𝑖 = 𝑔 for all stages 𝑖: AnyNetXB  AnyNetXC.
AnyNetX(D, E)
• AnyNetXD comes from examining typical network structures of both good
and bad networks from AnyNetXC.
 A pattern emerges: good networks have increasing widths.
• AnyNetXD constraint: AnyNetXC & 𝑤𝑖+1 ≥ 𝑤𝑖.
• In addition to stage widths 𝑤𝑖 increasing with 𝑖, the stage depths 𝑑𝑖
likewise tend to increase for the best models.
• AnyNetXE constraint: AnyNetXD & 𝑑𝑖+1 ≥ 𝑑𝑖.
• Finally, the constraints on 𝑤𝑖 and 𝑑𝑖 each reduce the design space by 4!,
with a cumulative reduction of O(10⁷) from AnyNetXA.
Linear Fits
• To gain further insight into the model structure, the best 20 models
from AnyNetXE are shown in a single plot.
• While there is significant variance in the individual models (gray
curves), in the aggregate a pattern emerges.
• In particular, the same plot shows the line 𝑤𝑗 = 48 ⋅ (𝑗 + 1) for
0 ≤ 𝑗 ≤ 20.
Linear Fits
• Inspired by AnyNetXD and AnyNetXE, block widths are given a linear
parameterization:
𝑢𝑗 = 𝑤0 + 𝑤𝑎 ⋅ 𝑗 for 0 ≤ 𝑗 < 𝑑, with 𝑤0 > 0 and 𝑤𝑎 > 0
• To quantize 𝑢𝑗, an additional parameter 𝑤𝑚 is introduced, and 𝑠𝑗 is
defined so that:
𝑢𝑗 = 𝑤0 ⋅ 𝑤𝑚^𝑠𝑗
• Then, to quantize 𝑢𝑗, simply round 𝑠𝑗 (⌊⋅⌉ denotes rounding to the
nearest integer) and compute the quantized per-block widths 𝑤𝑗 via:
𝑤𝑗 = 𝑤0 ⋅ 𝑤𝑚^⌊𝑠𝑗⌉
• Converting the per-block 𝑤𝑗 to the per-stage format: stage 𝑖 has block
width 𝑤𝑖 = 𝑤0 ⋅ 𝑤𝑚^𝑖 and number of blocks 𝑑𝑖 = Σ𝑗 1[⌊𝑠𝑗⌉ = 𝑖]
Linear Fits
• 𝑒_fit denotes the fitting error of this linear parameterization, measured as a mean log-ratio.
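Put together, the quantization above reduces to a few lines. A NumPy sketch (snapping widths to multiples of 8 mirrors the AnyNetX constraint; the example parameter values are illustrative, not a model from the paper):

```python
import numpy as np

def regnet_widths(w_0, w_a, w_m, d, q=8):
    """Turn (w_0, w_a, w_m, d) into per-stage widths w_i and depths d_i."""
    u = w_0 + w_a * np.arange(d)                  # u_j = w_0 + w_a * j
    s = np.round(np.log(u / w_0) / np.log(w_m))   # solve u_j = w_0 * w_m^s_j, round s_j
    w = w_0 * np.power(w_m, s)                    # w_j = w_0 * w_m^round(s_j)
    w = (np.round(w / q) * q).astype(int)         # snap widths to multiples of q
    ws, ds = np.unique(w, return_counts=True)     # stage widths w_i, depths d_i
    return ws.tolist(), ds.tolist()

print(regnet_widths(w_0=48, w_a=36, w_m=2.5, d=16))  # illustrative values
```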
The RegNet Design Space
• The design space of RegNet contains only simple, regular models.
 𝑑 < 64
 𝑤0, 𝑤𝑎 < 256
 1.5 ≤ 𝑤𝑚 ≤ 3
 𝑏 and 𝑔 are the same as in AnyNetX
• 𝑤𝑚 = 2 and 𝑤0 = 𝑤𝑎 give good performance, but to maintain
the diversity of models these constraints are not applied to the RegNet design space.
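Sampling the RegNet space is then just six numbers per model; combined with the `regnet_widths` sketch above, each sample expands into a full network spec. The discrete grids below (e.g. 𝑤0 in multiples of 8, 𝑔 in powers of two) are assumptions for illustration, not the paper's exact grids:

```python
import random

def sample_regnet(rng=random):
    """One RegNet configuration within the ranges above (sketch)."""
    return dict(
        d=rng.randint(1, 63),                   # d < 64
        w_0=8 * rng.randint(2, 31),             # w_0 < 256, multiple of 8 (assumed)
        w_a=round(rng.uniform(8.0, 255.9), 1),  # w_a < 256
        w_m=round(rng.uniform(1.5, 3.0), 2),    # 1.5 <= w_m <= 3
        b=rng.choice([1, 2, 4]),                # b as in AnyNetX
        g=rng.choice([1, 2, 4, 8, 16, 32]),     # g as in AnyNetX
    )
```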
Design Space Summary
Design Space Generalization
Common Design Patterns
• The deeper the model, the better the performance.
• Double the number of channels whenever the spatial activation size
is reduced.
• Skip connections are good.
• Bottlenecks are good.
• Depthwise separable convolution is popular in the low-compute regime.
• Inverted bottlenecks are also good.
RegNet Trends
• The depth of the best models is stable across regimes, with an optimal
depth of ~20 blocks (60 layers).
• This is in contrast to the common practice of using deeper models for
higher flop regimes.
RegNet Trends
• The best models use a bottleneck ratio 𝑏 of 1.0, which effectively
removes the bottleneck.
• The width multiplier 𝑤𝑚 of good models is ~2.5, similar but not
identical to the popular recipe of doubling widths across stages.
RegNet Trends
• The remaining parameters (𝑔, 𝑤𝑎, 𝑤0) increase with complexity.
Complexity Analysis
• While not a common measure of network complexity, activations can
heavily affect runtime on memory-bound hardware accelerators.
• Activations increase with the square root of flops, while parameters
increase linearly with flops.
RegNetX Constrained
• Using these findings, the RegNetX design space is refined – constrained RegNetX:
 𝑏 = 1, 𝑑 ≤ 40, and 𝑤𝑚 ≥ 2
 Parameters and activations are limited following the complexity analysis
 Further depth limit: 12 ≤ 𝑑 ≤ 28
Alternate Design Choices
• Inverted bottleneck (𝑏 < 1) degrades the EDF slightly, and depthwise
conv (𝑔 = 1) performs even worse, relative to 𝑏 = 1 and 𝑔 ≥ 1.
• For RegNetX, a fixed resolution of 224x224 is best, even at higher flops.
• Squeeze-and-Excitation (SE) op yields good gains – RegNetY
Comparison to Existing Networks
• The higher flop models have a large number of blocks in the third
stage and a small number of blocks in the last stage.
• The group width 𝑔 increases with complexity, but depth 𝑑 saturates
for large models.
State of the Art Comparison: Mobile Regime
ResNe(X)t Comparison
EfficientNet Comparison
• At low flops, EfficientNet outperforms RegNetY; at intermediate flops,
RegNetY outperforms EfficientNet; and at higher flops both RegNetX
and RegNetY perform better.
Test Set Evaluation
Additional Ablations
• Fixed Depth
 Surprisingly, fixed-depth networks can match the performance of variable-depth networks
for all flop regimes.
• Fewer Stages
 Top RegNet models at high flops have few blocks in the fourth stage, but 3-stage networks
perform considerably worse.
• Inverted Bottleneck
 In a high-compute regime, 𝑏 < 1 degrades results further.
Additional Ablations
• Swish vs ReLU
 Swish outperforms ReLU at low flops, but ReLU is better at high flops.
 Interestingly, if 𝑔 is restricted to 1 (depthwise conv), Swish performs much
better than ReLU.
Optimization Settings
• Initial learning rate and weight decay are stable across complexity regimes.
(Plots: learning rate and weight decay trends across regimes, for RegNet and for EfficientNet.)