SlideShare a Scribd company logo
1 of 27
Download to read offline
김 성 철
Contents
1. Previous research
2. Introduction
3. Related work
4. Design Space Design
5. Analyzing the RegNetX Design Space
6. Comparison to Existing Networks
2
Previous research
• Terminology
• Model family
• ResNet, DenseNet, …
• Design space
• A concrete set of architectures that can be instantiated from the model family
• Two components
• A parametrization of a model family such that
specifying a set of model hyperparameters fully defines a network instantiation
• A set of allowable values for each hyperparameter
3
Previous research
4I. Radosavovic et al., On Network Design Spaces for Visual Recognition, ICCV 2019.
Introduction
• Manual network design
• LeNet, AlexNet, VGG, ResNet, …
• Particular network instantiations and design principles can be generalized and applied to numerous settings
• Finding well-optimized neworks
• Neural Architecture Search (NAS)
• High performance, but NO discovery of network design principles
5
Introduction
• AnyNet
• Unconstrained design space
• RegNet
• A low-dimensional design space
consisting of simple “regular” networks
• Stage width and depths are determined
by a quantized linear function
6
Related work
• Manual network design
• VGG, Inception, ResNet, ResNeXt, DenseNet, MobileNet, …
• Automated network design
• NAS
• Network scaling
• EfficientNet
• Comparing networks
• Parameterization
• An empirical study justifying the design choices & insights into structural design choices
7
Design Space Design
8
AnyNet
1
2
3
n
…
RegNet
Design Space Design
• Tools for Design Space Design
• I. Radosavovic et al.
• Quantify the quality of a design space by sampling a set of models from that design space and
characterizing the resulting model error distribution
• How to obtain a distribution of models?
• low-compute, low-epoch training regime!!!
• e.g. use the 400M flop regime and train each sampled model for 10 epochs
9
Design Space Design
• Tools for Design Space Design
• The error empirical distribution function (EDF)
• Compute and plot error EDFs to summarize design space quality
• Visualize various properties of a design space & use an empirical bootstrap
10
Design Space Design
• The AnyNet Design Space
• The structure of the network
→ network depth, # of channels, bottleneck ratios, group widths
• stem → body → head
• Stem and head are fixed and body is focused
• X block : residual bottleneck + group conv
• 16 degrees of freedom
• 4 stages
• # of blocks 𝑑𝑖, block width 𝑤𝑖, bottleneck ratio 𝑏𝑖, group width 𝑔𝑖
• Log-uniform sampling
• 𝑑𝑖 ≤ 16, 𝑤𝑖 ≤ 1024 and divisible by 8, 𝑏𝑖 ∈ 1, 2, 4 , 𝑔𝑖 ∈ {1, 2, … , 32} 11
Design Space Design
• The AnyNet Design Space
• AnyNetX 𝐴 : unconstrained AnyNetX design space
• AnyNetX 𝐵 : shared bottleneck ratio 𝑏𝑖 = 𝑏 for all stages i for the AnyNetX 𝐴 design space
• AnyNetX 𝐶 : shared bottleneck ratio 𝑔𝑖 = 𝑔 for all stages
12
Design Space Design
• The AnyNet Design Space
• AnyNetX 𝐷 : 𝑤𝑖+1 ≥ 𝑤𝑖 from AnyNetX 𝐶
• AnyNetX 𝐸 : 𝑑𝑖+1 ≥ 𝑑𝑖 from AnyNetX 𝐷
• The constraints on 𝒘𝒊 and 𝒅𝒊 each reduce the design space by 4!,
with a cumulative reduction of 𝑶 𝟏𝟎 𝟕 from AnyNetX 𝑨
13
Design Space Design
• The RegNet Design Space
• (top-left) the best 20 models from AnyNetX 𝐸
• 𝑤𝑗 = 48 ⋅ 𝑗 + 1 for 0 ≤ 𝑗 ≤ 20
• Trivial linear fit seems to explain the population trend!
• A strategy to quantize a line
• A linear parameterization
𝑢𝑗 = 𝑤0 + 𝑤 𝑎 ⋅ 𝑗 for 0 ≤ 𝑗 < 𝑑
(depth 𝑑, initial width 𝑤0 > 0, slope 𝑤 𝑎 > 0, block width 𝑢𝑗 for each block 𝑗 < 𝑑)
compute 𝑠𝑗 from 𝑢𝑗 = 𝑤0 ⋅ 𝑤 𝑚
𝑠 𝑗
simply round 𝑠𝑗 (𝑠 𝑗 ) and compute quantized per-block width 𝑤𝑗 = 𝑤0 ⋅ 𝑤 𝑚
𝑠 𝑗
14
Design Space Design
• The RegNet Design Space
• Sample 𝑑 < 64, 𝑤0, 𝑤 𝑎 < 256, 1.5 ≤ 𝑤 𝑚 ≤ 3
• (left) The error EDF of RegNetX
• (middle) 𝑤 𝑚 = 2 (doubling width between stages) / 𝑤0 = 𝑤 𝑎 (𝑢𝑗 = 𝑤 𝑎 ⋅ (𝑗 + 1))
• (right) random search efficiency
15
Design Space Design
• The RegNet Design Space
• Sample 𝑑 < 64, 𝑤0, 𝑤 𝑎 < 256, 1.5 ≤ 𝑤 𝑚 ≤ 3
• (left) The error EDF of RegNetX
• (middle) 𝑤 𝑚 = 2 (doubling width between stages) / 𝑤0 = 𝑤 𝑎 (𝑢𝑗 = 𝑤 𝑎 ⋅ (𝑗 + 1))
• (right) random search efficiency
16
Design Space Design
• Design Space Generalization
• Discover general principles of network design that can generalize to new settings
• Higher flops, higher epochs with 5-stage networks, with various block types
17
Analyzing the RegNetX Design Space
• RegNet trends
• (top-left) the depth 𝑑 is stable across regimes
→ optimal depth : ~20 blocks (60 layers)
• (top-middle) an optimal bottleneck ratio 𝑏 is 1.0
→ effectively removes the bottleneck
• (top-right) an optimal width multiplier 𝑤 𝑚 is ~2.5
→ similar but not identical to the popular recipe
• (bottom) 𝑔, 𝑤 𝑎, 𝑤0 increase with complexity
18
Analyzing the RegNetX Design Space
• Complexity analysis
• (top-left) activation : the size of the output tensors of all conv layers
→ activation can have a stronger correlation to runtime than flops
• (bottom) complexity vs. flops
19
Analyzing the RegNetX Design Space
• RegNetX constrained
• 𝑏 = 1, 𝑑 ≤ 40, 𝑤 𝑚 ≥ 2 & limit parameters and activations
→ fast, low-parameter, low-memory models without affecting accuracy
• The constrained version is superior across all flop regimes
20
Analyzing the RegNetX Design Space
• Alternate design choices
• (left) the inverted bottleneck degrades the EDF and depthwise conv performs even worse
• (middle) a fixed resolution of 224x224 is best
• SE (Squeeze-and-Excitation)
• (right) X+SE=Y → RegNetY yields good gains
21
Comparison to Existing Networks
• RegNetX and RegNetY design spaces
• The best model from 25 random settings of the RegNet parameters (𝑑, 𝑔, 𝑤 𝑚, 𝑤 𝑎, 𝑤0)
• Re-train the top model 5 times at 100 epochs
• The higher flop models
• a large number of blocks in the third stage and a small number of blocks in the last stage
• The group width 𝑔 increases with complexity, but depth 𝑑 saturates for large models
22
Comparison to Existing Networks
• RegNetX and RegNetY design spaces
23
Comparison to Existing Networks
• State-of-the-Art Comparison: Mobile Regime
• The mobile regime : ~600MF
• NO enhancements for RegNet, while most mobile networks use various enhancements
24
Comparison to Existing Networks
• Standard Baselines Comparison: ResNe(X)t
25
Comparison to Existing Networks
• State-of-the-Art Comparison: Full Regime
26
감 사 합 니 다
27

More Related Content

What's hot

PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionJinwon Lee
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationDat Nguyen
 
PVANet - PR033
PVANet - PR033PVANet - PR033
PVANet - PR033Jinwon Lee
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetRishabh Indoria
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsMathias Niepert
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density ModelsSangwoo Mo
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054Jinwon Lee
 
Invertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise RemovalInvertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise Removalivaderivader
 
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural NetworkTraffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Networkivaderivader
 
Object-Region Video Transformers
Object-Region Video TransformersObject-Region Video Transformers
Object-Region Video TransformersSangwoo Mo
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion ModelsSangwoo Mo
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)Jinwon Lee
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture SearchDaeJin Kim
 
Iterative Graph Computation in the Big Data Era
Iterative Graph Computation in the Big Data EraIterative Graph Computation in the Big Data Era
Iterative Graph Computation in the Big Data EraWenlei Xie
 
Case Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkCase Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkNamHyuk Ahn
 
(Paper note) Real time rgb-d camera relocalization via randomized ferns for k...
(Paper note) Real time rgb-d camera relocalization via randomized ferns for k...(Paper note) Real time rgb-d camera relocalization via randomized ferns for k...
(Paper note) Real time rgb-d camera relocalization via randomized ferns for k...e8xu
 

What's hot (20)

PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
 
SROP_Poster
SROP_PosterSROP_Poster
SROP_Poster
 
PVANet - PR033
PVANet - PR033PVANet - PR033
PVANet - PR033
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for Graphs
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054
 
Invertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise RemovalInvertible Denoising Network: A Light Solution for Real Noise Removal
Invertible Denoising Network: A Light Solution for Real Noise Removal
 
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural NetworkTraffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Network
 
Object-Region Video Transformers
Object-Region Video TransformersObject-Region Video Transformers
Object-Region Video Transformers
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
 
Densebox
DenseboxDensebox
Densebox
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
 
Iterative Graph Computation in the Big Data Era
Iterative Graph Computation in the Big Data EraIterative Graph Computation in the Big Data Era
Iterative Graph Computation in the Big Data Era
 
Case Study of Convolutional Neural Network
Case Study of Convolutional Neural NetworkCase Study of Convolutional Neural Network
Case Study of Convolutional Neural Network
 
(Paper note) Real time rgb-d camera relocalization via randomized ferns for k...
(Paper note) Real time rgb-d camera relocalization via randomized ferns for k...(Paper note) Real time rgb-d camera relocalization via randomized ferns for k...
(Paper note) Real time rgb-d camera relocalization via randomized ferns for k...
 

Similar to Designing Network Design Spaces

PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and video
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and videoCONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and video
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and videoCristiano Rafael Steffens
 
Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionYongsu Baek
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkRichard Kuo
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
Parallelizing Pruning-based Graph Structural Clustering
Parallelizing Pruning-based Graph Structural ClusteringParallelizing Pruning-based Graph Structural Clustering
Parallelizing Pruning-based Graph Structural Clustering煜林 车
 
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...Ontico
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Topological Data Analysis
Topological Data AnalysisTopological Data Analysis
Topological Data AnalysisDeviousQuant
 
feedback_optimizations_v2
feedback_optimizations_v2feedback_optimizations_v2
feedback_optimizations_v2Ani Sridhar
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deploymenttaeseon ryu
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksSangwoo Mo
 
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...James McCombs
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxAkshitAgiwal1
 
Efficient architecture to condensate visual information driven by attention ...
Efficient architecture to condensate visual information driven by attention ...Efficient architecture to condensate visual information driven by attention ...
Efficient architecture to condensate visual information driven by attention ...Sara Granados Cabeza
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterAttila Szegedi
 

Similar to Designing Network Design Spaces (20)

PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and video
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and videoCONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and video
CONVOLUTIONAL NEURAL NETWORKS: The workhorse of image and video
 
Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image Recognition
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Parallelizing Pruning-based Graph Structural Clustering
Parallelizing Pruning-based Graph Structural ClusteringParallelizing Pruning-based Graph Structural Clustering
Parallelizing Pruning-based Graph Structural Clustering
 
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Topological Data Analysis
Topological Data AnalysisTopological Data Analysis
Topological Data Analysis
 
Network recasting
Network recastingNetwork recasting
Network recasting
 
feedback_optimizations_v2
feedback_optimizations_v2feedback_optimizations_v2
feedback_optimizations_v2
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural Networks
 
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
Efficient architecture to condensate visual information driven by attention ...
Efficient architecture to condensate visual information driven by attention ...Efficient architecture to condensate visual information driven by attention ...
Efficient architecture to condensate visual information driven by attention ...
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 

More from Sungchul Kim

PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo SupervisionPR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo SupervisionSungchul Kim
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksSungchul Kim
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersSungchul Kim
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningSungchul Kim
 
Score based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsScore based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsSungchul Kim
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningSungchul Kim
 
Revisiting the Sibling Head in Object Detector
Revisiting the Sibling Head in Object DetectorRevisiting the Sibling Head in Object Detector
Revisiting the Sibling Head in Object DetectorSungchul Kim
 
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...Sungchul Kim
 
Deeplabv1, v2, v3, v3+
Deeplabv1, v2, v3, v3+Deeplabv1, v2, v3, v3+
Deeplabv1, v2, v3, v3+Sungchul Kim
 
Going Deeper with Convolutions
Going Deeper with ConvolutionsGoing Deeper with Convolutions
Going Deeper with ConvolutionsSungchul Kim
 
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based LocalizationGrad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based LocalizationSungchul Kim
 
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...Sungchul Kim
 
Panoptic Segmentation
Panoptic SegmentationPanoptic Segmentation
Panoptic SegmentationSungchul Kim
 
On the Variance of the Adaptive Learning Rate and Beyond
On the Variance of the Adaptive Learning Rate and BeyondOn the Variance of the Adaptive Learning Rate and Beyond
On the Variance of the Adaptive Learning Rate and BeyondSungchul Kim
 
A Benchmark for Interpretability Methods in Deep Neural Networks
A Benchmark for Interpretability Methods in Deep Neural NetworksA Benchmark for Interpretability Methods in Deep Neural Networks
A Benchmark for Interpretability Methods in Deep Neural NetworksSungchul Kim
 
KDGAN: Knowledge Distillation with Generative Adversarial Networks
KDGAN: Knowledge Distillation with Generative Adversarial NetworksKDGAN: Knowledge Distillation with Generative Adversarial Networks
KDGAN: Knowledge Distillation with Generative Adversarial NetworksSungchul Kim
 
Search to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSearch to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSungchul Kim
 
Supervised Constrastive Learning
Supervised Constrastive LearningSupervised Constrastive Learning
Supervised Constrastive LearningSungchul Kim
 
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Bootstrap Your Own Latent: A New Approach to Self-Supervised LearningBootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Bootstrap Your Own Latent: A New Approach to Self-Supervised LearningSungchul Kim
 
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...Sungchul Kim
 

More from Sungchul Kim (20)

PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo SupervisionPR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
PR-343: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural Networks
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 
Score based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsScore based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential Equations
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Revisiting the Sibling Head in Object Detector
Revisiting the Sibling Head in Object DetectorRevisiting the Sibling Head in Object Detector
Revisiting the Sibling Head in Object Detector
 
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
Do Wide and Deep Networks Learn the Same Things: Uncovering How Neural Networ...
 
Deeplabv1, v2, v3, v3+
Deeplabv1, v2, v3, v3+Deeplabv1, v2, v3, v3+
Deeplabv1, v2, v3, v3+
 
Going Deeper with Convolutions
Going Deeper with ConvolutionsGoing Deeper with Convolutions
Going Deeper with Convolutions
 
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based LocalizationGrad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
 
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Reg...
 
Panoptic Segmentation
Panoptic SegmentationPanoptic Segmentation
Panoptic Segmentation
 
On the Variance of the Adaptive Learning Rate and Beyond
On the Variance of the Adaptive Learning Rate and BeyondOn the Variance of the Adaptive Learning Rate and Beyond
On the Variance of the Adaptive Learning Rate and Beyond
 
A Benchmark for Interpretability Methods in Deep Neural Networks
A Benchmark for Interpretability Methods in Deep Neural NetworksA Benchmark for Interpretability Methods in Deep Neural Networks
A Benchmark for Interpretability Methods in Deep Neural Networks
 
KDGAN: Knowledge Distillation with Generative Adversarial Networks
KDGAN: Knowledge Distillation with Generative Adversarial NetworksKDGAN: Knowledge Distillation with Generative Adversarial Networks
KDGAN: Knowledge Distillation with Generative Adversarial Networks
 
Search to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSearch to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the Eyes
 
Supervised Constrastive Learning
Supervised Constrastive LearningSupervised Constrastive Learning
Supervised Constrastive Learning
 
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Bootstrap Your Own Latent: A New Approach to Self-Supervised LearningBootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
 
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stoch...
 

Recently uploaded

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 

Recently uploaded (20)

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 

Designing Network Design Spaces

  • 2. Contents 1. Previous research 2. Introduction 3. Related work 4. Design Space Design 5. Analyzing the RegNetX Design Space 6. Comparison to Existing Networks 2
  • 3. Previous research • Terminology • Model family • ResNet, DenseNet, … • Design space • A concrete set of architectures that can be instantiated from the model family • Two components • A parametrization of a model family such that specifying a set of model hyperparameters fully defines a network instantiation • A set of allowable values for each hyperparameter 3
  • 4. Previous research 4I. Radosavovic et al., On Network Design Spaces for Visual Recognition, ICCV 2019.
  • 5. Introduction • Manual network design • LeNet, AlexNet, VGG, ResNet, … • Particular network instantiations and design principles can be generalized and applied to numerous settings • Finding well-optimized neworks • Neural Architecture Search (NAS) • High performance, but NO discovery of network design principles 5
  • 6. Introduction • AnyNet • Unconstrained design space • RegNet • A low-dimensional design space consisting of simple “regular” networks • Stage width and depths are determined by a quantized linear function 6
  • 7. Related work • Manual network design • VGG, Inception, ResNet, ResNeXt, DenseNet, MobileNet, … • Automated network design • NAS • Network scaling • EfficientNet • Comparing networks • Parameterization • An empirical study justifying the design choices & insights into structural design choices 7
  • 9. Design Space Design • Tools for Design Space Design • I. Radosavovic et al. • Quantify the quality of a design space by sampling a set of models from that design space and characterizing the resulting model error distribution • How to obtain a distribution of models? • low-compute, low-epoch training regime!!! • e.g. use the 400M flop regime and train each sampled model for 10 epochs 9
  • 10. Design Space Design • Tools for Design Space Design • The error empirical distribution function (EDF) • Compute and plot error EDFs to summarize design space quality • Visualize various properties of a design space & use an empirical bootstrap 10
  • 11. Design Space Design • The AnyNet Design Space • The structure of the network → network depth, # of channels, bottleneck ratios, group widths • stem → body → head • Stem and head are fixed and body is focused • X block : residual bottleneck + group conv • 16 degrees of freedom • 4 stages • # of blocks 𝑑𝑖, block width 𝑤𝑖, bottleneck ratio 𝑏𝑖, group width 𝑔𝑖 • Log-uniform sampling • 𝑑𝑖 ≤ 16, 𝑤𝑖 ≤ 1024 and divisible by 8, 𝑏𝑖 ∈ 1, 2, 4 , 𝑔𝑖 ∈ {1, 2, … , 32} 11
  • 12. Design Space Design • The AnyNet Design Space • AnyNetX 𝐴 : unconstrained AnyNetX design space • AnyNetX 𝐵 : shared bottleneck ratio 𝑏𝑖 = 𝑏 for all stages i for the AnyNetX 𝐴 design space • AnyNetX 𝐶 : shared bottleneck ratio 𝑔𝑖 = 𝑔 for all stages 12
  • 13. Design Space Design • The AnyNet Design Space • AnyNetX 𝐷 : 𝑤𝑖+1 ≥ 𝑤𝑖 from AnyNetX 𝐶 • AnyNetX 𝐸 : 𝑑𝑖+1 ≥ 𝑑𝑖 from AnyNetX 𝐷 • The constraints on 𝒘𝒊 and 𝒅𝒊 each reduce the design space by 4!, with a cumulative reduction of 𝑶 𝟏𝟎 𝟕 from AnyNetX 𝑨 13
  • 14. Design Space Design • The RegNet Design Space • (top-left) the best 20 models from AnyNetX 𝐸 • 𝑤𝑗 = 48 ⋅ 𝑗 + 1 for 0 ≤ 𝑗 ≤ 20 • Trivial linear fit seems to explain the population trend! • A strategy to quantize a line • A linear parameterization 𝑢𝑗 = 𝑤0 + 𝑤 𝑎 ⋅ 𝑗 for 0 ≤ 𝑗 < 𝑑 (depth 𝑑, initial width 𝑤0 > 0, slope 𝑤 𝑎 > 0, block width 𝑢𝑗 for each block 𝑗 < 𝑑) compute 𝑠𝑗 from 𝑢𝑗 = 𝑤0 ⋅ 𝑤 𝑚 𝑠 𝑗 simply round 𝑠𝑗 (𝑠 𝑗 ) and compute quantized per-block width 𝑤𝑗 = 𝑤0 ⋅ 𝑤 𝑚 𝑠 𝑗 14
  • 15. Design Space Design • The RegNet Design Space • Sample 𝑑 < 64, 𝑤0, 𝑤 𝑎 < 256, 1.5 ≤ 𝑤 𝑚 ≤ 3 • (left) The error EDF of RegNetX • (middle) 𝑤 𝑚 = 2 (doubling width between stages) / 𝑤0 = 𝑤 𝑎 (𝑢𝑗 = 𝑤 𝑎 ⋅ (𝑗 + 1)) • (right) random search efficiency 15
  • 16. Design Space Design • The RegNet Design Space • Sample 𝑑 < 64, 𝑤0, 𝑤 𝑎 < 256, 1.5 ≤ 𝑤 𝑚 ≤ 3 • (left) The error EDF of RegNetX • (middle) 𝑤 𝑚 = 2 (doubling width between stages) / 𝑤0 = 𝑤 𝑎 (𝑢𝑗 = 𝑤 𝑎 ⋅ (𝑗 + 1)) • (right) random search efficiency 16
  • 17. Design Space Design • Design Space Generalization • Discover general principles of network design that can generalize to new settings • Higher flops, higher epochs with 5-stage networks, with various block types 17
  • 18. Analyzing the RegNetX Design Space • RegNet trends • (top-left) the depth 𝑑 is stable across regimes → optimal depth : ~20 blocks (60 layers) • (top-middle) an optimal bottleneck ratio 𝑏 is 1.0 → effectively removes the bottleneck • (top-right) an optimal width multiplier 𝑤 𝑚 is ~2.5 → similar but not identical to the popular recipe • (bottom) 𝑔, 𝑤 𝑎, 𝑤0 increase with complexity 18
  • 19. Analyzing the RegNetX Design Space • Complexity analysis • (top-left) activation : the size of the output tensors of all conv layers → activation can have a stronger correlation to runtime than flops • (bottom) complexity vs. flops 19
  • 20. Analyzing the RegNetX Design Space • RegNetX constrained • 𝑏 = 1, 𝑑 ≤ 40, 𝑤 𝑚 ≥ 2 & limit parameters and activations → fast, low-parameter, low-memory models without affecting accuracy • The constrained version is superior across all flop regimes 20
  • 21. Analyzing the RegNetX Design Space • Alternate design choices • (left) the inverted bottleneck degrades the EDF and depthwise conv performs even worse • (middle) a fixed resolution of 224x224 is best • SE (Squeeze-and-Excitation) • (right) X+SE=Y → RegNetY yields good gains 21
  • 22. Comparison to Existing Networks • RegNetX and RegNetY design spaces • The best model from 25 random settings of the RegNet parameters (𝑑, 𝑔, 𝑤 𝑚, 𝑤 𝑎, 𝑤0) • Re-train the top model 5 times at 100 epochs • The higher flop models • a large number of blocks in the third stage and a small number of blocks in the last stage • The group width 𝑔 increases with complexity, but depth 𝑑 saturates for large models 22
  • 23. Comparison to Existing Networks • RegNetX and RegNetY design spaces 23
  • 24. Comparison to Existing Networks • State-of-the-Art Comparison: Mobile Regime • The mobile regime : ~600MF • NO enhancements for RegNet, while most mobile networks use various enhancements 24
  • 25. Comparison to Existing Networks • Standard Baselines Comparison: ResNe(X)t 25
  • 26. Comparison to Existing Networks • State-of-the-Art Comparison: Full Regime 26
  • 27. 감 사 합 니 다 27