SlideShare a Scribd company logo
1 of 27
Download to read offline
MobileNetV2:
Inverted Residuals and Linear Bottlenecks
7th October, 2018
PR12 Paper Review
Jinwon Lee
Samsung Electronics
Mark Sandler, et al., “MobileNetV2: Inverted Residuals and Linear
Bottlenecks”, CVPR 2018
Related Papers in PR12
• Andrew G. Howard, et al., ” MobileNets: Efficient Convolutional
Neural Networks for MobileVision Applications”
 PR-044 : https://youtu.be/7UoOFKcyIvM
• Xiangyu Zhang, et al., “ShuffleNet: An Extremely Efficient
Convolutional Neural Network for Mobile Devices”
 PR-056 : https://youtu.be/pNuBdj53Hbc
Introduction & RelatedWork
• Modern state of the art networks require high computational
resources beyond the capabilities of many mobile and embedded
applications
• Recently opened up a new direction of bringing optimization
methods including genetic algorithms and reinforcement learning to
architectural search. However one drawback is that the resulting
networks end up very complex.
Recap – MobileNetV1
• Depthwise Separable Convolution (depthwise conv + pointwise conv)
Depthwise
convolution
Pointwise
convolution
Depthwise
convolution
Pointwise
convolution
Figures from https://eli.thegreenplace.net/2018/depthwise-separable-convolutions-for-machine-learning/
http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
Recap – MobileNetV1
• Standard convolution takes an ℎ𝑖 × 𝑤𝑖 × 𝑑𝑖 input tensor 𝐿𝑖, and
applies convolutional kernel 𝐾 ∈ 𝑅 𝑘×𝑘×𝑑 𝑖×𝑑 𝑗 to produce an ℎ𝑖 ×
𝑤𝑖 × 𝑑𝑗 output tensor 𝐿𝑗 .
• Standard convolutions have the computational cost of
 ℎ𝑖 × 𝑤𝑖 × 𝑑𝑗 × 𝑘 × 𝑘 × 𝑑𝑖
• Depthwise separable convolutions cost
 ℎ𝑖 × 𝑤𝑖 × 𝑑𝑖 × (𝑘2 + 𝑑𝑗)
• Reduction in computations
 ൗ1
𝑑 𝑗
+ Τ1
𝑘2
 If we use 3x3 depthwise separable convolutions, we get between 8 to 9 times
less computations
Recap – MobileNetV1
Recap – MobileNetV1
• Width Multiplier & Resolution Multiplier
 For a given layer and width multiplier α, the number of input channels M
becomes αM and the number of output channels N becomes αN – where α
with typical settings of 1, 0.75, 0.5 and 0.25
 The second hyper-parameter to reduce the computational cost of a neural
network is a resolution multiplier ρ
Key Features
• Depthwise Separable Convolutions
• Linear Bottlenecks
• Inverted Residuals
Linear Bottlenecks
• Informally, for an input set of real images, we say that the set of layer
activations forms a “manifold of interest”
• It has been long assumed that manifolds of interest in neural
networks could be embedded in low-dimensional subspaces
Linear Bottlenecks
• The authors have highlighted two properties that are indicative of
the requirement that the manifold of interest should lie in a low-
dimensional subspace of the higher-dimensional activation space
1. If the manifold of interest remains non-zero volume after ReLU
transformation, it corresponds to a linear transformation.
2. ReLU is capable of preserving complete information about the input
manifold, but only if the input manifold lies in a low-dimensional subspace
of the input space.
• Assuming the manifold of interest is low-dimensional we can capture
this by inserting linear bottleneck layers into the convolutional blocks
Residual Blocks
• Residual blocks connect the beginning and end of a convolutional
block with a shortcut connection. By adding these two states the
network has the opportunity of accessing earlier activations that
weren’t modified in the convolutional block.
• wide  narrow(bottleneck)  wide approach
Inverted Residuals
• Inspired by the intuition that the bottlenecks actually contain all the
necessary information, while an expansion layer acts merely as an
implementation detail that accompanies a non-linear transformation
of the tensor, the authors use shortcuts directly between the
bottlenecks
• narrow  wide  narrow approach
Bottleneck Residual Blocks
• The basic building block is a bottleneck depth-separable convolution
with residuals
Figures from http://machinethink.net/blog/mobilenet-v2/
# of Operations
• For a block of size ℎ × 𝑤, expansion factor 𝑡 and kernel size 𝑘 with 𝑑′
input channels and 𝑑′′ output channels,
• The total number of multiply add required is
ℎ × 𝑤 × 𝑡 × 𝑑′
× 𝑑′
+ ℎ × 𝑤 × 𝑡 × 𝑑′
× 𝑘 × 𝑘 × 𝑑′
+ ℎ × 𝑤 × 𝑑′′ × 𝑡 × 𝑑′
= ℎ ∙ 𝑤 ∙ 𝑑′ ∙ 𝑡(𝑑′ + 𝑘2 + 𝑑′′)
Information Flow Interpretation
• The proposed convolutional block has a unique property that allows
to separate the network expressiveness (encoded by expansion
layers) from its capacity (encoded by bottleneck inputs).
Figures from http://machinethink.net/blog/mobilenet-v2/
The Architecture of MobileNetV2
• The architecture of MobileNetV2
contains the initial fully convolution
layer with 32 filters, followed by 19
residual bottleneck layers described in
theTable 2.
Trade-off Hyper Parameters
• Input Resolution
 From 96 to 224
• Width Multiplier
 From 0.35 to 1.4
Memory Efficient Inference
• The amount of memory is simply the maximum total size of
combined inputs and outputs across all operations.
• If we treat a bottleneck residual block as a single operation (and treat
inner convolution as a disposable tensor), the total amount of
memory would be dominated by the size of bottleneck tensors,
rather than the size of tensors that are internal to bottleneck (and
much larger)
Bottleneck Residual Block – ATrick for Reducing
Memory Requirement
• A bottleneck block operator 𝐹(𝑥) can be expressed as a composition of
three operators 𝐹 𝑥 = 𝐴 ∘ 𝑁 ∘ 𝐵 𝑥
• 𝐴, 𝐵 is a linear transform, 𝑁 = 𝑅𝑒𝐿𝑈6 ∘ 𝑑𝑤𝑖𝑠𝑒 ∘ 𝑅𝑒𝐿𝑈6
• 𝐴: 𝑅 𝑠×𝑠×𝑘 → 𝑅 𝑠×𝑠×𝑛, 𝑁: 𝑅 𝑠×𝑠×𝑛 → 𝑅 𝑠′×𝑠′×𝑛, 𝐵: 𝑅 𝑠′×𝑠′×𝑘 → 𝑅 𝑠′×𝑠′×𝑛
• The algorithm is based on the fact that the inner tensor 𝐼 can be
represented as concatenation of 𝑡 tensors, of size 𝑛/𝑡 each and our
function can then be represented as 𝐹 𝑥 = σ𝑖=1
𝑡
(𝐴𝑖 ∘ 𝑁 ∘ 𝐵𝑖)(𝑥)
• Using 𝑛 = 𝑡 we end up having to keep only a single channel of the
intermediate representation at all times
• The authors find that this approach is the most helpful to be used with 𝑡
being a small constant between 2 and 5
 Too small 𝑡 causes cache misses
Comparison of Convolutional Blocks for Different
Architectures
The Max Number of Channels/Memory(in Kb)
ImageNet Classification Results
Vs MobileNetV1
Data from http://machinethink.net/blog/mobilenet-v2/
Object Detection – SSDLite
• Replacing all the regular convolutions with separable convolutions
(depthwise followed by 1 × 1 projection) in SSD prediction layers.
Object Detection Results
Semantic Segmentation Results
The Impact of Non-linearities andVariousTypes
of Shortcut Connections

More Related Content

What's hot

CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2Khang Pham
 
Deep Generative Models
Deep Generative ModelsDeep Generative Models
Deep Generative ModelsMijung Kim
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networksananth
 
Introduction to Autoencoders
Introduction to AutoencodersIntroduction to Autoencoders
Introduction to AutoencodersYan Xu
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkKnoldus Inc.
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural NetworksAshray Bhandare
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetSungminYou
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoderJun Lang
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Basit Rafiq
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsKasun Chinthaka Piyarathna
 
Batch normalization presentation
Batch normalization presentationBatch normalization presentation
Batch normalization presentationOwin Will
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 

What's hot (20)

Recurrent neural network
Recurrent neural networkRecurrent neural network
Recurrent neural network
 
MobileNet V3
MobileNet V3MobileNet V3
MobileNet V3
 
CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2
 
Deep Generative Models
Deep Generative ModelsDeep Generative Models
Deep Generative Models
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
Introduction to Autoencoders
Introduction to AutoencodersIntroduction to Autoencoders
Introduction to Autoencoders
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoder
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
 
Batch normalization presentation
Batch normalization presentationBatch normalization presentation
Batch normalization presentation
 
Lstm
LstmLstm
Lstm
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 

Similar to PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks

A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksRimzim Thube
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignJinwon Lee
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...Jinwon Lee
 
MODELLING AND SIMULATION OF 128-BIT CROSSBAR SWITCH FOR NETWORK -ONCHIP
MODELLING AND SIMULATION OF 128-BIT CROSSBAR SWITCH FOR NETWORK -ONCHIPMODELLING AND SIMULATION OF 128-BIT CROSSBAR SWITCH FOR NETWORK -ONCHIP
MODELLING AND SIMULATION OF 128-BIT CROSSBAR SWITCH FOR NETWORK -ONCHIPVLSICS Design
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054Jinwon Lee
 
Wits presentation 6_28072015
Wits presentation 6_28072015Wits presentation 6_28072015
Wits presentation 6_28072015Beatrice van Eden
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platformsSyed Zaid Irshad
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...taeseon ryu
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paperprreiya
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxssuser2624f71
 
Handwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPTHandwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPTRishabhTyagi48
 
Implementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adderImplementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adderVLSICS Design
 
Intel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIntel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIlya Kryukov
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”Dr.(Mrs).Gethsiyal Augasta
 

Similar to PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks (20)

A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural Networks
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
 
MODELLING AND SIMULATION OF 128-BIT CROSSBAR SWITCH FOR NETWORK -ONCHIP
MODELLING AND SIMULATION OF 128-BIT CROSSBAR SWITCH FOR NETWORK -ONCHIPMODELLING AND SIMULATION OF 128-BIT CROSSBAR SWITCH FOR NETWORK -ONCHIP
MODELLING AND SIMULATION OF 128-BIT CROSSBAR SWITCH FOR NETWORK -ONCHIP
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054
 
Machine Learning @NECST
Machine Learning @NECSTMachine Learning @NECST
Machine Learning @NECST
 
Wits presentation 6_28072015
Wits presentation 6_28072015Wits presentation 6_28072015
Wits presentation 6_28072015
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platforms
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paper
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
 
Handwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPTHandwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPT
 
Implementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adderImplementation of an arithmetic logic using area efficient carry lookahead adder
Implementation of an arithmetic logic using area efficient carry lookahead adder
 
Intel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIntel Cluster Poisson Solver Library
Intel Cluster Poisson Solver Library
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 
wsn
wsnwsn
wsn
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
 

More from Jinwon Lee

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sJinwon Lee
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...Jinwon Lee
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionJinwon Lee
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)Jinwon Lee
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorJinwon Lee
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...Jinwon Lee
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsJinwon Lee
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...Jinwon Lee
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsJinwon Lee
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionJinwon Lee
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksJinwon Lee
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter SharingEfficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter SharingJinwon Lee
 
PVANet - PR033
PVANet - PR033PVANet - PR033
PVANet - PR033Jinwon Lee
 

More from Jinwon Lee (20)

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020s
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter SharingEfficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter Sharing
 
PVANet - PR033
PVANet - PR033PVANet - PR033
PVANet - PR033
 

Recently uploaded

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Recently uploaded (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks

  • 1. MobileNetV2: Inverted Residuals and Linear Bottlenecks 7th October, 2018 PR12 Paper Review Jinwon Lee Samsung Electronics Mark Sandler, et al., “MobileNetV2: Inverted Residuals and Linear Bottlenecks”, CVPR 2018
  • 2. Related Papers in PR12 • Andrew G. Howard, et al., ” MobileNets: Efficient Convolutional Neural Networks for MobileVision Applications”  PR-044 : https://youtu.be/7UoOFKcyIvM • Xiangyu Zhang, et al., “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices”  PR-056 : https://youtu.be/pNuBdj53Hbc
  • 3. Introduction & RelatedWork • Modern state of the art networks require high computational resources beyond the capabilities of many mobile and embedded applications • Recently opened up a new direction of bringing optimization methods including genetic algorithms and reinforcement learning to architectural search. However one drawback is that the resulting networks end up very complex.
  • 4. Recap – MobileNetV1 • Depthwise Separable Convolution (depthwise conv + pointwise conv) Depthwise convolution Pointwise convolution Depthwise convolution Pointwise convolution Figures from https://eli.thegreenplace.net/2018/depthwise-separable-convolutions-for-machine-learning/ http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
  • 5. Recap – MobileNetV1 • Standard convolution takes an ℎ𝑖 × 𝑤𝑖 × 𝑑𝑖 input tensor 𝐿𝑖, and applies convolutional kernel 𝐾 ∈ 𝑅 𝑘×𝑘×𝑑 𝑖×𝑑 𝑗 to produce an ℎ𝑖 × 𝑤𝑖 × 𝑑𝑗 output tensor 𝐿𝑗 . • Standard convolutions have the computational cost of  ℎ𝑖 × 𝑤𝑖 × 𝑑𝑗 × 𝑘 × 𝑘 × 𝑑𝑖 • Depthwise separable convolutions cost  ℎ𝑖 × 𝑤𝑖 × 𝑑𝑖 × (𝑘2 + 𝑑𝑗) • Reduction in computations  ൗ1 𝑑 𝑗 + Τ1 𝑘2  If we use 3x3 depthwise separable convolutions, we get between 8 to 9 times less computations
  • 7. Recap – MobileNetV1 • Width Multiplier & Resolution Multiplier  For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN – where α with typical settings of 1, 0.75, 0.5 and 0.25  The second hyper-parameter to reduce the computational cost of a neural network is a resolution multiplier ρ
  • 8. Key Features • Depthwise Separable Convolutions • Linear Bottlenecks • Inverted Residuals
  • 9. Linear Bottlenecks • Informally, for an input set of real images, we say that the set of layer activations forms a “manifold of interest” • It has been long assumed that manifolds of interest in neural networks could be embedded in low-dimensional subspaces
  • 10. Linear Bottlenecks • The authors have highlighted two properties that are indicative of the requirement that the manifold of interest should lie in a low- dimensional subspace of the higher-dimensional activation space 1. If the manifold of interest remains non-zero volume after ReLU transformation, it corresponds to a linear transformation. 2. ReLU is capable of preserving complete information about the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space. • Assuming the manifold of interest is low-dimensional we can capture this by inserting linear bottleneck layers into the convolutional blocks
  • 11. Residual Blocks • Residual blocks connect the beginning and end of a convolutional block with a shortcut connection. By adding these two states the network has the opportunity of accessing earlier activations that weren’t modified in the convolutional block. • wide  narrow(bottleneck)  wide approach
  • 12. Inverted Residuals • Inspired by the intuition that the bottlenecks actually contain all the necessary information, while an expansion layer acts merely as an implementation detail that accompanies a non-linear transformation of the tensor, the authors use shortcuts directly between the bottlenecks • narrow  wide  narrow approach
  • 13. Bottleneck Residual Blocks • The basic building block is a bottleneck depth-separable convolution with residuals Figures from http://machinethink.net/blog/mobilenet-v2/
  • 14. # of Operations • For a block of size ℎ × 𝑤, expansion factor 𝑡 and kernel size 𝑘 with 𝑑′ input channels and 𝑑′′ output channels, • The total number of multiply add required is ℎ × 𝑤 × 𝑡 × 𝑑′ × 𝑑′ + ℎ × 𝑤 × 𝑡 × 𝑑′ × 𝑘 × 𝑘 × 𝑑′ + ℎ × 𝑤 × 𝑑′′ × 𝑡 × 𝑑′ = ℎ ∙ 𝑤 ∙ 𝑑′ ∙ 𝑡(𝑑′ + 𝑘2 + 𝑑′′)
  • 15. Information Flow Interpretation • The proposed convolutional block has a unique property that allows to separate the network expressiveness (encoded by expansion layers) from its capacity (encoded by bottleneck inputs). Figures from http://machinethink.net/blog/mobilenet-v2/
  • 16. The Architecture of MobileNetV2 • The architecture of MobileNetV2 contains the initial fully convolution layer with 32 filters, followed by 19 residual bottleneck layers described in theTable 2.
  • 17. Trade-off Hyper Parameters • Input Resolution  From 96 to 224 • Width Multiplier  From 0.35 to 1.4
  • 18. Memory Efficient Inference • The amount of memory is simply the maximum total size of combined inputs and outputs across all operations. • If we treat a bottleneck residual block as a single operation (and treat inner convolution as a disposable tensor), the total amount of memory would be dominated by the size of bottleneck tensors, rather than the size of tensors that are internal to bottleneck (and much larger)
  • 19. Bottleneck Residual Block – ATrick for Reducing Memory Requirement • A bottleneck block operator 𝐹(𝑥) can be expressed as a composition of three operators 𝐹 𝑥 = 𝐴 ∘ 𝑁 ∘ 𝐵 𝑥 • 𝐴, 𝐵 is a linear transform, 𝑁 = 𝑅𝑒𝐿𝑈6 ∘ 𝑑𝑤𝑖𝑠𝑒 ∘ 𝑅𝑒𝐿𝑈6 • 𝐴: 𝑅 𝑠×𝑠×𝑘 → 𝑅 𝑠×𝑠×𝑛, 𝑁: 𝑅 𝑠×𝑠×𝑛 → 𝑅 𝑠′×𝑠′×𝑛, 𝐵: 𝑅 𝑠′×𝑠′×𝑘 → 𝑅 𝑠′×𝑠′×𝑛 • The algorithm is based on the fact that the inner tensor 𝐼 can be represented as concatenation of 𝑡 tensors, of size 𝑛/𝑡 each and our function can then be represented as 𝐹 𝑥 = σ𝑖=1 𝑡 (𝐴𝑖 ∘ 𝑁 ∘ 𝐵𝑖)(𝑥) • Using 𝑛 = 𝑡 we end up having to keep only a single channel of the intermediate representation at all times • The authors find that this approach is the most helpful to be used with 𝑡 being a small constant between 2 and 5  Too small 𝑡 causes cache misses
  • 20. Comparison of Convolutional Blocks for Different Architectures
  • 21. The Max Number of Channels/Memory(in Kb)
  • 23. Vs MobileNetV1 Data from http://machinethink.net/blog/mobilenet-v2/
  • 24. Object Detection – SSDLite • Replacing all the regular convolutions with separable convolutions (depthwise followed by 1 × 1 projection) in SSD prediction layers.
  • 27. The Impact of Non-linearities andVariousTypes of Shortcut Connections