MobileNet V2: Inverted Residuals and Linear Bottlenecks
Mark Sandler et al., CVPR 2018
Presented by Pham Quang Khang
2018/8/18 Paper Reading Fest 20180819
Agenda
1. Motivation of research
2. Key components of MobileNet V2
a. Depthwise Separable Convolutions
b. Linear bottlenecks and inverted residual
c. Effect of linear bottlenecks and inverted residual
3. Architecture of MobileNet V2
4. Experiments and results
Convolutional Neural Networks
■ LeNet for handwritten character recognition
Yann LeCun, 1998
Evolution of ImageNet
■ 2012: AlexNet, the breakthrough that demonstrated the power of CNNs
– Conv layers: 48, 128, 192, 192, 128 channels (per GPU, from a 3-channel input)
– FC layers: 2048, 2048
■ 2014: VGG 19, the power of very deep networks
– Conv layers: 16 conv layers, all with 3x3 kernels
– FC layers: 4096, 4096
■ 2015: ResNet, a very, very deep network
– 152 layers of residual blocks with convs of various sizes
– No large FC layers
■ 2014 – 2016: Inception -> Inception v4, Inception + ResNet
■ Xception (CVPR 2017)
■ MobileNet, ShuffleNet => it is time for architectures that can fit on mobile devices
Computation power requirements
■ Previous architectures require massive amounts of memory and computational power
■ To run image classification or detection on mobile devices, it is essential to
create lighter models with sufficient accuracy
Model                   ImageNet Accuracy   Million Mult-Adds   Million Parameters
MobileNetV2             72.0%               300                 3.4
MobileNetV1             70.6%               569                 4.2
GoogleNet (Inception)   69.8%               1550                6.8
VGG 16                  71.5%               15300               138
Andrew G. Howard et al. 2017
Mark Sandler et al. 2018
Depthwise Separable Conv
■ Conventional conv: transforms a DF x DF x M input (spatial size DF, M
channels) into a DF x DF x N output, using a DK x DK x M x N kernel
– Cost to compute one output point: DK x DK x M
– Cost to compute the whole output: DK x DK x M x DF x DF x N
■ Conv = filtering + combination
■ New way: split into 2 steps, filtering then combination
– Depthwise conv (filtering): apply one DK x DK x 1 kernel per channel to get
an intermediate DF x DF x M output
Cost: DK x DK x M x DF x DF
– Pointwise conv (combination): use a 1 x 1 x M x N kernel to combine the
channels of the intermediate output into the final DF x DF x N output
Cost: M x DF x DF x N
– Total cost: DF x DF x M x (DK x DK + N)
– With DK = 3, the cost drops by a factor of roughly 8 to 9
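The cost comparison above can be sketched in a few lines of Python (the layer sizes below are illustrative, not taken from the paper):

```python
# Compare multiply-add counts of a standard conv vs. a depthwise
# separable conv, using the slide's notation.
def standard_conv_cost(DF, M, N, DK):
    # DK x DK x M kernel applied at each of DF x DF positions, for N filters
    return DK * DK * M * DF * DF * N

def separable_conv_cost(DF, M, N, DK):
    depthwise = DK * DK * M * DF * DF   # filtering: one kernel per channel
    pointwise = M * DF * DF * N         # combination: 1x1 conv across channels
    return depthwise + pointwise

DF, M, N, DK = 14, 256, 256, 3          # illustrative sizes
ratio = standard_conv_cost(DF, M, N, DK) / separable_conv_cost(DF, M, N, DK)
print(round(ratio, 2))                  # approaches DK*DK = 9 for large N
```

The ratio simplifies to DK²·N / (DK² + N), which is just under 9 when DK = 3 and N is large.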
Andrew G. Howard et al. 2017
ReLU and information loss
■ Manifold of interest: each activation tensor of dims hᵢ × wᵢ × dᵢ can be treated as
hᵢ × wᵢ pixels with dᵢ dimensions
■ If the manifold of interest can be embedded in a low-dimensional subspace, then
reducing the dimension of the layer would not cause information loss
■ Not so true with a non-linear transformation like ReLU:
– If the manifold of interest remains of non-zero volume after the ReLU transformation,
the transformation corresponds to a linear one
– ReLU is capable of preserving complete information about the input manifold, but
only if the input manifold lies in a low-dimensional subspace of the input space
=> Use linear bottleneck layers
Inverted Residuals and Linear Bottlenecks
■ Residual connections: improve the ability of gradients to propagate through the network
■ Inverted residuals: considerably more memory efficient
Kaiming He et al. 2015
Unit block of MobileNet V2
■ Combines depthwise separable convolutions, linear bottlenecks and the inverted
residual block
■ Computational cost per block:
h × w × d × t(d′ + k² + d)
■ With this, the input and output dimensions can be relatively small

Input            Operator                    Output
h × w × d        1x1 conv2d, ReLU6           h × w × (td)
h × w × td       3x3 dwise stride s, ReLU6   h/s × w/s × (td)
h/s × w/s × td   linear 1x1 conv2d           h/s × w/s × d′
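As a sanity check, the per-block cost formula can be reproduced by summing the costs of the three stages in the table (stride 1; the sizes here are hypothetical, chosen only for illustration):

```python
# Check that the per-block cost h*w*d*t*(d + k^2 + d') equals the sum of
# the costs of the three stages of the inverted residual block (stride 1).
def block_cost(h, w, d, d_out, t, k=3):
    expand  = h * w * d * (t * d)        # 1x1 conv: d -> t*d channels
    dwise   = h * w * (t * d) * k * k    # 3x3 depthwise on t*d channels
    project = h * w * (t * d) * d_out    # linear 1x1 conv: t*d -> d'
    return expand + dwise + project

# Hypothetical sizes for one block
h, w, d, d_out, t = 14, 14, 64, 96, 6
print(block_cost(h, w, d, d_out, t) == h * w * d * t * (d + 3 * 3 + d_out))  # True
```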
Inverted residual bottleneck for memory saving
■ Transformation function: F(x) = [A ∘ N ∘ B](x)
– A: linear transformation: R^(s×s×k) → R^(s×s×n)
– N: ReLU6 ∘ dwise ∘ ReLU6: R^(s×s×n) → R^(s′×s′×n)
– B: linear transformation: R^(s′×s′×n) → R^(s′×s′×k′)
■ Memory needed: |s²k| + |s′²k′| + O(max(s², s′²))
■ If the expansion layer can be separated into t tensors (whose concatenation
makes up the full tensor):
F(x) = Σᵢ₌₁ᵗ (Aᵢ ∘ N ∘ Bᵢ)(x)
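A rough way to see the saving: with the expansion split into t chunks, only the two bottleneck tensors plus one chunk of the expanded tensor need to be materialized at once. A toy calculation under that assumption (all sizes hypothetical):

```python
# Toy illustration of the memory bound |s^2 k| + |s'^2 k'| + O(max(s^2, s'^2)):
# splitting the expanded tensor into t chunks means only the input/output
# bottlenecks plus one slice of the expansion must be held at once.
def peak_memory(s, k, s_out, k_out, n, t):
    bottlenecks = s * s * k + s_out * s_out * k_out   # |s^2 k| + |s'^2 k'|
    chunk = max(s * s, s_out * s_out) * (n // t)      # one slice of the expansion
    return bottlenecks + chunk

full = peak_memory(14, 64, 14, 96, 384, 1)    # whole expansion at once
split = peak_memory(14, 64, 14, 96, 384, 4)   # expansion split into 4 chunks
print(split < full)  # True
```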
Architecture of the model
■ Each line is a sequence of 1 or more identical layers, repeated n times
■ c: number of output channels
■ The first layer of each sequence has stride s; all others use stride 1
■ All spatial convs use 3x3 kernels
■ t: expansion factor of the bottleneck layers
■ Input resolution: 96 to 224
■ A width multiplier can be used to make the model thinner
Input          Operator      t   c      n   s
224² × 3       conv2d        -   32     1   2
112² × 32      bottleneck    1   16     1   1
112² × 16      bottleneck    6   24     2   2
56² × 24       bottleneck    6   32     3   2
28² × 32       bottleneck    6   64     4   2
14² × 64       bottleneck    6   96     3   1
14² × 96       bottleneck    6   160    3   2
7² × 160       bottleneck    6   320    1   1
7² × 320       conv2d 1x1    -   1280   1   1
7² × 1280      avgpool 7x7   -   -      1   -
1 × 1 × 1280   conv2d 1x1    -   k      -   -
Keras code
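The original slide embedded its Keras snippet as an image, which did not survive extraction. As a stand-in, here is a NumPy sketch of a single inverted residual block's forward pass (stride 1, no batch norm, random weights; all function and variable names are hypothetical, not from the slide):

```python
# NumPy sketch of one inverted residual block: 1x1 expansion with ReLU6,
# 3x3 depthwise conv with ReLU6, linear 1x1 projection, plus the residual
# connection when input and output shapes match.
import numpy as np

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def inverted_residual(x, t=6, d_out=None, k=3, rng=np.random.default_rng(0)):
    h, w, d = x.shape
    d_out = d_out or d
    W_exp = rng.normal(size=(d, t * d)) * 0.1      # 1x1 expansion weights
    W_dw  = rng.normal(size=(k, k, t * d)) * 0.1   # depthwise 3x3 weights
    W_prj = rng.normal(size=(t * d, d_out)) * 0.1  # linear 1x1 projection

    y = relu6(x @ W_exp)                           # expand: h x w x (t*d)
    pad = k // 2
    yp = np.pad(y, ((pad, pad), (pad, pad), (0, 0)))
    z = np.zeros_like(y)
    for i in range(k):                             # depthwise conv, stride 1
        for j in range(k):
            z += yp[i:i + h, j:j + w, :] * W_dw[i, j, :]
    z = relu6(z)
    out = z @ W_prj                                # linear bottleneck, no ReLU
    if d_out == d:                                 # residual only when shapes match
        out = out + x
    return out

x = np.ones((8, 8, 16))
print(inverted_residual(x).shape)  # (8, 8, 16)
```

Note the projection is deliberately left linear: as argued on the linear-bottleneck slide, applying ReLU at the low-dimensional output would destroy information.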
ImageNet Classification
■ TensorFlow
■ RMSProp: decay and momentum of 0.9
■ Batch normalization after every layer
■ Weight decay of 0.00004
■ Initial learning rate 0.045
■ Learning rate decay of 0.98 per epoch
■ 16 GPUs
■ Batch size 96
Model                   ImageNet Accuracy   Million Mult-Adds   Million Parameters
MobileNetV2             72.0%               300                 3.4
MobileNetV1             70.6%               569                 4.2
GoogleNet (Inception)   69.8%               1550                6.8
VGG 16                  71.5%               15300               138
Comparison between models for mobile (ImageNet)
■ MobileNet, ShuffleNet, NasNet
■ MobileNetV2 with different input resolutions vs NasNet, MobileNetV1, ShuffleNet
Model               ImageNet Accuracy   Million Mult-Adds   Million Parameters
MobileNetV1         70.6%               575                 4.2
ShuffleNet (1.5)    71.5%               292                 3.4
ShuffleNet (x2)     73.7%               524                 5.4
NasNet-A            74.0%               564                 5.3
MobileNetV2         72.0%               300                 3.4
MobileNetV2 (1.4)   74.7%               585                 6.9
Object detection
■ Use MobileNet V2 as a feature extractor for object detection with a modified version of
Single Shot Detector (SSD) on the COCO dataset
■ Compared with YOLOv2 and the original SSD
■ SSDLite: replace all normal convs with separable convs in the SSD prediction layers
■ MNetV2 + SSDLite runs on a Pixel 1
Liu et al. 2016
Model               mAP    Params (Millions)   MAdds   CPU
SSD300              23.2   36.1                35.2B   -
SSD512              26.8   36.1                99.5B   -
YOLOv2              21.6   50.7                17.5B   -
MNet V1 + SSDLite   22.2   5.1                 1.3B    270ms
MNet V2 + SSDLite   22.1   4.3                 0.8B    200ms
Thank you for listening. Time for Q&A
More Related Content

What's hot

Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
MojammilHusain
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
Abdulrazak Zakieh
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
ananth
 
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
Edge AI and Vision Alliance
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
PyData
 
Cnn method
Cnn methodCnn method
Cnn method
AmirSajedi1
 
Artifical Neural Network and its applications
Artifical Neural Network and its applicationsArtifical Neural Network and its applications
Artifical Neural Network and its applications
Sangeeta Tiwari
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
Brodmann17
 
LeNet to ResNet
LeNet to ResNetLeNet to ResNet
LeNet to ResNet
Somnath Banerjee
 
Enabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationEnabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through Quantization
Qualcomm Research
 
Artificial Intelligence: Artificial Neural Networks
Artificial Intelligence: Artificial Neural NetworksArtificial Intelligence: Artificial Neural Networks
Artificial Intelligence: Artificial Neural Networks
The Integral Worm
 
07 regularization
07 regularization07 regularization
07 regularization
Ronald Teo
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
Shuai Zhang
 
Vgg
VggVgg
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural Networks
IRJET Journal
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
Knoldus Inc.
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearning
Abhishek Sharma
 
Model compression
Model compressionModel compression
Model compression
Nanhee Kim
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
Sungjoon Choi
 
Neural network
Neural networkNeural network
Neural network
Ramesh Giri
 

What's hot (20)

Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
Cnn method
Cnn methodCnn method
Cnn method
 
Artifical Neural Network and its applications
Artifical Neural Network and its applicationsArtifical Neural Network and its applications
Artifical Neural Network and its applications
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
 
LeNet to ResNet
LeNet to ResNetLeNet to ResNet
LeNet to ResNet
 
Enabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationEnabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through Quantization
 
Artificial Intelligence: Artificial Neural Networks
Artificial Intelligence: Artificial Neural NetworksArtificial Intelligence: Artificial Neural Networks
Artificial Intelligence: Artificial Neural Networks
 
07 regularization
07 regularization07 regularization
07 regularization
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Vgg
VggVgg
Vgg
 
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural Networks
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearning
 
Model compression
Model compressionModel compression
Model compression
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
Neural network
Neural networkNeural network
Neural network
 

Similar to CVPR 2018 Paper Reading MobileNet V2

PowerGraph
PowerGraphPowerGraph
PowerGraph
Igor Shevchenko
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning Techniques
IRJET Journal
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
Jason Riedy
 
Traffic Sign Recognition System
Traffic Sign Recognition SystemTraffic Sign Recognition System
Traffic Sign Recognition System
IRJET Journal
 
Sparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptxSparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptx
ssuser2624f71
 
IRJET- Design of Memristor based Multiplier
IRJET- Design of Memristor based MultiplierIRJET- Design of Memristor based Multiplier
IRJET- Design of Memristor based Multiplier
IRJET Journal
 
“Improving Power Efficiency for Edge Inferencing with Memory Management Optim...
“Improving Power Efficiency for Edge Inferencing with Memory Management Optim...“Improving Power Efficiency for Edge Inferencing with Memory Management Optim...
“Improving Power Efficiency for Edge Inferencing with Memory Management Optim...
Edge AI and Vision Alliance
 
CE1009_Implementation of Civil IoT Architecture.pdf
CE1009_Implementation  of Civil IoT Architecture.pdfCE1009_Implementation  of Civil IoT Architecture.pdf
CE1009_Implementation of Civil IoT Architecture.pdf
Chenkai Sun
 
Using Graphs for Feature Engineering_ Graph Reduce-2.pdf
Using Graphs for Feature Engineering_ Graph Reduce-2.pdfUsing Graphs for Feature Engineering_ Graph Reduce-2.pdf
Using Graphs for Feature Engineering_ Graph Reduce-2.pdf
Wes Madrigal
 
RECAP: The Simulation Approach
RECAP: The Simulation ApproachRECAP: The Simulation Approach
RECAP: The Simulation Approach
RECAP Project
 
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
Otávio Carvalho
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
RAHUL BHOJWANI
 
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
TELKOMNIKA JOURNAL
 
Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...
Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...
Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...
Leonardo ENERGY
 
QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...
QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...
QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...
nishimurashoji
 
IRJET- Single Precision Floating Point Arithmetic using VHDL Coding
IRJET-  	  Single Precision Floating Point Arithmetic using VHDL CodingIRJET-  	  Single Precision Floating Point Arithmetic using VHDL Coding
IRJET- Single Precision Floating Point Arithmetic using VHDL Coding
IRJET Journal
 
オープンハウスにおける 機械学習・データサイエンスの 取り組みについて
オープンハウスにおける機械学習・データサイエンスの取り組みについてオープンハウスにおける機械学習・データサイエンスの取り組みについて
オープンハウスにおける 機械学習・データサイエンスの 取り組みについて
Teito Nakagawa
 
Computational steering Interactive Design-through-Analysis for Simulation Sci...
Computational steering Interactive Design-through-Analysis for Simulation Sci...Computational steering Interactive Design-through-Analysis for Simulation Sci...
Computational steering Interactive Design-through-Analysis for Simulation Sci...
SURFevents
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 
J010224750
J010224750J010224750
J010224750
IOSR Journals
 

Similar to CVPR 2018 Paper Reading MobileNet V2 (20)

PowerGraph
PowerGraphPowerGraph
PowerGraph
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning Techniques
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Traffic Sign Recognition System
Traffic Sign Recognition SystemTraffic Sign Recognition System
Traffic Sign Recognition System
 
Sparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptxSparse Graph Attention Networks 2021.pptx
Sparse Graph Attention Networks 2021.pptx
 
IRJET- Design of Memristor based Multiplier
IRJET- Design of Memristor based MultiplierIRJET- Design of Memristor based Multiplier
IRJET- Design of Memristor based Multiplier
 
“Improving Power Efficiency for Edge Inferencing with Memory Management Optim...
“Improving Power Efficiency for Edge Inferencing with Memory Management Optim...“Improving Power Efficiency for Edge Inferencing with Memory Management Optim...
“Improving Power Efficiency for Edge Inferencing with Memory Management Optim...
 
CE1009_Implementation of Civil IoT Architecture.pdf
CE1009_Implementation  of Civil IoT Architecture.pdfCE1009_Implementation  of Civil IoT Architecture.pdf
CE1009_Implementation of Civil IoT Architecture.pdf
 
Using Graphs for Feature Engineering_ Graph Reduce-2.pdf
Using Graphs for Feature Engineering_ Graph Reduce-2.pdfUsing Graphs for Feature Engineering_ Graph Reduce-2.pdf
Using Graphs for Feature Engineering_ Graph Reduce-2.pdf
 
RECAP: The Simulation Approach
RECAP: The Simulation ApproachRECAP: The Simulation Approach
RECAP: The Simulation Approach
 
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...
 
Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...
Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...
Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...
 
QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...
QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...
QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...
 
IRJET- Single Precision Floating Point Arithmetic using VHDL Coding
IRJET-  	  Single Precision Floating Point Arithmetic using VHDL CodingIRJET-  	  Single Precision Floating Point Arithmetic using VHDL Coding
IRJET- Single Precision Floating Point Arithmetic using VHDL Coding
 
オープンハウスにおける 機械学習・データサイエンスの 取り組みについて
オープンハウスにおける機械学習・データサイエンスの取り組みについてオープンハウスにおける機械学習・データサイエンスの取り組みについて
オープンハウスにおける 機械学習・データサイエンスの 取り組みについて
 
Computational steering Interactive Design-through-Analysis for Simulation Sci...
Computational steering Interactive Design-through-Analysis for Simulation Sci...Computational steering Interactive Design-through-Analysis for Simulation Sci...
Computational steering Interactive Design-through-Analysis for Simulation Sci...
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
J010224750
J010224750J010224750
J010224750
 

Recently uploaded

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 

Recently uploaded (20)

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
CVPR 2018 Paper Reading: MobileNet V2
Depthwise Separable Conv
■ Conventional conv: transforms a DF x DF x M input (spatial size DF, M channels) into a DF x DF x N output, using a DK x DK x M x N kernel
– Cost to compute one output point: DK x DK x M
– Cost to compute the whole output: DK x DK x M x DF x DF x N
■ A convolution = filtering + combination
■ New approach: split it into two steps, filtering then combination
– Depthwise conv (filtering): apply one DK x DK x 1 kernel per channel to get an intermediate DF x DF x M output
Cost: DK x DK x M x DF x DF
– Pointwise conv (combination): apply a 1 x 1 x M x N kernel to combine the channels of the intermediate output into the final DF x DF x N output
Cost: M x DF x DF x N
– Total cost: DF x DF x M x (DK x DK + N)
– With DK = 3, the cost drops by roughly a factor of 8-9
Andrew G. Howard et al. 2017
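The cost formulas above can be checked with a short sketch (function names are illustrative, not from the paper):

```python
def standard_conv_cost(df, m, n, dk):
    """Mult-adds for a standard conv: DK*DK*M per output point,
    over DF*DF*N output points."""
    return dk * dk * m * df * df * n

def separable_conv_cost(df, m, n, dk):
    """Depthwise pass (DK*DK*M*DF*DF) plus pointwise pass (M*DF*DF*N)."""
    depthwise = dk * dk * m * df * df
    pointwise = m * df * df * n
    return depthwise + pointwise

# Example: 56x56 feature map, 128 -> 256 channels, 3x3 kernel
std = standard_conv_cost(56, 128, 256, 3)
sep = separable_conv_cost(56, 128, 256, 3)
ratio = std / sep  # close to 9 when DK = 3 and N is large
```

The exact ratio is (DK^2 * N) / (DK^2 + N), which approaches DK^2 = 9 as N grows.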
ReLU and information loss
■ Manifold of interest: each activation tensor of dimensions h_i x w_i x d_i can be treated as h_i x w_i pixels with d_i dimensions
■ If the manifold of interest can be embedded in a low-dimensional subspace, reducing the dimension of the layer would not cause information loss
■ This is not quite true with a non-linear transformation like ReLU:
– If the manifold of interest has non-zero volume after the ReLU transformation, ReLU acts on it as a linear transformation
– ReLU is capable of preserving complete information about the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space
=> Use linear bottleneck layers
Inverted Residuals and Linear Bottlenecks
■ Residual connections: improve the ability of the gradient to propagate across layers
■ Inverted residuals: considerably more memory efficient
Kaiming He et al. 2015
Unit block of MobileNet V2
■ Combines depthwise separable convolutions, linear bottlenecks and the inverted residual block
■ Computational cost per block: h x w x d x t x (d' + k^2 + d)
■ With this, the input and output dimensions can be kept relatively small

Input          | Operator                   | Output
h x w x d      | 1x1 conv2d, ReLU6          | h x w x (t*d)
h x w x t*d    | 3x3 dwise, stride s, ReLU6 | h/s x w/s x (t*d)
h/s x w/s x t*d | linear 1x1 conv2d         | h/s x w/s x d'
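Summing the three stages in the table recovers the per-block cost expression above; a minimal sketch (variable names follow the slide, the function itself is illustrative):

```python
def bottleneck_block_cost(h, w, d, d_out, t, k=3, s=1):
    """Mult-adds for one inverted-residual block:
    1x1 expansion:        h*w*d * t*d
    k x k depthwise:      (h/s)*(w/s)*(t*d) * k*k
    1x1 linear project:   (h/s)*(w/s)*(t*d) * d_out
    For s = 1 this reduces to h*w*d*t*(d + k^2 + d_out)."""
    expand = h * w * d * (t * d)
    dwise = (h // s) * (w // s) * (t * d) * k * k
    project = (h // s) * (w // s) * (t * d) * d_out
    return expand + dwise + project

# stride-1 case matches the closed form on the slide
h, w, d, d_out, t, k = 14, 14, 64, 64, 6, 3
cost = bottleneck_block_cost(h, w, d, d_out, t, k)
```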
Inverted residual bottleneck for memory saving
■ Transformation function: F(x) = [A ∘ N ∘ B](x)
– A: linear transformation R^(s x s x k) -> R^(s x s x n)
– N: ReLU6 ∘ dwise ∘ ReLU6: R^(s x s x n) -> R^(s' x s' x n)
– B: linear transformation R^(s' x s' x n) -> R^(s' x s' x k')
■ Memory needed: s^2 * k + s'^2 * k' + O(max(s^2, s'^2))
■ If the expansion layer can be split into t tensors (whose concatenation makes up the full tensor): F(x) = Σ_{i=1..t} [A_i ∘ N ∘ B_i](x)
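The memory argument can be made concrete with a toy accounting function. This is an illustrative model of the slide's bound, not the paper's exact analysis: the two bottleneck tensors must be held in full, while the expanded inner tensor only needs one of its t chunks resident at a time.

```python
def block_memory(s, s_out, k, k_out, n, t=1):
    """Toy memory count: input bottleneck (s^2*k) + output bottleneck
    (s'^2*k') + one of t chunks of the n-channel expanded tensor."""
    inner = max(s * s, s_out * s_out) * -(-n // t)  # ceil(n / t) channels resident
    return s * s * k + s_out * s_out * k_out + inner

full = block_memory(14, 14, 24, 24, 144, t=1)
split = block_memory(14, 14, 24, 24, 144, t=144)
# splitting the expansion shrinks the dominant inner term
```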
Architecture of the model
■ Each line in the table is a sequence of 1 or more identical layers, repeated n times
■ c: number of output channels
■ The first layer of each sequence has stride s; all others use stride 1
■ All spatial convs use 3x3 kernels
■ t: expansion factor of the bottleneck layer
■ Input resolution: 96 to 224
■ A width multiplier can be used to obtain thinner models

Input         | Operator    | t | c    | n | s
224^2 x 3     | conv2d      | - | 32   | 1 | 2
112^2 x 32    | bottleneck  | 1 | 16   | 1 | 1
112^2 x 16    | bottleneck  | 6 | 24   | 2 | 2
56^2 x 24     | bottleneck  | 6 | 32   | 3 | 2
28^2 x 32     | bottleneck  | 6 | 64   | 4 | 2
14^2 x 64     | bottleneck  | 6 | 96   | 3 | 1
14^2 x 96     | bottleneck  | 6 | 160  | 3 | 2
7^2 x 160     | bottleneck  | 6 | 320  | 1 | 1
7^2 x 320     | conv2d 1x1  | - | 1280 | 1 | 1
7^2 x 1280    | avgpool 7x7 | - | -    | 1 | -
1 x 1 x 1280  | conv2d 1x1  | - | k    | - | -
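As a sanity check on the table, a short (illustrative) loop can trace the spatial resolution and channel count row by row. Only the first layer in each repeated sequence strides, so dividing once by s per row is enough:

```python
# Each row: (operator, expansion t, out channels c, repeats n, stride s)
layers = [
    ("conv2d",     None,   32, 1, 2),
    ("bottleneck", 1,      16, 1, 1),
    ("bottleneck", 6,      24, 2, 2),
    ("bottleneck", 6,      32, 3, 2),
    ("bottleneck", 6,      64, 4, 2),
    ("bottleneck", 6,      96, 3, 1),
    ("bottleneck", 6,     160, 3, 2),
    ("bottleneck", 6,     320, 1, 1),
    ("conv2d 1x1", None, 1280, 1, 1),
]

res, ch = 224, 3
for op, t, c, n, s in layers:
    res //= s  # only the first layer of the sequence strides
    ch = c
# res, ch now describe the tensor entering the 7x7 average pool
```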
Keras code
(The original slide shows a screenshot of a Keras implementation.)
ImageNet Classification
■ TensorFlow
■ RMSProp with decay and momentum of 0.9
■ Batchnorm after every layer
■ Weight decay of 0.00004
■ Initial learning rate 0.045, decayed by 0.98 per epoch
■ 16 GPUs, batch size 96

Model                 | ImageNet Accuracy | Million Mult-Adds | Million Parameters
MobileNetV2           | 72.0%             | 300               | 3.4
MobileNetV1           | 70.6%             | 569               | 4.2
GoogleNet (Inception) | 69.8%             | 1550              | 6.8
VGG 16                | 71.5%             | 15300             | 138
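The reported schedule (initial rate 0.045, multiplied by 0.98 after each epoch) is a simple exponential decay; a minimal sketch:

```python
def learning_rate(epoch, base=0.045, decay=0.98):
    """Per-epoch exponential decay, as described on the slide."""
    return base * decay ** epoch

lr0 = learning_rate(0)    # 0.045
lr10 = learning_rate(10)  # noticeably lower after 10 epochs
```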
Comparison between models for mobile (ImageNet)
■ MobileNet, ShuffleNet, NasNet
■ MobileNetV2 with different input resolutions vs NasNet, MobileNetV1, ShuffleNet

Model             | ImageNet Accuracy | Million Mult-Adds | Million Parameters
MobileNetV1       | 70.6%             | 575               | 4.2
ShuffleNet (1.5)  | 71.5%             | 292               | 3.4
ShuffleNet (x2)   | 73.7%             | 524               | 5.4
NasNet-A          | 74.0%             | 564               | 5.3
MobileNetV2       | 72.0%             | 300               | 3.4
MobileNetV2 (1.4) | 74.7%             | 585               | 6.9
Object detection
■ MobileNet V2 used as the feature extractor for object detection, with a modified version of the Single Shot Detector (SSD) on the COCO dataset
■ Compared with YOLOv2 and the original SSD
■ SSDLite: replace all normal convs with separable convs in the SSD prediction layers
■ MNetV2 + SSDLite runtime measured on a Pixel 1 phone
Liu et al. 2016

Model             | mAP  | Params (Millions) | Mult-Adds | CPU time
SSD300            | 23.2 | 36.1              | 35.2B     | -
SSD512            | 26.8 | 36.1              | 99.5B     | -
YOLOv2            | 21.6 | 50.7              | 17.5B     | -
MNet V1 + SSDLite | 22.2 | 5.1               | 1.3B      | 270ms
MNet V2 + SSDLite | 22.1 | 4.3               | 0.8B      | 200ms
Thank you for listening. Time for Q&A