Slides for a paper reading session of the VietNam AI Community in Japan.
An explanation of MobileNetV2: Inverted Residuals and Linear Bottlenecks, a paper from CVPR 2018.
The document summarizes improvements made in MobileNetV3 models, including using complementary search techniques to find efficient building blocks, introducing more efficient nonlinearities such as h-swish, and improving expensive layers through techniques like removing unnecessary projections. It also describes experiments showing MobileNetV3 models achieving better performance than V1/V2 models on tasks like image classification, object detection, and semantic segmentation while maintaining high efficiency for mobile applications.
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Jinwon Lee)
This is a review of paper #169 from the TensorFlow-KR paper reading group (PR12).
The paper covered this time is EfficientNet, published by Google. Research on efficient neural networks has usually focused on small networks for edge devices with limited computing power, such as mobile phones. To improve accuracy, however, networks are generally grown larger and larger; this paper studies how to scale a network up in the most efficient way. Please see the video for details.
Paper link: https://arxiv.org/abs/1905.11946
Video link: https://youtu.be/Vhz0quyvR7I
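The compound scaling rule at the heart of EfficientNet can be sketched in a few lines of Python. The coefficients below (alpha=1.2, beta=1.1, gamma=1.15) are the ones reported in the paper, found by a small grid search under the constraint alpha * beta^2 * gamma^2 ≈ 2; the function name is ours:

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for compound coefficient phi.

    Depth, width, and resolution are scaled together as alpha**phi, beta**phi,
    gamma**phi, so total FLOPs grow roughly by (alpha * beta**2 * gamma**2)**phi,
    i.e. about 2**phi with the default coefficients.
    """
    return alpha ** phi, beta ** phi, gamma ** phi

# phi = 0 is the EfficientNet-B0 baseline; each +1 in phi roughly doubles FLOPs.
d, w, r = compound_scale(2)
print(round(d, 2), round(w, 2), round(r, 2))  # 1.44 1.21 1.32
```

The point of the rule is that all three dimensions grow together, instead of only making the network deeper or only wider.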
The document summarizes research on developing efficient convolutional neural network architectures called MobileNets that are well-suited for mobile and embedded vision applications. The key ideas are using depthwise separable convolutions to factorize standard convolutions and using a width multiplier and resolution multiplier to control model size. Experiments show MobileNets achieve higher accuracy and speed than prior mobile networks on image classification and object detection tasks while having a smaller memory footprint.
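The saving from factorizing a standard convolution into a depthwise and a pointwise step is easy to verify with a back-of-the-envelope calculation; a sketch in plain Python (the layer sizes below are illustrative, not taken from the paper's tables):

```python
def conv_cost(dk, m, n, df):
    """Multiply-adds for a standard convolution: DK*DK * M * N * DF*DF
    (kernel DKxDK, M input channels, N output channels, DFxDF feature map)."""
    return dk * dk * m * n * df * df

def separable_cost(dk, m, n, df):
    """Depthwise (DK*DK * M * DF*DF) plus pointwise 1x1 (M * N * DF*DF) multiply-adds."""
    return dk * dk * m * df * df + m * n * df * df

# A typical mid-network layer: 3x3 kernel, 512 in/out channels, 14x14 feature map.
std = conv_cost(3, 512, 512, 14)
sep = separable_cost(3, 512, 512, 14)
print(round(std / sep, 1))  # about 8.8x fewer operations
```

The ratio works out to 1 / (1/N + 1/DK^2), so with a 3x3 kernel the saving approaches 9x as the number of output channels grows.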
Introduction to Convolutional Neural Networks (Hannes Hapke)
This document provides an introduction to machine learning using convolutional neural networks (CNNs) for image classification. It discusses how to prepare image data, build and train a simple CNN model using Keras, and optimize training using GPUs. The document outlines steps to normalize image sizes, convert images to matrices, save data formats, assemble a CNN in Keras including layers, compilation, and fitting. It provides resources for learning more about CNNs and deep learning frameworks like Keras and TensorFlow.
Recursive neural networks (RNNs) were developed to model recursive structures like images, sentences, and phrases. RNNs construct feature representations recursively from components. Later models like recursive autoencoders (RAEs), matrix-vector RNNs (MV-RNNs), and recursive neural tensor networks (RNTNs) improved on RNNs by handling unlabeled data, incorporating different composition rules, and reducing parameters. These recursive models achieved strong performance on tasks like image segmentation, sentiment analysis, and paraphrase detection.
Slides from Portland Machine Learning meetup, April 13th.
Abstract: You've heard all the cool tech companies are using them, but what are Convolutional Neural Networks (CNNs) good for and what is convolution anyway? For that matter, what is a Neural Network? This talk will include a look at some applications of CNNs, an explanation of how CNNs work, and what the different layers in a CNN do. There's no explicit background required so if you have no idea what a neural network is that's ok.
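Since the abstract asks "what is convolution anyway?", here is a minimal sketch in plain Python of the sliding-window operation a convolutional layer performs (strictly speaking, cross-correlation, which is what deep learning frameworks implement under that name):

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution: slide the kernel over the image and take
    a weighted sum at every position (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# A [-1, 1] filter fires exactly where intensity jumps left-to-right,
# which is how early CNN layers come to act as edge detectors.
img = [[0, 0, 1, 1]] * 4
edge = [[-1, 1]]
print(conv2d(img, edge))  # every row is [0, 1, 0]
```

In a real CNN the kernel values are not hand-chosen like this; they are learned by gradient descent.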
Convolutional neural networks (CNNs / ConvNets) are a machine learning algorithm and a core part of computer vision, used for image classification, image detection, digit recognition, and many more tasks. https://technoelearn.com
This presentation covers the applications of CNNs, a quick review of neural networks and their drawbacks, the convolution process, padding, striding, convolution over volume, the types of layers in a CNN, the max pool layer, the fully connected layer, and lastly the famous CNNs: LeNet-5, AlexNet, VGG-16, ResNet, and GoogLeNet.
In this presentation we discuss the convolution operation, the architecture of a convolutional neural network, and different layers such as pooling. This presentation draws heavily from Andrej Karpathy's Stanford course CS 231n.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/09/introduction-to-dnn-model-compression-techniques-a-presentation-from-xailient/
Sabina Pokhrel, Customer Success AI Engineer at Xailient, presents the “Introduction to DNN Model Compression Techniques” tutorial at the May 2021 Embedded Vision Summit.
Embedding real-time large-scale deep learning vision applications at the edge is challenging due to their huge computational, memory, and bandwidth requirements. System architects can mitigate these demands by modifying deep-neural networks to make them more energy efficient and less demanding of processing resources by applying various model compression approaches.
In this talk, Pokhrel provides an introduction to four established techniques for model compression. She discusses network pruning, quantization, knowledge distillation and low-rank factorization compression approaches.
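Of the four techniques, network pruning is the simplest to illustrate. A minimal sketch of magnitude-based pruning, which zeroes the smallest weights on the assumption they contribute least (real schemes vary in criteria, granularity, and retraining schedule):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitudes."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

In practice pruning is usually followed by fine-tuning, and the resulting sparse tensors only save time on hardware or kernels that exploit sparsity.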
Transfer Learning and Fine-tuning Deep Neural Networks (PyData)
This document outlines Anusua Trivedi's talk on transfer learning and fine-tuning deep neural networks. The talk covers traditional machine learning versus deep learning, using deep convolutional neural networks (DCNNs) for image analysis, transfer learning and fine-tuning DCNNs, recurrent neural networks (RNNs), and case studies applying these techniques to diabetic retinopathy prediction and fashion image caption generation.
This document provides an overview of convolutional neural networks (CNNs). It describes that CNNs are a type of deep learning model used in computer vision tasks. The key components of a CNN include convolutional layers that extract features, pooling layers that reduce spatial size, and fully-connected layers at the end for classification. Convolutional layers apply learnable filters in a local receptive field, while pooling layers perform downsampling. The document outlines common CNN architectures, such as types of layers, hyperparameters like stride and padding, and provides examples to illustrate how CNNs work.
Deep learning based object detection basics (Brodmann17)
The document discusses different approaches to object detection in images using deep learning. It begins with describing detection as classification, where an image is classified into categories for what objects are present. It then discusses approaches that involve separating detection into a classification head and localization head. The document also covers improvements like R-CNN which uses region proposals to first generate candidate object regions before running classification and bounding box regression on those regions using CNN features. This helps address issues with previous approaches like being too slow when running the CNN over the entire image at multiple locations and scales.
This document summarizes the evolution of convolutional neural networks from LeNet in 1998 to ResNet in 2015. It describes key networks like AlexNet, VGG, GoogleNet, and ResNet and their contributions to improving accuracy on tasks like the ImageNet challenge. The networks progressed from LeNet's basic convolutional layers to deeper networks enabled by techniques like dropout, ReLU activations, and residual connections, leading to substantially improved accuracy over time.
Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. Plus, to make AI truly ubiquitous, networks need to run on the end device within a tight power and thermal budget. One approach to help address these issues is quantization, which attempts to reduce the number of bits used for weight parameters and activation calculations without sacrificing model accuracy. This presentation covers: why quantization is important, existing quantization challenges, Qualcomm AI Research's existing quantization research, and how developers and researchers can take advantage of quantization on Qualcomm Snapdragon.
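The core idea of quantization can be sketched in a few lines. The 8-bit affine (asymmetric) scheme below is a common textbook formulation, not Qualcomm's specific implementation:

```python
def quantize(x, num_bits=8):
    """Affine quantization of a list of floats to unsigned integers in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(x), max(x)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in x]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Map the integer codes back to approximate float values."""
    return [v * scale + lo for v in q]

vals = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, lo = quantize(vals)
approx = dequantize(q, scale, lo)
err = max(abs(a - b) for a, b in zip(vals, approx))
# Round-tripping loses at most about half a quantization step (~0.004 here).
print(err <= scale / 2 + 1e-9)  # True
```

Weights stored as 8-bit integers take a quarter of the memory of 32-bit floats, and integer arithmetic is typically cheaper and lower-power than floating point on edge hardware.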
Artificial Intelligence: Artificial Neural Networks (The Integral Worm)
This document summarizes artificial neural networks (ANN), which were inspired by biological neural networks in the human brain. ANNs consist of interconnected computational units that emulate neurons and pass signals to other units through connections with variable weights. ANNs are arranged in layers and learn by modifying the weights between units based on input and output data to minimize error. Common ANN algorithms include backpropagation for supervised learning to predict outputs from inputs.
This document discusses various regularization techniques for deep learning models. It defines regularization as any modification to a learning algorithm intended to reduce generalization error without affecting training error. It then describes several specific regularization methods, including weight decay, norm penalties, dataset augmentation, early stopping, dropout, adversarial training, and tangent propagation. The goal of regularization is to reduce overfitting and improve generalizability of deep learning models.
The document discusses convolutional neural networks (CNNs). It begins with an introduction and overview of CNN components like convolution, ReLU, and pooling layers. Convolution layers apply filters to input images to extract features, ReLU introduces non-linearity, and pooling layers reduce dimensionality. CNNs are well-suited for image data since they can incorporate spatial relationships. The document provides an example of building a CNN using TensorFlow to classify handwritten digits from the MNIST dataset.
This document discusses very deep convolutional networks for large-scale image recognition. It describes network configurations that use 3x3 convolutional filters with max pooling layers and fully connected layers. The networks have between 11 and 19 weight layers, and some configurations use 1x1 convolutional filters to introduce additional nonlinearity. Classification experiments on ImageNet data with over 1 million training images achieve competitive top-1 and top-5 error rates.
Handwritten Digit Recognition using Convolutional Neural Networks (IRJET Journal)
This document discusses using a convolutional neural network called LeNet to perform handwritten digit recognition on the MNIST dataset. It begins with an abstract that outlines using LeNet, a type of convolutional network, to accurately classify handwritten digits from 0 to 9. It then provides background on convolutional networks and how they can extract and utilize features from images to classify patterns with translation and scaling invariance. The document implements LeNet using the Keras deep learning library in Python to classify images from the MNIST dataset, which contains labeled images of handwritten digits. It analyzes the architecture of LeNet and how convolutional and pooling layers are used to extract features that are passed to fully connected layers for classification.
Introduction to Recurrent Neural Networks (Knoldus Inc.)
The document provides an introduction to recurrent neural networks (RNNs). It discusses how RNNs differ from feedforward neural networks in that they have internal memory and can use their output from the previous time step as input. This allows RNNs to process sequential data like time series. The document outlines some common RNN types and explains the vanishing gradient problem that can occur in RNNs due to multiplication of small gradient values over many time steps. It discusses solutions to this problem like LSTMs and techniques like weight initialization and gradient clipping.
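The vanishing gradient problem described above can be demonstrated numerically. A minimal sketch for a sigmoid activation around zero, where its derivative peaks at 0.25 (the scalar recurrence is a simplification of a real RNN):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gradient_through_time(w, steps, x=0.0):
    """Backpropagated gradient after `steps` time steps: a product of
    per-step factors w * sigmoid'(x), where sigmoid'(x) = s * (1 - s) <= 0.25."""
    s = sigmoid(x)
    factor = w * s * (1.0 - s)
    g = 1.0
    for _ in range(steps):
        g *= factor
    return g

# With |w * sigmoid'| < 1 the gradient decays exponentially in sequence length.
print(gradient_through_time(w=1.0, steps=20))  # 0.25**20, roughly 9.1e-13
```

LSTMs avoid this by routing the gradient through an additive cell state instead of a repeated multiplication, and gradient clipping addresses the mirror-image exploding case.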
This document discusses convolutional neural networks (CNNs). It explains that CNNs were inspired by research on the human visual system and take a similar approach to teach computers to identify objects in images. The document outlines the key components of CNNs, including convolutional and pooling layers to extract features from images, as well as fully connected layers to classify objects. It also notes that CNNs take pixel data as input and use many examples to generalize and make predictions, similar to how humans learn visual recognition.
Model Compression (NanheeKim)
@NanheeKim @nh9k
Please feel free to contact me anytime if you have any questions!
These slides were prepared based on my own study.
Sources are listed at the end of the slides!
github: https://github.com/nh9k
email: kimnanhee97@gmail.com
This document provides an overview of convolutional neural networks and summarizes four popular CNN architectures: AlexNet, VGG, GoogLeNet, and ResNet. It explains that CNNs are made up of convolutional and subsampling layers for feature extraction followed by dense layers for classification. It then briefly describes key aspects of each architecture like ReLU activation, inception modules, residual learning blocks, and their performance on image classification tasks.
The document discusses neural networks, including human neural networks and artificial neural networks (ANNs). It provides details on the key components of ANNs, such as the perceptron and backpropagation algorithm. ANNs are inspired by biological neural systems and are used for applications like pattern recognition, time series prediction, and control systems. The document also outlines some current uses of neural networks in areas like signal processing, anomaly detection, and soft sensors.
PowerGraph is a distributed graph processing system that is well-suited for analyzing large natural graphs. It introduces a vertex-cut partitioning approach that distributes vertices across machines rather than edges. This addresses limitations of previous systems when processing graphs with power-law degree distributions. PowerGraph uses a Gather-Apply-Scatter decomposition that allows vertex programs to be parallelized across machines. It also employs techniques like delta caching to optimize performance. Evaluation on real-world graphs demonstrated reduced communication, runtime, and storage compared to previous systems.
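The Gather-Apply-Scatter decomposition can be illustrated with PageRank, PowerGraph's usual running example. A minimal single-machine, synchronous sketch (PowerGraph itself distributes the gather across machines via its vertex-cuts; all names here are ours):

```python
def pagerank_gas(edges, n, iters=20, d=0.85):
    """PageRank in Gather-Apply-Scatter style.
    edges: list of (src, dst) pairs; n: vertex count.
    Assumes every vertex has at least one outgoing edge."""
    rank = [1.0 / n] * n
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1
    for _ in range(iters):
        # Gather: accumulate rank/out_degree over incoming edges
        # (this per-edge sum is what PowerGraph parallelizes across machines).
        acc = [0.0] * n
        for s, t in edges:
            acc[t] += rank[s] / out_deg[s]
        # Apply: update each vertex from its accumulated value.
        rank = [(1 - d) / n + d * a for a in acc]
        # Scatter would activate neighbours for the next round; omitted here,
        # since this synchronous sketch re-runs every vertex each iteration.
    return rank

# Tiny 3-cycle: by symmetry all ranks converge to 1/3.
r = pagerank_gas([(0, 1), (1, 2), (2, 0)], n=3)
print([round(v, 3) for v in r])  # [0.333, 0.333, 0.333]
```

The decomposition matters for power-law graphs because the gather over a high-degree vertex's millions of edges can be split across machines, which an edge-partitioned system cannot do.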
Garbage Classification Using Deep Learning Techniques (IRJET Journal)
The document discusses using deep learning techniques for garbage classification. It compares the performance of different models, including support vector machines with HOG features, simple convolutional neural networks (CNNs), CNNs with residual blocks, and a hybrid model combining CNN features with HOG features. The CNN models generally performed best, with the simple CNN achieving over 93% accuracy on test data. Residual blocks did not significantly improve performance over simple CNNs. Combining CNN and HOG features was also considered but did not clearly outperform CNNs alone. Overall, CNN models were shown to effectively classify garbage using these image datasets.
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis (Jason Riedy)
Applications in many areas analyze an ever-changing environment. On billion-vertex graphs, providing snapshots imposes a large performance cost. We propose the first formal model for graph analysis running concurrently with streaming data updates. We consider an algorithm valid if its output is correct for the initial graph plus some implicit subset of the concurrent changes. We show theoretical properties of the model, demonstrate it on various algorithms, and extend it to updating results incrementally.
This document presents a traffic sign recognition system using a convolutional neural network (CNN) model. The authors train the CNN model on a German traffic sign dataset containing over 50,000 images across 43 classes. The proposed CNN architecture contains 4 VGGNet blocks with convolutional, max pooling, dropout and batch normalization layers. The model is trained for 45 epochs and achieves 96.9% accuracy and 11.4% test loss on the test set, outperforming other baseline models. The trained CNN model can accurately classify traffic sign images to assist with applications like self-driving cars.
This document summarizes a research paper on sparse graph attention networks (SGATs). SGATs apply an attention mechanism to only a subset of neighbors for each node to improve the scalability and memory efficiency of graph attention networks. The key ideas are a sparse attention mechanism using techniques like neighbor sampling and a binary gate attached to each edge. SGATs show advantages in scalability, memory usage, and performance on disassortative graphs by removing up to 80% of edges while maintaining classification accuracy. Evaluation on synthetic and real-world graphs demonstrates SGATs can identify and remove noisy edges.
IRJET: Design of Memristor-based Multiplier (IRJET Journal)
This document describes the design of a 4-bit multiplier circuit using memristors. It begins with an introduction to memristors and their advantages over CMOS technology. It then discusses the different window functions that can be used in memristor models and selects the Biolek window function. The document implements 2-bit and 4-bit array multiplier circuits using memristor-CMOS hybrid logic gates. It analyzes the results in LTspice and finds improvements in area and component count compared to traditional CMOS and other memristor-based designs. The document concludes that memristors can help reduce the area of multiplier circuits.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/02/improving-power-efficiency-for-edge-inferencing-with-memory-management-optimizations-a-presentation-from-samsung/
Nathan Levy, Project Leader at Samsung, presents the “Improving Power Efficiency for Edge Inferencing with Memory Management Optimizations” tutorial at the September 2020 Embedded Vision Summit.
In the race to power efficiency for neural network processing, optimizing memory use to reduce data traffic is critical. Many processors have a small local memory (typically SRAM) used as a scratch pad which can be used to reduce the expensive data traffic to and from a big remote memory (e.g., DRAM). The specific structure of neural networks allows for advanced optimization techniques to optimize the use of the local memory.
In this presentation, Levy describes the key aspects of memory management optimization for neural networks along with the trade-offs that must be managed in light of the processor architecture and the details of the network. In addition, he shows the importance of tailoring the memory management approach to the specific network, illustrated by analysis of a case study.
CE1009_Implementation of Civil IoT Architecture.pdfChenkai Sun
Billions of interconnected Internet-of-Things (IoT) devices collect huge amounts of real-time data. However, this massive stream of data presents technical challenges for processing and analysis and the digital gap between urban and rural areas is also a critical consideration. A powerful platform is crucial to cost-effectively and efficiently process such massive collections of messages. This work introduces the civil IoT architecture in Taiwan including the dedicated B20 spectrum, backbone network facilities, and a scalable data platform. The proposed system operates in Taiwan for IoT applications with real cases. In the experiment, we demonstrate the performance of signal coverage, throughput, real-time query and visualization, and a monitoring mechanism. The results showed that the presented architecture is efficient and effective for dealing with IoT scenarios in a cost-effective approach.
Using Graphs for Feature Engineering_ Graph Reduce-2.pdfWes Madrigal
GraphReduce is a solution for feature engineering on graph-structured enterprise data for machine learning. It represents tables as nodes in a graph and foreign keys as edges to flatten large datasets. It defines abstractions like cut dates and consideration periods to orient data in time. Nodes can be parameterized for primary keys, dates, file formats, and compute functions. This allows rapid development, testing, and deployment of feature pipelines across many tables for machine learning models. It was successfully used by FreightVerify to build daily updated models for supply chain monitoring from billions of events across over 20 tables.
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...Otávio Carvalho
This document summarizes research into distributing the workload of an IoT smart grid application between edge and cloud computing resources. The researchers implemented a three-layer architecture with sensors, edge nodes (Raspberry Pis), and cloud VMs. Their evaluation found that edge processing achieved higher throughput than cloud alone by reducing data sent to the cloud. Moving more workload to edge nodes and aggregating data at the edge also improved scalability. Future work could explore adaptive scheduling and evolving the architecture for general IoT applications.
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
This is an Image Semantic Segmentation project targeted on Satellite Imagery. The goal was to detect the pixel-wise segmentation map for various objects in Satellite Imagery including buildings, water bodies, roads etc. The data for this was taken from the Kaggle competition <https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection>.
We implemented FCN, U-Net and Segnet Deep learning architectures for this task.
Stochastic Computing Correlation Utilization in Convolutional Neural Network ...TELKOMNIKA JOURNAL
In recent years, many applications have been implemented in embedded systems and mobile Internet of Things (IoT) devices that typically have constrained resources, smaller power budget, and exhibit "smartness" or intelligence. To implement computation-intensive and resource-hungry Convolutional Neural Network (CNN) in this class of devices, many research groups have developed specialized parallel accelerators using Graphical Processing Units (GPU), Field-Programmable Gate Arrays (FPGA), or Application-Specific Integrated Circuits (ASIC). An alternative computing paradigm called Stochastic Computing (SC) can implement CNN with low hardware footprint and power consumption. To enable building more efficient SC CNN, this work incorporates the CNN basic functions in SC that exploit correlation, share Random Number Generators (RNG), and is more robust to rounding error. Experimental results show our proposed solution provides significant savings in hardware footprint and increased accuracy for the SC CNN basic functions circuits compared to previous work.
Overview of the FlexPlan project. Focus on EU regulatory analysis and TSO-DSO...Leonardo ENERGY
Webinar recording at https://youtu.be/4s2GGlu-ylc
The FlexPlan project (https://flexplan-project.eu/) aims at establishing a new grid planning methodology making use of storage and flexible loads as an alternative to the build-up of new grid elements. After introducing the project, the webinar will focus on pan-European grid planning regulation and present practices of TSOs and DSOs.
QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and...nishimurashoji
Presentation Slides at SIGMOD 2017
Talk video: https://www.youtube.com/watch?v=dHNsZnjwgww
My talk starts around 1:21:45.
Paper: https://dl.acm.org/citation.cfm?id=3035934&CFID=1010432390&CFTOKEN=34002366
IRJET- Single Precision Floating Point Arithmetic using VHDL CodingIRJET Journal
The document describes a VHDL implementation of single precision floating point arithmetic operations using an FPGA. It begins with an introduction to floating point arithmetic and FPGAs. It then discusses related work on floating point implementations and the IEEE 754 single precision format. The proposed algorithm and block diagram for a single precision floating point adder are presented. Simulation results demonstrating addition, subtraction, multiplication and division are also shown. The implementation of single precision floating point arithmetic using VHDL coding allows for low-cost and reprogrammable hardware. The design was synthesized using Xilinx tools and implemented on a Virtex-7 FPGA.
1. The document compares three models for predicting urban land prices: geographically weighted regression (GWR), hedonic regression, and boosted trees (XGBoost).
2. The results show that XGBoost had the highest percentage of predictions within 5%, 10%, and 20% error compared to the actual prices.
3. However, all three models still have limitations, such as only using Euclidean distance and not fully capturing local spatial effects. Improving the data quality and expanding the models could help increase prediction accuracy further.
Computational steering Interactive Design-through-Analysis for Simulation Sci...SURFevents
The document discusses computational steering and interactive design-through-analysis. It provides a vision of a unified computational framework that allows for rapid prototyping and accurate analysis of engineering designs. This framework would combine physics-informed machine learning for initial design exploration with isogeometric analysis for detailed analysis and optimization. The document then demonstrates some of the key concepts behind isogeometric analysis, including its use of B-spline basis functions to represent geometry, solutions, and right-hand sides, as well as its formulation as an abstract linear system.
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
This document summarizes a research paper that proposes a distributed Canny edge detection algorithm with the following key points:
1. The algorithm divides an input image into overlapping blocks that can be processed independently and in parallel to reduce memory requirements, latency, and increase throughput compared to the original Canny algorithm.
2. A novel method is proposed for calculating hysteresis thresholds based on an 8-bin non-uniform quantized gradient magnitude histogram to reduce computational complexity compared to previous methods.
3. An FPGA architecture is presented for implementing the proposed distributed Canny algorithm, along with simulation results demonstrating it can process an image 16 times faster than the original Canny algorithm with no loss in performance.
This document summarizes the design of a single edge triggered D flip flop using the Gate Diffusion Input (GDI) technique to reduce power consumption. GDI allows implementing logic functions using fewer transistors which can reduce power and delay. The proposed D flip flop uses a master-slave configuration with 4 GDI cells and samples data on the falling clock edge. Simulation results for a 180nm process show the GDI design uses 18 transistors, has average power of 1.77uw and maximum delay of 5.48ps, providing power reductions of 40% over conventional designs. Therefore, the GDI technique is suitable for low power applications requiring high performance.
Similar to CVPR 2018 Paper Reading MobileNet V2 (20)
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
CVPR 2018 Paper Reading MobileNet V2
1. MobileNet V2: Inverted Residuals and Linear Bottlenecks
Mark Sandler et al., CVPR 2018
Pham Quang Khang
2018/8/18 Paper Reading Fest 20180819 1
2. Agenda
1. Motivation of research
2. Key components of MobileNet V2
a. Depthwise Separable Convolutions
b. Linear bottlenecks and inverted residual
c. Effect of linear bottlenecks and inverted residual
3. Architecture of MobileNet V2
4. Experiments and results
3. Agenda
1. Motivation of research
2. Key components of MobileNet V2
a. Depthwise Separable Convolutions
b. Linear bottlenecks and inverted residual
c. Effect of linear bottlenecks and inverted residual
3. Architecture of MobileNet V2
4. Experiments and results
5. Evolution of ImageNet models
■ 2012: AlexNet, the major debut for the power of CNNs
– Conv channels (per GPU stream): 48, 128, 192, 192, 128 on a 3-channel input
– FC layers (per GPU stream): 2048, 2048
■ 2014: VGG-19, the power of very deep networks
– Conv layers: 16 conv3×3 (19 weight layers in total)
– FC layers: 4096, 4096
■ 2015: ResNet, a very, very deep network
– 152 layers of residual blocks with various conv widths
– No large FC layers
■ 2014–2016: Inception → Inception v4, Inception + ResNet
■ Xception (CVPR 2017)
■ MobileNet, ShuffleNet ⇒ it is time for architectures that can fit on mobile devices
6. Computation power requirements
■ Previous architectures required massive amounts of memory and computational power
■ To run image classification or detection on mobile devices, lighter models with sufficient accuracy are a must
Model | ImageNet Accuracy | Million Mult-Adds | Million Parameters
MobileNetV2 | 72.0% | 300 | 3.4
MobileNetV1 | 70.6% | 569 | 4.2
GoogleNet (Inception) | 69.8% | 1550 | 6.8
VGG 16 | 71.5% | 15300 | 138
Andrew G. Howard et al. 2017
Mark Sandler et al. 2018
7. Agenda
1. Motivation of research
2. Key components of MobileNet V2
a. Depthwise Separable Convolutions
b. Linear bottlenecks and inverted residual
c. Effect of linear bottlenecks and inverted residual
3. Architecture of MobileNet V2
4. Experiments and results
8. Depthwise Separable Conv
■ Conventional conv: transforms a DF × DF × M input (spatial size DF, M channels) into a DF × DF × N output, using a DK × DK × M × N kernel
– Cost to compute one output point: DK × DK × M
– Cost to compute the whole output: DK × DK × M × DF × DF × N
■ Conv = filtering + combination
■ New way: split into 2 steps, filtering then combination
– Depthwise conv (filtering): apply one DK × DK × 1 kernel per channel to first get an intermediate DF × DF × M output
Cost: DK × DK × M × DF × DF
– Pointwise conv (combination): use a 1 × 1 × M × N kernel to combine the channels of the intermediate output into the final DF × DF × N output
Cost: M × DF × DF × N
– Total cost: DF × DF × M × (DK × DK + N)
– With DK = 3, cost drops by roughly 8–9×
Andrew G. Howard et al. 2017
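The cost formulas above are easy to sanity-check in code. A quick sketch (not from the slides; the layer sizes are hypothetical examples):

```python
def standard_conv_cost(df, dk, m, n):
    """Mult-adds of a standard conv: DK*DK*M multiplications per output
    point, and DF*DF*N output points."""
    return dk * dk * m * df * df * n

def separable_conv_cost(df, dk, m, n):
    """Depthwise cost (DK*DK*M*DF*DF) plus pointwise cost (M*DF*DF*N)."""
    return dk * dk * m * df * df + m * df * df * n

# Hypothetical layer: 112x112 feature map, 3x3 kernel, 32 -> 64 channels
df, dk, m, n = 112, 3, 32, 64
ratio = standard_conv_cost(df, dk, m, n) / separable_conv_cost(df, dk, m, n)
print(round(ratio, 2))  # 9*64/(9+64) ~ 7.89, approaching 9x as N grows
```

The ratio simplifies to (DK² · N)/(DK² + N), which is why the slide's "around 9 times" holds for DK = 3 and large N.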
9. ReLU and information loss
■ Manifold of interest: each activation tensor of dims h_i × w_i × d_i can be treated as h_i × w_i pixels with d_i dimensions
■ If the manifold of interest can be embedded in a low-dimensional subspace, reducing the dimension of the layer should not cause information loss
■ Not so true with a non-linear transformation like ReLU:
– If the manifold of interest remains non-zero volume after the ReLU transformation, ReLU acts on it as a linear transformation
– ReLU is capable of preserving complete information about the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space
Use linear bottleneck layers
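A toy experiment (my own sketch, in the spirit of the paper's figure, not taken from the slides) makes the loss concrete: embed 2-D points into n dimensions with a random matrix, apply ReLU, and count points whose every coordinate gets zeroed, i.e. points ReLU destroys completely. With a low-dimensional embedding this happens often; with a high-dimensional one it almost never does:

```python
import random

random.seed(0)

def avg_fraction_lost(n, trials=200, points=500):
    """Average fraction of 2-D points that are fully zeroed after a random
    linear embedding into n dims followed by ReLU (all n coords <= 0)."""
    lost = 0.0
    for _ in range(trials):
        pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(points)]
        T = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
        zeroed = sum(
            1 for (x1, x2) in pts
            if all(a * x1 + b * x2 <= 0 for (a, b) in T)
        )
        lost += zeroed / points
    return lost / trials

print(avg_fraction_lost(2))   # roughly a quarter of the points are wiped out
print(avg_fraction_lost(30))  # essentially none
```

This is only the crudest failure mode (total zeroing); the paper's argument also covers the partial distortion ReLU causes, but the trend is the same: the wider the embedding, the more of the manifold survives, which motivates expanding before ReLU and projecting back with a linear bottleneck.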
10. Inverted Residuals and Linear Bottlenecks
■ Residual connections: improve the ability of gradients to propagate through deep networks
■ Inverted residuals (shortcuts between the thin bottleneck layers): considerably more memory efficient
Kaiming He et al. 2015
11. Unit block of MobileNet V2
■ Combines depthwise separable convolutions, linear bottlenecks and the inverted residual block
■ Computational cost per block: h × w × d × t × (d′ + k² + d)
■ With this structure, input and output dimensions can be relatively small
Input | Operator | Output
h × w × d | 1×1 conv2d, ReLU6 | h × w × td
h × w × td | 3×3 dwise (stride s), ReLU6 | h/s × w/s × td
h/s × w/s × td | linear 1×1 conv2d | h/s × w/s × d′
12. Inverted residual bottleneck for memory saving
■ Transformation function: F(x) = [A ∘ N ∘ B](x)
– A: linear transformation (expansion): ℝ^(s×s×k) → ℝ^(s×s×n)
– N: ReLU6 ∘ dwise ∘ ReLU6: ℝ^(s×s×n) → ℝ^(s′×s′×n)
– B: linear transformation (projection): ℝ^(s′×s′×n) → ℝ^(s′×s′×k′)
■ Memory needed: |s² · k| + |s′² · k′| + O(max(s², s′²))
■ If the expansion layer can be separated into t tensors (whose concatenation makes up the full tensor):
F(x) = Σᵢ₌₁ᵗ (Aᵢ ∘ N ∘ Bᵢ)(x)
13. Agenda
1. Motivation of research
2. Key components of MobileNet V2
a. Depthwise Separable Convolutions
b. Linear bottlenecks and inverted residual
c. Effect of linear bottlenecks and inverted residual
3. Architecture of MobileNet V2
4. Experiments and results
14. Architecture of the model
■ Each line is a sequence of 1 or more identical layers, repeated n times
■ c: number of output channels
■ The first layer of each sequence has stride s; all others use stride 1
■ All spatial convs use 3×3 kernels
■ t: expansion factor of the bottleneck layer
■ Input resolution: 96 to 224
■ A width multiplier can be used for thinner models
Input | Operator | t | c | n | s
224² × 3 | conv2d | - | 32 | 1 | 2
112² × 32 | bottleneck | 1 | 16 | 1 | 1
112² × 16 | bottleneck | 6 | 24 | 2 | 2
56² × 24 | bottleneck | 6 | 32 | 3 | 2
28² × 32 | bottleneck | 6 | 64 | 4 | 2
14² × 64 | bottleneck | 6 | 96 | 3 | 1
14² × 96 | bottleneck | 6 | 160 | 3 | 2
7² × 160 | bottleneck | 6 | 320 | 1 | 1
7² × 320 | conv2d 1×1 | - | 1280 | 1 | 1
7² × 1280 | avgpool 7×7 | - | - | 1 | -
1 × 1 × 1280 | conv2d 1×1 | - | k | - | -
16. Agenda
1. Motivation of research
2. Key components of MobileNet V2
a. Depthwise Separable Convolutions
b. Linear bottlenecks and inverted residual
c. Effect of linear bottlenecks and inverted residual
3. Architecture of MobileNet V2
4. Experiments and results
17. ImageNet Classification
■ TensorFlow
■ RMSProp with decay and momentum of 0.9
■ Batch normalization after every layer
■ Weight decay of 0.00004
■ Initial learning rate of 0.045
■ Learning rate decay of 0.98 per epoch
■ 16 GPUs
■ Batch size of 96
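The learning-rate schedule above works out to a simple exponential decay; a minimal sketch (function name is mine, constants are from the slide):

```python
def learning_rate(epoch, base_lr=0.045, decay=0.98):
    """Per-epoch exponential schedule: lr is multiplied by 0.98 each epoch."""
    return base_lr * decay ** epoch

print(learning_rate(0))              # 0.045 at the start of training
print(round(learning_rate(100), 4))  # decayed to roughly 0.006 by epoch 100
```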
Model | ImageNet Accuracy | Million Mult-Adds | Million Parameters
MobileNetV2 | 72.0% | 300 | 3.4
MobileNetV1 | 70.6% | 569 | 4.2
GoogleNet (Inception) | 69.8% | 1550 | 6.8
VGG 16 | 71.5% | 15300 | 138
18. Comparison between models for mobile (ImageNet)
■ MobileNet, ShuffleNet, NasNet
■ MobileNetV2 with different input resolutions vs NasNet, MobileNetV1, ShuffleNet
Model | ImageNet Accuracy | Million Mult-Adds | Million Parameters
MobileNetV1 | 70.6% | 575 | 4.2
ShuffleNet (1.5) | 71.5% | 292 | 3.4
ShuffleNet (×2) | 73.7% | 524 | 5.4
NasNet-A | 74.0% | 564 | 5.3
MobileNetV2 | 72.0% | 300 | 3.4
MobileNetV2 (1.4) | 74.7% | 585 | 6.9
19. Object detection
■ MobileNet V2 used as the feature extractor for object detection with a modified version of the Single Shot Detector (SSD) on the COCO dataset
■ Compared with YOLOv2 and the original SSD
■ SSDLite: replace all normal convs with separable convs in the SSD prediction layers
■ MNetV2 + SSDLite timings measured on a Pixel 1 phone
Liu et al. 2016
Model | mAP (Ave. Precision) | Params (Millions) | MAdd | CPU
SSD300 | 23.2 | 36.1 | 35.2B | -
SSD512 | 26.8 | 36.1 | 99.5B | -
YOLOv2 | 21.6 | 50.7 | 17.5B | -
MNet V1 + SSDLite | 22.2 | 5.1 | 1.3B | 270ms
MNet V2 + SSDLite | 22.1 | 4.3 | 0.8B | 200ms
20. Thank you for listening. Time for Q&A