2. Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
Yonsei University Severance Hospital CCIDS
3. MixNet
Introduction
• A recent trend in CNNs is to improve both accuracy and efficiency
• Depthwise convolutions are becoming more popular
[MobileNets, ShuffleNets, NASNet, AmoebaNet, MnasNet, EfficientNet]
• MixNet focuses on kernel size.
• Recent studies showed that large kernels such as 5x5 and 7x7 can potentially improve model accuracy and efficiency by capturing more detail, at the cost of more parameters and computation.
• But do they always improve accuracy?
5. MixNet
Introduction
• Very large kernel sizes can hurt both accuracy and efficiency
• MixNet points out the limitation of a single kernel size
• Proposes Mixed Depthwise Convolution (MixConv)
→ Mixes different kernel sizes in a single convolution operation to capture different patterns at various resolutions
• Partitions the channels into multiple groups and applies a different kernel size to each group of channels (see the sketch below)
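A minimal PyTorch sketch of the idea (PyTorch is this summary's choice of framework, not the authors'; the class name and the equal partition are assumptions): split the channels into groups, run one depthwise convolution per group with its own kernel size, and concatenate the results.

```python
import torch
import torch.nn as nn

class MixConv(nn.Module):
    """Sketch of mixed depthwise convolution: one depthwise kernel
    size per channel group, outputs concatenated back together."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Equal channel partition; any remainder goes to the first group.
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)
        self.splits = splits
        self.convs = nn.ModuleList([
            # groups=c makes the convolution depthwise;
            # padding=k//2 preserves the spatial resolution.
            nn.Conv2d(c, c, k, padding=k // 2, groups=c, bias=False)
            for c, k in zip(splits, kernel_sizes)
        ])

    def forward(self, x):
        xs = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(t) for conv, t in zip(self.convs, xs)], dim=1)

x = torch.randn(1, 32, 56, 56)
print(MixConv(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```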
7. MixNet
Introduction
• MixConv significantly improves MobileNets' accuracy and efficiency on both ImageNet classification and COCO object detection
• The authors leverage Neural Architecture Search (NAS) to develop a new family of models named MixNets
• The largest MixNet achieved state-of-the-art 78.9% top-1 accuracy on ImageNet.
8. Related Work
Efficient ConvNets
• Depthwise convolution has become increasingly popular in mobile-size ConvNets
• EfficientNet achieved state-of-the-art accuracy on ImageNet by using depthwise and pointwise convolutions (sketched below)
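For reference, the depthwise + pointwise pairing is the standard depthwise-separable building block. A generic PyTorch sketch (not EfficientNet's exact MBConv block, which also adds expansion, squeeze-and-excitation, and skip connections):

```python
import torch.nn as nn

def depthwise_separable(c_in, c_out, k=3):
    """Depthwise-separable convolution: a per-channel spatial
    (depthwise) convolution followed by a 1x1 (pointwise) mixer."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False),
        nn.Conv2d(c_in, c_out, 1, bias=False),
    )
```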
9. Related Work
Multi-Scale Networks and Features
• Use multiple branches in each layer to combine different operations in a single layer (see the toy sketch below)
• Inception, Inception-ResNet, ResNeXt, NASNet
(Figure: part of the Inception-ResNet-v2 architecture)
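The multi-branch pattern in toy PyTorch form (the branch widths and kernel sizes are hypothetical, not the actual Inception-ResNet-v2 block):

```python
import torch
import torch.nn as nn

class MultiBranch(nn.Module):
    """Toy Inception-style layer: parallel branches with different
    operations, concatenated along the channel dimension."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 16, 1)             # 1x1 branch
        self.b3 = nn.Conv2d(c_in, 16, 3, padding=1)  # 3x3 branch
        self.b5 = nn.Conv2d(c_in, 16, 5, padding=2)  # 5x5 branch

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
```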
10. Related Work
Neural Architecture Search
• NAS has achieved better performance than hand-crafted models by automating the design process and learning better design choices
• The authors developed the new MixNet family by adding MixConv to the search space.
11. Methods and Experiments
MixConv Feature Map
• MixConv partitions the channels into groups and applies a different kernel size to each group:
1. The input tensor is partitioned into g groups of channels
2. The convolutional kernels are grouped into g "virtual kernels"
3. Each virtual kernel convolves its group, producing a virtual output
4. The virtual outputs are concatenated into the final output
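The four steps, traced with concrete shapes in PyTorch (the shapes are hypothetical: g = 3 groups, 32 channels, a 56x56 feature map, equal partition with the remainder in the first group):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)                      # input tensor

# 1. Partition the input tensor into g = 3 groups of channels.
groups = torch.split(x, [12, 10, 10], dim=1)

# 2. Group the depthwise kernels into g "virtual kernels" (3x3, 5x5, 7x7).
kernels = [nn.Conv2d(c, c, k, padding=k // 2, groups=c, bias=False)
           for c, k in zip([12, 10, 10], [3, 5, 7])]

# 3. Each virtual kernel convolves its own group -> g virtual outputs.
virtual = [conv(g) for conv, g in zip(kernels, groups)]

# 4. Concatenate the virtual outputs into the final output.
out = torch.cat(virtual, dim=1)
print(out.shape)  # torch.Size([1, 32, 56, 56])
```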
12. Methods and Experiments
MixConv Feature Map
(Figure: grouped convolution)
13. Methods and Experiments
MixConv Design Choices
• Group Size g
→ With the help of NAS, the authors experimented with group sizes from 1 to 5.
• Kernel Size per Group
→ Starts from 3x3 and monotonically increases by 2 per group (3x3, 5x5, 7x7, ...)
• Channel Size per Group (see the helper sketch after this list)
→ Equal partition
→ Exponential partition (i-th group gets a 2^-i portion of the total channels)
ex) 4-group MixConv with 32 total channels:
= equal partition divides the channels into (8, 8, 8, 8)
= exponential partition divides them into (16, 8, 4, 4)
• Dilated Convolution
→ Since large kernels need more parameters and computation, an alternative is dilated convolution. However, dilated convolutions usually have inferior accuracy compared to large kernels.
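The two channel-partition rules from the example above, as small helpers (a sketch; the remainder handling is an assumption):

```python
def equal_partition(channels, g):
    """Split channels into g (nearly) equal groups."""
    sizes = [channels // g] * g
    sizes[0] += channels - sum(sizes)
    return sizes

def exponential_partition(channels, g):
    """i-th group gets ~2^-i of the channels; the last group
    takes the remainder so the sizes sum to `channels`."""
    sizes = [channels // 2 ** i for i in range(1, g)]
    sizes.append(channels - sum(sizes))
    return sizes

print(equal_partition(32, 4))        # [8, 8, 8, 8]
print(exponential_partition(32, 4))  # [16, 8, 4, 4]
```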
14. Methods and Experiments
MixConv Performance on MobileNets – ImageNet
• Based on MobileNetV1 and V2, the authors replaced all of the original 3x3 depthwise convolution kernels with larger kernels or MixConv kernels.
• MixConv generally uses far fewer parameters and FLOPS, while its accuracy is similar to or better than the original.
• MixConv is much less sensitive to very large kernels
15. Methods and Experiments
MixConv Performance on MobileNets – ImageNet
16. Methods and Experiments
MixConv Performance on MobileNets – COCO object detection
• MixConv consistently achieves better efficiency and accuracy than the original
• 0.6% higher mAP (mean Average Precision) on MobileNetV1
• 1.1% higher mAP on MobileNetV2, using fewer parameters and FLOPS
17. Methods and Experiments
MixConv Performance on MobileNets – COCO object detection
18. Methods and Experiments
Ablation Study
1. MixConv for a Single Layer
• In addition to applying MixConv to the whole network, the authors analyzed per-layer performance on MobileNetV2.
• MixConv achieved similar or slightly better performance for most of the layers
19. Methods and Experiments
Ablation Study
2. Channel Partition Methods
• Equal Partition vs. Exponential Partition
• Exponential partition requires fewer parameters and FLOPS for the same kernel sizes, by assigning more channels to the smaller kernels.
• There is no clear winner between the two partition methods
20. Methods and Experiments
Ablation Study
3. Dilated Convolution
• Dilated convolution performs reasonably for small kernels, but its accuracy drops quickly for large kernels.
• With large kernels, dilated convolution skips a lot of local information, which hurts accuracy (see the sketch below)
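Concretely, a 3x3 kernel with dilation 2 spans the same 5x5 window as a dense 5x5 kernel but samples only 9 of its 25 positions; the 16 skipped positions are the lost local information. A small PyTorch sketch (the shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)

# Dense 5x5 depthwise kernel: 25 weights per channel.
dense = nn.Conv2d(32, 32, 5, padding=2, groups=32, bias=False)

# Dilated 3x3 kernel (dilation=2): same 5x5 receptive field,
# but only 9 weights per channel -- 16 positions are skipped.
dilated = nn.Conv2d(32, 32, 3, padding=2, dilation=2, groups=32, bias=False)

print(dense(x).shape, dilated(x).shape)  # both torch.Size([1, 32, 56, 56])
```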
21. MixNet – Architecture Search
Methods and Experiments
• The authors leverage recent progress in neural architecture search to develop a new family of MixConv-based models named MixNets.
• The NAS settings are similar to the recent MnasNet
• MobileNetV2 is used as the baseline network
• NAS searches for the best kernel size, expansion ratio, channel size, etc.
• Equal channel partition
• No dilated convolutions
22. MixNet – Architecture Search
Methods and Experiments
• New additions to the search space:
* Swish activation
* Squeeze-and-Excitation module
* Grouped convolutions with group size from 1 to 5
* MixConv adopted as the basic convolutional operation
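Standard formulations of the first two additions, sketched in PyTorch (the reduction ratio of 4 is an assumption, not necessarily the exact MixNet configuration):

```python
import torch
import torch.nn as nn

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x * torch.sigmoid(x)

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excitation: global-average-pool the feature map,
    pass it through a small bottleneck, and rescale the channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, channels // reduction, 1)
        self.fc2 = nn.Conv2d(channels // reduction, channels, 1)

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)             # squeeze
        s = torch.sigmoid(self.fc2(swish(self.fc1(s))))  # excite
        return x * s                                     # rescale
```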
23. MixNet – Architectures
Methods and Experiments
• Small kernels are more common in the early stages, saving computational cost
• Large kernels are more common in the later stages, for better accuracy
• MixNets can utilize very large kernels such as 9x9 and 11x11 to capture high-resolution patterns from the input images
24. MixNet – Performance on ImageNet
Methods and Experiments
• MixNet-S and MixNet-M were obtained from NAS; MixNet-M was scaled up with depth multiplier 1.3 to obtain MixNet-L
• MixNets outperform all of the latest mobile ConvNets
25. Conclusion
• Studied the impact of kernel size in depthwise convolution
• Traditional depthwise convolution suffers from the limitations of a single kernel size
• Proposed MixConv, which mixes multiple kernel sizes in a single operation
• MixConv improves the accuracy and efficiency of MobileNets on both image classification and object detection tasks
• Further developed a new family of MixNets using NAS
• MixNets achieve significantly better accuracy and efficiency than all of the latest mobile ConvNets