2. Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
Yonsei University Severance Hospital CCIDS
3. MixNet
Introduction
• A recent trend in CNNs is to improve both accuracy and efficiency
• Depthwise convolutions are becoming more popular
[MobileNets, ShuffleNets, NASNet, AmoebaNet, MnasNet, EfficientNet]
• MixNet focuses on kernel size.
• Recent studies showed that large kernels such as 5x5 and 7x7 can potentially improve model accuracy and efficiency by capturing more detail, at the cost of more parameters and computation.
• But do they always improve accuracy?
5. MixNet
Introduction
• Very large kernel sizes can hurt both accuracy and efficiency
• MixNet points out the limitation of a single kernel size
• Proposes Mixed Depthwise Convolution (MixConv)
→ Mixes different kernel sizes in a single convolution operation to capture different patterns at various resolutions
• Partitions the channels into multiple groups and applies a different kernel size to each group of channels (see the sketch below)
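A minimal PyTorch sketch of the idea (PyTorch is this summary's choice of framework, not the authors'; the class name and the equal partition are assumptions): split the channels into groups, run one depthwise convolution per group with its own kernel size, and concatenate the results.

```python
import torch
import torch.nn as nn

class MixConv(nn.Module):
    """Sketch of mixed depthwise convolution: one depthwise kernel
    size per channel group, outputs concatenated back together."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Equal channel partition; any remainder goes to the first group.
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)
        self.splits = splits
        self.convs = nn.ModuleList([
            # groups=c makes the convolution depthwise;
            # padding=k//2 preserves the spatial resolution.
            nn.Conv2d(c, c, k, padding=k // 2, groups=c, bias=False)
            for c, k in zip(splits, kernel_sizes)
        ])

    def forward(self, x):
        xs = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(t) for conv, t in zip(self.convs, xs)], dim=1)

x = torch.randn(1, 32, 56, 56)
print(MixConv(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```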
7. MixNet
Introduction
• MixConv significantly improves MobileNets' accuracy and efficiency on both ImageNet classification and COCO object detection
• The authors leverage Neural Architecture Search (NAS) to develop a new family of models named MixNets
• The largest MixNet achieved state-of-the-art 78.9% top-1 accuracy on ImageNet.
8. Related Work
Efficient ConvNets
• Depthwise convolution has become increasingly popular in mobile-size ConvNets
• EfficientNet achieved state-of-the-art accuracy on ImageNet by using depthwise and pointwise convolutions (sketched below)
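For reference, the depthwise + pointwise pairing is the standard depthwise-separable building block. A generic PyTorch sketch (not EfficientNet's exact MBConv block, which also adds expansion, squeeze-and-excitation, and skip connections):

```python
import torch.nn as nn

def depthwise_separable(c_in, c_out, k=3):
    """Depthwise-separable convolution: a per-channel spatial
    (depthwise) convolution followed by a 1x1 (pointwise) mixer."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False),
        nn.Conv2d(c_in, c_out, 1, bias=False),
    )
```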
9. Related Work
Multi-Scale Networks and Features
• Use multiple branches in each layer to combine different operations in a single layer (see the toy sketch below)
• Inception, Inception-ResNet, ResNeXt, NASNet
(Figure: part of the Inception-ResNet-v2 architecture)
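The multi-branch pattern in toy PyTorch form (the branch widths and kernel sizes are hypothetical, not the actual Inception-ResNet-v2 block):

```python
import torch
import torch.nn as nn

class MultiBranch(nn.Module):
    """Toy Inception-style layer: parallel branches with different
    operations, concatenated along the channel dimension."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 16, 1)             # 1x1 branch
        self.b3 = nn.Conv2d(c_in, 16, 3, padding=1)  # 3x3 branch
        self.b5 = nn.Conv2d(c_in, 16, 5, padding=2)  # 5x5 branch

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
```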
10. Related Work
Neural Architecture Search
• NAS has achieved better performance than hand-crafted models by automating the design process and learning better design choices
• The authors developed the new MixNet family by adding MixConv to the search space.
11. Methods and Experiments
MixConv Feature Map
• MixConv partitions the channels into groups and applies a different kernel size to each group:
1. The input tensor is partitioned into g groups of channels
2. The convolutional kernels are grouped into g "virtual kernels"
3. Each virtual kernel convolves its group, producing a virtual output
4. The virtual outputs are concatenated into the final output
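The four steps, traced with concrete shapes in PyTorch (the shapes are hypothetical: g = 3 groups, 32 channels, a 56x56 feature map, equal partition with the remainder in the first group):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)                      # input tensor

# 1. Partition the input tensor into g = 3 groups of channels.
groups = torch.split(x, [12, 10, 10], dim=1)

# 2. Group the depthwise kernels into g "virtual kernels" (3x3, 5x5, 7x7).
kernels = [nn.Conv2d(c, c, k, padding=k // 2, groups=c, bias=False)
           for c, k in zip([12, 10, 10], [3, 5, 7])]

# 3. Each virtual kernel convolves its own group -> g virtual outputs.
virtual = [conv(g) for conv, g in zip(kernels, groups)]

# 4. Concatenate the virtual outputs into the final output.
out = torch.cat(virtual, dim=1)
print(out.shape)  # torch.Size([1, 32, 56, 56])
```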
12. Methods and Experiments
MixConv Feature Map
(Figure: grouped convolution)
13. Methods and Experiments
MixConv Design Choices
• Group Size g
→ With the help of NAS, the authors experimented with group sizes from 1 to 5.
• Kernel Size per Group
→ Starts from 3x3 and monotonically increases by 2 per group (3x3, 5x5, 7x7, ...)
• Channel Size per Group (see the helper sketch after this list)
→ Equal partition
→ Exponential partition (i-th group gets a 2^-i portion of the total channels)
ex) 4-group MixConv with 32 total channels:
= equal partition divides the channels into (8, 8, 8, 8)
= exponential partition divides them into (16, 8, 4, 4)
• Dilated Convolution
→ Since large kernels need more parameters and computation, an alternative is dilated convolution. However, dilated convolutions usually have inferior accuracy compared to large kernels.
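The two channel-partition rules from the example above, as small helpers (a sketch; the remainder handling is an assumption):

```python
def equal_partition(channels, g):
    """Split channels into g (nearly) equal groups."""
    sizes = [channels // g] * g
    sizes[0] += channels - sum(sizes)
    return sizes

def exponential_partition(channels, g):
    """i-th group gets ~2^-i of the channels; the last group
    takes the remainder so the sizes sum to `channels`."""
    sizes = [channels // 2 ** i for i in range(1, g)]
    sizes.append(channels - sum(sizes))
    return sizes

print(equal_partition(32, 4))        # [8, 8, 8, 8]
print(exponential_partition(32, 4))  # [16, 8, 4, 4]
```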
14. Methods and Experiments
MixConv Performance on MobileNets – ImageNet
• Based on MobileNetV1 and V2, the authors replaced all of the original 3x3 depthwise convolution kernels with larger kernels or MixConv kernels.
• MixConv generally uses far fewer parameters and FLOPS, while its accuracy is similar to or better than the original.
• MixConv is much less sensitive to very large kernels
15. Methods and Experiments
MixConv Performance on MobileNets – ImageNet
16. Methods and Experiments
MixConv Performance on MobileNets – COCO object detection
• MixConv consistently achieves better efficiency and accuracy than the original
• 0.6% higher mAP (mean Average Precision) on MobileNetV1
• 1.1% higher mAP on MobileNetV2, using fewer parameters and FLOPS
17. Methods and Experiments
MixConv Performance on MobileNets – COCO object detection
18. Methods and Experiments
Ablation Study
1. MixConv for a Single Layer
• In addition to applying MixConv to the whole network, the authors analyzed per-layer performance on MobileNetV2.
• MixConv achieved similar or slightly better performance for most of the layers
19. Methods and Experiments
Ablation Study
2. Channel Partition Methods
• Equal Partition vs. Exponential Partition
• Exponential partition requires fewer parameters and FLOPS for the same kernel sizes, by assigning more channels to the smaller kernels.
• There is no clear winner between the two partition methods
20. Methods and Experiments
Ablation Study
3. Dilated Convolution
• Dilated convolution performs reasonably for small kernels, but its accuracy drops quickly for large kernels.
• With large kernels, dilated convolution skips a lot of local information, which hurts accuracy (see the sketch below)
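Concretely, a 3x3 kernel with dilation 2 spans the same 5x5 window as a dense 5x5 kernel but samples only 9 of its 25 positions; the 16 skipped positions are the lost local information. A small PyTorch sketch (the shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)

# Dense 5x5 depthwise kernel: 25 weights per channel.
dense = nn.Conv2d(32, 32, 5, padding=2, groups=32, bias=False)

# Dilated 3x3 kernel (dilation=2): same 5x5 receptive field,
# but only 9 weights per channel -- 16 positions are skipped.
dilated = nn.Conv2d(32, 32, 3, padding=2, dilation=2, groups=32, bias=False)

print(dense(x).shape, dilated(x).shape)  # both torch.Size([1, 32, 56, 56])
```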
21. MixNet – Architecture Search
Methods and Experiments
• The authors leverage recent progress in neural architecture search to develop a new family of MixConv-based models named MixNets.
• The NAS settings are similar to the recent MnasNet
• MobileNetV2 is used as the baseline network
• NAS searches for the best kernel size, expansion ratio, channel size, etc.
• Equal channel partition
• No dilated convolutions
22. MixNet – Architecture Search
Methods and Experiments
• New additions to the search space:
* Swish activation
* Squeeze-and-Excitation module
* Grouped convolutions with group size from 1 to 5
* MixConv adopted as the basic convolutional operation
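Standard formulations of the first two additions, sketched in PyTorch (the reduction ratio of 4 is an assumption, not necessarily the exact MixNet configuration):

```python
import torch
import torch.nn as nn

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x * torch.sigmoid(x)

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excitation: global-average-pool the feature map,
    pass it through a small bottleneck, and rescale the channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, channels // reduction, 1)
        self.fc2 = nn.Conv2d(channels // reduction, channels, 1)

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)             # squeeze
        s = torch.sigmoid(self.fc2(swish(self.fc1(s))))  # excite
        return x * s                                     # rescale
```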
23. MixNet – Architectures
Methods and Experiments
• Small kernels are more common in the early stages, saving computational cost
• Large kernels are more common in the later stages, for better accuracy
• MixNets can utilize very large kernels such as 9x9 and 11x11 to capture high-resolution patterns from the input images
24. MixNet – Performance on ImageNet
Methods and Experiments
• MixNet-S and MixNet-M were obtained from NAS; MixNet-M was scaled up with depth multiplier 1.3 to obtain MixNet-L
• MixNets outperform all of the latest mobile ConvNets
25. Conclusion
• Studied the impact of kernel size in depthwise convolution
• Traditional depthwise convolution suffers from the limitations of a single kernel size
• Proposed MixConv, which mixes multiple kernel sizes in a single operation
• MixConv improves the accuracy and efficiency of MobileNets on both image classification and object detection tasks
• Further developed a new family of MixNets using NAS
• MixNets achieve significantly better accuracy and efficiency than all of the latest mobile ConvNets