Pelee: A Real-Time Object Detection System on Mobile Devices (Paper Review)
1. Intelligence Machine Vision Lab
Review of "Pelee: A Real-Time Object Detection System on Mobile Devices"
Hoseong Lee, SUALAB
2. Contents
• Introduction
• Related Works
• PeleeNet: an efficient feature extraction network for image classification
• Pelee: a real-time object detection system
• Conclusion
4. Introduction
• Growing need to run CNNs on mobile devices
• Limited computing power and memory resources
• E.g., drones, smart cameras, smartphones
• A number of efficiency-oriented CNNs have been proposed
• MobileNet, ShuffleNet, and MobileNet V2 → heavily dependent on depthwise separable convolution, whose implementations are often inefficient
• Pelee uses only conventional convolution instead
• The approach covers both tasks: PeleeNet for image classification and Pelee for object detection
6. Related Works
MobileNet, 2017 arXiv
• Depthwise Separable Convolution
Fig from https://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
https://arxiv.org/pdf/1704.04861.pdf
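To make the appeal of depthwise separable convolution concrete, here is a rough sketch that counts multiply-accumulate operations (MACs) for a standard convolution and for its depthwise separable replacement. The layer shape below is an illustrative assumption, not a value from the slides, and the function names are hypothetical.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """MACs for a k x k standard convolution producing an h x w output map."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k, c_in, c_out, h, w):
    """MACs for a k x k depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 conv mixes the channels
    return depthwise + pointwise


# Illustrative layer shape (assumed for this example only).
k, c_in, c_out, h, w = 3, 64, 128, 56, 56
std = standard_conv_macs(k, c_in, c_out, h, w)
sep = depthwise_separable_macs(k, c_in, c_out, h, w)
ratio = sep / std  # equals 1 / c_out + 1 / k**2
print(f"standard: {std}, separable: {sep}, ratio: {ratio:.4f}")
```

The count is only arithmetic; it says nothing about wall-clock speed, which depends on how well a framework implements the depthwise step.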
7. Related Works
ShuffleNet, 2017 arXiv
• Depthwise Separable Convolution
• Pointwise Group Convolution
• Channel Shuffle Operation
https://arxiv.org/pdf/1707.01083.pdf
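The channel shuffle operation can be sketched on a plain list of channel indices: view the channels as a (groups x channels-per-group) grid, transpose it, and flatten. This is a minimal illustration with a hypothetical function name, not ShuffleNet's actual code.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so each group in the next layer sees every input group."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Read the (groups x per_group) grid column by column, i.e. transpose then flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in three groups: [0, 1 | 2, 3 | 4, 5]
shuffled = channel_shuffle(list(range(6)), groups=3)
print(shuffled)
```

After the shuffle, each consecutive block of channels contains one channel from every original group, which is what lets information flow across groups between group convolutions.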
8. Related Works
MobileNet V2, 2018 arXiv
• Depthwise Separable Convolution
• Linear Bottlenecks
• Inverted Residuals
https://arxiv.org/pdf/1801.04381.pdf
9. Related Works
ShuffleNet V2, 2018 arXiv
• Equal channel width minimizes memory access cost (balanced convolution)
• Excessive group convolution increases memory access cost
• Network fragmentation reduces the degree of parallelism
• Element-wise operations are non-negligible
https://arxiv.org/pdf/1807.11164.pdf
10. Related Works
DenseNet, 2016 arXiv (CVPR 2017)
• Densely Connected Convolution
• BN-ReLU-Conv 1x1-BN-ReLU-Conv 3x3 bottleneck layer
https://arxiv.org/pdf/1608.06993.pdf
11. Related Works
MobileNet, 2017 arXiv
ShuffleNet, 2017 arXiv
MobileNet V2, 2018 arXiv
ShuffleNet V2, 2018 arXiv
DenseNet, 2016 arXiv (CVPR 2017)
Reviews of these five papers can be found in the PR12 video series.
PR12 Season 1: https://www.youtube.com/watch?v=auKdde7Anr8&list=PLWKf9beHi3Tg50UoyTe6rIm20sVQOH1br
PR12 Season 2: https://www.youtube.com/watch?v=FfBp6xJqZVA&list=PLWKf9beHi3TgstcIn8K6dI_85_ppAxzB8
13. PeleeNet: an efficient feature extraction network for image classification
• PeleeNet: a DenseNet-variant architecture
• Key Features
• Two-way Dense Layer
• Stem Block
• Dynamic number of Channels in Bottleneck Layer
• Transition Layer without Compression
• Composite Function
Classification
14. PeleeNet: an efficient feature extraction network for image classification
• Two-Way Dense Layer
• Motivated by GoogLeNet, uses a 2-way dense layer
• The two branches capture different scales of receptive fields
• The branch with two stacked 3x3 convs learns visual patterns for large objects
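Why two stacked 3x3 convolutions see larger patterns can be checked with the usual receptive-field recurrence. The sketch below assumes stride-1 layers and, as an assumption not stated on the slide, a 1x1 bottleneck in front of each branch; the function name is hypothetical.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of a chain of convolutions."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= s             # a stride enlarges the step of all later layers
    return rf


# Assumed branch layouts: 1x1 bottleneck, then one or two 3x3 convs.
small_branch = receptive_field([1, 3])
large_branch = receptive_field([1, 3, 3])
print(small_branch, large_branch)
```

One branch therefore covers a 3x3 neighbourhood and the other a 5x5 neighbourhood, and concatenating both gives the layer features at two scales.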
15. PeleeNet: an efficient feature extraction network for image classification
• Stem Block
• Motivated by Inception-v4 and DSOD, uses a cost-efficient stem block before the first dense layer
• Improves feature expression ability without adding much computational cost
16. PeleeNet: an efficient feature extraction network for image classification
• Dynamic Number of Channels in Bottleneck Layer
• The bottleneck width varies with the input shape instead of being fixed at 4 times the growth rate
• For the first several dense layers, a fixed-width bottleneck increases computational cost instead of reducing it
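The claim that a fixed 4x bottleneck can cost more than it saves is easy to verify by counting MACs per output pixel. In the sketch below the growth rate of 32 and the input widths are illustrative assumptions, and the cap `min(4 * growth_rate, c_in)` is a hypothetical stand-in for the dynamic width, not necessarily the exact rule used by PeleeNet.

```python
def dense_layer_macs_per_pixel(c_in, growth_rate, bottleneck_channels=None):
    """MACs per output pixel: optional 1x1 bottleneck, then a 3x3 conv to growth_rate channels."""
    if bottleneck_channels is None:
        return 9 * c_in * growth_rate  # 3x3 conv applied directly to the input
    return c_in * bottleneck_channels + 9 * bottleneck_channels * growth_rate


growth_rate = 32  # assumed for illustration
costs = {}
for c_in in (32, 64, 128, 256, 512):
    fixed = dense_layer_macs_per_pixel(c_in, growth_rate, 4 * growth_rate)
    direct = dense_layer_macs_per_pixel(c_in, growth_rate)
    # Hypothetical dynamic rule: never make the bottleneck wider than the input.
    capped = dense_layer_macs_per_pixel(c_in, growth_rate, min(4 * growth_rate, c_in))
    costs[c_in] = (fixed, direct, capped)
    print(c_in, fixed, direct, capped)
```

With these assumed numbers the fixed bottleneck is more expensive than no bottleneck for narrow inputs and only pays off once the input is wider than roughly 230 channels, which matches the slide's point about the first several dense layers.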
17. PeleeNet: an efficient feature extraction network for image classification
• Transition Layer without Compression
• The compression factor proposed by DenseNet can hurt feature expression
• Keep the number of output channels equal to the number of input channels in each transition layer
• Composite Function
• Use conventional post-activation (Conv-BN-ReLU)
• Also add a 1x1 conv after the last dense block for stronger representational ability
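The effect of dropping compression can be seen by tracking channel counts through the dense blocks. The stem width, block depths, and growth rate below are illustrative assumptions rather than values stated on the slides, and for simplicity a transition is applied after every block.

```python
def stage_channels(stem_channels, block_depths, growth_rate, compression=1.0):
    """Channel count after each dense block and its transition layer."""
    channels = stem_channels
    outputs = []
    for depth in block_depths:
        channels += depth * growth_rate         # every dense layer concatenates growth_rate channels
        channels = int(channels * compression)  # transition layer; 1.0 keeps all channels
        outputs.append(channels)
    return outputs


# Assumed configuration for illustration.
stem, depths, k = 32, (3, 4, 8, 6), 32
no_compression = stage_channels(stem, depths, k, compression=1.0)
densenet_style = stage_channels(stem, depths, k, compression=0.5)
print(no_compression)
print(densenet_style)
```

Under these assumptions a compression factor of 0.5 leaves the final stage with about a quarter of the channels that the uncompressed network carries, which is the loss of feature expression the slide refers to.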
18. PeleeNet: an efficient feature extraction network for image classification
• PeleeNet
• Early-stage features are very important for vision tasks
• Prematurely reducing the feature map size can impair representational ability
PeleeNet architecture
PeleeNet ablation study
19. PeleeNet: an efficient feature extraction network for image classification
• PeleeNet Result
• Achieves higher accuracy and over 1.8 times faster speed than MobileNet and MobileNetV2 on NVIDIA TX2, with only 66% of the model size of MobileNet
• PeleeNet runs 1.8 times faster in FP16 mode than in FP32 mode
→ Depthwise separable convolution is slow in FP16 mode on TX2
ImageNet Result
Speed on NVIDIA TX2
21. Pelee: a real-time object detection system
• SSD + PeleeNet → Pelee detector
• Key Features
• Feature Map Selection
• Residual Prediction Block
• Small Convolutional Kernel for Prediction
Object Detection
Effects of key features
22. Pelee: a real-time object detection system
• Feature Map Selection
• SSD with feature maps at 5 scales (19x19, 10x10, 5x5, 3x3, 1x1)
• The 38x38 feature map is not used, to reduce computational cost
SSD architecture
Feature Map Selection
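How much work the 38x38 map would add can be estimated by counting prediction locations (feature-map cells). The sketch counts cells only; the number of default boxes per cell is not given on the slides, so it is left out.

```python
def total_cells(feature_map_sizes):
    """Number of spatial locations at which the detector makes predictions."""
    return sum(size * size for size in feature_map_sizes)


pelee_maps = (19, 10, 5, 3, 1)    # scales listed on the slide
with_38 = (38,) + pelee_maps      # same set plus the dropped 38x38 map

pelee_cells = total_cells(pelee_maps)
cells_with_38 = total_cells(with_38)
share_of_38 = (38 * 38) / cells_with_38
print(pelee_cells, cells_with_38, f"{share_of_38:.2%}")
```

The 38x38 map alone would account for roughly three quarters of all prediction locations, so leaving it out removes most of the per-location prediction cost.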
23. Pelee: a real-time object detection system
• Feature Map Selection
• SSD with feature maps at 5 scales (19x19, 10x10, 5x5, 3x3, 1x1); the 38x38 map is not used
• Residual Prediction Block
• For each feature map, a residual block is built before prediction
• 1x1 convolutional kernel for prediction
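The saving from a 1x1 prediction kernel can be sketched by counting the weights of one prediction layer. The input width, boxes per cell, and class count below are hypothetical values chosen for illustration; none of them come from the slides.

```python
def prediction_layer_weights(c_in, boxes_per_cell, num_classes, kernel):
    """Weights of a conv layer that predicts class scores and 4 box offsets per default box."""
    c_out = boxes_per_cell * (num_classes + 4)
    return kernel * kernel * c_in * c_out


# Hypothetical head configuration (assumed for this example only).
c_in, boxes_per_cell, num_classes = 256, 6, 21
weights_3x3 = prediction_layer_weights(c_in, boxes_per_cell, num_classes, kernel=3)
weights_1x1 = prediction_layer_weights(c_in, boxes_per_cell, num_classes, kernel=1)
print(weights_3x3, weights_1x1, weights_3x3 // weights_1x1)
```

For the prediction layer itself, the 1x1 kernel needs nine times fewer weights and MACs than a 3x3 kernel; the saving for the whole detector is smaller because the backbone is unchanged.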
24. Pelee: a real-time object detection system
• Pelee Result
• PASCAL VOC 2007 and COCO 2015 benchmarks
• Faster, computationally cheaper, and more accurate than the compared SSD and YOLO variants
26. Conclusion
• Depthwise separable convolution is not the only way to build an efficient model
• PeleeNet and Pelee are built with conventional convolution
• On real devices (iPhone 8, Jetson TX2), they perform real-time prediction for image classification and object detection
• Compared to existing models, PeleeNet and Pelee are faster, cheaper, and more accurate
• The code is also simple to implement, so I highly recommend it!