Intelligence Machine Vision Lab
Strictly Confidential
Pelee: A Real-Time Object Detection System on
Mobile Devices (Review)
SUALAB, Hoseong Lee
Contents
• Introduction
• Related Works
• PeleeNet: an efficient feature extraction network for image classification
• Pelee: a real-time object detection system
• Conclusion
Introduction
• Growing need to run CNNs on mobile devices
• Limited computing power and memory resources
• e.g., drones, smart cameras, smartphones
• A number of efficiency-oriented CNNs have been proposed
• MobileNet, ShuffleNet, and MobileNet V2 → heavily dependent on depthwise separable convolution, which lacks an efficient implementation in most deep learning frameworks
• Pelee uses only conventional convolution instead
• Pelee can be used for both classification (PeleeNet) and object detection (Pelee)
Related Works
MobileNet, 2017 arXiv
• Depthwise Separable Convolution
Fig from https://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
https://arxiv.org/pdf/1704.04861.pdf
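To make the saving from depthwise separable convolution concrete, here is a minimal sketch that counts multiply-adds for a standard convolution and for its depthwise + pointwise factorization. The function names and the layer shape (3x3 kernel, 128 → 128 channels, 28x28 feature map) are hypothetical values chosen for illustration, not figures from the slides.

```python
def standard_conv_mult_adds(kernel, c_in, c_out, feat):
    """Multiply-adds of a standard kernel x kernel convolution on a feat x feat map."""
    return kernel * kernel * c_in * c_out * feat * feat


def depthwise_separable_mult_adds(kernel, c_in, c_out, feat):
    """Multiply-adds of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in * feat * feat  # one filter per input channel
    pointwise = c_in * c_out * feat * feat            # 1x1 conv mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, for illustration only.
std = standard_conv_mult_adds(3, 128, 128, 28)
sep = depthwise_separable_mult_adds(3, 128, 128, 28)
ratio = sep / std  # analytically 1/c_out + 1/kernel**2
print(f"standard: {std}, separable: {sep}, ratio: {ratio:.4f}")
```

The ratio depends only on the kernel size and the number of output channels, which is why the factorization looks so attractive on paper even when the runtime implementation is slow.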
ShuffleNet, 2017 arXiv
• Depthwise Separable Convolution
• Pointwise Group Convolution
• Channel Shuffle Operation
https://arxiv.org/pdf/1707.01083.pdf
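The channel shuffle operation can be sketched on a plain list of channel indices: split the channels into groups, then interleave them so that the next group convolution sees channels from every group. This is a toy illustration on indices, not the tensor implementation; `channel_shuffle` is a hypothetical helper name.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape to groups x n, transpose, flatten)."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Split the flat channel list into `groups` contiguous chunks.
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Read the chunks column by column, which is the transpose-and-flatten step.
    return [grouped[g][i] for i in range(per_group) for g in range(groups)]


shuffled = channel_shuffle(list(range(6)), groups=2)
print(shuffled)
```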
MobileNet V2, 2018 arXiv
• Depthwise Separable Convolution
• Linear Bottlenecks
• Inverted Residuals
https://arxiv.org/pdf/1801.04381.pdf
ShuffleNet V2, 2018 arXiv
• Equal channel width minimizes memory access cost (balanced convolution)
• Excessive group convolution increases memory access cost
• Network fragmentation reduces degree of parallelism
• Element-wise operations are non-negligible
https://arxiv.org/pdf/1807.11164.pdf
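The first guideline can be checked numerically. For a 1x1 convolution on an h x w map with c1 input and c2 output channels, the ShuffleNet V2 paper models memory access cost as MAC = hw(c1 + c2) + c1·c2. The sketch below holds the FLOPs (hw·c1·c2) fixed and varies the channel split; the 56x56 map and the channel pairs are hypothetical values chosen for illustration.

```python
def conv1x1_mac(h, w, c_in, c_out):
    """Memory access cost of a 1x1 conv: read input, write output, read weights."""
    return h * w * (c_in + c_out) + c_in * c_out


# All pairs have the same product c_in * c_out, hence the same FLOPs.
pairs = [(128, 128), (64, 256), (32, 512), (16, 1024)]
macs = {pair: conv1x1_mac(56, 56, *pair) for pair in pairs}

for pair, mac in macs.items():
    print(pair, mac)

best = min(macs, key=macs.get)  # the balanced pair should come out cheapest
```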
DenseNet, 2016 arXiv (CVPR 2017)
• Densely Connected Convolution
• BN-ReLU-Conv 1x1-BN-ReLU-Conv 3x3 bottleneck layer
https://arxiv.org/pdf/1608.06993.pdf
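Dense connectivity means each layer receives the concatenation of all earlier feature maps in its block, so the input width grows linearly with depth. A minimal sketch, assuming a block that starts with k0 channels and a growth rate of k channels per layer (the values 64 and 32 below are illustrative assumptions):

```python
def dense_layer_input_channels(layer_index, k0, growth_rate):
    """Input channels of the layer_index-th layer (1-based) in a dense block."""
    return k0 + growth_rate * (layer_index - 1)


# Illustrative values: block input of 64 channels, growth rate 32.
widths = [dense_layer_input_channels(i, 64, 32) for i in range(1, 7)]
print(widths)
```

This growing input width is what motivates the 1x1 bottleneck in front of each 3x3 convolution.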
MobileNet, 2017 arXiv
ShuffleNet, 2017 arXiv
MobileNet V2, 2018 arXiv
ShuffleNet V2, 2018 arXiv
DenseNet, 2016 arXiv (CVPR 2017)
Reviews of these five papers can be found in PR-12.
PR12 Season 1: https://www.youtube.com/watch?v=auKdde7Anr8&list=PLWKf9beHi3Tg50UoyTe6rIm20sVQOH1br
PR12 Season 2: https://www.youtube.com/watch?v=FfBp6xJqZVA&list=PLWKf9beHi3TgstcIn8K6dI_85_ppAxzB8
PeleeNet: an efficient feature extraction network for image classification
• DenseNet variant architecture – PeleeNet
• Key Features
• Two-way Dense Layer
• Stem Block
• Dynamic number of Channels in Bottleneck Layer
• Transition Layer without Compression
• Composite Function
Classification
• Two-Way Dense Layer
• Motivated by GoogLeNet, use a 2-way dense layer
• Can get different scales of receptive fields
• Two stacked 3x3 conv → learn visual patterns for large objects
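Why the two branches see different scales can be shown with a small receptive-field calculation. The sketch assumes stride 1 and no dilation; `receptive_field` is a hypothetical helper, and the branch layouts follow the slide (one 3x3 conv versus two stacked 3x3 convs).

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, undilated convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each layer widens the field by (kernel - 1)
    return rf


single_branch = receptive_field([3])      # one 3x3 conv
stacked_branch = receptive_field([3, 3])  # two stacked 3x3 convs
print(single_branch, stacked_branch)
```

The stacked branch covers the same area as a single 5x5 convolution, which is what lets it respond to larger visual patterns.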
• Stem Block
• Motivated by Inception-v4 and DSOD, use a cost-efficient stem block before the first dense layer
• Can improve feature expression ability without adding much computational cost
• Dynamic number of Channels in Bottleneck Layer
• The number of bottleneck channels varies with the input shape instead of being fixed at 4 times the growth rate
• In the first several dense layers, a fixed-width bottleneck increases computational cost instead of reducing it
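The claim that a fixed bottleneck can cost more than it saves is easy to verify with per-pixel multiply-add counts. The sketch below models a single-path DenseNet-style layer (1x1 bottleneck, then 3x3 conv), not PeleeNet's two-way layer; the growth rate of 32 and the input widths are assumptions chosen for illustration.

```python
def bottleneck_layer_mult_adds(c_in, growth_rate, bottleneck_channels):
    """Per-pixel multiply-adds: 1x1 conv to the bottleneck width, then 3x3 conv."""
    one_by_one = c_in * bottleneck_channels
    three_by_three = 9 * bottleneck_channels * growth_rate
    return one_by_one + three_by_three


def plain_layer_mult_adds(c_in, growth_rate):
    """Per-pixel multiply-adds of a 3x3 conv applied directly to the input."""
    return 9 * c_in * growth_rate


k = 32  # assumed growth rate
for c_in in (32, 128, 512):
    fixed = bottleneck_layer_mult_adds(c_in, k, 4 * k)  # DenseNet's fixed 4k width
    plain = plain_layer_mult_adds(c_in, k)
    print(c_in, fixed, plain, "bottleneck helps" if fixed < plain else "bottleneck hurts")
```

With a narrow input the 1x1 conv expands the channels rather than compressing them, so the bottleneck only pays off once the concatenated input is wide enough.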
• Transition Layer without Compression
• The compression factor proposed by DenseNet can hurt feature expression
• Keep the number of output channels the same as the number of input channels in the transition layer
• Composite Function
• Use conventional post-activation (Conv-BN-ReLU)
• Also add a 1x1 conv after the last dense block to get stronger representational ability
• PeleeNet
• Early-stage features are very important for vision tasks
• Prematurely reducing the feature map size can impair representational ability
[Slide figure: PeleeNet architecture]
[Slide figure: PeleeNet ablation study]
• PeleeNet Result
• Achieves higher accuracy and runs over 1.8 times faster than MobileNet and MobileNetV2 on NVIDIA TX2, at only 66% of the model size of MobileNet
• PeleeNet runs 1.8 times faster in FP16 mode than in FP32 mode
→ Networks built on depthwise separable convolution see little speedup from FP16 on TX2
[Slide figure: ImageNet results]
[Slide figure: Speed on NVIDIA TX2]
Pelee: a real-time object detection system
• SSD + PeleeNet → Pelee detector
• Key Features
• Feature Map Selection
• Residual Prediction Block
• Small Convolutional Kernel for Prediction
Object Detection
[Slide figure: Effects of key features]
• Feature Map Selection
• SSD with feature maps at 5 scales (19x19, 10x10, 5x5, 3x3, 1x1)
• The 38x38 feature map is not used, to reduce computational cost
[Slide figure: SSD architecture]
[Slide figure: Feature Map Selection]
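A quick count of prediction locations shows how much work the 38x38 map accounts for. The sketch only counts spatial positions per feature map, using the sizes listed on the slide; the number of default boxes per position is left out because the slides do not state it.

```python
def num_locations(map_sizes):
    """Total spatial positions across square feature maps of the given sizes."""
    return sum(size * size for size in map_sizes)


with_38 = num_locations([38, 19, 10, 5, 3, 1])  # including the 38x38 map
pelee = num_locations([19, 10, 5, 3, 1])        # the five scales Pelee keeps
dropped_share = (with_38 - pelee) / with_38

print(with_38, pelee, f"{dropped_share:.1%} of positions removed")
```

Most prediction positions live on the largest map, so dropping it removes the bulk of the prediction-layer work in one step.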
• Residual Prediction Block
• For each feature map, build a residual block before making predictions
• 1x1 convolutional kernel for prediction
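The effect of shrinking the prediction kernel can be sketched by counting weights in a single prediction convolution. All numbers below (256 input channels, 6 default boxes per position, 21 classes plus 4 box offsets) are hypothetical values for illustration, not settings taken from the slides.

```python
def prediction_conv_weights(kernel, c_in, boxes_per_location, num_classes):
    """Weights of one conv that predicts class scores and 4 box offsets per default box."""
    c_out = boxes_per_location * (num_classes + 4)
    return kernel * kernel * c_in * c_out


# Hypothetical head configuration, for illustration only.
w3 = prediction_conv_weights(3, 256, 6, 21)
w1 = prediction_conv_weights(1, 256, 6, 21)
print(w3, w1, w3 // w1)
```

For the prediction convolution alone, a 1x1 kernel uses one ninth of the weights and multiply-adds of a 3x3 kernel; the saving for the whole detector is smaller because the backbone is unchanged.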
• Pelee Result
• Benchmarks: PASCAL VOC 2007 and MS COCO (test-dev2015)
• Faster, lower in computational cost, and more accurate than SSD- and YOLO-based baselines
Conclusion
• Depthwise separable convolution is not the only way to build an efficient model
• PeleeNet and Pelee are built with conventional convolution
• On real devices (iPhone 8, Jetson TX2), they perform real-time prediction for image classification and object detection
• Compared to existing models, PeleeNet and Pelee are faster, cheaper, and more accurate
• The code is also simple to implement, so I highly recommend it
Thank you
