PVANet - PR033

PVANet:
Lightweight Deep Neural Networks
for Real-time Object Detection
3rd September, 2017
JinWon Lee
Samsung Electronics
Sanghoon Hong, B. Roh, K. Kim, Y. Cheon, M. Park
Intel Imaging and Camera Technology

Many slides are copied from Sanghoon Hong’s slides
https://drive.google.com/drive/folders/0B8z5oUpB2DysSm1IOV9yeXRULVE

Before We Start…
• Faster R-CNN
 PR-013 : presented by Jinwon Lee
 https://youtu.be/kcPAGIgBGRs
• YOLO
 PR-016 : presented by Taegyun Jeon
 https://youtu.be/eTDcoeqj1_w
• YOLO9000
 PR-023 : presented by Jinwon Lee
 https://youtu.be/6fdclSGgeio
• Concepts of Distance / Metric
 Terry’s deep learning talk by Terry Taewoong Um
 https://youtu.be/4KXgdf6Bmo4?list=PL0oFI08O71gKEXITQ7OG2SCCXkrtid7Fq

Recap – Faster R-CNN
• Insert a Region Proposal Network (RPN)
after the last convolutional layer 
using GPU!
• RPN trained to produce region
proposals directly; no need for external
region proposals
• After RPN, use RoI Pooling and an
upstream classifier and bbox regressor
just like Fast R-CNN
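
The RoI Pooling step in the recap can be sketched as follows. This is a minimal illustration, not the Faster R-CNN implementation: it assumes a single-channel feature map stored as a list of lists, an RoI given in inclusive feature-map coordinates, and simple floor-based bin boundaries. The function name roi_max_pool is hypothetical.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x0, y0, x1, y1), given in inclusive
    feature-map coordinates, into a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h = y1 - y0 + 1
    roi_w = x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        # Row range of bin i; every bin covers at least one row.
        r_start = y0 + (i * roi_h) // out_h
        r_end = max(r_start + 1, y0 + ((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            # Column range of bin j; every bin covers at least one column.
            c_start = x0 + (j * roi_w) // out_w
            c_end = max(c_start + 1, x0 + ((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
# Any RoI size is reduced to the same 2x2 output.
whole = roi_max_pool(feature_map, (0, 0, 3, 3), 2, 2)
part = roi_max_pool(feature_map, (1, 1, 3, 2), 2, 2)
```

Whatever the proposal size, the output grid is fixed, which is what lets the classifier and bbox regressor after the RPN take fixed-length inputs.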

Motivations
• Object Detection: slow & computationally expensive
• Successes in network compression
• Can we design a less-redundant network from scratch?
Kim et al. (2016). Compression of Deep Convolutional Neural Networks for
Fast and Low Power Mobile Applications
Han et al. (2015) Learning both weights and connections for
efficient neural networks

Design Principles
• Deep but Narrow
• Modified concatenated ReLU
• Inception
• Hyper-feature concatenation

Deep but Narrow
• Reduce redundancies from excessive convolutional outputs
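
A rough cost calculation shows why narrowing layers pays off. The sketch below assumes a plain convolution without bias; the layer sizes are illustrative and are not taken from PVANet.

```python
def conv_cost(c_in, c_out, kernel, out_h, out_w):
    """Return (parameters, multiply-accumulates) of one conv layer, no bias."""
    params = kernel * kernel * c_in * c_out
    macs = params * out_h * out_w
    return params, macs


# Hypothetical layer sizes, chosen only to show the scaling.
wide_params, wide_macs = conv_cost(256, 256, 3, 32, 32)
narrow_params, narrow_macs = conv_cost(128, 128, 3, 32, 32)
ratio = wide_macs // narrow_macs
```

Halving both the input and output channel counts cuts parameters and computation by a factor of four, so a narrow network can spend the saved budget on more layers.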

Modified Concatenated ReLU (mCReLU)
• Reduce redundancies in the early convolutional layers
• Better accuracy and lower training loss than the original C.ReLU (Shang et al. 2016)
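
A minimal sketch of the two activations, operating on the channel values at one spatial position. It assumes the modification is a per-channel scale and shift applied after concatenation and before the ReLU, as described in the PVANet paper; the function names and the sample parameters are hypothetical.

```python
def crelu(x):
    """Original C.ReLU: concatenate x and -x, then apply ReLU."""
    return [max(v, 0.0) for v in x] + [max(-v, 0.0) for v in x]


def modified_crelu(x, scale, shift):
    """Concatenate x and -x, apply a per-channel scale and shift, then ReLU.

    scale and shift have twice as many entries as x, one per output channel.
    """
    concat = list(x) + [-v for v in x]
    return [max(s * v + b, 0.0) for v, s, b in zip(concat, scale, shift)]


x = [1.0, -2.0]
plain = crelu(x)
# With unit scale and zero shift the modified version reduces to C.ReLU.
same = modified_crelu(x, [1.0] * 4, [0.0] * 4)
# Made-up parameters: the negated half gets its own slope and threshold.
tuned = modified_crelu(x, [1.0, 1.0, 2.0, 1.0], [0.0, 0.0, 0.5, -1.0])
```

Either way, the convolution before the activation computes only half of the output channels, and the other half comes from negation.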

Inception
• Reduce redundancies resulting from various-sized objects
(Szegedy et al. 2015)
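
One way to see how an Inception module covers objects of different sizes is to compare the receptive fields of its branches. The branch layout below (a 1x1 branch, a 3x3 branch, and a 5x5 branch built from two stacked 3x3 convolutions) is an assumption for illustration, not a statement of PVANet's exact module.

```python
def receptive_field(kernels):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf


# Hypothetical branches; each list holds the kernel sizes along one branch.
branches = {
    "1x1": [1],
    "3x3": [1, 3],
    "5x5 as two 3x3": [1, 3, 3],
}
fields = {name: receptive_field(ks) for name, ks in branches.items()}
```

Concatenating the branch outputs gives each layer features at several scales without paying for large kernels on every channel.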

Main Building Blocks of PVANet
• Every convolutional layer in these building blocks has its
corresponding activation layers, a BatchNorm and a ReLU layer

Hyper-feature Concatenation
• Low-level details bypass redundant convolutional layers
• Higher-level convolutions concentrate on contexts/abstractions
Kong et al. (2016) HyperNet: Towards Accurate Region
Proposal Generation and Joint Object Detection
[Figure labels: pooling, upscale]
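
The pooling and upscale paths can be sketched as below. This is a simplified illustration: feature maps are lists of 2D channels, the fine map is downscaled with 2x2 max pooling, and the coarse map is upscaled by nearest-neighbor repetition, which is an assumption made here for brevity. All function names are hypothetical.

```python
def max_pool_2x2(ch):
    """Downscale one channel by 2 with 2x2 max pooling (even sizes assumed)."""
    return [[max(ch[r][c], ch[r][c + 1], ch[r + 1][c], ch[r + 1][c + 1])
             for c in range(0, len(ch[0]), 2)]
            for r in range(0, len(ch), 2)]


def upscale_2x(ch):
    """Upscale one channel by 2 with nearest-neighbor repetition."""
    out = []
    for row in ch:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def hyper_feature(fine, middle, coarse):
    """Bring three feature maps to the middle scale and stack their channels."""
    return ([max_pool_2x2(c) for c in fine]
            + [list(map(list, c)) for c in middle]
            + [upscale_2x(c) for c in coarse])


fine = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
middle = [[[0, 0], [0, 0]]]
coarse = [[[7]]]
combined = hyper_feature(fine, middle, coarse)
```

The combined map keeps low-level detail and high-level context side by side at a single resolution.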

Overall Structure
• 54 convolutional + 3 fully connected layers
• Residual connections and batch normalization

Results
• ILSVRC2012 Classification (Validation)
 As accurate as GoogLeNet and as light as AlexNet

Results
• VOC2007 Detection
 RPN can capture almost 99% of the target objects with only 200 proposals
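
Proposal recall of this kind can be computed as sketched below. The IoU threshold of 0.5 is an assumption (the slide does not state it), the proposals are assumed to be sorted by score, and the boxes are made-up examples in (x0, y0, x1, y1) form.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def recall_at_k(gt_boxes, proposals, top_k, threshold=0.5):
    """Fraction of ground-truth boxes matched by any of the top_k proposals."""
    kept = proposals[:top_k]
    hits = sum(1 for g in gt_boxes
               if any(iou(g, p) >= threshold for p in kept))
    return hits / len(gt_boxes) if gt_boxes else 0.0


# Made-up boxes, with proposals assumed to be sorted by score.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 29, 31)]
recall_1 = recall_at_k(gt_boxes, proposals, top_k=1)
recall_3 = recall_at_k(gt_boxes, proposals, top_k=3)
```

Recall can only stay the same or rise as top_k grows, so a high recall at a small proposal count means less work for the detection head.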

Results
 The lightest among >80% mAP models

Results
 Compressed model runs real-time (30 fps) on a GPU

Summary
• PVANet: Lightweight, deep neural network for high-accuracy real-time object
detection
• Design principles for a less-redundant network
 Deep but narrow
 Modified C.ReLU
 Inception and hyper-feature concatenation
• Potential for real-time object detection in edge devices or embedded systems
• Other methodologies can be easily integrated with PVANet and further
reduce its computational cost
