PVANet:
Lightweight Deep Neural Networks
for Real-time Object Detection
3rd September, 2017
JinWon Lee
Samsung Electronics
Sanghoon Hong, B. Roh, K. Kim, Y. Cheon, M. Park
Intel Imaging and Camera Technology
Many slides are copied from Sanghoon Hong’s slides
https://drive.google.com/drive/folders/0B8z5oUpB2DysSm1IOV9yeXRULVE
Before We Start…
• Faster R-CNN
  ▪ PR-013: presented by Jinwon Lee
  ▪ https://youtu.be/kcPAGIgBGRs
• YOLO
  ▪ PR-016: presented by Taegyun Jeon
  ▪ https://youtu.be/eTDcoeqj1_w
• YOLO9000
  ▪ PR-023: presented by Jinwon Lee
  ▪ https://youtu.be/6fdclSGgeio
• Concepts of Distance / Metric
  ▪ Terry’s deep learning talk by Terry Taewoong Um
  ▪ https://youtu.be/4KXgdf6Bmo4?list=PL0oFI08O71gKEXITQ7OG2SCCXkrtid7Fq
PASCAL VOC 2012 Leaderboard
Recap – Faster R-CNN
• Insert a Region Proposal Network (RPN)
after the last convolutional layer 
  ▪ using GPU!
• RPN trained to produce region
proposals directly; no need for external
region proposals
• After the RPN, use RoI pooling and a downstream classifier and bbox
regressor, just like Fast R-CNN
Recap – Faster R-CNN
Motivations
• Object Detection: slow & computationally expensive
• Successes in network compression
• Can we design a less-redundant network from scratch?
Kim et al. (2016). Compression of Deep Convolutional Neural Networks for
Fast and Low Power Mobile Applications
Han et al. (2015) Learning both weights and connections for
efficient neural networks
Design Principles
• Deep but Narrow
• Modified concatenated ReLU
• Inception
• Hyper-feature concatenation
Deep but Narrow
• Reduce redundancies from excessive convolutional outputs
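A rough cost illustration of why narrowing helps, not taken from the slides: for a k×k convolution the multiply-accumulate count scales with the product of input and output channels, so halving both cuts the cost of that layer to about a quarter. The layer sizes below are hypothetical and are not PVANet's actual configuration.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates for one stride-1, same-padded k x k convolution."""
    return height * width * kernel * kernel * c_in * c_out


# Hypothetical layer sizes, chosen only for illustration.
wide = conv_macs(56, 56, 3, 256, 256)
narrow = conv_macs(56, 56, 3, 128, 128)

print(wide // narrow)  # 4
```

The saved budget can then be spent on additional layers, which is the "deep" half of the principle.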
Modified Concatenated ReLU (mCReLU)
• Reduce redundancies in the early convolutional layers
• Better accuracy and lower training loss than the original C.ReLU (Shang et al. 2016)
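A minimal sketch of the idea, for one spatial position with a handful of channels. It assumes the modification is a learnable per-channel scale and shift applied after the concatenation and before the ReLU; the slide itself does not spell this out, and the function and parameter names here are illustrative.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def crelu(channels):
    """Original C.ReLU: concatenate x and -x along channels, then apply ReLU."""
    return [relu(v) for v in channels] + [relu(-v) for v in channels]


def mcrelu(channels, scale, shift):
    """Modified C.ReLU sketch: concat(x, -x), per-channel scale/shift, then ReLU.

    scale and shift stand in for learned parameters (an assumption here),
    one value per output channel, i.e. twice the number of input channels.
    """
    concat = list(channels) + [-v for v in channels]
    return [relu(s * v + b) for v, s, b in zip(concat, scale, shift)]


x = [1.5, -2.0]  # one spatial position, two channels
print(crelu(x))                          # [1.5, 0.0, 0.0, 2.0]
print(mcrelu(x, [1.0] * 4, [0.0] * 4))   # same as crelu(x) with identity scale/shift
```

The convolution only has to compute half of the output channels; the negated copy supplies the other half, and the scale/shift lets each half have its own slope and threshold.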
Inception
• Reduce redundancies resulting from objects of various sizes
(Szegedy et al. 2015)
Main Building Blocks of PVANet
• Every convolutional layer in these building blocks is followed by its
corresponding activation layers: a BatchNorm and a ReLU layer
Hyper-feature Concatenation
• Low-level details bypass redundant convolutional layers
• Higher-level convolutions concentrate on contexts/abstractions
Kong et al. (2016) HyperNet: Towards Accurate Region
Proposal Generation and Joint Object Detection
[Figure labels: pooling, upscale]
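The concatenation can be sketched as follows: the finer feature map is downscaled by pooling, the coarser one is upscaled, and all three are stacked along the channel axis at the middle resolution. This is a toy version under stated assumptions: 2x2 max pooling and nearest-neighbour upscaling are stand-ins chosen for simplicity, and the function names are hypothetical. A feature map is a list of channels, each a 2D list.

```python
def max_pool_2x2(fmap):
    """Downscale one channel by 2 with 2x2 max pooling (even height and width)."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]), 2)]
            for r in range(0, len(fmap), 2)]


def upscale_2x(fmap):
    """Upscale one channel by 2 with nearest-neighbour repetition."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def hyper_feature(low, mid, high):
    """Concatenate channels from three scales at the middle scale's resolution."""
    return ([max_pool_2x2(ch) for ch in low]
            + list(mid)
            + [upscale_2x(ch) for ch in high])


low = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]  # 1 channel, 4x4
mid = [[[1, 1], [1, 1]]]                                                 # 1 channel, 2x2
high = [[[7]]]                                                           # 1 channel, 1x1
features = hyper_feature(low, mid, high)
print(len(features))  # 3 channels, each 2x2
```

Because the low-level channels reach the detector directly, the deeper layers do not need to carry fine detail forward themselves.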
Overall Structure
• 54 convolutional + 3 fully connected layers
• Residual connections and batch normalization
Details of Networks
Results
• ILSVRC2012 Classification (Validation)
  ▪ As accurate as GoogLeNet and as light as AlexNet
Results
• VOC2007 Detection
  ▪ The RPN can capture almost 99% of the target objects with only 200 proposals
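For reference, proposal recall of this kind is typically measured as the fraction of ground-truth boxes overlapped by at least one of the top-N proposals. The sketch below assumes an IoU threshold of 0.5, which the slide does not state; the boxes are toy values invented for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at(proposals, ground_truth, top_n=200, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by any of the top_n proposals.

    proposals must already be sorted by descending objectness score.
    """
    kept = proposals[:top_n]
    hits = sum(1 for gt in ground_truth
               if any(iou(p, gt) >= iou_threshold for p in kept))
    return hits / len(ground_truth) if ground_truth else 0.0


# Toy boxes, invented for illustration only.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
props = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(recall_at(props, gts))  # 0.5
```

Fewer proposals mean fewer RoIs for the second stage to classify, so high recall at a small N translates directly into lower detection cost.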
Results
• VOC2012 Detection
  ▪ The lightest among >80% mAP models
Results
• VOC2012 Detection
  ▪ Compressed model runs in real time (30 fps) on a GPU
Summary
• PVANet: Lightweight, deep neural network for high-accuracy real-time object
detection
• Design principles for a less-redundant network
  ▪ Deep but narrow
  ▪ Modified C.ReLU
  ▪ Inception and hyper-feature concatenation
• Potential for real-time object detection on edge devices or embedded systems
• Other methodologies can be easily integrated with PVANet and further
reduce its computational cost

PVANet - PR033
