Perception and Intelligence Laboratory
Seoul
National
University
Fast R-CNN
Ross Girshick, MSRA
Junho Cho
15/08/07
• FRCN (Fast R-CNN)
• Fast Region-based Convolutional Networks (R-CNNs) for Object Detection
• VGG16: trains 9x faster than RCNN, 3x faster than SPPnet
runs (at test time) ~200x faster than RCNN, 10x faster than SPPnet
• Implemented in Python and C++/Caffe
https://github.com/rbgirshick/fast-rcnn
Perception and Intelligence Lab., Copyright © 2015 2
Introduction
< VGG16>
Previous methods
-RCNN & SPPnet
Chapter 01.
Classification & Detection
• R-CNN
Rich Feature Hierarchies for Accurate Object Detection
and Semantic Segmentation [CVPR 2014]
• SPPnet
Spatial Pyramid Pooling in Deep Convolutional Networks
for Visual Recognition [ECCV 2014]
• DeepMultiBox
Scalable Object Detection using Deep Neural Networks
[CVPR 2014]
Previously
Lab meeting
R-CNN: Regions with CNN features
1. Input image
2. Extract region proposals (~2k / image)
3. Compute CNN features
4. Classify regions (linear SVM): aeroplane? no. … person? yes. … tvmonitor? no.
Traditionally..
[Figure: traditional CNN (R-CNN): fixed-size input → conv → fc (4096, 4096, 1000)]
SPP net
[Figure: SPP-net: any-size input → conv → spatial pyramid pooling → fc (4096, 4096, 1000)]
• Fix bin numbers
• DO NOT fix bin size
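The "fix bin numbers, do not fix bin size" idea can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the SPP-net implementation: the function name `spatial_pyramid_pool` and the pyramid levels `(1, 2, 4)` are assumptions chosen for the example.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    For each pyramid level n the map is split into an n x n grid.
    The number of bins is fixed; the bin size adapts to H and W.
    The levels (1, 2, 4) are illustrative, not taken from the slides.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        # floor for the start and ceil for the end, so bins are never empty
        rows = [(int(np.floor(i * height / n)), int(np.ceil((i + 1) * height / n)))
                for i in range(n)]
        cols = [(int(np.floor(j * width / n)), int(np.ceil((j + 1) * width / n)))
                for j in range(n)]
        for r0, r1 in rows:
            for c0, c1 in cols:
                # one max per channel for this bin
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two inputs of different spatial size give vectors of the same length.
vec_a = spatial_pyramid_pool(np.random.rand(8, 13, 17))
vec_b = spatial_pyramid_pool(np.random.rand(8, 30, 9))
print(vec_a.shape, vec_b.shape)
```

Because the output length depends only on the channel count and the number of bins, the fc layers that follow can keep a fixed input size.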
[Figure: Spatial Pyramid Pooling. input image → conv layers → conv feature maps → pooling over a region → fc layers]
RCNN vs. SPP
• image regions vs. feature map regions
• R-CNN: 2000 nets on image regions (each region: image → net → feature)
• SPP-net: 1 net on full image (image → net → one feature per region)
“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”
K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014
[Figure: SPP-net: any-size input → conv → spatial pyramid pooling → fc (4096, 4096, 1000)]
• Forward: R times
• Backward: back-propagation R times
• Slow and heavy computation
R: # of RoIs
FRCN
• Higher mAP on PASCAL VOC than RCNN & SPPnet
• Training is single-stage, using a multi-task loss
• No linear SVM (unlike RCNN & SPPnet)
• Softmax & BB regressor trained together
• Simpler training & higher mAP
• All network layers can be updated during training
• SPPnet can only update the FC layers
• Higher mAP
• No disk storage is required for feature caching
• Unlike RCNN & SPPnet
• Very fast training & test time
• Novel method to train (back-propagate through) the ConvNet faster than SPPnet
Contribution of FRCN
FRCN
Chapter 02.
• Caffe implemented architecture
FRCN (test-time detection)
Each RoI pooled into fixed-size feature map
Mapped to RoI feature vector by fully-connected layers (FCs).
𝑁: # of feature maps
𝐾: # of object classes
𝑅: # of RoIs
FRCN architecture (RoI pooling layer)
RoI Pooling Layer
• Special case of SPP layer
• Two inputs
• Conv feature map: 512 × 𝐻 × 𝑊
(512, 𝐻, 𝑊: blob size after conv)
• RoI: 𝑅 × 5
• 5 from 𝑟, 𝑥, 𝑦, ℎ, 𝑤
• 𝑟 ∈ [0, 𝑁 − 1]: image batch index
• Adaptive max pooling
• Pooled to fixed size feature vector
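The adaptive max pooling described above can be sketched in NumPy as follows. This is a simplified illustration, not the Caffe layer: the function name `roi_max_pool` is hypothetical, the RoI tuple is read as (image index, x, y, h, w) with x and y the left column and top row, and coordinates are assumed to be given directly on the feature map (no scaling from image coordinates). The 7 × 7 default output is an assumed setting.

```python
import numpy as np

def roi_max_pool(feature_maps, rois, out_h=7, out_w=7):
    """Pool every RoI to a fixed (out_h, out_w) grid by max pooling.

    feature_maps: (N, C, H, W) conv features for a batch of N images.
    rois: (R, 5) integer rows (image index, x, y, h, w) in feature-map
          coordinates; each RoI is assumed to lie inside the map.
    Returns an array of shape (R, C, out_h, out_w).
    """
    num_rois = rois.shape[0]
    channels = feature_maps.shape[1]
    pooled = np.empty((num_rois, channels, out_h, out_w), dtype=feature_maps.dtype)
    for idx, (n, x, y, h, w) in enumerate(rois.astype(int)):
        window = feature_maps[n, :, y:y + h, x:x + w]
        for i in range(out_h):
            # sub-window rows: the bin size adapts to the RoI height
            r0 = int(np.floor(i * h / out_h))
            r1 = int(np.ceil((i + 1) * h / out_h))
            for j in range(out_w):
                c0 = int(np.floor(j * w / out_w))
                c1 = int(np.ceil((j + 1) * w / out_w))
                pooled[idx, :, i, j] = window[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled

features = np.random.rand(2, 512, 20, 30)            # N = 2 images
rois = np.array([[0, 0, 0, 20, 20], [1, 5, 3, 9, 14]])
print(roi_max_pool(features, rois).shape)
```

Each RoI yields the same output shape regardless of its size, which is what lets the FC layers that follow accept it.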
• Two sibling output layers for each RoI
1. Softmax probability estimates over the 𝐾 object classes + 1 background (BG) class
2. 4 real-valued numbers (𝑥, 𝑦, ℎ, 𝑤) for each of the 𝐾 object classes
• The 4𝐾 values encode a refined bounding box (BB) for each class
FRCN architecture (Two sibling layers)
𝑁: # of object box proposals
𝐾: # of object classes
• Two output types:
1. Bbox regressors: 𝑁𝐾 regressed object boxes
2. Softmax: 𝑃(𝑐𝑙𝑠 = 𝑘 | 𝑏𝑜𝑥 = 𝑛, 𝑖𝑚𝑎𝑔𝑒) for each of the 𝑁𝐾 boxes
• Use 3 pre-trained ImageNet networks
• CaffeNet(AlexNet) as S (5convs 3FCs)
• VGG_CNN_M_1024 as M (deep as S but wider)
• VGG16 as L (13convs 3FCs)
Training
< AlexNet> < VGG16>
Modification based on RCNN
• Last max pooling layer: replaced by RoI pooling layer
• Pooled to fixed size 𝐻′ × 𝑊′, compatible with the FCs
• Final FC layer & softmax → two sibling layers
• An FC layer and softmax over 𝐾 + 1 categories
• BB regressors
• Two data inputs
• A batch of 𝑁 images
• A list of 𝑅 RoIs
Training
• SPPnet
• SPP applied to pre-computed conv feature maps of whole image
• Conv features computed offline,
fine-tuning can’t back-propagate errors below SPP layer
• VGG16: first 13 conv layers remain fixed. Only 3 FC layers updated
• RoI-centric sampling
• Samples from all RoIs across all images (like RCNN)
• SGD back-propagation for each RoI
• Too much memory, too slow
• FRCN
• Image-centric sampling: more efficient
• Mini-batches are sampled hierarchically
• First sample images, then RoIs within those images
• RoIs from the same image share CNN computation and memory: more efficient
• Thus, one fine-tuning stage jointly optimizes the softmax classifier & BB regressors
• Components: loss, mini-batch sampling strategy, back-propagation through RoI pooling layers, and SGD hyperparameters
Fine-tuning
• Mini-batch sampling
• Each SGD mini-batch is built from 𝑁 = 2 images
• Mini-batch of 𝑅 = 128 RoIs: 64 RoIs from each of the 2 images
• 25% of the RoIs are object proposals with IoU ≥ 0.5 with a ground-truth box (foreground)
• The rest have maximum IoU with ground truth in [0.1, 0.5) and are used as background (BG)
• Sampled image horizontally flipped with probability 0.5
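A rough sketch of the per-image RoI sampling rule above, in NumPy. The helper names `iou` and `sample_rois`, the (x1, y1, x2, y2) box layout, and the choice to return index arrays are all assumptions made for illustration. Boxes are assumed to have positive area.

```python
import numpy as np

def iou(box, gt_boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], gt_boxes[:, 0])
    y1 = np.maximum(box[1], gt_boxes[:, 1])
    x2 = np.minimum(box[2], gt_boxes[:, 2])
    y2 = np.minimum(box[3], gt_boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area + gt_area - inter)

def sample_rois(proposals, gt_boxes, rois_per_image=64, fg_fraction=0.25, rng=None):
    """Pick foreground/background proposal indices for one image.

    Foreground: max IoU with ground truth >= 0.5 (up to 25% of the RoIs).
    Background: max IoU with ground truth in [0.1, 0.5).
    Also draws the horizontal-flip decision with probability 0.5.
    """
    rng = np.random.default_rng() if rng is None else rng
    max_iou = np.array([iou(p, gt_boxes).max() for p in proposals])
    fg = np.flatnonzero(max_iou >= 0.5)
    bg = np.flatnonzero((max_iou >= 0.1) & (max_iou < 0.5))
    n_fg = min(int(round(fg_fraction * rois_per_image)), fg.size)
    n_bg = min(rois_per_image - n_fg, bg.size)
    fg_idx = rng.permutation(fg)[:n_fg]   # random subset without replacement
    bg_idx = rng.permutation(bg)[:n_bg]
    flip = bool(rng.random() < 0.5)
    return fg_idx, bg_idx, flip
```

With 𝑁 = 2 images, calling `sample_rois` once per image and concatenating the results gives the 𝑅 = 128 mini-batch when enough proposals are available.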
Fine-tuning (mini-batch sampling)
[Figure annotations: 84 = 21 classes × 4 coordinate values; 5 = index and coordinates]
• 𝑅 = 128
• Multi-task loss 𝐿 is averaged over 𝑅 outputs.
• Input variable 𝑥
• Sum over all RoIs that max-pooled 𝑥 in the forward pass:
Fine-tuning
(Back-propagation through RoI pooling layer)
For all RoI, for all y in pooled vector, if y pooled x
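The rule above can be written as a single expression. This is a reconstruction from the slide text rather than a copy of the original figure: $x$ is an input to the RoI pooling layer, $r$ ranges over RoIs, $y$ ranges over the outputs in the pooled vector of RoI $r$, and $[\cdot]$ is 1 when its condition holds and 0 otherwise.

```latex
\frac{\partial L}{\partial x}
  = \sum_{r} \sum_{y} \,[\, y \text{ pooled } x \,]\, \frac{\partial L}{\partial y}
```

In words: the gradient reaching $x$ is the sum of the gradients of every pooled output that selected $x$ as its maximum in the forward pass.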
[Figure: SPP-net vs. FRCN backward pass: any-size input → conv → spatial pyramid pooling → fc (4096, 4096, 1000)]
• SPP-net: back-propagation R times. Slow and heavy computation
• FRCN: back-propagation 1 time. Fast & efficient
R: # of RoIs
Multi-task loss $L$ to train the network jointly for CLS and BB regressors
• Two sibling layers
1. Discrete probability distribution per RoI
• $p = (p_0, \dots, p_K)$ over $K + 1$ categories
• $p$ computed by a softmax over $K + 1$ categories
2. BB regressor offsets
• $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$ for each of the $K$ object classes, indexed by $k \in \{0, \dots, K\}$, with 0 as background (BG)
Fine-tuning (Multi-task loss)
• Multi-task loss $L$ to train the network jointly for CLS and BB regressors
$p = (p_0, \dots, p_K)$ over $K + 1$ categories
$t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$ for each of the $K$ object classes, indexed by $k \in \{0, \dots, K\}$, with 0 as BG
• $k^*$: true class label
• $L_{cls}(p, k^*) = -\log p_{k^*}$: standard cross-entropy/log loss
• $L_{loc}$: true BB for class $k^*$: $t^* = (t^*_x, t^*_y, t^*_w, t^*_h)$
predicted BB: $t = (t_x, t_y, t_w, t_h)$
Fine-tuning (Multi-task loss)
Iverson bracket $[k^* \geq 1]$:
0 if $k^* = 0$ (BG)
1 otherwise
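Putting the pieces together, the multi-task loss can be written out as below. The equation image did not survive extraction, so this is a reconstruction from the definitions on these slides ($p$, $k^*$, $t$, $t^*$, $\lambda$, and the Iverson bracket), with the smooth L1 form stated as I understand it from the Fast R-CNN paper.

```latex
L(p, k^*, t, t^*) = L_{cls}(p, k^*) + \lambda \,[k^* \geq 1]\, L_{loc}(t, t^*)

L_{loc}(t, t^*) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t_i - t^*_i)

\mathrm{smooth}_{L_1}(x) =
\begin{cases}
  0.5\,x^2    & \text{if } |x| < 1 \\
  |x| - 0.5   & \text{otherwise}
\end{cases}
```

The bracket switches the localization term off for background RoIs, which have no ground-truth box.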
• Use smooth L1 loss for BB regression
• Less sensitive to outliers than L2
• L2 loss can require careful tuning of the learning rate to prevent exploding gradients
• 𝜆 balances the two losses
• Generally 1
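A small NumPy sketch of the loss for a single RoI may help make the terms concrete. The function names `smooth_l1` and `multi_task_loss` are illustrative, and the code assumes `t` is the predicted offset tuple for the true class.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def multi_task_loss(p, k_star, t, t_star, lam=1.0):
    """Multi-task loss for one RoI.

    p:       (K + 1,) softmax probabilities, index 0 is background.
    k_star:  true class label (0 = background).
    t:       predicted offsets (t_x, t_y, t_w, t_h) for class k_star.
    t_star:  ground-truth offsets for class k_star.
    lam:     weight balancing the two losses.
    """
    l_cls = -np.log(p[k_star])
    l_loc = smooth_l1(np.asarray(t, dtype=float) - np.asarray(t_star, dtype=float)).sum()
    # The localization loss only counts for non-background RoIs.
    return l_cls + lam * float(k_star >= 1) * l_loc

probs = np.array([0.1, 0.7, 0.2])
print(multi_task_loss(probs, 1, [0.1, 0.0, 0.3, 2.0], [0.0, 0.0, 0.0, 0.0]))
```

For a background RoI (`k_star = 0`) the result reduces to the classification term alone.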
Fine-tuning (Multi-task loss)
RCNN / SPPnet / FRCN
Training pipeline
• RCNN & SPPnet: multi-stage pipeline with separate learning stages (extract features, fine-tune network with cross-entropy loss, train SVMs, fit bounding-box regressors)
• FRCN: single-stage training algorithm; simplifies the learning process using a multi-task loss (CLS + BB regressors)
Storage
• RCNN & SPPnet: training is expensive in space; features are cached for SVMs & regressors, which needs huge storage for VGG16
• FRCN: no disk storage is required for feature caching
Test-time detection
• RCNN: slow; the CNN runs for every object proposal, VGG16 detection takes 47s/image
• SPPnet & FRCN: fast; proposal warping after ConvNet & SPP, only one CNN computation
Fine-tuning
• RCNN: -
• SPPnet: only fully-connected layers (after SPP) can be updated
• FRCN: whole network can be updated
RCNN, SPPnet, FRCN comparison
Demo
Results & Discussion
Conclusion
Chapter 03.
1. State-of-the-art mAP on VOC 2007, 2010, and 2012 (at the time of the talk)
2. Fast training & testing time compared to RCNN & SPPnet
3. Fine-tuning conv layers in VGG16 is important
• NOT only FC layers
Results
< All networks are based on VGG16>
Results (mAP)
Training & Test time
Results (Time)
Fine-tuning only FCs vs. whole network?
• For shallower networks, fine-tuning only the FC layers seems sufficient
• But this doesn't hold for VGG16 (very deep NNs)
• Freezing the 13 conv layers so that only the 3 FC layers learn emulates SPPnet
• mAP drops 66.9% → 61.4%
• Training through the RoI pooling layer is very important for very deep nets (VGG16)
Results (Fine-tuning how many layers?)
• But fine-tuning all conv layers → inefficient
• Updating from conv2_1 slows training 1.3x compared to conv3_1 (12.5h vs 9.5h)
• Updating from conv1_1 over-runs GPU memory
• Conv1: generic and task independent
Results (Fine-tuning how many layers?)
Benefits from Multi-task training
• Convenient training
• Improves results: tasks influence each other through the shared ConvNet
• 𝜆 = 0: no BB regressors, only CLS
• 𝜆 = 1, but with BB regressors disabled at test time
• Isolates the network's CLS accuracy for comparison
• Improves pure CLS accuracy! (+0.8~1.1 mAP)
• Stage-wise: train with CLS loss only, then train the BB regressor layer with 𝐿𝑙𝑜𝑐 while freezing the others
• Good, but still underperforms multi-task learning
Results (Multi-task training)
More training data
• RCNN based on deep ConvNet learns better with larger dataset
Results (Additional Data)
• Increasing the # of object proposals doesn't help (although Average Recall ↑)
• Sparse object proposal methods (e.g. Selective Search) are a bottleneck
• Could be replaced with a dense set of sliding windows (nearly free to compute)
• Still, sparse proposals give better detection quality
Results (Object proposals)
• State-of-the-art detection result
• Detailed experiments providing insights.
• Sparse object proposals improve detector quality
• But, a bottleneck
• Decreasing the object proposal time is critical in the future.
Furthermore
• Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [arXiv]
• Detection network also proposes objects
• Cost of proposals: 10ms, VGG16 runtime ~200ms including all steps
• Higher mAP, faster
• R-CNN minus R [BMVC2015]
• Fast detector without Selective Search
• No algorithms other than CNN itself
• Attempts to remove Object proposal algorithms and rely exclusively on CNN
• More integrated, simpler and faster detector
• Share full-image convolutional features with detection net
Conclusion
Thank you
• DeepMultiBox
• Scalable object detection using DNN
• Class-agnostic scalable object detection
• Only bounding boxes; not aware of what the object in the box is
• Predicts a set of bounding boxes where potential objects are
• Localize then recognize
• Boxes generated using single DNN
• Outputs
• fixed number of bounding boxes.
• A score for each box. Confidence of the box containing an object.
Introduction
Training & Test time
• Truncated SVD for network compression (on FC layers)
• Large speed-up with only a small drop in mAP
(reduce detection time 30%, 0.3mAP drop)
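The compression step can be sketched with NumPy's SVD. This is a minimal illustration under stated assumptions, not the released code: the function name `truncate_fc` is hypothetical, and the layer is taken to compute `y = W @ x` with `W` of shape (u, v).

```python
import numpy as np

def truncate_fc(weights, rank):
    """Split one FC layer with (u, v) weights into two thinner layers.

    W is approximated by U_t @ diag(S_t) @ Vt_t using the top `rank`
    singular values, cutting parameters from u*v to rank*(u + v).
    Returns (first, second): first has shape (rank, v), second (u, rank),
    so that second @ first approximates W.
    """
    u_mat, sigma, vt = np.linalg.svd(weights, full_matrices=False)
    first = sigma[:rank, None] * vt[:rank]    # diag(S_t) @ Vt_t
    second = u_mat[:, :rank]                  # U_t
    return first, second

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 256))
first, second = truncate_fc(w, rank=16)
x = rng.standard_normal(256)
approx = second @ (first @ x)                 # two small matrix products
print(w.size, first.size + second.size)       # parameter counts before/after
```

The speed-up comes from replacing one large matrix product per RoI with two small ones; how much accuracy is lost depends on the chosen rank.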
Results (Time)
• Multi-stage pipeline
• Separate learning stages
• FRCN: simplification of the learning process & state of the art
Single-stage training algorithm
RCNN & SPPnet

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 

Recently uploaded (20)

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

150807 Fast R-CNN

  • 1. Perception and Intelligence Laboratory Seoul National University Fast R-CNN Ross Girshick, MSRA Junho Cho 15/08/07
  • 2. • FRCN (Fast R-CNN) • Fast Region-based Convolutional Networks (R-CNNs) for Object Detection • VGG16: Trains 9x faster than RCNN, 3x faster than SPPnet Runs 200x faster than RCNN, 10x faster than SPPnet • Implemented on Python and C++/Caffe https://github.com/rbgirshick/fast-rcnn Perception and Intelligence Lab., Copyright © 2015 2 Introduction < VGG16>
  • 3. Previous methods -RCNN & SPPnet Chapter 01.
  • 4. Classification & Detection
  • 5. Previously (lab meeting) • R-CNN: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation [CVPR 2014] • SPPnet: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition [ECCV 2014] • DeepMultiBox: Scalable Object Detection using Deep Neural Networks [CVPR 2014]
  • 6. R-CNN: Regions with CNN features • Input image • Extract region proposals (~2k / image) • Compute CNN features • Classify regions (linear SVM): aeroplane? no. ... person? yes. tvmonitor? no. ...
  • 8. SPP-net: spatial pyramid pooling • Fix bin numbers • DO NOT fix bin size • [Figure: input image (any size) → conv layers → conv feature maps → spatial pyramid pooling over a region → fc layers (4096, 4096, 1000)]
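The "fix bin numbers, do not fix bin size" idea on slide 8 can be illustrated with a short sketch. It assumes the window rule described in the SPP-net paper (window = ceil(a/n), stride = floor(a/n) for an a × a feature map and n × n bins); the function names are hypothetical and chosen only for this example.

```python
import math

def spp_window(a, n):
    """Return (window, stride) for pooling an a x a feature map into n x n bins.

    Assumes the SPP-net rule: window = ceil(a / n), stride = floor(a / n).
    The number of bins n is fixed; the window size adapts to the input size a.
    """
    return math.ceil(a / n), math.floor(a / n)

def window_starts(a, n):
    """Start offsets of the pooling windows along one axis."""
    win, stride = spp_window(a, n)
    return list(range(0, a - win + 1, stride))
```

For a 13 × 13 conv feature map, a 3-level pyramid with n = 3, 2, 1 gives windows of 5, 7 and 13 cells, so the concatenated output length stays the same even when a changes.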
  • 9. SPP-net vs. traditional CNN • SPP-net: input of any size → conv → spatial pyramid pooling → fc (4096, 4096, 1000) • Fix bin numbers • DO NOT fix bin size • Traditional CNN (R-CNN): fixed-size input → conv → fixed-size fc (4096, 4096, 1000)
  • 10. RCNN vs. SPP • Image regions vs. feature map regions • R-CNN: 2000 nets on image regions • SPP-net: 1 net on full image • Source: “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”, K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014
  • 11. SPP-net • Forward: spatial pyramid pooling on input of any size • Backward: back-propagation runs R times (R: # of RoIs) • Slow and heavy computation
  • 12. Contribution of FRCN • Higher mAP on PASCAL VOC than RCNN & SPPnet • Training is single-stage, using a multi-task loss • No linear SVM (unlike RCNN & SPPnet) • Softmax & BB regressor trained together • Simpler training & higher mAP • All network layers can be updated during training • SPPnet can only update FCs • Higher mAP • No disk storage is required for feature caching • Unlike RCNN & SPPnet • Very fast training & test time • Novel method to train (back-propagate through) the ConvNet faster than SPPnet
  • 14. FRCN (test-time detection) • Caffe implemented architecture
  • 15. FRCN (test-time detection)
  • 16. FRCN architecture (RoI pooling layer) • Each RoI is pooled into a fixed-size feature map • Mapped to an RoI feature vector by fully-connected layers (FCs) • 𝑁: # of feature maps, 𝐾: # of object classes, 𝑅: # of RoIs
  • 17. FRCN architecture (RoI pooling layer) • 𝑁: # of feature maps, 𝐾: # of object classes, 𝑅: # of RoIs • RoI pooling layer • Special case of the SPP layer • Two inputs • Conv feature map: 512 × 𝐻 × 𝑊 (512, 𝐻, 𝑊: blob size after conv) • RoI: 𝑅 × 5 • 5 from (𝑟, 𝑥, 𝑦, ℎ, 𝑤) • 𝑟 ∈ [0, 𝑁 − 1]: image batch index • Adaptive max pooling • Pooled to a fixed-size feature vector
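The adaptive max pooling on slide 17 can be sketched in plain Python. This is a minimal illustration, not the Caffe layer: it handles one channel and one RoI, takes the RoI as (x, y, h, w) in feature-map cells (the batch index is omitted), and the bin-boundary rounding (floor for the start, ceil for the end) is an assumption made for this example.

```python
import math

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a single-channel feature map to out_h x out_w.

    feature_map: H x W list of lists of numbers.
    roi: (x, y, h, w), top-left corner and size in feature-map cells.
    The output size is fixed; the bin size adapts to the RoI size.
    """
    x0, y0, h, w = roi
    pooled = []
    for i in range(out_h):
        # Row range covered by output row i.
        r0 = y0 + math.floor(i * h / out_h)
        r1 = y0 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            # Column range covered by output column j.
            c0 = x0 + math.floor(j * w / out_w)
            c1 = x0 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

RoIs of different sizes all come out as out_h × out_w, which is what lets the following FC layers accept them.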
  • 18. FRCN architecture (RoI pooling layer)
  • 19. FRCN architecture (two sibling layers) • Two sibling layers per RoI 1. Softmax probability estimates over the K object classes + 1 background class 2. 4 real-valued numbers (x, y, h, w) for each of the K object classes • The 4K values encode a refined bounding box for each class
  • 20. FRCN architecture (two sibling layers) • 𝑁: # of object box proposals, 𝐾: # of object classes • Two output types: 1. Softmax: 𝑃(𝑐𝑙𝑠 = 𝑘 | 𝑏𝑜𝑥 = 𝑛, 𝑖𝑚𝑎𝑔𝑒) for each of the 𝑁𝐾 boxes 2. Bbox regressors: 𝑁𝐾 regressed object boxes
  • 21. Training • Use 3 pre-trained ImageNet networks • CaffeNet (AlexNet) as S (5 convs, 3 FCs) • VGG_CNN_M_1024 as M (as deep as S but wider) • VGG16 as L (13 convs, 3 FCs)
  • 22. Training • Modification based on RCNN • Last max pooling layer: replaced by an RoI pooling layer • Pooled to a fixed size 𝐻′ × 𝑊′ compatible with the FCs • Final FC layer & softmax → two sibling layers • An FC layer and softmax over 𝐾 + 1 categories • BB regressors • Two data inputs • A batch of 𝑁 images • A list of 𝑅 RoIs
  • 23. Fine-tuning • SPPnet • SPP is applied to pre-computed conv feature maps of the whole image • Conv features are computed offline, so fine-tuning can't back-propagate errors below the SPP layer • VGG16: the first 13 conv layers remain fixed; only the 3 FC layers are updated • RoI-centric sampling • Sample from all RoIs (like RCNN) • SGD back-propagation for each RoI • Too much memory, too slow • FRCN • Image-centric sampling: more efficient • Mini-batches are sampled hierarchically • First sample images, then RoIs within those images • RoIs from the same image share CNN computation and memory, which is more efficient • Thus, one fine-tuning stage jointly optimizes the softmax classifier & BB regressors • Components: loss, mini-batch sampling strategy, back-propagation through RoI pooling layers, and SGD hyperparameters
  • 24. Fine-tuning (mini-batch sampling) • Each SGD mini-batch is built from 𝑁 = 2 images • Mini-batches of 𝑅 = 128 RoIs, 64 RoIs from each of the 2 images • 25% of the RoIs come from object proposals with IoU ≥ 0.5 with a ground-truth box • Proposals whose maximum IoU with ground truth is in [0.1, 0.5) are used as BG • Each sampled image is horizontally flipped with probability 0.5
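The per-image RoI sampling rule on slide 24 can be sketched as follows. This is an illustration under stated assumptions, not the released Fast R-CNN code: boxes are (x1, y1, x2, y2) tuples, the function names and the `rng` parameter are hypothetical, and horizontal flipping is left out.

```python
import random

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def sample_rois(proposals, gt_boxes, rois_per_image=64,
                fg_fraction=0.25, rng=random):
    """Pick foreground and background RoIs for one image.

    Foreground: max IoU with any ground-truth box >= 0.5.
    Background: max IoU in [0.1, 0.5). Other proposals are ignored.
    """
    fg, bg = [], []
    for box in proposals:
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        if best >= 0.5:
            fg.append(box)
        elif best >= 0.1:
            bg.append(box)
    # Up to 25% foreground; the rest of the quota is filled with background.
    n_fg = min(int(rois_per_image * fg_fraction), len(fg))
    n_bg = min(rois_per_image - n_fg, len(bg))
    return rng.sample(fg, n_fg), rng.sample(bg, n_bg)
```

With 𝑁 = 2 images and 64 RoIs per image, calling this once per image yields the 𝑅 = 128 RoIs of one mini-batch, provided each image has enough qualifying proposals.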
  • 25. Fine-tuning (mini-batch sampling) • 84 = 21 classes × 4 coordinate values • 5: index and coordinates
  • 26. Fine-tuning (back-propagation through RoI pooling layer) • 𝑅 = 128 • Multi-task loss 𝐿 is averaged over the 𝑅 outputs • Input variable 𝑥 • Sum over all RoIs that max-pooled 𝑥 in the forward pass: $\frac{\partial L}{\partial x} = \sum_{r} \sum_{y \in r} [\,y \text{ pooled } x\,] \frac{\partial L}{\partial y}$ • That is, for every RoI $r$ and every $y$ in its pooled vector, the gradient of $y$ is added to $x$ if $y$ pooled $x$
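The gradient routing on slide 26 amounts to accumulating each pooled output's gradient at the input cell it selected. The sketch below assumes the forward pass recorded those argmax positions; the data layout (lists of (row, col) tuples per RoI) and the function name are hypothetical and chosen only for illustration.

```python
def roi_pool_backward(height, width, argmaxes, grad_outputs):
    """Gradient of the loss w.r.t. a single-channel conv feature map.

    argmaxes[r][j]: (row, col) of the input cell that pooled output j of
        RoI r selected in the forward pass.
    grad_outputs[r][j]: dL/dy for that pooled output.
    """
    grad_input = [[0.0] * width for _ in range(height)]
    for roi_argmax, roi_grad in zip(argmaxes, grad_outputs):
        for (row, col), g in zip(roi_argmax, roi_grad):
            # Overlapping RoIs may select the same cell, so gradients add up.
            grad_input[row][col] += g
    return grad_input
```

Because every RoI in the mini-batch writes into the same feature-map gradient, the conv layers need only one backward pass per image.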
  • 27. SPP-net vs. FRCN backward pass • SPP-net: forward with spatial pyramid pooling on input of any size; backward runs back-propagation R times (R: # of RoIs) • Slow and heavy computation • FRCN: 1 back-propagation instead of R • Fast & efficient
  • 28. Fine-tuning (multi-task loss) • Multi-task loss $L$ trains the network jointly for CLS and BB regressors • Two sibling layers 1. Discrete probability distribution per RoI • $p = (p_0, \dots, p_K)$ over $K + 1$ categories • $p$ is computed by a softmax over $K + 1$ categories 2. BB regressor offsets • $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$ for each of the $K$ object classes, indexed by $k \in [0, \dots, K]$, with 0 as background (BG)
  • 29. Fine-tuning (multi-task loss) • Multi-task loss $L$ trains the network jointly for CLS and BB regressors • $p = (p_0, \dots, p_K)$ over $K + 1$ categories • $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$ for each of the $K$ object classes, indexed by $k \in [0, \dots, K]$, with 0 as BG • $k^*$: true class label • $L_{cls}(p, k^*) = -\log p_{k^*}$: standard cross-entropy/log loss • $L_{loc}$: true bb for class $k^*$: $t^* = (t^*_x, t^*_y, t^*_w, t^*_h)$; predicted bb: $t = (t_x, t_y, t_w, t_h)$ • Iverson bracket $[k^* \ge 1]$: 0 if $k^* = 0$ (BG), 1 otherwise
  • 30. Fine-tuning (multi-task loss) • Use the smooth L1 loss • Less sensitive to outliers than L2 • L2 loss requires significant tuning of the learning rate • 𝜆 balances the two losses • Generally 1
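The pieces on slides 28 to 30 combine into one per-RoI loss. The sketch below assumes the form given in the Fast R-CNN paper, L = L_cls(p, k*) + λ [k* ≥ 1] L_loc(t, t*), with L_loc summing the smooth L1 of the four offset differences, where smooth L1(x) is 0.5x² for |x| < 1 and |x| − 0.5 otherwise. The function names are hypothetical, and `t` is taken to be the predicted offsets for the true class.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multi_task_loss(p, k_star, t, t_star, lam=1.0):
    """Per-RoI multi-task loss.

    p: softmax probabilities over K + 1 classes (index 0 is background).
    k_star: true class label.
    t, t_star: predicted and true (x, y, w, h) offsets for class k_star.
    lam: weight balancing the two losses (the slides use 1).
    """
    l_cls = -math.log(p[k_star])
    if k_star == 0:
        # Iverson bracket is 0 for background: no localization loss.
        return l_cls
    l_loc = sum(smooth_l1(a - b) for a, b in zip(t, t_star))
    return l_cls + lam * l_loc
```

The linear branch is what makes the loss less sensitive to outliers than L2: a large offset error contributes a bounded gradient instead of one that grows with the error.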
  • 31. Fine-tuning (multi-task loss)
  • 32. RCNN, SPPnet, FRCN comparison • Training pipeline • RCNN & SPPnet: multi-stage pipeline with separate learning stages (extract features, fine-tune network with cross-entropy loss, train SVMs, fit bounding box regressors) • FRCN: single-stage training algorithm; simplification of the learning process using a multi-task loss (CLS + BB regressors) • Storage • RCNN & SPPnet: training is expensive in space; features are cached for SVMs & regressors, which needs huge storage for VGG16 • FRCN: no disk storage is required for feature caching • Test-time detection • RCNN: slow; CNN runs for all object proposals, VGG16 detection takes 47s/image • SPPnet: proposal warping after ConvNet & SPP, only one CNN computation • FRCN: fast test-time detection • Updatable layers • SPPnet: only fully-connected layers (after SPP) can be updated • FRCN: whole network can be updated
  • 33. Demo
  • 34. Demo
  • 36. Results (all networks are based on VGG16) 1. State-of-the-art mAP on VOC07, 2010, 2012 (at the moment) 2. Fast training & testing time compared to RCNN & SPPnet 3. Fine-tuning conv layers in VGG16 is important • NOT only FC layers
  • 37. Results (mAP)
  • 38. Results (time) • Training & test time
  • 39. Results (fine-tuning how many layers?) • Fine-tuning only FCs vs. the whole network? • Fine-tuning only the FC layers seems fine • But this doesn’t hold for VGG16 (very deep NNs) • Freezing the 13 conv layers so only the 3 FC layers learn emulates SPPnet • mAP drops 66.9% → 61.4% • Training through the RoI pooling layer is very important for very deep nets (VGG16)
  • 40. Results (fine-tuning how many layers?) • But fine-tuning all conv layers → inefficient • Updating from conv2_1 slows training 1.3x compared to conv3_1 (12.5h vs 9.5h) • Over-runs GPU memory • Conv1: generic and task independent
  • 41. Results (multi-task training) • Benefits from multi-task training • Convenient training • Improves results: tasks influence each other through the ConvNet • 𝜆 = 0: no BB regressors, only CLS • 𝜆 = 1, but BB regressors disabled at test time • Isolates the network’s CLS accuracy for comparison • Improves pure CLS accuracy! (+0.8~1.1 mAP) • Train with CLS loss only, then train the BB regressor layer with 𝐿𝑙𝑜𝑐 while freezing the others • Good, but still underperforms multi-task learning
  • 42. Results (additional data) • More training data • RCNN based on a deep ConvNet learns better with a larger dataset
  • 43. Results (object proposals) • Increasing the # of object proposals doesn’t help (although Average Recall ↑) • Sparse object proposal methods (e.g. Selective Search) are a bottleneck • Replacement with a dense set of sliding windows (free cost) • Sparse proposals still give better detection quality
  • 44. Conclusion • State-of-the-art detection result • Detailed experiments providing insights • Sparse object proposals improve detector quality • But they are a bottleneck • Decreasing the object proposal time is critical in the future • Furthermore • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Object proposal [ArXiv] • Detection network also proposes objects • Cost of proposals: 10ms, VGG16 runtime ~200ms including all steps • Higher mAP, faster • R-CNN minus R [BMVC 2015] • Fast detector without Selective Search • No algorithms other than the CNN itself • Attempts to remove object proposal algorithms and rely exclusively on the CNN • More integrated, simpler and faster detector • Shares full-image convolutional features with the detection net
  • 46. Introduction • DeepMultiBox • Scalable object detection using DNN • Class-agnostic scalable object detection • Only bounding boxes; not aware of what the object in the box is • Predicts a set of bounding boxes where potential objects are • Localize, then recognize • Boxes generated using a single DNN • Outputs • A fixed number of bounding boxes • A score for each box: confidence that the box contains an object
  • 47. Results (time) • Training & test time • Truncated SVD for network compression (on FC layers) • High speed-ups with small drops in mAP (reduces detection time by 30% with a 0.3 mAP drop)
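The saving from truncated SVD on slide 47 follows from simple parameter counting: a u × v FC weight matrix factored at rank t is replaced by two smaller layers with t(u + v) weights in total. The sketch below works this out for VGG16's first FC layer (25088 × 4096); the rank t = 1024 is an assumption used only for illustration, and the function names are hypothetical.

```python
def fc_params(u, v):
    """Weights in a dense u x v fully-connected layer (biases ignored)."""
    return u * v

def svd_params(u, v, t):
    """Weights after a rank-t factorization into (u x t) and (t x v) layers."""
    return t * (u + v)

def compression_ratio(u, v, t):
    """How many times fewer weights the factored layer uses."""
    return fc_params(u, v) / svd_params(u, v, t)
```

At these sizes the factored layer has 29,884,416 weights instead of 102,760,448, roughly 3.4 times fewer. FC cost matters in detection because the FC layers run once per RoI, while the conv layers run once per image.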
  • 48. RCNN & SPPnet • Multi-stage pipeline • Separate learning stages • FRCN: simplification of the learning process & state of the art • Single-stage training algorithm