SlideShare a Scribd company logo
1 of 41
Faster R-CNN
By; Deep Learning Team
Faster R-CNN(NIPS 2015)
Computer Vision Task
History(?) of R-CNN
• Rich feature hierarchies for accurate object detection and semantic segmentation(2013)
• Fast R-CNN(2015)
• Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks(2015)
• Mask R-CNN(2017)
Is Faster R-CNN
Really Fast?
• Generally R-FCN and SSD
models are faster on
average while Faster R-
CNN models are more
accurate
• Faster R-CNN models can
be faster if we limit the
number of regions
proposed
R-CNN Architecture
R-CNN
Region Proposals – Selective Search
• Bottom-up segmentation, merging regions at multiple scales
Convert
regions to
boxes
R-CNN Training
• Pre-train a ConvNet(AlexNet) for ImageNet classification dataset
• Fine-tune for object detection(softmax + log loss)
• Cache feature vectors to disk
• Train post hoc linear SVMs(hinge loss)
• Train post hoc linear bounding-box regressors(squared loss)
“Post hoc” means the parameters are learned after
the ConvNet is fixed
Bounding-Box Regression
specifies the pixel coordinates of the center of proposal Pi’s
bounding box together with Pi’s width and height in pixels
means the ground-truth bounding box
Bounding-Box Regression
Problems of R-CNN
• Slow at test-time: need to run full forward path of CNN for
each region proposal
 13s/image on a GPU(K40)
 53s/image on a CPU
• SVM and regressors are post-hoc: CNN features not updated
in response to SVMs and regressors
• Complex multistage training pipeline (84 hours using K40
GPU)
 Fine-tune network with softmax classifier(log loss)
 Train post-hoc linear SVMs(hinge loss)
 Train post-hoc bounding-box regressions(squared loss)
Fast R-CNN
• Fix most of what’s wrong with R-CNN and SPP-net
• Train the detector in a single stage, end-to-end
 No caching features to disk
 No post hoc training steps
• Train all layers of the network
Fast R-CNN Architecture
Fast R-CNN
RoI Pooling
RoI Pooling
RoI in Conv feature map : 21x14  3x2 max pooling with stride(3, 2)  output : 7x7
RoI in Conv feature map : 35x42  5x6 max pooling with stride(5, 6)  output : 7x7
VGG-16
Training & Testing
1. Takes an input and a set of bounding boxes
2. Generate convolutional feature maps
3. For each bbox, get a fixed-length feature vector from RoI
pooling layer
4. Outputs have two information
 K+1 class labels
 Bounding box locations
• Loss function
R-CNN vs SPP-net vs Fast R-CNN
Runtime dominated by
region proposals!
Problems of Fast R-CNN
• Out-of-network region proposals are the test-time
computational bottleneck
• Is it fast enough??
Faster R-CNN(RPN + Fast R-CNN)
• Insert a Region Proposal
Network (RPN) after the last
convolutional layer  using GPU!
• RPN trained to produce region
proposals directly; no need for
external region proposals
• After RPN, use RoI Pooling and
an upstream classifier and bbox
regressor just like Fast R-CNN
Training Goal : Share Features
3 x 3
RPN
• Slide a small window on the
feature map
• Build a small network for
 Classifying object or not-object
 Regressing bbox locations
• Position of the sliding window
provides localization information
with reference to the image
• Box regression provides finer
localization information with
reference to this sliding window
ZF : 256-d, VGG : 512-d
RPN
• Use k anchor boxes at each
location
• Anchors are translation
invariant: use the same ones at
every location
• Regression gives offsets from
anchor boxes
• Classification gives the
probability that each
(regressed) anchor shows an
object
3 x 3
RPN(Fully Convolutional Network)
• Intermediate Layer – 256(or 512)
3x3 filter, stride 1, padding 1
• Cls layer – 18(9x2) 1x1 filter, stride
1, padding 0
• Reg layer – 36(9x4) 1x1 filter, stride
1, padding 0
ZF : 256-d, VGG : 512-d
Anchors as references
• Anchors: pre-defined reference boxes
• Multi-scale/size anchors:
 Multiple anchors are used at each position:
3 scale(128x128, 256x256, 512x512) and 3 aspect rations(2:1, 1:1, 1:2) yield 9
anchors
 Each anchor has its own prediction function
 Single-scale features, multi-scale predictions
Positive/Negative Samples
• An anchor is labeled as positive if
 The anchor is the one with highest IoU overlap with a ground-truth
box
 The anchor has an IoU overlap with a ground-truth box higher than
0.7
• Negative labels are assigned to anchors with IoU lower than
0.3 for all ground-truth boxes
• 50%/50% ratio of positive/negative anchors in a minibatch
RPN Loss Function
4-Step Alternating Training
Results
Experiments
Experiments
Experiments
Is It Enough?
• RoI Pooling has some quantization operations
• These quantizations introduce misalignments between the RoI
and the extracted features
• While this may not impact classification, it can make a
negative effect on predicting bbox
Mask R-CNN
Mask R-CNN
• Mask R-CNN extends Faster R-CNN by adding a branch for
predicting segmentation masks on each Region of Interest
(RoI), in parallel with the existing branch for classification and
bounding box regression
Loss Function, Mask Branch
• The mask branch has a K x m x m - dimensional output for
each RoI, which encodes K binary masks of resolution m × m,
one for each of the K classes.
• Applying per-pixel sigmoid
• For an RoI associated with ground-truth class k, Lmask is only
defined on the k-th mask
RoI Align
• RoI Align don’t use quantization of the RoI boundaries
• Bilinear interpolation is used for computing the exact values
of the input features
Results – MS COCO
ThankYou
Thank You

More Related Content

Similar to Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningCharles Deledalle
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
Online video object segmentation via convolutional trident network
Online video object segmentation via convolutional trident networkOnline video object segmentation via convolutional trident network
Online video object segmentation via convolutional trident networkNAVER Engineering
 
Image segmentation hj_cho
Image segmentation hj_choImage segmentation hj_cho
Image segmentation hj_choHyungjoo Cho
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
Recent Object Detection Research & Person Detection
Recent Object Detection Research & Person DetectionRecent Object Detection Research & Person Detection
Recent Object Detection Research & Person DetectionKai-Wen Zhao
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationDat Nguyen
 
Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Universitat de Barcelona
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetRishabh Indoria
 
Fast methods for deep learning based object detection
Fast methods for deep learning based object detectionFast methods for deep learning based object detection
Fast methods for deep learning based object detectionBrodmann17
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterYousef Fadila
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023Jinwon Lee
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
 
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...James McCombs
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Jihong Kang
 
Deep image retrieval learning global representations for image search
Deep image retrieval  learning global representations for image searchDeep image retrieval  learning global representations for image search
Deep image retrieval learning global representations for image searchUniversitat Politècnica de Catalunya
 

Similar to Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (20)

MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
Online video object segmentation via convolutional trident network
Online video object segmentation via convolutional trident networkOnline video object segmentation via convolutional trident network
Online video object segmentation via convolutional trident network
 
Image segmentation hj_cho
Image segmentation hj_choImage segmentation hj_cho
Image segmentation hj_cho
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Recent Object Detection Research & Person Detection
Recent Object Detection Research & Person DetectionRecent Object Detection Research & Person Detection
Recent Object Detection Research & Person Detection
 
Detection
DetectionDetection
Detection
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
 
Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
Fast methods for deep learning based object detection
Fast methods for deep learning based object detectionFast methods for deep learning based object detection
Fast methods for deep learning based object detection
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity Calorimeter
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023
 
D3L4-objects.pdf
D3L4-objects.pdfD3L4-objects.pdf
D3L4-objects.pdf
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
Performance Benchmarking of the R Programming Environment on the Stampede 1.5...
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
Deep image retrieval learning global representations for image search
Deep image retrieval  learning global representations for image searchDeep image retrieval  learning global representations for image search
Deep image retrieval learning global representations for image search
 
Moving object detection on FPGA
Moving object detection on FPGAMoving object detection on FPGA
Moving object detection on FPGA
 

More from fahmi324663

Week3- Face Identification with K-Nears Neighbour .pptx
Week3- Face Identification with K-Nears Neighbour .pptxWeek3- Face Identification with K-Nears Neighbour .pptx
Week3- Face Identification with K-Nears Neighbour .pptxfahmi324663
 
Week3-Deep Neural Network (DNN).pptx
Week3-Deep Neural Network (DNN).pptxWeek3-Deep Neural Network (DNN).pptx
Week3-Deep Neural Network (DNN).pptxfahmi324663
 
intelligentsystems-140424154432-phpapp01.pptx
intelligentsystems-140424154432-phpapp01.pptxintelligentsystems-140424154432-phpapp01.pptx
intelligentsystems-140424154432-phpapp01.pptxfahmi324663
 
Week2- Deep Learning Intuition.pptx
Week2- Deep Learning Intuition.pptxWeek2- Deep Learning Intuition.pptx
Week2- Deep Learning Intuition.pptxfahmi324663
 
Week1- Introduction.pptx
Week1- Introduction.pptxWeek1- Introduction.pptx
Week1- Introduction.pptxfahmi324663
 
WMP_MP02_revd(10092023).pptx
WMP_MP02_revd(10092023).pptxWMP_MP02_revd(10092023).pptx
WMP_MP02_revd(10092023).pptxfahmi324663
 
WMP_MP02_revd_03(10092023).pptx
WMP_MP02_revd_03(10092023).pptxWMP_MP02_revd_03(10092023).pptx
WMP_MP02_revd_03(10092023).pptxfahmi324663
 
si402_p02_konsep-arsitektur-enterprise.pptx
si402_p02_konsep-arsitektur-enterprise.pptxsi402_p02_konsep-arsitektur-enterprise.pptx
si402_p02_konsep-arsitektur-enterprise.pptxfahmi324663
 
Pertemuan 12 Teorema Bayes Lanjutan.pptx
Pertemuan 12  Teorema Bayes Lanjutan.pptxPertemuan 12  Teorema Bayes Lanjutan.pptx
Pertemuan 12 Teorema Bayes Lanjutan.pptxfahmi324663
 
Pertemuan 4 Metode Forward Chaining.pptx
Pertemuan 4  Metode Forward Chaining.pptxPertemuan 4  Metode Forward Chaining.pptx
Pertemuan 4 Metode Forward Chaining.pptxfahmi324663
 

More from fahmi324663 (11)

Week3- Face Identification with K-Nears Neighbour .pptx
Week3- Face Identification with K-Nears Neighbour .pptxWeek3- Face Identification with K-Nears Neighbour .pptx
Week3- Face Identification with K-Nears Neighbour .pptx
 
Week3-Deep Neural Network (DNN).pptx
Week3-Deep Neural Network (DNN).pptxWeek3-Deep Neural Network (DNN).pptx
Week3-Deep Neural Network (DNN).pptx
 
intelligentsystems-140424154432-phpapp01.pptx
intelligentsystems-140424154432-phpapp01.pptxintelligentsystems-140424154432-phpapp01.pptx
intelligentsystems-140424154432-phpapp01.pptx
 
Week2- Deep Learning Intuition.pptx
Week2- Deep Learning Intuition.pptxWeek2- Deep Learning Intuition.pptx
Week2- Deep Learning Intuition.pptx
 
Week1- Introduction.pptx
Week1- Introduction.pptxWeek1- Introduction.pptx
Week1- Introduction.pptx
 
WMP_MP02_revd(10092023).pptx
WMP_MP02_revd(10092023).pptxWMP_MP02_revd(10092023).pptx
WMP_MP02_revd(10092023).pptx
 
WMP_MP02_revd_03(10092023).pptx
WMP_MP02_revd_03(10092023).pptxWMP_MP02_revd_03(10092023).pptx
WMP_MP02_revd_03(10092023).pptx
 
DSS-1.pptx
DSS-1.pptxDSS-1.pptx
DSS-1.pptx
 
si402_p02_konsep-arsitektur-enterprise.pptx
si402_p02_konsep-arsitektur-enterprise.pptxsi402_p02_konsep-arsitektur-enterprise.pptx
si402_p02_konsep-arsitektur-enterprise.pptx
 
Pertemuan 12 Teorema Bayes Lanjutan.pptx
Pertemuan 12  Teorema Bayes Lanjutan.pptxPertemuan 12  Teorema Bayes Lanjutan.pptx
Pertemuan 12 Teorema Bayes Lanjutan.pptx
 
Pertemuan 4 Metode Forward Chaining.pptx
Pertemuan 4  Metode Forward Chaining.pptxPertemuan 4  Metode Forward Chaining.pptx
Pertemuan 4 Metode Forward Chaining.pptx
 

Recently uploaded

Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 

Recently uploaded (20)

Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

  • 1. Faster R-CNN By; Deep Learning Team
  • 4. History(?) of R-CNN • Rich feature hierarchies for accurate object detection and semantic segmentation(2013) • Fast R-CNN(2015) • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks(2015) • Mask R-CNN(2017)
  • 5. Is Faster R-CNN Really Fast? • Generally R-FCN and SSD models are faster on average while Faster R- CNN models are more accurate • Faster R-CNN models can be faster if we limit the number of regions proposed
  • 8. Region Proposals – Selective Search • Bottom-up segmentation, merging regions at multiple scales Convert regions to boxes
  • 9. R-CNN Training • Pre-train a ConvNet(AlexNet) for ImageNet classification dataset • Fine-tune for object detection(softmax + log loss) • Cache feature vectors to disk • Train post hoc linear SVMs(hinge loss) • Train post hoc linear bounding-box regressors(squared loss) “Post hoc” means the parameters are learned after the ConvNet is fixed
  • 10. Bounding-Box Regression specifies the pixel coordinates of the center of proposal Pi’s bounding box together with Pi’s width and height in pixels means the ground-truth bounding box
  • 12. Problems of R-CNN • Slow at test-time: need to run full forward path of CNN for each region proposal  13s/image on a GPU(K40)  53s/image on a CPU • SVM and regressors are post-hoc: CNN features not updated in response to SVMs and regressors • Complex multistage training pipeline (84 hours using K40 GPU)  Fine-tune network with softmax classifier(log loss)  Train post-hoc linear SVMs(hinge loss)  Train post-hoc bounding-box regressions(squared loss)
  • 13. Fast R-CNN • Fix most of what’s wrong with R-CNN and SPP-net • Train the detector in a single stage, end-to-end  No caching features to disk  No post hoc training steps • Train all layers of the network
  • 17. RoI Pooling RoI in Conv feature map : 21x14  3x2 max pooling with stride(3, 2)  output : 7x7 RoI in Conv feature map : 35x42  5x6 max pooling with stride(5, 6)  output : 7x7 VGG-16
  • 18. Training & Testing 1. Takes an input and a set of bounding boxes 2. Generate convolutional feature maps 3. For each bbox, get a fixed-length feature vector from RoI pooling layer 4. Outputs have two information  K+1 class labels  Bounding box locations • Loss function
  • 19. R-CNN vs SPP-net vs Fast R-CNN Runtime dominated by region proposals!
  • 20. Problems of Fast R-CNN • Out-of-network region proposals are the test-time computational bottleneck • Is it fast enough??
  • 21. Faster R-CNN(RPN + Fast R-CNN) • Insert a Region Proposal Network (RPN) after the last convolutional layer  using GPU! • RPN trained to produce region proposals directly; no need for external region proposals • After RPN, use RoI Pooling and an upstream classifier and bbox regressor just like Fast R-CNN
  • 22. Training Goal : Share Features
  • 23. 3 x 3 RPN • Slide a small window on the feature map • Build a small network for  Classifying object or not-object  Regressing bbox locations • Position of the sliding window provides localization information with reference to the image • Box regression provides finer localization information with reference to this sliding window ZF : 256-d, VGG : 512-d
  • 24. RPN • Use k anchor boxes at each location • Anchors are translation invariant: use the same ones at every location • Regression gives offsets from anchor boxes • Classification gives the probability that each (regressed) anchor shows an object
  • 25. 3 x 3 RPN(Fully Convolutional Network) • Intermediate Layer – 256(or 512) 3x3 filter, stride 1, padding 1 • Cls layer – 18(9x2) 1x1 filter, stride 1, padding 0 • Reg layer – 36(9x4) 1x1 filter, stride 1, padding 0 ZF : 256-d, VGG : 512-d
  • 26. Anchors as references • Anchors: pre-defined reference boxes • Multi-scale/size anchors:  Multiple anchors are used at each position: 3 scale(128x128, 256x256, 512x512) and 3 aspect rations(2:1, 1:1, 1:2) yield 9 anchors  Each anchor has its own prediction function  Single-scale features, multi-scale predictions
  • 27. Positive/Negative Samples • An anchor is labeled as positive if  The anchor is the one with highest IoU overlap with a ground-truth box  The anchor has an IoU overlap with a ground-truth box higher than 0.7 • Negative labels are assigned to anchors with IoU lower than 0.3 for all ground-truth boxes • 50%/50% ratio of positive/negative anchors in a minibatch
  • 34. Is It Enough? • RoI Pooling has some quantization operations • These quantizations introduce misalignments between the RoI and the extracted features • While this may not impact classification, it can make a negative effect on predicting bbox
  • 36. Mask R-CNN • Mask R-CNN extends Faster R-CNN by adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression
  • 37. Loss Function, Mask Branch • The mask branch has a K x m x m - dimensional output for each RoI, which encodes K binary masks of resolution m × m, one for each of the K classes. • Applying per-pixel sigmoid • For an RoI associated with ground-truth class k, Lmask is only defined on the k-th mask
  • 38. RoI Align • RoI Align don’t use quantization of the RoI boundaries • Bilinear interpolation is used for computing the exact values of the input features