Object Detection Using R-CNN Deep Learning Framework
Nader Karimi Bavandpour (nader.karimi.b@gmail.com)
Summer School of Intelligent Learning
IPM, 2019
Table of Contents
● Machine Learning Key Point: Inductive Bias
● From Classification to Instance Segmentation
● Region Proposal
● R-CNN Framework
Machine Learning Key Point: Inductive Bias
DeïŹnition of Inductive Bias
The kind of necessary assumptions about the nature of the target function are subsumed in the phrase
inductive bias.
- Wikipedia
Every machine learning algorithm with any ability to generalize beyond the training data that it sees has
some type of inductive bias.
- StackOverflow
4
Examples of Inductive Bias
● Maximum Margin: When drawing a boundary between two classes, maximize the width of the margin around it
● Nearest Neighbors: Most of the cases in a small neighborhood in feature space belong to the same class
● Minimum Cross-Validation Error: Select the hypothesis with the lowest cross-validation error
○ Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased.
● Locality of Receptive Field: Assume nearby pixels are strongly related; use convolutional layers instead of fully connected (FC) layers
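To make the locality bias concrete, the arithmetic below compares parameter counts for one layer that maps a 32×32×3 input to a 32×32×16 output. The sizes and the 3×3 kernel are illustrative assumptions; the point is that a convolution connects each output only to a small neighborhood and shares those weights across all positions.

```python
def fc_params(in_h, in_w, in_c, out_h, out_w, out_c):
    # Fully connected: every output unit sees every input unit, plus one bias per output.
    n_in = in_h * in_w * in_c
    n_out = out_h * out_w * out_c
    return n_in * n_out + n_out


def conv_params(kernel, in_c, out_c):
    # Convolution: one kernel x kernel x in_c filter plus a bias per output channel,
    # shared across all spatial positions (locality + weight sharing).
    return kernel * kernel * in_c * out_c + out_c


fc = fc_params(32, 32, 3, 32, 32, 16)
conv = conv_params(3, 3, 16)
print(fc, conv)  # 50348032 448
```

Both layers produce an output of the same shape, but the convolutional one hard-codes the assumption that only nearby pixels matter, which is exactly the inductive bias named above.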
From ClassiïŹcation to
Instance Segmentation
6
Object ClassiïŹcation
7
● Image Category Recognition
● Input: image
● Output: Class label
● Types:
○ Binary Classification
○ Multi-class Classification
○ Multi-label Classification
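A minimal sketch of how these classification types differ at the output, assuming a network that returns one raw score (logit) per class: multi-class picks exactly one label (softmax + argmax), multi-label decides each class independently (sigmoid + threshold), and binary classification is the single-sigmoid case. The function names and the 0.5 threshold are illustrative choices, not a fixed convention.

```python
import numpy as np


def multiclass_predict(logits):
    # Multi-class: exactly one label per image, so normalize with softmax and take argmax.
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = exp / exp.sum()
    return int(np.argmax(probs)), probs


def multilabel_predict(logits, threshold=0.5):
    # Multi-label: each class is present or absent independently, so use one sigmoid per class.
    # Binary classification is the special case with a single logit.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return [i for i, p in enumerate(probs) if p >= threshold], probs


logits = np.array([2.0, -1.0, 0.5])
print(multiclass_predict(logits)[0])  # 0
print(multilabel_predict(logits)[0])  # [0, 2]
```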
Object Localization
● Object Bounding Box Recognition
● Input: image
● Output: Box in the image (x, y, w, h)
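The (x, y, w, h) output is one of several box conventions. A small sketch of converting between it and corner format (x1, y1, x2, y2), assuming (x, y) is the top-left corner; some datasets and libraries use the box center instead, so it is worth checking which convention applies.

```python
def xywh_to_xyxy(box):
    # (x, y) assumed to be the top-left corner.
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def center_xywh_to_xyxy(box):
    # Alternative convention: (cx, cy) is the box center.
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


print(xywh_to_xyxy((10, 20, 30, 40)))         # (10, 20, 40, 60)
print(center_xywh_to_xyxy((25, 40, 30, 40)))  # (10.0, 20.0, 40.0, 60.0)
```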
Semantic Segmentation
● Pixel Category Recognition
● Input: Image
● Output: Category-aware pixel labels
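A minimal sketch of how category-aware pixel labels are often produced, assuming the model outputs a score map of shape (C, H, W), one score per class per pixel: each pixel simply takes the class with the highest score.

```python
import numpy as np


def semantic_labels(class_scores):
    # class_scores: (C, H, W) array of per-pixel class scores.
    # Returns an (H, W) map with the index of the best-scoring class at each pixel.
    return np.argmax(class_scores, axis=0)


scores = np.zeros((3, 2, 2))  # 3 classes on a 2x2 image; ties go to class 0
scores[2, 0, 0] = 1.0
scores[1, 1, 1] = 1.0
print(semantic_labels(scores))
```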
Instance Segmentation
● Instance-Aware Pixel Category Recognition
● Input: Image
● Output: Instance-aware pixel labels
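The difference between semantic and instance labels shows up on a toy image containing two objects of the same class. This sketch assumes each instance is given as a boolean mask plus a class index, with 0 reserved for background; where masks overlap, the later instance wins.

```python
import numpy as np


def label_maps(instance_masks, instance_classes, shape):
    # semantic: class index per pixel (0 = background); same-class objects share a label.
    # instance: instance id per pixel (0 = no object); each object gets its own id.
    semantic = np.zeros(shape, dtype=int)
    instance = np.zeros(shape, dtype=int)
    for inst_id, (mask, cls) in enumerate(zip(instance_masks, instance_classes), start=1):
        semantic[mask] = cls
        instance[mask] = inst_id  # later instances overwrite earlier ones where masks overlap
    return semantic, instance


# Two objects of the same (hypothetical) class 1, side by side on a 2x4 image.
left = np.zeros((2, 4), dtype=bool)
left[:, :2] = True
right = np.zeros((2, 4), dtype=bool)
right[:, 2:] = True
semantic, instance = label_maps([left, right], [1, 1], (2, 4))
print(semantic)  # all 1: the two objects are indistinguishable
print(instance)  # 1 on the left half, 2 on the right half
```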
Intersection Over Union (IoU)
Important metric for object localization: the area of overlap between a predicted box and a ground-truth box, divided by the area of their union
Used in both training and evaluation
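A sketch of IoU for axis-aligned boxes in corner format (x1, y1, x2, y2), assuming continuous coordinates (some benchmarks add 1 to widths and heights for pixel-inclusive boxes, which this ignores).

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> 0.142857...
```

In evaluation, IoU is compared against a threshold (for example 50% in the ImageNet rule) to decide whether a predicted box counts as correct.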
Datasets: ImageNet Challenge
● 1000 Classes
● Each image is labeled with one class and at least one bounding box
● About 800 Training images per class
● Algorithm produces 5 (class + bounding box) guesses
● Correct if at least one guess has the correct class and a bounding box with at least 50% intersection over union with the ground truth
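The correctness rule can be written as a small check. This sketch assumes guesses are (class, box) pairs in corner format, ranked by the algorithm, and uses a continuous-coordinate IoU; it encodes only the rule as stated on the slide, not the official evaluation code.

```python
def iou(a, b):
    # Continuous-coordinate IoU for (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def localization_correct(guesses, gt_class, gt_boxes, max_guesses=5, iou_threshold=0.5):
    # Only the first max_guesses count; one guess matching class and box (IoU >= 0.5) suffices.
    for cls, box in guesses[:max_guesses]:
        if cls == gt_class and any(iou(box, gt) >= iou_threshold for gt in gt_boxes):
            return True
    return False


guesses = [("dog", (0, 0, 10, 10)), ("cat", (1, 1, 11, 11))]
print(localization_correct(guesses, "cat", [(0, 0, 10, 10)]))    # True (IoU = 81/119)
print(localization_correct(guesses, "cat", [(20, 20, 30, 30)]))  # False (no overlap)
```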
Region Proposal
Selective Search for Region Proposal
● A region proposal algorithm used in object detection
● Designed to be fast with a very high recall
● Computes a hierarchical grouping of similar regions based on color, texture, size, and shape compatibility
● First takes an image as input
● Generates initial sub-segmentations
● Combines the similar regions to form a larger region
○ based on color similarity, texture similarity, size similarity, and shape compatibility
● Finally, these regions produce the Regions of Interest (RoI)
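The grouping step can be sketched as a greedy loop: repeatedly merge the most similar pair of neighboring regions and keep every region ever formed as a proposal. This simplified version assumes the initial sub-segmentation is given as regions with a bounding box, a pixel count and a normalized color histogram; it uses only color (histogram intersection) and size similarity, skips texture and shape compatibility, and treats regions whose boxes touch as neighbors. All names are hypothetical.

```python
from itertools import combinations


def color_similarity(r1, r2):
    # Histogram intersection of normalized color histograms (1.0 = identical).
    return sum(min(a, b) for a, b in zip(r1["hist"], r2["hist"]))


def size_similarity(r1, r2, image_size):
    # Close to 1 for small regions, so small regions tend to merge first.
    return 1.0 - (r1["size"] + r2["size"]) / image_size


def are_neighbors(r1, r2):
    # Simplification: regions whose boxes touch or overlap count as neighbors.
    ax1, ay1, ax2, ay2 = r1["box"]
    bx1, by1, bx2, by2 = r2["box"]
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def merge(r1, r2):
    size = r1["size"] + r2["size"]
    box = (min(r1["box"][0], r2["box"][0]), min(r1["box"][1], r2["box"][1]),
           max(r1["box"][2], r2["box"][2]), max(r1["box"][3], r2["box"][3]))
    # Size-weighted average keeps the merged histogram normalized.
    hist = [(a * r1["size"] + b * r2["size"]) / size for a, b in zip(r1["hist"], r2["hist"])]
    return {"box": box, "size": size, "hist": hist}


def hierarchical_grouping(regions, image_size):
    regions = list(regions)
    proposals = [r["box"] for r in regions]  # regions at every level are proposals
    while len(regions) > 1:
        pairs = [(i, j) for i, j in combinations(range(len(regions)), 2)
                 if are_neighbors(regions[i], regions[j])]
        if not pairs:
            break
        i, j = max(pairs, key=lambda p: color_similarity(regions[p[0]], regions[p[1]])
                   + size_similarity(regions[p[0]], regions[p[1]], image_size))
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals


regions = [
    {"box": (0, 0, 10, 10), "size": 100, "hist": [1.0, 0.0]},
    {"box": (10, 0, 20, 10), "size": 100, "hist": [1.0, 0.0]},
    {"box": (20, 0, 30, 10), "size": 100, "hist": [0.0, 1.0]},
]
print(hierarchical_grouping(regions, image_size=300))
```

Here the two same-colored regions merge first, then the result absorbs the third, so the proposals cover the original regions, the first merge, and the whole image.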
R-CNN Framework
R-CNN Family
● R-CNN: Selective search → Cropped Image → CNN
● Fast R-CNN: Selective search → Crop feature map of CNN
● Faster R-CNN: CNN → Region-Proposal Network → Crop feature map of CNN
● Mask R-CNN: Adds Object Mask (Boundary) Prediction to Faster R-CNN
Problems with R-CNN
● Extracting about 2,000 regions per image with selective search
● Extracting features with a CNN for every image region: with N images, the CNN must run N × 2,000 times
● The R-CNN detection pipeline relies on three separately trained models:
○ CNN for feature extraction
○ Linear SVM classifier for identifying objects
○ Regression model for tightening the bounding boxes
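The cost in the second bullet comes from running the CNN once per proposal. A minimal sketch of that per-region loop, assuming corner-format integer boxes, a nearest-neighbor warp to a fixed input size (227 here, an assumption) and a hypothetical ToyCNN stand-in that only counts forward passes:

```python
import numpy as np


def warp(crop, size):
    # Nearest-neighbor resize of an (h, w, C) crop to (size, size, C),
    # ignoring the aspect ratio, because the CNN needs a fixed input size.
    h, w = crop.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return crop[rows][:, cols]


class ToyCNN:
    """Hypothetical stand-in for a CNN that only counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x.mean(axis=(0, 1))  # one "feature" per channel


def rcnn_features(image, proposals, cnn, input_size=227):
    # One crop, one warp and one CNN forward pass per proposal.
    features = []
    for x1, y1, x2, y2 in proposals:
        crop = image[y1:y2, x1:x2]
        features.append(cnn(warp(crop, input_size)))
    return np.stack(features)


image = np.random.rand(50, 60, 3)
proposals = [(0, 0, 10, 10), (5, 5, 40, 30), (10, 0, 60, 50)]
cnn = ToyCNN()
features = rcnn_features(image, proposals, cnn)
print(features.shape, cnn.calls)  # (3, 3) 3
```

With N images and about 2,000 proposals each, the same loop makes N × 2,000 forward passes; the SVM classifiers and box regressors then operate on these features as separate models.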
Fast R-CNN
● Still uses selective search as the proposal method for finding the Regions of Interest, which is slow
● Takes around 2 seconds per image to detect objects, much faster than R-CNN
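Fast R-CNN avoids the per-region CNN by running the backbone once and cropping each proposal from the feature map, then pooling the crop to a fixed grid (RoI pooling). A sketch of that crop-and-pool step, assuming a (C, H, W) feature map whose cells each cover `stride` image pixels, corner-format boxes inside the image, and floor/ceil rounding (the exact rounding is an implementation choice):

```python
import math

import numpy as np


def roi_max_pool(feature_map, box, stride, output_size=7):
    # feature_map: (C, H, W) computed once for the whole image.
    # box: (x1, y1, x2, y2) in image pixels; each feature cell covers `stride` pixels.
    c, h, w = feature_map.shape
    x1, y1, x2, y2 = box
    # Project the box onto the feature map (quantized to whole cells).
    fx1, fy1 = math.floor(x1 / stride), math.floor(y1 / stride)
    fx2 = min(max(fx1 + 1, math.ceil(x2 / stride)), w)
    fy2 = min(max(fy1 + 1, math.ceil(y2 / stride)), h)
    region = feature_map[:, fy1:fy2, fx1:fx2]
    rh, rw = region.shape[1:]
    out = np.empty((c, output_size, output_size))
    # Split the crop into an output_size x output_size grid and max-pool each cell.
    for i in range(output_size):
        ys = i * rh // output_size
        ye = max(ys + 1, (i + 1) * rh // output_size)
        for j in range(output_size):
            xs = j * rw // output_size
            xe = max(xs + 1, (j + 1) * rw // output_size)
            out[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out


feature_map = np.arange(64, dtype=float).reshape(1, 8, 8)  # e.g. a 32x32 image at stride 4
print(roi_max_pool(feature_map, (0, 0, 32, 32), stride=4, output_size=2))
```

Every proposal reuses the same feature map, so the backbone runs once per image instead of once per region; each pooled (C, 7, 7) output then feeds the classification and box-regression heads.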
Faster R-CNN
● Replaces selective search with a Region Proposal Network (RPN) for region proposal
○ Input: Image of any size
○ Output: A set of rectangular object proposals, each with an objectness score
○ Related to attention mechanisms: the RPN tells the detector where to look
● Feature maps from the CNN are passed to the Region Proposal Network (RPN)
● At each sliding-window position, the RPN places k anchor boxes of different shapes
● Anchor boxes are fixed reference boxes of several sizes and aspect ratios, placed throughout the image
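A sketch of anchor generation, assuming k = 9 anchors per position from 3 scales (128, 256, 512 pixels) and 3 aspect ratios (0.5, 1, 2), the setting commonly cited for Faster R-CNN, with each sliding-window position mapped to the image-space center of its feature-map cell (the exact center offset is an assumption). Each anchor keeps an area of scale² while its height/width ratio varies.

```python
import numpy as np


def generate_anchors(feat_h, feat_w, stride, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    # Reference shapes centered at (0, 0): one per (scale, ratio), so k = len(scales) * len(ratios).
    base = []
    for s in scales:
        for r in ratios:
            w, h = s / np.sqrt(r), s * np.sqrt(r)  # area stays s * s, h / w = r
            base.append((-w / 2, -h / 2, w / 2, h / 2))
    base = np.array(base)  # (k, 4)
    # Image-space center of every sliding-window position (feature-map cell).
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cxx, cyy = np.meshgrid(cx, cy)  # both (feat_h, feat_w)
    centers = np.stack([cxx, cyy, cxx, cyy], axis=-1).reshape(-1, 1, 4)
    # Broadcast: every position gets all k shapes -> (feat_h * feat_w * k, 4) boxes (x1, y1, x2, y2).
    return (centers + base[None, :, :]).reshape(-1, 4)


anchors = generate_anchors(feat_h=2, feat_w=3, stride=16)
print(anchors.shape)  # (54, 4): 2 * 3 positions x 9 anchors
```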
● For each anchor, the RPN predicts two things:
○ The probability that the anchor contains an object (regardless of which class the object belongs to)
○ Bounding box regression offsets that adjust the anchor to better fit the object
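The regressor predicts offsets relative to the anchor rather than raw coordinates. A sketch of the center/size parameterization used in the R-CNN family (t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a)), assuming corner-format boxes; `encode` builds regression targets and `decode` applies predicted offsets.

```python
import numpy as np


def to_center(box):
    # (x1, y1, x2, y2) -> (center x, center y, width, height)
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode(anchor, gt):
    # Regression target that would move `anchor` onto ground-truth box `gt`.
    xa, ya, wa, ha = to_center(anchor)
    x, y, w, h = to_center(gt)
    return np.array([(x - xa) / wa, (y - ya) / ha, np.log(w / wa), np.log(h / ha)])


def decode(anchor, deltas):
    # Apply predicted offsets (tx, ty, tw, th) to an anchor.
    xa, ya, wa, ha = to_center(anchor)
    tx, ty, tw, th = deltas
    x, y = xa + tx * wa, ya + ty * ha
    w, h = wa * np.exp(tw), ha * np.exp(th)
    return np.array([x - w / 2, y - h / 2, x + w / 2, y + h / 2])


anchor, gt = (0, 0, 100, 50), (10, 5, 90, 65)
deltas = encode(anchor, gt)
print(deltas)                  # offsets (0, 0.2, log 0.8, log 1.2)
print(decode(anchor, deltas))  # recovers gt: [10. 5. 90. 65.]
```

Zero offsets leave the anchor unchanged, and the log-space width/height terms keep decoded sizes positive.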
Mask R-CNN
● Extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition
● Defines a multi-task loss on each sampled RoI as:
L = L_cls + L_box + L_mask
○ L_cls is the classification loss, L_box the bounding-box regression loss, and L_mask the mask loss
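A sketch of the three terms for one foreground RoI, assuming softmax cross-entropy for L_cls, smooth L1 on the ground-truth class's box offsets for L_box, and average per-pixel binary cross-entropy on the ground-truth class's mask for L_mask (as described for Mask R-CNN). Background RoIs, which contribute only L_cls, and batching are left out; all function names are hypothetical.

```python
import numpy as np


def cls_loss(class_logits, gt_class):
    # Softmax cross-entropy over K classes (log-sum-exp for stability).
    z = class_logits - class_logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[gt_class]


def box_loss(pred_deltas, target_deltas):
    # Smooth L1 summed over the 4 offsets (tx, ty, tw, th) of the ground-truth class.
    d = np.abs(pred_deltas - target_deltas)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()


def mask_loss(mask_logits, gt_class, gt_mask):
    # mask_logits: (K, m, m), one mask per class; only the ground-truth class's mask
    # is penalized, with average per-pixel binary cross-entropy.
    p = 1.0 / (1.0 + np.exp(-mask_logits[gt_class]))
    p = np.clip(p, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p)).mean()


def mask_rcnn_loss(class_logits, pred_deltas, mask_logits, gt_class, target_deltas, gt_mask):
    # L = L_cls + L_box + L_mask for one foreground RoI.
    return (cls_loss(class_logits, gt_class)
            + box_loss(pred_deltas, target_deltas)
            + mask_loss(mask_logits, gt_class, gt_mask))


K, m = 3, 4
loss = mask_rcnn_loss(np.zeros(K), np.zeros(4), np.zeros((K, m, m)),
                      gt_class=1, target_deltas=np.zeros(4), gt_mask=np.ones((m, m)))
print(loss)  # log(3) + 0 + log(2) = log(6), about 1.7918
```

Because the mask term only looks at the ground-truth class's mask, classes do not compete for pixels; class selection is left to L_cls.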
Thanks for Your Attention!