Mask RCNN
RCNN
Graph-based segmentation is used to generate the initial candidate regions.
The Selective Search algorithm generates about 2000 region proposals by
combining smaller regions into larger ones.
Each of the 2000 proposals is warped to a fixed size and fed into a CNN
that outputs a 4096-dimensional feature vector.
One SVM per class classifies whether that object is present in a
proposal.
Bounding boxes are generated for each region that contains an object.
Nivedit, Mitul, Rajat (IITJ) Project SHRINGAR(Learning Outcomes) December 2020 - January 2021
Fast-RCNN
RCNN runs the CNN on each of the 2000 region proposals per image, which
is both time-consuming and wasteful.
Fast RCNN reduces this time by feeding the whole input image to the CNN
once, and then mapping the proposed regions onto the convolutional
feature map.
The Regions of Interest (RoIs) that are identified are then warped into
squares and passed through an RoI pooling layer, which reshapes them
into a fixed size.
The pooled RoIs are fed into fully connected layers, after which a
softmax layer performs classification and a linear regressor predicts
the bounding-box offset values.
The entire network is trained using Log Loss (for classification) +
Smooth L1 Loss (for bounding-box regression).
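To make the pooling step concrete, here is a minimal NumPy sketch of RoI max pooling on a single-channel feature map. The function name `roi_max_pool`, the 2 x 2 output size and the integer, end-exclusive RoI convention are assumptions chosen for the example, not details taken from the slides.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a (H, W) feature map to a fixed output size.

    roi is (y0, x0, y1, x1) in feature-map coordinates, end-exclusive.
    """
    y0, x0, y1, x1 = roi
    out_h, out_w = output_size
    region = feature_map[y0:y1, x0:x1]
    # Split the region into out_h x out_w roughly equal bins.
    y_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    x_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Make every bin at least one cell wide so max() never sees an empty slice.
            ys, ye = y_edges[i], max(y_edges[i + 1], y_edges[i] + 1)
            xs, xe = x_edges[j], max(x_edges[j + 1], x_edges[j] + 1)
            pooled[i, j] = region[ys:ye, xs:xe].max()
    return pooled

feature_map = np.arange(36).reshape(6, 6)
pooled = roi_max_pool(feature_map, roi=(1, 1, 5, 5))
print(pooled)  # [[14 16], [26 28]]
```

Whatever the size of the RoI, the output has the same shape, which is what lets RoIs of different sizes share the same fully connected layers.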
Faster RCNN
Region proposal algorithms such as Selective Search are heuristic and
slow, and hence form a bottleneck during training and testing.
Faster RCNN instead uses a Region Proposal Network (RPN) to propose
regions directly from the convolutional feature map.
The RPN uses anchor boxes of different scales and aspect ratios to
account for the different sizes and shapes of the objects in an image.
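A small sketch of how anchors at one location could be generated follows. The default scales (128, 256, 512) and aspect ratios (1:2, 1:1, 2:1) match the values commonly quoted for Faster RCNN; the function name `make_anchors` and the corner-coordinate box format are assumptions made for the example.

```python
import numpy as np

def make_anchors(center, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return a (len(scales) * len(ratios), 4) array of anchor boxes.

    Boxes are (x0, y0, x1, y1), all centred on `center` = (cx, cy).
    Each anchor has area scale**2 and height / width equal to `ratio`.
    """
    cx, cy = center
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

anchors = make_anchors((300.0, 200.0))
print(anchors.shape)  # (9, 4)
```

In the full network this set would be repeated at every position of the feature map, and the RPN would predict an objectness score and box offsets for each anchor.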
Mask RCNN
An extension of Faster RCNN that also predicts a segmentation mask for
each RoI.
The mask branch has a $K m^2$-dimensional output for each RoI, which
encodes K binary masks of resolution $m \times m$, one for each of the
K classes.
Masks are predicted from the RoI-pooled feature map and must stay
aligned with the input RoI. The RoIAlign method is therefore used: it
avoids quantizing the RoI coordinates and uses bilinear interpolation
to sample the input feature map at exact locations.
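The bilinear interpolation behind RoIAlign can be sketched as follows. This is a simplified version under stated assumptions: it works on a single-channel feature map and takes one sample at the centre of each output bin, whereas practical implementations usually take several samples per bin and aggregate them. The names `bilinear_sample` and `roi_align` are chosen for the example.

```python
import numpy as np

def bilinear_sample(feature_map, y, x):
    """Sample a (H, W) feature map at a real-valued location (y, x)."""
    h, w = feature_map.shape
    # Clamp to the valid range so the four neighbours always exist.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, x1]
    bottom = (1 - dx) * feature_map[y1, x0] + dx * feature_map[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature_map, roi, output_size=(2, 2)):
    """Pool a real-valued RoI (y0, x0, y1, x1) without rounding its coordinates.

    Simplification: one bilinear sample at the centre of each output bin.
    """
    y0, x0, y1, x1 = roi
    out_h, out_w = output_size
    bin_h = (y1 - y0) / out_h
    bin_w = (x1 - x0) / out_w
    out = np.empty(output_size, dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = bilinear_sample(
                feature_map, y0 + (i + 0.5) * bin_h, x0 + (j + 0.5) * bin_w
            )
    return out

feature_map = np.arange(36, dtype=float).reshape(6, 6)
aligned = roi_align(feature_map, roi=(1.0, 1.0, 4.0, 4.0))
```

Because the bin centres here fall between grid cells (for example at y = 1.75), the output values are weighted averages of neighbouring cells rather than values snapped to the nearest cell, which is the misalignment RoI pooling would introduce.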
Mask RCNN - Loss
The Mask RCNN loss function is a multi-task loss, since it combines the
prediction losses for classes, bounding boxes and segmentation masks.
Thus it can be represented as $L = L_{cls} + L_{box} + L_{mask}$, where
$L_{cls}$: the classification log loss (softmax cross-entropy over the
classes), the same as in Fast RCNN.
$L_{box}$: the smooth L1 loss, which is used as the bounding-box
regression loss.
$L_{mask}$: the average per-pixel binary cross-entropy loss on the
predicted binary masks; of the K class masks, only the mask for the
ground-truth class contributes.
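As a rough illustration of how the three terms combine for a single RoI, consider the NumPy sketch below. It takes the classification term to be the log loss over class probabilities, as on the Fast RCNN slide, and lets only the ground-truth class mask contribute to the mask term. All function names and input values are hypothetical.

```python
import numpy as np

def log_loss(class_probs, true_class):
    """Classification loss: negative log-probability of the true class."""
    return -np.log(class_probs[true_class])

def smooth_l1(pred, target):
    """Smooth L1 loss summed over the box coordinates."""
    diff = np.abs(pred - target)
    return np.sum(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5))

def mask_bce(mask_probs, true_mask, true_class):
    """Average per-pixel binary cross-entropy on the true-class mask.

    mask_probs: (K, m, m) sigmoid outputs, one mask per class.
    true_mask:  (m, m) array of 0s and 1s.
    """
    p = np.clip(mask_probs[true_class], 1e-7, 1 - 1e-7)
    return -np.mean(true_mask * np.log(p) + (1 - true_mask) * np.log(1 - p))

def mask_rcnn_loss(class_probs, true_class, box_pred, box_target,
                   mask_probs, true_mask):
    """L = L_cls + L_box + L_mask for one RoI."""
    return (log_loss(class_probs, true_class)
            + smooth_l1(box_pred, box_target)
            + mask_bce(mask_probs, true_mask, true_class))

# Hypothetical values for one RoI with K = 3 classes and m = 2.
class_probs = np.array([0.1, 0.7, 0.2])
box_pred = np.array([0.1, -0.2, 0.05, 1.5])
box_target = np.zeros(4)
mask_probs = np.full((3, 2, 2), 0.5)
true_mask = np.array([[1.0, 0.0], [0.0, 1.0]])

total = mask_rcnn_loss(class_probs, 1, box_pred, box_target, mask_probs, true_mask)
```

Using only the true-class mask means the K masks do not compete with each other; the class decision is left to the classification branch.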
Generative Models
Idea
So far we have been working at the pixel level, which may not be ideal.
For example, a classifier for humans vs. no humans does not look at
every pixel directly; it uses the pixels to extract features and then
makes its prediction from those features.