PR057 Mask R-CNN

Fast R-CNN
• RoI from Selective Search
• RoI Pooling → Fixed Size Representation
• Heads: Classification, Bbox Regression
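A minimal sketch of the RoI Pooling step, assuming a single-channel NumPy feature map and an RoI already scaled to feature-map coordinates. The name `roi_pool` and the floor/ceil bin-boundary rule are illustrative choices, not taken from the slides; the point is that RoIs of any size map to the same fixed 7x7 output.

```python
import numpy as np

def roi_pool(feature, roi, output_size=7):
    """Max-pool one RoI of a 2D feature map into an output_size x output_size grid.

    feature: (H, W) array (single channel for brevity).
    roi: (x1, y1, x2, y2) in feature-map coordinates; floats are allowed.
    The RoI is rounded to integer cells first, as RoI Pooling does.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in roi)   # first quantization: RoI corners
    region = feature[y1:y2 + 1, x1:x2 + 1]
    h, w = region.shape
    out = np.zeros((output_size, output_size), dtype=feature.dtype)
    for i in range(output_size):
        # second quantization: integer bin boundaries (each bin kept non-empty)
        ys = int(np.floor(i * h / output_size))
        ye = max(int(np.ceil((i + 1) * h / output_size)), ys + 1)
        for j in range(output_size):
            xs = int(np.floor(j * w / output_size))
            xe = max(int(np.ceil((j + 1) * w / output_size)), xs + 1)
            out[i, j] = region[ys:ye, xs:xe].max()
    return out
```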

Faster R-CNN
• RPN (Region Proposal Network): Objectness + Bbox Regression
• RoI Pooling
• Heads: Classification, Bbox Regression
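The RPN's Bbox Regression outputs offsets relative to reference (anchor) boxes rather than raw coordinates. A sketch of decoding those offsets with the center/log-size parameterization used in the R-CNN family; `decode_boxes` is a hypothetical helper, and boxes are assumed to be (x1, y1, x2, y2) rows of a NumPy array.

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Apply (tx, ty, tw, th) regression deltas to anchors given as (x1, y1, x2, y2).

    Center/log-size parameterization:
      x = tx * wa + xa,  w = wa * exp(tw)   (and likewise for y, h)
    """
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa          # anchor centers
    ya = anchors[:, 1] + 0.5 * ha
    tx, ty, tw, th = deltas.T
    x = tx * wa + xa                       # shifted centers
    y = ty * ha + ya
    w = wa * np.exp(tw)                    # rescaled sizes
    h = ha * np.exp(th)
    return np.stack([x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h], axis=1)
```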

FCN (32x32x3 input example)
• Conv1 + Pool1 → 16x16x64
• Conv2 + Pool2 → 8x8x128
• Conv3 + Pool3 → 4x4x256
• Conv4 + Pool4 → 2x2x512
• Conv5 + Pool5 → 1x1x512
• 1x1x512 Conv → 1x1 Heatmap
• x32 Upsample → Softmax
• Remove Pooling
• 1x1 Conv for Heatmap Output
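The last steps (1x1 Conv to a heatmap, x32 upsampling, per-pixel softmax) can be sketched in NumPy as below. This is an illustration under assumptions: `fcn_head` is a hypothetical name, K = 21 classes in the test is only an example, and nearest-neighbour upsampling via `np.repeat` stands in for the learned or bilinear upsampling an actual FCN uses.

```python
import numpy as np

def fcn_head(features, weights, bias, scale=32):
    """1x1 conv -> class heatmap -> x`scale` upsampling -> per-pixel softmax.

    features: (H, W, C) feature map, e.g. the 1x1x512 output of Pool5.
    weights: (C, K) 1x1-conv kernel for K classes; bias: (K,).
    """
    heatmap = features @ weights + bias   # (H, W, K): a 1x1 conv is a per-pixel matmul
    # nearest-neighbour upsampling (assumption, see lead-in)
    up = np.repeat(np.repeat(heatmap, scale, axis=0), scale, axis=1)
    up = up - up.max(axis=-1, keepdims=True)          # numerical stability
    probs = np.exp(up)
    return probs / probs.sum(axis=-1, keepdims=True)  # softmax over classes at each pixel
```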

Slide from Mask R-CNN Tutorial, K. He, ICCV 2017

[Figure: example image labeled per object: Human, Dog, and several Sheep]

• Faster R-CNN (BBox + Classification): Can Separate instances, Cannot Segment
• FCN (Segmentation + Classification): Cannot Separate instances, Can Segment
• Faster R-CNN + FCN = FCN on BBox! (Segmentation in BBox + Classification)

FCN
• Pixel-level Classification
• Per Pixel Softmax (Multinomial)
• Multi Instance

FCN
• Per Pixel Softmax → Sigmoid (Binary)
• Multi Instance
Faster R-CNN
• Classification
• Instance Level RoI
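The difference between the two per-pixel outputs shows up on a toy example: softmax normalizes across the K class channels so classes compete at every pixel, while a sigmoid scores each class mask independently. A minimal NumPy sketch, assuming logits laid out as (K, m, m); the function names are illustrative.

```python
import numpy as np

def per_pixel_softmax(logits):
    """Multinomial: classes compete at each pixel; probabilities over K sum to 1."""
    z = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def per_pixel_sigmoid(logits):
    """Binary: each of the K masks is an independent foreground probability."""
    return 1.0 / (1.0 + np.exp(-logits))
```

With two equally confident classes, softmax splits the probability (0.5 each), whereas the sigmoid can give both masks a high foreground probability, so per-class binary masks do not compete with each other.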

DB (training data): BBox + Class + Mask labels
$L = L_{cls} + L_{box} + L_{mask}$
$L_{cls}$: Softmax Cross Entropy
$L_{box}$: Regression
$L_{mask}$: Binary Cross Entropy
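A per-RoI version of the multi-task loss, as a NumPy sketch. Assumptions: the slide only says "Regression" for $L_{box}$, so the smooth L1 form from Fast R-CNN is used here; $L_{mask}$ is taken as the mean per-pixel binary cross-entropy on a single m x m mask; all function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(class_logits, gt_class):
    """L_cls: negative log-softmax probability of the GT class."""
    z = class_logits - class_logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[gt_class]

def smooth_l1(box_pred, box_target):
    """L_box (assumed smooth L1): quadratic below 1, linear above."""
    d = np.abs(box_pred - box_target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def binary_cross_entropy(mask_logits, mask_target):
    """L_mask: mean per-pixel BCE between sigmoid(mask_logits) and a 0/1 target."""
    p = np.clip(1.0 / (1.0 + np.exp(-mask_logits)), 1e-7, 1 - 1e-7)
    return -np.mean(mask_target * np.log(p) + (1 - mask_target) * np.log(1 - p))

def mask_rcnn_loss(class_logits, gt_class, box_pred, box_target, mask_logits, mask_target):
    """L = L_cls + L_box + L_mask for one RoI (mask_logits: a single m x m map)."""
    return (softmax_cross_entropy(class_logits, gt_class)
            + smooth_l1(box_pred, box_target)
            + binary_cross_entropy(mask_logits, mask_target))
```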

Training Phase
Not $L_{mask} = L_{c_1} + L_{c_2} + \cdots + L_{c_k}$, but only the GT-class term: $L_{mask} = L_{c_3}$ if the GT class is 3.
The Mask Branch only learns how to mask, independent of the class.
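The "only the GT-class mask counts" rule can be sketched as follows, assuming the mask branch emits K mask logit maps of size m x m. `mask_loss` is a hypothetical helper; it uses the numerically stable binary cross-entropy-with-logits form, and the other K - 1 masks never enter the loss for this RoI.

```python
import numpy as np

def mask_loss(mask_logits, gt_class, mask_target):
    """Mean per-pixel binary cross-entropy on the GT class's mask only.

    mask_logits: (K, m, m), one m x m mask per class.
    mask_target: (m, m) binary ground-truth mask for this RoI.
    """
    x = mask_logits[gt_class]  # select L_{c_k} for k = GT class; other channels ignored
    # stable BCE with logits: max(x, 0) - x * y + log(1 + exp(-|x|))
    per_pixel = np.maximum(x, 0) - x * mask_target + np.log1p(np.exp(-np.abs(x)))
    return per_pixel.mean()
```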

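Test Phase
The mask branch predicts a mask for every class:
• Predicts Human Mask
• Predicts Car Mask
• Predicts Horse Mask
• Predicts ...
Winner Takes All: only the mask of the class chosen by the classification branch is kept.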
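At test time the selection step is a simple lookup. A sketch assuming the classification logits and the K per-class mask logits for one RoI are NumPy arrays; `select_mask` is an illustrative name.

```python
import numpy as np

def select_mask(class_logits, mask_logits):
    """Winner takes all: keep only the mask of the predicted class.

    class_logits: (K,) scores from the classification branch.
    mask_logits:  (K, m, m) per-class mask logits from the mask branch.
    Returns the predicted class index and that class's mask probabilities.
    """
    k = int(np.argmax(class_logits))
    probs = 1.0 / (1.0 + np.exp(-mask_logits[k]))  # per-pixel sigmoid
    return k, probs
```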

Slides from Mask R-CNN Tutorial, K. He, ICCV 2017; Faster R-CNN, S. Ren, NIPS 2015

Mask head (figure)
• Deconv 2x2, stride 2 (x2)
• 3x3 Conv, 4 Layers
• 1x1 Conv (x2)
Slide from Mask R-CNN Tutorial, K. He, ICCV 2017
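A 2x2, stride-2 Deconv (transposed convolution) exactly doubles the spatial size: the kernel size equals the stride, so output patches never overlap. A NumPy sketch under that assumption; the (C_in, C_out, 2, 2) weight layout and the name `deconv2x2_stride2` are illustrative choices.

```python
import numpy as np

def deconv2x2_stride2(x, weight, bias):
    """Transposed convolution with a 2x2 kernel, stride 2, no padding.

    x: (H, W, C_in); weight: (C_in, C_out, 2, 2); bias: (C_out,).
    Each input pixel (i, j) writes its own 2x2 block at rows 2i:2i+2, cols 2j:2j+2,
    so the output size is (H - 1) * 2 + 2 = 2H.
    """
    H, W, _ = x.shape
    c_out = weight.shape[1]
    # blocks[i, a, j, b, o] = sum_c x[i, j, c] * weight[c, o, a, b]
    blocks = np.einsum("ijc,coab->iajbo", x, weight)
    # merging (i, a) -> row 2i + a and (j, b) -> col 2j + b
    return blocks.reshape(2 * H, 2 * W, c_out) + bias
```

For example, a 14x14x256 input becomes a 28x28 output; the following 1x1 Conv would then map the channels to the mask outputs.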

RoI Pooling → 7x7 Pooled Feature → Bbox Regression, Classification

RoI Pooling (Fast R-CNN)
• Input: Each RoI
• Output: 7x7 Pooled Feature
RoI Align (Mask R-CNN)
• Input: Each RoI
• Output: 7x7 Pooled Feature

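Note: RoI predictions from the Region Proposal Network are floating-point coordinates. RoI Pooling rounds them to integer feature-map cells, so the pooled feature is slightly misaligned with the RoI.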
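How far a quantized RoI drifts is simple arithmetic. A sketch with example values (a coordinate at 665 px on a stride-32 feature map; both numbers are assumptions chosen for illustration), using round-to-nearest as the quantization rule; implementations differ in whether they round or floor. `quantize_coordinate` is a hypothetical helper.

```python
def quantize_coordinate(coord_px, stride):
    """Map an image-space RoI coordinate onto the feature map and snap it to a cell.

    Returns the exact feature-map coordinate, the snapped integer cell, and the
    resulting misalignment measured back in image pixels.
    """
    exact = coord_px / stride
    snapped = round(exact)  # RoI Pooling-style quantization (rounding rule assumed)
    misalignment_px = abs(snapped - exact) * stride
    return exact, snapped, misalignment_px

exact, snapped, err = quantize_coordinate(665, 32)
```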

RoI Align: 2x2 Subcells (sampling points) per bin for precision

Subcell value (bilinear interpolation from the four neighbouring feature-map cells) = 0.15 + 0.25 + 0.25 + 0.35
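The weighted-sum form of a subcell value corresponds to bilinear interpolation. A minimal sketch, assuming feature values sit at integer (y, x) positions of a 2D NumPy array and the query point lies inside the map; `bilinear` is an illustrative name.

```python
import numpy as np

def bilinear(feature, y, x):
    """Sample a 2D feature map at a fractional (y, x) location.

    The result is a weighted sum of the four surrounding cells; the four
    weights are non-negative and sum to 1.
    """
    H, W = feature.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)  # stay inside the map at the border
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feature[y0, x0]
            + (1 - dy) * dx * feature[y0, x1]
            + dy * (1 - dx) * feature[y1, x0]
            + dy * dx * feature[y1, x1])
```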

2x2 Subcell Max Pooling → one value per bin
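Putting the pieces together, a single-channel RoI Align sketch in NumPy: the RoI stays in floating point, each of the 7x7 bins gets 2x2 bilinearly sampled subcells, and the subcells are max-pooled as on the slide. The names, the subcell-centre sampling positions, and the clamping at the map border are assumptions for illustration.

```python
import numpy as np

def bilinear(feature, y, x):
    """Bilinear sample at fractional (y, x); the point is clamped into the map."""
    H, W = feature.shape
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feature[y0, x0] + (1 - dy) * dx * feature[y0, x1]
            + dy * (1 - dx) * feature[y1, x0] + dy * dx * feature[y1, x1])

def roi_align(feature, roi, output_size=7, samples=2):
    """RoI Align on a single-channel feature map.

    roi: (x1, y1, x2, y2) in feature-map coordinates, kept as floats (no rounding).
    Each output bin is split into samples x samples subcells, the feature is
    bilinearly sampled at each subcell centre, and the samples are max-pooled.
    """
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / output_size
    bin_w = (x2 - x1) / output_size
    out = np.empty((output_size, output_size))
    for i in range(output_size):
        for j in range(output_size):
            vals = []
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    vals.append(bilinear(feature, y, x))
            out[i, j] = max(vals)  # 2x2 subcell max pooling
    return out
```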

Mask R-CNN
• RPN: Objectness + Bbox Regression
• RoI Align
• Heads: Classification, Bbox Regression, Binary Mask
• Paste Back
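Paste Back can be sketched as resizing the small per-RoI mask to its box in the image, thresholding it, and writing it into a full-size canvas. Assumptions: the box is integer pixel coordinates inside the image, nearest-neighbour resizing stands in for whatever interpolation a real implementation uses, and 0.5 is the binarization threshold; `paste_mask` is a hypothetical name.

```python
import numpy as np

def paste_mask(mask_probs, box, image_shape, threshold=0.5):
    """Resize an m x m mask to its box, binarize it, and paste it into an image-size mask.

    box: integer (x1, y1, x2, y2) in image pixels, with x2 / y2 exclusive.
    """
    x1, y1, x2, y2 = box
    m = mask_probs.shape[0]
    h, w = y2 - y1, x2 - x1
    rows = np.arange(h) * m // h            # nearest source row for each box row
    cols = np.arange(w) * m // w            # nearest source column for each box column
    resized = mask_probs[rows[:, None], cols[None, :]]
    full = np.zeros(image_shape, dtype=bool)
    full[y1:y2, x1:x2] = resized >= threshold
    return full
```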

• Faster R-CNN + ResNet
Deep Residual Learning for Image Recognition, K. He, CVPR 2016
• Faster R-CNN + FPN
Feature Pyramid Networks for Object Detection, T.-Y. Lin, CVPR 2017

Mask R-CNN = Faster R-CNN + Binary Mask Prediction + FCN + RoI Align

Detection Performance Improvement

Uploaded by Taeoh Kim
