R-FCN: Object Detection via Region-based Fully Convolutional Networks
2022/04/19, Changjin Lee
Introduction
[Figure: the dilemma between translation invariance (wanted for classification) and translation variance (wanted for detection), illustrated with a dog image]
● Two-stage object detection networks consist of two subnetworks
○ A shared, fully convolutional subnetwork that is independent of RoIs
○ An RoI-wise subnetwork that does not share computation across RoIs
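● An RoI pooling layer is unnaturally inserted (e.g., between the convolutional layers of ResNet-based detectors) to address the translation invariance vs. variance dilemma
○ This sacrifices training and testing efficiency, since it introduces a considerable number of region-wise layers: each RoI must go through the layers after RoI pooling, including the classification layer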
R-FCN
[Figure: R-FCN architecture. A conv layer produces k x k position-sensitive score maps per class (e.g., one map specialized in detecting the top-left corner of a cat), which feed a position-sensitive RoI pooling layer]
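R-FCN vs Faster R-CNN
● Faster R-CNN: conv layers after RoI pooling, evaluated separately for each RoI
● R-FCN: NO conv layer after RoI pooling; the per-RoI part is only pooling and voting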
Position-sensitive score maps
● Attach a convolutional layer on top of the backbone feature map to produce k^2(C+1) position-sensitive score maps
● For each of the C+1 classes (C object categories plus background), k^2 score maps are produced
○ Each map is specialized for one relative location of an object (top-left, top-middle, ..., bottom-right)
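To make the channel count concrete, here is a minimal NumPy sketch of the score-map layer. It assumes a 1x1 convolution (written as a per-pixel matrix multiply, bias omitted) and assumes the k^2(C+1) output channels are ordered bin-major, so they reshape to (k, k, C+1, H, W); the function name and that ordering are illustrative choices, not taken from the slides.

```python
import numpy as np

def position_sensitive_score_maps(features, weights, k, num_classes):
    """Hypothetical sketch: a 1x1 conv that outputs k*k*(C+1) score maps.

    features: (D, H, W) backbone feature map
    weights:  (k*k*(C+1), D) 1x1 conv kernel (bias omitted for brevity)
    Returns (k, k, C+1, H, W); maps[i, j, c] is the score map for bin (i, j)
    of class c, under the assumed bin-major channel ordering.
    """
    if weights.shape[0] != k * k * (num_classes + 1):
        raise ValueError("conv must output k^2 (C+1) channels")
    _, h, w = features.shape
    # A 1x1 convolution is a matrix multiply applied independently at each pixel.
    out = np.einsum("od,dhw->ohw", weights, features)
    return out.reshape(k, k, num_classes + 1, h, w)
```

For example, with k = 3 and C = 20 object classes the layer outputs 3^2 x 21 = 189 channels, i.e. nine maps per class.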
Position-sensitive RoI Pooling
● Each RoI rectangle is divided into k x k bins
○ For a w x h RoI, each bin has a size of about w/k x h/k
● For the (i, j)-th bin, position-sensitive RoI pooling pools only over the (i, j)-th score map of each class
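[Figure: an RoI divided into 3 x 3 bins, with rows and columns indexed 0, 1, 2]
$$r_c(i, j) = \frac{1}{n} \sum_{(x, y) \in \mathrm{bin}(i, j)} z_{i,j,c}(x + x_0, y + y_0)$$
○ r_c(i, j): pooled response of the (i, j)-th bin for class c
○ z_{i,j,c}: one score map out of the k^2(C+1) score maps
○ (x_0, y_0): top-left corner of the RoI (example in the figure: (123, 245))
○ n: # of pixels in the bin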
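The pooling equation above can be sketched for a single RoI as follows. This is a minimal illustration, assuming integer RoI coordinates on the score-map grid, an RoI that lies fully inside the map, score maps stored as (k, k, C+1, H, W), and the convention that i indexes the horizontal (x) direction and j the vertical (y) direction; bin edges use floor/ceil so every bin covers at least one pixel. The function name `ps_roi_pool` is hypothetical here.

```python
import math
import numpy as np

def ps_roi_pool(score_maps, roi, k):
    """Position-sensitive RoI average pooling for one RoI (illustrative sketch).

    score_maps: (k, k, C+1, H, W); score_maps[i, j, c] is z_{i,j,c}
    roi: (x0, y0, w, h), integer coordinates on the score-map grid (assumed)
    Returns r of shape (C+1, k, k) with r[c, i, j] = r_c(i, j).
    """
    x0, y0, w, h = roi
    num_cls = score_maps.shape[2]
    r = np.zeros((num_cls, k, k))
    for i in range(k):  # i indexes the x (column) direction
        xs = x0 + math.floor(i * w / k)
        xe = x0 + math.ceil((i + 1) * w / k)
        for j in range(k):  # j indexes the y (row) direction
            ys = y0 + math.floor(j * h / k)
            ye = y0 + math.ceil((j + 1) * h / k)
            # Pool ONLY over the (i, j)-th map of every class: average of the
            # n pixels in the bin, i.e. the 1/n * sum in the equation.
            region = score_maps[i, j, :, ys:ye, xs:xe]
            r[:, i, j] = region.mean(axis=(1, 2))
    return r
```

The key difference from ordinary RoI pooling is the indexing `score_maps[i, j, ...]`: each bin reads from its own dedicated map instead of all bins reading the same feature map.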
Voting
[Figure: position-sensitive RoI pooling is applied for each class; the pooled bin scores of each class (e.g. [0.47, 0.77, 0.18, ….]) are voted into one score per class, n = C+1 classes, followed by softmax]
● For each class, average the k^2 pooled bin scores to obtain one class score (average voting)
○ This gives C+1 class scores in total
● Apply softmax across the C+1 scores to determine the classification result
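Voting and softmax can be sketched in a few lines, assuming the pooled scores for one RoI are laid out as (C+1, k, k) as in the pooling sketch; the function name is hypothetical. Subtracting the maximum before exponentiating does not change the softmax result but avoids overflow.

```python
import numpy as np

def vote_and_classify(pooled):
    """Illustrative sketch of average voting followed by softmax.

    pooled: (C+1, k, k) position-sensitive pooled scores for one RoI.
    Returns (class_scores, probs), both of shape (C+1,).
    """
    class_scores = pooled.mean(axis=(1, 2))      # average voting over the k*k bins
    shifted = class_scores - class_scores.max()  # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return class_scores, probs
```

Because every bin must agree for the average to be high, an RoI that is badly shifted off the object gets low votes from the bins whose parts are missing.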
Bounding box regression
● In addition to the k^2(C+1)-d conv layer, a sibling 4k^2-d conv layer is appended for bounding box regression
○ Position-sensitive RoI pooling over these 4k^2 maps produces a 4k^2-d vector for each RoI
● This vector is aggregated into a 4-d vector by average voting
● The 4-d vector parameterizes the box as (t_x, t_y, t_w, t_h)
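A minimal sketch of the regression branch after pooling: average-vote the 4k^2 values down to (t_x, t_y, t_w, t_h), then decode them against the RoI. The decoding assumes the usual R-CNN-style parameterization with the RoI given as (center x, center y, width, height): centers shift by a fraction of the RoI size and width/height scale in log space. The (4, k, k) layout and the function name are assumptions for illustration.

```python
import numpy as np

def regress_box(pooled_reg, roi):
    """Illustrative sketch of box regression by average voting.

    pooled_reg: (4, k, k) position-sensitive pooled regression outputs (assumed layout)
    roi: (cx, cy, w, h) RoI center and size (assumed convention)
    Returns the refined box (cx, cy, w, h).
    """
    tx, ty, tw, th = pooled_reg.mean(axis=(1, 2))  # average voting -> 4-d vector
    cx, cy, w, h = roi
    return (cx + tx * w,      # shift center x by a fraction of the RoI width
            cy + ty * h,      # shift center y by a fraction of the RoI height
            w * np.exp(tw),   # width scaled in log space
            h * np.exp(th))   # height scaled in log space
```

For example, t_x = 0.5 moves a 20-pixel-wide RoI's center 10 pixels to the right, and t_w = log 2 doubles its width.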
R-FCN Recap
Performance

R-FCN.pptx