Tutorial
Faster R-CNN
Object Detection: Localization & Classification
Hwa Pyung Kim
Department of Computational Science and Engineering, Yonsei University
hpkim0512@yonsei.ac.kr
𝑥
𝑦
𝑤
ℎ
Bounding box regression (localization):
Where?
Object Detection: Classification + Regression
A dog at (𝒙, 𝒚, 𝒘, 𝒉)
+ =
1
0
0
⋮
Dog
Cat
⋮
Person
Classification (recognition):
What?
Object Detection
Feature
map
Encoding
(conv&pool)
Combining
features
𝒙, 𝒚
w
h
Bounding box information
• 𝒙, 𝒚 : top left corner position
• w = width
• h = height
Dog
Cat
Person
⋮
Input image: [224, 224, 3]
pool5 features: [7, 7, 512]
7 = 224 / 32, where 32 = 2^5 and 5 = # of pooling layers
Vgg16 Networks
Pooling
CNN-based Object Detection:
There are clues of dog (What) at local position (Where)
in the convolution feature map
Fully-connected
layers
Classification
Regression
𝑥
𝑦
𝑤
ℎ
1
0
0
⋮
These red boxes contain clues of “dog at the bounding box (𝑥, 𝑦, 𝑤, ℎ)”.
⋯ ⋯ Dog
Multiple Object Detection:
Localize and Classify all objects appearing in the image
How many objects are in there?
• Classify these multiple overlapping objects
• Identify their bounding boxes
PASCAL VOC2007
Background
Person
Dining table
Extract “region proposals” using
selective search method.
ConvNet
Region based CNN (R-CNN) method
CNN input (fixed size)
Affine image warping: Compute fixed-size CNN input from each region proposal, regardless of the region’s shape
Classifier
&
Regressor
Classifier
&
Regressor
Classifier
&
Regressor
Fast R-CNN
feature map
ConvNet
Classifier &
Regressor
RoI pooling: Convert the features inside valid RoI into a small feature map with a fixed spatial extent.
Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks
feature map
Region Proposal
Network
RoI pooling
proposals
ConvNet
Classifier
&
Regressor
What is Region Proposal Network?
Region Proposal Network (RPN)
Region Proposal Network
Input image: 360 × 480
Conv feature map: 11 × 15 × 512
11 = ⌊360 / 32⌋, 15 = 480 / 32, where 32 = 2^5, 5 = # of pooling layers, and 512 = # of filters
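The feature-map sizes quoted on these slides (7 × 7 for a 224 × 224 input, 11 × 15 for a 360 × 480 input) follow from dividing by the total stride 2^5 = 32. A minimal sketch, assuming each pooling layer halves the spatial size with floor rounding; the function name `feature_map_size` is hypothetical:

```python
def feature_map_size(height, width, num_pools=5):
    """Spatial size after `num_pools` stride-2 poolings (floor rounding assumed)."""
    stride = 2 ** num_pools  # 2^5 = 32 in the VGG16 setting of the slides
    return height // stride, width // stride


print(feature_map_size(224, 224))  # (7, 7)
print(feature_map_size(360, 480))  # (11, 15)
```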
Conv feature map
RPN
RPN outputs a set of rectangular object
proposals, each with an objectness score.
How?
Region proposals
Region Proposal Network
Conv feature map
15
11
512
Region Proposals & Anchor Boxes
𝑠 𝑜𝑏𝑗
𝑠 𝑛𝑜𝑏𝑗
t𝑥
t𝑦
t𝑤
tℎ
Fully-
connected
layers
Input: each sliding window
3×3×512
For each sliding window (red cuboid) expressed by a vector 𝟑 × 𝟑 × 𝟓𝟏𝟐 ,
the proposal is parametrized relative to an anchor.
𝑝𝑥 = 𝑎𝑥 + 𝑎𝑤 ⋅ 𝑡𝑥
𝑝𝑦 = 𝑎𝑦 + 𝑎ℎ ⋅ 𝑡𝑦
𝑝𝑤 = 𝑎𝑤 ⋅ exp 𝑡𝑤
𝑝ℎ = 𝑎ℎ ⋅ exp 𝑡ℎ
Output:
• 4 coordinates: 𝑝𝑥 , 𝑝𝑦, 𝑝𝑤, 𝑝ℎ
• 2 scores: $s^{obj}$, $s^{nobj}$ that estimate the probability of object or not object for each proposal
Anchor box information
• 𝒂𝒙 , 𝒂𝒚 : center position
• 𝒂𝒘 = width
• 𝒂𝒉 = height
Anchor box
For example, 𝑎𝑤 = 𝑎ℎ = 128
• 𝑎𝑤 and 𝑎ℎ are fixed.
• 𝑎𝑥 , 𝑎𝑦 is determined by the
position of the red box
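The parametrization above can be sketched as a small decoding function. This is only an illustration of the four formulas; the function name `decode_proposal` and the example anchor position are hypothetical.

```python
import math


def decode_proposal(anchor, t):
    """Turn predicted offsets t = (tx, ty, tw, th) into a proposal.

    anchor = (ax, ay, aw, ah), following the slide's anchor box description.
    """
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = t
    px = ax + aw * tx          # shift x by a fraction of the anchor width
    py = ay + ah * ty          # shift y by a fraction of the anchor height
    pw = aw * math.exp(tw)     # rescale width (always positive)
    ph = ah * math.exp(th)     # rescale height (always positive)
    return px, py, pw, ph


# Zero offsets reproduce the anchor itself.
print(decode_proposal((240.0, 180.0, 128.0, 128.0), (0.0, 0.0, 0.0, 0.0)))
```

The exponential keeps the predicted width and height positive for any real-valued output of the network.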
Region Proposals & Anchor Boxes
⋮
𝑠1
𝑜𝑏𝑗
𝑠1
𝑛𝑜𝑏𝑗
t𝑥1
t𝑦1
t𝑤1
tℎ1
Conv feature map
15
11
512
Fully-
connected
layers
3×3×512
• 𝑎𝑤𝑖 and 𝑎ℎ𝑖 are fixed.
• 𝑎𝑥𝑖, 𝑎𝑦𝑖 is determined by the
position of the red box
9 Anchor boxes = 3 ratios × 3 scales
For example,
𝑎𝑤1 = 𝑎ℎ1 = 128, 𝑎𝑤2 = 𝑎ℎ2 = 2 × 128, 𝑎𝑤3 = 𝑎ℎ3 = 4 × 128,
𝑎𝑤4 = 2 × 𝑎ℎ4 = 128, ⋯
𝑎𝑤7 = (1/2) × 𝑎ℎ7 = 128, ⋯
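A rough sketch of how the 9 anchor shapes could be enumerated. The slide only spells out anchors 1, 2, 3, 4, and 7; the remaining entries are elided, so the pattern used here (three scales 1×, 2×, 4× for each of the three width/height ratios) is an assumption, and `anchor_shapes` is a hypothetical name.

```python
def anchor_shapes(base=128, scales=(1, 2, 4)):
    """Return 9 (width, height) pairs: 3 ratios x 3 scales.

    Ratios follow the slide's examples: a_w = a_h, a_w = 2 * a_h, a_w = a_h / 2.
    The elided anchors (5, 6, 8, 9) are ASSUMED to repeat the pattern per scale.
    """
    shapes = []
    for w_mult, h_mult in ((1.0, 1.0), (1.0, 0.5), (1.0, 2.0)):
        for s in scales:
            shapes.append((base * s * w_mult, base * s * h_mult))
    return shapes


for i, (aw, ah) in enumerate(anchor_shapes(), start=1):
    print(i, aw, ah)
```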
Output: For 𝑖 = 1, ⋯ , 9,
• 4 coordinates: 𝑝𝑥𝑖, 𝑝𝑦𝑖, 𝑝𝑤𝑖, 𝑝ℎ𝑖
• 2 scores: $s_i^{obj}$, $s_i^{nobj}$ that estimate the probability of object or not object for each proposal
For each sliding window (red cuboid) expressed by a vector 𝟑 × 𝟑 × 𝟓𝟏𝟐 ,
the 9 proposals are parametrized relative to 9 anchors.
Input: each sliding window
Region Proposal Network
𝑠2
𝑜𝑏𝑗
𝑠2
𝑛𝑜𝑏𝑗
t𝑥2
t𝑦2
t𝑤2
tℎ2
𝑠9
𝑜𝑏𝑗
𝑠9
𝑛𝑜𝑏𝑗
t𝑥9
t𝑦9
t𝑤9
tℎ9
For 𝑖 = 1, ⋯ 9,
𝑝𝑥𝑖 = 𝑎𝑥𝑖 + 𝑎𝑤𝑖 ⋅ t𝑥𝑖
𝑝𝑦𝑖 = 𝑎𝑦𝑖 + 𝑎ℎ𝑖 ⋅ t𝑦𝑖
𝑝𝑤𝑖 = 𝑎𝑤𝑖 ⋅ exp t𝑤𝑖
𝑝ℎ𝑖 = 𝑎ℎ𝑖 ⋅ exp tℎ𝑖
Anchor box information
• 𝒂𝒙𝒊, 𝒂𝒚𝒊 : center position
• 𝒂𝒘𝒊 = width
• 𝒂𝒉𝒊 = height
Region Proposal Network
Fully-
connected
layers
Conv feature map
Anchor boxes
15
11
512
For 𝑖 = 1, ⋯ 9,
𝑝𝑥𝑖 = 𝑎𝑥𝑖 + 𝑎𝑤𝑖 ⋅ 𝑡𝑥𝑖
𝑝𝑦𝑖 = 𝑎𝑦𝑖 + 𝑎ℎ𝑖 ⋅ 𝑡𝑦𝑖
𝑝𝑤𝑖 = 𝑎𝑤𝑖 ⋅ exp 𝑡𝑤𝑖
𝑝ℎ𝑖 = 𝑎ℎ𝑖 ⋅ exp 𝑡ℎ𝑖
$p_i = \dfrac{\exp(s_i^{obj})}{\exp(s_i^{obj}) + \exp(s_i^{nobj})}$
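The objectness probability is a two-way softmax over the object and not-object scores. A minimal sketch (the function name is hypothetical; subtracting the maximum is a standard numerical-stability step that does not change the result):

```python
import math


def objectness_probability(s_obj, s_nobj):
    """Softmax over (s_obj, s_nobj), returning the probability of 'object'."""
    m = max(s_obj, s_nobj)           # subtract the max to avoid overflow
    e_obj = math.exp(s_obj - m)
    e_nobj = math.exp(s_nobj - m)
    return e_obj / (e_obj + e_nobj)


print(objectness_probability(0.0, 0.0))  # 0.5: equal scores are undecided
```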
⋮
𝑝1
𝑝𝑥1
𝑝𝑦1
𝑝𝑤1
𝑝ℎ1
𝑝2
𝑝𝑥2
𝑝𝑦2
𝑝𝑤2
𝑝ℎ2
𝑝9
𝑝𝑥9
𝑝𝑦9
𝑝𝑤9
𝑝ℎ9
Extract 9 Proposals relative to 9 Anchors
Proposals
3×3×512
Total # of proposals = (total # of windows) × (# of proposals per window) = (11 × 15) × 9 = 1485
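The count can be checked by enumerating one anchor per shape at every sliding-window position. This is a sketch under assumptions: anchor centers are placed at the center of each 32-pixel cell, and the 9 shapes follow the pattern assumed from the slide's partial example; `all_anchors` is a hypothetical name.

```python
def all_anchors(feat_h, feat_w, shapes, stride=32):
    """List (ax, ay, aw, ah) for every feature-map position and anchor shape."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # ASSUMPTION: the anchor center is the center of the stride x stride cell.
            ax, ay = (col + 0.5) * stride, (row + 0.5) * stride
            for aw, ah in shapes:
                anchors.append((ax, ay, aw, ah))
    return anchors


# 3 ratios x 3 scales (assumed pattern).
shapes = [(128 * s, 128 * s * r) for r in (1, 0.5, 2) for s in (1, 2, 4)]
print(len(all_anchors(11, 15, shapes)))  # 1485
```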
Conv feature map
The proposals overlap heavily with one another!
We need to reduce this redundancy.
Generate Region Proposals
15
11
512
Total # of windows = 11 × 15
Region Proposal Network
Reduce redundancy by
Non-Maximum Suppression (NMS)
[Figure: proposal 1, proposal 2, ⋯, proposal 173, ⋯, proposal 1480, ⋯, proposal 1485]
Most probable proposal
Region Proposal Network
Step 1.
Take the most probable proposal from 1485 proposals
Proposal information
• 𝒑𝒙𝒊, 𝒑𝒚𝒊 : top left corner position
• 𝒑𝒘𝒊 = width
• 𝒑𝒉𝒊 = height
• 𝒑𝒊 = objectness probability,
𝒑𝟏 ≥ 𝒑𝟐 ≥ ⋯ ≥ 𝒑𝟏𝟒𝟖𝟓
(𝑝𝑥1, 𝑝𝑦1, 𝑝𝑤1, 𝑝ℎ1, 𝑝1), (𝑝𝑥2, 𝑝𝑦2, 𝑝𝑤2, 𝑝ℎ2, 𝑝2), ⋯, (𝑝𝑥173, 𝑝𝑦173, 𝑝𝑤173, 𝑝ℎ173, 𝑝173), ⋯, (𝑝𝑥1480, 𝑝𝑦1480, 𝑝𝑤1480, 𝑝ℎ1480, 𝑝1480), ⋯, (𝑝𝑥1485, 𝑝𝑦1485, 𝑝𝑤1485, 𝑝ℎ1485, 𝑝1485)
Region Proposal Network
Reduce redundancy by
Non-Maximum Suppression (NMS)
Step 1.
Take the most probable proposal from 1485 proposals
Step 2.
Compute the IoU between the most probable proposal and each of the other proposals,
and discard proposals having IoU > threshold (0.7)
[Figure: IoU of proposals 2, 173, 1480, and 1485 with the most probable proposal; values shown: 0.83, 0.71, 0.30, 0]
IoU = (area of intersection) / (area of union)
Most probable proposal
30 proposals having IoU>0.7
are discarded.
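The IoU used in Step 2 can be sketched as follows, assuming boxes are given as (x, y, w, h) with (x, y) the top-left corner, as in the proposal description on these slides. The function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes, (x, y) = top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # half-overlapping boxes: 50 / 150
```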
Region Proposal Network
Given the most probable proposal,
the blue proposals have 𝑰𝒐𝑼 > 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 (0.7)
Summary of step 1-2 in NMS.
Step 3:
Get the next most probable proposal among the remaining 1485 − 30 proposals and repeat the previous process.
Next most probable proposal
36 proposals having IoU>0.7
are discarded.
Reduce redundancy by NMS
Region Proposal Network
Before NMS: 1,485 proposals
After NMS: 300 proposals
Repeat the previous procedure until…
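Steps 1 to 3 can be sketched as a greedy loop. The stopping rule is not spelled out on the slide, so this sketch assumes the loop ends when no proposals remain or when 300 proposals have been kept; `nms` and `max_keep` are hypothetical names, and boxes are assumed to be (x, y, w, h) with a top-left corner.

```python
def iou(box_a, box_b):
    """IoU of two (x, y, w, h) boxes with (x, y) the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(proposals, iou_threshold=0.7, max_keep=300):
    """Greedy NMS over proposals given as (px, py, pw, ph, p)."""
    remaining = sorted(proposals, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining and len(kept) < max_keep:  # ASSUMED stopping rule
        best = remaining.pop(0)                # Step 1: most probable proposal
        kept.append(best)
        # Step 2: discard proposals overlapping `best` by more than the threshold.
        remaining = [b for b in remaining if iou(best[:4], b[:4]) <= iou_threshold]
        # Step 3: the loop repeats with the next most probable survivor.
    return kept


demo = [(0, 0, 10, 10, 0.9), (1, 0, 10, 10, 0.8), (50, 50, 10, 10, 0.7)]
print(nms(demo))  # the second box overlaps the first (IoU about 0.82) and is dropped
```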
Reduce redundancy by NMS
Summary of RPN
Inputs:
• Conv feature map
Outputs:
• Region proposal coordinates.
• Probabilities representing how likely each region proposal is to contain an object.
Region Proposal Network
feature map
Region Proposal
Network
RoI pooling
proposals
ConvNet
Now we are ready to explain
Classifier & Regressor.
Classifier
&
Regressor
Classifier & Regressor
RoI pooling layer
Proposal $(p_x, p_y, p_w, p_h)$ in image coordinates → $(p_x', p_y', p_w', p_h')$ in feature-map coordinates
Classifier & Regressor
Bilinear interpolation
& Max pooling
Input for
Classifier & Regressor
: fixed-size
Conv feature map
Bilinear interpolation
& Max pooling
Convert the features inside valid RoI into a small feature map with a fixed spatial extent.
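$p_x' = p_x \cdot \frac{15}{480}$, $p_y' = p_y \cdot \frac{11}{360}$, $p_w' = p_w \cdot \frac{15}{480}$, $p_h' = p_h \cdot \frac{11}{360}$
[Figure: a proposal on the 360 × 480 image is mapped to $(p_x', p_y', p_w', p_h')$ on the 11 × 15 feature map and pooled to a 7 × 7 output]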
⋯
300 RoI pooled feature maps
RoI pooling layer generates
inputs for Classifier & Regressor
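A simplified sketch of RoI pooling: the proposal is scaled to feature-map coordinates and split into a 7 × 7 grid of bins, and each bin is max-pooled. This version works on a single channel stored as a list of lists and skips the bilinear interpolation mentioned on the slide, so it is an approximation rather than the deck's exact layer; `roi_max_pool` is a hypothetical name.

```python
import math


def roi_max_pool(feature_map, roi, image_size, out_size=7):
    """Max-pool one RoI of a single-channel feature map into out_size x out_size.

    roi = (px, py, pw, ph) in image pixels, (px, py) = top-left corner;
    image_size = (image_height, image_width).
    """
    feat_h, feat_w = len(feature_map), len(feature_map[0])
    img_h, img_w = image_size
    px, py, pw, ph = roi
    # Scale the RoI to feature-map coordinates, e.g. px' = px * 15 / 480.
    x0, y0 = px * feat_w / img_w, py * feat_h / img_h
    bin_w = pw * feat_w / img_w / out_size
    bin_h = ph * feat_h / img_h / out_size
    pooled = []
    for i in range(out_size):
        # Clamp so every bin covers at least one valid row.
        r0 = min(max(math.floor(y0 + i * bin_h), 0), feat_h - 1)
        r1 = min(max(math.ceil(y0 + (i + 1) * bin_h), r0 + 1), feat_h)
        row = []
        for j in range(out_size):
            c0 = min(max(math.floor(x0 + j * bin_w), 0), feat_w - 1)
            c1 = min(max(math.ceil(x0 + (j + 1) * bin_w), c0 + 1), feat_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fm = [[r * 15 + c for c in range(15)] for r in range(11)]  # toy 11 x 15 map
out = roi_max_pool(fm, (0, 0, 480, 360), (360, 480))
print(len(out), len(out[0]))  # 7 7
```

Whatever the proposal's size, the output is always 7 × 7 per channel, which is what lets the fully-connected layers accept it.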
Classifier & Regressor
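[Figure: each 7 × 7 × 512 RoI-pooled feature map yields, for every class i = 0, …, 20, a score $s_i$ and regression offsets $(r_{x,i}, r_{y,i}, r_{w,i}, r_{h,i})$; the classes shown are i = 0, 15, 20]
Example probabilities: $p_0 = 0.0124$, $p_{15} = 0.9797$, $p_{20} = 0.0001$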
RoI pooling
Classification & Regression for each proposal
𝑥𝑖 = 𝑝𝑥 + 𝑝𝑤 ⋅ 𝑟𝑥𝑖
𝑦𝑖 = 𝑝𝑦 + 𝑝ℎ ⋅ 𝑟𝑦𝑖
𝑤𝑖 = 𝑝𝑤 ⋅ exp 𝑟𝑤𝑖
ℎ𝑖 = 𝑝ℎ ⋅ exp 𝑟ℎ𝑖
$p_i = \dfrac{\exp(s_i)}{\sum_{j=0}^{20} \exp(s_j)}$
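The per-proposal classification and regression formulas can be sketched like this: a softmax over the 21 class scores, and one refined box per class decoded from the proposal. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Class probabilities p_i = exp(s_i) / sum_j exp(s_j)."""
    m = max(scores)                          # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_boxes(proposal, offsets):
    """Decode one (x, y, w, h) box per class from offsets (rx, ry, rw, rh)."""
    px, py, pw, ph = proposal
    return [(px + pw * rx, py + ph * ry, pw * math.exp(rw), ph * math.exp(rh))
            for rx, ry, rw, rh in offsets]


probs = softmax([0.0] * 21)                  # 21 equal scores
print(round(sum(probs), 6))                  # probabilities sum to 1
print(refine_boxes((10, 20, 30, 40), [(0, 0, 0, 0)]))  # zero offsets keep the proposal
```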
Background
Person
TV monitor
𝑝𝑥, 𝑝𝑦, 𝑝𝑤, 𝑝ℎ
Fully-connected
layers
⋮
𝑝0
𝑥0
𝑦0
𝑤0
ℎ0
𝑝15
𝑥15
𝑦15
𝑤15
ℎ15
𝑝20
𝑥20
𝑦20
𝑤20
ℎ20
⋮
Proposal
Classification &
Bounding-box regression
Each of the 21 classes
gets its own refined
bounding-box prediction and
an estimated probability.
Classifier & Regressor
7
7
512
7×7×512
4096
Summary of Classification & Regression
Regress & classify
each class from proposals
⋮
Background
Person
TV monitor
⋮
⋮
Reduce redundancy
by NMS
Dining table
⋮
None
None
Classifier & Regressor
Discard bounding boxes
(p < 0.6 or background)
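A sketch of the discard rule. The slide does not say how a class is picked for each proposal, so this assumes the most probable class is taken first and the box is dropped if that class is background (index 0) or its probability is below 0.6; `keep_detections` is a hypothetical name.

```python
def keep_detections(class_probs, class_boxes, threshold=0.6, background=0):
    """Return (class_index, probability, box) for proposals that survive.

    class_probs[j][i] : probability of class i for proposal j
    class_boxes[j][i] : refined box of class i for proposal j
    """
    kept = []
    for probs, boxes in zip(class_probs, class_boxes):
        best = max(range(len(probs)), key=lambda i: probs[i])  # ASSUMED: argmax class
        if best != background and probs[best] >= threshold:
            kept.append((best, probs[best], boxes[best]))
    return kept


probs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.3, 0.5, 0.2]]
boxes = [[(0, 0, 1, 1)] * 3 for _ in range(3)]
print(keep_detections(probs, boxes))  # only the second proposal survives
```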
⋮
⋮
⋮
Region Proposals
Summary of Classifier & Regressor
Inputs:
• Conv feature map
• Region proposals
Outputs:
• Bounding-box coordinates of objects in the image.
• Classification of the bounding boxes
Classifier & Regressor
Training process for RPN
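Input
• Image $X^{(k)}$
Ground-truth
• Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
• Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
Anchor boxes
$A^{(k)} = \{A_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}$, where $A_j^{(k)} = (a_{x,j}^{(k)}, a_{y,j}^{(k)}, a_{w,j}^{(k)}, a_{h,j}^{(k)})$
Ground-truth proposals associated with anchors $A_j^{(k)}$
For each anchor, find the nearest bounding box: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, A_j^{(k)})$
• Ground-truth probability of objectness:
$$p_j^{(k)} := \begin{cases} 1, & \text{if } IoU(B_i^{(k)}, A_j^{(k)}) > 0.7 \\ 0, & \text{if } IoU(B_i^{(k)}, A_j^{(k)}) < 0.3 \end{cases}$$
• Ground-truth proposal transformation: $t_j^{(k)} := (t_{x,j}^{(k)}, t_{y,j}^{(k)}, t_{w,j}^{(k)}, t_{h,j}^{(k)})$, where
$$t_{x,j}^{(k)} = \frac{x_i^{(k)} - a_{x,j}^{(k)}}{a_{w,j}^{(k)}}, \quad t_{y,j}^{(k)} = \frac{y_i^{(k)} - a_{y,j}^{(k)}}{a_{h,j}^{(k)}}, \quad t_{w,j}^{(k)} = \log\frac{w_i^{(k)}}{a_{w,j}^{(k)}}, \quad t_{h,j}^{(k)} = \log\frac{h_i^{(k)}}{a_{h,j}^{(k)}}$$
Predicted proposals (predicted quantities are marked with a hat)
• Predicted probability of objectness: $\hat{p}_j^{(k)}$
• Predicted proposal transformation: $\hat{t}_j^{(k)} = (\hat{t}_{x,j}^{(k)}, \hat{t}_{y,j}^{(k)}, \hat{t}_{w,j}^{(k)}, \hat{t}_{h,j}^{(k)})$
where $\{(\hat{t}_j^{(k)}, \hat{p}_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = RPN(CNN(X^{(k)}; W_{CNN}); W_{RPN})$
Loss
$$L_{RPN}\big(p_j^{(k)}, t_j^{(k)}, \hat{p}_j^{(k)}, \hat{t}_j^{(k)}; W_{CNN}, W_{RPN}\big) = \frac{1}{2} \sum_{j=1}^{N_{batch}} H\big(p_j^{(k)}, \hat{p}_j^{(k)}\big) + \lambda_{RPN} \frac{1}{N_{anc}^{(k)}} \sum_{j=1}^{N_{batch}} p_j^{(k)}\, \mathrm{smooth}_{L1}\big(t_j^{(k)}, \hat{t}_j^{(k)}\big)$$
where $H$ is the cross-entropy function and
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$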
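The RPN training targets and loss on this slide can be sketched in plain Python. Several points are assumptions: smooth L1 is applied to each component of the difference between ground-truth and predicted offsets and summed, anchors with IoU between 0.3 and 0.7 get no label, the default value of `lam` is arbitrary, and the 1/2 factor is copied as it appears on the slide. All function names are hypothetical.

```python
import math


def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def objectness_label(max_iou, pos=0.7, neg=0.3):
    """Ground-truth objectness from the best IoU with any ground-truth box."""
    if max_iou > pos:
        return 1
    if max_iou < neg:
        return 0
    return None  # ASSUMPTION: in-between anchors are not used for training


def encode_target(box, anchor):
    """Ground-truth transformation (tx, ty, tw, th) of `box` relative to `anchor`."""
    x, y, w, h = box
    ax, ay, aw, ah = anchor
    return ((x - ax) / aw, (y - ay) / ah, math.log(w / aw), math.log(h / ah))


def rpn_loss(gt_p, gt_t, pred_p, pred_t, n_anc, lam=1.0, eps=1e-12):
    """Cross-entropy on objectness plus smooth L1 on offsets of positive anchors."""
    cls, reg = 0.0, 0.0
    for p, t, p_hat, t_hat in zip(gt_p, gt_t, pred_p, pred_t):
        cls += -(p * math.log(p_hat + eps) + (1 - p) * math.log(1 - p_hat + eps))
        reg += p * sum(smooth_l1(a - b) for a, b in zip(t, t_hat))
    return 0.5 * cls + lam * reg / n_anc


print(rpn_loss([1], [(0, 0, 0, 0)], [0.5], [(0, 0, 0, 0)], n_anc=1))
```

Multiplying the regression term by the ground-truth objectness means only anchors labeled as objects contribute to the box loss.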
Training process for Classifier & Regressor
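Input
• Image $X^{(k)}$
Ground-truth
• Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
• Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
Region Proposals associated with anchors $A_j^{(k)}$ (predicted quantities are marked with a hat)
$P^{(k)} := \{(P_j^{(k)}, \hat{p}_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}}$, $P_j^{(k)} = (p_{x,j}^{(k)}, p_{y,j}^{(k)}, p_{w,j}^{(k)}, p_{h,j}^{(k)})$, where
$$p_{x,j}^{(k)} = a_{x,j}^{(k)} + a_{w,j}^{(k)}\, \hat{t}_{x,j}^{(k)}, \quad p_{y,j}^{(k)} = a_{y,j}^{(k)} + a_{h,j}^{(k)}\, \hat{t}_{y,j}^{(k)}, \quad p_{w,j}^{(k)} = a_{w,j}^{(k)} \exp\big(\hat{t}_{w,j}^{(k)}\big), \quad p_{h,j}^{(k)} = a_{h,j}^{(k)} \exp\big(\hat{t}_{h,j}^{(k)}\big)$$
$P^{(k)} \leftarrow NMS(P^{(k)}, N_{prop})$
Ground-truth Classification & Regression associated with proposals $P_j^{(k)}$
For each proposal, find the nearest bounding box: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, P_j^{(k)})$
• Ground-truth Classification:
$$c_j^{(k)} := \big(c_{j,0}^{(k)}, \cdots, c_{j,N_{cls}}^{(k)}\big) = \begin{cases} (1, 0, \cdots, 0), & \text{if } IoU(B_i^{(k)}, P_j^{(k)}) < 0.5 \\ (0, \cdots, 0, 1, 0, \cdots, 0), & \text{otherwise,} \end{cases}$$
where the 1 in the second case is the $(C_i^{(k)} + 1)$-th component.
• Ground-truth Regression: $r_j^{(k)} := (r_{x,j}^{(k)}, r_{y,j}^{(k)}, r_{w,j}^{(k)}, r_{h,j}^{(k)})$, where
$$r_{x,j}^{(k)} = \frac{x_i^{(k)} - p_{x,j}^{(k)}}{p_{w,j}^{(k)}}, \quad r_{y,j}^{(k)} = \frac{y_i^{(k)} - p_{y,j}^{(k)}}{p_{h,j}^{(k)}}, \quad r_{w,j}^{(k)} = \log\frac{w_i^{(k)}}{p_{w,j}^{(k)}}, \quad r_{h,j}^{(k)} = \log\frac{h_i^{(k)}}{p_{h,j}^{(k)}}$$
Predicted Classification & Regression
• Predicted Classification: $\hat{c}_j^{(k)} = \big(\hat{c}_{j,0}^{(k)}, \cdots, \hat{c}_{j,N_{cls}}^{(k)}\big)$
• Predicted Regression: $\hat{r}_j^{(k)} = (\hat{r}_{x,j}^{(k)}, \hat{r}_{y,j}^{(k)}, \hat{r}_{w,j}^{(k)}, \hat{r}_{h,j}^{(k)})$
where $\{(\hat{r}_j^{(k)}, \hat{c}_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = CR(CNN(X^{(k)}; W_{CNN}), P^{(k)}; W_{CR})$
Loss
$$L_{CR}\big(r_j^{(k)}, c_j^{(k)}, \hat{r}_j^{(k)}, \hat{c}_j^{(k)}; W_{CNN}, W_{CR}\big) = \sum_{j=1}^{N_{prop}} H\big(c_j^{(k)}, \hat{c}_j^{(k)}\big) + \lambda_{CR} \sum_{j=1}^{N_{prop}} \big(1 - c_{j,0}^{(k)}\big)\, \mathrm{smooth}_{L1}\big(r_j^{(k)}, \hat{r}_j^{(k)}\big)$$
where $H$ is the cross-entropy function and
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$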
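The Classifier & Regressor targets and loss can be sketched in the same style. Assumptions: class index 0 is background, ground-truth classes are indexed 1 to N_cls, smooth L1 is summed over the four components of the offset difference, and the default `lam` is arbitrary. Function names are hypothetical.

```python
import math


def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def one_hot_target(max_iou, gt_class, n_cls, threshold=0.5):
    """Ground-truth class vector of length n_cls + 1 (index 0 = background)."""
    target = [0] * (n_cls + 1)
    target[0 if max_iou < threshold else gt_class] = 1
    return target


def cr_loss(gt_c, gt_r, pred_c, pred_r, lam=1.0, eps=1e-12):
    """Cross-entropy over classes plus smooth L1 on non-background proposals."""
    total = 0.0
    for c, r, c_hat, r_hat in zip(gt_c, gt_r, pred_c, pred_r):
        total += -sum(ci * math.log(ci_hat + eps) for ci, ci_hat in zip(c, c_hat))
        # (1 - c[0]) switches the regression term off for background proposals.
        total += lam * (1 - c[0]) * sum(smooth_l1(a - b) for a, b in zip(r, r_hat))
    return total


target = one_hot_target(0.8, gt_class=15, n_cls=20)
print(target.index(1), len(target))  # 15 21
```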
The History of object detection
in deep learning
Yolo Yolo v2 SSD
RCNN
Fast RCNN
Faster RCNN
Mask RCNN
DSSD
2012.12
AlexNet
2014.9
VggNet &
InceptionNet
15.12.10
ResNet
2013.11.11
2015.4.30
2015.5.14
15.6.8 15.12.25 15.12.08 17.1.23
17.3.20
Application to Ultrasound-based Fetal biometry
Thank you
E-mail: hpkim0512@yonsei.ac.kr
Homepage: https://hpkim0512.blogspot.com

Site specific recombination and transposition.........pdf
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 

Tutorial on Object Detection (Faster R-CNN)

  • 1. Tutorial Faster R-CNN Object Detection: Localization & Classification Hwa Pyung Kim Department of Computational Science and Engineering, Yonsei University hpkim0512@yonsei.ac.kr
  • 2. Object Detection: Classification + Regression. Classification (recognition): What? A one-hot class vector (1, 0, 0, ⋯) over the classes Dog, Cat, ⋯, Person. Bounding box regression (localization): Where? A box $(x, y, w, h)$. Classification + regression = "A dog at $(x, y, w, h)$". Pipeline: Encoding (conv & pool) → Feature map → Combining features. Bounding box information • $(x, y)$: top-left corner position • $w$ = width • $h$ = height
  • 3. CNN-based Object Detection: There are clues of dog (What) at a local position (Where) in the convolution feature map. Vgg16 Networks: input image [224,224,3] → pooling → pool5 features [7,7,512], where 7 = 224/32, 32 = 2^5, and 5 = # of pooling layers. The pool5 features feed fully-connected layers with two outputs: Classification (one-hot vector 1, 0, 0, ⋯ over Dog, Cat, Person, ⋯) and Regression $(x, y, w, h)$. These red boxes contain clues of "dog at the bounding box $(x, y, w, h)$".
  • 4. Multiple Object Detection: Localize and classify all objects appearing in the image. How many objects are there? • Classify these multiple, overlapping objects • Identify their bounding boxes. Example image from PASCAL VOC2007.
  • 5. Region based CNN (R-CNN) method. Extract "region proposals" using the selective search method. Affine image warping: compute a fixed-size CNN input from each region proposal, regardless of the region's shape. Each warped region (CNN input, fixed size) goes through the ConvNet and then a Classifier & Regressor. Example outputs: Background, Person, Dining table.
  • 6. Fast R-CNN. Pipeline: ConvNet → feature map → RoI pooling → Classifier & Regressor. RoI pooling: convert the features inside a valid RoI into a small feature map with a fixed spatial extent.
  • 7. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Pipeline: ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor. What is a Region Proposal Network?
  • 8. Region Proposal Network (RPN). Input image: 360 × 480. Conv feature map: 11 × 15 × 512, where 11 = ⌊360/32⌋, 15 = 480/32, 32 = 2^5, 5 = # of pooling layers, and 512 = # of filters. The RPN takes the conv feature map and outputs a set of rectangular object proposals (region proposals), each with an objectness score. How?
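The feature-map sizes quoted on the slides (7 for a 224-pixel side, 11 × 15 for a 360 × 480 image) follow from halving the resolution at each of the 5 pooling layers. A minimal sketch of that arithmetic, assuming each pooling layer halves the size and rounds down; `feature_map_size` is a hypothetical helper name, not something from the slides:

```python
def feature_map_size(height, width, num_pools=5):
    """Return (rows, cols, stride) of the conv feature map.

    Assumes every pooling layer halves the spatial size (rounding down),
    so the total stride is 2 ** num_pools.
    """
    stride = 2 ** num_pools
    return height // stride, width // stride, stride


print(feature_map_size(224, 224))  # VGG16 classification input
print(feature_map_size(360, 480))  # detection example from the slides
```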
  • 9. Region Proposal Network: Region Proposals & Anchor Boxes. Conv feature map: 11 × 15 × 512. Input: each sliding window (red cuboid), expressed as a 3 × 3 × 512 vector and fed to fully-connected layers that output $s^{obj}, s^{nobj}, t_x, t_y, t_w, t_h$. For each sliding window, the proposal is parametrized relative to an anchor:
      $p_x = a_x + a_w \cdot t_x$
      $p_y = a_y + a_h \cdot t_y$
      $p_w = a_w \cdot \exp(t_w)$
      $p_h = a_h \cdot \exp(t_h)$
    Output: • 4 coordinates: $p_x, p_y, p_w, p_h$ • 2 scores: $s^{obj}, s^{nobj}$, which estimate the probability of object or not object for each proposal. Anchor box information • $(a_x, a_y)$: center position • $a_w$ = width • $a_h$ = height. For example, $a_w = a_h = 128$. • $a_w$ and $a_h$ are fixed. • $(a_x, a_y)$ is determined by the position of the red box.
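The anchor-relative parametrization on slide 9 can be sketched in a few lines. This is only an illustration of the four equations; the function name `decode_proposal` and the tuple layout are assumptions, not part of the slides:

```python
import math


def decode_proposal(anchor, t):
    """Turn predicted offsets t into a proposal box, relative to one anchor.

    anchor = (a_x, a_y, a_w, a_h), t = (t_x, t_y, t_w, t_h).
    """
    a_x, a_y, a_w, a_h = anchor
    t_x, t_y, t_w, t_h = t
    p_x = a_x + a_w * t_x        # shift scaled by the anchor width
    p_y = a_y + a_h * t_y        # shift scaled by the anchor height
    p_w = a_w * math.exp(t_w)    # log-space width scaling
    p_h = a_h * math.exp(t_h)    # log-space height scaling
    return p_x, p_y, p_w, p_h


# Zero offsets reproduce the anchor itself.
print(decode_proposal((100, 100, 128, 128), (0, 0, 0, 0)))
```

Predicting `t_w` and `t_h` in log space keeps the decoded width and height positive for any network output.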
  • 10. Region Proposal Network: Region Proposals & Anchor Boxes. Conv feature map: 11 × 15 × 512. Input: each sliding window (red cuboid), expressed as a 3 × 3 × 512 vector and fed to fully-connected layers that output $s_i^{obj}, s_i^{nobj}, t_{x,i}, t_{y,i}, t_{w,i}, t_{h,i}$ for $i = 1, \cdots, 9$. For each sliding window, the 9 proposals are parametrized relative to 9 anchors. For $i = 1, \cdots, 9$:
      $p_{x,i} = a_{x,i} + a_{w,i} \cdot t_{x,i}$
      $p_{y,i} = a_{y,i} + a_{h,i} \cdot t_{y,i}$
      $p_{w,i} = a_{w,i} \cdot \exp(t_{w,i})$
      $p_{h,i} = a_{h,i} \cdot \exp(t_{h,i})$
    9 anchor boxes = 3 ratios × 3 scales. For example, $a_{w,1} = a_{h,1} = 128$, $a_{w,2} = a_{h,2} = 2 \times 128$, $a_{w,3} = a_{h,3} = 4 \times 128$, $a_{w,4} = 2 \times a_{h,4} = 128$, ⋯, $a_{w,7} = \frac{1}{2} \times a_{h,7} = 128$, ⋯ • $a_{w,i}$ and $a_{h,i}$ are fixed. • $(a_{x,i}, a_{y,i})$ is determined by the position of the red box. Output: for $i = 1, \cdots, 9$, • 4 coordinates: $p_{x,i}, p_{y,i}, p_{w,i}, p_{h,i}$ • 2 scores: $s_i^{obj}, s_i^{nobj}$, which estimate the probability of object or not object for each proposal. Anchor box information • $(a_{x,i}, a_{y,i})$: center position • $a_{w,i}$ = width • $a_{h,i}$ = height
  • 11. Region Proposal Network: Extract 9 Proposals relative to 9 Anchors. Conv feature map: 11 × 15 × 512. Each 3 × 3 × 512 sliding window passes through fully-connected layers that output $s_i^{obj}, s_i^{nobj}, t_{x,i}, t_{y,i}, t_{w,i}, t_{h,i}$ for $i = 1, \cdots, 9$. Combined with the anchor boxes, these give the proposals $(p_i, p_{x,i}, p_{y,i}, p_{w,i}, p_{h,i})$. For $i = 1, \cdots, 9$:
      $p_{x,i} = a_{x,i} + a_{w,i} \cdot t_{x,i}$
      $p_{y,i} = a_{y,i} + a_{h,i} \cdot t_{y,i}$
      $p_{w,i} = a_{w,i} \cdot \exp(t_{w,i})$
      $p_{h,i} = a_{h,i} \cdot \exp(t_{h,i})$
      $p_i = \dfrac{\exp(s_i^{obj})}{\exp(s_i^{obj}) + \exp(s_i^{nobj})}$
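The objectness probability on slide 11 is a two-way softmax over the object / not-object scores. A small sketch, with the hypothetical name `objectness`; subtracting the larger score before exponentiating is an added numerical-stability step that does not change the result:

```python
import math


def objectness(s_obj, s_nobj):
    """Two-way softmax: exp(s_obj) / (exp(s_obj) + exp(s_nobj))."""
    m = max(s_obj, s_nobj)           # shift for numerical stability
    e_obj = math.exp(s_obj - m)
    e_nobj = math.exp(s_nobj - m)
    return e_obj / (e_obj + e_nobj)


print(objectness(0.0, 0.0))  # equal scores give probability 0.5
```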
  • 12. Region Proposal Network: Generate Region Proposals. Conv feature map: 11 × 15 × 512. Total # of windows = 11 × 15. # of proposals per window = 9. Total # of proposals: 11 × 15 × 9 = 1485. The proposals overlap each other heavily! We need to reduce the redundancy.
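The count 11 × 15 × 9 = 1485 can be checked by enumerating one set of 9 anchors per sliding-window position. The sketch below follows the example anchor sizes from slide 10 (widths 128, 256, 512 with width:height of 1:1, 2:1 and 1:2). Placing each anchor center at the middle of a 32-pixel cell is an assumption made for illustration, and `make_anchors` is a hypothetical name:

```python
def make_anchors(rows=11, cols=15, stride=32, scales=(128, 256, 512)):
    """Enumerate (a_x, a_y, a_w, a_h) for every window and anchor shape."""
    # (width factor, height factor) per ratio, as in the slide-10 example:
    # square, a_w = 2 * a_h, and a_w = a_h / 2.
    shapes = ((1.0, 1.0), (1.0, 0.5), (1.0, 2.0))
    anchors = []
    for r in range(rows):
        for c in range(cols):
            a_x = (c + 0.5) * stride   # assumed: center of the cell
            a_y = (r + 0.5) * stride
            for f_w, f_h in shapes:
                for s in scales:
                    anchors.append((a_x, a_y, s * f_w, s * f_h))
    return anchors


anchors = make_anchors()
print(len(anchors))  # 11 * 15 * 9
```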
  • 13. Region Proposal Network: Reduce redundancy by Non-Maximum Suppression (NMS). Step 1. Take the most probable proposal from the 1485 proposals. Each proposal $i$ is stored as $(p_{x,i}, p_{y,i}, p_{w,i}, p_{h,i}, p_i)$; the figure highlights proposals 1, 2, 173, 1480 and 1485, with proposal 1 as the most probable. Proposal information • $(p_{x,i}, p_{y,i})$: top-left corner position • $p_{w,i}$ = width • $p_{h,i}$ = height • $p_i$ = objectness probability, sorted so that $p_1 \ge p_2 \ge \cdots \ge p_{1485}$
  • 14. Region Proposal Network: Reduce redundancy by Non-Maximum Suppression (NMS). Step 1. Take the most probable proposal from the 1485 proposals. Step 2. Compute the IoU between the most probable proposal and each of the other proposals (2, ⋯, 173, ⋯, 1480, ⋯, 1485), and discard the proposals having IoU > threshold (0.7). Example IoU values shown in the figure: 0.83, 0.71, ⋯, 0.30, 0.
  • 15. Region Proposal Network: Reduce redundancy by Non-Maximum Suppression (NMS). Step 1. Take the most probable proposal from the 1485 proposals. Step 2. Compute the IoU between the most probable proposal and each of the other proposals, and discard the proposals having IoU > threshold (0.7), where IoU = (area of overlap) / (area of union). Example IoU values shown in the figure: 0.83, 0.71, ⋯, 0.30, 0.
  • 16. Region Proposal Network: Reduce redundancy by NMS. Summary of steps 1-2 in NMS: given the most probable proposal, the blue proposals have IoU > threshold (0.7); these 30 proposals are discarded. Step 3: Take the next most probable proposal among the remaining 1485 − 30 proposals and repeat the previous process; here, 36 proposals having IoU > 0.7 are discarded.
  • 17. Region Proposal Network: Reduce redundancy by NMS. Repeat the previous procedure until… Before NMS: 1,485 proposals. After NMS: 300 proposals.
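The NMS steps on slides 13 to 17 can be sketched as a greedy loop. Boxes are taken as (x, y, w, h) with a top-left corner, as on slide 13. The 0.7 threshold comes from the slides; stopping once 300 proposals are kept is an assumption based on the "After NMS: 300 proposals" figure, and the names `iou` and `nms` are illustrative:

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes, top-left origin."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)   # overlap width
    ih = min(ay + ah, by + bh) - max(ay, by)   # overlap height
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def nms(proposals, threshold=0.7, max_keep=300):
    """Greedy NMS over (x, y, w, h, p) tuples; returns the kept proposals."""
    remaining = sorted(proposals, key=lambda q: q[4], reverse=True)
    kept = []
    while remaining and len(kept) < max_keep:
        best = remaining.pop(0)                # step 1: most probable proposal
        kept.append(best)
        # step 2: discard everything overlapping it by more than the threshold
        remaining = [q for q in remaining if iou(best[:4], q[:4]) <= threshold]
    return kept                                # step 3 is the loop itself


props = [(0, 0, 10, 10, 0.9), (1, 0, 10, 10, 0.8), (20, 20, 5, 5, 0.7)]
print(nms(props))  # the second box overlaps the first too much and is dropped
```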
  • 18. Summary of RPN (Region Proposal Network). Inputs: • Conv feature map. Outputs: • Region proposal coordinates. • Probabilities representing how likely each region proposal is to contain an object.
  • 19. Pipeline: ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor. Now we are ready to explain the Classifier & Regressor.
  • 20. RoI pooling layer. Convert the features inside a valid RoI into a small feature map with a fixed spatial extent. A proposal $(p_x, p_y, p_w, p_h)$ on the 360 × 480 image is mapped to $(p_x', p_y', p_w', p_h')$ on the 11 × 15 conv feature map:
      $p_x' = p_x \cdot \frac{15}{480}$, $p_y' = p_y \cdot \frac{11}{360}$, $p_w' = p_w \cdot \frac{15}{480}$, $p_h' = p_h \cdot \frac{11}{360}$
    Bilinear interpolation & max pooling then convert RoIs of different sizes (labelled 5 × 8 and 3 × 9 in the figure) into 7 × 7 maps. Input for Classifier & Regressor: fixed-size conv feature map.
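The coordinate mapping in the RoI pooling slide rescales a proposal from image pixels to feature-map cells. A minimal sketch, assuming the scale factors are (feature-map size) / (image size), that is 15/480 horizontally and 11/360 vertically; `project_roi` is a hypothetical name, and the bilinear interpolation and max pooling steps are not shown:

```python
def project_roi(proposal, image_hw=(360, 480), fmap_hw=(11, 15)):
    """Map (p_x, p_y, p_w, p_h) from image pixels to feature-map units."""
    p_x, p_y, p_w, p_h = proposal
    s_y = fmap_hw[0] / image_hw[0]   # vertical scale, 11 / 360
    s_x = fmap_hw[1] / image_hw[1]   # horizontal scale, 15 / 480
    return p_x * s_x, p_y * s_y, p_w * s_x, p_h * s_y


# A proposal covering the whole image covers the whole feature map.
print(project_roi((0, 0, 480, 360)))
```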
  • 21. Classifier & Regressor: the RoI pooling layer generates the inputs for the Classifier & Regressor: 300 RoI-pooled feature maps, each of size 7 × 7 × 512.
  • 22. Classifier & Regressor: Proposal Classification & Bounding-box regression (per proposal). After RoI pooling, the 7 × 7 × 512 feature map of a proposal $(p_x, p_y, p_w, p_h)$ is flattened (7×7×512 → 4096) and passed through fully-connected layers, which output $(s_i, r_{x,i}, r_{y,i}, r_{w,i}, r_{h,i})$ for each class $i = 0, \cdots, 20$. These are converted to $(p_i, x_i, y_i, w_i, h_i)$:
      $x_i = p_x + p_w \cdot r_{x,i}$
      $y_i = p_y + p_h \cdot r_{y,i}$
      $w_i = p_w \cdot \exp(r_{w,i})$
      $h_i = p_h \cdot \exp(r_{h,i})$
      $p_i = \dfrac{\exp(s_i)}{\sum_{j=0}^{20} \exp(s_j)}$
    Example: $p_0 = 0.0124$ (Background), $p_{15} = 0.9797$ (Person), $p_{20} = 0.0001$ (TV monitor). Each of the 21 classes gets its own refined bounding-box prediction and an estimated probability.
  • 23. Classifier & Regressor: Summary of Classification & Regression. Steps shown: regress & classify each class from the region proposals (e.g. Background, Person, TV monitor, Dining table, ⋯); discard bounding boxes with p < 0.6 or classified as background (shown as None); reduce redundancy by NMS.
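The per-proposal classification and regression step, together with the "discard if p < 0.6 or background" rule, can be sketched as follows. Keeping only the single most probable class per proposal is a simplifying assumption, and the names `softmax` and `refine` are illustrative:

```python
import math


def softmax(scores):
    """Softmax over class scores s_0 .. s_N (index 0 is background)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine(proposal, scores, deltas, min_prob=0.6):
    """Return (class, probability, box) or None if the proposal is discarded.

    deltas[i] = (r_x, r_y, r_w, r_h) is the class-specific box regression.
    """
    p_x, p_y, p_w, p_h = proposal
    probs = softmax(scores)
    cls = max(range(len(probs)), key=probs.__getitem__)
    if cls == 0 or probs[cls] < min_prob:      # background or low confidence
        return None
    r_x, r_y, r_w, r_h = deltas[cls]
    box = (p_x + p_w * r_x, p_y + p_h * r_y,
           p_w * math.exp(r_w), p_h * math.exp(r_h))
    return cls, probs[cls], box


scores = [0.0] * 21
scores[15] = 10.0                              # strong "Person" score
deltas = [(0.0, 0.0, 0.0, 0.0)] * 21
print(refine((10.0, 20.0, 30.0, 40.0), scores, deltas))
```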
  • 24. Summary of Classifier & Regressor. Inputs: • Conv feature map • Region proposals. Outputs: • Bounding-box coordinates of the objects in the image. • Classification of the bounding boxes.
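  • 25. Training process for RPN. Ground-truth quantities are marked with $^*$; predictions are unmarked.
    Input • Image $X^{(k)}$
    Ground truth • Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$ • Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
    Anchor boxes: $A^{(k)} = \{A_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}$, where $A_j^{(k)} = (a_{x,j}^{(k)}, a_{y,j}^{(k)}, a_{w,j}^{(k)}, a_{h,j}^{(k)})$
    Ground-truth proposals associated with anchors $A_j^{(k)}$: find the nearest bounding box for each anchor, $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, A_j^{(k)})$.
    • Ground-truth probability of objectness: $p_j^{*(k)} := 1$ if $IoU(B_i^{(k)}, A_j^{(k)}) > 0.7$; $p_j^{*(k)} := 0$ if $IoU(B_i^{(k)}, A_j^{(k)}) < 0.3$
    • Ground-truth proposal transformation: $t_j^{*(k)} := (t_{x,j}^{*(k)}, t_{y,j}^{*(k)}, t_{w,j}^{*(k)}, t_{h,j}^{*(k)})$, where
      $t_{x,j}^{*(k)} = (x_i^{(k)} - a_{x,j}^{(k)}) / a_{w,j}^{(k)}$, $t_{y,j}^{*(k)} = (y_i^{(k)} - a_{y,j}^{(k)}) / a_{h,j}^{(k)}$,
      $t_{w,j}^{*(k)} = \log(w_i^{(k)} / a_{w,j}^{(k)})$, $t_{h,j}^{*(k)} = \log(h_i^{(k)} / a_{h,j}^{(k)})$
    Predicted proposals • Predicted probability of objectness: $p_j^{(k)}$ • Predicted proposal transformation: $t_j^{(k)} = (t_{x,j}^{(k)}, t_{y,j}^{(k)}, t_{w,j}^{(k)}, t_{h,j}^{(k)})$, where $\{(t_j^{(k)}, p_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = RPN(CNN(X^{(k)}; W_{CNN}); W_{RPN})$
    Loss:
      $L_{RPN}(p_j^{*(k)}, t_j^{*(k)}, p_j^{(k)}, t_j^{(k)}; W_{CNN}, W_{RPN}) = \frac{1}{2} \sum_{j=1}^{N_{batch}} H(p_j^{*(k)}, p_j^{(k)}) + \lambda_{RPN} \frac{1}{N_{anc}^{(k)}} \sum_{j=1}^{N_{batch}} p_j^{*(k)} \, smooth_{L1}(t_j^{*(k)}, t_j^{(k)})$
    where $H$ is the cross-entropy function and $smooth_{L1}(x) = 0.5x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise.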
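The ground-truth targets and the smooth L1 term used in RPN training can be sketched with three small helpers. The function names are hypothetical. Returning `None` for anchors whose IoU falls between 0.3 and 0.7 is an assumption (the slide leaves that case undefined), and the absolute value in `smooth_l1` follows the standard definition of the function:

```python
import math


def encode_target(box, anchor):
    """Ground-truth transformation of a box (x, y, w, h) relative to an anchor."""
    x, y, w, h = box
    a_x, a_y, a_w, a_h = anchor
    return ((x - a_x) / a_w, (y - a_y) / a_h,
            math.log(w / a_w), math.log(h / a_h))


def objectness_label(iou_value, hi=0.7, lo=0.3):
    """1 for IoU > 0.7, 0 for IoU < 0.3, None otherwise (assumed: not used)."""
    if iou_value > hi:
        return 1
    if iou_value < lo:
        return 0
    return None


def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    a = abs(x)
    return 0.5 * x * x if a < 1 else a - 0.5


# A box identical to its anchor needs no transformation.
print(encode_target((100, 100, 128, 128), (100, 100, 128, 128)))
print(smooth_l1(0.5), smooth_l1(-2.0))
```

`encode_target` is the inverse of the decoding equations on slide 9: decoding the encoded offsets against the same anchor recovers the original box.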
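  • 26. Training process for Classifier & Regressor. Ground-truth quantities are marked with $^*$; predictions are unmarked.
    Input • Image $X^{(k)}$
    Ground truth • Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$ • Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
    Region proposals associated with anchors $A_j^{(k)}$: $P^{(k)} := \{(P_j^{(k)}, p_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}}$, $P_j^{(k)} = (p_{x,j}^{(k)}, p_{y,j}^{(k)}, p_{w,j}^{(k)}, p_{h,j}^{(k)})$, where
      $p_{x,j}^{(k)} = a_{x,j}^{(k)} + a_{w,j}^{(k)} t_{x,j}^{(k)}$, $p_{y,j}^{(k)} = a_{y,j}^{(k)} + a_{h,j}^{(k)} t_{y,j}^{(k)}$,
      $p_{w,j}^{(k)} = a_{w,j}^{(k)} \exp(t_{w,j}^{(k)})$, $p_{h,j}^{(k)} = a_{h,j}^{(k)} \exp(t_{h,j}^{(k)})$,
    followed by $P^{(k)} \leftarrow NMS(P^{(k)}, N_{prop})$.
    Ground-truth classification & regression associated with proposals $P_j^{(k)}$: find the nearest bounding box for each proposal, $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, P_j^{(k)})$.
    • Ground-truth classification: $c_j^{*(k)} := (c_{j,0}^{*(k)}, \cdots, c_{j,N_{cls}}^{*(k)}) = (1, 0, \cdots, 0)$ if $IoU(B_i^{(k)}, P_j^{(k)}) < 0.5$, and $(0, \cdots, 0, 1, 0, \cdots, 0)$ otherwise, with the 1 in the $(C_i^{(k)} + 1)$-th component
    • Ground-truth regression: $r_j^{*(k)} := (r_{x,j}^{*(k)}, r_{y,j}^{*(k)}, r_{w,j}^{*(k)}, r_{h,j}^{*(k)})$, where
      $r_{x,j}^{*(k)} = (x_i^{(k)} - p_{x,j}^{(k)}) / p_{w,j}^{(k)}$, $r_{y,j}^{*(k)} = (y_i^{(k)} - p_{y,j}^{(k)}) / p_{h,j}^{(k)}$,
      $r_{w,j}^{*(k)} = \log(w_i^{(k)} / p_{w,j}^{(k)})$, $r_{h,j}^{*(k)} = \log(h_i^{(k)} / p_{h,j}^{(k)})$
    Predicted classification & regression • Predicted classification: $c_j^{(k)} = (c_{j,0}^{(k)}, \cdots, c_{j,N_{cls}}^{(k)})$ • Predicted regression: $r_j^{(k)} = (r_{x,j}^{(k)}, r_{y,j}^{(k)}, r_{w,j}^{(k)}, r_{h,j}^{(k)})$, where $\{(r_j^{(k)}, c_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = CR(CNN(X^{(k)}; W_{CNN}), P^{(k)}; W_{CR})$
    Loss:
      $L_{CR}(r_j^{*(k)}, c_j^{*(k)}, r_j^{(k)}, c_j^{(k)}; W_{CNN}, W_{CR}) = \sum_{j=1}^{N_{prop}} H(c_j^{*(k)}, c_j^{(k)}) + \lambda_{CR} \sum_{j=1}^{N_{prop}} (1 - c_{j,0}^{*(k)}) \, smooth_{L1}(r_j^{*(k)}, r_j^{(k)})$
    where $H$ is the cross-entropy function and $smooth_{L1}(x) = 0.5x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise.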
  • 27. The History of object detection in deep learning (timeline figure). Backbone networks: AlexNet (2012.12), VggNet & InceptionNet (2014.9), ResNet (15.12.10). Detectors shown: Yolo, Yolo v2, SSD, RCNN, Fast RCNN, Faster RCNN, Mask RCNN, DSSD. Dates marked on the timeline: 2013.11.11, 2015.4.30, 2015.5.14, 15.6.8, 15.12.25, 15.12.08, 17.1.23, 17.3.20.
  • 29. References
    • [Gitbooks] Object Localization and Detection https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/object_localization_and_detection.html
    • [ICCV2015 Tutorial] Convolutional Feature Maps https://courses.engr.illinois.edu/ece420/sp2017/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf
    • [Infographic] The Modern History of Object Recognition https://github.com/Nikasa1889/HistoryObjectRecognition
    • [Tensorflow Code] tf-Faster-RCNN https://github.com/kevinjliang/tf-Faster-RCNN
    • [Medium] A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4
    • [pyimagesearch] Intersection over Union (IoU) for object detection https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
    • [Stanford cs231n] Lecture 11: Detection and Segmentation http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
  • 30. Thank you. E-mail: hpkim0512@yonsei.ac.kr / Homepage: https://hpkim0512.blogspot.com