2. Introduction
● A tricky challenge in object detection
○ A detector trained with a low IoU threshold produces noisy bounding boxes
○ A detector trained with a high IoU threshold (counterintuitively) shows degraded performance
[Figure: low IoU threshold -> noisy bboxes; high IoU threshold -> high quality bboxes (marked X)]
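Since everything on this slide hinges on the IoU threshold, a minimal sketch of how IoU and the positive/negative decision could be computed may help. The `(x1, y1, x2, y2)` box format, the function names and the `>=` comparison are assumptions for illustration, not taken from the slides.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_positive(proposal, gt_box, iou_threshold):
    """Assumed convention: a proposal is positive when its IoU reaches the threshold."""
    return iou(proposal, gt_box) >= iou_threshold


gt = (0.0, 0.0, 10.0, 10.0)
proposal = (2.0, 0.0, 12.0, 10.0)  # hypothetical box, shifted right by 2

print(round(iou(proposal, gt), 3))     # → 0.667
print(is_positive(proposal, gt, 0.5))  # → True
print(is_positive(proposal, gt, 0.7))  # → False
```

The same box counts as a positive example at threshold 0.5 and as a negative at 0.7, which is the lever the rest of the deck is about.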
3. Why does a high IoU threshold lead to worse performance?
● Overfitting: the number of positive examples vanishes exponentially when training with a high IoU threshold
○ A high threshold is pickier about bounding box quality
● Mismatch between the IoUs for which the detector is optimal and those of the input hypotheses during inference
○ Suppose a detector is trained at an IoU of 0.5 but evaluated in a competition whose criterion is 0.7: there is a mismatch.
[Figure: proposals with IoUs 0.6, 0.6, 0.69, 0.8, 0.9, 0.6 and 0.68, judged at IoU threshold 0.7. Caption: “It’s time to test with IoU 0.7!” Detector: “But I was trained with IoU 0.5..”]
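The vanishing-positives point can be checked directly on the seven IoU values shown on this slide. The sketch below only counts how many of them survive at each threshold; the helper name and the `>=` convention are assumptions for illustration.

```python
# IoU values as listed on the slide.
proposal_ious = [0.6, 0.6, 0.69, 0.8, 0.9, 0.6, 0.68]


def count_positives(ious, threshold):
    """Number of proposals whose IoU reaches the threshold (assumed >= convention)."""
    return sum(1 for value in ious if value >= threshold)


for threshold in (0.5, 0.6, 0.7):
    print(threshold, count_positives(proposal_ious, threshold))
# → 0.5 7
# → 0.6 7
# → 0.7 2
```

Raising the threshold from 0.5 to 0.7 leaves only two of the seven proposals as positives, so a detector trained at 0.7 sees far fewer positive examples.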
4. Motivations of Cascade R-CNN
[1] A detector optimized at a single IoU level is not necessarily optimal at other levels
[2] A detector can only have high quality predictions if presented with high quality proposals (e.g. from RPN)
[Figure: IoU of RPN bboxes with the GT]
5. Motivations of Cascade R-CNN
● Just increasing the IoU threshold doesn’t solve the problem.
1. The distribution of bounding box quality from a proposal network is heavily imbalanced towards low quality. If you increase the IoU threshold, a lot of examples are wiped out, resulting in overfitting.
2. High quality detectors are only optimal for high quality proposals. A large distribution gap between the RPN and the detection head leads to a mismatch.
[Figure: RPN -> Detection Head, with high quality and low quality proposals. Goal: mAP70]
6. Cascade R-CNN
● Cascade R-CNN “stages” are trained sequentially with increasing IoU thresholds, using the output of one stage to train the next, so that each stage is more selective against close false positives
○ Let each detector handle a single IoU!
● The output of one detector is a “good distribution” for training the next, higher quality detector
● The same cascade procedure is also applied at inference
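In symbols, the bullets above can be sketched as a composition of per-stage box regressors. The notation is assumed for illustration and is not defined on the slides: $f_t$ is the regressor of stage $t$, $u_t$ its training IoU threshold, $x$ the image features, and $b^0$ a proposal from the RPN.

```latex
\begin{align}
  b^{t} &= f_t\left(x,\, b^{t-1}\right), \qquad t = 1, \dots, T \\
  f(x, b^{0}) &= f_T \circ f_{T-1} \circ \cdots \circ f_1\,(x, b^{0}) \\
  u_1 &< u_2 < \cdots < u_T
\end{align}
```

With the three stages on this slide, $T = 3$ and $(u_1, u_2, u_3) = (0.5, 0.6, 0.7)$.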
[Figure: Stage 1 (IoU 0.5) -> Stage 2 (IoU 0.6) -> Stage 3 (IoU 0.7); values shown: 0.3, 0.7, 0.3]
● H0: RPN (proposal network)
● H1: Detection Head
● C: Classification score
● B: Bounding box predictions
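The resampling idea on this slide can be illustrated with a toy sketch. It is not the authors' implementation: `toy_refine` is a hypothetical stand-in for a trained box regressor and simply moves each box halfway toward the ground truth (a real regressor never sees the ground truth), and the four proposals are made-up boxes. The point is only that boxes refined by one stage still supply positives at the next, higher threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def toy_refine(box, gt_box, step=0.5):
    """Hypothetical regressor stand-in: move every coordinate a fraction toward the GT."""
    return tuple(c + step * (g - c) for c, g in zip(box, gt_box))


def positives_per_stage(proposals, gt_box, thresholds=(0.5, 0.6, 0.7)):
    """Count positives at each stage's threshold, then pass refined boxes to the next stage."""
    boxes = list(proposals)
    counts = []
    for threshold in thresholds:
        n_pos = sum(1 for b in boxes if iou(b, gt_box) >= threshold)
        counts.append((threshold, n_pos))
        boxes = [toy_refine(b, gt_box) for b in boxes]  # output of this stage feeds the next
    return counts


gt = (0.0, 0.0, 10.0, 10.0)
# Made-up proposals: the GT box shifted right by 1, 2, 3 and 4 units.
proposals = [(s, 0.0, 10.0 + s, 10.0) for s in (1.0, 2.0, 3.0, 4.0)]

print(positives_per_stage(proposals, gt))
# → [(0.5, 3), (0.6, 4), (0.7, 4)]

# Without the cascade, the raw proposals give far fewer positives at 0.7.
print(sum(1 for b in proposals if iou(b, gt) >= 0.7))
# → 1
```

In this toy setting a single detector trained at 0.7 on the raw proposals would get one positive, while the third cascade stage gets four, which is the sense in which each stage's output is a better training distribution for the next.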
7. Summary of Target Problems
[1] A detector optimized at a single IoU level is not necessarily optimal at other levels
[2] A detector can only have high quality predictions if presented with high quality proposals (e.g. from RPN) -> large distribution gap between RPN and Detection Head
[3] Overfitting - Vanishing positive examples
[4] Mismatch between the IoUs for which the detector is optimal and those of the input hypotheses during inference
8. [Figure: Stage 1 (IoU 0.5) -> Stage 2 (IoU 0.6) -> Stage 3 (IoU 0.7)]
[1] A detector optimized at a single IoU level is not necessarily optimal at other levels
[2] A detector can only have high quality predictions if presented with high quality proposals (e.g. from RPN) -> large distribution gap between RPN and Detection Head
[3] Overfitting - Vanishing positive examples
[4] Mismatch between the IoUs for which the detector is optimal and those of the input hypotheses during inference
9. [1] A detector optimized at a single IoU level is not necessarily optimal at other levels
[2] A detector can only have high quality predictions if presented with high quality proposals (e.g. from RPN) -> large distribution gap between RPN and Detection Head
[3] Overfitting - Vanishing positive examples
[4] Mismatch between the IoUs for which the detector is optimal and those of the input hypotheses during inference
[Figure: Stage 1 (IoU 0.5) -> Stage 2 (IoU 0.6) -> Stage 3 (IoU 0.7), resampling the distribution to higher quality]
10. [1] A detector optimized at a single IoU level is not necessarily optimal at other levels
[2] A detector can only have high quality predictions if presented with high quality proposals (e.g. from RPN) -> large distribution gap between RPN and Detection Head
[3] Overfitting - Vanishing positive examples
[4] Mismatch between the IoUs for which the detector is optimal and those of the input hypotheses during inference
[Figure: Stage 1 (IoU 0.5) -> Stage 2 (IoU 0.6) -> Stage 3 (IoU 0.7) at inference]
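At inference the same chain is applied: each stage receives the boxes produced by the previous one. The sketch below shows only that control flow. The head interface (a callable taking boxes and returning scores and boxes), `make_toy_head`, and the choice to return the last stage's scores are all assumptions for illustration, not the authors' code.

```python
def cascade_inference(proposals, heads):
    """Run proposals through each stage head in order; each head refines the previous boxes."""
    boxes = list(proposals)
    scores = [None] * len(boxes)
    for head in heads:
        scores, boxes = head(boxes)  # C (classification scores) and B (boxes) of this stage
    return scores, boxes


def make_toy_head(dx, score):
    """Hypothetical head: shifts every box by dx along x and returns a fixed score."""
    def head(boxes):
        new_boxes = [(x1 + dx, y1, x2 + dx, y2) for (x1, y1, x2, y2) in boxes]
        return [score] * len(new_boxes), new_boxes
    return head


# Three toy stages standing in for the heads trained at IoU 0.5, 0.6 and 0.7.
heads = [make_toy_head(-1.0, 0.6), make_toy_head(-1.0, 0.8), make_toy_head(-1.0, 0.9)]
scores, boxes = cascade_inference([(3.0, 0.0, 13.0, 10.0)], heads)
print(scores, boxes)
# → [0.9] [(0.0, 0.0, 10.0, 10.0)]
```

Because the later heads only ever see boxes already refined by earlier ones, their inputs at test time resemble the higher quality boxes they were trained on, which addresses mismatch [4].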