The document summarizes the CornerNet object detection method. CornerNet detects objects as pairs of top-left and bottom-right corners using a convolutional neural network. It introduces corner pooling to better localize corners and achieves state-of-the-art performance among single-stage detectors. The method formulates object detection as an association problem between corners using embeddings and outperforms other detectors on standard benchmarks with an average inference time of 244ms per image.
The center point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS.
PR-284: End-to-End Object Detection with Transformers (DETR), by Jinwon Lee
This is the 284th paper review of the TensorFlow Korea paper-reading group PR12.
This paper is DETR (DEtection with TRansformer) from Facebook.
It is also the top-ranked paper in arxiv-sanity's top recent/last year listing (http://www.arxiv-sanity.com/top?timefilter=year&vfilter=all).
With ViT recently submitted to ICLR 2021, there has been much talk about whether Transformers will now replace CNNs. This paper was published at ECCV this year, and although it still uses a CNN for feature extraction, I consider it an important paper that proposes an effective way to perform object detection with a transformer. The paper points out that detection pipelines rely heavily on heuristic, non-differentiable components such as anchor boxes and NMS (Non-Maximum Suppression), which is why object detection, unlike other problems, has resisted the end-to-end philosophy of deep learning. As a solution, it casts bounding-box prediction as a set prediction problem (no duplicates, order-invariant) and proposes an end-to-end algorithm based on transformers. If you want the details of the DETR algorithm, which needs neither anchor boxes nor NMS, please watch the video!
Video link: https://youtu.be/lXpBcW_I54U
Paper link: https://arxiv.org/abs/2005.12872
Slides for a study session given by Christian Saravia at Arithmer Inc.
It is a summary of CenterNet, a recent method for object detection.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research of modern mathematics and AI systems has the capability of providing solutions when dealing with tough complex issues. At Arithmer we believe it is our job to realize the functions of AI through improving work efficiency and producing more useful results for society.
You Only Look Once: Unified, Real-Time Object Detection, by Dadajon Jurakuziev
YOLO is a new approach to object detection: a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
Slides from the UPC reading group on computer vision about the following paper:
Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640 (2015).
Object detection is an important computer vision technique with applications in several domains such as autonomous driving, personal and industrial robotics. The below slides cover the history of object detection from before deep learning until recent research. The slides aim to cover the history and future directions of object detection, as well as some guidelines for how to choose which type of object detector to use for your own project.
Slides by Míriam Bellver at the UPC Reading group for the paper:
Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, and Scott Reed. "SSD: Single Shot MultiBox Detector." ECCV 2016.
Full listing of papers at:
https://github.com/imatge-upc/readcv/blob/master/README.md
Slides by Amaia Salvador at the UPC Computer Vision Reading Group.
Source document on GDocs with clickable links:
https://docs.google.com/presentation/d/1jDTyKTNfZBfMl8OHANZJaYxsXTqGCHMVeMeBe5o1EL0/edit?usp=sharing
Based on the original work:
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
PR-207: YOLOv3: An Incremental Improvement, by Jinwon Lee
This is the 207th paper review of the TensorFlow Korea paper-reading group PR12.
This paper is YOLO v3.
It is so well known that it hardly needs introduction; among object detection algorithms, YOLO is a very distinctive one-stage algorithm. The paper walks through, one by one, the changes applied since YOLO v2 (YOLO9000) to improve performance. It also criticizes MS COCO's averaged mAP metric and discusses how mAP should be evaluated; see the video for details.
Paper link: https://arxiv.org/abs/1804.02767
Video link: https://youtu.be/HMgcvgRrDcA
Yinyin Liu presents a model for object detection and localization called Fast R-CNN. She shows how to introduce an ROI pooling layer into neon, and how to add the PASCAL VOC dataset to interface with model training and inference. Lastly, Yinyin runs through a demo on how to apply the trained model to detect new objects.
Real-time object detection coz YOLO! by Shagufta Gurmukhdas
You Only Look Once is a state-of-the-art, high speed real-time object detection algorithm. It looks at the whole image at test time so its predictions are informed by global context in the image. This talk teaches you to develop your own application to detect and classify objects in images & videos.
1. Intro to the YOLO algorithm
2. Image detection on video with YOLO
3. Processing images in Python, adding bounding boxes and labels
4. Processing complete videos in Python, in a similar way to the previous section
5. Processing real-time video from a webcam
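The image-processing step above (adding bounding boxes and labels) is typically done with OpenCV's `cv2.rectangle`; as a dependency-free sketch of the same idea (the function name and layout here are illustrative, not from the talk), a box border can be painted directly onto a NumPy image array:

```python
import numpy as np

def draw_box(img, x1, y1, x2, y2, color=(0, 255, 0), thickness=2):
    """Paint a rectangular border onto an HxWx3 uint8 image array, in place."""
    img[y1:y1 + thickness, x1:x2] = color  # top edge
    img[y2 - thickness:y2, x1:x2] = color  # bottom edge
    img[y1:y2, x1:x1 + thickness] = color  # left edge
    img[y1:y2, x2 - thickness:x2] = color  # right edge
    return img

img = np.zeros((100, 100, 3), dtype=np.uint8)
draw_box(img, 10, 20, 60, 80)
```

In a real pipeline the same call would run once per detection returned by the model, with the class name rendered next to each box.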
Slides from Portland Machine Learning meetup, April 13th.
Abstract: You've heard all the cool tech companies are using them, but what are Convolutional Neural Networks (CNNs) good for and what is convolution anyway? For that matter, what is a Neural Network? This talk will include a look at some applications of CNNs, an explanation of how CNNs work, and what the different layers in a CNN do. There's no explicit background required so if you have no idea what a neural network is that's ok.
Introduction to Capsule Networks (CapsNets), by Aurélien Géron
CapsNets are a hot new architecture for neural networks, invented by Geoffrey Hinton, one of the godfathers of deep learning.
You can view this presentation on YouTube at: https://youtu.be/pPN8d0E3900
NIPS 2017 Paper:
* Dynamic Routing Between Capsules,
* by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
* https://arxiv.org/abs/1710.09829
The 2011 paper:
* Transforming Autoencoders
* by Geoffrey E. Hinton, Alex Krizhevsky and Sida D. Wang
* https://goo.gl/ARSWM6
CapsNet implementations:
* Keras w/ TensorFlow backend: https://github.com/XifengGuo/CapsNet-Keras
* TensorFlow: https://github.com/naturomics/CapsNet-Tensorflow
* PyTorch: https://github.com/gram-ai/capsule-networks
Book:
Hands-On Machine Learning with Scikit-Learn and TensorFlow
O'Reilly, 2017
Amazon: https://goo.gl/IoWYKD
Github: https://github.com/ageron
Twitter: https://twitter.com/aureliengeron
This is a presentation on the YOLO (You Only Look Once) object detection system, a state-of-the-art system that works very fast. The presentation is derived from the paper cited below:
@article{yolov3,
title={YOLOv3: An Incremental Improvement},
author={Redmon, Joseph and Farhadi, Ali},
journal = {arXiv},
year={2018}
}
The presentation explores the trend towards a scholarly communication system that is friendly to machines. It presents 3 exhibits illustrating the trend and 1 exhibit illustrating inertia in the system. It makes the point that machine-actionability can be much easier achieved if content and metadata are available in Open Access and under a permissive Creative Commons license. It also observes that even with content and metadata openly available, new costs related to advanced tools to explore the scholarly record will emerge. Finally, it points at significant challenges regarding the persistence of the scholarly record in light of increasingly interconnected and actionable content and advanced tools to interact with it.
The slides were used for a plenary presentation at the LIBER 2011 Conference in Barcelona, Spain, on June 30 2011.
"Towards a Science of Reproducible Science?" DPRMA Workshop talk at JCDL 2013, Indianapolis, 25th July 2013. Workshop website is http://dprma.oerc.ox.ac.uk/
Paper is
David De Roure. 2013. Towards computational research objects. In Proceedings of the 1st International Workshop on Digital Preservation of Research Methods and Artefacts (DPRMA '13). ACM, New York, NY, USA, 16-19. DOI=10.1145/2499583.2499590 http://doi.acm.org/10.1145/2499583.2499590
Software Defined Visualization (SDVis): Get the Most Out of ParaView* with OS..., by Intel® Software
Get an introduction to rasterization and ray tracing with demonstrations in ParaView*, a popular general-purpose scientific visualization environment. Get an overview of open-source software packages available to the open-science community.
Object detection is a central problem in computer vision and underpins many applications from medical image analysis to autonomous driving. In this talk, we will review the basics of object detection from fundamental concepts to practical techniques. Then, we will dive into cutting-edge methods that use transformers to drastically simplify the object detection pipeline while maintaining predictive performance. Finally, we will show how to train these models at scale using Determined’s integrated deep learning platform and then serve the models using MLflow.
What you will learn:
Basics of object detection including main concepts and techniques
Main ideas from the DETR and Deformable DETR approaches to object detection
Overview of the core capabilities of Determined’s deep learning platform, with a focus on its support for effortless distributed training
How to serve models trained in Determined using MLflow
MLconf - Distributed Deep Learning for Classification and Regression Problems..., by Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This contains the agenda of the Spark Meetup I organised in Bangalore on Friday, the 23rd of Jan 2014. It carries the slides for the talk I gave on distributed deep learning over Spark
Collaborations in the Extreme: The rise of open code development in the scie..., by Kelle Cruz
Video: https://www.simonsfoundation.org/event/collaborations-in-the-extreme-the-rise-of-open-code-development-in-the-scientific-community/
The internet is changing the scientific landscape by fostering international, interdisciplinary and collaborative software development. More than ever before, software is a crucial component of any scientific result. The ability to easily share code is reshaping expectations about reproducibility -- a fundamental tenet of the scientific process. In this lecture, Kelle Cruz will briefly provide the backstory of how these shifts have come about, describe some of the most impactful open source projects, and discuss efforts currently underway aimed at ensuring these community-led projects are sustainable and receive support.
Histolab: an Open Source Python Library for Reproducible Digital Pathology, by Alessia Marcolini
The histo-pathological analysis of tissue sections is the gold standard for assessing the presence of many complex diseases, such as tumors, and it is expected to be at the center of the AI revolution in medicine, a prediction supported by the increasing success of deep learning applications in digital pathology. The aim of histolab is to provide a tool for Whole Slide Image (WSI) processing in a reproducible environment to support clinical and scientific research. histolab is designed to handle WSIs, automatically detect the tissue, and retrieve informative tiles.
Performance evaluation of GANs in a semisupervised OCR use case, by Florian Wilhelm
Even in the age of big data, labeled data is a scarce resource in many machine learning use cases. Florian Wilhelm evaluates generative adversarial networks (GANs) when used to extract information from vehicle registrations under a varying amount of labeled data, compares the performance with supervised learning techniques, and demonstrates a significant improvement when using unlabeled data.
Performance evaluation of GANs in a semisupervised OCR use case, by inovex GmbH
Online vehicle marketplaces are embracing artificial intelligence to ease the process of selling a vehicle on their platform. The tedious work of copying information from the vehicle registration document into some web form can be automated with the help of smart text-spotting systems, in which the seller takes a picture of the document, and the necessary information is extracted automatically.
Florian Wilhelm details the components of a text-spotting system, including the subtasks of object detection and optical character recognition (OCR). Florian elaborates on the challenges of OCR in documents with various distortions and artifacts, which rule out off-the-shelf products for this task. After offering an overview of semisupervised learning based on generative adversarial networks (GANs), Florian evaluates the performance gains of this method compared to supervised learning. More specifically, for a varying amount of labeled data, he compares the accuracy of a convolutional neural network (CNN) to a GAN that uses additional unlabeled data during the training phase, showing that GANs significantly outperform classical CNNs in use cases with a lack of labeled data.
What you'll learn:
Understand how semisupervised learning with GANs works
Explore beneficial semisupervised methods based on GANs for use cases with a limited amount of labeled data
Gain insight into an interesting OCR use case of an online vehicle marketplace
Event: O'Reilly Artificial Intelligence Conference, London, 11.10.2018
Speaker: Dr. Florian Wilhelm
Mehr Tech-Vorträge: www.inovex.de/vortraege
Mehr Tech-Artikel: www.inovex.de/blog
Adjusting primitives for graph: SHORT REPORT / NOTES, by Subhajit Sahu
Graph algorithms such as PageRank typically operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting OpenMP PageRank: SHORT REPORT / NOTES, by Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
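The computation being parallelized here is ordinary PageRank power iteration; a minimal sequential sketch in plain Python (the OpenMP variants parallelize the per-vertex loops; the uniform redistribution of dead-end rank shown below is a common choice, assumed here rather than taken from the report):

```python
def pagerank(graph, d=0.85, iters=50):
    """graph: dict mapping each vertex to a list of its out-neighbors."""
    verts = list(graph)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in verts}  # teleport term
        for v, outs in graph.items():
            if outs:
                share = d * rank[v] / len(outs)  # push rank to out-neighbors
                for u in outs:
                    new[u] += share
            else:  # dead end: spread its rank uniformly over all vertices
                for u in verts:
                    new[u] += d * rank[v] / n
        rank = new
    return rank

r = pagerank({'a': ['b'], 'b': ['c'], 'c': ['a']})  # symmetric 3-cycle
```

On the symmetric 3-cycle every vertex converges to rank 1/3, and the ranks always sum to 1.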
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2..., by pchutichetpong
M Capital Group (“MCG”) expects demand to keep growing and supply to evolve, facilitated by institutional investment rotating out of offices and into work-from-home (“WFH”) assets, while the need for data storage expands with global internet usage; experts predict 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to expect strong annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ..., by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
6. Main Contributions
• CornerNet: Detecting objects as pairs of top-left and bottom-right corners
• Corner pooling to help better localize corners
• State-of-the-art performance among single-stage detectors
https://heilaw.github.io/
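As a rough illustration of the second contribution: top-left corner pooling takes, at every location, the maximum over all features to its right plus the maximum over all features below it, so corner evidence can be gathered even when no object appears locally. A NumPy sketch of the 2-D case (a simplified rendering of the idea, not the paper's CUDA implementation):

```python
import numpy as np

def top_left_corner_pool(f):
    """Top-left corner pooling on a 2D feature map: at each cell, the max over
    cells to the right plus the max over cells below (both inclusive)."""
    h, w = f.shape
    right = np.empty_like(f)
    below = np.empty_like(f)
    right[:, -1] = f[:, -1]
    for j in range(w - 2, -1, -1):          # scan right-to-left
        right[:, j] = np.maximum(f[:, j], right[:, j + 1])
    below[-1, :] = f[-1, :]
    for i in range(h - 2, -1, -1):          # scan bottom-to-top
        below[i, :] = np.maximum(f[i, :], below[i + 1, :])
    return right + below

f = np.array([[1.0, 0.0],
              [0.0, 3.0]])
out = top_left_corner_pool(f)  # [[2., 3.], [3., 6.]]
```

The bottom-right variant is symmetric, scanning left-to-right and top-to-bottom.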
8-10. CornerNet: Detecting Objects as Paired Keypoints (2. Introduction)
[Slide diagrams: for each candidate location, a ConvNet predicts whether it is a top-left corner and whether it is a bottom-right corner (Yes/No), the corner's class (e.g., Person), and whose corner it is; embeddings of corners belonging to the same object are trained with a distance loss, and embeddings of different objects with a similarity loss.]
https://heilaw.github.io/
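The "whose corner?" question is answered by comparing corner embeddings at test time: a top-left and a bottom-right corner belong to the same object if their embeddings are close. A toy sketch of that grouping idea (scalar embeddings, a greedy matching strategy, and the `max_dist` threshold are illustrative simplifications, not the paper's exact procedure):

```python
def pair_corners(top_lefts, bottom_rights, max_dist=0.5):
    """Each corner is (x, y, embedding). Greedily pair each top-left corner
    with the geometrically valid bottom-right whose embedding is closest."""
    pairs, used = [], set()
    for tx, ty, te in top_lefts:
        best, best_d = None, max_dist
        for i, (bx, by, be) in enumerate(bottom_rights):
            d = abs(te - be)
            # a valid bottom-right lies to the lower-right of the top-left
            if i not in used and d < best_d and bx >= tx and by >= ty:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            bx, by, _ = bottom_rights[best]
            pairs.append((tx, ty, bx, by))
    return pairs

boxes = pair_corners([(0, 0, 0.1), (5, 5, 0.9)],
                     [(4, 4, 0.12), (9, 9, 0.88)])
```

Here the corners with embeddings 0.1 and 0.12 form one box, and 0.9 and 0.88 the other.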
14. Two-Stage Detector (3. Related Works)
[Slide diagram: a 1st network generates Regions of Interest [Ren et al. NIPS'15], region pooling [Girshick, ICCV'15] extracts per-region features, and a 2nd network classifies each region (e.g., Person).]
R-CNN [Girshick et al. CVPR'14], SPP [He et al. ECCV'14], Mask R-CNN [He et al. ICCV'17], Cascade R-CNN [Cai & Vasconcelos, CVPR'18], SNIP [Singh & Davis, CVPR'18]
Faster R-CNN PR-012: https://youtu.be/kcPAGIgBGRs
Mask R-CNN PR-057: https://youtu.be/RtSZALC9DlU
https://heilaw.github.io/
15. One-Stage Detector (3. Related Works)
[Slide diagram: a ConvNet classifies dense anchors over the image directly, e.g., as Person or Background.]
YOLO9000 [Redmon & Farhadi, CVPR'17], DSOD [Shen et al. ICCV'17], SSD [Liu et al. ECCV'16], DSSD [Fu et al. arXiv'17], RetinaNet [Lin et al. ICCV'17], RefineDet [Zhang et al. CVPR'18]
YOLO PR-016: https://youtu.be/eTDcoeqj1_w
YOLO9000 PR-023: https://youtu.be/6fdclSGgeio
SSD PR-132: https://youtu.be/ej1ISEoAK5g
https://heilaw.github.io/
16. Anchor Boxes (3. Related Works)
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015. (https://arxiv.org/abs/1506.01497)
https://medium.com/@andersasac/anchor-boxes-the-key-to-quality-object-detection-ddf9d612d4f9
17. Drawbacks of Anchor Boxes
1. Need a large number of anchors so that at least one anchor sufficiently overlaps with the ground truth
Only a tiny fraction of anchors are positive examples
Slows down training [Lin et al. ICCV’17]
2. Extra hyperparameters – sizes and aspect ratios
19. 3.2 Detecting Corners
Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose
estimation." European Conference on Computer Vision. Springer, Cham, 2016.
20. 3.2 Detecting Corners
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE
international conference on computer vision. 2017.
Ground-Truth Annotation
21. 3.2 Detecting Corners
Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE international conference on computer vision. 2015.
Faster R-CNN
Bounding-box regression
o_k : offset for corner k
n : downsampling factor
(x_k, y_k) : coordinates of corner k
o_k = ( x_k/n − ⌊x_k/n⌋ , y_k/n − ⌊y_k/n⌋ )
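The offset definition can be made concrete with a small plain-Python sketch (not the authors' code): with downsampling factor n, a corner at (x_k, y_k) lands at heatmap cell (⌊x_k/n⌋, ⌊y_k/n⌋), and o_k is the fractional remainder that the network is trained to regress.

```python
import math

def corner_offset(x_k, y_k, n):
    """Ground-truth offset o_k for a corner at (x_k, y_k):
    the fractional part lost when mapping to the downsampled heatmap."""
    return (x_k / n - math.floor(x_k / n),
            y_k / n - math.floor(y_k / n))
```

For example, with n = 4 a corner at (103, 57) maps to heatmap cell (25, 14) with offset (0.75, 0.25); this offset is what the Smooth L1 loss regresses.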
25. Newell, Alejandro, Zhiao Huang, and Jia Deng. "Associative embedding: End-to-end learning for joint detection and grouping."
Advances in Neural Information Processing Systems. 2017.
3.3 Grouping Corners
26. 3.3 Grouping Corners
e_tk : embedding for the top-left corner of object k
e_bk : embedding for the bottom-right corner of object k
e_k : average of e_tk and e_bk
Δ : 1 (margin used in the push loss)
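CornerNet's pull/push embedding losses can be sketched as a toy pure-Python implementation (1-D embeddings as in the paper; an illustrative sketch, not the released code):

```python
def pull_push_losses(e_t, e_b, delta=1.0):
    """e_t, e_b: lists of 1-D embeddings for the top-left and
    bottom-right corners of N objects, paired by index."""
    N = len(e_t)
    e = [(t + b) / 2.0 for t, b in zip(e_t, e_b)]  # e_k: mean embedding per object
    # Pull: corners of the same object are trained toward their mean.
    l_pull = sum((t - ek) ** 2 + (b - ek) ** 2
                 for t, b, ek in zip(e_t, e_b, e)) / N
    if N < 2:
        return l_pull, 0.0
    # Push: mean embeddings of different objects are pushed at least delta apart.
    l_push = sum(max(0.0, delta - abs(e[k] - e[j]))
                 for k in range(N) for j in range(N) if j != k) / (N * (N - 1))
    return l_pull, l_push
```

With perfectly matched, well-separated embeddings both losses vanish; embeddings of different objects closer than Δ = 1 incur a push penalty.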
32. Set α and β to 0.1, and γ to 1 (weights for the pull/push and offset losses)
4 Experiments
4.1 Training Details
- Implementation in PyTorch https://github.com/princeton-vl/CornerNet
- Network is randomly initialized with no pretraining on any external dataset
- Input Resolution : 511 x 511, Output Resolution : 128 x 128
- Data augmentation : horizontal flipping, random scaling/cropping/color jittering
- Batch size : 49 (10 Titan X GPUs in total: 4 images on the master GPU, 5 on each of the remaining 9)
- For the ablation study : 250k iterations with a learning rate of 2.5 × 10⁻⁴
- For comparison with other detectors : an extra 250k iterations, reducing the learning rate to 2.5 × 10⁻⁵ for the last 50k iterations
33. 4 Experiments
4.2 Testing Details
A simple post-processing algorithm
1. Non-maximal suppression :
3 x 3 max pooling layer on the corner heatmap
2. Pick the top 100 top-left and top 100 bottom-right corners from the heatmaps
3. Adjust the corner locations by the corresponding offsets
4. Calculate L1 distances between the embeddings of the top-left and bottom-right corners
5. Reject pairs whose distance is greater than 0.5 or whose corners come from different categories
6. Use the average scores of the top-left and bottom-right corners as the detection scores
Generating bounding boxes
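The max-pooling NMS in step 1 can be sketched in pure Python: a score survives only if it equals the maximum of its own 3 x 3 neighbourhood (a toy version of the idea, not the authors' PyTorch code):

```python
def heatmap_nms(heat):
    """Keep only local maxima of a 2-D heatmap (list of lists):
    a score survives if it equals the max of its 3x3 neighbourhood,
    otherwise it is zeroed out."""
    H, W = len(heat), len(heat[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # Max over the (clipped) 3x3 window centred at (i, j).
            neigh = max(heat[ii][jj]
                        for ii in range(max(0, i - 1), min(H, i + 2))
                        for jj in range(max(0, j - 1), min(W, j + 2)))
            if heat[i][j] == neigh:
                out[i][j] = heat[i][j]
    return out
```

The surviving peaks are then ranked by score, and the top 100 per corner type move on to the pairing steps.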
36. Conclusion
• CornerNet: Detecting objects as pairs of top-left and bottom-
right corners
• Corner pooling to help better localize corners
• State-of-the-art performance among single-stage detectors
37. Further Discussion
• Other backbone?
• Occlusion between points?
• Corner Pooling
• Speed?
Corner pooling
38. The average inference time : 244ms per image on a Titan X (PASCAL) GPU (AP : 42.1)
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
39. REFERENCES
[1] Law, Hei, and Jia Deng. "Cornernet: Detecting objects as paired keypoints."
Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[2] Lin, Tsung-Yi, et al. "Focal loss for dense object detection."
Proceedings of the IEEE international conference on computer vision. 2017.
[3] Newell, Alejandro, Zhiao Huang, and Jia Deng. "Associative embedding: End-to-end learning for joint
detection and grouping." Advances in Neural Information Processing Systems. 2017.
[4] Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE international conference on computer vision. 2015.
[5] Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose
estimation." European Conference on Computer Vision. Springer, Cham, 2016.
Editor's Notes
Alright, let me begin the presentation. The paper I'm presenting today is
"CornerNet: Detecting Objects as Paired Keypoints", published at ECCV'18.
As you can tell from the title, it is an object detection paper.
Many object detection papers have been covered in PR-12,
so by now most of you will know that the problem object detection tries to solve is
localizing, with bounding boxes, what each of the multiple objects in an image is and where it is.
Simply put, it draws rectangles around things.
This paper tackles the object detection task as paired-keypoint detection:
a bounding box can be represented by two corner points, the top-left and the bottom-right,
so in the end the goal is to find the top-left and bottom-right corners of the bounding box.
You can think of it as applying almost the same approach as keypoint detection in human pose estimation.
I chose this paper because most object detection papers build on high-recall anchor boxes,
and I found it interesting that this one tries something different.
The authors released a lot of resources, so I referred to them heavily when making these slides.
This is a figure I've been putting to good use in every recent presentation:
someone very kindly listed up the deep-learning-based object detection algorithms,
and you can see the CornerNet paper sitting around here.
You can check it at this link.
First, the paper's main contributions are as follows.
It proposes an object detection network called CornerNet.
Its biggest distinction is that, unlike existing anchor-box-based networks, it is a new model that predicts bounding boxes from top-left and bottom-right corner points.
Second, it proposes a new pooling method called corner pooling that helps the network predict corners better.
With these methods, it achieves state-of-the-art performance among single-stage detectors in COCO mAP.
To look at it once more with a picture, the core idea of CornerNet is to predict a bounding box from these two keypoints,
and consequently it borrows most of its ideas from the human pose estimation side.
Roughly, how does it work? If we want to predict two points and obtain a bounding box and its class,
we predict the top-left corner point and the bottom-right corner point, and of course the class probabilities as well; in practice these are combined into maps of size W x H x numClasses.
The thing here in the middle is the embedding vector.
I'll come back to it later, but once the points are predicted,
we have to decide which top-left corner connects to which bottom-right corner.
Top-left and bottom-right corner points are matched using the similarity between these embedding vectors:
the network is trained so that these two take the same value
and those two take different values.
One of the network outputs, the heatmap, looks like this:
the left one is the heatmap for top-left corners and the right one for bottom-right corners.
Here the person class and the tennis racket class are shown in a single heatmap,
but there is actually one heatmap per class, so the network really holds W x H x C heatmaps,
and the person and the tennis racket would originally sit on different channels.
The experiments are on MS COCO, and CornerNet is reported to achieve SOTA COCO mAP among one-stage detectors.
Now, to give you the paper's narrative:
existing object detection algorithms can be broadly divided into
one-stage and two-stage algorithms.
Two-stage detectors are usually R-CNN-family algorithms
with a region proposal network attached at the front;
compared with one-stage detectors they are relatively slow but detect better.
Conversely, a one-stage network has a single network find both the anchor boxes and the classes;
its detection performance trails two-stage methods a little, but it is comparatively fast.
But whether one-stage or two-stage, these are usually anchor-box-based detectors:
they lay out an enormous number of candidate bounding boxes like this and hope that at least one of them hits.
From here on is the paper's argument.
So what is the problem with anchor-box-based approaches? An enormous number of anchor boxes are created in advance,
but in fact only a few out of thousands actually match an object.
This fundamentally causes a data imbalance between positive and negative boxes and slows down training.
Second, using anchor boxes also requires extra hyperparameters that need human heuristics.
Hence: let's drop anchor boxes and use corner points instead.
The network looks like this. The backbone is an hourglass network, a model proposed in the human pose estimation paper
"Stacked Hourglass Networks for Human Pose Estimation"; for reference, I believe it came from
the same lab as CornerNet. I don't know this part well and the paper doesn't go into it deeply,
so I'll cover the hourglass network properly if I get a chance to present that paper later.
In short, as in other detection papers, a representation is extracted by the hourglass backbone network,
after which it splits into two branches, one for the top-left corners and the other for
the bottom-right corners; each module predicts a heatmap containing the corners, embedding vectors, and offsets.
We saw the heatmaps and embeddings earlier; the offsets play almost the same role as the bounding-box regression in existing object detection algorithms:
they fine-tune the location of each point.
Now let's look at the loss function a bit, starting with L_det.
As I said earlier, the location of each point is kept as a W x H x numClasses heatmap.
When building the ground truth for this map, rather than marking just a single point, a Gaussian centered on that point is painted onto the ground-truth heatmap,
so the targets come out in this kind of shape.
The actual loss function is a modified focal loss; in my view this setting inevitably has a class imbalance problem too,
which I think is why focal loss is used. In the end it is a variant of the cross-entropy loss, except that since the targets are not binary,
this (1 − y) term appears: with the Gaussian, the negatives are not all exactly 0.
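CornerNet's modified focal loss over the corner heatmaps can be sketched in plain Python (a toy version over flattened heatmap values, with α = 2 and β = 4 as in the paper; not the released code):

```python
import math

def cornernet_focal_loss(preds, gts, alpha=2.0, beta=4.0):
    """preds, gts: flat lists of predicted probabilities and
    Gaussian-smoothed ground-truth values at the same heatmap locations.
    Normalized by N, the number of positive (y == 1) locations."""
    n_pos = sum(1 for y in gts if y == 1.0)
    loss = 0.0
    for p, y in zip(preds, gts):
        if y == 1.0:  # positive location: standard focal term
            loss -= (1 - p) ** alpha * math.log(p)
        else:         # negative: penalty reduced near positives by (1 - y)^beta
            loss -= (1 - y) ** beta * p ** alpha * math.log(1 - p)
    return loss / max(n_pos, 1)
```

The (1 − y)^β factor is exactly the "1 − y part": locations inside the Gaussian bump around a true corner are penalized less for firing.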
L_off is the loss for the offsets I mentioned earlier.
The reason for the offsets is that CornerNet's input and output sizes differ:
they differ by a factor of n, which introduces a corresponding localization error, and the offsets compensate for it.
The loss metric is the smooth L1 loss, taken over directly from the loss used in Faster R-CNN and various other
bounding-box regression setups. As far as I know, training does not go well with plain L1 or L2;
I'd have to think about it more to say exactly why.
Now the embedding loss remains:
the pull and push losses.
Put simply, they train same-object embeddings to be equal
and different-object embeddings to differ.
This training scheme is also taken from keypoint detection papers:
when linking keypoints in keypoint detection, an embedding vector is learned
so that similar keypoints are connected to each other.
This is another example from keypoint detection.
I had assumed the embedding vectors would be at least two-dimensional,
but like CornerNet, they use 1-D embeddings.
In this example there are nine people, including one mistaken one;
the paper shows that if you train the embeddings so that similar embedding vectors are linked when connecting keypoints,
it works quite well.
So that method is brought over as-is and applied to detection.
e_tk is the embedding of the top-left corner of the k-th object,
and e_bk is the embedding of the bottom-right corner of the k-th object.
e_k is the average of the two.
L_pull ultimately trains these two embeddings to move toward their average,
and L_push trains the average embeddings of different objects to move apart.
And finally, the paper proposes a technique called corner pooling.
Earlier, the backbone split into two branches, right?
One was the top-left corner pooling module and the other the bottom-right pooling module;
corner pooling is what goes into them.
The paper's phrasing is that a corner may lack local visual evidence.
Even in examples like this, the corner location is actually very far from the object, so in such cases
the idea is that it would help to look once horizontally and once vertically.
For a top-left corner, for instance, we take the largest value along this line here
and the largest value along this line there.
Looking again: this value 3 is the largest value from over here,
this 3 is the largest value from over there, and adding the two elementwise gives 6.
This way of assigning activations to corner pixel locations is called corner pooling.
For bottom-right corner pooling, you just do it the other way around.
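Top-left corner pooling (max to the right plus max downward, added elementwise) can be sketched as two running-max scans in pure Python; a toy version over lists, not the authors' CUDA/PyTorch implementation:

```python
def top_left_corner_pool(feat_h, feat_v):
    """feat_h, feat_v: two 2-D feature maps (lists of lists).
    For each location, take the max to the right in feat_h and the max
    downward in feat_v, then add the two results elementwise."""
    H, W = len(feat_h), len(feat_h[0])
    right_max = [[0.0] * W for _ in range(H)]
    down_max = [[0.0] * W for _ in range(H)]
    for i in range(H):                      # right-to-left: running max toward the right
        m = float("-inf")
        for j in range(W - 1, -1, -1):
            m = max(m, feat_h[i][j])
            right_max[i][j] = m
    for j in range(W):                      # bottom-to-top: running max downward
        m = float("-inf")
        for i in range(H - 1, -1, -1):
            m = max(m, feat_v[i][j])
            down_max[i][j] = m
    return [[right_max[i][j] + down_max[i][j] for j in range(W)]
            for i in range(H)]
```

Bottom-right corner pooling would flip both scan directions (max to the left plus max upward).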
There is an ablation study on corner pooling here:
with corner pooling, the AP of medium and large objects in particular rose markedly.
In practice it is implemented in PyTorch.
One peculiarity is that no pretrained network is used; the reason isn't given.
Earlier, the network's outputs were the heatmaps, embeddings, and offsets;
from these we now have to build the actual bounding boxes.
The paper says it uses a "simple post-processing algorithm", though there are quite a few steps.
First, the locally highest values are extracted with 3 x 3 max pooling,
and from those the top 100 top-left and bottom-right corners are picked.
The offsets are then added, and the distance between top-left and bottom-right embeddings is computed as the L1 distance.
Pairs whose distance exceeds 0.5 or whose classes differ are removed,
and bounding boxes are built from the paired points that remain.
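The pairing and filtering steps (L1 embedding distance at most 0.5, same class, averaged score) might look roughly like this toy sketch; the geometric in-order check is my own added sanity filter, not something stated on the slide:

```python
def pair_corners(tl, br, max_dist=0.5):
    """tl, br: lists of (x, y, class_id, embedding, score) corner candidates.
    Returns boxes (x1, y1, x2, y2, class_id, score) for surviving pairs."""
    boxes = []
    for (x1, y1, c1, e1, s1) in tl:
        for (x2, y2, c2, e2, s2) in br:
            # Reject cross-category pairs and distant embeddings (L1 > max_dist).
            if c1 != c2 or abs(e1 - e2) > max_dist:
                continue
            # Assumed sanity check: bottom-right must lie below-right of top-left.
            if x2 <= x1 or y2 <= y1:
                continue
            boxes.append((x1, y1, x2, y2, c1, (s1 + s2) / 2.0))
    return boxes
```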
The results say: ours works well.
It would have been nice to include some failure cases too;
that's a slight pity.
And compared against other SOTA detectors,
it is the highest among one-stage detectors
and is said to be comparable even to two-stage detectors.
To conclude once more:
this paper proposed a network called CornerNet,
a method for finding bounding boxes from the top-left and bottom-right corners.
A technique called corner pooling pushed the performance a bit higher,
and by COCO mAP it showed SOTA performance among single-stage detectors.
That about sums it up.
Among the questions in the oral session video,
one asked whether backbones other than the hourglass network had been tried;
the answer was that they tried models such as ResNet and ResNeXt, but performance was worse.
To the question of what happens when occlusion occurs between points, the answer was that there is no good remedy and it seems to be future work.
Personally, I'm a little suspicious of corner pooling: in settings like surveillance video where the same person class appears many times,
using that kind of pooling seems to risk mixing activations across different objects.
For example, if there is a 6 here, then everything becomes 6, right? That sort of thing could be a problem.
I also have questions about speed.
As I mentioned, one-stage detectors are usually compared in a way that emphasizes speed
while being slightly worse or comparable in accuracy, so I found it odd that there is no table with FPS.
Instead, there is a statement that inference time is about 244 ms per image on a Titan X,
which is disappointing for a one-stage detector. This plot is taken from the RetinaNet paper, and as far as I know
the GPU used for that benchmark was also Titan-X class; measured against it, CornerNet may be more accurate
than other one-stage detectors, but its inference time is very slow. So in the end I wondered
whether achieving SOTA among one-stage detectors on the COCO dataset means that much.
In this work, we propose CornerNet, a new one-stage detector that does away with anchor boxes. We reformulate object detection as detecting and grouping keypoints. In particular, we detect the top-left corners and bottom-right corners of bounding boxes, and pair them to form individual object instances.