Tackling Open Images Challenge (2019)

Mobility Technologies Co., Ltd.
Tackling Open Images Challenge
- presented at the 26th Symposium on Sensing via Image Information
June 12, 2020
Hiroto Honda, Mobility Technologies Co., Ltd.

1 About Me

About Me
Hiroto Honda
https://hirotomusiker.github.io/
Kaggle name: Schwert
‘Schwert’ = sword in German
R&D of imaging devices at a Japanese electronics company
→ DeNA computer vision team → Mobility Technologies

Check out my Blog Series!
https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd
Digging into Detectron 2 (object detection)

2 Kaggle and Open Images Challenge

How to Try Kaggle
[Figure: Train Data, Val Data, and Test Data; Test Data → public leaderboard, → private leaderboard]
How can you maximize your model’s score on the HIDDEN test data?
Evaluation metrics (mean average precision, Dice coefficient, and so on) are described in the ‘Evaluation’ section. Sometimes non-standard metrics are employed and discussed in the ‘Discussion’ threads.
Cross Validation and Test Data
[Figure: the labeled data split into Train Data and Val Data in several different ways (folds)]
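Cross validation can be sketched as follows: the labeled images are shuffled and split into k folds, and each fold serves once as Val Data while the remaining folds form the Train Data. This is a minimal, generic Python sketch; the function name `k_fold_splits` and the image IDs are hypothetical, and it ignores stratification by class, which an imbalanced dataset may need.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_splits(items: Sequence[str], k: int = 5,
                  seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return k (train, val) splits; every item lands in exactly one val fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Round-robin assignment of the shuffled items into k folds.
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, val))
    return splits


# Hypothetical image IDs for illustration.
images = [f"img_{i}" for i in range(10)]
for train, val in k_fold_splits(images, k=5):
    print(len(train), len(val))
```

Averaging the val score over the k splits gives a more stable estimate of how a model may score on the hidden test data than a single split does.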

Open Images Dataset (v5):
9 million images collected from Flickr
・16M bounding box annotations of 600 classes on 1.9M images
・Segmentation masks on instances of 350 classes
・329 types of inter-object relationships
Open Images Challenge
https://storage.googleapis.com/openimages/web/challenge.html
https://www.kaggle.com/c/open-images-2019-object-detection/

How Huge Is the Open Images Dataset?
1 GB of bounding box data!! (on 500 GB of image data)

3 How to Tackle Object Detection Challenges

Object Detection
- detects object positions, sizes and classes from an image
- tremendous success of deep-learning-based approaches
(e.g. Faster R-CNN, YOLO, and EfficientDet)

Okay, Why Not Code Object Detectors Yourself?
NOT RECOMMENDED!

What an Object Detector Looks Like

[Figure: Backbone Network → Region Proposal Network → ROI Head]
The accuracy reported in papers is achieved by managing more than 100 config parameters.
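To give a feel for how these parameters spread across the three stages, here is a small, hypothetical config in Python. The key names and values are illustrative assumptions, loosely typical of two-stage detectors, and do not follow the exact schema of any framework. Note that `rpn.nms_thresh` and `roi_head.nms_thresh` are different knobs with the same name, an easy source of mistakes.

```python
from typing import Any, Dict

# Hypothetical config: keys and values are illustrative, not any framework's schema.
DETECTOR_CONFIG: Dict[str, Any] = {
    "backbone": {"depth": 50, "frozen_stages": 1, "fpn_out_channels": 256},
    "rpn": {
        "anchor_sizes": [32, 64, 128, 256, 512],
        "aspect_ratios": [0.5, 1.0, 2.0],
        "nms_thresh": 0.7,        # NMS among region proposals
        "post_nms_topk": 1000,
    },
    "roi_head": {
        "batch_size_per_image": 512,
        "positive_fraction": 0.25,
        "score_thresh": 0.05,
        "nms_thresh": 0.5,        # NMS among final detections
        "detections_per_image": 100,
    },
    "solver": {"base_lr": 0.02, "lr_steps": [60000, 80000], "max_iter": 90000},
}


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys such as 'rpn.nms_thresh'."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


for name, value in sorted(flatten(DETECTOR_CONFIG).items()):
    print(f"{name} = {value}")
```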

How Hard It Was to Reproduce YOLOv3 in PyTorch
It took months to perfectly reproduce the original repo’s accuracy.
Implementation details such as weight initialization, loss definition, and lr schedule are critical.
https://github.com/DeNA/PyTorch_YOLOv3
blog: https://medium.com/@hirotoschwert/reproducing-training-performance-of-yolov3-in-pytorch-part-0-a792e15ac90d
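As an illustration of how much a "small" detail like the lr schedule can matter, the sketch below implements a polynomial warm-up ("burn-in") followed by step decay, in the style of darknet’s YOLOv3 config. The default values (`base_lr`, `burn_in`, `power`, `steps`, `scale`) are assumptions for illustration, not a verified copy of the original repo’s settings.

```python
from typing import Sequence


def learning_rate(it: int, base_lr: float = 0.001, burn_in: int = 1000,
                  power: float = 4.0, steps: Sequence[int] = (400000, 450000),
                  scale: float = 0.1) -> float:
    """Polynomial warm-up for the first `burn_in` iterations, then step decay."""
    if it < burn_in:
        # Warm-up: lr grows as (it / burn_in) ** power from 0 to base_lr.
        return base_lr * (it / burn_in) ** power
    lr = base_lr
    for s in steps:
        if it >= s:
            lr *= scale  # multiply by `scale` at each step boundary already passed
    return lr


for it in (0, 500, 1000, 400000, 450000):
    print(it, learning_rate(it))
```

With `power=4`, the lr at iteration 500 is 0.0625 × base_lr; with `power=1` it would be 0.5 × base_lr, an 8× difference early in training from a single parameter.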

You Should Care About Tiny Accuracy Differences
Model                                                      AP     Venue
A: Faster R-CNN, Res50                                     34.8   NIPS’15
B: Faster R-CNN, Res50 + Feature Pyramid Network           36.7   CVPR’17
C: RetinaNet (single-shot), Res50 + Feature Pyramid
   Network + Focal Loss                                    35.7   ICCV’17
Model B from a non-official repo, with AP = 33.0, is less accurate than the official model A.

Popular and Reliable Detection Frameworks
MMDetection (CUHK)
https://github.com/open-mmlab/mmdetection
Detectron 2 (Facebook)
https://github.com/facebookresearch/detectron2
automl/efficientdet (Google)
https://github.com/google/automl/tree/master/efficientdet
tpu/models (Google)
https://github.com/tensorflow/tpu/tree/master/models/official
R. Wightman repos (TF → PyTorch, non-official)
https://github.com/rwightman
Authors’ official repos are generally recommended.
Schwert used maskrcnn-benchmark for the competition.

How to Choose Approaches for a Large-scale Detection Competition
It takes 1 GPU-month to train one model!
One attempt is very costly...

1: Last year’s solutions
2: Detection papers (CVPR, ICCV, ...)
3: Benchmark websites such as Papers with Code
are good resources for finding an “Exclusive Feature that Apparently Contributes to the score” (EFAC).

Example of Bad Experiment
“Looks like ResNet50 works...”
“OK, let’s try ResNeXt101...”
“...and why not add random cropping?”
[Figure: model 1 (baseline) + new feature A + new feature B → model 2]
It is important to add or remove one exclusive feature at a time!
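One way to enforce this discipline is to generate experiments mechanically from the baseline, changing exactly one key per run, so that any score change can be attributed to that key alone. A minimal Python sketch; the function name and the config keys are hypothetical.

```python
from typing import Dict, List


def one_at_a_time(baseline: Dict[str, object],
                  candidates: Dict[str, object]) -> List[Dict[str, object]]:
    """Return one config per candidate, each differing from the baseline in one key."""
    experiments = []
    for key, value in candidates.items():
        cfg = dict(baseline)  # copy so the baseline itself stays untouched
        cfg[key] = value
        experiments.append(cfg)
    return experiments


baseline = {"backbone": "ResNet50", "random_crop": False}
for cfg in one_at_a_time(baseline, {"backbone": "ResNeXt101", "random_crop": True}):
    print(cfg)
```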

4 Schwert’s Solution

Schwert’s ranks:
Detection Track: 6th / 558 (Gold) [1] [2]
Segmentation Track: 11th / 193 (Silver) [3]
Relationship Track: 30th / 201 (Silver)
Results of Open Images Competition (2019)
#   Team Name        # of members   Score
1   MMfruit          5              0.65887
2   imagesearch      7              0.65337
3   Prisms           6              0.64214
4   PFDet            6              0.62221
5   Omni-Detection   3              0.60406
6   Schwert          1 (solo)       0.60231
7   Team 5           5              0.60210
8   pudae            1 (solo)       0.59727
Got a solo gold medal in my first Kaggle competition!

“An Exclusive Feature that Apparently Contributes to the score” (EFAC)
EFAC examples from the solution writeups of Open Images 2018 [4][5][6]:
・class balancing (3rd, 5 pts↑)
・Ensemble (1st / 3rd, 5 pts↑)
・voting NMS (1st / 3rd)
・long cosine annealing (2nd)
・parent class expansion
・ResNeXt152 + SE (1st, 2nd, 3rd)
Class balancing and model ensembling are essential.

Evaluation Metrics
Mean Average Precision (mAP) at IoU > 0.5, averaged over 500 classes
1: EVERY class is equal, even if it’s extremely rare.
   images including ‘person’ instances: 250,000
   images including ‘torch’ instances: 18
2: Strict localization is not required.
   Classification matters...
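The metric can be sketched in a few lines: IoU decides whether a detection matches a ground-truth box, AP is computed per class, and mAP is the plain mean over classes, so a class with 18 images weighs as much as one with 250,000. This is a simplified sketch assuming all-point interpolated AP, greedy matching at IoU > 0.5, and all boxes of a class coming from a single image; the official Open Images evaluator additionally handles group-of boxes, the class hierarchy, and image-level labels, which are omitted here.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[float, Box]            # (score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection], gts: List[Box], thr: float = 0.5) -> float:
    """AP for one class (single image assumed), all-point interpolation."""
    if not gts:
        return 0.0  # undefined without ground truth; mean_ap skips such classes
    matched = [False] * len(gts)
    hits: List[int] = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        overlaps = [iou(box, g) for g in gts]
        j = max(range(len(gts)), key=lambda k: overlaps[k])
        if overlaps[j] > thr and not matched[j]:
            matched[j] = True
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive: low IoU or duplicate match
    precisions, recalls, tp = [], [], 0
    for n, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / n)
        recalls.append(tp / len(gts))
    for i in range(len(precisions) - 2, -1, -1):  # make precision non-increasing
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap(per_class: Dict[str, Tuple[List[Detection], List[Box]]],
            thr: float = 0.5) -> float:
    """Unweighted mean of per-class AP: every class counts equally."""
    aps = [average_precision(d, g, thr) for d, g in per_class.values() if g]
    return sum(aps) / len(aps) if aps else 0.0


per_class = {
    "person": ([(0.9, (0, 0, 10, 10)), (0.8, (20, 20, 30, 30))],
               [(0, 0, 10, 10), (20, 20, 30, 30)]),
    "torch": ([(0.7, (50, 50, 60, 60))], [(0, 0, 5, 5)]),
}
print(mean_ap(per_class))  # -> 0.5
```

The frequent class is detected perfectly (AP 1.0) and the rare class is missed (AP 0.0), yet the rare class still pulls mAP down to 0.5.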

Method 1: Class Balancing [1]
- Equal probability for the model to encounter each class.
- Rare classes: increase sampling rate.
- Non-rare classes: limit number of images.
- Total number of images: 4k x 500 (2M) → efficient training
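A minimal sketch of this sampling in Python, assuming the training set is indexed as a mapping from class name to the IDs of images containing that class (a hypothetical structure): frequent classes are capped at `per_class` images, and rare classes are repeated until they reach it.

```python
import random
from typing import Dict, List, Tuple


def balance_classes(class_to_images: Dict[str, List[str]], per_class: int = 4000,
                    seed: int = 0) -> List[Tuple[str, str]]:
    """Return (class, image_id) pairs with exactly `per_class` entries per class."""
    rng = random.Random(seed)
    samples: List[Tuple[str, str]] = []
    for cls in sorted(class_to_images):
        images = class_to_images[cls]
        if not images:
            continue
        if len(images) >= per_class:
            # Frequent class: limit the number of images.
            chosen = rng.sample(images, per_class)
        else:
            # Rare class: repeat every image, then top up with a random subset.
            repeats, remainder = divmod(per_class, len(images))
            chosen = images * repeats + rng.sample(images, remainder)
        samples.extend((cls, img) for img in chosen)
    rng.shuffle(samples)
    return samples
```

With 500 classes and `per_class=4000`, the list has 2,000,000 entries, matching the 4k x 500 (2M) figure; an image containing several classes can appear under each of them.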

Method 2: Ensembling Pipeline of Multiple Models [1]
・Baseline model: ResNeXt152 [7] + Deformable ConvNets v2 [8] + Feature Pyramid Network [9]
・Train different types of models on training data with different seeds
・8 models are ensembled

Ablation Study
Contribution of each exclusive feature on val and leaderboard accuracies
Backbone     Deformable Convolutions   Parent Expansion   Data Size       val AP         private LB
ResNeXt101   None                      Inference Time     4k per class    69.8           54.0
ResNeXt101   DCN v2                    Inference Time     4k per class    72.2 (+2.4)
ResNeXt152   None                      Inference Time     4k per class    72.2 (+2.4)
ResNeXt152   None                      Inference Time     16k per class   72.4 (+2.6)
ResNeXt152   DCN v2                    Inference Time     4k per class    73.2 (+3.4)    56.4 (best single model)
ResNeXt152   None                      Training Time      4k per class    72.4 (+2.6)*
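The "Parent Expansion" column refers to the Open Images label hierarchy, in which a box of a child class (for example, ‘Car’) also counts for its ancestor classes. Assuming inference-time expansion means copying each detection to every ancestor class, a sketch looks like this; the three-level hierarchy is a toy example, not the real Open Images hierarchy file.

```python
from typing import Dict, List, Optional, Tuple

Detection = Tuple[str, float, Tuple[float, float, float, float]]  # (label, score, box)

# Toy hierarchy for illustration only: child label -> parent label.
TOY_PARENT_OF: Dict[str, Optional[str]] = {
    "Car": "Land vehicle",
    "Land vehicle": "Vehicle",
    "Vehicle": None,
}


def ancestors(label: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Walk up the hierarchy and return all ancestors, nearest first."""
    result = []
    parent = parent_of.get(label)
    while parent is not None:
        result.append(parent)
        parent = parent_of.get(parent)
    return result


def expand_to_parents(dets: List[Detection],
                      parent_of: Dict[str, Optional[str]]) -> List[Detection]:
    """Keep every detection and add a copy (same score and box) per ancestor."""
    expanded = list(dets)
    for label, score, box in dets:
        expanded.extend((anc, score, box) for anc in ancestors(label, parent_of))
    return expanded


print(expand_to_parents([("Car", 0.9, (0, 0, 1, 1))], TOY_PARENT_OF))
```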

Method 3: Enhanced (Voting) NMS [6]
Non-Maximum Suppression for Model Ensembling
When multiple boxes from different models overlap, the resulting box earns added confidence scores.
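Pooling the detections of all models and then running NMS can be sketched as below, following the description above: each kept box absorbs the scores of the boxes it suppresses, so a box that several models agree on outranks an isolated, slightly more confident one. This is a minimal sketch; the IoU threshold of 0.5 and the plain sum of scores are assumptions, and the exact voting rule in [6] may differ. It is meant to run per image and per class. Summed scores can exceed 1, which is harmless for AP because AP depends only on the ranking of detections within a class.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def voting_nms(boxes: List[Box], scores: List[float],
               iou_thr: float = 0.5) -> List[Tuple[Box, float]]:
    """Greedy NMS where each kept box absorbs the scores of the boxes it suppresses."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[Tuple[Box, float]] = []
    while order:
        top = order.pop(0)
        voters = {j for j in order if iou(boxes[top], boxes[j]) > iou_thr}
        order = [j for j in order if j not in voters]
        kept.append((boxes[top], scores[top] + sum(scores[j] for j in voters)))
    return sorted(kept, key=lambda k: k[1], reverse=True)


# Pool one class's detections on one image from two hypothetical models.
model_a = [((0, 0, 10, 10), 0.6), ((50, 50, 60, 60), 0.7)]
model_b = [((1, 1, 10, 10), 0.5)]
pooled = model_a + model_b
result = voting_nms([b for b, _ in pooled], [s for _, s in pooled])
print(result)
```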

Result of 8-Model Ensembling
Backbone                           Deformable Convolutions   Parent Expansion   Data Size      val AP        private LB
ResNeXt152                         DCN v2                    Inference Time     4k per class   73.2 (+3.4)   56.4 (best single model)
Ensemble of 8 models + NMS tuned                                                                              60.23
~13th place → 6th place!

Visualization Demo of the Best Single Model

Schwert’s Approach on Segmentation Track (11th Place) [3]
Detection and segmentation models are trained independently.
[Figure: inference results using the detection model]

5 Take-Home Messages

・Kaggle is a wonderful platform where you can learn cutting-edge computer vision methods and implementations. Discussions with great Kagglers are always fun.
・As in research, developing (or surpassing) state-of-the-art methods is a tough but fun job.
・Choosing a reliable framework is a must for object detection competitions.
・Understand past solutions and pick an Exclusive Feature that Apparently Contributes to the score (EFAC).

References
[1] Hiroto Honda, “The 6th Place Solution for the Open Images 2019 Object Detection Track,” presented at ICCVW 2019, https://hirotomusiker.github.io/files/schwert_open_images_6th_solution_v1.pdf
[2] Hiroto Honda, “6th place solution,” discussion in Open Images 2019 Object Detection Track, https://www.kaggle.com/c/open-images-2019-object-detection/discussion/110953
[3] Hiroto Honda, “11th place solution,” discussion in Open Images 2019 Instance Segmentation Track, https://www.kaggle.com/c/open-images-2019-instance-segmentation/discussion/111351
[4] kivajok, “1st place writeup,” https://storage.googleapis.com/openimages/web/challenge.html
[5] Takuya Akiba et al., “PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track,” arXiv:1809.00778
[6] Yuan Gao et al., “Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance,” arXiv:1810.06208
[7] Saining Xie et al., “Aggregated Residual Transformations for Deep Neural Networks,” CVPR 2017
[8] Xizhou Zhu et al., “Deformable ConvNets v2: More Deformable, Better Results,” CVPR 2019
[9] Tsung-Yi Lin et al., “Feature Pyramid Networks for Object Detection,” CVPR 2017
* All the photos used in this presentation were taken by Hiroto Honda.

Please refrain from reprinting or reproducing the text, images, or other content without permission.
