2. Main Topic
Scale Invariance in Object Recognition (Detection) Tasks
PR-199: SNIPER : Efficient Multi-Scale Training 2
3. Preliminary
- Scale Problem in Object Recognition
- Multi Scale Strategies
- SNIP(Scale Normalized Image Pyramid)
PR-110: An Analysis of Scale Invariance in Object Detection – SNIP
https://youtu.be/nimHWHxjBJ8
4. Relative Scale
$$\text{Relative Scale} = \frac{\sqrt{\text{area}(\text{object})}}{\sqrt{\text{area}(\text{image})}}$$
MS COCO dataset has:
- Mostly small objects (median relative scale 0.106)
- Large scale variation (20x)
- Large domain shift from the pre-trained classification network
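As a rough illustration of the relative scale definition on this slide, the sketch below computes it for an axis-aligned bounding box. The function name `relative_scale` and the example box and image sizes are hypothetical and not taken from the talk.

```python
import math


def relative_scale(box_w, box_h, img_w, img_h):
    """Relative scale = sqrt(area of object) / sqrt(area of image)."""
    return math.sqrt(box_w * box_h) / math.sqrt(img_w * img_h)


# Hypothetical example: a 64x48 box inside a 640x480 image.
# Each side of the box is one tenth of the image side, so the
# relative scale is approximately 0.1, close to the COCO median of 0.106.
print(relative_scale(64, 48, 640, 480))
```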
11. Fundamental Problem of Image Pyramid?
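12. Image Pyramid Training
[Figure: the original image and its x2 and x3 up-sampled versions, each one used for training]
Image Pyramid has to process 14 times more pixels
+ Images of different scales can't be in the same batch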
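The "14 times more pixels" figure follows from the three pyramid levels: scaling both sides of an image by s multiplies its pixel count by s squared, so the original, x2, and x3 images cost 1 + 4 + 9 = 14 times the pixels of the original alone. A small sketch of that arithmetic (the function name `pyramid_pixel_factor` is hypothetical):

```python
def pyramid_pixel_factor(scales):
    """Total pixels in an image pyramid, relative to the original image.

    Scaling width and height by s multiplies the pixel count by s * s.
    """
    return sum(s * s for s in scales)


# Original, x2 and x3 levels, as on the slide.
print(pyramid_pixel_factor([1, 2, 3]))  # 14
```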
33. Training Time (+ benefits)
- Randomly sample chips from the whole dataset for batch.
MS COCO case
- 128 batch size (8 GPU)
- Area ranges: (0, 80^2), (32^2, 150^2), and (120^2, inf)
- On average 5 chips (512x512) per image when training on scales (512/ms, 1.667, 3), where ms is the larger of the image's width and height
- Only 30% more pixels than single-scale training (800x1333)
- Always same input size (512x512)
- 3-scale training + large batch size (good for batch normalization)
- Image resolution bottleneck is alleviated
OpenImagesV4 (1.7M)
- High resolution (1024x768) -> up-sampling is less important
- Training on scales (512/ms, 1) -> total 3.5M chips (512x512)
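To make the MS COCO area ranges above concrete, here is a minimal sketch that checks which of the three ranges a ground-truth box falls into. The function name `valid_range_indices`, the half-open boundary handling, and the example boxes are assumptions for illustration. The slide does not say which range pairs with which training scale, so the code only reports range indices in the order listed.

```python
import math

# Area ranges from the slide, in pixels^2:
# (0, 80^2), (32^2, 150^2), (120^2, inf)
AREA_RANGES = [(0, 80 ** 2), (32 ** 2, 150 ** 2), (120 ** 2, math.inf)]


def valid_range_indices(box_w, box_h, ranges=AREA_RANGES):
    """Return the indices of all area ranges that contain the box area.

    Assumption: each range is treated as half-open, lo <= area < hi.
    The ranges overlap, so a box can be valid in more than one of them.
    """
    area = box_w * box_h
    return [i for i, (lo, hi) in enumerate(ranges) if lo <= area < hi]


# Hypothetical boxes (width, height) in pixels.
for w, h in [(20, 20), (50, 50), (130, 130), (200, 200)]:
    print((w, h), valid_range_indices(w, h))
```

Because the ranges overlap, mid-sized boxes such as 50x50 or 130x130 land in two ranges at once, while very small and very large boxes land in only one.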
34. Experiments
MS COCO
Training
- 14 hours to train SNIPER on a single 8-GPU V100 node, using Faster R-CNN with ResNet-101
Inference Time
- Image resolutions: (480, 512), (800, 1280) and (1400, 2000)
- Soft-NMS
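One way to read the inference resolution pairs is as (target shorter side, cap on the longer side), which is a common convention in detection pipelines. The slide does not state this, so treat it as an assumption. Under that reading, the resize factor for an image could be sketched as follows (the function name `resize_factor` and the 640x480 example image are hypothetical):

```python
def resize_factor(img_w, img_h, short_target, long_cap):
    """Scale factor that brings the shorter side to short_target,
    reduced if needed so the longer side does not exceed long_cap.

    Assumption: each pair on the slide means (short_target, long_cap).
    """
    short_side = min(img_w, img_h)
    long_side = max(img_w, img_h)
    scale = short_target / short_side
    if scale * long_side > long_cap:
        scale = long_cap / long_side
    return scale


# Hypothetical 640x480 image evaluated at the three resolutions from the slide.
for short_target, long_cap in [(480, 512), (800, 1280), (1400, 2000)]:
    print((short_target, long_cap), round(resize_factor(640, 480, short_target, long_cap), 3))
```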
35. SNIPER uses negative chip mining to reduce the false positive rate
while speeding up the training by skipping the easy regions inside the image.