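The loss function combines
the localization loss (loc), which measures the object position offset, and
the confidence loss (conf), which measures the object class error.
Equation (1) is computed over all outputs produced for each image.
(N is the number of matched boxes; the weight α is 1.0 in the experiments.)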
Training Objective
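As a rough illustration of the objective above, the following is a minimal sketch of a combined loss of the form (conf + α·loc) / N. It assumes Smooth L1 for the localization term and softmax cross-entropy for the confidence term, as described in the SSD paper; the box dictionary layout and all function names are hypothetical, and hard negative mining is omitted for brevity.

```python
import math


def smooth_l1(x):
    """Smooth L1 penalty for a single offset difference."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_cross_entropy(scores, target):
    """Cross-entropy of a softmax over class scores for one box."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]


def ssd_loss(boxes, alpha=1.0):
    """Combined loss over all default boxes of one image.

    Each box is a dict (hypothetical layout):
      'scores': class scores, index 0 = background
      'label' : target class (0 means the box matched no object)
      'pred'  : predicted offsets, 'target': ground-truth offsets
    """
    n_matched = sum(1 for b in boxes if b["label"] > 0)
    if n_matched == 0:
        return 0.0  # no matched boxes: the loss is defined as 0
    conf = sum(softmax_cross_entropy(b["scores"], b["label"]) for b in boxes)
    loc = sum(
        smooth_l1(p - t)
        for b in boxes
        if b["label"] > 0  # localization only counts matched boxes
        for p, t in zip(b["pred"], b["target"])
    )
    return (conf + alpha * loc) / n_matched
```

Dividing by N (matched boxes) rather than by the total number of default boxes keeps the loss scale comparable between images with few and many objects.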
SSD300 is already better than Faster R-CNN by 1.1%.
SSD512 is 3.6% better.
PASCAL VOC 2007
- Data augmentation is important
- More default boxes are better
- Atrous is faster
  - Without it, accuracy is almost the same but the model is 20% slower
Model analysis
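To make the "atrous" bullet concrete, here is a minimal 1-D sketch of atrous (dilated) convolution in plain Python. It is only an illustration of the operation, not the SSD implementation; the function names are hypothetical. The point it shows is that dilation widens the receptive field without adding kernel weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D correlation with gaps of `dilation` between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilation):
    """Number of input positions covered by one output value."""
    return (kernel_size - 1) * dilation + 1


signal = [1, 2, 3, 4, 5, 6, 7]
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)   # covers 3 inputs
atrous = dilated_conv1d(signal, [1, 1, 1], dilation=2)  # covers 5 inputs
```

With the same three weights, the dilated version covers five input positions instead of three.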
- Multiple output layers at different resolutions are better
- This is SSD's major contribution
- Using conv7 alone gives the lowest accuracy
- Since ROI pooling is not used, the "collapsing bins problem" does not occur
Model analysis
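One way to see why predicting from several layers matters is to count the default boxes each output layer contributes. The sketch below assumes the SSD300 configuration reported in the SSD paper (feature map sizes and boxes per location); those values are not stated on this slide and the names are hypothetical.

```python
# Assumed SSD300 output layers: (name, feature map side, default boxes per location).
# These values come from the SSD paper, not from the slide itself.
FEATURE_MAPS = [
    ("conv4_3", 38, 4),
    ("conv7", 19, 6),
    ("conv8_2", 10, 6),
    ("conv9_2", 5, 6),
    ("conv10_2", 3, 4),
    ("conv11_2", 1, 4),
]


def default_box_counts(feature_maps):
    """Return the number of default boxes produced by each output layer."""
    return {name: side * side * boxes for name, side, boxes in feature_maps}


counts = default_box_counts(FEATURE_MAPS)
total = sum(counts.values())
```

Under these assumptions, conv7 alone provides 2166 of the 8732 default boxes, all at a single scale, which is consistent with the observation that using only conv7 performs worst.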
• [1] Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic
segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
• [2] He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition."
European Conference on Computer Vision. Springer International Publishing, 2014.
• [3] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
• [4] Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks."
Advances in neural information processing systems. 2015.
• [5] Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." arXiv preprint
arXiv:1506.02640 (2015).
• [6] Liu, Wei, et al. "SSD: Single Shot MultiBox Detector." arXiv preprint arXiv:1512.02325 (2015).
• [7] Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer
vision 104.2 (2013): 154-171.
Appendix