8. SPP-net
• Fix the number of bins
• Do NOT fix the bin size
[Figure: Spatial Pyramid Pooling. Input image (any size) → conv layers → conv feature maps → spatial pyramid pooling over a region → fc layers (4096, 4096, 1000)]
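The "fixed bin count, variable bin size" idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it pools a single-channel map stored as nested lists, and the pyramid levels (1, 2, 4) and the function name `spp_max_pool` are assumptions chosen for the example.

```python
import math

def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map (list of rows) into a fixed-length vector.

    For each pyramid level n the map is split into n x n bins. The number of
    bins is fixed; the bin size adapts to the map's height and width.
    The levels (1, 2, 4) are an assumption made for this sketch.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin boundaries scale with the input size (floor start, ceil end).
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled

# Two maps of different sizes give vectors of the same length (1 + 4 + 16).
small = [[r * 7 + c for c in range(7)] for r in range(5)]
large = [[r * 9 + c for c in range(9)] for r in range(13)]
print(len(spp_max_pool(small)), len(spp_max_pool(large)))
```

Because the output length depends only on the levels, the fc layers that follow can keep a fixed input size while the image size varies.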
10. R-CNN vs. SPP-net
• image regions vs. feature map regions
[Figure: R-CNN runs 2000 nets on image regions, producing one feature per region. SPP-net runs 1 net on the full image and takes the per-region features from the feature map.]
“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”
K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014
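A rough sketch of what "feature map regions" means in practice: each image-space region is projected onto the shared conv feature map instead of being cropped and passed through the net again. The stride of 16, the floor/ceil rounding rule, and both function names are assumptions made for illustration; the paper's exact corner mapping differs in detail.

```python
import math

def project_region(box, stride=16):
    """Map an image-space box (x0, y0, x1, y1) to feature-map cell indices.

    Simplified rule (an assumption): floor the top-left corner and ceil the
    bottom-right corner so the projected window covers the whole region.
    """
    x0, y0, x1, y1 = box
    return (x0 // stride, y0 // stride,
            math.ceil(x1 / stride), math.ceil(y1 / stride))

def conv_passes(num_regions, share_features):
    """Number of forward passes through the conv layers for one image."""
    return 1 if share_features else num_regions

print(project_region((32, 48, 200, 130)))
print(conv_passes(2000, share_features=False))  # one net per image region
print(conv_passes(2000, share_features=True))   # one net on the full image
```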
41. Benefits from Multi-task training
• Convenient training
• Improves results: tasks influence each other through the shared ConvNet
• 𝜆 = 0: no BB regressors, CLS only
• 𝜆 = 1, but BB regressors disabled at test time
• Isolates the network's CLS accuracy for comparison
• Improves pure CLS accuracy! (+0.8~1.1 mAP)
• Train with CLS loss only, then train the BB regressor layer with L_loc, freezing the others
• Good, but still underperforms multi-task learning
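To make the role of 𝜆 concrete, here is a minimal sketch of a per-region multi-task loss. It assumes the loss has the form L = L_cls + 𝜆 [u ≥ 1] L_loc, with log loss for classification, smooth L1 for box regression, and class index 0 meaning background. The function names and the sample numbers are hypothetical.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def multi_task_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """L = L_cls + lam * [true_class >= 1] * L_loc for one region.

    class_probs: softmax probabilities; index 0 is background (assumed).
    box_pred, box_target: four box-regression offsets each.
    """
    l_cls = -math.log(class_probs[true_class])
    l_loc = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    is_object = 1.0 if true_class >= 1 else 0.0  # no box loss for background
    return l_cls + lam * is_object * l_loc

probs = [0.1, 0.7, 0.2]
pred, target = [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0]
print(multi_task_loss(probs, 1, pred, target, lam=1.0))  # CLS + BB terms
print(multi_task_loss(probs, 1, pred, target, lam=0.0))  # CLS term only
```

With `lam=0.0` the box term drops out, which corresponds to the CLS-only baseline in the list above.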
Results (Multi-task training)
46. Introduction
• DeepMultiBox
• Scalable object detection using DNN
• Class-agnostic scalable object detection
• Bounding boxes only: not aware of what the object in the box is
• Predicts a set of bounding boxes where potential objects are
• Localize, then recognize
• Boxes generated using a single DNN
• Outputs
• A fixed number of bounding boxes
• A score for each box: the confidence that the box contains an object
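The output format can be illustrated with a small sketch: a flat network output is split into a fixed number of boxes, each with a confidence, and the most confident boxes are kept as class-agnostic candidates. The five-values-per-box layout, the function names, and the sample values are assumptions for illustration, not the layout used by the actual model.

```python
def decode_boxes(raw_output, num_boxes):
    """Split a flat output of num_boxes * 5 values into (box, score) pairs.

    Assumed layout (hypothetical): [x0, y0, x1, y1, confidence] per box.
    """
    assert len(raw_output) == num_boxes * 5
    detections = []
    for k in range(num_boxes):
        x0, y0, x1, y1, conf = raw_output[5 * k: 5 * k + 5]
        detections.append(((x0, y0, x1, y1), conf))
    return detections

def top_candidates(detections, keep):
    """Keep the `keep` most confident class-agnostic boxes."""
    return sorted(detections, key=lambda d: d[1], reverse=True)[:keep]

raw = [0.1, 0.1, 0.5, 0.5, 0.9,
       0.2, 0.2, 0.4, 0.4, 0.1,
       0.3, 0.0, 0.9, 0.8, 0.6]
for box, score in top_candidates(decode_boxes(raw, num_boxes=3), keep=2):
    print(box, score)
```

The kept boxes carry no class label; under "localize, then recognize", a separate recognition step would classify their contents afterwards.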