Devil in the Details: Analysing the Performance of ConvNet Features

Presentation to accompany paper: http://www.robots.ox.ac.uk/~vgg/research/deep_eval/

1. Devil in the Details: Analysing the Performance of ConvNet Features
Ken Chatfield, University of Oxford, May 2015
2. The Devil is still in the Details (2011 → 2014)
3. Comparing Apples to Apples
• This work is about comparing the latest ConvNet-based feature representations on common ground
• We compare both different pre-trained network architectures and different learning heuristics
[Diagram: Input Dataset → CNN Arch 1 / CNN Arch 2 / IFV → fixed learning → fixed evaluation protocol]
4. Performance Evolution over VOC2007
[Bar chart: mAP over methods, 2008-2015; CNN-based methods from DeCAF onwards]
Method       Dim.   Aug.   mAP
BOW          32K    –      54.48
IFV-BL       327K   –      61.69
IFV          84K    –      64.36
IFV          84K    f s    68.02
DeCAF        4K     t t    73.41
CNN-F        4K     f s    77.15
CNN-M 2K     2K     f s    80.13
CNN-S (TN)   4K     f s    82.42
VGG-D+E      4K     S s    89.70
5. Evaluation Setup
Pre-trained net on 1,000 ImageNet classes → CNN feature extractor (4096-D feature vector out) → SVM classifier: train on the training set, test on the test set, and evaluate the classifier output using mAP, accuracy etc.
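The protocol on this slide is feature extraction followed by a one-vs-rest linear SVM per class. A minimal sketch, assuming features have already been extracted with one of the released models and using hypothetical .npy file names:

```python
# Minimal sketch of the evaluation setup: a linear SVM over pre-extracted
# 4096-D CNN features. The .npy file names are placeholders; feature
# extraction itself happens elsewhere (e.g. with the released Caffe models).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import average_precision_score

X_train = np.load("features_train.npy")   # (n_train, 4096) CNN features
y_train = np.load("labels_train.npy")     # binary labels for one class
X_test = np.load("features_test.npy")
y_test = np.load("labels_test.npy")

svm = LinearSVC(C=1.0)                    # linear classifier, one per class
svm.fit(X_train, y_train)

scores = svm.decision_function(X_test)    # real-valued classifier output
print("AP: %.4f" % average_precision_score(y_test, scores))
```

mAP is then the mean of this per-class AP over all classes.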
6. Outline
1. Different pre-trained networks
2. Data augmentation (for both CNN and IFV)
3. Dataset fine-tuning
7. Network Architectures
• CNN-F Network
• CNN-M Network
• CNN-S Network
• VGG Very Deep Network
8. Network Architectures: CNN-F Network
Similar to Krizhevsky et al. (ILSVRC-2012 winner)
input image → conv1 64x11x11, stride 4 → conv2 256x5x5, stride 1 → conv3 256x3x3, stride 1 → conv4 512x3x3 → conv5 512x3x3 → fc6 (d.o., 4096-D) → fc7 (d.o., 4096-D)
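For readers who prefer code to layer diagrams, below is a rough PyTorch rendering of the CNN-F stack as listed above. The original models were trained in a Caffe/Matlab pipeline; the pooling positions, padding and omitted normalisation here are simplifying assumptions, not the released configuration.

```python
import torch.nn as nn

# CNN-F layer sizes as read off the slide; ReLUs after each conv/fc layer.
cnn_f = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 256, kernel_size=5, stride=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(4096), nn.ReLU(), nn.Dropout(),    # fc6 (d.o., 4096-D)
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),  # fc7 (d.o., 4096-D)
)
```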
9. Network Architectures: CNN-M Network
Similar to Zeiler & Fergus (ILSVRC-2013 winner)
input image → conv1 96x7x7, stride 2 → conv2 256x5x5, stride 2 → conv3 512x3x3, stride 1 → conv4 512x3x3 → conv5 512x3x3 → fc6 (d.o., 4096-D) → fc7 (d.o., 4096-D)
Smaller receptive window size + stride in conv1
10. Network Architectures: CNN-S Network
Similar to Overfeat 'accurate' network (ICLR 2014)
input image → conv1 96x7x7, stride 2 → conv2 256x5x5, stride 1 → conv3 512x3x3, stride 1 → conv4 512x3x3 → conv5 512x3x3 → fc6 (d.o., 4096-D) → fc7 (d.o., 4096-D)
Smaller stride in conv2
11. Network Architectures: VGG Very Deep Network
Simonyan & Zisserman (ICLR 2015)
input image → conv1a/1b/1c 64x3x3, stride 1 → (pool) → conv2a/2b/2c 128x3x3, stride 1 → … → fc6 (d.o., 4096-D) → fc7 (d.o., 4096-D)
Smaller receptive window size + stride, and deeper.
A stack of three 3x3 layers covers the same 7x7 receptive field as a single large-filter layer, with fewer parameters: 3(3²C²) = 27C² vs. 7²C² = 49C².
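The parameter-count claim at the end of this slide is easy to verify numerically; a quick sanity check in plain Python (C = 64 chosen arbitrarily):

```python
# Three stacked 3x3 conv layers (C channels in and out) see the same 7x7
# receptive field as one 7x7 layer, but with roughly half the parameters.
C = 64
stacked_3x3 = 3 * (3 * 3 * C * C)   # 3(3^2 C^2) = 27 C^2
single_7x7 = 7 * 7 * C * C          # 7^2 C^2    = 49 C^2
print(stacked_3x3, single_7x7)      # 110592 vs 200704
```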
12. Pre-trained networks: mAP(VOC07)
Decaf 73.41 | CNN-F 77.38 | CNN-M 79.89 | CNN-S 79.74 | VGG-VD 89.3
13. Outline
1. Different pre-trained networks
2. Data augmentation (for both CNN and IFV)
3. Dataset fine-tuning
14. Data Augmentation
Given a pre-trained ConvNet, augmentation is applied at test time:
a. Extract crops
b. Pool features (average, max)
[Diagram: crops → Pre-trained Network / CNN Feature Extractor → pooled feature]
15. Data Augmentation
a. No augmentation (= 1 image: a single 224x224 view)
b. Flip augmentation (= 2 images: 224x224 view + its flip)
c. Crop+Flip augmentation (= 10 images: 224x224 crops + flips)
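A minimal sketch of the 10-view scheme in (c), assuming Pillow images and a hypothetical extract_features() standing in for the network's forward pass:

```python
import numpy as np
from PIL import Image

def ten_views(img, size=224):
    """4 corner crops + centre crop, each plus its horizontal flip."""
    w, h = img.size
    corners = [(0, 0), (w - size, 0), (0, h - size), (w - size, h - size),
               ((w - size) // 2, (h - size) // 2)]
    crops = [img.crop((x, y, x + size, y + size)) for x, y in corners]
    crops += [c.transpose(Image.FLIP_LEFT_RIGHT) for c in crops]
    return crops  # 10 images total

def pooled_feature(img, extract_features):
    feats = np.stack([extract_features(v) for v in ten_views(img)])
    return feats.mean(axis=0)  # average pooling; feats.max(axis=0) for max
```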
16. Data Augmentation: mAP(VOC07)
Augmentation                                      CNN-M   IFV
None                                              76.97   64.36
Flip                                              76.99   64.35
Crop+Flip (train pooling: sum, test pooling: sum) 79.44   66.68
Crop+Flip (train pooling: none, test pooling: sum) 79.89  67.17
17. Scale Augmentation
Training: rescale so that the smallest image side S is sampled from [Smin, Smax] = [256, 512], then take 224x224 crops + flips.
Testing: 224x224 crops + flips at the three scales Q = {Smin, 0.5(Smin + Smax), Smax}.
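Read as a recipe, training samples a random scale S per image while testing pools over the three fixed scales in Q; a sketch under those assumptions:

```python
import random
from PIL import Image

S_MIN, S_MAX = 256, 512
Q = [S_MIN, (S_MIN + S_MAX) // 2, S_MAX]   # test scales {256, 384, 512}

def rescale(img, s):
    """Resize so the smallest image side equals s."""
    w, h = img.size
    f = s / min(w, h)
    return img.resize((round(w * f), round(h * f)))

def random_train_crop(img, size=224):
    img = rescale(img, random.randint(S_MIN, S_MAX))  # jittered scale
    x = random.randint(0, img.size[0] - size)
    y = random.randint(0, img.size[1] - size)
    return img.crop((x, y, x + size, y + size))       # flips applied on top
```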
18. Fully Convolutional Net
Sermanet et al. 2014 (Overfeat)
• Convert final fc layers to convolutional layers
• Output is then an activation map which can be pooled
Top-5 val. error (ILSVRC-2014): 8.8% → 7.5%
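The fc-to-conv conversion works because a fully connected layer over a C×k×k activation volume is exactly a k×k convolution. An illustrative PyTorch sketch, where the 512x7x7 volume and layer sizes are assumptions for the example:

```python
import torch.nn as nn

# An fc layer that consumed a 512x7x7 activation volume...
fc6 = nn.Linear(512 * 7 * 7, 4096)

# ...is equivalent to a 7x7 convolution with reshaped weights. On inputs
# larger than the training size, conv6 emits a spatial map of 4096-D
# activations, which can be pooled instead of forcing a fixed input crop.
conv6 = nn.Conv2d(512, 4096, kernel_size=7)
conv6.weight.data = fc6.weight.data.view(4096, 512, 7, 7)
conv6.bias.data = fc6.bias.data
```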
19. Outline
1. Different pre-trained networks
2. Data augmentation (for both CNN and IFV)
3. Dataset fine-tuning
20. Fine Tuning
[Diagram: conv1 96x7x7 → conv2 256x5x5 → conv3 512x3x3 → conv4 512x3x3 → conv5 512x3x3 → fc6 (d.o., 4096-D) → fc7 (d.o., 4096-D) → ILSVRC softmax]
21. Fine Tuning
[Diagram: same network, with the ILSVRC softmax replaced by an SVM loss (VOC07) trained on VOC 2007 train images]
22. Fine Tuning: mAP(VOC07)
No TN: 79.7 | TN-CLS: 82.2 | TN-RNK: 82.4
• TN-CLS – classification loss max{0, 1 − y wᵀφ(I)}
• TN-RNK – ranking loss max{0, 1 − wᵀ(φ(I_POS) − φ(I_NEG))}
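Written out as code, the two losses differ only in what the hinge is applied to: the score of a labelled image (TN-CLS) versus the score gap between a positive and a negative image (TN-RNK). A sketch with PyTorch tensors, where phi stands in for the CNN feature of an image:

```python
import torch

def tn_cls_loss(w, phi, y):
    """Classification hinge: max{0, 1 - y * w^T phi(I)}, with y in {-1, +1}."""
    return torch.clamp(1 - y * (w @ phi), min=0)

def tn_rnk_loss(w, phi_pos, phi_neg):
    """Ranking hinge: max{0, 1 - w^T (phi(I_pos) - phi(I_neg))}."""
    return torch.clamp(1 - w @ (phi_pos - phi_neg), min=0)
```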
23. Comparison with State of the Art
Method                  ILSVRC-2012 (top-5 err.)   VOC2007 (mAP)   VOC2012 (mAP)
CNN-M 2048              13.5                       80.1            82.4
CNN-S                   13.1                       79.7            82.9
CNN-S TUNE-RNK          13.1                       82.4            83.2
Zeiler & Fergus         16.1                       –               79.0
Oquab et al.            18.0                       77.7            78.7 (82.8*)
Wei et al.              –                          81.5 (85.2*)    81.7 (90.3*)
Clarifai (1 net)        12.5                       –               –
GoogLeNet (1 net)       7.9                        –               –
VGG Very Deep (1 net)   7.0                        89.3            89.0
24. Take-home Messages
If you get the details right, a relatively simple ConvNet-based pipeline can outperform much more complex architectures:
• Data augmentation helps a lot, both for deep and shallow features
• Fine tuning makes a difference, and should use ranking loss where appropriate
• Smaller filters and deeper networks help, although feature computation is slower
25. There's more…
• Presented here was just a subset of the full results from the paper
• Check out the paper for full results on: VOC 2007, VOC 2012, Caltech-101, Caltech-256, ILSVRC-2012
26. Source Code
• Caffe-compatible CNN models can be downloaded from the Caffe Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo
• Matlab feature computation code is also available from the project website: http://www.robots.ox.ac.uk/~vgg/software/deep_eval
27. Related Publications
“Return of the Devil in the Details: Delving Deep into Convolutional Nets”, BMVC 2014. Ken Chatfield, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman. (Best Paper Prize)
“The devil is in the details: an evaluation of recent feature encoding methods”, BMVC 2011. Ken Chatfield, Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman. (Best Poster Prize Honourable Mention, 300+ citations)
http://www.robots.ox.ac.uk/~ken
