convolutional neural networks for image classification
Evidence from Kaggle National Data Science Bowl
Dmytro Mishkin, ducha.aiki at gmail com
March 25, 2015
Czech Technical University in Prague
kaggle national data science bowl overview
The image classification problem
• 130,400 test images
• 30,336 train images
• 1 channel (grayscale)
• 121 (biased) classes
• 90% of images ≤ 100x100 px
• logloss score = $-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij} \log p_{ij}$ (see the sketch below)
• No external data
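A minimal NumPy sketch of this logloss metric, assuming one-hot labels and predicted probabilities; the clipping epsilon is an illustrative choice to avoid log(0), not something specified by the competition:

import numpy as np

def multiclass_logloss(y_true, p_pred, eps=1e-15):
    # y_true: (N, M) one-hot labels, p_pred: (N, M) predicted probabilities
    p = np.clip(p_pred, eps, 1 - eps)        # avoid log(0)
    p = p / p.sum(axis=1, keepdims=True)     # renormalize rows after clipping
    return -np.mean(np.sum(y_true * np.log(p), axis=1))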
classes diagram
1 url: http://npow.github.io/plankton/viewer/index.html.
final leaderboard
Which approach to use?
lunch time chat at kth’s computer vision group
• a computer vision scientist: How long does it take to train these generic features on ImageNet?
• Hossein: 2 weeks
• Ali: almost 3 weeks depending on the hardware
• the computer vision scientist: hmmmm...
• Stefan: Well, you have to compare the three weeks to the last 40 years of computer vision2
2 http://www.csc.kth.se/cvap/cvg/DL/ots/
convolutional networks
CNNs are the state of the art in image recognition fields such as:3
– Object Image Classification
– Scene Image Classification
– Action Image Classification
– Object Detection
– Semantic Segmentation
– Fine-grained Recognition
– Attribute Detection
– Metric Learning
– Instance Retrieval (almost).
3 beat classic computer vision methods in 19 datasets out of 20: http://www.csc.kth.se/cvap/cvg/DL/ots/
contents
1. Basics of convolutional networks
2. Image preprocessing
3. Network architectures
4. Ensembling
5. What does and does not (seem to) work
6. Winner's solution highlights
basics of convolutional networks
what is convolution
[Figure: convolution illustration4]
4 https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
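For reference, a small NumPy sketch of the 2D convolution (correlation-style, no padding, stride 1) that a conv layer applies to a single-channel image; the function name and the example kernel are illustrative assumptions, not code from the talk:

import numpy as np

def conv2d_valid(image, kernel):
    # Slide a square k x k kernel over a 2D image, no padding, stride 1.
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

# Example: a 3x3 edge-detection kernel on a random grayscale patch
patch = np.random.rand(8, 8)
edge = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float)
print(conv2d_valid(patch, edge).shape)  # (6, 6)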
softmax classifier
Softmax (cross-entropy) loss:
$L = -\log \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$
SVM (hinge) loss:
$L = \sum_{j \neq y_i} \max(0, f(x_i, W)_j - f(x_i, W)_{y_i} + \Delta)$
5 http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/
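A NumPy sketch of both losses for a single example, following the formulas above; the variable names and the numeric example are illustrative:

import numpy as np

def softmax_loss(f, y):
    # Cross-entropy loss: L = -log( e^{f_y} / sum_j e^{f_j} )
    f = f - np.max(f)                        # shift scores for numerical stability
    return -f[y] + np.log(np.sum(np.exp(f)))

def hinge_loss(f, y, delta=1.0):
    # Multiclass SVM loss: L = sum_{j != y} max(0, f_j - f_y + delta)
    margins = np.maximum(0.0, f - f[y] + delta)
    margins[y] = 0.0                         # do not count the correct class
    return np.sum(margins)

scores = np.array([2.0, -1.0, 0.5])          # f(x_i, W) for 3 classes
print(softmax_loss(scores, 0), hinge_loss(scores, 0))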
lenet-5. no other layers are necessary
[Figure: LeNet-5 architecture6]
The idea was first proposed by LeCun in 19897 and recently revived by Springenberg et al. in the "All Convolutional Net" paper8.
6 http://eblearn.sourceforge.net/beginner_tutorial2_train.html
7 url: https://www.facebook.com/yann.lecun/posts/10152766574417143.
8 J. T. Springenberg et al. “Striving for Simplicity: The All Convolutional Net”. In: ArXiv e-prints (2014). arXiv: 1412.6806 [cs.LG].
non-linearities
[Plot: activation functions over inputs in [−3, 3] — TanH, Sigmoid, ReLU, maxout (sort of), LeakyReLU]
regularization - dropout, weight decay
[Figure: dropout illustration9]
9 Nitish Srivastava et al. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15 (2014), pp. 1929–1958. url: http://jmlr.org/papers/v15/srivastava14a.html.
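An illustrative NumPy sketch of the two regularizers named here: inverted dropout at training time and an L2 weight-decay term added to the loss. The drop probability and decay coefficient are common defaults, not values from the slides:

import numpy as np

def dropout_forward(x, p_drop=0.5, train=True):
    # Inverted dropout: zero each activation with probability p_drop and
    # rescale the survivors so the expected activation stays the same;
    # identity at test time.
    if not train:
        return x
    mask = (np.random.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

def l2_weight_decay(weights, lam=5e-4):
    # Weight decay term added to the data loss: (lam / 2) * sum ||W||^2
    return 0.5 * lam * sum(np.sum(W ** 2) for W in weights)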
deep learning libraries
Table 1: Popular deep learning GPU libraries
Name          | url                              | languages     | Notes
caffe         | github.com/BVLC/caffe            | C++/Python/no | largest community
cxxnet        | github.com/dmlc/cxxnet           | C++/no        | good memory management
Theano        | github.com/Theano/Theano         | Python        | huge flexibility
Torch         | facebook/fbcunn                  | lua           | LeCun/Facebook library
cuda-convnet2 | code.google.com/p/cuda-convnet2/ | C++/python    |
SparseConvNet | http://tinyurl.com/pu65cfp       | C++/CUDA      | differs from others
image preprocessing
basic network architecture
72x72x1 → crop to 64x64 → 20C5 → MP2 → 50C5 → MP2 → 500IP → clf
(nCk = n convolution filters of size k×k, MPk = k×k max pooling, 500IP = 500-unit inner-product / fully-connected layer)
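The models in this talk were trained with Caffe-era tooling; purely as an illustration, here is how the same 20C5-MP2-50C5-MP2-500IP baseline could be sketched in PyTorch. Layer sizes follow the line above; the placement of the LeakyReLU activations and all names are assumptions:

import torch
import torch.nn as nn

class BaselineNet(nn.Module):
    # 64x64x1 crop -> 20C5 -> MP2 -> 50C5 -> MP2 -> 500IP -> 121-way classifier
    def __init__(self, num_classes=121):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5),   # 20C5: 64 -> 60
            nn.MaxPool2d(2),                   # MP2:  60 -> 30
            nn.LeakyReLU(0.01),
            nn.Conv2d(20, 50, kernel_size=5),  # 50C5: 30 -> 26
            nn.MaxPool2d(2),                   # MP2:  26 -> 13
            nn.LeakyReLU(0.01),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(50 * 13 * 13, 500),      # 500IP
            nn.LeakyReLU(0.01),
            nn.Linear(500, num_classes),       # clf (softmax applied inside the loss)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = BaselineNet()(torch.randn(8, 1, 64, 64))   # -> shape (8, 121)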
basic data preprocessing
Table 2: 5-layer network experiments, 48x48 input image, no non-linearities,
mean pixel extraction
Name, augmentation Val logloss Train logloss
No mean extraction, no scaling – –
mirror 1.67 0.64
histeq, mirror 1.74 0.64
mirror + ReLU 1.61 0.44
mirror + scale 1.42 0.937
mirror + scale LeakyReLU 1.34 0.83
mirror + rand rot 1.53 1.31
basic data preprocessing
Table 3: 5-layer network experiments, 48x48 input image, LeakyReLU
non-linearities, mean pixel extraction
Name, augmentation Val logloss Train logloss
mirror + scale 1.34 0.83
invert, mirror + scale 1.27 0.80
invert, norm, mirror + scale 1.24 0.505
invert, norm, mirror + scale, salt-pepper 1.15 n/a
more geometric transformations
Table 4: 5-layer network experiments, 64x64 input image, LeakyReLU
Name, augmentation Val logloss
mirror 1.30
mirror + scale (resize modes) 1.12
h+v mirror, scale 1.10
h+v mirror, scale + rot 1.08
mirror, lower base lr 1.04 :)
h+v mirror, scale + rot, depolar imgs 1.28
regularization methods
Table 5: 5-layer network experiments, 64x64 input image, LeakyReLU
Name, augmentation Val logloss
h+v mirror, scale + rot, vanilla 1.08
h+v mirror, scale + rot, PReLU (but slows down a lot)10 1.03
h+v mirror, scale + rot, BatchNorm11 1.10
h+v mirror, scale + rot, StochPool12 0.98
10 K. He et al. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. In: ArXiv e-prints (2015). arXiv: 1502.01852 [cs.CV].
11 S. Ioffe and C. Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. In: ArXiv e-prints (2015). arXiv: 1502.03167 [cs.LG].
12 M. D. Zeiler and R. Fergus. “Stochastic Pooling for Regularization of Deep Convolutional Neural Networks”. In: ArXiv e-prints (2013). arXiv: 1301.3557 [cs.LG].
data augmentation - don't forget about it during test time
for rotation in {0, 90, 180, 270} degrees:
    for crop in 9 crops (N, NE, E, ...):
        get predictions for mirrored and non-mirrored versions
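A hedged NumPy sketch of this test-time augmentation loop; the predict callable and the 3x3 crop grid stand in for the actual model and cropping code, which are not shown in the talk:

import numpy as np

def predict_with_tta(predict, image, crop_size):
    # Average class probabilities over 4 rotations x 9 crops x 2 mirror states.
    # predict: maps a crop_size x crop_size image to a probability vector.
    preds = []
    for k in range(4):                               # 0, 90, 180, 270 degrees
        rot = np.rot90(image, k)
        H, W = rot.shape
        ys = [0, (H - crop_size) // 2, H - crop_size]
        xs = [0, (W - crop_size) // 2, W - crop_size]
        for y in ys:
            for x in xs:                             # 9 crops (corners, edges, center)
                crop = rot[y:y + crop_size, x:x + crop_size]
                preds.append(predict(crop))
                preds.append(predict(np.fliplr(crop)))   # mirrored version
    return np.mean(preds, axis=0)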
network architectures
cifar/lenet for testing
Pros
+ Training time ~20 min
+ Can be done in parallel
+ Therefore, lots of experiments
Cons
- Not complex enough to check some things (e.g. BatchNorm)
- So it might lead to wrong conclusions about "bad" things (e.g. random rotations hurt CifarNets, but help VGGNets)
- Or about "good" things (e.g. stochastic pooling helps CifarNets, but does nothing for VGGNets)
We need to go deeper
googlenet
GoogLeNet architecture13
13 C. Szegedy et al. “Going Deeper with Convolutions”. In: ArXiv e-prints (2014). arXiv: 1409.4842 [cs.CV].
googlenet
22 layers, but a simple base brick – the "Inception" module
internal ensemble
Take the mean of all auxiliary classifiers instead of just throwing them away (a sketch follows after the tables).
Table 6: GoogLeNet, validation loss
Name Public LB
clf on inc3 0.722
clf on inc4a 0.754
clf on inc4b 0.757
clf on inc5b 0.855
average 0.693
Table 7: VGGNet, validation loss
Name Public LB
clf on pool4 0.762
clf on pool5 0.657
clf on fc7 0.707
average 0.630
14 J. Xie, B. Xu, and Z. Chuang. “Horizontal and Vertical Ensemble with Deep Representation for Classification”. In: ArXiv e-prints (2013). arXiv: 1306.2759 [cs.LG].
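A minimal sketch of this idea: at test time, average the probability outputs of every auxiliary head instead of keeping only the last one. The head names are placeholders:

import numpy as np

def internal_ensemble(head_probs):
    # head_probs: dict of {head_name: (N, 121) probability array}, e.g. the
    # predictions of clf-on-inc3 ... clf-on-inc5b from the table above.
    stacked = np.stack(list(head_probs.values()), axis=0)
    return stacked.mean(axis=0)                  # arithmetic mean over heads

# Hypothetical usage with four GoogLeNet heads:
# probs = internal_ensemble({"inc3": p3, "inc4a": p4a, "inc4b": p4b, "inc5b": p5b})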
googlenet-results
Table 8: GoogLeNet, 64x64 input image, Leaky ReLU (if not stated otherwise), AlexNet-oversample
Name Public LB
No inv, scale, ReLU, last-clf 0.910
No inv, scale, ReLU 0.859
No inv, scale 0.816
No inv scale, maxout-clf 0.785
Inv, scale, maxout-clf, retrain 0.703
96x96, inv, scale, maxout-clf, retrained, no-aug-ft15 0.684
112x112, inv, scale, maxout-clf, retrained, no-aug-ft. 0.716
48x48, inv, scale, maxout-clf, retrained, no-aug-ft. + test rot 0.749
96x96, inv, scale, maxout-clf, retrained, no-aug-ft. + test rot 0.679
48x48+96x96+112x112, inv, scale, maxout-clf, retrained, no-aug-ft 0.677
15 Ben Graham's trick: finetune the converged model for 1-5 epochs without data augmentation, with a small lr. http://blog.kaggle.com/2015/01/02/cifar-10-competition-winners-interviews-with-dr-ben-graham-phil-culliton-zygmu
vggnet
VGGNet architectures16
Differences: Dropout in conv-layers (0.3), SPP-pooling for pool5, LeakyReLU,
aux. clf.
16 K. Simonyan and A. Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition”. In: ArXiv e-prints (Sept. 2014). arXiv: 1409.1556 [cs.CV].
spatial pyramid pooling
[Figure: spatial pyramid pooling17]
17 K. He et al. “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”. In: ArXiv e-prints (2014). arXiv: 1406.4729 [cs.CV].
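As an illustration of the SPP idea (not the code used here), a PyTorch sketch that max-pools a conv feature map into a fixed pyramid of bins and concatenates them, so a fully-connected layer can follow regardless of input size; the bin sizes are an assumption:

import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    # Pool an (N, C, H, W) feature map into fixed bins (e.g. 4x4, 2x2, 1x1)
    # and concatenate, giving an (N, C * sum(b*b)) vector for any H, W.
    def __init__(self, bins=(4, 2, 1)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(b) for b in bins])

    def forward(self, x):
        n = x.size(0)
        return torch.cat([pool(x).view(n, -1) for pool in self.pools], dim=1)

# 512-channel pool5 features of any spatial size -> 512 * (16 + 4 + 1) = 10752 dims
feats = SpatialPyramidPooling()(torch.randn(2, 512, 7, 9))
print(feats.shape)   # torch.Size([2, 10752])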
vggnet-results
Table 9: VGGNet, 64x64 input image, Leaky ReLU (if not stated otherwise), AlexNet-oversample, no-SPP
Name Public LB
No inv, scale, ReLU, fc-maxout 0.752
Inv, scale, single random crop 0.773
Inv, scale, 50 random crops 0.751
Inv, scale, 0.729
Inv, scale, retrained 0.720
Inv, scale, fc-maxout 0.662
Inv, scale, fc-maxout, SPP 0.654
All VGGNets Mix 0.650
sparseconvnet
– 0.79 LB Score
– Unusual library
– C2 instead of C3 convolution
– Padding only for the input image
– Kaggle CIFAR-10 winning architecture
320C2 - 320C2 - MP2 -
640C2 - 10% dropout - 640C2 - 10% dropout - MP2 -
960C2 - 20% dropout - 960C2 - 20% dropout - MP2 -
1280C2 - 30% dropout - 1280C2 - 30% dropout - MP2 -
1600C2 - 40% dropout - 1600C2 - 40% dropout - MP2 -
1920C2 - 50% dropout - 1920C1 - 50% dropout - 121C1 - Softmax output
ensemble-results
Table 10: Different mixes of all models (3 GoogLeNets, 4 VGGNets, 1 SparseConvNet)
Name Public LB Private LB
4 VGG 0.650 0.651
3 VGG, 1 GLN 0.625 0.629
4 VGG, 3 GLN 0.617 0.618
4 VGG, 3 GLN, 1 Sparse 0.611 0.616
4 VGG, 3 GLN, 1 Sparse, figure-skating 0.609 0.613
misc
batchnorm
Works for CIFAR.
But no big difference for VGGNet in KNDB for me. However, it works for other people, e.g. Jae Hyun Lim18 (22nd place).
18 https://github.com/lim0606/ndsb
what else seems to work here
– Retrain top layers with a different non-linearity (cheat diversity)
– Figure-skating average – throw away the max and min predictions (0.003 LB score); see the sketch below
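A small NumPy sketch of this figure-skating average over the per-model predictions: for each sample and class, drop the highest and the lowest prediction and average the rest. The renormalization step is an added assumption so rows still sum to one:

import numpy as np

def figure_skating_average(model_probs):
    # model_probs: (n_models, N, 121) array of per-model class probabilities.
    sorted_probs = np.sort(model_probs, axis=0)    # sort across models
    trimmed = sorted_probs[1:-1]                   # drop the min and the max
    avg = trimmed.mean(axis=0)
    return avg / avg.sum(axis=1, keepdims=True)    # renormalize rows (assumption)

# Hypothetical usage with the 8 models from Table 10:
# final = figure_skating_average(np.stack(per_model_probs))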
what does not seem to work here
– Dense SIFT + BOW / Fisher Vector: ~60% accuracy
– Random forest on CNN features: ~65% accuracy
– Mix of hinge and cross-entropy losses
– Averaging with a mean other than the arithmetic mean
– Image enhancement or preprocessing (histogram equalization, etc.)
winner's solution highlights
team work
– Roll-pool
– Hand-engineered features
– RMS-Pool
– Knowledge distillation19
19 http://benanne.github.io/2015/03/17/plankton.html
Questions?
thanks
This nice presentation theme is taken from
github.com/matze/mtheme
The theme itself is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.
