2. Workshop setup
1. Clone code from https://github.com/hokmund/cnn-tips-and-tricks
2. Download data and checkpoints from http://tiny.cc/4flryy
3. Extract them from the archive and place them under src/ in the source code folder
4. Run pip install -r requirements.txt
5. Exploratory data analysis
• Real-world images of various goods.
• Different occlusions, illumination, etc.
• Most items are centered in the picture.
• Some classes are extremely close to each other.
7. Dataset split
• Validation set is used for hyperparameter tuning.
• Test set is used for the final evaluation of the tuned model.
• Train set – 37184 samples (imbalanced).
• Validation set – 12800 samples (balanced).
• Test set – 25600 samples (balanced).
9. Transfer learning
• Little data, similar datasets: train a classifier (usually logistic regression or MLP) on bottleneck features.
• A lot of data, similar datasets: fine-tune several or all layers.
• Little data, different datasets: train a classifier on deep features of the CNN.
• A lot of data, different datasets: fine-tune all layers (use the pre-trained weights as an initialization for your CNN).
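The 2x2 decision matrix above can be encoded as a small helper (the function and return strings are mine, for illustration only):

```python
def transfer_learning_strategy(lots_of_data: bool, datasets_similar: bool) -> str:
    """Map the data-size / dataset-similarity matrix to a transfer-learning strategy."""
    if datasets_similar:
        if lots_of_data:
            return "fine-tune several or all layers"
        return "train a classifier on bottleneck features"
    if lots_of_data:
        return "fine-tune all layers from pre-trained initialization"
    return "train a classifier on deep features of the CNN"
```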
13. Learning curve
Overfitting (train accuracy increases while validation accuracy gets worse, so you need to add regularization or increase the dataset if possible).
14. Learning curve
Overfitting with oscillations (the network becomes unstable after several epochs; you need to decrease the learning rate during training).
17. Learning rate strategies
Time-based decay:
lr = lr_0 · 1 / (1 + decay · epoch)
This decay is used by default in Keras optimizers.
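The decay formula above, written out as a plain function (a sketch; the function name is mine):

```python
def time_based_decay(lr0: float, decay: float, epoch: int) -> float:
    """Time-based decay: lr = lr0 / (1 + decay * epoch)."""
    return lr0 / (1.0 + decay * epoch)
```

For example, with lr0 = 0.1 and decay = 0.5, the learning rate halves by epoch 2.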
19. Reducing learning rate on plateau
Reducing learning rate whenever validation metric stops improving
(can be combined with previously discussed strategies).
Keras implementation โ ReduceLROnPlateau callback.
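The plateau logic behind that callback can be sketched in a few lines (the class and parameter names here are mine, not the Keras API, and the real callback has more options):

```python
class PlateauReducer:
    """Reduce-on-plateau sketch: after `patience` epochs without improvement
    of a loss-like validation metric (lower is better), multiply the
    learning rate by `factor`, never going below `min_lr`."""

    def __init__(self, lr, factor=0.1, patience=3, min_lr=1e-6):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best = float("inf")
        self.wait = 0

    def step(self, val_metric):
        if val_metric < self.best:
            self.best, self.wait = val_metric, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr
```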
20. Cyclic learning rate
• Learning rate increases and decreases in a cycle.
• The upper bound of the cycle can be static or can decrease over time.
• The upper bound is selected by the LR finder algorithm.
• The lower bound is chosen 1-2 orders of magnitude below the upper bound.
Original paper - https://arxiv.org/abs/1506.01186
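The triangular schedule from that paper can be sketched as follows (a minimal version of the paper's triangular policy; the static upper bound stays fixed across cycles):

```python
import math

def triangular_clr(iteration: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Triangular cyclic learning rate: the LR ramps linearly from base_lr
    to max_lr over `step_size` iterations, then back down."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

At the start of a cycle the LR equals base_lr, at the cycle midpoint it reaches max_lr, and at the end it returns to base_lr.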
21. Learning rate finder
1. Select a reasonably small lower bound (e.g. 1e-6).
2. Usually, 1e0 is a good choice for an upper bound.
3. Increase the learning rate exponentially.
4. Plot smoothed loss vs. LR.
5. Select a learning rate slightly below the loss's global minimum.
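Steps 3 and 4 above can be sketched with an exponential sweep and a bias-corrected moving average (function names and the smoothing factor are my assumptions):

```python
def lr_schedule(lr_min: float, lr_max: float, num_steps: int) -> list:
    """Exponentially increasing learning rates for the finder sweep."""
    ratio = lr_max / lr_min
    return [lr_min * ratio ** (i / (num_steps - 1)) for i in range(num_steps)]

def smooth(losses, beta=0.98):
    """Bias-corrected exponential moving average of the recorded losses."""
    avg, out = 0.0, []
    for i, loss in enumerate(losses, start=1):
        avg = beta * avg + (1 - beta) * loss
        out.append(avg / (1 - beta ** i))
    return out
```

Plot `smooth(losses)` against `lr_schedule(...)` on a log-x axis and pick an LR slightly to the left of the minimum.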
24. Augmentation
• Augmentation increases dataset size by applying natural transformations to images.
• Useful strategy:
  • Start with soft augmentation.
  • Make it harsher over time.
  • If the dataset is big enough, finish training with several epochs of soft augmentation / without any.
Implementation:
https://github.com/albu/albumentations
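The soft-to-harsh-to-soft strategy above can be expressed as a strength schedule (the function, the [0, 1] strength scale, and the warm/cool fractions are all my assumptions; map the strength to your augmentation pipeline's parameters):

```python
def augmentation_strength(epoch: int, total_epochs: int,
                          warm_frac: float = 0.2, cool_frac: float = 0.1) -> float:
    """Soft -> harsh -> soft augmentation schedule, as a strength in [0, 1]."""
    warm = int(total_epochs * warm_frac)
    cool = int(total_epochs * cool_frac)
    if epoch >= total_epochs - cool:
        return 0.2          # finish training with soft augmentation
    if epoch < warm:
        return 0.2 + 0.8 * epoch / max(warm, 1)  # ramp from soft to harsh
    return 1.0              # full-strength augmentation in the middle
```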
26. Dealing with imbalanced train set
Common ways to deal with imbalanced classification are upsampling and downsampling. In the case of deep learning there is also weighted loss.
Weighted loss example:
Class A has 1000 samples.
Class B has 2000 samples.
Class C has 400 samples.
Overall loss:
loss = Σ_class (w_class · loss_class) = (2 · loss_A + loss_B + 5 · loss_C) / 8
(weights are inversely proportional to class size: w_A = 2000/1000 = 2, w_B = 2000/2000 = 1, w_C = 2000/400 = 5; the denominator 8 is the sum of the weights)
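The weights in the example can be computed directly from the class counts (the helper name is mine):

```python
def class_weights(counts: dict) -> dict:
    """Weights inversely proportional to class size, scaled so the
    largest class gets weight 1 (as in the A/B/C example)."""
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}

weights = class_weights({"A": 1000, "B": 2000, "C": 400})
# weights == {"A": 2.0, "B": 1.0, "C": 5.0}
```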
28. Test-time augmentation
• One way to apply TTA is to use augmentations similar to training but softer.
• Simpler strategies:
  • Only flips
  • Flips + crops
• Caution: TTA increases inference time!
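The "only flips" strategy amounts to averaging predictions over the original and mirrored batch. A minimal sketch, assuming NHWC image batches and a `model_fn` that stands in for your model's predict call:

```python
import numpy as np

def predict_with_tta(model_fn, images: np.ndarray) -> np.ndarray:
    """Average class probabilities over the original and horizontally
    flipped images (flip-only TTA). `images` has shape (N, H, W, C)."""
    flipped = images[:, :, ::-1, :]          # flip along the width axis
    probs = model_fn(images) + model_fn(flipped)
    return probs / 2.0
```

Note the doubled inference cost: `model_fn` runs once per augmented view.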
30. Semi-supervised approach
• Deep layers of a CNN learn very generic features.
• You can refine such feature extractors by training on unlabeled data.
• The most popular approach for such training is called pseudolabeling.
31. Pseudolabeling
1. Train a classifier on the initial training set.
2. Predict the validation / test set with your classifier.
3. Optional: remove images with low-confidence labels.
4. Add the pseudolabeled data to your training set.
5. Use it to train a CNN from scratch (as a kind of warmup) or to refine your previous classifier.
Source - https://www.analyticsvidhya.com/blog/2017/09/pseudo-labelling-semi-supervised-learning-technique/
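The confidence filtering in step 3 can be sketched as follows (the function name and the 0.9 threshold are my assumptions; tune the threshold on your data):

```python
import numpy as np

def pseudolabel(probs: np.ndarray, threshold: float = 0.9):
    """Keep only confident predictions as pseudolabels.

    `probs` is an (N, num_classes) array of predicted probabilities.
    Returns the indices of the kept samples and their hard labels."""
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]
```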
32. Pseudolabeling constraints
1. The test dataset has a reasonable size (at least comparable to the training set).
2. The network trained on pseudolabels is deep enough (especially when pseudolabels are generated by an ensemble of models).
3. Training data and pseudolabeled data are mixed in 1:2 – 1:4 proportions respectively.
33. Using pseudolabeling
In competitions:
- Label the test set with your ensemble;
- Train a new model on it;
- Add it to the final ensemble.
In production:
- Collect as much data as possible (both labeled and unlabeled);
- Train a model on the labeled data;
- Apply pseudolabeling.
35. Summary
1. Train the network's head
2. Add head to the convolutional part
3. Add augmentations and learning rate scheduling / CLR
4. Select appropriate loss
5. Predict with test-time augmentations
6. If you don't have enough training data, apply pseudolabeling
7. Good luck!
36. Other tricks (out of scope)
• How to select a network architecture (size, regularization, pooling type, classifier structure)
• How to select an optimizer (Adam, RMSprop, etc.)
• Training at a higher resolution
• Hard sample mining
• Ensembling