This document summarizes Dmytro Panchenko's approach to classifying proteins in human cells in Kaggle's Human Protein Atlas competition. The task was to predict the subcellular localization of proteins in microscopy images via multilabel classification over 28 classes. Panchenko's approach involved obtaining external data, setting up an adversarial validation scheme, training neural networks with techniques such as cyclical learning rates and test-time augmentation, and ensembling models by stacking their predictions. Extensive validation, including nested k-fold cross-validation, was used to obtain stable per-class thresholds. This stacking and validation pipeline earned Panchenko a top-8 position on the private leaderboard.
1. Cracking Kaggle: Human Protein Atlas classification
Dmytro Panchenko
Machine learning engineer, Altexsoft
2. Kaggle
Kaggle is an online community of data scientists and machine learners.
Why do we love it:
1. Learning.
2. Networking.
3. Benchmarking your skills.
4. Competing.
3. Competition task
• Predicting protein localization
• Multilabel classification
• 28 classes
• 31.1k train, 11.7k test
• F1-macro (average of F1 scores computed per class)
• 4 channels per image
4. (image-only slide)
5. Challenges
• Some classes are extremely small (about a dozen samples per class)
• External data is allowed (risk of data leakage)
• Images are 2048x2048 (GPU memory matters)
6. Our approach
• Get external data and find duplicates
• Set up validation
• Train neural networks
• Stack
• Pray
7. Validation: adversarial scheme
• Holdout (we don't have enough hardware for proper k-fold)
• Validation samples are selected by adversarial validation
(Diagram: the whole dataset, shuffled train and test together, is fed to a classifier (convolutional neural network) that tries to tell train from test; a high ROC-AUC leads to the adversarial holdout, a ROC-AUC ≈ 0.5 leads to a plain stratified split.)
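The adversarial selection step can be sketched as follows. This is an illustrative sketch on synthetic tabular features with a linear model standing in for the convolutional network from the talk; all names and the 20% holdout size are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical feature matrices; a slight mean shift imitates the
# train/test domain gap that adversarial validation exploits.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 8))
X_test = rng.normal(0.3, 1.0, size=(100, 8))

X = np.vstack([X_train, X_test])
y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

# Out-of-fold probability that each sample "looks like test".
clf = LogisticRegression(max_iter=1000)
p_test_like = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

# Train samples most similar to the test set form the adversarial holdout.
train_scores = p_test_like[: len(X_train)]
holdout_idx = np.argsort(train_scores)[-40:]   # top 20% most test-like
train_idx = np.setdiff1d(np.arange(len(X_train)), holdout_idx)
```

The point of using out-of-fold probabilities is that each train sample is scored by a model that never saw it, so the "test-likeness" ranking is not inflated by overfitting.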
8. Validation: folds in holdout
• Every "train-train" and "train-external" duplicate pair goes entirely into either the train part or the validation part
• Every "external-test" duplicate pair goes into validation
• For each class we run 5-fold validation on the holdout and average it 20 times to obtain stable thresholds
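The repeated-fold threshold averaging can be sketched like this. It is an illustrative numpy/scikit-learn sketch: the threshold grid, the synthetic data, and the function name are assumptions, not the authors' exact settings.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

def stable_threshold(probs, labels, n_repeats=20, n_folds=5, seed=0):
    """For one class: repeat k-fold splitting, pick the best-F1 threshold
    on each validation fold, and average the picks for stability."""
    grid = np.linspace(0.05, 0.95, 19)
    picks = []
    for rep in range(n_repeats):
        kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed + rep)
        for _, val_idx in kf.split(probs):
            scores = [f1_score(labels[val_idx], probs[val_idx] > t,
                               zero_division=0) for t in grid]
            picks.append(grid[int(np.argmax(scores))])
    return float(np.mean(picks))

# Synthetic demo: positives score around 0.85, negatives around 0.25.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=300)
probs = np.clip(0.25 + 0.6 * labels + rng.normal(0, 0.15, size=300), 0, 1)
thr = stable_threshold(probs, labels, n_repeats=5)
```

Averaging over repeated splits matters most for the tiny classes: a single fold's best threshold is extremely noisy when only a handful of positives land in it.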
9. Train networks: fast.ai
Pros:
• Lots of advanced techniques out of the box
• Built on PyTorch (so it is fast)
Cons:
• Awkward API and lots of bugs (we had to patch a lot)
• 40 releases during the competition
10. Train networks: general approach
• SE-ResNeXt-50 for conducting experiments
• Train on RGB
• Focal loss, LSEP loss (https://arxiv.org/abs/1704.03135)
• Simple one-layer head
• Augmentations: D4 symmetry group, brightness, warp, crop, resize
• Test-time augmentations: 32x, same transforms as we used during training
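A minimal numpy sketch of binary focal loss for multilabel targets; the talk trains with fast.ai/PyTorch, and the `gamma`/`alpha` values here are common defaults from the focal loss paper, not necessarily the ones used.

```python
import numpy as np

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss, applied independently per class (multilabel).
    Down-weights easy examples so rare-class errors dominate the loss."""
    p = 1.0 / (1.0 + np.exp(-logits))            # sigmoid probability per class
    pt = np.where(targets == 1, p, 1.0 - p)      # probability of the true label
    w = np.where(targets == 1, alpha, 1.0 - alpha)
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt + 1e-8)))

# Confident-correct predictions incur almost no loss; confident-wrong
# predictions are penalized heavily.
good = focal_loss(np.array([5.0, -5.0]), np.array([1, 0]))
bad = focal_loss(np.array([-5.0, 5.0]), np.array([1, 0]))
```

The `(1 - pt) ** gamma` factor is what makes this attractive for the extremely small classes in this competition: well-classified majority-class pixels contribute almost nothing.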
11. Train networks: LR tricks
• Choose the learning rate with an LR finder (https://arxiv.org/abs/1506.01186)
• Use a cyclical learning rate to avoid local minima
• Use differential learning rates to smooth out transfer learning
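The cyclical schedule from the cited paper can be sketched as a triangular wave between a base and a maximum learning rate; the values and cycle length below are illustrative, not the ones used in the competition.

```python
def triangular_lr(step, base_lr=1e-4, max_lr=1e-3, cycle_len=2000):
    """Triangular cyclical learning rate (Smith, 2015): within each cycle
    the LR rises linearly from base_lr to max_lr, then falls back."""
    pos = (step % cycle_len) / cycle_len          # position within the cycle
    frac = 2 * pos if pos < 0.5 else 2 * (1 - pos)
    return base_lr + (max_lr - base_lr) * frac
```

The periodic jumps back up to `max_lr` are what help the optimizer escape sharp local minima; frameworks like fast.ai and `torch.optim.lr_scheduler.CyclicLR` implement the same idea with extra options.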
12. Train networks: batch norm trick
batch size ≥ 16      → train BN with any momentum
16 > batch size ≥ 6  → train BN with low momentum
6 > batch size ≥ 4   → 1. freeze BN  2. train almost to convergence
                       3. unfreeze BN  4. train BN with low momentum
4 > batch size       → oh crap: freeze BN
Disclaimer: tested by me only on HPA competition. Use at your own risk.
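The decision table above can be captured as a small helper; this is a direct transcription of the slide with hypothetical naming, and it carries the same disclaimer as the slide itself.

```python
def bn_policy(batch_size):
    """Map the effective batch size to the batch-norm handling from the
    slide: small batches make BN statistics too noisy to train normally."""
    if batch_size >= 16:
        return "train BN, any momentum"
    if batch_size >= 6:
        return "train BN, low momentum"
    if batch_size >= 4:
        return ("freeze BN, train almost to convergence, "
                "then unfreeze BN and train it with low momentum")
    return "freeze BN"
```

The underlying intuition: BN running statistics are estimated from each batch, so the smaller the batch, the less the per-batch estimates can be trusted, and the more the updates must be damped (low momentum) or avoided entirely (freezing).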
13. Things which didn’t work
• RGBY
• Sample pairing
• Mixup augmentation
• Training one-vs-all models
• Domain-inspired augmentations
• Large architectures (NASNet, SENet-154)
• Training on high-resolution images
• Snapshot ensembling
15. Stacking: features
• Predictions of 14 models, varying in:
  • Architecture (SE-ResNeXt-50, InceptionV4, BN-Inception, Xception)
  • Scale (mostly 512x512, but also 256x256, 768x768)
  • Channels (13 RGB models, 1 RGBY model)
  • Target (mostly all classes, but also a network trained on minor classes only)
• Meta-features: brightness, contrast, correlation between channels, etc.
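Meta-features like these can be computed per image roughly as follows; an illustrative numpy sketch, since the slides name only brightness, contrast, and channel correlation and the exact feature set is not specified.

```python
import numpy as np

def meta_features(img):
    """Stacking meta-features for one image given as an (H, W, C) array:
    per-channel brightness (mean), contrast (std), and the upper triangle
    of the channel-by-channel correlation matrix."""
    flat = img.reshape(-1, img.shape[-1])          # (pixels, channels)
    brightness = flat.mean(axis=0)
    contrast = flat.std(axis=0)
    corr = np.corrcoef(flat.T)                     # channels x channels
    upper = corr[np.triu_indices(img.shape[-1], k=1)]
    return np.concatenate([brightness, contrast, upper])

# Demo on a random 4-channel image: 4 + 4 + 6 = 14 features.
rng = np.random.default_rng(3)
feats = meta_features(rng.random((32, 32, 4)))
```

Features like these give the stacker image-level context the per-model probabilities lack, e.g. a dim, low-contrast image may deserve different thresholds.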
16. K-fold for each class
(Diagram: for each class, the train set (holdout) is split into five outer folds: four outer train folds and one outer validation fold; the test set is kept separate.)
17. K-fold inside outer training folds
(Diagram: each outer training set is split again into four inner train folds and one inner validation fold, while the outer validation fold and the test set stay held out.)
18. Predict test set and outer validation fold
(Diagram: each inner model predicts both the test set and the outer validation fold; we already know the threshold for each inner model from its inner validation fold.)
19. Calculate voting thresholds for each outer fold
(Diagram: for each of the five outer folds, voting thresholds are calculated using predictions voted in from the models of the other outer folds; the test set receives votes from all outer folds.)
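The nested scheme of slides 16-18 can be sketched with scikit-learn. This is a toy stand-in: logistic regression replaces the boosting models, the data is synthetic, and the per-model thresholds and voting from slides 18-19 are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def nested_oof(X, y, X_test, n_outer=5, n_inner=4, seed=0):
    """Nested k-fold: models fitted on inner train folds predict the outer
    validation fold (averaged over inner folds) and the test set (averaged
    over all inner x outer models)."""
    outer_pred = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    for tr_idx, va_idx in outer.split(X):
        inner = KFold(n_splits=n_inner, shuffle=True, random_state=seed + 1)
        for in_tr, _ in inner.split(tr_idx):
            model = LogisticRegression(max_iter=1000)
            model.fit(X[tr_idx[in_tr]], y[tr_idx[in_tr]])
            outer_pred[va_idx] += model.predict_proba(X[va_idx])[:, 1] / n_inner
            test_pred += model.predict_proba(X_test)[:, 1] / (n_inner * n_outer)
    return outer_pred, test_pred

# Synthetic demo data.
rng = np.random.default_rng(2)
X = rng.normal(size=(240, 6))
y = rng.integers(0, 2, size=240)
X_test = rng.normal(size=(60, 6))
oof, test_p = nested_oof(X, y, X_test)
```

Every outer-validation prediction comes from models that never saw that fold, which is what makes the per-fold threshold and voting calculations in slide 19 unbiased; repeating the whole procedure, as slide 20 describes, multiplies the model count quickly.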
20. To make things worse:
• We repeat inner k-folds ~3 times
• We repeat outer k-folds ~5 times with random downsampling
• For minor classes we repeat outer folds ~5 times and vote between those iterations for additional stability
• Approximately 375 boosting models are fitted for each class
21. Results & credits
• Best single network - top-21 private LB
• Stacking - top-8 private LB
Sergei Fironov, Dmitry Buslov, Dmytro Panchenko, Alexander Kiselev