This document summarizes Dmytro Panchenko's approach to classifying proteins in human cells in Kaggle's Human Protein Atlas competition. The task was to predict the subcellular localization of proteins in microscopy images via multilabel classification over 28 classes. Panchenko's approach involved obtaining external data, setting up an adversarial validation scheme, training neural networks with techniques such as cyclical learning rates and test-time augmentation, and ensembling models by stacking their predictions. Extensive validation, including nested k-fold cross-validation, was used to obtain stable per-class thresholds. This stacking and validation pipeline earned Panchenko a top-8 position on the private leaderboard.
1. Cracking Kaggle: Human Protein Atlas classification
Dmytro Panchenko
Machine learning engineer, Altexsoft
2. Kaggle
Kaggle is an online community of data scientists and machine learners.
Why do we love it:
1. Learning.
2. Networking.
3. Benchmarking your skills.
4. Competing.
3. Competition task
• Predicting protein localization
• Multilabel classification
• 28 classes
• 31.1k train, 11.7k test
• F1-macro (average of F1 scores computed per class)
• 4 channels per image
4. (image-only slide)
5. Challenges
• Some classes are extremely small (about a dozen samples per class)
• External data is allowed (risk of data leakage)
• Images are 2048x2048 (GPU memory matters)
6. Our approach
• Get external data and find duplicates
• Set up validation
• Train neural networks
• Stack
• Pray
7. Validation: adversarial scheme
• Holdout (we don't have enough hardware for proper k-fold)
• Validation samples are selected by adversarial validation
(Diagram: the whole dataset, shuffled train and test together, is fed to a classifier (convolutional neural network) that tries to tell train from test; a high ROC-AUC leads to the adversarial holdout, a ROC-AUC ≈ 0.5 leads to a plain stratified split.)
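The adversarial selection step can be sketched as follows. This is an illustrative sketch on synthetic tabular features with a linear model standing in for the convolutional network from the talk; all names and the 20% holdout size are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical feature matrices; a slight mean shift imitates the
# train/test domain gap that adversarial validation exploits.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 8))
X_test = rng.normal(0.3, 1.0, size=(100, 8))

X = np.vstack([X_train, X_test])
y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

# Out-of-fold probability that each sample "looks like test".
clf = LogisticRegression(max_iter=1000)
p_test_like = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

# Train samples most similar to the test set form the adversarial holdout.
train_scores = p_test_like[: len(X_train)]
holdout_idx = np.argsort(train_scores)[-40:]   # top 20% most test-like
train_idx = np.setdiff1d(np.arange(len(X_train)), holdout_idx)
```

The point of using out-of-fold probabilities is that each train sample is scored by a model that never saw it, so the "test-likeness" ranking is not inflated by overfitting.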
8. Validation: folds in holdout
• Every "train-train" and "train-external" duplicate pair goes entirely into either the train part or the validation part
• Every "external-test" duplicate pair goes into validation
• For each class we run 5-fold validation on the holdout and average it 20 times to obtain stable thresholds
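The repeated-fold threshold averaging can be sketched like this. It is an illustrative numpy/scikit-learn sketch: the threshold grid, the synthetic data, and the function name are assumptions, not the authors' exact settings.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

def stable_threshold(probs, labels, n_repeats=20, n_folds=5, seed=0):
    """For one class: repeat k-fold splitting, pick the best-F1 threshold
    on each validation fold, and average the picks for stability."""
    grid = np.linspace(0.05, 0.95, 19)
    picks = []
    for rep in range(n_repeats):
        kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed + rep)
        for _, val_idx in kf.split(probs):
            scores = [f1_score(labels[val_idx], probs[val_idx] > t,
                               zero_division=0) for t in grid]
            picks.append(grid[int(np.argmax(scores))])
    return float(np.mean(picks))

# Synthetic demo: positives score around 0.85, negatives around 0.25.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=300)
probs = np.clip(0.25 + 0.6 * labels + rng.normal(0, 0.15, size=300), 0, 1)
thr = stable_threshold(probs, labels, n_repeats=5)
```

Averaging over repeated splits matters most for the tiny classes: a single fold's best threshold is extremely noisy when only a handful of positives land in it.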
9. Train networks: fast.ai
Pros:
• Lots of advanced techniques out of the box
• Built on PyTorch (so it is fast)
Cons:
• Awkward API and lots of bugs (we had to patch a lot)
• 40 releases during the competition
10. Train networks: general approach
• SE-ResNeXt-50 for conducting experiments
• Train on RGB
• Focal loss, LSEP loss (https://arxiv.org/abs/1704.03135)
• Simple one-layer head
• Augmentations: D4 symmetry group, brightness, warp, crop, resize
• Test-time augmentations: 32x, same transforms as we used during training
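A minimal numpy sketch of binary focal loss for multilabel targets; the talk trains with fast.ai/PyTorch, and the `gamma`/`alpha` values here are common defaults from the focal loss paper, not necessarily the ones used.

```python
import numpy as np

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss, applied independently per class (multilabel).
    Down-weights easy examples so rare-class errors dominate the loss."""
    p = 1.0 / (1.0 + np.exp(-logits))            # sigmoid probability per class
    pt = np.where(targets == 1, p, 1.0 - p)      # probability of the true label
    w = np.where(targets == 1, alpha, 1.0 - alpha)
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt + 1e-8)))

# Confident-correct predictions incur almost no loss; confident-wrong
# predictions are penalized heavily.
good = focal_loss(np.array([5.0, -5.0]), np.array([1, 0]))
bad = focal_loss(np.array([-5.0, 5.0]), np.array([1, 0]))
```

The `(1 - pt) ** gamma` factor is what makes this attractive for the extremely small classes in this competition: well-classified majority-class pixels contribute almost nothing.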
11. Train networks: LR tricks
• Choose the learning rate with an LR finder (https://arxiv.org/abs/1506.01186)
• Use a cyclical learning rate to avoid local minima
• Use differential learning rates to smooth out transfer learning
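The cyclical schedule from the cited paper can be sketched as a triangular wave between a base and a maximum learning rate; the values and cycle length below are illustrative, not the ones used in the competition.

```python
def triangular_lr(step, base_lr=1e-4, max_lr=1e-3, cycle_len=2000):
    """Triangular cyclical learning rate (Smith, 2015): within each cycle
    the LR rises linearly from base_lr to max_lr, then falls back."""
    pos = (step % cycle_len) / cycle_len          # position within the cycle
    frac = 2 * pos if pos < 0.5 else 2 * (1 - pos)
    return base_lr + (max_lr - base_lr) * frac
```

The periodic jumps back up to `max_lr` are what help the optimizer escape sharp local minima; frameworks like fast.ai and `torch.optim.lr_scheduler.CyclicLR` implement the same idea with extra options.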
12. Train networks: batch norm trick
batch size ≥ 16      → train BN with any momentum
16 > batch size ≥ 6  → train BN with low momentum
6 > batch size ≥ 4   → 1. freeze BN  2. train almost to convergence
                       3. unfreeze BN  4. train BN with low momentum
4 > batch size       → oh crap: freeze BN
Disclaimer: tested by me only on HPA competition. Use at your own risk.
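The decision table above can be captured as a small helper; this is a direct transcription of the slide with hypothetical naming, and it carries the same disclaimer as the slide itself.

```python
def bn_policy(batch_size):
    """Map the effective batch size to the batch-norm handling from the
    slide: small batches make BN statistics too noisy to train normally."""
    if batch_size >= 16:
        return "train BN, any momentum"
    if batch_size >= 6:
        return "train BN, low momentum"
    if batch_size >= 4:
        return ("freeze BN, train almost to convergence, "
                "then unfreeze BN and train it with low momentum")
    return "freeze BN"
```

The underlying intuition: BN running statistics are estimated from each batch, so the smaller the batch, the less the per-batch estimates can be trusted, and the more the updates must be damped (low momentum) or avoided entirely (freezing).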
13. Things which didn’t work
• RGBY
• Sample pairing
• Mixup augmentation
• Training one-vs-all models
• Domain-inspired augmentations
• Large architectures (NASNet, SENet-154)
• Training on high-resolution images
• Snapshot ensembling
15. Stacking: features
• Predictions of 14 models, varying in:
  • Architecture (SE-ResNeXt-50, InceptionV4, BN-Inception, Xception)
  • Scale (mostly 512x512, but also 256x256, 768x768)
  • Channels (13 RGB models, 1 RGBY model)
  • Target (mostly all classes, but also a network trained on minor classes only)
• Meta-features: brightness, contrast, correlation between channels, etc.
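Meta-features like these can be computed per image roughly as follows; an illustrative numpy sketch, since the slides name only brightness, contrast, and channel correlation and the exact feature set is not specified.

```python
import numpy as np

def meta_features(img):
    """Stacking meta-features for one image given as an (H, W, C) array:
    per-channel brightness (mean), contrast (std), and the upper triangle
    of the channel-by-channel correlation matrix."""
    flat = img.reshape(-1, img.shape[-1])          # (pixels, channels)
    brightness = flat.mean(axis=0)
    contrast = flat.std(axis=0)
    corr = np.corrcoef(flat.T)                     # channels x channels
    upper = corr[np.triu_indices(img.shape[-1], k=1)]
    return np.concatenate([brightness, contrast, upper])

# Demo on a random 4-channel image: 4 + 4 + 6 = 14 features.
rng = np.random.default_rng(3)
feats = meta_features(rng.random((32, 32, 4)))
```

Features like these give the stacker image-level context the per-model probabilities lack, e.g. a dim, low-contrast image may deserve different thresholds.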
16. K-fold for each class
(Diagram: for each class, the train set (holdout) is split into five outer folds: four outer train folds and one outer validation fold; the test set is kept separate.)
17. K-fold inside outer training folds
(Diagram: each outer training set is split again into four inner train folds and one inner validation fold, while the outer validation fold and the test set stay held out.)
18. Predict test set and outer validation fold
(Diagram: each inner model predicts both the test set and the outer validation fold; we already know the threshold for each inner model from its inner validation fold.)
19. Calculate voting thresholds for each outer fold
(Diagram: for each of the five outer folds, voting thresholds are calculated using predictions voted in from the models of the other outer folds; the test set receives votes from all outer folds.)
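The nested scheme of slides 16-18 can be sketched with scikit-learn. This is a toy stand-in: logistic regression replaces the boosting models, the data is synthetic, and the per-model thresholds and voting from slides 18-19 are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def nested_oof(X, y, X_test, n_outer=5, n_inner=4, seed=0):
    """Nested k-fold: models fitted on inner train folds predict the outer
    validation fold (averaged over inner folds) and the test set (averaged
    over all inner x outer models)."""
    outer_pred = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    for tr_idx, va_idx in outer.split(X):
        inner = KFold(n_splits=n_inner, shuffle=True, random_state=seed + 1)
        for in_tr, _ in inner.split(tr_idx):
            model = LogisticRegression(max_iter=1000)
            model.fit(X[tr_idx[in_tr]], y[tr_idx[in_tr]])
            outer_pred[va_idx] += model.predict_proba(X[va_idx])[:, 1] / n_inner
            test_pred += model.predict_proba(X_test)[:, 1] / (n_inner * n_outer)
    return outer_pred, test_pred

# Synthetic demo data.
rng = np.random.default_rng(2)
X = rng.normal(size=(240, 6))
y = rng.integers(0, 2, size=240)
X_test = rng.normal(size=(60, 6))
oof, test_p = nested_oof(X, y, X_test)
```

Every outer-validation prediction comes from models that never saw that fold, which is what makes the per-fold threshold and voting calculations in slide 19 unbiased; repeating the whole procedure, as slide 20 describes, multiplies the model count quickly.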
20. To make things worse:
• We repeat inner k-folds ~3 times
• We repeat outer k-folds ~5 times with random downsampling
• For minor classes we repeat outer folds ~5 times and vote between those iterations for additional stability
• Approximately 375 boosting models are fitted for each class
21. Results & credits
• Best single network - top-21 private LB
• Stacking - top-8 private LB
Sergei Fironov, Dmitry Buslov, Dmytro Panchenko, Alexander Kiselev