7. Train label distribution
Images per class   Number of classes   Total images, k
>10k               6                   168
1k..10k            123                 280
100..1000          1978                581
10..100            6674                222
<10                6170                33
8. Train labels:
– Geolocation
– Visual similarity
Test labels:
– Human annotators
=> Train distribution != Test distribution
Data labeling process
15. Like in Cloud:
– MRG network
• WideResNet-50-2
• Trained on scenes & attractions
– Softmax + center loss
– Centroids
– Distance threshold for NA (not an attraction)
First attempt
Name          Public LB   Private LB
MRG default   0.091       0.081
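The baseline inference above (one centroid per class, plus a distance threshold to reject non-attractions) can be sketched roughly as below; the threshold value and toy embeddings are illustrative assumptions, not the actual production settings:

```python
import numpy as np

def build_centroids(embeddings_by_class):
    """Average the L2-normalized embeddings of each class into one unit centroid."""
    centroids = {}
    for cls, embs in embeddings_by_class.items():
        m = np.mean([e / np.linalg.norm(e) for e in embs], axis=0)
        centroids[cls] = m / np.linalg.norm(m)
    return centroids

def predict(query, centroids, na_threshold=0.5):
    """Closest class by cosine similarity, or 'NA' (not an attraction)
    when even the best centroid is too far away."""
    q = query / np.linalg.norm(query)
    best_cls, best_sim = None, -1.0
    for cls, c in centroids.items():
        sim = float(q @ c)
        if sim > best_sim:
            best_cls, best_sim = cls, sim
    return best_cls if best_sim >= na_threshold else "NA"
```

In the real pipeline the embeddings come from the WideResNet-50-2 trained with softmax + center loss; here `na_threshold` stands in for the tuned distance threshold.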
16. Added:
– Scenes/OpenImages images as Not Attractions
– WeightedRandomSampler to balance the dataset
• NA = 1/2 of the weights
– Used softmax results only for NA (p > 0.5)
Little tricks
Name         Public LB   Private LB
MRG tricks   0.123       0.081
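One way to realize "NA = 1/2 of the weights" with a weighted sampler is to give the NA pool half of the total sampling mass and split the rest equally across landmark classes (so rare classes get upsampled). The helper below is an illustrative sketch under that assumption, not the competition code:

```python
from collections import Counter

def sample_weights(labels, na_label="NA", na_share=0.5):
    """Per-sample weights for a WeightedRandomSampler-style sampler:
    NA samples jointly receive `na_share` of the mass; each landmark
    class gets an equal slice of the remainder."""
    counts = Counter(labels)
    n_landmark_classes = len([c for c in counts if c != na_label])
    weights = []
    for lbl in labels:
        if lbl == na_label:
            weights.append(na_share / counts[na_label])
        else:
            weights.append((1.0 - na_share) / (n_landmark_classes * counts[lbl]))
    return weights
```

In PyTorch these weights would be passed to torch.utils.data.WeightedRandomSampler; setting na_share=1/3 gives the variant used on the next slide.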
17. Changes:
– ArcFace loss from face recognition
• m = 0.25
– NA = 1/3 of the weights
– Softmax NA threshold > 0.25
Arcface
Name      Public LB   Private LB
ArcFace   0.193       0.190 (+0.1)
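ArcFace replaces plain softmax logits with an additive angular margin on the target class: the target logit becomes s·cos(θ_y + m) while other classes keep s·cos(θ). A minimal numpy sketch using the slide's m = 0.25 (the scale s and the toy vectors are illustrative assumptions):

```python
import numpy as np

def arcface_logits(embedding, weight, target, m=0.25, s=30.0):
    """Cosine logits for all classes, with the angular margin m added
    to the target class. `embedding`: (d,); `weight`: (num_classes, d)."""
    e = embedding / np.linalg.norm(embedding)
    w = weight / np.linalg.norm(weight, axis=1, keepdims=True)
    cos = w @ e                                    # cosine to each class center
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    logits = s * cos
    logits[target] = s * np.cos(theta[target] + m)  # shrink the target logit
    return logits
```

The margin makes the target logit strictly smaller than s·cos(θ_y), forcing the network to pull same-class embeddings closer in angle than plain softmax would.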
18. Test-time augmentation
– 10 crops for inference
• center crop
• 4 corner crops
• + horizontal flip of each (2x)
TTA
Name            Public LB   Private LB
ArcFace + TTA   0.201       0.202 (+0.012)
It works!
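The ten-crop TTA above (center + 4 corners, each with a horizontal flip, embeddings averaged) can be sketched as below; `embed_fn` is a stand-in for the trained model, and crop sizes are illustrative:

```python
import numpy as np

def ten_crop(image, size):
    """5 crops (4 corners + center) of an HxWxC image, plus their
    horizontal flips = 10 views."""
    h, w = image.shape[:2]
    tops_lefts = [(0, 0), (0, w - size), (h - size, 0),
                  (h - size, w - size), ((h - size) // 2, (w - size) // 2)]
    crops = [image[t:t + size, l:l + size] for t, l in tops_lefts]
    return crops + [c[:, ::-1] for c in crops]          # flip along width

def tta_embedding(image, size, embed_fn):
    """Average the model embedding over the 10 views."""
    return np.mean([embed_fn(c) for c in ten_crop(image, size)], axis=0)
```

torchvision ships the same crop scheme as transforms.TenCrop for real pipelines.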
19. Tested other pretrained models
– ResNet-101, Inception_v3, DenseNet-121
– 1024-dim embedding layer
– 0.01 lr multiplier for pretrained blocks + warmup
– ArcFace, TTA
Other architectures
Name                      Public LB   Private LB
ResNet-101 (ImageNet)     0.223       0.205 (+0.003)
DenseNet (ImageNet)       0.220       0.213 (+0.01), 13th place
ResNet-101 (OpenImages)   0.200       0.199
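The "0.01 lr multiplier for pretrained blocks + warmup" recipe can be expressed as a per-group learning-rate schedule; the base lr and warmup length below are illustrative assumptions, not the values used in the competition:

```python
def lr_at_step(step, base_lr=0.01, warmup_steps=500, pretrained_mult=0.01):
    """Linear warmup to base_lr; pretrained backbone blocks train at a
    0.01x rate so the freshly initialized embedding head adapts first."""
    warm = min(1.0, (step + 1) / warmup_steps)
    head_lr = base_lr * warm
    return {"head": head_lr, "pretrained": head_lr * pretrained_mult}
```

With PyTorch this maps to two optimizer param groups (head vs. backbone) whose `lr` fields are updated each step.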
20. – We used all models (5 single, 3 k-fold)
• top-10 predictions for each row
– All k-fold models: averaged predictions
– Weights set according to LB (validation doesn't correlate with test)
Voting
Name       Public LB   Private LB
Ensemble   0.241       0.228 (+0.015)
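A simple form of the LB-weighted voting above: each model contributes its weight to every label in its top-10 list, and the label with the largest total wins. The sketch below is one plausible reading (it ignores rank within the top-10, which the real scheme may have used):

```python
from collections import defaultdict

def weighted_vote(model_predictions, model_weights):
    """`model_predictions`: {model_name: [label, ...]} (top-10 lists);
    `model_weights`: {model_name: weight} set from the leaderboard score."""
    scores = defaultdict(float)
    for model, labels in model_predictions.items():
        for label in labels:
            scores[label] += model_weights[model]
    return max(scores, key=scores.get)
```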
23. – Main part happens at inference
– Experiments:
• Varying the number of images per centroid
• Filtering each class
• Several centroids per class (via clustering)
• None of these improved the score
– 100 images per class is the magic constant
Centroids experiments
24. – The centroids approach is an approximation
– Raw kNN
– Filter data via hierarchical clustering, then kNN
– Optimal k = 3
kNN
Name                                  Public LB   Private LB
ResNet-101 + TTA + clean data + k=3   0.194       0.195
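The kNN step with k = 3 amounts to a majority vote over the nearest index embeddings by cosine similarity; the hierarchical-clustering cleanup is omitted here, and the toy data is illustrative:

```python
import numpy as np

def knn_predict(query, index_embs, index_labels, k=3):
    """Majority vote over the k nearest index images by cosine similarity."""
    q = query / np.linalg.norm(query)
    x = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = x @ q
    top = np.argsort(-sims)[:k]
    votes = {}
    for i in top:
        votes[index_labels[i]] = votes.get(index_labels[i], 0) + 1
    return max(votes, key=votes.get)
```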
25. 4th place
– ResNet-50 for not-attraction recognition (OpenImages)
– kNN:
• k = 5
• scores[class_id] = sum(cos(query_image, index) for index in K_closest_images)
• scores[class_id] /= min(K, number of samples in train dataset with class=class_id)
• label = argmax(scores), confidence = scores[label]
– 100 augmented local crops from each image + kNN
– Simple voting
– Pure single model, overfitting
Other competitors
Name   Public LB   Private LB
ods    0.323       0.255
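The 4th-place scoring rule quoted above (summed cosine similarities over the K closest index images, normalized by min(K, class size)) translates directly to code; the embeddings here are toy stand-ins:

```python
import numpy as np

def knn_scores(query, index_embs, index_labels, k=5):
    """scores[class] = sum of cosine sims over the K closest index images,
    divided by min(K, number of index samples of that class)."""
    q = query / np.linalg.norm(query)
    x = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = x @ q
    closest = np.argsort(-sims)[:k]
    class_sizes = {}
    for lbl in index_labels:
        class_sizes[lbl] = class_sizes.get(lbl, 0) + 1
    scores = {}
    for i in closest:
        lbl = index_labels[i]
        scores[lbl] = scores.get(lbl, 0.0) + float(sims[i])
    for lbl in scores:
        scores[lbl] /= min(k, class_sizes[lbl])
    label = max(scores, key=scores.get)
    return label, scores[label]
```

The min(K, class size) normalization keeps rare classes (few index images) from being penalized for having fewer than K candidates in the neighborhood.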
37. Mail.ru Vision solution: 10th place
Train
– ArcFace for metric learning (instead of softmax)
– Data skew: sample ~ sqrt(class frequency)
– Added Places/OpenImages images for not landmarks (1/3 of the sampled dataset)
– Models: ResNet-101, WideResNet-50-2, DenseNet-121, Inception_v3
Inference
– Centroid per landmark (random 100 elements): closest by cosine distance
– Not landmarks excluded by Softmax
– TTA: ten-crop augmentation
Model                                    Public LB   Private LB
DenseNet-121 (from ImageNet)             0.220       0.213 (13th place)
Average of 5 single models & 3 k-folds   0.241       0.228
Post about our solution.
Team members:
• Eduard Tyantov
tyantov@corp.mail.ru
• Andrei Boiarov
a.boiarov@corp.mail.ru