Kaggle: Google Landmark Recognition (CVPR)
Andrei Boiarov, Eduard Tyantov
Challenge overview
– Goal: landmark recognition
– 14,951 classes
– 1.2M train images
– 117k test images
Examples
Examples
Examples: more challenging
Train label distribution

Images per class | Number of classes | Total images (k)
>10k             | 6                 | 168
1k..10k          | 123               | 280
100..1000        | 1978              | 581
10..100          | 6674              | 222
<10              | 6170              | 33
Data labeling process
Train
– Geolocation
– Visual similarity
Test
– Human annotators
=> Train distribution != Test distribution
Test set
90-95% of the 117k test images contain no landmarks
Test juice
Evaluation
Global Average Precision (GAP)
– Order matters
– Errors in the first positions greatly affect the score
(a minimal GAP sketch follows below)
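A minimal sketch of the GAP metric described above, assuming predictions from all test images are pooled into (confidence, is_correct) pairs; the function and argument names are illustrative, not from the slides:

```python
def global_average_precision(predictions, num_positives):
    """predictions: list of (confidence, is_correct) pairs over all test images;
    num_positives: number of test images that actually contain a landmark."""
    # Sort all predictions by confidence, highest first: order matters.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            correct_so_far += 1
            total += correct_so_far / rank  # precision at this rank
    return total / num_positives
```

Because precision is accumulated rank by rank, a single wrong prediction placed at the top drags down the contribution of everything below it, which is why the first positions matter so much.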
Solutions

How to train?
Options:
– Softmax
– Metric learning: center loss, arcface, triplet, …
– DELF: key points
  • reported poor performance
Inference
How to handle non-landmarks?
Options:
– Softmax
– Centroids
– kNN
First attempt
As in Cloud:
– MRG network
  • WideResNet-50-2
  • trained on scenes & landmarks
– Softmax + center loss
– Centroids
– Distance threshold for NA (non-landmarks)

Name        | Public LB | Private LB
MRG default | 0.091     | 0.081
Little tricks
Added:
– Scenes/OpenImages images as non-landmarks
– WeightedRandomSampler to balance the dataset (see the sketch below)
  • NA = 1/2 of the total weight
– Softmax results used only for NA (p > 0.5)

Name       | Public LB | Private LB
MRG tricks | 0.123     | 0.081
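A minimal sketch, assuming PyTorch, of the balancing above: landmark classes split half of the sampling weight evenly, while the non-landmark (NA) samples together receive the other half. The NA_CLASS marker and function name are assumptions for illustration:

```python
import collections
import torch
from torch.utils.data import WeightedRandomSampler

NA_CLASS = -1  # hypothetical marker for non-landmark samples

def make_sampler(labels, na_share=0.5):
    """labels: per-sample class ids; NA samples get `na_share` of the total weight."""
    counts = collections.Counter(labels)
    n_landmark_classes = len([c for c in counts if c != NA_CLASS])
    weights = []
    for y in labels:
        if y == NA_CLASS:
            # all NA samples together get `na_share` of the total weight
            w = na_share / counts[NA_CLASS]
        else:
            # remaining weight split evenly across landmark classes,
            # then evenly across the samples of each class
            w = (1 - na_share) / n_landmark_classes / counts[y]
        weights.append(w)
    return WeightedRandomSampler(torch.tensor(weights, dtype=torch.double),
                                 num_samples=len(labels), replacement=True)
```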
Arcface
Changes:
– ArcFace from face recognition (see the sketch below)
  • m = 0.25
– NA = 1/3 of the total weight
– Softmax NA threshold > 0.25

Name    | Public LB | Private LB
Arcface | 0.193     | 0.190 (+0.1)
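A minimal sketch of an ArcFace head with the additive angular margin m = 0.25 from the slide; the scale s = 30 and all names are assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    def __init__(self, embedding_dim, num_classes, s=30.0, m=0.25):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # cosine similarity between L2-normalized embeddings and class weights
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # add the angular margin only to the target-class logit
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return self.s * logits  # feed into nn.CrossEntropyLoss
```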
TTA
Test-time augmentation (see the sketch below):
– 10 crops for inference
  • center crop
  • 4 corners
  • + horizontal flips (2x)

Name          | Public LB | Private LB
Arcface + TTA | 0.201     | 0.202 (+0.012)

It works!
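A minimal sketch of the ten-crop TTA, assuming torchvision and a model that returns embeddings; averaging the ten embeddings is an assumption about how the crops were combined:

```python
import torch
from torchvision import transforms

ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),  # 4 corners + center, plus their horizontal flips
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),
])

def embed_with_tta(model, pil_image):
    crops = ten_crop(pil_image)        # (10, 3, 224, 224)
    with torch.no_grad():
        embeddings = model(crops)      # (10, embedding_dim)
    return embeddings.mean(dim=0)      # average over the 10 crops
```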
Other architectures
Tested other pretrained models (lr-multiplier sketch below):
– ResNet-101, Inception_v3, DenseNet-121
– 1024-d embedding layer
– 0.01 lr multiplier for pretrained blocks + warmup
– ArcFace, TTA

Name                    | Public LB | Private LB
ResNet-101 (ImageNet)   | 0.223     | 0.205 (+0.003)
DenseNet (ImageNet)     | 0.220     | 0.213 (+0.01), 13th place
ResNet-101 (OpenImages) | 0.200     | 0.199
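A minimal sketch, assuming PyTorch/torchvision, of the 0.01 lr multiplier for pretrained blocks, the 1024-d embedding layer, and a simple warmup; the base lr, optimizer choice, and warmup length are assumptions:

```python
import torch
import torchvision

# pretrained backbone (torchvision >= 0.13 weights API assumed)
model = torchvision.models.resnet101(weights="DEFAULT")
model.fc = torch.nn.Linear(model.fc.in_features, 1024)  # 1024-d embedding layer

backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc")]
head_params = list(model.fc.parameters())

base_lr = 1e-3  # assumed
optimizer = torch.optim.SGD([
    {"params": backbone_params, "lr": base_lr * 0.01},  # pretrained blocks
    {"params": head_params,     "lr": base_lr},         # new embedding layer
], momentum=0.9)

# linear warmup over the first 5 epochs (length assumed)
warmup = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: min(1.0, (epoch + 1) / 5))
```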
Voting
– We used all models (5 single, 3 k-fold), see the voting sketch below
  • top-10 predictions for each row
– All k-fold models: averaged predictions
– Weights set according to LB (validation doesn't correlate with test)

Name     | Public LB | Private LB
Ensemble | 0.241     | 0.228 (+0.015)
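A minimal sketch of the LB-weighted voting over per-model top-10 predictions; the weight values and the input layout are assumptions for illustration:

```python
import collections

def ensemble_vote(per_model_top10, model_weights):
    """per_model_top10: {model_name: [(class_id, confidence), ...]} for one test image;
    model_weights: {model_name: weight set according to LB score}."""
    scores = collections.defaultdict(float)
    for model_name, preds in per_model_top10.items():
        for class_id, conf in preds:
            scores[class_id] += model_weights[model_name] * conf
    label = max(scores, key=scores.get)
    return label, scores[label]
```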
Experiments

Centroids
[Diagram: a class centroid computed from several image embeddings]
Centroids experiments
– Main part happens at inference (see the sketch below)
– Experiments:
  • varying the number of images per centroid
  • filtering each class
  • several centroids per class (via clustering)
  • none of these improved the score
– 100 images per class is the magic constant
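A minimal sketch of centroid-based inference as described above: each class centroid averages up to 100 L2-normalized embeddings, and a query is assigned to the closest centroid by cosine similarity; all names are illustrative:

```python
import numpy as np

def build_centroids(embeddings_by_class, max_images=100, seed=0):
    """embeddings_by_class: {class_id: list of embedding vectors}."""
    rng = np.random.default_rng(seed)
    centroids = {}
    for class_id, embs in embeddings_by_class.items():
        embs = np.asarray(embs)
        if len(embs) > max_images:
            embs = embs[rng.choice(len(embs), max_images, replace=False)]
        c = embs.mean(axis=0)
        centroids[class_id] = c / np.linalg.norm(c)  # normalize the centroid
    return centroids

def classify(query_embedding, centroids):
    q = query_embedding / np.linalg.norm(query_embedding)
    # cosine similarity to every class centroid
    sims = {cid: float(q @ c) for cid, c in centroids.items()}
    label = max(sims, key=sims.get)
    return label, sims[label]
```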
kNN
– The centroids approach is an approximation
– Raw kNN
– Filter data via hierarchical clustering, then kNN
– Optimal k = 3

Name                                | Public LB | Private LB
ResNet-101 + TTA + clean data + k=3 | 0.194     | 0.195
Other competitors
4th place:
– ResNet-50 for non-landmark recognition (OpenImages)
– kNN (runnable sketch below):
  • k = 5
  • scores[class_id] = sum(cos(query_image, index) for index in K_closest_images)
  • scores[class_id] /= min(K, number of samples in train dataset with class=class_id)
  • label = argmax(scores), confidence = scores[label]
– 100 augmented local crops from each image + kNN
– Simple voting
– Pure single model, overfitting

Name | Public LB | Private LB
ods  | 0.323     | 0.255
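A runnable version of the 4th-place kNN scoring pseudocode above, assuming L2-normalized embeddings so the dot product equals cosine similarity; variable names are illustrative:

```python
import collections
import numpy as np

def knn_predict(query, index_embeddings, index_labels, train_counts, k=5):
    """index_embeddings: (N, d) normalized train embeddings; index_labels: (N,) class ids;
    train_counts: {class_id: number of train samples with that class}."""
    sims = index_embeddings @ query            # cosine similarity to every index image
    nearest = np.argsort(-sims)[:k]            # K closest images
    scores = collections.defaultdict(float)
    for i in nearest:
        scores[index_labels[i]] += sims[i]     # sum of cosines per class
    for class_id in scores:
        scores[class_id] /= min(k, train_counts[class_id])
    label = max(scores, key=scores.get)
    return label, scores[label]
```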
Results
Impressive examples: test query (1)
Impressive examples: results (1)
Impressive examples: results (1)
Impressive examples: test query (2)
Impressive examples: results (2)
Impressive examples: results (2)
Impressive examples: test query (3)
Impressive examples: results (3)
Impressive examples: results (3)
Private Leaderboard
Write-up of our solution: https://www.kaggle.com/c/landmark-recognition-challenge/discussion/58050
Mail.ru Vision solution: 10th place
Train
– ArcFace for metric learning (instead of Softmax)
– Data skew: sample ~ sqrt(class frequency)
– Added Places/OpenImages images as non-landmarks (1/3 of the sampled dataset)
– Models: ResNet-101, WideResNet-50-2, DenseNet-121, Inception_v3
Inference
– Centroid per landmark (random 100 elements): closest by cosine distance
– Non-landmarks excluded by Softmax
– TTA: ten-crop augmentation
Model                                  | Public LB | Private LB
DenseNet-121 (from ImageNet)           | 0.220     | 0.213 (13th place)
Average of 5 single models & 3 k-folds | 0.241     | 0.228
Post about our solution.
Team members:
• Eduard Tyantov
tyantov@corp.mail.ru
• Andrei Boiarov
a.boiarov@corp.mail.ru
