7. Train label distribution
Images per class   Number of classes   Total images, k
>10k               6                   168
1k..10k            123                 280
100..1000          1978                581
10..100            6674                222
<10                6170                33
8. Train labels:
– Geolocation
– Visual similarity
Test labels:
– Human annotators
=> Train distribution != Test distribution
Data labeling process
15. Like in Cloud:
– MRG network
• WideResNet-50-2
• Trained on scenes & attractions
– Softmax + center loss
– Centroids
– Distance threshold for NA (not an attraction)
First attempt
Name          Public LB   Private LB
MRG default   0.091       0.081
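The baseline inference above (one centroid per class, plus a distance threshold to reject non-attractions) can be sketched roughly as below; the threshold value and toy embeddings are illustrative assumptions, not the actual production settings:

```python
import numpy as np

def build_centroids(embeddings_by_class):
    """Average the L2-normalized embeddings of each class into one unit centroid."""
    centroids = {}
    for cls, embs in embeddings_by_class.items():
        m = np.mean([e / np.linalg.norm(e) for e in embs], axis=0)
        centroids[cls] = m / np.linalg.norm(m)
    return centroids

def predict(query, centroids, na_threshold=0.5):
    """Closest class by cosine similarity, or 'NA' (not an attraction)
    when even the best centroid is too far away."""
    q = query / np.linalg.norm(query)
    best_cls, best_sim = None, -1.0
    for cls, c in centroids.items():
        sim = float(q @ c)
        if sim > best_sim:
            best_cls, best_sim = cls, sim
    return best_cls if best_sim >= na_threshold else "NA"
```

In the real pipeline the embeddings come from the WideResNet-50-2 trained with softmax + center loss; here `na_threshold` stands in for the tuned distance threshold.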
16. Added:
– Scenes/OpenImages images as Not Attractions
– WeightedRandomSampler to balance the dataset
• NA = 1/2 of the weights
– Used softmax results only for NA (p > 0.5)
Little tricks
Name         Public LB   Private LB
MRG tricks   0.123       0.081
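One way to realize "NA = 1/2 of the weights" with a weighted sampler is to give the NA pool half of the total sampling mass and split the rest equally across landmark classes (so rare classes get upsampled). The helper below is an illustrative sketch under that assumption, not the competition code:

```python
from collections import Counter

def sample_weights(labels, na_label="NA", na_share=0.5):
    """Per-sample weights for a WeightedRandomSampler-style sampler:
    NA samples jointly receive `na_share` of the mass; each landmark
    class gets an equal slice of the remainder."""
    counts = Counter(labels)
    n_landmark_classes = len([c for c in counts if c != na_label])
    weights = []
    for lbl in labels:
        if lbl == na_label:
            weights.append(na_share / counts[na_label])
        else:
            weights.append((1.0 - na_share) / (n_landmark_classes * counts[lbl]))
    return weights
```

In PyTorch these weights would be passed to torch.utils.data.WeightedRandomSampler; setting na_share=1/3 gives the variant used on the next slide.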
17. Changes:
– ArcFace loss from face recognition
• m = 0.25
– NA = 1/3 of the weights
– Softmax NA threshold > 0.25
Arcface
Name      Public LB   Private LB
ArcFace   0.193       0.190 (+0.1)
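ArcFace replaces plain softmax logits with an additive angular margin on the target class: the target logit becomes s·cos(θ_y + m) while other classes keep s·cos(θ). A minimal numpy sketch using the slide's m = 0.25 (the scale s and the toy vectors are illustrative assumptions):

```python
import numpy as np

def arcface_logits(embedding, weight, target, m=0.25, s=30.0):
    """Cosine logits for all classes, with the angular margin m added
    to the target class. `embedding`: (d,); `weight`: (num_classes, d)."""
    e = embedding / np.linalg.norm(embedding)
    w = weight / np.linalg.norm(weight, axis=1, keepdims=True)
    cos = w @ e                                    # cosine to each class center
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    logits = s * cos
    logits[target] = s * np.cos(theta[target] + m)  # shrink the target logit
    return logits
```

The margin makes the target logit strictly smaller than s·cos(θ_y), forcing the network to pull same-class embeddings closer in angle than plain softmax would.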
18. Test-time augmentation
– 10 crops for inference
• center crop
• 4 corner crops
• + horizontal flip of each (2x)
TTA
Name            Public LB   Private LB
ArcFace + TTA   0.201       0.202 (+0.012)
It works!
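The ten-crop TTA above (center + 4 corners, each with a horizontal flip, embeddings averaged) can be sketched as below; `embed_fn` is a stand-in for the trained model, and crop sizes are illustrative:

```python
import numpy as np

def ten_crop(image, size):
    """5 crops (4 corners + center) of an HxWxC image, plus their
    horizontal flips = 10 views."""
    h, w = image.shape[:2]
    tops_lefts = [(0, 0), (0, w - size), (h - size, 0),
                  (h - size, w - size), ((h - size) // 2, (w - size) // 2)]
    crops = [image[t:t + size, l:l + size] for t, l in tops_lefts]
    return crops + [c[:, ::-1] for c in crops]          # flip along width

def tta_embedding(image, size, embed_fn):
    """Average the model embedding over the 10 views."""
    return np.mean([embed_fn(c) for c in ten_crop(image, size)], axis=0)
```

torchvision ships the same crop scheme as transforms.TenCrop for real pipelines.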
19. Tested other pretrained models
– ResNet-101, Inception_v3, DenseNet-121
– 1024-dim embedding layer
– 0.01 lr multiplier for pretrained blocks + warmup
– ArcFace, TTA
Other architectures
Name                      Public LB   Private LB
ResNet-101 (ImageNet)     0.223       0.205 (+0.003)
DenseNet (ImageNet)       0.220       0.213 (+0.01), 13th place
ResNet-101 (OpenImages)   0.200       0.199
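The "0.01 lr multiplier for pretrained blocks + warmup" recipe can be expressed as a per-group learning-rate schedule; the base lr and warmup length below are illustrative assumptions, not the values used in the competition:

```python
def lr_at_step(step, base_lr=0.01, warmup_steps=500, pretrained_mult=0.01):
    """Linear warmup to base_lr; pretrained backbone blocks train at a
    0.01x rate so the freshly initialized embedding head adapts first."""
    warm = min(1.0, (step + 1) / warmup_steps)
    head_lr = base_lr * warm
    return {"head": head_lr, "pretrained": head_lr * pretrained_mult}
```

With PyTorch this maps to two optimizer param groups (head vs. backbone) whose `lr` fields are updated each step.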
20. – We used all models (5 single, 3 k-fold)
• top-10 predictions for each row
– All k-fold models: averaged predictions
– Weights set according to LB (validation doesn't correlate with test)
Voting
Name       Public LB   Private LB
Ensemble   0.241       0.228 (+0.015)
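A simple form of the LB-weighted voting above: each model contributes its weight to every label in its top-10 list, and the label with the largest total wins. The sketch below is one plausible reading (it ignores rank within the top-10, which the real scheme may have used):

```python
from collections import defaultdict

def weighted_vote(model_predictions, model_weights):
    """`model_predictions`: {model_name: [label, ...]} (top-10 lists);
    `model_weights`: {model_name: weight} set from the leaderboard score."""
    scores = defaultdict(float)
    for model, labels in model_predictions.items():
        for label in labels:
            scores[label] += model_weights[model]
    return max(scores, key=scores.get)
```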
23. – Main part happens at inference
– Experiments:
• Varying the number of images per centroid
• Filtering each class
• Several centroids per class (via clustering)
• None of these improved the score
– 100 images per class is the magic constant
Centroids experiments
24. – The centroids approach is an approximation
– Raw kNN
– Filter data via hierarchical clustering, then kNN
– Optimal k = 3
kNN
Name                                  Public LB   Private LB
ResNet-101 + TTA + clean data + k=3   0.194       0.195
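The kNN step with k = 3 amounts to a majority vote over the nearest index embeddings by cosine similarity; the hierarchical-clustering cleanup is omitted here, and the toy data is illustrative:

```python
import numpy as np

def knn_predict(query, index_embs, index_labels, k=3):
    """Majority vote over the k nearest index images by cosine similarity."""
    q = query / np.linalg.norm(query)
    x = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = x @ q
    top = np.argsort(-sims)[:k]
    votes = {}
    for i in top:
        votes[index_labels[i]] = votes.get(index_labels[i], 0) + 1
    return max(votes, key=votes.get)
```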
25. 4th place
– ResNet-50 for not-attraction recognition (OpenImages)
– kNN:
• k = 5
• scores[class_id] = sum(cos(query_image, index) for index in K_closest_images)
• scores[class_id] /= min(K, number of samples in train dataset with class=class_id)
• label = argmax(scores), confidence = scores[label]
– 100 augmented local crops from each image + kNN
– Simple voting
– Pure single model, overfitting
Other competitors
Name   Public LB   Private LB
ods    0.323       0.255
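The 4th-place scoring rule quoted above (summed cosine similarities over the K closest index images, normalized by min(K, class size)) translates directly to code; the embeddings here are toy stand-ins:

```python
import numpy as np

def knn_scores(query, index_embs, index_labels, k=5):
    """scores[class] = sum of cosine sims over the K closest index images,
    divided by min(K, number of index samples of that class)."""
    q = query / np.linalg.norm(query)
    x = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = x @ q
    closest = np.argsort(-sims)[:k]
    class_sizes = {}
    for lbl in index_labels:
        class_sizes[lbl] = class_sizes.get(lbl, 0) + 1
    scores = {}
    for i in closest:
        lbl = index_labels[i]
        scores[lbl] = scores.get(lbl, 0.0) + float(sims[i])
    for lbl in scores:
        scores[lbl] /= min(k, class_sizes[lbl])
    label = max(scores, key=scores.get)
    return label, scores[label]
```

The min(K, class size) normalization keeps rare classes (few index images) from being penalized for having fewer than K candidates in the neighborhood.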
37. Mail.ru Vision solution: 10th place
Train
– ArcFace for metric learning (instead of softmax)
– Data skew: sample ~ sqrt(class frequency)
– Added Places/OpenImages images for not landmarks (1/3 of the sampled dataset)
– Models: ResNet-101, WideResNet-50-2, DenseNet-121, Inception_v3
Inference
– Centroid per landmark (random 100 elements): closest by cosine distance
– Not landmarks excluded by Softmax
– TTA: ten-crop augmentation
Model                                    Public LB   Private LB
DenseNet-121 (from ImageNet)             0.220       0.213 (13th place)
Average of 5 single models & 3 k-folds   0.241       0.228
Post about our solution.
Team members:
• Eduard Tyantov
tyantov@corp.mail.ru
• Andrei Boiarov
a.boiarov@corp.mail.ru