Kaggle: Google Landmark Recognition (CVPR)
Andrei Boiarov, Eduard Tyantov
Challenge overview
– Goal: landmark recognition
– 14,951 classes
– 1.2M train images
– 117k test images
Examples
Examples
Examples: more challenging
Train label distribution

Images per class | Number of classes | Total images (k)
>10k             | 6                 | 168
1k..10k          | 123               | 280
100..1000        | 1978              | 581
10..100          | 6674              | 222
<10              | 6170              | 33
Data labeling process
Train
– Geolocation
– Visual similarity
Test
– Human annotators
=> Train distribution != Test distribution
Test set
90-95% of the 117k test images contain no landmarks
Test juice
Evaluation
Global Average Precision (GAP)
– Order matters
– Errors in the first positions greatly affect the score
(a minimal GAP sketch follows below)
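A minimal sketch of the GAP metric described above, assuming predictions from all test images are pooled into (confidence, is_correct) pairs; the function and argument names are illustrative, not from the slides:

```python
def global_average_precision(predictions, num_positives):
    """predictions: list of (confidence, is_correct) pairs over all test images;
    num_positives: number of test images that actually contain a landmark."""
    # Sort all predictions by confidence, highest first: order matters.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            correct_so_far += 1
            total += correct_so_far / rank  # precision at this rank
    return total / num_positives
```

Because precision is accumulated rank by rank, a single wrong prediction placed at the top drags down the contribution of everything below it, which is why the first positions matter so much.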
Solutions

How to train?
Options:
– Softmax
– Metric learning: center loss, arcface, triplet, …
– DELF: key points
  • reported poor performance
Inference
How to handle non-landmarks?
Options:
– Softmax
– Centroids
– kNN
First attempt
As in Cloud:
– MRG network
  • WideResNet-50-2
  • trained on scenes & landmarks
– Softmax + center loss
– Centroids
– Distance threshold for NA (non-landmarks)

Name        | Public LB | Private LB
MRG default | 0.091     | 0.081
Little tricks
Added:
– Scenes/OpenImages images as non-landmarks
– WeightedRandomSampler to balance the dataset (see the sketch below)
  • NA = 1/2 of the total weight
– Softmax results used only for NA (p > 0.5)

Name       | Public LB | Private LB
MRG tricks | 0.123     | 0.081
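A minimal sketch, assuming PyTorch, of the balancing above: landmark classes split half of the sampling weight evenly, while the non-landmark (NA) samples together receive the other half. The NA_CLASS marker and function name are assumptions for illustration:

```python
import collections
import torch
from torch.utils.data import WeightedRandomSampler

NA_CLASS = -1  # hypothetical marker for non-landmark samples

def make_sampler(labels, na_share=0.5):
    """labels: per-sample class ids; NA samples get `na_share` of the total weight."""
    counts = collections.Counter(labels)
    n_landmark_classes = len([c for c in counts if c != NA_CLASS])
    weights = []
    for y in labels:
        if y == NA_CLASS:
            # all NA samples together get `na_share` of the total weight
            w = na_share / counts[NA_CLASS]
        else:
            # remaining weight split evenly across landmark classes,
            # then evenly across the samples of each class
            w = (1 - na_share) / n_landmark_classes / counts[y]
        weights.append(w)
    return WeightedRandomSampler(torch.tensor(weights, dtype=torch.double),
                                 num_samples=len(labels), replacement=True)
```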
Arcface
Changes:
– ArcFace from face recognition (see the sketch below)
  • m = 0.25
– NA = 1/3 of the total weight
– Softmax NA threshold > 0.25

Name    | Public LB | Private LB
Arcface | 0.193     | 0.190 (+0.1)
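A minimal sketch of an ArcFace head with the additive angular margin m = 0.25 from the slide; the scale s = 30 and all names are assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    def __init__(self, embedding_dim, num_classes, s=30.0, m=0.25):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # cosine similarity between L2-normalized embeddings and class weights
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # add the angular margin only to the target-class logit
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return self.s * logits  # feed into nn.CrossEntropyLoss
```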
TTA
Test-time augmentation (see the sketch below):
– 10 crops for inference
  • center crop
  • 4 corners
  • + horizontal flips (2x)

Name          | Public LB | Private LB
Arcface + TTA | 0.201     | 0.202 (+0.012)

It works!
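A minimal sketch of the ten-crop TTA, assuming torchvision and a model that returns embeddings; averaging the ten embeddings is an assumption about how the crops were combined:

```python
import torch
from torchvision import transforms

ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),  # 4 corners + center, plus their horizontal flips
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),
])

def embed_with_tta(model, pil_image):
    crops = ten_crop(pil_image)        # (10, 3, 224, 224)
    with torch.no_grad():
        embeddings = model(crops)      # (10, embedding_dim)
    return embeddings.mean(dim=0)      # average over the 10 crops
```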
Other architectures
Tested other pretrained models (lr-multiplier sketch below):
– ResNet-101, Inception_v3, DenseNet-121
– 1024-d embedding layer
– 0.01 lr multiplier for pretrained blocks + warmup
– ArcFace, TTA

Name                    | Public LB | Private LB
ResNet-101 (ImageNet)   | 0.223     | 0.205 (+0.003)
DenseNet (ImageNet)     | 0.220     | 0.213 (+0.01), 13th place
ResNet-101 (OpenImages) | 0.200     | 0.199
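A minimal sketch, assuming PyTorch/torchvision, of the 0.01 lr multiplier for pretrained blocks, the 1024-d embedding layer, and a simple warmup; the base lr, optimizer choice, and warmup length are assumptions:

```python
import torch
import torchvision

# pretrained backbone (torchvision >= 0.13 weights API assumed)
model = torchvision.models.resnet101(weights="DEFAULT")
model.fc = torch.nn.Linear(model.fc.in_features, 1024)  # 1024-d embedding layer

backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc")]
head_params = list(model.fc.parameters())

base_lr = 1e-3  # assumed
optimizer = torch.optim.SGD([
    {"params": backbone_params, "lr": base_lr * 0.01},  # pretrained blocks
    {"params": head_params,     "lr": base_lr},         # new embedding layer
], momentum=0.9)

# linear warmup over the first 5 epochs (length assumed)
warmup = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: min(1.0, (epoch + 1) / 5))
```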
Voting
– We used all models (5 single, 3 k-fold), see the voting sketch below
  • top-10 predictions for each row
– All k-fold models: averaged predictions
– Weights set according to LB (validation doesn't correlate with test)

Name     | Public LB | Private LB
Ensemble | 0.241     | 0.228 (+0.015)
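A minimal sketch of the LB-weighted voting over per-model top-10 predictions; the weight values and the input layout are assumptions for illustration:

```python
import collections

def ensemble_vote(per_model_top10, model_weights):
    """per_model_top10: {model_name: [(class_id, confidence), ...]} for one test image;
    model_weights: {model_name: weight set according to LB score}."""
    scores = collections.defaultdict(float)
    for model_name, preds in per_model_top10.items():
        for class_id, conf in preds:
            scores[class_id] += model_weights[model_name] * conf
    label = max(scores, key=scores.get)
    return label, scores[label]
```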
Experiments

Centroids
[Diagram: a class centroid computed from several image embeddings]
Centroids experiments
– Main part happens at inference (see the sketch below)
– Experiments:
  • varying the number of images per centroid
  • filtering each class
  • several centroids per class (via clustering)
  • none of these improved the score
– 100 images per class is the magic constant
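A minimal sketch of centroid-based inference as described above: each class centroid averages up to 100 L2-normalized embeddings, and a query is assigned to the closest centroid by cosine similarity; all names are illustrative:

```python
import numpy as np

def build_centroids(embeddings_by_class, max_images=100, seed=0):
    """embeddings_by_class: {class_id: list of embedding vectors}."""
    rng = np.random.default_rng(seed)
    centroids = {}
    for class_id, embs in embeddings_by_class.items():
        embs = np.asarray(embs)
        if len(embs) > max_images:
            embs = embs[rng.choice(len(embs), max_images, replace=False)]
        c = embs.mean(axis=0)
        centroids[class_id] = c / np.linalg.norm(c)  # normalize the centroid
    return centroids

def classify(query_embedding, centroids):
    q = query_embedding / np.linalg.norm(query_embedding)
    # cosine similarity to every class centroid
    sims = {cid: float(q @ c) for cid, c in centroids.items()}
    label = max(sims, key=sims.get)
    return label, sims[label]
```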
kNN
– The centroids approach is an approximation
– Raw kNN
– Filter data via hierarchical clustering, then kNN
– Optimal k = 3

Name                                | Public LB | Private LB
ResNet-101 + TTA + clean data + k=3 | 0.194     | 0.195
Other competitors
4th place:
– ResNet-50 for non-landmark recognition (OpenImages)
– kNN (runnable sketch below):
  • k = 5
  • scores[class_id] = sum(cos(query_image, index) for index in K_closest_images)
  • scores[class_id] /= min(K, number of samples in train dataset with class=class_id)
  • label = argmax(scores), confidence = scores[label]
– 100 augmented local crops from each image + kNN
– Simple voting
– Pure single model, overfitting

Name | Public LB | Private LB
ods  | 0.323     | 0.255
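A runnable version of the 4th-place kNN scoring pseudocode above, assuming L2-normalized embeddings so the dot product equals cosine similarity; variable names are illustrative:

```python
import collections
import numpy as np

def knn_predict(query, index_embeddings, index_labels, train_counts, k=5):
    """index_embeddings: (N, d) normalized train embeddings; index_labels: (N,) class ids;
    train_counts: {class_id: number of train samples with that class}."""
    sims = index_embeddings @ query            # cosine similarity to every index image
    nearest = np.argsort(-sims)[:k]            # K closest images
    scores = collections.defaultdict(float)
    for i in nearest:
        scores[index_labels[i]] += sims[i]     # sum of cosines per class
    for class_id in scores:
        scores[class_id] /= min(k, train_counts[class_id])
    label = max(scores, key=scores.get)
    return label, scores[label]
```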
Results
Impressive examples: test query (1)
Impressive examples: results (1)
Impressive examples: results (1)
Impressive examples: test query (2)
Impressive examples: results (2)
Impressive examples: results (2)
Impressive examples: test query (3)
Impressive examples: results (3)
Impressive examples: results (3)
Private Leaderboard
Write-up of our solution: https://www.kaggle.com/c/landmark-recognition-challenge/discussion/58050
Mail.ru Vision solution: 10th place
Train
– ArcFace for metric learning (instead of Softmax)
– Data skew: sample ~ sqrt(class frequency)
– Added Places/OpenImages images as non-landmarks (1/3 of the sampled dataset)
– Models: ResNet-101, WideResNet-50-2, DenseNet-121, Inception_v3
Inference
– Centroid per landmark (random 100 elements): closest by cosine distance
– Non-landmarks excluded by Softmax
– TTA: ten-crop augmentation
Model                                  | Public LB | Private LB
DenseNet-121 (from ImageNet)           | 0.220     | 0.213 (13th place)
Average of 5 single models & 3 k-folds | 0.241     | 0.228
Post about our solution.
Team members:
• Eduard Tyantov
tyantov@corp.mail.ru
• Andrei Boiarov
a.boiarov@corp.mail.ru
