Kaggle: CVPR attractions
Andrei Boiarov, Eduard Tyantov
Challenge overview
Overview
– Goal: attraction (landmark) recognition
– 14951 classes
– 1.2M train images
– 117k test images
Challenge
Examples
Examples
Examples: more challenging
Train label distribution
Images per class | Number of classes | Total images (k)
>10k             | 6                 | 168
1k..10k          | 123               | 280
100..1000        | 1978              | 581
10..100          | 6674              | 222
<10              | 6170              | 33
Train
– Geolocation
– Visual similarity
Test
– Human annotators
=> Train distribution != Test distribution
Data labeling process
90-95% of the 117k test images contain no landmark
Test set
Test juice
Global Average Precision (GAP)
– Order matters
– Errors in first positions greatly affect score
Evaluation
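GAP is a micro-averaged precision over the globally ranked list of predictions. A minimal sketch of the metric, assuming predictions arrive as hypothetical (confidence, is_correct) pairs, one per test image (not the official evaluation code):

```python
# Minimal sketch of Global Average Precision (GAP, a.k.a. micro-AP).
# `predictions` is a hypothetical list of (confidence, is_correct) pairs,
# one per test image; `num_relevant` is the number of test images that
# actually contain a landmark.
def global_average_precision(predictions, num_relevant):
    # Sort all predictions by confidence, highest first: order matters,
    # so a wrong prediction with high confidence hurts the most.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            correct_so_far += 1
            total += correct_so_far / i      # precision at rank i
    return total / num_relevant
```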
Solutions
Options
– Softmax
– Metric learning: center loss, ArcFace, triplet, …
– DELF: key points
  • reported poor performance
How to train?
Options
– Softmax
– Centroids
– kNN
How to handle non-attractions?
Inference
Like in Cloud:
– MRG network
  • WideResNet-50-2
  • trained on scenes & attractions
– Softmax + center loss
– Centroids
– Distance threshold for NA (not attraction)
First attempt
Name        | Public LB | Private LB
MRG default | 0.091     | 0.081
Added
– Scenes/OpenImages images as not-attractions
– WeightedRandomSampler to balance the dataset (sampler sketch below)
  • NA = 1/2 of the total weight
– Softmax output used only to flag NA (p > 0.5)
Little tricks
Name       | Public LB | Private LB
MRG tricks | 0.123     | 0.081
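A minimal sketch of how such balancing can be set up with PyTorch's WeightedRandomSampler; the `labels` list and the `NA_CLASS` id are assumptions, not the authors' code:

```python
# Sketch of dataset balancing with WeightedRandomSampler, assuming `labels`
# is a hypothetical list of integer class ids per training image and
# NA_CLASS marks the not-attraction (scenes/OpenImages) class.
from collections import Counter
from torch.utils.data import WeightedRandomSampler

NA_CLASS = 0                          # hypothetical id of the not-attraction class
counts = Counter(labels)

# Inverse-frequency weight per sample balances the landmark classes.
weights = [1.0 / counts[y] for y in labels]

# Rescale NA samples so that NA contributes 1/2 of the total weight
# (later reduced to 1/3).
na_weight = sum(w for w, y in zip(weights, labels) if y == NA_CLASS)
attr_weight = sum(weights) - na_weight
scale = attr_weight / na_weight       # make NA total weight equal to attractions
weights = [w * scale if y == NA_CLASS else w for w, y in zip(weights, labels)]

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# loader = DataLoader(train_dataset, batch_size=256, sampler=sampler)
```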
Changes:
– ArcFace (from face recognition)
  • m = 0.25
– NA = 1/3 of the weights
– Softmax NA threshold: p > 0.25
ArcFace
Name    | Public LB | Private LB
ArcFace | 0.193     | 0.190 (+0.1)
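For reference, a compact sketch of an ArcFace head with additive angular margin m = 0.25 on L2-normalized embeddings and class weights; the scale s and the module layout are illustrative assumptions, not the exact competition code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin head: logits = s * cos(theta + m) for the
    target class, s * cos(theta) otherwise. A sketch, not the exact
    implementation used in the competition."""
    def __init__(self, emb_dim, num_classes, s=30.0, m=0.25):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, emb_dim))
        self.s, self.m = s, m

    def forward(self, emb, target):
        # Cosine similarity between L2-normalized embeddings and class weights.
        cosine = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the target-class logit.
        one_hot = F.one_hot(target, cosine.size(1)).float()
        logits = self.s * torch.cos(theta + self.m * one_hot)
        return F.cross_entropy(logits, target)
```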
Test-time augmentation
– 10 crops per image at inference
  • center crop
  • 4 corner crops
  • + horizontal flip of each (2x)
TTA
Name          | Public LB | Private LB
ArcFace + TTA | 0.201     | 0.202 (+0.012)
It works!
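A minimal sketch of ten-crop TTA with averaged embeddings; torchvision's TenCrop produces exactly the center, the four corners, and their horizontal flips. The `model` and the input size are assumptions, and normalization is omitted for brevity:

```python
import torch
from torchvision import transforms

# Ten crops = center + 4 corners, each with its horizontal flip.
to_tensor = transforms.ToTensor()
tta = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack([to_tensor(c) for c in crops])),
])

def embed_with_tta(model, pil_image):
    crops = tta(pil_image)                 # (10, 3, 224, 224)
    with torch.no_grad():
        embs = model(crops)                # (10, emb_dim)
    return embs.mean(dim=0)                # average embedding over the 10 crops
```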
Tested other pretrained models
– ResNet-101, Inception_v3, DenseNet-121
– 1024-d embedding layer
– 0.01x learning-rate multiplier for pretrained blocks + warm-up (sketch after the table)
– ArcFace, TTA
Other architectures
Name                    | Public LB | Private LB     | Note
ResNet-101 (ImageNet)   | 0.223     | 0.205 (+0.003) |
DenseNet (ImageNet)     | 0.220     | 0.213 (+0.01)  | 13th place
ResNet-101 (OpenImages) | 0.200     | 0.199          |
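A sketch of the 0.01x learning-rate multiplier for pretrained blocks plus a linear warm-up; the `backbone`/`embedding`/`head` module names, the optimizer choice, and the warm-up length are illustrative assumptions:

```python
import torch

# `model` is assumed to expose pretrained `backbone` and fresh
# `embedding`/`head` modules (names are illustrative).
base_lr = 1e-3
optimizer = torch.optim.SGD([
    {"params": model.backbone.parameters(), "lr": base_lr * 0.01},  # 0.01x for pretrained blocks
    {"params": model.embedding.parameters(), "lr": base_lr},
    {"params": model.head.parameters(), "lr": base_lr},
], momentum=0.9)

# Linear warm-up: scale all group learning rates from ~0 to 1 over the first steps.
warmup_steps = 1000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps))
```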
– We used all models (5 single, 3 k-folds); weighted voting (sketch below)
  • top-10 predictions for each test row
– k-fold models: predictions averaged across folds
– Weights set according to the LB (validation score doesn’t correlate with the test set)
Voting
Name     | Public LB | Private LB
Ensemble | 0.241     | 0.228 (+0.015)
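A minimal sketch of weighted voting over per-model top-10 predictions; the data layout and the exact aggregation rule are assumptions, since the slides only fix the per-model weights (by LB) and the top-10 cut:

```python
from collections import defaultdict

# `model_preds[m][image_id]` is a hypothetical list of (class_id, confidence)
# pairs (top-10) for model m; `model_weights[m]` reflects each model's LB score.
def ensemble_vote(model_preds, model_weights, image_id):
    scores = defaultdict(float)
    for m, preds in model_preds.items():
        for class_id, conf in preds[image_id]:
            scores[class_id] += model_weights[m] * conf   # weighted vote
    label = max(scores, key=scores.get)
    return label, scores[label]
```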
Experiments
Centroids
[Diagram: a class centroid computed from that class's image embeddings]
– The main component of inference (centroid sketch below)
– Experiments
  • varying the number of images per centroid
  • filtering each class
  • several centroids per class (via clustering)
  • none of these improved the score
– 100 images per class is the magic constant
Centroids experiments
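A minimal sketch of the centroid classifier: average at most 100 random L2-normalized embeddings per class and pick the closest centroid by cosine similarity. The `embeddings` layout is an assumption:

```python
import numpy as np

# `embeddings[class_id]` is a hypothetical (n_i, D) array of L2-normalized
# train embeddings for each landmark class.
rng = np.random.default_rng(0)

def build_centroids(embeddings, per_class=100):
    centroids = {}
    for class_id, embs in embeddings.items():
        # Use at most `per_class` random images per class (100 worked best).
        if len(embs) > per_class:
            embs = embs[rng.choice(len(embs), per_class, replace=False)]
        c = embs.mean(axis=0)
        centroids[class_id] = c / np.linalg.norm(c)   # re-normalize the centroid
    return centroids

def classify(query_emb, centroids):
    # Cosine similarity = dot product of unit vectors; pick the closest centroid.
    sims = {cid: float(query_emb @ c) for cid, c in centroids.items()}
    label = max(sims, key=sims.get)
    return label, sims[label]
```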
– The centroid approach is only an approximation
– Raw kNN
– Filter the data via hierarchical clustering, then kNN (sketch below)
– Optimal k = 3
kNN
Name                                | Public LB | Private LB
ResNet-101 + TTA + clean data + k=3 | 0.194     | 0.195
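One way to sketch the k = 3 kNN variant over cosine distance, e.g. with scikit-learn; the cleaning step via hierarchical clustering and the exact vote/confidence rule are not specified on the slides, so they are assumptions here:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# `train_embs` is a hypothetical (N, D) array of L2-normalized train embeddings
# (cleaned beforehand) and `train_labels` a matching (N,) array of class ids.
knn = NearestNeighbors(n_neighbors=3, metric="cosine").fit(train_embs)

def knn_predict(query_emb):
    dist, idx = knn.kneighbors(query_emb.reshape(1, -1))
    neighbor_labels = train_labels[idx[0]]
    # Vote among the 3 neighbors by summed similarity (assumed rule);
    # confidence = best similarity within the winning class.
    votes = {}
    for lbl, d in zip(neighbor_labels, dist[0]):
        votes.setdefault(lbl, []).append(1.0 - d)      # cosine similarity
    label = max(votes, key=lambda l: sum(votes[l]))
    return label, max(votes[label])
```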
4th place
– ResNet-50 for not-attraction recognition (trained on OpenImages)
– kNN (scoring sketch after the table):
  • k = 5
  • scores[class_id] = sum(cos(query_image, index) for index in K_closest_images)
  • scores[class_id] /= min(K, number of train samples with class == class_id)
  • label = argmax(scores), confidence = scores[label]
– 100 augmented local crops from each image + kNN
– Simple voting
– Pure single model; overfitting
Other competitors
Name | Public LB | Private LB
ods  | 0.323     | 0.255
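A sketch of the 4th-place scoring rule as written on the slide; the array layout (`train_embs`, `train_labels`, `class_counts`) is an assumption:

```python
import numpy as np

# `train_embs` (N, D): L2-normalized train embeddings; `train_labels` (N,):
# class ids; `class_counts[c]`: number of train images of class c.
K = 5

def knn_score(query_emb, train_embs, train_labels, class_counts):
    sims = train_embs @ query_emb                    # cosine similarity (unit vectors)
    top = np.argsort(-sims)[:K]                      # K closest train images
    scores = {}
    for i in top:
        c = train_labels[i]
        scores[c] = scores.get(c, 0.0) + sims[i]     # sum of cosines per class
    for c in scores:
        scores[c] /= min(K, class_counts[c])         # normalize by class support
    label = max(scores, key=scores.get)
    return label, scores[label]
```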
Results
Impressive examples: test query (1)
Impressive examples: results (1)
Impressive examples: results (1)
Impressive examples: test query (2)
Impressive examples: results (2)
Impressive examples: results (2)
Impressive examples: test query (3)
Impressive examples: results (3)
Impressive examples: results (3)
Private Leaderboard
Our solution in text: https://www.kaggle.com/c/landmark-recognition-challenge/discussion/58050
Mail.ru Vision solution: 10th place
Train
– ArcFace for metric learning (instead of Softmax)
– Data skew: sampling probability ~ sqrt(class frequency) (sketch below)
– Added Places/OpenImages images as non-landmarks (1/3 of the sampled dataset)
– Models: ResNet-101, WideResNet-50-2, DenseNet-121, Inception_v3
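A minimal sketch of the sqrt-frequency sampling with WeightedRandomSampler; the `labels` list is a hypothetical input:

```python
from collections import Counter
from torch.utils.data import WeightedRandomSampler

# `labels` is a hypothetical list of class ids, one per training image.
counts = Counter(labels)

# A class should be drawn ~ sqrt(its frequency): each of its counts[c] images
# gets weight counts[c] ** -0.5, so the class total is sqrt(counts[c]).
weights = [counts[y] ** -0.5 for y in labels]

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
```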
Inference
– Centroid per landmark (100 random images): closest by cosine distance
– Non-landmarks excluded by the Softmax head
– TTA: ten-crop augmentation
Model                                  | Public LB | Private LB | Note
DenseNet-121 (from ImageNet)           | 0.220     | 0.213      | 13th place
Average of 5 single models & 3 k-folds | 0.241     | 0.228      |
Post about our solution.
Team members:
• Eduard Tyantov (tyantov@corp.mail.ru)
• Andrei Boiarov (a.boiarov@corp.mail.ru)