SlideShare a Scribd company logo
1 of 34
Download to read offline
From ImageNet to Image Classification:
Contextualizing Progress on Benchmarks
Dimitris Tsipras, et al., “From ImageNet to Image Classification: Contextualizing Progress on Benchmarks”
5th July, 2020
PR12 Paper Review
JinWon Lee
Samsung Electronics
banana? or orange?
Related Papers
• Lucas Beyer, et al., “Are we done with ImageNet?”
 https://arxiv.org/abs/2006.07159
• Kai Xiao, et al., “Noise or Signal:The Role of Image Backgrounds in
Object Recognition”
 https://arxiv.org/abs/2006.09994
Datasets for Image Classification
• How aligned are existing benchmarks with their motivating real-world
tasks?
• The sheer size of machine learning datasets makes meticulous data
curation virtually impossible. Dataset creators thus resort to scalable
methods such as automated data retrieval and crowd-sourced
annotation.
• As a result, the dataset and its corresponding annotations can
sometimes be ambiguous, incorrect, or otherwise misaligned with
ground truth.
Correct Labels?
projectile acoustic guitar church American
Staffordshire terrier
ImageNet
Labels
ImageNetCreation Pipeline
• Image and label collection
 The ImageNet creators first selected a set of classes using the WordNet
hierarchy.
 Then, for each class, they sourced images by querying several search engines
with the WordNet synonyms of the class, augmented with synonyms of its
WordNet parent node(s).
 To further expand the image pool, the dataset creators also performed
queries in several languages.
ImageNet Creation Pipeline
• Image validation via the CONTAINS task
 The ImageNet creators employed annotators via the MechanicalTurk (MTurk)
crowd-sourcing platform.
 Annotators were presented with its description (along with links to the
relevant Wikipedia pages) and a grid of candidate images.
 Their task was then to select all images in that grid that contained an object
of that class.
Revisiting the ImageNet Labels
• The root cause for many of these errors is that the image validation
stage (i.e., the CONTAINS task) asks annotators only to verify if a
specific proposed label.
• Crucially, annotators are never asked to choose among different
possible labels for the image and, in fact, have no knowledge of what
the other classes even are.
Images with Multiple Objects
• Annotators are instructed to ignore the presence of other objects
when validating a particular ImageNet label for an image.
Biases in Image Filtering
• Since annotators have no knowledge of what the other classes are,
they do not have a sense of the granularity of image features they
should pay attention to.
• Moreover, the task itself does not necessary account for their
expertise, e.g., between all the 24 terrier breeds that are present in
ImageNet.
From LabelValidation to Image Classification
• Authors would like them to classify the image, selecting all the
relevant labels for it. However, asking (untrained) annotators to
choose from among all 1,000 ImageNet classes is infeasible.
• First, they obtain a small set of (potentially) relevant candidate labels
for each image.
• Then, they present these labels to annotators and ask them to select
one of them for each distinct object using what they call the
CLASSIFY task.
• They use 10,000 images from ImageNet validation set – 10 randomly
selected images per class.
Obtaining Candidate Labels
• Potential labels for each image by simply combining the top-5
predictions of 10 models from different parts of the accuracy
spectrum with the existing ImageNet label.
• Asking annotators whether an image contains a particular class but
for all potential labels.The outcome of this experiment is a selection
frequency for each image-label pair, i.e., the fraction of annotators
that selected the image as containing the corresponding label
Obtaining Candidate Labels
Image Classification via the CLASSIFYTask
• Asking annotators to identify
 All labels that correspond to objects in the image.
 The label for the main object (according to their judgment).
 Select only one label per distinct object except mutually exclusive objects.
• Identifying the main label and number of objects
 From each annotator’s response, authors learn what they consider to be the
label of the main object, as well as how many objects they think are present
in the image
Image Classification via the CLASSIFYTask
Quantifying the Benchmark-Task Alignment of
ImageNet
• Multi-object images
 More than 1/5 contains at least two objects.
 Models perform significantly worse on multi-label images based on top-1
accuracy (measured w.r.t. ImageNet labels): accuracy drops by more than
10% across all models.
 There are pairs of classes which consistently co-occur.
 Model accuracy is especially low on certain classes that systematically co-
occur.
 In light of this, a more natural notion of accuracy for multi-object images
would be to consider a model prediction to be correct if it matches the label
of any object in the image.
Multi-Object Images
Multi-Object Images
Multi-Object Images
Multi-Object Images
• Human-label disagreement
 Although models suffer a sizeable accuracy drop on multi-object images,
they are still relatively good at predicting the ImageNet label.
 This bias could be justified whenever there is a distinct main object in the
image and it corresponds to the ImageNet label.
 However, for nearly a third of the multi-object images, the ImageNet label
does not denote the most likely main object as judged by human annotators.
Human-Label disagreement
Human-Label disagreement
Human-Label disagreement
Quantifying the Benchmark-Task Alignment of
ImageNet
• Bias in label validation
 Recall that annotators are asked a somewhat leading question, of whether a
specific label is present in the image, making them prone to answering
positively even for images of a different, yet similar, class.
 Under the original task setup (i.e., the CONTAINS task), annotators
consistently select multiple labels, for nearly 40% of the images, another
label is selected at least as often as the ImageNet label.
 Even when annotators perceive a single object, they often select as many as
10 classes.
 If instead of asking annotators to judge the validity of a specific label(s) in
isolation, asking them to choose the main object among several possible
labels simultaneously (i.e., via the CLASSIFY task), they select substantially
fewer labels.
Bias in LabelValidation
Bias in LabelValidation
Bias in LabelValidation
• Confusing class pairs
 There are several pairs of ImageNet classes that annotators have trouble
telling apart—they consistently select both labels as valid for images of
either class.
 On some of these pairs we see that models still perform well.
 However, for other pairs, even state-of-the-art models have poor accuracy
(below 40%).
 If that is indeed the case, it is natural to wonder whether accuracy on
ambiguous classes can be improved without overfitting to the ImageNet test
set.
Bias in LabelValidation
• Confusing class pairs
 The annotators’ inability to remedy overlaps in the case of ambiguous class
pairs could be due to the presence of classes that are semantically quite
similar, e.g., “rifle” and “assault rifle”.
 In some cases, there also were errors in the task setup. For instance, there
were occasional overlaps in the class names (e.g., “maillot” and “maillot,
tank suit”) andWikipedia links (e.g., “laptop computer” and “notebook
computer”) presented to the annotators.
BeyondTest Accuracy: Human-In-The-Loop
Model Evaluation
• Human assessment of model predictions
 Selection frequency of the prediction
How often annotators select the predicted label as being present in the image
(determined using the CONTAINS task).
 Accuracy based on main label annotation
How frequently the prediction matches the main label for the image, as obtained using
our CLASSIFY task.
Human Assessment of Model Predictions
Human Assessment of Model Predictions –
Incorrect Predictions
Human Assessment of Model Predictions
• The predictions of state-of-the-art models have gotten, on average,
quite close to ImageNet labels.
• That is, annotators are almost equally likely to select the predicted
label as being present in the image (or being the main object) as the
ImageNet label.
• This indicates that model predictions might be closer to what non-
expert annotators can recognize as the ground truth.
• This also means, we are at a point where we may no longer by able to
easily identify (e.g., using crowd-sourcing) the extent to which
further gains in accuracy correspond to such improvements.
Confusion Matrix
Conclusion
• There are many images with multiple valid labels, and ambiguous classes
in ImageNet.
• Top-1 accuracy often underestimates the performance of models by
unduly penalizing them for predicting the label of a different, but also
valid, image object.
• Taking a step towards evaluation metrics that circumvent these issues,
authors utilize human annotators to directly judge the correctness of
model predictions.
• On the positive side, they find that models that are more accurate on
ImageNet also tend to be more human-aligned in their errors.
• While this might be reassuring, it also indicates that we are at a point
where we cannot easily gauge (e.g., via simple crowd-sourcing) whether
further progress on the ImageNet benchmark is meaningful, or is simply a
result of overfitting to this benchmarks’ idiosyncrasies.

More Related Content

What's hot

Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1ananth
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networksananth
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionSafaa Alnabulsi
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Sujit Pal
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural NetworksYogendra Tamang
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_papershanullah3
 
Digit recognition using mnist database
Digit recognition using mnist databaseDigit recognition using mnist database
Digit recognition using mnist databasebtandale
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015Jia-Bin Huang
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkItachi SK
 
Efficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image ClassficationEfficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image ClassficationYogendra Tamang
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonAditya Bhattacharya
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Seonho Park
 
Handwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPTHandwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPTRishabhTyagi48
 
Deep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionDeep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionananth
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101Felipe Prado
 
Random Valued Impulse Noise Elimination using Neural Filter
Random Valued Impulse Noise Elimination using Neural FilterRandom Valued Impulse Noise Elimination using Neural Filter
Random Valued Impulse Noise Elimination using Neural FilterEditor IJCATR
 

What's hot (20)

Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit Recognition
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural Networks
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_paper
 
One shot learning
One shot learningOne shot learning
One shot learning
 
Digit recognition using mnist database
Digit recognition using mnist databaseDigit recognition using mnist database
Digit recognition using mnist database
 
LeNet to ResNet
LeNet to ResNetLeNet to ResNet
LeNet to ResNet
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Efficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image ClassficationEfficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image Classfication
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathon
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
 
Handwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPTHandwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPT
 
Deep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionDeep Learning Primer - a brief introduction
Deep Learning Primer - a brief introduction
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101
 
Random Valued Impulse Noise Elimination using Neural Filter
Random Valued Impulse Noise Elimination using Neural FilterRandom Valued Impulse Noise Elimination using Neural Filter
Random Valued Impulse Noise Elimination using Neural Filter
 

Similar to PR-258: From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Supervised Learning
Supervised LearningSupervised Learning
Supervised LearningFEG
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...Egyptian Engineers Association
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxK Manjunath
 
Image Search: Then and Now
Image Search: Then and NowImage Search: Then and Now
Image Search: Then and NowSi Krishan
 
Novel Hybrid Approach to Visual Concept Detection Using Image Annotation
Novel Hybrid Approach to Visual Concept Detection Using Image AnnotationNovel Hybrid Approach to Visual Concept Detection Using Image Annotation
Novel Hybrid Approach to Visual Concept Detection Using Image AnnotationCSCJournals
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
Cs231n convolutional neural networks for visual recognition
Cs231n convolutional neural networks for visual recognitionCs231n convolutional neural networks for visual recognition
Cs231n convolutional neural networks for visual recognitionvidhya DS
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Prakhar Rastogi
 
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object...
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object...YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object...
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object...JacobSilbiger1
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Face Detection techniques
Face Detection techniquesFace Detection techniques
Face Detection techniquesAbhineet Bhamra
 
detailed Presentation on supervised learning
 detailed Presentation on supervised learning detailed Presentation on supervised learning
detailed Presentation on supervised learningZAMANCHBWN
 
Course Title CS591-Advance Artificial Intelligence
Course Title CS591-Advance Artificial Intelligence           Course Title CS591-Advance Artificial Intelligence
Course Title CS591-Advance Artificial Intelligence CruzIbarra161
 
Leveraging social media for training object detectors
Leveraging social media for training object detectorsLeveraging social media for training object detectors
Leveraging social media for training object detectorsManish Kumar
 
Comparison of Various Web Image Re - Ranking Techniques
Comparison of Various Web Image Re - Ranking TechniquesComparison of Various Web Image Re - Ranking Techniques
Comparison of Various Web Image Re - Ranking TechniquesIJSRD
 
Watson Visual Recognition - NYC JavaSig
Watson Visual Recognition - NYC JavaSigWatson Visual Recognition - NYC JavaSig
Watson Visual Recognition - NYC JavaSigUpkar Lidder
 
Learning to Rank Image Tags With Limited Training Examples
Learning to Rank Image Tags With Limited Training ExamplesLearning to Rank Image Tags With Limited Training Examples
Learning to Rank Image Tags With Limited Training Examples1crore projects
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Fatimakhan325
 
Mit6870 orsu lecture2
Mit6870 orsu lecture2Mit6870 orsu lecture2
Mit6870 orsu lecture2zukun
 

Similar to PR-258: From ImageNet to Image Classification: Contextualizing Progress on Benchmarks (20)

Supervised Learning
Supervised LearningSupervised Learning
Supervised Learning
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
Ai use cases
Ai use casesAi use cases
Ai use cases
 
Image Search: Then and Now
Image Search: Then and NowImage Search: Then and Now
Image Search: Then and Now
 
Novel Hybrid Approach to Visual Concept Detection Using Image Annotation
Novel Hybrid Approach to Visual Concept Detection Using Image AnnotationNovel Hybrid Approach to Visual Concept Detection Using Image Annotation
Novel Hybrid Approach to Visual Concept Detection Using Image Annotation
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
Cs231n convolutional neural networks for visual recognition
Cs231n convolutional neural networks for visual recognitionCs231n convolutional neural networks for visual recognition
Cs231n convolutional neural networks for visual recognition
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object...
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object...YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object...
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object...
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Face Detection techniques
Face Detection techniquesFace Detection techniques
Face Detection techniques
 
detailed Presentation on supervised learning
 detailed Presentation on supervised learning detailed Presentation on supervised learning
detailed Presentation on supervised learning
 
Course Title CS591-Advance Artificial Intelligence
Course Title CS591-Advance Artificial Intelligence           Course Title CS591-Advance Artificial Intelligence
Course Title CS591-Advance Artificial Intelligence
 
Leveraging social media for training object detectors
Leveraging social media for training object detectorsLeveraging social media for training object detectors
Leveraging social media for training object detectors
 
Comparison of Various Web Image Re - Ranking Techniques
Comparison of Various Web Image Re - Ranking TechniquesComparison of Various Web Image Re - Ranking Techniques
Comparison of Various Web Image Re - Ranking Techniques
 
Watson Visual Recognition - NYC JavaSig
Watson Visual Recognition - NYC JavaSigWatson Visual Recognition - NYC JavaSig
Watson Visual Recognition - NYC JavaSig
 
Learning to Rank Image Tags With Limited Training Examples
Learning to Rank Image Tags With Limited Training ExamplesLearning to Rank Image Tags With Limited Training Examples
Learning to Rank Image Tags With Limited Training Examples
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)
 
Mit6870 orsu lecture2
Mit6870 orsu lecture2Mit6870 orsu lecture2
Mit6870 orsu lecture2
 

More from Jinwon Lee

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sJinwon Lee
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...Jinwon Lee
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionJinwon Lee
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)Jinwon Lee
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...Jinwon Lee
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsJinwon Lee
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksJinwon Lee
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionJinwon Lee
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignJinwon Lee
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...Jinwon Lee
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksJinwon Lee
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksJinwon Lee
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 

More from Jinwon Lee (20)

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020s
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 

Recently uploaded

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

PR-258: From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

  • 1. From ImageNet to Image Classification: Contextualizing Progress on Benchmarks Dimitris Tsipras, et al., “From ImageNet to Image Classification: Contextualizing Progress on Benchmarks” 5th July, 2020 PR12 Paper Review JinWon Lee Samsung Electronics banana? or orange?
  • 2. Related Papers • Lucas Beyer, et al., “Are we done with ImageNet?”  https://arxiv.org/abs/2006.07159 • Kai Xiao, et al., “Noise or Signal:The Role of Image Backgrounds in Object Recognition”  https://arxiv.org/abs/2006.09994
  • 3. Datasets for Image Classification • How aligned are existing benchmarks with their motivating real-world tasks? • The sheer size of machine learning datasets makes meticulous data curation virtually impossible. Dataset creators thus resort to scalable methods such as automated data retrieval and crowd-sourced annotation. • As a result, the dataset and its corresponding annotations can sometimes be ambiguous, incorrect, or otherwise misaligned with ground truth.
  • 4. Correct Labels? projectile acoustic guitar church American Staffordshire terrier ImageNet Labels
  • 5. ImageNetCreation Pipeline • Image and label collection  The ImageNet creators first selected a set of classes using the WordNet hierarchy.  Then, for each class, they sourced images by querying several search engines with the WordNet synonyms of the class, augmented with synonyms of its WordNet parent node(s).  To further expand the image pool, the dataset creators also performed queries in several languages.
  • 6. ImageNet Creation Pipeline • Image validation via the CONTAINS task  The ImageNet creators employed annotators via the MechanicalTurk (MTurk) crowd-sourcing platform.  Annotators were presented with its description (along with links to the relevant Wikipedia pages) and a grid of candidate images.  Their task was then to select all images in that grid that contained an object of that class.
  • 7. Revisiting the ImageNet Labels • The root cause for many of these errors is that the image validation stage (i.e., the CONTAINS task) asks annotators only to verify if a specific proposed label. • Crucially, annotators are never asked to choose among different possible labels for the image and, in fact, have no knowledge of what the other classes even are.
  • 8. Images with Multiple Objects • Annotators are instructed to ignore the presence of other objects when validating a particular ImageNet label for an image.
  • 9. Biases in Image Filtering • Since annotators have no knowledge of what the other classes are, they do not have a sense of the granularity of image features they should pay attention to. • Moreover, the task itself does not necessary account for their expertise, e.g., between all the 24 terrier breeds that are present in ImageNet.
  • 10. From LabelValidation to Image Classification • Authors would like them to classify the image, selecting all the relevant labels for it. However, asking (untrained) annotators to choose from among all 1,000 ImageNet classes is infeasible. • First, they obtain a small set of (potentially) relevant candidate labels for each image. • Then, they present these labels to annotators and ask them to select one of them for each distinct object using what they call the CLASSIFY task. • They use 10,000 images from ImageNet validation set – 10 randomly selected images per class.
  • 11. Obtaining Candidate Labels • Potential labels for each image by simply combining the top-5 predictions of 10 models from different parts of the accuracy spectrum with the existing ImageNet label. • Asking annotators whether an image contains a particular class but for all potential labels.The outcome of this experiment is a selection frequency for each image-label pair, i.e., the fraction of annotators that selected the image as containing the corresponding label
  • 13. Image Classification via the CLASSIFYTask • Asking annotators to identify  All labels that correspond to objects in the image.  The label for the main object (according to their judgment).  Select only one label per distinct object except mutually exclusive objects. • Identifying the main label and number of objects  From each annotator’s response, authors learn what they consider to be the label of the main object, as well as how many objects they think are present in the image
  • 14. Image Classification via the CLASSIFYTask
  • 15. Quantifying the Benchmark-Task Alignment of ImageNet • Multi-object images  More than 1/5 contains at least two objects.  Models perform significantly worse on multi-label images based on top-1 accuracy (measured w.r.t. ImageNet labels): accuracy drops by more than 10% across all models.  There are pairs of classes which consistently co-occur.  Model accuracy is especially low on certain classes that systematically co- occur.  In light of this, a more natural notion of accuracy for multi-object images would be to consider a model prediction to be correct if it matches the label of any object in the image.
  • 19. Multi-Object Images • Human-label disagreement  Although models suffer a sizeable accuracy drop on multi-object images, they are still relatively good at predicting the ImageNet label.  This bias could be justified whenever there is a distinct main object in the image and it corresponds to the ImageNet label.  However, for nearly a third of the multi-object images, the ImageNet label does not denote the most likely main object as judged by human annotators.
  • 23. Quantifying the Benchmark-Task Alignment of ImageNet • Bias in label validation  Recall that annotators are asked a somewhat leading question, of whether a specific label is present in the image, making them prone to answering positively even for images of a different, yet similar, class.  Under the original task setup (i.e., the CONTAINS task), annotators consistently select multiple labels, for nearly 40% of the images, another label is selected at least as often as the ImageNet label.  Even when annotators perceive a single object, they often select as many as 10 classes.  If instead of asking annotators to judge the validity of a specific label(s) in isolation, asking them to choose the main object among several possible labels simultaneously (i.e., via the CLASSIFY task), they select substantially fewer labels.
  • 26. Bias in LabelValidation • Confusing class pairs  There are several pairs of ImageNet classes that annotators have trouble telling apart—they consistently select both labels as valid for images of either class.  On some of these pairs we see that models still perform well.  However, for other pairs, even state-of-the-art models have poor accuracy (below 40%).  If that is indeed the case, it is natural to wonder whether accuracy on ambiguous classes can be improved without overfitting to the ImageNet test set.
  • 27. Bias in LabelValidation • Confusing class pairs  The annotators’ inability to remedy overlaps in the case of ambiguous class pairs could be due to the presence of classes that are semantically quite similar, e.g., “rifle” and “assault rifle”.  In some cases, there also were errors in the task setup. For instance, there were occasional overlaps in the class names (e.g., “maillot” and “maillot, tank suit”) andWikipedia links (e.g., “laptop computer” and “notebook computer”) presented to the annotators.
  • 28.
  • 29. BeyondTest Accuracy: Human-In-The-Loop Model Evaluation • Human assessment of model predictions  Selection frequency of the prediction How often annotators select the predicted label as being present in the image (determined using the CONTAINS task).  Accuracy based on main label annotation How frequently the prediction matches the main label for the image, as obtained using our CLASSIFY task.
  • 30. Human Assessment of Model Predictions
  • 31. Human Assessment of Model Predictions – Incorrect Predictions
  • 32. Human Assessment of Model Predictions • The predictions of state-of-the-art models have gotten, on average, quite close to ImageNet labels. • That is, annotators are almost equally likely to select the predicted label as being present in the image (or being the main object) as the ImageNet label. • This indicates that model predictions might be closer to what non- expert annotators can recognize as the ground truth. • This also means, we are at a point where we may no longer by able to easily identify (e.g., using crowd-sourcing) the extent to which further gains in accuracy correspond to such improvements.
  • 34. Conclusion • There are many images with multiple valid labels, and ambiguous classes in ImageNet. • Top-1 accuracy often underestimates the performance of models by unduly penalizing them for predicting the label of a different, but also valid, image object. • Taking a step towards evaluation metrics that circumvent these issues, authors utilize human annotators to directly judge the correctness of model predictions. • On the positive side, they find that models that are more accurate on ImageNet also tend to be more human-aligned in their errors. • While this might be reassuring, it also indicates that we are at a point where we cannot easily gauge (e.g., via simple crowd-sourcing) whether further progress on the ImageNet benchmark is meaningful, or is simply a result of overfitting to this benchmarks’ idiosyncrasies.