A Survey of Image Classification with Deep Learning in the Presence of Noisy Labels
1. A Survey of Image Classification with Deep Learning in the Presence of Noisy Labels
Monica Dommaraju
San Jose State University
2. Introduction
● Advances in deep neural networks have made them central to image classification, object detection, semantic segmentation, and other tasks.
● However, they require huge amounts of labeled data for training. Labeling data at this scale is very expensive and, in practice, very challenging.
● As a result, label noise has become a common problem in many datasets.
3. Two kinds of noise can occur in a dataset:
1. Feature noise is corruption of the observed features of the data.
2. Label noise is a change of a sample's label away from its true class.
● Both kinds of noise can significantly decrease performance, but label noise is considered more harmful.
● The main reasons label noise is more damaging to classification performance are:
1. Each sample has a single label, whereas it has many features.
2. The importance of individual features varies, while the label always has a significant impact.
4. The main factors that affect label noise are the data features, the true label of the data, and the labeler characteristics. Based on its dependence on these factors, label noise is further classified into three subclasses:
1. Random noise is completely random and depends on neither the instance features nor its true class.
2. Y-dependent noise is independent of the image features but depends on the image's class.
3. XY-dependent noise depends on both the image features and the class.
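The three subclasses differ only in what the corruption is allowed to depend on. The Python/NumPy sketch below generates each type synthetically to make that concrete. It is an illustration under stated assumptions, not a method from the survey: the transition matrix `T`, the per-sample `ambiguity` score (a hypothetical stand-in for information derived from image features), the `confusable` class mapping, and all function names are made up for the example.

```python
import numpy as np

def random_noise(y, num_classes, rate, rng):
    """Random noise: whether a label is replaced, and by what, ignores both x and y."""
    y_noisy = y.copy()
    flip = rng.random(len(y)) < rate
    # Replacement is drawn uniformly over ALL classes, so it is independent of the true label.
    y_noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return y_noisy

def y_dependent_noise(y, T, rng):
    """Y-dependent noise: the noisy label is sampled from row T[y] of a transition matrix.

    T[i, j] is the (assumed) probability that true class i is labeled as class j.
    """
    return np.array([rng.choice(T.shape[1], p=T[c]) for c in y])

def xy_dependent_noise(ambiguity, y, confusable, max_rate, rng):
    """XY-dependent noise: flip probability depends on the sample, the wrong label on the class.

    `ambiguity` is a hypothetical per-sample score in [0, 1] derived from the features;
    `confusable[c]` is the (hypothetical) class that class c is most often mistaken for.
    """
    flip = rng.random(len(y)) < max_rate * ambiguity
    y_noisy = y.copy()
    y_noisy[flip] = confusable[y[flip]]
    return y_noisy

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=1000)
T = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.9, 0.1],
              [0.3, 0.0, 0.7]])
ambiguity = rng.random(1000)          # stand-in for a feature-derived difficulty score
confusable = np.array([1, 2, 0])      # hypothetical "most confused with" mapping

y_rand = random_noise(y, 3, rate=0.2, rng=rng)
y_class = y_dependent_noise(y, T, rng)
y_feat = xy_dependent_noise(ambiguity, y, confusable, max_rate=0.4, rng=rng)
print((y_rand != y).mean(), (y_class != y).mean(), (y_feat != y).mean())
```

Only the inputs change between the three functions: the first sees neither `x` nor `y`, the second sees only `y`, and the third sees both.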
5. Label noise
● Label noise is a natural outcome of the dataset collection process and can occur in various domains, such as medical imaging, crowdsourcing, social network tagging, and financial analysis.
● This work focuses on solutions to the problem, but investigating the causes of label noise first helps to understand the phenomenon better.
6. Causes of Label Noise:
● First, huge amounts of data are available on the web and social media. However, their labels come from automated systems used by search engines or from user tags, which may result in noisy labels.
● Second, the data can be labeled by multiple experts, but each expert has a different level of experience, which again leads to noisy labels.
● Sometimes the data is so complicated that even experts cannot label it correctly, as in medical imaging.
● Label noise can also be injected deliberately, for regularization or for data poisoning.
7. ● Various algorithms are used to estimate the noise structure and to train the base classifier with the newly estimated parameters.
● This paper mainly sheds light on these algorithms, which are categorized into two subgroups:
1. Noise-model based methods
2. Noise-model free methods
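One simple way to estimate the noise structure is shown below as an illustrative assumption rather than any specific algorithm from the survey. It counts how often each true class receives each noisy label on a small trusted subset whose clean labels are known. The result is a row-stochastic transition matrix `T`, where `T[i, j]` approximates P(noisy label = j | true label = i). The function name and the identity fallback for classes missing from the subset are hypothetical choices.

```python
import numpy as np

def estimate_transition_matrix(y_true, y_noisy, num_classes):
    """Estimate T[i, j] ~ P(noisy label = j | true label = i) on a trusted subset."""
    counts = np.zeros((num_classes, num_classes))
    np.add.at(counts, (y_true, y_noisy), 1)        # counts[i, j] += 1 per (true, noisy) pair
    row_sums = counts.sum(axis=1, keepdims=True)
    # Hypothetical fallback: a class absent from the trusted subset is assumed noise-free.
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), np.eye(num_classes))

# Toy trusted subset: 8 samples whose clean and noisy labels are both known.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_noisy = np.array([0, 0, 0, 1, 1, 1, 1, 0])
T = estimate_transition_matrix(y_true, y_noisy, num_classes=3)
print(T)  # rows 0 and 1 are [0.75, 0.25, 0] and [0.25, 0.75, 0]; row 2 falls back to [0, 0, 1]
```

A noise-model based method can only be as good as such an estimate, which is why these methods depend on an accurate picture of the noise structure.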
8. Noise-model based methods:
● These methods aim to extract the noise-free information in the dataset by either ignoring or de-emphasizing information coming from noisy samples.
● They perform well when prior information about the noise structure is available.
● The advantage of noise-model based methods is that label noise estimation is decoupled from classification, which lets them work with the classification algorithm at hand.
9. Types of noise-model based methods:
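1. Noisy Channel:
● A noise channel, which maps the classifier's clean-label predictions to noisy-label predictions (for example, through a noise transition matrix), is used during the training phase. It is removed in the evaluation phase, because the classifier must output noise-free predictions.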
2. Label Noise Cleansing
● An obvious solution to noisy labels is to identify suspicious labels and correct them to their true classes.
10. 3. Dataset Pruning
● Instead of correcting noisy labels to their true classes, the noisy samples can be removed. This loses some information but prevents the negative impact of the noise.
4. Sample Choosing
● To overcome label noise, sample choosing manipulates the input stream to the classifier, deciding which samples it is trained on.
11. 5. Sample Importance Weighting
● When both clean and noisy data are available, clean samples are weighted more heavily during training.
6. Labeler Quality Assessment
● Labelers have different levels of expertise, so the way they label varies and their labels may contradict each other.
● This setup involves two unknowns: the noisy labelers' characteristics and the ground-truth labels.
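The noisy channel (item 1 above) can be sketched as follows, assuming the transition matrix `T` is already known or estimated. During training, the classifier's clean-class probabilities are passed through `T`, so the loss compares a predicted noisy-label distribution with the observed noisy label. At evaluation the channel is dropped and the clean probabilities are used directly. This is a framework-free NumPy sketch with hypothetical function names, not the survey authors' code.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def noisy_channel_loss(logits, y_noisy, T):
    """Training-time loss: cross-entropy of the predicted NOISY-label distribution.

    T[i, j] is the assumed probability that true class i is observed as noisy class j.
    """
    p_clean = softmax(logits)            # classifier's estimate of p(y | x)
    p_noisy = p_clean @ T                # channel: p(y~ = j | x) = sum_i p(y = i | x) * T[i, j]
    picked = p_noisy[np.arange(len(y_noisy)), y_noisy]
    return -np.mean(np.log(picked + 1e-12))

def predict(logits):
    """Evaluation time: the channel is removed and the clean-class argmax is returned."""
    return softmax(logits).argmax(axis=1)

T = np.array([[0.8, 0.2],
              [0.2, 0.8]])
logits = np.array([[4.0, 0.0],    # classifier is confident in class 0
                   [0.0, 4.0]])   # classifier is confident in class 1
y_noisy = np.array([1, 1])        # the first observed label may be a flipped 0
print(noisy_channel_loss(logits, y_noisy, T))
print(predict(logits))            # [0 1]: clean predictions, unaffected by T
```

With an identity `T` the channel loss reduces to ordinary cross-entropy. With a noisy `T`, a confident prediction that disagrees with a possibly flipped label is penalized less, so the classifier is not pushed to fit that label.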
12. Noise-model free methods:
● These methods aim to be inherently robust to noise without explicitly modeling the noise structure, so no prior information about that structure is required.
● They mainly focus on regularizing the network training procedure to avoid overfitting, because they assume that the classifier is not too sensitive to the noise itself and that the performance degradation results from overfitting.
13. Types of noise-model free methods:
1. Robust Losses
● A loss function is called noise-robust if the classifiers learned with noisy data and with noise-free data achieve the same classification performance.
2. Meta-Learning
● Often described as "learning to learn", meta-learning is a learning paradigm that absorbs information from one task and generalizes it efficiently to unseen tasks.
14. 3. Regularizers
● Regularization methods treat the performance degradation caused by noisy data as overfitting to the noise and counter it by regularizing training.
4. Ensemble Methods
● Bagging is well known to be more robust to label noise than boosting.
● However, the degree of robustness to label noise varies with the boosting algorithm chosen.
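As an illustration of robust losses (item 1 above), the sketch below compares standard cross-entropy with the mean absolute error (MAE) between the one-hot label and the softmax output. MAE is chosen only as a well-known example of a loss shown to tolerate symmetric label noise; it is not the only robust loss, and the function names are hypothetical. The key property is that MAE equals 2 - 2·p_y and is therefore bounded, so one confidently mispredicted (possibly mislabeled) sample cannot dominate the total loss the way it can under cross-entropy.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    """Per-sample cross-entropy -log p_y; unbounded as p_y approaches 0."""
    return -np.log(p[np.arange(len(y)), y] + 1e-12)

def mae_loss(p, y):
    """Per-sample MAE between one-hot label and prediction; equals 2 - 2 * p_y, so it lies in [0, 2]."""
    onehot = np.eye(p.shape[1])[y]
    return np.abs(onehot - p).sum(axis=1)

logits = np.array([[10.0, 0.0, 0.0],   # confident in class 0
                   [0.0, 5.0, 0.0]])   # confident in class 1
y = np.array([2, 1])                   # first label disagrees with the prediction (possibly noisy)
p = softmax(logits)
ce, mae = cross_entropy(p, y), mae_loss(p, y)
print("CE :", ce)    # first entry is large (about 10)
print("MAE:", mae)   # first entry is close to its upper bound of 2
```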
15. Conclusion:
● Label noise is the main obstacle to achieving the best results on real-world datasets.
● Handling noisy labels is also important in many fields beyond image classification, such as generative networks, semantic segmentation, and sound classification.
● Noise-model based methods depend heavily on an accurate estimate of the noise structure. The most appropriate method can be chosen based on the requirements of the problem.
● Noise-model free methods do not require any prior information about the noise structure, so they are easier to implement if the noise is random and overfitting is