The document summarizes a research paper that investigated the impact of label errors in test sets on machine learning benchmarks. Some key points:
1. The researchers estimated label error rates in 10 popular datasets, finding an average of 3.6% errors. They corrected labels in CIFAR10 and ImageNet.
2. Experiments showed that while test label errors did not change the relative performance of models on the full test set, models could achieve higher accuracy on the errors compared to corrected labels.
3. This suggests models may overfit to common label errors in both train and test sets, undermining the reliability of benchmarks in selecting the best model. The importance of ensuring test label accuracy is emphasized.
29. 引用
1. Yann Lecun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. In
Proceedings of the IEEE, pages 2278–2324, 1998.
2. Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
3. G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report 7694, California Institute of Technology,
2007.
4. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In Conference on
Computer Vision and Pattern Recognition (CVPR), 2009.
5. David Ha and Douglas Eck. A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477, 2017.
6. Tom Mitchell. Twenty newsgroups dataset, 1999.
7. Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for
sentiment analysis. In Annual Conference of the Association for Computational Linguistics (ACL), pages 142–150, Portland, Oregon,
USA, June 2011.
8. Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-based recommendations on styles and
substitutes. In Special Interest Group on Information Retrieval (SIGIR), pages 43–52, New York, NY, USA, 2015.
29
30. 引用
9. Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin
Ritter. Audio set: An ontology and human-labeled dataset for audio events. In IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), New Orleans, LA, 2017.
10. Curtis G. Northcutt, Lu Jiang, and Isaac L. Chuang. Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial
Intelligence Research (JAIR), 2021.
11. Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition.
In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8697–8710, 2018.
12. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint
arXiv:1409.1556, 2014.
30