1) The document evaluates how state-of-the-art convolutional neural networks (CNNs) perform on image recognition tasks when images are exposed to different types of noise, distortions and compression.
2) It finds that while CNN models are robust to mild exposure issues and noise, performance decreases significantly under moderate to severe exposure problems and salt-and-pepper noise.
3) Larger CNN models like NASNet Large perform best, while smaller mobile models are most affected by distortions. The study aims to improve CNN robustness and build image processing pipelines to handle faulty data.
2. OUTLINE
Evaluating image recognition models beyond validation sets
• Perception / Vision is an important component of modern autonomous systems
• CNNs hold the state-of-the-art in image recognition
• Growing interest in reliability / robustness
• Comprehensive assessment
• Clear methodology
• State-of-the-art models
• Several types of distortion
• Further directions
• How can we build better models?
• Can we prevent systems from operating on faulty data?
• Can we build better pipelines?
3. MOTIVATION
Under-exposure conditions
• Weakly illuminated scenes
• Time constraints (e.g. the robot depends on the image acquisition/processing to make a decision)
• Scenes with high dynamic ranges
• Small aperture (hardware construction)
• Low quality/cost sensors
Example images: properly exposed, low range, gamma 2, gamma 4, gamma 8
5. MOTIVATION
Over-Exposure
• Scene with high dynamic range
• Ill-adjusted optics/gain
• Time constraints
• Reflective surfaces
• Low dynamic range sensors
Example images: properly exposed, low range, gamma 1/2, gamma 1/4, gamma 1/8
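To make the exposure levels above concrete, here is a minimal sketch of how such mis-exposure can be simulated with a gamma curve. It assumes images normalized to [0, 1]; the exact transform and parameters used in the study are not spelled out here.

```python
import numpy as np

def adjust_exposure(image, gamma):
    """Simulate mis-exposure with a gamma curve.

    image: float array with values in [0, 1].
    gamma > 1 darkens the image (under-exposure, e.g. gamma 2, 4, 8),
    gamma < 1 brightens it (over-exposure, e.g. gamma 1/2, 1/4, 1/8).
    """
    return np.clip(image, 0.0, 1.0) ** gamma
```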
6. MOTIVATION
Lossy Compression, Poisson, Gaussian, Salt & Pepper and Speckle Noise
• Bandwidth limitation
• Storage limitation
• Sensor quality
• Dead pixels (always off or always on)
• Wear and tear
• Dust, damage on lens and sensors, noise
Example images: over-compression, Poisson noise, Gaussian noise, salt & pepper, speckle noise
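As a rough illustration, the corruptions listed above can be generated with standard tooling; the snippet below uses Pillow and scikit-image as an assumption, not necessarily the exact implementation used in the study.

```python
from io import BytesIO

from PIL import Image
from skimage.util import random_noise

def over_compress(pil_image, quality=5):
    """Re-encode as a very low-quality JPEG to simulate over-compression."""
    buf = BytesIO()
    pil_image.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

def add_noise(image, kind):
    """image: float array in [0, 1]; kind: 'poisson', 'gaussian', 's&p' or 'speckle'."""
    return random_noise(image, mode=kind)
```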
8. PROCEDURE
A procedure that can be reproduced and used for any vision task (a minimal sketch follows after the list below)
• We use pre-trained image recognition models
• No fine-tuning
• Exact same preprocessing as in the original implementation
• Official ImageNet validation set
• 1000 classes
• 50 images per class
• Inference on:
• Original set (to avoid hardware-related, interpolation, and other biases)
• 8 levels of mis-exposure
• Over-compressed images
• 4 types of typical noise
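A minimal sketch of this evaluation loop, using one pre-trained Keras model as an illustration; the model choice and helper names here are placeholders, since the study covers several architectures, each with its original preprocessing.

```python
import numpy as np
from tensorflow.keras.applications.inception_resnet_v2 import (
    InceptionResNetV2, decode_predictions, preprocess_input)

# Pre-trained ImageNet weights, no fine-tuning.
model = InceptionResNetV2(weights="imagenet")

def predict_top1(image_299):
    """image_299: HxWx3 uint8 array already resized to 299x299."""
    x = preprocess_input(image_299.astype("float32")[np.newaxis])
    probs = model.predict(x, verbose=0)
    return decode_predictions(probs, top=1)[0][0]

# Run the same inference on the original validation images and on each
# distorted variant (8 exposure levels, over-compression, 4 noise types),
# then compare per-class accuracy, FNs, and FPs across conditions.
```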
11. RESULTS - INCEPTION-RESNET-V2
Overall good performance. Robust towards mild mis-exposure, compression, and Gaussian and Poisson noise
• FNs are limited to 50 per class due to the validation dataset properties; there is no upper bound for FPs.
• Statistics are per class: a median of 10 means that 50% of the classes in the dataset presented 10 or fewer false negatives.
• What is more important? Would you rather overrun a person due to a FN or stop in the middle of the road due to a FP?
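For readers unfamiliar with this per-class bookkeeping, a small sketch of how per-class FN/FP counts and their median could be computed from top-1 predictions (variable names are hypothetical):

```python
import numpy as np

def per_class_fn_fp(y_true, y_pred, num_classes=1000):
    """Count false negatives and false positives per class for top-1 predictions.

    With 50 validation images per class, FN per class is capped at 50,
    while FP per class has no upper bound.
    """
    fn = np.zeros(num_classes, dtype=int)
    fp = np.zeros(num_classes, dtype=int)
    for t, p in zip(y_true, y_pred):
        if t != p:
            fn[t] += 1  # the true class missed this image
            fp[p] += 1  # the predicted class gained a spurious hit
    return fn, fp

# A median FN of 10 means half the classes have 10 or fewer false negatives:
# fn, fp = per_class_fn_fp(labels, predictions); np.median(fn)
```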
12. RESULTS - MOBILENETV1
Not robust to S&P and speckle noise. Highly affected by moderate mis-exposure.
13. RESULTS - NASNET LARGE
Best accuracy, precision, and F1-Score among all models considered in this study
14. RESULTS - NASNET MOBILE
Significantly affected by severe mis-exposure conditions, S&P, and Speckle noise
16. RESULTS – XCEPTION
Robust towards moderate mis-exposure, over-compression, Gaussian and Poisson noise
17. CONCLUSION
New is always better! Larger is better!
• Relevant
• Autonomous systems
• Robotics
• Applications that rely on visual perception
• Comprehensive experiment
• Broad set of classifiers
• Based on standard ILSVRC validation set
• Poor exposure
• Heavy compression
• Signal independent noise
• Signal dependent noise
• Reproducible procedure
• Objective evaluation
• No human bias
18. CONCLUSION
New is always better! Larger is better!
• Most models are
• Little affected by mild mis-exposure
• Robust towards Poisson and Gaussian noise
• Critically affected by moderate to severe mis-exposure
• Critically affected by S&P and Speckle noise
• CNNs are evolving
• Modern architectures, such as NASNet, Inception-ResNet-v2, and Xception, are more robust
• VGG is among the least robust
• Large models are better
• NOT you, VGG!!
• NASNet Large performs significantly better than its Mobile version (while both share the same building blocks)
• Mobile models are most affected
19. ONGOING AND FUTURE WORK
We have a real issue! How can we solve it?
• Could the models’ accuracy be improved by adding these common distortions at training time? (a rough sketch follows at the end of this section)
😞 Preliminary results show only a small improvement
• Can we build image processing pipelines which protect the application from failing due to faulty data?
😃 Absolutely! Preliminary results are promising 👉
• Can we prevent ill exposure in mobile/outdoor robotics?
⏳ Future Work
• Can we improve classification models by putting more emphasis on image classes that are more prone to error?
⏳ Future Work
Example images: original, damaged, and restored (preliminary pipeline results)
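As a rough sketch of the first idea above (injecting the same distortions at training time), an augmentation step in a tf.data pipeline might look like the following; this is an assumption about one possible setup, not the preliminary experiment itself.

```python
import tensorflow as tf

def random_distort(image, label):
    """Randomly mis-expose and add noise to a training image (values in [0, 1])."""
    gamma = tf.random.uniform([], 0.25, 4.0)  # spans over- and under-exposure
    image = tf.image.adjust_gamma(image, gamma)
    noise = tf.random.normal(tf.shape(image), stddev=0.05)
    image = tf.clip_by_value(image + noise, 0.0, 1.0)
    return image, label

# train_ds = train_ds.map(random_distort, num_parallel_calls=tf.data.AUTOTUNE)
```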