Image classification is a well-established problem in computer vision. Most state-of-the-art models rely on Convolutional Neural Networks to achieve near-human performance in that task. However, CNNs have been shown to be susceptible to image manipulation, which undermines the trustworthiness of perception systems. This property is critical, especially in unmanned systems, autonomous vehicles, and scenarios where light cannot be controlled. We investigate the robustness of several Deep-Learning-based image recognition models and how their accuracy is affected by several distinct image distortions. The distortions include ill-exposure, low-range image sensors, and common noise types. Furthermore, we also propose and evaluate an image pipeline designed to minimize image distortion before image classification is performed. Results show that most CNN models are marginally affected by mild mis-exposure...
2. ABOUT
• Badly exposed and noisy images
• Image Recognition / Classification
• Assessing the impacts of image distortions on the object recognition task
• Image restoration and enhancement as a pre-processing strategy
• How much can it impact the results?
3. INTRODUCTION
How can the image be affected at capture time?
• Typical challenges
• High contrast (light source, contrasting scene)
• Low contrast
• Low light (weakly illuminated scene)
• Lens aperture
⬆ Larger aperture equals more light, shorter exposure time
⬇ Larger aperture equals lower depth of field
• Exposure time vs. Blur
⬆ Higher exposure time equals more light getting to the sensor
⬆ Higher time equals more blur artifacts (inadequate for scenes with moving subjects)
• Gain vs. Granularity
⬆ More gain equals more contrast
⬆ More gain equals more noise (both signal and noise are amplified)
• Quantization, Sampling and Clipping
8. OUR APPROACH
To use a Convolutional Neural Network for image enhancement and noise suppression
• Modular approach to maximize reuse
• Avoid model adjustment
• Work in the sRGB color-space
• Flexible to accept different image resolutions
Scene Picture → Enhance → De-noise → Predict → “Dog”
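The pipeline above can be sketched as a chain of independent, swappable callables. This is a minimal illustration with hypothetical stand-in stages; in the actual pipeline, the enhance and de-noise stages are ReExposeNet and DnCNN-3.

```python
import numpy as np

def run_pipeline(image, enhance, denoise, predict):
    """Modular pipeline: enhance -> de-noise -> predict, each stage swappable."""
    return predict(denoise(enhance(image)))

# Hypothetical stand-in stages, for illustration only.
def identity_enhance(img):
    return img  # a real system would plug in an exposure-correction CNN here

def identity_denoise(img):
    return img  # a real system would plug in a denoising CNN here

def toy_predict(img):
    return "dog" if img.mean() > 0.25 else "unknown"

img = np.full((224, 224, 3), 0.5, dtype=np.float32)  # dummy sRGB-like input
label = run_pipeline(img, identity_enhance, identity_denoise, toy_predict)
print(label)  # → dog
```

Because each stage only needs to accept and return an image array, stages can be replaced or skipped without adjusting the classifier, which is the point of the modular design.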
10. IMAGE RESTORATION MODELS
• ReExposeNet for mis-exposed images
• Based on U-Net and Context Aggregation Network
• One size fits all
• Small model in terms of parameters when compared to others with the same purpose
• Adjusted using supervised learning on synthetic and real datasets
• DnCNN-3 for de-noising
• Very deep feed-forward CNN
• Relies on Residual Learning
• Can tackle several image denoising tasks, as well as JPEG deblocking and super-resolution
• Adequate for real-time applications
25. CONCLUSION
Comprehensive experiments on the robustness of state-of-the-art CNN-based image recognition
• Several common image distortions
• Ill exposure
• Signal-dependent noise
• Signal-independent noise
• Used a set of classifiers which had outstanding accuracy in the ILSVRC Competition
• We offer a succinct representation of each classifier's performance
• Existing CNNs are little affected by slight mis-exposure or saturated pixel values
• Poisson noise and AWGN also have limited effect on the accuracy
• Models appear vulnerable to severe mis-exposure and signal-independent noise
• What next?
• Do segmentation, mapping, and localization systems follow the same robustness pattern?
Hi,
My name is Cristiano Steffens. I’ll be presenting “A Pipelined Approach to Deal with Image Distortion in Computer Vision”. This work was done with my colleague Lucas Messias under the supervision of Professor Drews and Professor Botelho.
Here, we extend a previous work, published last year, in which we showed the impacts of image distortion on several image-based applications. For those who are unfamiliar with the image recognition/classification task, please consider this a classic instance of the garbage-in, garbage-out issue. Our models are only as good as our input data. If you have faulty data, you are likely to get unreliable results.
Although images are multidimensional data, in which the content can often be inferred by the inherent relationship among distinct image parts, we show that the same issues still apply.
Image classification is a well-established problem in computer vision. Most state-of-the-art models rely on Convolutional Neural Networks to achieve near-human performance in that task. However, CNNs have been shown to be susceptible to image manipulation, which undermines the trustworthiness of perception systems. This property is critical, especially in unmanned systems, autonomous vehicles, and scenarios where light cannot be controlled. We investigate the robustness of several Deep-Learning-based image recognition models and how their accuracy is affected by several distinct image distortions. The distortions include ill-exposure, low-range image sensors, and common noise types. Furthermore, we also propose and evaluate an image pipeline designed to minimize image distortion before image classification is performed. Results show that most CNN models are marginally affected by mild mis-exposure and Shot noise. On the one hand, the proposed pipeline can provide significant gains on mis-exposed images. On the other hand, harsh mis-exposure, signal-dependent noise, and impulse noise have a high impact on all evaluated models.
(considering most image data is transmitted or stored in formats that can easily be converted to sRGB)
[worthy of attention or notice; remarkable]
Image classification models are built to predict the classes of objects present in an image. The remainder of this paper explores CNN-based classification models, which have been adjusted for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Convolutional networks have recently enjoyed great success in this task. Among these, we highlight the following popular models: \textit{i.} VGG, by Simonyan \etal \cite{simonyan2014very}, which obtained both first and second place in the ILSVRC-2014; \textit{ii.} ResNet, by \cite{he2016deep}, which obtained first place in the ILSVRC-2015; \textit{iii.} Inception-v3, by \cite{szegedy2016rethinking}, which introduces factorized convolutions and aggressive regularization; \textit{iv.} Inception-ResNet-v2, by \cite{szegedy2017inception}, which combines residual connections with the Inception architecture; \textit{v.} MobileNetV1, by \cite{howard2017mobilenets}, which includes depthwise separable convolutions between the regular convolution layers; \textit{vi.} DenseNet, by \cite{huang2017densely}, where each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers; \textit{vii.} NASNet, by \cite{zoph2018learning}, which automates network design using information acquired on a small dataset.
In order to restore mis-exposed images, we introduced the ReExposeNet \cite{steffens2019icip} image restoration model into the image recognition pipeline. This model is designed to estimate the radiance of an improperly exposed image, a task that requires restoration and enhancement of non-clipped pixels to maximize visibility and color accuracy, as well as reconstruction strategies for regions where the signal has been clipped. ReExposeNet is a fast and small CNN exposure correction model, capable of synthesizing substantial clipped parts in high-resolution images. It combines aspects of both U-Nets and Context Aggregation Networks (CANs). ReExposeNet relies on supervised training considering a custom content-based objective function to maximize restoration and reconstruction in clipped areas. It has been adjusted considering both synthetic and real mis-exposed images in three different datasets. ReExposeNet is released as a one-size-fits-all solution, which can be consistently applied on a wide range of image mis-exposure levels. For the present work, we used the model as released by its authors, without further fine-tuning.
To restore images damaged by noise, we used the DnCNN-3 model \cite{zhang2017beyond}. DnCNN-3 is a very deep feed-forward denoising convolutional neural network. It relies on residual learning and batch normalization to speed up the training process as well as boost the denoising performance. Zhang \etal claim to provide a single DnCNN model to tackle several general image denoising tasks, such as blind Gaussian denoising, single image super-resolution, and JPEG image deblocking. The authors show that the DnCNN model can not only exhibit high effectiveness in several general image denoising tasks but also be efficiently implemented by benefiting from GPU computing, which makes it adequate for real-time applications.
Gamma Power Transformation is a nonlinear operation used to encode and decode luminance values in imaging systems. It is used to adjust and compensate the response of some luminance levels in the input image. We use the Gamma Power Transformation to mimic the conditions observed in under-exposed and over-exposed images as $\hat{I} = I^\gamma$. The power transformation is followed by min-max normalization in order to adjust pixel values to a valid representation range. This transformation results in data loss in dark regions, when $\gamma > 1$, or in bright, washed-out regions, when $\gamma < 1$. For simulation purposes, we used $\gamma = [\frac{1}{4}; \frac{1}{6}; \frac{1}{8}; 4; 6; 8]$.
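A minimal NumPy sketch of this distortion ($\hat{I} = I^\gamma$ followed by min-max normalization), assuming pixel values already scaled to $[0, 1]$:

```python
import numpy as np

def gamma_distort(img, gamma):
    """Mis-exposure simulation: power transform, then min-max normalization."""
    out = np.power(img.astype(np.float64), gamma)
    lo, hi = out.min(), out.max()
    return (out - lo) / (hi - lo) if hi > lo else np.zeros_like(out)

img = np.linspace(0.0, 1.0, 256).reshape(16, 16)  # synthetic gradient image
under = gamma_distort(img, 4.0)    # gamma > 1: detail crushed in dark regions
over = gamma_distort(img, 1 / 4)   # gamma < 1: bright, washed-out regions
```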
R stands for restored, which means we are using our pipelined approach.
For gamma 4, we notice a significant improvement in accuracy with the pipeline.
We also notice that NASNet Large (purple pentagon), Inception ResNet v2 (green ×), and Inception v3 (orange triangle) seem to be more robust to mild mis-exposure than the other models considered.
Small models such as MobileNet v2 seem to be less resilient to image distortion.
Gaussian Noise is randomly added to the input image. The random noise follows a normal distribution.
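As a sketch, additive white Gaussian noise can be applied as below; the `sigma` value is illustrative, not one of the levels used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, sigma):
    """AWGN: add zero-mean normal noise, clip back to the valid [0, 1] range."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

img = np.full((64, 64), 0.5)          # flat gray test image
noisy = add_gaussian_noise(img, sigma=0.1)
```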
Shot Noise, also known as Photon or Poisson Noise, is a data-dependent noise model. A Poisson noise model may be more appropriate than a Gaussian one for low-light conditions, where the noise is due to low photon counts.
Talbot claims that image sensor noise is dominated by Poisson statistics even at high illumination levels, this being a typical effect in images captured by robots.
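A common way to simulate signal-dependent shot noise is to draw each pixel from a Poisson distribution scaled by a photon-count parameter; the `peak` parameter here is a hypothetical choice for illustration, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_shot_noise(img, peak=30.0):
    """Poisson (shot) noise: variance grows with the signal.
    Lower `peak` means fewer photons, hence a noisier image."""
    return np.clip(rng.poisson(img * peak) / peak, 0.0, 1.0)

img = np.full((64, 64), 0.5)
noisy = add_shot_noise(img, peak=30.0)
```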
Salt and Pepper Noise (S\&P) is an impulse noise, added to an image by setting white pixels (pixel value equals 255 in an 8-bit-per-channel color space) and black pixels (pixel value equals 0). In real applications, Salt \& Pepper noise is often associated with dead pixels in the camera's sensor array.
Details on the probability are provided in the paper.
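A minimal sketch of impulse noise on a [0, 1] image; the `amount` used here is illustrative, since the actual corruption probabilities are those given in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_salt_pepper(img, amount=0.05):
    """Impulse noise: force a fraction `amount` of pixels to pure black/white."""
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < amount / 2] = 0.0        # pepper (dead/black pixels)
    noisy[mask > 1 - amount / 2] = 1.0    # salt (saturated/white pixels)
    return noisy

img = np.full((100, 100), 0.5)
noisy = add_salt_pepper(img, amount=0.05)
```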
Speckle noise originates from the coherent processing of back-scattered signals from multiple distributed points. It follows a uniform distribution.
Speckle noise in real applications is often related to environmental conditions that affect the imaging sensor during image acquisition. It is also common in medical images, as well as active Radar images.
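Following the uniform-distribution description above, multiplicative speckle can be sketched as below; the `spread` value is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_speckle_noise(img, spread=0.2):
    """Multiplicative speckle: I + I*u with u ~ Uniform(-spread, spread),
    so the perturbation scales with the signal itself."""
    u = rng.uniform(-spread, spread, img.shape)
    return np.clip(img + img * u, 0.0, 1.0)

img = np.full((64, 64), 0.5)
noisy = add_speckle_noise(img)  # values stay within 0.5 * (1 +/- 0.2)
```

Because the noise is multiplicative, darker regions are perturbed less than bright ones, which distinguishes speckle from the additive models above.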
NASNet Large performs best.
Left to right: VGG, ResNet, Inception v3, Inception Resnet v2, DenseNet, NasNet Large, NasNet Mobile, MobileNet v2
That’s it for today,
If you have any questions regarding the experiments, or if you’d like to share your thoughts on this matter, please get in touch.
Thank you for your attention.