The document proposes an enhanced super-resolution generative adversarial network called ESRGAN. ESRGAN improves on SRGAN in three key ways: 1) It introduces a novel residual-in-residual dense block as the basic network unit, removing batch normalization layers. 2) It uses a relativistic average discriminator which predicts relative realness of images rather than absolute values. 3) It improves the perceptual loss by using VGG features before activation rather than after. These improvements allow ESRGAN to generate super-resolution images with more realistic textures and details than SRGAN. ESRGAN was the top performer in the PIRM2018 super-resolution challenge based on its perceptual quality scores.
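The relativistic average discriminator in point 2 scores whether a real image looks *more realistic* than the average fake, and vice versa, rather than judging each image in isolation. A minimal NumPy sketch of the two losses, assuming raw critic outputs `c_real` and `c_fake` from some critic `C` (the names and standalone setting are illustrative; ESRGAN's actual critic is a trained CNN):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ra_d_loss(c_real, c_fake):
    """Relativistic average discriminator loss (BCE form).

    c_real, c_fake: raw critic outputs C(x) for real and fake batches.
    The discriminator estimates the probability that a real image is
    more realistic than the *average* fake, and vice versa.
    """
    d_real = sigmoid(c_real - c_fake.mean())
    d_fake = sigmoid(c_fake - c_real.mean())
    return -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())

def ra_g_loss(c_real, c_fake):
    """Generator loss: symmetric to the discriminator, roles swapped."""
    d_real = sigmoid(c_real - c_fake.mean())
    d_fake = sigmoid(c_fake - c_real.mean())
    return -(np.log(1.0 - d_real).mean() + np.log(d_fake).mean())
```

When the critic cleanly separates real from fake, the discriminator loss is near zero while the generator loss is large, which is the gradient signal that pushes the generator toward sharper textures.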
ADVANCED SINGLE IMAGE RESOLUTION UPSURGING USING A GENERATIVE ADVERSARIAL NET... (sipij)
The resolution of an image is an important criterion for evaluating its quality. Higher resolution is generally preferable, since low-resolution images look fuzzy and become unclear and indistinct when enlarged, which matters in fields such as medical imaging and astronomy. In recent years, much research has addressed generating a higher-resolution image from a lower-resolution one. In this paper, we propose a technique for generating higher-resolution images from lower-resolution ones using a deep network built from Residual in Residual Dense Blocks. We also compare our method with other methods to show that it produces images of better visual quality.
An Experiment with Sparse Field and Localized Region Based Active Contour Int... (CSCJournals)
This paper discusses experiments conducted on different types of Level Set interactive segmentation techniques, using Matlab, on selected images. The objective is to assess their effectiveness on natural images with complex composition in terms of intensity, colour mix, indistinct object boundaries, and low contrast. Besides visual assessment, measures such as the Jaccard Index, Dice Coefficient, and Hausdorff Distance have been computed to assess the accuracy of these techniques between segmented and ground-truth images. The paper particularly discusses Sparse Field Matrix and Localized Region Based Active Contours, both based on Level Sets. These techniques were not found to be effective where the object boundary is not very distinct and/or has low contrast with the background, or where the foreground object stretches up to the image boundary.
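For reference, the three accuracy measures named above can be computed directly. A small NumPy sketch (boolean masks for Jaccard and Dice, boundary point sets as (N, 2) arrays for Hausdorff):

```python
import numpy as np

def jaccard(seg, gt):
    """Jaccard index |A ∩ B| / |A ∪ B| for boolean masks."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    inter = np.logical_and(seg, gt).sum()
    union = np.logical_or(seg, gt).sum()
    return inter / union if union else 1.0

def dice(seg, gt):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); equals 2J / (1 + J)."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    inter = np.logical_and(seg, gt).sum()
    total = seg.sum() + gt.sum()
    return 2.0 * inter / total if total else 1.0

def hausdorff(a_pts, b_pts):
    """Symmetric Hausdorff distance between two point sets."""
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```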
SINGLE IMAGE SUPER RESOLUTION: A COMPARATIVE STUDY (csandit)
Most applications that derive and analyze data accurately and easily require high-resolution images, and image super resolution plays an effective role in them. Image super resolution is the process of producing a high-resolution image from a low-resolution image. In this paper, we study various image super resolution techniques with respect to the quality of their results and their processing time. This comparative study compares four single image super-resolution algorithms. For a fair comparison, the algorithms are tested on the same dataset and the same platform to show the major advantages of one over the others.
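Quality comparisons of this kind typically report PSNR. A minimal NumPy implementation (the `peak` default assumes 8-bit images):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-size images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```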
Biologically inspired deep residual networks (IAESIJAI)
Many difficult computer vision problems have been effectively tackled by deep neural networks. Moreover, traditional residual neural networks (ResNet) were found to capture features with high generalizability, making them a cutting-edge class of convolutional neural network (CNN). This research introduces a biologically inspired deep residual neural network that uses hexagonal convolutions along the skip connection. The effectiveness of several ResNet variants using square and hexagonal convolution is assessed under competitive training techniques. Using hex-convolution on the skip connection, we designed a family of ResNet architectures, the hexagonal residual neural network (HexResNet), which achieves the highest testing accuracies of 94.02% and 55.71% on Canadian Institute For Advanced Research (CIFAR)-10 and TinyImageNet, respectively. We demonstrate that the suggested method improves the baseline image classification accuracy of vanilla ResNet architectures on the CIFAR-10 dataset, and a similar effect was seen on the TinyImageNet dataset. For TinyImageNet and CIFAR-10, we observed average increases of 1.46% and 0.48% in baseline Top-1 accuracy, respectively. The generalized performance of these improvements was reported for the proposed bioinspired deep residual networks; this is an area that might be explored more extensively in the future to enhance the discriminative power of image classification systems.
This is a paper I wrote as part of my seminar "Inverse Problems in Computer Vision" while pursuing my M.Sc Medical Engineering at FAU, Erlangen, Germany.
The paper details a state-of-the-art method used for Single Image Super Resolution using Deep Convolutional Networks and the possible extensions to the original approach by considering compression and noise artifacts.
Single Image Super Resolution using Interpolation and Discrete Wavelet Transform (ijtsrd)
An interpolation-based method, such as bilinear, bicubic, or nearest-neighbor interpolation, is regarded as a simple way to increase the spatial resolution of an LR image. It uses an interpolation kernel to predict the missing pixel values, which fails to approximate the underlying image structure and leads to blurred edges. In this work we propose a super-resolution technique based on the sparse-representation property of the wavelet transform; it belongs to the category of interpolative methods. Simulation results show that the proposed wavelet-based interpolation method outperforms the other existing methods tested for single image super resolution. Compared with adaptive sparse representation and self-learning (ASR-SL) [1], the proposed method improves PSNR by 7.7 dB for the test image Leaves, 12.92 dB for Mountain Lion, and 7.15 dB for Hat. Similarly, SSIM improves by 12% for Leaves, 29% for Mountain Lion, and 17% for Hat compared with [1]. Shalini Dubey | Prof. Pankaj Sahu | Prof. Surya Bazal, "Single Image Super Resolution using Interpolation & Discrete Wavelet Transform", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-6, October 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18340.pdf
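The wavelet view of upscaling can be sketched with a single-level Haar transform: treat the LR image as the approximation (LL) band and invert with the detail bands zeroed. This is only a toy stand-in for the paper's method; with zero details it reduces to flat-block expansion, which is exactly the missing-detail problem that sparse-representation SR methods try to solve by estimating the detail bands instead:

```python
import numpy as np

def ihaar2(ll, lh, hl, hh):
    """Inverse single-level 2-D orthonormal Haar transform."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

def dwt_upscale(lr):
    """2x upscale: use the LR image as the LL band, zero the detail bands.

    The factor 2 matches the orthonormal Haar normalization, so a constant
    LR pixel maps back to a 2x2 block of the same value.
    """
    lr = lr.astype(np.float64)
    z = np.zeros_like(lr)
    return ihaar2(2.0 * lr, z, z, z)
```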
This document contains summaries of several academic papers. The papers discuss topics related to computer vision, image processing, and machine learning including monocular depth estimation, subspace clustering, person re-identification, image set coding, 3D shape matching, single-image super-resolution, video synopsis, frame rate upconversion, in-loop filtering in video coding, microscopy image denoising, visual tracking, stereo matching, and brain image segmentation. Contact information is provided for TSYS Academic Projects in Adyar, India.
Pixel Recursive Super Resolution
Ryan Dahl, Mohammad Norouzi & Jonathon Shlens
Google Brain
Abstract
We present a pixel recursive super resolution model that synthesizes realistic details into images while enhancing their resolution. A low resolution image may correspond to multiple plausible high resolution images; thus modeling the super resolution process with a pixel-independent conditional model often results in averaging different details, hence blurry edges. By contrast, our model is able to represent a multimodal conditional distribution by properly modeling the statistical dependencies among the high resolution image pixels, conditioned on a low resolution input. We employ a PixelCNN architecture to define a strong prior over natural images and jointly optimize this prior with a deep conditioning convolutional network. Human evaluations indicate that samples from our proposed model look…
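The abstract's central point, that a pixel-independent model averages the plausible explanations and blurs edges, can be seen in a tiny NumPy example with two equally plausible sharp edges:

```python
import numpy as np

# Two equally plausible sharp HR explanations of the same LR input:
# a step edge at column 3, or at column 5.
edge_a = (np.arange(8) >= 3).astype(float)
edge_b = (np.arange(8) >= 5).astype(float)

# A pixel-independent (L2-trained) model regresses to the per-pixel mean,
# which contains intermediate 0.5 values: a blurred edge.
mean_img = (edge_a + edge_b) / 2.0

# Sampling from a multimodal model instead returns one sharp mode.
sample = edge_a
```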
To Keep Up with Recent Research Trends: Focusing on Deep Learning (Hiroshi Fukui)
This document summarizes key developments in deep learning for object detection from 2012 onwards. It begins with a timeline showing that 2012 was a turning point, as deep learning achieved record-breaking results in image classification. The document then provides overviews of 250+ contributions relating to object detection frameworks, fundamental problems addressed, evaluation benchmarks and metrics, and state-of-the-art performance. Promising future research directions are also identified.
This document describes a technique called Deep Shading that uses convolutional neural networks to perform screen-space shading at interactive rates. The network takes in attributes from deferred shading buffers like position, normals, reflectance as input and outputs RGB colors. It is trained on example images rendered with path tracing to learn complex shading effects like ambient occlusion, indirect lighting, subsurface scattering, depth of field, motion blur, and anti-aliasing. Deep Shading can achieve quality comparable to hand-written shaders but avoids the programming effort, and generalizes better by learning from example data rather than relying on human assumptions.
Image reconstruction through compressive sampling matching pursuit and curvel... (IJECEIAES)
An interesting area of research is image reconstruction, which uses algorithms and techniques to transform a degraded image into a good one. The quality of the reconstructed image plays a vital role in the field of image processing. Compressive Sampling is an innovative and rapidly growing method for reconstructing signals. It is extensively used in image reconstruction. The literature uses a variety of matching pursuits for image reconstruction. In this paper, we propose a modified method named compressive sampling matching pursuit (CoSaMP) for image reconstruction that promises to sample sparse signals from far fewer observations than the signal’s dimension. The main advantage of CoSaMP is that it has an excellent theoretical guarantee for convergence. The proposed technique combines CoSaMP with curvelet transform for better reconstruction of image. Experiments are carried out to evaluate the proposed technique on different test images. The results indicate that qualitative and quantitative performance is better compared to existing methods.
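A generic CoSaMP iteration (identify candidate columns, merge support, least-squares estimate, prune, update residual) can be sketched in NumPy. This is a textbook-style sketch for recovering an s-sparse vector from y = Φx, not the paper's curvelet-domain implementation:

```python
import numpy as np

def cosamp(Phi, y, s, iters=20):
    """Minimal CoSaMP: recover an s-sparse x from y = Phi @ x."""
    n = Phi.shape[1]
    x = np.zeros(n)
    r = y.astype(np.float64).copy()
    for _ in range(iters):
        # 1) Identify the 2s columns most correlated with the residual.
        proxy = np.abs(Phi.T @ r)
        omega = np.argsort(proxy)[-2 * s:]
        # 2) Merge with the current support estimate.
        support = np.union1d(omega, np.flatnonzero(x)).astype(int)
        # 3) Least-squares estimate restricted to the merged support.
        sol, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        b = np.zeros(n)
        b[support] = sol
        # 4) Prune to the s largest entries.
        x = np.zeros(n)
        keep = np.argsort(np.abs(b))[-s:]
        x[keep] = b[keep]
        # 5) Update the residual; stop on convergence.
        r = y - Phi @ x
        if np.linalg.norm(r) < 1e-10:
            break
    return x
```

The convergence guarantee mentioned in the abstract is what makes this loop attractive: under standard restricted-isometry conditions on Φ, the estimate provably approaches the true sparse signal.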
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec... (IRJET Journal)
This document presents a CNN-MRF based system for counting people in dense crowd images. The system divides dense crowd images into overlapping patches. A CNN is used to extract features from each patch and regress the patch count. Since patches overlap, neighboring patch counts are strongly correlated. An MRF smooths the patch counts using this correlation to obtain a more accurate overall count. The system was developed to address challenges in accurately locating, sizing, and counting people in dense crowds via detection.
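The smoothing step can be imitated with a simple quadratic pairwise prior: each patch count is iteratively blended with the mean of its four neighbours. A hedged NumPy sketch; the blend weight `lam` and Jacobi-style iteration are illustrative choices standing in for the paper's MRF, not its exact model:

```python
import numpy as np

def smooth_counts(counts, lam=0.5, iters=50):
    """Smooth a 2-D grid of noisy patch counts.

    Each update blends the observed count (data term) with the mean of
    its 4-neighbours (smoothness term), exploiting the correlation
    between overlapping patches.
    """
    c = counts.astype(np.float64).copy()
    for _ in range(iters):
        padded = np.pad(c, 1, mode="edge")
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        c = (counts + lam * neigh) / (1.0 + lam)
    return c
```

An isolated spike in a patch's regressed count (a likely CNN error, since neighbours overlap it) is pulled toward its neighbours, while consistent counts are left nearly unchanged.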
This document proposes using deep neural networks for change detection in synthetic aperture radar (SAR) images. It summarizes the classic difference image method for SAR change detection and its challenges. The proposed method uses a restricted Boltzmann machine deep neural network to directly classify pixels in the original SAR images as changed or unchanged, without generating a difference image. It involves pre-classifying the images using fuzzy c-means clustering to label pixels for training the neural network. The document describes establishing and training the deep neural network to learn image features and classify pixels.
Binarization of Degraded Text documents and Palm Leaf Manuscripts (IRJET Journal)
This document proposes a technique for binarizing degraded text documents and palm leaf manuscripts. It involves taking the average pixel value of the image as a threshold to distinguish foreground from background. The algorithm first computes the average value of the original image and uses it to set pixels above the threshold to black, removing background. It then computes the average of the remaining image, excluding black pixels, and uses that value as a new threshold to set remaining pixels above it to white, extracting the foreground. The technique is tested on old documents and manuscripts, showing improvement over existing methods based on metrics like peak signal-to-noise ratio. While effective for documents, it needs improvement for palm leaf manuscripts with non-uniform degradation.
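One plausible reading of the two-pass scheme, in NumPy. The bright-background assumption and the ink-black/paper-white output convention are interpretations for illustration, not taken from the paper:

```python
import numpy as np

def binarize_two_pass(img):
    """Two-pass average-threshold binarization (one interpretation).

    Pass 1: the global mean separates bright paper from the rest.
    Pass 2: the mean of the remaining (darker) pixels gives a second
    threshold that separates true ink from mid-grey degradation.
    """
    img = img.astype(np.float64)
    t1 = img.mean()                  # pass 1: global average
    remaining = img <= t1            # pixels > t1 treated as background
    t2 = img[remaining].mean() if remaining.any() else t1
    return np.where(img <= t2, 0, 255).astype(np.uint8)  # ink=0, paper=255
```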
Survey on Single image Super Resolution Techniques (IOSR Journals)
Super-resolution is the process of recovering a high-resolution image from multiple low-resolution images of the same scene. The key objective of super-resolution (SR) imaging is to reconstruct a higher-resolution image from a set of images acquired of the same scene, denoted 'low-resolution' images, to overcome the limitations and/or ill-posed conditions of the image acquisition process and facilitate better content visualization and scene recognition. In this paper, we provide a comprehensive review of existing super-resolution techniques and highlight future research challenges. This includes the formulation of an observation model and coverage of the dominant algorithm, iterative back projection. We critique these methods and identify areas that promise performance improvements. Future directions for super-resolution algorithms are discussed, and finally the results of available methods are given. Keywords: Super-resolution, POCS, IBP, Canny Edge Detection
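The back-projection loop at the heart of IBP can be sketched with a deliberately simple observation model; real implementations use PSF-derived downsampling and back-projection kernels rather than the box/nearest pair below:

```python
import numpy as np

def downsample(img, f):
    """Box-average downsampling by factor f (toy observation model)."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def upsample(img, f):
    """Nearest-neighbour upsampling by factor f (back-projection kernel)."""
    return np.repeat(np.repeat(img, f, axis=0), f, axis=1)

def ibp(lr, f=2, iters=30, step=1.0):
    """Iterative back projection: refine an HR estimate so that its
    simulated LR version matches the observed LR image."""
    hr = upsample(lr, f)                  # initial HR guess
    for _ in range(iters):
        err = lr - downsample(hr, f)      # residual in the LR domain
        hr = hr + step * upsample(err, f) # back-project the residual
    return hr
```

At convergence the HR estimate is consistent with the observation: downsampling it reproduces the LR input exactly.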
An Experimental Approach For Evaluating Superpixel's Consistency Over 2D Gaus... (CSCJournals)
This article proposes a rigorous method to assess the consistency of superpixels across different superpixel segmentation algorithms. The proposed method extracts the superpixels that remain unchanged over certain levels of noise using the Jaccard Similarity Coefficient (JSC). Technically, we developed a Jaccard similarity measure for superpixel segmentation algorithms to compare sets of superpixels (original and noisy). The algorithm runs the superpixel segmentation algorithm on the original images and saves their boundary masks and labels. It then applies varying degrees of noise to the images and produces the superpixel results, repeating the process for four levels with an increased noise value at each iteration. We chose 2D Gaussian blur, impulse noise, and a combination of both to corrupt the images. The proposed algorithm generates similarity indices between the original and noisy superpixels using Jaccard similarity. To be categorized as consistent, a superpixel's similarity index must meet a predefined JSC threshold. The superpixel consistency of four different superpixel segmentation algorithms, bilateral geodesic distance (BGD), flooding-based superpixels generation (FBS), superpixels via geodesic distance (GDS), and Turbopixel (TP), is evaluated. The experimental results demonstrated that no single algorithm yielded an optimal outcome, and all failed to maintain consistent superpixels at every level of noise. Conclusively, more robust superpixel algorithms must be developed to solve such problems effectively.
Training Computer Vision Neural Networks in Video Games (Anatol Alizar)
The document presents experiments assessing the effectiveness of training computer vision models using synthetic RGB images extracted from video games. The authors:
1) Collected over 60,000 synthetic samples from a video game with conditions similar to real-world datasets like CamVid and Cityscapes, providing ground-truth labels, depth, and other data.
2) Trained convolutional networks on the synthetic data for tasks like image segmentation and depth estimation, finding the networks achieved similar performance to those trained on real data.
3) Further improved performance by fine-tuning networks pre-trained on synthetic data on real data, outperforming networks pre-trained only on real data.
A Review on Deformation Measurement from Speckle Patterns using Digital Image... (IRJET Journal)
This document reviews digital image correlation (DIC) for deformation measurement using speckle patterns. DIC is a non-contact optical method that uses digital images of a speckle pattern on a surface before and after deformation. By comparing the speckle patterns in the images, DIC can determine displacement and strain fields with high accuracy. The document discusses speckle pattern types, the DIC process, related works that have improved DIC methods, and applications of DIC such as for high-temperature testing. DIC provides full-field measurements and greater accuracy compared to conventional contact methods.
Efficient resampling features and convolution neural network model for image ... (IJECEIAES)
The widespread use of picture-enhancing and manipulation tools has made it easy to alter multimedia data, including digital images. Such manipulations undermine the truthfulness and legality of images, cause misapprehension, and can threaten social security. Image forensics detects whether an image has been manipulated by attacks such as splicing and copy-move. This paper presents an effective tampering detection technique using resampling features and a convolutional neural network (CNN). In the proposed range spatial filtering (RSF)-CNN model, the image is first divided into consistent patches during preprocessing. Within every patch, resampling features are extracted using an affine transformation and the Laplacian operator, and the extracted features are then aggregated by the CNN to create descriptors. A wide-ranging analysis assesses the tampering detection and tampered-region segmentation accuracy of the proposed RSF-CNN procedure under various falsifications and post-processing attacks, including joint photographic experts group (JPEG) compression, scaling, rotation, noise addition, and multiple combined manipulations. The results show that RSF-CNN-based tampering detection is considerably more accurate than existing tampering detection methodologies.
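The Laplacian step of the feature extraction can be illustrated with a direct 2-D convolution; the affine transformation and the periodicity analysis of the response, which carry the actual resampling evidence in the paper, are omitted from this sketch:

```python
import numpy as np

# Standard 4-neighbour discrete Laplacian kernel.
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def laplacian_response(patch):
    """Valid-mode 2-D convolution of a patch with the Laplacian kernel.

    Interpolation (scaling/rotation) leaves periodic correlations in
    this response; that trace is the resampling feature.
    """
    h, w = patch.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(patch[i:i + 3, j:j + 3] * LAPLACIAN)
    return out
```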
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in edge detection tasks, but their large number of parameters often leads to high memory and energy costs for implementation on lightweight devices. In this paper, we propose a new architecture, called Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge detection while significantly reducing complexity. Experimental results on the BSDS500 and NYUDv2 datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only 500k parameters, without relying on pre-trained weights.
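The parameter savings from depthwise separable convolutions, one of EDGE-Net's building blocks, follow from simple counting: a standard k×k layer costs k·k·C_in·C_out weights, while a depthwise k×k stage plus a 1×1 pointwise stage costs k·k·C_in + C_in·C_out. A quick sketch (biases ignored; the example layer sizes are arbitrary):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) + 1x1 pointwise."""
    return k * k * c_in + c_in * c_out

# e.g. a 3x3 layer taking 64 channels to 128:
print(conv_params(3, 64, 128))       # 73728
print(separable_params(3, 64, 128))  # 8768, roughly 8.4x fewer weights
```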
1) The document discusses image segmentation in satellite images using optimal texture measures. It evaluates four texture measures from the gray-level co-occurrence matrix (GLCM) with six different window sizes.
2) Principal Component Analysis (PCA) is applied to reduce the texture measures to a manageable size while retaining discrimination information.
3) The methodology consists of selecting an optimal window size and optimal texture measure. A 7x7 window size provided superior performance for classification. PCA is used to analyze correlations between texture measures and window sizes.
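The GLCM and texture measures derived from it can be computed directly. A small NumPy sketch for non-negative pixel offsets; the summary above does not name the four measures the study evaluated, so four common ones are shown:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy >= 0).

    img must hold integer gray levels in [0, levels).
    """
    P = np.zeros((levels, levels))
    h, w = img.shape
    for i in range(h - dy):
        for j in range(w - dx):
            P[img[i, j], img[i + dy, j + dx]] += 1
    return P / P.sum()

def glcm_features(P):
    """Four common GLCM texture measures."""
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)
    energy = np.sum(P ** 2)
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))
    entropy = -np.sum(P[P > 0] * np.log2(P[P > 0]))
    return contrast, energy, homogeneity, entropy
```

In the windowed setting described above, these features would be computed per sliding window (e.g. 7x7) and fed, after PCA, to the classifier.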
Learning Based Single Frame Image Super-resolution Using Fast Discrete Curvel... (CSCJournals)
High-resolution (HR) images play a vital role in all imaging applications as they offer more details. The images captured by the camera system are of degraded quality due to the imaging system and are low-resolution (LR) images. Image super-resolution (SR) is a process, where HR image is obtained from combining one or multiple LR images of same scene. In this paper, learning based single frame image super-resolution technique is proposed by using Fast Discrete Curvelet Transform (FDCT) coefficients. FDCT is an extension to Cartesian wavelets having anisotropic scaling with many directions and positions, which forms tight wedges. Such wedges allow FDCT to capture the smooth curves and fine edges at multiresolution level. The finer scale curvelet coefficients of LR image are learnt locally from a set of high-resolution training images. The super-resolved image is reconstructed by inverse Fast Discrete Curvelet Transform (IFDCT). This technique represents fine edges of reconstructed HR image by extrapolating the FDCT coefficients from the high-resolution training images. Experimentation based results show appropriate improvements in MSE and PSNR.
This paper proposes a new algorithm for single-image super-resolution that exploits image compressibility in the wavelet domain using compressed sensing theory. The algorithm incorporates the downsampling low-pass filter into the measurement matrix to decrease coherence between the wavelet basis and sampling basis, allowing use of wavelets. It then uses a greedy algorithm to solve for sparse wavelet coefficients representing the high-resolution image. Results show improved performance over existing super-resolution approaches without requiring training data.
INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...adeij1
Segmentation of objects with various sizes is relatively less explored in medical imaging, and has been very challenging in computer vision tasks in general. We hypothesize that the receptive field of a deep model corresponds closely to the size of object to be segmented, which could critically influence the segmentation accuracy of objects with varied sizes. In this study, we employed “AmygNet”, a dual-branch fully convolutional neural network (FCNN) with two different sizes of receptive fields, to investigate the effects of receptive field on segmenting four major subnuclei of bilateral amygdalae. The experiment was conducted on 14 subjects, which are all 3-dimensional MRI human brain images. Since the scale of different subnuclear groups are different, by investigating the accuracy of each subnuclear group while using receptive fields of various sizes, we may find which kind of receptive field is suitable for object of which scale respectively. In the given condition, AmygNet with multiple receptive fields presents great potential in segmenting objects of different sizes.
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
This document summarizes key developments in deep learning for object detection from 2012 onwards. It begins with a timeline showing that 2012 was a turning point, as deep learning achieved record-breaking results in image classification. The document then provides overviews of 250+ contributions relating to object detection frameworks, fundamental problems addressed, evaluation benchmarks and metrics, and state-of-the-art performance. Promising future research directions are also identified.
This document describes a technique called Deep Shading that uses convolutional neural networks to perform screen-space shading at interactive rates. The network takes in attributes from deferred shading buffers like position, normals, reflectance as input and outputs RGB colors. It is trained on example images rendered with path tracing to learn complex shading effects like ambient occlusion, indirect lighting, subsurface scattering, depth of field, motion blur, and anti-aliasing. Deep Shading can achieve quality comparable to hand-written shaders but avoids the programming effort, and generalizes better by learning from example data rather than relying on human assumptions.
Image reconstruction through compressive sampling matching pursuit and curvel...IJECEIAES
An interesting area of research is image reconstruction, which uses algorithms and techniques to transform a degraded image into a good one. The quality of the reconstructed image plays a vital role in the field of image processing. Compressive Sampling is an innovative and rapidly growing method for reconstructing signals. It is extensively used in image reconstruction. The literature uses a variety of matching pursuits for image reconstruction. In this paper, we propose a modified method named compressive sampling matching pursuit (CoSaMP) for image reconstruction that promises to sample sparse signals from far fewer observations than the signal’s dimension. The main advantage of CoSaMP is that it has an excellent theoretical guarantee for convergence. The proposed technique combines CoSaMP with curvelet transform for better reconstruction of image. Experiments are carried out to evaluate the proposed technique on different test images. The results indicate that qualitative and quantitative performance is better compared to existing methods.
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...IRJET Journal
This document presents a CNN-MRF based system for counting people in dense crowd images. The system divides dense crowd images into overlapping patches. A CNN is used to extract features from each patch and regress the patch count. Since patches overlap, neighboring patch counts are strongly correlated. An MRF smooths the patch counts using this correlation to obtain a more accurate overall count. The system was developed to address challenges in accurately locating, sizing, and counting people in dense crowds via detection.
This document proposes using deep neural networks for change detection in synthetic aperture radar (SAR) images. It summarizes the classic difference image method for SAR change detection and its challenges. The proposed method uses a restricted Boltzmann machine deep neural network to directly classify pixels in the original SAR images as changed or unchanged, without generating a difference image. It involves pre-classifying the images using fuzzy c-means clustering to label pixels for training the neural network. The document describes establishing and training the deep neural network to learn image features and classify pixels.
Binarization of Degraded Text documents and Palm Leaf ManuscriptsIRJET Journal
This document proposes a technique for binarizing degraded text documents and palm leaf manuscripts. It involves taking the average pixel value of the image as a threshold to distinguish foreground from background. The algorithm first computes the average value of the original image and uses it to set pixels above the threshold to black, removing background. It then computes the average of the remaining image, excluding black pixels, and uses that value as a new threshold to set remaining pixels above it to white, extracting the foreground. The technique is tested on old documents and manuscripts, showing improvement over existing methods based on metrics like peak signal-to-noise ratio. While effective for documents, it needs improvement for palm leaf manuscripts with non-uniform degradation.
Survey on Single image Super Resolution TechniquesIOSR Journals
Super-resolution is the process of recovering a high-resolution image from multiple lowresolutionimages
of the same scene. The key objective of super-resolution (SR) imaging is to reconstruct a
higher-resolution image based on a set of images, acquired from the same scene and denoted as ‘lowresolution’
images, to overcome the limitation and/or ill-posed conditions of the image acquisition process for
facilitating better content visualization and scene recognition. In this paper, we provide a comprehensive review
of existing super-resolution techniques and highlight the future research challenges. This includes the
formulation of an observation model and coverage of the dominant algorithm – Iterative back projection.We
critique these methods and identify areas which promise performance improvements. In this paper, future
directions for super-resolution algorithms are discussed. Finally results of available methods are given.
Survey on Single image Super Resolution TechniquesIOSR Journals
Abstract:Super-resolution is the process of recovering a high-resolution image from multiple low-resolutionimages of the same scene. The key objective of super-resolution (SR) imaging is to reconstruct a higher-resolution image based on a set of images, acquired from the same scene and denoted as ‘low-resolution’ images, to overcome the limitation and/or ill-posed conditions of the image acquisition process for facilitating better content visualization and scene recognition. In this paper, we provide a comprehensive review of existing super-resolution techniques and highlight the future research challenges. This includes the formulation of an observation model and coverage of the dominant algorithm – Iterative back projection.We critique these methods and identify areas which promise performance improvements. In this paper, future directions for super-resolution algorithms are discussed. Finally results of available methods are given. Keywords: Super-resolution, POCS, IBP, Canny Edge Detection
An Experimental Approach For Evaluating Superpixel's Consistency Over 2D Gaus...CSCJournals
This article proposes a rigorous method to assess the consistency of superpixels for different superpixel segmentation algorithms. The proposed method extracts the superpixels that remain unchanged over certain levels of noise by adopting the Jaccard Similarity Coefficient (JSC). Technically, we developed a measure of Jaccard similarity for superpixel segmentation algorithms to compare the similarity between sets of superpixels (original and noisy). The algorithm calls the superpixel segmentation algorithm to generate the superpixel results of the original images and saves their boundary masks and labels. It then applies varying degrees of noise to the images and produces the superpixels results, and the process is repeated for four levels with increased noise value at each iteration. We chose 2D Gaussian Blur, Impulse Noise and a combination of both to corrupt the images. The proposed algorithm generates similarity indices of superpixels (original and noisy) using Jaccard Similarity (JS). To be categorized as a consistent superpixel, the similarity index must meet a predefined coefficient threshold (?) of JSC. The superpixels consistency of four different superpixel segmentation algorithms including Bilateral geodesic distance (BGD), Flooding based superpixels generation (FBS), superpixels via geodesic distance (GDS), and Turbopixel (TP) are evaluated. Precisely, the experimental results demonstrated that no single algorithm was able to yield an optimal outcome and failed to maintain consistent superpixels at each level of noise. Conclusively, more robust superpixel algorithms must be developed to solve such problems effectively.
Обучение нейросетей компьютерного зрения в видеоиграхAnatol Alizar
The document presents experiments assessing the effectiveness of training computer vision models using synthetic RGB images extracted from video games. The authors:
1) Collected over 60,000 synthetic samples from a video game with similar conditions to real-world datasets like CamVid and Cityscapes, providing groundtruth labels, depth, and other data.
2) Trained convolutional networks on the synthetic data for tasks like image segmentation and depth estimation, finding the networks achieved similar performance to those trained on real data.
3) Further improved performance by fine-tuning networks pre-trained on synthetic data on real data, outperforming networks pre-trained only on real data.
A Review on Deformation Measurement from Speckle Patterns using Digital Image...IRJET Journal
This document reviews digital image correlation (DIC) for deformation measurement using speckle patterns. DIC is a non-contact optical method that uses digital images of a speckle pattern on a surface before and after deformation. By comparing the speckle patterns in the images, DIC can determine displacement and strain fields with high accuracy. The document discusses speckle pattern types, the DIC process, related works that have improved DIC methods, and applications of DIC such as for high-temperature testing. DIC provides full-field measurements and greater accuracy compared to conventional contact methods.
Efficient resampling features and convolution neural network model for image ...IJEECSIAES
The extended utilization of picture-enhancing or manipulating tools has led to ease of manipulating multimedia data which includes digital images. These manipulations will disturb the truthfulness and lawfulness of images, resulting in misapprehension, and might disturb social security. The image forensic approach has been employed for detecting whether or not an image has been manipulated with the usage of positive attacks which includes splicing, and copy-move. This paper provides a competent tampering detection technique using resampling features and convolution neural network (CNN). In this model range spatial filtering (RSF)-CNN, throughout preprocessing the image is divided into consistent patches. Then, within every patch, the resampling features are extracted by utilizing affine transformation and the Laplacian operator. Then, the extracted features are accumulated for creating descriptors by using CNN. A wide-ranging analysis is performed for assessing tampering detection and tampered region segmentation accuracies of proposed RSF-CNN based tampering detection procedures considering various falsifications and post-processing attacks which include joint photographic expert group (JPEG) compression, scaling, rotations, noise additions, and more than one manipulation. From the achieved results, it can be visible the RSF-CNN primarily based tampering detection with adequately higher accurateness than existing tampering detection methodologies.
Efficient resampling features and convolution neural network model for image ...nooriasukmaningtyas
The extended utilization of picture-enhancing or manipulating tools has led to ease of manipulating multimedia data which includes digital images. These manipulations will disturb the truthfulness and lawfulness of images, resulting in misapprehension, and might disturb social security. The image forensic approach has been employed for detecting whether or not an image has been manipulated with the usage of positive attacks which includes splicing, and copy-move. This paper provides a competent tampering detection technique using resampling features and convolution neural network (CNN). In this model range spatial filtering (RSF)-CNN, throughout preprocessing the image is divided into consistent patches. Then, within every patch, the resampling features are extracted by utilizing affine transformation and the Laplacian operator. Then, the extracted features are accumulated for creating descriptors by using CNN. A wide-ranging analysis is performed for assessing tampering detection and tampered region segmentation accuracies of proposed RSF-CNN based tampering detection procedures considering various falsifications and post-processing attacks which include joint photographic expert group (JPEG) compression, scaling, rotations, noise additions, and more than one manipulation. From the achieved results, it can be visible the RSF-CNN primarily based tampering detection with adequately higher accurateness than existing tampering detection methodologies.
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
1) The document discusses image segmentation in satellite images using optimal texture measures. It evaluates four texture measures from the gray-level co-occurrence matrix (GLCM) with six different window sizes.
2) Principal Component Analysis (PCA) is applied to reduce the texture measures to a manageable size while retaining discrimination information.
3) The methodology consists of selecting an optimal window size and optimal texture measure. A 7x7 window size provided superior performance for classification. PCA is used to analyze correlations between texture measures and window sizes.
Learning Based Single Frame Image Super-resolution Using Fast Discrete Curvel...CSCJournals
High-resolution (HR) images play a vital role in all imaging applications as they offer more details. The images captured by the camera system are of degraded quality due to the imaging system and are low-resolution (LR) images. Image super-resolution (SR) is a process, where HR image is obtained from combining one or multiple LR images of same scene. In this paper, learning based single frame image super-resolution technique is proposed by using Fast Discrete Curvelet Transform (FDCT) coefficients. FDCT is an extension to Cartesian wavelets having anisotropic scaling with many directions and positions, which forms tight wedges. Such wedges allow FDCT to capture the smooth curves and fine edges at multiresolution level. The finer scale curvelet coefficients of LR image are learnt locally from a set of high-resolution training images. The super-resolved image is reconstructed by inverse Fast Discrete Curvelet Transform (IFDCT). This technique represents fine edges of reconstructed HR image by extrapolating the FDCT coefficients from the high-resolution training images. Experimentation based results show appropriate improvements in MSE and PSNR.
This paper proposes a new algorithm for single-image super-resolution that exploits image compressibility in the wavelet domain using compressed sensing theory. The algorithm incorporates the downsampling low-pass filter into the measurement matrix to decrease coherence between the wavelet basis and sampling basis, allowing use of wavelets. It then uses a greedy algorithm to solve for sparse wavelet coefficients representing the high-resolution image. Results show improved performance over existing super-resolution approaches without requiring training data.
INVESTIGATIONS OF THE INFLUENCES OF A CNN’S RECEPTIVE FIELD ON SEGMENTATION O...adeij1
Segmentation of objects with various sizes is relatively less explored in medical imaging, and has been very challenging in computer vision tasks in general. We hypothesize that the receptive field of a deep model corresponds closely to the size of object to be segmented, which could critically influence the segmentation accuracy of objects with varied sizes. In this study, we employed “AmygNet”, a dual-branch fully convolutional neural network (FCNN) with two different sizes of receptive fields, to investigate the effects of receptive field on segmenting four major subnuclei of bilateral amygdalae. The experiment was conducted on 14 subjects, which are all 3-dimensional MRI human brain images. Since the scale of different subnuclear groups are different, by investigating the accuracy of each subnuclear group while using receptive fields of various sizes, we may find which kind of receptive field is suitable for object of which scale respectively. In the given condition, AmygNet with multiple receptive fields presents great potential in segmenting objects of different sizes.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMHODECEDSIET
Time Division Multiplexing (TDM) is a method of transmitting multiple signals over a single communication channel by dividing the signal into many segments, each having a very short duration of time. These time slots are then allocated to different data streams, allowing multiple signals to share the same transmission medium efficiently. TDM is widely used in telecommunications and data communication systems.
### How TDM Works
1. **Time Slots Allocation**: The core principle of TDM is to assign distinct time slots to each signal. During each time slot, the respective signal is transmitted, and then the process repeats cyclically. For example, if there are four signals to be transmitted, the TDM cycle will divide time into four slots, each assigned to one signal.
2. **Synchronization**: Synchronization is crucial in TDM systems to ensure that the signals are correctly aligned with their respective time slots. Both the transmitter and receiver must be synchronized to avoid any overlap or loss of data. This synchronization is typically maintained by a clock signal that ensures time slots are accurately aligned.
3. **Frame Structure**: TDM data is organized into frames, where each frame consists of a set of time slots. Each frame is repeated at regular intervals, ensuring continuous transmission of data streams. The frame structure helps in managing the data streams and maintaining the synchronization between the transmitter and receiver.
4. **Multiplexer and Demultiplexer**: At the transmitting end, a multiplexer combines multiple input signals into a single composite signal by assigning each signal to a specific time slot. At the receiving end, a demultiplexer separates the composite signal back into individual signals based on their respective time slots.
### Types of TDM
1. **Synchronous TDM**: In synchronous TDM, time slots are pre-assigned to each signal, regardless of whether the signal has data to transmit or not. This can lead to inefficiencies if some time slots remain empty due to the absence of data.
2. **Asynchronous TDM (or Statistical TDM)**: Asynchronous TDM addresses the inefficiencies of synchronous TDM by allocating time slots dynamically based on the presence of data. Time slots are assigned only when there is data to transmit, which optimizes the use of the communication channel.
### Applications of TDM
- **Telecommunications**: TDM is extensively used in telecommunication systems, such as in T1 and E1 lines, where multiple telephone calls are transmitted over a single line by assigning each call to a specific time slot.
- **Digital Audio and Video Broadcasting**: TDM is used in broadcasting systems to transmit multiple audio or video streams over a single channel, ensuring efficient use of bandwidth.
- **Computer Networks**: TDM is used in network protocols and systems to manage the transmission of data from multiple sources over a single network medium.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM all
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Xintao Wang¹, Ke Yu¹, Shixiang Wu², Jinjin Gu³, Yihao Liu⁴, Chao Dong², Chen Change Loy⁵, Yu Qiao², Xiaoou Tang¹

¹ CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong
² SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
³ The Chinese University of Hong Kong, Shenzhen
⁴ University of Chinese Academy of Sciences
⁵ Nanyang Technological University, Singapore

{wx016,yk017,xtang}@ie.cuhk.edu.hk, {sx.wu,chao.dong,yu.qiao}@siat.ac.cn
liuyihao14@mails.ucas.ac.cn, 115010148@link.cuhk.edu.cn, ccloy@ntu.edu.sg
Abstract. The Super-Resolution Generative Adversarial Network (SR-
GAN) [1] is a seminal work that is capable of generating realistic textures
during single image super-resolution. However, the hallucinated details
are often accompanied by unpleasant artifacts. To further enhance the
visual quality, we thoroughly study three key components of SRGAN –
network architecture, adversarial loss and perceptual loss, and improve
each of them to derive an Enhanced SRGAN (ESRGAN). In particu-
lar, we introduce the Residual-in-Residual Dense Block (RRDB) without
batch normalization as the basic network building unit. Moreover, we
borrow the idea from relativistic GAN [2] to let the discriminator predict
relative realness instead of the absolute value. Finally, we improve the
perceptual loss by using the features before activation, which could pro-
vide stronger supervision for brightness consistency and texture recovery.
Benefiting from these improvements, the proposed ESRGAN achieves
consistently better visual quality with more realistic and natural textures
than SRGAN and won the first place in the PIRM2018-SR Challenge¹ [3].
The code is available at https://github.com/xinntao/ESRGAN.
1 Introduction
Single image super-resolution (SISR), as a fundamental low-level vision prob-
lem, has attracted increasing attention in the research community and AI com-
panies. SISR aims at recovering a high-resolution (HR) image from a single
low-resolution (LR) one. Since the pioneer work of SRCNN proposed by Dong
et al. [4], deep convolution neural network (CNN) approaches have brought pros-
perous development. Various network architecture designs and training strategies
have continuously improved the SR performance, especially the Peak Signal-to-
Noise Ratio (PSNR) value [5,6,7,1,8,9,10,11,12]. However, these PSNR-oriented
approaches tend to output over-smoothed results without sufficient high-frequency
details, since the PSNR metric fundamentally disagrees with the subjective eval-
uation of human observers [1].
¹ We won the first place in region 3 and got the best perceptual index.
arXiv:1809.00219v2 [cs.CV] 17 Sep 2018
Fig. 1: The ×4 super-resolution results of SRGAN², the proposed ESRGAN, and the ground truth. ESRGAN outperforms SRGAN in sharpness and details.
Several perceptual-driven methods have been proposed to improve the visual
quality of SR results. For instance, perceptual loss [13,14] is proposed to opti-
mize super-resolution model in a feature space instead of pixel space. Generative
adversarial network [15] is introduced to SR by [1,16] to encourage the network
to favor solutions that look more like natural images. The semantic image prior
is further incorporated to improve recovered texture details [17]. One of the
milestones on the way to visually pleasing results is SRGAN [1]. The basic
model is built with residual blocks [18] and optimized using perceptual loss in a
GAN framework. With all these techniques, SRGAN significantly improves the
overall visual quality of reconstruction over PSNR-oriented methods.
However, there still exists a clear gap between SRGAN results and the
ground-truth (GT) images, as shown in Fig. 1. In this study, we revisit the
key components of SRGAN and improve the model in three aspects. First, we
improve the network structure by introducing the Residual-in-Residual Dense
Block (RRDB), which is of higher capacity and easier to train. We also remove
Batch Normalization (BN) [19] layers as in [20] and use residual scaling [21,20]
and smaller initialization to facilitate training a very deep network. Second, we
improve the discriminator using Relativistic average GAN (RaGAN) [2], which
learns to judge “whether one image is more realistic than the other” rather than
“whether one image is real or fake”. Our experiments show that this improvement
helps the generator recover more realistic texture details. Third, we propose an
improved perceptual loss by using the VGG features before activation instead of
after activation as in SRGAN. We empirically find that the adjusted perceptual
loss provides sharper edges and more visually pleasing results, as will be shown
² We use the released results of the original SRGAN [1] paper – https://twitter.app.box.com/s/lcue6vlrd01ljkdtdkhmfvk7vtjhetog.
Fig. 2: Perception-distortion plane on the PIRM self-validation dataset. We show the baselines of EDSR [20], RCAN [12] and EnhanceNet [16], and the submitted ESRGAN model. The blue dots are produced by image interpolation. (The original plot shows the Perceptual Index against RMSE across regions R1–R3 for EDSR, RCAN, EnhanceNet and ESRGAN; among the interpolated results, interp_1 attains PI 3.279 at RMSE 11.47 and interp_2 attains PI 2.567 at RMSE 12.45.)
in Sec. 4.4. Extensive experiments show that the enhanced SRGAN, termed ES-
RGAN, consistently outperforms state-of-the-art methods in both sharpness and
details (see Fig. 1 and Fig. 7).
We take a variant of ESRGAN to participate in the PIRM-SR Challenge [3].
This challenge is the first SR competition that evaluates the performance in a
perceptual-quality aware manner based on [22], where the authors claim that
distortion and perceptual quality are at odds with each other. The perceptual
quality is judged by the non-reference measures of Ma’s score [23] and NIQE [24],
i.e., perceptual index = (1/2)((10 − Ma) + NIQE). A lower perceptual index represents
a better perceptual quality.
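In code, the challenge metric reduces to a one-line helper. This is a trivial sketch of the formula above; computing `ma_score` and `niqe` themselves requires the respective reference implementations, which are not reproduced here:

```python
def perceptual_index(ma_score: float, niqe: float) -> float:
    """PI = 0.5 * ((10 - Ma) + NIQE); lower means better perceptual quality."""
    return 0.5 * ((10.0 - ma_score) + niqe)
```

Note that a higher Ma score and a lower NIQE both lower the index, so the two no-reference measures pull in the same direction.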
As shown in Fig. 2, the perception-distortion plane is divided into three
regions defined by thresholds on the Root-Mean-Square Error (RMSE), and the
algorithm that achieves the lowest perceptual index in each region becomes the
regional champion. We mainly focus on region 3 as we aim to bring the perceptual
quality to a new high. Thanks to the aforementioned improvements and some
other adjustments as discussed in Sec. 4.6, our proposed ESRGAN won the first
place in the PIRM-SR Challenge (region 3) with the best perceptual index.
In order to balance the visual quality and RMSE/PSNR, we further propose
the network interpolation strategy, which could continuously adjust the recon-
struction style and smoothness. Another alternative is image interpolation, which
directly interpolates images pixel by pixel. We employ this strategy to partici-
pate in region 1 and region 2. The network interpolation and image interpolation
strategies and their differences are discussed in Sec. 3.4.
2 Related Work
We focus on deep neural network approaches to solve the SR problem. As a
pioneer work, Dong et al. [4,25] propose SRCNN to learn the mapping from LR
to HR images in an end-to-end manner, achieving superior performance against
previous works. Later on, the field has witnessed a variety of network architec-
tures, such as a deeper network with residual learning [5], Laplacian pyramid
structure [6], residual blocks [1], recursive learning [7,8], densely connected net-
work [9], deep back projection [10] and residual dense network [11]. Specifically,
Lim et al. [20] propose EDSR model by removing unnecessary BN layers in
the residual block and expanding the model size, which achieves significant im-
provement. Zhang et al. [11] propose to use effective residual dense block in SR,
and they further explore a deeper network with channel attention [12], achiev-
ing the state-of-the-art PSNR performance. Besides supervised learning, other
methods like reinforcement learning [26] and unsupervised learning [27] are also
introduced to solve general image restoration problems.
Several methods have been proposed to stabilize training a very deep model.
For instance, residual path is developed to stabilize the training and improve the
performance [18,5,12]. Residual scaling is first employed by Szegedy et al. [21]
and also used in EDSR. For general deep networks, He et al. [28] propose a robust
initialization method for VGG-style networks without BN. To facilitate training
a deeper network, we develop a compact and effective residual-in-residual dense
block, which also helps to improve the perceptual quality.
Perceptual-driven approaches have also been proposed to improve the visual
quality of SR results. Based on the idea of being closer to perceptual similar-
ity [29,14], perceptual loss [13] is proposed to enhance the visual quality by min-
imizing the error in a feature space instead of pixel space. Contextual loss [30] is
developed to generate images with natural image statistics by using an objective
that focuses on the feature distribution rather than merely comparing the ap-
pearance. Ledig et al. [1] propose SRGAN model that uses perceptual loss and
adversarial loss to favor outputs residing on the manifold of natural images. Sajjadi et al. [16] develop a similar approach and further explore the local texture
matching loss. Based on these works, Wang et al. [17] propose spatial feature
transform to effectively incorporate semantic prior in an image and improve the
recovered textures.
Throughout the literature, photo-realism is usually attained by adversarial
training with GAN [15]. Recently, a number of works have focused on developing more effective GAN frameworks. WGAN [31] proposes to minimize a
reasonable and efficient approximation of Wasserstein distance and regularizes
discriminator by weight clipping. Other improved regularizations for the discriminator include gradient penalty [32] and spectral normalization [33]. Relativistic
discriminator [2] is developed not only to increase the probability that gener-
ated data are real, but also to simultaneously decrease the probability that real
data are real. In this work, we enhance SRGAN by employing a more effective
relativistic average GAN.
SR algorithms are typically evaluated by several widely used distortion mea-
sures, e.g., PSNR and SSIM. However, these metrics fundamentally disagree with
the subjective evaluation of human observers [1]. Non-reference measures are
used for perceptual quality evaluation, including Ma’s score [23] and NIQE [24],
both of which are used to calculate the perceptual index in the PIRM-SR Chal-
lenge [3]. In a recent study, Blau et al. [22] find that the distortion and perceptual
quality are at odds with each other.
3 Proposed Methods
Our main aim is to improve the overall perceptual quality for SR. In this sec-
tion, we first describe our proposed network architecture and then discuss the
improvements from the discriminator and perceptual loss. At last, we describe
the network interpolation strategy for balancing perceptual quality and PSNR.
Fig. 3: We employ the basic architecture of SRResNet [1], where most computation is done in the LR feature space. We could select or design “basic blocks” (e.g., residual block [18], dense block [34], RRDB) for better performance. (The diagram shows the LR input passing through a Conv layer, a chain of basic blocks, and then Upsampling and Conv layers that produce the SR output.)
3.1 Network Architecture
In order to further improve the recovered image quality of SRGAN, we mainly
make two modifications to the structure of generator G: 1) remove all BN lay-
ers; 2) replace the original basic block with the proposed Residual-in-Residual
Dense Block (RRDB), which combines multi-level residual network and dense
connections as depicted in Fig. 4.
Fig. 4: Left: We remove the BN layers in the residual block of SRGAN, turning the Conv–BN–ReLU–Conv–BN block with a skip connection into a plain Conv–ReLU–Conv block with a skip connection. Right: The RRDB block used in our deeper model: three dense blocks (each a chain of Conv–LReLU layers with dense connections) are cascaded, the output of each is scaled by the residual scaling parameter β before being added back, and an outer residual connection, also scaled by β, wraps the whole block.
Removing BN layers has proven to increase performance and reduce com-
putational complexity in different PSNR-oriented tasks including SR [20] and
deblurring [35]. BN layers normalize the features using mean and variance in a
batch during training and use estimated mean and variance of the whole train-
ing dataset during testing. When the statistics of training and testing datasets
differ a lot, BN layers tend to introduce unpleasant artifacts and limit the gener-
alization ability. We empirically observe that BN layers are more likely to bring
artifacts when the network is deeper and trained under a GAN framework. These
artifacts occasionally appear among iterations and different settings, violating
the needs for a stable performance over training. We therefore remove BN layers
for stable training and consistent performance. Furthermore, removing BN layers
helps to improve generalization ability and to reduce computational complexity
and memory usage.
We keep the high-level architecture design of SRGAN (see Fig. 3), and use a
novel basic block namely RRDB as depicted in Fig. 4. Based on the observation
that more layers and connections could always boost performance [20,11,12], the
proposed RRDB employs a deeper and more complex structure than the original
residual block in SRGAN. Specifically, as shown in Fig. 4, the proposed RRDB
has a residual-in-residual structure, where residual learning is used in different
levels. A similar network structure is proposed in [36] that also applies a multi-
level residual network. However, our RRDB differs from [36] in that we use dense
block [34] in the main path as [11], where the network capacity becomes higher
benefiting from the dense connections.
In addition to the improved architecture, we also exploit several techniques
to facilitate training a very deep network: 1) residual scaling [21,20], i.e., scaling
down the residuals by multiplying a constant between 0 and 1 before adding them
to the main path to prevent instability; 2) smaller initialization, as we empirically
find residual architecture is easier to train when the initial parameter variance
becomes smaller. More discussion can be found in the supplementary material.
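The residual-in-residual structure with residual scaling described above can be sketched in PyTorch. This is an illustrative schematic, not the released ESRGAN code: the dense block uses five 3×3 convolutions as in typical residual dense block designs, and the channel counts `nf`, `gc` and the scaling β = 0.2 are assumptions for demonstration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Five 3x3 convs with dense connections and LeakyReLU; no BN layers."""
    def __init__(self, nf=64, gc=32, beta=0.2):
        super().__init__()
        self.beta = beta
        # conv i sees the block input plus all previously grown features
        self.convs = nn.ModuleList(
            [nn.Conv2d(nf + i * gc, gc if i < 4 else nf, 3, 1, 1) for i in range(5)]
        )
        self.lrelu = nn.LeakyReLU(0.2)

    def forward(self, x):
        feats = [x]
        for conv in self.convs[:-1]:
            feats.append(self.lrelu(conv(torch.cat(feats, dim=1))))
        out = self.convs[-1](torch.cat(feats, dim=1))
        return x + self.beta * out  # residual scaling before the addition

class RRDB(nn.Module):
    """Residual-in-Residual Dense Block: three dense blocks wrapped by an
    outer residual connection, again scaled by beta."""
    def __init__(self, nf=64, gc=32, beta=0.2):
        super().__init__()
        self.beta = beta
        self.blocks = nn.Sequential(*[DenseBlock(nf, gc, beta) for _ in range(3)])

    def forward(self, x):
        return x + self.beta * self.blocks(x)
```

With β between 0 and 1, each residual branch contributes only a scaled update, which is the stabilization trick borrowed from [21,20].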
The training details and the effectiveness of the proposed network will be
presented in Sec. 4.
3.2 Relativistic Discriminator
Besides the improved structure of generator, we also enhance the discriminator
based on the Relativistic GAN [2]. Different from the standard discriminator D
in SRGAN, which estimates the probability that one input image x is real and
natural, a relativistic discriminator tries to predict the probability that a real
image x_r is relatively more realistic than a fake one x_f, as shown in Fig. 5.
Fig. 5: Difference between the standard discriminator and the relativistic discriminator. a) Standard GAN: the discriminator estimates the probability that an input is real (real → 1, fake → 0). b) Relativistic GAN: the discriminator estimates whether real data is more realistic than fake data ((real, fake) → 1) and whether fake data is less realistic than real data ((fake, real) → 0).
Specifically, we replace the standard discriminator with the Relativistic average Discriminator (RaD) [2], denoted as D_Ra. The standard discriminator in SRGAN can be expressed as D(x) = σ(C(x)), where σ is the sigmoid function and C(x) is the non-transformed discriminator output. The RaD is then formulated as D_Ra(x_r, x_f) = σ(C(x_r) − E_{x_f}[C(x_f)]), where E_{x_f}[·] denotes taking the average over all fake data in the mini-batch. The discriminator loss is then defined as:

L_D^Ra = −E_{x_r}[log(D_Ra(x_r, x_f))] − E_{x_f}[log(1 − D_Ra(x_f, x_r))].   (1)

The adversarial loss for the generator takes a symmetrical form:

L_G^Ra = −E_{x_r}[log(1 − D_Ra(x_r, x_f))] − E_{x_f}[log(D_Ra(x_f, x_r))],   (2)

where x_f = G(x_i) and x_i stands for the input LR image. Observe that the adversarial loss for the generator contains both x_r and x_f. Therefore, our generator benefits from gradients from both generated and real data in adversarial training, whereas in SRGAN only the generated part takes effect. In Sec. 4.4, we will
show that this modification of discriminator helps to learn sharper edges and
more detailed textures.
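The two losses can be written down almost verbatim from Eqs. (1) and (2). A minimal NumPy sketch operating on raw discriminator outputs C(x); the function names and toy inputs are ours, for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_ra(c_a, c_b):
    """D_Ra(a, b) = sigma(C(a) - E_b[C(b)]), with the mean over the mini-batch."""
    return sigmoid(c_a - c_b.mean())

def loss_d(c_real, c_fake):
    """Eq. (1): pushes D_Ra(x_r, x_f) -> 1 and D_Ra(x_f, x_r) -> 0."""
    return -(np.log(d_ra(c_real, c_fake)).mean()
             + np.log(1.0 - d_ra(c_fake, c_real)).mean())

def loss_g(c_real, c_fake):
    """Eq. (2): the symmetric form; gradients flow through both terms."""
    return -(np.log(1.0 - d_ra(c_real, c_fake)).mean()
             + np.log(d_ra(c_fake, c_real)).mean())

# When the discriminator separates real from fake well, its loss is small
# while the generator loss is large.
c_real = np.array([4.0, 5.0, 6.0])    # non-transformed outputs C(x_r)
c_fake = np.array([-4.0, -5.0, -6.0])  # non-transformed outputs C(x_f)
```

Unlike the standard GAN loss, both `loss_d` and `loss_g` involve the real and the fake batch, which is exactly why the generator receives gradients from real data here.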
3.3 Perceptual Loss
We also develop a more effective perceptual loss Lpercep by constraining on fea-
tures before activation rather than after activation as practiced in SRGAN.
Based on the idea of being closer to perceptual similarity [29,14], Johnson
et al. [13] propose perceptual loss and it is extended in SRGAN [1]. Perceptual
loss is previously defined on the activation layers of a pre-trained deep network,
where the distance between two activated features is minimized. Contrary to
the convention, we propose to use features before the activation layers, which
will overcome two drawbacks of the original design. First, the activated features
are very sparse, especially after a very deep network, as depicted in Fig. 6.
For example, the average percentage of activated neurons for image ‘baboon’
after the VGG19-54³ layer is merely 11.17%. The sparse activation provides weak
supervision and thus leads to inferior performance. Second, using features after
activation also causes inconsistent reconstructed brightness compared with the
ground-truth image, which we will show in Sec. 4.4.
Therefore, the total loss for the generator is:

L_G = L_percep + λ L_G^Ra + η L_1,   (3)

where L_1 = E_{x_i}[||G(x_i) − y||_1] is the content loss that evaluates the 1-norm distance between the recovered image G(x_i) and the ground truth y, and λ, η are coefficients balancing the different loss terms.
We also explore a variant of perceptual loss in the PIRM-SR Challenge. In
contrast to the commonly used perceptual loss that adopts a VGG network
trained for image classification, we develop a more suitable perceptual loss for
SR – MINC loss. It is based on a fine-tuned VGG network for material recog-
nition [38], which focuses on textures rather than object. Although the gain of
perceptual index brought by MINC loss is marginal, we still believe that explor-
ing perceptual loss that focuses on texture is critical for SR.
³ We use a pre-trained 19-layer VGG network [37], where 54 indicates features obtained by the 4th convolution before the 5th max-pooling layer, representing high-level features; similarly, 22 represents low-level features.
Fig. 6: Representative feature maps before and after activation for image ‘baboon’: a) activation maps of VGG19-22; b) activation maps of VGG19-54. As the network goes deeper, most features after activation become inactive, while features before activation contain more information.
3.4 Network Interpolation
To remove unpleasant noise in GAN-based methods while maintaining good perceptual quality, we propose a flexible and effective strategy – network interpolation. Specifically, we first train a PSNR-oriented network G_PSNR and then obtain a GAN-based network G_GAN by fine-tuning. We interpolate all the corresponding parameters of these two networks to derive an interpolated model G_INTERP, whose parameters are:

θ_G^INTERP = (1 − α) θ_G^PSNR + α θ_G^GAN,   (4)

where θ_G^INTERP, θ_G^PSNR and θ_G^GAN are the parameters of G_INTERP, G_PSNR and G_GAN, respectively, and α ∈ [0, 1] is the interpolation parameter.
The proposed network interpolation enjoys two merits. First, the interpo-
lated model is able to produce meaningful results for any feasible α without
introducing artifacts. Second, we can continuously balance perceptual quality
and fidelity without re-training the model.
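Since Eq. (4) acts independently on each parameter tensor, the strategy is a per-tensor linear blend of two checkpoints. A hedged NumPy sketch, with a dictionary of arrays standing in for a real network's parameters:

```python
import numpy as np

def interpolate_params(theta_psnr, theta_gan, alpha):
    """theta_interp = (1 - alpha) * theta_psnr + alpha * theta_gan (Eq. 4)."""
    assert 0.0 <= alpha <= 1.0, "alpha must lie in [0, 1]"
    assert theta_psnr.keys() == theta_gan.keys()
    return {name: (1.0 - alpha) * theta_psnr[name] + alpha * theta_gan[name]
            for name in theta_psnr}

# alpha = 0 recovers the PSNR-oriented model, alpha = 1 the GAN-based model,
# and intermediate values trade fidelity against perceptual quality.
psnr_w = {"conv1.weight": np.zeros((3, 3)), "conv1.bias": np.zeros(3)}
gan_w = {"conv1.weight": np.ones((3, 3)), "conv1.bias": np.ones(3)}
mid_w = interpolate_params(psnr_w, gan_w, alpha=0.5)
```

The same blend applied to a real checkpoint pair yields a continuum of models without any re-training, which is the first merit noted above.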
We also explore alternative methods to balance the effects of PSNR-oriented
and GAN-based methods. For instance, one can directly interpolate their output
images (pixel by pixel) rather than the network parameters. However, such an
approach fails to achieve a good trade-off between noise and blur, i.e., the inter-
polated image is either too blurry or noisy with artifacts (see Sec. 4.5). Another
method is to tune the weights of the content loss and adversarial loss, i.e., the parameters λ and η in Eq. (3). But this approach requires tuning loss weights and
fine-tuning the network, and thus it is too costly to achieve continuous control
of the image style.
4 Experiments
4.1 Training Details
Following SRGAN [1], all experiments are performed with a scaling factor of
×4 between LR and HR images. We obtain LR images by down-sampling HR
images using the MATLAB bicubic kernel function. The mini-batch size is set to
16. The spatial size of cropped HR patch is 128 × 128. We observe that training
a deeper network benefits from a larger patch size, since an enlarged receptive
field helps to capture more semantic information. However, it costs more training
time and consumes more computing resources. This phenomenon is also observed
in PSNR-oriented methods (see supplementary material).
The training process is divided into two stages. First, we train a PSNR-oriented model with the L1 loss. The learning rate is initialized as 2×10⁻⁴ and decayed by a factor of 2 every 2×10⁵ mini-batch updates. We then employ the trained PSNR-oriented model as an initialization for the generator. The generator is trained using the loss function in Eq. (3) with λ = 5×10⁻³ and η = 1×10⁻². The learning rate is set to 1×10⁻⁴ and halved at [50k, 100k, 200k, 300k]
iterations. Pre-training with pixel-wise loss helps GAN-based methods to obtain
more visually pleasing results. The reasons are that 1) it can avoid undesired
local optima for the generator; 2) after pre-training, the discriminator receives
relatively good super-resolved images instead of extreme fake ones (black or
noisy images) at the very beginning, which helps it to focus more on texture
discrimination.
For optimization, we use Adam [39] with β1 = 0.9, β2 = 0.999. We alternately
update the generator and discriminator network until the model converges. We
use two settings for our generator – one of them contains 16 residual blocks,
with a capacity similar to that of SRGAN and the other is a deeper model with
23 RRDB blocks. We implement our models with the PyTorch framework and
train them using NVIDIA Titan Xp GPUs.
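The halving schedule used in the GAN stage can be expressed as a small helper; the base rate and milestones are the values quoted above, while the function itself is our illustrative sketch rather than the authors' code:

```python
def gan_stage_lr(step, base_lr=1e-4, milestones=(50_000, 100_000, 200_000, 300_000)):
    """Halve the learning rate once for every milestone already passed."""
    return base_lr * 0.5 ** sum(step >= m for m in milestones)
```

For example, at iteration 150k two milestones have passed, giving a rate of 2.5×10⁻⁵.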
4.2 Data
For training, we mainly use the DIV2K dataset [40], which is a high-quality (2K
resolution) dataset for image restoration tasks. Beyond the training set of DIV2K
that contains 800 images, we also seek other datasets with rich and diverse
textures for our training. To this end, we further use the Flickr2K dataset [41]
consisting of 2650 2K high-resolution images collected on the Flickr website,
and the OutdoorSceneTraining (OST) [17] dataset to enrich our training set.
We empirically find that using this large dataset with richer textures helps the
generator to produce more natural results, as shown in Fig. 8.
We train our models in RGB channels and augment the training dataset
with random horizontal flips and 90 degree rotations. We evaluate our mod-
els on widely used benchmark datasets – Set5 [42], Set14 [43], BSD100 [44],
Urban100 [45], and the PIRM self-validation dataset that is provided in the
PIRM-SR Challenge.
4.3 Qualitative Results
We compare our final models on several public benchmark datasets with state-of-
the-art PSNR-oriented methods including SRCNN [4], EDSR [20] and RCAN [12],
and also with perceptual-driven approaches including SRGAN [1] and EnhanceNet
[16]. Since there is no effective and standard metric for perceptual quality, we
present some representative qualitative results in Fig. 7. PSNR (evaluated on
the luminance channel in YCbCr color space) and the perceptual index used in
the PIRM-SR Challenge are also provided for reference.
It can be observed from Fig. 7 that our proposed ESRGAN outperforms
previous approaches in both sharpness and details. For instance, ESRGAN can
produce sharper and more natural baboon’s whiskers and grass textures (see
image 43074) than PSNR-oriented methods, which tend to generate blurry re-
sults, and than previous GAN-based methods, whose textures are unnatural and
contain unpleasing noise. ESRGAN is capable of generating more detailed struc-
tures in building (see image 102061) while other methods either fail to produce
enough details (SRGAN) or add undesired textures (EnhanceNet). Moreover,
previous GAN-based methods sometimes introduce unpleasant artifacts, e.g.,
SRGAN adds wrinkles to the face. Our ESRGAN gets rid of these artifacts and
produces natural results.
4.4 Ablation Study
In order to study the effects of each component in the proposed ESRGAN, we
gradually modify the baseline SRGAN model and compare their differences.
The overall visual comparison is illustrated in Fig. 8. Each column represents
a model with its configurations shown in the top. The red sign indicates the
main improvement compared with the previous model. A detailed discussion is
provided as follows.
BN removal. We first remove all BN layers for stable and consistent perfor-
mance without artifacts. It does not decrease the performance but saves the
computational resources and memory usage. For some cases, a slight improvement can be observed from the 2nd and 3rd columns in Fig. 8 (e.g., image 39).
Furthermore, we observe that when a network is deeper and more complicated,
the model with BN layers is more likely to introduce unpleasant artifacts. The
examples can be found in the supplementary material.
Before activation in perceptual loss. We first demonstrate that using fea-
tures before activation can result in more accurate brightness of reconstructed
images. To eliminate the influences of textures and color, we filter the image with
a Gaussian kernel and plot the histogram of its gray-scale counterpart. Fig. 9a
shows the distribution of each brightness value. Using activated features skews
the distribution to the left, resulting in a dimmer output while using features
before activation leads to a more accurate brightness distribution closer to that
of the ground-truth.
We can further observe that using features before activation helps to produce
sharper edges and richer textures as shown in Fig. 9b (see bird feather) and Fig. 8
(see the 3rd and 4th columns), since the dense features before activation offer stronger supervision than a sparse activation could provide.
RaGAN. RaGAN uses an improved relativistic discriminator, which is shown
to benefit learning sharper edges and more detailed textures. For example, in
Fig. 8: Overall visual comparisons showing the effects of each component in ESRGAN. Each column represents a model with its configuration shown at the top: BN or not; features after/before activation; standard GAN vs. RaGAN; deeper network with RRDB; more data. The red sign indicates the main improvement compared with the previous model. The compared crops are images 39, 6 and 20 from the PIRM self-validation set, images 43074, 69015 and 208001 from BSD100, and baboon from Set14.
Fig. 9: Comparison between features before activation and after activation. (a) Brightness influence: grayscale histograms (number of pixels vs. pixel value) of the results obtained before activation, after activation, and the GT, for image 175032 from BSD100. (b) Detail influence: image 163085 from BSD100.
the 5th column of Fig. 8, the generated images are sharper with richer textures
than those on their left (see the baboon, image 39 and image 43074).
Deeper network with RRDB. Deeper model with the proposed RRDB can
further improve the recovered textures, especially for the regular structures like
the roof of image 6 in Fig. 8, since the deep model has a strong representation
capacity to capture semantic information. Also, we find that a deeper model can
reduce unpleasing noises like image 20 in Fig. 8.
In contrast to SRGAN, which claimed that deeper models are increasingly
difficult to train, our deeper model shows its superior performance with easy
training, thanks to the improvements mentioned above especially the proposed
RRDB without BN layers.
4.5 Network Interpolation
We compare the effects of network interpolation and image interpolation strate-
gies in balancing the results of a PSNR-oriented model and GAN-based method.
We apply simple linear interpolation on both schemes. The interpolation
parameter α is chosen from 0 to 1 with an interval of 0.2.
As depicted in Fig. 10, the pure GAN-based method produces sharp edges
and richer textures but with some unpleasant artifacts, while the pure PSNR-
oriented method outputs cartoon-style blurry images. By employing network
interpolation, unpleasing artifacts are reduced while the textures are maintained.
By contrast, image interpolation fails to remove these artifacts effectively.
Interestingly, it is observed that the network interpolation strategy provides
a smooth control of balancing perceptual quality and fidelity in Fig. 10.
4.6 The PIRM-SR Challenge
We take a variant of ESRGAN to participate in the PIRM-SR Challenge [3].
Specifically, we use the proposed ESRGAN with 16 residual blocks and also em-
pirically make some modifications to cater to the perceptual index. 1) The MINC
loss is used as a variant of perceptual loss, as discussed in Sec. 3.3. Despite the
marginal gain on the perceptual index, we still believe that exploring perceptual
loss that focuses on texture is crucial for SR. 2) Pristine dataset [24], which is
Fig. 10: Comparison between network interpolation and image interpolation, shown for images 79 and 3 from the PIRM self-validation set, with the interpolation parameter α varying from 1 to 0 in steps of 0.2 between the perceptual-driven, GAN-based model and the PSNR-oriented model.
used for learning the perceptual index, is also employed in our training; 3) a
high weight of loss L1 up to η = 10 is used due to the PSNR constraints; 4) we
also use back projection [46] as post-processing, which can improve PSNR and
sometimes lower the perceptual index.
For other regions 1 and 2 that require a higher PSNR, we use image in-
terpolation between the results of our ESRGAN and those of a PSNR-oriented
method RCAN [12]. The image interpolation scheme achieves a lower perceptual
index (lower is better) although we observed more visually pleasing results by
using the network interpolation scheme. Our proposed ESRGAN model won the
first place in the PIRM-SR Challenge (region 3) with the best perceptual index.
5 Conclusion
We have presented an ESRGAN model that achieves consistently better per-
ceptual quality than previous SR methods. The method won the first place in
the PIRM-SR Challenge in terms of the perceptual index. We have formulated
a novel architecture containing several RRDB blocks without BN layers. In ad-
dition, useful techniques including residual scaling and smaller initialization are
employed to facilitate the training of the proposed deep model. We have also
introduced the use of relativistic GAN as the discriminator, which learns to
judge whether one image is more realistic than another, guiding the generator
to recover more detailed textures. Moreover, we have enhanced the perceptual
loss by using the features before activation, which offer stronger supervision and
thus restore more accurate brightness and realistic textures.
15. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks 15
Acknowledgement. This work is supported by SenseTime Group Limited, the
General Research Fund sponsored by the Research Grants Council of the Hong
Kong SAR (CUHK 14241716, 14224316, 14209217), National Natural Science
Foundation of China (U1613211) and Shenzhen Research Program
(JCYJ20170818164704758, JCYJ20150925163005055).
References
1. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken,
A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-
resolution using a generative adversarial network. In: CVPR. (2017)
2. Jolicoeur-Martineau, A.: The relativistic discriminator: a key element missing from
standard GAN. arXiv preprint arXiv:1807.00734 (2018)
3. Blau, Y., Mechrez, R., Timofte, R., Michaeli, T., Zelnik-Manor, L.: The PIRM
challenge on perceptual super resolution. https://www.pirm2018.org/PIRM-SR.html
(2018)
4. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for
image super-resolution. In: ECCV. (2014)
5. Kim, J., Kwon Lee, J., Mu Lee, K.: Accurate image super-resolution using very
deep convolutional networks. In: CVPR. (2016)
6. Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Deep laplacian pyramid networks
for fast and accurate super-resolution. In: CVPR. (2017)
7. Kim, J., Kwon Lee, J., Mu Lee, K.: Deeply-recursive convolutional network for
image super-resolution. In: CVPR. (2016)
8. Tai, Y., Yang, J., Liu, X.: Image super-resolution via deep recursive residual
network. In: CVPR. (2017)
9. Tai, Y., Yang, J., Liu, X., Xu, C.: Memnet: A persistent memory network for
image restoration. In: ICCV. (2017)
10. Haris, M., Shakhnarovich, G., Ukita, N.: Deep backprojection networks for super-
resolution. In: CVPR. (2018)
11. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for
image super-resolution. In: CVPR. (2018)
12. Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution
using very deep residual channel attention networks. In: ECCV. (2018)
13. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer
and super-resolution. In: ECCV. (2016)
14. Bruna, J., Sprechmann, P., LeCun, Y.: Super-resolution with deep convolutional
sufficient statistics. In: ICLR. (2015)
15. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.,
Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS. (2014)
16. Sajjadi, M.S., Schölkopf, B., Hirsch, M.: Enhancenet: Single image super-resolution
through automated texture synthesis. In: ICCV. (2017)
17. Wang, X., Yu, K., Dong, C., Loy, C.C.: Recovering realistic texture in image
super-resolution by deep spatial feature transform. In: CVPR. (2018)
18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
In: CVPR. (2016)
19. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by
reducing internal covariate shift. In: ICML. (2015)
20. Lim, B., Son, S., Kim, H., Nah, S., Lee, K.M.: Enhanced deep residual networks
for single image super-resolution. In: CVPRW. (2017)
21. Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, inception-resnet and the impact
of residual connections on learning. arXiv preprint arXiv:1602.07261 (2016)
22. Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: CVPR. (2017)
23. Ma, C., Yang, C.Y., Yang, X., Yang, M.H.: Learning a no-reference quality metric
for single-image super-resolution. CVIU 158 (2017) 1–16
24. Mittal, A., Soundararajan, R., Bovik, A.C.: Making a completely blind image
quality analyzer. IEEE Signal Process. Lett. 20(3) (2013) 209–212
25. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convo-
lutional networks. TPAMI 38(2) (2016) 295–307
26. Yu, K., Dong, C., Lin, L., Loy, C.C.: Crafting a toolchain for image restoration by
deep reinforcement learning. In: CVPR. (2018)
27. Yuan, Y., Liu, S., Zhang, J., Zhang, Y., Dong, C., Lin, L.: Unsupervised image
super-resolution using cycle-in-cycle generative adversarial networks. In: CVPRW.
(2018)
28. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-
level performance on imagenet classification. In: ICCV. (2015)
29. Gatys, L., Ecker, A.S., Bethge, M.: Texture synthesis using convolutional neural
networks. In: NIPS. (2015)
30. Mechrez, R., Talmi, I., Shama, F., Zelnik-Manor, L.: Maintaining natural image
statistics with the contextual loss. arXiv preprint arXiv:1803.04626 (2018)
31. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint
arXiv:1701.07875 (2017)
32. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved
training of wasserstein gans. In: NIPS. (2017)
33. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for
generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)
34. Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected
convolutional networks. In: CVPR. (2017)
35. Nah, S., Kim, T.H., Lee, K.M.: Deep multi-scale convolutional neural network for
dynamic scene deblurring. In: CVPR. (2017)
36. Zhang, K., Sun, M., Han, X., Yuan, X., Guo, L., Liu, T.: Residual networks of
residual networks: Multilevel residual networks. IEEE Transactions on Circuits
and Systems for Video Technology (2017)
37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)
38. Bell, S., Upchurch, P., Snavely, N., Bala, K.: Material recognition in the wild with
the materials in context database. In: CVPR. (2015)
39. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: ICLR. (2015)
40. Agustsson, E., Timofte, R.: NTIRE 2017 challenge on single image super-resolution:
Dataset and study. In: CVPRW. (2017)
41. Timofte, R., Agustsson, E., Van Gool, L., Yang, M.H., Zhang, L., Lim, B., Son,
S., Kim, H., Nah, S., Lee, K.M., et al.: NTIRE 2017 challenge on single image
super-resolution: Methods and results. In: CVPRW. (2017)
42. Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity
single-image super-resolution based on nonnegative neighbor embedding. In:
BMVC, BMVA press (2012)
43. Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-
representations. In: International Conference on Curves and Surfaces, Springer
(2010)
44. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural
images and its application to evaluating segmentation algorithms and measuring
ecological statistics. In: ICCV. (2001)
45. Huang, J.B., Singh, A., Ahuja, N.: Single image super-resolution from transformed
self-exemplars. In: CVPR. (2015)
46. Timofte, R., Rothe, R., Van Gool, L.: Seven ways to improve example-based single
image super resolution. In: CVPR. (2016)
47. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward
neural networks. In: International Conference on Artificial Intelligence and Statis-
tics. (2010)
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
Supplementary File

Xintao Wang1, Ke Yu1, Shixiang Wu2, Jinjin Gu3, Yihao Liu4, Chao Dong2,
Chen Change Loy5, Yu Qiao2, Xiaoou Tang1

1 CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong
2 SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
3 The Chinese University of Hong Kong, Shenzhen
4 University of Chinese Academy of Sciences
5 Nanyang Technological University, Singapore

{wx016,yk017,xtang}@ie.cuhk.edu.hk, {sx.wu,chao.dong,yu.qiao}@siat.ac.cn
liuyihao14@mails.ucas.ac.cn, 115010148@link.cuhk.edu.cn, ccloy@ntu.edu.sg
Abstract. In this supplementary file, we first show more examples of
Batch-Normalization (BN) related artifacts in Section 1. Then we intro-
duce several useful techniques that facilitate training very deep models in
Section 2. The influence of different datasets and of the training patch size is
analyzed in Section 3 and Section 4, respectively. Finally, in
Section 5, we provide more qualitative results for visual comparison.
1 BN artifacts
We empirically observe that BN layers tend to bring artifacts. These artifacts,
which we call BN artifacts, occasionally appear across iterations and different settings,
violating the need for stable performance during training. In this section, we
show that the network depth, BN position, training dataset and training loss
all affect the occurrence of BN artifacts, with corresponding visual
examples in Fig. 1, 2 and 3.
Table 1: Experimental variants for exploring BN artifacts.

Name        | Number of RB | BN position     | Training dataset | Training loss
Exp_base    | 16           | LR space        | DIV2K            | L1
Exp_BNinHR  | 16           | LR and HR space | DIV2K            | L1
Exp_64RB    | 64           | LR space        | DIV2K            | L1
Exp_skydata | 16           | LR space        | sky data         | L1
Exp_SRGAN   | 16           | LR space        | DIV2K            | VGG + GAN + L1
To explore BN artifacts, we conduct several experiments as shown in Tab. 1.
The baseline is similar to SRResNet [1] with 16 Residual Blocks (RB) and all
the BN layers are in the LR space, i.e., before up-sampling layers. The baseline
setting is unlikely to introduce BN artifacts in our experiments. However, if
the network goes deeper or there is an extra BN layer in HR space (i.e., after
up-sampling layers), BN artifacts are more likely to appear (see examples in
Fig. 1).
When we replace the training dataset of the baseline with the sky dataset [17],
the BN artifacts appear (see examples in Fig. 1).

Fig. 1: Examples of BN artifacts in PSNR-oriented methods (Exp_64RB: a deeper
network with 64 RBs; Exp_BNinHR: with BN in HR space; Exp_skydata: training
with the sky dataset). The BN artifacts are more likely to appear in deeper
networks, with BN in HR space, and when using a mismatched dataset whose
statistics differ from those of the testing dataset.

BN layers normalize the features using the mean and variance of a batch during
training, while using the estimated mean and variance of the whole training
dataset during testing.
statistics of training (e.g., sky dataset) and testing datasets differ a lot, BN layers
tend to introduce unpleasant artifacts and limit the generalization ability.
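The train/test statistics mismatch described above can be demonstrated in a few lines of NumPy (a toy one-dimensional sketch, not an actual BN implementation):

```python
import numpy as np

def bn_train(x, eps=1e-5):
    # Training mode: normalize with the current batch's own statistics.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def bn_test(x, running_mean, running_var, eps=1e-5):
    # Test mode: normalize with statistics estimated on the training set.
    return (x - running_mean) / np.sqrt(running_var + eps)

train_features = np.array([1.0, 2.0, 3.0, 4.0])
# Test-time features whose distribution differs from the training set,
# e.g., a testing image unlike anything in the sky dataset:
shifted = bn_test(train_features + 5.0,
                  running_mean=train_features.mean(),
                  running_var=train_features.var())
```

The normalized output is far from zero mean, so downstream layers receive activations outside the range they were trained on, which is one plausible source of such artifacts.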
Training in a GAN framework increases the probability of BN artifacts occurring
in our experiments. We employ the same network structure as the baseline
and replace the L1 loss with the VGG + GAN + L1 loss. The BN artifacts become
more likely to appear; visual examples are shown in Fig. 2.
Fig. 2: Examples of BN artifacts in models under the GAN framework (baboon
and zebra from Set14; 175043 from BSD100).
The BN artifacts occasionally appear over training, i.e., the BN artifacts
appear, disappear and change across training iterations, as shown in Fig. 3.
We therefore remove BN layers for stable training and consistent performance.
The underlying reasons and potential solutions remain to be further studied.
2 Useful techniques to train a very deep network
Since we remove BN layers for stable training and consistent performance, training
a very deep network becomes a problem. In addition to the proposed Residual-in-Residual
Dense Block (RRDB), which takes advantage of residual learning and
more connections, we also find two useful techniques that ease the training of a
very deep network: smaller initialization and residual scaling.
Fig. 3: Evolution of the model Exp_BNinHR (with BN in HR space) during
training (snapshots at 185k, 285k, 385k, 485k and 850k iterations). The BN
artifacts occasionally appear over training, resulting in unstable performance.
Initialization is important for a very deep network especially without BN lay-
ers [47,28]. He et al. [28] propose a robust initialization method, namely MSRA
initialization, that is suitable for VGG-style network (plain network without
residual connections). The assumption is that a proper initialization method
should avoid reducing or magnifying the magnitudes of input signals exponentially.
It is worth noting that this assumption no longer holds due to the residual
path in ResNet [18], which leads to magnified magnitudes of input signals. This
problem is alleviated by normalizing the features with BN layers [19]. For a very
deep network containing residual blocks without BN layers, a new initialization
method should be applied. We find that a smaller initialization than MSRA
initialization (multiplying all parameters computed by MSRA initialization
by 0.1) works well in our experiments.
Another method for training deeper networks is residual scaling, proposed
by Szegedy et al. [21] and also used in EDSR [20]. It scales down the
residuals by multiplying them by a constant between 0 and 1 before adding them to
the main path to prevent instability. In our settings, for each residual block, the
residual features after the last convolution layer are multiplied by 0.2. Intuitively,
the residual scaling can be interpreted as correcting the improper initialization, thus
avoiding magnifying the magnitudes of input signals in residual networks.
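The two techniques can be sketched together in NumPy, using a fully connected layer as a simple stand-in for a convolution (a hypothetical illustration under that assumption, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def msra_init(fan_in, fan_out, scale=1.0):
    """MSRA (He) initialization for a fan_in x fan_out weight matrix.
    scale=0.1 gives the smaller initialization described above."""
    std = np.sqrt(2.0 / fan_in)
    return scale * std * rng.standard_normal((fan_in, fan_out))

def residual_block(x, w, beta=0.2):
    """Residual block without BN: the residual branch is scaled by beta
    before being added back to the identity path."""
    residual = np.maximum(x @ w, 0.0)  # linear + ReLU as a conv stand-in
    return x + beta * residual
```

Because the residual branch contributes only beta times its output, the magnitudes on the main path grow slowly even when many blocks are stacked.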
We use a very deep network containing 64 RBs for experiments. As shown
in Fig. 4a, if we simply use MSRA initialization, the network falls into an
extremely bad local minimum with poor performance. However, smaller initialization
(×0.1) helps the network jump out of the bad local minimum and achieve
good performance. The zoomed curves are shown in Fig. 4b. Smaller initialization
achieves a higher PSNR than residual scaling. In addition, using both
techniques together yields a further slight improvement.
3 The influence of different datasets
First we show that larger datasets lead to better performance for PSNR-oriented
methods. We use a large model, where 23 Residual-in-Residual Dense Blocks (RRDB)
are placed before the upsampling layer, followed by two convolution layers for
reconstruction. The overall quantitative comparison can be found
in Tab. 2.
Fig. 4: Smaller initialization and residual scaling benefit the convergence and
the performance of very deep networks (PSNR is evaluated on Set5 with RGB
channels). (a) Training curves over 1000k iterations for MSRA init, MSRA init
×0.1 and residual scaling (×0.2); (b) zoomed curves, also showing MSRA init
×0.1 combined with residual scaling.
A widely used training dataset is DIV2K [40], which contains 800 images. We
also explore a dataset with more diverse scenes: the Flickr2K dataset [41],
consisting of 2650 2K high-resolution images collected from the Flickr website. It
is observed that the merged dataset of DIV2K and Flickr2K, namely the DF2K
dataset, increases the PSNR performance (see Tab. 2).
Table 2: Quantitative evaluation of state-of-the-art PSNR-oriented SR algorithms:
average PSNR/SSIM on the Y channel.

Method      | Training data | Set5         | Set14        | BSD100       | Urban100     | Manga109
Bicubic     | -             | 28.42/0.8104 | 26.00/0.7027 | 25.96/0.6675 | 23.14/0.6577 | 24.89/0.7866
SRCNN [4]   | 291           | 30.48/0.8628 | 27.50/0.7513 | 26.90/0.7101 | 24.52/0.7221 | 27.58/0.8555
MemNet [9]  | 291           | 31.74/0.8893 | 28.26/0.7723 | 27.40/0.7281 | 25.50/0.7630 | 29.42/0.8942
EDSR [20]   | DIV2K         | 32.46/0.8968 | 28.80/0.7876 | 27.71/0.7420 | 26.64/0.8033 | 31.02/0.9148
RDN [11]    | DIV2K         | 32.47/0.8990 | 28.81/0.7871 | 27.72/0.7419 | 26.61/0.8028 | 31.00/0.9151
RCAN [12]   | DIV2K         | 32.63/0.9002 | 28.87/0.7889 | 27.77/0.7436 | 26.82/0.8087 | 31.22/0.9173
RRDB (ours) | DIV2K         | 32.60/0.9002 | 28.88/0.7896 | 27.76/0.7432 | 26.73/0.8072 | 31.16/0.9164
RRDB (ours) | DF2K          | 32.73/0.9011 | 28.99/0.7917 | 27.85/0.7455 | 27.03/0.8153 | 31.66/0.9196
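Y-channel PSNR as reported in Tab. 2 can be computed roughly as follows. This is a NumPy sketch; the BT.601 luma coefficients are a common convention and an assumption here, and border cropping, which SR evaluations typically apply, is omitted:

```python
import numpy as np

def rgb_to_y(img):
    """Convert an RGB image (H x W x 3, values in [0, 255]) to the luma
    (Y) channel using BT.601 coefficients (an assumed convention)."""
    return 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB between reference and estimate."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

A gain of ~0.1 dB, as between the DIV2K and DF2K rows, corresponds to a small but consistent reduction in mean squared error across the benchmark.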
For perceptual-driven methods that focus on texture restoration, we further
enrich the training set with the OutdoorSceneTraining (OST) dataset [17], which
contains diverse natural textures. We employ the large model with 23 RRDB blocks. A
subset of ImageNet containing about 450k images is also used for comparison.
The qualitative results are shown in Fig. 5. Training with ImageNet introduces
new types of artifacts, as seen on the zebra image in Fig. 5, while the OST
dataset benefits grass restoration.

Fig. 5: The influence of different datasets (ImageNet 450k, DIV2K, DF2K,
DF2K+OST), shown on baboon and zebra from Set14 and 78004 from BSD100.

4 The influence of training patch size

We observe that training a deeper network benefits from a larger patch size,
since an enlarged receptive field helps the network capture more semantic
information. We try training patch sizes of 96 × 96, 128 × 128 and 192 × 192 on
models with 16 RBs and 23 RRDBs (larger model capacity). The training curves
(evaluated on Set5 with RGB channels) are shown in Fig. 6.
It is observed that both models benefit from a larger training patch size. Moreover,
the deeper model achieves a larger improvement (∼0.12 dB) than the shallower
one (∼0.04 dB), since its larger capacity can take full advantage of the
larger training patch size.
However, a larger training patch size costs more training time and consumes
more computing resources. As a trade-off, we use 192 × 192 for PSNR-oriented
methods and 128 × 128 for perceptual-driven methods.
Fig. 6: The influence of training patch size (PSNR is evaluated on Set5 with
RGB channels). (a) 16 Residual Blocks; (b) 23 RRDBs; curves for patch sizes
96 × 96, 128 × 128 and 192 × 192.
5 More qualitative comparison