This document discusses Gabor convolutional networks (GCNs), which incorporate Gabor filters into deep convolutional neural networks (DCNNs) to improve robustness to orientation and scale changes. GCNs modulate the learned convolution filters with Gabor filters, generating convolutional Gabor orientation filters (GoFs) that endow the ability to capture spatial localization, orientation selectivity, and spatial frequency selectivity. This allows GCNs to learn more compact yet enhanced representations with improved generalization to image transformations, outperforming conventional DCNN architectures. The incorporation of Gabor filters into the convolution operation can be integrated into any deep learning model to enhance robustness to scale and rotation variations.
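The Gabor modulation at the heart of GoFs can be sketched in a few lines: each learned convolution filter is multiplied element-wise with Gabor kernels at several orientations, yielding one orientation channel per kernel. This is a toy illustration, not the GCN authors' implementation; the filter values and Gabor parameters below are arbitrary.

```python
import math

def gabor_kernel(size, theta, sigma=1.0, lam=2.0, gamma=0.5):
    """Real part of a 2D Gabor filter at orientation theta (radians)."""
    half = size // 2
    kernel = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            # rotate coordinates into the filter's orientation
            xr = (x - half) * math.cos(theta) + (y - half) * math.sin(theta)
            yr = -(x - half) * math.sin(theta) + (y - half) * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / lam)
            kernel[y][x] = envelope * carrier
    return kernel

def modulate(weight, gabor):
    """Element-wise product of a learned filter with one Gabor kernel,
    giving one orientation channel of a Gabor orientation filter (GoF)."""
    return [[w * g for w, g in zip(wrow, grow)] for wrow, grow in zip(weight, gabor)]

# a made-up learned 3x3 filter, modulated at 4 orientations
learned = [[0.2, -0.1, 0.0], [0.5, 0.3, -0.2], [0.1, 0.0, 0.4]]
gof = [modulate(learned, gabor_kernel(3, o * math.pi / 4)) for o in range(4)]
print(len(gof))  # 4 orientation channels, each 3x3
```

Because the Gabor kernels are fixed, only the single learned filter holds trainable parameters, which is why the resulting representation is more compact than learning each orientation separately.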
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...ijistjournal
This paper proposes a new procedure to improve the performance of the block-matching and 3-D filtering (BM3D) image denoising algorithm. It is demonstrated that better performance than that of the original BM3D algorithm can be achieved across a range of noise levels. The method adapts BM3D parameter values to the noise level and removes the prefiltering step used at high noise levels; as a result, Peak Signal-to-Noise Ratio (PSNR) and visual quality improve while BM3D's complexity and processing time are reduced. The improved BM3D algorithm is then extended to denoise satellite and color filter array (CFA) images. Results show improved performance compared with current methods of denoising satellite and CFA images. In particular, a comparison with the Adaptive PCA algorithm shows superior PSNR and visual quality for CFA image denoising, along with a significantly reduced processing time.
In this paper, we analyze and compare the performance of fusion methods based on four different
transforms: i) wavelet transform, ii) curvelet transform, iii) contourlet transform and iv) nonsubsampled
contourlet transform. Fusion framework and scheme are explained in detail, and two different sets of
images are used in our experiments. Furthermore, eight different performance metrics are adopted to
comparatively analyze the fusion results. The comparison results show that the nonsubsampled contourlet
transform method performs better than the other three methods, both spatially and spectrally. We also
observed from additional experiments that the decomposition level of 3 offered the best fusion performance,
and decomposition levels beyond level-3 did not significantly improve the fusion results.
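The transform-domain fusion recipe these methods share can be sketched with a one-level 1-D Haar transform: average the approximation coefficients, keep the larger-magnitude detail coefficient. This is a toy stand-in for the 2-D wavelet/curvelet/contourlet cases, and the fusion rule shown is a common generic choice, not necessarily the exact one used in the paper.

```python
def haar(signal):
    """One-level Haar transform: (approximation, detail) coefficient lists."""
    a = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
    d = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
    return a, d

def inverse_haar(a, d):
    out = []
    for ai, di in zip(a, d):
        out += [ai + di, ai - di]
    return out

def fuse(x, y):
    """Transform-domain fusion: average approximations, pick max-|.| details."""
    ax, dx = haar(x)
    ay, dy = haar(y)
    a = [(p + q) / 2 for p, q in zip(ax, ay)]
    d = [p if abs(p) >= abs(q) else q for p, q in zip(dx, dy)]
    return inverse_haar(a, d)

print(fuse([1, 1, 4, 4], [2, 0, 3, 5]))  # [2.0, 0.0, 3.0, 5.0]
```

Here the second input carries all the high-frequency detail, so the max-magnitude rule reproduces it exactly; in practice the rule selects the sharper source per coefficient.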
This document describes a narrow band region approach for 2D and 3D image segmentation using deformable curves and surfaces. Specifically, it develops a region energy term involving a fixed-width band around the evolving curve or surface. This energy achieves a balance between local gradient features and global region statistics. The region energy is formulated to allow efficient computation on explicit parametric models and implicit level set models. Two different region terms are introduced, each suited to different configurations of the target object and its surroundings. The document derives the mathematical framework for computing the region energies and their gradients to allow minimization via gradient descent. It then discusses numerical implementations and provides experiments segmenting medical and natural images.
Color Image Watermarking using Cycle Spinning based Sharp Frequency Localized...CSCJournals
This paper describes a new approach for color image watermarking using the Cycle Spinning based Sharp Frequency Localized Contourlet Transform and Principal Component Analysis. The approach starts by decomposing the images into subbands with the Contourlet Transform (CT), applied successively to each color channel of both the host and watermark images. The principal components of the middle subbands are then used for the embedding operation. The ordinary contourlet transform suffers from poor frequency localization, and since localization is a key criterion for watermarking, the conventional CT is not well suited to it. The Sharp Frequency Localized Contourlet overcomes this problem but lacks translation invariance; hence the cycle spinning based sharp frequency localized contourlet is chosen for watermarking. Embedding in the middle-level subbands preserves the curved nature of edges in the host image, so less disturbance is observed when the host and watermarked images are compared, which yields a very good Peak Signal-to-Noise Ratio (PSNR). Instead of directly adding the mid-frequency components of the watermark to the host image, only the principal components are added; this reduces the embedded payload and hence the distortion of the host image. Using principal components also aids reliable extraction of the watermark from the host image, giving good correlation between the input watermark and the extracted one. The technique shows very high robustness under various intentional and unintentional attacks.
Study on Reconstruction Accuracy using shapiness index of morphological trans...ijcseit
Basins, lakes, and pore-grain spaces are important geophysical shapes that can be fitted with several
classical and fractal binary shapes; they are processed here using morphological transformations and
related methods. The skeleton network (the minimum morphological information) is decomposed using
various classical structuring elements such as the square, octagon, and rhombus, and the dilated subsets
of the respective degrees are then derived with these structuring elements to reconstruct the original
image. Using the shapiness index of the pattern spectrum, we test the reconstruction accuracy in a
quantitative manner. This provides a general procedure for characterising the shape-size complexity of
surface water bodies. Reconstruction accuracy is evaluated against the size of the water bodies, with
examples of different shapiness indices for different structuring-element shapes; this quantitative
approach yields a better reconstruction level. The complexity of the water bodies is also compared
across surfaces.
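Reconstruction from a skeleton by dilation, the operation whose accuracy the shapiness index measures, can be sketched as the union of each skeleton point dilated by its radius. The skeleton points, radii, and square structuring element below are hypothetical, not taken from the paper.

```python
def dilate(points, se, times=1):
    """Dilate a set of (x, y) pixels by structuring element `se`, `times` times."""
    for _ in range(times):
        points = {(x + dx, y + dy) for (x, y) in points for (dx, dy) in se}
    return points

# a 3x3 square structuring element
SQUARE = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

# hypothetical skeleton points with their radii (degrees of dilation);
# the reconstruction is the union of each point dilated by its radius
skeleton = {(0, 0): 2, (4, 0): 1}
reconstruction = set()
for point, radius in skeleton.items():
    reconstruction |= dilate({point}, SQUARE, radius)
print(len(reconstruction))  # 25 + 9 non-overlapping pixels = 34
```

Swapping SQUARE for an octagon or rhombus changes the recovered shape, which is exactly the structuring-element dependence the abstract's comparison studies.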
This project report summarizes research on using deep learning methods for super resolution of document images to improve optical character recognition (OCR) accuracy. Two approaches are studied: SRCNN and incorporating OCR accuracy into the loss function. Experiments include modifying the dataset to focus on text regions and varying the SRCNN architecture. Results show the 2-layer SRCNN trained on the modified dataset achieves the best OCR accuracy, exceeding the ground truth in some cases.
This document compares the performance of three mobile ad hoc network (MANET) routing protocols: AODV, FSR, and IERP. It uses the QualNet network simulator to evaluate these protocols based on various metrics like throughput, average jitter, average end-to-end delay, and packet delivery ratio. The protocols are evaluated under different node speeds on a grid topology network with 90 nodes over an area of 1500x1500 meters. Simulation results show that AODV generally performs best in terms of throughput and packet delivery ratio across varying node speeds, while FSR performs worst for these metrics. IERP shows the worst performance for average end-to-end delay and average jitter as node speed increases.
Remote sensing image fusion using contourlet transform with sharp frequency l...Zac Darcy
This paper addresses four different aspects of the remote sensing image fusion: i) image fusion method, ii)
quality analysis of fusion results, iii) effects of image decomposition level, and iv) importance of image
registration. First, a new contourlet-based image fusion method is presented, which is an improvement
over the wavelet-based fusion. This fusion method is then utilized within the main fusion process to analyze
the final fusion results. Fusion framework, scheme and datasets used in the study are discussed in detail.
Second, quality analysis of the fusion results is discussed using various quantitative metrics for both spatial
and spectral analyses. Our results indicate that the proposed contourlet-based fusion method performs
better than the conventional wavelet-based fusion methods in terms of both spatial and spectral analyses.
Third, we conducted an analysis on the effects of the image decomposition level and observed that the
decomposition level of 3 produced better fusion results than both smaller and greater number of levels.
Last, we created four different fusion scenarios to examine the importance of the image registration. As a
result, the feature-based image registration using the edge features of the source images produced better
fusion results than the intensity-based image registration.
DUAL POLYNOMIAL THRESHOLDING FOR TRANSFORM DENOISING IN APPLICATION TO LOCAL ...ijma
Thresholding operators have been used successfully for denoising signals, mostly in the wavelet domain.
These operators transform a noisy coefficient into a denoised coefficient with a mapping that depends on
signal statistics and the value of the noisy coefficient itself. This paper demonstrates that a polynomial
threshold mapping can be used for enhanced denoising of Principal Component Analysis (PCA) transform
coefficients. In particular, two polynomial threshold operators are used here to map the coefficients
obtained with the popular local pixel grouping method (LPG-PCA), which eventually improves the
denoising power of LPG-PCA. The method reduces the computational burden of LPG-PCA, by eliminating
the need for a second iteration in most cases. Quality metrics and visual assessment show the improvement.
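A polynomial threshold operator of the kind described can be sketched as follows: identity outside the threshold band, an odd polynomial of the normalized coefficient inside it. The polynomial weights here are illustrative, not the fitted values from the paper.

```python
def poly_threshold(c, t, coeffs):
    """Polynomial threshold operator: identity outside [-t, t], an odd
    polynomial in the normalized coefficient inside. `coeffs` holds the
    odd-power weights (illustrative values, not fitted ones)."""
    if abs(c) >= t:
        return c
    u = c / t  # normalize to [-1, 1]
    return t * sum(a * u ** (2 * k + 1) for k, a in enumerate(coeffs))

# weights summing to 1 keep the mapping continuous at |c| = t
weights = [0.1, 0.9]   # 0.1*u + 0.9*u**3
noisy = [-3.0, -0.5, 0.2, 1.2]
print([round(poly_threshold(c, 1.0, weights), 4) for c in noisy])
# [-3.0, -0.1625, 0.0272, 1.2]
```

Small coefficients (likely noise) are shrunk strongly toward zero while large ones pass through, which is the same qualitative behavior as soft thresholding but with a tunable shape.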
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...ijcsit
This paper introduces a novel approach for efficient video categorization. It relies on two main
components. The first one is a new relational clustering technique that identifies video key frames by
learning cluster-dependent Gaussian kernels. The proposed algorithm, called the clustering and Local Scale
Learning algorithm (LSL), learns the underlying cluster-dependent dissimilarity measure while finding
compact clusters in the given dataset. The learned measure is a Gaussian dissimilarity function defined
with respect to each cluster. A single objective function is minimized to obtain both the optimal partition and the
cluster-dependent parameters. This optimization is done iteratively by dynamically updating the partition
and the local measure. The kernel learning task exploits the unlabeled data and, reciprocally, the
categorization task takes advantage of the locally learned kernel. The second component of the proposed
video categorization system consists of discovering the video categories in an unsupervised manner using
the proposed LSL. We illustrate the clustering performance of LSL on synthetic 2D datasets and on high
dimensional real data. Also, we assess the proposed video categorization system using a real video
collection and LSL algorithm.
REGION CLASSIFICATION BASED IMAGE DENOISING USING SHEARLET AND WAVELET TRANSF...csandit
This paper proposes a neural network based region classification technique that classifies
regions in an image into two classes: textures and homogeneous regions. The classification is
based on training a neural network with statistical parameters belonging to the regions of
interest. This classification method is applied to image denoising by applying
different transforms to the two classes: textures are denoised by shearlets, while
homogeneous regions are denoised by wavelets. The denoised results show better performance
than either of the transforms applied independently. The proposed algorithm successfully
reduces the mean square error of the denoised result and provides perceptually good results.
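The route-by-region idea can be sketched with a simple stand-in for the paper's neural classifier: here local variance separates texture blocks from homogeneous ones, and each class would then be sent to its own denoiser. The threshold and sample blocks are hypothetical.

```python
def variance(block):
    """Population variance of a flattened pixel block."""
    n = len(block)
    mean = sum(block) / n
    return sum((x - mean) ** 2 for x in block) / n

def classify(block, threshold=50.0):
    """Heuristic stand-in for the paper's neural classifier: high local
    variance -> 'texture' (route to shearlet denoiser), low -> 'homogeneous'
    (route to wavelet denoiser)."""
    return "texture" if variance(block) > threshold else "homogeneous"

blocks = [[10, 12, 11, 9], [0, 40, 5, 35]]
print([classify(b) for b in blocks])  # ['homogeneous', 'texture']
```

The paper's network replaces this single statistic with several learned ones, but the downstream routing of each class to a different transform is the same.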
This document proposes using deep neural networks for change detection in synthetic aperture radar (SAR) images. It summarizes the classic difference image method for SAR change detection and its challenges. The proposed method uses a restricted Boltzmann machine deep neural network to directly classify pixels in the original SAR images as changed or unchanged, without generating a difference image. It involves pre-classifying the images using fuzzy c-means clustering to label pixels for training the neural network. The document describes establishing and training the deep neural network to learn image features and classify pixels.
IRJET- Crowd Density Estimation using Novel Feature Descriptor IRJET Journal
This document proposes a new texture feature-based approach for crowd density estimation using Completed Local Binary Pattern (CLBP). The approach divides images into blocks and further divides blocks into cells to compute CLBP features for each cell. A multi-class Support Vector Machine (SVM) classifier is trained to classify each block into one of four crowd density categories (Very Low, Low, Medium, High). Experiments on the PETS 2009 dataset show the proposed CLBP descriptor achieves 95% accuracy and outperforms other texture descriptors for crowd density estimation.
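The descriptor underlying CLBP starts from the basic LBP code, sketched below for a single 3x3 patch. CLBP additionally encodes sign and magnitude components and the centre level; this sketch shows only the classic sign pattern, and the sample patch is made up.

```python
def lbp_code(patch):
    """Basic 3x3 LBP code: threshold the 8 neighbours at the centre value and
    pack the resulting bits (CLBP extends this with magnitude and centre
    components)."""
    c = patch[1][1]
    # neighbours read clockwise from the top-left corner
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << i
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # 241
```

A histogram of these codes over each cell, concatenated across the block, is the texture feature the SVM classifies into the four density categories.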
Higher Order Feature Set For Underwater Noise Classification CSCJournals
The development of intelligent systems for classification of underwater noise sources has been a field of research interest for decades. Such systems include the extraction of features from the received signals, followed by the application of suitable classification algorithms. Most of the existing feature extraction methods rely on the classical power spectral methods, which may fail to provide information pertaining to the deviations from linearity and Gaussianity of stochastic processes. Hence, many recent research efforts focus on higher order spectral methods in order to prevail over such limitations. This paper makes use of bispectrum, which is a higher order spectrum of order three, in order to extract a set of robust features for the classification of underwater noise sources. An SVM classifier is used for evaluating the performance of the feature set.
This document presents a new method for image compression called Haar Wavelet Based Joint Compression Method Using Adaptive Fractal Image Compression (DWT+AFIC). It combines discrete wavelet transform with an existing adaptive fractal image compression technique to improve compression ratio and reconstructed image quality compared to previous fractal image compression methods. The document introduces fractal image compression and its limitations, describes the proposed DWT+AFIC method and 5 other compression techniques, provides simulation results on test images showing DWT+AFIC achieves higher peak signal to noise ratios and compression ratios than other methods, and concludes DWT+AFIC decreases encoding time while increasing compression ratio and maintaining reconstructed image quality.
Survey on clustering based color image segmentation and novel approaches to f...eSAT Journals
Abstract: Segmentation is an important image processing technique that helps to analyze an image automatically. Applications involving detection or recognition of objects in images often include a segmentation process. This paper describes two unsupervised clustering based color image segmentation techniques, namely K-means clustering and Fuzzy C-means (FCM) clustering. The advantages and disadvantages of both the K-means and Fuzzy C-means algorithms are also presented. The K-means algorithm takes less computation time than the Fuzzy C-means algorithm, which produces results close to those of K-means. On the other hand, in the FCM algorithm each pixel of an image can have membership in more than one cluster, which is not the case for K-means, an advantage of the FCM method. Color images contain a wide variety of information and are more complicated than grayscale images; although color image segmentation is a challenging task, it provides a path for image analysis in practical application fields. Secondly, some novel extensions of the FCM algorithm for better image segmentation are also discussed, such as SFCM (Spatial FCM) and THFCM (Thresholding FCM). The basic FCM algorithm does not take the spatial information of the image into consideration. SFCM focuses specifically on spatial details and contributes toward better segmentation results for image analysis; it introduces a spatial function into the FCM membership function and then operates with the available spatial information. THFCM is another approach that focuses on a thresholding technique for image segmentation; its main task is to find a discriminant cluster that acts as an automatic threshold. These two approaches show how better segmentation results can be obtained.
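The K-means half of the comparison can be sketched on scalar intensities. This is a toy 1-D version: real color segmentation clusters 3-D RGB or Lab vectors, FCM would replace the hard assignment with soft memberships, and the naive initialization below is for illustration only.

```python
def kmeans(pixels, k, iters=10):
    """Plain k-means on scalar pixel values (a toy 1-D stand-in for
    clustering RGB vectors when segmenting a color image)."""
    centers = list(pixels[:k])  # naive initialization from the first k pixels
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # hard assignment to the nearest center (FCM would use memberships)
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# two intensity populations -> two segment labels
pixels = [10, 12, 11, 9, 200, 205, 198, 202]
print(sorted(kmeans(pixels, 2)))  # [10.5, 201.25]
```

Each pixel's final label is the index of its nearest center, so the image splits into a dark and a bright segment; FCM would instead give every pixel a graded membership in both.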
Review of Diverse Techniques Used for Effective Fractal Image Compression IRJET Journal
This document reviews different techniques for fractal image compression to enhance compression ratio while maintaining image quality. It discusses algorithms like quadtree partitioning with Huffman coding (QPHC), discrete cosine transform based fractal image compression (DCT-FIC), discrete wavelet transform based fractal image compression (DWTFIC), and Grover's quantum search algorithm based fractal image compression (QAFIC). The document also analyzes works applying these techniques and concludes that combining QAFIC with the tiny block size processing algorithm may further improve compression ratio with minimal quality loss.
This document summarizes JPEG image compression techniques. It discusses how images are divided into blocks and transformed from the spatial domain to the frequency domain using the Discrete Cosine Transform (DCT). It then describes how the DCT coefficients are quantized and arranged in zigzag order before entropy encoding with Huffman coding. The goal of JPEG compression is to store image data using as little space as possible while maintaining enough visual detail. The techniques discussed aim to remove irrelevant and redundant image data through DCT, quantization, and entropy encoding.
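Two of the steps described, quantization and zigzag ordering, can be sketched directly. This is a toy 4x4 example with a flat quantization table; baseline JPEG uses 8x8 blocks, perceptually tuned tables, and round-half-away rounding rather than Python's banker's rounding.

```python
def quantize(coeffs, qtable):
    """Divide DCT coefficients elementwise by the quantization table and
    round; this is the lossy step of JPEG."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qtable)]

def zigzag(block):
    """Read an N x N block in JPEG zigzag order (low frequencies first):
    traverse anti-diagonals, alternating direction on odd/even diagonals."""
    n = len(block)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i][j] for i, j in order]

# toy 4x4 DCT block (energy concentrated in the low-frequency corner)
coeffs = [[80, 24, 6, 1], [20, 10, 2, 0], [4, 2, 1, 0], [1, 0, 0, 0]]
qtable = [[8] * 4 for _ in range(4)]
z = zigzag(quantize(coeffs, qtable))
print(z)  # the long trailing run of zeros is cheap to entropy-code
```

Because the DCT concentrates energy at low frequencies, quantization zeroes most high-frequency coefficients, and zigzag ordering gathers those zeros into one run that Huffman/run-length coding compresses efficiently.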
An improved image compression algorithm based on daubechies wavelets with ar...Alexander Decker
This document summarizes an academic article that proposes a new image compression algorithm using Daubechies wavelets and arithmetic coding. It first discusses existing image compression techniques and their limitations. It then describes the proposed algorithm, which applies Daubechies wavelet transform followed by 2D Walsh wavelet transform on image blocks and arithmetic coding. Results show the proposed method achieves higher compression ratios and PSNR values than existing algorithms like EZW and SPIHT. Future work aims to improve results by exploring different wavelets and compression techniques.
This document proposes an Adaptive Resolution Enhancement Algorithm (AREA) using an Edge Targeted Filter (ETF) and Dual Tree Complex Wavelet Transform (DTCWT) to enhance the resolution and image quality of low-quality, low-resolution images. The ETF first detects edges in an input image and generates a mask. It then focuses on improving edge reconstruction. The output is input to DTCWT, which analyzes subband coefficients to detect flaws and interpolates coefficients to enhance spectral resolution. The inverse DTCWT produces a spatially enhanced output image with improved resolution up to 17% compared to existing techniques. Simulation results demonstrate the effectiveness of the proposed algorithm.
The mean shift procedure is a general nonparametric technique for analyzing complex multimodal feature spaces and delineating arbitrarily shaped clusters. It works by recursively finding the nearest stationary point of the underlying density function, which corresponds to the mode of the density. The mean shift procedure relates to kernel density estimation and robust M-estimators of location. It provides a versatile tool for feature space analysis that can solve many low-level computer vision tasks with few parameters.
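The recursive mode-seeking update can be sketched in 1-D with a Gaussian kernel; the bandwidth, iteration count, and data below are illustrative, and real feature-space analysis runs the same update on multi-dimensional points.

```python
import math

def mean_shift(x, points, bandwidth=1.0, iters=50):
    """Iterate the mean-shift update with a Gaussian kernel; x converges to
    the nearest mode of the kernel density estimate of `points`."""
    for _ in range(iters):
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
        x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
    return x

data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]   # two modes, near 1 and 5
print(round(mean_shift(0.0, data), 1), round(mean_shift(6.0, data), 1))
```

Starting points in different basins of attraction converge to different modes, which is how the procedure delineates arbitrarily shaped clusters with the bandwidth as essentially its only parameter.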
PDE BASED FEATURES FOR TEXTURE ANALYSIS USING WAVELET TRANSFORM IJCI JOURNAL
This document summarizes a research paper that proposes a novel method for texture analysis using wavelet transforms and partial differential equations (PDEs). The method involves applying wavelet transforms to images to obtain directional information. Anisotropic diffusion, a PDE technique, is then used on the directional information to compute a texture approximation. Various statistical features are extracted from the texture approximation. Linear discriminant analysis enhances class separability of the features before classification using k-nearest neighbors. The method is evaluated on the Brodatz texture dataset and results show it achieves better classification accuracy than other methods while having lower computational cost.
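Anisotropic diffusion, the PDE at the core of the method, can be illustrated in 1-D. This is a generic Perona-Malik sketch, not the paper's wavelet-domain variant; the kappa, lambda, and step values are arbitrary.

```python
def diffuse(signal, kappa=2.0, lam=0.2, steps=20):
    """1-D Perona-Malik anisotropic diffusion: smooths low-gradient noise
    while the edge-stopping function g suppresses flux across large jumps."""
    def g(d):
        # conductance: near 1 for small gradients, near 0 for large ones
        return 1.0 / (1.0 + (d / kappa) ** 2)
    s = list(signal)
    for _ in range(steps):
        out = list(s)
        for i in range(1, len(s) - 1):
            east = s[i + 1] - s[i]
            west = s[i - 1] - s[i]
            out[i] = s[i] + lam * (g(east) * east + g(west) * west)
        s = out
    return s

# noisy low plateau, a sharp edge, noisy high plateau
smoothed = diffuse([0, 1, 0, 1, 10, 11, 10, 11])
print([round(v, 1) for v in smoothed])
```

The small oscillations within each plateau are flattened while the jump between plateaus survives, which is why the diffused output serves as a texture approximation that keeps directional structure.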
This document summarizes an approach for object retrieval in videos using techniques from text retrieval. Regions of interest in video frames are detected and described using invariant descriptors. The descriptors are then vector quantized to form a "visual vocabulary" of words. Frames are represented as vectors of word frequencies and ranked based on similarity to query frames. The method is evaluated on two feature films, demonstrating immediate retrieval of all frames containing a queried object throughout the videos.
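The text-retrieval analogy can be sketched end to end: descriptors are quantized to their nearest "visual word", frames become term-frequency vectors, and ranking uses cosine similarity. The 2-D descriptors and 3-word vocabulary below are toy stand-ins for SIFT-like descriptors and a large learned codebook.

```python
import math

def quantize(descriptor, vocab):
    """Assign a descriptor to its nearest visual word (codebook index)."""
    return min(range(len(vocab)),
               key=lambda i: sum((d - v) ** 2 for d, v in zip(descriptor, vocab[i])))

def bag_of_words(descriptors, vocab):
    """Term-frequency vector of visual-word counts for one frame."""
    tf = [0] * len(vocab)
    for d in descriptors:
        tf[quantize(d, vocab)] += 1
    return tf

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

vocab = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]          # toy 2-D "visual vocabulary"
frame = [(0.1, 0.1), (0.9, 1.0), (0.2, 0.0), (1.1, 0.9)]
query = [(0.0, 0.2), (1.0, 0.8)]
score = cosine(bag_of_words(frame, vocab), bag_of_words(query, vocab))
print(bag_of_words(frame, vocab), round(score, 2))  # [2, 2, 0] 1.0
```

Ranking every frame's vector against the query vector this way is what makes retrieval immediate: the per-frame vectors are precomputed once, exactly as a text search engine indexes documents.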
EFFICIENT IMAGE COMPRESSION USING LAPLACIAN PYRAMIDAL FILTERS FOR EDGE IMAGES ijcnac
This project presents a new image compression technique for the coding of retinal and
fingerprint images. Retinal images are used to detect diseases like diabetes or
hypertension. Fingerprint images are used for the security purpose. In this work, the
contourlet transform of the retinal and fingerprint image is taken first. The coefficients of
the contourlet transform are quantized using adaptive multistage vector quantization
scheme. The number of code vectors in the adaptive vector quantization scheme depends
on the dynamic range of the input image.
This document provides information on several remote sensing projects from IEEE 2015. It lists the titles, languages, and abstracts for 8 projects related to classification and analysis of hyperspectral and multispectral images. The projects focus on techniques such as sparse representation in tangent space, Gabor feature-based collaborative representation, level set evolutions for object extraction, and dimension reduction using spatial and spectral regularization.
DUAL POLYNOMIAL THRESHOLDING FOR TRANSFORM DENOISING IN APPLICATION TO LOCAL ... (ijma)
Thresholding operators have been used successfully for denoising signals, mostly in the wavelet domain.
These operators transform a noisy coefficient into a denoised coefficient with a mapping that depends on
signal statistics and the value of the noisy coefficient itself. This paper demonstrates that a polynomial
threshold mapping can be used for enhanced denoising of Principal Component Analysis (PCA) transform
coefficients. In particular, two polynomial threshold operators are used here to map the coefficients
obtained with the popular local pixel grouping method (LPG-PCA), which eventually improves the
denoising power of LPG-PCA. The method reduces the computational burden of LPG-PCA, by eliminating
the need for a second iteration in most cases. Quality metrics and visual assessment show the improvement.
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA... (ijcsit)
This paper introduces a novel approach for efficient video categorization. It relies on two main
components. The first one is a new relational clustering technique that identifies video key frames by
learning cluster-dependent Gaussian kernels. The proposed algorithm, called the clustering and Local Scale Learning algorithm (LSL), learns the underlying cluster-dependent dissimilarity measure while finding compact clusters in the given dataset. The learned measure is a Gaussian dissimilarity function defined with respect to each cluster. A single objective function is minimized to obtain both the optimal partition and the cluster-dependent parameters. This optimization is carried out iteratively by dynamically updating the partition and the local measure. The kernel learning task exploits the unlabeled data and, reciprocally, the categorization task takes advantage of the locally learned kernel. The second component of the proposed video categorization system consists of discovering the video categories in an unsupervised manner using the proposed LSL. We illustrate the clustering performance of LSL on synthetic 2D datasets and on high-dimensional real data, and we assess the proposed video categorization system using a real video collection.
REGION CLASSIFICATION BASED IMAGE DENOISING USING SHEARLET AND WAVELET TRANSF... (csandit)
This paper proposes a neural network based region classification technique that classifies regions in an image into two classes: textures and homogeneous regions. The classification is based on training a neural network with statistical parameters belonging to the regions of interest. This classification is then applied to image denoising by using different transforms for the two classes: textures are denoised with shearlets, while homogeneous regions are denoised with wavelets. The denoised results show better performance than either transform applied independently. The proposed algorithm successfully reduces the mean square error of the denoised result and provides perceptually good results.
This document proposes using deep neural networks for change detection in synthetic aperture radar (SAR) images. It summarizes the classic difference image method for SAR change detection and its challenges. The proposed method uses a restricted Boltzmann machine deep neural network to directly classify pixels in the original SAR images as changed or unchanged, without generating a difference image. It involves pre-classifying the images using fuzzy c-means clustering to label pixels for training the neural network. The document describes establishing and training the deep neural network to learn image features and classify pixels.
IRJET- Crowd Density Estimation using Novel Feature Descriptor (IRJET Journal)
This document proposes a new texture feature-based approach for crowd density estimation using Completed Local Binary Pattern (CLBP). The approach divides images into blocks and further divides blocks into cells to compute CLBP features for each cell. A multi-class Support Vector Machine (SVM) classifier is trained to classify each block into one of four crowd density categories (Very Low, Low, Medium, High). Experiments on the PETS 2009 dataset show the proposed CLBP descriptor achieves 95% accuracy and outperforms other texture descriptors for crowd density estimation.
Higher Order Feature Set For Underwater Noise Classification (CSCJournals)
The development of intelligent systems for classification of underwater noise sources has been a field of research interest for decades. Such systems include the extraction of features from the received signals, followed by the application of suitable classification algorithms. Most of the existing feature extraction methods rely on the classical power spectral methods, which may fail to provide information pertaining to the deviations from linearity and Gaussianity of stochastic processes. Hence, many recent research efforts focus on higher order spectral methods in order to prevail over such limitations. This paper makes use of bispectrum, which is a higher order spectrum of order three, in order to extract a set of robust features for the classification of underwater noise sources. An SVM classifier is used for evaluating the performance of the feature set.
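A minimal single-frame bispectrum estimate is sketched below; a practical classifier would average the estimate over many signal segments, and the test signal here is purely illustrative:

```python
import numpy as np

def bispectrum(x):
    # Single-segment estimate of B(f1, f2) = X(f1) X(f2) conj(X(f1 + f2));
    # frequency indices wrap modulo n.
    X = np.fft.fft(x)
    n = len(x)
    f = np.arange(n)
    return X[:, None] * X[None, :] * np.conj(X[(f[:, None] + f[None, :]) % n])

sig = np.cos(2 * np.pi * 5 * np.arange(64) / 64)   # illustrative tone
B = bispectrum(sig)                                 # 64 x 64 complex array
```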
This document presents a new method for image compression called Haar Wavelet Based Joint Compression Method Using Adaptive Fractal Image Compression (DWT+AFIC). It combines discrete wavelet transform with an existing adaptive fractal image compression technique to improve compression ratio and reconstructed image quality compared to previous fractal image compression methods. The document introduces fractal image compression and its limitations, describes the proposed DWT+AFIC method and 5 other compression techniques, provides simulation results on test images showing DWT+AFIC achieves higher peak signal to noise ratios and compression ratios than other methods, and concludes DWT+AFIC decreases encoding time while increasing compression ratio and maintaining reconstructed image quality.
Survey on clustering based color image segmentation and novel approaches to f... (eSAT Journals)
Abstract: Segmentation is an important image processing technique that helps to analyze an image automatically. Applications involving detection or recognition of objects in images often include a segmentation step. This paper describes two unsupervised clustering based color image segmentation techniques, namely K-means clustering and Fuzzy C-means (FCM) clustering, and presents the advantages and disadvantages of both. The K-means algorithm takes less computation time than Fuzzy C-means, which produces results close to those of K-means. On the other hand, in the FCM algorithm each pixel of an image can have membership in more than one cluster, which is not the case in K-means, an advantage of the FCM method. Color images contain a wide variety of information and are more complicated than gray scale images; though color image segmentation is a challenging task, it provides a path to image analysis in practical application fields. The paper also discusses some novel extensions of the FCM algorithm for better image segmentation, such as SFCM (Spatial FCM) and THFCM (Thresholding FCM). The basic FCM algorithm does not take the spatial information of the image into consideration. SFCM focuses on spatial details: it introduces a spatial function into the FCM membership function and then operates with the available spatial information. THFCM is another approach that focuses on thresholding for image segmentation; its main task is to find a discerner cluster that acts as an automatic threshold. These two approaches show how better segmentation results can be obtained.
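The soft-membership idea that distinguishes FCM from K-means can be sketched as follows; this is a generic FCM implementation, not the SFCM or THFCM variants, and the data is illustrative:

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=100, seed=0):
    # Fuzzy C-means: each point holds a membership in [0, 1] to every cluster
    # (unlike K-means' hard assignment). Alternate center and membership updates.
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1))
        U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return U, centers

X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
              [10.0, 10.0], [10.2, 9.9], [9.8, 10.1]])
U, centers = fcm(X, c=2)       # memberships sum to 1 across clusters
```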
Review of Diverse Techniques Used for Effective Fractal Image Compression (IRJET Journal)
This document reviews different techniques for fractal image compression to enhance compression ratio while maintaining image quality. It discusses algorithms like quadtree partitioning with Huffman coding (QPHC), discrete cosine transform based fractal image compression (DCT-FIC), discrete wavelet transform based fractal image compression (DWTFIC), and Grover's quantum search algorithm based fractal image compression (QAFIC). The document also analyzes works applying these techniques and concludes that combining QAFIC with the tiny block size processing algorithm may further improve compression ratio with minimal quality loss.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
This document summarizes JPEG image compression techniques. It discusses how images are divided into blocks and transformed from the spatial domain to the frequency domain using the Discrete Cosine Transform (DCT). It then describes how the DCT coefficients are quantized and arranged in zigzag order before entropy encoding with Huffman coding. The goal of JPEG compression is to store image data using as little space as possible while maintaining enough visual detail. The techniques discussed aim to remove irrelevant and redundant image data through DCT, quantization, and entropy encoding.
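The DCT-quantize-zigzag chain described above can be sketched as follows. A uniform quantizer stands in for JPEG's 8x8 quantization tables, so the numbers are illustrative rather than standard-conformant:

```python
import numpy as np

def dct2(block):
    # 8x8 2-D DCT-II via the orthonormal DCT matrix: D @ block @ D.T
    n = 8
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D @ block @ D.T

def zigzag_order(n=8):
    # Indices of an n x n block in JPEG zigzag scan order: walk anti-diagonals,
    # alternating direction on odd/even diagonal sums.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

block = np.full((8, 8), 130.0)                       # a flat 8x8 image block
coeffs = dct2(block - 128.0)                         # level-shift, then transform
quantized = np.round(coeffs / 16.0)                  # uniform quantizer (illustrative)
scan = [quantized[r, c] for r, c in zigzag_order()]  # ready for entropy coding
```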
An improved image compression algorithm based on daubechies wavelets with ar... (Alexander Decker)
This document summarizes an academic article that proposes a new image compression algorithm using Daubechies wavelets and arithmetic coding. It first discusses existing image compression techniques and their limitations. It then describes the proposed algorithm, which applies Daubechies wavelet transform followed by 2D Walsh wavelet transform on image blocks and arithmetic coding. Results show the proposed method achieves higher compression ratios and PSNR values than existing algorithms like EZW and SPIHT. Future work aims to improve results by exploring different wavelets and compression techniques.
This document proposes an Adaptive Resolution Enhancement Algorithm (AREA) using an Edge Targeted Filter (ETF) and Dual Tree Complex Wavelet Transform (DTCWT) to enhance the resolution and image quality of low-quality, low-resolution images. The ETF first detects edges in an input image and generates a mask. It then focuses on improving edge reconstruction. The output is input to DTCWT, which analyzes subband coefficients to detect flaws and interpolates coefficients to enhance spectral resolution. The inverse DTCWT produces a spatially enhanced output image with improved resolution up to 17% compared to existing techniques. Simulation results demonstrate the effectiveness of the proposed algorithm.
The mean shift procedure is a general nonparametric technique for analyzing complex multimodal feature spaces and delineating arbitrarily shaped clusters. It works by recursively finding the nearest stationary point of the underlying density function, which corresponds to the mode of the density. The mean shift procedure relates to kernel density estimation and robust M-estimators of location. It provides a versatile tool for feature space analysis that can solve many low-level computer vision tasks with few parameters.
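The mean shift step described above can be sketched as a simple fixed-point iteration; the bandwidth and the synthetic point cloud are illustrative:

```python
import numpy as np

def mean_shift_point(x, data, bandwidth=1.0, iters=50, tol=1e-6):
    # Repeatedly move x to the Gaussian-kernel weighted mean of the data
    # (the mean shift step); the fixed point is a mode of the density estimate.
    for _ in range(iters):
        w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * bandwidth ** 2))
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

rng = np.random.default_rng(1)
cluster = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
mode = mean_shift_point(np.array([4.0, 4.0]), cluster)   # climbs to the density mode
```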
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Implementation of Fuzzy Logic for the High-Resolution Remote Sensing Images w... (IOSR Journals)
This document describes an implementation of fuzzy logic for high-resolution remote sensing image classification with improved accuracy. It discusses using an object-based approach with fuzzy rules to classify urban land covers in a satellite image. The approach involves image segmentation using k-means clustering or ISODATA clustering. Features are then extracted from the image objects and fuzzy logic is applied to classify the objects based on membership functions. The method was tested on different sensor and resolution images in MATLAB and showed improved classification accuracy over other techniques, achieving lower entropy in results. Future work planned includes designing an unsupervised classification model combining k-means clustering and fuzzy-based object orientation.
Multi Wavelet for Image Retrival Based On Using Texture and Color Querys (IOSR Journals)
This document summarizes a research paper on using multi-wavelet transforms for content-based image retrieval. The paper proposes extracting multi-wavelet features from images in a database and query images to measure similarity. It calculates energy levels from multi-wavelet sub-bands and uses Canberra distance between feature vectors to retrieve similar images. The method achieves 98.5% accuracy and is faster than using Gabor wavelets. In conclusion, multi-wavelet transforms provide good performance for content-based image retrieval applications.
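The Canberra distance used for ranking feature vectors can be computed directly; the two feature vectors below are illustrative, not from the paper:

```python
import numpy as np

def canberra(a, b):
    # Canberra distance: sum of |a_i - b_i| / (|a_i| + |b_i|); coordinates
    # where both entries are zero contribute nothing.
    num = np.abs(a - b)
    den = np.abs(a) + np.abs(b)
    mask = den > 0
    return float((num[mask] / den[mask]).sum())

q = np.array([0.8, 0.1, 0.0])    # query feature vector (illustrative)
db = np.array([0.4, 0.1, 0.2])   # database feature vector (illustrative)
# terms: 0.4/1.2 + 0 + 0.2/0.2 = 4/3
```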
This document summarizes research on using the discrete curvelet transform for content-based image retrieval.
The researchers propose extracting texture features from images using discrete curvelet transforms. Low-level statistics like mean and standard deviation are computed from the curvelet coefficients to represent images. Experiments on the Brodatz texture database show the curvelet-based features significantly outperform Gabor filters and wavelets. Curvelets better capture edge information and directionality compared to other transforms. With a 5-level decomposition, average retrieval precision was 79.54% versus 70.3% for wavelets. In conclusion, curvelet transforms are a promising technique for texture-based image retrieval.
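The low-level statistics used as features reduce to something like the sketch below; the random arrays stand in for curvelet subband coefficients:

```python
import numpy as np

def subband_features(subbands):
    # Texture descriptor: mean and standard deviation of the absolute
    # transform coefficients in each subband, concatenated into one vector.
    feats = []
    for sb in subbands:
        a = np.abs(sb)
        feats.extend([a.mean(), a.std()])
    return np.array(feats)

bands = [np.random.default_rng(i).normal(size=(16, 16)) for i in range(6)]
fv = subband_features(bands)       # 2 features per subband -> length 12
```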
This document proposes EfficientDet, a new family of object detectors that achieve better accuracy and efficiency across a wide range of resource constraints. The key contributions are:
1. A weighted bi-directional feature pyramid network (BiFPN) that introduces learnable weights to efficiently fuse multi-scale features from different levels.
2. A compound scaling method that jointly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks for higher accuracy.
3. Combining EfficientNet backbones with BiFPN and compound scaling, EfficientDet achieves state-of-the-art 52.2% AP on COCO while being 4x smaller and using 13x fewer FLOPs than previous detectors.
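The compound scaling idea can be sketched as a single coefficient phi that grows input resolution, network depth, and channel width together. The base values and growth constants below are assumptions for illustration, not the paper's exact configuration:

```python
def compound_scale(phi, base_res=512, base_depth=3, base_width=64, width_growth=1.35):
    # Jointly scale resolution, depth, and width from one coefficient phi.
    return {
        "resolution": base_res + 128 * phi,              # input image size
        "depth": base_depth + phi,                       # feature-network layers
        "width": int(base_width * width_growth ** phi),  # channels
    }

# phi = 0 keeps the base configuration; larger phi scales all three dimensions.
```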
Semantic Image Retrieval Using Relevance Feedback (dannyijwest)
This paper presents an optimized interactive content-based image retrieval framework based on the AdaBoost learning method. Since relevance feedback (RF) is an online process, the learning process is optimized by considering the most positive image selection at each feedback iteration. AdaBoost is used to learn the system. The main contributions are addressing the small-training-sample problem and reducing retrieval time. Experiments are conducted on 1000 semantic colour images from the Corel database to demonstrate the effectiveness of the proposed framework. These experiments employed a large image database and combined RCWFs and DT-CWT texture descriptors to represent the content of the images.
Reconfigurable intelligent surface passive beamforming enhancement using uns... (IJECEIAES)
Reconfigurable intelligent surfaces (RIS) is a wireless technology that has the potential to improve cellular communication systems significantly. This paper considers enhancing the RIS beamforming in a RIS-aided multiuser multi-input multi-output (MIMO) system to enhance user throughput in cellular networks. The study offers an unsupervised/deep neural network (U/DNN) that simultaneously optimizes the intelligent surface beamforming with less complexity to overcome the non-convex sum-rate problem difficulty. The numerical outcomes comparing the suggested approach to the near-optimal iterative semi-definite programming strategy indicate that the proposed method retains most performance (more than 95% of optimal throughput value when the number of antennas is 4 and RIS’s elements are 30) while drastically reducing system computing complexity.
The document describes a new method called Cluster Sensing Superpixel (CSS) for generating superpixels. CSS identifies ideal cluster centers using pixel density, which reflects pixel concentration. CSS searches for local optimal cluster centers that have high pixel density (representativeness) and are isolated from other high density pixels. It is approximately 5 times faster than state-of-the-art methods while maintaining comparable performance. When applied to microscopy image segmentation, CSS benefits from its efficient implementation.
IRJET- Object Detection using Hausdorff Distance (IRJET Journal)
This document proposes a new object recognition system using Hausdorff distance. The system aims to improve on existing methods like YOLO that struggle with small objects and can capture garbage data. The document outlines preprocessing steps like noise cancellation, representing shapes as point sets, and extracting features. It then describes using Hausdorff distance and shape context to find the best match between input and reference shapes. Testing on datasets showed encouraging results for recognizing handwritten digits.
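Once shapes are represented as point sets, the Hausdorff distance between them can be computed directly; the small square example below is synthetic:

```python
import numpy as np

def directed_hausdorff(A, B):
    # For each point in A, find the distance to its nearest point in B;
    # the directed Hausdorff distance is the worst such case.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(A, B):
    # Symmetric Hausdorff distance between two point sets.
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
shifted = square + [0.2, 0.0]      # same shape translated by 0.2 in x
# hausdorff(square, shifted) == 0.2
```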
Omni-Modeler: Rapid Adaptive Visual Recognition with Dynamic Learning (sipij)
Deep neural network (DNN) image classification has grown rapidly as a general pattern detection tool for an extremely diverse set of applications, yet dataset accessibility remains a major limiting factor for many of them. This paper presents a novel dynamic learning approach that leverages pretrained knowledge in novel image spaces to extend the algorithm's knowledge domain and reduce dataset collection requirements. The proposed Omni-Modeler generates a dynamic knowledge set by reshaping known concepts to create dynamic representation models of unknown concepts. The Omni-Modeler embeds images with a pretrained DNN and formulates a compressed language encoding. The language-encoded feature space is then used to rapidly generate a dynamic dictionary of concept appearance models. The results of this study demonstrate the Omni-Modeler's capability to rapidly adapt across a range of image types, enabling dynamically learned image classification under limited data availability.
IRJET - Object Detection using Hausdorff Distance (IRJET Journal)
This document proposes using Hausdorff distance for object detection as it can better handle noise compared to other methods like Euclidean distance. The document discusses preprocessing images using Gaussian filtering for noise cancellation. It then represents shapes as point sets for feature extraction before using Hausdorff distance to match shapes between reference and test images for object recognition. Encouraging results were obtained when testing on MNIST, COIL and private handwritten digit datasets.
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
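A quick illustration of why Depthwise Separable Convolutions shrink models like the one described above: generic parameter-count arithmetic, independent of EDGE-Net's actual configuration:

```python
def standard_params(k, c_in, c_out):
    # A standard k x k convolution learns one k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise separable: one k x k filter per input channel (depthwise),
    # followed by a 1x1 pointwise convolution mixing channels.
    return k * k * c_in + c_in * c_out

# For a 3x3 layer with 64 -> 64 channels: 36864 vs 4672 parameters (~7.9x fewer).
```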
A HYBRID MORPHOLOGICAL ACTIVE CONTOUR FOR NATURAL IMAGES (IJCSEA Journal)
Morphological active contours for image segmentation have become popular due to their low
computational complexity coupled with their accurate approximation of the partial differential equations
involved in the energy minimization of the segmentation process. In this paper, a morphological active
contour which mimics the energy minimization of the popular Chan-Vese Active Contour without Edges is
coupled with a morphological edge-driven segmentation term to accurately segment natural images. By
using morphological approximations of the energy minimization steps, the algorithm has a low
computational complexity. Additionally, the coupling of the edge-based and region-based segmentation
techniques allows the proposed method to be robust and accurate. We will demonstrate the accuracy and
robustness of the algorithm using images from the Weizmann Segmentation Evaluation Database and
report on the segmentation results using the Sorensen-Dice similarity coefficient.
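The Sorensen-Dice similarity coefficient used for evaluation is straightforward to compute on binary masks; the tiny masks below are illustrative:

```python
import numpy as np

def dice(seg, gt):
    # Sorensen-Dice similarity: 2|A ∩ B| / (|A| + |B|) for binary masks.
    seg, gt = seg.astype(bool), gt.astype(bool)
    inter = np.logical_and(seg, gt).sum()
    return 2.0 * inter / (seg.sum() + gt.sum())

a = np.zeros((4, 4), int); a[:2, :] = 1     # predicted mask: 8 pixels
b = np.zeros((4, 4), int); b[:3, :] = 1     # ground truth: 12 pixels, overlap 8
# dice(a, b) == 2 * 8 / (8 + 12) == 0.8
```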
Build applications with generative AI on Google Cloud (Márton Kodok)
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs. We will see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use the API to: execute prompts in text and chat; cover multimodal use cases with image prompts; fine-tune and distill to improve knowledge domains; and run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps following current industry trends.
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/316874386
Gabor Convolutional Networks
Conference Paper · April 2017
CITATIONS: 0    READS: 86
6 authors, including:
Some of the authors of this publication are also working on these related projects:
Deep learning for image classification, action recognition, automatic target detection, remote sensing analysis View project
deep learning View project
Shangzhen Luan, Beihang University (BUAA): 7 publications, 27 citations
Baochang Zhang, Beihang University (BUAA): 93 publications, 1,839 citations
Chen Chen, University of North Carolina at Charlotte: 130 publications, 1,953 citations
Jungong Han, Technische Universiteit Eindhoven: 114 publications, 1,795 citations
All content following this page was uploaded by Chen Chen on 12 May 2017.
The user has requested enhancement of the downloaded file.
Gabor Convolutional Networks
Shangzhen Luan^1  luanshangzhen@buaa.edu.cn
Baochang Zhang^1  bczhang@139.com
Chen Chen^2  chenchen870713@gmail.com
Xianbin Cao^3  xbcao@buaa.edu.cn
Jungong Han^4  jungonghan77@gmail.com
Jianzhuang Liu^5  liu.jianzhuang@huawei.com

^1 School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
^2 Center for Research in Computer Vision (CRCV), University of Central Florida, Orlando, FL, USA
^3 School of Electronics and Information Engineering, Beihang University, Beijing, China
^4 Northumbria University, Newcastle, UK
^5 Huawei Company, Shenzhen, China
Abstract
Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and
endow features with the capability of dealing with spatial transformations. However, such
excellent properties have not been well explored in popular deep convolutional neural
networks (DCNNs). In this paper, we propose a new deep model, termed Gabor Convolutional
Networks (GCNs or Gabor CNNs), which incorporates Gabor filters into DCNNs
to enhance the resistance of deep learned features to orientation and scale changes.
By only manipulating the basic element of DCNNs, i.e., the convolution operator, based on
Gabor filters, GCNs can be easily implemented and are compatible with any popular
deep learning architecture. Experimental results demonstrate the superior capability of our
algorithm in recognizing objects where scale and rotation changes occur frequently.
The proposed GCNs have much fewer learnable network parameters and are thus easier
to train with an end-to-end pipeline.
1 Introduction
Anisotropic filtering techniques have been widely used to extract robust image representations.
Among them, Gabor wavelets, based on a sinusoidal plane wave with a particular
frequency and orientation, can characterize the spatial frequency structure of images while
preserving information about spatial relations, enabling the extraction of orientation-dependent
frequency contents of patterns. Recently, deep convolutional neural networks (DCNNs)
based on convolution filters have gained much attention in computer vision. This efficient,
© 2017. The copyright of this document resides with its authors.
It may be distributed unchanged freely in print or electronic forms.
arXiv:1705.01450v2 [cs.CV] 4 May 2017
Figure 1: Left illustrates AlexNet filters. Middle shows Gabor filters. Right presents
convolution filters modulated by Gabor filters. Filters are often redundantly learned in CNNs,
and some of them are similar to Gabor filters. Motivated by this fact, we actively
manipulate the learned filters using Gabor filters, so that the number of learned
filters can be reduced, leading to a compressed deep model. In the right column, a filter
is modulated by Gabor filters via Eq. 2 to enhance the orientation property.
scalable, and end-to-end model has the amazing capability of learning powerful feature representations
from raw pixels, boosting the performance of many computer vision tasks, such as
image classification, object detection, and semantic segmentation. Unlike hand-crafted filters
without any learning process, DCNN-based feature extraction is a purely data-driven
technique that can learn robust representations from data, but usually at the cost of expensive
training and complex model parameters. Additionally, the capability of modeling geometric
transformations mostly comes from extensive data augmentation, large models, and hand-crafted
modules (e.g., max-pooling [1] for small translation invariance). Therefore, DCNNs
normally fail to handle large and unknown object transformations if the training data are not
sufficient, one reason for which originates from the way filters are designed [1, 26].
Fortunately, the need to enhance model capacity for transformations has been perceived
by researchers, and some attempts have been made in recent years. In [3], a deformable
convolution filter was introduced to enhance DCNNs' capacity for modeling geometric transformations.
It allows free-form deformation of the sampling grid, whose offsets are learned
from the preceding feature maps. However, deformable filtering is still complicated and
associated with the RoI pooling technique originally designed for object detection [5]. In
[26], Actively Rotating Filters (ARFs) were proposed to give DCNNs the generalization ability
for rotation. However, such a filter rotation method is only suitable for small and
simple filters, i.e., with sizes of 1×1 and 3×3. Though the authors claimed a general method
to modulate the filters based on the Fourier transform, it was not implemented in [26]; one of
the reasons may lie in its computational complexity. Furthermore, 3D filters [10] can hardly be
modified by deformable filters or ARFs. In [8], by combining low-level filters (Gaussian
derivatives up to the 4th order) with learned weight coefficients, regularization over the
filter function space is shown to improve the generalization ability, but only when the set of
training data is small.
In Fig. 1, the visualization of convolutional filters [12] indicates that filters are often
redundantly learned, such as the AlexNet filters trained on ImageNet1, and that some of the
filters from shallow layers are similar to Gabor filters. It is known that the steerable properties
of Gabor filters dominate traditional filter design due to their enhanced capability of
scale and orientation decomposition of signals, which is unfortunately neglected in the prevailing
convolutional filters of DCNNs. Can we just learn a small set of filters, which are then
manipulated to create more, in a way similar to the design of Gabor filters? If so, one
obvious advantage is that we only need to learn a small set of filters for DCNNs, leading
to a more compact but enhanced deep model.
1For illustration purposes, AlexNet is selected because its filters are large.
In this paper we propose using traditional hand-crafted Gabor filters to manipulate the
learnable convolution filters, with the aim of reducing the number of learnable network parameters
and enhancing the deep features. The manipulation of learnable convolution filters with
powerful Gabor filters produces convolutional Gabor orientation filters (GoFs), which endow
the convolution filters with the additional capability of capturing visual properties such as spatial
localization, orientation selectivity, and spatial frequency selectivity in the output feature
maps. GoFs are implemented on the basic element of CNNs, i.e., the convolution filter, and
thus can be easily integrated into any deep architecture. DCNNs with GoFs, referred to as
Gabor CNNs (GCNs), can learn more robust feature representations, particularly for images
with spatial transformations. Owing to a set of steerable filters that generate enhanced
feature maps, the resulting model is more compact and representative with
a small set of learned filters, and thus is easy to train. The contributions of this paper are
summarized as follows:
1) To the best of our knowledge, this is the first time Gabor filters are incorporated into
the convolution filter to improve the robustness of DCNNs to image transformations such as
translations, scale changes, and rotations.
2) Without bells and whistles, GCNs improve widely used DCNN architectures including
conventional CNNs and ResNet [6], obtaining new state-of-the-art results on popular
benchmarks.
2 Related Work
2.1 Gabor filters
Gabor wavelets [4] were invented by Dennis Gabor, using complex functions to serve as a
basis for Fourier transforms in information theory applications. An important property of
these wavelets is that the product of their standard deviations is minimized in both the time and
frequency domains. Gabor wavelets have been widely used to model the receptive fields of simple cells of the
visual cortex. The Gabor wavelets (kernels or filters) are defined as follows [17]:

$$\Psi_{u,v}(z) = \frac{\|k_{u,v}\|^2}{\sigma^2}\, e^{-\|k_{u,v}\|^2 \|z\|^2 / 2\sigma^2} \left[ e^{i k_{u,v} z} - e^{-\sigma^2/2} \right] \quad (1)$$

where $k_{u,v} = k_v e^{i k_u}$, $k_v = (\pi/2)/\sqrt{2}^{\,(v-1)}$, and $k_u = u\pi/U$, with $v = 0,\dots,V$ and $u = 0,\dots,U$; here $v$ indexes
the frequency, $u$ the orientation, and $\sigma = 2\pi$. In [19, 25], Gabor wavelets were used to
initialize the deep models or serve as the input layer. However, we take a different approach
by utilizing Gabor filters to modulate the learned convolution filters. Specifically, we change
the basic element of CNNs, the convolution filters, to GoFs to enforce the impact of Gabor filters
on each convolutional layer. Therefore, the steerable properties are inherited by the DCNNs
to enhance the robustness to scale and orientation variations in feature representations.
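As a concrete illustration, a small bank of Gabor kernels following Eq. 1 can be generated as below. This is only a sketch, not the authors' code: the function name `gabor_bank`, the grid discretization, and the default kernel size are assumptions, and only the real part is kept, as is done for GoFs.

```python
import numpy as np

def gabor_bank(U=4, V=4, size=5, sigma=2 * np.pi):
    """Real parts of the Gabor kernels in Eq. 1, one per orientation u and scale v.

    Returns an array of shape (V, U, size, size). The discrete grid and the
    kernel size are implementation choices here, not taken from the paper.
    """
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    bank = np.empty((V, U, size, size))
    for v in range(1, V + 1):
        k_v = (np.pi / 2) / np.sqrt(2) ** (v - 1)      # k_v = (pi/2) / sqrt(2)^(v-1)
        for u in range(U):
            k_u = u * np.pi / U                         # orientation angle k_u = u*pi/U
            # k_{u,v} z evaluated at the grid point z = (x, y)
            kz = k_v * (np.cos(k_u) * xs + np.sin(k_u) * ys)
            mag = k_v ** 2 / sigma ** 2                 # ||k_{u,v}||^2 / sigma^2
            gauss = np.exp(-k_v ** 2 * (xs ** 2 + ys ** 2) / (2 * sigma ** 2))
            carrier = np.cos(kz) - np.exp(-sigma ** 2 / 2)  # real part of the bracket in Eq. 1
            bank[v - 1, u] = mag * gauss * carrier
    return bank
```

With U = 4 and V = 4 this yields the 16 kernels used to modulate the learned filters in Section 3.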
2.2 Learning feature representations
Given rich and often redundant convolutional filters, data augmentation is used to achieve
local/global transform invariance [22]. Despite the effectiveness of data augmentation, its
main drawback is that learning all possible transformations usually requires many network
parameters, which significantly increases the training cost and the risk of over-fitting.
Most recently, TI-Pooling [13] alleviates this drawback by using parallel network architectures
for the transformation set and applying a transformation-invariant pooling operator on
the outputs before the top layer. Nevertheless, with its built-in data augmentation, TI-Pooling
requires significantly more training and testing computation than a standard CNN.
Spatial Transformer Networks: To gain more robustness against spatial transformations,
the spatial transformer network (STN) [9] introduces an additional network module
that can manipulate the feature maps according to a transform matrix estimated with a
localisation sub-CNN. However, STN does not provide a solution to precisely estimate complex
transformation parameters.
Oriented Response Networks: By using Actively Rotating Filters (ARFs) to generate
orientation-tensor feature maps, the Oriented Response Network (ORN) [26] encodes hierarchical
orientation responses of discriminative structures. With these responses, ORN can
either encode an orientation-invariant feature representation or estimate object
orientations. However, ORN is more suitable for small filters, i.e., 3×3, and its orientation-invariance
property is not guaranteed by the ORAlign strategy, given its marginal
performance improvement compared with TI-Pooling.
Deformable convolutional networks: Deformable convolution and deformable RoI pooling
are introduced in [3] to enhance the transformation modeling capacity of CNNs, making
the network robust to geometric transformations. However, the deformable filters also prefer
operating on small filters.
Scattering Networks: In wavelet scattering networks [2, 20], expressing receptive fields
in CNNs as a weighted sum over a fixed basis allows the new structured receptive field
networks to improve performance considerably over unstructured CNNs on small and
medium datasets. In contrast to scattering networks, our GCNs use Gabor
filters to change the convolution filters in a steerable way.
3 Gabor CNNs
Gabor Convolutional Networks (GCNs) are deep convolutional neural networks using Gabor
orientation filters (GoFs). A GoF is a steerable filter, created by manipulating the learned
filters via Gabor filter banks, and is then used to produce enhanced feature maps. With
GoFs, GCNs not only involve significantly fewer learnable filters and are thus easier to
train, but also lead to enhanced deep models.
In what follows, we address three issues in adopting GoFs in DCNNs. First, we give the
details on obtaining GoFs through Gabor filters. Second, we describe convolutions that use
GoFs to produce feature maps with scale and orientation information enhanced. Third, we
show how GoFs are learned during the back-propagation update stage.
3.1 Convolutional Gabor orientation Filters (GoFs)
Gabor filters have U orientations and V scales. To incorporate the steerable properties into
GCNs, the orientation information is encoded in the learned filters, while at the same time
the scale information is embedded into different layers. Due to the orientation and scale
information captured by Gabor filters in GoFs, the corresponding convolution features are
enhanced.
Before being modulated by Gabor filters, the convolution filters in standard CNNs are
learned by the back-propagation (BP) algorithm, and are denoted as learned filters. Let a
Figure 2: Left shows the modulation process of GoFs. Right illustrates an example of GCN
convolution with 4 channels. In a GoF, the number of channels is set to the number of
Gabor orientations U for implementation convenience.
learned filter have size N × W × W, where W × W is the size of each 2D channel (N channels in total).
For implementation convenience, N is set to U, the number of orientations of the
Gabor filters used to modulate this learned filter. A GoF is obtained by
modulating the learned filter with U Gabor filters at a given scale v. The details
concerning the filter modulation are shown in Eq. 2 and Fig. 2. For the vth scale, we define:

$$C_{i,u}^{v} = C_{i,o} \circ G(u,v) \quad (2)$$

where $C_{i,o}$ is a learned filter and $\circ$ denotes the element-wise product between
$G(u,v)$2 and each 2D channel of $C_{i,o}$; $C_{i,u}^{v}$ is the filter obtained by modulating $C_{i,o}$ with the $v$-scale Gabor
filter $G(u,v)$. A GoF is then defined as:

$$C_{i}^{v} = \left(C_{i,1}^{v},\dots,C_{i,U}^{v}\right) \quad (3)$$

Thus, the ith GoF $C_i^{v}$ is actually U 3D filters (see Fig. 2, where U = 4). In GoFs, the value
of v increases with the layer index, which means that the scales of the Gabor filters in GoFs
change from layer to layer. At each scale, the size of a GoF is U × N × W × W, but we
only store the N × W × W learned filters, because the Gabor filters are given. To simplify the
description of the learning process, v is omitted in the next section.
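The modulation in Eqs. 2-3 can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code; the function name `build_gof` and the array layout are assumptions.

```python
import numpy as np

def build_gof(learned, gabor):
    """Eqs. 2-3 sketch: modulate a learned filter C_{i,o} of shape (N, W, W)
    with U Gabor kernels of shape (U, W, W) at one scale v.

    Eq. 2: each 2D channel of the learned filter is multiplied element-wise
    by the u-th Gabor kernel (real part). Eq. 3: stacking over u gives the
    GoF, of shape (U, N, W, W), i.e., U modulated 3D filters.
    """
    U = gabor.shape[0]
    # (N, W, W) * (W, W) broadcasts the Gabor kernel over the N channels.
    return np.stack([learned * gabor[u] for u in range(U)])
```

Note that only the N × W × W learned filter is a trainable parameter; the GoF is recomputed from it whenever the Gabor kernels are applied.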
3.2 GCN convolution
In GCNs, GoFs are used to produce feature maps that explicitly enhance the scale and
orientation information in deep features. An output feature map $\hat{F}$ in GCNs is denoted as:

$$\hat{F} = \mathrm{GCconv}(F, C_i) \quad (4)$$

where $C_i$ is the ith GoF and $F$ is the input feature map, as shown in Fig. 2. The channels of
$\hat{F}$ are obtained by the following convolution:

$$\hat{F}_{i,k} = \sum_{n=1}^{N} F^{(n)} \otimes C_{i,u=k}^{(n)} \quad (5)$$

where $(n)$ refers to the nth channel of $F$ and $C_{i,u}$, and $\hat{F}_{i,k}$ is the kth orientation response of
$\hat{F}$. For example, as shown in Fig. 2, let the size of the input feature map be 1×4×32×32.
If there are 10 GoFs with 4 Gabor orientations, the size of the output feature map is 10×4×30×30.
2Real parts of Gabor filters are used
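The per-GoF convolution of Eq. 5 can be sketched as below. This is a naive illustrative implementation (the function name `gcconv` and the loop structure are assumptions); as in most deep-learning frameworks, cross-correlation stands in for the $\otimes$ operation, with no padding or stride.

```python
import numpy as np

def gcconv(F, gof):
    """Eq. 5 sketch: F is an input feature map of shape (N, H, W); gof is one
    GoF of shape (U, N, Wk, Wk). The k-th output channel sums, over input
    channels n, the valid cross-correlation of F[n] with gof[k, n].
    Returns shape (U, H - Wk + 1, W - Wk + 1)."""
    U, N, Wk, _ = gof.shape
    H, W = F.shape[1], F.shape[2]
    out = np.zeros((U, H - Wk + 1, W - Wk + 1))
    for k in range(U):
        for n in range(N):
            for y in range(out.shape[1]):
                for x in range(out.shape[2]):
                    out[k, y, x] += np.sum(F[n, y:y + Wk, x:x + Wk] * gof[k, n])
    return out
```

Applying 10 such GoFs to a 1×4×32×32 input with 3×3 kernels reproduces the 10×4×30×30 output shape of the example above.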
Figure 3: Network structures of CNNs, ORNs and GCNs. (C: spatial convolution; MP: max pooling; R: ReLU; M: max; Align: ORAlign; BN: batch normalization; D: dropout.)
Figure 4: The residual block. (a) and (b) are for ResNet. (c) Small kernel and (d) large kernel are for GCNs.
Table 1: Results (error rate (%) on MNIST) vs. Gabor filter scales.

kernel        5×5   5×5   3×3   3×3
# params (M)  1.86  0.51  0.78  0.25
V = 1 scale   0.48  0.57  0.51  0.70
V = 4 scales  0.48  0.56  0.49  0.63

Table 2: Results (error rate (%) on MNIST) vs. Gabor filter orientations.

U    2     3     4     5     6     7
5×5  0.52  0.51  0.48  0.49  0.49  0.52
3×3  0.68  0.58  0.56  0.59  0.56  0.60
3.3 Updating GoF
In the back-propagation (BP) process, only the learned filter $C_{i,o}$ needs to be updated, and we
have:

$$\delta = \sum_{u=1}^{U} \frac{\partial L}{\partial C_{i,u}} \quad (6)$$

$$C_{i,o} = C_{i,o} - \eta\,\delta \quad (7)$$

where $L$ is the loss function. From the above description, it can be seen that the BP process
is easily implemented, and is very different from ORNs and deformable kernels, which usually
require a relatively complicated procedure. By only updating the learned convolution
filters $C_{i,o}$, the GCN model is more compact and efficient, and is also more robust to
orientation and scale variations.
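The update rule of Eqs. 6-7 amounts to summing the gradients of the U modulated filters and taking one gradient step on the learned filter alone. A minimal sketch (the function name `update_learned` is an assumption):

```python
import numpy as np

def update_learned(C_io, grads_per_u, lr):
    """Eqs. 6-7 sketch: only the learned filter C_{i,o} is updated.
    grads_per_u stacks the U gradients dL/dC_{i,u}, each shaped like C_io.
    Eq. 6 sums them into delta; Eq. 7 takes one step with learning rate eta."""
    delta = grads_per_u.sum(axis=0)   # Eq. 6
    return C_io - lr * delta          # Eq. 7
```

Because the Gabor kernels are fixed, no gradient is stored for them; only the N × W × W learned filters are parameters of the optimizer.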
4 Implementation and Experiments
In this section, we present the details of the GCN implementation based on conventional
DCNN architectures. Afterwards, we evaluate GCNs on the MNIST digit recognition
dataset [14, 15] as well as its rotated version MNIST-rot used in ORNs, which is generated
by rotating each sample in the MNIST dataset by a random angle in [0, 2π]. To further
evaluate the performance of GCNs, experiments on the SVHN dataset [18], CIFAR-10,
and CIFAR-100 [11] are also provided 3. Two GPU platforms are used in our experiments:
NVIDIA GeForce GTX 1070 and GeForce GTX TITAN X (×2).
4.1 MNIST
For the MNIST dataset, we randomly select 10,000 samples from the training set for validation
and use the remaining 50,000 samples for training. The Adadelta optimization algorithm [24] is
3 Please also refer to the supplementary materials for more details.
Figure 5: Recognition results of different categories on CIFAR-10 (improvements over ResNet-110: bird 4.1%, ship 2.5%, deer 3.2%). Compared with ResNet-110, GCNs perform significantly better on the categories with large scale variation.
Table 3: Results comparison on MNIST (last two columns: error rate (%)).

Method            network stage kernels  # params (M)  time (s)  MNIST  MNIST-rot
Baseline CNN      80-160-320-640         3.08          6.50      0.73   2.82
STN               80-160-320-640         3.20          7.33      0.61   2.52
TIPooling (×8)    (80-160-320-640)×8     24.64         50.21     0.97   not permitted
ORN4 (ORAlign)    10-20-40-80            0.49          9.21      0.57   1.69
ORN8 (ORAlign)    10-20-40-80            0.96          16.01     0.59   1.42
ORN4 (ORPooling)  10-20-40-80            0.25          4.60      0.59   1.84
ORN8 (ORPooling)  10-20-40-80            0.39          6.56      0.66   1.37
GCN4 (with 3×3)   10-20-40-80            0.25          3.45      0.63   1.45
GCN4 (with 3×3)   20-40-80-160           0.78          6.67      0.56   1.28
GCN4 (with 5×5)   10-20-40-80            0.51          10.45     0.49   1.26
GCN4 (with 5×5)   20-40-80-160           1.86          23.85     0.48   1.10
GCN4 (with 7×7)   10-20-40-80            0.92          10.80     0.46   1.33
GCN4 (with 7×7)   20-40-80-160           3.17          25.17     0.42   1.20
Table 4: Results comparison on SVHN. No additional training set is used for training.

Method        VGG    ResNet-110  ResNet-172  GCN4-40  GCN4-28  ORN4-40  ORN4-28
# params      20.3M  1.7M        2.7M        2.2M     1.4M     2.2M     1.4M
Accuracy (%)  95.66  95.8        95.88       96.9     96.86    96.35    96.19
used during the training process, with a batch size of 128, an initial learning rate (η) of 0.001,
and a weight decay of 0.00005. The learning rate is halved every 25 epochs. We report
the performance of our algorithm on the test set after 200 epochs, averaged over 5
runs. The state-of-the-art STN [9], TI-Pooling [13], ResNet [6], and ORNs [26] are included
in the comparison. Among them, STN is more robust to spatial transformations than the baseline
CNN, due to a spatial transform layer prior to the first convolution layer. TI-Pooling
adopts a transform-invariant pooling layer to obtain the response of the main direction, resulting in
rotation-robust features. ORNs capture the response of each direction by spatially rotating the original
convolution kernel.
Fig. 3 shows the network structures of the CNNs, ORNs and GCNs (U = 4) used in this
experiment. For all models, we adopt max-pooling and ReLU after the convolution
layers, and a dropout layer [7] after the FC layer to avoid over-fitting. To compare with other
CNNs at a similar model size, we reduce the layer width 4 by a certain proportion, as done
4The number of convolution kernels per layer.
Table 5: Results (error rate (%)) comparison on CIFAR-10 and CIFAR-100. Where two values are given for # params and error, they correspond to the blocks of Fig. 4(c)/Fig. 4(d).

Method       network stage kernels  # params    CIFAR-10   CIFAR-100
NIN          -                      -           8.81       35.67
VGG          -                      -           6.32       28.49
ResNet-110   16-16-32-64            1.7M        6.43       25.16
ResNet-1202  16-16-32-64            10.2M       7.83       27.82
GCN2-110     12-12-24-45            1.7M/3.3M   6.34/5.62  -
GCN2-110     16-16-32-64            3.4M/6.5M   5.65/4.96  26.14/25.3
GCN4-110     8-8-16-32              1.7M        6.19       -
GCN2-40      16-32-64-128           4.5M        4.95       24.23
GCN4-40      16-16-32-64            2.2M        5.34       25.65
WRN-40       64-64-128-256          8.9M        4.53       21.18
WRN-28       160-160-320-640        36.5M       4.00       19.25
GCN2-40      16-64-128-256          17.9M       4.41       20.73
GCN4-40      16-32-64-128           8.9M        4.65       21.75
GCN3-28      64-64-128-256          17.6M       3.88       20.13
in ORNs, i.e., 1/8 [26].
We evaluate different scale settings across GCN layers (V = 4 vs. V = 1), where larger-scale
Gabor filters are used in shallow layers, or a single scale is used in all layers. It should
be noted that in the following experiments, we also use V = 4 for deeper networks (ResNet),
with the scales equally distributed across layers. As shown in Table 1, the error rates with V = 4
are better than those with a single scale (V = 1) used in all layers
when U = 4. We also test different orientations, as shown in Table 2. The results indicate
that GCNs perform better with 3 to 6 orientations when V = 4, which is more flexible than
ORNs. In comparison, ORNs use a complicated interpolation process via ARFs besides the 4-
and 8-pixel rotations.
In Table 3, the second column refers to the width of each layer; a similar notation
is also used in [23]. Considering that a GoF has multiple channels (N), we decrease the layer
width (i.e., the number of GoFs per layer) to reduce the model size for the purpose of a
fair comparison. The parameter size of GCNs is linear in the number of channels (N) but quadratic in the
layer width. Therefore, the complexity of GCNs is reduced compared with CNNs (see
the third column of Table 3). In the fourth column, we compare the computation time (s)
per training epoch of different methods on a GTX 1070, which clearly shows that GCNs
are more efficient than the other state-of-the-art models. The performance comparison is shown
in the last two columns in terms of error rate. Compared with the baseline CNN, GCNs
achieve much better performance with the 3×3 kernel while using only 1/12 or 1/4 of the parameters of
the CNN. The experiments show that GCNs with 5×5 and 7×7 kernels achieve
test errors of 1.10% on MNIST-rot and 0.42% on MNIST, respectively, which are better than
those of ORNs. This can be explained by the fact that larger kernels carry more
Gabor orientation information, and thus capture better orientation response features. Table
3 also demonstrates that a larger GCN model results in better performance. In addition,
on the MNIST-rot dataset, the performance of the baseline CNN model is greatly disturbed
by rotation, while ORNs and GCNs can capture orientation features and achieve better results.
Again, GCNs outperform ORNs, which confirms that Gabor modulation indeed
helps to gain robustness to rotation variations, because the deep features are enhanced
by steerable filters. In contrast, ORNs only actively rotate the filters and lack a feature
enhancement process.
4.2 SVHN
The Street View House Numbers (SVHN) dataset [18] is a real-world image dataset taken
from Google Street View images. SVHN contains MNIST-like 32×32 images centered
on a single character, which, however, include a plethora of challenges such as illumination
changes, rotations, and complex backgrounds. The dataset consists of over 600,000 digit images:
73,257 digits for training, 26,032 for testing, and 531,131 additional images. Note that
the additional images are not used by any method in this experiment. For this large-scale
dataset, we implement GCNs based on ResNet. Specifically, we replace the spatial convolution
layers with our GoF-based GCConv layers, leading to GCN-ResNet. The bottleneck
structure is not used, since the 1×1 kernel does not propagate any Gabor filter information.
ResNet divides the whole network into 4 stages, and the stage widths (the number of convolution
kernels per layer) are set to 16, 16, 32, and 64, respectively. We make appropriate adjustments
to the network depth and width to ensure that our GCNs have a model size similar to those
of VGG [21] and ResNet. We set up 40-layer and 28-layer GCN-ResNets with
basic block (c) (Fig. 4), using the same hyper-parameters as ResNet; the network stages are
also set to 16-16-32-64. The results are shown in Table 4. Compared to the VGG model, GCNs
have a much smaller parameter size, yet obtain better performance with a 1.2% improvement.
With a similar parameter size, GCN-ResNet achieves better results than ResNet and ORNs
(by 1.1% and 0.66%, respectively), which further validates the superiority of GCNs for real-world
problems.
4.3 Natural Image Classification
For the natural image classification task, we use the CIFAR datasets, including CIFAR-10
and CIFAR-100 [11]. The CIFAR datasets consist of 60,000 32×32 color images in
10 or 100 classes, with 6,000 or 600 images per class. There are 50,000 training images and
10,000 test images.
The CIFAR datasets contain a wide variety of categories with object scale and orientation
variations. Similar to SVHN, we test GCN-ResNet on the CIFAR datasets. Experiments are
conducted to compare our method with state-of-the-art networks (NIN [16], VGG
[21], ORN [26], and ResNet [6]). On CIFAR-10, for example, Table 5 shows that GCNs
consistently improve the performance over the baseline ResNet, regardless of the number of
parameters or kernels. We further compare GCNs with the Wide Residual Network
(WRN) [23]; again, GCNs achieve better performance (3.88% vs. 4.00%) with a model
half the size of WRN, indicating a significant advantage of GCNs in terms of model efficiency.
Similar to CIFAR-10, one can also observe performance improvements on CIFAR-100
with similar parameter sizes. Moreover, when using a different kernel size configuration
(from 3×3 to 5×5, as shown in Fig. 4(c) and Fig. 4(d)), the model size increases, but
the error rate improves from 6.34% to 5.62%. We notice that the most improved
classes on CIFAR-10 include bird (4.1% higher than baseline ResNet) and deer
(3.2%), which exhibit significant within-class scale variations. This implies that the Gabor
manipulation in CNNs enhances the capability of handling scale variations, as shown in Fig.
5.
5 Conclusion
This paper has incorporated Gabor filters into DCNNs, aiming to enhance deep feature
representations with steerable orientation and scale capabilities. The proposed Gabor Convolutional
Networks (GCNs) improve the generalization ability of DCNNs to rotation and
scale variations by introducing extra functional modules on the basic element of DCNNs,
i.e., the convolution filters. GCNs can be easily implemented using popular architectures.
Extensive experiments show that GCNs significantly improve over the baselines, achieving
state-of-the-art performance on several benchmarks.
References
[1] Y-Lan Boureau, Jean Ponce, and Yann LeCun. A theoretical analysis of feature pooling
in visual recognition. In Proceedings of the 27th international conference on machine
learning (ICML), pages 111–118, 2010.
[2] Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE
transactions on pattern analysis and machine intelligence, 35(8):1872–1886, 2013.
[3] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen
Wei. Deformable convolutional networks. arXiv preprint arXiv:1703.06211, 2017.
[4] Dennis Gabor. Theory of communication. part 1: The analysis of information. Journal
of the Institution of Electrical Engineers-Part III: Radio and Communication Engineer-
ing, 93(26):429–441, 1946.
[5] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE International Conference on
Computer Vision, pages 1440–1448, 2015.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for
image recognition. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pages 770–778, 2016.
[7] Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R
Salakhutdinov. Improving neural networks by preventing co-adaptation of feature de-
tectors. arXiv preprint arXiv:1207.0580, 2012.
[8] Jorn-Henrik Jacobsen, Jan van Gemert, Zhongyu Lou, and Arnold WM Smeulders.
Structured receptive fields in cnns. In Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, pages 2610–2619, 2016.
[9] Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer net-
works. In Advances in Neural Information Processing Systems, pages 2017–2025,
2015.
[10] Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu. 3d convolutional neural networks
for human action recognition. IEEE transactions on pattern analysis and machine
intelligence, 35(1):221–231, 2013.
[11] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny
images. 2009.
[12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with
deep convolutional neural networks. In Advances in neural information processing
systems, pages 1097–1105, 2012.
[13] Dmitry Laptev, Nikolay Savinov, Joachim M Buhmann, and Marc Pollefeys. Ti-
pooling: transformation-invariant pooling for feature learning in convolutional neural
networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 289–297, 2016.
[14] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learn-
ing applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324,
1998.
[15] Yann LeCun, Corinna Cortes, and Christopher JC Burges. The mnist database of hand-
written digits, 1998.
[16] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. In Proc. of ICLR,
2014.
[17] Chengjun Liu and Harry Wechsler. Gabor feature based classification using the en-
hanced fisher linear discriminant model for face recognition. IEEE Transactions on
Image processing, 11(4):467–476, 2002.
[18] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y
Ng. Reading digits in natural images with unsupervised feature learning. In NIPS
workshop on deep learning and unsupervised feature learning, volume 2011, page 5,
2011.
[19] Wanli Ouyang and Xiaogang Wang. Joint deep learning for pedestrian detection. In
Proceedings of the IEEE International Conference on Computer Vision, pages 2056–
2063, 2013.
[20] Laurent Sifre and Stéphane Mallat. Rotation, scaling and deformation invariant scat-
tering for texture discrimination. In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 1233–1240, 2013.
[21] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-
scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[22] David A Van Dyk and Xiao-Li Meng. The art of data augmentation. Journal of Com-
putational and Graphical Statistics, 10(1):1–50, 2001.
[23] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint
arXiv:1605.07146, 2016.
[24] Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint
arXiv:1212.5701, 2012.
[25] Zhuoyao Zhong, Lianwen Jin, and Zecheng Xie. High performance offline handwritten
chinese character recognition using googlenet and directional feature maps. In 13th
International Conference on Document Analysis and Recognition (ICDAR), pages 846–
850. IEEE, 2015.
[26] Yanzhao Zhou, Qixiang Ye, Qiang Qiu, and Jianbin Jiao. Oriented response networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2017.