Autonomous driving is one of the prominent areas of research in computer vision. In today's AI world, the concept of autonomous vehicles has become popular largely as a means of avoiding accidents caused by driver negligence. Accurately perceiving the depth of the surrounding region is a challenging task for autonomous vehicles. Sensors such as light detection and ranging (LiDAR) can be used for depth estimation, but these sensors are expensive; hence, stereo matching is an alternative solution for estimating depth. The main difficulty in stereo matching is minimizing mismatches in ill-posed regions, such as occluded, textureless, and discontinuous regions. This paper presents an efficient deep stereo matching technique for estimating the disparity map from stereo images in ill-posed regions. Images from the Middlebury stereo dataset are used to assess the efficacy of the proposed model. The experimental outcome depicts that the proposed model generates more reliable results in occluded, textureless, and discontinuous regions than existing techniques.
Noise Removal in Traffic Sign Detection Systems (CSEIJJournal)
Traffic sign detection and recognition is increasingly applied in driver-assistance and automatic driving systems, helping drivers and automated systems detect and recognize traffic signs effectively. However, these systems may struggle in challenging environments such as rain, haze, and hue shifts. To help detection systems perform better under such conditions, we propose a deep learning technique based on a convolutional neural network to process the visual data; the processed data can then be used for detection. Our architecture uses the NoiseNet model [11], a noise-reduction network. The model is trained to enhance images in patches rather than as a whole, using the Challenging Unreal and Real Environment - Traffic Sign Detection dataset (CURE-TSD), which contains videos of different roads in various challenging situations.
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
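As a rough illustration of why the depthwise separable convolutions mentioned above reduce complexity, the following sketch compares parameter counts (a hypothetical back-of-the-envelope calculation, not EDGE-Net's actual layer configuration): a standard k×k convolution with C_in input and C_out output channels needs k·k·C_in·C_out weights, while the depthwise separable factorization needs only k·k·C_in + C_in·C_out.

```python
# Hypothetical parameter-count comparison illustrating the efficiency of
# depthwise separable convolutions (not EDGE-Net's actual configuration).

def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution: one k x k x c_in filter per output channel."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise step (one k x k filter per input channel) plus pointwise (1x1) step."""
    return k * k * c_in + c_in * c_out

# For a 3x3 layer with 64 input and 64 output channels:
std = standard_conv_params(3, 64, 64)        # 36864 weights
sep = depthwise_separable_params(3, 64, 64)  # 4672 weights, roughly 8x fewer
```

The same factorization underlies most lightweight CNN designs, which is why it pairs naturally with the pruning the abstract describes.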
Efficient resampling features and convolution neural network model for image ... (IJEECSIAES)
The widespread use of picture-enhancing and manipulation tools has made it easy to manipulate multimedia data, including digital images. Such manipulations undermine the truthfulness and legality of images, can cause misapprehension, and may threaten social security. Image forensics is employed to detect whether an image has been manipulated by specific attacks such as splicing and copy-move. This paper presents a competent tampering detection technique using resampling features and a convolutional neural network (CNN). In this model, range spatial filtering (RSF)-CNN, the image is divided into consistent patches during preprocessing. Within every patch, resampling features are extracted using an affine transformation and the Laplacian operator, and the extracted features are then accumulated into descriptors by the CNN. A wide-ranging analysis assesses the tampering detection and tampered-region segmentation accuracies of the proposed RSF-CNN based procedure under various falsifications and post-processing attacks, including joint photographic experts group (JPEG) compression, scaling, rotation, noise addition, and multiple combined manipulations. The results show that RSF-CNN based tampering detection achieves considerably higher accuracy than existing tampering detection methodologies.
Residual balanced attention network for real-time traffic scene semantic segm... (IJECEIAES)
Intelligent transportation systems (ITS) are among the most actively researched topics of this century. Autonomous driving in particular involves advanced road-safety monitoring tasks, including identifying dangers on the road and protecting pedestrians. In recent years, deep learning (DL) approaches, and especially convolutional neural networks (CNNs), have been used extensively to solve ITS problems such as traffic scene semantic segmentation and traffic sign classification. Semantic segmentation is an important task in computer vision (CV). Traffic scene semantic segmentation using CNNs requires high precision with few computational resources to perceive and segment the scene in real time, yet related work often focuses on only one aspect: precision or the number of parameters. In this regard, we propose RBANet, a robust and lightweight CNN that uses a newly proposed balanced attention module and a newly proposed residual module. We simulated RBANet with three loss functions to find the best combination, using only 0.74M parameters. RBANet has been evaluated on CamVid, the most widely used dataset in semantic segmentation, and performs well in terms of parameter requirements and precision compared to related work.
Stereo matching based on absolute differences for multiple objects detection (TELKOMNIKA JOURNAL)
This article presents a new algorithm for object detection using a stereo camera system. The difficulty in obtaining accurate object detection with a stereo camera is the imprecise matching process between two scenes of the same viewpoint. Hence, this article aims to reduce incorrect pixel matching through four stages, combining a continuous process of matching cost computation, aggregation, optimization, and filtering. The first stage, matching cost computation, acquires a preliminary result using an absolute differences method. The second stage, aggregation, uses a guided filter with a fixed support window size. The optimization stage then uses a winner-takes-all (WTA) approach, which selects the smallest matching difference value and normalizes it to the disparity level. The last stage uses a bilateral filter, which effectively further decreases the error in the disparity map containing object detection and location information. The proposed work produces low errors (12.11% non-occluded and 14.01% all-pixel errors) on the KITTI dataset, performs much better than the pre-framework baseline, and is competitive with some newly available methods.
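The matching-cost and winner-takes-all stages described above can be sketched as follows, in a hypothetical simplified 1-D form (the article works on 2-D images and adds guided and bilateral filtering, which are omitted here):

```python
# Sketch of SAD matching cost + winner-takes-all disparity selection on a
# single 1-D scanline (hypothetical simplification of the article's pipeline).

def sad_cost(left, right, x, d, window=1):
    """Sum of absolute differences between a window around left[x] and the
    window shifted by disparity d in the right image."""
    cost = 0
    for k in range(-window, window + 1):
        lx, rx = x + k, x + k - d
        if 0 <= lx < len(left) and 0 <= rx < len(right):
            cost += abs(left[lx] - right[rx])
        else:
            cost += 255  # heavy penalty for out-of-bounds samples
    return cost

def wta_disparity(left, right, max_d=4, window=1):
    """Winner-takes-all: for each pixel, pick the disparity with minimum cost."""
    return [min(range(max_d + 1), key=lambda d: sad_cost(left, right, x, d, window))
            for x in range(len(left))]

# Left scanline and the same scene shifted by 2 pixels in the right view:
left  = [10, 10, 50, 90, 50, 10, 10, 10]
right = [50, 90, 50, 10, 10, 10, 10, 10]
disparity = wta_disparity(left, right)  # central pixels recover the shift of 2
```

Aggregation filters (guided, bilateral) would then smooth the cost volume before the WTA step to suppress the mismatches this naive version makes near boundaries.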
An Analysis of Various Deep Learning Algorithms for Image Processing (vivatechijri)
The many applications of image processing have given it a wide scope in data analysis. Machine learning algorithms provide a powerful environment for effectively training models to identify the entities in images and segment them accordingly. While image classifiers such as support vector machines (SVM) or random forest algorithms do justice to the task, deep learning algorithms such as artificial neural networks (ANN) and their descendants, most notably the extremely powerful convolutional neural network (CNN), bring a new dimension to the image processing domain, with far higher accuracy and computational power for classifying images and segregating their entities as individual components of the image working region. The major focus is on the region-based convolutional neural network (R-CNN) algorithm and how well it provides pixel-level segmentation through its improved successors, the Fast, Faster, and Mask R-CNN versions.
International Journal of Computational Engineering Research (IJCER) (ijceronline)
The International Journal of Computational Engineering Research (IJCER) is an international, English-language monthly online journal. It publishes original research work that contributes significantly to scientific knowledge in engineering and technology.
Depth-DensePose: an efficient densely connected deep learning model for came... (IJECEIAES)
Camera/image-based localization is important for many emerging applications such as augmented reality (AR), mixed reality, robotics, and self-driving. Camera localization is the problem of estimating both camera position and orientation with respect to an object; its use cases depend on two key factors, accuracy and speed (latency). This paper therefore proposes Depth-DensePose, an efficient deep learning model for 6-degrees-of-freedom (6-DoF) camera-based localization. Depth-DensePose utilizes the advantages of both DenseNets and an adapted depthwise separable convolution (DS-Conv) to build a deeper and more efficient network. The proposed model consists of iterative depth-dense blocks, each containing two adapted DS-Conv layers with kernel sizes 3 and 5, which help retain both low-level and high-level features. We evaluate Depth-DensePose on the Cambridge Landmarks dataset and show that it outperforms related deep learning models for camera-based localization. Furthermore, extensive experiments show that the adapted DS-Conv is more efficient than standard convolution, especially in terms of memory and processing time, which matters for real-time and mobile applications.
Comparative study of optimization algorithms on convolutional network for aut... (IJECEIAES)
The last 10 years have been the decade of autonomous vehicles. Advances in intelligent sensors and control schemes have shown the possibility of real applications. Deep learning, and in particular convolutional networks, have become a fundamental tool in solving problems related to environment identification, path planning, vehicle behavior, and motion control. In this paper, we perform a comparative study of the most widely used optimization strategies on the convolutional architecture residual neural network (ResNet) for an autonomous driving problem, as a step toward the development of an intelligent sensor. This sensor, part of our research in reactive systems for autonomous vehicles, aims to become a system for directly mapping sensory information to control actions from real-time images of the environment. The optimization techniques analyzed include stochastic gradient descent (SGD), adaptive gradient (Adagrad), adaptive learning rate (Adadelta), root mean square propagation (RMSProp), Adamax, adaptive moment estimation (Adam), Nesterov-accelerated adaptive moment estimation (Nadam), and follow the regularized leader (FTRL). The training of the deep model is evaluated in terms of convergence, accuracy, recall, and F1-score. Preliminary results show better performance of the deep network when using SGD as the optimizer, while FTRL presents the poorest performance.
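To make the comparison concrete, here is a toy illustration of two of the optimizers listed above, SGD and Adam, implemented from their standard update rules and applied to a one-dimensional quadratic instead of a ResNet loss (a hypothetical sketch; the paper's experiments use full network training):

```python
# Toy comparison of SGD and Adam update rules, minimizing f(w) = (w - 3)^2.
import math

def grad(w):
    """Gradient of f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd(w=0.0, lr=0.1, steps=100):
    """Plain gradient descent update: w <- w - lr * g."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w=0.0, lr=0.1, steps=200, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates scale each step."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g          # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * g * g      # second moment (uncentered variance)
        m_hat = m / (1 - b1 ** t)          # bias correction for the warm-up phase
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Both optimizers should drive w toward the minimum at w = 3.
```

The other optimizers in the study (Adagrad, Adadelta, RMSProp, Nadam, FTRL) differ mainly in how this per-step scaling is accumulated.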
Machine learning based augmented reality for improved learning application th... (IJECEIAES)
Detection of objects and their locations in an image is an important element of current research in computer vision. In May 2020, Meta released its state-of-the-art object detection model based on a transformer architecture, called detection transformer (DETR). There are several object detection models, such as region-based convolutional neural network (R-CNN), you only look once (YOLO), and single shot detector (SSD), but none had used a transformer to accomplish this task. The models mentioned earlier use all sorts of hyperparameters and layers, whereas the transformer pattern makes the architecture simple and easy to implement. In this paper, we determine the name of a chemical experiment in two steps: first, by building a DETR model trained on a customized dataset, and then by integrating it into an augmented reality mobile application. By detecting the objects used during an experiment, we can predict the name of the experiment using a multi-class classification approach. The combination of various computer vision techniques with augmented reality is promising and offers a better user experience.
A new function of stereo matching algorithm based on hybrid convolutional neu... (IJEECSIAES)
This paper proposes a new hybrid method combining learning-based and handcrafted approaches in a stereo matching algorithm. The main purpose of a stereo matching algorithm is to produce a disparity map, which is essential for many applications, including three-dimensional (3D) reconstruction. The raw disparity map computed by a convolutional neural network (CNN) is still prone to errors in low-texture regions. The algorithm improves the matching cost computation stage by combining a hybrid CNN with truncated directional intensity computation; the difference in truncated directional intensity is used to decrease radiometric errors. The raw matching cost then goes through a cost aggregation step using a bilateral filter (BF) to improve accuracy. Winner-takes-all (WTA) optimization uses the aggregated cost volume to produce an initial disparity map, and a series of refinement processes enhances it into a more accurate final disparity map. The algorithm's performance was verified using the Middlebury online stereo benchmarking system. The proposed algorithm achieves its objective of generating a more accurate and smooth disparity map with different depths in low-texture regions through better matching cost quality.
Disparity Estimation by a Real Time Approximation Algorithm (CSCJournals)
This paper presents an approximation algorithm for estimating the disparity of stereo images in real time. The approximation is achieved by shrinking the left and right original images. In this method, (i) the left and right images are shrunk three times, and (ii) the disparity image is computed from the shrunk left and right images and then extrapolated to recover the original image size. The proposed algorithm requires less computational time than existing methods, runs at approximately real time, and needs less memory. Applied to standard stereo images, the method reduces computational time by about 76.34% with no appreciable degradation of accuracy.
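The shrink-and-extrapolate idea can be sketched as follows (a hypothetical 1-D simplification; the paper shrinks 2-D images three times and computes the coarse disparity with its own matching step, which is omitted here):

```python
# Hypothetical 1-D sketch of coarse-to-fine disparity estimation by shrinking.

def shrink(signal):
    """Halve the resolution by averaging adjacent sample pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

def expand_disparity(disp):
    """Upsample a coarse disparity signal: each coarse pixel covers two fine
    pixels, and a disparity measured at half resolution doubles when mapped
    back to the original resolution."""
    out = []
    for d in disp:
        out.extend([2 * d, 2 * d])
    return out

# Three shrinks reduce the image side by 8x, so extrapolating back to the
# original size applies the 2x disparity expansion three times.
coarse = shrink(shrink(shrink(list(range(16)))))   # 16 samples -> 2 samples
full_res = expand_disparity(expand_disparity(expand_disparity([1])))
```

Since matching cost grows with image area and disparity range, working at one-eighth resolution is what yields the reported large reduction in computation.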
Development of 3D convolutional neural network to recognize human activities ... (journalBEEI)
Human activity recognition (HAR) is used in numerous applications, including smart homes, to monitor human behavior, automate homes according to human activities, and support entertainment, fall detection, violence detection, and people care. Vision-based recognition is the most powerful method and is widely used in HAR systems because of its ability to recognize complex human activities. This paper addresses the design of a 3D convolutional neural network (3D-CNN) model that can be used in smart homes to identify a number of activities. The model is trained on the KTH dataset, which contains activities such as walking, running, jogging, handwaving, handclapping, and boxing. Despite the challenges posed by variation in illumination, background, and human body shape, the proposed model reached an accuracy of 93.33%. The model was implemented, trained, and tested on a moderate computation machine, and the results show that the proposal successfully recognizes human activities with reasonable computation.
Development of stereo matching algorithm based on sum of absolute RGB color d... (IJECEIAES)
This article presents a local-based stereo matching algorithm comprising block matching and two edge-preserving filters in one framework. Fundamentally, the matching process consists of several stages that produce the disparity or depth map. The most challenging part of the matching process is obtaining accurate corresponding points between the two images. Hence, this article proposes a stereo matching algorithm using an improved sum of absolute RGB differences (SAD), gradient matching, and edge-preserving bilateral filters (BF) to raise accuracy. SAD and gradient matching are applied in the first stage to obtain a preliminary correspondence result; a BF then acts as an edge-preserving filter to remove noise from the first stage, and a second BF is used in the last stage to improve the final disparity map and sharpen object boundaries. The experimental analysis and validation use the Middlebury standard benchmarking evaluation system. Based on the results, the proposed work increases accuracy while preserving object edges. To make the proposed work comparable with currently available methods, quantitative measurements were made against other existing methods, and they show that the proposed work performs much better.
Convolutional neural network with binary moth flame optimization for emotion ... (IAESIJAI)
Electroencephalograph (EEG) signals reflect brain activity in real time, and using EEG signals to analyze human emotional states is a common line of study. The EEG signals of emotions are not universal; they differ from one person to another, as each person has different emotional responses to the same stimuli. EEG signals are therefore subject-dependent and have proven effective for subject-dependent emotion detection. To achieve enhanced accuracy and a high true positive rate, the proposed system uses a binary moth flame optimization (BMFO) algorithm for feature selection and convolutional neural networks (CNNs) for classification. In this proposal, optimal features are chosen using accuracy as the objective function, and the optimally chosen features are then classified with a CNN to discriminate different emotion states.
A novel ensemble model for detecting fake news (IAESIJAI)
Given the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of news articles as either real or fake. For this purpose, we opt for a blending technique that combines three models: bidirectional long short-term memory (Bi-LSTM), a stochastic gradient descent classifier, and a ridge classifier. The implementation of the proposed model (BI-LSR) on real-world datasets has shown outstanding results, achieving an accuracy score of 99.16%. This ensemble has thus proven to perform better than individual conventional machine learning and deep learning models, as well as many ensemble learning approaches cited in the literature.
Similar to A deep learning based stereo matching model for autonomous vehicle
Residual balanced attention network for real-time traffic scene semantic segm...IJECEIAES
Intelligent transportation systems (ITS) are among the most focused research in this century. Actually, autonomous driving provides very advanced tasks in terms of road safety monitoring which include identifying dangers on the road and protecting pedestrians. In the last few years, deep learning (DL) approaches and especially convolutional neural networks (CNNs) have been extensively used to solve ITS problems such as traffic scene semantic segmentation and traffic signs classification. Semantic segmentation is an important task that has been addressed in computer vision (CV). Indeed, traffic scene semantic segmentation using CNNs requires high precision with few computational resources to perceive and segment the scene in real-time. However, we often find related work focusing only on one aspect, the precision, or the number of computational parameters. In this regard, we propose RBANet, a robust and lightweight CNN which uses a new proposed balanced attention module, and a new proposed residual module. Afterward, we have simulated our proposed RBANet using three loss functions to get the best combination using only 0.74M parameters. The RBANet has been evaluated on CamVid, the most used dataset in semantic segmentation, and it has performed well in terms of parameters’ requirements and precision compared to related work.
Stereo matching based on absolute differences for multiple objects detectionTELKOMNIKA JOURNAL
This article presents a new algorithm for object detection using stereo camera system. The problem to get an accurate object detion using stereo camera is the imprecise of matching process between two scenes with the same viewpoint. Hence, this article aims to reduce the incorrect matching pixel with four stages. This new algorithm is the combination of continuous process of matching cost computation, aggregation, optimization and filtering. The first stage is matching cost computation to acquire preliminary result using an absolute differences method. Then the second stage known as aggregation step uses a guided filter with fixed window support size. After that, the optimization stage uses winner-takes-all (WTA) approach which selects the smallest matching differences value and normalized it to the disparity level. The last stage in the framework uses a bilateral filter. It is effectively further decrease the error on the disparity map which contains information of object detection and locations. The proposed work produces low errors (i.e., 12.11% and 14.01% nonocc and all errors) based on the KITTI dataset and capable to perform much better compared with before the proposed framework and competitive with some newly available methods.
An Analysis of Various Deep Learning Algorithms for Image Processing - vivatechijri
The many applications of image processing have given it a wide scope in data analysis. Machine learning algorithms provide a powerful environment for training models to identify the various entities in images and segment them accordingly. Although classifiers such as support vector machines (SVM) and random forest algorithms do justice to the task, deep learning algorithms such as artificial neural networks (ANN) and their descendants, above all the well-known and extremely powerful convolutional neural network (CNN), bring a new dimension to the image processing domain, with far higher accuracy and computational power for classifying images and segregating their various entities as individual components of the image working region. The major focus is on the region-based convolutional neural network (R-CNN) algorithm and how well it provides pixel-level segmentation through its improved successors, the Fast, Faster, and Mask R-CNN variants.
International Journal of Computational Engineering Research (IJCER) - ijceronline
International Journal of Computational Engineering Research (IJCER) is an international, English-language, monthly online journal. The journal publishes original research work that contributes significantly to scientific knowledge in engineering and technology.
Depth-DensePose: an efficient densely connected deep learning model for came... - IJECEIAES
Camera/image-based localization is important for many emerging applications such as augmented reality (AR), mixed reality, robotics, and self-driving. Camera localization is the problem of estimating the camera position and orientation with respect to an object. Use cases for camera localization depend on two key factors: accuracy and speed (latency). This paper therefore proposes Depth-DensePose, an efficient deep learning model for 6-degrees-of-freedom (6-DoF) camera-based localization. Depth-DensePose combines the advantages of DenseNets and an adapted depthwise separable convolution (DS-Conv) to build a deeper and more efficient network. The proposed model consists of iterative depth-dense blocks, each containing two adapted DS-Conv layers with kernel sizes 3 and 5, which help retain both low-level and high-level features. We evaluate the proposed Depth-DensePose on the Cambridge Landmarks dataset and show that it outperforms related deep learning models for camera-based localization. Furthermore, extensive experiments prove that the adapted DS-Conv is more efficient than standard convolution, especially in terms of memory and processing time, which matters for real-time and mobile applications.
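The parameter savings behind depthwise separable convolution can be illustrated with a quick count (a sketch with our own function names; the paper's exact layer shapes are not reproduced here). A DS-Conv replaces one k×k convolution over all channel pairs with a k×k per-channel pass plus a 1×1 pointwise pass:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def ds_conv_params(c_in, c_out, k):
    """Depthwise separable: k x k depthwise pass + 1 x 1 pointwise pass."""
    return k * k * c_in + c_in * c_out
```

For a 3×3 layer with 64 input and 128 output channels this is 73,728 versus 8,768 weights, roughly an 8× reduction, which is where the memory and processing-time gains come from.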
Comparative study of optimization algorithms on convolutional network for aut... - IJECEIAES
The last 10 years have been the decade of autonomous vehicles. Advances in intelligent sensors and control schemes have shown the possibility of real applications. Deep learning, and in particular convolutional networks, has become a fundamental tool in solving problems related to environment identification, path planning, vehicle behavior, and motion control. In this paper, we perform a comparative study of the most used optimization strategies on the convolutional residual neural network (ResNet) architecture for an autonomous driving problem, as a preliminary step toward the development of an intelligent sensor. This sensor, part of our research in reactive systems for autonomous vehicles, aims to become a system for directly mapping sensory information to control actions from real-time images of the environment. The optimization techniques analyzed include stochastic gradient descent (SGD), adaptive gradient (Adagrad), adaptive learning rate (Adadelta), root mean square propagation (RMSProp), Adamax, adaptive moment estimation (Adam), Nesterov-accelerated adaptive moment estimation (Nadam), and follow the regularized leader (Ftrl). Training of the deep model is evaluated in terms of convergence, accuracy, recall, and F1-score. Preliminary results show the best performance of the deep network when using SGD as the optimizer, while Ftrl presents the poorest performance.
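As a reminder of how two of the compared optimizers differ, the update rules for SGD and Adam can be written out directly (a generic sketch on a toy quadratic, not tied to the paper's ResNet setup; names and hyperparameters are our own illustration):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain stochastic gradient descent: step against the gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first/second moment estimates scale the step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimizing f(w) = w^2 (gradient 2w) with both rules:
w_sgd = 1.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)

w_adam, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    w_adam, m, v = adam_step(w_adam, 2 * w_adam, m, v, t)
```

SGD's step is proportional to the raw gradient, while Adam rescales each coordinate by its running second-moment estimate, which is why their convergence behavior differs across problems.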
Machine learning based augmented reality for improved learning application th... - IJECEIAES
Detection of objects and their locations in an image is an important element of current research in computer vision. In May 2020, Meta released its state-of-the-art object-detection model based on a transformer architecture, called detection transformer (DETR). There are several object-detection models, such as the region-based convolutional neural network (R-CNN), you only look once (YOLO), and single shot detectors (SSD), but none had used a transformer to accomplish this task. The models mentioned above use all sorts of hyperparameters and layers; the advantage of using a transformer is that it makes the architecture simple and easy to implement. In this paper, we determine the name of a chemical experiment in two steps: first, by building a DETR model trained on a customized dataset, and then by integrating it into an augmented reality mobile application. By detecting the objects used during the realization of an experiment, we predict the name of the experiment using a multi-class classification approach. The combination of computer vision techniques with augmented reality is indeed promising and offers a better user experience.
A new function of stereo matching algorithm based on hybrid convolutional neu... - IJEECSIAES
This paper proposes a new hybrid method combining learning-based and handcrafted approaches in a stereo matching algorithm. The main purpose of a stereo matching algorithm is to produce a disparity map, which is essential for many applications, including three-dimensional (3D) reconstruction. The raw disparity map computed by a convolutional neural network (CNN) is still prone to errors in low-texture regions. The algorithm improves the matching cost computation stage by combining a hybrid CNN with truncated directional intensity computation; the truncated directional intensity difference is employed to decrease radiometric errors. The proposed method's raw matching cost then goes through a cost aggregation step using a bilateral filter (BF) to improve accuracy. Winner-take-all (WTA) optimization uses the aggregated cost volume to produce an initial disparity map, and a series of refinement processes enhances it into a more accurate final disparity map. The algorithm's performance was verified using the Middlebury online stereo benchmarking system. The proposed algorithm achieves its objective of generating a more accurate and smooth disparity map with different depths in low-texture regions through better matching cost quality.
Disparity Estimation by a Real Time Approximation Algorithm - CSCJournals
This paper presents a real-time approximation algorithm for estimating the disparity of stereo images. The approximation is achieved by shrinking the original left and right images: (i) the left and right images are shrunk three times, and (ii) the disparity image is computed from the shrunk left and right images, then extrapolated back to the original image size to reconstruct the full-resolution disparity image. The computational time of the proposed algorithm is less than that of existing methods, approximately real time, and it requires less memory space. The method is applied to standard stereo images, and the results show that it reduces computational time by about 76.34% with no appreciable degradation of accuracy.
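The shrink-then-extrapolate idea can be sketched as follows (a generic illustration with our own helper names and a pluggable matcher; note that the disparity values, not just the image grid, must be rescaled by the shrink factor):

```python
import numpy as np

def downscale(img, factor):
    """Naive box downscale by an integer factor (block averaging)."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def approx_disparity(left, right, factor, match_fn):
    """Estimate disparity on shrunk images, then rescale to original size.

    match_fn(left, right) -> disparity map at the shrunk resolution.
    """
    small = match_fn(downscale(left, factor), downscale(right, factor))
    # Nearest-neighbour extrapolation back to the original grid,
    # with disparity values scaled up by the same factor.
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return factor * up
```

The speedup comes from `match_fn` running on a grid with factor² fewer pixels and a proportionally smaller disparity search range.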
Development of 3D convolutional neural network to recognize human activities ... - journalBEEI
Human activity recognition (HAR) is now used in numerous applications, including smart homes, to monitor human behavior, automate homes according to human activities, and support entertainment, fall detection, violence detection, and people care. Vision-based recognition is the most powerful method widely used to implement HAR systems due to its ability to recognize complex human activities. This paper addresses the design of a 3D convolutional neural network (3D-CNN) model that can be used in smart homes to identify several activities. The model is trained using the KTH dataset, which contains activities such as walking, running, jogging, handwaving, handclapping, and boxing. Despite the challenges of this method due to variations in illumination, background, and human body shape, the proposed model reached an accuracy of 93.33%. The model was implemented, trained, and tested on a machine of moderate computational power, and the results show that the proposal successfully recognizes human activities with reasonable computation.
Development of stereo matching algorithm based on sum of absolute RGB color d... - IJECEIAES
This article presents a local stereo matching algorithm built from block matching and two edge-preserving filters. Fundamentally, the matching process consists of several stages that produce the disparity or depth map. The most challenging part of the matching process is finding accurate corresponding points between the two images. Hence, this article proposes a stereo matching algorithm using an improved sum of absolute RGB differences (SAD), gradient matching, and edge-preserving bilateral filters (BF) to boost accuracy. SAD and gradient matching are applied in the first stage to obtain a preliminary correspondence result; a BF then acts as an edge-preserving filter to remove noise from the first stage. A second BF is used in the last stage to improve the final disparity map and sharpen object boundaries. The experimental analysis and validation use the Middlebury standard benchmarking evaluation system. The results show that the proposed work increases accuracy while preserving object edges. To position the proposed work against currently available methods, quantitative measurements were made against other existing methods, and they show that the work in this article performs much better.
Convolutional neural network with binary moth flame optimization for emotion ... - IAESIJAI
Electroencephalograph (EEG) signals reflect brain activity in real time, and using EEG signals to analyze human emotional states is a common area of study. The EEG signatures of emotions are not distinctive and differ from one person to another, as everyone has different emotional responses to the same stimuli. EEG signals are therefore subject-dependent and have proven effective for subject-dependent emotion detection. To achieve enhanced accuracy and a high true positive rate, the proposed system uses a binary moth flame optimization (BMFO) algorithm for feature selection and a convolutional neural network (CNN) for classification. In this proposal, optimal features are chosen using accuracy as the objective function; the optimally chosen features are then classified with the CNN to discriminate between different emotional states.
A novel ensemble model for detecting fake news - IAESIJAI
Given the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of news articles as either real or fake. For this purpose, we opt for a blending technique that combines three models: bidirectional long short-term memory (Bi-LSTM), a stochastic gradient descent classifier, and a ridge classifier. The implementation of the proposed model (BI-LSR) on real-world datasets has shown outstanding results, achieving an accuracy score of 99.16%. This ensemble has thus proven to perform better than individual conventional machine learning and deep learning models, as well as many ensemble learning approaches cited in the literature.
K-centroid convergence clustering identification in one-label per type for di... - IAESIJAI
Disease prediction is a high-demand field that requires significant support from machine learning (ML) to improve result efficiency. This research applies K-means-based clustering to supervised classification for disease prediction when each class has only one labeled example. The K-centroid convergence clustering identification (KC3I) system is based on semi-K-means clustering but requires only a single labeled example per class; the training dataset is then used to update the centroids. The KC3I model also includes a dictionary box that indexes all centroids before and after the updating process, with each centroid matched to a corresponding label. After training, whenever input features arrive, the trained model assigns them to the cluster of the nearest centroid by Euclidean distance and converts them to the class name associated with that centroid's index. Two validation stages were carried out and met expectations in terms of precision, recall, F1-score, and absolute accuracy. The last part demonstrates the possibility of feature reduction by selecting the most important features with the extra trees classifier method: feeding the KC3I system only the most important features retains the same accuracy.
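The one-label-per-class training loop described above can be sketched as follows (a simplified illustration of the semi-K-means idea with our own function names; the dictionary box reduces to the seed's array index here):

```python
import numpy as np

def train_centroids(seeds, X, epochs=5):
    """Semi-K-means with one labelled seed per class.

    seeds: (k, d) array, one labelled example per class (initial centroids).
    X: (n, d) unlabelled training features used to update the centroids.
    """
    centroids = seeds.astype(float).copy()
    for _ in range(epochs):
        # Assign every point to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(axis=0)
    return centroids

def predict(centroids, x):
    """Class index = index of the nearest trained centroid."""
    return int(np.linalg.norm(centroids - x, axis=1).argmin())
```

Because each centroid starts at a labelled example, the cluster index it converges to stays tied to that class label throughout training.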
Plant leaf detection through machine learning based image classification appr... - IAESIJAI
Since maize is a staple food, especially for vegetarians and vegans, maize leaf disease has a significant influence on the food industry, including maize crop productivity. Maize quality must therefore be optimal, which requires safeguarding maize from several illnesses. As a result, there is great demand for an automated system that can identify the condition early on and trigger the appropriate action. Early disease identification is crucial but also poses a major obstacle. In this research project, we adopt the basic k-nearest neighbor (KNN) model and concentrate on building and developing an enhanced k-nearest neighbor (EKNN) model. EKNN helps identify several classes of disease. Additional high-quality fine and coarse features are generated to gather discriminative, boundary, pattern, and structural information, which is then used in classification; the classification algorithm also offers high-quality gradient-based features. The proposed model is assessed using the Plant-Village dataset and compared with many standard classification models using various metrics.
Backbone search for object detection for applications in intrusion warning sy... - IAESIJAI
In this work, we propose a novel backbone search method for object detection for applications in intrusion warning systems. The goal is to find a compact model for use in embedded thermal imaging cameras widely used in intrusion warning systems. The proposed method is based on faster region-based convolutional neural network (Faster R-CNN) because it can detect small objects. Inspired by EfficientNet, the sought-after backbone architecture is obtained by finding the most suitable width scale for the base backbone (ResNet50). The evaluation metrics are mean average precision (mAP), number of parameters, and number of multiply–accumulate operations (MACs). The experimental results showed that the proposed method is effective in building a lightweight neural network for the task of object detection. The obtained model can keep the predefined mAP while minimizing the number of parameters and computational resources. All experiments are executed elaborately on the person detection in intrusion warning systems (PDIWS) dataset.
Deep learning method for lung cancer identification and classification - IAESIJAI
Lung cancer (LC) is claiming many lives and is becoming a serious cause of concern. Detecting LC at an early stage improves the chances of recovery, and the accuracy of early-stage detection can be improved with a convolutional neural network (CNN) based deep learning approach. In this paper, we present two methodologies for lung cancer detection (LCD) applied to the Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) datasets. Classification of the LC images is carried out using a support vector machine (SVM) and a deep CNN; the CNN is trained with i) multiple batches and ii) a single batch to classify images as cancer or non-cancer. All methods are implemented in MATLAB. The classification accuracy obtained by the SVM is 65%, whereas the deep CNN produced detection accuracies of 80% and 100% for multiple-batch and single-batch training, respectively. The novelty of our experimentation is the near-100% classification accuracy obtained by our deep CNN model when tested on 25 lung computed tomography (CT) test images, each of size 512×512 pixels, in fewer than 20 iterations, compared with research by other groups that used cropped LC nodule images.
Optically processed Kannada script realization with Siamese neural network model - IAESIJAI
Optical character recognition (OCR) is a technology that allows computers to recognize and extract text from images or scanned documents; it is commonly used to convert printed or handwritten text into a machine-readable format. This study presents an OCR system for Kannada characters based on a Siamese neural network (SNN). The SNN, a deep neural network comprising two identical convolutional neural networks (CNNs), compares script samples and ranks them by dissimilarity; when a low dissimilarity score is obtained, the pair is predicted to be a character match. In this work, the authors use 5 classes of Kannada characters, which are first preprocessed by grayscaling and converted to pgm format. These are input directly into the deep convolutional network, which learns from matching and non-matching image pairs across the two CNNs using a contrastive loss function in the Siamese architecture. The proposed OCR system requires far less time and gives more accurate results than a regular CNN. The model can become a powerful tool for identification, particularly in situations with a high degree of variation in writing styles or limited training data.
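The contrastive loss used to train such a Siamese pair can be sketched as follows (the generic form of the loss, not the paper's exact hyperparameters):

```python
import numpy as np

def contrastive_loss(e1, e2, same, margin=1.0):
    """Contrastive loss on one pair of embeddings.

    same=1 pulls matching pairs together (penalizes distance);
    same=0 pushes non-matching pairs at least `margin` apart.
    """
    d = np.linalg.norm(e1 - e2)
    return same * d ** 2 + (1 - same) * max(margin - d, 0.0) ** 2
```

At inference, the two CNN branches embed the query and a reference character, and a small embedding distance is read as a character match.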
Embedded artificial intelligence system using deep learning and raspberrypi f... - IAESIJAI
Melanoma is a kind of skin cancer that originates in melanocytes, the cells responsible for producing melanin; it can be a severe and potentially deadly form of cancer because it can metastasize to other regions of the body if not detected and treated early. To facilitate early detection, various computer-assisted, low-cost, reliable, and accurate diagnostic systems based on artificial intelligence (AI) algorithms, particularly deep learning techniques, have recently been proposed. This work proposes an innovative and intelligent system that combines the internet of things (IoT) with a Raspberry Pi connected to a camera and a deep convolutional neural network (CNN) model for real-time detection and classification of melanoma lesions. The key stages of our model before serialization to the Raspberry Pi are: first, a preprocessing part containing data cleaning, data transformation (normalization), and data augmentation to reduce overfitting during training; then, the deep CNN algorithm for feature extraction; and finally, a classification part with a sigmoid activation function. The experimental results indicate the efficiency of our proposed classification system: we achieved an accuracy of 92%, a precision of 91%, a sensitivity of 91%, and an area under the receiver operating characteristic curve (AUC-ROC) of 0.9133.
Deep learning based biometric authentication using electrocardiogram and iris - IAESIJAI
Authentication systems play an important role in a wide range of applications. Traditional token-, certificate-, and password-based authentication systems are now being replaced by biometric ones, generally based on data obtained from the face, iris, electrocardiogram (ECG), fingerprint, or palm print. However, such unimodal authentication models suffer from accuracy and reliability issues, and multimodal biometric authentication systems have therefore gained huge attention as a way to build robust authentication systems. Moreover, current developments in deep learning have enabled more robust architectures that overcome the issues of traditional machine learning based authentication systems. In this work, we adopt ECG and iris data and train the extracted features with a hybrid convolutional neural network-long short-term memory (CNN-LSTM) model. For ECG, R-peak detection is an important step for feature extraction, and morphological features are extracted; for iris data, Gabor-wavelet, gray level co-occurrence matrix (GLCM), gray level difference matrix (GLDM), and principal component analysis (PCA) based feature extraction methods are applied. The final feature vector, obtained from the MIT-BIH and IIT Delhi Iris datasets, is trained and tested using the CNN-LSTM. The experimental analysis shows that the proposed approach achieves an average accuracy, precision, and F1-score of 0.985, 0.962, and 0.975, respectively.
Hybrid channel and spatial attention-UNet for skin lesion segmentation - IAESIJAI
Melanoma is a type of skin cancer that has affected many lives globally. American Cancer Society research suggests that it is a serious type of skin cancer that can lead to mortality, yet it is almost 100% curable if detected and treated in its early stages. Automated computer vision-based schemes are now widely adopted, but these systems suffer from poor segmentation accuracy. Deep learning (DL) has become a promising solution, performing extensive training for pattern learning and providing better classification accuracy. However, skin lesion segmentation is hampered by skin hair, unclear boundaries, pigmentation, and moles. To overcome these issues, we adopt a UNet-based deep learning scheme and incorporate an attention mechanism that combines low-level and high-level statistics with feedback and skip-connection modules; this yields robust features without neglecting channel information. Further, we use channel attention and spatial attention modulation to achieve the final segmentation. The proposed DL scheme is evaluated on a publicly available dataset, and experimental investigation shows that the proposed hybrid attention UNet approach achieves average performance figures of 0.9715, 0.9962, and 0.9710.
Photoplethysmogram signal reconstruction through integrated compression sensi... - IAESIJAI
The transmission of photoplethysmogram (PPG) signals in real time is extremely challenging and motivates the use of an internet of things (IoT) environment for healthcare monitoring. This paper proposes an approach to PPG signal reconstruction through integrated compression sensing and basis function aware shallow learning (CSBSL). The integrated CSBSL approach jointly compresses PPG signals across multiple channels, thereby improving the reconstruction accuracy essential in healthcare monitoring. An optimal basis function aware shallow learning procedure is employed on PPG signals with prior initialization; this is further fine-tuned using knowledge from the other channels, which exploits the additional sparsity of the PPG signals. The proposed learning method, combined with the PPG signals, retains knowledge of spatial and temporal correlation. The proposed integrated CSBSL approach consists of two steps; in the first step, basis function based shallow learning is carried out by training on the PPG signals. The proposed method is evaluated on multichannel PPG signal reconstruction, which potentially benefits clinical applications through PPG monitoring and diagnosis.
Speaker identification under noisy conditions using hybrid convolutional neur... - IAESIJAI
Speaker identification is a biometric task that classifies or identifies a person among other speakers based on speech characteristics. Recently, deep learning models have outperformed conventional machine learning models in speaker identification. Spectrograms of clean speech have been used as input in deep learning based speaker identification; however, the performance of speaker identification systems degrades under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning based speaker recognition under noisy and mismatched conditions. Moreover, hybrid combinations of convolutional neural networks (CNNs) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants alone in recent studies. Yet no attempt has been made to use a hybrid CNN with enhanced RNN variants on cochleogram input to improve speaker identification under noisy and mismatched conditions. In this study, speaker identification using a hybrid CNN and gated recurrent unit (GRU) with cochleogram input is proposed for noisy conditions. The VoxCeleb1 audio dataset, with real-world noises, with white Gaussian noise (WGN), and without additive noise, was employed for the experiments. The results and the comparison with existing works show that the proposed model performs better than the other models in this study and existing works.
Multi-channel microseismic signals classification with convolutional neural n... - IAESIJAI
Identifying and classifying microseismic signals is essential for warning of dangers in mines. Deep learning has replaced traditional methods, but labor-intensive manual identification and variable deep learning outcomes still pose challenges. This paper proposes a transfer learning based convolutional neural network (CNN) method, called microseismic signals-convolutional neural network (MS-CNN), to automatically recognize and classify microseismic events and blasts. The model was trained on a limited data sample to obtain an optimal weight model for microseismic waveform recognition and classification. A comparative analysis was performed against an existing CNN model and classical image classification models such as AlexNet, GoogLeNet, and ResNet50. The outcomes demonstrate that the MS-CNN model achieved the best recognition and classification results (99.6% accuracy) in the shortest time (0.31 s to identify the 277 images in the test set). The MS-CNN model can thus efficiently recognize and classify microseismic events and blasts in practical engineering applications, improving the timeliness of microseismic signal recognition and further enhancing the accuracy of event classification.
Sophisticated face mask dataset: a novel dataset for effective coronavirus di... - IAESIJAI
Efficient and accurate coronavirus disease (COVID-19) surveillance necessitates robust identification of individuals wearing face masks. This research introduces the sophisticated face mask dataset (SFMD), a comprehensive compilation of high-quality face mask images enriched with detailed annotations on mask types, fits, and usage patterns. Leveraging the cutting-edge deep learning models EfficientNet-B2, ResNet50, and MobileNet-V2, we compare SFMD against two established benchmarks: the real-world masked face dataset (RMFD) and the masked face recognition dataset (MFRD). Across all models, SFMD consistently outperforms RMFD and MFRD on key metrics, including accuracy, precision, recall, and F1-score. Additionally, our study demonstrates the dataset's capability to cultivate robust models resilient to intricate scenarios such as low-light conditions and facial occlusions due to accessories or facial hair.
Transfer learning for epilepsy detection using spectrogram images - IAESIJAI
Epilepsy stands out as one of the most common neurological diseases. The neural activity of the brain is observed using electroencephalography (EEG), but manual inspection of EEG brain signals is a slow and arduous process that places a heavy load on neurologists and affects their performance. The aim of this study is to find the best classification result using transfer learning models that automatically distinguish epileptic from normal activity, classifying EEG signals from spectrogram images that represent the percentage of energy in each continuous wavelet coefficient. The dataset comprises EEG signals recorded at an epilepsy monitoring unit. The study compares three models, AlexNet, visual geometry group (VGG19), and the residual neural network ResNet, in different combinations with seven different classifiers, and judges their performance on accuracy and other metrics. The best combination was ResNet with a support vector machine (SVM) classifier, which classified the EEG signals with a high success rate across multiple performance metrics, including 97.22% accuracy and a 2.78% error rate.
Deep neural network for lateral control of self-driving cars in urban environ... - IAESIJAI
The exponential growth of the automotive industry clearly indicates that self-driving cars are the future of transportation. However, their biggest challenge lies in lateral control, particularly in urban bottlenecking environments, where disturbances and obstacles are abundant. In these situations, the ego vehicle has to follow its own trajectory while rapidly correcting deviation errors without colliding with other nearby vehicles. Various research efforts have focused on developing lateral control approaches, but these methods remain limited in terms of response speed and control accuracy. This paper presents a control strategy using a deep neural network (DNN) controller to effectively keep the car on the centerline of its trajectory and adapt to disturbances arising from deviations or trajectory curvature. The controller focuses on minimizing deviation errors. The Matlab/Simulink software is used for designing and training the DNN. Finally, simulation results confirm that the suggested controller has several advantages in terms of precision, with lateral deviation remaining below 0.65 meters, and rapidity, with a response time of 0.7 seconds, compared to traditional controllers in solving lateral control.
Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...IAESIJAI
Recently, cardiovascular diseases (CVDs) have become a rapidly growing problem in the world, especially in developing countries. The latter are facing a lifestyle change that introduces new risk factors for heart disease, that requires a particular and urgent interest. Besides, cardiomegaly is a sign of cardiovascular diseases that refers to various conditions; it is associated with the heart enlargement that can be either transient or permanent depending on certain conditions. Furthermore, cardiomegaly is visible on any imaging test including Chest X-Radiation (X-Ray) images; which are one of the most common tools used by Cardiologists to detect and diagnose many diseases. In this paper, we propose an innovative deep learning (DL) model based on an attention module and MobileNet architecture to recognize Cardiomegaly patients using the popular Chest X-Ray8 dataset. Actually, the attention module captures the spatial relationship between the relevant regions in Chest X-Ray images. The experimental results show that the proposed model achieved interesting results with an accuracy rate of 81% which makes it suitable for detecting cardiomegaly disease.
Efficient commodity price forecasting using long short-term memory modelIAESIJAI
Predicting commodity prices, particularly food prices, is a significant concern for various stakeholders, especially in regions that are highly sensitive to commodity price volatility. Historically, many machine learning models like autoregressive integrated moving average (ARIMA) and support vector machine (SVM) have been suggested to overcome the forecasting task. These models struggle to capture the multifaceted and dynamic factors influencing these prices. Recently, deep learning approaches have demonstrated considerable promise in handling complex forecasting tasks. This paper presents a novel long short-term memory (LSTM) network-based model for commodity price forecasting. The model uses five essential commodities namely bread, meat, milk, oil, and petrol. The proposed model focuses on advanced feature engineering which involves moving averages, price volatility, and past prices. The results reveal that our model outperforms traditional methods as it achieves 0.14, 3.04%, and 98.2% for root mean square error (RMSE), mean absolute percentage error (MAPE), and R-squared (R2 ), respectively. In addition to the simplicity of the model, which consists of an LSTM single-cell architecture that reduced the training time to a few minutes instead of hours. This paper contributes to the economic literature on price prediction using advanced deep learning techniques as well as provides practical implications for managing commodity price instability globally.
1-dimensional convolutional neural networks for predicting sudden cardiacIAESIJAI
Sudden cardiac arrest (SCA) is a serious heart problem that occurs without symptoms or warning. SCA causes high mortality. Therefore, it is important to estimate the incidence of SCA. Current methods for predicting ventricular fibrillation (VF) episodes require monitoring patients over time, resulting in no complications. New technologies, especially machine learning, are gaining popularity due to the benefits they provide. However, most existing systems rely on manual processes, which can lead to inefficiencies in disseminating patient information. On the other hand, existing deep learning methods rely on large data sets that are not publicly available. In this study, we propose a deep learning method based on one-dimensional convolutional neural networks to learn to use discrete fourier transform (DFT) features in raw electrocardiogram (ECG) signals. The results showed that our method was able to accurately predict the onset of SCA with an accuracy of 96% approximately 90 minutes before it occurred. Predictions can save many lives. That is, optimized deep learning models can outperform manual models in analyzing long-term signals.
A deep learning-based approach for early detection of disease in sugarcane pl...IAESIJAI
In many regions of the nation, agriculture serves as the primary industry. The farming environment now faces a number of challenges to farmers. One of the major concerns, and the focus of this research, is disease prediction. A methodology is suggested to automate a process for identifying disease in plant growth and warning farmers in advance so they can take appropriate action. Disease in crop plants has an impact on agricultural production. In this work, a novel DenseNet-support vector machine: explainable artificial intelligence (DNet-SVM: XAI) interpretation that combines a DenseNet with support vector machine (SVM) and local interpretable model-agnostic explanation (LIME) interpretation has been proposed. DNet-SVM: XAI was created by a series of modifications to DenseNet201, including the addition of a support vector machine (SVM) classifier. Prior to using SVM to identify if an image is healthy or un-healthy, images are first feature extracted using a convolution network called DenseNet. In addition to offering a likely explanation for the prediction, the reasoning is carried out utilizing the visual cue produced by the LIME. In light of this, the proposed approach, when paired with its determined interpretability and precision, may successfully assist farmers in the detection of infected plants and recommendation of pesticide for the identified disease.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
A deep learning based stereo matching model for autonomous vehicle
IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 12, No. 1, March 2023, pp. 87~95
ISSN: 2252-8938, DOI: 10.11591/ijai.v12.i1.pp87-95 87
Journal homepage: http://ijai.iaescore.com
Deepa¹, Jyothi Kupparu²
¹Department of Information Science and Engineering, Nitte Mahalinga Adyanthaya Memorial Institute of Technology-Affiliated to Nitte (Deemed to be University), Nitte, India
²Department of Information Science and Engineering, Jawaharlal Nehru National College of Engineering, Visvesvaraya Technological University, Shimoga, India
Article Info
Article history:
Received Dec 23, 2021
Revised Jul 23, 2022
Accepted Aug 21, 2022

ABSTRACT
Autonomous vehicles are one of the prominent areas of research in computer vision. In today's AI world, the concept of autonomous vehicles has become popular largely to avoid accidents caused by driver negligence. Accurately perceiving the depth of the surrounding region is a challenging task for autonomous vehicles. Sensors such as light detection and ranging can be used for depth estimation, but these sensors are expensive. Hence, stereo matching is an alternative solution for estimating depth. The main difficulty observed in stereo matching is minimizing mismatches in ill-posed regions such as occluded, textureless and discontinuous regions. This paper presents an efficient deep stereo matching technique for estimating the disparity map from stereo images in ill-posed regions. Images from the Middlebury stereo dataset are used to assess the efficacy of the proposed model. The experimental outcome depicts that the proposed model generates reliable results in occluded, textureless and discontinuous regions as compared to existing techniques.
Keywords:
Convolutional neural networks
Disparity
Generative adversarial network
Ill posed regions
Stereo matching
This is an open access article under the CC BY-SA license.
Corresponding Author:
Deepa
Department of Information Science and Engineering, Nitte Mahalinga Adyanthaya Memorial Institute of
Technology-Affiliated to Nitte (Deemed to be University)
Nitte, India
Email: deepashetty17@nitte.edu.in
1. INTRODUCTION
Autonomous vehicles are a prominent research topic in computer vision. To make a driving decision, the three-dimensional (3D) view of the region surrounding the vehicle must be measured correctly in real time. The precision of the depth map is crucial for the safety of autonomous vehicles. In these vehicles the depth details of the surrounding region are usually extracted using hardware such as light detection and ranging sensors. These sensors are expensive to install and also have certain drawbacks that may lower the quality of the depth information. They do not provide additional information, such as traffic light color, which plays a major role in decision making. Computer vision based stereo matching could be an alternative solution to overcome this drawback. The aim of stereo matching is to find matching pixels between images taken from different viewpoints and then estimate the depth [1]–[3]. It finds applications in augmented reality, robotics, and 3D reconstruction [4]–[9]. Stereo vision tries to imitate the process carried out by the human eyes and brain. A scene captured by two horizontally displaced cameras forms two slightly different projections. Disparity is the horizontal displacement of an object between these two views. A map that contains the displacement of every pixel in an image is known as a disparity map. The depth of a scene can be estimated from this disparity map.
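The disparity-to-depth relation follows from triangulation: for a rectified stereo pair with focal length f (in pixels) and camera baseline B, depth is Z = f·B/d. A minimal sketch with hypothetical camera values (not taken from the paper):

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (metres) via Z = f*B/d."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full_like(d, np.inf)
    valid = d > 0                      # zero disparity corresponds to a point at infinity
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

disparity = np.array([[8.0, 16.0], [0.0, 4.0]])
depth = depth_from_disparity(disparity, focal_px=800.0, baseline_m=0.1)
# nearer objects produce larger disparity and hence smaller depth
```

Note the inverse relationship: doubling the disparity halves the estimated depth.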
In the recent past many stereo algorithms were proposed [10]–[12]. A classic stereo algorithm mainly follows three steps: computing pixel-wise features, constructing a cost volume, and post-processing. Traditional stereo matching methods are grouped into local, global and semi-global methods. Local methods rely on low-level pixel features to compute the similarity in the cost computation step. They estimate the correspondence by means of a window or support region [13]–[15]. Since the pixel-wise characterization plays a major role, researchers have used a wide variety of representations, varying from a simple RGB representation of pixels to descriptors such as the census transform and the scale-invariant feature transform. A segment-based superpixel technique is proposed in [16]. After finding the edges and the matching cost, adaptive support weights are used in cost aggregation, and a dual-path refinement corrects the disparities. Stereo matching based on an adaptive cross region and guided filtering with orthogonal weights (ACR-GIF-OW) is proposed in [17]. These techniques are computationally inexpensive but do not produce accurate results in textureless, discontinuous and occluded areas.
Global methods handle textureless regions or uneven surfaces by including a smoothness cost. They make use of a global energy function, which is minimized step by step to compute the disparity by treating matching as a labelling problem: the pixels are considered as nodes and the estimated disparities as labels. Global methods use a data term and a smoothness term in the energy function to produce a smooth disparity map. Graph cut [18], dynamic programming [19] and belief propagation [20] are the classic global matching algorithms. A tree structure named pyramid-tree is proposed in [21] that performs cross-regional smoothing and handles regions of low texture. In addition, the authors used a log angle for cost computation, which is robust to inconsistencies. The performance of global methods is limited because these approaches depend on hand-crafted features and hence do not produce accurate results.
Convolutional neural networks (CNNs) are popular in various vision applications [22]–[24] and are widely used in stereo matching, where they improve performance compared to traditional methods. Kendall et al. [25] presented an architecture that learns disparity without regularization. Features are extracted automatically using a CNN without any manual intervention, and these features are used to perform stereo matching that can handle textureless regions or uneven surfaces. Eigen et al. [26] made use of basic neural networks to determine the depth of a scene. They used the AlexNet architecture to generate a coarse map, followed by another network that performs local refinements. The work proposed in [27] used a multi-stage framework that combined random forests and CNNs. An architecture named neural regression forest is used to find depth from a single input image and allows parallel training of all CNNs. Finally, a bilateral filter was used to obtain a refined disparity map. A similar concept is presented in [28], where many tiny neural networks were trained across overlapping patches. DispNet is one of the basic networks used for disparity estimation. A cascading residual learning network that extends the DispNet structure is used in [29]. It is obtained by using DispFullNet and DispResNet: the initial stage uses DispNet with an additional up-convolution module, which helps to extract more information, and the next stage generates a residual signal that helps in refinement. A trainable network is explained in [30]. It uses a robust differentiable PatchMatch internal structure that discards most disparities without fully evaluating the cost volume, which reduces the search space and increases memory and time efficiency. The main drawback of existing methods is that ill-posed regions are not handled effectively. In the proposed method a CNN is combined with an optimization technique. The CNN replaces the hand-crafted term with learned features, and its output is used to calculate the unary and smoothness costs. The smoothness cost is added by taking information from the neighboring pixels; it estimates contrast-sensitive information to obtain a smooth disparity map. Post-processing is performed to handle occlusion.
In stereo vision, areas visible in one view may not be visible in the other, and it is often difficult to reconstruct such regions in one image by looking at the other. The losses computed in these areas are noisy, leading to inaccurate results, specifically in the occluded areas. Disparity refinement is implemented to enhance the accuracy of matching in ill-posed areas. The left-right consistency check is the common method used to identify and handle the outliers. Even though several methods were proposed in the past to enhance the efficiency of matching, the low-accuracy problem, especially in ill-posed areas, has not been handled very well. In order to handle these areas, post-processing is performed by means of a generative adversarial network (GAN), a model put forward by Goodfellow [31]. A GAN is a structure used for training generative models based on the concept of a min-max game. Two models, a generator and a discriminator, are used to analyze the distribution of the data. The generator tries to learn a distribution that is close to the real data distribution. The ability of GANs to generate high-quality images makes them applicable in several image processing applications. An encoder-decoder structure is used for training in reconstructing the images. This model can produce various realistic representations of the input by altering the attribute values. A conditional adversarial network [32] can be used for image translation, which converts an image from one representation to another, such as day to night.
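The left-right consistency check mentioned above can be sketched as follows. This is the generic formulation of the check, not necessarily the exact procedure used in the paper: a left-image pixel at (y, x) with disparity d should land at (y, x − d) in the right map with roughly the same disparity, otherwise it is flagged as an outlier (often an occlusion).

```python
import numpy as np

def left_right_check(disp_left, disp_right, tol=1.0):
    """Flag pixels whose left and right disparities disagree (likely occlusions)."""
    h, w = disp_left.shape
    valid = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = int(round(disp_left[y, x]))
            xr = x - d                       # corresponding column in the right view
            if 0 <= xr < w and abs(disp_left[y, x] - disp_right[y, xr]) <= tol:
                valid[y, x] = True
    return valid
```

Pixels failing the check are typically filled later, which is the role the GAN-based refinement plays in this paper.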
We propose a hybrid CNN based deep stereo network model (CDSN) to estimate a disparity map that produces accurate results. Loopy belief propagation is used to compute the initial disparity map from features extracted by the CNN. A generative adversarial network is then used to handle the ill-posed regions in the disparity map. The generated images look more realistic and closer to the ground truth disparity map. The obtained results show that the proposed CDSN model handles ill-posed regions such as discontinuities at image boundaries and occluded areas effectively, and it outperforms other existing techniques on the Middlebury dataset [33]. The paper is organized as follows: section 2 explains the proposed CDSN model, section 3 depicts the results of the proposed model, and section 4 presents the conclusions.
2. METHOD
A CNN based model is proposed for stereo matching to find the disparity map. The features extracted from the CNN are used to compute the unary cost and the smoothness cost. A global energy function is adapted to get the initial disparity map, and a GAN model is used to handle the ill-posed regions. Table 1 lists the symbols with their descriptions. The flow chart of the proposed model is displayed in Figure 1.
Table 1. Symbol table
Symbol              Description
D(d_i)              Unary cost at pixel i
V_i^l               Left feature vector
V_i^r               Right feature vector
S(d_i, d_j)         Smoothness cost
α and β             Smoothness constants
msg^t_{i→j}(d_i)    Message from pixel i to j at iteration t
D(p, q)             Discriminator
E_{p,q}             Expected value over all real data instances
G(p, r)             Generator
E_{p,r}             Expected value over all generated instances
D_g                 Ground truth disparity map
D_t                 Estimated disparity map
T                   Threshold value
N                   Total number of pixels
Figure 1. Flowchart of the CDSN model
2.1. CNN feature extraction
Conventional algorithms for stereo matching focuses on hand crafted features which leads to
inadequate image information. CNN is used for the various vision problems including stereo matching. The
CNN can extract local context better, hence it is robust to any photometric differences. The feature
descriptors are extracted from rectified stereo images using a pre-trained visual geometry group (VGG-16)
model [34]. The VGG-16 model is trained using ImageNet dataset which contains 14 million labelled images
that are of high resolution that belong to 1,000 classes. The output of the 9th layer is used for stereo matching
in the proposed model as it presents an appropriate feature space for computing disparity. VGG-16 uses a
max pool layer that select the maximum element from the input map using a filter of 2×2. The first and
second layers include 64 channels of 3×3 kernel size which is followed by max pool function of stride 2, 2.
The third and fourth layers include 128 channels of 3×3 kernel followed by max pool function of stride 2, 2.
The next three layers include 256 channels of 3×3 kernel that is followed by max pool function of stride 2, 2.
Eighth and ninth layers include 512 channels of 3×3 kernel size. An N-dimensional feature vector is obtained
for every location of pixel.
2.2. Initial disparity map estimation
The extracted feature descriptors are used to determine the matching cost of every pixel in the left feature map. We search horizontally along the right feature map for the best matching value. The unary matching cost is calculated as the Euclidean distance between the two feature descriptors using (1).

$D(d_i) = \min \| V_i^l - V_i^r \|$ (1)
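The unary term of (1) can be sketched as a cost volume over candidate disparities, using Euclidean distance between hypothetical per-pixel descriptors; this is an illustrative NumPy formulation, not the paper's implementation:

```python
import numpy as np

def unary_cost_volume(feat_l, feat_r, max_disp):
    """Euclidean matching cost D(d_i) between left and right feature maps.

    feat_l, feat_r: (H, W, N) arrays of per-pixel descriptors. The search is
    purely horizontal, so disparity d compares left (y, x) with right (y, x-d).
    Returns a cost volume of shape (H, W, max_disp + 1).
    """
    h, w, _ = feat_l.shape
    cost = np.full((h, w, max_disp + 1), np.inf)
    for d in range(max_disp + 1):
        diff = feat_l[:, d:, :] - feat_r[:, :w - d, :]
        cost[:, d:, d] = np.linalg.norm(diff, axis=2)
    return cost

# A winner-take-all argmin over the last axis gives an (unsmoothed) disparity map.
```

In the proposed method this raw cost is not used directly: the smoothness term and loopy belief propagation described next regularize it.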
Unary cost alone may not yield optimal results in textureless, repetitive-pattern and discontinuous regions. The smoothness cost is used to smooth the unary cost. Many smoothing techniques have been proposed in the recent past. Most of these methods use random variables to hold the disparity of a pixel and encode a smoothness cost based on standard constants. The smoothness cost is estimated from neighbouring pixel information and penalizes inconsistent disparity values. It is computed using (2),

$S(d_i, d_j) = \frac{\alpha (d_i - d_j)^2}{(d_i - d_j)^2 + \beta}$ (2)
Let $P$ represent the pixels in the image. The initial disparity map $d_i$ of each pixel $i \in P$ is estimated using the energy function $E$

$E(d) = \sum_{i \in P} D(d_i) + \sum_{(i,j) \in N} S(d_i, d_j)$ (3)
The proposed method uses the max-product variant of loopy belief propagation (LBP) [20] to obtain the best disparity map. LBP assigns a label to each pixel by imposing global constraints through message passing. It is an iterative method in which messages are passed to the left, right, top and bottom neighbours in each iteration. In each iteration $t$, a message is passed from pixel $i$ to pixel $j$ using (4),

$msg_{i \to j}^{t}(d_j) = \min_{d_i} \left[ D(d_i) + S(d_i, d_j) + \sum_{a \in N(i) \setminus j} msg_{a \to i}^{t-1}(d_i) \right]$ (4)

Here $a$ ranges over all neighbours of $i$ except $j$.
The belief is calculated by (5).

$Belief(d_i) = D(d_i) + \sum_{k \in N(i)} msg_{k \to i}^{T}(d_i)$ (5)

The value of $d$ ranges from 0 to the maximum disparity and $k$ ranges over the neighbours of pixel $i$. The smooth disparity for iteration $T$ is obtained by minimizing $Belief(d_i)$. It was observed that the energy minimization became constant after 10 iterations; hence the proposed algorithm uses 10 iterations.
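A minimal min-sum LBP sketch consistent with (2)–(5) is shown below. The loops are written for clarity rather than speed, and the values of alpha, beta and the unary costs are hypothetical, not the paper's settings:

```python
import numpy as np

def smoothness(di, dj, alpha=10.0, beta=1.0):
    """Pairwise cost of Eq. (2): alpha*(di-dj)^2 / ((di-dj)^2 + beta)."""
    sq = (di - dj) ** 2
    return alpha * sq / (sq + beta)

def lbp_disparity(unary, iters=10, alpha=10.0, beta=1.0):
    """Min-sum loopy belief propagation on a 4-connected pixel grid.

    unary: (H, W, L) data costs D(d_i) over L disparity labels. Returns the
    per-pixel label minimising the belief of Eq. (5) after `iters` rounds.
    """
    h, w, L = unary.shape
    lab = np.arange(L)
    S = smoothness(lab[:, None], lab[None, :], alpha, beta)   # (L, L) table

    def nbr(y, x):
        return [(y + dy, x + dx)
                for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0))
                if 0 <= y + dy < h and 0 <= x + dx < w]

    # msg[(i, j)]: L-vector message from pixel i to neighbour j, Eq. (4)
    msg = {(i, j): np.zeros(L)
           for y in range(h) for x in range(w)
           for i in [(y, x)] for j in nbr(y, x)}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            incoming = sum(msg[(a, i)] for a in nbr(*i) if a != j)
            total = unary[i] + incoming                       # indexed by d_i
            new[(i, j)] = np.min(total[:, None] + S, axis=0)  # minimise over d_i
        msg = new

    disp = np.empty((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            belief = unary[y, x] + sum(msg[(a, (y, x))] for a in nbr(y, x))
            disp[y, x] = int(np.argmin(belief))               # Eq. (5)
    return disp
```

With a strong enough pairwise penalty, a single pixel whose unary cost disagrees with its neighbours is smoothed over to the consensus label, which is exactly the behaviour wanted in textureless regions.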
2.3. Disparity refinement using GAN
A GAN is used to refine the disparity. This refinement model handles the ill-posed regions. The GAN performs the learning task automatically by identifying patterns and irregularities in the input data, and GANs are able to handle missing data such as occluded pixels in the disparity map. The two sub-models of a GAN are the generator and the discriminator. The generator model generates new samples and the discriminator model checks whether the generated samples are similar to the ground truth map. In the proposed model the network learns from the ground truth. The architecture of this refinement technique is given in Figure 2.
Figure 2. Architecture of disparity refinement technique
The proposed model uses the Pix2Pix GAN model [32]. Pix2Pix GAN is simple and can produce high-quality images for image translation applications. Its efficiency compared to other GANs such as CycleGAN [35] and DualGAN [36] is explained in the ablation study. The generator in Pix2Pix is a convolutional network that accepts the initial disparity map as the input image and passes it through several convolution and up-sampling layers. Finally, it produces a refined disparity map in which all the occluded areas are filled with valid data. The U-Net auto-encoding generator model is trained using an adversarial loss that encourages it to create plausible images. The encoder and decoder are made up of blocks of convolutional, activation and batch normalization layers. The generator is updated by the loss computed between the generated image and the ground truth image; this information helps the generator create more plausible images similar to the ground truth. The generator G is trained to produce outputs that cannot be distinguished from the ground truth image by a discriminator D. The GAN objective is represented as,

$L_{GAN}(G, D) = E_{p,q}[\log D(p, q)] + E_{p,r}[\log(1 - D(p, G(p, r)))]$ (6)
Here $p$ denotes a ground truth image, $q$ the generated image and $r$ the initial disparity map.

$G^* = \arg\min_G \max_D L_{GAN}(G, D)$ (7)

$G$ aims to decrease the objective while $D$ aims to increase it.
The generator $G$ tries to move the generated image closer to the ground truth image using the $L1$ loss, which is calculated as

$Loss_{L1}(G) = E_{p,q,r}[\| q - G(p, r) \|_1]$ (8)
The final objective is represented as

$G^* = \arg\min_G \max_D L_{GAN}(G, D) + \lambda \, Loss_{L1}(G)$ (9)

The visual artifacts were reduced for the value $\lambda = 100$.
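The generator side of objective (9) can be sketched in PyTorch as below. The tensors are dummies, `d_fake` stands in for $D(p, G(p, r))$, and the adversarial term uses the common non-saturating form (the generator maximises $\log D$ rather than minimising $\log(1-D)$); only $\lambda = 100$ is taken from the paper.

```python
import torch
import torch.nn.functional as F

def generator_loss(d_fake, generated, target, lam=100.0):
    """Adversarial term (generator wants D to say 'real') plus lambda * L1."""
    adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    l1 = F.l1_loss(generated, target)
    return adv + lam * l1

d_fake = torch.full((4, 1), 0.5)     # discriminator scores on generated maps (dummy)
generated = torch.zeros(4, 1, 8, 8)  # G(p, r): refined disparity map (dummy)
target = torch.zeros(4, 1, 8, 8)     # ground-truth disparity map (dummy)
loss = generator_loss(d_fake, generated, target)
```

The L1 term pulls the refined disparity toward the ground truth pixel-wise, while the adversarial term discourages the blurry averages that a pure L1 objective tends to produce.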
The network is trained on images from the Middlebury dataset [33]. The network was tested for 100, 200, 300 and 400 epochs, and the best disparity map was achieved at 300 epochs. The output from the generator is fed to the discriminator together with the ground truth image. The gradient of the loss is calculated with respect to the generator and the discriminator to update the model. The trained model is then tested to yield the best disparity map. It is observed from the results that the best disparity map was obtained by handling the ill-posed regions. Figure 3 shows the performance of the model with respect to training loss and training accuracy: Figure 3(a) depicts the training loss and Figure 3(b) the training accuracy against the number of epochs. The lower the loss, the better the accuracy.
ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 1, March 2023: 87-95
To measure the efficacy of the proposed model, we deployed and tested it on a dual Intel Xeon
E5-2609V4 (8C, 1.7 GHz, 20M cache, 6.4 GT/s) system with 128 GB memory and dual NVIDIA Tesla P100
graphics processing units (GPUs) with 3584 cores and a maximum of 18.7 TeraFLOPS. The proposed CDSN
model is evaluated on Middlebury dataset images. These images are pre-processed, rectified stereo images.
The output of the 9th layer of the pre-trained VGG-16 architecture is used for estimating the initial disparity
map using loopy belief propagation. The initial disparity map is estimated using Python. The GAN is
implemented in PyTorch. The Adam optimizer is used to train the Pix2Pix GAN for 300 epochs to handle the
ill-posed regions. The learning rate is initialized to 0.0002. The complexity of the GAN model is summarized
in Table 2.
Figure 3. Performance of the model (a) training loss versus number of epochs and (b) training accuracy
versus number of epochs
Table 2. Complexity of GAN model
Input size Optimizer Parameters Epochs Output size GPU memory GPU model
256×256×3 Adam 54.414M 300 256×256×3 128 GB Dual NVIDIA Tesla P100
3. RESULTS AND DISCUSSION
The proposed model is analyzed on images taken from the Middlebury dataset, namely “Jade
plant”, “Piano”, “Pipes”, and “Recycle”. The test images and their resolutions are shown in Table 3. The
Middlebury 2014 dataset contains 33 scenes classified into training images, additional images, and test
images. Certain images are used more than once under various exposures. Very high-resolution images are
the salient feature of the dataset. Ground-truth maps and images are given at quarter, half, and full resolution.
Table 3. Images from Middlebury 2014
Images Image resolution
Jade plant 659x497
Piano 707x481
Pipes 735x485
Recycle 720x486
3.1. Qualitative comparison
The qualitative results for estimating the disparity map are depicted in Figure 4. From top to bottom:
Jade plant, Piano, Pipes, and Recycle. Figure 4(a) shows the left image, Figure 4(b) the right image,
Figure 4(c) the ground-truth image, and Figure 4(d) the estimated disparity map.
3.2. Quantitative comparison
The percentage of bad matching pixels (PBMP) and root mean square error (RMSE) metrics were
used for quantitative analysis. Lower values of PBMP and RMSE indicate better efficiency. PBMP is
calculated as

PBMP = \left[ \frac{1}{N} \sum \big( |d_t(x, y) - d_g(x, y)| > T \big) \right] \times 100 \qquad (10)
A deep learning based stereo matching model for autonomous vehicle (Deepa)
RMSE is calculated as

RMSE = \left[ \frac{1}{N} \sum |d_t(x, y) - d_g(x, y)|^{2} \right]^{1/2} \qquad (11)
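Both metrics in (10) and (11) can be computed directly from flattened disparity maps. The sketch below operates on plain Python lists rather than the images used in the paper, and the helper names are illustrative.

```python
def pbmp(d_t, d_g, threshold=1.0):
    # Equation (10): percentage of pixels whose absolute disparity error
    # exceeds the threshold T (T = 1 in Table 4).
    bad = sum(1 for t, g in zip(d_t, d_g) if abs(t - g) > threshold)
    return 100.0 * bad / len(d_t)

def rmse(d_t, d_g):
    # Equation (11): root mean square error between computed (d_t) and
    # ground-truth (d_g) disparities.
    mse = sum((t - g) ** 2 for t, g in zip(d_t, d_g)) / len(d_t)
    return mse ** 0.5

# Toy example with four pixels; the errors are [0, 2, 0, 3], so two of the
# four pixels exceed T = 1 and PBMP is 50%.
d_t = [10.0, 12.0, 15.0, 20.0]
d_g = [10.0, 14.0, 15.0, 23.0]
assert pbmp(d_t, d_g) == 50.0
```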
For evaluation purposes, we compared the CDSN model with existing stereo matching models:
DeepPruner [30], ACR-GIF-OW [17], and efficient stereo matching by log-angle and pyramid-tree
(LPSM) [21]. The occluded areas are not dealt with efficiently in [30]. The stereo matching proposed in [17]
is computationally less expensive but does not produce accurate results in textureless and discontinuous
areas. The stereo matching proposed in [21] relies on hand-crafted matching costs, and hence the results
produced are not accurate. The Middlebury evaluation leaderboard results of the existing methods are used
for comparison. The PBMP and RMSE results of the proposed model and the existing techniques are shown
in Table 4 and Table 5, respectively. The PBMP and average RMSE of the proposed CDSN model are lower
than those of all three compared methods. Hence, the proposed model outperforms the compared methods
and is suitable for disparity map estimation.
Figure 4. Visual results on Middlebury images (a) left image, (b) right image, (c) ground truth image and
(d) estimated disparity map
Table 4. The quantitative results based on PBMP for error threshold = 1 between computed and ground truth
disparities
Jade plant Piano Pipes Recycle
DeepPruner [30] 62.8 41.0 53.8 36.8
ACR-GIF-OW [17] 51.8 45.1 40.5 37.5
LPSM [21] 59.2 44.8 46.3 36.8
CDSN 50.76 39.71 40.47 33.57
Table 5. The quantitative results based on RMSE between computed and ground truth disparities
Jade plant Piano Pipes Recycle Average
DeepPruner [30] 28.2 4.64 13.7 3.81 12.58
ACR-GIF-OW [17] 64.9 14.8 28.6 15.8 31.02
LPSM [21] 34.8 6.09 16.3 5.79 15.74
CDSN 6.07 8.73 8.88 8.48 8.04
3.3. Ablation study
We executed ablation study by comparing the proposed model with the models like CycleGAN and
DualGAN. CycleGAN is a technique that performs image translation without using paired examples. This
GAN uses unsupervised training. DualGAN is made up of two generators and two discriminators. It is trained
to translate images from source to target and target to source. The various metric used are absolute relative
distance (ARD), squared relative difference (SRD) and RMSE. Lower values indicate better performance.
We find the efficicency of the proposed model is significantly high which is presented in the Table 6.
ARD = \frac{1}{N} \sum \frac{d_t(x, y) - d_g(x, y)}{d_t(x, y)} \qquad (12)

SRD = \frac{1}{N} \sum \frac{|d_t(x, y) - d_g(x, y)|^{2}}{d_t(x, y)} \qquad (13)
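Following (12) and (13), the two relative metrics can be sketched the same way. The function names are ours, and note that (12) as printed sums the signed difference rather than its absolute value; the sketch follows the printed form.

```python
def ard(d_t, d_g):
    # Equation (12) as printed: the signed relative difference, normalized
    # by the computed disparity d_t (no absolute value in the summand).
    return sum((t - g) / t for t, g in zip(d_t, d_g)) / len(d_t)

def srd(d_t, d_g):
    # Equation (13): squared difference, normalized by d_t.
    return sum(abs(t - g) ** 2 / t for t, g in zip(d_t, d_g)) / len(d_t)

d_t = [2.0, 4.0]
d_g = [1.0, 2.0]
assert ard(d_t, d_g) == 0.5   # ((2-1)/2 + (4-2)/4) / 2
assert srd(d_t, d_g) == 0.75  # (1/2 + 4/4) / 2
```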
Table 6. Ablation study using metrics ARD, SRD, and RMSE (lower is better)
Metric CycleGAN DualGAN Proposed model
ARD 0.032 0.035 0.016
SRD 0.352 0.374 0.337
RMSE 7.036 7.974 6.591
4. CONCLUSION
This paper presents a novel CNN-based stereo matching model for estimating the disparity map
from rectified stereo images, which is useful in autonomous vehicles. The features extracted by the CNN are
used to compute the unary cost and the smoothness cost. The initial disparity map is obtained using loopy
belief propagation and is then refined using a GAN model to handle the ill-posed regions. The proposed
CNN-based model generates disparity maps that are smoother than those generated by the naive model, and
the ill-posed regions are handled well by the GAN. The proposed model is evaluated qualitatively and
quantitatively on various images from the Middlebury stereo dataset. The results show that the proposed
model achieves the best disparity map and outperforms existing methods.
REFERENCES
[1] D. Scharstein, R. Szeliski, and R. Zabih, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int.
J. Comput. Vis., no. 1, pp. 7–42, 2002, doi: 10.1109/SMBV.2001.988771.
[2] R. A. Hamzah and H. Ibrahim, “Literature survey on stereo vision disparity map algorithms,” Journal of Sensors, vol. 2016, 2016,
doi: 10.1155/2016/8742920.
[3] M. S. Hamid, N. F. A. Manap, R. A. Hamzah, and A. F. Kadmin, “Stereo matching algorithm based on deep learning: A survey,”
Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 5, pp. 1663–1673, 2022, doi:
10.1016/j.jksuci.2020.08.011.
[4] U. Hani and L. Moin, “Realtime autonomous navigation in V-Rep based static and dynamic environment using EKF-SLAM,”
IAES International Journal of Robotics and Automation (IJRA), vol. 10, no. 4, p. 296, 2021, doi: 10.11591/ijra.v10i4.pp296-307.
[5] Susanto, D. D. Budiarjo, A. Hendrawan, and P. T. Pungkasanti, “The implementation of intelligent systems in automating vehicle
detection on the road,” IAES International Journal of Artificial Intelligence, vol. 10, no. 3, pp. 571–575, 2021, doi:
10.11591/ijai.v10.i3.pp571-575.
[6] S. Sivaraman and M. M. Trivedi, “A review of recent developments in vision-based vehicle detection,” IEEE Intelligent Vehicles
Symposium, Proceedings, pp. 310–315, 2013, doi: 10.1109/IVS.2013.6629487.
[7] S. Hong, M. Li, M. Liao, and P. Van Beek, “Real-time mobile robot navigation based on stereo vision and low-cost GPS,” IS and
T International Symposium on Electronic Imaging Science and Technology, pp. 10–15, 2017, doi: 10.2352/ISSN.2470-
1173.2017.9.IRIACV-259.
[8] B. Krajancich, P. Kellnhofer, and G. Wetzstein, “Optimizing depth perception in virtual and augmented reality through gaze-
contingent stereo rendering,” ACM Transactions on Graphics, vol. 39, no. 6, 2020, doi: 10.1145/3414685.3417820.
[9] H. Ham, J. Wesley, and Hendra, “Computer vision based 3D reconstruction : a review,” International Journal of Electrical and
Computer Engineering, vol. 9, no. 4, pp. 2394–2402, 2019, doi: 10.11591/ijece.v9i4.pp2394-2402.
[10] C. Bai, Q. Ma, P. Hao, Z. Liu, and J. Zhang, “Improving stereo matching algorithm with adaptive cross-scale cost aggregation,”
International Journal of Advanced Robotic Systems, vol. 15, no. 1, 2018, doi: 10.1177/1729881417751544.
[11] H. Shabanian and M. Balasubramanian, “A new hybrid stereo disparity estimation algorithm with guided image filtering-based
cost aggregation,” IS and T International Symposium on Electronic Imaging Science and Technology, vol. 2021, no. 2, 2021, doi:
10.2352/ISSN.2470-1173.2021.2.SDA-059.
[12] R. A. Hamzah, M. G. Y. Wei, and N. S. N. Anwar, “Development of stereo matching algorithm based on sum of absolute RGB
color differences and gradient matching,” International Journal of Electrical and Computer Engineering, vol. 10, no. 3, pp. 2375–
2382, 2020, doi: 10.11591/ijece.v10i3.pp2375-2382.
[13] H. Liu, R. Wang, Y. Xia, and X. Zhang, “Improved cost computation and adaptive shape guided filter for local stereo matching of
low texture stereo images,” Applied Sciences (Switzerland), vol. 10, no. 5, 2020, doi: 10.3390/app10051869.
[14] S. Chen, J. Zhang, and M. Jin, “A simplified ICA-based local similarity stereo matching,” Visual Computer, vol. 37, no. 2, pp.
411–419, 2021, doi: 10.1007/s00371-020-01811-x.
[15] J. K and P. C.J, “Multi modal face recognition using block based curvelet features,” International Journal of Computer Graphics
& Animation, vol. 4, no. 2, pp. 21–37, 2014, doi: 10.5121/ijcga.2014.4203.
[16] C. S. Huang, Y. H. Huang, D. Y. Chan, and J. F. Yang, “Shape-reserved stereo matching with segment-based cost aggregation and
dual-path refinement,” Eurasip Journal on Image and Video Processing, vol. 2020, no. 1, 2020, doi: 10.1186/s13640-020-00525-3.
[17] L. Kong, J. Zhu, and S. Ying, “Local stereo matching using adaptive cross-region-based guided image filtering with orthogonal
weights,” Mathematical Problems in Engineering, vol. 2021, 2021, doi: 10.1155/2021/5556990.
[18] V. Kolmogorov, P. Monasse, and P. Tan, “Kolmogorov and zabih’s graph cuts stereo matching algorithm,” Image Processing On
Line, vol. 4, pp. 220–251, 2014, doi: 10.5201/ipol.2014.97.
[19] O. Veksler, “Stereo correspondence by dynamic programming on a tree,” Proceedings - 2005 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, CVPR 2005, vol. II, pp. 384–390, 2005, doi: 10.1109/CVPR.2005.334.
[20] J. Sun, H. Y. Shum, and N. N. Zheng, “Stereo matching using belief propagation,” Lecture Notes in Computer Science (including
subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2351, pp. 510–524, 2002, doi:
10.1007/3-540-47967-8_34.
[21] C. Xu, C. Wu, D. Qu, F. Xu, H. Sun, and J. Song, “Accurate and efficient stereo matching by log-angle and pyramid-tree,” IEEE
Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 4007–4019, 2021, doi:
10.1109/TCSVT.2020.3044891.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,”
Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017, doi: 10.1145/3065386.
[23] M. S. Hamid, N. A. Manap, R. A. Hamzah, A. F. Kadmin, S. F. A. Gani, and A. I. Herman, “A new function of stereo matching
algorithm based on hybrid convolutional neural network,” Indonesian Journal of Electrical Engineering and Computer Science,
vol. 25, no. 1, pp. 223–231, 2022, doi: 10.11591/ijeecs.v25.i1.pp223-231.
[24] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640–651, 2017, doi: 10.1109/TPAMI.2016.2572683.
[25] A. Kendall et al., “End-to-end learning of geometry and context for deep stereo regression,” Proceedings of the IEEE
International Conference on Computer Vision, vol. 2017-October, pp. 66–75, 2017, doi: 10.1109/ICCV.2017.17.
[26] D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” Advances in
Neural Information Processing Systems, vol. 3, no. January, pp. 2366–2374, 2014.
[27] A. Roy and S. Todorovic, “Monocular depth estimation using neural regression forest,” Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, vol. 2016-Decem, pp. 5506–5514, 2016, doi:
10.1109/CVPR.2016.594.
[28] A. Chakrabarti, J. Shao, and G. Shakhnarovich, “Depth from a single image by harmonizing overcomplete local network
predictions,” Advances in Neural Information Processing Systems, pp. 2666–2674, 2016.
[29] J. Pang, W. Sun, J. S. J. Ren, C. Yang, and Q. Yan, “Cascade residual learning: a two-stage convolutional neural network for
stereo matching,” Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017, vol.
2018-January, pp. 878–886, 2017, doi: 10.1109/ICCVW.2017.108.
[30] S. Duggal, S. Wang, W. C. Ma, R. Hu, and R. Urtasun, “DeepPruner: learning efficient stereo matching via differentiable
patchmatch,” Proceedings of the IEEE International Conference on Computer Vision, vol. 2019-October, pp. 4383–4392, 2019,
doi: 10.1109/ICCV.2019.00448.
[31] I. J. Goodfellow et al., “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 3, no. January, pp.
2672–2680, 2014, doi: 10.3156/jsoft.29.5_177_2.
[32] P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” Proceedings -
30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-Janua, pp. 5967–5976, 2017, doi:
10.1109/CVPR.2017.632.
[33] D. Scharstein et al., “High-resolution stereo datasets with subpixel-accurate ground truth,” Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8753, pp. 31–42, 2014, doi:
10.1007/978-3-319-11752-2_3.
[34] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 3rd International
Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015.
[35] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,”
Proceedings of the IEEE International Conference on Computer Vision, vol. 2017-October, pp. 2242–2251, 2017, doi:
10.1109/ICCV.2017.244.
[36] Z. Yi, H. Zhang, P. Tan, and M. Gong, “DualGAN: unsupervised dual learning for image-to-image translation,” Proceedings of
the IEEE International Conference on Computer Vision, vol. 2017-October, pp. 2868–2876, 2017, doi: 10.1109/ICCV.2017.310.
BIOGRAPHIES OF AUTHORS
Deepa received her M.Tech. in Computer Science Engineering from N.M.A.M. Institute of
Technology, Nitte, India. She is currently pursuing a Ph.D. in Computer Science Engineering at VTU and
working as an assistant professor in Information Science & Engineering at N.M.A.M. Institute of
Technology, Nitte, India. Her research interests are in the fields of computer vision and digital image
processing. She has published
several papers in international journals and conferences. She can be contacted at email:
deepashetty17@nitte.edu.in.
Jyothi Kupparu received her Ph.D. degree in computer science from Kuvempu
University, Shimoga, India. She is a Professor of Information Science &
Engineering at J.N.N.C.E Shimoga. Her research interests include image processing, stereo
correspondence algorithms for face images, multimodal face recognition, 3D sparse
reconstruction and techniques based on stereo rectification for face images. She has published
several papers in international journals and conferences. She can be contacted at email:
jyothik@jnnce.ac.in.