This paper proposes a new hybrid of learning-based and handcrafted methods for stereo matching. The main purpose of a stereo matching algorithm is to produce a disparity map, which is essential for many applications, including three-dimensional (3D) reconstruction. The raw disparity map computed by a convolutional neural network (CNN) is still prone to errors in low-texture regions. The algorithm improves the matching cost computation stage by combining a CNN with truncated directional intensity computation; the difference in truncated directional intensity is employed to decrease radiometric errors. The raw matching cost then goes through a cost aggregation step using the bilateral filter (BF) to improve accuracy. Winner-take-all (WTA) optimization uses the aggregated cost volume to produce an initial disparity map. Finally, a series of refinement processes enhances the initial disparity map into a more accurate final disparity map. The performance of the algorithm is verified using the Middlebury online stereo benchmarking system. The proposed algorithm achieves the objective of generating a more accurate and smoother disparity map with different depths in low-texture regions through better matching cost quality.
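The winner-take-all (WTA) step mentioned in the abstract can be sketched in a few lines: for each pixel, pick the disparity whose aggregated cost is smallest. This is a generic illustration of WTA over a cost volume, not the paper's implementation; the tiny cost volume below is made-up data.

```python
def wta_disparity(cost_volume):
    """cost_volume[d][x] = aggregated matching cost of pixel x at disparity d.
    Returns, for each pixel, the disparity with the minimum cost."""
    num_disp = len(cost_volume)
    width = len(cost_volume[0])
    disparity = []
    for x in range(width):
        # WTA: the "winning" disparity is the argmin of the cost column
        best_d = min(range(num_disp), key=lambda d: cost_volume[d][x])
        disparity.append(best_d)
    return disparity

# 3 disparity hypotheses, 4 pixels (illustrative values only)
costs = [
    [5.0, 1.0, 7.0, 2.0],  # d = 0
    [2.0, 4.0, 1.0, 9.0],  # d = 1
    [8.0, 6.0, 3.0, 0.5],  # d = 2
]
print(wta_disparity(costs))  # [1, 0, 1, 2]
```

In a full pipeline, the cost volume fed to this step would already have passed through an aggregation filter (here, the paper's bilateral filter), which is what suppresses noisy argmin picks in low-texture regions.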
Stereo matching based on absolute differences for multiple objects detection (TELKOMNIKA JOURNAL)
This article presents a new algorithm for object detection using a stereo camera system. The main obstacle to accurate object detection with a stereo camera is the imprecision of the matching process between two scenes of the same viewpoint. Hence, this article aims to reduce incorrect pixel matching with a four-stage framework: a continuous process of matching cost computation, aggregation, optimization, and filtering. The first stage, matching cost computation, acquires a preliminary result using an absolute differences method. The second stage, the aggregation step, uses a guided filter with a fixed support window size. After that, the optimization stage uses a winner-takes-all (WTA) approach, which selects the smallest matching difference value and normalizes it to the disparity level. The last stage in the framework uses a bilateral filter, which effectively further decreases the error in the disparity map containing the object detection and location information. The proposed work produces low errors (12.11% nonocc and 14.01% all errors) on the KITTI dataset, performs much better than the pipeline without the proposed framework, and is competitive with some newly available methods.
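The absolute-differences cost in the first stage can be sketched as follows for a single scanline pair. This is a minimal textbook illustration under assumed grayscale intensities, not the paper's code; border pixels where the shifted index falls outside the image get a large placeholder cost.

```python
def ad_cost_volume(left, right, max_disp):
    """Absolute-differences matching cost for one scanline pair.
    cost[d][x] = |left[x] - right[x - d]|; out-of-range pixels get a large cost."""
    width = len(left)
    big = 255.0  # sentinel cost for pixels with no candidate match
    volume = []
    for d in range(max_disp + 1):
        row = []
        for x in range(width):
            if x - d >= 0:
                row.append(abs(left[x] - right[x - d]))
            else:
                row.append(big)
        volume.append(row)
    return volume

left  = [10, 20, 30, 40]
right = [20, 30, 40, 50]   # the same scene shifted by one pixel
vol = ad_cost_volume(left, right, 2)
print(vol[1])  # [255.0, 0, 0, 0] -> disparity 1 matches perfectly
```

The later stages (guided-filter aggregation, WTA, bilateral filtering) all operate on a volume with exactly this `cost[d][x]` layout.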
Stereo matching algorithm using census transform and segment tree for depth e... (TELKOMNIKA JOURNAL)
This article proposes an algorithm for the stereo matching correspondence process used in many applications such as augmented reality, autonomous vehicle navigation, and surface reconstruction. The proposed framework is developed through a series of functions whose final result is a disparity map carrying the depth estimation information. The framework's input is a stereo pair of left and right images. The proposed algorithm has four steps in total: matching cost computation using the census transform, cost aggregation using a segment tree, optimization using the winner-takes-all (WTA) strategy, and a post-processing stage using a weighted median filter. Based on experimental results from the standard Middlebury benchmarking evaluation system, the disparity map results produce low average errors of 9.68% for the nonocc attribute and 18.9% for the all attribute. On average, it performs far better than, or is very competitive with, other available methods in the benchmark system.
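The census transform used in the first step can be sketched as follows: each pixel is described by the ordering of its neighbours relative to the window centre, and two descriptors are compared with the Hamming distance. This is the standard formulation, shown here on made-up 3x3 windows, not the article's implementation.

```python
def census(window):
    """Census transform of a square window: one bit per neighbour,
    set to 1 when that neighbour is darker than the centre pixel."""
    n = len(window)
    c = window[n // 2][n // 2]
    bits = []
    for i in range(n):
        for j in range(n):
            if i == n // 2 and j == n // 2:
                continue  # the centre pixel itself is skipped
            bits.append(1 if window[i][j] < c else 0)
    return bits

def hamming(a, b):
    """Matching cost between two census descriptors: count of differing bits."""
    return sum(x != y for x, y in zip(a, b))

# Two windows with different intensities but identical local ordering
w1 = [[10, 50, 30], [60, 40, 20], [70, 10, 90]]
w2 = [[12, 48, 33], [61, 40, 25], [66, 14, 95]]
print(hamming(census(w1), census(w2)))  # 0: same structure, zero cost
```

Because only relative orderings are encoded, this cost is robust to radiometric differences between the left and right cameras, which is why census is a popular matching cost choice.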
Comparison of specific segmentation methods used for copy move detection (IJECE, IAES)
In this digital age, the widespread use of digital images and the availability of image editors have made the credibility of images controversial. Many types of image forgery detection have arisen to confirm the credibility of digital images; copy-move forgery transforms an image by duplicating a part of it to add or hide existing objects. Several methods have been proposed in the literature to detect copy-move forgery; they use keypoint-based and block-based techniques to find the duplicated areas. However, keypoint-based and block-based techniques have difficulty handling smooth regions. In addition, image segmentation plays a vital role in changing the representation of the image into a meaningful form for analysis. Hence, we conduct a comparison study of segmentation based on two clustering algorithms, k-means and superpixel segmentation with density-based spatial clustering of applications with noise (DBSCAN), and compare the methods in terms of the accuracy of detecting forged regions of digital images. K-means shows better performance than DBSCAN and other techniques in the literature.
Development of stereo matching algorithm based on sum of absolute RGB color d... (IJECE, IAES)
This article presents a local-based stereo matching algorithm which comprises the development of an algorithm using block matching and two edge-preserving filters in the framework. Fundamentally, the matching process consists of several stages which produce the disparity or depth map. The most challenging part of the matching process is obtaining accurate corresponding points between two images. Hence, this article proposes a stereo matching algorithm using an improved sum of absolute RGB differences (SAD), gradient matching, and the bilateral filter (BF) as an edge-preserving filter to boost accuracy. The SAD and gradient matching are applied at the first stage to get the preliminary correspondence result; a BF then works as an edge-preserving filter to remove noise from the first stage. A second BF is used at the last stage to improve the final disparity map and sharpen object boundaries. The experimental analysis and validation use the Middlebury standard benchmarking evaluation system. Based on the results, the proposed work is capable of increasing accuracy while preserving object edges. To position the proposed work against currently available methods, quantitative measurements were made against other existing methods, and they show that the proposed work performs much better.
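The bilateral filter that appears twice in this framework can be illustrated on a 1-D signal: each sample is replaced by a weighted average of its neighbours, with weights that fall off with both spatial distance and intensity difference, so flat regions are smoothed while edges survive. This is a generic sketch with assumed parameter values, not the article's 2-D filter.

```python
import math

def bilateral_1d(signal, sigma_s=1.0, sigma_r=10.0, radius=2):
    """Edge-preserving bilateral filter on a 1-D signal."""
    out = []
    for i, v in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # spatial closeness * intensity similarity
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((v - signal[j]) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

step = [0, 0, 0, 100, 100, 100]        # a sharp intensity edge
smoothed = bilateral_1d(step)
print([round(x, 1) for x in smoothed])  # edge stays sharp, flats stay flat
```

A plain Gaussian blur would drag values across the step; here the intensity term makes the cross-edge weights nearly zero, which is the property that keeps disparity-map object boundaries crisp.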
Matching algorithm performance analysis for autocalibration method of stereo ... (TELKOMNIKA JOURNAL)
Stereo vision is one of the interesting research topics in the computer vision field. Two cameras are used to generate a disparity map, resulting in a depth estimation. Camera calibration is the most important step in stereo vision; it is used to generate the intrinsic parameters of each camera to get a better disparity map. In general, the calibration process is done manually using a chessboard pattern, but this is an exhausting task. Self-calibration is an important ability required to overcome this problem, and it requires a robust matching algorithm to find the key features between images as a reference. The purpose of this paper is to analyze the performance of three matching algorithms for the autocalibration process: SIFT, SURF, and ORB. The results show that SIFT performs better than the other methods.
A deep learning based stereo matching model for autonomous vehicle (IAES IJAI)
Autonomous vehicles are one of the prominent areas of research in computer vision. In today's AI world, the concept of autonomous vehicles has become popular largely to avoid accidents due to driver negligence. Accurately perceiving the depth of the surrounding region is a challenging task in autonomous vehicles. Sensors like light detection and ranging (LiDAR) can be used for depth estimation, but these sensors are expensive; hence stereo matching is an alternative solution for estimating depth. The main difficulty observed in stereo matching is minimizing mismatches in ill-posed regions, such as occluded, textureless, and discontinuous regions. This paper presents an efficient deep stereo matching technique for estimating the disparity map from stereo images in ill-posed regions. Images from the Middlebury stereo dataset are used to assess the efficacy of the proposed model. The experimental outcome depicts that the proposed model generates reliable results in occluded, textureless, and discontinuous regions compared to existing techniques.
Residual balanced attention network for real-time traffic scene semantic segm... (IJECE, IAES)
Intelligent transportation systems (ITS) are among the most actively researched topics of this century. Autonomous driving involves very advanced tasks for road safety monitoring, including identifying dangers on the road and protecting pedestrians. In the last few years, deep learning (DL) approaches, especially convolutional neural networks (CNNs), have been extensively used to solve ITS problems such as traffic scene semantic segmentation and traffic sign classification. Semantic segmentation is an important task that has been addressed in computer vision (CV). Indeed, traffic scene semantic segmentation using CNNs requires high precision with few computational resources to perceive and segment the scene in real time. However, related work often focuses on only one aspect: the precision or the number of computational parameters. In this regard, we propose RBANet, a robust and lightweight CNN that uses a newly proposed balanced attention module and a newly proposed residual module. We then trained our proposed RBANet with three loss functions to find the best combination, using only 0.74M parameters. RBANet has been evaluated on CamVid, the most used dataset in semantic segmentation, and it performs well in terms of parameter requirements and precision compared to related work.
Efficient resampling features and convolution neural network model for image ... (IJEECS, IAES)
The widespread use of picture-enhancing and manipulation tools has made it easy to manipulate multimedia data, including digital images. These manipulations disturb the truthfulness and lawfulness of images, resulting in misapprehension, and might disturb social security. Image forensic approaches have been employed to detect whether an image has been manipulated using certain attacks, such as splicing and copy-move. This paper provides a competent tampering detection technique using resampling features and a convolution neural network (CNN). In this model, range spatial filtering (RSF)-CNN, the image is divided into consistent patches during preprocessing. Then, within every patch, the resampling features are extracted using an affine transformation and the Laplacian operator. The extracted features are then accumulated into descriptors using the CNN. A wide-ranging analysis is performed to assess the tampering detection and tampered region segmentation accuracies of the proposed RSF-CNN based tampering detection procedure under various falsifications and post-processing attacks, including joint photographic experts group (JPEG) compression, scaling, rotations, noise additions, and multiple combined manipulations. The achieved results show that the RSF-CNN based tampering detection attains adequately higher accuracy than existing tampering detection methodologies.
LIDAR POINT CLOUD CLASSIFICATION USING EXPECTATION MAXIMIZATION ALGORITHM (IJNLC)
The EM algorithm is a common algorithm in data mining. Using two iterated steps, E and M, the algorithm creates a model that can assign class labels to data points. In addition, EM not only optimizes the parameters of the model but can also predict missing data during the iteration. This paper therefore focuses on researching and improving the EM algorithm to suit LiDAR point cloud classification, based on the idea of partitioning the point cloud and using a scheduling parameter in the E-step to help the algorithm converge faster with a shorter run time. The proposed algorithm is tested on a measurement dataset from Nghe An province, Vietnam, achieving more than 92% accuracy and a faster runtime than the original EM algorithm.
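The E/M iteration described above can be sketched with a minimal two-component 1-D Gaussian mixture: the E-step computes each point's responsibility for each component, and the M-step re-estimates means, variances, and weights from those responsibilities. This is the generic textbook EM, not the paper's LiDAR-specific variant with scheduling; the data is made up.

```python
import math

def em_gmm_1d(data, iters=50):
    """Minimal two-component 1-D Gaussian-mixture EM."""
    mu = [min(data), max(data)]   # crude initialization
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate parameters from responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk + 1e-6
            pi[k] = nk / len(data)
    return mu

data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]  # two obvious clusters
mu = sorted(em_gmm_1d(data))
print(mu)  # means converge near the two cluster centres
```

The paper's improvements plug into this loop: partitioning the cloud bounds the E-step's cost per iteration, and a scheduling parameter on the E-step speeds up convergence.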
Deep convolutional neural networks (CNNs) have achieved impressive performance in edge detection tasks, but their large number of parameters often leads to high memory and energy costs when deployed on lightweight devices. In this paper, we propose a new architecture, called Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of depthwise separable convolutions and deformable convolutional networks (Deformable ConvNets) to address these inefficiencies. By carefully selecting proper components and utilizing network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge detection while significantly reducing complexity. Experimental results on the BSDS500 and NYUDv2 datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only 500k parameters, without relying on pre-trained weights.
Fast and accurate primary user detection with machine learning techniques for...
Spectrum decision is a crucial task for a secondary user seeking to use unlicensed spectrum for transmission, and managing the spectrum efficiently depends on spectrum sensing. Determining the presence of the primary user in the spectrum is essential for using the primary user's licensed spectrum. What is typically lacking in spectrum management is information about the primary user's presence, accuracy in determining the existence of the user in the spectrum, low computation cost, and the ability to find the user at low signal-to-noise ratio (SNR) values. The proposed system overcomes these limitations. In the proposed system, various machine learning techniques, namely decision trees, support vector machines, naive Bayes, ensemble-based trees, nearest neighbours, and logistic regression, are used for testing. As a first step, spectrum sensing is done in two stages with orthogonal frequency division multiplexing and an energy detection algorithm at various SNR values; the results generated from this algorithm are used for database generation. Next, the different machine learning techniques are trained and compared on characteristics such as speed, training time, and prediction accuracy. All the algorithms achieve accuracy and detect the presence of the user in the spectrum at low SNR values, but their computation costs differ. Among the tested techniques, the k-nearest neighbour (KNN) algorithm produces the best performance in the shortest time.
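The energy detection stage mentioned above is simple to sketch: average the squared signal samples and declare the primary user present when the energy exceeds a threshold. This is the classic detector in its most basic form; the sample values and threshold are illustrative, not from the paper.

```python
def energy_detect(samples, threshold):
    """Classic energy detector: average energy of the received samples,
    compared against a decision threshold (value here is illustrative)."""
    energy = sum(s * s for s in samples) / len(samples)
    return energy, energy > threshold

noise_only  = [0.1, -0.2, 0.05, -0.1, 0.15]   # low-energy noise samples
with_signal = [1.1, -0.9, 1.2, -1.0, 0.8]     # noise plus a strong signal

print(energy_detect(noise_only, 0.5)[1])   # False: primary user absent
print(energy_detect(with_signal, 0.5)[1])  # True: primary user present
```

In a pipeline like the paper's, decisions of this kind taken across many SNR levels would form the labelled database on which the machine-learning classifiers are trained.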
A New Approach for Solving Inverse Scattering Problems with Overset Grid Gene... (TELKOMNIKA JOURNAL)
This paper presents a new approach to Forward-Backward Time-Stepping (FBTS) utilizing the Finite-Difference Time-Domain (FDTD) method with the Overset Grid Generation (OGG) method to solve inverse scattering problems for electromagnetic (EM) waves. The FDTD method is combined with the OGG method to reduce a geometrically complex problem to a simple set of grids. The grids can be modified easily without the need to regenerate the grid system, thus providing an efficient approach to integrate with the FBTS technique. Here, the characteristics of the EM waves are analyzed. For the research in this paper, the 'measured' signals are synthetic data generated by FDTD simulations, while the 'simulated' signals are the calculated data. The accuracy of the proposed approach is validated: good agreement is obtained between simulation data and measured data. The proposed approach has the potential to provide useful quantitative information about an unknown object, particularly for shape reconstruction, object detection, and other tasks.
DYNAMIC NETWORK ANOMALY INTRUSION DETECTION USING MODIFIED SOM (cscpconf)
Detection of unexpected and emerging new threats has become a necessity for secure internet communication with absolute data confidentiality, integrity, and availability. The design and development of such a detection system must not only be novel, accurate, and fast but also effective in a dynamic environment encompassing the surrounding network. In this paper, an algorithm is proposed for anomaly detection that modifies the Self-Organizing Map (SOM) by including new neighbourhood updating rules and a dynamic learning rate, in order to overcome the fixed architecture and random weight vector assignment. The algorithm initially starts with a null network and grows with the original data space as initial weight vectors. New nodes are created using a distance threshold parameter, and their neighbourhood is identified using connection strength. Employing the learning rule, the weight vector update is carried out for neighbourhood nodes. The performance of the new algorithm is evaluated using a standard benchmark dataset; the result, compared with other neural network methods, shows a 98% detection rate and a 2% false alarm rate.
TEST-COST-SENSITIVE CONVOLUTIONAL NEURAL NETWORKS WITH EXPERT BRANCHES (sipij)
It has been proven that deeper convolutional neural networks (CNNs) can achieve better accuracy on many problems, but this accuracy comes at a high computational cost. Moreover, input instances do not all have the same difficulty. As a solution to the accuracy-versus-computational-cost dilemma, we introduce a new test-cost-sensitive method for convolutional neural networks. This method trains a CNN with a set of auxiliary outputs and expert branches in some middle layers of the network. The expert branches decide whether to use a shallower part of the network or to go deeper to the end, based on the difficulty of the input instance. The expert branches learn to determine whether the current network prediction is wrong and whether passing the given instance to deeper layers would generate the right output; if not, the expert branches stop the computation process. Experimental results on the standard CIFAR-10 dataset show that the proposed method can train models with lower test cost and competitive accuracy compared with the basic models.
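The early-exit decision the expert branches make can be illustrated with a toy cascade, where each stage stands in for a network prefix plus its expert branch. This is a schematic sketch of the control flow only, under the assumption that each branch exposes a confidence score; `cascade_predict` and the threshold are hypothetical names, not the paper's API.

```python
def cascade_predict(x, stages, threshold=0.9):
    """Run stages in order; each stage returns (prediction, confidence).

    The 'expert branch' stops computation at the first stage whose
    confidence clears the threshold, so easy inputs exit early and
    only hard inputs pay for the deeper layers.
    """
    for depth, stage in enumerate(stages, start=1):
        pred, conf = stage(x)
        if conf >= threshold or depth == len(stages):
            return pred, depth
```

With a shallow stage that is confident only on easy inputs, an easy input exits at depth 1 while a hard one falls through to the final stage.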
Online video-based abnormal detection using highly motion techniques and stat...TELKOMNIKA JOURNAL
At the core of video surveillance are abnormal-detection approaches, which have proven substantially effective in detecting abnormal incidents without prior knowledge of those incidents. State-of-the-art research makes it evident that there is a trade-off between frame processing time and detection accuracy in abnormal-detection approaches. The primary challenge is therefore to balance this trade-off by using few but very descriptive features, fulfilling online performance while maintaining a high accuracy rate. In this study, we propose a new framework that balances detection accuracy and video processing time by employing two efficient motion techniques, specifically foreground and optical flow energy. Moreover, we use different statistical measures of the motion features to obtain a robust inference method that distinguishes abnormal behaviour incidents from normal ones. The performance of this framework has been extensively evaluated in terms of detection accuracy, the area under the curve (AUC), and frame processing time. Simulation results and comparisons with ten relevant online and non-online frameworks demonstrate that our framework achieves superior performance, presenting high accuracy while simultaneously keeping processing time low.
Depth-DensePose: an efficient densely connected deep learning model for came...IJECEIAES
Camera/image-based localization is important for many emerging applications such as augmented reality (AR), mixed reality, robotics, and self-driving. Camera localization is the problem of estimating both camera position and orientation with respect to an object. Use cases for camera localization depend on two key factors: accuracy and speed (latency). Therefore, this paper proposes Depth-DensePose, an efficient deep learning model for 6-degrees-of-freedom (6-DoF) camera-based localization. Depth-DensePose utilizes the advantages of both DenseNets and an adapted depthwise separable convolution (DS-Conv) to build a deeper and more efficient network. The proposed model consists of iterative depth-dense blocks. Each depth-dense block contains two adapted DS-Conv layers with kernel sizes 3 and 5, which are useful for retaining both low-level and high-level features. We evaluate the proposed Depth-DensePose on the Cambridge Landmarks dataset and show that it outperforms related deep learning models for camera-based localization. Furthermore, extensive experiments were conducted which prove that the adapted DS-Conv is more efficient than standard convolution, especially in terms of memory and processing time, which is important for real-time and mobile applications.
Stereo vision-based obstacle avoidance module on 3D point cloud dataTELKOMNIKA JOURNAL
This paper deals with building 3D vision-based obstacle avoidance and navigation. For an autonomous system to work in real-life conditions, it needs the capability to gather surrounding-environment data, interpret the data, and take appropriate action. In particular, it must navigate cluttered, unorganized environments, avoid collision with any present obstacle (defined here as any data with vertical orientation), and take decisions when the environment changes. This work proposes a two-step strategy. First, obstacle position and orientation are extracted from point cloud data using plane-based segmentation, and the segmented obstacle points are mapped, relative to the camera, into an occupancy grid map to acquire obstacle cluster positions; the occupancy grid map is recorded for future use and global navigation. Second, the obstacle positions in the grid map are used to plan a navigation path towards the target goal that avoids obstacle positions, and the path is modified, based on the timed elastic band method, to avoid collision when the environment changes or the platform's movement is not aligned with the planned path.
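The mapping step, projecting obstacle points given in camera-relative metric coordinates into occupancy grid cells, can be sketched as follows. This is a minimal sketch under the assumption of a fixed-size 2D grid; the function name and parameters are hypothetical, not the paper's interface.

```python
def occupancy_grid(points, cell=1.0, width=10, height=10):
    """Mark grid cells occupied by obstacle points.

    Each (x, y) point in metric coordinates is binned into a cell of
    size `cell`; points falling outside the map extent are ignored.
    """
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        col, row = int(x // cell), int(y // cell)
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] = 1
    return grid
```

A path planner can then treat cells marked 1 as blocked when searching for a route to the goal.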
Machine learning based augmented reality for improved learning application th...IJECEIAES
Detection of objects and their locations in an image are important elements of current research in computer vision. In May 2020, Meta released its state-of-the-art object-detection model based on a transformer architecture, called detection transformer (DETR). There are several object-detection models, such as the region-based convolutional neural network (R-CNN), you only look once (YOLO), and single shot detectors (SSD), but none had used a transformer to accomplish this task. The models mentioned earlier use all sorts of hyperparameters and layers. In contrast, the advantages of using a transformer make the architecture simple and easy to implement. In this paper, we determine the name of a chemical experiment in two steps: first, by building a DETR model trained on a customized dataset, and then by integrating it into an augmented reality mobile application. By detecting the objects used during the realization of an experiment, we can predict the name of the experiment using a multi-class classification approach. The combination of various computer vision techniques with augmented reality is indeed promising and offers a better user experience.
Slantlet transform used for faults diagnosis in robot armIJEECSIAES
Robot arm systems, which combine electrical and mechanical subsystems, are among the most common targets in the field of fault detection and diagnosis. A fault detection and diagnosis study is presented for two robot arms. Disturbances due to faults at the robot's joints cause oscillations at the tip of the robot arm. The acceleration in multiple directions is analysed to extract the features of the faults. Simulations for planar and space robots are presented. Two types of fault-feature detection methods are used in this paper. The first is the discrete wavelet transform, which has been applied in many previous research works. The second is the Slantlet transform, which represents an improved version of the discrete wavelet transform. A multi-layer perceptron artificial neural network is used for fault localization and classification. According to the obtained results, the Slantlet transform with the multi-layer perceptron achieves better performance (4.7088e-05), lower computation time (71.017308 s), and higher accuracy (100%) than the discrete wavelet transform with the same neural network.
The IoT and registration of MRI brain diagnosis based on genetic algorithm an...IJEECSIAES
Multimodal brain image registration is the key technology for accurate and rapid diagnosis and treatment of brain diseases. To achieve high-resolution image registration, a fast sub-pixel registration algorithm is used based on a single-step discrete wavelet transform (DWT) combined with a convolutional neural network (CNN) to classify the registration of brain tumors. This work applies a genetic algorithm and CNN classification to the registration of magnetic resonance imaging (MRI) images. The approach follows eight steps: reading the source MRI brain image and loading the reference image; enhancing all MRI images with a bilateral filter; transforming the images by applying the 2-D DWT; evaluating each MRI image with an entropy-based fitness function; applying the genetic algorithm by selecting two images via roulette-wheel selection and performing crossover on them; classifying the result of subtraction as normal or abnormal with the CNN; and, in the eighth step, using an Arduino and a global system for mobile (GSM) 8080 module to send a message to the patient. The proposed model is tested on a database from Medical City Hospital in Baghdad consisting of 550 normal and 350 abnormal MRI images, split into 80% training and 20% testing, and achieves 98.8% accuracy.
Similar to A new function of stereo matching algorithm based on hybrid convolutional neural network
Efficient resampling features and convolution neural network model for image ...nooriasukmaningtyas
The extended use of picture-enhancing and manipulation tools has made it easy to manipulate multimedia data, including digital images. These manipulations disturb the truthfulness and lawfulness of images, resulting in misapprehension, and may disturb social security. Image forensic approaches have been employed for detecting whether or not an image has been manipulated with attacks such as splicing and copy-move. This paper provides a competent tampering detection technique using resampling features and a convolutional neural network (CNN). In this model, range spatial filtering (RSF)-CNN, the image is divided into consistent patches during preprocessing. Then, within every patch, resampling features are extracted using an affine transformation and the Laplacian operator. The extracted features are accumulated into descriptors by the CNN. A wide-ranging analysis is performed to assess the tampering detection and tampered-region segmentation accuracies of the proposed RSF-CNN based procedure under various falsifications and post-processing attacks, including joint photographic experts group (JPEG) compression, scaling, rotations, noise additions, and multiple manipulations. The achieved results show that RSF-CNN based tampering detection attains adequately higher accuracy than existing tampering detection methodologies.
LIDAR POINT CLOUD CLASSIFICATION USING EXPECTATION MAXIMIZATION ALGORITHMijnlc
The EM algorithm is a common algorithm in data mining. By alternating its two steps, E and M, the algorithm builds a model that can assign class labels to data points. In addition, EM not only optimizes the parameters of the model but can also predict missing data during the iterations. This paper therefore focuses on researching and improving the EM algorithm to suit LiDAR point cloud classification, based on the idea of partitioning the point cloud and using a scheduling parameter in the E-step to help the algorithm converge faster with a shorter run time. The proposed algorithm is tested on a measurement dataset from Nghe An province, Vietnam, achieving more than 92% accuracy and a faster runtime than the original EM algorithm.
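The E/M alternation underlying the approach above can be shown on a toy 1-D two-component Gaussian mixture: the E-step computes per-point responsibilities, the M-step re-estimates means, variances, and weights, and labels come from the final responsibilities. This is a textbook EM sketch, not the paper's LiDAR-specific algorithm (it has no partitioning or scheduling parameter), and `em_gmm_1d` is a hypothetical name.

```python
import math

def em_gmm_1d(xs, iters=50):
    """Two-component 1-D Gaussian-mixture EM, then label each point."""
    mu = [min(xs), max(xs)]     # crude initialization from the data range
    var = [1.0, 1.0]
    pi = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: responsibility of each component for each point
        r = []
        for x in xs:
            p = [pi[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            r.append([pk / s for pk in p])
        # M-step: re-estimate parameters from the responsibilities
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, xs)) / nk
            var[k] = max(1e-6, sum(ri[k] * (x - mu[k]) ** 2
                                   for ri, x in zip(r, xs)) / nk)
            pi[k] = nk / len(xs)
    return [0 if ri[0] > ri[1] else 1 for ri in r]
```

On two well-separated clusters the labels split cleanly along the cluster boundary.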
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
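The parameter savings that motivate depthwise separable convolutions in EDGE-Net can be checked with simple counting: a standard k x k convolution couples every input channel to every output channel, while the separable form factors this into a per-channel spatial filter plus a 1 x 1 pointwise mix. The function names below are illustrative, and biases are omitted for simplicity.

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def ds_conv_params(c_in, c_out, k):
    """Depthwise separable: one k x k filter per input channel,
    then a 1 x 1 pointwise convolution mixing channels (no bias)."""
    return k * k * c_in + c_in * c_out
```

For a typical 3 x 3 layer with 64 input and 128 output channels, the separable version needs roughly 8x fewer parameters, which is the kind of reduction that makes a 500k-parameter edge detector feasible.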
Fast and accurate primary user detection with machine learning techniques for...nooriasukmaningtyas
Spectrum decision is an important and crucial task by which a secondary user can access unoccupied licensed spectrum for transmission. Efficient spectrum management relies on spectrum sensing, and determining the presence of the primary user in the spectrum is essential before using the primary user's licensed band. The missing pieces in spectrum management are information about the primary user's presence, accuracy in determining the existence of a user in the spectrum, the computation cost, and the difficulty of detecting the user at low signal-to-noise ratio (SNR) values. The proposed system overcomes these limitations. It tests several machine learning techniques: decision trees, support vector machines, naive Bayes, ensemble-based trees, nearest neighbours, and logistic regression. As a first step, spectrum sensing is performed in two stages with orthogonal frequency division multiplexing and an energy detection algorithm at various SNR values. The results generated from this algorithm are used to build a database. Next, the different machine learning techniques are trained and compared on speed, training time, and prediction accuracy. All the algorithms achieve accurate detection of the user's presence in the spectrum at low SNR values, but their computation costs differ. Among the tested techniques, the k-nearest neighbour (KNN) algorithm gives the best performance in the least time.
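The KNN classification step singled out above can be sketched with a plain majority-vote implementation over labelled energy-feature vectors. This is a generic KNN sketch, not the paper's pipeline; the feature layout (one detected-energy value per sample) and the labels are assumptions for illustration.

```python
def knn_predict(train, query, k=3):
    """Classify a query feature vector (e.g. detected energy per band)
    by majority vote among its k nearest labelled training samples."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(feat, query)), label)
        for feat, label in train
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)
```

With low-energy samples labelled "idle" (band free) and high-energy samples labelled "busy" (primary user present), a new measurement is assigned to whichever class dominates its neighbourhood.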
A New Approach for Solving Inverse Scattering Problems with Overset Grid Gene...TELKOMNIKA JOURNAL
This paper presents a new approach to Forward-Backward Time-Stepping (FBTS) utilizing the Finite-Difference Time-Domain (FDTD) method with the Overset Grid Generation (OGG) method to solve inverse scattering problems for electromagnetic (EM) waves. The FDTD method is combined with the OGG method to reduce a geometrically complex problem to a simple set of grids. The grids can be modified easily without the need to regenerate the grid system, thus providing an efficient approach to integrate with the FBTS technique. Here, the characteristics of the EM waves are analyzed. For the research reported in this paper, the 'measured' signals are synthetic data generated by FDTD simulations, while the 'simulated' signals are the calculated data. The accuracy of the proposed approach is validated, and good agreement is obtained between the simulated and measured data. The proposed approach has the potential to provide useful quantitative information about an unknown object, particularly for shape reconstruction, object detection, and other tasks.
A comparative study for the assessment of Ikonos satellite image-fusion techn...IJEECSIAES
Image fusion provides users with detailed information about the urban and rural environment, which is useful for applications such as urban planning and management when higher-spatial-resolution images are not available. There are different image fusion methods. This paper implements, evaluates, and compares six satellite image-fusion methods, namely the wavelet 2D-M transform, Gram-Schmidt, high-frequency modulation, the high pass filter (HPF) transform, simple mean value, and PCA. An Ikonos image (panchromatic, PAN, and multispectral, MULTI) showing the northwest of Bogotá (Colombia) is used to generate six fused images: MULTIWavelet 2D-M, MULTIG-S, MULTIMHF, MULTIHPF, MULTISMV, and MULTIPCA. To assess the efficiency of the six image-fusion methods, the resulting images were evaluated in terms of both spatial and spectral quality, using four metrics: the correlation index, erreur relative globale adimensionnelle de synthèse (ERGAS), relative average spectral error (RASE), and the Q index. The best results were obtained for the MULTISMV image, which exhibited a spectral correlation higher than 0.85, a Q index of 0.84, and the highest scores in the spectral assessment according to ERGAS and RASE, 4.36% and 17.39% respectively.
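The first of the four metrics above, the correlation index, is just the Pearson correlation between corresponding bands of the original and fused images, computed over flattened pixel values. A minimal sketch (function name illustrative, inputs assumed to be equal-length flattened bands):

```python
import math

def correlation_index(a, b):
    """Pearson correlation between two flattened image bands; values
    near 1 indicate the fused band preserves the original's structure."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

A perfectly linear relationship between bands gives 1.0; an inverted band gives a negative value, which is why a fused product scoring above 0.85 is considered spectrally faithful.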
Elliptical curve cryptography image encryption scheme with aid of optimizatio...IJEECSIAES
Image encryption enables users to safely transmit digital photographs over a wireless medium while maintaining enhanced anonymity and validity. Numerous studies are being conducted to strengthen image encryption systems. Elliptic curve cryptography (ECC) is an effective tool in asymmetric cryptosystems for safely transferring images and recovering them at the receiver end. Its key generation produces a public and private key pair used to encrypt and decrypt a picture: the sender encrypts the picture with the public key before sending it to the intended user, and the receiver decrypts it with their private key. This paper proposes an ECC-based image encryption scheme enhanced with the gravitational search algorithm (GSA). The private-key generation step of the ECC system uses a GSA-based optimization process to boost the efficiency of image encryption. A quality attribute of the output image, the peak signal-to-noise ratio (PSNR), serves as the fitness value in the optimization phase, demonstrating the efficacy of the proposed approach. Compared with the plain ECC method, the suggested encryption scheme is found to offer better optimal PSNR values.
Design secure multi-level communication system based on duffing chaotic map a...IJEECSIAES
Cryptography and steganography are among the most important sciences used to keep confidential data from potential spies and hackers; they can be used separately or together. Encryption is based on the principle of converting valuable information into a form that unauthorized persons cannot understand without decrypting it, while steganography is the science of embedding confidential data inside a cover in a way that cannot be recognized or seen by the human eye. This paper presents a high-resolution chaotic approach applied to images that hide information. A more secure and reliable system is designed to properly embed confidential data transmitted through transmission channels, by using encryption and steganography together. This work proposes a new method that achieves a very high level of hiding based on non-uniform systems, generating a random index vector (RIV) for the hidden data within the least significant bit (LSB) image pixels. The method prevents degradation of image quality. Simulation results show a peak signal-to-noise ratio (PSNR) of up to 74.87 dB and a mean square error (MSE) as low as 0.0828, which sufficiently indicates the effectiveness of the proposed algorithm.
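The RIV-over-LSB idea can be sketched with a seeded pseudo-random index vector standing in for the chaotic generator: the sender embeds payload bits in the LSBs of pixels chosen by the RIV, and the receiver regenerates the same RIV from a shared seed to read them back. This is a simplified sketch (a seeded PRNG, not the paper's Duffing chaotic map); the function names are hypothetical.

```python
import random

def embed_lsb(pixels, bits, seed):
    """Embed payload bits into the LSBs of pixels chosen by a seeded
    random index vector (RIV). Only the selected pixels change, and
    each changes by at most 1, so image quality is barely affected."""
    riv = random.Random(seed).sample(range(len(pixels)), len(bits))
    out = list(pixels)
    for idx, bit in zip(riv, bits):
        out[idx] = (out[idx] & ~1) | bit
    return out

def extract_lsb(pixels, n_bits, seed):
    """Regenerate the same RIV from the shared seed and read the LSBs."""
    riv = random.Random(seed).sample(range(len(pixels)), n_bits)
    return [pixels[i] & 1 for i in riv]
```

Because the RIV is derived from the seed rather than stored, an eavesdropper without the seed does not know which pixels carry data.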
An internet of things-based automatic brain tumor detection systemIJEECSIAES
Owing to advances in information and communication technologies, the internet of things (IoT) has reached an evolutionary stage in the development of the modern health care environment. In recent human health care analysis, the number of brain tumor patients has increased severely, making brain tumors the tenth leading cause of death. Previous state-of-the-art techniques based on magnetic resonance imaging (MRI) face challenges in brain tumor detection because they require accurate image segmentation. A wide variety of algorithms were developed earlier to classify MRI images, but they are computationally complex and expensive. In this paper, a cost-effective stochastic method for automatic brain tumor detection using the IoT is proposed. The proposed system uses the physical activities of the brain to detect brain tumors. To track daily brain activity, a portable wrist band (Mi Band 2) together with temperature and blood-pressure monitoring sensors embedded with an Arduino Uno is used, and the system achieves an accuracy of 99.3%. Experimental results show the effectiveness of the designed method in detecting brain tumors automatically and its better accuracy compared with previous approaches.
A YOLO and convolutional neural network for the detection and classification ...IJEECSIAES
The developing of deep learning systems that used for chronic diseases diagnosing is challenge. Furthermore, the localization and identification of objects like white blood cells (WBCs) in leukemia without preprocessing or traditional hand segmentation of cells is a challenging matter due to irregular and distorted of nucleus. This paper proposed a system for computer-aided detection depend completely on deep learning with three models computeraided detection (CAD3) to detect and classify three types of WBC which is fundamentals of leukemia diagnosing. The system used modified you only look once (YOLO v2) algorithm and convolutional neural network (CNN). The proposed system trained and evaluated on dataset created and prepared specially for the addressed problem without any traditional segmentation or preprocessing on microscopic images. The study proved that dividing of addressed problem into sub-problems will achieve better performance and accuracy. Furthermore, the results show that the CAD3 achieved an average precision (AP) up to 96% in the detection of leukocytes and accuracy 94.3% in leukocytes classification. Moreover, the CAD3 gives report contain a complete information of WBC. Finally, the CAD3 proved its efficiency on the other dataset such as acute lymphoblastic leukemia image database (ALL-IBD1) and blood cell count dataset (BCCD).
Development of smart machine for sorting of deceased onionsIJEECSIAES
Today, we are thinking to raise farmer’s income through various means and measures. Implementation of new crop patterns, technology inclusion and promoting the eshtablishment of numerous agro processing industries will play a major role in agriculture sector. The labour issue is also one of the main concerns in many of the agricultural activities. In this paper we propose a technological evolvement in onion detection process, where we apply image processing and sensory mechanism to identify sprouted and rotten onions respectively. This will yield to quick, accurate and prompt supply of goods to the market, irrespective of lack of consistent but costly manpower. The efficiency of this prototype in identifying the sprouted onions with the help of camera is observed to be upto 87% and also the response of Gas sensing system in detecting rooten onions under prescribed chamber dimensions is analysed and obtained encouraging results.
State and fault estimation based on fuzzy observer for a class of Takagi-Suge...IJEECSIAES
Singular nonlinear systems have received wide attention in recent years, and can be found in various applications of engineering practice. On the basis of the Takagi-Sugeno (T-S) formalism, which represents a powerful tool allowing the study and the treatment of nonlinear systems, many control and diagnostic problems have been treated in the literature. In this work, we aim to present a new approach making it possible to estimate simultaneously both non-measurable states and unknown faults in the actuators and sensors for a class of continuous-time Takagi-Sugeno singular model (CTSSM). Firstly, the considered class of CTSSM is represented in the case of premise variables which are non-measurable, and is subjected to actuator and sensor faults. Secondly, the suggested observer is synthesized based on the decomposition approach. Next, the observer’s gain matrices are determined using the Lyapunov theory and the constraints are defined as linear matrix inequalities (LMIs). Finally, a numerical simulation on an application example is given to demonstrate the usefulness and the good performance of the proposed dynamic system.
Hunting strategy for multi-robot based on wolf swarm algorithm and artificial...IJEECSIAES
The cooperation and coordination in multi-robot systems is a popular topic in the field of robotics and artificial intelligence, thanks to its important role in solving problems that are better solved by several robots compared to a single robot. Cooperative hunting is one of the important problems that exist in many areas such as military and industry, requiring cooperation between robots in order to accomplish the hunting process effectively. This paper proposed a cooperative hunting strategy for a multi-robot system based on wolf swarm algorithm (WSA) and artificial potential field (APF) in order to hunt by several robots a dynamic target whose behavior is unexpected. The formation of the robots within the multi-robot system contains three types of roles: the leader, the follower, and the antagonist. Each role is characterized by a different cognitive behavior. The robots arrive at the hunting point accurately and rapidly while avoiding static and dynamic obstacles through the artificial potential field algorithm to hunt the moving target. Simulation results are given in this paper to demonstrate the validity and the effectiveness of the proposed strategy.
Agricultural harvesting using integrated robot systemIJEECSIAES
In today's competitive world, robot designs are developed to simplify and improve quality wherever necessary. The rise in technology and modernization has led people from the unskilled sector to shift to the skilled sector. The agricultural sector's solution for harvesting fruits and vegetables is manual labor and a few other agro bots that are expensive and have various limitations when it comes to harvesting. Although robots present may achieve harvesting, the affordability of such designs may not be possible by small and medium-scale producers. The integrated robot system is designed to solve this problem, and when compared with the existing manual methods, this seems to be the most cost-effective, efficient, and viable solution. The robot uses deep learning for image detection, and the object is acquired using robotic manipulators. The robot uses a Cartesian and articulated configuration to perform the picking action. In the end, the robot is operated where carrots and cantaloupes were harvested. The data of the harvested crops are used to arrive at the conclusion of the robot's accuracy.
Characterization of silicon tunnel field effect transistor based on charge pl...IJEECSIAES
The aim of the proposed paper is an analytical model and realization of the characteristics for tunnel field-effect transistor (TFET) based on charge plasma (CP). One of the most applications of the TFET device which operates based on CP technique is the biosensor. CP-TFET is to be used as an effective device to detect the uncharged molecules of the bio-sample solution. Charge plasma is one of some techniques that recently invited to induce charge carriers inside the devices. In this proposed paper we use a high work function in the source (ϕ=5.93 eV) to induce hole charges and we use a lower work function in drain (ϕ=3.90 eV) to induce electron charges. Many electrical characterizations in this paper are considered to study the performance of this device like a current drain (ID) versus voltage gate (Vgs), ION/IOFF ratio, threshold voltage (VT) transconductance (gm), and subthreshold swing (SS). The signification of this paper comes into view enhancement the performance of the device. Results show that high dielectric (K=12), oxide thickness (Tox=1 nm), channel length (Lch=42 nm), and higher work function for the gate (ϕ=4.5 eV) tend to best charge plasma silicon tunnel field-effect transistor characterization.
A new smart approach of an efficient energy consumption management by using a...IJEECSIAES
Many consumers of electric power have excesses in their electric power consumptions that exceed the permissible limit by the electrical power distribution stations, and then we proposed a validation approach that works intelligently by applying machine learning (ML) technology to teach electrical consumers how to properly consume without wasting energy expended. The validation approach is one of a large combination of intelligent processes related to energy consumption which is called the efficient energy consumption management (EECM) approaches, and it connected with the internet of things (IoT) technology to be linked to Google Firebase Cloud where a utility center used to check whether the consumption of the efficient energy is satisfied. It divides the measured data for actual power (Ap) of the electrical model into two portions: the training portion is selected for different maximum actual powers, and the validation portion is determined based on the minimum output power consumption and then used for comparison with the actual required input power. Simulation results show the energy expenditure problem can be solved with good accuracy in energy consumption by reducing the maximum rate (Ap) in a given time (24) hours for a single house, as well as electricity’s bill cost, is reduced.
Parameter selection in data-driven fault detection and diagnosis of the air c...IJEECSIAES
Data-driven fault detection and diagnosis system (FDD) has been proven as simple yet powerful to identify soft and abrupt faults in the air conditioning system, leading to energy saving. However, the challenge is to obtain reliable operation data from the actual building. Therefore, a lab-scaled centralized chilled water air conditioning system was successfully developed in this paper. All necessary sensors were installed to generate reliable operation data for the data-driven FDD. Nevertheless, if a practical system is considered, the number of sensors required would be extensive as it depends on the number of rooms in the building. Hence, parameters impact in the dataset were also investigated to identify critical parameters for fault classifications. The analysis results had identified four critical parameters for data-driven FDD: the rooms' temperature (TTCx), supplied chilled water temperature (TCHWS), supplied chilled water flow rate (VCHWS) and supplied cooled water temperature (TCWS). Results showed that the data-driven FDD successfully diagnosed all six conditions correctly with the proposed parameters for more than 92.3% accuracy; only 0.6-3.4% differed from the original dataset's accuracy. Therefore, the proposed parameters can reduce the number of sensors used for practical buildings, thus reducing installation costs without compromising the FDD accuracy.
Electrical load forecasting through long short term memoryIJEECSIAES
For a power supplier, meeting demand-supply equilibrium is of utmost importance. Electrical energy must be generated according to demand, as a
large amount of electrical energy cannot be stored. For the proper
functioning of a power supply system, an adequate model for predicting load is a necessity. In the present world, in almost every industry, whether it be healthcare, agriculture, and consulting, growing digitization and automation is a prominent feature. As a result, large sets of data related to these industries are being generated, which when subjected to rigorous analysis,
yield out-of-the-box methods to optimize the business and services offered. This paper aims to ascertain the viability of long short term memory (LSTM)
neural networks, a recurrent neural network capable of handling both longterm and short-term dependencies of data sets, for predicting load that is to
be met by a Dispatch Center located in a major city. The result shows appreciable accuracy in forecasting future demand.
A simple faulted phase-based fault distance estimation algorithm for a loop d...IJEECSIAES
This paper presents a single ended faulted phase-based traveling wave fault localization algorithm for loop distribution grids which is that the sensor can
get many reflected signals from the fault point to face the complexity of localization. This localization algorithm uses a band pass filter to remove noise from the corrupted signal. The arriving times of the faulted phasebased filtered signals can be obtained by using phase-modal and discrete
wavelet transformations. The estimated fault distance can be calculated using the traveling wave method. The proposed algorithm presents detail
level analysis using three detail levels coefficients. The proposed algorithm is tested with MATLAB simulation single line to ground fault in a 10 kV grounded loop distribution system. The simulation result shows that the
faulted phase time delay can give better accuracy than using conventional time delays. The proposed algorithm can give fault distance estimation accuracy up to 99.7% with 30 dB contaminated signal-to-noise ratio (SNR)
for the nearest lines from the measured terminal.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
A new function of stereo matching algorithm based on hybrid convolutional neural network
Indonesian Journal of Electrical Engineering and Computer Science
Vol. 25, No. 1, January 2022, pp. 223~231
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v25.i1.pp223-231
Journal homepage: http://ijeecs.iaescore.com
Mohd Saad Hamid1,2, Nurulfajar Abd Manap1, Rostam Affendi Hamzah2, Ahmad Fauzan Kadmin1,2, Shamsul Fakhar Abd Gani2, Adi Irwan Herman3
1 Fakulti Kejuruteraan Elektronik dan Kejuruteraan Komputer, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
2 Fakulti Teknologi Kejuruteraan Elektrik dan Elektronik, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
3 Product & Test Engineering, Texas Instruments, Batu Berendam, Melaka, Malaysia
Article Info

Article history: Received Apr 23, 2021; Revised Nov 23, 2021; Accepted Nov 30, 2021

ABSTRACT

This paper proposes a new hybrid method between learning-based and handcrafted methods for a stereo matching algorithm. The main purpose of the stereo matching algorithm is to produce a disparity map. This map is essential for many applications, including three-dimensional (3D) reconstruction. The raw disparity map computed by a convolutional neural network (CNN) is still prone to errors in low texture regions. The algorithm improves the matching cost computation stage by combining a hybrid CNN-based method with truncated directional intensity computation. The difference in truncated directional intensity values is employed to decrease radiometric errors. The proposed method's raw matching cost then goes through a cost aggregation step using the bilateral filter (BF) to improve accuracy. Winner-take-all (WTA) optimization uses the aggregated cost volume to produce an initial disparity map. Finally, a series of refinement processes enhances the initial disparity map for a more accurate final disparity map. This paper verifies the performance of the algorithm using the Middlebury online stereo benchmarking system. The proposed algorithm achieves the objective of generating a more accurate and smooth disparity map with different depths in low texture regions through better matching cost quality.
Keywords: Convolutional neural network; Directional intensity; Stereo matching algorithm; Stereo vision
This is an open access article under the CC BY-SA license.
Corresponding Author:
Mohd Saad Hamid
Fakulti Teknologi Kejuruteraan Elektrik dan Elektronik, Universiti Teknikal Malaysia Melaka
Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia
Email: mohdsaad@utem.edu.my
1. INTRODUCTION
Recent years have seen a rise in the number of significant advancements in the stereo vision area. As
one of the active topics under stereo vision, the stereo matching problem still appears to be actively discussed
among researchers around the globe today [1]. One of the objectives in stereo matching algorithm
development is to produce a disparity or depth map. The map contains the disparity values between pixel
locations of matching features in the left and right images. The map can provide the depth information
between the camera and object. This information is valuable for applications such as three-dimensional (3D)
reconstruction, obstacle avoidance, robotics, and navigation. The stereo vision approach has also been used to generate 3D facial images for facial recognition [2].
Besides the stereo-based approach, as mentioned by [3], the single view or monocular-based method
can also produce depth-related information. This approach can generate a disparity map using a single image.
However, as [3] and [4] pointed out, the monocular depth method was less accurate than the stereo-based approach. This is because a multi-view setup contains more information than a single-view
approach. Another merit of the stereo-based method, as mentioned by [4], is its capability to outperform the
monocular method in recovering the objects of interest.
Another method for generating depth-related information is light detection and ranging (LiDAR). LiDAR uses a laser-based approach to sense and measure depth. The LiDAR method can provide accurate 3D points [4]. However, this method is costly and time-consuming [5]. Due to the highlighted shortcomings of the monocular and LiDAR methods, we are inspired to continue working on
the stereo-based method and propose our algorithm.
Figure 1 illustrates the crucial steps for the stereo vision algorithm. The formalization of the
algorithm steps is related to the previous literature [6]-[9]. The first step is matching cost computation. Some
researchers utilize the directional intensity difference to compute matching costs [10]. The calculation of
absolute and squared differences is also commonly used because it requires low computational complexity.
The next step, cost aggregation, can decrease the errors in the initial matching cost. The edge-preserving
filters such as bilateral filter (BF) [11] and guided filter (GF) [12] can achieve the purpose. These filters
maintain good edges while smoothing the input. Thus, they provide better results in the aggregation step than low pass filters (Gaussian and box filters). The third step is in charge of assigning a value to the disparity map. Winner-take-all (WTA) optimization is the most popular method for this step in the local approach. In
WTA, the disparity with the smallest cost value is selected at every pixel location [7] to generate an initial disparity map. The third step's initial disparity may still contain errors caused by occlusions, low textures, and invalid matches [13]. So, the final step may include multiple post-processing steps to refine the map. The authors of [12] used the left-to-right consistency (LRC) check process to identify invalid matched pixels.
The median filter is also commonly used for local refinement [14]-[16]. It is another type of non-linear filter
for denoising purposes. Due to its complexity, this final step can add additional time to the overall process.
Figure 1. Stereo vision algorithm steps
Deep learning has become the driving force behind advancements in the computer vision field [17]. Deep learning utilizes artificial neural networks to learn from large amounts of data. The authors of [18] mentioned that conventional stereo vision performance will be improved by incorporating deep learning for recognition tasks. The implementation of deep learning in stereo vision can be divided into two approaches. The first approach combines deep learning with a traditional handcrafted algorithm, such as matching cost with a convolutional neural network (MC-CNN-acrt) [14]. The MC-CNN-acrt architecture outperforms other conventional stereo matching methods on the Middlebury [19] benchmarking system. The authors of [14] construct and train their convolutional neural network (CNN) based network using binary classification to solve the matching cost computation step. Another researcher in [20] proposed CNN-based similarity learning in Euclidean space using a dot product. The method computes faster than MC-CNN, but the result is less accurate. CNN is also used in stereo matching because of its feature extraction capabilities and its robustness to radiometric differences [15]. The second approach is the pure deep learning end-to-end network. This approach does not require any handcrafted algorithm; the end-to-end deep learning network performs all stereo matching stages in one combined network. The geometry and context network (GC-Net) by [21] uses 2D CNN to form cost volumes and a soft-argmin layer to regress the disparity values. The authors of [22] introduce PSMNet, which produces a faster and more accurate disparity map than GC-Net.
The hybrid method between learning-based and handcrafted algorithms was also introduced recently by [23]. Their deep learning network produces a disparity map and predictions of uncertainties. They also propose a handcrafted method to enhance their disparity map using a modified SGBM algorithm (SGBMP) [23]. The work in [24] introduces a hybrid model between a CNN and the learning-based conditional random field (CRF) model to form an end-to-end learning network. The performance of the hybrid and pure end-to-end methods will be
discussed further in section 3.
The author of [25] concluded that the end-to-end method still has some demerits in ill-posed regions and is computationally expensive. Additionally, [14] also pointed out that the raw disparity map generated by a CNN is prone to errors in low texture and occluded regions. The problem with the low texture region is also mentioned by [13], as it affects the similarity measurement. Other researchers [26]-[28] also stated that the low texture region is one of the most challenging areas for stereo matching methods. So, in the
following section of this paper, we will present our proposed stereo matching algorithm. This paper’s main
contribution is the hybrid converged classification CNN fused with directional intensity information for the
matching cost computation stage. The main objective is to produce a better disparity map by focusing on the
low texture region errors.
2. THE PROPOSED METHOD
As mentioned in the earlier section, we present our proposed algorithm as categorized in [6]. A summary view of our algorithm is illustrated in Figure 2. Our proposed algorithm comprises four stages.
Figure 2. The proposed stereo vision algorithm steps
2.1. Matching cost computation
Our CNN-based model is inspired by the MC-CNN-acrt [14] architecture. However, we improved the architecture, as discussed in [29]. In this paper, we extend the improvement through our new findings. As discussed in our previous work, the converged classification method in our Siamese-based CNN model [29] has been used for the matching cost computation stage. We then fused it with additional directional intensity differences to improve the accuracy of the matching costs computed in this stage. We maintained the eight layers of our CNN architecture as in our previous work. The network produces a similarity score that provides a binary classification of good or bad matching. Based on (1), C_CNN reflects the cost value for all disparities d at each pixel position p.
C_CNN(p, d) = −s(P_N^L(p), P_N^R(p − d)) (1)
Input patches of N×N size from the left and right images, P_N^L and P_N^R respectively, are supplied to the CNN. The minus sign translates the similarity score into an initial matching cost. The matching cost values obtained by (1) are also discussed in [29]. Another crucial part of this cost computation stage is the directional intensity difference. The second part of the cost function is defined in (2).
C_DI(p, d) = α ⋅ min(‖I_L(p) − I_R(p − d)‖, τ1) + (1 − α) ⋅ min(‖∇_x I_L(p) − ∇_x I_R(p − d)‖, τ2) (2)
Here, I(p) represents the colour vector of a pixel at position p, and ∇_x is the directional intensity difference in the x-direction. The parameter α balances the directional intensity difference values, while τ1 and τ2 are truncation values. This paper improves the matching cost step from our previous work [29] by combining it with the directional intensity difference-based cost volume, C_DI, as in (3).
C_IM(p, d) = C_DI(p, d) + λ ⋅ C_CNN(p, d) (3)
The parameter λ balances the two cost volumes. The raw matching cost volume from this stage is defined as C_IM.
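As an illustration of how the fused cost volume in (1)-(3) can be assembled, the following sketch computes the truncated directional intensity cost and combines it with a CNN-derived cost. Since the trained Siamese network is not reproduced here, the similarity s(·,·) is replaced by a hypothetical stand-in function, and the parameter values (α, τ1, τ2, λ), patch handling, and image sizes are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def truncated_di_cost(left, right, d, alpha=0.5, tau1=10.0, tau2=2.0):
    """C_DI(p, d) of eq. (2): truncated absolute intensity and x-gradient differences."""
    shifted = np.roll(right, d, axis=1)              # I_R(p - d), wrap-around at borders
    c_int = np.minimum(np.abs(left - shifted), tau1)
    grad_l = np.gradient(left, axis=1)               # x-direction intensity gradient
    grad_r = np.gradient(shifted, axis=1)
    c_grad = np.minimum(np.abs(grad_l - grad_r), tau2)
    return alpha * c_int + (1.0 - alpha) * c_grad

def combined_cost_volume(left, right, max_d, score, lam=1.0):
    """C_IM(p, d) = C_DI(p, d) + lambda * C_CNN(p, d), with C_CNN = -s (eqs. (1), (3))."""
    volume = np.empty((max_d,) + left.shape)
    for d in range(max_d):
        c_cnn = -score(left, np.roll(right, d, axis=1))   # eq. (1): negated similarity
        volume[d] = truncated_di_cost(left, right, d) + lam * c_cnn
    return volume

def dummy_score(a, b):
    """Hypothetical stand-in for the trained Siamese CNN: higher means more similar."""
    return -np.abs(a - b)

L = np.random.default_rng(0).uniform(0.0, 255.0, (8, 16))
R = np.roll(L, -2, axis=1)                               # toy pair with a true disparity of 2
C = combined_cost_volume(L, R, max_d=4, score=dummy_score)
disparity = C.argmin(axis=0)
```

With this toy input, every pixel's minimum cost lands at the true shift of two pixels; in the actual method the stand-in score would be replaced by the converged classification CNN's output.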
2.2. Cost aggregation
In the second stage, we refine the raw matching costs to create a more accurate disparity map. This is because the initial cost volume generated from the previous stage is prone to noise. Therefore, we aggregate the cost volume in this stage using a bilateral filter (BF). The BF is responsible for performing smoothing
operations on the matching costs while keeping the edges sharp. The BF [11] defined in (4) aggregates the raw matching cost volume, C_IM.
BF[I]_p = (1/W_p) ∑_{q∈S} G_σs(‖p − q‖) G_σr(|I_p − I_q|) (4)
where W_p is the normalization factor, and I_p and I_q represent the intensities at pixel positions p and q, respectively. p represents the (x, y) coordinates of the pixel of interest, while the neighbouring coordinate q is within the support region S. The aggregated cost at the end of this step is denoted as C_AGG in (5).
C_AGG(p, d) = BF[I] C_IM(p, d) (5)
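A minimal NumPy sketch of the aggregation in (4)-(5): one disparity slice of the raw cost volume is smoothed with weights drawn from spatial closeness and intensity similarity in a guidance image. The window radius and the σs, σr values here are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

def bilateral_aggregate(cost_slice, guide, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Aggregate one cost slice per eq. (4): spatial Gaussian G_sigma_s times
    range Gaussian G_sigma_r on the guidance image, normalized by W_p."""
    h, w = guide.shape
    out = np.zeros_like(cost_slice)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            g_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            g_r = np.exp(-((guide[y0:y1, x0:x1] - guide[y, x]) ** 2) / (2 * sigma_r ** 2))
            wgt = g_s * g_r                      # joint bilateral weight
            out[y, x] = np.sum(wgt * cost_slice[y0:y1, x0:x1]) / np.sum(wgt)
    return out

guide = np.tile(np.linspace(0.0, 255.0, 12), (8, 1))    # smooth guidance image
raw = np.random.default_rng(1).normal(5.0, 1.0, (8, 12))  # noisy cost slice
agg = bilateral_aggregate(raw, guide)
```

The explicit loops keep the sketch transparent; in practice a library routine such as OpenCV's bilateralFilter would replace them for speed.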
2.3. Disparity computation / optimization
In this disparity computation stage, we compute the initial disparity map, d(p), using winner-take-all (WTA) optimization, as defined in (6).
d(p) = argmin_{d∈d_r} (C_AGG(p, d)) (6)
The set of candidate disparity values 𝑑𝑟 is obtained from the ground truth range. 𝑑(𝑝) is the disparity value
chosen for position 𝑝, which represents the 2D coordinates (𝑥, 𝑦).
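Since WTA in (6) is simply a per-pixel argmin over the disparity axis, it reduces to one line when the aggregated costs are stored as an H × W × D array:

```python
import numpy as np

def wta_disparity(cost_agg):
    """Winner-take-all selection, eq. (6).

    cost_agg: H x W x D aggregated cost volume C_AGG.
    Returns an H x W map holding, for each pixel p, the disparity index d
    that minimises C_AGG(p, d).
    """
    return np.argmin(cost_agg, axis=2)
```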
2.4. Disparity refinement
This refinement stage consists of several steps. As discussed in [12], a left-right consistency (LRC)
check and interpolation are used to deal with occlusions and mismatches. First, we employ the LRC check to
identify invalid pixels. Each detected invalid pixel is then replaced with a valid pixel through a hole-filling
process. After that, we apply a guided image filter (GIF) for its good edge-preserving capabilities.
The filter kernel for GIF is implemented in this paper as defined in (7).
𝐺𝑝,𝑞(𝐼) = (1/|𝑤|²) ∑𝑘:(𝑝,𝑞)∈𝑤𝑘 (1 + (𝐼𝑝 − 𝜇𝑘)(𝐼𝑞 − 𝜇𝑘)/(𝜎𝑘² + 𝜀)) (7)
where 𝐼 is the guidance image and 𝑝 represents the (𝑥, 𝑦) coordinates of the pixel of interest. Another pixel
position, 𝑞, denotes a neighbouring pixel in the support region 𝑤𝑘. Then, 𝜎𝑘² and 𝜇𝑘 are the variance and
mean of the intensity values in 𝑤𝑘, respectively. The parameter 𝜀 is used as a control element for the
smoothness term. The refined disparity map using GIF is defined as 𝑑𝐺𝐹 in (8).
𝑑𝐺𝐹(𝑝) = 𝐺𝑝,𝑞(𝐼)𝑑(𝑝) (8)
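In practice the GIF of (7) and (8) is usually evaluated not through the kernel directly but via the equivalent box-filter formulation of He et al.; a grayscale-guide sketch, with an illustrative radius and ε:

```python
import numpy as np

def box_mean(a, r):
    """Mean over a (2r+1)x(2r+1) window with edge replication, via cumulative sums."""
    pad = np.pad(a, r, mode='edge')
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # prepend a zero row/column
    n = 2 * r + 1
    s = c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]
    return s / (n * n)

def guided_filter(I, p, r=4, eps=1e-3):
    """Guided image filter: per-window linear model q = a*I + b, averaged."""
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    var_I = box_mean(I * I, r) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)      # eps controls the smoothness term
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)
```

Applying guided_filter with the reference image as guide and the initial disparity map as input produces the refined map dGF.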
The weighted median filter (WMF) is then applied to remove any remaining outliers in the disparity
map. We use the cosine similarity weight function defined in (9) for the weighted median filtering.
𝑊𝑝^𝑐𝑜𝑠 = (𝐼𝑝 ⋅ 𝐼𝑞) / (‖𝐼𝑝‖ ⋅ ‖𝐼𝑞‖) (9)
𝑑𝑓(𝑝) = 𝑊𝑝^𝑐𝑜𝑠 ⋅ ℎ(𝑝) ⋅ 𝑑𝐺𝐹(𝑝) (10)
After the WMF is applied, the final disparity map generated by the whole proposed algorithm is
represented by 𝑑𝑓(𝑝) in (10). The following section provides a quantitative and qualitative overview of the
proposed method’s performance.
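A naive sketch of a weighted median filter using the cosine-similarity weights of (9); the window radius is an assumption, and real implementations use much faster constant-time WMF algorithms:

```python
import numpy as np

def weighted_median(values, weights):
    """Return the value at which the cumulative weight first reaches half the total."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cw = np.cumsum(w)
    return v[np.searchsorted(cw, 0.5 * cw[-1])]

def wmf(disp, guide_rgb, r=2):
    """Weighted median filtering of a disparity map: each neighbour q is
    weighted by the cosine similarity of colour vectors I_p and I_q, eq. (9)."""
    h, w = disp.shape
    out = disp.astype(float).copy()
    eps = 1e-12  # guard against zero-norm colour vectors
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            patch = guide_rgb[y0:y1, x0:x1].reshape(-1, 3)
            Ip = guide_rgb[y, x]
            wts = patch @ Ip / (np.linalg.norm(patch, axis=1) * np.linalg.norm(Ip) + eps)
            out[y, x] = weighted_median(disp[y0:y1, x0:x1].ravel(), wts)
    return out
```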
3. RESULTS AND DISCUSSION
Several tests were executed on a personal computer (PC) platform to examine the proposed method’s
performance using the Middlebury v3 datasets [19]. The main code runs on the Python platform, utilizing
the Keras and TensorFlow libraries to perform the CNN architecture’s inference. The hardware used in this
test is a PC with an Intel Core i5 at 3.0 GHz, 16 GB of DDR3 RAM, and an Nvidia GTX 1060 GPU. We
maintained the hyperparameters of our CNN model from our previous work. The image datasets for the tests were taken
from Middlebury online benchmarking system [19]. The results of the proposed algorithm are also compared
with other methods such as MC-CNN-acrt and MC-CNN-fst by [14], pyramid stereo matching network
(PSMNet_ROB) by [22], hybrid CNN+CRF models by [24] (labelled as JMR), SGBMP by [23], MC-CNN-WS
by [30], and line segment based efficient large scale stereo matching (LS_ELAS) by [31]. Table 1 shows the
results obtained for the All error metric (the percentage of invalid disparity values over all pixels). Table 2
reports the NonOcc error percentage (the percentage of invalid disparity values over non-occluded pixels).
A new function of stereo matching algorithm based on hybrid convolutional neural… (Mohd Saad Hamid)
Based on the All error results in Table 1, the proposed algorithm framework outperforms the
original MC-CNN-acrt and the other published methods, reducing the average error to 9.29%. It also
performs better than the recently published deep-learning-based methods SGBMP [23] and JMR [24]. The
margin over SGBMP is considerable on the Jadeplant, PianoL, Vintage and PlayTable images in particular.
Based on the quantitative comparison of NonOcc error percentage in Table 2, the proposed method
performed moderately. However, the proposed method still performs better than the end-to-end network-
based method PSMNet [22]: the proposed method generates output with 6.05% error, while PSMNet
produces 9.60%. Except for the Vintage and Piano images, our proposed method outperforms PSMNet on
every image.
Table 1. Results from Middlebury benchmark - All error
Image        Proposed  JMR    SGBMP  MC-CNN-acrt  MC-CNN-fst  PSMNet_ROB  MC-CNN-WS  LS_ELAS
Adirondack   5.37      2.17   6.50   4.24         5.32        8.83        5.73       9.31
ArtL         8.36      18.00  9.33   18.70        19.20       13.90       20.50      5.90
Jadeplant    37.60     24.70  56.80  34.10        32.60       68.40       36.30      64.50
Motorcycle   7.24      5.98   5.04   7.21         8.75        8.26        9.39       7.24
MotorcycleE  7.29      6.90   5.43   7.22         8.83        9.16        9.37       7.65
Piano        6.63      6.14   4.77   6.00         8.12        5.89        8.13       6.25
PianoL       7.88      7.27   14.80  9.35         17.20       10.50       16.10      9.69
Pipes        11.40     11.00  7.85   13.50        15.50       14.40       16.70      12.80
Playroom     9.78      17.50  7.62   18.30        18.60       9.38        18.70      10.10
Playtable    4.72      8.18   10.60  9.71         13.60       5.54        11.50      23.90
PlaytableP   4.48      7.44   3.78   9.37         9.75        5.52        10.10      4.27
Recycle      3.79      2.96   3.19   4.64         5.00        4.98        5.05       7.39
Shelves      8.44      7.81   5.00   6.62         8.91        11.60       9.83       8.48
Teddy        3.64      8.98   3.35   9.31         10.40       3.87        11.00      2.98
Vintage      9.96      10.30  30.00  21.60        15.80       9.66        20.80      14.00
Average      9.29      9.57   11.20  11.80        12.80       13.30       13.70      12.90
Table 2. Results from Middlebury benchmark - NonOcc error
Image        JMR   MC-CNN-acrt  MC-CNN-fst  MC-CNN-WS  Proposed  SGBMP  PSMNet_ROB  LS_ELAS
Adirondack   0.92  0.76         1.21        1.66       3.23      3.87   7.32        8.46
ArtL         2.18  2.49         2.84        4.27       5.79      4.96   9.69        3.83
Jadeplant    6.01  16.30        10.00       12.80      19.00     29.30  44.50       41.10
Motorcycle   1.26  1.27         1.62        2.26       4.59      3.45   5.55        5.12
MotorcycleE  1.27  1.27         1.61        2.18       4.50      3.89   6.12        5.80
Piano        2.21  1.83         3.17        3.21       5.88      3.82   5.01        5.54
PianoL       4.03  5.07         13.20       11.70      7.31      14.40  9.82        8.97
Pipes        2.12  2.29         3.20        4.27       7.05      3.94   9.86        7.44
Playroom     1.94  2.27         3.13        3.49       6.75      5.09   7.33        8.76
Playtable    2.20  3.11         5.78        3.78       3.58      9.74   4.40        22.40
PlaytableP   1.65  3.03         2.97        3.31       3.51      2.70   4.43        3.47
Recycle      1.30  2.48         1.95        1.83       2.97      2.91   3.73        6.93
Shelves      5.51  4.41         6.26        7.02       7.39      4.64   11.10       8.26
Teddy        1.15  1.07         1.12        2.00       2.51      1.80   3.44        2.29
Vintage      3.73  14.80        9.16        14.30      8.08      26.10  8.07        13.10
Average      2.30  3.81         3.87        4.63       6.05      7.25   9.60        9.66
The qualitative comparison of the methods is illustrated in Figure 3. The ground truth of the
PlayTable image is shown in Figure 3(a). The PlayTable image pair (the left and right images in
Figure 3(b) and Figure 3(c) respectively) contains a low texture region in the floor area. Based on
Figure 3(d), the proposed method produces a smooth disparity map with different depths in the region with low
texture and plain colour. Qualitatively, the map produced is more accurate than the disparity map generated by
other methods such as SGBMP, JMR, MC-CNN-WS, MC-CNN-acrt and LS_ELAS (illustrated in Figure 3(e),
Figure 3(f), Figure 3(g), Figure 3(h) and Figure 3(i) respectively). The proposed method thus achieves the
objective of this work, reducing the error in the low texture region, as stated in the earlier section. The All
error recorded for the PlayTable image is the best among the compared methods, as shown in Table 1.
(a) (b) (c)
(d) (e) (f)
(g) (h) (i)
Figure 3. Middlebury - PlayTable image comparison, (a) ground truth, (b) left, (c) right, (d) proposed
method, (e) SGBMP, (f) JMR, (g) MC-CNN-WS, (h) MC-CNN-acrt, (i) LS_ELAS
The effectiveness of combining the cost volumes 𝐶𝐷𝐼 and 𝐶𝐶𝑁𝑁 into CIM in step 1, as discussed in
section 2.1, is evident. The combined cost volume, CIM, helps reduce the error from 16.2% to 9.29% for the
All error and from 14.4% to 6.05% for the NonOcc error in the Middlebury online stereo benchmarking
system. For qualitative comparison, Figure 4(a) depicts the left image of the Adirondack pair. Figure 4(b)
illustrates the disparity map for the Adirondack image when CDI is used without CCNN, while Figure 4(c)
demonstrates the effect of combining both cost volumes (CIM). This combination thus contributes to the
better accuracy achieved by the proposed matching cost computation step.
(a) (b) (c)
Figure 4. Middlebury-Adirondack image (a) left image, (b) without proposed method, and (c) proposed method
4. CONCLUSION
In conclusion, we demonstrated a hybrid learning-based method combined with a handcrafted
method to perform matching cost computation. Since the initial cost volume may contain noise, a cost
aggregation step was implemented to reduce errors. The BF was used as an edge-preserving filter to
maintain sharp edges while smoothing the input, and WTA was employed to compute the initial disparity
map. The final disparity map was generated through post-processing steps that include the LRC check, hole
filling, GIF, and WMF. Based on the final disparity map, we conclude that the proposed algorithm enhances
the disparity map’s accuracy in low texture regions while maintaining object edges. Furthermore, the
proposed algorithm performs competitively against other published methods on the Middlebury stereo
benchmarking system.
ACKNOWLEDGEMENTS
This work is supported by the Centre for Research and Innovation Management (CRIM), Universiti
Teknikal Malaysia Melaka (UTeM).
REFERENCES
[1] T. Xue, A. Owens, D. Scharstein, M. Goesele, and R. Szeliski, “Multi-frame stereo matching with edges, planes, and
superpixels,” Image Vis. Comput., 2019, doi: 10.1016/j.imavis.2019.05.006.
[2] E. Winarno, I. H. Al Amin, and W. Hadikurniawati, “Asymmetrical half-join method on dual vision face recognition,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 6, pp. 3411-3420, 2017, doi:
10.11591/ijece.v7i6.pp3411-3420.
[3] H. Ham, J. Wesley, and Hendra, “Computer vision based 3D reconstruction : A review,” International Journal of Electrical and
Computer Engineering (IJECE), vol. 9, no. 4, pp. 2394-2402, 2019, doi: 10.11591/ijece.v9i4.pp2394-2402.
[4] C. Strecha, W. V. Hansen, L. V. Gool, P. Fua, and U. Thoennessen, “On benchmarking camera calibration and multi-view stereo
for high resolution imagery,” in 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2008, doi:
10.1109/CVPR.2008.4587706.
[5] G. Yang, J. Manela, M. Happold, and D. Ramanan, “Hierarchical Deep Stereo Matching on High-Resolution Images,” in 2019
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019,
pp. 5510-5519, doi: 10.1109/CVPR.2019.00566.
[6] D. Scharstein and R. Szeliski, “A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms,” Int. J.
Comput. Vis., vol. 47, no. 1, pp. 7-42, 2002, doi: 10.1023/A:1014573219977.
[7] R. A. Hamzah and H. Ibrahim, “Literature survey on stereo vision disparity map algorithms,” J. Sensors, vol. 2016, 2016, doi:
10.1155/2016/8742920.
[8] M. S. Hamid, N. A. Manap, R. A. Hamzah, and A. F. Kadmin, “Stereo matching algorithm based on deep learning: A survey,” J.
King Saud Univ. - Comput. Inf. Sci., 2020, doi: 10.1016/j.jksuci.2020.08.011.
[9] N. A. Manap, S. F. Hussin, A. M. Darsono, and M. M. Ibrahim, “Performance Analysis on Stereo Matching Algorithms Based on
Local and Global Methods for 3D Images Application,” J. Telecommun. Electron. Comput. Eng., vol. 10, no. 2, pp. 23-28, 2018.
[10] K. Zhang, Y. Fang, D. Min, L. Sun, S. Yang, and S. Yan, “Cross-Scale Cost Aggregation for Stereo Matching,” IEEE Trans.
Circuits Syst. Video Technol., vol. 27, no. 5, pp. 965-976, 2017, doi: 10.1109/TCSVT.2015.2513663.
[11] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, “Bilateral filtering: Theory and applications,” Found. Trends Comput. Graph.
Vis., vol. 4, no. 1, pp. 1-73, 2009, doi: 10.1561/0600000020.
[12] S. Zhu, Z. Wang, X. Zhang, and Y. Li, “Edge-preserving guided filtering based cost aggregation for stereo matching,” J. Vis.
Commun. Image Represent., vol. 39, pp. 107-119, 2016, doi: 10.1016/j.jvcir.2016.05.012.
[13] P. Brandao, E. Mazomenos, and D. Stoyanov, “Widening siamese architectures for stereo matching,” Pattern Recognit. Lett., vol.
120, pp. 75-81, Apr. 2019, doi: 10.1016/j.patrec.2018.12.002.
[14] J. Žbontar and Y. LeCun, “Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches,” J. Mach.
Learn. Res., vol. 17, pp. 1-32, 2016.
[15] S. Wen, “Convolutional neural network and adaptive guided image filter based stereo matching,” in 2017 IEEE International
Conference on Imaging Systems and Techniques (IST), 2017, no. 1, pp. 1-6, doi: 10.1109/IST.2017.8261530.
[16] H. Xu, “Stereo matching and depth map collection algorithm based on deep learning,” in IST 2017 - IEEE International
Conference on Imaging Systems and Techniques, Proceedings, vol. 2018-Janua, no. 1, 2018, pp. 1-6, doi:
10.1109/IST.2017.8261504.
[17] T. Q. Vinh, L. H. Duy, and N. T. Nhan, “Vietnamese handwritten character recognition using convolutional neural network,”
Indonesian Journal of Applied Informatics (IJAI), vol. 9, no. 2, pp. 276-283, 2020, doi: 10.11591/ijai.v9.i2.pp276-283.
[18] D. Ciresan, U. Meier, and J. Schmidhuber, “Multi-column Deep Neural Networks for Image Classification,” in 2012 IEEE
Conference on Computer Vision and Pattern Recognition, 2012, pp. 3642-3649, doi: 10.1109/CVPR.2012.6248110.
[19] D. Scharstein et al., “High-resolution stereo datasets with subpixel-accurate ground truth,” in German Conference on Pattern
Recognition, 2014, doi: 10.1007/978-3-319-11752-2_3.
[20] Z. Chen, X. Sun, L. Wang, Y. Yu, and C. Huang, “A Deep Visual Correspondence Embedding Model,” in 2015 IEEE
International Conference on Computer Vision (ICCV), 2015, pp. 1-9, doi: 10.1109/ICCV.2015.117.
[21] A. Kendall et al., “End-to-End Learning of Geometry and Context for Deep Stereo Regression,” in Proceedings of the IEEE
International Conference on Computer Vision, 2017, vol. 2017-Octob, pp. 66-75, doi: 10.1109/ICCV.2017.17.
[22] J.-R. Chang and Y.-S. Chen, “Pyramid Stereo Matching Network,” in 2018 IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2018, doi: 10.1109/CVPR.2017.730.
[23] Y. Hu, W. Zhen, and S. Scherer, “Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction,” in 2020
IEEE International Conference on Robotics and Automation (ICRA), 2020,
pp. 8637-8643, doi: 10.1109/ICRA40945.2020.9196655.
[24] P. Knöbelreiter, C. Reinbacher, A. Shekhovtsov, and T. Pock, “End-to-end training of hybrid CNN-CRF models for stereo,” Proc.
- 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol. 2017-Janua, 2017,
pp. 1456-1465, doi: 10.1109/CVPR.2017.159.
[25] X. Song, X. Zhao, H. Hu, and L. Fang, “EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching,” Lect.
Notes Comput. Sci., vol. 11365, pp. 20-35, 2019, doi: 10.1007/978-3-030-20873-8_2.
[26] S. Zhu, H. Xu, and L. Yan, “A Stereo Matching and Depth Map Acquisition Algorithm Based on Deep Learning and Improved
Winner Takes All-Dynamic Programming,” IEEE Access, vol. 7, pp. 74625-74639, 2019, doi: 10.1109/ACCESS.2019.2921395.
[27] J. Kang, L. Chen, F. Deng, and C. Heipke, “Context pyramidal network for stereo matching regularized by disparity gradients,”
ISPRS J. Photogramm. Remote Sens., vol. 157, no. March, pp. 201-215, 2019, doi: 10.1016/j.isprsjprs.2019.09.012.
[28] F. Li, Q. Li, T. Zhang, Y. Niu, and G. Shi, “Depth acquisition with the combination of structured light and deep learning stereo
matching,” Signal Process. Image Commun., vol. 75, no. April, pp. 111-117, 2019, doi: 10.1016/j.image.2019.04.001.
[29] M. S. Hamid, N. A. Manap, R. A. Hamzah, and A. F. Kadmin, “Converged classification network for matching cost
computation,” Int. J. Adv. Sci. Technol., vol. 29, no. 6, 2020.
[30] S. Tulyakov, A. Ivanov, and F. Fleuret, “Weakly Supervised Learning of Deep Metrics for Stereo Reconstruction,” in
Proceedings of the IEEE International Conference on Computer Vision, vol. 2017-Octob, 2017, pp. 1348-1357, doi:
10.1109/ICCV.2017.150.
[31] R. A. Jellal, M. Lange, B. Wassermann, A. Schilling, and A. Zell, “LS-ELAS: Line segment based efficient large scale stereo
matching,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017,
pp. 146-152, doi: 10.1109/ICRA.2017.7989019.
BIOGRAPHIES OF AUTHORS
Mohd Saad Hamid received his B. Eng degree majoring in Computer from
Multimedia University. He completed his M. Eng majoring in Computer and Communication
from Universiti Kebangsaan Malaysia in 2014. Currently, he is a lecturer at Universiti
Teknikal Malaysia Melaka. His research interests are computer vision, deep learning, image
processing and embedded systems. He can be contacted at email: mohdsaad@utem.edu.my.
Nurulfajar Abd Manap earned a Master of Electrical Engineering in image
processing (2002, Universiti Teknologi Malaysia) and Bachelor of Electrical Engineering
(2000, Universiti Teknologi Malaysia). He obtained his PhD (Image and Video Processing) at
the University of Strathclyde, Glasgow in 2012. At present, he teaches various electronics
engineering subjects and courses in his affiliation for undergraduate and postgraduate levels.
His subjects include Advanced Digital Signal Processing, Computer Vision and Pattern
Recognition, Data Structures, Multimedia Technology & Applications and Distributed & High
Performance Computing. His research interests are 3D image processing, stereo vision and
video processing. He is also an MIET Chartered Engineer and an Apple Distinguished
Educator (ADE). He can be contacted at email: nurulfajar@utem.edu.my.
Rostam Affendi Hamzah graduated from Universiti Teknologi Malaysia where
he received his B. Eng majoring in Electronic Engineering. Then he received his M. Sc.
majoring in Electronic System Design engineering from the Universiti Sains Malaysia in 2010.
In 2017, he received PhD majoring in Electronic Imaging from Universiti Sains Malaysia.
Currently, he is a senior lecturer in the Universiti Teknikal Malaysia Melaka teaching Digital
Electronics, Digital Image Processing and Embedded Systems. His research interests are
computer vision, pattern recognition and digital image processing. He can be contacted at
email: rostamaffendi@utem.edu.my.
Ahmad Fauzan Kadmin graduated with a Bachelor Degree in Electronics
Engineering from Universiti Sains Malaysia and Master Degree in Computer &
Communication Engineering from Universiti Kebangsaan Malaysia. He has over 14 years of
experience in the electronic and computer engineering field, with technical expertise in R&D
engineering, computer vision and medical electronics. He is also an MIET Chartered
Engineer. He can be contacted at email: fauzan@utem.edu.my.
Shamsul Fakhar Abd Gani graduated from Universiti Malaysia Perlis
(UniMAP) in Bachelor of Engineering (Computer Engineering) with honours in 2006 and later
received his Master's degree in Internet & Web Computing in 2015 from Royal Melbourne
Institute of Technology (RMIT) Australia. He started his career as an R&D electronic engineer
specializing in software design for meter cluster development in Siemens VDO Automotive.
Shamsul is now a lecturer in the electronic and computer engineering technology department
of FTKEE UTeM. He can be contacted at email: shamsulfakhar@utem.edu.my.
Adi Irwan Herman graduated in 2015 with a Bachelor Degree in Computer
Engineering Technology (Computer Systems) from the Universiti Teknikal Malaysia Melaka.
Currently, he has been working with Texas Instruments for more than five years, and his
research interests are in computer engineering-related fields of study. He can be
contacted at email: adiirwanherman@gmail.com.