In recent years, many applications have been implemented in embedded systems and mobile Internet of Things (IoT) devices that typically have constrained resources, smaller power budget, and exhibit "smartness" or intelligence. To implement computation-intensive and resource-hungry Convolutional Neural Network (CNN) in this class of devices, many research groups have developed specialized parallel accelerators using Graphical Processing Units (GPU), Field-Programmable Gate Arrays (FPGA), or Application-Specific Integrated Circuits (ASIC). An alternative computing paradigm called Stochastic Computing (SC) can implement CNN with low hardware footprint and power consumption. To enable building more efficient SC CNN, this work incorporates the CNN basic functions in SC that exploit correlation, share Random Number Generators (RNG), and is more robust to rounding error. Experimental results show our proposed solution provides significant savings in hardware footprint and increased accuracy for the SC CNN basic functions circuits compared to previous work.
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
ย
Cluster analysis of graph related problems is an important issue now-a-day. Different types of graph
clustering techniques are appeared in the field but most of them are vulnerable in terms of effectiveness
and fragmentation of output in case of real-world applications in diverse systems. In this paper, we will
provide a comparative behavioural analysis of RNSC (Restricted Neighbourhood Search Clustering) and
MCL (Markov Clustering) algorithms on Power-Law Distribution graphs. RNSC is a graph clustering
technique using stochastic local search. RNSC algorithm tries to achieve optimal cost clustering by
assigning some cost functions to the set of clusterings of a graph. This algorithm was implemented by A.
D. King only for undirected and unweighted random graphs. Another popular graph clustering
algorithm MCL is based on stochastic flow simulation model for weighted graphs. There are plentiful
applications of power-law or scale-free graphs in nature and society. Scale-free topology is stochastic i.e.
nodes are connected in a random manner. Complex network topologies like World Wide Web, the web of
human sexual contacts, or the chemical network of a cell etc., are basically following power-law
distribution to represent different real-life systems. This paper uses real large-scale power-law
distribution graphs to conduct the performance analysis of RNSC behaviour compared with Markov
clustering (MCL) algorithm. Extensive experimental results on several synthetic and real power-law
distribution datasets reveal the effectiveness of our approach to comparative performance measure of
these algorithms on the basis of cost of clustering, cluster size, modularity index of clustering results and
normalized mutual information (NMI).
Energy efficiency is one of the most critical issue in design of System on Chip. In Network On
Chip (NoC) based system, energy consumption is influenced dramatically by mapping of
Intellectual Property (IP) which affect the performance of the system. In this paper we test the
antecedently extant proposed algorithms and introduced a new energy proficient algorithm
stand for 3D NoC architecture. In addition a hybrid method has also been implemented using
bioinspired optimization (particle swarm optimization) technique. The proposed algorithm has
been implemented and evaluated on randomly generated benchmark and real life application
such as MMS, Telecom and VOPD. The algorithm has also been tested with the E3S benchmark
and has been compared with the existing algorithm (spiral and crinkle) and has shown better
reduction in the communication energy consumption and shows improvement in the
performance of the system. Comparing our work with spiral and crinkle, experimental result
shows that the average reduction in communication energy consumption is 19% with spiral and
17% with crinkle mapping algorithms, while reduction in communication cost is 24% and 21%
whereas reduction in latency is of 24% and 22% with spiral and crinkle. Optimizing our work
and the existing methods using bio-inspired technique and having the comparison among them
an average energy reduction is found to be of 18% and 24%.
STUDY OF TASK SCHEDULING STRATEGY BASED ON TRUSTWORTHINESS ijdpsjournal
ย
MapReduce is a distributed computing model for cloud computing to process massive data. It simplifies the
writing of distributed parallel programs. For the fault-tolerant technology in the MapReduce programming
model, tasks may be allocated to nodes with low reliability. It causes the task to be reexecuted, wasting time
and resources. This paper proposes a reliability task scheduling strategy with a failure recovery mechanism,
evaluates the trustworthiness of resource nodes in the cloud environment and builds a trustworthiness model.
By using the simulation platform CloudSim, the stability of the task scheduling algorithm and scheduling
model are verified in this paper.
An experimental evaluation of similarity-based and embedding-based link predi...IJDKP
ย
The task of inferring missing links or predicting future ones in a graph based on its current structure
is referred to as link prediction. Link prediction methods that are based on pairwise node similarity
are well-established approaches in the literature and show good prediction performance in many realworld graphs though they are heuristic. On the other hand, graph embedding approaches learn lowdimensional representation of nodes in graph and are capable of capturing inherent graph features,
and thus support the subsequent link prediction task in graph. This paper studies a selection of
methods from both categories on several benchmark (homogeneous) graphs with different properties
from various domains. Beyond the intra and inter category comparison of the performances of the
methods, our aim is also to uncover interesting connections between Graph Neural Network(GNN)-
based methods and heuristic ones as a means to alleviate the black-box well-known limitation.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
The objective of this paper is to present the hybrid approach for edge detection. Under this technique, edge
detection is performed in two phase. In first phase, Canny Algorithm is applied for image smoothing and in
second phase neural network is to detecting actual edges. Neural network is a wonderful tool for edge
detection. As it is a non-linear network with built-in thresholding capability. Neural Network can be trained
with back propagation technique using few training patterns but the most important and difficult part is to
identify the correct and proper training set.
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
ย
Cluster analysis of graph related problems is an important issue now-a-day. Different types of graph
clustering techniques are appeared in the field but most of them are vulnerable in terms of effectiveness
and fragmentation of output in case of real-world applications in diverse systems. In this paper, we will
provide a comparative behavioural analysis of RNSC (Restricted Neighbourhood Search Clustering) and
MCL (Markov Clustering) algorithms on Power-Law Distribution graphs. RNSC is a graph clustering
technique using stochastic local search. RNSC algorithm tries to achieve optimal cost clustering by
assigning some cost functions to the set of clusterings of a graph. This algorithm was implemented by A.
D. King only for undirected and unweighted random graphs. Another popular graph clustering
algorithm MCL is based on stochastic flow simulation model for weighted graphs. There are plentiful
applications of power-law or scale-free graphs in nature and society. Scale-free topology is stochastic i.e.
nodes are connected in a random manner. Complex network topologies like World Wide Web, the web of
human sexual contacts, or the chemical network of a cell etc., are basically following power-law
distribution to represent different real-life systems. This paper uses real large-scale power-law
distribution graphs to conduct the performance analysis of RNSC behaviour compared with Markov
clustering (MCL) algorithm. Extensive experimental results on several synthetic and real power-law
distribution datasets reveal the effectiveness of our approach to comparative performance measure of
these algorithms on the basis of cost of clustering, cluster size, modularity index of clustering results and
normalized mutual information (NMI).
Energy efficiency is one of the most critical issue in design of System on Chip. In Network On
Chip (NoC) based system, energy consumption is influenced dramatically by mapping of
Intellectual Property (IP) which affect the performance of the system. In this paper we test the
antecedently extant proposed algorithms and introduced a new energy proficient algorithm
stand for 3D NoC architecture. In addition a hybrid method has also been implemented using
bioinspired optimization (particle swarm optimization) technique. The proposed algorithm has
been implemented and evaluated on randomly generated benchmark and real life application
such as MMS, Telecom and VOPD. The algorithm has also been tested with the E3S benchmark
and has been compared with the existing algorithm (spiral and crinkle) and has shown better
reduction in the communication energy consumption and shows improvement in the
performance of the system. Comparing our work with spiral and crinkle, experimental result
shows that the average reduction in communication energy consumption is 19% with spiral and
17% with crinkle mapping algorithms, while reduction in communication cost is 24% and 21%
whereas reduction in latency is of 24% and 22% with spiral and crinkle. Optimizing our work
and the existing methods using bio-inspired technique and having the comparison among them
an average energy reduction is found to be of 18% and 24%.
STUDY OF TASK SCHEDULING STRATEGY BASED ON TRUSTWORTHINESS ijdpsjournal
ย
MapReduce is a distributed computing model for cloud computing to process massive data. It simplifies the
writing of distributed parallel programs. For the fault-tolerant technology in the MapReduce programming
model, tasks may be allocated to nodes with low reliability. It causes the task to be reexecuted, wasting time
and resources. This paper proposes a reliability task scheduling strategy with a failure recovery mechanism,
evaluates the trustworthiness of resource nodes in the cloud environment and builds a trustworthiness model.
By using the simulation platform CloudSim, the stability of the task scheduling algorithm and scheduling
model are verified in this paper.
An experimental evaluation of similarity-based and embedding-based link predi...IJDKP
ย
The task of inferring missing links or predicting future ones in a graph based on its current structure
is referred to as link prediction. Link prediction methods that are based on pairwise node similarity
are well-established approaches in the literature and show good prediction performance in many realworld graphs though they are heuristic. On the other hand, graph embedding approaches learn lowdimensional representation of nodes in graph and are capable of capturing inherent graph features,
and thus support the subsequent link prediction task in graph. This paper studies a selection of
methods from both categories on several benchmark (homogeneous) graphs with different properties
from various domains. Beyond the intra and inter category comparison of the performances of the
methods, our aim is also to uncover interesting connections between Graph Neural Network(GNN)-
based methods and heuristic ones as a means to alleviate the black-box well-known limitation.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
The objective of this paper is to present the hybrid approach for edge detection. Under this technique, edge
detection is performed in two phase. In first phase, Canny Algorithm is applied for image smoothing and in
second phase neural network is to detecting actual edges. Neural network is a wonderful tool for edge
detection. As it is a non-linear network with built-in thresholding capability. Neural Network can be trained
with back propagation technique using few training patterns but the most important and difficult part is to
identify the correct and proper training set.
Design and implementation a prototype system for fusion image by using SWT-PC...IJECEIAES
ย
The technology of fusion image is dominance strongly over domain research for recent years, the techniques of fusion have various applications in real time used and proposed such as purpose of military and remote sensing etc., the fusion image is very efficient in processing of digital image. Single image produced from two images or more information of relevant combining process results from multi sensor fusion image. FPGA is the best implementation types of most technology enabling wide spread.This device works with modern versions for different critical characteristics same huge number of elements logic in order to permit complex algorithm implemented. In this paper,filters are designed and implemented in FPGA utilized for disease specified detection from images CT/MRI scanned where the samples are taken for human's brain with various medical images and the processing of fusion employed by using technique Stationary Wavelet Transform and Principal Component Analysis (SWT-PCA). Accuracy image output increases when implemented this technique and that was done by sampling down eliminating where effects blurring and artifacts doesn't influenced. The algorithm of SWT-PCA parameters quality measurements like NCC, MSE, PSNR, coefficients and Eigen values.The advantages significant of this system that provide real time, time rapid to market and portability beside the change parametric continuing in the DWT transform. The designed and simulation of module proposed system has been done by using MATLAB simulink and blocks generator system, Xilinx synthesized with synthesis tool (XST) and implemented in XilinxSpartan 6-SP605 device.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...IJCNCJournal
ย
Energy efficiency is an essential issue to be reckoned in wireless sensor networks development. Since the low-powered sensor nodes deplete their energy in transmitting the collected information, several strategies have been proposed to investigate the communication power consumption, in order to reduce the amount of transmitted data without affecting the information reliability. Lossy compression is a promising solution recently adapted to overcome the challenging energy consumption, by exploiting the data correlation and discarding the redundant information. In this paper, we propose a hybrid compression approach based on two dimensions specified as horizontal (HC) and vertical compression (VC), typically implemented in cluster-based routing architecture. The proposed scheme considers two key performance metrics, energy expenditure, and data accuracy to decide the adequate compression approach based on HC-VC or VC-HC configuration according to each WSN application requirement. Simulation results exhibit the performance of both proposed approaches in terms of extending the clustering network lifetime.
RTL Implementation of image compression techniques in WSNIJECEIAES
ย
The Wireless sensor networks have limitations regarding data redundancy, power and require high bandwidth when used for multimedia data. Image compression methods overcome these problems. Non-negative Matrix Factorization (NMF) method is useful in approximating high dimensional data where the data has non-negative components. Another method of the NMF called (PNMF) Projective Nonnegative Matrix Factorization is used for learning spatially localized visual patterns. Simulation results show the comparison between SVD, NMF, PNMF compression schemes. Compressed images are transmitted from base station to cluster head node and received from ordinary nodes. The station takes on the image restoration. Image quality, compression ratio, signal to noise ratio and energy consumption are the essential metrics measured for compression performance. In this paper, the compression methods are designed using Matlab.The parameters like PSNR, the total node energy consumption are calculated. RTL schematic of NMF SVD, PNMF methods is generated by using Verilog HDL.
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...Artem Lutov
ย
Slides of the presentation given at BigData'19, special session on Information Granulation in Data Science and Scalable Computing.
The fully automatic (i.e., without any manual tuning) graph embedding (i.e., network representation learning, unsupervised feature extraction) performed in near-linear time is presented. The resulting embeddings are interpretable, preserve both low- and high-order structural proximity of the graph nodes, computed (i.e., learned) by orders of magnitude faster and perform competitively to the manually tuned best state-of-the-art embedding techniques evaluated on diverse tasks of graph analysis.
IMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLAeeiej_journal
ย
Multiplications and additions are most widely and more often used arithmetic computations performed in
all digital signal processing applications. Addition is the basic operation for many digital application. The
aim is to develop area efficient, high speed and low power devices. Accurate operation of a digital system
is mainly influenced by the performance of the adders. Multipliers are also very important component in
digital systems
RunPool: A Dynamic Pooling Layer for Convolution Neural NetworkPutra Wanda
ย
Deep learning (DL) has achieved a significant performance in computer vision problems, mainly in automatic feature extraction and representation. However, it is not easy to determine the best pooling method in a different case study. For instance, experts can implement the best types of pooling in image processing cases, which might not be optimal for various tasks. Thus, it is
required to keep in line with the philosophy of DL. In dynamic neural network architecture, it is not practically possible to find
a proper pooling technique for the layers. It is the primary reason why various pooling cannot be applied in the dynamic and multidimensional dataset. To deal with the limitations, it needs to construct an optimal pooling method as a better option than max pooling and average pooling. Therefore, we introduce a dynamic pooling layer called RunPool to train the convolutional
neuralnetwork(CNN)architecture.RunPoolpoolingisproposedtoregularizetheneuralnetworkthatreplacesthedeterministic
pooling functions. In the final section, we test the proposed pooling layer to address classification problems with online social network (OSN) dataset
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
Design and implementation a prototype system for fusion image by using SWT-PC...IJECEIAES
ย
The technology of fusion image is dominance strongly over domain research for recent years, the techniques of fusion have various applications in real time used and proposed such as purpose of military and remote sensing etc., the fusion image is very efficient in processing of digital image. Single image produced from two images or more information of relevant combining process results from multi sensor fusion image. FPGA is the best implementation types of most technology enabling wide spread.This device works with modern versions for different critical characteristics same huge number of elements logic in order to permit complex algorithm implemented. In this paper,filters are designed and implemented in FPGA utilized for disease specified detection from images CT/MRI scanned where the samples are taken for human's brain with various medical images and the processing of fusion employed by using technique Stationary Wavelet Transform and Principal Component Analysis (SWT-PCA). Accuracy image output increases when implemented this technique and that was done by sampling down eliminating where effects blurring and artifacts doesn't influenced. The algorithm of SWT-PCA parameters quality measurements like NCC, MSE, PSNR, coefficients and Eigen values.The advantages significant of this system that provide real time, time rapid to market and portability beside the change parametric continuing in the DWT transform. The designed and simulation of module proposed system has been done by using MATLAB simulink and blocks generator system, Xilinx synthesized with synthesis tool (XST) and implemented in XilinxSpartan 6-SP605 device.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...IJCNCJournal
ย
Energy efficiency is an essential issue to be reckoned in wireless sensor networks development. Since the low-powered sensor nodes deplete their energy in transmitting the collected information, several strategies have been proposed to investigate the communication power consumption, in order to reduce the amount of transmitted data without affecting the information reliability. Lossy compression is a promising solution recently adapted to overcome the challenging energy consumption, by exploiting the data correlation and discarding the redundant information. In this paper, we propose a hybrid compression approach based on two dimensions specified as horizontal (HC) and vertical compression (VC), typically implemented in cluster-based routing architecture. The proposed scheme considers two key performance metrics, energy expenditure, and data accuracy to decide the adequate compression approach based on HC-VC or VC-HC configuration according to each WSN application requirement. Simulation results exhibit the performance of both proposed approaches in terms of extending the clustering network lifetime.
RTL Implementation of image compression techniques in WSNIJECEIAES
ย
The Wireless sensor networks have limitations regarding data redundancy, power and require high bandwidth when used for multimedia data. Image compression methods overcome these problems. Non-negative Matrix Factorization (NMF) method is useful in approximating high dimensional data where the data has non-negative components. Another method of the NMF called (PNMF) Projective Nonnegative Matrix Factorization is used for learning spatially localized visual patterns. Simulation results show the comparison between SVD, NMF, PNMF compression schemes. Compressed images are transmitted from base station to cluster head node and received from ordinary nodes. The station takes on the image restoration. Image quality, compression ratio, signal to noise ratio and energy consumption are the essential metrics measured for compression performance. In this paper, the compression methods are designed using Matlab.The parameters like PSNR, the total node energy consumption are calculated. RTL schematic of NMF SVD, PNMF methods is generated by using Verilog HDL.
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...Artem Lutov
ย
Slides of the presentation given at BigData'19, special session on Information Granulation in Data Science and Scalable Computing.
The fully automatic (i.e., without any manual tuning) graph embedding (i.e., network representation learning, unsupervised feature extraction) performed in near-linear time is presented. The resulting embeddings are interpretable, preserve both low- and high-order structural proximity of the graph nodes, computed (i.e., learned) by orders of magnitude faster and perform competitively to the manually tuned best state-of-the-art embedding techniques evaluated on diverse tasks of graph analysis.
IMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLAeeiej_journal
ย
Multiplications and additions are most widely and more often used arithmetic computations performed in
all digital signal processing applications. Addition is the basic operation for many digital application. The
aim is to develop area efficient, high speed and low power devices. Accurate operation of a digital system
is mainly influenced by the performance of the adders. Multipliers are also very important component in
digital systems
RunPool: A Dynamic Pooling Layer for Convolution Neural NetworkPutra Wanda
ย
Deep learning (DL) has achieved a significant performance in computer vision problems, mainly in automatic feature extraction and representation. However, it is not easy to determine the best pooling method in a different case study. For instance, experts can implement the best types of pooling in image processing cases, which might not be optimal for various tasks. Thus, it is
required to keep in line with the philosophy of DL. In dynamic neural network architecture, it is not practically possible to find
a proper pooling technique for the layers. It is the primary reason why various pooling cannot be applied in the dynamic and multidimensional dataset. To deal with the limitations, it needs to construct an optimal pooling method as a better option than max pooling and average pooling. Therefore, we introduce a dynamic pooling layer called RunPool to train the convolutional
neuralnetwork(CNN)architecture.RunPoolpoolingisproposedtoregularizetheneuralnetworkthatreplacesthedeterministic
pooling functions. In the final section, we test the proposed pooling layer to address classification problems with online social network (OSN) dataset
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
Deep Convolutional Neural Networks (CNNs) have achieved impressive performance in
edge detection tasks, but their large number of parameters often leads to high memory and energy
costs for implementation on lightweight devices. In this paper, we propose a new architecture, called
Efficient Deep-learning Gradients Extraction Network (EDGE-Net), that integrates the advantages of Depthwise Separable Convolutions and deformable convolutional networks (DeformableConvNet) to address these inefficiencies. By carefully selecting proper components and utilizing
network pruning techniques, our proposed EDGE-Net achieves state-of-the-art accuracy in edge
detection while significantly reducing complexity. Experimental results on BSDS500 and NYUDv2
datasets demonstrate that EDGE-Net outperforms current lightweight edge detectors with only
500k parameters, without relying on pre-trained weights.
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...rinzindorjej
ย
In this paper, a fruit image data set is used to compare the efficiency and accuracy of two widely used
Convolutional Neural Network, namely the ResNet and the DenseNet, for the recognition of 50 different
kinds of fruits. In the experiment, the structure of ResNet-34 and DenseNet_BC-121 (with bottleneck layer)
are used. The mathematic principle, experiment detail and the experiment result will be explained through
comparison.
In this paper, a fruit image data set is used to compare the efficiency and accuracy of two widely used Convolutional Neural Network, namely the ResNet and the DenseNet, for the recognition of 50 different kinds of fruits. In the experiment, the structure of ResNet-34 and DenseNet_BC-121 (with bottleneck layer) are used. The mathematic principle, experiment detail and the experiment result will be explained through comparison.
CONTRAST OF RESNET AND DENSENET BASED ON THE RECOGNITION OF SIMPLE FRUIT DATA...rinzindorjej
ย
In this paper, a fruit image data set is used to compare the efficiency and accuracy of two widely used Convolutional Neural Network, namely the ResNet and the DenseNet, for the recognition of 50 different kinds of fruits. In the experiment, the structure of ResNet-34 and DenseNet_BC-121 (with bottleneck layer)
are used. The mathematic principle, experiment detail and the experiment result will be explained through comparison.
Multimode system condition monitoring using sparsity reconstruction for quali...IJECEIAES
ย
In this paper, we introduce an improved multivariate statistical monitoring method based on the stacked sparse autoencoder (SSAE). Our contribution focuses on the choice of the SSAE model based on neural networks to solve diagnostic problems of complex systems. In order to monitor the process performance, the squared prediction error (SPE) chart is linked with nonparametric adaptive confidence bounds which arise from the kernel density estimation to minimize erroneous alerts. Then, faults are localized using two methods: contribution plots and sensor validity index (SVI). The results are obtained from experiments and real data from a drinkable water processing plant, demonstrating how the applied technique is performed. The simulation results of the SSAE model show a better ability to detect and identify sensor failures.
An energy-efficient cluster head selection in wireless sensor network using g...TELKOMNIKA JOURNAL
ย
Clustering is considered as one of the most prominent solutions to preserve theenergy in the wireless sensor networks. However, for optimal clustering, anenergy efficient cluster head selection is quite important. Improper selectionofcluster heads(CHs) consumes high energy compared to other sensor nodesdue to the transmission of data packets between the cluster members and thesink node. Thereby, it reduces the network lifetime and performance of thenetwork. In order to overcome the issues, we propose a novelcluster headselection approach usinggrey wolf optimization algorithm(GWO) namelyGWO-CH which considers the residual energy, intra-cluster and sink distance.In addition to that, we formulated an objective function and weight parametersfor anefficient cluster head selection and cluster formation. The proposedalgorithm is tested in different wireless sensor network scenarios by varyingthe number of sensor nodes and cluster heads. The observed results conveythat the proposed algorithm outperforms in terms of achieving better networkperformance compare to other algorithms.
Residual balanced attention network for real-time traffic scene semantic segm...IJECEIAES
ย
Intelligent transportation systems (ITS) are among the most focused research in this century. Actually, autonomous driving provides very advanced tasks in terms of road safety monitoring which include identifying dangers on the road and protecting pedestrians. In the last few years, deep learning (DL) approaches and especially convolutional neural networks (CNNs) have been extensively used to solve ITS problems such as traffic scene semantic segmentation and traffic signs classification. Semantic segmentation is an important task that has been addressed in computer vision (CV). Indeed, traffic scene semantic segmentation using CNNs requires high precision with few computational resources to perceive and segment the scene in real-time. However, we often find related work focusing only on one aspect, the precision, or the number of computational parameters. In this regard, we propose RBANet, a robust and lightweight CNN which uses a new proposed balanced attention module, and a new proposed residual module. Afterward, we have simulated our proposed RBANet using three loss functions to get the best combination using only 0.74M parameters. The RBANet has been evaluated on CamVid, the most used dataset in semantic segmentation, and it has performed well in terms of parametersโ requirements and precision compared to related work.
Medium term load demand forecast of Kano zone using neural network algorithmsTELKOMNIKA JOURNAL
ย
Electricity load forecasting refers to projection of future load requirements of an area or region or country through appropriate use of historical load data. One of several challenges faced by the Nigerian power distribution sectors is the overloaded power distribution network which leads to poor voltage distribution and frequent power outages. Accurate load demand forecasting is a key in addressing this challenge. This paper presents a comparison of generalized regression neural network (GRNN), feed-forward neural network (FFNN) and radial basis function neural network for medium term load demand estimation. Experimental data from Kano electricity distribution company (KEDCO) were used in validating the models. The simulation results indicated that the neural network models yielded promising results having achieved a mean absolute percentage error (MAPE) of less than 10% in all the considered scenarios. The generalization capability of FFNN is slightly better than that of RBFNN and GRNN model. The models could serve as a valuable and promising tool for the forecasting of the load demand.
Estimation of Optimized Energy and Latency Constraint for Task Allocation in ...ijcsit
ย
In Network on Chip (NoC) rooted system, energy consumption is affected by task scheduling and allocation
schemes which affect the performance of the system. In this paper we test the pre-existing proposed
algorithms and introduced a new energy skilled algorithm for 3D NoC architecture. An efficient dynamic
and cluster approaches are proposed along with the optimization using bio-inspired algorithm. The
proposed algorithm has been implemented and evaluated on randomly generated benchmark and real life
application such as MMS, Telecom and VOPD. The algorithm has also been tested with the E3S benchmark
and has been compared with the existing mapping algorithm spiral and crinkle and has shown better
reduction in the communication energy consumption and shows improvement in the performance of the
system. On performing experimental analysis of proposed algorithm results shows that average reduction
in energy consumption is 49%, reduction in communication cost is 48% and average latency is 34%.
Cluster based approach is mapped onto NoC using Dynamic Diagonal Mapping (DDMap), Crinkle and
Spiral algorithms and found DDmap provides improved result. On analysis and comparison of mapping of
cluster using DDmap approach the average energy reduction is 14% and 9% with crinkle and spiral.
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...IJECEIAES
ย
Cloud Computing is the most powerful computing model of our time. While the major IT providers and consumers are competing to exploit the benefits of this computing model in order to thrive their profits, most of the cloud computing platforms are still built on operating systems that uses basic CPU (Core Processing Unit) scheduling algorithms that lacks the intelligence needed for such innovative computing model. Correspdondingly, this paper presents the benefits of applying Artificial Neural Networks algorithms in regards to enhancing CPU scheduling for Cloud Computing model. Furthermore, a set of characteristics and theoretical metrics are proposed for the sake of comparing the different Artificial Neural Networks algorithms and finding the most accurate algorithm for Cloud Computing CPU Scheduling.
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
ย
The ability to mine and extract useful information automatically, from large datasets, is a
common concern for organizations (having large datasets), over the last few decades. Over the
internet, data is vastly increasing gradually and consequently the capacity to collect and store
very large data is significantly increasing.
Existing clustering algorithms are not always efficient and accurate in solving clustering
problems for large datasets.
However, the development of accurate and fast data classification algorithms for very large
scale datasets is still a challenge. In this paper, various algorithms and techniques especially,
approach using non-smooth optimization formulation of the clustering problem, are proposed
for solving the minimum sum-of-squares clustering problems in very large datasets. This
research also develops accurate and real time L2-DC algorithm based with the incremental
approach to solve the minimum
Artificial Neural Network Based Graphical User Interface for Estimation of Fa...ijsrd.com
ย
This paper addresses the problem of estimation of fabrication time in Rig construction projects through application of Artificial Neural Network (ANNs) as this is the most crucial activity for successful project management planning. ANN is a non-linear, data driven, self adaptive approach as opposed to the traditional model based methods, also fast becoming popular in forecasting where relationship between input and output is not known but vast collection of data is available. Around 960 data regarding fabrication activity has been collected from ABG Shipyard Ltd., Dahej. 3 input parameters have been considered for estimation of output as fabrication time. 11 Feed Forward Back Propagation neural networks with different network architectures were made. Network N10 was able to predict the output with MSE 1.35337e-2. Coding was done for the Graphical User Interface (GUI) so that the GUI runs, simulates network N10, and displays the fabrication time for different combination of inputs.
Similar to Stochastic Computing Correlation Utilization in Convolutional Neural Network Basic Functions (20)
Amazon products reviews classification based on machine learning, deep learni...TELKOMNIKA JOURNAL
ย
In recent times, the trend of online shopping through e-commerce stores and websites has grown to a huge extent. Whenever a product is purchased on an e-commerce platform, people leave their reviews about the product. These reviews are very helpful for the store owners and the productโs manufacturers for the betterment of their work process as well as product quality. An automated system is proposed in this work that operates on two datasets D1 and D2 obtained from Amazon. After certain preprocessing steps, N-gram and word embedding-based features are extracted using term frequency-inverse document frequency (TF-IDF), bag of words (BoW) and global vectors (GloVe), and Word2vec, respectively. Four machine learning (ML) models support vector machines (SVM), logistic regression (RF), logistic regression (LR), multinomial Naรฏve Bayes (MNB), two deep learning (DL) models convolutional neural network (CNN), long-short term memory (LSTM), and standalone bidirectional encoder representations (BERT) are used to classify reviews as either positive or negative. The results obtained by the standard ML, DL models and BERT are evaluated using certain performance evaluation measures. BERT turns out to be the best-performing model in the case of D1 with an accuracy of 90% on features derived by word embedding models while the CNN provides the best accuracy of 97% upon word embedding features in the case of D2. The proposed model shows better overall performance on D2 as compared to D1.
Design, simulation, and analysis of microstrip patch antenna for wireless app...TELKOMNIKA JOURNAL
ย
In this study, a microstrip patch antenna that works at 3.6 GHz was built and tested to see how well it works. In this work, Rogers RT/Duroid 5880 has been used as the substrate material, with a dielectric permittivity of 2.2 and a thickness of 0.3451 mm; it serves as the base for the examined antenna. The computer simulation technology (CST) studio suite is utilized to show the recommended antenna design. The goal of this study was to get a more extensive transmission capacity, a lower voltage standing wave ratio (VSWR), and a lower return loss, but the main goal was to get a higher gain, directivity, and efficiency. After simulation, the return loss, gain, directivity, bandwidth, and efficiency of the supplied antenna are found to be -17.626 dB, 9.671 dBi, 9.924 dBi, 0.2 GHz, and 97.45%, respectively. Besides, the recreation uncovered that the transfer speed side-lobe level at phi was much better than those of the earlier works, at -28.8 dB, respectively. Thus, it makes a solid contender for remote innovation and more robust communication.
Design and simulation an optimal enhanced PI controller for congestion avoida...TELKOMNIKA JOURNAL
ย
In this paper, snake optimization algorithm (SOA) is used to find the optimal gains of an enhanced controller for controlling congestion problem in computer networks. M-file and Simulink platform is adopted to evaluate the response of the active queue management (AQM) system, a comparison with two classical controllers is done, all tuned gains of controllers are obtained using SOA method and the fitness function chose to monitor the system performance is the integral time absolute error (ITAE). Transient analysis and robust analysis is used to show the proposed controller performance, two robustness tests are applied to the AQM system, one is done by varying the size of queue value in different period and the other test is done by changing the number of transmission control protocol (TCP) sessions with a value of ยฑ 20% from its original value. The simulation results reflect a stable and robust behavior and best performance is appeared clearly to achieve the desired queue size without any noise or any transmission problems.
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...TELKOMNIKA JOURNAL
ย
Vehicular ad-hoc networks (VANETs) are wireless-equipped vehicles that form networks along the road. The security of this network has been a major challenge. The identity-based cryptosystem (IBC) previously used to secure the networks suffers from membership authentication security features. This paper focuses on improving the detection of intruders in VANETs with a modified identity-based cryptosystem (MIBC). The MIBC is developed using a non-singular elliptic curve with Lagrange interpolation. The public key of vehicles and roadside units on the network are derived from number plates and location identification numbers, respectively. Pseudo-identities are used to mask the real identity of users to preserve their privacy. The membership authentication mechanism ensures that only valid and authenticated members of the network are allowed to join the network. The performance of the MIBC is evaluated using intrusion detection ratio (IDR) and computation time (CT) and then validated with the existing IBC. The result obtained shows that the MIBC recorded an IDR of 99.3% against 94.3% obtained for the existing identity-based cryptosystem (EIBC) for 140 unregistered vehicles attempting to intrude on the network. The MIBC shows lower CT values of 1.17 ms against 1.70 ms for EIBC. The MIBC can be used to improve the security of VANETs.
Conceptual model of internet banking adoption with perceived risk and trust f...TELKOMNIKA JOURNAL
ย
Understanding the primary factors of internet banking (IB) acceptance is critical for both banks and users; nevertheless, our knowledge of the role of usersโ perceived risk and trust in IB adoption is limited. As a result, we develop a conceptual model by incorporating perceived risk and trust into the technology acceptance model (TAM) theory toward the IB. The proper research emphasized that the most essential component in explaining IB adoption behavior is behavioral intention to use IB adoption. TAM is helpful for figuring out how elements that affect IB adoption are connected to one another. According to previous literature on IB and the use of such technology in Iraq, one has to choose a theoretical foundation that may justify the acceptance of IB from the customerโs perspective. The conceptual model was therefore constructed using the TAM as a foundation. Furthermore, perceived risk and trust were added to the TAM dimensions as external factors. The key objective of this work was to extend the TAM to construct a conceptual model for IB adoption and to get sufficient theoretical support from the existing literature for the essential elements and their relationships in order to unearth new insights about factors responsible for IB adoption.
Efficient combined fuzzy logic and LMS algorithm for smart antennaTELKOMNIKA JOURNAL
ย
The smart antennas are broadly used in wireless communication. The least mean square (LMS) algorithm is a procedure that is concerned in controlling the smart antenna pattern to accommodate specified requirements such as steering the beam toward the desired signal, in addition to placing the deep nulls in the direction of unwanted signals. The conventional LMS (C-LMS) has some drawbacks like slow convergence speed besides high steady state fluctuation error. To overcome these shortcomings, the present paper adopts an adaptive fuzzy control step size least mean square (FC-LMS) algorithm to adjust its step size. Computer simulation outcomes illustrate that the given model has fast convergence rate as well as low mean square error steady state.
Design and implementation of a LoRa-based system for warning of forest fireTELKOMNIKA JOURNAL
ย
This paper presents the design and implementation of a forest fire monitoring and warning system based on long range (LoRa) technology, a novel ultra-low power consumption and long-range wireless communication technology for remote sensing applications. The proposed system includes a wireless sensor network that records environmental parameters such as temperature, humidity, wind speed, and carbon dioxide (CO2) concentration in the air, as well as taking infrared photos.The data collected at each sensor node will be transmitted to the gateway via LoRa wireless transmission. Data will be collected, processed, and uploaded to a cloud database at the gateway. An Android smartphone application that allows anyone to easily view the recorded data has been developed. When a fire is detected, the system will sound a siren and send a warning message to the responsible personnel, instructing them to take appropriate action. Experiments in Tram Chim Park, Vietnam, have been conducted to verify and evaluate the operation of the system.
Wavelet-based sensing technique in cognitive radio networkTELKOMNIKA JOURNAL
ย
Cognitive radio is a smart radio that can change its transmitter parameter based on interaction with the environment in which it operates. The demand for frequency spectrum is growing due to a big data issue as many Internet of Things (IoT) devices are in the network. Based on previous research, most frequency spectrum was used, but some spectrums were not used, called spectrum hole. Energy detection is one of the spectrum sensing methods that has been frequently used since it is easy to use and does not require license users to have any prior signal understanding. But this technique is incapable of detecting at low signal-to-noise ratio (SNR) levels. Therefore, the wavelet-based sensing is proposed to overcome this issue and detect spectrum holes. The main objective of this work is to evaluate the performance of wavelet-based sensing and compare it with the energy detection technique. The findings show that the percentage of detection in wavelet-based sensing is 83% higher than energy detection performance. This result indicates that the wavelet-based sensing has higher precision in detection and the interference towards primary user can be decreased.
A novel compact dual-band bandstop filter with enhanced rejection bandsTELKOMNIKA JOURNAL
ย
In this paper, we present the design of a new wide dual-band bandstop filter (DBBSF) using nonuniform transmission lines. The method used to design this filter is to replace conventional uniform transmission lines with nonuniform lines governed by a truncated Fourier series. Based on how impedances are profiled in the proposed DBBSF structure, the fractional bandwidths of the two 10 dB-down rejection bands are widened to 39.72% and 52.63%, respectively, and the physical size has been reduced compared to that of the filter with the uniform transmission lines. The results of the electromagnetic (EM) simulation support the obtained analytical response and show an improved frequency behavior.
Deep learning approach to DDoS attack with imbalanced data at the application...TELKOMNIKA JOURNAL
ย
A distributed denial of service (DDoS) attack is where one or more computers attack or target a server computer, by flooding internet traffic to the server. As a result, the server cannot be accessed by legitimate users. A result of this attack causes enormous losses for a company because it can reduce the level of user trust, and reduce the companyโs reputation to lose customers due to downtime. One of the services at the application layer that can be accessed by users is a web-based lightweight directory access protocol (LDAP) service that can provide safe and easy services to access directory applications. We used a deep learning approach to detect DDoS attacks on the CICDDoS 2019 dataset on a complex computer network at the application layer to get fast and accurate results for dealing with unbalanced data. Based on the results obtained, it is observed that DDoS attack detection using a deep learning approach on imbalanced data performs better when implemented using synthetic minority oversampling technique (SMOTE) method for binary classes. On the other hand, the proposed deep learning approach performs better for detecting DDoS attacks in multiclass when implemented using the adaptive synthetic (ADASYN) method.
The appearance of uncertainties and disturbances often effects the characteristics of either linear or nonlinear systems. Plus, the stabilization process may be deteriorated thus incurring a catastrophic effect to the system performance. As such, this manuscript addresses the concept of matching condition for the systems that are suffering from miss-match uncertainties and exogeneous disturbances. The perturbation towards the system at hand is assumed to be known and unbounded. To reach this outcome, uncertainties and their classifications are reviewed thoroughly. The structural matching condition is proposed and tabulated in the proposition 1. Two types of mathematical expressions are presented to distinguish the system with matched uncertainty and the system with miss-matched uncertainty. Lastly, two-dimensional numerical expressions are provided to practice the proposed proposition. The outcome shows that matching condition has the ability to change the system to a design-friendly model for asymptotic stabilization.
Implementation of FinFET technology based low power 4ร4 Wallace tree multipli...TELKOMNIKA JOURNAL
ย
Many systems, including digital signal processors, finite impulse response (FIR) filters, application-specific integrated circuits, and microprocessors, use multipliers. The demand for low power multipliers is gradually rising day by day in the current technological trend. In this study, we describe a 4ร4 Wallace multiplier based on a carry select adder (CSA) that uses less power and has a better power delay product than existing multipliers. HSPICE tool at 16 nm technology is used to simulate the results. In comparison to the traditional CSA-based multiplier, which has a power consumption of 1.7 ยตW and power delay product (PDP) of 57.3 fJ, the results demonstrate that the Wallace multiplier design employing CSA with first zero finding logic (FZF) logic has the lowest power consumption of 1.4 ยตW and PDP of 27.5 fJ.
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemTELKOMNIKA JOURNAL
ย
The flaw in 5G orthogonal frequency division multiplexing (OFDM) becomes apparent in high-speed situations. Because the doppler effect causes frequency shifts, the orthogonality of OFDM subcarriers is broken, lowering both their bit error rate (BER) and throughput output. As part of this research, we use a novel design that combines massive multiple input multiple output (MIMO) and weighted overlap and add (WOLA) to improve the performance of 5G systems. To determine which design is superior, throughput and BER are calculated for both the proposed design and OFDM. The results of the improved system show a massive improvement in performance ver the conventional system and significant improvements with massive MIMO, including the best throughput and BER. When compared to conventional systems, the improved system has a throughput that is around 22% higher and the best performance in terms of BER, but it still has around 25% less error than OFDM.
Reflector antenna design in different frequencies using frequency selective s...TELKOMNIKA JOURNAL
ย
In this study, it is aimed to obtain two different asymmetric radiation patterns obtained from antennas in the shape of the cross-section of a parabolic reflector (fan blade type antennas) and antennas with cosecant-square radiation characteristics at two different frequencies from a single antenna. For this purpose, firstly, a fan blade type antenna design will be made, and then the reflective surface of this antenna will be completed to the shape of the reflective surface of the antenna with the cosecant-square radiation characteristic with the frequency selective surface designed to provide the characteristics suitable for the purpose. The frequency selective surface designed and it provides the perfect transmission as possible at 4 GHz operating frequency, while it will act as a band-quenching filter for electromagnetic waves at 5 GHz operating frequency and will be a reflective surface. Thanks to this frequency selective surface to be used as a reflective surface in the antenna, a fan blade type radiation characteristic at 4 GHz operating frequency will be obtained, while a cosecant-square radiation characteristic at 5 GHz operating frequency will be obtained.
Reagentless iron detection in water based on unclad fiber optical sensorTELKOMNIKA JOURNAL
ย
A simple and low-cost fiber based optical sensor for iron detection is demonstrated in this paper. The sensor head consist of an unclad optical fiber with the unclad length of 1 cm and it has a straight structure. Results obtained shows a linear relationship between the output light intensity and iron concentration, illustrating the functionality of this iron optical sensor. Based on the experimental results, the sensitivity and linearity are achieved at 0.0328/ppm and 0.9824 respectively at the wavelength of 690 nm. With the same wavelength, other performance parameters are also studied. Resolution and limit of detection (LOD) are found to be 0.3049 ppm and 0.0755 ppm correspondingly. This iron sensor is advantageous in that it does not require any reagent for detection, enabling it to be simpler and cost-effective in the implementation of the iron sensing.
Impact of CuS counter electrode calcination temperature on quantum dot sensit...TELKOMNIKA JOURNAL
ย
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 ยฐC, 120 ยฐC, 150 ยฐC, and 180 ยฐC under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 ยฐC. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 ยฐC, which lowers resistance while increasing electron lifetime.
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 ยฐC, 120 ยฐC, 150 ยฐC, and 180 ยฐC under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 ยฐC. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 ยฐC, which lowers resistance while increasing electron lifetime.
A progressive learning for structural tolerance online sequential extreme lea...TELKOMNIKA JOURNAL
ย
This article discusses the progressive learning for structural tolerance online sequential extreme learning machine (PSTOS-ELM). PSTOS-ELM can save robust accuracy while updating the new data and the new class data on the online training situation. The robustness accuracy arises from using the householder block exact QR decomposition recursive least squares (HBQRD-RLS) of the PSTOS-ELM. This method is suitable for applications that have data streaming and often have new class data. Our experiment compares the PSTOS-ELM accuracy and accuracy robustness while data is updating with the batch-extreme learning machine (ELM) and structural tolerance online sequential extreme learning machine (STOS-ELM) that both must retrain the data in a new class data case. The experimental results show that PSTOS-ELM has accuracy and robustness comparable to ELM and STOS-ELM while also can update new class data immediately.
Electroencephalography-based brain-computer interface using neural networksTELKOMNIKA JOURNAL
ย
This study aimed to develop a brain-computer interface that can control an electric wheelchair using electroencephalography (EEG) signals. First, we used the Mind Wave Mobile 2 device to capture raw EEG signals from the surface of the scalp. The signals were transformed into the frequency domain using fast Fourier transform (FFT) and filtered to monitor changes in attention and relaxation. Next, we performed time and frequency domain analyses to identify features for five eye gestures: opened, closed, blink per second, double blink, and lookup. The base state was the opened-eyes gesture, and we compared the features of the remaining four action gestures to the base state to identify potential gestures. We then built a multilayer neural network to classify these features into five signals that control the wheelchairโs movement. Finally, we designed an experimental wheelchair system to test the effectiveness of the proposed approach. The results demonstrate that the EEG classification was highly accurate and computationally efficient. Moreover, the average performance of the brain-controlled wheelchair system was over 75% across different individuals, which suggests the feasibility of this approach.
Adaptive segmentation algorithm based on level set model in medical imagingTELKOMNIKA JOURNAL
ย
For image segmentation, level set models are frequently employed. It offer best solution to overcome the main limitations of deformable parametric models. However, the challenge when applying those models in medical images stills deal with removing blurs in image edges which directly affects the edge indicator function, leads to not adaptively segmenting images and causes a wrong analysis of pathologies wich prevents to conclude a correct diagnosis. To overcome such issues, an effective process is suggested by simultaneously modelling and solving systemsโ two-dimensional partial differential equations (PDE). The first PDE equation allows restoration using Eulerโs equation similar to an anisotropic smoothing based on a regularized Perona and Malik filter that eliminates noise while preserving edge information in accordance with detected contours in the second equation that segments the image based on the first equation solutions. This approach allows developing a new algorithm which overcome the studied model drawbacks. Results of the proposed method give clear segments that can be applied to any application. Experiments on many medical images in particular blurry images with high information losses, demonstrate that the developed approach produces superior segmentation results in terms of quantity and quality compared to other models already presented in previeous works.
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...TELKOMNIKA JOURNAL
ย
Drug addiction is a complex neurobiological disorder that necessitates comprehensive treatment of both the body and mind. It is categorized as a brain disorder due to its impact on the brain. Various methods such as electroencephalography (EEG), functional magnetic resonance imaging (FMRI), and magnetoencephalography (MEG) can capture brain activities and structures. EEG signals provide valuable insights into neurological disorders, including drug addiction. Accurate classification of drug addiction from EEG signals relies on appropriate features and channel selection. Choosing the right EEG channels is essential to reduce computational costs and mitigate the risk of overfitting associated with using all available channels. To address the challenge of optimal channel selection in addiction detection from EEG signals, this work employs the shuffled frog leaping algorithm (SFLA). SFLA facilitates the selection of appropriate channels, leading to improved accuracy. Wavelet features extracted from the selected input channel signals are then analyzed using various machine learning classifiers to detect addiction. Experimental results indicate that after selecting features from the appropriate channels, classification accuracy significantly increased across all classifiers. Particularly, the multi-layer perceptron (MLP) classifier combined with SFLA demonstrated a remarkable accuracy improvement of 15.78% while reducing time complexity.
Nuclear Power Economics and Structuring 2024Massimo Talia
ย
Title: Nuclear Power Economics and Structuring - 2024 Edition
Produced by: World Nuclear Association Published: April 2024
Report No. 2024/001
ยฉ 2024 World Nuclear Association.
Registered in England and Wales, company number 01215741
This report reflects the views
of industry experts but does not
necessarily represent those
of World Nuclear Associationโs
individual member organizations.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
TOP 10 B TECH COLLEGES IN JAIPUR 2024.pptxnikitacareer3
ย
Looking for the best engineering colleges in Jaipur for 2024?
Check out our list of the top 10 B.Tech colleges to help you make the right choice for your future career!
1) MNIT
2) MANIPAL UNIV
3) LNMIIT
4) NIMS UNIV
5) JECRC
6) VIVEKANANDA GLOBAL UNIV
7) BIT JAIPUR
8) APEX UNIV
9) AMITY UNIV.
10) JNU
TO KNOW MORE ABOUT COLLEGES, FEES AND PLACEMENT, WATCH THE FULL VIDEO GIVEN BELOW ON "TOP 10 B TECH COLLEGES IN JAIPUR"
https://www.youtube.com/watch?v=vSNje0MBh7g
VISIT CAREER MANTRA PORTAL TO KNOW MORE ABOUT COLLEGES/UNIVERSITITES in Jaipur:
https://careermantra.net/colleges/3378/Jaipur/b-tech
Get all the information you need to plan your next steps in your medical career with Career Mantra!
https://careermantra.net/
We have compiled the most important slides from each speaker's presentation. This yearโs compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Water billing management system project report.pdfKamal Acharya
ย
Our project entitled โWater Billing Management Systemโ aims is to generate Water bill with all the charges and penalty. Manual system that is employed is extremely laborious and quite inadequate. It only makes the process more difficult and hard.
The aim of our project is to develop a system that is meant to partially computerize the work performed in the Water Board like generating monthly Water bill, record of consuming unit of water, store record of the customer and previous unpaid record.
We used HTML/PHP as front end and MYSQL as back end for developing our project. HTML is primarily a visual design environment. We can create a android application by designing the form and that make up the user interface. Adding android application code to the form and the objects such as buttons and text boxes on them and adding any required support code in additional modular.
MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software. It is a stable ,reliable and the powerful solution with the advanced features and advantages which are as follows: Data Security.MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
ย
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Online aptitude test management system project report.pdfKamal Acharya
ย
The purpose of on-line aptitude test system is to take online test in an efficient manner and no time wasting for checking the paper. The main objective of on-line aptitude test system is to efficiently evaluate the candidate thoroughly through a fully automated system that not only saves lot of time but also gives fast results. For students they give papers according to their convenience and time and there is no need of using extra thing like paper, pen etc. This can be used in educational institutions as well as in corporate world. Can be used anywhere any time as it is a web based application (user Location doesnโt matter). No restriction that examiner has to be present when the candidate takes the test.
Every time when lecturers/professors need to conduct examinations they have to sit down think about the questions and then create a whole new set of questions for each and every exam. In some cases the professor may want to give an open book online exam that is the student can take the exam any time anywhere, but the student might have to answer the questions in a limited time period. The professor may want to change the sequence of questions for every student. The problem that a student has is whenever a date for the exam is declared the student has to take it and there is no way he can take it at some other time. This project will create an interface for the examiner to create and store questions in a repository. It will also create an interface for the student to take examinations at his convenience and the questions and/or exams may be timed. Thereby creating an application which can be used by examiners and examineeโs simultaneously.
Examination System is very useful for Teachers/Professors. As in the teaching profession, you are responsible for writing question papers. In the conventional method, you write the question paper on paper, keep question papers separate from answers and all this information you have to keep in a locker to avoid unauthorized access. Using the Examination System you can create a question paper and everything will be written to a single exam file in encrypted format. You can set the General and Administrator password to avoid unauthorized access to your question paper. Every time you start the examination, the program shuffles all the questions and selects them randomly from the database, which reduces the chances of memorizing the questions.
Stochastic Computing Correlation Utilization in Convolutional Neural Network Basic Functions
1. TELKOMNIKA, Vol.16, No.6, December 2018, pp.2835~2843
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v16i6.8955 ๏ฎ 2835
Received February 14, 2018; Revised September 27, 2018; Accepted October 24, 2018
Stochastic Computing Correlation Utilization in
Convolutional Neural Network Basic Functions
Hamdan Abdellatef*, Mohamed Khalil-Hani, Nasir Shaikh-Husin, Sayed Omid Ayat
VeCAD Research Laboratory, School of Electrical Engineering, Faculty of Engineering,
Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
*Corresponding author, e-mail: hamdan.abdellatef@gmail.com
Abstract
In recent years, many applications have been implemented in embedded systems and mobile
Internet of Things (IoT) devices that typically have constrained resources, smaller power budget, and
exhibit "smartness" or intelligence. To implement computation-intensive and resource-hungry
Convolutional Neural Network (CNN) in this class of devices, many research groups have developed
specialized parallel accelerators using Graphical Processing Units (GPU), Field-Programmable Gate
Arrays (FPGA), or Application-Specific Integrated Circuits (ASIC). An alternative computing paradigm
called Stochastic Computing (SC) can implement CNN with low hardware footprint and power
consumption. To enable building more efficient SC CNN, this work incorporates the CNN basic functions in
SC that exploit correlation, share Random Number Generators (RNG), and is more robust to rounding
error. Experimental results show our proposed solution provides significant savings in hardware footprint
and increased accuracy for the SC CNN basic functions circuits compared to previous work.
Keywords: convolutional neural network, stochastic computing, correlation
Copyright ยฉ 2018 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Deep learning has emerged as a new area of machine learning research, which
enables a system to automatically learn complex information and extract representations at
multiple levels of abstraction. Convolutional Neural Network (CNN) is recognized as one of the
most promising types of Artificial Neural Networks (ANN) and has become the dominant
approach for almost all recognition and detection tasks [1] such as face recognition [2],
handwritten digit recognition [3], target recognition [4], and image classification [5]. To achieve
acceptable classification results, CNN performs a massive number of convolutions and sub-
sampling operations with significant amounts of intermediate data results. Despite its high
classification accuracy, a deep CNN is highly demanding in terms of energy consumption and
computation cost [6]. To bring the success of CNNs to resource-constrained mobile and
embedded systems, designers must overcome the challenges of implementing resource-hungry
CNNs in embedded systems with limited area and power budget [7].
Stochastic Computing (SC), which represents and processes information in the form of
a probability of ones in a bit-stream, has the potential to implement CNNs with significantly
reduced hardware resources and achieve high power efficiency. In SC, arithmetic operations
like multiplication can be performed using simple AND or XNOR logic gate in Uni-Polar (UP) or
BiPolar (BP) representation, respectively, and scaled addition is done using Multiplexers (MUX).
Also, in SC, there are no positional weights among bits; therefore, SC circuits are better in
soft-error resilience and have a free dynamic trade-off between performance, accuracy, and
energy. Fortunately, neural networks have high error-tolerance at the algorithmic level, which
allows using SC for CNN implementation [8]. Opposite to what was previously believed, the
correlation among Stochastic Numbers (SN) can serve as a resource in designing stochastic
circuits. A comparative study [9] reported that the circuits exploiting correlation are generally
smaller, more accurate, and have lower latency than those with independent inputs. However,
previous SC CNN works [7, 8, 10, 11] did not explore correlation in their implementation.
Besides, these works did not consider the conversion circuits between SC and conventional
binary in their estimates as RNGs can take up to 80% of the circuit cost [12]. Also, long bit
2. ๏ฎ ISSN: 1693-6930
TELKOMNIKA Vol. 16, No. 6, December 2018: 2835-2843
2836
streams up to 8192 bits are used to obtain acceptable accuracy which significantly increases
the latency. This work has the following contributions:
1. SC CNN basic functions (inner product, pooling, and ReLU activation function) that exploit
correlation is proposed. The obtained functions had significant lower resource utilization,
higher accuracy, and more robust to rounding error compared to previous SC work. Also, it
shows significant area reduction compared to binary implementation [13].
2. A new method that generates uncorrelated bit streams for MUXs selectors to enhance the
accuracy of scaled addition using Toggle Flip-Flops (TFF) is presented. Also, this method
reduces the number of used RNGs.
The rest of this paper is organized as follows. Section 2 overviews CNN and explains its
basic functions. Section 3 reviews SC basics. Section 4 presents the CNN basic functions (inner
product, pooling, ReLU activation function) design method. Section 5 presents experimental
results to show the effectiveness of the proposed design with respect to compactness and
accuracy, and finally, Section 6 concludes the paper.
2. Convolutional Neural Network
Previously, hand-engineered features development had been the primary source of
difficulty in computer vision, like sophisticated feature extractors, to identify higher-level patterns
that are optimal for machine vision tasks, such as object recognition. However, convolutional
neural networks aim to solve this problem by learning higher-level representations automatically
from data [14]. As a supervised learning algorithm, CNN employs a feedforward process for
recognition and a backward path for training. In industrial practice, many application designers
train CNN off-line and use the off-line trained CNN to perform time-sensitive jobs. Thus, the
speed, area, and energy consumption of feedforward computation are to be considered. This
work is scoped to the feedforward computation hardware implementation on FPGA.
A typical CNN, as shown in Figure 1, is composed of multiple computational layers that
can be categorized into two components: a feature extractor and a classifier. The feature
extractor is used to filter input images into "feature maps" that represent features of the image
such as corners and edges which are relatively invariant to position shifts or distortions. The
feature extractor output is a low-dimensional vector containing features. Then, the vector is fed
into the CNN second component classifier, which is usually a traditional fully connected artificial
neural network. The classifier decides the likelihood of categories that the input image might
belong to [15].
Figure 1. CNN architecture proposed in [13]
The CNN layers can be one of three types: convolutional layer, pooling layer, or fully
connected layer. The convolutional layer is the core building block of the CNN whereas the main
operation is the convolution that computes the inner-product of receptive fields, a window of the
input feature map, and a set of learnable filters. This layer is the most time and resource
consuming operation in CNN, occupying more than 90% overall computation dominating
runtime and energy consumption as shown in [16]. The convolutional layer has N feature map
batch size, M output feature maps, Ch input feature maps, H/W size of output feature map,
K weights kernel size, and S stride. To compute one element in the output feature map of the
3. TELKOMNIKA ISSN: 1693-6930 ๏ฎ
Stochastic Computing Correlation Utilization in Convolutional Neural... (Hamdan Abdellatef)
2837
convolutional layer (1) is evaluated. This layer is of 4-D operation and to calculate all output
values (1) is looped for N, M, H, and W . The convolutional layer contains two functions, the
convolution, and activation. The convolution is loops of multiply-accumulate (MAC) operations
(or inner product). The activation functions conduct non-linear transformations such as Rectified
Linear Unit (ReLU), Sigmoid function, and hyperbolic tangent function. This work implements
only ReLU activation function since it is the most popular one. ReLU compares the input to zero
and outputs the maximum value as shown in (2).
๐[๐][๐][๐][๐] = ๐ด๐๐ก๐๐ฃ๐๐ก๐๐๐(๐ต[๐] + โ โ โ ๐ฅ[๐][๐โ][๐๐ + ๐][๐๐ + ๐] ร๐พโ1
๐=0
๐พโ1
๐=0
๐ถโโ1
๐โ=0
๐ค[๐][๐โ][๐][๐] (1)
๐๐ข๐ก = ๐๐๐ฅ(๐ฅ, 0) (2)
The pooling layers perform nonlinear down-sampling for data dimension reduction.
Commonly, max pooling and average pooling are used for this purpose. The max pooling layer
is shown in (3) which output the max value in a 2D window of K size and S stride. The average
pooling is to compute the average value of the same window as shown in (4). To complete the
layer operation, this equation is repeated for all N, M, H, W. The output feature map of the
pooling layer has a 1/๐ dimension reduction in height and width. Finally, the high-level
reasoning is completed via the classifier which is a fully connected layer. Neurons in this layer
are connected to all activation results in the previous layer which is an inner product with a filter
size of one element.
๐[๐][๐][๐][๐] = ๐๐๐ฅ(๐ฅ[๐][๐][๐๐ + ๐][๐๐ + ๐]) ๐๐๐ (0,0) โค (๐, ๐) < (๐พ โ 1, ๐พ โ 1) (3)
๐[๐][๐][๐][๐] =
1
๐พ2
โ โ ๐ฅ[๐][๐][๐๐ + ๐][๐๐ + ๐]๐พโ1
๐=0
๐พโ1
๐=0 (4)
The basic operations in CNN are the inner product, pooling, and activation function
operations. Any CNN neuron may consist of one or multiple basic operations. For instance,
neurons in convolutional layers implement inner product and activation operations only; those in
pooling layers implement pooling only, and those in fully connected layers implement inner
product and activation operations.
3. Stochastic Computing
Stochastic computing (SC) represents and processes information in the form of digitized
probabilities. In SC, numbers called stochastic numbers SNs are represented by binary bit
streams. The SN donate the probability p, the probability of 1s in the SN [17]. SN has no fixed
length nor structure. The stochastic representation can be One-line UP, One-line BP, and
Two-line. This paper uses BP representation since it allows negative values, but UP do not. The
UP and BP stochastic representations are clarified in Table 1 where N0, N1, and N represents
the number of zeros, ones, total bits in SN respectively. To convert from binary to stochastic a
stochastic number generator (SNG) is used which is a random number generator RNG and a
comparator. On the other hand, to convert from Stochastic to binary, a counter is used.
According to [18], SNs should be independent and uncorrelated bit-streams. However, recent
studies [9] showed that the correlation could serve as a resource in designing stochastic
circuits. In that study, Alaghi and Hayes introduced a parameter that determines the significance
of the correlation between two SNs called stochastic computing correlation (SCC) as
shown in (5). The major advantage of SC is that it employs very low-complexity arithmetic
units [17], as shown in Table 1. The AND or XNOR gates perform the multiplication operation in
SC using UP or BP representation. There is no direct addition in SC; instead, a scaled addition
is used. 2-to-1 MUX performs the scaled addition having ๐๐ = 0.5 or s = 0 selector bit stream
value in UP and BP representation respectively. It should be noted that the MUX selector should
be uncorrelated with the MUX inputs to prevent correlation-induced error. However, the MUX
inputs are correlation-insensitive and can have any value of correlation. To perform subtraction
NOT gate can be used to negate the subtracted value and then it will be added by 2-to-1 MUX
to the other value.
4. ๏ฎ ISSN: 1693-6930
TELKOMNIKA Vol. 16, No. 6, December 2018: 2835-2843
2838
Table 1. The SN Representations and Basic Operations
Value Interval Relation Negation Multiplication Scaled Addition
UP
๐ =
๐1
๐
[0,1]
p =
1 + ๐ฅ
2
๐ ๐๐๐
= 1 โ ๐
๐ ๐ด๐๐ท = ๐1 ๐2 ๐ ๐๐๐ = ๐๐ ๐1 + (1 โ ๐๐ )๐2
BP ๐ฅ
=
๐1 โ ๐0
๐
[-1,1] x = 2p โ 1 ๐ฅ ๐๐๐ = โ๐ฅ ๐ฅ ๐๐๐๐ = ๐ฅ1 ๐ฅ2
๐ฅ ๐๐๐ =
1
2
[(1 + ๐ )๐ฅ1 + (1
โ ๐ )๐ฅ2]
One source of inaccuracy in SC is rounding. If we wish to eliminate the rounding error,
the SN length L, and the binary precision, the number of bits, n should satisfy L = 2 ๐
. Suppose
the precision in binary computing is n = 8 bits, so the full SN length L = 256 bits. Each bit
requires one clock cycle to be processed causing the SC latency. As precision increases, L and
latency increase exponentially which is a significant drawback in SC. To reduce the number of
clock cycles needed, the full SN length is not used producing rounding error.
๐๐ถ๐ถ(๐, ๐) = {
๐ ๐โฉ๐โ๐ ๐ ๐ ๐
๐๐๐(๐ ๐,๐ ๐)โ๐ ๐ ๐ ๐
๐ ๐โฉ๐ โ ๐ ๐ ๐ ๐ > 0
0 ๐ ๐โฉ๐ โ ๐ ๐ ๐ ๐ = 0
๐ ๐โฉ๐โ๐ ๐ ๐ ๐
๐ ๐ ๐ ๐โ๐๐๐ฅ(๐ ๐+๐ ๐โ1,0)
๐ ๐โฉ๐ โ ๐ ๐ ๐ ๐ < 0
(5)
For SC addition operation, it is required to produce a 0.5 bit-stream that has ๐๐ถ๐ถ โ 0
with respect to the inputs. Theoretically, the independent RNGs generates uncorrelated
bitstreams, but the growing circuit size will require many independent RNGs affecting the area
cost. In this study, we propose using flip-flops (FF)s to obtain uncorrelated bit-streams for SC
scaled addition. T-FF is a JK-FF where T is connected to both inputs of the JK-FF. Based on
Gaudet and Rapley [19] the JK-FF output follow (6). In the case of T-FF ๐ ๐ = ๐๐ฝ = ๐ ๐พ, then
always ๐ ๐ = 0.5 for any value of T and always SCC(T, Q) โ 0. Using T-FF will allow using one
RNG for all SNGs to generate the MUX inputs and the uncorrelated selector as shown in
Figure 2a. Similarly, SC addition accuracy will be increased as shown in Figure 2b.
๐ ๐ =
๐ ๐ฝ
๐ ๐ฝ+๐ ๐
(6)
(a) SC scaled addition using T-FF for MUX
selector generation
(b) Accuracy comparison between T-FF and
independent RNG for selector generation
Figure 2. More accurate SC scaled addition by using T-FF
4. Design of Stochastic Computing Based Convolutional Neural Network Basic
Functions
4.1. Inner Product
The inner product is a MAC operation which is the basic function of convolution in CNN.
The number of elements in the inner product is determined from the "for loop" unroll factor. To
perform inner product in SC, the standard blocks are XNOR to perform multiplication and MUX
or Approximate Parallel Counter (APC) [20] to perform addition as proposed in previous SC
5. TELKOMNIKA ISSN: 1693-6930 ๏ฎ
Stochastic Computing Correlation Utilization in Convolutional Neural... (Hamdan Abdellatef)
2839
CNN works [7, 8, 10]. However, XNOR gate requires long uncorrelated bit-streams which will
affect latency and area cost by using one RNG per SNG. On the other hand, MUX tree
approach proposed in [21] to perform inner product in digital filter case study can be adapted to
this application. MUX tree allows sharing RNG among all inputs and is more accurate compared
to previous XNOR-MUX and XNOR-APC approaches as to be shown in Section 5.
One MUX can be used for inner product of 2-elements vectors x and h as shown in
Figure 3 and following (7), where the real operation does not involve multiplication or
๐ง =
1
โ|โ|
(โ โ ๐ฅ) =
1
|โ1|+|โ2|
(โ1 ๐ฅ1 + โ2 ๐ฅ2) (7)
addition. The selector bit-stream of probability
|โ1|
|โ1|+|โ2|
which follows (8) and
๐ ๐ ๐ =
โ(โ1 ๐)
โ(โ1 ๐โชโ0 ๐)
(8)
sign(โ๐) will be denoted by ๐ ๐. The same equation shows the MUX tree output for inner
product of two vectors x and h with any length, but scaling seems to be a problem. Taking
advantage of learning, the backward phase of the network is modified to adapt the scaling.
Figure 3. SC inner product circuit [21]
To create a mathematical model for SC inner product using the MUX tree and adapt it
to CNN, the MUX tree will be changed to the sum of products. For ๐๐๐ inputs, the MUX tree has
๐๐๐ โ 1 MUXs. We define the SM array which is a multi-dimensional array created from the
selector probabilities. This array will be of dimensions ๐๐๐. Each element is a binary ANDing of
selector bits related to the specific input until the output (9) shows the general SM array and an
example of OL-MUX tree if ๐๐๐ = 5 where ๐ ๐๐ is a bit of the selector bit-stream of the ๐ ๐กโ
MUX.
For more information about constructing optimum OL-MUX trees we refer to [21].
๐๐ =
[
๐ ๐ ๐๐๐โ1
ฬ ฬ ฬ ฬ ฬ ฬ ฬ ฬ ฬ โฉ โฏ โฉ ๐ ๐1
ฬ ฬ ฬ ฬ
๐ ๐ ๐๐๐โ1
ฬ ฬ ฬ ฬ ฬ ฬ ฬ ฬ ฬ โฉ โฏ โฉ ๐ ๐1
โฎ โฑ โฎ
๐ ๐ ๐ ๐๐โ1 โฉ โฏ ]
=
[
๐ ๐4
ฬ ฬ ฬ ฬ โฉ ๐ ๐2
ฬ ฬ ฬ ฬ โฉ ๐ ๐1
ฬ ฬ ฬ ฬ
๐ ๐4
ฬ ฬ ฬ ฬ โฉ ๐ ๐2
ฬ ฬ ฬ ฬ โฉ ๐ ๐1
๐ ๐4
ฬ ฬ ฬ ฬ โฉ ๐ ๐2
๐ ๐4 โฉ ๐ ๐3
ฬ ฬ ฬ ฬ
๐ ๐4 โฉ ๐ ๐3 ]
(9)
Thus, the general equation of the MUX tree will become like that of (10) for evaluating
one bit of the output bit-stream.
๐ = โ (๐ ๐ โ ๐ฅ๐) โฉ ๐๐๐
๐ ๐๐โ1
๐=0 (10)
Suppose we want to use the SC inner product MUX tree to compute all elements (e.g.
๐๐๐ = ๐ถโ ร ๐พ ร ๐พ). The inner product in (1) can be changed to (11) by taking advantage of SC
(7) and (10) and adding a new loop representing the SN bits. It should be noted that all the
6. ๏ฎ ISSN: 1693-6930
TELKOMNIKA Vol. 16, No. 6, December 2018: 2835-2843
2840
variables are bits. The new index [b] is to evaluate the bit operation through the SNs from 0 to
L โ 1. It is apparent the amount of resource utilization reduced.
๐[๐][๐][๐][๐][๐] = โ โ โ ((๐ ๐ค[๐][๐โ][๐][๐] โ ๐ฅ[๐][๐โ][๐๐ + ๐][๐๐ + ๐][๐]) โฉ๐พโ1
๐=0
๐พโ1
๐=0
๐ถโโ1
๐โ=0
๐๐[๐][๐โ][๐][๐][๐] (11)
Instead of using a multiplier and an adder, by SC we use only some simple gates at the
expense of latency. The resources used in a MUX tree inner product of ๐๐๐ number of parallel
inputs are ๐๐๐ XOR gates and ๐๐๐ โ 1 MUXs. In practical implementations, to highly reduce
latency, some loops should be unrolled entirely.
The inputs of CNN are of single size. To use the full SN length, L should be
L = 232
= 4294967296 which is too long and will produce high latency. One approach to reduce
L is to reduce n, binary precision that will cause the binary quantization. The other approach is
to reduce L without changing n which will cause the SC rounding error. The accuracy of MUX
tree inner product circuit is evaluated with respect to precision and number of inputs with
rounding error as shown in Table 2. It can be concluded that the MUX tree has high accuracy
and robust to rounding error.
Table 2. MUX Tree Absolute Error ร 10โ2
with Respect to SN Length ๐ฟ and Number of Inputs,
๐๐๐, using Binary Precision ๐ = 64
๐๐๐, ๐ฟ 64 128 256 512 1024 2048 4096 8192
2 3.3 2.2 1.49 1.02 0.73 0.51 0.36 0.25
4 4.48 3.07 2.13 1.48 1.05 0.74 0.53 0.37
8 5.03 3.6 2.47 1.74 1.22 0.86 0.62 0.42
16 5.45 3.81 2.66 1.85 1.32 0.94 0.66 0.47
4.2. Pooling and Activation Function
In SC, if the correlation is exploited, the OR gates act as the max function for SCC=1.
Therefore, (3) can be modified to become as stated in (12). In SC, instead of using a compactor,
the OR gate will perform the max operation leading to a significant reduction in hardware
footprint. Usually, the max pooling stride S is 2 and kernel K is 2, so max pooling can be
realized using only 3 OR gates using the proposed approach after unrolling i and j loops of (12)
as shown in Figure 4a. Similar to max pooling, the ReLU activation function performs the max
operation but compared to zero. Thus, the input is โORedโ with a correlated SN of x=0 in the BP
domain. Figure 4b shows the ReLU circuit.
๐[๐][๐][๐][๐][๐] = โ โ (๐[๐][๐][๐๐ + ๐][๐๐ + ๐][๐]๐พโ1
๐=0
๐พโ1
๐=0 (12)
(a) Proposed max pooling (b) Proposed SC ReLU circuit
Figure 4. Proposed SC CNN pooling and activation function circuits
The general scaled addition (using MUX) in SC for ๐๐๐ inputs follow (13) which is easily
realized using ๐๐๐ โ 1 2-to-1 MUXs tree with uncorrelated selector bit-streams of probability 0.5
for each of the MUXs. Thus, average Pooling is straightforward in SC. For example, if we want
to implement in SC average pooling with stride size 2 and kernel size 2, 3 scaled addition units
7. TELKOMNIKA ISSN: 1693-6930 ๏ฎ
Stochastic Computing Correlation Utilization in Convolutional Neural... (Hamdan Abdellatef)
2841
(2-to-1 MUXs) tree with selector probability p = 0.5 will be used. To increase the accuracy, TFF
will be utilized for selector bit-stream generation. The average pooling block can be used with
any SCC among inputs. Two versions of average pooling will be experimented, the average
pooling using independent RNGs for MUX selectors (SC AP RNG) and the average pooling
using T-FF to create the uncorrelated selector bit-stream (SC AP FF).
๐ง =
1
๐ ๐๐
โ ๐ฅ๐
๐ ๐๐โ1
๐=0 (13)
5. Experimental Results and Discussion
To clarify the effectiveness of the proposed SC CNN basic functions, they were
compared with previous SC work and the respective binary computation. The accuracy and the
resource utilization are the measured metrics. To evaluate the accuracy, the absolute error is
computed for 10000 attempts of randomly generated inputs where the conventional binary result
is the golden reference. From these attempts, we obtained the average output absolute errors.
Then different SN lengths L were used to observe the error behavior and the robustness to
rounding error as the input binary precision n = 32 bits. On the other hand, to evaluate the area
of the basic functions designs, we synthesize the circuits using Vivado Design Suite targeting
Xilinx ZYNQ Z706 FPGA.
Previous SC CNN used XNOR-MUX or XNOR-APC for the inner product operation in
convolutional layers of CNN [7, 8, 10]. This work proposed MUX tree for the inner product
shown in Figure 5, and the selector probability values follow (8) [21]. The number of inputs ๐๐๐
used in this experiment is 16 since it is more optimum for XNOR-APC SC inner product. The SN
length is varied through 64, 128, 256, 512, 1024, 2048, 4096, and 8192 bits since 8192 bits is
used in [10] and 1024 bits in [8]. Figure 6(a) shows the mean absolute error of MUX tree,
XNOR-MUX, and XNOR-APC approaches for inner product. To make a fair comparison, the
results of each SC circuit is multiplied by its scaling factor. For example, the MUX tree output is
multiplied by โ|โ|. The MUX tree obtained the least error. Therefore, the MUX tree SC inner
product is more accurate than previous works approaches with respect to different SN lengths.
Figure 5. MUX tree used for 4 ร 4 inner product
The resource utilizations of the three SC inner product approaches MUX tree, XNOR-
MUX, and XNOR-APC are compared along with conventional binary (bin) inner product (serial
design) as shown in Table 3. The MUX tree shows lower resource utilization of 1.6 ร and 2 ร
compared to XNOR-MUX and XNOR-APC. Also, significant savings compared to the binary
inner product. Besides, MUX tree has more area savings since the MUX tree circuit requires
one RNG for any number of inputs. However, XNOR-MUX and XNOR-APC SC inner product
need ๐๐๐ RNGs. Therefore, the MUX tree obtains ร ๐๐๐ RNG circuit savings. The RNG used is
linear feedback shift register LFSR.
8. ๏ฎ ISSN: 1693-6930
TELKOMNIKA Vol. 16, No. 6, December 2018: 2835-2843
2842
Table 3. Resource Utilization of Proposed Functions, Previous Work, and Conventional Binary
Inner Product DSP FF LUT
MUX tree 0 0 35
XNOR-MUX 0 0 55
XNOR-APC 0 0 71
Bin 8-bit fixed point 1 26 18
Bin 32-bit float 5 540 740
Max Pooling (MP) DSP FF LUT
Proposed SC MP 0 0 1
Approximate SC MP [10] 0 32 65
Bin MP 32-bit 0 0 134
Activation function DSP FF LUT
Proposed SC ReLU 0 0 1
Bin ReLU 32-bit 0 0 16
Average Pooling (AP) DSP FF LUT
Proposed SC AP FF 0 3 4
SC AP RNGs 0 0 2
Bin AP 32-bit 0 0 95
Using the proposed MUX tree for SC inner product operation of the CNN convolutional
layer is more efficient compared to previous SC inner product circuits or the conventional binary.
MUX tree is more accurate than other SC inner product and has less hardware footprint. Also,
compared to conventional binary, using the MUX tree SC inner product will provide significant
resource utilization savings. Without exploiting correlation, the max pooling operation in SC is
hard to be designed. Ren et al. [10] proposed an approximate SC max pooling circuit. However,
the proposed SC max pooling in this study outperforms the previous work in terms of accuracy
and resource utilization. Figure 6 (b) shows that the proposed max pooling is more accurate for
any SN length L. Also Figure 6 (c) shows the absolute error of proposed average pooling
operation using independent RNG for each MUX selector and T-FF for selector bitstream
generation. The average pooling using independent RNGs requires ๐๐๐2(๐๐๐) + 1 different
RNGs, while the average pooling using T-FF and MUXs require 1 RNG for any number of
inputs. This result a (๐๐๐2(๐๐๐) + 1) times savings in RNGs. The resource utilization of the
proposed SC average and max-pooling functions and their binary counterparts are shown in
Table 3 where all are of parallel architecture. The proposed SC ReLU circuit absolute error is
shown in Figure 6(d). A very minimal accuracy loss in the proposed SC ReLU is obtained with a
high resource utilization savings of 16 times.
(a) SC inner product circuits absolute error
(b) SC max pooling absolute error
(c) SC average pooling absolute error (d) SC ReLU absolute error
Figure 6. The absolute error of the proposed basic functions compared to previous works
9. TELKOMNIKA ISSN: 1693-6930 ๏ฎ
Stochastic Computing Correlation Utilization in Convolutional Neural... (Hamdan Abdellatef)
2843
6. Conclusion
In this paper, the SC CNN basic functions exploiting correlation was proposed with
reduced hardware footprint to be efficient in the resource-constrained mobile and embedded
systems. These functions are inner product, max pooling, average pooling, and ReLU activation
function. A combination of these basic functions when looped create a specific CNN layer.
Experimental results demonstrate that the proposed SC functions achieved significant hardware
footprint savings compared to equivalent binary functions. Also, the proposed functions
outperformed previous works of SC CNN in terms of accuracy and resource utilization. Our
future work will investigate the performance of a complete SC CNN which is composed of the
proposed basic functions.
References
[1] Y LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature. 2015; 521(7553)436โ444.
[2] B Cheung. Convolutional Neural Networks Applied to Human Face Classification. 2012 11th Int.
Conf. Mach. Learn. Appl. 2012: 580โ583.
[3] C Wu, W Fan, Y He, J Sun, S Naoi. Cascaded Heterogeneous Convolutional Neural Networks for
Handwritten Digit Recognition. Int. Conf. Pattern Recognit. 2012: 657โ660.
[4] J S Ashwin, N Manoharan. Convolutional Neural Network Based Target Recognition for Marine
Search. Indones. J. Electr. Eng. Comput. Sci. 2017; 8(2): 561โ563.
[5] R Wang, J Zhang, W Dong, J Yu, C J. Xie, R Li, T Chen, H Chen. A Crop Pests Image
Classification Algorithm Based on Deep Convolutional Neural Network. Telkomnika
Telecommunication Computing Electronics and Control. 2017; 15(3): 1239โ1246.
[6] M Alawad, M. Lin. Stochastic-Based Deep Convolutional Networks with Reconfigurable Logic
Fabric. IEEE Trans. Multi-Scale Comput. Syst. 2016; 99: 1.
[7] J Li, A Ren, Z Li, C Ding, B Yuan, Q Qiu, Y Wang. Towards acceleration of deep convolutional
neural networks using stochastic computing. Proc. Asia South Pacific Des. Autom. Conf.
ASP-DAC. 2017: 115โ120.
[8] H Sim, D Nguyen, J Lee, K. Choi. Scalable stochastic-computing accelerator for convolutional
neural networks. Proc. Asia South Pacific Des. Autom. Conf. ASP-DAC. 2017: 696โ701.
[9] A Alaghi, J P Hayes. Exploiting correlation in stochastic circuit design. 2013 IEEE 31st Int. Conf.
Comput. Des. ICCD. 2013: 39โ46.
[10] A Ren, J Li, Z Li, C Ding, X Qian, Q Qiu, B Yuan, Y Wang. SC-DCNN: Highly-Scalable Deep
Convolutional Neural Network using Stochastic Computing. in Proceedings of the Twenty-Second
International Conference on Architectural Support for Programming Languages and Operating
Systems. 2017: 405โ418.
[11] J Li, Z Yuan, Z Li, C Ding, A Ren, Q Qiu, J Draper, Y Wang. Hardware-Driven Nonlinear Activation
for Stochastic Computing Based Deep Convolutional Neural Networks. in International Joint
Conference on Neural Networks (IJCNN). 2017: 1230โ1236.
[12] W Qian, X Li, M D Riedel, K Bazargan, D J Lilja. An architecture for fault-tolerant computation with
stochastic logic. IEEE Trans. Comput. 2011; 60(1): 93โ105.
[13] A Krizhevsky, I Sutskever, G E Hinton. ImageNet Classification with Deep Convolutional Neural
Networks. Adv. Neural Inf. Process. Syst. 2012: 1โ9.
[14] S-M Khaligh-Razavi. What you need to know about the state-of-the-art computational models of
object-vision: A tour through the models. arXiv Prepr. arXiv1407.2776. 2014: 36.
[15] C Zhang, P Li, G Sun, Y Guan, B Xiao, . Cong. Optimizing FPGA-based Accelerator Design for
Deep Convolutional Neural Networks. Proc. 2015 ACM/SIGDA Int. Symp. Field-Programmable
Gate Arrays - FPGA. 2015; 15: 161โ170.
[16] C Szegedy, W Liu, Y Jia, P Sermanet, S Reed, D Anguelov, D Erhan, V Vanhoucke, A Rabinovich.
Going Deeper with Convolutions. in Proceedings of the IEEE conference on computer vision and
pattern recognition, 2015: 1โ9.
[17] A Alaghi and J P Hayes. Survey of Stochastic Computing. ACM Trans. Embed. Comput. Syst.
2013; 12(2s): 1โ19.
[18] B. Gaines. Stochastic computing systems. Adv. Inf. Syst. Sci. 1969: 37โ172.
[19] V C Gaudet, A C Rapley. Iterative decoding using stochastic computation. Electron. Lett. 2003;
39(3): 299โ301.
[20] K Kim, J Lee, K Choi. Approximate de-randomizer for stochastic circuits. ISOCC 2015 - Int. SoC
Des. Conf. SoC Internet Everything. 2016: 123โ124.
[21] H Ichihara, T Sugino, S Ishii, T Iwagaki, T Inoue. Compact and Accurate Digital Filters Based on
Stochastic Computing. Trans. Emerg. Top. Comput. 2016; 99: 1โ12