TELKOMNIKA, Vol.16, No.6, December 2018, pp.2835~2843
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v16i6.8955
Received February 14, 2018; Revised September 27, 2018; Accepted October 24, 2018
Stochastic Computing Correlation Utilization in
Convolutional Neural Network Basic Functions
Hamdan Abdellatef*, Mohamed Khalil-Hani, Nasir Shaikh-Husin, Sayed Omid Ayat
VeCAD Research Laboratory, School of Electrical Engineering, Faculty of Engineering,
Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
*Corresponding author, e-mail: hamdan.abdellatef@gmail.com
Abstract
In recent years, many applications have been implemented in embedded systems and mobile
Internet of Things (IoT) devices that typically have constrained resources and a smaller power
budget, yet are expected to exhibit "smartness" or intelligence. To implement the
computation-intensive and resource-hungry Convolutional Neural Network (CNN) in this class of
devices, many research groups have developed specialized parallel accelerators using Graphical
Processing Units (GPU), Field-Programmable Gate Arrays (FPGA), or Application-Specific
Integrated Circuits (ASIC). An alternative computing paradigm called Stochastic Computing (SC)
can implement CNN with a low hardware footprint and low power consumption. To enable building
more efficient SC CNNs, this work implements the CNN basic functions in SC so that they exploit
correlation, share Random Number Generators (RNG), and are more robust to rounding error.
Experimental results show that our proposed solution provides significant savings in hardware
footprint and increased accuracy for the SC CNN basic function circuits compared to previous work.
Keywords: convolutional neural network, stochastic computing, correlation
Copyright © 2018 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Deep learning has emerged as a new area of machine learning research, which
enables a system to automatically learn complex information and extract representations at
multiple levels of abstraction. Convolutional Neural Network (CNN) is recognized as one of the
most promising types of Artificial Neural Networks (ANN) and has become the dominant
approach for almost all recognition and detection tasks [1] such as face recognition [2],
handwritten digit recognition [3], target recognition [4], and image classification [5]. To achieve
acceptable classification results, CNN performs a massive number of convolutions and sub-
sampling operations with significant amounts of intermediate data results. Despite its high
classification accuracy, a deep CNN is highly demanding in terms of energy consumption and
computation cost [6]. To bring the success of CNNs to resource-constrained mobile and
embedded systems, designers must overcome the challenges of implementing resource-hungry
CNNs in embedded systems with limited area and power budget [7].
Stochastic Computing (SC), which represents and processes information in the form of
a probability of ones in a bit-stream, has the potential to implement CNNs with significantly
reduced hardware resources and achieve high power efficiency. In SC, arithmetic operations
like multiplication can be performed using a simple AND or XNOR logic gate in the Uni-Polar (UP) or
Bi-Polar (BP) representation, respectively, and scaled addition is done using Multiplexers (MUX).
Also, in SC there are no positional weights among bits; therefore, SC circuits offer better
soft-error resilience and a free dynamic trade-off between performance, accuracy, and
energy. Fortunately, neural networks have high error tolerance at the algorithmic level, which
allows using SC for CNN implementation [8]. Contrary to what was previously believed, the
correlation among Stochastic Numbers (SN) can serve as a resource in designing stochastic
circuits: a comparative study [9] reported that circuits exploiting correlation are generally
smaller, more accurate, and have lower latency than those with independent inputs. However,
previous SC CNN works [7, 8, 10, 11] did not explore correlation in their implementations.
Moreover, these works did not include the conversion circuits between SC and conventional
binary in their estimates, even though RNGs can take up to 80% of the circuit cost [12]. In addition, long bit
streams of up to 8192 bits are used to obtain acceptable accuracy, which significantly increases
latency. This work has the following contributions:
1. SC CNN basic functions (inner product, pooling, and the ReLU activation function) that exploit
correlation are proposed. The obtained functions have significantly lower resource utilization
and higher accuracy, and they are more robust to rounding error than previous SC work. They
also show a significant area reduction compared to a binary implementation [13].
2. A new method that generates uncorrelated bit-streams for MUX selectors using Toggle
Flip-Flops (TFF) is presented, enhancing the accuracy of scaled addition. This method also
reduces the number of RNGs used.
The rest of this paper is organized as follows. Section 2 overviews CNN and explains its
basic functions. Section 3 reviews SC basics. Section 4 presents the design method for the CNN
basic functions (inner product, pooling, and the ReLU activation function). Section 5 presents
experimental results to show the effectiveness of the proposed design with respect to
compactness and accuracy, and finally, Section 6 concludes the paper.
2. Convolutional Neural Network
Previously, the development of hand-engineered features, such as sophisticated feature
extractors that identify higher-level patterns, had been the primary source of difficulty in
computer vision tasks such as object recognition. Convolutional neural networks aim to solve
this problem by learning higher-level representations automatically
from data [14]. As a supervised learning algorithm, CNN employs a feedforward process for
recognition and a backward path for training. In industrial practice, many application designers
train CNN off-line and use the off-line trained CNN to perform time-sensitive jobs. Thus, the
speed, area, and energy consumption of feedforward computation are to be considered. This
work is scoped to the feedforward computation hardware implementation on FPGA.
A typical CNN, as shown in Figure 1, is composed of multiple computational layers that
can be categorized into two components: a feature extractor and a classifier. The feature
extractor is used to filter input images into "feature maps" that represent features of the image
such as corners and edges which are relatively invariant to position shifts or distortions. The
feature extractor output is a low-dimensional vector containing features. Then, the vector is fed
into the second component of the CNN, the classifier, which is usually a traditional fully
connected artificial neural network. The classifier decides the likelihood of the categories
that the input image might belong to [15].
Figure 1. CNN architecture proposed in [13]
The CNN layers can be one of three types: convolutional layer, pooling layer, or fully
connected layer. The convolutional layer is the core building block of the CNN, where the main
operation is the convolution, which computes the inner product of receptive fields (windows of
the input feature map) and a set of learnable filters. This layer is the most time- and
resource-consuming operation in CNN, occupying more than 90% of the overall computation and
dominating runtime and energy consumption [16]. The convolutional layer is parameterized by the
feature-map batch size N, the number of output feature maps M, the number of input feature maps
Ch, the output feature map height and width H and W, the weight kernel size K, and the
stride S. To compute one element in the output feature map of the
convolutional layer, (1) is evaluated. This layer is a 4-D operation, and to calculate all output
values, (1) is looped over N, M, H, and W. The convolutional layer contains two functions:
convolution and activation. The convolution consists of loops of multiply-accumulate (MAC)
operations (inner products). The activation functions conduct non-linear transformations such as
the Rectified Linear Unit (ReLU), the sigmoid function, and the hyperbolic tangent function.
This work implements only the ReLU activation function since it is the most popular one. ReLU
compares the input to zero and outputs the maximum value, as shown in (2).
$O[n][m][r][c] = \mathrm{Activation}\Big(B[m] + \sum_{ch=0}^{Ch-1} \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} x[n][ch][Sr+i][Sc+j] \times w[m][ch][i][j]\Big)$  (1)
๐‘œ๐‘ข๐‘ก = ๐‘š๐‘Ž๐‘ฅ(๐‘ฅ, 0) (2)
The pooling layers perform nonlinear down-sampling for data dimension reduction.
Commonly, max pooling and average pooling are used for this purpose. The max pooling layer
is shown in (3), which outputs the maximum value in a 2-D window of size K and stride S. The
average pooling computes the average value of the same window, as shown in (4). To complete the
layer operation, the equation is repeated for all N, M, H, and W. The output feature map of the
pooling layer has a 1/S dimension reduction in height and width. Finally, the high-level
reasoning is completed via the classifier, which is a fully connected layer. Neurons in this
layer are connected to all activation results in the previous layer, which is an inner product
with a filter size of one element.
$O[n][m][r][c] = \max\big(x[n][m][Sr+i][Sc+j]\big) \ \ \text{for } (0,0) \le (i,j) \le (K-1, K-1)$  (3)
$O[n][m][r][c] = \frac{1}{K^2} \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} x[n][m][Sr+i][Sc+j]$  (4)
The basic operations in CNN are the inner product, pooling, and activation function
operations. Any CNN neuron may consist of one or multiple basic operations. For instance,
neurons in convolutional layers implement inner product and activation operations only; those in
pooling layers implement pooling only, and those in fully connected layers implement inner
product and activation operations.
3. Stochastic Computing
Stochastic computing (SC) represents and processes information in the form of digitized
probabilities. In SC, numbers called stochastic numbers (SNs) are represented by binary bit-streams.
An SN denotes a probability p, the probability of 1s in the bit-stream [17]. An SN has no fixed
length or structure. The stochastic representation can be one-line UP, one-line BP, or
two-line. This paper uses the BP representation since it allows negative values, whereas UP does
not. The UP and BP stochastic representations are clarified in Table 1, where N0, N1, and N
represent the number of zeros, the number of ones, and the total number of bits in the SN,
respectively. To convert from binary to stochastic, a stochastic number generator (SNG) is used,
which consists of a random number generator (RNG) and a comparator. Conversely, to convert from
stochastic to binary, a counter is used.
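A behavioural sketch of these conversion circuits may help; it assumes a software random source in place of a hardware RNG, and the function names are illustrative.

```python
import random

def sng(value, L, rng=random):
    """Stochastic number generator: emit L bits with P(1) = value (UP, value in [0, 1]).
    Hardware analogue: an RNG plus a comparator."""
    return [1 if rng.random() < value else 0 for _ in range(L)]

def sn_to_binary(bits):
    """Stochastic-to-binary conversion: a counter estimates p = N1 / N."""
    return sum(bits) / len(bits)

stream = sng(0.75, 1024)
print(sn_to_binary(stream))   # approximately 0.75
```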
According to [18], SNs should be independent and uncorrelated bit-streams. However, recent
studies [9] showed that correlation can serve as a resource in designing stochastic
circuits. In that study, Alaghi and Hayes introduced a parameter, called stochastic computing
correlation (SCC), that quantifies the correlation between two SNs, as
shown in (5). The major advantage of SC is that it employs very low-complexity arithmetic
units [17], as shown in Table 1. AND or XNOR gates perform the multiplication operation in
SC in the UP or BP representation, respectively. There is no direct addition in SC; instead, a
scaled addition is used. A 2-to-1 MUX performs the scaled addition with a selector bit-stream of
value $p_s = 0.5$ or $s = 0$ in the UP and BP representations, respectively. It should be noted
that the MUX selector must be uncorrelated with the MUX inputs to prevent correlation-induced
error; the MUX inputs themselves are correlation-insensitive and can have any value of
correlation. To perform subtraction, a NOT gate can be used to negate the subtrahend, which is
then added to the other value by a 2-to-1 MUX.
Table 1. The SN Representations and Basic Operations

     | Value               | Interval | Relation        | Negation          | Multiplication         | Scaled Addition
 UP  | $p = N_1/N$         | [0, 1]   | $p = (1+x)/2$   | $p_{NOT} = 1 - p$ | $p_{AND} = p_1 p_2$    | $p_{MUX} = p_s p_1 + (1-p_s) p_2$
 BP  | $x = (N_1 - N_0)/N$ | [-1, 1]  | $x = 2p - 1$    | $x_{NOT} = -x$    | $x_{XNOR} = x_1 x_2$   | $x_{MUX} = \frac{1}{2}[(1+s)x_1 + (1-s)x_2]$
One source of inaccuracy in SC is rounding. To eliminate the rounding error, the SN length L
and the binary precision n (the number of bits) should satisfy $L = 2^n$. Suppose the precision
in binary computing is n = 8 bits; then the full SN length is L = 256 bits. Each bit requires
one clock cycle to be processed, which causes the SC latency. As precision increases, L and the
latency increase exponentially, which is a significant drawback of SC. To reduce the number of
clock cycles needed, the full SN length is not used, producing a rounding error.
๐‘†๐ถ๐ถ(๐‘‹, ๐‘Œ) = {
๐‘ ๐‘‹โˆฉ๐‘Œโˆ’๐‘ ๐‘‹ ๐‘ ๐‘Œ
๐‘š๐‘–๐‘›(๐‘ ๐‘‹,๐‘ ๐‘Œ)โˆ’๐‘ ๐‘‹ ๐‘ ๐‘Œ
๐‘ ๐‘‹โˆฉ๐‘Œ โˆ’ ๐‘ ๐‘‹ ๐‘ ๐‘Œ > 0
0 ๐‘ ๐‘‹โˆฉ๐‘Œ โˆ’ ๐‘ ๐‘‹ ๐‘ ๐‘Œ = 0
๐‘ ๐‘‹โˆฉ๐‘Œโˆ’๐‘ ๐‘‹ ๐‘ ๐‘Œ
๐‘ ๐‘‹ ๐‘ ๐‘Œโˆ’๐‘š๐‘Ž๐‘ฅ(๐‘ ๐‘‹+๐‘ ๐‘Œโˆ’1,0)
๐‘ ๐‘‹โˆฉ๐‘Œ โˆ’ ๐‘ ๐‘‹ ๐‘ ๐‘Œ < 0
(5)
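Equation (5) translates directly into a short function. The sketch below estimates $p_X$, $p_Y$, and $p_{X \cap Y}$ from two equal-length bit-streams; the names are illustrative, and degenerate streams (p equal to 0 or 1) are not guarded.

```python
def scc(x_bits, y_bits):
    """Stochastic computing correlation per (5), estimated from bit-streams."""
    n = len(x_bits)
    px = sum(x_bits) / n
    py = sum(y_bits) / n
    pxy = sum(a & b for a, b in zip(x_bits, y_bits)) / n   # p_{X AND Y}
    delta = pxy - px * py
    if delta > 0:
        return delta / (min(px, py) - px * py)
    if delta < 0:
        return delta / (px * py - max(px + py - 1, 0.0))
    return 0.0
```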
For the SC addition operation, it is required to produce a 0.5 bit-stream that has
$SCC \approx 0$ with respect to the inputs. Theoretically, independent RNGs generate uncorrelated
bit-streams, but a growing circuit size requires many independent RNGs, affecting the area
cost. In this study, we propose using flip-flops (FFs) to obtain uncorrelated bit-streams for SC
scaled addition. A T-FF is a JK-FF whose T input is connected to both inputs of the JK-FF. Based
on Gaudet and Rapley [19], the JK-FF output follows (6). In the case of a T-FF,
$p_T = p_J = p_K$, so $p_Q = 0.5$ for any value of T and $SCC(T, Q) \approx 0$. Using a T-FF
allows one RNG to be shared by all the SNGs that generate the MUX inputs while still producing an
uncorrelated selector, as shown in Figure 2a. The SC addition accuracy is likewise increased, as
shown in Figure 2b.
๐‘ ๐‘„ =
๐‘ ๐ฝ
๐‘ ๐ฝ+๐‘ ๐‘„
(6)
Figure 2. More accurate SC scaled addition by using a T-FF: (a) SC scaled addition using a T-FF
for MUX selector generation; (b) accuracy comparison between T-FF and independent RNG for
selector generation
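A behavioural model of the T-FF selector generator is sketched below, under the assumption that toggling the output on every 1 of the input stream adequately models the JK-FF behaviour; it reuses the sng, sn_to_binary, and scc sketches above.

```python
def tff_selector(t_bits, q0=0):
    """T-FF model: the output toggles whenever the input bit T is 1,
    so p_Q settles at 0.5 and SCC(T, Q) ~= 0 for any p_T."""
    q, out = q0, []
    for t in t_bits:
        q ^= t            # toggle on T = 1
        out.append(q)
    return out

# Reusing the sng(), sn_to_binary(), and scc() sketches above:
t = sng(0.9, 4096)
q = tff_selector(t)
print(sn_to_binary(q), scc(t, q))   # roughly 0.5 and roughly 0
```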
4. Design of Stochastic Computing Based Convolutional Neural Network Basic Functions
4.1. Inner Product
The inner product is a MAC operation, which is the basic function of convolution in CNN.
The number of elements in the inner product is determined by the "for loop" unroll factor. To
perform the inner product in SC, the standard blocks are XNOR gates to perform multiplication and
a MUX or an Approximate Parallel Counter (APC) [20] to perform addition, as proposed in previous
SC CNN works [7, 8, 10]. However, the XNOR gate requires long uncorrelated bit-streams, which
affects latency, and it affects area cost by requiring one RNG per SNG. On the other hand, the
MUX tree approach proposed in [21] to perform the inner product in a digital filter case study
can be adapted to this application. The MUX tree allows sharing one RNG among all inputs and is
more accurate than the previous XNOR-MUX and XNOR-APC approaches, as shown in Section 5.
One MUX can be used for the inner product of 2-element vectors x and h, as shown in
Figure 3 and following (7), where the actual operation involves neither multiplication nor
addition:

$z = \frac{1}{\sum |h|} (h \cdot x) = \frac{1}{|h_1| + |h_2|} (h_1 x_1 + h_2 x_2)$  (7)

The selector bit-stream has probability $\frac{|h_1|}{|h_1| + |h_2|}$, which follows (8), and
$\mathrm{sign}(h_i)$ is denoted by $s_i$:

$sl^M = \frac{\sum(h_1^M)}{\sum(h_1^M \cup h_0^M)}$  (8)

The same equation gives the MUX tree output for the inner product of two vectors x and h of any
length, but scaling appears to be a problem. Taking advantage of learning, the backward phase of
the network is modified to adapt to the scaling.
Figure 3. SC inner product circuit [21]
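A bit-level sketch of the circuit in Figure 3, assuming BP streams held as Python bit lists: the selector chooses between the two sign-adjusted inputs in proportion to |h1| and |h2|, so a counter at the output recovers the scaled inner product of (7). All names are illustrative.

```python
def sc_inner_product_2(x1, x2, s1, s2, sel):
    """One-MUX SC inner product per (7): z = (h1*x1 + h2*x2) / (|h1| + |h2|).
    x1, x2 : BP bit-streams of the two inputs
    s1, s2 : sign bits of h1, h2 (0 = positive, 1 = negative)
    sel    : selector stream with P(1) = |h1| / (|h1| + |h2|),
             uncorrelated with x1 and x2
    """
    return [(b1 ^ s1) if s else (b2 ^ s2)      # XOR applies sign(h_i) in BP
            for b1, b2, s in zip(x1, x2, sel)]
```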
To create a mathematical model for the SC inner product using the MUX tree and adapt it
to CNN, the MUX tree is converted to a sum of products. For $N_{in}$ inputs, the MUX tree has
$N_{in} - 1$ MUXs. We define the SM array, a multi-dimensional array created from the selector
probabilities; this array has $N_{in}$ elements, and each element is a binary ANDing of the
selector bits on the path from the specific input to the output. Equation (9) shows the general
SM array and an example OL-MUX tree for $N_{in} = 5$, where $sl_i$ is a bit of the selector
bit-stream of the $i$-th MUX. For more information about constructing optimum OL-MUX trees, we
refer to [21].
๐‘†๐‘€ =
[
๐‘ ๐‘™ ๐‘๐‘–๐‘›โˆ’1
ฬ…ฬ…ฬ…ฬ…ฬ…ฬ…ฬ…ฬ…ฬ… โˆฉ โ‹ฏ โˆฉ ๐‘ ๐‘™1
ฬ…ฬ…ฬ…ฬ…
๐‘ ๐‘™ ๐‘๐‘–๐‘›โˆ’1
ฬ…ฬ…ฬ…ฬ…ฬ…ฬ…ฬ…ฬ…ฬ… โˆฉ โ‹ฏ โˆฉ ๐‘ ๐‘™1
โ‹ฎ โ‹ฑ โ‹ฎ
๐‘ ๐‘™ ๐‘ ๐‘–๐‘›โˆ’1 โˆฉ โ‹ฏ ]
=
[
๐‘ ๐‘™4
ฬ…ฬ…ฬ…ฬ… โˆฉ ๐‘ ๐‘™2
ฬ…ฬ…ฬ…ฬ… โˆฉ ๐‘ ๐‘™1
ฬ…ฬ…ฬ…ฬ…
๐‘ ๐‘™4
ฬ…ฬ…ฬ…ฬ… โˆฉ ๐‘ ๐‘™2
ฬ…ฬ…ฬ…ฬ… โˆฉ ๐‘ ๐‘™1
๐‘ ๐‘™4
ฬ…ฬ…ฬ…ฬ… โˆฉ ๐‘ ๐‘™2
๐‘ ๐‘™4 โˆฉ ๐‘ ๐‘™3
ฬ…ฬ…ฬ…ฬ…
๐‘ ๐‘™4 โˆฉ ๐‘ ๐‘™3 ]
(9)
Thus, the general equation of the MUX tree becomes (10) for evaluating one bit of the output
bit-stream:

$z = \bigcup_{i=0}^{N_{in}-1} (s_i \oplus x_i) \cap SM_i$  (10)
Suppose we want to use the SC inner product MUX tree to compute all elements (e.g.,
$N_{in} = Ch \times K \times K$). The inner product in (1) can be changed to (11) by taking
advantage of SC, (7), and (10), and adding a new loop representing the SN bits. It should be
noted that all the
variables are bits. The new index [b] evaluates the bit operation through the SNs from 0 to
L − 1. It is apparent that the amount of resource utilization is reduced.
$O[n][m][r][c][b] = \bigcup_{ch=0}^{Ch-1} \bigcup_{i=0}^{K-1} \bigcup_{j=0}^{K-1} \big( (s_{w[m][ch][i][j]} \oplus x[n][ch][Sr+i][Sc+j][b]) \cap SM[m][ch][i][j][b] \big)$  (11)
Instead of using a multiplier and an adder, SC uses only a few simple gates at the
expense of latency. The resources used in a MUX tree inner product with $N_{in}$ parallel
inputs are $N_{in}$ XOR gates and $N_{in} - 1$ MUXs. In practical implementations, to greatly
reduce latency, some loops should be unrolled entirely.
The inputs of CNN are of single-precision (32-bit) size. To use the full SN length, L should be
$L = 2^{32} = 4294967296$ bits, which is too long and would produce high latency. One approach to
reduce L is to reduce the binary precision n, which causes binary quantization. The other
approach is to reduce L without changing n, which causes the SC rounding error. The accuracy of
the MUX tree inner product circuit is evaluated with respect to precision and number of inputs
under rounding error, as shown in Table 2. It can be concluded that the MUX tree has high
accuracy and is robust to rounding error.
Table 2. MUX Tree Absolute Error ($\times 10^{-2}$) with Respect to SN Length L and Number of
Inputs $N_{in}$, using Binary Precision n = 64

 N_in \ L |  64  | 128  | 256  | 512  | 1024 | 2048 | 4096 | 8192
 2        | 3.3  | 2.2  | 1.49 | 1.02 | 0.73 | 0.51 | 0.36 | 0.25
 4        | 4.48 | 3.07 | 2.13 | 1.48 | 1.05 | 0.74 | 0.53 | 0.37
 8        | 5.03 | 3.6  | 2.47 | 1.74 | 1.22 | 0.86 | 0.62 | 0.42
 16       | 5.45 | 3.81 | 2.66 | 1.85 | 1.32 | 0.94 | 0.66 | 0.47
4.2. Pooling and Activation Function
In SC, if correlation is exploited, the OR gate acts as the max function for SCC = 1.
Therefore, (3) can be modified to become (12). In SC, instead of using a comparator, the OR gate
performs the max operation, leading to a significant reduction in hardware footprint. Usually,
the max pooling stride S is 2 and the kernel size K is 2, so max pooling can be realized using
only 3 OR gates with the proposed approach after unrolling the i and j loops of (12), as shown in
Figure 4a. Similar to max pooling, the ReLU activation function performs the max operation, but
compared to zero; thus, the input is ORed with a correlated SN of x = 0 in the BP domain.
Figure 4b shows the ReLU circuit.
$O[n][m][r][c][b] = \bigcup_{i=0}^{K-1} \bigcup_{j=0}^{K-1} x[n][m][Sr+i][Sc+j][b]$  (12)
Figure 4. Proposed SC CNN pooling and activation function circuits: (a) proposed max pooling;
(b) proposed SC ReLU circuit
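A bit-level sketch of the proposed OR-gate max pooling and ReLU follows, under the assumption that the window inputs arrive as maximally correlated (SCC = 1) BP streams, e.g. generated from one shared RNG; all names are illustrative.

```python
def sc_max_pool_2x2(w0, w1, w2, w3):
    """Max pooling of a 2x2 window per (12): three OR gates.
    Inputs must be correlated (SCC = 1) BP streams, e.g. all generated
    from the same RNG, so that OR implements max."""
    return [a | b | c | d for a, b, c, d in zip(w0, w1, w2, w3)]

def sc_relu(x_bits, zero_bits):
    """SC ReLU: OR the input with a correlated BP stream encoding x = 0
    (p = 0.5), i.e. max(x, 0)."""
    return [a | z for a, z in zip(x_bits, zero_bits)]
```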
The general scaled addition (using MUXs) in SC for $N_{in}$ inputs follows (13), which is easily
realized using a tree of $N_{in} - 1$ 2-to-1 MUXs with an uncorrelated selector bit-stream of
probability 0.5 for each MUX. Thus, average pooling is straightforward in SC. For example, to
implement in SC an average pooling with stride 2 and kernel size 2, a tree of 3 scaled-addition
units (2-to-1 MUXs) with selector probability p = 0.5 is used. To increase the accuracy, T-FFs
are utilized for selector bit-stream generation. The average pooling block can be used with any
SCC among its inputs. Two versions of average pooling are experimented with: average pooling
using independent RNGs for the MUX selectors (SC AP RNG) and average pooling using T-FFs to
create the uncorrelated selector bit-streams (SC AP FF).
$z = \frac{1}{N_{in}} \sum_{i=0}^{N_{in}-1} x_i$  (13)
5. Experimental Results and Discussion
To clarify the effectiveness of the proposed SC CNN basic functions, they were
compared with previous SC work and the respective binary computation. The accuracy and the
resource utilization are the measured metrics. To evaluate the accuracy, the absolute error is
computed over 10000 trials of randomly generated inputs, with the conventional binary result as
the golden reference. From these trials, we obtained the average output absolute errors.
Different SN lengths L were then used to observe the error behavior and the robustness to
rounding error, with the input binary precision set to n = 32 bits. On the other hand, to
evaluate the area of the basic function designs, we synthesized the circuits using the Vivado
Design Suite targeting a Xilinx ZYNQ Z706 FPGA.
Previous SC CNN works used the XNOR-MUX or XNOR-APC approach for the inner product operation in
the convolutional layers of CNN [7, 8, 10]. This work proposes the MUX tree for the inner product,
shown in Figure 5, with the selector probability values following (8) [21]. The number of inputs
$N_{in}$ used in this experiment is 16, since it is near-optimal for the XNOR-APC SC inner
product. The SN length is varied over 64, 128, 256, 512, 1024, 2048, 4096, and 8192 bits, since
8192 bits is used in [10] and 1024 bits in [8]. Figure 6(a) shows the mean absolute error of the
MUX tree, XNOR-MUX, and XNOR-APC approaches for the inner product. To make a fair comparison, the
result of each SC circuit is multiplied by its scaling factor; for example, the MUX tree output is
multiplied by $\sum |h|$. The MUX tree obtained the least error; therefore, the MUX tree SC inner
product is more accurate than the previous approaches across different SN lengths.
Figure 5. MUX tree used for 4 × 4 inner product
The resource utilizations of the three SC inner product approaches (MUX tree, XNOR-MUX, and
XNOR-APC) are compared along with the conventional binary (bin) inner product (serial design) in
Table 3. The MUX tree shows 1.6× and 2× lower resource utilization compared to XNOR-MUX and
XNOR-APC, respectively, as well as significant savings compared to the binary inner product.
Moreover, the MUX tree has further area savings since it requires only one RNG for any number of
inputs, whereas the XNOR-MUX and XNOR-APC SC inner products need $N_{in}$ RNGs; therefore, the
MUX tree obtains an $N_{in}\times$ saving in RNG circuits. The RNG used is a linear feedback
shift register (LFSR).
Table 3. Resource Utilization of Proposed Functions, Previous Work, and Conventional Binary

 Inner Product           | DSP | FF  | LUT
 MUX tree                | 0   | 0   | 35
 XNOR-MUX                | 0   | 0   | 55
 XNOR-APC                | 0   | 0   | 71
 Bin 8-bit fixed point   | 1   | 26  | 18
 Bin 32-bit float        | 5   | 540 | 740

 Max Pooling (MP)        | DSP | FF  | LUT
 Proposed SC MP          | 0   | 0   | 1
 Approximate SC MP [10]  | 0   | 32  | 65
 Bin MP 32-bit           | 0   | 0   | 134

 Activation function     | DSP | FF  | LUT
 Proposed SC ReLU        | 0   | 0   | 1
 Bin ReLU 32-bit         | 0   | 0   | 16

 Average Pooling (AP)    | DSP | FF  | LUT
 Proposed SC AP FF       | 0   | 3   | 4
 SC AP RNGs              | 0   | 0   | 2
 Bin AP 32-bit           | 0   | 0   | 95
Using the proposed MUX tree for the SC inner product operation of the CNN convolutional
layer is more efficient than previous SC inner product circuits or the conventional binary one.
The MUX tree is more accurate than the other SC inner products and has a smaller hardware
footprint, and compared to conventional binary it provides significant resource utilization
savings. Without exploiting correlation, the max pooling operation is hard to design in SC.
Ren et al. [10] proposed an approximate SC max pooling circuit; however, the proposed SC max
pooling in this study outperforms that work in terms of accuracy and resource utilization.
Figure 6(b) shows that the proposed max pooling is more accurate for any SN length L.
Figure 6(c) shows the absolute error of the proposed average pooling operation using an
independent RNG for each MUX selector and using T-FFs for selector bit-stream generation. The
average pooling using independent RNGs requires $\log_2(N_{in}) + 1$ different RNGs, while the
average pooling using T-FFs and MUXs requires one RNG for any number of inputs. This results in
a $(\log_2(N_{in}) + 1)\times$ saving in RNGs. The resource utilization of the proposed SC
average and max pooling functions and their binary counterparts is shown in Table 3, where all
are of parallel architecture. The proposed SC ReLU circuit's absolute error is shown in
Figure 6(d); a very minimal accuracy loss is obtained together with a high resource utilization
saving of 16×.
Figure 6. The absolute error of the proposed basic functions compared to previous works:
(a) SC inner product circuits; (b) SC max pooling; (c) SC average pooling; (d) SC ReLU
6. Conclusion
In this paper, SC CNN basic functions exploiting correlation were proposed with
reduced hardware footprint, to be efficient in resource-constrained mobile and embedded
systems. These functions are the inner product, max pooling, average pooling, and the ReLU
activation function; a combination of these basic functions, when looped, creates a specific CNN
layer. Experimental results demonstrate that the proposed SC functions achieve significant
hardware footprint savings compared to equivalent binary functions. The proposed functions also
outperform previous SC CNN works in terms of accuracy and resource utilization. Our future work
will investigate the performance of a complete SC CNN composed of the proposed basic functions.
References
[1] Y LeCun, Y Bengio, G Hinton. Deep learning. Nature. 2015; 521(7553): 436–444.
[2] B Cheung. Convolutional Neural Networks Applied to Human Face Classification. 2012 11th Int. Conf. Mach. Learn. Appl. 2012: 580–583.
[3] C Wu, W Fan, Y He, J Sun, S Naoi. Cascaded Heterogeneous Convolutional Neural Networks for Handwritten Digit Recognition. Int. Conf. Pattern Recognit. 2012: 657–660.
[4] J S Ashwin, N Manoharan. Convolutional Neural Network Based Target Recognition for Marine Search. Indones. J. Electr. Eng. Comput. Sci. 2017; 8(2): 561–563.
[5] R Wang, J Zhang, W Dong, J Yu, C J Xie, R Li, T Chen, H Chen. A Crop Pests Image Classification Algorithm Based on Deep Convolutional Neural Network. Telkomnika Telecommunication Computing Electronics and Control. 2017; 15(3): 1239–1246.
[6] M Alawad, M Lin. Stochastic-Based Deep Convolutional Networks with Reconfigurable Logic Fabric. IEEE Trans. Multi-Scale Comput. Syst. 2016; 99: 1.
[7] J Li, A Ren, Z Li, C Ding, B Yuan, Q Qiu, Y Wang. Towards acceleration of deep convolutional neural networks using stochastic computing. Proc. Asia South Pacific Des. Autom. Conf. ASP-DAC. 2017: 115–120.
[8] H Sim, D Nguyen, J Lee, K Choi. Scalable stochastic-computing accelerator for convolutional neural networks. Proc. Asia South Pacific Des. Autom. Conf. ASP-DAC. 2017: 696–701.
[9] A Alaghi, J P Hayes. Exploiting correlation in stochastic circuit design. 2013 IEEE 31st Int. Conf. Comput. Des. ICCD. 2013: 39–46.
[10] A Ren, J Li, Z Li, C Ding, X Qian, Q Qiu, B Yuan, Y Wang. SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing. Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems. 2017: 405–418.
[11] J Li, Z Yuan, Z Li, C Ding, A Ren, Q Qiu, J Draper, Y Wang. Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks. International Joint Conference on Neural Networks (IJCNN). 2017: 1230–1236.
[12] W Qian, X Li, M D Riedel, K Bazargan, D J Lilja. An architecture for fault-tolerant computation with stochastic logic. IEEE Trans. Comput. 2011; 60(1): 93–105.
[13] A Krizhevsky, I Sutskever, G E Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012: 1–9.
[14] S-M Khaligh-Razavi. What you need to know about the state-of-the-art computational models of object-vision: A tour through the models. arXiv preprint arXiv:1407.2776. 2014: 36.
[15] C Zhang, P Li, G Sun, Y Guan, B Xiao, J Cong. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. Proc. 2015 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays - FPGA. 2015; 15: 161–170.
[16] C Szegedy, W Liu, Y Jia, P Sermanet, S Reed, D Anguelov, D Erhan, V Vanhoucke, A Rabinovich. Going Deeper with Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1–9.
[17] A Alaghi, J P Hayes. Survey of Stochastic Computing. ACM Trans. Embed. Comput. Syst. 2013; 12(2s): 1–19.
[18] B Gaines. Stochastic computing systems. Adv. Inf. Syst. Sci. 1969: 37–172.
[19] V C Gaudet, A C Rapley. Iterative decoding using stochastic computation. Electron. Lett. 2003; 39(3): 299–301.
[20] K Kim, J Lee, K Choi. Approximate de-randomizer for stochastic circuits. ISOCC 2015 - Int. SoC Des. Conf. SoC Internet Everything. 2016: 123–124.
[21] H Ichihara, T Sugino, S Ishii, T Iwagaki, T Inoue. Compact and Accurate Digital Filters Based on Stochastic Computing. IEEE Trans. Emerg. Top. Comput. 2016; 99: 1–12.
ย 
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
ย 
Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...
ย 
Efficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antennaEfficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antenna
ย 
Design and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fireDesign and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fire
ย 
Wavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio networkWavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio network
ย 
A novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bandsA novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bands
ย 
Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...
ย 
Brief note on match and miss-match uncertainties
Brief note on match and miss-match uncertaintiesBrief note on match and miss-match uncertainties
Brief note on match and miss-match uncertainties
ย 
Implementation of FinFET technology based low power 4ร—4 Wallace tree multipli...
Implementation of FinFET technology based low power 4ร—4 Wallace tree multipli...Implementation of FinFET technology based low power 4ร—4 Wallace tree multipli...
Implementation of FinFET technology based low power 4ร—4 Wallace tree multipli...
ย 
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemEvaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system
ย 
Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...
ย 
Reagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensorReagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensor
ย 
Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...
ย 
A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...
ย 
Electroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networksElectroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networks
ย 
Adaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imagingAdaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imaging
ย 
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
ย 

Recently uploaded

Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
ย 
AIR POLLUTION lecture EnE203 updated.pdf
AIR POLLUTION lecture EnE203 updated.pdfAIR POLLUTION lecture EnE203 updated.pdf
AIR POLLUTION lecture EnE203 updated.pdf
RicletoEspinosa1
ย 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
ย 
ๅ“ช้‡ŒๅŠž็†(csuๆฏ•ไธš่ฏไนฆ)ๆŸฅๅฐ”ๆ–ฏ็‰นๅคงๅญฆๆฏ•ไธš่ฏ็ก•ๅฃซๅญฆๅŽ†ๅŽŸ็‰ˆไธ€ๆจกไธ€ๆ ท
ๅ“ช้‡ŒๅŠž็†(csuๆฏ•ไธš่ฏไนฆ)ๆŸฅๅฐ”ๆ–ฏ็‰นๅคงๅญฆๆฏ•ไธš่ฏ็ก•ๅฃซๅญฆๅŽ†ๅŽŸ็‰ˆไธ€ๆจกไธ€ๆ ทๅ“ช้‡ŒๅŠž็†(csuๆฏ•ไธš่ฏไนฆ)ๆŸฅๅฐ”ๆ–ฏ็‰นๅคงๅญฆๆฏ•ไธš่ฏ็ก•ๅฃซๅญฆๅŽ†ๅŽŸ็‰ˆไธ€ๆจกไธ€ๆ ท
ๅ“ช้‡ŒๅŠž็†(csuๆฏ•ไธš่ฏไนฆ)ๆŸฅๅฐ”ๆ–ฏ็‰นๅคงๅญฆๆฏ•ไธš่ฏ็ก•ๅฃซๅญฆๅŽ†ๅŽŸ็‰ˆไธ€ๆจกไธ€ๆ ท
insn4465
ย 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
ย 
TOP 10 B TECH COLLEGES IN JAIPUR 2024.pptx
TOP 10 B TECH COLLEGES IN JAIPUR 2024.pptxTOP 10 B TECH COLLEGES IN JAIPUR 2024.pptx
TOP 10 B TECH COLLEGES IN JAIPUR 2024.pptx
nikitacareer3
ย 
Modelagem de um CSTR com reaรงรฃo endotermica.pdf
Modelagem de um CSTR com reaรงรฃo endotermica.pdfModelagem de um CSTR com reaรงรฃo endotermica.pdf
Modelagem de um CSTR com reaรงรฃo endotermica.pdf
camseq
ย 
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(SFUๆฏ•ไธš่ฏ)่ฅฟ่’™่ฒ่ŽŽๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ๅฆ‚ไฝ•ๅŠž็†
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(SFUๆฏ•ไธš่ฏ)่ฅฟ่’™่ฒ่ŽŽๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ๅฆ‚ไฝ•ๅŠž็†ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(SFUๆฏ•ไธš่ฏ)่ฅฟ่’™่ฒ่ŽŽๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ๅฆ‚ไฝ•ๅŠž็†
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(SFUๆฏ•ไธš่ฏ)่ฅฟ่’™่ฒ่ŽŽๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ๅฆ‚ไฝ•ๅŠž็†
bakpo1
ย 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
ssuser36d3051
ย 
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(IITๆฏ•ไธš่ฏ)ไผŠๅˆฉ่ฏบไผŠ็†ๅทฅๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ไธ“ไธšๅŠž็†
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(IITๆฏ•ไธš่ฏ)ไผŠๅˆฉ่ฏบไผŠ็†ๅทฅๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ไธ“ไธšๅŠž็†ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(IITๆฏ•ไธš่ฏ)ไผŠๅˆฉ่ฏบไผŠ็†ๅทฅๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ไธ“ไธšๅŠž็†
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(IITๆฏ•ไธš่ฏ)ไผŠๅˆฉ่ฏบไผŠ็†ๅทฅๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ไธ“ไธšๅŠž็†
zwunae
ย 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
ย 
Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
Kamal Acharya
ย 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
ย 
Self-Control of Emotions by Slidesgo.pptx
Self-Control of Emotions by Slidesgo.pptxSelf-Control of Emotions by Slidesgo.pptx
Self-Control of Emotions by Slidesgo.pptx
iemerc2024
ย 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
ย 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
ย 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
ย 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
Kamal Acharya
ย 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
ย 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
drwaing
ย 

Recently uploaded (20)

Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
ย 
AIR POLLUTION lecture EnE203 updated.pdf
AIR POLLUTION lecture EnE203 updated.pdfAIR POLLUTION lecture EnE203 updated.pdf
AIR POLLUTION lecture EnE203 updated.pdf
ย 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
ย 
ๅ“ช้‡ŒๅŠž็†(csuๆฏ•ไธš่ฏไนฆ)ๆŸฅๅฐ”ๆ–ฏ็‰นๅคงๅญฆๆฏ•ไธš่ฏ็ก•ๅฃซๅญฆๅŽ†ๅŽŸ็‰ˆไธ€ๆจกไธ€ๆ ท
ๅ“ช้‡ŒๅŠž็†(csuๆฏ•ไธš่ฏไนฆ)ๆŸฅๅฐ”ๆ–ฏ็‰นๅคงๅญฆๆฏ•ไธš่ฏ็ก•ๅฃซๅญฆๅŽ†ๅŽŸ็‰ˆไธ€ๆจกไธ€ๆ ทๅ“ช้‡ŒๅŠž็†(csuๆฏ•ไธš่ฏไนฆ)ๆŸฅๅฐ”ๆ–ฏ็‰นๅคงๅญฆๆฏ•ไธš่ฏ็ก•ๅฃซๅญฆๅŽ†ๅŽŸ็‰ˆไธ€ๆจกไธ€ๆ ท
ๅ“ช้‡ŒๅŠž็†(csuๆฏ•ไธš่ฏไนฆ)ๆŸฅๅฐ”ๆ–ฏ็‰นๅคงๅญฆๆฏ•ไธš่ฏ็ก•ๅฃซๅญฆๅŽ†ๅŽŸ็‰ˆไธ€ๆจกไธ€ๆ ท
ย 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
ย 
TOP 10 B TECH COLLEGES IN JAIPUR 2024.pptx
TOP 10 B TECH COLLEGES IN JAIPUR 2024.pptxTOP 10 B TECH COLLEGES IN JAIPUR 2024.pptx
TOP 10 B TECH COLLEGES IN JAIPUR 2024.pptx
ย 
Modelagem de um CSTR com reaรงรฃo endotermica.pdf
Modelagem de um CSTR com reaรงรฃo endotermica.pdfModelagem de um CSTR com reaรงรฃo endotermica.pdf
Modelagem de um CSTR com reaรงรฃo endotermica.pdf
ย 
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(SFUๆฏ•ไธš่ฏ)่ฅฟ่’™่ฒ่ŽŽๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ๅฆ‚ไฝ•ๅŠž็†
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(SFUๆฏ•ไธš่ฏ)่ฅฟ่’™่ฒ่ŽŽๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ๅฆ‚ไฝ•ๅŠž็†ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(SFUๆฏ•ไธš่ฏ)่ฅฟ่’™่ฒ่ŽŽๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ๅฆ‚ไฝ•ๅŠž็†
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(SFUๆฏ•ไธš่ฏ)่ฅฟ่’™่ฒ่ŽŽๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ๅฆ‚ไฝ•ๅŠž็†
ย 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
ย 
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(IITๆฏ•ไธš่ฏ)ไผŠๅˆฉ่ฏบไผŠ็†ๅทฅๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ไธ“ไธšๅŠž็†
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(IITๆฏ•ไธš่ฏ)ไผŠๅˆฉ่ฏบไผŠ็†ๅทฅๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ไธ“ไธšๅŠž็†ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(IITๆฏ•ไธš่ฏ)ไผŠๅˆฉ่ฏบไผŠ็†ๅทฅๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ไธ“ไธšๅŠž็†
ไธ€ๆฏ”ไธ€ๅŽŸ็‰ˆ(IITๆฏ•ไธš่ฏ)ไผŠๅˆฉ่ฏบไผŠ็†ๅทฅๅคงๅญฆๆฏ•ไธš่ฏๆˆ็ปฉๅ•ไธ“ไธšๅŠž็†
ย 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
ย 
Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
ย 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
ย 
Self-Control of Emotions by Slidesgo.pptx
Self-Control of Emotions by Slidesgo.pptxSelf-Control of Emotions by Slidesgo.pptx
Self-Control of Emotions by Slidesgo.pptx
ย 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
ย 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
ย 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
ย 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
ย 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
ย 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
ย 

Stochastic Computing Correlation Utilization in Convolutional Neural Network Basic Functions

  • 1. TELKOMNIKA, Vol.16, No.6, December 2018, pp.2835~2843 ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018 DOI: 10.12928/TELKOMNIKA.v16i6.8955 ๏ฎ 2835 Received February 14, 2018; Revised September 27, 2018; Accepted October 24, 2018 Stochastic Computing Correlation Utilization in Convolutional Neural Network Basic Functions Hamdan Abdellatef*, Mohamed Khalil-Hani, Nasir Shaikh-Husin, Sayed Omid Ayat VeCAD Research Laboratory, School of Electrical Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia *Corresponding author, e-mail: hamdan.abdellatef@gmail.com Abstract In recent years, many applications have been implemented in embedded systems and mobile Internet of Things (IoT) devices that typically have constrained resources, smaller power budget, and exhibit "smartness" or intelligence. To implement computation-intensive and resource-hungry Convolutional Neural Network (CNN) in this class of devices, many research groups have developed specialized parallel accelerators using Graphical Processing Units (GPU), Field-Programmable Gate Arrays (FPGA), or Application-Specific Integrated Circuits (ASIC). An alternative computing paradigm called Stochastic Computing (SC) can implement CNN with low hardware footprint and power consumption. To enable building more efficient SC CNN, this work incorporates the CNN basic functions in SC that exploit correlation, share Random Number Generators (RNG), and is more robust to rounding error. Experimental results show our proposed solution provides significant savings in hardware footprint and increased accuracy for the SC CNN basic functions circuits compared to previous work. Keywords: convolutional neural network, stochastic computing, correlation Copyright ยฉ 2018 Universitas Ahmad Dahlan. All rights reserved. 1. Introduction Deep learning has emerged as a new area of machine learning research, which enables a system to automatically learn complex information and extract representations at multiple levels of abstraction. Convolutional Neural Network (CNN) is recognized as one of the most promising types of Artificial Neural Networks (ANN) and has become the dominant approach for almost all recognition and detection tasks [1] such as face recognition [2], handwritten digit recognition [3], target recognition [4], and image classification [5]. To achieve acceptable classification results, CNN performs a massive number of convolutions and sub- sampling operations with significant amounts of intermediate data results. Despite its high classification accuracy, a deep CNN is highly demanding in terms of energy consumption and computation cost [6]. To bring the success of CNNs to resource-constrained mobile and embedded systems, designers must overcome the challenges of implementing resource-hungry CNNs in embedded systems with limited area and power budget [7]. Stochastic Computing (SC), which represents and processes information in the form of a probability of ones in a bit-stream, has the potential to implement CNNs with significantly reduced hardware resources and achieve high power efficiency. In SC, arithmetic operations like multiplication can be performed using simple AND or XNOR logic gate in Uni-Polar (UP) or BiPolar (BP) representation, respectively, and scaled addition is done using Multiplexers (MUX). 
energy. Fortunately, neural networks have high error-tolerance at the algorithmic level, which makes SC a viable paradigm for CNN implementation [8].

Contrary to what was previously believed, the correlation among Stochastic Numbers (SN) can serve as a resource in designing stochastic circuits. A comparative study [9] reported that circuits exploiting correlation are generally smaller, more accurate, and have lower latency than those built from independent inputs. However, previous SC CNN works [7, 8, 10, 11] did not exploit correlation in their implementations. Moreover, these works did not include the conversion circuits between SC and conventional binary in their cost estimates, even though RNGs can take up to 80% of the circuit cost [12].
In addition, bit-streams as long as 8192 bits are used to obtain acceptable accuracy, which significantly increases latency. This work makes the following contributions:
1. SC CNN basic functions (inner product, pooling, and ReLU activation function) that exploit correlation are proposed. The resulting circuits have significantly lower resource utilization and higher accuracy, and are more robust to rounding error, than previous SC work. They also show a significant area reduction compared to a binary implementation [13].
2. A new method that generates uncorrelated bit-streams for MUX selectors using Toggle Flip-Flops (TFF) is presented; it enhances the accuracy of scaled addition while reducing the number of RNGs required.

The rest of this paper is organized as follows. Section 2 overviews CNN and explains its basic functions. Section 3 reviews SC basics. Section 4 presents the design method for the CNN basic functions (inner product, pooling, and ReLU activation function). Section 5 presents experimental results that show the effectiveness of the proposed designs with respect to compactness and accuracy. Finally, Section 6 concludes the paper.

2. Convolutional Neural Network
Previously, hand-engineered features were the primary source of difficulty in computer vision: sophisticated feature extractors had to be designed to identify the higher-level patterns best suited to machine vision tasks such as object recognition. Convolutional neural networks avoid this problem by learning higher-level representations automatically from data [14]. As a supervised learning algorithm, CNN employs a feedforward process for recognition and a backward path for training. In industrial practice, many application designers train the CNN off-line and then use the trained network to perform time-sensitive jobs; thus, the speed, area, and energy consumption of the feedforward computation are what matter. This work is scoped to the hardware implementation of the feedforward computation on FPGA.

A typical CNN, as shown in Figure 1, is composed of multiple computational layers that form two components: a feature extractor and a classifier. The feature extractor filters input images into "feature maps" that represent features of the image, such as corners and edges, which are relatively invariant to position shifts and distortions. Its output is a low-dimensional vector containing these features. This vector is then fed into the classifier, usually a traditional fully connected artificial neural network, which decides the likelihood of the categories the input image might belong to [15].

Figure 1. CNN architecture proposed in [13]

The CNN layers can be one of three types: convolutional, pooling, or fully connected. The convolutional layer is the core building block of the CNN; its main operation is the convolution, which computes the inner product of receptive fields (windows of the input feature map) and a set of learnable filters. This layer is the most time- and resource-consuming operation in CNN, occupying more than 90% of the overall computation and dominating runtime and energy consumption [16]. The convolutional layer is parameterized by the feature map batch size N, the number of output feature maps M, the number of input feature maps Ch, the output feature map height/width H/W, the weight kernel size K, and the stride S.
To compute one element in the output feature map of the convolutional layer, (1) is evaluated; the layer is a 4-D operation, and (1) is looped over N, M, H, and W to calculate all output values. The convolutional layer contains two functions: convolution and activation. The convolution consists of loops of multiply-accumulate (MAC) operations (the inner product). The activation functions apply non-linear transformations such as the Rectified Linear Unit (ReLU), the sigmoid function, and the hyperbolic tangent. This work implements only the ReLU activation function since it is the most popular one; ReLU compares the input to zero and outputs the maximum of the two, as shown in (2).

$$O[n][m][r][c]=\mathrm{Activation}\Big(B[m]+\sum_{ch=0}^{Ch-1}\sum_{i=0}^{K-1}\sum_{j=0}^{K-1}x[n][ch][Sr+i][Sc+j]\times w[m][ch][i][j]\Big) \quad (1)$$

$$out=\max(x,0) \quad (2)$$

The pooling layers perform nonlinear down-sampling for data dimension reduction. Commonly, max pooling and average pooling are used for this purpose. Max pooling, shown in (3), outputs the maximum value in a 2-D window of size K and stride S; average pooling, shown in (4), computes the average value of the same window. To complete the layer operation, the equation is repeated for all N, M, H, W. The output feature map of the pooling layer has a 1/S dimension reduction in height and width. Finally, the high-level reasoning is completed by the classifier, a fully connected layer; neurons in this layer are connected to all activation results of the previous layer, which amounts to an inner product with a filter size of one element.

$$O[n][m][r][c]=\max\big(x[n][m][Sr+i][Sc+j]\big)\quad\text{for } (0,0)\le(i,j)\le(K-1,K-1) \quad (3)$$

$$O[n][m][r][c]=\frac{1}{K^2}\sum_{i=0}^{K-1}\sum_{j=0}^{K-1}x[n][m][Sr+i][Sc+j] \quad (4)$$

The basic operations in CNN are therefore the inner product, pooling, and the activation function. Any CNN neuron consists of one or more of these basic operations: neurons in convolutional layers implement the inner product and activation only, those in pooling layers implement pooling only, and those in fully connected layers implement the inner product and activation.
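As a concrete reference for (1)-(4), here is a minimal NumPy sketch that evaluates one output element of each basic function; the function names and array layouts are our own illustration, not the paper's implementation.

```python
import numpy as np

def conv_element(x, w, B, n, m, r, c, S):
    """One output element of the convolutional layer, per (1), with ReLU per (2).
    x: inputs [N][Ch][Hin][Win], w: weights [M][Ch][K][K], B: biases [M]."""
    Ch, K = w.shape[1], w.shape[2]
    acc = B[m]
    for ch in range(Ch):
        for i in range(K):
            for j in range(K):
                acc += x[n, ch, S * r + i, S * c + j] * w[m, ch, i, j]
    return max(acc, 0.0)  # Activation = ReLU

def max_pool_element(x, n, m, r, c, K, S):
    """One max pooling output element, per (3)."""
    return x[n, m, S * r:S * r + K, S * c:S * c + K].max()

def avg_pool_element(x, n, m, r, c, K, S):
    """One average pooling output element, per (4)."""
    return x[n, m, S * r:S * r + K, S * c:S * c + K].sum() / K**2
```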
3. Stochastic Computing
Stochastic computing (SC) represents and processes information in the form of digitized probabilities. In SC, numbers called stochastic numbers (SN) are represented by binary bit-streams: an SN denotes a probability p, the probability of 1s in the bit-stream [17], and has no fixed length or structure. The stochastic representation can be one-line uni-polar (UP), one-line bipolar (BP), or two-line. This paper uses the BP representation since it allows negative values, whereas UP does not. The UP and BP representations are summarized in Table 1, where N0, N1, and N denote the numbers of zeros, ones, and total bits in the SN, respectively. To convert from binary to stochastic, a stochastic number generator (SNG) is used, consisting of a random number generator (RNG) and a comparator; to convert from stochastic back to binary, a counter is used. According to [18], SNs should be independent, uncorrelated bit-streams. However, recent studies [9] showed that correlation can serve as a resource in designing stochastic circuits. In that study, Alaghi and Hayes introduced a parameter that quantifies the correlation between two SNs, called stochastic computing correlation (SCC), shown in (5).

The major advantage of SC is that it employs very low-complexity arithmetic units [17], as shown in Table 1. An AND or XNOR gate performs multiplication in the UP or BP representation, respectively. There is no direct addition in SC; instead, scaled addition is used: a 2-to-1 MUX performs scaled addition with a selector bit-stream of value p_s = 0.5 (UP) or s = 0 (BP). It should be noted that the MUX selector must be uncorrelated with the MUX inputs to prevent correlation-induced error, while the MUX inputs themselves are correlation-insensitive and may have any mutual correlation. To perform subtraction, a NOT gate negates the value to be subtracted, which is then added to the other value by a 2-to-1 MUX.
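These bit-level operations are easy to simulate in software. The sketch below is a minimal illustration (the names and the sng helper are our own, not from the paper): it multiplies two BP stochastic numbers with a bitwise XNOR and performs scaled addition with a 2-to-1 MUX driven by an independent selector of probability 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 1024  # SN length

def sng(x, L):
    """SNG for a BP value x in [-1, 1]: emit 1 with probability p = (1 + x) / 2."""
    return (rng.random(L) < (1 + x) / 2).astype(np.uint8)

def bp_value(sn):
    """Recover the BP value x = 2p - 1 from a bit-stream."""
    return 2 * sn.mean() - 1

X1, X2 = sng(0.5, L), sng(-0.25, L)

# BP multiplication: bitwise XNOR of two uncorrelated streams.
prod = 1 - (X1 ^ X2)
print(bp_value(prod))    # ~ 0.5 * -0.25 = -0.125

# Scaled addition: 2-to-1 MUX with an uncorrelated selector of p_s = 0.5.
sel = (rng.random(L) < 0.5).astype(np.uint8)
summed = np.where(sel, X1, X2)
print(bp_value(summed))  # ~ (0.5 + -0.25) / 2 = 0.125
```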
Table 1. The SN Representations and Basic Operations

                  UP                              BP
Value             p = N1/N                        x = (N1 - N0)/N
Interval          [0, 1]                          [-1, 1]
Relation          p = (1 + x)/2                   x = 2p - 1
Negation          p_NOT = 1 - p                   x_NOT = -x
Multiplication    p_AND = p1 p2                   x_XNOR = x1 x2
Scaled Addition   p_MUX = p_s p1 + (1 - p_s) p2   x_MUX = [(1 + s) x1 + (1 - s) x2] / 2

One source of inaccuracy in SC is rounding. To eliminate the rounding error, the SN length L and the binary precision n (the number of bits) must satisfy L = 2^n. Suppose the precision in binary computing is n = 8 bits; then the full SN length is L = 256 bits. Each bit requires one clock cycle to process, which is the source of SC latency; as precision increases, L and the latency grow exponentially, a significant drawback of SC. To reduce the number of clock cycles needed, a bit-stream shorter than the full SN length is used, which produces a rounding error.

$$\mathrm{SCC}(X,Y)=\begin{cases}\dfrac{p_{X\cap Y}-p_X p_Y}{\min(p_X,p_Y)-p_X p_Y}, & p_{X\cap Y}-p_X p_Y>0\\[4pt] 0, & p_{X\cap Y}-p_X p_Y=0\\[4pt] \dfrac{p_{X\cap Y}-p_X p_Y}{p_X p_Y-\max(p_X+p_Y-1,\,0)}, & p_{X\cap Y}-p_X p_Y<0\end{cases} \quad (5)$$

For the SC addition operation, it is necessary to produce a 0.5 bit-stream that has SCC ≈ 0 with respect to the inputs. In theory, independent RNGs generate uncorrelated bit-streams, but a growing circuit requires many independent RNGs, inflating the area cost. In this study, we propose using flip-flops (FF) to obtain uncorrelated bit-streams for SC scaled addition. A T-FF is a JK-FF with T connected to both inputs. Based on Gaudet and Rapley [19], the JK-FF output follows (6); in the case of a T-FF, p_T = p_J = p_K, so p_Q = 0.5 for any value of T and SCC(T, Q) ≈ 0. Using a T-FF therefore allows a single RNG to drive all the SNGs that generate the MUX inputs as well as the uncorrelated selector, as shown in Figure 2a, and SC addition accuracy is increased, as shown in Figure 2b.

$$p_Q=\frac{p_J}{p_J+p_K} \quad (6)$$

Figure 2. More accurate SC scaled addition using a T-FF: (a) SC scaled addition using a T-FF for MUX selector generation; (b) accuracy comparison between T-FF and independent RNG selector generation
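A short simulation ties (5) and (6) together; it is a sketch under our own naming, not the paper's code. scc implements the piecewise definition of (5), and tff_stream shows that a T-FF driven by an arbitrary input stream produces a 0.5 output stream with SCC ≈ 0 against its input.

```python
import numpy as np

def scc(X, Y):
    """Stochastic computing correlation of two equal-length bit-streams, per (5)."""
    pX, pY = X.mean(), Y.mean()
    pXY = (X & Y).mean()
    d = pXY - pX * pY
    if d > 0:
        return d / (min(pX, pY) - pX * pY)
    if d < 0:
        return d / (pX * pY - max(pX + pY - 1, 0))
    return 0.0

def tff_stream(T):
    """Output stream Q of a T flip-flop: the state toggles whenever T = 1.
    Since p_T = p_J = p_K, (6) gives p_Q = 0.5 for any input stream T."""
    Q, q = np.empty_like(T), 0
    for b, t in enumerate(T):
        Q[b] = q
        if t:
            q ^= 1
    return Q

rng = np.random.default_rng(1)
T = (rng.random(4096) < 0.3).astype(np.uint8)
Q = tff_stream(T)
print(Q.mean(), scc(T, Q))  # ~0.5 and ~0, so Q can serve as a MUX selector
```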
4. Design of Stochastic Computing Based Convolutional Neural Network Basic Functions
4.1. Inner Product
The inner product is a MAC operation, the basic function of convolution in CNN; the number of elements in the inner product is determined by the unroll factor of the convolution loops. To perform the inner product in SC, the standard blocks are XNOR gates for multiplication and a MUX or an Approximate Parallel Counter (APC) [20] for addition, as proposed in previous SC CNN works [7, 8, 10]. However, the XNOR gate requires long uncorrelated bit-streams, which hurts latency and area cost by requiring one RNG per SNG. The MUX-tree approach proposed in [21] for a digital filter case study can instead be adapted to this application: a MUX tree allows sharing one RNG among all inputs and is more accurate than the previous XNOR-MUX and XNOR-APC approaches, as shown in Section 5.

One MUX can compute the inner product of 2-element vectors x and h, as shown in Figure 3 and following (7); the circuit itself involves no multiplier or adder. The selector bit-stream has probability |h1|/(|h1| + |h2|) and follows (8), where h_1^M and h_0^M denote the weight magnitudes routed to the 1 and 0 inputs of MUX M; sign(h_i) is denoted s_i. The same equation gives the MUX-tree output for the inner product of two vectors x and h of any length, but the output is scaled. Taking advantage of learning, the backward phase of the network is modified to adapt to the scaling.

$$z=\frac{1}{\sum|h|}(h\cdot x)=\frac{1}{|h_1|+|h_2|}(h_1x_1+h_2x_2) \quad (7)$$

$$p_{sl}^{M}=\frac{\sum(h_1^{M})}{\sum(h_1^{M}\cup h_0^{M})} \quad (8)$$

Figure 3. SC inner product circuit [21]

To create a mathematical model of the SC inner product using the MUX tree and adapt it to CNN, the MUX tree is rewritten as a sum of products. For N_in inputs, the MUX tree has N_in - 1 MUXs. We define the SM array, a multi-dimensional array of dimension N_in built from the selector probabilities; each element is the binary AND of the selector bits on the path from the specific input to the output. Equation (9) shows the general SM array together with an example for an OL-MUX tree with N_in = 5, where sl_i is a bit of the selector bit-stream of the i-th MUX. For more information on constructing optimal OL-MUX trees, we refer to [21].

$$SM=\begin{bmatrix}\overline{sl_{N_{in}-1}}\cap\cdots\cap\overline{sl_1}\\ \overline{sl_{N_{in}-1}}\cap\cdots\cap sl_1\\ \vdots\\ sl_{N_{in}-1}\cap\cdots\end{bmatrix}=\begin{bmatrix}\overline{sl_4}\cap\overline{sl_2}\cap\overline{sl_1}\\ \overline{sl_4}\cap\overline{sl_2}\cap sl_1\\ \overline{sl_4}\cap sl_2\\ sl_4\cap\overline{sl_3}\\ sl_4\cap sl_3\end{bmatrix} \quad (9)$$

Thus, the general equation of the MUX tree for evaluating one bit of the output bit-stream becomes (10).

$$z=\bigcup_{i=0}^{N_{in}-1}(s_i\oplus x_i)\cap SM_i \quad (10)$$
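As a software illustration of (7), the sketch below simulates the single-MUX inner product: each input is negated when its weight is negative (the s_i ⊕ x_i term of (10), realized as a NOT in BP), and a selector stream of probability |h1|/(|h1| + |h2|) picks between them. The helper names are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
L = 4096

def sng(x, L):
    """BP stochastic number generator."""
    return (rng.random(L) < (1 + x) / 2).astype(np.uint8)

def mux_inner_product(x1, x2, h1, h2, L):
    """z = (h1*x1 + h2*x2) / (|h1| + |h2|), per (7), using one MUX."""
    X1, X2 = sng(x1, L), sng(x2, L)
    # Apply sign(h_i): in BP, a negative sign is a NOT gate on the stream.
    if h1 < 0:
        X1 = 1 - X1
    if h2 < 0:
        X2 = 1 - X2
    sel = (rng.random(L) < abs(h1) / (abs(h1) + abs(h2))).astype(np.uint8)
    Z = np.where(sel, X1, X2)
    return 2 * Z.mean() - 1  # scaled result; multiply by |h1|+|h2| to unscale

z = mux_inner_product(0.5, -0.5, 0.8, 0.4, L)
print(z * 1.2, 0.8 * 0.5 + 0.4 * -0.5)  # ~0.2 vs exact 0.2
```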
Suppose we want to use the SC inner product MUX tree to compute all elements of the convolution (e.g., N_in = Ch × K × K). Taking advantage of SC, (7), and (10), the inner product in (1) becomes (11), with a new loop added over the SN bits. It should be noted that all the variables in (11) are bits; the new index [b] evaluates the bit operation across the SN from 0 to L - 1. The reduction in resource utilization is apparent.

$$O[n][m][r][c][b]=\bigcup_{ch=0}^{Ch-1}\bigcup_{i=0}^{K-1}\bigcup_{j=0}^{K-1}\big(s_{w[m][ch][i][j]}\oplus x[n][ch][Sr+i][Sc+j][b]\big)\cap SM[m][ch][i][j][b] \quad (11)$$

Instead of a multiplier and an adder, SC uses only a few simple gates, at the expense of latency: a MUX-tree inner product with N_in parallel inputs uses N_in XOR gates and N_in - 1 MUXs. In practical implementations, some loops should be unrolled entirely to greatly reduce latency. The inputs of a CNN are single-precision (32-bit) values, so using the full SN length would require L = 2^32 = 4294967296 bits, which is far too long and would produce very high latency. One approach to reduce L is to reduce the binary precision n, which causes binary quantization error; the other is to reduce L without changing n, which causes SC rounding error. Table 2 evaluates the accuracy of the MUX-tree inner product circuit with respect to SN length and number of inputs under rounding error; it shows that the MUX tree has high accuracy and is robust to rounding error.

Table 2. MUX Tree Absolute Error (x 10^-2) with Respect to SN Length L and Number of Inputs N_in, Using Binary Precision n = 64

N_in \ L   64     128    256    512    1024   2048   4096   8192
2          3.3    2.2    1.49   1.02   0.73   0.51   0.36   0.25
4          4.48   3.07   2.13   1.48   1.05   0.74   0.53   0.37
8          5.03   3.6    2.47   1.74   1.22   0.86   0.62   0.42
16         5.45   3.81   2.66   1.85   1.32   0.94   0.66   0.47

4.2. Pooling and Activation Function
In SC, when correlation is exploited, an OR gate acts as the max function for SCC = 1. Therefore, (3) can be modified into (12): instead of a comparator, the OR gate performs the max operation, leading to a significant reduction in hardware footprint. Usually the max pooling stride S is 2 and the kernel K is 2, so max pooling can be realized with only 3 OR gates by unrolling the i and j loops of (12), as shown in Figure 4a. Similar to max pooling, the ReLU activation function performs a max operation, but against zero; thus, the input is ORed with a correlated SN of x = 0 in the BP domain. Figure 4b shows the ReLU circuit.

$$O[n][m][r][c][b]=\bigcup_{i=0}^{K-1}\bigcup_{j=0}^{K-1}x[n][m][Sr+i][Sc+j][b] \quad (12)$$

Figure 4. Proposed SC CNN pooling and activation function circuits: (a) proposed max pooling; (b) proposed SC ReLU circuit
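The OR-gate max of (12) only holds when the input streams have SCC = 1, which is achieved by generating them all from one shared random sequence. The following minimal sketch (our own illustration) demonstrates both the max pooling window and the ReLU built the same way.

```python
import numpy as np

rng = np.random.default_rng(3)
L, xs = 4096, [0.3, -0.6, 0.1, 0.55]

# One shared random sequence => all streams have SCC = 1 with each other.
R = rng.random(L)
streams = [(R < (1 + x) / 2).astype(np.uint8) for x in xs]

# Max pooling over a 2x2 window: OR of the four correlated streams (3 OR gates).
pooled = streams[0] | streams[1] | streams[2] | streams[3]
print(2 * pooled.mean() - 1, max(xs))      # ~0.55

# ReLU: OR with the correlated stream encoding x = 0 (p = 0.5) in BP.
zero = (R < 0.5).astype(np.uint8)
relu = streams[1] | zero
print(2 * relu.mean() - 1, max(xs[1], 0))  # ~0.0
```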
The general scaled addition (using MUXs) in SC for N_in inputs follows (13), which is realized with a tree of N_in - 1 2-to-1 MUXs whose selector bit-streams are uncorrelated with the inputs and have probability 0.5. Average pooling is therefore straightforward in SC: for a stride of 2 and a kernel of 2, a tree of 3 scaled addition units (2-to-1 MUXs) with selector probability p = 0.5 is used. To increase accuracy, a TFF is utilized for selector bit-stream generation. The average pooling block works with any SCC among its inputs. Two versions of average pooling are evaluated: one using independent RNGs for the MUX selectors (SC AP RNG) and one using a T-FF to create the uncorrelated selector bit-streams (SC AP FF).

$$z=\frac{1}{N_{in}}\sum_{i=0}^{N_{in}-1}x_i \quad (13)$$

5. Experimental Results and Discussion
To demonstrate the effectiveness of the proposed SC CNN basic functions, they are compared with previous SC work and with the corresponding binary computation, using accuracy and resource utilization as the metrics. To evaluate accuracy, the absolute error is computed over 10000 trials of randomly generated inputs, with the conventional binary result as the golden reference, and the average output absolute error is obtained. Different SN lengths L are then used to observe the error behavior and the robustness to rounding error at an input binary precision of n = 32 bits. To evaluate the area of the basic function designs, the circuits are synthesized with the Vivado Design Suite targeting a Xilinx ZYNQ Z706 FPGA.

Previous SC CNN works used XNOR-MUX or XNOR-APC for the inner product operation in the convolutional layers [7, 8, 10]. This work proposes the MUX tree shown in Figure 5 for the inner product, with selector probabilities following (8) [21]. The number of inputs N_in in this experiment is 16, since that is the more favorable size for the XNOR-APC SC inner product. The SN length is varied over 64, 128, 256, 512, 1024, 2048, 4096, and 8192 bits, since 8192 bits are used in [10] and 1024 bits in [8]. Figure 6(a) shows the mean absolute error of the MUX-tree, XNOR-MUX, and XNOR-APC inner product approaches. For a fair comparison, the result of each SC circuit is multiplied by its scaling factor; for example, the MUX-tree output is multiplied by Σ|h|. The MUX tree obtains the lowest error and is therefore more accurate than the previous approaches across all SN lengths.

Figure 5. MUX tree used for 4 × 4 inner product

The resource utilization of the three SC inner product approaches (MUX tree, XNOR-MUX, and XNOR-APC) is compared in Table 3 along with a conventional binary (bin) inner product of serial design. The MUX tree uses 1.6x and 2x fewer resources than XNOR-MUX and XNOR-APC, respectively, with significant savings over the binary inner product as well. The MUX tree saves further area because it requires a single RNG for any number of inputs, whereas the XNOR-MUX and XNOR-APC inner products need N_in RNGs; the MUX tree thus obtains an N_in-fold saving in RNG circuits. The RNG used is a linear feedback shift register (LFSR).
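For reference, the sketch below simulates an LFSR-driven SNG; the 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, and 11 is a standard maximal-length example, and the comparator-based SNG around it is our own minimal illustration rather than the paper's exact generator.

```python
def lfsr16(seed=0xACE1):
    """16-bit Fibonacci LFSR with taps at bits 16, 14, 13, 11 (maximal length)."""
    state = seed
    while True:
        bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def sng_bits(value, n_bits, length, rng_gen):
    """SNG: compare an unsigned n-bit binary value against the LFSR state each cycle."""
    return [1 if (next(rng_gen) % (1 << n_bits)) < value else 0
            for _ in range(length)]

gen = lfsr16()
bits = sng_bits(value=192, n_bits=8, length=256, rng_gen=gen)
print(sum(bits) / len(bits))  # ~192/256 = 0.75
```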
Table 3. Resource Utilization of Proposed Functions, Previous Work, and Conventional Binary

Inner Product              DSP   FF    LUT
MUX tree                   0     0     35
XNOR-MUX                   0     0     55
XNOR-APC                   0     0     71
Bin 8-bit fixed point      1     26    18
Bin 32-bit float           5     540   740

Max Pooling (MP)           DSP   FF    LUT
Proposed SC MP             0     0     1
Approximate SC MP [10]     0     32    65
Bin MP 32-bit              0     0     134

Activation Function        DSP   FF    LUT
Proposed SC ReLU           0     0     1
Bin ReLU 32-bit            0     0     16

Average Pooling (AP)       DSP   FF    LUT
Proposed SC AP FF          0     3     4
SC AP RNGs                 0     0     2
Bin AP 32-bit              0     0     95

Using the proposed MUX tree for the SC inner product operation of the CNN convolutional layer is more efficient than either the previous SC inner product circuits or the conventional binary implementation: the MUX tree is more accurate than the other SC inner products, has a smaller hardware footprint, and provides significant resource utilization savings over conventional binary.

Without exploiting correlation, the max pooling operation is hard to design in SC; Ren et al. [10] proposed an approximate SC max pooling circuit. The SC max pooling proposed in this study outperforms that work in both accuracy and resource utilization: Figure 6(b) shows that the proposed max pooling is more accurate for every SN length L. Figure 6(c) shows the absolute error of the proposed average pooling using an independent RNG per MUX selector versus T-FF selector bit-stream generation. Average pooling with independent RNGs requires log2(N_in) + 1 different RNGs, while average pooling with T-FFs and MUXs requires one RNG for any number of inputs; this results in a (log2(N_in) + 1)-fold saving in RNGs. The resource utilization of the proposed SC average and max pooling functions and their binary counterparts is shown in Table 3, where all circuits are of parallel architecture. The absolute error of the proposed SC ReLU circuit is shown in Figure 6(d): its accuracy loss is minimal while its resource utilization saving reaches 16x.

Figure 6. The absolute error of the proposed basic functions compared to previous works: (a) SC inner product circuits; (b) SC max pooling; (c) SC average pooling; (d) SC ReLU
6. Conclusion
In this paper, SC CNN basic functions exploiting correlation were proposed with a reduced hardware footprint, making them efficient for resource-constrained mobile and embedded systems. These functions are the inner product, max pooling, average pooling, and the ReLU activation function; a combination of these basic functions, when looped, creates a specific CNN layer. Experimental results demonstrate that the proposed SC functions achieve significant hardware footprint savings compared to equivalent binary functions, and that they outperform previous SC CNN works in terms of accuracy and resource utilization. Our future work will investigate the performance of a complete SC CNN composed of the proposed basic functions.

References
[1] Y LeCun, Y Bengio, G Hinton. Deep learning. Nature. 2015; 521(7553): 436-444.
[2] B Cheung. Convolutional Neural Networks Applied to Human Face Classification. 11th Int. Conf. Mach. Learn. Appl. 2012: 580-583.
[3] C Wu, W Fan, Y He, J Sun, S Naoi. Cascaded Heterogeneous Convolutional Neural Networks for Handwritten Digit Recognition. Int. Conf. Pattern Recognit. 2012: 657-660.
[4] J S Ashwin, N Manoharan. Convolutional Neural Network Based Target Recognition for Marine Search. Indones. J. Electr. Eng. Comput. Sci. 2017; 8(2): 561-563.
[5] R Wang, J Zhang, W Dong, J Yu, C J Xie, R Li, T Chen, H Chen. A Crop Pests Image Classification Algorithm Based on Deep Convolutional Neural Network. Telkomnika Telecommunication Computing Electronics and Control. 2017; 15(3): 1239-1246.
[6] M Alawad, M Lin. Stochastic-Based Deep Convolutional Networks with Reconfigurable Logic Fabric. IEEE Trans. Multi-Scale Comput. Syst. 2016; 99: 1.
[7] J Li, A Ren, Z Li, C Ding, B Yuan, Q Qiu, Y Wang. Towards acceleration of deep convolutional neural networks using stochastic computing. Proc. Asia South Pacific Des. Autom. Conf. (ASP-DAC). 2017: 115-120.
[8] H Sim, D Nguyen, J Lee, K Choi. Scalable stochastic-computing accelerator for convolutional neural networks. Proc. Asia South Pacific Des. Autom. Conf. (ASP-DAC). 2017: 696-701.
[9] A Alaghi, J P Hayes. Exploiting correlation in stochastic circuit design. IEEE 31st Int. Conf. Comput. Des. (ICCD). 2013: 39-46.
[10] A Ren, J Li, Z Li, C Ding, X Qian, Q Qiu, B Yuan, Y Wang. SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing. Proc. 22nd Int. Conf. on Architectural Support for Programming Languages and Operating Systems. 2017: 405-418.
[11] J Li, Z Yuan, Z Li, C Ding, A Ren, Q Qiu, J Draper, Y Wang. Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks. Int. Joint Conf. on Neural Networks (IJCNN). 2017: 1230-1236.
[12] W Qian, X Li, M D Riedel, K Bazargan, D J Lilja. An architecture for fault-tolerant computation with stochastic logic. IEEE Trans. Comput. 2011; 60(1): 93-105.
[13] A Krizhevsky, I Sutskever, G E Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012: 1-9.
[14] S-M Khaligh-Razavi. What you need to know about the state-of-the-art computational models of object-vision: A tour through the models. arXiv preprint arXiv:1407.2776. 2014: 36.
[15] C Zhang, P Li, G Sun, Y Guan, B Xiao, J Cong. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. Proc. 2015 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA). 2015: 161-170.
[16] C Szegedy, W Liu, Y Jia, P Sermanet, S Reed, D Anguelov, D Erhan, V Vanhoucke, A Rabinovich. Going Deeper with Convolutions. Proc. IEEE Conf. on Computer Vision and Pattern Recognition. 2015: 1-9.
[17] A Alaghi, J P Hayes. Survey of Stochastic Computing. ACM Trans. Embed. Comput. Syst. 2013; 12(2s): 1-19.
[18] B Gaines. Stochastic computing systems. Adv. Inf. Syst. Sci. 1969: 37-172.
[19] V C Gaudet, A C Rapley. Iterative decoding using stochastic computation. Electron. Lett. 2003; 39(3): 299-301.
[20] K Kim, J Lee, K Choi. Approximate de-randomizer for stochastic circuits. ISOCC 2015 - Int. SoC Des. Conf. 2016: 123-124.
[21] H Ichihara, T Sugino, S Ishii, T Iwagaki, T Inoue. Compact and Accurate Digital Filters Based on Stochastic Computing. IEEE Trans. Emerg. Top. Comput. 2016; 99: 1-12.