Artificial Neural Network Implementation on FPGA – a Modular Approach

K. Berestizhevsky and R. Levy

K. Berestizhevsky and R. Levy are with the Department of Electrical Engineering Systems, Tel Aviv University, Tel Aviv 69978, Israel (e-mail: konsta9@mail.tau.ac.il; roeelev1@mail.tau.ac.il).

Abstract—In this paper, we present an FPGA implementation of an artificial neural network. In addition, the paper emphasizes important FPGA design principles that turn the development of a neural network into a much more modular procedure. These principles may prove extremely useful for those who plan to implement a neural architecture on an FPGA.
Our implementation is based on a multilayer perceptron topology and uses the back-propagation algorithm for training. Thanks to a highly modular and parameterized structure, a network with any number of layers and neurons can be synthesized in a matter of minutes. This means that the system can be quickly configured for solving any type of neural-network problem, including examples that involve deep learning. The design was fitted on a Xilinx Zynq-7000 at a 200 MHz clock frequency. The peak performances measured were 5.46 MCUPS during training and 8.24 MCPS during computation. The implementation was successfully tested on the “Breast Cancer Wisconsin” classification problem, where tests showed a 96% hit ratio.
Index Terms—Artificial neural networks, Backpropagation,
Deep Learning, Design methodology, Feedforward neural
networks, Field programmable gate arrays, Learning systems,
Multi-layer neural network, Multilayer perceptrons, Neural
network hardware, Parallel architectures, Reconfigurable
architectures.
I. INTRODUCTION
Artificial neural networks, or ANNs, are computational models that provide a unique way of processing data. This computational model was inspired by animal nervous systems and has been found useful in areas such as pattern recognition, classification and approximation. The general structure of an ANN consists of a network of interconnected artificial neurons. Moreover, an ANN is a trainable machine, and thus is capable of learning patterns for later recognition.
Interest in data classification tasks, such as face recognition, has increased significantly in the past few years. As the available training sets grow bigger, more and more researchers are drawn to improving the recognition abilities of ANNs. For instance, a significant research effort has been conducted in the field of deep learning, carried out on Deep Neural Networks, or DNNs [12].
On the one hand, the ANN model provides an efficient, decentralized way of computation. On the other hand, the very fine-grained parallelism of this model means that it fits well only on highly parallel hardware platforms. Various previous works have searched for an appropriate platform for neural architectures; platforms such as general-purpose computers, dedicated parallel computers and ASICs were examined [3]. Eventually, it appears that when dealing with neural architectures, the FPGA platform provides the best compromise among its competitors, in terms of time-to-market, cost, speed, area and reliability. An extended comparison between the platforms is given in [1].
In our study, we were looking for a flexible way of designing a neural architecture. Based on the previously mentioned papers, the platform of our choice was the FPGA. Unfortunately, an early literature review revealed a lack of properly described design principles for neural architectures on FPGAs. One significant work was conducted in [4] and provided additional inspiration for this paper. Hence, in this paper, we describe a modular method for designing an ANN on an FPGA.
The idea presented in this paper relies on a hierarchical approach: we break the structure of the ANN down into hierarchical levels of abstraction. The base level is the single artificial neuron. The following levels encapsulate one or more lower-level components and lead the development towards the top level of the abstraction, which is the software application that controls the global actions of the network.
The paper is organized as follows. Chapter II describes the background behind the hardware platform used, the ANN theory and the main algorithms. Chapter III presents the hierarchical approach to a neural network’s design process. Chapter IV extends this approach into concrete development steps. Chapter V presents the conducted tests and the results.
Chapter VI concludes the study and suggests further improvements.
II. BACKGROUND
A. FPGA Technology
Field Programmable Gate Arrays (FPGAs) are
programmable devices that can be configured to function as
custom digital circuits. The configuration of an FPGA device
is done by means of hardware description languages such as
Verilog or VHDL. Thus, by describing the logic and programming the FPGA with it, one can obtain ASIC-like hardware.
The FPGA hardware consists of a 2D array of configurable logic blocks (CLBs) that can be set to implement specific logic functions. The connections between these CLBs are configurable as well. Furthermore, any logic function can be programmed into the FPGA, as long as there are enough CLBs and the connectivity between them is available.
The fact that an FPGA utilizes generic CLBs, located in fixed positions on the chip, imposes a significant drawback: every custom design is forced to fit onto this fixed arrangement of blocks. As a result, an FPGA design will always have much poorer performance than the same logic on a properly designed ASIC. Nevertheless, the development time of an FPGA design is measured in weeks, whereas ASIC development times are measured in months. Moreover, an appropriately parameterized FPGA design can be reconfigured and re-programmed into the chip in a matter of minutes.
In our study, we used the Xilinx Zynq™-7000 as the FPGA platform. This platform is equipped with a dual-core ARM Cortex™-A9 and provides extensive programming abilities, as well as a convenient interface with a PC.
B. Artificial Neural Networks
Artificial neural networks (ANNs) are computational models for solving various problems. These models can be implemented both in software and in hardware. In recent decades, ANNs have been used for engineering purposes such as studying behavior and control in animals and machines, pattern recognition, forecasting, and data compression. First, we present a few important definitions:
ANN – Artificial Neural Network. Consists of artificial neurons, ordered in layers. The signals in the network flow through these layers (see Fig. 2).
Artificial Neuron – the “activator”, or “node”, of the ANN. A general activator consists of multiple input ports, a summing junction, an activation function (threshold, ramp, sigmoidal, etc.) and one output port (see Fig. 1).
MLP – Multilayer Perceptron. A form of feed-forward neural network consisting of an input layer, followed by one or more hidden layers of neurons and an output layer.
DNN – Deep Neural Network. Another way to refer to an MLP with more than a single hidden layer.
Learning Phase – the period of time during which the ANN determines its learnable parameters based on a learning set. This phase involves the training algorithm.
Calculation Phase – the period of time during which the ANN is fed with inputs and the forward computation on these inputs generates the output of the network.
In nature, real neurons receive electrical signals through synapses. When the received signals surpass a threshold, the neuron emits a signal through its axon. The artificial neuron attempts to replicate this natural behavior. The electrical signals appear in the form of digital inputs, whose strength is their numeric value multiplied by the weight of the corresponding input edge. The activation of an artificial neuron is done by applying a nonlinear function (such as a sigmoid) to the accumulated value of the neuron’s inputs. The final calculated value of an artificial neuron appears on its output port. Fig. 1 presents the mathematical abstraction of an artificial neuron that receives inputs denoted by $x_0$ to $x_{n-1}$ and outputs the value denoted by $y$.
Fig. 1. Generic structure of an artificial neuron with n inputs ($x_0, \dots, x_{n-1}$). The inputs are multiplied by the corresponding weights ($w_0, \dots, w_{n-1}$), summed up and passed through an activation function $f$.
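To make the abstraction concrete, the computation of Fig. 1 fits in a few lines. The following is a minimal Python sketch of the model; the function name is ours and does not appear in the implementation:

    import math

    def activator_forward(x, w, f=lambda s: 1.0 / (1.0 + math.exp(-s))):
        """Output of one artificial neuron: weighted sum, then activation."""
        s = sum(xi * wi for xi, wi in zip(x, w))  # summing junction
        return f(s)                               # y = f(sum_i w_i * x_i)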
The ANN is comprised of interconnected layers, where each layer consists of a certain number of artificial neurons. During the calculation phase, the ANN is activated in a process called forward computation. In this form of activity, the signals flow from the input layer through one or more hidden layers.
Finally, the last computations are carried out in the output layer. Fig. 2 presents a schematic chart of an ANN with L layers and N neurons in each layer. The general structure of the ANN can have a variable number of artificial neurons in its layers.
Fig. 2. Generic structure of a feedforward ANN with L layers and N artificial neurons in each layer.
C. Backpropagation Algorithm
Prior to executing any forward computation, the ANN must be trained. The training algorithm of our choice is the back-propagation algorithm [2]. It is one of the most common training algorithms for ANNs, and many others are based on it.
In our design, we have slightly modified the algorithm flow described in [1]: our change to the original algorithm comes in the form of treating the biases as actual neurons.
In order to demonstrate the backpropagation algorithm, we consider an MLP similar to the one presented in Fig. 2. The MLP consists of $L$ layers with $N_l$ neurons in the $l$-th layer. The execution of the error backpropagation algorithm is an iterative process; in our context, $n$ denotes the current iteration. The algorithm itself consists of the following five steps:
1) Initialization of parameters
Prior to training, the following parameters must be set: $\mu$ is defined as the learning rate and is a constant scaling factor used in error correction during each iteration of the algorithm; $w_{kj}^{l}(n=0)$ is the weight of the connection from neuron $j$ in the $(l-1)$-th layer to the $k$-th neuron in the $l$-th layer (this weight is modified in every iteration, where iteration $n=0$ stands for the initialization); and $\theta^{l}$ is the bias value of the $l$-th layer. A bias can be thought of as an additional neuron that has no inputs and always outputs a constant value. In our model, the bias neuron is placed as the last neuron of the input layer and of every hidden layer.
All of the mentioned parameters are chosen based on heuristics. For instance, a typical bias value is around 1.0, the algorithm step is a fractional number from the interval (0,1), and the weights can be initialized randomly to values between -1 and 1.
2) Training Example Input
For an ANN with $N_0$ inputs and $N_{L-1}$ outputs, the training set $T$ is defined as follows:

$$T \triangleq \{(I, C)\ |\ I \in \mathbb{R}^{N_0},\ C \in \mathbb{R}^{N_{L-1}}\} \qquad ( 1 )$$

where $C$ is the correct output vector corresponding to an input vector $I$. In every iteration, one training example $(I, C) \in T$ is fed into the network.
3) Forward Computation
Data signals from the neurons of the previous $(l-1)$-th layer are propagated towards the neurons in the $l$-th layer. During that process, each neuron in the hidden and output layers calculates the weighted sum ( 2 ) of its inputs:

$$S_k^l = \sum_{j=0}^{N_{l-1}} o_j^{l-1} w_{kj}^{l} \qquad ( 2 )$$

where $l \in [1, \dots, L-1]$ denotes the layer number and $k \in [0, \dots, N_l - 1]$ denotes the neuron number. $N_{l-1}$ is the number of neurons in the $(l-1)$-th layer, not including the bias neuron; $S_k^l$ is the weighted sum of the $k$-th neuron in the $l$-th layer; $w_{kj}^{l}$ is the weight, as defined in step 1); and $o_j^{l-1}$ is the output of the $j$-th neuron in the $(l-1)$-th layer. As we have already mentioned, our algorithm treats the biases as if they were actual neurons. Therefore, the bias values that are passed to the next layer are multiplied by the corresponding weights. The advantage of this strategy is that the biasing becomes adjustable during the learning phase. To sum up, the output of each neuron is as follows:

$$o_k^l = \begin{cases} f(S_k^l) & \text{for } k < N_l \\ \theta^{l} & \text{for } k = N_l\ (\text{bias neuron}) \end{cases} \qquad ( 3 )$$

where $k \in [0, \dots, N_l]$ and $l \in [1, \dots, L-1]$. $o_k^l$ is the output of the $k$-th neuron in the $l$-th layer; $\theta^{l}$ is the bias of the $l$-th layer; and $f(S_k^l)$ is the activation function, which modifies the weighted sum $S_k^l$ by passing it through a non-linear operation. Our algorithm uses a unipolar sigmoid as the activation function; specifically, the log-sigmoid function ( 4 ):

$$f(x) = \frac{1}{1 + e^{-x}} \qquad ( 4 )$$
4) Backward Computation
The backward computation compares the current output of the network with the expected result. Consequently, this comparison induces the appropriate adjustment to the network’s weights. To put it another way, the algorithm tries to minimize the error between the correct value $C$ and the actual output value that was determined during the forward computation.
The backward computation begins at the output layer and proceeds towards the input layer. During the backward computation, the following steps are performed:
a) Local gradients calculation

$$E_k^l = \begin{cases} C_k - o_k^l & l = L-1 \\ \sum_{j=1}^{N_{l+1}} w_{jk}^{l+1} \delta_j^{l+1} & l \in [1, \dots, L-2] \end{cases} \qquad ( 5 )$$

where $E_k^l$ is the calculated error for the $k$-th neuron in the $l$-th layer; at the output layer it is defined as the difference between the correct value $C_k$ and the actual neuron output $o_k^l$ (note that, under our weight convention, the inner sum runs over the weights $w_{jk}^{l+1}$ leading from neuron $k$ to the neurons $j$ of the next layer). $\delta_j^{l+1}$ is the local gradient of the $j$-th neuron in the $(l+1)$-th layer:

$$\delta_k^l = E_k^l \, f'(S_k^l) \qquad ( 6 )$$

where $l \in [1, \dots, L-1]$ and $f'(\cdot)$ is the derivative of the activation function.
b) Weight change calculation

$$\Delta w_{kj}^{l} = \mu\, \delta_k^l\, o_j^{l-1} \qquad ( 7 )$$

where $k \in [0, \dots, N_l - 1]$ and $j \in [0, \dots, N_{l-1}]$. $\Delta w_{kj}^{l}$ defines the change in the weight value of the connection from neuron $j$ in the $(l-1)$-th layer to neuron $k$ in the $l$-th layer.
c) Weight update

$$w_{kj}^{l}(n+1) = w_{kj}^{l}(n) + \Delta w_{kj}^{l}(n) \qquad ( 8 )$$

where $k \in [0, \dots, N_l - 1]$ and $j \in [0, \dots, N_{l-1}]$. $w_{kj}^{l}(n+1)$ is the updated weight to be used in the next, i.e. the $(n+1)$-th, iteration of the forward computation; $\Delta w_{kj}^{l}(n)$ is the weight change calculated in the $n$-th iteration of the backward computation; and $w_{kj}^{l}(n)$ is the weight used in the $n$-th iteration of the forward and backward computations, where $n$ is the current iteration.
5) Iteration
Repeating steps 2-4 for every example $(I, C) \in T$ is considered one global iteration. One can choose to continue training the MLP for one or more global iterations until a predefined stopping criterion is met. Once the learning phase is complete, the ANN can execute forward computations on unknown inputs.
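For concreteness, steps 2-4 can be condensed into a short software reference. The following is a minimal Python sketch of equations ( 2 )-( 8 ) under our bias-as-neuron convention; all names are ours, and it illustrates the algorithm only, not the VHDL design:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def train_step(weights, biases, I, C, mu):
        """One iteration of steps 2-4 for a single training example (I, C).
        weights[l-1][k][j] holds w_kj of layer l (l = 1..L-1); every
        non-output layer carries a bias neuron as its last element."""
        L = len(weights) + 1
        # Step 3: forward computation, eq. (2)-(4)
        outputs = [list(I) + [biases[0]]]          # input layer plus bias neuron
        for l in range(1, L):
            prev = outputs[l - 1]
            layer = [sigmoid(sum(o * w for o, w in zip(prev, row)))
                     for row in weights[l - 1]]    # eq. (2), then (3) and (4)
            if l < L - 1:
                layer.append(biases[l])            # bias neuron of a hidden layer
            outputs.append(layer)
        # Step 4a: local gradients, eq. (5)-(6); f'(S) = o(1-o) for the sigmoid
        deltas = [None] * L
        out = outputs[L - 1]
        deltas[L - 1] = [(C[k] - out[k]) * out[k] * (1.0 - out[k])
                         for k in range(len(out))]
        for l in range(L - 2, 0, -1):
            deltas[l] = []
            for k in range(len(outputs[l]) - 1):   # bias neuron needs no gradient
                err = sum(weights[l][j][k] * deltas[l + 1][j]
                          for j in range(len(deltas[l + 1])))       # eq. (5)
                o = outputs[l][k]
                deltas[l].append(err * o * (1.0 - o))               # eq. (6)
        # Steps 4b, 4c: weight change and weight update, eq. (7)-(8)
        for l in range(1, L):
            for k in range(len(deltas[l])):
                for j in range(len(outputs[l - 1])):
                    weights[l - 1][k][j] += mu * deltas[l][k] * outputs[l - 1][j]
        return outputs[L - 1]

Repeating train_step over every example $(I, C) \in T$ constitutes one global iteration, as defined in step 5.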
D. Sigmoid and its Derivative Approximation
As we mentioned in section B, every artificial neuron must implement a nonlinear activation function. Nonlinear functions are problematic in a hardware implementation; therefore, an approximation is required.
The log-sigmoid function has been widely used in state-of-the-art ANNs [1]; therefore, this type of activation function became our choice as well. We used the piecewise linear approximation described in [5], defined as follows:

$$f_{SigApprox}(x) = \begin{cases} 1 & \text{for } x > 2 \\ 0.25x + 0.5 & \text{for } -2 \le x \le 2 \\ 0 & \text{for } x < -2 \end{cases} \qquad ( 9 )$$
Note that the derivative of the sigmoid already has a convenient form: it can be computed directly from the sigmoid’s own output value, so that for $y = f(x)$,

$$f_{SigDerivative}(x) = y(1 - y) \qquad ( 10 )$$
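Both formulas translate directly into code. The following is a minimal Python sketch of ( 9 ) and ( 10 ); the hardware evaluates them in its floating-point datapath rather than in this software form:

    def sig_approx(x):
        """Piecewise linear approximation of the log-sigmoid, eq. (9)."""
        if x > 2:
            return 1.0
        if x < -2:
            return 0.0
        return 0.25 * x + 0.5

    def sig_derivative(y):
        """Sigmoid derivative computed from the sigmoid's own output y, eq. (10)."""
        return y * (1.0 - y)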
III. ARCHITECTURE OVERVIEW
As mentioned in the introduction, the design was broken down into four hierarchical levels (see Fig. 3). In this chapter, we review our architecture with respect to this hierarchical model.
The first level is the basic block of the network – the
artificial neuron. The artificial neuron is capable of forward
and backward computations as well as of storing its incoming
links’ weights and its latest output. The artificial neurons are
grouped together in layers, where every two adjacent layers
are interconnected. This type of layered structure forms the
second abstraction level, which is the artificial neural
network itself. Similarly, the ANN is capable of forward and
backward computations, where the adjacent layers activate
each other during the computational process.
The third level of abstraction deals with the board
environment and the processing system (PS) neighborhood.
This technological setup provides the input/output (I/O)
interface between the ANN and the PS, through the memory.
The top level of abstraction is the software application that runs within the PS and controls the ANN. Practically speaking, the ANN’s design must already be programmed into the programmable logic (PL) in order to be controlled by the PS.
The next chapter elaborates on how the previously mentioned abstraction is implemented.
Fig. 3. The four-level abstraction model, as used in the current architecture.
IV. IMPLEMENTATION
This chapter presents the five steps of the design process that was conducted. The work started with a high-level software
simulation. After the simulation, the design process followed
a bottom-up procedure with respect to the model described in
chapter III and Fig. 3.
A. Software Simulation
Designing a full neural architecture is a complicated
process that involves an enormous amount of algorithmic and
structural decisions. Accordingly, it is good practice to try different high-level designs prior to a detailed implementation. This approach makes it possible to compare different designs without wasting precious time on detailed implementation.
In order to test the high-level functionality, a high-level simulator was designed. The simulator was written in Java and provided an input interface for the learning set and the calculation inputs. A restriction of the simulator was that it supported only an MLP with a single hidden layer.
The high-level simulator provided the simulation of the
first two levels in the hierarchical model presented in Fig. 3.
In other words, it simulated the artificial neural network only,
without the board environment. This provided the ability to
concentrate on the algorithmic correctness of the design.
The simulator application is initiated with two sets of arguments. The first set describes the topology, i.e. the number of neurons in each layer and the bias values. The second set consists of the learning parameters, such as the training-set filename, the learning rate (µ), the number of learning iterations per example and the number of global iterations. In addition, there is a special flag that determines how the initial weights are chosen: they can be generated randomly, or extracted from the VHDL module of the implementation. The latter option is extremely useful for comparing the software simulation with the hardware implementation: by extracting the initial weights from the hardware module, the initial conditions of the software simulator and the hardware become identical.
The software simulator was trained using parts of the training sets of the “Breast-Cancer-Wisconsin” and “Lenses classification” [9] problems. After the training was complete, the simulator was tested on examples from these data sets that had not been fed in during the training phase.
The importance of the software simulation showed in several crucial design decisions, described shortly here. First, the network topology was chosen to be an MLP with its neurons fully connected between adjacent layers. Second, the back-propagation training algorithm was tested with a stopping criterion based on a fixed number of iterations. Third, the data representation was chosen to be IEEE-754 32-bit floating point; according to the studies described in [6], this data representation has shown high-precision results. Finally, the last significant decision was made regarding the sigmoid approximation, as already shown in Chapter II, section D.
Another important use of the software simulation was in validating the ANN logical design. This process is described in Section IV.C.
B. Artificial Neuron
The basic building block of our FPGA neural network is the Artificial Neuron. In this text, we use the terms Activator and Artificial Neuron interchangeably.
The activator is in charge of both the calculations and the
storage. On the one hand, the activator handles both forward
and backward computations, including the training algorithm. These computations are carried out by an efficient utilization of a single 32-bit Acc-Mult core [7].

Fig. 4. Activator's higher-level state machine. It provides two main types of functionality – the forward and the backward computation. The lower-level state machines are initiated by the FWDi and the BCKi states.

A computational
transaction is initiated by an appropriate signal from the
network control. When the activator completes the
transaction, an appropriate completion signal is output to the network, together with the activator’s valid floating-point output. On the other hand, every activator is in charge of storing its input weights,
and its latest calculated output.
The activator’s control is comprised of two hierarchical
levels. The lower level involves various state machines.
These state machines are in charge of atomic tasks, such as
the computation of the derivatives, or the sigmoid activation.
The higher control level schedules the execution of the previously mentioned lower-level machines. In fact, the higher control level was implemented as a single state machine, shown in Fig. 4. The top level of the activator’s schematic is shown in Fig. A.1.
At this point, it is important to emphasize the modularity of the components within the activator. As presented in Fig. 4, every atomic operation is encapsulated by a lower-level state machine. Therefore, any of these state machines can easily be replaced with a different one, while the rest of the activator control remains unchanged.
Another important principle to emphasize is module parameterization. Parameterized modules contribute to the reusability of the sources. As a trivial example, consider the mux unit, which was instantiated six times, each instance with a different configuration. Another example is the select unit, which is in charge of scheduling the inputs and the weights for a serial accumulation and appears in two differently configured instances. Such components are natural candidates for parameterized modules. In our design, we utilized VHDL generics in order to create parametrically defined modules.
C. Network
Our ANN’s hardware structure is divided into datapath and control components. This section elaborates on these components of the ANN.
The ANN’s datapath connects the Activator modules together and forms a complex computational network. The activators are arranged in layers, where every pair of adjacent layers has its activators fully connected. For example, given two adjacent layers 1 and 2, every activator in layer 1 connects its output to every activator in layer 2. For an additional illustration see Fig. 5, or for a detailed schematic view see Fig. A.2. Auxiliary connections between the neurons are used for error backpropagation.
During the network activity, at most one layer is active (computing) at any given moment. When all the activators in a given layer report ‘done’, the current layer is halted and the next layer begins its job. This sequential calculation method is inevitable, since every layer depends upon the previous one. This dependence appears in both the forward and backward computations, as can be observed in equations ( 2 ) and ( 5 ) respectively.
Fig. 5. ANN’s datapath is formed by interconnecting Activators. The ANN’s
controller is in charge of initiating the forward / backward computations of
the network.
The input values, such as the learning step parameter and the correct values for training, are received from the Load-Store Machine (described in the next section). These parameters are passed through the datapath directly to the activators. The outputs of the output layer are declared as the network output and are passed back to the Load-Store Machine.
The ANN’s control is a relatively simple state machine. It receives an instruction from the Load-Store Machine and initiates the network. The received instruction can be: ‘Calc’ – only the forward computation will be run on the network; or ‘Learn’ – the forward and error back-propagation computations will be run on the network.
The generation of the network structure involves an
enormous amount of wiring. It is clear that the best way to
deal with this task is scripting. A net-generating script will
not only prevent man-made mistakes, but also make it
simpler to rearrange the network structure if needed. Indeed,
every classification problem has a unique network structure
(number of layers, activators in every layer, etc.). Therefore,
there is a need for a parameterized network generator.
Using simple Python scripting, we created an automated network generator. The script is in charge of generating the VHDL code of the network datapath, the control and the Load-Store Machine. The arguments to the script are: the bias values, the number of layers in the network and a list of activator counts per layer. An additional important role of this script is the weight randomization: when the script writes out the VHDL lines, it hardwires random initial floating-point values into the ‘Weights’ module instances of every activator. The randomization is an important step towards a successful convergence of the learning phase.
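The following sketch conveys the flavor of such a generator. It is a hypothetical Python fragment: the Activator entity, generic and port names are illustrative only, and the emitted instantiation text is abridged:

    import random
    import struct

    def float_to_bits(f):
        """IEEE-754 single-precision bit pattern of f, as an unsigned integer."""
        return struct.unpack("<I", struct.pack("<f", f))[0]

    def generate_layer(l, n_prev, n_curr):
        """Emit VHDL instantiations for one fully connected layer of activators,
        hardwiring random initial weights into each instance."""
        lines = []
        for k in range(n_curr):
            init_w = ", ".join('x"%08X"' % float_to_bits(random.uniform(-1.0, 1.0))
                               for _ in range(n_prev + 1))  # +1 for the bias neuron
            lines.append('activator_%d_%d : Activator generic map (N_IN => %d)\n'
                         '  port map (initW => (%s), ...);' % (l, k, n_prev + 1, init_w))
        return "\n".join(lines)

    print(generate_layer(1, n_prev=9, n_curr=8))  # e.g. the hidden layer of our test network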
In order to validate the design, we used various test benches. The output of these test benches was compared to the software simulator (see section A) to ensure functional consistency.
D. Board Environment
The hardware platform of this research was the Xilinx ZedBoard Zynq Evaluation Kit, along with the Vivado 2014.2 and SDK 2014 CAD tools. The ZedBoard consists of the PS and the PL. The objective of this section is to describe the interface between the PS and the PL via the shared memory of the board.
In the previous section, the whole logical design of the ANN was introduced. In this section, we encapsulate the ANN’s design with a Load-Store Machine (LSM) module. The LSM module is in charge of the following: (i) receiving directives from the memory; (ii) guiding the network controller with corresponding control signals; (iii) reporting the calculation results back to the memory.
Fig. 6. Complete view of a board configuration. The interface between the
programmable logic (PL) and the board infrastructure is in the form of the
three data channels DataIn, CFG and DataOut.
The LSM is connected to the board’s infrastructure by means of three AXI ports named DataIn, Cfg and DataOut. Each of these AXI ports maps a certain region of the board’s DRAM to a single 128KB BRAM module in order to allow fast access. The port DataIn is used for feeding data samples
for the input layer of the network. The Cfg port is used for
triggering the network’s operation and configuring its
learning parameters, such as the learning step size, number of
iterations, etc. The ports DataIn and Cfg are used by LSM for
read only operations. As for the output, the port DataOut
provides the LSM with the ability to report upon a completion
of a task and to write back the values of the output layer.
The top module of the hardware design is called NeuralNetworkTop and is illustrated in Fig. 6. It is important to note that the PL fabric clock was set to 200 MHz as part of the board configuration.
E. Software Application
In the previous sections, the full logical design and the board configuration were described. The only missing piece of the puzzle is the software application that runs on the PS and supplies the actual data of our choice to the ANN. The software application’s objective is to provide an interface for a human user, letting them choose a training set, train the network and subsequently query it with new inputs. This section describes how the software application completes the whole design of the project. It is important to notice that this application runs on the PS within the board. For convenience, the board can be connected to a PC so that its console window can be observed on the PC as well.
The application was written in C, compiled for the ARM processor, and then exported as an ELF file for execution on the PS of the board. The board support packages and the hardware platform project must be compiled along with the application. The compilation and export were carried out using the SDK 2014 tool, which is part of the Xilinx evaluation kit described in section D.
The functionality of the software application is as follows: (i) copy the training data set and the training configuration to the DataIn and Cfg regions respectively; (ii) initiate the learning job and wait for completion; (iii) read a calculation input from the user and initiate a calculation job; (iv) display the calculation results and return to step (iii). A demonstration of an application run is shown in Fig. 7.
Fig. 7. Console view of running the software application and entering one calculation example (the 9 floating-point numbers). The calculation session on the artificial neural network output the values 0.0, 1.0.
V. TESTS
A. Test Setup
The hardware platforms for the analysis were the ZedBoard Zynq Evaluation Kit and a PC equipped with an Intel i7 3770 CPU @ 3.4 GHz and 16GB of RAM.
The neural problem of our choice was “Breast-Cancer-Wisconsin” [9], described as follows: given nine parameters of a patient, predict whether the patient has a benign or a malignant breast tumor. Our neural network was synthesized as an MLP with one hidden layer, having nine input, eight hidden and two output neurons. The outputs were encoded as “0, 1” for “benign” and “1, 0” for “malignant”. The bias values were set to 1.0 in each layer.
Two main characteristics were analyzed. The first
characteristic was the precision of the neural computation,
whereas the second characteristic was the speed of the
training and calculation phases. These two metrics are described in sections B and C, respectively.
B. Precision
The precision stands for the fraction of correct answers returned by the neural network. For the precision analysis, we chose a network trained with 260 training examples of the Breast-Cancer training set. The training configuration was set to 2000 global iterations and 5 iterations per example; the training algorithm’s step size was 0.2.
In order to challenge the network, it was queried with 50 input examples that had not appeared in the training set. Since the outputs of the network were not always the round values 1.0 and 0.0, we decided to treat outputs below 0.5 as 0.0 and outputs above 0.5 as 1.0. Eventually, the test showed 48 correctly answered queries, therefore the precision was 96%.
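The thresholding rule amounts to the following (a minimal Python sketch):

    def threshold(outputs):
        """Round each network output to 0.0 or 1.0 around the 0.5 threshold."""
        return [1.0 if o > 0.5 else 0.0 for o in outputs]

    print(threshold([0.93, 0.08]))  # [1.0, 0.0], i.e. the "malignant" code
    print(48 / 50)                  # 0.96 -> 96% precision over the 50 queries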
C. Speed
When evaluating the computational speed of an ANN, the most commonly used metrics are MCPS and MCUPS [8]. The processing speed, i.e. the number of multiply-and-accumulate operations performed in unit time, is measured in MCPS (Millions of Connections Per Second). The learning speed, which is the rate of weight updates, is measured in MCUPS (Millions of Connection Updates Per Second).
The performance in terms of MCPS and MCUPS is calculated while ignoring external effects that might limit the system performance. Assuming a 200 MHz PL clock, the above metrics were extracted from the simulation waveforms presented in Fig. 8.
Fig. 8. A waveform of the time interval between the assertion of fwd_0
signal (forward computation initiation) and the reception of f_done_0
signal (report of completion of a forward computation).
As can be seen in Fig. 8, the interval between the forward computation’s initiation signal (fwd_0) and the completion report (f_done_0) lasts 9.7 µs. During this interval, the hidden layer, which consists of 8 activators, processes 9 inputs + 1 bias input. Recall that this design uses a fully connected topology between the input and hidden layers, therefore 80 connection operations were made during the hidden layer’s forward computation. Thus, the processing speed is given by:

$$\text{Processing Speed} = \frac{80}{9.7\ \mu\text{s}} = 8.24\ \text{MCPS}$$
The next category to be tested is the learning speed. It is important to note that the learning speed is measured only for the operation described by equation ( 8 ), i.e. the weight update. This specific operation takes place during state ‘7’ (or ‘BCK4’, as it appears in Fig. 4) of the activator’s state machine. By observing Fig. 9, we derive that the hidden layer’s weight update takes 14.65 µs.
Fig. 9. A waveform of the time period during which the activator control stays in state ‘7’, which is responsible for the weight update.
Similarly to the forward computation, there were 80 connections, and therefore 80 weight values were updated during this interval. Therefore, the derived learning speed is given by:

$$\text{Learning Speed} = \frac{80}{14.65\ \mu\text{s}} = 5.46\ \text{MCUPS}$$
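Both figures follow directly from the definitions (a small Python check of the arithmetic):

    def mega_ops_per_second(operations, seconds):
        """Connection operations (or updates) per second, in millions."""
        return operations / seconds / 1e6

    print(mega_ops_per_second(80, 9.7e-6))    # ~8.24 -> 8.24 MCPS (forward)
    print(mega_ops_per_second(80, 14.65e-6))  # ~5.46 -> 5.46 MCUPS (weight update)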
VI. CONCLUSIONS
A. Goals vs. Results
The initial goal of the research was to create a flexible neural design. As the development went on, additional goals appeared, such as the generalization of the design. Eventually, the design became not only efficient, but also modular and extensible. The modularity provided the ability to easily assemble any neural structure, with any number of activators and layers.
The functionality of the architecture proved to be correct in the conducted tests. First, in order to ensure basic correctness, limited tests were conducted on small training sets. Afterwards, more comprehensive tests were run in order to calculate the precision rate of the architecture; these tests showed a precision of 96%.
As for the performance, the design was tested with a limited number of activators (8 in the hidden layer). It showed performance similar to the designs that appear in [8].
B. Improvement Suggestions and Further Work
In this section, we enumerate suggestions for future work that can rely on our current design.
Deep Learning. Our design fits deep neural architectures: it can be instantiated as a DNN, providing a platform for advanced research such as [12].
Making the state machines more compact. In the current design, some state machines use excess states to ensure robustness. These redundant states create more transitions during the execution, resulting in longer latencies. Additional effort can be made to examine the necessity of these redundant states.
Avoiding multiplication in non-arithmetic blocks. The synthesis process creates “parasitic” multipliers in specific areas of the design. These parts should be investigated, and the HDL code there should be rewritten in a form that does not trigger the synthesis of a multiplier. The goal is to have only one multiplier per activator module.
Fixed-point accuracy. The neural computations are performed mostly on fractional numbers between -1 and 1. In our design we used a floating-point format, which requires 32 bits per numeric value. Rather than floating point, a less flexible format can be used; for example, the Q15 fixed-point format requires only 16 bits and still supplies good precision.
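For illustration, Q15 encoding and decoding amount to a simple scaling. The following is a minimal Python sketch; Q15 stores a value in [-1, 1) as a signed 16-bit integer with 15 fractional bits:

    def to_q15(x):
        """Encode a real value in [-1, 1) as a signed 16-bit Q15 integer."""
        return max(-32768, min(32767, int(round(x * 32768))))

    def from_q15(q):
        """Decode a signed Q15 integer back to a real value."""
        return q / 32768.0

    assert abs(from_q15(to_q15(0.2)) - 0.2) <= 2 ** -16  # rounding error bound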
Utilizing on-chip learning. The idea of on-chip learning was not implemented in our design. It implies running the learning phase prior to optimizing the implementation; hopefully, this should reduce the number of actual connections for FPGA programming [10].
Utilizing DSP features. Features such as multiply-accumulate (MAC) [11] are likely to replace the sequential forward computations. Another example is the dot product, which can be useful for the weight update. Today, these DSP features are already embedded in the evaluation boards. Hence, additional research can be conducted in order to utilize these features as part of the neural network.
Reducing the number of special registers in the activator. Registers such as mudelta and error store intermediate calculation results. These registers can be united into a single register that stores only the last calculation.
APPENDIX
Fig. A.2. Schematic view of an artificial neural network. This specific network is comprised of 3 input neurons, followed by 3 hidden neurons, followed by
2 output neurons. Additional logic between the layers is added in order to trigger the next layer computation.
Fig. A.1. Artificial neuron's top-level schematic. The main control is implemented in Activator_Ctl. The secondary state machines are implemented in sigmoid, slope_calc_ctl, mult_ctl, Weights, Select_f and Select_b. The ACC_MULT unit is the core of the floating-point computations. The storage takes place in the Weights unit.
ACKNOWLEDGMENT
We would like to thank Avi Efrati, the VLSI lab manager, for introducing this research to us and for his outstanding assistance during the development. We also thank Guy Lampert, the Xilinx application engineer, for his assistance during the board configuration stage. In addition, we thank Xilinx for its cooperation with Tel Aviv University and for supplying the ZedBoard evaluation kits.
REFERENCES
[1] A. R. Omondi and J. C. Rajapakse, Eds., FPGA Implementations of Neural Networks. New York, NY, USA: Springer, 2006.
[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, 1986.
[3] J. Misra and I. Saha, “Artificial neural networks in hardware: A survey of two decades of progress,” Neurocomputing, vol. 74, no. 1, pp. 239–255, 2010.
[4] A. Gomperts, A. Ukil, and F. Zurfluh, “Development and implementation of parameterized FPGA-based general purpose neural networks for online applications,” IEEE Trans. Ind. Informat., vol. 7, no. 1, pp. 78–89, 2011.
[5] K. Basterretxea, J. M. Tarela, and N. Mastorakis, “Approximation of sigmoid function and the derivative for artificial neurons,” in Advances in Neural Networks and Applications. Athens, Greece: WSES Press, 2001, pp. 397–401.
[6] S. Sahin, Y. Becerikli, and S. Yazici, “Neural network implementation in hardware using FPGAs,” in Neural Information Processing. Berlin, Heidelberg: Springer, 2006, pp. 1105–1112.
[7] R. Remadevi, “Design and simulation of floating point multiplier based on VHDL,” International Journal of Engineering Research and Applications, vol. 3, no. 2, pp. 283–286, 2013.
[8] P. Ienne, “Architectures for neuro-computers: Review and performance evaluation,” EPFL, Lausanne, Switzerland, Tech. Rep. 93/21, 1993.
[9] K. Bache and M. Lichman, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA, USA: University of California, School of Information and Computer Science, 2013.
[10] C.-J. Lin and C.-Y. Lee, “FPGA implementation of a recurrent neural fuzzy network with on-chip learning for prediction and identification applications,” Journal of Information Science and Engineering, vol. 25, no. 2, pp. 575–589, 2009.
[11] N. Nedjah et al., “Dynamic MAC-based architecture of artificial neural networks suitable for hardware implementation on FPGAs,” Neurocomputing, vol. 72, no. 10, pp. 2171–2179, 2009.
[12] Y. Taigman, M. Yang, M. A. Ranzato, and L. Wolf, “Web-scale training for face identification,” arXiv preprint arXiv:1406.5266, 2014.

Artificial neural networks and its application
Artificial neural networks and its applicationArtificial neural networks and its application
Artificial neural networks and its application
 
Optimization of Number of Neurons in the Hidden Layer in Feed Forward Neural ...
Optimization of Number of Neurons in the Hidden Layer in Feed Forward Neural ...Optimization of Number of Neurons in the Hidden Layer in Feed Forward Neural ...
Optimization of Number of Neurons in the Hidden Layer in Feed Forward Neural ...
 
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
 
Artificial neural networks and its applications
Artificial neural networks and its applications Artificial neural networks and its applications
Artificial neural networks and its applications
 
Fuzzy Logic Final Report
Fuzzy Logic Final ReportFuzzy Logic Final Report
Fuzzy Logic Final Report
 
N ns 1
N ns 1N ns 1
N ns 1
 
Deep learning notes.pptx
Deep learning notes.pptxDeep learning notes.pptx
Deep learning notes.pptx
 
Lesson 37
Lesson 37Lesson 37
Lesson 37
 
AI Lesson 37
AI Lesson 37AI Lesson 37
AI Lesson 37
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
 

Artificial Neural Network Implementation on FPGA – a Modular Approach

K. Berestizhevsky and R. Levy

One significant work was conducted in [4] and provided additional inspiration for our work. Hence, in this paper, we describe a modular method for designing an ANN on an FPGA. The idea presented in this paper relies on a hierarchical approach: we break the structure of the ANN down into hierarchical levels of abstraction. The base level is the single artificial neuron. The following levels each encapsulate one or more lower-level components and lead the development towards the top level of the abstraction, which is the software application that controls the global actions of the network.

The paper is organized as follows. Chapter II describes the background behind the hardware platform used, the ANN theory and the main algorithms used. Chapter III presents the hierarchical approach for a neural network's design process. Chapter IV extends this approach to concrete development steps. Chapter V presents the conducted tests and the results.
Chapter VI concludes the study and suggests further improvements.

II. BACKGROUND

A. FPGA Technology
Field Programmable Gate Arrays (FPGAs) are programmable devices that can be configured to function as custom digital circuits. The configuration of an FPGA device is done by means of hardware description languages such as Verilog or VHDL. Thus, by describing the logic and programming the FPGA with it, one can obtain ASIC-like hardware. The FPGA hardware consists of a 2D array of configurable logic blocks (CLBs) that can be set to implement specific logical functions. The connections between these CLBs are configurable as well. Furthermore, every logical function can be programmed into the FPGA as long as there are enough CLBs and the connectivity between them is available.

The fact that an FPGA utilizes generic CLBs, located in fixed positions on a chip, imposes a significant drawback: every custom design is forced to fit this fixed arrangement of blocks. As a result, an FPGA design will always have much poorer performance than the same logic on a properly designed ASIC. Nevertheless, the development time of an FPGA design is measured in weeks, whereas ASIC development times are measured in months. Moreover, an appropriately parameterized FPGA design can be reconfigured and re-programmed into the chip in a matter of minutes. In our study, we used the Xilinx Zynq-7000 as the FPGA platform. This platform is equipped with a dual-core ARM Cortex-A9 and provides extensive programming abilities as well as a convenient interface with a PC.

B. Artificial Neural Networks
Artificial neural networks (ANNs) are computational models for solving various problems. These models can be implemented both in software and in hardware. During the last decades, ANNs have been used for engineering purposes such as studying behavior and control in animals and machines, pattern recognition, forecasting, and data compression.

First, we present a few important definitions:
ANN – Artificial Neural Network. Consists of artificial neurons ordered in layers; the signals in the network flow through these layers (see Fig. 2).
Artificial Neuron = "activator" = "node" of the ANN. A general activator consists of multiple input ports, a summing junction, an activation function (threshold, ramp, sigmoidal, etc.) and one output port (see Fig. 1).
MLP – multilayer perceptron. A form of feed-forward neural network consisting of an input layer followed by two or more layers of neurons, the last of which is the output layer.
DNN – deep neural network. Another way to refer to an MLP with more than a single hidden layer.
Learning Phase – the period of time in which the ANN determines its learnable parameters, based on a learning set. This phase involves the training algorithm.
Calculation Phase – the period of time in which the ANN is fed with inputs and the forward computation on these inputs generates the network's output.

In nature, real neurons receive electrical signals through synapses. When the received signals surpass a threshold, the neuron emits a signal through its axon. The artificial neuron attempts to replicate this natural behavior. The electrical signals appear in the form of digital inputs, whose strength is their numeric value multiplied by the weight of the corresponding input edge. The activation of an artificial neuron is done by applying a nonlinear function (such as a sigmoid) to the accumulated value of the neuron's inputs.
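For illustration, the following minimal Python sketch (ours, not part of the original design, which is in VHDL) reproduces this behavior of a single artificial neuron, using the log-sigmoid activation defined later in eq. (4):

import math

def log_sigmoid(s):
    # Log-sigmoid activation, as in eq. (4) below.
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(inputs, weights, activation=log_sigmoid):
    # Weighted sum of the inputs, passed through the nonlinear activation.
    s = sum(x * w for x, w in zip(inputs, weights))
    return activation(s)

# Example: a neuron with three inputs.
print(neuron_output([0.5, -1.0, 0.25], [0.1, 0.4, -0.3]))  # ~0.395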
The final calculated value of an artificial neuron appears on its output port. Fig. 1 presents the mathematical abstraction of an artificial neuron that receives inputs denoted by $x_0$ to $x_{n-1}$ and outputs the value denoted by $y$.

Fig. 1. Generic structure of an artificial neuron with n inputs ($x_0, \dots, x_{n-1}$). These inputs are multiplied by the corresponding weights ($w_0, \dots, w_{n-1}$), summed up and passed through an activation function f.

The ANN is comprised of interconnected layers, where each layer consists of a certain number of artificial neurons. During the calculation phase, the ANN is activated in a procedure called forward computation. In this form of activity, the signals flow from the input layer through one or more hidden layers; the last computations are carried out in an output layer. Fig. 2 presents a schematic chart of an ANN with L layers and N neurons in each layer. The general structure of the ANN can have a variable number of artificial neurons in its layers.
Fig. 2. Generic structure of a feedforward ANN with L layers and N artificial neurons in each layer.

C. Backpropagation Algorithm
Prior to executing any forward computation, the ANN should be trained. The training algorithm of our choice is the "Back-Propagation" algorithm [2]. It is one of the most common models of ANNs and many others are based on it. In our design, we slightly modified the algorithm flow described in [1]: our change to the original algorithm comes in the form of treating the biases as actual neurons.

In order to demonstrate the backpropagation algorithm, we consider an MLP similar to the one presented in Fig. 2. The MLP consists of $L$ layers having $N_l$ neurons in every $l$-th layer. The execution of the error backpropagation algorithm is an iterative process; in our context, $n$ denotes the current iteration. The algorithm itself consists of the following five steps:

1) Initialization of parameters
Prior to training, the following parameters must be set:
$\mu$ is the learning rate, a constant scaling factor used in the error correction during each iteration of the algorithm.
$w_{kj}^l(n=0)$ is the weight of the connection from neuron $j$ in the $(l-1)$-th layer to the $k$-th neuron in the $l$-th layer. This weight is modified in every iteration, where iteration $n=0$ stands for the initialization.
$\theta^l$ is the value of the bias of the $l$-th layer. A bias can be thought of as an additional neuron that has no inputs and always outputs a constant value. In our model, a bias neuron is placed as the last neuron of the input layer and of every hidden layer as well.
All of the mentioned parameters are chosen based on heuristics. For instance, the typical bias value is around 1.0, the algorithm step is a fractional number from the interval (0,1), and the weights can be initialized randomly to values between -1 and 1.

2) Training Example Input
For an ANN with $N_0$ inputs and $N_{L-1}$ outputs, the training set $T$ is defined as follows:

$T \triangleq \{(I, C) \mid I \in \mathbb{R}^{N_0},\ C \in \mathbb{R}^{N_{L-1}}\}$   (1)

where $C$ is the output vector corresponding to an input vector $I$. In every iteration, one training example $(I, C) \in T$ is fed into the network.

3) Forward Computation
Data signals from the neurons of the previous, $(l-1)$-th, layer are propagated towards the neurons in the $l$-th layer. During that process, each neuron in the hidden and output layers calculates the weighted sum (2) of its inputs:

$S_k^l = \sum_{j=0}^{N_{l-1}} o_j^{l-1} w_{kj}^l$   (2)

where $l \in [1, \dots, L-1]$ denotes the layer number and $k \in [0, \dots, N_l - 1]$ denotes the neuron number. $N_{l-1}$ is the number of neurons in the $(l-1)$-th layer, not including the bias neuron. $S_k^l$ is the weighted sum of the $k$-th neuron in the $l$-th layer, $w_{kj}^l$ is the weight as defined in step 1), and $o_j^{l-1}$ is the output of the $j$-th neuron in the $(l-1)$-th layer.

As we have already mentioned, our algorithm treats the biases as if they were actual neurons. Therefore, the bias values that are passed to the next layer are multiplied by the corresponding weights. The advantage of this strategy is that the biasing becomes adjustable during the learning phase. To sum up, the output of each neuron is as follows:

$o_k^l = \begin{cases} f(S_k^l) & \text{for } k < N_l \\ \theta^l & \text{for } k = N_l \text{ (bias neuron)} \end{cases}$   (3)

where $k \in [0, \dots, N_l]$ and $l \in [1, \dots, L-1]$. $o_k^l$ is the output of the $k$-th neuron in the $l$-th layer, $\theta^l$ is the bias of the $l$-th layer, and $f(S_k^l)$ is the activation function, which modifies the weighted sum $S_k^l$ by passing it through a nonlinear operation.
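As a concrete reading of equations (2) and (3), the following Python sketch (ours; the actual implementation is in VHDL, and the list-of-lists weight layout is an assumption) computes the forward pass of one layer, treating the bias as an extra neuron:

def layer_forward(prev_outputs, weights, bias, n_l, f):
    # Forward computation of one layer, eqs. (2)-(3).
    # prev_outputs : outputs o_j^{l-1} of the previous layer, of length
    #                N_{l-1}+1, whose last entry is that layer's bias neuron.
    # weights      : weights[k][j] = w_kj^l for the k-th neuron of this layer.
    # bias         : theta^l, the constant output of this layer's bias neuron.
    # n_l          : N_l, the number of regular (non-bias) neurons here.
    # f            : activation function.
    outputs = []
    for k in range(n_l):
        s_k = sum(o * w for o, w in zip(prev_outputs, weights[k]))  # eq. (2)
        outputs.append(f(s_k))                                      # eq. (3), k < N_l
    outputs.append(bias)                                            # eq. (3), bias neuron
    return outputs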
Our algorithm uses a unipolar sigmoid as the activation function; specifically, the log-sigmoid function (4) is used:

$f(x) = \frac{1}{1 + e^{-x}}$   (4)

4) Backward Computation
The backward computation compares the current abilities of the network with the expected result and consequently induces the appropriate adjustment of the network's weights. To put it another way, the algorithm tries to minimize the error between the correct value $C$ and the actual output value that was determined during the forward computation. The backward computation begins at the output layer and proceeds towards the input layer. During the backward computation, the following steps are performed:

a) Local gradient calculation

$E_k^l = \begin{cases} C_k - o_k^l & l = L-1 \\ \sum_{j=0}^{N_{l+1}-1} w_{jk}^{l+1} \delta_j^{l+1} & l \in [1, \dots, L-2] \end{cases}$   (5)

where $E_k^l$ is the calculated error of the $k$-th neuron in the $l$-th layer; at the output layer it is the difference between the correct value $C_k$ and the actual neuron output $o_k^l$. $\delta_j^{l+1}$ is the gradient of the $j$-th neuron in the $(l+1)$-th layer:

$\delta_k^l = E_k^l f'(S_k^l)$   (6)

where $l \in [1, \dots, L-1]$ and $f'(\cdot)$ is the derivative of the activation function.

b) Weight change calculation

$\Delta w_{kj}^l = \mu \delta_k^l o_j^{l-1}$   (7)

where $k \in [0, \dots, N_l - 1]$ and $j \in [0, \dots, N_{l-1}]$. $\Delta w_{kj}^l$ defines the change in the weight of the connection from neuron $j$ in the $(l-1)$-th layer to neuron $k$ in the $l$-th layer.

c) Weight update

$w_{kj}^l(n+1) = w_{kj}^l(n) + \Delta w_{kj}^l(n)$   (8)

where $k \in [0, \dots, N_l - 1]$ and $j \in [0, \dots, N_{l-1}]$. $w_{kj}^l(n+1)$ is the updated weight to be used in the next, $(n+1)$-th, iteration of the forward computation; $\Delta w_{kj}^l(n)$ is the weight change calculated in the $n$-th (current) iteration of the backward computation; and $w_{kj}^l(n)$ is the weight used in the $n$-th iteration of the forward and backward computations.

5) Iteration
Repeating steps 2-4 for every example $(I, C) \in T$ is considered one global iteration. One can choose to continue training the MLP for one or more global iterations until a predefined stopping criterion is met. Once the learning phase is complete, the ANN can execute forward computations on unknown inputs.

D. Sigmoid and its Derivative Approximation
As mentioned in section B, every artificial neuron must implement a nonlinear activation function. Nonlinear functions are problematic in hardware implementations, therefore an approximation is required. Log-sigmoid functions have been widely used in many state-of-the-art ANNs [1], so this type of activation function became our choice as well. We used the piecewise-linear approximation described in [5], defined as follows:

$f_{SigApprox}(x) = \begin{cases} 1 & \text{for } x > 2 \\ 0.25x + 0.5 & \text{for } -2 \le x \le 2 \\ 0 & \text{for } x < -2 \end{cases}$   (9)

Note that the derivative of the sigmoid already has a convenient form when expressed in terms of the sigmoid's output value $x$:

$f_{SigDerivative}(x) = x(1-x)$   (10)
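A compact software rendering of equations (5)-(10) is given below. This Python sketch is ours, written only to make the index bookkeeping explicit; the data layout is an assumption, and the deltas of the whole network are computed before any weight is touched, which is the standard ordering:

MU = 0.2  # learning rate; the value used later in the paper's tests

def sig_approx(x):
    # Piecewise-linear approximation of the log-sigmoid, eq. (9).
    if x > 2:
        return 1.0
    if x < -2:
        return 0.0
    return 0.25 * x + 0.5

def sig_derivative(o):
    # Sigmoid derivative expressed via the sigmoid's output value, eq. (10).
    return o * (1.0 - o)

def backward_step(outputs, weights, correct, mu=MU):
    # outputs[l][k] = o_k^l for every layer (bias outputs included);
    # weights[l][k][j] = w_kj^l, with weights[0] unused (input layer);
    # correct = the target vector C of the current training example.
    L = len(outputs)
    deltas = [None] * L
    for l in range(L - 1, 0, -1):                    # local gradients, eqs. (5)-(6)
        deltas[l] = []
        for k in range(len(weights[l])):             # regular (non-bias) neurons only
            if l == L - 1:
                e = correct[k] - outputs[l][k]       # output layer
            else:
                e = sum(weights[l + 1][j][k] * deltas[l + 1][j]
                        for j in range(len(weights[l + 1])))  # hidden layers
            deltas[l].append(e * sig_derivative(outputs[l][k]))
    for l in range(1, L):                            # weight updates, eqs. (7)-(8)
        for k in range(len(weights[l])):
            for j in range(len(weights[l][k])):      # j = N_{l-1} is the bias weight
                weights[l][k][j] += mu * deltas[l][k] * outputs[l - 1][j]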
III. ARCHITECTURE OVERVIEW
As mentioned in the introduction, the design was broken down into four hierarchical levels (see Fig. 3). In this chapter, we review our architecture with respect to this hierarchical model.

The first level is the basic block of the network – the artificial neuron. The artificial neuron is capable of forward and backward computations, as well as of storing its incoming links' weights and its latest output. The artificial neurons are grouped together in layers, where every two adjacent layers are interconnected. This layered structure forms the second abstraction level, which is the artificial neural network itself. Similarly, the ANN is capable of forward and backward computations, where the adjacent layers activate each other during the computational process.
The third level of abstraction deals with the board environment and the processing system (PS) neighborhood. This technological setup provides the input/output (I/O) interface between the ANN and the PS, through the memory. The top level of abstraction is the software application that runs within the PS and controls the ANN. Practically speaking, the ANN's design must already be programmed into the programmable logic (PL) in order to be controlled by the PS. The next chapter elaborates on how the previously mentioned abstraction is implemented.
Fig. 3. The 4-level abstraction model, as it is used in the current architecture.

IV. IMPLEMENTATION
This chapter presents the five steps of the design process that was conducted. The work started with a high-level software simulation. After the simulation, the design process followed a bottom-up procedure with respect to the model described in chapter III and Fig. 3.

A. Software Simulation
Designing a full neural architecture is a complicated process that involves an enormous amount of algorithmic and structural decisions. Accordingly, it is good practice to try different high-level designs prior to a detailed implementation. This approach provides the ability to compare different designs without wasting precious time on detailed implementation. In order to test the high-level functionality, a high-level simulator was designed. The simulator was written in the Java language and provided an input interface for the learning set and the calculation inputs. The restriction of the simulator was that it supported only an MLP with a single hidden layer.

The high-level simulator covered the first two levels of the hierarchical model presented in Fig. 3. In other words, it simulated the artificial neural network only, without the board environment. This provided the ability to concentrate on the algorithmic correctness of the design. The simulator application is initiated with two sets of arguments. The first set of arguments describes the topology, i.e. the number of neurons in each layer and the bias values. The second set of arguments consists of the learning parameters, such as the training-set filename, the learning rate (µ), the number of learning iterations per example and the number of global iterations. In addition, a special flag determines how the weight initialization is made: the initial weights can either be generated randomly, or extracted from the VHDL module of the implementation. The latter option is extremely useful for comparing the software simulation with the hardware implementation, since by extracting the initial weights from the hardware module, the initial conditions of the software simulator and of the hardware become identical.

The software simulator was trained using parts of the training sets of the "Breast-Cancer-Wisconsin" and "Lenses classification" [9] problems. After the training was complete, the simulator was tested on examples from these datasets that had not been fed to it during the training phase.

The importance of the software simulation manifested itself in several crucial design decisions. First, the network topology was chosen to be an MLP with its neurons fully connected between adjacent layers. Second, the back-propagation training algorithm was adopted with a stopping criterion based on a fixed number of iterations. Third, the data representation was chosen to be the IEEE-754 32-bit floating point; according to the studies described in [6], this representation has shown high-precision results. Finally, the last significant decision was made regarding the sigmoid approximation, as already shown in chapter II, section D. Another important use of the software simulation was the validation of the ANN logical design. This process is described at the end of section C.
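To make the two argument sets concrete, here is a hypothetical run configuration in Python (the key names and the file name are ours, not the simulator's actual flags; the numeric values are the ones used in the paper's tests):

topology = {
    "neurons_per_layer": [9, 8, 2],   # input, hidden, output
    "bias_values": [1.0, 1.0],        # one bias per non-output layer
}
learning = {
    "training_set": "breast-cancer-wisconsin.data",  # illustrative name
    "learning_rate": 0.2,             # mu
    "iterations_per_example": 5,
    "global_iterations": 2000,
    "weights_from_vhdl": True,        # extract initial weights from the VHDL module
}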
B. Artificial Neuron
The basic building block of our FPGA neural network is the artificial neuron. In this text, we use the terms Activator and Artificial Neuron interchangeably.

The activator is in charge of both calculation and storage. On the one hand, the activator handles both the forward and the backward computations, including the training algorithm. These computations are carried out by efficient utilization of a single 32-bit Acc-Mult core [7].

Fig. 4. Activator's higher-level state machine. It provides two main types of functionality – the forward and the backward computation. The lower-level state machines are initiated by the FWDi and the BCKi states.
A computational transaction is initiated by an appropriate signal from the network control. When the activator completes the transaction, an appropriate signal is output to the net, together with a valid floating-point activator output. On the other hand, every activator is in charge of storing its input weights and its latest calculated output.

The activator's control is comprised of two hierarchical levels. The lower level involves various state machines that are in charge of atomic tasks, such as the computation of the derivatives or the sigmoid activation. The higher control level schedules the execution of the previously mentioned lower-level machines; it was implemented by a single state machine, shown in Fig. 4. The top level of the activator's schematic is shown in Fig. A.1.

At this point, it is important to emphasize the modularity of the components within the activator. As presented in Fig. 4, every atomic operation is encapsulated by a lower-level state machine. Therefore, any of these state machines can easily be replaced with a different one, while the rest of the activator control remains unchanged. An additional important principle to emphasize is module parameterization. Parameterized modules contribute to the reusability of the sources. As a trivial example, consider the mux unit, which was instantiated six times, each instance with a different configuration. Another example is the select unit, which is in charge of scheduling the inputs and the weights for a serial accumulation and appears in two differently configured instances. Such components ought to be designed as parameterized modules; in our design, we utilized VHDL generics in order to create parametrically defined modules.

C. Network
Our ANN's hardware structure was divided into datapath and control components. This section elaborates on these components.

The ANN's datapath connects the Activator modules together and forms a complex computational network. The activators are arranged in layers, where every pair of adjacent layers has its activators fully connected. For example, given two adjacent layers 1 and 2, every activator in layer 1 connects its output to every activator in layer 2. For an additional illustration see Fig. 5, or for a detailed schematic view see Fig. A.2. Auxiliary connections between the neurons are used for the error back-propagation.

During the network activity, at most one layer is active (computing) at any given moment. When all the activators in a given layer report 'done', the current layer is halted and the next layer begins the job. This sequential calculation method is inevitable, since every layer depends upon the previous one. This dependence appears both in the forward and in the backward computations, and can be observed in equations (2) and (5) respectively.

Fig. 5. ANN's datapath is formed by interconnecting Activators.
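The following toy Python model (ours, purely a software stand-in for the hardware handshake) mimics this layer-by-layer sequencing: a layer is a list of activator callbacks, and the next layer is started only after every activator of the current layer has reported completion.

def run_network(layers):
    # layers: list of layers; each layer is a list of zero-argument
    # callables that perform one activator's computation and return True
    # when done (a software stand-in for the hardware 'done' signal).
    for layer in layers:
        done_flags = [activator() for activator in layer]
        assert all(done_flags)  # the next layer may start only now

# Example with dummy activators that simply report completion.
run_network([[lambda: True] * 8, [lambda: True] * 2])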
The ANN's controller is in charge of initiating the forward and backward computations of the network. The input values, such as the learning step parameter and the correct values for training, are received from the Load-Store Machine (described in the next section) and passed through the datapath directly to the activators. The outputs of the output layer are declared as the network output and are passed back to the Load-Store Machine.

The ANN's control is a relatively simple state machine. It receives an instruction from the Load-Store Machine and initiates the network. The received instruction can be 'Calc', in which case only the forward computation is run on the network, or 'Learn', in which case the forward computation and the error back-propagation are run.

The generation of the network structure involves an enormous amount of wiring. It is clear that the best way to deal with this task is scripting: a net-generating script not only prevents man-made mistakes, but also makes it simpler to rearrange the network structure when needed. Indeed, every classification problem has a unique network structure (number of layers, activators in every layer, etc.), hence the need for a parameterized network generator. Using simple Python scripting, we created an automated network generator. The script is in charge of generating the VHDL code of the network datapath, the control and the Load-Store Machine. The arguments to the script are the bias values, the number of layers in the network and a list of activator counts per layer. An additional important role of this script is the weight randomization: when the script writes down the VHDL lines, it hardwires the initial random floating-point values into the 'Weights' module instances of every activator. The randomization is an important step towards a successful convergence of the learning phase.
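The script itself is not reproduced in the paper. The fragment below is our own minimal sketch of such a generator, with invented entity and generic names, illustrating only the two roles described above (structural generation and hardwiring of random initial weights):

import random

def generate_network(bias_values, neurons_per_layer, out_file="network.vhd"):
    # Emit one Activator instantiation per neuron of every non-input layer;
    # the real script also emits the control and the Load-Store Machine.
    lines = []
    for l in range(1, len(neurons_per_layer)):
        n_prev = neurons_per_layer[l - 1] + 1          # +1 for the bias neuron
        for k in range(neurons_per_layer[l]):
            # Hardwire random initial weights in [-1, 1] into each Activator.
            weights = [random.uniform(-1.0, 1.0) for _ in range(n_prev)]
            lines.append(f"-- layer {l}, activator {k}: initial weights {weights}")
            lines.append(f"act_{l}_{k}: entity work.Activator generic map "
                         f"(N_INPUTS => {n_prev});")
    with open(out_file, "w") as fh:
        fh.write("\n".join(lines))

generate_network(bias_values=[1.0, 1.0], neurons_per_layer=[9, 8, 2])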
In order to validate the design, we used various test benches. The output of these test benches was compared to the software simulator (see section A) to ensure functional consistency.

D. Board Environment
The hardware platform of the research was the Xilinx ZedBoard Zynq Evaluation Kit, along with the Vivado 2014.2 and SDK 2014 CAD tools. The ZedBoard consists of the PS and the PL. The objective of this section is to describe the interface between the PS and the PL via the shared memory of the board.

In the previous section, the whole logical design of the ANN was introduced. In this section, we encapsulate the ANN's design in a Load-Store Machine (LSM) module. The LSM module is in charge of the following: (i) receiving directives from the memory; (ii) guiding the network controller with the corresponding control signals; (iii) reporting the calculation results back to the memory.

Fig. 6. Complete view of the board configuration. The interface between the programmable logic (PL) and the board infrastructure is in the form of the three data channels DataIn, CFG and DataOut.

The LSM is connected to the board's infrastructure by means of three AXI ports named DataIn, Cfg and DataOut. Each of these AXI ports maps a certain region of the board's DRAM to a single 128KB BRAM module in order to allow fast access. The port DataIn is used for feeding data samples to the input layer of the network. The Cfg port is used for triggering the network's operation and configuring its learning parameters, such as the learning step size, the number of iterations, etc. The ports DataIn and Cfg are used by the LSM for read-only operations. As for the output, the port DataOut provides the LSM with the ability to report the completion of a task and to write back the values of the output layer. The top module of the hardware design is called NeuralNetworkTop and is illustrated in Fig. 6. It is important to point out that the PL fabric clock was set to 200MHz as a part of the board configuration.

E. Software Application
In the previous sections, a full logical design and a board configuration were described. The only missing part of the puzzle is the software application that runs on the PS and supplies the actual data of our choice to the ANN. The software application's objective is to provide an interface for a human user, letting the user choose a training set, train the network and consequently query it with new inputs. This section describes how the software application completes the whole design of the project.

It is important to notice that this application runs on the PS within the board. For our convenience, the board can be connected to a PC so that its console window can be observed on the PC as well. The application was written in the C language, compiled for the ARM processor, and then exported as an ELF file for execution on the PS of the board. The board support packages and the hardware platform project must be compiled together with the application. The compilation and the export were carried out using the SDK 2014 tool, which is a part of the Xilinx evaluation kit described in section D.

The functionality of the software application is as follows: (i) copy the training dataset and the training configurations to the DataIn and Cfg regions respectively; (ii) initiate the learning job and wait for its completion; (iii) read a calculation input from the user and initiate a calculation job; (iv) display the calculation results and return to step (iii).
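A schematic host-side model of this flow is sketched below in Python (the application itself is written in C; the three byte buffers stand in for the DataIn, Cfg and DataOut BRAM-mapped regions, and every offset, trigger code and encoding here is invented for illustration only):

import struct

data_in = bytearray(128 * 1024)
cfg = bytearray(128 * 1024)
data_out = bytearray(128 * 1024)

def load_training_set(examples):
    # (i) Copy the training dataset into the DataIn region as IEEE-754 floats.
    flat = [v for example in examples for v in example]
    data_in[:4 * len(flat)] = struct.pack(f"<{len(flat)}f", *flat)

def configure_learning(mu, iters_per_example, global_iters):
    # (i)/(ii) Write the learning parameters, then the 'Learn' trigger, to Cfg.
    struct.pack_into("<fII", cfg, 4, mu, iters_per_example, global_iters)
    struct.pack_into("<I", cfg, 0, 1)   # 1 = 'Learn', 2 = 'Calc' (invented codes)

def read_results(n_outputs):
    # (iv) Read the output-layer values reported back through DataOut.
    return struct.unpack_from(f"<{n_outputs}f", data_out, 4)

configure_learning(0.2, 5, 2000)        # the parameters used in chapter V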
A demonstration of an application run is shown in Fig. 7.

Fig. 7. Console view of running the software application and entering one calculation example (the 9 floating-point numbers). The calculation session on the artificial neural network output the values 0.0, 1.0.

V. TESTS
A. Test Setup
The hardware platforms for the analysis were the ZedBoard Zynq Evaluation Kit and a PC equipped with an Intel i7 3770 CPU @ 3.4 GHz and 16GB of RAM. The neural problem of our choice was "Breast-Cancer-Wisconsin" [9], described as follows: given a patient's 9 parameters, predict whether the patient has a benign or a malignant breast tumor. Our neural network was synthesized in the structure of an MLP with one hidden layer, having 9 input, 8 hidden and 2 output neurons. The outputs were encoded as "0, 1" for "benign" and "1, 0" for "malignant". The bias values were set to 1.0 in each layer.
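Under this encoding, and anticipating the 0.5 thresholding described in the next section, the network's two outputs can be decoded as follows (our Python sketch; the function name is ours):

def decode_prediction(outputs, threshold=0.5):
    # Outputs below the threshold are treated as 0 and above it as 1,
    # then mapped to a class label under the encoding above.
    bits = tuple(1 if o > threshold else 0 for o in outputs)
    return {(0, 1): "benign", (1, 0): "malignant"}.get(bits, "undecided")

print(decode_prediction([0.08, 0.93]))   # -> benign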
Two main characteristics were analyzed. The first characteristic was the precision of the neural computation, whereas the second was the speed of the training and calculation phases. These two metrics are described in sections B and C respectively.

B. Precision
The precision stands for the fraction of correct answers returned by the neural network. For the precision analysis, we chose a network that was trained with the 260 training examples of the Breast-Cancer training set. The training configuration was set to 2000 global iterations and 5 iterations per example; the training algorithm's step size was 0.2. In order to challenge the network, it was queried with 50 input examples that had not appeared in the training set. Since the outputs of the network were not always the round values 1.0 and 0.0, we decided to treat outputs below 0.5 as 0.0 and outputs above 0.5 as 1.0. Eventually, the test showed 48 correctly answered queries, hence a precision of 96%.

C. Speed
When evaluating the computational speed of an ANN, the most commonly used metrics are MCPS and MCUPS [8]. The processing speed, i.e. the number of multiply-and-accumulate operations performed in unit time, is measured in MCPS (Millions of Connections Per Second). The learning speed, which is the rate of weight updates, is measured in MCUPS (Millions of Connection Updates Per Second). The calculation of the performance in terms of MCPS and MCUPS is performed while ignoring external effects that might limit the system performance. Assuming the 200MHz PL clock, the metrics below were extracted from the simulation waveforms presented in Fig. 8.

Fig. 8. A waveform of the time interval between the assertion of the fwd_0 signal (forward computation initiation) and the reception of the f_done_0 signal (report of completion of a forward computation).

As can be seen in Fig. 8, the interval between the forward computation's initiation signal (fwd_0) and the completion report (f_done_0) lasts $9.7\,\mu s$. During this interval, the hidden layer, which consists of 8 activators, processes 9 inputs plus 1 bias input. Recall that this design uses a fully connected topology between the input and the hidden layers, therefore 80 connection operations were made during the hidden layer's forward computation. Thus, the processing speed is given by:

$\text{Processing Speed} = \frac{80}{9.7\,\mu s} = 8.24\ \text{MCPS}$

The next category to be tested is the learning speed. It is important to note that the learning speed is measured only for the operation described by equation (8), i.e. the weight update. This specific operation takes place during state '7' (or 'BCK4', as it appears in Fig. 4) of the activator's state machine. By observing Fig. 9, we derive that the hidden layer's weight update takes $14.65\,\mu s$.

Fig. 9. A waveform of the time period during which the activator control stays in state '7', which is responsible for the weight update.

Similarly to the forward computation, there were 80 connections and therefore 80 weight values were updated during this interval. Hence, the derived learning speed is given by:

$\text{Learning Speed} = \frac{80}{14.65\,\mu s} = 5.46\ \text{MCUPS}$
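Both figures can be re-derived directly from the measured intervals (our plain re-computation of the formulas above):

connections = (9 + 1) * 8            # 9 inputs + 1 bias, fully connected to 8 hidden neurons

mcps = connections / 9.7e-6 / 1e6    # forward interval: 9.7 us
mcups = connections / 14.65e-6 / 1e6 # weight-update interval: 14.65 us
print(round(mcps, 2), round(mcups, 2))  # 8.25 (reported as 8.24) and 5.46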
VI. CONCLUSIONS
A. Goals vs. Results
The initial goal of the research was to create a flexible neural design. As the development went on, concurrent goals appeared, such as the generalization of the design. Eventually, the design became not only efficient, but also modular and extensible. The modularity provided the ability to easily assemble any neural structure, with any number of activators and layers.

The functionality of the architecture proved to be correct in the conducted tests. First, in order to ensure basic correctness, limited tests were conducted on small training sets. Afterwards, more comprehensive tests were run in order to calculate the precision rate of the architecture; these tests showed a precision of 96%. As for the performance, the design was tested with a limited number of activators (8 in the hidden layer) and showed performance similar to the designs that appear in [8].

B. Improvement Suggestions and Further Work
In this section, we enumerate suggestions for future work that can rely on our current design.

Deep learning. Our design fits deep neural architectures: it can be instantiated as a DNN, providing a platform for advanced research such as [12].

Making the state machines more compact. In the current design, some state machines use excess states to ensure robustness. These redundant states create more transitions
during the execution, resulting in longer latencies. Additional effort can be made to examine the necessity of these redundant states.

Avoiding multiplication in non-arithmetic blocks. The synthesis process creates "parasitic" multipliers in specific areas of the design. These parts should be investigated and their HDL code rewritten in a form that does not trigger the synthesis of a multiplier. The goal is to have only one multiplier per activator module.

Fixed-point accuracy. The neural computations are performed mostly on fractional numbers between -1 and 1. In our design we used the floating-point format, which requires 32 bits per numeric value. Rather than floating point, a less flexible format can be used; for example, the Q15 fixed-point format requires only 16 bits and still provides good precision (a conversion sketch follows at the end of this section).

Utilizing on-chip learning. The idea of on-chip learning was not implemented in our design. It implies running the learning phase prior to optimizing the implementation; hopefully, this should reduce the number of actual connections for the FPGA programming [10].

Utilizing DSP features. Features such as multiply-accumulate (MAC) [11] are likely to replace the sequential forward computations. Another example is the dot-product, which can be useful for the weight update. Today, these DSP features are already embedded in the evaluation boards, hence additional research can be conducted in order to utilize them as a part of the neural network.

Reducing the number of special registers in the activator. There are registers, such as mudelta and error, which store intermediate calculation results. These registers can be united into a single register that stores only the last calculation.
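As an illustration of the suggested Q15 format, a round-trip conversion in Python (ours):

def to_q15(x):
    # Encode a fractional value in [-1, 1) as a 16-bit Q15 integer.
    return max(-32768, min(32767, int(round(x * 32768))))

def from_q15(q):
    # Decode a Q15 integer back to a float.
    return q / 32768.0

w = 0.4378
print(to_q15(w), from_q15(to_q15(w)))  # 14346 0.43780517578125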
APPENDIX

Fig. A.2. Schematic view of an artificial neural network. This specific network is comprised of 3 input neurons, followed by 3 hidden neurons, followed by 2 output neurons. Additional logic between the layers is added in order to trigger the next layer's computation.

Fig. A.1. Artificial neuron's top-level schematic. The main control is implemented in Activator_Ctl. The secondary state machines are implemented in sigmoid, slope_calc_ctl, mult_ctl, Weights, Select_f and Select_b. The ACC_MULT unit is the core of the floating-point computations. The storage takes place in the Weights unit.
ACKNOWLEDGMENT
We would like to thank Avi Efrati, the VLSI lab manager, for introducing this research to us and for outstanding assistance during the development. We also thank Guy Lampert, the Xilinx application engineer, for his assistance during the board configuration stage. In addition, we thank the Xilinx Company for its cooperation with Tel Aviv University and for supplying the ZedBoard evaluation kits.

REFERENCES
[1] A. R. Omondi and J. C. Rajapakse (eds.), FPGA Implementations of Neural Networks. New York, NY, USA: Springer, 2006.
[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," Nature, vol. 323, pp. 533-536, 1986.
[3] J. Misra and I. Saha, "Artificial neural networks in hardware: A survey of two decades of progress," Neurocomputing, vol. 74, no. 1, pp. 239-255, 2010.
[4] A. Gomperts, A. Ukil, and F. Zurfluh, "Development and implementation of parameterized FPGA-based general purpose neural networks for online applications," IEEE Transactions on Industrial Informatics, vol. 7, no. 1, pp. 78-89, 2011.
[5] K. Basterretxea, J. M. Tarela, and N. Mastorakis, "Approximation of sigmoid function and the derivative for artificial neurons," in Advances in Neural Networks and Applications. Athens: WSES Press, 2001, pp. 397-401.
[6] S. Sahin, Y. Becerikli, and S. Yazici, "Neural network implementation in hardware using FPGAs," in Neural Information Processing. Berlin, Heidelberg: Springer, 2006, pp. 1105-1112.
[7] R. Remadevi, "Design and simulation of floating point multiplier based on VHDL," International Journal of Engineering Research and Applications, vol. 3, no. 2, pp. 283-286, 2013.
[8] P. Ienne, "Architectures for neuro-computers: Review and performance evaluation," EPFL Technical Report 93/21, 1993.
[9] K. Bache and M. Lichman, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2013.
[10] C.-J. Lin and C.-Y. Lee, "FPGA implementation of a recurrent neural fuzzy network with on-chip learning for prediction and identification applications," Journal of Information Science and Engineering, vol. 25, no. 2, pp. 575-589, 2009.
[11] N. Nedjah et al., "Dynamic MAC-based architecture of artificial neural networks suitable for hardware implementation on FPGAs," Neurocomputing, vol. 72, no. 10, pp. 2171-2179, 2009.
[12] Y. Taigman, M. Yang, M. A. Ranzato, and L. Wolf, "Web-scale training for face identification," arXiv preprint arXiv:1406.5266, 2014.