SlideShare a Scribd company logo
1 of 6
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1984
Implementation of Neural Network on FPGA
Ranjith M S1, Sampanna T2, Niranjan Ganapati Hegde3, Shruthi R4
1,2,3Student, Dept. of ECE, The National Institute of Engineering, Mysuru, India
4Assistant Professor, Dept. of ECE, The National Institute of Engineering, Mysuru, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Deep neural networks and their application is rapidly changing many of the fields by bringing a different way of
solution to the problems. In fact, it is being applied in every field right from Signal processing to the communication. Although it
solves many problems easily, training and deploying it is limited due to various reasons. These deep neural networks requiremost
powerful computational devices like GPU in order to train them and large RAM, memory is required for its operation limiting its
usage in many of the applications. Even if these resources are provided throughcloudthe deployment ofsystemfacesproblemslike
more latency as there is round trip to the server, connectivity, privacy as the data needs to leave the system. All these problems
could be overcome using the FPGA’s for the deployment of the neural network as they are low cost and reconfigurable. In fact,
FPGA’s are faster than GPU during inference in many of the cases makingitsuitableforthe deployment intherealtimeapplication.
FPGA’s are faster than GPU’s as the neural network is implemented as a hardware in case of FPGA’s, where only hardware delays
are introduced but in case of CPU or GPU large number of floating-point operationsneeds tobedoneusingALU. Wewill implement
a trained neural network of TensorFlow on FPGA using Verilog HDL to perform a regression task.
Key Words: Deep learning, FPGA, Regression, Verilog HDL, Artificial Intelligence, Neural Networks, TensorFlow
1. INTRODUCTION
Currently a variety of applications like image processing are deployed on FPGA’s because of their advantages like cost-
optimized system integration, differentiated designs for a wide range of embedded applications,a significantspeedadvantage
compared to processor-based solutions. Even complicated calculations can be carried out in an extremely short time using
FPGA’s.
Deep neural networks are employed for the various tasks in various fields like object detection, speech recognition, natural
language processing, segmentation most of these are deployed on GPU. We use FPGA to employ deep neural network for the
regression task. Hardware is generated for the trained neural network which includes many floating-point operations to be
performed before the prediction is done i.e., to obtain the output. Initially the network is implemented and trained in python
environment using TensorFlow and the values of the weights are obtained as an array from the trained model and these
floating-point numbers are converted into IEEE 754 double precision format. The whole network is then broken into floating
point multiplication and addition which is implemented using Verilog HDL. Since Verilog HDL does not perform floatingpoint
arithmetic by default we implement the modules to perform floating point addition and multiplication on IEEE 754 double
precision format based on their sign, exponent and mantissa part.
Since we are using only dense networks we use He initialization[1] as compared with Xavier Initialization[2] for initializing the
weights while training the network.
“"overfitting" is greatly reduced by randomly omittinghalf ofthefeaturedetectorsoneachtrainingcase. Thispreventscomplexco-
adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each
neuron learns to detect a feature that is generally helpful for producing the correctanswergiventhecombinatoriallylargevariety
of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks”[3].
The optimizers such as Adam[4], rms[5], LARS[6] leads for the faster convergence and better optimization withoutgettingstuck
at the saddle points or at the local minima’s. “One of the key elements of super-convergence is training with one learning rate
cycle and a large maximum learning rate”[7]. To speed up training we use Batch Normalization[8].
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1985
2. EXPERIMENTAL SETUP
2.1 Dataset
The dataset contains sold prices of many houses with their information like overall area in square feet. The maximum valueof
the area is 1.306800e+06 whereas the maximum value of the pricecolumnis3600.Iftheserawvaluesofthedatasetarepassed
into the network for training, it leads to overshooting and also slows the rate of convergence as the loss surface is not smooth.
The loss surface will have large number of saddle points making it difficult for training. These problems could be overcomeby
doing the mean normalization on the data before training and same has to be applied during the inference.
Mean normalization is given by,
Where μ(i) is the average of all the values for a featureand s(i)is the standard deviation.Aftermeannormalizationthemeanofa
feature is nearly 0 and standard deviation is nearly 1. Which counters the vanishing gradient problem.
Fig-1 represent how the price varies with area before the mean normailzation is applied.
fig-1: Dataset
2.2 Training the Neural Network
We implemented two architectures for the same dataset, perceptron and the neural network (multi-output perceptron). The
perceptron is implemented in python environment and the gradient descent algorithm is used tofindtheoptimal values ofthe
weight and bias. Mean squared error is used as a cost function based on this the parameters are optimized.
Neural network is implemented and trained using TensorFlowlibraryinpython environment.The network has2hiddenlayers
and an output layer. First hidden layer has 8 hidden units to whichReLUactivationisappliedthepurposeofactivationfunction
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1986
is to introduce the non-linearities in the network. The output of the activation is applied to the second dense layer with the 4
hidden units, output of which is then applied with ReLU activation and these values are used to predict the final output. The
network architecture is as shown in fig-2. And the structure parameter and its dimension are as in table-1.
fig -2: Neural network architecture
Table -1: Network structure
Model structure and parameters
Layer Output shape NO. of parameters
Input (None, 1) 0
Dense_1 (None, 8) 16
Dense_2 (None, 4) 36
Output (None, 1) 5
Total params: 57
Trainable params: 57
Non-trainable params: 0
The network is trained using Adam optimizer with a batch size of 32 and the data is split into 90% for training and 10% for
validation. Mean squared error is used as a loss function.
Table -2: Network results after training
Final loss
Mean Squared Error
Training 0.2753
validation 0.2088
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1987
fig -3: loss during training for neural network fig -4: loss during training in perceptron
From table-2 we see the mean squared error values after training the neural network. Since the values of training error and
validation error are near it indicates that the model is not overfitted to the training data.
Fig-3 indicates how the loss (mean squared error) varied during the training of the neural network andfig-4indicateshowthe
loss (mean squared error) varied during the training (only training data) of the perceptron.Inboththecaseslossreachedvery
low value by the end of the training indicating that they have learnt to predict the proper values of the target.
The model was trained on Tesla K80 GPU which can do up to 8.73 Teraflops single-precision and up to 2.91 Teraflops double-
precision performance with NVIDIA GPU Boost
3. IMPLEMENTATION OF TRAINED MODEL IN VERILOG HDL
The trained network is implemented in Verilog HDL to perform the inference. Since the network implementation includes the
floating-point arithmetic and Verilog does not have any functions for the floating-point operations the operations have to be
implemented based on the sign, exponent and mantissa. The optimized weights are converted into IEEE 754 double precision
format having 64 bits.
fig-5 IEEE 754 double precision format
Fig-5 shows the representation of the IEEE 754 double precisionformat.First bitindicatessign1indicatenegativenumberand
0 indicate positive number. Bit 62 to 52 indicates the exponent part and other bits indicates fractional part.
In floating point multiplication, the result will be negative if any one of the numbers is negative, result will be positive if both
are negative or positive which indicates the behaviour of XOR gate. Hence the resultant sign bit of the output is determined by
XOR of two sign bits of the numbers.
Exponent part of the resultant is determined by using the formula,
Er = E1 +E2-1023
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1988
Where E1 and E2 are the exponent part of two numbers. Hence the 11-bit adder is required for the exponent part of the
resultant.
Fractional part is obtained by the normal multiplication between fractional parts of the number. And result is rounded to 52
bits. Hence the multiplier is implemented for the fractional part.
Hence all the trained network parameters are converted into IEEE 754 double precision format and the whole network is
implemented using these floating-point operation modules. The module now takes the input andoperationsaredone withthe
optimized weights to get the prediction.
Fig-6: Simulation output
From fig-6 we observe the output prediction (output_z) of the implemented hardware for the given mean normalized input
square feet.
4. RESULTS AND CONCLUSIONS
The implemented hardware using Verilog HDL produced the same result as that of the GPU or CPU which computes the result
using ALU.
The hardware implementation reduces the latency when compared with GPU or CPUinmanycases.Buttheimplementationof
the neural network on the hardware takes more time and the training of neural network is not possible in FPGA as there are
more training loops involved and the gradient computation and their update.
REFERENCES
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level
Performance on ImageNet Classification
[2] Bengio, Yoshua and Glorot, Xavier. Understanding thedifficulty of training deep feedforward neural
networks.InProceedings of AISTATS 2010, volume 9, pp. 249–256, May 2010.
[3] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov Improving neural
networks by preventing co-adaptation of feature detectors arXiv:1207.0580v1 [cs.NE]
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1989
[4] Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980v9 [cs.LG]
[5] Geoffrey Hinton. 2012. Neural Networks for MachineLearning - Lecture 6a - Overview of mini-batch gradi-ent
descent.
[6] Yang You, Igor Gitman, Boris Ginsburg. Large Batch Training of Convolutional Networks. arXiv:1708.03888v3
[cs.CV]
[7] Leslie N. Smith, Nicholay Topin. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning
Rates. arXiv:1708.07120v3 [cs.LG]
[8] Sergey Ioffe, Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal
Covariate Shift. arXiv:1502.03167v3 [cs.LG]
[9] Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, Yoshua Bengio. Artificial Neural Networks
Applied to Taxi Destination Prediction. arXiv:1508.00021v2 [cs.LG]
[10] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S.,Irving, G., Isard, M., Kudlur, M.,
Levenberg, J., Monga, R., Moore, S., Murray, D. G.,Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke,M.,Yu,Y.,and
Zheng, X. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467
[cs.DC]
[11] J. D. Hunter, "Matplotlib: A 2D Graphics Environment", Computing in Science & Engineering, vol. 9, no. 3, pp. 90-95,
2007.
[12] Plotly Technologies Inc. (2015). Collaborative data science. Montreal, QC: Plotly Technolo-gies Inc.
[13] S. Sahin, Y. Becerik1i, S. Yazici, "Neural network implementation in hardware using FPGAs", NIP Neural Information
Processing, vol. 4234, no. 3, pp. 1105-1112, 2006.
[14] Ranjith M S, S Parameshwara. Optimizing Neural Networks for Embedded Systems. IRJET Volume 7, Issue 4, April
2020 S.NO: 211
[15] Yufeng Hao. A General Neural Network Hardware Architecture on FPGA
arXiv:1711.05860 [cs.CV]
[16] S. Coric, I. Latinovic and A. Pavasovic, "A neural network FPGA implementation," Proceedings of the 5th Seminar on
Neural Network Applications in Electrical Engineering. NEUREL 2000 (IEEE Cat. No.00EX287), Belgrade, Yugoslavia,
2000, pp. 117-120.

More Related Content

What's hot

Levenberg marquardt-algorithm-for-karachi-stock-exchange-share-rates-forecast...
Levenberg marquardt-algorithm-for-karachi-stock-exchange-share-rates-forecast...Levenberg marquardt-algorithm-for-karachi-stock-exchange-share-rates-forecast...
Levenberg marquardt-algorithm-for-karachi-stock-exchange-share-rates-forecast...
Cemal Ardil
 
Artificial Neural Network Based Graphical User Interface for Estimation of Fa...
Artificial Neural Network Based Graphical User Interface for Estimation of Fa...Artificial Neural Network Based Graphical User Interface for Estimation of Fa...
Artificial Neural Network Based Graphical User Interface for Estimation of Fa...
ijsrd.com
 
1.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-121.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-12
Alexander Decker
 

What's hot (16)

Levenberg marquardt-algorithm-for-karachi-stock-exchange-share-rates-forecast...
Levenberg marquardt-algorithm-for-karachi-stock-exchange-share-rates-forecast...Levenberg marquardt-algorithm-for-karachi-stock-exchange-share-rates-forecast...
Levenberg marquardt-algorithm-for-karachi-stock-exchange-share-rates-forecast...
 
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHM
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHMBACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHM
BACKPROPAGATION LEARNING ALGORITHM BASED ON LEVENBERG MARQUARDT ALGORITHM
 
Optimization of latency of temporal key Integrity protocol (tkip) using graph...
Optimization of latency of temporal key Integrity protocol (tkip) using graph...Optimization of latency of temporal key Integrity protocol (tkip) using graph...
Optimization of latency of temporal key Integrity protocol (tkip) using graph...
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
 
Median based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstructionMedian based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstruction
 
Efficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of TrussesEfficiency of Neural Networks Study in the Design of Trusses
Efficiency of Neural Networks Study in the Design of Trusses
 
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...
 
Design and development of DrawBot using image processing
Design and development of DrawBot using image processing Design and development of DrawBot using image processing
Design and development of DrawBot using image processing
 
Qualitative Analysis of Optical Interleave Division Multiple Access using Spe...
Qualitative Analysis of Optical Interleave Division Multiple Access using Spe...Qualitative Analysis of Optical Interleave Division Multiple Access using Spe...
Qualitative Analysis of Optical Interleave Division Multiple Access using Spe...
 
IRJET- Intelligent Character Recognition of Handwritten Characters using ...
IRJET-  	  Intelligent Character Recognition of Handwritten Characters using ...IRJET-  	  Intelligent Character Recognition of Handwritten Characters using ...
IRJET- Intelligent Character Recognition of Handwritten Characters using ...
 
Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...
 
An advancement in the N×N Multiplier Architecture Realization via the Ancient...
An advancement in the N×N Multiplier Architecture Realization via the Ancient...An advancement in the N×N Multiplier Architecture Realization via the Ancient...
An advancement in the N×N Multiplier Architecture Realization via the Ancient...
 
implementation of area efficient high speed eddr architecture
implementation of area efficient high speed eddr architectureimplementation of area efficient high speed eddr architecture
implementation of area efficient high speed eddr architecture
 
Artificial Neural Network Based Graphical User Interface for Estimation of Fa...
Artificial Neural Network Based Graphical User Interface for Estimation of Fa...Artificial Neural Network Based Graphical User Interface for Estimation of Fa...
Artificial Neural Network Based Graphical User Interface for Estimation of Fa...
 
1.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-121.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-12
 
I017425763
I017425763I017425763
I017425763
 

Similar to IRJET - Implementation of Neural Network on FPGA

NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
Mohid Nabil
 
The Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging SystemThe Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging System
Melissa Luster
 
Design and implementation of complex floating point processor using fpga
Design and implementation of complex floating point processor using fpgaDesign and implementation of complex floating point processor using fpga
Design and implementation of complex floating point processor using fpga
VLSICS Design
 

Similar to IRJET - Implementation of Neural Network on FPGA (20)

G010334554
G010334554G010334554
G010334554
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
 
Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Pr...
Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Pr...Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Pr...
Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Pr...
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
 
Towards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprogramsTowards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprograms
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing Impaired
 
The Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging SystemThe Principle Of Ultrasound Imaging System
The Principle Of Ultrasound Imaging System
 
IRJET- Python Libraries and Packages for Deep Learning-A Survey
IRJET-  	  Python Libraries and Packages for Deep Learning-A SurveyIRJET-  	  Python Libraries and Packages for Deep Learning-A Survey
IRJET- Python Libraries and Packages for Deep Learning-A Survey
 
IRJET - Handwritten Bangla Digit Recognition using Capsule Network
IRJET -  	  Handwritten Bangla Digit Recognition using Capsule NetworkIRJET -  	  Handwritten Bangla Digit Recognition using Capsule Network
IRJET - Handwritten Bangla Digit Recognition using Capsule Network
 
IRJET - Design and Implementation of Double Precision FPU for Optimised Speed
IRJET - Design and Implementation of Double Precision FPU for Optimised SpeedIRJET - Design and Implementation of Double Precision FPU for Optimised Speed
IRJET - Design and Implementation of Double Precision FPU for Optimised Speed
 
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Traffic Sign Recognition Model
Traffic Sign Recognition ModelTraffic Sign Recognition Model
Traffic Sign Recognition Model
 
High Speed Optimized AES using Parallel Processing Implementation
High Speed Optimized AES using Parallel Processing ImplementationHigh Speed Optimized AES using Parallel Processing Implementation
High Speed Optimized AES using Parallel Processing Implementation
 
Text Recognition using Convolutional Neural Network: A Review
Text Recognition using Convolutional Neural Network: A ReviewText Recognition using Convolutional Neural Network: A Review
Text Recognition using Convolutional Neural Network: A Review
 
The Computation Complexity Reduction of 2-D Gaussian Filter
The Computation Complexity Reduction of 2-D Gaussian FilterThe Computation Complexity Reduction of 2-D Gaussian Filter
The Computation Complexity Reduction of 2-D Gaussian Filter
 
Design and implementation of complex floating point processor using fpga
Design and implementation of complex floating point processor using fpgaDesign and implementation of complex floating point processor using fpga
Design and implementation of complex floating point processor using fpga
 
Comparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of GlaucomaComparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
Comparative Study of Pre-Trained Neural Network Models in Detection of Glaucoma
 
VHDL Implementation of High Speed and Low Power BIST Based Vedic Multiplier
VHDL Implementation of High Speed and Low Power BIST Based Vedic MultiplierVHDL Implementation of High Speed and Low Power BIST Based Vedic Multiplier
VHDL Implementation of High Speed and Low Power BIST Based Vedic Multiplier
 

More from IRJET Journal

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 

Recently uploaded (20)

Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 

IRJET - Implementation of Neural Network on FPGA

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1984 Implementation of Neural Network on FPGA Ranjith M S1, Sampanna T2, Niranjan Ganapati Hegde3, Shruthi R4 1,2,3Student, Dept. of ECE, The National Institute of Engineering, Mysuru, India 4Assistant Professor, Dept. of ECE, The National Institute of Engineering, Mysuru, India ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract - Deep neural networks and their application is rapidly changing many of the fields by bringing a different way of solution to the problems. In fact, it is being applied in every field right from Signal processing to the communication. Although it solves many problems easily, training and deploying it is limited due to various reasons. These deep neural networks requiremost powerful computational devices like GPU in order to train them and large RAM, memory is required for its operation limiting its usage in many of the applications. Even if these resources are provided throughcloudthe deployment ofsystemfacesproblemslike more latency as there is round trip to the server, connectivity, privacy as the data needs to leave the system. All these problems could be overcome using the FPGA’s for the deployment of the neural network as they are low cost and reconfigurable. In fact, FPGA’s are faster than GPU during inference in many of the cases makingitsuitableforthe deployment intherealtimeapplication. FPGA’s are faster than GPU’s as the neural network is implemented as a hardware in case of FPGA’s, where only hardware delays are introduced but in case of CPU or GPU large number of floating-point operationsneeds tobedoneusingALU. Wewill implement a trained neural network of TensorFlow on FPGA using Verilog HDL to perform a regression task. Key Words: Deep learning, FPGA, Regression, Verilog HDL, Artificial Intelligence, Neural Networks, TensorFlow 1. INTRODUCTION Currently a variety of applications like image processing are deployed on FPGA’s because of their advantages like cost- optimized system integration, differentiated designs for a wide range of embedded applications,a significantspeedadvantage compared to processor-based solutions. Even complicated calculations can be carried out in an extremely short time using FPGA’s. Deep neural networks are employed for the various tasks in various fields like object detection, speech recognition, natural language processing, segmentation most of these are deployed on GPU. We use FPGA to employ deep neural network for the regression task. Hardware is generated for the trained neural network which includes many floating-point operations to be performed before the prediction is done i.e., to obtain the output. Initially the network is implemented and trained in python environment using TensorFlow and the values of the weights are obtained as an array from the trained model and these floating-point numbers are converted into IEEE 754 double precision format. The whole network is then broken into floating point multiplication and addition which is implemented using Verilog HDL. Since Verilog HDL does not perform floatingpoint arithmetic by default we implement the modules to perform floating point addition and multiplication on IEEE 754 double precision format based on their sign, exponent and mantissa part. Since we are using only dense networks we use He initialization[1] as compared with Xavier Initialization[2] for initializing the weights while training the network. “"overfitting" is greatly reduced by randomly omittinghalf ofthefeaturedetectorsoneachtrainingcase. Thispreventscomplexco- adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correctanswergiventhecombinatoriallylargevariety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks”[3]. The optimizers such as Adam[4], rms[5], LARS[6] leads for the faster convergence and better optimization withoutgettingstuck at the saddle points or at the local minima’s. “One of the key elements of super-convergence is training with one learning rate cycle and a large maximum learning rate”[7]. To speed up training we use Batch Normalization[8].
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1985 2. EXPERIMENTAL SETUP 2.1 Dataset The dataset contains sold prices of many houses with their information like overall area in square feet. The maximum valueof the area is 1.306800e+06 whereas the maximum value of the pricecolumnis3600.Iftheserawvaluesofthedatasetarepassed into the network for training, it leads to overshooting and also slows the rate of convergence as the loss surface is not smooth. The loss surface will have large number of saddle points making it difficult for training. These problems could be overcomeby doing the mean normalization on the data before training and same has to be applied during the inference. Mean normalization is given by, Where μ(i) is the average of all the values for a featureand s(i)is the standard deviation.Aftermeannormalizationthemeanofa feature is nearly 0 and standard deviation is nearly 1. Which counters the vanishing gradient problem. Fig-1 represent how the price varies with area before the mean normailzation is applied. fig-1: Dataset 2.2 Training the Neural Network We implemented two architectures for the same dataset, perceptron and the neural network (multi-output perceptron). The perceptron is implemented in python environment and the gradient descent algorithm is used tofindtheoptimal values ofthe weight and bias. Mean squared error is used as a cost function based on this the parameters are optimized. Neural network is implemented and trained using TensorFlowlibraryinpython environment.The network has2hiddenlayers and an output layer. First hidden layer has 8 hidden units to whichReLUactivationisappliedthepurposeofactivationfunction
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1986 is to introduce the non-linearities in the network. The output of the activation is applied to the second dense layer with the 4 hidden units, output of which is then applied with ReLU activation and these values are used to predict the final output. The network architecture is as shown in fig-2. And the structure parameter and its dimension are as in table-1. fig -2: Neural network architecture Table -1: Network structure Model structure and parameters Layer Output shape NO. of parameters Input (None, 1) 0 Dense_1 (None, 8) 16 Dense_2 (None, 4) 36 Output (None, 1) 5 Total params: 57 Trainable params: 57 Non-trainable params: 0 The network is trained using Adam optimizer with a batch size of 32 and the data is split into 90% for training and 10% for validation. Mean squared error is used as a loss function. Table -2: Network results after training Final loss Mean Squared Error Training 0.2753 validation 0.2088
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1987 fig -3: loss during training for neural network fig -4: loss during training in perceptron From table-2 we see the mean squared error values after training the neural network. Since the values of training error and validation error are near it indicates that the model is not overfitted to the training data. Fig-3 indicates how the loss (mean squared error) varied during the training of the neural network andfig-4indicateshowthe loss (mean squared error) varied during the training (only training data) of the perceptron.Inboththecaseslossreachedvery low value by the end of the training indicating that they have learnt to predict the proper values of the target. The model was trained on Tesla K80 GPU which can do up to 8.73 Teraflops single-precision and up to 2.91 Teraflops double- precision performance with NVIDIA GPU Boost 3. IMPLEMENTATION OF TRAINED MODEL IN VERILOG HDL The trained network is implemented in Verilog HDL to perform the inference. Since the network implementation includes the floating-point arithmetic and Verilog does not have any functions for the floating-point operations the operations have to be implemented based on the sign, exponent and mantissa. The optimized weights are converted into IEEE 754 double precision format having 64 bits. fig-5 IEEE 754 double precision format Fig-5 shows the representation of the IEEE 754 double precisionformat.First bitindicatessign1indicatenegativenumberand 0 indicate positive number. Bit 62 to 52 indicates the exponent part and other bits indicates fractional part. In floating point multiplication, the result will be negative if any one of the numbers is negative, result will be positive if both are negative or positive which indicates the behaviour of XOR gate. Hence the resultant sign bit of the output is determined by XOR of two sign bits of the numbers. Exponent part of the resultant is determined by using the formula, Er = E1 +E2-1023
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1988 Where E1 and E2 are the exponent part of two numbers. Hence the 11-bit adder is required for the exponent part of the resultant. Fractional part is obtained by the normal multiplication between fractional parts of the number. And result is rounded to 52 bits. Hence the multiplier is implemented for the fractional part. Hence all the trained network parameters are converted into IEEE 754 double precision format and the whole network is implemented using these floating-point operation modules. The module now takes the input andoperationsaredone withthe optimized weights to get the prediction. Fig-6: Simulation output From fig-6 we observe the output prediction (output_z) of the implemented hardware for the given mean normalized input square feet. 4. RESULTS AND CONCLUSIONS The implemented hardware using Verilog HDL produced the same result as that of the GPU or CPU which computes the result using ALU. The hardware implementation reduces the latency when compared with GPU or CPUinmanycases.Buttheimplementationof the neural network on the hardware takes more time and the training of neural network is not possible in FPGA as there are more training loops involved and the gradient computation and their update. REFERENCES [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification [2] Bengio, Yoshua and Glorot, Xavier. Understanding thedifficulty of training deep feedforward neural networks.InProceedings of AISTATS 2010, volume 9, pp. 249–256, May 2010. [3] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov Improving neural networks by preventing co-adaptation of feature detectors arXiv:1207.0580v1 [cs.NE]
  • 6. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1989 [4] Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980v9 [cs.LG] [5] Geoffrey Hinton. 2012. Neural Networks for MachineLearning - Lecture 6a - Overview of mini-batch gradi-ent descent. [6] Yang You, Igor Gitman, Boris Ginsburg. Large Batch Training of Convolutional Networks. arXiv:1708.03888v3 [cs.CV] [7] Leslie N. Smith, Nicholay Topin. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. arXiv:1708.07120v3 [cs.LG] [8] Sergey Ioffe, Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167v3 [cs.LG] [9] Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, Yoshua Bengio. Artificial Neural Networks Applied to Taxi Destination Prediction. arXiv:1508.00021v2 [cs.LG] [10] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S.,Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G.,Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke,M.,Yu,Y.,and Zheng, X. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467 [cs.DC] [11] J. D. Hunter, "Matplotlib: A 2D Graphics Environment", Computing in Science & Engineering, vol. 9, no. 3, pp. 90-95, 2007. [12] Plotly Technologies Inc. (2015). Collaborative data science. Montreal, QC: Plotly Technolo-gies Inc. [13] S. Sahin, Y. Becerik1i, S. Yazici, "Neural network implementation in hardware using FPGAs", NIP Neural Information Processing, vol. 4234, no. 3, pp. 1105-1112, 2006. [14] Ranjith M S, S Parameshwara. Optimizing Neural Networks for Embedded Systems. IRJET Volume 7, Issue 4, April 2020 S.NO: 211 [15] Yufeng Hao. A General Neural Network Hardware Architecture on FPGA arXiv:1711.05860 [cs.CV] [16] S. Coric, I. Latinovic and A. Pavasovic, "A neural network FPGA implementation," Proceedings of the 5th Seminar on Neural Network Applications in Electrical Engineering. NEUREL 2000 (IEEE Cat. No.00EX287), Belgrade, Yugoslavia, 2000, pp. 117-120.