NXFEE INNOVATION
(SEMICONDUCTOR IP & PRODUCT DEVELOPMENT)
(ISO 9001:2015 Certified Company)
# 45, Vivekanandar Street, Dhevan Kandappa Mudaliar Nagar, Nainarmandapam,
Pondicherry – 605004, India.
Buy Project Online: www.nxfee.com | Contact: +91 9789443203 |
Email: nxfee.innovation@gmail.com
_________________________________________________________________
Fast Neural Network Training on FPGA Using Quasi-Newton
Optimization Method
Abstract:
In this brief, a customized, pipelined hardware implementation of the quasi-Newton
(QN) method on a field-programmable gate array (FPGA) is proposed for fast onsite
training of artificial neural networks, targeting embedded applications. The
architecture scales to different neural network sizes and supports batch-mode
training. Experimental results demonstrate the superior performance and power
efficiency of the proposed implementation over CPU, graphics processing unit (GPU),
and prior FPGA QN implementations.
Software Implementation:
 ModelSim
 Xilinx 14.2
Existing System:
Field-programmable gate arrays (FPGAs) have been considered a promising
alternative for implementing artificial neural networks (ANNs) in high-performance
ANN applications because of their massive parallelism, high reconfigurability
(compared with application-specific integrated circuits), and better energy
efficiency [compared with graphics processing units (GPUs)]. However, the majority
of existing FPGA-based ANN implementations have been static implementations for
specific offline applications, without learning capability. While achieving high
performance, static hardware implementations of ANNs suffer from low adaptability
across a wide range of applications. Onsite training provides dynamic training
flexibility to ANN implementations.
Currently, high-performance CPUs and GPUs are widely used for offline training,
but they are not suitable for onsite training, especially in embedded applications.
Onsite training is complex to implement in hardware for the following reasons.
First, some software features, such as floating-point number representations or
advanced training techniques, are impractical or expensive to implement in
hardware. Second, the implementation of batch-mode training increases the design
complexity of the hardware. In batch training, the ANN's weight update is based on
the training error from all samples in the training data set, which enables good
convergence and effectively guards against sample-data perturbation attacks.
However, implementing batch training on an FPGA platform requires large data
buffering and fast computation of intermediate weights. There have been FPGA
implementations of ANNs with online training. These implementations used the
backpropagation (BP) learning algorithm with non-batch training. Although widely
used, the BP algorithm converges slowly, since it is essentially a steepest-descent
method. Several advanced training algorithms, such as conjugate gradient,
quasi-Newton (QN), and Levenberg–Marquardt, have been proposed to speed up
convergence and reduce training time, at the cost of memory resources.
This brief provides a feasible solution to the challenges of implementing onsite
training in hardware by implementing the QN training algorithm, with support for
batch-mode training, on a recent FPGA. Our previous work implemented the
Davidon–Fletcher–Powell QN (DFP-QN) algorithm on an FPGA and achieved a 17-times
speedup over the software implementation. To further improve performance, this
brief proposes a hardware implementation of the Broyden–Fletcher–Goldfarb–Shanno
QN (BFGS-QN) algorithm on an FPGA for ANN training.
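The BFGS-QN training step described above can be sketched in software as follows. This is a minimal NumPy illustration of the standard BFGS inverse-Hessian update combined with a line search, not the brief's fixed-point pipelined hardware; the function names and the quadratic test problem are our own.

```python
import numpy as np

def bfgs_update(H, s, y):
    """One BFGS update of the inverse-Hessian approximation H.

    s = w_new - w (weight step), y = g_new - g (gradient change).
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def train_step(w, grad_fn, H, line_search):
    """One QN training iteration: direction, step size, weight and H update."""
    g = grad_fn(w)
    d = -H @ g                 # quasi-Newton search direction
    lam = line_search(w, d)    # step size via (exact) line search
    w_new = w + lam * d
    s, y = w_new - w, grad_fn(w_new) - g
    return w_new, bfgs_update(H, s, y)
```

On a quadratic objective with an exact line search, BFGS converges in at most n iterations, which is why the brief pairs the update with the exact line-search block described later.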
To the best of our knowledge, this is the first reported FPGA hardware accelerator
of the BFGS-QN algorithm implemented in a deeply pipelined fashion. The performance
of the implementation is analyzed quantitatively. The architecture is designed with
low hardware cost in mind, for scalable performance, and for both onsite and
offline training.
The designed accelerator is applied to ANN training and copes with different
network sizes. The experimental results show that the proposed accelerator achieves
a performance improvement of up to 105 times across various neural network sizes,
compared with the CPU software implementation. The hardware implementation is also
applied to two real-world applications, showing superior performance.
Disadvantages:
 Inefficient performance
 Lower power efficiency
Proposed System:
FPGA Hardware Implementation
Because the computations of B, λ, and g are the three computationally intensive
parts of the BFGS-QN algorithm, we mainly describe how these three parts are
designed.
B Matrix Computation
The B computation (BC) block derives an n × n matrix, where n is the total number
of weights, according to step 5 of Algorithm 1. The most computationally intensive
operation is matrix-by-vector multiplication (MVM). For scalable performance and a
modularized architecture, the MVM is implemented as multiple vector-by-vector
multiplications (VVMs). Three on-chip RAM units, each storing n words, are used to
store the intermediate computation results, which are repeatedly used in
subsequent computations.
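The MVM-as-VVMs decomposition can be sketched as follows (plain Python, illustrative only; in the hardware, each dot product maps onto the single deeply pipelined VVM unit, processed row by row):

```python
def vvm(a, b):
    """Vector-by-vector multiplication (dot product); one pipelined VVM unit."""
    return sum(x * y for x, y in zip(a, b))

def mvm(M, v):
    """Matrix-by-vector multiplication realized as n successive VVMs,
    one per row of M, so the same hardware unit is reused for every row."""
    return [vvm(row, v) for row in M]
```

Decomposing the MVM this way keeps the arithmetic hardware fixed while n grows, which is what makes the architecture scalable across network sizes.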
Fig. 1. Flowchart of the GSS algorithm.
Fig. 2. Architecture of λ computation block. (a) Forward–backward block for determining a search
interval [a0, b0]. (b) GSS block. The operations in the dashed blocks share the same hardware.
In addition, an n-word first-in–first-out (FIFO) buffer is used to match the two
input data streams of the final addition and to form Bk+1 row by row. Here, Lmult,
Lsub, and Ldiv denote the latencies of the multiplier, subtractor, and divider,
respectively, and Tvector is the number of execution cycles of a VVM. A deeply
pipelined VVM unit is implemented with Tvector = n + Ladd(⌈log2 Ladd⌉ + 2), where
the last two terms account for draining the pipeline.
Step Size λ Computation
An exact line search method is implemented to find λk according to step 2 of
Algorithm 1. The method comprises two submodules: the forward–backward algorithm,
which determines an initial search interval containing the minimizer, and the
golden-section search (GSS) algorithm of Fig. 1, which reduces the interval
iteratively. Fig. 2 shows the hardware architecture, where the computation time of
each block is analyzed. The
architecture comprises a number of pipelined arithmetic operations and the objective
function ET evaluation.
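The GSS loop of Fig. 1 follows the classic golden-section scheme: each iteration keeps the sub-interval that must contain the minimizer and reuses one of the two interior evaluations. A software sketch (Python, with our own variable names; the hardware shares the corresponding operations in the dashed blocks of Fig. 2):

```python
import math

GOLD = (math.sqrt(5) - 1) / 2  # golden ratio conjugate, ≈ 0.618

def golden_section_search(f, a, b, tol=1e-6):
    """Shrink the bracketing interval [a, b] until it is narrower than tol.

    One of the two interior points (x1, x2) and its function value is reused
    each iteration, so only one new evaluation of f is needed per step.
    """
    x1 = b - GOLD * (b - a)
    x2 = a + GOLD * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                  # minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - GOLD * (b - a)
            f1 = f(x1)
        else:                        # minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + GOLD * (b - a)
            f2 = f(x2)
    return (a + b) / 2
```

Because only one new objective evaluation is required per iteration, the expensive objective-function block is kept busy without duplication, matching the hardware-sharing noted in the Fig. 2 caption.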
Fig. 3. (a) Architecture of the objective function evaluation module. (b) Microarchitecture of the
neural model computation.
Fig. 4. Architecture of the GC unit.
The dashed blocks in Fig. 2(a) and (b) are the same piece of hardware, shared by
the corresponding operations. The complex objective function (1) needs to be
evaluated frequently during the λ computation. Therefore, we also implement the
objective function evaluation in hardware, as a separate block, for high
performance. The block can be modified for different types of ANNs while the other
parts of the implementation remain the same. As shown in Fig. 3(a), the block is
implemented in two parts: the first part exercises the ANN to obtain its outputs,
and the second part computes the training error. Fig. 3(b) shows the structure of
the first part. Two accumulation units are implemented to obtain h and ŷ. Note
that the accumulation unit is implemented differently from the one in the VVM
unit, using variable-length shift registers.
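As a software analogue of the two-part evaluation in Fig. 3(a), the sketch below assumes a single-hidden-layer network with a tanh activation and a batch sum-of-squared-errors training error; the brief's exact objective (1) and network structure may differ, and all names here are ours:

```python
import numpy as np

def objective(w_hidden, w_out, X, Y):
    """Batch training error for an assumed one-hidden-layer ANN.

    Part 1 (Fig. 3(b) analogue): exercise the network on every sample to
    obtain the hidden activations h and the outputs y_hat.
    Part 2: accumulate the squared error over the whole batch.
    """
    H = np.tanh(X @ w_hidden)        # hidden activations h, all samples at once
    Y_hat = H @ w_out                # network outputs y_hat
    return 0.5 * np.sum((Y_hat - Y) ** 2)
```

Evaluating all samples per call mirrors batch-mode training: every λ candidate in the line search requires a full pass over the training set, which is why a dedicated hardware block for this function pays off.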
Table I. Storage space required by each module
Table II. FP units required in the implementation
Resource Usage Analysis
During the computation, intermediate results are accessed in different patterns: 1) some
results, such as w and g, which are connected to multiple calculation modules, are read
more than once and 2) the values, such as h, F(h), and δ, are read in different orders from
writing. Also, the batch training is supported. Therefore, all intermediate results are
buffered in FPGA on-chip memories, for pipelined and parallel computations and data
reuse. FPGA on-chip dual-port RAMs are used to implement the buffers. The required
storage space of each module is shown in Table I. As n increases, off-chip memory will
be used and a properly designed memory hierarchy containing on-chip and off chip
memories is needed to prevent speed slowdown. In addition, all training data are stored in
off-chip memories for applications with large number of training data. The overall data
path of the BFGS-QN hardware design needs a number of floating-point (FP) units, as
shown in Table II. The number of required FP units is independent of n.
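The quadratic growth of buffering with n can be illustrated with a rough word-count estimate. The figures below are illustrative only (Table I gives the actual per-module numbers), and the specific buffer list is our assumption:

```python
def storage_words(n):
    """Rough on-chip word counts for the main intermediate buffers.

    The n x n matrix B dominates, so buffering grows quadratically with the
    number of weights n, while vector buffers (w, g, search direction, step
    and gradient differences) and the row-matching FIFO grow only linearly.
    """
    return {
        "B matrix": n * n,          # quadratic: drives the move to off-chip RAM
        "vector buffers (x5)": 5 * n,
        "row-matching FIFO": n,
    }
```

For example, at n = 1000 weights, the B matrix alone needs a million words, making the on-chip/off-chip memory hierarchy discussed above unavoidable for larger networks.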
Advantages:
 Efficient performance
 Higher power efficiency
More Related Content

Similar to Fast neural network training on fpga using quasi newton optimization method

A high accuracy programmable pulse generator with a 10-ps timing resolution
A high accuracy programmable pulse generator with a 10-ps timing resolutionA high accuracy programmable pulse generator with a 10-ps timing resolution
A high accuracy programmable pulse generator with a 10-ps timing resolution
Nxfee Innovation
 
NaveadKazi_resume
NaveadKazi_resumeNaveadKazi_resume
NaveadKazi_resume
Navead Kazi
 
Sanjay Kumar resume
Sanjay Kumar resumeSanjay Kumar resume
Sanjay Kumar resume
Sanjay kumar
 
Mridul_Mandal_Resume_5+yrs_QA
Mridul_Mandal_Resume_5+yrs_QAMridul_Mandal_Resume_5+yrs_QA
Mridul_Mandal_Resume_5+yrs_QA
Mridul Mandal
 
Resume_Sunil_Kumara_KM
Resume_Sunil_Kumara_KMResume_Sunil_Kumara_KM
Resume_Sunil_Kumara_KM
sunilmsp
 
Puja_CV_6th July_Latest
Puja_CV_6th July_LatestPuja_CV_6th July_Latest
Puja_CV_6th July_Latest
puja paul
 

Similar to Fast neural network training on fpga using quasi newton optimization method (20)

A high accuracy programmable pulse generator with a 10-ps timing resolution
A high accuracy programmable pulse generator with a 10-ps timing resolutionA high accuracy programmable pulse generator with a 10-ps timing resolution
A high accuracy programmable pulse generator with a 10-ps timing resolution
 
Design and fpga implementation of a reconfigurable digital down converter for...
Design and fpga implementation of a reconfigurable digital down converter for...Design and fpga implementation of a reconfigurable digital down converter for...
Design and fpga implementation of a reconfigurable digital down converter for...
 
Approximate sum of-products designs based on distributed arithmetic
Approximate sum of-products designs based on distributed arithmeticApproximate sum of-products designs based on distributed arithmetic
Approximate sum of-products designs based on distributed arithmetic
 
NaveadKazi_resume
NaveadKazi_resumeNaveadKazi_resume
NaveadKazi_resume
 
Sanjay Kumar resume
Sanjay Kumar resumeSanjay Kumar resume
Sanjay Kumar resume
 
Securing the present block cipher against combined side channel analysis and ...
Securing the present block cipher against combined side channel analysis and ...Securing the present block cipher against combined side channel analysis and ...
Securing the present block cipher against combined side channel analysis and ...
 
Analysis and design of cost effective, high-throughput ldpc decoders
Analysis and design of cost effective, high-throughput ldpc decodersAnalysis and design of cost effective, high-throughput ldpc decoders
Analysis and design of cost effective, high-throughput ldpc decoders
 
akash_cv
akash_cvakash_cv
akash_cv
 
Feedback based low-power soft-error-tolerant design for dual-modular redundancy
Feedback based low-power soft-error-tolerant design for dual-modular redundancyFeedback based low-power soft-error-tolerant design for dual-modular redundancy
Feedback based low-power soft-error-tolerant design for dual-modular redundancy
 
Efficient fpga mapping of pipeline sdf fft cores
Efficient fpga mapping of pipeline sdf fft coresEfficient fpga mapping of pipeline sdf fft cores
Efficient fpga mapping of pipeline sdf fft cores
 
MR_Resume
MR_ResumeMR_Resume
MR_Resume
 
Mridul_Mandal_Resume_5+yrs_QA
Mridul_Mandal_Resume_5+yrs_QAMridul_Mandal_Resume_5+yrs_QA
Mridul_Mandal_Resume_5+yrs_QA
 
Resume_Sunil_Kumara_KM
Resume_Sunil_Kumara_KMResume_Sunil_Kumara_KM
Resume_Sunil_Kumara_KM
 
Puja_CV_6th July_Latest
Puja_CV_6th July_LatestPuja_CV_6th July_Latest
Puja_CV_6th July_Latest
 
Resume
ResumeResume
Resume
 
IRJET- Rice QA using Deep Learning
IRJET- Rice QA using Deep LearningIRJET- Rice QA using Deep Learning
IRJET- Rice QA using Deep Learning
 
CV_Swapnil_Deshmukh
CV_Swapnil_DeshmukhCV_Swapnil_Deshmukh
CV_Swapnil_Deshmukh
 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware Performance
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
 
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U...
IRJET -  	  Recommendations Engine with Multi-Objective Contextual Bandits (U...IRJET -  	  Recommendations Engine with Multi-Objective Contextual Bandits (U...
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U...
 

More from Nxfee Innovation

A reconfigurable ldpc decoder optimized applications
A reconfigurable ldpc decoder optimized applicationsA reconfigurable ldpc decoder optimized applications
A reconfigurable ldpc decoder optimized applications
Nxfee Innovation
 

More from Nxfee Innovation (12)

VLSI IEEE Transaction 2018 - IEEE Transaction
VLSI IEEE Transaction 2018 - IEEE Transaction VLSI IEEE Transaction 2018 - IEEE Transaction
VLSI IEEE Transaction 2018 - IEEE Transaction
 
Noise insensitive pll using a gate-voltage-boosted source-follower regulator ...
Noise insensitive pll using a gate-voltage-boosted source-follower regulator ...Noise insensitive pll using a gate-voltage-boosted source-follower regulator ...
Noise insensitive pll using a gate-voltage-boosted source-follower regulator ...
 
The implementation of the improved omp for aic reconstruction based on parall...
The implementation of the improved omp for aic reconstruction based on parall...The implementation of the improved omp for aic reconstruction based on parall...
The implementation of the improved omp for aic reconstruction based on parall...
 
Low complexity methodology for complex square-root computation
Low complexity methodology for complex square-root computationLow complexity methodology for complex square-root computation
Low complexity methodology for complex square-root computation
 
Combating data leakage trojans in commercial and asic applications with time ...
Combating data leakage trojans in commercial and asic applications with time ...Combating data leakage trojans in commercial and asic applications with time ...
Combating data leakage trojans in commercial and asic applications with time ...
 
Algorithm and vlsi architecture design of proportionate type lms adaptive fil...
Algorithm and vlsi architecture design of proportionate type lms adaptive fil...Algorithm and vlsi architecture design of proportionate type lms adaptive fil...
Algorithm and vlsi architecture design of proportionate type lms adaptive fil...
 
A reconfigurable ldpc decoder optimized applications
A reconfigurable ldpc decoder optimized applicationsA reconfigurable ldpc decoder optimized applications
A reconfigurable ldpc decoder optimized applications
 
A flexible wildcard pattern matching accelerator via simultaneous discrete fi...
A flexible wildcard pattern matching accelerator via simultaneous discrete fi...A flexible wildcard pattern matching accelerator via simultaneous discrete fi...
A flexible wildcard pattern matching accelerator via simultaneous discrete fi...
 
A closed form expression for minimum operating voltage of cmos d flip-flop
A closed form expression for minimum operating voltage of cmos d flip-flopA closed form expression for minimum operating voltage of cmos d flip-flop
A closed form expression for minimum operating voltage of cmos d flip-flop
 
A 128 tap highly tunable cmos if finite impulse response filter for pulsed ra...
A 128 tap highly tunable cmos if finite impulse response filter for pulsed ra...A 128 tap highly tunable cmos if finite impulse response filter for pulsed ra...
A 128 tap highly tunable cmos if finite impulse response filter for pulsed ra...
 
A 12 bit 40-ms s sar adc with a fast-binary-window dac switching scheme
A 12 bit 40-ms s sar adc with a fast-binary-window dac switching schemeA 12 bit 40-ms s sar adc with a fast-binary-window dac switching scheme
A 12 bit 40-ms s sar adc with a fast-binary-window dac switching scheme
 
Nxfee Innovation Brochure
Nxfee Innovation BrochureNxfee Innovation Brochure
Nxfee Innovation Brochure
 

Recently uploaded

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 

Recently uploaded (20)

Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 

Fast neural network training on fpga using quasi newton optimization method

  • 1. NXFEE INNOVATION (SEMICONDUCTOR IP &PRODUCT DEVELOPMENT) (ISO : 9001:2015Certified Company), # 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam, Pondicherry– 605004, India. Buy Project on Online :www.nxfee.com | contact : +91 9789443203 | email : nxfee.innovation@gmail.com _________________________________________________________________ Fast Neural Network Training on FPGA Using Quasi-Newton Optimization Method Abstract: In this brief, a customized and pipelined hardware implementation of the quasi-Newton (QN) method on field-programmable gate array (FPGA) is proposed for fast artificial neural networks onsite training, targeting at the embedded applications. The architecture is scalable to cope with different neural network sizes while it supports batch-mode training. Experimental results demonstrate the superior performance and power efficiency of the proposed implementation over CPU, graphics processing unit, and FPGA QN implementations. Software Implementation:  Modelsim  Xilinx 14.2 Existing System: Field-programmable gate arrays (FPGAs) have been considered as a promising alternative to implement artificial neural networks (ANNs) for high-performance ANN applications because of their massive parallelism, high reconfigurability (compared to application specific integrated circuits), and better energy efficiency [compared to graphics processing units (GPUs)]. However, the majority of the existing FPGA-based ANN implementations was static implementations for specific offline applications without learning capability. While achieving high performance, the static hardware implementations of ANN suffer from low adaptability for a wide range of applications. Onsite training provides a dynamic training flexibility for the ANN implementations.
  • 2. NXFEE INNOVATION (SEMICONDUCTOR IP &PRODUCT DEVELOPMENT) (ISO : 9001:2015Certified Company), # 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam, Pondicherry– 605004, India. Buy Project on Online :www.nxfee.com | contact : +91 9789443203 | email : nxfee.innovation@gmail.com _________________________________________________________________ Currently, high-performance CPUs and GPUs are widely used for offline training, but not suitable for onsite training especially in embedded applications. It is complex to implement onsite training in hardware, due to the following reasons. First, some software features, such as floating-point number representations or advanced training techniques, are not practical or expensive to be implemented in hardware. Second, the implementation of batch-mode training increases the design complexity of hardware. Batch training is that the ANN’s weight update is based on training error from all the samples in training data set, which enables good convergence and efficiently prevents from sample data perturbation attacks. However, the implementation of batch training on an FPGA platform requires large data buffering and quick intermediate weights computation. There have been FPGA implementations of ANNs with online training. These implementations used backpropagation (BP) learning algorithm with non batch training. Although widely used, the BP algorithm converges slowly, since it is essentially a steepest descent method. Several advanced training algorithms, such as conjugate gradient, quasi-Newton (QN), and Levenberg–Marquardt, have been proposed to speed up the converging procedure and reduce training time, at the cost of memory resources. This brief provides a feasible solution to the challenges in hardware implementation of onsite training by implementing the QN training algorithm and supporting batch-mode training on the latest FPGA. 
Our previous work implemented the Davidon–Fletcher–Powell QN (DFP-QN) algorithm on FPGA and achieved a 17× speedup over the software implementation. To further improve performance, this brief proposes a hardware implementation of the Broyden–Fletcher–Goldfarb–Shanno QN (BFGS-QN) algorithm on FPGA for ANN training. To the best of our knowledge, this is the first reported FPGA hardware accelerator of the BFGS-QN algorithm in a deeply pipelined fashion. The performance of the implementation is analyzed quantitatively. The architecture is designed with low hardware cost in mind, for scalable performance, and for both onsite and offline training.
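For reference, the BFGS update of the Hessian approximation — the computation that the hardware's B-matrix block realizes — can be written in its standard textbook form (shown here for orientation only; the brief's Algorithm 1 is not reproduced in this text and may use the equivalent inverse-Hessian variant):

```latex
B_{k+1} = B_k
        + \frac{y_k y_k^{\mathsf{T}}}{y_k^{\mathsf{T}} s_k}
        - \frac{B_k s_k s_k^{\mathsf{T}} B_k}{s_k^{\mathsf{T}} B_k s_k},
\qquad
s_k = w_{k+1} - w_k, \quad y_k = g_{k+1} - g_k,
```

where $w_k$ are the weights and $g_k$ the gradients at iteration $k$. Every term is built from matrix-by-vector products such as $B_k s_k$ and vector outer/inner products, which is why MVM and VVM units dominate the hardware cost described below.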
The designed accelerator is applied to ANN training and is able to cope with different network sizes. The experimental results show that the proposed accelerator achieves a performance improvement of up to 105× over the CPU software implementation for various neural network sizes. The hardware implementation is also applied to two real-world applications, showing superior performance.

Disadvantages:

 Performance is not efficient
 Power efficiency is lower

Proposed System:

FPGA Hardware Implementation

Because the computations of B, λ, and g are the three computationally intensive parts of the BFGS-QN algorithm, we mainly describe how these three parts are designed.

B Matrix Computation

The B computation (BC) block derives an n × n matrix, where n is the total number of weights, according to step 5 of Algorithm 1. The most computationally intensive operations are matrix-by-vector multiplications (MVMs). For scalable performance and a modularized architecture, each MVM is implemented as multiple vector-by-vector multiplications (VVMs). Three on-chip RAM units, each storing n words, are used to
store the intermediate computation results, which are repeatedly used in subsequent computations.

Fig. 1. Flowchart of the GSS algorithm.
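The MVM-as-multiple-VVMs decomposition can be sketched as follows (illustrative Python, not the RTL; function names are ours). Each row of B feeds one dot product, so one pipelined VVM unit can be reused row by row, which is what lets the design scale with the weight count n.

```python
def vvm(a, b):
    """Vector-by-vector (dot) product: the unit the hardware pipelines."""
    assert len(a) == len(b)
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y          # in hardware: one pipelined FP multiply-add
    return acc

def mvm(B, s):
    """Matrix-by-vector multiply expressed as n independent row-wise VVMs."""
    return [vvm(row, s) for row in B]

# Example: a 3x3 matrix times a vector.
B = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 3.0]]
print(mvm(B, [1.0, 1.0, 1.0]))  # [1.0, 2.0, 3.0]
```

Because the VVMs are independent, they can also run in parallel if more VVM units are instantiated, trading area for throughput.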
Fig. 2. Architecture of the λ computation block. (a) Forward–backward block for determining a search interval [a0, b0]. (b) GSS block. The operations in the dashed blocks share the same hardware.

In addition, an n-word first-in–first-out (FIFO) buffer is used to match the two input data streams of the final addition and form Bk+1 row by row, where L_mult, L_sub, and L_div are the latencies of the multiplier, subtractor, and divider, respectively, and T_vector is the number of execution cycles of a VVM. A deeply pipelined VVM unit is implemented with T_vector = n + L_add ⌈log2 L_add⌉ + 2, where the last two terms account for draining the pipeline.

2) Step Size λ Computation: An exact line search method is implemented to find λk according to step 2 of Algorithm 1. The method includes two submodules: the forward–backward algorithm, which determines an initial search interval containing the minimizer, and the GSS algorithm (Fig. 1), which reduces the interval iteratively. Fig. 2 shows the hardware architecture, where the computation time of each block is analyzed. The
architecture comprises a number of pipelined arithmetic operations and the objective function E_T evaluation.

Fig. 3. (a) Architecture of the objective function evaluation module. (b) Microarchitecture of the neural model computation.
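The two-stage line search of Fig. 1 and Fig. 2 can be sketched in software as follows (hedged: the interval-growth factor, initial step, and tolerance are illustrative choices of ours, not the brief's parameters). Forward–backward bracketing grows the step until the objective increases, then golden-section search (GSS) shrinks the bracket around the minimizer.

```python
import math

GOLD = (math.sqrt(5) - 1) / 2  # ~0.618, the golden ratio used by GSS

def forward_backward(f, lam0=0.0, step=0.1, grow=2.0):
    """Grow the step until f rises, bracketing a minimizer in [a0, b0]."""
    a, fa = lam0, f(lam0)
    b, fb = lam0 + step, f(lam0 + step)
    if fb > fa:                      # "backward": search the other direction
        a, b, fb = b, a, fa
        step = -step
    while True:
        step *= grow
        c, fc = b + step, f(b + step)
        if fc > fb:                  # objective rose: minimizer is bracketed
            return (min(a, c), max(a, c))
        a, b, fb = b, c, fc

def gss(f, a, b, tol=1e-6):
    """Golden-section search: iteratively shrink [a, b] around the minimizer."""
    x1, x2 = b - GOLD * (b - a), a + GOLD * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - GOLD * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + GOLD * (b - a)
            f2 = f(x2)
    return (a + b) / 2

# In the accelerator, f(λ) is the objective E_T evaluated at w + λ·d;
# a toy 1-D objective stands in for the objective function block here.
phi = lambda x: (x - 0.7) ** 2
lam = gss(phi, *forward_backward(phi))
```

Each GSS iteration reuses one of the two previous function evaluations, so only one new objective evaluation is needed per shrink step, which is why the hardware shares the dashed-block arithmetic between the two sub-blocks in Fig. 2.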
Fig. 4. Architecture of the GC unit.

The dashed blocks in Fig. 2(a) and (b) are the same piece of hardware, shared by the corresponding operations. The complex objective function (1) needs to be evaluated frequently during λ computation. Therefore, we also implement the objective function evaluation in hardware as a separate block for high performance. It can be modified for different types of ANNs while the other parts of the implementation remain the same. As shown in Fig. 3(a), the block is implemented in two parts: the first part exercises the ANN to obtain its outputs, and the second part computes the training error. Fig. 3(b) shows the structure of the first part. Two accumulation units are implemented to obtain h and ŷ. Note that the accumulation unit is implemented differently from the one in the VVM unit, using variable-length shift registers.

TABLE I. Storage space required by each module.

TABLE II. FP units required in the implementation.
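The two-part objective evaluation of Fig. 3(a) — exercise the ANN for every sample, then accumulate the training error — can be modeled as below (illustrative sketch; the sigmoid activation, linear output layer, and sum-of-squares error are common choices assumed by us, not confirmed by the brief).

```python
import math

def neuron_layer(x, W, b):
    """One hidden layer: h = W·x + b followed by a sigmoid activation.

    The inner sums correspond to the h-accumulation unit of Fig. 3(b).
    """
    h = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
         for row, b_j in zip(W, b)]
    return [1.0 / (1.0 + math.exp(-v)) for v in h]

def training_error(samples, W, b, V):
    """Part 1 exercises the network for each sample; part 2 accumulates
    the sum-of-squares training error E_T over the whole batch."""
    E = 0.0
    for x, y in samples:
        hidden = neuron_layer(x, W, b)
        y_hat = sum(v_i * h_i for v_i, h_i in zip(V, hidden))  # linear output
        E += 0.5 * (y_hat - y) ** 2
    return E
```

Swapping `neuron_layer` for a different network model changes only this block, mirroring the brief's point that the objective evaluation can be modified per ANN type while the rest of the design stays the same.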
Resource Usage Analysis

During the computation, intermediate results are accessed in different patterns: 1) some results, such as w and g, which feed multiple calculation modules, are read more than once, and 2) values such as h, F(h), and δ are read in a different order from that in which they are written. Batch training must also be supported. Therefore, all intermediate results are buffered in FPGA on-chip memories for pipelined and parallel computation and for data reuse. FPGA on-chip dual-port RAMs implement the buffers. The storage space required by each module is shown in Table I. As n increases, off-chip memory will be used, and a properly designed memory hierarchy spanning on-chip and off-chip memories is needed to prevent a slowdown. In addition, for applications with a large number of training samples, all training data are stored in off-chip memories. The overall datapath of the BFGS-QN hardware design needs a number of floating-point (FP) units, as shown in Table II. The number of required FP units is independent of n.

Advantages:

 Performance is efficient
 Power efficiency is higher