e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:09/September-2021 Impact Factor- 6.752 www.irjmets.com
www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science
[872]
FPGA IMPLEMENTATION OF APPROXIMATE SOFTMAX FUNCTION FOR
EFFICIENT CNN INFERENCE
Mohammed Abdullah Mubarak Alshahrani*1
*1Department of Electrical and Computer Engineering, King Abdulaziz University,
Jeddah, Makkah, Saudi Arabia.
ABSTRACT
Softmax function is an integral part of object detection frameworks based on most deep or shallow neural
networks. While the configuration of different operation layers in a neural network can be quite different,
softmax operation is fixed. With the recent advances in object detection approaches, especially with the
introduction of highly accurate convolutional neural networks, researchers and developers have suggested
different hardware architectures to speed up the overall operation of these compute-intensive algorithms.
Xilinx, one of the leading FPGA vendors, has recently introduced a deep neural network development kit for
exactly this purpose. However, due to the complex nature of softmax arithmetic hardware involving
exponential function, this functionality is only available for bigger devices. For smaller devices, this operation is
bound to be implemented in software. In this paper, a light-weight hardware implementation of this function
has been proposed which does not require too many logic resources when implemented on an FPGA device.
The proposed design is based on the analysis of the statistical properties of a custom convolutional neural
network when used for classification on a standard dataset i.e. CIFAR-10. Specifically, instead of using a brute
force approach to design a generic full precision arithmetic circuit for the SoftMax function using real numbers, an
approximate integer-only design has been suggested for the limited range of operands encountered in real-world
scenarios. The approximate circuit uses fewer logic resources since it involves computing only a few
terms of the series expansion of the exponential function. However, despite using fewer terms, the function
has been shown to work as well as the full precision circuit for classification and leads to only minimal error
being introduced in the associated probabilities. The circuit has been synthesized using Hardware Description
Language (HDL) Coder and Vision HDL toolboxes in Simulink® by Mathworks® which provide higher level
abstraction of image processing and machine learning algorithms for quick deployment on a variety of target
hardware. The final design has been implemented on a Xilinx FPGA development board i.e. Zedboard which
contains the necessary hardware components such as USB, Ethernet and HDMI interfaces etc. to implement a
fully working system capable of processing a machine learning application in real-time.
Keywords: FPGA, High-Level Synthesis, Machine Learning, Convolutional Neural Networks, SoftMax.
I. INTRODUCTION
The rise of machine learning applications for object classification in images and videos has raised the demand for
real-time realization of such complex algorithms on dedicated hardware. In recent years, Convolutional
Neural Networks (CNN) have made their mark as the most effective machine learning algorithm for
classification in various domains. Most CNN architectures use SoftMax as the final layer before the
output one. Softmax is a normalization operation which transforms arbitrary input operands into
points of a probability distribution which add up to one. Mathematically, it is represented as,
$$\sigma(x)_i = \frac{e^{x_i}}{\sum_{k=1}^{j} e^{x_k}} \tag{1}$$
where ‘x’ is a ‘j’-dimensional input vector and ‘σ(x)_i’ is the perceived output probability of its ‘ith’ element.
Thus, for a classification operation with ‘j’ number of classes, the softmax function can be used to calculate the
relative confidence score of the classifier for each class. This can be understood from the standpoint of a neural
network [1] with certain hidden processing layers and an output classification layer such as that depicted in
Fig. 1.
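As a point of reference for eq. (1), the following is a minimal full-precision sketch in Python. The function name `softmax` and the example scores are illustrative assumptions and are not taken from the paper.

```python
import math

def softmax(x):
    """Full-precision softmax: exponentiate each score, then normalize."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical classifier outputs for three classes.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(probs)       # relative confidence score for each class
print(sum(probs))  # the scores add up to one (within floating-point rounding)
```

The exponential and the division in this sketch are the two operations that make a direct hardware implementation expensive.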
Figure 1: Data flow through a typical neural network
Typically, the outputs of a neural network depicting a given class (e.g. a dog or a cat) are independent logistic
regression classifiers with values within the range [0 1]. In order to convert these independent output values to
a probability vector, the SoftMax function given by eq. (1) is employed. Without this function, the different
classes will each have an independent real-valued probability but the results will not necessarily add up to one,
leading to difficulty in interpreting the results. This function is a generalization of the logistic function used in
logistic regression classifiers so that it can be used for multiple classes. While other layers of neural networks
employ simpler multiplication and addition arithmetic units, Softmax is particularly complex in nature due to
the inclusion of exponential and division operations. Thus, several hardware-based implementations suffer from
exorbitantly large resource utilization and poor performance due to long logic delays. To this end, many
researchers have proposed approximate implementations of this unit which necessarily result in loss of result
precision. Thus, there is a need to explore a variety of circuit design techniques to lower the computational
complexity of this inevitable component of neural networks while keeping the result precision at the acceptable
levels. Moreover, given the complex nature of the parent neural networks themselves, there is a need to
simplify the overall design process as well so that the design and test procedures could be completed within
reasonable time. This calls for employing high-level synthesis tools and developing the frameworks to make
this process easier for neural network experts less acquainted with the hardware design process. In this work,
we have designed a SoftMax implementation that is efficient in terms of resource utilization of hardware logic
circuits while keeping the result accuracy at the acceptable levels. The framework has been developed using
high-level synthesis and testing tools.
II. LITERATURE REVIEW
Given the importance of SoftMax function in deep neural networks, several research works have considered its
hardware implementation using different strategies. Recently, Li et al. [2] have described an FPGA-based
hardware approximate implementation using Look Up Tables (LUT) and piece-wise linear interpolation. This
work is based on a pipelined approach and uses multistage Wallace and other multiplier structures to speed up
the overall computation. Such an intricate structure is needed since the Softmax function requires multiple
exponential, addition, multiplication and division units. Similarly, Kouretas and Paliouras [3] have described
another approximate computing architecture with adaptive approximation to trade off complexity against
accuracy of the results. Realizing the high complexity, long critical path delay and associated overflow problem,
Yuan [4] has suggested using down-scaling and domain transformation to eliminate the aforementioned
problems. Along the same lines, Du et al. [5] have described a Softmax implementation based on LUTs and an
arithmetic unit to calculate the natural logarithm using Maclaurin series expansion. Since the main computational
unit in the Softmax function is the exponent, its direct implementation using well-known hardware design
techniques is also relevant. Thus, a CORDIC algorithm-based FPGA implementation as suggested by Rekha and
Menon [6] and a short Taylor expansion-based implementation proposed by Jamro et al. [7] also provide
further insight into the problem of speeding up this expensive operation using a dedicated hardware
implementation. Other hardware implementation techniques for arithmetic units such as Distributed
Arithmetic [8] and Common Sub-expression sharing [9] etc. can also be considered for reducing the associated
circuit complexity.
III. METHODOLOGY
In this work, we propose to build a SoftMax hardware accelerator on an FPGA device and consider
various resource saving techniques mentioned in the literature to arrive at the optimal configuration
suitable for current generation deep neural networks. Specifically, signal statistics have been analyzed to design
the most suitable hardware structure for this important function while conserving precious logic resources. For
this purpose, a popular standard dataset for the image recognition task has been selected i.e. CIFAR-10 [11]. For
experimentation on this dataset, a custom CNN has been trained in the Matlab environment. This deep neural
network uses residual links for faster and more efficient training [12] and is depicted in Figure 2.
Figure 2: Custom Residual CNN for CIFAR-10 Dataset
In this work, we have focused only on the hardware implementation of the Softmax layer which is an essential
component of the CNN architecture and determines the final output class as shown in Figure 2. However, given
the complexity of the exponent function of eq. (1), its corresponding hardware implementation is too complex.
Thus, to come up with a low complexity hardware, we have analyzed the signal statistics of the inputs to this
function for the CIFAR-10 dataset and their relation to the final detection accuracy of the whole CNN
architecture.
Figure 3: Confusion matrix for the CIFAR-10 dataset with full precision SoftMax function
Figure 3 shows the confusion matrix obtained when the custom CNN shown in Figure 2 is used to process the CIFAR-10
dataset with full precision SoftMax implementation. The overall classification error is 9.54% on the validation set and
2.62% on the training set.
Figure 4: Histogram of input values to the SoftMax function for CIFAR-10 dataset
Figure 4 illustrates the data statistics collected for CIFAR-10 dataset when processed using the custom CNN
architecture shown in Figure 2. From the statistics it can be clearly seen that the bulk of the input values are
within a narrow range around 0 i.e. [-10 10]. Thus, it does not seem appropriate to design a hardware circuit for
a generic wider range of inputs because that leads to a very complex hardware circuit. Since the exponent
function is the main complicated operation in eq. (1), we keep our discussion focused on this operation only. It
can be clearly seen in Figure 5 that if the whole input range of [-20 40] is considered, the full precision
exponent function has a very wide output range i.e. [0 2.5 × 10^17]. This requires enormous logic resources to
implement the functionality in hardware. Moreover, the input operands are real numbers with fractional parts
as well. This necessitates the use of floating- or fixed-point number formats which adds to the circuit
complexity even more. Realizing these two important properties of the input operands, i.e. the range as well as
the format, various approximations have been applied systematically in this work to reduce the computational
load without affecting the classification accuracy beyond reasonable levels. Moreover, the exponent function
itself can be approximated using its series expansion with truncation. However, all these approximations,
i.e. limiting the range, quantizing to integer-only numbers and truncating the series expansion have to be
analyzed for the combined effect on the result precision when applied to the real-world scenario. For this
purpose, as mentioned earlier, the custom CNN architecture with approximate SoftMax function has been
evaluated on the standard CIFAR-10 dataset.
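The resource argument above can be made concrete with a rough bit count. The sketch below is only an estimate under stated assumptions: an unsigned fixed-point output with enough integer bits to hold e^hi and enough fractional bits to resolve e^lo. The helper name `exp_output_bits` is hypothetical and does not come from the paper.

```python
import math

def exp_output_bits(lo, hi):
    """Rough (integer_bits, fraction_bits) needed to hold exp(x) for x in [lo, hi]
    in an unsigned fixed-point format. Assumes lo <= 0 <= hi."""
    log2_e = 1.0 / math.log(2.0)
    int_bits = math.floor(hi * log2_e) + 1   # bits to hold exp(hi)
    frac_bits = math.ceil(-lo * log2_e)      # bits to resolve exp(lo)
    return int_bits, frac_bits

print(exp_output_bits(-20, 40))  # full input range quoted in the text
print(exp_output_bits(0, 15))    # a reduced, non-negative integer range
```

Under these assumptions the full range [-20 40] needs about 58 integer bits plus 29 fractional bits, whereas a range of [0 15] needs about 22 integer bits and no fractional bits, which illustrates why limiting the operand range pays off.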
Figure 5: Exponent function plot for the range of input values to SoftMax function
As mentioned earlier, the main arithmetic units in the SoftMax function are the exponential and division
functions. The hardware design entry of complex functions such as that of Softmax and its integration within
the larger deep neural network is, however, too complex to be handled using conventional hardware
description language (HDL) approach. Thus, we have employed the high-level synthesis tool supplied by
Mathworks in Simulink i.e. HDL Coder toolbox. HDL Coder generates the HDL code (Verilog or VHDL)
automatically which can then be incorporated as a hardware accelerator in a larger system employing both
software (CPU) and hardware accelerator combined, called “Hardware-Software Co-Design”. One big advantage
of using Simulink for such designs is that the environment can be easily set up for simulation using real images
(datasets) and the functionality can be tested before finally incorporating it into the hardware. The whole
hardware-software co-design system has been implemented on an FPGA SoC i.e. Zedboard which contains both
processor and programmable sections. The main application for deep neural network runs on the processor
while the Softmax accelerator has been implemented on the FPGA logic fabric accessible to the software
through standard bus interface i.e. AXI interconnect.
Figure 6: Approximate exponent function employed as a hardware accelerator within a
hardware-software co-design in an FPGA
Figure 7 shows the whole hardware system with the Zedboard FPGA platform where a full hardware-software
co-design has been ported. This system uses Ubuntu operating system to provide the user-interface while the
hardware portion incorporates an accelerator for the desired functionality. The hardware and software work
seamlessly together through the standard bus interface as shown in Figure 6 above.
Figure 7: The Hardware-Software Co-Design implemented on Zedboard FPGA for deep neural network
processing on live video stream
IV. RESULTS AND DISCUSSION
As mentioned in the previous section, various approximations can be applied to the SoftMax operation to lower
its computational demand while keeping the result accuracy high. In this section, the effect of these different
approximation methods has been analyzed when applied to real-world scenario.
Approximating SoftMax Function through Integer-Only Operations
The first approximation technique applied to the whole CNN inference framework on standard CIFAR-10
dataset is the conversion of input operands to the integer-only format by dropping the fractional part without
rounding as given by,
$$\hat{x} = \lfloor x \rfloor \tag{2}$$
Although rounding may seem a better method than truncation, truncation did not cause any significant drop in the
overall accuracy of the CNN classifier: very similar classification error rates of 9.56% and 2.89% were observed on the validation
set and the training set respectively. There were, however, negligible errors in the calculation of confidence
scores in the validation set as depicted in the histogram of errors shown in Figure 8. It can be noticed that
almost all the confidence scores had zero error with only a tiny percentage (0.21 %) showing errors as little as
0.01. Thus, it can be safely concluded that integer-only operation does not affect the result accuracy
significantly while reducing the computational load from floating point operation to integer operation.
Figure 8: Histogram of errors in confidence score of the validation set introduced due
to integer-only operation
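The effect of dropping the fractional part can be tried out on a small example. The sketch below is an illustration only: the scores are hypothetical, the helper names are assumptions, and `math.floor` is used as one possible reading of "dropping the fractional part" (it differs from truncation toward zero for negative inputs).

```python
import math

def softmax(x):
    """Full-precision softmax used as the reference."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_integer_inputs(x):
    """Softmax evaluated on operands whose fractional part has been dropped."""
    return softmax([math.floor(v) for v in x])

# Hypothetical real-valued inputs to the SoftMax layer.
scores = [7.8, 2.3, -1.6, 0.4]
exact = softmax(scores)
approx = softmax_integer_inputs(scores)

max_err = max(abs(a - b) for a, b in zip(exact, approx))
same_class = exact.index(max(exact)) == approx.index(max(approx))
print(max_err, same_class)
```

For this particular input the predicted class is unchanged and the confidence scores move by well under 0.01, which is in line with the trend reported above; other inputs may behave differently.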
Approximating SoftMax Function through Limiting the Operand Range
The second approximation considered in this work is limiting the range of input operands. The first step in this
regard is limiting the operands to only positive integers i.e. [0: ∞]. Later, the range is systematically reduced to
[0: 31], [0: 15] and [0: 7] to correspond to binary bit representation of 5-bits, 4-bits and 3-bits respectively.
These successive approximations on top of using integer-only operands lead to increasingly larger errors in
both classification accuracy and confidence scores. The errors have been reported in Figures 9 to 12 and Table
1. It can be observed that the result fidelity in both the classification accuracy and the confidence scores has
been largely preserved for integer range up to [0: 15] and takes a significant hit below that. Thus, if the range is
limited to integers between a very narrow range i.e. [0: 7], the classification error increases to 22.5% while the
confidence scores can have an error up to 3.9%. The corresponding histograms of error also show the same
trend. In Figure 12, it can be seen that drastically reducing the input operand range to [0: 7] leads to higher
occurrences of non-zero errors. From this data, it can be concluded that the integer-only range [0: 15] can be
used safely with an acceptable range of error introduced due to the approximation in input operand
representation. This leads to significant savings in computation resources since only 4 bits are required
for number representation compared to the original floating point representation requiring at least 32 bits for
single precision representation.
Figure 9: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: ∞]
Figure 10: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 31]
Figure 11: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15]
Figure 12: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 7]
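The range limiting described in this subsection can be sketched as truncation followed by saturation. This is an assumption about how out-of-range values are handled (the text only gives the ranges), and the function name `quantize_operand` and the example scores are hypothetical.

```python
import math

def quantize_operand(v, bits):
    """Drop the fractional part of v and saturate it to [0, 2**bits - 1]."""
    upper = (1 << bits) - 1
    return min(max(math.floor(v), 0), upper)

# Hypothetical SoftMax inputs for three classes.
scores = [12.7, 9.2, -3.4]
for bits in (5, 4, 3):
    print(bits, [quantize_operand(v, bits) for v in scores])
```

With 5 or 4 bits the two largest scores stay distinct (12 and 9), but with 3 bits both saturate to 7 and can no longer be told apart. This is one plausible mechanism behind the sharp rise in classification error for the range [0: 7].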
Approximating SoftMax Function through series implementation
One of the most common techniques used in the literature to approximate exponent function is through
truncation of its series expansion. Precisely, the Maclaurin series expansion of the exponent function is given as
follows,
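$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} \tag{3}$$
$$e^x \approx 1 + x \tag{4}$$
$$e^x \approx 1 + x + \frac{x^2}{2} \tag{5}$$
$$e^x \approx 1 + x + \frac{x^2}{2} + \frac{x^3}{6} \tag{6}$$
$$e^x \approx 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} \tag{7}$$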
The infinite series for exponent function given by Eq. 3 can be approximated by first 2, 3, 4 or 5 terms as in
equations 4, 5, 6 and 7 respectively. The results of using these approximations along with the earlier
approximations i.e. integer-only operation with range limited to [0:15] have been reported in Figures 13 to 16
and Table 1. It can be noticed that although using only a two-term approximation does not lead to any
degradation of classification accuracy, a 10.33% error has been introduced in the confidence level scores. Using a
three-term approximation leads to a 5.1% error while using four terms gives a 2.85% error. A 1.8% error is obtained
when using five terms. With each additional term, the complexity of the operation grows. We, however,
conclude that using three or four terms is sufficient since the error is within acceptable range. Using five terms
gives a very low error but the additional complexity over four terms is not justified. To further reduce the
hardware complexity associated with division operation, it is suggested to use the nearest power-of-two
coefficients in eq. (6) to give,
$$e^x \approx 1 + x + \frac{x^2}{2} + \frac{x^3}{8} \tag{8}$$
As seen from the data in Table 1, this approximation does not affect the detection accuracy while only
marginally affecting the confidence scores.
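The truncated series can be explored with a short sketch. The helper names `exp_series` and `softmax_with` and the operand values are illustrative assumptions; the operands are taken to be integers already limited to [0: 15].

```python
import math

def exp_series(x, terms):
    """Sum of the first `terms` terms of the Maclaurin series of exp(x)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # next term: x**(n+1) / (n+1)!
    return total

def softmax_with(exp_fn, x):
    """Softmax normalization using a caller-supplied exponent function."""
    vals = [exp_fn(v) for v in x]
    total = sum(vals)
    return [v / total for v in vals]

# Hypothetical integer operands in [0, 15].
operands = [12, 9, 3, 0]
exact = softmax_with(math.exp, operands)
for terms in (2, 3, 4, 5):
    approx = softmax_with(lambda v: exp_series(v, terms), operands)
    max_err = max(abs(a - b) for a, b in zip(exact, approx))
    same_class = exact.index(max(exact)) == approx.index(max(approx))
    print(terms, max_err, same_class)
```

For non-negative operands every truncated polynomial has positive coefficients and is therefore strictly increasing, so the largest operand still receives the largest score. This is consistent with the classification error staying unchanged while only the confidence scores are affected.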
Figure 13: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15] with 2 term approximation of exponent function
Figure 14: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15] with 3 term approximation of exponent function
Figure 15: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15] with 4 term approximation of exponent function
Figure 16: Histogram of errors in confidence score of the validation set introduced due to integer-only
operation limited to the range [0: 15] with 5 term approximation of exponent function
Table 1. Comparison of different approximation techniques using error on CIFAR-10 Dataset
| SN. | Method | Validation Classification Error | Validation Confidence Score Error |
|-----|--------|---------------------------------|-----------------------------------|
| 1 | Full Precision | 9.54 % | 5 × 10^-7 % |
| 2 | Integer-Only | 9.56 % | 0.21 % |
| 3 | Integer-Only [0: ∞] | 9.56 % | 0.24 % |
| 4 | Integer-Only [0: 31] | 9.56 % | 0.24 % |
| 5 | Integer-Only [0: 15] | 9.73 % | 0.31 % |
| 6 | Integer-Only [0: 7] | 22.5 % | 3.9 % |
| 7 | Integer-Only [0: 15], 2 series terms | 9.73 % | 10.33 % |
| 8 | Integer-Only [0: 15], 3 series terms | 9.73 % | 5.1 % |
| 9 | Integer-Only [0: 15], 4 series terms | 9.73 % | 2.85 % |
| 10 | Integer-Only [0: 15], 5 series terms | 9.73 % | 1.8 % |
| 11 | Integer-Only [0: 15], 4 series terms with power-of-two coefficients | 9.73 % | 3.0 % |
The proposed approximate exponent function for SoftMax operation in the custom CNN architecture has been
implemented in Simulink HDL Coder to generate its HDL code for use in the hardware-software co-design
shown in Figure 6 above. The schematic of this proposed design has been depicted in Figure 17. This is the
implementation of the proposed design given by eq. (8) and gives an error of 9.73 % in detection accuracy and
3.0 % in confidence scores when tested on CIFAR-10 dataset (Table 1).
Figure 17: High-Level circuit diagram for the approximate exponent function using Simulink HDL Coder
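A software model of a shift-and-add exponent of this kind might look like the sketch below. It is not the generated HDL and not necessarily the circuit of Figure 17. It assumes that the power-of-two coefficients are 1/2 and 1/8 (the nearest powers of two to 1/2 and 1/6), and it scales the polynomial by 8 so that every term is an integer; the common factor cancels in the SoftMax ratio. The function names are hypothetical.

```python
def exp_pow2_scaled(x):
    """Return 8 * (1 + x + x**2/2 + x**3/8) for a non-negative integer x,
    using only multiplications, left shifts and additions."""
    x2 = x * x
    x3 = x2 * x
    return 8 + (x << 3) + (x2 << 2) + x3

def softmax_pow2(operands):
    """SoftMax built on the scaled integer exponent; the factor 8 cancels."""
    vals = [exp_pow2_scaled(v) for v in operands]
    total = sum(vals)
    return [v / total for v in vals]

# Hypothetical 4-bit operands in [0, 15].
print([exp_pow2_scaled(v) for v in (0, 1, 2, 15)])
print(softmax_pow2([12, 9, 3, 0]))
```

With 4-bit operands the largest scaled value is 4403, which fits in 13 bits, so the exponent needs no divider and no fractional arithmetic. In this sketch only the final normalization still requires a division.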
V. CONCLUSION
An approximate circuit for implementation of SoftMax function as used in standard CNN architectures has been
described in this work. For this purpose, various approximation techniques related to the range and type of
operands and series expansion of exponent function have been employed. The considered techniques have
been motivated by the actual signal statistics gathered while processing a real world standard dataset i.e.
CIFAR-10. To test the setup, a custom CNN has been trained and tested with the proposed approximations
implemented in the full system. The results show that the proposed approximations lead to negligible loss in
CNN’s detection accuracy as well as the confidence scores while reducing the circuit complexity significantly.
VI. REFERENCES
[1] Online link: Super Data Science, “Convolutional Neural Networks (CNN): Softmax & Cross-Entropy”
available at https://www.superdatascience.com/blogs/convolutional-neural-networks-cnn-softmax-
crossentropy, accessed 28th Nov, 2020.
[2] Z. Li, H. Li, X. Jiang, B. Chen, Y. Zhang and G. Du, "Efficient FPGA Implementation of Softmax Function for
DNN Applications," 2018 12th IEEE International Conference on Anti-counterfeiting, Security, and
Identification (ASID), Xiamen, China, 2018, pp. 212-216
[3] Kouretas, I.; Paliouras, V. Hardware Implementation of a Softmax-Like Function for Deep Learning.
Technologies 2020, 8, 46
[4] Yuan, B. “Efficient hardware architecture of softmax layer in deep neural network.” 2016 29th IEEE
International System-on-Chip Conference (SOCC) (2016): 323-326.
[5] Gaoming Du, Chao Tian, Zhenmin Li, Duoli Zhang, Yongsheng Yin, and Yiming Ouyang, “Efficient
Softmax Hardware Architecture for Deep Neural Networks”, in Proceedings of the 2019 Great Lakes
Symposium on VLSI (GLSVLSI '19).
[6] R. Rekha and K. P. Menon, "FPGA implementation of exponential function using cordic IP core for
extended input range," 2018 3rd IEEE International Conference on Recent Trends in Electronics,
Information & Communication Technology (RTEICT), Bangalore, India, 2018, pp. 597-600
[7] E. Jamro, K. Wiatr and M. Wielgosz, "FPGA Implementation of 64-Bit Exponential Function for HPC,"
2007 International Conference on Field Programmable Logic and Applications, Amsterdam, 2007, pp.
718-721
[8] NagaJyothi, Grande & Sriadibhatla, Sridevi. (2017). Distributed arithmetic architectures for FIR filters-
A comparative review. 2684-2690. 10.1109/WiSPNET.2017.8300250.
[9] Chip-Hong Chang and Mathias Faust, "A new common subexpression elimination algorithm for
realizing low-complexity higher order digital filters". Trans. Comp.-Aided Des. Integ. Cir. Sys. 29, 5 (May
2010), pp. 844–848.
[10] Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer,
“SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size”,
arXiv:1602.07360.
[11] Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, 2009.
[12] https://www.mathworks.com/help/deeplearning/ug/train-residual-network-for-image-
classification.html

mlaij
 
Cloud Based Datacenter Network Acceleration Using FPGA for Data-Offloading
Cloud Based Datacenter Network Acceleration Using FPGA for Data-Offloading Cloud Based Datacenter Network Acceleration Using FPGA for Data-Offloading
Cloud Based Datacenter Network Acceleration Using FPGA for Data-Offloading
Onyebuchi nosiri
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
An octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passingAn octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passing
eSAT Journals
 
978-1-4577-1343-912$26.00 ©2014 IEEE Reliability an.docx
978-1-4577-1343-912$26.00 ©2014 IEEE  Reliability an.docx978-1-4577-1343-912$26.00 ©2014 IEEE  Reliability an.docx
978-1-4577-1343-912$26.00 ©2014 IEEE Reliability an.docx
evonnehoggarth79783
 

Similar to FPGA IMPLEMENTATION OF APPROXIMATE SOFTMAX FUNCTION FOR EFFICIENT CNN INFERENCE (20)

(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 
A SURVEY OF NEURAL NETWORK HARDWARE ACCELERATORS IN MACHINE LEARNING
A SURVEY OF NEURAL NETWORK HARDWARE ACCELERATORS IN MACHINE LEARNING A SURVEY OF NEURAL NETWORK HARDWARE ACCELERATORS IN MACHINE LEARNING
A SURVEY OF NEURAL NETWORK HARDWARE ACCELERATORS IN MACHINE LEARNING
 
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAA LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGA
 
Cloud Based Datacenter Network Acceleration Using FPGA for Data-Offloading
Cloud Based Datacenter Network Acceleration Using FPGA for Data-Offloading Cloud Based Datacenter Network Acceleration Using FPGA for Data-Offloading
Cloud Based Datacenter Network Acceleration Using FPGA for Data-Offloading
 
5 1-33-1-10-20161221 kennedy
5 1-33-1-10-20161221 kennedy5 1-33-1-10-20161221 kennedy
5 1-33-1-10-20161221 kennedy
 
HIGH PERFORMANCE ETHERNET PACKET PROCESSOR CORE FOR NEXT GENERATION NETWORKS
HIGH PERFORMANCE ETHERNET PACKET PROCESSOR CORE FOR NEXT GENERATION NETWORKSHIGH PERFORMANCE ETHERNET PACKET PROCESSOR CORE FOR NEXT GENERATION NETWORKS
HIGH PERFORMANCE ETHERNET PACKET PROCESSOR CORE FOR NEXT GENERATION NETWORKS
 
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSINGHOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
 
electronics-11-03883.pdf
electronics-11-03883.pdfelectronics-11-03883.pdf
electronics-11-03883.pdf
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...
 
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
 
Network Analyzer and Report Generation Tool for NS-2 using TCL Script
Network Analyzer and Report Generation Tool for NS-2 using TCL ScriptNetwork Analyzer and Report Generation Tool for NS-2 using TCL Script
Network Analyzer and Report Generation Tool for NS-2 using TCL Script
 
imagefiltervhdl.pptx
imagefiltervhdl.pptximagefiltervhdl.pptx
imagefiltervhdl.pptx
 
GCF
GCFGCF
GCF
 
An octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passingAn octa core processor with shared memory and message-passing
An octa core processor with shared memory and message-passing
 
978-1-4577-1343-912$26.00 ©2014 IEEE Reliability an.docx
978-1-4577-1343-912$26.00 ©2014 IEEE  Reliability an.docx978-1-4577-1343-912$26.00 ©2014 IEEE  Reliability an.docx
978-1-4577-1343-912$26.00 ©2014 IEEE Reliability an.docx
 
ZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed Systems
 

More from International Research Journal of Modernization in Engineering Technology and Science

TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...
TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...
TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...
International Research Journal of Modernization in Engineering Technology and Science
 
A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...
A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...
A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...
International Research Journal of Modernization in Engineering Technology and Science
 
EXPERIMENTAL STUDY ON MECHANICAL PROPERTIES AND DURABILITY OF REACTIVE POWDER...
EXPERIMENTAL STUDY ON MECHANICAL PROPERTIES AND DURABILITY OF REACTIVE POWDER...EXPERIMENTAL STUDY ON MECHANICAL PROPERTIES AND DURABILITY OF REACTIVE POWDER...
EXPERIMENTAL STUDY ON MECHANICAL PROPERTIES AND DURABILITY OF REACTIVE POWDER...
International Research Journal of Modernization in Engineering Technology and Science
 
EFFECTS OF LAYING KADHAKNATH HEN SERUM AND ANTI-PROLACTIN MEDICATION [BROMOCR...
EFFECTS OF LAYING KADHAKNATH HEN SERUM AND ANTI-PROLACTIN MEDICATION [BROMOCR...EFFECTS OF LAYING KADHAKNATH HEN SERUM AND ANTI-PROLACTIN MEDICATION [BROMOCR...
EFFECTS OF LAYING KADHAKNATH HEN SERUM AND ANTI-PROLACTIN MEDICATION [BROMOCR...
International Research Journal of Modernization in Engineering Technology and Science
 

More from International Research Journal of Modernization in Engineering Technology and Science (20)

AUTOMATIC IRRIGATION SYSTEM DESIGN AND IMPLEMENTATION BASED ON IOT FOR AGRICU...
AUTOMATIC IRRIGATION SYSTEM DESIGN AND IMPLEMENTATION BASED ON IOT FOR AGRICU...AUTOMATIC IRRIGATION SYSTEM DESIGN AND IMPLEMENTATION BASED ON IOT FOR AGRICU...
AUTOMATIC IRRIGATION SYSTEM DESIGN AND IMPLEMENTATION BASED ON IOT FOR AGRICU...
 
TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...
TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...
TRAINING NEEDS ANALYSIS AMONG GRASSROOTS ENTREPRENEURS: BASIS FOR THE IMPLEME...
 
AUTOMOTIVE ENGINE EXHAUST AND ITS EFFECT ON ATMOSPHERE
AUTOMOTIVE ENGINE EXHAUST AND ITS EFFECT ON ATMOSPHEREAUTOMOTIVE ENGINE EXHAUST AND ITS EFFECT ON ATMOSPHERE
AUTOMOTIVE ENGINE EXHAUST AND ITS EFFECT ON ATMOSPHERE
 
ROLE OF ELECTRONIC HUMAN RESOURCES (E-HR) IN ORGANIZATION
ROLE OF ELECTRONIC HUMAN RESOURCES (E-HR) IN ORGANIZATIONROLE OF ELECTRONIC HUMAN RESOURCES (E-HR) IN ORGANIZATION
ROLE OF ELECTRONIC HUMAN RESOURCES (E-HR) IN ORGANIZATION
 
COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEA...
COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEA...COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEA...
COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEA...
 
R-PI BASED DETECTION OF LUNG CANCER USING MRI IMAGE
R-PI BASED DETECTION OF LUNG CANCER USING MRI IMAGER-PI BASED DETECTION OF LUNG CANCER USING MRI IMAGE
R-PI BASED DETECTION OF LUNG CANCER USING MRI IMAGE
 
INTUITIONISTIC S-FUZZY SOFT NORMAL SUBGROUPS
INTUITIONISTIC S-FUZZY SOFT NORMAL SUBGROUPSINTUITIONISTIC S-FUZZY SOFT NORMAL SUBGROUPS
INTUITIONISTIC S-FUZZY SOFT NORMAL SUBGROUPS
 
PARTIAL REPLACEMENT OF AGGREGATES BY BURNT BRICK BATS AND LATERITIC FINES IN ...
PARTIAL REPLACEMENT OF AGGREGATES BY BURNT BRICK BATS AND LATERITIC FINES IN ...PARTIAL REPLACEMENT OF AGGREGATES BY BURNT BRICK BATS AND LATERITIC FINES IN ...
PARTIAL REPLACEMENT OF AGGREGATES BY BURNT BRICK BATS AND LATERITIC FINES IN ...
 
A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...
A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...
A REVIEW PAPER ON PROPERTIES OF CONCRETE WITH FRACTIONAL REPLACEMENT OF RECYC...
 
REVIEW ON PROCESS PARAMETER OPTIMIZATION FOR FORGING PROCESS
REVIEW ON PROCESS PARAMETER OPTIMIZATION FOR FORGING PROCESSREVIEW ON PROCESS PARAMETER OPTIMIZATION FOR FORGING PROCESS
REVIEW ON PROCESS PARAMETER OPTIMIZATION FOR FORGING PROCESS
 
MODELLING AND VIBRATION ANALYSIS OF REINFORCED CONCRETE BRIDGE
MODELLING AND VIBRATION ANALYSIS OF REINFORCED CONCRETE BRIDGEMODELLING AND VIBRATION ANALYSIS OF REINFORCED CONCRETE BRIDGE
MODELLING AND VIBRATION ANALYSIS OF REINFORCED CONCRETE BRIDGE
 
CLASSIFICATION OF CANCER BY GENE EXPRESSION USING NEURAL NETWORK
CLASSIFICATION OF CANCER BY GENE EXPRESSION USING NEURAL NETWORKCLASSIFICATION OF CANCER BY GENE EXPRESSION USING NEURAL NETWORK
CLASSIFICATION OF CANCER BY GENE EXPRESSION USING NEURAL NETWORK
 
A REVIEW ON GREENHOUSE ENVIRONMENT CONTROLLING ROBOT
A REVIEW ON GREENHOUSE ENVIRONMENT CONTROLLING ROBOTA REVIEW ON GREENHOUSE ENVIRONMENT CONTROLLING ROBOT
A REVIEW ON GREENHOUSE ENVIRONMENT CONTROLLING ROBOT
 
DIFFICULTIES FACED BY HIGHER EDUCATION FACULTIES DURING COVID-19 LEADING TO S...
DIFFICULTIES FACED BY HIGHER EDUCATION FACULTIES DURING COVID-19 LEADING TO S...DIFFICULTIES FACED BY HIGHER EDUCATION FACULTIES DURING COVID-19 LEADING TO S...
DIFFICULTIES FACED BY HIGHER EDUCATION FACULTIES DURING COVID-19 LEADING TO S...
 
MECHANICAL RECYCLING OF CFRP ALONG WITH CASE STUDY OF BICYCLE FRAME
MECHANICAL RECYCLING OF CFRP ALONG WITH CASE STUDY OF BICYCLE FRAMEMECHANICAL RECYCLING OF CFRP ALONG WITH CASE STUDY OF BICYCLE FRAME
MECHANICAL RECYCLING OF CFRP ALONG WITH CASE STUDY OF BICYCLE FRAME
 
FLOODPLAIN HAZARD MAPPING AND ASSESSMENT OF RIVER KABUL USING HEC-RAS 2D MODEL
FLOODPLAIN HAZARD MAPPING AND ASSESSMENT OF RIVER KABUL USING HEC-RAS 2D MODELFLOODPLAIN HAZARD MAPPING AND ASSESSMENT OF RIVER KABUL USING HEC-RAS 2D MODEL
FLOODPLAIN HAZARD MAPPING AND ASSESSMENT OF RIVER KABUL USING HEC-RAS 2D MODEL
 
ASSESSMENT OF WATER QUALITY AND HEAVY METALS IN DRAINAGE CANAL
ASSESSMENT OF WATER QUALITY AND HEAVY METALS IN DRAINAGE CANALASSESSMENT OF WATER QUALITY AND HEAVY METALS IN DRAINAGE CANAL
ASSESSMENT OF WATER QUALITY AND HEAVY METALS IN DRAINAGE CANAL
 
EXPERIMENTAL STUDY ON MECHANICAL PROPERTIES AND DURABILITY OF REACTIVE POWDER...
EXPERIMENTAL STUDY ON MECHANICAL PROPERTIES AND DURABILITY OF REACTIVE POWDER...EXPERIMENTAL STUDY ON MECHANICAL PROPERTIES AND DURABILITY OF REACTIVE POWDER...
EXPERIMENTAL STUDY ON MECHANICAL PROPERTIES AND DURABILITY OF REACTIVE POWDER...
 
AIRPORT MANAGEMENT USING FACE RECOGNITION BASE SYSTEM
AIRPORT MANAGEMENT USING FACE RECOGNITION BASE SYSTEMAIRPORT MANAGEMENT USING FACE RECOGNITION BASE SYSTEM
AIRPORT MANAGEMENT USING FACE RECOGNITION BASE SYSTEM
 
EFFECTS OF LAYING KADHAKNATH HEN SERUM AND ANTI-PROLACTIN MEDICATION [BROMOCR...
EFFECTS OF LAYING KADHAKNATH HEN SERUM AND ANTI-PROLACTIN MEDICATION [BROMOCR...EFFECTS OF LAYING KADHAKNATH HEN SERUM AND ANTI-PROLACTIN MEDICATION [BROMOCR...
EFFECTS OF LAYING KADHAKNATH HEN SERUM AND ANTI-PROLACTIN MEDICATION [BROMOCR...
 

Recently uploaded

Recently uploaded (20)

S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 

FPGA IMPLEMENTATION OF APPROXIMATE SOFTMAX FUNCTION FOR EFFICIENT CNN INFERENCE

e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
(Peer-Reviewed, Open Access, Fully Refereed International Journal)
Volume:03/Issue:09/September-2021 Impact Factor- 6.752 www.irjmets.com
Mohammed Abdullah Mubarak Alshahrani*1
*1Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, Makkah, Saudi Arabia.
ABSTRACT
The softmax function is an integral part of object detection frameworks based on most deep or shallow neural networks. While the configuration of the operation layers in a neural network can vary considerably, the softmax operation is fixed. With the recent advances in object detection approaches, especially the introduction of highly accurate convolutional neural networks, researchers and developers have suggested different hardware architectures to speed up the overall operation of these compute-intensive algorithms. Xilinx, one of the leading FPGA vendors, has recently introduced a deep neural network development kit for exactly this purpose. However, because softmax arithmetic hardware is complex and involves the exponential function, this functionality is only available for bigger devices. For smaller devices, this operation has to be implemented in software. In this paper, a light-weight hardware implementation of this function is proposed which does not require many logic resources when implemented on an FPGA device. The proposed design is based on an analysis of the statistical properties of a custom convolutional neural network when used for classification on a standard dataset, i.e. CIFAR-10.
Specifically, instead of using a brute-force approach to design a generic full-precision arithmetic circuit for the SoftMax function using real numbers, an approximate integer-only design is suggested for the limited range of operands encountered in real-world scenarios. The approximate circuit uses fewer logic resources since it computes only a few iterations of the series expansion of the exponential function. Despite using fewer iterations, the function is shown to work as well as the full-precision circuit for classification and introduces only minimal error in the associated probabilities. The circuit has been synthesized using the Hardware Description Language (HDL) Coder and Vision HDL toolboxes in Simulink® by Mathworks®, which provide a higher-level abstraction of image processing and machine learning algorithms for quick deployment on a variety of target hardware. The final design has been implemented on a Xilinx FPGA development board, i.e. Zedboard, which contains the necessary hardware components such as USB, Ethernet and HDMI interfaces to implement a fully working system capable of processing a machine learning application in real time.
Keywords: FPGA, High-Level Synthesis, Machine Learning, Convolutional Neural Networks, SoftMax.
I. INTRODUCTION
The rise of machine learning applications for object classification in images and videos has raised the demand for real-time realization of such complex algorithms on dedicated hardware. In recent years, Convolutional Neural Networks (CNN) have made their mark as the most effective machine learning algorithm for classification in various domains. Most CNN architectures use SoftMax as the final layer before the output one. Softmax is a normalization operation which transforms arbitrary input operands into points of a probability distribution that sum to one.
Mathematically, it is represented as
$$\sigma(x)_i = \frac{e^{x_i}}{\sum_{k=1}^{j} e^{x_k}} \qquad (1)$$
where 'x' is a 'j'-dimensional input vector and '\sigma(x)_i' is the perceived output probability of its 'i'th element. Thus, for a classification operation with 'j' classes, the softmax function can be used to calculate the relative confidence score of the classifier for each class. This can be understood from the standpoint of a neural network [1] with certain hidden processing layers and an output classification layer such as that depicted in Fig. 1.
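As an illustration of eq. (1), the following is a minimal full-precision software sketch, not the hardware design described in this paper. The function name `softmax` and the example scores are hypothetical and chosen only for illustration.

```python
import math

def softmax(x):
    """Full-precision softmax of eq. (1) for a list of real-valued inputs."""
    exps = [math.exp(v) for v in x]   # numerator terms e^{x_i}
    total = sum(exps)                 # denominator: sum over all j classes
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]              # hypothetical class scores
probs = softmax(scores)               # one probability per class
```

The outputs are positive and sum to one, and the ordering of the inputs is preserved, so the largest score always receives the largest probability.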
Figure 1: Data flow through a typical neural network
Typically, the outputs of a neural network depicting a given class (e.g. a dog or a cat) are independent logistic regression classifiers with values within the range [0, 1]. In order to convert these independent output values to a probability vector, the SoftMax function of eq. (1) is employed. Without this function, each class has an independent real-valued probability, but the values do not necessarily add up to one, which makes the results difficult to interpret. This function is a generalization of the logistic function used in logistic regression classifiers so that it can be used for multiple classes. While other layers of neural networks employ simpler multiplication and addition arithmetic units, Softmax is particularly complex because it includes exponential and division operations. Thus, several hardware-based implementations suffer from exorbitantly large resource utilization and poor performance due to long logic delay. To this end, many researchers have proposed approximate implementations of this unit, which necessarily result in a loss of result precision. Thus, there is a need to explore a variety of circuit design techniques to lower the computational complexity of this inevitable component of neural networks while keeping the result precision at acceptable levels. Moreover, given the complex nature of the parent neural networks themselves, there is a need to simplify the overall design process as well, so that the design and test procedures can be completed within a reasonable time.
This calls for employing high-level synthesis tools and developing frameworks that make this process easier for neural network experts who are less acquainted with the hardware design process. In this work, we have designed a SoftMax implementation that is efficient in terms of hardware logic resource utilization while keeping the result accuracy at acceptable levels. The framework has been developed using high-level synthesis and testing tools.
II. LITERATURE REVIEW
Given the importance of the SoftMax function in deep neural networks, several research works have considered its hardware implementation using different strategies. Recently, Li et al. [2] have described an approximate FPGA-based hardware implementation using Look Up Tables (LUT) and piece-wise linear interpolation. This work is based on a pipelined approach and uses multistage Wallace and other multiplier structures to speed up the overall computation. Such an intricate structure is needed since the Softmax function requires multiple exponential, addition, multiplication and division units. Similarly, Kouretas and Vassilis [3] have described another approximate computing architecture with adaptive approximation to trade off complexity against accuracy of the results. Realizing the high complexity, long critical path delay and associated overflow problem, Yuan [4] has suggested using down-scaling and domain transformation to eliminate these problems. Along the same lines, Du et al. [5] have described a Softmax implementation based on LUTs and an arithmetic unit that calculates the natural logarithm using a Maclaurin series expansion. Since the main computational unit in the Softmax function is the exponent, its direct implementation using well-known hardware design techniques is also relevant. Thus, a CORDIC algorithm-based FPGA implementation suggested by Rekha and Menon [6] and a short Taylor expansion-based implementation proposed by Jamro et al. [7] also provide further insight into the problem of speeding up this expensive operation using a dedicated hardware implementation. Other hardware implementation techniques for arithmetic units, such as Distributed Arithmetic [8] and Common Sub-expression sharing [9], can also be considered for reducing the associated circuit complexity.
III. METHODOLOGY
In this work, we propose to build a SoftMax hardware accelerator on an FPGA device and consider various resource-saving techniques mentioned in the literature to arrive at a configuration suitable for current-generation deep neural networks. Specifically, signal statistics have been analyzed to design the most suitable hardware structure for this important function while conserving precious logic resources. For this purpose, a popular standard dataset for image recognition has been selected, i.e. CIFAR-10 [11]. For experimentation on this dataset, a custom CNN has been trained in the Matlab environment. This deep neural network uses residual links for faster and more efficient training [12] and is depicted in Figure 2.
Figure 2: Custom Residual CNN for CIFAR-10 Dataset
In this work, we have focused only on the hardware implementation of the Softmax layer, which is an essential component of the CNN architecture and determines the final output class as shown in Figure 2. However, given the complexity of the exponent function of eq. (1), its corresponding hardware implementation is too complex.
Thus, to arrive at low-complexity hardware, we have analyzed the signal statistics of the inputs to this function for the CIFAR-10 dataset and their relation to the final detection accuracy of the whole CNN architecture.
Figure 3: Confusion matrix for the CIFAR-10 dataset with full precision SoftMax function
Figure 3 shows the confusion matrix obtained when the custom CNN shown in Figure 2 is used to process the CIFAR-10 dataset with a full-precision SoftMax implementation. The overall accuracy is 9.54% on the validation set and 2.62% on the training set.
Figure 4: Histogram of input values to the SoftMax function for CIFAR-10 dataset
Figure 4 illustrates the data statistics collected for the CIFAR-10 dataset when processed using the custom CNN architecture shown in Figure 2. The statistics clearly show that the bulk of the input values lie within a narrow range around 0, i.e. [-10, 10]. Thus, it does not seem appropriate to design a hardware circuit for a generic wider range of inputs, because that leads to a very complex circuit. Since the exponent function is the main complicated operation in eq. (1), we keep our discussion focused on this operation only. It can be clearly seen in Figure 5 that if the whole input range of [-20, 40] is considered, the full-precision exponent function has a very wide output range, i.e. [0, 2.5 × 10^17]. This requires enormous logic resources to implement the functionality in hardware. Moreover, the input operands are real numbers with fractional parts as well. This necessitates the use of floating- or fixed-point number formats, which adds to the circuit complexity even more. Realizing these two important properties of the input operands, i.e. the range as well as the format, various approximations have been applied systematically in this work to reduce the computational load without affecting the classification accuracy beyond reasonable levels. Moreover, the exponent function itself can be approximated using its series expansion with truncation. However, all these approximations, i.e. limiting the range, quantizing to integer-only numbers and truncating the series expansion, have to be analyzed for their combined effect on the result precision when applied to a real-world scenario. For this purpose, as mentioned earlier, the custom CNN architecture with the approximate SoftMax function has been evaluated on the standard CIFAR-10 dataset.
Figure 5: Exponent function plot for the range of input values to SoftMax function
As mentioned earlier, the main arithmetic units in the SoftMax function are the exponential and division functions. The hardware design entry of complex functions such as Softmax, and their integration within a larger deep neural network, is, however, too complex to be handled using the conventional hardware description language (HDL) approach. Thus, we have employed the high-level synthesis tool supplied by Mathworks in Simulink, i.e. the HDL Coder toolbox. HDL Coder generates the HDL code (Verilog or VHDL) automatically, which can then be incorporated as a hardware accelerator in a larger system that combines software (CPU) and hardware acceleration, an approach called "Hardware-Software Co-Design".
One big advantage of using Simulink for such designs is that the environment can easily be set up for simulation using real images (datasets), so the functionality can be tested before it is finally incorporated into the hardware. The whole hardware-software co-design system has been implemented on an FPGA SoC, i.e. the ZedBoard, which contains both processor and programmable logic sections. The main deep neural network application runs on the processor, while the SoftMax accelerator has been implemented on the FPGA logic fabric and is accessible to the software through a standard bus interface, i.e. the AXI interconnect.

Figure 6: Approximate exponent function employed as a hardware accelerator within a hardware-software co-design in an FPGA
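The range argument made around Figure 5 can be checked with a short calculation. The following Python sketch is not part of the original design flow, which uses MATLAB and Simulink. It only counts how many bits the integer part of the exponent output needs at the largest input of a given range. The helper name `integer_bits_for_exp` is hypothetical.

```python
import math

def integer_bits_for_exp(x_max: int) -> int:
    """Bits needed to hold the integer part of exp(x_max) as an unsigned number."""
    return int(math.exp(x_max)).bit_length()

# Largest input of the full observed range [-20, 40] versus two reduced ranges.
for x_max in (40, 15, 7):
    print(f"x_max = {x_max:2d}: exp(x_max) = {math.exp(x_max):.3e}, "
          f"integer bits = {integer_bits_for_exp(x_max)}")
```

Under these assumptions the full range needs 58 integer bits, while an upper limit of 15 needs 22, before any fractional bits are counted.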
Figure 7 shows the whole hardware system on the ZedBoard FPGA platform, to which a full hardware-software co-design has been ported. This system uses the Ubuntu operating system to provide the user interface, while the hardware portion incorporates an accelerator for the desired functionality. The hardware and software work together through the standard bus interface, as shown in Figure 6 above.

Figure 7: The Hardware-Software Co-Design implemented on Zedboard FPGA for deep neural network processing on live video stream

IV. RESULTS AND DISCUSSION

As mentioned in the previous section, various approximations can be applied to the SoftMax operation to lower its computational demand while keeping the result accuracy high. In this section, the effect of these approximation methods is analyzed in a real-world scenario.

Approximating SoftMax Function through Integer-Only Operations

The first approximation technique applied to the whole CNN inference framework on the standard CIFAR-10 dataset is the conversion of input operands to an integer-only format by dropping the fractional part without rounding, as given by

$$\hat{x} = \lfloor x \rfloor \qquad (2)$$

where $x$ is an input operand of the SoftMax function and $\hat{x}$ is its integer-only version. Although rounding seems a better method than truncation, the overall accuracy of the CNN classifier did not register any significant drop with truncation: very similar error rates of 9.56% and 2.89% were observed on the validation set and the training set respectively. There were, however, small errors in the calculation of confidence scores in the validation set, as depicted in the histogram of errors shown in Figure 8.
Almost all the confidence scores had zero error, with only a tiny percentage (0.21%) showing errors as small as 0.01. Thus, it can be safely concluded that integer-only operation does not affect the result accuracy significantly, while reducing the computational load from floating-point to integer operations.

Figure 8: Histogram of errors in confidence score of the validation set introduced due to integer-only operation
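The integer-only approximation of eq. (2) can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used for the reported results. The function names and the example scores are hypothetical, and SoftMax is taken in its standard form (exponent of each input divided by the sum of all exponents).

```python
import math

def softmax(values):
    """Reference SoftMax in double precision."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_integer_only(values):
    """SoftMax on inputs whose fractional part was dropped with floor(), as in eq. (2)."""
    return softmax([math.floor(v) for v in values])

# Hypothetical class scores for one image (not taken from the paper's data).
scores = [8.7, 2.3, -1.4, 0.9, 5.2]
exact = softmax(scores)
approx = softmax_integer_only(scores)

print("predicted class (exact / integer-only):",
      exact.index(max(exact)), "/", approx.index(max(approx)))
print("largest confidence score error:",
      max(abs(a - b) for a, b in zip(exact, approx)))
```

Note that `math.floor` maps -1.4 to -2, so negative inputs move away from zero. Whether the original implementation floors or truncates toward zero for negative inputs is not stated beyond eq. (2).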
Approximating SoftMax Function through Limiting the Operand Range

The second approximation considered in this work is limiting the range of input operands. The first step in this regard is limiting the operands to non-negative integers only, i.e. [0: ∞]. The range is then systematically reduced to [0: 31], [0: 15] and [0: 7], corresponding to binary representations of 5 bits, 4 bits and 3 bits respectively. These successive approximations, applied on top of integer-only operands, lead to increasingly larger errors in both classification accuracy and confidence scores. The errors are reported in Figures 9 to 12 and Table 1. The result fidelity in both the classification accuracy and the confidence scores is largely preserved for integer ranges down to [0: 15] and takes a significant hit below that. When the range is limited to the very narrow interval [0: 7], the classification error increases to 22.5% while the confidence scores can have an error of up to 3.9%. The corresponding histograms of error show the same trend. In Figure 12, it can be seen that drastically reducing the input operand range to [0: 7] leads to more occurrences of non-zero errors. From this data, it can be concluded that the integer-only range [0: 15] can be used safely, with an acceptable error introduced by the approximation in input operand representation.
This leads to significant savings in computation resources, since only 4 bits are required for number representation compared with at least 32 bits for the original single-precision floating-point representation.

Figure 9: Histogram of errors in confidence score of the validation set introduced due to integer-only operation limited to the range [0: ∞]

Figure 10: Histogram of errors in confidence score of the validation set introduced due to integer-only operation limited to the range [0: 31]

Figure 11: Histogram of errors in confidence score of the validation set introduced due to integer-only operation limited to the range [0: 15]
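The range limits discussed above can be sketched as a saturation step applied after the integer conversion. The paper does not say whether out-of-range values saturate or wrap around. The sketch below assumes saturation, and the helper name `clamp_to_unsigned` and the sample values are hypothetical.

```python
def clamp_to_unsigned(value: int, bits: int) -> int:
    """Saturate an integer to the unsigned range [0, 2**bits - 1]."""
    upper = (1 << bits) - 1
    return max(0, min(value, upper))

# Hypothetical integer-only SoftMax inputs (not taken from the paper's data).
integer_scores = [23, 8, 2, -2, 0, 5, 17]

for bits in (5, 4, 3):
    limited = [clamp_to_unsigned(v, bits) for v in integer_scores]
    print(f"{bits}-bit range [0: {(1 << bits) - 1}]:", limited)
```

With 3 bits, every score of 7 or more collapses to the same value, which is one plausible reason why the narrowest range loses classification accuracy.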
Figure 12: Histogram of errors in confidence score of the validation set introduced due to integer-only operation limited to the range [0: 7]

Approximating SoftMax Function through series implementation

One of the most common techniques used in the literature to approximate the exponent function is truncation of its series expansion. The Maclaurin series expansion of the exponent function is

$$e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!} \qquad (3)$$

$$e^{x} \approx 1 + x \qquad (4)$$

$$e^{x} \approx 1 + x + \frac{x^{2}}{2} \qquad (5)$$

$$e^{x} \approx 1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{6} \qquad (6)$$

$$e^{x} \approx 1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{6} + \frac{x^{4}}{24} \qquad (7)$$

The infinite series for the exponent function given by eq. (3) can be approximated by its first 2, 3, 4 or 5 terms, as in eqs. (4), (5), (6) and (7) respectively. The results of using these approximations together with the earlier ones, i.e. integer-only operation with the range limited to [0: 15], are reported in Figures 13 to 16 and Table 1. Although a two-term approximation does not degrade the classification accuracy any further, it introduces an error of about 10% in the confidence scores. A three-term approximation leads to an error of about 5%, four terms give 2.85% and five terms give 1.8%. With each additional term, the complexity of the operation grows. We conclude that using three or four terms is sufficient, since the error is within an acceptable range. Using five terms gives a very low error, but the additional complexity over four terms is not justified. To further reduce the hardware complexity associated with the division operation, it is suggested to use the nearest power-of-two coefficients in eq. (6), which gives

$$e^{x} \approx 1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{8} \qquad (8)$$

As seen from the data in Table 1, this approximation does not affect the detection accuracy while only marginally affecting the confidence scores.

Figure 13: Histogram of errors in confidence score of the validation set introduced due to integer-only operation limited to the range [0: 15] with 2 term approximation of exponent function
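The truncated series and its power-of-two variant can be sketched as below. This Python illustration is not the Simulink design. The function names are hypothetical, and the power-of-two variant assumes that the coefficient 1/6 of the four-term series is replaced by 1/8, its nearest power of two, so that both divisions become right shifts.

```python
import math

def exp_series(x, terms):
    """Sum of the first `terms` terms of the Maclaurin series of exp(x)."""
    return sum(x ** n / math.factorial(n) for n in range(terms))

def exp_pow2(x: int) -> int:
    """Four-term series with coefficients 1/2 and 1/8, computed with shifts only.

    Assumes x is a non-negative integer. The right shifts discard fractional bits.
    """
    return 1 + x + ((x * x) >> 1) + ((x * x * x) >> 3)

# Compare the variants on a few integer inputs from the range [0: 15].
for x in (0, 2, 4, 15):
    print(f"x = {x:2d}: 4 terms = {exp_series(x, 4):9.3f}, "
          f"power-of-two = {exp_pow2(x):4d}, exact = {math.exp(x):.3e}")
```

The truncated series is far from the exact exponent for large inputs. The paper evaluates the effect on the final SoftMax outputs rather than on the exponent itself, so this sketch says nothing about the reported confidence score errors.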
Figure 14: Histogram of errors in confidence score of the validation set introduced due to integer-only operation limited to the range [0: 15] with 3 term approximation of exponent function

Figure 15: Histogram of errors in confidence score of the validation set introduced due to integer-only operation limited to the range [0: 15] with 4 term approximation of exponent function

Figure 16: Histogram of errors in confidence score of the validation set introduced due to integer-only operation limited to the range [0: 15] with 5 term approximation of exponent function

Table 1. Comparison of different approximation techniques using error on CIFAR-10 Dataset

| SN. | Method | Validation Classification Error | Validation Confidence Score Error |
|---|---|---|---|
| 1 | Full Precision | 9.54 % | 5 × 10^-7 % |
| 2 | Integer-Only | 9.56 % | 0.21 % |
| 3 | Integer-Only [0: ∞] | 9.56 % | 0.24 % |
| 4 | Integer-Only [0: 31] | 9.56 % | 0.24 % |
| 5 | Integer-Only [0: 15] | 9.73 % | 0.31 % |
| 6 | Integer-Only [0: 7] | 22.5 % | 3.9 % |
| 7 | Integer-Only [0: 15], 2 series terms | 9.73 % | 10.33 % |
| 8 | Integer-Only [0: 15], 3 series terms | 9.73 % | 5.1 % |
| 9 | Integer-Only [0: 15], 4 series terms | 9.73 % | 2.85 % |
| 10 | Integer-Only [0: 15], 5 series terms | 9.73 % | 1.8 % |
| 11 | Integer-Only [0: 15], 4 series terms with power-of-two coefficients | 9.73 % | 3.0 % |

The proposed approximate exponent function for the SoftMax operation in the custom CNN architecture has been implemented in Simulink HDL Coder to generate its HDL code for use in the hardware-software co-design shown in Figure 6 above. The schematic of this design is depicted in Figure 17. It implements the proposed approximation given by eq. (8) and gives a classification error of 9.73% and a confidence score error of 3.0% when tested on the CIFAR-10 dataset (Table 1).

Figure 17: High-Level circuit diagram for the approximate exponent function using Simulink HDL Coder

V. CONCLUSION

An approximate circuit for implementing the SoftMax function as used in standard CNN architectures has been described in this work. For this purpose, various approximation techniques related to the range and type of operands and to the series expansion of the exponent function have been employed. The techniques considered are motivated by the actual signal statistics gathered while processing a real-world standard dataset, i.e. CIFAR-10. To test the setup, a custom CNN has been trained and tested with the proposed approximations implemented in the full system. The results show that the proposed approximations lead to only a small loss in the CNN's detection accuracy and confidence scores while reducing the circuit complexity significantly.

VI. REFERENCES

[1] Super Data Science, “Convolutional Neural Networks (CNN): Softmax & Cross-Entropy,” available at https://www.superdatascience.com/blogs/convolutional-neural-networks-cnn-softmax-crossentropy, accessed 28th Nov, 2020.
[2] Z. Li, H. Li, X. Jiang, B. Chen, Y. Zhang and G. Du, “Efficient FPGA Implementation of Softmax Function for DNN Applications,” 2018 12th IEEE International Conference on Anti-counterfeiting, Security, and Identification (ASID), Xiamen, China, 2018, pp. 212-216.
[3] I. Kouretas and V. Paliouras, “Hardware Implementation of a Softmax-Like Function for Deep Learning,” Technologies, 2020, 8, 46.
[4] B. Yuan, “Efficient hardware architecture of softmax layer in deep neural network,” 2016 29th IEEE International System-on-Chip Conference (SOCC), 2016, pp. 323-326.
[5] G. Du, C. Tian, Z. Li, D. Zhang, Y. Yin and Y. Ouyang, “Efficient Softmax Hardware Architecture for Deep Neural Networks,” in Proceedings of the 2019 Great Lakes Symposium on VLSI (GLSVLSI '19).
[6] R. Rekha and K. P. Menon, “FPGA implementation of exponential function using cordic IP core for extended input range,” 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 2018, pp. 597-600.
[7] E. Jamro, K. Wiatr and M. Wielgosz, “FPGA Implementation of 64-Bit Exponential Function for HPC,” 2007 International Conference on Field Programmable Logic and Applications, Amsterdam, 2007, pp. 718-721.
[8] G. NagaJyothi and S. Sriadibhatla, “Distributed arithmetic architectures for FIR filters - A comparative review,” 2017, pp. 2684-2690, doi: 10.1109/WiSPNET.2017.8300250.
[9] C.-H. Chang and M. Faust, “A new common subexpression elimination algorithm for realizing low-complexity higher order digital filters,” Trans. Comp.-Aided Des. Integ. Cir. Sys. 29, 5 (May 2010), pp. 844-848.
[10] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” arXiv:1602.07360.
[11] A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” 2009.
[12] https://www.mathworks.com/help/deeplearning/ug/train-residual-network-for-image-classification.html