Towards Neural Processing for General Purpose Approximate Programs
Prasanna Kothalkar, Mohid Nabil, Vidhi Agrawal, Paridha Saxena
Department of Computer and Electrical Engineering, University of Texas at Dallas
pxk122030@utdallas.edu, mxn150230@utdallas.edu, vna150130@utdallas.edu, pxs158430@utdallas.edu
Abstract
Modern processor architectures have focused on increasing processor speeds while reducing power consumption. The challenge of keeping energy consumption feasible as transistor density on microprocessor chips increases has led to a new generation of processors that replace program code with alternate, faster implementations at run time. In this report we present one such architecture, which uses neural networks to mimic regions of program code. We train neural networks on data generated by running the programs to be modeled, and at run time we invoke the trained networks in place of the original program regions to produce the program outputs.
Index Terms: computer architecture, neural processing units,
program acceleration
1. Introduction
Due to the limitations of technology scaling for modern processors, recent focus has moved towards computation specialization to run programs at higher speeds and with minimal energy consumption. Recent work provides acceleration by exploiting error tolerance within an approximate computing framework. We implement Neural Processing Units (NPUs), a new class of configurable accelerators for approximate computation. Many application programs in diverse fields such as image processing, speech recognition, computer vision and signal processing can tolerate errors in computation and thus offer immense opportunity to replace exact executions with approximate ones. While traditional processors execute programs via an instruction set architecture and programmed logic, NPUs are 'trained' to mimic regions of imperative code. We generate the training data by running the functions that are to be transformed via NPU acceleration and saving the inputs and outputs of each invocation; these input-output pairs are then used to train the neural networks, as sketched below.
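For illustration, the following minimal C++ sketch shows the data-collection idea just described: the code region to be accelerated is wrapped so that every invocation appends its inputs and output to a log that later serves as neural network training data. The names targetFunction and loggedTargetFunction are illustrative placeholders and not part of the original implementation.

#include <fstream>
#include <vector>

// Stand-in for the imperative region to be NPU-accelerated (illustrative only).
double targetFunction(const std::vector<double>& in)
{
    double s = 0.0;
    for (double v : in) s += v;
    return s;
}

// Wrapper that runs the original code and appends each invocation's inputs
// and output to a log file used later as neural network training data.
double loggedTargetFunction(const std::vector<double>& in, std::ofstream& log)
{
    double out = targetFunction(in);        // run the original region
    for (double v : in) log << v << ' ';    // record the inputs
    log << out << '\n';                     // record the corresponding output
    return out;
}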
2. Neural Networks
Neural networks are brain-inspired machine learning models [1] that use neurons as their basic building block and are well suited to learning concepts in a hierarchical fashion. Similar to the brain, which builds up its understanding of a complex topic from simpler ideas, a neural network learns a concept as a function from data by learning to generate outputs from inputs. The learning is encoded as weights on the edges that connect neurons in adjacent layers. Training reduces the error by providing feedback and is realized through the back-propagation algorithm.
Neural networks are a popular class of machine learning models and can in theory represent any given function using two layers with an unbounded number of nodes. In practice such capacity is only reached in the limit, so one way to implement such functions in neural networks is to add more layers. However, this vertical addition of layers faces the vanishing gradient problem, where the gradient signal in the backpropagation algorithm loses information as it travels through many layers. A new class of algorithms using a layerwise pre-training technique was developed, leading to a resurgence in the neural network literature and the development of a variety of learning models. The basic idea is to initialize the neural network edge weights in a more informed manner so that the training procedure does not end up in poor local minima. Such models became known as 'Deep Neural Networks' and have led to impressive results on challenging problems in natural language processing, speech processing and vision. We plan to use Deep Belief Networks and Convolutional Neural Networks for our Sobel edge detection NPU in future work.
Figure 1: Neural Networks
2.1. Weight Update Rule
Neural network backpropagation updates the weights on neural network edges using the following update rule:

    w_{ji} = w_{ji} + \Delta w_{ji}    (1)

where w_{ji} represents the edge weight between node j and node i, and \Delta w_{ji} is the change in weight applied by the backpropagation step:

    \Delta w_{ji} = \alpha (t_j - y_j) x_i    (2)

where \alpha is the learning rate of the training algorithm, t_j and y_j are the target and output values of the j-th output node, and x_i is the input value at the i-th input node.
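For concreteness, the minimal C++ sketch below applies rules (1) and (2) to the weights feeding a single output node. It assumes a single layer of weights; the full backpropagation used in our implementation also propagates errors through the hidden layers.

#include <vector>

void delta_rule_update(std::vector<double>& w,          // w[i]: weight from input i to the output node
                       const std::vector<double>& x,    // x[i]: input values
                       double t, double y,              // target and actual output of the node
                       double alpha)                    // learning rate
{
    for (std::size_t i = 0; i < w.size(); ++i) {
        double dw = alpha * (t - y) * x[i];  // Delta w_ji = alpha (t_j - y_j) x_i, rule (2)
        w[i] += dw;                          // w_ji = w_ji + Delta w_ji, rule (1)
    }
}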
3. Data generation
Data extraction was carried out for training and implementation of the neural networks. For the k-means dataset, we fed random inputs to our code and extracted the generated outputs. For the Sobel edge detection program, input images were provided and the Sobel edge detection computation was learned by the neural network, which was then fine-tuned on a separate validation set of images and tested on another set of images.
Dataset for Sobel edge detection. Sample size: 200 training, 200 validation, 100 testing. Input: features extracted per pixel, so that each sample consists of an individual pixel value and its neighbors. Output: the Sobel edge detection sum value.

Dataset for k-means. Sample size: 850 training, 850 validation, 850 test. Input: a randomly generated vector of size 10. Output: a vector of size 10 giving the cluster membership of each of the 10 input values.
The output files were created in a fixed format so that FANN could read the inputs and outputs from the text file. The selected file format was as follows: 1) the first row contains the total number of samples; 2) the number of samples is followed by the total number of inputs; 3) at the end of that line the total number of outputs is given. Each subsequent line of the text file contains the inputs followed by the output results, as sketched below. The same file format was used for both training and testing.
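The following C++ sketch illustrates writing training data in the layout described above. The helper name writeTrainingFile is illustrative and not part of the original code; note also that FANN's standard .data format places inputs and outputs on separate lines, so the writer must match whatever layout the reading code actually expects.

#include <fstream>
#include <string>
#include <vector>

void writeTrainingFile(const std::string& path,
                       const std::vector<std::vector<double>>& inputs,
                       const std::vector<std::vector<double>>& outputs)
{
    std::ofstream f(path);
    // Header row: number of samples, number of inputs, number of outputs
    // (assumes inputs and outputs are non-empty and of matching length).
    f << inputs.size() << ' ' << inputs[0].size() << ' ' << outputs[0].size() << '\n';
    for (std::size_t n = 0; n < inputs.size(); ++n) {
        for (double v : inputs[n])  f << v << ' ';   // inputs for sample n
        for (double v : outputs[n]) f << v << ' ';   // followed by its outputs
        f << '\n';
    }
}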
4. Software Neural Acceleration
As mentioned previously, we used the FANN toolkit to learn neural networks for our programs. The code below shows the neural network output function being called to generate the output instead of obtaining it from the original Sobel computation.
void edgeDetection(Mat src, Mat dst, bool NPU, int prev_i, string name)
{
    int gx, gy, sum;
    vector<int> output;
    if (NPU) {
        // NPU path: clear the destination image, then fill it from the
        // trained network's outputs instead of computing the Sobel filter.
        for (int y = 0; y < src.rows; y++)
            for (int x = 0; x < src.cols; x++)
                dst.at<uchar>(y, x) = 0;

        int i = prev_i;
        int length = (dst.rows - 2) * (dst.cols - 2);
        output = testSobel(i, length);          // query the trained network
        int idx = 0;
        for (int y = 1; y < src.rows - 1; y++) {
            for (int x = 1; x < src.cols - 1; x++) {
                if (output[idx++] < 1.0)
                    dst.at<uchar>(y, x) = 255;
                else
                    dst.at<uchar>(y, x) = 0;
            }
        }
        name = name.replace(name.find('.'), 4, "-npu.png");
    }
    else {
        // Original path: compute the Sobel gradient magnitude per pixel
        // and threshold it to a binary edge image.
        for (int y = 0; y < src.rows; y++)
            for (int x = 0; x < src.cols; x++)
                dst.at<uchar>(y, x) = 0;

        for (int y = 1; y < src.rows - 1; y++) {
            for (int x = 1; x < src.cols - 1; x++) {
                gx = xGradient(src, x, y);
                gy = yGradient(src, x, y);
                sum = abs(gx) + abs(gy);
                int output = sum > 127 ? 1 : 0;
                if (output == 1)
                    dst.at<uchar>(y, x) = 255;
                else
                    dst.at<uchar>(y, x) = 0;
            }
        }
        name = name.replace(name.find('.'), 4, "-sob.png");
    }
    imwrite(name, dst);
}
In the case of the Sobel edge detector program, each input image has 154,401 pixels (481 × 321 or 321 × 481). Each pixel value, together with its neighboring pixels, forms one training sample, and the output of that image patch through the Sobel filter is the target value. All input and output values are thresholded to a binary image for the edge detection problem, so our training, validation and testing data are binary for this neural network. Each pixel has 8 neighboring pixels, giving an input layer of 9 nodes, 3 hidden layers with 9 nodes each, and an output layer with 1 node. For k-means, 10 input values are generated from the range 0-100 and the output values specify whether they belong to cluster 0 or cluster 1, so we have 10 inputs and 10 outputs along with 3 hidden layers of 10 nodes each. A sketch of setting up and training such a network with FANN is given below.
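The following minimal sketch creates and trains the Sobel NPU topology described above (9 inputs, three hidden layers of 9 nodes, 1 output) using the FANN C API. The file names, epoch count and error target are placeholders, not values from this report; the k-means network would be created analogously with fann_create_standard(5, 10, 10, 10, 10, 10).

#include "fann.h"

int main(void)
{
    /* 5 layers: 9 inputs, three hidden layers of 9 nodes, 1 output. */
    struct fann *ann = fann_create_standard(5, 9, 9, 9, 9, 1);

    fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
    fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);

    /* Train on the FANN-format file produced during data generation
       (file name is a placeholder). */
    fann_train_on_file(ann, "sobel-train.data", 500, 10, 0.001f);

    /* Save the trained network so the NPU run can load it with
       fann_create_from_file() and evaluate it with fann_run(). */
    fann_save(ann, "sobel-npu.net");
    fann_destroy(ann);
    return 0;
}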
5. FPGA simulation
Implementation of the neural networks on an FPGA (a hardware implementation) is performed to test whether further speedup can be achieved in hardware. Generally, neural networks are implemented in software and are trained and simulated on general-purpose sequential computers, which can emulate a wide range of neural network models. Software implementations offer flexibility; however, hardware implementations of neural networks provide high speed for real-time applications as well as compactness. The FPGA (Field Programmable Gate Array) is used for the implementation of the neural network in order to provide both flexibility and speed to the programmable system. A neural network implemented on an FPGA offers higher speed and smaller size for real-time applications than other implementations. A major advantage is that the programmability of reconfigurable FPGAs yields fast special-purpose hardware for a wide range of applications, and it can also be used to explore new neural network algorithms and problems at a scale that would not be feasible with a conventional processor implementation. Our implementation is written in the Verilog hardware description language.
5.1. Overview
The basic idea is that each neuron takes some information as input, either from other neurons or from an external input. This information is propagated onward as an output computed as a weighted sum of the inputs passed through a non-linear function. FPGAs consist of three basic kinds of blocks: configurable logic blocks, input/output blocks and connection blocks. Logic blocks perform logic functions, while connection blocks connect logic blocks with the input/output blocks. These structures consist of routing channels and programmable switches.
For this, the training data is first generated in C and saved to a file. The neural network is then implemented in the hardware description language (Verilog) on Xilinx. The inputs are applied at the input nodes, the weights are wired between the different layers, and the output is extracted from the output nodes. The hidden layers are implemented using different gates and are iterated over for execution (multiplication and addition) to produce the output. This implementation reads the data from the file generated by the C++ program; in this way the trained data is passed to the FPGA and the neural network is executed on Xilinx. The execution time of this run is recorded and compared with that of the conventional run (the software implementation in C). This comparison shows the speedup obtained when executing the same neural network.
Figure 2: Neural Network block diagram in Xilinx
5.2. Implementation
Using the features of the FPGA, a hardware implementation of fully parallel ANNs is possible. In this architecture, the number of multipliers per neuron equals the number of connections to that neuron, and the number of full adders equals the number of connections to the previous layer. A Verilog library was designed for floating-point addition and floating-point multiplication. The inputs from the previous layer enter the layer in parallel and are multiplied serially by their corresponding weights. The multiplication results are stored in the corresponding neuron's area of the addition network. The multiplied values of each neuron are the inputs to an adder; the adder inputs are summed serially and each sum is the input to a lookup table, whose results are stored for the next layer. This ANN architecture is shown in Figure 2. In this design the number of layers and the number of neurons can be changed easily during the working phase. Our development platform is the Xilinx Spartan-3E FPGA (Xilinx 2007), and the design can be further mapped onto other FPGAs. The RTL schematic of the implemented neural network consists of the network inputs X1, X2, ..., X10 and a clock, producing the outputs Y1, Y2, ..., Y10.
A test bench in Verilog consists of the same two main parts as a normal design: an entity and an architecture. We simply supply inputs to the design under test and observe its outputs. The architecture of the test bench consists of the design being tested instantiated as a component, internal signals for input and output, a port map of the component for the UUT (unit under test), a process to run the clock, and finally a stimulus process responsible for running the tests written to exercise the design. The stimulus code is then added: first we defined the clock and the clock period, then replaced the stimulus process with our code. The total time for which the code was simulated was 1000 ns. In each cycle a weight is read from the file and fed to the accumulator, and in the next cycle we obtain the accumulator output.
always @(posedge clk)
begin
    for (stage = 0; stage < 4; stage = stage + 1)
    begin
        for (nod = (N*(stage+1)+1); nod <= ((stage+2)*N); nod = nod + 1)
        begin
            node[nod] = 0;  // initialize to zero to clear the previous summation
            for (in = ((N*stage)+1); in <= ((stage+1)*N); in = in + 1)
            begin
                node[nod] = bias[nod] + node[nod] + node[in] * test[testcounter];
            end
            Y1 = node[nod-1];
        end
    end
end
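As a software cross-check of the staged accumulation above, the following C++ sketch mirrors the same loop structure. It assumes N nodes per layer, 4 layer transitions, and flat node/bias arrays indexed as in the Verilog; the activation lookup table is omitted here, as in the loop shown above, and the names are illustrative rather than taken from the original code.

#include <vector>

void forward_stages(std::vector<double>& node, const std::vector<double>& bias,
                    double test_input, int N)
{
    for (int stage = 0; stage < 4; ++stage) {
        for (int nod = N * (stage + 1) + 1; nod <= (stage + 2) * N; ++nod) {
            node[nod] = 0.0;  // clear the previous summation
            for (int in = N * stage + 1; in <= (stage + 1) * N; ++in)
                node[nod] = bias[nod] + node[nod] + node[in] * test_input;
        }
    }
}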
After the test we obtained the results shown in Figure 3, which shows the values of the nodes being updated over time. The simulation and run times were then recorded and compared with those of the network implemented in software (C++).
Figure 3: Timing diagram screenshot using Xilinx development
tool
6. Experiments and Results
We have computed the running time and energy consumption of the software-based neural network version and compared them with the running times of the original programs without neural acceleration. Our results indicate speedups of roughly 10-900% without much loss of accuracy. We used the FANN toolkit in C++ for neural network training and testing; all results are shown in the table below. Xilinx ran the neural network implementation in 4 microseconds. This is an excellent speedup and we would like to investigate it further on different programs with larger training and testing data sizes.
7. Discussion
As the results for the software implementation of neural networks, i.e. the Fast Artificial Neural Network (FANN) library, show, the reduction in program running time is clearly apparent. It is most prominent for the k-means algorithm, which is iterative and gains a large speedup from neural processing. The speedup for Sobel edge detection is limited because processing every pixel of the image requires roughly as many data-point evaluations as the original Sobel filter. Energy consumption does not show a clear pattern for the Sobel edge detection program and needs to be investigated further with different training and testing set sizes.
Program                          Running time (ms)   Energy consumption (W)   Mean Squared Error
Sobel-original (40 images)       17547               7.732                    NA
Sobel-transformed (40 images)    16255               8.1316                   0.035139
Sobel-original (80 images)       31567               8.005                    NA
Sobel-transformed (80 images)    26911               4.472                    0.033425
Kmeans-original                  983                 8.827                    NA
Kmeans-transformed               180                 2.38                     0.040964
However, as seen in Figure 4 and Figure 5, power consumption and maximum temperature are higher for the original Sobel edge detection code run on the training and testing set of 80 images. The NPU-accelerated k-means program again shows a clear advantage over traditional k-means in power consumption. The accuracy of all the generated images is acceptable, although the application's requirements will have the final say; the mean squared error for each dataset is less than 0.05.
8. Conclusions and Future Work
Neural acceleration can be utilized for many system programs, but its large-scale utility is currently slowed by the manual effort required to identify and transform candidate code regions. Innovative programming framework revisions to handle neural processing, or tight integration with the processor architecture, will be needed for this technique to gain large-scale acceptance and usability. Future work includes better implementation of the algorithms in hardware through smaller and more efficient mappings; FPGAs, ASICs and other hardware accelerators are all potential hosts for further testing of this approach. Another area of future work is the study of machine learning algorithms other than neural networks: linear classifiers, principal components analysis and spectral waveform analysis tools have vast potential, especially in electrical engineering and signal processing. Deep neural networks are a natural extension of our current neural processing architecture and should provide significant improvements.
Figure 4: Power consumption for NPU Accelerated Sobel code
9. Acknowledgements
We thank Dr. Bhanu Kapoor for guidance and advice during the development of the project.
Figure 5: Power consumption for original Sobel code
10. References
[1] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[2] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer, 2009.
Figure 6: Original image
Figure 7: Edge Detected image using NPU Accelerated Sobel
code
Figure 8: Edge detected image using Sobel filter