Independent degree project, first cycle
Electrical Engineering
DFPM on FPGA – A speed optimized implementation of the Dynamic
Functional Particle method on Spartan 3E
Taiyelolu Adeboye
DFPM on FPGA
Taiyelolu Adeboye 2015-09-25
iii
MID SWEDEN UNIVERSITY
Department of Electronics Design (EKS)
Examiner: Benny Thörnberg, Benny.Thornberg@miun.se
Supervisor: Kent Bertilsson, Kent.Bertilsson@miun.se
Author: Taiyelolu O. Adeboye, taad1000@student.miun.se
Degree programme: International Bachelor’s Programme in Electronics, 180 credits
Main field of study: Electronics Engineering
Semester, year: Autumn, 2014
Abstract
This thesis focuses on the design of electronic circuitry that implements
the Dynamic Functional Particle Method (DFPM). The design was done
in VHDL and implemented on a Xilinx Spartan 3E FPGA. The work
included a digital 33-bit ALU implementation that was designed to
solve differential equations with the DFPM algorithm, as well as UART
transceiver and controller circuits for data exchange between the FPGA
and the PC. This report explains the design principles, process, tests
and results of the work. It also compares the performance of the
designed system with that of generic computational devices and
examines the possibilities and limitations of operational concurrency
in relation to the size of problem sets.
Keywords: MATLAB, VHDL, FPGA, DFPM, algorithm evaluation, CPU
clock cycles, particle method
Acknowledgements
I would like to express my appreciation to my supervisor, Associate
Professor Kent Bertilsson, for his guidance, mentorship and support in
the course of this project. His contribution was vital to the execution and
completion of this project work. I would also like to express my appreci-
ation to Associate Professor Sverker Edvardsson for being so approach-
able and for his great willingness to explain.
My various tutors and examiners in the course of this Bachelor’s pro-
gramme have proven themselves to be exceptional and unforgettable. In
no particular order, Professor Bengt Oelmann, Dr. Börje Norlin, Profes-
sor Kent Bertilsson, Professor Benny Thörnberg, Martin Kjellqvist,
Mikael Hasselmalm, Dr. Najeem Lawal, Mikael Bylund, Amir Yousaf,
Professor Cornelia Schiebold, Dr. Peng Cheng, Mazhar Hussein, Profes-
sor Engmont Porten, Stefan Haller, David Krapohl, Solange Hamrin and
Evelina Caffrey will remain entrenched in my memory.
Without mincing words, Anders Rådberg, Anders Molin, Sara Lodin,
Lars Malmbom, Tove Gullikson and the team at MIUN Innovation will
always remain dear to my heart. Thank you for your time, advice and
your effort!
Finally, I owe a huge debt of gratitude to the following: The divine, for
those moments when I was dry, Temitope Ruth, for being so under-
standing and special, Ire Peter, our bundle of joy, for being so sweet,
Kehinde, my wonderful twin, my family (Samuel, Dorcas, Ardex,
Adeyemi and Ope) for being such a pillar of support, and my friends in
Sweden and in Nigeria. Words will not be enough to express how much
I appreciate you!
Thank you for being part of this journey, muchas gracias! Greater things
are still to come!
Table of Contents
Abstract
Acknowledgements
1 Introduction
1.1 Background and problem motivation
1.2 Overall aim
1.3 Scope
1.4 Tools to be used
1.5 Concrete and verifiable goals
1.6 Outline
1.7 Contributions
2 Theory
2.1 Definition of terms and abbreviations
2.1.1 Terms
2.1.2 Abbreviations
2.2 DFPM algorithm
3 Methodology
3.1 Concurrence vs. sequentiality
3.2 Numerical representation
3.3 Modularity
4 Design
4.1 The DFPM algorithm
4.2 Project Top Module
4.2.1 The two top sub-modules
4.2.2 Data type conversion
4.3 Project defined Packages
4.4 Communication Top Module
4.4.1 UART
4.5 Iteration Control Top Module
4.6 Implementation Constraint
4.7 Parameters
4.8 Data exchange format
4.9 Signed numerical representation
4.10 Integer and fractional representation
4.11 Spartan 3E-1200 FG320 FPGA
4.12 Nexys2 FPGA demonstration board
4.13 Xilinx ISE
4.14 ISim Simulation software
4.15 Design verification
4.16 The complete design
5 Results
5.1 Simulation results
5.1.1 Element-wise vector multiplication
5.1.2 Element-wise vector subtraction
5.1.3 Evaluating new vector V
5.1.4 Evaluating new vector X
5.1.5 Convergence check
5.1.6 DFPM top module
5.2 Comparison
6 Discussion
6.1 FPGA resource utilization
6.2 Reduction in computation time
6.3 Larger problem sets
6.4 UART bottleneck
6.5 Precision
6.6 Communication input/output limitations
6.7 Cross platform comparison
6.8 Output comparison
6.9 Communication possibilities
6.10 Applications
6.11 Implications
7 Conclusions
7.1 Benchmark
7.2 Further work
References
Appendix A: Documentation of own developed program code
Design codes
New V operations
New X operations
One Iteration
DFPM top module
UART Core
UART Interface
Project Top module
Test code written in C++
Appendix B: Explanation of some basic mathematical concepts
Two’s complement
Euclidean norm
Appendix C: Project report summary
Appendix D: MATLAB codes
Code for problem specification and comparison
Appendix E: Table of standard ASCII symbols and their numerical representation
1 Introduction
DFPM on FPGA is a project that implements the algorithm of the Dynamic
Functional Particle Method in silicon. The implementation was done on a
Xilinx Spartan 3E FPGA, and it was designed for speed (in terms of the number
of clock cycles required for the implementation).
The Dynamic Functional Particle Method (DFPM) is a numerical particle
method that was developed at Mid Sweden University. While the method is
iterative, it consists of steps, some of which can be executed in parallel.
Therefore an FPGA was considered to be able to offer advantages due to its
parallel processing capabilities.
The FPGA implementation takes matrix elements as input parameters through
the UART and returns an output in the form of the solution vector relevant to
the parameter input received.
Figure 1.1: A simplified illustration of the project
1.1 Background and problem motivation
Systems of linear equations can be used to describe many observable natural
phenomena and find application in many areas, including physics, mechanics
and sensor fusion.
One of the approaches to solving systems of linear equations involves the
application of matrices. This approach treats the system as matrices or
vectors comprising elements that represent the parameters of the system in
question.
This approach often results in the classical A*X = B problem, where A is a
matrix and X and B are vectors. A contains the coefficients of the system, X
contains the unknown variables to be solved for, and B contains the known
constants on the right-hand side of the equations.
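For instance, if a system is defined as shown below,
3x – 2y + 4z = 10
5x + 1y – 2z = -2
10x – 5y + 3z = 4
then it can be represented in A*X = B form with
A = [3 -2 4; 5 1 -2; 10 -5 3], X = [x; y; z], B = [10; -2; 4],
where semicolons separate the rows, as in MATLAB notation.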
As the number of variables in these systems increases, the size of the matrices
increases proportionately, but the number of iterations required for solving the
problem using an iterative numerical method increases geometrically, thus
consuming significant CPU time.
This project aims to address this problem through the design of an Arithmetic
and Logic Unit (ALU) that implements the DFPM algorithm in a system that
combines sequential and parallel execution as a means of reducing the number
of clock cycles required per iteration and, consequently, the computation
time for the complete algorithm.
1.2 Overall aim
The overall aim of the project is the design of an ALU that implements the
Dynamic Functional Particle Method on an FPGA. The system will be capable of
receiving input in the form of parameters that represent the variables of the
system to be analysed and will give its output in the form of a matrix whose
elements represent the solution to the problem.
The designed system will be capable of communicating with a computer
through the USB port and the data is to be collected and displayed on the
computer screen using suitable software.
The output from the designed system should be correct and consistent in
comparison with values obtainable from a similar computation executed in
MATLAB or similar software on a PC.
Figure 1.2: An overview of the project concept
1.3 Scope
The designed system is expected to be able to solve systems of linear equations
expressed in the form A*X = B, where A is a 5x5 square matrix while X and B are
5x1 vectors. A and B will be given as input to the designed system, while the
system gives an output that represents X, the solution vector of the system.
The input to the designed system should be in the form of positive 8-bit
integers, while the output from it is expected to consist of whole numbers as
well as fractions, which can be represented to a maximum precision of 8 binary
bits.
Although limits have been imposed on the kind of input parameter expected
with the aim of easing the communication between the designed FPGA system
and PC software, it is expected that the ALU designed should be able to exe-
cute the DFPM algorithm on input data beyond these constraints.
1.4 Tools to be used
The following tools are expected to be used to carry out this project:
1. Xilinx Spartan 3E FPGA on Nexys2 demonstration board.
2. Xilinx ISE design suite.
3. Desktop terminal application software running on a PC.
4. MATLAB software running on a PC.
1.5 Concrete and verifiable goals
The goals of the project are as follows:
1. Design of a processor/ALU in VHDL. The unit should implement the
DFPM algorithm.
2. Implementation of parallel processing into the design of the DFPM
computational module, as much as optimal for the problem size.
3. Design of UART communication modules, in VHDL, for the transfer of
data from the PC/UART port to the DFPM computation module speci-
fied in the item number above.
4. Verification of the output from the FPGA. It should be consistently
equivalent to the output of the same algorithm run on a PC.
5. Investigation and suggestion of possible solutions and approaches to
scaling up the design for significantly larger problem sets.
1.6 Outline
Chapter 2 of this report briefly explains the theory behind the design and
some related work pertinent to DFPM and the FPGA implementation, while
Chapter 3 examines the design methodology and the principles behind the design
choices and approaches. Chapter 4 describes the design itself. Chapter 5
presents the results of the tests carried out to verify the functionality of
the designed modules and compares them with those obtainable from other
systems. In Chapter 6 the results are discussed and the possibilities and
limitations examined, and Chapter 7 concludes the report.
1.7 Contributions
This design was wholly done by the author of this report with support and
guidance from the supervisor (Associate Prof. Kent Bertilsson). The design was
based on the Dynamic Functional Particle Method algorithm which was devel-
oped by Prof. Sverker Edvardsson et al [1].
Prof. Sverker Edvardsson supplied the author with information about DFPM
and a sample application of the algorithm implemented in MATLAB. A UART
core designed for the Nexys2 and made available by Digilent Inc. was
adapted in designing the data exchange modules interfacing between the
FPGA and the PC.
2 Theory
Systems of linear and differential equations are a well-established concept in
mathematics and find application in solving theoretical numerical problems as
well as real-world challenges in fields such as mechanics, biology, electronics
and economics. Consequently, a lot of work has been done to develop approaches
to solving these problems.
The Dynamic Functional Particle Method (DFPM) is an approach, recently
developed by Sverker Edvardsson et al [1] [2], which can be used to solve
systems of linear and differential equations. The algorithm is simple, widely
applicable and efficient, with significant comparative advantages in relation
to some of the other established approaches [2].
DFPM implements a novel second order dynamical particle method which,
though new, is related to some first order approaches in previous work done
by Sincovec and Madsen [3], Pata and Squassina [4], and F. Alvarez [5].
There are a number of computational libraries and algorithms implementing
various approaches to solving systems of linear and differential equations.
Some of these include ARPACK and LAPACK, the Colt library (Java), and
IML++ (C++), among others.
Since this report is not a mathematical treatise, the main focus is on the
design and implementation of electronic hardware that is able to compute and
present solutions to problems presented as a system of differential equations
received as input.
The design and implementation done in this project, while novel, is also
related to a previous work by Bruce Land entitled “Hybrid Computing on an
FPGA” [6], in which a Digital Differential Analyzer (DDA) was designed and
implemented on an Altera Cyclone II 2C35 FPGA on an Altera DE2 FPGA
demonstration board. That design used an 18-bit numerical representation, of
which 16 bits were set apart for fractions. Parallel computations were also
used in order to reduce computation time.
Apart from Bruce Land’s design above, the author found little published
information about the implementation of numerical or particle methods on
FPGAs, and this work could lead to novel concepts and applications.
2.1 Definition of terms and abbreviations
2.1.1 Terms
Below are basic definitions and/or explanation of some important concepts
used in this report.
1. Linear equations
A linear equation can simply be defined as an algebraic equation in which each
term is either a constant or the product of a constant and a single variable
raised to the first power.
2. Systems of linear equations
These are a set of simultaneous linear equations which are defined as a single
problem and meant to be treated as such. These are often encountered in real
life situations and observable physical phenomena.
3. Differential equations
These kinds of equations define relationships connecting certain functions or
physical properties with their differentials (i.e. derivatives) hence the name.
4. Systems of differential equations
These are simultaneous statements of differential equations defining a specific
problem as a function of relationships between one or more independent
variables and their derivatives (dependent variables).
5. Numerical methods
These are approaches to solving mathematical problems with the use of various
methods of numerical approximation. Numerical methods can be direct or
iterative.
Direct numerical methods include algorithms that have a predefined number
of steps for arriving at solutions. An example is the Gaussian elimination
method. Iterative methods, however, require a number of iterations of
computational steps that is not known in advance and can vary with each
problem definition. An example of an iterative numerical method is the
Newton-Raphson method, also known as Newton’s method.
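The dependence of an iterative method's step count on the problem can be seen in a small sketch. The example below is illustrative only and is not part of the thesis code: it applies the Newton-Raphson iteration to f(x) = x*x - a, and the function name, starting guess and tolerance are assumptions chosen for the example.

```python
def newton_sqrt(a, tol=1e-12, max_iter=100):
    """Approximate sqrt(a) for a > 0 with Newton-Raphson on f(x) = x*x - a.

    Returns the estimate and the number of iterations used. The count is
    not known in advance and depends on both a and tol.
    """
    x = a if a > 1.0 else 1.0  # starting guess at or above the root
    for iteration in range(1, max_iter + 1):
        x_new = 0.5 * (x + a / x)  # x - f(x)/f'(x) for f(x) = x*x - a
        if abs(x_new - x) <= tol * x_new:
            return x_new, iteration
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")


root, steps = newton_sqrt(2.0)
far_root, far_steps = newton_sqrt(1.0e6)  # needs more iterations than a = 2
```

A direct method such as Gaussian elimination, by contrast, performs a number of operations fixed by the problem size alone.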
6. Particle methods
Particle methods are algorithms used, primarily, for the simulation of interact-
ing particles of physical systems and their motion in nature. These algorithms
are, sometimes, applied to numerical treatment of theoretical mathematical
models. The dynamic functional particle method falls under this category.
7. Convergence
Convergence is the property of an iterative method whereby its successive
results consistently approach some specific numeric value. The value to which
the method converges is said to be the solution of the problem being solved
with the iterative method.
8. The Dynamic Functional Particle method
This is an iterative particle method applied to general mathematical problems
by which mathematical problem models can be translated to particle models
and solved, as developed by Sverker Edvardsson et al [2].
The method is robust and widely applicable to problems of systems of linear
and differential equations, especially those defining nature and observable
physical phenomena.
9. Sequential processes
Sequential processes are processes consisting of operations which are carried
out one after the other. In these kinds of processes no two operations take
place simultaneously. All operations follow a definite sequence. Examples are
operations that take place in a single core CPU (Central Processing Unit).
10. Concurrent processes
Concurrent processes are processes consisting of more than one operation
being carried out in parallel. These kinds of processes can occur in multi-core
CPUs, FPGAs and other kinds of devices with parallel processing capabilities.
11. CPU time
This refers to the time spent by a processing unit while carrying out a certain
computational operation or set of operations. It is expressed in seconds.
12. Clock
This is a component in digital electronics systems by which the timing of
operations and processes is controlled. It basically oscillates between a high
and low signal.
13. Clock cycle
This is a single complete up and down oscillation of a clock.
14. Clock frequency
This refers to the number of cycles a clock completes in a second. It is ex-
pressed in Hertz.
15. Field Programmable Gate Array (FPGA)
These are integrated circuits that are factory manufactured to be configurable
by engineers and designers as the use case or application demands. They are
normally programmed in a hardware description language (HDL).
16. Universal Asynchronous Receiver Transmitter
This is a standard hardware interface that facilitates serial data exchange between two
electronic devices. A UART port should be connected to another UART port in
order for them to exchange data.
Data exchange between UART hardware is 1 bit serial and takes place between
cross-connected receiver and transmitter pins while the data received is con-
verted to parallel 8 bit format and exchanged between the UART hardware
and the device controlling it.
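The relationship between the 1-bit serial line and the 8-bit parallel data can be sketched in software. The example assumes the common 8N1 frame (one start bit, eight data bits sent least significant bit first, no parity, one stop bit). The text at this point does not state the frame format used in the project, so the framing and the function names are assumptions made for illustration.

```python
def uart_frame(byte):
    """Serialize one byte into a 10-bit 8N1 frame as a list of line levels."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB is sent first
    return [0] + data_bits + [1]  # start bit (low), data, stop bit (high)


def uart_unframe(bits):
    """Recover the parallel 8-bit value from a 10-bit 8N1 frame."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frame = uart_frame(ord("A"))  # ASCII 'A' is 0x41
```

In hardware the same conversion is typically done with a shift register clocked at the baud rate; the list of bits here stands in for the successive levels on the TX or RX pin.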
Figure 2.1 Simplified illustration of the UART communication process
17. MATLAB
MATLAB is an interactive software platform and high-level programming
language which is often used in scientific and engineering computing due to its
simplicity, robustness and easy to use interactive environment and functions.
In this project, it was used for the initial execution of the DFPM algorithm and
comparison.
18. Terminal software application
This is a software application that enables its user to get access to one or more
input/output ports (e.g. USB) of a PC and which displays the data stream. In
this project, Br@y++ terminal was used to access a USB port and communicate
with the FPGA running the DFPM algorithm.
19. Two’s complement
Two’s complement is a method of representing positive and negative signed
numbers in binary. In an n-bit word the most significant bit carries a weight
of -2^(n-1) and thus indicates the sign, while the rest of the bits carry
their usual positive weights.
When the most significant bit of a number represented in two’s complement is
“1”, the number is negative, but when it is “0”, the number is zero or positive.
This is a standard way of representing numbers that is frequently applied in
computing and electronics.
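As a minimal sketch of the encoding, the functions below convert between Python integers and two's complement bit patterns. The default width of 33 bits is taken from the 33-bit ALU mentioned in the abstract; the function names are hypothetical and the code is not taken from the thesis.

```python
def to_twos_complement(value, bits=33):
    """Encode a signed integer as an unsigned bit pattern of the given width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise OverflowError("value does not fit in the requested width")
    return value & ((1 << bits) - 1)


def from_twos_complement(word, bits=33):
    """Decode an unsigned bit pattern back into a signed integer."""
    sign_bit = 1 << (bits - 1)
    # The sign bit contributes -2^(bits-1); all other bits count positively.
    return (word & (sign_bit - 1)) - (word & sign_bit)


minus_three = to_twos_complement(-3, bits=8)  # bit pattern 11111101
```

Because negative values are stored this way, the same adder circuit can handle both addition and subtraction without a separate sign check.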
2.1.2 Abbreviations
The following abbreviations are used in this report:
ALU: Arithmetic and Logic Unit.
ASCII: American Standard Code for Information Interchange. This is the
standard used for the data exchanged between the PC and the FPGA.
ASIC: Application Specific Integrated Circuit. These are integrated circuits
that are designed or configured for a specific use case or application.
ARPACK: Arnoldi PACKage. This is a software library, coded in FORTRAN,
which can be used to solve eigenvalue problems.
BGA: Ball Grid Array.
CLB: Configurable Logic Blocks. These are logic elements on FPGAs used to
implement circuits.
CPLD: Complex Programmable Logic Device.
CPU: Central Processing Unit.
DE: Differential Equations.
DFPM: Dynamic Functional Particle Method.
FPGA: Field Programmable Gate Array.
FPU: Floating-Point Unit.
HDL: Hardware Description Language. These are languages by which one can
design hardware by means of semantics in an ISE or IDE.
IDE: Integrated Design Environment.
IOB: Input Output Block. These are ports for input and output to and from the
FPGA.
ISE: Integrated Synthesis Environment. This is software for synthesizing
designs done in HDL. Xilinx ISE is an example.
LAPACK: Linear Algebra PACKage. This is a library written in FORTRAN
which can be used to solve problems in linear algebra.
LDE: Linear Differential Equations.
LSB: Least Significant Bit.
LUT: Look Up Table
MATLAB: This is a software platform and high-level language used for pro-
gramming and simulations.
MCU: Microcontroller.
MSB: Most Significant Bit.
N/A: Not Applicable.
RAM: Random Access Memory.
RX: Receive. This is a pin through which data is to be received on a transceiver
port.
TX: Transmit. This is a pin through which data is to be transmitted on a trans-
ceiver port.
UART: Universal Asynchronous Receiver Transmitter.
USB: Universal Serial Bus.
VGA: Video Graphics Array. This is a standard for image display.
VHDL: VHSIC Hardware Description Language. In this project, VHDL was
used for digital hardware design.
VHSIC: Very High Speed Integrated Circuit.
2.2 DFPM algorithm
The dynamic functional particle method (DFPM) is widely applicable to solving
a number of different problems when defined as a system of linear or
differential equations. However, the focus of this project work is on the
application of DFPM to solve the classical A*X = B system of linear equations
problem as described in Chapter 1 of this report.
The algorithm is simply a two-step computation which is iterated until
convergence (or a specified level of convergence) is reached. Checking for
convergence is done by evaluating the Euclidean norm of the difference between
vector B and the matrix-vector product A*X, and comparing it with a
predetermined scalar value representing the acceptable tolerance of the
computation.
The algorithm requires a number of inputs. Three vectors of size n are needed:
vector B from the problem statement, and vectors X and V, which are used in the
algorithm. An n x n matrix is also required as an input equivalent to the
A-matrix in the problem statement. Three scalar inputs, Dt, mu and tolerance,
are also expected by the algorithm, and they represent the discretization step,
the damping factor and the tolerance respectively.
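The loop described above can be sketched as follows. This section does not give the update formulas, so the sketch assumes a symplectic Euler discretization of the damped second order system x'' + mu*x' = b - A*x, which is one common way of stating DFPM for linear systems. The thesis implementation may order or scale the steps differently. The function name, the zero initial values for X and V, and the small test matrix are assumptions for illustration, and convergence of this form is only expected for suitable A (here symmetric positive definite) and suitable dt and mu.

```python
import math


def dfpm_solve(A, b, dt, mu, tol, max_iter=100000):
    """Iterate the two DFPM steps until the residual norm ||b - A*x|| < tol.

    Assumed update rule (symplectic Euler on x'' + mu*x' = b - A*x):
        v <- v + dt * (b - A*x - mu*v)
        x <- x + dt * v
    """
    n = len(b)
    x = [0.0] * n  # vector X, initial guess (assumed zero)
    v = [0.0] * n  # vector V, initial value (assumed zero)
    for iteration in range(max_iter):
        # Residual b - A*x, also used for the convergence check.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if math.sqrt(sum(ri * ri for ri in r)) < tol:
            return x, iteration
        # Step 1: new V from the residual and the damping term.
        v = [v[i] + dt * (r[i] - mu * v[i]) for i in range(n)]
        # Step 2: new X from the new V.
        x = [x[i] + dt * v[i] for i in range(n)]
    raise RuntimeError("DFPM did not converge within max_iter iterations")


# Hypothetical 2x2 test problem with exact solution x = [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = dfpm_solve(A, b, dt=0.1, mu=3.0, tol=1e-9)
```

Within one iteration, the elements of the new V and the new X can each be computed independently of one another, which is what makes the method a candidate for parallel hardware.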
Figure 2.2 A flowchart of the DFPM algorithm
A MATLAB sample code implementing the algorithm in Figure 2.2 above is
included in Appendix D of this report.
3 Methodology
As stated in the introductory part of this report, one of the purposes of this
project work is the reduction of CPU time. Hence, significant attention was
paid to the computational processes implemented in this design, as well as the
impact on the speed, and resource use on the FPGA. This chapter describes the
methodologies and considerations that influenced the design and implementa-
tion as described in the following chapter.
The preference for an FPGA over traditional CPUs and other types of
processing units is a consequence of the advantages offered by the operational
concurrency that is characteristic of FPGAs and CPLDs.
After having chosen a design concept, the next biggest challenge was the
design itself. The design in this project work was done in VHDL (VHSIC
Hardware Description Language). While there are other languages and ap-
proaches to similar hardware design, VHDL was chosen because of the ease
with which it can be used to manage large projects, as well as the author’s
familiarity with it.
3.1 Concurrence vs. sequentiality
A limitation that was encountered early in the course of the design was the
limited number of dedicated multipliers available on the FPGA, which restricts
the number of multiplicative operations that can be executed concurrently.
An important focus of this work is speed optimization, for which concurrency
is key in this implementation. However, a balance needed to be struck between
concurrency and sequentiality. Hence some operations were run in parallel
while others were sequential. Addition and subtraction operations were most-
ly concurrent while some multiplicative operations were sequential and others
parallel.
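The trade-off can be illustrated with a back-of-envelope model. The sketch below is not taken from the thesis: it simply counts how many clock cycles are needed when a fixed number of products is issued in batches limited by the available multipliers. It ignores pipelining, the additions that follow the products, and the fact that a wide product may occupy more than one hardware multiplier. The multiplier counts used are illustrative values, not device figures.

```python
import math


def multiply_cycles(num_products, num_multipliers, cycles_per_product=1):
    """Clock cycles needed when products are issued in batches of num_multipliers."""
    if num_multipliers < 1:
        raise ValueError("at least one multiplier is required")
    batches = math.ceil(num_products / num_multipliers)
    return batches * cycles_per_product


n = 5  # problem size stated in the scope: A is a 5x5 matrix
products = n * n  # one product per element of A when evaluating A*X
fully_sequential = multiply_cycles(products, 1)  # one multiplier
row_parallel = multiply_cycles(products, n)      # one multiplier per column
fully_parallel = multiply_cycles(products, products)
```

Under this model, doubling the problem size quadruples the number of products, so a design that is fully parallel for a small problem becomes partly sequential once the products outnumber the multipliers.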
3.2 Numerical representation
The dynamic functional particle method involves an iterative process with a
number of multiplications, subtractions and additions at each stage. The
algorithm was implemented in MATLAB and run while the result of the
computations at each stage of the iteration was output to the console and
examined.
The cursory examination clearly indicated that the various values obtained
from the computations assumed a range that stretched across positive and
negative parts of the number line. This implied that a scheme was needed for a
distinct representation of negative and positive values. The values contained
integers as well as fractions, so a representation of fractions was also
needed.
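One common scheme that meets both requirements is signed fixed-point arithmetic, sketched below. The word width of 33 bits is taken from the abstract and the 8 fractional bits from the precision stated in the scope. How the thesis actually divides the 33 bits is not stated in this section, so the split, the constant names and the function names are assumptions for illustration.

```python
FRACTION_BITS = 8  # assumption: 8 fractional bits, as in the stated precision
WORD_BITS = 33     # assumption: total width matches the 33-bit ALU


def to_fixed(value):
    """Quantize a real number to a signed fixed-point word (two's complement)."""
    scaled = round(value * (1 << FRACTION_BITS))  # nearest multiple of 1/256
    limit = 1 << (WORD_BITS - 1)
    if not -limit <= scaled < limit:
        raise OverflowError("value outside the representable range")
    return scaled & ((1 << WORD_BITS) - 1)


def from_fixed(word):
    """Convert a fixed-point word back to a Python float."""
    sign_bit = 1 << (WORD_BITS - 1)
    signed = (word & (sign_bit - 1)) - (word & sign_bit)
    return signed / (1 << FRACTION_BITS)


word = to_fixed(-2.5)       # negative value, most significant bit set
approx = from_fixed(to_fixed(0.1))  # 0.1 is rounded to a multiple of 1/256
```

With this layout, addition and subtraction work directly on the stored words, while a product of two words carries twice the number of fractional bits and has to be shifted right by FRACTION_BITS before it is stored again.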
3.3 Modularity
In order to simplify the design, the whole project was split into two major
top modules. One of these two top modules implemented the DFPM algorithm
and the necessary iterative computations while the other module was designed
to implement UART communication and data exchange between the UART
hardware on the FPGA board and the port on the PC with which it will be
communicating. This second module was also responsible for the conversion
of the 8-bit parallel data to 33-bit numbers and the format expected by the
DFPM algorithm module.
Each of these top modules was subdivided into smaller modules which carried
out specific functions and communicated with other modules through signals
and inter-module data exchange.
The details of the design are discussed in Chapter 4.
4 Design
The digital hardware designed in VHDL consisted of combinatorial and
synchronous circuits which were coded as IO ports, modules, processes and
signals. The functioning of the combinatorial circuit elements was
instantaneous while synchronous circuit activities took place at the edge of
the clock.
The complete design was made up of several modules exchanging information
with the aid of signal input and output via their ports. Since the design is
reasonably complex and large, an attempt was made to give each module a
name that signified or helped to identify the purpose and function of the
modules.
The core of the design consisted of the modules which executed the DFPM
algorithm. An overview of these core modules and their interaction is
presented in Figure 4.1.
4.1 The DFPM algorithm
The dynamic functional particle method is widely applicable to many problem
models as stated in Chapter 2 of this report. However, in order to design a
circuit that specifically solves the A*X = B problem, one needs to understand
the step by step procedure of applying DFPM to the problem. Various imple-
mentations of DFPM in MATLAB, C++ and VHDL as applied in this thesis are
included in the appendix.
The procedure begins with access to the input vectors and matrix, whose
elements make up the coefficients of the system of equations. The next step is
the iterative computation, after which comes the output. Throughout the
process, the values of vector B, matrix A, Dt and the damping factor (mu)
remain fixed while the values of vectors X and V may be modified at the end of
each iteration.
Each stage of the iterative computation comprises two steps: the approximation
calculation and the convergence check. The approximation calculation takes the
form of matrix multiplication, subtraction and addition operations while the
convergence check requires a comparison of a predetermined tolerance value
with the Euclidean norm of the vector V.
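The procedure can be illustrated with a minimal software sketch. It is not the thesis code (which is in the appendix): the update rule below is an assumed symplectic Euler step built around the term B – A*X – mu*V named in section 4.5, the convergence test follows the squared-residual check described there, and the function name, step size, damping value and test matrix are all illustrative choices.

```python
def dfpm_solve(A, b, dt=0.5, mu=1.0, tol=1e-6, max_iter=10_000):
    """Iteratively solve A*x = b with a damped-particle update (assumed form)."""
    n = len(b)
    x = [1.0] * n  # initial X, all ones as in the default parameters
    v = [1.0] * n  # initial V, all ones as in the default parameters
    for k in range(max_iter):
        # Residual B - A*X, used by both the update and the convergence check
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Convergence check on squared quantities, so no square root is needed
        if sum(ri * ri for ri in r) < tol * tol:
            return x, k
        # Approximation step: new V from B - A*X - mu*V, then new X from new V
        v = [v[i] + dt * (r[i] - mu * v[i]) for i in range(n)]
        x = [x[i] + dt * v[i] for i in range(n)]
    raise RuntimeError("DFPM sketch did not converge")


# Illustrative 2 by 2 symmetric positive definite problem (not from the thesis)
solution, iterations = dfpm_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution, iterations)
```

A, B, dt and mu stay fixed inside the loop while X and V are overwritten at the end of each pass, which mirrors the description above. Whether a given dt and mu converge depends on the matrix, so the values here should not be read as recommendations.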
Figure 4.1. An overview of the core modules of the DFPM algorithm
4.2 Project top module
The topmost level container for the project HDL code was named
DFPM_ON_FPGA_TOP_MODULE. This module functioned as the overall top
module, containing all VHDL code relevant to the project design. It consisted
of two top modules which served two distinctly important functions. The
modules were named “UART_INTERFACE” and
“Signed_DFPM_Iteration_Control_Top_Module”. The complete VHDL code
for all the modules will be included as an appendix to this report.
4.2.1 The two top sub-modules
The communication top module was designed to handle communication with
the PC through the UART port and the UART VHDL code that controlled it.
Data received from the PC which would normally be in 8 bits were converted
to 33 bits in the format stated in section 3.2.2 of this report. The data were also
accumulated in arrays internal to this module until all data relevant to the
specific problem model has been received. The data would then be sent as
output through the ports of this module.
The Signed DFPM Iteration control module receives a stream of 33-bit data in a
format specified in its design, which mathematically describes the problem
being solved. The data received would then be subjected to the DFPM algo-
rithm, after which a solution would be obtained and sent out as an output
through the ports of this module.
At the conclusion of the Signed DFPM Iteration Control module’s computation,
the output signal would be returned to the communication top module, which
translates the result into its human-readable decimal equivalent before
serially shifting the values out, 8 bits at a time, through the UART interface.
4.2.2 Data type conversion
The communication top module handles data as standard logic vectors and
standard logic signals while the Signed DFPM Iteration Control module han-
dles data as signed bit vectors for all vectors.
This necessitated the conversion of the data signal types from standard logic
vectors to signed bit vectors and vice versa. This was done with the aid of
predefined conversion functions which are standard in VHDL. The conversion
takes place in the project top module.
4.3 Project defined packages
The input data for each problem consisted of scalar data and many vectors and
some multi-dimensional matrices. Hence a specific format was designed for
easy recognition and handling of these vectors and matrices. Due to the fact
that these design-specific format vector data types were often handled and
shared between multiple modules in the project, it was considered advanta-
geous to create special packages to define these unique format vectors.
The specific formats designed are described below:
1. DFPM_VECTOR_5X32_BIT: A data type defining an array of 5 standard
logic vectors. Representative of a 5 by 1 vector of standard logic type
data.
2. DFPM_VECTOR_25X32_BIT: A data type defining an array of 5
DFPM_VECTOR_5X32_BIT. I.e. a multidimensional array equivalent to
a 5 by 5 matrix of standard logic vector type data.
3. DFPM_ARRAY_5X32_BIT: A data type defining an array of 5 signed bit
vectors. It was used to represent 5 by 1 vectors containing signed data.
4. DFPM_ARRAY_25x32_BIT: A data type defining an array of 5
DFPM_ARRAY_5X32_BIT. This is equivalent to a 5 by 5 multidimen-
sional array of signed data.
These packages were used to ease the process of design and implementation
and also facilitated a unified standard between modules.
4.4 Communication top module
The communication top module comprised 8 sub-modules. The modules and their
functionalities are briefly described below.
4.4.1 UART
These are the modules controlling the UART circuitry
1. RS232RefComp: This module was released by Digilent Inc. as a sample
code for an implementation of a UART core for the Nexys2 board. It is
the only purely non-original code used in this project.
It is a simple implementation of UART designed in VHDL and it is re-
sponsible for 1 bit serial data transmission and reception, as well as the
conversion of 1-bit serial to 8-bit parallel data and transmission to the
on-board electronic hardware.
2. UART_INTERFACE: This module was used to control the RS232RefComp
circuit. It determines when the UART core should transmit data, receive
data or neither.
This module is a simple four-state state machine. The states correspond
to:
a. Receive state: When the UART core is switched to receive data.
b. Waiting state: When both the UART interface and the UART core
do nothing but wait for data from the DFPM module.
c. Send state: When the UART module is switched to send 8 bits of data.
d. RepeatSend state: A transitional state that the module enters after
sending each 8-bit data word and before sending the next. This helps to
ensure that the data transmission between the UART INTERFACE and the UART
core is hitch-free.
The control of the UART core from the UART INTERFACE and feed-
back from the UART core was facilitated with the aid of four signals namely
wrSig, rdSig, TBESig and RDASig. These signals and their effect on the UART
core are outlined in Table 4.1 below.
Table 4.1 Table of control signals and their effect on the state of the
UART core
Signal   Value   UART module status (Transmit)   UART module status (Receive)
wrSig    0       Off                             N/A
wrSig    1       On                              N/A
rdSig    0       N/A                             On
rdSig    1       N/A                             Off
Feedback from the UART core was received through the TBE and RDA signals,
which, when raised high, indicated that data had been transmitted or that new
data had been received, respectively.
4.5 Iteration control top module
This module is made up of the circuitry that implements the DFPM algorithm.
The sub-modules were designed to carry out the various computations and
logical evaluation required in the DFPM method.
1. Signed_Vector_Vector_Mult_5By1: This module computes the element-wise
product of two 5 by 1 vectors of 33-bit data. Its operation is concurrent and
all computation results are immediately available at the output when the
input values change.
2. Signed_Vector_Vector_5By1_Subtr: This module computes the element-wise
difference between two vectors. It concurrently performs subtraction
operations on two vectors containing five elements of 33-bit data type and
immediately assigns the result to the output.
3. Signed_SubtrAndMult_Ops_Module: This module instantiates the
vector multiplication and the vector subtraction modules above and us-
es them in the computation “B – A*X – mu*V” for each iteration stage of
the DFPM algorithm.
In this module, computation of the product of matrix A and vector X was a
combination of concurrent and sequential operations. The product of one row of
matrix A and the vector X was computed concurrently, but since matrix A
comprises 5 rows, the row products were pipelined in order of row sequence.
4. Signed_New_V_Ops: This module computed a new value for the vec-
tor V at each iteration stage of the DFPM algorithm. The value was
based on the result of the operations carried out in the subtraction and
multiplication operations module, described in number 3 above.
5. Signed_New_X_Ops: This module computed a new value for the vector
X in each iteration stage of the DFPM algorithm. The new value for vec-
tor X is always dependent on the new value of vector V above.
6. Signed_Tolerance_Check: This module receives the value of B-A*X as
input and should then compare the Euclidean norm of the vector re-
ceived with the pre-fixed tolerance value. However, computing square
roots in FPGA can be problematic and introduce significant errors.
Hence, the square of the tolerance value was compared with the square
of the Euclidean norm, which is equivalent to the sum of the squares of
the elements that make up the vector input.
After comparison, if the square of the norm was found to be less than the
square of the tolerance level, a signal line would be raised and the algorithm
would terminate. The squares were computed by self-multiplication with the aid
of the Vector_Vector_Mult module described above.
When the condition checked by this module is found to be true, convergence is
said to have been reached.
7. Signed_DFPM_One_Iteration: This module instantiated the subtraction and
multiplication module, the new V operation module, the new X operation module
and the tolerance check module. It connected their inputs and outputs
appropriately and comprises all the operations that make up one iteration
stage of the DFPM algorithm.
8. Signed_DFPM_Iteration_Control: This module instantiated the
Signed_DFPM_One_Iteration module. It feeds the new V and X vectors
back into the computational module and stops the iterations when con-
vergence is attained.
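The square-root-free comparison used by the tolerance check module (item 6 above) can be sketched in a few lines. The function name and the sample values are illustrative and do not come from the VHDL source.

```python
def has_converged(residual, tolerance):
    """Compare the squared Euclidean norm with the squared tolerance."""
    # Sum of squares of the elements, i.e. the vector multiplied by itself
    norm_squared = sum(r * r for r in residual)
    # Equivalent to sqrt(norm_squared) < tolerance for a non-negative tolerance
    return norm_squared < tolerance * tolerance


print(has_converged([0.003, 0.004], 0.006))  # norm is 0.005, below 0.006
print(has_converged([0.003, 0.004], 0.004))  # norm is 0.005, above 0.004
```

Because both sides of the inequality are non-negative, squaring them preserves the ordering, so the decision is the same as the one a true norm comparison would give.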
4.6 Implementation constraint
In order to translate, map and route the VHDL design to device-specific
circuitry, an implementation constraints file named UCF_DFPM_TOP was used. The
file links the input and output pins specified in the project top module with
the intended pins on the FPGA chip and demonstration board.
4.7 Parameters
The design was intended to make room for some level of easy configurability.
Thus, the initial values of vectors v and x, and the scalar discretization coeffi-
cient (dt), the tolerance and the damping factor (mu) can be changed inside the
DFPM modules. The UART module parameters can also be easily modified.
The default values for these parameters are listed below:
Table 4.2 Table of parameters and corresponding values used
S/N   Parameter                              Value used
1.    Vector V                               [1 1 1 1 1]
2.    Vector X                               [1 1 1 1 1]
3.    Damping factor                         0.1
4.    Discretization coefficient             1.0
5.    Tolerance                              2^-7
6.    UART baud rate                         9600
7.    Number of data bits per transmission   8
8.    Parity                                 odd
9.    Number of stop bits                    1
10.   Handshaking                            None
4.8 Data exchange format
The exchange of data between the PC terminal and the FPGA system needed
to be standardized in order for the data to be stored in the correct structure
and also for it to be usable by the DFPM computation modules.
The MATLAB approach for specifying vectors and matrices was, hence,
adopted.
In order to specify a problem set of the type applicable in the format usable by
the DFPM module, closing braces begin all problem sets, followed by each
element of each row of the matrix separated by whitespace and each row in a
matrix separated by a semicolon. The solution output from the FPGA is trans-
mitted using the same standard except for the opening and closing braces.
An example of the utilization is shown in the Figure 4.2 below.
Figure 4.2 Image showing the terminal being used for data exchange be-
tween the FPGA and the PC
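As a rough PC-side illustration of the MATLAB-style exchange format described in this section, the sketch below parses a matrix string and formats a solution vector. It assumes square brackets as the enclosing delimiters (as in MATLAB) and four fractional digits on output (as stated in section 6.6); the function names are hypothetical and the exact byte sequence expected by the FPGA design is not reproduced here.

```python
def parse_matrix(text):
    """Parse a MATLAB-style string such as '[4 1; 1 3]' into a list of rows."""
    # Assumed delimiters: square brackets around the whole problem set
    body = text.strip().lstrip("[").rstrip("]")
    # Rows are separated by semicolons, elements by whitespace
    rows = [row.split() for row in body.split(";") if row.strip()]
    return [[float(token) for token in row] for row in rows]


def format_vector(values):
    """Format a solution vector without brackets, four fractional digits."""
    return " ".join(f"{value:.4f}" for value in values)


print(parse_matrix("[4 1; 1 3]"))
print(format_vector([-0.2211, 0.0776]))
```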
4.9 Signed numerical representation
Since digital systems only deal with binary arithmetic for numerical computa-
tions and representation, the numbers handled in the DFPM algorithm were
represented by using signed bits. This decision helped to ensure that positive
and negative numbers were distinguished from one another.
The downside of this approach was that the bit being used for sign representa-
tion could not be used for numerical value representation. Therefore an extra
bit needed to be added to the number of bits representing each signed number
in order to make up for the shortfall.
4.10 Integer and fractional representation
Another important consideration in the design was the representation of
fractional values. It was decided that binary digits after the radix point will be
represented and treated like whole integers i.e. shifted to the left. At the end of
all computations, the result will also be shifted to the right by the appropriate
number of binary digits to make up for the left shift. This process is a simple
scheme that makes for the manipulation of fractions in a way that is similar to
whole numbers.
As a result, each number in the DFPM algorithm consisted of 33 bits. The MSB
indicated the sign of the number while the next 16 bits represented the integer
part of the value being handled. The fractional part of the number was then
represented by the least significant 16 bits.
Below is an image showing a sample numerical representation as used in the
design. It can be seen that the MSB is “0”, therefore it is a positive number.
The next 16 bits are equivalent to 9 in decimal and the last 16 bits are
equivalent to 0.628906 (i.e. 2^-1 + 2^-3 + 2^-8). Hence the number represented
in the image below is +9.6289 in decimal.
Fig 4.3 Image showing the numerical representation scheme
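The 33-bit layout (1 sign bit, 16 integer bits, 16 fractional bits) can be mimicked in software as a sketch. It assumes that negative values are stored in two's complement, as stated in Chapter 5; the constant and function names are illustrative and are not taken from the VHDL packages.

```python
FRAC_BITS = 16                  # bits after the radix point
WORD_BITS = 33                  # sign bit + 16 integer bits + 16 fractional bits
MASK = (1 << WORD_BITS) - 1


def encode(value):
    """Shift the value left by FRAC_BITS and keep it as a 33-bit word."""
    return round(value * (1 << FRAC_BITS)) & MASK


def decode(word):
    """Interpret a 33-bit two's complement word and shift it back right."""
    if word >> (WORD_BITS - 1):   # MSB set, so the number is negative
        word -= 1 << WORD_BITS
    return word / (1 << FRAC_BITS)


word = encode(9.62890625)         # 9 + 2^-1 + 2^-3 + 2^-8
print(format(word, "033b"))
print(decode(word), decode(encode(-0.5)))
```

The printed bit string for 9.62890625 has a leading 0, then the 16-bit integer field holding 9, then the fractional field with bits 1, 3 and 8 set, which matches the sample described above.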
The multiplication of two numbers with n fractional binary digits results in a
product with 2n fractional binary digits. This scheme, therefore, offers an
advantage in multiplication operations since it ensures that multiplicative
operations maintain a precision of 2^-8 for each operation.
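The doubling of fractional digits in a product can be seen in a small sketch. It works on plain signed Python integers scaled by 2^16 rather than on 33-bit hardware words, and the function name is illustrative; how the VHDL design selects the bits it keeps is not shown here.

```python
FRAC_BITS = 16


def fixed_mul(a, b):
    """Multiply two fixed-point integers that each carry FRAC_BITS fractional bits."""
    product = a * b                 # the raw product carries 2 * FRAC_BITS fractional bits
    return product >> FRAC_BITS     # drop the extra bits (rounds toward minus infinity)


a = int(1.5 * (1 << FRAC_BITS))     # 1.5 in fixed point
b = int(-2.25 * (1 << FRAC_BITS))   # -2.25 in fixed point
print(fixed_mul(a, b) / (1 << FRAC_BITS))
print(fixed_mul(1, 1))              # 2^-16 * 2^-16 is too small to represent
```

The second call shows the cost of rescaling: the exact product 2^-32 needs 32 fractional bits, so it is lost once the result is brought back to 16.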
4.11 Spartan 3E-1200 FG320 FPGA
The Spartan 3E-1200 FG320 FPGA is a standard performance 320-ball fine pitch
ball grid array FPGA chip with 1.2 million gates, 136 K RAM, 28 dedicated
multipliers and 250 user IO pins [7]. The chip is made up of five functional
elements which are the Digital Clock Managers (DCMs), the Input/Output
Blocks (IOBs), Configurable Logic Blocks (CLBs), dedicated multipliers and
block RAMs.
The dedicated multipliers are able to directly compute 18-bit by 18-bit
multiplication in two’s complement, while the IOBs can be used for data input
and output to and from the FPGA. The 136 K RAM is equivalent to 139264 bits
(136 * 1024 bits) of memory available for storage on the chip. The logic of
combinatorial and synchronous circuits resulting from the VHDL design is
mainly implemented in CLBs (Configurable Logic Blocks) on the chip.
4.12 Nexys2 FPGA demonstration board
The Nexys2 FPGA demonstration board is a hardware platform, designed and
manufactured to accommodate and support the Spartan 3E FPGA, enable a
demonstration of its capabilities and provide some standard hardware periph-
eral access to the chip.
It can be powered via USB, battery or wall socket and runs on a 50 MHz oscil-
lator while featuring 16 MB SDRAM and flash and an impressive array of
standard hardware interfaces like VGA, USB, RS232 ports as well as switches,
buttons and a quad digit seven segment display [8].
Figure 4.4 Image showing a Nexys2 FPGA demonstration board
4.13 Xilinx ISE
Hardware design was done with Xilinx ISE (Integrated Synthesis Environment)
and the generated design was then downloaded onto the FPGA. Xilinx ISE is free
software developed by Xilinx for hardware design and for programming FPGAs.
There are a number of other design/synthesis environment applications for
hardware design, e.g. Altera’s Quartus II design environment. However, Xilinx
ISE was the obvious choice because it is offered by the vendor of the FPGA
chip used and provides out-of-the-box support for the FPGA chip and the board
used.
4.14 ISim simulation software
ISim simulator software is a software application for the simulation of HDL
code which is bundled with the Xilinx ISE software suite. It is easy to use and
provides support for mixed languages, multi-threaded compilation, and dis-
plays the circuit behavior with the aid of waveforms on the screen.
ModelSim is also a simulation software that can be used but due to its usage
restrictions and the author’s familiarity with ISim, ISim was chosen over
ModelSim.
4.15 Design verification
For each module designed in this project, a test-bench was written for testing,
simulation and verification of its functionality and behavior. Test-benches, in
this context, refer to VHDL code written for the purpose of simulating opera-
tional circumstances of the designed module in question. The modules being
tested are normally referred to as unit under test (UUT).
4.16 The complete design
The complete system integrated these different modules and connected them
while doing type conversion in the top module where appropriate. The incom-
ing data from the UART were converted to signed bit vectors and stored in
memory on the FPGA until all the data necessary for each problem set were
received.
After this, a signal that activates the DFPM computation module is raised so
that computation can start. The complete design made use of 26 multipliers, 12
IOB pins and 3243 LUTs. While the utilization of multipliers was 92%, the
utilization of logical and IO blocks was much lower. A copy of the project
report summary is included in the appendix of this report.
Figure 4.5 The Nexys2 board FPGA connected to a PC and running the
DFPM algorithm.
5 Results
Every module designed in Chapter 4 of this report was tested with a test-bench
written in VHDL. The test benches were written to simulate the expected
conditions and functional environment for each module. The simulations were
done in ISim software and the module’s behavior verified through visual
inspection and calculations. The test benches were not included in the appendix of
this report. The following are results of the tests carried out on the modules.
It is worth noting that since the values represented in this chapter are basically
binary, negative numbers were represented in two’s complement.
5.1 Simulation results
5.1.1 Element wise vector multiplication
The image below shows the result of the simulation of the vector multiplica-
tion module. Vectors 1 and 2 were input while vector_out was the output.
Fig 5.1 Test simulation for Signed_Vector_Vector_Mult module
Vector 1 = [5.0 3.0 2.0 4.0 7.0] and Vector 2 = [3.0 2.0 3.0 4.0 5.0]
The output vector was 1001110 (binary) = 78.0
By calculation: (5*3) + (3*2) + (2*3) + (4*4) + (7*5) = 78
This indicates that the module worked as intended.
5.1.2 Element-wise vector subtraction
Figure 5.2 Test simulation for Signed_Vector_Vector_5By1_Subtr module
Above is an image of the simulation waveform for the vector subtraction
module. The input vectors were named vectors 1 and 2 while the output was
named vector_out.
Vector 1 = [1.0 7.81e-3 11.72e-3 15.62e-3 19.53e-3]
Vector 2 = [15.0 3.91e-3 3.91e-3 3.91e-3 3.91e-3]
Vector out = [-14.0 3.91e-3 7.81e-3 11.72e-3 15.62e-3]
Simple calculation indicates that Vector 1 – vector 2 = vector out.
5.1.3 Evaluating new vector V
In the image below, the effect of operations pipelining can be seen as the
elements of vector_new_v assume new values one clock cycle after one anoth-
er. The iteration complete signal indicates the completion of the subtraction
and multiplication operations in each iteration stage.
Figure 5.3 Test simulation for Signed_New_V_Ops
5.1.4 Evaluating new vector X
Similar to the module in section 5.1.3 above, the effect of pipelining is seen in
the evaluation of vector_new_x. The signal new_v_ready signified that the
evaluation of the new value for vector V was complete and that the evaluation
process for vector x can start.
Figure 5.4 Test simulation for Signed_New_X_Ops
The signal new_X_ready is a signal line that indicated that the operation was
complete. The behavior was as expected.
5.1.5 Convergence check
The tolerance check module was simulated with two sets of values for vector
b_ax. The first set of values was set to be beyond the tolerance level while the
second set of values was set to be below the expected limit.
The signal “iteration complete” was raised at the end of each multiplication
and subtraction operation of the iteration stage. The convergence check module
completes its function in about seven clock cycles, after which the “iterate”
signal is driven high or low depending on the result of the convergence check.
Figure 5.5 Test simulation for tolerance check module
It can be seen above that after the second set of values were received and
computed, the “iterate” signal was brought low. This is consistent with the
design concept.
5.1.6 DFPM top module
This simulation was done with an input set consisting of vector B, matrix A,
and vectors X and V.
By visual inspection of the results from the simulation, the final value of vector
X on the output was calculated thus:
Vector X(0) is a negative number since the first bit is 1.
111111111111111111100011101100101 (binary) in two’s complement is equivalent
to -000000000000000000011100010011010 (binary) in unsigned binary. A
simplified approach to conversion of unsigned binary to and from two’s
complement is outlined in the appendix.
Figure 5.6 Test simulation for DFPM top module
Hence it is correct to state that:
Vector X(0) = -(0.0 + 2^-3 + 2^-4 + 2^-5 + 2^-9 + 2^-12 + 2^-13 + 2^-15)
Vector X(0) = -0.2211
In the same manner, Vector X(1) is a negative number.
111111111111111111110010001110011 (binary) in two’s complement is equivalent
to -000000000000000000001101110001100 (binary) in unsigned binary. Hence,
Vector X(1) = -(0.0 + 2^-4 + 2^-5 + 2^-7 + 2^-8 + 2^-9 + 2^-13 + 2^-14)
Vector X(1) = -0.1076
Vector X(2), Vector X(3) and Vector X(4) are positive numbers since their MSBs
are 0. Therefore conversion from two’s complement is not required for them.
Vector X(2) = 000000000000000000001001111100000
Vector X(2) = +0.0 + 2^-4 + 2^-7 + 2^-8 + 2^-9 + 2^-10 + 2^-11
Vector X(2) = +0.0776
Vector X(3) = 000000000000000000011111001011011
Vector X(3) = +0.0 + 2^-3 + 2^-4 + 2^-5 + 2^-6 + 2^-7 + 2^-10 + 2^-12 + 2^-13 + 2^-15 + 2^-16
Vector X(3) = +0.2436
Vector X(4) = 000000000000000000101101000100000
Vector X(4) = +0.0 + 2^-2 + 2^-4 + 2^-5 + 2^-7 + 2^-11
Vector X(4) = +0.3520
Therefore the final value of the solution vector in this simulation was
X = [-0.2211, -0.1076, 0.0776, 0.2436, 0.3520]
While the behavior seen above was consistent with design expectation, it was
considered that comparison with the output from a MATLAB implementation
would help to further verify the module’s behavior.
The values obtained from the MATLAB code and the VHDL simulations were
quite close as the MATLAB implementation produced vector X as shown
below:
X = [-0.2199, -0.1074, 0.0775, 0.2440, 0.3521]
5.2 Comparison
The circuit implemented on FPGA was tested by connecting the FPGA to a PC
and sending in numbers that represented problem sets while the FPGA re-
turned the solution to the problems. Since the accuracy was crucial, the results
obtained during these tests were noted and compared with values obtainable
from the same algorithm implemented in MATLAB on a PC. The comparison
showed that the values obtained by both systems, for each problem set
investigated, were approximately equal. A table comparing the results obtained
during two of these tests is shown below.
Table 5.1 Table of a comparison of the results obtained from two runs of
DFPM on different systems.
Columns: 1st test, 2nd test.
Rows: Problem set (Vector A, Vector B); Solution vector (MATLAB/PC) in binary
(N/A for both tests) and in decimal; Solution vector (FPGA) in binary and in
decimal.
6 Discussion
Based on the tests carried out on the VHDL design modules, the behavior of
the circuit was as expected. However, a number of implications need to be
discussed.
6.1 FPGA resource utilization
Due to the fact that FPGAs have limited resources, there are established limita-
tions to the number of multiplication operations one can execute in parallel for
problems of the 5x5 matrix dimension implemented in this design. As matrix
dimensions get bigger, the number of concurrent operations possible is reduced
proportionately.
By this design, a problem defined by an n-by-n matrix and n-element vectors
needs n + 5 multipliers. This is
because matrix row-vector multiplication in A*X was done concurrently for
each row while other multiplication operations were done sequentially. An-
other limitation is the data size expected by the dedicated multipliers.
The Spartan 3E multipliers are 18-bit multipliers by default and multiplication
operations involving data types bigger than 18 bits will consume even more
resources. As can be seen in the project report, the actual number of multipli-
ers used was 26 out of a total of 28.
6.2 Reduction in computation time
For every iteration stage of this design, the computation time for (n-1)^2
multiplication operations is saved. Thus for a solution requiring m
iterations, the time required for (n-1)^2 * m multiplication operations is
saved per solution. For instance, a 5 by 5 design as implemented in this
project work saves the computation time for 1600 multiplication operations for
a solution requiring a hundred iterations.
6.3 Larger problem sets
An approach to implementing this design for significantly larger problem sets
might be to section the complete data set into subsets containing small-sized
problem sets which the module is capable of handling. The solutions can then
be stored and reused as appropriate. At a point, this approach might encounter
limitations as well, due to the fact that the on-chip memory of FPGAs is also
limited. However, this was not the focus of this design.
6.4 UART bottleneck
Tests showed that each iteration stage of DFPM computation for a 5 by 5
dimensioned problem required 28 clock cycles. However, the data was being
received through a 9600 baud rate UART. The UART is, thus, slower than the
DFPM computations. In a case where large volumes of data may need to be
transmitted to the DFPM computation module, the UART may prove to be a
bottleneck. This problem might be mitigated with the use of a more parallel
communication mode and faster transmission rates.
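The size of the gap can be estimated with a short calculation. It assumes that the design is clocked by the 50 MHz oscillator on the Nexys2 board and that each UART frame carries one start bit in addition to the 8 data bits, the parity bit and the stop bit listed in Table 4.2; the variable names are illustrative.

```python
CLOCK_HZ = 50_000_000            # assumed: Nexys2 on-board oscillator
CYCLES_PER_ITERATION = 28        # from the tests reported above
BAUD = 9600
BITS_PER_FRAME = 1 + 8 + 1 + 1   # start + data + parity + stop

iteration_time = CYCLES_PER_ITERATION / CLOCK_HZ   # seconds per DFPM iteration
frame_time = BITS_PER_FRAME / BAUD                 # seconds per transmitted byte
iterations_per_byte = frame_time / iteration_time

print(f"one iteration: {iteration_time * 1e9:.0f} ns")
print(f"one UART byte: {frame_time * 1e3:.3f} ms")
print(f"iterations per byte time: {iterations_per_byte:.0f}")
```

Under these assumptions, the time needed to move a single byte over the UART is enough for roughly two thousand iteration stages.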
6.5 Precision
Although the number of bits assigned for fractional value representation was
fairly large (16 bits), there might be some challenges when it comes to the
accuracy of the exact values obtained from multiplication operations. This is
because the result of the multiplication of two 33-bit values is a 66-bit value.
When this product is to be stored back in a 32-bit data type container, then
some bits will be lost.
This problem will, most likely, not affect integer values in the DFPM computa-
tion but can result in some precision loss in the fractional representation.
6.6 Communication input/output limitations
Since the data received from the UART could not be used directly, modules
were written for the forward and reverse translation of the data transmitted to
and received from the DFPM computation module.
For instance, due to the translation done in the “UART_out_DFPM_in” mod-
ule, only single digit decimal numbers are expected as input data typifying the
problem set. Likewise, in order to reduce FPGA resource consumption, reverse
translation of the solution vector element sets was also limited to four fraction-
al digits.
6.7 Cross platform comparison
Since the goal of the project is to implement DFPM in an FPGA design that is
speed optimized, the CPU time consumed by the algorithm became an issue of
pertinent importance. However, since different computational devices have
varying architectures and processing speed, as well as operating systems, a
reasonable metric for the evaluation of the computation time that is independ-
ent of these parameters was needed in order to compare the performance of
the FPGA design with other implementations. The agreed metric was the
number of clock cycles used by the processing unit while executing the DFPM
algorithm.
Thus comparison was done between the DFPM computation done on the
FPGA and the same algorithm coded in C++ and run on a 2.4 GHz CPU PC.
The FPGA implementation completed the algorithm for solving the sample
problem used for testing the DFPM top module (according to simulation) in
57670 nanoseconds which is equivalent to 2883.5 clock cycles while the PC
used completed the same problem in 0.0156001 seconds.
The time used up by the PC included the time used for context switching and
kernel operations, in the operating system, as well as process user time. Provi-
sion was made in the C++ code used for implementing the algorithm and for
measuring the time taken.
In the C++ code, arrays with a dimension of 1000 were created for storing a
thousand copies of vectors A and B and the DFPM algorithm was implement-
ed and looped through each copy of the same problem statement. Thus a
thousand copies of the same problem were treated with the same algorithm.
A thousand runs were used because, when the algorithm was run only once, the
amount of time spent by the CPU in kernel mode was sometimes too low to be
measured by the functions used to read the CPU process times.
Running the algorithm a thousand times generated reasonably measurable process
times. The time spent by the CPU while not running the actual algorithm was
deducted from these, and the result was divided by 1000 to obtain the CPU time
applicable to a single run of the DFPM algorithm.
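The measurement procedure described above can be sketched as follows. This is not the code from the appendices: it uses the standard `std::clock` function in place of whichever process-time functions were used in the thesis, and the name `seconds_per_run` and the way the overhead is estimated (two back-to-back clock reads) are assumptions made for illustration.

```cpp
#include <cassert>
#include <ctime>

// Average processor time of one run: time `runs` consecutive runs, deduct
// the time measured when nothing is run, and divide by the number of runs.
// `work(i)` stands for solving copy i of the problem (hypothetical callback).
template <typename Work>
double seconds_per_run(Work work, int runs) {
    assert(runs > 0);

    // Assumed overhead estimate: the cost of reading the clock itself.
    const std::clock_t overhead_start = std::clock();
    const std::clock_t overhead_end = std::clock();

    const std::clock_t start = std::clock();
    for (int i = 0; i < runs; ++i) {
        work(i);
    }
    const std::clock_t end = std::clock();

    const double total =
        static_cast<double>(end - start) / CLOCKS_PER_SEC;
    const double overhead =
        static_cast<double>(overhead_end - overhead_start) / CLOCKS_PER_SEC;
    const double net = total > overhead ? total - overhead : 0.0;
    return net / runs;
}
```

A call such as `seconds_per_run(solve_copy, 1000)`, where `solve_copy` is a hypothetical function that solves one stored copy of the problem, mirrors the thousand-run measurement. Repeating the work is what lifts the total above the resolution of the timer.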
Based on the test, and on the assumptions that the program was executed on
only one core of the CPU and that the CPU was not overclocking, the number of
clock cycles used by the PC = 2.4 × 10^9 × 0.0156001 / 1000 = 37440.24.
Compared with the 2883.5 clock cycles of the FPGA implementation, this
indicates a clear advantage for the FPGA implementation.
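The cycle counts in this section can be reproduced with the small sketch below. The 50 MHz FPGA clock is not stated in this section; it is inferred from the quoted figures (57670 ns for 2883.5 cycles gives a 20 ns period) and should be treated as an assumption, as should the function and constant names.

```cpp
#include <cassert>

// Clock cycles consumed in `seconds` by a processing unit clocked at `hertz`.
double clock_cycles(double seconds, double hertz) {
    assert(seconds >= 0.0 && hertz > 0.0);
    return seconds * hertz;
}

// PC: 0.0156001 s for a thousand runs on a 2.4 GHz CPU, single core assumed.
const double kPcCycles = clock_cycles(0.0156001 / 1000.0, 2.4e9);

// FPGA: 57670 ns in simulation, at an assumed 50 MHz clock.
const double kFpgaCycles = clock_cycles(57670e-9, 50e6);

// Ratio of the two cycle counts.
const double kCycleRatio = kPcCycles / kFpgaCycles;
```

Under these assumptions the PC needs roughly 13 times as many clock cycles as the FPGA implementation for one run of the sample problem.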
It is worth noting that if the CPU executed the program on multiple cores or
overclocked while running the program, the PC may have ended up using more
cycles than stated above. In either case, the calculations show that the FPGA
implementation would still have been faster. A copy of the C++ code is
included in the appendices.
6.8 Output comparison
In order to ensure consistency of results and ease of operation, a MATLAB
script was written which is able to communicate problem specifications to the
FPGA and receive its results. The MATLAB script also computes the algorithm
on its own and the two outputs were printed to the screen and compared. The
script is described further in Appendix D with the code included.
By making use of the script described above, three different problem sets were
formulated and fed to the DFPM on FPGA design through the MATLAB
script. The results obtained are shown below as well as the MATLAB plots of
the values obtained during each test.
The plots have no units on the x and y axes since the plots were only used to
indicate the proximity between the results obtained. Hence the plots showed
the location of each of the results obtained on the co-ordinate axes.
Figure 6.1 Plot of the values obtained during the first test
Table 6.1 Results obtained in tests with three different problem sets
Test     MATLAB implementation   FPGA implementation
Test 1   -2.4599e-01             -2.4715e-01
         -1.9253e-01             -1.9301e-01
         +5.8280e-03             +5.7221e-03
         +2.5866e-01             +2.5965e-01
         +5.0859e-01             +5.1057e-01
Test 2   -3.8910e-01             -3.9112e-01
         -1.5755e-01             -1.5810e-01
         +1.2061e-02             +1.1765e-02
         +2.6273e-01             +2.6343e-01
         +5.1339e-01             +5.1507e-01
Test 3   +6.5463e-01             +6.5653e-01
         +3.7920e-01             +3.7948e-01
         +3.1785e-01             +3.2008e-01
         +6.8058e-02             +6.8391e-02
         -1.8173e-01             -1.8323e-01
Figure 6.2 Plot of the values obtained during the second test
Figure 6.3 Plot of the values obtained during the third test
As can be seen in the figures and table above, in each of the three tests the
results of the MATLAB implementation and the FPGA implementation agreed so
closely that the plotted points overlapped at each of the positions marked on
the plots. The differences between the values obtained are therefore almost
negligible.
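The closeness of the two result sets can also be quantified directly from Table 6.1. The sketch below is not part of the MATLAB script in Appendix D; the function name and the choice of the largest absolute difference as the measure are assumptions made for illustration. The values are copied from the Test 3 rows of the table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference between two result vectors.
double max_abs_difference(const std::vector<double>& a,
                          const std::vector<double>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::fmax(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// Test 3 of Table 6.1: MATLAB results and FPGA results.
const std::vector<double> kMatlabTest3 = {
    6.5463e-01, 3.7920e-01, 3.1785e-01, 6.8058e-02, -1.8173e-01};
const std::vector<double> kFpgaTest3 = {
    6.5653e-01, 3.7948e-01, 3.2008e-01, 6.8391e-02, -1.8323e-01};
```

For Test 3, `max_abs_difference(kMatlabTest3, kFpgaTest3)` is about 2.2e-03 (third element). The other two tests can be checked the same way with their rows from the table.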
However, it is worth noting that these tests used single-digit data as
coefficients in the matrices and vectors that define the problem sets. It is
believed that this implementation can handle these kinds of data, but the
communication modules were limited by design and can handle single-digit
input only.
While the MATLAB implementation produced results that are very close, it
may be reasonable to expect some variation with some other implementations
and system architectures due to the differences in hardware and software
design, as well as system optimization, be it in hardware or software.
6.9 Communication possibilities
As indicated earlier in this discussion, the speed of the whole system was
limited by bottlenecks in the UART. However, since most communication between
electronic modules and components makes use of standard protocols, of which
UART is one, this design is still expected to perform slightly better and
faster than most other designs that make use of sequential processing.
Nonetheless, there are other faster protocols which can be exploited in order to
speed up the rate of data exchange and parallel communication can also be
considered since the FPGA has a substantial number of I/O (Input/Output)
pins.
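To see why the UART becomes the bottleneck, the transfer time of a serial link can be estimated as in the sketch below. The framing of 10 bits per byte (one start bit, eight data bits, one stop bit) and any baud rate passed to the function are assumptions made for illustration; neither is taken from the design.

```cpp
#include <cassert>

// Seconds needed to move `bytes` bytes over a UART link.
// bits_per_frame = 10 assumes one start bit, 8 data bits and one stop bit.
double uart_transfer_seconds(long bytes, double baud_rate,
                             int bits_per_frame = 10) {
    assert(bytes >= 0 && baud_rate > 0.0 && bits_per_frame > 0);
    return static_cast<double>(bytes) * bits_per_frame / baud_rate;
}
```

As a purely hypothetical example, 100 bytes at 115200 baud take about 8.7 ms, which is far longer than the 57670 ns that the simulated DFPM computation itself required in Section 6.7.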
6.10 Applications
This design concept can find application in a large number of fields ranging
from mathematical theory to real world engineering design and systems. The
DFPM can be used to model systems in nature, for instance heat flow in a
space, and fluid flow [10] etc.
A great number of applications can also be found in electronics and engineer-
ing in general. DFPM will prove very useful in solving least squares and,
possibly, weighted least squares problems in sensor fusion. This will prove
useful in radar systems, telecommunications, multi-sensor networks and
mobile sensory and localization problems often encountered in systems requir-
ing self-localization, e.g. mobile robots, and sound-source detecting systems.
DFPM looks promising for the field of image and signal processing especially
in problems requiring singular value decomposition (SVD). DFPM will also
find great usefulness in mechanics where complex linear and non-linear sys-
tems may need to be modeled.
Solving large matrix problems often requires significant computation and
computational resources, hence DFPM can be a very suitable and
resource-efficient approach to solving these problems. It will be even more
useful when the problem involves sparse matrices, a concept that is useful in
FEM-based simulations, which are used in all engineering fields [9].
A DFPM algorithm based on a smaller dimensioned matrix that functions as a
sliding window through the matrix can serve as a very quick, efficient ap-
proach that requires minimal computational resources.
6.11 Implications
While DFPM offers a lot of advantages and developmental possibilities, there
are situations in which its efficiency can possibly be exploited for negative
purposes.
Certain aspects of data safety and integrity depend on hashing, and a
significant amount of computational resources and time is required to break
these hashes. However, the advent of simpler algorithms and dedicated devices
with great computational power (e.g. FPGAs) facilitates access to supposedly
secured data by criminals.
7 Conclusions
It was found that the design approach met expectations and offered significant
advantages over traditional computational devices and methods. It was also
found that implementing the DFPM algorithm in FPGA is an efficient ap-
proach to reducing computation time and improving resource efficiency.
Since the DFPM algorithm is widely applicable to a number of other problems,
implementing the algorithm in a dedicated device that makes efficient use of
resources, while increasing the speed at which results are obtained, offers a lot
of advantages.
7.1 Benchmark
In order to base the conclusions drawn in this project on criteria that are inde-
pendent of platforms, the computation output and the number of clock cycles
were used.
Based on the result of a test carried out using the C++ snippet in Appendix A,
on a mobile PC, Acer Aspire 5750, with dual CPU cores running at 2.4 GHz
clock speed, it was observed that the same algorithm applied to a specific
problem required 75754 clock cycles on the PC while the same problem was
completed in 3192 clock cycles using the FPGA implementation.
Regardless of the significant difference in computation time and computational
architecture and resources, the results obtained from both computations were
close enough to be regarded as equivalent.
Hence, the initial goals of the design were achieved and the expectation of
superior performance and resource-efficiency was verified.
7.2 Further work
A lot can be improved in this design. Below is a list of possibilities:
1. Improving the forward translation modules so that they can handle
multi-digit decimal input in the problem set.
2. Modifying the module that reverse-translates the solution vector from
the DFPM top module so that it is able to handle the full range of
bits representing fractional values in the data type used in the design.
3. Designing the DFPM computational module to be able to handle larger
problem sets along with the possibility of handling multi-dimensional
problem sets.
4. Enhancing the UART baud rate as well as making it configurable in use.
This will reduce the stress that can be encountered while setting up a
connection between the UART on the FPGA and the terminal applica-
tion software.
5. Enhancing the design so that it can handle multiple problem sets, i.e.
receive a problem set, resolve it and return to wait for the next problem.
References
[1] S. Edvardsson, M. Gulliksson, J. Persson, et al., “The Dynamic Functional
Particle Method: An Approach for Boundary Value Problems”, J. Appl.
Mech. 79(2), 021012 (Feb 24, 2012).
[2] S. Edvardsson et al., “Role of the dynamic functional particle method for
solving linear equations”, Physical Review E: Statistical, Nonlinear, and
Soft Matter Physics.
[3] R. Sincovec, N. Madsen, “Software for non-linear partial differential
equations”, ACM Trans. Math. Softw. 1 (1975) 232-260.
[4] V. Pata, M. Squassina, “On the strongly damped wave equation”,
Commun. Math. Phys. 253 (2005) 511-533.
[5] F. Alvarez, “On the minimization property of a second order dissipative
system in Hilbert spaces”, SIAM J. Control Optim. 38 (2000) 1102-1119.
[6] B. Land, “Hybrid Computing On an FPGA”, Cornell University,
https://courses.cit.cornell.edu/ece576/DDA/FPGAhybridBRL.pdf, last
retrieved 2014-09-25.
[7] Xilinx Inc., 2013, Spartan 3-E FPGA family data sheet,
http://www.xilinx.com/support/documentation/data_sheets/ds312.pdf, last
retrieved 2014-09-25.
[8] Digilent Inc., 2011, Digilent Nexys2 Board Reference manual,
http://www.digilentinc.com/data/products/nexys2/nexys2_rm.pdf, last
retrieved 2014-09-25.
[9] Y. Saad, Iterative methods for sparse linear systems, 2nd ed., Society for
Industrial and Applied Mathematics, 2003.
[10] Ne-Zheng Sun, “Applications of numerical methods to simulate the
movements of contaminants in groundwater”, Environmental Health
Perspectives, Vol. 83 (Nov. 1989), pp. 97-115.
[11] ASCII Table, www.asciitable.com, last retrieved 2014-09-26.
Appendix A: Documentation of
developed program code
Design codes
Vector multiplication
--------------------------------------------------------------
-- Company: Mid Sweden University
-- Engineer: Taiyelolu Adeboye
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- Module Name: Signed_Vector_Vector_Mult_5By1 - Behavioral
-- Project Name: DFPM on FPGA
-- Target Devices: Nexys2
--------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_signed.all;
use IEEE.NUMERIC_STD.ALL;
use work.DFPM_ARRAY_5X32_BIT.all;

-- Dot product of two 5-element vectors of signed values.
-- The logic is purely combinational; CLK and RST are not used.
entity Signed_Vector_Vector_Mult_5By1 is
    Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           Vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           CLK        : in  STD_LOGIC;
           RST        : in  STD_LOGIC;
           Vector_Out : out Signed (32 downto 0));
end Signed_Vector_Vector_Mult_5By1;

architecture Behavioral of Signed_Vector_Vector_Mult_5By1 is

    -- Full-width products of the element pairs
    Signal Mult0, Mult1, Mult2, Mult3, Mult4 : Signed(65 downto 0) := (others => '0');

    -- Sum of the five products, with four extra bits of headroom
    Signal Sum : Signed(69 downto 0) := (others => '0');

begin

    Mult0 <= Vector_1(0) * Vector_2(0);
    Mult1 <= Vector_1(1) * Vector_2(1);
    Mult2 <= Vector_1(2) * Vector_2(2);
    Mult3 <= Vector_1(3) * Vector_2(3);
    Mult4 <= Vector_1(4) * Vector_2(4);

    -- The first operand is sign-extended to 70 bits so that the whole
    -- sum is evaluated at the width of Sum.
    Sum <= resize(Mult0, 70) + Mult1 + Mult2 + Mult3 + Mult4;

    -- Bits 48 downto 16 of the sum form the 33-bit result
    Vector_Out <= Sum(48 downto 16);

end Behavioral;
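The arithmetic of the vector multiplication module above can be modelled in software as in the following sketch. It is a reference model written for illustration, not code from the thesis: the name `dot5`, the use of 64-bit integers and the operand limit are assumptions, and the 16 fractional bits are inferred from the (48 downto 16) slice. The hardware keeps 66-bit products, so the model restricts operands to magnitudes below 2^29 to stay within 64 bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Fixed = std::int64_t;          // holds one signed fixed-point word
constexpr int kFractionBits = 16;    // assumed, from the (48 downto 16) slice
constexpr Fixed kLimit = Fixed{1} << 29;  // model-only operand limit

// Dot product of two 5-element fixed-point vectors: multiply element pairs
// at full width, add the products, then drop the 16 lowest bits.
Fixed dot5(const std::array<Fixed, 5>& a, const std::array<Fixed, 5>& b) {
    Fixed sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i] > -kLimit && a[i] < kLimit);
        assert(b[i] > -kLimit && b[i] < kLimit);
        sum += a[i] * b[i];
    }
    // Arithmetic right shift on mainstream compilers (guaranteed since C++20).
    return sum >> kFractionBits;
}
```

With 1.0 encoded as 65536, the vectors (1.0, 2.0, 0, 0, 0) and (3.0, 0.5, 0, 0, 0) give 262144, which is 4.0 in the same format.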
Vector subtraction
--------------------------------------------------------------
-- Company: Mid Sweden University
-- Engineer: Taiyelolu Adeboye
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- Module Name: Signed_Vector_Vector_5By1_Subtr - Behavioral
-- Project Name: DFPM on FPGA
-- Target Devices: Nexys2
--------------------------------------------------------------

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_signed.all;
use IEEE.NUMERIC_STD.ALL;
use work.DFPM_ARRAY_5X32_BIT.all;

-- Element-wise subtraction of two 5-element vectors of signed values.
-- The logic is purely combinational; CLK and RST are not used.
entity Signed_Vector_Vector_5By1_Subtr is
    Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           CLK        : in  STD_LOGIC;
           RST        : in  STD_LOGIC;
           Vector_Out : out DFPM_SIGNED_VECTOR_5X32_BIT);
end Signed_Vector_Vector_5By1_Subtr;

architecture Behavioral of Signed_Vector_Vector_5By1_Subtr is

    -- Differences, one bit wider than the vector elements
    Signal Subtr0, Subtr1, Subtr2, Subtr3, Subtr4 : Signed(33 downto 0);

begin

    Subtr0 <= ('0' & Vector_1(0)) - vector_2(0);
    Subtr1 <= ('0' & Vector_1(1)) - vector_2(1);
    Subtr2 <= ('0' & Vector_1(2)) - vector_2(2);
    Subtr3 <= ('0' & Vector_1(3)) - vector_2(3);
    Subtr4 <= ('0' & Vector_1(4)) - vector_2(4);

    -- Only the lower 33 bits are passed on
    Vector_Out(0) <= Subtr0(32 downto 0);
    Vector_Out(1) <= Subtr1(32 downto 0);
    Vector_Out(2) <= Subtr2(32 downto 0);
    Vector_Out(3) <= Subtr3(32 downto 0);
    Vector_Out(4) <= Subtr4(32 downto 0);

end Behavioral;
Subtraction and multiplication operations
Subtr_Ops_Module.vhd Wed Feb 04 01:26:12 2015
--------------------------------------------------------------
-- Company: Mid Sweden University
-- Engineer: Taiyelolu Adeboye
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- Module Name: Signed_SubtrAndMult_Ops_Module - Behavioral
-- Project Name: DFPM on FPGA
-- Target Devices: Nexys2
--------------------------------------------------------------

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_signed.all;
use work.DFPM_ARRAY_5X32_BIT.all;
use work.DFPM_ARRAY_25X32_BIT.all;
use IEEE.NUMERIC_STD.ALL;

entity Signed_SubtrAndMult_Ops_Module is
    Port ( Vector_A  : in DFPM_SIGNED_VECTOR_25X32_BIT;
           Vector_B  : in DFPM_SIGNED_VECTOR_5X32_BIT;
           Vector_X  : in DFPM_SIGNED_VECTOR_5X32_BIT;
           Scalar_Mu : in SIGNED (32 downto 0);
           Vector_V  : in DFPM_SIGNED_VECTOR_5X32_BIT;

           CLK                : in  STD_LOGIC;
           RST                : in  STD_LOGIC;
           NEW_ITERATION      : in  STD_LOGIC := '0';
           ITERATION_COMPLETE : out STD_LOGIC := '0';

           B_Minus_AX           : out DFPM_SIGNED_VECTOR_5X32_BIT;
           B_Minus_Ax_Minus_muV : out DFPM_SIGNED_VECTOR_5X32_BIT);
end Signed_SubtrAndMult_Ops_Module;

architecture Behavioral of Signed_SubtrAndMult_Ops_Module is

    ------------------------------------------------
    -- This component is used to evaluate the vector multiplication A*X.
    -- It takes two 5 by 1 vectors as input.
    COMPONENT Signed_Vector_Vector_Mult_5By1
    PORT(
        Vector_1   : IN  DFPM_SIGNED_VECTOR_5X32_BIT;
        Vector_2   : IN  DFPM_SIGNED_VECTOR_5X32_BIT;
        CLK        : IN  std_logic;
        RST        : IN  std_logic;
        Vector_Out : OUT Signed(32 downto 0)
    );
    END COMPONENT;

    -- This component is used to evaluate the subtraction in B - Ax
    COMPONENT Signed_Vector_Vector_5By1_Subtr
    Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           CLK        : in  STD_LOGIC;
           RST        : in  STD_LOGIC;
           Vector_Out : out DFPM_SIGNED_VECTOR_5X32_BIT);
    END COMPONENT;

    ------------------------------------------------
    -- Signals for storing the input values
    Signal Sig_Vector_A : DFPM_SIGNED_VECTOR_25X32_BIT := (others => (others => (others => '0')));

    Signal Sig_Vector_B : DFPM_SIGNED_VECTOR_5X32_BIT := (others => (others => '0'));
    Signal Sig_Vector_X : DFPM_SIGNED_VECTOR_5X32_BIT := (others => (others => '0'));
    Signal Sig_Scalar_Mu: SIGNED (32 downto 0);
    Signal Sig_Vector_V : DFPM_SIGNED_VECTOR_5X32_BIT := (others => (others => '0'));

    -- The two signals below connect the ports of the vector-vector
    -- multiplication module to the corresponding vector indexes.
    -- They are used to avoid assigning dynamically changing signals
    -- directly to a static line.
    Signal Sig_Vector_A_With_IndexPosition : DFPM_SIGNED_VECTOR_5X32_BIT := (others => (others => '0'));

    Signal Sig_Vector_A_Mult_X_With_IndexPosition : SIGNED (32 downto 0);

    -- The following two signals store the products of the multiplication
    -- of vectors A and X, as well as of scalar mu and vector V.
    Signal Sig_Vector_A_Mult_X  : DFPM_SIGNED_VECTOR_5X32_BIT := (others => (others => '0'));
    Signal Sig_Vector_Mu_Mult_V : DFPM_SIGNED_VECTOR_5X32_BIT := (others => (others => '0'));

    -- The following two signals store the results of the subtraction
    -- operations.
    Signal Sig_Vector_B_Minus_AX           : DFPM_SIGNED_VECTOR_5X32_BIT := (others => (others => '0'));
    Signal Sig_Vector_B_Minus_AX_Minus_MuV : DFPM_SIGNED_VECTOR_5X32_BIT := (others => (others => '0'));

    -- This signal is only raised for one clock cycle,
    -- when a new set of data is available for computation.
    Signal DFPMCompute : STD_LOGIC := '0';

    -- This signal is used to communicate with other modules "downstream"
    -- of this module when the result of this module's computation is ready.
    Signal Sig_ITERATION_COMPLETE : STD_LOGIC := '0';

    -- This signal represents the index position that is progressively
    -- incremented as a means of pipelining data for multiplication in this
    -- module, as well as input for the vector-vector multiplication module.
    Signal MultplicationStageArrayPosition : integer := 0;

    -- This signal indicates when the index position can be shifted
    -- and when data can be stored for output.
    Signal Shift_Array_Position : STD_LOGIC := '0';

    -- This signal is raised once when all the products of multiplication
    -- are ready. It enables the module to signal to other modules
    -- "downstream" that the result of the computation is ready.
    Signal MultiplicationProductsReady : STD_LOGIC := '0';

    Signal ReadyFlag : STD_LOGIC := '0';

    -- This clock signal is a slowed-down clock (half the pace of CLK)
    -- used for clocking the shifting of the index position.
    Signal Sig_Clk_For_Index_Shifting : STD_LOGIC := '0';

begin
    -- For vector-vector multiplication
    Vector_Vector_Mult: Signed_Vector_Vector_Mult_5By1 PORT MAP (
        Vector_1   => Sig_Vector_A_With_IndexPosition,
        Vector_2   => Sig_Vector_X,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_A_Mult_X_With_IndexPosition);

    -- For the subtraction B - AX
    Doing_B_Minus_AX : Signed_Vector_Vector_5By1_Subtr PORT MAP (
        Vector_1   => Sig_Vector_B,
        vector_2   => Sig_Vector_A_Mult_X,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_B_Minus_AX);

    -- For the subtraction B - AX - muV
    Doing_B_Minus_AX_Minus_MuV : Signed_Vector_Vector_5By1_Subtr PORT MAP (
        Vector_1   => Sig_Vector_B_Minus_AX,
        vector_2   => Sig_Vector_Mu_Mult_V,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_B_Minus_AX_Minus_MuV);

    -- This output signals that the output of this module is ready to be read.
    ITERATION_COMPLETE <= Sig_ITERATION_COMPLETE;

    -- This process determines when each iteration of the DFPM algorithm is
    -- to be started. Computation is only done if it is a new iteration that
    -- has not been completed before. Therefore this process sets DFPMCompute
    -- to '1', and stores new values into the vectors, only on the rising
    -- edge of NEW_ITERATION.
    process(CLK, RST, Sig_ITERATION_COMPLETE, NEW_ITERATION)
        Variable NEW_ITERATION_Var : STD_LOGIC := '0';
    begin
        if rising_edge(CLK) then
            if (RST = '1') then
                DFPMCompute <= '0';
                NEW_ITERATION_Var := '0';
            elsif (Sig_ITERATION_COMPLETE = '1') then
                NEW_ITERATION_Var := '0';
                DFPMCompute <= '0';
            -- This senses the rising edge of NEW_ITERATION
            elsif (NEW_ITERATION = '1') and (NEW_ITERATION_Var = '0') then
                --if rising_edge(NEW_ITERATION) then
                NEW_ITERATION_Var := '1';

                Sig_Vector_A <= Vector_A;
                Sig_Vector_B <= Vector_B;
                Sig_Vector_X <= Vector_X;
                Sig_Vector_V <= Vector_V;
                Sig_Scalar_Mu <= Scalar_Mu;

                DFPMCompute <= '1';
            elsif (NEW_ITERATION = '1') and (NEW_ITERATION_Var = '1') then
                NEW_ITERATION_Var := '0';
                DFPMCompute <= '0';
            elsif (NEW_ITERATION = '0') then
                NEW_ITERATION_Var := '0';
                DFPMCompute <= '0';
            end if;
        end if;
    end process;

    -- This process determines the array positions to be multiplied together for A*X
    process(RST, Sig_ITERATION_COMPLETE, DFPMCompute, Shift_Array_Position,
            NEW_ITERATION, CLK, Sig_Clk_For_Index_Shifting,
            MultplicationStageArrayPosition, Sig_Vector_A,
            Sig_Vector_A_Mult_X_With_IndexPosition, Sig_Scalar_Mu, Sig_Vector_V)
        Variable MultplicationStageArrayPosition_Var : integer := 0;
    begin
        if (RST = '1') then
            MultplicationStageArrayPosition <= 0;
            Shift_Array_Position <= '0';
            MultiplicationProductsReady <= '0';

        elsif (Sig_ITERATION_COMPLETE = '1') then
            MultplicationStageArrayPosition <= 0;
            Shift_Array_Position <= '0';

        elsif (DFPMCompute = '1') then -- Checking for the rising edge of NEW_ITERATION here
            MultplicationStageArrayPosition <= 0;
            Shift_Array_Position <= '1';
            MultiplicationProductsReady <= '0';

            -- Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(0);
            -- Sig_Vector_A_Mult_X(0) <= Sig_Vector_A_Mult_X_With_IndexPosition;
            -- productTempStore := Sig_Scalar_Mu * Sig_Vector_V(0);
            -- Sig_Vector_Mu_Mult_V(MultplicationStageArrayPosition) <= productTempStore(48 downto 16);

        elsif (Shift_Array_Position = '1') then
            if rising_edge(Sig_Clk_For_Index_Shifting) then
                if (MultplicationStageArrayPosition = 5) then
                    MultplicationStageArrayPosition <= 0;
                    Shift_Array_Position <= '0';
                    MultiplicationProductsReady <= '1';
                else
                    MultplicationStageArrayPosition_Var := MultplicationStageArrayPosition;
                    MultplicationStageArrayPosition <= MultplicationStageArrayPosition_Var + 1;
                end if;
            end if;
        end if;
    end process;

    process(CLK, DFPMCompute, Shift_Array_Position, MultplicationStageArrayPosition)
        Variable productTempStore : Signed(65 downto 0);
    begin
        if rising_edge(CLK) then
            if (Shift_Array_Position = '1') and (MultplicationStageArrayPosition < 5) then
                case MultplicationStageArrayPosition is
                    when 0 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(0);
                        Sig_Vector_A_Mult_X(0) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(0);
                    when 1 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(1);
                        Sig_Vector_A_Mult_X(1) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(1);
                    when 2 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(2);
                        Sig_Vector_A_Mult_X(2) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(2);
                    when 3 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(3);
                        Sig_Vector_A_Mult_X(3) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(3);
                    when 4 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(4);
                        Sig_Vector_A_Mult_X(4) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(4);
                    when Others =>
                        NULL;
                end case;
                -- -- Setting the corresponding Vector_A element as the input to the Vector_Vector_Mult module
                -- Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(MultplicationStageArrayPosition);
                -- -- Connecting the output of the Vector_Vector_Mult module to the corresponding A_Mult_X index
                -- Sig_Vector_A_Mult_X(MultplicationStageArrayPosition) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                -- -- Doing mu*V
                -- productTempStore := Sig_Scalar_Mu * Sig_Vector_V(MultplicationStageArrayPosition);
                Sig_Vector_Mu_Mult_V(MultplicationStageArrayPosition) <= productTempStore(48 downto 16);
            end if;
        end if;
    end process;

    -- This process clears ITERATION_COMPLETE and only sets it to '1' when
    -- the MultiplicationProductsReady signal is high.
    -- When MultiplicationProductsReady is first seen high, the vectors
    -- B_Minus_AX and B_Minus_Ax_Minus_muV are assigned.
    process(CLK, RST, DFPMCompute, MultiplicationProductsReady, ReadyFlag)
    begin
        if rising_edge(clk) then
            if (RST = '1') then
                Sig_ITERATION_COMPLETE <= '0';
                ReadyFlag <= '0';

            elsif (DFPMCompute = '1') then
                Sig_ITERATION_COMPLETE <= '0';
                ReadyFlag <= '0';
            elsif (MultiplicationProductsReady = '1') and (ReadyFlag = '0') then
                ReadyFlag <= '1';

                Sig_ITERATION_COMPLETE <= '1';
                B_Minus_AX <= Sig_Vector_B_Minus_AX;
                B_Minus_Ax_Minus_muV <= Sig_Vector_B_Minus_AX_Minus_MuV;
            else
                Sig_ITERATION_COMPLETE <= '0';
            end if;
        end if;
    end process;

    -- The clock signal created in this process is an afterthought.
    -- It would not have been created if this module had behaved itself ;-))
    -- It was observed that the circuit computed a wrong output for as long
    -- as the shifting of the index position was based on the normal clock
    -- "CLK". Hence this clock, which cuts the speed to half.
    process(CLK)
    begin
        if rising_edge(CLK) then
            Sig_Clk_For_Index_Shifting <= not(Sig_Clk_For_Index_Shifting);
        end if;
    end process;

end Behavioral;
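The two output vectors of the module above, B - AX and B - AX - muV, can be expressed compactly in software. The following floating-point sketch is a reference model written for illustration, not code from the thesis; the type and function names are assumptions, and it ignores the fixed-point truncation performed in the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// The two vectors produced by the subtraction and multiplication module.
struct Residuals {
    Vec b_minus_ax;            // B - A*X
    Vec b_minus_ax_minus_muv;  // B - A*X - mu*V
};

// Computes both output vectors for an n-by-n matrix A (n = 5 in the design).
Residuals residuals(const Mat& A, const Vec& b, const Vec& x,
                    double mu, const Vec& v) {
    const std::size_t n = b.size();
    assert(A.size() == n && x.size() == n && v.size() == n);
    Residuals out{Vec(n), Vec(n)};
    for (std::size_t i = 0; i < n; ++i) {
        assert(A[i].size() == n);
        double ax = 0.0;  // row i of A times X, as in the dot-product module
        for (std::size_t j = 0; j < n; ++j) {
            ax += A[i][j] * x[j];
        }
        out.b_minus_ax[i] = b[i] - ax;
        out.b_minus_ax_minus_muv[i] = out.b_minus_ax[i] - mu * v[i];
    }
    return out;
}
```

In the hardware the five rows of A are fed to a single dot-product module one after another, whereas this model simply loops over the rows.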
Tolerance check
--------------------------------------------------------------
-- Company: Mid Sweden University
-- Engineer: Taiyelolu Adeboye
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- Module Name: Signed_Tolerance_Check - Behavioral
-- Project Name: DFPM on FPGA
-- Target Devices: Nexys2
--------------------------------------------------------------

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_signed.all;
use IEEE.NUMERIC_STD.ALL;
use work.DFPM_ARRAY_5X32_BIT.all;

entity Signed_Tolerance_Check is
    Port ( Vector_B_AX        : in DFPM_SIGNED_VECTOR_5X32_BIT;
           Tolerance_Limit    : in Signed (32 downto 0);
           Iteration_Complete : in STD_LOGIC := '0';

           CLK : in STD_LOGIC := '0';
           RST : in STD_LOGIC := '0';

           Tolerance_Limit_Squared, Vector_B_AX_Sum : out Signed (32 downto 0);

           Iterate : out STD_LOGIC := '1');
end Signed_Tolerance_Check;

architecture Behavioral of Signed_Tolerance_Check is

    Signal Sig_Vector_B_AX, Sig_Vector_B_AX_Squared : DFPM_SIGNED_VECTOR_5X32_BIT;
    Signal Sig_Tolerance_Limit, Sig_Tolerance_Limit_Squared : Signed (32 downto 0);

    Signal Sig_Vector_B_AX_Sum : Signed(32 downto 0);

    Signal Sig_Position : integer := 0;

    Signal Sig_ShiftPosition, Sig_Multiplication_Is_Complete,
           Sig_Check_Tolerance_Limit : STD_LOGIC := '0';

begin

    Tolerance_Limit_Squared <= Sig_Tolerance_Limit_Squared;
    Vector_B_AX_Sum <= Sig_Vector_B_AX_Sum;

    -- This process determines when the data stored internally are to be
    -- serially multiplied. They are multiplied serially to save on
    -- multipliers.
    process(CLK, RST, Iteration_Complete, Sig_ShiftPosition, Sig_Position)
        Variable Var_Position: integer := 0;
    begin
        if rising_edge(CLK) then
            if (RST = '1') then
                Sig_Position <= 0;
                Sig_ShiftPosition <= '0';
                Sig_Multiplication_Is_Complete <= '0';
            elsif (Iteration_Complete = '1') then
                Sig_Check_Tolerance_Limit <= '0';
                Sig_Position <= 0;
                Sig_ShiftPosition <= '1';
                Sig_Multiplication_Is_Complete <= '0';
            elsif (Sig_Multiplication_Is_Complete = '1') then
                Sig_Check_Tolerance_Limit <= '1';
            else
                if (Sig_ShiftPosition = '1') then
                    if (Sig_Position = 5) then
                        Sig_Position <= 0;
                        Sig_Multiplication_Is_Complete <= '1';
                        Sig_ShiftPosition <= '0';
                    else
                        Var_Position := Sig_Position;
                        Sig_Position <= Var_Position + 1;
                    end if;
                end if;
            end if;
        end if;
    end process;

    -- Store the input data internally when the signal from the SubtrAndMult
    -- module goes high.
    process(Iteration_Complete)
    begin
        if rising_edge(Iteration_Complete) then
            Sig_Tolerance_Limit <= Tolerance_Limit;
            Sig_Vector_B_AX <= Vector_B_AX;
        end if;
    end process;

    -- Serial multiplication: one product per clock cycle
    process(CLK)
        Variable productTempStore : Signed(65 downto 0);
    begin
        if rising_edge(CLK) then
            -- Multiply only while position shifting is enabled
            if (Sig_ShiftPosition = '1') then
                Case Sig_Position is
                    when 0 to 4 =>
                        -- Square one element of the vector (B - AX)
                        productTempStore := Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position);
                        Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                    when 5 =>
                        -- Square the tolerance limit
                        productTempStore := Sig_Tolerance_Limit * Sig_Tolerance_Limit;
                        Sig_Tolerance_Limit_Squared <= productTempStore(48 downto 16);
                    when others =>
                        NULL;
                End case;
            end if;
        end if;
    end process;
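
    -- Worked example of the (48 downto 16) slice used above. This is a sketch
    -- that assumes the 33-bit operands hold 16 fractional bits, which this
    -- listing does not state explicitly.
    --   1.5 is stored as 1.5 * 2**16 = 98304 = 0x18000.
    --   98304 * 98304 = 9663676416 = 0x240000000, a product with 32 fractional bits.
    --   Dropping bits 15 downto 0 leaves 0x24000 = 147456 = 2.25 * 2**16,
    --   so the square is back in the assumed 16-fractional-bit format.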

    process(Sig_Multiplication_Is_Complete)
        variable Var_Vector_B_AX_Sum : Signed (36 downto 0);
    begin
        if rising_edge(Sig_Multiplication_Is_Complete) then
            Var_Vector_B_AX_Sum := ("0000" & Sig_Vector_B_AX_Squared(0) + Sig_Vector_B_AX_Squared(1)
                                    + Sig_Vector_B_AX_Squared(2) + Sig_Vector_B_AX_Squared(3)
                                    + Sig_Vector_B_AX_Squared(4));

DFPM on FPGA -Bachelor Thesis Report

  • 1. Självständigt arbete på grundnivå Independent degree project  first cycle Electrical Engineering DFPM on FPGA – A speed optimized implementation of the Dynamic Functional Particle method on Spartan 3E Taiyelolu Adeboye
  • 2. DFPM on FPGA Taiyelolu Adeboye 2015-09-25 iii MID SWEDEN UNIVERSITY Department of Electronics Design(EKS) Examiner: Benny Thörnberg, Benny.Thornberg@miun.se Supervisor: Kent bertilsson, Kent.Bertilsson@miun.se Author: Taiyelolu O. Adeboye, taad1000@student.miun.se Degree programme: International Bachelor’s Programme in Electronics, 180 credits Main field of study: Electronics Engineering Semester, year: Autumn, 2014
  • 3. DFPM on FPGA Taiyelolu Adeboye Abstract 2015-09-25 iv Abstract This thesis focuses on the design of electronic circuitry that implements the Dynamic Functional Particle Method (DFPM). The design was done in VHDL and implemented on a Xilinx Spartan 3E FPGA. The work included a digital 33-bit ALU implementation that was designed to solve differential equations with the DFPM algorithm and UART trans- ceiver and controller circuits for data exchange between the FPGA and the PC. This report explains the design principles, process, tests and results of the work. It also compares the performance of the designed system with the performance of generic computational devices and also examines the possibilities and limitations of operational concurrency with relation to the size of problem sets. Keywords: MATLAB, VHDL, FPGA, DFPM, algorithm evaluation, CPU clock cycles, particle method
  • 4. DFPM on FPGA Taiyelolu Adeboye Acknowledgements 2015-09-25 v Acknowledgements I would like to express my appreciation to my supervisor, Associate Professor Kent Bertilsson, for his guidance, mentorship and support in the course of this project. His contribution was vital to the execution and completion of this project work. I would also like to express my appreci- ation to Associate Professor Sverker Edvardsson for being so approach- able and for his great willingness to explain. My various tutors and examiners in the course of this Bachelor’s pro- gramme have proven themselves to be exceptional and unforgettable. In no particular order, Professor Bengt Oelmann, Dr. Börje Norlin, Profes- sor Kent Bertilsson, Professor Benny Thörnberg, Martin Kjellqvist, Mikael Hasselmalm, Dr. Najeem Lawal, Mikael Bylund, Amir Yousaf, Professor Cornelia Schiebold, Dr. Peng Cheng, Mazhar Hussein, Profes- sor Engmont Porten, Stefan Haller, David Krapohl, Solange Hamrin and Evelina Caffrey will remain entrenched in my memory. Without mincing words, Anders Rådberg, Anders Molin, Sara Lodin, Lars Malmbom, Tove Gullikson and the team at MIUN Innovation will always remain dear to my heart. Thank you for your time, advice and your effort! Finally, I owe a huge debt of gratitude to the following: The divine, for those moments when I was dry, Temitope Ruth, for being so under- standing and special, Ire Peter, our bundle of joy, for being so sweet, Kehinde, my wonderful twin, my family (Samuel, Dorcas, Ardex, Adeyemi and Ope) for being such a pillar of support, and my friends in Sweden and in Nigeria. Words will not be enough to express how much I appreciate you! Thank you for being part of this journey, muchas gracias! Greater things are still to come!
  • 5. DFPM On FPGA Taiyelolu Adeboye Table of Contents 2015-09-25 vi Table of Contents Abstract ............................................................................................................ iv Acknowledgements .........................................................................................v 1 Introduction............................................................................................1 1.1 Background and problem motivation......................................2 1.2 Overall aim...................................................................................3 1.3 Scope .............................................................................................4 1.4 Tools to be used...........................................................................4 1.5 Concrete and verifiable goals ....................................................4 1.6 Outline ..........................................................................................5 1.7 Contributions ...............................................................................5 2 Theory......................................................................................................6 2.1 Definition of terms and abbreviations......................................7 2.1.1 Terms..................................................................................7 2.1.2 Abbreviations..................................................................11 2.2 DFPM algorithm........................................................................12 3 Methodology ........................................................................................15 3.1 Concurrence vs. 
sequentiality .................................................15 3.2 Numerical representation ........................................................15 3.3 Modularity..................................................................................16 4 Design....................................................................................................17 4.1 The DFPM algorithm ................................................................17 4.2 Project Top Module...................................................................19 4.2.1 The two top sub-modules..............................................19 4.2.2 Data type conversion .....................................................19 4.3 Project defined Packages..........................................................20 4.4 Communication Top Module ..................................................20 4.4.1 UART................................................................................20 4.5 Iteration Control Top Module .................................................22 4.6 Implementation Constraint......................................................24 4.7 Parameters..................................................................................24 4.8 Data exchange format...............................................................25 4.9 Signed numerical representation ............................................26 4.10 Integer and fractional representation.....................................27 4.11 Spartan 3E-1200 FG320 FPGA .................................................28
  • 6. DFPM On FPGA Taiyelolu Adeboye Table of Contents 2015-09-25 vii 4.12 Nexys2 FPGA demonstration board ......................................28 4.13 Xilinx ISE ....................................................................................29 4.14 ISim Simulation software.........................................................29 4.15 Design verification ....................................................................30 4.16 The complete design .................................................................30 5 Results ...................................................................................................32 5.1 Simulation results......................................................................32 5.1.1 Element wise vector multiplication .............................32 5.1.2 Element-wise vector subtraction..................................33 5.1.3 Evaluating new vector V ...............................................34 5.1.4 Evaluating new vector X ...............................................34 5.1.5 Convergence check.........................................................35 5.1.6 DFPM top module..........................................................36 5.2 Comparison................................................................................39 6 Discussion.............................................................................................42 6.1 FPGA resource utilization........................................................42 6.2 Reduction in computation time...............................................42 6.3 Larger problem sets ..................................................................42 6.4 UART bottleneck .......................................................................43 6.5 Precision......................................................................................43 6.6 Communication input/output limitations .............................43 6.7 Cross platform 
comparison......................................................43 6.8 Output comparison...................................................................45 6.9 Communication possibilities ...................................................49 6.10 Applications ...............................................................................49 6.11 Implications................................................................................50 7 Conclusions ..........................................................................................51 7.1 Benchmark..................................................................................51 7.2 Further work ..............................................................................51 References........................................................................................................53 Appendix A: Documentation of own developed program code...........54 Design codes ....................................................................................................54 New V operations………. ..............................................................................65 New X operations............................................................................................67 One Iteration …………………………………………………………...69 DFPM top module ..........................................................................................73 UART Core …………………………………………………………..76 UART Interface …………………………………………………………..83 Project Top module.........................................................................................88
  • 7. DFPM On FPGA Taiyelolu Adeboye Table of Contents 2015-09-25 viii Test code written in C++.................................................................................96 Appendix B: Explanation of some basic mathematical concepts........100 Two’s complement........................................................................................100 Euclidian norm ..............................................................................................100 Appendix C: Project report summary.......................................................102 Appendix D: MATLAB codes....................................................................103 Code for problem specification and comparison. ....................................103 Appendix E. Table of standard ASCII symbols and their numerical representation ....................................................................................109
  • 8. DFPM On FPGA Taiyelolu Adeboye 1 Introduction 2015-09-25 1 1 Introduction DFPM on FPGA is a project work that implements the algorithm of the Dy- namic Functional Particle Method in silicon. The implementation was done on Xilinx Spartan 3E FPGA, and it was designed for speed (in terms of the num- ber of clock cycles required for the implementation). The Dynamic Functional Particle Method (DFPM) is a numerical particle method that was developed at Mid Sweden University. While the method is iterative, it consists of steps, some of which can be executed in parallel. There- fore a FPGA was considered to be able to offer advantages due to its parallel processing capabilities. The FPGA implementation takes matrix elements as input parameters through the UART and returns an output in the form of the solution vector relevant to the parameter input received. Figure 1.1: A simplified illustration of the project
  • 9. DFPM On FPGA Taiyelolu Adeboye 1 Introduction 2015-09-25 2 1.1 Background and problem motivation Systems of linear equations can be used to describe many observable natural phenomena in nature and find application in many areas in physics, mechan- ics, and sensor fusion among others. One of the approaches to solving systems of linear equations involves the application of the knowledge of matrices. This approach treats the system as matrices or vectors comprising of elements that represent the parameters of the system in question. This approach often results in the classical A*X = B problem where A, X and B are matrices/vectors. A has elements containing various parameters of the system, X contains elements representing the defining properties of the pa- rameters and B represents the solution vector. For instance, if a system is defined as shown below, 3x – 2y + 4z = 10 5y + 1y – 2z = -2 10y – 5y + 3z = 4 Then it can be represented in A*X = B form as shown below. As the number of variables in these systems increase, the size of the matrices increase proportionately but the number of iterations required for solving the problem using an iterative numerical method increases geometrically, thus consuming significant CPU time. This project aims to address this problem through the design of an Arithmetic and Logical Unit (ALU) that implements the DFPM algorithm in a system that combines sequential and parallel execution as a means of reducing the number of CPU clock cycles required per iteration and consequentially, the computa- tion time for the complete algorithm.
  • 10. DFPM On FPGA Taiyelolu Adeboye 1 Introduction 2015-09-25 3 1.2 Overall aim The overall aim of the project is the design of an ALU that implements the Dynamic Functional Particle Method on a FPGA. The system will be capable of receiving input in the form of parameters that represent the variables of the system to be analysed and will give its output in the form of a matrix whose elements represent the solution to the problem. The designed system will be capable of communicating with a computer through the USB port and the data is to be collected and displayed on the computer screen using suitable software. The output from the designed system should be correct and consistent in comparison with values obtainable from a similar computation executed in MATLAB or similar software on a PC. Figure 1.2: An overview of the project concept
  • 11. DFPM On FPGA Taiyelolu Adeboye 1 Introduction 2015-09-25 4 1.3 Scope The designed system is expected to be able to resolve system of linear equation problems expressed in the form A*X = B where A is a 5x5 square matrix while X and B are 5X1 Vectors respectively. A and B will be given as input to the designed system while the system gives an output that represents X as a solu- tion vector of the system. The input to the designed system should be in the form of positive 8 bit inte- gers while the output from it is expected to consist of whole numbers as well as fractions which can be represented to a maximum precision of 8 binary bits. Although limits have been imposed on the kind of input parameter expected with the aim of easing the communication between the designed FPGA system and PC software, it is expected that the ALU designed should be able to exe- cute the DFPM algorithm on input data beyond these constraints. 1.4 Tools to be used The following tools are expected to be used to carry out this project: 1. Xilinx Spartan 3E FPGA on Nexys2 demonstration board. 2. Xilinx ISE design suite. 3. Desktop terminal application software running on a PC. 4. MATLAB software running on a PC. 1.5 Concrete and verifiable goals The goals of the project are as follows: 1. Design of a processor/ALU in VHDL. The unit should implement the DFPM algorithm. 2. Implementation of parallel processing into the design of the DFPM computational module, as much as optimal for the problem size. 3. Design of UART communication modules, in VHDL, for the transfer of data from the PC/UART port to the DFPM computation module speci- fied in the item number above. 4. Verification of the output from the FPGA. It should be consistently equivalent to the output of the same algorithm run on a PC.
  • 12. DFPM On FPGA Taiyelolu Adeboye 1 Introduction 2015-09-25 5 5. Investigation and suggestion of possible solutions and approaches to scaling up the design for significantly larger problem sets. 1.6 Outline Chapter 2 of this report explains, in brief, the theories behind the design and some related work pertinent to DFPM and the FPGA implementation while Chapter 3 examines the design methodology and principles behind design choices and approaches. Chapter 4 outlines some of the tests carried out to verify the functionality of the modules designed as well as compares the results with those obtainable from other systems. In the fifth chapter, the results are discussed, and the possibilities and limitations examined, and Chapter 6, which concludes the report. 1.7 Contributions This design was wholly done by the author of this report with support and guidance from the supervisor (Associate Prof. Kent Bertilsson). The design was based on the Dynamic Functional Particle Method algorithm which was devel- oped by Prof. Sverker Edvardsson et al [1]. Prof. Sverker Edvardsson supplied the author with information about DFPM and sample application of the algorithm implemented in MATLAB. A UART core designed for the Nexys2 and made available by Digilent Inc., it was adapted in designing the data exchange modules interfacing between the FPGA and the PC.
  • 13. DFPM On FPGA Taiyelolu Adeboye 2 Theory 2015-09-25 6 2 Theory Systems of linear and differential equations is a well-established concept in mathematics and finds its applications in solving theoretical numerical prob- lems as well as real world challenges in various fields of endeavours like mechanics, biology, electronics, economics etc. Thus a lot of work has been done to develop approaches to solving these problems. The dynamic functional paticle (DFPM) is an approach, recently developed by Sverker Edvardsson et al [1] [2], which can be used to solve systems of linear and differential equations. The algorithm is simple, widely applicable and efficient with significant comparative advantages in relation to some of the other established approaches [2]. DFPM implements a novel second order dynamical particle method which, though new, is related to some first order approaches in previous work done by Sincovec and Madsen [3], Pata and Squassina [4], and F. Alvarez [5]. There are a number of computational libraries and algorithm, implementing various approaches to solve problems of linear and differential equation sys- tems. Some of these include ARPACK and LAPACK, Colt library (java), and IML++ (C++) among others. Since this report is not a mathematical treatise, the main focus is on design and implementation of electronic hardware that is able to compute and present solutions to problems presented as a system of differential equations received as input. The design and implementation done in this project, while novel, is also relat- ed to a previous work by Bruce Land entitled “Hybrid Computing on an FPGA“ [6], in which a Digital Differential Analyzer (DDA) was designed and implemented on Altera Cyclone II 2C35 FPGA on an Altera DE2 FPGA demonstration board. The design made use of numerical representation in 18 bits, of which 16 bits were set apart for floating point fractions. Parallel compu- tations were also used in order to reduce CPU computation time. 
Apart from Bruce Land’s design above, there is little or no known information about the implementation of numerical or particle methods in FPGA, and this work could lead to novel concepts and applications.
  • 14. DFPM On FPGA Taiyelolu Adeboye 2 Theory 2015-09-25 7 2.1 Definition of terms and abbreviations 2.1.1 Terms Below are basic definitions and/or explanation of some important concepts used in this report. 1. Linear equations A linear equation can simply be defined as an algebraic equation consisting of either or both constants and a product of constants and single power variables. 2. Systems of linear equations These are a set of simultaneous linear equations which are defined as a single problem and meant to be treated as such. These are often encountered in real life situations and observable physical phenomena. 3. Differential equations These kinds of equations define relationships connecting certain functions or physical properties with their differentials (i.e. derivatives) hence the name. 4. Systems of differential equations These are simultaneous statements of differential equations defining a specific problem as a function of relationships between one or more independent variables and their derivatives (dependent variables). 5. Numerical methods These are approaches to solving mathematical problems with the use of vari- ous methods numerical approximation. Numerical methods can be direct or iterative. Direct numerical methods include algorithms that have a predefined number of steps for arriving at solutions. An example is the Gaussian elimination method. Iterative methods, however, require an undetermined number of iterations, of computational steps, which can vary with each problem defini- tion. Examples of iterative numerical methods are Newton’s method and the Newton-Raphson method.
  • 15. DFPM On FPGA Taiyelolu Adeboye 2 Theory 2015-09-25 8 6. Particle methods Particle methods are algorithms used, primarily, for the simulation of interact- ing particles of physical systems and their motion in nature. These algorithms are, sometimes, applied to numerical treatment of theoretical mathematical models. The dynamic functional particle method falls under this category. 7. Convergence Convergence is a characteristic of an iterative method when its sequences subsequently and consistently approximates, or “converges”, to some specific numeric approximations. The approximation to which the method converges to is said to be the solution for the problem being solved with the use of the iterative method. 8. The Dynamic Functional Particle method This is an iterative particle method applied to general mathematical problems by which mathematical problem models can be translated to particle models and solved, as developed by Sverker Edvardsson et al [2]. The method is robust and widely applicable to problems of systems of linear and differential equations, especially those defining nature and observable physical phenomena. 9. Sequential processes Sequential processes are processes consisting of operations which are carried out one after the other. In these kinds of processes no two operations take place simultaneously. All operations follow a definite sequence. Examples are operations that take place in a single core CPU (Central Processing Unit). 10. Concurrent processes Concurrent processes are processes consisting of more than one operation being carried out in parallel. These kinds of processes can occur in multi-core CPUs, FPGAs and other kinds of devices with parallel processing capabilities. 11. CPU time This refers to the time spent by a processing unit while carrying out a certain computational operation or set of operations. It is expressed in seconds.
  • 16. DFPM On FPGA Taiyelolu Adeboye 2 Theory 2015-09-25 9 12. Clock This is a component in digital electronics systems by which the timing of operations and processes are controlled. It basically oscillates between a high and low signal. 13. Clock cycle This is a single complete up and down oscillation of a clock. 14. Clock frequency This refers to the number of cycles a clock completes in a second. It is ex- pressed in Hertz. 15. Field Programmable Gates Array (FPGA) These are integrated circuits that are factory manufactured to be configurable by engineers and designers as the use case or application demands. They are normally programmed in a hardware description language (HDL). 16. Universal Asynchronous Receiver Transmitter This is a standard hardware that facilitates serial data exchange between two electronic devices. A UART port should be connected to another UART port in order for them to exchange data. Data exchange between UART hardware is 1 bit serial and takes place between cross-connected receiver and transmitter pins while the data received is con- verted to parallel 8 bit format and exchanged between the UART hardware and the device controlling it.
  • 17. Figure 2.1 Simplified illustration of the UART communication process
    17. MATLAB
    MATLAB is an interactive software platform and high-level programming language that is often used in scientific and engineering computing because of its simplicity, robustness and easy-to-use interactive environment and functions. In this project, it was used for the initial execution of the DFPM algorithm and for comparison.
    18. Terminal software application
    This is a software application that gives its user access to one or more input/output ports (e.g. USB) of a PC and displays the data stream. In this project, Br@y++ terminal was used to access a USB port and communicate with the FPGA running the DFPM algorithm.
    19. Two’s complement
    Two’s complement is a method of representing signed numbers in which the most significant bit indicates the sign while the remaining bits carry the numeric value of the number being represented. When the most significant bit of a number represented in two’s complement is “1”, the number is negative; when it is “0”, the number is positive.
  • 18. This is a standard way of representing numbers that is frequently applied in computing and electronics.
    2.1.2 Abbreviations
    The following abbreviations are used in this report:
    ALU: Arithmetic and Logic Unit.
    ASCII: American Standard Code for Information Interchange. This is the standard used for the data exchanged between the PC and the FPGA.
    ASIC: Application Specific Integrated Circuit. These are integrated circuits that are designed or configured for a specific use case or application.
    ARPACK: Arnoldi PACKage. This is a software library, coded in FORTRAN, which can be used to solve eigenvalue problems.
    BGA: Ball Grid Array.
    CLB: Configurable Logic Block. These are logic elements on FPGAs used to implement circuits.
    CPLD: Complex Programmable Logic Device.
    CPU: Central Processing Unit.
    DE: Differential Equations.
    DFPM: Dynamic Functional Particle Method.
    FPGA: Field Programmable Gate Array.
    FPU: Floating-Point Unit.
    HDL: Hardware Description Language. These are languages in which hardware can be described and designed within an ISE or IDE.
    IDE: Integrated Design Environment.
    IOB: Input Output Block. These are ports for input and output to and from the FPGA.
    ISE: Integrated Synthesis Environment. This is software for synthesizing designs done in HDL. Xilinx ISE is an example.
  • 19. LAPACK: Linear Algebra PACKage. This is a library written in FORTRAN which can be used to solve problems in linear algebra.
    LDE: Linear Differential Equations.
    LSB: Least Significant Bit.
    LUT: Look Up Table.
    MATLAB: This is a software platform and high-level language used for programming and simulations.
    MCU: Microcontroller.
    MSB: Most Significant Bit.
    N/A: Not Applicable.
    RAM: Random Access Memory.
    RX: Receive. This is the pin through which data is received on a transceiver port.
    TX: Transmit. This is the pin through which data is transmitted on a transceiver port.
    UART: Universal Asynchronous Receiver Transmitter.
    USB: Universal Serial Bus.
    VGA: Video Graphics Array. This is a standard for image display.
    VHDL: VHSIC Hardware Description Language. In this project, VHDL was used for digital hardware design.
    VHSIC: Very High Speed Integrated Circuit.
    2.2 DFPM algorithm
    The dynamic functional particle method (DFPM) is applicable to a number of different problems when they are defined as systems of linear or differential equations. However, the focus of this project work is the application of DFPM to the classical A*X = B system of equations, as described in Chapter 1 of this report.
  • 20. The algorithm is a two-step computation that is iterated until convergence (or a specified level of convergence) is reached. Convergence is checked by evaluating the Euclidean norm of the difference between vector B and the product of matrix A and vector X, and comparing it with a predetermined scalar value representing the acceptable tolerance of the computation.
    The algorithm requires three n-element vectors as input: vector B from the problem statement, and vectors X and V, which are used within the algorithm. An n x n matrix, equivalent to the A-matrix in the problem statement, is also required. Three scalar inputs, Dt, mu and tolerance, represent the discretization step, the damping factor and the tolerance respectively.
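The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the appendix: the update order (V first, then X from the new V) follows the module descriptions later in the report, but the exact update formulas, the function name `dfpm_solve`, the starting vectors of ones, and the sample values of A, B, Dt and mu are assumptions chosen for the example.

```python
import math


def dfpm_solve(A, B, dt, mu, tolerance, max_iterations=10000):
    """Iterate the two-step DFPM update until ||B - A*X|| < tolerance.

    Assumed update (a damped, explicit time-stepping scheme):
        V <- V + dt * (B - A*X - mu*V)
        X <- X + dt * V
    """
    n = len(B)
    X = [1.0] * n  # assumed starting vectors
    V = [1.0] * n
    for iteration in range(1, max_iterations + 1):
        # Residual B - A*X for the current X.
        residual = [B[i] - sum(A[i][j] * X[j] for j in range(n)) for i in range(n)]
        # Convergence check: Euclidean norm of the residual against the tolerance.
        if math.sqrt(sum(r * r for r in residual)) < tolerance:
            return X, iteration
        # Step 1: new V from the damped residual. Step 2: new X from the new V.
        V = [V[i] + dt * (residual[i] - mu * V[i]) for i in range(n)]
        X = [X[i] + dt * V[i] for i in range(n)]
    raise RuntimeError("no convergence within max_iterations")


# Hypothetical 2x2 example whose exact solution is X = [1, 2].
A = [[2.0, 1.0], [1.0, 2.0]]
B = [4.0, 5.0]
X, iterations = dfpm_solve(A, B, dt=0.5, mu=1.0, tolerance=2 ** -7)
print(X, iterations)
```

Whether the iteration converges depends on Dt, mu and the eigenvalues of A, so the values above should be read as one workable choice for this small matrix rather than as general settings.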
  • 21. Figure 2.2 A flowchart of the DFPM algorithm
    A MATLAB sample code implementing the algorithm in Figure 2.2 above is included in this report.
  • 22. 3 Methodology
    As stated in the introductory part of this report, one of the purposes of this project work is the reduction of CPU time. Hence, significant attention was paid to the computational processes implemented in this design, and to their impact on speed and on resource use on the FPGA. This chapter describes the methodologies and considerations that influenced the design and implementation described in the following chapter.
    An FPGA was preferred over traditional CPUs and other types of processing units because of the operational concurrency that is characteristic of FPGAs and CPLDs.
    Once a design concept had been chosen, the next challenge was the design itself. The design in this project work was done in VHDL (VHSIC Hardware Description Language). While there are other languages and approaches to similar hardware design, VHDL was chosen because of the ease with which it can be used to manage large projects, as well as the author’s familiarity with it.
    3.1 Concurrence vs. sequentiality
    A limitation encountered early in the course of the design was the limited number of dedicated multipliers on the FPGA, which restricts the number of multiplicative operations that can be executed concurrently.
    An important focus of this work is speed optimization, for which concurrency is key. However, a balance needed to be struck between concurrency and sequentiality. Hence some operations were run in parallel while others were sequential. Addition and subtraction operations were mostly concurrent, while some multiplicative operations were sequential and others parallel.
    3.2 Numerical representation
    The dynamic functional particle method involves an iterative process with a number of multiplications, subtractions and additions at each stage. The algorithm was implemented in MATLAB and run, and the result of the computations at each stage of the iteration was output to the console and examined.
  • 23. This cursory examination indicated that the values obtained from the computations ranged across both the positive and negative parts of the number line. This implied that a scheme was needed for the distinct representation of negative and positive values. The values also contained integer as well as fractional parts, so a representation of fractions was needed too.
    3.3 Modularity
    In order to simplify the design, the whole project was split into two major top modules. One of these implemented the DFPM algorithm and the necessary iterative computations, while the other implemented UART communication and data exchange between the UART hardware on the FPGA board and the port on the PC with which it communicates. This second module was also responsible for converting the 8-bit parallel data to 33-bit numbers in the format expected by the DFPM algorithm module.
    Each of these top modules was subdivided into smaller modules which carried out specific functions and communicated with other modules through signals and inter-module data exchange. The details of the design are discussed in Chapter 4.
  • 24. 4 Design
    The digital hardware designed in VHDL consisted of combinatorial and synchronous circuits which were coded as IO ports, modules, processes and signals. The combinatorial circuit elements operated instantaneously, while synchronous circuit activity took place at the edge of the clock.
    The complete design was made up of several modules exchanging information through signal inputs and outputs on their ports. Since the design is reasonably complex and large, each module was given a name that helps to identify its purpose and function.
    The core of the design consisted of the modules which executed the DFPM algorithm. An overview of these core modules and their interaction is presented in Figure 4.1.
    4.1 The DFPM algorithm
    The dynamic functional particle method is applicable to many problem models, as stated in Chapter 2 of this report. However, in order to design a circuit that specifically solves the A*X = B problem, one needs to understand the step-by-step procedure of applying DFPM to the problem. Various implementations of DFPM in MATLAB, C++ and VHDL as applied in this thesis are included in the appendix.
    The procedure begins with access to the input vectors and matrix, whose elements make up the coefficients of the system of equations. The next step is the iterative computation, after which comes the output. Throughout the process, the values of vector B, matrix A, Dt and the damping factor (mu) remain fixed, while the values of vectors X and V may be modified at the end of each iteration.
    Each stage of the iterative computation comprises two steps: the approximation calculation and the convergence check.
    The approximation calculation takes the form of matrix multiplication, subtraction and addition operations, while the convergence check requires a comparison of a predetermined tolerance value with the Euclidean norm of the residual vector B - A*X.
  • 25. Figure 4.1. An overview of the core modules of the DFPM algorithm
  • 26. 4.2 Project top module
    The topmost level container for the project HDL code was named DFPM_ON_FPGA_TOP_MODULE. This module functioned as the overall top module, containing all VHDL code relevant to the project design. It consisted of two top modules which served two distinct and important functions. The modules were named “UART_INTERFACE” and “Signed_DFPM_Iteration_Control_Top_Module”. The complete VHDL code for all the modules will be included as an appendix to this report.
    4.2.1 The two top sub-modules
    The communication top module was designed to handle communication with the PC through the UART port and the UART VHDL code that controlled it. Data received from the PC, normally in 8 bits, were converted to 33 bits in the format stated in section 3.2.2 of this report. The data were also accumulated in arrays internal to this module until all data relevant to the specific problem model had been received. The data were then sent as output through the ports of this module.
    The Signed DFPM Iteration Control module receives a stream of 33-bit data, in a format specified in its design, which mathematically describes the problem being solved. The data received are then subjected to the DFPM algorithm, after which a solution is obtained and sent out through the ports of this module.
    At the conclusion of the Signed DFPM Iteration Control module’s computation, the output signal is returned to the communication top module, which first translates the result into its human-readable decimal equivalent and then serially shifts the values out in 8 bits through the UART interface.
    4.2.2 Data type conversion
    The communication top module handles data as standard logic vectors and standard logic signals, while the Signed DFPM Iteration Control module handles all vector data as signed bit vectors.
    This necessitated the conversion of the data signal types from standard logic vectors to signed bits and vice versa. This was done with the aid of predefined functions which are conversion standards in VHDL. The conversion takes place in the project top module.
  • 27. 4.3 Project defined packages
    The input data for each problem consisted of scalar data, many vectors and some multi-dimensional matrices. Hence a specific format was designed for easy recognition and handling of these vectors and matrices. Because these design-specific vector data types were often handled and shared between multiple modules in the project, it was considered advantageous to create special packages to define them. The specific formats designed are described below:
    1. DFPM_VECTOR_5X32_BIT: A data type defining an array of 5 standard logic vectors. It represents a 5 by 1 vector of standard logic type data.
    2. DFPM_VECTOR_25X32_BIT: A data type defining an array of 5 DFPM_VECTOR_5X32_BIT, i.e. a multidimensional array equivalent to a 5 by 5 matrix of standard logic vector type data.
    3. DFPM_ARRAY_5X32_BIT: A data type defining an array of 5 signed bit vectors. It was used to represent 5 by 1 vectors containing signed data.
    4. DFPM_ARRAY_25x32_BIT: A data type defining an array of 5 DFPM_ARRAY_5X32_BIT. This is equivalent to a 5 by 5 multidimensional array of signed data.
    These packages eased the process of design and implementation and also provided a unified standard between modules.
    4.4 Communication top module
    The communication top module comprised 8 sub-modules. The modules and their functionalities are briefly described below.
    4.4.1 UART
    These are the modules controlling the UART circuitry.
    1. RS232RefComp: This module was released by Digilent Inc. as sample code for an implementation of a UART core for the Nexys2 board. It is the only purely non-original code used in this project.
  • 28. It is a simple implementation of UART designed in VHDL and it is responsible for 1-bit serial data transmission and reception, as well as the conversion of 1-bit serial to 8-bit parallel data and transmission to the on-board electronic hardware.
    2. UART_INTERFACE: This module was used to control the RS232RefComp circuit. It determines when the UART core should transmit data, receive data or do neither. This module is a simple four-state state machine. The states correspond to:
       a. Receive state: The UART core is switched to receive data.
       b. Waiting state: Both the UART interface and the UART core do nothing but wait for data from the DFPM module.
       c. Send state: The UART module is switched to send one 8-bit data word.
       d. RepeatSend state: A transitional state that the module enters after sending each 8-bit data word and before sending the next. This helps to ensure that the data transmission between the UART INTERFACE and the UART core is hitch-free.
    The control of the UART core from the UART INTERFACE, and the feedback from the UART core, were handled by four signals, namely wrSig, rdSig, TBESig and RDASig. These signals and their effect on the UART core are outlined in Table 4.1 below.
  • 29. Table 4.1 Table of control signals and their effect on the state of the UART core

    Signal   Value   UART module status
                     Transmit   Receive
    wrSig    0       Off        N/A
    wrSig    1       On         N/A
    rdSig    0       N/A        On
    rdSig    1       N/A        Off

    Feedback from the UART core was received through the TBE and RDA signals, which, when raised high, indicated that data had been transmitted or that new data had been received, respectively.
    4.5 Iteration control top module
    This module is made up of the circuitry that implements the DFPM algorithm. The sub-modules were designed to carry out the various computations and logical evaluations required in the DFPM method.
    1. Signed_Vector_Vector_Mult_5By1: This module computes the element-wise product of two 5 by 1 vectors of 33-bit data. Its operation is concurrent and all computation results are immediately available at the output when the input values change.
    2. Signed_Vector_Vector_5By1_Subtr: This module computes the element-wise difference between the elements of two vectors. It concurrently performs subtraction operations on two vectors containing five elements of 33-bit data type and immediately assigns the result to the output.
    3. Signed_SubtrAndMult_Ops_Module: This module instantiates the vector multiplication and the vector subtraction modules above and uses them in the computation “B - A*X - mu*V” for each iteration stage of the DFPM algorithm.
  • 30. In this module, computation of the product of matrix A and vector X was a combination of concurrent and sequential operations. The product of one row of matrix A and the vector X was computed concurrently, but since matrix A comprised 5 rows, the row products were pipelined in order of row sequence.
    4. Signed_New_V_Ops: This module computed a new value for the vector V at each iteration stage of the DFPM algorithm. The value was based on the result of the operations carried out in the subtraction and multiplication operations module, described in number 3 above.
    5. Signed_New_X_Ops: This module computed a new value for the vector X in each iteration stage of the DFPM algorithm. The new value for vector X always depends on the new value of vector V above.
    6. Signed_Tolerance_Check: This module receives the value of B - A*X as input and should then compare the Euclidean norm of the vector received with the pre-fixed tolerance value. However, computing square roots in an FPGA can be problematic and can introduce significant errors. Hence, the square of the tolerance value was compared with the square of the Euclidean norm, which is equivalent to the sum of the squares of the elements that make up the input vector.
    After comparison, if the square of the norm was found to be less than the square of the tolerance level, a signal line was raised and the algorithm terminated. The squares of the vector elements were computed by multiplying the vector by itself with the aid of the Vector_Vector_Mult module described above.
    When the condition checked by this module is found to be true, convergence is said to have been reached.
    7. Signed_DFPM_One_Iteration: This module instantiated the subtraction and multiplication module, the new V operation module, the new X operation module and the tolerance check module.
    It connected their inputs and outputs appropriately and contains all the operations that make up one iteration stage of the DFPM algorithm.
    8. Signed_DFPM_Iteration_Control: This module instantiated the Signed_DFPM_One_Iteration module. It feeds the new V and X vectors back into the computational module and stops the iterations when convergence is attained.
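The square-root-free comparison used by the tolerance check (item 6 above) can be illustrated with a short Python sketch. The function name `has_converged` and the sample residual values are hypothetical; the point is only that comparing the sum of squares with the squared tolerance gives the same decision as comparing the norm with the tolerance, because both quantities are non-negative.

```python
def has_converged(residual, tolerance):
    """Return True when ||residual|| < tolerance, without taking a square root."""
    # Sum of the squares of the elements, i.e. the squared Euclidean norm.
    sum_of_squares = sum(r * r for r in residual)
    # Both sides are non-negative, so squaring preserves the comparison.
    return sum_of_squares < tolerance * tolerance


TOLERANCE = 2 ** -7  # tolerance value listed in Table 4.2

# Hypothetical residual vectors B - A*X, one small and one large.
print(has_converged([0.001] * 5, TOLERANCE))
print(has_converged([0.01] * 5, TOLERANCE))
```

In hardware terms this replaces a square-root circuit with one extra multiplication (the squared tolerance, which can also be precomputed as a constant).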
  • 31. 4.6 Implementation constraint
    In order to translate, map and route the VHDL design to a device-specific circuit, an implementation constraints file named UCF_DFPM_TOP was used. The file links the input and output pins specified in the project top module with the intended pins on the FPGA chip and demonstration board.
    4.7 Parameters
    The design was intended to allow some level of easy configurability. Thus, the initial values of vectors V and X, the scalar discretization coefficient (Dt), the tolerance and the damping factor (mu) can be changed inside the DFPM modules. The UART module parameters can also be easily modified. The default values for these parameters are listed below:

    Table 4.2 Table of parameters and corresponding values used

    S/N   Parameter                              Value used
    1.    Vector V                               [1 1 1 1 1]
    2.    Vector X                               [1 1 1 1 1]
    3.    Damping factor                         0.1
    4.    Discretization coefficient             1.0
    5.    Tolerance                              2^-7
    6.    UART baud rate                         9600
    7.    Number of data bits per transmission   8
    8.    Parity                                 odd
    9.    Number of stop bits                    1
    10.   Handshaking                            None
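To make the UART settings in Table 4.2 concrete, the following Python sketch builds the bit sequence for one transmitted byte. It assumes conventional UART framing (one low start bit, data bits sent least significant bit first, then the parity bit, then a high stop bit); the report does not spell out the framing used by the UART core, and the function name `uart_frame` is hypothetical.

```python
BAUD_RATE = 9600  # bits per second, from Table 4.2


def uart_frame(byte):
    """Return the line bits for one byte: start, 8 data (LSB first), odd parity, stop."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    # Odd parity: the parity bit makes the total count of ones (data + parity) odd.
    parity_bit = 1 if sum(data_bits) % 2 == 0 else 0
    return [0] + data_bits + [parity_bit] + [1]


frame = uart_frame(ord("A"))  # ASCII 'A' = 0x41
frame_time_s = len(frame) / BAUD_RATE
print(frame, frame_time_s)
```

Under these assumptions each byte occupies 11 bit periods on the line, so one ASCII character takes a little over a millisecond at 9600 baud.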
  • 32. 4.8 Data exchange format
    The exchange of data between the PC terminal and the FPGA system needed to be standardized so that the data could be stored in the correct structure and used by the DFPM computation modules. The MATLAB approach for specifying vectors and matrices was therefore adopted.
    To specify a problem set in the format usable by the DFPM module, an opening brace begins each problem set, followed by the elements of each row of the matrix separated by whitespace, with the rows of a matrix separated by semicolons. The solution output from the FPGA is transmitted using the same standard, except for the opening and closing braces. An example is shown in Figure 4.2 below.
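A PC-side parser for this kind of MATLAB-style text could look like the sketch below. It is a hypothetical helper, not part of the thesis code: the function name `parse_matrix` is invented, and the use of square brackets as the enclosing characters is an assumption based on MATLAB's own syntax.

```python
def parse_matrix(text):
    """Parse MATLAB-style text such as "[4 1; 1 3]" into a list of rows of floats."""
    # Drop surrounding whitespace and the enclosing brackets (assumed to be [ and ]).
    body = text.strip().strip("[]")
    # Rows are separated by semicolons, elements within a row by whitespace.
    rows = [row.split() for row in body.split(";") if row.strip()]
    matrix = [[float(value) for value in row] for row in rows]
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("rows have different lengths")
    return matrix


print(parse_matrix("[4 1; 1 3]"))
print(parse_matrix("[1 2 3]"))
```

A vector is simply the single-row case, so the same routine covers vector B and matrix A.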
  • 33. Figure 4.2 Image showing the terminal being used for data exchange between the FPGA and the PC
    4.9 Signed numerical representation
    Since digital systems only deal with binary arithmetic for numerical computation and representation, the numbers handled in the DFPM algorithm were represented using signed bits. This ensured that positive and negative numbers were distinguished from one another.
    The downside of this approach was that the bit used for sign representation could not be used for numerical value representation. Therefore an extra bit needed to be added to the number of bits representing each signed number in order to make up for the shortfall.
  • 34. 4.10 Integer and fractional representation
    Another important consideration in the design was the representation of fractional values. It was decided that binary digits after the radix point would be represented and treated like whole integers, i.e. shifted to the left. At the end of all computations, the result is shifted to the right by the appropriate number of binary digits to make up for the left shift. This simple scheme allows fractions to be manipulated in the same way as whole numbers.
    As a result, each number in the DFPM algorithm consisted of 33 bits. The MSB indicated the sign of the number, while the next 16 bits represented the integer part of the value being handled. The fractional part of the number was represented by the least significant 16 bits.
    Below is an image showing a sample numerical representation as used in the design. The MSB is “0”, so the number is positive. The next 16 bits are equivalent to 9₁₀ and the last 16 bits are equivalent to 0.628906 (i.e. 2^-1 + 2^-3 + 2^-8). Hence the number represented in the image below is +9.6289₁₀.
    Fig 4.3 Image showing the numerical representation scheme
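The 33-bit format can be mimicked in software with the following Python sketch. It assumes that the whole word is stored in two's complement (as the negative results later in the report suggest) and that a product is brought back to 16 fractional bits by a right shift of 16; the function names are hypothetical and overflow of the product is not handled.

```python
WORD_BITS = 33
FRACTION_BITS = 16
SCALE = 1 << FRACTION_BITS      # 2^16
MASK = (1 << WORD_BITS) - 1


def to_signed(word):
    """Interpret a 33-bit word as a signed (two's complement) integer."""
    return word - (1 << WORD_BITS) if word >> (WORD_BITS - 1) else word


def encode(value):
    """Real number -> 33-bit word: left shift by 16 binary places, then wrap."""
    raw = round(value * SCALE)
    if not -(1 << (WORD_BITS - 1)) <= raw < (1 << (WORD_BITS - 1)):
        raise OverflowError("value does not fit in 33 bits")
    return raw & MASK


def decode(word):
    """33-bit word -> real number: undo the left shift."""
    return to_signed(word) / SCALE


def multiply(word_a, word_b):
    """Fixed-point product; the raw product has 32 fractional bits, so shift back by 16."""
    product = to_signed(word_a) * to_signed(word_b)
    return (product >> FRACTION_BITS) & MASK


print(format(encode(9.62890625), "033b"))   # the sample value of Fig 4.3
print(decode(multiply(encode(1.5), encode(-2.25))))
```

The first printed line reproduces the sign, integer and fraction fields of the sample value 9 + 2^-1 + 2^-3 + 2^-8.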
  • 35. The multiplication of two numbers with n fractional binary digits results in a product with 2n fractional binary digits. This scheme therefore offers an advantage in multiplication, since it ensures that multiplicative operations maintain a precision of 2^-8 for each operation.
    4.11 Spartan 3E-1200 FG320 FPGA
    The Spartan 3E-1200 FG320 is a standard-performance FPGA chip in a 320-ball fine-pitch ball grid array package, with 1.2 million gates, 136 K RAM, 28 dedicated multipliers and 250 user IO pins [7]. The chip is made up of five functional elements: the Digital Clock Managers (DCMs), the Input/Output Blocks (IOBs), the Configurable Logic Blocks (CLBs), the dedicated multipliers and the block RAMs.
    The dedicated multipliers can directly compute 18-bit by 18-bit multiplications in two’s complement, the IOBs are used for data input and output to and from the FPGA, and the 136 K RAM is equivalent to 139264 bits of on-chip storage (136 * 1024 bits). The logic of the combinatorial and synchronous circuits resulting from the VHDL design is mainly implemented in the CLBs on the chip.
    4.12 Nexys2 FPGA demonstration board
    The Nexys2 FPGA demonstration board is a hardware platform designed and manufactured to accommodate and support the Spartan 3E FPGA, to enable a demonstration of its capabilities and to give the chip access to some standard hardware peripherals.
    It can be powered via USB, battery or wall socket and runs on a 50 MHz oscillator. It features 16 MB SDRAM and flash and an array of standard hardware interfaces such as VGA, USB and RS232 ports, as well as switches, buttons and a four-digit seven-segment display [8].
  • 36. Figure 4.4 Image showing a Nexys2 FPGA demonstration board
    4.13 Xilinx ISE
    Hardware design was done with Xilinx ISE (Integrated Synthesis Environment) and the generated design was then downloaded onto the FPGA. Xilinx ISE is free software developed by Xilinx for hardware design and for programming their FPGAs.
    There are a number of other design/synthesis environments for hardware design, e.g. Altera’s Quartus II design environment. However, Xilinx ISE was an obvious choice because it is offered by the vendor of the FPGA chip used and provides out-of-the-box support for the FPGA chip and the board used.
    4.14 ISim simulation software
    ISim is a software application for the simulation of HDL code and is bundled with the Xilinx ISE software suite. It is easy to use, supports mixed languages and multi-threaded compilation, and displays the circuit behavior as waveforms on the screen.
    ModelSim is another simulation application that could have been used, but ISim was chosen because of ModelSim’s usage restrictions and the author’s familiarity with ISim.
  • 37. 4.15 Design verification
    For each module designed in this project, a test-bench was written for testing, simulation and verification of its functionality and behavior. Test-benches, in this context, are VHDL code written to simulate the operational circumstances of the module in question. The module being tested is normally referred to as the unit under test (UUT).
    4.16 The complete design
    The complete system integrated and connected these different modules, with type conversion done in the top module where appropriate. The incoming data from the UART were converted to signed bit vectors and stored in memory on the FPGA until all the data necessary for each problem set had been received. After this, a signal that activates the DFPM computation module is raised so that computation can start.
    The complete design made use of 26 multipliers, 12 IOB pins and 3243 LUTs. While the utilization of multipliers was 92%, the utilization of logic and IO blocks was much lower. A copy of the project report summary is included in the appendix of this report.
  • 38. Figure 4.5 The Nexys2 board FPGA connected to a PC and running the DFPM algorithm.
  • 39. 5 Results
    Every module designed in Chapter 4 of this report was tested with a test-bench written in VHDL. The test-benches were written to simulate the expected conditions and functional environment for each module. The simulations were done in ISim and each module’s behavior was verified through visual inspection and calculation. The test-benches are not included in the appendix of this report.
    The following are the results of the tests carried out on the modules. Since the values shown in this chapter are binary, negative numbers are represented in two’s complement.
    5.1 Simulation results
    5.1.1 Element wise vector multiplication
    The image below shows the result of the simulation of the vector multiplication module. Vectors 1 and 2 were the inputs while vector_out was the output.
    Fig 5.1 Test simulation for Signed_Vector_Vector_Mult module
    Vector 1 = [5.0 3.0 2.0 4.0 7.0] and Vector 2 = [3.0 2.0 3.0 4.0 5.0]
  • 40. The output vector was 1001110₂ = 78.0
    By calculation: (5*3) + (3*2) + (2*3) + (4*4) + (7*5) = 78
    This indicates that the module worked as intended.
    5.1.2 Element-wise vector subtraction
    Figure 5.2 Test simulation for Signed_Vector_Vector_5By1_Subtr module
    Above is an image of the simulation waveform for the vector subtraction module. The input vectors were named vectors 1 and 2 while the output was named vector_out.
    Vector 1 = [1.0 7.81e-3 11.72e-3 15.62e-3 19.53e-3]
    Vector 2 = [15.0 3.91e-3 3.91e-3 3.91e-3 3.91e-3]
    Vector out = [-14.0 3.91e-3 7.81e-3 11.72e-3 15.62e-3]
  • 41. Simple calculation confirms that Vector 1 - Vector 2 = Vector out.
    5.1.3 Evaluating new vector V
    In the image below, the effect of operation pipelining can be seen as the elements of vector_new_v assume new values one clock cycle after another. The iteration complete signal indicates the completion of the subtraction and multiplication operations in each iteration stage.
    Figure 5.3 Test simulation for Signed_New_V_Ops
    5.1.4 Evaluating new vector X
    As with the module in section 5.1.3 above, the effect of pipelining is seen in the evaluation of vector_new_x. The signal new_v_ready signifies that the evaluation of the new value for vector V is complete and that the evaluation process for vector X can start.
  • 42. Figure 5.4 Test simulation for Signed_New_X_Ops
    The signal new_X_ready is a signal line that indicates that the operation is complete. The behavior was as expected.
    5.1.5 Convergence check
    The tolerance check module was simulated with two sets of values for vector b_ax. The first set of values was chosen to be beyond the tolerance level while the second set was chosen to be below the expected limit. The signal “iteration complete” was raised at the end of each multiplication and subtraction operation of the iteration stage.
    The convergence check module completes its function in about seven clock cycles, after which the “iterate” signal is driven high or low depending on the result of the convergence check.
  • 43. Figure 5.5 Test simulation for tolerance check module
    It can be seen above that after the second set of values was received and computed, the “iterate” signal was brought low. This is consistent with the design concept.
    5.1.6 DFPM top module
    This simulation was done with the following input set:
    Vector B [values missing from source]
  • 44. Matrix A [values missing from source]
    Vectors X and V [values missing from source]
    By visual inspection of the simulation results, the final value of vector X at the output was calculated as follows.
    Vector X(0) is a negative number since its first bit is 1. 111111111111111111100011101100101₂ in two’s complement is equivalent to -000000000000000000011100010011010₂ in unsigned binary. A simplified approach to conversion of unsigned binary to and from two’s complement is outlined in the appendix.
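The conversion from a waveform bit string to a decimal value can be checked with a small Python sketch. The helper name `decode_bits` is hypothetical, and it assumes the 33-bit word is two's complement with 16 fractional bits, as described in the design chapter. The bit strings are written as repeated leading bits plus the trailing pattern so that the word length stays at 33.

```python
def decode_bits(bits, fraction_bits=16):
    """Convert a two's complement bit string with a fixed radix point to a float."""
    raw = int(bits, 2)
    if bits[0] == "1":
        # A leading 1 marks a negative value: subtract 2^width to recover the sign.
        raw -= 1 << len(bits)
    return raw / (1 << fraction_bits)


# X(0) from the simulation: 19 leading ones followed by 00011101100101.
x0 = decode_bits("1" * 19 + "00011101100101")
# X(2) from the simulation: 20 leading zeros followed by 1001111100000.
x2 = decode_bits("0" * 20 + "1001111100000")
print(round(x0, 4), round(x2, 4))
```

Exact two's complement negation inverts every bit and then adds one least significant bit (2^-16 here); that last step changes the value only in the fifth decimal place, so it does not affect results quoted to four decimals.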
  • 45. Figure 5.6 Test simulation for DFPM top module
    Hence it is correct to state that:
    Vector X(0) = -(0.0 + 2^-3 + 2^-4 + 2^-5 + 2^-9 + 2^-12 + 2^-13 + 2^-15)
    Vector X(0) = -0.2211
    In the same manner, Vector X(1) is a negative number. 111111111111111111110010001110011₂ in two’s complement is equivalent to -000000000000000000001101110001100₂ in unsigned binary. Hence,
    Vector X(1) = -(0.0 + 2^-4 + 2^-5 + 2^-7 + 2^-8 + 2^-9 + 2^-13 + 2^-14)
    Vector X(1) = -0.1076
    Vector X(2), Vector X(3) and Vector X(4) are positive numbers since their MSBs are 0. Therefore conversion from two’s complement is not required for them.
    Vector X(2) = 000000000000000000001001111100000
Vector X(2) = +0.0 + 2^-4 + 2^-7 + 2^-8 + 2^-9 + 2^-10 + 2^-11
Vector X(2) = +0.0776

Vector X(3) = 000000000000000000011111001011011
Vector X(3) = +0.0 + 2^-3 + 2^-4 + 2^-5 + 2^-6 + 2^-7 + 2^-10 + 2^-12 + 2^-13 + 2^-15 + 2^-16
Vector X(3) = +0.2436

Vector X(4) = 000000000000000000101101000100000
Vector X(4) = +0.0 + 2^-2 + 2^-4 + 2^-5 + 2^-7 + 2^-11
Vector X(4) = +0.3521

Therefore the final value of the solution vector in this simulation was

X = [-0.2211, -0.1076, 0.0776, 0.2436, 0.3521]

While the behavior seen above was consistent with design expectation, a comparison with the output of a MATLAB implementation was considered useful to further verify the module’s behavior. The values obtained from the MATLAB code and the VHDL simulation were quite close; the MATLAB implementation produced vector X as shown below:

X = [-0.2199, -0.1074, 0.0775, 0.2440, 0.3521]

5.2 Comparison

The circuit implemented on the FPGA was tested by connecting the FPGA to a PC and sending in numbers that represented problem sets, while the FPGA returned the solutions to the problems. Since accuracy was crucial, the results obtained during these tests were noted and compared with values obtained from the same algorithm implemented in MATLAB on a PC. The comparison showed that the values obtained by both systems, for each problem set investigated, were approximately equal. A table comparing the results obtained during two of these tests is shown below.

Table 5.1 Comparison of the results obtained from two runs of DFPM on different systems.

[Table body not recoverable from the source. Columns: 1st test, 2nd test. Rows: Problem Set (Vector A, Vector B); Solution Vector (MATLAB/PC), Binary: N/A, Decimal; Solution Vector (FPGA), Binary, Decimal.]
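The manual conversion of the simulated output words in Section 5.1.6 can be cross-checked with a short Python sketch. It assumes the 33-bit word layout used in this report (one sign bit, 16 integer bits, 16 fractional bits); the helper name `decode_word` is hypothetical and is not part of the design.

```python
FRACTION_BITS = 16   # fractional bits stated in the report
WORD_BITS = 33       # width of the ALU data word

def decode_word(bits):
    """Convert a 33-bit two's complement bit string to a decimal value."""
    if len(bits) != WORD_BITS:
        raise ValueError("expected a 33-bit word")
    raw = int(bits, 2)
    if bits[0] == "1":           # sign bit set: subtract 2**33
        raw -= 1 << WORD_BITS
    return raw / (1 << FRACTION_BITS)

# Output words read from the DFPM top module simulation.
words = [
    "111111111111111111100011101100101",  # X(0)
    "111111111111111111110010001110011",  # X(1)
    "000000000000000000001001111100000",  # X(2)
    "000000000000000000011111001011011",  # X(3)
    "000000000000000000101101000100000",  # X(4)
]

for index, word in enumerate(words):
    print(f"X({index}) = {decode_word(word):+.4f}")
```

Because the function performs an exact two's complement conversion, it also accounts for the extra least significant bit that plain bitwise inversion leaves out.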
6 Discussion

Based on the tests carried out on the VHDL design modules, the behavior of the circuit was as expected. However, a number of implications need to be discussed.

6.1 FPGA resource utilization

Because FPGAs have limited resources, there is a limit to the number of multiplication operations that can be executed in parallel for problems of the 5x5 matrix dimension implemented in this design. As matrix dimensions grow, the number of concurrent operations possible is reduced proportionately.

In this design, a problem defined by an n-dimensional matrix and n-element vectors needs n + 5 multipliers. This is because the matrix row-vector multiplication in A*X was done concurrently for each row, while other multiplication operations were done sequentially. Another limitation is the data size expected by the dedicated multipliers. The Spartan 3E multipliers are 18-bit multipliers by default, and multiplication operations involving data types wider than 18 bits consume even more resources. As can be seen in the project report, the actual number of multipliers used was 26 out of a total of 28.

6.2 Reduction in computation time

For every iteration stage of this design, the computation time of (n - 1)^2 multiplication operations is saved. Thus, for a solution requiring m iterations, the time required for (n - 1)^2 * m multiplication operations is saved per solution. For instance, the 5 by 5 design implemented in this project saves the computation time of 1600 multiplication operations for a solution requiring a hundred iterations.

6.3 Larger problem sets

An approach to implementing this design for significantly larger problem sets might be to partition the complete data set into subsets containing small problem sets that the module is capable of handling. The solutions can then be stored and reused as appropriate.
At some point, this approach might encounter limitations as well, since the on-chip memory of FPGAs is also limited. However, this was not the focus of this design.

6.4 UART bottleneck

Tests showed that each iteration stage of the DFPM computation for a 5 by 5 problem required 28 clock cycles. However, the data was received through a UART running at 9600 baud, which is much slower than the DFPM computations. For example, assuming 10 bits per UART frame, a single byte takes about 1.04 ms to transfer at this rate. Where large volumes of data need to be transmitted to the DFPM computation module, the UART may therefore prove to be a bottleneck. This problem might be mitigated by using a more parallel communication mode and faster transmission rates.

6.5 Precision

Although a fairly large number of bits (16) was assigned to the fractional part, the accuracy of the values obtained from multiplication operations can still be affected. This is because the product of two 33-bit values is a 66-bit value. When this product is stored back in a 33-bit word, some bits are lost. This will most likely not affect integer values in the DFPM computation, but it can result in some precision loss in the fractional part.

6.6 Communication input/output limitations

Since the data received from the UART could not be used directly, modules were written for the forward and reverse translation of the data transmitted to and received from the DFPM computation module.

For instance, due to the translation done in the “UART_out_DFPM_in” module, only single-digit decimal numbers are expected as input data for the problem set. Likewise, in order to reduce FPGA resource consumption, reverse translation of the solution vector elements was limited to four fractional digits.
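The precision loss described in Section 6.5 can be illustrated with a small Python model of the fixed-point multiplication. It is a sketch under stated assumptions: the helper names are hypothetical, values carry 16 fractional bits, and the product is reduced by an arithmetic right shift of 16 bits, which corresponds to the slice (48 downto 16) taken from the 66-bit product in the VHDL code. Overflow of the 33-bit word is not modeled.

```python
FRACTION_BITS = 16

def to_fixed(value):
    """Quantize a real number to a signed word with 16 fractional bits."""
    return int(round(value * (1 << FRACTION_BITS)))

def to_real(word):
    """Convert a fixed-point word back to a real number."""
    return word / (1 << FRACTION_BITS)

def fixed_mul(a, b):
    """Multiply two fixed-point words and keep 16 fractional bits.

    The full product carries 32 fractional bits; the arithmetic right
    shift discards the lowest 16 of them (truncation, not rounding).
    """
    return (a * b) >> FRACTION_BITS

x, y = 0.2211, 0.3521
product = to_real(fixed_mul(to_fixed(x), to_fixed(y)))
error = abs(product - x * y)
print(f"fixed-point product = {product:.6f}")
print(f"floating-point product = {x * y:.6f}")
print(f"absolute error = {error:.2e}")
```

The error stays in the order of one least significant bit (2^-16), which is consistent with the observation that only the fractional part is affected.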
6.7 Cross platform comparison

Since the goal of the project was to implement DFPM in a speed-optimized FPGA design, the CPU time consumed by the algorithm became an important issue. However, since different computational devices have varying architectures, processing speeds and operating systems, a reasonable metric for the computation time that is independent of these parameters was needed in order to compare the performance of the FPGA design with other implementations. The agreed metric was the number of clock cycles used by the processing unit while executing the DFPM algorithm.

Thus a comparison was made between the DFPM computation done on the FPGA and the same algorithm coded in C++ and run on a PC with a 2.4 GHz CPU. According to simulation, the FPGA implementation solved the sample problem used for testing the DFPM top module in 57670 nanoseconds, which is equivalent to 2883.5 clock cycles, while the PC needed 0.0156001 seconds for a thousand runs of the same problem, as described below.

The time used by the PC included the time spent on context switching and kernel operations in the operating system, as well as process user time. Provision for measuring the time taken was made in the C++ code that implemented the algorithm.

In the C++ code, arrays with a dimension of 1000 were created for storing a thousand copies of vectors A and B, and the DFPM algorithm was looped through each copy of the same problem statement. Thus a thousand copies of the same problem were treated with the same algorithm. The large number of repetitions was needed because the amount of time spent by the CPU in kernel mode was sometimes too low to be measured by the functions used to measure the CPU process times when the algorithm was run only once.

Hence running the algorithm a thousand times generated reasonably measurable process times. The time spent by the CPU while not running the actual algorithm was deducted from these, and the result was divided by 1000 to obtain the CPU time applicable to a single run of the DFPM algorithm.
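The measurement procedure described above can be sketched as follows. This is a Python illustration of the idea, not the C++ code from the appendix: `solve_once` is a hypothetical stand-in workload, the empty-loop deduction is an assumed way of removing time not spent in the algorithm, and the 2.4 GHz clock rate is the figure quoted in the text.

```python
import time

CPU_HZ = 2.4e9   # clock rate quoted in the text
REPEATS = 1000   # number of copies of the problem, as in the text

def solve_once():
    """Hypothetical stand-in for one run of the DFPM solver."""
    return sum(i * i for i in range(100))

def cpu_time(func, repeats):
    """Process CPU time consumed by calling func `repeats` times."""
    start = time.process_time()
    for _ in range(repeats):
        func()
    return time.process_time() - start

def cycles_from_seconds(seconds, cpu_hz=CPU_HZ):
    """Convert a CPU time to a clock cycle count at the given clock rate."""
    return seconds * cpu_hz

def estimate_cycles(func, repeats=REPEATS):
    """Average cycles per call, with the empty-loop overhead deducted."""
    busy = cpu_time(func, repeats)
    overhead = cpu_time(lambda: None, repeats)
    per_run = max(busy - overhead, 0.0) / repeats
    return cycles_from_seconds(per_run)

print(f"approximate cycles per run: {estimate_cycles(solve_once):.0f}")
```

Repeating the workload and dividing afterwards is what makes very short run times measurable with a coarse process timer.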
Based on the test, and on the assumptions that the program was executed on only one core of the CPU and that the CPU was not overclocking, the number of clock cycles used by the PC = 2.4 * 10^9 * 0.0156001 / 1000 = 37440.24. Compared with the 2883.5 clock cycles of the FPGA implementation, this is roughly 13 times as many cycles, which indicates that the FPGA implementation offers a great advantage.

It is worth noting that if the CPU executed the program on multiple cores or overclocked while running the program, the PC may have ended up using more cycles than stated above. Nonetheless, the calculations show that in both cases the DFPM implementation on the FPGA would still have been faster.

A copy of the C++ code is included in the appendices.

6.8 Output comparison

In order to ensure consistency of results and ease of operation, a MATLAB script was written that communicates problem specifications to the FPGA and receives its results. The script also runs the algorithm itself, and the two outputs were printed to the screen and compared. The script is described further in Appendix D, with the code included.

Using this script, three different problem sets were formulated and fed to the DFPM on FPGA design. The results obtained are shown below, together with MATLAB plots of the values obtained during each test. The plots have no units on the x and y axes since they were only used to indicate the proximity between the results obtained; they show the location of each result on the coordinate axes.

Figure 6.1 Plot of the values obtained during the first test
Table 6.1 Results obtained in tests with three different problem sets

Tests     MATLAB implementation    FPGA implementation
Test 1    -2.4599e-01              -2.4715e-01
          -1.9253e-01              -1.9301e-01
          +5.8280e-03              +5.7221e-03
          +2.5866e-01              +2.5965e-01
          +5.0859e-01              +5.1057e-01
Test 2    -3.8910e-01              -3.9112e-01
          -1.5755e-01              -1.5810e-01
          +1.2061e-02              +1.1765e-02
          +2.6273e-01              +2.6343e-01
          +5.1339e-01              +5.1507e-01
Test 3    +6.5463e-01              +6.5653e-01
          +3.7920e-01              +3.7948e-01
          +3.1785e-01              +3.2008e-01
          +6.8058e-02              +6.8391e-02
          -1.8173e-01              -1.8323e-01
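The closeness of the two columns in Table 6.1 can be quantified with a short Python sketch. It assumes that, in each test, the first five values listed belong to the MATLAB implementation and the next five to the FPGA implementation; the helper name `max_abs_difference` is hypothetical.

```python
matlab = {
    "Test 1": [-2.4599e-01, -1.9253e-01, 5.8280e-03, 2.5866e-01, 5.0859e-01],
    "Test 2": [-3.8910e-01, -1.5755e-01, 1.2061e-02, 2.6273e-01, 5.1339e-01],
    "Test 3": [6.5463e-01, 3.7920e-01, 3.1785e-01, 6.8058e-02, -1.8173e-01],
}
fpga = {
    "Test 1": [-2.4715e-01, -1.9301e-01, 5.7221e-03, 2.5965e-01, 5.1057e-01],
    "Test 2": [-3.9112e-01, -1.5810e-01, 1.1765e-02, 2.6343e-01, 5.1507e-01],
    "Test 3": [6.5653e-01, 3.7948e-01, 3.2008e-01, 6.8391e-02, -1.8323e-01],
}

def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

for name in matlab:
    diff = max_abs_difference(matlab[name], fpga[name])
    print(f"{name}: largest absolute difference = {diff:.2e}")
```

For the values in the table, the largest element-wise difference in every test stays below 2.5e-3.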
Figure 6.2 Plot of the values obtained during the second test
Figure 6.3 Plot of the values obtained during the third test

As can be seen in the figures and table above, in each of the three tests carried out, the results of the MATLAB implementation and the FPGA implementation agreed so closely that the plotted points overlapped at each of the marked positions, indicating that the differences in the values obtained are almost negligible.

However, it is worth noting that these tests used single-digit data as coefficients in the matrices and vectors that define the problem sets. It is believed that the implementation can handle other kinds of data as well, but the communication modules were limited, by design intent, to single-digit input.

While the MATLAB implementation produced results that are very close, it may be reasonable to expect some variation with other implementations and system architectures due to differences in hardware and software design, as well as system optimization, be it in hardware or software.
6.9 Communication possibilities

As indicated earlier in this discussion, the speed of the whole system was limited by the bottleneck in the UART. However, considering that most communication between electronic modules and components uses standard protocols, of which UART is one, this design will still perform slightly better and faster than most other designs that rely on sequential processing.

Nonetheless, there are faster protocols that can be exploited to speed up the rate of data exchange, and parallel communication can also be considered since the FPGA has a substantial number of I/O (Input/Output) pins.

6.10 Applications

This design concept can find application in a large number of fields, ranging from mathematical theory to real-world engineering design and systems. The DFPM can be used to model systems in nature, for instance heat flow in a space and fluid flow [10].

A great number of applications can also be found in electronics and engineering in general. DFPM should prove very useful in solving least squares and, possibly, weighted least squares problems in sensor fusion. This is relevant to radar systems, telecommunications, multi-sensor networks and the localization problems often encountered in systems requiring self-localization, e.g. mobile robots and sound-source detecting systems.

DFPM looks promising for the field of image and signal processing, especially in problems requiring singular value decomposition (SVD). DFPM should also be useful in mechanics, where complex linear and non-linear systems may need to be modeled. Solutions of large matrix problems often require significant computation and computational resources; hence DFPM can be a very suitable and resource-efficient approach to solving these problems.

It will be even more useful when the problem involves sparse matrices, a concept that is central to FEM-based simulations, which are used across engineering fields [9].
A DFPM algorithm based on a smaller matrix that functions as a sliding window through the full matrix can serve as a quick, efficient approach that requires minimal computational resources.

6.11 Implications

While DFPM offers many advantages and developmental possibilities, there are situations in which its efficiency could be exploited for negative purposes.

Certain aspects of data safety and integrity depend on hashing, and a significant amount of computational resources and time is required to break them. However, the advent of simpler algorithms and dedicated devices with great computational power (e.g. FPGAs) facilitates access to supposedly secured data by criminals.
7 Conclusions

It was found that the design approach met expectations and offered significant advantages over traditional computational devices and methods. It was also found that implementing the DFPM algorithm on an FPGA is an efficient approach to reducing computation time and improving resource efficiency.

Since the DFPM algorithm is applicable to a wide range of other problems, implementing it in a dedicated device that makes efficient use of resources, while increasing the speed at which results are obtained, offers many advantages.

7.1 Benchmark

In order to base the conclusions drawn in this project on platform-independent criteria, the computation output and the number of clock cycles were used.

Based on the result of a test carried out using the C++ snippet in Appendix A on a mobile PC (Acer Aspire 5750, with dual CPU cores running at 2.4 GHz), it was observed that the algorithm applied to a specific problem required 75754 clock cycles on the PC, while the same problem was completed in 3192 clock cycles using the FPGA implementation.

Regardless of the significant differences in computation time, computational architecture and resources, the results obtained from both computations were close enough to be regarded as equivalent.

Hence, the initial goals of the design were achieved and the expectation of superior performance and resource efficiency was verified.

7.2 Further work

A lot can be improved in this design. Below is a list of possibilities:

1. Improving the forward translation modules so that they can handle multi-digit decimal input in the problem set.
2. Modifying the module that reverse-translates the solution vector from the DFPM top module so that it can handle the full range of bits representing fractional values in the data type used in the design.
3. Designing the DFPM computational module to handle larger problem sets, along with the possibility of handling multi-dimensional problem sets.
4. Increasing the UART baud rate and making it configurable in use. This will reduce the difficulty that can be encountered while setting up a connection between the UART on the FPGA and the terminal application software.
5. Enhancing the design so that it can handle multiple problem sets, i.e. receive a problem set, solve it and return to wait for the next problem.
References

[1] S. Edvardsson, M. Gulliksson, J. Persson, et al., “The Dynamic Functional Particle Method: An Approach for Boundary Value Problems”, J. Appl. Mech. 79(2) 021012 (Feb 24, 2012)
[2] S. Edvardsson et al., Role of the dynamic functional particle method for solving linear equations, Physical Review E. Statistical, Nonlinear, and Soft Matter Physics.
[3] R. Sincovec, N. Madsen, Software for non-linear partial differential equations, ACM Trans. Math. Softw. 1 (1975) 232-260
[4] V. Pata, M. Squassina, On the strongly damped wave equation, Commun. Math. Phys. 253 (2005) 511-533
[5] F. Alvarez, On the minimization property of a second order dissipative system in Hilbert spaces, SIAM J. Control Optim. 38 (2000) 1102-1119
[6] B. Land, Hybrid Computing On an FPGA, Cornell University, https://courses.cit.cornell.edu/ece576/DDA/FPGAhybridBRL.pdf, last retrieved 2014-09-25
[7] Xilinx Inc., 2013: Spartan 3-E FPGA family data sheet, http://www.xilinx.com/support/documentation/data_sheets/ds312.pdf, last retrieved 2014-09-25
[8] Digilent Inc., 2011, Digilent Nexys2 Board Reference manual, http://www.digilentinc.com/data/products/nexys2/nexys2_rm.pdf, last retrieved 2014-09-25
[9] Y. Saad, Iterative methods for sparse linear systems, 2nd ed., Society for Industrial and Applied Mathematics, 2003.
[10] Ne-Zheng Sun, Applications of numerical methods to simulate the movements of contaminants in groundwater, Environmental Health Perspectives, Vol. 83, (Nov. 1989), pp. 97-115.
[11] ASCII Table, www.asciitable.com, last retrieved 2014-09-26.
Appendix A: Documentation of developed program code

Design codes

Vector multiplication

--------------------------------------------------------------
-- Company: Mid Sweden University
-- Engineer: Taiyelolu Adeboye
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- Module Name: Signed_Vector_Vector_Mult_5By1 - Behavioral
-- Project Name: DFPM on FPGA
-- Target Devices: Nexys2
-------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_signed.all;
use work.DFPM_ARRAY_5X32_BIT.all;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use IEEE.NUMERIC_STD.ALL;

-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity Signed_Vector_Vector_Mult_5By1 is
    Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           Vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           CLK        : in  STD_LOGIC;
           RST        : in  STD_LOGIC;
           Vector_Out : out Signed (32 downto 0));
end Signed_Vector_Vector_Mult_5By1;

architecture Behavioral of Signed_Vector_Vector_Mult_5By1 is

    Signal Mult0, Mult1, Mult2, Mult3, Mult4 : Signed(65 downto 0) := (others => '0');

    Signal Sum : Signed(69 downto 0) := (others => '0');

begin

    Mult0 <= Vector_1(0) * Vector_2(0);
    Mult1 <= Vector_1(1) * Vector_2(1);
    Mult2 <= Vector_1(2) * Vector_2(2);
    Mult3 <= Vector_1(3) * Vector_2(3);
    Mult4 <= Vector_1(4) * Vector_2(4);

    Sum <= "0000" & Mult0 + Mult1 + Mult2 + Mult3 + Mult4;

    Vector_Out <= Sum(48 downto 16);

end Behavioral;

Vector subtraction

--------------------------------------------------------------
-- Company: Mid Sweden University
-- Engineer: Taiyelolu Adeboye
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- Module Name: Signed_Vector_Vector_Mult_5By1 - Behavioral
-- Project Name: DFPM on FPGA
-- Target Devices: Nexys2
-------------------------------------------------------------

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_signed.all;
use work.DFPM_ARRAY_5X32_BIT.all;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use IEEE.NUMERIC_STD.ALL;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
--use IEEE.NUMERIC_STD.ALL;

entity Signed_Vector_Vector_5By1_Subtr is
    Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           CLK        : in  STD_LOGIC;
           RST        : in  STD_LOGIC;
           Vector_Out : out DFPM_SIGNED_VECTOR_5X32_BIT);
end Signed_Vector_Vector_5By1_Subtr;

architecture Behavioral of Signed_Vector_Vector_5By1_Subtr is

    Signal Subtr0, Subtr1, Subtr2, Subtr3, Subtr4 : Signed(33 downto 0);

begin

    Subtr0 <= '0' & Vector_1(0) - vector_2(0);
    Subtr1 <= '0' & Vector_1(1) - vector_2(1);
    Subtr2 <= '0' & Vector_1(2) - vector_2(2);
    Subtr3 <= '0' & Vector_1(3) - vector_2(3);
    Subtr4 <= '0' & Vector_1(4) - vector_2(4);

    Vector_Out(0) <= Subtr0(32 downto 0);
    Vector_Out(1) <= Subtr1(32 downto 0);
    Vector_Out(2) <= Subtr2(32 downto 0);
    Vector_Out(3) <= Subtr3(32 downto 0);
    Vector_Out(4) <= Subtr4(32 downto 0);

end Behavioral;

Subtraction and multiplication operations (Subtr_Ops_Module.vhd)
--------------------------------------------------------------
-- Company: Mid Sweden University
-- Engineer: Taiyelolu Adeboye
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- Module Name: Signed_Vector_Vector_Mult_5By1 - Behavioral
-- Project Name: DFPM on FPGA
-- Target Devices: Nexys2
-------------------------------------------------------------

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_signed.all;
use work.DFPM_ARRAY_5X32_BIT.all;
use work.DFPM_ARRAY_25X32_BIT.all;
use IEEE.NUMERIC_STD.ALL;

entity Signed_SubtrAndMult_Ops_Module is
    Port ( Vector_A  : in DFPM_SIGNED_VECTOR_25X32_BIT;
           Vector_B  : in DFPM_SIGNED_VECTOR_5X32_BIT;
           Vector_X  : in DFPM_SIGNED_VECTOR_5X32_BIT;
           Scalar_Mu : in SIGNED (32 downto 0);
           Vector_V  : in DFPM_SIGNED_VECTOR_5X32_BIT;

           CLK                : in  STD_LOGIC;
           RST                : in  STD_LOGIC;
           NEW_ITERATION      : in  STD_LOGIC := '0';
           ITERATION_COMPLETE : out STD_LOGIC := '0';

           B_Minus_AX           : out DFPM_SIGNED_VECTOR_5X32_BIT;
           B_Minus_Ax_Minus_muV : out DFPM_SIGNED_VECTOR_5X32_BIT);
end Signed_SubtrAndMult_Ops_Module;

architecture Behavioral of Signed_SubtrAndMult_Ops_Module is

    ------------------------------------------------

    -- This component will be used to evaluate
    -- the vector multiplication A*X.
    -- It takes two inputs of 5 by 1 vectors.
    COMPONENT Signed_Vector_Vector_Mult_5By1
        PORT(
            Vector_1   : IN  DFPM_SIGNED_VECTOR_5X32_BIT;
            Vector_2   : IN  DFPM_SIGNED_VECTOR_5X32_BIT;
            CLK        : IN  std_logic;
            RST        : IN  std_logic;
            Vector_Out : OUT Signed(32 downto 0)
        );
    END COMPONENT;

    -- This component will be used to evaluate the subtraction in B - Ax
    COMPONENT Signed_Vector_Vector_5By1_Subtr
        Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
               vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
               CLK        : in  STD_LOGIC;
               RST        : in  STD_LOGIC;
               Vector_Out : out DFPM_SIGNED_VECTOR_5X32_BIT);
    END COMPONENT;

    ------------------------------------------------

    ------------------------------------------------
    -- Signals for storing the input values
    Signal Sig_Vector_A : DFPM_SIGNED_VECTOR_25X32_BIT := (
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')),
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')),
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')),
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')),
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')));

    Signal Sig_Vector_B : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));
    Signal Sig_Vector_X : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));
    Signal Sig_Scalar_Mu: SIGNED (32 downto 0);
    Signal Sig_Vector_V : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

    -- The two signals below are used to connect the signals at the Vector_vector_Mult_Module
    -- to the corresponding vector indexes.
    -- These were used to avoid assigning dynamically changing signals directly to a static line
    Signal Sig_Vector_A_With_IndexPosition : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

    Signal Sig_Vector_A_Mult_X_With_IndexPosition : SIGNED (32 downto 0);

    -- The following two (2) signals will be used to store the products of the
    -- multiplication of Vectors A and X
    -- as well as Scalar mu and Vector V.
    Signal Sig_Vector_A_Mult_X : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));
    Signal Sig_Vector_Mu_Mult_V : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

    -- The following two signals will be used to store the result
    -- of the subtraction operations
    Signal Sig_Vector_B_Minus_AX : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));
    Signal Sig_Vector_B_Minus_AX_Minus_MuV : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

    -- This signal will only be raised for one clock cycle
    -- when there is a new set of data available for computation
    Signal DFPMCompute : STD_LOGIC := '0';

    -- This signal is used to communicate with other modules "downstream" of this module
    -- when the result of this module's computation is ready
    Signal Sig_ITERATION_COMPLETE : STD_LOGIC := '0';

    -- This signal will be used to represent the index position
    -- that will be progressively incremented as a means of pipelining
    -- data for multiplication in this module as well as input for the
    -- Vector_Vector_Multiplication module
    Signal MultplicationStageArrayPosition : integer := 0;

    -- This signal will be used to signal when the index position
    -- can be shifted and when data can be stored for output
    Signal Shift_Array_Position : STD_LOGIC := '0';

    -- This signal will be raised once when all the products of multiplication are ready.
    -- This is to enable the module to signal to other modules "downstream"
    -- that the result of the computation is ready
    Signal MultiplicationProductsReady : STD_LOGIC := '0';

    Signal ReadyFlag : STD_LOGIC := '0';

    -- This clock signal was created as a slowed down (half pace of CLK)
    -- And will be used for clocking the shifting of the index position
    Signal Sig_Clk_For_Index_Shifting : STD_LOGIC := '0';

begin
    -- For Vector - Vector multiplication
    Vector_Vector_Mult: Signed_Vector_Vector_Mult_5By1 PORT MAP (
        Vector_1   => Sig_Vector_A_With_IndexPosition,
        Vector_2   => Sig_Vector_X,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_A_Mult_X_With_IndexPosition);

    -- For Subtraction operations for B - AX
    Doing_B_Minus_AX : Signed_Vector_Vector_5By1_Subtr PORT MAP (
        Vector_1   => Sig_Vector_B,
        vector_2   => Sig_Vector_A_Mult_X,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_B_Minus_AX);

    -- For Subtraction operations for B - AX - muV
    Doing_B_Minus_AX_Minus_MuV : Signed_Vector_Vector_5By1_Subtr PORT MAP (
        Vector_1   => Sig_Vector_B_Minus_AX,
        vector_2   => Sig_Vector_Mu_Mult_V,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_B_Minus_AX_Minus_MuV);

    -- This signal will be used to signal that the output of this module is ready to be read.
    ITERATION_COMPLETE <= Sig_ITERATION_COMPLETE;

    -- This process determines when each iteration of the DFPM algorithm is to be started.
    -- Computation will only be done if it's a new iteration and it has not been completed before.
    -- Therefore this process sets DFPMCompute to '1' only on the rising edge of NEW_ITERATION
    -- and stores new values into the vectors only at the rising edge of NEW_ITERATION.
    process(CLK, RST, Sig_ITERATION_COMPLETE, NEW_ITERATION)
        Variable NEW_ITERATION_Var : STD_LOGIC := '0';
    begin
        if rising_edge(CLK) then
            if (RST = '1') then
                DFPMCompute <= '0';
                NEW_ITERATION_Var := '0';
            elsif (Sig_ITERATION_COMPLETE = '1') then
                NEW_ITERATION_Var := '0';
                DFPMCompute <= '0';
            -- This more or less senses for the rising edge of NEW_ITERATION
            elsif (NEW_ITERATION = '1') and (NEW_ITERATION_Var = '0') then
                --if rising_edge(NEW_ITERATION) then
                NEW_ITERATION_Var := '1';

                Sig_Vector_A <= Vector_A;
                Sig_Vector_B <= Vector_B;
                Sig_Vector_X <= Vector_X;
                Sig_Vector_V <= Vector_V;
                Sig_Scalar_Mu <= Scalar_Mu;

                DFPMCompute <= '1';
            elsif (NEW_ITERATION = '1') and (NEW_ITERATION_Var = '1') then
                NEW_ITERATION_Var := '0';
                DFPMCompute <= '0';
            elsif (NEW_ITERATION = '0') then
                NEW_ITERATION_Var := '0';
                DFPMCompute <= '0';
            end if;
        end if;
    end process;

    -- This process determines the array positions to be multiplied together for A*X
    process(RST, Sig_ITERATION_COMPLETE, DFPMCompute, Shift_Array_Position, NEW_ITERATION, CLK,
            Sig_Clk_For_Index_Shifting, MultplicationStageArrayPosition, Sig_Vector_A,
            Sig_Vector_A_Mult_X_With_IndexPosition, Sig_Scalar_Mu, Sig_Vector_V)
        Variable MultplicationStageArrayPosition_Var : integer := 0;

    begin
        if (RST = '1') then
            MultplicationStageArrayPosition <= 0;
            Shift_Array_Position <= '0';
            MultiplicationProductsReady <= '0';

        elsif (Sig_ITERATION_COMPLETE = '1') then
            MultplicationStageArrayPosition <= 0;
            Shift_Array_Position <= '0';

        elsif (DFPMCompute = '1') then -- Checking for the rising edge of NEW iteration here
            MultplicationStageArrayPosition <= 0;
            Shift_Array_Position <= '1';
            MultiplicationProductsReady <= '0';

            -- Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(0);
            -- Sig_Vector_A_Mult_X(0) <= Sig_Vector_A_Mult_X_With_IndexPosition;
            -- productTempStore := Sig_Scalar_Mu * Sig_Vector_V(0);
            -- Sig_Vector_Mu_Mult_V(MultplicationStageArrayPosition) <= productTempStore(48 downto 16);

        elsif (Shift_Array_Position = '1') then
            if rising_edge(Sig_Clk_For_Index_Shifting) then
                if (MultplicationStageArrayPosition = 5) then
                    MultplicationStageArrayPosition <= 0;
                    Shift_Array_Position <= '0';
                    MultiplicationProductsReady <= '1';
                else
                    MultplicationStageArrayPosition_Var := MultplicationStageArrayPosition;
                    MultplicationStageArrayPosition <= MultplicationStageArrayPosition_Var + 1;
                end if;
            end if;
        end if;
    end process;

    process(CLK, DFPMCompute, Shift_Array_Position, MultplicationStageArrayPosition)
        Variable productTempStore : Signed(65 downto 0);
    begin
        if rising_edge(CLK) then
            if (Shift_Array_Position = '1') and ( MultplicationStageArrayPosition < 5 ) then
                case MultplicationStageArrayPosition is
                    when 0 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(0);
                        Sig_Vector_A_Mult_X(0) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(0);
                    when 1 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(1);
                        Sig_Vector_A_Mult_X(1) <= Sig_Vector_A_Mult_X_With_IndexPosition;
249 productTempStore := Sig_Scalar_Mu * Sig_Vector_V(1); 250 when 2 => 251 Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(2); 252 Sig_Vector_A_Mult_X(2) <= Sig_Vector_A_Mult_X_With_IndexPosition; 253 productTempStore := Sig_Scalar_Mu * Sig_Vector_V(2); 254 when 3 =>
  • 68. DFPM On FPGA Appendix A: Documentation of developed program code 2015-09-25 61

                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(3);
                        Sig_Vector_A_Mult_X(3) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(3);
                    when 4 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(4);
                        Sig_Vector_A_Mult_X(4) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(4);
                    when Others =>
                        NULL;
                end case;
                -- -- Setting the corresponding Vector_A element as the input to the Vector_Vector_Mult_Module
                -- Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(MultplicationStageArrayPosition);
                -- -- Connecting the output of the Vector_Vector_Mult module to the corresponding A_Mult_X index
                -- Sig_Vector_A_Mult_X(MultplicationStageArrayPosition) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                -- -- Doing mu*V
                -- productTempStore := Sig_Scalar_Mu * Sig_Vector_V(MultplicationStageArrayPosition);
                Sig_Vector_Mu_Mult_V(MultplicationStageArrayPosition) <= productTempStore(48 downto 16);
            end if;
        end if;
    end process;


    -- This process clears ITERATION_COMPLETE and
    -- only sets it to 1 when the MultiplicationProductsReady signal is high.
    -- At the rising_edge of MultiplicationProductsReady, the vectors
    -- B_Minus_AX and B_Minus_Ax_Minus_muV are assigned.
    process(CLK, RST, DFPMCompute, MultiplicationProductsReady, ReadyFlag)
    begin
        if rising_edge(clk) then
            if (RST = '1') then
                Sig_ITERATION_COMPLETE <= '0';
                ReadyFlag <= '0';

            elsif (DFPMCompute = '1') then
                Sig_ITERATION_COMPLETE <= '0';
                ReadyFlag <= '0';
            elsif (MultiplicationProductsReady = '1') and (ReadyFlag = '0') then
                ReadyFlag <= '1';

                Sig_ITERATION_COMPLETE <= '1';
                B_Minus_AX <= Sig_Vector_B_Minus_AX;
                B_Minus_Ax_Minus_muV <= Sig_Vector_B_Minus_AX_Minus_MuV;
            else
                Sig_ITERATION_COMPLETE <= '0';
            -- end if;
            end if;
        end if;
    end process;

    -- The clock signal created in this process is a real afterthought
    -- It would not have been created if this module had behaved itself ;-))
    -- It was observed that the circuit computed an output that was wrong
  • 69. DFPM On FPGA Appendix A: Documentation of developed program code 2015-09-25 62

    -- For as long as the shifting of the index position was based on the normal clock "CLK"
    -- Hence this clock that cuts the speed to half.
    -- [Subtr_Ops_Module.vhd, printed Wed Feb 04 01:26:12 2015, page 7]
    process(CLK)
    begin
        if rising_edge(CLK) then
            Sig_Clk_For_Index_Shifting <= not(Sig_Clk_For_Index_Shifting);
        end if;
    End process;

    end Behavioral;


    Tolerance check

    ----------------------------------------------------------------------------------
    -- Company: Mid Sweden University
    -- Engineer: Taiyelolu Adeboye
    --
    -- Create Date: 10:42:33 01/07/2015
    -- Design Name:
    -- Module Name: Signed_Vector_Vector_Mult_5By1 - Behavioral
    -- Project Name: DFPM on FPGA
    -- Target Devices: Nexys2
    ----------------------------------------------------------------------------------

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.std_logic_signed.all;
    use work.DFPM_ARRAY_5X32_BIT.all;


    -- Uncomment the following library declaration if using
    -- arithmetic functions with Signed or Unsigned values
    use IEEE.NUMERIC_STD.ALL;

    -- Uncomment the following library declaration if using
    -- arithmetic functions with Signed or Unsigned values
    --use IEEE.NUMERIC_STD.ALL;

    -- Uncomment the following library declaration if instantiating
    -- any Xilinx primitives in this code.
    --library UNISIM;
    --use UNISIM.VComponents.all;

    entity Signed_Tolerance_Check is
        Port ( Vector_B_AX : in DFPM_SIGNED_VECTOR_5X32_BIT;
               Tolerance_Limit : in Signed (32 downto 0);
               Iteration_Complete : in STD_LOGIC:= '0';

               CLK : in STD_LOGIC:= '0';
               RST : in STD_LOGIC:= '0';

  • 70. DFPM On FPGA Appendix A: Documentation of developed program code 2015-09-25 63

               Tolerance_Limit_Squared, Vector_B_AX_Sum : out Signed (32 downto 0);

               Iterate : out STD_LOGIC := '1');
    end Signed_Tolerance_Check;

    architecture Behavioral of Signed_Tolerance_Check is

        Signal Sig_Vector_B_AX, Sig_Vector_B_AX_Squared : DFPM_SIGNED_VECTOR_5X32_BIT;
        Signal Sig_Tolerance_Limit, Sig_Tolerance_Limit_Squared : Signed (32 downto 0);

        Signal Sig_Vector_B_AX_Sum : Signed(32 downto 0);

        Signal Sig_Position : integer := 0;

        Signal Sig_ShiftPosition, Sig_Multiplication_Is_Complete, Sig_Check_Tolerance_Limit : STD_LOGIC := '0';

    begin

        Tolerance_Limit_Squared <= Sig_Tolerance_Limit_Squared;
        Vector_B_AX_Sum <= Sig_Vector_B_AX_Sum;

        -- This process determines when data stored internally are to be serially multiplied
        -- They are serially multiplied to save on Multipliers
        process(CLK, RST, Iteration_Complete, Sig_ShiftPosition, Sig_Position)
            Variable Var_Position: integer := 0;
        begin
            if rising_edge(CLK) then
                if (RST = '1') then
                    Sig_Position <= 0;
                    Sig_ShiftPosition <= '0';
                    Sig_Multiplication_Is_Complete <= '0';
                elsif (Iteration_Complete = '1') then
                    Sig_Check_Tolerance_Limit <= '0';
                    Sig_Position <= 0;
                    Sig_ShiftPosition <= '1';
                    Sig_Multiplication_Is_Complete <= '0';
                elsif (Sig_Multiplication_Is_Complete = '1') then
                    Sig_Check_Tolerance_Limit <= '1';
                else
                    if (Sig_ShiftPosition = '1') then
                        if (Sig_Position = 5) then
                            Sig_Position <= 0;
                            Sig_Multiplication_Is_Complete <= '1';
                            Sig_ShiftPosition <= '0';
                        else
                            Var_Position := Sig_Position;
                            Sig_Position <= Var_Position + 1;
                        end if;
                    end if;
                end if;
            end if;
        end process;

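    The multiplications in this listing all follow one pattern: two 33-bit signed operands are multiplied into a 66-bit temporary, and bits 48 downto 16 of the product are kept. A minimal sketch of that pattern as a reusable function is given here, assuming the operands carry 16 fractional bits (so the 66-bit product carries 32, and dropping the 16 lowest bits restores the operand format). The package name Fixed_Point_Sketch, the subtype Fixed_33 and the function Fixed_Mult are hypothetical names introduced for illustration; they are not part of the thesis code.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical helper package, not part of the original design.
-- Assumption: 33-bit signed values with 16 fractional bits.
package Fixed_Point_Sketch is
    subtype Fixed_33 is signed(32 downto 0);
    function Fixed_Mult(A, B : Fixed_33) return Fixed_33;
end package Fixed_Point_Sketch;

package body Fixed_Point_Sketch is
    function Fixed_Mult(A, B : Fixed_33) return Fixed_33 is
        -- numeric_std "*" on two 33-bit signed operands returns 66 bits.
        variable Product : signed(65 downto 0);
        variable Result  : Fixed_33;
    begin
        Product := A * B;
        -- Keep bits 48 downto 16: the 16 lowest (extra fractional) bits
        -- are truncated and the bits above 48 are discarded, so a product
        -- that does not fit in 33 bits wraps around instead of saturating.
        Result := Product(48 downto 16);
        return Result;
    end function Fixed_Mult;
end package body Fixed_Point_Sketch;
```

    Under the same assumption, 1.5 is stored as 98304 and 2.0 as 131072; their product is 12884901888, and discarding the 16 lowest bits gives 196608, which is 3.0 in the same format. The function only makes the repeated slice explicit and does not change the number of hardware multipliers that the serial scheme is meant to save.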
  • 71. DFPM On FPGA Appendix A: Documentation of developed program code 2015-09-25 64

        -- Storing data internally when the signal from the SubtrAndMult Module is high
        process(Iteration_Complete)
            Variable productTempStore : Signed(65 downto 0) := (Others => '0');
        begin
            if rising_edge(Iteration_Complete) then
                Sig_Tolerance_Limit <= Tolerance_Limit;
                Sig_Vector_B_AX <= Vector_B_AX;
            end if;
        end process;

        -- Serial multiplication
        process(CLK, Sig_ShiftPosition, Sig_Position)
            Variable productTempStore : Signed(65 downto 0);
        begin
            if rising_edge(clk) then
                -- NOTE: "<=" in this condition is the relational operator, so the
                -- condition holds for both '0' and '1'; "= '1'" was probably intended.
                if (Sig_ShiftPosition <= '1') then
                    Case Sig_Position is
                        when 0 =>
                            productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                            Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                        when 1 =>
                            productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                            Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                        when 2 =>
                            productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                            Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                        when 3 =>
                            productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                            Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                        when 4 =>
                            productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                            Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                        when 5 =>
                            productTempStore := Sig_Tolerance_Limit * Sig_Tolerance_Limit;
                            Sig_Tolerance_Limit_Squared <= productTempStore(48 downto 16);
                        when others =>
                            NULL;
                    End case;
                end if;
            end if;
        end process;

        process(Sig_Multiplication_Is_Complete)
            variable Var_Vector_B_AX_Sum : Signed (36 downto 0);
        begin
            if rising_edge(Sig_Multiplication_Is_Complete) then
                Var_Vector_B_AX_Sum := ("0000" & Sig_Vector_B_AX_Squared(0) + Sig_Vector_B_AX_Squared(1)
                                        + Sig_Vector_B_AX_Squared(2) + Sig_Vector_B_AX_Squared(3)
                                        + Sig_Vector_B_AX_Squared(4));
