DFPM on FPGA - Bachelor Thesis Report

Independent degree project, first cycle
Electrical Engineering
DFPM on FPGA – A speed optimized implementation of the Dynamic
Functional Particle method on Spartan 3E
Taiyelolu Adeboye

DFPM on FPGA
Taiyelolu Adeboye 2015-09-25
MID SWEDEN UNIVERSITY
Department of Electronics Design (EKS)
Examiner: Benny Thörnberg, Benny.Thornberg@miun.se
Supervisor: Kent Bertilsson, Kent.Bertilsson@miun.se
Author: Taiyelolu O. Adeboye, taad1000@student.miun.se
Degree programme: International Bachelor’s Programme in Electronics, 180 credits
Main field of study: Electronics Engineering
Semester, year: Autumn, 2014

Abstract
This thesis focuses on the design of electronic circuitry that implements
the Dynamic Functional Particle Method (DFPM). The design was done
in VHDL and implemented on a Xilinx Spartan 3E FPGA. The work
included a digital 33-bit ALU, designed to solve differential equations
with the DFPM algorithm, and UART transceiver and controller circuits
for data exchange between the FPGA and the PC. This report explains
the design principles, process, tests and results of the work. It also
compares the performance of the designed system with that of generic
computational devices and examines the possibilities and limitations of
operational concurrency in relation to the size of problem sets.
Keywords: MATLAB, VHDL, FPGA, DFPM, algorithm evaluation, CPU
clock cycles, particle method

Acknowledgements
I would like to express my appreciation to my supervisor, Associate
Professor Kent Bertilsson, for his guidance, mentorship and support in
the course of this project. His contribution was vital to the execution and
completion of this project work. I would also like to express my appreciation
to Associate Professor Sverker Edvardsson for being so approachable
and for his great willingness to explain.
My various tutors and examiners in the course of this Bachelor’s programme
have proven themselves to be exceptional and unforgettable. In
no particular order, Professor Bengt Oelmann, Dr. Börje Norlin, Professor
Kent Bertilsson, Professor Benny Thörnberg, Martin Kjellqvist,
Mikael Hasselmalm, Dr. Najeem Lawal, Mikael Bylund, Amir Yousaf,
Professor Cornelia Schiebold, Dr. Peng Cheng, Mazhar Hussein, Professor
Engmont Porten, Stefan Haller, David Krapohl, Solange Hamrin and
Evelina Caffrey will remain entrenched in my memory.
Without mincing words, Anders Rådberg, Anders Molin, Sara Lodin,
Lars Malmbom, Tove Gullikson and the team at MIUN Innovation will
always remain dear to my heart. Thank you for your time, advice and
your effort!
Finally, I owe a huge debt of gratitude to the following: The divine, for
those moments when I was dry, Temitope Ruth, for being so understanding
and special, Ire Peter, our bundle of joy, for being so sweet,
Kehinde, my wonderful twin, my family (Samuel, Dorcas, Ardex,
Adeyemi and Ope) for being such a pillar of support, and my friends in
Sweden and in Nigeria. Words will not be enough to express how much
I appreciate you!
Thank you for being part of this journey, muchas gracias! Greater things
are still to come!

Table of Contents
Abstract
Acknowledgements
1 Introduction
1.1 Background and problem motivation
1.2 Overall aim
1.3 Scope
1.4 Tools to be used
1.5 Concrete and verifiable goals
1.6 Outline
1.7 Contributions
2 Theory
2.1 Definition of terms and abbreviations
2.1.1 Terms
2.1.2 Abbreviations
2.2 DFPM algorithm
3 Methodology
3.1 Concurrence vs. sequentiality
3.2 Numerical representation
3.3 Modularity
4 Design
4.1 The DFPM algorithm
4.2 Project Top Module
4.2.1 The two top sub-modules
4.2.2 Data type conversion
4.3 Project defined Packages
4.4 Communication Top Module
4.4.1 UART
4.5 Iteration Control Top Module
4.6 Implementation Constraint
4.7 Parameters
4.8 Data exchange format
4.9 Signed numerical representation
4.10 Integer and fractional representation
4.11 Spartan 3E-1200 FG320 FPGA
4.12 Nexys2 FPGA demonstration board
4.13 Xilinx ISE
4.14 ISim Simulation software
4.15 Design verification
4.16 The complete design
5 Results
5.1 Simulation results
5.1.1 Element-wise vector multiplication
5.1.2 Element-wise vector subtraction
5.1.3 Evaluating new vector V
5.1.4 Evaluating new vector X
5.1.5 Convergence check
5.1.6 DFPM top module
5.2 Comparison
6 Discussion
6.1 FPGA resource utilization
6.2 Reduction in computation time
6.3 Larger problem sets
6.4 UART bottleneck
6.5 Precision
6.6 Communication input/output limitations
6.7 Cross platform comparison
6.8 Output comparison
6.9 Communication possibilities
6.10 Applications
6.11 Implications
7 Conclusions
7.1 Benchmark
7.2 Further work
References
Appendix A: Documentation of own developed program code
Design codes
New V operations
New X operations
One Iteration
DFPM top module
UART Core
UART Interface
Project Top module
Test code written in C++
Appendix B: Explanation of some basic mathematical concepts
Two’s complement
Euclidean norm
Appendix C: Project report summary
Appendix D: MATLAB codes
Code for problem specification and comparison
Appendix E: Table of standard ASCII symbols and their numerical representation

1 Introduction
DFPM on FPGA is a project that implements the algorithm of the Dynamic
Functional Particle Method in silicon. The implementation was done on a
Xilinx Spartan 3E FPGA, and it was designed for speed (in terms of the
number of clock cycles required for the computation).
The Dynamic Functional Particle Method (DFPM) is a numerical particle
method that was developed at Mid Sweden University. While the method is
iterative, it consists of steps, some of which can be executed in parallel.
An FPGA was therefore expected to offer advantages due to its parallel
processing capabilities.
The FPGA implementation takes matrix elements as input parameters through
the UART and returns the solution vector corresponding to the parameters
received.
Figure 1.1: A simplified illustration of the project

1.1 Background and problem motivation
Systems of linear equations can be used to describe many observable natural
phenomena and find application in many areas, including physics, mechanics
and sensor fusion.
One of the approaches to solving systems of linear equations applies matrix
algebra. This approach treats the system as matrices or vectors whose
elements represent the parameters of the system in question.
This approach often results in the classical A*X = B problem, where A, X and B
are matrices/vectors. A contains the coefficients of the system, X contains
the unknown variables to be solved for, and B contains the constant terms on
the right-hand side of the equations.
For instance, if a system is defined as shown below,
3x – 2y + 4z = 10
5x + 1y – 2z = -2
10x – 5y + 3z = 4
then it can be represented in A*X = B form as shown below.
|  3  -2   4 |   | x |   | 10 |
|  5   1  -2 | * | y | = | -2 |
| 10  -5   3 |   | z |   |  4 |
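As an illustrative sketch only (it is not part of the thesis code), the example system can be held in Python as a coefficient matrix A and a right-hand-side vector B. The sketch assumes that the second and third equations are meant to begin with 5x and 10x respectively; the helper names `mat_vec` and `residual` are hypothetical.

```python
# Coefficient matrix A and right-hand side B of the example system.
# Assumption: rows 2 and 3 start with an x term (5x and 10x).
A = [
    [3, -2, 4],
    [5, 1, -2],
    [10, -5, 3],
]
B = [10, -2, 4]


def mat_vec(matrix, vector):
    """Return the matrix-vector product A*X as a list."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def residual(matrix, vector, rhs):
    """Return B - A*X; a vector of zeros means X solves the system."""
    return [b - ax for b, ax in zip(rhs, mat_vec(matrix, vector))]
```

A candidate solution X can then be checked by confirming that `residual(A, X, B)` is close to zero in every element.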
As the number of variables in these systems increases, the size of the matrices
increases proportionately, but the number of iterations required for solving the
problem using an iterative numerical method increases geometrically, thus
consuming significant CPU time.
This project aims to address this problem through the design of an Arithmetic
and Logic Unit (ALU) that implements the DFPM algorithm in a system that
combines sequential and parallel execution as a means of reducing the number
of CPU clock cycles required per iteration and, consequently, the computation
time for the complete algorithm.

1.2 Overall aim
The overall aim of the project is the design of an ALU that implements the
Dynamic Functional Particle Method on an FPGA. The system will be capable of
receiving input in the form of parameters that represent the variables of the
system to be analysed and will give its output in the form of a matrix whose
elements represent the solution to the problem.
The designed system will be capable of communicating with a computer
through the USB port and the data is to be collected and displayed on the
computer screen using suitable software.
The output from the designed system should be correct and consistent in
comparison with values obtainable from a similar computation executed in
MATLAB or similar software on a PC.
Figure 1.2: An overview of the project concept

1.3 Scope
The designed system is expected to solve systems of linear equations
expressed in the form A*X = B, where A is a 5x5 square matrix while
X and B are 5x1 vectors. A and B will be given as input to the
designed system, and the system gives an output that represents X, the
solution vector of the system.
The input to the designed system should be in the form of positive 8-bit
integers, while the output is expected to consist of whole numbers as well
as fractions, which can be represented to a maximum precision of 8 binary bits.
Although limits have been imposed on the kind of input parameter expected,
with the aim of easing the communication between the designed FPGA system
and PC software, the ALU should be able to execute the DFPM algorithm on
input data beyond these constraints.
1.4 Tools to be used
The following tools are expected to be used to carry out this project:
1. Xilinx Spartan 3E FPGA on Nexys2 demonstration board.
2. Xilinx ISE design suite.
3. Desktop terminal application software running on a PC.
4. MATLAB software running on a PC.
1.5 Concrete and verifiable goals
The goals of the project are as follows:
1. Design of a processor/ALU in VHDL. The unit should implement the
DFPM algorithm.
2. Implementation of parallel processing in the design of the DFPM
computational module, to the extent that is optimal for the problem size.
3. Design of UART communication modules, in VHDL, for the transfer of
data from the PC/UART port to the DFPM computation module specified
in the item above.
4. Verification of the output from the FPGA. It should be consistently
equivalent to the output of the same algorithm run on a PC.
5. Investigation and suggestion of possible solutions and approaches to
scaling up the design for significantly larger problem sets.
1.6 Outline
Chapter 2 of this report explains, in brief, the theories behind the design and
some related work pertinent to DFPM and the FPGA implementation, while
Chapter 3 examines the design methodology and the principles behind design
choices and approaches. Chapter 4 describes the design. Chapter 5 outlines some
of the tests carried out to verify the functionality of the modules designed and
compares the results with those obtainable from other systems. In Chapter 6,
the results are discussed and the possibilities and limitations examined.
Chapter 7 concludes the report.
1.7 Contributions
This design was wholly done by the author of this report with support and
guidance from the supervisor (Associate Prof. Kent Bertilsson). The design was
based on the Dynamic Functional Particle Method algorithm, which was developed
by Prof. Sverker Edvardsson et al [1].
Prof. Sverker Edvardsson supplied the author with information about DFPM
and a sample application of the algorithm implemented in MATLAB. A UART
core designed for the Nexys2 and made available by Digilent Inc. was
adapted in designing the data exchange modules interfacing between the
FPGA and the PC.

2 Theory
Systems of linear and differential equations are a well-established concept in
mathematics and find application in solving theoretical numerical problems
as well as real-world challenges in various fields such as mechanics, biology,
electronics and economics. Thus a lot of work has been done to develop
approaches to solving these problems.
The dynamic functional particle method (DFPM) is an approach, recently
developed by Sverker Edvardsson et al [1] [2], which can be used to solve
systems of linear and differential equations. The algorithm is simple, widely
applicable and efficient, with significant comparative advantages in relation
to some of the other established approaches [2].
DFPM implements a novel second order dynamical particle method which,
though new, is related to some first order approaches in previous work done
by Sincovec and Madsen [3], Pata and Squassina [4], and F. Alvarez [5].
There are a number of computational libraries and algorithms implementing
various approaches to solving systems of linear and differential equations.
Some of these include ARPACK and LAPACK, the Colt library (Java) and
IML++ (C++), among others.
Since this report is not a mathematical treatise, the main focus is on the
design and implementation of electronic hardware that is able to compute and
present solutions to problems presented as a system of differential equations
received as input.
The design and implementation done in this project, while novel, is also
related to previous work by Bruce Land entitled “Hybrid Computing on an
FPGA” [6], in which a Digital Differential Analyzer (DDA) was designed and
implemented on an Altera Cyclone II 2C35 FPGA on an Altera DE2 FPGA
demonstration board. The design made use of an 18-bit numerical
representation, of which 16 bits were set apart for the fractional part.
Parallel computations were also used in order to reduce CPU computation time.
Apart from Bruce Land’s design above, there is little or no known information
about the implementation of numerical or particle methods on FPGAs, and this
work could lead to novel concepts and applications.

2.1 Definition of terms and abbreviations
2.1.1 Terms
Below are basic definitions and/or explanation of some important concepts
used in this report.
1. Linear equations
A linear equation is an algebraic equation in which each term is either a
constant or the product of a constant and a single variable raised to the
first power.
2. Systems of linear equations
These are a set of simultaneous linear equations which are defined as a single
problem and meant to be treated as such. These are often encountered in real
life situations and observable physical phenomena.
3. Differential equations
These kinds of equations define relationships connecting certain functions or
physical properties with their differentials (i.e. derivatives) hence the name.
4. Systems of differential equations
These are simultaneous statements of differential equations defining a specific
problem as a function of relationships between one or more independent
variables and their derivatives (dependent variables).
5. Numerical methods
These are approaches to solving mathematical problems with the use of various
methods of numerical approximation. Numerical methods can be direct or
iterative.
Direct numerical methods include algorithms that have a predefined number
of steps for arriving at solutions. An example is the Gaussian elimination
method. Iterative methods, however, require an undetermined number of
iterations of computational steps, which can vary with each problem
definition. Examples of iterative numerical methods are the Jacobi method and
Newton’s method (also known as the Newton-Raphson method).
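To illustrate why the iteration count of an iterative method is not known in advance, the following minimal sketch applies Newton's method to the problem of finding a square root. The problem choice, the function name `newton_sqrt`, the starting guess and the stopping tolerance are assumptions made for illustration and do not come from the thesis.

```python
def newton_sqrt(value, tolerance=1e-12, max_iterations=100):
    """Approximate sqrt(value) with Newton's method on f(x) = x*x - value.

    Returns (estimate, iterations_used).
    """
    x = value if value > 1 else 1.0  # simple starting guess
    for iteration in range(1, max_iterations + 1):
        x_next = 0.5 * (x + value / x)  # Newton update for this f
        if abs(x_next - x) < tolerance:
            return x_next, iteration
        x = x_next
    return x, max_iterations
```

Calling the function with different inputs, for example 2.0 and 1e6, returns different iteration counts, since the number of steps depends on how far the starting guess is from the solution.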

6. Particle methods
Particle methods are algorithms used primarily for the simulation of
interacting particles of physical systems and their motion in nature. These
algorithms are sometimes applied to the numerical treatment of theoretical
mathematical models. The dynamic functional particle method falls under this
category.
7. Convergence
Convergence is the property of an iterative method whereby its successive
iterates consistently approach, or “converge” to, a specific value. The value
to which the method converges is taken to be the solution of the problem
being solved with the iterative method.
8. The Dynamic Functional Particle method
This is an iterative particle method applied to general mathematical problems
by which mathematical problem models can be translated to particle models
and solved, as developed by Sverker Edvardsson et al [2].
The method is robust and widely applicable to problems of systems of linear
and differential equations, especially those defining nature and observable
physical phenomena.
9. Sequential processes
Sequential processes are processes consisting of operations which are carried
out one after the other. In these kinds of processes no two operations take
place simultaneously. All operations follow a definite sequence. Examples are
operations that take place in a single core CPU (Central Processing Unit).
10. Concurrent processes
Concurrent processes are processes consisting of more than one operation
being carried out in parallel. These kinds of processes can occur in multi-core
CPUs, FPGAs and other kinds of devices with parallel processing capabilities.
11. CPU time
This refers to the time spent by a processing unit while carrying out a certain
computational operation or set of operations. It is expressed in seconds.

12. Clock
This is a component in digital electronic systems by which the timing of
operations and processes is controlled. It oscillates between a high
and a low signal level.
13. Clock cycle
This is a single complete up and down oscillation of a clock.
14. Clock frequency
This refers to the number of cycles a clock completes in a second. It is
expressed in Hertz.
15. Field Programmable Gate Array (FPGA)
These are integrated circuits that are factory manufactured to be configurable
by engineers and designers as the use case or application demands. They are
normally programmed in a hardware description language (HDL).
16. Universal Asynchronous Receiver Transmitter
This is a standard hardware component that facilitates serial data exchange
between two electronic devices. A UART port must be connected to another UART
port in order for them to exchange data.
Data exchange between UART hardware is 1-bit serial and takes place between
cross-connected receiver and transmitter pins, while the data received is
converted to parallel 8-bit format and exchanged between the UART hardware
and the device controlling it.

Figure 2.1 Simplified illustration of the UART communication process
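The serial-to-parallel conversion described above can be sketched as follows. The sketch assumes the common frame layout of one start bit (low), eight data bits sent least significant bit first, no parity bit and one stop bit (high); the text does not state the frame configuration used in the project, and the function names are hypothetical.

```python
def uart_frame(byte):
    """Serialize one byte into a list of line levels (start, data, stop)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]  # start bit low, stop bit high


def uart_unframe(bits):
    """Recover the parallel 8-bit value from a 10-bit frame."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))
```

Under these assumptions, the transmitter shifts out the ten levels returned by `uart_frame` one bit period at a time, and the receiver rebuilds the byte with `uart_unframe`.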
17. MATLAB
MATLAB is an interactive software platform and high-level programming
language which is often used in scientific and engineering computing due to its
simplicity, robustness and easy to use interactive environment and functions.
In this project, it was used for the initial execution of the DFPM algorithm and
comparison.
18. Terminal software application
This is a software application that enables its user to get access to one or more
input/output ports (e.g. USB) of a PC and which displays the data stream. In
this project, Br@y++ terminal was used to access a USB port and communicate
with the FPGA running the DFPM algorithm.
19. Two’s complement
Two’s complement is a method of representing signed numbers in which the
most significant bit carries a negative weight and therefore indicates the
sign, while the remaining bits carry their usual positive weights.
When the most significant bit of a number represented in two’s complement is
“1”, the number is negative; when it is “0”, the number is zero or positive.
This is a standard way of representing numbers that is frequently applied in
computing and electronics.
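A minimal sketch of two's complement encoding and decoding for an arbitrary word width is given below. The function names are hypothetical; the width is a parameter, so the same code applies to 8-bit values as well as to the 33-bit words used by the ALU.

```python
def to_twos_complement(value, width):
    """Encode a signed integer as an unsigned bit pattern of `width` bits."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise OverflowError("value does not fit in the given width")
    return value & ((1 << width) - 1)


def from_twos_complement(pattern, width):
    """Decode a `width`-bit pattern; the MSB carries weight -2**(width-1)."""
    sign_bit = 1 << (width - 1)
    return (pattern & (sign_bit - 1)) - (pattern & sign_bit)
```

For example, `to_twos_complement(-3, 8)` gives the pattern `0b11111101`, whose most significant bit is 1, and `from_twos_complement(0b11111101, 8)` recovers -3.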
2.1.2 Abbreviations
The following abbreviations are used in this report:
ALU: Arithmetic and Logic Unit.
ASCII: American Standard Code for Information Interchange. This is the
standard used for the data exchanged between the PC and the FPGA.
ASIC: Application Specific Integrated Circuit. These are integrated circuits
that are designed or configured for a specific use case or application.
ARPACK: ARnoldi PACKage. A software library, coded in FORTRAN,
which can be used to solve eigenvalue problems.
BGA: Ball Grid Array.
CLB: Configurable Logic Blocks. These are logic elements on FPGAs used to
implement circuits.
CPLD: Complex Programmable Logic Device.
CPU: Central Processing Unit.
DE: Differential Equations.
DFPM: Dynamic Functional Particle Method.
FPGA: Field Programmable Gate Array.
FPU: Floating-Point Unit.
HDL: Hardware Description Language. These are languages with which one can
describe and design hardware, typically in an ISE or IDE.
IDE: Integrated Development Environment.
IOB: Input Output Block. These are ports for input and output to and from the
FPGA.
ISE: Integrated Synthesis Environment. This is software for synthesizing
designs done in HDL. Xilinx ISE is an example.
LAPACK: Linear Algebra PACKage. This is a library written in FORTRAN
which can be used to solve problems in linear algebra.
LDE: Linear Differential Equations.
LSB: Least Significant Bit.
LUT: Look Up Table.
MATLAB: This is a software platform and high-level language used for
programming and simulations.
MCU: Microcontroller.
MSB: Most Significant Bit.
N/A: Not Applicable.
RAM: Random Access Memory.
RX: Receive. This is a pin through which data is to be received on a transceiver
port.
TX: Transmit. This is a pin through which data is to be transmitted on a
transceiver port.
UART: Universal Asynchronous Receiver Transmitter.
USB: Universal Serial Bus.
VGA: Video Graphics Array. This is a standard for image display.
VHDL: VHSIC Hardware Description Language. In this project, VHDL was
used for digital hardware design.
VHSIC: Very High Speed Integrated Circuit.
2.2 DFPM algorithm
The dynamic functional particle method (DFPM) is widely applicable to solving
a number of different problems when defined as a system of linear or
differential equations. However, the focus of this project work is on the
application of DFPM to solve the classical A*X = B system of linear equations
as described in Chapter 1 of this report.

The algorithm is simply a two-step computation which is iterated until
convergence (or a specified level of convergence) is reached. Checking for
convergence is done by evaluating the Euclidean norm of the difference between
vector B and the product of matrix A and vector X, and comparing it
with a predetermined scalar value representing the acceptable tolerance of the
computation.
The algorithm requires a number of inputs: three vectors of size n,
representing vector B in the problem statement and vectors X and V which are
used in the algorithm, and an n x n matrix equivalent to the A matrix in the
problem statement. Three scalar inputs, Dt, mu and tolerance, are also
expected by the algorithm; they represent the discretization step, the damping
factor and the tolerance respectively.
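The iteration described above can be sketched in Python as follows. The exact update equations appear only in the flowchart and are not reproduced in the text, so the two-step update used here (a damped velocity update followed by a position update) is an assumption, as are the function name `dfpm_solve`, the zero initial vectors and the iteration cap. The sketch is only expected to converge for suitable A, dt and mu; the example in the usage note uses a small symmetric positive definite matrix.

```python
import math


def dfpm_solve(A, B, dt, mu, tolerance, max_iterations=10000):
    """Iterate damped particle dynamics until ||B - A*X|| < tolerance.

    Assumed update (not taken from the thesis):
        V <- V + dt * (B - A*X - mu*V)
        X <- X + dt * V
    Returns (X, iterations_used).
    """
    n = len(B)
    X = [0.0] * n  # initial positions
    V = [0.0] * n  # initial velocities
    for iteration in range(max_iterations):
        residual = [B[i] - sum(A[i][j] * X[j] for j in range(n))
                    for i in range(n)]
        # Convergence check: Euclidean norm of the residual B - A*X.
        if math.sqrt(sum(r * r for r in residual)) < tolerance:
            return X, iteration
        V = [V[i] + dt * (residual[i] - mu * V[i]) for i in range(n)]
        X = [X[i] + dt * V[i] for i in range(n)]
    raise RuntimeError("no convergence within max_iterations")
```

As a usage example, `dfpm_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], dt=0.1, mu=3.0, tolerance=1e-9)` approaches the exact solution X = (1/11, 7/11). The residual and the element-wise updates inside each iteration are independent per element, which is what makes the steps candidates for parallel execution in hardware.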

Figure 2.2 A flowchart of the DFPM algorithm
Sample MATLAB code implementing the algorithm in Figure 2.2 above is
included in this report.

3 Methodology
As stated in the introductory part of this report, one of the purposes of this
project work is the reduction of CPU time. Hence, significant attention was
paid to the computational processes implemented in this design, as well as
their impact on speed and resource use on the FPGA. This chapter describes the
methodologies and considerations that influenced the design and implementation
described in the following chapter.
The preference for an FPGA over traditional CPUs and other types of processing
units is a consequence of the advantages offered by the operational concurrency
that is characteristic of FPGAs and CPLDs.
After having chosen a design concept, the next biggest challenge was the
design itself. The design in this project work was done in VHDL (VHSIC
Hardware Description Language). While there are other languages and approaches
to similar hardware design, VHDL was chosen because of the ease
with which it can be used to manage large projects, as well as the author’s
familiarity with it.
3.1 Concurrence vs. sequentiality
A limitation that was encountered early in the course of the design was the
limited number of dedicated multipliers on the FPGA, which limits the
number of multiplicative operations that can be executed concurrently.
An important focus of this work is speed optimization, for which concurrency
is key. However, a balance needed to be struck between
concurrency and sequentiality. Hence some operations were run in parallel
while others were sequential. Addition and subtraction operations were mostly
concurrent, while some multiplicative operations were sequential and others
parallel.
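The trade-off can be made concrete with an idealised sketch that counts the clock cycles needed for a batch of independent multiplications when only a limited number of multipliers is available. It assumes that every multiplier completes one product per clock cycle and that all products are independent; the function name is hypothetical and the model ignores routing, pipelining and word-width effects.

```python
import math


def multiply_cycles(num_products, num_multipliers):
    """Clock cycles needed when each multiplier does one product per cycle."""
    if num_multipliers < 1:
        raise ValueError("at least one multiplier is required")
    return math.ceil(num_products / num_multipliers)
```

For the 25 products of a 5x5 matrix-vector multiplication, this model gives 1 cycle with 25 multipliers, 5 cycles with 5 multipliers and 25 cycles with a single multiplier, which is why a mix of parallel and sequential multiplication is a natural compromise.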
3.2 Numerical representation
The dynamic functional particle method involves an iterative process with a
number of multiplications, subtractions and additions at each stage. The
algorithm was implemented in MATLAB and run, while the result of the
computations at each stage of the iteration was output to the console and
examined.
This cursory examination clearly indicated that the values obtained
from the computations covered a range that stretched across the positive and
negative parts of the number line. This implied that a scheme was needed for a
distinct representation of negative and positive values. The values contained
integers as well as fractions, so a representation of fractions was also
needed.
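One common scheme that meets both requirements is signed fixed-point representation, sketched below. The sketch assumes 8 fractional bits (matching the output precision stated in the scope) inside a 33-bit signed word (matching the ALU width); how the thesis design actually partitions the 33 bits is not stated in this part of the report, and the constant and function names are hypothetical.

```python
FRACTION_BITS = 8   # assumed number of fractional bits
WORD_BITS = 33      # assumed total word width, including the sign bit


def to_fixed(value):
    """Quantize a real number to a signed fixed-point raw integer."""
    raw = round(value * (1 << FRACTION_BITS))
    limit = 1 << (WORD_BITS - 1)
    if not -limit <= raw < limit:
        raise OverflowError("value out of range for the word size")
    return raw


def from_fixed(raw):
    """Convert a raw fixed-point integer back to a real number."""
    return raw / (1 << FRACTION_BITS)


def fixed_mul(a, b):
    """Multiply two raw words and rescale the product to 8 fraction bits."""
    return (a * b) >> FRACTION_BITS
```

With these assumptions, 1.5 is stored as the raw integer 384, and multiplying the raw words for 1.5 and -2.0 with `fixed_mul` gives the raw word for -3.0. Additions and subtractions operate directly on the raw integers, while every multiplication needs the extra rescaling shift.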
3.3 Modularity
In order to simplify the design, the whole project was split into two major
top modules. One of these two top modules implemented the DFPM algorithm
and the necessary iterative computations, while the other was designed to
implement UART communication and data exchange between the UART
hardware on the FPGA board and the port on the PC with which it
communicates. This second module was also responsible for converting the
8-bit parallel data into 33-bit numbers in the format expected by the
DFPM algorithm module.
Each of these top modules was subdivided into smaller modules which carried
out specific functions and communicated with other modules through signals
and inter-module data exchange.
The details of the design are discussed in Chapter 4.

4 Design
The digital hardware designed in VHDL consisted of combinatorial and
synchronous circuits which were coded as IO ports, modules, processes and
signals. The combinatorial circuit elements responded to input changes
without waiting for a clock, while synchronous circuit activities took place
at the edge of the clock.
The complete design was made up of several modules exchanging information
through the input and output signals on their ports. Since the design is
reasonably complex and large, an attempt was made to give each module a
name that helps to identify its purpose and function.
The core of the design consisted of the modules which executed the DFPM
algorithm. An overview of these core modules and their interaction is
presented in Figure 4.1.
4.1 The DFPM algorithm
The dynamic functional particle method is widely applicable to many problem
models as stated in Chapter 2 of this report. However, in order to design a
circuit that specifically solves the A*X = B problem, one needs to understand
the step by step procedure of applying DFPM to the problem. Various imple-
mentations of DFPM in MATLAB, C++ and VHDL as applied in this thesis are
included in the appendix.
The procedure begins with access to the input data, namely the matrix and
vectors whose elements make up the coefficients of the system of equations.
The next step is the iterative computation, after which comes the output.
Throughout the process, the values of vector B, matrix A, Dt and the damping
factor (mu) remain fixed, while the values of vectors X and V may be modified
at the end of each iteration.
Each stage of the iterative computation comprises two steps, namely the
approximation calculation and the convergence check. The approximation
calculation takes the form of matrix multiplication, subtraction and addition
operations, while the convergence check requires a comparison of a
predetermined tolerance value with the Euclidean norm of the vector V.
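The iteration described above can be illustrated with a short software sketch. This is not the VHDL or MATLAB code from the appendix. It assumes a symplectic Euler update (V is updated from B - A*X - mu*V, then X is updated from the new V), and it checks convergence on the squared norm of the residual B - A*X, as the tolerance check module in section 4.5 does. The function name dfpm_solve, the 2 by 2 test system and the values of dt, mu and tol are illustrative choices and are not the defaults used in the design.

```python
def dfpm_solve(A, b, dt=0.5, mu=1.0, tol=1e-6, max_iter=10_000):
    """Solve A x = b with a damped second-order (DFPM-style) iteration.

    Assumption: symplectic Euler time stepping; parameter values are
    illustrative, not the defaults listed for the FPGA design.
    """
    n = len(b)
    x = [1.0] * n  # initial X (all ones)
    v = [1.0] * n  # initial V (all ones)
    for iteration in range(1, max_iter + 1):
        # Residual r = b - A x
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Compare squared norm with squared tolerance (no square root needed)
        if sum(ri * ri for ri in r) < tol * tol:
            return x, iteration
        # v <- v + dt * (b - A x - mu v), then x <- x + dt * v
        v = [v[i] + dt * (r[i] - mu * v[i]) for i in range(n)]
        x = [x[i] + dt * v[i] for i in range(n)]
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical symmetric positive definite test system; exact solution [0.8, 1.4]
A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
x, iterations = dfpm_solve(A, b)
print(x, iterations)
```

Whether the iteration converges depends on the choice of dt and mu relative to the eigenvalues of A, so the values above should be treated as specific to this small example.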

Figure 4.1. An overview of the core modules of the DFPM algorithm

4.2 Project top module
The topmost level container for the project HDL code was named
DFPM_ON_FPGA_TOP_MODULE. This module functioned as the overall top
module, containing all VHDL code relevant to the project design. It consisted
of two top modules which served two distinctly important functions. The
modules were named “UART_INTERFACE” and
“Signed_DFPM_Iteration_Control_Top_Module”. The complete VHDL code
for all the modules will be included as an appendix to this report.
4.2.1 The two top sub-modules
The communication top module was designed to handle communication with
the PC through the UART port and the UART VHDL code that controlled it.
Data received from the PC, which arrived as 8-bit values, were converted
to 33 bits in the format stated in section 3.2.2 of this report. The data were
also accumulated in arrays internal to this module until all data relevant to
the specific problem model had been received. The data would then be sent as
output through the ports of this module.
The Signed DFPM Iteration control module receives a stream of 33-bit data in a
format specified in its design, which mathematically describes the problem
being solved. The data received would then be subjected to the DFPM algo-
rithm, after which a solution would be obtained and sent out as an output
through the ports of this module.
At the conclusion of the Signed DFPM Iteration Control module’s computa-
tion, the output signal would be returned to the Communication top module
which reconverts the solution by first translating the result into human reada-
ble decimal equivalent before serially shifting the values out in 8 bits through
the UART interface.
4.2.2 Data type conversion
The communication top module handles data as standard logic vectors and
standard logic signals, while the Signed DFPM Iteration Control module
handles all vector data as signed bit vectors.
This necessitated the conversion of the data signal types from standard logic
vectors to signed bit vectors and vice versa. This was done with the aid of
the predefined type conversion functions available in VHDL. The conversion
takes place in the project top module.

4.3 Project defined packages
The input data for each problem consisted of scalar data and many vectors and
some multi-dimensional matrices. Hence a specific format was designed for
easy recognition and handling of these vectors and matrices. Due to the fact
that these design-specific format vector data types were often handled and
shared between multiple modules in the project, it was considered advanta-
geous to create special packages to define these unique format vectors.
The specific formats designed are described below:
1. DFPM_VECTOR_5X32_BIT: A data type defining an array of 5 standard
logic vectors. Representative of a 5 by 1 vector of standard logic type
data.
2. DFPM_VECTOR_25X32_BIT: A data type defining an array of 5
DFPM_VECTOR_5X32_BIT. I.e. a multidimensional array equivalent to
a 5 by 5 matrix of standard logic vector type data.
3. DFPM_ARRAY_5X32_BIT: A data type defining an array of 5 signed bit
vectors. It was used to represent 5 by 1 vectors containing signed data.
4. DFPM_ARRAY_25x32_BIT: A data type defining an array of 5
DFPM_ARRAY_5X32_BIT. This is equivalent to a 5 by 5 multidimen-
sional array of signed data.
These packages were used to ease the process of design and implementation
and also facilitated a unified standard between modules.
4.4 Communication top module
The communication top module comprised 8 sub-modules. The modules
and their functionalities are briefly described below.
4.4.1 UART
These are the modules controlling the UART circuitry
1. RS232RefComp: This module was released by Digilent Inc. as a sample
code for an implementation of a UART core for the Nexys2 board. It is
the only purely non-original code used in this project.
It is a simple implementation of UART designed in VHDL and it is re-
sponsible for 1 bit serial data transmission and reception, as well as the
conversion of 1-bit serial to 8-bit parallel data and transmission to the
on-board electronic hardware.
2. UART_INTERFACE: This module was used to control the RS232RefComp
circuit. It determines when the UART core should transmit data, receive
data or neither.
This module is a simple four-state state machine. The states correspond
to:
a. Receive state: When the UART core is switched to receive data.
b. Waiting state: When both the UART interface and the UART core
do nothing but wait for data from the DFPM module.
c. Send state: When the UART module is switched to send 8-bit data.
d. RepeatSend state: This is a transitional state which the module enters
after sending each 8-bit datum and before sending the next. This helps to
ensure that the data transmission between the UART INTERFACE
and the UART core is hitch-free.
The control of the UART core from the UART INTERFACE, and the feedback
from the UART core, were facilitated with the aid of four signals, namely
wrSig, rdSig, TBESig and RDASig. These signals and their effect on the UART
core are outlined in Table 4.1 below.

Table 4.1 Table of control signals and their effect on the state of the
UART core
Signal   Value   UART module status
                 Transmit   Receive
wrSig    0       Off        N/A
wrSig    1       On         N/A
rdSig    0       N/A        On
rdSig    1       N/A        Off
Feedback from the UART core was received through the TBE and RDA signals,
which, when raised high, indicated that data had been transmitted or that new
data had been received, respectively.
4.5 Iteration control top module
This module is made up of the circuitry that implements the DFPM algorithm.
The sub-modules were designed to carry out the various computations and
logical evaluation required in the DFPM method.
1. Signed_Vector_Vector_Mult_5By1: This module computes the element-wise
product of two 5 by 1 vectors of 33-bit data. Its operation is
concurrent and all computation results are immediately available at the
output when the input values change.
2. Signed_Vector_Vector_5By1_Subtr: This module computes the element-wise
difference between the elements that make up two vectors.
It concurrently performs subtraction operations on two vectors containing
five elements of 33-bit data type and immediately assigns the result
to the output.
3. Signed_SubtrAndMult_Ops_Module: This module instantiates the
vector multiplication and the vector subtraction modules above and us-
es them in the computation “B – A*X – mu*V” for each iteration stage of
the DFPM algorithm.
In this module, computation of the product of matrix A and vector X
was a combination of concurrent and sequential operations. The product
of one row of matrix A and the vector X was computed concurrently, but
since matrix A comprised 5 rows, the row products were pipelined in order
of row sequence.
4. Signed_New_V_Ops: This module computed a new value for the vec-
tor V at each iteration stage of the DFPM algorithm. The value was
based on the result of the operations carried out in the subtraction and
multiplication operations module, described in number 3 above.
5. Signed_New_X_Ops: This module computed a new value for the vector
X in each iteration stage of the DFPM algorithm. The new value for vec-
tor X is always dependent on the new value of vector V above.
6. Signed_Tolerance_Check: This module receives the value of B-A*X as
input and is required to compare the Euclidean norm of the vector received
with the pre-fixed tolerance value. However, computing square
roots on an FPGA can be problematic and can introduce significant errors.
Hence, the square of the tolerance value was compared with the square
of the Euclidean norm, which is equivalent to the sum of the squares of
the elements that make up the vector input.
After comparison, if the square of the norm was found to be less than
the square of the tolerance level, a signal line would be raised and
the algorithm would terminate. The required squares were computed by
self-multiplication with the aid of the Vector_Vector_Mult
module described above.
When the condition checked by this module is found to be true, convergence
is said to have been reached.
7. Signed_DFPM_One_Iteration: This module instantiated the subtraction
and multiplication module, new v operation module, new x operation
module and the tolerance check module. It connected their inputs and
outputs appropriately and comprises all the operations that make up one
iteration stage of the DFPM algorithm.
8. Signed_DFPM_Iteration_Control: This module instantiated the
Signed_DFPM_One_Iteration module. It feeds the new V and X vectors
back into the computational module and stops the iterations when con-
vergence is attained.
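The square-root-free comparison used by the tolerance check module (item 6 above) can be sketched in software as follows. The function name has_converged and the sample residual vectors are hypothetical. The tolerance of 2^-7 is the default value listed in Table 4.2.

```python
def has_converged(residual, tolerance):
    """Return True when ||residual|| < tolerance, without taking a square root.

    Because both sides are non-negative, comparing the squares is
    equivalent to comparing the norm with the tolerance itself.
    """
    squared_norm = sum(r * r for r in residual)  # sum of squared elements
    return squared_norm < tolerance * tolerance


tolerance = 2 ** -7  # default tolerance from Table 4.2

# Hypothetical residual vectors B - A*X at two different iteration stages
far_from_solution = [0.01, 0.01, 0.01, 0.01, 0.01]
near_solution = [0.001, 0.001, 0.001, 0.001, 0.001]

print(has_converged(far_from_solution, tolerance))
print(has_converged(near_solution, tolerance))
```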

4.6 Implementation constraint
In order to translate, map and route the design done in VHDL to device specif-
ic circuit, an implementation constraints file named UCF_DFPM_TOP was
used. The file links input and output pins specified in the project top module
with the intended pin on the FPGA chip and demonstration board.
4.7 Parameters
The design was intended to make room for some level of easy configurability.
Thus, the initial values of vectors v and x, and the scalar discretization coeffi-
cient (dt), the tolerance and the damping factor (mu) can be changed inside the
DFPM modules. The UART module parameters can also be easily modified.
The default values for these parameters are listed below:
Table 4.2 Table of parameters and corresponding values used
S/N   Parameter                                Value used
1.    Vector V                                 [1 1 1 1 1]
2.    Vector X                                 [1 1 1 1 1]
3.    Damping factor                           0.1
4.    Discretization coefficient               1.0
5.    Tolerance                                2^-7
6.    UART baud rate                           9600
7.    Number of data bits per transmission     8
8.    Parity                                   odd
9.    Number of stop bits                      1
10.   Handshaking                              None

4.8 Data exchange format
The exchange of data between the PC terminal and the FPGA system needed
to be standardized in order for the data to be stored in the correct structure
and also for it to be usable by the DFPM computation modules.
The MATLAB approach for specifying vectors and matrices was, hence,
adopted.
In the format usable by the DFPM module, braces delimit each problem set,
the elements within each row of a matrix are separated by whitespace, and
the rows of a matrix are separated by semicolons. The solution output from
the FPGA is transmitted using the same standard, except that the opening and
closing braces are omitted.
An example of the utilization is shown in Figure 4.2 below.

Figure 4.2 Image showing the terminal being used for data exchange be-
tween the FPGA and the PC
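A MATLAB-style text format of this kind can be parsed with very little logic, as the sketch below suggests. It is an illustration only and does not reproduce the translation modules of the design. The use of square brackets as the delimiting braces, the function names parse_matrix and format_vector, and the four fractional digits on output (taken from the limitation mentioned in section 6.6) are assumptions.

```python
def parse_matrix(text):
    """Parse a MATLAB-style literal such as "[1 2; 3 4]" into a list of rows.

    Assumption: square brackets delimit the data, whitespace separates the
    elements of a row and semicolons separate rows.
    """
    body = text.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    rows = [row.split() for row in body.split(";") if row.strip()]
    return [[float(value) for value in row] for row in rows]


def format_vector(values):
    """Format a solution vector without braces, four fractional digits each."""
    return " ".join(f"{value:.4f}" for value in values)


matrix = parse_matrix("[1 2 3; 4 5 6]")
print(matrix)
print(format_vector([0.5, -0.25]))
```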
4.9 Signed numerical representation
Since digital systems only deal with binary arithmetic for numerical computa-
tions and representation, the numbers handled in the DFPM algorithm were
represented by using signed bits. This decision helped to ensure that positive
and negative numbers were distinguished from one another.
The downside of this approach was that the bit being used for sign
representation could not be used for numerical value representation.
Therefore an extra
bit needed to be added to the number of bits representing each signed number
in order to make up for the shortfall.
4.10 Integer and fractional representation
Another important consideration in the design was the representation of
fractional values. It was decided that binary digits after the radix point will be
represented and treated like whole integers i.e. shifted to the left. At the end of
all computations, the result will also be shifted to the right by the appropriate
number of binary digits to make up for the left shift. This process is a simple
scheme that makes for the manipulation of fractions in a way that is similar to
whole numbers.
As a result, each number in the DFPM algorithm consisted of 33 bits. The MSB
indicated the sign of the number while the next 16 bits represented the integer
part of the value being handled. The fractional part of the number was then
represented by the least significant 16 bits.
Below is an image showing a sample numerical representation as used in the
design. It can be seen that the MSB is “0”, therefore it is a positive number.
The next 16 bits are equivalent to 9 in decimal and the last 16 bits are
equivalent to 0.628906 (i.e. 2^-1 + 2^-3 + 2^-8). Hence the number
represented in the image below is +9.6289 in decimal.
Fig 4.3 Image showing the numerical representation scheme
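The representation scheme can be illustrated with a small software model. The sketch assumes that the 33-bit word is held in two's complement with 16 fractional bits, which agrees with the statement in Chapter 5 that negative numbers are represented in two's complement. The names to_fixed and from_fixed are hypothetical helpers and are not part of the VHDL design.

```python
FRACTION_BITS = 16
WORD_BITS = 33
WORD_MASK = (1 << WORD_BITS) - 1


def to_fixed(value):
    """Encode a real number as a 33-bit two's-complement word (16 fraction bits)."""
    scaled = round(value * (1 << FRACTION_BITS))  # shift the radix point left
    limit = 1 << (WORD_BITS - 1)
    if not -limit <= scaled < limit:
        raise OverflowError("value does not fit in 33 bits")
    return scaled & WORD_MASK


def from_fixed(word):
    """Decode a 33-bit two's-complement word back into a real number."""
    if word & (1 << (WORD_BITS - 1)):  # sign bit set: negative value
        word -= 1 << WORD_BITS
    return word / (1 << FRACTION_BITS)  # shift the radix point back


word = to_fixed(9.62890625)  # 9 + 2^-1 + 2^-3 + 2^-8, the sample value above
print(format(word, "033b"))
print(from_fixed(word))
print(from_fixed(to_fixed(-0.5)))
```

The first printed line is the 33-bit pattern: one sign bit, 16 integer bits and 16 fractional bits.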

The multiplication of two numbers with n fractional binary digits each
results in a product with 2n fractional binary digits. This scheme, therefore,
offers an advantage in multiplication operations since it ensures that
multiplicative operations maintain a precision of 2^-8 for each operation.
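The doubling of the number of fractional digits can be seen in a short sketch of fixed-point multiplication. It assumes values scaled by 2^16 and a rescaling of the product by a right shift of 16 bits. The function name fixed_mul is hypothetical and the sketch uses unbounded Python integers, so it does not model the 33-bit word limit of the hardware.

```python
FRACTION_BITS = 16


def fixed_mul(a, b):
    """Multiply two fixed-point integers that each carry 16 fractional bits."""
    product = a * b                  # the raw product carries 32 fractional bits
    return product >> FRACTION_BITS  # discard the 16 lowest bits (floor)


a = int(1.5 * 2 ** FRACTION_BITS)   # 1.5 in fixed point
b = int(2.25 * 2 ** FRACTION_BITS)  # 2.25 in fixed point
result = fixed_mul(a, b)
print(result / 2 ** FRACTION_BITS)  # 1.5 * 2.25
```

The bits discarded by the shift are where precision can be lost when a product is stored back in the original word size.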
4.11 Spartan 3E-1200 FG320 FPGA
The Spartan 3E-1200 FG320 FPGA is a standard performance 320-ball fine pitch
ball grid array FPGA chip with 1.2 million gates, 136 K RAM, 28 dedicated
multipliers and 250 user IO pins [7]. The chip is made up of five functional
elements which are the Digital Clock Managers (DCMs), the Input/Output
Blocks (IOBs), Configurable Logic Blocks (CLBs), dedicated multipliers and
block RAMs.
The dedicated multipliers are able to directly compute 18-bit by 18-bit
multiplication in two’s complement, while the IOBs can be used for data input
to and output from the FPGA. The 136 K RAM is equivalent to 139264 bits
(136 * 1024 bits) of memory available for storage on the chip. The logic of
the combinatorial and synchronous circuits resulting from the VHDL design is
mainly implemented in the CLBs on the chip.
4.12 Nexys2 FPGA demonstration board
The Nexys2 FPGA demonstration board is a hardware platform, designed and
manufactured to accommodate and support the Spartan 3E FPGA, enable a
demonstration of its capabilities and provide some standard hardware periph-
eral access to the chip.
It can be powered via USB, battery or wall socket and runs on a 50 MHz oscil-
lator while featuring 16 MB SDRAM and flash and an impressive array of
standard hardware interfaces like VGA, USB, RS232 ports as well as switches,
buttons and a quad digit seven segment display [8].

Figure 4.4 Image showing a Nexys2 FPGA demonstration board
4.13 Xilinx ISE
Hardware design was done with Xilinx ISE (Integrated Synthesis Environment)
and the generated design was then downloaded onto the FPGA. Xilinx ISE
is free software developed by Xilinx for hardware design and for programming
FPGAs.
There are a number of other design/synthesis environment applications for
hardware design, e.g. Altera’s Quartus II design environment. However,
Xilinx ISE was the obvious choice because it is offered by the
vendor of the FPGA chip used, and also because it provides out-of-the-box
support for the FPGA chip and the board used.
4.14 ISim simulation software
ISim simulator software is a software application for the simulation of HDL
code which is bundled with the Xilinx ISE software suite. It is easy to use and
provides support for mixed languages, multi-threaded compilation, and dis-
plays the circuit behavior with the aid of waveforms on the screen.
ModelSim is also a simulation software that can be used but due to its usage
restrictions and the author’s familiarity with ISim, ISim was chosen over
ModelSim.

4.15 Design verification
For each module designed in this project, a test-bench was written for testing,
simulation and verification of its functionality and behavior. Test-benches, in
this context, refer to VHDL code written for the purpose of simulating opera-
tional circumstances of the designed module in question. The modules being
tested are normally referred to as unit under test (UUT).
4.16 The complete design
The complete system integrated these different modules and connected them
while doing type conversion in the top module where appropriate. The incom-
ing data from the UART were converted to signed bit vectors and stored in
memory on the FPGA until all the data necessary for each problem set were
received.
After this, a signal that activates the DFPM computation module is raised so
that computation can start. The complete design made use of 26 multipliers, 12
IOB pins and 3243 LUTs. While the utilization of multipliers was 92%, the
utilization of logical and IO blocks was much lower. A copy of the project
report summary is included in the appendix of this report.

Figure 4.5 The Nexys2 board FPGA connected to a PC and running the
DFPM algorithm.

5 Results
Every module designed in Chapter 4 of this report was tested with a test-bench
written in VHDL. The test benches were written to simulate the expected
conditions and functional environment for each module. The simulations were
done in ISim software and each module’s behavior was verified through visual
inspection and calculations. The test benches are not included in the appendix
of this report. The following are results of the tests carried out on the
modules.
It is worth noting that since the values represented in this chapter are basically
binary, negative numbers were represented in two’s complement.
5.1 Simulation results
5.1.1 Element wise vector multiplication
The image below shows the result of the simulation of the vector multiplica-
tion module. Vectors 1 and 2 were input while vector_out was the output.
Fig 5.1 Test simulation for Signed_Vector_Vector_Mult module
Vector 1 = [5.0 3.0 2.0 4.0 7.0] and Vector 2 = [3.0 2.0 3.0 4.0 5.0]

The output vector was 1001110 in binary, which is 78.0 in decimal.
By calculation: (5*3) + (3*2) + (2*3) + (4*4) + (7*5) = 78
This indicates that the module worked as intended.
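The hand calculation for this test can be repeated with a few lines of code. The sketch only reproduces the arithmetic for the two input vectors stated above and does not model the VHDL module. The variable names are illustrative.

```python
vector_1 = [5, 3, 2, 4, 7]
vector_2 = [3, 2, 3, 4, 5]

# Element-wise products, followed by their sum
products = [a * b for a, b in zip(vector_1, vector_2)]
total = sum(products)

print(products)
print(total, format(total, "b"))  # decimal value and its binary form
```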
5.1.2 Element-wise vector subtraction
Figure 5.2 Test simulation for Signed_Vector_Vector_5By1_Subtr module
Above is an image of the simulation waveform for the vector subtraction
module. The input vectors were named vectors 1 and 2 while the output was
named vector_out.
Vector 1 = [1.0 7.81e-3 11.72e-3 15.62e-3 19.53e-3]
Vector 2 = [15.0 3.91e-3 3.91e-3 3.91e-3 3.91e-3]
Vector out = [-14.0 3.91e-3 7.81e-3 11.72e-3 15.62e-3]

Simple calculation indicates that Vector 1 – vector 2 = vector out.
5.1.3 Evaluating new vector V
In the image below, the effect of operations pipelining can be seen as the
elements of vector_new_v assume new values one clock cycle after one anoth-
er. The iteration complete signal indicates the completion of the subtraction
and multiplication operations in each iteration stage.
Figure 5.3 Test simulation for Signed_New_V_Ops
5.1.4 Evaluating new vector X
Similar to the module in section 5.1.3 above, the effect of pipelining is seen in
the evaluation of vector_new_x. The signal new_v_ready signified that the
evaluation of the new value for vector V was complete and that the evaluation
process for vector x can start.

Figure 5.4 Test simulation for Signed_New_X_Ops
The signal new_X_ready is a signal line that indicated that the operation was
complete. The behavior was as expected.
5.1.5 Convergence check
The tolerance check module was simulated with two sets of values for vector
b_ax. The first set of values was set to be beyond the tolerance level while the
second set of values was set to be below the expected limit.
The signal “iteration complete” was raised at the end of each multiplication
and subtraction operation of the iteration stage. The convergence check module
completes its function in about seven clock cycles, after which the “iterate”
signal is driven high or low depending on the result of the convergence
check.

Figure 5.5 Test simulation for tolerance check module
It can be seen above that after the second set of values were received and
computed, the “iterate” signal was brought low. This is consistent with the
design concept.
5.1.6 DFPM top module
This simulation was done with the following input set:
Vector B

Matrix A
Vectors X and V
By visual inspection of the results from the simulation, the final value of vector
X on the output was calculated thus:
Vector X(0) is a negative number since the first bit is 1.
111111111111111111100011101100101 (binary) in two’s complement is equivalent to
-000000000000000000011100010011010 (binary) in unsigned binary. A simplified
approach to conversion of unsigned binary to and from two’s complement is
outlined in the appendix.

Figure 5.6 Test simulation for DFPM top module
Hence it is correct to state that:
Vector X(0) = - (0.0 + 2^-3 + 2^-4 + 2^-5 + 2^-9 + 2^-12 + 2^-13 + 2^-15).
Vector X(0) = -0.2211
In the same manner Vector X(1) is a negative number.
111111111111111111110010001110011 (binary) in two’s complement is equivalent to
-000000000000000000001101110001100 (binary) in unsigned binary. Hence,
Vector X(1) = - (0.0 + 2^-4 + 2^-5 + 2^-7 + 2^-8 + 2^-9 + 2^-13 + 2^-14)
Vector X(1) = -0.1076
Vector X(2), Vector X(3) and Vector X(4) are positive numbers since their MSBs
are 0. Therefore conversion from two’s complement is not required for them.
Vector X(2) = 000000000000000000001001111100000

Vector X(2) = +0.0 + 2^-4 + 2^-7 + 2^-8 + 2^-9 + 2^-10 + 2^-11
Vector X(2) = +0.0776
Vector X(3) = 000000000000000000011111001011011
Vector X(3) = +0.0 + 2^-3 + 2^-4 + 2^-5 + 2^-6 + 2^-7 + 2^-10 + 2^-12 + 2^-13 + 2^-15 + 2^-16
Vector X(3) = +0.2436
Vector X(4) = 000000000000000000101101000100000
Vector X(4) = +0.0 + 2^-2 + 2^-4 + 2^-5 + 2^-7 + 2^-11
Vector X(4) = +0.3520
Therefore the final value of the solution vector in this simulation was
X = [-0.2211, -0.1076, 0.0776, 0.2436, 0.3520]
While the behavior seen above was consistent with design expectation, it was
considered that comparison with the output from a MATLAB implementation
would help to further verify the module’s behavior.
The values obtained from the MATLAB code and the VHDL simulations were
quite close as the MATLAB implementation produced vector X as shown
below:
X = [-0.2199, -0.1074, 0.0775, 0.2440, 0.3521]
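The manual conversion of the simulated output words can be cross-checked with a short script. The sketch assumes that each word is a 33-bit two's-complement value with 16 fractional bits. The bit patterns and the MATLAB values are copied from this section, while the function name decode is a hypothetical helper. Exact two's-complement negation differs from plain bit inversion by one least significant bit (2^-16), which is below the four decimal places quoted here.

```python
simulated_words = [
    "111111111111111111100011101100101",  # X(0)
    "111111111111111111110010001110011",  # X(1)
    "000000000000000000001001111100000",  # X(2)
    "000000000000000000011111001011011",  # X(3)
    "000000000000000000101101000100000",  # X(4)
]
matlab_x = [-0.2199, -0.1074, 0.0775, 0.2440, 0.3521]


def decode(bits, fraction_bits=16):
    """Decode a two's-complement bit string with the given fractional bits."""
    value = int(bits, 2)
    if bits[0] == "1":           # sign bit set: subtract 2^(word length)
        value -= 1 << len(bits)
    return value / (1 << fraction_bits)


fpga_x = [decode(bits) for bits in simulated_words]
differences = [abs(f - m) for f, m in zip(fpga_x, matlab_x)]

print([round(value, 4) for value in fpga_x])
print(max(differences))
```

The largest difference between the two result sets is found in the first element.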
5.2 Comparison
The circuit implemented on FPGA was tested by connecting the FPGA to a PC
and sending in numbers that represented problem sets while the FPGA re-
turned the solution to the problems. Since the accuracy was crucial, the results
obtained during these tests were noted and compared with values obtainable
from the same algorithm implemented in MATLAB on a PC. The comparison
showed that the values obtained by both systems, for each problem set inves-

DFPM On FPGA
Taiyelolu Adeboye
5 Results
2015-09-25
40
tigated, were approximately equal. A table comparing the results obtained
during two of these tests is shown below.
Table 5.1 Table of a comparison of the results obtained from two runs of
DFPM on different systems.
[Table 5.1: columns “1st test” and “2nd test”; rows: problem set (Vector A,
Vector B), solution vector from MATLAB/PC (binary: N/A; decimal) and solution
vector from the FPGA (binary; decimal). The cell values are not recoverable
in this copy.]

6 Discussion
Based on the tests carried out on the VHDL design modules, the behavior of
the circuit was as expected. However, a number of implications need to be
discussed.
6.1 FPGA resource utilization
Due to the fact that FPGAs have limited resources, there are limits
to the number of multiplication operations one can execute in parallel for
problems of the 5x5 matrix dimension implemented in this design. As matrix
dimensions get bigger, the number of concurrent operations possible is
reduced proportionately.
In this design, a problem defined by an n-dimensional matrix and n-element
vectors requires n + 5 multipliers. This is
because the row-vector multiplication in A*X was done concurrently for
each row while other multiplication operations were done sequentially.
Another limitation is the data size expected by the dedicated multipliers.
The Spartan 3E multipliers are 18-bit multipliers by default and multiplication
operations involving data types bigger than 18 bits will consume even more
resources. As can be seen in the project report, the actual number of multipli-
ers used was 26 out of a total of 28.
6.2 Reduction in computation time
For every iteration stage of this design, the computation time for (n-1)^2
multiplication operations is saved. Thus for a solution requiring m
iterations, the time required for ((n - 1)^2 * m) multiplication operations
is saved per solution. For instance, a 5 by 5 design as implemented in this
project work saves the computation time for 1600 multiplication operations
for a solution requiring a hundred iterations.
6.3 Larger problem sets
An approach to implementing this design for significantly larger problem sets
might be to section the complete data set into subsets containing small-sized
problem sets which the module is capable of handling. The solutions can then
be stored and reused as appropriate. At a point, this approach might encounter
limitations as well, due to the fact that the on-chip memory of FPGAs is also
limited. However, this was not the focus of this design.
6.4 UART bottleneck
Tests showed that each iteration stage of DFPM computation for a 5 by 5
dimensioned problem required 28 clock cycles. However, the data was being
received through a 9600 baud rate UART. The UART is, thus, slower than the
DFPM computations. In a case where large volumes of data may need to be
transmitted to the DFPM computation module, the UART may prove to be a
bottleneck. This problem might be mitigated with the use of a more parallel
communication mode and faster transmission rates.
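The size of the gap can be estimated with a back-of-the-envelope calculation. The sketch assumes the UART settings of Table 4.2 (8 data bits, odd parity, 1 stop bit, plus 1 start bit per frame) and assumes that the DFPM modules are clocked by the 50 MHz oscillator of the Nexys2 board. The variable names are illustrative.

```python
BAUD_RATE = 9600                 # bits per second on the serial line
BITS_PER_FRAME = 1 + 8 + 1 + 1   # start + data + parity + stop bits
CLOCK_HZ = 50_000_000            # assumed FPGA clock (50 MHz oscillator)
CYCLES_PER_ITERATION = 28        # clock cycles per DFPM iteration stage

frame_time = BITS_PER_FRAME / BAUD_RATE          # seconds to move one byte
iteration_time = CYCLES_PER_ITERATION / CLOCK_HZ  # seconds per iteration stage
iterations_per_frame = frame_time / iteration_time

print(f"one UART byte takes {frame_time * 1e3:.3f} ms")
print(f"one iteration takes {iteration_time * 1e9:.0f} ns")
print(f"iterations per UART byte: {iterations_per_frame:.0f}")
```

Under these assumptions, roughly two thousand iteration stages fit into the time needed to transfer a single byte.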
6.5 Precision
Although the number of bits assigned to fractional value representation was
fairly large (16 bits), there might be some challenges when it comes to the
accuracy of the exact values obtained from multiplication operations. This is
because the result of the multiplication of two 33-bit values is a 66-bit
value. When this product is stored back in a 32-bit data type container,
some bits will be lost.
This problem will, most likely, not affect integer values in the DFPM computa-
tion but can result in some precision loss in the fractional representation.
6.6 Communication input/output limitations
Since the data received from the UART could not be used directly, modules
were written for the forward and reverse translation of the data transmitted to
and received from the DFPM computation module.
For instance, due to the translation done in the “UART_out_DFPM_in” mod-
ule, only single digit decimal numbers are expected as input data typifying the
problem set. Likewise, in order to reduce FPGA resource consumption, reverse
translation of the solution vector element sets was also limited to four fraction-
al digits.
6.7 Cross platform comparison
Since the goal of the project is to implement DFPM in an FPGA design that is
speed optimized, the CPU time consumed by the algorithm became an issue of
pertinent importance. However, since different computational devices have
varying architectures and processing speed, as well as operating systems, a
reasonable metric for the evaluation of the computation time that is independ-
ent of these parameters was needed in order to compare the performance of
the FPGA design with other implementations. The agreed metric was the
number of clock cycles used by the processing unit while executing the DFPM
algorithm.
Thus comparison was done between the DFPM computation done on the
FPGA and the same algorithm coded in C++ and run on a PC with a 2.4 GHz CPU.
The FPGA implementation completed the algorithm for solving the sample
problem used for testing the DFPM top module (according to simulation) in
57670 nanoseconds, which is equivalent to 2883.5 clock cycles, while the PC
recorded 0.0156001 seconds for a thousand repeated runs of the same problem,
as explained below.
The time used up by the PC included the time used for context switching and
kernel operations in the operating system, as well as process user time.
Provision was made in the C++ code both for implementing the algorithm and
for measuring the time taken.
In the C++ code, arrays with a dimension of 1000 were created to store a
thousand copies of vectors A and B, and the DFPM algorithm was looped over
each copy of the same problem statement. Thus a thousand copies of the same
problem were solved with the same algorithm.
The large number of runs was needed because the time spent by the CPU in
kernel mode was sometimes too low to be measured by the functions used to
measure the CPU process times when the algorithm was run only once.
Hence running the algorithm a thousand times produced measurable process
times. The time spent by the CPU while not running the actual algorithm was
deducted from these, and the result was divided by 1000 to obtain the CPU
time for a single run of the DFPM algorithm.
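The measurement procedure described above can be sketched as follows. This
is a Python illustration only, not the C++ code from the appendices: the
function name is hypothetical, and treating an empty loop as "the time spent
while not running the actual algorithm" is an assumption made for the sketch.

```python
import time


def cpu_time_per_run(workload, runs=1000):
    """Estimate the CPU time of one call by timing many calls and averaging."""
    # Time the loop that actually runs the workload.
    start = time.process_time()
    for _ in range(runs):
        workload()
    loop_time = time.process_time() - start

    # Time an empty loop of the same length as the overhead estimate
    # (assumption: this stands in for the non-algorithm time deducted).
    start = time.process_time()
    for _ in range(runs):
        pass
    overhead = time.process_time() - start

    # Deduct the overhead, then divide by the number of runs.
    return max(loop_time - overhead, 0.0) / runs


per_run = cpu_time_per_run(lambda: sum(range(100)))
```

Repeating the workload lifts the total above the resolution of the process
timer, which is the same reason the thesis gives for looping over a thousand
copies of the problem.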
Based on the test, and on the assumptions that the program was executed on
only one core of the CPU and that the CPU was not overclocked, the number of
clock cycles used by the PC = 2.4 * 10^9 * 0.0156001 / 1000 = 37440.24, about
13 times the 2883.5 cycles used by the FPGA. This indicates that the FPGA
implementation offers a considerable advantage.
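The arithmetic behind this comparison can be reproduced with a short Python
sketch. The figures are the ones quoted in this section; the FPGA clock
frequency is not stated here and is derived from the quoted time and cycle
count, so it should be read as an inferred value.

```python
def cycles(seconds: float, clock_hz: float) -> float:
    """Number of clock cycles that elapse in `seconds` at `clock_hz`."""
    return seconds * clock_hz


# Figures quoted in this section.
FPGA_TIME_S = 57670e-9           # simulated FPGA run time
FPGA_CYCLES = 2883.5             # cycles reported for the FPGA
PC_CLOCK_HZ = 2.4e9              # PC CPU clock
PC_TIME_1000_RUNS_S = 0.0156001  # measured time, divided by 1000 in the text

# Inferred, not stated in the text: the FPGA clock implied by the two figures.
fpga_clock_hz = FPGA_CYCLES / FPGA_TIME_S

pc_cycles = cycles(PC_TIME_1000_RUNS_S / 1000, PC_CLOCK_HZ)
ratio = pc_cycles / FPGA_CYCLES

print(f"FPGA clock  : {fpga_clock_hz / 1e6:.1f} MHz")
print(f"PC cycles   : {pc_cycles:.2f}")
print(f"PC / FPGA   : {ratio:.1f}")
```

Comparing cycle counts rather than seconds removes the difference in clock
frequency between the two platforms from the comparison.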
It is worth noting that if the CPU executed the program on multiple cores or
overclocked while running the program, the PC may have used more cycles than
stated above. Nonetheless, the calculations show that in both cases the FPGA
implementation would still have been faster. A copy of the C++ code is
included in the appendices.
6.8 Output comparison
In order to ensure consistency of results and ease of operation, a MATLAB
script was written that communicates problem specifications to the FPGA and
receives its results. The script also runs the algorithm itself, and the two
outputs are printed to the screen and compared. The script is described
further in Appendix D, with the code included.
Using this script, three different problem sets were formulated and fed to
the DFPM on FPGA design. The results obtained are shown below, together with
MATLAB plots of the values obtained during each test.
The plots have no units on the x and y axes since they were only used to
indicate the proximity between the results obtained. Each plot shows the
location of each result on the co-ordinate axes.
Figure 6.1 Plot of the values obtained during the first test

Table 6.1 Results obtained in tests with three different problem sets
Test     MATLAB implementation   FPGA implementation
Test 1   -2.4599e-01             -2.4715e-01
         -1.9253e-01             -1.9301e-01
         +5.8280e-03             +5.7221e-03
         +2.5866e-01             +2.5965e-01
         +5.0859e-01             +5.1057e-01
Test 2   -3.8910e-01             -3.9112e-01
         -1.5755e-01             -1.5810e-01
         +1.2061e-02             +1.1765e-02
         +2.6273e-01             +2.6343e-01
         +5.1339e-01             +5.1507e-01
Test 3   +6.5463e-01             +6.5653e-01
         +3.7920e-01             +3.7948e-01
         +3.1785e-01             +3.2008e-01
         +6.8058e-02             +6.8391e-02
         -1.8173e-01             -1.8323e-01

Figure 6.2 Plot of the values obtained during the second test

Figure 6.3 Plot of the values obtained during the third test
As can be seen in the figures and table above, in each of the three tests
carried out, the results of the MATLAB implementation and the FPGA
implementation agreed so closely that the point plots overlapped at each of
the positions marked on the plots. The largest absolute difference between
corresponding values in Table 6.1 is about 0.0022, so the differences are
almost negligible.
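The closeness of the two result sets can be quantified directly from Table
6.1. The Python sketch below uses the table values as printed; the variable
names are illustrative only.

```python
# Values from Table 6.1: (MATLAB results, FPGA results) per test.
results = {
    "Test 1": (
        [-2.4599e-01, -1.9253e-01, +5.8280e-03, +2.5866e-01, +5.0859e-01],
        [-2.4715e-01, -1.9301e-01, +5.7221e-03, +2.5965e-01, +5.1057e-01],
    ),
    "Test 2": (
        [-3.8910e-01, -1.5755e-01, +1.2061e-02, +2.6273e-01, +5.1339e-01],
        [-3.9112e-01, -1.5810e-01, +1.1765e-02, +2.6343e-01, +5.1507e-01],
    ),
    "Test 3": (
        [+6.5463e-01, +3.7920e-01, +3.1785e-01, +6.8058e-02, -1.8173e-01],
        [+6.5653e-01, +3.7948e-01, +3.2008e-01, +6.8391e-02, -1.8323e-01],
    ),
}

# Largest absolute difference between corresponding elements, per test.
max_diff_per_test = {
    name: max(abs(m - f) for m, f in zip(matlab, fpga))
    for name, (matlab, fpga) in results.items()
}
max_abs_diff = max(max_diff_per_test.values())

for name, diff in max_diff_per_test.items():
    print(f"{name}: max |MATLAB - FPGA| = {diff:.5f}")
```

A difference of this size is plausible for a fixed-point implementation with
output limited to four fractional digits, although the table alone does not
show which of these factors dominates.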
However, it is worth noting that these tests used single-digit data as
coefficients in the matrices and vectors that define the problem sets. It is
believed that the implementation can handle other kinds of data, but the
communication modules were limited (by design intent) to single-digit input.
While the MATLAB and FPGA implementations produced results that are very
close, some variation may reasonably be expected with other implementations
and system architectures due to differences in hardware and software design,
as well as system optimization.

6.9 Communication possibilities
As indicated earlier in this discussion, the speed of the whole system was
limited by bottlenecks in the UART. However, considering that most
communication between electronic modules and components makes use of standard
protocols, of which UART is one, this design will still perform slightly
better and faster than most other designs that make use of sequential
processing.
Nonetheless, there are faster protocols that can be exploited to speed up
data exchange, and parallel communication can also be considered since the
FPGA has a substantial number of I/O (Input/Output) pins.
6.10 Applications
This design concept can find application in a large number of fields, ranging
from mathematical theory to real-world engineering design and systems. The
DFPM can be used to model systems in nature, for instance heat flow in a
space and fluid flow [10].
A great number of applications can also be found in electronics and
engineering in general. DFPM will prove very useful in solving least squares
and, possibly, weighted least squares problems in sensor fusion. This will be
useful in radar systems, telecommunications, multi-sensor networks, and the
mobile sensing and localization problems often encountered in systems
requiring self-localization, e.g. mobile robots and sound-source detecting
systems.
DFPM looks promising for the field of image and signal processing, especially
in problems requiring singular value decomposition (SVD). DFPM will also be
useful in mechanics, where complex linear and non-linear systems may need to
be modeled.
Solutions of large matrix problems often require significant computation and
computational resources, hence DFPM can be a very suitable and
resource-efficient approach to solving these problems. It will be even more
useful when the problem involves sparse matrices, a concept that is useful in
FEM-based simulations, which are used in all engineering fields [9].
A DFPM algorithm based on a smaller-dimensioned matrix that functions as a
sliding window through the full matrix can serve as a very quick, efficient
approach that requires minimal computational resources.
6.11 Implications
While DFPM offers many advantages and developmental possibilities, there are
situations in which its efficiency could be exploited for negative purposes.
Certain aspects of data safety and integrity depend on hashing, and a
significant amount of computational resources and time is required to break
them. However, the advent of simpler algorithms and dedicated devices (e.g.
FPGAs) with great computational power facilitates access to supposedly
secured data by criminals.

7 Conclusions
It was found that the design approach met expectations and offered significant
advantages over traditional computational devices and methods. It was also
found that implementing the DFPM algorithm in an FPGA is an efficient
approach to reducing computation time and improving resource efficiency.
Since the DFPM algorithm is widely applicable to a number of other problems,
implementing it in a dedicated device that makes efficient use of resources,
while increasing the speed at which results are obtained, offers many
advantages.
7.1 Benchmark
In order to base the conclusions drawn in this project on criteria that are
independent of platform, the computation output and the number of clock
cycles were used.
Based on the result of a test carried out using the C++ snippet in Appendix A
on a mobile PC (Acer Aspire 5750, with dual CPU cores running at a 2.4 GHz
clock speed), it was observed that the algorithm applied to a specific
problem required 75754 clock cycles on the PC, while the same problem was
completed in 3192 clock cycles by the FPGA implementation.
Despite the significant difference in computation time, computational
architecture and resources, the results obtained from both computations were
close enough to be regarded as equivalent.
Hence, the initial goals of the design were achieved and the expectation of
superior performance and resource efficiency was verified.
7.2 Further work
A lot can be improved in this design. Below is a list of possibilities:
1. Improving the forward translation modules so that they can handle
multi-digit decimal input in the problem set.
2. Modifying the module that reverse-translates the solution vector from
the DFPM top module so that it is able to handle the full range of
bits representing fractional values in the data type used in the design.
3. Designing the DFPM computational module to be able to handle larger
problem sets, along with the possibility of handling multi-dimensional
problem sets.
4. Increasing the UART baud rate and making it configurable in use.
This will reduce the difficulty that can be encountered while setting up a
connection between the UART on the FPGA and the terminal application
software.
5. Enhancing the design so that it can handle multiple problem sets, i.e.
receive a problem set, resolve it and return to wait for the next problem.

References
[1] S. Edvardsson, M. Gulliksson, J. Persson, et al., “The Dynamic Functional
Particle Method: An Approach for Boundary Value Problems”, J. Appl.
Mech. 79(2), 021012 (Feb 24, 2012).
[2] S. Edvardsson et al., “Role of the dynamic functional particle method for
solving linear equations”, Physical Review E: Statistical, Nonlinear, and
Soft Matter Physics.
[3] R. Sincovec, N. Madsen, “Software for non-linear partial differential
equations”, ACM Trans. Math. Softw. 1 (1975) 232-260.
[4] V. Pata, M. Squassina, “On the strongly damped wave equation”,
Commun. Math. Phys. 253 (2005) 511-533.
[5] F. Alvarez, “On the minimization property of a second order dissipative
system in Hilbert spaces”, SIAM J. Control Optim. 38 (2000) 1102-1119.
[6] B. Land, “Hybrid Computing On an FPGA”, Cornell University,
https://courses.cit.cornell.edu/ece576/DDA/FPGAhybridBRL.pdf, last
retrieved 2014-09-25.
[7] Xilinx Inc., 2013, Spartan-3E FPGA Family Data Sheet,
http://www.xilinx.com/support/documentation/data_sheets/ds312.pdf, last
retrieved 2014-09-25.
[8] Digilent Inc., 2011, Digilent Nexys2 Board Reference Manual,
http://www.digilentinc.com/data/products/nexys2/nexys2_rm.pdf, last
retrieved 2014-09-25.
[9] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., Society for
Industrial and Applied Mathematics, 2003.
[10] Ne-Zheng Sun, “Applications of numerical methods to simulate the
movements of contaminants in groundwater”, Environmental Health
Perspectives, Vol. 83 (Nov. 1989), pp. 97-115.
[11] ASCII Table, www.asciitable.com, last retrieved 2014-09-26.

Appendix A: Documentation of developed program code
Design codes
Vector multiplication
--------------------------------------------------------------
-- Company: Mid Sweden University
-- Engineer: Taiyelolu Adeboye
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- Module Name: Signed_Vector_Vector_Mult_5By1 - Behavioral
-- Project Name: DFPM on FPGA
-- Target Devices: Nexys2
--------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_signed.all;
use work.DFPM_ARRAY_5X32_BIT.all;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use IEEE.NUMERIC_STD.ALL;

-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity Signed_Vector_Vector_Mult_5By1 is
    Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           Vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           CLK        : in  STD_LOGIC;
           RST        : in  STD_LOGIC;
           Vector_Out : out Signed (32 downto 0));
end Signed_Vector_Vector_Mult_5By1;

architecture Behavioral of Signed_Vector_Vector_Mult_5By1 is

    -- Element-wise products of the two input vectors
    Signal Mult0, Mult1, Mult2,
           Mult3, Mult4 : Signed(65 downto 0) := (others => '0');

    -- Sum of the five products, widened by four bits
    Signal Sum : Signed(69 downto 0) := (others => '0');

begin

    Mult0 <= Vector_1(0) * Vector_2(0);
    Mult1 <= Vector_1(1) * Vector_2(1);
    Mult2 <= Vector_1(2) * Vector_2(2);
    Mult3 <= Vector_1(3) * Vector_2(3);
    Mult4 <= Vector_1(4) * Vector_2(4);

    Sum <= "0000" & Mult0 + Mult1 + Mult2 + Mult3 + Mult4;

    -- Only bits 48 downto 16 of the sum are passed to the output
    Vector_Out <= Sum(48 downto 16);

end Behavioral;
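The effect of the "Sum(48 downto 16)" slice can be illustrated with a small
Python model of the dot product. This is a sketch under stated assumptions,
not a description of the synthesized hardware: it assumes 33-bit signed
operands with 16 fractional bits (inferred from the port widths and the
slice), and the helper names are hypothetical.

```python
FRACTION_BITS = 16  # assumption: 16 fractional bits, inferred from the slice
WORD_BITS = 33      # width of the Signed(32 downto 0) output


def to_fixed(x: float) -> int:
    """Convert a real number to the assumed fixed-point integer format."""
    return int(round(x * (1 << FRACTION_BITS)))


def to_float(value: int) -> float:
    """Convert a fixed-point integer back to a real number."""
    return value / (1 << FRACTION_BITS)


def wrap_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits and interpret them as two's complement."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def dot_fixed(vec_a, vec_b) -> int:
    """Dot product of fixed-point vectors, returned in the same format."""
    total = sum(a * b for a, b in zip(vec_a, vec_b))  # 32 fractional bits
    # Dropping the 16 low bits and keeping 33 bits models Sum(48 downto 16).
    return wrap_signed(total >> FRACTION_BITS, WORD_BITS)


a = [to_fixed(v) for v in (1.5, -2.0, 0.25, 3.0, -0.5)]
b = [to_fixed(v) for v in (2.0, 0.5, 4.0, -1.0, 2.0)]
dot = to_float(dot_fixed(a, b))
```

Each product of two values with 16 fractional bits carries 32 fractional
bits, so discarding the 16 least significant bits returns the result to the
operand format.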
Vector subtraction
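--------------------------------------------------------------
-- ...
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- ...
--------------------------------------------------------------

library IEEE;
-- ...
--use IEEE.NUMERIC_STD.ALL;
-- ...

entity Signed_Vector_Vector_5By1_Subtr is
    Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           CLK        : in  STD_LOGIC;
           RST        : in  STD_LOGIC;
           Vector_Out : out DFPM_SIGNED_VECTOR_5X32_BIT);
end Signed_Vector_Vector_5By1_Subtr;

architecture Behavioral of Signed_Vector_Vector_5By1_Subtr is

    -- Element-wise differences, one bit wider than the output elements
    Signal Subtr0, Subtr1, Subtr2, Subtr3, Subtr4 : Signed(33 downto 0);

begin

    Subtr0 <= '0' & Vector_1(0) - vector_2(0);
    Subtr1 <= '0' & Vector_1(1) - vector_2(1);
    Subtr2 <= '0' & Vector_1(2) - vector_2(2);
    Subtr3 <= '0' & Vector_1(3) - vector_2(3);
    Subtr4 <= '0' & Vector_1(4) - vector_2(4);

    Vector_Out(0) <= Subtr0(32 downto 0);
    Vector_Out(1) <= Subtr1(32 downto 0);
    Vector_Out(2) <= Subtr2(32 downto 0);
    Vector_Out(3) <= Subtr3(32 downto 0);
    Vector_Out(4) <= Subtr4(32 downto 0);

end Behavioral;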
Subtraction and multiplication operations
--------------------------------------------------------------
-- ...
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- ...
--------------------------------------------------------------

library IEEE;
-- ...

entity Signed_SubtrAndMult_Ops_Module is
    Port ( Vector_A  : in DFPM_SIGNED_VECTOR_25X32_BIT;
           Vector_B  : in DFPM_SIGNED_VECTOR_5X32_BIT;
           Vector_X  : in DFPM_SIGNED_VECTOR_5X32_BIT;
           Scalar_Mu : in SIGNED (32 downto 0);
           Vector_V  : in DFPM_SIGNED_VECTOR_5X32_BIT;

           CLK : in STD_LOGIC;
           RST : in STD_LOGIC;
           NEW_ITERATION      : in  STD_LOGIC := '0';
           ITERATION_COMPLETE : out STD_LOGIC := '0';

           B_Minus_AX           : out DFPM_SIGNED_VECTOR_5X32_BIT;
           B_Minus_Ax_Minus_muV : out DFPM_SIGNED_VECTOR_5X32_BIT);
end Signed_SubtrAndMult_Ops_Module;

architecture Behavioral of Signed_SubtrAndMult_Ops_Module is

    ------------------------------------------------

    -- This component will be used to evaluate
    -- the vector multiplication A*X.
    -- It takes two inputs of 5 by 1 vectors
    COMPONENT Signed_Vector_Vector_Mult_5By1
    PORT(
        Vector_1   : IN  DFPM_SIGNED_VECTOR_5X32_BIT;
        Vector_2   : IN  DFPM_SIGNED_VECTOR_5X32_BIT;
        CLK        : IN  std_logic;
        RST        : IN  std_logic;
        Vector_Out : OUT Signed(32 downto 0)
    );
    END COMPONENT;

    -- This component will be used to evaluate the subtraction in B - Ax
    COMPONENT Signed_Vector_Vector_5By1_Subtr
    Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
           CLK        : in  STD_LOGIC;
           RST        : in  STD_LOGIC;
           Vector_Out : out DFPM_SIGNED_VECTOR_5X32_BIT);
    END COMPONENT;

    ------------------------------------------------

    ------------------------------------------------
    -- Signals for storing the input values
    Signal Sig_Vector_A : DFPM_SIGNED_VECTOR_25X32_BIT := (
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')),
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')),
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')),
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')),
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0')));

    Signal Sig_Vector_B : DFPM_SIGNED_VECTOR_5X32_BIT :=
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));
    Signal Sig_Vector_X : DFPM_SIGNED_VECTOR_5X32_BIT :=
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));
    Signal Sig_Scalar_Mu: SIGNED (32 downto 0);
    Signal Sig_Vector_V : DFPM_SIGNED_VECTOR_5X32_BIT :=
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

    -- The two signals below are used to connect the signals at the
    -- Vector_Vector_Mult module to the corresponding vector indexes.
    -- These were used to avoid assigning dynamically changing signals
    -- directly to a static line
    Signal Sig_Vector_A_With_IndexPosition : DFPM_SIGNED_VECTOR_5X32_BIT :=
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

    Signal Sig_Vector_A_Mult_X_With_IndexPosition : SIGNED (32 downto 0);

    -- The following two signals will be used to store the products of the
    -- multiplication of vectors A and X
    -- as well as scalar mu and vector V.
    Signal Sig_Vector_A_Mult_X : DFPM_SIGNED_VECTOR_5X32_BIT :=
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));
    Signal Sig_Vector_Mu_Mult_V : DFPM_SIGNED_VECTOR_5X32_BIT :=
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

    -- The following two signals will be used to store the result
    -- of the subtraction operations
    Signal Sig_Vector_B_Minus_AX : DFPM_SIGNED_VECTOR_5X32_BIT :=
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));
    Signal Sig_Vector_B_Minus_AX_Minus_MuV : DFPM_SIGNED_VECTOR_5X32_BIT :=
        ((Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

    -- This signal will only be raised for one clock cycle
    -- when there is a new set of data available for computation
    Signal DFPMCompute : STD_LOGIC := '0';

    -- This signal is used to communicate with other modules "downstream"
    -- of this module when the result of this module's computation is ready
    Signal Sig_ITERATION_COMPLETE : STD_LOGIC := '0';

    -- This signal will be used to represent the index position
    -- that will be progressively incremented as a means of pipelining
    -- data for multiplication in this module as well as input for the
    -- Vector_Vector_Multiplication module
    Signal MultplicationStageArrayPosition : integer := 0;

    -- This signal will be used to signal when the index position
    -- can be shifted and when data can be stored for output
    Signal Shift_Array_Position : STD_LOGIC := '0';

    -- This signal will be raised once when all the products of
    -- multiplication are ready.
    -- This is to enable the module to signal to other modules "downstream"
    -- that the result of the computation is ready
    Signal MultiplicationProductsReady : STD_LOGIC := '0';

    Signal ReadyFlag : STD_LOGIC := '0';

    -- This clock signal was created as a slowed-down clock (half the pace
    -- of CLK) and will be used for clocking the shifting of the index position
    Signal Sig_Clk_For_Index_Shifting : STD_LOGIC := '0';

begin
    -- For Vector - Vector multiplication
    Vector_Vector_Mult: Signed_Vector_Vector_Mult_5By1 PORT MAP (
        Vector_1   => Sig_Vector_A_With_IndexPosition,
        Vector_2   => Sig_Vector_X,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_A_Mult_X_With_IndexPosition);

    -- For subtraction operations for B - AX
    Doing_B_Minus_AX : Signed_Vector_Vector_5By1_Subtr PORT MAP (
        Vector_1   => Sig_Vector_B,
        vector_2   => Sig_Vector_A_Mult_X,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_B_Minus_AX);

    -- For subtraction operations for B - AX - muV
    Doing_B_Minus_AX_Minus_MuV : Signed_Vector_Vector_5By1_Subtr PORT MAP (
        Vector_1   => Sig_Vector_B_Minus_AX,
        vector_2   => Sig_Vector_Mu_Mult_V,
        CLK        => CLK,
        RST        => RST,
        Vector_Out => Sig_Vector_B_Minus_AX_Minus_MuV);

    -- This signal will be used to signal that the output of this module
    -- is ready to be read.
    ITERATION_COMPLETE <= Sig_ITERATION_COMPLETE;

    -- This process determines when each iteration of the DFPM algorithm
    -- is to be started.
    -- Computation will only be done if it's a new iteration and it has not
    -- been completed before.
    -- Therefore this process sets DFPMCompute to '1' only on the rising
    -- edge of NEW_ITERATION
    -- and stores new values into the vectors only at the rising edge of
    -- NEW_ITERATION
    process(CLK, RST, Sig_ITERATION_COMPLETE, NEW_ITERATION)
        Variable NEW_ITERATION_Var : STD_LOGIC := '0';
    begin
        if rising_edge(CLK) then
            if (RST = '1') then
                DFPMCompute <= '0';
                NEW_ITERATION_Var := '0';
            elsif (Sig_ITERATION_COMPLETE = '1') then
                -- ...
            -- This more or less senses for the rising edge of NEW_ITERATION
            elsif (NEW_ITERATION = '1') and (NEW_ITERATION_Var = '0') then
                --if rising_edge(NEW_ITERATION) then
                -- ...

                Sig_Vector_A <= Vector_A;
                Sig_Vector_B <= Vector_B;
                Sig_Vector_X <= Vector_X;
                Sig_Vector_V <= Vector_V;
                Sig_Scalar_Mu <= Scalar_Mu;

                -- ...
            elsif (NEW_ITERATION = '1') and (NEW_ITERATION_Var = '1') then
                -- ...
            elsif (NEW_ITERATION = '0') then
                -- ...
            end if;
        end if;
    end process;

    -- This process determines the array positions to be multiplied
    -- together for A*X
    process(RST, Sig_ITERATION_COMPLETE, DFPMCompute, Shift_Array_Position,
            NEW_ITERATION, CLK, Sig_Clk_For_Index_Shifting,
            MultplicationStageArrayPosition, Sig_Vector_A,
            Sig_Vector_A_Mult_X_With_IndexPosition, Sig_Scalar_Mu,
            Sig_Vector_V)
        Variable MultplicationStageArrayPosition_Var : integer := 0;

    begin
        if (RST = '1') then
            MultplicationStageArrayPosition <= 0;
            Shift_Array_Position <= '0';
            MultiplicationProductsReady <= '0';

        elsif (Sig_ITERATION_COMPLETE = '1') then
            -- ...

        elsif (DFPMCompute = '1') then -- Checking for the rising edge of NEW iteration here
            -- ...

            -- Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(0);
            -- Sig_Vector_A_Mult_X(0) <= Sig_Vector_A_Mult_X_With_IndexPosition;
            -- productTempStore := Sig_Scalar_Mu * Sig_Vector_V(0);
            -- Sig_Vector_Mu_Mult_V(MultplicationStageArrayPosition) <= productTempStore(48 downto 16);

        elsif (Shift_Array_Position = '1') then
            if rising_edge(Sig_Clk_For_Index_Shifting) then
                if (MultplicationStageArrayPosition = 5) then
                    -- ...
                else
                    MultplicationStageArrayPosition_Var := MultplicationStageArrayPosition;
                    MultplicationStageArrayPosition <= MultplicationStageArrayPosition_Var + 1;
                end if;
            end if;
        end if;
    end process;

    process(CLK, DFPMCompute, Shift_Array_Position, MultplicationStageArrayPosition)
        Variable productTempStore : Signed(65 downto 0);
    begin
        if rising_edge(CLK) then
            if (Shift_Array_Position = '1') and (MultplicationStageArrayPosition < 5) then
                case MultplicationStageArrayPosition is
                    when 0 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(0);
                        Sig_Vector_A_Mult_X(0) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(0);
                    when 1 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(1);
                        Sig_Vector_A_Mult_X(1) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(1);
                    when 2 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(2);
                        Sig_Vector_A_Mult_X(2) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(2);
                    when 3 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(3);
                        Sig_Vector_A_Mult_X(3) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(3);
                    when 4 =>
                        Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(4);
                        Sig_Vector_A_Mult_X(4) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                        productTempStore := Sig_Scalar_Mu * Sig_Vector_V(4);
                    when Others =>
                        NULL;
                end case;
                -- -- Setting the corresponding Vector_A element as the input to the Vector_Vector_Mult module
                -- Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(MultplicationStageArrayPosition);
                -- -- Connecting the output of the Vector_Vector_Mult module to the corresponding A_Mult_X index
                -- Sig_Vector_A_Mult_X(MultplicationStageArrayPosition) <= Sig_Vector_A_Mult_X_With_IndexPosition;
                -- -- Doing mu*V
                -- productTempStore := Sig_Scalar_Mu * Sig_Vector_V(MultplicationStageArrayPosition);
                Sig_Vector_Mu_Mult_V(MultplicationStageArrayPosition) <= productTempStore(48 downto 16);
            end if;
        end if;
    end process;

    -- This process clears ITERATION_COMPLETE and
    -- only sets it to 1 when the MultiplicationProductsReady signal is high.
    -- At the rising edge of MultiplicationProductsReady, the vectors
    -- B_Minus_AX and B_Minus_Ax_Minus_muV are assigned.
    process(CLK, RST, DFPMCompute, MultiplicationProductsReady, ReadyFlag)
    begin
        if rising_edge(clk) then
            if (RST = '1') then
                Sig_ITERATION_COMPLETE <= '0';
                ReadyFlag <= '0';

            elsif (DFPMCompute = '1') then
                -- ...
            elsif (MultiplicationProductsReady = '1') and (ReadyFlag = '0') then
                -- ...
                B_Minus_AX <= Sig_Vector_B_Minus_AX;
                B_Minus_Ax_Minus_muV <= Sig_Vector_B_Minus_AX_Minus_MuV;
            else
                -- ...
            -- end if;
            end if;
        end if;
    end process;

    -- The clock signal created in this process is a real afterthought.
    -- It would not have been created if this module had behaved itself ;-))
    -- It was observed that the circuit computed a wrong output
    -- for as long as the shifting of the index position was based on the
    -- normal clock "CLK".
    -- Hence this clock that cuts the speed to half.
    process(CLK)
    begin
        if rising_edge(CLK) then
            Sig_Clk_For_Index_Shifting <= not(Sig_Clk_For_Index_Shifting);
        end if;
    End process;

end Behavioral;
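The quantities produced by the module above, B - AX and B - AX - muV, are
the terms of one DFPM iteration for a linear system. The Python sketch below
shows how they could fit together in a complete solver loop. It is an
illustration under stated assumptions, not the control logic of the FPGA
design: the update rule (velocity, then position, with a time step dt), the
parameter values and the function name are assumptions, and the stopping
rule compares the sum of squared residuals with the squared tolerance, in
the spirit of the tolerance check module.

```python
def dfpm_solve(A, b, mu=2.0, dt=0.1, tol=1e-6, max_iter=10_000):
    """Solve A x = b by damped particle dynamics (sketch, assumed update rule)."""
    n = len(b)
    x = [0.0] * n  # positions (the solution estimate)
    v = [0.0] * n  # velocities
    for iteration in range(max_iter):
        # Residual b - A x, the B_Minus_AX quantity.
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Tolerance check on squared quantities, which avoids a square root.
        if sum(r * r for r in residual) < tol * tol:
            return x, iteration
        # Velocity update driven by b - A x - mu v, then position update.
        for i in range(n):
            v[i] += dt * (residual[i] - mu * v[i])
        for i in range(n):
            x[i] += dt * v[i]
    raise RuntimeError("DFPM did not converge within max_iter iterations")


# Small symmetric positive definite example (hypothetical test problem).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution, iterations = dfpm_solve(A, b)
print(solution, iterations)
```

For this example the exact solution is x = (1/11, 7/11). The damping mu and
the step dt control how quickly the iteration settles; the values used here
were chosen for the example only.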
Tolerance check
--------------------------------------------------------------
-- ...
--
-- Create Date: 10:42:33 01/07/2015
-- Design Name:
-- ...
--------------------------------------------------------------

library IEEE;
-- ...
--use IEEE.NUMERIC_STD.ALL;

-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity Signed_Tolerance_Check is
    Port ( Vector_B_AX        : in DFPM_SIGNED_VECTOR_5X32_BIT;
           Tolerance_Limit    : in Signed (32 downto 0);
           Iteration_Complete : in STD_LOGIC := '0';

           CLK : in STD_LOGIC := '0';
           RST : in STD_LOGIC := '0';

           Tolerance_Limit_Squared, Vector_B_AX_Sum : out Signed (32 downto 0);

           Iterate : out STD_LOGIC := '1');
end Signed_Tolerance_Check;

architecture Behavioral of Signed_Tolerance_Check is

    Signal Sig_Vector_B_AX, Sig_Vector_B_AX_Squared : DFPM_SIGNED_VECTOR_5X32_BIT;
    Signal Sig_Tolerance_Limit, Sig_Tolerance_Limit_Squared : Signed (32 downto 0);

    Signal Sig_Vector_B_AX_Sum : Signed(32 downto 0);

    Signal Sig_Position : integer := 0;

    Signal Sig_ShiftPosition, Sig_Multiplication_Is_Complete,
           Sig_Check_Tolerance_Limit : STD_LOGIC := '0';

begin

    Tolerance_Limit_Squared <= Sig_Tolerance_Limit_Squared;
    Vector_B_AX_Sum <= Sig_Vector_B_AX_Sum;

    -- This process determines when data stored internally are to be
    -- serially multiplied.
    -- They are serially multiplied to save on multipliers
    process(CLK, RST, Iteration_Complete, Sig_ShiftPosition, Sig_Position)
        Variable Var_Position: integer := 0;
    begin
        if rising_edge(CLK) then
            if (RST = '1') then
                Sig_Position <= 0;
                Sig_ShiftPosition <= '0';
                Sig_Multiplication_Is_Complete <= '0';
            elsif (Iteration_Complete = '1') then
                Sig_Check_Tolerance_Limit <= '0';
                -- ...
            elsif (Sig_Multiplication_Is_Complete = '1') then
                Sig_Check_Tolerance_Limit <= '1';
            else
                if (Sig_ShiftPosition = '1') then
                    if (Sig_Position = 5) then
                        -- ...
                    else
                        Var_Position := Sig_Position;
                        Sig_Position <= Var_Position + 1;
                    end if;
                end if;
            end if;
        end if;
    end process;

    -- Storing data internally when the signal from the SubtrAndMult module
    -- goes high
    process(Iteration_Complete)
        Variable productTempStore : Signed(65 downto 0) := (Others => '0');
    begin
        if rising_edge(Iteration_Complete) then
            Sig_Tolerance_Limit <= Tolerance_Limit;
            Sig_Vector_B_AX <= Vector_B_AX;
        end if;
    end process;

    -- Serial multiplication
    process(CLK, Sig_ShiftPosition, Sig_Position)
        Variable productTempStore : Signed(65 downto 0);
    begin
        if rising_edge(clk) then
            -- Note: "<=" in this condition is the less-than-or-equal
            -- comparison, not an assignment
            if (Sig_ShiftPosition <= '1') then
                Case Sig_Position is
                    when 0 =>
                        productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                        Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                    when 1 =>
                        productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                        Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                    when 2 =>
                        productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                        Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                    when 3 =>
                        productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                        Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                    when 4 =>
                        productTempStore := (Sig_Vector_B_AX(Sig_Position) * Sig_Vector_B_AX(Sig_Position));
                        Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48 downto 16);
                    when 5 =>
                        productTempStore := Sig_Tolerance_Limit * Sig_Tolerance_Limit;
                        Sig_Tolerance_Limit_Squared <= productTempStore(48 downto 16);
                    when others =>
                        NULL;
                End case;
            end if;
        end if;
    end process;

    process(Sig_Multiplication_Is_Complete)
        variable Var_Vector_B_AX_Sum : Signed (36 downto 0);
    begin
        if rising_edge(Sig_Multiplication_Is_Complete) then
            Var_Vector_B_AX_Sum := ("0000" & Sig_Vector_B_AX_Squared(0) + Sig_Vector_B_AX_Squared(1)
                                    + Sig_Vector_B_AX_Squared(2) + Sig_Vector_B_AX_Squared(3)
                                    + Sig_Vector_B_AX_Squared(4));
            -- ...
