Reducing Computational Complexity of Mathematical Functions Using FPGA
Neha Gour, M.Tech. (VLSI Design)
Department of Electronics, Banasthali University
Guided by:
Prof. Arup Banerjee
RRCAT, Indore
Outline
Introduction
Motivation
Conventional Processor vs. FPGA
Basic skeleton of project
Work carried out
I. Optimization Directives
II. Addition of Fixed Point Numbers
III. Addition of Floating Point Numbers
IV. Matrix Multiplication using Integer Numbers
Result and Discussion
Introduction
• Analysis of complex algorithms demands short execution time and low storage space.
• Computationally intensive applications include machine learning, weather forecasting, big data, computational biology, etc.
• Such applications require a lot of computation time because the task is usually carried out sequentially.
Motivation
• To reduce computation time and make the system more efficient using the concept of parallelism.
• Parallel processing on an FPGA is one possible way to reduce execution time.
• The objective of this work is to improve the execution time of mathematical functions using optimization directives such as loop pipelining and loop unrolling.
Conventional Processor vs. FPGA
Conventional Processor:
• Sequential processing device.
• A large number of clock cycles are required to perform a specific task.
• It has a fixed ALU.
FPGA:
• Parallel processing device.
• Fewer clock cycles are required to execute the task.
• It has programmable ALUs.
Overview of an FPGA
• 2-D array of logic blocks with electrically programmable interconnections.
• The user can configure:
I. the interconnections between logic blocks,
II. the function of each block.
Figure: Basic structure of an FPGA (labelled element: Look-Up Tables)
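As a rough software analogy for a configurable logic block, a look-up table can be modelled as a small truth table indexed by its inputs. The sketch below assumes a 4-input LUT whose configuration is stored in a 16-bit mask; the names lut4_eval, LUT_AND4 and LUT_XOR4 are illustrative and are not taken from any vendor tool.

```c
#include <stdint.h>

/* Two example configurations (assumed for illustration). */
#define LUT_AND4 0x8000u /* output is 1 only when all four inputs are 1 */
#define LUT_XOR4 0x6996u /* output is 1 when an odd number of inputs are 1 */

/* Hypothetical 4-input LUT: bit n of `config` holds the output for input pattern n. */
unsigned lut4_eval(uint16_t config, unsigned a, unsigned b, unsigned c, unsigned d)
{
    /* Pack the four input bits into an index from 0 to 15. */
    unsigned index = (a & 1u) | ((b & 1u) << 1) | ((c & 1u) << 2) | ((d & 1u) << 3);
    return ((unsigned)config >> index) & 1u;
}
```

Changing only the 16-bit mask changes the logic function, which is the sense in which the function of each block is configurable.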
Basic Skeleton of the Project
Figure: block diagram with a high-level language (C program) block and LUT 1, LUT 2, LUT 3
* Parallel processing on the FPGA is implemented using High-Level Synthesis (HLS).
Work Carried Out
C language → applying directives (I. loop pipeline, II. loop unroll) → FPGA
Fig. Basic block diagram of the design
NOTE: HLS is performed with Vivado HLS (2017.2). Hardware implementation is done on an Artix-7 FPGA.
I. Optimization Directives
1. Sequential loop
Figure: A[5] and B[5] (data 1 to 5 each) are added element by element to give C[0] to C[4].
for (i = 0; i < 5; i++)
{
    C[i] = A[i] + B[i];
}
1. Loop unrolling
• Loop unrolling is a directive that exploits the parallelism between loop iterations.
• It creates multiple copies of the loop body and adjusts the loop iteration counter accordingly.
• Directive command: #pragma HLS UNROLL factor=<INTEGER>
Unrolled loop (fully unrolled: one statement per original iteration)
C[0] = A[0] + B[0];
C[1] = A[1] + B[1];
C[2] = A[2] + B[2];
C[3] = A[3] + B[3];
C[4] = A[4] + B[4];
Figure: loop unrolling of C[i] = A[i] + B[i] for A[5] and B[5] (data 1 to 5 each); five adders produce C[0] to C[4].
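A minimal sketch of how the unroll directive might be placed inside a Vivado HLS style C function. The function name vec_add_unrolled and the macro N are assumptions made for illustration. A regular C compiler ignores the unknown pragma, so the same function can be checked in software first.

```c
#define N 5 /* array length used in the example above */

/* Illustrative top-level function (the name is hypothetical). */
void vec_add_unrolled(const int A[N], const int B[N], int C[N])
{
    int i;
    for (i = 0; i < N; i++) {
#pragma HLS UNROLL
        /* With no factor given, the tool is asked to unroll the loop
           completely, so the five additions become separate operations. */
        C[i] = A[i] + B[i];
    }
}
```

Whether all additions really execute in the same clock cycle also depends on how the arrays are mapped to memories or ports, which this sketch does not address.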
2. Loop Pipelining
• Loop pipelining allows the operations in a loop to be implemented concurrently.
• With pipelining, the next iteration of the loop can start before the current iteration has finished.
• Directive command: #pragma HLS PIPELINE II=<INTEGER>*
*Initiation Interval (II) is the number of clock cycles between the start times of consecutive loop iterations.
Figure: loop pipelining of C[i] = A[i] + B[i] for A[5] and B[5] (data 1 to 5 each); five additions produce C[0] to C[4].
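The pipeline directive can be sketched in the same way, together with the usual rough estimate of how many cycles a pipelined loop needs: one full iteration latency plus II cycles for every further iteration. The names vec_add_pipelined and pipelined_loop_cycles are hypothetical, and the estimate is a textbook approximation rather than a figure reported by the tool.

```c
#define N 5 /* array length used in the example above */

/* Illustrative top-level function (the name is hypothetical). */
void vec_add_pipelined(const int A[N], const int B[N], int C[N])
{
    int i;
    for (i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        /* II=1 asks the tool to start a new iteration every clock cycle. */
        C[i] = A[i] + B[i];
    }
}

/* Approximate cycle count of a pipelined loop:
   iteration_latency + ii * (trip_count - 1). */
unsigned pipelined_loop_cycles(unsigned trip_count, unsigned ii, unsigned iteration_latency)
{
    if (trip_count == 0) {
        return 0;
    }
    return iteration_latency + ii * (trip_count - 1);
}
```

For example, 8 iterations with II = 1 and an assumed iteration latency of 3 cycles give 3 + 1 × 7 = 10 cycles; the iteration latency here is an assumed value chosen for illustration.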
II. Addition of Fixed-Point Numbers
1. Sequential loop
int i;
int A[8], B[8], C[8];
for (i = 0; i < 8; i++) {
    C[i] = A[i] + B[i];
}
Fig. Fixed-point addition simulation result of the sequential process
2. Loop Pipelining
Fig. Fixed-point addition simulation result after applying pipelining
for (i = 0; i < 8; i++)
{
#pragma HLS pipeline II=1
    C[i] = A[i] + B[i];
}
3. Loop Unrolling
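Fig. Fixed-point addition simulation result after applying unrolling
/* Fully unrolled: one statement per original iteration, no loop counter */
C[0] = A[0] + B[0];
C[1] = A[1] + B[1];
C[2] = A[2] + B[2];
C[3] = A[3] + B[3];
C[4] = A[4] + B[4];
C[5] = A[5] + B[5];
C[6] = A[6] + B[6];
C[7] = A[7] + B[7];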
Hardware Realization of Fixed-Point Addition
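1. Sequential loop
Figure: state diagram with State 1, State 2 and State 3 (labels: RESET, EXIT CONDITION, OPERATION, EXIT=1, EXIT=0)
2. Loop pipelining
Figure: state diagram with State 1, State 2 and State 3 (labels: RESET, EXIT CONDITION, OPERATION, EXIT=0, EXIT=1)
3. Loop unrolling
Figure: state diagram with State 1 and State 2 (labels: RESET, EXIT, OPERATION)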
PARAMETER      Sequential   Pipeline   Unroll
Loop latency   17           10         1
LUTs           50           67         117
I/O ports      142          187        468
Comparison of optimization directives for fixed-point addition
III. Addition of Floating Point Numbers
• For the addition and multiplication of floating-point numbers, floating-point IPs have to be used.
• Numbers written in scientific notation have three components: sign, exponent and mantissa.
Single-precision format (32 bit):
Field      Bits       Width
Sign       31         1 bit
Exponent   30 to 23   8 bits
Mantissa   22 to 0    23 bits
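The three fields of the single-precision layout can be pulled out of a value with shifts and masks. This is a minimal sketch that assumes the platform's float is an IEEE 754 32-bit value; the type float_fields and the function split_float are names invented for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative container for the three fields (hypothetical type). */
typedef struct {
    uint32_t sign;     /* bit 31 */
    uint32_t exponent; /* bits 30 to 23, stored with a bias of 127 */
    uint32_t mantissa; /* bits 22 to 0 */
} float_fields;

float_fields split_float(float x)
{
    uint32_t bits;
    float_fields f;
    /* Copy the raw bit pattern; assumes sizeof(float) == 4. */
    memcpy(&bits, &x, sizeof bits);
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For 1.0f this gives sign 0, stored exponent 127 and mantissa 0; for -2.5f it gives sign 1, stored exponent 128 and mantissa 0x200000.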
Figure: Floating-point addition block diagram (IP)
1. Sequential loop
int i;
float A[8], B[8], C[8];
for (i = 0; i < 8; i++) {
    C[i] = A[i] + B[i];
}
Fig. Floating-point addition simulation result of the sequential process
2. Loop Pipelining
Fig. Floating point addition simulation result after applying pipelining
3. Loop Unrolling
Fig. Floating point addition simulation result after applying unrolling
Hardware Realization of Floating-Point Addition
1. Sequential loop
Figure: state diagram with State 1 to State 9 (labels: RESET, EXIT CONDITION, EXIT=0, EXIT=1)
2. Loop pipelining
Figure: state diagram with State 1, State 2 and State 3 (labels: RESET, OPERATION, EXIT=0, EXIT=1)
3. Loop unrolling
Figure: state diagram with State 1 to State 4 (label: RESET)
PARAMETER      Sequential   Pipeline   Unroll
Loop latency   65           14         3
LUTs           272          326        1771
I/O ports      104          205        708
DSP48E         2            2          16
Comparison of optimization directives for floating-point addition
IV. Matrix multiplication using integer numbers
• Algorithm of matrix multiplication
start
Read A[3][3], B[3][3]
For i = 0 to 2 (rows)
    For j = 0 to 2 (columns)
        For k = 0 to 2 (product)
            C[i][j] += A[i][k] * B[k][j];
* A sequential implementation of the above code takes 79 clock cycles to complete.
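The algorithm above could be written as a synthesizable C function along the following lines. This is a sketch under assumptions: the function name matmul3, the macro DIM, the local accumulator, and the placement of the pipeline pragma on the innermost loop are illustrative choices and are not taken from the slides.

```c
#define DIM 3 /* 3x3 matrices, as in the algorithm above */

/* Illustrative 3x3 integer matrix multiplication (the name is hypothetical). */
void matmul3(int A[DIM][DIM], int B[DIM][DIM], int C[DIM][DIM])
{
    int i, j, k;
    for (i = 0; i < DIM; i++) {         /* rows of A */
        for (j = 0; j < DIM; j++) {     /* columns of B */
            int acc = 0;                /* running sum for C[i][j] */
            for (k = 0; k < DIM; k++) { /* dot product */
#pragma HLS PIPELINE II=1
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}
```

Using a local accumulator means C does not have to be cleared beforehand, which the C[i][j] += form would require.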
1. Sequential implementation of matrix multiplication
Fig. Matrix multiplication simulation result by conventional method
Figure: Matrix multiplication with integer numbers, block diagram (IP)
PARAMETER      Sequential   Pipelining   Unrolling
Loop latency   79           21           10
LUTs           142          282          367
I/O ports      41           58           270
DSP48E         1            2            6
Comparison of optimization directives for matrix multiplication
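The loop-latency row of the matrix multiplication table can be converted into a percentage reduction with (sequential − optimized) / sequential × 100. The helper below is a small illustration of that arithmetic; the function name latency_reduction_percent is an assumption.

```c
/* Percentage reduction of `optimized` relative to `baseline`,
   both given in clock cycles. `baseline` must be non-zero. */
double latency_reduction_percent(unsigned baseline, unsigned optimized)
{
    return 100.0 * ((double)baseline - (double)optimized) / (double)baseline;
}
```

With the values from the table, 79 and 21 cycles give roughly 73%, and 79 and 10 cycles give roughly 87%.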
Results and Conclusion
• The application of optimization directives has been explored to reduce execution time.
• Loop pipelining and loop unrolling of fixed-point addition reduce the delay by approx. 28% and 71%, and increase the hardware by 14% and 68% respectively, compared with the sequential implementation.
• Experimental results demonstrate that pipelining and unrolling of floating-point addition reduce the delay by 72% and 91%, and increase the hardware 2 times and 5 times respectively, compared with sequential processing. With unrolling, DSP48E usage increases from 2 to 16 slices.
• Loop pipelining and loop unrolling of matrix multiplication with integer entries reduce the delay by approx. 73% and 87%, and increase the hardware by 30% and nearly 75% respectively, compared with the sequential implementation. An increase in DSP48E slices is also observed.
• Simulation results show that the proposed design reduces the time complexity of mathematical functions.
References
[1] Bob Z., ”Introduction to FPGA design” . Embedded system conference Europe.1999 classes 304-
314.
[2] James H., “Floating Point Design with Vivado HLS XAP599 (v1.0)” September 20, 2012
[3] Spyridon G., John E., “Overview of High-Level Synthesis tool”. Topical workshop on electronics for
particle physics. 2010
[4] Sumit G., Rajesh G., Nikhil D. D., Alexandru N., “SPARK: A Parallelization approach to the High-
Level Synthesis of Digital Circuit”. 2004, Springer Science US.
[5] Mohsen E., “Reducing system power and cost with Artix-7 FPGAs” . Xilinx, Artix-7, 2012:7:1-2.
33
Acknowledgement
• I would like to thank my project coordinator Prof. Arup Banerjee, who gave me the opportunity
to do this wonderful project.
• I would like to convey my sincere thanks and deepest regards to Dr. Srivathsan Vasudevan and
Dr. Satya S. Bulusu of IIT Indore for technical discussions and guidance for this work.

College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 

Reducing computational complexity of Mathematical functions using FPGA

  • 1. Reducing Computational Complexity of Mathematical Functions Using FPGA. Neha Gour, M.Tech. (VLSI Design), Department of Electronics, Banasthali University. Guided by: Prof. Arup Banerjee, RRCAT, Indore.
  • 2. Outline: Introduction; Motivation; Conventional Processor vs. FPGA; Basic skeleton of project; Work carried out (I. Optimization Directives, II. Addition of Fixed Point Numbers, III. Addition of Floating Point Numbers, IV. Matrix Multiplication using Integer Numbers); Result and Discussion.
  • 3. Introduction
      • Analysis of complex algorithms demands short execution time and low storage space.
      • Computationally intensive applications such as machine learning, weather forecasting, big data, and computational biology require a lot of computation time, because the task is usually executed sequentially.
  • 4. Motivation
      • To reduce the computation time and make the system efficient using the concept of parallelism.
      • Parallel processing on an FPGA is one possible way to reduce execution time.
      • The objective of this work is to improve the execution time of mathematical functions using optimization directives such as loop pipelining and loop unrolling.
  • 5. Conventional Processor vs. FPGA
      Conventional Processor                          FPGA
      Sequential processing device.                   Parallel processing device.
      A large number of clock cycles is required      Fewer clock cycles are required
      to perform a specific task.                     to execute the task.
      It has a fixed ALU.                             It has programmable ALUs.
  • 6. Overview of an FPGA
      • 2-D array of logic blocks with electrically programmable interconnections.
      • The user can configure: I. the interconnection between logic blocks, II. the function of each block.
      Figure: Basic structure of FPGA
  • 7. Basic Skeleton of the Project
      Figure: High-level language (C program) mapped onto look-up tables (LUT 1, LUT 2, LUT 3).
      * Parallel processing through FPGA is implemented using High-Level Synthesis (HLS).
  • 8. Work Carried Out
      Fig. Basic block diagram of design: C language → applying directives (I. loop pipeline, II. loop unroll) → FPGA.
      NOTE: HLS is done with Vivado HLS (2017.2) software. Hardware implementation is done using an Artix-7 FPGA.
  • 10. I. Optimization Directives: 1. Loop unrolling
      • Loop unrolling is a directive that exploits the parallelism between loop iterations.
      • It creates multiple copies of the loop body and adjusts the loop iteration counter accordingly.
      • Directive command: #pragma HLS UNROLL factor=<INTEGER>
      • Unrolled loop (five iterations, unrolled by a factor of 5, so the counter advances by 5):
          for (i = 0; i < 5; i += 5) {
              C[i]   = A[i]   + B[i];
              C[i+1] = A[i+1] + B[i+1];
              ...
              C[i+4] = A[i+4] + B[i+4];
          }
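To make the "multiple copies of the loop body" idea concrete, here is a minimal sketch in C. The function names `vec_add_rolled` and `vec_add_unrolled4` and the factor of 4 are illustrative choices, not taken from the slides. The first function shows where the directive would sit; the second is a hand-written picture of the transformation, not the tool's actual output.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop with the directive in place. An HLS tool reads the pragma
   and replicates the body; an ordinary C compiler ignores the unknown
   pragma, so the function still computes c = a + b element by element. */
void vec_add_rolled(const int *a, const int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
#pragma HLS UNROLL factor=4
        c[i] = a[i] + b[i];
    }
}

/* Hand-written equivalent of unrolling by 4: four independent additions
   per trip through the loop, and a counter that advances by 4. */
void vec_add_unrolled4(const int *a, const int *b, int *c, size_t n)
{
    assert(n % 4 == 0); /* this sketch only handles lengths divisible by 4 */
    for (size_t i = 0; i < n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
}
```

Because the four additions in the unrolled body do not depend on one another, hardware can perform them in the same clock cycle, at the cost of four adders instead of one.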
  • 12. 2. Loop Pipelining
      • Loop pipelining allows the operations in a loop to be implemented in a concurrent manner.
      • With pipelining, the next iteration of the loop can start before the current iteration has finished.
      • Directive command: #pragma HLS PIPELINE II=<INTEGER>
      * Initiation Interval (II) is the number of clock cycles between the start times of consecutive loop iterations.
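The effect of the initiation interval can be estimated with a simple idealized cycle count. The sketch below is an assumption-laden model, not something stated on the slides: it ignores loop entry and exit overhead, and `depth` (cycles for one iteration) as well as both function names are hypothetical.

```c
#include <assert.h>

/* Cycles when iterations run strictly one after another:
   each of the n iterations occupies the hardware for `depth` cycles. */
unsigned sequential_cycles(unsigned n, unsigned depth)
{
    return n * depth;
}

/* Cycles when the loop is pipelined: the first iteration needs `depth`
   cycles, and every later iteration starts `ii` cycles after the one
   before it. */
unsigned pipelined_cycles(unsigned n, unsigned ii, unsigned depth)
{
    assert(ii >= 1); /* an initiation interval below 1 is not meaningful */
    if (n == 0) {
        return 0;
    }
    return (n - 1) * ii + depth;
}
```

For example, with 8 iterations of a hypothetical 4-cycle body, this model gives 32 cycles sequentially and 11 cycles with II=1.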
  • 14. II. Addition of Fixed-Point Numbers: 1. Sequential loop
          int i;
          int A[8], B[8], C[8];
          for (i = 0; i < 8; i++) {
              C[i] = A[i] + B[i];
          }
      Fig. Fixed-point addition simulation result of sequential process
  • 15. 2. Loop Pipelining
          for (i = 0; i < 8; i++) {
          #pragma HLS PIPELINE II=1
              C[i] = A[i] + B[i];
          }
      Fig. Fixed-point addition simulation result after applying pipelining
  • 16. 3. Loop Unrolling
          for (i = 0; i < 8; i += 8) {
              C[i]   = A[i]   + B[i];
              C[i+1] = A[i+1] + B[i+1];
              ...
              C[i+7] = A[i+7] + B[i+7];
          }
      Fig. Fixed-point addition simulation result after applying unrolling
  • 17. Hardware Realization of Fixed-Point Addition: 1. Sequential loop
      Figure: state diagram with State 1, State 2, State 3 (labels: RESET, EXIT CONDITION, OPERATION, EXIT=1, EXIT=0).
  • 18. 2. Loop pipelining
      Figure: state diagram with State 1, State 2, State 3 (labels: RESET, EXIT CONDITION, OPERATION, EXIT=0, EXIT=1).
  • 19. 3. Loop unrolling
      Figure: state diagram with State 1, State 2 (labels: RESET, EXIT, OPERATION).
      Comparison of optimization directives for fixed-point addition:
      Parameter        Sequential   Pipeline   Unroll
      Loop latency     17           10         1
      LUTs             50           67         117
      I/O ports        142          187        468
  • 20. III. Addition of Floating Point Numbers
      • For the addition and multiplication of floating-point numbers we have to use floating-point IPs.
      • Numbers written in scientific notation have three components: sign, exponent, and mantissa.
      Single-precision format (32 bit):
      Field       Bits     Width
      Sign        31       1 bit
      Exponent    30-23    8 bits
      Mantissa    22-0     23 bits
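The field layout above can be checked on a host machine by splitting a `float` into its three parts. This is a minimal sketch assuming the platform's `float` is the 32-bit IEEE 754 single-precision type; the struct `float_fields` and the function `decode_float` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three components of a single-precision number, as raw bit fields. */
typedef struct {
    uint32_t sign;     /* bit 31      */
    uint32_t exponent; /* bits 30..23 */
    uint32_t mantissa; /* bits 22..0  */
} float_fields;

float_fields decode_float(float x)
{
    uint32_t bits;
    float_fields f;

    /* Assumption: float is 32 bits wide on this platform. */
    assert(sizeof bits == sizeof x);
    /* memcpy copies the bit pattern without violating aliasing rules. */
    memcpy(&bits, &x, sizeof bits);

    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For instance, `1.0f` decodes to sign 0, stored exponent 127, and mantissa 0, because the exponent field is stored with a bias of 127.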
  • 22. 1. Sequential loop
          int i;
          float A[8], B[8], C[8];
          for (i = 0; i < 8; i++) {
              C[i] = A[i] + B[i];
          }
      Fig. Floating-point addition simulation result of sequential process
  • 23. 2. Loop Pipelining
      Fig. Floating-point addition simulation result after applying pipelining
      3. Loop Unrolling
      Fig. Floating-point addition simulation result after applying unrolling
  • 24. Hardware Realization of Floating-Point Addition: 1. Sequential loop
      Figure: state diagram with State 1 through State 9 (labels: RESET, EXIT CONDITION, EXIT=0, EXIT=1).
  • 25. 2. Loop pipelining
      Figure: state diagram with State 1, State 2, State 3 (labels: RESET, OPERATION, EXIT=0, EXIT=1).
  • 26. 3. Loop Unrolling
      Figure: state diagram with State 1, State 2, State 3, State 4 (label: RESET).
  • 27. Comparison of optimization directives for floating-point addition:
      Parameter        Sequential   Pipeline   Unroll
      Loop latency     65           14         3
      LUTs             272          326        1771
      I/O ports        104          205        708
      DSP48E           2            2          16
  • 28. IV. Matrix multiplication using integer numbers
      • Algorithm of matrix multiplication:
          start
          read A[3][3], B[3][3]
          for i = 0 to 2              (rows)
              for j = 0 to 2          (columns)
                  for k = 0 to 2      (product)
                      C[i][j] += A[i][k] * B[k][j];
      * A sequential implementation of the above code takes 79 clock cycles to complete.
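The algorithm above can be written as plain C along the following lines. This is a sketch rather than the author's synthesized source: the function name `matmul3` and the macro `DIM` are illustrative, no HLS directives are shown because the slides do not say where they were placed, and a local accumulator is used so that the output does not need to be zeroed beforehand.

```c
#include <assert.h>

#define DIM 3

/* c = a * b for 3x3 integer matrices. */
void matmul3(int a[DIM][DIM], int b[DIM][DIM], int c[DIM][DIM])
{
    /* The output must not alias an input, because entries of c are
       written while a and b are still being read. */
    assert(c != a && c != b);

    for (int i = 0; i < DIM; i++) {         /* rows of a */
        for (int j = 0; j < DIM; j++) {     /* columns of b */
            int acc = 0;
            for (int k = 0; k < DIM; k++) { /* terms of the dot product */
                acc += a[i][k] * b[k][j];
            }
            c[i][j] = acc;
        }
    }
}
```

The three nested loops perform 27 multiply-accumulate steps in total, which is the work that pipelining and unrolling then spread across parallel hardware.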
  • 29. 1. Sequential implementation of matrix multiplication
      Fig. Matrix multiplication simulation result by conventional method
  • 30. Matrix multiplication with integer numbers: block diagram (IP)
  • 31. Comparison of optimization directives for matrix multiplication:
      Parameter                      Sequential   Pipelining   Unrolling
      Loop latency (clock cycles)    79           21           10
      LUTs                           142          282          367
      I/O ports                      41           58           270
      DSP48E                         1            2            6
  • 32. Results and Conclusion
      • The application of optimization directives has been explored to reduce execution time.
      • For fixed-point addition, loop pipelining and loop unrolling reduce the delay by approx. 28% and 71%, and increase the hardware by 14% and 68% respectively, compared to the sequential implementation.
      • For floating-point addition, experimental results show that pipelining and unrolling reduce the delay by 72% and 91%, and increase the hardware 2 times and 5 times respectively, compared to sequential processing. DSP48E usage increases from 2 slices to 16 with unrolling.
      • For matrix multiplication with integer entries, loop pipelining and loop unrolling reduce the delay by approx. 73% and 87%, and increase the hardware by 30% and nearly 75% respectively, compared to the sequential implementation. An increase in DSP48E slices is also observed.
      • Simulation results show that the proposed design reduces the execution time of the mathematical functions studied.
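The matrix-multiplication delay figures can be reproduced from the loop latencies in the slide 31 table (79, 21, and 10 clock cycles). The helper below is a small sketch of that arithmetic; the function name `percent_reduction` is hypothetical, and only the matrix-multiplication case is checked here.

```c
#include <assert.h>

/* Percentage by which `optimized` is smaller than `baseline`. */
double percent_reduction(double baseline, double optimized)
{
    assert(baseline > 0.0); /* a zero baseline would divide by zero */
    return 100.0 * (baseline - optimized) / baseline;
}
```

With a baseline of 79 cycles, 21 cycles gives (79 - 21) / 79, about 73.4%, and 10 cycles gives (79 - 10) / 79, about 87.3%, in line with the approx. 73% and 87% quoted for matrix multiplication.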
  • 34. Acknowledgement
      • I would like to thank my project coordinator, Prof. Arup Banerjee, who gave me the opportunity to do this wonderful project.
      • I would like to convey my sincere thanks and deepest regards to Dr. Srivathsan Vasudevan and Dr. Satya S. Bulusu of IIT Indore for technical discussions and guidance for this work.