Design and Implementation of Single Precision
Pipelined Floating Point Co-Processor
Manisha Sangwan
PG Student, M.Tech VLSI Design
SENSE, VIT University
Chennai, India - 600048
manishasangwan47@gmail.com
A Anita Angeline
Professor
SENSE, VIT University
Chennai, India -600048
Abstract—Floating point numbers are used in various
applications such as medical imaging, radar and
telecommunications. This paper compares various arithmetic
modules and presents the implementation of an optimized
floating point ALU. A pipelined architecture is used to increase
performance, raising the operating frequency by a factor of
1.62. The logic is designed in Verilog HDL. Synthesis is
performed with Cadence Encounter after timing and logic
simulation.
Keywords—CLA; clock-cycles; GDM; HDL; IEEE 754;
pipelining; verilog
I. INTRODUCTION
These days computers are incorporated in many applications
such as medical imaging, radar, audio system design, signal
processing, industrial control and telecommunications.
Several key factors are considered before choosing a number
system: the computational capabilities required by the
application, processor and system cost, accuracy, complexity
and performance. Over the years designers have moved from
fixed point to floating point operations because of the wide
dynamic range of floating point numbers, which can represent
both very small and very large values; at the same time, their
accuracy is limited, so a trade-off has to be made to obtain an
optimized architecture.
The IEEE 754 standard for floating point numbers was
adopted almost twenty years ago. The single precision format
is a 32-bit number and the double precision format is a 64-bit
number.
The storage layout consists of three components: the sign,
the exponent and the mantissa. The mantissa includes the
implicit leading bit and the fractional part.

TABLE I. FLOATING POINT REPRESENTATION

                   Sign     Exponent     Fraction     Bias
Single Precision   1 [31]   8 [30-23]    23 [22-00]   127
Double Precision   1 [63]   11 [62-52]   52 [51-00]   1023
Sign Bit: It defines whether the number is positive or
negative. If it is 0 the number is positive, otherwise it is
negative.
Exponent: Both positive and negative exponents are
represented by this field. To do this, a bias is added to the
actual exponent in order to obtain the stored exponent [10].
For single precision the bias is 127 and for double precision
it is 1023.
Mantissa: The mantissa consists of the implicit leading bit
and the fractional part. It is represented in the form 1.f, where
the 1 is implicit and f is the fractional part. The mantissa is
also known as the significand.
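The field layout above can be illustrated with a short sketch (Python here, purely for illustration; the ALU itself is written in Verilog, and the function name `decompose` is our own) that unpacks a single precision number into its sign, stored exponent and fraction:

```python
import struct

def decompose(x):
    """Split a float into the IEEE 754 single-precision fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 bit  [31]
    exponent = (bits >> 23) & 0xFF   # 8 bits [30-23], biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits [22-00], implicit 1 not stored
    return sign, exponent, fraction

# -6.25 = -1.5625 * 2^2 -> sign 1, stored exponent 2 + 127 = 129
s, e, f = decompose(-6.25)
```

Note how the bias makes the stored exponent of -6.25 come out as 129 rather than 2, matching the Exponent paragraph above.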
II. IMPLEMENTATION
A. Adder and Subtractor
Algorithm
Fig. 1. Block diagram of Floating Point Adder and Subtractor
In adders the propagation of the carry from one adder block
to the next consumes a lot of time, but a Carry Look Ahead
(CLA) adder saves this propagation time by generating and
propagating the carries for consecutive blocks simultaneously.
So, for faster operation, a carry look ahead adder is used.
2013 International Conference on Advanced Electronic Systems (ICAES)
S[i] = X[i] ⊕ Y[i] ⊕ C[i]
G[i] = X[i] · Y[i]
P[i] = X[i] + Y[i]
C[i+1] = X[i] · Y[i] + X[i] · C[i] + Y[i] · C[i]
C[i+1] = G[i] + P[i] · C[i] = G[i] + P[i] · G[i-1] + P[i] · P[i-1] · C[i-1]
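As a behavioural sketch of how these relations remove the ripple, the following Python model (illustrative only; the actual design is in Verilog HDL, and `cla_add` is our own name) derives every carry from the generate and propagate terms:

```python
def cla_add(x, y, width=8):
    """Carry look-ahead addition: form generate/propagate terms per bit,
    then compute each carry from them instead of rippling through adders."""
    xb = [(x >> i) & 1 for i in range(width)]
    yb = [(y >> i) & 1 for i in range(width)]
    g = [xb[i] & yb[i] for i in range(width)]   # generate:  G[i] = X[i]·Y[i]
    p = [xb[i] | yb[i] for i in range(width)]   # propagate: P[i] = X[i]+Y[i]
    c = [0] * (width + 1)                       # carry-in = 0
    for i in range(width):
        c[i + 1] = g[i] | (p[i] & c[i])         # C[i+1] = G[i] + P[i]·C[i]
    s = [xb[i] ^ yb[i] ^ c[i] for i in range(width)]  # S[i] = X[i]⊕Y[i]⊕C[i]
    return sum(b << i for i, b in enumerate(s)) | (c[width] << width)
```

In hardware the carry recurrence is flattened into two-level logic per block, which is what buys the speed; the software loop only models the logical relations.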
B. Multiplication
Algorithm
Fig. 2. Block diagram of Floating Point Multiplier
Multiplication is an important block of the ALU. High
speed and low power multipliers come at the cost of
complexity, so a trade-off is needed between these to obtain
an optimized algorithm with a regular layout. Different
multiplication algorithms are available, such as the Booth,
modified Booth, Wallace, Baugh-Wooley and Braun
multipliers. Since both speed and a regular layout matter, the
modified Booth algorithm was chosen with both parameters in
mind. It is a powerful algorithm for signed-number
multiplication that treats positive and negative numbers
uniformly.
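A behavioural sketch of the radix-4 (modified) Booth recoding, in Python as an illustration of the algorithm rather than the hardware (`booth_multiply` is our own name): overlapping 3-bit groups of the multiplier are recoded into the digit set {-2, -1, 0, 1, 2}, halving the number of partial products and handling signed operands uniformly.

```python
def booth_multiply(a, b, width=8):
    """Radix-4 Booth multiplication of signed integers (b fits in `width` bits).
    Each digit comes from bits (b[2i+1], b[2i], b[2i-1]): d = -2*b[2i+1] + b[2i] + b[2i-1]."""
    if width % 2:
        width += 1
    m = (b & ((1 << width) - 1)) << 1   # two's complement of b, implicit 0 below LSB
    recode = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
              0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    product = 0
    for i in range(0, width, 2):        # one digit per pair of multiplier bits
        digit = recode[(m >> i) & 0b111]
        product += (digit * a) << i     # partial product, weighted by 4^(i/2)
    return product
```

The -2 weight of the top group is what absorbs the sign of a two's complement multiplier, so negative values need no separate correction step.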
C. Division
Algorithm
Fig. 3. Block diagram of Floating Point Division
For the division process the Goldschmidt (GDM) algorithm
is used. For this algorithm to be applied, both inputs must
first be normalized. The two multiplications in each iteration
are independent of each other and so can be executed in
parallel, which reduces the latency.
Algorithm for GDM computing Q = A/B in k iterations:
• Require B != 0 and |e0| < 1
• Initialize N = A, D = B, R = (1 - e0) / B
• For i = 0 to k
    N = N * R
    D = D * R
    R = 2 - D
• End for
• Q = N
• Return Q
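The iteration above can be sketched in Python (a behavioural model only; `goldschmidt_divide` is our own name). One simplification relative to the listing: with the divisor normalized to [1, 2), the initial factor is taken as R = 2 - D, a common choice that avoids needing 1/B up front:

```python
def goldschmidt_divide(a, b, iterations=5):
    """Goldschmidt division Q = A/B: scale numerator and denominator by the
    same factor each round so the denominator converges to 1 and the
    numerator converges to the quotient. Assumes b is normalized to [1, 2)."""
    n, d = a, b
    r = 2.0 - d          # initial approximation of 1/b
    for _ in range(iterations):
        n *= r           # these two multiplies are independent and can
        d *= r           # run in parallel in hardware
        r = 2.0 - d      # next correction factor
    return n

q = goldschmidt_divide(1.5, 1.25)
```

The error in d is squared every round (quadratic convergence), which is why a handful of iterations suffices for single precision.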
D. Pipelining
The speed of execution of an instruction can be improved by
a number of methods, such as using a faster circuit technology
to build the processor, or arranging the hardware so that
multiple operations can be performed at the same time [11].
With pipelining, multiple operations are performed
simultaneously without changing the execution time of an
individual instruction. As shown in the example below, where
F is the fetch stage and E is the execute stage, in sequential
execution the third instruction completes in the sixth clock
cycle, but in a pipelined architecture the same instruction
completes in the fourth clock cycle, saving two clock cycles.
Therefore, as the instruction count increases, more clock
cycles are saved.
Clock cycles:   1    2    3    4    5    6
             |___I1____|___I2____|___I3____|

Fig. 4. Sequential Execution

Clock cycles:   1    2    3    4
                F1   E1
                     F2   E2
                          F3   E3

Fig. 5. Pipelined Execution
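The cycle counts behind Figs. 4 and 5 can be sketched as follows (Python, illustrative; the function names are our own, and the two-stage fetch/execute pipeline of the figures is assumed):

```python
def sequential_cycles(n, stages=2):
    """Each instruction occupies the machine alone: n * stages cycles."""
    return n * stages

def pipelined_cycles(n, stages=2):
    """Fill the pipeline once, then complete one instruction per cycle."""
    return stages + (n - 1)

# Three instructions through a 2-stage (fetch, execute) pipeline:
# 6 cycles sequentially vs 4 cycles pipelined, saving 2 cycles.
```

As n grows, the saving approaches (stages - 1) cycles per instruction, which is the "more clock cycles saved" effect noted above.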
III. FUNCTIONAL AND TIMING VERIFICATION
Functional verification is done using both Cadence and
Xilinx tools, and the arithmetic results are verified
theoretically. Timing, power and area analyses are likewise
performed with both Cadence and Xilinx.
A. Adder and Subtractor
In addition and subtraction, the sign, exponent and
fraction fields are processed separately. The exponent of the
smaller operand is equated to the larger one by shifting its
fraction accordingly, and the addition is then performed on
the fraction bits. The final result is assembled into a 32-bit
output [Fig 6].
Fig. 6. Simulation Waveform for Adder and Subtractor
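A behavioural sketch of this align-add-renormalize flow on decomposed fields (Python, illustrative only; rounding, subnormals and special values are ignored, and `fp_add` is our own name):

```python
def fp_add(s1, e1, f1, s2, e2, f2):
    """Toy single-precision add on (sign, biased exponent, 23-bit fraction)."""
    m1 = (1 << 23) | f1               # restore the implicit leading 1
    m2 = (1 << 23) | f2
    if e1 < e2:                       # align: shift the smaller operand right
        m1 >>= (e2 - e1); e = e2
    else:
        m2 >>= (e1 - e2); e = e1
    m1 = -m1 if s1 else m1            # apply signs, add as signed integers
    m2 = -m2 if s2 else m2
    m = m1 + m2
    s = 1 if m < 0 else 0
    m = abs(m)
    while m >= (1 << 24):             # renormalize back into [1, 2)
        m >>= 1; e += 1
    while m and m < (1 << 23):
        m <<= 1; e -= 1
    return s, e, m & 0x7FFFFF         # drop the implicit 1 again

# 1.5 + 2.5 = 4.0: result has stored exponent 129 (= 2 + 127), fraction 0
```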
B. Multiplier
In the multiplication block, the exponents are added and
the fraction bits are multiplied according to the algorithm.
The sign of the result is obtained by XORing the two input
sign bits. Finally, all the fields are combined into the final
result [Fig 7].
Fig. 7. Simulation Waveform for Multiplier
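The same flow for multiplication can be sketched on decomposed fields (Python, illustrative only; the product is truncated instead of rounded, subnormals and special values are ignored, and `fp_mul` is our own name):

```python
def fp_mul(s1, e1, f1, s2, e2, f2):
    """Toy single-precision multiply on (sign, biased exponent, 23-bit fraction)."""
    s = s1 ^ s2                               # sign: XOR of the input sign bits
    e = e1 + e2 - 127                         # adding biased exponents doubles
                                              # the bias, so subtract it once
    m = ((1 << 23) | f1) * ((1 << 23) | f2)   # 24b x 24b -> 48b product
    if m & (1 << 47):                         # product in [2, 4): shift once
        m >>= 1; e += 1
    return s, e, (m >> 23) & 0x7FFFFF         # truncate back to 23 bits

# 1.5 * -2.5 = -3.75: sign 1, stored exponent 128, fraction 0.875 = 0x700000
```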
C. Division
The exponents are subtracted, the fraction bits are
multiplied and subtracted according to the GDM algorithm,
and successive iterations are performed to converge on the
result; the sign bit is the XOR of the input sign bits. This
block consumes the most time and area [Fig 8].
Fig. 8. Simulation Waveform for Division
D. ALU Layout
In Fig 9, the final layout of the circuit is shown.
Fig. 9. ALU Layout
IV. SYNTHESIS RESULT
Synthesis results are shown in Table II below.

TABLE II. COMPARATIVE ANALYSIS OF EXISTING AND PROPOSED DESIGNS

                    Existing           Proposed
Leakage power       2.880282 µW        3.50267 µW
Dynamic power       11.377751 mW       16.14882 mW
Total power         11.380632 mW       16.15232 mW
Gate count          2881               3712
Frequency           225.65 MHz         367.654 MHz
Critical path       4.43164 ns         2.70 ns
Logic utilization   1% (466/38000)     4% (1780/46560)
IOs                 44% (130/296)      65% (157/240)
Area                75436              97194
V. CONCLUSION
In this paper various arithmetic modules are implemented
and compared. These individual blocks are then combined
into a floating point ALU in a pipelined manner, minimizing
power and increasing the operating frequency at the same
time. The comparative analyses are done on both Cadence and
Xilinx, and the simulation results are verified theoretically.
Verilog HDL (Hardware Description Language) is used to
design the whole ALU block. The total power of the existing
design, 11.380632 mW, is 0.70458 times that of the proposed
design, but the operating frequency of the proposed design is
1.62 times that of the existing design. The gate count and
area are also increased because of the number of iterations
used in the algorithm.
VI. FUTURE WORK
Optimization of the source code to decrease the area and gate
count will improve reliability. Low power techniques could
be incorporated to obtain a better trade-off.
REFERENCES
[1] Addanki Purna Ramesh, Ch. Pradeep, “FPGA Based Implementation of
Double Precision Floating Point Adder/Subtractor Using VERILOG”,
International Journal of Emerging Technology and Advanced
Engineering, Volume 2, Issue 7, July 2012
[2] Semih Aslan, Erdal Oruklu and Jafar Saniie, “A High Level Synthesis
and Verification Tool for Fixed to Floating Point Conversion”, 55th
IEEE International Midwest Symposium on Circuits and Systems
(MWSCAS 2012)
[3] Prashanth B.U.V, P. Anil Kumai, G. Sreenivasulu, “Design and
Implementation of Floating Point ALU on a FPGA Processor”,
International Conference on Computing, Electronics and Electrical
Technologies (ICCEET 2012), 2012
[4] Subhajit Banerjee Purnapatra, Siddharth Kumar, Subrata Bhattacharya,
“Implementation of Floating Point Operations on Fixed Point Processor
– An Optimization Algorithm and Comparative Analysis”, IEEE 10th
International Conference on Computer Information Technology (CIT
2010), 2010
[5] Ghassem Jaberipur, Behrooz Parhami, and Saeid Gorgin,
“Redundant-Digit Floating-Point Addition Scheme Based on a Stored
Rounding Value”, IEEE transactions on computer, vol. 59, no.
[6] Alexandre F. Tenca, “Multi-operand Floating-point Addition”, 19th
IEEE International Symposium on Computer Arithmetic, 2009.
[7] Cornea, “IEEE 754-2008 Decimal Floating-Point for Intel®
Architecture Processors”, 19th IEEE International Symposium on
Computer Arithmetic, 2009.
[8] Joy Alinda P. Reyes, Louis P. Alarcon, and Luis Alarilla, “A Study of
Floating-Point Architectures for Pipelined RISC Processors”, IEEE
International Symposium on Circuits and Systems, 2006.
[9] Peter-Michael Seidel, “High-Radix Implementation of IEEE
Floating-Point Addition”, Proceedings of the 17th IEEE Symposium on
Computer Arithmetic, 2005.
[10] Guillermo Marcus, Patricia Hinojosa, Alfonso Avila and Juan
Nolazco-Flores, “A Fully Synthesizable Single-Precision, Floating-Point
Adder/Subtractor and Multiplier in VHDL for General and Educational
Use”, Proceedings of the 5th IEEE International Caracas Conference on
Devices, Circuits and Systems, Dominican Republic, Nov.3-5, 2004.
[11] Carl Hamacher, Zvonko Vranesic, Safwat Zaky, “Computer
Organization”, 5th Edition, Tata McGraw-Hill Education, 2011.