This document summarizes a research paper that simulates an IEEE 754 standard double precision floating point multiplier using VHDL. It begins by introducing floating point arithmetic and the IEEE 754 standard. It then describes the double precision floating point format, which uses 64 bits in total: a 1-bit sign, an 11-bit exponent, and a 52-bit mantissa. The document explains that floating point multiplication involves multiplying the mantissas and adding the exponents, with an XOR of the signs. It notes that multiplying double precision operands requires generating 53 53-bit partial products. The objective of the research was to design an efficient double precision floating point multiplier using techniques such as Booth encoding to reduce the number of partial products and speed up the multiplication.
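The multiply-the-mantissas / add-the-exponents / XOR-the-signs scheme summarized above can be modelled in ordinary software. The following Python function is an illustrative behavioral sketch, not the paper's VHDL design: it truncates instead of applying IEEE rounding and ignores zeros, denormals, infinities and NaNs.

```python
import struct

def fp64_mul(a: float, b: float) -> float:
    """Multiply two doubles by operating directly on their IEEE 754 fields."""
    ia = struct.unpack("<Q", struct.pack("<d", a))[0]
    ib = struct.unpack("<Q", struct.pack("<d", b))[0]

    sign = (ia >> 63) ^ (ib >> 63)                   # XOR of the sign bits
    ea = (ia >> 52) & 0x7FF                          # biased (excess-1023) exponents
    eb = (ib >> 52) & 0x7FF
    ma = (ia & ((1 << 52) - 1)) | (1 << 52)          # restore the hidden leading 1
    mb = (ib & ((1 << 52) - 1)) | (1 << 52)

    exp = ea + eb - 1023                             # add exponents, remove one bias
    prod = ma * mb                                   # the 53x53-bit significand product

    if prod >> 105:                                  # product of [1,2) values lies in [1,4):
        prod >>= 1                                   # normalize when it falls in [2,4)
        exp += 1
    mant = (prod >> 52) & ((1 << 52) - 1)            # keep 52 fraction bits (truncate)

    out = (sign << 63) | (exp << 52) | mant
    return struct.unpack("<d", struct.pack("<Q", out))[0]
```

For operands whose product is exactly representable, the model agrees with the native result, e.g. fp64_mul(1.5, 2.0) returns 3.0.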
On Fixed Point Error Analysis of FFT Algorithm (IDES Editor)
In this correspondence, the analysis of overall quantization loss for Fast Fourier Transform (FFT) algorithms is extended to the case where the twiddle-factor word length differs from the register word length. First, a statistical noise model is presented that predicts the quantization error after the multiplication of two quantized signals of different precision. This model is then applied to FFT algorithms. Simulation results that corroborate the theoretical analysis are then presented.
(Paper) Efficient Evaluation Methods of Elementary Functions Suitable for SIMD Computation (Naoki Shibata)
Naoki Shibata: Efficient Evaluation Methods of Elementary Functions Suitable for SIMD Computation, Computer Science - Research and Development (Proceedings of the International Supercomputing Conference ISC10), Volume 25, Numbers 1-2, pp. 25-32, DOI: 10.1007/s00450-010-0108-2 (May 2010).
http://ito-lab.naist.jp/~n-sibata/pdfs/isc10simd.pdf
http://freecode.com/projects/sleef
Data-parallel architectures like SIMD (Single Instruction Multiple Data) or SIMT (Single Instruction Multiple Thread) have been adopted in many recent CPU and GPU architectures. Although some SIMD and SIMT instruction sets include double-precision arithmetic and bitwise operations, there are no instructions dedicated to evaluating elementary functions like trigonometric functions in double precision. Thus, these functions have to be evaluated one by one using an FPU or using a software library. However, traditional algorithms for evaluating these elementary functions involve heavy use of conditional branches and/or table look-ups, which are not suitable for SIMD computation. In this paper, efficient methods are proposed for evaluating the sine, cosine, arc tangent, exponential and logarithmic functions in double precision without table look-ups, scattering from, or gathering into SIMD registers, or conditional branches. We implemented these methods using the Intel SSE2 instruction set to evaluate their accuracy and speed. The results showed that the average error was less than 0.67 ulp, and the maximum error was 6 ulps. The computation speed was faster than the FPUs on Intel Core 2 and Core i7 processors.
An Area Efficient Vedic-Wallace based Variable Precision Hardware Multiplier ... (IDES Editor)
The complete architecture with the necessary blocks and their internal structures is proposed in this paper. In this algorithm the complete variable precision format is utilized for the multiplication of two numbers of size n x n bits. The internal multiplier is chosen for an m-bit size and is implemented using a Vedic-Wallace structure for a high speed implementation. The architecture includes the calculation of all the fields in the format for the complete output. The exponent of the final result is obtained using a carry save adder for fast computation with less area utilization. This multiplier uses the concept of a MAC unit, giving more accurate results, with the bit size of the final result being 2n.
Design and Performance Analysis of Convolutional Encoder and Viterbi Decoder ... (IJERA Editor)
In digital communication, forward error correction methods have great practical importance when the channel is noisy. Convolutional error correction codes can correct both types of errors, random and burst. Convolutional encoding has been used in digital communication systems including deep space communication and wireless communication. The error correction capability of a convolutional code depends on the code rate and the constraint length. A low code rate and a high constraint length give more error correction capability but also introduce large overhead. This paper introduces convolutional encoders for various constraint lengths; by increasing the constraint length the error correction capability can be increased. The performance and error correction also depend on the selection of the generator polynomial. This paper also introduces a good generator polynomial which has high performance and error correction capability.
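As a concrete illustration of the terms used in this abstract (code rate, constraint length, generator polynomial), a rate-1/2 convolutional encoder can be sketched in a few lines of Python. The constraint length K = 3 and the octal generators (7, 5) below are a common textbook choice, not taken from the paper.

```python
def conv_encode(bits, gens=(0b111, 0b101), k=3):
    """Rate-1/len(gens) convolutional encoder with constraint length k.

    For each input bit the shift register advances, and one output bit
    (the parity of the register bits tapped by the polynomial) is
    produced per generator polynomial.
    """
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & ((1 << k) - 1)    # shift in the new bit
        for g in gens:
            out.append(bin(state & g).count("1") & 1)  # parity of tapped bits
    return out
```

Encoding doubles the message length at this rate: conv_encode([1, 0, 1, 1]) yields [1, 1, 1, 0, 0, 0, 0, 1].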
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Design of an Adaptive Hearing Aid Algorithm using Booth-Wallace Tree Multiplier (Waqas Tariq)
The paper presents an FPGA implementation of a spectral sharpening process suitable for speech enhancement and noise reduction algorithms for digital hearing aids. Booth and Booth-Wallace multipliers are used for implementing digital signal processing algorithms in hearing aids. VHDL simulation results confirm that the Booth-Wallace multiplier is hardware efficient and performs faster than the Booth multiplier, and it consumes 40% less power than the Booth multiplier. A novel digital hearing aid using a spectral sharpening filter employing the Booth-Wallace multiplier is proposed. The results reveal that the hardware requirement for implementing the hearing aid using a Booth-Wallace multiplier is less than that of a Booth multiplier. Furthermore, it is demonstrated that the digital hearing aid using the Booth-Wallace multiplier consumes less power and performs better in terms of speed.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Real Time System Identification of Speech Signal Using TMS320C6713 (IOSRJVSP)
In this paper, real time system identification is implemented on a TMS320C6713 for a speech signal, using a standard IEEE sentence (SP23) from the NOIZEUS database with different types of real-world noises at different SNR levels taken from the AURORA database. The performance is measured in terms of signal to noise ratio (SNR) improvement. The implementation is done in 'C' for the least mean square (LMS) and recursive least square (RLS) algorithms and processed on the DSP processor using Code Composer Studio.
DESIGN OF DOUBLE PRECISION FLOATING POINT MULTIPLICATION ALGORITHM WITH VECTOR SUPPORT (jmicro)
This paper presents a floating point multiplier capable of supporting a wide range of application domains, such as scientific computing and multimedia applications, and describes an implementation of a floating point multiplier that supports the IEEE 754-2008 binary interchange format; a methodology for estimating power and speed has also been developed. This pipelined, vectorized floating point multiplier supports FP16, FP32 and FP64 input data, reduces area, power and latency, and increases throughput. Higher precision can be implemented by taking 128-bit input operands. The floating point units consume less power and a small part of the total area. Graphics Processor Units (GPUs) are specially tuned for performing a set of operations on large sets of data. This paper also presents the design of a double precision floating point multiplication algorithm with vector support. The single precision floating point multiplier has a path delay of 72 ns and an operating frequency of 13.58 MHz. Finally, this implementation is done in Verilog HDL using Xilinx ISE 14.2.
Design and Implementation of Single Precision Pipelined Floating Point Co-Processor (Silicon Mentor)
Floating point numbers are used in various applications such as medical imaging, radar and telecommunications. This paper deals with the comparison of various arithmetic modules and the implementation of an optimized floating point ALU. For more info, download this file or visit us at:
http://www.siliconmentor.com/
A Fast Floating Point Double Precision Implementation on FPGA (IJERA Editor)
In modern digital systems, floating point units are an important component in many signal and image processing applications. Many approaches to floating point units have been proposed and compared with their counterparts in recent years. The IEEE 754 floating point standard allows two types of precision for floating point operations: single and double. In the proposed architecture, a double precision floating point unit is used and basic arithmetic operations are performed. A parallel architecture is proposed along with a high speed adder, which is shared among other operations and can perform operations independently as a separate unit. To improve the area efficiency of the unit, a carry select adder is designed with a novel resource sharing technique that performs the operations with minimum usage of resources while computing the carry and sum for '0' and '1'. The design is implemented on a Xilinx Spartan-6 FPGA and the results show a 23% improvement in the speed of the designed circuit.
A High Speed Transposed Form FIR Filter Using Floating Point Dadda Multiplier (IJRES Journal)
There is a huge demand for a high speed, area efficient parallel FIR filter using a floating point Dadda algorithm, due to the increased performance of processing units. Area and speed are usually conflicting constraints, so improving speed mostly results in larger areas. In our research we try to determine the best solution to this problem by comparing the results of different multipliers. Two algorithms for high speed hardware multipliers, i.e. Dadda and Booth multipliers, were studied and implemented in different sizes. The working of these two multipliers was studied by implementing each of them separately in VHDL. The results of this research will help in choosing the better of the two floating point multipliers for fabricating different systems.
Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Precision (iosrjce)
Field Programmable Gate Arrays (FPGAs) are increasingly being used to design high-end, computationally intense microprocessors capable of handling both fixed- and floating-point mathematical operations. Addition is the most complex operation in a floating-point unit; it introduces major delay while taking significant area. Over the years, the VLSI community has developed many floating-point adder algorithms aimed mainly at reducing the overall latency. The objective of this paper is to implement a 32 bit binary floating point adder with minimum time. Floating point numbers are used in various applications such as medical imaging, radar and telecommunications. Here a pipelined architecture is used in order to increase performance, and the design is tuned to increase the operating frequency. The logic is designed using VHDL. This paper discusses in detail the best possible FPGA implementation, which will act as an important design resource. The performance criterion is latency in all cases. The algorithms are compared for overall latency, area and levels of logic, and analyzed specifically for one of the latest FPGA architectures provided by Xilinx.
DOUBLE PRECISION FLOATING POINT CORE IN VERILOG (IJCI JOURNAL)
A floating-point unit (FPU) is a math coprocessor, a part of a computer system specially designed to carry out operations on floating point numbers. The term floating point refers to the fact that the radix point can "float"; that is, it can be placed anywhere with respect to the significant digits of the number. Double precision floating point, also known as double, is a commonly used format on PCs due to its wider range over single precision, in spite of its performance and bandwidth cost. This paper aims at developing a Verilog version of a double precision floating point core designed to meet the IEEE 754 standard. This standard defines a double as a sign bit, an exponent and a mantissa. The aim is to build an efficient FPU that performs basic functions with reduced complexity of the logic used, and also reduces the memory requirement as far as possible.
Puneet Paruthi, Tanvi Kumar, Himanshu Singh / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 5, September-October 2012, pp. 1761-1766

Simulation of IEEE 754 Standard Double Precision Multiplier using Booth Techniques

1Puneet Paruthi, 2Tanvi Kumar, 3Himanshu Singh
1,2Dept. of ECE, BMSCE, Mukhsar; 3Asstt. Professor, ISTK, Kalawad
Abstract
Multiplication is an important fundamental function in arithmetic operations. It can be performed with the help of different multipliers using different techniques. The objective of a good multiplier is to provide a physically compact, high speed and low power design. To save significant power in a multiplier design, a good direction is to reduce the number of operations, thereby reducing dynamic power, which is a major part of total power dissipation. The main objective of this dissertation is to design and simulate an IEEE 754 standard double precision multiplier using VHDL.

Keywords - IEEE floating point arithmetic; Rounding; Floating point multiplier

I. INTRODUCTION
Every computer has a floating point processor or a dedicated accelerator that fulfils the requirements of precision using detailed floating point arithmetic. The main applications of floating point today are in the fields of medical imaging, biometrics, motion capture and audio. Since multiplication dominates the execution time of most DSP algorithms, there is a need for a high speed multiplier with more accuracy. Reducing time delay and power consumption are essential requirements for many applications.

Floating Point Numbers: The term floating point is derived from the fact that there is no fixed number of digits before and after the decimal point; that is, the decimal point can float. There are also representations in which the number of digits before and after the decimal point is fixed, called fixed-point representations. Floating point numbers are numbers that can contain a fractional part, e.g. 3.0, -111.5, ½, 3E-5 etc. The Institute of Electrical and Electronics Engineers (IEEE) sponsored a standard format for 32-bit and larger floating point numbers, known as the IEEE 754 standard.

This paper presents a new floating-point multiplier which can perform one double-precision floating-point multiplication or simultaneous single-precision floating-point multiplications. Since the single-precision floating-point multiplication results are generated in parallel, the multiplier's performance is almost doubled compared to a conventional floating point multiplier.

A. Floating Point Arithmetic
The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most widely used standard for floating-point computation, and is followed by many CPU and FPU implementations. The standard defines formats for representing floating-point numbers (including ±zero and denormals) and special values (infinities and NaNs), together with a set of floating-point operations that operate on these values. It also specifies four rounding modes and five exceptions. IEEE 754 specifies four formats for representing floating-point values: single precision (32-bit), double precision (64-bit), single-extended precision (≥ 43-bit, not commonly used) and double-extended precision (≥ 79-bit, usually implemented with 80 bits). Many languages specify that IEEE formats and arithmetic be implemented, although sometimes this is optional. For example, the C programming language, which pre-dated IEEE 754, now allows but does not require IEEE arithmetic (the C float is typically used for IEEE single precision and double uses IEEE double precision).

B. Double Precision Floating Point Numbers
A total of 64 bits is needed for double-precision number representation. A bias equal to 2^(n-1) - 1 is added to the actual exponent in order to obtain the stored exponent; for the 11-bit exponent of the double precision format this bias equals 1023. The addition of the bias maps actual exponents in the range -1023 to +1024 onto stored exponents in the range 0 to 2047. The double precision format therefore offers a range from 2^-1023 to 2^+1023, which is roughly equivalent to 10^-308 to 10^+308. The fields are:
Sign: 1 bit wide, used to denote the sign of the number, i.e. 0 indicates a positive number and 1 represents a negative number.
Exponent: 11 bits wide, a signed exponent in excess-1023 representation.
Mantissa: 52 bits wide, the fractional component.

Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits) = 64 bits
Fig. 1 Double Precision Floating Point Format
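The 1/11/52-bit layout of Fig. 1 can be inspected directly in software. The following Python snippet is an illustrative aside, not part of the paper's design; it unpacks a double into the three fields described above.

```python
import struct

def fp64_fields(x: float):
    """Split a double into the sign/exponent/mantissa fields of Fig. 1."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, excess-1023
    mantissa = bits & ((1 << 52) - 1)     # 52 fraction bits
    return sign, exponent, mantissa
```

For example, fp64_fields(-1.0) returns (1, 1023, 0): a negative sign, a stored exponent of 1023 (actual exponent 0), and an all-zero fraction.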
C. Floating-Point Multiplication
Multiplication of two normalized floating point numbers is performed by multiplying the fractional components, adding the exponents, and an exclusive-OR operation on the sign fields of both operands. The most complicated part is performing the integer-like multiplication on the fraction fields. Essentially the multiplication is done in two steps: partial product generation and partial product addition. For double precision operands (53-bit fraction fields), a total of 53 53-bit partial products are generated.
The general form of the floating point representation is:
(-1)^S * M * 2^E
where S represents the sign bit, M represents the mantissa and E represents the exponent.
Given two FP numbers n1 and n2, the product of both, denoted as n, can be expressed as:
n = n1 × n2
  = (−1)^S1 · p1 · 2^E1 × (−1)^S2 · p2 · 2^E2
  = (−1)^(S1+S2) · (p1 · p2) · 2^(E1+E2)
In order to perform floating-point multiplication, a simple algorithm is realized:
1. Add the exponents and subtract 1023.
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.

D. ModelSim Overview
ModelSim is a verification and simulation tool for VHDL, Verilog, SystemVerilog, SystemC, and mixed-language designs. ModelSim VHDL implements the VHDL language as defined by IEEE Standards 1076-1987, 1076-1993, and 1076-2002. ModelSim also supports the 1164-1993 Standard Multivalue Logic System for VHDL Interoperability and the 1076.2-1996 Standard VHDL Mathematical Packages. Any design developed with ModelSim will be compatible with any other VHDL system that is compliant with the 1076 specifications.

II. LITERATURE SURVEY
A few research works have been conducted to explain the concept of floating point numbers. D. Goldberg [11] explained the concept of floating point numbers, used to describe very small to very large numbers with a varying level of precision; they are comprised of three fields: a sign, a fraction and an exponent field. B. Parhami [8] described the IEEE-754 standard, which defines several floating point number formats and the sizes of the fields that comprise them; this standard defines several rounding schemes, which include round to zero. Cho, J. Hong et al. and N. Besli et al. [5][6] multiplied double precision operands (53-bit fraction fields), in which a total of 53 53-bit partial products are generated. To speed up this process, the two obvious solutions are to generate fewer partial products and to sum them faster. Sumit Vaidya et al. [1] compared different multipliers on the basis of power, speed, delay and area to find the efficient multiplier. It can be concluded that the array multiplier requires more power consumption and gives an optimum number of components, but the delay of this multiplier is larger than that of the Wallace tree multiplier. Hasan Krad et al. [4] presented a performance analysis of two different multipliers for unsigned data, one using a carry-look-ahead adder and the second using a ripple adder. The authors state that the multiplier with the carry-look-ahead adder shows better performance than the multiplier with the ripple adder in terms of gate delays; in other words, the multiplier with the carry-look-ahead adder has approximately twice the speed of the multiplier with the ripple adder in the worst case. Soojin Kim et al. [2] described the pipeline architecture of high-speed modified Booth multipliers. The proposed multiplier circuits are based on the modified Booth algorithm and the pipeline technique, which are the most widely used to accelerate multiplication speed; in order to implement the optimally pipelined multipliers, many kinds of experiments have been conducted. P. Assady [3] presented a new high-speed algorithm. A multiplier has to perform three important steps: partial product generation, partial product reduction, and a final addition step. In the partial product generation step, a new Booth algorithm was presented; in the partial product reduction step, a new tree structure was designed; and in the final addition step, a new hybrid adder using 4-bit blocks was proposed.

III. METHODOLOGY
A. Methods for multiplication
There are a number of techniques that can be used to perform multiplication. In general, the choice is based upon factors such as latency, throughput, area, and design complexity.
a) Array Multiplier b) Booth Multiplier

Booth's multiplication algorithm is a multiplication algorithm that multiplies two signed binary numbers in two's complement notation. The algorithm was invented by Andrew Donald Booth.

1) Booth Multiplier
round to infinity, round to negative infinity, and Conventional array multipliers, like the
round to nearest. Michael L. Overton [7] performed Braun multiplier and Baugh Woolley multiplier
the multiplication of two floating point normalized achieve comparatively good performance but they
numbers by multiplying the fractional components, require large area of silicon, unlike the add-shift
adding the exponents, and an Exclusive OR algorithms, which require less hardware and exhibit
Puneet Paruthi, Tanvi Kumar, Himanshu Singh / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 5, September-October 2012, pp. 1761-1766
poorer performance. The Booth multiplier makes use of the Booth encoding algorithm in order to reduce the number of partial products, by considering two bits of the multiplier at a time, thereby achieving a speed advantage over other multiplier architectures. The algorithm is valid for both signed and unsigned numbers; it accepts numbers in 2's complement form.

2) Array Multiplier
The array multiplier is an efficient layout of a combinational multiplier. Multiplication of two binary numbers can be obtained with one micro-operation by using a combinational circuit that forms the product bits all at once, which makes it a fast way of multiplying two numbers, since the only delay is the time for the signals to propagate through the gates that form the multiplication array. In an array multiplier, consider two binary numbers A and B of m and n bits: there are mn summands, produced in parallel by a set of mn AND gates. An n x n multiplier requires n(n-2) full adders, n half adders and n^2 AND gates, and its worst case delay is (2n+1) td.

B. Booth Multiplication (Proposed Work)
Booth's algorithm involves repeatedly adding one of two predetermined values A and S to a product P, then performing a rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier, respectively, and let x and y represent the number of bits in m and r. All of the values A, S and P should have a length equal to (x + y + 1) bits.
1) A: Fill the most significant (leftmost) x bits with the value of m. Fill the remaining (y + 1) bits with zeros.
2) S: Fill the most significant x bits with the value of (−m) in two's complement notation. Fill the remaining (y + 1) bits with zeros.
3) P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the least significant (rightmost) bit with a zero.
4) Determine the two least significant (rightmost) bits of P.
a. If they are 01, find the value of P + A. Ignore any overflow.
b. If they are 10, find the value of P + S. Ignore any overflow.
c. If they are 00, do nothing. Use P directly in the next step.
d. If they are 11, do nothing. Use P directly in the next step.
5) Arithmetically shift the value obtained in the previous step a single place to the right. Let P now equal this new value.
6) Repeat steps 4 and 5 until they have been done y times.
7) Drop the least significant (rightmost) bit from P. This is the product of m and r.

Figure 2: Block Diagram of the Multiplier

C. Double Precision Booth Multiplication
Let us consider the multiplication of two floating-point numbers A and B, where A = -18.0 and B = 9.5.

Binary representation of the operands:
A = -10010.0
B = +1001.1

Normalized representation of the operands:
A = -1.001 x 2^4
B = +1.0011 x 2^3

IEEE representation of the operands:
A = 1 10000000011 0010000000000000000000000000000000000000000000000000
B = 0 10000000010 0011000000000000000000000000000000000000000000000000

Multiplication of the mantissas: we must extract the mantissas, adding a 1 as the most significant (hidden) bit, for normalization:
A = 10010000000000000000000000000000000000000000000000000
B = 10011000000000000000000000000000000000000000000000000
The most significant bits of the 106-bit result of the multiplication are, in hexadecimal:
0x558000000000
Only the most significant bits are useful: after normalization (elimination of the most significant 1), we get the 52-bit mantissa of the result. This normalization can lead to a correction of the result's exponent. In our case, we get:
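As a behavioral cross-check (ours, not the hardware implementation), Booth's algorithm as detailed step by step in section B below can be transcribed directly on Python integers used as bit vectors; `booth_multiply` is our own helper name:

```python
def booth_multiply(m: int, r: int, x: int, y: int) -> int:
    """Booth's algorithm: multiply an x-bit m by a y-bit r (two's complement)."""
    n = x + y + 1                            # common width of A, S and P
    mask = (1 << n) - 1
    A = (m & ((1 << x) - 1)) << (y + 1)      # m in the top x bits, zeros below
    S = ((-m) & ((1 << x) - 1)) << (y + 1)   # -m in two's complement, top x bits
    P = (r & ((1 << y) - 1)) << 1            # zeros, then r, then a trailing 0
    for _ in range(y):
        pair = P & 0b11                      # two least significant bits of P
        if pair == 0b01:
            P = (P + A) & mask               # add A, ignore overflow
        elif pair == 0b10:
            P = (P + S) & mask               # add S, ignore overflow
        P = (P >> 1) | (P & (1 << (n - 1)))  # arithmetic right shift by one place
    P >>= 1                                  # drop the least significant bit
    if P & (1 << (x + y - 1)):               # reinterpret the result as signed
        P -= 1 << (x + y)
    return P
```

For instance, `booth_multiply(3, -4, 4, 4)` returns -12, and signed operands of either sign are handled uniformly, which is the point of the recoding.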
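The mn-summand structure can be mimicked in software (a behavioral sketch of our own, not the paper's circuit): each partial product bit is the AND of one multiplier bit with one multiplicand bit, and the rows are summed with the appropriate shifts:

```python
def array_multiply(a: int, b: int, m: int, n: int) -> int:
    """Unsigned array multiplication of an m-bit a by an n-bit b."""
    total = 0
    for j in range(n):                             # one row per multiplier bit
        for i in range(m):                         # m*n AND gates in all
            bit = ((a >> i) & 1) & ((b >> j) & 1)  # one summand bit
            total += bit << (i + j)                # row j carries weight 2**j
    return total
```

In hardware the rows are combined by the adder array rather than a sequential loop, which is why the worst case delay grows only linearly, as (2n+1) td.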
01 0101011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
(the two leading bits, 01, are the integer part of the 106-bit product; after eliminating the most significant 1, the 52-bit mantissa of the result is 0101011 followed by 45 zeros)

Addition of the exponents:
The exponent of the result is equal to the sum of the operands' exponents. A 1 can be added if required by the normalization of the mantissa multiplication (this is not the case in our example). As the exponent fields (Ea and Eb) are biased, the bias must be removed in order to do the addition, and must then be added again to get the value to be entered into the exponent field of the result (Er):

Er = (Ea - 1023) + (Eb - 1023) + 1023
   = Ea + Eb - 1023

In our example, we have:
Ea     10000000011
Eb     10000000010
-1023  10000000001 (-1023 in 11-bit two's complement)
Er     10000000110

which, once the bias is removed, is actually 7, the exponent of the result.

Calculation of the sign of the result:
The sign of the result (Sr) is given by the exclusive-or of the operands' signs (Sa and Sb):
Sr = Sa ⊕ Sb
In our example, we get:
Sr = 1 ⊕ 0 = 1
i.e. a negative sign.

Composition of the result: the setting of the 3 intermediate results (sign, exponent and mantissa) gives us the final result of our multiplication:
1 10000000110 0101011000000000000000000000000000000000000000000000
AxB = -18.0 x 9.5 = -1.0101011 x 2^(1030-1023) = -10101011.0 = -171.0 (decimal)

Figure 3: Schematic Diagram of Double Precision Multiplier

IV. SYNTHESIS RESULTS
This design has been implemented in VHDL, simulated on ModelSim and synthesized. The HDL code uses VHDL 2001 constructs that provide certain benefits over the VHDL 95 standard in terms of scalability and code reusability. Simulation-based verification is one of the methods for functional verification of a design. In this method, test inputs are provided using standard test benches. The test bench forms the top module that instantiates the other modules. Simulation-based verification ensures that the design is functionally correct when tested with a given set of inputs. Though it is not fully complete, by picking a random set of inputs as well as corner cases, simulation-based verification can still yield reasonably good results.

Table 1. The comparison of Multipliers

The following snapshots are taken from ModelSim after the timing simulation of the floating point multiplier core. Consider the inputs to the floating point multiplier:
1) A = -18.0 = -1.001 x 2^4
A = 1 10000000011 0010000000000000000000000000000000000000000000000000 = 0xC032000000000000
B = 9.5 = +1.0011 x 2^3
B = 0 10000000010 0011000000000000000000000000000000000000000000000000 = 0x4023000000000000
AxB = -171.0 = -1.0101011 x 2^7
AxB = 1 10000000110 0101011000000000000000000000000000000000000000000000 = 0xC065600000000000

Figure 4: Simulation of Double Precision Multiplier
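The bias arithmetic can be checked with plain integers (our own sketch; the last line mirrors the 11-bit hardware trick of adding the two's complement of 1023 and discarding the carry):

```python
BIAS = 1023

def result_exponent(ea: int, eb: int) -> int:
    """Biased exponent of a product: remove both biases, add one back."""
    return (ea - BIAS) + (eb - BIAS) + BIAS   # = ea + eb - 1023

Ea = 0b10000000011   # 1027, biased exponent of A = -18.0 (true exponent 4)
Eb = 0b10000000010   # 1026, biased exponent of B = 9.5  (true exponent 3)
Er = result_exponent(Ea, Eb)

# In 11-bit hardware, subtracting 1023 is adding -1023 in two's complement
# (0b10000000001) and keeping only the low 11 bits:
Er_hw = (Ea + Eb + 0b10000000001) & 0x7FF
```

Both forms give 0b10000000110 = 1030, i.e. an unbiased exponent of 7.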
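These field encodings can be cross-checked against a host machine's own IEEE 754 double-precision arithmetic (a sanity check we added; `struct` is Python's standard packing module and `to_bits` is our own helper name):

```python
import struct

def to_bits(x: float) -> int:
    """Raw 64-bit IEEE 754 encoding of a double."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# Example 1: A = -18.0, B = 9.5, A x B = -171.0
assert to_bits(-18.0) == 0xC032000000000000
assert to_bits(9.5) == 0x4023000000000000
assert to_bits(-18.0 * 9.5) == to_bits(-171.0) == 0xC065600000000000
```

The same check passes for the second test case below, since 134.0625 x (-2.25) = -301.640625 is exact in double precision.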
2) A = 134.0625 = 1.00001100001 x 2^7
A = 0 10000000110 0000110000100000000000000000000000000000000000000000
B = -2.25 = -1.001 x 2^1
B = 1 10000000000 0010000000000000000000000000000000000000000000000000
AxB = -301.640625 = -1.00101101101001 x 2^8
AxB = 1 10000000111 0010110110100100000000000000000000000000000000000000

Figure 5: Simulation of Double Precision Multiplier

V. CONCLUSION
In this study, a floating point multiplier design capable of executing double precision floating-point multiplication using the Booth algorithm has been presented. One of the important aspects of the presented design method is that it is applicable to all kinds of floating-point multipliers. The presented design was compared with an ordinary floating point array multiplier via synthesis. The synthesis results showed that the proposed design is more efficient than the conventional multiplier, with a critical path increment of only one or two gate delays. Since modern floating-point multiplier designs have significantly larger area than the standard floating-point multiplier, the percentage of extra hardware will be smaller for those units. The methods presented in this study will be used in the design of floating-point multiplier-adder circuits. Future work will also enhance the proposed designs to support all IEEE 754 rounding modes.

REFERENCES
[1] Sumit Vaidya and Deepak Dandekar, “Delay-Power Performance Comparison of Multipliers in VLSI Circuit Design”, International Journal of Computer Networks & Communications (IJCNC), Vol. 2, No. 4, pp. 47-55, July 2010.
[2] Soojin Kim and Kyeongsoon Cho, “Design of High-speed Modified Booth Multipliers Operating at GHz Ranges”, World Academy of Science, Engineering and Technology 61, 2010.
[3] P. Assady, “A New Multiplication Algorithm Using High-Speed Counters”, European Journal of Scientific Research, ISSN 1450-216X, Vol. 26, No. 3, pp. 362-368, 2009.
[4] Hasan Krad and Aws Yousif Al-Taie, “Performance Analysis of a 32-Bit Multiplier with a Carry-Look-Ahead Adder and a 32-bit Multiplier with a Ripple Adder using VHDL”, Journal of Computer Science 4, ISSN 1549-3636, pp. 305-308, 2008.
[5] Cho, J. Hong, and G. Choi, “54x54-bit Radix-4 Multiplier based on Modified Booth Algorithm,” 13th ACM Symp. VLSI, pp. 233-236, Apr. 2003.
[6] N. Besli and R. G. DeshMukh, “A 54x54-bit Multiplier with a New Redundant Booth's Encoding,” IEEE Conf. Electrical and Computer Engineering, vol. 2, pp. 597-602, 12-15 May 2002.
[7] Michael L. Overton, “Numerical Computing with IEEE Floating Point Arithmetic,” Society for Industrial and Applied Mathematics, 2001.
[8] B. Parhami, “Computer Arithmetic: Algorithms and Hardware Designs”, Oxford University Press, 2000.
[9] N. Itoh, Y. Naemura, H. Makino, Y. Nakase, “A Compact 54x54-bit Multiplier With Improved Wallace-Tree Structure,” Dig. Technical Papers of Symp. VLSI Circuits, pp. 15-16, Jun. 1999.
[10] R. K. Yu and G. B. Zyner, “167 MHz Radix-4 Floating-Point Multiplier,” IEEE Symp. on Computer Arithmetic, pp. 149-154, Jul. 1995.
[11] D. Goldberg, “What Every Computer Scientist Should Know About Floating-Point Arithmetic”, ACM Computing Surveys, vol. 23, no. 1, pp. 5-48, 1991.
[12] A. Goldovsky et al., “Design and Implementation of a 16 by 16 Low-Power Two's Complement Multiplier”, IEEE International Symposium on Circuits and Systems, 5, pp. 345-348, 2000.
[13] P. Seidel, L. McFearin, and D. Matula, “Binary Multiplication Radix-32 and Radix-256”, 15th Symp. on Computer Arithmetic, pp. 23-32, 2001.
[14] U. Kulisch, “Advanced Arithmetic for the Digital Computer”, Springer-Verlag, Vienna, 2002.
[15] M. Schulte and E. Swartzlander, “Hardware Design and Arithmetic Algorithms for a
Variable-Precision, Interval Arithmetic Coprocessor”, Proc. IEEE 12th Symposium on Computer Arithmetic (ARITH-12), pp. 222-230, 1995.
[16] R. V. K. Pillai, D. Al-Khalili and A. J. Al-Khalili, “A Low Power Approach to Floating Point Adder Design”, Proceedings of the 1997 International Conference on Computer Design, pp. 178-185.
[17] Kari Kalliojarvi and Yrjo Neuvo, “Distribution of Roundoff Noise in Binary Floating-Point Addition”, Proceedings of ISCAS '92, pp. 1796-1799.
[18] John F. Wakerly, “Digital Design – Principles and Practices”, Tata McGraw-Hill.
[19] Ivan Flores, “The Logic of Computer Arithmetic”, Prentice Hall, Englewood Cliffs, N.J.
[20] G. Todorov, “BASIC Design, Implementation and Analysis of a Scalable High-radix Montgomery Multiplier,” Master's Thesis, Oregon State University, USA, December 2000.
[21] L. A. Tawalbeh, “Radix-4 ASIC Design of a Scalable Montgomery Modular Multiplier using Encoding Techniques,” M.S. Thesis, Oregon State University, USA, October 2002.
[22] W. Gallagher and E. Swartzlander, “High Radix Booth Multipliers Using Reduced Area Adder Trees”, Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, 1, pp. 545-549, 1994.
[23] B. Cherkauer and E. Friedman, “A Hybrid Radix-4/Radix-8 Low Power, High Speed Multiplier Architecture for Wide Bit Widths”, IEEE Intl. Symp. on Circuits and Systems, 4, pp. 53-56, 1996.
[24] Israel Koren, “Computer Arithmetic Algorithms”, Prentice Hall, Englewood Cliffs, N.J., 1993.
[25] Nabeel Shirazi, Al Walters, and Peter Athanas, “Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines”, IEEE Symposium on FPGAs for Custom Computing Machines, pp. 155-162, April 1995.
[26] Y. Wang, Y. Jiang, and E. Sha, “On Area-Efficient Low Power Array Multipliers”, The 8th IEEE International Conference on Electronics, Circuits and Systems, pp. 1429-1432, 2001.
[27] D. A. Patterson and J. L. Hennessy, “Computer Architecture: A Quantitative Approach”, Morgan Kaufmann, San Mateo, CA, 1996, Appendix A.
[28] K. Yano et al., “A 3.8-ns CMOS 16x16-b Multiplier Using Complementary Pass-Transistor Logic”, IEEE Journal of Solid-State Circuits, 25:388-395, 1990.
[29] I. Khater, A. Bellaouar, and M. Elmasry, “Circuit Techniques for CMOS Low-Power High-Performance Multipliers”, IEEE Journal of Solid-State Circuits, 31:1535-1546, 1996.
[30] G. Bewick, “Fast Multiplication: Algorithms and Implementation”, Ph.D. Thesis, Stanford University, 1992.