Your SlideShare is downloading. ×
Design and Implementation of a High Performance Multiplier using HDL
Design and Implementation of a High Performance Multiplier using HDL
Design and Implementation of a High Performance Multiplier using HDL
Design and Implementation of a High Performance Multiplier using HDL
Design and Implementation of a High Performance Multiplier using HDL
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Design and Implementation of a High Performance Multiplier using HDL


Published on

services on...... embedded(ARM9,ARM11,LINUX,DEVICE DRIVERS,RTOS) VLSI-FPGA DIP/DSP PLC AND SCADA JAVA AND DOTNET iPHONE ANDROID If ur intrested in these project please feel free to contact …

services on...... embedded(ARM9,ARM11,LINUX,DEVICE DRIVERS,RTOS) VLSI-FPGA DIP/DSP PLC AND SCADA JAVA AND DOTNET iPHONE ANDROID If ur intrested in these project please feel free to contact us@09640648777,Mallikarjun.V

Published in: Education
1 Like
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Design and Implementation of a High Performance Multiplier using HDL Aparna P R Nisha Thomas Department of ECE ER & DCI-IT Lourde Matha College of Science and Technology Centre for Development of Advanced Computing Kuttichal, India Thiruvananthapuram, India nishathomas2601@gmail.comAbstract—This paper presents an area efficient implementation compressors. The design is structured for m x n multiplicationof a high performance parallel multiplier. Radix-4 Booth where m and n can reach up to 126 bits. The number of partialmultiplier with 3:2 compressors and Radix-8 Booth multiplier products is n/2 in Radix-4 Booth algorithm while it getswith 4:2 compressors are presented here. The design is reduced to n/3 in Radix-8 Booth algorithm. The Wallace treestructured for m × n multiplication where m and n can reach upto 126 bits. Carry Lookahead Adder is used as the final adder to uses Carry Save Adders (CSA) to accumulate the partialenhance the speed of operation. Finally the performance products. This reduces the time as well as the chip area. Toimprovement of the proposed multipliers is validated by further enhance the speed of operation, carry-look-aheadimplementing a higher order FIR filter. The design entry is done (CLA) adder is used as the final adder[3] .in VHDL and simulated using ModelSim SE 6.4 design suite fromMentor Graphics. It is then synthesized and implemented using II. ARCHITECTUREXilinx ISE 9.2i targeted towards Spartan 3 FPGA The architecture of the proposed multiplier is shown in Keywords- FPGA; HDL;Carry Lookahead Adder; Carry Save Fig.1. It consists of four major modules: Booth encoder,Adder; Wallace Tree; Booth Encoding. partial product generator, Wallace tree and carry look-ahead adder[4]. The Booth encoder performs Radix-2 or Radix-4 I. INTRODUCTION encoding of the multiplier bits. Based on the multiplicand and With the rapid advances in multimedia and communication the encoded multiplier, partial products are generated by thesystems, real-time signal processing and large capacity data generator. For large multipliers of 32 bits, the performance ofprocessing are increasingly being demanded. The multiplier is the modified Booth algorithm is limited. So Booth recodingan essential element of the digital signal processing such as together with Wallace tree structures have been used in thefiltering and convolution. Most digital signal processing proposed fast multiplier. The partial products are suppliedmethods use nonlinear functions such as discrete cosine to Wallace Tree and added appropriately. The results aretransform (DCT) or discrete wavelet transform (DWT). As finally added using a Carry Look-ahead Adder (CLA) to getthey are basically accomplished by repetitive application of the final product.multiplication and addition, their speed becomes a major factorwhich determines the performance of the entire calculation. Multiplier MultiplicandSince the multiplier requires the longest delay among the basicoperational blocks in digital system, the critical path isdetermined more by the multiplier [1]. Furthermore, multiplier Partial Productconsumes much area and dissipates more power. Hence Booth Encoder Generatordesigning multipliers which offer either of the following designtargets – high speed, low power consumption [2], less area oreven a combination of them is of substantial research interest. Wallace Tree Multiplication operation involves generation of partial Formationproducts and their accumulation. The speed of multiplicationcan be increased by reducing the number of partial productsand/or accelerating the accumulation of partial products. Carry Look-aheadAmong the many methods of implementing high speed Adderparallel multipliers, there are two basic approaches namelyBooth algorithm and Wallace Tree compressors. This paperdescribes an efficient implementation of a high speed parallel Productmultiplier using both these approaches. Here two multipliers Figure 1. Block diagram of Wallace Booth Multiplierare proposed. The first multiplier makes use of the Radix-4Booth Algorithm with 3:2 compressors while the secondmultiplier uses the Radix-8 Booth algorithm with 4:2
  • 2. A. Radix – 4 Booth Algorithm B. Radix 8 Booth Algorithm Booth algorithm is a powerful algorithm [5] for signed Radix-8 Booth recoding applies the same algorithm as thatnumber multiplication, which treats both positive and negative of Radix-4, but now we take quartets of bits instead of triplets.numbers uniformly. Since a k-bit binary number can be Each quartet is codified as a signed digit using Table II.interpreted as k/2-digit Radix-4 number, a k/3-digit Radix-8 Radix-8 algorithm reduces the number of partial products tonumber and so on, it can deal with more than one bit of the n/3, where n is the number of multiplier bits. Thus it allows amultiplier in each cycle by using high radix multiplication[6]. time gain in the partial products summation.The major disadvantage of the Radix-2 algorithm was that theprocess required n shifts and an average of n/2 additions for an C. Wallace Treen bit multiplier. This variable number of shift and add The Wallace tree method [7] is used in high speed designsoperations is inconvenient for designing parallel multipliers.Also the Radix-2 algorithm becomes inefficient when there are in order to produce two rows of partial products that can beisolated 1’s. The Radix-4 modified Booth algorithm added in the last stage. Also critical path and the number ofovercomes all these limitations of Radix-2 algorithm. For adders get reduced when compared to the conventionaloperands equal to or greater than 16 bits, the modified Radix-4 parallel adders. Here the Wallace tree has taken the role ofBooth algorithm has been widely used. It is based on encoding accelerating the accumulation of the partial products. Itsthe two’s complement multiplier in order to reduce the number advantage becomes more pronounced for multipliers of greaterof partial products to be added to n/2. The multiplier, Y in than 16 bits .The speed, area and power consumption of thetwo’s complement form can be written as in multipliers will be in direct proportion to the efficiency of the compressors. The Wallace tree structure with 3:2 compressors Y = -Yn-1 2n-1 + Ȉi Yi2i ; 0 ” i” n-2 (1) and 4:2 compressors is shown in Fig.2 and Fig.3 respectively. In this regard, we can expect a significant reduction inIt can be written as computing multiplications. Y = Ȉi (- 2 Y2i+1 + + Y2i +Y2i-1) 22i ; 0 ” i ” n-2 (2) TABLE II. RADIX -8 BOOTH RECODING Multiplier Bits Recoded Operation on Table I shows the encoding of the signed multiplier Y, multiplicand, X Yi+2 Yi+1 Yi Yi-1using the Radix-4 Booth algorithm. Radix-4 Booth recodingencodes multiplier bits into [-2, 2]. Here we consider the 0 0 0 0 0Xmultiplier bits in blocks of three, such that each block overlaps 0 0 0 1 +Xthe previous block by one bit. It is advantageous to begin the 0 0 1 0 +Xexamination of the multiplier with the least significant bit. The 0 0 1 1 +2Xoverlap is necessary so that we know what happened in thelast block, as the most significant bit of the block acts like a 0 1 0 0 +2Xsign bit. 0 1 0 1 +3X TABLE I. RADIX - 4 BOOTH RECODING 0 1 1 0 +3X Multiplier Bits Recoded Operation on 0 1 1 1 +4X multiplicand, X Y2i- 1 Y2i Y 2i +1 1 0 0 0 -4X 0 0 0 0X 1 0 0 1 -3X 0 0 1 +X 1 0 1 0 -3X 0 1 0 +X 1 0 1 1 -2X 0 1 1 +2X 1 1 0 0 -2X 1 0 0 -2X 1 1 0 1 -1X 1 0 1 -X 1 1 1 0 -1X 1 1 0 -X 1 1 1 1 0X 1 1 1 0X
  • 3. Figure 2. Wallace Tree using 3:2 compressors Figure. 3. Wallace Tree using 4:2 compressors The 3:2 compressors make use of a carry save adder .The a sequence of carry bits. In carry save adder, the carry digit iscarry save adder outputs two numbers of the same dimensions taken from the right and passed to the left, just as inas the inputs, one is a sequence of partial sum bits and other is conventional addition; but the carry digit passed to the left is
  • 4. the result of the previous calculation and not the current one. III. IMPLEMENTATION RESULTSSo in each clock cycle, carries only have to move one step The design entry of 126×126 bit multipliers using Radix-4along and the clock can tick much faster. Also the carry-save Booth algorithm with 3:2 compressors and Radix-8 Boothadder produces all of its output values in parallel, and thus has algorithm with 4:2 compressors are done using VHDL andthe same delay as a single full-adder. The 4:2 compressors [8] simulated using ModelSim SE 6.4 design suite from Mentorhave been widely employed in the high speed multipliers to Graphics. It is then synthesized and implemented in a Xilinxlower the latency of the partial product accumulation stage. A XC3S5000 fg1156 -4 FPGA using the Xilinx ISE 9.2i design4:2compressor can be built using two 3:2 compressors. Owing suite. Figure 4 presents a snapshot of simulation waveforms forto its regular interconnection, the 4:2 compressors is ideal for 126×126 bit multiplier. Table IV summarizes the FPGAthe construction of regularly structured Wallace Tree with low resource utilization of these two multipliers.complexity. The number of levels in the Wallace tree using Finally the performance improvement is validated by3:2 compressors can be approximately given as implementing a higher order FIR filter using these multipliers. Number of levels = log (k/2) (3) Table V summarizes the FPGA resource utilization for FIR filters using these multipliers. This shows that the multiplier log ( 3/2) using Radix-8 Booth multiplier with 4:2 compressors gives better speed and the number of occupied slices is lower for thewhere k is the number of partial products. Table III shows the multiplier using Radix-4 Booth algorithm with 3:2number of levels in the Wallace tree using 3:2 compressors for compressors. The FIR filters are implemented in Xilinxdifferent number of partial products. XC3S1500fg676-4 FPGA. The specifications of the FIR filter chosen are as follows. Sampling frequency : 24 KHz TABLE III. NUMBER OF LEVELS IN THE WALLACE TREE Pass band frequency : 8 KHz Number of Number of Levels Partial in the Wallace Tree Stop band frequency : 9 KHz Products,(k) Pass band ripple : 0.1 linear scale 3 1 Stop band attenuation : 0.001 linear scale 4 2 5”k”6 3 TABLE IV. DEVICE UTILIZATION SUMMARY OF MULTIPLIERS 7”k”9 4 10 ” k ” 13 5 Logic Utilization Radix-4 Booth Radix -8 Booth Algorithm with Algorithm with 14 ” k ” 19 6 ( XC3S5000 fg1156 -4 ) 3:2 compressors 4:2 compressors 20 ” k ” 28 7 29 ” k ” 42 8 Number of four input LUTs 29,990 43,520 43 ” k ” 63 9 Number of occupied Slices 16,535 25,100 Number of bonded IOBs 503 503 The final results obtained at the output of the Wallace treeare added using a Carry Look-ahead Adder (CLA) which isindependent of the number of bits of the two operands. In Total equivalent gate count 205,470 271,881Carry Look-ahead Adder, for every bit the carry and sumoutputs are independent of the previous bits and thus the JTAG Gate Count for IOBs 24,144 24,144rippling effect has completely been eliminated. It works bycreating two signals, propagate and generate for each bitposition, based on whether a carry is propagated through from Maximum combinational path 448.270 448.082a less significant bit position, a carry is generated in that bit delay (ns)position, or if a carry is killed in that bit position. Average Connection Delay (ns) 5.514 4.781 Maximum Pin Delay (ns) 27.603 28.171
  • 5. Figure 4. Simulation of 126 × 126 bit multiplier IV. CONCLUSION ACKNOWLEDGMENT In this paper, the design and implementation of two high We express our sincere gratitude to Mrs. Kumari Roshniperformance parallel multipliers is proposed. The first V.S, Centre for Development of Advanced Computing,multiplier makes use of the Radix-4 Booth Algorithm with 3:2 Thiruvananthapuram for incorporating us as part of thiscompressors while the second multiplier uses the Radix-8 prestigious work.Booth algorithm with 4:2 compressors. Both the designs were REFERENCESimplemented on Spartan 3 FPGA. The multiplier using Radix-4 Booth algorithm with 3:2 compressors shows morereduction in device utilization as compared to the multiplier [1] Dong-Wook Kim, Young-Ho Seo, “A New VLSI Architecture ofusing Radix-8 Booth algorithm with 4:2 compressors. Parallel Multiplier-Accumulator based on Radix-2 Modified Booth Algorithm”, Very Large Scale Integration (VLSI) Systems, IEEEMeanwhile the multiplier using Radix-8 Booth algorithm with Transactions, vol.18, pp.: 201-208, 04 Feb. 20104:2 compressors are found to be faster than the other. Also the [2] Prasanna Raj P, Rao, Ravi, “VLSI Design and Analysis of Multipliersuse of Radix- 8 Booth multiplier with 4:2 compressors for a for Low Power”, Intelligent Information Hiding and Multimedia Signalhigher order FIR filter showed a dramatic speed improvement Processing, Fifth International Conference, pp.: 1354-1357, Sept. 2009than that using Radix-4 Booth multiplier with 3:2 [3] Lakshmanan, Masuri Othman and Mohamad Alauddin Mohd.Ali, “High Performance Parallel Multiplier using Wallace-Booth Algorithm”,compressors. Semiconductor Electronics, IEEE International Conference , pp.: 433- 436, Dec. 2002. [4] Jan M Rabaey, “Digital Integrated Circuits, A Design Perspective”, TABLE V. DEVICE UTILIZATION SUMMARY OF FIR FILTERS Prentice Hall, Dec.1995 Radix-4 Booth Radix-8 Booth [5] Louis P. Rubinfield, “A Proof of the Modified Booths Algorithm for Algorithm with Multiplication”, Computers, IEEE Transactions,vol.24, pp.: 1014-1015, Algorithm with Logic Utilization 4:2 Oct. 1975 3:2 ( XC3S1500 fg676 -4 ) compressors compressors [6] Rajendra Katti, “A Modified Booth Algorithm for High Radix Fixed- point Multiplication”, Very Large Scale Integration (VLSI) Systems, IEEE Transactions, vol. 2, pp.: 522-524, Dec. 1994. [7] C. S. Wallace, “A Suggestion for a Fast Multiplier”, Electronic Number of four input LUTs 9,870 11,099 Computers, IEEE Transactions, vol.13, Page(s): 14-17, Feb. 1964 [8] Hussin R et al , “An Efficient Modified Booth Multiplier Architecture”, IEEE International Conference, pp.:1-4, 2008. Number of occupied Slices 5,445 5,910 Number of Slice Flip Flops 264 285 Number of Bonded IOBs 311 311 Number of Gated Clocks 1 1 Maximum Combinational 100.241 75.502 Path Delay (ns)