High speed modified booth encoder multiplier for signed and unsigned numbers 1


Published on


Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

High speed modified booth encoder multiplier for signed and unsigned numbers 1

  1. 1. High speed Modified Booth Encoder multiplier for signed and unsigned numbersRavindra P Rajput M. N Shanmukha SwamyVTU, Belgaum, Dept. of Electronics and Communication, VTU, Belgaum,Dept. of Electronics and CommunicationBDT College of Engineering, JC College of Engineering,Davangere, Karnataka, India Mysore, Karnataka, Indiarprajput2006@gmail.com mnsjce@gmail.comAbstract---This paper presents the design and implementation ofsigned-unsigned Modified Booth Encoding (SUMBE) multiplier.The present Modified Booth Encoding (MBE) multiplier andthe Baugh-Wooley multiplier perform multiplication operationon signed numbers only. The array multiplier and Braun arraymultipliers perform multiplication operation on unsignednumbers only. Thus, the requirement of the modern computersystem is a dedicated and very high speed unique multiplierunit for signed and unsigned numbers. Therefore, this paperpresents the design and implementation of SUMBE multiplier.The modified Booth Encoder circuit generates half the partialproducts in parallel. By extending sign bit of the operands andgenerating an additional partial product the SUMBE multiplieris obtained. The Carry Save Adderr (CSA) tree and the finalCarry Lookahead (CLA) adder used to speed up the multiplieroperation. Since signed and unsigned multiplication operationis performed by the same multiplier unit the required hardwareand the chip area reduces and this in turn reduces powerdissipation and cost of a system.Index Terms------Modified Booth Encoding multiplier, Baugh-Wooley multiplier, Array multiplier, Braun array multiplier,CSA, CLA, partial products, Signed-unsigned.I. INTRODUCTIONIn digital computing systems multiplication is anarithmetic operation. The multiplication operation consists ofproducing partial products and then adding these partialproducts the final product is obtained. Thus the speed of themultiplier depends on the number of partial product and thespeed of the adder. Since the multipliers have a significantimpact on the performance of the entire system, many high-performance algorithms and architectures have beenproposed [1-23]. The very high speed and dedicatedmultipliers are used in pipeline and vector computers.The high speed Booth multipliers and pipelined Boothmultipliers are used for digital signal processing (DSP)applications such as for multimedia and communicationsystems. High speed DSP computation applications such asFast Fourier transform (FFT) require additions andmultiplications.The papers [1, 4] presents a design methodology for high-speed Booth encoded parallel multiplier. For partial productgeneration, a new Modified Booth encoding (MBE) schemeis used to improve the performance of traditional MBEschemes. But this multiplier is only for signed numbermultiplication operation.The conventional modified Booth encoding (MBE)generates an irregular partial product array because of theextra partial product bit at the least significant bit position ofeach partial product row. Therefore papers [2, 3] presents asimple approach to generate a regular partial product arraywith fewer partial product rows and negligible overhead,there by lowering the complexity of partial productreduction and reducing the area, delay, and power of MBEmultipliers. But the drawback of this multiplier is that itfunction only for signed number operands.The modified-Booth algorithm is extensively used forhigh-speed multiplier circuits. Once, when array multiplierswere used, the reduced number of generated partial productssignificantly improved multiplier performance. In designsbased on reduction trees with logarithmic logic depth,however, the reduced number of partial products has alimited impact on overall performance. The Baugh-Wooleyalgorithm [5, 8, 11] is a different scheme for signedmultiplication, but is not so widely adopted because it maybe complicated to deploy on irregular reduction trees. Againthe Baugh-Wooley algorithm is for only signed numbermultiplication. The array multipliers [12] and Braun arraymultipliers [13] operates only on the unsigned numbers.Thus, the requirement of the modern computer system is adedicated and very high speed multiplier unit that canperform multiplication operation on signed as well asunsigned numbers. In this paper we designed andimplemented a dedicated multiplier unit that can performmultiplication operation on both signed and unsignednumbers, and this multiplier is called as SUMBE multiplier.II. CONVENTIONAL MBE MULTIPLIERSThe new MBE recoder [1] was designed according to thefollowing analysis. Table 1 presents the truth table of thenew encoding scheme. The Z signal makes the output zero tocompensate the incorrect X2_b and Neg signals. Fig. 1presents the circuit diagram of the encoder and decoder. Theencoder generates X1_b, X2_b, and Z signals by encoding thethree x-signals. The yLSB signal is the LSB of the y signaland is combined with x-signals to determine the Row_LSBand the Neg_cin signals. Similarly, yMSB is combined with x-!000111!      111444ttthhh      IIInnnttteeerrrnnnaaatttiiiooonnnaaalll      CCCooonnnfffeeerrreeennnccceee      ooonnn      MMMooodddeeelllllliiinnnggg      aaannnddd      SSSiiimmmuuulllaaatttiiiooonnn!777888-­-­-000-­-­-777666!555-­-­-444666888222-­-­-777///111222      $$$222666...000000      ©©©      222000111222      IIIEEEEEEEEEDDDOOOIII      111000...111111000!///UUUKKKSSSiiimmm...222000111222...!!666444!!000111!      111444ttthhh      IIInnnttteeerrrnnnaaatttiiiooonnnaaalll      CCCooonnnfffeeerrreeennnccceee      ooonnn      MMMooodddeeelllllliiinnnggg      aaannnddd      SSSiiimmmuuulllaaatttiiiooonnn!777888-­-­-000-­-­-777666!555-­-­-444666888222-­-­-777///111222      $$$222666...000000      ©©©      222000111222      IIIEEEEEEEEEDDDOOOIII      111000...111111000!///UUUKKKSSSiiimmm...222000111222...!!666444!
  2. 2. signals to determine the sign extension signals. Fig. 2 showsan overview of the partial product array for an 8 × 8multiplier. The sign extension circuitry developed in [22]and [23]. The conventional MBE partial product array hastwo drawbacks: 1) an additional partial product term at the(n-2)th bit position; 2) poor performance at the LSB-part. Toremedy the two drawbacks, the LSB part of the partialproduct array is modified. Referring to Fig. 2a, the Row_LSB(gray circle) and the Neg_cin terms are combined and furthersimplified using Boolean minimization. The new equationsfor the Row_LSB and Neg_cin can be written as (1) and (2),respectively.TABLE 1: Truth Table of MBE Scheme.Row_ LSBi = yLSB (x2i-1 ⊕ x2i ) (1)(2)Fig. 1. The Encoder and Decoder for the new MBE scheme. (a) Simpleencoder (b) Decoder.The Fig. 2(a) has widely been adopted in parallelmultipliers since it can reduce the number of partial productrows to be added by half, thus reducing the size andenhancing the speed of the reduction tree. However, asshown in Fig. 1(a), the conventional MBE algorithmgenerates n/2 + 1 partial product rows rather than n/2 due tothe extra partial product bit (neg bit) at the least significantbit position of each partial product row for negativeencoding, leading to an irregular partial product array and acomplex reduction tree. Therefore, the Modified Boothmultipliers with a regular partial product array [2] produce avery regular partial product array, as shown in Fig. 3. Notonly each negi is shifted left and replaced by ci but also thelast neg bit is removed. This approach reduces the partialproduct rows from n/2 + 1 to n/2 by incorporating the lastneg bit into the sign extension bits of the first partial productrow, and almost no overhead is introduced to the partialproduct generator. More regular partial product array andfewer partial product rows result in a small and fast reductiontree, so that the area, delay, and power of MBE multiplierscan further be reduced.0 0 00 0 10 1 00 1 11 0 01 0 11 1 01 1 1bi+1 bi bi-1 value0112-2-1-10X1_a10011001X2_b01100110Z11000011Neg00001111Neg_cini = x2i+1 (x2i-1 + x2i ) (x2i-1 +yLSB ) (x2i + yLSB )Fig.3.The partial product array for 8×8 multiplier.p15 p14 p13 p12 p11 p10 p9 p8 p7 p6 p5 p4 p3 p2 p1 p02 1 0 p07 p06 p05 p04 p03 p02 p01 001 s1 p17 p16 p15 p14 p13 p12 p11 10 c01 s2 p27 p26 p25 p24 p23 p22 p21 10 c11 s3 p37 p36 p35 p34 p33 p32 p31 20 c2Fig. 2. 8× 8 MBE partial product array. (a) Traditional MBE partialproductarray. (b) New MBE partial product array.yj yj-1PijNegX1_aX2_bZx2i-1X2_bX2_aZx2ix2ix2i+1x2i-1x2i(a) (b)666555000666555000
  3. 3. III. PROPOSED SUMBE MULTIPLIERThe main goal of this paper is to design and implement8×8 multiplier for signed and unsigned numbers using MBEtechnique. Table 2 shows the truth table of MBE scheme.From table 2 the MBE logic diagram is implemented asshown in Fig. 4. Using the MBE logic and considering otherconditions the Boolean expression for one bit partial productgenerator is given by the equation 3.TABLE 2: Truth Table of MBE Scheme.Equation 3 is implemented as shown in Fig. 5. TheSUMBE multiplier does not separately consider theencoder and the decoder logic, but instead implementedas a single unit called partial product generator as shownin Fig. 5. The negative partial products are converted into 2’’scomplement by adding a negate (Ni) bit. An expression fornegate bit is given by the Boolean equation 4. This equationis implemented as shown in Fig. 6. The required signedextension to convert 2’’s complement signed multiplier intoboth signed-unsigned multiplier is given by the equations 5and 6. For Boolean equations 5 and 6 the correspondinglogic diagram is shown in Fig. 7.a8 = s_u a7 (5)b8 = s_u b7 (6)The working principle of sign extension that convertssigned multiplier signed-unsigned multiplier as follows. Onebit control signal called signed-unsigned (s_u) bit is used toindicate whether the multiplication operation is signed(3)Pij= (ai⊕ bi+1+bi-1⊕ bi ) ( bi-1⊕ bi+1+bi⊕ bi+1+ bi-1⊕ bi )s_ub7a7b8a8Fig. 7. Logic diagram of sign converter.0 0 00 0 10 1 00 1 11 0 01 0 11 1 01 1 1bi+1 bi bi-1 value0112-2-1-10X1_a10011001X2_a01100110Z11000011Neg00001110Fig. 5. Logic diagram of 1-bit partial product generatorPijbi-1 biai bi+1 bi bi+1ai-1 bi+1bi-1 biNibi+1bi-1biFig. 6. Logic diagram of negate bit generater.bi bi-1 bi bi+1 bi bi-1X1_a X2_bZFig. 4. Logic diagram of MBE.Ni = bi+1 ( bi-1 bi ) (4)666555111666555111
  4. 4. number or unsigned number. When Sign-unsign s_u) = 0, itindicates unsigned number multiplication, and when s_u = 1,it indicates signed number multiplication. It is required thatwhen the operation is unsigned multiplication the signextended bit of both multiplicand and multiplier should beextended with 0, that is a8 = a9 = b8 = b9 = 0. It is requiredthat when the operation is signed multiplication the signextended bit depends on whether the multiplicand is negativeor the multiplier is negative or both the operands arenegative. For this when the multiplicand operand is negativeand muitplier operand is positive the sign extended bitsshould be generated are s_u = 1, a7 =1, b7 = 0, a8 = a9 =1,and b8 = b9 =0. And when the multiplicand operand ispositive and muitplier operand is negative the sign extendedbits should be generated are s_u = 1, a7 =0, , b7 = 1, a8 = a9=0, and b8 = b9 =1. Table 3 shows the SUMBE multiplieroperation.Fig. 8 shows the partial products generated by partaialproduct generater circuit which is shown in Fig. 5. There are5-partial products with sign extension and negate bit Ni. Allthe 5-partial products are generated in parallel.In Fig. 8 there are 5-partial products namely X1, X2, X3,X4 and X5. These partial products are added by the CarrySave Adders (CSA) and the final stage is Carry Lookahead(CLA) adder as shown in Fig. 9. Each CSA adder takesthree inputs and produse sum and carry in parallel. There arethree CSAs, five partial products are added by the CSA treeand finally when there are only two outputs left out thenfinally CLA adder is used to produce the final result.Assuming each gate delay an unit delay, including partialproduct generator circuit delay, then the total through theCSA and CLA is 3+4 = 7 Unit delay. Thus with present VeryLarge Scale Integration (VLSI) the total delay is estimatedaround 0.7 nanosecond and the multiplier operates at gigahertz frequency.IV. SIMULATION RESULTSVerilog code is written to generate the required hardwareand to produce the partial product, for CSA adder, and CLAadder. After the successful compilation the RTL viewgenerated is shown in Fig.10.Fig. 10: RTL view of 8×8 signed-unsigned multiplier.Table 3: SUMBE operationSign-unsign Type of operation0 Unsigned multiplication1 Signed multiplicationCSA1CSA2CSA3CLAP = A×BX3 X2 X1X4X5Fig. 9. Partial product adder logic.X1X4X5X2X3Fig. 8. 8×8 multiplier for signed-unsigned number.p15 p14 p13 p12 p11 p10 p9 p8 p7 p6 p5 p4 p3 p2 p1 p0a7 a6 a5 a4 a3 a2 a1 a0b7 b6 b5 b4 b3 b2 b1 b0p08p08 p08 p07 p06 p05 p04 p03 p02 p01 p001 p18p17 p16 p15 p14 p13 p12 p11 p10 N01 p28p27 p26p25 p24p23 p22p21 p20 N11 p38 p37p36p35 p34 p33 p32 p31 p30 N2P47p46 p45 p44 p43 p42 p41p40 N3666555222666555222
  5. 5. Fig. 11 shows the simulation result of signed-unsignednumbers. Fig. 11(a) shows the simulation result of signed-unsigned number in binary. Here when the control signal s_u= 0, the 8-bit operands are considered as unsigned and theproduct of 11111111 × 11111111 = 1111111000000001.And when the control signal s_u = 1, the 8-bit operands areconsidered as signed and the product of 11111111 ×11111111 = 0000000000000001. Fig. 11(b) shows thesimulation result of unsigned operands in decimal i.e whenthe control signal s_u = 0, the 8-bit operands are consideredas unsigned and the product of 11111111 (255) × 11111111(255) = 1111111000000001 (65025), and when the controlsignal s_u = 1, the 8-bit operands are considered as signedand the product of 11111111 (-1) × 11111111 (-1) =0000000000000001 (+1).Fig. 11(c) and Fig. 11(d) shows the simulation result ofsigned- unsigned number in binary and decimal respectively.When s_u = 0, the 8-bit operands are unsigned and theproduct of 01111111 (127) × 01111111 (127) =0011111100000001 (16129). And when the control signals_u = 1, the 8-bit operands are signed and the product of11111111 (-1) × 00000001 (+1) = 1111111111111111 (-1).Fig. 11(e) and Fig. 11(f) shows the simulation result ofsigned- unsigned number in binary and decimal respectively.When s_u = 0, the 8-bit operands are unsigned and theproduct of 10000000 (128) × 10000000 (128) =0100000000000000 (16384). And when the control signals_u = 1, the 8-bit operands are signed and the product of10000000 (-128) × 01111111 (+127) = 1100000010000000(-16256).(a)(b)(c)(d)(e)(f)Fig. 11. Simulation results.666555333666555333
  6. 6. V. CONCLUSIONIn all multiplication operation product is obtained byadding partial products. Thus the final speed of the multipliercircuit depends on the speed of the adder circuit and thenumber of partial products generated. If radix 8 Boothencodind technique is used then there are only 3 partialproducts and for that only one CSA and a CLA is required toproduce the final product.REFERENCES[1] W. ––C. Yeh and C. ––W. Jen, ““High Speed Booth encoded ParallelMultiplier Design,”” IEEE transactions on computers, vol. 49, no. 7, pp.692-701, July 2000.[2] Shiann-Rong Kuang, Jiun-Ping Wang, and Cang-Yuan Guo,““Modified Booth multipliers with a Regular Partial Product Array,””IEEE Transactions on circuits and systems-II, vol 56, No 5, May 2009.[3] Li-Rong Wang, Shyh-Jye Jou and Chung-Len Lee, ““A well-tructuredModified Booth Multiplier Design”” 978-1-4244-1617-2/08/$25.00©2008 IEEE.[4] Soojin Kim and Kyeongsoon Cho ““Design of High-speed ModifiedBooth Multipliers Operating at GHz Ranges”” World Academy ofScience, Engineering and Technology 61 2010.[5] Magnus Sjalander and Per Larson-Edefors. ““The Case for HPM-BasedBaugh-Wooley Multipliers,”” Chalmers University of Technology,Sweden, March 2008.[6] Z Haung and M D Ercegovac, ““High performance Low Power left toright array multiplier design”” IEEE trans.Computer, vol 54 no3, page272-283 Mar 2005.[7] Hsing-Chung Liang and Pao-Hsin Huang, ““Testing Transition DelayFaults in Modified BoothMultipliers by Using C-testable and SICPatterns””IEEE2007, 1-4244-1272-2/07.[8] Aswathy Sudhakar, and D. Gokila, ““Run-Time ReconfigurablePipelined Modified Baugh-Wooley Multipliers,”” Advances inComputational Sciences and Technology ISSN 0973-6107 Volume 3Number 2 (2010) pp. 223––235.[9] Myoung-Cheol Shin, Se-Hyeon Kang, and In-Cheol Park, ““An Area-Efficient Iterative Modified-Booth Multiplier Based on Self-TimedClocking,”” Industry, and Energy through the project System IC 2010,and by IC Design Education Center (IDEC).[10] Leandro Z. Pieper, Eduardo A. C. da Costa, Sérgio J. M. de Almeida,““Efficient Dedicated Multiplication Blocks for2´s Complement Radix-2m Array Multipliers,”” JOURNAL OF COMPUTERS, VOL. 5, NO.10, OCTOBER 2010.[11] C R Baugh and B. A Wooley, ““ A two’’s complement parallel arraymultiplication algorithm,”” IEEE Transaction on Computers, Vol. 22,n0.12,pp 1045-1047, Dec.1973.[12] Neil H E Weste, David Harris, Ayan Banerjee, ““CMOS VLSI DesignA circuits and Systems Perspective ”” Third edition, Pearson Education,pp.347-349.[13] Pucknell Douglas A, Eshraghan, Kamran, ““Basic VLSI Design,””Thirdedition 2003, PHI Publication, pp.242-243.[14] A Chandrakasan and R Brodersen, ““ Low power CMOS digitaldesign,”” IEEE J solid state circ, vol 27 no. 4, April 1992, pp. 473-484.[15] C. S Wallace, ““A suggestion for a fast multiplier”” IEEE Transactionon Electronic Computers, pp-14-17, Feb-1974.[16] I Koren, Computer Arithmetic Algorithms. Englewood Cliffs, NewJersey, Prentice Hall, 1993, pp.99-123.[17] M D Ercegovac, and T Lang, ““ Fast multiplication without carrypropagate addition”” IEEE Transaction on computers, vol.39, no.11, pp.1385-1390. Nov 1990.[18] P.E. Blankenship, ““ A comment on a two’’s complement parallel arraymultiplication algorithm,”” IEEE Tranaction on computers, vol. 23, no.12, pp.1327, Dec 1974.[19] A. D. Booth, ““A Signed Binary Multiplication Technique”” QuartelyJournal of Mechanics and Applied Mathematics. Vol. 4, no.2pp. 236-240, 1951.[20] O. L. MacSorley, ““High Speed Arithmetic in Binary Computers,”” inProceedings of the IRE, vol. 49, no. 1, January 1961, pp. 67-97.[21] J. Liu, S. Zhou, H.Zhu, and C. ––K. Cheng. ““An Algorithmic Approachfor Generic Parallel Adders,”” in IEEE International Conference onComputer Aided Design, December 2003, pp. 734-740.[22] J. Fadavi-Ardekani, ªM×N Booth Encoded Multiplier GeneratorUsing Optimized Wallace Trees,º IEEE Trans. VLSI Systems,vol. 1,no. 2, June 1993.[23] A.A. Farooqui et al., ªGeneral Data-Path Organization of a MACUnitfor VLSI Implementation of DSP Processors,º Proc. 1998IEEEIntl Symp. Circuits and Systems, vol. 2, pp. 260-263, 1998..666555444666555444