FPGA implementation of Binary Coded Decimal Digit Adders and Multipliers


Osama D. Al-Khaleel∗, Nuh H. Tulić§, and Khaldoon M. Mhaidat∗
Department of Computer Engineering, Jordan University of Science and Technology, Irbid 22110, Jordan
Email∗: {oda,mhaidat}@just.edu.jo
Email§: noohtulic@hotmail.com

Abstract—Decimal arithmetic has a significant impact on the overall performance of today's financial and commercial applications. Decimal addition and multiplication are the main operations used in any decimal arithmetic algorithm, and decimal digit adders and decimal digit multipliers are usually the building blocks of higher-order decimal adders and multipliers. FPGAs provide an efficient hardware platform that can be employed to accelerate decimal algorithms. In this paper, two decimal digit adder designs and one decimal digit multiplier design are proposed. The designs were described, functionally tested, and implemented using VHDL and Xilinx ISE 10.1, targeting a Xilinx Virtex-5 XC5VLX30-3 FPGA. Implementation results and a comparison with existing designs are provided.

978-1-4673-0862-5/12/$31.00 ©2012 IEEE

I. INTRODUCTION

Although binary arithmetic dominates in most machines, it is not suitable for commercial, banking, and business applications because of the unacceptable errors produced by inexact decimal-to-binary conversion. In [1], a real example shows the severe effect of these approximations: if a telephone company approximates a 5% sales tax on an item (such as a $0.70 telephone call), the yearly loss is over $5 million.

Software emulation of decimal arithmetic, as a solution to the fractional approximation problem, is not fast enough. Compared to hardware speeds, the performance of existing decimal arithmetic software libraries is very poor: software emulation is 100 to 1000 times slower than a hardware implementation [2]. Currently, decimal arithmetic is usually implemented in software, whereas binary arithmetic is implemented in hardware.

Further, the survey in [3] showed that 55% of the numeric data columns in the databases of 51 major organizations were decimal data types, and a further 43.7% were integer types that could have been stored as decimals. In spite of this, decimal floating-point arithmetic is currently not supported by most microprocessors. A decimal floating-point coprocessor could be included in the machines that handle these calculations to speed up these applications.

To keep pace with the growing use of decimal arithmetic, efficient decimal algorithms have to be investigated. Decimal digit adders and decimal digit multipliers are key components of any decimal hardware that supports decimal arithmetic applications. Therefore, this work focuses on delivering efficient BCD digit units to be used in high-performance decimal hardware accelerators.

The two main contributions of this work are two new BCD digit adders and one new BCD digit multiplier. These designs are described and simulated using the VHDL hardware description language. They are all implemented on an FPGA and compared with existing designs.

The rest of this paper is organized as follows: Section II presents the related work. Section III presents the CAD flow adopted in designing the proposed modules. Sections IV and V describe the proposed BCD digit adders and the proposed BCD digit multiplier, respectively. The experimental results and comparisons are provided in Section VI. Finally, the conclusion is given in Section VII.

II. RELATED WORK

The addition of two n-digit BCD numbers applies the same procedure in every digit position. After the binary addition of a decimal digit pair, the result is checked for correctness and, if it exceeds 9, corrected by adding 6. Then, the correct decimal carry output is passed to the next more significant digit pair, to be added with the two decimal digits in that position. The conditional addition of 6 in each decimal digit position can be implemented using any 4-bit binary carry-propagate adder architecture for each decimal digit. The long carry chain of such a BCD adder slows down the addition operation. Therefore, to improve the speed of BCD adders, designers have proposed several enhancements to the basic BCD addition algorithm. Direct decimal addition [4], decimal speculative addition [5], [6], and conditional speculative decimal addition [7] are examples of such refinements.

The decimal digit multiplier is an integral building block of any decimal multiplier. It produces two decimal digits as the result of multiplying one multiplier decimal digit by one multiplicand decimal digit. All well-known decimal digit multipliers use BCD-8421 encoding to represent the decimal digits. Many BCD digit multiplier approaches exist in the literature.
In the digit-by-digit look-up table approach, the bits of the multiplier BCD digit and the multiplicand BCD digit are used to address a look-up table that produces the BCD digit product [8]. In [9], an iterative BCD multiplier using the digit-by-digit look-up table technique is presented. A faster implementation was described in [8]. It uses the same look-up table as [9], but it adds the partial products with a carry-save addition scheme instead of the carry-propagate adder used in [9]. However, due to the wide range of digits handled by the decimal system, digit-by-digit look-up table multipliers are inefficient in terms of both area and speed.

Another approach is the BCD digit multiplier based on binary multiplication. G. Jaberipur and A. Kaivani [10] provided a novel BCD digit multiplier cell design. Their design first computes the binary product of the two BCD digits and then converts the result to BCD form. They provided their own binary-product-to-BCD conversion unit and claimed that it is faster and simpler than previous designs. They presented a delay-optimized binary-product BCD digit multiplier as well as an area-optimized one. Their delay-optimized binary product of two BCD digits uses the BCD constraints $a_3 a_2 = 0$ and $a_3 a_1 = 0$ (where $a_3a_2a_1a_0$ is the BCD digit $A$) to simplify the ordinary binary partial product tree of $A \times B$.

In the digit recoding approach, the fixed-point BCD multiplier in [11] converts both BCD operands (multiplier and multiplicand) to signed digits in the range $[-5, 5]$ and uses a signed-digit by signed-digit multiplier to generate the partial products in signed-digit form, with two digits per position. Each partial product is then added to the accumulated sum of the previous partial products via a signed-digit adder.

The design presented in [12] also uses a recoding scheme to generate partial products. Each multiplier digit is recoded as $B_i = B_H + B_L$, with $B_H \in \{0, 5, 10\}$ and $B_L \in \{-2, -1, 0, 1, 2\}$, so that only the multiples $2A$ and $5A$ of the multiplicand are required. The partial products are accumulated by a tree of redundant adders. Finally, the product is obtained by converting the outputs of the carry-save tree into BCD format.

A new family of parallel decimal multipliers was proposed in [13]. Three different signed-digit recodings of the multiplier were proposed, with different trade-offs between fast partial product generation and the number of generated partial products. The BCD-8421 multiplier operand was recoded into minimally redundant signed-digit radix-10, radix-4, and radix-5 representations. The signed-digit radix-4 and radix-5 recodings allow combined binary/decimal multiplication. The radix-10 signed-digit recoder (digits in $[-5, 5]$) produces only $d + 1$ partial products (where $d$ is the number of multiplier digits), but requires ripple addition to produce complex multiples. The signed-digit radix-4 and radix-5 recodings (with digits in $[-2, 2]$) generate partial products in a few levels of combinational logic; however, they produce $2d$ partial products.

[Fig. 1. CAD flow adopted in designing the different BCD digit adders and multiplier: building the truth table for the Boolean functions → generating optimized Boolean expressions → generating HDL code → design simulation → design implementation on FPGA.]

III. CAD FLOW

The CAD flow followed in designing the proposed BCD digit adders and BCD digit multiplier is illustrated in Fig. 1. The first step is to generate a truth table for the Boolean functions that represent the individual output bits of each design. The truth table is used to generate optimized Boolean expressions using any logic optimization technique. The output of the logic optimization stage is then converted into a VHDL description of the entire design. The functionality of the design is then verified by simulation with a VHDL simulator. Finally, the design is implemented on the FPGA using the FPGA CAD tools.

IV. PROPOSED BCD DIGIT ADDERS

The main problem in decimal addition, and the one that increases the delay, is the need for correction when the result exceeds the permitted BCD range (decimal 9). This correction adds the binary number $(0110)_2$ to the result. One contribution of this work is the design of new high-speed, area-optimized, correction-free BCD digit adders that can be employed in different decimal applications. Throughout this section, two different design configurations for BCD digit adders are discussed.

A. Configuration 1: Direct Boolean Expressions BCD Digit Adder

In this approach, the idea is to design a direct BCD digit adder as combinational logic with nine input bits and five output bits. The nine inputs are the two BCD input digits $A$ and $B$ plus the decimal carry input $c_{in}$; the five outputs are the BCD digit of the decimal sum $S$ plus the decimal carry output $c_{out}$. The combinational logic of this adder is constructed by extracting the Boolean expressions for the BCD addition result directly from the BCD input operands.
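The contrast between the traditional correction-based digit addition of Section II and the direct formulation above can be sketched in a few lines of Python. This is our own illustrative model, not the authors' VHDL, and the names `traditional_bcd_add` and `direct_truth_table` are hypothetical. The first function performs a binary addition followed by the conditional +6 correction. The second tabulates the nine-input, five-output function of Configuration 1 over all $2^9$ input patterns, leaving rows with a non-BCD digit as don't cares (`None`). The bit packing (index $c_{in}\,a_3a_2a_1a_0\,b_3b_2b_1b_0$, value $c_{out}\,s_3s_2s_1s_0$) is an assumption of the sketch.

```python
from typing import Dict, Optional, Tuple


def traditional_bcd_add(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Correction-based digit addition: binary add, then +6 if the result exceeds 9."""
    total = a + b + cin            # binary result, 0..19 for valid BCD digits
    if total > 9:                  # outside the BCD range, so correct it
        total += 6                 # add (0110)2; the decimal carry lands in bit 4
    return (total >> 4) & 1, total & 0xF  # (c_out, S)


def direct_truth_table() -> Dict[int, Optional[int]]:
    """Configuration 1 as a 9-input/5-output table: c_in|A|B -> c_out|S."""
    table: Dict[int, Optional[int]] = {}
    for cin in range(2):
        for a in range(16):
            for b in range(16):
                index = (cin << 8) | (a << 4) | b
                if a > 9 or b > 9:
                    table[index] = None                # invalid BCD operand: don't care
                else:
                    cout, s = divmod(a + b + cin, 10)  # decimal result, no +6 step
                    table[index] = (cout << 4) | s
    return table


if __name__ == "__main__":
    table = direct_truth_table()
    print(len(table), sum(v is not None for v in table.values()))  # 512 200
    # The two descriptions agree on every valid input combination.
    for cin in range(2):
        for a in range(10):
            for b in range(10):
                cout, s = traditional_bcd_add(a, b, cin)
                assert table[(cin << 8) | (a << 4) | b] == (cout << 4) | s
    print(format(table[(0 << 8) | (0b0110 << 4) | 0b0111], "05b"))  # 6 + 7 -> 10011
```

The 312 don't-care rows (512 minus 200 valid ones) are what the logic-optimization step of the CAD flow can exploit; in a real flow the table would be handed to a logic minimizer rather than evaluated in Python.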
Our approach represents numbers using only the BCD form, without any relation to the binary numbering system. For example, the addition $6 + 7 = 13$ is translated to $(0110 + 0111 = 1\,0011)_{BCD}$. The output $(1\,0011)_{BCD}$ is the BCD representation of the decimal number 13: the most significant bit is the decimal carry output generated by the addition, while the other four bits are the BCD sum digit. Accordingly, this is a correction-free technique. The addition result is produced directly in BCD form, and the need for correction is resolved internally by the Boolean expressions of the addition result.

The truth table for all output logic functions is constructed for all possible input combinations. Since there are nine input bits, the number of possible combinations is $2^9 = 512$. Many of these combinations are not valid, because a decimal digit is less than $(10)_{10}$, while a 4-bit number can take any value from 0 to $(15)_{10}$. When the input is not valid, the output is set to don't care, which allows the output logic functions to be minimized further. Some entries of the truth table are listed in Table I, where the two input BCD digits are $A = a_3a_2a_1a_0$ and $B = b_3b_2b_1b_0$. The output consists of the BCD sum digit $S = s_3s_2s_1s_0$ and the decimal carry output $c_{out}$. Don't-care cases, represented by dashes, appear in the output whenever the input includes an invalid BCD digit.

TABLE I
THREE DIFFERENT ENTRIES FROM THE TRUTH TABLE OF THE BCD DIGIT ADDER: CONFIGURATION 1

Inputs                                     | Outputs
$c_{in}$ | $a_3a_2a_1a_0$ | $b_3b_2b_1b_0$ | $c_{out}$ | $s_3s_2s_1s_0$
0        | 0110           | 1000           | 1         | 0100
1        | 1001           | 1001           | 1         | 1001
0        | 1100           | 0001           | -         | ----

B. Configuration 2: Minimal-area BCD Digit Adder

To obtain optimized speed/area results on FPGAs, it is important to understand the internal architecture of the FPGA device used for synthesis and implementation. We target the Virtex-5 FPGA family, which offers powerful features in the FPGA market [14]. Virtex-5 FPGAs have 6-input look-up tables [14], and therefore any function with six or fewer inputs occupies exactly one look-up table. The BCD digit adder of Configuration 1 has nine-input functions, which complicates the overall design and increases its area and delay. We used this observation to find a more optimized BCD addition architecture.

To get around this obstacle, we propose a two-level BCD digit adder. The first level has five inputs and three outputs, which guarantees that each output of this level requires just one look-up table. The second level has six inputs (two of them from the first level) and four outputs; therefore, four look-up tables are needed for this stage. The overall occupied area is then seven look-up tables, which is the same as the area of the traditional BCD digit adder, as shown in Section VI. This is illustrated in Fig. 2, where the two decimal digits $A = a_3a_2a_1a_0$ and $B = b_3b_2b_1b_0$ are added along with the decimal carry input $c_{in}$ to produce the BCD sum digit $S = S_3S_2S_1S_0$ and the decimal carry output $c_{out}$. The first level adds the two least significant bits of $A$ and $B$ ($a_1a_0$ and $b_1b_0$) along with the carry input $c_{in}$, and produces three bits: $x_1$, $x_0$, and the least significant bit $S_0$ of the final BCD sum digit. The two temporary bits $(x_1, x_0)$ are used, along with the two most significant bits of the two BCD input digits ($a_3a_2$ and $b_3b_2$), as inputs to the second level. The second level adds these inputs and produces the three most significant bits of the final BCD sum digit ($S_3S_2S_1$) and the final decimal carry output $c_{out}$.

[Fig. 2. The two levels of the implementation of the proposed BCD digit adder, Configuration 2: first-level 6-input LUTs produce $S_0$, $x_0$, and $x_1$ from $a_1$, $a_0$, $b_1$, $b_0$, and $c_{in}$; second-level 6-input LUTs produce $S_1$, $S_2$, $S_3$, and $c_{out}$ from $a_3$, $a_2$, $b_3$, $b_2$, $x_1$, and $x_0$.]

Note that the two bits $(x_1, x_0)$ from the first level are inputs to the second level because they are involved in the correction operation. There is, however, no explicit correction step in this addition approach: each of the two levels is implemented using the CAD flow presented in Section III, so the correction is hidden inside the logic.

V. PROPOSED BCD DIGIT MULTIPLIER

The binary coded decimal digit multiplier is a fundamental cell in BCD multiplication. It multiplies two BCD digits to produce a two-digit BCD product. Several BCD digit multiplier approaches were discussed in Section II. In this section, we introduce a new BCD digit multiplier architecture that operates directly on the BCD representation.

A. Direct Boolean Expressions BCD Digit Multiplier

In our proposed BCD digit multiplier, the word "direct" means that neither "first finding the binary multiplication result and then converting the product to BCD form", as
proposed in [10], nor "any recoding process", as designed in [11] and [12], is needed. Instead, we use simplified Boolean expressions, obtained with the CAD flow of Section III, to perform the "direct" function. The two operands are the decimal digits $A = a_3a_2a_1a_0$ and $B = b_3b_2b_1b_0$, and the output $P = A \times B$ is the 8-bit value $P_7P_6P_5P_4P_3P_2P_1P_0$ (two BCD digits). Since the input is 8 bits wide, the truth table has $2^8 = 256$ combinations. Only 100 of these combinations are valid; all outputs for the invalid combinations are set to don't care. Some entries from the truth table are listed in Table II, where don't cares are represented by dashes.

TABLE II
THREE DIFFERENT ENTRIES FROM THE TRUTH TABLE OF THE BCD DIGIT MULTIPLIER

Inputs                          | Outputs
$a_3a_2a_1a_0$ | $b_3b_2b_1b_0$ | $P_7P_6P_5P_4$ | $P_3P_2P_1P_0$
0110           | 1000           | 0100           | 1000
1001           | 1001           | 1000           | 0001
1100           | 0001           | ----           | ----

Since the output functions in this case are functions of 8 variables, most of them require a hierarchy of LUTs on FPGAs with 6-input LUTs. Fig. 3 shows the expected implementation of the output functions on 6-input LUTs. Any output function that depends on more than six input variables needs a hierarchy of LUTs; an example is $P_1$, which depends on all eight input variables. On the other hand, some functions depend on only two variables, like $P_0$, or on four variables, like $P_7$, so each of these two outputs consumes a single 6-input LUT.

[Fig. 3. Expected implementation of the output functions of the BCD digit multiplier on 6-input LUTs: $P_0$ and $P_7$ each map to a single 6-input LUT, while $P_1$, a function of all 8 bits of $A$ and $B$, needs a hierarchy of 6-input LUTs.]

The simplified Boolean expressions for the BCD digit adder and multiplier designs are reported in our earlier work [15].

VI. EXPERIMENTAL RESULTS AND COMPARISONS

All our designs, and the other relevant published designs, were described in the VHDL hardware description language and simulated to ensure correct functionality. They were then synthesized with Xilinx ISE 10.1 and implemented on a Virtex-5 XC5VLX30-3 FPGA, optimized for speed.

Synthesis results of our BCD digit adders versus other published decimal digit adders are shown in Tables III and IV. These results show that our first design, the direct Boolean expressions BCD digit adder, is the fastest of the compared adders. However, it occupies a large area. Our proposed minimal-area BCD digit adder reduces this area and gives the best area figures.

We compare our proposed BCD digit adders with three other representative adders: the traditional BCD digit adder (which applies the basic BCD addition algorithm), the conditional speculative BCD digit adder [7], and the [3:2] BCD-4221 decimal adder proposed in [13]. The authors of [13] stated that their most recent (Q-T)-based conditional speculative addition algorithm [7] has low latency and requires less hardware than other alternatives. They proposed two implementations of their algorithm and preferred the (Q-T) carry tree implementation over the parallel prefix carry tree implementation for their adder. However, due to insufficient published details about the (Q-T) carry tree, we implemented their other architecture, modified for decimal addition.

TABLE III
DELAY FOR DIFFERENT DECIMAL DIGIT ADDERS (ns)

Decimal digit adder                  | Delay (ns)
Proposed: Configuration 1            | 1.337
Proposed: Configuration 2            | 1.566
Traditional BCD adder                | 2.562
Conditional speculative adder of [7] | 2.757
[3:2] BCD-4221 adder of [13]¹        | 1.347

TABLE IV
AREA FOR DIFFERENT DECIMAL DIGIT ADDERS (LUTs)

Decimal digit adder                  | Area (number of LUTs)
Proposed: Configuration 1            | 11
Proposed: Configuration 2            | 7
Traditional BCD adder                | 7
Conditional speculative adder of [7] | 19
[3:2] BCD-4221 adder of [13]¹        | 11

¹ Requires extra overhead for conversion to BCD-8421 encoding.

The results in Tables III and IV show that both of our correction-free BCD digit adders are faster than the traditional BCD digit adder and the conditional speculative BCD digit adder [7]. Moreover, they are better than the conditional speculative BCD digit adder in both speed and area. The proposed minimal-area BCD digit adder has the same area as the traditional decimal adder (which has the minimal area), but with 39% lower delay (1.566 ns versus 2.562 ns).
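Why the two-level structure of Configuration 2 needs no explicit correction can be checked exhaustively with a short behavioral model. The Python sketch below is our own reading of the description in Section IV-B, not the authors' VHDL. It assumes that the first level (hypothetical `level1`) forms the plain binary sum $a_1a_0 + b_1b_0 + c_{in}$ and outputs its bits as $x_1x_0S_0$, and that the second level (hypothetical `level2`) maps $(a_3a_2, b_3b_2, x_1x_0)$ to $S_3S_2S_1$ and $c_{out}$. Writing $D = A + B + c_{in}$, the first-level sum has the same parity as $D$, so $S_0$ is already final, and $h = 2(a_3a_2 + b_3b_2) + x_1x_0 = (D - S_0)/2 \le 9$. Hence $c_{out} = 1$ exactly when $h \ge 5$, and $S_3S_2S_1 = h - 5c_{out}$. Each level has at most six inputs, so each of its outputs fits one 6-input LUT.

```python
from typing import Tuple


def level1(a_lo: int, b_lo: int, cin: int) -> Tuple[int, int]:
    """First level (5 inputs): a1a0 + b1b0 + c_in -> (x1x0, S0). Assumed binary sum."""
    t = a_lo + b_lo + cin      # 0..7, fits in the three bits x1 x0 S0
    return t >> 1, t & 1


def level2(a_hi: int, b_hi: int, x: int) -> Tuple[int, int]:
    """Second level (6 inputs): a3a2, b3b2, x1x0 -> (c_out, S3S2S1)."""
    h = 2 * (a_hi + b_hi) + x  # equals (A + B + c_in - S0) / 2, at most 9 for BCD inputs
    cout = 1 if h >= 5 else 0
    return cout, h - 5 * cout  # upper three bits of the sum digit, 0..4


def config2_add(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Two-level BCD digit adder: returns (c_out, S) with no +6 correction step."""
    x, s0 = level1(a & 0b11, b & 0b11, cin)
    cout, s_hi = level2(a >> 2, b >> 2, x)
    return cout, (s_hi << 1) | s0


if __name__ == "__main__":
    # Exhaustive check against decimal addition over all 200 valid input combinations.
    for cin in range(2):
        for a in range(10):
            for b in range(10):
                assert config2_add(a, b, cin) == divmod(a + b + cin, 10)
    print(config2_add(6, 7, 0))  # (1, 3)
```

Because the loop covers every valid input combination, a model like this could also act as a golden reference when the VHDL of the two-level adder is simulated.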
The speed advantage of all our BCD digit adders results from applying correction-free addition techniques.

The [3:2] BCD-4221 decimal adder of [13] shows good area/speed results compared to our first proposed adder, the direct Boolean expressions BCD digit adder. However, that adder is dedicated to [3:2] BCD-4221 addition, not to [2:1] BCD-8421 addition as ours is.

Tables V and VI show synthesis results for our BCD digit multiplier and other high-performance BCD digit multipliers. Our proposed direct Boolean expressions BCD digit multiplier outperforms the other BCD digit multipliers in terms of both speed and area.

TABLE V
DELAY FOR DIFFERENT DECIMAL DIGIT MULTIPLIERS (ns)

Decimal digit multiplier             | Delay (ns)
Proposed                             | 1.336
BCD digit multiplier of [10]         | 4.414
BCD digit multiplier of [13]¹        | 1.81

TABLE VI
AREA FOR DIFFERENT DECIMAL DIGIT MULTIPLIERS (LUTs)

Decimal digit multiplier             | Area (number of LUTs)
Proposed                             | 25
BCD digit multiplier of [10]         | 27
BCD digit multiplier of [13]¹        | 28

For comparison, we implemented two efficient and recent BCD digit multipliers, proposed in [10]² and [13]. The delay-optimized BCD digit multiplier of [10], which calculates the binary product of the two BCD digits and then converts it to BCD form, gives the worst speed/area results. It consists of multiple components, which increases the number of gate levels in the critical path and hence the overall delay. The signed-digit (SD) radix-5 BCD digit multiplier of [13] was also implemented for comparison. The authors of [13] stated that their radix-5 BCD digit multiplier gives the best speed figures among their radix-10, radix-4, and radix-5 BCD digit multipliers. It produces four BCD-4221 digits for every BCD digit multiplication. Thus, for a fair comparison, extra hardware must be added to generate only two BCD digits, as almost all other BCD digit multipliers in the literature do.

² We note that the circuit diagram of this design in [10] contains an error: when it is implemented exactly as drawn, some input combinations give incorrect results.

VII. CONCLUSION

In this paper, two new BCD digit adders and one new BCD digit multiplier are designed to speed up decimal arithmetic applications on FPGAs. Each design is described, verified, and tested for correct functionality using VHDL coding and simulation. The designs are implemented using the Xilinx ISE 10.1 CAD tool targeting a Xilinx Virtex-5 XC5VLX30-3 FPGA. Experimental results are provided, and a comparison with other existing designs shows that the proposed designs outperform their counterparts in terms of speed and area.

REFERENCES

[1] IBM Corporation, "Decimal FAQ," http://www2.hursley.ibm.com/decimal/decifaq1.html.
[2] M. F. Cowlishaw, "Decimal floating-point: Algorism for computers," in Proceedings of the 16th IEEE Symposium on Computer Arithmetic (ARITH-16 '03), Washington, DC, USA, 2003, pp. 104–, IEEE Computer Society.
[3] A. Tsang and M. Olschanowsky, "A study of database 2 customer queries," IBM Technical Report, IBM Santa Teresa Laboratory, San Jose, CA, April 1991.
[4] M. S. Schmookler and A. Weinberger, "High speed decimal addition," IEEE Transactions on Computers, vol. 20, pp. 862–866, 1971.
[5] H. Wetter, W. Bultmann, W. Haller, and A. Worner, "Binary and decimal adder unit," 2001.
[6] J. Thompson, I. Karra, and M. J. Schulte, "A 64-bit decimal floating-point adder," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2004, pp. 297–298.
[7] A. Vázquez and E. Antelo, "Conditional speculative decimal addition," Nancy, France, 2006, pp. 47–57.
[8] R. H. Larson, "High speed multiply using four input carry save adder," IBM Technical Disclosure Bulletin, pp. 2053–2054, 1973.
[9] R. H. Larson, "Medium speed multiply," IBM Technical Disclosure Bulletin, p. 2055, 1973.
[10] G. Jaberipur and A. Kaivani, "Binary-coded decimal digit multipliers," IET Computers and Digital Techniques, vol. 1, no. 4, pp. 377–381, 2007.
[11] E. M. Schwarz, "Decimal multiplication with efficient partial product generation," in Proceedings of the 17th IEEE Symposium on Computer Arithmetic (ARITH '05), Washington, DC, USA, 2005, pp. 21–28, IEEE Computer Society.
[12] T. Lang and A. Nannarelli, "A radix-10 combinational multiplier," in Proceedings of the 40th Asilomar Conference on Signals, Systems, and Computers, Oct. 2006, pp. 313–317.
[13] A. Vázquez, E. Antelo, and P. Montuschi, "A new family of high performance parallel decimal multipliers," in Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH '07), Washington, DC, USA, 2007, pp. 195–204, IEEE Computer Society.
[14] Xilinx Inc., "Virtex-5 data sheet," http://www.xilinx.com/support/documentation/datasheets/ds100.pdf.
[15] N. H. Tulić, "FPGA implementations of decimal arithmetic cells," M.S. thesis, Jordan University of Science and Technology, May 2009.