2009 IEEE Computer Society Annual Symposium on VLSI

A High Performance Unified BCD and Binary
Adder/Subtractor
An improved architecture for efficient Binary Coded Decimal (BCD) addition/subtraction is presented that also performs binary addition/subtraction without any extra hardware.


  1. 2009 IEEE Computer Society Annual Symposium on VLSI

     A High Performance Unified BCD and Binary Adder/Subtractor

     Anshul Singh, Aman Gupta, Sreehari Veeramachaneni, M.B. Srinivas*
     Centre for VLSI and Embedded System Technologies (CVEST), International Institute of Information Technology (IIIT), Gachibowli, Hyderabad, 500032, India.
     * Department of Electronics and Communication Engg, Birla Institute of Technology and Science (BITS), Hyderabad Campus, Hyderabad, India
     Email: {anshul_singh, aman}@students.iiit.ac.in, srihari@research.iiit.ac.in, srinivas@iiit.ac.in

     Abstract- Decimal data processing applications have grown exponentially in recent years, thereby increasing the need for hardware support for decimal arithmetic. In this paper, an improved architecture for efficient Binary Coded Decimal (BCD) addition/subtraction is presented that performs binary addition/subtraction without any extra hardware. The architecture works for both signed and unsigned numbers. The design is runtime reconfigurable and maximum utilization of the hardware is a feature of the architecture. Simulation results show that the proposed architecture is at least 32% better in terms of power-delay product than the existing designs.

     I. INTRODUCTION

     Fast decimal data processing needs hardware that supports decimal arithmetic. Recently, specifications for decimal floating point arithmetic have been added to the draft revision of the IEEE-P754 standard for floating point arithmetic [2].

     Extensive work has been done on BCD arithmetic, especially on adders/subtractors. Some of the initial contributions came from Schmookler et al. [1] and Adiletta et al. [3]. Later, designs of combined BCD and Binary adders were presented by Levine et al. and Anderson. The first BCD sign-magnitude adder/subtractor was designed by Grupe [11]. An area efficient sign-magnitude adder was later developed by Hwang [8], as shown in Fig. 1. The area occupied by this design was the least amongst all the previous designs.

     Flora [5] followed the principle of carry select adders and came up with a design which concurrently calculated two results, one assuming the presence of an input carry and the other its absence. Fischer et al. [4] (Fig. 2) later came up with a compact design that employed only one adder, but latency was a problem as it had to use an additional correction block.

     During the last decade various BCD adder/subtractor circuits have been developed for the IBM microprocessors based on the design presented by Haller et al. [6]. This architecture is shown in Fig. 3. Recently, Haller et al. [7] optimized the carry chain in the same architecture, which slightly reduced the delay but increased the area of the unit.

     [Figure 1. Hwang's proposal [8]]
     [Figure 2. Fischer's proposal [4]]
     [Figure 3. Haller's proposal [6,7]]

     978-0-7695-3684-2/09 $25.00 © 2009 IEEE, DOI 10.1109/ISVLSI.2009.40
  2. The design of the Universal Adder (Fig. 4) proposed by D.R. Humberto et al. [12] performs effective addition/subtraction operations on unsigned, sign-magnitude, and various complement representations. This design overcomes the limitations of previously reported approaches that produce some of the results in complement representation when operating on sign-magnitude numbers. However, the design has high latency.

     [Figure 4. Humberto's proposal [12]]

     Sreehari et al. [9] recently came up with prefix logic based BCD adders and proposed a novel unified BCD and binary adder-subtractor [10], which is considered the fastest unified adder in the literature so far. The architecture is divided into three major parts: the pre-computation stage, the prefix network and the post-computation stage. This architecture is illustrated in Fig. 5. The pre-computation block consists of logic to compute propagate and generate signals for both BCD and Binary addition/subtraction.

     [Figure 5. Architecture of the existing Unified BCD and Binary Adder/Subtractor [10]]

     The pre-computation stage of the architecture is not clearly presented by the authors in the paper [10], and it is assumed that they have used the same P-G block presented in [9]. The P-G block uses a Carry Merge block, CM (as shown in Fig. 6), which is used for group propagate and generate. The CM block is basically the Black cell used in the proposed design. It can be easily observed that the delay for generating the carry out from CM1 is the same as that for the second ripple carry adder, thereby making the CM1 block redundant. Further, replacing the CM3 block by the grey cell (GC) will reduce the hardware without affecting the functionality of the circuit.

     [Figure 6. The existing P-G block [9]]

     A wide variety of prefix networks is available depending on the requirements of the designer. The Sklansky network is chosen by the authors for reduction in delay.

     The post-computation block proposed by Sreehari et al. [10] (Fig. 7) uses a 4 bit CLA to add the two numbers and calculate the sum/difference and the carry out bits for each stage. But these bits are already calculated in the pre-computation block and the prefix network, thereby making more than half of the post-computation block redundant. Removing these redundancies from the design can increase the performance of the architecture considerably for each stage.

     [Figure 7. Post-computation block [10]]
  3. This paper presents a modified version of this unified adder, which is shown to perform better by at least 32% in power-delay product. The rest of the paper is organized as follows: Section 2 describes the algorithm for the unified adder, while Section 3 describes the proposed architecture. Simulation results for the proposed and existing circuits are given in Section 4 and comparisons are carried out.

     II. Algorithm for Unified BCD/Binary Adder/Subtractor

     The main objective of the algorithm is to perform efficient BCD addition/subtraction. In the proposed design, however, binary addition/subtraction is automatically taken care of without any extra hardware.

     As BCD digits are 4 bits in length, all the operations, be it BCD addition/subtraction or binary addition/subtraction, are done on 4 bit numbers. The algorithm divides the proposed design into three major parts: the P-G Block, the Prefix Block and the Correction Block, as shown in Fig. 8. The P-G block generates signals named propagate (P) and generate (G) for every 4 bits. These signals are used by the prefix network for generating the carry out for each stage. The P and G for a stage denote whether the stage propagates or generates the carry/borrow respectively. Along with generating these signals, the P-G block produces the sum/difference of the 4 bit numbers, which is used directly by the correction logic, unlike in the previous design [10]. The P-G block itself uses prefix logic to generate the P and G signals for 4 bit numbers.

     [Figure 8. Architecture of Unified BCD/Binary Adder/Subtractor]

     The concept of propagate and generate for the different cases is illustrated below with equations and examples.

     For BCD addition (A and B are 4 bit numbers):
       P = 1 if A + B = 9
       G = 1 if A + B > 9

     [Figure 9. Examples of BCD addition illustrating the concept of 4 bit propagate and generate. Generate (A + B > 9): 36 + 38 = 74 with carry-in 0 and 75 with carry-in 1. Propagate (A + B = 9): 36 + 33 = 69 with carry-in 0 and 70 with carry-in 1.]

     For Binary addition/subtraction and BCD subtraction (C and D are 4 bit numbers):
       P* = 1 if C + D = 15
       G* = 1 if C + D > 15

     For the case of subtraction, D is the 2's complement of the original subtrahend. For BCD subtraction P* and G* remain the same as in binary addition/subtraction because BCD subtraction is treated as Binary subtraction for the first two stages.

     [Figure 10. Example of Binary addition/subtraction illustrating the concept of 4 bit propagate and generate for BCD subtraction / Binary addition/subtraction. Propagate (C + D = 1111): 1011 1100 + 0010 0011 = 1101 1111 with carry-in 0 and 1110 0000 with carry-in 1. Generate (C + D > 1111): 1011 1100 + 0010 0111 = 1110 0011 with carry-in 0 and 1110 0100 with carry-in 1.]

     These control signals are then sent to the prefix network, which calculates the group propagate and generate using the formulas
       $G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$
       $P_{i:j} = P_{i:k} \cdot P_{k-1:j}$
     where $i \ge k > j$.

     The group $P_{k:0}$ and $G_{k:0}$ bits denote whether the first k stages propagate or generate the carry/borrow. $G_{k:0}$ is the carry out of the kth stage, i.e. $C_k = G_{k:0}$, where $C_k$ is the carry out of the kth stage.

     After all the carry/borrow bits are obtained, they are fed to the correction stage, which along with the sum/difference bits from the P-G block gives out the final result. The first operation in the correction block is to add the incoming carry/borrow from the previous stage to the sum/difference bits. This is implemented using a carry select adder to reduce the delay. After this stage the correct binary outputs are obtained, but for BCD addition/subtraction further corrections are to be made to obtain the correct BCD result. For BCD addition, $(0110)_2$ or $(6)_{10}$ is added to the binary sum if it exceeds $(1001)_2$ or $(9)_{10}$ to get the correct BCD sum.

     BCD subtraction in the first block is treated as binary subtraction and the difference is obtained by the 2's complement technique. The only thing which has to be taken care of is that the magnitude of the subtrahend should always be smaller than that of the minuend. If a digit of the minuend is greater than that of the subtrahend, the binary output for that digit is the correct BCD output and there is no need for any correction. But if a digit of the subtrahend is greater than that of the minuend, then $(1010)_2$ or $(10)_{10}$ has to be added to the binary output for that digit to get the correct BCD difference. To detect the relative magnitude of the minuend and the subtrahend of a P-G block, the carry out of that stage is checked. The following example illustrates the above algorithm.

     Let A (minuend) = 5 5 6 and B (subtrahend) = 2 3 9.
     In BCD format:
       A = 0101 0101 0110
       B = 0010 0011 1001
     Treating these numbers as binary, the 2's complement of B, say C, is taken:
       C = 1101 1100 0111
     Next, C is added to the minuend A and the correction is done where needed.

     [Figure 11. Illustrating the proposed algorithm for BCD subtraction. A + C gives the binary output 0011 0001 1101. The two most significant digits produce a carry out and need no correction. The least significant digit produces no carry out, so 1010 is added to it, giving the correct BCD output 0011 0001 0111.]

     Hence the final result = $(317)_{10}$.

     Signed numbers are taken care of by the control logic at the beginning, which takes the two sign bits and OpSelect (Operation Select) as inputs to compute the control signal (SUB/ADD) that specifies the effective operation to be performed by the hardware. The effective operation is calculated by the equation
       $\text{SUB/ADD} = A_s \oplus B_s \oplus \text{OpSelect}$
     where OpSelect is 1 when the operation is subtraction and 0 when the operation is addition, and $A_s$ and $B_s$ are the sign bits of the numbers under operation. The control signal SUB/ADD is 1 when the effective operation is subtraction and 0 when the effective operation is addition.

     III. Proposed Architecture of the Unified BCD/Binary Adder/Subtractor

     The architecture, as discussed before, consists of three major blocks, i.e. the P-G block, the Prefix block and the Correction block (Fig. 8). The architecture of the P-G block is shown in Fig. 12. Each block takes in 8 bits, 4 bits of each number, and generates the propagate and generate signals for BCD addition (P and G) and for BCD subtraction/Binary addition/subtraction (P* and G*), and also the sum or difference bits (S4 to S1 in Fig. 12).

     The logic diagrams of the full adder, BC (black cell) and GC (grey cell) are given in Fig. 13. For the case of BCD/Binary subtraction, the 2's complement of the subtrahend is calculated by inverting the bits of the subtrahend and adding 1 at the adder generating the least significant bit in the first (least significant) P-G block, as shown in Fig. 12. The rest of the P-G blocks only take the complement of the subtrahend and do not add 1. To choose between the two kinds of propagate and generate, a multiplexer is used at the end of each P-G block.

     [Figure 12. The P-G block]
     [Figure 13. (a) Full adder]
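     The three stages described above (P-G block, prefix network, correction block) can be sketched at digit level as follows. This is only a behavioural illustration under stated assumptions, not the authors' circuit: the function names `pg_block`, `combine`, `effective_op` and `unified_bcd` are hypothetical, a serial scan stands in for the Sklansky prefix network, digit lists are given least significant digit first, the correction decision for both addition and subtraction is taken from the carry out of each stage, and for subtraction the minuend is assumed to be at least as large as the subtrahend.

```python
def pg_block(a, b, bcd_add, plus_one=0):
    """One 4 bit P-G block: return (P, G, 4 bit binary sum).

    BCD addition uses the threshold 9 (P, G); BCD subtraction and binary
    operation use the threshold 15 (P*, G*).
    """
    total = a + b + plus_one
    limit = 9 if bcd_add else 15
    return int(total == limit), int(total > limit), total & 0xF


def combine(upper, lower):
    """Prefix operator: G_i:j = G_i:k + P_i:k.G_k-1:j and P_i:j = P_i:k.P_k-1:j."""
    (g_hi, p_hi), (g_lo, p_lo) = upper, lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def effective_op(a_sign, b_sign, op_select):
    """SUB/ADD = As xor Bs xor OpSelect (1 means effective subtraction)."""
    return a_sign ^ b_sign ^ op_select


def unified_bcd(a_digits, b_digits, subtract):
    """BCD add or subtract two equal-length digit lists (LSD first).

    Returns (result digits LSD first, carry out of the last stage).
    """
    pg, sums = [], []
    for k, (a, b) in enumerate(zip(a_digits, b_digits)):
        if subtract:
            # Invert the subtrahend; only the first block adds the 1.
            p, g, s = pg_block(a, ~b & 0xF, bcd_add=False, plus_one=int(k == 0))
        else:
            p, g, s = pg_block(a, b, bcd_add=True)
        pg.append((g, p))
        sums.append(s)

    # Group generate G_k:0 is the carry out of stage k.
    # Assumption: a serial scan replaces the parallel prefix network.
    carries, group = [], None
    for gp in pg:
        group = gp if group is None else combine(gp, group)
        carries.append(group[0])

    result = []
    for k, s in enumerate(sums):
        carry_in = carries[k - 1] if k else 0
        digit = (s + carry_in) & 0xF           # correct binary output
        if subtract and carries[k] == 0:
            digit = (digit + 0b1010) & 0xF     # BCD subtraction correction
        elif not subtract and carries[k] == 1:
            digit = (digit + 0b0110) & 0xF     # BCD addition correction
        result.append(digit)
    return result, carries[-1]
```

     For the worked example, `unified_bcd([6, 5, 5], [9, 3, 2], subtract=1)` returns `([7, 1, 3], 1)`, i.e. 556 - 239 = 317, and `unified_bcd([6, 3], [8, 3], subtract=0)` returns `([4, 7], 0)`, i.e. 36 + 38 = 74.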
  5. The propagate and generate signals produced by the P-G block are then sent to the prefix network. The prefix network can be selected according to the requirements of area, power and delay from the wide range available in the literature. For simulation purposes the Sklansky network is chosen for the design [13]. The prefix network generates the group generate for each stage, which is the carry out of that stage. The carry out of the nth stage is denoted by $C_n$.

     After all the carry/borrow bits are obtained, they are fed to the correction stage, shown in Fig. 14, which along with the sum/difference bits from the P-G block gives out the final result. The first operation in the correction block is to add the incoming carry/borrow from the previous stage to the sum/difference bits. This is implemented using a carry select adder to reduce the delay. After this stage the correct binary outputs are obtained, but for BCD addition/subtraction corrections need to be made to obtain the correct BCD result. For BCD addition the correction is done by adding $(0110)_2$, and for BCD subtraction the correction is done by adding $(1010)_2$, to the correct binary output whenever needed. The logic diagrams of the two correction units are shown in Fig. 15 and Fig. 16.

     For binary addition/subtraction the output of the carry select adder in the correction block gives the final result.

     [Figure 13. (b) Grey Cell (c) Black Cell]
     [Figure 14. Correction Block]
     [Figure 15. Gate level diagram of correction unit for BCD Addition]
     [Figure 16. Gate level diagram of correction unit for BCD Subtraction]

     IV. Simulations and Results

     The analysis of all the architectures tabulated below has been carried out by performing simulation runs on HSPICE using 65nm CMOS technology. Simulations are performed for 32 bit adders/subtractors. All the circuits are simulated at 1.2V at a frequency of 50 MHz. The simulation results are shown in Table 1.

     TABLE I. Average Delay, Power, Power-Delay Product and Area of various architectures

     Architecture     Delay (10^-10 s)   Power (10^-4 W)   Power-Delay Product (10^-14 J)   Area (no. of MOSFETs)
     Humberto [12]    8.106              3.714             35.984                           8510
     Haller [7]       5.488              5.029             27.911                           11056
     Sreehari [10]    3.959              2.860             11.486                           2902
     Proposed         3.268              2.328             7.813                            2500

     It is clear from Table 1 that the proposed design has an improvement of 17.45% in terms of delay and is 18.60% better in terms of power, giving it a 32% improvement in power-delay product over the most efficient architecture in the literature.
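     The quoted improvements can be reproduced from the Table 1 figures with a few lines of arithmetic. The sketch below is illustrative only: the names `TABLE_1`, `improvement` and `compare` are hypothetical, and it uses the tabulated power-delay products directly rather than recomputing them from the delay and power columns.

```python
# (delay in 1e-10 s, power in 1e-4 W, power-delay product in 1e-14 J),
# copied from Table 1 for the two fastest architectures.
TABLE_1 = {
    "Sreehari [10]": (3.959, 2.860, 11.486),
    "Proposed": (3.268, 2.328, 7.813),
}


def improvement(reference, proposed):
    """Percentage reduction of `proposed` relative to `reference`."""
    return 100.0 * (reference - proposed) / reference


def compare(reference="Sreehari [10]", proposed="Proposed"):
    """Return (delay, power, PDP) improvements in percent, rounded to 2 places."""
    ref, new = TABLE_1[reference], TABLE_1[proposed]
    return tuple(round(improvement(r, n), 2) for r, n in zip(ref, new))
```

     Calling `compare()` gives roughly 17.45% for delay, 18.60% for power and 31.98% for power-delay product, in line with the figures quoted above.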
  6. V. Conclusion

     This paper presented a modified architecture for fast BCD addition/subtraction that performs binary addition/subtraction without any extra hardware. The design is runtime reconfigurable and maximum utilization of the hardware is a feature of the architecture. All the blocks have been designed to work with the least delay. The proposed architecture shows, on average, an improvement of 32% in power-delay product over the most efficient architecture in the literature.

     REFERENCES

     [1] M. S. Schmookler and A. Weinberger, "Decimal Adder for Directly Implementing BCD Addition Utilizing Logic Circuitry", International Business Machines Corporation, US patent 3629565, pages 1-19, Dec 1971.
     [2] IEEE standard for floating-point arithmetic. IEEE SC, Oct. 2006, at http://754r.ucbtest.org/drafts
     [3] M. J. Adiletta and V. C. Lamere, "BCD Adder Circuit", Digital Equipment Corporation, US patent 4805131, pages 1-18, Jul 1989.
     [4] H. Fischer and W. Rohsaint, "Circuit Arrangement for Adding or Subtracting Operands Coded in BCD-Code or Binary-Code", Siemens Aktiengesellschaft, US patent 5146423, pages 1-9, Sep 1992.
     [5] Laurence P. Flora, "Fast BCD/Binary Adder", US patent 5007010.
     [6] W. Haller, U. Krauch, and H. Wetter, "Combined Binary/Decimal Adder Unit", International Business Machines Corporation, US patent 5928319, pages 1-9, Jul 1999.
     [7] W. Haller, W. H. Li, M. R. Kelly, and H. Wetter, "Highly Parallel Structure for Fast Cycle Binary and Decimal Adder Unit", International Business Machines Corporation, US patent 2006/0031289, pages 1-8, Feb 2006.
     [8] S. Hwang, "High-Speed Binary and Decimal Arithmetic Logic Unit", American Telephone and Telegraph Company, AT&T Bell Laboratories, US patent 4866656, pages 1-11, Sep 1989.
     [9] Sreehari Veeramachaneni, M. Keerthi Krishna, L. Avinesh, P Sreekanth Reddy, M.B. Srinivas, "Novel High-Speed 16-Digit BCD Adders Conforming to IEEE 754r Format", IEEE Computer Society Annual Symposium on VLSI (ISVLSI'07), pages 343-350, Mar 2007.
     [10] Sreehari Veeramachaneni, M, Kirthi Krishna; V, Prateek G, S. Subroto, S, Bharat, M.B.Srinivas, "A Novel Carry-Look Ahead Approach to a Unified BCD and Binary Adder/Subtractor", 21st International Conference on VLSI Design 2008, pages 547-552, Jan 2008.
     [11] U. Grupe, "Decimal Adder", Vereinigte Flugtechnische Werke-Fokker GmbH, US patent 3935438, pages 1-11, Jan 1976.
     [12] D.R. Humberto Calderón, G. N. Gaydadjiev, S. Vassiliadis, "Reconfigurable Universal Adder", Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP 07), pages 186-191, July 2007.
     [13] J. Sklansky, "Conditional-sum addition logic", IRE Trans. Electronic Computers, vol. EC-9, pages 226-231, June 1960.
