radix 4 and radix-8 booth encoded multi-modulus multipliers
Upcoming SlideShare
Loading in...5
×
 

radix 4 and radix-8 booth encoded multi-modulus multipliers

on

  • 2,451 views

For more projects feel free to contact, www.nanocdac.com, info@nanocdac.com, 08297578555 –Mallikarjuna.V(Project manager)

For more projects feel free to contact, www.nanocdac.com, info@nanocdac.com, 08297578555 –Mallikarjuna.V(Project manager)

Statistics

Views

Total Views
2,451
Views on SlideShare
2,451
Embed Views
0

Actions

Likes
0
Downloads
18
Comments
0

0 Embeds 0

No embeds

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

 radix 4 and radix-8 booth encoded multi-modulus multipliers radix 4 and radix-8 booth encoded multi-modulus multipliers Document Transcript

  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS 1 Radix-4 and Radix-8 Booth Encoded Multi-Modulus Multipliers Ramya Muralidharan, Member, IEEE, and Chip-Hong Chang, Senior Member, IEEE Abstract—Novel multi-modulus designs capable of performing the desired modulo operation for more than one modulus in Residue Number System (RNS) are explored in this paper to lower the hardware overhead of residue multiplication. Two multi-modulus multipliers that reuse the hardware resources amongst the modulo , modulo and modulo multipliers by virtue of their analogous number theoretic properties are proposed. The former employs the radix- Booth encoding algorithm and the latter employs the radixBooth encoding algorithm. In the proposed radix- and radix- Booth encoded multi-modulus multipliers, the modulo-reduced products for the moduli , and are computed successively. With the basis of the radixBooth encoded modulo and radixBooth encoded modulo and modulo multiplier architectures, new Booth encoded modulo multipliers are proposed to maximally share the hardware resources in the multi-modulus architectures. Our experimental results on based RNS multiplication show that the and radixBooth encoded multi-modulus proposed radixmultipliers save nearly 60% of area over the corresponding single-modulus multipliers. The proposed radix- and radixBooth encoded multi-modulus multipliers increase the delay of the corresponding single-modulus multipliers by 18% and 13%, respectively in the worst case. Compared to the single-modulus multipliers, the proposed multi-modulus multipliers incur a minor power dissipation penalty of 5%. Index Terms—Booth algorithm, multi-modulus architectures, multiplier, residue number system (RNS). I. INTRODUCTION R ESIDUE number system (RNS) has been shown to be an advantageous alternative to traditional Two’s Complement System (TCS) for accelerating certain computations in high dynamic range (DR) applications such as signal processing, communication, and cryptographic algorithms pair-wise co-prime [1]–[3]. RNS is defined by a set of called the base. For unambiguous moduli representation, the DR of the RNS is given by the product . An inof all moduli comprising the base, i.e., within the DR is represented by a unique -tuple teger , where is a residue defined by modulo . The operation is implemented as or , of is independently computed from where each residue only the residues, and , of the operands, and , respectively in a modulo channel corresponding to the modulus Manuscript received July 20, 2012; revised December 03, 2012; accepted February 07, 2013. This paper was recommended by Associate Editor M.-D. Shieh. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore (e-mail: ramy0003@e.ntu. edu.sg; echchang@ntu.edu.sg). Digital Object Identifier 10.1109/TCSI.2013.2252642 [4]. The absence of inter-channel carry propagation and the reduced length of intra-channel carry propagation chains lead to faster implementation in RNS when compared to its TCS counterpart. However, the benefits of RNS may not translate well into area and power constrained platforms due to the significantly extra hardware required for the conversion between TCS and RNS as well as the concurrent computations in several modulo channels. A factor that directly influences the hardware complexity of RNS is the choice of the moduli that comprise the RNS base. Compared to the general moduli, special moduli of forms, , and that posses good number theoretic properties are preferred. By exploiting these properties, full-adder based binary-to-residue and residue-to-binary converters for special moduli sets have been developed [5]–[8]. Hardware efficient and modulo arithmetic circuits such as modulo adders and multipliers have also been designed with these properties as the basis [9]–[23]. To lower the RNS area and power dissipation, system level design methodologies like multi-function and multi-modulus designs have also been suggested [24], [25]. Architectures that perform the same modulo operation for more than one modulus are called multi-modulus architectures (MMAs) [24]. MMA is classified into two types: fixed multi-modulus architecture (FMA) and variable multi-modulus architecture (VMA). The modulo operations corresponding to the different moduli are performed concurrently in a FMA but serially in a VMA. In contrast to the single-modulus architecture (SMA), MMA exploits hardware reuse amongst the modulo channels of RNS. Hence, both FMA and VMA result in considerable savings in area as well as power dissipation. MMA is widely used in reconfigurable and fault-tolerant RNSs where the moduli of the base are selectively used. In reconfigurable RNS, the DR of RNS is varied by selecting a subset of moduli from the base to fulfill the DR requirement of the application. The channels corresponding to the unselected moduli are turned off thereby reducing the system power dissipation [26], [27]. In fault-tolerant RNS, additional moduli are included in the base such that the DR of RNS far exceeds the DR requirement of the application. These redundant moduli are used to detect and isolate faults in computations. Once the faulty residue digit is located, the corresponding modulus of the base is discarded [28], [29]. Another specific application of VMA is high DR RNSs based on imbalanced word-length moduli sets [5], [8]. In such RNSs, a VMA can be employed for the non-critical moduli to substantially lower the hardware requirement without compromising the system delay. , Consequently, MMAs for the three special moduli, have been explored in recent times. In [30], a variable and 1549-8328/$31.00 © 2013 IEEE
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 2 multi-modulus two-operand parallel-prefix adder that computes the sum modulo , modulo and modulo in only prefix levels was proposed. By extending the number of operands, a variable multi-modulus multi-operand adder for the moduli and based on regularly structured carry save adder (CSA) tree was described in [31]. By making use of the variable multi-modulus multi-operand adder, a variable multi-modulus multiplier for the three special moduli was suggested in [32]. Recently, a dual-modulus multiplier based on and was radix-4 Booth encoding for the moduli proposed [33]. For the moduli , and , non-encoded fixed and variable multi-modulus squarers were proposed in [34]. In [35], the VMA for squarer was further improved by employing the radix-4 Booth encoding algorithm. In this paper, Booth encoding technique is applied to multimodulus multiplication. Two new multi-modulus multipliers for , and are proposed. The former the moduli uses the radix- Booth encoding scheme while the latter uses the radix- Booth encoding scheme. In the radix- Booth encoded multi-modulus multiplier, the FMA approach is applicable only to the partial product generation stage and not applicable to the bias generation and partial product addition stages. While in the radix- Booth encoded multi-modulus multiplier, the FMA approach is relevant only to the encoding of the multiplier bits. Owing to the restricted applicability of the FMA in encoded multipliers, only VMA is considered in this paper. For the proposed radix- Booth encoded multi-modulus multiplier of [13] is used as the multiplier, the modulo baseline architecture. The preliminary modulo multiplier architecture we proposed in [20] is shown to be advantageous for sharing resources with [13]. Furthermore, a new radixBooth encoded modulo multiplier that maximally reuses the hardware blocks of [13] and [20] is proposed. The equivalences and modulo in the radix- Booth encoded modulo multiplier architectures initially presented in [23] are exploited to design the proposed radixBooth encoded multi-modulus multiplier. The structure of the modulo hard multiple generator (HMG) is modified to employ dual prefix operators for handling the complemented and non-complemented modified-generate and modified-propagate signal multiplier that pair. A new radix- Booth encoded modulo is analogous to [23] is also introduced. In both the proposed radix- and radix- Booth encoded multi-modulus multipliers, an efficient technique to include the bias associated with modulo negation is also proposed. The suggested modulo bias generation involves only hardwiring of the Booth encoder outputs and does not incur any additional hardware overhead. This paper is organized as follows. The properties of modulo , modulo and modulo arithmetic are discussed in Section II. In Sections III and IV, the proposed radixand radixBooth encoded multi-modulus multipliers are described with emphasis on multi-modulus partial product, hard multiple and bias generations as well as multi-modulus partial product accumulation. In Section V, the performances of the proposed multi-modulus multipliers are evaluated and based RNS multipliers compared against that employ the same encoding scheme. Conclusions are drawn in Section VI. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS II. PRELIMINARIES In this section, the fundamental properties of modulo , modulo and modulo arithmetic that will be employed in the design of radix- and radix- Booth encoded multimodulus multipliers are reviewed [4], [11], [15], [18], [19], [23]. All equations are assumed to be modulo-reduced by , unless otherwise specified. , and Let be the -bit multiplicand, multiplier and product, respectively of each modulo multiplier of the RNS. Specifically, the residues , and are in binary representation with dual zeros, i.e., and for the modulo channel and in diminished-1 representation for the modulo channel. Conversions to and from these representations will be handled by the RNS forward and reverse converters, respectively. Therefore, for the modulo and modulo multiplications, the binary product is to be computed from the binary represented inputs, and , whereas for the modulo multiplication, the diminished-1 product is to be computed directly from the diminished-1 inputs and . Hence, for . The multi-modulus multiplication can be expressed as follows. if if if (1) if In the radix- Booth encoding algorithm, adjacent multiplier bits and an overlap multiplier bit are scanned and converted to a signed digit , . Thus, the multiplier bits are encoded as multiplier digits as follows. (2) The translation of to as well as the generation and the addition of the modulo-reduced partial products can be simplified for efficient hardware implementation by the following number theoretic properties of modulo arithmetic. Property 1: Let denote the bit complement of . The modulo reduction of the binary weight of in excess of the modulus is given by if if if (3) Property 2: Let denote the one’s complement of . By the definition of additive inverse, the modulo negation of is given by if if if (4)
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. MURALIDHARAN AND CHANG: RADIX-4 AND RADIX-8 BOOTH ENCODED MULTI-MODULUS MULTIPLIERS Property 3: Let denote the left shift of by bit positions where the resultant least significant bits (lsbs) are zeros. Furthermore, let the circular-left-shift and complementary-circular-left-shift operations denote the circular left shift of by bit positions, where the resultant lsbs are not inverted and inverted, respectively. As a corollary of Property 1, the modulo multiplication of by a positive power-of-two term is given by if if if (5) 3 Fig. 1. (a) Proposed multi-modulus radixMultiplexer (MUX3). where is even. The term Property 1 as follows. can be modulo-reduced using if if if Property 4: By Properties 2 and 3, the modulo multiplication of by is given by if if if (6) Property 5: The modulo addition of two operands, is given by (7) if (11) if if (8) BOOTH ENCODED MULTI-MODULUS MULTIPLIER A. Multi-Modulus Partial Product Generation in (2), the radix- if if where is the carry-out bit from the -bit addition of and and the binary weight of is modulo-reduced using Property 1. It must be highlighted that in modulo addition, the diminished-1 result is the sum of not only the diminished-1 addends, and , but also a constant one. Furthermore, modulo , modulo and modulo additions are equivalent to -bit end-around-carry (EAC), carry-ignore and complementary-end-around-carry (CEAC) additions, respectively. Property 6: The modulo addition of operands is given by By substituting multiplier is given by (10) Hence, (9) can be simplified to and if if if III. RADIX- Booth encoder. (b) Proposed 3:1 Booth encoded if where (12) The radix- Booth encoded digit is formatted using three bits: a sign bit and one-hot encoded magnitude bits, and . The proposed multi-modulus radix- Booth encoder using radix- Booth Encoder (BE2) slices and one MUX3 block is shown in Fig. 1(a) for . To select the appropriate bit corresponding to the modulus , or , a custom 3:1 multiplexer (MUX3) with 2-bit select input depicted in Fig. 1(b) is used. The input corresponding to the modulus , or is selected when is “00,” “01,” or “10,” respectively. From Fig. 1(b), it can be easily shown that equals and hence, MUX3 can be simplified to an XOR-AND implementation. In what follows, all illustrations of MUX3 block are assumed to include inputs. Using the radix- Booth encoded multiplier form of (11), the multi-modulus multiplication of (1) becomes if if (9) (13)
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 4 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS TABLE I MODULO-REDUCED PARTIAL PRODUCTS FOR RADIX- BOOTH ENCODING Fig. 2. Proposed multi-modulus partial product generation for radixencoding. Booth The radix- Booth encoded multi-modulus multiplication of (13) can then be reformulated as By Property 3, modulo , modulo and modulo multiplications of by can be implemented respectively by , operations, and operation with a bias of on . Similarly by Property 4, modulo , modulo and modulo multiplications of by can be implemented respectively by operation, operation with a bias of and operation with a bias of on . is shown to be equivalent to a vector for the modulus , the sum of a vector and a constant bias for the modulus and the sum of a vector and a constant bias for the modulus in Table I. In Table I, # denotes the concatenation operator. For the modulo reduced calculated based on Properties 3 and 4, the most significant bits of corresponding to and 0 as well as and 1 are bitwise complement of each other. However, for and 2, only the most significant bits of are bitwise complement of each other as shown below. (14) (15) is suitably modified such The bias corresponding to that the most significant bits of for are bitwise complement of for . (16) and , the Therefore in Table I, for the modulus expressions of and derived in (16) are employed. (17) In the following, the generation of the for the three moduli, , and is described. The standard circuit implementation of a bit slice of the radix- Booth Selector (BS2) generates a single bit of , i.e., , by selecting a bit of either the multiplicand or one-bit shifted multiplicand and conditionally inverting it. From Table I, the least significant bits of for moduli , and are , 0 and , respectively. Hence, MUX3 of Fig. 1(b) can be used at the output of the BS2 blocks in the least significant bit positions to select from , 0 or . Furthermore, when , the input to the BS2 block is ,0 or for modulus , or , respectively. Thus this input to the BS2 block is also selected using a MUX3. The proposed multi-modulus generation of the using BS2 and MUX3 blocks is shown in Fig. 2 for . The number of BS2 and MUX3 blocks required in the generation is and , respectively. B. Multi-Modulus Bias Generation using BS2 blocks, the Analogous to the generation of bias can be generated by decoding appropriately. This technique while being straightforward has an obvious limitation. By considering as an -bit word, the double the number of partial products to be added. Hence, the reduction in the number of partial products achieved by the radixBooth encoding technique is essentially nullified. An efficient technique to include with minimal hardware circuitry based on the properties of modulo and modulo arithmetic is proposed. From Table I, and for the modulus are tabulated in Table II. In order to determine the relation between and , is represented in terms of , and while is expressed as a function of the bit . From the three-variable Karnaugh map with don’t care minterms “011” and “111,” it can be shown that
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. MURALIDHARAN AND CHANG: RADIX-4 AND RADIX-8 BOOTH ENCODED MULTI-MODULUS MULTIPLIERS BIAS FOR THE MODULUS BIAS FOR THE MODULUS TABLE II FOR RADIX- TABLE III FOR RADIX- 5 To negate this effect, the constant aggregate bias. The result is BOOTH ENCODING is subtracted from the BOOTH ENCODING (21) where Note: and . . Hence, the aggregate bias is given by the simplified expression and can be generated by merely hardwiring the output of the BE2 blocks to appropriate bit positions. (18) for the modulus are tabulated From Table I, and in Table III. Furthermore, is divided into a static bias and a dynamic bias . is specific to the radix of the Booth encoding but independent of . On the other hand, is dependent on . is represented in terms of , and while is expressed as , where , to determine the relation between and . The dynamic bias can be minimized by the three-variable Karnaugh map, which results in , and , where the operator “ ” denotes Boolean AND if its operands are Boolean variables. (19) (22) Both and can be implemented by hardwiring the BE2 outputs, and as well as logic one to appropriate bit positions via at most one level of AND gates. For , the aggregate bias is composed of two -bit words, and given by (21). For , the aggregate bias is composed of one -bit word given by (18). For , the aggregate bias equals zero as shown in Table I. These three aggregate biases are generalized in (23) as a sum of and given by (24) and (25), respectively, such that and for as well as for are zero. (23) if By applying Property 1 to the negatively weighted term, (19) becomes (24) if (20) The next step entails modulo-reduced addition of partial products. From Property 6, the modulo addition of operands inherently increments the result by . if
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 6 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS In (26), an -bit all-zeros partial product is included for the moduli and so that the number of partial products to be added is for all three moduli. if or Fig. 3. Proposed multi-modulus bias generation for radix- Booth encoding. if (26) From Property 5, a multi-modulus addition can be implemented using a MUX3 in the carry feedback path that selects from , 0 or . The multi-modulus addition of partial products in a CSA tree and a parallel-prefix two-operand adder is illustrated in Fig. 4(a) for . In Fig. 4(a), the parallel-prefix adder is constructed from the pre-processing (PP), prefix and post-processing blocks and the implementation of these blocks is shown in Fig. 4(b). The number of MUX3 blocks needed for multi-modulus partial product addition is . IV. RADIX- BOOTH ENCODED MULTI-MODULUS MULTIPLIER A. Multi-Modulus Partial Product Generation By substituting multiplier is given by Fig. 4. (a) Proposed multi-modulus partial product addition for radixencoding. (b) Implementation of parallel-prefix adder components. Fig. 5. Proposed multi-modulus radix- Booth Booth encoder. if in (2), the radix- Booth encoded (27) where . The radix- Booth encoded multiplier digit is formatted using five bits: a sign bit and four one-hot encoded magni, , and . radix- Booth tude bits, Encoder (BE3) blocks are used to encode the -bit multiplier as shown in Fig. 5 for . Using the radix- Booth encoded multiplier form of (27), the multi-modulus multiplication of (1) becomes if (28) (25) if if Equations (23)–(25) are implemented as shown in Fig. 3 for . and are simply hardwired to and hence, are not illustrated in Fig. 3. C. Multi-Modulus Partial Product Addition in (17) with (23), the radixBy replacing Booth encoded multi-modulus multiplication is given by (26). By Properties 3 and 4, the modulo-reduced partial prodare simplified and tabulated in Table IV. ucts and In Table IV, all partial products except , are shown to be generated by , and operations on or when is , and , respectively. In the subsequent section, a multi-modulus HMG is proposed whose output is , or if the selected modulus is , or , respectively. Thus the partial products corresponding to can be generated by , and operations
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. MURALIDHARAN AND CHANG: RADIX-4 AND RADIX-8 BOOTH ENCODED MULTI-MODULUS MULTIPLIERS TABLE IV MODULO-REDUCED PARTIAL PRODUCTS FOR RADIX- 7 BOOTH ENCODING Fig. 6. Proposed multi-modulus partial product generation for radixencoding. Booth shifted multiplicand followed by a conditional bit inversion. According to Table IV, MUX3 blocks are used at the output of the BS3 blocks in the least significant bit positions to select , 0 or when the modulus is , or , from respectively. When , MUX3 blocks are used at the BS3 input to select from , 0 or and at the BS3 to select from , 0 or . Likewise, when input a MUX3 block is used at the BS3 input to select from , 0 or . The number of MUX3 blocks . needed is B. Multi-Modulus Hard Multiple Generation Application-specific adders known as HMGs that compute and to generate the hard multiple only the sum of were proposed in [23] for the moduli and . The HMGs were designed by reformulating the carry equations and modulo additions using the bit of modulo correlation between the addends, and . The carry equations for modulo and modulo HMGs are given by (30) and (31) for even and odd cases of , respectively. if if on or for the moduli , and , respectively, as shown in Table IV. The radix- Booth encoded multi-modulus multiplication of (28) can then be reformulated as (30) if if (31) (29) using Fig. 6 shows the generation of radix- Booth Selector (BS3) blocks for , where each bit slice of BS3 selects a bit from the multiplicand, the one-bit shifted multiplicand, the hard multiple or the two-bit where fied-generate and modified-propagate pair follows. if if and the modiis defined as
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 8 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS BIAS FOR THE MODULUS TABLE V FOR RADIX- BOOTH ENCODING Fig. 7. Proposed multi-modulus HMG. BIAS FOR THE MODULUS TABLE VI FOR RADIX- BOOTH ENCODING if (32) As modulo addition is equivalent to carry-ignore addition, the carry equation for modulo HMG becomes (33) where is defined as follows. Note: and . (34) From (30), (31), and (33), the parallel-prefix implementation of multi-modulus HMG is illustrated for in Fig. 7. It consists of three stages: pre-processing, prefix computation and post-processing. In the preprocessing stage, each operator “ ” computes using AND-OR and OR-AND gates. The bit is also computed by the XOR operation on the -th bits of the two addends. Two MUX3 blocks are required at the input of the pre-processing operator to select from , 0, or and , 0 or for computing and . In from the prefix-computation stage, is computed using the “ ” prefix operators in only levels. To implement the multimodulus carry generation, dual prefix operators are used in the first level such that the input to one prefix operator is while the input to the other prefix operator is selected from , (0,0) or using MUX3 blocks. The number of MUX3 blocks required is and the number of dual prefix operators in each prefix level is , where . In the post-processing stage, each operator “ ” computes a bit of the hard multiple by . When , a MUX3 is employed at the input of the post-pro, 0 or to implement cessing operator to select from EAC, carry-ignore, and CEAC additions, respectively. C. Multi-Modulus Bias Generation for the modulus are tabulated in From Table IV, and Table V. From the five-variable Karand the aggregate bias naugh map, it can be shown that can be simplified to (35) From Table IV, and for the modulus are tabulated is divided into a static bias in Table VI. that depends on the radix of the Booth encoding and a dynamic that depends on the multiplier digit . bias Karnaugh Using a five-variable map, the aggregate of the static bias, the dynamic bias and the constant that negates the constant incurred from addition of partial products, is the modulo reduced to (36)
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. MURALIDHARAN AND CHANG: RADIX-4 AND RADIX-8 BOOTH ENCODED MULTI-MODULUS MULTIPLIERS 9 where Fig. 8. Proposed multi-modulus bias generation for radix- (37) The operators “ ” and “ ” denote Boolean OR and AND, respectively when their operands are Boolean variables. For , the aggregate bias is composed of three -bit words, , and given by (36). For , the aggiven by (35). gregate bias is composed of one -bit word , the aggregate bias equals zero as shown in For Table IV. They can be generalized in (38) as a sum of , and given by (39), (40), and (41), respectively, such that , and for as well as and for are zero. (See equation at bottom of page.) Booth encoding. Equations (38)–(41) are implemented as shown in Fig. 8 for . , , and are hardwired to logic “0” and hence, are not illustrated in Fig. 8. D. Multi-Modulus Partial Product Addition in (29) with (38), the radixBy replacing Booth encoded multi-modulus multiplication is given by (42). Two -bit all-zeros partial products are included for the moduli and so that the number of partial products to be added is for all three moduli. (See equation at bottom of page.) partial products The multi-modulus addition of in a CSA tree and a two-operand adder with MUX3 blocks in (38) (39) (40) (41) if (42) if
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 10 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS TABLE VII AREA, DELAY, AND TOTAL POWER DISSIPATION OF PROPOSED RADIXBOOTH ENCODED MULTI-MODULUS MULTIPLIERS AREA OF RADIX- Fig. 9. Proposed multi-modulus partial product addition for radixencoding. , MODULO , AND Booth the carry feedback paths is illustrated in Fig. 9 for . number of MUX3 blocks needed is TABLE VIII BOOTH ENCODED MODULO MODULO MULTIPLIERS . The V. PERFORMANCE COMPARISON In this section, the performances of the proposed radix- and radix- Booth encoded multi-modulus multipliers are evaluated. The efficiency of the proposed multi-modulus multiplier over a triad of single-modulus multipliers is demonstrated in reconfigurable and redundant modulo channels, where the computation of only a single modulo-reduced product is desired. The proposed radix- Booth encoded multi-modulus multiplier is compared with based RNS multiplier that consists of three radix- Booth encoded single-modulus multipliers, i.e., modulo , modulo and modulo multipliers. Four configurations of based RNS multipliers are compared. In Type A RNS multiplier, all three single-modulus multipliers are radix- Booth encoded while in Types B-D RNS multipliers, all three single-modulus multipliers are radix- Booth encoded. Hence, the proposed radix- Booth encoded multi-modulus multiplier is compared with Type A RNS multiplier while the proposed radix- Booth encoded multi-modulus multiplier is compared with Types B-D RNS multipliers. Type A RNS multiplier consists of modulo and modulo multipliers described in [23]. In Types B-D RNS multipliers, the multiplier of [13] is employed for the modulus . The multipliers of [15], [18] and [20] are employed for the modulus in Type B, Type C, and Type D, respectively. As radix- and radix- Booth encoded modulo multipliers have not been proposed in literature thus far, modulo multipliers described in Sections III and IV are employed in Type A and Types B-D, respectively. The multipliers were specified in VHDL and synthesized by Synopsys Design Compiler (version D-2010.03) under nominal synthesis environment, i.e., 25 and 1.8 V based on TSMC 0.18 1.8 V CMOS standard-cell library. The designs were optimized for area and delay independently. The average dynamic power dissipation is also estimated based on the Monte Carlo method with 99.9% confidence that the error is bounded below 3%. The minimum achievable area from area-constrained optimization, critical path delay from delay-constrained optimization and average total power dissipation of the proposed radix- Booth encoded multi-modulus multipliers for the computation of a single modulo-reduced product are tabulated in Table VII for and . The areas, delays, and total power dissipations of the radixBooth encoded single-modulus modulo , modulo , and modulo multipliers that comprise Types A-D RNS multipliers are tabulated in Tables VIII, IX, and X respectively. The area, delay and total power dissipation of Types A-D RNS multipliers for the computation of a single modulo-reduced product are tabulated in Table XI. The unused single-modulus multipliers are assumed to be gated off. In this table, the areas of Types A-D RNS multipliers are computed as the sum of the areas of the constituent single-modulus modulo , modulo , and modulo multipliers from Table VIII. The critical path delays of Types A-D RNS multipliers are determined as the maximum value amongst the delays of the constituent single-modulus modulo , modulo , and modulo multipliers from Table IX. The total power dissipations of Types A-D RNS multipliers are computed from Table X as the sum of the following two terms: a) the maximum dynamic power dissipation amongst modulo , modulo and modulo multipliers and b) the leakage power dissipations of modulo , modulo , and modulo multipliers. From Tables VII and XI, the percentage savings or increase in area, delay and power dissipation of the proposed
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. MURALIDHARAN AND CHANG: RADIX-4 AND RADIX-8 BOOTH ENCODED MULTI-MODULUS MULTIPLIERS DELAY OF RADIX- TABLE IX BOOTH ENCODED MODULO MODULO MULTIPLIERS , MODULO , AND TABLE X TOTAL POWER DISSIPATION OF RADIX- BOOTH ENCODED MODULO MULTIPLIERS MODULO , AND MODULO , TABLE XI AREA, DELAY, AND TOTAL POWER DISSIPATION OF RADIX- BOOTH ENCODED BASED RNS MULTIPLIERS TABLE XII PERCENTAGE SAVINGS IN AREA, DELAY, AND TOTAL POWER DISSIPATION OF PROPOSED MULTI-MODULUS MULTIPLIERS OVER RNS MULTIPLIERS radix- Booth encoded multi-modulus multiplier over Type A RNS multiplier and of the proposed radix- Booth encoded multi-modulus multiplier over Types B-D RNS multipliers are tabulated in Table XII. 11 From Table XII, by replacing the three single-modulus multipliers in the based RNS multiplier by the proposed multi-modulus multiplier, an average reduction of 60% in area can be observed. However the proposed radixBooth encoded multi-modulus multiplier increases the critical path delay of Type A RNS multiplier by 13% in the worst case. The delay penalties of the proposed radix- Booth encoded multi-modulus multiplier over Types B-D RNS multipliers for are 8%, 18%, and 16%, respectively in the worst case. The maximum delay penalty is incurred by the proposed radix- Booth encoded multi-modulus over Type C RNS multiplier for . This can be attributed to the difference in the number of CSAs in the critical path as determined by the Dadda depth function. The proposed radix- and radixBooth encoded multi-modulus multipliers incur a power dissipation penalty of no more than 4 5% over Type A and Type B RNS multipliers, respectively which can be directly attributed to the additional power dissipated by the MUX3 blocks. It should be noted that power gating affects design architecture and requires sizing of sleep transistors at transistor level as well as additional time delays to assure safe entrance and exit of power gated mode. If clock gating is used to gate off the unused single-modulus multipliers, some area, delay, and power overheads will be incurred in replacing the input registers by integrated clock gating cells and the insertion of an -bit wide 3:1 multiplexer to select the output from one of the three modulo multipliers. These overheads are not included for Types A to D RNS multipliers in Table XI. Furthermore, owing to its simpler architecture, the proposed radix- Booth encoded multi-modulus multiplier lowers the power dissipation of Types C and D RNS multipliers maximally by 37% and 49%, respectively. The most recent dual-modulus multiplier [33] uses only radix- Booth-encoding and does not incorporate hardware reuse for the modulus. Based on the unit-gate model of [36], the area and delay of this dual-modulus radix- Booth encoded multiplier are compared with our proposed triple-modulus radix- Booth encoded multiplier in Table XIII. In this model, the area and delay of AND, NAND, OR, and NOR gates are assumed to be one unit each while the area and delay of XOR and XNOR gates are assumed to be two units each. In Table XIII, is the Dadda depth function. Since the final parallel-prefix adder is common to both multipliers, it is excluded from Table XIII. From Table XIII, the proposed multiplier employs additional unit-gates than [33], of which the overhead of unit-gates can be solely attributed to the use of MUX3 for triple-modulus selection in the proposed design instead of XOR for dual-modulus selection in [33]. Though in the proposed radixmulti-modulus multiplier, one additional partial product has to be generated and accumulated using AND gates, FAs, and 1 MUX3 of unit-gates, the overhead has been limited to only unit-gates. Since is a logarithmic function, it can be verified that the and are the same for all in the range from 36 to 64 except 52. The additional one unit-delay per CSA level, i.e., instead of , in the proposed multiplier over [33] is also due to the use of MUX3 as opposed to XOR for moduli selection. However, it should be noted that our proposed multiplier is more
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 12 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS TABLE XIII UNIT-GATE AREA AND DELAY COMPARISON OF RADIX- BOOTH ENCODED TRIPLE AND DUAL MODULUS MULTIPLIERS agile in that it can perform multi-modulus multiplications for more comprehensive RNS made of all three types of moduli. Moreover, for the same dynamic range, the value of of the dual-modulus multiplier [33] for the RNS multiplication is 1.5 times that of our proposed triple-modulus multiplier for the RNS multiplication. VI. CONCLUSION The equivalences in operations central to modulo multiplication, i.e., modulo negation, modulo reduction of binary weight, modulo multiplication by powers-of-two, and two-operand modulo addition for the three special moduli, , , and were demonstrated. New radix- and radix- Booth encoded modulo multipliers with architectures comparable to those of the corresponding modulo and modulo multipliers were introduced. With the correlation among modulo , modulo and modulo operations as the basis, radix- and radix- Booth encoded multi-modulus multipliers that perform modulo multiplication for the three special moduli successively were developed. The proposed multi-modulus multipliers were compared against RNS multipliers employing Booth encoding of the same radix. REFERENCES [1] R. Conway and J. Nelson, “Improved RNS FIR filter architectures,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 1, pp. 26–28, Jan. 2004. [2] A. S. Madhukumar and F. Chin, “Enhanced architecture for residue number system-based CDMA for high rate data transmission,” IEEE Trans. Wireless Commun., vol. 3, no. 5, pp. 1363–1368, Sep. 2004. elliptic [3] D. M. Schinianakis et al., “An RNS implementation of an curve point multiplier,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 6, pp. 1202–1213, Jun. 2009. [4] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and its Application to Computer Technology. New York: McGraw-Hill, 1967. [5] B. Cao, C. H. Chang, and T. Srikanthan, “An efficient reverse converter based on the New for the 4-moduli set Chinese Remainder Theorem,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 10, pp. 1296–1303, Oct. 2003. [6] P. V. A. Mohan and A. B. Premkumar, “RNS-to-binary converters for and two four-moduli sets ,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1245–1254, Jun. 2007. [7] P. V. A. Mohan, “RNS-to-binary converter for a new three-moduli set ,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 9, pp. 775–779, Sep. 2007. [8] A. S. Molahosseini, K. Navi, C. Dadkhah, O. Kavehei, and S. Timarchi, “Efficient reverse converter designs for the new 4-moduli sets and based on New CRTs,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 823–835, Apr. 2010. [9] R. Zimmermann, “Efficient VLSI implementation of modulo addition and multiplication,” in Proc. 14th IEEE Symp. Comput. Arithmetic, Adelaide, Australia, Apr. 1999, pp. 158–167. [10] L. Kalampoukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalaadders,” IEEE matianos, “High-speed parallel-prefix modulo Trans. Comput., vol. 49, no. 7, pp. 673–680, Jul. 2000. [11] S. J. Piestrak, “Design of squarers modulo A with low-level pipelining,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 1, pp. 31–41, Jan. 2002. [12] H. T. Vergos, C. Efstathiou, and D. Nikolos, “Diminished-one modulo adder design,” IEEE Trans. Comput., vol. 51, no. 12, pp. 1389–1399, Dec. 2002. [13] C. Efstathiou, H. T. Vergos, and D. Nikolos, “Modified Booth modulo multiplier,” IEEE Trans. Comput., vol. 53, no. 3, pp. 370–374, Mar. 2004. [14] C. Efstathiou, H. T. Vergos, G. Dimitrakopoulos, and D. Nikolos, “Efmultipliers,” IEEE Trans. Comput., ficient diminshed-1 modulo vol. 54, no. 4, pp. 491–496, Apr. 2005. [15] L. Sousa and R. Chaves, “A universal architecture for designing effimultipliers,” IEEE Trans. Circuits Syst. I, Reg. cient modulo Papers, vol. 52, no. 6, pp. 1166–1178, Jun. 2005. squarer [16] H. T. Vergos and C. Efstathiou, “Diminished-1 modulo design,” IEE Comput. Digit. Tech., vol. 152, no. 5, pp. 561–566, Sep. 2005. [17] H. T. Vergos and C. Efstathiou, “Design of efficient modulo multipliers,” IET Comput. Digit. Tech., vol. 1, no. 1, pp. 49–57, Jan. 2007. multipliers for [18] J. W. Chen and R. H. Yao, “Efficient modulo diminished-1 representation,” IET Circuits, Devices, Syst., vol. 4, no. 4, pp. 291–300, Jul. 2010. [19] R. Muralidharan and C. H. Chang, “Radix-8 Booth encoded modulo multipliers with adaptive delay for high dynamic range Residue Number System,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 5, pp. 982–993, May 2011. [20] R. Muralidharan and C. H. Chang, “A simple radix-4 Booth encoded multiplier,” in Proc. IEEE Int. Symp. Circuits Syst., Rio modulo de Janeiro, Brazil, May 2011, pp. 1163–1166. [21] C. Efstathiou, K. Pekmestzi, and N. Axelos, “On the design of modulo multipliers,” in Proc. 14th Euromicro Conf. Digit. Syst. Design, Oulu, Finland, Aug. 2011, pp. 453–459. multi[22] J. W. Chen, R. H. Yao, and W. J. Wu, “Efficient modulo pliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 12, pp. 2149–2157, Dec. 2011. [23] R. Muralidharan and C. H. Chang, “Area-power efficient modulo and modulo multipliers for based RNS,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 10, pp. 2263–2274, Oct. 2012. [24] V. Paliouras and T. Stouraitis, “Multifunction architectures for RNS processors,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 8, pp. 1041–1054, Aug. 1999. [25] D. Adamidis and H. T. Vergos, “RNS multiplication/sum-of-squares units,” IET Comput. Digit. Tech., vol. 1, no. 1, pp. 38–48, Jan. 2007. [26] G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re, “Residue Number System reconfigurable datapath,” in Proc. IEEE Int. Symp. Circuits Syst., Scottsdale, AZ, USA, May 2002, vol. 2, pp. 756–759. [27] W. K. Jenkins and A. J. Mansen, “Variable word length DSP using serial-by-modulus residue arithmetic,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Minneapolis, MN, USA, Apr. 1993, vol. 3, pp. 89–92. [28] W. K. Jenkins and B. A. Schnaufer, “Fault Tolerant architectures for efficient realization of common DSP kernels,” in Proc. 35th Midwest Symp. Circuits Syst., Washington, DC, USA, Aug. 1992, vol. 2, pp. 1320–1323.
  • This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. MURALIDHARAN AND CHANG: RADIX-4 AND RADIX-8 BOOTH ENCODED MULTI-MODULUS MULTIPLIERS [29] W. K. Jenkins, B. A. Schnaufer, and A. J. Mansen, “Combined system level redundancy and modular arithmetic for fault tolerant digital signal processing,” in Proc. 11th Symp. Comput. Arithmetic, Windsor, Canada, Jun. 1993, pp. 28–35. [30] H. T. Vergos and D. Bakalis, “Area-time efficient multi-modulus adders and their applications,” Microprocessors Microsyst., vol. 36, no. 5, pp. 409–419, Jul. 2012. [31] C. H. Chang, S. Menon, B. Cao, and T. Srikanthan, “A configurable dual-moduli multi-operand modulo adder,” in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, May 2005, vol. 2, pp. 1630–1633. [32] S. Menon and C. H. Chang, “A reconfigurable multi-modulus modulo multiplier,” in Proc. IEEE Asia-Pacific Conf. Circuits Syst., Singapore, Dec. 2006, pp. 1168–1171. [33] E. Vassalos, D. Bakalis, and H. T. Vergos, “Configurable Booth-enmultipliers,” in Proc. 8th Conf. Ph.D. Res. Micoded modulo croelectron. Electron., Aachen, Germany, Jun. 2012, pp. 1–4. [34] R. Muralidharan and C. H. Chang, “Fixed and variable multi-modulus squarer architectures for triple moduli base of RNS,” in Proc. IEEE Int. Symp. Circuits Syst., Taipei, Taiwan, May 2009, pp. 441–444. [35] D. Bakalis and H. T. Vergos, “Area-efficient multi-moduli squarer for RNS,” in Proc. 13th Euromicro Conf. Digit. Syst. Design, Lille, France, Sep. 2010, pp. 408–411. [36] A. Tyagi, “A reduced-area scheme for carry-select adders,” IEEE Trans. Comput., vol. 42, no. 10, pp. 1163–1170, Oct. 1993. Ramya Muralidharan (S’07–M’13) received the B.E. degree from Anna University, Chennai, India, in 2006 and the Ph.D. degree from Nanyang Technological University, Singapore, in 2012. She is currently working as a Hardware System Engineer in San Diego, CA, USA. Her research interests include residue number system, computer arithmetic circuits and digital IC design. 13 Chip-Hong Chang (S’92–M’98–SM’03) received the B.Eng. (Hons.) degree from National University of Singapore in 1989, and the M. Eng. and Ph.D. degrees from Nanyang Technological University (NTU), Singapore, in 1993 and 1998, respectively. He served as a Technical Consultant in industry prior to joining the School of Electrical and Electronic Engineering (EEE), NTU, in 1999, where he is currently an Associate Professor. He holds joint appointments with the university as Assistant Chair of Alumni of the School of EEE since June 2008, Deputy Director of the Center for High Performance Embedded Systems since 2000, and the Program Director of the Center for Integrated Circuits and Systems from 2003 to 2009. He has coedited one book, published four book chapters and more than 180 research papers in refereed international journals and conferences. His current research interests include low power arithmetic circuits, residue number system, digital filter design, and digital watermarking for VLSI intellectual property protection. Dr. Chang has served as Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I in 2010-2012, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS since 2011 and the IEEE Access Journal since 2013, the Editorial Advisory Board Member of the Open Electrical and Electronic Engineering Journal since 2007 and the Editorial Board Member of the JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING since 2008. He also guest edited several special issues for the Journal of Circuits, Systems and Computers in 2011 and 2012, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART I: REGULAR PAPERS and the JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING in 2012, and served in many international conference advisory and technical program committees. He is a Fellow of the IET.