SlideShare a Scribd company logo
1 of 14
Download to read offline
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
256
AUTOMATED HDL GENERATION OF TWO’S COMPLEMENT
WALLACE MULTIPLIER WITH PARALLEL PREFIX ADDERS
Bharat Kumar Potipireddi1
, Dr. Abhijit Asati2
1
(EEE, BITS PILANI, Pilani, Rajasthan, India)
2
(EEE, BITS PILANI, Pilani, Rajasthan, India)
ABSTRACT
Wallace multipliers are among the fastest multipliers owing to their logarithmic delay.
The partial products of two’s complement multiplication are generated by an algorithm
described by Baugh-Wooley. The complicated reduction of partial products by Wallace
algorithm and use of Parallel Prefix adders with logarithmic delay in the final stage of
addition makes it difficult to write a generic Verilog code for them. To solve this difficulty,
we described a C program which automatically generates a Verilog file for a Wallace
multiplier of user defined size with Parallel Prefix adders like Kogge-Stone adder, Brent-
Kung adder and Han-Carlson adder. We compared their post layout results which include
propagation delay, area and power consumption. The Verilog codes have been synthesized
using 90nm technology library. We observed that the multiplier using Kogge-Stone adder in
the final stage gives higher speed and lower Power Delay Products when compared to that
using Brent-Kung and Han- Carlson adders.
Keywords: Brent-kung adder, Han-Carlson adder, Kogge-Stone adder, Two’s complement
multiplication, Wallace multiplier.
I. INTRODUCTION
High speed multiplication is a fundamental requirement in many high performance
digital systems. For this purpose, parallel multiplication schemes have been developed. There
are two classes of parallel multipliers, namely array multipliers and tree multipliers. Tree
multipliers, also known as column compression multipliers, are known for their higher speeds
making them very useful in high speed computations. Their propagation delay is proportional
to the logarithm of the operand word length in comparison to array multipliers whose delay is
INTERNATIONAL JOURNAL OF ELECTRONICS AND
COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
ISSN 0976 – 6464(Print)
ISSN 0976 – 6472(Online)
Volume 4, Issue 3, May – June, 2013, pp. 256-269
© IAEME: www.iaeme.com/ijecet.asp
Journal Impact Factor (2013): 5.8896 (Calculated by GISI)
www.jifactor.com
IJECET
© I A E M E
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
257
directly proportional to operand word length [1]. Column compression multipliers are faster
than array multipliers but have an irregular structure and so their design is difficult. With the
improvement in VLSI design techniques and process technology, designs which were
previously infeasible or too difficult to be implemented by manual layout can now be
implemented through automated synthesis.
Two of the most well-known column compression multipliers have been presented by
Wallace [3] and Dadda [4]. Both architectures are similar with the difference occurring in the
procedure of reduction of the partial products and the size of the final adder. In Wallace’s
scheme, the partial products are reduced as soon as possible. On the other hand, Dadda’s
method does minimum reduction necessary at each level and requires the same number of
levels as Wallace multiplier [5]. As a result, final adder in Wallace multiplier is slightly
smaller in size as compared to the final adder in Dadda multiplier.
This paper presents a C program which generates a Verilog file for Wallace multiplier
of user specified size. A comparison of post synthesis and post layout results between
Wallace multiplier of varying sizes with Parallel Prefix adders in the final stage is also
presented. Sections II and III explain the algorithm of Wallace and its implementation in C to
create an automatic Verilog file generator. Section IV gives the post synthesis and post layout
results for the multiplier of varying sizes.
II. WALLACE MULTIPLIER ARCHITECTURE
The Wallace multiplier architecture can be divided into three stages. The first stage
involves generation of partial products by two's complement parallel array multiplication
algorithm presented by Baugh-Wooley [2]. In their algorithm, signs of all partial product bits
are positive. It's different from conventional two's complement multiplication which generates
partial product bits with negative and positive signs. The final product obtained after the
reduction is also in two’s complement form. Fig. 1 shows generation of partial products for
4x4 multiplier by Baugh-Wooley method. Fig. 2(a) shows the arrangement of the partial
products for an 8x8 multiplier. The dots represent the partial products.
In the second stage, the partial product matrix is reduced to a height of two using the
column compression procedure developed by Wallace. The iterative procedure for doing this
is as follows:
Find out the maximum height of columns in the dot matrix array. If it is greater than 2,
reduce the height by following the recursive procedure described below.
1. Check the height of each column. If it is 1, no reduction is done. If it is 2, use a half
adder else use a full adder and check the height of column again. Continue the
reduction till the height of column becomes ≤ 1.
2. Repeat the above step for all other columns and at the end, enqueue the sum bits of all
half adders and full adders into the same columns and carry bits into the adjacent
columns.
3. Again find out the maximum height of columns and continue the reduction using the
above recursive procedure till maximum height reaches 2.
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
258
Figure 1. Generation of Partial Products of 4x4 Two's complement multiplier by Baugh
Wooley's method
a)
b)
c)
d)
e)
Figure 2. Column Compression scheme for 8x8 Wallace multiplier
Fig. 2 (b), (c), (d) and (e) show the reduction stages for an 8x8 Wallace multiplier.
Once the height of matrix is reduced to two, an adder is used to generate the final product. The
paper describes the use of different Parallel Prefix adders for final adder stage which is described
later in section III-D.
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
259
III. VERILOG CODE GENERATION
The main objective in writing a C program was to output a verilog file for an NxN
Wallace multiplier based on the user input N. We implemented the Wallace algorithm and wrote
the verilog code to a file using C to achieve this.
III.I Initialization of Verilog Modules
Initially, the program prompts the user for the size of the multiplier. After accepting the
size of the multiplier, the program creates an empty Verilog file and prints the half adder and full
adder modules in it. Next, it prints the gate level ‘and’ primitive for partial product generation.
After this the top-level multiplier module is printed using sub-modules generated above.
The top module contains two N-bit ‘input’ data types one 2N bit ‘output’ data type for the
multiplier, multiplicand and product bits respectively. The program also sets up N2
‘wire’ data
type to store the partial products. Once all the above modules and data types are set up in the
Verilog file, the program prints the required number of half adders and full adders to reduce
column heights as explained earlier in section-II. Column reduction utilizes the dot matrix array in
C, which is described below.
III.II Dot Matrix Array Creation
For the purpose of implementing the column compression according to Wallace’s
algorithm, the program creates an equivalent of a dot matrix array internally. The columns of the
array are represented by queues (First-In-First-Out data structure). Hence the program creates 2N
queues, where each of the queues is implemented as a linked list.
The sizes and pointers to the first and last element of all queues are stored. This allows
quicker enqueuing and dequeuing operations. Every element in a queue stores a string of
characters, particularly the name of the wires carrying the partial product. Initially, the queues are
loaded with the names of the wires holding the partial products.
III.III Array Reduction
Once the array, using the queues, has been created in C, it needs to be reduced according to
Wallace’s algorithm. The recursive procedure described in the section-II is implemented in C
using a ‘while loop’. The size of the queues (height of columns) is reduced by using full adders
and half adders. The loop stops executing when the length of all queues (height of all columns) is
two or less.
Let ‘a[2N]’ represents an array of ‘2N’ queues to store the names of partial products during
reduction. Let a[i] size represents the size of ith
queue. At the start of every iteration, the sizes of
all the 2N queues are checked and their maximum is stored in ‘max’. Let temp_size[i] holds the
size of ith
queue a[i] size and it is reassigned with new size after every iteration. Every dequeue
operation will decrement the size of queue, a[i] size by 1 and every enqueue will increment its
size by 1 and temp_size[i] will be decremented by 1 for every dequeue.
If temp_size[i] = 2, queue a[i] is reduced using a half adder. The first two elements are
dequeued and used as input to the half adder, temp_size[i] is decremented by 2 and becomes 0.
The sum from the half adder is enqueued to the same queue and the carry out is enqueued to the
next queue in sequence. If temp_size[i] > 2, full adder is used to reduce the queue. When a full
adder is used, the first three elements of the queue are dequeued and are supplied as inputs to the
full adder. The sum from the full adder is enqueued to the same queue while the carry out is
enqueued to the next queue in sequence and temp_size[i] is decremented by 3. If temp_size[i]
becomes ≤ 1, the reduction of queue a[i] is stopped else the above procedure is followed
recursively. After a queue is reduced, the next queue in sequence is taken up for reduction. At the
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
260
end of iteration, maximum height of dot matrix array is found out and is stored in ‘max’. The
iterations continue till ‘max’ becomes ≤ 2 i.e at most two elements remain in each queue. Once
such a state has been reached the ‘while’ loop exits and the reduction phase is completed.
The array reduction using queues is explained with the help of an example. A 4x4 Wallace
multiplication is taken as the example
a)
b)
c)
d)
e)
Figure 3. Reduction of Queues for a 4x4 Wallace multiplier
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
261
The state of the queues before the start of the reduction is shown in fig. 3(a). The reduction of
the queues for a 4x4 multiplier is explained below:
III.III.I Iteration One
On checking the sizes of all the queues, it is observed that the maximum size ‘max’ for
this iteration is 4. Since the size of Queue1 = 1, no reduction is done. The size of Queue2 is 2,
hence a half adder is used for reduction. The first two elements of the queue are dequeued and
summed in a half adder (‘ha0’) by printing a half adder in the Verilog file. Now temp_size[1] of
this queue becomes 0. The sum bit of the half adder is assigned to a wire ‘ha0s’ and the carry out
bit is assigned to the wire ‘ha0c’. In accordance with the algorithm, ‘ha0s’ is enqueued to Queue2
and ‘ha0c’ is enqueued to Queue3. Since temp_size[2] for Queue3 is 3, a full adder (‘fa0’) is used
for reduction by printing a full adder in verilog file. The addition of the full adder leads to
dequeuing of the first three elements of Queue3 and enqueuing of the sum of the full adder‘fa0s’
to Queue3 and the carry out of the full adder ‘fa0c’ to Queue4. Now for reduction of Queue4,
consider temp_size[3] of this Queue which is 4. So, a full adder(‘fa1’) is used for reduction and
it’s sum ‘fa1s’ is enqueued into Queue4 and carry ‘fa1c’ is enqueued into Queue5. Now
temp_size[2] is decremented by 3 and becomes 1 and so the reduction stops. The reduction of
Queues 5 and 6 follow similar procedure described above and the reduction of all Queues in this
iteration is shown in Fig. 3(b). The elements above the arrow in the queue are the ones which are
enqueued in this iteration. The state of the queues at the end of this iteration is shown in Fig. 3(c).
III.III.IIIteration Two
On checking the sizes of all the queues, it is observed that the maximum size ‘max’ for
the second iteration is 3. The variable temp_size[i] is assigned the new size of ith
array a[i] size.
All queues with sizes greater than 1 need to be reduced. The program starts checking the sizes of
the queues sequentially. Queue 3 has a size of 2 and hence a half adder (‘ha2’) is used for
reduction. Full adders are used for the reduction of Queues 4 and 5 since their size is 3. Queues 6
and 7 are reduced using half adders. Fig. 3(d) shows the elements dequeued and enqueued to all
the Queues in this iteration. The final state of the queues at the end of this iteration is shown in
Fig. 3(e).
III.IV Final Stage Adder
Once the size of all queues has been reduced to two or less, the elements in the queues are
ready to be summed using an adder. If a[i] size is 1, the only element in the Queue a[i] is
dequeued and assigned to prod[i], where ‘prod’ is an array holding the ‘2N’ bit product of an NxN
multiplier. Let ‘st_index’ gives the value of i such that size of all queues before a[i] is 1. The size
of the final adder is ‘2N – st_index’. The first elements of all queues form the first input to the
adder and the second elements form the second input to the adder. Different Parallel Prefix adders
like Kogge-Stone adder, Brent-Kung adder and Han-Carlson adder are used. The implementation
of these adders is described below.
III.IV.I Kogge-Stone Adder
The Kogge-Stone adder generates carry signal in O(log n) time [6]. A radix-2 Kogge-
Stone adder of size ‘2N-st_index’ has been used as the final adder for an NxN multiplier. Fig. 4
shows the example of an 8-bit Kogge-Stone adder with no carry-in. The first bit in the boxes is the
propagate bit while the second one is the generate bit. Initially (stage zero) in an N-bit Kogge-
Stone adder, the propagate and generate bits are generated according to (1).
G0,n = an bn for 1 ≤ n ≤ N
P0,n = an + bn for 1 ≤ n ≤ N (1)
where an and bn are the nth
bits of the two inputs.
International Journal of Electronics and Communication Engineering &
ISSN 0976 – 6464(Print), ISSN 0976
Figure 4.
In the ith
(i ≥ 1) stage the propagate and generate bits in n
according to (2).
Gi,n = Gi-1,n
Pi,n = Pi-1,n
Gi,n = Gi-1,n + (Pi-1,n Gi-1,m
Pi,n = Pi-1,n Pi-1,m
where m is given by (3)
m =n – 2i-1
Finally, the carry and sum bits are calculated according to (4).
Cn = Gf,n
Sn = P0,n Cn-1
where Sn and Cn are the nth
bits of the sum and carry respectively. G
nth
block in the final stage.
The numbers of stages in the adder depend upon its size. In a Kogge
size N, there are ceil(log2N)+1 stages. Since we need a
stage of an NxN multiplier, the number of stages in the adder are
The logarithmic delay of Kogge
delay of the multiplier.
International Journal of Electronics and Communication Engineering & Technology (IJECET),
6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
262
Figure 4. Example of 8 bit Kogge-Stone adder
1) stage the propagate and generate bits in nth
block (Pi,n and Gi,n) are calculated
for 1 ≤ n ≤ 2i-1
+1
for 1 ≤ n ≤ 2i-1
+1
1,m) for 2i-1
+2 ≤ n ≤ N
for 2i-1
+2 ≤ n ≤ N (2)
(3)
Finally, the carry and sum bits are calculated according to (4).
1 (4)
bits of the sum and carry respectively. Gf,n is the generate bit of the
The numbers of stages in the adder depend upon its size. In a Kogge-Stone adder of
stages. Since we need a ‘2N-st_index’ bit adder for the final
multiplier, the number of stages in the adder are ceil[log2(2N-
The logarithmic delay of Kogge-Stone adder is crucial in maintaining overall logarithmic
Technology (IJECET),
June (2013), © IAEME
) are calculated
is the generate bit of the
Stone adder of
bit adder for the final
-st_index)]+1.
Stone adder is crucial in maintaining overall logarithmic
International Journal of Electronics and Communication Engineering &
ISSN 0976 – 6464(Print), ISSN 0976
III.IV.II Brent-Kung Adder
Kogge-Stone Adder has higher performance but it takes larger area to be laid out.
Brent-Kung adder can be laid out in lesser area than Kogge
congestion, but its time performance is poor[7]. In a Brent
Ceil[2*log2(N)] stages. Initially (stage zero) in an N
generate bits are generated according to (5).
G0,n = an bn for 1 ≤ n ≤
P0,n = an + bn for 1 ≤ n
where an and bn are the nth
bits of the two inputs
In the ith
(1≤ i ≤ log2(N)) stage the propagate and generate bits in n
calculated according to (6).
Gi,n = Gi-1,n + (Pi-1,n Gi-1,m
Pi,n = Pi-1,n Pi-1,m
Gi,n = Gi-1,n
Pi,n = Pi-1,n
where m is given by (7)
m =n – 2i-1
Figure 5.
International Journal of Electronics and Communication Engineering & Technology (IJECET),
6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
263
Stone Adder has higher performance but it takes larger area to be laid out.
out in lesser area than Kogge-Stone Adder and has lesser wiring
congestion, but its time performance is poor[7]. In a Brent-Kung adder of size N, there are
(N)] stages. Initially (stage zero) in an N-bit Brent-Kung adder, the propagate and
generate bits are generated according to (5).
n ≤ N
≤ n ≤ N (5)
bits of the two inputs
) stage the propagate and generate bits in nth
block (Pi,n and G
1,m) for n=2i
.k ∀ k ∈ Z+
& n ≤ N
for n=2i
.k ∀k ∈ Z+
& n ≤ N
for 1 ≤ n ≤ N & n!=2i
.k ∀ k ∈ Z+
for 1 ≤ n ≤ N & n!=2i
.k ∀ k ∈ Z+
(6)
(7)
Figure 5. Example of 8 bit Brent-Kung adder
Technology (IJECET),
June (2013), © IAEME
Stone Adder has higher performance but it takes larger area to be laid out.
Stone Adder and has lesser wiring
Kung adder of size N, there are
Kung adder, the propagate and
and Gi,n) are
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
264
To calculate generate and propagate bits of ith
stage (log2(N)+1 ≤ i <2*log2(N)), we
define three variables count, x and y.
count =k where 2k
is the smallest integer ≥ N and x ∈ Z+
, x≥ 2 and x=2 for i= log2(N)+1 and
is incremented by 1 for every subsequent stage and
z=2count
–2count-2
y= 2count
–2count-x
(8)
Now in the ith
stage the propagate and generate bits in nth
block (Pi,n and Gi,n) are calculated
according to
Gi,n = Gi-1,n + (Pi-1,n Gi-1,m) for n=y-l and N ≥ n ≥ z/2
Pi,n = Pi-1,n Pi-1,m for n=y-l and N ≥ n ≥ z/2
Gi,n = Gi-1,n for 1 ≤ n ≤ N and n!=y-l
Pi,n = Pi-1,n for 1 ≤ n ≤ N and n!=y-l (9)
where m and l are given by (10) and (11) respectively
m= n – 2count- x
(10)
l= (2count
–2count-x+1
).k, ∀ k ∈ Z+
U {0} (11)
Finally, the carry and sum bits are calculated according to (12).
Cn = Gf,n
Sn = P0,n Cn-1 (12)
where Sn and Cn are the nth
bits of the sum and carry respectively. Gf,n is the generate bit of
the nth
block in the final stage.
Fig. 5 shows the example of an 8 bit Brent-Kung Adder. Since we need a ‘2N-st_index’ bit
adder for the final stage of an NxN multiplier, the number of stages in the adder are
ceil[2*log2(2N-st_index)].
III.IV.III Han-Carlson adder
Brent-Kung adder reduces area and power but do not produce minimum depth
parallel prefix circuits. Their delay time is also high (2*log2N – 1). Kogge-Stone adder has
lesser delay (log2N) but it has high area and power. By combining B-K & K-S graphs, Han
and Carlson obtained a new hybrid prefix graph that achieves intermediate values of area and
time[8]. An example of 8 bit Han-Carlson Adder is shown in Fig. 6. The numbers of stages in
the adder depend upon its size. In a Han-Carlson adder of size N, there are Ceil[log2(N)+2]
stages.
Initially (stage zero) in an N-bit Han-Carlson adder, the propagate and generate bits are
generated according to (13).
G0,n = an bn for 1 ≤ n ≤ N
P0,n = an + bn for 1 ≤ n ≤ N (13)
where an and bn are the nth
bits of the two inputs
In the ith
(1≤ i ≤ log2(N)) stage the propagate and generate bits in nth
block (Pi,n and Gi,n) are
calculated according to (14)
Gi,n = Gi-1,n + (Pi-1,n Gi-1,m) for 2i-1
+2 ≤ n ≤ N, n is even
Pi,n = Pi-1,n Pi-1,m for 2i-1
+2 ≤ n ≤ N, n is even
Let S={n│2i-1
+2 ≤ n ≤ N, n is even}
Gi,n = Gi-1,n for 1 ≤ n ≤ N , ∀ n ∉ S
Pi,n = Pi-1,n for 1 ≤ n ≤ N , ∀ n ∉ S (14)
where m =n – 2i-1
(15)
In the last stage(i=log2(N)+1) the propagate and generate bits in nth
block (Pi,n and Gi,n) are
calculated according to (16)
International Journal of Electronics and Communication Engineering &
ISSN 0976 – 6464(Print), ISSN 0976
Gi,n = Gi-1,n + (Pi-1,n Gi-1,m
Pi,n = Pi-1,n Pi-1,m
Let R={n│3 ≤ n ≤ N, n is odd}
Gi,n = Gi-1,n
Pi,n = Pi-1,n
Finally, the carry and sum bits are calculated according to (17).
Cn = Gf,n
Sn = P0,n Cn-1
Figure 6.
where Sn and Cn are the nth
bits of the sum and carry respectively. G
the nth
block in the final stage.
Since we need a ‘2N-st_index’ bit adder for the final stage of an NxN multiplier, the number
of stages in the adder is given by
IV. RESULTS AND DISCUSSION
All the multiplier designs use Verilog as the HDL. The synthesis
results of Wallace multiplier with all the parallel prefix adders discussed in the previous
section were compared. This section prov
micrometer and power in milliwatt for all the architectures mentioned above. The synthesis
International Journal of Electronics and Communication Engineering & Technology (IJECET),
6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
265
1,m) for 3 ≤ n ≤ N, n is odd
for 3 ≤ n ≤ N, n is odd
for 1 ≤ n ≤ N, n ∉ R
for 1 ≤ n ≤ N, n ∉ R (16)
s are calculated according to (17).
1 (17)
Example of 8 bit Han-Carlson adder
bits of the sum and carry respectively. Gf,n is the generate bit of
bit adder for the final stage of an NxN multiplier, the number
of stages in the adder is given by ceil[log2(2N-st_index)+2].
DISCUSSION
All the multiplier designs use Verilog as the HDL. The synthesis and post layout
multiplier with all the parallel prefix adders discussed in the previous
section were compared. This section provides the delay in millisecond, area in square
micrometer and power in milliwatt for all the architectures mentioned above. The synthesis
Technology (IJECET),
June (2013), © IAEME
is the generate bit of
bit adder for the final stage of an NxN multiplier, the number
and post layout
multiplier with all the parallel prefix adders discussed in the previous
ides the delay in millisecond, area in square
micrometer and power in milliwatt for all the architectures mentioned above. The synthesis
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
266
was performed in Cadence RTL Compiler using 90 nm UMC technology libraries at typical
(tt) conditions. The synthesized netlist was used along with design constraint .sdc file,
technology library file and .lef files for generating the layout using SOC Encounter tool. The
layouts have been done for Wallace multiplier of different sizes and their post layout results
have been tabulated. The post layout results follow the same trend as the post synthesis
results.
TABLE I. COMPARISON OF SYNTHESIS AND POST LAYOUT RESULTS FOR WALLACE
MULTIPLIER WITH KOGGE-STONE ADDER
TABLE II. COMPARISON OF SYNTHESIS AND POST LAYOUT RESULTS FOR WALLACE
MULTIPLIER WITH BRENT-KUNG ADDER
TABLE III. COMPARISON OF SYNTHESIS AND POST LAYOUT RESULTS FOR WALLACE
MULTIPLIER WITH HAN-CARLSON ADDER
Size
Synthesis
(Pre Layout)
Post Layout
Delay
(ns)
Area
(µm2
)
Delay
(ns)
Area
(um2
)
16 5.495 9404 5.626 22309
32 7.128 37939 7.702 92336
48 8.191 84688 10.144 200909
64 9.570 150174 10.998 356264
96 10.845 335926 15.691 796928
Size
Synthesis
(Pre Layout)
Post Layout
Delay
(ns)
Area
(µm2
)
Delay
(ns)
Area
(um2
)
16 5.004 9707 5.021 23028
32 6.698 38922 6.933 93159
48 7.70 86692 8.772 205662
64 8.804 152986 9.421 362935
96 10.262 340522 13.761 807831
Size
Synthesis
(Pre Layout)
Post Layout
Delay
(ns)
Area
(µm2
)
Delay
(ns)
Area
(um2
)
16 5.622 9330 5.977 20220
32 9.556 37002 9.616 90004
48 13.362 83637 12.763 198416
64 15.150 148553 15.458 352419
96 19.018 331952 19.811 787504
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
267
TABLE IV. COMPARISON OF POST LAYOUT RESULTS FOR WALLACE MULTIPLIER WITH
PARALLEL PREFIX ADDERS
Size
Kogge-Stone Brent-Kung Han-Carlson
Power
(mW)
PDP
Power
(mW)
PDP
Power
(mW)
PDP
16 0.437 2.194 0.370 2.215 0.411 2.312
32 1.145 7.938 0.961 9.241 1.049 8.079
48 1.859 16.307 1.356 17.306 1.665 16.89
64 2.955 27.839 2.248 34.749 2.685 29.53
96 5.795 79.745 4.776 94.617 5.295 83.083
The post layout area is the area of core which includes area of standard cells as well
as the area of interconnect wires. Fig.7 shows that the multiplier with Kogge-Stone adder in
the final stage is much faster than with all other adders but it’s power consumption is the
highest of all as shown in Fig. 8. Fig.9 compares the Power Delay Products. It can be
observed that Wallace multiplier with Kogge-Stone and Han-Carlson adders in the final stage
has lower Power Delay Products than with Brent-Kung adder. Wallace multiplier with Brent-
Kung adder gives lower power and area when compared to that with Kogge-Stone and Han-
Carlson adders but owing to the higher delay, its Power Delay Product is also higher.
Figure 7. Post Layout Delay Comparison
10 20 30 40 50 60 70 80 90 100
5
10
15
20
Size
Delay(ns)
Delay vs Size
Kogge-Stone
Brent-Kung
Han-Carlson
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
268
Figure 8. Post Layout Power Comparison
Figure 9. Post Layout Power Delay Product Comparison
10 20 30 40 50 60 70 80 90 100
0
1
2
3
4
5
6
Power vs Size
Size
Power(mW)
Kogge-Stone
Brent-Kung
Han-Carlson
10 20 30 40 50 60 70 80 90 100
0
10
20
30
40
50
60
70
80
90
100
Size
PowerDelayProduct(mW-ns)
Power Delay Product(PDP) vs Size
Kogge-Stone
Brent-Kung
Han-Carlson
International Journal of Electronics and Communication Engineering & Technology (IJECET),
ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME
269
V. CONCLUSION
This paper explains an easy and efficient method of generating synthesizable Verilog
code for Wallace multiplier of user specified size. It also shows that the use of Parallel Prefix
adders in the final stage greatly improves the speed of Wallace multiplier and also gives
lower Power Delay Products. The logarithmic delay of the Parallel Prefix adders supplements
the logarithmic delay of the compression tree to provide an overall logarithmic delay.
REFERENCES
[1] P. R. Cappello and K Steiglitz: A VLSI layout for a pipe-lined Dadda multiplier,
ACM Transactions on Computer Systems 1,2(May 1983), pp. 157-17.
[2] Baugh, Charles R.; Wooley, B.A., "A Two's Complement Parallel Array
Multiplication Algorithm," Computers, IEEE Transactions on , vol.C-22, no.12,
pp.1045,1047, Dec. 1973
[3] Wallace, C. S., "A Suggestion for a Fast Multiplier," Electronic Computers, IEEE
Transactions on , vol.EC-13, no.1, pp.14,17, Feb. 1964
[4] L. Dadda, “Some schemes for parallel multipliers,” Alta Frequenza, vol. 34, pp. 349–
356, 1965
[5] Townsend, W. Swartzlander, E. Abraham, J., "A Comparison of Dadda and Wallace
Multiplier Delays". SPIE Advanced Signal Processing Algorithms, Architectures, and
Implementations XIII.
[6] Kogge, Peter M.; Stone, Harold S., "A Parallel Algorithm for the Efficient Solution of
a General Class of Recurrence Equations," Computers, IEEE Transactions on,
vol.C-22, no.8, pp.786,793, Aug. 1973
[7] Brent, Richard P.; Kung, H. T., "A Regular Layout for Parallel Adders," Computers,
IEEE Transactions on , vol.C-31, no.3, pp.260,264, March 1982
[8] Han, Tackdon; Carlson, D.A., "Fast area-efficient VLSI adders," Computer
Arithmetic (ARITH), 1987 IEEE 8th Symposium on , vol., no., pp.49,56, 18-21 May
1987
[9] Er. Kirti Rawal, Er.Sonia, Er. Rajeev Kumar Patial and Mahesh Mudavath, “Parallel
Algorithm for Computing Edt with New Architecture”, International Journal of
Electronics and Communication Engineering & Technology (IJECET), Volume 1,
Issue 1, 2010, pp. 1 - 17, ISSN Print: 0976- 6464, ISSN Online: 0976 –6472
[10] Anitha R and V Bagyaveereswaran, “High Performance Parallel Prefix Adders with
Fast Carry Chain Logic”, International Journal of Advanced Research in Engineering
& Technology (IJARET), Volume 3, Issue 2, 2012, pp. 1 - 10, ISSN Print: 0976-
6480, ISSN Online: 0976-6499.

More Related Content

What's hot

Ijarcet vol-2-issue-3-1036-1040
Ijarcet vol-2-issue-3-1036-1040Ijarcet vol-2-issue-3-1036-1040
Ijarcet vol-2-issue-3-1036-1040
Editor IJARCET
 
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Angel Yogi
 

What's hot (19)

IRJET- An Efficient Multiply Accumulate Unit Design using Vedic Mathematics A...
IRJET- An Efficient Multiply Accumulate Unit Design using Vedic Mathematics A...IRJET- An Efficient Multiply Accumulate Unit Design using Vedic Mathematics A...
IRJET- An Efficient Multiply Accumulate Unit Design using Vedic Mathematics A...
 
Ijarcet vol-2-issue-3-1036-1040
Ijarcet vol-2-issue-3-1036-1040Ijarcet vol-2-issue-3-1036-1040
Ijarcet vol-2-issue-3-1036-1040
 
An 8 bit_multiplier
An 8 bit_multiplierAn 8 bit_multiplier
An 8 bit_multiplier
 
2012
20122012
2012
 
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. TechniqueDesign and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
Design and Implementation of 8 Bit Multiplier Using M.G.D.I. Technique
 
An Area-efficient Montgomery Modular Multiplier for Cryptosystems
An Area-efficient Montgomery Modular Multiplier for CryptosystemsAn Area-efficient Montgomery Modular Multiplier for Cryptosystems
An Area-efficient Montgomery Modular Multiplier for Cryptosystems
 
32-bit unsigned multiplier by using CSLA & CLAA
32-bit unsigned multiplier by using CSLA &  CLAA32-bit unsigned multiplier by using CSLA &  CLAA
32-bit unsigned multiplier by using CSLA & CLAA
 
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
 
Design and Implementation of an Efficient 64 bit MAC
Design and Implementation of an Efficient 64 bit MACDesign and Implementation of an Efficient 64 bit MAC
Design and Implementation of an Efficient 64 bit MAC
 
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
A Novel Design For Generating Dynamic Length Message Digest To Ensure Integri...
 
FPGA IMPLEMENTATION OF HIGH SPEED BAUGH-WOOLEY MULTIPLIER USING DECOMPOSITION...
FPGA IMPLEMENTATION OF HIGH SPEED BAUGH-WOOLEY MULTIPLIER USING DECOMPOSITION...FPGA IMPLEMENTATION OF HIGH SPEED BAUGH-WOOLEY MULTIPLIER USING DECOMPOSITION...
FPGA IMPLEMENTATION OF HIGH SPEED BAUGH-WOOLEY MULTIPLIER USING DECOMPOSITION...
 
Fpga implementation of high speed
Fpga implementation of high speedFpga implementation of high speed
Fpga implementation of high speed
 
High Performance Baugh Wooley Multiplier Using Carry Skip Adder Structure
High Performance Baugh Wooley Multiplier Using Carry Skip Adder StructureHigh Performance Baugh Wooley Multiplier Using Carry Skip Adder Structure
High Performance Baugh Wooley Multiplier Using Carry Skip Adder Structure
 
Vedic multiplier
Vedic multiplierVedic multiplier
Vedic multiplier
 
IJETT-V9P226
IJETT-V9P226IJETT-V9P226
IJETT-V9P226
 
Lab03
Lab03Lab03
Lab03
 
Permonace Modeling of Pipelined Linear Algebra Architectures on ASIC
Permonace Modeling of Pipelined Linear Algebra Architectures on ASICPermonace Modeling of Pipelined Linear Algebra Architectures on ASIC
Permonace Modeling of Pipelined Linear Algebra Architectures on ASIC
 
IRJET - Approximate Unsigned Multiplier with Varied Error Rate
IRJET - Approximate Unsigned Multiplier with Varied Error RateIRJET - Approximate Unsigned Multiplier with Varied Error Rate
IRJET - Approximate Unsigned Multiplier with Varied Error Rate
 
Design and implementation of high speed baugh wooley and modified booth multi...
Design and implementation of high speed baugh wooley and modified booth multi...Design and implementation of high speed baugh wooley and modified booth multi...
Design and implementation of high speed baugh wooley and modified booth multi...
 

Similar to Automated hdl generation of two’s complement wallace multiplier with par

Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
IJRAT
 
JOURNAL PAPER
JOURNAL PAPERJOURNAL PAPER
JOURNAL PAPER
Raj kumar
 
Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2
IAEME Publication
 

Similar to Automated hdl generation of two’s complement wallace multiplier with par (20)

A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...
A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...
A High Speed Wallace Tree Multiplier Using Modified Booth Algorithm for Fast ...
 
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...
Design of a Novel Multiplier and Accumulator using Modified Booth Algorithm w...
 
VLSI ARCHITECTURE OF AN 8-BIT MULTIPLIER USING VEDIC MATHEMATICS IN 180NM TEC...
VLSI ARCHITECTURE OF AN 8-BIT MULTIPLIER USING VEDIC MATHEMATICS IN 180NM TEC...VLSI ARCHITECTURE OF AN 8-BIT MULTIPLIER USING VEDIC MATHEMATICS IN 180NM TEC...
VLSI ARCHITECTURE OF AN 8-BIT MULTIPLIER USING VEDIC MATHEMATICS IN 180NM TEC...
 
IRJET- Implementation of Low Area and Less Delay of Various Multipliers u...
IRJET-  	  Implementation of Low Area and Less Delay of Various Multipliers u...IRJET-  	  Implementation of Low Area and Less Delay of Various Multipliers u...
IRJET- Implementation of Low Area and Less Delay of Various Multipliers u...
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Design and Verification of 4 X 4 Wallace Tree Multiplier
Design and Verification of 4 X 4 Wallace Tree MultiplierDesign and Verification of 4 X 4 Wallace Tree Multiplier
Design and Verification of 4 X 4 Wallace Tree Multiplier
 
IRJET - Comparison of Vedic, Wallac Tree and Array Multipliers
IRJET -  	  Comparison of Vedic, Wallac Tree and Array MultipliersIRJET -  	  Comparison of Vedic, Wallac Tree and Array Multipliers
IRJET - Comparison of Vedic, Wallac Tree and Array Multipliers
 
D0161926
D0161926D0161926
D0161926
 
Lo3420902093
Lo3420902093Lo3420902093
Lo3420902093
 
High Performance MAC Unit for FFT Implementation
High Performance MAC Unit for FFT Implementation High Performance MAC Unit for FFT Implementation
High Performance MAC Unit for FFT Implementation
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
 
Braun’s Multiplier Implementation using FPGA with Bypassing Techniques.
Braun’s Multiplier Implementation using FPGA with Bypassing Techniques.Braun’s Multiplier Implementation using FPGA with Bypassing Techniques.
Braun’s Multiplier Implementation using FPGA with Bypassing Techniques.
 
JOURNAL PAPER
JOURNAL PAPERJOURNAL PAPER
JOURNAL PAPER
 
IRJET- Wallace Tree Multiplier using MFA Counters
IRJET-  	  Wallace Tree Multiplier using MFA CountersIRJET-  	  Wallace Tree Multiplier using MFA Counters
IRJET- Wallace Tree Multiplier using MFA Counters
 
Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2
 
IRJET- Efficient Design of Radix Booth Multiplier
IRJET- Efficient Design of Radix Booth MultiplierIRJET- Efficient Design of Radix Booth Multiplier
IRJET- Efficient Design of Radix Booth Multiplier
 
Fpga sotcore architecture for lifting scheme revised
Fpga sotcore architecture for lifting scheme revisedFpga sotcore architecture for lifting scheme revised
Fpga sotcore architecture for lifting scheme revised
 
A Comparative Analysis of Vedic multiplier with Array and Wallace Tree multip...
A Comparative Analysis of Vedic multiplier with Array and Wallace Tree multip...A Comparative Analysis of Vedic multiplier with Array and Wallace Tree multip...
A Comparative Analysis of Vedic multiplier with Array and Wallace Tree multip...
 
A Brief Review of Design of High Speed Low Power Area Efficient Multipliers
A Brief Review of Design of High Speed Low Power Area Efficient MultipliersA Brief Review of Design of High Speed Low Power Area Efficient Multipliers
A Brief Review of Design of High Speed Low Power Area Efficient Multipliers
 
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
High Speed and Area Efficient Matrix Multiplication using Radix-4 Booth Multi...
 

More from IAEME Publication

A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSA STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
IAEME Publication
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSBROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
IAEME Publication
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICE
IAEME Publication
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
IAEME Publication
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
IAEME Publication
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
IAEME Publication
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
IAEME Publication
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
IAEME Publication
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
IAEME Publication
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
IAEME Publication
 

More from IAEME Publication (20)

IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdf
 
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
 
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSA STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSBROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
 
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSDETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
 
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
 
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOVOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
 
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYVISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICE
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Automated hdl generation of two’s complement wallace multiplier with par

  • 1. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 256 AUTOMATED HDL GENERATION OF TWO’S COMPLEMENT WALLACE MULTIPLIER WITH PARALLEL PREFIX ADDERS Bharat Kumar Potipireddi1 , Dr. Abhijit Asati2 1 (EEE, BITS PILANI, Pilani, Rajasthan, India) 2 (EEE, BITS PILANI, Pilani, Rajasthan, India) ABSTRACT Wallace multipliers are among the fastest multipliers owing to their logarithmic delay. The partial products of two’s complement multiplication are generated by an algorithm described by Baugh-Wooley. The complicated reduction of partial products by Wallace algorithm and use of Parallel Prefix adders with logarithmic delay in the final stage of addition makes it difficult to write a generic Verilog code for them. To solve this difficulty, we described a C program which automatically generates a Verilog file for a Wallace multiplier of user defined size with Parallel Prefix adders like Kogge-Stone adder, Brent- Kung adder and Han-Carlson adder. We compared their post layout results which include propagation delay, area and power consumption. The Verilog codes have been synthesized using 90nm technology library. We observed that the multiplier using Kogge-Stone adder in the final stage gives higher speed and lower Power Delay Products when compared to that using Brent-Kung and Han- Carlson adders. Keywords: Brent-kung adder, Han-Carlson adder, Kogge-Stone adder, Two’s complement multiplication, Wallace multiplier. I. INTRODUCTION High speed multiplication is a fundamental requirement in many high performance digital systems. For this purpose, parallel multiplication schemes have been developed. There are two classes of parallel multipliers, namely array multipliers and tree multipliers. Tree multipliers, also known as column compression multipliers, are known for their higher speeds making them very useful in high speed computations. Their propagation delay is proportional to the logarithm of the operand word length in comparison to array multipliers whose delay is INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) ISSN 0976 – 6464(Print) ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June, 2013, pp. 256-269 © IAEME: www.iaeme.com/ijecet.asp Journal Impact Factor (2013): 5.8896 (Calculated by GISI) www.jifactor.com IJECET © I A E M E
  • 2. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 257 directly proportional to operand word length [1]. Column compression multipliers are faster than array multipliers but have an irregular structure and so their design is difficult. With the improvement in VLSI design techniques and process technology, designs which were previously infeasible or too difficult to be implemented by manual layout can now be implemented through automated synthesis. Two of the most well-known column compression multipliers have been presented by Wallace [3] and Dadda [4]. Both architectures are similar with the difference occurring in the procedure of reduction of the partial products and the size of the final adder. In Wallace’s scheme, the partial products are reduced as soon as possible. On the other hand, Dadda’s method does minimum reduction necessary at each level and requires the same number of levels as Wallace multiplier [5]. As a result, final adder in Wallace multiplier is slightly smaller in size as compared to the final adder in Dadda multiplier. This paper presents a C program which generates a Verilog file for Wallace multiplier of user specified size. A comparison of post synthesis and post layout results between Wallace multiplier of varying sizes with Parallel Prefix adders in the final stage is also presented. Sections II and III explain the algorithm of Wallace and its implementation in C to create an automatic Verilog file generator. Section IV gives the post synthesis and post layout results for the multiplier of varying sizes. II. WALLACE MULTIPLIER ARCHITECTURE The Wallace multiplier architecture can be divided into three stages. The first stage involves generation of partial products by two's complement parallel array multiplication algorithm presented by Baugh-Wooley [2]. In their algorithm, signs of all partial product bits are positive. It's different from conventional two's complement multiplication which generates partial product bits with negative and positive signs. The final product obtained after the reduction is also in two’s complement form. Fig. 1 shows generation of partial products for 4x4 multiplier by Baugh-Wooley method. Fig. 2(a) shows the arrangement of the partial products for an 8x8 multiplier. The dots represent the partial products. In the second stage, the partial product matrix is reduced to a height of two using the column compression procedure developed by Wallace. The iterative procedure for doing this is as follows: Find out the maximum height of columns in the dot matrix array. If it is greater than 2, reduce the height by following the recursive procedure described below. 1. Check the height of each column. If it is 1, no reduction is done. If it is 2, use a half adder else use a full adder and check the height of column again. Continue the reduction till the height of column becomes ≤ 1. 2. Repeat the above step for all other columns and at the end, enqueue the sum bits of all half adders and full adders into the same columns and carry bits into the adjacent columns. 3. Again find out the maximum height of columns and continue the reduction using the above recursive procedure till maximum height reaches 2.
  • 3. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 258 Figure 1. Generation of Partial Products of 4x4 Two's complement multiplier by Baugh Wooley's method a) b) c) d) e) Figure 2. Column Compression scheme for 8x8 Wallace multiplier Fig. 2 (b), (c), (d) and (e) show the reduction stages for an 8x8 Wallace multiplier. Once the height of matrix is reduced to two, an adder is used to generate the final product. The paper describes the use of different Parallel Prefix adders for final adder stage which is described later in section III-D.
  • 4. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 259 III. VERILOG CODE GENERATION The main objective in writing a C program was to output a verilog file for an NxN Wallace multiplier based on the user input N. We implemented the Wallace algorithm and wrote the verilog code to a file using C to achieve this. III.I Initialization of Verilog Modules Initially, the program prompts the user for the size of the multiplier. After accepting the size of the multiplier, the program creates an empty Verilog file and prints the half adder and full adder modules in it. Next, it prints the gate level ‘and’ primitive for partial product generation. After this the top-level multiplier module is printed using sub-modules generated above. The top module contains two N-bit ‘input’ data types one 2N bit ‘output’ data type for the multiplier, multiplicand and product bits respectively. The program also sets up N2 ‘wire’ data type to store the partial products. Once all the above modules and data types are set up in the Verilog file, the program prints the required number of half adders and full adders to reduce column heights as explained earlier in section-II. Column reduction utilizes the dot matrix array in C, which is described below. III.II Dot Matrix Array Creation For the purpose of implementing the column compression according to Wallace’s algorithm, the program creates an equivalent of a dot matrix array internally. The columns of the array are represented by queues (First-In-First-Out data structure). Hence the program creates 2N queues, where each of the queues is implemented as a linked list. The sizes and pointers to the first and last element of all queues are stored. This allows quicker enqueuing and dequeuing operations. Every element in a queue stores a string of characters, particularly the name of the wires carrying the partial product. Initially, the queues are loaded with the names of the wires holding the partial products. III.III Array Reduction Once the array, using the queues, has been created in C, it needs to be reduced according to Wallace’s algorithm. The recursive procedure described in the section-II is implemented in C using a ‘while loop’. The size of the queues (height of columns) is reduced by using full adders and half adders. The loop stops executing when the length of all queues (height of all columns) is two or less. Let ‘a[2N]’ represents an array of ‘2N’ queues to store the names of partial products during reduction. Let a[i] size represents the size of ith queue. At the start of every iteration, the sizes of all the 2N queues are checked and their maximum is stored in ‘max’. Let temp_size[i] holds the size of ith queue a[i] size and it is reassigned with new size after every iteration. Every dequeue operation will decrement the size of queue, a[i] size by 1 and every enqueue will increment its size by 1 and temp_size[i] will be decremented by 1 for every dequeue. If temp_size[i] = 2, queue a[i] is reduced using a half adder. The first two elements are dequeued and used as input to the half adder, temp_size[i] is decremented by 2 and becomes 0. The sum from the half adder is enqueued to the same queue and the carry out is enqueued to the next queue in sequence. If temp_size[i] > 2, full adder is used to reduce the queue. When a full adder is used, the first three elements of the queue are dequeued and are supplied as inputs to the full adder. The sum from the full adder is enqueued to the same queue while the carry out is enqueued to the next queue in sequence and temp_size[i] is decremented by 3. If temp_size[i] becomes ≤ 1, the reduction of queue a[i] is stopped else the above procedure is followed recursively. After a queue is reduced, the next queue in sequence is taken up for reduction. At the
  • 5. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 260 end of iteration, maximum height of dot matrix array is found out and is stored in ‘max’. The iterations continue till ‘max’ becomes ≤ 2 i.e at most two elements remain in each queue. Once such a state has been reached the ‘while’ loop exits and the reduction phase is completed. The array reduction using queues is explained with the help of an example. A 4x4 Wallace multiplication is taken as the example a) b) c) d) e) Figure 3. Reduction of Queues for a 4x4 Wallace multiplier
  • 6. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 261 The state of the queues before the start of the reduction is shown in fig. 3(a). The reduction of the queues for a 4x4 multiplier is explained below: III.III.I Iteration One On checking the sizes of all the queues, it is observed that the maximum size ‘max’ for this iteration is 4. Since the size of Queue1 = 1, no reduction is done. The size of Queue2 is 2, hence a half adder is used for reduction. The first two elements of the queue are dequeued and summed in a half adder (‘ha0’) by printing a half adder in the Verilog file. Now temp_size[1] of this queue becomes 0. The sum bit of the half adder is assigned to a wire ‘ha0s’ and the carry out bit is assigned to the wire ‘ha0c’. In accordance with the algorithm, ‘ha0s’ is enqueued to Queue2 and ‘ha0c’ is enqueued to Queue3. Since temp_size[2] for Queue3 is 3, a full adder (‘fa0’) is used for reduction by printing a full adder in verilog file. The addition of the full adder leads to dequeuing of the first three elements of Queue3 and enqueuing of the sum of the full adder‘fa0s’ to Queue3 and the carry out of the full adder ‘fa0c’ to Queue4. Now for reduction of Queue4, consider temp_size[3] of this Queue which is 4. So, a full adder(‘fa1’) is used for reduction and it’s sum ‘fa1s’ is enqueued into Queue4 and carry ‘fa1c’ is enqueued into Queue5. Now temp_size[2] is decremented by 3 and becomes 1 and so the reduction stops. The reduction of Queues 5 and 6 follow similar procedure described above and the reduction of all Queues in this iteration is shown in Fig. 3(b). The elements above the arrow in the queue are the ones which are enqueued in this iteration. The state of the queues at the end of this iteration is shown in Fig. 3(c). III.III.IIIteration Two On checking the sizes of all the queues, it is observed that the maximum size ‘max’ for the second iteration is 3. The variable temp_size[i] is assigned the new size of ith array a[i] size. All queues with sizes greater than 1 need to be reduced. The program starts checking the sizes of the queues sequentially. Queue 3 has a size of 2 and hence a half adder (‘ha2’) is used for reduction. Full adders are used for the reduction of Queues 4 and 5 since their size is 3. Queues 6 and 7 are reduced using half adders. Fig. 3(d) shows the elements dequeued and enqueued to all the Queues in this iteration. The final state of the queues at the end of this iteration is shown in Fig. 3(e). III.IV Final Stage Adder Once the size of all queues has been reduced to two or less, the elements in the queues are ready to be summed using an adder. If a[i] size is 1, the only element in the Queue a[i] is dequeued and assigned to prod[i], where ‘prod’ is an array holding the ‘2N’ bit product of an NxN multiplier. Let ‘st_index’ gives the value of i such that size of all queues before a[i] is 1. The size of the final adder is ‘2N – st_index’. The first elements of all queues form the first input to the adder and the second elements form the second input to the adder. Different Parallel Prefix adders like Kogge-Stone adder, Brent-Kung adder and Han-Carlson adder are used. The implementation of these adders is described below. III.IV.I Kogge-Stone Adder The Kogge-Stone adder generates carry signal in O(log n) time [6]. A radix-2 Kogge- Stone adder of size ‘2N-st_index’ has been used as the final adder for an NxN multiplier. Fig. 4 shows the example of an 8-bit Kogge-Stone adder with no carry-in. The first bit in the boxes is the propagate bit while the second one is the generate bit. Initially (stage zero) in an N-bit Kogge- Stone adder, the propagate and generate bits are generated according to (1). G0,n = an bn for 1 ≤ n ≤ N P0,n = an + bn for 1 ≤ n ≤ N (1) where an and bn are the nth bits of the two inputs.
  • 7. International Journal of Electronics and Communication Engineering & ISSN 0976 – 6464(Print), ISSN 0976 Figure 4. In the ith (i ≥ 1) stage the propagate and generate bits in n according to (2). Gi,n = Gi-1,n Pi,n = Pi-1,n Gi,n = Gi-1,n + (Pi-1,n Gi-1,m Pi,n = Pi-1,n Pi-1,m where m is given by (3) m =n – 2i-1 Finally, the carry and sum bits are calculated according to (4). Cn = Gf,n Sn = P0,n Cn-1 where Sn and Cn are the nth bits of the sum and carry respectively. G nth block in the final stage. The numbers of stages in the adder depend upon its size. In a Kogge size N, there are ceil(log2N)+1 stages. Since we need a stage of an NxN multiplier, the number of stages in the adder are The logarithmic delay of Kogge delay of the multiplier. International Journal of Electronics and Communication Engineering & Technology (IJECET), 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 262 Figure 4. Example of 8 bit Kogge-Stone adder 1) stage the propagate and generate bits in nth block (Pi,n and Gi,n) are calculated for 1 ≤ n ≤ 2i-1 +1 for 1 ≤ n ≤ 2i-1 +1 1,m) for 2i-1 +2 ≤ n ≤ N for 2i-1 +2 ≤ n ≤ N (2) (3) Finally, the carry and sum bits are calculated according to (4). 1 (4) bits of the sum and carry respectively. Gf,n is the generate bit of the The numbers of stages in the adder depend upon its size. In a Kogge-Stone adder of stages. Since we need a ‘2N-st_index’ bit adder for the final multiplier, the number of stages in the adder are ceil[log2(2N- The logarithmic delay of Kogge-Stone adder is crucial in maintaining overall logarithmic Technology (IJECET), June (2013), © IAEME ) are calculated is the generate bit of the Stone adder of bit adder for the final -st_index)]+1. Stone adder is crucial in maintaining overall logarithmic
  • 8. International Journal of Electronics and Communication Engineering & ISSN 0976 – 6464(Print), ISSN 0976 III.IV.II Brent-Kung Adder Kogge-Stone Adder has higher performance but it takes larger area to be laid out. Brent-Kung adder can be laid out in lesser area than Kogge congestion, but its time performance is poor[7]. In a Brent Ceil[2*log2(N)] stages. Initially (stage zero) in an N generate bits are generated according to (5). G0,n = an bn for 1 ≤ n ≤ P0,n = an + bn for 1 ≤ n where an and bn are the nth bits of the two inputs In the ith (1≤ i ≤ log2(N)) stage the propagate and generate bits in n calculated according to (6). Gi,n = Gi-1,n + (Pi-1,n Gi-1,m Pi,n = Pi-1,n Pi-1,m Gi,n = Gi-1,n Pi,n = Pi-1,n where m is given by (7) m =n – 2i-1 Figure 5. International Journal of Electronics and Communication Engineering & Technology (IJECET), 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 263 Stone Adder has higher performance but it takes larger area to be laid out. out in lesser area than Kogge-Stone Adder and has lesser wiring congestion, but its time performance is poor[7]. In a Brent-Kung adder of size N, there are (N)] stages. Initially (stage zero) in an N-bit Brent-Kung adder, the propagate and generate bits are generated according to (5). n ≤ N ≤ n ≤ N (5) bits of the two inputs ) stage the propagate and generate bits in nth block (Pi,n and G 1,m) for n=2i .k ∀ k ∈ Z+ & n ≤ N for n=2i .k ∀k ∈ Z+ & n ≤ N for 1 ≤ n ≤ N & n!=2i .k ∀ k ∈ Z+ for 1 ≤ n ≤ N & n!=2i .k ∀ k ∈ Z+ (6) (7) Figure 5. Example of 8 bit Brent-Kung adder Technology (IJECET), June (2013), © IAEME Stone Adder has higher performance but it takes larger area to be laid out. Stone Adder and has lesser wiring Kung adder of size N, there are Kung adder, the propagate and and Gi,n) are
  • 9. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 264 To calculate generate and propagate bits of ith stage (log2(N)+1 ≤ i <2*log2(N)), we define three variables count, x and y. count =k where 2k is the smallest integer ≥ N and x ∈ Z+ , x≥ 2 and x=2 for i= log2(N)+1 and is incremented by 1 for every subsequent stage and z=2count –2count-2 y= 2count –2count-x (8) Now in the ith stage the propagate and generate bits in nth block (Pi,n and Gi,n) are calculated according to Gi,n = Gi-1,n + (Pi-1,n Gi-1,m) for n=y-l and N ≥ n ≥ z/2 Pi,n = Pi-1,n Pi-1,m for n=y-l and N ≥ n ≥ z/2 Gi,n = Gi-1,n for 1 ≤ n ≤ N and n!=y-l Pi,n = Pi-1,n for 1 ≤ n ≤ N and n!=y-l (9) where m and l are given by (10) and (11) respectively m= n – 2count- x (10) l= (2count –2count-x+1 ).k, ∀ k ∈ Z+ U {0} (11) Finally, the carry and sum bits are calculated according to (12). Cn = Gf,n Sn = P0,n Cn-1 (12) where Sn and Cn are the nth bits of the sum and carry respectively. Gf,n is the generate bit of the nth block in the final stage. Fig. 5 shows the example of an 8 bit Brent-Kung Adder. Since we need a ‘2N-st_index’ bit adder for the final stage of an NxN multiplier, the number of stages in the adder are ceil[2*log2(2N-st_index)]. III.IV.III Han-Carlson adder Brent-Kung adder reduces area and power but do not produce minimum depth parallel prefix circuits. Their delay time is also high (2*log2N – 1). Kogge-Stone adder has lesser delay (log2N) but it has high area and power. By combining B-K & K-S graphs, Han and Carlson obtained a new hybrid prefix graph that achieves intermediate values of area and time[8]. An example of 8 bit Han-Carlson Adder is shown in Fig. 6. The numbers of stages in the adder depend upon its size. In a Han-Carlson adder of size N, there are Ceil[log2(N)+2] stages. Initially (stage zero) in an N-bit Han-Carlson adder, the propagate and generate bits are generated according to (13). G0,n = an bn for 1 ≤ n ≤ N P0,n = an + bn for 1 ≤ n ≤ N (13) where an and bn are the nth bits of the two inputs In the ith (1≤ i ≤ log2(N)) stage the propagate and generate bits in nth block (Pi,n and Gi,n) are calculated according to (14) Gi,n = Gi-1,n + (Pi-1,n Gi-1,m) for 2i-1 +2 ≤ n ≤ N, n is even Pi,n = Pi-1,n Pi-1,m for 2i-1 +2 ≤ n ≤ N, n is even Let S={n│2i-1 +2 ≤ n ≤ N, n is even} Gi,n = Gi-1,n for 1 ≤ n ≤ N , ∀ n ∉ S Pi,n = Pi-1,n for 1 ≤ n ≤ N , ∀ n ∉ S (14) where m =n – 2i-1 (15) In the last stage(i=log2(N)+1) the propagate and generate bits in nth block (Pi,n and Gi,n) are calculated according to (16)
  • 10. International Journal of Electronics and Communication Engineering & ISSN 0976 – 6464(Print), ISSN 0976 Gi,n = Gi-1,n + (Pi-1,n Gi-1,m Pi,n = Pi-1,n Pi-1,m Let R={n│3 ≤ n ≤ N, n is odd} Gi,n = Gi-1,n Pi,n = Pi-1,n Finally, the carry and sum bits are calculated according to (17). Cn = Gf,n Sn = P0,n Cn-1 Figure 6. where Sn and Cn are the nth bits of the sum and carry respectively. G the nth block in the final stage. Since we need a ‘2N-st_index’ bit adder for the final stage of an NxN multiplier, the number of stages in the adder is given by IV. RESULTS AND DISCUSSION All the multiplier designs use Verilog as the HDL. The synthesis results of Wallace multiplier with all the parallel prefix adders discussed in the previous section were compared. This section prov micrometer and power in milliwatt for all the architectures mentioned above. The synthesis International Journal of Electronics and Communication Engineering & Technology (IJECET), 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 265 1,m) for 3 ≤ n ≤ N, n is odd for 3 ≤ n ≤ N, n is odd for 1 ≤ n ≤ N, n ∉ R for 1 ≤ n ≤ N, n ∉ R (16) s are calculated according to (17). 1 (17) Example of 8 bit Han-Carlson adder bits of the sum and carry respectively. Gf,n is the generate bit of bit adder for the final stage of an NxN multiplier, the number of stages in the adder is given by ceil[log2(2N-st_index)+2]. DISCUSSION All the multiplier designs use Verilog as the HDL. The synthesis and post layout multiplier with all the parallel prefix adders discussed in the previous section were compared. This section provides the delay in millisecond, area in square micrometer and power in milliwatt for all the architectures mentioned above. The synthesis Technology (IJECET), June (2013), © IAEME is the generate bit of bit adder for the final stage of an NxN multiplier, the number and post layout multiplier with all the parallel prefix adders discussed in the previous ides the delay in millisecond, area in square micrometer and power in milliwatt for all the architectures mentioned above. The synthesis
  • 11. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 266 was performed in Cadence RTL Compiler using 90 nm UMC technology libraries at typical (tt) conditions. The synthesized netlist was used along with design constraint .sdc file, technology library file and .lef files for generating the layout using SOC Encounter tool. The layouts have been done for Wallace multiplier of different sizes and their post layout results have been tabulated. The post layout results follow the same trend as the post synthesis results. TABLE I. COMPARISON OF SYNTHESIS AND POST LAYOUT RESULTS FOR WALLACE MULTIPLIER WITH KOGGE-STONE ADDER TABLE II. COMPARISON OF SYNTHESIS AND POST LAYOUT RESULTS FOR WALLACE MULTIPLIER WITH BRENT-KUNG ADDER TABLE III. COMPARISON OF SYNTHESIS AND POST LAYOUT RESULTS FOR WALLACE MULTIPLIER WITH HAN-CARLSON ADDER Size Synthesis (Pre Layout) Post Layout Delay (ns) Area (µm2 ) Delay (ns) Area (um2 ) 16 5.495 9404 5.626 22309 32 7.128 37939 7.702 92336 48 8.191 84688 10.144 200909 64 9.570 150174 10.998 356264 96 10.845 335926 15.691 796928 Size Synthesis (Pre Layout) Post Layout Delay (ns) Area (µm2 ) Delay (ns) Area (um2 ) 16 5.004 9707 5.021 23028 32 6.698 38922 6.933 93159 48 7.70 86692 8.772 205662 64 8.804 152986 9.421 362935 96 10.262 340522 13.761 807831 Size Synthesis (Pre Layout) Post Layout Delay (ns) Area (µm2 ) Delay (ns) Area (um2 ) 16 5.622 9330 5.977 20220 32 9.556 37002 9.616 90004 48 13.362 83637 12.763 198416 64 15.150 148553 15.458 352419 96 19.018 331952 19.811 787504
  • 12. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 267 TABLE IV. COMPARISON OF POST LAYOUT RESULTS FOR WALLACE MULTIPLIER WITH PARALLEL PREFIX ADDERS Size Kogge-Stone Brent-Kung Han-Carlson Power (mW) PDP Power (mW) PDP Power (mW) PDP 16 0.437 2.194 0.370 2.215 0.411 2.312 32 1.145 7.938 0.961 9.241 1.049 8.079 48 1.859 16.307 1.356 17.306 1.665 16.89 64 2.955 27.839 2.248 34.749 2.685 29.53 96 5.795 79.745 4.776 94.617 5.295 83.083 The post layout area is the area of core which includes area of standard cells as well as the area of interconnect wires. Fig.7 shows that the multiplier with Kogge-Stone adder in the final stage is much faster than with all other adders but it’s power consumption is the highest of all as shown in Fig. 8. Fig.9 compares the Power Delay Products. It can be observed that Wallace multiplier with Kogge-Stone and Han-Carlson adders in the final stage has lower Power Delay Products than with Brent-Kung adder. Wallace multiplier with Brent- Kung adder gives lower power and area when compared to that with Kogge-Stone and Han- Carlson adders but owing to the higher delay, its Power Delay Product is also higher. Figure 7. Post Layout Delay Comparison 10 20 30 40 50 60 70 80 90 100 5 10 15 20 Size Delay(ns) Delay vs Size Kogge-Stone Brent-Kung Han-Carlson
  • 13. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 268 Figure 8. Post Layout Power Comparison Figure 9. Post Layout Power Delay Product Comparison 10 20 30 40 50 60 70 80 90 100 0 1 2 3 4 5 6 Power vs Size Size Power(mW) Kogge-Stone Brent-Kung Han-Carlson 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100 Size PowerDelayProduct(mW-ns) Power Delay Product(PDP) vs Size Kogge-Stone Brent-Kung Han-Carlson
  • 14. International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 3, May – June (2013), © IAEME 269 V. CONCLUSION This paper explains an easy and efficient method of generating synthesizable Verilog code for Wallace multiplier of user specified size. It also shows that the use of Parallel Prefix adders in the final stage greatly improves the speed of Wallace multiplier and also gives lower Power Delay Products. The logarithmic delay of the Parallel Prefix adders supplements the logarithmic delay of the compression tree to provide an overall logarithmic delay. REFERENCES [1] P. R. Cappello and K Steiglitz: A VLSI layout for a pipe-lined Dadda multiplier, ACM Transactions on Computer Systems 1,2(May 1983), pp. 157-17. [2] Baugh, Charles R.; Wooley, B.A., "A Two's Complement Parallel Array Multiplication Algorithm," Computers, IEEE Transactions on , vol.C-22, no.12, pp.1045,1047, Dec. 1973 [3] Wallace, C. S., "A Suggestion for a Fast Multiplier," Electronic Computers, IEEE Transactions on , vol.EC-13, no.1, pp.14,17, Feb. 1964 [4] L. Dadda, “Some schemes for parallel multipliers,” Alta Frequenza, vol. 34, pp. 349– 356, 1965 [5] Townsend, W. Swartzlander, E. Abraham, J., "A Comparison of Dadda and Wallace Multiplier Delays". SPIE Advanced Signal Processing Algorithms, Architectures, and Implementations XIII. [6] Kogge, Peter M.; Stone, Harold S., "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," Computers, IEEE Transactions on, vol.C-22, no.8, pp.786,793, Aug. 1973 [7] Brent, Richard P.; Kung, H. T., "A Regular Layout for Parallel Adders," Computers, IEEE Transactions on , vol.C-31, no.3, pp.260,264, March 1982 [8] Han, Tackdon; Carlson, D.A., "Fast area-efficient VLSI adders," Computer Arithmetic (ARITH), 1987 IEEE 8th Symposium on , vol., no., pp.49,56, 18-21 May 1987 [9] Er. Kirti Rawal, Er.Sonia, Er. Rajeev Kumar Patial and Mahesh Mudavath, “Parallel Algorithm for Computing Edt with New Architecture”, International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 1, Issue 1, 2010, pp. 1 - 17, ISSN Print: 0976- 6464, ISSN Online: 0976 –6472 [10] Anitha R and V Bagyaveereswaran, “High Performance Parallel Prefix Adders with Fast Carry Chain Logic”, International Journal of Advanced Research in Engineering & Technology (IJARET), Volume 3, Issue 2, 2012, pp. 1 - 10, ISSN Print: 0976- 6480, ISSN Online: 0976-6499.