Survey on Prefix adders

Survey On Parallel Prefix Adders
Lakshmi Yasaswi Kamireddy and Shashikala Kodandaraman
ECE Department, University of Illinois at Chicago, IL 60607, USA
E-mail: lkamir2@uic.edu,skodan2@uic.edu
Abstract— Parallel Prefix Adder (PPA) is one of the fastest
types of adder that had been created and developed. Some of the
common types of parallel prefix adders are Brent-Kung, Kogge-
Stone, Han-Carlson and Ladner-Fischer adders. This paper
involves an investigation of the performances of these adders in
terms of computational delay, interconnect usage and power.
The investigation and comparison for all these adders was
conducted for 16 bit size. By using the Quartus II design
software, the designs for above mentioned adders were
developed. The simulation result produced the vector waveform
which then shows the computational delay; interconnect usage
and power for the adders. Hence, this project is significant in
showing which of these adders being tested perform better in
terms of computational delay, interconnect usage and power.
Keywords—Prefix adders; Kogge-stone adder; Brent-Kung
adder; Han-Carlson adder; Ladner-Fischer adder
I. INTRODUCTION
The fundamental operations involved in any Digital
systems are addition and multiplication. Addition is an
indispensible operation in any Digital, Analog, or Control
system. Fast and accurate operation of digital system depends
on the performance of adders[1]. The main function of adder is
to speed up the addition of partial products generated during
multiplication operation. Hence improving the speed by
reduction in area is the main area of research in VLSI system
design. Over the last decade many types of adder architectures
were studied, such as carry ripple adders, carry skip adder,
carry look ahead adder, parallel prefix tree adders[2],etc. In
tree adders, carries are generated in parallel and fast
computation is obtained at the expense of increased area and
power. The main advantage of the design is that the carry tree
reduces the number of logic levels (N) by essentially
generating the carries in parallel.
The parallel-prefix tree adders are more favorable in terms
of speed due to the complexity O(log2N) delay through the
carry path compared to that of other adders. The prominent
parallel prefix tree adders are Kogge-Stone, Brent-Kung, Han-
Carlson, and Ladner- Fischer.
II. PARALLEL PREFIX ADDERS
Parallel Prefix Adder (PPA) is very useful in today’s world of
technology because of its implementation in Very Large Scale
Integration (VLSI) chips. The VLSI chips rely heavily on fast
and reliable arithmetic computation. These contributions can
be provided by PPA. There are many types of PPA such as
Brent-Kung [4], Kogge-Stone [3], Ladner-Fisher [6] and Han-
Carlson [5]. Fig. 1 shows the structured diagram of an 8-bit
PPA. It employs the 3-stage structure of the CLA adder,
namely the pre-processing, carry graph and Post processing.
The improvement in the carry generation stage is the most
intensive one.
Fig 1 8-bit Parallel-Prefix Structure
A. Pre-computation stage
The pre-processing part will generate the propagate (P) and
generate (G) bits. The acquirement of the PPA carry bit
differentiates PPA from other type of adders. It is a parallel
form of obtaining the carry bit that makes it performs addition
arithmetic faster . In this stage ,the generate and propagate
signals computed are used to generate carry input of each
adder. A and B are inputs. These signals are given by the
equations 1&2.
Pi=Ai XOR Bi (1)
Gi=Ai .Bi (2)
B. Prefix tree
In this stage carries corresponding to each bit are
computed. Execution is done in parallel form .After the
computation of carries in parallel they are divided into smaller
pieces. Carry operator contain two AND gates and one OR
gate. It uses propagate and generate as intermediate signals
which are given by the equations 3&4.
P(i:k) =P(i:j) . P(j-1:k) (3)
G(i:k)=G(i:j)+(G(j-1:k).P(i:j)) (4)
C. Post-Computation Stage
This is the final stage to compute the summation of input
bits. It is same for all adders and sum bit equation is given by
Si= Pi XOR Ci (5)
Ci+1=(Pi .C0) + Gi (6)

D. Operators
Parallel Prefix adders compute carry-in at each level of
addition by combining generate and propagate signals in a
different manner. Two operators namely black and gray are
used in parallel prefix trees are shown in Fig 2.
Fig 2 .Logic Diagram of a) Black operator b) Grey operator
E. Parameter Calculation
An array of cells for an 8-bit adder is shown in Fig 3. The
outputs of the cells are labeled with a pair of integers
corresponding to the initial and the final bit that is spanned by
the output. Because each level produces a doubling of bits
spanned, for n power-of-two, the number of the levels is
L=log2n+1 (7)
where the additional level is due to the carry-in c0. In the figure
for eight bits there are four levels. Although c0 causes the
additional level it does not increase the overall delay because
the computation of c8 is in parallel to the calculation of the sum
bits. The expression for the delay is
TPA=ta,g+(log2n)tcell+txor (8)
Fig 3: 8-bit prefix adder
Since each level (except the last) has n/2 cells, the number of
cells is
N = n log2(n/2) + 1 (9)
(not including the gates to produce gi and ai nor the XOR
gates). Since the cells are simple, their delay and area are
small, resulting in an effective implementation. The main
disadvantage of this implementation is the large fan-out of
some cells as well as the long interconnection wires. For
example, in the 8-bit adder there is a cell with internal fan-out
of four, so that in general for an adder of n bits that maximum
fan-out is n/2+1 where n/2 is the fan-out of the carry tree and
the additional 1goes to XOR gate. The large fan-out and long
inter-connections produce an increase in the delay, which can
be reduced by including buffers. However, the delay of these
buffers might still be significant. In such a case, the large fan-
out can be eliminated by two approaches, or a combination of
both:
1. Increasing the number of levels
The fan-out can be reduced by increasing the number of
levels, as shown in Fig 4. This is achieved by reducing the
parallelism in the determination of the carries. The resulting
number of levels in the limit (carry tree fan-out=2) is 2log2=L
1+1) −(n where the last 1 corresponds again to the stage with
one cell, due to c0. The number of cells is the same as for the
basic scheme.
Fig 4: 8-bit prefix adder with maximum fan-out of three and
five levels
2. Increasing the number of cells
The maximum fan-out is reduced to two (without increasing
the number of levels) by the structure shown in Fig 5. This
structure is constructed as follows:
• Level 1 is formed of cells having as inputs
neighboring bits. So, groups are formed with bits c0
and 0, with bits 0 and 1, with bits 1 and 2, and so on.
Consequently, for n bits there are n cells.
• Level 2 combines outputs of cells of level 1 whose
indexes differ by 2. That is, c0 and 1, 0 and 2, and so
on. There are n-1 cells at this level.
• Level 3 combines outputs of cells of level 2 whose
indexes differ by 4. That is, c0 and 3, 0 and 4, and so
on. There are n-3 cells.
• In general, level k combines outputs of level (k-1)
whose indexes differ by 2k-1
. It has n-(2k-1
-1) cells.

As in the basic scheme there are log2(n)+1 levels. As can
be seen, the fan-out of all cells is two and the connections
are regular. The number of cells is
N=n+ (n-1) + (n-3) +…+ (n-(n/2-1)) +1
= (n) (log2n-1) + log2n+2 (10)
The number of cells of this scheme is about twice that of
the basic scheme. If the number of cells is too high, it is
possible to use an intermediate scheme, which has an
intermediate maximum fan-out as well as an intermediate
number of cells.
Fig 5: 8-bit prefix adder with minimum number of levels and
fan-out of two
III. IMPLEMENTATION OF VARIOUS PARALLEL PREFIX ADDERS
A. Kogge-stone Adder
Kogge-Stone prefix adder has best performance in VLSI
implementations and widely known as a parallel prefix adder
that performs fast logical addition. It is used for wide adders
because it shows the less delay among other architectures
which is because it uses the fewest logic levels as shown in
Fig6.
Fig 6:16 Bit Kogge-Stone Adder
B. Brent-Kung Adder
Brent-Kung prefix tree is a well-known structure with
relatively sparse network as in Fig7. The PPA could be
optionally constructed with minimum number of nodes and
maximum logic depth. This design trades speed for reduced
area.
Fig 7:16 Bit Brent-Kung Adder
C. Han-Carlson Adder
The Han-Carlson structure is a hybrid design combining
stages from the Brent-Kung and Kogge-Stone structures. The
middle stages resemble the Kogge-Stone structure and the first
and the final stages use the Brent-Kung structure. Comparing
to the KS structure, it reduces the wiring and gates. The major
difference is that in each logic level as Han-Carlson prefix tree
places cells every other bit and the last logic level accounts for
the missing carries as in Fig 8.
Fig 8: 16 Bit Han-Carlson Adder
D. Ladner-Fischer Adder
Ladner-Fischer prefix tree is a structure that sits
between Brent-Kung and Sklansky prefix tree[7]. It can
be observed that in Fig 9, the first two logic levels of
the structure are exactly the same as Brent-Kung’s.
Ladner- Fischer adder has minimum logic depth but it
has large fan-out.
Fig 9: 16-Bit Ladner-Fischer Adder
Carry
Stages
# of cells
Max
fan-out

KSA log2n n(log2n-1)+1 2
BKA 2 log2n-1 2(n-1)- log2n 2
HCA log2n +1 (n/2 ) log2n 2
LFA log2n (n/4) log2n + 3n/4 − 1 n/2
Table 1: Comparison of parameters for various parallel prefix
adders
IV. RESULTS
A. Delay
From the results it is observed that the 8-bit Kogge Stone
Adder is the fastest adder among all the adders investigated
since it has the least critical delay. This is because Kogge-
Stone Adder has the least number of levels in the prefix tree.
But Brent-Kung Adder has the worst performance among all
the adders compared in this study. This is one of the reasons
why Kogge-Stone Adder has high usage in high speed
applications.
Fig 10.Critical delay (ns) and simulation delay (ns) for various
8 bit adders
B. Interconnect usage:
It has been observed that Kogge-Stone Adder has high
interconnect usage. This can be accounted from the fact that it
uses more number of cells than the other adders which is seen
from Fig 6-9. Table 1 shows the calculations for number of
cells for n bit adders.
Fig 11.Interconnect usage for various 8 bit adders
C. Power
Fig 12 - 14 present comparison of all the parallel prefix adders
investigated in this paper in terms of power consumption.
From Fig 10 it can be seen that the Kogge-Stone adders are
the fastest among all the adders compared in this paper. But
from the simulation results it can be clearly inferred that in
terms of power; performance of Kogge-Stone Adders are not
among the best. The Brent-Kung and Ladner-Fischer Adder
exhibit the best performance in terms of power consumption
for the 8 bit adder category.
Fig 12: I/O Thermal power dissipation (mW) for various 8 bit
adders
Fig 13: Core dynamic power dissipation (mW) for various 8
bit adders
Fig 14: Total thermal power (mW) for various 8 bit adders
CONCLUSION
The primary objective of this paper is the
performance comparison of various parallel prefix adders. In
this paper several adders were analyzed to identify the optimal
adder modules that can be used for the realization of high
speed or low power adder structures. The addition algorithms
that were studied include four different types of prefix tree

adders. Each kind of adder has different properties in
propagation delay, interconnect usage and power consumption.
Based on our simulation studies the Kogge-Stone adder
performs with less delay and larger area since it uses high
number of cells. Thus it is widely used in high performance
applications and it has the merits of uniform structure and
balanced loading in each internal node to get high speed
performance. The Brent-Kung Adder exhibited the best
performance in terms of area(measured in terms of number of
cells) and power consumption for all the input data considered.
Han-Carlson, Ladner-Fischer have good performance with
less area and delay. The main disadvantage of parallel prefix
adder is the large fan-out of some cells as well as the long
interconnection wires. The large fan-out can be eliminated by
increasing the number of levels or cells; as a result, there are
different structures. The long inter-connections produce an
increase in delay which can be reduced by including buffers.
ACKNOWLEDGMENT
We are grateful to Prof.Shripriya Poduri for her valuable
inputs and advice. It was not piece of cake but her guidance
and suggestions proved to be precious not only for this survey
but will be great value for our career.
REFERENCES
[1] Anitha R and Bagyaveereswaran V (2012), “High Performance Parallel
Prefix Adders with Fast Carry Chain Logic”, International Journal of
VLSI Design & Communication Systems, Vol. 3, No. 2, pp. 01-10,
ISSN 0976-6499.
[2] Krishna Kumari V, Sri Chakrapani Y and Kamaraju M (2013), “Design
and Characterization of Koggestone, Sparse Koggestone, Spanning Tree
and Brentkung Adders”, International Journal of Scientific &
Engineering Research, Vol. 4, No. 10, pp. 1502-1506, ISSN 2229-5518.
[3] Kogge P and Stone H, “A Parallel Algorithm for the Efficient Solutions
of a General Class of Recurrence Relations”, IEEE Transactions on
Computers, Vol. C-22,No.8,1973.
[4] R. P. Brent and H. T. Kung, “A regular layout for parallel adders,” IEEE
Trans. Computers, vol. C-31, no. 3, pp. 260–264, Mar. 1982.
[5] T. Han and D. Carlson, “Fast area-efficient VLS Adders,” in Proc. 8th
Symp. Comp. Arith., Sept. 1987, pp. 49–56.
[6] R. Ladner and M. Fischer,”Parallel prefix computation”,Journal of
ACM. La. Jolla CA, Vol.27, pp.831838,October1980.
[7] J. Sklansky, “Conditional-sum addition logic,” IRE Trans. Electronic
Computers, vol. EC-9, pp. 226–231, June 1960.

Survey on Prefix adders

More Related Content

What's hot

Viewers also liked

Similar to Survey on Prefix adders

More from Lakshmi Yasaswi Kamireddy

Recently uploaded

Survey on Prefix adders