This document describes the design of a 16-bit, 3-input adder using two different strategies: a wait strategy and a speculative, design-for-all-cases (DAC) strategy. The wait strategy uses a divide-and-conquer tree with a 5-bit (five-input) full adder, FA5, as the basic building block. The DAC strategy speculatively computes the possible outputs of each group of bits in parallel and then selects the correct outputs with multiplexers once the carry bits are known. Both strategies were implemented and evaluated for area and delay; the DAC strategy reduced the worst-case propagation delay by about 26.6% compared with the wait strategy.
ECE 465: PROJECT 1
OBJECTIVE
We need to design a 16-bit 3-number adder that can add three 16-bit numbers A, B, D and produce an
18-bit sum output S. This has to be achieved using two different designs:
(a) Wait Strategy
(b) Speculative Strategy
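A quick arithmetic check of why the output S needs 18 bits. This is a minimal Python sketch added for illustration, assuming A, B and D are unsigned 16-bit numbers:

```python
# Largest value of an unsigned 16-bit operand (operands assumed unsigned).
MAX_OPERAND = 2**16 - 1            # 65535

# Worst case: A = B = D = MAX_OPERAND.
max_sum = 3 * MAX_OPERAND          # 196605

# Number of bits needed to hold that sum.
width = max_sum.bit_length()
print(max_sum, width)              # 196605 18
```

Because 2^17 <= 196605 < 2^18, 17 bits are not enough and 18 bits always suffice.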
DESIGN
A divide-and-conquer approach is used in both cases to simplify the problem and to obtain a design with
lower propagation delay than conventional designs.
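At the lowest level, the basic building block is FA5, a 5-bit full adder that adds five 1-bit inputs: the i-th
bits of A, B and D plus the carry inputs Ci+1 from stage i-1 and Ci+2 from stage i-2. FA5 consists of two
3-bit (three-input) full adders and one half adder, as shown in Fig 3. It produces three outputs, Sum, Ci+1
and Ci+2, which together encode the number of 1s (0 to 5) among its five inputs. The design tree is shown in Fig 1.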
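The FA5 behaviour can be checked with a small gate-level model. This is a hedged Python sketch: the text only says that FA5 uses two 3-bit full adders and one half adder, so the wiring below (the first FA adds the three operand bits, the second adds that sum to the two incoming carries, and the HA combines the two resulting carries) is an assumption, not a transcription of Fig 3. The helper names are hypothetical.

```python
from itertools import product

def full_adder(a, b, c):
    """3-input full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def half_adder(a, b):
    """Half adder: returns (sum, carry)."""
    return a ^ b, a & b

def fa5(a, b, d, cin1, cin2):
    """FA5 from 2 full adders and 1 half adder (assumed wiring).

    cin1 is Ci+1 from stage i-1, cin2 is Ci+2 from stage i-2.
    Returns (Sum, Ci+1, Ci+2) with weights 1, 2 and 4.
    """
    s1, c1 = full_adder(a, b, d)          # add the three operand bits
    s, c2 = full_adder(s1, cin1, cin2)    # add the two incoming carries
    ci1, ci2 = half_adder(c1, c2)         # merge the two weight-2 carries
    return s, ci1, ci2

# Exhaustive check: the outputs must encode the number of 1s among the inputs.
for bits in product((0, 1), repeat=5):
    s, ci1, ci2 = fa5(*bits)
    assert s + 2 * ci1 + 4 * ci2 == sum(bits)
print("FA5 verified for all 32 input combinations")
```

Since at most five inputs are 1, the two internal carries c1 and c2 sum to at most 2, which is why a half adder is enough to produce Ci+1 and Ci+2.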
Fig 1: D&C Tree (16-bit RCA divided into two 8-bit RCAs, then 4-bit and 2-bit RCAs, down to FA5 leaf nodes)
A. WAIT STRATEGY
Instead of trying to add all the bits at once, we repeatedly divide them into halves until we reach a 1-bit
addition, as shown in the design tree (Fig 1). No stitch-up function is needed, because nothing has to be
computed on the way back up the tree: adjacent halves are joined directly through their carry signals. The
schematic capture showing the critical path can be found at the end of the report.
WORKING
In this design, the carry outputs of the i-th stage feed the carry inputs of stages (i+1) and (i+2): Ci+1 goes
to stage i+1 and Ci+2 goes to stage i+2. Hence, stages (i+1) and (i+2) have to wait for the i-th stage to
complete its operation and generate its carry outputs.
As a result, the carry ripples from the first stage to the last one. After the 16th FA5, three carry outputs
remain, and they must be reduced to the two most significant sum bits. We add Ci+1 from the 16th stage and
Ci+2 from the 15th stage (both of weight 2^16) using a half adder. The sum of these two bits becomes the 17th
bit of the final sum, and the carry of the half adder is OR'ed with Ci+2 of the 16th stage (weight 2^17) to
give the 18th bit of the final sum. An OR gate is sufficient because these two bits can never both be 1: that
would require a sum of at least 2^18, while the largest possible sum is 3 X (2^16 - 1) < 2^18. In this way we
get an 18-bit sum. This design can also be viewed as a 16-bit RCA with FA5 as the building block, plus an
additional HA and an OR gate for the highest 2 bits (see Fig 4).
The figures below illustrate this design.
Fig 2: Design of Wait Strategy at leaf level
Fig 3: Design for FA5 (Leaf node)
Fig 4: Calculating Carry bits at the last stage
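The ripple structure and the final HA/OR stage can be sketched behaviourally in Python. The sketch assumes unsigned operands and models each FA5 as a counter of its five input bits (equivalent in function to the gate-level FA5); the function names are hypothetical. It compares the 18-bit result with ordinary integer addition on edge cases and random test vectors.

```python
import random

def fa5(a, b, d, cin1, cin2):
    """Behavioural FA5: count the five input bits and return (Sum, Ci+1, Ci+2)."""
    total = a + b + d + cin1 + cin2
    return total & 1, (total >> 1) & 1, (total >> 2) & 1

def wait_adder(A, B, D, n=16):
    """Wait strategy: n rippling FA5 stages, then 1 HA and 1 OR for the top two bits."""
    def bit(x, i):
        return (x >> i) & 1

    sum_bits = []
    ci1 = [0] * n   # ci1[i]: Ci+1 output of stage i, consumed by stage i+1
    ci2 = [0] * n   # ci2[i]: Ci+2 output of stage i, consumed by stage i+2
    for i in range(n):
        cin1 = ci1[i - 1] if i >= 1 else 0   # stage i waits for stage i-1 ...
        cin2 = ci2[i - 2] if i >= 2 else 0   # ... and for stage i-2
        s, ci1[i], ci2[i] = fa5(bit(A, i), bit(B, i), bit(D, i), cin1, cin2)
        sum_bits.append(s)

    # Half adder on the two weight-2^n carries left after the last stage.
    s_n = ci1[n - 1] ^ ci2[n - 2]
    ha_carry = ci1[n - 1] & ci2[n - 2]
    # OR gate: ha_carry and ci2[n-1] (both weight 2^(n+1)) are never both 1.
    s_n1 = ha_carry | ci2[n - 1]
    sum_bits += [s_n, s_n1]
    return sum(v << i for i, v in enumerate(sum_bits))

random.seed(465)
vectors = [(0, 0, 0), (0xFFFF, 0xFFFF, 0xFFFF)]
vectors += [tuple(random.randrange(1 << 16) for _ in range(3)) for _ in range(10000)]
for A, B, D in vectors:
    assert wait_adder(A, B, D) == A + B + D
print("wait_adder matches A + B + D on", len(vectors), "vectors")
```

The loop makes the waiting explicit: stage i cannot start until the carries of stages i-1 and i-2 exist, which is the source of the 16-stage critical path.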
AREA CALCULATION
Area of FA5 = 5 inputs
Total Area, A = 16 FA5 + 1 HA + 1 OR = 80 + 2 + 2 = 84 inputs
DELAY CALCULATION
(a) Theoretical delay
Delay for 1 FA5 = Delay of (2 3-bit FAs + 1 HA) = 2X5 + 4 = 14p units
Total Delay = 16 FA5 + 1 HA + 1 OR = 16X14 + 4 + 2 = 224 + 6 = 230p units
(b) Delay captured from Quartus (Output Pin Load = 10 pF)
Worst-case propagation delay, Ts = 36.705 ns.
T = 28.1 ns (the maximum delay observed across the test vectors; see the snapshot below)
AT^2 = A X T^2 = 84 X 28.1^2 = 66327.24 units
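The wait-strategy numbers above can be reproduced with a few lines of Python. The unit costs (FA = 5p, HA = 4p, OR = 2p; FA5 = 5 inputs, HA and OR = 2 inputs each) are read off the calculations above, and the variable names are illustrative.

```python
# Unit delays (p units) and areas (inputs), as used in the calculations above.
FA_DELAY, HA_DELAY, OR_DELAY = 5, 4, 2
FA5_AREA, HA_AREA, OR_AREA = 5, 2, 2
N = 16  # operand width in bits

fa5_delay = 2 * FA_DELAY + HA_DELAY                 # 2 FAs + 1 HA = 14p
wait_delay = N * fa5_delay + HA_DELAY + OR_DELAY    # 230p
wait_area = N * FA5_AREA + HA_AREA + OR_AREA        # 84 inputs

T_MEASURED_NS = 28.1                                # max delay over the test vectors
at2 = wait_area * T_MEASURED_NS ** 2                # A X T^2

print(fa5_delay, wait_delay, wait_area, f"{at2:.2f}")   # 14 230 84 66327.24
```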
B. SPECULATIVE STRATEGY
This strategy is also known as design for all cases (DAC). We divide the problem into smaller subproblems and
design the circuit so that it solves each subproblem for all possible cases. Every candidate output is then
ready by the time we know which copy is correct, and the only extra delay is that of selecting this copy. The
schematic capture showing the critical path can be found at the end of the report.
WORKING
In this design, we need to break the problem down to a level at which the carry inputs can be speculated, to
make the design faster without increasing the area by a large margin. We choose to do this at the 3rd level
from the top of the tree: the 16 bits are broken into 4 parts, and each of these is broken down further in the
same way as in part A. So we have four 4-bit RCAs, RCA1 (lowest 4 bits) to RCA4 (highest 4 bits).
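The lowest 4 bits (RCA1) are calculated with all 3 carry inputs set to zero. The next 4-bit RCA (RCA2),
however, depends on the carry outputs of RCA1. Instead of waiting for RCA1 to complete its operation, we
could build 8 copies of RCA2 (one for each value of the 3 carry bits) and select the correct one later. On
closer inspection, 8 copies are not needed. Order the carry outputs of RCA1 as Co22 (Ci+2 from the last FA5
of RCA1), Co21 (Ci+2 from the second-to-last FA5 of RCA1) and Co11 (Ci+1 from the last FA5 of RCA1). Co21
and Co11 both have weight 2^4 and Co22 has weight 2^5, so the carry passed on is Co11 + Co21 + 2Co22 in units
of 2^4. Three 4-bit numbers sum to at most 3 X 15 = 45 < 48, so this value is at most 2, and Co22 = 1 forces
Co21 = Co11 = 0. The only possible patterns are therefore 000 to 100 (000, 001, 010, 011 and 100). The same
bound holds for the upper groups, since 45 + 2 = 47 < 48. So we build 5 copies of RCA2, whose carry inputs
(Co22, Co21, Co11) are fixed to the values 000 to 100.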
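The claim that only the patterns 000 to 100 occur can be checked exhaustively. The Python sketch below uses hypothetical helper names and models the FA5 behaviourally as a bit counter. It builds a 4-bit FA5 group whose incoming carries follow the weights given above (Co11 and Co21 enter the group's lowest FA5, Co22 enters the second one), enumerates all 16^3 operand combinations, and collects the resulting (Co22, Co21, Co11) patterns.

```python
from itertools import product

def fa5(a, b, d, cin1, cin2):
    """Behavioural FA5: returns (Sum, Ci+1, Ci+2) for five input bits."""
    total = a + b + d + cin1 + cin2
    return total & 1, (total >> 1) & 1, (total >> 2) & 1

def rca4(a, b, d, co22, co21, co11):
    """4-bit FA5 group. Returns (4-bit sum, (Co22, Co21, Co11)) of this group.

    Incoming carries: Co11 and Co21 feed bit 0, Co22 feeds bit 1.
    """
    ci1, ci2, s = [0] * 4, [0] * 4, 0
    for i in range(4):
        cin1 = ci1[i - 1] if i >= 1 else co11
        cin2 = ci2[i - 2] if i >= 2 else (co22 if i == 1 else co21)
        bit_s, ci1[i], ci2[i] = fa5((a >> i) & 1, (b >> i) & 1, (d >> i) & 1, cin1, cin2)
        s |= bit_s << i
    return s, (ci2[3], ci2[2], ci1[3])

ALL_OPERANDS = list(product(range(16), repeat=3))

# RCA1: carry inputs 000.
rca1_patterns = {rca4(a, b, d, 0, 0, 0)[1] for a, b, d in ALL_OPERANDS}
print(sorted(rca1_patterns))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]

# Upper groups: feeding any of these patterns back in never creates a new one.
upper_patterns = {rca4(a, b, d, *cin)[1]
                  for cin in rca1_patterns for a, b, d in ALL_OPERANDS}
print(upper_patterns == rca1_patterns)   # True
```

The five patterns found are exactly the five RCA copies that each upper group needs.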
Since a 5:1 MUX is not available as a standard block, and an 8:1 MUX takes more delay (the delay depends on
the number of inputs), we use a 4:1 MUX that selects among the copies for Co22 = 0 based on Co21 and Co11
(00 to 11). A 2:1 MUX with Co22 as its select input then passes the output of the 4:1 MUX when Co22 = 0,
and otherwise passes the output of the RCA2 copy whose carry inputs are 100. We use this reduced 5:1 MUX to
pick the correct carry outputs and sum bits from the copies of RCA2, based on the carry outputs of RCA1.
Each stage therefore needs four of these MUXes: one for each of the three carry outputs and one for the
4-bit sum. The 1-bit MUX is shown in Fig 6; the 4-bit MUX needed for the sum is designed in the same way
(Fig 7 and Fig 8).
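A behavioural Python sketch of the reduced 5:1 MUX, with hypothetical helper names. It assumes Co21 drives the second level and Co11 the first level of the 4:1 stage; the text does not say which select bit sits on which level, and either order works as long as the copies are arranged to match. Because the function only forwards the value it is given, the same code serves as the 1-bit carry MUX and, applied to 4-bit words, as the 4-bit sum MUX.

```python
def mux2(sel, in0, in1):
    """2:1 MUX: returns in1 when sel is 1, else in0."""
    return in1 if sel else in0

def mux4(s1, s0, in0, in1, in2, in3):
    """4:1 MUX made of three 2:1 MUXes arranged in two levels."""
    return mux2(s1, mux2(s0, in0, in1), mux2(s0, in2, in3))

def reduced_mux5(co22, co21, co11, copies):
    """Reduced 5:1 MUX.

    copies[k] is the output of the RCA copy built for carry pattern k,
    in the order 000, 001, 010, 011, 100 (bits Co22 Co21 Co11).
    """
    low = mux4(co21, co11, *copies[:4])   # the four Co22 = 0 cases
    return mux2(co22, low, copies[4])     # Co22 = 1 only occurs as 100

VALID_PATTERNS = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]
sum_copies = [0b0011, 0b0100, 0b1010, 0b1111, 0b0001]   # arbitrary 4-bit sums
for k, (co22, co21, co11) in enumerate(VALID_PATTERNS):
    assert reduced_mux5(co22, co21, co11, sum_copies) == sum_copies[k]
print("each carry pattern selects its own copy")
```

The longest path through this structure crosses three 2:1 MUX levels: two inside the 4:1 MUX and the final one controlled by Co22.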
We repeat this step for the next 8 bits (RCA3 and RCA4): the copies of each group are selected by the carry
outputs chosen by the previous group's carry MUXes. The final carry outputs are handled in the same way as in
part A (Fig 4). The complete design is shown in Fig 5.
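Putting the pieces together, the whole speculative adder can be modelled in Python and compared with ordinary addition. This is a behavioural sketch with hypothetical names: the FA5 is a bit counter, each upper group is computed for all five carry patterns, and reduced 5:1 MUXes driven by the previous group's selected carries pick the sum and carries of the correct copy. The final HA and OR follow Fig 4.

```python
import random

PATTERNS = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]  # (Co22, Co21, Co11)

def fa5(a, b, d, cin1, cin2):
    total = a + b + d + cin1 + cin2
    return total & 1, (total >> 1) & 1, (total >> 2) & 1   # Sum, Ci+1, Ci+2

def rca4(a, b, d, co22, co21, co11):
    """4-bit FA5 group: Co11/Co21 feed bit 0, Co22 feeds bit 1."""
    ci1, ci2, s = [0] * 4, [0] * 4, 0
    for i in range(4):
        cin1 = ci1[i - 1] if i >= 1 else co11
        cin2 = ci2[i - 2] if i >= 2 else (co22 if i == 1 else co21)
        bit_s, ci1[i], ci2[i] = fa5((a >> i) & 1, (b >> i) & 1, (d >> i) & 1, cin1, cin2)
        s |= bit_s << i
    return s, (ci2[3], ci2[2], ci1[3])

def mux2(sel, in0, in1):
    return in1 if sel else in0

def reduced_mux5(co22, co21, co11, copies):
    """4:1 MUX on (Co21, Co11) for the Co22 = 0 copies, then a 2:1 MUX on Co22."""
    low = mux2(co21, mux2(co11, copies[0], copies[1]), mux2(co11, copies[2], copies[3]))
    return mux2(co22, low, copies[4])

def dac_adder(A, B, D):
    def nibble(x, g):
        return (x >> (4 * g)) & 0xF

    # RCA1 is an ordinary 4-bit RCA with its three carry inputs set to 0.
    result, carries = rca4(nibble(A, 0), nibble(B, 0), nibble(D, 0), 0, 0, 0)
    for g in range(1, 4):   # RCA2, RCA3, RCA4
        # Five speculative copies, one per carry pattern (parallel in hardware).
        copies = [rca4(nibble(A, g), nibble(B, g), nibble(D, g), *p) for p in PATTERNS]
        sel = carries       # selected carries of the previous group
        group_sum = reduced_mux5(*sel, [c[0] for c in copies])          # 4-bit sum MUX
        carries = tuple(reduced_mux5(*sel, [c[1][j] for c in copies])   # 3 carry MUXes
                        for j in range(3))
        result |= group_sum << (4 * g)

    co22, co21, co11 = carries
    s16 = co11 ^ co21                 # HA sum: 17th bit
    s17 = (co11 & co21) | co22        # HA carry OR'ed with Co22: 18th bit
    return result | (s16 << 16) | (s17 << 17)

random.seed(465)
vectors = [(0, 0, 0), (0xFFFF, 0xFFFF, 0xFFFF)]
vectors += [tuple(random.randrange(1 << 16) for _ in range(3)) for _ in range(10000)]
assert all(dac_adder(A, B, D) == A + B + D for A, B, D in vectors)
print("dac_adder matches A + B + D on", len(vectors), "vectors")
```

In hardware the copies of all groups are evaluated at the same time, so after the first 4-bit RCA delay only the three MUX stages remain on the critical path.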
Fig 5: DAC Design (16-bit speculative CSA: two 8-bit halves made of four 4-bit RCAs of FA5 leaf nodes, RCA1 to RCA4, whose 4-bit sums and carry outputs are selected by reduced 5:1 MUXes)
Figure notes: the structure that gives one 4-bit sum and three carry outputs is repeated for each of RCA2, RCA3 and RCA4, and its select inputs are the outputs of the previous RCA's carry MUXes. The first RCA works like a normal RCA with its three carry inputs set to '0'. The three selected carry outputs go to Fig. X.
Fig 8: 4-bit 5:1 MUX explained (four 2:1 4-bit MUXes with select inputs Co11, Co21 and Co22)
Figure notes: Co11 is Ci+1 from the last FA5 of the preceding 4-bit RCA. Considering all possible cases of adding three 4-bit numbers, Co22 can be '1' only if the other two carry outputs (Co21 and Co11) are '0'. The carry MUXes have the same structure, except that 1-bit MUXes are used instead of a 4-bit MUX. We implemented each 4-bit MUX as four 1-bit MUXes that work simultaneously with the same selectors.
AREA CALCULATION
Area of 4-bit RCA = 12 input bits + 3 Carry bits = 15 inputs
Area of 1-bit 5:1 MUX = 5 1-bit inputs + 3 select lines = 8 inputs
Area of 4-bit 5:1 MUX = 5 4-bit inputs + 3 select lines = 5X4+3 = 23 inputs
Total Area, A = 16 4-bit RCAs (1 for RCA1 + 5 copies each of RCA2, RCA3 and RCA4) + 9 1-bit 5:1 MUXes
(3 carries X 3 stages) + 3 4-bit 5:1 MUXes + 1 HA (2 inputs) + 1 OR (2 inputs)
= 16X15 + 9X8 + 3X23 + 2 + 2 = 385 inputs
DELAY CALCULATION
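(a) Theoretical delay
Delay for 1 4-bit RCA = 4 FA5 = 4X14 = 56p units (all copies are computed in parallel)
Delay of 1 reduced 5:1 MUX (three levels of 2:1 MUXes, the same depth as an 8:1 MUX) = 5X3 = 15p units
Total Delay = 56 + 3X15 + 4 + 2 = 107p units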
(b) Delay captured from Quartus (Output Pin Load = 10 pF)
Worst-case propagation delay, Ts = 26.947 ns.
T = 22.3 ns (the maximum delay observed across the test vectors; see the snapshot below)
AT^2 = A X T^2 = 385 X 22.3^2 = 191456.65 units
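As for the wait strategy, the DAC figures can be reproduced in Python. The 2:1 MUX delay of 5p is an assumption inferred from the 5X3 term (three 2:1 MUX levels per reduced 5:1 MUX); the other unit costs and component counts come from the calculations above, and the names are illustrative.

```python
FA5_DELAY, MUX2_DELAY, HA_DELAY, OR_DELAY = 14, 5, 4, 2   # p units
GROUP_BITS, GROUPS = 4, 4                                  # four 4-bit RCAs

rca4_delay = GROUP_BITS * FA5_DELAY       # 56p, all copies in parallel
mux5_delay = 3 * MUX2_DELAY               # 15p, assuming 3 levels of 2:1 MUX
dac_delay = rca4_delay + (GROUPS - 1) * mux5_delay + HA_DELAY + OR_DELAY   # 107p

RCA4_AREA = 12 + 3          # 12 operand bits + 3 carry inputs
MUX5_1BIT_AREA = 5 + 3      # 5 data inputs + 3 select lines
MUX5_4BIT_AREA = 5 * 4 + 3  # 5 four-bit data inputs + 3 select lines
n_rca = 1 + 5 * (GROUPS - 1)      # RCA1 + 5 copies of RCA2..RCA4 = 16
n_mux_1bit = 3 * (GROUPS - 1)     # 3 carry MUXes per upper group = 9
n_mux_4bit = GROUPS - 1           # 1 sum MUX per upper group = 3
dac_area = (n_rca * RCA4_AREA + n_mux_1bit * MUX5_1BIT_AREA
            + n_mux_4bit * MUX5_4BIT_AREA + 2 + 2)   # + HA and OR inputs

T_MEASURED_NS = 22.3
at2 = dac_area * T_MEASURED_NS ** 2       # A X T^2

print(dac_delay, dac_area, f"{at2:.2f}")  # 107 385 191456.65
```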
CONCLUSIONS
As our calculations and simulation outputs show, the expected delay reduction was around 50% (230p to 107p,
about 53%), but we obtained only 26.6% for the static delay (Ts).
Also, Ts/T != 1 in both cases (about 1.31 for the wait strategy and 1.21 for the DAC strategy); this ratio is
known as the discrepancy factor. Ts comes from static timing analysis, which does not depend on any particular
input values, and hence it gives the worst-case propagation delay. T, in contrast, depends on the set of
inputs applied to the adder. The worst case of T equals Ts, but only for specific inputs that exercise the
critical path. The test vectors we used never exercise that path, which introduces the discrepancy.
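The percentages and discrepancy factors quoted above follow directly from the reported numbers. A short Python check (names illustrative):

```python
theory_p = {"wait": 230, "dac": 107}      # theoretical delay, p units
ts_ns = {"wait": 36.705, "dac": 26.947}   # static worst-case delay Ts
t_ns = {"wait": 28.1, "dac": 22.3}        # worst delay over the test vectors T

def reduction_pct(old, new):
    """Percentage reduction going from old to new."""
    return 100 * (old - new) / old

theory_red = reduction_pct(theory_p["wait"], theory_p["dac"])
ts_red = reduction_pct(ts_ns["wait"], ts_ns["dac"])
t_red = reduction_pct(t_ns["wait"], t_ns["dac"])
discrepancy = {k: ts_ns[k] / t_ns[k] for k in ts_ns}   # Ts / T per design

print(f"{theory_red:.1f} {ts_red:.1f} {t_red:.1f}")     # 53.5 26.6 20.6
print({k: round(v, 2) for k, v in discrepancy.items()}) # {'wait': 1.31, 'dac': 1.21}
```

Measured with T instead of Ts, the improvement is smaller still (about 20.6%), consistent with the test vectors not exercising either design's critical path.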
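Delay for an n-bit adder
(a) Wait strategy = n(Delay of FA5) + Delay of 1 Half Adder + Delay of 1 OR gate
(b) DAC strategy = x(Delay of FA5) + (n/x-1)(Delay of 5:1 MUX) + Delay of 1 Half Adder + Delay of 1 OR gate;
where x is the group size and the 5:1 MUX is the reduced MUX defined above. For n = 16 and x = 4 this gives
4X14 + 3X15 + 4 + 2 = 107p units, matching the theoretical delay above.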
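Number of basic components for an n-bit adder
(a) Wait strategy = n FA5 + 1 HA + 1 OR = (n+2) basic components
(b) DAC strategy = x[5(n/x-1)+1] FA5 + 3(n/x-1)*4*(2:1 MUX) + (n/x-1)*4x*(2:1 MUX) + 1 HA + 1 OR
= (9x+12)(n/x-1) + x + 2 basic components; the second and third terms are for the 1-bit 5:1 MUXes (three per
upper group, four 2:1 MUXes each) and the x-bit 5:1 sum MUX (4x 2:1 MUXes per upper group). For x = 4 the
third term is (n/x-1)*16*(2:1 MUX) and the total is (5x+28)(n/x-1) + x + 2.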
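The general n-bit formulas can be written as small Python functions and checked against the 16-bit results. This sketch assumes the unit delays used earlier (FA5 = 14p, reduced 5:1 MUX = 15p, HA = 4p, OR = 2p), that x divides n, and that the DAC count includes the final HA and OR as the wait-strategy count does; the function names are hypothetical.

```python
FA5_DELAY, MUX5_DELAY, HA_DELAY, OR_DELAY = 14, 15, 4, 2   # p units

def wait_delay(n):
    return n * FA5_DELAY + HA_DELAY + OR_DELAY

def dac_delay(n, x):
    """x-bit groups: one group of FA5 delay, then (n/x - 1) reduced 5:1 MUXes."""
    if n % x:
        raise ValueError("group size x must divide n")
    return x * FA5_DELAY + (n // x - 1) * MUX5_DELAY + HA_DELAY + OR_DELAY

def wait_components(n):
    return n + 2                                   # n FA5 + 1 HA + 1 OR

def dac_components(n, x):
    upper = n // x - 1                             # groups that need 5 copies
    fa5 = x * (5 * upper + 1)
    carry_mux2 = 3 * upper * 4                     # 3 one-bit 5:1 MUXes, 4 2:1 MUXes each
    sum_mux2 = upper * 4 * x                       # x-bit 5:1 MUX = 4x 2:1 MUXes
    return fa5 + carry_mux2 + sum_mux2 + 2         # + 1 HA + 1 OR

print(wait_delay(16), dac_delay(16, 4))            # 230 107
print(wait_components(16), dac_components(16, 4))  # 18 150
for x in (2, 4, 8):
    print(x, dac_delay(16, x), dac_components(16, x))
```

Under this unit-delay model, x = 4 gives the smallest DAC delay among the group sizes tried (139p, 107p and 133p for x = 2, 4 and 8), which is consistent with speculating at the 3rd level of the tree.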