Final Project Report
Design and verification of 8X8 Vedic Multiplier using
90nm CMOS Process Technology
Huan Wang
Department of Electrical and Computer Engineering
University of Massachusetts
Lowell, MA01854, USA
Huan_Wang@student.uml.edu
Riddhi Shah
Department of Electrical and Computer Engineering
University of Massachusetts
Lowell, MA01854, USA
shahriddhi65@yahoo.in
Abstract—A previous paper described a modified carry select
adder (CSA) in Verilog. We port this design to the transistor
level. This CSA is considered to be the fastest adder among the
common adder configurations. The multiplier is an important
element in almost all processors and contributes substantially
to the total power consumption of the system. The novelty of
this work is the efficient use of a Vedic algorithm (sutra) that
considerably reduces the number of computational steps
compared with the traditional method. The schematic for this
multiplier is designed in Cadence and verified in Virtuoso using
a 90nm CMOS technology library. Finally, we describe an ideal
multiplier in Verilog and use it to verify our transistor-level
design. This paper presents a systematic design methodology for
this improved-performance digital multiplier based on Vedic
mathematics.
Keywords— Multiplier, Vedic Multiplier, Ripple Carry Adder
I. INTRODUCTION
The multiplier is one of the most important structures in any
processor today. A binary multiplier is an electronic circuit
used in digital circuits to multiply two binary numbers. A
variety of computer arithmetic techniques can be used to
implement a digital multiplier. Most techniques involve
computing a set of partial products and then summing the
partial products together [1]. This process resembles long
multiplication on decimal integers, but has been modified here
for application to a binary number system. As more transistors
per chip became available due to larger-scale integration, it
became possible to put enough adders on a single chip to sum
all the partial products at once, rather than reuse a single
adder to handle each partial product one at a time. Because
common digital signal processing algorithms spend most of
their time multiplying, processors devote a lot of chip area to
making multiplication as fast as possible. Hence the
non-conventional yet very efficient Vedic mathematics is used
here to build a high-performance multiplier. Vedic mathematics
deals mainly with various Vedic mathematical formulae and
their applications for carrying out large arithmetical
operations easily [2]. Power consumption and speed are the
metrics compared against existing digital multiplier designs.
II. VEDIC MULTIPLICATION ALGORITHM
A. The Vedic Sutras
Depending on the branch of mathematics, Vedic algorithms are
divided into 16 sutras (algorithms) [3], of which two are for
multiplication:
1. Nikhilam Navatashcaramam Dashatah – All from 9 and
the last from 10.
2. Urdhva-Tiryagbhyam – Vertically and crosswise.
This paper is based on the Urdhva-Tiryagbhyam (UT) sutra of
Vedic multiplication, which is the most generalized method for
multiplication. This sutra is applied to binary multiplication
to build the digital multiplier. It is also called the
“Vertically and Crosswise” method of multiplication. An
illustration of this multiplication algorithm is shown in
Fig-1 below. In digital hardware, a Vedic multiplier is
expected to be more power efficient and faster, as fewer steps
are required for multiplication. There are also nearly no
limitations attached to this multiplication algorithm.
B. Example for General Multiplicand using Vedic
Mathematics
Fig-1 shows the generalized line diagram for the UT
algorithm. This algorithm can be used in all cases, such as
decimal or binary multiplicands [4]. All the multiplications
are done in vertical and crosswise directions, requiring only
7 steps to multiply two 4-bit numbers.
Fig-1 Line diagram for UT algorithm
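The column-by-column procedure in the line diagram can be sketched in a few lines of Python. This is a minimal behavioral sketch, not the circuit used in this paper: the helper names `to_digits`, `from_digits` and `ut_multiply` are hypothetical, digits are stored least significant first, and the base is a parameter so the same routine covers the decimal and binary cases mentioned above. For two N-digit operands it performs 2N-1 column steps, which gives the 7 steps quoted for 4-bit numbers.

```python
def to_digits(value, n, base):
    """Split a non-negative integer into n digits, least significant first."""
    digits = []
    for _ in range(n):
        digits.append(value % base)
        value //= base
    return digits


def from_digits(digits, base):
    """Recombine digits (least significant first) into an integer."""
    return sum(d * base ** i for i, d in enumerate(digits))


def ut_multiply(a_digits, b_digits, base=10):
    """Vertically-and-crosswise product of two equal-length digit lists.

    Returns (product_digits, steps), where steps counts the column steps.
    """
    n = len(a_digits)
    if len(b_digits) != n:
        raise ValueError("operands must have the same number of digits")
    result, carry, steps = [], 0, 0
    for k in range(2 * n - 1):
        # Step k sums every crosswise product a[i] * b[j] with i + j == k,
        # plus the carry forwarded from the previous step.
        column = carry
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            column += a_digits[i] * b_digits[k - i]
        result.append(column % base)
        carry = column // base
        steps += 1
    while carry:  # leftover carry forms the most significant digits
        result.append(carry % base)
        carry //= base
    return result, steps


if __name__ == "__main__":
    digits, steps = ut_multiply(to_digits(1234, 4, 10), to_digits(5678, 4, 10))
    print(from_digits(digits, 10), steps)
```

In hardware all crosswise products of a column are formed in parallel; the inner loop here only models what is summed, not how fast.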
III. VLSI TECHNOLOGY USING CMOS LOGIC
Large integrated circuits can be constructed using CMOS
logic with very low static power consumption. The increasing
demand for low-power very large scale integration (VLSI) can
be addressed at different design levels, such as the architectural,
circuit, layout, and the process technology level. At the circuit
design level, considerable potential for power savings exists by
means of proper choice of a logic style for implementing
combinational circuits. This is because all the important
parameters governing power dissipation—switching
capacitance, transition activity, and short-circuit currents—are
strongly influenced by the chosen logic style. Depending on
the application, the kind of circuit to be implemented, and the
design technique used, different performance aspects become
important. In the past, parameters such as high speed, small
area and low cost were the major areas of concern, whereas
power considerations are now gaining the attention of the
scientific community associated with VLSI design. In recent
years, the growth of personal computing devices (portable
computers and real-time audio- and video-based multimedia
applications) and wireless communication systems has made
power dissipation a most critical design parameter [5]. In the
absence of low-power design techniques, such applications
generally suffer from very short battery life, while packaging
and cooling them is very difficult, which leads to an
unavoidable increase in the cost of the product.
In multiplication, reliability is strongly affected by power
consumption. Usually, high power dissipation implies
high-temperature operation, which in turn tends to induce
several failure mechanisms in the system [6]. Power dissipation
is the most critical parameter for portability and mobility,
and it is classified into dynamic and static power dissipation.
Dynamic power dissipation occurs when the circuit is
operational, while static power dissipation becomes an issue
when the circuit is inactive or in a power-down mode. There
are three major sources of power dissipation in digital CMOS
circuits, which are summarized in equation (1):
$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage}$ (1)
The first term represents the switching component of power.
The second term is due to the direct-path short-circuit
current, $I_{sc}$, which arises when both the NMOS and PMOS
transistors are simultaneously active, conducting current
directly from supply to ground. Finally, leakage current,
which can arise from substrate injection and sub-threshold
effects, is primarily determined by fabrication technology
considerations. The switching power dissipation in CMOS
digital integrated circuits is a strong function of the power
supply voltage ($V_{dd}$). Therefore, reduction of $V_{dd}$
emerges as a very effective means of limiting the power
consumption. However, the saving in power dissipation comes at
a significant cost in terms of increased circuit delay. Since
the exact analysis of propagation delay is quite complex, a
simple first-order derivation can be used to show the relation
between power supply and delay time [7].
$T_d = \dfrac{C_L \, V_{dd}}{K \, (V_{dd} - V_{th})^{\alpha}}$ (2)
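The trade-off between supply voltage, power and delay can be made concrete with a short numerical sketch. It assumes the standard first-order switching power expression (activity × C_L × Vdd² × f) and reads equation (2) as C_L·Vdd / (K·(Vdd − Vth)^α). The function names and every parameter value below (load capacitance, frequency, threshold voltage, K, α) are illustrative assumptions, not figures measured on this design.

```python
def switching_power(c_load, vdd, freq, activity=1.0):
    """First-order switching power: activity * C_L * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq


def gate_delay(c_load, vdd, vth, k, alpha):
    """Equation (2), read as C_L * Vdd / (K * (Vdd - Vth)^alpha)."""
    if vdd <= vth:
        raise ValueError("supply must exceed the threshold voltage")
    return c_load * vdd / (k * (vdd - vth) ** alpha)


# Illustrative values only (assumptions, not measurements from this design).
C_LOAD = 10e-15   # 10 fF load
FREQ = 100e6      # 100 MHz switching frequency
VTH = 0.3         # threshold voltage in volts
K = 1e-4          # technology-dependent constant
ALPHA = 1.3       # velocity-saturation exponent

if __name__ == "__main__":
    for vdd in (1.2, 1.0, 0.8):
        p = switching_power(C_LOAD, vdd, FREQ)
        t = gate_delay(C_LOAD, vdd, VTH, K, ALPHA)
        print(f"Vdd={vdd:.1f} V  P_switching={p:.3e} W  Td={t:.3e} s")
```

Under these assumptions, lowering the supply reduces switching power quadratically while the delay grows, which is the cost described above.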
IV. MODIFIED MULTIPLIER ARCHITECTURE
The architectures for the 2×2, 4×4 and 8×8 bit modules are
discussed in this section. The technique used is the UT
(Vertically and Crosswise) sutra.
A. 2X2 Vedic Multiplier Design
To show how it works, consider two 2-bit numbers, A=a1a0 and
B=b1b0. First, the least significant bit (LSB) of the final
product is obtained vertically, by taking the product of the
LSBs of A and B, which is a0b0. The second step takes the
products in a crosswise manner: the LSB of the first number A
(the multiplicand) is multiplied with the next higher bit of
the multiplier B, and the LSB of B is multiplied with the next
higher bit of A. Their sum generates a 1-bit carry and one bit
used in the result, as shown below. The next step takes the
product of the two most significant bits (MSBs) and adds the
previously obtained carry. The resulting sum bit is the third
bit of the final result, and the final carry is the fourth bit
[8].
s0 = a0b0 (3)
c1s1 = a1b0+ a0b1 (4)
c2s2 = c1 + a1b1 (5)
The result of the 2X2 multiplier is c2s2s1s0. The 2X2
multiplier is composed of two half adders. The figures below
show the schematic designs of the half adder and the 2X2
multiplier in Cadence.
Fig-3 Half Adder Block Design
Fig-4 2X2 Vedic Multiplier Block Design
Fig-6 Simulation Result for 2X2 Vedic Multiplier
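The 2X2 equations for s0, c1s1 and c2s2 map directly onto two half adders fed by AND gates. The following bit-level sketch is a behavioral model for checking the logic, not the Cadence schematic; the function names `half_adder` and `vedic_2x2` are hypothetical.

```python
def half_adder(x, y):
    """Return (sum, carry) of two single bits."""
    return x ^ y, x & y


def vedic_2x2(a, b):
    """Multiply two 2-bit numbers with the vertically-and-crosswise steps."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 & b0                            # vertical product of the LSBs
    s1, c1 = half_adder(a1 & b0, a0 & b1)   # crosswise products
    s2, c2 = half_adder(c1, a1 & b1)        # vertical product of MSBs + carry
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0   # c2 s2 s1 s0


if __name__ == "__main__":
    for a in range(4):
        for b in range(4):
            print(f"{a} x {b} = {vedic_2x2(a, b)}")
```

Because there are only 16 input combinations, the model can be checked exhaustively against ordinary integer multiplication.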
B. 4X4 Vedic Multiplier Design
In this part we introduce how the 4X4 multiplier works.
Assume we have two numbers, A=a3a2a1a0 and B=b3b2b1b0. The
procedure can be seen in the block design figure below. The
final product is c6s6s5s4s3s2s1s0. The partial products are
calculated in parallel, and hence the delay is greatly reduced
as the number of bits increases. The least significant bit
(LSB) S0 is obtained simply by multiplying the LSBs of the
multiplier and the multiplicand [8]. The following equations
show how the multiplier carries out the algorithm.
S0 = A0B0 (6)
C1S1 = A1B0 + A0B1 (7)
C2S2 = C1 + A0B2 + A2B0 + A1B1 (8)
C3S3 = C2 + A0B3 + A3B0 + A1B2 + A2B1 (9)
C4S4 = C3 + A1B3 + A3B1 + A2B2 (10)
C5S5 = C4 + A3B2 + A2B3 (11)
C6S6 = C5 + A3B3 (12)
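Equations (6) to (12) can be traced column by column with a small behavioral sketch. The function name `vedic_4x4_trace` is hypothetical, and the model treats each C_k as an integer: in columns that add more than two terms the carry can exceed 1, so a hardware implementation needs more than one carry bit (or an adder arrangement such as the ripple carry adders used here) at those positions.

```python
def vedic_4x4_trace(a, b):
    """Return (product, trace) for two 4-bit operands.

    trace[k] is (S_k, C_k) for column k, following equations (6)-(12);
    C_0 is always 0 because column 0 holds a single product term.
    """
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    trace, carry, product = [], 0, 0
    for k in range(7):
        # Column k: previous carry plus every A_i * B_j with i + j == k.
        total = carry + sum(A[i] & B[k - i] for i in range(4) if 0 <= k - i <= 3)
        s_k, carry = total & 1, total >> 1
        trace.append((s_k, carry))
        product |= s_k << k
    product |= carry << 7   # C6 is the top bit of c6s6s5s4s3s2s1s0
    return product, trace


if __name__ == "__main__":
    product, trace = vedic_4x4_trace(0b1101, 0b1011)   # 13 x 11
    for k, (s, c) in enumerate(trace):
        print(f"column {k}: S{k}={s} C{k}={c}")
    print("product =", product)
```

For A=1101 and B=1011 the columns give S0..S6 = 1,1,1,1,0,0,0 with C6 = 1, so the product bits c6s6s5s4s3s2s1s0 are 10001111, which is 143.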
Fig-7 Full Adder Block Design
Fig-8 4-bit Ripple Carry Adder
Fig-9 4X4 Multiplier Block Design
Fig-10 Full Adder Simulation Result
Fig-11 4X4 Vedic Multiplier Simulation Result
In the ripple carry adder arrangement, the carry generated by
the first ripple carry adder is passed on to the next ripple
carry adder, and the second ripple carry adder has two zero
inputs. The arrangement of the ripple carry adders in Fig-9
reduces the computational time and therefore the delay.
C. 8X8 Vedic Multiplier Design
In this part we discuss the 8X8 Vedic multiplier design.
Assume we have two numbers, A=a7a6a5a4a3a2a1a0 and
B=b7b6b5b4b3b2b1b0. The procedure is explained by the
following design figures. The final product is
S15S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0. The partial products
are calculated in parallel, and hence the delay is greatly
reduced as the number of bits increases. The least significant
bit (LSB) S0 is obtained simply by multiplying the LSBs of the
multiplier and the multiplicand. The multiplication follows
the steps shown in the line diagram in Fig-1. At each step a
result bit (Sn) and a carry (Cn) are obtained, and the carry
from the previous stage is forwarded to the next stage until
all steps are complete [8].
Fig-12 8X8 Vedic Multiplier Block Design
Fig-13 8X8 Vedic Multiplier Simulation Result.
As the block design above shows, the 8x8 multiplier contains
four 4x4 Vedic multiplier modules and three 8-bit modified
carry select adders. Each 8-bit modified carry select adder
adds two 8-bit operands at an intermediate stage of the
multiplier. The carry generated by the first modified carry
select adder is passed on to the next modified carry select
adder, and the second modified carry select adder has four
zero inputs. This arrangement of the modified carry select
adders, shown in the block diagram, reduces the computational
time and therefore the delay [8].
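The composition of the 8x8 multiplier from four 4x4 modules and three 8-bit adders can be checked with a behavioral sketch. The wiring below is an assumption based on the description above (cross products added first, then the zero-extended upper nibble of the low product, then the high product); it is not taken from the schematic. `mul4` stands in for the 4x4 Vedic block and `add8` for the modified carry select adder, both modeled only by their arithmetic behavior, and all names are hypothetical.

```python
def mul4(x, y):
    """Stand-in for a 4x4 Vedic multiplier block (behavior only)."""
    assert 0 <= x < 16 and 0 <= y < 16
    return x * y


def add8(x, y):
    """Stand-in for an 8-bit adder: returns (8-bit sum, carry out)."""
    total = x + y
    return total & 0xFF, total >> 8


def vedic_8x8(a, b):
    """8x8 product built from four 4x4 products and three 8-bit additions."""
    al, ah = a & 0xF, a >> 4
    bl, bh = b & 0xF, b >> 4
    q0 = mul4(al, bl)   # weight 2^0
    q1 = mul4(ah, bl)   # weight 2^4
    q2 = mul4(al, bh)   # weight 2^4
    q3 = mul4(ah, bh)   # weight 2^8
    s1, c1 = add8(q1, q2)        # adder 1: the two cross products
    s2, c2 = add8(s1, q0 >> 4)   # adder 2: upper nibble of q0, four zero inputs
    # c1 and c2 are never both 1, because q1 + q2 + (q0 >> 4) < 512.
    s3, _ = add8(q3, ((c1 | c2) << 4) | (s2 >> 4))   # adder 3: high product
    return (s3 << 8) | ((s2 & 0xF) << 4) | (q0 & 0xF)


if __name__ == "__main__":
    print(vedic_8x8(255, 255))
```

The lowest four product bits come straight from q0, the next four from the second adder, and the upper eight from the third adder, so no further addition is needed.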
V. VERIFICATION
We designed the 2X2, 4X4 and 8X8 multipliers in Verilog HDL
and simulated them in ModelSim to verify our results. We also
built the ideal block design in Verilog HDL and simulated it
in Cadence for comparison. In addition, we compared against a
traditional Booth multiplier designed in Verilog HDL (the code
can be found in the Appendix).
Fig-14 4X4 Vedic Multiplier Simulation Result in
ModelSim
Fig-15 8X8 Vedic Multiplier Simulation Result in
ModelSim
Fig-16 8X8 Booth Multiplier Simulation Result in
ModelSim
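For readers unfamiliar with the Booth multiplier used as the comparison point, its recoding rule can be sketched behaviorally. This sketch assumes radix-2 Booth recoding on signed two's-complement operands, which may differ from the exact variant in the Appendix; the function name `booth_multiply` is hypothetical, and Python integers stand in for the accumulator so no register widths are modeled.

```python
def booth_multiply(multiplicand, multiplier, n=8):
    """Radix-2 Booth multiplication of two signed n-bit integers.

    Scans the multiplier bits from LSB to MSB together with the previous
    bit: a (1, 0) pair subtracts the shifted multiplicand, a (0, 1) pair
    adds it, and (0, 0) or (1, 1) does nothing.
    """
    low, high = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (low <= multiplicand <= high and low <= multiplier <= high):
        raise ValueError("operands must fit in n signed bits")
    acc, prev = 0, 0
    for i in range(n):
        # Python's >> is arithmetic, so this yields the two's-complement bit.
        bit = (multiplier >> i) & 1
        if (bit, prev) == (1, 0):
            acc -= multiplicand << i
        elif (bit, prev) == (0, 1):
            acc += multiplicand << i
        prev = bit
    return acc


if __name__ == "__main__":
    print(booth_multiply(-7, 13), booth_multiply(100, -128))
```

A run of consecutive 1s in the multiplier costs one subtraction and one addition here, whereas the Vedic structure forms all partial products in parallel.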
VI. SIMULATION RESULT ANALYSIS
1) For the 4X4 Vedic Multiplier:
Measurement Result:
Pavg = 0.01371 W
Processor Time Required = 4.46 seconds
2) For the 8X8 Vedic Multiplier:
Measurement Result:
Pavg = 0.0939 W
Processor Time Required = 13.97 seconds
Both results are for a transition time of 100 ns. Compared with
the results obtained in [9], the power consumption and
processor time required are found to be much lower. The power
consumption using the gate-level analysis in [9] for a 4-bit
multiplier is 0.45 W, whereas the transistor-level analysis in
this paper gives around 3 mW. The power consumption of the
8-bit multiplier structure here, built from four 4-bit
multipliers, is around 93 mW. The processor time required in
the gate-level analysis in [9] is 6.42 seconds for the 4-bit
multiplier, against the 4.43 seconds obtained for the Vedic
multiplier designed above using CMOS VLSI technology. The
computational steps are also reduced, and the hardware
implementation required is smaller than with conventional
methods, enhancing the performance of the overall system.
VII. CONCLUSION
This paper presents an efficient Vedic multiplier design
using VLSI technology. Almost 80% power reduction at 1.2
volts can be achieved using this Vedic multiplier compared
with earlier counterparts using gate-level analysis or the
conventional ways of multiplication. The processor time is
reduced from 6.42 seconds to 4.43 seconds for the 4-bit Vedic
multiplier, and the computational complexity is also lower, as
it requires fewer steps than conventional multiplication
methods. As a real-world application, this multiplier is
implemented for finding the determinant of a 2 X 2 matrix,
which uses two 8-bit multipliers and finds the difference of
the two products using two's complement.
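The determinant application can be sketched as two 8-bit products followed by a two's-complement subtraction. This is a behavioral illustration under stated assumptions: the matrix entries are unsigned 8-bit values, the difference is held in 17 bits so that both signs fit, and the names `WIDTH`, `twos_complement_sub` and `det2x2` are hypothetical. The `mul8` argument stands in for the 8x8 Vedic multiplier.

```python
WIDTH = 17                    # 16-bit products need one extra bit for the sign
MASK = (1 << WIDTH) - 1


def twos_complement_sub(x, y):
    """Compute x - y as x + (~y + 1) in WIDTH bits, then decode the sign."""
    diff = (x + ((~y) & MASK) + 1) & MASK
    if diff & (1 << (WIDTH - 1)):   # sign bit set: negative result
        diff -= 1 << WIDTH
    return diff


def det2x2(a, b, c, d, mul8=lambda x, y: x * y):
    """Determinant of [[a, b], [c, d]] for unsigned 8-bit entries."""
    for entry in (a, b, c, d):
        if not 0 <= entry <= 255:
            raise ValueError("entries must be unsigned 8-bit values")
    return twos_complement_sub(mul8(a, d), mul8(b, c))


if __name__ == "__main__":
    print(det2x2(3, 8, 4, 6))   # 3*6 - 8*4
```

The two products are independent, so in hardware both multipliers can run in parallel and only the subtraction is sequential.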
This design is a port of a previous design that used only
ideal block designs in Verilog HDL. Porting it to the
transistor level caused many problems with delay time, which
affects the later-stage logic. That is why we could not finish
the 16X16 Vedic multiplier: the delay is so severe that we
could not get the correct logic output. We also redesigned the
full adder using a PTL (pass-transistor logic) solution, which
is expected to be much faster and more power saving. A carry
skip adder should also be added to reduce the delay caused by
the ripple carry adder. Regarding power consumption, because
the multiplier uses a large number of MOSFETs, the
transistors' switching characteristics also need to be kept in
mind, and buffers will be required at various nodes inside the
circuit to avoid voltage drop [10]. The design algorithm and
the results show that this Vedic multiplier requires less area
and consumes less power than conventional multipliers.
VIII. FUTURE WORK
1. Research more efficient full-adder designs and add a
carry skip adder to reduce the delay from the ripple
carry adder.
2. Design built-in self-test circuitry for verification
in hardware.
ACKNOWLEDGMENT
I sincerely thank my partner Riddhi and Prof. Martin Margala,
for their help in completing this project. And special thanks to
Rajitha Gullapalli for her help in the Verilog Design part.
REFERENCES
[1] Kai Hwang, Computer Arithmetic: Principles, Architecture And Design.
New York: John Wiley & Sons, 1979
[2] Honey Durga Tiwari, Ganzorig Gankhuyag, Chan Mo Kim, Yong Beom
Cho, "Multiplier design based on ancient Indian Vedic Mathematics”,
2008 International SoC Design Conference, PP 65-68.
[3] Parth Mehta, Dhanashri Gawali, “Conventional versus Vedic
mathematical method for Hardware implementation of a multiplier,”
Department of ETC, Maharashtra Academy of Engg., Alandi(D), Pune,
India, 2009.
[4] Vedic Mathematics [Online]. Available:
http://www.hinduism.co.za/vedic.htm.
[5] J.D. Lee, Y.J. Yoony, K.H. Leez, B.-G. Park, “Application of dynamic
pass- transistor logic to an 8-bit multiplier,” J.Kor. Phys. Soc. 38 (3)
(2001) 220–223.
[6] Sung Mo Kang, Yusuf Leblebici, “CMOS Digital Integrated Circuits,”
Third Edition, 2003.
[7] R. Jacob Baker, Harry W. Li, David E. Boyce, “CMOS: Circuit Design,
Layout and Simulation,” Third Edition, 2011.
[8] Bhavani Prasad.Y, Ganesh Chokkakula, Srikanth Reddy.P and
Samhitha.N.R “Design of Low Power and High Speed Modified Carry
Select Adder for 16 bit Vedic Multiplier”, ICICES2014, ISBN No.978-
1-4799-3834-6/14
[9] Laxman P.Thakre, Suresh Balpande, Umesh Akare, Sudhir Lande,
“Performance Evaluation and Synthesis of Multiplier used in FFT
operation using Conventional and Vedic algorithms,” Third
International Conference on Emerging Trends in Engineering and
Technology , PP 614-619, IEEE, 2010.
[10] Kang, S.,“Accurate simulation of power dissipation in circuits”, IEEE
Journal of Solid-State Circuits, vol. 21, pp.889-891, 1986.