CHAPTER-1

               INTRODUCTION TO VLSI DOMAIN

1.1 VLSI DESIGN:

        The complexity of the VLSI circuits being designed and used today makes the
manual approach to design impractical. Design automation is the order of the day.
With the rapid technological developments of the last two decades, the status of VLSI
technology is characterized by the following:

• A steady increase in the size and hence the functionality of the ICs.

• A steady reduction in feature size and hence increase in the speed of operation as
well as gate or transistor density.
• A steady improvement in the predictability of circuit behavior.
• A steady increase in the variety and size of software tools for VLSI design.
The above developments have resulted in a proliferation of approaches to VLSI
design.
1.2 HISTORY OF VLSI:

       VLSI began in the 1970s, when complex semiconductor and communication
technologies were being developed. The microprocessor is a VLSI device. The term is
no longer as common as it once was, as chips have increased in complexity into the
hundreds of millions of transistors.

       VLSI is the field that involves packing more and more logic devices into
smaller and smaller areas. VLSI circuits can now be put into a space only a few
millimeters across. VLSI circuits are everywhere: in our computers, our cars, our
state-of-the-art digital cameras, our cell phones, and so on.

1.3 VARIOUS INTEGRATIONS:

        In the early days of integrated circuits, only a few transistors could be placed
on a chip, because the scale used was large owing to the technology of the time, and
manufacturing yields were low by today's standards. As the degree of integration was
small, the design was done easily. Over time, millions, and today billions, of
transistors could be placed on one chip, and making a good design became a task to
be planned thoroughly.

1.3.1 SSI TECHNOLOGY:

          The first integrated circuits contained only a few transistors. Called
"small-scale integration" (SSI), digital circuits containing transistors numbering in
the tens provided a few logic gates, for example, while early linear ICs such as the
Plessey SL201 or the Philips TAA320 had as few as two transistors. The term Large
Scale Integration was first used by IBM scientist Rolf Landauer when describing the
theoretical concept; from there came the terms SSI, MSI, VLSI, and ULSI.

1.3.2 MSI TECHNOLOGY:
           The next step in the development of integrated circuits, taken in the late
1960s, introduced devices which contained hundreds of transistors on each chip,
called "medium-scale integration" (MSI).
           They were attractive economically because while they cost little more to
produce than SSI devices, they allowed more complex systems to be produced using
smaller circuit boards, less assembly work (because of fewer separate components),
and a number of other advantages.
1.3.3 LARGE SCALE INTEGRATION:
        Further development, driven by the same economic factors, led to "large-scale
integration" (LSI) in the mid 1970s, with tens of thousands of transistors per chip.
Integrated circuits such as 1K-bit RAMs, calculator chips, and the first
microprocessors, that began to be manufactured in moderate quantities in the early
1970s, had under 4000 transistors. True LSI circuits, approaching 10,000 transistors,
began to be produced around 1974, for computer main memories and second-
generation microprocessors.
1.3.4 VLSI:
         The final step in the development process, "very-large-scale integration"
(VLSI), started in the early 1980s and continues through the present, with chips
exceeding several billion transistors as of 2009. In 1986 the first one-megabit RAM
chips were introduced, which contained more than one million transistors.
Microprocessor chips passed the million-transistor mark in 1989 and the
billion-transistor mark in 2005. The trend continues largely unabated, with chips
introduced in 2007 containing tens of billions of memory transistors.


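VLSI DESIGN FLOW:

[Figure: flowchart. Main path: Start, Design Entity, Logic Synthesis, System
Partitioning, Floor Planning, Placement, Routing, Finish. Side blocks: Pre layout
Simulation (beside Logic Synthesis), Pre layout Simulation (beside Floor Planning),
Circuit Extraction (beside Routing).]

                              Fig 2.1 VLSI design flow
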
1.4 ULSI, WSI, SOC and 3D-IC:
       To reflect further growth in complexity, the term ULSI, which stands for
"ultra-large-scale integration", was proposed for chips with a complexity of more than
1 million transistors. Wafer-scale integration (WSI) is a system of building very large
integrated circuits that uses an entire silicon wafer to produce a single "super-chip".
Through a combination of large size and reduced packaging, WSI is intended to reduce
the cost of some systems.
       A system-on-a-chip (SOC) is an integrated circuit in which all the
components needed for a computer or other system are included on a single chip. The
design of such a device can be complex and costly, and building disparate
components on a single piece of silicon may compromise the efficiency of some
elements. However, these drawbacks are offset by lower manufacturing and assembly
costs and by a greatly reduced power budget: because signals among the components
are kept on-die, much less power is required.
        A three-dimensional integrated circuit (3D-IC) has two or more layers of active
electronic components that are integrated both vertically and horizontally into a single
circuit, which can also reduce power consumption.
1.5 VLSI DESIGN FLOW AND ITS DESCRIPTION:
       The design at the behavioral level is to be elaborated in terms of known and
acknowledged functional blocks. It forms the next detailed level of design description.
Once again the design is to be tested through simulation and iteratively corrected for
errors. The elaboration can be continued one or two steps further. It leads to a detailed
design description in terms of logic gates and transistor switches.
Optimization
       The circuit at the gate level – in terms of the gates and flip-flops – can be
redundant in nature. The same can be minimized with the help of minimization tools.
The step is not shown separately in the figure. The minimized logical design is
converted to a circuit in terms of the switch level cells from standard libraries
provided by the foundries. The cell based design generated by the tool is the last step
in the logical design process; it forms the input to the first level of physical design.
Simulation
       The design descriptions are tested for their functionality at every level:
behavioral, data flow, and gate. One has to check whether all the functions are
carried out as expected and rectify any that are not. All such activities are carried
out by the simulation tool. The tool also has an editor to carry out any corrections to
the source code. Simulation involves testing the design for all its functions,
functional sequences, timing constraints, and specifications. Normally testing and
simulation at all the levels, behavioral to switch level, are carried out by a single
tool; the same is identified as “scope of simulation tool” in Figure 1.1.
Synthesis
       With the availability of design at the gate (switch) level, the logical design is
complete. The corresponding circuit hardware realization is carried out by a synthesis
tool. Two common approaches are as follows:
• The circuit is realized through an FPGA. The gate level design description is the
starting point for the synthesis here. The FPGA vendors provide an interface to the
synthesis tool. Through the interface the gate level design is realized as a final circuit.
With many synthesis tools, one can directly use the design description at the data flow
level itself to realize the final circuit through an FPGA. The FPGA route is attractive
for limited volume production or a fast development cycle.

 • The circuit is realized as an ASIC. A typical ASIC vendor will have its own library
of basic components like elementary gates and flip-flops. Eventually the circuit is to
be realized by selecting such components and interconnecting them conforming to the
required design. This constitutes the physical design. Being an elaborate and costly
process, a physical design may call for an intermediate functional verification through
the FPGA route. The circuit realized through the FPGA is tested as a prototype. It
provides another opportunity for testing the design closer to the final circuit.
Physical Design
       A fully tested and error-free design at the switch level can be the starting point
for a physical design [Baker & Boyce, Wolf]. It is to be realized as the final circuit
using (typically) a million components in the foundry’s library. The step-by-step
activities in the process are described briefly as follows:
• System partitioning: The design is partitioned into convenient compartments or
functional blocks. Often it would have been done at an earlier stage itself and the
software design prepared in terms of such blocks. Interconnection of the blocks is part
of the partition process.
• Floor planning: The positions of the partitioned blocks are planned and the blocks
are arranged accordingly. The procedure is analogous to the planning and
arrangement of domestic furniture in a residence. Blocks with I/O pins are kept close
to the periphery; those which interact frequently or through a large number of
interconnections are kept close together, and so on. Partitioning and floor planning
may have to be carried out and refined iteratively to yield best results.
• Placement: The selected components from the ASIC library are placed in position
on the “Silicon floor.” It is done with each of the blocks above.
• Routing: The components placed as described above are to be interconnected to the
rest of the block. It is done with each of the blocks by suitably routing the
interconnects. Once the routing is complete, the physical design can be taken as
complete. The final mask for the design can be made at this stage and the ASIC
manufactured in the foundry.
Post Layout Simulation
       Once the placement and routing are completed, the performance specifications
like silicon area, power consumed, path delays, etc., can be computed. Equivalent
circuit can be extracted at the component level and performance analysis carried out.
This constitutes the final stage called “verification.” One may have to go through the
placement and routing activity once again to improve performance.
Critical Subsystems
       The design may have critical subsystems. Their performance may be crucial to
the overall performance; in other words, to improve the system performance
substantially, one may have to design such subsystems afresh. The design here may
imply redefinition of the basic feature size of the component, component design,
placement of components, or routing done separately and specifically for the
subsystem. A set of masks used in the foundry may have to be done afresh for the
purpose.




CHAPTER 2

               INTRODUCTION TO THE PROJECT

2.1 Motivation:
       Multiplication has a strong influence on system performance and is widely
used in digital signal processing and in digital communications.

       The traditional array-based multiplier relies on a large number of regular
addition and shifting operations, and therefore requires more hardware and more
complex operations.

2.2 Overview of the Project:

       Multiplication operation involves generation of partial products and their
accumulation. The speed of multiplication can be increased by reducing the number
of partial products and/or accelerating the accumulation of partial products. Among
the many methods of implementing high speed parallel multipliers, there are two
basic approaches namely Booth algorithm and Wallace Tree compressors.

        This paper describes an efficient implementation of a high speed parallel
multiplier using both these approaches. Here two multipliers are proposed. The first
multiplier makes use of the Radix-4 Booth Algorithm with 3:2 compressors while the
second multiplier uses the Radix-8 Booth algorithm with 4:2 compressors. The design
is structured for m x n multiplication where m and n can reach up to 126 bits. The
number of partial products is n/2 in Radix-4 Booth algorithm while it gets reduced to
n/3 in Radix-8 Booth algorithm.
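
To make the partial-product counts concrete, the sketch below recodes a two's-complement multiplier into signed Booth digits for a chosen radix 2^k and counts them. The function name `booth_digits` and its parameters are illustrative assumptions and are not taken from the project's VHDL code; each digit corresponds to one partial product.

```python
def booth_digits(value, bits, k):
    """Recode a signed integer of width `bits` into radix-2**k Booth digits.

    Digits are returned least-significant first. Each digit lies in
    [-2**(k-1), 2**(k-1)] and selects one partial product.
    """
    ndigits = -(-bits // k)                      # ceil(bits / k)
    mask = (1 << (k * ndigits)) - 1              # sign-extend to a multiple of k bits
    padded = (value & mask) << 1                 # append the implicit bit y(-1) = 0
    digits = []
    for i in range(ndigits):
        group = (padded >> (k * i)) & ((1 << (k + 1)) - 1)   # k+1 overlapping bits
        low = group & 1                                     # y(ki - 1)
        middle = (group >> 1) & ((1 << (k - 1)) - 1)        # y(ki) .. y(ki + k - 2)
        top = group >> k                                    # y(ki + k - 1), negative weight
        digits.append(middle + low - (top << (k - 1)))
    return digits


n = 126
multiplier = -(1 << 125) + 12345                 # an arbitrary 126-bit signed value
radix4 = booth_digits(multiplier, n, 2)
radix8 = booth_digits(multiplier, n, 3)
print(len(radix4), len(radix8))                  # n/2 and n/3 partial products
```

The digits reconstruct the original value as the sum of digit × (2^k)^i, which is why each digit can be turned directly into one shifted multiple of the multiplicand.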

       The Wallace tree uses Carry Save Adders (CSA) to accumulate the partial
products. This reduces the time as well as the chip area. To further enhance the speed
of operation, carry-look-ahead (CLA) adder is used as the final adder.

2.3 Organization of Thesis:

       The first chapter of this project report is an introduction to Booth encoding.
The second chapter gives a brief idea of the different types of operations, such as
addition and shifting. The third chapter covers the different types of Wallace tree
method. The fourth chapter shows the operation of the carry look-ahead adder scheme.

       The synthesis and simulation results for the calculating processor (CP) are
reported in the fifth chapter. Conclusions and future scope are explained in the sixth
chapter, and references are given after the sixth chapter. The code for the calculating
processor (CP) is put in the Appendix. The efficient implementation of the Radix-8
multiplication operation is an important prerequisite in the Booth algorithm because
multiplication operations are performed using Radix-8 representation operations in
the underlying field.

       The Wallace tree method provides an efficient way of adding the partial products.
Three kinds of Radix operations are especially amenable to the efficient
implementation of multiplication operations. Finally, a carry look-ahead adder is
used in the addition of the partial products.




CHAPTER 3

           BASIC THEORY OF BOOTH ALGORITHM

3.1 Introduction to Booth Algorithm:

       The proposed multiplier consists of four major modules: Booth encoder, partial
product generator, Wallace tree and carry look-ahead adder. The Booth encoder
performs Radix-2 or Radix-4 encoding of the multiplier bits. Based on the
multiplicand and the encoded multiplier, partial products are generated by the
generator. For large multipliers of 32 bits, the performance of the modified Booth
algorithm is limited, so Booth recoding together with Wallace tree structures has been
used in the proposed fast multiplier. The partial products are supplied to the Wallace
tree and added appropriately. The results are finally added using a Carry Look-ahead
Adder (CLA) to get the final product.




                     Fig 3.1 Block Diagram of Wallace Booth Multiplier

3.2 Radix – 8 Booth Algorithm


               Multiplier bits              Recoded operation
          Yi+2   Yi+1   Yi    Yi-1          on multiplicand, X

           0      0     0      0                 0X
           0      0     0      1                +1X
           0      0     1      0                +1X
           0      0     1      1                +2X
           0      1     0      0                +2X
           0      1     0      1                +3X
           0      1     1      0                +3X
           0      1     1      1                +4X
           1      0     0      0                -4X
           1      0     0      1                -3X
           1      0     1      0                -3X
           1      0     1      1                -2X
           1      1     0      0                -2X
           1      1     0      1                -1X
           1      1     1      0                -1X
           1      1     1      1                 0X

Table 3.2 Radix-8 Multiplication
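
Table 3.2 can be summarized by a single weighted sum of the four inspected bits. The following sketch, whose names are illustrative and not part of the original design, encodes the table as a lookup and checks every row against that sum.

```python
# Recoded digit for each bit group (Y(i+2), Y(i+1), Y(i), Y(i-1)), as in Table 3.2.
RADIX8_TABLE = {
    (0, 0, 0, 0): 0,  (0, 0, 0, 1): 1,  (0, 0, 1, 0): 1,  (0, 0, 1, 1): 2,
    (0, 1, 0, 0): 2,  (0, 1, 0, 1): 3,  (0, 1, 1, 0): 3,  (0, 1, 1, 1): 4,
    (1, 0, 0, 0): -4, (1, 0, 0, 1): -3, (1, 0, 1, 0): -3, (1, 0, 1, 1): -2,
    (1, 1, 0, 0): -2, (1, 1, 0, 1): -1, (1, 1, 1, 0): -1, (1, 1, 1, 1): 0,
}


def radix8_digit(y2, y1, y0, y_prev):
    """Weighted-sum form of the table: the top bit of the group has weight -4."""
    return -4 * y2 + 2 * y1 + y0 + y_prev


# Every table row agrees with the weighted sum.
for bits, digit in RADIX8_TABLE.items():
    assert radix8_digit(*bits) == digit

# Only one of the nine multiples -4X .. +4X is "hard": 3X is not a pure shift.
hard_multiples = sorted({abs(d) for d in RADIX8_TABLE.values() if d and abs(d) & (abs(d) - 1)})
print(hard_multiples)
```

All other non-zero multiples are powers of two, so they reduce to shifts and an optional negation; this is why the text that follows concentrates on generating the triple of the multiplicand.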



       Here we have an odd multiple of the multiplicand, 3Y, which is not
immediately available (Y denotes the multiplicand, written X in Table 3.2). To
generate it we need to perform a previous add: 2Y + Y = 3Y. However, we are designing
a multiplier for a specific purpose, so the multiplicand belongs to a previously known
set of numbers stored in a memory chip. We have tried to take advantage of this fact
to ease the bottleneck of the radix-8 architecture, that is, the generation of 3Y.
               In this manner we try to attain a better overall multiplication time, or at
least one comparable to the time we could obtain using a radix-4 architecture (with the
additional advantage of using fewer transistors). To generate 3Y with 21-bit words we
only have to add 2Y + Y, that is, add the number to the same number shifted one
position to the left, obtaining a new 23-bit word, as shown in Figure 3.2.




         Fig. 3.2: 21-bit previous add.

       In fact, only a 21-bit adder is needed to generate the bit positions from z1 to
z21. Bits z0 and z22 are directly known because z0 = y0 and z22 = y20 (the sign bit of
the 2s-complement number; 3Y and Y have the same sign). If just two additional bits
are stored in the memory together with each value of the set of numbers, we can
decompose the previous add into three shorter adds that can be done in parallel. In
this way, the delay is the same as that of a 7-bit adder:




                Fig. 3.3: Modified previous add

       The bits to be stored are the two intermediate carry signals c8 and c15.
Before each word of the set of numbers is stored in the memory, the values of its
intermediate carries have to be obtained and stored beside it. In this way, they are
immediately available when the previous add has to be performed to get the
multiple 3Y of one of the numbers that belongs to the set.
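
The decomposition described above can be sketched as follows. This is a behavioral model under stated assumptions, not the project's circuit: the helper names (`operands`, `stored_carries`, `triple`) are hypothetical, and the three 7-bit segments are taken to cover bit positions 1 to 7, 8 to 14 and 15 to 21, with c8 and c15 being the carries into positions 8 and 15.

```python
WORD = 21                      # multiplicand width in bits
OUT = WORD + 2                 # 3Y needs two more bits


def to_signed(pattern, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return pattern - (1 << bits) if (pattern >> (bits - 1)) & 1 else pattern


def operands(y):
    """Return sign-extended Y and 2Y as 23-bit patterns."""
    a = to_signed(y, WORD) & ((1 << OUT) - 1)
    b = (a << 1) & ((1 << OUT) - 1)
    return a, b


def stored_carries(y):
    """Carries into bit positions 8 and 15 of Y + 2Y, computed once and stored."""
    a, b = operands(y)
    c8 = ((a & 0xFF) + (b & 0xFF)) >> 8
    c15 = ((a & 0x7FFF) + (b & 0x7FFF)) >> 15
    return c8, c15


def triple(y, c8, c15):
    """Form 3Y from three independent 7-bit adds plus the two known end bits."""
    a, b = operands(y)
    z = y & 1                                   # z0 = y0
    for low, carry_in in ((1, 0), (8, c8), (15, c15)):
        seg = ((a >> low) & 0x7F) + ((b >> low) & 0x7F) + carry_in
        z |= (seg & 0x7F) << low                # result bits low .. low + 6
    z |= ((y >> (WORD - 1)) & 1) << (OUT - 1)   # z22 = y20 (sign bit)
    return z


y = int("111100010010110111001", 2)             # multiplicand used in Section 3.3
z = triple(y, *stored_carries(y))
print(to_signed(z, OUT) == 3 * to_signed(y, WORD))
```

Because each segment receives its carry-in from memory rather than from the neighbouring segment, the three adds have no dependency on each other, which is what allows them to run in parallel.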

       The increase in memory requirements is relatively small (9.5%, 23 bits
instead of 21 for every word), and the gain in time is obvious because we replace a
21-bit adder with three 7-bit adders which can operate in parallel. In order to get the
minimum delay in the previous adder we use high-speed adders. The adders that best
fit our needs are the carry and sum select adders (CSSA) with an estimated delay of

         where n is the word length.

        So, by reducing the word length to one third, the previous-add delay is reduced
by approximately 42%, which is consistent with a delay that grows as the square root
of n, since 1 - 1/√3 ≈ 0.42. Despite this reduction, the previous-add delay remains
dominant compared with the recoding time, which is the only operation that can be
done in parallel with the previous add.




3.3 Multiplier unit design

The multiplication of two 21-bit 2s-complement binary numbers, using the algorithm
with radix-8 recoding of the multiplier, presents the following features:

a) Radix-8 recoding of the multiplier implies a reduction in the number of digits to 7:




            Fig. 3.4: Multiplier recoding.




b) The partial products multiplexer must choose one out of nine possibilities depending
on the value of the corresponding signed-digit, as shown in figure 3.5:




Fig. 3.5: Partial products multiplexer.

c) The partial product length is two bits longer than the multiplicand length, giving
23-bit length partial products.

d) The number of partial products entering the Wallace tree structure is 8: 7 coming
from the multiplier recoded digits plus another partial product due to the compensation
bits of the 2s-complement multiplication algorithm, which cannot be included in any of
the other 7 words.

e) The best structure for the reduction of 8 partial products applies only 4-2
compressors [7] (instead of the conventional full adders).

The Wallace tree has the following scheme:




             Fig. 8: Wallace reduction tree.

with an equivalent delay of 6 logic gates.


f) The previous and the final add must be done as fast as possible, so they are
implemented with carry and sum select adders (CSSA).

To give a better understanding of the multiplier design, we show an example following
the radix-8 recoding algorithm.

Consider the multiplication of these 2s-complement binary numbers:

         Multiplicand: 111100010010110111001

         Multiplier: 100011010100110100111

The multiplier recoding has the result shown here (following table 1):




The generation of three times the multiplicand gives:




       The partial products array and its summation, which gives the multiplication
result, are shown in figure 9. In the array, some bits are encircled (fixed 1’s); they
avoid the sign extension of the partial products. Some other bits are squared; they
will be 1’s when the corresponding partial product has to be complemented (if the
recoding gives a negative digit).

       The leading four partial products will enter the first block of 4-2 compressors
while the other three partial products plus the compensation bits will enter the second
block of 4-2 compressors, still in the first compression level. Moreover, the final adder
has been decomposed in three adders with lengths 3, 6 and 31 bits. The 31-bit adder is
the proper final adder while the 3 and the 6-bit adders are used to advance bits of the
final result without passing through all the compression blocks in the Wallace tree.
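
The worked example can be reproduced arithmetically with a short behavioral sketch. It uses the operands quoted above and the weighted-sum form of the radix-8 recoding; the helper names are hypothetical, and the model ignores the hardware details (compensation bits, sign-extension tricks, compressor blocks), so it only confirms that seven recoded digits select multiples whose shifted sum equals the product.

```python
def to_signed(pattern, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return pattern - (1 << bits) if (pattern >> (bits - 1)) & 1 else pattern


def radix8_digits(pattern, bits=21):
    """Radix-8 Booth digits (least-significant first) of a 21-bit pattern."""
    padded = pattern << 1                        # implicit y(-1) = 0
    digits = []
    for i in range(bits // 3):
        g = (padded >> (3 * i)) & 0xF            # y(3i+2), y(3i+1), y(3i), y(3i-1)
        digits.append(-4 * (g >> 3) + 2 * ((g >> 2) & 1) + ((g >> 1) & 1) + (g & 1))
    return digits


multiplicand = to_signed(int("111100010010110111001", 2), 21)
multiplier_bits = int("100011010100110100111", 2)

digits = radix8_digits(multiplier_bits)
# One partial product per digit: a multiple in {-4..+4} of the multiplicand,
# weighted by 8**i (a left shift of 3*i positions in hardware).
partial_products = [d * multiplicand * 8**i for i, d in enumerate(digits)]
product = sum(partial_products)

print(digits)
print(product == multiplicand * to_signed(multiplier_bits, 21))
```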




CHAPTER 4

                                 Wallace Tree

       The Wallace tree method is used in high-speed designs in order to produce two
rows of partial products that can be added in the last stage. The critical path and the
number of adders are also reduced when compared to conventional parallel adders.
Here the Wallace tree takes the role of accelerating the accumulation of the partial
products. Its advantage becomes more pronounced for multipliers of greater than 16
bits. The speed, area and power consumption of the multipliers will be in direct
proportion to the efficiency of the compressors.

       The Wallace tree structures with 3:2 compressors and 4:2 compressors are
shown in Figure 4.1 and Figure 4.2 respectively. With these structures, we can expect
a significant reduction in the time needed to compute multiplications.




Figure 4.2 Wallace Tree using 4:2 compressors

       The 3:2 compressors make use of a carry save adder. The carry save adder
outputs two numbers of the same dimensions as the inputs: one is a sequence of
partial sum bits and the other is a sequence of carry bits. In a carry save adder, the
carry digit is taken from the right and passed to the left, just as in conventional
addition, but the carry digit passed to the left is the result of the previous
calculation and not the current one.

       So in each clock cycle, carries only have to move one step along and the clock
can tick much faster. Also, the carry-save adder produces all of its output values in
parallel, and thus has the same delay as a single full adder. The 4:2 compressors have
been widely employed in high-speed multipliers to lower the latency of the partial
product accumulation stage.
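
The behavior of the two compressor types can be sketched at word level as follows. This is a minimal model on non-negative Python integers, assuming the common construction of a 4:2 compressor from two cascaded 3:2 stages; the function names are illustrative, and the final `sum` stands in for the carry-propagate adder at the end of the tree.

```python
def carry_save_add(a, b, c):
    """3:2 compression: three addends in, (sum_word, carry_word) out."""
    sum_word = a ^ b ^ c                              # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved one place left
    return sum_word, carry_word


def compress_4_to_2(a, b, c, d):
    """4:2 compression built from two cascaded 3:2 stages."""
    s1, c1 = carry_save_add(a, b, c)
    return carry_save_add(s1, c1, d)


def wallace_sum(rows):
    """Reduce rows three at a time until two remain, then add them once."""
    rows = list(rows)
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(*rows[i:i + 3]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])   # rows left over this level
        rows = reduced
    return sum(rows)                                  # final carry-propagate add


rows = [13, 7, 250, 1023, 64, 5, 99, 512]
print(wallace_sum(rows) == sum(rows))
```

In both compressors the outputs always add up to the same total as the inputs, so the expensive carry propagation is deferred to a single final addition.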

       A 4:2 compressor can be built using two 3:2 compressors. Owing to its regular
interconnection, the 4:2 compressor is ideal for the construction of a regularly
structured Wallace tree with low complexity. The number of levels in the Wallace
tree using 3:2 compressors can be approximately given as

       Number of Levels ≈ log(k/2) / log(3/2)                              (3.3)

       where k is the number of partial products.


       Table III shows the number of levels in the Wallace tree using 3:2 compressors
for different number of partial products.




       Table III . NUMBER OF LEVELS IN THE WALLACE TREE
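
Since the table itself is not reproduced here, the level counts can be regenerated with a short sketch. It assumes the usual Wallace reduction rule (every full group of three rows becomes two rows per level) and compares the exact count with the commonly quoted estimate log(k/2) / log(3/2); the function names are illustrative.

```python
import math


def wallace_levels(k):
    """Exact number of 3:2 compressor levels needed to reduce k rows to 2."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3     # each full group of three rows becomes two
        levels += 1
    return levels


def estimated_levels(k):
    """Closed-form estimate: each level shrinks the row count by about 2/3."""
    return math.log(k / 2) / math.log(3 / 2)


# 8 rows for the 21-bit radix-8 design; 63 and 42 rows for a 126-bit
# multiplier recoded in radix-4 and radix-8 respectively.
for k in (8, 42, 63):
    print(k, wallace_levels(k), round(estimated_levels(k), 2))
```

The estimate is a lower bound on the exact count because leftover rows that do not fill a group of three pass through a level unreduced.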


       The final results obtained at the output of the Wallace tree are added using a
Carry Look-ahead Adder (CLA). In a Carry Look-ahead Adder, the carry into every bit
position is computed directly from the operand bits rather than waiting for the carry
from the previous position, so the rippling effect is eliminated and the carry delay is
largely independent of the number of bits of the two operands. It works by creating
two signals, propagate and generate, for each bit position, based on whether a carry is
propagated through from a less significant bit position, a carry is generated in that
bit position, or a carry is killed in that bit position.
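
The propagate and generate signals can be illustrated with a small behavioral sketch. It is not the VHDL used in the project; the function name `cla_add` is hypothetical, and each carry is written in its flattened look-ahead form, c(i) = g(i-1) + p(i-1)·g(i-2) + ... + p(i-1)···p(0)·c(0), so that no carry depends on another carry.

```python
def cla_add(a, b, width, carry_in=0):
    """Add two `width`-bit unsigned values using look-ahead carry equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # generate: both bits are 1
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]    # propagate: exactly one bit is 1
    carries = [carry_in]
    for i in range(1, width + 1):
        carry = 0
        chain = 1                        # product of propagate signals above position j
        for j in range(i - 1, -1, -1):
            carry |= chain & g[j]        # a carry generated at j and propagated up to i
            chain &= p[j]
        carry |= chain & carry_in        # the input carry propagated through every position
        carries.append(carry)
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i                    # sum bit = p XOR carry-in
    return total, carries[width]


total, carry_out = cla_add(0b1011, 0b0110, 4)
print(total + (carry_out << 4) == 0b1011 + 0b0110)
```

A position where neither bit is set has g = p = 0, which is the "kill" case: any incoming carry stops there.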
         The design entry of the 126×126 bit multipliers using the Radix-4 Booth
algorithm with 3:2 compressors and the Radix-8 Booth algorithm with 4:2 compressors
is done using VHDL and simulated using the ModelSim SE 6.4 design suite from Mentor
Graphics. The designs are then synthesized and implemented in a Xilinx XC3S5000
fg1156 -4 FPGA using the Xilinx ISE 9.2i design suite.
         Figure 4 presents a snapshot of simulation waveforms for the 126×126 bit
multiplier. Table IV summarizes the FPGA resource utilization of these two
multipliers.
        Finally, the performance improvement is validated by implementing a higher
order FIR filter using these multipliers. Table V summarizes the FPGA resource
utilization for FIR filters using these multipliers.
         This shows that the multiplier using the Radix-8 Booth algorithm with 4:2
compressors gives better speed, while the number of occupied slices is lower for the
multiplier using the Radix-4 Booth algorithm with 3:2 compressors.
        The FIR filters are implemented in a Xilinx XC3S1500fg676-4 FPGA. The
specifications of the FIR filter chosen are as follows.

Sampling frequency : 24 kHz
Pass band frequency : 8 kHz
Stop band frequency : 9 kHz
Pass band ripple : 0.1 (linear scale)
Stop band attenuation : 0.001 (linear scale)
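
To show why these specifications call for a higher-order filter, the sketch below converts them to normalized quantities and applies Kaiser's empirical order estimate for equiripple FIR filters. The use of Kaiser's formula is an assumption made here for illustration; the report does not state how the filter order was chosen or what order was used.

```python
import math

fs = 24_000.0          # sampling frequency, Hz
f_pass = 8_000.0       # pass band edge, Hz
f_stop = 9_000.0       # stop band edge, Hz
delta_pass = 0.1       # pass band ripple, linear
delta_stop = 0.001     # stop band attenuation, linear

nyquist = fs / 2
pass_edge = f_pass / nyquist                     # edges as fractions of Nyquist
stop_edge = f_stop / nyquist
stop_atten_db = -20 * math.log10(delta_stop)     # linear attenuation in decibels
transition = (f_stop - f_pass) / fs              # transition width, cycles per sample

# Kaiser's estimate: N ~ (-20*log10(sqrt(d1*d2)) - 13) / (14.6 * transition width)
kaiser_order = (-20 * math.log10(math.sqrt(delta_pass * delta_stop)) - 13) / (14.6 * transition)

print(round(pass_edge, 3), round(stop_edge, 3), stop_atten_db, math.ceil(kaiser_order))
```

The narrow transition band (1 kHz out of a 24 kHz sampling rate) is what drives the order up, and every additional tap costs one more multiplication per output sample.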




TABLE IV. DEVICE UTILIZATION SUMMARY OF MULTIPLIERS


                                    CHAPTER 5
                          TOOLS AND HDL USED

5.1 ROLE OF HDL:
        An HDL provides the framework for the complete logical design of the ASIC. All the
activities coming under the purview of an HDL are shown enclosed in bold dotted lines.
Verilog and VHDL are the two most commonly used HDLs today. Both have constructs with
which the design can be fully described at all the levels. There are additional constructs
available to facilitate setting up the test bench, spelling out test vectors for it and
“observing” the outputs from the designed unit.



        IEEE has brought out Standards for the HDLs, and the software tools conform to
 them. Verilog was originally developed at Gateway Design Automation, which was later
 acquired by Cadence Design Systems; Cadence placed it into the public domain in 1990.
 It was established as a formal IEEE Standard in 1995, and a revised version was brought
 out in 2001. However, most of the simulation tools available today conform only to the
 1995 version of the standard. VHDL, used by a substantial number of VLSI designers
 today, is the HDL used in this project for modeling the design.
          We have used Xilinx ISE 9.2i for simulation and synthesis purposes. We
 implemented the prescribed design in VHDL, a widely used industry and IEEE-standard HDL.
 5.2 NEEDS OF (V)HDL:

     o    Interoperability.
     o    Technology independence.
     o    Design reuse.
     o    Several levels of abstraction.
     o    Readability.
     o    Standard language.
     o    Widely supported.




  What is VHDL?

 VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)

        Specify → Capture → Verify → Formalize → Implement

                      Fig. 5.1 Data Flow of VHDL

 VHDL can be used as:

      A design specification language.
      A design entry language.
      A design simulation language.
      A design documentation language.
      An alternative to schematics.

 5.2.1 BRIEF HISTORY:


   o VHDL was developed in the early 1980s for managing design problems that
      involved large circuits and multiple teams of engineers.
   o It was funded by the U.S. Department of Defense.
   o The first publicly available version was released in 1985.
   o In 1986 the IEEE (Institute of Electrical and Electronics Engineers) was presented
      with a proposal to standardize VHDL.

   o In 1987 it was standardized as IEEE 1076-1987.
   o An improved version of the language was released in 1994 as IEEE standard
      1076-1993.

Related Standards:

   o IEEE 1076 does not by itself support simulation conditions such as unknown and
      high-impedance.
   o Soon after IEEE 1076-1987 was released, simulator companies began using
      their own non-standard types, so VHDL was at risk of becoming non-standard.
   o The IEEE 1164 standard was developed by the IEEE. IEEE 1164 contains definitions
      for a nine-valued data type, std_logic.




5.3 VHDL ENVIRONMENT:




Fig 5.2 VHDL Environment

Design Units:

Segments of VHDL code that can be compiled separately and stored in a library.




                            Fig. 5.3 Design Units




5.4 LEVELS OF ABSTRACTION:
       VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways.

    Structural.

    Data flow.

    Behavioral.

Structural VHDL description:

   •   Circuit is described in terms of its components.
   •   The description can range from low level (e.g., transistor level) to high
       level.
   •   For large circuits, a low-level description quickly becomes impractical.
Dataflow VHDL Description:
   •   Circuit is described in terms of how data moves through the system.
   •   In the dataflow style you describe how information flows between registers
       in the system.
   •   The combinational logic is described at a relatively high level, while the
       placement and operation of registers are specified quite precisely.




             Fig 5.4.Data Flow Of VHDL Description


   •    The behavior of the system over time is defined by registers.
   •    There are no built-in registers in the VHDL language, so either
        - a lower-level description, or
        - a behavioral description of sequential elements is needed.
   •    The lower-level descriptions must be created or obtained.
   •    If there are no third-party models for registers, you must write the behavioral
        description of the registers yourself.
   •    The behavioral description can be provided in the form of subprograms
        (functions or procedures).
Behavioral VHDL Description
   •    Circuit is described in terms of its operation over time.
   •    Representation might include, e.g., state diagrams, timing diagrams and
        algorithmic descriptions.
   •    The concept of time may be expressed precisely using delays (e.g., A <= B after
        10 ns).
   •    If no actual delay is used, only the order of sequential operations is defined.
   •    At lower levels of abstraction (e.g., RTL), synthesis tools ignore detailed
        timing specifications.
   •    The actual timing results depend on the implementation technology and the
        efficiency of the synthesis tools.
   •    There are few tools for behavioral synthesis.

       General format:

            process [(sensitivity_list)]
                process_declarative_part
            begin
                process_statements
                [wait_statement]
            end process;
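
To make the general format more concrete, the sketch below shows the closest Verilog counterparts of the two ideas above: an always block with an event list plays the role of a process with a sensitivity list, and a delayed continuous assignment plays the role of "A <= B after 10 ns". Verilog is used because it is the language of the tutorial in Chapter 7; the module and signal names are hypothetical.

```verilog
`timescale 1ns / 1ns

// Hypothetical example; all names are illustrative only.
module process_sketch (
    input  wire a,
    input  wire b,
    output reg  y,
    output wire y_delayed
);
    // Counterpart of a process with sensitivity list (a, b):
    // the statements run again whenever a or b changes.
    always @(a or b) begin
        y = a & b;
    end

    // Counterpart of "y_delayed <= y after 10 ns".
    // The delay is meaningful in simulation only; synthesis
    // tools ignore it, as noted in the list above.
    assign #10 y_delayed = y;
endmodule
```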


CHAPTER-6
                            SOFTWARE TOOLS

6.1 SOFTWARE TOOL-XILINX:
       Xilinx ISE is a software tool produced by Xilinx for synthesis and analysis of
HDL designs, which enables the developer to synthesize ("compile") their designs,
perform timing analysis, examine RTL diagrams, simulate a design's reaction to
different stimuli, and configure the target device with the programmer.
        Xilinx was founded in 1984 by two semiconductor engineers, Ross Freeman
and Bernard Vonderschmitt, who were both working for integrated circuit and solid-
state device manufacturer Zilog Corp.
       While working for Zilog, Freeman wanted to create chips that acted like a
blank tape, allowing users to program the technology themselves. At the time, the
concept was paradigm-changing. "The concept required lots of transistors and, at that
time, transistors were considered extremely precious – people thought that Ross's idea
was pretty far out", said Xilinx Fellow Bill Carter, who when hired in 1984 as the first
IC designer was the company's eighth employee.
       Xilinx ISE is a software tool used to run programs written in VHDL. It is
available in various versions, such as Xilinx 9.2i, Xilinx 10.1, and Xilinx 10.5. It
also provides various predefined libraries and packages.
6.2 VERSION 9.2I:
New Device Support:
This release supports the new Spartan™-3A DSP family.
New Software Features:
Following are the new features in this release.
Operating System Support:
   •   Support for Windows® Vista Business 32-bit operating system. This operating
       system is supported, but has had limited testing.

   •   Support for Windows XP Professional 64-bit operating system.

   •   Support for Red Hat Enterprise WS 5.0 32-bit and 64-bit operating system.
       This operating system is supported, but has had limited testing.




WHY XILINX ONLY?
       Many software tools, such as Cadence, can be used to run VHDL programs.
Compared with these tools, however, Xilinx is cost-effective.




CHAPTER-7
                             TUTORIAL OF ISE 8.2i
ISE 8.2i Quick Start Tutorial

                The ISE 8.2i Quick Start Tutorial provides Xilinx PLD designers with
a quick overview of the basic design process using ISE 8.2i. After you have
completed the tutorial, you will have an understanding of how to create, verify, and
implement a design.

Note: This tutorial is designed for ISE 8.2i on Windows.

This tutorial contains the following sections:

• “Getting Started”

• “Create a New Project”

• “Create an HDL Source”

• “Design Simulation”

• “Create Timing Constraints”

• “Implement Design and Verify Constraints”

• “Reimplement Design and Verify Pin Locations”

• “Download Design to the Spartan™-3 Demo Board”




For an in-depth explanation of the ISE design tools, see the ISE In-Depth Tutorial on
the Xilinx® web site at: http://www.xilinx.com/support/techsup/tutorials/




Getting Started

Software Requirements:

To use this tutorial, you must install the following software:

• ISE 8.2i

For more information about installing Xilinx® software, see the ISE Release Notes
and Installation Guide at: http://www.xilinx.com/support/software_manuals.htm.

Hardware Requirements:

To use this tutorial, you must have the following hardware:

• Spartan-3 Startup Kit, containing the Spartan-3 Startup Kit Demo Board

Starting the ISE Software

To start ISE, double-click the desktop icon, or start ISE from the Start menu by
selecting:

Start → All Programs → Xilinx ISE 8.2i → Project Navigator

Note: Your start-up path is set during the installation process and may differ from the
one above.

Accessing Help

At any time during the tutorial, you can access online help for additional information
about the ISE software and related tools.

To open Help, do either of the following:

• Press F1 to view Help for the specific tool or function that you have selected or
highlighted.

• Launch the ISE Help Contents from the Help menu. It contains information about
creating and maintaining your complete design flow in ISE.




Figure 1: ISE Help Topics

Create a New Project

Create a new ISE project which will target the FPGA device on the Spartan-3 Startup
Kit demo board.

To create a new project:

1. Select File > New Project... The New Project Wizard appears.

2. Type tutorial in the Project Name field.

3. Enter or browse to a location (directory path) for the new project. A tutorial
subdirectory is created automatically.

4. Verify that HDL is selected from the Top-Level Source Type list.

5. Click Next to move to the device properties page.

6. Fill in the properties in the table as shown below:

♦ Product Category: All


♦ Family: Spartan3

♦ Device: XC3S200

♦ Package: FT256

♦ Speed Grade: -4

♦ Top-Level Module Type: HDL

♦ Synthesis Tool: XST (VHDL/Verilog)

♦ Simulator: ISE Simulator (VHDL/Verilog)

♦ Verify that Enable Enhanced Design Summary is selected.

Leave the default values in the remaining fields.

When the table is complete, your project properties will look like the following:




Figure 2: Project Device Properties

7. Click Next to proceed to the Create New Source window in the New Project
Wizard. At the end of the next section, your new project will be complete.




Create a Verilog HDL Source

In this section, I will create an example top-level Verilog HDL file.

Creating a Verilog Source

Create the top-level Verilog source file as follows:

1. Click New Source in the New Project dialog box.


2. Select Verilog Module as the source type in the New Source dialog box.

3. Type in the file name counter.

4. Verify that the Add to Project checkbox is selected.

5. Click Next.

6. Declare the ports for the counter design by filling in the port information as shown
below:




                               Figure 5: Define Module



7. Click Next, then Finish in the New Source Information dialog box to complete the
new source file template.

8. Click Next, then Next, then Finish.

The source file containing the counter module displays in the Workspace, and the
counter displays in the Sources tab, as shown below:




Figure 6: New Project in ISE

Using Language Templates (Verilog)

The next step in creating the new source is to add the behavioral description for
counter.




Use a simple counter code example from the ISE Language Templates and customize
it for the counter design.

1. Place the cursor on the line below the output [3:0] COUNT_OUT; statement.

2. Open the Language Templates by selecting Edit → Language Templates…

Note: You can tile the Language Templates and the counter file by selecting Window
→ Tile Vertically to make them both visible.

3. Using the “+” symbol, browse to the following code example:

Verilog → Synthesis Constructs → Coding Examples → Counter → Binary →

Up/Down Counters → Simple Counter

4. With Simple Counter selected, select Edit → Use in File, or select the Use
Template in File toolbar button. This step copies the template into the counter source
file.

5. Close the Language Templates.

Final Editing of the Verilog Source

1. To declare and initialize the register that stores the counter value, modify the
declaration statement in the first line of the template as follows:

replace: reg [<upper>:0] <reg_name>;

with: reg [3:0] count_int = 0;

2. Customize the template for the counter design by replacing the port and signal
name placeholders with the actual ones as follows:

♦ replace all occurrences of <clock> with CLOCK

♦ replace all occurrences of <up_down> with DIRECTION

♦ replace all occurrences of <reg_name> with count_int



3. Add the following line just above the endmodule statement to assign the register
value to the output port:

assign COUNT_OUT = count_int;

4. Save the file by selecting File → Save.

When you are finished, the code for the counter will look like the following:

module counter(CLOCK, DIRECTION, COUNT_OUT);
    input CLOCK;
    input DIRECTION;
    output [3:0] COUNT_OUT;

    reg [3:0] count_int = 0;

    always @(posedge CLOCK)
        if (DIRECTION)
            count_int <= count_int + 1;
        else
            count_int <= count_int - 1;

    assign COUNT_OUT = count_int;
endmodule

You have now created the Verilog source for the tutorial project.

Checking the Syntax of the New Counter Module

When the source files are complete, check the syntax of the design to find errors and
typos.

1. Verify that Synthesis/Implementation is selected from the drop-down list in the
Sources window.



2. Select the counter design source in the Sources window to display the related
processes in the Processes window.

3. Click the “+” next to the Synthesize-XST process to expand the process group.

4. Double-click the Check Syntax process.

Note: You must correct any errors found in your source files. You can check for
errors in the Console tab of the Transcript window. If you continue without valid
syntax, you will not be able to simulate or synthesize your design.

5. Close the HDL file.

Design Simulation

Verifying Functionality using Behavioral Simulation

Create a test bench waveform containing input stimulus you can use to verify the
functionality of the counter module. The test bench waveform is a graphical view of a
test bench.

Create the test bench waveform as follows:

1. Select the counter HDL file in the Sources window.

2. Create a new test bench source by selecting Project → New Source.

3. In the New Source Wizard, select Test Bench WaveForm as the source type, and
type counter_tbw in the File Name field.

4. Click Next.

5. The Associated Source page shows that you are associating the test bench
waveform with the source file counter. Click Next.

6. The Summary page shows that the source will be added to the project, and it
displays the source directory, type and name. Click Finish.

7. You need to set the clock frequency, setup time and output delay times in the
Initialize Timing dialog box before the test bench waveform editing window opens.


The requirements for this design are the following:

♦ The counter must operate correctly with an input clock frequency = 25 MHz.

♦ The DIRECTION input will be valid 10 ns before the rising edge of CLOCK.

♦ The output (COUNT_OUT) must be valid 10 ns after the rising edge of CLOCK.

The design requirements correspond with the values below.




Fill in the fields in the Initialize Timing dialog box with the following information:

♦ Clock Time High: 20 ns.

♦ Clock Time Low: 20 ns.

♦ Input Setup Time: 10 ns.

♦ Output Valid Delay: 10 ns.

♦ Offset: 0 ns.

♦ Global Signals: GSR (FPGA)

Note: When GSR(FPGA) is enabled, 100 ns. is added to the Offset value
automatically.

♦ Initial Length of Test Bench: 1500 ns.

Leave the default values in the remaining fields.
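
As a quick check, the Clock Time High and Clock Time Low values follow directly from the 25 MHz clock requirement stated above. The symbols used here are introduced only for this worked calculation:

```latex
\[
T_{\text{clk}} = t_{\text{high}} + t_{\text{low}}
             = 20\,\text{ns} + 20\,\text{ns} = 40\,\text{ns},
\qquad
f_{\text{clk}} = \frac{1}{T_{\text{clk}}}
             = \frac{1}{40\,\text{ns}} = 25\,\text{MHz}
\]
```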




Figure 7: Initialize Timing




8. Click Finish to complete the timing initialization.

9. The blue shaded areas that precede the rising edge of the CLOCK correspond to the
Input Setup Time in the Initialize Timing dialog box. Toggle the DIRECTION port to
define the input stimulus for the counter design as follows:

♦ Click on the blue cell at approximately 300 ns to assert DIRECTION high so
that the counter will count up.

♦ Click on the blue cell at approximately 900 ns to assert DIRECTION low so
that the counter will count down.

Note: For more accurate alignment, you can use the Zoom In and Zoom Out toolbar
buttons.




                           Figure 8: Test Bench Waveform

10. Save the waveform.

11. In the Sources window, select the Behavioral Simulation view to see that the test
bench waveform file is automatically added to your project.




        Figure 9: Behavior Simulation Selection

12. Close the test bench waveform.

Create a Self-Checking Test Bench Waveform

       Add the expected output values to finish creating the test bench waveform.
This transforms the test bench waveform into a self-checking test bench waveform.
The key benefit to a self-checking test bench waveform is that it compares the desired
and actual output values and flags errors in your design as it goes through the various
transformations, from behavioral HDL to the device specific representation.

       To create a self-checking test bench, edit output values manually, or run the
Generate Expected Results process to create them automatically. If you run the
Generate Expected Results process, visually inspect the output values to see if they
are the ones you expected for the given set of input values.


To create the self-checking test bench waveform automatically, do the following:

1. Verify that Behavioral Simulation is selected from the drop-down list in the
Sources window.

2. Select the counter_tbw file in the Sources window.

3. In the Processes tab, click the “+” to expand the Xilinx ISE Simulator process and
double-click the Generate Expected Simulation Results process. This process
simulates the design in a background process.

4. The Expected Results dialog box opens. Select Yes to annotate the results to the
test bench.




                      Figure 10: Expected Results Dialog Box




5. Click the “+” to expand the COUNT_OUT bus and view the transitions that
correspond to the Output Valid Delay value (yellow cells) specified in the Initialize
Timing dialog box.




Figure 11: Test Bench Waveform with Results




6. Save the test bench waveform and close it.

You have now created a self-checking test bench waveform.
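
For comparison, the same idea can be written by hand as an HDL test bench. The sketch below is not generated by ISE; it is a minimal illustration that reuses the timing values from the Initialize Timing dialog box (20 ns high, 20 ns low, inputs set 10 ns before a rising edge, outputs checked 10 ns after it). The test bench name counter_check_tb, the expected register and the errors counter are hypothetical, and the reference model deliberately mirrors the counter, so it only demonstrates the checking mechanism.

```verilog
`timescale 1ns / 1ns

// Copy of the tutorial's counter, repeated so that this file simulates
// on its own. Omit it if counter.v is compiled alongside.
module counter(CLOCK, DIRECTION, COUNT_OUT);
    input CLOCK;
    input DIRECTION;
    output [3:0] COUNT_OUT;
    reg [3:0] count_int = 0;
    always @(posedge CLOCK)
        if (DIRECTION)
            count_int <= count_int + 1;
        else
            count_int <= count_int - 1;
    assign COUNT_OUT = count_int;
endmodule

// Hypothetical hand-written self-checking test bench.
module counter_check_tb;
    reg        CLOCK     = 0;
    reg        DIRECTION = 0;
    wire [3:0] COUNT_OUT;
    reg  [3:0] expected  = 0;   // reference value the output is compared with
    integer    errors    = 0;   // number of mismatches seen so far

    counter uut (.CLOCK(CLOCK), .DIRECTION(DIRECTION), .COUNT_OUT(COUNT_OUT));

    // 20 ns low + 20 ns high = 40 ns period (25 MHz); rising edges at 20, 60, ...
    always #20 CLOCK = ~CLOCK;

    // Reference model: track the value the counter should hold.
    always @(posedge CLOCK)
        expected <= DIRECTION ? expected + 1 : expected - 1;

    // Check the output 10 ns after each rising edge (Output Valid Delay).
    always @(posedge CLOCK) begin
        #10;
        if (COUNT_OUT !== expected) begin
            errors = errors + 1;
            $display("MISMATCH at %0d ns: COUNT_OUT=%0d expected=%0d",
                     $time, COUNT_OUT, expected);
        end
    end

    // Stimulus: inputs change 10 ns before the edges at 300 ns and 900 ns.
    initial begin
        #290 DIRECTION = 1;     // count up from about 300 ns
        #600 DIRECTION = 0;     // count down from about 900 ns
        #600;                   // stop shortly before 1500 ns
        if (errors == 0)
            $display("PASS");
        else
            $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

In a real project the expected values would normally come from an independent source, such as the annotated waveform above, rather than from a model that copies the design.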

Simulating Design Functionality

Verify that the counter design functions as you expect by performing behavioral
simulation as follows:

1. Verify that Behavioral Simulation and counter_tbw are selected in the Sources
window.

2. In the Processes tab, click the “+” to expand the Xilinx ISE Simulator process and
double-click the Simulate Behavioral Model process.


The ISE Simulator opens and runs the simulation to the end of the test bench.

3. To view your simulation results, select the Simulation tab and zoom in on the
transitions.

The simulation waveform results will look like the following:




                           Figure 12: Simulation Results




Note: You can ignore any rows that start with TX.




4. Verify that the counter is counting up and down as expected.

5. Close the simulation view. If you are prompted with the following message, “You
have an active simulation open. Are you sure you want to close it?”, click Yes to
continue.

You have now completed simulation of your design using the ISE Simulator.




CHAPTER-8
                           HARDWARE TOOLS
        A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing—hence the name
"field-programmable". FPGAs are programmed using a logic circuit diagram or a
source code in a hardware description language (HDL) to specify how the chip will
work.
        They can be used to implement any logical function that an application-
specific integrated circuit (ASIC) could perform, but the ability to update the
functionality after shipping offers advantages for many applications. FPGAs contain
programmable logic components called "logic blocks", and a hierarchy of
reconfigurable interconnects that allow the blocks to be "wired together"—somewhat
like a one-chip programmable breadboard.
        Logic blocks can be configured to perform complex combinational functions,
or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks
also include memory elements, which may be simple flip-flops or more complete
blocks of memory.
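
To illustrate how one configurable element can act as different gates, the following sketch models a strongly simplified logic block: a 3-input lookup table followed by an optional flip-flop. This is an assumption-laden teaching model, not the actual structure of any Xilinx device; the module name simple_logic_block, its parameters and its ports are hypothetical.

```verilog
// Hypothetical, simplified model of a configurable logic block.
// LUT_INIT is the truth table: bit i is the output for input value i.
//   8'h96 -> 3-input XOR,  8'h80 -> 3-input AND
module simple_logic_block #(
    parameter [7:0] LUT_INIT = 8'h96,  // configured function
    parameter       USE_FF   = 0       // 1: registered output, 0: combinational
) (
    input  wire       clk,
    input  wire [2:0] lut_in,
    output wire       out
);
    wire [7:0] truth   = LUT_INIT;       // truth table held as a constant net
    wire       lut_out = truth[lut_in];  // look up the selected bit

    reg ff = 1'b0;                       // optional memory element
    always @(posedge clk)
        ff <= lut_out;

    assign out = USE_FF ? ff : lut_out;
endmodule
```

Changing only LUT_INIT turns the same block into a different logic function, which is the sense in which the logic is "configured" rather than hard-wired.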
7.1 HISTORY
        The FPGA industry sprouted from programmable read only memory (PROM)
and programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field programmable),
however programmable logic was hard-wired between logic gates.
        Xilinx Co-Founders, Ross Freeman and Bernard Vonderschmitt, invented the
first commercially viable field programmable gate array in 1985 – the XC2064. The
XC2064 had programmable gates and programmable interconnects between gates, the
beginnings of a new technology and market. The XC2064 boasted a mere 64
configurable logic blocks (CLBs), each with two 3-input lookup tables (LUTs). More
than 20 years later, Freeman was inducted into the National Inventors Hall of Fame
for his invention.




7.2 ARCHITECTURE
       The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit into the
height of one row or the width of one column in the array.
       An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the design,
the number of routing tracks needed may vary considerably even among designs with
the same amount of logic.




                  Fig 7.1 Internal Structure of FPGA


7.3 APPLICATIONS
       Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer
vision, speech recognition, cryptography, bioinformatics, computer hardware
emulation, radio astronomy and a growing range of other areas.

7.4 A BRIEF TUTORIAL: SOURCE CODE IS DUMPED INTO FPGA.
1.   Now let’s look at the flow for actually synthesizing and implementing the
     design on the FPGA prototyping boards. Close ModelSim and go back to the
     Xilinx ISE environment. In the Sources subwindow, change the selection in
     the dropdown box from “Behavioral Simulation” to
     “Synthesis/Implementation”.


2.   To properly synthesize the design, we need to specify which pins on the chip
     the inputs and outputs should be assigned to. In general, we could
     assign the signals just about any way we want. Since we will be using specific
     prototype boards, however, we need to make sure our pin assignments match the
     switches, buttons, and LEDs so we can test our design. We will be starting
     with Digilab 2E boards that are connected to Digilab DIO2 input/output
     boards. The I/O board has already been programmed and configured to have
     the following connections:




3.   To assign specific pins, expand the User Constraints selection under the
     Process subwindow and double-click on Assign Package Pins.




4.   A new application called Xilinx PACE should be launched.




     a. In the Design Object List subwindow you should see a listing of all the
        input and output signals from our design.




        Here is where we can specify which pin locations we want for each signal.
        Simply enter the pin numbers from the connections listed in Step 2 above,
        making sure to use a capital letter “P” in front of the pin specification.
        Let’s assign our signals as follows:

            A   →  P163 (Switch 1)
            I0  →  P164 (Switch 2)
            I1  →  P166 (Switch 3)
            Y   →  P149 (LED 0)

        Once all pins have been assigned, save your constraints by selecting
        File → Save from the menu bar and exit Xilinx PACE.
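
For orientation, the four signals above are consistent with a 2-to-1 multiplexer such as the one sketched below. The source of the design being downloaded is not reproduced in this section, so the module body and the role of each port (A as the select input, I0 and I1 as data inputs) are assumptions made for illustration; only the signal names and pin numbers come from the list above.

```verilog
// Hypothetical 2-to-1 multiplexer matching the signal names used above.
// Port roles are assumed; pin numbers in the comments repeat the
// assignments entered in Xilinx PACE.
module mux(A, I0, I1, Y);
    input  A;    // assumed select input, Switch 1 (P163)
    input  I0;   // data input 0,        Switch 2 (P164)
    input  I1;   // data input 1,        Switch 3 (P166)
    output Y;    // output,              LED 0    (P149)

    // Y follows I1 when A is 1, otherwise I0.
    assign Y = A ? I1 : I0;
endmodule
```

With such a design on the board, toggling Switch 1 would be expected to choose which of the other two switches drives LED 0.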


5.   Back in Xilinx ISE, in the Processes subwindow, double-click on the
     Synthesize – XST selection and wait for the process to complete. Then
     double-click on the Implement Design selection and wait for the process to
     complete. Then double-click on the Generate Programming File selection and
     wait for the process to complete. If all goes well, you should have green
     check marks for the whole design.




6.   There is a lot of information you can obtain through all of the objects listed in
     the Processes subwindow, but let us proceed to downloading the design onto
     the prototyping board for testing. First make sure the prototyping board is
     connected to the PC and is powered on. Also make sure the slide switch on the
     FPGA board by the parallel port is set to JTAG (as opposed to “Port”). Then
     select Configure Device (iMPACT) underneath the Generate Programming
     File selection. You should see the following window:




7.   Now you need to specify which bitstream file to use to configure the device.
     For this tutorial we want to select the mux.bit file and click Open.




You will probably get the message below. Just click Yes.




You will also get a warning message saying the JTAG clock was updated in
     the bitstream file (which is good) so just click OK. There is a way to correct
     for that in the original design flow, but Xilinx automatically catches it here so
     I don’t usually bother.




8.   You should now see the Spartan XC2S200E chip in the main window. Right
     click on the chip to prepare for downloading the bitstream file.




     Select Program on the resulting window.




9.   Click OK.




If all goes well, you should get the Programming Succeeded message.




10.    Now just test and verify your design on the actual FPGA board!




                                CONCLUSION
       The design, implementation and simulation of a 21×21-bit, radix-8 multiplier
unit for a specific purpose have been carried out. The number of transistors is 8224,
with an active area of 2.97 mm². The measured multiplication time is 9.4 ns, and
the power dissipation is 60.7 mW at a frequency of 10 MHz. It has been shown that
a radix-8 architecture can be useful in high-speed multipliers for a specific
purpose because of the gain in time and number of transistors compared to the
conventional radix-4 recoding architecture.
       This can be achieved with a slight modification of the previous adder: two
additional bits (intermediate carries) must be stored for each word in the set of
numbers. Memory needs increase by 9.5%, while the time decrease in the previous
adder can be estimated at 42%. As a result, the overall multiplication time can be
reduced with our radix-8 architecture for a specific purpose.


                                          59
REFERENCE

[1]    Dong-Wook Kim, Young-Ho Seo, “A New VLSI Architecture of Parallel
Multiplier-Accumulator based on Radix-2 Modified Booth Algorithm”, Very Large
Scale Integration (VLSI) Systems, IEEE Transactions, vol.18, pp.: 201-208, 04 Feb.
2010


[2] Prasanna Raj P, Rao, Ravi, “VLSI Design and Analysis of Multipliers for Low
Power”, Intelligent Information Hiding and Multimedia Signal Processing, Fifth
International Conference, pp.: 1354-1357, Sept. 2009




                                         60
[3] Lakshmanan, Masuri Othman and Mohamad Alauddin Mohd.Ali, “High
Performance Parallel Multiplier using Wallace-Booth Algorithm”, Semiconductor
Electronics, IEEE International Conference , pp.: 433- 436, Dec. 2002.


[4] Jan M Rabaey, “Digital Integrated Circuits, A Design Perspective”, Prentice Hall,
Dec.1995


[5] Louis P. Rubinfield, “A Proof of the Modified Booth's Algorithm for
Multiplication”, Computers, IEEE Transactions,vol.24, pp.: 1014-1015, Oct. 1975


[6] Rajendra Katti, “A Modified Booth Algorithm for High Radix Fixedpoint
Multiplication”, Very Large Scale Integration (VLSI) Systems, IEEE Transactions,
vol. 2, pp.: 522-524, Dec. 1994.


7] C. S. Wallace, “A Suggestion for a Fast Multiplier”, Electronic Computers, IEEE
Transactions, vol.13, Page(s): 14-17, Feb. 1964


[8] Hussin R et al , “An Efficient Modified Booth Multiplier Architecture”, IEEE
International Conference, pp.:1-4, 2008.




                                           61

More Related Content

Viewers also liked

Layout and Design Analysis of Carry Look Ahead Adder using 90nm Technology
Layout and Design Analysis of Carry Look Ahead Adder using 90nm Technology Layout and Design Analysis of Carry Look Ahead Adder using 90nm Technology
Layout and Design Analysis of Carry Look Ahead Adder using 90nm Technology
IJEEE
 
Nexgen tech vlsi 2015 2014
Nexgen  tech vlsi 2015 2014Nexgen  tech vlsi 2015 2014
Nexgen tech vlsi 2015 2014nexgentech
 
C0161018
C0161018C0161018
C0161018
IOSR Journals
 
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adderFpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adderMalik Tauqir Hasan
 
Good report on Adders/Prefix adders
Good report on Adders/Prefix addersGood report on Adders/Prefix adders
Good report on Adders/Prefix addersPeeyush Pashine
 
Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2IAEME Publication
 
Minimum Cost Fault Tolerant Adder Circuits in Reversible Logic Synthesis
Minimum Cost Fault Tolerant Adder Circuits in Reversible Logic SynthesisMinimum Cost Fault Tolerant Adder Circuits in Reversible Logic Synthesis
Minimum Cost Fault Tolerant Adder Circuits in Reversible Logic SynthesisSajib Mitra
 
Cmos Arithmetic Circuits
Cmos Arithmetic CircuitsCmos Arithmetic Circuits
Cmos Arithmetic Circuits
ankitgoel
 

Viewers also liked (11)

Layout and Design Analysis of Carry Look Ahead Adder using 90nm Technology
Layout and Design Analysis of Carry Look Ahead Adder using 90nm Technology Layout and Design Analysis of Carry Look Ahead Adder using 90nm Technology
Layout and Design Analysis of Carry Look Ahead Adder using 90nm Technology
 
adders(1)
adders(1)adders(1)
adders(1)
 
report
reportreport
report
 
Nexgen tech vlsi 2015 2014
Nexgen  tech vlsi 2015 2014Nexgen  tech vlsi 2015 2014
Nexgen tech vlsi 2015 2014
 
C0161018
C0161018C0161018
C0161018
 
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adderFpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
Fpga 07-port-rules-gate-delay-data-flow-carry-look-ahead-adder
 
Good report on Adders/Prefix adders
Good report on Adders/Prefix addersGood report on Adders/Prefix adders
Good report on Adders/Prefix adders
 
Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2Analysis of different bit carry look ahead adder using verilog code 2
Analysis of different bit carry look ahead adder using verilog code 2
 
Minimum Cost Fault Tolerant Adder Circuits in Reversible Logic Synthesis
Minimum Cost Fault Tolerant Adder Circuits in Reversible Logic SynthesisMinimum Cost Fault Tolerant Adder Circuits in Reversible Logic Synthesis
Minimum Cost Fault Tolerant Adder Circuits in Reversible Logic Synthesis
 
Cmos Arithmetic Circuits
Cmos Arithmetic CircuitsCmos Arithmetic Circuits
Cmos Arithmetic Circuits
 
Adder
Adder Adder
Adder
 

Similar to High bit rate_mul

unit 1vlsi notes.pdf
unit 1vlsi notes.pdfunit 1vlsi notes.pdf
unit 1vlsi notes.pdf
AcademicICECE
 
VLSI unit 1 Technology - S.ppt
VLSI unit 1 Technology - S.pptVLSI unit 1 Technology - S.ppt
VLSI unit 1 Technology - S.ppt
indrajeetPatel22
 
Report
ReportReport
1 VLSI Introduction.pptx
1 VLSI Introduction.pptx1 VLSI Introduction.pptx
1 VLSI Introduction.pptx
ShishirAhmed39
 
VLSI Design- Guru.ppt
VLSI Design- Guru.pptVLSI Design- Guru.ppt
VLSI Design- Guru.ppt
Ram Pavithra Guru
 
Gourp 12 Report.pptx
Gourp 12 Report.pptxGourp 12 Report.pptx
Gourp 12 Report.pptx
ShubhamMane733576
 
VLSI UNIT-1.1.pdf.ppt
VLSI UNIT-1.1.pdf.pptVLSI UNIT-1.1.pdf.ppt
VLSI UNIT-1.1.pdf.ppt
rajukolluri
 
introduction to cmos vlsi
introduction to cmos vlsi introduction to cmos vlsi
introduction to cmos vlsi
ssuser593a2d
 
Pasta documentation
Pasta  documentationPasta  documentation
Pasta documentation
chendrashekar pabbaraju
 
Digital Integrated Circuit (IC) Design
Digital Integrated Circuit (IC) DesignDigital Integrated Circuit (IC) Design
Digital Integrated Circuit (IC) Design
Mahesh Dananjaya
 
Chapter1.slides
Chapter1.slidesChapter1.slides
Chapter1.slides
Avinash Pillai
 
Electronics.ppt
Electronics.pptElectronics.ppt
Electronics.ppt
AbdullahGubbi1
 
Electronics for basic engineers and .ppt
Electronics for basic engineers and .pptElectronics for basic engineers and .ppt
Electronics for basic engineers and .ppt
High bit rate_mul

  • 1. CHAPTER-1 INTRODUCTION TO VLSI DOMAIN
1.1 VLSI DESIGN:
The complexity of the VLSI circuits being designed and used today makes a manual approach to design impractical. Design automation is the order of the day. With the rapid technological developments of the last two decades, the status of VLSI technology is characterized by the following:
• A steady increase in the size and hence the functionality of ICs.
• A steady reduction in feature size and hence an increase in the speed of operation as well as in gate or transistor density.
• A steady improvement in the predictability of circuit behavior.
• A steady increase in the variety and size of software tools for VLSI design.
The above developments have resulted in a proliferation of approaches to VLSI design.
1.2 HISTORY OF VLSI:
VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. The term is no longer as common as it once was, as chips have increased in complexity into the hundreds of millions of transistors. This is the field that involves packing more and more logic devices into smaller and smaller areas. VLSI circuits can now be put into a space a few millimeters across. VLSI circuits are everywhere: in our computers, cars, state-of-the-art digital cameras, cell phones, and more.
1.3 VARIOUS INTEGRATIONS:
In the early days of integrated circuits, only a few transistors could be placed on a chip as the scale used was large because of the contemporary technology, and
  • 2. manufacturing yields were low by today's standards. As the degree of integration was small, the design was done easily. Over time, millions, and today billions, of transistors could be placed on one chip, and making a good design became a task that had to be planned thoroughly.
1.3.1 SSI TECHNOLOGY:
The first integrated circuits contained only a few transistors. Called "small-scale integration" (SSI), digital circuits containing transistors numbering in the tens provided, for example, a few logic gates, while early linear ICs such as the Plessey SL201 or the Philips TAA320 had as few as two transistors. The term Large Scale Integration was first used by IBM scientist Rolf Landauer when describing the theoretical concept; from there came the terms SSI, MSI, VLSI, and ULSI.
1.3.2 MSI TECHNOLOGY:
The next step in the development of integrated circuits, taken in the late 1960s, introduced devices which contained hundreds of transistors on each chip, called "medium-scale integration" (MSI). They were attractive economically because, while they cost little more to produce than SSI devices, they allowed more complex systems to be produced using smaller circuit boards, less assembly work (because of fewer separate components), and a number of other advantages.
1.3.3 LARGE SCALE INTEGRATION:
Further development, driven by the same economic factors, led to "large-scale integration" (LSI) in the mid 1970s, with tens of thousands of transistors per chip. Integrated circuits such as 1K-bit RAMs, calculator chips, and the first microprocessors, which began to be manufactured in moderate quantities in the early 1970s, had under 4000 transistors. True LSI circuits, approaching 10,000 transistors, began to be produced around 1974 for computer main memories and second-generation microprocessors.
1.3.4 VLSI:
The final step in the development process, starting in the 1980s and continuing through the present, was "very-large-scale integration" (VLSI). The development started with hundreds of thousands of transistors in the early 1980s, and continues beyond several billion transistors as of 2009. In 1986 the first one-megabit RAM chips were introduced, which contained more than one million transistors. Microprocessor chips passed the
  • 3. million transistor mark in 1989 and the billion transistor mark in 2005. The trend continues largely unabated, with chips introduced in 2007 containing tens of billions of memory transistors.
VLSI DESIGN FLOW:
[Figure: flow chart with the blocks Start, Design Entry, Logic Synthesis, Pre layout Simulation, System Partitioning, Floor Planning, Pre layout Simulation, Placement, Routing, Circuit Extraction, Finish]
Fig 2.1 VLSI design flow
  • 4. 1.4 ULSI, WSI, SOC and 3D-IC:
To reflect further growth in complexity, the term ULSI, which stands for "ultra-large-scale integration", was proposed for chips with more than 1 million transistors.
Wafer-scale integration (WSI) is a system of building very large integrated circuits that uses an entire silicon wafer to produce a single "super-chip". Through a combination of large size and reduced packaging, WSI could lead to dramatically reduced costs for some systems.
A system-on-a-chip (SOC) is an integrated circuit in which all the components needed for a computer or other system are included on a single chip. The design of such a device can be complex and costly, and building disparate components on a single piece of silicon may compromise the efficiency of some elements. However, these drawbacks are offset by lower manufacturing and assembly costs and by a greatly reduced power budget: because signals among the components are kept on-die, much less power is required.
A three-dimensional integrated circuit (3D-IC) has two or more layers of active electronic components that are integrated both vertically and horizontally into a single circuit, which can reduce power consumption.
1.5 VLSI DESIGN FLOW AND ITS DESCRIPTION:
The design at the behavioral level is elaborated in terms of known and acknowledged functional blocks. It forms the next detailed level of design description. Once again the design is tested through simulation and iteratively corrected for errors. The elaboration can be continued one or two steps further. It leads to a detailed design description in terms of logic gates and transistor switches.
Optimization
The circuit at the gate level (in terms of gates and flip-flops) can be redundant in nature. It can be minimized with the help of minimization tools. This step is not shown separately in the figure. The minimized logical design is converted to a circuit in terms of the switch-level cells from standard libraries provided by the foundries. The cell-based design generated by the tool is the last step in the logical design process; it forms the input to the first level of physical design.
Simulation
The design descriptions are tested for their functionality at every level: behavioral, data flow, and gate. One has to check here whether all the functions are carried out as expected and rectify any that are not. All such activities are carried out by the
  • 5. simulation tool. The tool also has an editor to carry out any corrections to the source code. Simulation involves testing the design for all its functions, functional sequences, timing constraints, and specifications. Normally testing and simulation at all the levels, from behavioral to switch level, are carried out by a single tool; this is identified as "scope of simulation tool" in Figure 1.1.
  • 6. Synthesis With the availability of design at the gate (switch) level, the logical design is complete. The corresponding circuit hardware realization is carried out by a synthesis tool. Two common approaches are as follows: • The circuit is realized through an FPGA. The gate level design description is the starting point for the synthesis here. The FPGA vendors provide an interface to the synthesis tool. Through the interface the gate level design is realized as a final circuit. With many synthesis tools, one can directly use the design description at the data flow level itself to realize the final circuit through an FPGA. The FPGA route is attractive for limited volume production or a fast development cycle. • The circuit is realized as an ASIC. A typical ASIC vendor will have his own library of basic components like elementary gates and flip-flops. Eventually the circuit is to be realized by selecting such components and interconnecting them conforming to the required design. This constitutes the physical design. Being an elaborate and costly process, a physical design may call for an intermediate functional verification through the FPGA route. The circuit realized through the FPGA is tested as a prototype. It provides another opportunity for testing the design closer to the final circuit. Physical Design A fully tested and error-free design at the switch level can be the starting point for a physical design [Baker & Boyce, Wolf]. It is to be realized as the final circuit using (typically) a million components in the foundry’s library. The step-by-step activities in the process are described briefly as follows: • System partitioning: The design is partitioned into convenient compartments or functional blocks. Often it would have been done at an earlier stage itself and the software design prepared in terms of such blocks. Interconnection of the blocks is part of the partition process. 
• Floor planning: The positions of the partitioned blocks are planned and the blocks are arranged accordingly. The procedure is analogous to the planning and arrangement of domestic furniture in a residence. Blocks with I/O pins are kept close to the periphery; those which interact frequently or through a large number of interconnections are kept close together, and so on. Partitioning and floor planning may have to be carried out and refined iteratively to yield the best results.
  • 7. • Placement: The selected components from the ASIC library are placed in position on the "silicon floor." This is done for each of the blocks above.
• Routing: The components placed as described above are interconnected to the rest of the block. This is done for each of the blocks by suitably routing the interconnects. Once the routing is complete, the physical design can be taken as complete. The final mask for the design can be made at this stage and the ASIC manufactured in the foundry.
Post Layout Simulation
Once the placement and routing are completed, performance figures such as silicon area, power consumed, and path delays can be computed. An equivalent circuit can be extracted at the component level and performance analysis carried out. This constitutes the final stage, called "verification." One may have to go through the placement and routing activity once again to improve performance.
Critical Subsystems
The design may have critical subsystems whose performance is crucial to the overall performance; in other words, to improve the system performance substantially, one may have to design such subsystems afresh. The design here may imply redefinition of the basic feature size of the component, component design, placement of components, or routing done separately and specifically for the subsystem. A set of masks used in the foundry may have to be made afresh for the purpose.
  • 8. CHAPTER 2 INTRODUCTION TO THE PROJECT
2.1 Motivation:
Multiplication is a key operation for system performance and is widely used in digital signal processing and digital communications. Traditional array-based multiplication relies on a large number of regular addition and shift operations, and therefore uses more hardware and involves more complex operations.
2.2 Overview of the Project:
Multiplication involves the generation of partial products and their accumulation. The speed of multiplication can be increased by reducing the number of partial products and/or accelerating their accumulation. Among the many methods of implementing high-speed parallel multipliers, there are two basic approaches, namely the Booth algorithm and Wallace tree compressors. This paper describes an efficient implementation of a high-speed parallel multiplier using both approaches. Two multipliers are proposed. The first uses the Radix-4 Booth algorithm with 3:2 compressors, while the second uses the Radix-8 Booth algorithm with 4:2 compressors. The design is structured for m x n multiplication where m and n can reach up to 126 bits. The number of partial products is n/2 in the Radix-4 Booth algorithm and is reduced to n/3 in the Radix-8 Booth algorithm. The Wallace tree uses carry-save adders (CSA) to accumulate the partial products. This reduces the time as well as the chip area. To further enhance the speed of operation, a carry look-ahead (CLA) adder is used as the final adder.
2.3 Organization of Thesis:
The first chapter in this project report is an introduction to Booth encoding. The second chapter gives a brief idea of different types of operations, such as addition and
  • 9. shifting. The third chapter covers the different types of Wallace tree method. The fourth chapter shows the operation of the carry look-ahead adder scheme. The synthesis and simulation results for the calculating processor (CP) are reported in the fifth chapter. Conclusions and future scope are explained in the sixth chapter, and references are given after it. The code for the calculating processor (CP) is put in the Appendix. The efficient implementation of the Radix-8 multiplication operation is an important prerequisite in the Booth algorithm because multiplication operations are performed using Radix-8 representation in the underlying field. The Wallace tree method provides an efficient way of adding the partial products. Three kinds of radix operations are especially amenable to the efficient implementation of multiplication. Finally, a carry look-ahead adder is used in the addition of partial products.
  • 10. CHAPTER 3 BASIC THEORY OF BOOTH ALGORITHM
3.1 Introduction to Booth Algorithm:
The multiplier consists of four major modules: Booth encoder, partial product generator, Wallace tree, and carry look-ahead adder. The Booth encoder performs Radix-2 or Radix-4 encoding of the multiplier bits. Based on the multiplicand and the encoded multiplier, partial products are generated by the generator. For large multipliers of 32 bits, the performance of the modified Booth algorithm is limited, so Booth recoding together with Wallace tree structures has been used in the proposed fast multiplier. The partial products are supplied to the Wallace tree and added appropriately. The two resulting rows are finally added using a carry look-ahead adder (CLA) to get the final product.
Fig 3.1 Block Diagram of Wallace Booth Multiplier
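The Booth encoder stage can be illustrated with a small software model. The following is a minimal Python sketch of Radix-4 recoding, not the VHDL used in the project; the function name `booth4_digits` and the 8-bit width in the usage note are assumptions made for illustration only.

```python
def booth4_digits(y: int, width: int) -> list[int]:
    """Recode a signed `width`-bit multiplier into Radix-4 Booth digits.

    Returns width/2 digits in {-2, -1, 0, +1, +2}, least significant first.
    Each digit is formed from the overlapping bit triplet (y[i+1], y[i], y[i-1]).
    """
    assert width % 2 == 0, "this sketch assumes an even word length"
    u = y & ((1 << width) - 1)   # two's-complement bit pattern of y
    prev = 0                     # y[-1] is defined as 0
    digits = []
    for i in range(0, width, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def digits_to_value(digits: list[int], radix: int) -> int:
    """Weighted sum of the recoded digits; should reproduce the multiplier."""
    return sum(d * radix**i for i, d in enumerate(digits))
```

For an 8-bit multiplier the sketch yields four digits, which matches the n/2 partial products quoted for Radix-4, and `digits_to_value(booth4_digits(y, 8), 4)` gives back `y` for any signed 8-bit `y`.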
  • 11. 3.2 Radix-8 Booth Algorithm

      Multiplier bits           Recoded operation
      Yi+2  Yi+1  Yi   Yi-1     on multiplicand, X
       0     0    0     0         0X
       0     0    0     1        +1X
       0     0    1     0        +1X
       0     0    1     1        +2X
       0     1    0     0        +2X
       0     1    0     1        +3X
       0     1    1     0        +3X
       0     1    1     1        +4X
       1     0    0     0        -4X
       1     0    0     1        -3X
       1     0    1     0        -3X
       1     0    1     1        -2X
       1     1    0     0        -2X
       1     1    0     1        -1X
       1     1    1     0        -1X
       1     1    1     1         0X
  • 12. Table 3.2 Radix-8 Multiplication
Here we have an odd multiple of the multiplicand, 3Y (the multiplicand is written X in Table 3.2), which is not immediately available. To generate it we need to perform a previous add: 2Y+Y=3Y. However, we are designing a multiplier for a specific purpose, and the multiplicand belongs to a previously known set of numbers which are stored in a memory chip. We have tried to take advantage of this fact to ease the bottleneck of the radix-8 architecture, that is, the generation of 3Y. In this manner we try to attain a better overall multiplication time, or at least one comparable to the time we could obtain using a radix-4 architecture (with the additional advantage of using fewer transistors). To generate 3Y with 21-bit words we only have to add 2Y+Y, that is, to add the number with the same number
  • 13. shifted one position to the left, obtaining in this way a new 23-bit word, as shown in figure 3.2.
Fig. 3.2: 21-bit previous add.
In fact, only a 21-bit adder is needed to generate the bit positions from z1 to z21. Bits z0 and z22 are directly known because z0=y0 and z22=y20 (the sign bit of the 2s-complement number; 3Y and Y have the same sign). If just two additional bits are stored in the memory together with each value of the set of numbers, we can decompose the previous add into three shorter adds that can be done in parallel. In this way, the delay is the same as that of a 7-bit adder:
Fig. 3.3: Modified previous add
The bits to be stored are the two intermediate carry signals c8 and c15. Before each word of the set of numbers is stored in the memory, the value of its intermediate carries has to be obtained and stored beside it. In this way, they are immediately available when the previous add is required to get the multiple 3Y of one of the numbers in the set. The increase in memory requirements is relatively small (9.5%, 23 bits instead of 21 for every word), and the gain in time is obvious because we substitute a
  • 14. 21-bit adder by three 7-bit adders which can operate in parallel. In order to get the minimum delay in the previous adder we use high-speed adders. The adders that best fit our needs are the carry and sum select adders (CSSA), whose estimated delay depends on the word length n. So, reducing the word length to one third, the previous add delay will decrease by approximately 42%. Despite this reduction, the previous add delay remains dominant compared with the recoding time, which is the only operation that can be done in parallel with the previous add.
3.3 Multiplier unit design
The multiplication of two 21-bit 2s-complement binary numbers using the algorithm with radix-8 recoding of the multiplier has the following features:
a) Radix-8 recoding of the multiplier reduces the number of digits to 7:
Fig. 3.4: Multiplier recoding.
b) The partial products multiplexer must choose one out of nine possibilities depending on the value of the corresponding signed digit, as shown in figure 3.5:
  • 15. Fig. 3.5: Partial products multiplexer.
c) The partial product length is two bits longer than the multiplicand length, giving 23-bit partial products.
d) The number of partial products entering the Wallace tree structure is 8: 7 coming from the recoded multiplier digits plus another partial product due to the compensation bits of the 2s-complement multiplication algorithm, which cannot be included in any of the other 7 words.
e) The best structure for the reduction of 8 partial products applies only 4-2 compressors [7] (instead of the conventional full adders). The Wallace tree has the following scheme, with an equivalent delay of 6 logic gates:
Fig. 8: Wallace reduction tree.
  • 16. f) The previous and the final add must be done as fast as possible, so they are implemented with carry and sum select adders (CSSA).
To give a better understanding of the multiplier design, we show an example following the radix-8 recoding algorithm. Consider the multiplication of these 2s-complement binary numbers:
Multiplicand: 111100010010110111001
Multiplier: 100011010100110100111
The multiplier recoding has the result shown here (following table 1):
The generation of three times the multiplicand gives:
The partial products array and its summation, which gives the multiplication result, are shown in figure 9. In the array, some bits are encircled (fixed 1's); they avoid the sign extension of the partial products. Some other bits are squared; they will be 1's when the corresponding partial product has to be complemented (if recoding gives a negative digit). The leading four partial products enter the first block of 4-2 compressors, while the other three partial products plus the compensation bits enter the second block of 4-2 compressors, still in the first compression level. Moreover, the final adder has been decomposed into three adders with lengths 3, 6 and 31 bits. The 31-bit adder is the proper final adder, while the 3-bit and 6-bit adders are used to advance bits of the final result without passing through all the compression blocks in the Wallace tree.
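The recoding of Table 3.2 and the split previous add can be checked with a small software model. The following Python sketch is an illustration under stated assumptions, not the hardware design itself: all function names are hypothetical, the multiplicand is called `x` (as in Table 3.2) and the multiplier `y`, plain integer additions stand in for the CSSA adders, and the partial products are summed with ordinary addition instead of a Wallace tree.

```python
WIDTH = 21                 # word length used in the text
SEG = 7                    # length of each of the three short adders
MASK7 = (1 << SEG) - 1


def to_signed(bits: str) -> int:
    """Interpret a bit string as a 2s-complement number."""
    v = int(bits, 2)
    return v - (1 << len(bits)) if bits[0] == "1" else v


def booth8_digits(y: int) -> list[int]:
    """Radix-8 recoding per Table 3.2: WIDTH/3 digits in [-4, +4], LSD first."""
    u = y & ((1 << WIDTH) - 1)
    prev, digits = 0, []
    for i in range(0, WIDTH, 3):
        b0, b1, b2 = (u >> i) & 1, (u >> (i + 1)) & 1, (u >> (i + 2)) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2
    return digits


def _operands(x: int) -> tuple[int, int]:
    """X sign-extended to 23 bits, and 2X as a 23-bit pattern."""
    mask23 = (1 << (WIDTH + 2)) - 1
    return x & mask23, (2 * x) & mask23


def stored_carries(x: int) -> tuple[int, int]:
    """Carries c8 and c15, computed once before the word is stored."""
    a, b = _operands(x)
    c8 = (((a >> 1) & MASK7) + ((b >> 1) & MASK7)) >> SEG
    c15 = (((a >> 8) & MASK7) + ((b >> 8) & MASK7) + c8) >> SEG
    return c8, c15


def three_times(x: int, c8: int, c15: int) -> int:
    """3X from three independent 7-bit adds that use the stored carries."""
    a, b = _operands(x)
    z = a & 1                                    # z0 = x0
    for shift, cin in ((1, 0), (8, c8), (15, c15)):
        s = ((a >> shift) & MASK7) + ((b >> shift) & MASK7) + cin
        z |= (s & MASK7) << shift                # bits z1..z21
    sign = (x >> (WIDTH - 1)) & 1
    z |= sign << (WIDTH + 1)                     # z22 = x20
    return z - (1 << (WIDTH + 2)) if sign else z


def booth8_multiply(x: int, y: int) -> int:
    """Sum the 7 recoded partial products (software stand-in for the tree)."""
    multiples = {0: 0, 1: x, 2: 2 * x, 3: three_times(x, *stored_carries(x)), 4: 4 * x}
    total = 0
    for i, d in enumerate(booth8_digits(y)):
        pp = multiples[abs(d)]
        total += (-pp if d < 0 else pp) << (3 * i)
    return total
```

Applied to the operands of the example, `booth8_multiply(to_signed("111100010010110111001"), to_signed("100011010100110100111"))` can be compared against Python's built-in product of the same two integers, and `booth8_digits` returns the seven digits expected from feature a).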
  • 17. CHAPTER 4 Wallace Tree
The Wallace tree method is used in high-speed designs to reduce the partial products to two rows that can be added in the last stage. The critical path and the number of adders are also reduced compared with conventional parallel adders. Here the Wallace tree takes the role of accelerating the accumulation of the partial products. Its advantage becomes more pronounced for multipliers of more than 16 bits. The speed, area and power consumption of the multipliers are directly related to the efficiency of the compressors. The Wallace tree structures with 3:2 compressors and 4:2 compressors are shown in Figure 4.1 and Figure 4.2 respectively. With them, we can expect a significant reduction in multiplication time.
  • 18. Figure 4.2 Wallace Tree using 4:2 compressors
The 3:2 compressors make use of a carry-save adder. The carry-save adder outputs two numbers of the same dimensions as the inputs: one is a sequence of partial sum bits and the other is a sequence of carry bits. In a carry-save adder, the carry digit is taken from the right and passed to the left, just as in conventional addition; but the carry digit passed to the left is the result of the previous calculation and not the current one. So in each clock cycle, carries only have to move one step along and the clock can tick much faster. The carry-save adder also produces all of its output values in parallel, and thus has the same delay as a single full adder. The 4:2 compressors have been widely employed in high-speed multipliers to lower the latency of the partial product accumulation stage. A 4:2 compressor can be built using two 3:2 compressors. Owing to its regular interconnection, the 4:2 compressor is ideal for the construction of regularly
  • 19. structured Wallace trees with low complexity. The number of levels in the Wallace tree using 3:2 compressors can be approximately given as
$\text{Number of levels} \approx \log(k/2) / \log(3/2)$   (3.3)
where k is the number of partial products. Table III shows the number of levels in the Wallace tree using 3:2 compressors for different numbers of partial products.
Table III. NUMBER OF LEVELS IN THE WALLACE TREE
The two rows obtained at the output of the Wallace tree are added using a carry look-ahead adder (CLA), whose carry computation does not ripple through the bits of the two operands. In a carry look-ahead adder, the carry into every bit position is computed directly from the operand bits and the input carry rather than from the carry output of the previous stage, so the rippling effect is eliminated.
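The carry-save reduction with 3:2 compressors and the level count can be sketched in Python as follows. This is a minimal word-level model under assumptions: each partial product is a non-negative integer, one call to `csa` stands for a whole row of 3:2 compressors, and rows are grouped three at a time in list order (the names `csa` and `wallace_reduce` are hypothetical).

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Row of 3:2 compressors: three operands in, sum word and carry word out.

    The carry word is shifted left by one so that sum + carry == a + b + c.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word


def wallace_reduce(rows: list[int]) -> tuple[list[int], int]:
    """Reduce the rows to at most two, returning (rows, number of levels)."""
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that form complete triplets
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])               # leftover rows pass through
        rows = nxt
        levels += 1
    return rows, levels
```

For example, `wallace_reduce([13, 7, 255, 1024, 99, 5, 42, 8, 77])` leaves two rows whose sum equals the sum of the nine inputs; the final carry-propagate addition of those two rows is the job of the last adder. The returned level count can be compared with the approximation log(k/2)/log(3/2) rounded up.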
  • 20. It works by creating two signals, propagate and generate, for each bit position, based on whether a carry is propagated through from a less significant bit position, a carry is generated in that bit position, or a carry is killed in that bit position.
The design entry of the 126×126 bit multipliers using the Radix-4 Booth algorithm with 3:2 compressors and the Radix-8 Booth algorithm with 4:2 compressors is done in VHDL and simulated using the ModelSim SE 6.4 design suite from Mentor Graphics. The design is then synthesized and implemented in a Xilinx XC3S5000 fg1156 -4 FPGA using the Xilinx ISE 9.2i design suite. Figure 4 presents a snapshot of simulation waveforms for the 126×126 bit multiplier. Table IV summarizes the FPGA resource utilization of these two multipliers. Finally, the performance improvement is validated by implementing a higher-order FIR filter using these multipliers. Table V summarizes the FPGA resource utilization for FIR filters using these multipliers. It shows that the Radix-8 Booth multiplier with 4:2 compressors gives better speed, while the number of occupied slices is lower for the multiplier using the Radix-4 Booth algorithm with 3:2 compressors. The FIR filters are implemented in a Xilinx XC3S1500fg676-4 FPGA. The specifications of the FIR filter chosen are as follows.
Sampling frequency : 24 kHz
Pass band frequency : 8 kHz
Stop band frequency : 9 kHz
Pass band ripple : 0.1 linear scale
Stop band attenuation : 0.001 linear scale
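The generate and propagate signals of the carry look-ahead adder described at the start of this slide can be modelled with a short Python sketch. It is a bit-level illustration only, assuming unsigned operands and a hypothetical function name `cla_add`; in hardware the expanded carry expressions are evaluated in parallel, whereas the loops below merely evaluate the same expressions one after another.

```python
def cla_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Add two unsigned `width`-bit numbers using generate/propagate signals.

    Returns (sum, carry_out). Each carry is written directly in terms of the
    g/p signals and cin, not in terms of the previous carry:
        c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[1]g[0] | p[i]...p[0]cin
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # carry generated
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # carry propagated
    carries = [cin]
    for i in range(width):
        c = g[i]
        run = p[i]                     # product p[i] & p[i-1] & ... so far
        for j in range(i - 1, -1, -1):
            c |= run & g[j]
            run &= p[j]
        c |= run & cin
        carries.append(c)
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i    # sum bit = p XOR carry-in
    return total, carries[width]
```

As a usage example, `cla_add(11, 6, 4)` returns a 4-bit sum together with the carry out of the most significant position, so that `sum + (carry_out << 4)` equals `11 + 6`.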
  • 21. TABLE IV. DEVICE UTILIZATION SUMMARY OF MULTIPLIERS
CHAPTER 5 TOOLS AND HDL USED
5.1 ROLE OF HDL:
An HDL provides the framework for the complete logical design of the ASIC. All the activities coming under the purview of an HDL are shown enclosed in bold dotted lines. Verilog and VHDL are the two most commonly used HDLs today. Both have constructs with which the design can be fully described at all the levels. There are additional constructs available to facilitate setting up the test bench, spelling out test vectors for it, and "observing" the outputs from the designed unit.
  • 22. IEEE has brought out standards for the HDLs, and the software tools conform to them. Verilog originated at Gateway Design Automation, which was acquired by Cadence Design Systems; Cadence placed the language into the public domain in 1990. It was established as a formal IEEE Standard in 1995, and a revised version was brought out in 2001. However, most of the simulation tools available today conform only to the 1995 version of the standard. VHDL, used by a substantial number of VLSI designers today, is the HDL used in this project for modeling the design. We have used Xilinx ISE 9.2i for simulation and synthesis, and implemented the prescribed design in VHDL, a widely used industry and IEEE standard HDL.
5.2 NEEDS OF (V)HDL:
o Interoperability.
o Technology independence.
o Design reuse.
o Several levels of abstraction.
o Readability.
o Standard language.
o Widely supported.
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
[Figure: Specify, Capture, Verify, Formalize, Implement]
Fig. 5.1 Data Flow of VHDL
The VHDL language serves as:
o Design specification language.
o Design entry language.
o Design simulation language.
o Design documentation language.
o An alternative to schematics.
5.2.1 BRIEF HISTORY:
  • 23. o VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers.
o It was funded by the U.S. Department of Defense.
o The first publicly available version was released in 1985.
o In 1986 the IEEE (Institute of Electrical and Electronics Engineers) was presented with a proposal to standardize VHDL.
o In 1987 standardization resulted in IEEE 1076-1987.
o An improved version of the language was released in 1994 as IEEE standard 1076-1993.
Related Standards:
o IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance.
o Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types, so VHDL was becoming non-standard.
o The IEEE 1164 standard was developed by the IEEE. It contains definitions for a nine-valued data type, std_logic.
5.3 VHDL ENVIRONMENT:
  • 24. Fig 5.2 VHDL Environment
Design Units: Segments of VHDL code that can be compiled separately and stored in a library.
Fig. 5.3 Design Units
  • 25. 5.3 LEVELS OF ABSTRACTION:
VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware. It is possible to describe a circuit in a number of ways:
o Structural.
o Data flow.
o Behavioral.
Structural VHDL Description:
• The circuit is described in terms of its components.
• The description can range from low level (e.g., transistor level) to high level.
• For large circuits, a low-level description quickly becomes impractical.
Dataflow VHDL Description:
• The circuit is described in terms of how data moves through the system.
• In the dataflow style you describe how information flows between registers in the system.
• The combinational logic is described at a relatively high level, while the placement and operation of registers is specified quite precisely.
Fig 5.4 Data Flow of VHDL Description
  • 26. • The behavior of the system over time is defined by registers.
• There are no built-in registers in the VHDL language; either a lower-level description or a behavioral description of sequential elements is needed.
• The lower-level descriptions must be created or obtained.
• If there are no third-party models for registers, you must write the behavioral description of the registers.
• The behavioral description can be provided in the form of subprograms (functions or procedures).
Behavioral VHDL Description:
• The circuit is described in terms of its operation over time.
• The representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions.
• The concept of time may be expressed precisely using delays (e.g., A <= B after 10 ns).
• If no actual delay is used, the order of sequential operations is defined.
• At the lower levels of abstraction (e.g., RTL), synthesis tools ignore detailed timing specifications.
• The actual timing results depend on the implementation technology and the efficiency of the synthesis tools.
• There are few tools for behavioral synthesis.
General format:
    process [(sensitivity_list)]
        process_declarative_part
    begin
        process_statements
        [wait_statement]
    end process;
  • 27. CHAPTER 6 SOFTWARE TOOLS
6.1 SOFTWARE TOOL-XILINX:
Xilinx ISE is a software tool produced by Xilinx for synthesis and analysis of HDL designs. It enables the developer to synthesize ("compile") designs, perform timing analysis, examine RTL diagrams, simulate a design's reaction to different stimuli, and configure the target device with the programmer.
Xilinx was founded in 1984 by two semiconductor engineers, Ross Freeman and Bernard Vonderschmitt, who were both working for integrated circuit and solid-state device manufacturer Zilog Corp. While working for Zilog, Freeman wanted to create chips that acted like a blank tape, allowing users to program the technology themselves. At the time, the concept was paradigm-changing. "The concept required lots of transistors and, at that time, transistors were considered extremely precious – people thought that Ross's idea was pretty far out", said Xilinx Fellow Bill Carter, who, when hired in 1984 as the first IC designer, was the company's eighth employee.
Xilinx ISE is used here to compile and run designs written in VHDL. It has various versions, such as Xilinx 9.2i, Xilinx 10.1, and Xilinx 10.5, and comes with various pre-defined libraries and packages.
6.2 VERSION 9.2I:
New Device Support: This release supports the new Spartan™-3A DSP family.
New Software Features: The following are the new features in this release.
Operating System Support:
• Support for the Windows® Vista Business 32-bit operating system. This operating system is supported, but has had limited testing.
• Support for the Windows XP Professional 64-bit operating system.
  • 28. • Support for the Red Hat Enterprise WS 5.0 32-bit and 64-bit operating systems. This operating system is supported, but has had limited testing.
WHY XILINX ONLY?
There are many software tools for compiling and simulating VHDL designs, such as those from Cadence, but compared with the other tools Xilinx is cost effective.
  • 29. CHAPTER 7 TUTORIAL OF ISE 8.2i
ISE 8.2i Quick Start Tutorial
The ISE 8.2i Quick Start Tutorial provides Xilinx PLD designers with a quick overview of the basic design process using ISE 8.2i. After you have completed the tutorial, you will have an understanding of how to create, verify, and implement a design.
Note: This tutorial is designed for ISE 8.2i on Windows.
This tutorial contains the following sections:
• "Getting Started"
• "Create a New Project"
• "Create an HDL Source"
• "Design Simulation"
• "Create Timing Constraints"
• "Implement Design and Verify Constraints"
• "Reimplement Design and Verify Pin Locations"
• "Download Design to the Spartan™-3 Demo Board"
For an in-depth explanation of the ISE design tools, see the ISE In-Depth Tutorial on the Xilinx® web site at: http://www.xilinx.com/support/techsup/tutorials/
  • 30. Getting Started
    Software Requirements: To use this tutorial, you must install the following software:
    • ISE 8.2i
    For more information about installing Xilinx® software, see the ISE Release Notes and Installation Guide at: http://www.xilinx.com/support/software_manuals.htm.
    Hardware Requirements: To use this tutorial, you must have the following hardware:
    • Spartan-3 Startup Kit, containing the Spartan-3 Startup Kit Demo Board
    Starting the ISE Software
    To start ISE, double-click the desktop icon, or start ISE from the Start menu by selecting: Start → All Programs → Xilinx ISE 8.2i → Project Navigator
    Note: Your start-up path is set during the installation process and may differ from the one above.
    Accessing Help
    At any time during the tutorial, you can access online help for additional information about the ISE software and related tools.
  • 31. To open Help, do either of the following:
    • Press F1 to view Help for the specific tool or function that you have selected or highlighted.
    • Launch the ISE Help Contents from the Help menu. It contains information about creating and maintaining your complete design flow in ISE.
    Figure 1: ISE Help Topics
    Create a New Project
    Create a new ISE project that targets the FPGA device on the Spartan-3 Startup Kit demo board. To create a new project:
    1. Select File → New Project... The New Project Wizard appears.
    2. Type tutorial in the Project Name field.
    3. Enter or browse to a location (directory path) for the new project. A tutorial subdirectory is created automatically.
    4. Verify that HDL is selected from the Top-Level Source Type list.
    5. Click Next to move to the device properties page.
    6. Fill in the properties in the table as shown below:
    ♦ Product Category: All
  • 32. ♦ Family: Spartan3
    ♦ Device: XC3S200
    ♦ Package: FT256
    ♦ Speed Grade: -4
    ♦ Top-Level Module Type: HDL
    ♦ Synthesis Tool: XST (VHDL/Verilog)
    ♦ Simulator: ISE Simulator (VHDL/Verilog)
    ♦ Verify that Enable Enhanced Design Summary is selected.
    Leave the default values in the remaining fields. When the table is complete, your project properties will look like the following:
  • 33. Figure 2: Project Device Properties
    7. Click Next to proceed to the Create New Source window in the New Project Wizard. At the end of the next section, your new project will be complete.
    Create a Verilog HDL Source
    In this section, you will create an example top-level Verilog HDL file.
    Creating a Verilog Source
    Create the top-level Verilog source file as follows:
    1. Click New Source in the New Project dialog box.
  • 34. 2. Select Verilog Module as the source type in the New Source dialog box.
    3. Type in the file name counter.
    4. Verify that the Add to Project checkbox is selected.
    5. Click Next.
    6. Declare the ports for the counter design by filling in the port information as shown below:
    Figure 5: Define Module
  • 35. 7. Click Next, then Finish in the New Source Information dialog box to complete the new source file template.
    8. Click Next, then Next, then Finish.
    The source file containing the counter module displays in the Workspace, and the counter displays in the Sources tab, as shown below:
  • 36. Figure 6: New Project in ISE
    Using Language Templates (Verilog)
    The next step in creating the new source is to add the behavioral description for the counter.
  • 37. Use a simple counter code example from the ISE Language Templates and customize it for the counter design.
    1. Place the cursor on the line below the output [3:0] COUNT_OUT; statement.
    2. Open the Language Templates by selecting Edit → Language Templates…
    Note: You can tile the Language Templates and the counter file by selecting Window → Tile Vertically to make them both visible.
    3. Using the “+” symbol, browse to the following code example: Verilog → Synthesis Constructs → Coding Examples → Counter → Binary → Up/Down Counters → Simple Counter
    4. With Simple Counter selected, select Edit → Use in File, or select the Use Template in File toolbar button. This step copies the template into the counter source file.
    5. Close the Language Templates.
    Final Editing of the Verilog Source
    1. To declare and initialize the register that stores the counter value, modify the declaration statement in the first line of the template as follows:
    replace: reg [<upper>:0] <reg_name>;
    with: reg [3:0] count_int = 0;
    2. Customize the template for the counter design by replacing the port and signal name placeholders with the actual ones as follows:
    ♦ replace all occurrences of <clock> with CLOCK
    ♦ replace all occurrences of <up_down> with DIRECTION
    ♦ replace all occurrences of <reg_name> with count_int
  • 38. 3. Add the following line just above the endmodule statement to assign the register value to the output port:
    assign COUNT_OUT = count_int;
    4. Save the file by selecting File → Save.
    When you are finished, the code for the counter will look like the following:

    module counter(CLOCK, DIRECTION, COUNT_OUT);
        input CLOCK;
        input DIRECTION;
        output [3:0] COUNT_OUT;

        // 4-bit register that holds the count; initialized to 0
        reg [3:0] count_int = 0;

        // Count up when DIRECTION is high, down when it is low
        always @(posedge CLOCK)
            if (DIRECTION)
                count_int <= count_int + 1;
            else
                count_int <= count_int - 1;

        // Drive the output port from the internal register
        assign COUNT_OUT = count_int;
    endmodule

    You have now created the Verilog source for the tutorial project.
    Checking the Syntax of the New Counter Module
    When the source files are complete, check the syntax of the design to find errors and typos.
    1. Verify that Synthesis/Implementation is selected from the drop-down list in the Sources window.
  • 39. 2. Select the counter design source in the Sources window to display the related processes in the Processes window.
    3. Click the “+” next to the Synthesize-XST process to expand the process group.
    4. Double-click the Check Syntax process.
    Note: You must correct any errors found in your source files. You can check for errors in the Console tab of the Transcript window. If you continue without valid syntax, you will not be able to simulate or synthesize your design.
    5. Close the HDL file.
    Design Simulation
    Verifying Functionality using Behavioral Simulation
    Create a test bench waveform containing input stimulus you can use to verify the functionality of the counter module. The test bench waveform is a graphical view of a test bench. Create the test bench waveform as follows:
    1. Select the counter HDL file in the Sources window.
    2. Create a new test bench source by selecting Project → New Source.
    3. In the New Source Wizard, select Test Bench WaveForm as the source type, and type counter_tbw in the File Name field.
    4. Click Next.
    5. The Associated Source page shows that you are associating the test bench waveform with the source file counter. Click Next.
    6. The Summary page shows that the source will be added to the project, and it displays the source directory, type and name. Click Finish.
    7. You need to set the clock frequency, setup time and output delay times in the Initialize Timing dialog box before the test bench waveform editing window opens.
  • 40. The requirements for this design are the following:
    ♦ The counter must operate correctly with an input clock frequency of 25 MHz.
    ♦ The DIRECTION input will be valid 10 ns before the rising edge of CLOCK.
    ♦ The output (COUNT_OUT) must be valid 10 ns after the rising edge of CLOCK.
    The design requirements correspond to the values below. A 25 MHz clock has a period of 1/(25 MHz) = 40 ns, so with a 50% duty cycle the clock is high for 20 ns and low for 20 ns. Fill in the fields in the Initialize Timing dialog box with the following information:
    ♦ Clock Time High: 20 ns
    ♦ Clock Time Low: 20 ns
    ♦ Input Setup Time: 10 ns
    ♦ Output Valid Delay: 10 ns
    ♦ Offset: 0 ns
    ♦ Global Signals: GSR (FPGA)
    Note: When GSR (FPGA) is enabled, 100 ns is added to the Offset value automatically.
    ♦ Initial Length of Test Bench: 1500 ns
    Leave the default values in the remaining fields.
  • 41. Figure 7: Initialize Timing
  • 42. 8. Click Finish to complete the timing initialization.
    9. The blue shaded areas that precede the rising edge of the CLOCK correspond to the Input Setup Time in the Initialize Timing dialog box. Toggle the DIRECTION port to define the input stimulus for the counter design as follows:
    ♦ Click on the blue cell at approximately 300 ns to assert DIRECTION high so that the counter will count up.
    ♦ Click on the blue cell at approximately 900 ns to assert DIRECTION low so that the counter will count down.
    Note: For more accurate alignment, you can use the Zoom In and Zoom Out toolbar buttons.
    Figure 8: Test Bench Waveform
  • 43. 10. Save the waveform.
    11. In the Sources window, select the Behavioral Simulation view to see that the test bench waveform file is automatically added to your project.
    Figure 9: Behavior Simulation Selection
    12. Close the test bench waveform.
    Create a Self-Checking Test Bench Waveform
    Add the expected output values to finish creating the test bench waveform. This transforms the test bench waveform into a self-checking test bench waveform. The key benefit of a self-checking test bench waveform is that it compares the desired and actual output values and flags errors in your design as it goes through the various transformations, from behavioral HDL to the device-specific representation.
    To create a self-checking test bench, edit output values manually, or run the Generate Expected Results process to create them automatically. If you run the Generate Expected Results process, visually inspect the output values to see if they are the ones you expected for the given set of input values.
  • 44. To create the self-checking test bench waveform automatically, do the following:
    1. Verify that Behavioral Simulation is selected from the drop-down list in the Sources window.
    2. Select the counter_tbw file in the Sources window.
    3. In the Processes tab, click the “+” to expand the Xilinx ISE Simulator process and double-click the Generate Expected Simulation Results process. This process simulates the design in a background process.
    4. The Expected Results dialog box opens. Select Yes to annotate the results to the test bench.
    Figure 10: Expected Results Dialog Box
    5. Click the “+” to expand the COUNT_OUT bus and view the transitions that correspond to the Output Delay value (yellow cells) specified in the Initialize Timing dialog box.
  • 45. Figure 11: Test Bench Waveform with Results
    6. Save the test bench waveform and close it.
    You have now created a self-checking test bench waveform.
    Simulating Design Functionality
    Verify that the counter design functions as you expect by performing behavioral simulation as follows:
    1. Verify that Behavioral Simulation and counter_tbw are selected in the Sources window.
    2. In the Processes tab, click the “+” to expand the Xilinx ISE Simulator process and double-click the Simulate Behavioral Model process.
  • 46. The ISE Simulator opens and runs the simulation to the end of the test bench.
    3. To view your simulation results, select the Simulation tab and zoom in on the transitions. The simulation waveform results will look like the following:
    Figure 12: Simulation Results
    Note: You can ignore any rows that start with TX.
    4. Verify that the counter is counting up and down as expected.
    5. Close the simulation view. If you are prompted with the message “You have an active simulation open. Are you sure you want to close it?”, click Yes to continue.
    You have now completed simulation of your design using the ISE Simulator.
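    The graphical test bench waveform built in this chapter can also be written by hand as a Verilog test bench. The listing below is only a sketch under stated assumptions, not the file that ISE generates: the names counter_tb, expected and errors are chosen here for illustration, the 100 ns GSR offset is not modelled, and DIRECTION is changed 10 ns ahead of a rising clock edge (at 290 ns and 890 ns) to respect the 10 ns input setup time. The counter module is repeated so that the listing is self-contained.

```verilog
`timescale 1ns / 1ps

// Counter under test, copied from the tutorial source.
module counter(CLOCK, DIRECTION, COUNT_OUT);
    input CLOCK;
    input DIRECTION;
    output [3:0] COUNT_OUT;
    reg [3:0] count_int = 0;
    always @(posedge CLOCK)
        if (DIRECTION)
            count_int <= count_int + 1;
        else
            count_int <= count_int - 1;
    assign COUNT_OUT = count_int;
endmodule

// Hypothetical hand-written, self-checking test bench (names are assumptions).
module counter_tb;
    reg        CLOCK     = 1'b0;
    reg        DIRECTION = 1'b0;
    wire [3:0] COUNT_OUT;

    reg  [3:0] expected = 4'd0;  // reference model of the count
    integer    errors   = 0;     // number of mismatches seen

    counter dut (.CLOCK(CLOCK), .DIRECTION(DIRECTION), .COUNT_OUT(COUNT_OUT));

    // 25 MHz clock: toggles every 20 ns, so the period is 40 ns.
    always #20 CLOCK = ~CLOCK;

    // Reference model: same up/down rule as the design, kept separately.
    always @(posedge CLOCK)
        expected <= DIRECTION ? expected + 4'd1 : expected - 4'd1;

    // Compare 10 ns after each rising edge (the Output Valid Delay).
    always @(posedge CLOCK) begin
        #10;
        if (COUNT_OUT !== expected) begin
            errors = errors + 1;
            $display("Mismatch at %0t ns: COUNT_OUT=%0d expected=%0d",
                     $time, COUNT_OUT, expected);
        end
    end

    // Stimulus: count down first, up from about 300 ns, down from about 900 ns.
    initial begin
        #290 DIRECTION = 1'b1;   // 10 ns before the rising edge at 300 ns
        #600 DIRECTION = 1'b0;   // 10 ns before the rising edge at 900 ns
        #610;                    // run to the 1500 ns test bench length
        if (errors == 0)
            $display("PASS: no mismatches");
        else
            $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

    Keeping the reference model in its own always block, rather than reading the design's internal register, means the check would still be meaningful if the counter were later replaced by a synthesized netlist with the same ports.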
  • 47. CHAPTER-8 HARDWARE TOOLS
    A field-programmable gate array (FPGA) is a semiconductor device that can be configured by the customer or designer after manufacturing, hence the name "field-programmable". FPGAs are programmed using a logic circuit diagram or source code in a hardware description language (HDL) to specify how the chip will work. They can be used to implement any logical function that an application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping offers advantages for many applications.
    FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
    7.1 HISTORY
    The FPGA industry sprouted from programmable read-only memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs both had the option of being programmed in batches in a factory or in the field (field programmable); however, programmable logic was hard-wired between logic gates.
    Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially viable field-programmable gate array, the XC2064, in 1985. The XC2064 had programmable gates and programmable interconnects between gates, the beginnings of a new technology and market. The XC2064 boasted a mere 64 configurable logic blocks (CLBs), each with two 3-input lookup tables (LUTs). More than 20 years later, Freeman was inducted into the National Inventors Hall of Fame for his invention.
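    To make the idea of a configurable logic block more concrete, a lookup table can be modelled as a small memory addressed by the logic inputs: the stored bits decide which function is computed. The sketch below is a behavioral illustration only. The module names lut3 and lut3_demo and the parameter INIT are assumptions made here, and the code does not claim to reproduce the internal circuitry of the XC2064 or of any Xilinx primitive.

```verilog
// Hypothetical behavioral model of a 3-input lookup table (LUT).
// INIT holds the 8-entry truth table; bit i is the output for input value i.
module lut3 #(parameter [7:0] INIT = 8'h00) (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire y
);
    wire [7:0] table_bits = INIT;       // truth table as a plain vector
    assign y = table_bits[{c, b, a}];   // inputs select one stored bit
endmodule

// The same LUT structure "configured" as two different gates.
module lut3_demo (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire and3,
    output wire xor3
);
    // 8'h80: only entry 7 (a=b=c=1) is 1, which gives a 3-input AND.
    lut3 #(.INIT(8'h80)) u_and (.a(a), .b(b), .c(c), .y(and3));

    // 8'h96: entries 1, 2, 4 and 7 are 1 (odd number of ones), a 3-input XOR.
    lut3 #(.INIT(8'h96)) u_xor (.a(a), .b(b), .c(c), .y(xor3));
endmodule
```

    Changing only the INIT value changes the gate, which is the sense in which the logic is programmable after manufacturing.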
  • 48. 7.2 ARCHITECTURE
    The most common FPGA architecture consists of an array of configurable logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing channels have the same width (number of wires). Multiple I/O pads may fit into the height of one row or the width of one column in the array.
    An application circuit must be mapped into an FPGA with adequate resources. While the number of CLBs and I/Os required is easily determined from the design, the number of routing tracks needed may vary considerably even among designs with the same amount of logic.
    Fig 7.1 Internal Structure of FPGA
    7.3 APPLICATIONS
    Applications of FPGAs include digital signal processing, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation, radio astronomy and a growing range of other areas.
    7.4 A BRIEF TUTORIAL: DOWNLOADING THE SOURCE CODE TO THE FPGA
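    The steps in this tutorial assign pins to signals named A, I0, I1 and Y and download a bitstream called mux.bit, but the HDL source of that design is not reproduced in this report. A design with those ports might look like the sketch below. It assumes a 2-to-1 multiplexer in which A is the select input; the module name mux and the select polarity are guesses made for illustration, not details taken from the original design.

```verilog
// Hypothetical 2-to-1 multiplexer matching the signal names used in the
// pin assignment steps (A, I0, I1, Y). The role of A as the select line
// and its polarity are assumptions.
module mux(A, I0, I1, Y);
    input  A;    // select input (assumed): 0 chooses I0, 1 chooses I1
    input  I0;   // data input 0
    input  I1;   // data input 1
    output Y;    // selected data, intended to drive an LED on the board

    assign Y = A ? I1 : I0;
endmodule
```

    With a design of this kind, the three switches and the LED listed in the pin assignment are enough to check every input combination by hand on the board.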
  • 49. 1. Now let’s look at the flow for actually synthesizing and implementing the design on the FPGA prototyping boards. Close ModelSim and go back to the Xilinx ISE environment. In the Sources subwindow, change the selection in the drop-down box from “Behavioral Simulation” to “Synthesis/Implementation”.
    2. To properly synthesize the design, we need to specify which pins on the chip the inputs and outputs should be assigned to. In general, of course, we could assign the signals just about any way we want. Since we will be using specific prototype boards, we need to make sure our pin assignments match the switches, buttons, and LEDs so that we can test our design. We will be starting with Digilab 2E boards that are connected to Digilab DIO2 input/output boards. The I/O board has already been programmed and configured to have the following connections:
  • 50. 3. To assign specific pins, expand the User Constraints selection in the Processes subwindow and double-click on Assign Package Pins.
  • 51. 4. A new application called Xilinx PACE should launch.
    a. In the Design Object List subwindow, you should see a listing of all the input and output signals from our design.
  • 52. Here is where we can specify which pin locations we want for each signal. Simply enter the pin numbers from the tables shown in Step 2 above, making sure to use a capital letter “P” in front of the pin specification. Let’s assign our signals as follows:
    A → P163 (Switch 1)
    I0 → P164 (Switch 2)
    I1 → P166 (Switch 3)
  • 53. Y → P149 (LED 0)
    Once all pins have been assigned, save your constraints by selecting File → Save from the menu bar and exit Xilinx PACE.
    5. Back in Xilinx ISE, in the Processes subwindow, double-click on the Synthesize – XST selection and wait for the process to complete. Then double-click on the Implement Design selection and wait for the process to complete. Then double-click on the Generate Programming File selection and wait for the process to complete. If all goes well, you should have green check marks for the whole design.
  • 54. 6. There is a lot of information you can obtain through all of the objects listed in the Processes subwindow, but let us proceed to downloading the design onto the prototyping board for testing. First make sure the prototyping board is connected to the PC and is powered on. Also make sure the slide switch next to the parallel port on the FPGA board is set to JTAG (as opposed to “Port”). Then select Configure Device (iMPACT) underneath the Generate Programming File selection. You should see the following window:
  • 55. 7. Now you need to specify which bitstream file to use to configure the device. For this tutorial we want to select the mux.bit file and click Open.
  • 56. You will probably get the message below. Just click Yes.
  • 57. You will also get a warning message saying the JTAG clock was updated in the bitstream file (which is good), so just click OK. There is a way to correct for that in the original design flow, but Xilinx automatically catches it here so I don’t usually bother.
    8. You should now see the Spartan XC2S200E chip in the main window. Right-click on the chip to prepare for downloading the bitstream file. Select Program in the resulting window.
  • 58. 9. Click OK.
  • 59. If all goes well, you should get the Programming Succeeded message.
    10. Now just test and verify your design on the actual FPGA board!
    CONCLUSION
    The design, implementation and simulation of a 21×21-bit, radix-8 multiplier unit for a specific purpose have been carried out. The design uses 8224 transistors with an active area of 2.97 mm². The measured multiplication time is 9.4 ns and the power dissipation is 60.7 mW at a frequency of 10 MHz.
    It has been shown that a radix-8 architecture can be useful in special-purpose high-speed multipliers because of the gain in time and in the number of transistors compared with the conventional radix-4 recoding architecture. This gain is achieved with a slight modification of the previous adder: two additional bits (intermediate carries) must be stored for each word in the set of numbers. Memory requirements increase by 9.5%, while the time reduction in the previous adder is estimated at 42%. As a result, the overall multiplication time can be reduced with this special-purpose radix-8 architecture.
  • 60. REFERENCE
    [1] Dong-Wook Kim, Young-Ho Seo, “A New VLSI Architecture of Parallel Multiplier-Accumulator based on Radix-2 Modified Booth Algorithm”, Very Large Scale Integration (VLSI) Systems, IEEE Transactions, vol. 18, pp. 201-208, 04 Feb. 2010.
    [2] Prasanna Raj P, Rao, Ravi, “VLSI Design and Analysis of Multipliers for Low Power”, Intelligent Information Hiding and Multimedia Signal Processing, Fifth International Conference, pp. 1354-1357, Sept. 2009.
  • 61. [3] Lakshmanan, Masuri Othman and Mohamad Alauddin Mohd. Ali, “High Performance Parallel Multiplier using Wallace-Booth Algorithm”, Semiconductor Electronics, IEEE International Conference, pp. 433-436, Dec. 2002.
    [4] Jan M Rabaey, “Digital Integrated Circuits, A Design Perspective”, Prentice Hall, Dec. 1995.
    [5] Louis P. Rubinfield, “A Proof of the Modified Booth's Algorithm for Multiplication”, Computers, IEEE Transactions, vol. 24, pp. 1014-1015, Oct. 1975.
    [6] Rajendra Katti, “A Modified Booth Algorithm for High Radix Fixedpoint Multiplication”, Very Large Scale Integration (VLSI) Systems, IEEE Transactions, vol. 2, pp. 522-524, Dec. 1994.
    [7] C. S. Wallace, “A Suggestion for a Fast Multiplier”, Electronic Computers, IEEE Transactions, vol. 13, pp. 14-17, Feb. 1964.
    [8] Hussin R et al., “An Efficient Modified Booth Multiplier Architecture”, IEEE International Conference, pp. 1-4, 2008.