CPLD & FPGA Architectures and Applications Guide

CPLD & FPGA ARCHITECTURES
AND
APPLICATIONS
Dr. Sudhir N. Shelke, Ph.D.
Principal, Guru Nanak Institute of Technology, Nagpur
Tuesday, October 03, 2017
Dr Sudhir Shelke Page 1 of 152

Classification of PLDs
• The classification of PLDs is given below.

Simple Programmable Logic Device [SPLD]
• As the name suggests, an SPLD has a simple architecture. The PROM is a
good example of an SPLD.
• An SPLD can implement hundreds of gates and is normally
programmed by the user with an inexpensive programmer.
• The main limitation of SPLDs is their low logic capacity, due to the
restricted nature of the AND-OR planes.

BASIC CIRCUIT OF PLD

Contd…

Contd..
• There are three main types of SPLD
architectures:
(i) Programmable logic array (PLA), (ii) Programmable array logic
(PAL), and (iii) Generic array logic (GAL)

Configuration of SPLDs

Contd…
• Two of the most popular SPLDs are the PALs produced by
Advanced Micro Devices (AMD) known as the 16R8 and
22V10.
• Both of these devices are industry standards and are
widely second-sourced by various companies.
• The name 16R8 means that the PAL has a maximum of
16 inputs (there are 8 dedicated inputs and 8
input/outputs), and a maximum of 8 outputs.

Contd…
• The “R” refers to the type of outputs provided
by the PAL and means that each output is
“registered” by a D flip-flop.
• Similarly, the “22V10” has a maximum of 22
inputs and 10 outputs. Here, the “V” means
each output is versatile and can be configured
in various ways, some configurations
registered and some not.

Contd..
• Another widely used and second sourced
SPLD is the Altera Classic EP610.
• This device is similar in complexity to PALs,
but it offers more flexibility in the way that
outputs are produced and has larger AND-
and OR- planes.
• In the EP610, outputs can be registered and
the flip-flops are configurable as any of D, T,
JK, or SR.

PLA - Programmable Logic Array
• The PLA consists of two programmable planes,
AND and OR. The AND plane consists of
programmable interconnect along with AND gates.
• The OR plane consists of programmable
interconnect along with OR gates.
• Each of the inputs can be connected to an AND
gate with any of the other inputs by connecting the
crossover point of the vertical and horizontal
interconnect lines in the AND gate programmable
interconnect.

Contd..
• Initially, the crossover points are not electrically
connected, but configuring the PLA connects
particular crossover points together.
• By convention, each AND gate is drawn with a single
input line, but any of the inputs (vertical lines) can be
connected to it. Hence, for four PLA inputs, the AND
gate also has four inputs. The single output from
each of the AND gates is applied to the OR gate
programmable interconnect.
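
The two-plane structure described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real device: the function name `pla_eval` and the data layout of `AND_PLANE` and `OR_PLANE` are assumptions made for the example. Each programmed crossover point in the AND plane is a dictionary entry, and each programmed crossover point in the OR plane is a list entry.

```python
from itertools import product

def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a toy PLA.

    inputs    : tuple of 0/1 input values
    and_plane : one dict per product term, mapping input index -> required
                value (1 = true literal connected, 0 = complemented literal)
    or_plane  : one list per output, naming the product terms it ORs together
    """
    terms = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return tuple(int(any(terms[t] for t in out)) for out in or_plane)

# Hypothetical configuration: out0 = A XOR B, out1 = A AND B
AND_PLANE = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]
OR_PLANE = [[0, 1], [2]]

for a, b in product((0, 1), repeat=2):
    print(a, b, pla_eval((a, b), AND_PLANE, OR_PLANE))
```

In this model a PAL would be the same structure with `or_plane` fixed by the manufacturer and only `and_plane` left to the user.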

Contd..

PROGRAMMABLE ARRAY LOGIC (PAL)
• An early and widely adopted programmable device was the programmable array
logic (PAL), developed by Monolithic Memories Inc. (MMI).
• The Programmable Array Logic or PAL is similar to PLA, but
in a PAL device only AND gates are programmable. The OR
array is fixed by the manufacturer.
• This makes PAL devices easier to program and less expensive
than PLA. On the other hand, since the OR array is fixed, it is
less flexible than a PLA device.

Schematic Representation- PAL

Block diagram of PAL

PAL contd..
• The PAL device has n input lines which are fed to
buffers/inverters.
• Buffers/inverters are connected to inputs of AND gates through
programmable links. Outputs of AND gates are then fed to the
OR array with fixed connections.

GAL - Generic Array Logic
• PAL and PLA devices are one-time programmable (OTP), based
on PROM technology, so a PAL or PLA configuration cannot be changed
after the device has been configured.
• This limitation means that to change a design, the configured device has to
be discarded and a new device configured. The GAL, although
similar to the PAL in architecture, uses EEPROM and can be
reconfigured.

Contd…
• The Generic Array Logic (GAL) device was invented by Lattice
Semiconductor.
• The GAL was an improvement on the PAL because one device
was able to take the place of many PAL devices or could even
have functionality not covered by the original range. Its primary
benefit, however, was that it was erasable and re-programmable
making prototyping and design changes easier for engineers.

Complex Programmable Logic Devices
(CPLDs)
• CPLDs were pioneered by Altera, first in their
family of chips called Classic EPLDs, and then
in three additional series, called MAX 5000,
MAX 7000 and MAX 9000.
• A CPLD (Complex Programmable Logic
Device) is more complex than an SPLD.
• It is built on the SPLD architecture and supports
much larger designs. An SPLD can
be used to integrate the functions of a number of
discrete digital ICs into a single device, and a
CPLD can be used to integrate the functions of
a number of SPLDs into a single device.

Contd..
• CPLD architecture is based on a small number of
logic blocks and a global programmable
interconnect.
• Instead of relying on a separate programming unit to
configure the chip, it is advantageous to be able to
perform the programming while the chip is still
attached to its circuit board.
• This method of programming is called
In-System Programming (ISP). It is not usually
provided for PLAs or PALs, but it is available
for the more sophisticated chips known as
complex programmable logic devices.

Architecture of CPLD

Contd…
• The CPLD consists of a number of logic
blocks or functional blocks, each of
which contains a macrocell and either
a PLA or PAL circuit arrangement.
• In the diagram, eight logic blocks are
shown. The building block of the CPLD
is the macrocell, which contains logic
implementing disjunctive normal form
expressions and more specialized logic
operations.

Contd..
• The macrocell provides additional
circuitry to accommodate registered or
non-registered outputs, along with signal
polarity control.
• Polarity control provides an output that is
a true signal or a complement of the true
signal.
• The actual number of logic blocks within
a CPLD varies; the more logic blocks
available, the larger the design that can be
configured.
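
The registered/non-registered choice and the polarity control described above can be modelled with a small Python class. This is a minimal sketch under stated assumptions: the class name `Macrocell`, its two configuration flags and the `clock` method are hypothetical names chosen for illustration, and real macrocells have additional features (feedback paths, clock selection) that are not modelled here.

```python
class Macrocell:
    """Toy macrocell: sum-of-products input, polarity XOR, optional D flip-flop."""

    def __init__(self, registered, invert):
        self.registered = registered  # True: output is taken from the flip-flop
        self.invert = invert          # True: output the complement of the signal
        self.q = 0                    # flip-flop state

    def clock(self, sop_value):
        """Rising clock edge: capture the polarity-adjusted logic value."""
        self.q = sop_value ^ int(self.invert)

    def output(self, sop_value):
        """Return the pin value for the current sum-of-products result."""
        if self.registered:
            return self.q
        return sop_value ^ int(self.invert)

# Combinational, inverted output versus registered, true output
comb = Macrocell(registered=False, invert=True)
reg = Macrocell(registered=True, invert=False)
reg.clock(1)
print(comb.output(1), reg.output(0))
```

The registered cell keeps returning the captured value until the next call to `clock`, while the combinational cell follows its input immediately.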

Contd..
• In the center of the design is a global programmable
interconnect.
• This interconnect allows connections to the logic block
macrocells and the I/O cell arrays (the digital I/O cells of
the CPLD connecting to the pins of the CPLD package).
• The programmable interconnect is usually based on
either array-based interconnect or multiplexer-based
interconnect

CPLD Architecture

Contd..
• Multiplexer-based interconnect uses digital
multiplexers connected to each of the
macrocell inputs within the logic blocks.
• Specific signals within the programmable
interconnect are connected to specific inputs
of the multiplexers.
• It would not be practical to connect all
internal signals within the programmable
interconnect to the inputs of all multiplexers
due to size and speed of operation
considerations.

FIELD PROGRAMMABLE GATE ARRAYS
• The FPGA concept emerged in
1985 with the XC2064™ FPGA family
from Xilinx.
• An FPGA is an integrated circuit that
contains many (64 to over 10,000) identical
logic cells that can be viewed as standard
components.
• The individual cells are interconnected by
a matrix of wires and programmable
switches.

Contd..
• Unlike CPLDs (Complex Programmable Logic Devices) FPGAs
contain neither AND nor OR planes.
• The FPGA architecture consists of configurable logic blocks,
configurable I/O blocks, and programmable interconnect.
• Also, there will be clock circuitry for driving the clock signals
to each logic block, and additional logic resources such as
ALUs, memory, and decoders may be available.

Contd..
• The two basic types of programmable
elements for an FPGA are Static RAM and
anti-fuses.
• Each logic block in an FPGA has a small
number of inputs and one output.
• A look-up table (LUT) is the most
commonly used type of logic block
within FPGAs.
• There are two types of FPGAs: (i) SRAM-based
FPGAs and (ii) anti-fuse based FPGAs,
which are one-time programmable (OTP).

FPGA-Architecture

Contd..
Every FPGA consists of the following
elements.
1. Configurable logic blocks(CLBs)
2. Configurable input output blocks(IOBs)
3. A two-layer metal network of vertical and horizontal lines
for interconnecting the CLBs, called the
Programmable Interconnect.

XILINX Logic Cell Array (LCA)
• The LCA is the architecture
introduced by XILINX in 1985 for
its FPGA devices. The term is essentially a
proprietary, trademark-like name used by
XILINX for its FPGA devices.
• The XILINX LCA architecture consists of
three major components:
1. Configurable Logic Blocks (CLBs)
2. Input / Output Blocks (IOBs) and
3. Programmable Interconnect.

Contd…
• In addition, configuration memory is used to hold the
configuration program bits which control the configuration of
the CLBs, IOBs and interconnect.
• The LCA architecture consists of an interior matrix of logic
blocks and a surrounding ring of I/O interface blocks.
• Interconnect resources occupy the channels between the rows and
columns of logic blocks and between the logic blocks and I/O
blocks. Like a microprocessor, the LCA is a program-driven logic
device.

Contd..
• The functions of the LCA’s configurable
logic blocks and I/O blocks and their
interconnection are controlled by a
configuration program stored in an
on-chip memory.
• The configuration program is loaded
automatically from an external memory on
power-up or on command, or is
programmed by a microprocessor as part of
system initialization.

Contd..
• As shown in the diagram, the configuration
memory consists of a distributed array of
static memory cells.
• During configuration, each cell is written
through the data line. During a readback
operation, it is read through the same data line.

LCA-Architecture
• The core of the LCA is a matrix of identical
Configurable Logic Blocks (CLBs). Each CLB contains
programmable combinational logic and storage
registers.
• The combinational logic section of the block is
capable of implementing any Boolean function of
its input variables.
• The registers can be loaded from the combinational
logic or directly from a CLB input. The register
outputs can be inputs to the combinational logic
via an internal feedback path.

Block Diagram

Contd..
• The periphery of the Logic Cell Array is made up of user
programmable input/output blocks (IOBs).
• Each block can be programmed independently to be an input, an
output or a bi-directional pin with three-state control. Inputs can be
programmed to recognize either TTL or CMOS thresholds.
• Each IOB also includes flip-flops that can be used to buffer
inputs and outputs.

Programmable Interconnect
• In FPGAs three types of metal resources are provided to fulfill
various network interconnect requirements. They are
1. General Purpose Interconnect
2. Direct Connection
3. Long lines (multiplexed busses and wide AND gates)

General Purpose Interconnect
• It consists of a grid of five horizontal and
five vertical metal segments located
between the rows and columns of logic and
IOBs.
• Each segment is the height or width of a
logic block.
• Switching matrices join the ends of these
segments and allow programmed
interconnections between the metal grid
segments of adjoining rows and columns.

Contd..

Contd...
• The switches of an un-programmed device
are all non-conducting.
• The connections through the switch matrix
may be established by the automatic routing
or by selecting the desired pairs of matrix
pins to be connected or disconnected.
• The interconnect buffers are available to
propagate signals in either direction on a
given general interconnect segment.
• These bidirectional (bidi) buffers are found
adjacent to the switching matrices, above and
to the right.

Direct Interconnect
• Direct interconnect provides the most efficient implementation
of networks between adjacent CLBs or I/O blocks. Signals routed
from block to block using the direct interconnect exhibit
minimum interconnect propagation delay and use no general
interconnect resources.

Contd..

Contd…
• Direct interconnect should be used to maximize the speed of
high-performance portions of logic.
• Where logic blocks are adjacent to IOBs, direct connect is
provided alternately to the IOB inputs (I) and outputs (O) on all
four edges of the die.
• The right edge provides additional direct connects from CLB
outputs to adjacent IOBs.

Long lines
• The Long lines bypass the switch matrices and
are intended primarily for signals that must
travel a long distance, or must have minimum
skew among multiple destinations.
• Long lines run vertically and horizontally the
height or width of the interconnect area.
• Each interconnection column has three
vertical Long lines, and each interconnection
row has two horizontal Long lines.

Contd..

Contd…
• Two additional Long lines are located adjacent to the outer sets
of switching matrices.
• Long lines can be driven by a logic block or IOB output on a
column-by-column basis.
• This capability provides a common low skew control or clock line
within each column of logic blocks.
• Isolation buffers are provided at each input to a Long line and
are enabled automatically by the development system when a
connection is made.

Technology Mapping for FPGA
• The high functionality of FPGA logic blocks presents new
challenges for logic synthesis. Technology mapping
addresses these challenges for FPGAs that use lookup tables to
implement combinational logic.
• Technology mapping is the process of transforming a
technology-independent Boolean network into a technology-dependent
network.
• For example, a K-input lookup table (LUT) is a
digital memory that can implement any Boolean
function of K variables.
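
A K-input LUT viewed as a small memory can be sketched as follows. This is an illustrative model only: the class name `LUT`, the helper `from_function` and the bit ordering (input 0 is the least significant address bit) are assumptions made for this example, not a vendor convention.

```python
class LUT:
    """K-input lookup table backed by 2**K configuration bits."""

    def __init__(self, k, bits):
        if len(bits) != 1 << k:
            raise ValueError("a K-input LUT needs exactly 2**K bits")
        self.k = k
        self.bits = list(bits)

    @classmethod
    def from_function(cls, k, func):
        """Fill the table by evaluating func on every input combination."""
        bits = []
        for addr in range(1 << k):
            args = [(addr >> i) & 1 for i in range(k)]
            bits.append(int(bool(func(*args))))
        return cls(k, bits)

    def read(self, *inputs):
        """Use the inputs as a memory address and return the stored bit."""
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[addr]

# Hypothetical example: a 3-input majority function stored in 8 bits
majority = LUT.from_function(3, lambda a, b, c: (a + b + c) >= 2)
print(majority.bits, majority.read(1, 1, 0))
```

Changing the function never changes the hardware in this model, only the stored bits, which is why a single LUT can stand in for any function of its inputs.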

Contd..
• Technology mapping is the logic synthesis task that is directly
concerned with selecting the circuit elements used to implement
the optimized circuit.
• Previous approaches to technology mapping have focused on
using circuit elements from a limited set of simple gates.
• However, such approaches are inappropriate for complex logic
blocks where each logic block can implement a large number of
functions.

Library-Based Technology Mapping
• In library based mapping, gates or components are selected from a
technology library to implement a circuit.
• Hence it is also referred to as library binding. So, this method
generates a technology mapping for a given Boolean network
using a characterized cell library with the objective of cost
optimization or delay optimization

Contd..
• In this method, the set of available circuit elements is
represented as a library of functions, and the
construction of the optimized circuit is divided into
three subproblems:
(i) Decomposition, (ii) Matching and (iii) Covering.
• The original network is first decomposed into a
canonical representation that uses limited-fan-in
NAND nodes.
• This decomposition guarantees that no node
in the network is too large to be
implemented by a library element, provided the
library includes NAND gates that reach the fan-in
limit.

contd..
• After decomposition, the network is
partitioned into a forest of trees. The
optimal sub-circuit covering each tree is
constructed, and finally the circuit covering
the entire network is assembled from these
sub-circuits.
• To form the forest of trees, the
decomposed network is partitioned at
fanout nodes into a set of single-output
sub-networks.
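
The partitioning step described above can be sketched in Python. This is a simplified illustration, not the algorithm of any particular mapper: the function name `partition_into_trees`, the dictionary representation of the network and the example netlist `NET` are all assumptions. A gate becomes a tree root if it drives a primary output or feeds more than one other gate.

```python
from collections import Counter

def partition_into_trees(fanins, outputs):
    """Split a gate network at multi-fanout nodes.

    fanins  : dict mapping each gate name to the list of its input names
              (names that are not keys are primary inputs)
    outputs : list of gate names that drive primary outputs
    Returns a dict mapping each tree root to the set of gates in its tree.
    """
    fanout = Counter(src for ins in fanins.values() for src in ins)
    roots = set(outputs) | {g for g in fanins if fanout[g] > 1}
    trees = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:
            gate = stack.pop()
            members.add(gate)
            for src in fanins[gate]:
                # Stop at primary inputs and at the roots of other trees.
                if src in fanins and src not in roots:
                    stack.append(src)
        trees[root] = members
    return trees

# Hypothetical NAND network: n1 fans out to both n2 and n3.
NET = {"n1": ["a", "b"], "n2": ["n1", "c"], "n3": ["n1", "d"], "n4": ["n2", "n3"]}
print(partition_into_trees(NET, ["n4"]))
```

Here `n1` becomes its own single-gate tree because it has two fanouts, and the tree rooted at `n4` treats `n1` as a leaf input.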

Contd..
• The major obstacle to applying library-based
technology mapping to LUT circuits
is the large number of different functions
that a K-input LUT can implement.
• The function implemented by a K-input
LUT is determined by the values stored in
its 2^K memory bits. Since each bit can
independently be either 0 or 1, there are
2^(2^K) different Boolean functions of K
variables.
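
The growth in the number of functions can be checked with a short calculation. The two helper names below are hypothetical and exist only for this worked example.

```python
def lut_config_bits(k):
    """Number of memory bits in a K-input LUT."""
    return 2 ** k

def lut_function_count(k):
    """Number of distinct Boolean functions of K variables."""
    return 2 ** lut_config_bits(k)

for k in range(2, 6):
    print(k, lut_config_bits(k), lut_function_count(k))
```

A 4-input LUT already has 16 bits and 65,536 possible functions, which is why enumerating every function as a separate library cell quickly becomes impractical.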

Contd..
• For values of K greater than 3 the library required
to represent a K-input LUT becomes very large.
• The size of the library can be reduced by noting
that some patterns are equivalent after a
permutation of inputs.
• The inversion of outputs or inputs, which is
trivially accomplished with a LUT, can also
produce equivalent patterns.
• Another alternative is to use a partial library tuned
to take advantage of the network structure likely to
be produced by technology independent logic
optimization.
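
The library reduction described above can be illustrated by brute force for small K. This is a sketch under stated assumptions, not a production technique: the function names `transform` and `count_classes` are hypothetical, truth tables are stored as integers with input 0 as the least significant address bit, and the exhaustive search is only practical for K up to about 3.

```python
from itertools import permutations, product

def transform(tt, k, perm, flips, invert_out):
    """Truth table of f with permuted/inverted inputs and optional output inversion."""
    out = 0
    for i in range(1 << k):
        j = 0
        for b in range(k):
            bit = ((i >> b) & 1) ^ flips[b]
            j |= bit << perm[b]
        out |= (((tt >> j) & 1) ^ invert_out) << i
    return out

def count_classes(k, allow_negation):
    """Count functions of K inputs that remain distinct after the allowed transforms."""
    perms = list(permutations(range(k)))
    flip_sets = list(product((0, 1), repeat=k)) if allow_negation else [(0,) * k]
    out_invs = (0, 1) if allow_negation else (0,)
    canonical = set()
    for tt in range(1 << (1 << k)):
        canonical.add(min(transform(tt, k, p, f, o)
                          for p in perms for f in flip_sets for o in out_invs))
    return len(canonical)

for k in (2, 3):
    print(k, 2 ** (2 ** k), count_classes(k, False), count_classes(k, True))
```

For K = 2, the 16 functions collapse to 12 patterns under input permutation alone and to 4 patterns once input and output inversions are also allowed.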

LUT-based Technology Mapping
• The limitations of earlier technology mapping approaches
paved the way for technology mapping methods
that deal specifically with LUT circuits.
• The first LUT-based technology mappers appeared in the 1990s and
were later improved to optimize the delay of LUT
circuits by minimizing the number of LUT levels in the final
circuit.

Contd..
• In LUT-based FPGAs (for example, XILINX FPGAs) the building
blocks are LUTs and flip-flops.
• In a LUT-based FPGA chip, the basic programmable logic
block is a K-input look-up table (K-LUT), which can
implement any Boolean function of up to K variables.
• Technology mapping for LUT-based FPGA designs covers
a general Boolean network with K-LUTs to obtain a functionally
equivalent K-LUT network.

Contd..
• The main objectives in LUT mapping are:
(i) Cost-optimal mapping, i.e. minimizing the
number of LUTs and the number of
CLBs
(ii) Delay-optimal mapping, i.e. minimizing the
number of LUT levels and the delays
(including routing delays)
(iii) Maximizing the routability of the mapping
solution.
• LUT-based technology mapping can be implemented
using two types of algorithms:
(a) the area algorithm and (b) the delay algorithm
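
The cost and delay objectives listed above can be made concrete by measuring an already mapped network. The sketch below does not perform mapping; it only computes the two metrics, and the function name `lut_count_and_depth` and the example network `MAPPED` are hypothetical.

```python
def lut_count_and_depth(lut_fanins, outputs):
    """Return (number of LUTs, number of LUT levels) for a mapped network.

    lut_fanins : dict mapping each LUT name to the list of its input names
                 (names that are not keys are primary inputs at level 0)
    outputs    : list of LUT names that drive primary outputs
    """
    depth = {}

    def level(node):
        if node not in lut_fanins:
            return 0
        if node not in depth:
            depth[node] = 1 + max((level(s) for s in lut_fanins[node]), default=0)
        return depth[node]

    return len(lut_fanins), max(level(o) for o in outputs)

# Hypothetical 4-LUT mapping with three LUTs in a chain-like structure
MAPPED = {"l1": ["a", "b", "c", "d"], "l2": ["l1", "e", "f"], "l3": ["l1", "l2"]}
print(lut_count_and_depth(MAPPED, ["l3"]))
```

An area-oriented algorithm would try to reduce the first number and a delay-oriented algorithm the second; the two goals can conflict, because duplicating logic often shortens the longest path.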

MULTIPLEXER BASED TECHNOLOGY
MAPPING
• Multiplexer-based technology mapping is used for ACTEL
FPGAs, whose logic block architectures are MUX based. It is also
relevant to the dedicated multiplexers found in recent Xilinx Virtex-6 devices.
• In Actel FPGAs, the multiplexers are small and
suited to the objectives of area optimization and
minimum delay.

Contd..
• Circuits usually contain a large number of
multiplexers (MUXes).
• This is especially true for circuits that are
automatically synthesized from high-level
descriptions.
• MUXes exist in the data-paths of circuits, where
they are used to route operands to operators. Also,
the control logic is frequently specified as a CASE
statement in HDL descriptions.
• MUXes arise as a result of a direct translation of
CASE statements in HDLs into a logic-level
description.
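
The correspondence between a CASE statement and multiplexers can be sketched as follows. Python stands in for the HDL here, and the function names `mux2`, `mux4` and `case_statement` are illustrative assumptions rather than part of any tool flow.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def mux4(sel, d):
    """4:1 multiplexer built as a tree of three 2:1 multiplexers."""
    s0, s1 = sel & 1, (sel >> 1) & 1
    low = mux2(s0, d[0], d[1])
    high = mux2(s0, d[2], d[3])
    return mux2(s1, low, high)

def case_statement(sel, d):
    """Behavioural description, written the way a CASE statement reads."""
    if sel == 0:
        return d[0]
    elif sel == 1:
        return d[1]
    elif sel == 2:
        return d[2]
    return d[3]

data = [0, 1, 1, 0]
print([mux4(s, data) for s in range(4)], [case_statement(s, data) for s in range(4)])
```

Both functions select the same data input for every value of `sel`, which is why a synthesis tool can translate the CASE form directly into a MUX tree.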

Contd..
• The main objective of MUX-based technology mapping
is to take a combinational circuit described by Boolean
equations and realize it using the minimum number of basic blocks
of the target MUX-based architecture, while minimizing the delay
on the critical path.
• In this algorithm, an appropriate base function, a library of cells
and a set of pattern graphs are selected.

Contd…
• An advantage of MUX-based technology mapping is that it
generates optimal mappings, which are often much better than
those produced by conventional heuristic techniques.
• Moderately large circuits can be mapped optimally in a small
amount of time. Very large circuits can be mapped near-optimally
by partitioning the circuits and mapping each partition
individually.

Programming Technologies
• A number of programming technologies have been
used for reconfigurable architectures.
• Each of these technologies has different characteristics and
a significant effect on the programmable architecture.
• Some of the well-known technologies are:
(i) SRAM-based programming technology, (ii) Flash
programming technology (EEPROM), and (iii) Anti-fuse based
programming technology

SRAM-Based Programming Technology
• Static memory cells are the basic cells used for SRAM-based
FPGAs.
• Most commercial vendors, such as XILINX, Lattice and Altera,
use static memory (SRAM) based programming technology
in their devices.
• These devices use static memory cells which are distributed
throughout the FPGA to provide configurability.

Contd..
• There are two primary uses for the SRAM cells. Most of them
are used to set the select lines to multiplexers that steer
interconnect signals.
• The majority of the remaining SRAM cells are used to store the
data in the lookup-tables (LUTs) that are typically used in
SRAM-based FPGAs to implement logic functions.
• Historically, SRAM cells were used to control the tri-state
buffers and simple pass transistors that were also used for
programmable interconnect.

• SRAM-based programming technology has become the dominant
approach for FPGAs because of its re-programmability and its
use of standard CMOS process technology. As a result, it benefits
from the increased integration, higher speed and lower dynamic power
consumption of each new process with smaller geometry.

Contd..
• There are however a number of drawbacks
associated with SRAM-based programming
technology.
• For example an SRAM cell requires 6
transistors which makes this technology costly
in terms of area compared to other
programming technologies.
• Further SRAM cells are volatile in nature and
external devices are required to permanently
store the configuration data.
• These external devices add to the cost and area
overhead of SRAM-based FPGAs.

Flash Programming Technology
• An important alternative to the SRAM-based
programming technology is the use of flash or
EEPROM based programming technology. This
technology injects charge onto a gate that “floats”
above the transistor.
• This approach is used in flash or EEPROM
memory cells. These cells are non-volatile; they
do not lose information when the device is
powered down.
• With modern IC fabrication processes, it has
become possible to use the floating gate cells
directly as switches.

Contd..
• Flash memory cells, in particular, are now
used because of their improved area
efficiency.
• The widespread use of flash memory cells
for non-volatile memory chips ensures that
flash manufacturing processes will benefit
from steady decreases in process
geometries.
• Flash-based programming technology
offers several advantages. For example, this
programming technology is nonvolatile in
nature.

• Flash-based programming technology is
also more area efficient than SRAM-based
programming technology.
• Flash-based programming technology has
its own disadvantages also.
• Unlike SRAM-based programming
technology, flash based devices cannot be
reconfigured/reprogrammed an infinite
number of times.
• Also, flash-based technology uses a
non-standard CMOS process.

Contd..
• This flash-based programming technology offers several unique
advantages, most importantly non-volatility.
• This feature eliminates the need for the external resources
required to store and load configuration data when SRAM-based
programming technology is used.
• Additionally, a flash-based device can function immediately upon
power-up instead of having to wait for the loading of
configuration data.

Contd..
• The flash approach is more area efficient than SRAM-based
technology which requires up to six transistors to implement the
programmable storage.
• The programming circuitry, such as the high and low voltage
buffers needed to program the cell, contributes an area overhead
not present in SRAM-based devices.

Contd..
• In devices from Altera, Xilinx and Lattice, on-chip flash
memory is used to provide non-volatile storage while SRAM
cells are still used to control the programmable elements in the
design.
• This addresses the problems associated with the volatility of
pure-SRAM approaches, such as the cost of additional storage
devices or the possibility of configuration data interception,
while maintaining the infinite re-configurability of SRAM-
based devices

Anti-fuse Programming Technology
• An alternative to SRAM and floating
gate-based technologies is anti-fuse
programming technology.
• This technology is based on structures
which exhibit very high-resistance
under normal circumstances but can be
programmably “blown” (in reality,
connected) to create a low resistance
link.

Contd..
• An anti-fuse is a two terminal device with an
unprogrammed state presenting a very high resistance
between its terminals.
• When a high voltage (from 11 to 20 volts, depending on the
type of anti-fuse) is applied across its terminals the anti-
fuse will blow and create a low resistance link.
• This link is permanent.

• Programming an anti-fuse requires extra
circuitry to deliver the high
programming voltage and a relatively
high current of 5 mA or more.
• This is done through fairly sizable
pass transistors that provide addressing to
each anti-fuse. Anti-fuse technology is
used in FPGAs from Actel, QuickLogic
and Crosspoint.

Contd..
• A major advantage of the anti-fuse is its
small size, little more than the cross-
section of two metal wires.
• But this advantage is limited by the large
size of the necessary programming
transistors, which handle large currents,
and the inclusion of isolation transistors
that are sometimes needed to protect low
voltage transistors from high programming
voltages.

Contd..
• A second major advantage of an anti-fuse is its relatively low
series resistance.
• The on-resistance of the ONO anti-fuse is 300 to 500 ohms, while
that of the amorphous silicon anti-fuse is 50 to 100 ohms.
• Additionally, the parasitic capacitance of an unprogrammed
amorphous anti-fuse is significantly lower than for other
programming technologies.

Contd..
• A limitation of this technology is that it does not
make use of a standard CMOS process.
• Also, devices based on anti-fuse programming technology cannot be
reprogrammed.
• The ideal technology would be re-programmable and non-volatile,
and would use a standard CMOS process.
• None of the above technologies satisfies all of these
conditions.

Comparison of Programming Technologies
Despite the trade-offs described above,
SRAM-based programming technology is the most
widely used programming technology. The main
reason is its use of a standard CMOS process. For
this reason, it is expected that this technology will
continue to dominate the other two programming
technologies.

XILINX XC3000 FPGA Device
• Xilinx introduced the first FPGA family, called the XC2000
series, in 1985 and later offered further series of FPGAs,
including the XC3000, XC4000, and XC5000.
• The XC3000 series of FPGA devices was introduced in 1987 by
XILINX Inc.
• This was the most successful family of FPGAs. The XC3000
architecture includes enhancements to the XC2000 architecture to
improve performance, density and usability.

Contd..
• The XC3000 Family covers a range of nominal device densities
from 2,000 to 9,000 gates, practically achievable densities from
1,000 to 6,000 gates with up to 144 user-definable I/Os.
• The XC3000 Configurable Logic Block is substantially larger
than that of the XC2000; each of its lookup tables has four inputs and
requires 16 bits of configuration memory.
• There are now four distinct families within the XC3000 Series
of FPGA devices.

XC3000 Family of Devices
The basic LCA (Logic Cell Array) of the XC3000
consists of three components: the
Programmable I/O Blocks, the Configurable Logic
Blocks and the Programmable Interconnect. In addition,
a small amount of configuration memory is
also present.

Programmable I/O Block
• The I/O block of the XC3000 is more complex than the XC2000
IOB. The important addition is a flip-flop in the output
path.
• Because the data is registered in the IOB, the clock-to-output time does not
include interconnect delays.
• Each user-configurable IOB provides an interface between the
external package pin of the device and the internal user logic.
Each IOB includes both registered and direct input paths.

Programmable I/O Block

Contd..
• Each IOB includes input and output storage elements and I/O
options selected by configuration memory cells.
• A choice of two clocks is available on each die edge. The
polarity of each clock line (not each flip-flop or latch) is
programmable.
• Each input circuit also provides input clamping diodes to provide
electrostatic protection, and circuits to inhibit latch-up produced
by input currents.

Configurable Logic Block(CLB)
• The XC3000 CLB is substantially larger than the XC2000 CLB.
• Each of the look-up tables has four inputs rather than three and
hence requires sixteen bits of configuration memory rather than
eight.
• The lookup tables can be combined with a multiplexer to
produce any function of five inputs and some functions of up to
seven inputs
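
The way two four-input lookup tables and a multiplexer produce any function of five inputs can be sketched with a Shannon expansion on the fifth input. This is a simplified model, not the actual XC3000 CLB wiring: the function names `split_five_input` and `clb_eval`, and the choice of E as the select input, are assumptions made for illustration.

```python
from itertools import product

def split_five_input(func):
    """Shannon-expand a 5-input function on E into two 16-bit lookup tables."""
    def table(e):
        return [int(bool(func(n & 1, (n >> 1) & 1, (n >> 2) & 1, (n >> 3) & 1, e)))
                for n in range(16)]
    return table(0), table(1)

def clb_eval(tables, a, b, c, d, e):
    """Read both 4-input tables and let E pick one through a 2:1 multiplexer."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    f_out, g_out = tables[0][addr], tables[1][addr]
    return g_out if e else f_out

def parity5(a, b, c, d, e):
    """Example target: odd parity of five inputs."""
    return a ^ b ^ c ^ d ^ e

TABLES = split_five_input(parity5)
ok = all(clb_eval(TABLES, *v) == parity5(*v) for v in product((0, 1), repeat=5))
print(len(TABLES[0]), len(TABLES[1]), ok)
```

Each table holds the sixteen bits mentioned above, and because any five-input function can be split this way, the pair of tables plus one multiplexer covers all of them.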

• The XC3000 CLB has two flip-flops, so that all
combinational logic can be followed by a pipelining flip-flop.
• The register-rich CLB allows the XC3000 to implement state-intensive
applications and heavily pipelined designs efficiently.
• Each CLB has a combinatorial logic section, two flip-flops, and an
internal control section. The CLB has five logic inputs (A, B, C, D
and E).

XC3000 CLB

Contd..
• Data input for the flip-flops within a CLB is supplied from the
function F or G outputs of the combinatorial logic, or the block
input, DI.
• Both flip-flops in each CLB share the asynchronous RD which,
when enabled, is dominant over clocked inputs.
• All flip-flops are reset by the active-Low chip input, RESET, or
during the configuration process.

Programmable Interconnect
• Programmable-interconnection resources in the
Field Programmable Gate Array provide routing
paths to connect inputs and outputs of the IOBs
and CLBs into logic networks.
• Interconnections between blocks are composed of
a two-layer grid of metal segments.
• Specially designed pass transistors, each
controlled by a configuration bit, form
programmable interconnect points (PIPs) and
switching matrices used to implement the
necessary connections between selected metal
segments and block pins.

Contd..
• The XC3000 interconnect structure has five general interconnect
lines both vertically and horizontally.
• In addition, each CLB has direct connections to adjacent CLBs
both vertically and horizontally.
• Three types of metal resources are provided to accommodate
various network interconnect requirements:
• General Purpose Interconnect
• Direct Connection
• Long lines (multiplexed busses and wide AND gates)

XC3000 Interconnect

XILINX XC4000 FPGA Device
• The XC4000 was designed to improve performance
and gate density for large designs.
• Several dedicated features were added to the general-purpose
logic features of the XC3000, resulting in an
interesting combination of special-purpose and
general-purpose functions.
• The XC4000 family was designed using placement
and routing tools to evaluate architectural decisions.

The basic building blocks in the XC4000
family
• Look-up tables are used to implement logic functions.
• A designer can use a function generator to implement
any Boolean function of a given number of inputs by
pre-loading the memory with the bit pattern
corresponding to the truth table of the function.
• All functions of a function generator have the same timing:
the time to look up the result in the memory.
• Therefore, the inputs to the function generator are fully
interchangeable by simple rearrangement of the bits in
the look-up table.

Contd..
• A Programmable Interconnect Point (PIP) is a pass transistor
controlled by a memory cell.
• The PIP is the basic unit of the configurable interconnect
mechanism.
• The wire segments on each side of the transistor are connected
depending on the value in the memory cell.
• The pass transistor introduces resistance into the interconnected
paths and hence delay occurs.
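
The effect of PIP memory cells on connectivity can be sketched by grouping wire segments into nets. This is an illustrative model only: the function name `connected_segments`, the segment numbering and the example PIP list are assumptions, and the resistance and delay of each pass transistor are not modelled.

```python
def connected_segments(num_segments, pips, config_bits):
    """Group wire segments into electrical nets.

    pips        : list of (segment_a, segment_b) pairs, one per PIP
    config_bits : one 0/1 memory-cell value per PIP (1 = transistor on)
    """
    parent = list(range(num_segments))

    def find(x):
        # Follow parent links to the representative of x's net.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), bit in zip(pips, config_bits):
        if bit:
            parent[find(a)] = find(b)

    nets = {}
    for seg in range(num_segments):
        nets.setdefault(find(seg), set()).add(seg)
    return sorted(sorted(n) for n in nets.values())

# Hypothetical row of four segments joined by three PIPs
PIPS = [(0, 1), (1, 2), (2, 3)]
print(connected_segments(4, PIPS, [1, 0, 1]))
```

With the middle PIP off, the four segments form two separate nets; a signal that crosses several enabled PIPs in series accumulates the resistance of each one, which is the source of the delay noted above.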

Advanced Features of the XC4000
FPGAs
1. CLBs can be used as on-chip RAM
2. Fast carry chain for high speed implementation of
arithmetic
3. Boundary scan compatibility (JTAG)
4. Wide decode logic, More global clocks
5. Faster placement and routing algorithms
6. Scaled routing resources.

Configurable Logic Block (CLB)
• The XC4000 CLB is similar to the XC3000 CLB. It
contains three look-up tables (F, G and H) and two
flip-flops.
• The two primary look-up tables, F and G, can each
implement any function of four variables.
• These two results can be brought out of the block
independently, or they can be combined with
another input in the H look-up table to make any
function of five inputs or some functions of up to
nine inputs.
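
One way to see why F, G and H together cover any function of five inputs is a Shannon expansion about the fifth input. The Python sketch below assumes that F and G share the same four inputs and that H is a three-input table fed by F, G and the extra input; the helper names are hypothetical.

```python
def lut(func, n):
    """Build an n-input look-up table (a list of 2**n bits) from a Python function."""
    return [func(*[(addr >> (n - 1 - k)) & 1 for k in range(n)]) & 1
            for addr in range(2 ** n)]

def read(table, *inputs):
    """Read one bit from a table; the inputs form the address (first input is the MSB)."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit
    return table[addr]

def clb_five_input(target):
    """Split target(a, b, c, d, e) across F, G and H.
    F stores the function with e = 0, G stores it with e = 1,
    and H selects between them under control of e."""
    F = lut(lambda a, b, c, d: target(a, b, c, d, 0), 4)
    G = lut(lambda a, b, c, d: target(a, b, c, d, 1), 4)
    H = lut(lambda f, g, e: g if e else f, 3)

    def evaluate(a, b, c, d, e):
        return read(H, read(F, a, b, c, d), read(G, a, b, c, d), e)
    return evaluate

# Example: 5-input majority (1 when at least three inputs are 1)
majority5 = lambda a, b, c, d, e: int(a + b + c + d + e >= 3)
clb = clb_five_input(majority5)
```

When F and G are given different inputs instead, the block sees up to nine inputs, but only those functions that decompose as H(F(...), G(...), x) can be realized, which is why the slide says "some functions".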

Contd..
• The XC3000 can implement arithmetic with sum
in one look-up table and carry in another look-up
table.
• The XC4000 CLB can also implement arithmetic in this
way, but because the speed of an arithmetic
operation is dominated by the speed of the carry
chain, the XC4000 CLB includes dedicated high-speed
carry logic.
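
A rough Python sketch of the sum-in-one-table, carry-in-another arrangement is given below. It is a behavioural model only, with hypothetical function names, and it does not model the dedicated carry hardware itself.

```python
def full_adder_bit(a, b, cin):
    """One bit position: sum and carry are two separate Boolean functions,
    as they would be when placed in two look-up tables."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(x, y, width):
    """Add two unsigned integers bit by bit.
    Returns (sum, carry_out, carry_stages), where carry_stages counts the
    stages the carry signal has to pass through in sequence."""
    carry, total, stages = 0, 0, 0
    for i in range(width):
        s, carry = full_adder_bit((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
        stages += 1
    return total, carry, stages

result = ripple_add(13, 7, 8)   # (20, 0, 8)
```

Each sum bit needs the carry from the stage below it, so the carry path grows with the word width while each sum computation stays one look-up deep. That is the path the dedicated carry logic is meant to shorten.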

Block Diagram-CLB

XC4000 I/O BLOCK
• Signals to be output from the chip can be registered before
output and enabled by a separate control signal.
• Outputs can optionally be pulled up or down, and the output
driver can be configured with either a fast or a slow slew rate.
• Inputs from the pad can be brought into the interior of the chip
directly, registered, or both, to facilitate multiplexed bus
interfaces.

Contd..
• The XC4000 IOB includes boundary scan logic compatible with
the ANSI/IEEE 1149.1 (JTAG) boundary scan standard.
• The boundary scan can check internal logic or external logic.
• Scan operations can take place before and after the FPGA is
programmed and do not interfere with the operation of the part.

Interconnect Structure
• The XC4000 interconnect is arranged in horizontal
and vertical channels.
• Each channel contains some number of short wire
segments that span a single CLB (the number of
segments in each channel depends on the specific
part number), longer segments that span two CLBs,
and very long segments that span the entire length
or width of the chip.
• Programmable switches are available to connect the
inputs and outputs of the CLBs to the wire
segments, or to connect one wire segment to
another.

Contd..
The figure below shows only the wire
segments in a horizontal channel, and does
not show the vertical routing channels, the
CLB inputs and outputs, or the routing
switches.

Contd..
• The salient feature about the Xilinx interconnect is that signals
must pass through switches to reach one CLB from another, and
the total number of switches traversed depends on the particular
set of wire segments used.
• Thus, speed-performance of an implemented circuit depends in
part on how the wire segments are allocated to individual signals
by CAD tools.
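
The dependence of delay on segment allocation can be illustrated with a toy model in Python. The counting rule (one switch to enter the routing, one to leave it, and one at each junction between segments) and the delay value are assumptions made for illustration, not Xilinx data.

```python
import math

def switches_traversed(distance_clbs, segment_span):
    """Toy model: one switch to enter the routing, one to leave it, and
    one at every junction between consecutive wire segments."""
    segments = math.ceil(distance_clbs / segment_span)
    return segments + 1

def routing_delay(distance_clbs, segment_span, t_switch_ns=1.0):
    """Delay estimate counting only switch delay (t_switch_ns is a made-up value)."""
    return switches_traversed(distance_clbs, segment_span) * t_switch_ns

# Same source and destination, 8 CLBs apart, three different allocations
single = switches_traversed(8, 1)      # single-length segments: 9 switches
double = switches_traversed(8, 2)      # double-length segments: 5 switches
long_line = switches_traversed(8, 8)   # one long line: 2 switches
```

In this model the same connection costs between 2 and 9 switch delays depending only on which segments the router picks.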

Actel FPGAs
• In contrast to XILINX FPGAs, the devices manufactured by
Actel are based on anti-fuse technology.
• Actel offers three main families: Act 1, Act 2, and
Act 3.
• Actel devices are based on a structure similar to traditional gate
arrays; the logic blocks are arranged in rows and there are
horizontal routing channels between adjacent rows.

LOGIC BLOCK –ACTEL FPGA

Contd..
• The logic blocks in the Actel devices are relatively small in
comparison to the LUT-based ones, and are based on
multiplexers.
• Each logic block comprises an AND gate and an OR gate that are
connected to a multiplexer-based circuit block.
• The multiplexer circuit is arranged such that, in combination
with the two logic gates, a very wide range of functions can be
realized in a single logic block.
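
A small Python sketch can show how a multiplexer-based block yields different functions just by tying its inputs to signals or constants. The wiring below (an AND gate driving the first-stage selects, an OR gate driving the final select) is a hypothetical arrangement chosen for illustration and is not taken from an Actel data sheet.

```python
def mux2(sel, d0, d1):
    """2-to-1 multiplexer: passes d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def logic_module(d00, d01, d10, d11, a0, a1, o0, o1):
    """Hypothetical wiring, for illustration only: the AND gate selects
    within both first-stage multiplexers, the OR gate drives the final select."""
    s_first = a0 & a1
    s_final = o0 | o1
    low = mux2(s_first, d00, d01)
    high = mux2(s_first, d10, d11)
    return mux2(s_final, low, high)

# Different functions from the same block, chosen only by how inputs are tied
and2 = lambda x, y: logic_module(0, 1, 0, 0, x, y, 0, 0)
or2 = lambda x, y: logic_module(0, 0, 1, 1, 1, 1, x, y)
and_or = lambda x, y, z: logic_module(0, 1, 1, 1, x, y, z, 0)   # (x AND y) OR z
select = lambda s, p, q: logic_module(p, p, q, q, 1, 1, s, 0)   # q if s else p
```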

Contd..
• Actel’s interconnect is organized in horizontal
routing channels.
• The channels consist of wire segments of various
lengths with anti-fuses to connect logic blocks to
wire segments or one wire to another.
• Also, Actel chips have vertical wires that overlay
the logic blocks, for signal paths that span multiple
rows.
• In terms of speed-performance, it is evident that
Actel chips are not fully predictable, because the
number of anti-fuses traversed by a signal depends
on how the wire segments are allocated during
circuit implementation by CAD tools.

QuickLogic pASIC FPGAs
• QuickLogic is the main competitor to Actel in
anti-fuse-based FPGAs.
• It produces two families of devices, called pASIC
and pASIC-2. The pASIC-2 is an enhanced version
of pASIC.
• The pASIC consists of a regular two-dimensional
array of blocks called pASIC Logic Blocks (pLBs).
• The logic capacity of the first generation of QuickLogic
FPGAs is between 48 and 380 pLBs, or 500 to
4000 equivalent MPGA gates.

Contd..
As shown in the figure below, pASIC has similarities to
other FPGAs: the overall structure is array-based
like Xilinx FPGAs, the logic blocks use multiplexers
similar to Actel FPGAs, and the interconnect consists
of only long lines, as in the Altera FLEX 8000.

Contd..
• pASIC’s multiplexer-based logic block is shown in the
figure below. It is more complex than Actel’s Logic
Module, with more inputs and wide (6-input) AND
gates on the multiplexer select lines. Every logic block
also contains a flip-flop.

Altera FLEX 8000 and FLEX 10000 FPGAs
• The first FPGA chips from Altera were simple arrays of logic
cells, which are relatively simple logic elements (LEs), each
comprising a three-input look-up table (LUT) to
generate logic functions, a single configurable flip-flop, and
multiplexers for routing the signals and selecting clocks.
• The logic cells were connected by switch boxes instead of fixed
interconnect. The general architecture of Altera’s FPGAs is
shown in the next slide.

Architecture of ALTERA FPGA

• There are two high performance FPGA series called FLEX series.
• Altera’s FLEX 8000 series consists of a three-level hierarchy similar
to CPLDs.
• However, the lowest level of the hierarchy consists of a set of
lookup tables, rather than an SPLD-like block, and so the FLEX
8000 is categorized here as an FPGA.

Contd..
• The architecture of the FLEX 8000 is shown in the next slide.
• The basic logic block, called a Logic Element (LE), contains a
four-input LUT, a flip-flop, and special-purpose carry circuitry
for arithmetic circuits (similar to the Xilinx XC4000).
• The LE also includes cascade circuitry that allows for efficient
implementation of wide AND functions.

Architecture of Altera FLEX 8000 FPGA

contd..
• A major difference between the FLEX 8000 and Xilinx chips is that
the FLEX 8000 interconnect, called FastTrack, consists of only long
lines. This makes the FLEX 8000 easy for CAD tools to configure
automatically.
• All horizontal FastTrack wires are identical, so
interconnect delays in the FLEX 8000 are more predictable than in
FPGAs that employ many shorter segments, because there
are fewer programmable switches in the longer paths.

contd..
• Predictability is further aided by the fact that connections
between horizontal and vertical lines pass through active buffers.
• The FLEX 8000 architecture has been extended in the
state-of-the-art FLEX 10000 family.
• FLEX 10000 offers all of the features of FLEX 8000, with the
addition of variable-sized blocks of SRAM called Embedded
Array Blocks (EABs). Each row in a FLEX 10000 chip has an
EAB on one end.

Concurrent Logic FPGA Device
• The manufacturer Concurrent Logic offers the CFA6006 FPGA
device, which is based on a two-dimensional array of identical
blocks, where each block is symmetrical on its four sides.
• The array holds 3136 such blocks, providing a total logic
capacity of about 5000 equivalent gates.
• Connections are formed using multiplexers that are configured
by a static RAM programming technology.

Contd..
• The structure of the Concurrent Logic Block is shown in the next
slide. It comprises user-configurable multiplexers, basic gates,
and a D-type flip-flop.
• The Concurrent FPGA is especially suitable for register-intensive
and arithmetic applications, since the logic block can easily
implement a half-adder and a register bit.

Structure of the Concurrent Logic Block

Crosspoint Solutions FPGAs
• The Crosspoint FPGAs are different from other FPGAs because
they are configurable at the transistor level, as opposed to the
logic block level in other FPGAs.
• Basically, the architecture consists of rows of transistor pairs,
where the rows are separated by horizontal wiring segments.
• Vertical wiring segments are also available for connection
among the rows.

Contd..
• Each transistor row comprises two lines of series-connected
transistors, with one line being NMOS and the other PMOS.
• The wiring resources allow individual transistor pairs to be
interconnected to implement CMOS logic gates.
• The programming technology used for the programmable
switches is similar to the Via-Link anti-fuse ,which is based on
amorphous silicon.

Contd..
• The structure of the transistor pair rows is
shown in the next slide.
• The diagram shows the implementation of a
NOR gate and a NAND gate using the
transistor lines.
• The transistor gates, drains, and sources can be
programmably interconnected to other
transistors and also to power and ground.

Structure of the Transistor Pair
The series connections across the lines are broken where
necessary by permanently holding a transistor in its
OFF state. A wide range of logic gates can be
implemented by the transistor lines and the
interconnection patterns.
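
How NMOS and PMOS transistor pairs form gates can be sketched with a simple switch-level model in Python. This is generic static CMOS behaviour with hypothetical helper names, and it does not reproduce the exact Crosspoint wiring.

```python
def nmos_on(gate):
    """An NMOS transistor conducts when its gate is 1."""
    return gate == 1

def pmos_on(gate):
    """A PMOS transistor conducts when its gate is 0."""
    return gate == 0

def cmos_gate(pull_down, pull_up, inputs):
    """Output is 0 when the pull-down network conducts, 1 when the pull-up
    network conducts. In a complementary gate exactly one of them does."""
    down = pull_down(inputs)
    up = pull_up(inputs)
    assert down != up, "complementary networks: exactly one conducts"
    return 0 if down else 1

def nand(*inputs):
    return cmos_gate(lambda x: all(nmos_on(g) for g in x),   # NMOS in series
                     lambda x: any(pmos_on(g) for g in x),   # PMOS in parallel
                     inputs)

def nor(*inputs):
    return cmos_gate(lambda x: any(nmos_on(g) for g in x),   # NMOS in parallel
                     lambda x: all(pmos_on(g) for g in x),   # PMOS in series
                     inputs)
```

The same transistor pairs give a NAND or a NOR depending only on whether each network is connected in series or in parallel, which is the kind of choice the programmable wiring makes.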

contd..
• The FPGAs currently offered by Crosspoint Solutions have a
total logic capacity of 4200 gates.
• The chip has 256 rows of transistor pairs, and an additional 64
rows of multiplexer-like structures are provided.
• With their row-based architecture, anti-fuse programming
technology, and multiplexers, the Crosspoint FPGAs are most
similar to Actel FPGAs.

ALGOTRONIX CAL-1024
• This design has a two-dimensional mesh array structure which
resembles the gate array “sea of gates” architecture.
• Like the Xilinx architecture, Algotronix uses static RAM
programming technology to specify the function performed by
each logic cell and to control the switching of connections
between cells.
• The CAL1024 design contains 1024 identical logic cells
arranged in a 32 × 32 matrix.

contd..
• The design is considered to be a mesh-connected architecture since
each cell is directly connected to its nearest north, south, east, and
west neighbors.
• In addition to these direct connects, two global interconnect signals
are routed to each cell to distribute clock and other “low skew
requirement” control signals.
• The figure in the next slide shows the basic array architecture,
indicating both nearest-neighbor and global connections to the logic cells.

Basic Array Architecture

contd..
• The basic building block of the Algotronix design is a configurable
cell containing multiplexers and a function unit.
• As indicated in the figure, the function unit is preceded by
multiplexers which select the source for the X1 and X2 inputs.
• The function unit is capable of generating any logic function of the
two inputs, or of operating as a D-type latch.
• There are four additional multiplexers which select the function
output or one of the external inputs for routing to each of the four
outputs (north, south, east, and west).
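
The cell described above can be modelled in Python as two input selectors, a 4-bit truth table, and four output selectors. The data layout and the name cal_cell are assumptions made for this sketch, and the D-type latch mode is left out.

```python
DIRECTIONS = ("north", "south", "east", "west")

def cal_cell(inputs, x1_src, x2_src, truth_table, out_src):
    """inputs: dict direction -> bit arriving from that neighbour.
    x1_src, x2_src: which direction feeds X1 and X2 (the input multiplexers).
    truth_table: 4 bits giving the function-unit output for X1X2 = 00, 01, 10, 11.
    out_src: dict direction -> 'F' (function output) or a direction name
             (route that external input straight through)."""
    x1, x2 = inputs[x1_src], inputs[x2_src]
    f = truth_table[(x1 << 1) | x2]
    return {d: f if out_src[d] == "F" else inputs[out_src[d]] for d in DIRECTIONS}

# Example configuration: XOR of the west and north inputs drives the east
# output, while the other three outputs simply pass signals through the cell.
outputs = cal_cell(
    inputs={"north": 1, "south": 0, "east": 1, "west": 0},
    x1_src="west", x2_src="north",
    truth_table=[0, 1, 1, 0],
    out_src={"north": "south", "south": "north", "east": "F", "west": "east"},
)
```

Because the truth table has four bits, the function unit covers all 16 functions of two inputs, and the output selectors let a cell act as logic and as routing at the same time.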

Commercially available FPGAs

FPGA Design Flow
• Early PLD and FPGA designs were performed largely by
hand, but today’s complex programmable logic devices require
the use of an integrated Computer-Aided Design (CAD) system.
• Both commercial CAD tool vendors and FPGA companies offer
appropriate tools.
• For example, traditional Electronic Design Automation (EDA)
vendors such as Cadence, Mentor Graphics, Synopsys, and View
Logic offer tools to support FPGA design.

contd..
• These tools are typically used for the front-end design entry and
simulation operations and provide the necessary interfaces to
vendor-specific back-end tools for chip placement and routing.
• Examples of vendor specific tools are the Xilinx XACT system
and the Altera MAX+PLUS II software.
• Altera’s MAX+PLUS II software supports the entire
design flow on either PC or workstation platforms.

Contd..
• The first step in the design process is the
description of the logic circuit, which can be done
either with a schematic capture tool or with Boolean
expressions.
• This is followed by a translation that converts the
original circuit description into a standard format
used by the CAD tools (e.g., the XILINX
CAD tools).
• The circuit is then passed through CAD programs
that partition it into appropriate logic blocks, select
a specific location in the FPGA for each logic
block, and form the required interconnections
(Cadence, View Logic, OrCAD, etc.).

Initial Design Entry
• The detailed description of the logic circuit is entered
using a schematic capture program. In the design entry
phase, RTL or schematic entry is used to create the logic
to be implemented in the device.
• Pin assignments can also be made, including pin placement
information and timing constraints that might be necessary
for building a functioning design.
• In the design entry step, a schematic or Block Design File
(.bdf) is created as the top-level design. Library of
Parameterized Modules (LPM) functions are added, and
Verilog HDL code is used to add a logic block.
Contd..
• The library may be supplied either by the vendor of the
schematic capture program or by an FPGA vendor (such as Xilinx
or Altera).
• An alternative way to specify the logic circuit is to use Boolean
expressions or a state machine language.
• This is done without the graphical interface. Sometimes it is
possible to use a mixture of schematics and Boolean
expressions.

Translation to XNF Format
After the logic circuit is successfully designed and
merged into one circuit, it is translated into a
special format that is understood by the CAD
tools. For Xilinx, this format is called the Xilinx Netlist
Format (XNF). The translation utility is supplied
by Xilinx or by the vendor of the logic entry
tool. The translation process may also involve
automatic optimization of the circuit.

Partition
• The XNF circuit is partitioned into logic cells (this partition is
also known as Technology Mapping).
• Technology mapping converts the XNF circuit, which is a
netlist of basic logic gates, into a netlist of Xilinx logic cells.
• The logic cell used depends on which Xilinx product the
circuit is to be implemented in. XACT tools also attempt to
optimize the circuit during this step.

Place and Route
• Place and route is performed by CAD tools,
manually by the user, or by a mixture of the two.
• The first step is placement, in which each logic cell generated
during the partition step is assigned to a specific location in the
FPGA.
• Automatic placement can be done using the simulated annealing
algorithm.
• After placement, the required interconnections among the
logic cells must be realized by selecting wire segments and
routing switches within the FPGA interconnection resources.
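
A minimal Python sketch of simulated-annealing placement follows. It assumes a square grid, two-pin nets, a Manhattan wirelength cost, and made-up temperature settings. It is an illustration of the general technique, not the algorithm used inside the XACT tools, and all names are hypothetical.

```python
import math
import random

def wirelength(placement, nets):
    """Cost: sum of Manhattan distances between the two cells of every net."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)

def anneal_placement(cells, nets, grid, seed=0,
                     t_start=5.0, t_end=0.01, alpha=0.95, moves=200):
    """Swap-based annealing. Moves only exchange two cells, so sites that
    start empty stay empty; the example below fills the whole grid."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(sites)
    placement = dict(zip(cells, sites))          # random initial placement
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    t = t_start
    while t > t_end:
        for _ in range(moves):
            a, b = rng.sample(cells, 2)          # propose swapping two cells
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = wirelength(placement, nets)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with a
            # probability that shrinks as the temperature falls.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:                                # reject: undo the swap
                placement[a], placement[b] = placement[b], placement[a]
        t *= alpha                               # cool down
    return best, best_cost

# Example: nine cells connected in a chain, placed on a 3 x 3 grid
cells = list("ABCDEFGHI")
nets = list(zip(cells, cells[1:]))
best, best_cost = anneal_placement(cells, nets, grid=3)
```

Accepting some cost-increasing swaps at high temperature is what lets the placement escape poor local arrangements before the schedule cools.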

Contd..
• The XACT tools provide a critical path timing analyzer,
which reports delay information for paths through the chip,
from the longest to the shortest.
• In addition, the physical layout timing information can also
be back-annotated to the schematics to get more accurate
functional simulation results.
• The final step in the Xilinx design flow is the creation of
the BIT file which contains the binary programming data
needed to configure the SRAM bits of the target chip.
• This file is then downloaded to configure the chip for final
functional and timing tests of the programmed chip.
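
The idea behind a critical path timing analyzer can be sketched in Python as a longest-path search over the placed-and-routed circuit. The graph, the delay values, and the function name below are made up for illustration and do not reflect the XACT implementation.

```python
def critical_path(delays, edges):
    """delays: dict node -> delay of that block (arbitrary units, made-up values).
    edges: list of (src, dst) connections forming a directed acyclic graph.
    Returns (worst_delay, path) for the slowest path through the circuit."""
    succ = {n: [] for n in delays}
    indeg = {n: 0 for n in delays}
    for s, d in edges:
        succ[s].append(d)
        indeg[d] += 1
    arrival = {n: delays[n] for n in delays}   # arrival time at each node's output
    prev = {n: None for n in delays}
    ready = [n for n in delays if indeg[n] == 0]
    while ready:                               # visit nodes in topological order
        n = ready.pop()
        for m in succ[n]:
            if arrival[n] + delays[m] > arrival[m]:
                arrival[m] = arrival[n] + delays[m]
                prev[m] = n
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    end = max(arrival, key=arrival.get)
    path, node = [], end
    while node is not None:                    # walk back along the worst path
        path.append(node)
        node = prev[node]
    return arrival[end], path[::-1]

# Example: two paths from A to D; the one through B is slower
delays = {"A": 2, "B": 3, "C": 1, "D": 2}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
worst, path = critical_path(delays, edges)     # (7, ["A", "B", "D"])
```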

Compilation
• After creating the design it must be compiled. Compilation
converts the design into a bitstream that can be downloaded into
the FPGA.
• The most important output of compilation is an SRAM Object
File (.sof), which is used to program the device.
• The software also generates other report files that provide
information about the code as it compiles.

Contd..
• Simulation is an important part of the design flow,
and there are entire applications devoted to simulating
hardware designs.
• There are two types of simulation, RTL and timing. RTL (or
functional) simulation allows you to verify that your code is
functionally correct. Timing (or post-place-and-route)
simulation verifies that the design meets timing
and functions appropriately in the device.

contd..
• After completion of the design, its performance is checked either
by downloading the configuration bits into the FPGA or by using an
interface to a timing simulation program.
• If the performance is not satisfactory, suitable modifications are
made at some point in the design flow.
• Once the timing and functionality are verified, the implementation
is complete.

APPLICATIONS OF FPGAs
• FPGAs have gained rapid acceptance over the past two decades.
• Users can apply them to a wide range of applications like
random logic, integrating multiple SPLDs, device controllers,
communication encoding and filtering, small- to medium-size
systems with SRAM blocks, and many more.
• Another interesting FPGA application is prototyping designs to
be implemented in gate arrays by using one or more large
FPGAs.

contd..
• Another application is the emulation of entire
large hardware systems via the use of many
interconnected FPGAs.
• FPGAs offer particularly powerful solutions for
meeting machine vision, industrial networking,
motor control, and video surveillance needs.
• For example, the flexibility of FPGAs allows
designers to quickly adapt to changing image
sensor interfaces and image processing
requirements, evolve analysis capabilities to
keep pace with market requirements, and add
features and functions long after deployment.

contd..
• FPGAs are also used as custom computing machines.
• This involves using the programmable parts to execute software,
rather than compiling the software for execution on a regular
CPU.
• FPGAs provide a unique combination of highly parallel custom
computation, relatively low manufacturing/engineering costs,
and low power requirements.

Contd..
• FPGAs meet critical timing and performance requirements
with parallel processing and real-time industrial application
performance, permitting greater system integration and lower
development cost.
• In areas such as Industrial Networking and Imaging, where the
protocols and standards are shifting and changing, the
programmability of FPGAs versus fixed logic chips such as
ASICs and ASSPs allows for both faster time-to-market and
longer time-in-market.

FINALE
• Low cost and fast manufacturing turnaround are the secret
behind the market success of FPGAs.
• Although the large, slow programmable switches limit the
speed performance of FPGAs, improvements in
architecture and CAD tools will overcome these disadvantages.
• Over time, FPDs will become the dominant technology for
implementing digital circuits.

