1. EC8095: VLSI Design Department of ECE 2020-2021
St.Joseph’s College of Engineering / St.Joseph’s Institute of Technology 1
UNIT – I
INTRODUCTION - BASIC MOS TRANSISTOR
The invention of the transistor by William B. Shockley, Walter H. Brattain and John Bardeen
of Bell Telephone Laboratories was followed by the development of the integrated circuit (IC).
The very first IC emerged at the beginning of 1960, and since that time there have already
been four generations of ICs:
1) SSI (Small Scale Integration)
2) MSI (Medium Scale Integration)
3) LSI (Large Scale Integration)
4) VLSI (Very Large Scale Integration)
Now we see the emergence of the fifth generation, ULSI (Ultra Large Scale Integration), which
is characterized by complexities in excess of 3 million devices on a single IC chip. Within the bounds
of MOS technology, the possible circuit realizations may be based on pMOS, nMOS, CMOS and now
BiCMOS devices. Although CMOS is the dominant technology, some of the examples used to
illustrate the design processes will be presented in nMOS form. The reasons are:
1) For nMOS technology, the design methodology and the design rules are easily learned, thus
providing a simple but excellent introduction to structured design for VLSI.
2) nMOS technology and design processes provide an excellent background for other
technologies. In particular some familiarity with nMOS allows a relatively easy transition to CMOS
technology and design.
3) For GaAs technology some arrangements in relation to logic design are similar to those
employed in nMOS technology. Therefore, understanding the basics of nMOS design will assist in the
layout of GaAs circuits.
BASIC MOS TRANSISTORS
nMOS devices are formed in a p-type substrate of moderate doping level. The source and drain
regions are formed by diffusing n-type impurities through suitable masks into these areas to give the
desired n-impurity concentration and give rise to depletion regions which extend mainly in the more
lightly doped p-region.
Thus, source and drain are isolated from one another by two diodes.
Connections to the source and drain are made by a deposited metal layer (Fig a).
A polysilicon gate is deposited on a layer of insulation over the region between source and drain.
If the gate is connected to a suitable positive voltage with respect to the source, then the
electric field established between the gate and the substrate gives rise to a charge inversion region in
the substrate under the gate insulation and a conducting path or channel is formed between source and
drain.
A channel may also be established so that it is present under the condition Vgs = 0 by
implanting suitable impurities in the region between source and drain during manufacture, before
the insulation and the gate are deposited; this gives a depletion-mode device (Fig b).
In a pMOS device, the substrate is of n-type material and the source and drain diffusions are
consequently p-type (Fig c).
ENHANCEMENT MODE TRANSISTOR ACTION:
In order to establish the channel in the first place, a minimum voltage, the threshold voltage Vt,
must be established between gate and source.
Fig (a) indicates the conditions prevailing with the channel established but no current flowing
between source and drain (Vds = 0).
When current flows in the channel because a voltage Vds is applied between drain and source, the
corresponding IR drop along the channel equals Vds. This results in the voltage between gate and
channel varying with distance along the channel, the voltage being a maximum of Vgs at the source end.
Since the effective gate voltage is Vg = Vgs - Vt, there will be voltage available to invert the
channel at the drain end so long as Vgs - Vt >= Vds. The limiting condition comes when
Vds = Vgs - Vt. For all voltages Vds < Vgs - Vt, the device is in the non-saturated region of
operation.
When Vds > Vgs - Vt, the IR drop Vgs - Vt takes place over less than the whole length of the channel,
so that over part of the channel, near the drain, there is insufficient electric field available to
give rise to an inversion layer to create the channel. Diffusion current completes the path from
source to drain, causing the channel to exhibit a high resistance and to behave as a constant-current
source. This condition is known as the saturation region.
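The boundary conditions above can be collected into a small region classifier. This is a minimal sketch of the idealised enhancement-mode picture only: it assumes Vds >= 0, treats Vgs <= Vt as having no channel at all (subthreshold conduction is ignored), and uses a hypothetical function name and an illustrative 1 V threshold.

```python
def nmos_region(vgs: float, vds: float, vt: float) -> str:
    """Classify an enhancement-mode nMOS operating point (assumes vds >= 0)."""
    if vgs <= vt:
        return "cutoff"          # gate voltage too low to invert the channel
    if vds < vgs - vt:
        return "non-saturated"   # channel is inverted all the way to the drain
    return "saturated"           # Vds >= Vgs - Vt: channel pinched off near the drain


# Illustrative operating points with an assumed threshold voltage of 1 V
for vgs, vds in [(0.5, 2.0), (3.0, 1.0), (3.0, 4.0)]:
    print(f"Vgs={vgs} V, Vds={vds} V -> {nmos_region(vgs, vds, vt=1.0)}")
```

The limiting case Vds = Vgs - Vt is counted as saturated, matching the statement that saturation begins at that point.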
DEPLETION MODE TRANSISTOR ACTION
The channel is established, due to the implant, even when Vgs = 0, and to cause the channel to cease
to exist a negative voltage Vtd must be applied between gate and source.
Vtd is typically < -0.8 VDD, depending on the implant and substrate bias, but, threshold voltage
differences apart, the action is similar to that of the enhancement-mode transistor.
Drain to source current Ids versus voltage Vds relationships
The whole concept of the MOS transistor evolves from the use of a voltage on the gate to induce a
charge in the channel between source and drain, which may then be caused to move from source to
drain under the influence of an electric field created by voltage Vds applied between source and drain.
Since the charge induced is dependent on the gate-to-source voltage Vgs, and the field that moves it
is set by Vds, Ids is dependent on both Vgs and Vds.
Consider a structure in which electrons will flow from source to drain.
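$$I_{ds} = -I_{sd} = \frac{\text{charge induced in channel } (Q_c)}{\text{electron transit time } (\tau_{sd})} \qquad (1)$$
First, the transit time is
$$\tau_{sd} = \frac{L}{v}$$
But the velocity is $v = \mu E_{ds}$, where μ = electron or hole mobility (surface) and Eds = electric field (drain to source).
Now
$$E_{ds} = \frac{V_{ds}}{L}, \quad \text{so that} \quad v = \frac{\mu V_{ds}}{L}$$
Thus
$$\tau_{sd} = \frac{L^2}{\mu V_{ds}} \qquad (2)$$
Typical values of μ at room temperature are μn = 650 cm²/V·s (surface) and μp = 240 cm²/V·s (surface).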
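As a worked example of the transit-time result τsd = L²/(μ Vds), the sketch below evaluates it for the quoted surface mobilities. The 1 μm channel length and Vds = 5 V are assumed illustrative values, not figures from the text, and the result is only as good as the constant-mobility assumption behind it.

```python
def transit_time(length_cm: float, mobility: float, vds: float) -> float:
    """Transit time tau = L / v with v = mu * Vds / L, i.e. L**2 / (mu * Vds).

    length_cm in cm, mobility in cm^2/(V s), vds in V; returns seconds.
    """
    velocity = mobility * vds / length_cm   # v = mu * Eds, Eds = Vds / L
    return length_cm / velocity


L_CM = 1e-4   # assumed 1 um channel length, in cm
VDS = 5.0     # assumed drain-source voltage, in V
for name, mu in (("electrons (mu_n = 650)", 650.0), ("holes (mu_p = 240)", 240.0)):
    print(f"{name}: tau = {transit_time(L_CM, mu, VDS) * 1e12:.2f} ps")
```

With these assumptions the electron transit time is a few picoseconds, and the hole transit time is longer by the ratio μn/μp.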
Non-saturated region:
The charge induced in the channel is due to the voltage difference between the gate and the channel, Vgs
(assuming the substrate is connected to the source).
The voltage along the channel varies linearly with distance X from the source due to the IR drop in the
channel.
Assuming the device is not saturated, the average value is Vds/2.
The effective gate voltage is Vg = Vgs - Vt, where Vt is the threshold voltage needed to invert the charge
under the gate and establish the channel.
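The charge per unit area is Eg εins ε0, so the induced charge is
$$Q_c = E_g \, \varepsilon_{ins} \varepsilon_0 \, W L$$
where
Eg = average electric field, gate to channel
εins = relative permittivity of the insulation between gate and channel
ε0 = permittivity of free space = 8.85 × 10⁻¹⁴ F cm⁻¹
W, L = channel width and length.
The average field is
$$E_g = \frac{(V_{gs} - V_t) - V_{ds}/2}{D}$$
where D = oxide thickness. Thus
$$Q_c = \frac{W L \, \varepsilon_{ins} \varepsilon_0}{D}\left[(V_{gs} - V_t) - \frac{V_{ds}}{2}\right] \qquad (3)$$
Combining eqns 2 and 3 in 1, we have
$$I_{ds} = \frac{\varepsilon_{ins} \varepsilon_0 \mu}{D} \frac{W}{L}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right] \qquad (4)$$
or
$$I_{ds} = K \frac{W}{L}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right]$$
in the non-saturated or resistive region, where Vds < Vgs - Vt and
$$K = \frac{\varepsilon_{ins} \varepsilon_0 \mu}{D}$$
The factor W/L is, of course, contributed by the geometry, and it is common practice to write
$$\beta = K \frac{W}{L}$$
so that
$$I_{ds} = \beta\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right] \qquad (4a,\ \text{alternate form of eqn 4})$$
The gate/channel capacitance (parallel plate) is
$$C_g = \frac{\varepsilon_{ins} \varepsilon_0 W L}{D}$$
Also $K = C_g \mu / (W L)$, so $\beta = C_g \mu / L^2$ and
$$I_{ds} = \frac{C_g \mu}{L^2}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right] \qquad (4b)$$
Sometimes it is convenient to use the gate capacitance per unit area C0 rather than Cg. Noting that
Cg = C0 W L, we may also write
$$I_{ds} = C_0 \mu \frac{W}{L}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right] \qquad (4c)$$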
Saturated region:
Saturation begins when Vds = Vgs - Vt, since at this point the IR drop in the channel equals the
effective gate-to-channel voltage at the drain. We may assume that the current remains fairly
constant as Vds increases further.
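The non-saturated expression Ids = β[(Vgs - Vt)Vds - Vds²/2] and the saturation condition together give a piecewise drain-current model. Substituting Vds = Vgs - Vt into that expression gives β/2 (Vgs - Vt)², which is then held constant for larger Vds, as assumed above. The sketch below is the ideal model only (no channel length modulation); the function name and the values of β and Vt are hypothetical.

```python
def ids_nmos(vgs: float, vds: float, vt: float, beta: float) -> float:
    """Ideal drain current (A) of an nMOS transistor, assuming vds >= 0."""
    vgt = vgs - vt                      # effective gate voltage Vg = Vgs - Vt
    if vgt <= 0:
        return 0.0                      # no channel in this ideal model
    if vds < vgt:
        # Non-saturated (resistive) region
        return beta * (vgt * vds - vds ** 2 / 2)
    # Saturated region: the expression above evaluated at Vds = Vgs - Vt
    return beta / 2 * vgt ** 2


BETA = 100e-6   # hypothetical beta = K*W/L, in A/V^2
VT = 1.0        # hypothetical threshold voltage, in V
for vds in [0.5, 1.0, 2.0, 3.0]:
    print(f"Vds = {vds} V: Ids = {ids_nmos(3.0, vds, VT, BETA) * 1e6:.1f} uA")
```

Because both branches agree at Vds = Vgs - Vt, the modelled current is continuous at the onset of saturation.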
Ideal I-V Characteristics
Drain current of MOS device in different operating regions.
MOS transistors have three regions of operation:
• Cutoff or subthreshold region
• Linear region
• Saturation region
The long-channel model assumes that the current through an OFF transistor is 0. When a transistor
turns ON (Vgs > Vt), the gate attracts carriers (electrons) to form a channel. The electrons drift from
source to drain at a rate proportional to the electric field between these regions. Thus, we can
compute currents if we know the amount of charge in the channel and the rate at which it moves. We
know that the charge on each plate of a capacitor is Q = CV. Thus, the charge in the channel Qchannel
is
$$Q_{channel} = C_g (V_{gc} - V_t)$$
where Cg is the capacitance of the gate to the channel and Vgc - Vt is
the amount of voltage attracting charge to the channel beyond the minimum required to invert from
p to n. The gate voltage is referenced to the channel, which is not grounded. If the source is at Vs and
the drain is at Vd, the average is Vc = (Vs + Vd)/2 = Vs + Vds/2. Therefore, the mean difference between
the gate and channel potentials Vgc is Vg - Vc = Vgs - Vds/2, as shown in Figure 2.5. We can model the
gate as a parallel plate capacitor with capacitance proportional to area over thickness. If the gate has
length L and width W and the oxide thickness is tox, as shown in Figure 2.6, the capacitance is
$$C_g = k_{ox} \varepsilon_0 \frac{W L}{t_{ox}} = \varepsilon_{ox} \frac{W L}{t_{ox}} = C_{ox} W L$$
where ε0 is the permittivity of free space, 8.85 × 10⁻¹⁴ F/cm, and the permittivity of SiO2 is
kox = 3.9 times as great. Often, the εox/tox term is called Cox, the capacitance per unit area of the
gate oxide.
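A quick numerical check of Cox = kox ε0 / tox and Cg = Cox W L, using the ε0 and kox values quoted above. The 10 nm oxide and the 1.2 μm × 0.6 μm gate are assumed example dimensions, chosen only to show the unit conversions.

```python
EPS0_F_PER_CM = 8.85e-14   # permittivity of free space, F/cm (from the text)
K_OX = 3.9                 # relative permittivity of SiO2 (from the text)


def cox_f_per_cm2(tox_cm: float) -> float:
    """Gate-oxide capacitance per unit area, Cox = kox * eps0 / tox."""
    return K_OX * EPS0_F_PER_CM / tox_cm


def gate_cap_f(w_cm: float, l_cm: float, tox_cm: float) -> float:
    """Parallel-plate gate capacitance, Cg = Cox * W * L."""
    return cox_f_per_cm2(tox_cm) * w_cm * l_cm


TOX = 1e-6               # assumed 10 nm oxide, expressed in cm
cox = cox_f_per_cm2(TOX)
# 1 F/cm^2 = 1e15 fF per 1e8 um^2, i.e. multiply by 1e7 to get fF/um^2
print(f"Cox = {cox:.3e} F/cm^2 = {cox * 1e7:.2f} fF/um^2")
cg = gate_cap_f(1.2e-4, 0.6e-4, TOX)   # assumed W = 1.2 um, L = 0.6 um
print(f"Cg  = {cg * 1e15:.2f} fF")
```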
Some nanometer processes use a different gate dielectric with a higher dielectric constant. In these
processes, tox denotes the equivalent oxide thickness (EOT): the thickness of a layer of SiO2 that has the
same Cox. In this case, tox is thinner than the actual dielectric. Each carrier in the channel is
accelerated to an average velocity, v, proportional to the lateral electric field, i.e., the field between
source and drain. The constant of proportionality μ is called the mobility:
$$v = \mu E$$
The electric field E is the voltage difference between drain and source Vds divided by the channel length:
$$E = \frac{V_{ds}}{L}$$
The time required for carriers to cross the channel is the channel length divided by the carrier
velocity: L/v. Therefore, the current between source and drain is the total amount of charge in the
channel divided by the time required to cross:
$$I_{ds} = \frac{Q_{channel}}{L/v} = \mu C_{ox} \frac{W}{L}\left(V_{gs} - V_t - \frac{V_{ds}}{2}\right) V_{ds} \qquad (2.5)$$
The term Vgs - Vt arises so often that it is convenient to abbreviate it as VGT. Eq (2.5) describes the
linear region of operation, for Vgs > Vt but Vds relatively small. It is called linear or resistive
because when Vds << VGT, Ids increases almost linearly with Vds, just like an ideal resistor. The
geometry- and technology-dependent parameters are sometimes merged into a single factor β:
$$\beta = \mu C_{ox} \frac{W}{L}, \qquad I_{ds} = \beta\left(V_{GT} - \frac{V_{ds}}{2}\right) V_{ds}$$
If Vds > Vdsat = VGT, the channel is no longer inverted in the vicinity of the drain; we say it is pinched
off. Beyond this point, called the drain saturation voltage, increasing the drain voltage has no further
effect on current. Substituting Vds = Vdsat at this point of maximum current into Eq (2.5), we find an
expression for the saturation current that is independent of Vds:
$$I_{ds} = \frac{\beta}{2} V_{GT}^2$$
This expression is valid for Vgs > Vt and Vds > Vdsat. Thus, long-channel MOS transistors are said to
exhibit square-law behavior in saturation.
Two key figures of merit for a transistor are Ion and Ioff. Ion (also called Idsat) is the ON current,
Ids, when Vgs = Vds = VDD. Ioff is the OFF current when Vgs = 0 and Vds = VDD. According to the
long-channel model, Ioff = 0 and
$$I_{on} = \frac{\beta}{2}(V_{DD} - V_t)^2$$
Figure 2.7(a) shows the I-V characteristics for the transistor. According to the first-order model, the current
is zero for gate voltages below Vt. For higher gate voltages, current increases linearly with Vds for
small Vds. As Vds reaches the saturation point Vdsat=VGT, current rolls off and eventually becomes
independent of Vds when the transistor is saturated. pMOS transistors behave in the same way, but
with the signs of all voltages and currents reversed. The I-V characteristics are in the third quadrant,
as shown in Figure2.7 (b).
Non-Ideal I-V Effects
The saturation current increases less than quadratically with increasing Vgs. This is caused
by two effects: velocity saturation and mobility degradation.
At high lateral field strengths (Vds/L), carrier velocity ceases to increase linearly with field
strength. This is called velocity saturation and results in lower Ids than expected at high Vds.
At high vertical field strengths (Vgs/tox), the carriers scatter off the oxide interface more
often, slowing their progress. This mobility degradation effect also leads to less current than
expected at high Vgs.
The saturation current of the nonideal transistor increases somewhat with Vds. This is caused
by channel length modulation, in which higher Vds increases the size of the depletion region
around the drain and thus effectively shortens the channel.
Increasing the potential between the source and body raises the threshold through the body
effect. Increasing the drain voltage lowers the threshold through drain-induced barrier
lowering. Increasing the channel length raises the threshold through the short channel effect.
When Vgs < Vt, the current drops off exponentially rather than abruptly becoming zero. This is
called subthreshold conduction. The current into the gate Ig is ideally 0. However, as the
thickness of gate oxides reduces to only a small number of atomic layers, electrons tunnel through
the gate, causing some gate leakage current. The source and drain diffusions are typically reverse-
biased diodes and also experience junction leakage into the substrate or well.
Both mobility and threshold voltage decrease with rising temperature. The mobility effect
tends to dominate for strongly ON transistors, resulting in lower Ids at high temperature. The
threshold effect is most important for OFF transistors, resulting in higher leakage current at high
temperature. In summary, MOS characteristics degrade with temperature.
Mobility Degradation and Velocity Saturation
Carrier drift velocity, and hence current, is proportional to the lateral electric field Elat = Vds/L
between source and drain. The constant of proportionality is called the carrier mobility, μ. The long-
channel model assumed that carrier mobility is independent of the applied fields.
A high voltage at the gate of the transistor attracts the carriers to the edge of the channel, causing
collisions with the oxide interface that slow the carriers. This is called mobility degradation.
Carriers approach a maximum velocity vsat when high fields are applied. This phenomenon is
called velocity saturation.
Channel Length Modulation
Ideally, Ids is independent of Vds for a transistor in saturation, making the transistor a perfect
current source. The p–n junction between the drain and body forms a depletion region with a width Ld
that increases with Vdb. The depletion region effectively shortens the channel length to Leff = L - Ld.
Assume the source voltage is close to the body voltage, so Vdb = Vds. Hence, increasing Vds
decreases the effective channel length. Shorter channel length results in higher current; thus, Ids
increases with Vds in saturation. This can be crudely modeled by multiplying EQ (2.10) by a factor of
(1 + Vds/VA), where VA is called the Early voltage. In the saturation region,
$$I_{ds} = \frac{\beta}{2} V_{GT}^2 \left(1 + \frac{V_{ds}}{V_A}\right)$$
As channel length gets shorter, the effect of the channel length modulation becomes relatively more
important. Hence, VA is proportional to channel length. This channel length modulation model is a
gross oversimplification of nonlinear behavior and is more useful for conceptual understanding than
for accurate device modeling.
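The crude channel length modulation model, Ids = (β/2) VGT² (1 + Vds/VA), can be explored numerically. In this sketch, β, Vt and the Early voltage VA = 20 V are hypothetical values; the model is restricted to saturation, and the slope of Ids versus Vds is turned into an output resistance, which for this model equals 2VA/(β VGT²).

```python
def ids_sat_clm(vgs: float, vds: float, vt: float, beta: float, va: float) -> float:
    """Saturation current with channel length modulation, in A."""
    vgt = vgs - vt
    if vgt <= 0 or vds < vgt:
        raise ValueError("model only applies in saturation (Vgs > Vt, Vds >= Vgs - Vt)")
    return beta / 2 * vgt ** 2 * (1 + vds / va)


BETA, VT, VA = 100e-6, 1.0, 20.0   # hypothetical beta (A/V^2), Vt (V), Early voltage (V)
i_low = ids_sat_clm(3.0, 2.0, VT, BETA, VA)
i_high = ids_sat_clm(3.0, 5.0, VT, BETA, VA)
# Output resistance implied by the model: dVds/dIds = 2*VA / (beta * VGT^2)
r_out = (5.0 - 2.0) / (i_high - i_low)
print(f"Ids rises from {i_low * 1e6:.0f} uA to {i_high * 1e6:.0f} uA; "
      f"r_out ~ {r_out / 1e3:.0f} kOhm")
```

A larger VA (longer channel) flattens the curve and raises the output resistance, consistent with VA being proportional to channel length.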
Threshold Effects
So far, we have treated the threshold voltage as a constant. However, Vt increases with the source
voltage, decreases with the body voltage, decreases with the drain voltage, and increases with channel
length. This section models each of these effects.
Body Effect
The body is an implicit fourth terminal. When a voltage Vsb is applied between the source and body,
it increases the amount of charge required to invert the channel; hence, it increases the threshold
voltage. The threshold voltage can be modeled as
$$V_t = V_{t0} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)$$
where Vt0 is the threshold voltage when the source is at the body potential, ϕs is the surface potential
at threshold and γ is the body effect coefficient, typically in the range 0.4 to 1 V^1/2.
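A small sketch of the body-effect model Vt = Vt0 + γ(√(ϕs + Vsb) - √ϕs). The values Vt0 = 0.4 V, γ = 0.6 V^1/2 and ϕs = 0.8 V are illustrative assumptions, not process data.

```python
import math


def threshold_with_body_effect(vt0: float, vsb: float, gamma: float, phi_s: float) -> float:
    """Vt = Vt0 + gamma * (sqrt(phi_s + Vsb) - sqrt(phi_s)), all in volts."""
    return vt0 + gamma * (math.sqrt(phi_s + vsb) - math.sqrt(phi_s))


# Hypothetical parameters: Vt0 = 0.4 V, gamma = 0.6 V^1/2, phi_s = 0.8 V
for vsb in [0.0, 0.5, 1.0]:
    vt = threshold_with_body_effect(0.4, vsb, gamma=0.6, phi_s=0.8)
    print(f"Vsb = {vsb} V -> Vt = {vt:.3f} V")
```

The square-root dependence means each additional volt of source-body bias raises Vt by a little less than the previous one.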
i. Drain-Induced Barrier Lowering (DIBL)
The drain voltage Vds creates an electric field that affects the threshold voltage. This drain-induced
barrier lowering (DIBL) effect is especially pronounced in short-channel transistors.
It can be modeled as Vt = Vt0 - ηVds, where η is the DIBL coefficient, typically on the order
of 0.1 (often expressed as 100 mV/V).
Drain-induced barrier lowering causes Ids to increase with Vds in saturation, in much the same way as
channel length modulation does. This effect can be lumped into a smaller Early voltage VA.
Short Channel Effects
The threshold voltage typically increases with channel length. This phenomenon is especially
pronounced for small L where the source and drain depletion regions extend into a significant portion
of the channel, and hence is called the short channel effect or Vt rolloff.
ii. Leakage
Even when transistors are nominally OFF, they leak small amounts of current. Leakage
mechanisms include subthreshold conduction between source and drain, gate leakage from the
gate to body, and junction leakage from source to body and drain to body.
Subthreshold conduction is caused by thermal emission of carriers over the potential barrier set by
the threshold. Gate leakage is a quantum-mechanical effect caused by tunneling through the
extremely thin gate dielectric. Junction leakage is caused by current through the p-n junction
between the source/drain diffusions and the body.
Subthreshold Leakage
The long-channel transistor I-V model assumes current only flows from source to drain when
Vgs > Vt. In real transistors, current does not abruptly cut off below threshold, but rather drops off
exponentially.
When the gate voltage is high, the transistor is strongly ON. When the gate falls below Vt, the
exponential decline in current appears as a straight line on the logarithmic scale. This regime of
Vgs < Vt is called weak inversion.
The subthreshold leakage current increases significantly with Vds because of drain-induced
barrier lowering. There is a lower limit on Ids set by drain junction leakage that is exacerbated by
the negative gate voltage.
Subthreshold leakage current is described by EQ (2.42). Ids0 is the current at threshold and is
dependent on process and device geometry.
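The text cites EQ (2.42) without reproducing it. The sketch below assumes a commonly used weak-inversion form with the body term dropped (Vsb = 0): Ids = Ids0 exp((Vgs - Vt0 + ηVds)/(n vT)) (1 - exp(-Vds/vT)), where n is a process-dependent slope factor and vT = kT/q is the thermal voltage. The values n = 1.5, η = 0.1, Ids0 and Vt0 are hypothetical. The helper also computes the gate-voltage swing per decade of current, n vT ln 10, which is the slope of the straight line seen on a logarithmic plot.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
Q_E = 1.602176634e-19    # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """vT = kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_ids(vgs, vds, ids0, vt0, n=1.5, eta=0.0, temp_k=300.0):
    """Assumed weak-inversion model with Vsb = 0:
    Ids = Ids0 * exp((Vgs - Vt0 + eta*Vds) / (n*vT)) * (1 - exp(-Vds/vT))
    """
    v_therm = thermal_voltage(temp_k)
    gate_term = math.exp((vgs - vt0 + eta * vds) / (n * v_therm))  # DIBL enters via eta*Vds
    drain_term = 1.0 - math.exp(-vds / v_therm)                   # ~1 once Vds >> vT
    return ids0 * gate_term * drain_term


def subthreshold_slope(n: float = 1.5, temp_k: float = 300.0) -> float:
    """Gate-voltage change (V) that changes subthreshold current by 10x."""
    return n * thermal_voltage(temp_k) * math.log(10)


S = subthreshold_slope()
print(f"S = {S * 1e3:.1f} mV/decade")
i_off = subthreshold_ids(vgs=0.0, vds=1.0, ids0=1e-7, vt0=0.4, eta=0.1)
print(f"I_off = {i_off:.3e} A")
```

Because the slope is proportional to absolute temperature and vT, the same model also shows why subthreshold leakage grows quickly as a chip heats up.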
Gate Leakage
According to quantum mechanics, the electron cloud surrounding an atom has a probabilistic spatial
distribution. For gate oxides thinner than 15–20 Å, there is a nonzero probability that an electron in
the gate will find itself on the wrong side of the oxide, where it will get whisked away
through the channel. This effect of carriers crossing a thin barrier is called tunneling, and results in
leakage current through the gate.
Two physical mechanisms for gate tunneling are called Fowler-Nordheim (FN) tunneling and
direct tunneling. FN tunneling is most important at high voltage and moderate oxide thickness and is
used to program EEPROM memories. Direct tunneling is most important at lower voltage with thin
oxides and is the dominant leakage component. The direct gate tunneling current can be estimated as
$$I_{gate} = W A \left(\frac{V_{DD}}{t_{ox}}\right)^2 e^{-B t_{ox} / V_{DD}}$$
where A and B are technology constants.
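The direct-tunneling estimate Igate = W A (VDD/tox)² exp(-B tox/VDD) can be used to see how sensitive gate leakage is to oxide thickness. Because the text gives no values for the technology constants A and B, the ones below are arbitrary placeholders (tox in nm, B in V/nm), so only the relative change between the two oxides is meaningful.

```python
import math


def gate_tunneling_current(w, vdd, tox, a, b):
    """Direct-tunneling estimate I_gate = W * A * (VDD / tox)^2 * exp(-B * tox / VDD)."""
    return w * a * (vdd / tox) ** 2 * math.exp(-b * tox / vdd)


# Placeholder technology constants: absolute currents are meaningless here.
A, B = 1.0, 10.0
i_thick = gate_tunneling_current(w=1.0, vdd=1.0, tox=1.2, a=A, b=B)
i_thin = gate_tunneling_current(w=1.0, vdd=1.0, tox=1.0, a=A, b=B)
print(f"Thinning tox from 1.2 nm to 1.0 nm raises I_gate by {i_thin / i_thick:.1f}x")
```

Both the prefactor and the exponential grow as tox shrinks, which is why gate leakage rises so steeply once oxides approach a few atomic layers.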
Junction Leakage
The p–n junctions between diffusion and the substrate or well form diodes. The well-to-
substrate junction is another diode. The substrate and well are tied to GND or VDD to ensure these
diodes do not become forward biased in normal operation. However, reverse-biased diodes still
conduct a small amount of current ID:
$$I_D = I_S\left(e^{V_D / v_T} - 1\right)$$
where IS depends on doping levels and on the area and perimeter of the diffusion region, VD is the
diode voltage (e.g., -Vsb or -Vdb), and vT = kT/q is the thermal voltage. When a junction is reverse
biased by significantly more than the thermal voltage, the leakage is just -IS, generally in the
0.1–0.01 fA/μm² range, which is negligible compared to other leakage mechanisms.
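The diode equation ID = IS(exp(VD/vT) - 1) can be checked to confirm that a junction reverse biased by many thermal voltages leaks essentially -IS. For simplicity the sketch scales IS by area only (the text notes it also depends on perimeter), and the 10 μm² diffusion at 0.1 fA/μm² is an assumed example at the upper end of the quoted range.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """vT = kT/q in volts."""
    return K_B * temp_k / Q_E


def junction_current(vd: float, i_s: float, temp_k: float = 300.0) -> float:
    """Ideal diode equation ID = IS * (exp(VD / vT) - 1)."""
    return i_s * (math.exp(vd / thermal_voltage(temp_k)) - 1.0)


# Hypothetical diffusion: 10 um^2 area at 0.1 fA/um^2 (area term only)
i_s = 10 * 0.1e-15
i_rev = junction_current(-0.5, i_s)   # e.g. VD = -Vdb with Vdb = 0.5 V
print(f"IS = {i_s:.2e} A, ID at -0.5 V = {i_rev:.4e} A")
```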
More significantly, heavily doped drains are subject to band-to-band tunneling (BTBT) and
gate-induced drain leakage (GIDL).
Temperature Dependence
Transistor characteristics are influenced by temperature. Carrier mobility decreases with temperature.
An approximate relation is
$$\mu(T) = \mu(T_r)\left(\frac{T}{T_r}\right)^{-k_\mu}$$
where T is the absolute temperature, Tr is room temperature, and kμ is a fitting parameter with a
typical value of about 1.5. vsat also decreases with temperature, dropping by about 20% from 300 to
400 K. The magnitude of the threshold voltage decreases nearly linearly with temperature and may
be approximated by
$$V_t(T) = V_t(T_r) - k_{vt}(T - T_r)$$
where kvt is typically about 1–2 mV/K. Ion at high VDD decreases with temperature. Subthreshold
leakage increases exponentially with temperature.
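The two temperature relations, μ(T) = μ(Tr)(T/Tr)^(-kμ) and Vt(T) = Vt(Tr) - kvt(T - Tr), can be evaluated together. Here kμ = 1.5 follows the typical value above, kvt = 1.5 mV/K is an assumed value from the quoted 1–2 mV/K range, and the 0.4 V room-temperature threshold is hypothetical; the 650 cm²/V·s starting mobility is the surface electron mobility quoted earlier.

```python
def mobility_at(mu_room: float, temp_k: float, room_k: float = 300.0,
                k_mu: float = 1.5) -> float:
    """mu(T) = mu(Tr) * (T / Tr) ** (-k_mu)."""
    return mu_room * (temp_k / room_k) ** (-k_mu)


def threshold_at(vt_room: float, temp_k: float, room_k: float = 300.0,
                 k_vt: float = 1.5e-3) -> float:
    """Vt(T) = Vt(Tr) - k_vt * (T - Tr), with k_vt in V/K."""
    return vt_room - k_vt * (temp_k - room_k)


for t in (300.0, 350.0, 400.0):
    print(f"T = {t:.0f} K: mu_n = {mobility_at(650.0, t):.0f} cm^2/V.s, "
          f"Vt = {threshold_at(0.4, t):.3f} V")
```

The falling mobility dominates for strongly ON devices (lower Ion), while the falling threshold matters most for OFF devices (higher leakage).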
Conversely, operating a chip at low temperature improves its characteristics. Subthreshold leakage is
exponentially dependent on temperature, so lower threshold voltages can be used. Velocity saturation
occurs at higher fields, providing more current.
As mobility is also higher, these fields are reached at a lower power supply, saving power.
Depletion regions become wider, resulting in less junction capacitance.
Geometry Dependence
The layout designer draws transistors with width and length Wdrawn and Ldrawn. The actual gate
dimensions may differ by some factors XW and XL. Moreover,
the source and drain tend to diffuse laterally under the gate by LD, producing a shorter effective
channel length that the carriers must traverse between source and drain. Similarly, WD accounts
for other effects that shrink the transistor width. The effective dimensions are
$$L_{eff} = L_{drawn} + X_L - 2L_D, \qquad W_{eff} = W_{drawn} + X_W - 2W_D$$
The factors of two come from lateral diffusion on both sides of the channel.
Therefore, a transistor drawn twice as long may have an effective length that is more than twice as
great. Similarly, two transistors differing in drawn widths by a factor of two may differ in
saturation current by more than a factor of two.
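To see why a transistor drawn twice as long can have more than twice the effective length, the sketch applies Leff = Ldrawn + XL - 2LD (and the analogous width expression) with a hypothetical lateral diffusion LD = 0.02 μm and XL = 0. Under the long-channel model current scales as W/Leff, so the current ratio follows the effective-length ratio rather than the drawn one.

```python
def effective_length(l_drawn: float, x_l: float = 0.0, l_d: float = 0.0) -> float:
    """Leff = Ldrawn + XL - 2*LD (lateral diffusion on both sides)."""
    return l_drawn + x_l - 2 * l_d


def effective_width(w_drawn: float, x_w: float = 0.0, w_d: float = 0.0) -> float:
    """Weff = Wdrawn + XW - 2*WD."""
    return w_drawn + x_w - 2 * w_d


LD = 0.02   # hypothetical lateral diffusion in um
l1 = effective_length(0.18, l_d=LD)
l2 = effective_length(0.36, l_d=LD)   # drawn twice as long
print(f"Leff = {l1:.2f} um vs {l2:.2f} um: ratio {l2 / l1:.2f} (drawn ratio 2.00)")
```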
Threshold voltages also vary with transistor dimensions because of the short and narrow channel
effects.
Combining threshold changes, effective channel lengths, channel length modulation, and
velocity saturation effects, Idsat does not scale exactly as 1/L. In general, when currents must be
precisely matched (e.g., in sense amplifiers or A/D converters), it is best to use the same width and
length for each device. Current ratios can be produced by tying several identical transistors in parallel.
CMOS TECHNOLOGIES
CMOS provides an inherently low-power static circuit technology that has the capability of
providing a lower power-delay product than comparable design-rule nMOS or pMOS technologies. The
four dominant CMOS technologies are:
P-well process
n-well process
twin-tub process
Silicon-on-insulator (SOI) process
nMOS FABRICATION
Processing is carried out on a thin wafer cut from a single crystal of silicon of high purity into
which the required p-impurities are introduced as the crystal is grown.
A layer of silicon dioxide (SiO2), typically 1 µm thick, is grown all over the surface of the wafer
to protect the surface, act as a barrier to dopants during processing and provide a generally
insulating substrate on to which other layers may be deposited and patterned.
The surface is now covered with a photoresist which is deposited onto the wafer and spun to
achieve an even distribution of the required thickness.
The photoresist layer is then exposed to ultraviolet light through a mask which defines those
regions into which diffusion is to take place together with transistor channels.
These areas are subsequently readily etched away together with the underlying silicon dioxide so
that the wafer surface is exposed in the window defined by the mask.
The remaining photoresist is removed and a thin layer of SiO2 is grown over the entire chip surface,
and then polysilicon is deposited on top of this to form the gate structure. This layer consists of
heavily doped polysilicon deposited by chemical vapor deposition (CVD).
Photoresist coating and masking allow the polysilicon to be patterned, and then the thin oxide is
removed to expose areas into which n-type impurities are to be diffused.
Thick oxide is grown over all again and is then masked with photoresist and etched to expose
selected areas of the polysilicon gate and the drain and source areas where connections are to be
made.
The whole chip then has metal (Al) deposited over its surface to a thickness typically of 1 µm.
This metal layer is then masked and etched to form the required interconnection pattern.
CMOS FABRICATION
The p-well process is widely used in practice, and the n-well process is also popular.
P-well process
The diffusion must be carried out with special care since the p-well doping concentration and depth
will affect the threshold voltages as well as the breakdown voltages of the n-transistors.
To achieve low threshold voltages (0.6 to 1.0 V), we need either deep well diffusion or high well
resistivity.
However, deep wells require larger spacing due to lateral diffusion and therefore a larger chip area.
The p-well acts as the substrate for the n-devices within the parent n-substrate, and, provided that
voltage polarity restrictions are observed, the two areas are electrically isolated.
Layout Design rules
Layout design rules describe how small features can be and how closely they can be reliably
packed in a particular manufacturing process. Industrial design rules are usually specified in
microns. This makes migrating from one process to a more advanced process or a different foundry's
process difficult because not all rules scale in the same way.
Mead and Conway popularized scalable design rules based on a single parameter, λ, that
characterizes the resolution of the process. λ is generally half of the minimum drawn transistor
channel length. This length is the distance between the source and drain of a transistor and is set by
the minimum width of a polysilicon wire. Designers often describe a process by its feature size.
Feature size refers to minimum transistor length, so λ is half the feature size.
For example, a 180 nm process has a minimum polysilicon
width (and hence transistor length) of 0.18 μm and uses design rules with λ = 0.09 μm. Lambda-based
rules are necessarily conservative because they round up dimensions to an integer multiple of
λ.
A conservative but easy-to-use set of design rules for layouts with two metal layers in an n-well
process is as follows:
Metal and diffusion have minimum width and spacing of 4 λ.
Contacts are 2 λ × 2 λ and must be surrounded by 1 λ on the layers above and below.
Polysilicon uses a width of 2 λ.
Polysilicon overlaps diffusion by 2λ where a transistor is desired and has a spacing
of 1 λ away where no transistor is desired.
Polysilicon and contacts have a spacing of 3λ from other polysilicon or contacts.
N-well surrounds pMOS transistors by 6λ and avoids nMOS transistors by 6λ.
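Because every rule in the list is an integer multiple of λ, the whole set can be rescaled for a new process by changing one number. The sketch below only converts dimensions (it is not a design-rule checker); the rule labels are hypothetical shorthand for the list above, and λ is taken as half the feature size. The 4/2 unit transistor reproduces the 1.2 μm × 0.6 μm example for a 0.6 μm process.

```python
RULES_LAMBDA = {
    "metal/diffusion width": 4,
    "metal/diffusion spacing": 4,
    "contact size": 2,
    "contact surround": 1,
    "poly width": 2,
    "poly overlap of diffusion (transistor)": 2,
    "poly spacing to diffusion (no transistor)": 1,
    "poly/contact spacing": 3,
    "n-well surround of pMOS": 6,
    "n-well spacing to nMOS": 6,
}


def lambda_for(feature_size_um: float) -> float:
    """lambda is half the feature size (minimum drawn channel length)."""
    return feature_size_um / 2


def rules_in_um(feature_size_um: float) -> dict:
    """Convert every lambda rule to microns for the given process."""
    lam = lambda_for(feature_size_um)
    return {name: count * lam for name, count in RULES_LAMBDA.items()}


def unit_transistor_um(feature_size_um: float) -> tuple:
    """(W, L) in microns of a 4/2 (lambda) unit transistor."""
    lam = lambda_for(feature_size_um)
    return 4 * lam, 2 * lam


for name, value in rules_in_um(0.6).items():
    print(f"{name:42s} {value:.2f} um")
print("unit transistor W, L (um):", unit_transistor_um(0.6))
```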
Transistor dimensions are often specified by their Width/Length (W/L) ratio. For example, the
nMOS transistor in Figure 1.39 formed where polysilicon crosses n-diffusion has a W/L of 4/2. In a
0.6 μm process, this corresponds to an actual width of 1.2 μm and a length of 0.6 μm. Such a
minimum-width contacted transistor is often called a unit transistor.
pMOS transistors are often wider than nMOS transistors because holes move more slowly than
electrons so the transistor has to be wider to deliver the same current. Figure 1.40(a) shows a unit
inverter layout with a unit nMOS transistor and a double-sized pMOS transistor. Figure 1.40(b)
shows a schematic for the inverter annotated with Width/ Length for each transistor. In digital
systems, transistors are typically chosen to have the minimum possible length because short-channel
transistors are faster, smaller, and consume less power. Figure 1.40(c) shows a shorthand we will
often use, specifying multiples of unit width and assuming minimum length.
Gate layouts
The line-of-diffusion style consists of four horizontal strips:
metal ground at the bottom of the cell, n-diffusion, p-diffusion, and metal power at the top.
The power and ground lines are often called supply rails. Polysilicon lines run vertically to form
transistor gates. Metal wires within the cell connect the transistors appropriately.
Figure 1.41(a) shows such a layout for an inverter. The input A can be connected from the
top, bottom, or left in polysilicon. The output Y is available at the right side of the cell in metal.
Recall that the p-substrate and n-well must be tied to ground and power, respectively.
Figure 1.41(b) shows the same inverter with well and substrate taps placed under the power
and ground rails, respectively. Figure 1.42 shows a 3-input NAND gate. Notice how the nMOS
transistors are connected in series while the pMOS transistors are connected in parallel. Power and
ground extend 2 λ on each side so if two gates were abutted the contents would be separated by 4 λ,
satisfying design rules. The height of the cell is 36 λ, or 40 λ if the 4 λ space between the cell and
another wire above it is counted. All these examples use transistors of width 4 λ.
UNIT II COMBINATIONAL CIRCUIT DESIGN
DESIGN PRINCIPLE OF STATIC CMOS DESIGN
Digital CMOS circuits are implemented using either static or dynamic design
techniques. In static CMOS, the output is tied to VDD or ground via a low-resistance path
(except during switching), which makes the circuits robust, with good noise
immunity. In static CMOS design, any function can be realized as a sum of products (SOP) or
a product of sums (POS). If an SOP function pulls the output high, then an SOP-BAR function
will pull the output low. A POS function can pull the output high, while a POS-BAR function
can pull the output low, as shown in the figure.
Important properties of static CMOS design:
At any instant of time, the output of the gate is directly connected to Vss or VDD. All
functions are composed of either ANDed or ORed sub-functions. The AND function is
composed of NMOS transistors in series, and the OR function is composed of NMOS transistors
in parallel. Each gate contains a pull-up network (PUP) and a pull-down network (PDN). PUP networks
consist of PMOS transistors, and PDN networks consist of NMOS transistors. Each network is
the dual of the other network. The output of the complementary gate is inverted.
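The properties above (series NMOS for AND, parallel NMOS for OR, a PMOS pull-up network that is the dual of the NMOS pull-down network) can be checked with a tiny switch-level model. This is an idealised Boolean sketch, not a circuit simulator: each transistor is treated as a perfect switch, and all helper names are hypothetical.

```python
from itertools import product


def n_switch(gate: int) -> bool:
    """NMOS modelled as an ideal switch: closed when its gate is logic 1."""
    return gate == 1


def p_switch(gate: int) -> bool:
    """PMOS modelled as an ideal switch: closed when its gate is logic 0."""
    return gate == 0


def static_gate(pdn, pun, *inputs) -> int:
    """Resolve a static CMOS output from its pull-down and pull-up networks."""
    pulls_low = pdn(*inputs)
    pulls_high = pun(*inputs)
    # Duality: for every input combination exactly one network conducts
    assert pulls_low != pulls_high, "PDN and PUP must be complementary"
    return 1 if pulls_high else 0


def nand_pdn(a, b):
    return n_switch(a) and n_switch(b)   # two NMOS in series (AND)


def nand_pun(a, b):
    return p_switch(a) or p_switch(b)    # two PMOS in parallel (dual)


def nor_pdn(a, b):
    return n_switch(a) or n_switch(b)    # two NMOS in parallel (OR)


def nor_pun(a, b):
    return p_switch(a) and p_switch(b)   # two PMOS in series (dual)


for a, b in product((0, 1), repeat=2):
    print(a, b,
          "NAND:", static_gate(nand_pdn, nand_pun, a, b),
          "NOR:", static_gate(nor_pdn, nor_pun, a, b))
```

The assertion inside `static_gate` is the model's version of "the output is always connected to Vss or VDD": if the two networks were not duals, some input would leave the output floating or short the supplies.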
Advantages of static CMOS design:
Robust in construction.
Good noise immunity.
Static logic has no minimum clock rate, the clock can be paused indefinitely.
Low power consumption.
For low operating frequencies, CMOS static logic is used to obtain a relatively small
die size.
Limitations of static CMOS design:
The main limitation of static circuits is their slower speed compared to dynamic circuits. The
reasons are:
1. Increased gate capacitance due to the presence of both PMOS and NMOS transistors.
2. Output depends on the previous cycle inputs due to charges that may be present at internal
nodes.
3. Multiple switching of the output within a cycle, depending on the input switching pattern.
MOSFETS as Switches
The gate controls the passage of current between the source and the drain. CMOS uses
positive logic: VDD is logic '1' and Vss is logic '0'. We turn a transistor on or off using the
gate terminal. There are two kinds of CMOS transistors, n-channel transistors and p-channel
transistors. An n-channel transistor requires a logic '1' on the gate to make the switch
conducting (to turn the transistor on). A p-channel transistor requires a logic '0' on the gate
to make the switch conducting (to turn the transistor on). The conventional schematic icon
representation along with the switch characteristics is shown.
Basic CMOS Gates
In this section, the basic gate implementations in static CMOS are presented.
AND Gate
If two N-switches are placed in series, the composite switch is closed (ON) only if both
gates are at logic '1'. If either gate is at logic '0', the composite switch is open (OFF).
This yields an AND function. The switch logic of the AND function is shown in the figure.
OR Gate
If two N-switches are placed in parallel, the composite switch is closed (ON) if either gate
is at logic '1'. This yields an OR function.
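The series/parallel rules above can be sketched as boolean composition of switches. In this minimal Python model (function names hypothetical), an n-switch is closed when its gate is 1 and a p-switch when its gate is 0; series composition is AND of the "closed" states and parallel composition is OR. The last column shows that two p-switches in series conduct only when both gates are 0.

```python
from itertools import product

def n_switch(gate: int) -> bool:
    """An n-channel switch is closed (ON) when its gate is logic 1."""
    return gate == 1

def p_switch(gate: int) -> bool:
    """A p-channel switch is closed (ON) when its gate is logic 0."""
    return gate == 0

def series(*closed: bool) -> bool:
    # Current flows through a series chain only if every switch is closed.
    return all(closed)

def parallel(*closed: bool) -> bool:
    # A parallel group conducts if any one switch is closed.
    return any(closed)

print("A B series-N parallel-N series-P")
for a, b in product((0, 1), repeat=2):
    and_path = series(n_switch(a), n_switch(b))
    or_path = parallel(n_switch(a), n_switch(b))
    p_series = series(p_switch(a), p_switch(b))
    print(a, b, int(and_path), int(or_path), int(p_series))
```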
Bubble Pushing
CMOS stages are inherently inverting, so AND and OR functions must be built from
NAND and NOR gates. DeMorgan's law helps with this conversion:
A NAND gate is equivalent to an OR of inverted inputs. A NOR gate is equivalent to
an AND of inverted inputs. The same relationship applies to gates with more inputs.
Switching between these representations is easy to do on a whiteboard and is often called
bubble pushing.
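The two DeMorgan equivalences used in bubble pushing can be checked exhaustively. This small Python sketch (helper names are illustrative) compares a NAND with an OR of inverted inputs and a NOR with an AND of inverted inputs for 2, 3, and 4 inputs.

```python
from itertools import product

def nand(*x: int) -> int:
    return int(not all(x))

def nor(*x: int) -> int:
    return int(not any(x))

def or_of_inverted(*x: int) -> int:
    return int(any(not v for v in x))

def and_of_inverted(*x: int) -> int:
    return int(all(not v for v in x))

# Check both DeMorgan equivalences for every input pattern of width 2, 3, and 4.
for n in (2, 3, 4):
    for x in product((0, 1), repeat=n):
        assert nand(*x) == or_of_inverted(*x)
        assert nor(*x) == and_of_inverted(*x)
print("bubble pushing identities hold for 2-4 inputs")
```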
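Compound Gates:
Static CMOS also efficiently handles compound gates computing various inverting
combinations of AND and OR functions in a single stage, such as the AOI21 gate, Y = ~(A·B + C).
The logical effort of each input is the ratio of the input capacitance of that input to the
input capacitance of an inverter that can deliver the same output current.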
For the AOI21 gate, this means the logical effort is slightly lower for the OR terminal (C)
than for the two AND terminals (A, B).
The parasitic delay is crudely estimated from the total diffusion capacitance on the output
node by summing the sizes of the transistors attached to the output.
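Both estimates can be computed by counting transistor widths. The sketch below assumes the usual textbook sizing for an AOI21, Y = ~(A·B + C), giving unit drive with a mobility ratio of 2: series nMOS A and B of width 2, nMOS C of width 1, and all pMOS of width 4 (C in series with A parallel B). It also assumes the output node touches nMOS A, nMOS C, and one width-4 pMOS. These widths are assumptions of the sketch, not values read from the missing figure.

```python
from fractions import Fraction as F

INVERTER_CIN = 3  # unit inverter: nMOS width 1 + pMOS width 2 (assumed mobility ratio 2)

# Assumed AOI21 sizing for unit drive (Y = NOT(A*B + C)).
widths = {
    "A": {"n": 2, "p": 4},
    "B": {"n": 2, "p": 4},
    "C": {"n": 1, "p": 4},
}

def logical_effort(pin: str) -> F:
    # g = input capacitance of this pin / input capacitance of the unit inverter
    w = widths[pin]
    return F(w["n"] + w["p"], INVERTER_CIN)

# Crude parasitic estimate: sum of the widths whose diffusion touches the output,
# normalised to the unit inverter (whose output sees 1 + 2 = 3).
# Assumed: nMOS A (2) + nMOS C (1) + one width-4 pMOS touch the output.
output_diffusion = 2 + 1 + 4
parasitic = F(output_diffusion, INVERTER_CIN)

for pin in "ABC":
    print(pin, logical_effort(pin))   # A 2, B 2, C 5/3
print("p =", parasitic)               # p = 7/3
```

Under these assumptions the OR input C has g = 5/3 against 2 for A and B, which is the asymmetry described above.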
Input Ordering Delay Effect
The logical effort and parasitic delay of different gate inputs often differ. Even gates
such as NANDs and NORs, which are nominally symmetric, actually have slightly different
logical effort and parasitic delays for the different inputs.
Figure shows a 2-input NAND gate annotated with diffusion parasitics. Consider the
falling output transition that occurs when one input holds a stable 1 value and the other rises
from 0 to 1. If input B rises last, node x will initially be at VDD - Vt ≈ VDD because it was
pulled up through the nMOS transistor on input A.
The Elmore delay is (R/2)(2C) + R(6C) = 7RC. On the other hand, if input A
rises last, node x will initially be at 0 V because it was discharged through the nMOS
transistor on input B. No charge must be delivered to node x, so the Elmore delay is simply
R(6C) = 6RC.
In general, we define the outer input to be the input closer to the supply rail (e.g., B)
and the inner input to be the input closer to the output (e.g., A). The parasitic delay is smallest
when the inner input switches last because the intermediate nodes have already been
discharged. Therefore, if one signal is known to arrive later than the others, the gate is fastest
when that signal is connected to the inner input.
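The two Elmore sums can be reproduced with a few lines of Python. The sketch assumes, consistently with the 7RC and 6RC figures in the text, that each series nMOS has resistance R/2, node x has capacitance 2C, and the output node has 6C. Each node's capacitance is weighted by the resistance from that node to ground, and node x contributes only when it starts out charged (outer input B switching last).

```python
from fractions import Fraction as F

# Assumed NAND2 values: each series nMOS has resistance R/2;
# node x carries 2C and the output node carries 6C.
R_N = F(1, 2)      # resistance of each pulldown nMOS, in units of R
C_X, C_Y = 2, 6    # node capacitances, in units of C

def elmore_falling(x_initially_high: bool) -> F:
    """Elmore delay of the falling output, in units of RC."""
    r_to_gnd_x = R_N            # x discharges through the outer (B) transistor
    r_to_gnd_y = R_N + R_N      # the output discharges through both transistors
    delay = r_to_gnd_y * C_Y
    if x_initially_high:        # x holds charge only if it was pulled up via A
        delay += r_to_gnd_x * C_X
    return delay

print(elmore_falling(True))   # outer input B switches last: 7
print(elmore_falling(False))  # inner input A switches last: 6
```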
The inner input has a lower parasitic delay. The logical efforts are lower than
initial estimates might predict because of velocity saturation. Interestingly, the inner input has
a slightly higher logical effort because the intermediate node x tends to rise and cause
negative feedback when the inner input turns ON.
This effect is seldom significant to the designer because the inner input remains faster
over the range of fan-outs used in reasonable circuits. When one input is far less critical than
another, even nominally symmetric gates can be made asymmetric to favor the late input at
the expense of the early one.
For example, consider the path in Figure. Under ordinary conditions, the path acts as a
buffer between A and Y. When reset is asserted, the path forces the output low.
If reset only occurs under exceptional circumstances and can take place slowly, the
circuit should be optimized for input-to-output delay at the expense of reset.
The pulldown resistance is R/4 + R/(4/3) = R, so the gate still offers the same drive
as a unit inverter. However, the capacitance on input A is only 4/3 + 2 = 10/3, so the logical
effort is (10/3)/3 = 10/9. This is better than 4/3, which is normally associated with a NAND
gate. In the limit of an infinitely large reset transistor and a unit-sized nMOS transistor for
input A, the logical effort approaches 1, just like an inverter.
The improvement in logical effort of input A comes at the cost of much higher effort
on the reset input. Note that the pMOS transistor on the reset input is also shrunk. This
reduces its diffusion capacitance and parasitic delay at the expense of slower response to
reset.
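The trade-off can be swept numerically. In this sketch the reset nMOS has width w_r and the input-A nMOS is sized so the series pulldown still totals R, giving w_a = 1/(1 - 1/w_r); input A also drives a pMOS of width 2, the value implied by the 10/3 input capacitance above. The helper names are hypothetical.

```python
from fractions import Fraction as F

def input_a_width(reset_width: F) -> F:
    """nMOS width for input A so that R/w_reset + R/w_a = R (unit pulldown)."""
    return 1 / (1 - 1 / reset_width)

def logical_effort_a(reset_width: F, pmos_a_width: F = F(2)) -> F:
    # Input A drives its nMOS plus a pMOS of width 2 (assumed);
    # divide by the unit inverter's input capacitance of 3.
    return (input_a_width(reset_width) + pmos_a_width) / 3

# w_r = 2 is an ordinary NAND (g = 4/3); w_r = 4 gives g = 10/9; larger w_r -> g near 1.
for w in (F(2), F(4), F(16), F(1000)):
    print(w, input_a_width(w), logical_effort_a(w))
```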
Skewed Gates
In other cases, one input transition is more important than the other. We define HI-skew
gates to favor the rising output transition and LO-skew gates to favor the falling output
transition. This favoring can be done by decreasing the size of the noncritical transistor.
The logical efforts for the rising (up) and falling (down) transitions are called gu and gd,
respectively, and are the ratio of the input capacitance of the skewed gate to the input
capacitance of an unskewed inverter with equal drive for that transition.
Figure (a) shows how a HI-skew inverter is constructed by downsizing the nMOS
transistor. This maintains the same effective resistance for the critical transition while
reducing the input capacitance relative to the unskewed inverter of Figure (b), thus reducing
the logical effort on that critical transition to gu = 2.5/3 = 5/6.
Of course, the improvement comes at the expense of the effort on the
noncritical transition. The logical effort for the falling transition is estimated by comparing
the inverter to a smaller unskewed inverter with equal pulldown current, shown in Figure (c),
giving a logical effort of gd = 2.5/1.5 = 5/3.
The degree of skewing (e.g., the ratio of effective resistance for the fast transition
relative to the slow transition) impacts the logical efforts and noise margins; a factor of two is
common. Figure catalogs HI-skew and LO-skew gates with a skew factor of two. Skewed
gates are sometimes denoted with an H or an L on their symbol in a schematic.
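The gu and gd calculation generalises to any inverter widths. This sketch assumes a pMOS must be twice as wide as an nMOS to deliver the same current; for each transition it divides the skewed inverter's input capacitance by that of an unskewed inverter with the same pull-up (rising) or pull-down (falling) strength. The HI-skew case uses the nMOS width 1/2 and pMOS width 2 implied by the 2.5 input capacitance above; the LO-skew sizing is an assumed example with the pMOS halved.

```python
from fractions import Fraction as F

MOBILITY_RATIO = 2  # assumed: a pMOS must be twice as wide as an nMOS for equal current

def skewed_inverter_efforts(wn: F, wp: F) -> tuple[F, F]:
    cin = wn + wp
    # Rising: compare with an unskewed inverter with the same pull-up (pMOS wp).
    ref_rise = wp + wp / MOBILITY_RATIO
    # Falling: compare with an unskewed inverter with the same pull-down (nMOS wn).
    ref_fall = wn + wn * MOBILITY_RATIO
    return cin / ref_rise, cin / ref_fall

gu, gd = skewed_inverter_efforts(F(1, 2), F(2))  # HI-skew: nMOS halved
print("HI-skew:", gu, gd, (gu + gd) / 2)         # 5/6 5/3 5/4
gu, gd = skewed_inverter_efforts(F(1), F(1))     # LO-skew: pMOS halved
print("LO-skew:", gu, gd, (gu + gd) / 2)         # 4/3 2/3 1
```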
P/N Ratios
The pMOS transistors in the unskewed gate are enormous in order to provide a rise
delay equal to the fall delay. They contribute input capacitance for both transitions, while only
helping the rising delay. By accepting a slower rise delay, the pMOS transistors can be downsized
to reduce input capacitance and average delay significantly.
Reducing the pMOS size from 2 to √2 ≈ 1.4 for the inverter gives the theoretical fastest
average delay, but this delay improvement is only 3%. However, this significantly reduces
the pMOS transistor area.
It also reduces input capacitance, which in turn reduces power consumption.
Unfortunately, it leads to unequal rising and falling delays. Some paths can be slower than
average if they trigger the worst edge of each gate.
Excessively slow rising outputs can also cause hot electron degradation. Reducing the
pMOS size also moves the switching point lower and reduces the inverter's
noise margin. In summary, the P/N ratio of a library of cells should be chosen on the basis of
area, power, and reliability, not average delay.
For NOR gates, reducing the size of the pMOS transistors significantly improves
both delay and area. In most standard cell libraries, the pitch of the cell determines the P/N
ratio that can be achieved in any particular gate. Ratios of 1.5–2 are commonly used for
inverters.
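Why the optimum gains so little can be seen from a simple model. Assume (for this sketch only) an inverter with nMOS width 1 and pMOS width P driving an identical inverter, a pMOS resistance twice that of an equal-width nMOS, and no parasitic capacitance. The average delay is then proportional to (1 + P)(1 + 2/P)/2 = (3 + P + 2/P)/2, which is minimized at P = √2. The scan below finds that minimum and compares it with P = 2.

```python
import math

MOBILITY_RATIO = 2.0  # assumed pMOS/nMOS resistance ratio at equal width

def average_delay(p: float) -> float:
    """Average of rise and fall delay (arbitrary units) for an inverter with
    nMOS width 1 and pMOS width p driving an identical inverter.
    Parasitic capacitance is ignored in this simplified model."""
    load = 1 + p                           # gate capacitance of the next stage
    t_fall = 1.0 * load                    # nMOS resistance 1
    t_rise = (MOBILITY_RATIO / p) * load   # pMOS resistance 2/p
    return (t_fall + t_rise) / 2

candidates = [1 + i / 1000 for i in range(2001)]  # P from 1.000 to 3.000
best = min(candidates, key=average_delay)
gain = 1 - average_delay(best) / average_delay(2.0)
print(f"optimum P = {best:.3f} (sqrt 2 = {math.sqrt(2):.3f})")
print(f"speedup vs P = 2: {gain:.1%}")
```

The improvement comes out at roughly 3%, which is why area, power, and reliability dominate the choice of P/N ratio.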
Multiple Threshold Voltages
Some CMOS processes offer two or more threshold voltages. Transistors with lower
threshold voltages produce more ON current, but also leak exponentially more OFF current.
Libraries can provide both high- and low-threshold versions of gates. The low-threshold
gates can be used sparingly to reduce the delay of critical paths. Skewed gates can use low-threshold
devices on only the critical network of transistors.
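The exponential dependence can be illustrated with the common approximation I_off ∝ 10^(-Vt/S), where S is the subthreshold slope. S is at least about 60 mV/decade at room temperature; the 100 mV/decade used below, and the threshold voltages in the example, are assumed round numbers rather than values from any particular process.

```python
S_MV_PER_DECADE = 100.0  # assumed subthreshold slope (mV per decade of current)

def leakage_ratio(vt_high_mv: float, vt_low_mv: float) -> float:
    """How many times more OFF current the low-Vt device leaks at Vgs = 0,
    using the simple model I_off ~ 10**(-Vt / S)."""
    return 10 ** ((vt_high_mv - vt_low_mv) / S_MV_PER_DECADE)

# Hypothetical threshold voltages, for illustration only.
print(leakage_ratio(400, 300))  # 100 mV lower Vt: 10x the leakage
print(leakage_ratio(400, 200))  # 200 mV lower Vt: 100x the leakage
```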
Delay estimation:
Estimating the delay of a Boolean function from its functional description is an
important step in design exploration at the register transfer level (RTL). The problem is to
estimate the delay of certain optimal multi-level implementations of combinational circuits,
given only their functional description.
tpdr: rising propagation delay: max time from input to rising output crossing VDD/2
tpdf: falling propagation delay: max time from input to falling output crossing VDD/2
tpd: average propagation delay: tpd = (tpdr + tpdf)/2
tr: rise time: from output crossing 20% to 80% of VDD
tf: fall time: from output crossing 80% to 20% of VDD
tcdr: rising contamination delay: min time from input to rising output crossing VDD/2
tcdf: falling contamination delay: min time from input to falling output crossing VDD/2
tcd: average contamination delay: tcd = (tcdr + tcdf)/2
Use RC delay models to estimate delay: C is the total capacitance on the output node and
R is the effective resistance of the driving transistor, so tpd = RC.
Transistors are characterized by finding their effective resistance R.
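As a worked instance of tpd = RC, the sketch below estimates the delay of a unit inverter driving identical inverters. It assumes the usual unit-inverter model (an assumption here, not stated in this section): effective resistance R for both transitions, gate capacitance 3C per input (nMOS width 1 plus pMOS width 2), and 3C of diffusion capacitance on the driver's own output.

```python
def inverter_tpd_rc(fanout: int, r: float = 1.0, c: float = 1.0) -> float:
    """RC estimate of the delay of a unit inverter driving `fanout` identical
    unit inverters (assumed: gate cap 3c per input, diffusion cap 3c)."""
    c_total = 3 * c + fanout * 3 * c   # self-loading plus the load gates
    return r * c_total                 # tpd = R * C

print(inverter_tpd_rc(1))  # 6.0, i.e. 6 RC
print(inverter_tpd_rc(4))  # 15.0, i.e. 15 RC (fanout-of-4 inverter)
```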
Transistor sizing:
Not all gates need to have the same delay.
Not all inputs to a gate need to have the same delay.
Adjust transistor sizes to achieve desired delay.
Logical effort
Logical effort is a gate delay model that takes transistor sizes into account. It allows us
to optimize transistor sizes over combinational networks, but it is less accurate for circuits with
reconvergent fanout.
Logical effort gate delay model
Express delays in process-independent units.
Gate delay is measured in units of the minimum-size inverter delay τ: d = dabs / τ.
τ = 3RC, about 12 ps in a 180 nm process and 40 ps in a 0.6 µm process.
Gate delay formula: d = f + p.
The effort delay f is related to the gate's load. The parasitic delay p depends on the gate's
structure: it represents the delay of the gate driving no load and is set by internal parasitic capacitance.
Effort delay
Effort delay has two components: f = gh.
The electrical effort h is determined by the gate's load: h = Cout/Cin, sometimes called the fanout.
The logical effort g is determined by the gate's structure and measures the relative ability of the gate to
deliver current; g ≡ 1 for an inverter.
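The delay formula d = gh + p can be evaluated directly. The inverter values (g = 1, p = 1) and τ ≈ 12 ps at 180 nm come from this section; the NAND2 and NOR2 entries are the usual textbook estimates, included as assumptions. A fanout-of-4 inverter comes out at 5τ, consistent with the 15RC obtained from the RC model when τ = 3RC.

```python
from fractions import Fraction as F

TAU_PS_180NM = 12.0  # tau quoted for a 180 nm process

# (g, p) pairs; NAND2 and NOR2 are assumed textbook estimates.
GATES = {
    "inv": (F(1), F(1)),
    "nand2": (F(4, 3), F(2)),
    "nor2": (F(5, 3), F(2)),
}

def stage_delay(gate: str, c_out: F, c_in: F) -> F:
    g, p = GATES[gate]
    h = c_out / c_in   # electrical effort (fanout)
    f = g * h          # effort delay
    return f + p       # d = f + p, in units of tau

d_fo4 = stage_delay("inv", F(12), F(3))  # inverter driving 4 copies of itself
print(d_fo4, "tau =", float(d_fo4) * TAU_PS_180NM, "ps")  # 5 tau = 60.0 ps
print(stage_delay("nand2", F(12), F(4)))  # g = 4/3, h = 3: d = 4 + 2 = 6
```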
Delay plots:
Computing Logical Effort
Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an
inverter delivering the same output current. It can be measured from delay vs. fanout plots or
estimated by counting transistor widths.
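Counting widths for NAND and NOR gates gives closed forms. This sketch assumes a mobility ratio of 2 and sizes each gate for the same worst-case drive as a unit inverter (nMOS 1, pMOS 2): an n-input NAND uses n series nMOS of width n and parallel pMOS of width 2, and an n-input NOR uses parallel nMOS of width 1 and n series pMOS of width 2n.

```python
from fractions import Fraction as F

MOBILITY_RATIO = 2              # assumed: pMOS needs 2x width for the same current
INV_CIN = 1 + MOBILITY_RATIO    # unit inverter input capacitance = 3

def g_nand(n: int) -> F:
    # Each input drives one series nMOS of width n and one parallel pMOS of width 2.
    return F(n + MOBILITY_RATIO, INV_CIN)

def g_nor(n: int) -> F:
    # Each input drives one parallel nMOS of width 1 and one series pMOS of width 2n.
    return F(1 + MOBILITY_RATIO * n, INV_CIN)

for n in range(1, 5):
    print(n, g_nand(n), g_nor(n))
```

NOR efforts grow faster than NAND efforts because the series devices are the already-wide pMOS.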
Circuit families and their comparison:
The method of logical effort does not apply to arbitrary transistor networks, but only
to logic gates. A logic gate has one or more inputs and one output, subject to the following
restrictions:
The gate of each transistor is connected to an input, a power supply, or the output; and
Inputs are connected only to transistor gates.
The first condition rules out multiple logic gates masquerading as one, and the second
keeps inputs from being connected to transistor sources or drains, as in transmission gates
without explicit drivers.
Pseudo-NMOS circuits
Static CMOS gates are slowed because an input must drive both NMOS and PMOS
transistors. In any transition, either the pullup or pulldown network is activated, meaning the
input capacitance of the inactive network loads the input. Moreover, PMOS transistors have
poor mobility and must be sized larger to achieve comparable rising and falling delays,
further increasing input capacitance.
Pseudo-NMOS and dynamic gates offer improved speed by removing the PMOS
transistors from loading the input. Pseudo-NMOS gates resemble static gates, but replace the
slow PMOS pullup stack with a single grounded PMOS transistor which acts as a pullup
resistor. The effective pullup resistance should be large enough that the NMOS transistors
can pull the output to near ground, yet low enough to rapidly pull the output high.
Figure shows several pseudo-NMOS gates ratioed such that the pulldown transistors
are about four times as strong as the pullup. The logical effort follows from considering the
output current and input capacitance compared to the reference inverter from the figure. Sized
as shown, the PMOS transistors produce 1/3 of the current of the reference inverter and the
NMOS transistor stacks produce 4/3 of the current of the reference inverter.
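For falling transitions, the output current is the pulldown current minus the pullup
current that is fighting it: 4/3 - 1/3 = 1. For rising transitions, the output current is just the
pullup current, 1/3. The inverter and NOR gate have an input capacitance of 4/3. The falling
logical effort is this input capacitance divided by that of an inverter with the same output
current, gd = (4/3)/3 = 4/9. The rising logical effort is three times greater, gu = 4/3, because
the rising current is only one third of the falling current.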
Logical effort g of pseudo-NMOS gates
Gate type   Rising   Falling   Average
2-NAND      8/3      8/9       16/9
3-NAND      4        4/3       8/3
4-NAND      16/3     16/9      32/9
n-NOR       4/3      4/9       8/9
n-mux       8/3      8/9       16/9
The average logical effort of the inverter and NOR gate is g = (4/9 + 4/3)/2 = 8/9. This is
independent of the number of inputs, which explains why pseudo-NMOS is a good way to build fast wide NOR gates.
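The whole pseudo-NMOS table follows from the current ratios stated above (pull-up 1/3 and pull-down stack 4/3 of the reference inverter's current). The sketch additionally assumes the usual convention that a reference inverter delivering unit current has input capacitance 3, that each series nMOS in a stack of height k is widened so the stack still delivers 4/3 (so one input sees 4k/3), and that the mux behaves like a stack of two (data in series with select).

```python
from fractions import Fraction as F

I_PULLUP = F(1, 3)     # pull-up current relative to the reference inverter
I_PULLDOWN = F(4, 3)   # pull-down stack current relative to the reference inverter
REF_CIN = 3            # assumed input capacitance of a unit-current reference inverter

def pseudo_nmos_effort(stack_height: int) -> tuple[F, F, F]:
    # Only the nMOS loads the input; it is widened by the stack height.
    cin = I_PULLDOWN * stack_height
    i_rise = I_PULLUP                 # rising: only the pull-up drives the output
    i_fall = I_PULLDOWN - I_PULLUP    # falling: pull-down fights the pull-up
    g_up = cin / (REF_CIN * i_rise)
    g_down = cin / (REF_CIN * i_fall)
    return g_up, g_down, (g_up + g_down) / 2

for name, height in [("2-NAND", 2), ("3-NAND", 3), ("4-NAND", 4),
                     ("n-NOR", 1), ("n-mux", 2)]:
    print(name, *pseudo_nmos_effort(height))  # rising, falling, average
```

Because an n-input NOR keeps a stack height of 1 regardless of n, its effort does not grow with the number of inputs.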
Pass Transistor Logic:
A pass transistor is a MOS transistor whose gate is driven by a control signal. Its source acts
as the output and its drain is connected to a constant or variable voltage (the input). When the
control signal is high, the input is passed to the output; when the control signal is low, the
output floats. Circuits built from this topology are called pass-transistor circuits.
Pass-transistor logic reduces the number of transistors needed to implement logic
by allowing the primary inputs to drive gate, source, and drain terminals. In
complementary CMOS logic, primary inputs are allowed to drive only gate terminals.
Figure shows an implementation of the AND function using only MOS pass transistors. In this
gate, if the B input is high, the left NMOS is turned ON and copies input A to the output F.
When B is low, the right NMOS pass transistor (driven by ~B) is turned ON and passes a '0' to
the output F. This satisfies the truth table of the AND gate, reproduced in the table below for
verification.
OR gate using pass transistor logic
The truth table of the OR gate is shown in the table below. The figure below shows the
implementation of the OR function using NMOS transistors only. In this gate, if the B input is
high, the right NMOS is turned ON and copies logic '1' to F; this operation is not affected by
the A input. When B is low, the left NMOS is turned ON and the value of A is copied
to the output F.
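Both pass-transistor gates act as 2:1 selections steered by B. This logic-level sketch (function names hypothetical) models an NMOS pass transistor as passing its source value when its gate is 1 and leaving the output floating (None) otherwise, then checks the AND and OR truth tables. It deliberately ignores the degraded VDD - Vt high level discussed under the drawbacks.

```python
from itertools import product
from typing import Optional

def nmos_pass(gate: int, source: int) -> Optional[int]:
    """Logic-level pass transistor: passes `source` when gate is 1, else floats (None)."""
    return source if gate == 1 else None

def resolve(*drivers: Optional[int]) -> int:
    # In these gates exactly one pass transistor is ON, so exactly one path drives F.
    active = [d for d in drivers if d is not None]
    assert len(active) == 1, "output must be driven by exactly one path"
    return active[0]

def pt_and(a: int, b: int) -> int:
    # Left NMOS (gate B) passes A; right NMOS (gate ~B) passes a constant 0.
    return resolve(nmos_pass(b, a), nmos_pass(1 - b, 0))

def pt_or(a: int, b: int) -> int:
    # Right NMOS (gate B) passes a constant 1; left NMOS (gate ~B) passes A.
    return resolve(nmos_pass(b, 1), nmos_pass(1 - b, a))

for a, b in product((0, 1), repeat=2):
    assert pt_and(a, b) == (a & b)
    assert pt_or(a, b) == (a | b)
    print(a, b, pt_and(a, b), pt_or(a, b))
```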
Advantage:
Fewer transistors are required to implement a given function.
Lower capacitance because of reduced number of transistors.
There is no static path from VDD to GND, so they do not dissipate standby (static)
power.
Drawback:
As discussed, NMOS devices are effective at passing a strong '0' but are poor at
pulling a node to VDD. Hence, when the pass transistor pulls a node high, the output
only rises to VDD - VTh. This is the major disadvantage of pass transistors.
Pass transistor logic (PTL) circuits are often superior to standard CMOS circuits in
terms of layout density, circuit delay and power consumption.
Transmission Gate Logic:
The transmission gate logic is used to solve the voltage drop problem of pass
transistor logic. This technique uses the complementary properties of NMOS and PMOS
transistors: NMOS devices pass a strong '0' but a weak '1', while PMOS transistors
pass a strong '1' but a weak '0'. The transmission gate combines the best of the two devices
by placing an NMOS transistor in parallel with a PMOS transistor, as shown in the figure below.
The control signals to the transmission gate, C and ~C, are complementary to each
other. The transmission gate is essentially a bidirectional switch enabled by the gate signal C.
When C = 1, both MOSFETs are ON and the signal passes through the gate, i.e., A = B.
When C = 0, both MOSFETs are cut off, creating an open circuit between nodes A and B.
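A steady-state voltage sketch shows why the parallel pair removes the threshold drop. Assumptions (all hypothetical, body effect ignored): VDD = 1.8 V and Vtn = |Vtp| = 0.4 V. An nMOS with its gate at VDD stops conducting once the output reaches VDD - Vtn, and a pMOS with its gate at 0 V stops once the output falls to |Vtp|. For any input level at least one device in a transmission gate is still conducting, so the output settles at the input.

```python
from typing import Optional

VDD = 1.8  # hypothetical supply (V)
VTN = 0.4  # hypothetical nMOS threshold (V)
VTP = 0.4  # hypothetical |Vtp| (V)

def nmos_pass_level(v_in: float) -> float:
    # Gate at VDD: the nMOS turns off once the output reaches VDD - Vtn.
    return min(v_in, VDD - VTN)

def pmos_pass_level(v_in: float) -> float:
    # Gate at 0 V: the pMOS turns off once the output falls to |Vtp|.
    return max(v_in, VTP)

def tgate_pass_level(v_in: float, c: int) -> Optional[float]:
    """Final output of a transmission gate; None means the output floats (C = 0)."""
    if c == 0:
        return None
    # The nMOS can reach any level up to VDD - Vtn and the pMOS any level down to
    # |Vtp|, so at least one of them carries the output all the way to v_in.
    n_level = nmos_pass_level(v_in)
    return n_level if n_level == v_in else pmos_pass_level(v_in)

for v in (0.0, VDD):
    print(f"in={v:.1f} V  nMOS={nmos_pass_level(v):.1f}  "
          f"pMOS={pmos_pass_level(v):.1f}  TG={tgate_pass_level(v, 1):.1f}")
```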
Basic Structure:
The basic structure of the transmission gate, shown in the figure below, consists of
an NMOS and a PMOS transistor. VG is applied to the NMOS gate and its complement (VDD - VG)
is applied to the PMOS gate.
The transmission gate works as a voltage-controlled switch. When VG is high, the NMOS and
PMOS are conducting, so the switch is closed and a conduction path exists between the left and
right sides. When VG is low, the MOSFETs are in cutoff and the switch is open, so there is no
direct relationship between VA and VB. The figure below shows the symbol of the transmission
gate controlled by the switching signals X and X*, which are applied to the gates of the NMOS
and PMOS, respectively.
The circuit is constructed by connecting a PMOS and an NMOS in parallel, with their drain
and source terminals shorted together. The gates are driven by the complementary select
signals s and ~s; when s is high, the transmission gate passes the signal at its input. The main
advantage of the transmission gate is that it eliminates the threshold voltage drop. Transmission
gates are used as the multiplexing element of a path selector, as a latch element, as an analog
switch, and as a voltage-controlled resistor connecting the input and output.
2 : 1 MUX using transmission gate :
A 2:1 multiplexer is shown in the figure below. This gate selects either input A or B on the basis
of the value of the control signal C. When the control signal C is logic high, the output is equal
to input A, and when C is logic low, the output is equal to input B.
A 2:1 multiplexer can be implemented using transmission gates. The figure below shows the
connection diagram of the 2:1 multiplexer using transmission gates.
The 2:1 MUX selects either A or B depending upon the control signal C. This is
equivalent to implementing the Boolean function F = A·C + B·~C. When the control
signal C is high, the upper transmission gate is ON and passes A, so the
output = A.
When the control signal C is low, the upper transmission gate turns OFF and does not
allow A to pass; at the same time, the lower transmission gate is ON and allows
B to pass, so the output = B.
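The transmission-gate multiplexer can be checked against F = A·C + B·~C with a logic-level sketch. Here a transmission gate (hypothetical helper `tgate`) passes its data input when enabled and floats otherwise; the upper gate is enabled by C and the lower by ~C, so exactly one of them drives the output.

```python
from itertools import product
from typing import Optional

def tgate(data: int, c: int) -> Optional[int]:
    """Transmission gate: passes data when its nMOS gate sees c = 1 (pMOS sees ~c)."""
    return data if c == 1 else None

def mux2(a: int, b: int, c: int) -> int:
    upper = tgate(a, c)        # upper TG enabled by C
    lower = tgate(b, 1 - c)    # lower TG enabled by ~C
    # The controls are complementary, so exactly one TG drives the output.
    return upper if upper is not None else lower

for a, b, c in product((0, 1), repeat=3):
    assert mux2(a, b, c) == (a & c) | (b & (1 - c))   # F = A.C + B.~C
print("mux2 matches F = A.C + B.~C for all 8 input combinations")
```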
DYNAMIC CMOS LOGIC
Ratioed circuits reduce the input capacitance by replacing the pMOS
transistors connected to the inputs with a single resistive pullup. The drawbacks of ratioed
circuits include slow rising transitions, contention on the falling transitions, static power
dissipation, and a nonzero VOL.
Dynamic circuits circumvent these drawbacks by using a clocked pullup transistor
rather than a pMOS that is always ON. Figure compares (a) static CMOS, (b) pseudo-nMOS,
and (c) dynamic inverters. Dynamic circuit operation is divided into two modes, as shown in
Figure. During precharge, the clock is 0, so the clocked pMOS is ON and initializes the output
high. During evaluation, the clock is 1 and the clocked pMOS turns OFF; the output either
remains high or is discharged low through the nMOS pulldown network.
Dynamic circuits are the fastest commonly used circuit family because they have
lower input capacitance and no contention during switching. They also have zero static power
dissipation. However, they require careful clocking, consume significant dynamic power, and
are sensitive to noise during evaluation.
In Figure, if the input A is 1 during precharge, contention will take place because both
the pMOS and nMOS transistors will be ON.
When the input cannot be guaranteed to be 0 during precharge, an extra clocked evaluation
transistor can be added to the bottom of the nMOS stack to avoid contention as shown in
Figure. The extra transistor is sometimes called a foot.
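A logic-level sketch of one dynamic inverter makes the role of the foot concrete. Precharge is clock 0 (clocked pMOS ON) and evaluation is clock 1; the optional foot is a clocked nMOS that blocks the pulldown stack while the clock is 0. The function name and the use of None for the undefined ratioed level during contention are choices of this sketch.

```python
from typing import Optional, Tuple

def dynamic_inverter_step(clk: int, a: int, y_prev: int,
                          footed: bool) -> Tuple[Optional[int], bool]:
    """One clock phase of a dynamic inverter at logic level.

    clk = 0 is precharge, clk = 1 is evaluation. Returns (output, contention);
    output is None when both networks conduct and the level is undefined here.
    """
    pullup_on = clk == 0                               # clocked pMOS precharge device
    pulldown_on = a == 1 and (clk == 1 or not footed)  # foot blocks the stack at clk = 0
    if pullup_on and pulldown_on:
        return None, True      # VDD-to-GND path: static current, ratioed level
    if pullup_on:
        return 1, False        # precharge the output high
    if pulldown_on:
        return 0, False        # evaluate: discharge through the nMOS stack
    return y_prev, False       # both OFF: the output floats and keeps its charge

for footed in (False, True):
    _, fight = dynamic_inverter_step(clk=0, a=1, y_prev=0, footed=footed)
    print("footed" if footed else "unfooted", "precharge with A = 1, contention:", fight)
```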
Figure estimates the falling logical effort of both footed and unfooted dynamic gates.
As usual, the pulldown transistors' widths are chosen to give unit resistance. Precharge
occurs while the gate is idle and often may take place more slowly. Therefore, the precharge
transistor width is chosen for twice unit resistance.
This reduces the capacitive load on the clock and the parasitic capacitance at the
expense of greater rising delays. We see that the logical efforts are very low. Footed gates
have higher logical effort than their unfooted counterparts but are still an improvement over
static logic. In practice, the logical effort of footed gates is better than predicted because
velocity saturation means series nMOS transistors have less resistance than we have
estimated.
The size of the foot can be increased relative to the other nMOS transistors to reduce
logical effort of the other inputs at the expense of greater clock loading. Like pseudo-nMOS
gates, dynamic gates are particularly well suited to wide NOR functions or multiplexers
because the logical effort is independent of the number of inputs.
A fundamental difficulty with dynamic circuits is the monotonicity
requirement. While a dynamic gate is in evaluation, the inputs must be monotonically rising.
That is, the input can start LOW and remain LOW, start LOW and rise HIGH, start HIGH
and remain HIGH, but not start HIGH and fall LOW.
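Figure shows waveforms for a footed dynamic inverter in which the input violates
monotonicity. During precharge, the output is pulled HIGH. When the clock rises, the input
is HIGH, so the output is discharged LOW through the pulldown network, as you would want
to happen in an inverter. The input later falls LOW, turning off the pulldown network.
However, the precharge transistor is also OFF, so the output floats and stays LOW instead of
rising as a correct inverter output would. It remains LOW until the next precharge, so the inputs
must be monotonically rising for the dynamic gate to compute the correct function.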
The output of a dynamic gate begins HIGH and monotonically falls LOW during
evaluation. This monotonically falling output X is not a suitable input to a second dynamic
gate expecting monotonically rising signals.
CMOS Domino Logic
The monotonicity problem can be solved by placing a static CMOS inverter between
dynamic gates, as shown in Figure. This converts the monotonically falling output into a
monotonically rising signal suitable for the next gate, as shown in Figure.
The dynamic static pair together is called a domino gate because precharge
resembles setting up a chain of dominos and evaluation causes the gates to fire like dominos
tipping over, each triggering the next.
A single clock can be used to precharge and evaluate all the logic gates within the
chain. The dynamic output is monotonically falling during evaluation, so the static inverter
output is monotonically rising. Therefore, the static inverter is usually a HI-skew gate to
favor this rising output.
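The difference between directly cascaded dynamic gates and a domino chain can be seen in a unit-delay, logic-level simulation of two dynamic inverters during evaluation. In this sketch (a simplification: every gate has one step of delay, and dynamic nodes can only discharge), the direct cascade always ends LOW because the second gate sees its input's precharged HIGH level before the first gate has fallen. With a static inverter after each dynamic stage, each dynamic input starts LOW and the chain settles to the correct value.

```python
def simulate(a: int, domino: bool, steps: int = 6) -> int:
    """Two cascaded dynamic inverters during evaluation (unit-delay sketch).

    With domino=True a static inverter follows each dynamic stage.
    Returns the final value at the end of the chain (ideally equal to a).
    """
    x1 = x2 = 1          # dynamic nodes after precharge
    s1 = s2 = 0          # static inverter outputs after precharge
    for _ in range(steps):
        in2 = s1 if domino else x1          # what drives the second dynamic gate
        # Dynamic nodes can only be discharged during evaluation, never recharged.
        nx1 = 0 if a == 1 else x1
        nx2 = 0 if in2 == 1 else x2
        ns1, ns2 = 1 - x1, 1 - x2           # static inverters (unit delay)
        x1, x2, s1, s2 = nx1, nx2, ns1, ns2
    return s2 if domino else x2

for a in (0, 1):
    print(f"A={a}: direct cascade -> {simulate(a, domino=False)}, "
          f"domino -> {simulate(a, domino=True)} (expected {a})")
```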
In general, more complex inverting static CMOS gates such as NANDs or NORs can
be used in place of the inverter. This mixture of dynamic and static logic is called compound
domino.
Domino gates are inherently noninverting, while some functions like XOR gates
necessarily require inversion. Three methods of addressing this problem include pushing
inversions into static logic, delaying clocks, and using dual-rail domino logic.
A second approach is to directly cascade dynamic gates without the static CMOS
inverter, delaying the clock to the later gates to ensure the inputs are monotonic during
evaluation.
Domino circuits
Pseudo-NMOS gates eliminate the bulky PMOS transistors loading the inputs, but pay
the price of quiescent power dissipation and contention between the pullup and pulldown
transistors. Dynamic gates offer even better logical effort and lower power consumption by
using a clocked precharge transistor instead of a pullup that is always conducting.
The dynamic gate is precharged HIGH then may evaluate LOW through an NMOS
stack. Unfortunately, if one dynamic inverter directly drives another, a race can corrupt the
result. When the clock rises, both outputs have been precharged HIGH.
The HIGH input to the first gate causes its output to fall, but the second gate's output
also falls in response to its initial HIGH input. The circuit therefore produces an incorrect
result because the second output will never rise during evaluation, as shown in Figure 10.3.
Domino circuits solve this problem by using inverting static gates between dynamic gates so
that the input to each dynamic gate is initially LOW. The falling dynamic output and rising
static output ripple through a chain of gates like a chain of toppling dominos.
In summary, domino logic runs 1.5 to 2 times faster than static CMOS logic because
dynamic gates present a much lower input capacitance for the same output current and have a
lower switching threshold, and because the inverting static gate can be skewed to favor the
critical monotonically rising evaluation edges. Figure shows some domino gates. Each
domino gate consists of a dynamic gate followed by an inverting static gate.
The static gate is often but not always an inverter. Since the dynamic gate's output
falls monotonically during evaluation, the static gate should be skewed high to favor its
monotonically rising output.
A dynamic gate may be designed with or without a clocked evaluation transistor; the
extra transistor slows the gate but eliminates any path between power and ground during
precharge when the inputs are still high.
Dual-Rail Domino Logic:
Dual-rail domino gates encode each signal with a pair of wires. Each signal pair is
denoted sig_h and sig_l. Table summarizes the encoding.
The sig_h wire is asserted to indicate that the output of the gate is "high" or 1. The sig_l wire
is asserted to indicate that the output of the gate is "low" or 0.
When the gate is precharged, neither sig_h nor sig_l is asserted. The pair of lines
should never be both asserted simultaneously during correct operation.
Dual-rail domino gates accept both true and complementary inputs and compute both
true and complementary outputs, as shown in Figure. Observe that this is identical to static
CVSL circuits from Figure except that the cross-coupled pMOS transistors are instead
connected to the precharge clock. Therefore, dual-rail domino can be viewed as a dynamic
form of CVSL, sometimes called DCVS.
Figure shows a dual-rail AND/NAND gate and Figure shows a dual-rail XOR/XNOR
gate. The gates are shown with clocked evaluation transistors, but can also be unfooted.
Dual-rail domino is a complete logic family in that it can compute all inverting and noninverting
logic functions.
However, it requires more area, wiring, and power. Dual rail structures also lose the
efficiency of wide dynamic NOR gates because they require complementary tall dynamic
NAND stacks.
Dual-rail domino signals not only the result of a computation but also when
the computation is done. Before computation completes, both rails are precharged. When the
computation completes, one rail will be asserted. A NAND gate can be used for completion
detection, as shown in Figure. This is particularly useful for asynchronous circuits.
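The dual-rail encoding, gate functions, and completion detection can be sketched at the logic level. Each signal is a (sig_h, sig_l) pair with (0, 0) meaning precharged. Both output rails are noninverting functions of the input rails, as a dynamic gate requires; for AND/NAND, out_h = a_h·b_h and out_l = a_l + b_l, and for XOR/XNOR the cross terms are used. Completion is modeled as "either rail asserted", which by DeMorgan equals a NAND of the complemented rails. All names are illustrative.

```python
from itertools import product
from typing import Optional, Tuple

Rail = Tuple[int, int]   # (sig_h, sig_l)
PRECHARGED: Rail = (0, 0)

def encode(v: int) -> Rail:
    return (1, 0) if v else (0, 1)

def decode(sig: Rail) -> Optional[int]:
    h_rail, l_rail = sig
    assert not (h_rail and l_rail), "sig_h and sig_l must never both be asserted"
    return 1 if h_rail else (0 if l_rail else None)   # None: still precharged

def and_nand(a: Rail, b: Rail) -> Rail:
    # out_h = a_h . b_h ; out_l = a_l + b_l
    return (a[0] & b[0], a[1] | b[1])

def xor_xnor(a: Rail, b: Rail) -> Rail:
    # out_h = a_h . b_l + a_l . b_h ; out_l = a_h . b_h + a_l . b_l
    return ((a[0] & b[1]) | (a[1] & b[0]), (a[0] & b[0]) | (a[1] & b[1]))

def done(sig: Rail) -> int:
    # Completion: one rail asserted (equivalently, NAND of the complemented rails).
    return sig[0] | sig[1]

for a, b in product((0, 1), repeat=2):
    y = and_nand(encode(a), encode(b))
    z = xor_xnor(encode(a), encode(b))
    assert decode(y) == (a & b) and decode(z) == (a ^ b)
    assert done(y) and done(z)
print(decode(PRECHARGED), done(PRECHARGED))   # None 0
```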
Keepers
Dynamic circuits also suffer from charge leakage on the dynamic node. If a dynamic
node is precharged high and then left floating, the voltage on the dynamic node will drift over
time due to subthreshold, gate, and junction leakage. The time constants tend to be in the
millisecond to nanosecond range, depending on process and temperature. This problem is
analogous to leakage in dynamic RAMs.
Moreover, dynamic circuits have poor input noise margins. If the input rises above
Vt while the gate is in evaluation, the input transistors will turn on weakly and can incorrectly
discharge the output. Both leakage and noise margin problems can be addressed by adding a
keeper circuit.
Figure shows a conventional keeper on a domino buffer. The keeper is a weak
transistor that holds, or staticizes, the output at the correct level when it would otherwise
float. When the dynamic node X is high, the output Y is low and the keeper is ON to prevent
X from floating. When X falls, the keeper initially opposes the transition so it must be much
weaker than the pulldown network. Eventually Y rises, turning the keeper OFF and avoiding
static power dissipation.
The keeper must be strong (i.e., wide) enough to compensate for any leakage current
drawn when the output is floating and the pulldown stack is OFF. Strong keepers also
improve the noise margin because when the inputs are slightly above Vt the keeper can
supply enough current to hold the output high.
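The leakage side of keeper sizing can be estimated with a constant-current approximation: a floating node of capacitance C droops by ΔV in t = C·ΔV / (I_leak - I_keeper), and it holds indefinitely once the keeper can supply at least the leakage current. The node capacitance, allowed droop, and currents below are hypothetical, and the model ignores the other constraint from the text, that the keeper must stay much weaker than the pulldown network.

```python
def hold_time_ns(c_node_ff: float, dv_max_mv: float,
                 i_leak_na: float, i_keeper_na: float = 0.0) -> float:
    """Time (ns) before a floating dynamic node droops by dv_max, assuming
    constant currents: t = C * dV / (I_leak - I_keeper)."""
    i_net_na = i_leak_na - i_keeper_na
    if i_net_na <= 0:
        return float("inf")          # the keeper replaces all the leakage
    # fF * mV / nA = 1e-15 * 1e-3 / 1e-9 s = 1 ns
    return c_node_ff * dv_max_mv / i_net_na

# Hypothetical values: 10 fF node, 300 mV allowed droop, 100 nA leakage.
print(hold_time_ns(10, 300, 100))        # 30.0 ns without a keeper
print(hold_time_ns(10, 300, 100, 150))   # inf: keeper exceeds the leakage
```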
NP and Zipper Domino
Another variation on domino is shown in Figure. The HI-skew inverting static gates
are replaced with predischarged dynamic gates using pMOS logic.
For example, a footed dynamic p-logic NAND gate is shown in Figure. When Φ is 0,
the first and third stages precharge high while the second stage predischarges low. When Φ
rises, all the stages evaluate. Domino connections are possible, as shown in Figure. The
design style is called NP Domino or NORA Domino (NORA).
NORA has two major drawbacks. The logical effort of footed p-logic gates is
generally worse than that of HI-skew gates (e.g., 2 vs. 3/2 for NOR2 and 4/3 vs. 1 for
NAND2). Secondly, NORA is extremely susceptible to noise.
In an ordinary dynamic gate, the input has a low noise margin (about Vt), but is
strongly driven by a static CMOS gate.
The floating dynamic output is more prone to noise from coupling and charge sharing,
but drives another static CMOS gate with a larger noise margin. In NORA, however, the
sensitive dynamic inputs are driven by noise prone dynamic outputs. Given these drawbacks
and the extra clock phase required, there is little reason to use NORA.
Zipper domino is a closely related technique that leaves the precharge transistors slightly ON
during evaluation by using precharge clocks that swing between 0 and VDD – |Vtp| for the
pMOS precharge and Vtn and VDD for the nMOS precharge. This plays much the same role
as a keeper.
THE STATIC AND DYNAMIC POWER DISSIPATION IN CMOS CIRCUITS
Static CMOS gates are very power-efficient because they dissipate nearly zero power
while idle. For much of the history of CMOS design, power was a secondary consideration
behind speed and area for many chips. As transistor counts and clock frequencies have
increased, power consumption has skyrocketed and now is a primary design constraint.
The instantaneous power P(t) drawn from the power supply is proportional to the
supply current iDD(t) and the supply voltage VDD:
$$P(t) = i_{DD}(t)\, V_{DD}$$
The energy consumed over some time interval T is the integral of the instantaneous power:
$$E = \int_0^T i_{DD}(t)\, V_{DD}\, dt$$
The average power over this interval is
$$P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\, V_{DD}\, dt$$
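Given a sampled supply-current waveform (for example, exported from a circuit simulation), the energy E, the integral of iDD(t)·VDD over the interval T, and the average power Pavg = E/T can be computed numerically. This sketch uses the trapezoidal rule; the triangular current pulse in the example is hypothetical.

```python
def energy_and_avg_power(t_s: list[float], i_dd_a: list[float],
                         vdd: float) -> tuple[float, float]:
    """E = integral of i_DD(t) * V_DD dt (trapezoidal rule); P_avg = E / T."""
    energy = 0.0
    for k in range(1, len(t_s)):
        dt = t_s[k] - t_s[k - 1]
        energy += 0.5 * (i_dd_a[k] + i_dd_a[k - 1]) * vdd * dt
    period = t_s[-1] - t_s[0]
    return energy, energy / period

# Hypothetical 1 mA triangular current pulse lasting 2 ns in a 10 ns interval, VDD = 1 V.
t = [0.0, 1e-9, 2e-9, 10e-9]
i = [0.0, 1e-3, 0.0, 0.0]
e, p_avg = energy_and_avg_power(t, i, vdd=1.0)
print(f"E = {e:.3e} J, Pavg = {p_avg:.3e} W")   # about 1e-12 J and 1e-4 W
```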
Power dissipation in CMOS circuits comes from two components:
Static dissipation due to
  subthreshold conduction through OFF transistors
  tunneling current through gate oxide
  leakage through reverse-biased diodes
  contention current in ratioed circuits
Dynamic dissipation due to
  charging and discharging of load capacitances
  "short-circuit" current while both pMOS and nMOS networks are partially ON
Ptotal = Pstatic + Pdynamic
Static Dissipation
Considering the static CMOS inverter shown in Figure, if the input = '0', the
associated nMOS transistor is OFF and the pMOS transistor is ON. The output voltage is
VDD, or logic '1'.
When the input = '1', the associated nMOS transistor is ON and the pMOS transistor is
OFF. The output voltage is 0 volts (GND). Note that one of the transistors is always OFF
when the gate is in either of these logic states.
Ideally, no current flows through the OFF transistor so the power dissipation is zero
when the circuit is quiescent, i.e., when no transistors are switching. Zero quiescent power
dissipation is a principal advantage of CMOS over competing transistor technologies.
However, secondary effects including subthreshold conduction, tunneling, and
leakage lead to small amounts of static current flowing through the OFF transistor. Assuming
the leakage current is constant so instantaneous and average power are the same, the static
power dissipation is the product of total leakage current and the supply voltage.
Pstatic = Istatic VDD
OFF transistors still conduct a small amount of subthreshold current. Because subthreshold current
depends exponentially on threshold voltage, it has increased dramatically as threshold
voltages have scaled down. There is also some small static dissipation due to reverse-biased
diode leakage between diffusion regions, wells, and the substrate. In modern processes, diode
leakage is generally much smaller than the subthreshold or gate leakage and may be
neglected.
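To make the exponential dependence concrete, the sketch below assumes a simple model Isub ∝ 10^(-Vt/S), where S is the subthreshold slope in volts per decade. The value S = 100 mV/decade and the prefactor I0 are assumptions for illustration, not process data; under them, lowering Vt by 100 mV raises leakage, and hence Pstatic = Istatic VDD, roughly tenfold.

```python
# Sketch: sensitivity of static power to threshold voltage.
# Assumed model: I_sub = I0 * 10**(-Vt / S); I0 and S are illustrative values.

def subthreshold_leakage(vt, i0, s=0.1):
    """Leakage current (A) for threshold vt (V), slope s (V/decade)."""
    return i0 * 10 ** (-vt / s)

def static_power(i_static, vdd):
    """Pstatic = Istatic * VDD."""
    return i_static * vdd

I0 = 1e-6    # hypothetical prefactor (A)
VDD = 1.0    # supply voltage (V)

i_high_vt = subthreshold_leakage(0.4, I0)
i_low_vt = subthreshold_leakage(0.3, I0)
ratio = i_low_vt / i_high_vt            # about 10 for a 100 mV drop in Vt
p_low_vt = static_power(i_low_vt, VDD)  # about 1 nW per device in this model
```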
Dynamic Dissipation
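Over any given interval of time T, the load capacitance C will be charged and discharged T·fsw times, where fsw is the switching frequency.
Current flows from VDD to the load to charge it. Current then flows from the load to GND
during discharge. In one complete charge/discharge cycle, a total charge of Q = CVDD is
thus transferred from VDD to GND. The average dynamic power dissipation is
$$P_{dynamic} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right]$$
$$P_{dynamic} = C V_{DD}^2 f_{sw}$$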
Because most gates do not switch every clock cycle, it is often more convenient to express
the switching frequency fsw as an activity factor α times the clock frequency f, i.e., fsw = αf.
Now the dynamic power dissipation may be rewritten as
$$P_{dynamic} = \alpha C V_{DD}^2 f$$
A clock has an activity factor of α = 1 because it rises and falls every cycle. Most data
has a maximum activity factor of 0.5 because it transitions at most once each cycle, so a full rise-and-fall takes at least two cycles.
Static CMOS logic has been empirically determined to have activity factors closer to
0.1 because some gates maintain one output state more often than another.
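A quick numerical sketch of Pdynamic = αCVDD²f for the three activity factors quoted above. The 10 fF node capacitance, 1.0 V supply and 1 GHz clock are hypothetical values picked for round numbers.

```python
# Sketch: dynamic power of one node for different activity factors.
# C, VDD and F are hypothetical values.

def dynamic_power(alpha, c_load, vdd, f_clk):
    """Pdynamic = alpha * C * VDD**2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * f_clk

C = 10e-15   # 10 fF node
VDD = 1.0    # 1.0 V supply
F = 1e9      # 1 GHz clock

p_clock = dynamic_power(1.0, C, VDD, F)   # clock node, alpha = 1: about 10 uW
p_data = dynamic_power(0.5, C, VDD, F)    # data at its maximum, alpha = 0.5: about 5 uW
p_logic = dynamic_power(0.1, C, VDD, F)   # typical static logic, alpha ~ 0.1: about 1 uW
```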
Because the input rise/fall time is greater than zero, both nMOS and pMOS
transistors will be ON for a short period of time while the input is between Vtn and VDD - |Vtp|.
This results in an additional "short-circuit" current pulse from VDD to GND and typically
increases power dissipation by about 10%.
Methods to reduce dynamic power dissipation
1. Reduce the product of capacitance and its switching frequency.
2. Eliminate logic switching that is not necessary for computation.
3. Reduce the activity factor.
4. Reduce the supply voltage.
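Because VDD enters the dynamic power squared, supply reduction is usually the strongest of these knobs. The hypothetical helper `power_scale_factor` below computes the new-to-old power ratio when α, C, VDD and f are each scaled; it ignores the fact that a lower VDD also slows the gates, which in practice limits how far the supply can be reduced.

```python
# Sketch: relative change in Pdynamic = alpha * C * VDD**2 * f
# when each factor is scaled by the given ratio (new / old).

def power_scale_factor(alpha_ratio=1.0, c_ratio=1.0, vdd_ratio=1.0, f_ratio=1.0):
    """Return P_new / P_old for the given per-factor ratios."""
    return alpha_ratio * c_ratio * vdd_ratio ** 2 * f_ratio

# Supply lowered from 1.2 V to 0.9 V (25 % reduction): power drops to about 56 %
only_vdd = power_scale_factor(vdd_ratio=0.9 / 1.2)

# Same supply change plus halving the activity factor: about 28 % of the original
vdd_and_alpha = power_scale_factor(alpha_ratio=0.5, vdd_ratio=0.9 / 1.2)
```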
Methods to reduce static power dissipation
1. By using multiple threshold voltages: low-Vt transistors on critical paths to maintain
performance, and high-Vt transistors on other paths to reduce leakage.
2. By using two operating modes, active and standby, for each functional block.
3. By adjusting the body bias, i.e., applying FBB (Forward Body Bias) in active
mode to increase performance and RBB (Reverse Body Bias) in standby mode
to reduce leakage.
4. By using sleep transistors to isolate the supply from the block to achieve
significant leakage power savings.
UNIT III: SEQUENTIAL LOGIC CIRCUITS
Static & Dynamic Latches and Registers, Pipelining
In sequential logic circuits, the output not only depends upon the current values of
the inputs, but also upon preceding input values. In other words, a sequential circuit
remembers some of the past history of the system; it has memory.
Figure shows a block diagram of a generic finite state machine (FSM) that consists
of combinational logic and registers, which hold the system state. The system
depicted here belongs to the class of synchronous sequential systems, in which all
registers are under control of a single global clock. The outputs of the FSM are a
function of the current Inputs and the Current State. The Next State is determined
based on the Current State and the current Inputs and is fed to the inputs of
registers.
On the rising edge of the clock, the Next State bits are copied to the outputs of the
registers (after some propagation delay), and a new cycle begins. The register then
ignores changes in the input signals until the next rising edge. In general, registers
can be positive edge-triggered (where the input data is copied on the positive edge
of the clock) or negative edge-triggered (where the input data is copied on the
negative edge, as is indicated by a small circle at the clock input).
Block diagram of a finite state machine using positive edge-triggered registers.
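The FSM structure above can be mimicked behaviorally: a combinational next-state function, an output function of the current inputs and current state, and a state register that is updated once per rising clock edge. The example machine (a detector for the bit pattern 1-0-1) and all function names are hypothetical; the sketch models only cycle-level behavior, not register timing.

```python
# Hypothetical Mealy FSM: output 1 when the last three input bits were 1, 0, 1
# (overlapping matches allowed). States: 0 = nothing seen, 1 = seen "1", 2 = seen "10".

def next_state(state, x):
    """Combinational logic: Next State from Current State and current Input."""
    table = {
        (0, 0): 0, (0, 1): 1,
        (1, 0): 2, (1, 1): 1,
        (2, 0): 0, (2, 1): 1,
    }
    return table[(state, x)]

def output(state, x):
    """Output logic: a function of Current State and current Input."""
    return 1 if (state == 2 and x == 1) else 0

def run_fsm(inputs, reset_state=0):
    """Apply one input per clock cycle; return the output sequence."""
    state = reset_state                  # contents of the state register
    outs = []
    for x in inputs:
        outs.append(output(state, x))
        state = next_state(state, x)     # register copies Next State on the rising edge
    return outs

print(run_fsm([1, 0, 1, 0, 1]))          # [0, 0, 1, 0, 1]
```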
Timing Metrics for Sequential Circuits
There are three important timing parameters associated with a register as illustrated in
Figure.
1. The set-up time (tsu) is the time that the data inputs (D input) must be valid before
the clock transition (that is, the 0-to-1 transition for a positive edge-triggered
register).
2. The hold time (thold) is the time the data input must remain valid after the clock
edge.
3. Assuming that the set-up and hold times are met, the data at the D input is copied to
the Q output after a worst-case propagation delay (with reference to the clock edge)
denoted by tc-q.
Given the timing information for the registers and the combinational logic, some
system-level timing constraints can be derived. Assume that the worst-case propagation
delay of the logic equals tplogic, while its minimum delay (also called the contamination
delay) is tcdlogic. The minimum clock period T required for proper operation of the
sequential circuit is given by
$$T \geq t_{c-q} + t_{plogic} + t_{su}$$
The hold time of the register imposes an extra constraint for proper operation:
$$t_{cdregister} + t_{cdlogic} \geq t_{hold}$$
where tcdregister is the minimum propagation delay (or contamination delay) of the register.
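The set-up constraint T ≥ tc-q + tplogic + tsu and the hold constraint tcdregister + tcdlogic ≥ thold can be checked directly. In this sketch all delays are in picoseconds and the numbers are hypothetical. Note that the hold check does not involve the clock period at all, so a hold violation cannot be fixed by slowing the clock.

```python
# Sketch of the two register timing checks; delays in ps, values hypothetical.

def min_clock_period(t_cq, t_plogic, t_su):
    """Set-up constraint: T >= tc-q + tplogic + tsu."""
    return t_cq + t_plogic + t_su

def hold_ok(t_cd_register, t_cd_logic, t_hold):
    """Hold constraint: tcdregister + tcdlogic >= thold."""
    return t_cd_register + t_cd_logic >= t_hold

t_min = min_clock_period(t_cq=50, t_plogic=400, t_su=30)        # 480 ps
setup_ok_at_500ps = 500 >= t_min                                 # True

hold_ok_typical = hold_ok(t_cd_register=30, t_cd_logic=20, t_hold=40)      # True
# Register-to-register path with no logic in between: hold is violated
hold_ok_direct_path = hold_ok(t_cd_register=30, t_cd_logic=0, t_hold=40)   # False
```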
It is important to minimize the values of the timing parameters associated with the register, as
these directly affect the rate at which a sequential circuit can be clocked. In fact, modern
high-performance systems are characterized by a very-low logic depth, and the register
propagation delay and set-up times account for a significant portion of the clock period.
Classification of Memory Elements
Foreground versus Background Memory
Memory that is embedded into logic is foreground memory (internal memory), and is most
often organized as individual registers or register banks. Large amounts of centralized
memory (memory cores) are referred to as background memory (external memory).
Static versus Dynamic Memory
Static memories preserve the state as long as the power is turned on.
Built using positive feedback or regeneration, where the circuit topology consists of
intentional connections between the output and the input of a combinational circuit.
Static memories are most useful when the register won't be updated for extended
periods of time, e.g., configuration data loaded at power-up time.
This condition also holds for most processors that use conditional clocking (i.e.,
gated clocks) where the clock is turned off for unused modules. In that case, there
are no guarantees on how frequently the registers will be clocked, and static
memories are needed to preserve the state information.
Memories based on positive feedback fall under the class of elements called
multivibrator circuits. The bistable element is its most popular representative, but
other elements such as monostable and astable circuits are also frequently used.
Dynamic memories store state for a short period of time, on the order of
milliseconds. They are based on the principle of temporary charge storage on
parasitic capacitors associated with MOS devices. The stored charge has to be refreshed
periodically to compensate for charge leakage.
Dynamic memories tend to be simpler, resulting in significantly higher performance
and lower power dissipation. They are most useful in datapath circuits that require
high performance levels and are periodically clocked.
Latches versus Registers
A latch is an essential component in the construction of an edge-triggered register. It is a
level-sensitive circuit that passes the D input to the Q output when the clock signal is high.
This latch is said to be in transparent mode. When the clock is low, the input data sampled
on the falling edge of the clock is held stable at the output for the entire phase, and the latch
is in hold mode. The inputs must be stable for a short period around the falling edge of the
clock to meet set-up and hold requirements. A latch operating under the above conditions is
a positive latch. Similarly, a negative latch passes the D input to the Q output when the
clock signal is low.
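A behavioral sketch of the positive and negative latches just described: the latch copies D to Q while transparent and otherwise holds. Treating signals as lists of sampled logic values is an assumption of this sketch, and the function name `latch` is hypothetical; it does not model set-up or hold behavior around the clock edge.

```python
# Behavioral model of a level-sensitive latch on sampled waveforms.

def latch(d_seq, clk_seq, positive=True, q_init=0):
    """Positive latch: transparent while clk == 1; negative latch: while clk == 0."""
    q = q_init
    out = []
    for d, clk in zip(d_seq, clk_seq):
        transparent = (clk == 1) if positive else (clk == 0)
        if transparent:
            q = d            # transparent mode: Q follows D
        out.append(q)        # hold mode: Q keeps its last value
    return out

clk = [1, 1, 0, 0, 1, 1]
d = [0, 1, 1, 0, 0, 1]
print(latch(d, clk, positive=True))    # [0, 1, 1, 1, 0, 1]
print(latch(d, clk, positive=False))   # [0, 0, 1, 0, 0, 0]
```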
Timing of positive and negative latches
Static Latches and Registers
The Bistability Principle
Static memories use positive feedback to create a bistable circuit, that is, a circuit having two
stable states that represent 0 and 1. The basic idea is shown in Figure a, which shows two
inverters connected in cascade along with a voltage-transfer characteristic typical of such a
circuit. Assume now that the output of the second inverter Vo2 is connected to the input of
the first Vi1, as shown by the dotted lines in Figure a.
The resulting circuit has only three possible operation points (A, B, and C). Under the
condition that the gain of the inverter in the transient region is larger than 1, only A and B
are stable operation points, and C is a metastable operation point. Suppose that the cross-
coupled inverter pair is biased at point C. A small deviation from this bias point, possibly
caused by noise, is amplified and regenerated around the circuit loop. This is a
consequence of the gain around the loop being larger than 1.
On the other hand, A and B are stable operation points. At these points, the loop gain is
much smaller than unity. Hence the cross-coupling of two inverters results in a
bistable circuit, which serves as a memory, storing either a 1 or a 0 (corresponding to
positions A and B). In order to change the stored value, we must be able to bring the circuit
from state A to B and vice-versa. This is generally done by applying a trigger pulse at Vi1
or Vi2. The width of the trigger pulse need be only a little larger than the total propagation
delay around the circuit loop, which is twice the average propagation delay of the
inverters.
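The regeneration argument can be illustrated numerically. The sketch assumes an idealized, symmetric inverter transfer characteristic of tanh shape with VDD = 1 V and a slope magnitude of 5 at the switching point, so the loop gain there exceeds 1; this is a modeling assumption, not a transistor-level model. Starting exactly at the metastable point C, the loop stays put, while a 10 mV disturbance is regenerated to one of the stable points A or B.

```python
import math

VDD = 1.0
GAIN = 10.0   # assumed steepness: |dVout/dVin| at VDD/2 is GAIN * VDD / 2 = 5

def inverter(v_in):
    """Idealized inverter VTC (assumed tanh shape), switching threshold VDD/2."""
    return 0.5 * VDD * (1.0 - math.tanh(GAIN * (v_in - 0.5 * VDD)))

def settle(v_start, trips=20):
    """Propagate a voltage around the two-inverter loop `trips` times."""
    v = v_start
    for _ in range(trips):
        v = inverter(inverter(v))
    return v

print(settle(0.5))    # 0.5: sits at the metastable point C
print(settle(0.51))   # close to VDD: regenerated to a stable point
print(settle(0.49))   # close to 0 V: regenerated to the other stable point
```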
SR Flip-Flops
The SR (set-reset) flip-flop circuit is similar to the cross-coupled inverter pair, with NOR
gates replacing the inverters. The second input of each NOR gate is connected to the trigger
inputs (S and R), which make it possible to force the outputs Q and Q' to a given state. These
outputs are complementary (except for the SR = 11 state). When both S and R are 0, the
flip-flop is in a quiescent state and both outputs retain their value. If a positive (or 1) pulse
is applied to the S input, the Q output is forced into the 1 state (with Q' going to 0). Vice
versa, a 1 pulse on R resets the flip-flop and the Q output goes to 0.
When both S and R are high, both Q and Q' are forced to zero. This state is forbidden. An
additional problem with this condition is that when the input triggers return to their zero
levels, the resulting state of the latch is unpredictable and depends on whatever input is last to
go low.
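A gate-level sketch of the cross-coupled NOR pair, assuming both gates have the same unit delay and update together. Under that idealized symmetric-delay assumption, releasing S = R = 1 to S = R = 0 simultaneously makes the pair oscillate, which mirrors the unpredictable outcome described above; in a real circuit, small delay mismatches decide the final state. The function names are hypothetical.

```python
# Cross-coupled NOR SR latch with equal, simultaneous gate updates.

def nor(a, b):
    return 0 if (a or b) else 1

def sr_latch(s, r, q, qb, max_steps=10):
    """Iterate Q = NOR(R, Q'), Q' = NOR(S, Q) from state (q, qb).
    Return the settled (q, qb), or None if it never settles."""
    for _ in range(max_steps):
        q_new, qb_new = nor(r, qb), nor(s, q)
        if (q_new, qb_new) == (q, qb):
            return q, qb
        q, qb = q_new, qb_new
    return None

print(sr_latch(1, 0, 0, 1))   # set:       (1, 0)
print(sr_latch(0, 1, 1, 0))   # reset:     (0, 1)
print(sr_latch(0, 0, 1, 0))   # hold:      (1, 0)
print(sr_latch(1, 1, 1, 0))   # forbidden: (0, 0)
print(sr_latch(0, 0, 0, 0))   # both inputs released together: None (oscillates)
```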
CMOS clocked SR flip-flop
One possible realization of a clocked SR flip-flop (a level-sensitive positive latch) is
shown in Figure. It consists of a cross-coupled inverter pair, plus 4 extra transistors to drive
the flip-flop from one state to another and to provide clocked operation.
Multiplexer-Based Latches
Advantage: the sizing of devices only affects performance and is not critical to the
functionality. For a negative latch, when the clock signal is low, the input 0 of the
multiplexer is selected, and the D input is passed to the output. When the clock signal is
high, the input 1 of the multiplexer, which connects to the output of the latch, is selected.
The feedback holds the output stable while the clock signal is high.
A transistor level implementation of a positive latch based on multiplexers is shown in
Figure.
When CLK is high, the bottom transmission gate is on and the latch is transparent -
that is, the D input is copied to the Q output.
The feedback does not have to be overridden to write the memory, and hence sizing of
transistors is not critical for realizing correct functionality. The number of transistors
that the clock touches is important, since the clock has an activity factor of 1.
This circuit is not efficient by that metric, as it presents a load of 4 transistors to the CLK signal.
The clock load can be reduced to 2 transistors by using NMOS-only pass transistors, as shown in
Figure. Advantages:
reduced clock load of only two NMOS devices.
Simple circuit.
Disadvantage:
Results in passing a degraded high voltage of VDD - VTn to the input of the first inverter.
This impacts both noise margin and the switching performance, especially in the case of
low values of VDD and high values of VTn. It also causes static power dissipation in the first
inverter. Since the maximum input voltage to the inverter equals VDD - VTn, the PMOS
device of the inverter is never fully turned off, resulting in a static current flow.
Master-Slave Edge-Triggered Register
The register consists of a negative latch (master stage) cascaded with a positive
latch (slave stage).
On the low phase of the clock, the master stage is transparent, and the D input is passed
to the master stage output, QM. During this period, the slave stage is in the hold mode,
keeping its previous value using feedback.
On the rising edge of the clock, the master stage stops sampling the input, and the slave
stage starts sampling. During the high phase of the clock, the slave stage samples the
output of the master stage (QM), while the master stage remains in a hold mode. Since
QM is constant during the high phase of the clock, the output Q makes only one
transition per cycle.
The value of Q is the value of D right before the rising edge of the clock, achieving the
positive edge-triggered effect. A negative edge-triggered register can be constructed
using the same principle by simply switching the order of the positive and negative
latch (that is, placing the positive latch first).
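Cascading the two latch behaviors gives the edge-triggered effect. In this sampled sketch (an assumption: each list entry is one time step, and `clk_seq` gives the clock level at that step; the function name is hypothetical), Q changes only when the clock goes high, and it takes the value D had during the preceding low phase; a change of D while the clock is high is ignored.

```python
# Master-slave positive edge-triggered register built from two latch behaviors.

def master_slave(d_seq, clk_seq, q_init=0):
    qm = q_init   # master stage (negative latch) output QM
    q = q_init    # slave stage (positive latch) output Q
    out = []
    for d, clk in zip(d_seq, clk_seq):
        if clk == 0:
            qm = d        # master transparent, slave holds Q
        else:
            q = qm        # master holds QM, slave passes QM to Q
        out.append(q)
    return out

clk = [0, 0, 1, 1, 0, 0, 1, 1]
d = [1, 0, 0, 1, 1, 1, 0, 0]   # D = 1 at step 3 (clock high) is ignored
print(master_slave(d, clk))    # [0, 0, 0, 0, 0, 0, 1, 1]
```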
A complete transistor-level implementation of the master-slave positive edge-triggered
register is shown in Figure below.
A drawback of the transmission-gate register is the high capacitive load presented to the clock
signal. The clock load per register is important, since it directly impacts the power
dissipation of the clock network. Each register has a clock load of 8 transistors. One
approach to reduce the clock load at the cost of robustness is to make the circuit ratioed.
Figure below shows that the feedback transmission gate can be eliminated by directly cross-coupling
the inverters.
Another problem with this scheme is reverse conduction, that is, the second stage can
affect the state of the first latch. When the slave stage is on (Figure above), it is possible for
the combination of T2 and I4 to influence the data stored in the I1-I2 latch. As long as I4 is a
weak device, this is fortunately not a major problem.
Non-ideal clock signals
Variations can exist in the wires used to route the two clock signals, or the load
capacitances can vary based on data stored in the connecting latches. This effect, known as
clock skew, is a major problem and causes the two clock signals to overlap, as shown in
Figure 7.20b. Clock overlap can cause two types of failures, as illustrated for the NMOS-only
negative master-slave register.
When the clock goes high, the slave stage should stop sampling the master stage
output and go into a hold mode. However, since CLK and CLK bar are both high for
a short period of time (the overlap period), both sampling pass transistors conduct
and there is a direct path from the D input to the Q output. As a result, data at the
output can change on the rising edge of the clock. This is a race condition in which
the value of the output Q is a function of whether the input D arrives at node X
before or after the falling edge of CLK. If node X is sampled in the metastable state,
the output will switch to a value determined by noise in the system.
The primary advantage of the multiplexer-based register is that the feedback loop is
open during the sampling period, and therefore sizing of devices is not critical to
functionality. However, if there is clock overlap between CLK bar and CLK, node A
can be driven by both D and B, resulting in an undefined state.
Those problems can be avoided by using two non-overlapping clocks PHI1 and PHI2
instead, and by keeping the non-overlap time tnon_overlap between the clocks large
enough such that no overlap occurs even in the presence of clock-routing delays.
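One way to reason about the non-overlap requirement is with sampled waveforms. The sketch below generates two-phase clocks PHI1 and PHI2 separated by `gap` low-low samples (standing in for tnon_overlap), then delays PHI2 by a number of samples to mimic a clock-routing delay. The helper names are hypothetical; with a gap of 2 samples, a delay of up to 2 samples is tolerated, while 3 samples produces overlap.

```python
# Two-phase non-overlapping clocks on a sampled time axis.

def two_phase_clocks(n_cycles, high, gap):
    """Return (phi1, phi2): each phase is high for `high` samples and is
    separated from the other phase by `gap` samples where both are low."""
    phi1, phi2 = [], []
    for _ in range(n_cycles):
        for seg_len, p1, p2 in ((high, 1, 0), (gap, 0, 0), (high, 0, 1), (gap, 0, 0)):
            phi1.extend([p1] * seg_len)
            phi2.extend([p2] * seg_len)
    return phi1, phi2

def has_overlap(phi1, phi2, skew=0):
    """True if phi1 and a copy of phi2 delayed by `skew` samples are ever both high."""
    delayed = [0] * skew + phi2[: len(phi2) - skew]
    return any(a and b for a, b in zip(phi1, delayed))

p1, p2 = two_phase_clocks(n_cycles=2, high=4, gap=2)
print(has_overlap(p1, p2, skew=0))   # False
print(has_overlap(p1, p2, skew=2))   # False: delay absorbed by the non-overlap gap
print(has_overlap(p1, p2, skew=3))   # True: delay exceeds the gap
```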
Dynamic Latches and Registers
Dynamic latches and registers form a class of circuits based on temporary storage of charge on parasitic capacitors. Charge
stored on a capacitor can be used to represent a logic signal: the absence of charge denotes
a 0, while its presence stands for a stored 1. Because the stored charge leaks away, a periodic refresh of its value is necessary;
hence the name dynamic storage.
Dynamic Transmission-Gate Edge-triggered Registers:
A fully dynamic positive edge-triggered register based on the master-slave concept is
shown in Figure below.
When CLK = 0, the input data is sampled on storage node 1, which has an equivalent
capacitance of C1 consisting of the gate capacitance of I1, the junction capacitance
of T1, and the overlap gate capacitance of T1.
During this period, the slave stage is in a hold mode, with node 2 in a high-
impedance (floating) state.
On the rising edge of clock, the transmission gate T2 turns on, and the value sampled
on node 1 right before the rising edge propagates to the output Q.
Node 2 now stores the inverted version of node 1.
The circuit is very efficient, requiring only 8 transistors. The sampling switches
can be implemented using NMOS-only pass transistors, giving a 6-transistor implementation.
The set-up time of this circuit is simply the delay of the transmission gate, and corresponds
to the time it takes node 1 to sample the D input. The hold time is approximately zero, since
the transmission gate is turned off on the clock edge and further input changes are ignored.
The propagation delay (tc-q) is equal to two inverter delays plus the delay of the
transmission gate T2.
Race Condition and Preventive Measures
Clock overlap is an important concern for this dynamic register. Consider the clock
waveforms shown in Figure below. During the 0-0 overlap period, the PMOS of T1 and
the PMOS of T2 are simultaneously on, creating a direct path for data to flow from the D
input of the register to the Q output. As a result, data at the output can change on the
falling edge of the clock, which is undesired for a positive edge-triggered register. This is
known as a race condition, in which the value of the output Q is a function of whether the
input D arrives at node X before or after the rising edge of CLK. The output Q can change
on the falling edge if the overlap period is large, obviously an undesirable effect for a
positive edge-triggered register. The same is true for the 1-1 overlap region, where an
input-output path exists through the NMOS of T1 and the NMOS of T2. The latter case is
taken care of by enforcing a hold time constraint. That is, the data must be stable during
the high-high overlap period. The former situation (0-0 overlap) can be addressed by
making sure that there is enough delay between the D input and node 2, ensuring that new
data sampled by the master stage does not propagate through to the slave stage. Generally
the built-in single inverter delay should be sufficient, and the overlap period constraint is
given as:
$$t_{overlap0-0} < t_{T1} + t_{I1} + t_{T2}$$
Similarly, the constraint for the 1-1 overlap is given as:
$$t_{overlap1-1} < t_{hold}$$
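The timing relations for this dynamic register can be collected in a few lines: tc-q equals two inverter delays plus the delay of T2, the 0-0 overlap must be shorter than tT1 + tI1 + tT2, and the 1-1 overlap must be shorter than thold. The helper names and delay values (in picoseconds) below are hypothetical; t_hold stands for the hold time enforced on D, i.e., how long D must stay stable after the clock edge.

```python
# Sketch of the dynamic transmission-gate register timing relations (ps).

def t_clk_to_q(t_inv, t_T2):
    """tc-q = two inverter delays plus the delay of transmission gate T2."""
    return 2 * t_inv + t_T2

def overlap_ok(t_overlap_00, t_overlap_11, t_T1, t_I1, t_T2, t_hold):
    """Return (0-0 overlap constraint met, 1-1 overlap constraint met)."""
    return (t_overlap_00 < t_T1 + t_I1 + t_T2,
            t_overlap_11 < t_hold)

tcq = t_clk_to_q(t_inv=20, t_T2=15)                                # 55 ps
small = overlap_ok(30, 20, t_T1=15, t_I1=20, t_T2=15, t_hold=25)   # (True, True)
large = overlap_ok(60, 30, t_T1=15, t_I1=20, t_T2=15, t_hold=25)   # (False, False)
```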
Impact of overlapping clocks.
C²MOS: A Clock-Skew Insensitive Approach (Method to Prevent Race Condition)
Figure below shows an ingenious positive edge-triggered register, based on a master-slave
concept insensitive to clock overlap. This circuit is called the C²MOS (Clocked CMOS)
register, and operates in two phases.
1. CLK = 0 (CLK bar = 1): The first tri-state driver is turned on, and the master stage
acts as an inverter sampling the inverted version of D on the internal node X. The
master stage is in the evaluation mode. Meanwhile, the slave section is in a
high-impedance mode, or in a hold mode. Both transistors M7 and M8 are off, decoupling
the output from the input. The output Q retains its previous value stored on the
output capacitor CL2.
2. The roles are reversed when CLK = 1: the master stage section is in hold mode
(M3-M4 off), while the second section evaluates (M7-M8 on). The value stored on
CL1 propagates to the output node through the slave stage, which acts as an inverter.
In the (0-0) overlap case, both PMOS devices are on during this period. New data is
sampled on node X through the series PMOS devices M2-M4, and node X can make a 0-to-1
transition during the overlap period. However, this data cannot propagate to the output
since the NMOS device M7 is turned off. At the end of the overlap period, CLK bar = 1 and both
M7 and M8 turn off, putting the slave stage in the hold mode.
Now consider the (1-1) overlap case, where both NMOS devices M3 and M7 are turned on. If the D input
changes during the overlap period, node X can make a 1-to-0 transition, but it cannot
propagate to the output. However, as soon as the overlap period is over, the PMOS M8 is
turned on and the 0 propagates to the output. This effect is not desirable.
The problem is fixed by imposing a hold time constraint on the input data, D, or, in other
words, the data D should be stable during the overlap period.
Pipelining: An approach to optimize sequential circuits
Pipelining is a popular design technique often used to accelerate the operation of the
datapaths in digital processors. The idea is easily explained with the example of
Figure (a). The goal of the presented circuit is to compute log(|a + b|), where both a and b
represent streams of numbers, that is, the computation must be performed on a large set of
input values.
The minimal clock period Tmin necessary to ensure correct evaluation is given as:
$$T_{min} = t_{c-q} + t_{pd,logic} + t_{su}$$
where tc-q and tsu are the propagation delay and the set-up time of the register, respectively.
We assume that the registers are edge-triggered D registers. The term tpd,logic stands for
the worst-case delay path through the combinational network, which consists of the adder,
absolute value, and logarithm functions. In conventional systems, the latter delay is
generally much larger than the delays associated with the registers and dominates the
circuit performance. Assume that each logic module has an equal propagation delay. We
note that each logic module is then active for only 1/3 of the clock period (if the delay of
the register is ignored). For example, the adder unit is active during the first third of the
period and remains idle (that is, it does no useful computation) during the other 2/3 of
the period.
Pipelining is a technique to improve the resource utilization, and increase the functional
throughput. Assume that we introduce registers between the logic blocks, as shown in
Figure b. This causes the computation for one set of input data to spread over a number of
clock periods, as shown in Table. The advantage of pipelined operation becomes apparent
when examining the minimum clock period of the modified circuit. The combinational
circuit block has been partitioned into three sections, each of which has a smaller
propagation delay than the original function. This effectively reduces the value of the
minimum allowable clock period:
$$T_{min,pipe} = t_{c-q} + \max(t_{pd,add},\, t_{pd,abs},\, t_{pd,log}) + t_{su}$$
Suppose that all logic blocks have approximately the same propagation delay, and that the
register overhead is small with respect to the logic delays. The pipelined network
outperforms the original circuit by a factor of three under these assumptions, or
Tmin,pipe = Tmin/3. The increased performance comes at the relatively small cost of two
additional registers, and an increased latency.
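The pipelining gain can be reproduced with a small calculation. The sketch assumes three equal 300 ps logic stages (adder, absolute value, logarithm) and 10 ps for both tc-q and tsu, all hypothetical values. The speed-up comes out slightly below 3 because the register overhead is paid in every stage, and the latency of one result grows to three clock periods.

```python
# Sketch: unpipelined vs. pipelined minimum clock period (ps).

def t_min_single(t_cq, t_su, stage_delays):
    """Unpipelined: all logic sits between one pair of registers."""
    return t_cq + sum(stage_delays) + t_su

def t_min_pipe(t_cq, t_su, stage_delays):
    """Pipelined: the slowest stage sets the clock period."""
    return t_cq + max(stage_delays) + t_su

delays = [300, 300, 300]                    # hypothetical adder, abs, log delays
T_single = t_min_single(10, 10, delays)     # 920 ps
T_pipe = t_min_pipe(10, 10, delays)         # 320 ps
speedup = T_single / T_pipe                 # 2.875, approaching 3 as overhead shrinks
latency_pipe = len(delays) * T_pipe         # 960 ps: three clock periods per result
```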
Latch- vs. Register-Based Pipelines
Consider the pipelined circuit of Figure below. The pipeline system is implemented based
on pass-transistor-based positive and negative latches instead of edge triggered registers.
Latch-based systems give significantly more flexibility in implementing a pipelined
system, and often offer higher performance. When the clocks CLK and CLK bar are
non-overlapping, correct pipeline operation is obtained. Input data is sampled on C1 at the
negative edge of CLK and the computation of logic block F starts; the result of the logic
block F is stored on C2 on the falling edge of CLK bar, and the computation of logic block G
starts. The non-overlapping of the clocks ensures correct operation. The value stored on C2
at the end of the CLK low phase is the result of passing the previous input (stored on the
falling edge of CLK on C1) through the logic function F.
NORA-CMOS: A Logic Style for Pipelined Structures
The latch-based pipeline circuit can also be implemented using C²MOS latches, as shown
in Figure below. This topology has one additional, important property: a C²MOS-based
pipelined circuit is race-free as long as all the logic functions F between the latches are
non-inverting.
The reasoning for the above argument is similar to the argument made in the construction
of a C²MOS register. During a (0-0) overlap between CLK and CLK bar, all C²MOS latches
simplify to pure pull-up networks (see Figure 7.27).
The only way a signal can race from stage to stage under this condition is when the logic
function F is inverting, as illustrated in Figure above, where F is replaced by a single,
static CMOS inverter. Similar considerations are valid for the (1-1) overlap.
Sources of Clock Skew and Jitter
A perfect clock is defined as a perfectly periodic signal that is simultaneously triggered at
various memory elements on the chip. However, due to a variety of process and
environmental variations, clocks are not ideal. To illustrate the sources of skew and jitter,
consider the simplistic view of clock generation and distribution as shown in Figure below.
Typically, a high frequency clock is either provided from off chip or generated on-chip.
From a central point, the clock is distributed using multiple matched paths to low-level
memory elements (registers). Here two paths are shown. The clock paths include wiring and
the associated distributed buffers required to drive interconnects and loads. A key point to
realize in clock distribution is that the absolute delay through a clock distribution path is
not important; what matters is the relative arrival time between the outputs of each path at the
register points.
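The point that only relative arrival times matter can be shown with a toy clock tree. The segment delays (buffer plus wire, in picoseconds) and the helper names below are hypothetical; adding the same extra delay to both paths changes the absolute delay but leaves the skew unchanged.

```python
# Sketch: clock skew as the difference of arrival times along two paths (ps).

def arrival_times(paths):
    """Total clock delay from the source to each register."""
    return {reg: sum(segments) for reg, segments in paths.items()}

def skew(paths, reg_a, reg_b):
    """Skew between two registers: difference of their clock arrival times."""
    t = arrival_times(paths)
    return t[reg_b] - t[reg_a]

paths = {"R1": [100, 50, 30], "R2": [100, 45, 40]}
s = skew(paths, "R1", "R2")                          # 185 - 180 = 5 ps

# Same extra 200 ps on both paths: absolute delay grows, skew does not
slower = {reg: segs + [200] for reg, segs in paths.items()}
s_slower = skew(slower, "R1", "R2")                  # still 5 ps
```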
The sources of clock uncertainty can be classified in several ways. Systematic errors are
nominally identical from chip to chip, and are typically predictable (e.g., variation in total
load capacitance of each clock path). In principle, such errors can be modeled and
corrected at design time given sufficiently good models and simulators. Random errors are
due to manufacturing variations (e.g., dopant fluctuations that result in threshold
variations) that are difficult to model and eliminate. Mismatch may also be characterized as
static or time-varying. Below, the various sources of skew and jitter, introduced in Figure
10.14, are described in detail.
Clock-Signal Generation (1)
The generation of the clock signal itself causes jitter. A typical on-chip clock
generator takes a low-frequency reference clock signal, and produces a high-
frequency global reference for the processor. The core of such a generator is a
Voltage-Controlled Oscillator (VCO). A key problem is coupling from the surrounding
noisy digital circuitry through the substrate. These noise sources cause temporal
variations of the clock signal that propagate unfiltered through the clock drivers to
the flip-flops.
Manufacturing Device Variations (2)
Distributed buffers are integral components of the clock distribution networks, as
they are required to drive both the register loads as well as the global and local
interconnects. The matching of devices in the buffers along multiple clock paths is
critical to minimizing timing uncertainty. Device parameters in the buffers vary
along different paths, resulting in static skew. There are many sources of variations
including oxide variations (that affects the gain and threshold), dopant variations,
and lateral dimension (width and length) variations.
Interconnect Variations (3)
Vertical and lateral dimension variations cause the interconnect capacitance and
resistance to vary across a chip. Since this variation is static, it causes skew between
different paths. One important source of interconnect variation is the Inter-level
Dielectric (ILD) thickness variations. Other interconnect variations include deviation
in the width of the wires and line spacing. These result from photolithography and
etch dependencies.
Environmental Variations (4 and 5)
The two major sources are temperature and power supply. Temperature gradients
across the chip are a result of variations in power dissipation across the die (chip). This
is an issue with clock gating, where some parts of the chip may be idle while other
parts of the chip might be active. Since the device parameters (such as threshold,
mobility, etc.) depend strongly on temperature, buffer delay for a clock distribution
network along one path can differ drastically from that along another path. The delay through
buffers is a very strong function of power supply as it directly affects the drive of the
transistors. As with temperature, the power supply voltage is a strong function of the
switching activity. Power supply variations can be classified into static (or slow) and
high frequency variations. Static power supply variations may result from fixed
currents drawn from various modules, while high-frequency variations result from
instantaneous IR drops along the power grid due to fluctuations in switching activity.
Capacitive Coupling (6 and 7)
The variation in capacitive load also contributes to timing uncertainty. There are two
major sources of capacitive load variations: coupling between the clock lines and
adjacent signal wires and variation in gate capacitance. Any coupling between the
clock wire and adjacent signal wires results in timing uncertainty, leading to clock jitter.
Another major source of clock uncertainty is variation in the gate capacitance related
to the sequential elements. The load capacitance is highly non-linear and depends on
the applied voltage.
Timing Issues in Digital Circuits, Clock Distribution Techniques, Synchronous and
Asynchronous Design
All sequential circuits have one property in common: a well-defined ordering of the
switching events must be imposed if the circuit is to operate correctly. If this were not the
case, wrong data might be written into the memory elements, resulting in a functional
failure. The synchronous system approach, in which all memory elements in the system are
simultaneously updated using a globally distributed periodic synchronization signal (that
is, a global clock signal), represents an effective and popular way to enforce this ordering.
Functionality is ensured by imposing some strict constraints on the generation of the clock
signals and their distribution to the memory elements distributed over the chip; non-
compliance often leads to malfunction.
We analyze the impact of spatial variations of the clock signal, called clock skew, and
temporal variations of the clock signal, called clock jitter, and introduce techniques to cope
with them. These variations fundamentally limit the performance that can be achieved using a
conventional design methodology.
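Skew and jitter enter the register-to-register timing budget directly. The sketch below encodes the commonly used first-order setup and hold constraints for one launching and one capturing register. The function names and the sign convention (positive skew means the capturing register's clock arrives later than the launching register's) are assumptions of this sketch, and all times must be in the same unit (for example, ps).

```python
# Hypothetical first-order timing checks; names and conventions are assumptions.


def min_clock_period(t_cq, t_logic, t_setup, skew=0.0, jitter=0.0):
    """Smallest T satisfying T + skew - 2*jitter >= t_cq + t_logic + t_setup.

    skew   > 0: the capturing clock edge arrives later than the launching edge.
    jitter: worst-case deviation of a single clock edge; the launch and capture
            edges may move in opposite directions, hence the factor of 2.
    """
    return t_cq + t_logic + t_setup - skew + 2.0 * jitter


def hold_ok(t_cq_min, t_logic_min, t_hold, skew=0.0, jitter=0.0):
    """Check the hold constraint t_cq,min + t_logic,min - skew - 2*jitter >= t_hold."""
    return t_cq_min + t_logic_min - skew - 2.0 * jitter >= t_hold


# Example in picoseconds.
period = min_clock_period(t_cq=50, t_logic=400, t_setup=30, jitter=10)
print(f"T_min = {period} ps -> f_max = {1e3 / period:.2f} GHz")
```

With t_c-q = 50 ps, t_logic = 400 ps, t_setup = 30 ps, no skew and 10 ps of jitter, the minimum period is 500 ps (2 GHz). Positive skew relaxes the setup constraint but tightens the hold constraint, so skew cannot simply be increased to gain speed.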
At the other end of the design spectrum is an approach called asynchronous design,
which avoids the problem of clock uncertainty altogether by eliminating the need for
globally distributed clocks. After discussing the basics of the asynchronous design approach,
we analyze the associated overhead and identify some practical applications. The important
issue of synchronization, which is required when interfacing different clock domains
or when sampling an asynchronous signal, also deserves some in-depth treatment. Finally,
the fundamentals of on-chip clock generation using feedback are introduced, along with
trends in timing.
Timing Classification of Digital Systems
In digital systems, signals can be classified depending on how they are related to a local
clock. Signals that transition only at predetermined periods in time can be classified as
synchronous, mesochronous, or plesiochronous with respect to a system clock. A signal that
can transition at arbitrary times is considered asynchronous.
Synchronous Interconnect: A signal with exactly the same frequency, and a known fixed
phase offset, with respect to the local clock.
Mesochronous Interconnect: A signal with the same frequency but an unknown
phase offset with respect to the local clock.
Plesiochronous Interconnect: A signal with nominally the same, but slightly
different, frequency as the local clock.
Asynchronous Interconnect: Asynchronous signals can transition at any arbitrary
time and are not slaved to any local clock.
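The four classes above can be captured in a small classifier. The sketch below is hypothetical: the function name, the parts-per-million tolerance used to decide "same frequency", the bound used for "nominally the same", and the choice to treat a signal with an unrelated frequency as asynchronous are assumptions, not definitions from these notes.

```python
from enum import Enum
from typing import Optional


class Timing(Enum):
    SYNCHRONOUS = "synchronous"
    MESOCHRONOUS = "mesochronous"
    PLESIOCHRONOUS = "plesiochronous"
    ASYNCHRONOUS = "asynchronous"


def classify(signal_freq: Optional[float], clock_freq: float,
             phase_known: bool, same_ppm: float = 0.0,
             nominal_ppm: float = 1000.0) -> Timing:
    """Classify a signal relative to the local clock (hypothetical helper).

    signal_freq=None models a signal that may transition at arbitrary times.
    same_ppm:    largest mismatch (parts per million) treated as the same frequency.
    nominal_ppm: largest mismatch still treated as nominally the same frequency.
    """
    if signal_freq is None:
        return Timing.ASYNCHRONOUS
    mismatch_ppm = abs(signal_freq - clock_freq) / clock_freq * 1e6
    if mismatch_ppm <= same_ppm:
        # Same frequency: a known fixed phase offset makes it synchronous.
        return Timing.SYNCHRONOUS if phase_known else Timing.MESOCHRONOUS
    if mismatch_ppm <= nominal_ppm:
        # Slightly different frequency: the relative phase drifts slowly.
        return Timing.PLESIOCHRONOUS
    # Assumption: an unrelated frequency is treated as asynchronous here.
    return Timing.ASYNCHRONOUS


print(classify(1e9, 1e9, phase_known=False).value)
```

The ppm thresholds reflect that "same frequency" in practice means "derived from the same reference", while a plesiochronous source runs from a separate oscillator with a small frequency error.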
Synchronous Design:
Synchronous Timing Basics
The vast majority of digital systems designed today use a periodic synchronization signal, or clock. The generation
and distribution of a clock has a significant impact on performance and power dissipation.
In the ideal world, assuming the clock paths from a central distribution point to each