Session 2: DEA METHODOLOGY
MULTI-SUPPLY DIGITAL LAYOUT
Regis Santonja, Motorola
Volker Wahl, Motorola
In this paper, the principle of a technique called "multi-supply digital layout" is described. The use of this technique allows a reliable back-annotation between digital blocks that are NOT powered off the same supplies, within an analog top-cell. The supplies do not have to have the same voltage levels, thanks to the integration of level shifters for voltage adaptation within the digital layout. It is also applicable in systems where a supply can be turned off while another one stays alive. This technique also optimizes the die size with no extra effort, shortens the layout phase and optimizes scan insertion and ATPG.

Index Terms – Layout, level shifter, back-annotation, scan, low power, multiple supplies.

The goal of this paper is to present why and how to make a multiple supply digital layout. We will present a flow which covers all the steps from the RTL design down to the layout, using only standard CAD tools. We will also compare this technique with the existing literature on the subject, and explain why it is best suited to our needs in terms of resulting area, layout development time, and scan test.

1. What is post-layout back-annotation?

It is the process of calculating the cell delays based on the final routing, and putting these delays into the cell models for simulation or static timing analysis.

Back-annotation is needed in order to ensure that the functionality is kept from RTL design down to silicon. Back-annotated simulations and static timing analysis allow the designer to ensure that all the timing constraints of the design are met.

2. Example of timing constraints: the setup time

Usually, there are two levels of complexity for calculating the timing constraints of a flip-flop:

a) Before layout, when the clock is considered perfect (no skew),

b) After layout, when a clock skew shows up.

Dealing with a multiple supply layout adds another level of complexity because we need level shifters on some data and clock paths. The diagram below summarizes the situation.

Where δclk(i) is the delay from the clock root driver to the clock pin of flip-flop i, δck2q is the clock-to-output transition time of the flip-flop, δd is the data path delay and Tclk is the clock period.

In order for the layout tool to generate a balanced clock tree, one needs to have a logical and a timing model for the level shifters. The level shifters are presented in section III.4.

3. Why do we need several power supplies in a design?

There are two reasons for using several power supplies, both of which are necessary for power management chips. This kind of circuit is very common in mobile phones. They are used for
SAME 2001, November 15th 2001 1
regulating and distributing the power supplies to the other chips in the telephone.

a) Reducing Power Consumption

Reducing the power consumption of portable devices such as mobile phones, PDAs or portable PCs has become one of the most important goals of the semiconductor industry. As explained in section II.3 of this paper, using several power supplies is one of the most effective techniques to reduce power consumption.

b) Interfacing circuits that operate at different voltage levels.

The second reason for using several power supplies is to interface circuits that operate at several voltages. Power management chips include a variety of programmable functions (such as an audio codec, an ADC used to monitor the supply levels, a touch screen interface, a USB port, an RS232 port, etc.). The most effective technique is to have each of these functions controlled by logic powered off the same voltage as the one required for the function.

A simplified example of how a power management chip can be at the heart of a multiple supply system is presented below: we have a processor with inputs and outputs operating at 1.8V and a core at 2.5V. The power management chip communicates through its serial interface (SPI) operating at 1.8V with an embedded real time clock powered off an external Lithium cell at 3.2V.

The organization of this paper is the following. In section II, we present the prior art in multiple supply layout and show why it is not adapted to our needs. In section III, we present our layout solution. In section IV we present the design flow and how to integrate the analog level shifters in the digital flow. In section V, we present a program which generates scripts for Silicon Ensemble. In section VI, we present our multiple voltage clock tree solution. Finally, section VII presents a possibility for enhancing the flow in the future.

II. Prior Art

1. Interfacing circuits that operate at different voltage levels.

On analog-oriented chips where several digital blocks powered off different supplies have to be laid out on the same silicon, the traditional way to do this was to design and lay out the digital blocks separately, place them as macro cells in the analog top cell of the chip, then use an analog router such as IC Craftsman to interconnect the blocks.

This method had the following disadvantages:

a) There was no way to use the inter-block connections' parasitics and generate a standard SDF file for back-annotation.
b) Three digital layouts had to be done separately with no way to globally re-order the scan chain.
c) Three tools and environments had to be used: Silicon Ensemble, Cadence Framework II (Virtuoso) and IC Craftsman.
d) Tools such as IC Craftsman and Virtuoso from Cadence are analog tools and not familiar to most of the digital designers.

2. Sophisticated layout techniques found in the literature.

The authors in [...] have already proposed some techniques to lay out multiple supply circuits. However, they have started from a different situation: they have a single supply circuit and want to save power by multiplying the number of its supplies. For doing this, they split the circuit at the gate level and assign to each gate the power supply which best matches its timing requirements, with no
respect to the function implemented, in such a way that a given function can be spread over several supplies. As the number of connections within a function is statistically much bigger than the number of connections between the functions, this method (called gate-level voltage scaling) generates a lot of routing between the supplies. Because of this, these authors have developed sophisticated techniques in order to minimise the routing. However, the drawback is that the placement algorithm has to be modified. For example, Chingwei Yeh and Yin-Shuin Kang in [1] and [4] have proposed a modification of the simulated annealing by introducing a new cost function associated with [...]

These methods cannot be used for our designs, as we are required to use standard CAD tools.

3. How can we reduce power by using multiple supplies?

This technique, called gate-level voltage scaling, consists in using a low supply voltage for the parts of the circuit that do not suffer from the implied transistor performance degradation, and keeping a higher voltage level for the critical paths of the circuit. Lowering the voltage is the most effective technique for reducing CMOS power consumption because the latter is proportional to the square of the supply voltage.

4. What about clock distribution?

Many papers have been published since 1990 about generating a zero skew clock tree [...]. Various algorithms have been proposed for single supply, as well as for dual supply circuits [...]. However, in [2], Usami et al. propose a clock tree structure where the leaves have to be in the low voltage region: the tree does not reach the flip-flops in the other region.

We'll see in section VI that we propose a technique allowing a given clock tree to drive flip-flops in both low and high voltage regions.

III. Our layout solution

1. Supplies isolation within the epi

Because we are in a mixed-signal environment, we have to pay attention to the transitions in the digital domain that might generate switching noise on sensitive analog blocks. For this reason, the digital part has to be surrounded by an isolation ring. In the same manner, we isolate the digital blocks operating at different voltages from each other, especially if they do not have the same ground. The picture below represents two inverters. We can see that without the isolation, vss1 and vss2 would short together.

Note that there is a minimum ring width and a minimum distance required between the rings.

2. Layout style

In opposition to the prior art, our starting point is to develop a chip which is already, by nature, a multiple supply circuit. In fact, we could say that another type of voltage scaling technique (architecture voltage scaling) was used at the system level, resulting in the definition of a chip in which all the functions (control, real time clock, SPI interface, etc.) have been assigned to a voltage supply. For this reason, we do not encounter the same routing issues as these authors. Thus, our layout solution has the following advantages:

• it is the simplest,
• it works fine with standard cell-based layout tools (no need to modify the placement algorithm),
• it includes all the necessary level shifters,
• it makes it easy to isolate the voltage regions from each other with a negligible impact on the overall area,
• cells can be abutted in each voltage region as in usual single-supply layouts.

These last two points can result in significant area savings compared to the prior art. And if we compare to section II.1, the listed disadvantages are removed:

a) We can now generate a single standard SDF file for back-annotation. All the inter-region connections are taken into account.
b) Only one digital layout has to be done, with the possibility to globally re-order the scan chain.
c) Only one layout tool is used: Silicon Ensemble, and no analog tool.
d) Silicon Ensemble is familiar to most of the digital designers.

In practice, we grouped the cells powered by the same voltage in 3 voltage regions, as presented below. Note that the three regions are separated by the necessary isolation ring.
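As a rough illustration of the grouping into voltage regions described in section III.2, the sketch below gathers cells by region and lists the nets that cross from one region to another, together with the direction of the crossing. It is only a sketch under stated assumptions: the cell names, the net list format and the supply values are hypothetical and are not taken from the actual design or from any CAD tool.

```python
from collections import defaultdict

# Hypothetical supply voltage of each region (illustrative values only).
REGION_VOLTAGE = {"region1": 1.8, "region2": 2.5, "region3": 3.2}

# Hypothetical cells: cell name -> voltage region it is powered from.
CELL_REGION = {
    "spi_ff1": "region1",
    "ctrl_and2": "region2",
    "rtc_ff7": "region3",
    "rtc_inv3": "region3",
}

# Hypothetical nets: (net name, driver cell, receiver cell).
NETS = [
    ("n1", "spi_ff1", "ctrl_and2"),
    ("n2", "rtc_ff7", "rtc_inv3"),
    ("n3", "rtc_ff7", "spi_ff1"),
]


def group_cells(cell_region):
    """Gather cells into one group per voltage region."""
    groups = defaultdict(list)
    for cell, region in cell_region.items():
        groups[region].append(cell)
    return dict(groups)


def crossing_nets(nets, cell_region, region_voltage):
    """Return (net, direction) for every net whose driver and receiver
    sit in different regions."""
    result = []
    for net, driver, receiver in nets:
        src, dst = cell_region[driver], cell_region[receiver]
        if src == dst:
            continue  # the net stays inside one voltage region
        if region_voltage[src] < region_voltage[dst]:
            result.append((net, "low_to_high"))
        elif region_voltage[src] > region_voltage[dst]:
            result.append((net, "high_to_low"))
        else:
            result.append((net, "same_level"))
    return result


if __name__ == "__main__":
    print(group_cells(CELL_REGION))
    print(crossing_nets(NETS, CELL_REGION, REGION_VOLTAGE))
```

Each crossing direction calls for a different treatment at the interface between regions, which is why the sketch reports it per net.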
Two issues have to be taken into account when a signal goes from one voltage to another one:

3. The signal goes from a high voltage to a low voltage

The first issue that can show up is associated with antenna diodes that can allow a static current to flow from the high to the low voltage region. Charge-collecting antennas are formed during wafer processing when an interconnect (field poly or metal) is connected to a poly gate that does not yet have an electrical connection to diffusion. A connection to diffusion is typically completed at the top level of metal, so conductors below the top level of metal are generally considered responsible for damage from collecting charge during plasma processing. Therefore, antenna area ratio design rules are commonly used in the semiconductor industry to ensure that the remaining charges do not damage circuits [...].

Many companies in the industry systematically add antenna diodes in their standard cells, connected to all input pins of the gates. These antenna diodes are either connected to the supply (P-type diode) or to the ground (N-type diode), depending on the area cost for the cell.

As a consequence, the type of the diodes appears to be random, leading to the risk of having a static current from the higher to the lower voltage flowing through a P-type diode, as presented on the right.

In order to avoid this leakage, we can take advantage of the cells which happen to have only N-type antenna diodes, such as all the simple buffers in the technology we used. The inserted cell has to be powered off the low supply as presented on the Figure below.

4. The signal goes from a low voltage to a high voltage

Whenever a gate has to drive the input of another gate operating at a higher voltage, a voltage conversion is needed at the interface. Connecting the low voltage signal directly to the high voltage gate is not acceptable, even though it would be the simplest solution. The simulation plot below shows this situation with two inverters, the first one operating at a lower supply than the second one. When a falling edge is presented at the input of the first inverter, there is a static current consumption in the second inverter because its PMOS is weakly turned on.

[Figure: simulation plot showing the output and the current in the second inverter; annotated values: 50 µA, 130 mV]

The solution we adopted is to use a dual cascode voltage switch (DCVS), which we call a "level shifter" in this paper. However, a usual level shifter as presented in [...] has its output undefined whenever the input supply is turned off. For this reason, we have added a 2-input AND gate in order to force the output low and an NMOS in order to cut any current which could flow to the ground, as shown below. The NMOS and the AND gate are controlled by a signal which is low when the input voltage supply is switched off.
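The logical behaviour of the gated level shifter of section III.4 (a level shifter followed by a 2-input AND gate) can be summarised with a small behavioural model. The sketch below is only an illustration of that description, not the authors' Verilog or ATPG model; the function and signal names (`level_shifter`, `supply_on`) are hypothetical.

```python
def level_shifter(data_in: bool, supply_on: bool) -> bool:
    """Logical view of the gated level shifter: a 2-input AND gate.

    data_in   -- logic value coming from the low voltage region
    supply_on -- control signal, low when the input supply is switched off
    """
    # When the input supply is off, data_in is undefined in silicon;
    # the AND gate forces the output low whatever value is passed here.
    return data_in and supply_on


if __name__ == "__main__":
    # Print the full truth table of the model.
    for supply_on in (False, True):
        for data_in in (False, True):
            out = level_shifter(data_in, supply_on)
            print(f"supply_on={int(supply_on)} data_in={int(data_in)} -> out={int(out)}")
```

The model only captures the logic function; the voltage conversion itself and the NMOS that cuts the current path to ground are analog features that a digital view does not represent.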
The level shifter's layout has been done in such a way that it looks like a standard cell's layout except that it is "dual-rail", as shown in the figure.

IV. Design Flow and Libraries

The principle of the technique presented here is to avoid the need for analog tools and tool environments from RTL down to the layout. CAD tools all have to be digital and standard. In order to stay in a pure digital environment, we had to write all the digital libraries for the level shifters, just like those that are used for normal standard cells:

1. Verilog (HDL description)

The Verilog model of a standard level shifter is similar to that of a buffer. In our case, the model we used is similar to a 2-input AND gate. RTL design is performed as usual, without any reference to the power supplies. The level shifters are instantiated within the RTL code.

2. Design Compiler (Synthesis)

The level shifter's timing parameters (fall/rise slew rate and fall/rise transition delays) under all the necessary PVT (process, voltage and temperature) corners have been extracted from Spice simulations. A Design Compiler .lib file has been generated and compiled to a .db file so that the synthesis will treat the level shifter as a standard cell.

3. FastScan (ATPG)

A FastScan model of the level shifters has been generated, too, so that we can automatically generate scan patterns for the production test. FastScan does not need any timing information. The logical function is a 2-input AND gate, as for the Verilog model. From there, FastScan treats the level shifter as if it were a digital cell. Running ATPG is easier because we can read the complete design in FastScan, rather than generating a set of scan vectors for each region. In addition, the fault coverage is most probably higher.

4. Silicon Ensemble (Place&Route)

Silicon Ensemble needs 2 library files for the level shifter. The first one is the LEF and is a view of the layout of the cell. The second file is the TLF (Timing Library Format). It can be automatically derived from the Design Compiler's library using the syn2tlf program provided by Cadence. The TLF file is needed for CT-Gen (the Clock-Tree Generator) in order to estimate the clock skew and the insertion delay of the clock tree.

Silicon Ensemble generates a post-layout netlist which includes the level shifters, and an RC file which contains the list of all the capacitances and resistances of the routed nets. These two files can then be read by the delay calculator, which generates an SDF file used for the back-annotation. The delay calculator can be Design Compiler or PrimeTime from Synopsys, or any internal tool (quite often foundries have their own golden delay calculator).

V. Automated floorplan and [...]

A small program has been developed in order to ease the floorplan generation. Based on the number of level shifters and the desired utilization percentage of each voltage region, it proposes a selection of floorplans with different aspect ratios, for which it generates Silicon Ensemble scripts that will initialize the floorplan, place the level shifters automatically and route the horizontal and vertical power stripes as represented below.

Finally, the cells are gathered in groups, and each group is assigned to a region, so that the placement tool will locate each cell in the correct region.

VI. Clock tree synthesis

The clock tree structure with dual supply voltages presented in [...] handles clock domains in which all the flip-flops are only allowed to operate at the low voltage while meeting the timing constraints.

We propose here a technique allowing a given clock tree to drive flip-flops in both low and high voltage regions. However, the clock tree generator is not allowed to place clock buffers in a voltage region which is different from the clock's root driver.
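The placement rule stated in section VI can be expressed as a simple check on a placed clock tree. The sketch below is only an illustration under assumed data structures; it is not part of CT-Gen or Silicon Ensemble, and the buffer and region names are hypothetical.

```python
def misplaced_clock_buffers(root_region, buffer_regions):
    """Return, sorted by name, the clock buffers that are placed in a
    voltage region other than the one of the clock root driver."""
    return sorted(
        name for name, region in buffer_regions.items() if region != root_region
    )


if __name__ == "__main__":
    # Hypothetical placement result: buffer name -> region it was placed in.
    buffers = {"ckbuf_0": "region1", "ckbuf_1": "region1", "ckbuf_2": "region2"}
    print(misplaced_clock_buffers("region1", buffers))
```

An empty result means that every buffer of the tree sits in the same region as its root driver.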
Indeed, we have to prevent a clock buffer from being placed in a voltage region that is turned off if the corresponding branch is supposed to drive functions that are in use (powered on). The dashed line on the diagram below symbolizes a dead branch of the clock tree, which makes some functions in voltage regions 1 and 3 fail if voltage 2 is turned off.

The correct placement of the clock tree buffers is managed by several steps, automated in a Unix shell script. There are as many CT-Gen runs as voltage regions. The diagram below presents an example of a clock tree generation in voltage region 3: all possible "holes" in the rows of regions 1 and 2 are filled with dummy filler cells. Then all cells in these regions are assigned the FIXED property in the DEF file (Silicon Ensemble ASCII database). Finally, CT-Gen is launched.

Once all the clock trees have been generated, the routing can be launched as for a usual layout, and the RC parasitics file can be generated as in the usual flow.

VII. Future enhancements

With the chosen flow, all voltage regions will be back-annotated using the same PVT conditions, because only one SDF file is generated. A region could impose its own voltage range (best case, worst case) on the others, even if the latter have weaker voltage constraints. This problem could be eliminated by splitting the RC file, generating an SDF file for each region, and merging them together with a simple Perl script.

VIII. References

[1] Chingwei Yeh and Yin-Shuin Kang, Cell-Based Layout Techniques Supporting Gate-Level Voltage Scaling for Low Power. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 8, No. 5, October 2000.

[2] Kimiyoshi Usami, Mitsunori Igarashi, Fumihiro Minami, Takashi Ishikawa, Masahiro Kanazawa, Makoto Ichida, and Kazutaka Nogami, Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor. IEEE Journal of Solid-State Circuits, Vol. 33, No. 3, March [...]

[3] C. Yeh and M.-C. Chang, Gate-level voltage scaling for low-power design using multiple supply voltages. IEE Proc. Circuits Devices Syst., Vol. 146, No. 6, December 1999.

[4] Chingwei Yeh, Yin-Shuin Kang, Shan-Jih Shieh, Jinn-Shyan Wang, Layout Techniques Supporting the Use of Dual Supply Voltages for Cell-Based Designs. Proceedings of the 36th Design Automation Conference, 1999.

[5] Yi-Jong Yeh and Sy-Yen Kuo, An Optimization-based low-power voltage scaling technique using multiple supply voltages. The 2001 IEEE International Symposium on Circuits and Systems (ISCAS 2001), Volume 5, 2001.

[6] Martin Polzl, A Strategy to Detect Charge Damaging Process Steps within a Multilayer Metallization Technology. 1997 2nd International Symposium on Plasma Process-Induced Damage.

[7] G. E. Tellez and M. Sarrafzadeh, Clock period constrained minimal buffer insertion in clock trees. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 1994.

[8] Jatuchai Pangjun and Sachin S. Sapatnekar, Clock Distribution Using Multiple Voltages. Proceedings of the 1999 International Symposium on Low Power Electronics and Design, 1999.

[9] Alain Guyot and Sélim Abou-Samra, Low Power CMOS Digital Design. In ICM'98, December 14-16, 1998.

[10] Anantha P. Chandrakasan, Samuel Sheng, and Robert W. Brodersen, Low-Power CMOS Digital Design. IEEE Journal of Solid-State Circuits, Vol. 27, No. 4, April 1992.