Multi Supply Digital Layout


  1. SAME 2001, November 15th 2001. Session 2: DEA METHODOLOGY

MULTI-SUPPLY DIGITAL LAYOUT

Regis Santonja, Motorola
Volker Wahl, Motorola Toulouse

Abstract

This paper describes the principle of a technique called "multi-supply digital layout". The technique allows reliable back-annotation between digital blocks that are NOT powered off the same supplies, within an analog top cell. The supplies do not need to have the same voltage levels, thanks to the integration of level shifters for voltage adaptation within the digital layout. The technique is also applicable in systems where one supply can be turned off while another one stays alive. It also optimizes the die size with no extra effort, shortens the layout phase, and optimizes scan insertion and ATPG.

Index Terms: layout, level shifter, back-annotation, scan, low power, multiple supplies, standard cells.

I. Introduction

The goal of this paper is to present why and how to make a multiple supply digital layout. We will present a flow which covers all the steps from the RTL design down to the layout, using only standard CAD tools. We will also compare this technique with the existing literature on the subject, and explain why it is best suited to our needs in terms of resulting area, layout development time, and scan test.

1. What is post-layout back-annotation?

It is the process of calculating the cell delays based on the final routing, and putting these delays into the cell models for simulation or static timing analysis.

Back-annotation is needed to ensure that the functionality is preserved from RTL design down to silicon. Back-annotated simulations and static timing analysis allow the designer to ensure that all the timing constraints of the design are met.

2. Example of timing constraints: the setup time

Usually, there are two levels of complexity for calculating the timing constraints of a flip-flop:

a) Before layout, when the clock is considered perfect (no skew),
b) After layout, when a clock skew shows up.

Dealing with a multiple supply layout adds another level of complexity because we need level shifters on some data and clock paths. The diagram below summarizes the situation.

[Figure: setup-time diagram with level shifters on the data and clock paths]

In the diagram, δclk(i) is the delay from the clock root driver to the clock pin of flip-flop i, δck2q is the clock-to-output transition time of the flip-flop, δd is the data path delay and Tclk is the clock period.

In order for the layout tool to generate a balanced clock tree, one needs to have a logical and a timing model for the level shifters. The level shifters are presented in section III.4.

3. Why do we need several power supplies in a design?

There are two reasons for using several power supplies, both of which are necessary for power management chips. This kind of circuit is very common in mobile phones. They are used for the
  2. regulating and distributing the power supplies to the other chips in the telephone.

a) Reducing Power Consumption

Reducing the power consumption of portable devices such as mobile phones, PDAs or portable PCs has become one of the most important goals of the semiconductor industry. As explained in section II.3 of this paper, using several power supplies is one of the most effective techniques to reduce power consumption.

b) Interfacing circuits that operate at different voltage levels

The second reason for using several power supplies is to interface circuits that operate at several voltages. Power management chips include a variety of programmable functions (such as an audio codec, an ADC used to monitor the supply levels, a touch screen interface, a USB, an RS232 port, etc.). The most effective technique is to have each of these functions controlled by logic powered off the same voltage as the one required by the function's interface.

A simplified example of how a power management chip can be at the heart of a multiple supply system is presented below: we have a processor with inputs and outputs operating at 1.8V and a core at 2.5V. The power management chip communicates through its serial interface (SPI) operating at 1.8V with an embedded real time clock powered off an external Lithium cell at 3.2V.

The organization of this paper is the following. In section II, we present the prior art in multiple supply layout and show why it is not adapted to our needs. In section III, we present our layout solution. In section IV, we present the design flow and how to integrate the analog level shifters in the digital flow. In section V, we present a program which generates scripts for Silicon Ensemble. In section VI, we present our multiple voltage clock tree solution. Finally, section VII presents a possibility for enhancing the flow in the future.

II. Prior Art

1. Interfacing circuits that operate at different voltage levels

On analog-oriented chips where several digital blocks powered off different supplies have to be laid out on the same silicon, the traditional way to do this was to design and lay out the digital blocks separately, place them as macro cells in the analog top cell of the chip, then use an analog router such as IC Craftsman to interconnect the blocks.

This method had the following disadvantages:

a) There was no way to use the inter-block connections' parasitics and generate a standard SDF file for back-annotation.
b) Three digital layouts had to be done separately, with no way to globally re-order the scan chain.
c) Three tools and environments had to be used: Silicon Ensemble, Cadence Framework II (Virtuoso) and IC Craftsman.
d) Tools such as IC Craftsman and Virtuoso from Cadence are analog tools and are not familiar to most digital designers.

2. Sophisticated layout techniques found in the literature

The authors of [1] [2] [3] [4] and [5] have already proposed some techniques to lay out multiple supply circuits. However, they started from a different situation: they have a single supply circuit and want to save power by increasing the number of its supplies. To do this, they split the circuit at the gate level and assign to each gate the power supply which best matches its timing requirements, with no
  3. respect to the function implemented, in such a way that a given function can be spread over several supplies. As the number of connections within a function is statistically much bigger than the number of connections between the functions, this method (called gate-level voltage scaling) generates a lot of routing between the supplies. Because of this, these authors have developed sophisticated techniques in order to minimise the routing. However, the drawback is that the placement algorithm has to be modified. For example, Chingwei Yeh and Yin-Shuin Kang in [1] and [4] have proposed a modification of simulated annealing that introduces a new cost function associated with voltage clustering.

These methods cannot be used for our designs, as we are required to use standard CAD tools.

3. How can we reduce power by using multiple supplies?

This technique, called gate-level voltage scaling, consists in using a low supply voltage for the parts of the circuit that do not suffer from the implied transistor performance degradation, and keeping a higher voltage level for the critical paths of the circuit. Indeed, lowering the voltage is the most effective technique for reducing CMOS power consumption because the latter is proportional to the square of the supply voltage.

4. What about clock distribution?

Many papers have been published since 1990 about generating a zero skew clock tree [7]. Various algorithms have been proposed for single supply, as well as for dual supply circuits [2] [8]. However, in [2], Usami et al. propose a clock tree structure where the leaves have to be in the low voltage region: the tree does not reach the flip-flops in the other region.

We will see in section VI that we propose a technique allowing a given clock tree to drive flip-flops in both low and high voltage regions.

III. Our layout solution

1. Supplies isolation within the epi

Because we are in a mixed-signal environment, we have to pay attention to the transitions in the digital domain that might generate switching noise on sensitive analog blocks. For this reason, the digital part has to be surrounded by an isolation ring. In the same manner, we isolate the digital blocks operating at different voltages from each other, especially if they do not have the same ground. The picture below represents two inverters. We can see that without the isolation, vss1 and vss2 would short together.

Note that there is a minimum ring width and a minimum distance required between the rings.

2. Layout style

In contrast to the prior art, our starting point is to develop a chip which is already, by nature, a multiple supply circuit. In fact, we could say that another type of voltage scaling technique (architecture voltage scaling) was used at the system level, resulting in the definition of a chip in which all the functions (control, real time clock, SPI interface, etc.) have been assigned to a voltage supply. For this reason, we do not encounter the same routing issues as these authors. Thus, our layout solution has the following advantages:

• it is the simplest,
• it works fine with standard cell-based layout tools (no need to modify the placement algorithm),
• it includes all the necessary level shifters,
• it makes it easy to isolate the voltage regions from each other with a negligible impact on the overall area,
• cells can be abutted in each voltage region as in usual single-supply layouts.

These last two points can result in significant area savings compared to the prior art. And if we compare to section II.1, the listed disadvantages have disappeared:

a) We can now generate a single standard SDF file for back-annotation. All the inter-region connections are taken into account.
b) Only one digital layout has to be done, with the possibility to globally re-order the scan chain.
c) Only one layout tool is used: Silicon Ensemble, and no analog tool.
d) Silicon Ensemble is familiar to most digital designers.

In practice, we grouped the cells powered by the same voltage in 3 voltage regions, as presented below. Note that the three regions are separated by the necessary isolation ring.
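As a rough illustration of the quadratic dependence mentioned in section II.3, the usual first-order model of CMOS dynamic power can be applied to the two supply values of the processor example (2.5V and 1.8V). The symbols α (switching activity), C_L (switched capacitance) and f (clock frequency) are not defined in this paper; they are assumptions introduced here only for the sketch, and leakage and short-circuit currents are ignored.

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
\qquad\Longrightarrow\qquad
\frac{P_{\mathrm{dyn}}(1.8\,\mathrm{V})}{P_{\mathrm{dyn}}(2.5\,\mathrm{V})}
  = \left(\frac{1.8}{2.5}\right)^{2} \approx 0.52
```

Under these assumptions, and with α, C_L and f unchanged, logic moved from the 2.5V supply to the 1.8V supply would dissipate roughly half the dynamic power.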
  4. Two issues have to be taken into account when a signal goes from one voltage to another one:

3. The signal goes from a high voltage to a low voltage

The first issue that can show up is associated with antenna diodes that can allow a static current to flow from the high to the low voltage region. Charge-collecting antennas are formed during wafer processing when an interconnect (field poly or metal) is connected to a poly gate that does not yet have an electrical connection to diffusion. A connection to diffusion is typically completed at the top level of metal, so conductors below the top level of metal are generally considered responsible for damage from collecting charge during plasma processing. Therefore, antenna area ratio design rules are commonly used in the semiconductor industry to ensure that the remaining charges do not damage circuits [6].

Many companies in the industry systematically add antenna diodes in their standard cells, connected to all input pins of the gates. These antenna diodes are either connected to the supply (P-type diode) or to the ground (N-type diode), depending on the area cost for the cell. As a consequence, the type of the diodes appears to be random, leading to the risk of having a static current from the higher to the lower voltage flowing through a P-type diode, as presented on the right.

In order to avoid this leakage, we can take advantage of the cells which happen to have only N-type antenna diodes, such as all the simple buffers in the technology we used. The inserted cell has to be powered off the low supply, as presented in the figure below.

4. The signal goes from a low voltage to a high voltage

Whenever a gate has to drive the input of another gate operating at a higher voltage, a voltage conversion is needed at the interface. Connecting the low voltage signal directly to the high voltage gate is not acceptable, even though it would be the simplest solution. The simulation plot below shows this situation with two inverters, the first one operating at a lower supply than the second one. When a falling edge is presented at the input of the first inverter, there is a static current consumption in the second inverter because its PMOS is not fully turned off.

[Figure: simulation plot of the input, the output, and the current in the second inverter; annotations: 50 µA, 130 mV]

The solution we adopted is to use a dual cascode voltage switch (DCVS), which we call a "level shifter" in this paper. However, a usual level shifter as presented in [3] has its output undefined whenever the input supply is turned off. For this reason, we have added a 2-input AND gate in order to force the output low, and an NMOS in order to cut any current which could flow to the ground, as shown below. The NMOS and the AND gate are controlled by a signal which is low when the input voltage supply is switched off.
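The logical behaviour of the gated level shifter of section III.4 can be sketched as follows. This is only a minimal functional view under stated assumptions, not the authors' cell model: the function name `level_shifter`, the control signal name `supply_ok`, and the use of `None` to stand for an undefined input are all hypothetical choices made for illustration.

```python
from typing import Optional


def level_shifter(data_in: Optional[int], supply_ok: int) -> Optional[int]:
    """Logical (not electrical) view of the gated level shifter.

    data_in   -- 0 or 1 coming from the input voltage region, or None when
                 the value is undefined (for instance, input supply off).
    supply_ok -- hypothetical name for the control signal: 0 when the input
                 supply is switched off, 1 otherwise.
    """
    if supply_ok == 0:
        # The added AND gate forces the output low, whatever the input is.
        return 0
    # With the input supply present, the cell behaves like a buffer.
    return data_in


if __name__ == "__main__":
    # Print the behaviour for every input/control combination.
    for supply_ok in (0, 1):
        for data_in in (0, 1, None):
            print(supply_ok, data_in, level_shifter(data_in, supply_ok))
```

For defined inputs this reduces to a 2-input AND of the data and the control signal; the sketch only adds the case of an undefined input, which is the situation the added gate is meant to mask.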
  5. The level shifter's layout has been done in such a way that it looks like a standard cell's layout, except that it is "dual-rail", as shown on the right.

IV. Design Flow and Libraries

The principle of the technique presented here is to avoid the need for analog tools and tool environments from RTL down to the layout. All CAD tools have to be digital and standard. In order to stay in a pure digital environment, we had to write all the digital libraries for the level shifters, just like those that are used for normal standard cells:

1. Verilog (HDL description)

The Verilog model of a standard level shifter is similar to that of a buffer. In our case, the model we used is similar to a 2-input AND gate. RTL design is performed as usual, without any reference to the power supplies. The level shifters are instantiated within the RTL code.

2. Design Compiler (Synthesis)

The level shifter's timing parameters (fall/rise slew rate and fall/rise transition delays) under all the necessary PVT (process, voltage and temperature) corners have been extracted from Spice simulations. A Design Compiler .lib file has been generated and compiled to a .db file so that the synthesis will treat the level shifter as a standard cell.

3. Fastscan (ATPG)

A Fastscan model of the level shifters has been generated, too, so that we can automatically generate scan patterns for the production test. Fastscan does not need any timing information. The logical function is a 2-input AND gate, as for the Verilog model. From there, Fastscan treats the level shifter as if it were a digital cell. Running ATPG is easier because we can read the complete design in Fastscan, rather than generating a set of scan vectors for each region. In addition, the fault coverage is most probably higher.

4. Silicon Ensemble (Place&Route)

Silicon Ensemble needs 2 library files for the level shifter. The first one is the LEF and is a view of the layout of the cell. The second file is the TLF (Timing Library Format). It can be automatically derived from the Design Compiler library using the syn2tlf program provided by Cadence. The TLF file is needed by CT-Gen (the Clock-Tree Generator) in order to estimate the clock skew and the insertion delay of the clock tree.

Silicon Ensemble generates a post-layout netlist which includes the level shifters, and an RC file which contains the list of all the capacitances and resistances of the routed nets. These two files can then be read by the delay calculator, which generates an SDF file used for the back-annotation. The delay calculator can be Design Compiler or PrimeTime from Synopsys, or any internal tool (quite often foundries have their own golden delay calculator).

V. Automated floorplan and placement

A small program has been developed in order to ease the floorplan generation. Based on the number of level shifters and the desired utilization percentage of each voltage region, it proposes a selection of floorplans with different aspect ratios, for which it generates Silicon Ensemble scripts that will initialize the floorplan, place the level shifters automatically and route the horizontal and vertical power stripes, as represented below.

Finally, the cells are gathered in groups, and each group is assigned to a region, so that the placement tool will locate each cell in the correct region.

VI. Clock tree synthesis

The clock tree structure with dual supply voltages presented in [2] handles clock domains in which all the flip-flops are only allowed to operate at the low voltage while meeting the timing constraints.

We propose here a technique allowing a given clock tree to drive flip-flops in both low and high voltage regions. However, the clock tree generator is not allowed to place clock buffers in a voltage region which is different from that of the clock's root driver.
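The placement rule for clock buffers stated in section VI can be expressed as a simple check. The sketch below is illustrative only and is not part of the authors' flow: the `ClockBuffer` record, the `misplaced_buffers` function and the instance names are hypothetical, and the region of each buffer is assumed to be already known (for example, from the placed design).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ClockBuffer:
    """Hypothetical record for one buffer of a clock tree."""
    name: str
    region: int  # index of the voltage region the buffer is placed in


def misplaced_buffers(root_region: int,
                      buffers: Iterable[ClockBuffer]) -> List[str]:
    """Return the names of buffers placed outside the root driver's region."""
    return [b.name for b in buffers if b.region != root_region]


if __name__ == "__main__":
    # Example tree whose root driver sits in voltage region 1.
    tree = [
        ClockBuffer("ckbuf_a", 1),
        ClockBuffer("ckbuf_b", 2),  # violates the rule
        ClockBuffer("ckbuf_c", 1),
    ]
    print(misplaced_buffers(1, tree))
```

An empty result means that every buffer shares the supply of the clock's root driver, so no branch of the tree depends on a supply other than the root's.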
  6. Indeed, we have to prevent a clock buffer from being placed in a voltage region that is turned off if the corresponding branch is supposed to drive functions that are in use (powered on). The dashed line on the diagram below symbolizes a dead branch of the clock tree, which makes some functions in voltage regions 1 and 3 fail if voltage 2 is turned off.

The correct placement of the clock tree buffers is managed by several steps, automated in a Unix shell script. There are as many CT-Gen runs as voltage regions. The diagram below presents an example of a clock tree generation in voltage region 3: all possible "holes" in the rows of regions 1 and 2 are filled with dummy filler cells. Then all cells in these regions are assigned the FIXED property in the DEF file (Silicon Ensemble ASCII database). Finally, CT-Gen is launched.

Once all the clock trees have been generated, the routing can be launched as for a usual layout, and the RC parasitics file can be generated in the standard way.

VII. Future enhancements

With the chosen flow, all voltage regions will be back-annotated using the same PVT conditions, because only one SDF file is generated. A region could impose its own voltage range (best case, worst case) on the others, even if the latter have weaker voltage constraints. This problem could be eliminated by splitting the RC file, generating an SDF file for each region, and merging them together with a simple Perl script.

VIII. References

[1] Chingwei Yeh and Yin-Shuin Kang, Cell-Based Layout Techniques Supporting Gate-Level Voltage Scaling for Low Power. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 8, No. 5, October 2000.

[2] Kimiyoshi Usami, Mitsunori Igarashi, Fumihiro Minami, Takashi Ishikawa, Masahiro Kanazawa, Makoto Ichida, and Kazutaka Nogami, Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor. IEEE Journal of Solid-State Circuits, Vol. 33, No. 3, March 1998.

[3] C. Yeh and M.-C. Chang, Gate-level voltage scaling for low-power design using multiple supply voltages. IEE Proc. Circuits Devices Syst., Vol. 146, No. 6, December 1999.

[4] Chingwei Yeh, Yin-Shuin Kang, Shan-Jih Shieh, and Jinn-Shyan Wang, Layout Techniques Supporting the Use of Dual Supply Voltages for Cell-Based Designs. Proceedings of the 36th Design Automation Conference, 1999.

[5] Yi-Jong Yeh and Sy-Yen Kuo, An Optimization-based low-power voltage scaling technique using multiple supply voltages. Proceedings of the 2001 IEEE International Symposium on Circuits and Systems (ISCAS 2001), Vol. 5, 2001.

[6] Martin Polzl, A Strategy to Detect Charge Damaging Process Steps within a Multilayer Metallization Technology. 2nd International Symposium on Plasma Process-Induced Damage, 1997.

[7] G. E. Tellez and M. Sarrafzadeh, Clock period constrained minimal buffer insertion in clock trees. Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 1994.

[8] Jatuchai Pangjun and Sachin S. Sapatnekar, Clock Distribution Using Multiple Voltages. Proceedings of the 1999 International Symposium on Low Power Electronics and Design, 1999.

[9] Alain Guyot and Sélim Abou-Samra, Low Power CMOS Digital Design. ICM'98, December 14-16, 1998.

[10] Anantha P. Chandrakasan, Samuel Sheng, and Robert W. Brodersen, Low-Power CMOS Digital Design. IEEE Journal of Solid-State Circuits, Vol. 27, No. 4, April 1992.