Ultra-Low Power Electronics and Design






  3. 3. Ultra Low-Power Electronics and Design. Edited by Enrico Macii, Politecnico di Torino, Italy. KLUWER ACADEMIC PUBLISHERS, NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
  4. 4. eBook ISBN: 1-4020-8076-X. Print ISBN: 1-4020-8075-1. ©2004 Springer Science + Business Media, Inc. Print ©2004 Kluwer Academic Publishers, Dordrecht. All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher. Created in the United States of America. Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com and the Springer Global Website Online at: http://www.springeronline.com
  7. 7. Contributors

A. Acquaviva, Università di Urbino
L. Benini, Università di Bologna
D. Bertozzi, Università di Bologna
D. Blaauw, University of Michigan, Ann Arbor
A. Bogliolo, Università di Urbino
A. Bona, STMicroelectronics
C. Brandolese, Politecnico di Milano
W.C. Cheng, University of Southern California
G. De Micheli, Stanford University
N. Dutt, University of California, Irvine
W. Fornaciari, Politecnico di Milano
F. Gaffiot, Ecole Centrale de Lyon
J. Gautier, CEA-DRT–LETI/D2NT–CEA/GRE
A. Gordon-Ross, University of California, Riverside
R. Gupta, University of California, San Diego
C. Heer, Infineon Technologies AG
M. J. Irwin, Pennsylvania State University
I. Kadayif, Canakkale Onsekiz Mart University
M. Kandemir, Pennsylvania State University
B. Kienhuis, Leiden
I. Kolcu, UMIST
E. Lattanzi, Università di Urbino
D. Lee, University of Michigan, Ann Arbor
A. Macii, Politecnico di Torino
S. Mohapatra, University of California, Irvine
I. O’Connor, Ecole Centrale de Lyon
K. Patel, Politecnico di Torino
M. Pedram, University of Southern California
C. Pereira, University of California, San Diego
C. Piguet, CSEM
M. Poncino, Università di Verona
F. Salice, Politecnico di Milano
P. Schaumont, University of California, Los Angeles
U. Schlichtmann, Technische Universität München
D. Sylvester, University of Michigan, Ann Arbor
  8. 8. F. Vahid, University of California, Riverside and University of California, Irvine
N. Venkatasubramanian, University of California, Irvine
I. Verbauwhede, University of California, Los Angeles and K.U.Leuven
N. Vijaykrishnan, Pennsylvania State University
V. Zaccaria, STMicroelectronics
R. Zafalon, STMicroelectronics
B. Zhai, University of Michigan, Ann Arbor
C. Zhang, University of California, Riverside
  9. 9. Preface

Today we are beginning to have to face up to the consequences of the stunning success of Moore’s Law, that astute observation by Intel’s Gordon Moore which predicts that integrated circuit transistor densities will double every 12 to 18 months. This observation has now held true for the last 25 years or more, and there are many indications that it will continue to hold true for many years to come. This book appears at a time when the first examples of complex circuits in 65nm CMOS technology are beginning to appear, and these products already must take advantage of many of the techniques to be discussed and developed in this book. So why then should our increasing success at miniaturization, as evidenced by the success of Moore’s Law, be creating so many new difficulties in power management in circuit designs?

The principal source and the physical origin of the problem lies in the differential scaling rates of the many factors that contribute to power dissipation in an IC – transistor speed/density product goes up faster than the energy per transition comes down, so the power dissipation per unit area increases in a general sense as the technology evolves.

Secondly, the “natural” transistor switching speed increase from one generation to the next is becoming downgraded due to the greater parasitic losses in the wiring of the devices. The technologists are offsetting this problem to some extent by introducing lower permittivity dielectrics (“low-k”) and lower resistivity conductors (copper) – but nonetheless to get the needed circuit performance, higher speed devices using techniques such as silicon-on-insulator (SOI) substrates, enhanced carrier mobility (“strained silicon”) and higher field (“overdrive”) operation are driving power densities ever upwards. In many cases, these new device architectures are increasingly leaky, so static power dissipation becomes a major headache in power management, especially for portable applications.
  10. 10. A third factor is system or application driven – having all this integration capability available encourages us to combine many different functional blocks into one system IC. This means that in many cases, a large part of the chip’s required functionality will come from software executing on and between multiple on-chip execution units; how the optimum partitioning between hardware architecture and software implementation is obtained is a vast subject, but clearly some implementations will be more energy efficient than others. Given that, in many of today’s designs, more than 50% of the total development effort is on the software that runs on the chip, getting this partitioning right in terms of power dissipation can be critical to the success of (or instrumental in the failure of!) the product.

A final motivation comes from the practical and environmental consequences of how we design our chips – state-of-the-art high performance circuits are dissipating up to 100W per square centimeter – we only need 500 square meters of such silicon to soak up the output of a small nuclear power station. A related argument, based on battery lifetime, shows that the “converged” mobile phone application combining telephony, data transmission, multimedia and PDA functions that will appear shortly is demanding power at the limit of lithium-ion or even methanol-water fuel cell battery technology. We have to solve the power issue by a combination of design and process technology innovations; examples of current approaches to power management include multiple transistor thresholds, triple gate oxide, dynamic supply voltage adjustment and memory architectures.

Multiple transistor thresholds is a technique, practiced for several years now, that allows the designer to use high performance (low Vt) devices where he needs the speed, and low leakage (high Vt) devices elsewhere.
This benefits both static power consumption (through less sub-threshold leakage) and dynamic power consumption (through lower overall switching currents). High threshold devices can also be used to gate the supplies to different parts of the circuit, allowing blocks to be put to sleep until needed.

Similar to the previous technique, triple gate oxide (TGO) allows circuit partitioning between those parts that need performance and other areas of the circuit that don’t. It has the additional benefit of acting on both sub-threshold leakage and gate leakage. The third oxide is used for I/O and possibly mixed-signal. It is expected over the next few years that the process technologists will eventually replace the traditional silicon dioxide gate dielectric of the CMOS devices by new materials such as rare earth oxides with much higher dielectric constants that will allow the gate leakage problem to be completely suppressed.
  11. 11. Dynamic supply voltage adjustment allows the supply voltage to different blocks of the circuit to be adjusted dynamically in response to the immediate performance needs for the block – this very sophisticated technique will take some time to mature.

Finally, many, if not most, advanced devices use very large amounts of memory for which the contents may have to be maintained during standby; this consumes a substantial amount of power, either through refreshing dynamic RAM or through the array leakage for static RAM. Traditional non-volatile memories have writing times that are orders of magnitude too slow to allow them to substitute these on-chip memories. New developments, such as MRAM, offer the possibility of SRAM-like performance coupled with unlimited endurance and data retention, making them potential candidates to replace the traditional on-chip memories and remove this component of standby power consumption.

Most of the approaches to power management described briefly above will be employed in 65nm circuits, but there are a lot more good ideas waiting to be applied to the problem, many of which you will find clearly and concisely explained in this book.

Mike Thompson, Philippe Magarshack
STMicroelectronics, Central R&D
Crolles, France
  13. 13. Introduction

ULTRA LOW-POWER ELECTRONICS AND DESIGN

Enrico Macii
Politecnico di Torino

Power consumption is a key limitation in many electronic systems today, ranging from mobile telecom to portable and desktop computing systems, especially when moving to nanometer technologies. Power is also a showstopper for many emerging applications like ambient intelligence and sensor networks. Consequently, new design techniques and methodologies are needed to control and limit power consumption.

The 2004 edition of the DATE (Design Automation and Test in Europe) conference devoted an entire Special Focus Day to the power problem and its implications for the design of future electronic systems. In particular, keynote presentations and invited talks by outstanding researchers in the field of low-power design, as well as several technical papers from the regular conference sessions, addressed the difficulties ahead and advanced strategies and principles for achieving ultra low-power design solutions. The purpose of this book is to integrate into a single volume a selection of these contributions, duly extended and transformed by the authors into chapters proposing a mix of tutorial material and advanced research results.

The manuscript consists of a total of 14 chapters, addressing different aspects of ultra low-power electronics and design. Chapter 1 opens the volume by providing insight into innovative transistor devices that are capable of operating with a very low threshold voltage, thus contributing to a significant reduction of the dynamic component of power consumption. Solutions for limiting leakage power during stand-by mode are also discussed. The chapter closes with a quick overview of low-power design techniques applicable at the logic level, including multi-Vdd, multi-Vth and hybrid approaches.

Chapter 2 focuses on the problem of reducing power in the interconnect network by investigating alternatives to traditional metal wires.
In fact, according to the 2003 ITRS roadmap, metallic interconnections may not be able to provide enough transmission speed and to keep power under control for the upcoming technology nodes (65nm and below). A possible solution, explored in the chapter, consists of the adoption of optical interconnect networks. Two applications are presented: clock distribution and data communication using wavelength division multiplexing.
  14. 14. In Chapter 3, the power consumption problem is addressed from the technology point of view by looking at innovative nano-devices, such as single-electron or few-electron transistors. The low-power characteristics and potential of these devices are reviewed in detail. Other devices, including carbon nano-tube transistors, resonant tunnelling diodes and quantum cellular automata, are also treated.

Chapter 4 is entirely dedicated to advanced design methodologies for reducing sub-threshold and gate leakage currents in deep-submicron CMOS circuits by properly choosing the states to which gates have to be driven when in stand-by mode, as well as the values of the threshold voltage and of the gate oxide thickness. The authors formulate the optimization problem for simultaneous state/Vth and state/Vth/Tox assignments under delay constraints and propose both an exact method for its optimal solution and two practical heuristics with reasonable run-time. Experimental results obtained on a number of benchmark circuits demonstrate the viability of the proposed methodology.

Chapter 5 is concerned with the issue of minimizing power consumption of the memory subsystem in complex, multi-processor systems-on-chip (MPSoCs), such as those employed in multi-media applications. The focus is on design solutions and methods for synthesizing memory architectures containing both single-ported and multi-ported memory banks. Power efficiency is achieved by casting the memory partitioning design paradigm to the case of heterogeneous memory structures, in which data need to be accessed in a shared manner by different processing units.

Chapter 6 addresses the relevant problem of minimizing the power consumed by the cache hierarchy of a microprocessor.
Several design techniques are discussed, including application-driven automatic and dynamic cache parameter tuning, adoption of configurable victim buffers and frequent-value data encoding and compression.

Power optimization for parallel, variable-voltage/frequency processors is the subject of Chapter 7. Given a processor with such an architecture, this chapter investigates the energy/performance tradeoffs that can be spanned in parallelizing array-intensive applications, taking into account the possibility that individual processing units can operate at different voltage/frequency levels. In assigning voltage levels to processing units, compiler analysis is used to reveal heterogeneity between the loads of the different units in parallel execution.
  15. 15. Chapter 8 provides guidelines for the design and implementation of DSP and multi-media applications onto programmable embedded platforms. The RINGS architecture is first introduced, followed by a detailed discussion on power-efficient design of some of the platform components, namely, the DSPs. Next, design exploration, co-design and co-simulation challenges are addressed, with the goal of offering to the designers the capability of including into the final architecture the right level of programmability (or reconfigurability) to guarantee the required balance between system performance and power consumption.

Chapter 9 targets software power minimization through source code optimization. Different classes of code transformations are first reviewed; next, the chapter outlines a flow for the estimation of the effects that the application of such transformations may have on the power consumed by a software application. At the core of the estimation methodology there is the development of power models that allow the decoupling of processor-independent analysis from all the aspects that are tightly related to processor architecture and implementation. The proposed approach to software power minimization is validated through several experiments conducted on a number of embedded processors for different types of benchmark applications.

Reduction of the power consumed by TFT liquid crystal displays, such as those commonly used in consumer electronic products, is the subject of Chapter 10. More specifically, techniques for reducing power consumption of transmissive TFT-LCDs using a cold cathode fluorescent lamp backlight are proposed. The rationale behind such techniques is that the transmittance function of the TFT-LCD panel can be adjusted (i.e., scaled) while meeting an upper bound on a contrast distortion metric.
Experimental results show that significant power savings can be achieved for still images with very little penalty in image contrast.

Chapter 11 addresses the issue of efficiently accessing remote memories from wireless systems. This problem is particularly important for devices such as palmtops and PDAs, for which local memory space is at a premium and networked memory access is required to support virtual memory swapping. The chapter explores performance and energy of network swapping in comparison with swapping on local microdrives and FLASH memories. Results show that remote swapping over power-manageable wireless network interface cards can be more efficient than local swapping and that both energy and performance can be optimized by means of power-aware reshaping of data requests. In other words, dummy data accesses can be preemptively inserted in the source code to reshape page requests in order to significantly improve the effectiveness of dynamic power management.
  16. 16. Chapter 12 focuses on communication architectures for multi-processor SoCs. The network-on-chip (NoC) paradigm is reviewed, touching upon several issues related to power optimization of such kinds of communication architectures. The analysis proceeds on a layer-by-layer basis, and particular emphasis is given to customized, domain-specific networks, which represent the most promising scenario for communication-energy minimization in multi-processor platforms.

Chapter 13 provides a natural follow-up to the theory of NoCs covered in the previous chapter by describing an industrial application of this type of communication architecture. In particular, the authors introduce an innovative methodology for automatically generating the power models of a versatile and parametric on-chip communication IP, namely the STBus by STMicroelectronics. The methodology is validated on a multi-processor hardware platform including four ARM cores accessing a number of peripheral targets, such as SRAM banks, interrupt slaves and ROM memories.

The last contribution, offered in Chapter 14, proposes an integrated end-to-end power management approach for mobile video streaming applications that unifies low-level architectural optimizations (e.g., CPU, memory, registers), OS power-saving mechanisms (e.g., dynamic voltage scaling) and adaptive middleware techniques (e.g., admission control, trans-coding, network traffic regulation). Specifically, interaction parameters between the different levels are identified and optimized to achieve a reduction in the power consumption.

Closing this introductory chapter, the editor would like to thank all the authors for their effort in producing their outstanding contributions in a very short time. Special thanks go to Mike Thompson and Philippe Magarshack of STMicroelectronics for their keynote presentation at DATE 2004 and for writing the foreword to this book.
The editor would also like to acknowledge the support offered by Mark De Jongh and the Kluwer staff during the preparation of the final version of the manuscript. Last, but not least, the editor is grateful to Agnieszka Furman for taking care of most of the “dirty work” related to book editing, paging and preparation of the camera-ready material.
  17. 17. Chapter 1

ULTRA-LOW-POWER DESIGN: DEVICE AND LOGIC DESIGN APPROACHES

Christoph Heer¹ and Ulf Schlichtmann²
¹ Infineon Technologies AG; ² Technische Universität München

Abstract: Power consumption is increasingly becoming the bottleneck in the design of ICs in advanced process technologies. We give a brief introduction to the major causes of power consumption. Then we report on experiments in an advanced process technology with ultra-low threshold voltage (Vth) devices. It turns out that, in contrast to older process technologies, this approach is becoming less suitable for industrial usage in advanced process technologies. Next, we describe methodologies to reduce power consumption by optimizations in logic design, specifically by utilizing multiple levels of supply voltage Vdd and threshold voltage Vth. We evaluate them from an industrial product development perspective. We also give a brief outlook on proposals at other levels in the design flow and on future work.

Keywords: Low-power design, dynamic power reduction, leakage power reduction, ultra-low-Vth devices, multi-Vdd, multi-Vth, CVS

1.1 INTRODUCTION

The progress of silicon process technology marches on relentlessly. As predicted by Gordon Moore decades ago, silicon process technology continues to achieve improvements at an astonishing pace [1]. The number of transistors that can be integrated on a single IC approximately doubles every 2 years [2,3]. This engineering success has created innovative new industries (e.g. personal computers and peripherals, consumer electronics) and revolutionized other industries (e.g. communications).

Today, however, it is becoming increasingly difficult to achieve improvements at the pace that the industry has become accustomed to. More and more technical challenges appear that require increasing resources to be
  18. 18. solved [4]. One such problem is the increasing power consumption of integrated circuits. It becomes even more critical as an increasing number of today’s high-volume consumer products are battery-powered.

In the following, we will consider the sources of power consumption and their development over time. We will show why reduction of power consumption increasingly is becoming critical to product success and will review traditional approaches in Sections 1.1 and 1.2. In Section 1.3 we will then analyze a potential solution based on introduction of an optimized transistor with a very low threshold voltage Vth. Thereafter, we will present and discuss logic-level design optimizations for power reduction in Section 1.4. Also, we will briefly point out potential optimizations on higher levels.

Our observations are made from the perspective of industrial IC product development where technical optimizations must be carefully evaluated against the cost associated with achieving and implementing them. Mostly, the presented methodologies are already being utilized in leading-edge industrial ICs.

1.2 POWER CONSUMPTION BECOMES CRITICAL

Depending on the type of end-product and its application, different aspects of power consumption are the primary concern: dynamic power or leakage power.

Reduction of dynamic power consumption is a concern for almost all IC products today. For battery-powered products, reduced power consumption directly results in longer operating time for the product, which is a very desirable characteristic. Even for non-battery-powered products, reduced power consumption brings many advantages, such as reduced cost because of cheaper packaging or higher performance because of lower temperatures. Finally, reduced power consumption often leads to lower system cost (no fans required; no or cheaper air conditioning for data / telecom center etc.).
Dynamic power consumption is caused by the charging and discharging of capacitances when a circuit switches. In addition, during switching a short-circuit current flows, but this current is typically much smaller, and will therefore be neglected in the following. The dynamic power due to capacitance charging and discharging is determined by the following well-known relationship:

$P_{dyn} \sim f \cdot C_L \cdot V_{dd}^2$
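As a rough illustration of this proportionality, the sketch below evaluates the switching power for one operating point and shows the effect of halving the supply voltage. The function name, the activity factor and all numeric values are assumptions chosen for illustration; they do not come from the text.

```python
def dynamic_power(f_hz, c_load_f, vdd_v, activity=1.0):
    """Switching power in watts, P = activity * f * C_L * Vdd^2.

    The activity factor is an added assumption (fraction of cycles in
    which the capacitance actually switches); the relation in the text
    is a proportionality and does not include it.
    """
    return activity * f_hz * c_load_f * vdd_v ** 2


if __name__ == "__main__":
    # Hypothetical operating point: 100 MHz, 10 pF switched, 1.2 V supply.
    base = dynamic_power(100e6, 10e-12, 1.2)
    halved_vdd = dynamic_power(100e6, 10e-12, 0.6)
    print(f"P at 1.2 V: {base:.3e} W")
    print(f"P at 0.6 V relative to 1.2 V: {halved_vdd / base:.2f}")
```

Because the supply voltage enters quadratically, halving Vdd at fixed f and CL cuts the switching power to one quarter, whereas halving f or CL only halves it.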
  19. 19. Based on constant electrical field scaling, Vdd and CL each are reduced by 30% in each successive process generation. Also, delay decreases by 30%, resulting in 43% increase in frequency. Therefore, the dynamic power consumption per device is reduced by 50% from one process generation to the next. As scaling also doubles the number of devices that can be implemented in a given die area, dynamic power consumption per area should stay roughly identical. However, historically frequency has increased by significantly more than 43% from one process generation to the next (e.g. in microprocessors, it has roughly doubled, due to architectural optimizations, such as deeper pipeline stages), and in addition, die sizes have increased with each new process technology, further increasing the power consumption, due to an increased number of active devices [5]. For these reasons, dynamic power consumption has increased exponentially, as is shown in Figure 1-1 for the example of microprocessors.

Reduction of leakage power consumption today is primarily a concern for products that are powered by battery and spend most of their operating hours in some type of standby mode, such as cell phones.

For many process generations, however, leakage has increased roughly by a factor of 10 for every two process nodes [6]. Due to this dramatic increase with newer process generations, leakage is becoming a significant contribution to overall IC power consumption even in normal operating mode, as can be seen in Figure 1-1 as well. Leakage was estimated to increase from 0.01% of overall power consumption in a 1.0µm technology, to 10% in a 0.1µm technology [6]. For a microprocessor, Intel estimated leakage power consumption at more than 50W for a 100nm technology node [3]. This figure probably is extreme, and leakage depends strongly on a number of factors, such as threshold voltage (Vth) of the transistor, gate oxide thickness and environmental operating conditions (supply voltage Vdd, temperature T).
Nevertheless, for an increasing number of products leakage power consumption is turning into a problem, even when they are not battery-powered.
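The constant-field scaling arithmetic quoted earlier in this section can be checked with a few lines of code. This is a minimal sketch: the function name is hypothetical, the scale factor 0.7 stands for the 30% reduction per generation, and the device density gain is assumed to be 1/0.7² (roughly a doubling).

```python
def scale_one_generation(s=0.7, freq_scale=None):
    """Relative changes after one process generation.

    s          : assumed scale factor for Vdd, C_L and delay (0.7 = 30% less)
    freq_scale : frequency multiplier; defaults to 1/s (delay shrinks by s)
    Returns (frequency, dynamic power per device, dynamic power per area),
    each relative to the previous generation.
    """
    if freq_scale is None:
        freq_scale = 1.0 / s
    power_per_device = freq_scale * s * s ** 2   # f * C_L * Vdd^2
    devices_per_area = 1.0 / s ** 2              # assumed: area per device shrinks by s^2
    return freq_scale, power_per_device, power_per_device * devices_per_area


if __name__ == "__main__":
    ideal = scale_one_generation()
    doubled = scale_one_generation(freq_scale=2.0)
    print("ideal scaling (f, P/device, P/area):", ideal)
    print("frequency doubled (f, P/device, P/area):", doubled)
```

Under these assumptions the ideal case gives about a 43% frequency increase, roughly half the power per device and constant power per area. Doubling the frequency instead raises power per area by about 40% per generation, before any increase in die size is taken into account.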
  20. 20. Figure 1-1. Development of dynamic and leakage power consumption over time [3,7]

1.3 TRADITIONAL APPROACHES TO POWER REDUCTION

As outlined above, dynamic power consumption is governed by:

$P_{dyn} \sim f \cdot C_L \cdot V_{dd}^2$

with f denoting the switching frequency, CL the capacitance being switched, and Vdd the supply voltage. This formula immediately identifies the key levers to reduce dynamic power:

  • Reduce operating frequency
  • Reduce driven capacitance
  • Reduce supply voltage

Traditionally, reduction in supply voltage Vdd has been the most often followed strategy to reduce power consumption. Unfortunately, lowering Vdd has the side effect of reducing performance as well, primarily because gate
  21. 21. overdrive (the difference between Vdd and Vth) diminishes if the threshold voltage Vth is kept constant. Based on the alpha power law model [8], the delay td of an inverter is given by

$t_d = \frac{C_L \cdot V_{dd}}{(V_{dd} - V_{th})^{\alpha}}$

with α denoting a fitting constant. As supply voltages are driven below 1.0V, the reductions in gate overdrive are more pronounced than previously. In addition, newer process technologies give significantly less of a performance boost compared to the previous process generation than has traditionally been the case, therefore a further reduction in performance is highly undesirable. Finally, the power reduction achieved by moving to a new process generation has trended down over time, since supply voltages have been scaled by increasingly less than the 30% prescribed by the constant electrical field scaling paradigm.

Consequently, more advanced approaches are required.

In the following, our main focus will be on dynamic power consumption, but we will also consider leakage power consumption.

1.4 ZERO-VTH DEVICES

The concept of zero-Vth devices was developed in the mid-1990s. It overcomes the diminishing gate overdrive by radically setting the threshold voltage of the active devices to zero. It has been shown [9] that the optimum power dissipation is obtained if Pleak (leakage contribution) is of the same order of magnitude as Pdyn (dynamic switching contribution). This can be achieved for transistors with Vth close to 0V (‘zero-Vth transistor’). Therefore the devices will never completely switch off. But from an overall power perspective the gain in active power consumption is tremendous.

Using these transistors the supply voltage of 130nm circuits can be reduced to values below 0.3V to achieve a Pdyn reduction by 90% without performance degradation. Alternatively, the circuit can be operated at twice the clock frequency when keeping the supply voltage at 1.2V, as shown in Figure 1-2.
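The trade between threshold voltage and supply voltage at constant delay can be sketched with the alpha power law model given above. The following is an illustration under stated assumptions, not the authors' simulation setup: α = 1.3 is an assumed fitting constant, CL is normalized to 1, and the function names and the reference point (Vdd = 1.2V, Vth = 0.3V) are hypothetical.

```python
def alpha_power_delay(vdd, vth, c_load=1.0, alpha=1.3):
    """Inverter delay from the alpha power law: C_L * Vdd / (Vdd - Vth)^alpha."""
    if vdd <= vth:
        raise ValueError("the model requires Vdd > Vth")
    return c_load * vdd / (vdd - vth) ** alpha


def vdd_for_same_delay(vdd_ref, vth_ref, vth_new, alpha=1.3, tol=1e-9):
    """Lowest Vdd at which a device with vth_new matches the reference delay.

    Uses bisection; for alpha >= 1 the model delay falls monotonically as
    Vdd rises, so there is a single crossing between vth_new and vdd_ref.
    """
    if vth_new > vth_ref:
        raise ValueError("expected vth_new <= vth_ref")
    target = alpha_power_delay(vdd_ref, vth_ref, alpha=alpha)
    lo, hi = vth_new + 1e-6, vdd_ref
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if alpha_power_delay(mid, vth_new, alpha=alpha) > target:
            lo = mid   # too slow: raise the supply
        else:
            hi = mid   # fast enough: try a lower supply
    return hi


if __name__ == "__main__":
    for vth in (0.3, 0.15, 0.05):
        vdd = vdd_for_same_delay(1.2, 0.3, vth)
        print(f"Vth = {vth:.2f} V -> Vdd = {vdd:.3f} V, "
              f"relative Pdyn = {(vdd / 1.2) ** 2:.2f}")
```

The model is first order only, so the printed voltages should not be expected to reproduce the simulated curves of Figure 1-2; the sketch only shows the direction and rough size of the effect.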
The corresponding Ion/Ioff ratio for the zero-Vth transistor is about 10 to 100 instead of >10^5 for the standard transistor options. During standby, the complete circuits are switched off or are set into a low leakage mode to cope with the very high leakage contribution. The low leakage mode is achieved by ‘active well’ control, which denotes the use of the body effect. The well potentials of the PFETs and NFETs are altered to change Vth. To achieve a lower leakage current, the absolute value of Vth is increased by
  22. 22. reverse back biasing: a negative well-to-source voltage Usb is used. Therefore voltages below Vss for NFETs and above Vdd for PFETs have to be generated. Furthermore, active well is required to compensate the lot-to-lot or wafer-to-wafer variations of Vth.

The initial ‘zero-Vth’ concept assumed constant junction temperatures Tj below 40°C. For some high-end computer equipment the costs for active chip cooling are affordable to achieve this junction temperature. But this is definitely not the case for cost-driven consumer products. For this application domain Tj in active mode ranges between 85°C and 125°C, and in some applications the specified worst-case ambient temperature is even 80°C. The proposed zero-Vth concept is therefore not applicable without changes and adaptations.

Figure 1-2. Simulated performance curves of transistors with ultra-low Vth. Compared to low-Vth, either a performance gain or a Vdd reduction can be achieved. Curves for reg-Vth and high-Vth transistors of a 130 nm technology are included.

A more conservative approach with respect to zero-Vth, but still aggressive compared to current devices, had to be chosen. An ultra-low Vth device with about 150mV threshold voltage proved to be the best
  23. 23. compromise between zero-Vth and current low-Vth of about 300mV within a 130 nm CMOS technology.

To identify the optimal choice of Vth and Vdd in combination with the higher junction temperature Tj, simulations with modified parameters of the 130nm low-Vth transistor are performed. In Figure 1-3 the power dissipation is shown for a high activity circuit (switching activity = 20%) with various options for the transistor threshold voltages: reg-Vth, low-Vth, and transistors whose Vth are reduced to 200mV, 150mV, 100mV and 50mV. The reg-Vth circuit performance was used as the reference (Vdd = 1.5V), and the supply voltages for the other transistor options were reduced to meet that reference performance.

[Figure 1-3 plot: Power [W], from 0 to 3.5E-05, versus Device Option / Vth (mV): reg-Vt, low-Vt, 200mV, 150mV, 100mV, 50mV; results for slow, nom (= target) and fast process conditions at T = 125°C; supply voltage labels 1.5V, 1.2V, 1.0V, 0.8V, 0.7V, 0.6V]

Figure 1-3. Power dissipation at T=125°C in active mode for several transistor options with reduced Vth. A minimum power consumption is achieved at 150mV Vth. (At T=55°C the minimum is achieved for the same option but process variations show less impact.)

The reduced supply voltage leads to lower overall active power consumption Pactive. A minimum power consumption is reached at Vth = 150mV. With even lower threshold voltages Pactive starts to increase again because of the increase of the leakage current. The steep rise of Pactive originates from the exponential relation between Vth and leakage current. As a rule of thumb a 100mV reduction of the threshold voltage allows for a Vdd
  24. 24. reduction by 0.15V but on the other hand results in a tenfold increase of the leakage current.

The impact of technology variations is also visible in Figure 1-3. Due to the high leakage contribution a power reduction of only 25% is achieved under fast process conditions. Using back biasing in reverse mode, the high performance of fast transistors can be reduced through increasing Vth. The corresponding leakage current therefore decreases and allows a power reduction by 50% (dashed arrow).

A process modification has been developed to manufacture devices with the threshold voltage of 150 mV, which proves to be the most efficient for the target application domain of mobile consumer products [10]. In Table 1-1 the key transistor parameters of our ultra-low-Vth FETs (ulv) and of the standard low-Vth transistor are listed. The Vth values are 165mV and 161mV for the ulv-NFET and ulv-PFET respectively, Ion increases by 35% and 22%, which translates into an average decrease of the CV/I-metric delay by 29%. Circuit simulations showed a performance increase of 25%. Concerning Vth, performance, and Ioff the target values have been nearly met.

Table 1-1. Extracted key parameters of the ulv-FETs in comparison with the target values and the low-Vth FETs

Parameter               130nm low-Vt     130nm ulv-FET    Target    Unit
                        NFET / PFET      NFET / PFET
Ion                     560 / 240        755 / 295                  µA/µm
Ioff                    1.2 / 1.2        48 / 17          35        nA/µm
Vth                     295 / 260        165 / 160        150       mV
Body effect             150 / 135        60 / 65          90        mV/V
Vth shift @ L -10nm     35 / 30          65 / 30                    mV
Vth shift @ L -15nm     65 / 70          100 / 90                   mV
Simulated gate delay    1                0.8              0.75      relative units

The sensitivity of Vth to gate length variation (roll-off) is expressed in Vth-shift per 10nm or 15nm gate length decrease. A comparison with low-Vth FETs shows a pronounced increase. Therefore, in addition to temperature compensation, back biasing also has to be used to compensate for this strong technology variation.
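The rule of thumb quoted above (each 100mV of Vth reduction permits 0.15V less Vdd but multiplies leakage current by ten) is enough to show why the active power has a minimum at some intermediate Vth. The sketch below is a toy model: the reference point (Vth = 300mV, Vdd = 1.2V), the assumed leakage share of 0.1% at that reference and all function names are illustrative assumptions, and the position of the minimum depends strongly on that assumed leakage share.

```python
def operating_point(vth_mv, vth_ref_mv=300.0, vdd_ref_v=1.2,
                    leak_fraction_ref=0.001):
    """Return (Vdd, relative Pdyn, relative Pleak) for a given Vth.

    Rule of thumb from the text: per 100 mV lower Vth, Vdd drops by
    0.15 V and leakage current rises tenfold. leak_fraction_ref is an
    assumed ratio of leakage to dynamic power at the reference point.
    """
    steps = (vth_ref_mv - vth_mv) / 100.0        # number of 100 mV reductions
    vdd = vdd_ref_v - 0.15 * steps
    p_dyn = (vdd / vdd_ref_v) ** 2               # Pdyn ~ Vdd^2 at fixed f and C_L
    i_leak = leak_fraction_ref * 10.0 ** steps   # tenfold per 100 mV
    p_leak = i_leak * (vdd / vdd_ref_v)          # Pleak ~ Ileak * Vdd
    return vdd, p_dyn, p_leak


def total_power(vth_mv, **kwargs):
    """Relative active power: dynamic plus leakage contribution."""
    _, p_dyn, p_leak = operating_point(vth_mv, **kwargs)
    return p_dyn + p_leak


if __name__ == "__main__":
    for vth in (300, 250, 200, 150, 100, 50):
        vdd, p_dyn, p_leak = operating_point(vth)
        print(f"Vth={vth:3d} mV  Vdd={vdd:.3f} V  "
              f"Pdyn={p_dyn:.3f}  Pleak={p_leak:.3f}  total={p_dyn + p_leak:.3f}")
```

The quadratic saving in dynamic power dominates at first and the exponential leakage growth takes over at low Vth, which is the qualitative shape described for Figure 1-3.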
The values of the body effect are also included in Table 1-1. The body effect is expressed in Vth-shift per 1V well bias. The ulv-FETs yield values which are lower by more than 50% compared to the low-Vth transistors. The decrease of body effect in combination with the increased roll-off reduces the leverage of back biasing for ulv-FETs very significantly. The leverage is not even sufficient to compensate the technology variation, since the value of the roll-off is higher than that of the body effect. As an example, the ulv-NFET shows roll-off values of 65mV/10nm and 100mV/15nm and a body effect of only 60mV/V.

To investigate the migration potential of the ulv-FETs for future technology generations, Ioff measurement results obtained from recent 90nm hardware were used. Based on this measurement data the leverage of active well with the standard reg-Vth and low leakage transistor options has been analyzed. For supply voltages of 1.2V and 0.75V a reverse back biasing voltage of 0.5V has been applied. For the NFET, the back biasing results in a leakage reduction by 50% to 70% for all transistor widths and for both values of Vdd. In the case of the PFET, the leakage reduction values are similar (60% to 80%) for transistors with W > 0.5µm. For very narrow PFETs with Vdd = 1.2V, the reduction is only 20% or even less. Since narrow FETs are used within SRAMs, which contribute a major part of the circuit's standby current, this small reduction for narrow transistors further reduces the leverage of active well significantly. The root cause is an additional leakage mechanism based on tunnelling currents across the drain-well junction, which limits the reverse back biasing to 0.5V. This tunnelling current depends exponentially on the drain-well voltage and works against any reduction of the sub-threshold current via active well. At Vdd = 0.75V the drain-well voltage is reduced and the tunnelling current is therefore lower.
In this case the effect of back biasing is not compensated by a rising tunnelling current and a leakage current reduction by 70% is still achieved.

For a 90nm technology the limit of 0.5V for the well potential swing limits the reduction of the leakage currents to a factor between 2 and 4. This is still a major contribution among all feasible measures to reduce standby power consumption, but the leverage becomes quite small compared to the reduction ratios of several orders of magnitude obtained in previous technologies [11,12]. In future technologies, Ileak will become more strongly affected by the emerging tunnelling current Igate through the gate of the FET. This is due to the ever decreasing gate oxide thickness and also due to the fact that even the on-state transistors show gate leakage. Igate is not affected by well biasing, reducing the leverage of active well even further.
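The earlier statement that back biasing cannot compensate the roll-off of the ulv-FETs can be checked with the NFET values from Table 1-1. The sketch below simply divides the Vth roll-off by the body effect to estimate the well bias that would be needed to cancel a given gate length deviation; the linear treatment of the body effect and the helper name `required_well_bias` are our simplifying assumptions.

```python
def required_well_bias(rolloff_mv, body_effect_mv_per_v):
    """Well bias [V] needed to cancel a Vth roll-off, assuming a linear body effect."""
    return rolloff_mv / body_effect_mv_per_v

# NFET values taken from Table 1-1 (roll-off in mV, body effect in mV/V).
devices = {
    "low-Vth NFET": {"rolloff_10nm": 35, "rolloff_15nm": 65, "body_effect": 150},
    "ulv NFET":     {"rolloff_10nm": 65, "rolloff_15nm": 100, "body_effect": 60},
}

bias = {}
for name, d in devices.items():
    bias[name] = (
        required_well_bias(d["rolloff_10nm"], d["body_effect"]),
        required_well_bias(d["rolloff_15nm"], d["body_effect"]),
    )
    print(f"{name}: {bias[name][0]:.2f} V for 10 nm, {bias[name][1]:.2f} V for 15 nm")
```

Under this simple model the ulv-NFET would need more than 1V of well bias to correct a 10nm gate length deviation, while the low-Vth NFET needs well under 0.5V, which is the asymmetry the text describes.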
In summary the zero-Vth devices have become very susceptible to process and temperature variations. Significant yield is only achievable with back biasing via active well control and with active cooling. The latter approach is not feasible for mobile applications. Therefore a more conservative approach with respect to zero-Vth, but still aggressive compared to current devices, had to be chosen. An ultra-low-Vth device with about 150mV threshold voltage proved to be the best compromise between zero-Vth and current low-Vth of about 300mV within a 130nm CMOS technology.

But even though fabrication of this ultra-low-Vth device is possible, it affects some standard methods to overcome short-channel effects. The so-called halo or pocket implantation had to be removed to bring the threshold voltage down. Unfortunately short-channel effects are now heavily increased, leading as shown to a very strong Vth roll-off at slight variations of the channel length. Finally this effect was prohibitive for the overall approach and led to cancellation of many zero-Vth projects in the industry [13].

1.5 DESIGN APPROACHES TO POWER REDUCTION

As outlined above, solutions from process technology by itself will not suffice to provide sufficient power reduction. Therefore, solutions must be found in algorithms, product architecture and logic design. Increasingly, differentiated device options provided by process technology are utilized on these levels in the search for optimization of power consumption.

For leading-edge products which need to optimize both power consumption and system performance, optimization techniques on architecture and design level have been proposed and partly already been implemented. While academic research often focuses on the tradeoff between power consumption and performance, industrial product development must also take other variables into consideration.
• Product cost: often, power optimization design techniques increase die area, directly affecting manufacturing cost. Also, utilization of additional devices (e.g. different Vth devices) increases mask count and consequently manufacturing cost, and additionally requires up-front expenditures for the development of such devices. Finally, increased manufacturing complexity poses the risk of lowered manufacturing yield.
• Product robustness: it must be ensured that optimized products still work across the specified range of operating conditions, also taking manufacturing variations into account.
1.5.1 Multi-Vdd Design

As outlined in the introduction, the supply voltage Vdd quadratically impacts dynamic switching power consumption. Thus, lowering Vdd is the preferred option to reduce dynamic power consumption. However, as discussed in Section 1.2, lowering Vdd reduces the system performance. Thus, the incentive to lower Vdd to reduce power consumption is kept in check by the need to maintain performance.

Reduction of Vdd can be applied on different abstraction levels of a design. Most effective regarding power reduction, and also easiest to implement, is to lower Vdd for an entire IC. As this will directly impact the performance of the IC design, this often is not an option. On a lower abstraction level, it is possible to lower Vdd for an entire module. This is still rather simple to implement, but if only those modules are chosen for which overall IC performance is not impacted, the achieved gains in power reduction will often be very moderate.

Finally, a reduction in supply voltage can be applied specifically to individual gates, such that the overall system performance is not reduced. This approach, as shown in Figure 1-4, recognizes that in a typical design, most logic paths are not critical. They can be slowed down, often significantly, without reducing the overall system performance. This slowing down is achieved by lowering the supply voltage Vdd for gates on the non-critical paths, which results in lowered power consumption.
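The quadratic dependence mentioned above can be made concrete with the usual first-order expression for dynamic switching power, P = a * C * Vdd^2 * f. In the short sketch below the activity, capacitance and frequency values are arbitrary placeholders of ours; only the ratio between the two supply voltages matters for the result.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order dynamic switching power: P = a * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz

# Placeholder values (assumptions): 20% activity, 1 nF switched capacitance, 100 MHz.
p_nominal = dynamic_power(0.2, 1e-9, 1.5, 100e6)
p_reduced = dynamic_power(0.2, 1e-9, 0.7 * 1.5, 100e6)  # same gates at 0.7 x Vdd

saving = 1 - p_reduced / p_nominal
print(f"dynamic power saving at 0.7 x Vdd: {saving:.0%}")
```

A gate moved to 0.7 times the nominal supply therefore dissipates roughly half the dynamic power, which is why even a partial assignment of gates to the lower supply is attractive.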
[Figure 1-4 schematic: two register-to-register circuits. Top: critical path 10ns, non-critical path 5ns ("Non-critical path may be delayed"). Bottom: the gates on the non-critical path are supplied by Vdd_low, so that its delay becomes 8ns while the critical path stays at 10ns ("Non-critical path runs with reduced supply voltage").]

Figure 1-4. Multi-Vdd design

This technique will modify the distribution of path delays in a design to a distribution skewed towards paths with higher delay, as indicated in Figure 1-5 [14].

[Figure 1-5 plot: distributions of path delay td for a single supply voltage (SSV) and for multiple supply voltages (MSV), with the critical paths at td = 1/f.]

Figure 1-5. Distribution of path delays under single and multiple supply voltages
A number of studies have shown significant variation in dynamic power reduction results from implementing a multi-Vdd design strategy, ranging from less than 10% up to almost 50%, with 40% being the average [15,16]. Rules of thumb for selecting appropriate supply voltage levels have been developed. When using two supply voltages, the lower Vdd was proposed to be 0.6x-0.7x of the higher Vdd [17]. The optimal supply voltage level also depends on Vth [18].

The benefit of using multiple supply voltages quickly saturates. The major gain is obtained by moving from a single Vdd to dual-Vdd. Extending this to ever more supply voltage levels yields only small incremental benefits [18,19], even when the overhead introduced by multiple supply voltages (see below) is not taken into consideration.

The power reduction achieved by this technique roughly depends on two parameters: the difference between the regular supply voltage Vdd and the lowered supply voltage Vdd_low, and the percentage of gates to which Vdd_low is applied.

Regarding the first parameter, it was pointed out some years ago that the leverage of this concept decreases as process technologies are scaled down further [18].

Recent work has analyzed this in more detail [14]. At least for high-Vth devices, which are essential for low standby power design due to their lower leakage current, Vth has scaled much more slowly than Vdd recently. Therefore, gate overdrive (Vdd - Vth) is diminished, negatively impacting performance. Thus, even a small reduction in Vdd will have a very significant impact on performance. Therefore, the potential to lower Vdd while maintaining overall system performance is greatly reduced. It is shown that from 0.25µm down to 0.09µm, the effectiveness of dual-Vdd decreases by a factor of 2 (from 60% dynamic power reduction to 30%) for high-Vth designs, whereas it stays about constant for low-Vth designs. This can however be countered by introduction of variable threshold voltages, as will be seen later.
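The effect of a shrinking gate overdrive can be illustrated with the alpha-power-law delay model of reference [8], in which gate delay is proportional to Vdd / (Vdd - Vth)^alpha. This is a hedged sketch rather than the analysis of [14]: the exponent alpha = 1.3 and the supply and threshold voltages are assumed values chosen only to contrast a high-Vth and a low-Vth device.

```python
def gate_delay(vdd, vth, alpha=1.3):
    """Relative gate delay from the alpha-power law: t_d ~ Vdd / (Vdd - Vth)**alpha."""
    return vdd / (vdd - vth) ** alpha

def slowdown(vdd, vdd_low, vth, alpha=1.3):
    """Factor by which a gate slows down when moved from vdd to vdd_low."""
    return gate_delay(vdd_low, vth, alpha) / gate_delay(vdd, vth, alpha)

VDD = 1.2             # assumed nominal supply [V]
VDD_LOW = 0.7 * VDD   # second supply at 0.7x, within the 0.6x-0.7x rule of thumb

slow_high_vth = slowdown(VDD, VDD_LOW, vth=0.45)  # assumed high-Vth device
slow_low_vth = slowdown(VDD, VDD_LOW, vth=0.25)   # assumed low-Vth device

print(f"high-Vth gate slows down by {slow_high_vth:.2f}x")
print(f"low-Vth gate slows down by  {slow_low_vth:.2f}x")
```

With these assumptions the same supply reduction costs the high-Vth gate roughly 1.6x in delay against roughly 1.3x for the low-Vth gate, so fewer high-Vth gates have enough slack to accept Vdd_low.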
Regarding the second parameter, experience has shown that especially in designs using the multi-Vth technique outlined below, path delays tend to be skewed to higher delays already, thus reducing the number of gates that can be slowed down further [14].

For the selection of those gates which will receive the lower supply voltage Vdd_low, a number of techniques have been proposed. Most prevalent is the concept of clustered voltage scaling (CVS). It recognizes that it is desirable to have clusters of gates assigned to the same voltage, since between the output of a gate supplied by Vdd_low and the input of a gate supplied by Vdd a level shifter is required to avoid static current flow [20].

This concept has been enhanced by extended clustered voltage scaling (ECVS) [17] which essentially allows an arbitrary assignment of supply
voltage levels to gates. This strategy implies more frequent insertion of level shifters into the design. However, usually only power consumption and delay are considered in the literature. The additional area cost is neglected. In industry, this certainly is not feasible.

While conceptually simple, the implementation of a multi-Vdd concept poses a number of challenges.
• The additional supply voltage Vdd_low needs to be created on-chip by a dc-to-dc converter, unless the voltage already exists externally. This results in area overhead, and in power consumption for the converter.
• The additional supply voltage Vdd_low must be distributed across the chip.
• Level shifters are required between different supply domains. It is feasible to integrate level shifters into flip-flops [21].

The penalties in area, power consumption and delay resulting from these effects are not always taken into account by work published in the literature. Studies indicate that a 10% area overhead will result from implementing a dual-Vdd design [22].

An additional consideration for industrial IC product development is that EDA tool support for implementing a dual-Vdd design is still only rudimentary. It is not sufficient to have a single point tool which can perform power-performance tradeoffs. Instead, this methodology needs to encompass the entire design flow (e.g. power distribution in layout, automated insertion of level shifters etc.).

1.5.2 Multi-Vth Design

Another essential technique is the use of different transistor threshold voltages (multi-Vth design). Primarily this technique reduces leakage power consumption, thus increasing standby time of battery-powered ICs. As leakage power consumption becomes an increasingly important component of overall power consumption in modern process technologies, this technique increasingly also helps to reduce overall power consumption significantly, as design moves to more advanced process technologies.
The idea is similar to multi-Vdd design: paths that do not need highest performance are implemented with special leakage-reduced transistors (typically higher Vth transistors, but also thicker gate-oxide Tox), as shown in Figure 1-6.
[Figure 1-6 schematic: two register-to-register circuits. Top: critical path 10ns, non-critical path 5ns ("Non-critical path may be delayed"). Bottom: the gates on the non-critical path use high-Vt transistors, so that its delay becomes 8ns while the critical path stays at 10ns ("Non-critical path runs with increased threshold voltage").]

Figure 1-6. Multi-Vth design

A typical industrial approach today is to first create a design using lower Vth transistors to achieve the required performance and then to selectively replace gates off the critical path with higher Vth (or thicker Tox) transistors to reduce leakage.

Studies in the literature have reported reductions in leakage of around 50% up to 80%. Some approaches assume that different Vth levels are provided by the process technology (through doping variations) and propose algorithms to optimally assign Vth levels to transistors, ensuring that performance is not compromised [23, 24]. Recently, it has also been proposed to achieve modifications in Vth by modifying transistor length or gate oxide thickness Tox [25].

Design-tool support for this technique is also rudimentary at best. While it is becoming established to design different modules of an IC with different Vth transistors, it is very challenging to do this on the level of individual transistors within a module. The primary reason is that the entire design flow must be able to handle cells with identical functionality and size, which differ in their electrical properties. This poses no principal algorithmic problems, but must be consistently implemented in all EDA tools within a design flow.
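The two-step industrial approach described above (start with low-Vth gates, then swap gates off the critical path) can be sketched as a simple greedy loop. This is only an illustration under strong assumptions: paths are enumerated explicitly instead of using static timing analysis, and the gate names, delays and leakage figures are hypothetical values chosen to mirror the 10ns / 5ns / 8ns example of Figure 1-6. It is not one of the algorithms of [23, 24].

```python
def assign_high_vth(gates, paths, period):
    """Greedily swap gates to high Vth while every path still meets the clock period.

    gates: name -> (delay_low_vth, delay_high_vth, leak_low_vth, leak_high_vth)
    paths: list of gate-name lists, one per register-to-register path
    """
    high = set()

    def path_delay(path):
        return sum(gates[g][1] if g in high else gates[g][0] for g in path)

    # Try the gates with the largest leakage saving first.
    by_saving = sorted(gates, key=lambda g: gates[g][2] - gates[g][3], reverse=True)
    for g in by_saving:
        high.add(g)
        if any(path_delay(p) > period for p in paths if g in p):
            high.remove(g)  # swap would violate timing, keep the low-Vth version
    return high

# Hypothetical example: delays in ns, leakage in arbitrary units.
gates = {
    "a": (5.0, 8.0, 10.0, 1.0),
    "b": (5.0, 8.0, 10.0, 1.0),
    "c": (2.5, 4.0, 10.0, 1.0),
    "d": (2.5, 4.0, 10.0, 1.0),
}
paths = [["a", "b"], ["c", "d"]]   # critical path (10 ns) and non-critical path (5 ns)

high_vth_gates = assign_high_vth(gates, paths, period=10.0)
leak_before = sum(g[2] for g in gates.values())
leak_after = sum(g[3] if n in high_vth_gates else g[2] for n, g in gates.items())
print(sorted(high_vth_gates), leak_before, leak_after)
```

In this toy case only the two gates on the non-critical path are swapped, which lengthens that path from 5ns to 8ns and roughly halves the total leakage; a production flow would additionally have to handle shared gates, slack redistribution and library constraints.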
1.5.3 Hybrid Approaches

Recently, approaches have been suggested in the literature which combine implementation of multiple supply voltages and multiple threshold voltages for further power reduction. Especially for designs where minimization of total power consumption is key (as compared to e.g. minimization of standby power for mobile products), it is possible to trade off leakage and dynamic power, as originally proposed in the zero-Vth concept. Studies in the literature indicate a total power optimum when leakage power contributes 10% to 30% [26,12]. This ratio depends significantly on the process technology, operating environment, and clock frequency of a design. For applications where leakage power minimization is critical (e.g. mobile products), this approach usually is not feasible, as it requires a relatively low Vth which causes high leakage currents [14]. With the increasing significance of gate leakage currents, variations of gate oxide thickness Tox have also been proposed.

An overall framework for using two supply voltages and two threshold voltages as well has been presented [19]. Theoretically, it is shown that more than 60% of total power consumption can be saved this way (not considering required overhead such as level shifters, routing etc.). Rules of thumb are proposed and it is shown that the optimal second Vdd is about 50% of the original Vdd in this case. It is also argued that the usefulness of multi-Vdd strategies is not diminished, but actually increased in more advanced technologies, if a multi-Vth strategy is also followed, since this strategy allows leakage to be traded off against dynamic power consumption by changing Vth and Vdd to optimize power consumption, while maintaining a required timing performance.

This approach has been applied to the practical example of an ARM processor in [27]. Due to specific layout considerations it was not possible to implement all four intended combinations of Vdd and Vth.
Instead, three different libraries were implemented. Using a CVS algorithm, a reduction in dynamic power by 15% was achieved for a 0.18µm process technology. Leakage power was reduced by 40%. As leakage power was more than 1000x smaller than dynamic power, overall active power reduction was 15%. To achieve this, a 14% increase in area was required.

A very recent approach also considers transistor width sizing in addition to Vdd and Vth assignment [28]. Using a two-stage, sensitivity-based approach, total power savings of 37% on average over a suite of benchmark circuits are reported. In this study, the threshold voltage is chosen rather low, so that leakage represents 20-50% of total power consumption. Therefore, optimization of both leakage and dynamic power consumption is essential, which is achieved with the presented approach.
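The ARM figures quoted above show how little a leakage reduction contributes to active power when leakage is a tiny share of the total. The short calculation below takes the stated bound literally (dynamic power exactly 1000 times the leakage power, in relative units), which is an assumption on our part since the text only says "more than 1000x".

```python
# Relative units; the 1000:1 ratio is the bound quoted in the text, used here as an assumption.
dynamic_before = 1000.0
leakage_before = 1.0

dynamic_after = dynamic_before * (1 - 0.15)   # 15% dynamic power reduction
leakage_after = leakage_before * (1 - 0.40)   # 40% leakage power reduction

total_before = dynamic_before + leakage_before
total_after = dynamic_after + leakage_after
overall_reduction = 1 - total_after / total_before

print(f"overall active power reduction: {overall_reduction:.2%}")
```

The 40% leakage saving changes the overall result only in the second decimal place, so the overall active power reduction stays at about 15%, as reported.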
An enhanced approach for leakage power consumption considers multiple gate oxide thicknesses Tox in addition to multi-Vth [29]. It is motivated by the fact that gate leakage increases very dramatically with newer process technologies. Gate leakage is of the same order of magnitude as subthreshold leakage at the 90nm process node. Their relationship also depends significantly on the operating temperature T. The key observation that an OFF transistor suffers from subthreshold leakage, an ON transistor from gate leakage, motivates the approach to analyze transistor states in standby mode and assign Vth and Tox such that leakage power consumption is minimized. Leakage reductions of 5-6x are obtained on benchmark circuits, compared to designs using a single Vth and Tox.

Previous approaches that included Tox into the optimization varied Tox only for different design modules, not on critical paths within modules.

These newer approaches promise further reductions in power consumption. This will come, however, at a price (as seen e.g. in the ARM example). Design complexity increases significantly when variations in many parameters are made available at the same time. In some studies, the resulting overhead is not considered.

1.5.4 Cost Tradeoffs

This overhead must be considered, however, since it is quite significant:
• Multi-Vdd: level shifters (area, power consumption, delay), routing of additional supply voltages (area).
• Multi-Vth: additional masks (manufacturing costs); potentially special design rules at the boundary between different Vth devices (area).
• Multi-Tox: additional masks (manufacturing costs).
• In addition, IC development costs increase due to more complex design flows. Also, special process options (Vth, Tox) must be developed, qualified and continuously monitored. For each such option, the design library must be electrically characterized, modelled for all EDA tools, and potentially optimized regarding circuit design and layout.
It must be maintained and regularly updated (changes in electrical parameters, changes in tools in the design flow) over a long period of time as well. If a very specialized manufacturing flow is developed to fully optimize a given product, it will be very difficult to shift manufacturing of this product to a different fab (e.g. a foundry in case additional capacity is required).

For these and potentially other reasons, we are not yet aware of industrial products that have implemented such proposals in a fine-grained manner (i.e. different Vth, Vdd and Tox combined within one design module).
Some approaches in the literature also determine optimum levels of threshold voltages depending on a given design. In industry, this is rarely feasible. Typically, a manufacturing process has to be taken as given, with only predefined values of Vth (and Tox) being available.

1.6 APPROACHES ON HIGHER ABSTRACTION LEVELS

The approaches outlined above on gate level and device level can be (and often must be) supported by measures on higher levels of abstraction. Some of the most promising concepts are as follows:
• partitioning the system such that large areas can be powered off for significant periods of time (block turnoff)
• especially partitioning memory systems such that large parts can be turned off in standby mode
• clock gating is an essential method which reduces dynamic power consumption by local off-switching of non-active gates
• coding strategies (e.g. for buses) can reduce switching and thus dynamic power consumption

1.7 CONCLUSION AND FUTURE CHALLENGES

There is no single “silver bullet” to solve the challenge of power reduction. While ultra-low voltage logic based on special ultra-low-Vth devices is conceptually very convincing, its widespread implementation is hindered by manufacturing concerns. An extrapolation of current technology trends indicates that such a concept will become even more difficult in the future.

Today, design techniques are the most promising approach to reduce power, both dynamic and leakage. The concepts outlined here can be further extended. It is feasible to dynamically adjust supply and threshold voltages. These are theoretically promising concepts which however still require more investigation, especially with regard to feasibility under industrial boundary conditions.

Quite likely, in the future even more emphasis than today will have to be placed on power reduction schemes on algorithmic and system level. On these levels, the levers to reduce power consumption are largest.
Acknowledgement

The authors wish to acknowledge and thank Jörg Berthold and Tim Schönauer for their contributions and fruitful discussions.
References

[1] G. Moore, Cramming More Components onto Integrated Circuits, Electronics Magazine, Vol. 38, No. 8, 1965, pp. 114-117.
[2] ITRS, International Technology Roadmap for Semiconductors, 2003, http://public.itrs.net.
[3] F. Pollack, New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies, Micro32 Keynote, 1999.
[4] U. Schlichtmann, Systems are Made from Transistors: UDSM Technology Creates New Challenges for Library and IC Development, IEEE Euromicro Symposium on Digital System Design, 2002, pp. 1-2.
[5] S. Borkar, Design Challenges of Technology Scaling, IEEE Micro, July/August 1999, pp. 23-29.
[6] S. Thompson, P. Packan, and M. Bohr, MOS Scaling: Transistor Challenges for the 21st Century, Intel Technology Journal, Q3 1998.
[7] N. Kim et al., Leakage Current: Moore's Law Meets Static Power, IEEE Computer, Vol. 36, No. 12, December 2003, pp. 68-75.
[8] T. Sakurai, A. R. Newton, Alpha-Power Law MOSFET Model and its Application to CMOS Inverter Delay and Other Formulas, IEEE Journal of Solid-State Circuits, Vol. 25, No. 2, 1990, pp. 584-594.
[9] J.B. Burr, J. Schott, A 200 mV self-testing encoder/decoder using Stanford ultra-low-power CMOS, 1994 IEEE International Solid-State Circuits Conference.
[10] J. Berthold, R. Nadal, C. Heer, Optionen für Low-Power-Konzepte in den sub-180-nm-CMOS-Technologien (in German), U.R.S.I. Kleinheubacher Tagung 2002.
[11] V. Svilan, M. Matsui, J. B. Burr, Energy-Efficient 32 x 32-bit Multiplier in Tunable Near-Zero Threshold CMOS, ISLPED 2000, pp. 268-272.
[12] V. Svilan, J. B. Burr, L. Tyler, Effects of Elevated Temperature on Tunable Near-Zero Threshold CMOS, ISLPED 2001, pp. 255-258.
[13] C. Heer, Designing low-power circuits: an industrial point of view, PATMOS 2001.
[14] T. Schoenauer, J. Berthold, C. Heer, Reduced Leverage of Dual Supply Voltages in Ultra Deep Submicron Technologies, International Workshop on Power And Timing Modeling, Optimization and Simulation PATMOS 2003, pp. 41-50.
[15] K. Usami, M. Igarashi, Low-Power Design Methodology and Applications utilizing Dual Supply Voltages, Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp. 123-128.
[16] M. Donno, L. Macchiarulo, A. Macii, E. Macii, M. Poncino, Enhanced Clustered Voltage Scaling for Low Power, Proceedings of the 12th ACM Great Lakes Symposium on VLSI, 2002, pp. 18-23.
[17] K. Usami et al., Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor, IEEE Journal of Solid-State Circuits, Vol. 33, No. 3, March 1998, pp. 463-472.
[18] M. Hamada, Y. Ootaguro, T. Kuroda, Utilizing Surplus Timing for Power Reduction, Proceedings IEEE Custom Integrated Circuits Conference CICC, 2001, pp. 89-92.
[19] A. Srivastava, D. Sylvester, Minimizing Total Power by Simultaneous Vdd/Vth Assignment, Proceedings of the Asia and South Pacific Design Automation Conference 2003, pp. 400-403.
[20] K. Usami, M. Horowitz, Clustered Voltage Scaling Technique for Low-Power Design, Proceedings of the International Symposium on Low Power Design ISLPD, 1995, pp. 3-8.
[21] K. Usami et al., Design Methodology of Ultra Low-power MPEG4 Codec Core Exploiting Voltage Scaling Techniques, Proceedings of the 35th Design Automation Conference 1998, pp. 483-488.
[22] C. Yeh, Y.-S. Kang, Layout Techniques Supporting the Use of Dual Supply Voltages for Cell-Based Designs, Proceedings of the 36th Design Automation Conference 1999, pp. 62-67.
[23] Q. Wang, S. Vrudhula, Algorithms for Minimizing Standby Power in Deep Submicrometer, Dual-Vt CMOS Circuits, IEEE Transactions on CAD, Vol. 21, No. 3, March 2002, pp. 306-318.
[24] L. Wei, Z. Chen, K. Roy, M. Johnson, Y. Ye, V. De, Design and Optimization of Dual-Threshold Circuits for Low-Voltage Low-Power Applications, IEEE Transactions on Very Large Scale Integration (VLSI), Vol. 7, No. 1, March 1999, pp. 16-24.
[25] N. Sirisantana, K. Roy, Low-Power Design Using Multiple Channel Lengths and Oxide Thicknesses, IEEE Design & Test of Computers, January-February 2004, pp. 56-63.
[26] K. Nose, T. Sakurai, Optimization of VDD and VTH for Low-Power and High-Speed Applications, Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp. 469-474.
[27] R. Bai, S. Kulkarni, W. Kwong, A. Srivastava, D. Sylvester, D. Blaauw, An Implementation of a 32-bit ARM Processor Using Dual Power Supplies and Dual Threshold Voltages, IEEE International Symposium on VLSI, 2003, pp. 149-154.
[28] A. Srivastava, D. Sylvester, D. Blaauw, Concurrent Sizing, Vdd and Vth Assignment for Low-Power Design, Proceedings of the Design, Automation and Test in Europe Conference DATE, 2003, pp. 718-719.
[29] D. Lee, H. Deogun, D. Blaauw, D. Sylvester, Simultaneous State, Vt and Tox Assignment for Total Standby Power Minimization, Proceedings of the Design, Automation and Test in Europe Conference DATE, 2003, pp. 494-499.
Chapter 2

ON-CHIP OPTICAL INTERCONNECT FOR LOW-POWER

Ian O'Connor and Frédéric Gaffiot
Ecole Centrale de Lyon

Abstract: It is an accepted fact that process scaling and operating frequency both contribute to increasing integrated circuit power dissipation due to interconnect. Extrapolating this trend leads to a red brick wall which only radically different interconnect architectures and/or technologies will be able to overcome. The aim of this chapter is to explain how, by exploiting recent advances in integrated optical devices, optical interconnect within systems on chip can be realised. We describe our vision for heterogeneous integration of a photonic "above-IC" communication layer. Two applications are detailed: clock distribution and data communication using wavelength division multiplexing. For the first application, a design method will be described, enabling quantitative comparisons with electrical clock trees. For the second, more long-term, application, our views will be given on the use of various photonic devices to realize a network on chip that is reconfigurable in terms of the wavelength used.

Keywords: Interconnect technology, optical interconnect, optical network on chip

2.1 INTRODUCTION

In the 2003 edition of the ITRS roadmap [17], the interconnect problem was summarised thus: “For the long term, material innovation with traditional scaling will no longer satisfy performance requirements. Interconnect innovation with optical, RF, or vertical integration ... will deliver the solution”. Continually shrinking feature sizes, higher clock frequencies, and growth in complexity are all negative factors as far as switching charges on metallic interconnect are concerned. Even with low resistance metals such as copper and low dielectric constant materials, bandwidths for long interconnect will be insufficient for future operating frequencies.
Already the use of metal tracks to transport a signal over a chip has a high cost in terms of power: clock distribution for instance
requires a significant part (30-50%) of total chip power in high-performance microprocessors.

A promising approach to the interconnect problem is the use of an optical interconnect layer, which could enable an increase in the ratio between data rate and power dissipation. At the same time it would enable synchronous operation within the circuit and with other circuits, relax constraints on thermal dissipation and sensitivity, signal interference and distortion, and also free up routing resources for complex systems. However, this comes at a price. Firstly, high-speed and low-power interface circuits are required, the design of which is not easy and has a direct influence on the overall performance of optical interconnect. Another important constraint is the fact that all fabrication steps have to be compatible with future IC technology and also that the additional cost incurred remains affordable. Additionally, predictive design technology is required to quantify the performance gain of optical interconnect solutions, where information is scant and disparate concerning not only the optical technology, but also the CMOS technologies for which optics could be used (post-45nm node).

In section 2.2, we will describe the "above-IC" optical technology. Sections 2.3 and 2.4 describe an optical clock distribution network and a quantitative electrical-optical power comparison respectively. A proposal for a novel optical network on chip is discussed in a later section.

2.2 OPTICAL INTERCONNECT TECHNOLOGY

Various technological solutions may be proposed for integrating an optical transport layer in a standard CMOS system. In our opinion, the most promising approach makes use of hybrid (3D) integration of the optical layer above a complete CMOS IC, as shown in fig. 2.1. The basic CMOS process remains the same, since the optical layer can be fabricated independently.
The weakness of this approach is in the complex electrical link between the CMOS interface circuits and the optical sources (via stack and advanced bonding).

In the system shown in fig. 2.1, a CMOS source driver circuit modulates the current flowing through a biased III-V microsource through a via stack making the electrical connection between the CMOS devices and the optical layer. III-V active devices are chosen in preference to Si-based optical devices for high-speed and high-wavelength operation. The microsource is coupled to the passive waveguide structure, where silicon is used as the core and SiO2 as the cladding material. Si/SiO2 structures are compatible with conventional silicon technology and silicon is an excellent material for transmitting wavelengths above 1.2µm (mono-mode waveguiding with attenuation as low as 0.8dB/cm has been demonstrated [10]). The waveguide structure transports the optical signal to a III-V photodetector (or possibly to several, as in the case of
a broadcast function) where it is converted to an electrical photocurrent, which flows through another via stack to a CMOS receiver circuit which regenerates the digital output signal. This signal can then if necessary be distributed over a small zone by a local electrical interconnect network.

[Figure 2.1 cross-section: III-V laser source and III-V photodetector with electrical contacts, Si photonic waveguide (n=3.5) in SiO2 waveguide cladding (n=1.5), placed above the metallic interconnect structure of the CMOS IC, which contains the driver circuit and the receiver circuit.]

Figure 2.1. Cross-section of hybridised interconnection structure

2.3 AN OPTICAL CLOCK DISTRIBUTION NETWORK

In this section we present the structure of the optical clock distribution network, and detail the characteristics of each component part in the system: active optoelectronic devices (external VCSEL source and PIN detector), passive waveguides, interface (driver and receiver) circuits. The latter represent extremely critical parts to the operation of the overall link and require particularly careful design.

An optical clock distribution network, shown in fig. 2.2, requires a single photonic source coupled to a symmetrical waveguide structure routing to a number of optical receivers. At the receivers the high-speed optical signal is converted to an electrical one and provided to local electrical networks. Hence the primary tree is optical, while the secondary tree is electrical. It is not feasible to route the optical signal all the way down to the individual gate level since each drop point requires a receiver circuit which consumes area and power. The clock signal is thus routed optically to a number of drop points which will cover a zone over which the last part of the clock distribution will be carried out
by the electrical secondary clock tree. The size of the zones is determined by calculating the power required to continue in the optical domain and comparing it to the power required to distribute over the zone in the electrical domain. The number of clock distribution points (64 in the figure) is a particularly crucial parameter in the overall system.

The global optical H-tree was optimised to achieve minimal optical losses by designing the bend radii to be as large as possible. For 20mm die width and 64 output nodes in the H-tree at the 70nm technology node, the smallest radius of curvature (r3 = D/32 in fig. 2.2) is 625µm, which leads to negligible pure bending loss.

[Figure 2.2. Optical H-tree clock distribution network (OCDN) with 64 output nodes. r1-3 are the bend radii linked to the chip (die) width D: r1 = D/8, r2 = D/16, r3 = D/32. Labelled elements: optical source, optical waveguides, optical receivers, electrical clock trees. Loss legend: $L_{CV}$: source-waveguide coupling loss; $L_W$: waveguide transmission loss; $L_B$: bending loss; $L_Y$: Y-coupler loss; $L_{CR}$: waveguide-receiver coupling loss.]

2.3.1 VCSEL sources

VCSELs (Vertical Cavity Surface Emitting Lasers) are certainly the most mature emitters for on-chip or chip-to-chip interconnections. Commercial VCSELs, when forward biased at a voltage well above 1.5V, can emit optical power of the order of a few mW around 850nm, with an efficiency of some 40%. Threshold currents are typically in the mA range. However, fundamental requirements for integrated semiconductor lasers in optical interconnect applications are small size, low threshold lasing operation and single-mode operation (i.e. only one mode is allowed in the gain spectrum). Additionally, the fact that VCSELs emit light vertically makes coupling more difficult. It is clear that
significant effort is required from the research community if VCSELs are to compete seriously in the on-chip optical interconnect arena: wavelength and efficiency must increase, and threshold current must decrease, in the same device. Long-wavelength and low-threshold VCSELs are only just beginning to emerge (for example, a 1.5µm, 2.5Gb/s tuneable VCSEL [5], and an 850nm, 70µA threshold current, 2.6µm diameter CMOS-compatible VCSEL [11] have been reported). Ultimately, however, optical interconnect is more likely to make use of integrated microsources as described in section 2.5, as these devices are intrinsically better suited to this type of application.

2.3.2 PIN photodetectors

In order to optimise the frequency and power dissipation performance of the overall link, photodetectors must exhibit high quantum efficiency, large intrinsic bandwidth and small parasitic capacitance. The photodetector performance is measured by the bandwidth-efficiency product.

Conventional III-V PIN devices suffer from two main limitations. On the one hand, their relatively high capacitance per unit area leads to limitations in the design of the transconductance amplifier interface circuit. On the other hand, due to their vertical structure, there is a tradeoff between frequency performance and efficiency (the quantum efficiency increases and the bandwidth decreases with the thickness of the intrinsic absorption layer) [9].

Metal-semiconductor-metal (MSM) photodetectors offer an alternative to conventional PIN photodetectors. An MSM photodetector consists of interdigitated metal contacts on top of an absorption layer. Because of their lateral structure, MSM photodetectors have very high bandwidths due to their low capacitance and the possibility to reduce the carrier transit time. However, the responsivity is usually low compared to PIN photodetectors [4].
MSM photodiodes with bandwidth greater than 100GHz have been reported.

2.3.3 Waveguides

Optical waveguides are at the heart of the optical interconnect concept. In the Si/SiO2 approach, the high relative refractive index difference $\Delta = (n_1^2 - n_2^2)/(2n_1^2)$ between the core ($n_1 \approx 3.5$ for Si) and cladding ($n_2 \approx 1.5$ for SiO2) allows the realisation of a compact optical circuit with dimensions compatible with DSM technologies. For example, it is possible to realise monomode waveguides less than 1µm wide (waveguide width of 0.3µm for wavelengths of 1.55µm), with bend radii of the order of a few µm [15].

However, the performance of the complete optical system depends on the minimum optical power required by the receiver and on the efficiency of passive optical devices used in the system. The total loss in any optical link is the sum
of losses (in decibels) of all optical components:

$$L_{total} = L_{CV} + L_W + L_B + L_Y + L_{CR} \qquad (2.1)$$

where

$L_{CV}$ is the coupling loss between the photonic source and the optical waveguide. There are currently several methods to couple the beam emitted from the laser into the optical waveguide. In this analysis we assumed 50% coupling efficiency ($L_{CV} \approx 3$dB) from the source to a single-mode waveguide.

$L_W$ is the rectangular waveguide transmission loss per unit distance. Due to the small waveguide dimensions and the large index change at the core/cladding interface in the Si/SiO2 waveguide, side-wall scattering is the dominant source of loss (fig. 2.3a). For the waveguide fabricated by Lee [10] with roughness of 2nm, the calculated transmission loss is 1.3dB/cm.

$L_B$ is the bending loss, highly dependent on the refractive index difference ∆ between the core and cladding medium. In Si/SiO2 waveguides, ∆ is relatively high, and because of the resulting strong optical confinement, bend radii as small as a few µm may be realised. As can be seen from fig. 2.3b, the bending losses associated with a single-mode strip waveguide are negligible if the radius of curvature is larger than 3µm.

$L_Y$ is the Y-coupler loss, and depends on the reflection and scattering attenuation into the propagation path and surrounding medium. For high index difference waveguides the losses for the Y-branch are significantly smaller than for low ∆ structures, and the simulated losses are less than 0.2dB per split [14].

$L_{CR}$ is the coupling loss from the waveguide to the optical receiver. Using currently available materials and methods it is possible to achieve an almost 100% coupling efficiency from waveguide to optical receiver. In this analysis the coupling efficiency is assumed to be 87% ($L_{CR}$ = 0.6dB) [16].

2.3.4 Interface circuits

High-speed CMOS optoelectronic interface circuits are crucial building blocks of the optical interconnect approach.
The electrical power dissipation of the link is defined by these circuits, but it is the receiver circuit that poses the most serious design challenges. The power dissipated by the source driver is mainly determined by the source bias current and is therefore device-dependent. On the receiver side, however, most of the receiver power is due to the circuit, while only a small fraction is required for the photodetector device.
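As a rough illustration of the loss budget of equation (2.1), the following Python sketch adds up the individual terms using the figures quoted in section 2.3.3 (50% source coupling, 1.3dB/cm transmission loss, at most 0.2dB per Y-split, 87% receiver coupling). The function names, the 2cm source-to-receiver path length and the count of six Y-splits (one per level of a binary tree with 64 leaves) are assumptions made for this example and are not taken from the text. The 0.2dB per split is treated as excess loss only, so the division of optical power among the branches is not counted.

```python
import math


def efficiency_to_db(efficiency):
    """Convert a power coupling efficiency (0..1) into a loss in dB."""
    return -10.0 * math.log10(efficiency)


def total_loss_db(l_cv, l_w, l_b, l_y, l_cr):
    """Equation (2.1): total link loss as the sum of component losses in dB."""
    return l_cv + l_w + l_b + l_y + l_cr


def h_tree_bend_radii_um(die_width_mm):
    """Bend radii r1, r2, r3 in micrometres for die width D (fig. 2.2)."""
    d_um = die_width_mm * 1000.0
    return d_um / 8, d_um / 16, d_um / 32


# Values quoted in the text
L_CV = efficiency_to_db(0.50)      # 50% source-to-waveguide coupling
L_CR = efficiency_to_db(0.87)      # 87% waveguide-to-receiver coupling
LOSS_PER_CM_DB = 1.3               # sidewall roughness of 2nm
LOSS_PER_SPLIT_DB = 0.2            # upper bound per Y-split (excess loss)

# Assumptions for this sketch (not from the text)
PATH_LENGTH_CM = 2.0               # hypothetical source-to-receiver path
N_SPLITS = int(math.log2(64))      # 6 levels of Y-splits for 64 output nodes

L_W = LOSS_PER_CM_DB * PATH_LENGTH_CM
L_B = 0.0                          # negligible for radii well above 3 um
L_Y = LOSS_PER_SPLIT_DB * N_SPLITS

TOTAL = total_loss_db(L_CV, L_W, L_B, L_Y, L_CR)
R1, R2, R3 = h_tree_bend_radii_um(20.0)

print(f"L_CV = {L_CV:.2f} dB, L_W = {L_W:.2f} dB, L_Y = {L_Y:.2f} dB, "
      f"L_CR = {L_CR:.2f} dB, total = {TOTAL:.2f} dB")
print(f"smallest bend radius r3 = {R3:.0f} um")
```

Under these assumptions the transmission loss and the source coupling loss dominate the budget, while the bending term drops out because the smallest radius in the H-tree is far larger than the few µm at which bending loss becomes significant.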
[Figure 2.3a. Simulated transmission loss (dB/cm) for varying sidewall roughness (nm) in a 0.5µm × 0.2µm Si/SiO2 strip waveguide.]

[Figure 2.3b. Simulated pure bending loss (dB) for various bend radii (µm) in a 0.5µm × 0.2µm Si/SiO2 strip waveguide.]

2.3.4.1 Driver circuits. Source driver circuits generally use a current modulation scheme for high-speed operation. The source always has to be biased above its threshold current by a MOS current sink to eliminate turn-on delays, which is why low-threshold sources are so important (figures of the order of 40µA [7] have been reported). A switched current sink modulates the current flowing through the source, and consequently the output optical power injected into the waveguide. As with most current-mode circuits, high bandwidth can be achieved since the voltage over the source is held relatively constant and parasitic capacitances at this node have reduced influence on the speed.

2.3.4.2 Receiver circuits. A typical structure for a high-speed photoreceiver circuit consists of: a transimpedance amplifier (TIA) to convert the photocurrent of a few µA into a voltage of a few mV; a comparator to generate a rail-to-rail signal; and a data recovery circuit to eliminate jitter from the restored signal. Of these, the TIA is arguably the most critical component for high-speed performance, since it has to cope with a generally large photodiode capacitance situated at its input.

The basic transimpedance amplifier structure in a typical configuration is shown in fig. 2.4 [8]. The bandwidth/power ratio of this structure can be maximised by using small-signal analysis and mapping of the individual component values to a filter approximation of Butterworth type.
It is then possible to develop a synthesis procedure which, from desired transimpedance performance criteria (gain $Z_{g0}$, bandwidth and pole quality factor $Q$) and operating conditions (photodiode and load capacitances, $C_d$ and $C_l$ respectively), generates component values for the feedback resistance $R_f$ and the voltage amplifier (voltage gain $A_v$ and output resistance $R_o$). Circuits with high $R_o/A_v$ ratio ($\approx 1/g_m$) require the least quiescent current and area, and this quantity therefore constitutes an important figure of merit in design space
exploration (fig. 2.5a). To reach a sized transistor-level circuit, approximate equations for the small-signal characteristics and bias conditions of the circuit are sufficient to allow a first-cut sizing of the amplifier, which can then be fine-tuned by numerical or manual optimisation, using simulation for exact results. The complete process is described in [13].

[Figure 2.4. CMOS transimpedance amplifier structure. The schematic shows a voltage amplifier of gain $-A_v$ with feedback resistance $R_f$, input current $I_i$, input and output voltages $V_i$ and $V_o$, photodiode and load capacitances $C_d$ and $C_l$, and a transistor-level implementation (M1, M2, M3, supply $V_{dd}$) with capacitances $C_i$, $C_o$ and $C_m$. The figure is annotated with the following expressions:]

$$Z_{g0} = -\frac{R_f - R_o/A_v}{1 + 1/A_v}$$

$$\omega_0 = \frac{1}{R_o C_y}\sqrt{\frac{1 + A_v}{M_f\,(M_x + M_m(1 + M_x))}}$$

$$Q = \frac{\sqrt{M_f\,(M_x + M_m(1 + M_x))\,(1 + A_v)}}{1 + M_x(1 + M_f) + M_m M_f (1 + A_v)}$$

$$M_f = R_f/R_o, \quad M_x = C_x/C_y, \quad M_m = C_m/C_y, \quad C_x = C_d + C_i, \quad C_y = C_l + C_o$$

[Figure 2.5a. TIA $R_o/A_v$ design space with varying bandwidth and transimpedance gain requirements. Panel title: Amplifier Ro/Av requirement, Ci=500fF, Cl=100fF. Axes: bandwidth requirement (GHz), transimpedance gain requirement (ohms), Ro/Av.]

[Figure 2.5b. Evolution of TIA characteristics (power, area, noise) with technology node. Panel title: 1THzohm transimpedance amplifier characteristics against technology node, Cd = 400fF, Cl = 150fF. Axes: technology node (nm), from 350 to 45; area (µm²) and quiescent power (100µW).]

Using this methodology with industrial transistor models for technology nodes from 350nm to 180nm and predictive BSIM3v3/BSIM4 models for technology nodes from 130nm down to 45nm [3], we generated design parameters for 1THzΩ transimpedance amplifiers to evaluate the evolution in critical characteristics with technology node. Fig. 2.5b shows the results of transistor-level simulation of fully generated photoreceiver circuits at each technology node.
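The expressions that accompany fig. 2.4 can be evaluated directly to see how component values map to the transimpedance performance criteria. The Python sketch below computes the gain $Z_{g0}$, the natural frequency $\omega_0$ and the pole quality factor $Q$ from a set of component values. The function name and all numerical values are hypothetical and chosen only for illustration; they are not design values from the text. For reference, a second-order Butterworth response corresponds to $Q = 1/\sqrt{2}$.

```python
import math


def tia_characteristics(rf, ro, av, cd, ci, cl, co, cm):
    """Evaluate the fig. 2.4 expressions for the TIA small-signal model.

    rf, ro in ohms; av is the voltage gain magnitude; capacitances in farads.
    Returns (Zg0 in ohms, w0 in rad/s, Q).
    """
    cx = cd + ci                    # total capacitance at the input node
    cy = cl + co                    # total capacitance at the output node
    mf = rf / ro
    mx = cx / cy
    mm = cm / cy

    zg0 = -(rf - ro / av) / (1.0 + 1.0 / av)
    k = mf * (mx + mm * (1.0 + mx))
    w0 = math.sqrt((1.0 + av) / k) / (ro * cy)
    q = math.sqrt(k * (1.0 + av)) / (1.0 + mx * (1.0 + mf) + mm * mf * (1.0 + av))
    return zg0, w0, q


# Hypothetical component values, for illustration only
EXAMPLE = dict(rf=1000.0, ro=100.0, av=10.0,
               cd=400e-15, ci=50e-15, cl=150e-15, co=50e-15, cm=10e-15)

ZG0, W0, Q = tia_characteristics(**EXAMPLE)
BUTTERWORTH_Q = 1.0 / math.sqrt(2.0)

print(f"Zg0 = {ZG0:.1f} ohm")
print(f"f0  = {W0 / (2.0 * math.pi) / 1e9:.2f} GHz")
print(f"Q   = {Q:.3f} (Butterworth target {BUTTERWORTH_Q:.3f})")
```

A synthesis procedure of the kind described above would work in the opposite direction, solving for $R_f$, $A_v$ and $R_o$ given target values of $Z_{g0}$, bandwidth and $Q$; this sketch covers only the forward evaluation.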