Resonant clocking using distributed parasitic capacitance


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 9, SEPTEMBER 2004, p. 1520

Resonant Clocking Using Distributed Parasitic Capacitance

Alan J. Drake, Student Member, IEEE, Kevin J. Nowka, Member, IEEE, Tuyet Y. Nguyen, Jeffrey L. Burns, Member, IEEE, and Richard B. Brown, Senior Member, IEEE

Abstract: A resonant-clock generation and distribution scheme that uses the inherent, parasitic capacitance of the clocked logic as a lumped capacitor in a negative-resistance oscillator is described. Clock energy is resonated between inductors and the parasitic, local clock network to save power over traditional clocking methodologies. Theory predicts that the data passing through the clocked logic will change the clock frequency by less than 1.25%. A resonant clock test chip was designed and fabricated in an IBM 0.13-µm partially depleted SOI process. Although the test chip was designed to operate in the gigahertz range using integrated inductors, startup difficulties required the addition of external inductance to reduce the resonant frequency so that the effects of the parasitic capacitance could be measured. The parasitic capacitance is approximately 40 pF per clock phase, resulting in a clock frequency between 106 and 146 MHz, depending on biasing. At its most efficient bias point, the clock dissipated 2.09 mW, which is approximately 35% less power than a conventional, buffer-driven clock. The maximum period jitter measured in the resonant clock due to changing data in the clocked latches was 55 ps at 124 MHz, or 0.68% of the clock period.

Index Terms: Clock generator, energy-recovery circuit, harmonic resonance, low power.

I. INTRODUCTION

The clock distribution network of a microprocessor is typically divided into global and local clock distributions, as shown in Fig. 1(a). The global clock distribution comprises a clock source and the wires and buffers needed to drive the clock source to the logic gates.
Since the clock buffers essentially drive the clock network in parallel, they can be combined to form the simplified circuit in Fig. 1(b). The buffers form an $N$-stage exponential horn where each stage has a gain of $g$. The total capacitance of the global buffers and wires is labeled $C_G$. The local clock distribution consists of the wires that connect the clock loads (latches and gates) in the microprocessor's functional units. The capacitance of the local clock distribution, $C_L$, is the sum of the local wires and gate loads and forms the clock sink. In a properly designed exponential horn, the gain is balanced evenly across the $N$ stages; the input capacitance of each stage is the output capacitance of the stage divided by the stage gain, as shown in Fig. 1(a). An approximate value for the clock distribution capacitance can be computed as

$$C_{clk} = C_L + C_G \approx C_L \sum_{k=0}^{\infty} g^{-k} = \frac{g}{g-1}\, C_L \tag{1}$$

which is sufficiently accurate even for a small number, $N$, of buffer stages; the added capacitance of the third stage from the load of a balanced buffer horn with a stage gain of 3 is $C_L/27$. Ignoring leakage and short-circuit power, the value for $C_{clk}$ obtained in (1) can be used to estimate the power dissipated in the clock distribution network as

$$P_{clk} = C_{clk} V_{DD}^2 f \tag{2}$$

where $V_{DD}$ is the power-supply voltage and $f$ is the clock frequency.

Manuscript received January 24, 2004; revised February 27, 2004. This work was supported in part by a faculty grant from the IBM Austin Center for Advanced Studies. A. J. Drake, K. J. Nowka, and T. Y. Nguyen are with the IBM Austin Research Laboratory, Austin, TX 78758 USA (e-mail: adrake@us.ibm.com). J. L. Burns is with the IBM Thomas J. Watson Research Laboratory, Yorktown Heights, NY 10598 USA. R. B. Brown is with the University of Utah, Salt Lake City, UT 84112 USA. Digital Object Identifier 10.1109/JSSC.2004.831435
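The buffer-horn capacitance and power estimates in (1) and (2) can be checked numerically. The sketch below assumes the geometric-series reading of the horn (each stage's input capacitance is the next stage's load divided by the stage gain); the function names, the 40 pF load, the 1.2 V supply, and the 1 GHz frequency are illustrative choices, not values prescribed by the text for this calculation.

```python
def horn_capacitance(c_local, stage_gain, n_stages):
    """Local load plus n_stages buffer stages, each 1/stage_gain of the one it drives."""
    return sum(c_local / stage_gain**k for k in range(n_stages + 1))


def horn_capacitance_limit(c_local, stage_gain):
    """Closed-form limit of the geometric series for many stages."""
    return c_local * stage_gain / (stage_gain - 1)


def clock_power(c_total, vdd, freq):
    """Dynamic switching power C * V^2 * f, ignoring leakage and short-circuit power."""
    return c_total * vdd**2 * freq


C_LOCAL = 40e-12   # illustrative local clock load in farads (assumption)
GAIN = 3

total_7 = horn_capacitance(C_LOCAL, GAIN, 7)       # seven buffer stages
limit = horn_capacitance_limit(C_LOCAL, GAIN)      # 1.5 * C_LOCAL for a gain of 3
local_fraction = C_LOCAL / limit                   # share of power spent in the local clock
third_stage = C_LOCAL / GAIN**3                    # capacitance added by the third stage from the load

print(f"7-stage total: {total_7 * 1e12:.3f} pF, limit: {limit * 1e12:.3f} pF")
print(f"local share of clock power: {local_fraction:.3f}")
print(f"third-stage contribution: {third_stage * 1e12:.3f} pF")
print(f"power at 1.2 V, 1 GHz: {clock_power(limit, 1.2, 1e9) * 1e3:.1f} mW")
```

With a stage gain of 3 the series converges quickly, which is why the approximation holds even for a short horn.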
For a buffer horn with a stage gain of 3, 2/3 of the clock power is dissipated in the local clock distribution and latches, which makes reducing local clock capacitance the prime target for reducing clock power.

As can be seen from (2), the clock power dissipation depends strongly on the capacitance of the local clock distribution. Deeper pipelining and greater complexity with each new generation of microprocessors have steadily increased the size of the local clock load. This increasing clock load, combined with ever-increasing clock frequencies, has made the clock-distribution network the major power consumer in modern microprocessors. The POWER4 microprocessor, for example, dissipates 70% of its power in its clock distribution and latches [1].

Little can be done to reduce the capacitance of the local clock distribution by the clock-tree designer since the logic fixes the clock load. Instead, local clock power is reduced by shutting off sections of the local clock distribution using aggressive clock gating. To reduce global clock power, care is taken to optimize the global clock-distribution capacitance through efficient buffer allocation [2]. Another approach is to leverage the clock line inductance to aid in signal propagation of the global clock [3]. These techniques have been successful in slowing the growth of clock power dissipation but are ultimately limited by the fixed clock capacitance that needs to be switched.

More exotic clock-power-reduction techniques that use some form of resonance to recycle clock energy have been increasingly studied due to their potential for significant power reduction. The general idea is to form an energy-efficient tank with a high quality factor that dissipates power only in the parasitic resistance of the network, not in switching the clock capacitance. Adiabatic circuits represent the ultimate goal in resonance in that all circuit power is recycled. Adiabatic logic benefits from slow transition times, making it impractical for high-performance logic, although modified adiabatic circuits have been developed that run above 100 MHz [4], [5]. Another resonant-clock generation technique establishes a standing or traveling wave using the transmission-line characteristics of the clock lines; this approach has yet to demonstrate a power advantage over established clock-distribution techniques [6]–[9].

Fig. 1. Buffer-driven clock network and resonant clock network diagrams. (a) Buffer-driven clock distribution. (b) Reduced clock tree.

The resonant-clock scheme presented here addresses the power dissipation in the local clock directly by using the parasitic capacitance inherent in the local clock distribution as the capacitor in an LC tank. All clock buffers and their associated capacitance are removed and the clock energy is resonated between integrated inductors and the local clock capacitance. Unlike adiabatic circuits, which power the logic from the clock and rely on slow edges, this resonant-clock scheme has the potential to run at frequencies used in modern microprocessors since only the gate capacitance is driven by the clock. By incorporating the capacitance as part of the oscillator, clock generation and distribution are designed concurrently and the oscillator naturally selects the most efficient frequency, unlike buffer-driven resonant clock networks such as that in [9], where the natural frequency of the network has to be tuned to the clock frequency. Unfortunately, clock gating can only be achieved in the proposed scheme by shutting down the oscillator, which is possible if the startup time of the oscillator can be tolerated.

Fig. 2. Ideal resonant clock-generation and distribution.
The resonant-clock generation scheme presented here can be used to replace entire clock systems for small designs or the quadrant clocks in larger designs. Thanks to improving integrated inductors and copper metallization in advanced semiconductor technologies, the quality factor of the resonant circuit is sufficient to effect clock power reduction over ungated buffer-driven local clocking techniques. The next section reviews the theory behind distributed-capacitance resonant-clock generation. Following that, a prototype resonant clock, built in IBM's 0.13-µm partially depleted SOI (PD-SOI) process [10], is described.

II. RESONANT CLOCKING THEORY

The main advantage of resonant clocking is a reduction of clock power, but the procedure introduces challenges for the designer such as jitter and skew management and nonlinear load capacitance. Each of these will now be examined.

A. Power

The power reduction can only occur if less static power is dissipated in the parasitic resistance of the resonant clock than is dissipated switching the buffers and local clock capacitance of a buffer-driven clock. To form the resonant clock, integrated inductors are placed in parallel with the clock load, $C_L$, creating an RLC circuit as shown in Fig. 2. At resonance, the impedance of the parallel inductor and capacitor is infinite. Power is only dissipated in the parasitic resistance, $R_p$, which arises from the resistive elements of the inductors and the distributed capacitance. The clock generated by the resonant circuit is a sinusoid of the form $V \sin(\omega_0 t)$ whose magnitude, $V$, depends on the magnitude of […]. The resonant frequency, $\omega_0$, is determined by $\omega_0 = 1/\sqrt{L C_L}$. To allow comparisons to the buffer-driven clock, $V$ is assumed to be $V_{DD}/2$, providing a clock that swings between 0 V and $V_{DD}$. The average power dissipation in the RLC circuit at resonance is

$$P_{res} = \frac{V^2}{2R_p} = \frac{V_{DD}^2}{8R_p} \tag{3}$$

Given that the quality factor, $Q$, of a parallel RLC circuit is $Q = \omega_0 R_p C$, that the clock load is $C_L$, and that the clock frequency is $f = \omega_0/2\pi$, then the ratio of power dissipated in the proposed resonant clock versus a buffer-driven clock, from (2) and (3), is

[…] (4)

Thus, a resonant-clock distribution with a $Q$ greater than […] will use less power than a buffer-driven clock network with a stage gain of 3. The quality of integrated inductors has improved significantly with each technology generation; inductor $Q$ values greater than 15 were reported for the target technology [11]. The achievable power reduction in the resonant clock depends on how low the resistance of the clock distribution can be made.

Another advantage of the proposed resonant clock is the ability to control the maximum voltage of the output clock simply by varying […]. This feature can be used to overdrive the clock signal for faster rise and fall times at the logic gates at the expense of some extra power, but without having to generate and propagate higher clock harmonics. To do this with buffer-driven clocks, the output resistance of the drivers must be made smaller by increasing the driver size, which increases the capacitance in the network. Unfortunately, the clock-distribution network can nullify such efforts by filtering the higher clock harmonics, which is why clocks on high-performance processors become more sinusoidal with each generation. Care must be taken when overdriving the resonant clock to prevent the clock waveform from clipping so that it is no longer sinusoidal. Clipping increases the power dissipation in the resonant circuit and reduces its efficiency [12].

B. Skew Management

Since the clock network serves as the capacitance setting the clock frequency, it must be made small enough to avoid transmission-line effects and to keep skew manageable. Electromagnetic signals propagate at a speed given by

$$v = \frac{c}{\sqrt{\varepsilon_r}} \tag{5}$$

or roughly 150 µm/ps in wires in an SiO$_2$ insulator.
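The propagation-speed estimate translates directly into a maximum clock-wire reach for a given skew budget. The following sketch assumes the speed is the vacuum speed of light divided by the square root of the relative permittivity, with 3.9 taken as a typical value for SiO2; both the formula's exact form and the permittivity are assumptions, and the function names are illustrative.

```python
import math

C_LIGHT_UM_PER_PS = 299.792458  # speed of light in vacuum, in micrometers per picosecond


def propagation_speed_um_per_ps(eps_r):
    """Ideal signal speed in a nonmagnetic dielectric with relative permittivity eps_r."""
    return C_LIGHT_UM_PER_PS / math.sqrt(eps_r)


def max_reach_mm(freq_hz, skew_fraction, speed_um_per_ps):
    """Longest wire run whose flight time stays within skew_fraction of the clock period."""
    period_ps = 1e12 / freq_hz
    return speed_um_per_ps * skew_fraction * period_ps / 1000.0


speed = propagation_speed_um_per_ps(3.9)     # assumed SiO2 permittivity
print(f"ideal speed in SiO2: {speed:.0f} um/ps")
print(f"reach at 1 GHz, 10% skew: {max_reach_mm(1e9, 0.10, 150.0):.1f} mm")
print(f"reach at 3 GHz, 10% skew: {max_reach_mm(3e9, 0.10, 150.0):.1f} mm")
```

This is an upper bound: loading and branching slow real clock wires, so a practical design would budget less than the ideal reach.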
Thus, for a 1-GHz clock, the clock sinks must extend less than 15 mm from the clock source to meet a skew requirement of 10%. The actual propagation time will be slower than predicted by (5) due to the loading and branching in the clock network, so some margin must be included in the design. A block of logic 7 cells by 64 cells, where the cells are 16 × 16 wire tracks and contain pass-gate clock loads, was simulated in IBM's 0.13-µm SOI process with a 3-GHz clock. Using distributed RLC pi models, the skew from top left to bottom right of the network was 10 ps, which is longer than the 6.9 ps predicted by (5).

Fortunately, the skew requirements are stringent enough to ensure that the clock network does not behave like a transmission line. Since the rise time of a sinusoidal clock is 50% of its period and skew targets are less than 10% of the clock period, a clock network that meets skew requirements will never be long enough for reflections to become problematic.

C. Quality Factor

There are a number of definitions for quality factor which are essentially equivalent:

$$Q = 2\pi\,\frac{\text{Maximum Energy Stored}}{\text{Energy Dissipated per Cycle}} = \omega_0\,\frac{\text{Maximum Energy Stored}}{\text{Average Power Dissipation}} = \frac{\omega_0}{BW} \tag{6}$$

where $\omega_0$ is the resonant frequency and $BW$ is the bandwidth, the difference between the half-power frequencies above and below $\omega_0$ [13]. Quality factors of individual components are more easily characterized than that of a resonant circuit, so it is useful to be able to relate the quality factors of the components to that of the overall circuit. Real inductors and capacitors contain lossy elements and have somewhat complicated models when all physical effects are taken into consideration. However, if the RLC resonant frequency is well below the self-resonance of the circuit elements, then at resonance the inductor and capacitor are inductive and capacitive with some real lossy component.
The quality factor of a nonideal inductor [13] in parallel form is approximated from (6) by

$$Q_L = \frac{R_{pL}}{\omega_0 L} \tag{7}$$

where $R_{pL}$ is the parasitic resistance of the inductor expressed as a parallel resistance at resonance. The quality factor for a nonideal capacitor [13] in parallel form is approximated by

$$Q_C = \omega_0 R_{pC} C \tag{8}$$

where $R_{pC}$ is the equivalent parallel resistance in the capacitor. Solving (7) and (8) for $R_{pL}$ and $R_{pC}$, substituting into the quality factor of a parallel RLC tank, $Q = \omega_0 R_p C$ with $R_p = R_{pL} \parallel R_{pC}$, and performing some algebra provides the tank quality factor in terms of its components' quality factors:

$$Q = \frac{Q_L Q_C}{Q_L + Q_C} \tag{9}$$

From (9) it is apparent that a low-quality distributed capacitor will limit the quality of the resonant circuit. To improve the quality factor, the clock resistance must be kept to a minimum by utilizing techniques already needed for reducing skew in standard clock-distribution methods, such as clock grids, fat wires, and multiple vias. Unlike a standard clock distribution where reduced capacitance is a must, the quality factor of the resonant clock is improved by adding extra capacitance to the distribution network. In most integrated oscillators, the quality of the inductor limits the quality factor, but here the distributed nature of the capacitor adds enough parasitic resistance to the capacitance to limit the quality factor.

D. Nonlinear Capacitance

The most challenging part of the resonant clock is characterizing the distributed, parasitic capacitor. If the clock network is designed to meet skew requirements and avoid transmission-line effects, its parasitic capacitance acts like a lumped capacitor, but
with a time-varying characteristic. Fig. 3 shows an equivalent model of a negative-resistance oscillator used for the resonant clock. The time-varying capacitance is represented as a fixed capacitance, $C_0$, in parallel with a periodically varying capacitance, $C_p(t)$, and a time-dependent noise capacitance, $C_n(t)$. If designed correctly, the negative resistance and parasitic resistance cancel and the circuit behaves like an ideal LC tank, which has a transfer function of

[…] (10)

and a natural frequency of $\omega_0 = 1/\sqrt{LC}$. Unfortunately, the capacitance in the distributed network is not constant, but a function of two independent voltages applied to the gate and drain of the transistors. The gate voltage comes from the clock signal and the drain voltage, which is pseudo-random, comes from the data flowing through the logic. Equation (10) is therefore not an accurate model of the transfer function. In Fig. 3, the time-varying capacitance associated with the gate voltage is the periodic capacitance, $C_p(t)$, and the time-varying capacitance associated with the data signals is the noise capacitance, $C_n(t)$.

Fig. 3. Equivalent negative-resistance oscillator circuit.

Fig. 4. Master–slave D-flip-flop.

The common flip-flop design in Fig. 4 is used for the clock loads in this study. Fig. 5 shows the flip-flop's simulated input gate capacitance variation for sinusoidal and square-wave gate voltages, ignoring the effect of the data signals. The change in gate capacitance is periodic with the input waveform, and since the capacitance changes are driven by the clock at steady state, a stable oscillation frequency will be reached, as will be explained later.

Fig. 5. Gate capacitance variation with input waveform.

The noise capacitance, $C_n(t)$, is more difficult to understand because it results from logic signals travelling through the clocked logic and will be pseudo-random in nature. The logic is driven by sources independent of the clock and will cause some amount of mixing in the clock signal. An intuitive understanding of this effect is obtained from the KCL node equation for the circuit in Fig. 3

[…] (11)

There is no analytical solution to (11), but some insight into the solution can be gleaned from its Fourier transform, which is given by

[…] (12)

The last two convolution terms on the right-hand side of (12) are not in (10) and result directly from the two time-varying capacitances. In the steady-state solution for (12), the clock voltage and the periodic capacitance, $C_p(t)$, are co-periodic, so they will not cause phase noise. The noise capacitance, $C_n(t)$, and the data voltages are random and will modulate the gate voltage, causing jitter. The difficulty is in analyzing the magnitude of this effect in a clock circuit. The oscillator behaves like a frequency modulation circuit where the data voltage acts as a modulating signal [14]. With $C(t)$ denoting the time-varying part of the capacitance, the instantaneous frequency of the oscillator is given by

$$\omega_i(t) = \frac{1}{\sqrt{L\,[C_0 + C(t)]}} = \frac{\omega_0}{\sqrt{1 + C(t)/C_0}} \tag{13}$$
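Because an LC tank's frequency scales as the inverse square root of its total capacitance, a small fractional capacitance change produces roughly half that fractional change in frequency. The sketch below compares the exact inverse-square-root ratio with the first-order estimate. The gate-capacitance shares of one third and one half are assumptions, chosen because they reproduce the 1.7% to 2.5% capacitance change quoted later in this section for a 5% latch-capacitance change; the function names are illustrative.

```python
import math


def freq_shift_exact(dc_over_c):
    """Fractional frequency change for a fractional capacitance change, exact LC relation."""
    return 1.0 / math.sqrt(1.0 + dc_over_c) - 1.0


def freq_shift_first_order(dc_over_c):
    """First-order estimate: half the fractional capacitance change, opposite sign."""
    return -0.5 * dc_over_c


LATCH_CHANGE = 0.05                 # 5% change in latch clock-load capacitance due to data
GATE_SHARES = (1.0 / 3.0, 0.5)      # assumed share of network capacitance in latch gates

for share in GATE_SHARES:
    dc = LATCH_CHANGE * share       # fractional change of the whole clock network
    print(f"dC/C = {dc * 100:.2f}%  "
          f"exact df/f = {freq_shift_exact(dc) * 100:.3f}%  "
          f"first-order df/f = {freq_shift_first_order(dc) * 100:.3f}%")

# A 10% capacitance change, for comparison with the 5% deviation rule of thumb.
print(f"10% case: exact {freq_shift_exact(0.10) * 100:.2f}%, "
      f"first-order {freq_shift_first_order(0.10) * 100:.2f}%")
```

The first-order estimate slightly overstates the magnitude of the shift for capacitance increases, so it is a conservative bound in that direction.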
Thus, the frequency is composed of a fundamental frequency, $\omega_0$, divided by a normalized time-varying capacitance. If the time-varying capacitance, $C(t)$, has a small maximum amplitude variation, $\Delta C$, relative to the fixed capacitance, $C_0$, then a reasonable approximation can be made as follows:

$$\frac{1}{\sqrt{1 + C(t)/C_0}} \approx 1 - \frac{C(t)}{2C_0} \tag{14}$$

Equation (14) is valid when the change in capacitance is small relative to the average capacitance value. Plugging (14) into (13) gives an instantaneous frequency, $\omega_i$, of

$$\omega_i(t) \approx \omega_0\left[1 - \frac{C(t)}{2C_0}\right] \tag{15}$$

If the time-varying capacitance is $C(t) = \Delta C \cos(\omega_m t)$, then the instantaneous frequency is

$$\omega_i(t) \approx \omega_0 - \frac{\omega_0\,\Delta C}{2C_0}\cos(\omega_m t) \tag{16}$$

By defining $\Delta\omega = \omega_0\,\Delta C/(2C_0)$, the phase shift over time is given by

$$\theta(t) = \int_0^t \omega_i(\tau)\,d\tau \approx \omega_0 t - \frac{\Delta\omega}{\omega_m}\sin(\omega_m t) \tag{17}$$

Equations (16) and (17) can be used to analyze the effects of the time-varying load capacitance on the oscillation frequency and on the jitter of the oscillator [15]. The most important observation is that the frequency deviation is small if the change in capacitance is also small. If $\Delta C/C_0 = 10\%$, then the frequency will only deviate by 5%. Simulations show that the clock load capacitance of the D-flip-flop changes by 5% due to changes in the data flow. If 1/3 to 1/2 of the clock network capacitance is in the gates of the latches, then the maximum change in the clock network capacitance, which occurs when the data in all the latches changes at the same time and in the same direction, is between 1.7% and 2.5%. This results in a change in frequency of 0.83% to 1.25%. Simulations of a simplified resonant clock network showed less than 15-ps period jitter for a 2-GHz clock. This analysis does not account for capacitive coupling between the gate and the drain, which will depend strongly on the edge rate of the logic and which may be significant if data movement is the same through a majority of the logic gates.

E. Miscellaneous Concerns

Some other concerns for resonant clocking include the area penalty associated with the integrated inductors, how to gate the clock to reduce power when a functional unit is not needed, and how to synchronize clock domains.
Each of these will be briefly addressed. The area penalty can only be reduced by using resonant clocks that require the minimum number and size of inductors. Fortunately, since the quality of the parasitic capacitor limits the quality of the resonant clock, some tradeoffs can be made between inductor area and quality. Clock gating is a challenge since the local clock buffers have been removed. Turning off the clock is an option but wastes time and power waiting for the clock to settle. Finally, to use the presented resonant clock in a large design that cannot be covered by a single clock domain due to skew requirements, some tuning mechanism would have to be incorporated to synchronize multiple clock domains [16], or a hand-shaking system used between different clock networks.

Fig. 6. Modified test-chip block diagram.

III. RESONANT CLOCKING TEST CHIP

A test chip was fabricated in IBM's 0.13-µm RF PD-SOI CMOS process [10] to analyze the frequency, power dissipation, and quality of the proposed resonant clock as compared to a buffer-driven clock network. A block diagram of the test chip is shown in Fig. 6 and a microphotograph of the test chip, minus the external inductors, is shown in Fig. 7. The load of the local distribution consists of three 8 × 64 scan-chains connected by a clock grid. Each scan-chain has eight rows, for a total of 24 rows, where each row is composed of 64 D-flip-flops. The experimental clock load represents the clock load of a block of static CMOS logic with 24 latch stages, as may be found in the functional units of a 64-bit pipelined microprocessor. The clock distribution is laid out differentially in metal-2 over each cell with a parallel grid on metal-4. The logic gates are powered by the supply voltage $V_{DD}$.

A negative-resistance oscillator was designed as the resonant clock source with integrated inductors similar to those reported in [11].
The capacitance used to set the resonant frequency of the oscillator is the parasitic capacitance of the clock network. Because an automatic extraction deck was not available, it was estimated from wire models and manual extraction to be about 21 pF per clock phase. The parasitic resistance and inductance in the clock wires joining the VCO with its clock load were underestimated, and so the NFETs in the VCO are undersized. The wires have 3 Ω of resistance and about 0.8 nH of inductance, which is enough to keep the VCO from starting. To get the chip to start up, the integrated inductors were cut out and the internal clock node was bonded to connect to off-chip inductors. The total inductance consists of the bond-wire inductance, $L_{bw}$, and the external inductor, $L_{ext}$. By using larger, off-chip inductors, the resonant frequency of the oscillator was lowered to a value where the transconductance of the cross-coupled NFETs could cancel the parasitic resistance and maintain the oscillation. Using off-chip inductance changes the resonant-clock experiment because the inductance is no longer integrated and the clock frequency is much lower than on a high-performance VLSI circuit. Nevertheless, the capacitance that sets the oscillation frequency is still the local clock capacitance, and its effect on clock stability due to data signals can still be measured.

Fig. 7. Microphotograph of resonant clock test macro.

To simulate a conventional, buffer-driven clock distribution, an 11-stage ring oscillator and associated 7-stage buffer horn, with a stage gain of 3, were added for power comparisons. Both the resonant clock and the ring oscillator drive the same local clock network, but the ring oscillator is tri-stated, so only one clock driver has access to the clock grid at a time. The ring-oscillator characteristics were measured after cutting out the inductors with a laser. Simulations were performed to ensure that the ring-oscillator frequency was close to 2 GHz with 10% edge rates at 1.2 V. Both clock phases are divided down by 64 and output for frequency measurements. There are three power-supply domains on the chip that separate the oscillator, scan-chain, and ring-oscillator power.

IV. TEST RESULTS

The test chip operates in two modes for testing. In the resonant mode, the ring oscillator is disabled and the resonant clock controls the clock network.
In the ring-oscillator mode, the resonant clock is disconnected from the clock grid using laser trimming and the ring oscillator drives the clock network without the inductors. Functionality of the latches, and by correlation the clock, was determined by monitoring the divided clock output. The clock was also monitored at the junction of $L_{bw}$ and $L_{ext}$.

Fig. 8. Operating frequency of the resonant clock and the ring-oscillator.

Fig. 8 shows the measured clock frequency of the resonant clock and the ring oscillator as a function of the supply voltage on the logic. The resonant-clock frequency varies from 147 MHz when the VCO is driven by 0.4 V to 112 MHz when driven by 0.6 V, which is consistent with simulations. The ring-oscillator frequency, on the other hand, increases rapidly with power supply because of increasing current drive in the individual delay elements. Measurements were taken with external inductors ranging from a simple wire to a 420-nH air-core inductor. Measurements indicate that the clock load is between 38 and 45 pF in each phase of the clock, the bond-wire has an inductance of 15.9 nH, and the external wire has an inductance of 17.1 nH. The clock frequency measured is within 12.5% of the predicted value. For all remaining measurements, the external inductor consisted only of a simple wire between the bond-pad and the power supply.

Fig. 9 shows the eye diagram of the dual-phase clock measured at the inductor bond-pin. According to simulations, the low voltage swing and distortion in the waveform occur because the eye diagram was measured at the tap of a voltage divider composed of the external and the bond-wire inductance, not at the clock gates. Simulations show a cleaner signal at the clock gates, and the logic is functional in measurements, but the actual shape of the clock signal cannot be verified in this test chip.

Fig. 10 shows the power dissipated in the ring-oscillator-driven clock versus the resonant clock.
The ring-oscillator power has also been scaled by frequency and supply voltage to compare the two techniques. Three things complicate this comparison. First, the buffer horn was not optimized and may dissipate more power than a well-designed clock distribution network. Second, the tri-state inverters add an extra load to the resonant clock that would not normally be present. Third, this comparison is not exact because the actual amplitude of the resonant clock voltage could not be measured. Knowing the amplitude of the clock signal is necessary for an accurate comparison between resonant clocking and buffer-driven clocking from (4).
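The measured component values can be cross-checked against the ideal LC resonance formula. This is a minimal sketch that assumes each clock phase behaves as a single tank formed by the bond-wire and external-wire inductances in series with the per-phase clock load, ignoring all other parasitics and any bias dependence of the capacitance; the function name is illustrative.

```python
import math


def tank_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


L_BOND = 15.9e-9      # measured bond-wire inductance, henries
L_EXT = 17.1e-9       # measured external-wire inductance, henries
L_TOTAL = L_BOND + L_EXT

f_low_c = tank_frequency_hz(L_TOTAL, 38e-12)    # lower bound of measured clock load
f_high_c = tank_frequency_hz(L_TOTAL, 45e-12)   # upper bound of measured clock load

print(f"38 pF load: {f_low_c / 1e6:.1f} MHz")
print(f"45 pF load: {f_high_c / 1e6:.1f} MHz")
```

Under these assumptions the estimate falls inside the 112 to 147 MHz range measured with the simple-wire external inductor.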
Fig. 9. Dual-phase clock eye-diagram.

In the test chip, the clock voltage swing can be measured at the point where the bond-wire inductor (16 nH) and the external wire (17 nH) meet. Measurements taken at that tapped point show the clock voltage swing at the logic gates is from […] V to […], depending on the bias on the resonant clock. The clock voltage at the clock gates on chip is higher than that measured at the tapped point, and its magnitude is determined by the ratio of inductance in the bond-wire to the inductance in the external wire. Simulations of the clock network, under the bias conditions being tested, show that the on-chip clock voltage is 300 mV higher than the clock voltage at the pad where the two inductors meet. The simulations predict a power dissipation of 2.8 mW at this bias point. Based on these simulations and the measurements taken, the on-chip resonant clock swing at the most efficient point measured ([…] of 0.67 V and […] of 0.43 V) is between 0.63 and 0.93 V.

The resonant-clock power, measured at […] V with a frequency of 147 MHz, is 2.06 mW. The ring-oscillator clock power was measured as 7.72 mW with a clock frequency of 360 MHz at […] V. Scaling the measured ring-oscillator clock power to 147 MHz at 0.63 V and 147 MHz at 0.93 V yields a scaled power of 3.15 mW and 6.9 mW, respectively. The resonant clock thus dissipates between 35% and 70% less power than the buffer-driven clock at the same frequency and clock amplitude. The actual power savings is most likely near the lower part of this range. From (4), the quality factor of the resonant circuit is 2.4 if the power savings is 35% and 5.3 if the power savings is 70%, which is quite low. For an inductor quality of 15, this means the capacitance has a quality between 2.8 and 8.2 from (9). Again, the lower number is probably most accurate based on simulations.
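The power comparison and the capacitor quality estimate above can be reproduced with a few lines of arithmetic. The sketch assumes that dynamic power scales linearly with frequency and with the square of the voltage, and that (9) has the usual reciprocal-sum form 1/Q = 1/Q_L + 1/Q_C. Frequency scaling alone reproduces the 3.15 mW figure, and an additional factor of (0.93/0.63)^2 reproduces 6.9 mW; reading the first figure as needing no voltage correction is an inference from the numbers, not something the text states. Function names are illustrative.

```python
def scale_power(p_meas_mw, f_meas_hz, f_target_hz, v_ratio=1.0):
    """Scale dynamic power by frequency ratio and by the square of the voltage ratio."""
    return p_meas_mw * (f_target_hz / f_meas_hz) * v_ratio**2


def savings(p_resonant_mw, p_buffer_mw):
    """Fractional power saved by the resonant clock relative to the buffer-driven clock."""
    return 1.0 - p_resonant_mw / p_buffer_mw


def capacitor_q(q_tank, q_inductor):
    """Solve 1/Q_tank = 1/Q_L + 1/Q_C for Q_C (assumed form of the tank relation)."""
    return 1.0 / (1.0 / q_tank - 1.0 / q_inductor)


P_RESONANT = 2.06                                               # mW at 147 MHz
p_buf_low = scale_power(7.72, 360e6, 147e6)                     # frequency scaling only
p_buf_high = scale_power(7.72, 360e6, 147e6, v_ratio=0.93 / 0.63)

print(f"scaled buffer power: {p_buf_low:.2f} mW and {p_buf_high:.2f} mW")
print(f"savings: {savings(P_RESONANT, 3.15) * 100:.0f}% and "
      f"{savings(P_RESONANT, 6.9) * 100:.0f}%")
print(f"capacitor Q for tank Q of 2.4 and 5.3 with inductor Q of 15: "
      f"{capacitor_q(2.4, 15):.2f} and {capacitor_q(5.3, 15):.2f}")
```

The capacitor quality comes out close to the tank quality in both cases, which is the sense in which the distributed capacitor, not the inductor, limits the tank.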
It is important to note that the resonant clock has a wide swing in power dissipation without increasing clock frequency. This is due to the resonant clock leaving its sinusoidal operation and generating a waveform that looks like a half sine wave [12]. For the resonant clock to make sense from a power perspective, it must be designed efficiently and kept in its sinusoidal operating mode.

Fig. 10. Power dissipation of the resonant clock and the ring oscillator.

Fig. 11. Period jitter of the resonant clock and the ring oscillator as the frequency of data passing through the scan-chain is swept.

Fig. 11 shows the period jitter measurements of the output of the divider as the scan-chain input frequency was increased from DC to one-half of the resonant frequency. Period jitter is the square root of the variance of the width of the clock period. The measurements were made using an Agilent Infiniium oscilloscope using the method described in [17]. Since the clock load is a scan-chain, the state of all of the flip-flops changes each time the input changes and in the same direction, maximizing the capacitance change in the clock network. The resonant clock jitter was measured with the divided oscillator output running at 2.0 MHz and the ring-oscillator jitter was measured with the divided ring-oscillator output running at 1.96 MHz. The output clock frequency is the internal clock divided by 64. The on-chip period jitter can be approximated by dividing the period jitter of the output clock by the square root of the divider [18], or 8 in this case, although this ignores the jitter contribution of the divider.

The ring-oscillator jitter is higher than the resonant clock jitter due to well-studied differences between LC and delay-based oscillators. Jitter in the 2-MHz divided output clock is a relatively flat 400 ps until the data rate approaches one-half of the clock frequency, where jitter rapidly rises to 910 ps, or 0.18% of the clock period.
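The jitter bookkeeping for the divided output can be sketched as follows, assuming the square-root-of-divider rule quoted above and ignoring the divider's own jitter contribution. The function names are illustrative, and the on-chip values are estimates derived from that rule rather than measurements.

```python
import math


def jitter_percent_of_period(jitter_s, clock_hz):
    """Period jitter expressed as a percentage of the clock period."""
    return 100.0 * jitter_s * clock_hz


def undivided_jitter_s(output_jitter_s, divide_ratio):
    """Estimate jitter before a divider: output jitter over the square root of the ratio."""
    return output_jitter_s / math.sqrt(divide_ratio)


DIVIDER = 64
OUTPUT_CLOCK_HZ = 2.0e6           # divided output clock

flat = undivided_jitter_s(400e-12, DIVIDER)    # flat region of the sweep
peak = undivided_jitter_s(910e-12, DIVIDER)    # data rate near half the clock frequency

print(f"910 ps at 2 MHz = {jitter_percent_of_period(910e-12, OUTPUT_CLOCK_HZ):.2f}% of period")
print(f"estimated on-chip jitter: {flat * 1e12:.1f} ps (flat), {peak * 1e12:.2f} ps (peak)")
```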
The maximum jitter of the internal clock, measured at the inductor bond-pad, was 55 ps, or 0.68% of the 124-MHz clock period. The closer the data frequency is to half the resonant frequency, the worse the jitter becomes, which is expected because faster data rates mean more capacitive coupling and more data-induced capacitance changes.
  8. Unfortunately, a logic chip will have random data patterns, not deterministic patterns as measured here. To approximate a more realistic scenario, the data frequency was varied randomly between 1 and 60 MHz for several minutes. The measured jitter was 555 ps, or 0.11%. At higher frequencies, the edge rates are sharper, so capacitive coupling should increase the jitter as the gigahertz range is approached, but since these changes are mostly local in the distributed clock network, they should not be significant.

V. CONCLUSION

The test macro demonstrates that a stable resonant clock can be implemented using the inherent parasitic capacitance of the local clock network in an LC tank. Both a resonant clock using the local gate capacitance and a ring-oscillator-driven buffer-horn clock distribution were implemented. A stable sinusoidal clock between 112 and 147 MHz, depending on biasing, was measured using a straight wire for the external inductance. An analysis of the voltage-varying gate capacitance shows that data flowing in the clock network should change the clock frequency by less than 1.25%. A maximum period jitter of 0.68% was measured when the scan-chain data frequency approached one-half of the clock frequency. Power comparisons indicate that the resonant clock dissipates around 35% less power than the buffer-driven clock with an estimated quality factor between 2.4 and 5.3. Since the off-chip inductors used in the measurements have quality factors between 15 and 30, the quality of the parasitic capacitance is nearly the quality of the tank. On-chip inductors in this technology were measured with quality factors above 15 as well [11], so moving the inductance on-chip should not adversely affect the power savings.
The main disadvantage of scaling the clock into the multigigahertz range is the increase in wire resistance due to the skin effect, which will decrease the already low quality of the distributed capacitor. Some things can be done to improve the quality of the parasitic capacitance. The clock signal was partially routed in polysilicon within the D-flip-flop, so removing poly routing and using wider wires in lower metal layers would improve the quality of the capacitor; clock wires in general need to be wider to handle the current needed at higher frequencies. Since the capacitor quality is the limiting factor, some loss of quality in the inductor can be tolerated to save area. Moving the inductor close to the logic circuits and using a multiturn inductor instead of a single-turn inductor would save area at the expense of some inductor quality. A balanced VCO with cross-coupled PFETs as well as NFETs uses only one inductor instead of two for even more area savings. A second generation of the resonant clock designed to operate in the multigigahertz range is being developed that improves the clock load and area using these techniques.

ACKNOWLEDGMENT

The authors acknowledge the contributions made by R. Montoye and U. Ghoshal of IBM’s Austin Research Laboratory and the help with the technology given by N. Zamdmer, M. Sherony, M. Talbi, and J.-O. Plouchart of IBM-Fishkill.

REFERENCES

[1] C. J. Anderson et al., “Physical design of a fourth-generation POWER GHz microprocessor,” in IEEE ISSCC Dig. Tech. Papers, 2001, pp. 232–233.
[2] P. J. Restle et al., “A clock distribution network for microprocessors,” J. Solid-State Circuits, vol. 36, pp. 792–799, May 2001.
[3] X. Huang, P. Restle, T. Bucelot, Y. Cao, T. J. King, and C. Hu, “Loop-based interconnect modeling and optimization approach for multi-gigahertz clock network design,” J. Solid-State Circuits, vol. 38, pp. 457–463, Mar. 2003.
[4] W. Athas et al., “The design and implementation of a low-power clock-powered microprocessor,” J. Solid-State Circuits, vol. 35, pp. 1561–1569, Nov. 2000.
[5] S. Kimm, C. Ziesler, and M. Papaefthymiou, “A true single-phase energy-recovery multiplier,” IEEE Trans. VLSI Syst., vol. 11, pp. 194–207, Apr. 2003.
[6] P. Restle and X. Huang, “Inductance: Implications and solutions for high-speed digital circuits,” in IEEE ISSCC Dig. Tech. Papers, 2002, pp. 558–562.
[7] J. Wood, T. C. Edwards, and S. Lipa, “Rotary traveling wave oscillator arrays: A new clock technology,” J. Solid-State Circuits, vol. 36, pp. 1654–1665, Nov. 2001.
[8] F. O’Mahony, C. P. Yue, M. A. Horowitz, and S. S. Wong, “A 10-GHz global clock distribution using coupled standing-wave oscillators,” J. Solid-State Circuits, vol. 38, pp. 1813–1820, Nov. 2003.
[9] S. Chan, K. Shepard, and P. Restle, “Design of resonant global clock distributions,” in Proc. ICCD, 2003, pp. 248–253.
[10] N. Zamdmer et al., “A 0.13-µm SOI CMOS technology for low-power digital and RF applications,” in Symp. VLSI Technology Dig. Tech. Papers, 2001, pp. 85–86.
[11] N. Zamdmer et al., “Suitability of scaled SOI CMOS for high-frequency analog circuits,” in Proc. ESSDERC, 2002, pp. 511–514.
[12] D. Ham and A. Hajimiri, “Concepts and methods in optimization of integrated LC VCOs,” J. Solid-State Circuits, vol. 36, pp. 896–909, June 2001.
[13] J. W. Nilsson, Electric Circuits, 3rd ed. New York: Addison-Wesley, 1990.
[14] S. Haykin, Communication Systems, 3rd ed. New York: Wiley, 1994.
[15] S. Goldman, Frequency Analysis, Modulation, and Noise. New York: McGraw-Hill, 1948.
[16] V. Gutnik and A. P. Chandrakasan, “Active GHz clock network using distributed PLLs,” J. Solid-State Circuits, vol. 35, pp. 1553–1560, Nov. 2000.
[17] “Jitter analysis techniques using an Agilent Infiniium oscilloscope,” Agilent Technologies, Palo Alto, CA, May 2002. [Online]. Available: http://cp.literature.agilent.com/litweb/pdf/5988-6109EN.pdf
[18] M. S. McCorquodale, M. K. Ding, and R. B. Brown, “Study and simulation of CMOS LC oscillator phase noise and jitter,” in Proc. ISCAS, 2003, pp. 665–668.

Alan J. Drake (S’99) received the B.S. degree in electrical engineering from the University of Arizona, Tucson, in 1997 and the M.S. degree in electrical engineering from the University of Michigan, Ann Arbor, MI, in 2000. Currently, he is working toward the Ph.D. degree at the University of Michigan. His research interests include low-power VLSI, resonant clock generation and distribution, and SOI technology. In March 2004, he joined the IBM Austin Research Laboratory, where he is conducting research on clock distribution and high-performance processor circuit design.

Kevin J. Nowka (S’84–M’85) received the B.S. degree in computer engineering from Iowa State University, Ames, in 1986 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1995, respectively. He joined the IBM Austin Research Laboratory in 1996, where he has conducted research on CMOS VLSI circuits for two 1-GHz microprocessors and for a low-power embedded PowerPC processor. He currently manages the Exploratory VLSI Design Department of the IBM Austin Research Laboratory. He holds 35 patents related to microprocessor design.
  9. Tuyet Y. Nguyen was born in Vietnam. She joined IBM in 1987. She has been involved in process support, specializing in analyzing device failures resulting from manufacturing process and layout design issues. Her current focus is VLSI mask design for high-speed analog and digital VLSI designs.

Jeffrey L. Burns received the B.S. degree in engineering from the University of California, Los Angeles, and the M.S. and Ph.D. degrees in electrical engineering from the University of California at Berkeley. In October 1988, he joined the IBM T. J. Watson Research Center as a Research Staff Member, where he worked in the areas of layout compaction, layout synthesis for control logic, CAD system architecture, and microprocessor design. In 1996, he joined the IBM Austin Research Laboratory, Austin, TX, where he worked initially on high-frequency microprocessor design and design-tools strategy. From 1999 to 2003, he managed the Exploratory VLSI Design Department of the Austin Research Laboratory, working in the areas of high-end microprocessors, ultra-low-power embedded processors, and high-bandwidth data communications. Since mid-2003, he has been on the IBM Research Technical Strategy staff, in Yorktown Heights, NY, where his main responsibility has been to produce IBM Research’s long-term IT industry outlook. Dr. Burns received an IBM Outstanding Technical Achievement Award in 1997 for his microprocessor tools and design work for IBM’s S/390 products, and an IBM Research Division Award for his work on IBM’s 1.0-GHz PowerPC prototype disclosed in 1998.

Richard B. Brown (S’74–M’76–SM’91) received the B.S. and M.S. degrees in electrical engineering from Brigham Young University, Provo, UT, in 1976, and the Ph.D. degree in electrical engineering (solid-state) from the University of Utah, Salt Lake City, in 1985.
From 1976 to 1981, he worked in computer design as Vice-President of Engineering at Holman Industries, Oakdale, CA, and then as Manager of Computer Development at Cardinal Industries, Webb City, MO. He joined the faculty of the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, in 1985. He has conducted major research projects in the areas of solid-state sensors, mixed-signal circuits, GaAs and silicon-on-insulator circuits, and high-performance and low-power microprocessors. He served as Associate Chair of Electrical Engineering for four years and as Interim Chair of Electrical Engineering and Computer Science for two years at the University of Michigan. He became Dean of Engineering at the University of Utah in July 2004. Prof. Brown serves as Chairman of the NSF MOSIS Advisory Council for Education. He was Chair of the 1997 Conference on Advanced Research in VLSI and the 2001 Microelectronic System Education Conference. He has served as Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS and Proceedings of the IEEE, and as Associate Editor of IEEE TRANSACTIONS ON VLSI SYSTEMS.