Project mac


  1. 1. An Energy Efficient Sub-threshold Multiplication and Accumulation Unit for Low Power Digital Signal Processing Applications. Harsha Yelisala, Spring 2009 - Summer 2010
  2. 2. Technology Profile: The following technologies are used in this project: 90nm pass transistor technology; Cadence IC design; Virtuoso Schematic; Virtuoso Analog Design Environment; Cadence Spectre simulator; Virtuoso Layout Suite; Synopsys NanoSim; Synopsys HSPICE; Tcl scripting; Perl scripting; Python programming language.
  3. 3. Aim: The objectives of this project are: 1. To design an industry-standard, energy-efficient circuit in a 90nm technology. 2. To emphasize the subthreshold mode of operation. 3. To gain hands-on expertise with Cadence and Synopsys tools. 4. To understand the hardware design flow. 5. To work with the Perl and Tcl scripting languages.
  4. 4. Introduction: Abstract. The increased use of power-consuming devices has opened a new area of research in energy- and power-efficient design. Conventional design methodologies prove inefficient when energy efficiency is the prime metric. Of the several novel approaches, the most promising in terms of high energy savings and reduced complexity is the subthreshold mode of operation. A 220mV energy-efficient subthreshold MAC unit is designed from a custom cell library built in 90nm pass transistor technology.
  5. 5. Work Flow: 1. Studying the literature on subthreshold operation. 2. Investigating various logic families for the subthreshold scheme. 3. Designing a custom library of standard cells from the chosen logic family. 4. Designing a MAC unit. 5. Verifying and testing the unit from a power and energy perspective.
  6. 6. Subthreshold Mode: What is subthreshold mode? A basic MOS transistor works in three different modes of operation: 1. Active or saturation mode. 2. Linear or triode mode. 3. Cutoff or subthreshold mode.
  7. 7. Modes of a MOSFET: A basic MOS transistor works in three different modes of operation: 1. Active or saturation mode. 2. Linear or triode mode. 3. Cutoff or subthreshold mode.
  8. 8. All about Subthreshold Mode! What is subthreshold mode? A CMOS transistor operates in subthreshold when the gate-to-source voltage (Vgs) is less than the threshold voltage (Vth). Advantages: 1. Because the device operates at ultra-low voltages (200-300mV), the dynamic power component is greatly reduced. 2. Highly suitable for low-power, low-speed applications such as sensor nodes and battery-operated devices. Disadvantages: 1. Because the driving currents are the weak leakage currents, nodes take a long time to charge and discharge, which limits the speed to about 1-10MHz. 2. Transistor sizing is critical. 3. Low on-off current ratio. 4. High sensitivity to process, voltage and temperature (PVT) variations.
  9. 9. Subthreshold Current Model (1 of 2): In the subthreshold regime, the drain current (Ids) varies exponentially with the gate-to-source voltage. In a long-channel device, the threshold voltage does not depend on drain voltage or channel length. In sub-micron technology, however, drain-induced barrier lowering (DIBL) makes the threshold voltage depend on the drain voltage, because the source/drain depletion regions penetrate significantly into the channel. The subthreshold current of a CMOS transistor is given by the following equation:
     $$I_{sub} = I_0 \, e^{(V_{gs} - V_{th} + \eta V_{ds})/(n v_t)} \left(1 - e^{-V_{ds}/V_{th}}\right) \qquad (1)$$
  10. 10. Subthreshold Current Model (2 of 2):
      $$I_{sub} = I_0 \, e^{(V_{gs} - V_{th} + \eta V_{ds})/(n v_t)} \left(1 - e^{-V_{ds}/V_{th}}\right) \qquad (2)$$
      where
      $$I_0 = \mu_o C_{ox} (W/L)(n - 1) V_{th}^2 \qquad (3)$$
      and
      Vgs = transistor gate-to-source voltage,
      Vds = drain-to-source voltage,
      Vth = threshold voltage,
      vt = kT/q is the thermal voltage,
      n = subthreshold slope factor = (1 + Cd/Cox),
      Cd = drain capacitance,
      Cox = gate capacitance,
      η = DIBL coefficient,
      µo = mobility,
      W and L are the width and channel length of the MOSFET respectively.
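The exponential dependence on Vgs in equation (2) can be made concrete with a small numerical sketch. It computes the thermal voltage vt = kT/q and the gate-voltage change needed for a tenfold change in Isub, which follows from the exponent as n·vt·ln(10). The slope factor n = 1.5 and the temperature of 300 K are illustrative assumptions, not values taken from this project.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin: float) -> float:
    """vt = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_swing(n: float, temp_kelvin: float) -> float:
    """Gate-voltage change (V) that changes Isub by 10x: n * vt * ln(10)."""
    return n * thermal_voltage(temp_kelvin) * math.log(10.0)


def current_ratio(delta_vgs: float, n: float, temp_kelvin: float) -> float:
    """Factor by which Isub changes when Vgs changes by delta_vgs (Vds fixed)."""
    return math.exp(delta_vgs / (n * thermal_voltage(temp_kelvin)))


# Assumed values for illustration only.
N_ASSUMED = 1.5
TEMP_K = 300.0

swing_mv = 1e3 * subthreshold_swing(N_ASSUMED, TEMP_K)
print(f"vt = {1e3 * thermal_voltage(TEMP_K):.2f} mV, swing = {swing_mv:.1f} mV/decade")
```

Under these assumptions a gate-voltage change of roughly 90 mV changes the current by a factor of ten, which is why small threshold shifts matter so much in this regime.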
  11. 11. Subthreshold Power Model (1): For low-frequency mobile devices, the advantage of subthreshold design comes from a radical reduction in circuit power at the cost of operating speed. The total power consumption of a digital circuit is given by the following equation:
      $$P_{total} = P_{dynamic} + P_{\text{short-circuit}} + P_{static} \qquad (4)$$
  12. 12. Subthreshold Power Model (2): Dynamic Power. Dynamic power is described by the following equation:
      $$P_{dynamic} = \alpha f C_{eff} V_{dd}^2 \qquad (5)$$
      where α is the activity factor, f is the switching frequency and Ceff is the effective capacitance. Because dynamic power is directly proportional to the square of the supply voltage, a significant power reduction is achieved at subthreshold voltages.
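As a quick worked instance of equation (5), the sketch below compares dynamic power at 1.2V and at 220mV while holding α, f and Ceff equal. The activity factor, frequency and capacitance are hypothetical placeholders; they cancel in the ratio, so only the quadratic dependence on Vdd matters.

```python
def dynamic_power(alpha: float, freq_hz: float, c_eff: float, vdd: float) -> float:
    """Equation (5): P_dynamic = alpha * f * C_eff * Vdd^2, in watts."""
    return alpha * freq_hz * c_eff * vdd ** 2


# Hypothetical operating point, used only to form a ratio.
ALPHA, FREQ_HZ, C_EFF = 0.1, 1e6, 1e-12

p_nominal = dynamic_power(ALPHA, FREQ_HZ, C_EFF, 1.2)
p_subthreshold = dynamic_power(ALPHA, FREQ_HZ, C_EFF, 0.22)
reduction = p_nominal / p_subthreshold

print(f"Dynamic power reduction at equal alpha, f and C_eff: {reduction:.2f}x")
```

With everything else fixed, the supply reduction alone accounts for a factor of (1.2/0.22)², about 29.75. In practice the operating frequency is also lowered in subthreshold, which reduces power further.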
  13. 13. Subthreshold Power Model (3): Dynamic Power. At 220mV, the dynamic charging current, which is directly proportional to dynamic power, is reduced by almost 248.49X compared to a supply voltage of 1.2V for an inverter at the TT process corner. Figure: Current (µA, log scale) versus supply voltage (V, 0.2 to 1.2) for the TT, FS, SF, SS and FF process corners.
  14. 14. Subthreshold Power Model (4): Static Power. Static power is the power consumed by the circuit in the idle state and is described by the following equation:
      $$P_{static} = I_{Leakage} V_{dd} \qquad (6)$$
      The leakage current consists of several components: subthreshold leakage, gate tunneling, gate-induced drain leakage (GIDL) and reverse-bias diode leakage. The subthreshold leakage varies according to equation (2). Thus, as the drain voltage is reduced, the DIBL effect weakens, which in turn reduces the subthreshold leakage current. Gate tunneling contributes significantly to the overall leakage current and also falls with the gate or supply voltage. GIDL and reverse-bias diode leakage are likewise significantly reduced by the lower supply voltage of a subthreshold circuit.
  15. 15. Subthreshold Power Model (5): Static Power. At 220mV, the subthreshold leakage current in weak inversion is reduced by almost 8.55X compared to strong inversion (supply voltage 1.2V) at the TT process corner. Figure: Current (nA, log scale) versus supply voltage (V, 0.2 to 1.2) for the TT, FS, SF, SS and FF process corners.
  16. 16. Subthreshold Power Model (6): Short Circuit Power. Short circuit power is the power dissipated by current conduction between Vdd and VSS during a logic transition. It is described by the following equation:
      $$P_{\text{short-circuit}} = I_{\text{short-circuit}} V_{dd} \qquad (7)$$
      Although the short-circuit current flows for longer because of the slower operation in subthreshold, the reduced supply voltage decreases electron conduction, which in turn reduces Ishort-circuit.
  17. 17. Subthreshold Power Model (7): Short Circuit Power. At 220mV, there is a 446.45X reduction in short circuit current compared to the full rail voltage of 1.2V at the TT process corner. Figure: Short circuit current rating under varying supply voltage for an [...] (current in µA, log scale, versus supply voltage in V, 0.2 to 1.2; process corners TT, FS, SF, SS and FF).
  18. 18. Subthreshold Design Challenges (1): transistor sizing criticality; on-off current ratio; PVT variations; noise margin.
  19. 19. Subthreshold Design Challenges (2): Transistor Sizing Criticality. The relative strength of the pull-up and pull-down networks is critical for optimal rise and fall times. Because subthreshold current depends exponentially on Vth, any variation in the threshold of the NMOS and PMOS devices can change the β ratio drastically, which directly affects rise/fall times and may trigger logic failure. This shift in β ratio is observed at low voltage, forcing us to size the cell transistors very carefully.
  21. 21. Subthreshold Design Challenges (2): Figure: Ratio of NMOS ION and PMOS ION at different corners (ION NMOS / ION PMOS, log scale, versus supply in V, 0 to 1.4; corners TT, FF, FS, SF and SS).
  22. 22. Subthreshold Design Challenges (2): Figure: Ratio of NMOS ION and PMOS ION at different temperatures (ION NMOS / ION PMOS, 0 to 30, versus supply in V, 0 to 1.4; temperatures from −40C to 120C in 20C steps).
  23. 23. Subthreshold Design Challenges (3): On-Off Current Ratio. The drain current of a MOSFET increases exponentially in the subthreshold region, whereas in strong inversion it changes much more slowly because of velocity saturation of the majority carriers. In the subthreshold region, threshold voltage deviation and degradation of the ION/IOFF ratio make circuit operation very critical. At a subthreshold supply such as 0.2V, ION/IOFF degrades to below 300 at room temperature. There is a strong race condition between on and off devices during the setting of a critical signal, and this determines the maximum number of allowable cells per bit-line. When this current ratio degrades to a very low value, it becomes very difficult to differentiate between logic '1' and logic '0'. If we consider process variations, this ratio becomes worse in the FF corner, as shown.
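A rough feel for why the on-off ratio collapses at low supply can be had from the exponential current model. Taking ION at Vgs = Vdd and IOFF at Vgs = 0, both at the same Vds, every factor of the model except the Vgs term cancels, leaving ION/IOFF = exp(Vdd / (n·vt)). The sketch below evaluates this; the slope factor n = 1.5 and the 300 K temperature are assumptions chosen for illustration, so the numbers are indicative rather than a reproduction of the simulated curves.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def on_off_ratio(vdd: float, n: float, temp_kelvin: float) -> float:
    """Ideal subthreshold ION/IOFF = exp(Vdd / (n * vt)), same Vds for both."""
    vt = BOLTZMANN * temp_kelvin / ELECTRON_CHARGE
    return math.exp(vdd / (n * vt))


# Assumed slope factor and temperature (not taken from the project data).
N_ASSUMED = 1.5
TEMP_K = 300.0

for vdd in (0.20, 0.25, 0.30):
    print(f"Vdd = {vdd:.2f} V -> ION/IOFF ~ {on_off_ratio(vdd, N_ASSUMED, TEMP_K):.0f}")
```

Because the ratio depends on temperature only through vt, the same function also shows the ratio shrinking as temperature rises.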
  24. 24. Subthreshold Design Challenges (3): On-Off Current Ratio. Figure: Ratio of NMOS ION and IOFF at different temperatures (NMOS ION / IOFF, log scale, versus supply in V, 0 to 1.4; temperatures from −40C to 120C in 20C steps). Observation: Significant β ratio variation is observed in low [...]
  25. 25. Subthreshold Design Challenges (3): On-Off Current Ratio. Figure: Ratio of PMOS ION and IOFF at different temperatures (PMOS ION / IOFF, log scale, versus supply in V, 0 to 1.4; temperatures from −40C to 120C in 20C steps). Observation: Significant β ratio variation is observed in low [...]
  26. 26. On-Off Current Ratio. Figure: Ratio of NMOS ION and IOFF at different corners (NMOS ION / IOFF, log scale, versus supply in V, 0 to 1.4; corners TT, FF, FS, SF and SS). Observation: Significant β ratio variation is observed in low voltage at different temperatures.
  27. 27. On-Off Current Ratio. Figure: Ratio of PMOS ION and IOFF at different corners (PMOS ION / IOFF, log scale, versus supply in V, 0 to 1.4; corners TT, FF, FS, SF and SS). Observation: Significant β ratio variation is observed in low voltage at different temperatures.
  28. 28. A Look into other Logic families: The conventional Complementary MOS (CMOS) logic family poses several disadvantages when operated at subthreshold voltages. A few of them are: 1. High power dissipation. 2. Weak noise margins. 3. Huge delays. Thus it is evident that the CMOS logic family is not optimal for subthreshold operation.
  29. 29. A study of several other logic families is made with power and energy consumption as the prime concern.
      Table: Minimum working voltages for different logic families for a basic AND gate
        Logic Family      Minimum Voltage (mV)   Delay (ns)   Driving Current (nA)   Power (nW)   PDP (fJ)
        Sub-CMOS          250                    2.56         3330                   1859         4.759
        Pseudo NMOS       220                    4.765        102.56                 0.6023       2.87
        DTMOS             180                    8.4173       32.54                  233.63       1.97
        Domino            240                    7.6477       476.13                 639.41       4.89
        Pass Transistor   200                    4.9953       201.43                 426.17       2.13
        DTPT              175                    6.598        128.39                 204.68       1.35
      Table: Energy comparison at 250mV for different logic families for a basic AND gate
        Logic Family      Delay (ns)   Driving Current (nA)   Power (nW)   PDP (fJ)
        Sub-CMOS          2.56         3330                   1859         4.759
        Pseudo NMOS       3.8637       761.938                0.9848       3.805
        DTMOS             11.116       89.204                 1.501        16.68
        Domino            4.5477       568.31                 1.119        5.09
        Pass Transistor   2.2641       652.88                 1.502        3.39
        DTPT              1.8432       830                    1.503        2.77
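The PDP column is the product of the power and delay columns, and the small sketch below recomputes it for rows of the first table and picks the family with the lowest PDP. The Pseudo NMOS row is left out because its printed power value appears to be on a different unit scale from the other rows; that reading is an assumption. The helper names are hypothetical.

```python
# (logic family, delay in ns, power in nW, PDP in fJ as printed in the first table)
ROWS = [
    ("Sub-CMOS", 2.56, 1859.0, 4.759),
    ("DTMOS", 8.4173, 233.63, 1.97),
    ("Domino", 7.6477, 639.41, 4.89),
    ("Pass Transistor", 4.9953, 426.17, 2.13),
    ("DTPT", 6.598, 204.68, 1.35),
]


def pdp_fj(delay_ns: float, power_nw: float) -> float:
    """Power-delay product in fJ: 1 ns * 1 nW = 1e-18 J = 0.001 fJ."""
    return delay_ns * power_nw / 1000.0


def lowest_pdp_family(rows) -> str:
    """Name of the logic family with the smallest recomputed PDP."""
    return min(rows, key=lambda row: pdp_fj(row[1], row[2]))[0]


for name, delay_ns, power_nw, printed_fj in ROWS:
    print(f"{name:16s} recomputed {pdp_fj(delay_ns, power_nw):.3f} fJ, printed {printed_fj} fJ")
print("Lowest PDP:", lowest_pdp_family(ROWS))
```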
  30. 30. Custom Cell Library: All the standard cells are designed in 90nm PT technology. The cells are fine-tuned for their sizing, driving capability and minimum working voltage. The cells that are customized are: Inverter, Buffer, And, Or, Xor, Xnor.
  31. 31. Inverter: This is the only gate in the library that is based on CMOS technology. The only modification is that the driving capability of the cell is increased by improving the effective channel length of the P and N devices, as shown.
  32. 32. Buffer: The buffer gate is obtained by connecting two inverters in series.
  33. 33. And (1 of 2): Operation: When A=0, B=0, transistors p1, n1, n3 are on and p2, n2, p3, p4 are off, and the gate transmits gnd. When A=0, B=1, transistors p1, n1, p3 are on and p2, n2, n3, p4 are off, and the gate transmits gnd. When A=1, B=0, transistors p2, n2, n3, p4 are on and p1, n1, p3 are off, and the gate transmits B. When A=1, B=1, transistors p2, n2, p3, p4 are on and p1, n1, n3 are off, and the gate transmits vdc. Figure: And gate (inputs A, A', B, B'; transistors p1 to p4 and n1 to n3; rails vdc and gnd; node output).
  34. 34. And (2 of 2): Need for the additional MOSFETs n3, p3, p4: When the inputs are A=1, B=0, the output node is discharged to zero. When the inputs are A=1, B=1, the output should be connected to B, which should charge it to '1'. But because of the larger subthreshold delay, the node that was discharged earlier takes a longer time to charge to '1'. Hence an alternate path is provided to charge the output node to '1'. Figure: And gate.
  35. 35. Or (1 of 2): Operation: When A=0, B=0, transistors p1, n1 are on and p2 is off, and the gate transmits B. When A=0, B=1, transistors p1, n1 are on and p2 is off, and the gate transmits B. When A=1, B=0, transistors p1, n1 are off and p2 is on, and the gate transmits A. When A=1, B=1, transistors p1, n1 are off and p2 is on, and the gate transmits A. Figure: Or gate (inputs A, A', B; transistors p1, p2, n1; node output).
  36. 36. Or (2 of 2): This works fine in the strong inversion region. In subthreshold mode, however, the output current is not sufficient for the gate to drive an FO4 load. Hence a chain of two inverters is connected at the final output, and the result is used as the custom OR gate. Figure: Or gate.
  37. 37. Xnor: Operation: When A=0, B=0, transistors p1, n1 are on and p2, n2 are off, and the gate transmits B'. When A=0, B=1, transistors p1, n1 are on and p2, n2 are off, and the gate transmits B'. When A=1, B=0, transistors p1, n1 are off and p2, n2 are on, and the gate transmits B. When A=1, B=1, transistors p1, n1 are off and p2, n2 are on, and the gate transmits B. Figure: Xnor gate (inputs A, A', B, B'; transistors p1, p2, n1, n2; node output).
  38. 38. Xor (1 of 2): Operation: When A=0, B=0, transistors p1, n1 are off and p2, n2 are on, and the gate transmits B. When A=0, B=1, transistors p1, n1 are off and p2, n2 are on, and the gate transmits B. When A=1, B=0, transistors p1, n1 are on and p2, n2 are off, and the gate transmits B'. When A=1, B=1, transistors p1, n1 are on and p2, n2 are off, and the gate transmits B'. Figure: Xor gate (inputs A, B; transistors p1, p2, n1, n2; node output).
  39. 39. Xor (2 of 2): However, the direct XOR implementation is not used in our custom library, because investigation showed that the XOR derived from the XNOR works at a much lower minimum working voltage than the direct XOR implementation. The details are given in the following slides. Figure: Xor gate.
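The operation descriptions of the And, Or, Xnor and Xor cells all follow the same pattern: input A selects which signal is passed to the output. A minimal behavioural sketch of that selection is given below. It models logic values only, ignoring transistor-level effects such as the alternate charging path in the And gate and the output inverters of the Or gate; the function names are hypothetical.

```python
def pt_and(a: int, b: int) -> int:
    """A=0 passes gnd (0); A=1 passes B."""
    return b if a else 0


def pt_or(a: int, b: int) -> int:
    """A=0 passes B; A=1 passes A."""
    return a if a else b


def pt_xnor(a: int, b: int) -> int:
    """A=0 passes B'; A=1 passes B."""
    return b if a else 1 - b


def pt_xor(a: int, b: int) -> int:
    """A=0 passes B; A=1 passes B'."""
    return 1 - b if a else b


# Print the truth table of each pass-transistor cell.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, pt_and(a, b), pt_or(a, b), pt_xnor(a, b), pt_xor(a, b))
```

Each function reduces to the expected Boolean operation, which confirms that the four pass conditions listed per gate form a complete truth table.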
  40. 40. Summary of the standard cells in PT technology
      Table: Electrical characteristics of different basic cells using pass transistor logic at the TT process corner
        Basic cell   Minimum Voltage (mV)   Delay (ns)   Driving Current (fA)   Power (nW)   PDP (aJ)
        Buffer       148                    2.7258       582.06                 0.134        0.365
        Inverter     150                    1.5655       590.65                 0.197        0.308
        XOR          155                    1.5739       611.69                 0.562        0.884
        NAND         170                    0.9638       673.64                 0.435        0.419
        AND          175                    2.1523       689.82                 0.47         1.011
        OR           155                    3.9219       611.81                 0.431        1.6903
        Full adder   185                    2.9647       734.61                 29.516       87.506
  41. 41. Design of a MAC Unit: Multiply-accumulate (MAC) is one of the most frequently occurring and energy-consuming operations in DSP and other computationally intensive applications. It is a fundamental building block in all DSP tasks. Designing an ultra-low power MAC is therefore a subject of substantial research interest. An energy-efficient MAC unit is designed using the custom cell library.
  42. 42. Design of a MAC Unit: Brief Specifications. Inputs: 8-bit multiplier, 8-bit multiplicand, 17-bit addend. Outputs: 17-bit MAC output. Type of multiplier: radix-4 Booth encoded multiplier. Type of adder: ripple carry adder.
  43. 43. Block diagram of MAC unit. Figure: Block diagram of MAC unit (multiplier section: inputs MD<7:0> and MR<7:0>, 2s complement block giving -MD, shifters giving -2MD and 2MD, Booth encoder, partial product generation blocks PP0 to PP3 and P0 to P3, partial product adder; adder section: adder with <16:0> signals driving the output).
  44. 44. Flowchart of MAC Unit. Figure: Flowchart for MAC operation (blocks: multiplicand, 2s complement, shifters, multiplier, Booth encoder, partial product generation, partial product addition, adder input, adder, MAC output).
  45. 45. Sequence of logic flow: The multiplicand (MD) input enters the 2s complement block, which negates the value of MD. The resulting -MD is shifted left to give -2MD. The non-negated MD is also shifted left to obtain 2MD. The Booth encoder block encodes the 8-bit multiplier (MR) into 12 bits, which are used to control the partial product generation. The partial product generation involves selecting four 8-bit vectors based on the encoded bits. The four partial products are generated by the PP0, PP1, PP2 and PP3 blocks respectively. The partial products are shifted and sign-extended to 16 bits by the P0, P1, P2 and P3 blocks respectively. The resulting partial products are finally added to obtain the 17-bit multiplier output. A 17-bit external input is added to the multiplier product to give the final MAC output.
  46. 46. Modified Booth encoding algorithm: Modified Booth encoding is an often-selected algorithm for the multiplication of signed numbers. This scheme is selected because it reduces the number of partial products to half the number of multiplier bits compared with a conventional Booth encoding scheme. This reduces the number of iterations at the cost of increased circuit complexity. Thus the power consumption is also reduced by half. The modified Booth encoder based multiplier architecture is designed with power consumption in view.
  47. 47. Algorithm Description and Control Implementation: The modified Booth algorithm considers 3 multiplier bits (MRi+1, MRi, MRi−1) at a time and encodes them to one of the values -2MD, -MD, 0, MD, 2MD according to the table below. MRi refers to the i-th bit of the multiplier, where i advances in steps of 2 from 0 up to the number of multiplier bits, and MR−1 is taken to be 0.
      Table: Mapping of multiplier bits to encoded bits using Radix 4 Booth Encoder
        MRi+1   MRi   MRi−1   Partial Product   A   B   C
        0       0     0       0                 0   0   0
        0       0     1       MD                0   1   0
        0       1     0       MD                0   1   0
        0       1     1       2MD               0   0   1
        1       0     0       -2MD              1   0   1
        1       0     1       -MD               1   1   0
        1       1     0       -MD               1   1   0
        1       1     1       0                 1   0   0
      where A, B, C are the encoded bits for the given MRi+1, MRi, MRi−1 bits of the multiplier bit sequence, starting from the LSB.
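The encoding table can be expressed directly as a lookup, which makes the grouping of multiplier bits explicit. The sketch below is an illustrative software model of the table, not the hardware encoder itself; the function name and the bit-string interface are assumptions made for the example.

```python
# Radix-4 Booth table from the slide:
# (MR[i+1], MR[i], MR[i-1]) -> (selected partial product, encoded bits "ABC")
BOOTH_TABLE = {
    (0, 0, 0): ("0", "000"),
    (0, 0, 1): ("MD", "010"),
    (0, 1, 0): ("MD", "010"),
    (0, 1, 1): ("2MD", "001"),
    (1, 0, 0): ("-2MD", "101"),
    (1, 0, 1): ("-MD", "110"),
    (1, 1, 0): ("-MD", "110"),
    (1, 1, 1): ("0", "100"),
}


def booth_encode(mr_bits: str):
    """Encode an MSB-first multiplier bit string of even length.

    Returns one (partial product, "ABC") pair per bit group, LSB group first.
    """
    bits = [int(ch) for ch in reversed(mr_bits)]  # bits[i] is MR_i
    groups = []
    for i in range(0, len(bits), 2):
        previous = bits[i - 1] if i > 0 else 0    # MR_-1 is taken to be 0
        groups.append(BOOTH_TABLE[(bits[i + 1], bits[i], previous)])
    return groups


for selection, abc in booth_encode("01001000"):
    print(f"{selection:5s} {abc}")
```

An 8-bit multiplier yields four groups of three encoded bits, that is, the 12 control bits mentioned in the logic flow.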
  48. 48. Example: Consider an example where
        Multiplier (MR):   01001000
        Multiplicand (MD): 00110110
        Adder input:       01100010001000001
      So 2MD = 01101100, -MD = 11001010, -2MD = 10010100.
      Encoding the MR (with MR−1 = 0 appended: 010010000), starting from the LSB:
        000 encodes to 000
        100 encodes to 101
        001 encodes to 010
        010 encodes to 010
      Partial products, and the same after shifting and sign extending:
        pp0: 00000000    p0: 0000000000000000
        pp1: 10010100    p1: 1111111001010000
        pp2: 00110110    p2: 0000001101100000
        pp3: 00110110    p3: 0000110110000000
        Adder      = 01100010001000001
      + Product    = 00000111100110000
        MAC OUTPUT = 01101001101110001
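The arithmetic of this example can be checked with a short software model. The sketch below derives the radix-4 Booth digits of the multiplier, forms the shifted partial products, sums them into the product and adds the 17-bit adder input. It is an integer-level illustration under the assumption that MD is given as an ordinary signed integer; it is not a model of the gate-level design, and the function names are hypothetical.

```python
def booth_digits(mr: int, width: int = 8):
    """Radix-4 Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    digits = []
    previous = 0                       # MR_-1 is taken to be 0
    for i in range(0, width, 2):
        low = (mr >> i) & 1            # MR_i
        high = (mr >> (i + 1)) & 1     # MR_i+1
        digits.append(low + previous - 2 * high)
        previous = high
    return digits


def partial_products(md: int, mr: int, width: int = 8):
    """Shifted partial products: digit k is weighted by 4**k."""
    return [digit * md * 4 ** k for k, digit in enumerate(booth_digits(mr, width))]


def mac(md: int, mr: int, addend: int, width: int = 8) -> int:
    """Multiply-accumulate: md * mr + addend via Booth partial products."""
    return sum(partial_products(md, mr, width)) + addend


MD = 0b00110110
MR = 0b01001000
ADDEND = 0b01100010001000001

for k, pp in enumerate(partial_products(MD, MR)):
    # Mask to 16 bits to show the two's complement, sign-extended form.
    print(f"p{k}: {pp & 0xFFFF:016b}")
print(f"product: {sum(partial_products(MD, MR)):017b}")
print(f"MAC:     {mac(MD, MR, ADDEND):017b}")
```

The partial products printed by the loop correspond to p0 through p3 in the example, and their sum equals the direct product MD × MR.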
  49. 49. Test Chip: A 17-bit subthreshold MAC unit is implemented in 90nm CMOS technology. The fan-in of each logic gate is carefully selected to achieve maximum robustness at near-threshold supply voltages. Since the pad-frame input to the MAC is at 1.2V, the input data and clock signals are down-converted using a down level shifter. The output of the MAC is up-converted to 1.2V by an efficient 2-stage level shifter before being latched to the output pad-frame. The design layout is done using Cadence Virtuoso. A total of four metal layers are used in the MAC unit. The MAC unit measures 658.4µm × 149.49µm, an area of 0.098mm² in 90nm technology. The transistor-level circuit analysis is performed using random test vectors. The design is tested extensively for PVT variations.
  50. 50. Full chip layout of the proposed design with pad frame. Figure: Layout of MAC unit.
  51. 51. Design Specs
      Table: Subthreshold MAC design specifications
        Minimum voltage        220mV
        Speed                  1MHz
        Energy per operation   1.63pJ
        Average power          2.04µW
        Standby power          1.4µW
      The MAC unit is configured to operate at an extremely low voltage of 220mV at a speed of 1MHz for the worst-case process corner (SS) at room temperature, and remains functional down to 180mV at the typical corner (TT).
  52. 52. MAC Simulation Results (1 of 8): Figure: Average power consumption of MAC at different supply voltages (power in µW, 0 to 100, versus voltage in mV, 200 to 500).
  53. 53. MAC Simulation Results (2 of 8): Figure: Operating frequency of MAC unit at different supply voltages under global variation (frequency in MHz, 0 to 12, versus voltage in mV, 220 to 250; corners SS, SF, FS, TT and FF).
  54. 54. MAC Simulation Results (3 of 8): Figure: Energy/operation at different supply voltages (energy/op in fJ, 1000 to 7000, versus voltage in mV, 200 to 500).
  55. 55. MAC Simulation Results (4 of 8): Figure: Short circuit, static and capacitive current ratings at different supply voltages (current in µA versus voltage in mV, 200 to 500; curves: static current, dynamic current, capacitive current).
  56. 56. MAC Simulation Results (5 of 8): Figure: Standby power versus supply voltage at different temperatures (standby power in µW versus supply in mV, 200 to 500; temperatures 0C, 27C and 100C).
  57. 57. MAC Simulation Results (6 of 8): Figure: Current ratings at different operating temperatures at supply voltage 220mV (current in µA versus temperature in C, −40 to 120; curves: static current, dynamic current, capacitive current).
  58. 58. MAC Simulation Results (7 of 8): Figure: Performance of MAC at different temperatures at supply voltage 220mV (delay in ns, 100 to 1000, versus temperature in C, −40 to 120).
  59. 59. MAC Simulation Results (8 of 8): Figure: Average power of MAC at different temperatures at supply voltage 220mV (power in µW, 0 to 300, versus temperature in C, −40 to 120).
  60. 60. Conclusion: In this research project, several logic families are investigated in the subthreshold range to build optimal subthreshold standard cells. The pass transistor logic family was chosen for its energy efficiency compared to other subthreshold logic families. An optimal design choice is made for each subthreshold standard cell based on its power-delay product. A 17-bit subthreshold MAC chip is implemented using the customized subthreshold standard cells. The custom cell layout is done using Cadence Virtuoso and tested at all process corners using the NanoSim simulator. The chip is designed to work at a minimum voltage of 220mV and consumes an ultra-low energy of as little as 1.62pJ per operation at an operating frequency of 1.0MHz.
