Methods for Achieving RTL to Gate Power Consistency

Consistency between RTL and signoff power numbers is necessary for making early low-power design decisions with confidence. Ensuring this consistency requires a modeling and characterization approach that takes physical design parameters into account. This presentation covers the factors that affect RTL power accuracy and how PowerArtist™ PACE™ technology models physical effects to deliver predictable RTL power accuracy for sub-20nm designs. Learn more on our website: https://bit.ly/10Rpcxu

Published in: Engineering

  1. Methods for Achieving RTL to Gate Power Consistency
     Design Automation Conference 2014
     6/23/2014, © 2014 ANSYS, Inc.
  2. PowerArtist™: RTL Design-for-Power Platform
     [Diagram labels: Original RTL; Power Analysis and Debug; Automated Power Reduction; Low-Power RTL; Links with Physical; RTL Power; Physical Power; PACE; RPM]
  3. Objectives of RTL Power Analysis
     • Power trade-off analysis using relative accuracy
     • Sign off power with absolute accuracy
     • Analysis driven power reduction
     [Chart: Cumulative Area Overhead (normalized) and Total Power Savings Available (normalized) vs. # RTL Changes (Design Effort). Annotations: Maximum acceptable area impact; Maximum possible power savings; Only 5 changes gave 50% saving]
  4. RTL Power: Inputs for PowerArtist
     • RTL (VHDL, Verilog, SystemVerilog)
     • Power domains (UPF / CPF), shown as Vdd 1 and Vdd 2
     • Capacitance model (WLM / PACE)
     • Activity (FSDB / VCD / SAIF)
     • Clock tree, gating (SDC, PACE, user input)
     • Power models (Liberty .lib)
     Example RTL (diagram labels: mux and register; register; clk):
         module PA ( ...
           always @ (posedge clk) begin
             dout <= din1;
           end
           assign out = sel ? dout : din2;
         ...
         endmodule
  5. Factors Affecting RTL Power Accuracy
     • Synthesis Modeling: Inferencing; Multi-VT; Cell Selection; Micro-architecture
     • Algorithmic: RTL Models; Activity Propagation; Timing; Power Computation
     • Physical Models: Clock Tree; Wire Cap; Transition Time
     • Low Power Structures: Voltage / Power Domains; CPF / UPF
     NOTE: Algorithmic and Low Power structures are not configured for accuracy
  6. Synthesis Modeling Aspects for RTL Power
     Inferencing
     • Keep optimization settings consistent with synthesis
     • Enable DesignWare flow (if DW components are present)
     Multi-VT
     • Apply consistent multi-VT settings from synthesis
     Cell Selection
     • Fine-tune cell selection based on synthesis netlist
     • Apply boundary conditions based on load / frequency
     Microarchitecture
     • Apply microarchitectures for macros (e.g. adders, multipliers)
  7. Synthesis Modeling Aspects in PowerArtist
     • Constant Multipliers (diagram label: CSA)
         b = 8'b11000100;
         assign z = a * b;
     • Chains of Adders (diagram: a, b, c, d summed through CSA stages vs. a chain of + operators)
         assign z = a + b + c + d ;
     • Look-Up Table Optimization (diagram: address, OR plane, data)
         case (address)
           8'd0 : data = {32'd0};
           8'd1 : data = {32'd12};
           …
         endcase
       – Optimized and-or plane by sharing common logic
     • Cell mapping to basic 2-input cells
     • Modeled using AOIs
     • Un-encoded mux
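A rough illustration of why a constant multiplier need not be costed as a full multiplier: the constant on this slide, 8'b11000100, has only three set bits, so a * b reduces to three shifted copies of a, which a carry-save adder (CSA) can sum. The sketch below is in Python rather than Verilog, the helper names `shift_add_terms` and `multiply_by_constant` are hypothetical, and it makes no claim about how PowerArtist models this internally.

```python
def shift_add_terms(constant: int) -> list[int]:
    """Return the bit positions set in `constant`, most significant first."""
    return [i for i in range(constant.bit_length() - 1, -1, -1)
            if (constant >> i) & 1]


def multiply_by_constant(a: int, constant: int) -> int:
    """Multiply by summing one shifted copy of `a` per set bit of the constant."""
    return sum(a << shift for shift in shift_add_terms(constant))


B = 0b11000100  # the constant from the slide (decimal 196)

print(shift_add_terms(B))          # [7, 6, 2]
print(multiply_by_constant(5, B))  # 980
```

Under this view, the number of partial products grows with the number of set bits in the constant rather than with its width.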
  8. RTL Power Accuracy Using Wire Load Models
     Mobile SoC Case Study
     – Large difference seen with simple wire load models
     – Clock and Combo power show the largest difference
     – Total power shows 40% difference wrt gate level
     ** Note: GATE considered to be most accurate
     [Chart: RTL Wire Load Models vs. Gate Level (Different Power Categories). Series: RTL WLM, GATE, %diff. Axes: Power (Watts), % Difference. %diff data labels in chart order: 28.8%, 11.0%, -9.2%, 69.2%, 41.2%, 32.3%, 40.2%]
  9. Physical Aspects Modeling for Power
     Clock Tree
     • Modeling clock tree
     • Balanced and Clock Mesh topology
     Wire Cap
     • Accurately model post-layout wire capacitance
     • Model capacitance profile for different types of nets
     Transition Time
     • Accurately model slew for realistic power
     • Both clock and logic nets
  10. Physical Modeling: Clock Tree
     • RTL clock power accuracy requirements
       – Understand clock gating methodology
       – Understand clock tree topology and buffering
     • Difficult for RTL designers to get data from backend team
     [Diagrams: Balanced Clock Tree; Clock Mesh Topology]
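To see why clock tree topology and buffering matter for RTL clock power, the following minimal Python sketch counts buffers in an idealized balanced tree. Everything in it is an assumption made for illustration: the function name `balanced_tree_buffers`, the fixed maximum fanout, and the example of 1000 sinks are hypothetical, and it does not describe how PowerArtist or PACE infers a clock tree.

```python
def balanced_tree_buffers(num_sinks: int, max_fanout: int) -> list[int]:
    """Buffers needed per level of an idealized balanced tree, leaf level first.

    Assumes every buffer drives at most `max_fanout` loads and that levels are
    added until a single root buffer remains.
    """
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fanout of two or more")
    levels = []
    nodes = num_sinks
    while nodes > 1:
        nodes = -(-nodes // max_fanout)  # ceiling division
        levels.append(nodes)
    return levels


levels = balanced_tree_buffers(num_sinks=1000, max_fanout=16)
print(levels, sum(levels))  # [63, 4, 1] 68
```

Changing the assumed fanout changes both the depth and the buffer count, which is one reason RTL clock power estimates depend on information that normally comes from the backend flow.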
  11. Physical Modeling: Wire Cap
     Traditional Wire Load Models
     • Not available in some vendor libraries; often not calibrated
     • Custom WLMs not portable across blocks and designs
     • Simplistic modeling results in poor accuracy
     Example (40nm, 45k nets with fanout 1): WLM assigns 1fF for all nets vs. SPEF that varies 0.2fF to >129fF
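Because switching power scales linearly with net capacitance, a flat 1fF wire load model can be off by a large factor on individual nets. The Python sketch below uses the common textbook expression P = 0.5 · toggles per cycle · C · Vdd² · f. The supply voltage, clock frequency, and toggle rate are hypothetical values chosen only for illustration; only the 1fF, 0.2fF, and 129fF capacitances come from the slide.

```python
FEMTOFARAD = 1e-15

# Hypothetical operating point, not taken from the presentation.
VDD_VOLTS = 0.9
FREQ_HZ = 1e9
TOGGLES_PER_CYCLE = 0.2


def switching_power_watts(cap_farads: float) -> float:
    """Textbook dynamic power of one net: 0.5 * toggle rate * C * Vdd^2 * f."""
    return 0.5 * TOGGLES_PER_CYCLE * cap_farads * VDD_VOLTS ** 2 * FREQ_HZ


def wlm_to_spef_ratio(spef_cap_ff: float, wlm_cap_ff: float = 1.0) -> float:
    """How far the flat-WLM power estimate is from the SPEF-based one."""
    return (switching_power_watts(wlm_cap_ff * FEMTOFARAD)
            / switching_power_watts(spef_cap_ff * FEMTOFARAD))


for spef_cap_ff in (0.2, 1.0, 129.0):
    print(f"SPEF cap {spef_cap_ff:6.1f} fF: WLM/SPEF power ratio = "
          f"{wlm_to_spef_ratio(spef_cap_ff):.4f}")
```

The ratio depends only on the two capacitances, so the chosen operating point does not affect it: the flat model overestimates the 0.2fF net by about 5x and underestimates the 129fF net by about 129x.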
  12. PACE™ for RTL Power Accuracy
     PACE applies from RTL to Pre-layout Power
     • Clock tree models
       – Determine buffer and CG cells per inferred clock tree
       – Supports both balanced clock tree and clock mesh
     • Wire capacitance models
       – Granular, power-oriented vs. traditional WLMs
     Bridge the RTL ↔ Implementation Gap
     [Diagram labels: RTL (the module PA example); RTL Power; Post-Layout Power; Clock distribution; Parasitics; Multiple Vt; Low-power structures; Representative Layout; PowerArtist Calibration (PACE); Statistical Models: Wire Cap and Clock]
  13. RTL Power Accuracy Using PACE Cap Models
     Mobile SoC Case Study
     – Tighter correlation seen with PACE Cap models
     – Register and Combo power are within +/-20%
     – Total power shows <5% difference wrt gate level
     ** Note: GATE considered to be most accurate
     [Chart: PACE Cap Models vs. WLM & Gate Level (Different Power Categories). Series: RTL WLM, RTL w PACE Cap, GATE, %diff. Axes: Power (Watts), % Difference. %diff data labels in chart order: -13.4%, 5.1%, -9.2%, 22.8%, 8.1%, -37.4%, 3.0%]
  14. RTL Power Accuracy Using PACE Cap + Clock Models
     Mobile SoC Case Study
     – Best correlation seen with PACE Cap + Clock models
     – Overall correlation is within +/-15%
     ** Note: GATE considered to be most accurate
     [Chart: PACE Cap+Clk Models vs. WLM & Gate Level (Different Power Categories). Series: RTL WLM, RTL w PACE Cap+Clock, GATE, %diff w/ PACE, %diff w/ WLM. Axes: Power (Watts), % Difference. %diff data labels in chart order: -13.4%, 9.9%, -9.2%, -12.8%, -9.0%, -13.6%, -9.4%]
  15. RTL Power Accuracy Using PACE Cap + Clock Models
     Mobile SoC Blocks Case Study
     – Total power with WLM differs from gate level by more than +/-30%
     – With PACE models, total power is within +/-20%
     ** Note: GATE considered to be most accurate
     [Chart: Total Power Comparison, Power (Watts) for Design 1, Design 2, Design 3. Series: RTL WLM, RTL PACE, GATE]
  16. RTL Power Accuracy Using PACE Cap + Clock Models
     Mobile SoC Blocks Case Study
     – Total power with WLM differs from gate level by more than +/-30%
     – With PACE models, total power is within +/-20%
     – Clock power with PACE is within +/-20% as well
     ** Note: GATE considered to be most accurate
     [Chart: Clock Power, RTL PACE vs. GATE, for Design 1, Design 2, Design 3. Series: GATE, RTL PACE, %diff. Axes: Power (Watts), % diff. %diff data labels: Design 1 15.5%, Design 2 19.0%, Design 3 20.7%]
  17. Nvidia Case Study: RTL Power Accuracy
     Average power in mW. BB DW = black-boxed DW instances. The % columns compare RTL PowerArtist against post-synthesis PT-PX.

                               Post-synthesis PT-PX (mW)    RTL PowerArtist (mW)         RTL PowerArtist vs. PT-PX
     Design  Instances  BB DW    Dynamic  Leakage    Total    Dynamic  Leakage    Total    Dynamic  Leakage    Total
     PR         580320      0     82.524  114.210  196.735     92.900  111.734  204.635     12.57%   -2.17%    4.02%
     TD         268993      0     89.209   38.713  127.923    101.755   35.089  136.844     14.06%   -9.36%    6.97%
     TTM        158407     14     64.828   21.353   86.181     63.583   20.212   83.795     -1.92%   -5.34%   -2.77%
     TTF        134152     64     47.850   14.874   62.724     32.563   13.431   45.995    -31.95%   -9.70%  -26.67%
     SMI       1137155    101    145.497  201.661  347.158    125.133  135.635  260.768    -14.00%  -32.74%  -24.88%
     SRF        509095     24    263.894   75.515  339.409    258.332   73.897  332.229     -2.11%   -2.14%   -2.12%
     Avg, all designs            115.634   77.721  193.355    112.378   65.000  177.378     -2.82%  -16.37%   -8.26%
     Avg, excl. SMI/TTF          125.114   62.448  187.562    129.143   60.233  189.376      3.22%   -3.55%    0.97%
     Avg, PR/TD only              85.867   76.462  162.329     97.328   73.412  170.739     13.35%   -3.99%    5.18%

     • Power correlation performed for 6 designs, 130K to 1.13M instances
     • In general, very good average power correlation observed (SMI and TTF having DWs)
     • 8-16 tests being run across the blocks
     ** Source: Nvidia-Apache Webinar, July 2013 (Miki)
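The percentage and average rows in this table can be reproduced from the per-design numbers. The short Python sketch below assumes the percentage is computed as (RTL PowerArtist − PT-PX) / PT-PX, which matches the published values; the names `TABLE`, `pct_diff`, and `mean_column` are hypothetical and used only for this illustration.

```python
# Average power in mW, copied from the table above:
# (PT-PX dynamic, PT-PX leakage, PowerArtist dynamic, PowerArtist leakage)
TABLE = {
    "PR":  (82.524, 114.210, 92.900, 111.734),
    "TD":  (89.209, 38.713, 101.755, 35.089),
    "TTM": (64.828, 21.353, 63.583, 20.212),
    "TTF": (47.850, 14.874, 32.563, 13.431),
    "SMI": (145.497, 201.661, 125.133, 135.635),
    "SRF": (263.894, 75.515, 258.332, 73.897),
}


def pct_diff(rtl_mw: float, gate_mw: float) -> float:
    """Signed percentage difference of the RTL number relative to the gate number."""
    return 100.0 * (rtl_mw - gate_mw) / gate_mw


def mean_column(designs, column: int) -> float:
    """Arithmetic mean of one TABLE column over the named designs."""
    return sum(TABLE[d][column] for d in designs) / len(designs)


for name, (g_dyn, g_leak, r_dyn, r_leak) in TABLE.items():
    print(f"{name:4s} dynamic {pct_diff(r_dyn, g_dyn):+7.2f}%  "
          f"leakage {pct_diff(r_leak, g_leak):+7.2f}%")

subsets = {
    "all designs": list(TABLE),
    "excl. SMI/TTF": ["PR", "TD", "TTM", "SRF"],
    "PR/TD only": ["PR", "TD"],
}
for label, designs in subsets.items():
    print(f"{label:14s} PT-PX dynamic average = {mean_column(designs, 0):.3f} mW")
```

Recomputing the three averages this way is also how the summary rows can be matched to their labels, since each subset yields a distinct PT-PX dynamic average.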
  18. Summary
     • RTL power enables early design trade-offs for high power impact
     • PowerArtist provides predictable RTL power accuracy wrt GATE
     • PowerArtist has advanced synthesis and physical modeling techniques
     • PowerArtist PACE modeling is proven across designs
     • Use PowerArtist for RTL power sign-off with absolute accuracy
