6/23/2014 © 2014 ANSYS, Inc. 1 
Methods for Achieving RTL to Gate Power Consistency
Design Automation Conference 2014

PowerArtist™: RTL Design-for-Power Platform
• Power Analysis and Debug
• Automated Power Reduction (Original RTL → Low-Power RTL)
• Links with Physical (Physical Power ↔ RTL Power, via PACE and RPM)

Objectives of RTL Power Analysis
• Power trade-off analysis using relative accuracy 
• Sign off power with absolute accuracy 
• Analysis driven power reduction 
[Figure: Cumulative Area Overhead (normalized) and Total Power Savings Available (normalized) vs. # RTL Changes (Design Effort), 1 to 291. Annotations: maximum acceptable area impact; maximum possible power savings; only 5 changes gave 50% saving.]

RTL Power: Inputs for PowerArtist
• RTL (VHDL, Verilog, SystemVerilog)
    module PA (
    ...
    always @ (posedge clk) begin
      dout <= din1;
    end
    assign out = sel ? dout : din2;
    ...
    endmodule
• Power domains (UPF / CPF), e.g. Vdd 1 and Vdd 2
• Capacitance model (WLM / PACE)
• Activity (FSDB / VCD / SAIF)
• Clock tree, gating (SDC, PACE, user input)
• Power models (Liberty .lib)
[Figure: RTL Power Analysis block diagram with elements labeled mux, and, register, register, driven by clk.]

Factors Affecting RTL Power Accuracy
• Synthesis Modeling: Inferencing, Multi-VT, Cell Selection, Micro-architecture, Algorithmic
• RTL Models: Activity Propagation, Timing, Power Computation
• Physical Models: Clock Tree, Wire Cap, Transition Time
• Low Power Structures: Voltage / Power Domains, CPF / UPF
NOTE: Algorithmic and Low Power structures are not configured for accuracy

Synthesis Modeling Aspects for RTL Power
• Inferencing
  – Keep optimization settings consistent with synthesis
  – Enable DesignWare flow (if DW components are present)
• Multi-VT
  – Apply consistent multi-VT settings from synthesis
• Cell Selection
  – Fine-tune cell selection based on synthesis netlist
  – Apply boundary conditions based on load / frequency
• Microarchitecture
  – Apply microarchitectures for macros (e.g. adders, multipliers)

Synthesis Modeling Aspects in PowerArtist
• Constant Multipliers (CSA)
    b = 8'b11000100;
    assign z = a * b;
• Chains of Adders
    assign z = a + b + c + d;
  [Figure: a, b, c feed a CSA; its outputs and d feed a second CSA, followed by a final adder. Shown next to a serial chain: a + b, then + c, then + d.]
• Look-Up Table Optimization
    case (address)
      8'd0 : data = {32'd0};
      8'd1 : data = {32'd12};
      …
    endcase
  – Optimized and-or plane by sharing common logic (address in, data out)
  – Cell mapping to basic 2-input cells
• Un-encoded mux
  – Modeled using AOIs
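
The constant-multiplier and adder-chain items above can be illustrated with a small behavioral sketch. It is not PowerArtist's implementation: the helper names (csa, add4_with_csa, constant_shifts, multiply_by_constant) are hypothetical, and the shift-and-add decomposition is one common way a constant multiply can be built, assumed here for illustration. The constant 8'b11000100 has three set bits, so a * b reduces to three shifted copies of a, which one carry-save adder (CSA) stage plus a final adder can sum.

```python
def csa(x, y, z):
    """3:2 carry-save adder on non-negative ints.

    Returns (partial_sum, carry) such that partial_sum + carry == x + y + z.
    """
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


def add4_with_csa(a, b, c, d):
    """a + b + c + d using two CSA stages and one final carry-propagate add."""
    s1, c1 = csa(a, b, c)
    s2, c2 = csa(s1, c1, d)
    return s2 + c2


def constant_shifts(constant):
    """Bit positions that are set in the constant (one partial product each)."""
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(a, constant):
    """Multiply by a constant as a sum of shifted copies of a, reduced with CSAs."""
    terms = [a << shift for shift in constant_shifts(constant)]
    while len(terms) > 2:
        s, c = csa(terms[0], terms[1], terms[2])
        terms = [s, c] + terms[3:]
    return sum(terms)


B = 0b11000100  # the constant from the slide (decimal 196)
print(constant_shifts(B))
print(multiply_by_constant(13, B) == 13 * B)
print(add4_with_csa(3, 5, 7, 9))
```

The point of the sketch is only that the number of partial products, and therefore the adder structure whose power is being estimated, depends on the constant and on how the additions are grouped.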

RTL Power Accuracy Using Wire Load Models
Mobile SoC Case Study
– Large difference seen with simple wire load models
– Clock and Combo power show the largest difference
– Total power shows 40% difference wrt gate level
** Note: GATE considered to be most accurate
[Figure: RTL Wire Load Models vs. Gate Level (Different Power Categories). Bars: RTL WLM, GATE (Power, Watts); line: %diff (% Difference). %diff values shown: 28.8%, 11.0%, -9.2%, 69.2%, 41.2%, 32.3%, 40.2%.]

Physical Aspects Modeling for Power
• Clock Tree
  – Modeling clock tree
  – Balanced and Clock Mesh topology
• Wire Cap
  – Accurately model post-layout wire capacitance
  – Model capacitance profile for different types of nets
• Transition Time
  – Accurately model slew for realistic power
  – Both clock and logic nets

Physical Modeling: Clock Tree
• RTL clock power accuracy requirements 
– Understand clock gating methodology 
– Understand clock tree topology and buffering 
• Difficult for RTL designers to get data from backend team 
[Figures: Balanced Clock Tree; Clock Mesh Topology]

Physical Modeling: Wire Cap
Traditional Wire Load Models
• Not available in some vendor libraries; often not calibrated
• Custom WLMs not portable across blocks and designs
• Simplistic modeling results in poor accuracy
Example (40nm, 45k nets with fanout 1): WLM assigns 1fF for all nets vs. SPEF that varies 0.2fF to >129fF
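
To see why a single-value wire load model can skew power, here is a minimal sketch that compares switching power computed from a flat 1fF per net against per-net capacitances. Only the 1fF WLM value and the 0.2fF and 129fF endpoints come from the slide; the intermediate capacitances, the supply voltage, and the toggle rate are hypothetical, and the formula used is the standard energy-per-transition estimate of 0.5 · C · V².

```python
FF = 1e-15  # one femtofarad in farads


def switching_power_w(cap_farads, vdd_volts, toggles_per_second):
    """Average switching power of one net: 0.5 * C * V^2 per transition."""
    return 0.5 * cap_farads * vdd_volts ** 2 * toggles_per_second


# Hypothetical per-net capacitances in fF; only the 0.2 and 129.0 endpoints
# are taken from the slide.
spef_caps_ff = [0.2, 0.5, 1.0, 4.0, 129.0]
WLM_CAP_FF = 1.0     # flat value the WLM assigns to every fanout-1 net (from the slide)
VDD_VOLTS = 0.9      # hypothetical supply voltage
TOGGLE_RATE = 100e6  # hypothetical transitions per second, same for every net

spef_power = sum(switching_power_w(c * FF, VDD_VOLTS, TOGGLE_RATE) for c in spef_caps_ff)
wlm_power = sum(switching_power_w(WLM_CAP_FF * FF, VDD_VOLTS, TOGGLE_RATE) for _ in spef_caps_ff)
error_pct = 100.0 * (wlm_power - spef_power) / spef_power

print(f"SPEF-based: {spef_power:.3e} W, WLM-based: {wlm_power:.3e} W, error: {error_pct:.1f}%")
```

With equal activity on every net the error depends only on the capacitance distribution, so a few long nets are enough to dominate the total.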

PACE™ for RTL Power Accuracy
PACE applies from RTL to Pre-layout Power
• Clock tree models
  – Determine buffer and CG cells per inferred clock tree
  – Supports both balanced clock tree as well as clock mesh
• Wire capacitance models
  – Granular, power-oriented vs. traditional WLMs
Bridge the RTL ↔ Implementation Gap
• Gap between RTL Power and Post-Layout Power: clock distribution, parasitics, multiple Vt, low-power structures
[Figure: Representative Layout → PowerArtist Calibration (PACE) → Statistical Models: Wire Cap and Clock, applied to the RTL (the module PA example) for RTL Power.]

RTL Power Accuracy Using PACE Cap Models
Mobile SoC Case Study
– Tighter correlation seen with PACE Cap models
– Register and Combo power are within +/-20%
– Total power shows <5% difference wrt gate level
** Note: GATE considered to be most accurate
[Figure: PACE Cap Models vs. WLM & Gate Level (Different Power Categories). Bars: RTL WLM, RTL w PACE Cap, GATE (Power, Watts); line: %diff (% Difference). %diff values shown: -13.4%, 5.1%, -9.2%, 22.8%, 8.1%, -37.4%, 3.0%.]

RTL Power Accuracy Using PACE Cap + Clock Models
Mobile SoC Case Study
– Best correlation seen with PACE Cap + Clock models
– Overall correlation is within +/-15%
** Note: GATE considered to be most accurate
[Figure: PACE Cap+Clk Models vs. WLM & Gate Level (Different Power Categories). Bars: RTL WLM, RTL w PACE Cap+Clock, GATE (Power, Watts); lines: %diff w/ PACE, %diff w/ WLM (% Difference). %diff values shown: -13.4%, 9.9%, -9.2%, -12.8%, -9.0%, -13.6%, -9.4%.]

RTL Power Accuracy Using PACE Cap + Clock Models
Mobile SoC Blocks Case Study
– Total power difference with WLM is greater than +/-30%
– With PACE models, the difference is within +/-20%
** Note: GATE considered to be most accurate
[Figure: Total Power Comparison for Design 1, Design 2, Design 3. Bars: RTL WLM, RTL PACE, GATE (Power, Watts).]

RTL Power Accuracy Using PACE Cap + Clock Models
Mobile SoC Blocks Case Study
– Clock power with PACE is within roughly +/-20% as well (15.5%, 19.0%, 20.7%)
** Note: GATE considered to be most accurate
[Figure: Clock Power, RTL PACE vs. GATE, for Design 1, Design 2, Design 3. Bars: GATE, RTL PACE (Power, Watts); line: %diff (% diff) of 15.5%, 19.0%, 20.7%.]

Nvidia Case Study: RTL Power Accuracy
Power columns are average power in mW. DW = number of black-boxed DW instances. % columns compare RTL PowerArtist against post-synthesis PT-PX.

                                            Post-synthesis PT-PX (mW)        RTL PowerArtist (mW)   RTL PowerArtist vs. PT-PX
DESIGN                     Instances   DW   Dynamic  Leakage    Total   Dynamic  Leakage    Total     % Dyn   % Leak  % Total
PR                            580320    0    82.524  114.210  196.735    92.900  111.734  204.635    12.57%   -2.17%    4.02%
TD                            268993    0    89.209   38.713  127.923   101.755   35.089  136.844    14.06%   -9.36%    6.97%
TTM                           158407   14    64.828   21.353   86.181    63.583   20.212   83.795    -1.92%   -5.34%   -2.77%
TTF                           134152   64    47.850   14.874   62.724    32.563   13.431   45.995   -31.95%   -9.70%  -26.67%
SMI                          1137155  101   145.497  201.661  347.158   125.133  135.635  260.768   -14.00%  -32.74%  -24.88%
SRF                           509095   24   263.894   75.515  339.409   258.332   73.897  332.229    -2.11%   -2.14%   -2.12%
Average, all designs                        115.634   77.721  193.355   112.378   65.000  177.378    -2.82%  -16.37%   -8.26%
Average, excluding SMI/TTF                  125.114   62.448  187.562   129.143   60.233  189.376     3.22%   -3.55%    0.97%
Average, PR/TD only                          85.867   76.462  162.329    97.328   73.412  170.739    13.35%   -3.99%    5.18%
• Power correlation performed for 6 designs ranging from 130K to 1.13M instances
• In general, very good average power correlation observed; the outliers, SMI and TTF, have the most black-boxed DW instances
• 8 to 16 tests run across the blocks
** Source: Nvidia-Apache Webinar, July 2013 (Miki)
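
The percentage columns in the table can be reproduced from the power columns. The sketch below does this for total power, taking post-synthesis PT-PX as the reference; the function names are illustrative, and the reading that each summary row is the percent difference of the averaged powers (not the average of the per-design percentages) is inferred from the numbers in the table.

```python
# Average total power in mW from the table: (post-synthesis PT-PX, RTL PowerArtist).
TOTAL_POWER_MW = {
    "PR": (196.735, 204.635),
    "TD": (127.923, 136.844),
    "TTM": (86.181, 83.795),
    "TTF": (62.724, 45.995),
    "SMI": (347.158, 260.768),
    "SRF": (339.409, 332.229),
}


def pct_diff(reference, estimate):
    """Signed difference of the estimate relative to the reference, in percent."""
    return 100.0 * (estimate - reference) / reference


def average_pct_diff(designs):
    """Percent difference between the averaged estimate and the averaged reference."""
    ref = sum(TOTAL_POWER_MW[d][0] for d in designs) / len(designs)
    est = sum(TOTAL_POWER_MW[d][1] for d in designs) / len(designs)
    return pct_diff(ref, est)


per_design = {name: round(pct_diff(*pair), 2) for name, pair in TOTAL_POWER_MW.items()}
overall = round(average_pct_diff(list(TOTAL_POWER_MW)), 2)
without_outliers = round(average_pct_diff(["PR", "TD", "TTM", "SRF"]), 2)

print(per_design)
print(overall, without_outliers)
```

Because the summary rows average the powers first, large blocks such as SMI weigh more heavily in the overall figure than small ones such as TTF.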

Summary
• RTL power analysis enables early design trade-offs with high power impact
• PowerArtist provides predictable RTL power accuracy wrt GATE 
• PowerArtist has advanced synthesis and physical modeling techniques 
• PowerArtist PACE modeling is proven across designs 
• Use PowerArtist for RTL power sign-off with absolute accuracy
