Methods for Achieving RTL to Gate Power Consistency

6/23/2014 © 2014 ANSYS, Inc. 1
Design Automation Conference 2014

PowerArtist™: RTL Design-for-Power Platform
• Power analysis and debug
• Automated power reduction: original RTL → low-power RTL
• Links with physical: RTL power ↔ physical power (via PACE and RPM)

Objectives of RTL Power Analysis
• Power trade-off analysis using relative accuracy
• Power sign-off with absolute accuracy
• Analysis-driven power reduction
Figure: total power savings available and cumulative area overhead (both normalized, 0 to 1) vs. number of RTL changes (design effort). Annotations: maximum acceptable area impact; maximum possible power savings; only 5 changes gave a 50% saving.
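The trade-off in the chart can be sketched as a greedy ranking. Sort candidate RTL changes by power saved per unit of area added, then accept them until the area budget is used up. This minimal Python sketch assumes each change has an independent, additive saving and area cost; real changes can interact. `Change`, `select_changes` and all numbers are hypothetical. This is a generic knapsack heuristic, not PowerArtist's ranking method.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A hypothetical candidate RTL change with normalized benefit and cost."""
    name: str
    power_saving: float  # fraction of total power saved
    area_cost: float     # fraction of total area added


def select_changes(changes, max_area):
    """Greedy knapsack heuristic: take the best saving-per-area changes first,
    skipping any change that would push cumulative area past max_area."""
    ranked = sorted(
        changes,
        key=lambda c: c.power_saving / c.area_cost if c.area_cost > 0 else float("inf"),
        reverse=True,
    )
    chosen, saving, area = [], 0.0, 0.0
    for c in ranked:
        if area + c.area_cost > max_area:
            continue  # over budget; a smaller change later in the list may still fit
        chosen.append(c.name)
        saving += c.power_saving
        area += c.area_cost
    return chosen, saving, area


candidates = [
    Change("gate_fifo_clock", 0.20, 0.01),
    Change("bypass_multiplier", 0.15, 0.02),
    Change("retime_alu", 0.05, 0.10),
    Change("share_adder", 0.10, 0.01),
]
chosen, saving, area = select_changes(candidates, max_area=0.05)
```

With a 5% area budget, the three cheap, high-yield changes are kept and the area-heavy retiming change is skipped. This mirrors the chart's point that a few well-chosen changes capture most of the savings.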

RTL Power: Inputs for PowerArtist
All of the following feed RTL power analysis:
• RTL (VHDL, Verilog, SystemVerilog), for example:
    module PA (
      ...
      always @ (posedge clk) begin
        dout <= din1;                    // infers a register
      end
      assign out = sel ? dout : din2;    // infers a 2:1 mux
      ...
    endmodule
• Power domains (UPF / CPF), e.g., Vdd1 and Vdd2
• Capacitance model (WLM / PACE)
• Activity (FSDB / VCD / SAIF)
• Clock tree, gating (SDC, PACE, user input)
• Power models (Liberty .lib)
The diagram also shows the logic inferred from the RTL (mux, AND gate, registers), clocked by clk.
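For switching power, the inputs above combine roughly as follows:
• activity gives each net's toggle count;
• the capacitance model gives its load;
• the UPF/CPF power domain gives its supply voltage.

This minimal Python sketch assumes toggle counts over a known duration in seconds, as a SAIF TC value would provide after timescale conversion. It uses the standard 0.5 · C · Vdd² energy per toggle. `Net`, `switching_power`, the domain names and all values are hypothetical. Internal and leakage power from the Liberty models are left out.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    toggle_count: int   # toggles over the simulated window (e.g. a SAIF TC value)
    cap_farads: float   # load capacitance from the capacitance model (WLM or PACE)
    domain: str         # supply the net belongs to, per the UPF/CPF power intent


def switching_power(nets, duration_s, domain_vdd):
    """Switching power per domain: sum of 0.5 * C * Vdd^2 * (toggles per second)."""
    per_domain = {}
    for net in nets:
        vdd = domain_vdd[net.domain]
        toggle_rate = net.toggle_count / duration_s
        power = 0.5 * net.cap_farads * vdd ** 2 * toggle_rate
        per_domain[net.domain] = per_domain.get(net.domain, 0.0) + power
    return per_domain


# Hypothetical activity for two nets of module PA over a 1 us window.
nets = [
    Net("dout", toggle_count=1000, cap_farads=2e-15, domain="Vdd1"),
    Net("out", toggle_count=500, cap_farads=4e-15, domain="Vdd2"),
]
power = switching_power(nets, duration_s=1e-6, domain_vdd={"Vdd1": 0.9, "Vdd2": 1.0})
```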

Factors Affecting RTL Power Accuracy
• Synthesis modeling: inferencing, multi-VT, cell selection, micro-architecture
• Algorithmic
• RTL models: activity propagation, timing, power computation
• Physical models: clock tree, wire cap, transition time
• Low-power structures
• Voltage / power domains (CPF / UPF)
NOTE: Algorithmic choices and low-power structures are not configured for accuracy.

Synthesis Modeling Aspects for RTL Power
Inferencing
• Keep optimization settings consistent with synthesis
• Enable the DesignWare flow (if DW components are present)
Multi-VT
• Apply consistent multi-VT settings from synthesis
Cell Selection
• Fine-tune cell selection based on the synthesis netlist
• Apply boundary conditions based on load / frequency
Microarchitecture
• Apply microarchitectures for macros (e.g., adders, multipliers)

Synthesis Modeling Aspects in PowerArtist
Constant Multipliers
    b = 8'b11000100;
    assign z = a * b;
→ modeled with a carry-save adder (CSA) structure rather than a general multiplier
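Because b is a constant, no general multiplier array is needed. Only the 1-bits of b (bits 7, 6 and 2) produce partial products, here three shifted copies of a. Summing three operands is the job a CSA plus one final adder does in hardware. This Python sketch assumes unsigned a; `partial_products` is a hypothetical helper, not a PowerArtist function.

```python
def partial_products(a, constant):
    """Shifted copies of a, one per 1-bit of the constant multiplier (a, constant >= 0)."""
    products = []
    bit = 0
    while constant >> bit:
        if (constant >> bit) & 1:
            products.append(a << bit)
        bit += 1
    return products


B = 0b11000100  # 8'b11000100 from the slide, i.e. 196 = 2**7 + 2**6 + 2**2
pps = partial_products(13, B)  # [52, 832, 1664]; their sum is 13 * 196
```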
Chains of Adders
    assign z = a + b + c + d;
→ modeled as a CSA tree (a, b, c → CSA; + d → CSA; one final adder) rather than a chain of three adders, ((a + b) + c) + d
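A 3:2 carry-save adder reduces three operands to a sum word and a carry word without propagating carries. Two CSA stages therefore reduce a + b + c + d to two operands, and only one carry-propagate adder is needed, versus three in a chain. This Python sketch works on non-negative integers as bit vectors; the function names are hypothetical.

```python
def csa(x, y, z):
    """3:2 carry-save adder on non-negative integers: x + y + z == s + c."""
    s = x ^ y ^ z                             # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1    # per-bit majority = carry, moved up one bit
    return s, c


def add4_csa(a, b, c, d):
    """a + b + c + d using two CSA stages and a single carry-propagate addition."""
    s1, c1 = csa(a, b, c)
    s2, c2 = csa(s1, c1, d)
    return s2 + c2  # the only adder that propagates carries


def add4_chain(a, b, c, d):
    """The same sum as a chain of three carry-propagate adders."""
    return ((a + b) + c) + d
```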
Look-Up Table Optimization
    case (address)
      8'd0 : data = {32'd0};
      8'd1 : data = {32'd12};
      …
    endcase
→ address drives an AND-OR plane that produces data; the AND-OR plane is optimized by sharing common logic
Un-encoded mux
→ cell mapping to basic 2-input cells; modeled using AOIs
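This sketch assumes "un-encoded" means a mux whose select lines are already one-hot, with no binary decoder. Each output bit is then an OR of (select AND data) terms. That is the AND-OR shape that AOI (AND-OR-INVERT) cells implement in inverted form. Below is a bit-level Python model of that structure. `onehot_mux` is a hypothetical name, and the model ignores timing and cell mapping.

```python
def onehot_mux(select, words, width):
    """Un-encoded (one-hot) mux as an AND-OR structure.

    Bit i of `select` enables words[i]; each word is ANDed with its enable
    (all ones or all zeros) and the results are ORed together. With exactly
    one select bit set this returns that word; with none set it returns 0.
    """
    mask = (1 << width) - 1
    out = 0
    for i, word in enumerate(words):
        enable = mask if (select >> i) & 1 else 0
        out |= enable & word  # AND term, then OR into the output
    return out
```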

RTL Power Accuracy Using Wire Load Models
Mobile SoC Case Study
– Large differences are seen with simple wire load models
– Clock and combinational power show the largest differences
– Total power differs by 40% relative to gate level
** Note: GATE is considered the most accurate reference
Figure: RTL Wire Load Models vs. Gate Level (different power categories). Bars: power (W, 0 to 0.12) for RTL WLM and GATE; line: % difference (-100% to 100%). Data labels (% difference, in chart order): 28.8%, 11.0%, -9.2%, 69.2%, 41.2%, 32.3%, 40.2%.

Physical Aspects Modeling for Power
Clock Tree
• Model the clock tree
• Support both balanced and clock mesh topologies
Wire Cap
• Accurately model post-layout wire capacitance
• Model the capacitance profile for different types of nets
Transition Time
• Accurately model slew for realistic power
• Apply to both clock and logic nets
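Transition time matters because Liberty typically characterizes cell internal power as a 2-D lookup table. The table is indexed by input transition time and output load capacitance, so both slew and wire cap change the energy per toggle. Below is a minimal Python sketch of the bilinear lookup such tables use. The table values, axis points and function names are hypothetical. Real tools also handle rise/fall tables, table templates and `when` conditions.

```python
from bisect import bisect_right


def _segment(axis, x):
    """Index i of the axis segment [axis[i], axis[i + 1]] used for x (edge segments extrapolate)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_2d(index_1, index_2, values, x1, x2):
    """Bilinear interpolation in values[i][j], defined on index_1[i] x index_2[j]."""
    i, j = _segment(index_1, x1), _segment(index_2, x2)
    t = (x1 - index_1[i]) / (index_1[i + 1] - index_1[i])
    u = (x2 - index_2[j]) / (index_2[j + 1] - index_2[j])
    return ((1 - t) * (1 - u) * values[i][j] + (1 - t) * u * values[i][j + 1]
            + t * (1 - u) * values[i + 1][j] + t * u * values[i + 1][j + 1])


# Hypothetical internal-energy table: rows = input transition (ns), columns = output load (pF).
SLEW_NS = [0.01, 0.10, 0.50]
LOAD_PF = [0.001, 0.010, 0.050]
ENERGY = [
    [1.0, 1.2, 1.8],
    [1.3, 1.5, 2.1],
    [2.0, 2.2, 2.8],
]
fast_edge = lookup_2d(SLEW_NS, LOAD_PF, ENERGY, 0.01, 0.010)  # sharp input transition
slow_edge = lookup_2d(SLEW_NS, LOAD_PF, ENERGY, 0.30, 0.010)  # slow input transition
```

At the same load, the slower input edge reads a larger energy from the table. This is why a flat or unrealistic slew assumption skews power.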

Physical Modeling: Clock Tree
• RTL clock power accuracy requirements
– Understand clock gating methodology
– Understand clock tree topology and buffering
• RTL designers often find this data difficult to obtain from the back-end team
Figure: balanced clock tree; clock mesh topology
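To see why topology and buffering matter, the sketch below estimates the buffer count per level of an idealized balanced tree, given a maximum fanout per buffer. It also computes the switching power of a given total clock capacitance. The max_fanout value, sink count and function names are hypothetical. Clock gating, a clock mesh and wire cap between levels are not modeled, and this is not how PACE derives its clock models.

```python
import math


def balanced_tree_buffers(num_sinks, max_fanout):
    """Buffers per level of an idealized balanced tree, leaf level first.

    Each buffer drives at most max_fanout loads; levels are added until a
    single buffer (the root) drives the whole tree.
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    levels = []
    loads = num_sinks
    while loads > 1:
        buffers = math.ceil(loads / max_fanout)
        levels.append(buffers)
        loads = buffers
    return levels


def clock_power(total_cap_farads, vdd, freq_hz):
    """Clock nets toggle twice per cycle: 2 * f * (0.5 * C * Vdd^2) = C * Vdd^2 * f."""
    return total_cap_farads * vdd ** 2 * freq_hz


levels = balanced_tree_buffers(num_sinks=1000, max_fanout=20)  # [50, 3, 1]
```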

Physical Modeling: Wire Cap
Traditional Wire Load Models
• Not available in some vendor libraries; often not calibrated
• Custom WLMs are not portable across blocks and designs
• Simplistic modeling results in poor accuracy
Figure: wire capacitance of 45k nets with fanout 1 in a 40nm design. The WLM assigns 1 fF to every net, while SPEF values range from 0.2 fF to more than 129 fF.

PACE™ for RTL Power Accuracy
PACE applies from RTL to Pre-layout Power
• Clock tree models
– Determine buffer and CG cells per inferred clock tree
– Supports both balanced clock tree as well as clock mesh
• Wire capacitance models
– Granular and power-oriented, unlike traditional WLMs
Figure: Bridge the RTL ↔ Implementation Gap
• RTL (module PA) → RTL power, accounting for clock distribution, parasitics, multiple Vt and low-power structures
• Representative layout (post-layout power) → PowerArtist calibration (PACE) → statistical models for wire cap and clock
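The slide does not describe how PACE builds its statistical models. As a hedged illustration of the general idea, calibrating wire cap from a representative layout instead of using a single WLM value, the sketch bins extracted nets by fanout. It then uses each bin's mean capacitance for new RTL nets. The fanout-only binning, the nearest-fanout fallback, the data and all names are assumptions. A finer model could also separate net types, in line with the capacitance-profile point under physical modeling.

```python
from collections import defaultdict
from statistics import mean


def calibrate_cap_model(samples):
    """Map each fanout to the mean extracted wire cap of nets with that fanout.

    `samples` are (fanout, cap_fF) pairs taken from a representative layout (e.g. SPEF).
    """
    by_fanout = defaultdict(list)
    for fanout, cap in samples:
        by_fanout[fanout].append(cap)
    return {fanout: mean(caps) for fanout, caps in by_fanout.items()}


def predict_cap(model, fanout):
    """Calibrated cap for a new RTL net; unseen fanouts use the nearest calibrated fanout."""
    if fanout in model:
        return model[fanout]
    nearest = min(model, key=lambda fo: abs(fo - fanout))
    return model[nearest]


# Hypothetical extracted data: (fanout, wire cap in fF).
layout_nets = [(1, 0.5), (1, 1.5), (2, 3.0), (2, 5.0), (4, 9.0)]
model = calibrate_cap_model(layout_nets)  # {1: 1.0, 2: 4.0, 4: 9.0}
```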

RTL Power Accuracy Using PACE Cap Models
– Tighter correlation is seen with PACE Cap models
– Register and combinational power are within +/-20%
– Total power differs by less than 5% relative to gate level
Figure: PACE Cap Models vs. WLM & Gate Level. Bars: power (W, 0 to 0.12) for RTL WLM, RTL with PACE Cap, and GATE; line: % difference (-100% to 100%). Data labels (% difference, in chart order): -13.4%, 5.1%, -9.2%, 22.8%, 8.1%, -37.4%, 3.0%.

RTL Power Accuracy Using PACE Cap + Clock Models
– The best correlation is seen with PACE Cap + Clock models
– Overall correlation is within +/-15%
Figure: PACE Cap+Clk Models vs. WLM & Gate Level. Bars: power (W, 0 to 0.12) for RTL WLM, RTL with PACE Cap+Clock, and GATE; lines: % diff with PACE and % diff with WLM (-100% to 100%). Data labels (% difference, in chart order): -13.4%, 9.9%, -9.2%, -12.8%, -9.0%, -13.6%, -9.4%.

RTL Power Accuracy
Mobile SoC Blocks Case Study
– Total power error with WLM exceeds +/-30%
– With PACE models, total power is within +/-20%
Figure: Total Power Comparison. Power (W, 0 to 0.12) for RTL WLM, RTL PACE, and GATE in Design 1, Design 2 and Design 3.

RTL Power Accuracy
Mobile SoC Blocks Case Study
– Clock power with PACE is also within about +/-20% of gate level
Figure: Clock Power, RTL PACE vs. GATE. Bars: clock power (W, 0 to 0.08) for GATE and RTL PACE in Design 1, Design 2 and Design 3; line: % diff (0% to 25%). Data labels (% diff): 15.5%, 19.0%, 20.7%.

Nvidia Case Study: RTL Power Accuracy
RTL PowerArtist vs. post-synthesis PT-PX: average power per design

                                            |  Post-synthesis PT-PX (mW)  |  RTL PowerArtist (mW)       |  PowerArtist vs. PT-PX
Design                     Instances BB DW  |  Dynamic  Leakage    Total  |  Dynamic  Leakage    Total  |  Dynamic  Leakage    Total
PR                            580320     0  |   82.524  114.210  196.735  |   92.900  111.734  204.635  |   12.57%   -2.17%    4.02%
TD                            268993     0  |   89.209   38.713  127.923  |  101.755   35.089  136.844  |   14.06%   -9.36%    6.97%
TTM                           158407    14  |   64.828   21.353   86.181  |   63.583   20.212   83.795  |   -1.92%   -5.34%   -2.77%
TTF                           134152    64  |   47.850   14.874   62.724  |   32.563   13.431   45.995  |  -31.95%   -9.70%  -26.67%
SMI                          1137155   101  |  145.497  201.661  347.158  |  125.133  135.635  260.768  |  -14.00%  -32.74%  -24.88%
SRF                           509095    24  |  263.894   75.515  339.409  |  258.332   73.897  332.229  |   -2.11%   -2.14%   -2.12%
Average, all designs                        |  115.634   77.721  193.355  |  112.378   65.000  177.378  |   -2.82%  -16.37%   -8.26%
Average, excl. SMI/TTF                      |  125.114   62.448  187.562  |  129.143   60.233  189.376  |    3.22%   -3.55%    0.97%
Average, PR/TD only                         |   85.867   76.462  162.329  |   97.328   73.412  170.739  |   13.35%   -3.99%    5.18%

BB DW = black-boxed DesignWare (DW) instances. Percentages are (RTL PowerArtist - PT-PX) / PT-PX; the average rows are computed from the averaged power values.
• Power correlation was performed on 6 designs ranging from 130K to 1.13M instances
• Average power generally correlates very well; the largest deviations are in SMI and TTF, which have the most black-boxed DW instances (101 and 64)
• 8-16 tests were run across the blocks
** Source: Nvidia-Apache Webinar, July 2013 (Miki)
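The percentage columns and average rows of the Nvidia table can be checked with a few lines of Python. Each % value is (PowerArtist - PT-PX) / PT-PX. Each average row takes the % difference of the averaged powers, so higher-power designs weigh more than they would in a plain mean of per-design percentages. The sketch uses only the total-power columns from the table; the names are illustrative.

```python
# Total power (mW) per design from the table: (post-synthesis PT-PX, RTL PowerArtist).
TOTAL_MW = {
    "PR": (196.735, 204.635),
    "TD": (127.923, 136.844),
    "TTM": (86.181, 83.795),
    "TTF": (62.724, 45.995),
    "SMI": (347.158, 260.768),
    "SRF": (339.409, 332.229),
}


def pct_diff(reference, estimate):
    """% difference of the RTL estimate relative to the PT-PX reference."""
    return 100.0 * (estimate - reference) / reference


def average_row(designs):
    """Average each column first, then take the % difference of the averages."""
    ref = sum(TOTAL_MW[d][0] for d in designs) / len(designs)
    est = sum(TOTAL_MW[d][1] for d in designs) / len(designs)
    return ref, est, pct_diff(ref, est)


all_designs = list(TOTAL_MW)
excl_outliers = [d for d in all_designs if d not in ("SMI", "TTF")]
print(round(average_row(all_designs)[2], 2))    # -8.26
print(round(average_row(excl_outliers)[2], 2))  # 0.97
print(round(average_row(["PR", "TD"])[2], 2))   # 5.18
# A plain mean of the per-design percentages gives a different number:
print(round(sum(pct_diff(*TOTAL_MW[d]) for d in all_designs) / len(all_designs), 2))  # -7.58
```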

Summary
• RTL power analysis enables early design trade-offs with high power impact
• PowerArtist provides predictable RTL power accuracy relative to gate level
• PowerArtist has advanced synthesis and physical modeling techniques
• PowerArtist PACE modeling is proven across designs
• Use PowerArtist for RTL power sign-off with absolute accuracy
