6/23/2014 © 2014 ANSYS, Inc. 1 
PowerArtist™: RTL Design-for-Power 
Design Automation Conference 2014
Early Power Decisions → High Impact
[Figure: Power Reduction (100%, 50%, 0%), from Large Impact to Small Impact, across RTL Design, Logic Synthesis, Physical Design, Timing Closure]
RTL Design-for-Power
• Power-Performance-Area Trade-offs
• Voltage / Power Domain Planning
• Block-level Clock and Data Gating
• Eliminate Redundant Activity
Low Power Implementation
• Power Switch Sizing / Placement
• Clock Gater Cloning / Decloning
• Multi-Vt Optimization
• Power Integrity Verification
RTL Power ↔ Gate-level Power 
Design Specification 
RTL Design 
Gate-Level Design 
Layout 
~20 hours 
~22 mins 
Quicker Design Iterations Effective Design-for-Power 
Gate-level Power 
+ 
Adder 
Register 
Mux 
RTL Power 
Power-per-Function 
Power-per-Gate
PowerArtist: RTL Design-for-Power Platform 
RTL Power 
Analysis 
• Average, time-based 
• Power-critical vector selection 
• Regressions via TCL interface 
RTL Power 
Reduction 
• Clock, memory, logic 
• Analysis-driven automation 
• Interactive power debug 
RTL Links 
with Physical 
• PACE™: RTL power accuracy 
• RPM™: RTL-driven physical power integrity 
Physical 
Power 
RTL Power 
PACE RPM
RTL Power: Ins and Outs 
Vdd1 
Power domains 
(UPF / CPF) 
Vdd2 
module PA (
  ...
  always @ (posedge clk) begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
  ...
endmodule
RTL (VHDL, Verilog, SystemVerilog)
RTL Power 
Analysis 
Capacitance model (WLM / PACE)
Activity (FSDB / VCD / SAIF)
Clock tree, gating (SDC, PACE, user input)
Power models (Liberty .lib)
[Diagram labels: mux, and, register, register, clk]
Low Power RTL Design Methodology 
Peak Power = 391mW 
Check power vs. budget 
TRANSMIT MODE RECEIVE MODE 
Residual receive activity in 
transmit mode 
Profile power vectors 
RTL Power Regression Flow 
Reduce power automatically Monitor power vs. budget 
Enabled Clock 
Inactive Data 
Debug power hotspots 
Average power = 239mW 
Perform design trade-offs 
[Chart: Power (W), axis 0.00E+00 to 6.00E-02, comparing Version 1 and Version 2 for Typ and Idle vectors]
RTL vs. Gates: Accuracy and Performance 
Nvidia Case Study 
• RTL Power Accuracy: ~15%
• RTL Power: ~30X faster
RTL Capacity: Large Designs / FSDBs 
Samsung Case Study 
FSDB captures only power-critical 
signals identified by PowerArtist 
• FSDB size: 1/4 
• TAT: 4X faster 
• Loss of accuracy: 2%
RTL Power Analysis
PowerArtist RTL Power Analysis 
Activity Analysis
• Total Logic / Clock Activity per Hierarchical Instance
• Qualify Coverage per Power Mode
• Identify Power Bugs
Average Power Analysis
• Understand Power: Where? Why?
• Per Hierarchy, Category, Mode, Clock / Voltage Domains
• Qualify Power Efficiency with Multiple Metrics
Time-based Power Analysis
• Power Waveforms per Hierarchical Instance
• Waveforms per Category: Clock, Memory, Logic
• Identify Peak Power and Time
Clock Gating Efficiency 
Temporal and Structural Metrics 
Example 
• 16 of 20 bits are gated 
• 5 of 10 cycles are gated 
• 2 of 5 enabled cycles had data toggles 
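[Waveform: gclk, clk, en, data]
------------------------------------------------------------------------------------------
                 SCGE            DCGE                    CGEE
------------------------------------------------------------------------------------------
Definition       % Gated Bits    % Gated Clock Cycles    % Ideally Gated Cycles
Type of Metric   Structural      Temporal (en, clk)      Temporal (data, en, clk)
Value            80%             50%                     40%
------------------------------------------------------------------------------------------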
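The three values in the example reduce to simple ratios. The following Python sketch is illustrative only: it assumes SCGE is gated bits over total bits, DCGE is gated cycles over total cycles, and it reads CGEE as the share of enabled cycles that carried data toggles, which reproduces the slide's 40%. The source does not state the exact formulas PowerArtist uses, and the function names are hypothetical.

```python
def ratio_percent(part, whole):
    """Return part/whole as a percentage, or 0 when whole is 0."""
    return 100.0 * part / whole if whole else 0.0


def clock_gating_metrics(gated_bits, total_bits, gated_cycles, total_cycles,
                         useful_enabled_cycles):
    """Compute the three metrics from raw counts (assumed definitions)."""
    enabled_cycles = total_cycles - gated_cycles
    return {
        "SCGE": ratio_percent(gated_bits, total_bits),       # structural
        "DCGE": ratio_percent(gated_cycles, total_cycles),   # temporal (en, clk)
        # Assumption: enabled cycles with data toggles over all enabled cycles.
        "CGEE": ratio_percent(useful_enabled_cycles, enabled_cycles),
    }


# Counts from the slide: 16 of 20 bits gated, 5 of 10 cycles gated,
# 2 of the 5 enabled cycles had data toggles.
metrics = clock_gating_metrics(16, 20, 5, 10, 2)
print(metrics)
```

A low CGEE next to a high SCGE would suggest the register is structurally gated but its enable stays on during cycles that do no useful work.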
Clock Gating Efficiency 
Temporal and Structural Metrics 
100% Static CGE 
0% Dynamic CGE 
CGEE, 
Power Impact 
CGE: Static, Dynamic 
Flop: Power, Activity
RTL Power Reduction
PowerArtist RTL Power Reduction 
Original RTL Low-Power RTL 
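# Report ungated registers with their dynamic power and source location.
# $output_file must be set before this script runs.
openPDB powerartist.pdb
set RPT [open $output_file "w"]
set ungated_registers [getRegisters -cg none]
foreach i $ungated_registers {
    set dyn_power [getPropVal $i Dynamic_Power "inst"]
    set bit_width [getInstWidth $i]
    set file [getPropVal $i File_Name "inst"]
    set line_num [getPropVal $i Line_Number "inst"]
    # Write one line per register (this output format is illustrative).
    puts $RPT "$i $bit_width $dyn_power ${file}:${line_num}"
}
close $RPT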
1. Interactive Power Debug
• Block-level Power “Bugs”
• Large Power Savings
2. Automated Power Reduction
• Instance-level Power Reduction
• 15 Analysis-driven Techniques
3. Customizable Power Reports
• TCL Queries to OADB
• Automation Beyond PowerArtist Reports
Debug Power: Visualize-Analyze-Reduce 
Inactive Data, Active Clock 
Identify Block-level Clock Gating Enable
Block-Level Power Reduction 
Clock Active, Data Inactive 
Clock Inactive, Data Active 
Block-level 
Clock Gating 
Block-level 
Data Gating 
Block-level Activity Analysis: 
Clock and Data Ports 
1.1 Clock Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode  Instance
Cycles     Cycles   Name   Name  Name
-------------------------------------------------------
200        201      CLKA   read  top.core1.t1.dpmem.m1
-------------------------------------------------------
1.2 Input and Redundant Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode  Instance
Toggles    Toggles  Name   Name  Name
-------------------------------------------------------
1          1        AB[8]  read  top.core1.t1.dpmem.m1
-------------------------------------------------------
Wasted Activity 
per Mode 
Clock Activity per 
Hierarchy 
Constant high activity 
Missed clock gating? 
Redundant activity 
in read mode
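The redundant and total counts in the activity report translate directly into a percentage of wasted activity per pin. The Python sketch below is a hedged illustration: the parser and its field names are hypothetical, and it assumes each data row has exactly the five whitespace-separated columns shown in the report excerpt.

```python
def parse_activity_rows(lines):
    """Parse report data rows: redundant, total, pin, mode, instance."""
    rows = []
    for line in lines:
        fields = line.split()
        # Data rows have five fields and start with two integer counts;
        # header and separator lines are skipped.
        if len(fields) == 5 and fields[0].isdigit() and fields[1].isdigit():
            redundant, total = int(fields[0]), int(fields[1])
            rows.append({
                "pin": fields[2],
                "mode": fields[3],
                "instance": fields[4],
                "redundant": redundant,
                "total": total,
                "redundant_pct": 100.0 * redundant / total if total else 0.0,
            })
    return rows


# The two data rows from the report excerpt.
report = [
    "200 201 CLKA read top.core1.t1.dpmem.m1",
    "1 1 AB[8] read top.core1.t1.dpmem.m1",
]
rows = parse_activity_rows(report)
for r in rows:
    print(f'{r["instance"]}.{r["pin"]} ({r["mode"]}): '
          f'{r["redundant_pct"]:.1f}% redundant')
```

With 200 of 201 clock cycles redundant in read mode, the CLKA pin is a strong candidate for block-level clock gating.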
Instance-Level Power Reduction 
Clock / Clock Gating
• Clock gating coverage
• Clock gating efficiency
• Sequential and combinational
Control Logic and Datapath
• Redundant activity
• Don’t care conditions
• Datapath operand isolation
Memory Subsystem
• Redundant read/write
• Splitting memories
• Exercising sleep modes
Analysis-Driven RTL Power Reduction 
Wasted activity/power when sel is 0
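The idea can be illustrated with the `PA` module shown earlier, where `dout` captures `din1` on every clock edge but `out` only observes `dout` when `sel` is 1. The Python sketch below is a simplified model, not the tool's analysis: it counts register bit toggles and marks as unobserved those captured while the mux selects the other input. The trace values, the 8-bit width, and the function name are all hypothetical.

```python
def count_wasted_toggles(din1_trace, sel_after_capture, width=8):
    """Count register bit toggles and how many of them were never observed.

    din1_trace[t] is the value captured into dout at clock edge t.
    sel_after_capture[t] is the mux select while dout holds that value;
    when it is 0, the mux passes din2 and dout is not visible at out.
    """
    mask = (1 << width) - 1
    dout = 0
    total = wasted = 0
    for din1, sel in zip(din1_trace, sel_after_capture):
        flips = bin((dout ^ din1) & mask).count("1")  # bits that toggle
        total += flips
        if sel == 0:
            wasted += flips  # toggles whose result the mux never selects
        dout = din1 & mask
    return total, wasted


# Hypothetical 4-cycle trace.
total, wasted = count_wasted_toggles([0x0F, 0xF0, 0xFF, 0x00], [1, 0, 0, 1])
print(f"{wasted} of {total} register toggles were not observed")
```

In this model, a clock gate enable derived from the upcoming value of `sel` would suppress the unobserved captures.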
Analysis-Driven RTL Power Reduction 
Pre-compute based new clock gate enables 
Multi-cycle ODC sequential analysis
Analysis-Driven RTL Power Reduction 
Pre-compute based new clock gate enables 
Multi-cycle ODC sequential analysis 
[Figure: Predicted Power Savings (normalized, 0.00 to 1.00) vs. # RTL Changes (Design Effort, 1 to 291)]
Top 5 RTL changes → 50% identified power savings
Maximize Power Savings 
Minimize Design Impact 
• Clock, Memory, Logic 
• Sequential, Combinational 
• Vector-based, Vectorless 
• Hierarchical, SoC capacity 
15 Power Reduction Techniques
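The trade-off between power savings and design effort can be sketched as a ranking problem: sort candidate RTL changes by predicted savings and count how many are needed to reach a target share of the total. The Python below is a minimal illustration with hypothetical savings values; it is not taken from the tool or from the chart data.

```python
from itertools import accumulate


def changes_needed(savings, target_fraction):
    """Return how many of the largest changes reach target_fraction of the total."""
    ranked = sorted(savings, reverse=True)
    total = sum(ranked)
    for count, running in enumerate(accumulate(ranked), start=1):
        if running >= target_fraction * total:
            return count
    return len(ranked)


# Hypothetical predicted savings per RTL change, in mW.
predicted_mw = [12.0, 1.0, 8.0, 0.5, 6.0, 0.5, 5.0, 2.0, 4.0, 1.0]
print(changes_needed(predicted_mw, 0.5))
```

When a few changes dominate the ranking, as in the slide's curve, most of the identified savings can be captured with a small number of RTL edits.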
Power Reduction Case Studies 
[Diagram labels: 1, 0, A, B, scan_enable = 0, scan_clock, data_in, M_OUT; Write, Write, Read]
MUX Reduction Technique: 
• Scan clocks toggling in functional mode 
• Redundant data activity in registers wasting power 
Redundant Data Toggles 
GMC Technique: 
• Redundant data toggles in 
read mode 
• Cycle-based analysis reports 
% Redundant Cycles
Power Database Access with TCL API 
Power Database 
(OpenAccess) 
Design Queries 
• getMemories/Flops/Combs 
• getFanout 
• getModulePorts 
• reportDesignStats 
Report Creation 
• reportCGEfficiency 
• diffPdbPower 
• reportPower 
• reportReductions 
Power Queries 
• getPropVal instance/net 
• getClockPower 
• getNetPower 
• getClockEnableExpr 
Design Navigation 
• dls 
• dpwd, dcd 
• dpushd, dpopd 
• show 
Customize and Automate Power Reduction, Reports, Regressions 
• Quick access to power and design properties 
• Accomplish custom tasks with few lines of TCL
Custom Power Reports 
50% Idle Power Reduction in Mobile SoC 
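-----------------------------------------------------------------------------------------------
Instance Name                 Enable Efficiency  Clock Power  Clock     En Net
-----------------------------------------------------------------------------------------------
or1200_cpu.ckg12              0                  5.17E-03     clk       or1200_cpu.en_blk
or1200_cpu.or1200_ctrl.ckg5   0.1                1.36E-03     gclk_blk  or1200_cpu.or1200_ctrl.n1
-----------------------------------------------------------------------------------------------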
[Waveform: en_blk, clk, data, gclk_blk]
Inefficient enables waste power
[Diagram: Block Clock Gate (en_blk, clk, gclk_blk) and Register Clock Gate (en_reg, gclk_reg)]
Block-level clock gates control significant power
Power Efficiency = 0
Single clock gate controls >5mW
PowerArtist clock gating report → identifies inefficient clock gates
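A custom report of this kind can be approximated by filtering clock gates on enable efficiency and ranking the result by the clock power each gate controls. The Python sketch below uses the two rows from the report above; the 0.2 efficiency threshold, the dictionary keys, and the assumption that clock power is in watts are hypothetical.

```python
def inefficient_clock_gates(rows, max_efficiency=0.2):
    """Return clock gates at or below max_efficiency, highest clock power first."""
    flagged = [r for r in rows if r["enable_efficiency"] <= max_efficiency]
    return sorted(flagged, key=lambda r: r["clock_power_w"], reverse=True)


# Rows from the clock gating report (clock power assumed to be in watts).
rows = [
    {"instance": "or1200_cpu.or1200_ctrl.ckg5", "enable_efficiency": 0.1,
     "clock_power_w": 1.36e-3, "clock": "gclk_blk",
     "en_net": "or1200_cpu.or1200_ctrl.n1"},
    {"instance": "or1200_cpu.ckg12", "enable_efficiency": 0.0,
     "clock_power_w": 5.17e-3, "clock": "clk",
     "en_net": "or1200_cpu.en_blk"},
]
ranked = inefficient_clock_gates(rows)
for r in ranked:
    print(f'{r["instance"]}: efficiency {r["enable_efficiency"]}, '
          f'{r["clock_power_w"] * 1e3:.2f} mW via {r["en_net"]}')
```

Ranking by controlled clock power puts the block-level gate first, which matches the slide's point that a single inefficient enable can account for more than 5 mW.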
RTL Power Regressions 
• 30+ blocks per typical SoC 
• 2+ vectors per block 
• Vectors written for power: idle, active 
• Daily block-level, weekly chip-level regressions 
monitor power changes 
• Power metrics track power efficiency 
• PowerArtist identifies where power changed 
RTL 
(Verilog, SV, VHDL) 
Testbench 
Simulator 
FSDB 
RTL Power 
Analysis, Reduction, Regression
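A regression of this kind amounts to comparing per-block, per-vector power between a baseline run and the current run and flagging increases beyond a tolerance. The Python sketch below is a hedged illustration: the block names, power values, 5% tolerance, and function name are hypothetical, and in practice the numbers would come from the tool's reports.

```python
def power_regressions(baseline, current, tolerance=0.05):
    """Return entries whose power grew by more than tolerance (as a fraction)."""
    changes = {}
    for key, new_power in current.items():
        old_power = baseline.get(key)
        if old_power is None or old_power == 0:
            continue  # new block or no usable reference to compare against
        delta = (new_power - old_power) / old_power
        if delta > tolerance:
            changes[key] = delta
    return changes


# Hypothetical power in mW, keyed by (block, vector).
baseline = {("cpu", "idle"): 10.0, ("cpu", "active"): 40.0, ("dma", "idle"): 2.0}
current = {("cpu", "idle"): 12.0, ("cpu", "active"): 40.5, ("dma", "idle"): 2.0}
flagged = power_regressions(baseline, current)
for (block, vector), delta in flagged.items():
    print(f"{block} [{vector}]: +{delta:.0%}")
```

Keeping separate idle and active vectors per block, as the slide recommends, lets a check like this catch idle-power growth that an active-only vector would hide.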
RTL Links with Physical Design
PACE™: Physical-Aware RTL Power 
Budgeting 
module PA (
  ...
  always @ (posedge clk)
  begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
  ...
endmodule
• Clock Distribution 
• Parasitics 
• Multiple Vt 
• Low-power Structures 
• Optimization 
PACE Models 
(Cap, Clock) 
Post-Layout 
Gate-level Power 
RTL Power PACE 
PACE Bridges the RTL vs. Layout Gap → Predictable RTL Power Accuracy
RTL PACE vs. Gate-Power: Mobile SoC @14nm 
RTL-PACE Power within 20% 
Total Power Correlation 
Gate-SPEF vs. RTL-PACE vs. RTL-WLM Clock Power Correlation 
Gate-SPEF vs. RTL-PACE 
RTL-PACE Clock Power within 20%
RTL Power-Driven Power Integrity 
module PA (
  ...
  always @ (posedge clk)
  begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
  ...
endmodule
• Shrinking geometries → Increasing di/dt
• Gate vectors too late 
• Layout late for changes 
• Error-prone guesstimates 
RTL Power 
RPM Enables PDN Planning → Early, Optimal, Robust
RTL Power 
Model 
RPM 
Physical 
Power Integrity
RPM Case Studies 
RPM 
CPM(Layout)+Pkg 
CPM(RPM)+Pkg 
Pkg only 
RPM 
Gate 
FSDB 
Vectorless 
Peak = 6X Average Power 
Di/dt event not at the 
same time as the peak 
Peak and di/dt Cycle Selection on a GPU Core 
Frame: DIDT 
Start time: 0.0817704 
Finish time: 0.0817706 
Average leakage for supply VDD: 0.00257393 
Average power for supply VDD: 0.185336 
Peak power for supply VDD: 0.219776 
Frame: CYCLE_POWER 
Start time: 0.0806005 
Finish time: 0.0806007 
Average leakage for supply VDD: 0.002569 
Average power for supply VDD: 0.250168 
Peak power for supply VDD: 0.266678 
Early Voltage Drop Analysis Early Package Resonance Analysis
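The observation that the di/dt event and the power peak fall in different cycles can be illustrated with a per-cycle power waveform. The Python sketch below is a simplification under stated assumptions: it approximates di/dt by the cycle-to-cycle change in power at a constant supply voltage, and the waveform values and function name are hypothetical rather than taken from the GPU case study.

```python
def select_frames(power_w):
    """Return (peak_index, didt_index) for a per-cycle power waveform.

    peak_index is the cycle with the highest power; didt_index is the cycle
    that ends the largest cycle-to-cycle power step. Needs at least 2 samples.
    """
    peak_index = max(range(len(power_w)), key=lambda i: power_w[i])
    steps = [abs(power_w[i] - power_w[i - 1]) for i in range(1, len(power_w))]
    didt_index = 1 + max(range(len(steps)), key=lambda i: steps[i])
    return peak_index, didt_index


# Hypothetical per-cycle power in watts: a sharp ramp early, the peak later.
waveform = [0.05, 0.06, 0.19, 0.20, 0.23, 0.25, 0.24]
peak_index, didt_index = select_frames(waveform)
print(f"peak at cycle {peak_index}, largest power step at cycle {didt_index}")
```

Selecting both frames matters because the peak cycle stresses static voltage drop while the steepest ramp stresses the dynamic response of the power delivery network.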
Related Presentations @ DAC2014 
• Power Analysis Using PowerArtist for WaveLogic3 ASIC – 
100Gbs Coherent Metro Optical Modem 
• Achieving RTL Power Efficiency and Automated Power 
Reduction 
• Methods for Achieving RTL to Gate Power Consistency
