PowerArtist: RTL Design for Power Platform

PowerArtist™ includes production-proven RTL power analysis with interactive visual debug, analysis-driven automatic RTL power reduction, and a Tcl interface to the database that enables custom reports and tracking of power through regressions. PowerArtist-generated models bridge the gap between RTL and layout, delivering physical-aware RTL power accuracy and RTL-power-driven early power grid integrity. This presentation provides an overview of PowerArtist and covers RTL design-for-power best practices using real-life examples. Learn more on our website: https://bit.ly/10Rpcxu

Published in: Engineering

  1. PowerArtist™: RTL Design-for-Power
     Design Automation Conference 2014 (6/23/2014, © 2014 ANSYS, Inc.)
  2. Early Power Decisions → High Impact Power Reduction
     [Chart: power-reduction impact, from large (100%) to small (0%), across RTL Design, Logic Synthesis, Physical Design, Timing Closure]
     RTL Design-for-Power:
     • Power-Performance-Area Trade-offs
     • Voltage / Power Domain Planning
     • Block-level Clock and Data Gating
     • Eliminate Redundant Activity
     Low Power Implementation:
     • Power Switch Sizing / Placement
     • Clock Gater Cloning / Decloning
     • Multi-Vt Optimization
     • Power Integrity Verification
  3. RTL Power ↔ Gate-level Power
     Design flow: Design Specification → RTL Design → Gate-Level Design → Layout
     • RTL Power: Power-per-Function (Adder, Register, Mux)
     • Gate-level Power: Power-per-Gate
     • Runtime labels: ~20 hours vs. ~22 mins
     • Quicker Design Iterations, Effective Design-for-Power
  4. PowerArtist: RTL Design-for-Power Platform
     RTL Power Analysis:
     • Average, time-based
     • Power-critical vector selection
     • Regressions via TCL interface
     RTL Power Reduction:
     • Clock, memory, logic
     • Analysis-driven automation
     • Interactive power debug
     RTL Links with Physical:
     • PACE™: RTL power accuracy
     • RPM™: RTL-driven physical power integrity
     [Diagram: RTL Power ↔ Physical Power, linked by PACE and RPM]
  5. RTL Power: Ins and Outs
     Inputs to RTL Power Analysis:
     • RTL (VHDL, Verilog, SystemVerilog)
     • Power domains (UPF / CPF): Vdd1, Vdd2
     • Capacitance model (WLM / PACE)
     • Activity (FSDB / VCD / SAIF)
     • Clock tree, gating (SDC, PACE, user input)
     • Power models (Liberty .lib)
     Example RTL (diagram labels: clk, register, mux, and):
         module PA ( ...
           always @ (posedge clk) begin
             dout <= din1;
           end
           assign out = sel ? dout : din2;
         ...
         endmodule
  6. Low Power RTL Design Methodology
     • Check power vs. budget (Peak Power = 391mW, Average power = 239mW)
     • Profile power vectors (TRANSMIT MODE vs. RECEIVE MODE: residual receive activity in transmit mode)
     • Debug power hotspots (Enabled Clock, Inactive Data)
     • Reduce power automatically
     • Perform design trade-offs
     • Monitor power vs. budget (RTL Power Regression Flow)
     [Chart: Power (W), 0 to 6.00E-02, for Version 1 and Version 2 under Typ and Idle vectors]
  7. RTL vs. Gates: Accuracy and Performance (Nvidia Case Study)
     • RTL Power Accuracy: ~15%
     • RTL Power: ~30X faster
  8. RTL Capacity: Large Designs / FSDBs (Samsung Case Study)
     FSDB captures only power-critical signals identified by PowerArtist:
     • FSDB size: 1/4
     • TAT: 4X faster
     • Loss of accuracy: 2%
  9. RTL Power Analysis
  10. PowerArtist RTL Power Analysis
      Activity Analysis:
      • Total Logic / Clock Activity per Hierarchical Instance
      • Qualify Coverage per Power Mode
      • Identify Power Bugs
      Average Power Analysis:
      • Understand Power: Where? Why?
      • Per Hierarchy, Category, Mode, Clock / Voltage Domains
      • Qualify Power Efficiency with Multiple Metrics
      Time-based Power Analysis:
      • Power Waveforms per Hierarchical Instance
      • Waveforms per Category: Clock, Memory, Logic
      • Identify Peak Power and Time
  11. Clock Gating Efficiency: Temporal and Structural Metrics
      Example (waveform signals: clk, en, gclk, data):
      • 16 of 20 bits are gated
      • 5 of 10 cycles are gated
      • 2 of 5 enabled cycles had data toggles

      Metric  Definition              Type of Metric            Value
      ------  ----------------------  ------------------------  -----
      SCGE    % Gated Bits            Structural                80%
      DCGE    % Gated Clock Cycles    Temporal (en, clk)        50%
      CGEE    % Ideally Gated Cycles  Temporal (data, en, clk)  40%
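The three values on the clock gating efficiency slide follow directly from its example counts. The sketch below reproduces that arithmetic; it assumes CGEE is read as the share of enabled cycles that actually carried data toggles (which is what yields the slide's 40%), and the helper name `percent` is illustrative, not part of PowerArtist.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rejecting an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts taken from the slide's example.
total_bits, gated_bits = 20, 16
total_cycles, gated_cycles = 10, 5
enabled_cycles = total_cycles - gated_cycles   # cycles where the clock was enabled
toggling_enabled_cycles = 2                    # enabled cycles with data toggles

scge = percent(gated_bits, total_bits)         # structural: share of gated register bits
dcge = percent(gated_cycles, total_cycles)     # temporal: share of gated clock cycles
# Assumption: CGEE = enabled cycles that did useful work / all enabled cycles.
cgee = percent(toggling_enabled_cycles, enabled_cycles)

print(f"SCGE={scge:.0f}%  DCGE={dcge:.0f}%  CGEE={cgee:.0f}%")
```

Read together, the numbers say that most bits are gated (SCGE), the gate is closed half the time (DCGE), yet only a minority of the cycles that remain enabled do useful work (CGEE), so the enable condition itself is the place to look.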
  12. Clock Gating Efficiency: Temporal and Structural Metrics
      [Screenshot labels: Static CGE and Dynamic CGE (0% to 100%); CGEE, Power Impact; CGE: Static, Dynamic; Flop: Power, Activity]
  13. RTL Power Reduction
  14. PowerArtist RTL Power Reduction (Original RTL → Low-Power RTL)
      1. Interactive Power Debug
         • Block-level Power “Bugs”
         • Large Power Savings
      2. Automated Power Reduction
         • Instance-level Power Reduction
         • 15 Analysis-driven Techniques
      3. Customizable Power Reports
         • TCL Queries to OADB
         • Automation Beyond PowerArtist Reports
      Example TCL query (loop variable made consistent):
          openPDB powerartist.pdb
          set RPT [open $output_file "w"]
          set ungated_registers [getRegisters -cg none]
          foreach i $ungated_registers {
              set dyn_power [getPropVal $i Dynamic_Power "inst"]
              set bit_width [getInstWidth $i]
              set file [getPropVal $i File_Name "inst"]
              set line_num [getPropVal $i Line_Number "inst"]
          }
  15. Debug Power: Visualize-Analyze-Reduce
      • Inactive Data, Active Clock
      • Identify Block-level Clock Gating Enable
  16. Block-Level Power Reduction
      • Clock Active, Data Inactive → Block-level Clock Gating
      • Clock Inactive, Data Active → Block-level Data Gating
      Clock Activity per Hierarchy: Constant high activity. Missed clock gating?
      Wasted Activity per Mode: Redundant activity in read mode
      Block-level Activity Analysis: Clock and Data Ports

      1.1 Clock Pins
      -------------------------------------------------------
      Redundant  Total    Pin    Mode  Instance
      Cycles     Cycles   Name   Name  Name
      -------------------------------------------------------
      200        201      CLKA   read  top.core1.t1.dpmem.m1
      -------------------------------------------------------
      1.2 Input and Redundant Pins
      -------------------------------------------------------
      Redundant  Total    Pin    Mode  Instance
      Toggles    Toggles  Name   Name  Name
      -------------------------------------------------------
      1          1        AB[8]  read  top.core1.t1.dpmem.m1
      -------------------------------------------------------
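The activity report on the block-level power reduction slide can be turned into a "percent redundant" figure per pin. The following is a minimal sketch using the two rows shown on the slide; the `PinActivity` class, the `flag_wasted` helper, and the 90% threshold are illustrative assumptions, not PowerArtist features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinActivity:
    pin: str
    mode: str
    instance: str
    redundant: int   # redundant cycles (clock pins) or toggles (input pins)
    total: int       # total cycles or toggles observed

    @property
    def redundant_pct(self) -> float:
        """Share of observed activity that was redundant, in percent."""
        return 100.0 * self.redundant / self.total if self.total else 0.0


# Rows copied from the slide's report.
rows = [
    PinActivity("CLKA", "read", "top.core1.t1.dpmem.m1", 200, 201),
    PinActivity("AB[8]", "read", "top.core1.t1.dpmem.m1", 1, 1),
]


def flag_wasted(activity, threshold_pct=90.0):
    """Return pins whose redundant share meets a (hypothetical) threshold."""
    return [row for row in activity if row.redundant_pct >= threshold_pct]


for row in flag_wasted(rows):
    print(f"{row.instance}.{row.pin} [{row.mode}]: {row.redundant_pct:.1f}% redundant")
```

On the slide's data both pins are flagged: the memory clock is redundant in 200 of 201 cycles in read mode, which is the kind of pattern that points to a missing block-level clock gate.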
  17. Instance-Level Power Reduction
      Clock / Clock Gating:
      • Clock gating coverage
      • Clock gating efficiency
      • Sequential and combinational
      Control Logic and Datapath:
      • Redundant activity
      • Don’t care conditions
      • Datapath operand isolation
      Memory Subsystem:
      • Redundant read/write
      • Splitting memories
      • Exercising sleep modes
  18. Analysis-Driven RTL Power Reduction
      • Wasted activity/power when sel is 0
  19. Analysis-Driven RTL Power Reduction
      • Pre-compute based new clock gate enables
      • Multi-cycle ODC sequential analysis
  20. Analysis-Driven RTL Power Reduction
      • Pre-compute based new clock gate enables
      • Multi-cycle ODC sequential analysis
      [Chart: Predicted Power Savings (normalized, 0.00 to 1.00) vs. # RTL Changes (Design Effort, 1 to 291)]
      Top 5 RTL changes → 50% identified power savings
      Maximize Power Savings, Minimize Design Impact
      15 Power Reduction Techniques:
      • Clock, Memory, Logic
      • Sequential, Combinational
      • Vector-based, Vectorless
      • Hierarchical, SoC capacity
  21. Power Reduction Case Studies
      [Diagram labels: mux inputs A, B (select 1 / 0), scan_enable = 0, scan_clock, data_in, M_OUT; access sequence Write, Write, Read]
      MUX Reduction Technique:
      • Scan clocks toggling in functional mode
      • Redundant data activity in registers wasting power
      Redundant Data Toggles GMC Technique:
      • Redundant data toggles in read mode
      • Cycle-based analysis reports % Redundant Cycles
  22. Power Database Access with TCL API
      Power Database (OpenAccess)
      Design Queries:
      • getMemories/Flops/Combs
      • getFanout
      • getModulePorts
      • reportDesignStats
      Report Creation:
      • reportCGEfficiency
      • diffPdbPower
      • reportPower
      • reportReductions
      Power Queries:
      • getPropVal instance/net
      • getClockPower
      • getNetPower
      • getClockEnableExpr
      Design Navigation:
      • dls
      • dpwd, dcd
      • dpushd, dpopd
      • show
      Customize and Automate Power Reduction, Reports, Regressions:
      • Quick access to power and design properties
      • Accomplish custom tasks with few lines of TCL
  23. Custom Power Reports: 50% Idle Power Reduction in Mobile SoC
      PowerArtist clock gating report → identifies inefficient clock gates

      Instance Name                Enable Efficiency  Clock Power  Clock     En Net
      ---------------------------  -----------------  -----------  --------  -------------------------
      or1200_cpu.ckg12             0                  5.17E-03     clk       or1200_cpu.en_blk
      or1200_cpu.or1200_ctrl.ckg5  0.1                1.36E-03     gclk_blk  or1200_cpu.or1200_ctrl.n1

      • Inefficient enables waste power (Power Efficiency = 0)
      • Block-level clock gates control significant power (single clock gate controls >5mW)
      [Diagram: Block Clock Gate (clk, en_blk → gclk_blk) feeding Register Clock Gate (gclk_blk, en_reg → gclk_reg); waveform signals en_blk, clk, data, gclk_blk]
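A custom report like the one on this slide amounts to filtering clock gates by enable efficiency and ranking them by the clock power they control. The sketch below does that in plain Python over the two rows from the slide. It assumes the Clock Power column is in watts (consistent with the slide's ">5mW" remark); the `ClockGate` class, the `inefficient_gates` helper, and the 0.2 efficiency cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockGate:
    name: str
    enable_efficiency: float  # 0.0 means the enable never saves useful clocking
    clock_power_w: float      # assumed to be watts
    clock: str
    enable_net: str


# Rows copied from the slide's clock gating report.
gates = [
    ClockGate("or1200_cpu.ckg12", 0.0, 5.17e-3, "clk", "or1200_cpu.en_blk"),
    ClockGate("or1200_cpu.or1200_ctrl.ckg5", 0.1, 1.36e-3, "gclk_blk",
              "or1200_cpu.or1200_ctrl.n1"),
]


def inefficient_gates(all_gates, max_efficiency=0.2):
    """Gates at or below the (hypothetical) efficiency cutoff, largest power first."""
    picked = [g for g in all_gates if g.enable_efficiency <= max_efficiency]
    return sorted(picked, key=lambda g: g.clock_power_w, reverse=True)


worst = inefficient_gates(gates)
controlled_mw = sum(g.clock_power_w for g in worst) * 1e3

for g in worst:
    print(f"{g.name}: eff={g.enable_efficiency}, {g.clock_power_w * 1e3:.2f} mW via {g.enable_net}")
print(f"Clock power behind inefficient enables: {controlled_mw:.2f} mW")
```

Sorting by controlled power puts the block-level gate first, which matches the slide's point that a single inefficient block-level enable can dominate idle power.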
  24. RTL Power Regressions
      • 30+ blocks per typical SoC
      • 2+ vectors per block
      • Vectors written for power: idle, active
      • Daily block-level, weekly chip-level regressions monitor power changes
      • Power metrics track power efficiency
      • PowerArtist identifies where power changed
      Flow: RTL (Verilog, SV, VHDL) + Testbench → Simulator → FSDB → RTL Power Analysis, Reduction, Regression
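The core of a power regression is a comparison of per-block, per-vector power between two runs. The sketch below shows one way such a check could look; every block name, power value, and the 5% tolerance is hypothetical illustration data, and the function is not a PowerArtist command.

```python
def power_regressions(baseline, current, tolerance_pct=5.0):
    """Return (block, vector, percent_increase) for power that rose beyond tolerance.

    Both arguments map block name -> {vector name -> power in watts}.
    Entries missing from the baseline are skipped rather than guessed at.
    """
    flagged = []
    for block, vectors in current.items():
        for vector, watts in vectors.items():
            reference = baseline.get(block, {}).get(vector)
            if reference is None or reference <= 0:
                continue
            change_pct = 100.0 * (watts - reference) / reference
            if change_pct > tolerance_pct:
                flagged.append((block, vector, round(change_pct, 1)))
    # Largest increase first, so the worst offender heads the report.
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Hypothetical results from two nightly runs (watts), with idle and active vectors.
baseline = {"cpu": {"idle": 0.010, "active": 0.050},
            "dma": {"idle": 0.004, "active": 0.020}}
current = {"cpu": {"idle": 0.0102, "active": 0.056},
           "dma": {"idle": 0.006, "active": 0.0195}}

report = power_regressions(baseline, current)
for block, vector, pct in report:
    print(f"{block}/{vector}: +{pct}%")
```

Keeping idle and active vectors separate matters here: in this made-up data the largest regression is in an idle vector, which an active-only check would miss.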
  25. RTL Links with Physical Design
  26. PACE™: Physical-Aware RTL Power Budgeting
      PACE Bridges the RTL vs. Layout Gap → Predictable RTL Power Accuracy
      Flow: Post-Layout Gate-level Power → PACE Models (Cap, Clock) → RTL Power
      Physical effects captured:
      • Clock Distribution
      • Parasitics
      • Multiple Vt
      • Low-power Structures
      • Optimization
      Example RTL: module PA ( ... always @ (posedge clk) begin dout <= din1; end assign out = sel ? dout : din2; ... endmodule
  27. RTL PACE vs. Gate-Power: Mobile SoC @14nm
      • Total Power Correlation (Gate-SPEF vs. RTL-PACE vs. RTL-WLM): RTL-PACE Power within 20%
      • Clock Power Correlation (Gate-SPEF vs. RTL-PACE): RTL-PACE Clock Power within 20%
  28. RTL Power-Driven Power Integrity
      RPM Enables PDN Planning → Early, Optimal, Robust
      Flow: RTL Power → RTL Power Model (RPM) → Physical Power Integrity
      • Shrinking geometries → Increasing di/dt
      • Gate vectors too late
      • Layout late for changes
      • Error-prone guesstimates
      Example RTL: module PA ( ... always @ (posedge clk) begin dout <= din1; end assign out = sel ? dout : din2; ... endmodule
  29. RPM Case Studies
      Early Voltage Drop Analysis and Early Package Resonance Analysis
      [Plot legends: RPM, Gate FSDB, Vectorless; CPM(Layout)+Pkg, CPM(RPM)+Pkg, Pkg only]
      • Peak = 6X Average Power
      • Di/dt event not at the same time as the peak
      Peak and di/dt Cycle Selection on a GPU Core:
          Frame: DIDT
          Start time: 0.0817704
          Finish time: 0.0817706
          Average leakage for supply VDD: 0.00257393
          Average power for supply VDD: 0.185336
          Peak power for supply VDD: 0.219776
          Frame: CYCLE_POWER
          Start time: 0.0806005
          Finish time: 0.0806007
          Average leakage for supply VDD: 0.002569
          Average power for supply VDD: 0.250168
          Peak power for supply VDD: 0.266678
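The two frames reported on the RPM case studies slide can be compared with a few lines of parsing. The sketch below assumes the report puts one "key: value" pair per line (the layout is reconstructed from the flattened slide text) and leaves the values unitless because the slide does not state units. The `parse_frames` helper is illustrative, not a PowerArtist API.

```python
REPORT = """\
Frame: DIDT
Start time: 0.0817704
Finish time: 0.0817706
Average leakage for supply VDD: 0.00257393
Average power for supply VDD: 0.185336
Peak power for supply VDD: 0.219776
Frame: CYCLE_POWER
Start time: 0.0806005
Finish time: 0.0806007
Average leakage for supply VDD: 0.002569
Average power for supply VDD: 0.250168
Peak power for supply VDD: 0.266678
"""


def parse_frames(text):
    """Parse 'key: value' lines into {frame name: {metric: float}}."""
    frames = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Frame":
            current = frames.setdefault(value, {})
        elif current is not None:
            # Drop the supply suffix so metrics share short names across frames.
            current[key.replace(" for supply VDD", "")] = float(value)
    return frames


frames = parse_frames(REPORT)
peak_frame = max(frames, key=lambda name: frames[name]["Peak power"])
ratios = {name: m["Peak power"] / m["Average power"] for name, m in frames.items()}

for name, ratio in ratios.items():
    print(f"{name}: peak/average = {ratio:.2f}")
print(f"Highest peak power: {peak_frame}")
```

On these numbers the highest-power frame (CYCLE_POWER) starts earlier than the DIDT frame, which is consistent with the slide's note that the di/dt event does not coincide with the power peak.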
  30. Related Presentations @ DAC2014
      • Power Analysis Using PowerArtist for WaveLogic3 ASIC – 100Gbs Coherent Metro Optical Modem
      • Achieving RTL Power Efficiency and Automated Power Reduction
      • Methods for Achieving RTL to Gate Power Consistency
