Altera track g


Published in: Education, Technology, Business
  • Goal of this slide: lay the foundations of power consumption. Engineers will already understand the three types of power listed, but this introductory slide ensures everyone is on the same page. In-rush current (also known as power-up Icc, surge Icc, or startup Icc, where Icc denotes the supply current) is the current drawn by the FPGA immediately after power is applied to the pins. FPGAs typically draw a large current during this stage, but technological innovation implemented in Stratix II has allowed us to mitigate these effects, as shown in the diagram. The EP2S60ES draws a surge of approximately 2 A. Virtex-4 will surge as indicated by its datasheet; see the FAE power presentation. Stratix devices were also specified to surge. Static current (also known as standby, quiescent, or leakage current) is the current drawn by the non-operating portions of the device. In this diagram, the static section shows the power consumption of the device in a non-operational condition (i.e. no clock signal is applied). 90 nm Stratix II devices consume more static power than 130 nm Stratix devices. Dynamic + static power (also known as total power) is the power consumed by the FPGA during normal operation of the device: the clock is oscillating and the customer’s design is operating. Actual dynamic power consumption (and therefore total power) varies greatly from design to design, depending on frequency, resource utilization, operating temperature, and other factors.
  • Non-terminated standards consume very little static power.
  • Choosing appropriate I/O standards can significantly reduce design power. To reduce power, use a low-voltage I/O standard (most important) and the lowest drive strength that meets your speed requirements. For lower-frequency applications, or applications where I/Os are idle most of the time, I/O standards that are not resistively terminated, such as LVTTL, LVCMOS, and PCI, have the lowest total power because their static power is very low. However, as I/O toggle rates increase, these standards eventually dissipate more power than resistively terminated standards such as SSTL, HSTL, and LVDS, because unterminated I/O standards generally have higher dynamic power. Use the PowerPlay Power Analyzer to analyze different I/O configurations and choose the lowest-power option for your system.
  • 2. Block Model, Routing Model, Operating Conditions, Vectorless Activity Estimation 3. Thermal management: Heat sink and Fan (Airflow)
  • Glitch filtering
  • Power dissipation of Stratix II devices in 112 different designs. Logic resource utilization varied across the available resources. All clocks set to 200 MHz, assuming a 12.5% toggle rate everywhere. The power breakdown is design dependent; characteristics vary depending on design function.
  • When instantiating a RAM using the MegaWizard Plug-In Manager …
  • Not really needed for Quartus synthesis, since you can tell it to directly optimize for power. Area synthesis
  • altclkctrl
  • You can access this MegaFunction with the MegaWizard Plug-In Manager in the Quartus GUI.
  • In addition to the selectable core voltage feature, Stratix III also offers Programmable Power Technology, which enables Stratix III core logic to be programmed at the tile level for high-speed or low-power mode. This is done automatically by the Quartus II software. A tile is defined as a LAB and MLAB pair together with its adjacent routing; a DSP block, a memory block, and an I/O interface are each also defined as a tile. Tiles with DSP blocks, memory blocks, and I/O elements that are used in the design are always set to high-speed mode; when unused, they are configured as low-power by default, which reduces static and dynamic power.
  • Existing FPGA fabrics are designed to deliver the highest performance everywhere, which results in high leakage current everywhere. (Click) However, the majority of designs have only a few timing-critical paths. These paths require the highest performance. (Click) The rest of the logic does not require the highest performance, and with Stratix III all non-performance-critical logic can be set to low-power mode. (Click) All unused logic is also set to low-power mode. This reduces static power by 70% where low-power logic can be used. Productivity is a significant part of this story, as this added complexity would be a huge burden for the customer to manage unless we fully automate the process in the Quartus II design methodology.
  • Designs that are very high in LE usage and frequency see the highest power reduction relative to the FPGA.

    1. 1. Device and circuit architecture for Low power design & techniques Shlomi Shaked – Senior FAE ALTERA Department - Eastronics
    2. 2. Agenda <ul><li>Introduction </li></ul><ul><li>Power Analysis </li></ul><ul><li>Power Optimization </li></ul><ul><li>Technology for Low Power </li></ul>
    3. 3. Introduction Static, Dynamic and I/O Power in FPGAs
    4. 4. Power Basics [Figure: Stratix family power-up profile, current vs. time, with regions Power Up (in-rush current), Static, and Total Power (Dynamic+Static); curves for a typical FPGA and for the Stratix family]
    5. 5. Power Components <ul><li>Power During Operation </li></ul><ul><ul><li>Standby or Static Power </li></ul></ul><ul><ul><ul><li>Power with clocks stopped </li></ul></ul></ul><ul><ul><li>Dynamic Power </li></ul></ul><ul><ul><ul><li>Power that increases with clock frequency </li></ul></ul></ul><ul><ul><li>Get this power from Early Power Estimator or Quartus Power Analyzer </li></ul></ul><ul><li>Power During Start-up </li></ul><ul><ul><li>Temporary Power-Up Spike / Inrush Current </li></ul></ul><ul><ul><li>Configuration Power (to program SRAMs) </li></ul></ul><ul><ul><li>Get this power Information from data sheet </li></ul></ul>
    6. 6. Standby Power <ul><li>Standby or Static Power </li></ul><ul><ul><li>Power drawn by device even when the clocks are stopped </li></ul></ul><ul><li>Two Components </li></ul><ul><ul><li>Leakage Power: Transistors don’t turn off fully </li></ul></ul><ul><ul><li>IO Power for Terminated IO Standards </li></ul></ul><ul><ul><ul><li>IOs continuously drive current into resistors, even with no clock </li></ul></ul></ul>
    7. 7. Dynamic Power <ul><li>Dynamic Power </li></ul><ul><ul><li>Increases Linearly (or close to linearly) with clock Frequency </li></ul></ul><ul><li>Two Components </li></ul><ul><ul><li>Power due to Charging and Discharging of Capacitance of Routing Wires, ALMs, Load Capacitance on I/O Pins, etc. </li></ul></ul><ul><ul><li>Short Circuit Power </li></ul></ul><ul><ul><ul><li>Power Dissipated When Current Flows in a Direct Path from V CC to Ground during switching </li></ul></ul></ul>
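The near-linear dependence on clock frequency in slide 7 is what the standard CMOS switching-power relation, P = activity × C × Vcc² × f, predicts. The sketch below is a minimal illustration of that relation and is not taken from the slides: the function name and all numeric values are assumptions, and short-circuit power is ignored.

```python
def switching_power(activity, capacitance_f, vcc_v, clock_hz):
    """Switching power in watts: activity * C * Vcc^2 * f.

    activity is the assumed average fraction of the capacitance charged
    per clock cycle; all inputs are illustrative, not device data.
    """
    return activity * capacitance_f * vcc_v ** 2 * clock_hz


# Hypothetical operating point: 12.5% activity, 1 nF switched, 1.2 V core.
p_100 = switching_power(0.125, 1e-9, 1.2, 100e6)
p_200 = switching_power(0.125, 1e-9, 1.2, 200e6)
print(p_100, p_200)
```

Doubling the clock doubles the result, while the quadratic Vcc term suggests why lowering the core voltage is such an effective lever.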
    8. 8. I/O Power <ul><li>Dynamic Power to Charge Capacitance </li></ul><ul><li>Static Power </li></ul><ul><ul><li>Significant for Resistively-Terminated Standards like SSTL </li></ul></ul><ul><ul><li>Negligible for Non-Terminated I/O Standards like LVTTL and LVCMOS </li></ul></ul><ul><li>Terminated I/O Standards: Some Power Dissipated as Heat in Off-Chip Resistors </li></ul><ul><ul><li>Power Models Give Both Values </li></ul></ul><ul><ul><ul><li>Power Dissipated as Heat on FPGA (Thermal Power) </li></ul></ul></ul><ul><ul><ul><li>Power Drawn From Voltage Supply (Larger) </li></ul></ul></ul>[Figure labels: FPGA output buffer, R1, R2, C_L, VCCIO, I_BUFFER, V_TT]
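The trade-off between terminated and non-terminated standards can be sketched as two straight lines: a fixed static term plus a dynamic term that grows with toggle rate. The model and every coefficient below are hypothetical placeholders chosen only to show the crossover; real values should come from the PowerPlay Power Analyzer.

```python
def io_power_mw(static_mw, dynamic_mw_per_mhz, toggle_mhz):
    """Total I/O power: a fixed static term plus a term linear in toggle rate."""
    return static_mw + dynamic_mw_per_mhz * toggle_mhz


def crossover_mhz(unterminated, terminated):
    """Toggle rate at which the two (static, slope) models draw equal power."""
    (s_u, k_u), (s_t, k_t) = unterminated, terminated
    return (s_t - s_u) / (k_u - k_t)


# Hypothetical per-pin coefficients (static mW, dynamic mW per MHz).
LVCMOS_LIKE = (0.0, 0.25)   # no termination: no static draw, steeper slope
SSTL_LIKE = (20.0, 0.125)   # terminated: static draw, shallower slope

f_cross = crossover_mhz(LVCMOS_LIKE, SSTL_LIKE)
print(f_cross)
```

Below the crossover the non-terminated standard draws less power; above it the terminated standard does.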
    9. 9. Power Analysis Early Power Estimation ( EPE ), PowerPlay Power Analysis.
    10. 10. Power Analysis <ul><li>Three parts to good power estimates </li></ul><ul><ul><li>Accurate Toggle Rate data on each signal </li></ul></ul><ul><ul><li>Accurate Power Models of FPGA circuitry </li></ul></ul><ul><ul><li>Knowledge of device Operating Conditions </li></ul></ul>Toggle Rate & Signal Probability Power Models Power Estimation Report Operating conditions
    11. 11. PowerPlay Power Analysis Tools [Figure: estimation accuracy, from lower to higher, across the design flow from Design Concept to Design Implementation. Tools: Early Power Estimator spreadsheets and Quartus II Power Analyzer. PowerPlay analysis inputs: user input, Quartus II design profile, place & route results, simulation results.]
    12. 12. PowerPlay - Early Power Estimator
    13. 13. PowerPlay Power Analyzer <ul><li>Accurately Estimate the device power consumption after the design is completed </li></ul>Signal Activities User Design (after Fitting) PowerPlay Power Analyzer Power Analysis Report Operating Conditions
    14. 14. PowerPlay Power Analyzer Tool <ul><li>PowerPlay Power Analyzer Tool under Tools Menu </li></ul><ul><li>Toggle Rate Input </li></ul><ul><ul><li>Signal Activity File </li></ul></ul><ul><ul><ul><li>Output by Quartus II simulator </li></ul></ul></ul><ul><ul><li>VCD </li></ul></ul><ul><ul><ul><li>Generated By 3 rd -Party Simulators </li></ul></ul></ul><ul><ul><li>Assignment Editor </li></ul></ul><ul><ul><li>Unspecified Toggle Rates: use either: </li></ul></ul><ul><ul><ul><li>Default Toggle Rate </li></ul></ul></ul><ul><ul><ul><li>Vectorless estimation </li></ul></ul></ul><ul><li>Operation Condition Setting </li></ul>
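To make the toggle-rate input concrete, the following sketch counts transitions in a sampled waveform and divides by the observation time. It assumes that both rising and falling edges count as transitions; the function name and the waveform are hypothetical, and this is not how a signal activity file or VCD is actually parsed.

```python
def toggle_rate(samples, duration_s):
    """Transitions per second for one signal, given its sampled 0/1 values."""
    transitions = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    return transitions / duration_s


# Hypothetical waveform: 9 samples spanning 8 periods of a 200 MHz clock.
wave = [0, 0, 1, 1, 1, 1, 0, 0, 0]
rate = toggle_rate(wave, duration_s=8 / 200e6)
print(rate)
```

Signals without simulation data would still need a default toggle rate or vectorless estimation, as the slide notes.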
    15. 15. Power Optimization Synthesis & Place & Route
    16. 16. Core Dynamic Power Breakdown. Average power dissipation in various FPGA architecture elements: Routing 38%, ALM Combinational 19%, ALM Registers 18%, RAM Blocks 14%, Clock Networks 9%, DSP Blocks 2%*. *DSP Block Power: 5% of Dynamic Power for Designs That Use DSP Blocks
    17. 17. Power-Driven Compilation Flow – Quick and Easy <ul><li>Straight Forward </li></ul><ul><li>Longer Compile Time </li></ul><ul><li>Not Fully Optimized for Power </li></ul>Design Entry Schematic/HDL Power-Driven Synthesis (Extra effort) Power-Driven Fitter (Extra effort) PowerPlay Power Analyzer (Power Estimation)
    18. 18. Power-Driven Compilation Flow - Recommended <ul><li>Use Accurate Toggle Data From Simulation Results, Provide Best Guidance to Power-Driven Fitting </li></ul><ul><ul><li>SAF Provides the Design Signal Activity Information </li></ul></ul><ul><ul><li>Reads the Power Analyzer Input Settings </li></ul></ul><ul><li>Time Consuming Because of Longer Flow </li></ul><ul><li>Very Effective </li></ul>Fit Design Find Signal Toggle Rates: Gate-Level Simulation with Glitch Filtering Signal Activity (SAF) File Design Entry Schematic/HDL Power-Driven Synthesis (Extra effort) Power-Driven Fitter (Extra effort) PowerPlay Power Analyzer (Power Estimation)
    19. 19. 1.Power-Driven Synthesis <ul><li>Under Analysis & Synthesis Settings </li></ul><ul><li>Power Optimization Settings </li></ul><ul><ul><li>OFF: No Optimization </li></ul></ul><ul><ul><li>Normal compilation (Default): Power Optimizations which do not impact performance and do not Increase Compile Time </li></ul></ul><ul><ul><li>Extra effort: Power Optimizations which May Impact Design Performance and/or Increase Compile Time </li></ul></ul>
    20. 20. Impact On Memory Blocks <ul><li>Specify read-enable & write-enable signals on your RAMs whenever possible </li></ul><ul><ul><li>PowerPlay will convert to clock enables </li></ul></ul><ul><ul><li>Completely shuts down RAM on many cycles </li></ul></ul><ul><li>Leave RAM Block Type = Auto </li></ul><ul><ul><li>Power optimizer will choose best RAM block </li></ul></ul><ul><li>Memory Optimization </li></ul><ul><ul><li>Extra effort Setting </li></ul></ul><ul><ul><ul><li>Power-Aware Memory Balancing </li></ul></ul></ul>
    21. 21. Impact On Memory Blocks (Cont) Addr Decoder Data[0:3] Addr[10:11] Addr[10:11] Addr[0:9] Addr[0:11] Data[0:3] Power Efficient (Extra effort) Default Implementation 4K x 4 Memory 4K x 1 M4K RAM 1K x 4 M4K RAM 4 Extra effort Setting Normal Compilation Setting
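The two 4K x 4 implementations on slide 21 differ in how many M4K blocks are active per access. The sketch below models only that difference; the function names are hypothetical, and mapping Addr[10:11] to a block index is an assumption read off the slide labels.

```python
def active_blocks_default(addr):
    """Four 4K x 1 blocks, one per data bit: every access enables all four."""
    assert 0 <= addr < 4096
    return [0, 1, 2, 3]


def active_blocks_power_efficient(addr):
    """Four 1K x 4 blocks: Addr[10:11] selects one block, Addr[0:9] indexes it."""
    assert 0 <= addr < 4096
    return [addr >> 10]


print(len(active_blocks_default(0x7FF)), active_blocks_power_efficient(0x7FF))
```

With the address decoder in place, only one of the four blocks is enabled on any access, at the cost of the extra decode logic.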
    22. 22. Impact On Logic Elements <ul><li>Power-Aware Logic Mapping </li></ul><ul><ul><li>Normal compilation or Extra effort Settings </li></ul></ul><ul><ul><ul><li>Re-Arrange Logic During Synthesis to Reduce Impact of High Toggling Nets </li></ul></ul></ul><ul><ul><ul><ul><li>Balance the Area / Power / Speed Goals </li></ul></ul></ul></ul><ul><li>Less logic usually means less power </li></ul><ul><ul><li>Fewer signals to toggle </li></ul></ul>
    23. 23. 2.Power-Driven Fitter <ul><li>Under Fitter Settings </li></ul><ul><li>Power Optimization Settings </li></ul><ul><ul><li>OFF: No Optimization </li></ul></ul><ul><ul><li>Normal compilation (Default): Power Optimizations which do not impact performance and do not Increase Compile Time </li></ul></ul><ul><ul><li>Extra effort: Power Optimizations which May Impact Design Performance and/or Increase Compile Time </li></ul></ul>
    24. 24. Two Levels of Optimization <ul><li>Normal Compilation Setting </li></ul><ul><li>Power Efficient DSP Block Configuration </li></ul><ul><ul><li>Swap Operands to Multipliers </li></ul></ul><ul><ul><ul><li>Swap DATAB with DATAA if DATAB is wider than DATAA </li></ul></ul></ul><ul><ul><ul><li>Transparent to Designer and No Effect on Performance </li></ul></ul></ul><ul><li>Extra Effort Setting </li></ul><ul><li>Power Efficient DSP Block Configuration </li></ul><ul><li>Localize High-Toggling Nets, and Route for Minimum Capacitance </li></ul><ul><li>Place Circuitry to Minimize Clock Power </li></ul><ul><li>Utilizes the Signal Activity File to Guide the Fitter (Recommended) </li></ul>
    25. 25. Place Circuitry to Minimize Clock Power <ul><li>Previously P&R </li></ul><ul><ul><li>Places LEs Wherever is Best for Timing and Wiring </li></ul></ul><ul><ul><li>Doesn’t Try to Minimize Clock Power </li></ul></ul>LEs Clocks
    26. 26. Place Circuitry to Minimize Clock Power (Cont) <ul><li>With Extra effort: </li></ul><ul><ul><li>Groups LEs From Same Clock Domain to Reduce Clock Power </li></ul></ul><ul><ul><li>Reduces Clock Power with Minimal Effect on Routability </li></ul></ul>
    27. 27. 3.Clock Power Management <ul><li>Clocks represent a significant portion of dynamic power consumption </li></ul><ul><li>Clock routing power is automatically optimized by the QII software </li></ul><ul><li>Dynamic clock enable lets internal logic control the clock network </li></ul><ul><li>Gated clock in the LAB </li></ul>
    28. 28. Dynamic Clock Enable <ul><li>Entire clock domain unused in some cycles </li></ul><ul><ul><li>Use the altclkctrl MegaFunction to safely gate the clock </li></ul></ul><ul><ul><li>Shuts down entire clock tree → lower power than a clock enable on all registers </li></ul></ul>
    29. 29. Clock Control Block <ul><li>Use MegaWizard to Generate these Blocks </li></ul><ul><li>Dynamically Enable or Disable the Clock Network using Enable Signal </li></ul><ul><ul><li>When Clock Network is Powered Down, all the Logic Fed by that Clock does not Toggle </li></ul></ul><ul><ul><li>Reduces Overall Device Power Consumption </li></ul></ul><ul><li>Global and Regional Clock Network </li></ul>
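The benefit of gating the clock network at its root, as described in slides 28 and 29, can be approximated by counting clock-tree edges. This is a simplified behavioral model with a hypothetical enable pattern; it does not model the altclkctrl MegaFunction itself.

```python
def clock_tree_edges(enable_per_cycle):
    """Clock-network edges when the network is gated at its root.

    Each enabled cycle contributes one rising and one falling edge;
    gated cycles contribute none.
    """
    return 2 * sum(1 for en in enable_per_cycle if en)


# Hypothetical activity: the domain is needed in 2 of every 8 cycles.
enables = [1, 0, 0, 0, 1, 0, 0, 0] * 4
ungated = 2 * len(enables)          # register clock enables only: tree always toggles
gated = clock_tree_edges(enables)   # root gating: tree toggles only when enabled
print(ungated, gated)
```

With register-level clock enables the tree still toggles every cycle, which is why the slide describes root gating as the lower-power option.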
    30. 30. 4.Architectural Optimization <ul><li>Taking advantage of specific architecture resources. </li></ul><ul><li>TriMatrix memory is optimized for different specific functions. </li></ul><ul><li>System-level design considerations </li></ul>
    31. 31. Use Dedicated Resources <ul><li>DSP Blocks </li></ul><ul><ul><li>Less power than logic elements except for small multiplies (e.g. 5x5) </li></ul></ul><ul><ul><li>Use all the DSP logic (not just multipliers): </li></ul></ul><ul><ul><ul><li>Multiplier-accumulator, complex-multiplier, finite impulse response sample chaining, etc. </li></ul></ul></ul><ul><ul><li>Use altmult_accum MegaFunction if synthesis not inferring </li></ul></ul><ul><li>RAM blocks </li></ul><ul><ul><li>Usually inferred by synthesis </li></ul></ul><ul><ul><li>Use altsyncram MegaFunction if necessary </li></ul></ul><ul><li>Shift registers </li></ul><ul><ul><li>Many toggling signals: Power inefficient </li></ul></ul><ul><ul><li>Medium to large shift registers: Implement in FIFOs </li></ul></ul><ul><ul><li>Use altshift_taps MegaFunction if necessary </li></ul></ul>
    32. 32. 5. Other Power Optimization Tools <ul><li>Power Optimization Advisor </li></ul><ul><ul><li>Provides specific power optimization advice and recommendations based on the current project settings and assignments </li></ul></ul><ul><li>Design Space Explorer </li></ul><ul><ul><li>Use as a last resort </li></ul></ul><ul><ul><li>Needs more time </li></ul></ul>
    33. 33. Technology for Low Power Cyclone III LS / Cyclone IV Stratix IV / Stratix V Hardcopy™
    34. 34. Power / Performance / Area Compromises Power Utilization Performance
    35. 35. Key Technologies to Reduce Power. FPGA Power Reduction (Yellow Highlight: 28nm Techniques). Columns: Lower Static Power, Lower Dynamic Power. Process innovations (65nm -> 40nm -> 28nm…) ✓ ✓; Programmable Power Technology ✓; Lower core voltage (1.1V -> 1.0V -> 0.85 V) ✓ ✓; Extensive hardening of IP, Embedded HardCopy Blocks ☑ ☑; Hard power-down of more functional blocks ☑; More granular clock gating ☑; Selective use of high-speed transistors ✓; Partial reconfiguration ☑ ☑; Dynamic on-chip termination ✓ ✓; Quartus II software PowerPlay power optimization ✓ ✓
    36. 36. Programmable Power Technology <ul><li>Programmable Power Technology enables Altera high-end FPGA core logic to be programmed at the tile level for high-speed or low-power mode configuration </li></ul><ul><li>Tiles are defined as: </li></ul><ul><ul><li>MLAB/LAB pairs with routing to the pair </li></ul></ul><ul><ul><li>DSP blocks </li></ul></ul><ul><ul><li>Memory blocks </li></ul></ul><ul><ul><li>I/O interface </li></ul></ul><ul><li>Tiles with DSP blocks, memory blocks, and I/O elements that are used in the design are always set to high-speed mode </li></ul><ul><ul><li>Unused DSP blocks, memory blocks, and I/O interfaces are set to low-power mode by default to reduce static and dynamic power </li></ul></ul>
    37. 37. Programmable Speed vs. Leakage Note: A simple “model” showing Programmable Power Technology. Actual implementation varies and is patented. Source substrate Drain Gate 0 V < 0 V High speed (HS) Low power (LP) V T – Automatically controlled by software Channel Power High speed Low power Threshold voltage
    38. 38. Programmable Power Technology Performance where you need it, lowest power everywhere else, automated by Quartus II software Logic array High-speed logic Timing critical path Low-power logic Unused low-power logic
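The speaker notes for this deck quote a 70% static-power reduction for Stratix III logic that can run in low-power mode. The sketch below gives a rough sense of what that means for a whole design. It assumes static power is spread evenly over the fabric, which is a simplification, and the function name and example numbers are hypothetical.

```python
def static_power_after_ppt(static_w, low_power_fraction, reduction=0.70):
    """Static power after switching a fraction of the fabric to low-power mode.

    Assumes static power is spread evenly over the fabric, so only the
    low-power fraction sees the quoted reduction.
    """
    if not 0.0 <= low_power_fraction <= 1.0:
        raise ValueError("low_power_fraction must be between 0 and 1")
    return static_w * (1.0 - reduction * low_power_fraction)


# Hypothetical design: 2 W static, 80% of tiles off the critical paths.
print(static_power_after_ppt(2.0, 0.8))
```

The fewer timing-critical tiles a design has, the closer the overall saving gets to the quoted per-tile figure.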
    39. 39. Power Reduction with DDR3 & Dynamic OCT Save 1.9W per 72-bit DIMM at 1067 Mbps Write (Matching line impedance) Read (Terminating far end) Stratix IV FPGA Memory chip Stratix IV FPGA Memory chip <ul><li>DDR3 consumes 30% lower power than DDR2 </li></ul><ul><ul><li>DDR2 requires 1.8-V VCC rails </li></ul></ul><ul><ul><li>DDR3 requires 1.5-V VCC rails </li></ul></ul><ul><li>Dynamic OCT reduces termination power by 1 W/72-bits </li></ul>
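The 30% figure on this slide can be sanity-checked against the rail voltages. The slide does not state the derivation, so the check assumes that the saving comes mainly from the lower rail and that power scales with the square of the supply voltage at fixed capacitance and frequency:

```latex
\frac{P_{\mathrm{DDR3}}}{P_{\mathrm{DDR2}}}
  \approx \left(\frac{1.5\,\mathrm{V}}{1.8\,\mathrm{V}}\right)^{2}
  = \frac{2.25}{3.24}
  \approx 0.69
```

That is roughly 30% lower, which is consistent with the slide.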
    40. 40. HardCopy IV Devices Designed for Low Power <ul><li>Optimized architecture for power efficiency </li></ul><ul><li>Unused logic and memory blocks not connected to power rail </li></ul><ul><li>Unused clock trees not powered </li></ul><ul><li>Total core power reduction estimates: 30% to 70% </li></ul><ul><li>Final results pending characterization </li></ul>[Chart: power (W), axis 0 to 6, Stratix® FPGAs vs. HardCopy® ASICs, broken down into I/O, DSP, leakage, logic, routing and clocks, and RAM]
    41. 41. Thank you.