AMD PowerTune Technology on Workstation Graphics



AMD PowerTune technology is a significant
leap forward to better ensure that performance
is optimized for TDP-constrained GPUs. AMD
PowerTune technology helps deliver higher
performance that is optimized to the thermal
limits of the GPU by dynamically adjusting the
clock during runtime based on an internally
calculated GPU power assessment. AMD
PowerTune technology also improves the
mechanism to deal with applications that
would otherwise exceed the GPU’s TDP.

Published in: Technology, Business

BACKGROUND

Dynamic Power Management

For many years, nearly all ATI FirePro™ and AMD FirePro™ professional graphics products have been equipped with Dynamic Power Management (DPM). Dynamic Power Management makes an assessment of the relative workload to aggressively conserve power when the demand on the graphics processor is low. With this technology the GPU can minimize power during light workloads such as idle mode by enabling reductions in voltages, engine and memory clock speeds. In such cases, the GPU is in the lowest power state of voltage and frequency. When demanding workloads are placed on the GPU, voltage and clock speeds are increased to a significantly higher state for maximum performance. High workloads tend to push the GPU into the highest power state. Intermediate power states are supported for tasks of light-to-moderate demand, such as light 3D, video and compute tasks. An example of a GPU with three primary power states is shown in Figure 1.

[Figure 1: Example of Dynamic Performance Scaling. Relative voltage and frequency (not to scale) for three states: Idle and Very Low Workloads (Lowest State), Medium Workloads (Intermediate State), High Workloads (Highest State).]

Traditionally, the highest power state has been a fixed setting. When the graphics device is in this state, the voltage and clock speeds are the same regardless of the type of application that is being processed. In practice, the manner in which applications load the GPU can vary greatly based on how they are coded and how they specifically interact with the GPU architecture. As a result, GPU power draw in the highest power state can vary to a large degree based on the specific application that is running.

Thermal Design Power and GPU Design

Like all microprocessors, GPUs consume electrical energy while in operation, and convert it to heat energy which must be dissipated. The rate of energy consumption is therefore limited by a system’s ability to both deliver power to the device and cool it by removing the heat it generates.

GPU manufacturers provide system builders with a Thermal Design Power (TDP) figure for their products to allow them to design their systems appropriately. This represents the maximum power draw for reliable operation. There are many factors which can affect TDP.

  • Voltages and clock speeds: higher transistor voltages and switching speeds mean more power consumption.
  • Workload: applications that keep a larger percentage of the chip busy most of the time will draw more power.
  • Leakage: transistors consume some amount of power even when they are not switching; the amount of leakage for a particular GPU can vary significantly as a result of the manufacturing process and changes in operating temperatures.
  • Ambient temperature: GPUs operating in a hot environment, or in enclosures with restricted airflow, are more difficult to cool; temperature can also increase if a device is kept heavily loaded for significant periods of time.
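The relationship between power states, workload and power draw described above can be illustrated with a small sketch. It is not AMD's implementation: the `PowerState` type, the load thresholds, the voltages, clocks, capacitance and leakage values are all hypothetical, and the power estimate uses the standard first-order CMOS relation (dynamic power proportional to activity, capacitance, voltage squared and frequency) with leakage held constant for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    voltage_v: float   # hypothetical core voltage
    clock_mhz: float   # hypothetical engine clock


# Hypothetical operating points, lowest to highest; not from any AMD product.
STATES = (
    PowerState("lowest", 0.90, 150.0),
    PowerState("intermediate", 1.00, 400.0),
    PowerState("highest", 1.10, 700.0),
)


def select_state(load: float) -> PowerState:
    """Map a relative workload estimate in [0, 1] to a power state."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0 and 1")
    if load < 0.10:      # idle and very low workloads (assumed threshold)
        return STATES[0]
    if load < 0.60:      # light 3D, video and compute tasks (assumed threshold)
        return STATES[1]
    return STATES[2]     # demanding workloads


def estimated_power_w(state: PowerState, activity: float,
                      c_eff_nf: float = 30.0, leakage_w: float = 5.0) -> float:
    """First-order estimate: activity * C_eff * V^2 * f, plus static leakage.

    Leakage is a constant here; in practice it varies with the part and
    with operating temperature.
    """
    dynamic_w = (activity * (c_eff_nf * 1e-9)
                 * state.voltage_v ** 2 * (state.clock_mhz * 1e6))
    return dynamic_w + leakage_w


if __name__ == "__main__":
    highest = select_state(0.9)
    # Same state, different applications: power draw differs with activity.
    for activity in (0.6, 1.0):
        print(highest.name, activity, round(estimated_power_w(highest, activity), 2))
```

Under these assumed numbers, two applications that both sit in the highest state draw quite different power, which is the variation the fixed highest state cannot account for.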
TDP figures are typically provided for the entire graphics card including the GPU ASIC, voltage regulators, memory devices, interconnects, and other board components. However, assumptions are made when generating such figures regarding the various factors mentioned above. For maximum reliability, TDP figures should assume a ‘worst case’ scenario. In the case of discrete graphics cards, this usually means running at the maximum supported clock frequency, on a device with the highest allowable leakage, with one or more known stressful workloads running trouble-free for several minutes in a closed system, and with multiple displays connected (as many as the card can support simultaneously).

Some allowances can still be made to keep the TDP reasonable, such as limiting ambient temperature to 45°C, so long as these can be assumed to fall well outside typical usage scenarios. However, this can still result in a TDP much higher than what most users are likely to encounter in normal operation. For this reason, a typical maximum board power figure can also be useful. This still represents a board running at maximum frequency with stressful workloads, but assumes a device with average leakage rather than worst case, fewer connected displays, and a well-ventilated system.

Furthermore, voltage and clock speeds that are selected for the highest power state of a GPU include consideration of the following factors:

  1. The TDP constraints of the overall design using a ‘worst case’ approach.
  2. The upper limit of frequency of the GPU for a given voltage.
  3. Applications used to determine the power characteristics of the GPU in the high power state.

These three factors will largely determine the upper limit of a given GPU design. In many cases, GPUs can exceed the TDP limits of their designs well before reaching their clock speed limits. This is particularly common with GPUs in power constrained notebook platforms as well as very high performance desktop platforms. For example, an entry-level GPU may be able to reach a clock of 900 MHz, but thermal design constraints may limit its clock in the high state to 550 MHz to ensure that it does not go over an assigned power budget of 25 watts under any circumstances. Similarly, high-end GPUs may need to be limited in frequency to fit within a given board power envelope such as 75, 150, 225 or 300 Watts. Such requirements need to be strictly enforced to ensure that the system design considerations are not compromised.

A sample of measured graphics applications and corresponding levels of normalized power draw for a GPU is provided in Figure 2. The determination of the final voltage and clock speeds for the TDP allowance in the high state is generally based on a set of applications that are known to be particularly power intensive. These include applications that are known to put an exceptionally high demand on the GPU. Some applications are written for the specific purpose of pushing the GPU past its thermal limits and are generally referred to as outlier applications. Outlier applications tend to generate much more activity within the GPU silicon than the vast majority of applications and consequently generate the largest dynamic power requirements. However, some well-known 3D applications that are not written for such specific purposes are known to push some GPUs beyond their TDP limits. Some outlier applications load the GPU in a transient fashion such that they may only approach or exceed TDP limits occasionally. For example, dynamic power for a 3D application can vary based on the content of the rendered scene.

Despite these factors, the majority of applications do not necessarily approach the TDP of the GPU in the highest power state.

[Figure 2: Sample of Measured GPU Power Draw Normalized to TDP for Various Applications in the DPM-h State. Vertical axis from 0.5 to 1.2, with the normalized TDP limit at 1.0. App_1 through App_7 are shown, grouped as outlier applications above the limit and applications with thermal headroom below it.]

The existing method of dealing with applications that may exceed TDP includes thermal monitoring which may lead to a thermal event flag. A thermal event occurs when the GPU is loaded to the point where the junction temperature exceeds a pre-determined warning value and forces the GPU into either an intermediate power state or the lowest power state. Hence, an application that triggers a thermal event will be forced to a much lower level of performance to ensure that TDP limits are not being exceeded. While this helps to ensure that TDP limits are being enforced, it is not an ideal situation from an application performance perspective. A more ideal scenario would be to precisely curtail the power and manage it gradually so that it is slightly below an absolute limit while the outlier application is running.

For applications that do not approach the limit of a TDP-constrained GPU during the highest power state, a situation arises where more performance could be made available if the GPU could be cognizant of the available power headroom. However, since the GPU’s highest power state clock speeds are fixed settings, the potential of added performance is not realized. In this situation, performance is left on the table since the GPU has additional headroom for more performance, but lacks a mechanism to exploit it.
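The cost of a fixed, worst-case clock can be made concrete with a short worked sketch based on the 900 MHz, 550 MHz and 25 watt figures in the example above. Everything else is an assumption for illustration: power is taken to scale linearly with clock at a fixed voltage, and the two watts-per-MHz coefficients are hypothetical values chosen so that the worst case reproduces the 550 MHz limit from the text. The function names are likewise hypothetical.

```python
def worst_case_fixed_clock(budget_w: float, worst_w_per_mhz: float,
                           f_max_mhz: float, step_mhz: float = 50.0) -> float:
    """Highest clock, stepping down from f_max, whose worst-case power fits."""
    clock = f_max_mhz
    while clock > 0 and clock * worst_w_per_mhz > budget_w:
        clock -= step_mhz
    return clock


def headroom_w(budget_w: float, app_w_per_mhz: float, clock_mhz: float) -> float:
    """Unused power budget for one application running at a fixed clock."""
    return budget_w - app_w_per_mhz * clock_mhz


def is_outlier(normalized_power: float) -> bool:
    """Figure 2 style check: power normalized to TDP above 1.0."""
    return normalized_power > 1.0


if __name__ == "__main__":
    BUDGET_W = 25.0     # power budget from the example in the text
    F_MAX_MHZ = 900.0   # clock the GPU could otherwise reach, from the text
    WORST = 0.045       # hypothetical W/MHz for a worst-case outlier workload
    TYPICAL = 0.027     # hypothetical W/MHz for a typical application

    fixed = worst_case_fixed_clock(BUDGET_W, WORST, F_MAX_MHZ)
    print("fixed highest-state clock:", fixed, "MHz")
    print("typical app headroom at that clock:",
          round(headroom_w(BUDGET_W, TYPICAL, fixed), 2), "W")
```

With these assumed coefficients the typical application leaves roughly 10 W of the 25 W budget unused at the fixed clock, which is the headroom a fixed highest state has no way to exploit.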
DYNAMICALLY OPTIMIZED PERFORMANCE WITH AMD POWERTUNE TECHNOLOGY

Introduction

AMD PowerTune technology is a significant leap forward to better ensure that performance is optimized for TDP-constrained GPUs. AMD PowerTune technology helps deliver higher performance that is optimized to the thermal limits of the GPU by dynamically adjusting the clock during runtime based on an internally calculated GPU power assessment. AMD PowerTune technology also improves the mechanism to deal with applications that would otherwise exceed the GPU’s TDP. By dynamically managing the engine clock speeds based on calculations which determine the proximity of the GPU to its TDP limit, AMD PowerTune allows for the GPU to run at higher nominal clock speeds in the high state than otherwise possible. AMD PowerTune technology is very different from existing methods; rather than setting highest state GPU clock speeds based on a worst case TDP approach that can compromise performance in a majority of applications, AMD PowerTune technology can dynamically adjust the performance profile in real time to fit within the TDP envelope.

Dynamic TDP Management

Traditionally, modern AMD GPUs were equipped to transition between fixed power states with the upper states having increasingly higher clock speeds and voltages to increase performance when needed and minimize power when performance is not needed.

AMD PowerTune technology expands on this by removing the constraint that a given power state must be fixed. The AMD PowerTune algorithm embedded in the GPU hardware calculates the engine clock based on an internal assessment of the runtime power draw. When the GPU is in the highest activity or power state and not exceeding TDP, it will remain in the highest power state for maximum performance. In the case where AMD PowerTune calculates that the GPU is exceeding TDP, the power is dynamically reduced in a gradual manner by reducing the clock while still maintaining the high power state. The amount of clock reduction is variable and depends on the GPU’s assessment of the power draw. A representation of how the GPU engine clock speeds in the highest state can be managed is shown in Figure 3.

[Figure 3: GPU Power State Comparison with AMD PowerTune. Two panels, "Without AMD PowerTune" and "AMD PowerTune Enabled", each plotting relative performance, with a "Clock Control" annotation.]

Key Benefits of AMD PowerTune

  • Higher clocks in the highest power state for TDP constrained designs result in greater performance on the majority of applications.
  • Applications which require TDP containment can benefit from higher performance compared to being forced to an intermediate state of significantly lesser performance.

This approach has multiple advantages. First, it allows TDP-constrained GPUs to ship with engine clock speeds in the highest state that would otherwise have been lower without AMD PowerTune technology. This subsequently provides greater performance on the majority of applications which do not exceed the TDP constraints on the GPU. Second, it helps to avoid throttling of the GPU for extreme outlier applications by managing down the GPU clock speeds before a thermal event is flagged. This results in outlier applications running at significantly higher levels of performance than would otherwise be possible since the GPU is not necessarily forced into an intermediate or low power state through a thermal event. When an application is running that would otherwise exceed the TDP limit, AMD PowerTune technology can adjust the clock to contain the runtime power at a safe level that is slightly below the TDP limit. Also, outlier applications tend to vary in their runtime workloads. AMD PowerTune can manage this while the application is running by recalculating power draw many times within a given frame interval. By keeping the outlier application in the realm of the highest state (albeit an inferred state with a reduction in engine clock), the fast transient response of the AMD PowerTune algorithm is able to quickly raise clock speeds back to the nominal highest power state levels if the near term demands of the application create additional headroom.

Improved Performance on Critical Applications

It is quite common for applications running in the highest power state to have power levels below the GPU’s TDP allowance. The large majority of applications are not outliers. When the high power state clock speeds are fixed, it is not possible to take advantage of the remaining TDP headroom and increase clock speeds at runtime to further improve application performance. However, with AMD PowerTune technology the GPU is able to ship with engine clock speeds in the highest state that are greater than what could be achieved without the technology. As a result, AMD PowerTune technology directly improves performance on critical applications.
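The clock management described under Dynamic TDP Management can be sketched as a simple control loop. This is an illustrative model only, not the algorithm embedded in AMD hardware: the function names, step size, safety margin, clock values and the linear watts-per-MHz power model (standing in for the GPU's internal power assessment) are all assumptions. The `thermal_event_clock` helper models the older fixed-state behaviour in simplified form, using power rather than junction temperature as the trigger.

```python
def powertune_step(clock_mhz: float, inferred_power_w: float, tdp_w: float,
                   nominal_mhz: float, floor_mhz: float,
                   step_mhz: float = 10.0, margin: float = 0.02) -> float:
    """One control step: lower the clock when over target, else recover it."""
    target_w = tdp_w * (1.0 - margin)      # aim slightly below the TDP limit
    if inferred_power_w > target_w:
        return max(floor_mhz, clock_mhz - step_mhz)
    return min(nominal_mhz, clock_mhz + step_mhz)


def simulate(watts_per_mhz: float, steps: int = 40, tdp_w: float = 100.0,
             nominal_mhz: float = 800.0, floor_mhz: float = 400.0) -> list[float]:
    """Run the loop for one application and return the clock after each step."""
    clock = nominal_mhz
    trace = []
    for _ in range(steps):
        # Hypothetical linear model in place of on-die power inference.
        power_w = watts_per_mhz * clock
        clock = powertune_step(clock, power_w, tdp_w, nominal_mhz, floor_mhz)
        trace.append(clock)
    return trace


def thermal_event_clock(power_at_nominal_w: float, tdp_w: float,
                        nominal_mhz: float, intermediate_mhz: float) -> float:
    """Simplified fixed-state behaviour: an over-limit application drops a state."""
    return intermediate_mhz if power_at_nominal_w > tdp_w else nominal_mhz


if __name__ == "__main__":
    outlier = simulate(0.15)    # would draw 120 W at the nominal 800 MHz
    typical = simulate(0.10)    # draws 80 W at the nominal 800 MHz
    print("outlier settles near:", outlier[-2:], "MHz")
    print("typical stays at:", typical[-1], "MHz")
    print("thermal-event fallback:", thermal_event_clock(120.0, 100.0, 800.0, 400.0), "MHz")
```

In this model the outlier application settles just under the limit at a clock well above the assumed intermediate state, while the application with headroom is never slowed. The sketch lowers the clock one small step at a time from nominal, so it briefly exceeds the limit at the start; the text describes the real mechanism as recalculating power many times within a frame interval.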
As outlined in the example in Figure 4, an application with a large amount of headroom (in this case, App_a) has the greatest potential for performance improvement with AMD PowerTune technology by allowing the maximum engine clock available for the GPU. Similarly, an application with some headroom (App_b) can still take advantage of the higher available clock speeds in the highest power state, but perhaps to a lesser degree than an application with more headroom.

[Figure 4: Comparison of Outlier Application Behavior. Left panel: Relative Power of a GPU at Nominal Theoretical Frequency of 700 MHz, for App_a through App_d, with a region marked "Unsustainable: Thermal Event". Right panel: Relative Power of a GPU with AMD PowerTune Technology, for App_a through App_d, with clock labels of 800, 750, 700 and 650 MHz. Vertical axis from 40 to 130 in both panels.]

AMD professional graphics products supporting AMD PowerTune technology use calibration information based on performance analysis using workstation applications. This ensures optimum performance across a wide range of your most critical professional application usage scenarios. Generally speaking, these applications have a very different usage profile when compared to consumer applications. For example, workstation applications typically do a great deal of geometry processing, and significantly less pixel shading and texturing. Due to this difference in usage profiles, applications can behave differently on AMD professional graphics when compared to AMD consumer graphics products. AMD professional graphics products have been optimized for productivity, while AMD consumer graphics products are optimized for gaming performance.

Minimized Performance Impact on Applications which would otherwise exceed TDP

AMD PowerTune technology allows TDP-constrained GPUs to ship with greater nominal clock speeds in the highest power state due to the mechanism by which it handles applications which exceed TDP. Without AMD PowerTune, applications which exceed the GPU TDP are forced to lower power states (such as intermediate or lowest states) and pay a very steep performance penalty as a result of drastically reduced clock speeds and voltages. In the AMD PowerTune-enabled GPU, the clock speeds in the highest state can be dynamically managed to hold the TDP budget in a way that was not otherwise achievable. The goal for applications that exceed the TDP budget is to maintain operation in the highest power state, but dial back on runtime power by modulating the high power state clock to keep power draw slightly below the absolute limit. This keeps the GPU away from the undesirable performance penalty of a forced state reduction arising from a thermal throttling event. Application performance is maximized and system stability is improved as applications cannot exceed the available power for the graphics board and stress the GPU with excessive thermal conditions.

Programmable, Deterministic and Application Profile Independent Behavior

AMD PowerTune power monitoring and management technology is integrated into the GPU silicon itself to essentially eliminate the unpredictable variability that would arise if it were implemented at the board or system level with analog sensors and feedback mechanisms. Activity is monitored to infer real-time power draw at the device level through integrated counters that are placed throughout the GPU. As a result, AMD PowerTune is transparent in its ability to contain applications in real time without a reliance on specific drivers or application profiles. AMD PowerTune technology is also programmable in a way that can allow GPU and system designers to tailor the power containment behavior to the specific needs of the user. This can allow some systems to set a lower TDP threshold to enable a cooler overall system under heavy load, or set a higher TDP threshold in the case where there is known to be additional TDP headroom.

SUMMARY

AMD PowerTune Technology Maximizes TDP-Constrained Performance by Enabling Higher GPU Clock Speeds

AMD PowerTune is a breakthrough technology that sets an entirely new direction for maximum performance at TDP. It allows the GPU to be designed with higher engine clock speeds which can be applied on the broad set of applications that have thermal headroom. It also improves how GPUs manage outlier applications by managing them down to power levels within the TDP limits with minimal performance impact. Without AMD PowerTune technology, a TDP-limited GPU’s final clock speeds would inherently be based on a compromise between severe performance loss on higher power applications and performance left on the table with lower power applications. With the intelligent monitoring and management capabilities introduced by AMD PowerTune technology, these compromises are removed to maximize performance and improve system stability.

DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

© Copyright 2011 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, FirePro, the FirePro logo, Catalyst, CrossFire, and combinations thereof are trademarks of Advanced Micro Devices, Inc. 50011A