AMD PowerTune technology is a significant
leap forward to better ensure that performance
is optimized for TDP-constrained GPUs. AMD
PowerTune technology helps deliver higher
performance that is optimized to the thermal
limits of the GPU by dynamically adjusting the
clock during runtime based on an internally
calculated GPU power assessment. AMD
PowerTune technology also improves the
mechanism to deal with applications that
would otherwise exceed the GPU’s TDP.
Azure Monitor & Application Insight to monitor Infrastructure & Application
AMD PowerTune Technology on Workstation Graphics
1. AMD PowerTune Technology
BACKGROUND higher state for maximum performance. High Thermal Design Power and GPU Design
workloads tend to push the GPU into the highest
Dynamic Power Management Like all microprocessors, GPUs consume
power state. Intermediate power states are
electrical energy while in operation, and
For many years, nearly all ATI FirePro™ supported for tasks of light-to-moderate demand
convert it to heat energy which must be
and AMD FirePro™ professional graphics – such as light 3D, video and compute tasks.
dissipated. The rate of energy consumption
products have been equipped with Dynamic An example of a GPU with three primary power
is therefore limited by a system’s ability to
Power Management (DPM). Dynamic Power states is shown in Figure 1.
both deliver power to the device and cool it
Management makes an assessment of the Traditionally, the highest power state has been by removing the heat it generates.
relative workload to aggressively conserve a fixed setting. When the graphics device is in
power when the demand on the graphics GPU manufacturers provide system builders
this state, the voltage and clock speeds are the
processor is low. With this technology the GPU with a Thermal Design Power (TDP) figure for
same regardless of the type of application that
can minimize power during light workloads such their products to allow them to design their
is being processed. In practice, the manner
as idle mode by enabling reductions in voltages, systems appropriately. This represents the
in which applications load the GPU can vary
engine and memory clock speeds. In such maximum power draw for reliable operation.
greatly based on how they are coded and
cases, the GPU is in the lowest power state There are many factors which can affect TDP.
how they specifically interact with the GPU
of voltage and frequency. When demanding architecture. As a result, GPU power draw in the • Voltages and clock speeds – higher transistor
workloads are placed on the GPU, voltage and highest power state can vary to a large degree voltages and switching speeds mean more
clock speeds are increased to a significantly based on the specific application that is running. power consumption.
• Workload – Applications that keep a larger
Figure 1: Example of Dynamic Performance Scaling percentage of the chip busy most of the
time will draw more power.
• Leakage – Transistors consume some
amount of power even when they are not
Relative Voltage and Frequency
switching; the amount of leakage for a
particular GPU can vary significantly as
a result of the manufacturing process and
(not to scale)
changes in operating temperatures.
• Ambient temperature – GPUs operating
in a hot environment, or in enclosures
with restricted airflow, are more difficult to
cool; temperature can also increase if a
device is kept heavily loaded for significant
Idle and Very Low Workloads Medium Workloads High Workloads periods of time.
(Lowest State) (Intermediate State) (Highest State)
2. TDP figures are typically provided for the entire Figure 2: Sample of Measured GPU Power Draw Normalized to TDP for Various
graphics card including the GPU ASIC, voltage Application in the DPM-h State
regulators, memory devices, interconnects,
and other board components. However, 1.2
Outlier
assumptions are made when generating such Applications
figures regarding the various factors mentioned 1.1
above. For maximum reliability, TDP figures
should assume a ‘worst case’ scenario. In the 1.0 Normalized TDP Limit
case of discrete graphics cards, this usually
means running at the maximum supported 0.9
Applications
clock frequency, on a device with the highest with Thermal
0.8 Headroom
allowable leakage, with one or more known
stressful workloads running trouble-free for
several minutes in a closed system, and with 0.7
multiple displays connected (as many as the
card can support simultaneously). 0.6
Some allowances can still be made to keep 0.5
the TDP reasonable, such as limiting ambient App_1 App_2 App_3 App_4 App_5 App_6 App_7
temperature to 45°C, so long as these can
be assumed to fall well outside typical usage
scenarios. However, this can still result in a TDP
much higher than what most users are likely to any circumstances. Similarly, high end GPUs The existing method of dealing with applications
encounter in normal operation. For this reason, may need to be limited in frequency to fit within that may exceed TDP includes thermal
a typical maximum board power figure can also a given board power envelope such as 75, 150, monitoring which may lead to a thermal
be useful. This still represents a board running 225 or 300 Watts. Such requirements need to event flag. A thermal event occurs when
at maximum frequency with stressful workloads, be strictly enforced to ensure that the system the GPU is loaded to the point where the
but assumes a device with average leakage design considerations are not compromised. junction temperature exceeds a pre-determined
rather than worst case, fewer connected warning value and forces the GPU into either
A sample of measured graphics applications
displays, and a well-ventilated system. an intermediate power state or the lowest
and corresponding levels of normalized power
power state. Hence, an application that
Furthermore, voltage and clock speeds that are draw for a GPU is provided in Figure 2. The
triggers a thermal event will be forced to a
selected for the highest power state of a GPU determination of the final voltage and clock
much lower level of performance to ensure
include consideration of the following factors: speeds for the TDP allowance in the high state
that TDP limits are not being exceeded. While
is generally based on a set of applications that
this helps to ensure that TDP limits are being
1. he TDP constraints of the overall design
T are known to be particularly power intensive.
enforced, it is not an ideal situation from an
using a ‘worst case’ approach. These include applications that are known
application performance perspective. A more
to put an exceptionally high demand on the
2. he upper limit of frequency of the GPU
T ideal scenario would be to precisely curtail the
GPU. Some applications are written for the
for a given voltage. power and manage it gradually so that it is
specific purpose of pushing the GPU past its
slightly below an absolute limit while the outlier
3. pplications used to determine the
A thermal limits and are generally referred to as
application is running.
power characteristics of the GPU in the outlier applications. Outlier applications tend
high power state. to generate much more activity within the GPU For applications that do not approach the limit
silicon than the vast majority of applications of a TDP-constrained GPU during the highest
These three factors will largely determine the
and consequently generate the largest dynamic power state, a situation arises where more
upper limit of a given GPU design. In many
power requirements. However some well-known performance could be made available if the
cases, GPUs can exceed the TDP limits of their
3D applications that are not written for such GPU could be cognizant of the available power
designs well before reaching their clock speed
specific purposes are known to push some headroom. However since the GPU’s highest
limits. This is particularly common with GPUs in
1 1 GPUs beyond their TDP limits. Some outlier power state clock speeds are fixed settings, the
1
power constrained notebook platforms as well
applications load the GPU in a transient fashion potential of added performance is not realized.
as very high performance desktop platforms.
such that they may only approach or exceed In this situation, performance is left on the table
For example, an entry level GPU may be able
TDP limits occasionally. For example, dynamic since the GPU has additional headroom for
to reach a clock of 900 MHz, but thermal design
power for a 3D application can vary based on more performance, but lacks a mechanism to
constraints may limit its clock in the high state
the content of the rendered scene. exploit it.
to 550 MHz to ensure that it does not go over
an assigned power budget of 25 watts under Despite these factors, the majority of
applications do not necessarily approach the
TDP of the GPU in the highest power state.
3. DYNAMICALLY OPTIMIZED Dynamic TDP Management This approach has multiple advantages. First,
it allows TDP-constrained GPUs to ship with
PERFORMANCE WITH AMD Traditionally, modern AMD GPUs were equipped
engine clock speeds in the highest state that
POWERTUNE TECHNOLOGY to transition between fixed power states with the
would otherwise have been lower without AMD
upper states having increasingly higher clock
Introduction PowerTune technology. This subsequently
speeds and voltages to increase performance
provides greater performance on the majority
AMD PowerTune technology is a significant when needed and minimize power when
of applications which do not exceed the TDP
leap forward to better ensure that performance performance is not needed.
constraints on the GPU. Second, it helps to
is optimized for TDP-constrained GPUs. AMD AMD PowerTune technology expands on this avoid throttling of the GPU for extreme outlier
PowerTune technology helps deliver higher by removing the constraint that a given power applications by managing down the GPU clock
performance that is optimized to the thermal state must be fixed. The AMD PowerTune speeds before a thermal event is flagged.
limits of the GPU by dynamically adjusting the algorithm embedded in the GPU hardware This results in outlier applications running at
clock during runtime based on an internally calculates the engine clock based on an internal significantly higher levels of performance than
calculated GPU power assessment. AMD assessment of the runtime power draw. When would otherwise be possible since the GPU is
PowerTune technology also improves the the GPU is in the highest activity or power not necessarily forced into an intermediate or
mechanism to deal with applications that state and not exceeding TDP, it will remain low power state through a thermal event. When
would otherwise exceed the GPU’s TDP. By in the highest power state for maximum an application is running that would otherwise
dynamically managing the engine clock speeds performance. In the case where AMD exceed the TDP limit, AMD PowerTune
based on calculations which determine the PowerTune calculates that the GPU is technology can adjust the clock to contain the
proximity of the GPU to its TDP limit, AMD exceeding TDP, the power is dynamically runtime power at a safe level that is slightly
PowerTune allows for the GPU to run at higher reduced in a gradual manner by reducing below the TDP limit. Also, outlier applications
nominal clock speeds in the high state than the clock while still maintaining the high tend to vary in their runtime workloads.
otherwise possible. AMD PowerTune technology power state. The amount of clock reduction is AMD PowerTune can manage this while the
is very different from existing methods; rather variable and depends on the GPU’s assessment application is running by recalculating power
than setting highest state GPU clock speeds of the power draw. A representation of how the draw many times within a given frame interval.
based on a worst case TDP approach that GPU engine clock speeds in the highest state By keeping the outlier application in the realm
can compromise performance in a majority of can be managed is shown in Figure 3. of the highest state (albeit an inferred state with
applications, AMD PowerTune technology can a reduction in engine clock), the fast transient
dynamically adjust the performance profile in response of the AMD PowerTune algorithm
real time to fit within the TDP envelope. is able to quickly raise clock speeds back to
the nominal highest power state levels if the
Figure 3: GPU Power State Comparison with AMD PowerTune near term demands of the application create
additional headroom.
Without AMD PowerTune AMD PowerTune Enabled Improved Performance on
Critical Applications
It is quite common for applications running in
the highest power state to have TDP levels
Key Benefits of AMD PowerTune below the GPU allowance. The large majority
• Higher clocks in the highest power state
of applications are not outliers. When the high
for TDP constrained designs result in
Relative Performance
Relative Performance
greater performance on the majority of power state clock speeds are fixed, it is not
Clock Control
applications
• Applications which require TDP
possible to take advantage of the remaining
containment can benefit from higher
performance compared to being forced
TDP headroom and increase clock speeds
to an intermediate state of significantly
lesser performance
at runtime to further improve application
performance. However, with AMD PowerTune
technology the GPU is able to ship with engine
clock speeds in the highest state that are
greater than what could be achieved without
the technology. As a result, AMD PowerTune
technology directly improves performance on
critical applications.