The document contains graphs and charts comparing the performance of different techniques for dynamically allocating cooperative thread arrays (CTAs) to cores on a GPU. It shows that dynamically allocating CTAs (DYNCTA) achieves higher performance than static thread allocation, while using less power and energy than alternative dynamic allocation methods that do not leverage power gating of unused cores.
25. [Figure: Normalized Power for DYNCTA, DYNCORE with power gating, and DYNCORE without power gating]
26. [Figure: Normalized Energy Efficiency for DYNCTA, DYNCORE with power gating, and DYNCORE without power gating]
27. [Figure: Normalized Value of IPC, power, and energy vs. Number of Cores Turned Off (4, 8, 12, 16)]
28. [Figure: Normalized Value of IPC, power, and energy vs. Number of Nodes in the System (64, 121)]
Editor's Notes
Slides 2-5 show some motivational figures. They show what happens to round-trip latency, core utilization, and IPC when we vary the number of CTAs on the cores.
This will be used in the motivation section, as an illustrative example of the core utilization problem caused by executing more concurrent CTAs.
This is used as motivation for our power algorithm. It shows how the applications respond to turning off cores. Type 1 applications are memory intensive; Type 2 applications are compute intensive. We see that Type 1 applications tolerate execution with a smaller number of cores better than Type 2 applications do.
This is the main performance result. We are comparing the baseline, our algorithm, and the static best results.
Slides 8-9 show the CTA modulation of six benchmarks over time. They are compared with the default CTA number, the static best, and the overall average of our algorithm over time. The secondary axis shows the ratio of active cycles to total cycles, indicating how compute intensive the application is at that point in time.
This figure shows the power consumption of the benchmarks after we apply our power algorithm (smaller is better). The first bar shows the power consumption of the baseline, the second bar shows the power consumption of the CTA algorithm, the third bar shows the power consumption when we can apply power gating, and the fourth bar shows the power consumption when there is no power gating.
This is similar to the figure in 11. Instead of power, it shows the performance/power ratio (larger is better), so we can say that this is the energy efficiency graph.
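As a rough illustration of the performance/power ratio described in the last note, the sketch below computes a configuration's efficiency relative to a baseline. It assumes IPC stands in for performance and that both quantities are normalized to the baseline; the function name and the sample numbers are hypothetical and are not taken from the slides.

```python
def normalized_efficiency(ipc, power, base_ipc, base_power):
    """Performance/power ratio of a configuration, relative to the baseline.

    Assumption: IPC is used as the performance metric. Values above 1.0
    mean the configuration is more energy efficient than the baseline.
    """
    if power <= 0 or base_ipc <= 0 or base_power <= 0:
        raise ValueError("power and baseline values must be positive")
    return (ipc / power) / (base_ipc / base_power)


# Hypothetical numbers, for illustration only (not measured results).
example = normalized_efficiency(ipc=1.2, power=0.8, base_ipc=1.0, base_power=1.0)
print(round(example, 2))
```

Normalizing to the baseline keeps the ratio unitless, which matches the "larger is better" reading of the efficiency graph.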