Energy Efficiency in Multicore CPUs: Harnessing Voltage Margins
Presentation given by Dimitris Gizopoulos (University of Athens) at the LEGaTO Final Event: Low-Energy Heterogeneous Computing Workshop on 4 September 2020
This event was collocated with FPL 2020
1.
Low-Energy Heterogeneous Computing Workshop – @FPL 2020 – September 4, 2020
Energy Efficiency in Multicore CPUs: Harnessing Voltage Margins
Dimitris Gizopoulos
University of Athens
2.
@LEGaTO/FPL – September 2020
U Athens
CPUs: Power, Energy, Performance
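• $\text{Power}_{\text{dynamic}} = \tfrac{1}{2} \times \text{Capacitance} \times \text{frequency} \times \text{Voltage}^{2}$
• $\text{Energy}_{\text{dynamic}} = \text{Power}_{\text{dynamic}} \times \text{Time}$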
• Nominal Voltage & Frequency = Worst Case (Workload, Conditions, Variability, Aging)
• i.e., the worst-case margins carry Power, Energy, and Performance costs
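The two relations on this slide can be turned into a quick estimate of what under-volting buys. The sketch below is only an illustration: the function names and the capacitance, frequency, and voltage values in the demo are assumptions chosen for the arithmetic, not figures from the talk. Because capacitance and frequency cancel when only the voltage changes, the relative dynamic-power saving reduces to 1 - (V_scaled / V_nominal)^2.

```python
def dynamic_power(capacitance, frequency, voltage):
    """Dynamic power: 1/2 x Capacitance x frequency x Voltage^2."""
    return 0.5 * capacitance * frequency * voltage ** 2


def dynamic_energy(power, time):
    """Dynamic energy: Power_dynamic x Time."""
    return power * time


def relative_power_saving(v_nominal, v_scaled):
    """Fraction of dynamic power saved when only the voltage is lowered.

    Capacitance and frequency cancel out of the ratio, leaving
    1 - (v_scaled / v_nominal)^2.
    """
    return 1.0 - (v_scaled / v_nominal) ** 2


if __name__ == "__main__":
    # Hypothetical values, for illustration only (not from the slides).
    c, f = 1e-9, 2.4e9           # farads, hertz
    v_nom, v_low = 1.0, 0.9      # volts
    p_nom = dynamic_power(c, f, v_nom)
    p_low = dynamic_power(c, f, v_low)
    print(f"nominal: {p_nom:.3f} W, scaled: {p_low:.3f} W")
    print(f"saving:  {relative_power_saving(v_nom, v_low):.1%}")
    print(f"energy over 10 s at scaled voltage: {dynamic_energy(p_low, 10.0):.2f} J")
```

The quadratic dependence on voltage is why even a small reduction below the nominal setting is attractive, provided the run time (and therefore the energy) is not affected.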
3.
In a Nutshell – Beyond Margins
CPU is under-volted (supply voltage under-scaling)
(or CPU over-clocked, or DRAM under-refreshed)
1) Margins? (how low can you safely/unsafely go?)
2) Behavior? (what happens in the danger zone?)
3) Variability? (among cores/chips/workloads?)
4) Faster? (less time to characterize?)
5) Model? (simulation models?)
6) Predict? (correlate to run time stats?)
7) Monitor/Expose? (log/report to sw?)
This work is on ARMv8 CPUs and their voltage scaling
4.
Margins Characterization Landscape
o First study on ARMv8-based micro-server CPU chips
ISA                   | Processor/Chip                | Technology | Reference
----------------------|-------------------------------|------------|----------------------------------------
POWER 7 / 7+          | IBM Power 750, 780            | 45 / 32 nm | IBM (MICRO ’11), UT Austin (MICRO ’15)
IA-64                 | Intel Itanium 9560            | 32 nm      | Ohio State U (ISCA ’13, MICRO ’14)
x86-64                | Intel i7-3970X, i5-4200U      | 32 / 22 nm | University of Athens (IOLTS ’17)
Nvidia Fermi / Kepler | GTX 480, 580, 680, 780        | 40 / 28 nm | IBM, UT Austin (MICRO ’15)
Xilinx FPGAs          | Virtex-7, Zynq7000, Kintex-7  | 28 nm      | BSC/UPC (MICRO ’18)
ARMv8 (8 cores)       | APM X-Gene 2                  | 28 nm      | U Athens (MICRO ’17, ISPASS ’18)
ARMv8 (32 cores)      | APM (Ampere) X-Gene 3         | 16 nm      | U Athens (HPCA ’19)
5.
Ampere’s (Applied Micro’s) X-Gene 2 & X-Gene 3
Parameter       | X-Gene 2                 | X-Gene 3
----------------|--------------------------|--------------------------
ISA             | ARMv8                    | ARMv8
Pipeline        | 64-bit OoO (4-issue)     | 64-bit OoO (4-issue)
CPU             | 8 cores                  | 32 cores
Core clock      | 2.4 GHz                  | 3 GHz
L1I $           | 32KB per core (Parity)   | 32KB per core (Parity)
L1D $           | 32KB per core (Parity)   | 32KB per core (Parity)
L2 $            | 256KB per PMD (SECDED)   | 256KB per PMD (SECDED)
L3 $            | 8MB (SECDED)             | 32MB (SECDED)
Technology      | 28 nm                    | 16 nm
Voltage Domains | PMD & PCP/SoC            | PMD & PCP/SoC
Freq. Domains   | per PMD (pair of cores)  | per PMD (pair of cores)
PMD = Processor module (2 cores), PCP = Processor complex (all cores)
6.
System-Level Voltage Scaling Characterization (MICRO 2017)
• Running many different workloads at different voltage levels
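A characterization campaign of this kind can be pictured as a downward voltage sweep per workload. The sketch below is a minimal illustration under stated assumptions, not the authors' framework: `run_workload` is a hypothetical caller-supplied hook that would set the voltage, run the workload once, and report the outcome; the 10 mV step and the thresholds in the demo are made up. The Safe / Unsafe / Crash outcome names follow the terms used elsewhere in this deck.

```python
from enum import Enum


class Outcome(Enum):
    SAFE = "safe"        # run completed with correct output
    UNSAFE = "unsafe"    # run completed, but with errors / SDCs
    CRASH = "crash"      # application or system crashed


def sweep_voltage(run_workload, v_nominal_mv, v_floor_mv, step_mv=10):
    """Step the voltage down from nominal and record each run's outcome.

    run_workload(voltage_mv) is a hypothetical hook returning an Outcome.
    The sweep stops at the first crash or when the floor is passed.
    """
    results = {}
    voltage = v_nominal_mv
    while voltage >= v_floor_mv:
        outcome = run_workload(voltage)
        results[voltage] = outcome
        if outcome is Outcome.CRASH:
            break
        voltage -= step_mv
    return results


def find_vmin(results):
    """Lowest tested voltage for which it and all higher levels were SAFE."""
    vmin = None
    for voltage in sorted(results, reverse=True):
        if results[voltage] is not Outcome.SAFE:
            break
        vmin = voltage
    return vmin


if __name__ == "__main__":
    # Stand-in for real hardware: thresholds are invented for the demo.
    def fake_run(voltage_mv):
        if voltage_mv >= 900:
            return Outcome.SAFE
        if voltage_mv >= 880:
            return Outcome.UNSAFE
        return Outcome.CRASH

    table = sweep_voltage(fake_run, v_nominal_mv=930, v_floor_mv=850)
    for mv, outcome in table.items():
        print(mv, outcome.value)
    print("Vmin =", find_vmin(table), "mV")
```

On real hardware each voltage level would typically be repeated several times per workload and per core, which is what makes the full campaign time-consuming.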
Core-to-Core & Workload-to-Workload Variation
[Figure: per-core voltage characterization plots. Panels: cactusADM, soplex, bwaves. X-axis: cores 0 to 7, grouped under the labels TTT, TFF, TSS. Y-axis: voltage, 850 to 930 mV. Legend: Crash, Unsafe, Safe, Average Vmin, Average Crash.]
Annotations on the figure: 25mV, 20mV, 25mV
Power savings (max / min): 18.4% / 14.7%, 22.1% / 17.5%, 21.2% / 16.6%
also – variability across different chips, TTT/TFF/TSS (not shown)
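Given per-core Vmin measurements, the two kinds of variation named in this slide's title can be summarized as simple spreads. The sketch below is one possible way to do it; the function names, the generic workload names, and every number in the demo are placeholders invented for illustration and are not read off the slide's plots.

```python
def core_to_core_variation(vmin_mv):
    """Per workload: spread (max - min) of Vmin across cores, in mV.

    vmin_mv maps a workload name to a list of per-core Vmin values.
    """
    return {w: max(per_core) - min(per_core) for w, per_core in vmin_mv.items()}


def workload_to_workload_variation(vmin_mv):
    """Per core: spread (max - min) of Vmin across workloads, in mV."""
    per_workload = list(vmin_mv.values())
    n_cores = len(per_workload[0])
    return [
        max(w[c] for w in per_workload) - min(w[c] for w in per_workload)
        for c in range(n_cores)
    ]


if __name__ == "__main__":
    # Placeholder numbers only (NOT measurements from the talk).
    vmin = {
        "workload_a": [900, 905, 890, 885],
        "workload_b": [910, 915, 905, 890],
    }
    print(core_to_core_variation(vmin))
    print(workload_to_workload_variation(vmin))
```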
9.
Faster Characterization using Micro-viruses (ISPASS 2018)
Voltage regions, from Vnominal downwards:
• Vnominal to Vmin: Safe (nothing abnormal); energy savings
• Vmin to Vcrash: Unsafe (errors/SDCs, but no crashes); uncertain/potential energy savings
• Below Vcrash: Crash (crashes happen); keep out!
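Once Vmin and Vcrash are known for a core and workload, the three regions reduce to a small lookup. The sketch below assumes that Vmin is the lowest voltage still counted as Safe and that Vcrash is the highest voltage at which crashes occur; those boundary conventions, the function name, and the example values are assumptions for illustration.

```python
def classify_voltage(voltage_mv, vmin_mv, vcrash_mv):
    """Map a supply voltage to the Safe / Unsafe / Crash region.

    Assumed conventions: voltages at or above vmin_mv are Safe,
    voltages at or below vcrash_mv are Crash, anything between is Unsafe.
    """
    if vcrash_mv > vmin_mv:
        raise ValueError("Vcrash must not be higher than Vmin")
    if voltage_mv >= vmin_mv:
        return "Safe"
    if voltage_mv > vcrash_mv:
        return "Unsafe"
    return "Crash"


if __name__ == "__main__":
    # Hypothetical thresholds, for illustration only.
    for mv in (930, 900, 890, 870, 850):
        print(mv, classify_voltage(mv, vmin_mv=900, vcrash_mv=870))
```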
How much time is needed…?
[Bar chart: characterization time in days (axis 0 to 36), Single-Thread and Multi-Thread, for SPEC CPU2006 and Micro-Viruses. Bar values: 37.6, 20.7, 1.5, 1.9]
10.
Multicore/Multithread CPUs Voltage Margins (HPCA 2019)
o Single-Core and Multi-Core executions
o Different frequencies
   X-Gene 2: 2.4 GHz, 1.2 GHz, and 0.9 GHz
   X-Gene 3: 3.0 GHz and 1.5 GHz
o Different Thread Scaling Options
   Max Threads (32T in X-Gene 3, 8T in X-Gene 2)
   Half Threads (16T in X-Gene 3, 4T in X-Gene 2)
   Quarter Threads (8T in X-Gene 3, 2T in X-Gene 2)
o Thread Allocation Policies (threads < max)
   Spreaded
   Clustered
[Figure: Spreaded and Clustered placement diagrams, each with units labeled 0 1 2 3]
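The two allocation policies can be sketched as a mapping from thread count to core IDs. The sketch rests on assumptions that the slide does not spell out: that "clustered" packs threads onto as few PMDs as possible, that "spreaded" places them on as many PMDs as possible, and that cores are numbered so PMD p holds cores 2p and 2p+1. The function name and numbering are hypothetical.

```python
def allocate_threads(num_threads, num_pmds, policy, cores_per_pmd=2):
    """Return the core IDs that the threads would be pinned to.

    Assumed numbering: PMD p holds cores p*cores_per_pmd up to
    (p+1)*cores_per_pmd - 1.
    """
    total_cores = num_pmds * cores_per_pmd
    if not 0 < num_threads <= total_cores:
        raise ValueError("num_threads must be between 1 and the core count")

    if policy == "clustered":
        # Fill both cores of a PMD before moving on to the next PMD.
        return list(range(num_threads))

    if policy == "spreaded":
        # Round-robin over PMDs: first core of every PMD, then the second.
        order = [
            pmd * cores_per_pmd + slot
            for slot in range(cores_per_pmd)
            for pmd in range(num_pmds)
        ]
        return order[:num_threads]

    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    # X-Gene 2 has 8 cores in 4 PMDs; half threads = 4T.
    print("clustered:", allocate_threads(4, 4, "clustered"))
    print("spreaded: ", allocate_threads(4, 4, "spreaded"))
```

Under these assumptions, four clustered threads occupy two PMDs while four spreaded threads occupy all four. This matters because frequency is set per PMD on these chips, so the placement decides how many PMDs are active.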