The document describes a new process for measuring and modeling energy efficient performance of microprocessors. It involves synchronously measuring power and performance, calculating energy from power measurements, and graphing energy versus performance on a single plot. This allows optimization of performance and energy efficiency by identifying configurations with high energy/low performance or low energy/high performance. The process has been applied at Intel to quantify generation-to-generation improvements in energy efficiency, tune processors for best energy efficient performance, and optimize future designs. Challenges include automation, timing of measurements, and cross-generational comparisons.
Measure Quantify Model Energy Efficient Performance
1. New Process to Measure, Quantify, and
Model Energy Efficient Performance
13th International Workshop on
Microprocessor Test and Verification
December 10, 2012
Markus Mattwandel: Intel USA
2. Purpose
• Demonstrate the impetus for, and define the methods of, a new process to validate Energy Efficient Performance
• Show how the process is being applied to validate “large core” client processors
3. Agenda
• Definition and Impetus
• 3 Steps
• Application
• Challenges
• Conclusions
• Q&A
5. Energy Efficient Performance: Optimal Balance of Highest Performance vs. Lowest Possible Power
Both Power and Perf must be measured to establish best efficiency.
Performance                  Power
Snappy Response              Holds charge >= 1 day
Highest Quality Settings     Thinnest Design
Computationally Superior     Longest Battery Life
Top Benchmark Scores         Certified as Green
Maximum Throughput           High PUE Datacenters
7. Traditional Power and Perf Measurements
[Figure: two separate configuration sheets, "Power Configuration Settings" and "Performance Configuration Settings". Each is grouped into Environment, Core, Uncore, GMCH, and Board, and lists the same parameters.]
Parameters: Package Stepping, Core-Stepping, Core Freq (MHz), ucode Patch, #cores, SMT, LLC Size, MLC Size, unCore Freq, Mem Model, Mem Type, Mem Speed, Mem Timings, Mem Channels, Mem Visible to OS, Total Mem Size, Ironlake Stepping, Bearlake-X Stepping, Tylersburg Stepping, Board-Revision, PCH Stepping, IOH, ICH, Descriptor, BIOS, Graphics Card, Graphics Driver, DirectX, INF Driver, BIOS Settings, Perf recipe, ITP settings
8. Step #1: Synch Power and Perf Measurements
Exact Synch is only guaranteed via simultaneous measurements.
[Figure: a single "Power/Performance Configuration Settings" sheet, grouped into Environment, Core, Uncore, GMCH, and Board, listing the same parameters as the traditional sheets (Package Stepping through ITP settings)]
9. Power Measurement Tiers
Power measurement granularity can be tailored.
• Power rail measurements via interposer board
• Component measurements via test points
• On-chip measurements via counters
[Figure labels: VIP Board, Blue Wires, DAL]
10. Step #2: Use the Correct Metric
Use Energy for time-based workloads.
• Power Signature of a time based benchmark:
[Figure: power trace, Watts (0.00 to 75.00) vs. Sample# (5100 to 63907)]
Case               Non-Optimized    Optimized
Average Power      62.97 Watts      78.33 Watts
#Active Samples    46280            40109
Active Energy      2277 Joules      2125 Joules
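Energy is the time integral of power, so a run that draws more average power can still consume less energy if it finishes sooner, which is the pattern in the Non-Optimized vs. Optimized comparison. A minimal sketch of that calculation, assuming uniformly spaced power samples; the function name `energy_joules`, the sample interval `dt_s`, and the trace values are illustrative and do not come from the original slides:

```python
def energy_joules(power_watts, dt_s):
    """Approximate energy as the sum of power samples times the sample interval."""
    return sum(power_watts) * dt_s


# Hypothetical traces sampled every 0.5 s (values made up for illustration).
slow_run = [60.0] * 10  # lower average power, but runs longer
fast_run = [80.0] * 7   # higher average power, but finishes sooner

slow_energy = energy_joules(slow_run, 0.5)
fast_energy = energy_joules(fast_run, 0.5)

print(slow_energy, fast_energy)  # the higher-power run uses less energy here
```

Comparing average power alone would rank `fast_run` as worse; comparing energy ranks it as better, which is why energy is the metric of choice for time-based workloads.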
11. Calculating Energy via Active Region Detection
Power data can be processed via Active Region Detection.
[Figure: All System Rails TOTAL Watts (0.00 to 200.00) vs. Sample# (1 to 61001), from C:Python26EdgeDetect001_3dm11_gt1_ti0_wi0_nidaq_results.csv, annotated with Threshold, Active Zone 1, Active Zone 2, and Duration]
Active Region = 1st Active Zone through Last Active Zone
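The idea of an active region can be sketched as a simple threshold scan: samples above the threshold form active zones, and the active region spans from the start of the first zone to the end of the last. This is only an illustrative sketch; the function names, the fixed threshold, and the uniform sample interval `dt_s` are assumptions, and the edge-detection script used for the slides may work differently.

```python
def find_active_zones(power_watts, threshold):
    """Return (start, end) index pairs (end exclusive) for runs above threshold."""
    zones = []
    start = None
    for i, p in enumerate(power_watts):
        if p > threshold and start is None:
            start = i                      # rising edge: a zone begins
        elif p <= threshold and start is not None:
            zones.append((start, i))       # falling edge: the zone ends
            start = None
    if start is not None:                  # trace ended while still active
        zones.append((start, len(power_watts)))
    return zones


def active_region(power_watts, threshold):
    """Span from the first active zone through the last active zone."""
    zones = find_active_zones(power_watts, threshold)
    if not zones:
        return None
    return zones[0][0], zones[-1][1]


def active_energy(power_watts, threshold, dt_s):
    """Energy over the active region, including dips between zones."""
    region = active_region(power_watts, threshold)
    if region is None:
        return 0.0
    start, end = region
    return sum(power_watts[start:end]) * dt_s


# Hypothetical trace with two active zones separated by a short dip.
trace = [5, 5, 50, 60, 55, 10, 8, 70, 65, 5, 5]
zones = find_active_zones(trace, threshold=20)
region = active_region(trace, threshold=20)
energy = active_energy(trace, threshold=20, dt_s=1.0)
```

The number of samples in the region times the sample interval gives the Duration, and idle samples before the first zone and after the last zone are excluded from the energy total.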
12. Step #3: Graph Energy and Perf on one Plot
[Figure: Performance axis running from Low Performance to High Performance]
13. Step #3: Graph Energy and Perf on one Plot
[Figure: Energy axis, running from Low Energy to High Energy, added to the Performance axis]
14. Step #3: Graph Energy and Perf on one Plot
[Figure: Energy (Low Energy to High Energy) vs. Performance (Low Performance to High Performance), divided into four quadrants: High Energy / Low Perf, High Energy / High Perf, Low Energy / Low Perf, Low Energy / High Perf]
16. Step #3: Graph Energy and Perf on one Plot
[Figure: quadrant plot. Vertical axis: Total Power Ratio, 0.00 to 2.00 (SMALLER is better), Low Energy to High Energy. Horizontal axis: Performance Ratio, 0.00 to 2.00 (LARGER is better), Low Performance to High Performance. Quadrants: High Energy / Low Perf, High Energy / High Perf, Low Energy / Low Perf, Low Energy / High Perf]
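The quadrant labels can be assigned programmatically once each measurement is expressed as a ratio against a reference. A minimal sketch, assuming the quadrant boundaries sit at a ratio of 1.0 (parity with the reference); the function name `classify` and the `baseline` parameter are illustrative, and a point exactly on a boundary is counted on the "Low" side here:

```python
def classify(perf_ratio, energy_ratio, baseline=1.0):
    """Label a point with its quadrant on the energy vs. performance plot."""
    perf = "High Perf" if perf_ratio > baseline else "Low Perf"
    energy = "High Energy" if energy_ratio > baseline else "Low Energy"
    return f"{energy} / {perf}"


# Hypothetical points, expressed as (performance ratio, energy ratio).
best = classify(1.20, 0.90)   # faster and less energy than the reference
worst = classify(0.80, 1.10)  # slower and more energy than the reference
```

"Low Energy / High Perf" is the target quadrant; "High Energy / Low Perf" is the one that warrants investigation.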
17. Step #3: Graph Energy and Perf on one Plot
[Figure: one point plotted on axes Total Power Ratio, 0.00 to 2.00 (SMALLER is better) vs. Performance Ratio, 0.00 to 2.00 (LARGER is better)]
Performance ratio = UUT Perf Score / Ref Perf Score = 4979 / 4393 = 1.13
Energy ratio = UUT Energy / Ref Energy = 2116 J / 2266 J = 0.93
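The plotted coordinates are plain ratios of the unit under test (UUT) to the reference. A short sketch of the arithmetic using the scores and energies shown on this slide; the helper name `ratio` is illustrative:

```python
def ratio(uut, ref):
    """Ratio of the unit under test to the reference measurement."""
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return uut / ref


perf_ratio = ratio(4979, 4393)    # performance scores: larger is better
energy_ratio = ratio(2116, 2266)  # energy in joules: smaller is better

print(f"{perf_ratio:.2f} {energy_ratio:.2f}")  # prints "1.13 0.93"
```

A performance ratio above 1.0 combined with an energy ratio below 1.0 places the UUT in the Low Energy / High Perf quadrant.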
19. Traditional Power and Perf Results
• Identify Low Performance or High Power Cases on LHS
• Do not show Power v Performance tradeoffs
[Figure: Performance S-Curve, "Performance: Processor X v Processor Y". Vertical axis: Performance Y/X, 0.00 to 2.00 (<1.0 is slower, >1.0 is faster). Horizontal axis: Performance of Processors Y/X across all Benchmark Components (sorted Slowest to Fastest)]
[Figure: Power S-Curve, "Power: Processor X v Processor Y". Vertical axis: Power Y/X, 0.00 to 2.00 (<1.0 is lower, >1.0 is higher). Horizontal axis: Power of Processors Y/X across all Benchmark Components (sorted Highest to Lowest)]
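An S-curve is a sorted list of per-component ratios, which is why the worst cases collect on the left-hand side. The sketch below builds both curves from hypothetical data (the component names and values are made up) and shows that each curve is sorted independently, so a component's position on one curve says nothing about its position on the other:

```python
def s_curve(ratios_by_component, descending=False):
    """Sort (component, ratio) pairs to form an S-curve."""
    return sorted(ratios_by_component.items(),
                  key=lambda item: item[1],
                  reverse=descending)


# Hypothetical Y/X ratios per benchmark component.
perf = {"comp_a": 1.10, "comp_b": 0.85, "comp_c": 1.40}
power = {"comp_a": 0.95, "comp_b": 1.20, "comp_c": 1.05}

perf_curve = s_curve(perf)                     # slowest to fastest
power_curve = s_curve(power, descending=True)  # highest to lowest power

perf_order = [name for name, _ in perf_curve]
power_order = [name for name, _ in power_curve]
```

Because the two orderings differ, matching a point on the performance curve to its power cost requires going back to the underlying data, which is the gap the combined energy vs. performance plot closes.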
20. Application
Plotting Perf and Energy simultaneously optimizes Energy Efficiency.
[Figure: plots of Total Energy Ratio, 0.9500 to 1.0500 (SMALLER is better) vs. Performance Ratio, 0.9500 to 1.0500 (LARGER is better), labeled Generation to Generation, Tuning for Lowest Power, and Tuning for Best Performance]
Plot annotations:
• Lower power, best perf
• Highest perf, <1:1 power impact
• Lowest power, <1:1 perf impact
• Higher perf, best power
• 38% lower performance while drawing 18% more energy -> Debug
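The debug callout can be expressed as a simple screening rule: convert the percent changes to ratios and flag any case that is both slower and more energy-hungry than the reference. This is a hedged sketch; the function names and the parity threshold of 1.0 are assumptions, not the screening criteria actually used at Intel.

```python
def ratio_from_percent_change(percent):
    """Convert a percent change vs. the reference into a ratio (e.g. -38 -> 0.62)."""
    return 1.0 + percent / 100.0


def needs_debug(perf_ratio, energy_ratio):
    """Flag cases that lose performance while costing more energy."""
    return perf_ratio < 1.0 and energy_ratio > 1.0


# The annotated case: 38% lower performance, 18% more energy.
perf_ratio = ratio_from_percent_change(-38)
energy_ratio = ratio_from_percent_change(18)
flagged = needs_debug(perf_ratio, energy_ratio)
```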
22. Challenges
• Automation is a must
• Timing is important
• New contacts need to be established
• Cross generational comparisons are difficult
24. Conclusions
• Constrained power budgets and demand for peak performance dictate that energy efficient performance be maximized across all computational segments.
• The process to simultaneously collect power and
performance data, convert power to energy, then model
performance v energy on one plot can be used to
demonstrate energy v performance trade-offs.
• The process shown has been deployed at Intel to:
• Quantify generation to generation Energy Efficiency
• Tune for best Energy Efficient Performance
• Optimize future designs