New Process to Measure, Quantify, and
Model Energy Efficient Performance
13th International Workshop on
Microprocessor Test and Verification
December 10, 2012
Markus Mattwandel: Intel USA
13th International Workshop on Microprocessor Test and Verification
Purpose
• Demonstrate the impetus and define methods of a new
process to validate Energy Efficient Performance
• Show how the process is being applied to validate “large
core” client processors
Agenda
• Definition and Impetus
• 3 Steps
• Application
• Challenges
• Conclusions
• Q&A
Energy Efficient Performance
Both Power and Perf must be measured to establish best efficiency.
Energy Efficient Performance: Optimal Balance of Highest Performance vs. Lowest Possible Power
Performance | Power
Snappy Response | Holds charge >= 1 day
Highest Quality Settings | Thinnest Design
Computationally Superior | Longest Battery Life
Top Benchmark Scores | Certified as Green
Maximum Throughput | High PUE Datacenters
Traditional Power and Perf Measurements
[Figure: two separate configuration records, "Power Configuration" and "Performance Configuration", each listing the same parameters under the groups Settings, Environment, Core, Uncore, GMCH, and Board.]
• Parameters recorded in each: Package Stepping, Core-Stepping, Core Freq (MHz), ucode Patch, #cores, SMT, LLC Size, MLC Size, unCore Freq, Mem Model, Mem Type, Mem Speed, Mem Timings, Mem Channels, Mem Visible to OS, Total Mem Size, Ironlake Stepping, Bearlake-X Stepping, Tylersburg Stepping, Board-Revision, PCH Stepping, IOH, ICH, Descriptor, BIOS, Graphics Card, Graphics Driver, DirectX, INF Driver, BIOS Settings, Perf recipe, ITP settings
Step #1: Synch Power and Perf Measurements
Exact Synch only guaranteed via simultaneous measurements.
[Figure: a single combined "Power/Performance Configuration" record listing the same parameters (Package Stepping through ITP settings) under the groups Settings, Environment, Core, Uncore, GMCH, and Board.]
Power Measurement Tiers
Power measurement granularity can be tailored.
• Power rail measurements via interposer board (VIP Board)
• Component measurements via test points (Blue Wires)
• On-chip measurements via counters (DAL)
Step #2: Use the Correct Metric
Use Energy for time-based workloads.
• Power Signature of a time based benchmark:
[Figure: power trace, Watts (0.00 to 75.00) vs. Sample# (5100 to 63907).]
Case | Non Optimized | Optimized
Average Power | 62.97 Watts | 78.33 Watts
#Active Samples | 46280 | 40109
Active Energy | 2277 Joules | 2125 Joules
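The reason for preferring energy over average power on a time-based workload can be sketched as follows. This is a minimal illustration, assuming a power trace sampled at a fixed interval; the function names, the 1 ms sample period, and the trace values are hypothetical and are not taken from the slides.

```python
def energy_joules(power_watts, sample_period_s):
    """Integrate a uniformly sampled power trace (rectangle rule): E = sum(P_i) * dt."""
    return sum(power_watts) * sample_period_s


def average_power_watts(power_watts):
    """Mean of the power samples; on its own it ignores how long the run took."""
    return sum(power_watts) / len(power_watts)


# Hypothetical traces: the second run draws more power but finishes sooner.
slow_run = [60.0] * 1000   # 60 W for 1000 samples
fast_run = [75.0] * 700    # 75 W for 700 samples
dt = 0.001                 # assumed 1 ms between samples

slow_energy = energy_joules(slow_run, dt)
fast_energy = energy_joules(fast_run, dt)

print(f"slow: {average_power_watts(slow_run):.1f} W avg, {slow_energy:.1f} J")
print(f"fast: {average_power_watts(fast_run):.1f} W avg, {fast_energy:.1f} J")
```

In this made-up case the faster run has the higher average power yet the lower energy, which is the same pattern the table shows for the Optimized case.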
Calculating Energy via Active Region Detection
Power data can be processed via Active Region Detection.
[Figure: "All System Rails TOTAL Watts", Watts (0.00 to 200.00) vs. Sample# (1 to 61001), from C:Python26EdgeDetect001_3dm11_gt1_ti0_wi0_nidaq_results.csv. Annotations: Threshold, Active Zone 1, Active Zone 2, Duration.]
• Active Region = 1st Active Zone through Last Active Zone
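One way to implement threshold-based Active Region Detection is sketched below. It is not the author's tool: the function names, the fixed threshold, and the sample trace are assumptions made for illustration. Following the definition above, the active region runs from the start of the first active zone to the end of the last one, so idle dips between zones are included.

```python
def find_active_zones(power_watts, threshold_watts):
    """Return (start, end) index pairs, end exclusive, for runs above the threshold."""
    zones = []
    start = None
    for i, p in enumerate(power_watts):
        if p > threshold_watts and start is None:
            start = i                      # rising edge: a zone begins
        elif p <= threshold_watts and start is not None:
            zones.append((start, i))       # falling edge: the zone ends
            start = None
    if start is not None:                  # trace ended while still active
        zones.append((start, len(power_watts)))
    return zones


def active_region(power_watts, threshold_watts):
    """Span from the first active zone's start to the last active zone's end."""
    zones = find_active_zones(power_watts, threshold_watts)
    if not zones:
        return None
    return zones[0][0], zones[-1][1]


def active_region_energy_joules(power_watts, threshold_watts, sample_period_s):
    """Energy over the active region, assuming a fixed sample period."""
    region = active_region(power_watts, threshold_watts)
    if region is None:
        return 0.0
    start, end = region
    return sum(power_watts[start:end]) * sample_period_s


# Hypothetical trace in Watts: idle, zone 1, short dip, zone 2, idle.
trace = [10, 10, 80, 85, 12, 90, 88, 10, 10]
zones = find_active_zones(trace, 50)
region = active_region(trace, 50)
energy = active_region_energy_joules(trace, 50, 0.5)
print(zones, region, energy)
```

A real trace is noisy, so a production version would likely need smoothing or a minimum zone length before edge detection; that is left out here.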
Step #3: Graph Energy and Perf on one Plot
[Figure: quadrant plot. X axis: Performance Ratio, 0.00 to 2.00, from Low Performance to High Performance (LARGER is better). Y axis: Total Power Ratio, 0.00 to 2.00, from Low Energy to High Energy (SMALLER is better).]
• Quadrants:
  • High Energy / Low Perf
  • Low Energy / Low Perf
  • High Energy / High Perf
  • Low Energy / High Perf
Example data point:
• Performance Ratio = UUT Perf Score / Ref Perf Score = 4979 / 4393 = 1.13 (LARGER is better)
• Energy Ratio = UUT Energy / Ref Energy = 2116 J / 2266 J = 0.93 (SMALLER is better)
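The arithmetic behind that data point can be reproduced with a short sketch. The scores and energies are the ones on the slide; the function names and the choice to split quadrants at a ratio of 1.0 (parity with the reference) are assumptions for illustration.

```python
def performance_ratio(uut_score, ref_score):
    """Unit-under-test score relative to the reference (larger is better)."""
    return uut_score / ref_score


def energy_ratio(uut_energy_j, ref_energy_j):
    """Unit-under-test energy relative to the reference (smaller is better)."""
    return uut_energy_j / ref_energy_j


def quadrant(perf_ratio, e_ratio):
    """Label the quadrant, assuming the axes cross at parity with the reference (1.0)."""
    energy_label = "Low Energy" if e_ratio < 1.0 else "High Energy"
    perf_label = "High Perf" if perf_ratio > 1.0 else "Low Perf"
    return f"{energy_label} / {perf_label}"


perf = performance_ratio(4979, 4393)
energy = energy_ratio(2116, 2266)
point = (round(perf, 2), round(energy, 2))
label = quadrant(perf, energy)
print(point, label)
```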
Traditional Power and Perf Results
• Identify Low Performance or High Power Cases on LHS
• Do not show Power v Performance tradeoffs
[Figure, Performance S-Curve: "Performance: Processor X v Processor Y". Y axis: Performance Y/X, 0.00 to 2.00 (<1.0 is slower, >1.0 is faster). X axis: Performance of Processors Y/X across all Benchmark Components (sorted Slowest to Fastest).]
[Figure, Power S-Curve: "Power: Processor X v Processor Y". Y axis: Power Y/X, 0.00 to 2.00 (<1.0 is lower, >1.0 is higher). X axis: Power of Processors Y/X across all Benchmark Components (sorted Highest to Lowest).]
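A rough sketch of how such S-curves can be built, and why they hide the trade-off, is given below. The benchmark component names and all values are invented for illustration, and `s_curve` is a hypothetical helper, not a tool named in the slides.

```python
def s_curve(y_values, x_values, descending=False):
    """Per-component Y/X ratios as (name, ratio) pairs, sorted by ratio."""
    ratios = {name: y_values[name] / x_values[name] for name in x_values}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=descending)


# Hypothetical per-component results for processors X (reference) and Y.
perf_x = {"a": 100.0, "b": 200.0, "c": 50.0}
perf_y = {"a": 110.0, "b": 180.0, "c": 50.0}
power_x = {"a": 40.0, "b": 50.0, "c": 30.0}
power_y = {"a": 50.0, "b": 45.0, "c": 30.0}

# Performance sorted slowest to fastest; power sorted highest to lowest,
# so the problem cases land on the left-hand side of each curve.
perf_curve = s_curve(perf_y, perf_x)
power_curve = s_curve(power_y, power_x, descending=True)

perf_order = [name for name, _ in perf_curve]
power_order = [name for name, _ in power_curve]
print(perf_order, power_order)
```

Because each curve is sorted independently, a given component sits at a different position on each one, so the two plots cannot be read together as a power versus performance trade-off.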
Application
Plotting Perf and Energy simultaneously optimizes Energy Efficiency.
[Figure: Energy vs. Performance plots titled "Generation to Generation", "Tuning for Lowest Power", and "Tuning for Best Performance". X axis: Performance Ratio, 0.9500 to 1.0500 (LARGER is better). Y axis: Total Energy Ratio, 0.9500 to 1.0500 (SMALLER is better).]
• Plot annotations:
  • Lower power, best perf
  • Highest perf, < 1:1 power impact
  • Lowest power, < 1:1 perf impact
  • Higher perf, best power
  • 38% lower performance while drawing 18% more energy -> Debug
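The plot annotations suggest a simple screening rule, sketched here under stated assumptions. The "< 1:1" labels are read as "the gain on one axis outweighs the cost on the other"; that reading, the function name `classify`, and the returned labels are all illustrative and are not defined in the slides.

```python
def classify(perf_ratio, energy_ratio):
    """Screen one (performance ratio, energy ratio) point against parity (1.0, 1.0)."""
    perf_gain = perf_ratio - 1.0       # positive means faster than the reference
    energy_cost = energy_ratio - 1.0   # positive means more energy than the reference
    if perf_gain >= 0 and energy_cost <= 0:
        return "better on both axes"
    if perf_gain < 0 and energy_cost > 0:
        return "debug"                 # slower and hungrier: worse on both axes
    if perf_gain >= 0:
        # Faster but costs energy: favorable if the gain exceeds the cost.
        return "favorable trade-off" if perf_gain > energy_cost else "unfavorable trade-off"
    # Slower but saves energy: favorable if the saving exceeds the loss.
    return "favorable trade-off" if -energy_cost > -perf_gain else "unfavorable trade-off"


# The outlier called out on the slide: 38% lower performance, 18% more energy.
outlier = classify(0.62, 1.18)
print(outlier)
```

For example, a hypothetical point at (1.04, 1.01) would be a favorable trade-off under this rule, while (1.01, 1.04) would not.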
Challenges
• Automation is a must
• Timing is important
• New contacts need to be established
• Cross generational comparisons are difficult
Conclusions
• Constrained power budgets and demand for peak
performance dictate that energy efficient performance be
maximized across all computational segments.
• The process to simultaneously collect power and
performance data, convert power to energy, then model
performance v energy on one plot can be used to
demonstrate energy v performance trade-offs.
• The process shown has been deployed at Intel to:
  • Quantify generation to generation Energy Efficiency
  • Tune for best Energy Efficient Performance
  • Optimize future designs
Q&A