Exploring performance and energy consumption differences between recent Intel processors
Unai Lopez-Novoa
Data Innovation Research Institute
Cardiff University
Motivation
Challenge in High Performance Computing: increasing power consumption of modern supercomputers
Top500 list (June 2019):
Rank  Name                       Performance   Power consumption
#1    Summit (USA)               148.6 PFlops  10.1 MW
#2    Sierra (USA)               94.6 PFlops   7.4 MW
#3    Sunway TaihuLight (China)  93.0 PFlops   15.3 MW
Source: www.top500.org
Motivation
Some agencies are setting energy consumption objectives in HPC, e.g. US Dept. of Energy by 2022 [1]:
◦ 1 ExaFlops at 20-40 MW
◦ 25-50 GigaFlops/W
Proposal: analyse the behaviour of common parallel codes on recent compute devices
◦ This work focuses on recent Intel CPUs
Goal: obtain a yardstick to guide performance- or energy-oriented optimisation
[1] Z.-N. Chen, J. Dongarra, Z.-W. Xu, “Post-exascale supercomputing: research opportunities abound”, Frontiers of Information Technology & Electronic Engineering, 19(10), Oct 2018.
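As a rough check of how far current systems are from these objectives, the energy efficiency of the June 2019 Top500 systems quoted in the Motivation slide can be derived from their performance and power figures. The sketch below is illustrative arithmetic only; the helper name `gflops_per_watt` is hypothetical and not part of the slides.

```python
# Illustrative arithmetic only. Figures are the Top500 (June 2019) values
# quoted in the Motivation slide; the helper name is hypothetical.

def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Energy efficiency in GFlops/W, given performance in PFlops and power in MW."""
    gflops = pflops * 1e6      # 1 PFlops = 1e6 GFlops
    watts = megawatts * 1e6    # 1 MW = 1e6 W
    return gflops / watts

systems = {
    "Summit": (148.6, 10.1),
    "Sierra": (94.6, 7.4),
    "Sunway TaihuLight": (93.0, 15.3),
}

for name, (pflops, mw) in systems.items():
    print(f"{name}: {gflops_per_watt(pflops, mw):.1f} GFlops/W")

# The exascale objective: 1 ExaFlops (1000 PFlops) within 20-40 MW
low = gflops_per_watt(1000.0, 40.0)
high = gflops_per_watt(1000.0, 20.0)
print(f"Objective: {low:.0f}-{high:.0f} GFlops/W")
```

With these figures the three systems land at roughly 6 to 15 GFlops/W, below the 25-50 GFlops/W objective.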
Intel RAPL
RAPL (Running Average Power Limit): feature introduced by Intel with Sandy Bridge (~2011) to monitor energy consumption and cap power consumption.
Domains:
◦ Package: whole socket
◦ Power plane 0: cores within a package
◦ Power plane 1: uncore elements within a package
◦ DRAM: memory attached to socket
In this work: Package & DRAM
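To make the idea of an energy counter concrete, the following is a minimal sketch of turning two RAPL readings into energy and average power. It assumes the Linux powercap sysfs interface, where a domain directory such as `/sys/class/powercap/intel-rapl:0` exposes a cumulative `energy_uj` counter in microjoules and its wrap-around limit in `max_energy_range_uj`. The slides do not say which interface was used for this work, and on many systems these files are not readable without elevated privileges, so the path and the function names are assumptions.

```python
from pathlib import Path

# Assumed location of one RAPL domain in the Linux powercap sysfs tree.
# The path may differ, or be unreadable, on a given system.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_energy_uj(domain: Path = RAPL_DOMAIN) -> int:
    """Read the cumulative energy counter (microjoules) of a RAPL domain."""
    return int((domain / "energy_uj").read_text())


def energy_delta_joules(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Energy consumed between two readings, allowing for one counter wrap."""
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped around between the two readings.
        delta += max_range_uj
    return delta / 1e6  # microjoules -> joules


def average_power_watts(joules: float, seconds: float) -> float:
    """Average power over the interval: P = E / t."""
    return joules / seconds
```

The wrap-around check matters for long runs: the counter is cumulative and restarts from zero once it reaches its maximum range, so a naive subtraction can yield a negative energy.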
RAPL-logger
Issue: there are many ways to use RAPL, but they require either:
a) Modifying the source code of the target app (PAPI library)
b) Administrative privileges (perf command)
RAPL-logger: a tool to monitor the power consumption of an app on an Intel-based system from user space
Generates a report of the energy (J) and power (W) consumption
Usage: ./rapl_logger <your-app> <params-of-your-app>
GitHub: https://github.com/ulopeznovoa/RAPL-logger
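The general pattern of wrapping an unmodified application and reporting its energy (J) and average power (W) can be sketched as follows. This is not RAPL-logger's implementation, which the slides do not describe; the function `measure` and its `read_joules` parameter are hypothetical, and the energy source is passed in as a callable so that the sketch runs without access to RAPL. Counter wrap-around is ignored here for brevity.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def measure(cmd: Sequence[str], read_joules: Callable[[], float]) -> dict:
    """Run cmd to completion and report energy (J), time (s) and average power (W).

    read_joules is any callable returning a cumulative energy reading in
    joules; it stands in for a real RAPL reader.
    """
    energy_start = read_joules()
    time_start = time.perf_counter()
    returncode = subprocess.run(list(cmd)).returncode
    elapsed = time.perf_counter() - time_start
    energy = read_joules() - energy_start
    return {
        "returncode": returncode,
        "seconds": elapsed,
        "joules": energy,
        "watts": energy / elapsed,
    }


# Fake counter that advances by 5 J between readings, so the sketch runs anywhere.
fake_readings = iter([100.0, 105.0])
report = measure([sys.executable, "-c", "pass"], lambda: next(fake_readings))
print(report)
```

Taking the readings around the child process, rather than inside it, is what lets the target application stay unmodified.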
Experimental setup
3 hardware systems:
ID  CPU                  Family        # Cores  Clock (GHz)  DRAM
SB  2 x Xeon E5-2620     Sandy Bridge  6        2.00         32 GB – DDR3
BW  2 x Xeon E5-2695 v4  Broadwell     18       2.10         126 GB – DDR4
SL  2 x Xeon Gold 6148   Skylake       20       2.40         377 GB – DDR4
5 benchmarks:
◦ STREAM
◦ NPB: Conjugate Gradient (CG)
◦ NPB: Lower-Upper decomposition (LU)
◦ Rodinia: LavaMD
◦ Rodinia: Streamcluster
Everything was compiled with GCC 7.3 and the -O3 flag
Results: performance comparison
Results: power consumption
Results
Idle power consumption (Watts), measured running sleep for 1 minute:
           SB    BW    SL
Package 0  9.7   19.2  51.6
Package 1  9.4   16.4  52.9
DRAM 0     2.9   1.2   10.1
DRAM 1     4.0   1.2   9.8
Raw computing power:
           SB    BW     SL
GFLOP/s    67.7  753.7  2000.3
DRAM GB/s  47.6  95.3   152.2
Measured:
- GFLOP/s: FIRESTARTER benchmark
- DRAM GB/s: STREAM Triad
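The idle power and raw computing power figures of the Results slide can be combined into per-system totals and ratios with a few lines of arithmetic. The sketch below only reuses the reported numbers; the dictionary layout and the function names are illustrative, and the "total" covers only the four RAPL domains reported, not the whole node.

```python
# Values copied from the idle power and raw computing power figures
# on the Results slide.
idle_watts = {
    "SB": {"Package 0": 9.7, "Package 1": 9.4, "DRAM 0": 2.9, "DRAM 1": 4.0},
    "BW": {"Package 0": 19.2, "Package 1": 16.4, "DRAM 0": 1.2, "DRAM 1": 1.2},
    "SL": {"Package 0": 51.6, "Package 1": 52.9, "DRAM 0": 10.1, "DRAM 1": 9.8},
}
peak_gflops = {"SB": 67.7, "BW": 753.7, "SL": 2000.3}


def total_idle_watts(system: str) -> float:
    """Sum of the four reported RAPL domains for one system, in watts."""
    return round(sum(idle_watts[system].values()), 1)


def gflops_ratio(system: str, baseline: str = "SB") -> float:
    """Raw GFLOP/s of a system relative to the baseline system."""
    return round(peak_gflops[system] / peak_gflops[baseline], 1)


for system in idle_watts:
    print(f"{system}: {total_idle_watts(system)} W idle, "
          f"{gflops_ratio(system)}x the GFLOP/s of SB")
```

Summed over the reported domains, the systems idle at about 26 W (SB), 38 W (BW) and 124 W (SL).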
Conclusions
On average:
◦ Codes run 2.1x faster with 1.1x less energy on BW than on SB
◦ Codes run 3.7x faster with 1.3x less energy on SL than on SB
RAPL-logger was used to collect measurements
◦ https://github.com/ulopeznovoa/RAPL-logger
Future work:
◦ Extend this study with more benchmarks and CPUs (ARM, IBM, …)
◦ Use this study to develop energy-aware performance models
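Because energy is average power multiplied by run time, the speedup and energy figures above also imply a change in average power draw. The sketch below works this out under the assumption that "N x less energy" means the energy is divided by N; the function name is hypothetical. The reported values are averages across benchmarks, so the result is indicative only.

```python
def relative_power(speedup: float, energy_reduction: float) -> float:
    """Average power relative to the baseline system.

    E = P * t, so
    P_new / P_old = (E_new / E_old) / (t_new / t_old)
                  = (1 / energy_reduction) / (1 / speedup)
                  = speedup / energy_reduction

    Assumes "N x less energy" means E_new = E_old / N.
    """
    return speedup / energy_reduction


print(f"BW vs SB: {relative_power(2.1, 1.1):.2f}x average power")
print(f"SL vs SB: {relative_power(3.7, 1.3):.2f}x average power")
```

In other words, the newer systems finish sooner and use somewhat less energy overall, but they draw more power while running.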
Unai Lopez-Novoa
LopezU@cardiff.ac.uk
