The impact of software on energy use – and what can we do about it
Energy in HPC
The world’s top 500 supercomputers cost 400M€ annually in energy alone
If software reduces its energy footprint … payback could be enormous
Solution
Enable developers and users to improve application energy consumption
Two Key Questions
• Can developers optimize code for energy?
• Can owners and users tune applications for energy?
What is energy?
Approximations for Energy
Heuristics
• Floating point, vector operations, memory access
• L1 or L2 misses vs main memory: orders of magnitude difference in energy
Low level measurement
• Real data from some processor, memory subsystems, accelerators
• Available in kernel - Intel RAPL
Server level monitoring
• PDU and server level readings
• Real data – real energy
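As an illustration of the low level measurement option, here is a minimal sketch of reading the RAPL package counter through the Linux powercap sysfs files. The domain path `intel-rapl:0`, the helper names and the `measure` wrapper are assumptions made for this example, not part of the slides; on many systems reading `energy_uj` needs elevated privileges.

```python
from pathlib import Path
import time

# Assumed location of the package-0 RAPL domain in the Linux powercap sysfs tree.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(domain: Path, name: str) -> int:
    """Read one integer-valued sysfs file, e.g. energy_uj or max_energy_range_uj."""
    return int((domain / name).read_text().strip())


def energy_delta_uj(start_uj: int, end_uj: int, max_range_uj: int) -> int:
    """Microjoules used between two readings, allowing for a single counter wraparound."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    # The counter wrapped past max_range_uj and restarted from zero.
    return end_uj + (max_range_uj - start_uj)


def measure(workload, domain: Path = RAPL_DOMAIN):
    """Run workload() and return (package energy in joules, elapsed seconds)."""
    max_range = read_counter(domain, "max_energy_range_uj")
    e0 = read_counter(domain, "energy_uj")
    t0 = time.perf_counter()
    workload()
    t1 = time.perf_counter()
    e1 = read_counter(domain, "energy_uj")
    return energy_delta_uj(e0, e1, max_range) / 1e6, t1 - t0
```

Calling `measure(lambda: sum(range(10_000_000)))` on a machine that exposes this domain would return the CPU package energy only; it says nothing about the rest of the server.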
How we optimize for time
Capture performance
• Profiler creates application profile
• Allinea MAP records multiple processes
Find bottlenecks
• Source code viewer pinpoints key consumers
• Timelines find unusual patterns
Optimize
• Rewrite key loops
• Reorganize memory access patterns
• Change algorithms
CPU Package and System Power
We can also measure power when profiling
[Figure: profiler timelines of whole system power usage and CPU package power usage]
Coprocessor Metrics
• Coprocessors have power requirements
– NVIDIA CUDA GPU or Intel Xeon Phi (KNC)
• Devices provide kernel access to power
– High power consumption when active
– Low power consumption when idle
Two Key Questions
• Can developers optimize code for energy? YES
• Can owners and users tune applications for energy?
How we tune time without changing code
No instrumentation needed
No source code needed
No recompilation needed
Less than 5% runtime overhead
Fully scalable
Explicit and usable output
We can include energy usage too
Key Observation: In a Nutshell
• For many HPC workloads
– The faster an application completes, the lower its energy consumption
– i.e. optimize for speed and you are probably already optimizing for energy
• But for some HPC and non-HPC cases
– Frequency scaling saves energy
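The observation follows from energy being average power multiplied by run time. A minimal sketch of that arithmetic is given here; the function name and the 300 W and 100 s figures are hypothetical values chosen for illustration, not measurements from the talk.

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy as average power (watts) times run time (seconds)."""
    return avg_power_w * runtime_s


# Hypothetical job drawing a steady 300 W.
baseline = energy_joules(300.0, 100.0)   # original run time
optimized = energy_joules(300.0, 80.0)   # same power draw, 20% faster
saving = 1.0 - optimized / baseline      # fraction of energy saved
```

If power draw stays roughly constant, a 20% cut in run time gives a 20% cut in energy. Frequency scaling is the other lever: it lowers the power term, which only pays off when the run time does not grow by more than the power falls.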
Two Key Questions
• Can developers optimize code for energy? YES
• Can owners and users tune applications for energy? YES
… But should they?
• Are we counting all energy?
• Are we considering all costs?
… Let’s rewrite the key question!
Two Key Questions
• When should developers optimize code for energy?
• When should owners and users tune applications for energy?
What is energy?
Measuring Energy
Low level measurement
• Real data from some processor, memory subsystems
• Available in kernel - Intel RAPL
Server level monitoring
• PDU and server level readings
• Real data – real energy
Full system monitoring
• Air-con
• Servers, switches, storage…
Frequency Scaling
Some workloads have a low compute requirement, but high data volume
Data crunching vs number crunching
Processor is over-powered for the speed of memory, disk or network
CPU frequency can be scaled down in software
Providing information to developer, user and system owner
• Allinea MAP
• Allinea Performance Reports
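The reasoning behind frequency scaling can be sketched with a toy model. Everything in it is an assumption made for illustration: compute and memory traffic overlap perfectly, package power is a static term plus a term that grows with the cube of frequency, and the default coefficients are hypothetical rather than measured.

```python
def runtime_s(freq_ghz: float, compute_gcycles: float, memory_s: float) -> float:
    """Run time when compute overlaps fully with memory traffic (assumed)."""
    return max(compute_gcycles / freq_ghz, memory_s)


def package_power_w(freq_ghz: float, static_w: float, dynamic_coeff: float) -> float:
    """Static power plus a dynamic term proportional to frequency cubed (assumed)."""
    return static_w + dynamic_coeff * freq_ghz ** 3


def energy_j(freq_ghz: float, compute_gcycles: float, memory_s: float,
             static_w: float = 40.0, dynamic_coeff: float = 5.0) -> float:
    """CPU package energy for one run under the toy model."""
    power = package_power_w(freq_ghz, static_w, dynamic_coeff)
    return power * runtime_s(freq_ghz, compute_gcycles, memory_s)


# Hypothetical memory-bound job: 30 Gcycles of compute hidden behind 30 s of memory traffic.
high = energy_j(2.1, compute_gcycles=30.0, memory_s=30.0)
low = energy_j(1.7, compute_gcycles=30.0, memory_s=30.0)
```

In this model the memory-bound job takes 30 s at both frequencies, so the lower clock draws less power for the same run time. The model covers CPU package energy only and ignores the rest of the node.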
A lot of codes are memory-bound
Multiple cores share bandwidth
[Diagram: Core 1, Core 2, Core 3, Core 4, … connected through "lots of clever technology" to main memory]
Can we tune them for energy efficiency?
[Diagram: Core 1, Core 2, Core 3, Core 4, … connected through "lots of clever technology" to main memory]
How can users improve energy efficiency?
Buy a new cluster
Optimize the code
Reduce CPU frequency?
Run on fewer cores per node?
The Experiment
One simple code
• A well-understood wave equation solver
One compute node
• Minimize effect of MPI communications
Change CPU frequency and number of cores
• Measure the results with Allinea Performance Reports
4 PPN @ 2.1 GHz, 30 seconds
Observed Results
Lower CPU frequency
• 1.7 GHz – 6% less energy use
• Same runtime
Fewer cores per node, more nodes
• 20% increased runtime
• … but according to RAPL 15% energy saving
• Nonsense!! – 2x servers means 2x baseline power!
Important take-aways:
• Measure whole system energy
• Control power per application
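The "measure whole system energy" take-away can be illustrated with a short calculation. The 20% longer runtime and the 15% lower RAPL reading come from the results above; the baseline power per node, the runtime and the CPU energy figures are hypothetical, and the sketch assumes the RAPL saving is the total across both nodes.

```python
def total_energy_j(cpu_energy_j: float, nodes: int,
                   baseline_w_per_node: float, runtime_s: float) -> float:
    """CPU energy seen by RAPL plus the baseline draw of every powered-on node."""
    return cpu_energy_j + nodes * baseline_w_per_node * runtime_s


# Hypothetical reference run: one node, 100 s, 10 kJ of CPU energy, 150 W baseline.
packed = total_energy_j(cpu_energy_j=10_000.0, nodes=1,
                        baseline_w_per_node=150.0, runtime_s=100.0)

# Same job spread over two nodes: 15% less CPU energy, 20% longer runtime.
spread = total_energy_j(cpu_energy_j=8_500.0, nodes=2,
                        baseline_w_per_node=150.0, runtime_s=120.0)
```

With these assumed figures the packed run uses 25 kJ and the spread run 44.5 kJ, so the apparent RAPL saving disappears once the second server's baseline power is counted.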
Improving energy efficiency
Each application and system has different characteristics
• Tools show if the application wastes power
• Developers can see when to optimize and change code
• Users can improve efficiency without changing code
Don’t forget the opportunity cost
• Slowing down applications costs science and results
• Machines and PhDs have finite lifetimes – their cost dominates
Time and energy are not the same
• Optimize for time before optimizing for energy
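The opportunity-cost argument can be made concrete with a small comparison. This is only a sketch: the function name, the electricity price, the hourly machine cost and the job figures are all hypothetical and are not taken from the talk.

```python
def net_saving_eur(energy_saved_kwh: float, price_eur_per_kwh: float,
                   extra_hours: float, machine_cost_eur_per_hour: float) -> float:
    """Money saved on electricity minus the cost of the extra machine time used."""
    return energy_saved_kwh * price_eur_per_kwh - extra_hours * machine_cost_eur_per_hour


# Hypothetical tuned job: saves 100 kWh but occupies the machine 2 hours longer.
net = net_saving_eur(energy_saved_kwh=100.0, price_eur_per_kwh=0.20,
                     extra_hours=2.0, machine_cost_eur_per_hour=15.0)
```

With these assumed prices the electricity saving is smaller than the cost of the extra machine time, so `net` is negative. Staff time lost while waiting for results would push it further down.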
Allinea’s Energy Products
• Allinea MAP
– Profiler, part of the Allinea Forge development suite
– Add-on for system and package energy/power
• Allinea Performance Reports
– 1-page benchmarking/analysis
– Add-on for system and CPU energy/power
• Available now
– RAPL support for CPU energy/power
– Whole node power measurement for Cray and some other systems
• Visit: http://www.allinea.com/energy-pack

The impact of software on data-center energy use - and what can we do about it?
