A review of power & energy
consumption optimization in HPC

                   Rishi Pathak
                 riship@cdac.in
 National PARAM Supercomputing Facility, C-DAC, Pune


    Symposium on HPC Applications – IIT Kanpur
                 March 12 - 14, 2012
Top 10 – Top500
Top 10 – Green 500
[Bar chart: GF per Watt for ranks 1-10 of the Green 500 vs. the Top 500 lists (series: "Green 500, Rank 1-10 (GF per Watt)" and "Top 500, Rank 1-10 (GF per Watt)"); GPU-accelerated systems are marked. Green 500 values run roughly 0.95-2.02 GF/W, Top 500 values roughly 0.25-0.85 GF/W.]
Exascale system
• Likely to be feasible by 2017±2
• 10-100 Million processing elements (cores or mini-
  cores)
• Chips perhaps as dense as 1,000 cores per socket
• Clock rates will grow more slowly
• Large-scale optics-based interconnects
• 10-100 PB of aggregate memory
• Sustained performance per watt of ~100 GF/W
• Total system power of 10 – 100 MW (worked out below)
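The 10 – 100 MW figure follows directly from the efficiency target; a minimal back-of-the-envelope check (all inputs are the slide's round targets, not measurements):

```python
# Back-of-the-envelope check of the exascale power figures on this slide.
# All inputs are the round targets quoted above, not measured values.

target_flops = 1e18            # 1 exaflop/s sustained
efficiency_gf_per_watt = 100   # ~100 GF/W target

power_watts = target_flops / (efficiency_gf_per_watt * 1e9)
print(f"Power at 100 GF/W: {power_watts / 1e6:.0f} MW")            # -> 10 MW

# At ~2 GF/W (the 2012 Green 500 leader), the same machine would need:
print(f"Power at   2 GF/W: {target_flops / 2e9 / 1e6:.0f} MW")     # -> 500 MW
```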
Power & Energy
• E = P × T
• Energy (E) consumed in time (T) at average power (P)
• For a given average power, minimizing the time interval limits the energy (worked example below)
• The minimum achievable T for an application depends on:
   – Mapping of the application onto the cluster system
   – Scalability & system bottlenecks
• Beyond that – power management approaches
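A minimal numeric illustration of E = P × T (the power and runtime values are invented for illustration): a better mapping that shortens the run saves energy even if average power rises slightly, which is why minimizing T comes before any power-management knob.

```python
# Illustrative only: average node power and runtimes are invented numbers.
def energy_kwh(avg_power_kw, runtime_hours):
    """E = P * T, with P taken as the average power over the run."""
    return avg_power_kw * runtime_hours

# Same job, two mappings of the application onto the cluster:
poor_mapping = energy_kwh(avg_power_kw=300, runtime_hours=10)  # 3000 kWh
good_mapping = energy_kwh(avg_power_kw=310, runtime_hours=7)   # 2170 kWh

print(poor_mapping, good_mapping)
# Even at slightly higher average power, the shorter run consumes far less energy.
```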
Power management techniques
• Static Power Management (SPM)
   – Low-power CPUs
   – Local flash storage
   – Suitable for data-centric applications
• Dynamic Power Management (DPM)
   – Software & power-scalable components
   – Dynamically adjust power consumption
   – Frequency & voltage scaling for CPU & memory
DVFS
• Dynamic Voltage & Frequency Scaling
• Dynamic power: P = C × V² × f (illustrated below)
• Throttle when the workload is:
   – Not CPU bound
   – Not very CPU intensive
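A rough illustration of why DVFS pays off: because voltage must roughly track frequency, dynamic power falls close to cubically with f, while a non-CPU-bound phase slows down much less than linearly. The capacitance value and the (V, f) operating points below are invented for illustration, not real processor data.

```python
# Dynamic CPU power: P = C * V^2 * f  (activity factor folded into C here).
# C and the (V, f) operating points are illustrative assumptions.
C = 1.0e-9  # effective switched capacitance in farads (illustrative)

def dynamic_power(v_volts, f_hz):
    return C * v_volts**2 * f_hz

high = dynamic_power(1.20, 2.6e9)   # nominal operating point
low  = dynamic_power(0.95, 1.6e9)   # throttled point for a memory-/IO-bound phase

print(f"high: {high:.2f} W, low: {low:.2f} W, ratio: {high/low:.2f}x")
# Frequency drops ~1.6x but power drops ~2.6x, so when the phase is not
# CPU bound the energy per unit of work goes down.
```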
DVFS Scheduling
• Off-line, trace-based scheduling
   – Source code instrumentation for performance profiling
   – Execution with profiling
   – Determination of appropriate processor frequencies for each phase (sketched below)
   – Source code instrumentation for DVFS scheduling


S. Huang & W. Feng – Proc. Cluster Computing [IEEE/ACM] (2009)
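A minimal sketch of the off-line idea (a simplified illustration, not the exact published algorithm): the profiling run labels each phase with how CPU-bound it is, and the scheduler then picks, per phase, the lowest available frequency whose predicted slowdown stays inside a user-given bound; DVFS calls are instrumented at the phase boundaries.

```python
# Hypothetical sketch of off-line, trace-based frequency selection.
# The phase data and the slowdown model are illustrative assumptions.

FREQS_GHZ = [1.6, 2.0, 2.4, 2.6]   # available P-states (assumed)
MAX_SLOWDOWN = 0.05                # user-specified performance-loss bound

def predicted_slowdown(cpu_bound_fraction, f, f_max):
    # Simple model: only the CPU-bound fraction of the phase stretches when
    # the clock is lowered; memory/communication time is unaffected.
    return cpu_bound_fraction * (f_max / f - 1.0)

def pick_frequency(cpu_bound_fraction):
    f_max = max(FREQS_GHZ)
    for f in sorted(FREQS_GHZ):    # try the lowest frequency first
        if predicted_slowdown(cpu_bound_fraction, f, f_max) <= MAX_SLOWDOWN:
            return f
    return f_max

# Phases extracted from an instrumented profiling run (illustrative numbers):
phases = [("dense solve", 0.95), ("halo exchange", 0.15), ("checkpoint I/O", 0.05)]
for name, frac in phases:
    print(f"{name:15s} -> {pick_frequency(frac)} GHz")
```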
DVFS Scheduling
• Run-time, profiling-based scheduling
   – Time-window-based performance prediction model (sketched below)
   – No a priori information about application phases
   – Mispredictions directly hurt performance or energy efficiency
   – Metrics:
      · MIPS & CPU utilization
      · Interception of MPI communication calls
      · File I/O calls
      · MPI receive wait cycles
   – Shown to reduce energy within a pre-specified performance-loss constraint
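A generic sketch of a time-window, run-time DVFS governor in the spirit described above (not the CPU MISER implementation): each window it measures how compute-bound the workload was, predicts the next window from recent history, and lowers the frequency only while the predicted performance loss stays under the constraint. The measurement hooks are hypothetical placeholders for hardware counters and MPI/I/O interception.

```python
# Generic sketch of run-time, window-based DVFS (illustrative only).
from collections import deque

FREQS_GHZ = [1.6, 2.0, 2.4, 2.6]
LOSS_BOUND = 0.05          # pre-specified performance-loss constraint
history = deque(maxlen=4)  # recent per-window CPU-boundedness samples

def choose_frequency(predicted_cpu_bound):
    f_max = max(FREQS_GHZ)
    for f in sorted(FREQS_GHZ):
        if predicted_cpu_bound * (f_max / f - 1.0) <= LOSS_BOUND:
            return f
    return f_max

def on_window_end(measured_cpu_bound):
    """Called every time window with the fraction of cycles that were
    compute-bound (i.e. not MPI wait or file I/O)."""
    history.append(measured_cpu_bound)
    predicted = sum(history) / len(history)   # simple moving-average predictor
    return choose_frequency(predicted)

# Example: a phase change from compute-heavy to MPI-wait-heavy windows.
for sample in [0.9, 0.9, 0.2, 0.1, 0.1]:
    print(f"measured {sample:.1f} -> next window at {on_window_end(sample)} GHz")
```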
DVFS Implementations
• Memory MISER (Management Infra-Structure for Energy Reduction)
• CPU MISER
• Linux CPUSPEED (sysfs interface sketched below)
• Ecod
• Beta-Algorithm

M. E. Tolentino, J. Turner & K. W. Cameron – Proc. of the 4th International Conference on Computing Frontiers (2007)
S. Huang & W. Feng – Proc. Cluster Computing [IEEE/ACM] (2009)
C. Hsu & W. Feng – Proc. of the 2005 ACM/IEEE Conference on Supercomputing
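For reference, userspace tools such as CPUSPEED ultimately drive the standard Linux cpufreq sysfs interface; a minimal sketch is below. It assumes root privileges, a cpufreq-enabled kernel with the "userspace" governor, and a driver (e.g. acpi-cpufreq) that exposes scaling_available_frequencies.

```python
# Minimal sketch of driving DVFS through the Linux cpufreq sysfs interface.
# Requires root and a kernel/driver exposing the attributes used below.
CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq/{attr}"

def read(cpu, attr):
    with open(CPUFREQ.format(cpu=cpu, attr=attr)) as f:
        return f.read().strip()

def write(cpu, attr, value):
    with open(CPUFREQ.format(cpu=cpu, attr=attr), "w") as f:
        f.write(str(value))

if __name__ == "__main__":
    cpu = 0
    print("available:", read(cpu, "scaling_available_frequencies"))  # in kHz
    print("current:  ", read(cpu, "scaling_cur_freq"))
    # Switch to the userspace governor and request the lowest frequency:
    write(cpu, "scaling_governor", "userspace")
    lowest_khz = min(int(khz) for khz in
                     read(cpu, "scaling_available_frequencies").split())
    write(cpu, "scaling_setspeed", lowest_khz)
```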
Enhancements in DVFS
• Dynamic frequency scaling per core
   – Each core runs at its own clock
   – Power is linear with frequency (at a fixed voltage)
   – Power savings are relatively small
• Separate power planes for the core and "uncore" parts of the CPU
   – Cores can go to sleep (C-states); residency can be inspected as sketched below
   – The memory controller remains operational for external devices (e.g. via DMA)
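Whether cores actually reach these sleep states can be checked from userspace; a small sketch reading the stock Linux cpuidle sysfs counters (state names and residencies depend on the CPU, idle driver, and kernel):

```python
# Sketch: report per-C-state residency for one core via the Linux cpuidle
# sysfs interface. Output depends on the CPU, idle driver, and kernel.
import glob, os

def cstate_residency(cpu=0):
    base = f"/sys/devices/system/cpu/cpu{cpu}/cpuidle"
    for state_dir in sorted(glob.glob(os.path.join(base, "state*"))):
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(state_dir, "time")) as f:
            usec = int(f.read().strip())    # total time spent in this state (us)
        print(f"{name:8s} {usec / 1e6:10.1f} s")

if __name__ == "__main__":
    cstate_residency(cpu=0)
```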
Enhancements in DVFS
• Clock gating
   – Clock-disabled sleep states (AMD: C1/C1E; Intel: C0/C1/C3/C6)
   – At the CPU block level
   – At the core level
   – Reduces dynamic power
• Power gating
   – Power to the CPU/core cut off (~0 V)
   – Reduces both dynamic and static (leakage) power
Nehalem core sleep states
AMD's and Intel's techniques
Power optimization at NPSF
• Scheduler capable of:
   – Powering off a node after a pre-specified period of idleness (no job); policy sketched below
   – Power optimization subject to QoS (turnaround time)
   – Node power-on time (2-3 min) is an additional overhead
• Targeted power policies
   – Aggressive optimization without regard to QoS
   – Power capping
   – Power budget
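A minimal sketch of the idle-node power-off policy above. The data structures are hypothetical, not the NPSF scheduler's code; NODEIDLEPOWERTHRESHOLD is the scheduler parameter that appears in the simulation table later, and the power-on overhead is the 2-3 minutes quoted on this slide.

```python
# Hypothetical sketch of the idle-node power-off policy; numbers and
# structures are illustrative.
from dataclasses import dataclass

NODE_IDLE_POWER_THRESHOLD_MIN = 8   # "NODEIDLEPOWERTHRESHOLD" (Case I below)
POWER_ON_OVERHEAD_MIN = 3           # node power-on time from the slide

@dataclass
class Node:
    name: str
    idle_minutes: float             # time since the last job finished

def should_power_off(node, minutes_until_next_expected_start):
    """Power off only if the node has idled past the threshold AND the next
    queued job that could use it will not start before the node could be
    booted back, protecting the QoS / turnaround-time constraint."""
    if node.idle_minutes < NODE_IDLE_POWER_THRESHOLD_MIN:
        return False
    return minutes_until_next_expected_start > POWER_ON_OVERHEAD_MIN

print(should_power_off(Node("cn042", idle_minutes=12), 30))  # True
print(should_power_off(Node("cn043", idle_minutes=12), 1))   # False
```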
Power optimization at NPSF
• Node packing via checkpointing, migration & restart
   – MPI with BLCR – one approach
   – Virtualization – another approach
   – Considerations:
      · Remaining walltime of the job being migrated
      · Remaining walltime of jobs on the node under consideration
      · Cost of migration versus the expected power savings (decision sketched below)
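A minimal cost-benefit sketch of the node-packing decision. All quantities and the linear cost model are illustrative assumptions; in practice the checkpoint cost and node power would come from measurement.

```python
# Illustrative node-packing decision: migrate only when the energy freed by
# vacating and powering off a node outweighs the checkpoint/restart overhead.
def migration_pays_off(remaining_walltime_hr, freed_node_power_kw,
                       checkpoint_restart_hr, migration_energy_kwh):
    # Energy the freed node would otherwise draw for the rest of the job:
    saved_kwh = freed_node_power_kw * remaining_walltime_hr
    # Cost: extra runtime for checkpoint/restart plus the transfer itself:
    cost_kwh = migration_energy_kwh + freed_node_power_kw * checkpoint_restart_hr
    return saved_kwh > cost_kwh

# A long-running job is usually worth moving; a nearly finished one is not.
print(migration_pays_off(10.0, 0.25, 0.2, 0.3))   # True
print(migration_pays_off(0.5,  0.25, 0.2, 0.3))   # False
```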
Saving Potential
Simulation Results – Plot
Simulation Results – Table
Parameter \ Case                       Case I    Case II    Case III
Power saving (%)                         4.05       4.22        9.29
NODEIDLEPOWERTHRESHOLD (minutes)            8          6           4
Power optimization at NPSF
• Feedback-driven policy engine
   – Speculative power on/off of nodes at any given time (sketched below)
   – Metrics/deciding factors:
      · Function of job arrival times & resource requirements
      · How many nodes are needed, and when
      · Current and probable cluster utilization at a given time
   – Expected start time of jobs in the queue
   – Minimize the impact on job turnaround time
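A minimal sketch of the feedback loop described above. It is hypothetical: the arrival-rate forecast, the lead time, and all names are illustrative, not the NPSF policy engine. The idea is to estimate how many nodes must be online within the node power-on lead time, from jobs already queued plus a simple arrival forecast, and boot nodes just ahead of need so turnaround time is not hurt.

```python
# Hypothetical feedback-driven policy sketch; prediction model is illustrative.
import math

POWER_ON_LEAD_MIN = 3   # node power-on time; boot this far ahead of need

def nodes_needed_soon(queued_jobs, horizon_min, arrival_rate_per_min,
                      avg_nodes_per_job):
    """Nodes required within the horizon: jobs already queued and expected to
    start, plus a simple forecast of jobs arriving meanwhile."""
    queued = sum(j["nodes"] for j in queued_jobs
                 if j["expected_start_min"] <= horizon_min)
    forecast = arrival_rate_per_min * horizon_min * avg_nodes_per_job
    return queued + math.ceil(forecast)

def plan(online_nodes, powered_off_nodes, queued_jobs,
         arrival_rate_per_min, avg_nodes_per_job):
    need = nodes_needed_soon(queued_jobs, POWER_ON_LEAD_MIN,
                             arrival_rate_per_min, avg_nodes_per_job)
    if need > online_nodes:
        return ("power_on", min(need - online_nodes, powered_off_nodes))
    return ("hold", 0)

queue = [{"nodes": 8, "expected_start_min": 2},
         {"nodes": 16, "expected_start_min": 45}]
print(plan(online_nodes=6, powered_off_nodes=20, queued_jobs=queue,
           arrival_rate_per_min=0.1, avg_nodes_per_job=4))
# -> ('power_on', 4): 8 queued nodes needed within the lead time, plus ~2 forecast.
```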
Job Arrival Time
PARAM Yuva – Access & Account
https://yuva.cdac.in/
Technical Affiliation Scheme
Thank You
npsfhelp@cdac.in
