Symposium on HPC Applications – IIT Kanpur
    Presentation Transcript

    • A review of power & energy consumption optimization in HPC
      Rishi Pathak (riship@cdac.in)
      National PARAM Supercomputing Facility, C-DAC, Pune
      Symposium on HPC Applications – IIT Kanpur, March 12-14, 2012
    • Top 10 – Top500
    • Top 10 – Green 500
    • Chart: GF per Watt for ranks 1-10 of the Green 500 versus ranks 1-10 of the Top 500. The Green 500 leaders reach roughly 2 GF/Watt and are mostly GPU-accelerated; the Top 500 leaders mostly fall well below 1 GF/Watt.
    • Exascale system
       Likely to be feasible by 2017 ± 2
       10-100 million processing elements (cores or mini-cores)
       Chips perhaps as dense as 1,000 cores per socket
       Clock rates will grow more slowly
       Large-scale optics-based interconnects
       10-100 PB of aggregate memory
       Performance per watt ~ 100 GF/Watt sustained
       10-100 MW total system power
    • Power & Energy
      E = P * T: energy (E) consumed in time (T) at average power (P)
       Minimizing the time interval will limit energy
       There is a minimum value of T for an application, determined by
         – the mapping of the application onto the cluster system
         – scalability & system bottlenecks
       Beyond that minimum, power management approaches take over
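A quick worked example of the E = P * T relation (the figures are purely illustrative, not taken from the slides): a system drawing an average of 10 MW for one hour consumes E = 10 MW * 3600 s = 3.6 * 10^10 J, i.e. 10 MWh.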
    • Power management techniques
       Static Power Management (SPM)
         – Low-power CPUs
         – Local flash storage
         – Suitable for data-centric applications
       Dynamic Power Management (DPM)
         – Software & power-scalable components
         – Dynamically adjust power consumption
         – Frequency & voltage scaling for CPU & memory
    • DVFS Dynamic Voltage & Frequency Scaling P = C * V2 * f Throttling when  Workload is not CPU bound  Is not much CPU intensive
    • DVFS Scheduling Off-line, trace-based scheduling  Source code instrumentation for performance profiling  Execution with profiling  Determination of appropriate processor frequencies for each phase  Source code instrumentation for DVFS schedulingS. Huang & W. Feng – Proc. Cluster computing[IEEE/ACM](2009)
    • DVFS Scheduling Run-time, profiling-based scheduling  Time-window based performance prediction model  No a priori information of application phases  False prediction will have dire consequences for performance or energy efficiency  Metrics  MIPS & CPU utilization  Interception of MPI communication calls  File I/O calls  MPI receive wait cycles  Shown to reduce energy with pre-specified performance loss constraint
    • DVFS Implementations
       Memory MISER (Management Infrastructure for Energy Reduction)
       CPU MISER
       Linux CPUSPEED
       Ecod
       Beta-algorithm
      M. E. Tolentino, J. Turner & K. W. Cameron – Proc. of the 4th International Conference on Computing Frontiers (2007)
      S. Huang & W. Feng – Proc. Cluster Computing [IEEE/ACM] (2009)
      C. Hsu & W. Feng – Proc. of the 2005 ACM/IEEE Conference on Supercomputing
    • Enhancements in DVFS Dynamic Frequency Scaling per Core  Each core runs at its own clock  Power is linear with frequency  Power savings are relatively small Separate power planes for the core and "uncore" part of the CPU  Cores can go to sleep (C-state)  Memory controller is still operational for external device (e.g. via DMA)
    • Enhancements in DVFS Clock gating  Clock disabled sleep state (AMD-C1,E1, Intel- C[0,1,3,6])  At the CPU block level  At the core level  Reduces dynamic power Power Gating  Power to CPU/core cut off (~0V)  Reduces both dynamic and static(leakage) power
    • Nehalem core sleep states
    • AMD's and Intel's techniques
    • Power optimization at NPSF Scheduler capable of :  Power off a node after a pre specified state of idleness(no job)  Power optimization with QOS(turnaround time)  Node power on time(2-3 min) is additional Targeted power policies  Aggressive optimization w/o regard to QOS  Power capping  Power budget
    • Power optimization at NPSF Node packing via checkpointing, migration & restart  MPI with BLCR – one approach  Use of virtualization – another approach  Considerations –  Remaining walltime of job being migrated  Remaining walltime of jobs on node in consideration  Associated cost of migration against power savings expected to be achieved
    • Saving Potential
    • Simulation Result - Plot
    • Simulation Results – Table

      Parameter                               Case I   Case II   Case III
      NODEIDLEPOWERTHRESHOLD (in minutes)        8         6         4
      Power saving (in percentage)             4.05      4.22      9.29
    • Power optimization at NPSF Feedback driven policy engine  Speculative power on/off of nodes at any given time  Metrics/deciding factors  Function of Jobs arrival time & resource requirements  How many nodes at what time  Current and probable cluster utilization at given time – another metric  Expected starttime of jobs in queue  Minimize impact on turnaround time of job
    • Job Arrival Time
    • PARAM Yuva – Access & Account
    • https://yuva.cdac.in/
    • Technical Affiliation Scheme
    • Thank You – npsfhelp@cdac.in