Energy Efficient Scheduling for High-Performance Clusters
Ziliang Zong, Texas State University; Adam Manzanares, Los Alamos National Lab; Xiao Qin, Auburn University
Where is Auburn University? Ph.D. '04, U. of Nebraska-Lincoln; '04-'07, New Mexico Tech; '07-now, Auburn University
Storage Systems Research Group at New Mexico Tech (2004-2007)
Storage Systems Research Group at Auburn (2008)
Storage Systems Research Group at Auburn (2009)
Storage Systems Research Group at Auburn (2011)
Investigators: Ziliang Zong, Ph.D., Assistant Professor, Texas State University; Adam Manzanares, Ph.D. Candidate, Los Alamos National Lab; Xiao Qin, Ph.D., Associate Professor, Auburn University
Introduction - Applications
Introduction – Data Centers
Motivation – Electricity Usage (EPA Report to Congress on Server and Data Center Energy Efficiency, 2007)
Motivation – Energy Projections (EPA Report to Congress on Server and Data Center Energy Efficiency, 2007)
Motivation – Design Issues
Architecture – Multiple Layers
Energy Efficient Devices
Multiple Design Goals
Energy-Aware Scheduling for Clusters
Parallel Applications
Motivational Example
[Figure: a four-task DAG (T1-T4) scheduled three ways]
Linear Schedule - Time: 39s
No Duplication Schedule (NDS) - Time: 32s
Task Duplication Schedule (TDS) - Time: 29s
An example of duplication
Motivational Example (cont.)
[Figure: the same DAG annotated with (time, energy) pairs; CPU_Energy = 6W, Network_Energy = 1W]
Linear Schedule - Time: 39s, Energy: 234J
No Duplication Schedule (MCP) - Time: 32s, Energy: 242J
Task Duplication Schedule (TDS) - Time: 29s, Energy: 284J
An example of duplication
Motivational Example (cont.)
The energy cost of duplicating T1: CPU side +48J, network side -6J, total 42J
The performance benefit of duplicating T1: 6s
Energy-performance tradeoff: 42 / 6 = 7
EAD - Time: 32s, Energy: 242J; PEBD - Time: 29s, Energy: 284J
If Threshold = 10, duplicate T1? EAD: No; PEBD: Yes
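To make the tradeoff concrete, here is a minimal C sketch of the duplication test just described, using only the numbers from this example; the variable names and the exact form of the threshold test are illustrative, not the authors' code.

```c
/* Sketch of the EAD/PEBD duplication test for T1, using the example's
   numbers (CPU 6W, network 1W, T1 runs 8s, its message takes 6s). */
#include <stdio.h>

int main(void) {
    const double cpu_power = 6.0;   /* W, CPU_Energy from the slide */
    const double net_power = 1.0;   /* W, Network_Energy from the slide */
    const double t1_exec   = 8.0;   /* s: T1 runs again on the 2nd CPU */
    const double t1_msg    = 6.0;   /* s: message T1 -> T2 no longer sent */
    const double threshold = 10.0;

    double energy_increase = cpu_power * t1_exec   /* +48 J on the CPU side */
                           - net_power * t1_msg;   /*  -6 J on the network side */
    double time_decrease   = t1_msg;               /* schedule shrinks by 6 s */
    double ratio           = energy_increase / time_decrease;  /* 42/6 = 7 */

    /* EAD tests the raw energy increase; PEBD tests the energy/time ratio. */
    printf("EAD  duplicates T1? %s\n", energy_increase <= threshold ? "yes" : "no");
    printf("PEBD duplicates T1? %s\n", ratio <= threshold ? "yes" : "no");
    return 0;
}
```

Run as written, it reports that EAD skips the duplication (42J > 10) while PEBD performs it (7 <= 10), matching the slide.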
Basic Steps of Energy-Aware Scheduling
Algorithm Implementation, Step 1: DAG Generation
Task Description: Task Set {T1, T2, ..., T9, T10}
- T1 is the entry task; T10 is the exit task.
- T2, T3, and T4 cannot start until T1 finishes.
- T5 and T6 cannot start until T2 finishes.
- T7 cannot start until both T3 and T4 finish.
- T8 cannot start until both T5 and T6 finish.
- T9 cannot start until both T6 and T7 finish.
- T10 cannot start until both T8 and T9 finish.
(A data-structure sketch of this DAG follows.)
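As a sketch of what Step 1 produces, the precedence constraints above can be encoded directly as predecessor lists. This is a minimal illustration, not the chapter's actual data structure.

```c
/* Step 1 sketch: the ten-task DAG above as predecessor lists.
   Task indices are 1-based; 0 is a padding entry meaning "no predecessor". */
#include <stdio.h>

#define NTASKS  10
#define MAXPRED 2

static const int pred[NTASKS + 1][MAXPRED] = {
    {0, 0},            /* unused slot so tasks index from 1 */
    {0, 0},            /* T1: entry task */
    {1, 0},            /* T2 waits for T1 */
    {1, 0},            /* T3 waits for T1 */
    {1, 0},            /* T4 waits for T1 */
    {2, 0},            /* T5 waits for T2 */
    {2, 0},            /* T6 waits for T2 */
    {3, 4},            /* T7 waits for T3 and T4 */
    {5, 6},            /* T8 waits for T5 and T6 */
    {6, 7},            /* T9 waits for T6 and T7 */
    {8, 9},            /* T10 waits for T8 and T9: exit task */
};

int main(void) {
    for (int t = 1; t <= NTASKS; t++) {
        printf("T%d <-", t);
        for (int k = 0; k < MAXPRED; k++)
            if (pred[t][k]) printf(" T%d", pred[t][k]);
        printf("\n");
    }
    return 0;
}
```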
Basic Steps of Energy-Aware Scheduling
Algorithm Implementation, Step 2: Parameter Calculation
- Level: total execution time from the current task to the exit task
- Earliest Start Time (EST)
- Earliest Completion Time (ECT)
- Latest Allowable Start Time (LAST)
- Latest Allowable Completion Time (LACT)
- Favorite Predecessor (FP)
(A sketch of the level computation follows.)
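Of these parameters, the level orders the scheduling queue in Step 3. Below is a minimal sketch of its usual recursive form (a task's execution time plus the largest successor level; communication times excluded, per the definition above). The exec_time[] values are invented, and Equations 14-19 in Chapter 4 may define the remaining parameters differently.

```c
/* Step 2 sketch: level(t) = exec_time[t] + max over successors of level(s).
   Successor lists mirror the DAG of Step 1; exec_time[] values are
   hypothetical, purely for illustration. */
#include <stdio.h>

#define NTASKS  10
#define MAXSUCC 3

static const int succ[NTASKS + 1][MAXSUCC] = {
    {0, 0, 0},                     /* unused: tasks index from 1 */
    {2, 3, 4}, {5, 6, 0}, {7, 0, 0}, {7, 0, 0}, {8, 0, 0},
    {8, 9, 0}, {9, 0, 0}, {10, 0, 0}, {10, 0, 0}, {0, 0, 0},
};
static const double exec_time[NTASKS + 1] =
    {0, 8, 6, 5, 4, 3, 4, 5, 3, 4, 2};   /* hypothetical times (s) */

static double level(int t) {
    double best = 0.0;
    for (int k = 0; k < MAXSUCC && succ[t][k]; k++) {
        double l = level(succ[t][k]);
        if (l > best) best = l;
    }
    return exec_time[t] + best;    /* exit task: just its own time */
}

int main(void) {
    for (int t = 1; t <= NTASKS; t++)
        printf("level(T%d) = %.1f\n", t, level(t));
    return 0;
}
```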
Basic Steps of Energy-Aware Scheduling
Algorithm Implementation, Step 3: Scheduling
Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}; tasks are struck from the list as each critical path is scheduled.
Basic Steps of Energy-Aware Scheduling
Algorithm Implementation, Step 4: Duplication Decision
Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
- Decision 1: Duplicate T1?
- Decision 2: Duplicate T2? Duplicate T1?
- Decision 3: Duplicate T1?
(A sketch combining Steps 3 and 4 follows.)
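A compact sketch of how Steps 3 and 4 interact, under stated assumptions: walk from the lowest-level unscheduled task back along favorite predecessors, consulting the duplication test when an already-scheduled task is reached. The fp[] values beyond the first critical path (10 -> 9 -> 7 -> 3 -> 1, from note #24) and the should_duplicate stub are placeholders, not the chapter's implementation.

```c
/* Steps 3-4 sketch: form critical paths by following favorite predecessors
   (fp) back to the entry task, marking tasks scheduled and asking the
   EAD/PEBD test when a task is already placed elsewhere. */
#include <stdio.h>
#include <stdbool.h>

#define NTASKS 10

static const int fp[NTASKS + 1] =       /* favorite predecessors; fp[1] = 0 */
    {0, 0, 1, 1, 1, 2, 2, 3, 5, 7, 9};  /* marks the entry task             */
static bool scheduled[NTASKS + 1];

static bool should_duplicate(int t) {   /* stand-in for the EAD/PEBD test */
    (void)t;
    return false;
}

static void form_critical_path(int start) {
    printf("critical path:");
    for (int t = start; t != 0; t = fp[t]) {
        if (scheduled[t] && !should_duplicate(t))
            break;                      /* stop unless duplication pays off */
        scheduled[t] = true;
        printf(" T%d", t);
    }
    printf("\n");
}

int main(void) {
    /* task list sorted by ascending level, as on the slide */
    const int order[NTASKS] = {10, 9, 8, 5, 6, 2, 7, 4, 3, 1};
    for (int i = 0; i < NTASKS; i++)
        if (!scheduled[order[i]])
            form_critical_path(order[i]);
    return 0;
}
```

With should_duplicate returning false, the first path printed is T10 T9 T7 T3 T1, matching note #24; the later iterations form the second, third, and fourth paths from the remaining unscheduled tasks.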
The EAD and PEBD Algorithms
1. Generate the DAG of the given task sets.
2. Find all the critical paths in the DAG.
3. Generate the scheduling queue based on level (ascending).
4. Select the not-yet-scheduled task with the lowest level as the starting task.
5. For each task on the same critical path as the starting task, check whether it is already scheduled; if not, allocate it to the same processor as the other tasks on that critical path.
6. If duplicating a task would save time, decide whether to duplicate it:
   - EAD: calculate the energy increase; duplicate the task and move to the next task on the same critical path only if energy increase <= threshold.
   - PEBD: calculate the energy increase and time decrease; duplicate only if ratio = energy increase / time decrease <= threshold.
7. Repeat until the entry task is met and all tasks have been scheduled.
Energy Dissipation in Processors (source: http://www.xbitlabs.com)
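Notes #27 and #30 imply a two-state processor energy model: busy power while a task executes, idle power otherwise, with the busy-idle gap driving the potential savings. A minimal C sketch follows; the wattage pairs and times below are invented, and only the gaps (89W and 18W) come from note #30.

```c
/* Sketch of the busy/idle CPU energy model suggested by the notes:
   E = P_busy * t_busy + P_idle * t_idle. A larger busy-idle gap means
   scheduling decisions change the energy bill more. */
#include <stdio.h>

static double cpu_energy(double p_busy, double p_idle,
                         double t_busy, double t_idle) {
    return p_busy * t_busy + p_idle * t_idle;
}

int main(void) {
    double t_busy = 20.0, t_idle = 12.0;   /* s, hypothetical workload */
    /* "blue" CPU: gap 89W; "purple" CPU: gap 18W (note #30); the
       absolute busy/idle wattages here are made up. */
    printf("blue CPU:   %.0f J\n", cpu_energy(100.0, 11.0, t_busy, t_idle));
    printf("purple CPU: %.0f J\n", cpu_energy(60.0, 42.0, t_busy, t_idle));
    return 0;
}
```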
Parallel Scientific Applications: Fast Fourier Transform and Gaussian Elimination
Large-Scale Parallel Applications: Robot Control and Sparse Matrix Solver (http://www.kasahara.elec.waseda.ac.jp/schedule/)
Impact of CPU Power Dissipation
Impact of CPU types: energy savings range from 3.7% to 19.4% depending on the processor.
[Figures: energy consumption for different processors (Gaussian Elimination, CCR=0.4) and (FFT, CCR=0.4)]
Impact of Interconnect Power Dissipation
Impact of interconnect types: 16.7% and 13.3% energy savings with Myrinet versus 5% and 3.1% with Infiniband.
[Figures: energy consumption (Robot Control, Myrinet) and (Robot Control, Infiniband)]
Parallelism Degrees
Impact of application parallelism: 17% and 15.8% energy savings for Robot Control versus 6.9% and 5.4% for Sparse Matrix Solver at the same CCR.
[Figures: energy consumption of Sparse Matrix Solver (Myrinet) and Robot Control (Myrinet)]
Communication-Computation Ratio
Impact of CCR (Communication-to-Computation Ratio):
[Figure: energy consumption under different CCRs]
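For reference, CCR is commonly defined as the average communication cost of the task graph over its average computation cost. A hedged statement of that usual form (the talk may normalize it differently):

$$\mathrm{CCR} = \frac{\frac{1}{|E|}\sum_{(i,j)\in E} c_{i,j}}{\frac{1}{|V|}\sum_{i \in V} w_i}$$

where $c_{i,j}$ is the communication time on DAG edge $(i,j)$ and $w_i$ is the execution time of task $T_i$. A high CCR means communication dominates, so avoiding messages (e.g., by duplication) matters more.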
Performance
Impact on schedule length:
[Figures: schedule length of Gaussian Elimination and of Sparse Matrix Solver]
Heterogeneous Clusters - Motivational Example
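Note #35 below describes this example's mapping matrix for heterogeneous processors. Here is a hypothetical C sketch of such a matrix: only T1's row (6.7, 3.9, 2.0) comes from the talk; the other rows, the infeasibility markers, and the fastest-processor pick are invented for illustration.

```c
/* Sketch of the heterogeneity mapping matrix from note #35: time_on[i][p]
   is the execution time of task T(i+1) on processor p, and INFINITY marks
   a processor that cannot run the task. */
#include <math.h>
#include <stdio.h>

#define NT 3
#define NP 3

static const double time_on[NT][NP] = {
    { 6.7, 3.9, 2.0 },          /* T1 (values from note #35) */
    { 5.2, INFINITY, 3.1 },     /* T2: cannot run on processor 2 (made up) */
    { 4.4, 2.8, INFINITY },     /* T3: cannot run on processor 3 (made up) */
};

int main(void) {
    for (int i = 0; i < NT; i++) {
        /* pick the fastest feasible processor for each task */
        int best = 0;
        for (int p = 1; p < NP; p++)
            if (time_on[i][p] < time_on[i][best]) best = p;
        printf("T%d -> processor %d (%.1f s)\n",
               i + 1, best + 1, time_on[i][best]);
    }
    return 0;
}
```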
Motivational Example (cont.)
[Figure: energy calculation for a tentative schedule across clusters C1-C4]
Experimental Settings: Simulation Environments
Communication-Computation Ratio: CCR sensitivity for Gaussian Elimination
Heterogeneity: computational node heterogeneity experiments
Conclusions
- Architecture for high-performance computing platforms
- Energy-efficient scheduling for clusters
- Energy-efficient scheduling for heterogeneous systems
How to measure energy consumption? Kill-A-Watt
Source Code Availability: www.mcs.sdsmt.edu/~zzong/software/scheduling.html

Energy efficient resource management for high-performance clusters

Editor's Notes

  • #2 See also: defense_Ziliang.ppt
  • #4-#9 High performance computing platforms have been widely deployed for intensive data processing and data storage. Their impact can be found in almost every domain: financial services, scientific computing, bioinformatics, computational chemistry, and weather forecasting.
  • #10 This slide shows a typical high-performance computing platform, built by Google in Oregon. There is no doubt that these super computing platforms have significantly changed our lives, and we all benefit from the great services they provide. However, these giant machines consume a huge amount of energy.
  • #11 This figure comes from the Environmental Protection Agency's report submitted to Congress last year. According to the report, the total power usage of servers and data centers in the United States was 61.4 billion kWh in 2006, more than double the energy used for the same purpose in 2000. Looking at the trend, from 2000 to 2006 the energy consumed by servers and data centers rapidly increased from 28.2 billion kWh to 61.4 billion kWh.
  • #12 Even worse, the EPA predicts that the power usage of servers and data centers will double again within five years if historical trends continue. Even if current efficiency trends hold, power usage will exceed 100 billion kWh in 2011. This is a huge amount of energy.
  • #13 However, most previous research focuses primarily on the performance, security, and reliability of high-performance computing platforms; the energy consumption issue has been ignored. The energy problem has now become so serious that I believe it is time to highlight energy efficiency research for high-performance computing platforms.
  • #14 Our architecture has four layers: the application layer, middleware layer, resource layer, and network layer. In each layer, we can incorporate energy-aware techniques. For example, in the application layer we can reduce unnecessary hardware accesses when writing code. In the middleware layer, we can schedule parallel tasks in more energy-efficient ways. In the resource and network layers, we can perform energy-aware resource management.
  • #15 This slide shows some typical hardware in the resource and network layers, such as CPUs, mainboards, storage disks, network adapters, switches, and routers.
  • #16 One thing I would like to emphasize is that energy-oriented research should not sacrifice other important characteristics such as performance, reliability, or security. Although introducing energy-aware techniques inevitably involves some tradeoff, we do not want significant degradation in these characteristics. In other words, we want our research to remain compatible with existing techniques. My research mainly focuses on the tradeoff between performance and energy.
  • #17 Before we talk about the algorithms, let's look at cluster systems first. In a cluster, we have a master node and slave nodes. The master node is responsible for scheduling tasks and allocating them to slave nodes for parallel execution. All slave nodes are connected by high-speed interconnects and communicate with each other through message passing.
  • #18 The parallel tasks running on clusters are represented using a Directed Acyclic Graph, or DAG for short. Usually, a DAG has one entry task and one or more exit tasks. The DAG shows the number and execution time of each task, as well as the dependences and communication times among tasks. Explain a little bit…
  • #19 Weakness 1: energy conservation in memory is not considered. Weakness 2: energy cannot be conserved even when network interconnects are idle. To improve performance, we use a duplication strategy, and this slide shows why duplication helps. We have four tasks represented by the DAG on the left. With linear scheduling, all four tasks are allocated to one CPU and the execution time is 39s. However, we can schedule task 2 on the second CPU so that it does not have to wait for the completion of task 3, shortening the total time to 32s. We also notice that 6s are wasted on the second CPU because task 2 has to wait for the message from task 1. If we duplicate task 1 on the second CPU, we can further shorten the schedule length to 29s. Clearly, duplication can improve performance.
  • #20 However, if we calculate the energy, we find that duplication may consume more energy. For example, if we set the power consumption of the CPU and network to 6W and 1W, the total energy consumption with duplication is 42J more than NDS and 50J more than the linear schedule, mainly because task 1 is executed twice. I will use NDS (MCP) to denote the no-duplication schedule and TDS to denote the task-duplication schedule; you will see them frequently in the simulation results.
  • #21 So we have to consider the tradeoff between performance and energy consumption, and we propose two algorithms to do so: energy-aware duplication (EAD) and performance-energy balanced duplication (PEBD). In EAD, we only calculate the energy cost of duplicating a task. For example, if we duplicate T1, we pay a 48J energy cost on the CPU side because T1 executes twice, but we save 6J on the network side because we no longer need to send the message from T1 to T2, so the total cost is 42J. In PEBD, we also calculate the performance benefit: duplicating T1 shortens the schedule length by at most 6s, so the ratio between energy and performance is 42/6 = 7. If we set the duplication threshold to 10, EAD will not duplicate, while PEBD will.
  • #22 Now let's look at how to implement the algorithms using a concrete example. In Step 1, we generate the DAG based on the task description, which is provided by users.
  • #23 Next, we calculate the important parameters based on Equations 14-19 shown in Chapter 4. The level means…
  • #24 Once we have these parameters, we obtain the original task list by sorting the levels in ascending order. We start from the first unscheduled task in the list, which is 10, and follow the favorite predecessors back to the entry task. All tasks on this path form a critical path; here the first critical path is 10 -> 9 -> 7 -> 3 -> 1. These tasks are then marked as scheduled. In the next iteration, the algorithm picks the next unscheduled task as the start task and forms the second critical path, then the third and the fourth. The algorithm does not terminate until all tasks have been scheduled.
  • #25 The algorithms also have to make the duplication decision. Explain…
  • #26 This diagram summarizes the steps we just talked about. I will just skip it.
  • #27 Now we are going to discuss the simulation results. We implemented our own simulator in C under Linux. The CPU power consumption parameters come from xbitlabs. We simulate four different CPUs: three AMD and one Intel.
  • #28 This slide shows the structure of two small task sets. The left one is Fast Fourier Transform and the right one is Gaussian Elimination.
  • #29 The slide shows the DAG structure of two real-world applications. The left one is Robot Control and the right one is Sparse Matrix Solver.
  • #30 This slide shows the impact of CPU types. Recall that I simulate four different CPUs, represented in four different colors. We found that the blue CPU saves more energy compared with the other three: for example, we can save 19.4% energy using the blue CPU but only 3.7% for the purple CPU. The reason is that these four CPUs have different gaps between CPU_busy and CPU_idle power; this table summarizes the difference. The gap for the blue CPU is 89W, but the gap for the purple CPU is only 18W. So our observation is…
  • #31 This slide shows the impact of interconnects. The left figure shows the simulation results for Myrinet and the right one for Infiniband. Using Myrinet, we can save 16.7% and 13.3% energy when CCR is 0.1 and 0.5, respectively; the numbers drop to 5% and 3.1% for Infiniband. The only difference between these two simulation sets is the network power consumption: 33.6W for Myrinet and 65W for Infiniband. So our observation is that…
  • #32 We also observe the impact of application parallelism. The left figure shows the experimental results for Robot Control and the right one for Sparse Matrix Solver. We can save 17% and 15.8% energy for Robot Control but only 6.9% and 5.4% for Sparse Matrix Solver at the same CCR, because Robot Control has less parallelism than Sparse Matrix Solver. So our observation is…
  • #33 This slide shows our observations on the impact of CCR. Read...
  • #34 This group of simulation results shows the impact on performance. The left figure is for Gaussian Elimination and the right one for Sparse Matrix Solver. The table summarizes the overall performance degradation of EAD and PEBD compared with TDS: 5.7% and 2.2% for Gaussian Elimination, and 2.92% and 2.02% for Sparse Matrix Solver. Our observation is…
  • #35 For example, we designed a mapping matrix to represent the execution times of tasks on different processors. As you can see, for the same task T1 the execution times are 6.7, 3.9, and 2.0, respectively. If a task cannot be executed on a processor, we put an infinity sign.
  • #38 We compared our HEADUS algorithm with four other algorithms and found that HEADUS obtains the best overall energy savings in all four environments.
  • #39 We also observed that HEADUS can save more energy under environments 2 and 4.