Energy efficient resource management for high-performance clusters

In the past decade, high-performance cluster computing platforms have been widely used to solve challenging and rigorous engineering tasks in industry and scientific applications. Due to extremely high energy costs, reducing energy consumption has become a major concern in designing economical and environmentally friendly cluster computing infrastructures for many high-performance applications. The primary focus of this talk is to illustrate how to improve the energy efficiency of clusters and storage systems without significantly degrading performance. We will first describe a general architecture for building energy-efficient cluster computing platforms. Then, we will outline several energy-efficient scheduling algorithms designed for high-performance clusters and large-scale storage systems. Experimental results using both synthetic and real-world applications show that energy dissipation in clusters can be reduced with a marginal degradation of system performance.


  • See also: defense_Ziliang.ppt
  • High-performance computing platforms have been widely deployed for intensive data processing and data storage. The impact of high-performance computing platforms can be found in almost every domain: financial services, scientific computing, bioinformatics, computational chemistry, and weather forecasting.
  • This slide shows a typical high-performance computing platform, built by Google in Oregon. There is no doubt that these supercomputing platforms have significantly changed our lives, and we all benefit from the great services they provide. However, these giant machines consume a huge amount of energy.
  • This figure comes from the Environmental Protection Agency's report to Congress last year. According to the report, the total power usage of servers and data centers in the United States was 61.4 billion kWh in 2006, more than double the amount used for the same purpose in 2000. Looking at the trend from 2000 to 2006, the energy consumed by servers and data centers rapidly increased from 28.2 billion kWh all the way up to 61.4 billion kWh.
  • Even worse, the EPA predicts that the power usage of servers and data centers will double again within 5 years if historical trends continue. Even if current efficiency trends hold, power usage will exceed 100 billion kWh in 2011. This is a huge amount of energy.
  • However, most previous research focused primarily on the performance, security, and reliability of high-performance computing platforms; energy consumption was largely ignored. The energy problem has now become so serious that I believe it is time to highlight energy-efficiency research for high-performance computing platforms.
  • Our architecture has four layers: the application layer, the middleware layer, the resource layer, and the network layer. Energy-aware techniques can be incorporated in each layer. For example, in the application layer, we can reduce unnecessary hardware accesses when writing code. In the middleware layer, we can schedule parallel tasks in more energy-efficient ways. In the resource and network layers, we can perform energy-aware resource management.
  • This slide shows some typical hardware in the resource and network layers, such as CPUs, main boards, storage disks, network adapters, switches, and routers.
  • One thing I would like to emphasize is that energy-oriented research should not sacrifice other important characteristics such as performance, reliability, or security. Although there are inevitably tradeoffs once we introduce energy-aware techniques, we do not want to see significant degradation in these other characteristics. In other words, we would like to make our research compatible with existing techniques. My research mainly focuses on the tradeoff between performance and energy.
  • Before we talk about the algorithms, let's look at cluster systems first. In a cluster, we have a master node and slave nodes. The master node is responsible for scheduling tasks and allocating them to slave nodes for parallel execution. All slave nodes are connected by high-speed interconnects and communicate with each other through message passing.
  • The parallel tasks running on clusters are represented using a Directed Acyclic Graph, or DAG for short. Usually, a DAG has one entry task and one or more exit tasks. The DAG shows the task numbers and the execution time of each task, as well as the dependences and communication times among tasks. A minimal sketch of such a representation is shown below.
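    To make this concrete, here is a minimal sketch in C of how such a DAG node might be stored. This is not the authors' simulator code; the type and field names (Task, exec_time, comm) are assumptions for illustration.

        #define MAX_TASKS 16

        /* One DAG node: its execution time plus communication costs to
         * successors. comm[j] > 0 means "this task sends a message to
         * task j that takes comm[j] seconds"; 0 means no edge. */
        typedef struct {
            int    id;                  /* task number, e.g. 1 for T1     */
            double exec_time;           /* execution time in seconds      */
            double comm[MAX_TASKS];     /* message time to each successor */
        } Task;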
  • Weakness 1: energy conservation in memory is not considered. Weakness 2: energy cannot be conserved even when the network interconnects are idle. To improve performance, we use a duplication strategy, and this slide shows why duplication helps. Here we have 4 tasks represented by the DAG on the left. If we use linear scheduling, all four tasks are allocated to 1 CPU and the execution time is 39s. However, we noticed that we can schedule task 2 to the 2nd CPU so that it does not need to wait for the completion of task 3. In that way, the total time is shortened to 32s. We also noticed that 6s are wasted on the 2nd CPU because task 2 has to wait for the message from task 1. If we duplicate task 1 on the 2nd CPU, we can further shorten the schedule length to 29s. Obviously, duplication can improve performance.
  • However, if we calculate the energy, we find that duplication may consume more energy. For example, if we set the power consumption of the CPU and network to 6W and 1W, the total energy consumption of duplication is 42J more than NDS and 50J more than the linear schedule. That is mainly because task 1 is executed twice. Here I would like to mention that I will use NDS (MCP) to represent the no-duplication schedule and TDS to represent the task-duplication schedule. You will see a lot of them in the simulation results. A sketch of this bookkeeping appears below.
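    These numbers can be reproduced with one line of arithmetic. The sketch below (the function name is mine) applies the energy model from the slide: CPUs draw power while executing tasks, and the interconnect draws power while messages are in flight.

        /* Energy (J) = CPU power (W) x total CPU busy time (s)
         *            + network power (W) x total message time (s). */
        double schedule_energy(double cpu_busy_s, double net_busy_s,
                               double cpu_power_w, double net_power_w)
        {
            return cpu_power_w * cpu_busy_s + net_power_w * net_busy_s;
        }

    For example, the linear schedule runs 39s of tasks on one CPU with no messages: schedule_energy(39, 0, 6, 1) = 234J. NDS runs the same 39s of work but sends two messages (6s + 2s): 234 + 8 = 242J. TDS executes T1 twice (47s of CPU time) and sends only the 2s message: 282 + 2 = 284J.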
  • So we have to consider the tradeoff between performance and energy consumption. We propose two algorithms for this tradeoff. One is called energy-aware duplication, or EAD for short. The other is called performance-energy balanced duplication, or PEBD for short. In EAD, we only calculate the energy cost of duplicating a task. For example, if we duplicate T1, we pay a 48J energy cost on the CPU side because we execute T1 twice. At the same time, we save 6J of energy on the network side because we no longer need to send the message from T1 to T2. So the total cost is 42J. In PEBD, we also calculate the performance benefit. If we duplicate T1, we can shorten the schedule length by at most 6s, so the ratio between energy and performance is 7. If we set the duplication threshold to 10, EAD will not duplicate while PEBD will (see the decision sketch below).
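    The two decision rules can be summarized in a few lines of C. This is a sketch under the talk's definitions; the function names are mine, and the caller is assumed to consider duplication only when it actually saves time.

        #include <stdbool.h>

        /* EAD: duplicate only if the net energy cost is within the threshold. */
        bool ead_duplicate(double extra_cpu_j, double saved_net_j,
                           double threshold)
        {
            return (extra_cpu_j - saved_net_j) <= threshold; /* 48-6=42 > 10: no */
        }

        /* PEBD: weigh the net energy cost against the time saved. */
        bool pebd_duplicate(double extra_cpu_j, double saved_net_j,
                            double time_saved_s, double threshold)
        {
            double ratio = (extra_cpu_j - saved_net_j) / time_saved_s;
            return ratio <= threshold;                       /* 42/6=7 <= 10: yes */
        }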
  • Now let's look at how to implement the algorithms using a concrete example. In step 1, we generate the DAG from the task description, which is provided by users.
  • Next, we calculate the important parameters based on equations 14-19 in Chapter 4. The level of a task is the total execution time from that task to the exit task.
  • Once we have these parameters, we can obtain the original task list by sorting the levels in ascending order. We start from the first unscheduled task in the list, which is 10, and follow the favorite predecessors back to the entry task. All tasks on this path form a critical path; here the first critical path is 10->9->7->3->1. These tasks are then marked as scheduled. In the next iteration, the algorithm picks the next unscheduled task as the start task and forms the second critical path, then the third one and the fourth one. The algorithm does not terminate until all tasks have been scheduled. (A sketch of the level computation follows.)
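    A sketch of the level computation that produces this ordering (plain recursion in C; the adjacency arrays are assumptions standing in for the DAG of step 1):

        #define NTASKS 10

        double exec_t[NTASKS];        /* exec_t[i]: execution time of T(i+1)  */
        double comm[NTASKS][NTASKS];  /* comm[i][j] > 0 iff T(i+1) -> T(j+1)  */

        /* level(t): total execution time along the longest path from task t
         * to the exit task. Sorting tasks by ascending level yields the
         * original task list {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}. */
        double level(int t)
        {
            double best = 0.0;
            for (int j = 0; j < NTASKS; j++) {
                if (comm[t][j] > 0) {          /* edge t -> j exists */
                    double lj = level(j);
                    if (lj > best) best = lj;
                }
            }
            return exec_t[t] + best;
        }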
  • The algorithms also have to make the duplication decision, following the cost and ratio tests described above.
  • This diagram summarizes the steps we just talked about, so I will just skip it.
  • Now we are going to discuss the simulation results. We implemented our own simulator in C under Linux. The CPU power consumption parameters come from xbitlabs. We simulated 4 different CPUs: 3 from AMD and one from Intel.
  • This slide shows the structure of two small task sets. The left one is Fast Fourier Transform and the right one is Gaussian Elimination.
  • This slide shows the DAG structure of two real-world applications. The left one is Robot Control and the right one is Sparse Matrix Solver.
  • This slide shows the impact of CPU types. Recall that we simulated 4 different CPUs, represented in 4 different colors. We found that the blue CPU saves more energy compared with the other 3 CPUs. For example, we can save 19.4% energy using the blue CPU but only 3.7% for the purple CPU. The reason is that these 4 CPUs have different gaps between CPU_busy and CPU_idle power, which the table summarizes: the gap for the blue CPU is 89W, but the gap for the purple CPU is only 18W. So our observation is that CPUs with a larger busy-idle power gap offer more room for energy savings (a back-of-the-envelope sketch follows).
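    The effect of the busy-idle gap can be seen with a one-line model. This is a back-of-the-envelope sketch, not the simulator's actual energy model:

        /* Shifting t seconds of a CPU from busy to idle saves
         * (P_busy - P_idle) * t joules, so the same scheduling decision
         * saves roughly 5x more energy on an 89W-gap CPU than on an
         * 18W-gap CPU. */
        double gap_savings_j(double p_busy_w, double p_idle_w,
                             double idle_shift_s)
        {
            return (p_busy_w - p_idle_w) * idle_shift_s;
        }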
  • This slide shows the impact of interconnects. The left chart shows the simulation results for Myrinet and the right one for Infiniband. Using Myrinet, we can save 16.7% and 13.3% energy when CCR is 0.1 and 0.5, respectively. However, the numbers drop to 5% and 3.1% for Infiniband. The only difference between these two simulation sets is the network power consumption rate: Myrinet is 33.6W and Infiniband is 65W. So our observation is that the interconnect's power consumption rate strongly affects the achievable energy savings.
  • We also observed the impact of application parallelism. The left figure shows the experimental results for Robot Control and the right one for Sparse Matrix Solver. We noticed that we can save 17% and 15.8% energy for Robot Control but only 6.9% and 5.4% for Sparse when CCR is the same. That is because the parallelism of Robot Control is lower than that of Sparse. So our observation is that applications with lower parallelism leave more room for energy savings.
  • This slide summarizes our observations on the impact of CCR.
  • This group of simulation results shows the impact on performance. The left chart is for Gaussian Elimination and the right one for Sparse Matrix Solver. The table summarizes that the overall performance degradation of EAD and PEBD is 5.7% and 2.2% compared with TDS for Gaussian. For Sparse, the numbers are 2.92% and 2.02%. Our observation is that the performance degradation introduced by EAD and PEBD is marginal.
  • For heterogeneous clusters, we designed a mapping matrix to represent the execution times of tasks on different processors. As you can see, for the same task T1, the execution times are 6.7, 3.9, and 2.0, respectively. If a task cannot be executed on a processor, we put an infinity sign (a C sketch follows).
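    In C, such a mapping matrix is naturally expressed with INFINITY from <math.h> marking infeasible placements. Only the first row below comes from the talk; the other values are made-up placeholders.

        #include <math.h>   /* INFINITY */

        /* exec_time[i][j]: execution time of task i on processor j;
         * INFINITY means the task cannot run on that processor. */
        double exec_time[3][3] = {
            { 6.7,      3.9,      2.0 },   /* T1, from the slide     */
            { INFINITY, 5.2,      4.1 },   /* T2, placeholder values */
            { 3.0,      INFINITY, 7.5 },   /* T3, placeholder values */
        };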
  • We compared our HEADUS algorithm with 4 other algorithms and found that HEADUS obtains the best overall energy savings in all 4 simulated environments.
  • We also observed that HEADUS can save more energy under environments 2 and 4.

Presentation Transcript

  • Energy Efficient Scheduling for High-Performance Clusters
    Ziliang Zong, Texas State University
    Adam Manzanares, Los Alamos National Lab
    Xiao Qin, Auburn University
  • Where is Auburn University?
    Ph.D.’04, U. of Nebraska-Lincoln
    04-07, New Mexico Tech
    07-now, Auburn University
  • Storage Systems Research Group at New Mexico Tech (2004-2007)
  • Storage Systems Research Group at Auburn (2008)
  • Storage Systems Research Group at Auburn (2009)
  • Storage Systems Research Group at Auburn (2011)
  • Investigators
    Ziliang Zong, Ph.D.
    Assistant Professor,
    Texas State University
    Adam Manzanares, Ph.D. Candidate
    Los Alamos National Lab
    Xiao Qin, Ph.D.
    Associate Professor
    Auburn University
  • Introduction – Applications
  • Introduction – Data Centers
  • Motivation – Electricity Usage
    EPA Report to Congress on Server and Data Center Energy Efficiency, 2007
  • Motivation – Energy Projections
    EPA Report to Congress on Server and Data Center Energy Efficiency, 2007
  • Motivation – Design Issues
  • Architecture – Multiple Layers
  • Energy Efficient Devices
  • Multiple Design Goals
  • Energy-Aware Scheduling for Clusters
  • Parallel Applications
  • Motivational Example
    [Gantt charts of three schedules for the 4-task DAG]
    Linear Schedule: Time 39s
    No Duplication Schedule (NDS): Time 32s
    Task Duplication Schedule (TDS): Time 29s
    An Example of Duplication
  • Motivational Example (cont.)
    [Gantt charts annotated with (time, energy) pairs; CPU_Energy = 6W, Network_Energy = 1W]
    Linear Schedule: Time 39s, Energy 234J
    No Duplication Schedule (MCP): Time 32s, Energy 242J
    Task Duplication Schedule (TDS): Time 29s, Energy 284J
    An Example of Duplication
  • Motivational Example (cont.)
    [Gantt charts for the duplication decision]
    The energy cost of duplicating T1: CPU side 48J, network side -6J, total 42J
    The performance benefit of duplicating T1: 6s
    Energy-performance tradeoff: 42/6 = 7
    If Threshold = 10, duplicate T1? EAD: No; PEBD: Yes
    EAD: Time 32s, Energy 242J
    PEBD: Time 29s, Energy 284J
  • Basic Steps of Energy-Aware Scheduling
    Algorithm Implementation:
    Step 1: DAG Generation
    Task Description:
    Task Set {T1, T2, …, T9, T10 }
    T1 is the entry task;
    T10 is the exit task;
    T2, T3 and T4 cannot start until T1 has finished;
    T5 and T6 cannot start until T2 has finished;
    T7 cannot start until both T3 and T4 have finished;
    T8 cannot start until both T5 and T6 have finished;
    T9 cannot start until both T6 and T7 have finished;
    T10 cannot start until both T8 and T9 have finished;
    (This dependence list maps directly onto the successor table sketched after this slide.)
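    A sketch in C of the successor table for this 10-task example (the array layout is mine; the task numbering follows the slide):

        /* succ[i] lists the tasks that cannot start until task i+1 has
         * finished, terminated by 0. */
        static const int succ[10][4] = {
            {2, 3, 4, 0},   /* T1 -> T2, T3, T4 */
            {5, 6, 0},      /* T2 -> T5, T6     */
            {7, 0},         /* T3 -> T7         */
            {7, 0},         /* T4 -> T7         */
            {8, 0},         /* T5 -> T8         */
            {8, 9, 0},      /* T6 -> T8, T9     */
            {9, 0},         /* T7 -> T9         */
            {10, 0},        /* T8 -> T10        */
            {10, 0},        /* T9 -> T10        */
            {0},            /* T10: exit task   */
        };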
  • Basic Steps of Energy-Aware Scheduling
    Algorithm Implementation:
    Level: total execution time from the current task to the exit task
    Earliest Start Time
    Earliest Completion Time
    Latest Allowable Start Time
    Latest Allowable Completion Time
    Favorite Predecessor
    Step 2: Parameters Calculation (typical definitions sketched below)
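    The slide references equations 14-19 of Chapter 4 without reproducing them. As a hedged sketch, the level and earliest start/completion times in duplication-based scheduling are typically defined along these lines, with e(v) the execution time of task v and c(u,v) the communication time on edge (u,v):

        \mathrm{level}(v) =
          \begin{cases}
            e(v), & v \text{ is the exit task}\\
            e(v) + \max_{w \in \mathrm{succ}(v)} \mathrm{level}(w), & \text{otherwise}
          \end{cases}

        \mathrm{EST}(v) = \max_{u \in \mathrm{pred}(v)} \bigl(\mathrm{ECT}(u) + c(u,v)\bigr),
        \qquad
        \mathrm{ECT}(v) = \mathrm{EST}(v) + e(v)

    (with c(u,v) = 0 when u and v run on the same processor, and EST of the entry task equal to 0). LAST and LACT are the symmetric latest-allowable counterparts.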
  • Basic Steps of Energy-Aware Scheduling
    Algorithm Implementation:
    Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
    Step 3: Scheduling
  • Basic Steps of Energy-Aware Scheduling
    Algorithm Implementation:
    Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
    Step 4: Duplication Decision
    Decision 1: Duplicate T1?
    Decision 2: Duplicate T2?
    Decision 3: Duplicate T1?
  • The EAD and PEBD Algorithms
    Both algorithms share the same skeleton:
    1. Generate the DAG of the given task set.
    2. Find all the critical paths in the DAG.
    3. Generate the scheduling queue based on level (ascending).
    4. Select the unscheduled task with the lowest level as the starting task.
    5. For each task on the same critical path as the starting task, if it is not already scheduled, allocate it to the same processor as the other tasks on that critical path.
    6. If duplicating a task would save time, decide as follows:
       EAD: calculate the energy increase; duplicate this task and select the next task on the same critical path if more_energy <= Threshold.
       PEBD: calculate the energy increase and time decrease; duplicate if Ratio = energy increase / time decrease <= Threshold.
    7. Repeat until the entry task is reached and all tasks have been scheduled.
  • Energy Dissipation in Processors
    http://www.xbitlabs.com
  • Parallel Scientific Applications
    Fast Fourier Transform
    Gaussian Elimination
  • Large-Scale Parallel Applications
    Robot Control
    Sparse Matrix Solver
    http://www.kasahara.elec.waseda.ac.jp/schedule/
  • Impact of CPU Power Dissipation
    Impact of CPU Types:
    [Bar charts: peak savings 19.4% vs. 3.7% across CPU types]
    Energy consumption for different processors (Gaussian, CCR=0.4)
    Energy consumption for different processors (FFT, CCR=0.4)
  • Impact of Interconnect Power Dissipation
    Impact of Interconnection Types:
    [Bar charts: savings 16.7% and 13.3% (Myrinet) vs. 5% and 3.1% (Infiniband)]
    Energy consumption (Robot Control, Myrinet)
    Energy consumption (Robot Control, Infiniband)
  • Parallelism Degrees
    Impact of Application Parallelism:
    [Bar charts: savings 17% and 15.8% (Robot Control) vs. 6.9% and 5.4% (Sparse Matrix Solver)]
    Energy consumption of Sparse Matrix Solver (Myrinet)
    Energy consumption of Robot Control (Myrinet)
  • Communication-Computation Ratio
    Impact of CCR:
    Energy consumption under different CCRs
    CCR: Communication-Computation Ratio (defined below)
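    For reference, CCR is the standard communication-to-computation ratio; one common form of the definition (a sketch, since the talk does not spell it out):

        \mathrm{CCR} = \frac{\text{average communication time per message}}{\text{average computation time per task}}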
  • Performance
    Impact on Schedule Length:
    Schedule length of Gaussian Elimination
    Schedule length of Sparse Matrix Solver
  • Heterogeneous Clusters - Motivational Example
  • Motivational Example (cont.)
    Energy calculation for tentative schedules C1, C2, C3, and C4
  • Experimental Settings
    Simulation Environments
  • Communication-Computation Ratio
    CCR sensitivity for Gaussian Elimination
  • Heterogeneity
    Computational nodes heterogeneity experiments
  • Conclusions
    • Architecture for high-performance computing platforms
    • Energy-Efficient Scheduling for Clusters
    • Energy-Efficient Scheduling for Heterogeneous Systems
    • How to measure energy consumption? Kill-A-Watt
  • Source Code Availability
    www.mcs.sdsmt.edu/~zzong/software/scheduling.html
  • Download the presentation slides: http://www.slideshare.net/xqin74
    Google: slideshare Xiao Qin
  • My webpage: http://www.eng.auburn.edu/~xqin
  • Download slides at SlideShare: http://www.slideshare.net/xqin74
  • Questions? http://www.eng.auburn.edu/~xqin