Increased processing power of MapReduce clusters generally enhances performance and availability at the cost of substantial energy consumption, which often incurs higher operational costs (e.g., electricity bills) and negative environmental impacts (e.g., carbon dioxide emissions). The few greening methods for computing clusters in the literature focus mainly on computational energy consumption, leaving out cooling energy, which accounts for a significant portion of the total energy consumed by such clusters. To this end, in this paper, we propose a machine learning-based approach named Green MapReduce Cluster (GMC) that reduces the total energy consumption of a MapReduce cluster by considering both computational energy and cooling energy. GMC predicts the number of machines that results in minimum total energy consumption. We perform the prediction by applying different machine learning techniques over year-long data collected from a real setup. We evaluate the performance of GMC over a real testbed. Our evaluation reveals that GMC reduces total energy consumption by up to 47% compared to other alternatives while experiencing marginal throughput degradation in a few cases.
GMC: Greening MapReduce Clusters Considering both Computation Energy and Cooling Energy
1. GMC: Greening MapReduce Clusters Considering both Computation Energy and Cooling Energy
Tarik Reza Toha1, Mohammad M. R. Lunar2, A S M Rizvi3,
Novia Nurain4, and A. B. M. Alim Al Islam5
1,4,5Bangladesh University of Engineering and Technology, Bangladesh
2University of Nebraska-Lincoln, USA
3University of Southern California, USA
ICC 2018
Kansas City, MO, USA
2. Outline
• Background and motivation
• Related work
• Proposed methodology
– Considering both computation and cooling energy
consumption
• Performance evaluation
– Test-bed implementation
• Conclusion
4. Architectures of Parallel Computing
• Cluster machines are connected by a local area network
• Clouds and grids are geographically distributed
• Clusters are tightly coupled, whereas clouds and grids are loosely coupled
[Figures: cluster topology; grid computing; cloud computing; a large Linux cluster at a University of Technology in Germany; a home-made cluster]
5. MapReduce Clusters
The MapReduce programming model is increasingly used to parallelize and distribute jobs across a cluster
Advantages
• Performance
• Availability
Disadvantages
• High energy consumption
  – High operational costs (e.g., electricity bills)
  – Negative environmental impacts (e.g., CO2 emissions)
It is of utmost importance to invest our efforts in energy efficiency in addition to performance improvement of MapReduce clusters
6. Energy Consumption Breakdown
A breakdown of energy consumption by different components of a data center
Dayarathna, Miyuru, et al., 2016
Cooling energy occupies a significant portion of the total energy consumption of parallel and distributed systems, including clusters
7. Existing Greening Approaches
• Dynamic Energy-Aware Capacity Provisioning for Cloud Computing Environments
  – Zhang et al., ICAC, 2012
  – A homogeneous cloud solution
    • Provides the optimum number of machines
  – Trades off energy efficiency (considering computational power) against waiting time as performance
    • Considers the cost of turning servers on and off and fluctuations in energy prices
Cooling power is not considered!
8. Existing Greening Approaches (contd.)
• Power Management in Heterogeneous MapReduce Clusters
  – Sunuwar et al., MSc Thesis, UNL, 2016
  – A heterogeneous MapReduce cluster solution
    • Addresses data unavailability of a MapReduce cluster due to dynamic capacity provisioning
  – Trades off energy efficiency (considering computational power) against throughput as performance
    • Restricts CPU utilization of slave nodes
Cooling power is not considered!
9. Existing Greening Approaches (contd.)
• Thermal Aware Server Provisioning and Workload Distribution for Internet Data Centers
  – Abbasi et al., HPDC, 2010
  – A homogeneous solution for Internet data centers
  – Trades off energy efficiency (considering both computational and cooling power) against response time as performance
    • Uses a thermodynamic model of the data center
    • Selects active servers based on least recirculated heat
Considers neither indoor nor outdoor weather!
Cooling energy minimization for spatially distributed machines in a cloud does not use indoor and outdoor weather data, as such data are difficult to collect
Indoor and outdoor weather conditions play a significant role in cooling energy minimization of a co-located cluster environment
10. Our Contribution
We propose a machine learning based approach,
GMC, which predicts the number of machines for
minimum total energy consumption of a
MapReduce cluster including both computational
energy and cooling energy while considering
weather conditions with minimal overhead
Green MapReduce Cluster (GMC)
11. Our Proposed GMC Framework
Block diagram of operations in our proposed framework
21. Energy Consumption Comparison
In all cases, GMC achieves the best trade-off in total energy per bit (on average, 29% over Zhang et al. and 55% over Sunuwar et al.)
In all cases, GMC reduces total energy (on average, 19% over Zhang et al. and 47% over Sunuwar et al.)
22. Conclusion
• MapReduce clusters consume a substantial amount of energy, including both computation and cooling energy
  – Little effort has been spent on minimizing energy consumption considering both energy components
• We provide a greening scheme that simultaneously considers both computational and cooling power consumption with minimal overhead
  – Outperforms existing greening methods while maintaining performance similar to the static method
• Future work
  – Simulate GMC in a large heterogeneous cluster
  – Use context-dependent classification in cooling power prediction (e.g., Kalman filter)
  – Use many-objective optimization techniques to optimize all performance terms
Hello everyone! Welcome to my presentation. I am going to present the paper titled “GMC: Greening MapReduce Clusters Considering both Computation Energy and Cooling Energy”. The authors are Tarik Reza Toha, Mohammad M. R. Lunar, A S M Rizvi, Novia Nurain, and A. B. M. Alim Al Islam. The second author is from the University of Nebraska-Lincoln, the third author is from the University of Southern California, and the other authors are from the Dept. of CSE, BUET. This work has been funded by the ICT Division of Bangladesh. Let’s see the outline of my presentation.
At first, I will talk about the background and motivation behind our work. Then, I will talk about our proposed methodology followed by performance evaluation. Finally, I will conclude with some future work. Let’s start.
Parallel computing is widely used for extensive computation: consider galaxy formation simulation, climate change prediction, weather forecasting, or even traffic simulation on busy roads. Modeling, simulating, and experimenting with such complex real-world phenomena demand rigorous computing. Hence, we need parallel computing. Let’s see the implementations of parallel computing.
Parallel computing can be implemented in three architectures: cluster, cloud, and grid. Cluster machines are connected by a local area network, whereas clouds and grids are geographically distributed. Hence, clusters are tightly coupled, while clouds and grids are loosely coupled. Here are some snapshots of a large cluster and a home-made cluster. Let’s see how a cluster works.
To parallelize and distribute jobs across a cluster, a programming model named MapReduce is being used increasingly. MapReduce clusters increase performance and availability at the expense of high energy consumption. Hence, operational costs, i.e., electricity bills, are increasing, and CO2 emissions are also increasing, which has an adverse effect on nature. Therefore, energy efficiency matters in addition to performance improvement of MapReduce clusters. Let’s see the energy breakdown of a parallel computing architecture.
Here is the breakdown of energy consumption by different components of a data center. We can see that 50% of the energy is required to cool the system. Note that a data center is an implementation of a cloud or grid that is constructed using a cluster of machines. Now, we will review the existing power-saving approaches.
Zhang et al. propose a homogeneous cloud solution that trades off computational power against waiting time. They consider the cost of turning machines on and off and fluctuations in energy prices. However, they do not consider cooling power. This is a cloud solution; let’s see a cluster one.
Sunuwar et al. propose a heterogeneous MapReduce cluster solution that trades off computing power against throughput. They restrict the CPU usage of slave nodes and address data unavailability of the MapReduce cluster by keeping all machines powered on. However, they do not consider cooling power. Let’s see another solution that does consider cooling power.
Abbasi et al. propose a homogeneous solution for Internet data centers that trades off total power (including cooling power) against response time. They use a thermodynamic model of the data center. However, they do not consider external weather data. Note that, although cooling energy minimization for spatially distributed machines in a cloud needs no external weather data, such data are essential for the minimization task in a co-located cluster environment. Therefore, a novel method is needed for cooling energy minimization in computing clusters.
We propose a machine learning-based approach named Green MapReduce Cluster that predicts the number of machines for minimum total energy consumption of a MapReduce cluster, including both computational energy and cooling energy, while considering weather conditions with minimal overhead. Let’s see the block diagram of our proposed method.
GMC has three modules: a sensing module, a learning module, and a capacity provisioning module. The sensing module senses room temperature, environment temperature, and the power consumption of both computing and cooling machines. It also collects the starting and ending times of a specific job. Then it feeds the data to the learning module as a training data set.
The learning module uses the sensed data and predicts response time, computational power, and cooling power. Then it generates a distribution of total predicted energy over all numbers of machines and returns the number of machines that results in minimum predicted total energy. The capacity provisioning module adjusts the cluster size by turning machines on or off to reach the optimum number of machines.
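The decision made by the learning and capacity provisioning modules can be sketched as a simple argmin over candidate cluster sizes. The toy time and power models below are hypothetical stand-ins for GMC's learned predictors; only the selection structure mirrors the framework.

```python
# Sketch of GMC's provisioning decision: predict total energy for every
# candidate cluster size, then pick the size with the minimum.

def choose_cluster_size(max_machines, predict_time_s, predict_power_w):
    """Return the machine count whose predicted total energy is minimum."""
    def total_energy_j(n):
        # Predicted energy = predicted response time x predicted total power
        return predict_time_s(n) * predict_power_w(n)
    return min(range(1, max_machines + 1), key=total_energy_j)

# Hypothetical models: response time falls roughly as 1/n; power has a base
# term, a per-machine computational term, and a superlinear cooling term.
time_model = lambda n: 3600.0 / n
power_model = lambda n: 50.0 + 40.0 * n + 0.5 * n ** 2

best = choose_cluster_size(30, time_model, power_model)
```

With these toy models the predicted energy curve is U-shaped, so the minimum lands strictly inside the 1..30 range rather than at either extreme, which is the situation where capacity provisioning actually pays off.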
Let’s see the configuration of our experimental setup.
We have a cluster of 30 machines, where we use Apache Hadoop to implement MapReduce. We have 29 Core 2 Duo slave nodes and one Core i5 master node. We use Ubuntu 14.04 32-bit and Hadoop 1.0.3 on all machines. We have three split-type 2-ton ACs. Let’s see some snapshots of the test-bed implementation.
Here are some snapshots of our test-bed. We use a DHT temperature and humidity sensor, and a web API to collect outdoor weather data. We use an Arduino energy monitor to collect the power consumption data of CPUs and ACs. Let’s see the accuracy of our prediction tasks.
We can see that the indoor temperature stays fairly close to the expected temperature. Let’s see the response time prediction.
We can see that response time decreases exponentially with the increase in the number of machines. It also increases with the data size. Using these training data, a 1-nearest-neighbor model can predict the response time of an incoming job with 87% accuracy. We use Auto-WEKA to select the learning model along with its parameters. Next, we will see the computational power prediction.
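The 1-nearest-neighbor step above can be sketched without any ML library. The feature layout and training values below are hypothetical; the 87% accuracy comes from the model Auto-WEKA selected over the real measurements.

```python
# Dependency-free 1-nearest-neighbor sketch of the response-time predictor.

def knn1_predict(train, query):
    """train: list of ((n_machines, data_size_gb), response_time_s) pairs."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, time_s = min(train, key=lambda rec: sq_dist(rec[0], query))
    return time_s

# Hypothetical history: more machines -> shorter time; more data -> longer time.
history = [((5, 1), 700.0), ((10, 1), 360.0), ((20, 1), 190.0), ((10, 2), 720.0)]
predicted = knn1_predict(history, (9, 1))   # nearest neighbor is (10, 1)
```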
We can see that computational power increases with the number of machines and the data size. In the test phase, Auto-WEKA indicates that a support vector machine for regression performs best. It predicts the CPU power with 98.6% accuracy. Now, we will see the cooling power prediction.
Cooling power does not follow any trend; rather, it depends on the environmental weather. As cooling power is related to the environment, it is highly non-linear and depends on various factors that we do not consider. Nevertheless, we can see that in August the cooling power is much higher than in February, since the temperature in August is higher. In the testing phase, we use additive regression with a random forest classifier, as suggested by Auto-WEKA, and get 67.76% accuracy. We believe that the accuracy will increase with more training data. Let’s see the environmental conditions during training data collection.
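Because cooling power tracks weather rather than cluster size, a sketch of this predictor keys on outdoor temperature. The nearest-historical-reading scheme and all numbers below are hypothetical simplifications; the paper's actual model is additive regression with a random-forest base learner.

```python
# Predict cooling power from outdoor temperature via the nearest historical
# observation, capturing the August-vs-February effect described above.

def predict_cooling_power(history, outdoor_temp_c):
    """history: list of (outdoor_temp_c, cooling_power_w) observations."""
    _, watts = min(history, key=lambda rec: abs(rec[0] - outdoor_temp_c))
    return watts

history = [
    (33.0, 5200.0), (35.0, 5600.0),   # August-like readings: ACs work harder
    (18.0, 2100.0), (21.0, 2500.0),   # February-like readings
]
hot = predict_cooling_power(history, 34.5)    # nearest reading: 35.0 C
cold = predict_cooling_power(history, 19.0)   # nearest reading: 18.0 C
```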
By multiplying the response time by the total power, we get the total energy. We can see that the accuracy of the total energy prediction is 97%. Let’s see the working process of GMC.
The working process is very simple. After a job arrives, GMC predicts the optimum number of machines using historical data. Then it adjusts the cluster size and allows the job to finish. After the job finishes, it collects all the sensor data and stores them in a database. Now, we will see the experimental evaluation.
We compare our method with two state-of-the-art green methods and a naïve (static) method. We can see that GMC reduces total energy in all cases: on average, by 19% over Zhang et al. and 47% over Sunuwar et al. Comparing in terms of total energy per bit, GMC achieves the best trade-off in all cases: on average, 29% over Zhang et al. and 55% over Sunuwar et al. Now, we will see the performance comparison among these methods.
We can see that GMC sacrifices a little performance compared to the static method, the best performance provider, in terms of response time and throughput. On average, GMC degrades response time by 4.51% and throughput by 5.71% compared to the static method. In terms of throughput, GMC performs well compared to the other green methods, which show a 42.8% degradation over the static method. Therefore, GMC provides the maximum energy efficiency in exchange for minimal performance degradation among all the alternatives. Now, it is time to conclude.
MapReduce clusters consume a huge amount of energy, including both computation and cooling energy. However, few studies consider cooling energy in energy-efficiency efforts. We provide a greening method that considers both energy components with minimal overhead and outperforms all other alternatives.
We plan to simulate GMC in a large heterogeneous cluster and use modern classifiers for cooling power prediction. We also want to use many-objective optimization techniques to optimize all performance terms. That’s all. Thank you.