In a heterogeneous computing cluster, the cluster objectives conflict with one another. Selecting the right combination of machines is necessary to enhance cluster performance and to optimize all the cluster objectives. In this paper, we perform empirical performance analyses of a real cluster using data collected over a year, formulate a new many-objective optimization problem for clusters, and integrate a greedy approach with the existing NSGA-III algorithm to solve this problem. Our experimental results show that our approach performs better than existing optimization approaches.
Many-Objective Performance Enhancement in Computing Clusters
1, 2, 3, 5 Department of CSE, Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh
1 Department of Computer Science, University of Southern California, USA
4 Department of CSE, University of South Florida, USA
Motivation
• Modeling, simulation, and experimentation of complex real-world phenomena demand rigorous computing.
• Parallel computing is required to meet this demand.
• People often use clusters for such computing [5].
A.S.M Rizvi1, Tarik Reza Toha2, Siddhartha Shankar Das3, Sriram Chellappan4 and A. B. M. Alim Al Islam5
• Clusters have conflicting outcomes.
• For example, if we increase the number of machines,
we get two conflicting outcomes:
Decrease in computation time
Increase in maintenance cost
• Particle Swarm Optimization (PSO) based approach [1]
• Optimization technique based on Ant Colony Optimization (ACO) [2]
• Stochastic optimization approach [3]
• Multi-objective optimization for virtual machine based schemes in cloud [4]
Our Contributions
• We exploit a synergy between a greedy method and the NSGA-III algorithm to solve a many-objective optimization problem for clusters.
Incorporating cooling energy consumption
Utilizing empirical characterization of clusters
• We perform laboratory experiments to demonstrate the efficacy of our proposed solution.
Email: asmrizvi@usc.edu, 1205082.trt@ugrad.cse.buet.ac.bd, siddhartha047@cse.buet.ac.bd, sriramc@usf.edu, alim_razi@cse.buet.ac.bd
Formulation of Our Many-Objective Optimization Problem for Computing Clusters
NSGA-III: Modified Selection Process
References
[1] C. Lijun and L. Xiyin. Modeling server load balance in cloud clusters based on multi-objective particle swarm optimization. IJGDC, 8(3):87–96, 2015.
[2] Y. Gao, H. Guan, Z. Qi, Y. Hou, and L. Liu. A multi-objective ant colony system algorithm for virtual machine placement in cloud computing. Journal of Computer and System Sciences, 79(8):1230–1242, 2013.
[3] K.M. Tarplee, A.A. Maciejewski, and H.J. Siegel. Robust performance based resource provisioning using a steady-state model for multi-objective stochastic programming. IEEE Transactions on Cloud Computing, 2016.
[4] R. Li, Q. Zheng, X. Li, and J. Wu. A novel multi-objective optimization scheme for rebalancing virtual machine placement. In 9th IEEE CLOUD, pages 710–717, 2016.
[5] B. Barney. Introduction to Parallel Computing. https://computing.llnl.gov/tutorials/parallel_comp/, 2017.
Acknowledgement: This research work has been funded by the ICT Division, Government of the People's Republic of Bangladesh.
IEEE IPCCC, 2017
San Diego, California, USA
(a) Galaxy formation (b) Planetary movements (c) Climate changes
• Hence, an optimization is required to select:
Right number of machines in the cluster,
Right combination of machines in the cluster.
Issues That Are Yet to Be Handled
• Consideration of cooling energy consumption: around 39% of the energy consumed in a US data center is cooling energy.
• Empirical performance characterization of clusters
Should result in a new optimization model
Fig. 1: Examples of experimentation where parallel computing
is necessary
Fig. 2: Energy consumption in a US data center
Existing approaches do not account for the impact of cooling energy consumption
Existing approaches do not integrate any empirical performance characterization of clusters
Fig. 3: Computation time decreases with an
increase in the # of machines
Fig. 4: Total energy decreases with an increase
in the # of machines
Following the empirical analysis, we formulate our objective functions as follows:
• Computation time
• Energy consumption
• Cost
• Inverse of resource utilization
subject to the following constraints:
• Restriction on assigned workload
• Constraint on # of selected machines
• Limit of cooling temperature
Simulation Environment
• When the number of machines is small, computation time and energy consumption become high.
• Hence, the number of selected machines should be
greater than a particular threshold.
• We select this threshold as $N_M / 6$, where $N_M$ is the number of cluster machines.
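The lower bound on the number of selected machines can be checked without deciding how to round the threshold. The following is a minimal sketch, assuming the machine-selection part of a candidate solution is a list of 0/1 flags; the function name `enough_machines` and the `divisor` parameter are illustrative and do not come from the poster.

```python
def enough_machines(selection, divisor=6):
    """Return True if the number of selected machines exceeds N_M / divisor.

    selection: list of 0/1 flags, one per cluster machine (N_M = len(selection)).
    Comparing divisor * count against N_M avoids rounding N_M / divisor.
    """
    selected = sum(selection)
    return divisor * selected > len(selection)


# Example with a 30-machine cluster, where the threshold is 30 / 6 = 5.
five_on = [1] * 5 + [0] * 25
six_on = [1] * 6 + [0] * 24
print(enough_machines(five_on))  # False: 5 is not greater than 5
print(enough_machines(six_on))   # True: 6 is greater than 5
```

The strict inequality follows the statement that the number of selected machines should be greater than the threshold.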
Fig. 5: Computation time is very high when
the # of machines is very small
Fig. 6: Total energy consumption is very high
when the # of machines is very small
NSGA-III: Modified Crossover
• Chromosome: machine selection decision variables (binary type) and a cluster temperature decision variable (float type).
• Half Uniform Crossover (HUX) between Parent 1 and Parent 2: variables highlighted in yellow in the diagram cross over only with other yellow variables.
• Greedy clustering approach: for each of processor speed, memory, and network B/W, machines are divided into a best group and a worst group. Try to take machines from the best group; try not to take machines from the worst group.
• After crossover, greedy approach to include the best machine and exclude the worst machine: for each of processor speed, memory, and network B/W, based on a probability, include the best machine and exclude the worst machine.
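The crossover and the greedy step can be illustrated on the binary machine-selection variables. This is a minimal sketch under stated assumptions, not the authors' implementation: `hux` is the textbook Half Uniform Crossover (swap exactly half of the differing bits), `greedy_repair` handles one attribute at a time, and the function names, the `prob` parameter, and the speed values are made up for illustration. The float temperature variable is left out.

```python
import random


def hux(parent1, parent2, rng):
    """Half Uniform Crossover: swap exactly half of the differing bits."""
    child1, child2 = list(parent1), list(parent2)
    differing = [i for i, (a, b) in enumerate(zip(parent1, parent2)) if a != b]
    rng.shuffle(differing)
    for i in differing[: len(differing) // 2]:
        child1[i], child2[i] = child2[i], child1[i]
    return child1, child2


def greedy_repair(child, scores, prob, rng):
    """With probability `prob`, include the best machine and exclude the worst.

    scores: one number per machine for a single attribute (higher is better),
    for example processor speed. This scoring convention is an assumption.
    """
    repaired = list(child)
    if rng.random() < prob:
        best = max(range(len(scores)), key=scores.__getitem__)
        worst = min(range(len(scores)), key=scores.__getitem__)
        if best != worst:
            repaired[best] = 1   # force the best machine into the selection
            repaired[worst] = 0  # force the worst machine out of the selection
    return repaired


rng = random.Random(0)
p1 = [1, 0, 1, 0, 1, 0]
p2 = [0, 1, 1, 0, 0, 1]
c1, c2 = hux(p1, p2, rng)

speed = [2.4, 3.1, 1.8, 2.9, 3.6, 2.0]  # hypothetical per-machine scores
c1 = greedy_repair(c1, speed, prob=1.0, rng=rng)
print(c1, c2)
```

With `prob=1.0` the repair always fires, so the machine at index 4 (highest score) ends up selected and the machine at index 2 (lowest score) ends up excluded. Running the repair once per attribute would cover processor speed, memory, and network B/W in turn.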
Solution Filtering
Fig. 7: Pareto front showing 15 solutions of
our minimization problem
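For readers unfamiliar with the term, a Pareto front of a minimization problem contains the solutions that no other solution beats on every objective. The sketch below shows the standard dominance test; the function names and the four-objective vectors are hypothetical and are not data from the poster.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical objective vectors (all objectives are minimized).
candidates = [
    (1.0, 9.0, 2.0, 3.0),
    (2.0, 8.0, 1.0, 4.0),
    (3.0, 9.5, 3.0, 5.0),  # dominated by the first vector
]
print(pareto_front(candidates))
```

The first two vectors trade off against each other, so both stay on the front; the third is worse than the first on every objective and is dropped.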
• Some solutions on the Pareto front have the worst values for objectives 1, 3, and 4 (the highest values, since this is a minimization problem) while having the best value for objective 2.
• We avoid such solutions through filtering
using a weighted function.
• Weighted function to select one solution:
$F_{total} = W_{obj_1} \times V_{obj_1} + W_{obj_2} \times V_{obj_2} + \dots + W_{obj_N} \times V_{obj_N}$
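The weighted function can be applied to a Pareto front as sketched below. Several details are assumptions for illustration only: the objective values are taken to be normalized to a common scale, the solution with the lowest total is chosen because all objectives are minimized, and the weights and objective vectors are made up.

```python
def weighted_total(values, weights):
    """F_total: the sum of W_obj_i * V_obj_i over all objectives."""
    return sum(w * v for w, v in zip(weights, values))


def pick_solution(front, weights):
    """Select the solution with the lowest weighted total (assumed rule)."""
    return min(front, key=lambda values: weighted_total(values, weights))


# Hypothetical normalized objective vectors, all to be minimized.
front = [
    (0.9, 0.1, 0.8, 0.9),  # best on objective 2, worst on objectives 1, 3, and 4
    (0.4, 0.5, 0.3, 0.4),
    (0.2, 0.9, 0.5, 0.3),
]
weights = (0.25, 0.25, 0.25, 0.25)
print(pick_solution(front, weights))
```

With equal weights, the extreme first solution scores 0.675 and is passed over in favor of the balanced second solution, which scores 0.4.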
Parameter              Value
# of master machines   1
# of slave machines    29
PC power               Peak: 10-400 W, idle: 2.5-100 W, power off: 5 W
Network B/W            10 to 100 kbps
Total data size        67.7 GB, 50.4 GB, and 28.3 GB
SimGrid version        3.12
Simulation Results
Fig. 8: Comparing modified NSGA-III, PSO, and ACO in SimGrid with 30 machines
Workload   Time     Cooling energy   Computation energy
67.7 GB    21, 43   13, 10           10, 5
50.4 GB    36, 17   11, 8            10, 8
28.3 GB    43, 15   13, 5            10, 0
Table 2: % of improvement over PSO and ACO in SimGrid with 30 machines
Table 1: Simulation environment in SimGrid
Related Work
Conclusion
• We provide a solution that helps cluster administrators select the right number and the right combination of machines.
• Our experimentation includes:
Many-objective problem formulation
Developing a new solution approach exploiting NSGA-III and a greedy algorithm
Performance evaluation
Background