A Reinforcement Learning Approach for Dynamic
Selection of Virtual Machines in Cloud Data Centres
Martin Duggan
National University Of
Ireland, Galway
m.duggan1@nuigalway.ie
Kieran Flesk
National University Of
Ireland, Galway
k.flesk2@nuigalway.ie
Jim Duggan
National University Of
Ireland, Galway
jim.duggan@nuigalway.ie
Enda Howley
National University Of
Ireland, Galway
ehowley@nuigalway.ie
Enda Barrett
National University Of
Ireland, Galway
enda.barrett@nuigalway.ie
Abstract—In recent years Machine Learning techniques have
proven to reduce energy consumption when applied to cloud
computing systems. Reinforcement Learning provides a promis-
ing solution for the reduction of energy consumption, while
maintaining a high quality of service for customers. We present
a novel single agent Reinforcement Learning approach for the
selection of virtual machines, creating a new energy efficiency
practice for data centres. Our dynamic Reinforcement Learning
virtual machine selection policy learns to choose the optimal
virtual machine to migrate from an over-utilised host. Our
experiment results show that a learning agent has the ability
to reduce energy consumption and decrease the number of
migrations when compared to a state-of-the-art approach.
Keywords—Live Migration, Energy, Reinforcement Learning.
I. INTRODUCTION
Recent research in cloud computing has highlighted the
increasing environmental impact of data centres with regards
to electricity usage and associated CO2 emissions. According
to Koomey et al., between 2005 and 2010, data centre
power consumption increased by 56% and in 2010, 1.3% of all
power consumed worldwide was due to data centre operations
[14]. Studies by Barriso et al. and Koomey et al. highlight that
increases in energy consumption range in the billions of kWh
[4], [13], [14]. Furthermore a study by Petty et al. in 2007
highlighted that the Information Communication Technology
(ICT) industry contributed to 2% of global CO2 emission each
year, putting it on par with the aviation industry [15]. However,
a report by Brown in 2007 directly addresses this issue by
stating that existing technologies and strategies could reduce
typical server energy usage by an estimated 25% [8]. This
report highlights that if state-of-the-art energy efficiency practices
were implemented throughout U.S. data centres, energy consumption
could be reduced by an estimated 55% compared
to current levels. Cloud computing leverages infrastructure as
a service (IaaS), platform as a service (PaaS) and software as
a service (SaaS) to provide virtualised computing resources to
customers. A key aspect of the IaaS platform is to provide
computer infrastructure, for example virtualised hardware,
virtual server space, network connections and bandwidth. The
IaaS layer consists of multiple servers and networks, which can be
distributed across and divided among numerous data centres.
The virtualised structure of the IaaS platform allows for the
re-allocation of resources through migrating a virtual machine
(VM) or a group of VMs among hosts. Migration is the
process of moving VMs from one physical host to another
based on resource allocation or power saving techniques,
however migration has a considerable impact on energy con-
sumption in data centres. Migration is triggered once a host
transitions into an over-utilised or under-utilised state; a
virtual machine (VM) or VMs must then be relocated to another
host that has sufficient resources, to ensure quality of service
(QoS) is guaranteed for both customer and cloud provider.
Live VM migration is commonly used in cloud data centres
due to its capability of maintaining high system performance
under dynamic workloads. Live migration has been studied
extensively and has shown to be of benefit to cloud providers
[1]. Live migration of a single VM requires computing
resources and can increase energy consumption
in a data centre. A group of migrations, however, requires a
significant amount of resources and energy, which may lead to
violations of the Service Level Agreement (SLA),
the contract between cloud provider and cloud customer.
When a service level agreement violation (SLAV) occurs, it
causes increased overhead penalties for the cloud provider.
The longer a host stays in an over-utilised state the greater the
energy consumption. Thus selecting the optimal VM to migrate
from a host presents a complex and challenging problem.
In this paper, we focus on the selection of a VM to
migrate from an over-utilised host. We propose a dynamic
Reinforcement Learning (RL) VM selection policy, enabling a
single learning agent to decide on an optimal VM for migration
from an over-utilised host depending on the current energy
consumption. This agent-based VM selection policy directly
addresses the issues highlighted in [8] by implementing a new
state-of-the-art energy efficiency practice. Our algorithm is
shown to reduce energy consumption and decrease migration
of VMs, leading to a greener cloud data centre.
The contributions of this paper are:
1) We present a novel Reinforcement Learning VM selection
policy (Lr-RL) with the capability to decide on
an appropriate VM to migrate from an over-utilised
host. We create a novel state-action space, based
on the CPU utilisation percentage of the host and
of the VM to be migrated, to show how a Reinforcement
Learning algorithm can improve upon a state-of-the-art
approach in terms of energy consumption.
2) We show how an autonomous VM selection policy has
the ability to reduce energy consumption, creating
a more efficient cloud data centre.
The rest of this paper is structured as follows: Section II
discusses related work, Section III introduces Reinforcement
Learning, Section IV presents our Dynamic RL-VM selection
Policy, Section V describes the experimental setup, Section VI
evaluates and analyses our results, and Section VII concludes
the paper.
II. RELATED WORK
In recent years much research has been conducted in the
areas of energy efficiency, dynamic resource selection and
allocation policies for cloud infrastructure. These approaches
can be classified into two main categories: (1) Threshold and
Non-Threshold, (2) Machine Learning.
A. Threshold and Non-Threshold Based Approaches
An example of a non-threshold approach is that of Verma
et al., who implemented a power aware application placement
framework called pMapper [19]. This is designed to utilise
power management applications such as CPU idling, Dynamic
Voltage and Frequency Scaling (DVFS) and consolidation
techniques that already exist in hypervisors. These techniques
are leveraged via separate modules, mainly the performance
manager which has a global overview of the system and
receives information such as SLAs and QoS parameters. The
migration manager deals directly with the VMs to implement
live migration, the power manager communicates with the
infrastructure layer to manage hardware energy policies and
then the arbitrator decides, based on information supplied by the
above-mentioned policies, on the optimal placement of VMs
through a bin packing algorithm.
Threshold based approaches for autonomic scaling of re-
sources are commonplace, and are used by cloud providers
such as Amazon EC2 in their Auto Scaling software. Threshold
approaches are based on the premise of setting an upper and
lower bound threshold, that when broken trigger the allocation
or consolidation of resources as necessary. Research conducted
in the area of threshold based approaches includes a proposed
architecture known as 'the 1000 Islands solution architecture'
by Zhu et al. [22]. Similar to Verma et al., they consider
three separate application categories based on different time
periods, and then designate an individual controller to each
category. The largest timescale is hours to days, then minutes
and finally seconds. Each group is regarded as a pod and has
a node controller managing dynamic allocation of the node’s
resources. As part of the node controller, there is a utilisation
controller which computes resource consumption and estimates
what resources are required in order to meet SLAs in the
future.
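To make the threshold premise concrete, the following is a minimal sketch (in Python, purely illustrative and not any provider's implementation) of an upper/lower bound trigger: breaking the upper bound requests more resources, breaking the lower bound triggers consolidation. The 30%/80% bounds are assumed values.

```python
# Minimal sketch of a generic threshold-based scaling trigger: the bounds
# (lower=0.3, upper=0.8) are illustrative assumptions, not specific defaults.

def threshold_action(cpu_utilisation, lower=0.3, upper=0.8):
    if cpu_utilisation > upper:
        return "scale_out"      # upper bound broken: allocate additional resources
    if cpu_utilisation < lower:
        return "consolidate"    # lower bound broken: consolidate/release resources
    return "no_action"

print(threshold_action(0.92))  # -> scale_out
print(threshold_action(0.12))  # -> consolidate
```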
B. Reinforcement Learning Based Approaches
In recent years, Reinforcement Learning has proven to
be a promising approach for optimal allocation of cloud
resources. Barrett et al. proposed a parallel RL framework for
optimisation of scaling resources instead of the threshold based
approach [3]. Barrett's approach requires agents to approximate
optimal policies, and each agent shares its information with
a global agent to improve overall performance. This approach
has been empirically proven to outperform the traditional rigid
threshold-based approaches. Bahati proposed an RL approach
to help simplify the management of existing threshold based
rules, where a primary controller applies rules to a system to
enforce its quality attribute and a secondary controller monitors
the effects of implementing these rules and adapts thresholds
accordingly [2]. Tesauro introduces a hybrid RL approach to
optimising server allocations in data centres, through training
a nonlinear function approximator in batch mode on a
data set, while an externally trained policy makes management
decisions within a given system [18]. Both Farahnakian et al.
[12] and Yuan et al. [21] demonstrate how RL can be used to
optimise the number of active hosts in operation in a given time
frame. An RL agent learns an on-line host detection policy
and dynamically consolidates machines in line with optimal
parameters. Both studies implement the minimum migration
time selection policy proposed by Beloglazov et al. [7], after the
detection of over-utilised hosts, in order to identify VMs
for migration. Tan et al. use an RL agent to shut down, or
make idle, hosts that are at minimal power consumption [17].
Dutreilh et al. proposed an RL framework for autonomic resource
allocation in cloud domains [10]. They show how having good
learning policies in early phases using appropriate initialisation
and convergence, helps speed up learning in problems that
typically have a large convergence time.
This research is motivated by the fact that all of the above
RL approaches have been shown to hold a statistical advantage over
threshold-based approaches. We implement and evaluate RL at
a lower level of abstraction to learn policies for the selection of
VMs with the aim to reduce energy consumption and provide
a greener cloud data centre.
C. Virtual Machine Selection Policy
The study conducted by Beloglazov et al. in 2011 remains
one of the most highly cited and respected pieces of research
in relation to the consolidation of VMs while maximizing
performance and efficiency in cloud data centres [7]. Bel-
oglazov examines the dynamic consolidation of VMs while
considering multiple hosts and VMs in an IaaS environment.
Importantly, Beloglazov models SLAs as a key component in
a solution to VM consolidation, which is also a main feature of this
paper. Beloglazov's proposed algorithm can be broken
into three sections: (1) Over-Utilised/Under-Utilised detection,
(2) VM selection policy and (3) VM placement. In this paper
we are only interested in parts (1) and (2) of Beloglazov's
research. (1) Over-utilised detection: building on past research,
Beloglazov suggests an adaptive selection policy known as
Local Regression (LR) for determining when VMs require
migration from the host in order not to violate SLAs [5]. LR,
first proposed by Cleveland, allows for the analysis of a local
subset of data, in this case hosts [9]. Given an over-utilisation
threshold along with a safety parameter, LR decides
that a host is likely to become over-utilised if its current
CPU utilisation multiplied by the safety parameter is
larger than the maximum possible utilisation. (2) VM selection:
VMs v are placed on a migration list V_h based on the shortest
period of time to complete the migration. The migration time
is estimated as the utilised RAM divided by the spare bandwidth
of the host h; the policy selects a suitable VM v through the
following condition:

v ∈ V_h | ∀ a ∈ V_h : RAM_u(v) / NET_h ≤ RAM_u(a) / NET_h

where RAM_u(a) is the total RAM currently utilised by
VM a and NET_h is the spare network bandwidth available on
host h.
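For illustration, a minimal Python sketch of the Mmt criterion above; the dictionary-based VM records and the 1 Gbit/s spare bandwidth figure are illustrative assumptions, not CloudSim structures.

```python
# Minimal sketch of the Mmt (minimum migration time) selection criterion:
# the migration time of a VM is approximated as its utilised RAM divided by
# the host's spare network bandwidth, so with a common denominator the choice
# reduces to the VM with the least utilised RAM.

def select_vm_mmt(vms, spare_bandwidth):
    """Return the candidate VM whose migration would complete fastest."""
    return min(vms, key=lambda vm: vm["ram_used"] / spare_bandwidth)

# Example: three candidate VMs on an over-utilised host with 1 Gbit/s spare.
vms = [{"id": 1, "ram_used": 512},
       {"id": 2, "ram_used": 128},
       {"id": 3, "ram_used": 2048}]
print(select_vm_mmt(vms, spare_bandwidth=1000)["id"])  # -> 2
```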
We have chosen Lr-Mmt as the state-of-the-art approach
against which we will benchmark the performance of our
proposed RL algorithm.
III. REINFORCEMENT LEARNING
In Reinforcement Learning (RL), an agent learns
through a trial-and-error process, by interacting with its
environment and observing the resulting reward signal [16]. RL
problems are modelled as Markov Decision Processes (MDPs),
which provide a mathematical framework for modelling sequential
decision making under uncertainty. An MDP is a tuple
(S, A, T, R): the agent takes an action a ∈ A in state s ∈ S,
which moves it to a future state s' ∈ S.
The probability that executing a in s results in a transition to s'
is defined as:

P^a_{ss'} = Pr{ s_{t+1} = s' | s_t = s, a_t = a }

The agent receives a scalar reward r_t, which can be either
negative or positive. For any current state s_t and action a_t,
together with any next state s_{t+1}, the expected value of the
next reward is:

R^a_{ss'} = E{ r_{t+1} | s_t = s, a_t = a, s_{t+1} = s' }
The goal of solving an MDP is to find a policy that maximises
accumulated rewards. In specific cases where a complete
environmental model is known, that is, S, A, T and R are fully
observable, the problem is reduced to a planning problem
and can be solved using traditional dynamic programming
techniques such as value iteration. If there is no complete
model available then one must either attempt to approximate
the missing model (model-based reinforcement learning) or
directly estimate the value function or policy (model-free
Reinforcement Learning).
In the absence of a complete environmental model, model-free
Reinforcement Learning algorithms such as Q-learning,
which is used in this paper, can be used to generate optimal
policies [20]. Q-learning belongs to a collection of algorithms
known as Temporal Difference (TD) methods, which
estimate the state-action value Q(s_t, a_t). Not needing a full
model of the environment, TD has the capability to make
predictions incrementally by bootstrapping the current estimate
onto previous estimates. After every state-action-reward-state
transition experienced, the TD algorithm Q-learning calculates
an estimated value known as a Q-value:

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
α is the learning rate, which determines how quickly
an agent learns. An α close to 1 ensures the most recently
obtained information is utilised, while an α close to 0 means
little learning will take place. An agent's degree of myopia can
be controlled by setting the discount factor γ between 0 and 1. The
closer γ is to 1, the greater the weight placed on future
rewards, whereas values close to 0 consider only the most recent
rewards. max_a Q(s_{t+1}, a_{t+1}) returns the maximum estimate for
the future state-action pair. Once the Q-value is calculated it is
then stored in the agent's Q-matrix.
Actions are chosen based on the policy π that the agent is
following. To ensure that the agent discovers the optimal
policy π, a trade-off between exploration and exploitation
must exist. An agent that always exploits the best action is
said to be following a greedy selection policy; however, such
an implementation never explores, thus paying no regard to
possibly more lucrative alternative actions. In this
paper an ε-greedy policy is used, so that the agent can
explore the entirety of the environment, with the value
of ε controlling the exploration rate.
Figure 1 illustrates the interaction the RL agent has with the
environment. Pseudo-code 1 provides an outline of the Q-learning algorithm.

Fig. 1. RL Environment Interaction [16]
Q-learning Algorithm - Pseudo-code 1
  Initialize Q-matrix arbitrarily, policy π
  Repeat (while s_t is not terminal)
    Observe s_t
    Select a_t using π
    Execute a_t
    Observe s_{t+1}, r_t
    Calculate Q:
      Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
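For illustration, a compact tabular Q-learning loop in Python corresponding to Pseudo-code 1, with the ε-greedy policy described above; the environment interface (reset/step/actions/terminal) is a generic assumption, not the paper's CloudSim implementation.

```python
# Tabular Q-learning with an epsilon-greedy policy, mirroring Pseudo-code 1.
# The env object (reset, terminal, actions, step) is an assumed interface.
import random
from collections import defaultdict

def q_learning(env, episodes, alpha=0.8, gamma=0.8, epsilon=0.05):
    Q = defaultdict(float)                          # Q-matrix, all entries start at 0

    def choose_action(state, actions):
        if random.random() < epsilon:               # explore
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])  # exploit best known action

    for _ in range(episodes):
        state = env.reset()
        while not env.terminal(state):
            action = choose_action(state, env.actions(state))
            next_state, reward = env.step(action)
            next_actions = env.actions(next_state)
            best_next = max(Q[(next_state, a)] for a in next_actions) if next_actions else 0.0
            # TD update: bootstrap the current estimate onto the next one
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```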
IV. DYNAMIC RL-VM SELECTION POLICY
To the best of our knowledge this research is the first to
apply RL to the selection of individual VMs during migration
from an over-utilised host. Studies such as Beloglazov et al.
presents efficient VM selection techniques but however these
approaches cause high amounts of energy consumption [7].
Our aim is to investigate the effects an energy-aware learning
agent can have on live migration, which is a critical process
that consumes high amounts of energy in a data centre.
Our RL-VM selection policy (Lr-RL) learns to select a VM
to migrate from an over-utilised host. Each host contains a set
of VMs. Our Lr-RL algorithm determines the optimal VM
to migrate from a set of VMs. However, to which host that
VM will be migrated is out of the scope of this research
and will be considered in future work. To clarify, our Lr-RL
approach uses Beloglazov's Local Regression (Lr) technique
to determine if a host is over-utilised (Section II-C). Our RL
agent is implemented in place of their Mmt algorithm. By
giving the agent observability over key state variables such
as energy consumption, the agent gains an advantage over
Lr-Mmt (Section II), the algorithm against which we will
evaluate our approach.
A. State-Action Space
We created a novel percentile state-action space to incor-
porate the RL algorithm into an IaaS environment. The state
space s_t is defined as the current host utilisation (CPU usage),
denoted as h_u, returned as a percentage. Therefore, this allows
the state space to be defined as s_t ∈ S = {0 - 100}, and it is
obtained through the following equation:

s_t = Σ_{v=1}^{n} vm_u(v)   (1)

where v is the current VM selected, vm_u is the function
that calculates the current VM's utilisation of the host's CPU
and n is the number of VMs that can be migrated. For example,
each host contains a set of VMs; for each VM, the utilisation
percentage of the host's resources is calculated and these values
are then summed to determine the overall utilisation of that host,
in the form of a percentage.
The action space is represented as vm_u (defined above)
relative to the utilisation of its assigned host h, returned as a percentage:

a_t = (vm_u(v) / h_u(h)) · 100   (2)

The action space is defined in terms of a percentage, as a_t ∈ A
= {0 - 100}. Each state-action pair is mapped to a Q-value in
the agent's Q-matrix.
B. Reward Function
Reinforcement Learning maximises rewards through a mapping
of states to actions. In order to achieve this, a recurrent
interaction at discrete time steps between the agent and environment
is necessary. The RL agent receives a representation
of the environment in the form of the current state s_t, which
allows an action a_t to be returned based on the policy the agent
is following. The reward function R(s_t, a_t) is determined by
the action a_t taken in the current state s_t of the host. At the next
time step the environment returns a new representation of the
current state s_{t+1} and a numerical reward r_t based on the
previous action a_t undertaken.
For example, a host has an over-utilisation (in terms of
CPU utilisation) of 90%; this is then the agent's s_t. The RL
agent selects an a_t, for example a VM that is utilising 10%
of the host CPU. The host then transitions into s_{t+1}, which in
this case would be 80%. The agent receives a reward based on
the energy usage of the host once the VM has been migrated.
The reward is defined as presented in [6]: the host's power
consumption is nearly proportional to its CPU utilisation, so
the power consumption can be described by Equation (3):

P(µ) = 0.7 · P_max + 0.3 · P_max · µ   (3)

where P_max denotes the host's power consumption at full load
and µ represents the PM's CPU utilisation, which
changes over time.
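For illustration, a small sketch of the Equation (3) power model and an energy-aware reward built from it; P_max = 250 W is an assumed figure, and taking the negative post-migration power as the reward is one plausible reading of the description above rather than the paper's exact reward signal.

```python
# Sketch of the linear power model of Equation (3) and a reward derived from it.

def host_power(utilisation, p_max=250.0):
    """Power draw (Watts): 70% of P_max is static, 30% scales with CPU utilisation."""
    return 0.7 * p_max + 0.3 * p_max * utilisation

def energy_aware_reward(utilisation_after_migration, p_max=250.0):
    """Lower post-migration power draw yields a higher (less negative) reward."""
    return -host_power(utilisation_after_migration, p_max)

print(host_power(0.9))   # 242.5 W at 90% load
print(host_power(0.8))   # 235.0 W after migrating a VM that used 10% of the host
```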
C. Q-Learning Implementation
The following details how the Lr-RL algorithm functions.
First, the pseudo-code of the dynamic RL-VM selection policy
is provided (Pseudo-code 2), followed by a detailed explanation
of how the algorithm operates.
Dynamic RL-VM Selection Policy - Pseudo-code 2
  host ← overUtilisedHost
  VMs ← migrateableVMs
  possibleActions ← vmSize (CPU utilisation percentage of each VM)
  Choose VM from possibleActions using π
  Migrate VM
  Observe future host utilisation, reward
  Calculate Q:
    Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
  Update Q-matrix
The Lr-RL algorithm is invoked when a host is determined to
be over-utilised through the Lr function. The host is placed on
a list of over-utilised hosts. A host is selected from the list of
over-utilised hosts and the RL agent calculates the host’s state,
in terms of percentage (see equation 1). All VMs are mapped
as possible actions based on the percentage of CPU utilisation
of their host (see Equation 2). The RL agent selects an action
(i.e. a VM) based on the RL selection policy, i.e. ε-greedy. The
agent performs the action, migrating the selected VM from the
current host to a suitable alternative host. The agent then observes
the new host utilisation level and an energy-aware reward is
received (defined in Section IV-B). The agent calculates the
Q-value for the state-action pair, which is then mapped to the
Q-matrix. If the host is deemed to still be over-utilised, the RL
agent selects the next optimal VM to migrate. The process is
repeated until the host is no longer over-utilised.
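Putting the pieces together, the following is a self-contained Python sketch of the selection loop in Pseudo-code 2, assuming a toy Host class; "migration" here simply removes the chosen VM from the host, since placement is out of scope, and the over-utilisation test uses the safety-parameter check described in Section V rather than the full Lr method. None of this is CloudSim API.

```python
# Sketch of the Lr-RL selection loop: while the host is over-utilised, pick a
# VM epsilon-greedily, "migrate" it, observe the new utilisation and an
# energy-aware reward (Eq. 3), and update the Q-matrix.
import random
from dataclasses import dataclass, field

@dataclass
class Host:
    cpu_capacity: float                               # illustrative capacity (e.g. MIPS)
    vm_cpu: list = field(default_factory=list)        # CPU used by each resident VM

    def utilisation_pct(self):                        # Eq. (1): host state as a percentage
        return 100.0 * sum(self.vm_cpu) / self.cpu_capacity

def reward(utilisation_pct, p_max=250.0):             # negative Eq. (3) power draw
    u = utilisation_pct / 100.0
    return -(0.7 * p_max + 0.3 * p_max * u)

def lr_rl_select_and_migrate(host, Q, threshold=100.0, safety=1.2,
                             alpha=0.8, gamma=0.8, epsilon=0.05):
    while host.vm_cpu and host.utilisation_pct() * safety > threshold:
        state = round(host.utilisation_pct())
        # Eq. (2): each candidate action is the VM's share of current host load
        actions = [round(100.0 * c / sum(host.vm_cpu)) for c in host.vm_cpu]
        if random.random() < epsilon:                  # explore
            idx = random.randrange(len(actions))
        else:                                          # exploit best known action
            idx = max(range(len(actions)), key=lambda i: Q.get((state, actions[i]), 0.0))
        action = actions[idx]
        host.vm_cpu.pop(idx)                           # "migrate": placement is out of scope
        next_state = round(host.utilisation_pct())
        r = reward(next_state)
        next_actions = ([round(100.0 * c / sum(host.vm_cpu)) for c in host.vm_cpu]
                        if host.vm_cpu else [])
        best_next = max((Q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (r + gamma * best_next - old)

Q = {}
lr_rl_select_and_migrate(Host(cpu_capacity=1000.0, vm_cpu=[400.0, 300.0, 200.0]), Q)
```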
V. EXPERIMENT SETUP
As the target system is an IaaS, it was essential to
conduct the experiments on a large scale virtualised data
centre infrastructure simulation tool, as a real world system
would be too complex. The Cloudsim framework allows for
the representation of a power-aware data centre with LAN-
migration capabilities. For the experiment conducted in this
paper to be considered fair, we have set the Cloudsim param-
eters according to Beloglazov et al. [7]. This comprises 800
physical servers, consisting of 400 HP ProLiant ML110 G5 and
400 HP ProLiant ML110 G4 servers, replicated within the
simulator. The 30-day workload used in this experiment comes
from a real-world IaaS environment. PlanetLab files within
the CloudSim framework contain data from the CoMon project,
representing the CPU utilisation of over 100 VMs from servers
located in 500 locations worldwide. In order to make the
experiments more realistic, a 30-day workload experiment was
created on a random basis from the PlanetLab files, each containing
288 values representative of CPU workloads. VMs are assigned
to these 30 day workloads on a random basis in order to best
represent the stochastic characteristics of workload allocation
and demands within an IaaS environment. CloudSim offers a
default ceiling threshold of 100% for each host, with a safety
parameter of 1.2. This safety parameter acts as an over-utilisation
buffer. For example, if the current utilisation of a host is 85%
and the safety parameter is set to 1.2, this gives the
host an effective utilisation level of 102% and it is considered to be
over-utilised.
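For illustration, a minimal sketch of the safety-parameter check just described; the function name is illustrative and the 1.2 value matches the example above.

```python
# A host is flagged as over-utilised when its current utilisation, scaled by
# the safety parameter, exceeds 100% of capacity.

def is_over_utilised(utilisation_pct, safety_parameter=1.2):
    return utilisation_pct * safety_parameter > 100.0

print(is_over_utilised(85.0))   # True:  85% * 1.2 = 102%
print(is_over_utilised(80.0))   # False: 80% * 1.2 = 96%
```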
To ensure an agent converges to an optimal policy the
learning parameters must be set. These values were selected by
conducting a parameter sweep to determine the best performance
of the agent's learning abilities. The parameters
for the experiment in this paper are set as follows: α = 0.8,
γ = 0.8 and ε = 0.05.
A. Experiment Metrics
The following metrics are used to evaluate the Lr-RL
algorithm against the Lr-Mmt heuristic.
1) Energy Consumption: The total energy consumed by the
data centre per day in relation to computational resources, i.e.
servers (a simple accumulation sketch is given after this list).
Although other energy draws exist, such as cooling
and infrastructural demands, this area was deemed outside the
scope of this research.
2) Migrations of VMs: The total migrations of all VMs
on all servers, performed by the data centre over a 30 day
workload.
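As referenced in metric (1), a small Python sketch of how per-day energy can be accumulated from power samples taken at 5-minute intervals (the measurement interval mentioned in Section VI); the function name and the constant 200 W sample stream are illustrative assumptions, not part of the paper's CloudSim setup.

```python
# Accumulate daily energy (kWh) from per-host power samples at a fixed interval.

def daily_energy_kwh(power_samples_watts, interval_minutes=5):
    hours_per_sample = interval_minutes / 60.0
    return sum(p * hours_per_sample for p in power_samples_watts) / 1000.0

# 288 five-minute samples cover one day; a constant 200 W host uses 4.8 kWh/day.
print(daily_energy_kwh([200.0] * 288))
```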
VI. PRELIMINARY RESULTS
This section compares our Lr-RL algorithm against the
benchmark set by the Lr-Mmt technique. Both algorithms were
subject to the 30 day workload experiment, repeated 100 times.
For every iteration of the experiment, the RL agent’s Q-matrix
was initialised to 0.
Figure 2 presents the energy consumption results for
both algorithms. The results show that over the 30-day workload
period, Lr-RL consumed a total of 3,948.35 kWh compared
to 4,623.75 kWh for Lr-Mmt. The standard deviation for Lr-RL
was ±28.79 kWh in comparison with ±33.12 kWh for
Lr-Mmt. A paired t-test shows that there is a statistically
significant difference in the consumption of energy when
utilising Lr-RL and Lr-Mmt, resulting in a p-value <0.0067
with a 95% confidence interval (-6.474, -38.5533).
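For reference, such a paired t-test could be reproduced with SciPy as sketched below; the per-day kWh arrays shown are placeholders, not the paper's data.

```python
# Paired t-test on matched per-day energy figures for the two policies.
import numpy as np
from scipy import stats

lr_rl_daily_kwh = np.array([131.6, 127.3, 129.8])   # per-day energy, Lr-RL (illustrative)
lr_mmt_daily_kwh = np.array([154.1, 138.6, 151.2])  # per-day energy, Lr-Mmt (illustrative)

t_stat, p_value = stats.ttest_rel(lr_rl_daily_kwh, lr_mmt_daily_kwh)
print(t_stat, p_value)
```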
The migration results for both algorithms are shown in
Figure 3, highlighting that Lr-RL had a considerably lower
number of migrations than Lr-Mmt. Lr-RL selects VMs based
on the future energy usage of a host. This resulted in fewer
migrations from a single host for Lr-RL, while the host
maintains sufficient processing capability. Over the 30-day
workload period, Lr-RL had a total of 525,769 migrations
compared to Lr-Mmt with 797,496 migrations. The standard
deviation of migrations for Lr-RL was ±4,443.25 compared
to ±5,898.23 for Lr-Mmt. A paired t-test confirms that there
is a statistically significant difference between Lr-RL and Lr-Mmt,
with a p-value <0.0001 with a 95% confidence interval
(-6,358.85, -11,756.41).
Examining a single day workload in more detail further
highlights the improvements that Lr-RL could potentially
contribute to real world data centres. The correlation between
the decreased number of migrations and the energy reduction
for the Lr-RL algorithm is shown in Figure 4, measured at the
industry standard of 5 minute intervals. For day 1 of the 30 day
workload, Lr-Mmt had a total energy consumption of 138.55
kWh and 23,211 total migrations, while Lr-RL had an energy
consumption of 127.31 kWh and 19,437.4 total migrations.
Lr-RL saves on average 11.24 kWh and performs 3,773.6 fewer
migrations on the first day of this workload.

Fig. 2. Energy Consumption (daily energy consumption in kWh over the 30-day workload, Lr-RL vs. Lr-Mmt)

Fig. 3. Migrations of VMs (daily number of VM migrations over the 30-day workload, Lr-RL vs. Lr-Mmt)
On average, for a single-day workload, Lr-RL saves 22.51
kWh of energy, with 9,058 fewer migrations than Lr-Mmt. Lr-Mmt
requires nearly 12 VMs to be moved from a host, whereas
on average Lr-RL never requires more than 2 VMs to be
migrated. One reason for this is that Lr-Mmt chooses the VM
with the shortest migration time, accounting for only 3.06%
of overall host utilisation, whereas Lr-RL on average selects
a VM that accounts for 12.87% of overall host utilisation, thus
enabling a faster process whereby a host is no longer considered
to be in an over-utilised state.
Considering the energy-saving aspect of our results: on
average Lr-RL saved 22.51 kWh per day, which results in an
estimated saving of 8,577.5 kWh per year. According to
calculations by the EPA, this is equivalent to a reduction of 5.9
metric tons of CO2 emissions due to electricity generation and
potentially protecting 4.6 acres of forest land from destruction
[11].
The results highlight the adaptive nature of RL. An agent
with the capability to learn and adapt to changing workloads
results in a reduction of energy consumption in a cloud
domain. However, RL also has drawbacks: for an agent to learn,
a number of training episodes must be conducted, potentially
leading to a substantial amount of time for the agent to
converge on an optimal policy. Although the Lr-Mmt approach
consumes high amounts of energy and incurs high overhead
costs for cloud providers, the size of the VM (in terms of
RAM) to be migrated will not have a considerable impact on
compute resources such as bandwidth in a data centre.

Fig. 4. Energy & Migration Correlation for Workload Day 1
VII. CONCLUSION
This research presented a dynamic RL VM selection policy
(Lr-RL), where an agent learned to select a VM to migrate
from an over-utilised host. Based on an energy-aware-reward
function the agent reduces energy consumption and migrations.
To address the aims of the paper proposed in Section I:
1) Our Lr-RL approach improves upon the best-known
algorithm, Lr-Mmt, in terms of energy consumption.
Our agent-based approach learns to select the optimal
VM to be migrated. We created a novel percentile
state-action space, represented by the host CPU utilisation
as a percentage and the VM's usage of the host's
CPU, also as a percentage. The experiment results
demonstrated that Reinforcement Learning can be
implemented at a low level of abstraction for use
in an IaaS environment.
2) The energy-aware reward function provided energy
performance feedback to the agent when selecting an
appropriate VM for migration. From our EPA calculations,
our RL VM selection policy has the capability
to create a cognitive live migration framework that
has the potential to decrease CO2 emissions from a
cloud data centre.
Our research so far shows the potential benefits of an
agent-based approach when applied to energy consumption
problems in a cloud simulation domain. The SLAV model was
out of the scope of this research, as we wanted to highlight
the advancement RL could achieve in energy consumption. In
future work we plan to model the SLAV performance of both
algorithms. This work will enable an agent to observe both
SLAV and energy in order to decide on the optimal VM to migrate,
while improving the performance of a cloud data centre.
REFERENCES
[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski,
G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al. A view of cloud
computing. Communications of the ACM, 53(4):50–58, 2010.
[2] R. M. Bahati and M. A. Bauer. Towards adaptive policy-based
management. In Network Operations and Management Symposium
(NOMS), 2010 IEEE, pages 511–518. IEEE, 2010.
[3] E. Barrett, E. Howley, and J. Duggan. Applying reinforcement learning
towards automating resource allocation and application scalability in
the cloud. Concurrency and Computation: Practice and Experience,
25(12):1656–1674, 2013.
[4] L. A. Barroso and U. Hölzle. The case for energy-proportional
computing. Computer, (12):33–37, 2007.
[5] A. Beloglazov, J. Abawajy, and R. Buyya. Energy-aware resource
allocation heuristics for efficient management of data centers for cloud
computing. Future generation computer systems, 28(5):755–768, 2012.
[6] A. Beloglazov and R. Buyya. Adaptive threshold-based approach for
energy-efficient consolidation of virtual machines in cloud data centers.
2010.
[7] A. Beloglazov and R. Buyya. Optimal online deterministic algorithms
and adaptive heuristics for energy and performance efficient dynamic
consolidation of virtual machines in cloud data centers. Concurrency
and Computation: Practice and Experience, 24(13):1397–1420, 2012.
[8] R. Brown. Report to congress on server and data center energy effi-
ciency: Public law 109-431. Lawrence Berkeley National Laboratory,
2008.
[9] W. S. Cleveland. Robust locally weighted regression and smooth-
ing scatterplots. Journal of the American statistical association,
74(368):829–836, 1979.
[10] X. Dutreilh, S. Kirgizov, O. Melekhova, J. Malenfant, N. Rivierre, and
I. Truck. Using reinforcement learning for autonomic resource alloca-
tion in clouds: Towards a fully automated workflow. In ICAS 2011,
The Seventh International Conference on Autonomic and Autonomous
Systems, pages 67–74, 2011.
[11] EPA.gov. Calculations and references — Clean Energy — US EPA.
[12] F. Farahnakian, P. Liljeberg, and J. Plosila. Energy-efficient virtual ma-
chines consolidation in cloud data centers using reinforcement learning.
In Parallel, Distributed and Network-Based Processing (PDP), 2014
22nd Euromicro International Conference on, pages 500–507. IEEE,
2014.
[13] J. Koomey. Growth in data center electricity use 2005 to 2010. A report
by Analytical Press, completed at the request of The New York Times,
page 9, 2011.
[14] J. G. Koomey et al. Estimating total power consumption by servers in
the us and the world, 2007.
[15] C. Pettey. Gartner estimates ict industry accounts for 2 percent of global
co2 emissions. Dostupno na, 14:2013, 2007.
[16] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction,
volume 1. MIT press Cambridge, 1998.
[17] Y. Tan, W. Liu, and Q. Qiu. Adaptive power management using
reinforcement learning. In Proceedings of the 2009 International
Conference on Computer-Aided Design, pages 461–467. ACM, 2009.
[18] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. A hybrid
reinforcement learning approach to autonomic resource allocation. In
Autonomic Computing, 2006. ICAC’06. IEEE International Conference
on, pages 65–73. IEEE, 2006.
[19] A. Verma, P. Ahuja, and A. Neogi. pmapper: power and migration
cost aware application placement in virtualized systems. In Middleware
2008, pages 243–264. Springer, 2008.
[20] C. J. Watkins and P. Dayan. Q-learning. Machine learning, 8(3-4):279–
292, 1992.
[21] J. Yuan, X. Miao, L. Li, and X. Jiang. An online energy saving
resource optimization methodology for data center. Journal of Software,
8(8):1875–1880, 2013.
[22] X. Zhu, D. Young, B. J. Watson, Z. Wang, J. Rolia, S. Singhal,
B. McKee, C. Hyser, D. Gmach, R. Gardner, et al. 1000 islands:
Integrated capacity and workload management for the next generation
data center. In Autonomic Computing, 2008. ICAC’08. International
Conference on, pages 172–181. IEEE, 2008.
View publication statsView publication stats

More Related Content

What's hot

Energy-aware Load Balancing and Application Scaling for the Cloud Ecosystem
Energy-aware Load Balancing and Application Scaling for the Cloud EcosystemEnergy-aware Load Balancing and Application Scaling for the Cloud Ecosystem
Energy-aware Load Balancing and Application Scaling for the Cloud Ecosystem1crore projects
 
FDMC: Framework for Decision Making in Cloud for EfficientResource Management
FDMC: Framework for Decision Making in Cloud for EfficientResource Management FDMC: Framework for Decision Making in Cloud for EfficientResource Management
FDMC: Framework for Decision Making in Cloud for EfficientResource Management IJECEIAES
 
Ieeepro techno solutions 2014 ieee java project - deadline based resource p...
Ieeepro techno solutions   2014 ieee java project - deadline based resource p...Ieeepro techno solutions   2014 ieee java project - deadline based resource p...
Ieeepro techno solutions 2014 ieee java project - deadline based resource p...hemanthbbc
 
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHMIMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHMAssociate Professor in VSB Coimbatore
 
IRJET- An Efficient Energy Consumption Minimizing Based on Genetic and Power ...
IRJET- An Efficient Energy Consumption Minimizing Based on Genetic and Power ...IRJET- An Efficient Energy Consumption Minimizing Based on Genetic and Power ...
IRJET- An Efficient Energy Consumption Minimizing Based on Genetic and Power ...IRJET Journal
 
An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...
An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...
An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...IJECEIAES
 
Hybrid Task Scheduling Approach using Gravitational and ACO Search Algorithm
Hybrid Task Scheduling Approach using Gravitational and ACO Search AlgorithmHybrid Task Scheduling Approach using Gravitational and ACO Search Algorithm
Hybrid Task Scheduling Approach using Gravitational and ACO Search AlgorithmIRJET Journal
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Migration Control in Cloud Computing to Reduce the SLA Violation
Migration Control in Cloud Computing to Reduce the SLA ViolationMigration Control in Cloud Computing to Reduce the SLA Violation
Migration Control in Cloud Computing to Reduce the SLA Violationrahulmonikasharma
 
EVALUATION OF CONGESTION CONTROL METHODS FOR JOINT MULTIPLE RESOURCE ALLOCATI...
EVALUATION OF CONGESTION CONTROL METHODS FOR JOINT MULTIPLE RESOURCE ALLOCATI...EVALUATION OF CONGESTION CONTROL METHODS FOR JOINT MULTIPLE RESOURCE ALLOCATI...
EVALUATION OF CONGESTION CONTROL METHODS FOR JOINT MULTIPLE RESOURCE ALLOCATI...IJCNCJournal
 
Failure aware resource provisioning for hybrid cloud infrastructure
Failure aware resource provisioning for hybrid cloud infrastructureFailure aware resource provisioning for hybrid cloud infrastructure
Failure aware resource provisioning for hybrid cloud infrastructureFreddie Zhang
 
Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters IJECEIAES
 
Differentiating Algorithms of Cloud Task Scheduling Based on various Parameters
Differentiating Algorithms of Cloud Task Scheduling Based on various ParametersDifferentiating Algorithms of Cloud Task Scheduling Based on various Parameters
Differentiating Algorithms of Cloud Task Scheduling Based on various Parametersiosrjce
 
Ieeepro techno solutions ieee java project - budget-driven scheduling algor...
Ieeepro techno solutions   ieee java project - budget-driven scheduling algor...Ieeepro techno solutions   ieee java project - budget-driven scheduling algor...
Ieeepro techno solutions ieee java project - budget-driven scheduling algor...hemanthbbc
 

What's hot (18)

Energy-aware Load Balancing and Application Scaling for the Cloud Ecosystem
Energy-aware Load Balancing and Application Scaling for the Cloud EcosystemEnergy-aware Load Balancing and Application Scaling for the Cloud Ecosystem
Energy-aware Load Balancing and Application Scaling for the Cloud Ecosystem
 
FDMC: Framework for Decision Making in Cloud for EfficientResource Management
FDMC: Framework for Decision Making in Cloud for EfficientResource Management FDMC: Framework for Decision Making in Cloud for EfficientResource Management
FDMC: Framework for Decision Making in Cloud for EfficientResource Management
 
Ieeepro techno solutions 2014 ieee java project - deadline based resource p...
Ieeepro techno solutions   2014 ieee java project - deadline based resource p...Ieeepro techno solutions   2014 ieee java project - deadline based resource p...
Ieeepro techno solutions 2014 ieee java project - deadline based resource p...
 
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHMIMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
IMPROVEMENT OF ENERGY EFFICIENCY IN CLOUD COMPUTING BY LOAD BALANCING ALGORITHM
 
1732 1737
1732 17371732 1737
1732 1737
 
IRJET- An Efficient Energy Consumption Minimizing Based on Genetic and Power ...
IRJET- An Efficient Energy Consumption Minimizing Based on Genetic and Power ...IRJET- An Efficient Energy Consumption Minimizing Based on Genetic and Power ...
IRJET- An Efficient Energy Consumption Minimizing Based on Genetic and Power ...
 
An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...
An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...
An Efficient Cloud Scheduling Algorithm for the Conservation of Energy throug...
 
Hybrid Task Scheduling Approach using Gravitational and ACO Search Algorithm
Hybrid Task Scheduling Approach using Gravitational and ACO Search AlgorithmHybrid Task Scheduling Approach using Gravitational and ACO Search Algorithm
Hybrid Task Scheduling Approach using Gravitational and ACO Search Algorithm
 
[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
 
B03410609
B03410609B03410609
B03410609
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
20120140504025
2012014050402520120140504025
20120140504025
 
Migration Control in Cloud Computing to Reduce the SLA Violation
Migration Control in Cloud Computing to Reduce the SLA ViolationMigration Control in Cloud Computing to Reduce the SLA Violation
Migration Control in Cloud Computing to Reduce the SLA Violation
 
EVALUATION OF CONGESTION CONTROL METHODS FOR JOINT MULTIPLE RESOURCE ALLOCATI...
EVALUATION OF CONGESTION CONTROL METHODS FOR JOINT MULTIPLE RESOURCE ALLOCATI...EVALUATION OF CONGESTION CONTROL METHODS FOR JOINT MULTIPLE RESOURCE ALLOCATI...
EVALUATION OF CONGESTION CONTROL METHODS FOR JOINT MULTIPLE RESOURCE ALLOCATI...
 
Failure aware resource provisioning for hybrid cloud infrastructure
Failure aware resource provisioning for hybrid cloud infrastructureFailure aware resource provisioning for hybrid cloud infrastructure
Failure aware resource provisioning for hybrid cloud infrastructure
 
Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters
 
Differentiating Algorithms of Cloud Task Scheduling Based on various Parameters
Differentiating Algorithms of Cloud Task Scheduling Based on various ParametersDifferentiating Algorithms of Cloud Task Scheduling Based on various Parameters
Differentiating Algorithms of Cloud Task Scheduling Based on various Parameters
 
Ieeepro techno solutions ieee java project - budget-driven scheduling algor...
Ieeepro techno solutions   ieee java project - budget-driven scheduling algor...Ieeepro techno solutions   ieee java project - budget-driven scheduling algor...
Ieeepro techno solutions ieee java project - budget-driven scheduling algor...
 

Viewers also liked

Viewers also liked (16)

Firm Profile JSRA - 2017
Firm Profile JSRA - 2017Firm Profile JSRA - 2017
Firm Profile JSRA - 2017
 
Youthpass - Ana - signed
Youthpass - Ana - signedYouthpass - Ana - signed
Youthpass - Ana - signed
 
Recorrido literario por la figura de juan ramón
Recorrido literario por la figura de juan ramónRecorrido literario por la figura de juan ramón
Recorrido literario por la figura de juan ramón
 
Risk solutions company
Risk solutions companyRisk solutions company
Risk solutions company
 
Cultura
CulturaCultura
Cultura
 
Creación de una agencia de diseño de modas
Creación de una agencia de diseño de modasCreación de una agencia de diseño de modas
Creación de una agencia de diseño de modas
 
Sou de jesus
Sou de jesusSou de jesus
Sou de jesus
 
คำศัพท์ประกอบหน่วยที่1
คำศัพท์ประกอบหน่วยที่1คำศัพท์ประกอบหน่วยที่1
คำศัพท์ประกอบหน่วยที่1
 
Work12
Work12Work12
Work12
 
Penal general tarea_2
Penal general tarea_2Penal general tarea_2
Penal general tarea_2
 
Garry Salter CV
Garry Salter CVGarry Salter CV
Garry Salter CV
 
Planeacion de actividades
Planeacion de actividadesPlaneacion de actividades
Planeacion de actividades
 
Lição 05 Adulto 1 trimestre 2017
Lição 05 Adulto 1 trimestre 2017 Lição 05 Adulto 1 trimestre 2017
Lição 05 Adulto 1 trimestre 2017
 
Actividad 2 tic power point
Actividad 2 tic power pointActividad 2 tic power point
Actividad 2 tic power point
 
Curriculum Vitae
Curriculum VitaeCurriculum Vitae
Curriculum Vitae
 
Se isto não for amor
Se isto não for amorSe isto não for amor
Se isto não for amor
 

Similar to INTECHDublinConference-247-camera-ready

Fuzzy Based Algorithm for Cloud Resource Management and Task Scheduling
Fuzzy Based Algorithm for Cloud Resource Management and Task SchedulingFuzzy Based Algorithm for Cloud Resource Management and Task Scheduling
Fuzzy Based Algorithm for Cloud Resource Management and Task Schedulingijtsrd
 
Hybrid Based Resource Provisioning in Cloud
Hybrid Based Resource Provisioning in CloudHybrid Based Resource Provisioning in Cloud
Hybrid Based Resource Provisioning in CloudEditor IJCATR
 
A Study on Energy Efficient Server Consolidation Heuristics for Virtualized C...
A Study on Energy Efficient Server Consolidation Heuristics for Virtualized C...A Study on Energy Efficient Server Consolidation Heuristics for Virtualized C...
A Study on Energy Efficient Server Consolidation Heuristics for Virtualized C...Susheel Thakur
 
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTINGA SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTINGijccsa
 
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTINGA SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTINGijccsa
 
A Survey on Resource Allocation in Cloud Computing
A Survey on Resource Allocation in Cloud ComputingA Survey on Resource Allocation in Cloud Computing
A Survey on Resource Allocation in Cloud Computingneirew J
 
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...IRJET Journal
 
Performance analysis of an energy efficient virtual machine consolidation alg...
Performance analysis of an energy efficient virtual machine consolidation alg...Performance analysis of an energy efficient virtual machine consolidation alg...
Performance analysis of an energy efficient virtual machine consolidation alg...IAEME Publication
 
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGGROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGAIRCC Publishing Corporation
 
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGGROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGijcsit
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeYogeshIJTSRD
 
Load Balancing in Cloud Computing Through Virtual Machine Placement
Load Balancing in Cloud Computing Through Virtual Machine PlacementLoad Balancing in Cloud Computing Through Virtual Machine Placement
Load Balancing in Cloud Computing Through Virtual Machine PlacementIRJET Journal
 
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...neirew J
 
Conference Paper: Simulating High Availability Scenarios in Cloud Data Center...
Conference Paper: Simulating High Availability Scenarios in Cloud Data Center...Conference Paper: Simulating High Availability Scenarios in Cloud Data Center...
Conference Paper: Simulating High Availability Scenarios in Cloud Data Center...Ericsson
 
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...Eswar Publications
 
Cloud Computing: A Perspective on Next Basic Utility in IT World
Cloud Computing: A Perspective on Next Basic Utility in IT World Cloud Computing: A Perspective on Next Basic Utility in IT World
Cloud Computing: A Perspective on Next Basic Utility in IT World IRJET Journal
 
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...IRJET Journal
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 

Similar to INTECHDublinConference-247-camera-ready (20)

Fuzzy Based Algorithm for Cloud Resource Management and Task Scheduling
Fuzzy Based Algorithm for Cloud Resource Management and Task SchedulingFuzzy Based Algorithm for Cloud Resource Management and Task Scheduling
Fuzzy Based Algorithm for Cloud Resource Management and Task Scheduling
 
Hybrid Based Resource Provisioning in Cloud
Hybrid Based Resource Provisioning in CloudHybrid Based Resource Provisioning in Cloud
Hybrid Based Resource Provisioning in Cloud
 
A Study on Energy Efficient Server Consolidation Heuristics for Virtualized C...
A Study on Energy Efficient Server Consolidation Heuristics for Virtualized C...A Study on Energy Efficient Server Consolidation Heuristics for Virtualized C...
A Study on Energy Efficient Server Consolidation Heuristics for Virtualized C...
 
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTINGA SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
 
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTINGA SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
A SURVEY ON RESOURCE ALLOCATION IN CLOUD COMPUTING
 
A Survey on Resource Allocation in Cloud Computing
A Survey on Resource Allocation in Cloud ComputingA Survey on Resource Allocation in Cloud Computing
A Survey on Resource Allocation in Cloud Computing
 
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
 
Performance analysis of an energy efficient virtual machine consolidation alg...
Performance analysis of an energy efficient virtual machine consolidation alg...Performance analysis of an energy efficient virtual machine consolidation alg...
Performance analysis of an energy efficient virtual machine consolidation alg...
 
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGGROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
 
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGGROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System Uptime
 
Load Balancing in Cloud Computing Through Virtual Machine Placement
Load Balancing in Cloud Computing Through Virtual Machine PlacementLoad Balancing in Cloud Computing Through Virtual Machine Placement
Load Balancing in Cloud Computing Through Virtual Machine Placement
 
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
 
Conference Paper: Simulating High Availability Scenarios in Cloud Data Center...
Conference Paper: Simulating High Availability Scenarios in Cloud Data Center...Conference Paper: Simulating High Availability Scenarios in Cloud Data Center...
Conference Paper: Simulating High Availability Scenarios in Cloud Data Center...
 
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
Load Balancing in Cloud Computing Environment: A Comparative Study of Service...
 
Cloud Computing: A Perspective on Next Basic Utility in IT World
Cloud Computing: A Perspective on Next Basic Utility in IT World Cloud Computing: A Perspective on Next Basic Utility in IT World
Cloud Computing: A Perspective on Next Basic Utility in IT World
 
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
F1034047
F1034047F1034047
F1034047
 
ENERGY EFFICIENCY IN CLOUD COMPUTING
ENERGY EFFICIENCY IN CLOUD COMPUTINGENERGY EFFICIENCY IN CLOUD COMPUTING
ENERGY EFFICIENCY IN CLOUD COMPUTING
 

INTECHDublinConference-247-camera-ready

  • 1. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/306056598 A Reinforcement Learning Approach for Dynamic Selection of Virtual Machines in Cloud Data Centres Conference Paper · August 2016 CITATIONS 0 READS 111 5 authors, including: Kieran Flesk National University of Ireland, Galway 1 PUBLICATION 0 CITATIONS SEE PROFILE Jim Duggan National University of Ireland, Galway 106 PUBLICATIONS 506 CITATIONS SEE PROFILE Enda Howley National University of Ireland, Galway 59 PUBLICATIONS 209 CITATIONS SEE PROFILE Enda Barrett 11 PUBLICATIONS 61 CITATIONS SEE PROFILE All content following this page was uploaded by Martin Duggan on 11 August 2016. The user has requested enhancement of the downloaded file. All in-text references underlined in blue are added to the original document and are linked to publications on ResearchGate, letting you access and read them immediately.
  • 2. A Reinforcement Learning Approach for Dynamic Selection of Virtual Machines in Cloud Data Centres Martin Duggan National University Of Ireland, Galway m.duggan1@nuigalway.ie Kieran Flesk National University Of Ireland, Galway k.flesk2@nuigalway.ie Jim Duggan National University Of Ireland, Galway jim.duggan@nuigalway.ie Enda Howley National University Of Ireland, Galway ehowley@nuigalway.ie Enda Barrett National University Of Ireland, Galway enda.barrett@nuigalway.ie Abstract—In recent years Machine Learning techniques have proven to reduce energy consumption when applied to cloud computing systems. Reinforcement Learning provides a promis- ing solution for the reduction of energy consumption, while maintaining a high quality of service for customers. We present a novel single agent Reinforcement Learning approach for the selection of virtual machines, creating a new energy efficiency practice for data centres. Our dynamic Reinforcement Learning virtual machine selection policy learns to choose the optimal virtual machine to migrate from an over-utilised host. Our experiment results show that a learning agent has the abilities to reduce energy consumption and decrease the number of migrations when compared to a state-of-the-art approach. Keywords—Live Migration, Energy, Reinforcement Learning. I. INTRODUCTION Recent research in cloud computing has highlighted the increasing environmental impact of data centres with regards to electricity usage and associated CO2 emissions. According to Koomey et al. between the years of 2005 to 2010, data centre power consumption increased by 56% and in 2010, 1.3% of all power consumed worldwide was due to data centre operations [14]. Studies by Barriso et al. and Koomey et al. highlight that increases in energy consumption range in the billions of kWh [4], [13], [14]. Furthermore a study by Petty et al. in 2007 highlighted that the Information Communication Technology (ICT) industry contributed to 2% of global CO2 emission each year, putting it on par with the aviation industry [15]. However, a report by Brown in 2007 directly addresses this issue by stating that existing technologies and strategies could reduce typical server energy usage by an estimated 25% [8]. This report highlights if state-of-the-art energy efficiency practices are implemented throughout the U.S. data centres, an estimated 55% of energy consumption could be reduced when compared to current levels. Cloud computing leverages infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS) to provide virtualised computing resources to customers. A key aspect of the IaaS platform is to provide computer infrastructure, for example virtualised hardware, virtual server space, network connections and bandwidth. The IaaS consists of multiple servers and networks, distributed throughout and can be divided among numerous data centres. The virtualised structure of the IaaS platform allows for the re-allocation of resources through migrating a virtual machine (VM) or a group of VMs among hosts. Migration is the process of moving VMs from one physical host to another based on resource allocation or power saving techniques, however migration has a considerable impact on energy con- sumption in data centres. 
Migration is triggered once a host transitions into an over-utilised or under-utilised state; a virtual machine (VM), or a group of VMs, must then be relocated to another host with sufficient resources so that quality of service (QoS) is guaranteed for both the customer and the cloud provider. Live VM migration is commonly used in cloud data centres because it maintains high system performance under dynamic workloads; it has been studied extensively and shown to benefit cloud providers [1]. Live migration of a single VM requires computing resources and can increase the energy consumption of a data centre. A group of migrations requires a significant amount of resources and energy, which may lead to violations of the Service Level Agreement (SLA), the contract between the cloud provider and the cloud customer. When an SLA violation (SLAV) occurs, the cloud provider incurs increased overhead penalties. Furthermore, the longer a host remains in an over-utilised state, the greater its energy consumption. Selecting the optimal VM to migrate from a host therefore presents a complex and challenging problem.

In this paper, we focus on the selection of a VM to migrate from an over-utilised host. We propose a dynamic Reinforcement Learning (RL) VM selection policy that enables a single learning agent to decide on an optimal VM for migration from an over-utilised host depending on the current energy consumption. This agent-based VM selection policy directly addresses the issues highlighted in [8] by implementing a new state-of-the-art energy efficiency practice. Our algorithm is shown to reduce energy consumption and decrease the number of VM migrations, leading to a greener cloud data centre. The contributions of this paper are:

1) To present a novel Reinforcement Learning VM selection policy (Lr-RL) capable of deciding on an appropriate VM to migrate from an over-utilised host. We create a novel state-action space, based on the CPU utilisation percentage of the host and of the VM to be migrated, to show how a Reinforcement Learning algorithm can improve upon a state-of-the-art approach in terms of energy consumption.

2) To show how an autonomous VM selection policy can reduce energy consumption, creating a more efficient cloud data centre.

The rest of this paper is structured as follows: Section II discusses related work, Section III introduces Reinforcement
Learning, Section IV presents our dynamic RL-VM selection policy, Section V describes the experimental setup, Section VI evaluates and analyses our results, and Section VII concludes the paper.

II. RELATED WORK

In recent years much research has been conducted in the areas of energy efficiency and dynamic resource selection and allocation policies for cloud infrastructure. These approaches can be classified into two main categories: (1) threshold and non-threshold approaches, and (2) Machine Learning approaches.

A. Threshold and Non-Threshold Based Approaches

An example of a non-threshold approach is that of Verma et al., who implemented a power-aware application placement framework called pMapper [19]. It is designed to utilise power management techniques that already exist in hypervisors, such as CPU idling, Dynamic Voltage and Frequency Scaling (DVFS) and consolidation. These techniques are leveraged via separate modules: the performance manager has a global overview of the system and receives information such as SLAs and QoS parameters; the migration manager deals directly with the VMs to implement live migration; the power manager communicates with the infrastructure layer to manage hardware energy policies; and the arbitrator decides, based on information supplied by the above policies, on the optimal placement of VMs through a bin-packing algorithm.

Threshold-based approaches for autonomic scaling of resources are commonplace, and are used by cloud providers such as Amazon EC2 in their Auto Scaling software. Threshold approaches are based on the premise of setting upper- and lower-bound thresholds that, when broken, trigger the allocation or consolidation of resources as necessary. Research in this area includes the '1000 islands' solution architecture proposed by Zhu et al. [22]. Similar to Verma et al., they consider three separate application categories based on different time periods and designate an individual controller to each category: the largest timescale is hours to days, then minutes, and finally seconds. Each group is regarded as a pod and has a node controller managing dynamic allocation of the node's resources. As part of the node controller, a utilisation controller computes resource consumption and estimates what resources will be required in order to meet SLAs in the future.

B. Reinforcement Learning Based Approaches

In recent years, Reinforcement Learning has proven to be a promising approach for optimal allocation of cloud resources. Barrett et al. proposed a parallel RL framework for the optimisation of resource scaling instead of a threshold-based approach [3]. Barrett's approach requires agents to approximate optimal policies, and each agent shares its information with a global agent to improve overall performance. This approach has been empirically shown to outperform traditional rigid threshold-based approaches. Bahati et al. proposed an RL approach to help simplify the management of existing threshold-based rules, where a primary controller applies rules to a system to enforce its quality attributes and a secondary controller monitors the effects of implementing these rules and adapts the thresholds accordingly [2]. Tesauro et al. introduce a hybrid RL approach to optimising server allocations in data centres, by training a nonlinear function approximator in batch mode on a data set while an externally trained policy makes management decisions within the system [18].
Both Farahnakian et al. [12] and Yuan et al. [21] demonstrate how RL can be used to optimise the number of active hosts in operation in a given time frame. An RL agent learns an on-line host detection policy and dynamically consolidates machines in line with optimal parameters. Both studies implement the minimum migration time selection policy proposed by Beloglazov et al. [7] to identify VMs for migration once over-utilised hosts have been detected. Tan et al. use an RL agent to shut down, or make idle, hosts that are at minimal power consumption [17]. Dutreilh et al. proposed an RL framework for autonomic resource allocation in cloud domains [10]. They show how good learning policies in the early phases, using appropriate initialisation and convergence criteria, help speed up learning in problems that typically have a large convergence time. This research is motivated by the fact that all of the above RL approaches have demonstrated a statistically significant advantage over threshold-based approaches. We implement and evaluate RL at a lower level of abstraction, learning policies for the selection of VMs with the aim of reducing energy consumption and providing a greener cloud data centre.

C. Virtual Machine Selection Policy

The study conducted by Beloglazov et al. in 2011 remains one of the most highly cited and respected pieces of research on the consolidation of VMs while maximising performance and efficiency in cloud data centres [7]. Beloglazov examines the dynamic consolidation of VMs while considering multiple hosts and VMs in an IaaS environment. Importantly, Beloglazov models SLAs as a key component of the VM consolidation solution, which is a main feature of this paper also. Beloglazov's proposed algorithm can be broken into three sections: (1) over-utilised/under-utilised host detection, (2) VM selection and (3) VM placement. In this paper we are only interested in sections 1 and 2 of Beloglazov's research.

(1) Over-utilised detection: Building on past research, Beloglazov suggests an adaptive detection policy known as Local Regression (LR) for determining when VMs require migration from a host in order not to violate SLAs [5]. LR, first proposed by Cleveland, allows for the analysis of a local subset of data, in this case hosts [9]. Given an over-utilisation threshold and a safety parameter, LR decides that a host is likely to become over-utilised if its current CPU utilisation multiplied by the safety parameter is larger than the maximum possible utilisation.

(2) VM selection: VMs $v$ are placed on a migration list $V_h$ based on the shortest time required to complete the migration. The migration time is estimated as the RAM utilised by the VM divided by the spare network bandwidth of the host $h$. The minimum migration time (Mmt) policy selects a VM $v$ satisfying:

$v \in V_h \mid \forall a \in V_h, \; \frac{RAM_u(v)}{NET_h} \le \frac{RAM_u(a)}{NET_h}$

where $RAM_u(a)$ is the total RAM currently utilised by VM $a$ and $NET_h$ is the spare network bandwidth available on host $h$.
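To make the Mmt criterion concrete, the following is a minimal Python sketch of how such a selection could be computed. The VirtualMachine and Host containers, their ram_used and spare_bandwidth fields, and the select_mmt_vm function are illustrative assumptions for this example, not the authors' implementation (which was built on CloudSim).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VirtualMachine:
    name: str
    ram_used: float           # RAM currently utilised by the VM, RAM_u(v)

@dataclass
class Host:
    spare_bandwidth: float    # spare network bandwidth of the host, NET_h
    vms: List[VirtualMachine]

def select_mmt_vm(host: Host) -> Optional[VirtualMachine]:
    """Return the VM with the minimum estimated migration time,
    i.e. the VM minimising RAM_u(v) / NET_h (the Mmt policy)."""
    if not host.vms:
        return None
    return min(host.vms, key=lambda vm: vm.ram_used / host.spare_bandwidth)

# Example: the VM with the smallest RAM footprint is chosen for migration.
host = Host(spare_bandwidth=100.0,
            vms=[VirtualMachine("vm-a", ram_used=2048),
                 VirtualMachine("vm-b", ram_used=512)])
print(select_mmt_vm(host).name)  # vm-b
```

Because $NET_h$ is the same for every VM on a given host, the comparison reduces to picking the VM with the least utilised RAM, which is what the sketch does.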
We have chosen Lr-Mmt as the state-of-the-art approach against which we benchmark the performance of our proposed RL algorithm.

III. REINFORCEMENT LEARNING

In Reinforcement Learning (RL), an agent learns through a trial-and-error process by interacting with its environment and observing the resulting reward signal [16]. RL problems are modelled as Markov Decision Processes (MDPs), which provide a mathematical framework for modelling sequential decision making under uncertainty. An MDP is a tuple $\langle S, A, T, R \rangle$: an agent in state $s \in S$ selects an action $a \in A$ and moves to a future state $s' \in S$. The probability $P$ that executing $a$ in $s$ leads to a transition to $s'$ is defined by:

$P^{a}_{s s'} = \Pr\{ s_{t+1} = s' \mid s_t = s, a_t = a \}$

The agent receives a scalar reward $r_t$, which can be either negative or positive. Given any current state $s_t$ and action $a_t$, together with any next state $s_{t+1}$, the expected value of the next reward is:

$R^{a}_{s s'} = E\{ r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s' \}$

The goal of solving an MDP is to find a policy that maximises the accumulated reward. In specific cases where a complete environmental model is known, that is $S, A, T, R$ are fully observable, the problem reduces to a planning problem and can be solved using traditional dynamic programming techniques such as value iteration. If no complete model is available, one must either attempt to approximate the missing model (model-based Reinforcement Learning) or directly estimate the value function or policy (model-free Reinforcement Learning).

In the absence of a complete environmental model, model-free Reinforcement Learning algorithms such as Q-learning, which is used in this paper, can be used to generate optimal policies [20]. Q-learning belongs to a collection of algorithms known as Temporal Difference (TD) methods, which estimate the value of the state-action pair $Q(s_t, a_t)$. Because TD methods do not need a full model of the environment, they can make predictions incrementally by bootstrapping the current estimate onto previous estimates. After every state-action-reward-state transition experienced, the TD algorithm Q-learning calculates an estimated value known as a Q-value:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$

Here $\alpha$ is the learning rate, which determines how quickly an agent learns: an $\alpha$ set close to 1 ensures that the most recent information obtained is utilised, while an $\alpha$ close to 0 means that little learning will take place. An agent's degree of myopia can be controlled by setting the discount factor $\gamma$ between 0 and 1: the closer $\gamma$ is to 1, the greater the weight placed on future rewards, whereas values close to 0 consider only the most recent rewards. $\max_{a} Q(s_{t+1}, a)$ returns the maximum estimate for the future state-action pair. Once the Q-value is calculated, it is stored in the agent's Q-matrix.

Fig. 1. RL Environment Interaction [16]

Actions are chosen based on the policy $\pi$ that the agent is following. To ensure that the agent discovers the optimal policy $\pi$, a trade-off between exploration and exploitation must exist. An agent that always exploits the best known action is said to follow a greedy selection policy; however, such an implementation never explores and so pays no regard to possibly more lucrative alternative actions. In this paper an $\epsilon$-greedy policy is used, which ensures that the agent can explore the entirety of the environment, with the value of $\epsilon$ controlling the exploration rate.
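As an illustration only, and not the authors' CloudSim implementation, the tabular Q-learning update and the epsilon-greedy selection described above might be sketched in Python as follows. The state and action types, the q_table dictionary and the helper names are assumptions for the example; the parameter values match those reported later in Section V.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.8, 0.8, 0.05   # learning rate, discount factor, exploration rate

# Q-matrix: maps (state, action) pairs to estimated Q-values, initialised to 0.
q_table = defaultdict(float)

def epsilon_greedy(state, actions):
    """With probability EPSILON explore a random action, otherwise exploit
    the action with the highest current Q-value (greedy choice)."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """One temporal-difference backup of the Q-learning rule."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
```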
Figure 1 illustrates the interaction the RL agent has with the environment. Pseudo-code 1 outlines the Q-learning algorithm.

Q-learning Algorithm - Pseudo-code 1
  Initialise the Q-map arbitrarily, and the policy π
  Repeat (while s_t is not terminal):
    Observe s_t
    Select a_t using π
    Execute a_t
    Observe s_{t+1}, r_t
    Calculate Q:
      $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) ]$

IV. DYNAMIC RL-VM SELECTION POLICY

To the best of our knowledge, this research is the first to apply RL to the selection of individual VMs for migration from an over-utilised host. Studies such as Beloglazov et al. present efficient VM selection techniques, but these approaches cause high amounts of energy consumption [7]. Our aim is to investigate the effect an energy-aware learning agent can have on live migration, a critical process that consumes a high amount of energy in a data centre.

Our RL-VM selection policy (Lr-RL) learns to select a VM to migrate from an over-utilised host. Each host contains a set of VMs, and our Lr-RL algorithm determines the optimal VM to migrate from that set. The choice of host to which the VM is migrated is outside the scope of this research, but will be considered in future work. To clarify, our Lr-RL approach uses Beloglazov's Local Regression (Lr) technique to determine whether a host is over-utilised (Section II-C); our RL agent is implemented in place of the Mmt algorithm. By giving the agent observability over key state variables such as energy consumption, the agent has an advantage over Lr-Mmt (Section II), the algorithm against which we evaluate our approach.
A. State-Action Space

We created a novel percentile state-action space to incorporate the RL algorithm into an IaaS environment. The state $s_t$ is defined as the current host CPU utilisation, denoted $h_u$ and expressed as a percentage, so the state space is $s_t \in S = \{0, \dots, 100\}$. It is obtained through the following equation:

$s_t = \sum_{v=1}^{n} vm_u(v)$    (1)

where $v$ denotes a VM placed on the host, $vm_u$ is the function that calculates that VM's utilisation of the host's CPU and $n$ is the number of VMs that can be migrated. In other words, each host contains a set of VMs; for each VM the percentage of the host's CPU resources it utilises is calculated, and these percentages are summed to determine the overall utilisation of that host.

An action $a_t$ is represented by the selected VM's utilisation $vm_u$ of its assigned host $h$, returned as a percentage:

$a_t = \frac{vm_u(v)}{h_u(h)} \cdot 100$    (2)

The action space is therefore also defined in terms of percentages, $a_t \in A = \{0, \dots, 100\}$. Each state-action pair is mapped to a Q-value in the agent's Q-matrix.

B. Reward Function

Reinforcement Learning maximises rewards through the mapping of states to actions. To achieve this, a recurrent interaction between the agent and the environment at discrete time steps is necessary. The RL agent receives a representation of the environment in the form of the current state $s_t$, which allows an action $a_t$ to be returned based on the policy the agent is following. The reward function $R(s_t, a_t)$ is determined by the action $a_t$ taken in the current state $s_t$ of the host. At the next time step the environment returns a new representation of the current state, $s_{t+1}$, and a numerical reward $r_t$ based on the previous action $a_t$. For example, a host has an over-utilisation (in terms of CPU utilisation) of 90%; this is the agent's $s_t$. The RL agent selects an $a_t$, for example a VM that is utilising 10% of the host's CPU. The host then transitions into $s_{t+1}$, which in this case would be 80%. The agent receives a reward based on the energy usage of the host once the VM has been migrated. The reward is defined using the power model presented in [6]: a host's power consumption is nearly proportional to its CPU utilisation, so it can be described by Equation (3):

$P(\mu) = 0.7 \cdot P_{max} + 0.3 \cdot P_{max} \cdot \mu$    (3)

where $P_{max}$ denotes the host's power consumption at full load and $\mu$ represents the physical machine's CPU utilisation, which changes over time.

C. Q-Learning Implementation

The following details how the Lr-RL algorithm functions. First, the pseudo-code of the dynamic RL-VM selection policy is provided (Pseudo-code 2), followed by a detailed explanation of how the algorithm operates.

Dynamic RL-VM Selection Policy - Pseudo-code 2
  host ← overUtilisedHost
  VMs ← migrateableVMs
  possibleActions ← VM sizes
  Choose a VM from possibleActions using π
  Migrate the VM
  Observe the future host utilisation and the reward
  Calculate Q:
    $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) ]$
  Update the Q-matrix

The Lr-RL algorithm is invoked when a host is determined to be over-utilised through the Lr function, and the host is placed on a list of over-utilised hosts. A host is selected from this list and the RL agent calculates the host's state as a percentage (Equation 1). All of the host's VMs are mapped to possible actions based on the percentage of the host's CPU they utilise (Equation 2). The RL agent selects an action, i.e. a VM, using its selection policy (ε-greedy), as sketched below.
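The following Python sketch illustrates, under assumed data structures, how the state (Equation 1), the candidate actions (Equation 2) and an energy-based reward derived from the power model (Equation 3) could be computed. The VM and PhysicalHost classes, the mips fields and the negative-power reward shaping are assumptions made for this illustration; the paper only states that the reward is based on the host's energy usage after migration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VM:
    name: str
    mips_used: float          # CPU demand of this VM (illustrative unit)

@dataclass
class PhysicalHost:
    mips_capacity: float      # total CPU capacity of the host
    p_max: float              # power draw at full load, P_max (Watts)
    vms: List[VM]

def host_state(host: PhysicalHost) -> float:
    """Equation (1): host utilisation as the sum of its VMs' CPU usage, in percent."""
    return sum(vm.mips_used for vm in host.vms) / host.mips_capacity * 100.0

def vm_action(host: PhysicalHost, vm: VM) -> float:
    """Equation (2): the share of the host's current utilisation owned by one VM, in percent."""
    return vm.mips_used / sum(v.mips_used for v in host.vms) * 100.0

def host_power(host: PhysicalHost) -> float:
    """Equation (3): P(mu) = 0.7 * Pmax + 0.3 * Pmax * mu, with mu in [0, 1]."""
    mu = host_state(host) / 100.0
    return 0.7 * host.p_max + 0.3 * host.p_max * mu

def reward_after_migration(host: PhysicalHost) -> float:
    """Assumed reward shaping: lower post-migration power gives a higher reward."""
    return -host_power(host)
```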
The agent performs the action, migrating the selected VM from the current host to a suitable alternative host. The agent then observes the host's new utilisation level and receives an energy-aware reward (defined in Section IV-B). The agent calculates the Q-value for the state-action pair, which is then stored in the Q-matrix. If the host is still deemed over-utilised, the RL agent selects the next optimal VM to migrate, and the process is repeated until the host is no longer over-utilised.

V. EXPERIMENT SETUP

As the target system is an IaaS, it was essential to conduct the experiments on a large-scale virtualised data centre infrastructure simulation tool, as a real-world system would be too complex. The CloudSim framework allows for the representation of a power-aware data centre with LAN-migration capabilities. For the experiments conducted in this paper to be considered fair, we set the CloudSim parameters according to Beloglazov et al. [7]: 800 physical servers, consisting of 400 HP ProLiant ML110 G5 and 400 HP ProLiant ML110 G4 servers, are replicated within the simulator. The 30-day workload used in this experiment comes from a real-world IaaS environment: the PlanetLab files within the CloudSim framework contain data from the CoMon project, representing the CPU utilisation of over a hundred VMs from servers located at 500 locations worldwide. To make the experiments more realistic, a 30-day workload was created on a random basis from the PlanetLab files, each containing 288 values representative of CPU workloads. VMs are assigned to these 30-day workloads at random in order to best represent the stochastic characteristics of workload allocation and demand within an IaaS environment. CloudSim offers a default ceiling threshold of 100% for each host, with a safety parameter of 1.2. This safety parameter acts as an over-utilisation buffer: for example, if the current utilisation of a host is 85% and the safety parameter is set to 1.2, this gives the
host a utilisation level of 102%, so the host is considered to be over-utilised.

To ensure that the agent converges to an optimal policy, the learning parameters must be set appropriately. These values were selected by conducting a parameter sweep to determine the best performance of the agent's learning abilities. The parameters for the experiments in this paper are set as follows: $\alpha = 0.8$, $\gamma = 0.8$ and $\epsilon = 0.05$.

A. Experiment Metrics

The following metrics are used to evaluate the Lr-RL algorithm against the Lr-Mmt heuristic.

1) Energy consumption: the total energy consumed by the data centre per day in relation to computational resources, i.e. servers. Other energy draws exist, such as cooling and infrastructural demands, but these were deemed outside the scope of this research.

2) Migrations of VMs: the total number of migrations of all VMs on all servers performed by the data centre over the 30-day workload.

VI. PRELIMINARY RESULTS

This section compares our Lr-RL algorithm against the benchmark set by the Lr-Mmt technique. Both algorithms were subjected to the 30-day workload experiment, repeated 100 times. For every iteration of the experiment, the RL agent's Q-matrix was initialised to 0.

Figure 2 presents the energy consumption results for both algorithms. Over the 30-day workload period, Lr-RL consumed a total of 3,948.35 kWh compared to 4,623.75 kWh for Lr-Mmt. The standard deviation for Lr-RL was ±28.79 kWh, in comparison with ±33.12 kWh for Lr-Mmt. A paired t-test shows that there is a statistically significant difference in energy consumption between Lr-RL and Lr-Mmt, with a p-value < 0.0067 and a 95% confidence interval of (-6.474, -38.5533).

Fig. 2. Energy Consumption

The migration results for both algorithms are shown in Figure 3, highlighting that Lr-RL performed considerably fewer migrations than Lr-Mmt. Lr-RL selects VMs based on the future energy usage of a host; this resulted in fewer migrations from a single host for Lr-RL, while the host maintains sufficient processing capacity. Over the 30-day workload period, Lr-RL performed a total of 525,769 migrations compared to 797,496 for Lr-Mmt. The standard deviation for migrations for Lr-RL was ±4,443.25, compared to ±5,898.23 for Lr-Mmt. A paired t-test confirms that there is a statistically significant difference between Lr-RL and Lr-Mmt, with a p-value < 0.0001 and a 95% confidence interval of (-6,358.85, -11,756.41).

Fig. 3. Migrations of VMs

Examining a single-day workload in more detail further highlights the improvements that Lr-RL could potentially contribute to real-world data centres. The correlation between the decreased number of migrations and the energy reduction for the Lr-RL algorithm is shown in Figure 4, measured at the industry-standard interval of 5 minutes. For day 1 of the 30-day workload, Lr-Mmt had a total energy consumption of 138.55 kWh and 23,211 migrations, while Lr-RL had an energy consumption of 127.31 kWh and 19,437.4 migrations. Lr-RL therefore saves on average 11.24 kWh and performs 3,773.6 fewer migrations on the first day of this workload.

On average, for a single-day workload, Lr-RL saves 22.51 kWh of energy and performs 9,058 fewer migrations than Lr-Mmt. Lr-Mmt requires nearly 12 VMs to be moved from a host, whereas on average Lr-RL never requires more than 2 VMs to be migrated.
One reason for this is that Lr-Mmt chooses the VM with the shortest migration time, which on average accounts for only 3.06% of overall host utilisation, whereas Lr-RL on average selects a VM that accounts for 12.87% of overall host utilisation, enabling a faster process by which the host leaves the over-utilised state.

Considering the energy-saving aspect of our results: on average Lr-RL saved 22.51 kWh per day, which results in an estimated saving of 8,577.5 kWh per year. According to calculations by the EPA, this is equivalent to a reduction of 5.9 metric tons of CO2 emissions due to electricity generation, and to potentially protecting 4.6 acres of forest land from destruction [11].

The results highlight the adaptive nature of RL: an agent with the capability to learn and adapt to changing workloads reduces energy consumption in a cloud domain. However, RL also has drawbacks: for an agent to learn, a number of training episodes must be conducted, potentially requiring a substantial amount of time for the agent to converge on an optimal policy. Although the Lr-Mmt approach consumes high amounts of energy and incurs high overhead
costs for cloud providers, the size of the VM (in terms of RAM) to be migrated will not have a considerable impact on compute resources such as bandwidth in a data centre.

Fig. 4. Energy & Migration Correlation for Workload Day 1

VII. CONCLUSION

This research presented a dynamic RL VM selection policy (Lr-RL) in which an agent learns to select a VM to migrate from an over-utilised host. Based on an energy-aware reward function, the agent reduces both energy consumption and migrations. To address the aims of the paper proposed in Section I:

1) Our Lr-RL approach improves upon the best-known algorithm, Lr-Mmt, in terms of energy consumption. Our agent-based approach learns to select the optimal VM to be migrated. We created a novel percentile state-action space, represented by the host's CPU utilisation as a percentage and the VM's usage of the host's CPU, also as a percentage. The experimental results demonstrate that Reinforcement Learning can be implemented at a low level of abstraction for use in an IaaS environment.

2) The energy-aware reward function provided energy performance feedback to the agent when selecting an appropriate VM for migration. Based on our EPA calculations, our RL VM selection policy has the capability to create a cognitive live migration framework with the potential to decrease CO2 emissions from a cloud data centre.

Our research so far shows the potential benefits of an agent-based approach when applied to energy consumption problems in a cloud simulation domain. The SLAV model was out of the scope of this research, as we wanted to highlight the advances RL can achieve in energy consumption. In future work we plan to model the SLAV performance of both algorithms. This work will enable an agent to observe both SLAV and energy in order to decide on the most optimal VM to migrate, while improving the performance of a cloud data centre.

REFERENCES

[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50-58, 2010.
[2] R. M. Bahati and M. A. Bauer. Towards adaptive policy-based management. In Network Operations and Management Symposium (NOMS), 2010 IEEE, pages 511-518. IEEE, 2010.
[3] E. Barrett, E. Howley, and J. Duggan. Applying reinforcement learning towards automating resource allocation and application scalability in the cloud. Concurrency and Computation: Practice and Experience, 25(12):1656-1674, 2013.
[4] L. A. Barroso and U. Hölzle. The case for energy-proportional computing. Computer, (12):33-37, 2007.
[5] A. Beloglazov, J. Abawajy, and R. Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Generation Computer Systems, 28(5):755-768, 2012.
[6] A. Beloglazov and R. Buyya. Adaptive threshold-based approach for energy-efficient consolidation of virtual machines in cloud data centers. 2010.
[7] A. Beloglazov and R. Buyya. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurrency and Computation: Practice and Experience, 24(13):1397-1420, 2012.
[8] R. Brown. Report to Congress on server and data center energy efficiency: Public law 109-431. Lawrence Berkeley National Laboratory, 2008.
[9] W. S. Cleveland. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74(368):829-836, 1979.
[10] X.
Dutreilh, S. Kirgizov, O. Melekhova, J. Malenfant, N. Rivierre, and I. Truck. Using reinforcement learning for autonomic resource allocation in clouds: Towards a fully automated workflow. In ICAS 2011, The Seventh International Conference on Autonomic and Autonomous Systems, pages 67-74, 2011.
[11] Epa.gov. Calculations and references, Clean Energy, US EPA.
[12] F. Farahnakian, P. Liljeberg, and J. Plosila. Energy-efficient virtual machines consolidation in cloud data centers using reinforcement learning. In Parallel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on, pages 500-507. IEEE, 2014.
[13] J. Koomey. Growth in data center electricity use 2005 to 2010. A report by Analytical Press, completed at the request of The New York Times, page 9, 2011.
[14] J. G. Koomey et al. Estimating total power consumption by servers in the US and the world, 2007.
[15] C. Pettey. Gartner estimates ICT industry accounts for 2 percent of global CO2 emissions. 2007.
[16] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction, volume 1. MIT Press, Cambridge, 1998.
[17] Y. Tan, W. Liu, and Q. Qiu. Adaptive power management using reinforcement learning. In Proceedings of the 2009 International Conference on Computer-Aided Design, pages 461-467. ACM, 2009.
[18] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. A hybrid reinforcement learning approach to autonomic resource allocation. In Autonomic Computing, 2006. ICAC'06. IEEE International Conference on, pages 65-73. IEEE, 2006.
[19] A. Verma, P. Ahuja, and A. Neogi. pMapper: power and migration cost aware application placement in virtualized systems. In Middleware 2008, pages 243-264. Springer, 2008.
[20] C. J. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279-292, 1992.
[21] J. Yuan, X. Miao, L. Li, and X. Jiang. An online energy saving resource optimization methodology for data center. Journal of Software, 8(8):1875-1880, 2013.
[22] X. Zhu, D. Young, B. J. Watson, Z. Wang, J. Rolia, S. Singhal, B. McKee, C. Hyser, D. Gmach, R. Gardner, et al. 1000 islands: Integrated capacity and workload management for the next generation data center. In Autonomic Computing, 2008. ICAC'08. International Conference on, pages 172-181. IEEE, 2008.