AI Optimisation Approach for Autonomic
Cloud Computing
Kieran Flesk
Submitted in accordance with the requirements for the degree of
Master of Science
Software Design & Development
College of Engineering & Informatics
National University of Ireland, Galway
Research Supervisor: Dr. Enda Howley
August 2015
ABSTRACT
Cloud computing has led to exponential growth in large scale data centers and warehouses, which form the paradigm's substratum layer, Infrastructure as a Service. These large scale server warehouses consume substantial energy, not only to power servers, but also for affiliated processes such as cooling. Dynamic consolidation of virtual machines using live migration, combined with switching idle nodes to sleep mode, allows cloud providers to optimize resource usage and reduce energy consumption. The following research proposes a novel reinforcement learning approach for the selection of virtual machines for migration. Due to its low level of abstraction, the proposed algorithm provides a decision support system which supports efficient and open application deployment, monitoring, and execution across different cloud service providers, and results in lower energy consumption without negatively affecting service level agreements.
ACKNOWLEDGEMENTS
Firstly, I would like to express my sincere gratitude to my supervisor Dr. Enda Howley for the continuous support of my master's study and related research, and for his patience, motivation, and immense knowledge. His guidance helped me immensely in the research and writing of this thesis. I could not have imagined having a better adviser and mentor for my master's.
I would like to thank my family, especially my parents for their unwavering support in my decision to return to education, and my brothers and sister for supporting me throughout the writing of this thesis.
Finally, I would like to thank my fellow researchers and friends, who have all contributed to the final product in one way or another.
DECLARATION
The candidate confirms that the work submitted is his own and that appropriate credit has been given where reference has been made to the work of others.
PUBLICATION
A Reinforcement Learning Decision Support System for the Selection of Virtual Machines
Kieran Flesk, Dr. Enda Howley
Springer Special Edition Journal of Internet Services and Applications
Under Review
CONTENTS
i introduction 15
1 introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1 Motivations and Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
ii literature review 19
2 cloud computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1 Origins of Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.1 Cluster Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.2 Grid Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.3 Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Characteristics of Cloud Computing . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Scalability of Infrastructure . . . . . . . . . . . . . . . . . . . . . 22
2.2.2 Autonomic Resource Control / Elasticity . . . . . . . . . . . . . 22
2.2.3 Service Centric Approach . . . . . . . . . . . . . . . . . . . . . . 23
2.2.4 Omnipresent Network Accessibility . . . . . . . . . . . . . . . . 23
2.2.5 Multi-Tenancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.6 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Cloud Deployment Models . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.1 Private Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.2 Community Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.3 Public Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.4 Hybrid Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Cloud Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 data centers and energy consumption . . . . . . . . . . . . . . . . 29
3.1 Areas of energy consumption . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 Server Consumption . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Power Management Techniques . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.1 Dynamic Component Deactivation . . . . . . . . . . . . . . . . . 31
3.2.2 Dynamic Performance Scaling . . . . . . . . . . . . . . . . . . . 31
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4 virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1 Modern Day Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Levels of Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.1 Full Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.2 Paravirtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.3 Hardware Assisted Virtualization . . . . . . . . . . . . . . . . . 36
4.3 Hypervisors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.1 Xen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2 KVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.3 VMware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1 Agent / Environment Interaction . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Learning Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2.1 Q-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2.2 SARSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.3 Action Selection Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3.1 ε-Greedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3.2 Softmax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4 Reward Shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.4.1 Potential Based Reward Shaping . . . . . . . . . . . . . . . . . . 47
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6 related research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1 Threshold and Non-Threshold Approach . . . . . . . . . . . . . . . . . 50
6.2 Artificial Intelligence Based Approach . . . . . . . . . . . . . . . . . . . 53
6.3 Reinforcement Learning Based Approach . . . . . . . . . . . . . . . . . 55
6.4 Virtual Machine Selection Policies . . . . . . . . . . . . . . . . . . . . . 56
6.4.1 Maximum Correlation . . . . . . . . . . . . . . . . . . . . . . . . 57
6.4.2 Minimum Utilization Policy . . . . . . . . . . . . . . . . . . . . 57
6.4.3 The Random Selection Policy . . . . . . . . . . . . . . . . . . . . 57
6.5 Research Group Context . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
iii methodology 60
7 cloudsim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.2 CloudSim Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.3 Energy Aware Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.3.1 Initialising an Energy Aware Policy . . . . . . . . . . . . . . . . 64
7.3.2 Creating a Selection Policy . . . . . . . . . . . . . . . . . . . . . 64
7.4 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.5 Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8 algorithm development . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.1 Registering a Selection Policy . . . . . . . . . . . . . . . . . . . . . . . . 66
8.2 Recording of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.3 Additional Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.1 Lr-RL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.2 RlSelectionPolicy . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.3 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.4 Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.3.6 RlUtilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
9 implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.1 State-Action Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.2 Q-Learning Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 71
9.3 SARSA Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
iv experiments 74
10 experiment metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10.1 Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10.3 Service Level Agreement Metrics . . . . . . . . . . . . . . . . . . . . . . 75
10.3.1 SLATAH, PDM & SLAV . . . . . . . . . . . . . . . . . . . . . . . 76
10.4 Energy and SLA Violations . . . . . . . . . . . . . . . . . . . . . . . . . 76
11 selection of policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
11.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
11.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
11.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
11.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
11.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 84
11.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
11.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
12 potential based reward shaping . . . . . . . . . . . . . . . . . . . . . 87
12.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
12.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
12.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
12.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
12.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 89
12.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
12.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
13 comparative view of lr-rl vs lr-mmt . . . . . . . . . . . . . . . . . 92
13.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
13.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
13.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
13.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
13.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 94
13.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
13.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
v conclusion 98
14 conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
LIST OF FIGURES
Figure 2.1 Private cloud [93] . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Figure 2.2 Public cloud [93] . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 2.3 Hybrid cloud[93] . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Figure 2.4 High level cloud architecture [15] . . . . . . . . . . . . . . . . . 27
Figure 5.2 PBRS effect [92] . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Figure 7.1 CloudSim class structure [12] . . . . . . . . . . . . . . . . . . . 65
Figure 8.1 The reinforcement learning CloudSim architecture . . . . . . . 68
Figure 11.1 Energy consumption Q-Learning 100 iterations . . . . . . . . 78
Figure 11.2 Energy consumption SARSA 100 iterations . . . . . . . . . . . 79
Figure 11.3 Q-Learning ε-Greedy vs SARSA ε-Greedy . . . . . . . . . . . . 79
Figure 11.4 Overall Energy Consumption 30 Day Workload . . . . . . . . 80
Figure 11.5 Average Daily Energy Consumption 30 Day Workload . . . . 80
Figure 11.6 SARSA migrations 100 Iterations . . . . . . . . . . . . . . . . . 81
Figure 11.7 Q-Learning migrations 100 Iterations . . . . . . . . . . . . . . 81
Figure 11.8 Accumulated rewards for cliff walking task [73] . . . . . . . . 82
Figure 11.9 Accumulated rewards for migrations . . . . . . . . . . . . . . 82
Figure 11.10 Average migrations 100 iterations . . . . . . . . . . . . . . . . 83
Figure 11.11 Average migrations 30 day workload . . . . . . . . . . . . . . 83
Figure 11.12 Overall SLA violations 100 iterations . . . . . . . . . . . . . . . 84
Figure 11.13 Overall SLA violations for 30 days . . . . . . . . . . . . . . . . 84
Figure 11.14 Overall ESV for 100 iterations . . . . . . . . . . . . . . . . . . . 85
Figure 11.15 Overall ESV for 30 days . . . . . . . . . . . . . . . . . . . . . . 85
Figure 12.1 PBRS vs Q-Learning energy consumption . . . . . . . . . . . . 88
Figure 12.2 PBRS vs Q-Learning migrations . . . . . . . . . . . . . . . . . . 89
Figure 12.3 PBRS vs Q-Learning SLAV . . . . . . . . . . . . . . . . . . . . . 90
Figure 12.4 PBRS vs Q-Learning ESV . . . . . . . . . . . . . . . . . . . . . . 90
Figure 13.1 Energy consumption for 30 day workload . . . . . . . . . . . . 93
Figure 13.2 Migrations for 30 day workload . . . . . . . . . . . . . . . . . . 94
Figure 13.3 SLA violations for 30 day workload . . . . . . . . . . . . . . . 94
Figure 13.4 ESV for 30 day workload . . . . . . . . . . . . . . . . . . . . . . 95
Figure 13.5 Energy & Migration Correlation Day 21 . . . . . . . . . . . . . 96
Figure 13.6 RAM sizes of virtual machines migrated . . . . . . . . . . . . . 97
ACRONYMS
SLA Service Level Agreement
API Application Programming Interface
OS Operating system
QOS Quality of service
IT Information technology
IaaS Infrastructure as a service
PaaS Platform as a service
SaaS Software as a service
UPS Uninterruptible power supply
PUE Power Usage Effectiveness
PDU Power distribution unit
DVFS Dynamic voltage frequency scaling
DRAM Dynamic random access memory
SPM Static power management
DPM Dynamic power management
DCD Dynamic component deactivation
DPS Dynamic performance scaling
CTSS Compatible time sharing systems
CP Control program
CMS Conversational monitor system
VMM Virtual machine monitor
ABI Application binary interface
KVM Kernel virtual machine
MMU Memory management unit
TLB Translation lookaside buffer
RL Reinforcement Learning
TD Temporal Difference
PBRS Potential Based Reward Shaping
AI Artificial Intelligence
MDP Markov Decision Process
PM-L Local power management
PM-G Global power management
LNQS Layered queuing network solver
GA Genetic algorithm
MLGGA Multi Layered Grouped genetic algorithm
GGA Grouped genetic algorithm
LR Local regression
MMT Minimum migration time
VM Virtual machine
CPU Central processing unit
MC Maximum Correlation
MU Minimum Utilization
RS Random Selection
MIPS Millions of instructions per second
PDM Performance Degradation Due to Migration
SLATAH Service Level Agreement violation Time per Active Host
SLAV Service Level Agreement Violation
ESV Energy and Service Level Agreement Violation
PC Personal Computer
Part I
INTRODUCTION
1 INTRODUCTION
Cloud computing refers to both the applications delivered as services over the Internet
and the hardware and software systems in the data centers that provide them [3]. Buyya
et al. define cloud computing as a type of parallel and distributed system, consisting
of a collection of interconnected and virtualized computers that are dynamically provi-
sioned and presented as one or more unified computing resources, based on service level
agreements (SLA) established through negotiation between the service provider and cus-
tomer [17]. Regardless of the ever growing heterogeneous nature of cloud platforms and
deployments, this definition still rings true.
Other key cornerstones also remain despite the ever changing landscape. One such cornerstone is the ability of cloud providers to virtualize the key constituents which form the lowest level of the cloud architecture, known as the infrastructure as a service (IaaS) layer, principally large scale data centers. The virtualization of the large scale congregations of nodes typical of modern day data centers into multiple independent virtual machines executing on a single node not only allows for the plasticity of services, but also plays a key role in the close adherence to SLAs and the maximum utilization of resources which underpin the foundations of cloud computing, while providing maximum return on investment for service providers. However, due to the dynamic nature of cloud services and their demands, the static or offline management of these virtual machines (VMs) places significant restrictions on the provision of an ideal cloud service.
The provision of such virtualized environments and services comes at a cost. Studies such as [8][41][42] highlight that the growth of data centers has directly resulted in:
• An increase in energy consumption in the range of billions of kWh since the beginning of the decade.
• An annual increase in CO2 emissions from 42.8 million metric tons in 2007 to 67.9 million metric tons in 2011.
Much of this energy is wasted in idle systems: in typical deployments, server utilization is below 30%, yet idle servers still consume 60% of their peak energy draw. In order to combat such wastage, advanced consolidation policies for under utilized and idle servers are required [56].
Two key findings from the Report to Congress titled Server and Data Center Energy Efficiency 2007 directly address this issue:
"Existing technologies and strategies could reduce typical server energy use by an estimated 25%." [14]
"Assuming state-of-the-art energy efficiency practices are implemented throughout U.S. data centers, this projected energy use can be reduced by up to 55% compared to current efficiency trends" [14]
The following thesis proposes one such state-of-the-art energy efficiency policy.
1.1 motivations and aims
Motivated by these facts and by the success of previous research applying reinforcement learning (RL) as an optimisation technique, the aim of this research is to design, develop, implement and evaluate an RL agent based approach for the selection of VMs for migration in a stochastic IaaS environment, in order to reduce energy consumption.
1.2 research questions
This thesis aims to answer the following research questions:
• Is reinforcement learning a viable approach for virtual machine selection in the cloud?
• Can advanced reinforcement learning techniques improve such a policy?
• Can a reinforcement learning approach outperform the state of the art selection policy?
1.3 thesis structure
The thesis is laid out as follows.
• Chapter 1 contains an introduction that provides an overview of the research topic
and introduces the research questions, motivations and aims.
• Chapters 2-6 contain a literature review covering:
– Cloud Computing
– Data Centers and Energy Consumption
– Virtualization and Hypervisors
– Reinforcement Learning
– Resource Allocation and Selection Methods
• Chapters 7-9 contain the methodology of the thesis, including:
– CloudSim Simulator
– Algorithm Development & Implementation
• Chapters 10-13 contain the experiments carried out:
– The Policy Selection
– Addition of Potential Based Reward Shaping
– Comparative View of Lr-Rl vs Lr-Mmt
• Chapter 14 contains the conclusions and possible areas of future work.
Part II
LITERATURE REVIEW
2 CLOUD COMPUTING
The following chapter contains an in depth review of the most pertinent academic re-
search available in relation to cloud computing, its characteristics, architecture, service
and deployment models.
2.1 origins of cloud computing
There has been a long-standing vision of providing computer services as a utility alongside water, gas, electricity and telephone. To achieve this, individuals and companies must be able to access the services they require on demand, with the scalability and flexibility they
progression towards such a scenario.
2.1.1 Cluster Computing
Originally, super computers led the way in large scale computational tasks in areas such as science, engineering and commerce; eventually, however, more extensive computational power was required to cater for such problems, and from this cluster computing was developed. A cluster is a collection of parallel or distributed computers which are interconnected using high speed networks, often in the form of local area networks [67]. Multiple computers and their resources are combined to function as a single virtual computer, allowing for greater computational power. Each node carries out the same task, and each cluster contains redundant nodes which provide a backup should a utilized node fail. Computers in a cluster can be described as homogeneous, as they use the same operating systems (OS) and hardware.
2.1.2 Grid Computing
Grid computing was originally developed to meet the high computational demands of scientific research. It is a distributed network which couples a wide variety of geographically distributed computational resources such as personal computers (PCs), workstations, clusters, storage systems, data sources, databases, computational kernels and special purpose scientific instruments, and presents them as a unified integrated resource [15]. These grids are commonly established, maintained and owned by large research groups with shared interests. Such an infrastructure requires a complex management system, having to manage multiple global locations, multiple owners, heterogeneous computer networks and hardware, as well as user policies and availability [1].
2.1.3 Cloud Computing
The most recent computing paradigm to progress towards the vision of providing computer services as a utility is cloud computing. A cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources, in order to provide a service underpinned by high levels of quality of service (QOS) and SLAs [17].
Cloud computing has been defined in many different ways; the following is just one of those definitions. Ian Foster et al. describe it as,
"A large scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms and services are delivered on demand to external customers over the Internet."
[29]
2.2 characteristics of cloud computing
The characteristics of cloud computing infrastructures and models are a recurring concept in the literature, highlighted no more so than by Gong et al.'s comprehensive research review [30]. The following section contains a brief explanation of these characteristics.
2.2.1 Scalability of Infrastructure
A key feature of any cloud computing architecture is its ability to scale in accordance with peaks and troughs in customer demand. Such scalability not only allows providers to maintain SLAs, but also allows for the strategic management of data center resources, thus reducing costs.
Scalability can be summarised into two separate categories. Horizontal scalability refers to the ability of a node to access extra processing resources from other nodes within a data center, i.e. multiple nodes working as a single logical node to perform a task, therefore maintaining SLAs and QOS. Vertical scalability refers to the ability to add additional resources to a single node where necessary, such as increasing bandwidth, memory or central processing unit (CPU) utilization [55].
2.2.2 Autonomic Resource Control / Elasticity
The ability of services to be extended or retracted autonomously depending on demand
is a key characteristic of cloud computing [100]. This is also referred to by Zhang et al.
as the ability of self-organization [98]. This elasticity is a key aspect which differentiates
cloud computing from the more rigid grid and cluster computing. It is also a key selling
point as customers are offered the ability to re-size their hardware needs in parallel with
their requirements without the expense of investing in physical resources which may lie
largely redundant for long periods of time.
2.2.3 Service Centric Approach
Cloud computing providers deliver an on-demand service model, delivering services
when and where they are needed. These services are provided in accordance with SLAs and QOS agreed between the consumer and provider prior to the provider obtaining control of the task [98].
2.2.4 Omnipresent Network Accessibility
Services can be accessed via an Internet connection from any location using a range of
heterogeneous devices at any given time [100].
2.2.5 Multi-Tenancy
Multi-tenancy refers to the sharing of cloud resources, including CPU, memory, networks
and applications [2]. Multi-tenancy in cloud computing plays a key role in the financial
viability of providing such a service. Although users share these resources, providers
place a layer of virtualization technology above the hardware layer which allows for customized and partitioned virtual applications. Multi-tenancy is seen as a major aspect of cloud security at all layers of the cloud infrastructure; through partitioning and isolation of virtual resources, providers strive to provide maximum security [2].
2.2.6 Virtualization
Virtualization allows service providers to create multiple instances of virtual machines on
a single server [11]. Each of these virtual machines can run different operating systems
(OS) independent of the underlying OS. The ability of a provider to provide multiple
instances of machines from a single server contributes greatly to the viability of cloud
computing by maximising return on investment. Virtualization is discussed in further
detail in Chapter 4.
2.3 cloud deployment models
There are four types of cloud deployment models commonly referred to in the literature: private, public, hybrid and community. The following section outlines the structure of each model.
2.3.1 Private Cloud
A private cloud is a cloud model that is devoted solely to one organisation. The cloud
infrastructure may be located in-house or elsewhere in a single or multiple data centers.
It may be managed solely by the organisation or by a third party [25]. A private cloud can
offer high security, performance and reliability, however the cost associated with private
clouds is often higher than that of other models [98].
Figure 2.1: Private cloud [93]
2.3.2 Community Cloud
A community cloud is a cloud infrastructure built and shared by multiple organisations
which share common policies, practices and regulations. The underlying infrastructure can be hosted by a third party or by an individual organisation within the community [35].
2.3.3 Public Cloud
A public cloud is where commercial entities offer cloud services to the general public
usually on a pay-per-use model. Public clouds offer the benefit of no upfront capital
expenditure on infrastructure; however, the refinement of services and security found in private clouds is not as extensively available [98]. Examples of such services are Amazon EC2 and Google Compute Engine.
Figure 2.2: Public cloud [93]
2.3.4 Hybrid Cloud
A hybrid cloud combines the facilities of two or more cloud models, typically a private cloud with public or community clouds. Such a model allows a private cloud to be held in-house while certain aspects of the information technology (IT) infrastructure are held on public clouds. Such an infrastructure supplies an organisation with the ability to retain high security and specific optimisation, while maintaining the elasticity provided by public clouds [98].
Figure 2.3: Hybrid cloud [93]
2.4 cloud architecture
Fig. 2.4 shows a high level view of a cloud computing architecture, an architecture which is tightly coupled with what are known as the cloud service models.
At the lowest level is the hardware layer: data centers that hold large volumes of physical servers and associated equipment. On top of the hardware lies the infrastructure layer, which virtualizes the servers held in the data centers on demand by creating multiple instances of virtual machines, virtualizing CPUs, memory, storage and so on. These first two layers combine the necessary elements to provide IaaS to the consumer.
Figure 2.4: High level cloud architecture [15]
The third layer, the platform layer, provides a development, modelling, testing and deployment environment for developers of applications hosted in the cloud [29]. The developers have little or no access to the underlying networks, servers, etc., except for some minor user configuration [57]. This allows for the provision of platform as a service (PaaS) as a cloud service model.
The top layer, known as the application layer, is the user interface of cloud computing, usually supplied via browsers on heterogeneous Internet enabled devices [77]. This layer allows access via a web browser or an application interface to software applications hosted on cloud servers. The consumer has no control of the underlying infrastructure of the cloud or the application's capabilities, except those controls provided by the creator. This cloud service model is referred to as software as a service (SaaS) [57].
2.5 summary
This chapter reviewed cloud computing from a high level viewpoint, covering the origins, characteristics, models and architecture of cloud computing. Key pieces of literature outlined the foundations of cloud computing in grid and cluster computing, and the importance of autonomic resource control, scalability of infrastructure and virtualization in providing a cost effective and adaptable cloud. The chapter concluded with a review of cloud models and architecture in order to convey their everyday real world use and applications.
3 DATA CENTERS AND ENERGY CONSUMPTION
From 2005 to 2010 the worldwide consumption of energy in data centers increased by 56%, and in 2010 data center energy consumption accounted for 1.3% of all worldwide energy consumption. Furthermore, the approximately 6,000 data centers present in America in 2006 incurred $4.5 billion in energy overheads [41]. These figures highlight the extensive consumption of energy in data centers and the necessity for all stakeholders to actively pursue methods by which to reduce consumption, both from an economic and an environmental viewpoint. This chapter examines the most current and relevant research in relation to energy consumption and preservation techniques deployed within large scale data centers.
3.1 areas of energy consumption
For a number of years, researchers and engineers have focused on improving the performance of data centers, and in doing so have improved systems year on year. However, although the performance per watt has increased, the total energy consumption has remained static and in some cases risen [43]. In order to combat excessive consumption of energy it is important to recognize the disparate elements which consume energy within a data center. Servers naturally consume a large proportion of the overall energy intake; however, the associated infrastructural demands are also a major factor when calculating overall costs. These costs are captured by the Power Usage Effectiveness (PUE) metric, which is defined as the ratio between the total energy consumed by a data center and the energy consumed by IT equipment such as servers, networking equipment and disk drives. The PUE factor ranges from as high as 2.0 in legacy data centers to as low as 1.2 in recent state of the art facilities [80]. At a PUE of 2.0, for every kilowatt utilised by IT components another kilowatt is consumed by infrastructure loads such as cooling, fans, pumps, uninterruptible power supplies (UPS) and power distribution units (PDU). In order to remain within the scope of this research, the author solely investigates energy usage in relation to IT components.
3.1.1 Server Consumption
Intel research has shown that the main source of energy consumption in a server remains the CPU; however, it no longer dominates energy consumption as it once did, due to the implementation of energy efficiency and energy saving techniques such as dynamic voltage and frequency scaling (DVFS) [45]. DVFS is a hardware based solution which dynamically adjusts the voltage and frequency of a CPU in accordance with workload demand. The purpose of applying DVFS is to reduce energy consumption by lowering the voltage and frequency levels; however, this can lead to degradation of execution speeds [46]. DVFS is important for energy management at server level as it allows a CPU to run at levels as low as 30%. However, the CPU is the only server component with the ability to perform such a task; disk drives, dynamic random-access memory (DRAM), fans, etc. can only cycle between on, off or idle states, which results in an idle server consuming in excess of 70% of its overall energy draw.
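To make this relationship concrete, the sketch below implements a simple linear server power model of the kind commonly used in energy-aware simulation: an idle host draws a fixed fraction of its peak power and consumption grows linearly with CPU utilisation. The class name, the 250 W peak and the 70% idle fraction are illustrative assumptions, not values taken from the studies cited above.

```java
/**
 * Illustrative linear server power model: an idle host draws a large fraction
 * of its peak power, and consumption grows linearly with CPU utilisation.
 * The 70% idle fraction and 250 W peak are assumed values for illustration.
 */
public class LinearPowerModel {

    private final double maxPowerWatts;   // power draw at 100% utilisation
    private final double idleFraction;    // share of peak power drawn when idle

    public LinearPowerModel(double maxPowerWatts, double idleFraction) {
        this.maxPowerWatts = maxPowerWatts;
        this.idleFraction = idleFraction;
    }

    /** Returns the estimated power draw (watts) for a CPU utilisation in [0, 1]. */
    public double getPower(double utilisation) {
        double idlePower = maxPowerWatts * idleFraction;
        return idlePower + (maxPowerWatts - idlePower) * utilisation;
    }

    public static void main(String[] args) {
        LinearPowerModel host = new LinearPowerModel(250.0, 0.7);
        System.out.printf("Idle: %.1f W, 30%% load: %.1f W, full load: %.1f W%n",
                host.getPower(0.0), host.getPower(0.3), host.getPower(1.0));
    }
}
```

Under these assumed values, a host running at 30% utilisation already draws roughly 80% of its peak power, which illustrates why consolidating load onto fewer hosts and switching the remainder off is attractive.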
3.2 power management techniques
Energy management techniques are incorporated in all aspects of system design. Beloglazov breaks these techniques into two subsections: static power management (SPM) and dynamic power management (DPM). SPM incorporates all design time power management methods, including complex gate and transistor design, power switching in circuits at a logical level and the incorporation of energy optimization techniques at the architecture level [12]. Dynamic power management (DPM) refers to the run-time adaptability of a system in correlation with resource demand. DPM technologies can be further subdivided into two sections: dynamic component deactivation (DCD) and dynamic performance scaling (DPS).
3.2.1 Dynamic Component Deactivation
DCD incorporates the switching of power states for components that do not support DPS techniques such as DVFS. Switching between power states, i.e. active-idle or idle-off, can result in significant energy consumption at the reinitialisation stage should the component be required at a later time; it is therefore necessary to ensure DCD occurs only when the energy saved through deactivation is greater than that accrued during reinitialization [12]. Benini et al. state that to apply DCD techniques, it must be possible to predict the workload [13]. This prediction, and its accuracy, is imperative to the performance of such techniques. Predictions are based on usage of the overall system to date and its possible use in the near future. An example given by Benini et al. is that of a timeout function on a laptop, where the laptop moves from active to idle after a period of time on the presumption that, having been idle for x minutes, it is likely to remain idle for an additional period of time [13]. Predictive policies rely on past data and its correlation to future events. Through the analysis of past performance and demands the system forms both predictive shut down and predictive wake up techniques.
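The timeout heuristic described above can be sketched as a simple decision rule: deactivate only when the component has been idle beyond a fixed timeout and the energy expected to be saved outweighs the reinitialisation cost. The class, method names and parameters below are illustrative assumptions rather than a policy taken from the cited work.

```java
/**
 * Sketch of a timeout-based dynamic component deactivation (DCD) decision:
 * switch a component off only when it has been idle beyond a fixed timeout
 * AND the energy expected to be saved while off outweighs the one-off cost
 * of reinitialising it later. All names and values are illustrative.
 */
public class TimeoutDcdPolicy {

    private final long   idleTimeoutMs;        // observed idle time before acting
    private final double idlePowerWatts;       // draw while idle but powered on
    private final double reinitEnergyJoules;   // one-off cost of waking up again

    public TimeoutDcdPolicy(long idleTimeoutMs, double idlePowerWatts,
                            double reinitEnergyJoules) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.idlePowerWatts = idlePowerWatts;
        this.reinitEnergyJoules = reinitEnergyJoules;
    }

    /**
     * @param idleSoFarMs     how long the component has already been idle
     * @param predictedIdleMs predicted further idle period (from past usage)
     * @return true if deactivating the component is expected to save energy
     */
    public boolean shouldDeactivate(long idleSoFarMs, long predictedIdleMs) {
        if (idleSoFarMs < idleTimeoutMs) {
            return false;                        // not idle long enough yet
        }
        double savedJoules = idlePowerWatts * (predictedIdleMs / 1000.0);
        return savedJoules > reinitEnergyJoules; // only act if saving beats wake-up cost
    }
}
```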
3.2.2 Dynamic Performance Scaling
DPS allows the application of energy saving techniques in hardware components with the ability to alter their frequency and clock speeds, mainly CPUs, when they are not fully utilized. This technique is known as DVFS. In order to save maximum energy, a system requires both frequency scaling, i.e. the ability to alter the clock speed, and voltage scaling. The implementation of such a technique is by no means straightforward: reducing the instruction processing capability reduces throughput and performance, which in turn increases a program's run-time and may not result in maximum energy savings. It is therefore necessary to balance the energy/performance ratio within a system through careful approximation.
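The reason combined voltage and frequency scaling is attractive can be seen from the standard first-order CMOS dynamic power relation, which is not quoted in the sources above but underlies DVFS. With C the switched capacitance, V the supply voltage and f the clock frequency,

P_dynamic ≈ C · V² · f

Since the sustainable clock frequency falls roughly in proportion to the supply voltage, lowering both reduces power roughly cubically while run-time grows only linearly, which is precisely why the energy/performance ratio must be balanced rather than the frequency simply minimised.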
In order to optimize this ratio, three common techniques are implemented. Interval based algorithms harness past system usage data and adjust voltage and frequency in line with predicted future use. Intertask algorithms distinguish the number of tasks in relation to the CPU in real-time systems and allocate resources appropriately; this can become complex in a system with unpredictable heterogeneous workloads. Intratask algorithms look at the data and individual components within a specific program and then provision resources appropriately [12].
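As a minimal illustration of the interval based approach, the sketch below picks a CPU frequency for the next interval from the utilisation observed over the previous one. The thresholds and the frequency table are assumed values for illustration only.

```java
/**
 * Sketch of an interval-based DVFS governor: the CPU utilisation measured
 * over the previous interval drives the frequency chosen for the next one.
 * Threshold values and the frequency table are illustrative assumptions.
 */
public class IntervalDvfsGovernor {

    private static final double[] FREQUENCIES_GHZ = {1.2, 1.8, 2.4, 3.0};

    /** Maps last-interval utilisation (0..1) to a frequency for the next interval. */
    public double nextFrequency(double lastIntervalUtilisation) {
        if (lastIntervalUtilisation < 0.3) {
            return FREQUENCIES_GHZ[0];   // lightly loaded: lowest frequency
        } else if (lastIntervalUtilisation < 0.6) {
            return FREQUENCIES_GHZ[1];
        } else if (lastIntervalUtilisation < 0.85) {
            return FREQUENCIES_GHZ[2];
        }
        return FREQUENCIES_GHZ[3];       // near saturation: full speed
    }
}
```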
3.3 summary
This chapter focused on the area of energy consumption in data centers. Following a brief introductory section highlighting energy consumption on a world scale, Section 3.1 focused on the specific areas of energy consumption within data centers, including tertiary elements such as PDUs and UPSs, and defined the PUE metric. Section 3.1.1 took a closer look at hardware specific consumption, particularly at server level. The chapter closed by reviewing the two key power management techniques, DCD and DPS.
4 VIRTUALIZATION
Although virtualization has seen increased prominence and usage since the early 1990's, it was originally developed as far back as 1964 by IBM as a method to increase the productivity of both the hardware and the user. In the 1960's many engineers, scientists and large scale research groups were using programs to carry out research; however, these programs were resource intensive, requiring the full use of the hardware system and the supervision of a researcher to run and record results.
This led to some pioneering work in areas such as the Compatible Time Sharing System (CTSS) at M.I.T. in the early 1960's [23]. CTSS allowed batch jobs to be run in parallel with users' requests to run programs. This in turn led to the creation of the control program and conversational monitor system (CP/CMS) in 1964, known as a second generation time sharing machine and built on the concepts of the earlier CTSS. CP provided separate computing environments, while CMS allowed for autonomy through sharing, allocation and protection policies [23], similar to the operations carried out by the virtualization layer in modern cloud environments.
4.1 modern day virtualization
In 1998 VMware conquered the task of virtualizing the x86 platform through a combination of binary translation and direct execution on the processor, allowing multiple guest OSs on a single host [83].
Virtualization is the faithful reproduction of an entire architecture in software, which provides the illusion of a real machine to all software running above it [40]. In an era of on-demand computing, the ability to virtualize a single server into multiple instances of virtual machines running separate guest OSs, with secure, reliable access to resources such as I/O devices, memory and storage, has proven imperative to the growth of cloud computing.
Virtualization of a single server into multiple VMs is achieved by placing an extra layer known as a hypervisor directly on top of the hardware and beneath the OS layer. This layer, also known as a virtual machine monitor (VMM), is responsible for providing total mediation between all VMs and the underlying hardware [65]. The VMM allows access to resources held in the infrastructure layer while ensuring isolation of VMs, which improves security and reliability. The VMM also spawns new VMs on demand, migrates VMs to existing or new instances when necessary and applies consolidation techniques by moving VMs away from underutilized hosts and powering these hosts down in order to conserve energy.
4.2 levels of virtualization
Virtualization can be applied using three different techniques: full virtualization, paravirtualization and/or hardware assisted virtualization. All three methods must deal, and have dealt, with the need to alter the privilege levels of an architecture, also referred to in the literature as ring aliasing or ring compression, to allow virtualization to take place. For example, in an x86 architecture there are four levels of privilege; the OS expects to occupy the most privileged level (ring 0) and therefore presumes it has direct access to the host it is placed on. By placing a virtualization layer underneath the OS, the levels of privilege are altered.
4.2.1 Full Virtualization
Full virtualization allows for the complete isolation of a guest OS from the underlying infrastructure. This allows an unmodified OS to run, using a hypervisor to trap and translate privileged instructions on the fly or through the use of binary translation [20]. Although full virtualization can carry high overheads due to the need to trap and translate privileged instructions, it does provide the most secure and isolated environment for VMs.
4.2.2 Paravirtualization
Paravirtualization continues to employ a hypervisor; however, this method requires the hypervisor to alter the kernel of the guest VM. The hypervisor alters the OS calls and replaces them with hypercalls, allowing for direct communication between the guest VM and the hypervisor without the need to process privileged instructions or to perform binary translation, thus decreasing overhead [83]. In doing this, paravirtualization reduces the need for binary translation and therefore significantly simplifies the process of virtualization [96].
4.2.3 Hardware Assisted Virtualization
Hardware assisted virtualization, also referred to in the literature as native virtualization [20], is an alternative method of virtualization that seeks to overcome the limitations of paravirtualization and the overheads that full virtualization incurs through binary translation. Both Intel and AMD support hardware assisted virtualization through the Intel VT and AMD-V virtualization extensions [20]. In order to address the problem of virtualization, and in particular the levels of privilege required for systems to run effectively and efficiently, Intel VT-x, which supports IA-32 processor virtualization, provides two separate forms of operation: the guest runs in VMX non-root operation and the hypervisor runs in VMX root operation, each of which provides four separate levels of privilege. This allows the guest OS to run at its expected ring 0 privilege level and provides the hypervisor with the ability to run at multiple privilege levels. In order to run this configuration, Intel added two extra transitions: from the guest to the hypervisor, known as a VM exit, and from the hypervisor to the guest, known as a VM entry. VM exits and entries are managed via a virtual machine control structure, which is subdivided into two sections, one dealing with the guest state and the other with the host state [78].
4.3 hypervisors
As highlighted in the above literature, in order to implement virtualization there is a need to deploy a hypervisor, commonly referred to as a VMM, on top of the hardware level within the system, also referred to in the literature as the bare-metal level. The hypervisor provides an intermediate layer between VMs and the underlying hardware. This layer allows for the total encapsulation of a VM, which provides stability, security and reliability against bugs or malicious attacks, and enables the mapping and remapping of existing and new VMs. The subsections below review the most commonly deployed hypervisors in data centers today.
4.3.1 Xen
Xen is an open source project whose hypervisor currently powers some of the largest cloud deployments today, such as Amazon Web Services, Google and Rackspace services [94]. Originating in the late 1990's at Cambridge University, the XenoServer computing infrastructure project proposed the creation of a simple hypervisor which allows users to run their own OS, with the added capability of running specifically designed applications directly on top of the hypervisor to improve performance and to support a substantial number of disparate guest OSs [32].
In 2002 Xen was released as an open source project and it has since seen four major updates. Xen put forward a paravirtualized architecture, citing the complexity of full virtualization as a major and unwelcome cost. The Xen developers believed that hiding virtualization from the guest OS risked correctness and performance, and that paravirtualization was necessary to obtain high performance, robustness and isolation [5]. In order to do this the hypervisor must cater for all standard application binary interfaces (ABI) and support a full range of OSs.
In 2005 Xen, in conjunction with Cambridge and Copenhagen Universities, introduced the design and implementation of live migration of VMs. This was a major step forward in hypervisor efficiency. Live migration could be completed with downtimes as low as 60 ms; it allowed for the decommissioning of the original VM once the transfer was complete, allowed media services and the like to be transferred without the need for users to reconnect, and allowed the VM to be transferred as a single unit, eliminating the need for the hypervisor to have knowledge of the individual applications within the VM. This further progressed the maintainability of data centers by further improving the ability to perform dynamic consolidation of VMs [21].
Today, Xen offers a large range of virtualization solutions for multiple architectures, including ARM and x86. It also provides the capability to virtualize a large range of OSs, including Linux, Solaris and Windows, through the use of full hardware assisted virtualization.
4.3.2 KVM
The kernel virtual machine (KVM) originated in 2006 as an open source project. KVM requires the Intel VT-x or AMD-V instruction sets to run, both of which were also made available in 2006. A KVM hypervisor allows for up to 16 virtual CPUs running full virtualization methods [95]. KVM leverages the hardware extensions provided by Intel and AMD to add a hypervisor to a Linux environment. Once this hypervisor is added to the environment, it also adds a /dev/kvm device node which allows users to create virtual machines, read and write to virtual CPUs, run a virtual CPU, inject interrupts and allocate memory via a memory management unit (MMU) for the translation of virtual addresses to physical addresses. This MMU consists of a page table which encodes the mapping of virtual addresses to physical addresses, a notification manager for page faults, and a translation lookaside buffer (TLB) and instruction set, all located on the chip to decrease table look-up time [39].
4.3.3 VMware
VMware is a hypervisor which is the result of research carried out at Stanford University [61]. In 1998 VMware built on this research and virtualized the x86 architecture through binary translation and direct processor execution [84]. The implementation of full binary translation allowed VMware to deploy full virtualization of its platform, as well as the ability of its guest VMs to host a range of OSs including Linux and Windows.
Originally VMware offered VMware Workstation, deployed as a hosted architecture which placed a virtualization layer as an application on the host OS. More recently, VMware ESX uses a hypervisor layer placed on bare metal, significantly increasing I/O performance [86].
Similar to Xen 3.0.1 and KVM, it utilises a data structure to track the translation of virtual pages to physical memory pages; shadow pages are kept in sequence with the pmap structure for the processor in order to minimise overheads. VMware DRS monitors VMs within a data center; by leveraging VMotion, which allows for live migration, and VM schedulers, it allocates and reallocates VMs as necessary. VMware HA monitors hosts for failures, allows for rapid redeployment of VMs from a failed host when necessary, and ensures that the storage required to facilitate this redeployment is available at all times within a cluster [86].
4.4 summary
This chapter reviewed the area of virtualization, beginning with the early stages initiated by IBM in the 1960's and continuing through to modern day virtualization. The second half of the chapter reviewed the different levels of and methods used in implementing virtualization, with the chapter coming to a close by examining the three most commonly deployed hypervisors today.
5 REINFORCEMENT LEARNING
Reinforcement Learning (RL) dates back to the early days of cybernetics and work in statistics, psychology, neuroscience and computer science [38]. From a purely computer science viewpoint, RL is a type of machine learning, where machine learning is viewed as the ability of computer programs to automatically improve through experience.
RL has been an area of research since the late 1950's, when Samuel first applied temporal difference (TD) methods in order to manage action values. Some years later, in 1961, Minsky is credited with developing the term RL [71]; however, it was the development of value functions and their mathematical characterization in the form of the Markov decision process (MDP) in the mid 1980's that helped propel its popularity as an artificial intelligence (AI) approach to problem solving [72]. The successful application of RL to disparate tasks, such as Tesauro's TD-Gammon or Barto's work on improving elevator performance through the use of RL and neural networks, has also elevated its appeal to researchers in recent times [74] [9].
This AI approach offers more flexibility than many of its counterparts, and this is a key part of what differentiates it from other forms of machine learning, including supervised and unsupervised learning. By this we mean that actions can be low-level non-critical decisions or high-level strategic methods; boundaries between an agent and its environment are not rigidly defined and can adapt to suit the given workspace or problem; and the time steps involved need not be chronological, but can be stage or task related to suit the problem domain.
5.1 agent / environment interaction
Within an RL framework the learner is commonly referred to as an agent, with everything outside of the agent referred to as its environment. Through a cyclical process of state-action-reward at discrete time steps, the agent learns an optimum policy.
As an agent progresses through the state space, its current action generally affects not only the immediate reward received, but also the probability of maximising future rewards. Therefore an "optimal action" must take into account not only the immediate reward but also the possible future reward when deciding which action to take, commonly referred to as delayed reward.
RL delayed reward problems are commonly modelled as MDPs. An MDP is a mathematical structure for the modelling of decisions under uncertainty, represented as a 4-tuple (S, A, T, R) [7]:

S - The state space, which in a reinforcement learning framework is referred to as the environment state.

A - The action space, representative of all possible actions available in a given state.

T - The transition function, giving the probability P that taking action a in state s will result in state s', defined as:

P^a_{ss'} = Pr{ s_{t+1} = s' | s_t = s, a_t = a }   (1)

R - The reward function: given any current state s and action a, together with any next state s', the expected value of the next reward is

R^a_{ss'} = E{ r_{t+1} | s_t = s, a_t = a, s_{t+1} = s' }   (2)
Therefore, we can view reinforcement learning as the ability to map states to actions in order to maximize a numerical reward. In order to achieve this, a recurring interaction at discrete time steps between the agent and environment is necessary, as laid out in Fig. 5.1. The agent receives a representation of the environment in the form of a state s_t. This allows the agent to select and return an action a_t, based on the agent's policy. At the beginning of the next time step the environment returns a new representation of the current state, s_{t+1}, and a numerical reward, r_{t+1}, based on the previous action a_t.
Figure 5.1: The agent-environment interaction in RL [73]
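The loop of Fig. 5.1 can be expressed directly in code. The sketch below is a generic agent-environment loop assuming integer state and action identifiers; the Environment and Agent interfaces here are illustrative only and are not the classes of the same names developed later in this thesis.

```java
/** Illustrative agent-environment interaction loop (cf. Fig. 5.1). */
interface Environment {
    int currentState();                 // returns the current state s_t
    double executeAction(int action);   // applies a_t, returns the reward r_{t+1}
    boolean isTerminal();
}

interface Agent {
    int selectAction(int state);        // chooses a_t according to the policy pi
    void observe(int state, int action, double reward, int nextState); // learning update
}

public class InteractionLoop {
    public static void run(Environment env, Agent agent) {
        while (!env.isTerminal()) {
            int state = env.currentState();            // environment presents s_t
            int action = agent.selectAction(state);    // agent chooses a_t via its policy
            double reward = env.executeAction(action); // environment returns r_{t+1}
            int nextState = env.currentState();        // and the new state s_{t+1}
            agent.observe(state, action, reward, nextState);
        }
    }
}
```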
5.2 learning strategies
5.2 learning strategies
Traditional learning strategies, commonly referred to as update functions, assign rewards at the end of a task via the relation of actual and predicted outcomes; however, such methods have proved to be resource intensive as regards memory, and can also be viewed as a static approach, unsuitable for more transformative problem domains. A more suitable learning strategy, known as temporal difference learning, provides a collection of methods for incremental learning specialized in the area of prediction problems [70]. Temporal difference methods do not require a full model of an environment in order to learn; rather, they update estimates based in part on previously learned estimates [87], without waiting for a final outcome, which is often referred to as bootstrapping [73].
A discount factor γ, which may range from 0 to 1, determines the importance of future rewards: a factor closer to 0 restricts the agent to considering only short-term rewards, while a value closer to 1 allows the agent to strive towards a greater long term reward. The learning rate α establishes the rate at which new information overrides old. A learning rate of 1 ensures that only the most recent information obtained is utilised, while a learning rate of 0 means no learning will take place.
5.2.1 Q-Learning
Q-learning is a form of model free TD learning proposed by Watkins [88]. Q-learning learns on an incremental basis, calculating a Q-value at each discrete time step as the estimated value of taking action a and thereafter following an optimal policy π. Q-learning maps these state-action transitions at each non-terminal discrete time step through the following update rule:

Q(s_t, a_t) ← Q(s_t, a_t) + α[ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]   (3)

A single iteration updates a single Q-value, combining the current reward r_{t+1} with the discounted estimate of future reward γ max_a Q(s_{t+1}, a), allowing progression towards an optimal policy π. The general form of the Q-learning algorithm is as follows:
Q-learning Algorithm
Initialize Q(s, a) arbitrarily; choose a policy π (e.g. ε-greedy)
Repeat (while s_t is not terminal):
    Observe s_t
    Select a_t using π
    Execute a_t
    Observe s_{t+1}, r_{t+1}
    Update Q(s_t, a_t) ← Q(s_t, a_t) + α[ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
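A compact tabular implementation of the update in Equation 3 might look as follows. The map-based Q-table, the key encoding and the parameter values (α = 0.1, γ = 0.9) are illustrative assumptions, not the Lr-RL implementation described later in this thesis.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal tabular Q-learning update (Equation 3). Parameter values are illustrative. */
public class QLearner {

    private final Map<String, Double> qTable = new HashMap<>(); // key: "state|action"
    private final double alpha = 0.1;   // learning rate
    private final double gamma = 0.9;   // discount factor

    private String key(int state, int action) {
        return state + "|" + action;
    }

    public double getQ(int state, int action) {
        return qTable.getOrDefault(key(state, action), 0.0);
    }

    /** Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_a' Q(s',a') - Q(s,a) ] */
    public void update(int state, int action, double reward, int nextState, int numActions) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < numActions; a++) {          // max over actions in s'
            maxNext = Math.max(maxNext, getQ(nextState, a));
        }
        double oldQ = getQ(state, action);
        double newQ = oldQ + alpha * (reward + gamma * maxNext - oldQ);
        qTable.put(key(state, action), newQ);
    }
}
```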
5.2.2 SARSA
The modified connectionist Q-learning algorithm, more commonly known as SARSA, was introduced by Rummery & Niranjan [66]. They question whether the use of γ max_a Q(s_{t+1}, a) provides an accurate estimate of a given state, particularly in large scale real world applications, and believe that for optimal performance γ must return to 0 for each non policy-derived action. To counteract this they proposed the following update function, now known as SARSA:
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \,]$ (4)
Rather than utilising γ max_a Q(s_{t+1}, a), they use a second state-action transition, Q(s_{t+1}, a_{t+1}), in the calculation of a given Q-value, thus negating the need to return γ to 0 for non policy-derived actions. SARSA, a name derived from the fact that it requires a quintuple of events (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}) in order to calculate its Q-values, is viewed as an on-policy method, as it takes into account the control policy by which the agent is moving and incorporates that into its update of action values. In comparison, Q-learning is viewed as an off-policy method, as it simply assumes that an optimal policy is being followed. The general form of the SARSA control algorithm is as follows:
SARSA Algorithm
Initialise the Q-map arbitrarily, set policy π
Repeat (while s is not terminal):
    Observe s_t
    Select a_t using π
    Execute a_t
    Observe s_{t+1}, r_{t+1}
    Select a_{t+1} using π
    Update Q: Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
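For comparison, the SARSA update replaces the max over next-state actions with the value of the action a_{t+1} actually chosen under π. A minimal sketch, reusing the hypothetical Q-value table from the previous listing:

// Equation (4): Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]
// Unlike Q-learning, the next action aNext is the one actually selected by the policy.
public static void sarsaUpdate(double[][] q, int s, int a, double reward,
                               int sNext, int aNext, double alpha, double gamma) {
    q[s][a] += alpha * (reward + gamma * q[sNext][aNext] - q[s][a]);
}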
5.3 action selection policy
One element of RL not shared with other machine learning techniques is that of exploration vs. exploitation. In order to learn a truly optimal policy an agent must explore all possible states and experience taking all possible actions, while on the other hand, in order to exploit an optimal policy and its associated rewards, an agent must follow the optimal policy. Commonly referred to as the dilemma of exploration and exploitation [73], it can have a great impact on an agent's ability to learn [76]. An agent that always exploits the best action of any given state, as predefined in a state model, is said to be following a greedy selection policy; however, such an implementation never explores, thus paying no regard to possibly more lucrative alternative actions.
5.3.1 ε-Greedy
An alternative selection policy is known as ε-greedy. This method introduces a parameter epsilon (ε) which controls the rate of exploration. Epsilon is set to a desired probability and at each time step is compared to a randomly drawn number; should the random number fall below ε, a random action is chosen, thereby providing an element of exploration. As an agent converges closer to an optimum policy, epsilon may be reduced to reflect the lowered need for exploration.
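A minimal sketch of ε-greedy selection over one row of Q-values; decaying ε as the agent converges is left to the caller, and the names are illustrative.

import java.util.Random;

public class EpsilonGreedy {
    private final Random rng = new Random();

    // Explore with probability epsilon, otherwise exploit the highest-valued action.
    public int select(double[] qRow, double epsilon) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qRow.length);       // exploration: uniform random action
        }
        int best = 0;
        for (int a = 1; a < qRow.length; a++) {    // exploitation: greedy action
            if (qRow[a] > qRow[best]) {
                best = a;
            }
        }
        return best;
    }
}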
5.3.2 Softmax
ε-greedy remains a popular method for providing an exploration allowance; however, a drawback is the equal probability of choosing the worst or the best action when exploring. An alternative which goes some way towards addressing this issue is known as the Softmax action selection policy. When used in an RL paradigm, an action's probability of selection is a function of its estimated value, increasing the probability of higher-value actions being chosen [90]. Softmax action probabilities are commonly obtained via the Gibbs distribution; however, estimates can be calculated in many different ways, often dependent on the underlying schema of the system in which an agent is deployed. Similarly, the benefit of Softmax over ε-greedy is not clear-cut, as it too largely depends on the environment in which they are applied [73].
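A sketch of softmax selection via the Gibbs (Boltzmann) distribution; the temperature parameter tau, which controls how strongly higher-valued actions are favoured, is an assumption of this example.

import java.util.Random;

public class SoftmaxSelection {
    private final Random rng = new Random();

    // P(a) = exp(Q(a) / tau) / sum_b exp(Q(b) / tau); sample an action from this distribution.
    public int select(double[] qRow, double tau) {
        double[] weights = new double[qRow.length];
        double total = 0.0;
        for (int a = 0; a < qRow.length; a++) {
            weights[a] = Math.exp(qRow[a] / tau);
            total += weights[a];
        }
        double roll = rng.nextDouble() * total;
        double cumulative = 0.0;
        for (int a = 0; a < qRow.length; a++) {
            cumulative += weights[a];
            if (roll <= cumulative) {
                return a;
            }
        }
        return qRow.length - 1;                    // fallback for floating point rounding
    }
}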
5.4 reward shaping
One of the main limitations of RL is its slowness in converging to an optimum policy [52]. In an RL framework, value functions, otherwise referred to as Q-values, are traditionally initialised with either pessimistic, optimistic or random values [24]. These methods tend to overlook the fact that in real world applications a developer may hold key domain expert knowledge that, if incorporated, can help an agent-based system converge to a level of optimum performance at a much quicker rate. The leveraging of such knowledge is known as knowledge-based reinforcement learning.
One such approach is known as reward shaping: the introduction of a domain-expert-designed reward in addition to the natural system reward. Due to the intrinsic relationship of rewards, states and actions, the accurate shaping of rewards is vital to the overall effectiveness of an agent. Poorly designed reward shaping can not only delay convergence to an optimal policy, but can in fact be detrimental to learning, as seen by Randlov and Alstrom, where an agent learning to ride a bike actively pursued a path away from the goal because the cumulative reward for correcting its orientation was greater than that for reaching the goal [64].
5.4.1 Potential Based Reward Shaping
Ng et al. [60] introduced potential based reward shaping (PBRS) in order to optimize the method of shaping rewards and, in turn, prevent the problems highlighted by the Randlov and Alstrom study [64]. The potential based reward is calculated as the difference in potential between the current state s and the next state s', and is formally defined as
$F(s, s') = \gamma\,\Phi(s') - \Phi(s)$ (5)
This research has proven that applying PBRS with a single RL-based agent, in both finite and infinite state spaces, does not alter the agent's optimal policy but does decrease the convergence time significantly.
This is best seen in Fig. 5.2, taken from Wiewiora et al., which illustrates the convergence of a PBRS-based agent against that of a non-PBRS-based agent on the well known RL problem known as mountain car [92]. It is clearly visible that the PBRS-based agent begins much closer to the optimal policy, greatly outperforming the standard agent-based model.
Figure 5.2: PBRS effect [92]
5.5 summary
This chapter focuses on the A.I approach known as reinforcement learning. Section 5.1
looks at the agent-environment interaction and the modelling of RL delayed rewards as
MDPs. In Section 5.2 temporal difference learning strategies including Q-learning and
SARSA are explored in detail. This is followed by the explanation and evaluation of the
action selection policies ε-greedy and softmax. The chapter concludes by surveying the
advanced RL technique known as PBRS.
6
R E L A T E D   R E S E A R C H
Cloud computing leverages the ability to virtualize the key constituents which form the lowest level of the cloud architecture, IaaS, principally large scale data centers. Virtualizing the large congregations of nodes typical of modern day data centers into multiple independent virtual machines executing on a single node not only allows for the elasticity of services, but plays a key role in the high-level adherence to SLAs and the maximum utilization of resources which underpin the foundations of cloud computing, while providing maximum return on investment for service providers. However, due to the dynamic nature of cloud services and their demands, the static or offline management of these VMs places significant restrictions on all of these key principles.
In recent years much research has been undertaken focusing on combining the areas
of energy efficiency and dynamic resource selection and allocation policies. This research
can be categorized into the following three sections.
• Threshold and Non-Threshold Approach
• Artificial Intelligence Based Approach
• Reinforcement Learning Approach
6.1 threshold and non-threshold approach
The main area of research concentrates at the machine or host level. For example, Nathuji and Schwan proposed a VirtualPower architecture which implements the Xen hypervisor with minimal alterations to the hypervisor [59]. Each host contains a local power management module (PM-L) residing locally as a controller in the driver domain, known as a Dom0 module. When a guest OS attempts to make power management decisions, these calls are trapped by the hypervisor due to their privilege levels; the VirtualPower package then passes these trapped calls to the PM-L, where decisions on power management can be made based on the VirtualPower management rules contained in the Dom0 controller. However, while this research addresses local policies, it fails to address global policies for their suggested global power management (PM-G) module.
Kuysic et al. introduced a proactive look-ahead control algorithm [44]. The algorithm, known as LLC, proposes to minimize CPU power usage and SLA violations while maximising providers' profits. It uses a Kalman filter, a quadratic estimation algorithm, to estimate workload arrivals and supply VMs accordingly. This approach requires a complex learning-based structure in order to predict incomes, which in turn increases computational overhead. The research conclusions highlight this complexity as a serious issue, especially when dealing with discrete input values with exponential increases in worst-case complexity, where the increase in control options accrues a large increase in the computational time required by the LLC controller: a data center with 15 hosts required 30 minutes of execution time, which would be unrealistic for implementation in large scale data centers.
Cordosa et al. proposed leveraging existing parameters within the Xen and VMware packages to alter the manner in which VMs contend for power regardless of workload priority [19]. Parameters provided by the Xen and VMware hypervisors, such as min, allow for the allocation of a minimum amount of resources to any given VM; the max parameter allows the maximum resources applied to any given VM to be set; while the shares parameter allows a developer to set the ratio of CPU allocation between high and low priority VMs. By allocating high levels of minimum resources to high priority VMs and limiting the allocation to low priority VMs, they hope to improve overall performance. The authors carried out their experiments using VMware ESX servers; however, the min, max and shares thresholds were designated prior to run-time, i.e. statically, with no alternative for dynamic adjustment during run-time, thus limiting the suitability of the research for implementation in a real world cloud data center due to the heterogeneous nature of evolving applications. The research assumes that pre-existing maps of SLA agreements exist and uses these as input parameters, but fails to outline the number of SLA violations that result from applying the approach outlined.
Verma et al. implemented a power-aware application placement framework called pMapper, designed to utilize power management capabilities such as CPU idling, DVFS and consolidation techniques that already exist in hypervisors such as Xen [81]. These techniques are leveraged via separate modules: the performance manager, which has a global overview of the system and receives information such as SLAs and QOS parameters; the migration manager, which deals directly with the VMs to implement live migration; the power manager, which communicates with the infrastructure layer to manage hardware energy policies; and finally the arbitrator, which decides on information supplied from the above mentioned policies for the optimal placement of VMs through a bin packing algorithm. At implementation stage pMapper was utilised to solve a cost minimization problem which considers power-migration cost and, similar to Cordosa et al., fails to address SLA violations [19].
In additional research, Verma et al. suggest that server consolidation can be viewed in three forms [82]. The first is static, where VMs or applications are placed on servers for an extended period such as months or years; the second is semi-static, for daily and weekly usage; and the third is dynamic, for VMs and applications with execution times ranging from minutes to hours. The authors highlight that tools currently exist to manage such structures, but are rarely used, and administrators often prefer to wait for offline migration to decide on placements. Although the paper highlights three forms of consolidation, it deals only with the static and semi-static forms and, much like the research of Cordosa et al., this limits the suitability of the research for implementation in a real world cloud data center due to the heterogeneous nature of evolving applications.
Jung et al. propose a hybrid system of online/offline collaboration, analysing data based on system behaviour and workloads online in order to feed a decision tree structure offline [36]. This approach allows for the modelling of large scale, complex configuration problems and reduces overheads by removing the decision model from the run-time environment. These models are used as the basis for creating online multi-tier queues in an attempt to
reach peak utilization. This research was furthered in 2009, when Jung et al. created a middleware for cost-sensitive adaptation and server consolidation, utilising the multi-tier queues developed in their earlier study and applying a best-first search graph algorithm with cost-to-go as the transition costs, together with a layered queuing network solver (LNQS) predictive modelling package [37]. However, this was modelled solely on a single web application and, similar to Cordosa and Verma, this limits the suitability of the research for implementation in a real world cloud data center.
Threshold-based approaches to the autonomic scaling of resources are more commonplace, with cloud providers such as Amazon EC2, through its Auto Scaling software, and RightScale implementing such policies. Threshold-based approaches are based on the premise of setting upper and lower bound thresholds that, when broken, trigger the allocation or consolidation of resources as necessary.
Research carried out in the area of threshold based approaches includes a proposed architecture known as "the 1000 island solution architecture" by Zhu et al. [99]. Similar to Verma, they consider three separate application categories based on time periods, and dedicate an individual controller to each category. The largest timescale is hours to days, the second is minutes and the final one is seconds. Each group is regarded as a pod and has a node controller managing the dynamic allocation of the node's resources; within the node controller lies a utilization controller, which computes resource consumption and estimates the future consumption required in order to meet the SLA. This information is passed to a global arbitrator module which decides the overall allocation of resources. The arbitrator module associates individual workloads with priority levels in order to schedule work appropriately, with high priority work getting first allocation of resources. The pod controller monitors node utilization levels, setting 85% CPU utilisation as an upper threshold and 50% as a lower threshold, and using this information it then migrates VMs as necessary. The pod set controller studies historic demands and estimates future demands using an optimization heuristic approach to formulate policies. Although the results of the experiments in this research are positive, the authors highlight the need to scale up the size of the test bed to realistically evaluate its strength in a real world application; to the best of our knowledge this has not yet been achieved.
6.2 artificial intelligence based approach
John McCarthy defines AI as the science and engineering of making intelligent machines,
especially intelligent computer programs [54]. This ability to make intelligent computer
programs forms the basis for the following concepts, in which researchers apply a range
of AI approaches as a tool for the optimization of resource allocation in a cloud environ-
ment.
One such example is that of Hu et al. who considers a genetic algorithm (GA) approach
to the scheduling of resources, in particular VMs [34]. Utilizing a GA in conjunction with
historic performance data, Hu attempts to predict the effect of multiple possible sched-
ules in advance of any deployment in order to apply the best load balance.
Wei et al. deploy a similar, game-theoretic approach to resource optimization, scheduling resources through a two-step cost-time optimization algorithm [89]. Each agent first solves its own problem optimally, independently of the others; an evolutionary optimization algorithm then takes this information, collates the data, estimates an approximate optimal solution and donates resources as necessary.
Particle swarm optimization, a concept first introduced by Kennedy [63], was deployed by Pandy in 2010 to optimise the mapping of workflow to resources in a cloud environment [62]. Each particle represents a mapping of resources to tasks in a five-dimensional space, i.e. each particle has five jobs; these particles are released into the search space mapping their best locations, in this case the best task-to-resource allocation, in order to determine the optimal combined workflow.
Moghaddam et al. in 2014 introduced the concept of a multi-level grouping genetic algorithm (MLGGA) [58]. The researchers highlight the fact that the problem of optimal VM placement is NP-hard and can be viewed as a bin packing problem. Due to this bin packing nature, they use a grouping genetic algorithm (GGA) as their base algorithm and introduce a multi-level grouping concept to optimize the placement and grouping of VMs and, in turn, reduce the carbon footprint. While the researchers' experiments are both substantial and rigorous, proving a lowering of the carbon footprint, the research fails to address some of the key aspects of VM placement in data centers
such as quality of experience, security, QOS and SLAs.
Sookhtsaraei et al., similar to Moghaddam, introduce a genetic algorithm solution as an approach to optimizing bin packing for VMs [69]. Using GGA as a base, they create an algorithm called CMPGGA, which considers bandwidth, CPU and memory along with hosts and VMs as input parameters, and outputs an optimized mapping of VMs to hosts. While CMPGGA can claim an improvement in reducing operational costs, the research fails to address QOS or SLA violations. Without considering these violations, which can result in monetary penalties for service providers, it is impossible to fully quantify the operational improvements.
6.3 reinforcement learning based approach
A more recent approach is the application of RL agents to optimize resource management in the cloud. Barret et al. propose a parallel RL framework for the optimisation of scaling resources in lieu of the threshold based approach [7]. The approach requires agents to approximate optimal policies and share their experiences with a global agent to improve overall performance, and has proven to perform exceptionally well, despite the removal of traditional rigid thresholds.
Bahati, meanwhile, proposes incorporating RL simply in order to manage the existing threshold based rules [4]. A primary controller applies these rules to a system in order to enforce its quality attributes, while a secondary controller monitors the effects of implementing these rules and adapts the thresholds accordingly.
Another approach adopted by Teasauro introduces a hybrid RL approach to optimising
server allocation in data centers through the training of a nonlinear function approxima-
tor in batch mode on a data set while an externally trained policy makes management
decisions within a given system [75].
Finally, Farahnakian et al. and Yuan et al. present dynamic RL techniques to optimize the number of active hosts in operation in a given time-frame [97] [27]. An RL agent learns an online host energy detection policy and dynamically consolidates machines in line with optimal requirements. Following the detection of over-utilized hosts, both studies employ Beloglazov's minimum migration time selection policy in order to identify VMs for migration [11].
All of the above RL approaches have proven a statistical advantage over threshold based approaches, and this forms the motivation for this research to implement and evaluate RL at a lower level of abstraction as a policy for the selection of VMs.
6.4 virtual machine selection policies
Beloglazov et al.'s study carried out in 2011 remains one of the most highly cited and accepted pieces of research in relation to the consolidation of VMs while maximizing performance and efficiency in cloud data centers [11]. Beloglazov examines the dynamic consolidation of VMs while considering multiple hosts and VMs in an IaaS environment. Unlike numerous other research papers, Beloglazov models SLAs as a key component of a solution to VM consolidation.
Beloglazov's proposed algorithm can be broken into three sections: overloading/underloading detection, VM selection and VM placement.
Overload detection: building on past research, Beloglazov suggests an adaptive selection policy known as Local Regression (LR) for determining when VMs require migration from a host in order not to violate SLAs [10]. Local regression, first proposed by Cleveland, allows for the analysis of a local subset of data, in this case hosts [22]. Given an over-utilization threshold along with a safety parameter, LR decides that a host is likely to become over-utilised if its current CPU utilization multiplied by the safety parameter is larger than the maximum possible utilization.
VM selection: virtual machines v are placed on a migration list based on the shortest period of time required to complete the migration, where the minimum time is taken as the utilized RAM divided by the spare bandwidth of the host h. The policy chooses the appropriate VM v through the following equation, where RAM_u(a) is the amount of RAM currently utilized by VM a, and NET_h is the spare network bandwidth available on host h.
$v \in V_h \;\Big|\; \forall a \in V_h,\ \frac{RAM_u(v)}{NET_h} \le \frac{RAM_u(a)}{NET_h}$ (6)
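In code, equation (6) amounts to choosing the migratable VM with the smallest ratio of utilised RAM to the host's spare bandwidth. The sketch below uses simplified stand-in types rather than CloudSim classes.

import java.util.List;

public class MinimumMigrationTimeSelection {

    // Simplified stand-in for a migratable VM on the over-utilised host.
    public record CandidateVm(String id, double usedRamMb) {}

    // Pick the VM whose RAM can be copied fastest over the host's spare bandwidth.
    public CandidateVm select(List<CandidateVm> migratable, double spareBandwidthMbPerSec) {
        CandidateVm best = null;
        double bestTime = Double.MAX_VALUE;
        for (CandidateVm vm : migratable) {
            double migrationTime = vm.usedRamMb() / spareBandwidthMbPerSec; // RAM_u(v) / NET_h
            if (migrationTime < bestTime) {
                bestTime = migrationTime;
                best = vm;
            }
        }
        return best;
    }
}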
Beloglazov's research proves that the dynamic VM consolidation algorithm Lr-Mmt significantly outperforms static policies such as DVFS and non power-aware approaches. It also outperforms the following dynamic policies.
6.4.1 Maximum Correlation
The maximum correlation policy is based on the premise that the stronger the inter-relationship of applications running on an over-utilized server, the higher the probability the server will overload, as highlighted by Verma et al. [81]. The maximum correlation policy finds a VM v that satisfies the following condition, where R² denotes the coefficient of multiple correlation of a VM's CPU utilization with that of the other VMs on the host.
$v \in V_h \;\Big|\; \forall a \in V_h,\ R^2_{x_v}(x_1, \ldots, x_{v-1}, x_{v+1}, \ldots, x_n) \ge R^2_{x_a}(x_1, \ldots, x_{a-1}, x_{a+1}, \ldots, x_n)$ (7)
6.4.2 Minimum Utilization Policy
The Minimum Utilization Policy is a simple method to select VMs from overloading hosts.
The policy chooses a VM based solely on the minimum utilization of a host, calculated
in Millions of instructions per second (MIPS). The policy is repeated until the host is no
longer considered as being overloaded [79].
6.4.3 The Random Selection Policy
The Random Selection Policy is another simple method to select VMs from overloading
hosts. The policy chooses a VM randomly to migrate. The policy is repeated until the
host is no longer considered as being overloaded [79].
6.5 research group context
This research was undertaken as part of a wider research group led by Dr. Enda How-
ley. The research group is known for research in the area of Multi-Agent Systems, Cloud,
Swarm, Smart Cities, Social Network Analysis & Simulation and Data Analytics. The fol-
lowing section reviews just a subset of research carried out by past and present members
of the group in the area of Cloud and RL.
Barret et al. present a novel approach to workflow scheduling in a cloud environment [6]. A workflow architecture estimates the average execution time and cost of a task, which are passed to multiple solver agents; these, through the use of GAs, produce various possible schedules. An MDP agent takes these possibilities and develops an optimal schedule for the workflow execution. Results show that the MDP agent can optimally choose a schedule despite an environment having varying loads and data sizes.
Further work by Barrett et al. includes the automation of resource allocation in the cloud through the use of an RL multi-agent approach [7]. Each agent addresses incoming workloads; on the basis of these requests an agent must approximate an optimal policy for resource allocation, and the agents share this information amongst each other before finally forwarding optimal scheduling policies to an instance manager which allocates VMs based on this advice. Results show that by parallelising the scheduling process the time taken to converge is greatly reduced and the framework can effectively select VMs of varying types for the required workload.
Mannion et al. present a parallel learning RL algorithm which utilizes heterogeneous agents [51]. Each of these heterogeneous agents learns in parallel on a partitioned subset of the overall problem. The knowledge and experience of these agents is then made available to a master agent, where the values are used for Q-value initialisation. This parallel approach has proven to outperform the standard Q-learning approach, resulting in increased learning speed and a lower step-to-goal ratio.
This work is advanced where Mannion et al. introduce this parallel learning of partitioned action spaces to a smart city environment and traffic signal control [49]. Results show significant improvement with the use of action space partitioning compared to a standard RL approach. Mannion also investigates the area of potential based reward
systems to improve performance in the learning of traffic signal control [50]. Comparing a potential based reward agent with a standard agent, Mannion shows that not only does learning speed increase, but queue and delay times are also reduced.
6.6 summary
This chapter reviews the key literature in the area of resource allocation, selection and scheduling in a cloud environment. Section 6.1 explores the traditional static, threshold and non-threshold approaches to resource management. Section 6.2 progresses to analyse more dynamic approaches to resource management through the application of various A.I approaches ranging from GAs and PSO to game theory. Section 6.3 focuses on RL as a specific method of resource scheduling, with work from Barret and Bahati providing key examples of resource scheduling via RL. Section 6.4 reviews pertinent literature from Beloglazov in the area of VM selection algorithms, including minimum migration time and maximum correlation. The chapter concludes by highlighting the role of this research in relation to the wider research group.
Part III
M E T H O D O L O G Y
7
C L O U D S I M
7.1 overview
The CloudSim toolkit was chosen as an appropriate simulation platform as it allows for the modelling of a virtualized IaaS environment and is the basis of much leading research in the area of cloud computing, particularly energy conservation and resource allocation [47] [91] [68] [16].
The CloudSim framework is a Java based simulator developed by the CLOUDS laboratory, University of Melbourne. It allows for the representation of an energy-aware data center with LAN-migration capabilities. In keeping with industry standards, 300 second (5 minute) intervals are used to establish whether a host is over-utilised and requires migration of VMs. The default ceiling threshold for utilization is 100% with an added safety parameter of 1.2. This safety parameter acts as an over-utilisation buffer: for example, a host determined to be 85% utilised is multiplied by the safety parameter 1.2, resulting in a utilisation of 102%, and is therefore deemed over-utilised.
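The over-utilisation check described above reduces to a single comparison; a minimal sketch, where the 1.2 safety parameter matches the default quoted in the text:

public final class OverUtilisationCheck {

    private static final double SAFETY_PARAMETER = 1.2;

    // A host is deemed over-utilised when utilisation x safety parameter exceeds 100%,
    // e.g. 0.85 * 1.2 = 1.02 (102%), so an 85% utilised host is treated as over-utilised.
    public static boolean isOverUtilised(double cpuUtilisation) {
        return cpuUtilisation * SAFETY_PARAMETER > 1.0;
    }
}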
7.2 cloudsim components
CloudSim is an event-driven application written in the Java programming language, containing over 200 classes and interfaces for the complete simulation of a cloud environment. The following section highlights the main and most important classes, as highlighted by Buyya et al. and Calheiros et al. [16] [18]. Fig. 7.1 contains a CloudSim class design diagram.
CloudInformationServices: The CloudInformationServices (CIS) class represents an entity
which provides the registration, indexing and modelling services for a data center cre-
ated within a simulation. A host from a data center registers its details with CIS, which
in-turn shares these details with the data center broker class which can then directly pro-
vide workloads to a host.
DataCenter: This class, which extends SimEntity, instantiates a data center, assigns a set of allocation policies for bandwidth (BW), memory and storage, and deals with the handling of VMs. This class is extended within the CloudSim framework as PowerDatacenter and NetworkDatacenter to allow for customised research, such as power reduction or network related research.
DataCenterCharacteristics: The DataCenterCharacteristics class allocates the static proper-
ties of a data center such as OS, management policy, time and costs.
Host: The Host class represents a physical resource such as a server which hosts VMs.
The class contains the internal policies for BW, processing power and memory for a single
instance of a host.
Vm: The Vm class represents a VM which is contained within a host. The class allows for the processing of cloudlets submitted from the DataCenterBroker class in accordance with its capacity, defined by its memory, processing power, storage size and provisioning policy. Similar to Datacenter, it is extended within the CloudSim framework as PowerVm or NetworkVm to allow for customised research such as power reduction or network related research.
Cloudlet: The Cloudlet class allows for the instantiation of a Cloudlet object, tracks
Cloudlet movement and allows for the cancellation, pausing or removal of a cloudlet
from the CloudletList(). A Cloudlet in CloudSim contains a workload assigned to a VM.
DataCenterBroker: This class represents a broker acting on behalf of a client/user. The
broker queries CIS and retrieves the hostList containing information on available VMs and
their respective specification allowing for the broker to directly assign cloudlets to VMs
with the necessary capability to achieve the customers QOS demands.
SimEntity: The SimEntity class is an abstract class which, when extended, represents a single simulation. The startEntity() method is invoked to begin a simulation; once started, the processEvent() method is called repeatedly to process all events held in the deferredQue(). Finally, the shutdown() method is invoked just prior to the termination of a simulation, which allows for events such as printing to a log file. All simulations must invoke the SimEntity class.
RamProvisioner: This is an abstract class which provides the necessary methods for RAM provisioning policies for VMs inside a host. It must be extended by researchers to configure custom RAM policies; otherwise CloudSim will use the RamProvisionerSimple class by default.
BwProvisioner: The BwProvisioner class is an abstract class which provides the basic methods necessary to allocate a bandwidth allocation policy. It must be extended by researchers to configure custom BW policies; otherwise CloudSim will use the BwProvisionerSimple class by default.
7.3 energy aware simulations
7.3.1 Initialising an Energy Aware Policy
Initialising an energy aware policy is possible by accessing the org.cloudbus.cloudsim.power.planetlab package located in the examples folder. This package contains an array of power aware simulations including Lr-Mmt, Lr-Mc and Lr-Mu. In order to create a new policy one must locate the main class within this package, from where CloudSim instantiates a new PlanetLab runner, providing it with the necessary information.
7.3.2 Creating a Selection Policy
The creation of a new selection policy is possible by accessing the org.cloudbus.cloudsim.power package located in the source folder; this package contains all allocation and selection policies for VMs. It also contains the PowerVmAllocationPolicyMigrationAbstract class, which invokes the method getVmsToMigrateFromHosts(). This method calls for the selection of a VM from an overloaded host. It is from this point that the selection policy instantiated by the user is invoked, and this is the key location of interaction between new or existing selection policies and CloudSim.
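As an illustration, a new selection policy can be sketched as a subclass of the PowerVmSelectionPolicy abstract class; the listing below assumes the CloudSim 3.x API (getVmToMigrate and getMigratableVms), so method signatures should be checked against the CloudSim version in use.

import java.util.List;

import org.cloudbus.cloudsim.Vm;
import org.cloudbus.cloudsim.power.PowerHost;
import org.cloudbus.cloudsim.power.PowerVm;
import org.cloudbus.cloudsim.power.PowerVmSelectionPolicy;

// Skeleton only: the placeholder decision below is where an RL agent would act.
public class PowerVmSelectionPolicyExample extends PowerVmSelectionPolicy {

    @Override
    public Vm getVmToMigrate(PowerHost host) {
        List<PowerVm> migratableVms = getMigratableVms(host); // helper from the abstract class
        if (migratableVms.isEmpty()) {
            return null;
        }
        // Placeholder: map the host's state to an action and return the chosen VM.
        return migratableVms.get(0);
    }
}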
7.4 hardware
A data center comprising 800 physical servers, consisting of 400 HP ProLiant ML110 G5 and 400 HP ProLiant ML110 G4 servers, is the default data center topology. Alterations can be made to the hardware setup via the Constants class located in the org.cloudbus.cloudsim.power package. This class also provides the option of altering other key constants in relation to VM types and sizes, scheduling intervals, bandwidth and storage.
7.5 workload
The workload comes from a real world IaaS environment. PlanetLab files within the CloudSim framework contain data from the CoMon project, representing the CPU utilization of over 100 VMs from servers located at 500 locations worldwide. In order to produce an
Figure 7.1: CloudSim class structure [12]
accurate and reliable experiment, the algorithms were deployed to represent a one month time period; to achieve this, the PlanetLab files were sampled at random to create a 30 day workload. Each PlanetLab file contains 288 values representative of CPU workloads. VMs are assigned these workloads on a random basis in order to best represent the stochastic characteristics of workload allocation and demand within an IaaS environment. Each VM corresponds to an Amazon EC2 instance type, except that each is single core, reflecting the fact that the workload was retrieved from single core VMs. The 288 CPU values, when used with CloudSim's default monitoring interval, represent 24 hours of data center capacity.
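As an illustration of the trace format described above, each PlanetLab file can be read as a simple list of utilisation samples; the assumption of one value per line and the class name are illustrative.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public final class PlanetLabTrace {

    // Reads a PlanetLab-style trace file: typically 288 CPU utilisation samples, one per line.
    public static double[] read(Path traceFile) throws IOException {
        List<String> lines = Files.readAllLines(traceFile);
        double[] cpuUtilisation = new double[lines.size()];
        for (int i = 0; i < lines.size(); i++) {
            cpuUtilisation[i] = Double.parseDouble(lines.get(i).trim());
        }
        return cpuUtilisation;
    }
}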
7.6 summary
This chapter introduces the simulation environment used in the remainder of the thesis
known as CloudSim, including details on its class structure, the necessary alterations
needed to introduce a new energy aware simulation and the default hardware and work-
loads provided by the simulator.
8
A L G O R I T H M D E V E L O P M E N T
In order to produce an RL selection algorithm for VMs, some additional information must be provided to register the RL policy and measure its effects, and the RL framework itself must be created. The following chapter outlines in detail the necessary additional classes and alterations.
8.1 registering a selection policy
In order to register a new selection policy, the method getVmSelectionPolicy() from the class RunnerAbstract, located in org.cloudbus.cloudsim.examples.power, must be altered to include the name of the new policy, which is provided on the instantiation of a simulation. This class also allows the user to alter the name of the output folder for the compilation of results if required.
8.2 recording of results
Certain key metrics are automatically compiled by CloudSim at the end of each simulation. However, these results are a combined metric of the overall performance, for example the overall energy consumed or the overall number of migrations. For the accurate detailing of the effect a new policy has on the data center, it is important to measure key information such as energy, number of migrations and SLA violations on an ongoing basis at discrete intervals. It is possible to do so in the Helper class located in org.cloudbus.cloudsim.examples.power. It is from this class that key metrics are printed to file at the end of each simulation; by introducing new methods it is possible to measure these key metrics on a much finer timescale.
8.3 additional classes
The following section describes, at a high level, the additional classes required for the RL framework, a schematic of which can be seen in Fig. 8.1.
8.3.1 Lr-RL
The LrRl class is located in the package org.cloudbus.cloudsim.examples.power.planetlab. It contains the main method and is where the simulation is instantiated. This class supplies the workload, the names of the selection and allocation policies and the safety parameter to PlanetLabRunner in order to begin the simulation.
8.3.2 RlSelectionPolicy
The RlSelectionPolicy class is located in the package org.cloudbus.cloudsim.power. It overrides the default VM selection policy within CloudSim and also acts as a controller class, conversing between CloudSim, the Environment class and the Agent when necessary.
8.3.3 Environment
The Environment class carries out all functions necessary to accumulate the information required for the Agent to make a decision; for example, the Environment retrieves the state, produces a list of all possible actions in the given state and calculates rewards, all of which are utilized by the Agent class.
8.3.4 Agent
The primary role of the Agent class is to choose a VM for migration by one of two possible methods: either following a SARSA policy or an ε-greedy policy. The Agent also contains a "brain", in this case a matrix in which it stores, updates and reads Q-values as required.
8.3.5 Algorithm
The role of the Algorithm class is to implement the requested Q-value estimation learning strategy: in this case, Watkins' Q-learning or Rummery and Niranjan's SARSA algorithm.
8.3.6 RlUtilities
RlUtilities class contains all functions necessary for the accumulation and accurate mea-
surement of the required metrics.
Figure 8.1: The reinforcement learning CloudSim architecture
8.4 summary
This chapter outlined the creation of an RL framework in CloudSim, including the necessary alterations to the existing simulator and the additional classes required to implement an agent based approach for VM selection.
9
I M P L E M E N T A T I O N
In order to develop an RL algorithm in any system, two key areas specific to the environment in which the algorithm is deployed must be addressed: the state-action space and the low level implementation of the learning strategy. This chapter addresses both of these issues in relation to an IaaS environment.
9.1 state-action space
RL techniques can suffer from a far-reaching state-action space, which limits the effectiveness and capabilities of an RL agent. Therefore, to incorporate an RL algorithm into an IaaS environment, an appropriate state-action range must first be defined. The state space s is defined as the current host utilization h_u, returned as a percentage, which confines the state space to a range of 0-100. It is obtained through the following equation, where virtual machine utilization vm_u is defined as a migratable VM's utilization and n is the number of migratable VMs.
$s = \sum_{i=1}^{n} \frac{vm_u(i)}{h_u} \cdot 100$ (8)
The action space a is represented as the vm_u of a VM relative to its assigned host h, returned as a percentage, which also allows the action space to range from 0-100.
$a = \frac{vm_u(i)}{h_u(h)} \cdot 100$ (9)
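A sketch of the percentile state-action mapping from equations (8) and (9); taking h_u as the host's capacity, values are rounded into integer buckets confined to the 0-100 range. The method and variable names are illustrative.

public final class StateActionSpace {

    // State (equation 8): total utilisation of the migratable VMs as a percentage of the host.
    public static int state(double[] migratableVmUtilisation, double hostCapacity) {
        double sum = 0.0;
        for (double vmU : migratableVmUtilisation) {
            sum += vmU;
        }
        return clamp((int) Math.round(sum / hostCapacity * 100.0));
    }

    // Action (equation 9): a single VM's utilisation as a percentage of its assigned host.
    public static int action(double vmUtilisation, double hostCapacity) {
        return clamp((int) Math.round(vmUtilisation / hostCapacity * 100.0));
    }

    private static int clamp(int percent) {
        return Math.max(0, Math.min(100, percent));  // confine to the 0-100 state-action range
    }
}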
9.2 q-learning implementation
The first implementation applies a Q-learning algorithm, as follows.
Q-learning virtual machine selection algorithm
foreach host in overUtilizedHosts do
    foreach vm in migratableVms do
        possibleActions ← vmSize
    end
    choose VM from possibleActions using π
    migrate VM
    observe hostUtilization_{t+1}, reward
    calculate Q: Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]
    update Q-map
end
The algorithm is invoked when a host is determined to be overloaded by the LR host overload detection policy; this host is placed on a list of over-utilized hosts and forwarded to the VM selection policy, in this case RL.
From this stage the first host is selected from the list, the host's level of utilization is taken as the state, and all migratable VMs are mapped as possible actions based on the percentage of their load in relation to their host.
A VM is then chosen based on the RL selection policy, i.e. ε-greedy or softmax. This VM is placed on a migration list, the host's utilization level is re-calculated, a scalar reward is attributed, and the Q-value is calculated and stored.
If the current host is still deemed to be over-utilised, another VM is chosen in the same manner until the host is no longer overloaded. Once the host is no longer over-utilized, the next host on the over-utilized host list is chosen, until the list is empty.
9.3 sarsa implementation
As referred to in Section 5.2.2, SARSA requires a quintuple consisting of the values s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1} in order to calculate its Q-value.
This is where the design of the two VM selection algorithms differs. Although both algorithms accept the same input, the order in which they process this input must be altered appropriately.
This alteration is evident in the following SARSA algorithm: after observing the new state, i.e. host utilization at t+1, and the reward, it does not calculate the Q-value at this stage.
Instead it obtains a new list of possible actions, in the shape of migratable VMs, for the new state, and then selects the appropriate VM following π. Only now does the algorithm have the information required to calculate Q.
SARSA virtual machine selection algorithm
foreach host in overUtilizedHosts do
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose VmToMigrate from possibleActions using π
    migrate VM
    observe hostUtilization_{t+1}, reward
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose VmToMigrate from possibleActions using π
    calculate Q: Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
    update Q-map
end
9.4 summary
This chapter outlined a percentile state-action space to be utilised by the agent; reducing the space to a 100×100 area limits the state-action space the agent must traverse regardless of the number of nodes in the data center, thereby addressing the so-called "curse of dimensionality" and providing an adaptable and portable agent. Finally, the chapter concluded by outlining a low level implementation of two key RL strategies, Q-learning and SARSA.
Part IV
E X P E R I M E N T S
10
E X P E R I M E N T M E T R I C S
The following chapter outlines the key metrics for measuring the performance of the RL algorithm. These metrics were proposed by Beloglazov and are widely adopted in research as a standard measurement of data center performance [12].
10.1 energy consumption
The total energy consumed by the data center per day in relation to computational re-
sources i.e servers. Although other energy draws exist, such as cooling and infrastructural
demands, this area was deemed outside the scope of this research.
10.2 migrations
The total number of migrations of all VMs, on all servers, performed by the data center. As the agent is trained to carry out intelligent selection of VMs, each migration is pertinent when analysing this research.
10.3 service level agreement metrics
Maintaining a high standard of QOS and SLAs is imperative for a cloud provider. Their importance is highlighted by the three measurements used to accurately report SLA violations.
10.3.1 SLATAH, PDM & SLAV
Service level agreement time per active host (SLATAH) is calculated from the time T_si during which an active host i has experienced 100% utilization of its CPU; during such periods the VMs on host i are denied any further processing capacity should they request additional CPU utilization, thus forcing violations. N represents the number of hosts and T_ai represents the time host i is actively serving VMs.
$SLATAH = \frac{1}{N} \sum_{i=1}^{N} \frac{T_{si}}{T_{ai}}$ (10)
Performance degradation due to migration (PDM) is established as an estimate of the degradation C_sv caused by migrations of VM v, with C_av representing the total CPU capacity requested by VM v over its lifespan and M the number of VMs.
$PDM = \frac{1}{M} \sum_{v=1}^{M} \frac{C_{sv}}{C_{av}}$ (11)
Due to the equal importance of both SLATAH and PDM, a combined metric, the service level agreement violation (SLAV), is used to capture both, as follows.
$SLAV = SLATAH \cdot PDM$ (12)
10.4 energy and sla violations
In order to ensure the implementation of energy saving policies does no negatively effect
SLA researchers and developers are required to measure the co related effect. To measure
this a combined metric named Energy and SLA Violations(ESV) is calculated as follows.
With the lower the overall ESV the better the performance of a data center.
ESV = ENERGY.SLAV (13)
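A sketch of how the four metrics combine, following equations (10) to (13); the input arrays are assumed to hold per-host and per-VM measurements collected during a simulation.

public final class SlaMetrics {

    // SLATAH (10): mean fraction of active time each host spent at 100% CPU utilisation.
    public static double slatah(double[] timeAtFullUtilisation, double[] timeActive) {
        double sum = 0.0;
        for (int i = 0; i < timeActive.length; i++) {
            sum += timeAtFullUtilisation[i] / timeActive[i];
        }
        return sum / timeActive.length;
    }

    // PDM (11): mean fraction of each VM's requested CPU capacity lost to migrations.
    public static double pdm(double[] degradationDueToMigration, double[] totalRequestedCpu) {
        double sum = 0.0;
        for (int v = 0; v < totalRequestedCpu.length; v++) {
            sum += degradationDueToMigration[v] / totalRequestedCpu[v];
        }
        return sum / totalRequestedCpu.length;
    }

    // SLAV (12) and ESV (13): combined violation and energy-violation metrics.
    public static double slav(double slatah, double pdm) {
        return slatah * pdm;
    }

    public static double esv(double energyKWh, double slav) {
        return energyKWh * slav;
    }
}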
11
S E L E C T I O N O F P O L I C Y
11.1 experiment details
Whether softmax or ε-greedy action selection is better depends on the task or the environment in which it is deployed, and an intrinsic link exists between the choice of action selection and the performance of the update function (Q-learning or SARSA) due to their shared dependence on Q [73].
For this reason the following experiment was undertaken to analyse and distinguish the optimal combination of update and selection policy, as mentioned in Section 2.7. There are four possible update/selection combinations:
• Q-Learning / ε-greedy
• Q-Learning / Softmax
• Sarsa / ε-greedy
• Sarsa / Softmax
Each combination of policy is analysed using both a 30 day stochastic workload in order
to measure adaptability and a repetitive single workload over 100 iterations in order to
measure convergence rates.
11.2 results
11.2.1 Energy
Running a single workload over multiple iterations not only provides a perspective on the ability of the agent to learn, but also allows for the identification of the speed of convergence to a level of optimum performance for each possible update/selection policy. The optimum energy consumption was determined as the point at which energy consumption passed beneath 140kWh, established from data gathered over 100 iterations of all four possible update/selection policies.
Fig. 11.1 displays the energy consumption of Q-learning with softmax and ε-greedy action policies. On iteration No.34 ε-greedy converges to the optimal energy barrier, while softmax fails to drop below 140kWh until iteration No.64.
Figure 11.1: Energy consumption Q-Learning 100 iterations
Fig. 11.2 displays the energy consumption of Sarsa with softmax and ε-greedy action
policies. On iteration No.33 ε-greedy converges to the optimal energy barrier, while soft-
max again fails to penetrate sub 140kWh until it finally converges on iteration No.85.
Figure 11.2: Energy consumption SARSA 100 iterations
The policies that contain epsilon appear to converge the quickest to an optimal level of
performance. Fig. 11.3 displays a comparison of these policies in relation to convergence
time.
Figure 11.3: Q-Learning ε-Greedy vs SARSA ε-Greedy
As outlined in the previous section, both SARSA and Q-learning converge to an optimal level in quick succession of each other; however, Q-learning remains below 140kWh 22% more often than SARSA for the remainder of the 100 iterations. Past the 50th iteration there is minimal difference in performance: Q-learning produces a slightly lower average consumption per iteration of 140.32kWh against SARSA's 140.74kWh, and this is in line with the deviation level past 50 iterations, with Q-learning running at 0.2964 and SARSA at 0.2977.
While multiple iterations of a single workload may highlight the rate of convergence, they do not portray an agent in a real world stochastic cloud environment. For that reason one must also take into account performance when supplied with a disparate workload. Fig. 11.5 and Fig. 11.4 contain the daily average and overall energy consumption over a 30 day period. Again, the policies using ε-greedy action selection perform best, and again there is minimal difference between the performance of Q-learning and SARSA, mirroring the results of the iterative test. Q-learning ε-greedy achieves a saving of 5, 16 and 25kWh over SARSA ε-greedy, Q-learning softmax and SARSA softmax respectively.
Figure 11.4: Overall Energy Consumption 30 Day Workload
Figure 11.5: Average Daily Energy Consumption 30 Day Workload
11.2.2 Migrations
Each time a VM migrates, the draw on energy increases as the contents of the VM are copied from one server to another. Therefore, a reduced level of migrations decreases the associated energy cost.
Figure 11.6: SARSA migrations 100 Iterations
Taking the analysis format from the previous section, each selection combination was
given an iterative single workload, Fig.11.6 displays the results of migrations selected
from over utilised hosts as chosen by Sarsa combinations with Fig.11.7 displaying migra-
tions selected from over utilised hosts as chosen by Q-learning combinations.
Figure 11.7: Q-Learning migrations 100 Iterations
A sizable difference is noticeable between the two update functions, with SARSA resulting in an average of between 5,713 and 5,864 migrations per iteration, while the Q-learning update function averages between 2,941 and 3,002 migrations per iteration.
The differential in migrations, although considerable, is not a design flaw; rather, it is in line with Sutton & Barto's cliff walking example, Fig. 11.8 [73]. Fig. 11.9 displays SARSA as accumulating greater reward, similar to cliff walking. This is the result of SARSA's on-policy nature, which considers the action selection policy, therefore not letting the agent fall off the cliff or, in this case, move an unrewarding machine; rather, it learns the safer, more consistent and more rewarding path. In contrast, Q-learning ignores the action selection policy and attempts to converge to the optimum policy, even though on occasion this can cause an agent to fall off the cliff, or move a machine of high cost, resulting in an extreme negative impact on rewards.
Figure 11.8: Accumulated rewards for cliff walking task [73]
Figure 11.9: Accumulated rewards for migrations
The total average migrations per iteration, a combined metric of those selected from both over-utilised hosts and under-utilised hosts, are contained in Fig. 11.10. Again Q-learning outperforms all other possible combinations and closely aligns with the results of the 30 day test shown in Fig. 11.11.
Figure 11.10: Average migrations 100 iterations
Figure 11.11: Average migrations 30 day workload
11.2.3 Service Level Agreement Violations
Breaching service level agreements can result in a financial penalty for the service provider. Therefore, data center operators continuously strive to minimize violations and maximise performance, customer satisfaction and profit. Fig. 11.12 and Fig. 11.13 display the overall SLA violations for the 100 iteration and 30 day tests respectively.
Figure 11.12: Overall SLA violations 100 iterations
Once again Q-learning outperforms all other possible combinations; however, this is the closest of all simulations, with Q-learning/ε-greedy outperforming the other possible combinations by between just 0.4% and 1.4%.
Figure 11.13: Overall SLA violations for 30 days
11.2.4 ESV
The reduction of energy can have a correlated negative effect on SLAs if the method of reducing energy is not chosen carefully. To measure this effect, we utilise the ESV metric outlined in Section 10.4. This could be considered the most important metric as it combines SLAV and energy to give a more inclusive view of data center performance; the lower the ESV, the more efficiently the data center is performing.
Fig. 11.15 and Fig. 11.14 contain the overall ESV for the iterative and 30 day tests; as expected from the earlier analysis of the energy and SLAV data, Q-learning/ε-greedy again outperforms the other combinations.
Figure 11.14: Overall ESV for 100 iterations
Figure 11.15: Overall ESV for 30 days
11.3 discussion
The ε-greedy based update/selection policies outperform the softmax based policies in relation to energy consumption and convergence time. The overall energy saving for a 30 day workload ranges from 21kWh to 25kWh. ε-greedy also converges below the optimum 140kWh threshold earlier than the softmax based combinations, with Q-learning/softmax, its closest rival, converging after a further 30 iterations.
Fig. 11.6 and Fig. 11.7 display the migrations for the SARSA and Q-learning policies, with SARSA incurring a far greater number of migrations as a result of its on-policy evaluation and the resulting safe approach to VM selection.
As regards SLA violations, Q-learning/ε-greedy incurs the fewest violations, albeit by a fractional margin of between 0.4% and 1.4%. However small the improvement, it remains important not only from a fiscal penalty viewpoint, but also because it highlights that the reduction of energy is not having a correlated negative effect on SLA violations.
This is further reinforced by examination of the ESV figures, a metric that, as previously mentioned, provides a more inclusive view of data center performance. Again Q-learning/ε-greedy records the lowest ESV, outperforming its rivals by between 5% and 8%.
The Q-learning/ε-greedy based model consistently outperforms the other selection/update policies in both the 30 day and the 100 iteration tests; it is therefore deemed the best policy for this environment and has been chosen as the selection/update policy to be used for the remaining experiments.
12
P O T E N T I A L   B A S E D   R E W A R D   S H A P I N G
12.1 experiment details
Chapter 11 highlighted Q-learning/ε-greedy as the best performing update/selection policy. However, this does not imply that the policy is performing optimally. In general an RL agent learns through trial and error by visiting multiple states and carrying out multiple actions. Such an approach highlights RL's main limitation, that is, its slowness to converge to optimum performance. This experiment introduces the advanced RL technique known as potential based reward shaping, as outlined in Section 5.4, as a method to improve current convergence rates. The PBRS algorithm is analysed against the standard Q-learning/ε-greedy agent developed in the previous section.
PBRS, formally outlined in equation (14), is an additional reward calculated as the difference between the potential of the resultant state and that of the original state [24].
$F(s, s') = \gamma\,\Phi(s') - \Phi(s)$ (14)
This term is then introduced into the standard Q-learning update function as follows:
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + F(s, s') + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \,]$ (15)
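A minimal sketch of equation (15): the shaping term F(s, s') = γΦ(s') − Φ(s) is simply added to the environment reward inside the standard Q-learning update. The potential function used here, which favours lower host utilisation, is a hypothetical example and not the one used in the experiments.

public class ShapedQLearner {

    private final double[][] q;
    private final double alpha;
    private final double gamma;

    public ShapedQLearner(int states, int actions, double alpha, double gamma) {
        this.q = new double[states][actions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    // Hypothetical potential: lower host utilisation (the state) is considered more desirable.
    private double potential(int state) {
        return (100 - state) / 100.0;
    }

    // Equation (15): the shaping term F(s, s') is added to the natural reward.
    public void update(int s, int a, double reward, int sNext) {
        double shaping = gamma * potential(sNext) - potential(s);
        double best = q[sNext][0];
        for (double v : q[sNext]) {                 // max over next-state actions
            best = Math.max(best, v);
        }
        q[s][a] += alpha * (reward + shaping + gamma * best - q[s][a]);
    }
}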
12.2 results
12.2.1 Energy
After a single iteration the standard Q-learning algorithm consumes 146.14kWh, remaining 6.14kWh above the optimum level of energy conservation as determined in Chapter 11, while the PBRS algorithm consumes 141.02kWh, just 1.02kWh from the optimum level of consumption. It takes a further 10 iterations before the standard algorithm reaches the consumption level at which PBRS began; by this time PBRS has long since broken the sub-140kWh barrier, doing so on the 4th iteration. The standard agent continues to learn, and it is not until after the 32nd iteration that a consistent deviation of between 0 and 1kWh is maintained.
Figure 12.1: PBRS vs Q-Learning energy consumption
12.2.2 Migrations
The effects of the PBRS based agent are not restricted to energy alone; they ripple through the metrics, not least migrations. After a single iteration the standard agent posts a migration count of 22,243, with the PBRS based agent migrating only 18,021 VMs, 4,222 fewer. In line with the energy data, it is not until the 10th iteration that the standard agent reaches the migration rate at which PBRS began. The migration counts remain disparate until the 38th iteration, after which the differential in the migration count consistently remains below 1,000.
Figure 12.2: PBRS vs Q-Learning migrations
12.2.3 Service Level Agreement Violations
The effect of the PBRS based agent on SLA violations mirrors what was seen in the previous two sections. The PBRS agent begins at a rate of SLA violations 1.28E-06 lower than that of the standard agent; only after 26 iterations does the standard agent surpass this level, and it is not until iteration 31 that it performs on par with the PBRS agent.
Figure 12.3: PBRS vs Q-Learning slav
12.2.4 ESV
As expected, given the reduced rate of energy consumption and level of SLA violations, the PBRS agent's ESV rating begins 7% lower, at 0.004921. On the 11th iteration the standard agent surpasses this level for the first time, and shortly after the 30th iteration it consistently performs on par with the PBRS agent.
Figure 12.4: PBRS vs Q-Learning ESV
12.3 discussion
The addition of PBRS to the Q-learning agent significantly decreased the convergence time
and therefore the time spent learning the state-action space. Overall the PBRS agent was
tested on 10% of the overall workload and produced consistent results throughout. On every
occasion the PBRS agent converged to a level deemed optimal in fewer than 5 iterations,
while the standard agent required on average in excess of 21 iterations.
The effect was mirrored in relation to migrations, with 22% fewer migrations after a single
iteration and the standard agent taking an average of 10 iterations to reach this level.
Similar changes were noticeable in the SLA violations, with the standard agent again taking
over 10 iterations on average to reach the level at which the PBRS agent began. The improved
energy and SLA violation figures are reflected in the ESV data, with the PBRS agent after a
single iteration running at a 7% lower rate than the standard agent. On average it takes the
standard agent another 11 iterations to reach that level, from which point the differential
between both agents remains steady.
13
C O M PA R AT I V E V I E W O F L R - R L V S L R - M M T
13.1 experiment details
Following on from the experiments carried out in Chapters 11 and 12 and the determination
that a PBRS Q-Learning/ε-Greedy based agent provides optimum performance, this chapter
evaluates the algorithm against the leading VM selection policy in the research literature.
Research has previously established that dynamic consolidation algorithms statistically
outperform static allocation policies such as DVFS, and that heuristic based dynamic VM
consolidation outperforms online deterministic algorithms [12].
The optimal combination of selection-allocation policies was shown to be Lr-Mmt, which
statistically outperformed multiple disparate algorithms [12].
For that reason Lr-Mmt has been designated as the preeminent algorithm against which to
analyse the dynamic virtual machine selection algorithm, Lr-Rl. A 30 day stochastic real
world workload is provided to both algorithms, with each algorithm analysed under four
criteria: energy consumption, service level agreement violations, quantity of virtual
machine migrations, and ESV.
13.2 results
13.2.1 Energy
Fig. 13.1 contains the energy consumption data from the experiment. The paired t-test shows
a statistically significant difference in energy consumption between Lr-Rl and Lr-Mmt, with
a P-value < 0.0041 and a 95% confidence interval of (-39.8715, -7.8685). As a result, over
the 30 day period the Lr-Rl algorithm consumes over 716 kWh less energy in total, or
23.87 kWh less per day.
Figure 13.1: Energy consumption for 30 day workload
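For reference, the per-day comparisons reported in this section can be reproduced with a standard paired t-test. The sketch below is a minimal illustration assuming the Apache Commons Math library is available; the thesis does not state which statistics tool was used, and the class and method names here are hypothetical.

```java
import org.apache.commons.math3.stat.inference.TTest;

/**
 * Sketch of the paired t-test used to compare the two policies on a
 * per-day basis. The caller supplies the 30 daily readings (kWh,
 * migrations, SLAV or ESV) produced by each selection policy.
 */
public class PairedComparison {

    /** Returns the two-sided p-value of a paired t-test on the daily readings. */
    public static double pairedPValue(double[] lrRlDaily, double[] lrMmtDaily) {
        TTest tTest = new TTest();
        return tTest.pairedTTest(lrRlDaily, lrMmtDaily);
    }
}
```

The same call applies unchanged to the daily migration, SLAV and ESV series reported in the following subsections.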
13.2.2 Migrations
The paired t-test shows a statistically significant difference between Lr-Rl and Lr-Mmt in
relation to migrations, with a P-value < 0.0001 and a 95% confidence interval of
(-13,620.86, -8,389.133). The migration data per day are displayed in Fig. 13.2. Through
the use of Lr-Rl, migrations over the 30 day period decrease by 330,154 overall, or by an
average of 11,005 per day.
Figure 13.2: Migrations for 30 day workload
13.2.3 Service Level Agreement Violations
When lowering energy usage within a data center it is imperative to monitor SLA violations,
as reducing energy can have a parallel negative effect. For example, one can lower the
number of active servers through the extreme consolidation of VMs onto fewer servers; this,
however, increases the likelihood of servers reaching 100% CPU utilization, restricting the
VMs' access to computational processing and resulting in violations. The SLA violations are
displayed in Fig. 13.3. The results of a t-test show no statistically significant difference,
and thus no negative effect on SLA, with a P-value of 0.2751 and a 95% confidence interval
of (-3.9669, 1.1365).
Figure 13.3: SLA violations for 30 day workload
13.2.4 ESV
The final evaluation is ESV, the results of which can be seen in Fig. 13.4; again the results
reinforce the SLA violation and energy data previously gathered. On carrying out a t-test,
Lr-Rl again shows a statistically significant improvement in performance, with a P-value
< 0.0001 and a 95% confidence interval of (-0.0037, -0.0021).
Figure 13.4: ESV for 30 day workload
13.3 discussion
In order to examine the improved performance of Lr-Rl over Lr-Mmt more closely, it is
necessary to look at a single day and the disparities that lie within it. On day 21 a saving
of 23.02 kWh of energy occurs, with 11,561 fewer migrations.
The average number of migrations required that day to reduce an over-utilized host to a safe
workload stood at 2.33 for Mmt, over twice that of RL at 1.06. On occasion Mmt required as
many as 12 migrations from a single host in order to reach a safe state, while the RL policy
never required more than 4 migrations for a single host.
An explanation of the extra migrations associated with Mmt can be found in the data on the
VMs chosen for migration. On average a VM chosen by Mmt accounts for as little as 3.60% of
the host's overall utilization and therefore multiple migrations are required before the
host enters an under-utilized state. RL-chosen VMs, on the other hand, account on average
for 18.04% of the overall host utilization, and therefore when migrated immediately move the
host to an under-utilized state. The correlation between the reduced number of migrations
and the energy reduction of Lr-Rl, measured at industry standard 5 minute intervals for
day 21, is shown in Fig. 13.5.
Figure 13.5: Energy & Migration Correlation Day 21
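The contrast described above can be illustrated with a simplified sketch of the two selection criteria. This is not the CloudSim implementation of either policy: the Vm value object, its fields and the method names are hypothetical, and Mmt is approximated here simply as the VM with the shortest RAM-to-bandwidth migration time.

```java
import java.util.Comparator;
import java.util.List;

/**
 * Simplified contrast between an Mmt-style choice and a utilization-aware
 * choice. The Vm class is a hypothetical value object used only for this
 * illustration.
 */
public class SelectionContrast {

    /** Hypothetical VM description. */
    public static class Vm {
        final int ramMb;                // RAM to be copied during live migration
        final double cpuShareOfHost;    // fraction of the host's utilization this VM carries
        final double bandwidthMbps;     // bandwidth available for the migration

        Vm(int ramMb, double cpuShareOfHost, double bandwidthMbps) {
            this.ramMb = ramMb;
            this.cpuShareOfHost = cpuShareOfHost;
            this.bandwidthMbps = bandwidthMbps;
        }
    }

    /** Mmt-style choice: shortest estimated migration time (RAM / bandwidth),
     *  irrespective of how much of the host's load the VM carries. */
    public static Vm minimumMigrationTime(List<Vm> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble((Vm vm) -> vm.ramMb / vm.bandwidthMbps))
                .orElseThrow(IllegalArgumentException::new);
    }

    /** Utilization-aware choice in the spirit of the learned RL behaviour:
     *  the VM whose CPU share does most to relieve the over-utilized host. */
    public static Vm largestCpuShare(List<Vm> candidates) {
        return candidates.stream()
                .max(Comparator.comparingDouble((Vm vm) -> vm.cpuShareOfHost))
                .orElseThrow(IllegalArgumentException::new);
    }
}
```

A VM chosen by the first criterion often frees only a small fraction of the host's load, so several migrations are needed; a VM chosen by the second typically relieves the host in a single migration, which is the behaviour the day 21 data reflects.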
The difference in the selected VM's share of its host's utilization plays a major part in
determining the overall number of migrations required. Mmt, however, places a further
restriction on its selection of VMs: not only does it not take into account the VM
utilization level, it also restricts the selection to the VM with the smallest RAM. As a
result, over 79% of VMs selected for migration are those with a RAM size of 613 MB,
regardless of how large or small their workload is. RL, on the other hand, implements no
such restriction; by taking into account the more holistic value of utilization levels from
both the host and the VMs, it allows the agent to select VMs across the full spectrum of
RAM sizes, as seen in Fig. 13.6.
(a) Lr-Mmt (b) Lr-Rl
Figure 13.6: Ram sizes of virtual machines migrated
Overall, this results in Lr-Rl accounting for 716 kWh or 15% less energy consumption and
339,154 or 41% fewer migrations, with a 38% reduction in the ESV level and no statistical
difference in service level agreement violations.
Part V
C O N C L U S I O N
14
C O N C L U S I O N
14.1 contributions
Reinforcement learning techniques have been successfully applied to resource allocation for
cloud systems prior to this research. However, these were applied at server or node level;
this research proposed a novel approach that incorporates RL at a lower infrastructural
level, in the selection of VMs for migration. Due to its low level of abstraction, the
algorithm can be incorporated into multiple cloud infrastructures, including stand-alone
private, federated and multi-cloud infrastructures.
The high level of CO2 emissions and the associated negative environmental effects, along
with the increasing cost of and demand for energy from data centers, formed the motivation
for this research into the creation of a state of the art, low energy software policy for
the selection of VMs for migration in IaaS environments. In order to produce such an
algorithm the thesis evolved to answer the following questions.
• Is RL a viable approach for VM selection in the cloud?
• Can advanced RL techniques improve such a policy?
• Can an RL approach outperform the state of the art selection policy?
The experiments carried out in Chapter 11 aimed not only to address our first research
question but to further the thesis by providing an optimum update/selection policy for the
selection of VMs in an IaaS environment. The results align with Sutton and Barto's view that
whether softmax or ε-greedy action selection is better depends on the task or the
environment in which it is deployed [73]. Fig. 11.3 presents evidence of an agent that
consistently learns to reduce energy, and analysis of the results shows that a
Q-Learning/ε-greedy based agent consistently outperforms the other update/selection policies
across all four metrics.
In Chapter 12 the introduction of the advanced RL technique known as potential based reward
shaping further improved the agent based algorithm, addressing both one of RL's greatest
difficulties, the convergence time often referred to as the learning period, and the second
research question of this thesis. The introduction of PBRS significantly decreased the
convergence time and resulted in a direct saving of over 32 kWh across the 100 iterations
due to the shorter convergence period. Fig 12.2 highlights the reduction in convergence
time: the PBRS agent converged to a level deemed optimal in fewer than 5 iterations, while
the standard agent required on average in excess of 30 iterations. This improved performance
was seen throughout the data center metrics, with a reduction in migrations and SLA
violations and an improvement in the overall ESV.
The importance of PBRS when addressing such a complex problem is outlined by Devlin et al.'s
finding that its benefits are greatest in complex problem domains where reinforcement
learning alone takes a long time to converge and there is a large difference in performance
between the initial policy and the final policy converged to [24]. The benefits of
introducing a PBRS based agent are directly in line with the results of many academic
papers, including [24], [50], [31] and [92]; however, no academic literature or otherwise
could be found in which a PBRS based agent has been introduced into a cloud environment as
has been done in these experiments.
In Chapter 13 the third research question is addressed: Lr-Rl is compared to the Lr-Mmt
selection algorithm. The algorithms are provided with a real world 30 day workload. Lr-Rl
accounts for 716 kWh or 15% less energy consumption and 339,154 or 41% fewer migrations,
with a 38% reduction in the ESV level and no statistical difference in service level
agreement violations. These results show a significant improvement on the work of
Beloglazov and the Lr-Mmt algorithm [11].
Research carried out by Yuan, Voorsluys and Liu et al. [97] [85] [48] all highlights the
potential savings and improved performance that result directly from the careful selection
of VMs for migration and the overall lowering of migrations within a data center. The
findings of this thesis add further proof of this, with Fig. 13.5 highlighting the direct
correlation between reduced migrations and reduced energy usage.
The RL selection policy is one of many elements in the overall process of data center
management. However, achieving up to a 15% energy reduction in just one specific area goes
a long way towards addressing the research of Brown et al. and Koomey et al., who estimate
savings of up to 25% through the introduction of energy aware software policies for the
management of data centers [14][42].
The results of RL as a selection policy also open the possibility of improved performance
for many other pieces of research whose authors have developed their own host detection
algorithms but have used Mmt as the selection policy, including [28] [33] [53] [97] and
[27], to name just a few.
Viewing the results of Chapter 13 from an environmental viewpoint, an average saving of
23.87 kWh per day amounts to a saving of 8,715 kWh per year. According to the EPA's
calculations that equates to a saving of 5.9 metric tons of CO2, which would require 4.8
acres of mature forest per year to sequester [26].
14.2 future work
Arising from the work presented in this thesis, a number of possibilities exist for future
work, such as:
• The extension of testing across a more dispersed cloud topology, such as a cross-data
center migration scenario
• The extension of testing in a scaled up testbed
• Further development of the RL framework within CloudSim for optimization purposes
Such additional research not only adds to the requirement for energy aware management
policies highlighted by Koomey and Brown [42][14], it also furthers the development of
CloudSim as a research tool for academia and industry to utilise.
B I B L I O G R A P H Y
[1] David Abramson, Rajkumar Buyya, and Jonathan Giddy. A computational economy
for grid computing and its implementation in the nimrod-g resource broker. Future
Generation Computer Systems, 18(8):1061–1074, 2002.
[2] Mohamed Almorsy, John Grundy, and Ingo Müller. An analysis of the cloud com-
puting security problem. In Proceedings of APSEC 2010 Cloud Workshop, Sydney, Aus-
tralia, 30th Nov, 2010.
[3] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz,
Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A
view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.
[4] Raphael M Bahati and Michael A Bauer. Towards adaptive policy-based manage-
ment. In Network Operations and Management Symposium (NOMS), 2010 IEEE, pages
511–518. IEEE, 2010.
[5] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf
Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization.
ACM SIGOPS Operating Systems Review, 37(5):164–177, 2003.
[6] Enda Barrett, Enda Howley, and Jim Duggan. A learning architecture for scheduling
workflow applications in the cloud. In Web Services (ECOWS), 2011 Ninth IEEE
European Conference on, pages 83–90. IEEE, 2011.
[7] Enda Barrett, Enda Howley, and Jim Duggan. Applying reinforcement learning to-
wards automating resource allocation and application scalability in the cloud. Con-
currency and Computation: Practice and Experience, 25(12):1656–1674, 2013.
[8] Luiz André Barroso and Urs Hölzle. The case for energy-proportional computing.
IEEE computer, 40(12):33–37, 2007.
[9] A Barto and RH Crites. Improving elevator performance using reinforcement learn-
ing. Advances in neural information processing systems, 8:1017–1023, 1996.
[10] Anton Beloglazov, Jemal Abawajy, and Rajkumar Buyya. Energy-aware resource
allocation heuristics for efficient management of data centers for cloud computing.
Future Generation Computer Systems, 28(5):755–768, 2012.
[11] Anton Beloglazov and Rajkumar Buyya. Optimal online deterministic algorithms
and adaptive heuristics for energy and performance efficient dynamic consolidation
of virtual machines in cloud data centers. Concurrency and Computation: Practice and
Experience, 24(13):1397–1420, 2012.
[12] Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, Albert Zomaya, et al. A
taxonomy and survey of energy-efficient data centers and cloud computing systems.
Advances in Computers, 82(2):47–111, 2011.
[13] Luca Benini, Alessandro Bogliolo, and Giovanni De Micheli. A survey of design
techniques for system-level dynamic power management. Very Large Scale Integra-
tion (VLSI) Systems, IEEE Transactions on, 8(3):299–316, 2000.
[14] Richard Brown et al. Report to congress on server and data center energy efficiency:
Public law 109-431. Lawrence Berkeley National Laboratory, 2008.
[15] Rajkumar Buyya, David Abramson, and Jonathan Giddy. A case for economy grid
architecture for service oriented grid computing. In Parallel and Distributed Pro-
cessing Symposium, International, volume 2, pages 20083a–20083a. IEEE Computer
Society, 2001.
[16] Rajkumar Buyya, Rajiv Ranjan, and Rodrigo N Calheiros. Modeling and simulation
of scalable cloud computing environments and the cloudsim toolkit: Challenges
and opportunities. In High Performance Computing & Simulation, 2009. HPCS’09.
International Conference on, pages 1–11. IEEE, 2009.
[17] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona
Brandic. Cloud computing and emerging it platforms: Vision, hype, and reality for
delivering computing as the 5th utility. Future Generation computer systems, 25(6):599–
616, 2009.
[18] Rodrigo N Calheiros, Rajiv Ranjan, Anton Beloglazov, César AF De Rose, and Rajku-
mar Buyya. Cloudsim: a toolkit for modeling and simulation of cloud computing
environments and evaluation of resource provisioning algorithms. Software: Practice
and Experience, 41(1):23–50, 2011.
[19] Michael Cardosa, Madhukar R Korupolu, and Aameek Singh. Shares and utilities
based power consolidation in virtualized server environments. In Integrated Network
Management, 2009. IM’09. IFIP/IEEE International Symposium on, pages 327–334. IEEE,
2009.
[20] V Chaudhary, Minsuk Cha, JP Walters, S Guercio, and Steve Gallo. A comparison
of virtualization technologies for hpc. In Advanced Information Networking and Ap-
plications, 2008. AINA 2008. 22nd International Conference on, pages 861–868. IEEE,
2008.
[21] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Chris-
tian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines.
In Proceedings of the 2nd conference on Symposium on Networked Systems Design &
Implementation-Volume 2, pages 273–286. USENIX Association, 2005.
[22] William S Cleveland. Robust locally weighted regression and smoothing scatter-
plots. Journal of the American statistical association, 74(368):829–836, 1979.
[23] Robert J. Creasy. The origin of the vm/370 time-sharing system. IBM Journal of
Research and Development, 25(5):483–490, 1981.
[24] Sam Devlin, Daniel Kudenko, and Marek Grześ. An empirical study of potential-
based reward shaping and advice in complex, multi-agent systems. Advances in
Complex Systems, 14(02):251–278, 2011.
[25] Tharam Dillon, Chen Wu, and Elizabeth Chang. Cloud computing: issues and
challenges. In Advanced Information Networking and Applications (AINA), 2010 24th
IEEE International Conference on, pages 27–33. IEEE, 2010.
[26] Epa.gov. Calculations and references — clean energy — us epa, 2015.
[27] Fahimeh Farahnakian, Pasi Liljeberg, and Juha Plosila. Energy-efficient virtual ma-
chines consolidation in cloud data centers using reinforcement learning. In Paral-
lel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International
Conference on, pages 500–507. IEEE, 2014.
[28] Fahimeh Farahnakian, Tapio Pahikkala, Pasi Liljeberg, and Juha Plosila. Energy
aware consolidation algorithm based on k-nearest neighbor regression for cloud
data centers. In Utility and Cloud Computing (UCC), 2013 IEEE/ACM 6th International
Conference on, pages 256–259. IEEE, 2013.
[29] Ian Foster, Yong Zhao, Ioan Raicu, and Shiyong Lu. Cloud computing and grid
computing 360-degree compared. In Grid Computing Environments Workshop, 2008.
GCE’08, pages 1–10. Ieee, 2008.
[30] Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen, and Zhenghu Gong. The char-
acteristics of cloud computing. In Parallel Processing Workshops (ICPPW), 2010 39th
International Conference on, pages 275–279. IEEE, 2010.
[31] Marek Grzes and Daniel Kudenko. Plan-based reward shaping for reinforcement
learning. In Intelligent Systems, 2008. IS’08. 4th International IEEE Conference, vol-
ume 2, pages 10–22. IEEE, 2008.
[32] Steven Hand, Tim Harris, Evangelos Kotsovinos, and Ian Pratt. Controlling the
xenoserver open platform. In Open Architectures and Network Programming, 2003
IEEE Conference on, pages 3–11. IEEE, 2003.
[33] Abbas Horri, Mohammad Sadegh Mozafari, and Gholamhossein Dastghaibyfard.
Novel resource allocation algorithms to performance and energy efficiency in cloud
computing. The Journal of Supercomputing, 69(3):1445–1461, 2014.
[34] Jinhua Hu, Jianhua Gu, Guofei Sun, and Tianhai Zhao. A scheduling strategy on
load balancing of virtual machine resources in cloud computing environment. In
Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International
Symposium on, pages 89–96. IEEE, 2010.
[35] Yashpalsinh Jadeja and Kirit Modi. Cloud computing-concepts, architecture and
challenges. In Computing, Electronics and Electrical Technologies (ICCEET), 2012 Inter-
national Conference on, pages 877–880. IEEE, 2012.
[36] Gueyoung Jung, Kaustubh R Joshi, Matti A Hiltunen, Richard D Schlichting, and
Calton Pu. Generating adaptation policies for multi-tier applications in consoli-
dated server environments. In Autonomic Computing, 2008. ICAC’08. International
Conference on, pages 23–32. IEEE, 2008.
[37] Gueyoung Jung, Kaustubh R Joshi, Matti A Hiltunen, Richard D Schlichting, and
Calton Pu. A cost-sensitive adaptation engine for server consolidation of multitier
applications. In Middleware 2009, pages 163–183. Springer, 2009.
[38] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement
learning: A survey. Journal of artificial intelligence research, pages 237–285, 1996.
[39] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. kvm: the
linux virtual machine monitor. In Proceedings of the Linux Symposium, volume 1,
pages 225–230, 2007.
[40] Nadir Kiyanclar. A survey of virtualization techniques focusing on secure on-
demand cluster computing. arXiv preprint cs/0511010, 2005.
[41] Jonathan Koomey. Growth in data center electricity use 2005 to 2010. A report by
Analytical Press, completed at the request of The New York Times, 2011.
[42] Jonathan G Koomey. Estimating total power consumption by servers in the us
and the world, 2007. Lawrence Berkeley National Laboratory, Berkeley, CA, available at:
http://hightech.lbl.gov/documents/DATA_CENTERS/svrpwrusecompletefinal.pdf, 2007.
[43] Jonathan G Koomey, Christian Belady, Michael Patterson, Anthony Santos, and
Klaus-Dieter Lange. Assessing trends over time in performance, costs, and energy
use for servers. Lawrence Berkeley National Laboratory, Stanford University, Microsoft
Corporation, and Intel Corporation, Tech. Rep, 2009.
[44] Dara Kusic, Jeffrey O Kephart, James E Hanson, Nagarajan Kandasamy, and Guofei
Jiang. Power and performance management of virtualized computing environments
via lookahead control. Cluster computing, 12(1):1–15, 2009.
[45] B. Ellison L. Minas. Energy efficiency for information technology: How to reduce
power consumption in servers and data centers. Intel Press, 2009.
[46] Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang, and Chia-Ying Tseng. A dynamic
resource management with energy saving mechanism for supporting cloud com-
puting. International Journal of Grid and Distributed Computing, 6(1):67–76, 2013.
[47] Weiwei Lin, Chen Liang, James Z Wang, and Rajkumar Buyya. Bandwidth-aware
divisible task scheduling for cloud computing. Software: Practice and Experience,
44(2):163–174, 2014.
[48] Haikun Liu, Hai Jin, Cheng-Zhong Xu, and Xiaofei Liao. Performance and energy
modeling for live migration of virtual machines. Cluster computing, 16(2):249–264,
2013.
[49] Patrick Mannion, Jim Duggan, and Enda Howley. Parallel reinforcement learning
with state action space partitioning. In JMLR Workshop and Conference Proceedings
0:19, 2015 12th European Workshop on Reinforcement Learning.
[50] Patrick Mannion, Jim Duggan, and Enda Howley. Learning traffic signal control
with advice. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS
2015), 2015.
[51] Patrick Mannion, Jim Duggan, and Enda Howley. Parallel learning using heteroge-
neous agents. In Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS
2015), 2015.
[52] Laëtitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. Reward func-
tion and initial values: better choices for accelerated goal-directed reinforcement
learning. In Artificial Neural Networks–ICANN 2006, pages 840–849. Springer, 2006.
[53] Khushbu Maurya and Richa Sinha. Energy conscious dynamic provisioning of
virtual machines using adaptive migration thresholds in cloud data center. Interna-
tional Journal of Computer Science and Mobile Computing, pages 74–82, 2013.
[54] John McCarthy. Applications of circumscription to formalizing common-sense
knowledge. Artificial Intelligence, 28(1):89–116, 1986.
[55] Lijun Mei, Wing Kwong Chan, and TH Tse. A tale of clouds: Paradigm comparisons
and some thoughts on research issues. In Asia-Pacific Services Computing Conference,
2008. APSCC’08. IEEE, pages 464–469. Ieee, 2008.
[56] David Meisner, Brian T Gold, and Thomas F Wenisch. Powernap: eliminating server
idle power. ACM SIGARCH Computer Architecture News, 37(1):205–216, 2009.
[57] Peter Mell and Tim Grance. The nist definition of cloud computing. Computer Secu-
rity Division, Information Technology Laboratory, National Institute of Standards
and Technology, 2011.
[58] Fereydoun Farrahi Moghaddam, Reza Farrahi Moghaddam, and Mohamed Cheriet.
Carbon-aware distributed cloud: multi-level grouping genetic algorithm. Cluster
Computing, pages 1–15, 2014.
[59] Ripal Nathuji and Karsten Schwan. Virtualpower: coordinated power management
in virtualized enterprise systems. In ACM SIGOPS Operating Systems Review, vol-
ume 41, pages 265–278. ACM, 2007.
[60] Andrew Y Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward
transformations: Theory and application to reward shaping. In ICML, volume 99,
pages 278–287, 1999.
[61] Jason Nieh and Ozgur Can Leonard. Examining vmware. Dr. Dobbs Journal, 25(8):70,
2000.
[62] Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru, and Rajkumar Buyya. A par-
ticle swarm optimization-based heuristic for scheduling workflow applications in
cloud computing environments. In Advanced Information Networking and Applications
(AINA), 2010 24th IEEE International Conference on, pages 400–407. IEEE, 2010.
[63] Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization.
Swarm intelligence, 1(1):33–57, 2007.
[64] Jette Randløv and Preben Alstrøm. Learning to drive a bicycle using reinforcement
learning and shaping. In ICML, volume 98, pages 463–471, 1998.
[65] Mendel Rosenblum and Tal Garfinkel. Virtual machine monitors: Current technol-
ogy and future trends. Computer, 38(5):39–47, 2005.
[66] Gavin A Rummery and Mahesan Niranjan. On-line q-learning using connectionist
systems. 1994. University of Cambridge, Department of Engineering.
[67] Naidila Sadashiv and SM Dilip Kumar. Cluster, grid and cloud computing: A
detailed comparison. In Computer Science & Education (ICCSE), 2011 6th International
Conference on, pages 477–482. IEEE, 2011.
[68] Yuxiang Shi, Xiaohong Jiang, and Kejiang Ye. An energy-efficient scheme for cloud
resource provisioning based on cloudsim. In Cluster Computing (CLUSTER), 2011
IEEE International Conference on, pages 595–599. IEEE, 2011.
[69] Reza Sookhtsaraei, Mirmorsal Madani, and Atena Kavian. A multi objective virtual
machine placement method for reduce operational costs in cloud computing by
genetic. International Journal of Computer Networks & Communications Security, 2(8),
2014.
[70] Richard S Sutton. Learning to predict by the methods of temporal differences. Ma-
chine learning, 3(1):9–44, 1988.
[71] Richard S Sutton. Introduction: The challenge of reinforcement learning. In Rein-
forcement Learning, pages 1–3. Springer, 1992.
[72] Richard S Sutton. Reinforcement learning: Past, present and future. In Simulated
Evolution and Learning, pages 195–197. Springer, 1999.
[73] Richard S Sutton and Andrew G Barto. Introduction to reinforcement learning. MIT
Press, 1998.
[74] Gerald Tesauro. Temporal difference learning and td-gammon. Communications of
the ACM, 38(3):58–68, 1995.
[75] Gerald Tesauro, Nicholas K Jong, Rajarshi Das, and Mohamed N Bennani. On the
use of hybrid reinforcement learning for autonomic resource allocation. Cluster
Computing, 10(3):287–299, 2007.
[76] Michel Tokic and Günther Palm. Value-difference based exploration: adaptive con-
trol between epsilon-greedy and softmax. In KI 2011: Advances in Artificial Intelli-
gence, pages 335–346. Springer, 2011.
[77] Wei-Tek Tsai, Xin Sun, and Janaka Balasooriya. Service-oriented cloud computing
architecture. In Information Technology: New Generations (ITNG), 2010 Seventh Inter-
national Conference on, pages 684–689. IEEE, 2010.
[78] Rich Uhlig, Gil Neiger, Dion Rodgers, Amy L Santoni, Fernando CM Martins, An-
drew V Anderson, Steven M Bennett, Alain Kagi, Felix H Leung, and Larry Smith.
Intel virtualization technology. Computer, 38(5):48–56, 2005.
[79] Seema Vahora and Ritesh Patel. Cloudsim a survey on vm management techniques.
In International Journal of Advanced Research in Computer and Communication Engineer-
ing, pages 128 – 123, 2015.
[80] Vytautas Valancius, Nikolaos Laoutaris, Laurent Massoulié, Christophe Diot, and
Pablo Rodriguez. Greening the internet with nano data centers. In Proceedings of the
5th international conference on Emerging networking experiments and technologies, pages
37–48. ACM, 2009.
[81] Akshat Verma, Puneet Ahuja, and Anindya Neogi. pmapper: power and migration
cost aware application placement in virtualized systems. In Middleware 2008, pages
243–264. Springer, 2008.
[82] Akshat Verma, Gargi Dasgupta, Tapan Kumar Nayak, Pradipta De, and Ravi
Kothari. Server workload analysis for power minimization using consolidation. In
Proceedings of the 2009 conference on USENIX Annual technical conference, pages 28–28.
USENIX Association, 2009.
[83] vmware.com. Paravirtualization, 2014.
[84] vmware.com. Hypervisor performance, 2015.
[85] William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar Buyya. Cost
of virtual machine live migration in clouds: A performance evaluation. In Cloud
Computing, pages 254–265. Springer, 2009.
[86] Carl A Waldspurger. Memory resource management in vmware esx server. ACM
SIGOPS Operating Systems Review, 36(SI):181–194, 2002.
[87] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–
292, 1992.
[88] Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. PhD
thesis, University of Cambridge England, 1989.
[89] Guiyi Wei, Athanasios V Vasilakos, Yao Zheng, and Naixue Xiong. A game-
theoretic method of fair resource allocation for cloud computing services. The Jour-
nal of Supercomputing, 54(2):252–269, 2010.
[90] Shimon Whiteson and Peter Stone. Evolutionary function approximation for rein-
forcement learning. The Journal of Machine Learning Research, 7:877–917, 2006.
[91] Bhathiya Wickremasinghe, Rodrigo N Calheiros, and Rajkumar Buyya. Cloudan-
alyst: A cloudsim-based visual modeller for analysing cloud computing environ-
ments and applications. In Advanced Information Networking and Applications (AINA),
2010 24th IEEE International Conference on, pages 446–452. IEEE, 2010.
[92] Eric Wiewiora, Garrison Cottrell, and Charles Elkan. Principled methods for advis-
ing reinforcement learning agents. In ICML, pages 792–799, 2003.
[93] www.nskinc.com. cloud-computing-101, 2015.
[94] Xenproject.org. Vs15: Video spotlight with cavium’s larry wikelius, 2015.
[95] Andrew J. Younge, Robert Henschel, James T. Brown, Gregor von Laszewski, Judy
Qiu, and Geoffrey Fox. Analysis of virtualization technologies for high perfor-
mance computing environments. In IEEE International Conference on Cloud Comput-
ing, CLOUD 2011, Washington, DC, USA, 4-9 July, 2011, pages 9–16, 2011.
[96] Lamia Youseff, Rich Wolski, Brent Gorda, and Chandra Krintz. Paravirtualization
for hpc systems. In Frontiers of High Performance Computing and Networking–ISPA
2006 Workshops, pages 474–486. Springer, 2006.
[97] Jingling Yuan, Xuyang Miao, Lin Li, and Xing Jiang. An online energy saving
resource optimization methodology for data center. Journal of Software, 8(8):1875–
1880, 2013.
[98] Qi Zhang, Lu Cheng, and Raouf Boutaba. Cloud computing: state-of-the-art and
research challenges. Journal of internet services and applications, 1(1):7–18, 2010.
[99] Xiaoyun Zhu, Don Young, Brian J Watson, Zhikui Wang, Jerry Rolia, Sharad Sing-
hal, Bret McKee, Chris Hyser, Daniel Gmach, Rob Gardner, et al. 1000 islands:
Integrated capacity and workload management for the next generation data center.
In Autonomic Computing, 2008. ICAC’08. International Conference on, pages 172–181.
IEEE, 2008.
[100] Dimitrios Zissis and Dimitrios Lekkas. Addressing cloud computing security issues.
Future Generation Computer Systems, 28(3):583–592, 2012.
112

Master_Thesis

  • 1.
    AI Optimisation Approachfor Autonomic Cloud Computing Kieran Flesk Submitted in accordance with the requirements for the degree of Masters of Science Software Design & Development College of Engineering & Informatics National University of Ireland, Galway Research Supervisor: Dr. Enda Howley August 2015
  • 2.
    A B ST R A C T Cloud computing has led to exponential growth in large scale data centers and ware- houses, which form the paradigms substratum layer, Infrastructure as a Service. These large scale server warehouses consume substantial energy, not only to power servers, but also affiliated processes such as cooling. Dynamic consolidation of virtual machines us- ing live migration and switching idle nodes to the sleep mode allows cloud providers to optimize resource usage and reduce energy consumption. The following research pro- poses a novel reinforcement learning approach for the selection of virtual machines for migration. Due to low level of abstraction, the proposed algorithm provides a decision support system which supports efficient and open application deployment, monitoring, and execution across different cloud service providers and results in lowering energy consumption without negatively effecting service level agreements. 2
  • 3.
    A C KN O W L E D G E M E N T S Firstly, I would like to express my sincere gratitude to my supervisor Dr. Enda Howley for the continuous support of my masters study and related research, for his patience, motivation, and immense knowledge. His guidance helped me immensely in the research and writing of this thesis. I could not have imagined having a better adviser and mentor for my masters. I would like to thank my family especially my parents and their unwavering support in my decision to return to education and to my brothers and sister for supporting me throughout the writing this thesis. Finally I would like to thank my fellow researchers and friends who all have con- tributed to the final product in one way or another. 3
  • 4.
    D E CL A R AT I O N The candidate confirms that the work submitted is his own and that appropriate credit has been given where reference has been made to the work of others
  • 5.
    P U BL I C AT I O N A Reinforcement Learning Decision Support System for the Selection of Virtual Machines Kieran Flesk, Dr. Enda Howley Springer Special Edition Journal of Internet Services and Applications Under Review
  • 6.
    C O NT E N T S i introduction 15 1 introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.1 Motivations and Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.2 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 1.3 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 ii literature review 19 2 cloud computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1 Origins of Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.1 Cluster Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.2 Grid Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.1.3 Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2 Characteristics of Cloud Computing . . . . . . . . . . . . . . . . . . . . 22 2.2.1 Scalability of Infrastructure . . . . . . . . . . . . . . . . . . . . . 22 2.2.2 Autonomic Resource Control / Elasticity . . . . . . . . . . . . . 22 2.2.3 Service Centric Approach . . . . . . . . . . . . . . . . . . . . . . 23 2.2.4 Omnipresent Network Accessibility . . . . . . . . . . . . . . . . 23 2.2.5 Multi-Tenancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.2.6 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.3 Cloud Deployment Models . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.1 Private Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.2 Community Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.3 Public Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.4 Hybrid Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.4 Cloud Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3 data centers and energy consumption . . . . . . . . . . . . . . . . 29 3.1 Areas of energy consumption . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1.1 Server Consumption . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.2 Power Management Techniques . . . . . . . . . . . . . . . . . . . . . . . 30 6
  • 7.
    CONTENTS 3.2.1 Dynamic ComponentDeactivation . . . . . . . . . . . . . . . . . 31 3.2.2 Dynamic Performance Scaling . . . . . . . . . . . . . . . . . . . 31 3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4 virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.1 Modern Day Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.2 Levels of Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2.1 Full Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2.2 Paravirtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2.3 Hardware Assisted Virtualization . . . . . . . . . . . . . . . . . 36 4.3 Hypervisors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.3.1 Xen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.3.2 KVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.3.3 VMware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5 reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.1 Agent / Environment Interaction . . . . . . . . . . . . . . . . . . . . . . 41 5.2 Learning Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.2.1 Q-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2.2 SARSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.3 Action Selection Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.3.1 ε-Greedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.3.2 Softmax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.4 Reward Shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.4.1 Potential Based Reward Shaping . . . . . . . . . . . . . . . . . . 47 5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 6 related research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6.1 Threshold and Non-Threshold Approach . . . . . . . . . . . . . . . . . 50 6.2 Artificial Intelligence Based Approach . . . . . . . . . . . . . . . . . . . 53 6.3 Reinforcement Learning Based Approach . . . . . . . . . . . . . . . . . 55 6.4 Virtual Machine Selection Policies . . . . . . . . . . . . . . . . . . . . . 56 6.4.1 Maximum Correlation . . . . . . . . . . . . . . . . . . . . . . . . 57 6.4.2 Minimum Utilization Policy . . . . . . . . . . . . . . . . . . . . 57 6.4.3 The Random Selection Policy . . . . . . . . . . . . . . . . . . . . 57 6.5 Research Group Context . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 7
  • 8.
    CONTENTS iii methology 60 7cloudsim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 7.2 CloudSim Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 7.3 Energy Aware Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 64 7.3.1 Initialising an Energy Aware Policy . . . . . . . . . . . . . . . . 64 7.3.2 Creating a Selection Policy . . . . . . . . . . . . . . . . . . . . . 64 7.4 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 7.5 Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 8 algorithm development . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 8.1 Registering a Selection Policy . . . . . . . . . . . . . . . . . . . . . . . . 66 8.2 Recording of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 8.3 Additional Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.1 Lr-RL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.2 RlSelectionPolicy . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.3 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.4 Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 8.3.6 RlUtilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 9 implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 9.1 State-Action Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 9.2 Q-Learning Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 71 9.3 SARSA Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 iv experiments 74 10 experiment metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 10.1 Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 10.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 10.3 Service Level Agreement Metrics . . . . . . . . . . . . . . . . . . . . . . 75 10.3.1 SLATAH, PDM & SLAV . . . . . . . . . . . . . . . . . . . . . . . 76 10.4 Energy and SLA Violations . . . . . . . . . . . . . . . . . . . . . . . . . 76 11 selection of policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 8
  • 9.
    CONTENTS 11.1 Experiment Details. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 11.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 11.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 11.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 11.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 84 11.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 11.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 12 potential based reward shaping . . . . . . . . . . . . . . . . . . . . . 87 12.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 12.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 12.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 12.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 12.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 89 12.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 12.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 13 comparative view of lr-rl vs lr-mmt . . . . . . . . . . . . . . . . . 92 13.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 13.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 13.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 13.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 13.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 94 13.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 13.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 v conclusion 98 14 conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 14.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 14.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 9
  • 10.
    L I ST O F F I G U R E S Figure 2.1 Private cloud [93] . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Figure 2.2 Public cloud [93] . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Figure 2.3 Hybrid cloud[93] . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Figure 2.4 High level cloud architecture [15] . . . . . . . . . . . . . . . . . 27 Figure 5.2 PBRS effect [92] . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Figure 7.1 CloudSim class structure [12] . . . . . . . . . . . . . . . . . . . 65 Figure 8.1 The reinforcement learning CloudSim architecture . . . . . . . 68 Figure 11.1 Energy consumption Q-Learning 100 iterations . . . . . . . . 78 Figure 11.2 Energy consumption SARSA 100 iterations . . . . . . . . . . . 79 Figure 11.3 Q-Learning ε-Greedy vs SARSA ε-Greedy . . . . . . . . . . . . 79 Figure 11.4 Overall Energy Consumption 30 Day Workload . . . . . . . . 80 Figure 11.5 Average Daily Energy Consumption 30 Day Workload . . . . 80 Figure 11.6 SARSA migrations 100 Iterations . . . . . . . . . . . . . . . . . 81 Figure 11.7 Q-Learning migrations 100 Iterations . . . . . . . . . . . . . . 81 Figure 11.8 Accumulated rewards for cliff walking task [73] . . . . . . . . 82 Figure 11.9 Accumulated rewards for migrations . . . . . . . . . . . . . . 82 Figure 11.10 Average migrations 100 iterations . . . . . . . . . . . . . . . . 83 Figure 11.11 Average migrations 30 day workload . . . . . . . . . . . . . . 83 Figure 11.12 Overall SLA violations 100 iterations . . . . . . . . . . . . . . . 84 Figure 11.13 Overall SLA violations for 30 days . . . . . . . . . . . . . . . . 84 Figure 11.14 Overall ESV for 100 iterations . . . . . . . . . . . . . . . . . . . 85 Figure 11.15 Overall ESV for 30 days . . . . . . . . . . . . . . . . . . . . . . 85 Figure 12.1 PBRS vs Q-Learning energy consumption . . . . . . . . . . . . 88 Figure 12.2 PBRS vs Q-Learning migrations . . . . . . . . . . . . . . . . . . 89 Figure 12.3 PBRS vs Q-Learning slav . . . . . . . . . . . . . . . . . . . . . . 90 Figure 12.4 PBRS vs Q-Learning ESV . . . . . . . . . . . . . . . . . . . . . . 90 Figure 13.1 Energy consumption for 30 day workload . . . . . . . . . . . . 93 Figure 13.2 Migrations for 30 day workload . . . . . . . . . . . . . . . . . . 94 Figure 13.3 SLA violations for 30 day workload . . . . . . . . . . . . . . . 94 10
  • 11.
    LIST OF FIGURES Figure13.4 ESV for 30 day workload . . . . . . . . . . . . . . . . . . . . . . 95 Figure 13.5 Energy & Migration Correlation Day 21 . . . . . . . . . . . . . 96 Figure 13.6 Ram sizes of virtual machines migrated . . . . . . . . . . . . . 97 11
  • 12.
    A C RO N Y M S SLA Service Level Agreement API Application Programming Interface OS Operating system QOS Quality of service IT Information technology IaaS Infrastructure as a service PaaS Platform as a service SaaS Software as a service UPS Universal power supply PUE Power Usage Efficiency PDU Power distribution unit DVFS Dynamic voltage frequency scaling DRAM Direct Random Access Memory SPM Static power management DPM Dynamic power management DCD Dynamic component distribution DPS Dynamic performance scaling CTSS Compatible time sharing systems CP Control program CMS Conventional monitor system 12
  • 13.
    VMM Virtual machinemanagement ABI Application binary translation KVM Kernel virtual machine MMU Memory management unit TLB Table lookaside buffer RL Reinforcement Learning TD Temporal Difference PBRS Potential Based Reward System AI Artificial Intelligence MDP Markov Decision Process PM-L Local power management PM-G Global power management LNQS Layered queuing network solver GA Genetic algorithm MLGGA Multi Layered Grouped genetic algorithm GGA Grouped genetic algorithm LR Local regression MMT Minimum migration time VM Virtual machine RL Reinforcement Learning CPU Central processing unit LR Local Regression MC Maximum Correlation 13
  • 14.
    MC Maximum Correlation MUMinimum Utilization RS Random Selection MIPS Millions of instructions per second PDM Performance Degradation Due to Migration SLATAH Service Level Agreement Time Per Active SLAV Service Level Agreement Violation ESV Energy and Service Level Agreement Violation PC Personal Computer 14
  • 15.
    Part I I NT R O D U C T I O N
  • 16.
    1 I N TR O D U C T I O N Cloud computing refers to both the applications delivered as services over the Internet and the hardware and software systems in the data centers that provide them [3]. Buyya et al. defines cloud computing as a type of parallel and distributed system, consisting of a collection of interconnected and virtualized computers, that are dynamically provi- sioned and presented as one or more unified computing resources, based on service level agreements (SLA) established through negotiation between the service provider and cus- tomer [17]. Regardless of the ever growing heterogeneous nature of cloud platforms and deployments, this definition still rings true. Other key cornerstones also remain despite the ever changing landscape, one such cor- nerstone is the ability of cloud providers to virtualize the key constituents which form the lowest level of the cloud architecture known as infrastructure as a service layer (IaaS), principally large scale data centers. The virtualization of large scale congregations of nodes, typical of that found in modern day data centers into multiple virtual indepen- dent machines executing on a single node, not only allows for the plasticity of services, but plays a key role in the high level adherence of SLAs and maximum utilization of resources which underpin the foundations of cloud computing, while providing maxi- mum return on investment for service providers. However, due to the dynamic nature of cloud services and their demands, the static or offline management of these virtual ma- chines (VM) provides significant restrictions in the provision of an idealistic cloud service. 16
  • 17.
    1.1 motivations andaims The provision of such virtualized environments and services comes at a cost, studies such as [8][41][42], highlight the growth of data centers directly resulting in • An increase in energy consumption in the range of billions of kwh’s from the begin- ning of the decade. • An annual increase in the emission of Co2 from 42.8 million metric tons in 2007 to 67.9 million metric tons in 2011. Much of this energy is wasted in idle systems, in typical deployments, server utilization is below 30%, but idle servers still consume 60% of their peak energy draw. In order to combat such wastage, advanced consolidation policies for under utilized and idle servers are required to be deployed [56]. Two key findings from the Report to Congress titled Server and Data Center Energy Efficiency 2007 directly addresses this issue by stating that ; ”Existing technologies and strategies could reduce typical server energy use by an estimated 25%. ”[14] ”Assuming state-of-the-art energy efficiency practices are implemented throughout U.S. data cen- ters, this projected energy use can be reduced by up to 55% compared to current efficiency trends”[14] The following thesis purposes one such state-of-the-art energy efficiency policy. 1.1 motivations and aims Motivated by these facts and the success of previous research regarding reinforcement learning (RL) as an optimisation technique. The aim of this research is to design, develop, implement and evaluate a RL agent based approach for the selection of VMs for migration in a stochastic IaaS environment in order to reduce energy consumption. 17
  • 18.
    1.2 research questions 1.2research questions This thesis aims to answer the following research questions • Is reinforcement learning a viable approach for virtual machine selection in the cloud ? • Can advanced reinforcement learning techniques improve such a policy ? • Can a reinforcement learning approach outperform the state of the art selection policy ? 1.3 thesis structure The thesis is laid out as follows. • Chapter 1 contains an introduction that provides an overview of the research topic and introduces the research questions, motivations and aims. • Chapter 2-6 contains a literature review covering – Cloud Computing – Data Centers and Energy Consumption – Virtualization and Hypervisors – Reinforcement Learning – Resource Allocation and Selection Methods • Chapter 7-9 contains the methodology of the thesis including – CloudSim Simulator – Algorithm Development & Implementation • Chapter 10-13 contains the experiments carried out – The Policy Selection – Addition of Potential Based Reward Shaping – Comparative View of Lr-Rl vs Lr-Mmt • Chapter 14 contains the conclusions and possible areas of future work. 18
  • 19.
    Part II L IT E R AT U R E R E V I E W
  • 20.
    2 C L OU D C O M P U T I N G The following chapter contains an in depth review of the most pertinent academic re- search available in relation to cloud computing, its characteristics, architecture, service and deployment models. 2.1 origins of cloud computing There has been a long time vision of providing computer services as a utility alongside water, gas, electricity and telephone. To achieve this individuals and companies must be able to access the services they require on demand with the scalability and flexibility they require in a pay-per-use environment [17]. The following section outlines the historic progression towards such a scenario. 2.1.1 Cluster Computing Originally super computers led the way in large scale computational tasks in areas such as science, engineering and commerce, eventually however more extensive computational energy was required to cater for such problems and from this cluster computing was developed. A cluster is a collection of parallel or distributed computers, which are in- terconnected among themselves using high speed networks often, in the form of local area networks [67]. Multiple computers and their resources are combined to function as a virtual computer, allowing for greater computational energy. Each node carries out the same task and each cluster contains redundant nodes, which allows for a backup should a utilized node fail. Computers in a cluster can be described as homogeneous as they use the same operating systems (OS) and hardware. 20
  • 21.
    2.1 origins ofcloud computing 2.1.2 Grid Computing Grid computing, originally developed to meet the high computational demands of sci- entific research. Grid computing is a distributed network, which couples a wide variety of geographically distributed computational resources such as personal computers (PCs), workstations, clusters, storage systems, data sources, databases, computational kernels, special purpose scientific instruments and presents them as a unified integrated resource [15]. These grids are commonly established maintained and owned by large research groups with shared interest. Such an infrastructure requires a complex management system having to manage multiple global locations, multiple owners, heterogeneous com- puter networks and hardware as well as user policies and availability [1]. 2.1.3 Cloud Computing The most recent computing paradigm to progress towards the vision of providing com- puter services as a utility is cloud computing. A cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dy- namically provisioned and presented as one or more unified computing resources in order to provide a service underpinned by high levels of quality of service (QOS) and SLAs [17]. Cloud computing has been defined in many different ways the following is just one of those definitions Ian Foster et al. describes it as, ”A large scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing energy, storage, platforms and services are delivered on demand to external customers over the Internet.” [29] 21
2.2 characteristics of cloud computing

The characteristics of cloud computing infrastructures and models are a recurring concept in the literature, highlighted no more so than by Gong et al.'s comprehensive research review [30]. The following section contains a brief explanation of these characteristics.

2.2.1 Scalability of Infrastructure

A key feature of any cloud computing architecture is its ability to scale in accordance with peaks and troughs in customer demand. Such scalability not only allows providers to maintain SLAs but also allows for the strategic management of data center resources, thus reducing costs. Scalability can be summarised into two separate categories. Horizontal scalability refers to the ability of a node to access extra processing resources from other nodes within a data center, i.e. multiple nodes working as a single logical node to perform a task, therefore maintaining SLAs and QOS. Vertical scalability refers to the ability to add additional resources to a single node where necessary, such as increasing bandwidth, memory or central processing unit (CPU) utilization [55].

2.2.2 Autonomic Resource Control / Elasticity

The ability of services to be extended or retracted autonomously depending on demand is a key characteristic of cloud computing [100]. This is also referred to by Zhang et al. as the ability of self-organization [98]. This elasticity is a key aspect which differentiates cloud computing from the more rigid grid and cluster computing. It is also a key selling point, as customers are offered the ability to re-size their hardware needs in parallel with their requirements, without the expense of investing in physical resources which may lie largely redundant for long periods of time.
2.2.3 Service Centric Approach

Cloud computing providers deliver an on-demand service model, delivering services when and where they are needed. These services are provided in accordance with SLAs and QOS agreed between the consumer and provider prior to the provider obtaining control of the task [98].

2.2.4 Omnipresent Network Accessibility

Services can be accessed via an Internet connection from any location, using a range of heterogeneous devices, at any given time [100].

2.2.5 Multi-Tenancy

Multi-tenancy refers to the sharing of cloud resources, including CPU, memory, networks and applications [2]. Multi-tenancy in cloud computing plays a key role in the financial viability of providing such a service. Although users share these resources, providers place a layer of virtualization technology above the hardware layer which allows a customized and partitioned virtual application. Multi-tenancy is seen as a major aspect of cloud security at all layers of the cloud infrastructure; through partitioning and isolation of virtual resources, providers strive to provide maximum security [2].

2.2.6 Virtualization

Virtualization allows service providers to create multiple instances of virtual machines on a single server [11]. Each of these virtual machines can run a different operating system (OS), independent of the underlying OS. The ability of a provider to offer multiple instances of machines from a single server contributes greatly to the viability of cloud computing by maximising return on investment. Virtualization is discussed in further detail in Chapter 4.
2.3 cloud deployment models

There are four types of cloud deployment models that are commonly referred to in the literature: private, public, hybrid and community. The following section outlines the structure of each model.

2.3.1 Private Cloud

A private cloud is a cloud model that is devoted solely to one organisation. The cloud infrastructure may be located in-house or elsewhere, in a single or multiple data centers. It may be managed solely by the organisation or by a third party [25]. A private cloud can offer high security, performance and reliability; however, the cost associated with private clouds is often higher than that of other models [98].

Figure 2.1: Private cloud [93]
2.3.2 Community Cloud

A community cloud is a cloud infrastructure built and shared by multiple organisations which share common policies, practices and regulations. The underlying infrastructure can be hosted by a third party or by an individual organisation within the community [35].

2.3.3 Public Cloud

A public cloud is where commercial entities offer cloud services to the general public, usually on a pay-per-use model. Public clouds offer the benefit of no upfront capital expenditure on infrastructure; however, the refinement of services and security seen in private clouds is not as extensively available [98]. Examples of such services are Amazon EC2 or Google Compute Engine.

Figure 2.2: Public cloud [93]
2.3.4 Hybrid Cloud

A hybrid cloud combines the facilities of two or more cloud models: a private cloud and other public or community clouds. Such a model allows for a private cloud to be held in-house while certain aspects of the information technology (IT) infrastructure can be held on public clouds. Such an infrastructure supplies an organisation with the ability to retain high security and specific optimisation, while maintaining the elasticity provided by public clouds [98].

Figure 2.3: Hybrid cloud [93]
2.4 cloud architecture

Fig. 2.4 shows a high level view of a cloud computing architecture, an architecture which is tightly coupled with what are known as the cloud service models.

At the lowest level is the hardware layer; these are data centers that hold large volumes of physical servers and associated equipment. On top of the hardware lies the infrastructure layer; this layer virtualizes the servers held in the data centers on demand by creating multiple instances of virtual machines, which includes virtualizing CPUs, memory, storage etc. These first two layers combine the necessary elements to provide IaaS to the consumer.

Figure 2.4: High level cloud architecture [15]

The third layer, the platform layer, provides a development, modelling, testing and deployment environment for developers of applications hosted in the cloud [29]. The developers have little or no access to the underlying networks, servers etc., except for some minor user configuration [57]. This allows for the provision of platform as a service (PaaS) as a cloud service model.

The top layer, known as the application layer, is the user interface of cloud computing, usually supplied via browsers on heterogeneous Internet-enabled devices [77]. This layer allows access via a web browser or an application interface for software applications hosted on cloud servers. The consumer has no control of the underlying infrastructure of the cloud or the applications' capabilities, except those provided by the creator. This cloud service model is referred to as software as a service (SaaS) [57].
2.5 summary

This chapter reviewed cloud computing from a high level viewpoint, reviewing the origins, characteristics, models and architecture of cloud computing. Key pieces of literature outlined the foundations of cloud computing in grid and cluster computing, and the importance of autonomic resource control, scalability of infrastructure and virtualization in providing a cost effective and adaptable cloud. The chapter concludes with a review of cloud deployment models and architecture in order to convey their everyday real world use and applications.
3 D A T A C E N T E R S A N D E N E R G Y C O N S U M P T I O N

From 2005 to 2010 the worldwide consumption of energy in data centers increased by 56%. In 2010, data center energy consumption worldwide accounted for 1.3% of all energy consumption. Furthermore, the approximately 6000 data centers present in America in 2006 cost $4.5 billion in energy overheads [41]. These figures highlight the extensive consumption of energy in data centers and the necessity for all stakeholders to actively pursue methods by which to reduce consumption, from both an economic and an environmental viewpoint. This chapter examines the most current and relevant research in relation to energy consumption and preservation techniques deployed within large scale data centers.

3.1 areas of energy consumption

For a number of years, researchers and engineers have focused on improving the performance of data centers, and in doing so have improved systems year on year. However, although the performance per watt has increased, the total energy consumption has remained static and in some cases risen [43]. In order to combat excessive consumption of energy it is important to recognize the disparate elements which consume energy within a data center. Servers naturally consume a large proportion of the overall energy intake; however, the associated infrastructural demands are also a major factor when calculating overall costs. These costs are calculated via the Power Usage Effectiveness (PUE) metric, which is defined as the ratio between the total energy consumed by a data center and the energy consumed by IT equipment such as servers, networking equipment and disk drives. The PUE factor ranges from as high as 2.0 in legacy data centers to as low as 1.2 in recent state of the art facilities [80]. At a PUE rate of 2.0, for every kilowatt utilised by IT components another kilowatt is consumed by infrastructure loads such as cooling, fans, pumps, uninterruptible power supplies (UPS) and power distribution units (PDU). In order to remain within the scope of this research, the author will solely investigate energy usage in relation to IT components.
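To make the PUE relationship concrete, the short Java sketch below computes PUE from two hypothetical meter readings and, in the other direction, estimates the infrastructure overhead implied by a given PUE. The class, method names and figures are illustrative assumptions only and are not taken from the studies cited above.

import java.util.Locale;

/** Minimal sketch of the PUE calculation described above (illustrative figures only). */
public class PueExample {

    // PUE = total facility energy / energy consumed by IT equipment.
    static double pue(double totalFacilityKwh, double itEquipmentKwh) {
        return totalFacilityKwh / itEquipmentKwh;
    }

    // Infrastructure overhead (cooling, fans, pumps, UPS, PDU losses) implied by a PUE value.
    static double overheadKwh(double itEquipmentKwh, double pue) {
        return itEquipmentKwh * (pue - 1.0);
    }

    public static void main(String[] args) {
        double itLoad = 1000.0; // hypothetical IT load in kWh
        System.out.println(String.format(Locale.ROOT, "Legacy facility PUE: %.1f", pue(2000.0, itLoad)));
        System.out.println(String.format(Locale.ROOT, "Overhead at PUE 1.2: %.0f kWh", overheadKwh(itLoad, 1.2)));
    }
}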
3.1.1 Server Consumption

Intel research has shown that the main source of energy consumption in a server remains the CPU; however, it no longer maintains the dominance of energy consumption it once did, due to the implementation of energy efficiency and energy saving techniques such as dynamic voltage and frequency scaling (DVFS) [45]. DVFS is a hardware based solution which dynamically adjusts the voltage and frequency of a CPU in accordance with workload demand. The purpose of applying DVFS is to reduce energy consumption by lowering the voltage and frequency levels; however, this can lead to degradation of execution speeds [46]. DVFS is important for energy management at server level as it allows a CPU to run at levels as low as 30%. However, the CPU is the only server component with the ability to perform such a task; disk drives, dynamic random-access memory (DRAM), fans etc. can only cycle between states of on, off or idle, which results in an idle server consuming in excess of 70% of its overall energy.

3.2 power management techniques

Energy management techniques are incorporated in all aspects of system design. Beloglazov breaks these techniques into two subsections: static power management (SPM) and dynamic power management (DPM). SPM incorporates all design time power management methods, including complex gate and transistor design, energy switching in circuits at a logical level and the incorporation of energy optimization techniques at architecture level [12]. Dynamic power management (DPM) refers to the run-time adaptability of a system in correlation with resource demand. DPM technologies can be further subdivided into two sections: dynamic component deactivation (DCD) and dynamic performance scaling (DPS).
3.2.1 Dynamic Component Deactivation

DCD incorporates the switching of power states for a component that does not incorporate DPS techniques such as DVFS. Switching between power states, i.e. active-idle or idle-off, can result in significant energy consumption at the reinitialisation stage should the component be required at a later stage; therefore it is necessary to ensure DCD occurs only when the energy saved through deactivation is greater than that accrued during reinitialization [12]. Benini et al. state that to apply DCD techniques, a workload must be possible to predict [13]. This prediction, and the accuracy thereof, is imperative to the performance of such techniques. These predictions are based on usage of the overall system to date and possible use in the near future. An example given by Benini et al. is that of a timeout function on a laptop, where a laptop moves from active to idle after a period of time on the presumption that, having been idle for x minutes, it is likely to remain idle for an additional x amount of time [13]. Predictive policies rely on past data and its correlation to future events. Through the analysis of past performance and demands, the system forms both predictive shut down and predictive wake up techniques.

3.2.2 Dynamic Performance Scaling

DPS allows the utilization and application of energy saving techniques in hardware components with the ability to alter their frequency and clock speeds, mainly CPUs, when they are not fully utilized. This technique is known as DVFS. In order to save maximum energy, a system requires both frequency scaling, i.e. the ability to alter the clock speed, and voltage scaling. The implementation of such a technique is by no means straightforward: reducing the instruction processing capability reduces throughput and performance, which in turn increases a program's run-time and may not result in maximum energy savings; therefore it is necessary to balance the energy/performance ratios within a system through careful approximation. In order to optimize the ratio, three common techniques are implemented. Interval based algorithms harness past system usage data and adjust voltage and frequency in line with predicted future use. Intertask algorithms distinguish the number of tasks in relation to the CPU in real-time systems and allocate resources appropriately; this can become complex in a system with unpredictable heterogeneous workloads.
Intratask algorithms look at the data and individual components within a specific program and then provide resources appropriately [12].

3.3 summary

This chapter focused on the area of energy consumption in data centers. Following a brief introductory section highlighting energy consumption on a world scale, Section 3.1 focuses on the specific areas of energy consumption within data centers, including tertiary elements such as PDUs and UPS, and defines the PUE metric. Section 3.1.1 takes a closer look at hardware specific consumption, particularly at server level. The chapter closes by reviewing two key power management techniques, DCD and DVFS.
4 V I R T U A L I Z A T I O N

Although virtualization has seen increased prominence and usage since the early 1990s, it was originally developed as far back as 1964 by IBM as a method to increase the productivity levels of both the hardware and the user. In the 1960s many engineers, scientists and large scale research groups were using programs to carry out research; however, these programs were resource intensive, requiring the full use of the hardware system and the supervision of a researcher to run and record results.

This led to some pioneering work in areas such as the compatible time sharing system (CTSS) at M.I.T. in the early 1960s [23]. CTSS allowed batch jobs to be run in parallel with users' requests to run programs. This in turn led to the creation of the control program and Cambridge monitor system (CP/CMS) in 1964, known as a second generation time sharing machine, built on the concepts of the earlier CTSS. CP provided separate computing environments, while CMS allowed for autonomy through sharing, allocation and protection policies [23], similar to the operations carried out by the virtualization layer in modern cloud environments.
4.1 modern day virtualization

In 1998 VMware conquered the task of virtualizing the x86 platform through a combination of binary translation and direct execution on the processor, allowing multiple guest OSs on a single host [83]. Virtualization is the faithful reproduction of an entire architecture in software which provides the illusion of a real machine to all software running above it [40]. In an era of on-demand computing, the ability to virtualize a single server into multiple instances of virtual machines running separate guest OSs, with secure, reliable access to resources such as I/O devices, memory and storage, has proven imperative to the growth of cloud computing.

Virtualization of a single server into multiple VMs is achieved by placing an extra layer known as a hypervisor directly on top of the hardware and beneath the OS layer. This layer, also known as a virtual machine monitor (VMM), is responsible for providing total mediation between all VMs and the underlying hardware [65]. The VMM allows access to resources held in the infrastructure layer while ensuring isolation of VMs, which improves security levels and reliability. The VMM also spawns new VMs on demand, migrates VMs to existing or new instances when necessary, and applies consolidation techniques by moving VMs from underutilized hosts and powering down these hosts in order to conserve energy.
4.2 levels of virtualization

Virtualization can be applied using three different techniques: full virtualization, paravirtualization and/or hardware assisted virtualization. All three methods must deal, and have dealt, with the need to alter the privilege levels of an architecture, also referred to in the literature as ring aliasing or ring compression, to allow virtualization to take place. For example, in an x86 architecture there are four levels of privilege; the OS takes the lowest level and therefore presumes it has direct access to the host it is placed on. However, by placing a virtualization layer underneath the OS, the levels of privilege are altered.

4.2.1 Full Virtualization

Full virtualization allows for the complete isolation of a guest OS from the underlying infrastructure. This allows an unmodified OS to run using a hypervisor to trap and translate privileged instructions on the fly or through the use of binary translation [20]. Although full virtualization can carry high overheads due to the need to catch and translate privileged instructions, it does provide the most secure and isolated environment for VMs.

4.2.2 Paravirtualization

Paravirtualization continues to employ a hypervisor; however, this method of execution requires the hypervisor to alter the kernel of the guest VM. The hypervisor alters the OS calls and replaces these with hypercalls, allowing for direct communication between the guest VM and the hypervisor without the need to process privileged instructions or to create binary translations, thus decreasing overhead [83]. In doing this, paravirtualization reduces the need for binary translation and therefore significantly simplifies the process of virtualization [96].
4.2.3 Hardware Assisted Virtualization

Hardware assisted virtualization, also referred to in the literature as native virtualization [20], is an alternative method of virtualization that seeks to overcome the limitations of paravirtualization, which requires modification of the guest OS, and the overheads of full virtualization, which occur through binary translation. Both Intel and AMD support hardware assisted virtualization through the Intel VT and AMD-V virtualization extensions [20].

In order to address the problem of virtualization, and in particular the levels of privilege required for systems to run effectively and efficiently, Intel VT-x, which supports IA-32 processor virtualization, introduces two separate forms of operation: the guest runs in VMX non-root operation and the hypervisor runs in VMX root operation, each of which provides four separate levels of privilege. This allows the guest OS to run at its expected ring 0 privilege level and provides the hypervisor with the ability to run multiple privilege levels. In order to run this configuration Intel has applied two extra transitions: from the guest to the hypervisor, known as a VM exit, and from the hypervisor to the guest, known as a VM entry. The VM exit and entry are managed via a virtual machine control structure, which is subdivided into two sections, one dealing with the guest state while the other deals with the host state [78].
4.3 hypervisors

As highlighted in the above literature, in order to implement virtualization there is a need to deploy a hypervisor, commonly referred to as a VMM, on top of the hardware level within the system, also referred to in the literature as the bare-metal level. The hypervisor provides an intermediate layer between VMs and the underlying hardware. This layer allows for the total encapsulation of the VM, which provides stability, security and reliability against bugs or malicious attacks, and the mapping and remapping of existing and new VMs. The subsections below review the most commonly deployed hypervisors in data centers today.

4.3.1 Xen

Xen is an open source project whose hypervisor is currently powering some of the largest cloud deployments today, such as Amazon Web Services, Google and Rackspace services [94]. Originating in the late 1990s in Cambridge University, the XenoServer computing infrastructure project proposed the creation of a simple hypervisor which allows users to run their own OS, with the added capability to run specifically designed applications directly on top of the hypervisor to improve performance and allow a substantial number of disparate guest OSs [32]. In 2002 Xen was released as an open source project; it has since seen four major updates.

Xen put forward a paravirtualized architecture, citing the complexity of full virtualization as a major and unwelcome cost. The Xen team believed that hiding virtualization from the guest OS risked correctness and performance, and that paravirtualization was necessary to obtain high performance, robustness and isolation [5]. In order to do this the hypervisor must cater for all standard application binary interfaces (ABI) and support a full range of OSs.

In 2005 Xen, in conjunction with Cambridge and Copenhagen Universities, introduced the design and implementation of the live migration of VMs. This was a major step forward in hypervisor efficiency. Live migration could be completed with a downtime as low as 60ms; it allowed for the decommissioning of the original VM once the transfer was complete, it allowed for media services etc. to be transferred without the need for users to reconnect, and it allowed for the VM to be transferred as a single unit,
eliminating the need for the hypervisor to have knowledge of individual applications within the VM. This further progressed the maintainability of a data center by further improving the ability to perform dynamic consolidation of VMs [21].

Today, Xen offers a large range of virtualization solutions for multiple architectures, including ARM and x86; it also provides the capability to virtualize a large range of OSs, including Linux, Solaris and Windows, through the use of full hardware assisted virtualization.

4.3.2 KVM

The kernel-based virtual machine (KVM) originated in 2006 as an open source project. KVM requires the Intel VT-x or AMD-V instruction sets to run, both of which were also made available in 2006. A KVM hypervisor allows for up to 16 virtual CPUs running full virtualization methods [95]. KVM leverages the hardware extensions provided by Intel and AMD to add a hypervisor to a Linux environment. Once this hypervisor is added to the environment, it also adds a /dev/kvm device node which allows users to create virtual machines, read and write to virtual CPUs, run a virtual CPU and inject interrupts, and allocate memory via a memory management unit (MMU) for the translation of virtual addresses to physical addresses. This MMU consists of a page table which encodes the mapping of virtual addresses to physical addresses, a notification manager for page faults and a translation lookaside buffer (TLB) and instruction set, all located on the chip to decrease table look-up time [39].

4.3.3 VMware

VMware is a hypervisor which is the result of research carried out at Stanford University [61]. In 1998 VMware built on this research and virtualized the x86 architecture through binary translation and direct processor execution [84]. The implementation of full binary translation allowed VMware to deploy full virtualization of its platform, as well as the ability of its guest VMs to host a range of OSs including Linux and Windows.
Originally VMware offered VMware Workstation, deployed as a hosted architecture which placed a virtualization layer directly as an application on the host OS. In more recent times VMware ESX uses a hypervisor layer placed on bare metal, significantly increasing I/O performance [86].

Similar to Xen 3.0.1 and KVM, it utilises a data structure to track the translation of virtual pages to physical memory pages; shadow pages are kept in sequence with the pmap structure for the processor in order to minimise overheads. VMware DRS monitors VMs within a data center; by leveraging VMotion, which allows for live migration, and VM schedulers, it allocates and reallocates VMs as necessary. VMware HA monitors hosts for failures. It allows for rapid redeployment of VMs from a failed host when necessary, and it also ensures that the required storage to facilitate this redeployment is available at all times within a cluster [86].

4.4 summary

This chapter reviewed the area of virtualization, beginning by looking at the early stages initiated by IBM in the 1960s and continuing through to modern day virtualization. The second half of the chapter reviews the different layers and methods used in implementing virtualization, with the chapter coming to a close by examining the three most commonly deployed hypervisors today.
5 R E I N F O R C E M E N T L E A R N I N G

Reinforcement Learning (RL) dates back to the early days of cybernetics and work in statistics, psychology, neuroscience and computer science [38]. From a purely computer science viewpoint, RL is a type of machine learning, where machine learning is viewed as the ability of computer programs to automatically improve through experience.

RL has been an area of research since the late 1950s, when Samuel first applied temporal difference (TD) methods in order to manage action values. Some years later, in 1961, Minsky is attributed with developing the term RL [71]. However, it was the development of value functions and their mathematical characterization in the form of the Markov decision process (MDP) in the mid 1980s that helped propel its popularity as an artificial intelligence (AI) approach to problem solving [72]. The successful application of RL to disparate tasks, such as Tesauro's TD-Gammon or Barto's work on improving elevator performance through the use of RL and neural networks, has also elevated its appeal to researchers in recent times [74] [9].

This AI approach offers a more flexible approach than many of its counterparts, and this is a key part of what differentiates it from other forms of machine learning, including supervised and unsupervised learning. By this we mean that actions can be low-level non-critical decisions or high-level strategic methods, that the boundaries between an agent and its environment are not rigidly defined and can adapt to suit the given workspace or problem, and that the time steps involved need not be of chronological order; they can be stage or task related to suit the problem domain.
5.1 agent / environment interaction

Within an RL framework the learner is commonly referred to as an agent, with everything outside of the agent referred to as its environment. Through a cyclical process of state-action-reward at discrete time steps, the agent learns an optimum policy.

As an agent progresses through the state space, in the main, its current action affects not only the immediate reward received, but also the probability of maximising future rewards. Therefore an "optimal action" must take into account not only the immediate reward but also the possible future reward in deciding which action to take, commonly referred to as delayed reward. RL delayed reward problems are commonly modelled as MDPs. An MDP is a mathematical structure for the modelling of decisions under uncertainty. An MDP is represented as a 4-tuple $(S, A, T, R)$ [7], where:

$S$ - The state space; in a reinforcement learning framework this is referred to as the environment state.

$A$ - The action space, representative of all possible actions in a given state in a reinforcement learning framework.

$T$ - The transition function, the probability that action $a$ taken in state $s$ will result in state $s'$, defined as:

$P^{a}_{ss'} = \Pr\{\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,\}$ (1)

$R$ - The reward function; given any current state $s$ and action $a$, together with any next state $s'$, the expected value of the next reward is:

$R^{a}_{ss'} = E\{\, r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s' \,\}$ (2)
Therefore, we can view reinforcement learning as the ability to map states to actions in order to maximize a numerical reward. In order to achieve this, a recurring interaction at discrete time steps between the agent and the environment is necessary, as laid out in Fig. 5.1. The agent receives a representation of the environment in the form of a state $s_t$. This allows the agent to select and return an action $a_t$, based on the agent's policy. At the beginning of the next time step the environment returns a new representation of the current state $s_{t+1}$ and a numerical reward $r_{t+1}$ based on the previous action undertaken, $a_t$.

Figure 5.1: The agent-environment interaction in RL [73]
5.2 learning strategies

Traditional learning strategies, commonly referred to as update functions, assign rewards at the end of a task via the relation of actual and predicted outcome; however, such methods have proved to be resource intensive as regards memory, and can also be viewed as a static approach, unsuitable for more transformative problem domains. A more suitable learning strategy, known as temporal difference learning, provides a collection of methods for incremental learning specialized in the area of prediction problems [70]. Temporal difference methods do not require a full model of an environment in order to learn; rather, they update estimates based in part on previously learned estimates [87], without waiting for a final outcome, a process often referred to as bootstrapping [73].

A discount factor $\gamma$, which may range from 0 to 1, determines the importance of future rewards: a factor closer to 0 allows an agent to take a restricted view, considering only short-term rewards, while a value closer to 1 allows the agent to strive towards a greater long term reward. The learning rate $\alpha$ establishes the rate at which new information overrides old. A learning rate of 1 ensures that the most recent information obtained is utilised, while a learning rate of 0 infers no learning will take place.
5.2.1 Q-Learning

Q-learning is a form of model-free TD learning proposed by Watkins [88]. Q-learning learns on an incremental basis, calculating Q-values at each discrete time step as the estimated value of taking action $a$ and thereafter following an optimal policy $\pi$. Q-learning maps these state-action transitions at each non-terminal discrete time step through the following update rule:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \,]$ (3)

A single iteration results in a single Q-value, which combines the current reward with the discounted estimate of the best future value, $\gamma \max_{a} Q(s_{t+1}, a)$, allowing for progression towards an optimal policy $\pi$. The general form of the Q-learning algorithm is as follows:

Q-learning Algorithm
Initialize the Q-map arbitrarily, and the policy $\pi$
Repeat (while $s$ is not terminal):
    Observe $s_t$
    Select $a_t$ using $\pi$
    Execute $a_t$
    Observe $s_{t+1}$, $r_{t+1}$
    Update $Q$:
    $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \,]$
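To ground the update rule above, the following is a minimal, self-contained Java sketch of tabular Q-learning implementing Equation 3. The state and action encodings, parameter values and class name are illustrative assumptions and are not part of the selection algorithm developed later in this thesis.

import java.util.Random;

/** Minimal tabular Q-learning sketch (illustrative; states and actions are plain indices). */
public class QLearningSketch {
    private final double[][] q;          // Q(s, a) table, initialised arbitrarily (here, to zero)
    private final double alpha = 0.1;    // learning rate
    private final double gamma = 0.9;    // discount factor
    private final double epsilon = 0.1;  // exploration rate for an epsilon-greedy policy
    private final Random rng = new Random();

    public QLearningSketch(int numStates, int numActions) {
        q = new double[numStates][numActions];
    }

    /** Epsilon-greedy action selection over the current Q estimates. */
    public int selectAction(int state) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(q[state].length);        // explore
        }
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) best = a; // exploit the current best estimate
        }
        return best;
    }

    /** Equation 3: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    public void update(int s, int a, double reward, int sNext) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (double v : q[sNext]) maxNext = Math.max(maxNext, v);
        q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
    }
}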
5.2.2 SARSA

The modified connectionist Q-learning algorithm, more commonly known as SARSA, was introduced by Rummery & Niranjan [66]. They question whether the use of $\gamma \max_{a} Q(s_{t+1}, a)$ provides an accurate estimate of a given state, particularly in large scale real world applications, and believe that for optimal performance $\gamma$ must return to 0 for each non policy derived action. To counteract this they proposed the following update function, now known as SARSA:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \,]$ (4)

Rather than utilising $\gamma \max_{a} Q(s_{t+1}, a)$, they use the second state-action transition $Q(s_{t+1}, a_{t+1})$ for the calculation of a given Q-value, thus negating the need to return $\gamma$ to 0 for non policy derived actions. The name SARSA reflects the fact that a quintuple of events $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$ is required in order to calculate its Q-values. SARSA is viewed as an on-policy method, as it takes into account the control policy by which the agent is moving and incorporates that into its update of action values; in comparison, Q-learning is viewed as an off-policy method, as it simply assumes that an optimal policy is being followed. The general form of the SARSA control algorithm is as follows:

SARSA Algorithm
Initialize the Q-map arbitrarily, and the policy $\pi$
Repeat (while $s$ is not terminal):
    Observe $s_t$
    Select $a_t$ using $\pi$
    Execute $a_t$
    Observe $s_{t+1}$, $r_{t+1}$
    Select $a_{t+1}$ using $\pi$
    Update $Q$:
    $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \,]$
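For contrast with the Q-learning sketch above, a SARSA update differs only in using the action actually chosen in the next state rather than the greedy maximum. The method below is a sketch in the same illustrative style, with state, action and reward encodings assumed rather than prescribed.

/**
 * SARSA update (Equation 4), shown as a drop-in alternative to the Q-learning update above.
 * The caller selects aNext with the same policy it is following (on-policy learning).
 */
public void sarsaUpdate(double[][] q, int s, int a, double reward, int sNext, int aNext,
                        double alpha, double gamma) {
    q[s][a] += alpha * (reward + gamma * q[sNext][aNext] - q[s][a]);
}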
5.3 action selection policy

One element of RL not shared with other machine learning techniques is that of exploration vs. exploitation. In order to learn a truly optimal policy an agent must explore all possible states and experience taking all possible actions, while on the other hand, in order to exploit an optimal policy and its associated rewards an agent must follow the optimal policy. Commonly referred to as the dilemma of exploration and exploitation [73], it can have a great impact on an agent's ability to learn [76]. An agent that always exploits the best action of any given state, predefined in a state model, is said to be following a greedy selection policy; however, such an implementation never explores, thus paying no regard to possible alternative, more lucrative actions.

5.3.1 ε-Greedy

An alternative selection policy is known as ε-greedy. This method introduces a parameter epsilon, which controls the rate of exploration. Epsilon is set at a desired probability and at each time step is compared to a random number; should the random number fall below epsilon, a random action is chosen, therefore providing an element of exploration. As an agent converges closer to an optimum policy, epsilon may be reduced to reflect the lowered need for exploration.

5.3.2 Softmax

ε-greedy remains a popular method for providing an exploration allowance; however, a drawback is the equal probability of choosing the worst or the best action when exploring. An alternative which goes some way to addressing this issue is known as the Softmax action selection policy. When used in an RL paradigm, an action's probability of selection is a function of its estimated value, increasing the probability of the higher value action being chosen [90]. Softmax action probabilities are commonly obtained via the Gibbs (Boltzmann) distribution; however, estimates can be calculated in many different ways, often dependent on the underlying schema of the system in which an agent is deployed. Similarly, the benefit of Softmax over ε-greedy is undefined, as it too largely depends on the environment in which they are applied [73].
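The sketch below contrasts the two selection rules just described. It assumes a simple array of Q-value estimates for a single state; the temperature parameter for the softmax (Boltzmann) rule and the class name are illustrative choices.

import java.util.Random;

/** Illustrative epsilon-greedy and softmax (Boltzmann) action selection over one state's Q-values. */
public class ActionSelection {
    private static final Random RNG = new Random();

    /** With probability epsilon pick a random action, otherwise the current best estimate. */
    public static int epsilonGreedy(double[] qValues, double epsilon) {
        if (RNG.nextDouble() < epsilon) {
            return RNG.nextInt(qValues.length);
        }
        int best = 0;
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) best = a;
        }
        return best;
    }

    /** Sample an action with probability proportional to exp(Q(a) / temperature). */
    public static int softmax(double[] qValues, double temperature) {
        double[] weights = new double[qValues.length];
        double sum = 0.0;
        for (int a = 0; a < qValues.length; a++) {
            weights[a] = Math.exp(qValues[a] / temperature);
            sum += weights[a];
        }
        double r = RNG.nextDouble() * sum;
        for (int a = 0; a < qValues.length; a++) {
            r -= weights[a];
            if (r <= 0) return a;
        }
        return qValues.length - 1; // numerical fall-through
    }
}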
5.4 reward shaping

One of the main limitations of RL is the slowness of convergence to an optimum policy [52]. In an RL framework, value functions, otherwise referred to as Q-values, are traditionally initialised with either pessimistic, optimistic or random values [24]. These methods tend to overlook the fact that in real world applications a developer may hold key domain expert knowledge that, if incorporated, can help an agent based system converge to a level of optimum performance at a much quicker rate. The leveraging of such knowledge is known as knowledge based reinforcement learning. One such approach is known as reward shaping; this is the introduction of a domain expert designed reward in addition to the natural system reward. Due to the intrinsic relationship of rewards, states and actions, the accurate shaping of rewards is vital to the overall effectiveness of an agent. Poorly designed reward shaping can not only delay convergence to an optimal policy, but can in fact be detrimental to learning, as seen by Randlov and Alstrom, where an agent learning to ride a bike actively pursued a path away from the goal because the cumulative reward for correcting the orientation was greater than that for reaching the goal [64].

5.4.1 Potential Based Reward Shaping

Ng et al. [60] introduce potential based reward shaping (PBRS) in order to optimize the method of shaping rewards and in turn prevent the problems highlighted by Randlov and Alstrom's study [64]. The potential based reward is calculated as the difference in potential between the current state $s$ and the next state $s'$, and is formally defined as:

$F(s, s') = \gamma\,\phi(s') - \phi(s)$ (5)
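A minimal sketch of how a potential based shaping term can be folded into the Q-learning update from Section 5.2.1 is shown below. The potential function phi is an assumed, problem-specific heuristic supplied by the designer, not something prescribed by Ng et al.'s result, and the method name is hypothetical.

/**
 * Q-learning update augmented with a potential based shaping reward (Equation 5):
 * the agent receives r + F(s, s'), where F(s, s') = gamma * phi(s') - phi(s).
 * The potential array phi encodes assumed designer knowledge about how promising each state is.
 */
public void shapedUpdate(double[][] q, double[] phi, int s, int a, double reward, int sNext,
                         double alpha, double gamma) {
    double shaping = gamma * phi[sNext] - phi[s];   // F(s, s')
    double maxNext = Double.NEGATIVE_INFINITY;
    for (double v : q[sNext]) maxNext = Math.max(maxNext, v);
    q[s][a] += alpha * (reward + shaping + gamma * maxNext - q[s][a]);
}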
Research has proven that applying PBRS in both finite and infinite state spaces with a single RL based agent does not alter the optimal policy of the agent, but does decrease the convergence time significantly. This can best be seen in Fig. 5.2, taken from Wiewiora et al.; the diagram illustrates the convergence of a PBRS based agent against that of a non-PBRS based agent on a well known RL problem known as mountain car [92]. It is clearly visible that the PBRS based agent begins much closer to the optimal policy, greatly outperforming the standard agent based model.

Figure 5.2: PBRS effect [92]

5.5 summary

This chapter focuses on the AI approach known as reinforcement learning. Section 5.1 looks at the agent-environment interaction and the modelling of RL delayed rewards as MDPs. In Section 5.2, temporal difference learning strategies including Q-learning and SARSA are explored in detail. This is followed by the explanation and evaluation of the action selection policies ε-greedy and softmax. The chapter concludes by surveying the advanced RL technique known as PBRS.
6 R E L A T E D R E S E A R C H

Cloud computing leverages the ability to virtualize the key constituents which form the lowest level of the cloud architecture, known as IaaS, principally large scale data centers. The virtualization of large scale congregations of nodes, typical of those found in modern day data centers, into multiple virtual independent machines executing on a single node not only allows for the elasticity of services, but plays a key role in the high level adherence to SLAs and the maximum utilization of resources which underpin the foundations of cloud computing, while providing maximum return on investment for service providers. However, due to the dynamic nature of cloud services and their demands, the static or offline management of these VMs places significant restrictions on all of these key principles.

In recent years much research has been undertaken focusing on combining the areas of energy efficiency and dynamic resource selection and allocation policies. This research can be categorized into the following three sections.

• Threshold and Non-Threshold Approach
• Artificial Intelligence Based Approach
• Reinforcement Learning Approach
6.1 threshold and non-threshold approach

The main area of research concentrates at machine or host level. Nathuji and Schwan proposed a VirtualPower architecture which implements the Xen hypervisor with minimal alterations to the hypervisor [59]. Each host contains a local power management module (PM-L) residing locally as a controller in the driver domain, known as a Dom0 module. When a guest OS attempts to make power management decisions, these calls are trapped by the hypervisor due to their privilege levels; the VirtualPower package then passes these trapped calls to the PM-L, where decisions on power management can be made based on VirtualPower management rules contained in the Dom0 controller. However, while this research addresses local policies, it fails to address global policies for their suggested global power management (PM-G) module.

Kusic et al. introduced a proactive look-ahead control algorithm [44]. The algorithm, known as LLC, proposes to minimize CPU power usage and SLA violations while maximising providers' profits. It proposes the use of a quadratic estimation algorithm, the Kalman filter, to estimate workload arrivals and supply VMs accordingly. This approach requires a complex learning based structure in order to predict incomes, which in turn increases computational overhead. The research conclusions highlight this complexity as a serious issue, especially when dealing with discrete input values with exponential increases in worst case complexity, where the increase in control options accrues a large increase in the computational time required by the LLC controller: a data center with 15 hosts requires 30 minutes of execution time, which would be unrealistic for implementation in large scale data centers.

Cardosa et al. proposed leveraging existing parameters within the Xen and VMware packages to alter the method in which VMs contend for power, regardless of workload priority [19]. Parameters provided by the Xen and VMware hypervisors are used as follows: the min parameter allows for the allocation of the minimum amount of resources provided to any given VM, the max parameter allows the maximum resources applied to any given VM to be set, while the shares parameter allows a developer to set the ratio of CPU allocation between high and low priority VMs. By allocating high levels of minimum resources to high priority VMs and limiting the allocation to low priority VMs, they hope to improve overall performance. Using VMware ESX servers the authors carried out their experiments; however, the min, max and shares thresholds were designated prior to run-time, i.e. statically, with no
alternative for dynamic adjustment during run-time, thus limiting the suitability of the research for implementation in a real world cloud data center due to the heterogeneous nature of evolving applications. The research assumes that pre-existing maps of SLA agreements exist and uses these as input parameters, but fails to outline the number of SLA violations that result from applying the approach they outline.

Verma et al. implemented a power aware application placement framework called pMapper, designed to utilize power management applications such as CPU idling, DVFS and consolidation techniques that already exist in hypervisors such as Xen [81]. These techniques are leveraged via separate modules: the performance manager, which has a global overview of the system and receives information such as SLA and QOS parameters; the migration manager, which deals directly with the VMs to implement live migration; the power manager, which communicates with the infrastructure layer to manage hardware energy policies; and finally the arbitrator, which decides on the information supplied from the above mentioned policies for the optimal placement of VMs by utilising a bin packing algorithm. At the implementation stage pMapper was utilised to solve a cost minimization problem which considers power-migration cost and, similar to Cardosa et al., fails to address SLA violations [19].

In additional research, Verma et al. suggest that server consolidation can be viewed in three forms [82]. The first is static, where VMs or applications are placed on servers for an extended period such as months or years; the second is semi-static, for daily and weekly usage; and the third is dynamic, for VMs and applications with execution times ranging from minutes to hours. The authors highlight that tools currently exist to manage such structures, but are rarely used, and administrators often prefer to wait for offline migration to decide on placements. Although the paper highlights three forms of consolidation, it deals only with static and semi-static and, much like Cardosa et al.'s research, this limits the suitability of the research for implementation in a real world cloud data center due to the heterogeneous nature of evolving applications.

Jung et al. propose a hybrid system of on-line/offline collaboration, analysing data based on system behaviour and workloads on-line to feed a decision tree structure offline [36]. This approach allows for the modelling of large scale, complex configuration problems and reduces overheads by removing the decision model from the run-time environment. These models are used as the basis for creating on-line multi-tier queues in an attempt to
reach peak utilization. This research was furthered in 2009, when Jung et al. created a middleware for cost sensitive adaptation and server consolidation, utilising the multi-tier queues developed in their earlier study and applying a best first search graph algorithm, with cost-to-go as the transition costs, and a layered queuing network solver (LQNS) predictive modelling package [37]. However, this was modelled solely on a single web application and, similar to Cardosa and Verma, this limits the suitability of the research for implementation in a real world cloud data center.

Threshold based approaches for autonomic scaling of resources are more commonplace, with cloud providers such as Amazon EC2, through their Auto Scaling software, and RightScale implementing such policies. Threshold based approaches are based on the premise of setting an upper and a lower bound threshold that, when broken, trigger the allocation or consolidation of resources as necessary.

Research carried out in the area of threshold based approaches includes a proposed architecture known as "the 1000 islands solution architecture" by Zhu et al. [99]. Similar to Verma, they consider three separate application categories based on time periods and then assign an individual controller to each category. The largest timescale is hours to days, the second is minutes and the third is seconds. Each group is regarded as a pod and has a node controller managing dynamic allocation of the node's resources; as part of the node controller lies a utilization controller, which computes resource consumption and estimates the future consumption required in order to meet the SLA. This information is passed to a global arbitrator module which decides the overall allocation of resources. The arbitrator module associates individual workloads with priority levels in order to schedule work appropriately, with high priority work getting first allocation of resources. The pod controller monitors node utilization levels, setting 85% CPU utilisation as an upper threshold and 50% as a lower threshold; using this information it then migrates VMs as necessary. The pod set controller studies historic demands and estimates future demands using an optimization heuristic approach to formulate policies. Although the results of the experiments in this research are positive, the authors highlight the need to scale up the size of the test bed to realistically evaluate its strength in a real world application; to the best of our knowledge this has not yet been achieved.
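As a simple illustration of the threshold based premise described above, the sketch below checks a host's CPU utilisation against an upper and a lower bound and reports which action a controller would trigger. The 85% and 50% bounds follow the Zhu et al. example, while the class, enum and method names are hypothetical.

/** Illustrative threshold based trigger in the style described above (bounds follow the Zhu et al. example). */
public class ThresholdTrigger {

    public enum Action { MIGRATE_VMS_AWAY, CONSOLIDATE_AND_SLEEP, NO_ACTION }

    private final double upperBound; // e.g. 0.85 - host considered overloaded above this
    private final double lowerBound; // e.g. 0.50 - host considered underutilised below this

    public ThresholdTrigger(double upperBound, double lowerBound) {
        this.upperBound = upperBound;
        this.lowerBound = lowerBound;
    }

    /** Decide what a node controller would do for a given CPU utilisation in [0, 1]. */
    public Action decide(double cpuUtilisation) {
        if (cpuUtilisation > upperBound) return Action.MIGRATE_VMS_AWAY;      // relieve the overloaded host
        if (cpuUtilisation < lowerBound) return Action.CONSOLIDATE_AND_SLEEP; // empty and power down the host
        return Action.NO_ACTION;
    }
}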
6.2 artificial intelligence based approach

John McCarthy defines AI as the science and engineering of making intelligent machines, especially intelligent computer programs [54]. This ability to make intelligent computer programs forms the basis for the following concepts, in which researchers apply a range of AI approaches as tools for the optimization of resource allocation in a cloud environment.

One such example is that of Hu et al., who consider a genetic algorithm (GA) approach to the scheduling of resources, in particular VMs [34]. Utilizing a GA in conjunction with historic performance data, Hu attempts to predict the effect of multiple possible schedules in advance of any deployment in order to apply the best load balance.

Wei et al. deploy a similar approach to resource optimization through a game theoretic approach, scheduling resources through a two step cost-time optimization algorithm [89]. Each agent solves its problem optimally, independent of the others; at this stage an evolutionary optimization algorithm takes this information, collates the data, estimates an approximate optimal solution and donates resources as necessary.

Particle swarm optimization, a concept first introduced by Kennedy [63], was deployed by Pandey in 2010 to optimise the mapping of workflows to resources in a cloud environment [62]. Each particle represents a mapping of resources to tasks in a five dimensional space, i.e. each particle has five jobs; these particles are released into the search space mapping their best locations, in this case the best task to resource allocation, in order to determine the optimal combined workflow.

Moghaddam et al. in 2014 introduce the concept of a multi-level grouping genetic algorithm (MLGGA) [58]. The researchers highlight the fact that the problem of optimal VM placement is NP-hard and can be viewed as a bin packing problem. Due to its bin packing nature, they use a grouping genetic algorithm (GGA) as their base algorithm and attempt to introduce a multi-level grouping concept to optimize the placement and grouping of VMs and in turn reduce the carbon footprint. While the researchers' experiments are both substantial and strenuous, proving a lowering of the carbon footprint, the research fails to address some of the key aspects of VM placement in data centers,
such as quality of experience, security, QOS and SLAs.

Sookhtsaraei et al., similar to Moghaddam, introduce a genetic algorithm solution as an approach to optimizing bin packing for VMs [69]. Using GGA as a base for their algorithm, they create an algorithm called CMPGGA; the CMPGGA algorithm considers bandwidth, CPU and memory along with hosts and VMs as input parameters, with an output of an optimized mapping of VMs to hosts. While CMPGGA can claim an improvement in reducing operational costs, the research fails to address QOS or SLA violations. Without considering these violations, which can result in monetary penalties for the service providers, it is impossible to fully calculate operational improvements.
6.3 reinforcement learning based approach

A more recent approach is the application of RL agents to optimize resource management in the cloud. Barrett et al. propose a parallel RL framework for the optimisation of scaling resources in lieu of the threshold based approach [7]. The approach requires agents to approximate optimal policies and share their experiences with a global agent to improve overall performance, and has proven to perform exceptionally well despite the removal of traditional rigid thresholds.

Bahati proposes incorporating RL in order to manage the existing threshold based rules [4]. A primary controller applies these rules to a system in order to enforce its quality attributes. A secondary controller monitors the effects of implementing these rules and adapts the thresholds accordingly.

Another approach, adopted by Tesauro, introduces a hybrid RL approach to optimising server allocation in data centers through the training of a nonlinear function approximator in batch mode on a data set, while an externally trained policy makes management decisions within a given system [75].

Finally, Farahnakian et al. and Yuan et al. present dynamic RL techniques to optimize the number of active hosts in operation in a given time-frame [97] [27]. An RL agent learns an online host energy detection policy and dynamically consolidates machines in line with optimal requirements. After detection of over utilized hosts, both studies employ Beloglazov's minimum migration time selection policy in order to identify VMs for migration [11].

All of the above RL approaches have proven a statistical advantage over threshold based approaches, and this forms the motivation for this research: to implement and evaluate RL at a lower level of abstraction, as a policy for the selection of VMs.
6.4 virtual machine selection policies

Beloglazov et al.'s study carried out in 2011 remains one of the most highly cited and accepted pieces of research in relation to the consolidation of VMs while maximizing performance and efficiency in cloud data centers [11]. Beloglazov examines the dynamic consolidation of VMs while considering multiple hosts and VMs in an IaaS environment. Unlike numerous other research papers, Beloglazov models SLAs as a key component in a solution to VM consolidation. Beloglazov's proposed algorithm can be broken into three sections: overloading/underloading detection, VM selection and VM placement.

Overload detection: Building on past research, Beloglazov suggests an adaptive selection policy known as Local Regression (LR) for determining when VMs require migration from a host in order not to violate SLAs [10]. Local regression, first proposed by Cleveland, allows for the analysis of a local subset of data, in this case hosts [22]. Given an over utilization threshold along with a safety parameter, LR decides that a host is likely to become over utilised if its current CPU utilization multiplied by the safety parameter is larger than the maximum possible utilization.

VM selection: Virtual machines $v$ are placed on a migration list based on the shortest period of time to complete the migration; the minimum time is considered as the utilized RAM divided by the spare bandwidth of the host $h$. The policy chooses the appropriate VM $v$ through the following equation, where $RAM_u(a)$ is the amount of RAM currently utilized by the VM $a$, and $NET_h$ is the spare network bandwidth available on host $h$:

$v \in V_h \;\big\vert\; \forall a \in V_h,\ \dfrac{RAM_u(v)}{NET_h} \le \dfrac{RAM_u(a)}{NET_h}$ (6)

Beloglazov's research proves that the dynamic VM consolidation algorithm Lr-Mmt significantly outperforms static policies such as DVFS or non power aware approaches. It also outperforms the following dynamic policies.
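The minimum migration time rule in Equation 6 amounts to picking, from the migratable VMs on an overloaded host, the one whose RAM-to-spare-bandwidth ratio is smallest. The sketch below expresses that rule over a plain list of VM descriptors; the Vm record, its fields and the class name are illustrative stand-ins, not CloudSim classes or the policy proposed in this thesis.

import java.util.List;

/** Illustrative minimum migration time (MMT) selection over a host's migratable VMs (Equation 6). */
public class MinimumMigrationTimeSelection {

    /** Simplified VM descriptor: currently utilised RAM in MB (a stand-in, not a CloudSim class). */
    public record Vm(String id, double ramUtilisedMb) {}

    /**
     * Return the VM with the shortest estimated migration time, i.e. the smallest
     * RAM_u(v) / NET_h ratio, where NET_h is the host's spare bandwidth in MB/s.
     */
    public static Vm selectVmToMigrate(List<Vm> migratableVms, double spareBandwidthMbps) {
        Vm best = null;
        double bestTime = Double.MAX_VALUE;
        for (Vm vm : migratableVms) {
            double migrationTime = vm.ramUtilisedMb() / spareBandwidthMbps;
            if (migrationTime < bestTime) {
                bestTime = migrationTime;
                best = vm;
            }
        }
        return best; // null if the host has no migratable VMs
    }
}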
6.4.1 Maximum Correlation

The maximum correlation policy is based on the premise that the stronger the inter-relationship of applications running on an over utilized server, the higher the probability the server will overload, as highlighted by Verma et al. [81]. The maximum correlation policy finds a VM $v$ that satisfies the following condition, where $R^2$ denotes the multiple correlation coefficient of a VM's CPU utilization with that of the other VMs on the host:

$v \in V_h \;\big\vert\; \forall a \in V_h,\ R^{2}_{x_v}(x_1, \ldots, x_{v-1}, x_{v+1}, \ldots, x_n) \ge R^{2}_{x_a}(x_1, \ldots, x_{a-1}, x_{a+1}, \ldots, x_n)$ (7)

6.4.2 Minimum Utilization Policy

The minimum utilization policy is a simple method to select VMs from overloaded hosts. The policy chooses the VM with the minimum utilization on the host, calculated in millions of instructions per second (MIPS). The policy is repeated until the host is no longer considered to be overloaded [79].

6.4.3 The Random Selection Policy

The random selection policy is another simple method to select VMs from overloaded hosts. The policy chooses a VM randomly to migrate. The policy is repeated until the host is no longer considered to be overloaded [79].
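The two simple baselines just described translate almost directly into code. The sketch below uses the same illustrative VM descriptor style as the MMT example above, not CloudSim's own classes, and in practice either method would be re-invoked until the host is no longer considered overloaded.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;

/** Illustrative sketches of the minimum utilization and random selection policies described above. */
public class SimpleSelectionPolicies {

    /** Simplified VM descriptor (a stand-in, not a CloudSim class). */
    public record Vm(String id, double cpuUtilisationMips) {}

    private static final Random RNG = new Random();

    /** Minimum utilization: migrate the VM currently using the fewest MIPS. */
    public static Optional<Vm> minimumUtilization(List<Vm> migratableVms) {
        return migratableVms.stream().min(Comparator.comparingDouble(Vm::cpuUtilisationMips));
    }

    /** Random selection: migrate a uniformly random VM. */
    public static Optional<Vm> randomSelection(List<Vm> migratableVms) {
        if (migratableVms.isEmpty()) return Optional.empty();
        return Optional.of(migratableVms.get(RNG.nextInt(migratableVms.size())));
    }
}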
6.5 research group context

This research was undertaken as part of a wider research group led by Dr. Enda Howley. The research group is known for research in the areas of multi-agent systems, cloud, swarm, smart cities, social network analysis & simulation and data analytics. The following section reviews just a subset of the research carried out by past and present members of the group in the areas of cloud and RL.

Barrett et al. present a novel approach to workflow scheduling in a cloud environment [6]. A workflow architecture estimates the average execution time and cost of a task, which are passed to multiple solver agents, which, through the use of GAs, produce various possible schedules. An MDP agent takes these possibilities and develops an optimal schedule for the workflow execution. Results show that the MDP agent can optimally choose a schedule despite an environment having varying loads and data sizes.

Further work by Barrett et al. includes the automation of resource allocation in the cloud through the use of an RL multi-agent approach [7]. Each agent addresses incoming workloads; on the basis of these requests an agent must approximate an optimal policy for resource allocation, and each agent shares this information with the others before finally forwarding optimal scheduling policies to an instance manager which allocates VMs based on the advice. Results show that by parallelising the scheduling process the time taken to converge is reduced greatly and the framework can effectively select VMs of varying types for the required workload.

Mannion et al. present a parallel learning RL algorithm which utilizes heterogeneous agents [51]. Each of these heterogeneous agents learns in parallel on a partitioned subset of the overall problem. The knowledge/experience of these agents is then made available to a master agent, where the values are used for Q-value initialisation. This parallel approach has proven to outperform the standard Q-learning approach, resulting in increased learning speed and a lower step to goal ratio.

This work is advanced further where Mannion et al. introduce this parallel learning of partitioned action spaces to a smart city environment and traffic signal control [49]. Results show significant improvement with the use of action space partitioning compared to a standard RL approach. Mannion also investigates the area of potential based reward
Mannion also investigates the area of potential-based reward systems to improve performance in the learning of traffic signal control [50]. Comparing a potential-based reward agent with a standard agent, Mannion shows that not only does learning speed increase, but queue and delay times are also reduced.

6.6 summary

This chapter reviews the key literature in the area of resource allocation, selection and scheduling in a cloud environment. Section 6.1 explores the traditional static, threshold and non-threshold approaches to resource management. Section 6.2 progresses to analyse more dynamic approaches to resource management through the application of various A.I. techniques ranging from GAs and PSO to game theory. Section 6.3 focuses on RL as a specific method of resource scheduling, with work from Barrett and Bhatti providing key examples of resource scheduling via RL. Section 6.4 reviews pertinent literature from Beloglazov in the area of VM selection algorithms, including minimum migration time and maximum correlation. The chapter concludes by highlighting the role of this research in relation to the wider research group.
Part III

M E T H O D O L O G Y
7 C L O U D S I M

7.1 overview

The CloudSim toolkit was chosen as an appropriate simulation platform as it allows for the modelling of a virtualised IaaS environment and is the basis of much leading research into cloud computing capabilities, particularly energy conservation and resource allocation [47] [91] [68] [16]. The CloudSim framework is a Java based simulator developed by the CLOUDS Laboratory, University of Melbourne. It allows for the representation of an energy-aware data center with LAN-migration capabilities. In keeping with industry standards, 300 second / 5 minute intervals are used to establish whether a host is over-utilised and requires the migration of VMs. The default ceiling threshold for utilisation is 100% with an added safety parameter of 1.2. This safety parameter acts as an over-utilisation buffer: for example, a host determined to be 85% utilised is multiplied by the safety parameter 1.2, resulting in a utilisation of 102%, and is therefore deemed over-utilised.
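As a small illustration of the buffer described above, the following Java sketch applies the check exactly as worked through in the example. It mirrors the description rather than CloudSim's actual Local Regression detector, and the method and parameter names are chosen here purely for illustration.

    // Illustrative check only: mirrors the description above, not CloudSim's LR detector.
    public class OverUtilisationCheck {

        /** Returns true when utilisation scaled by the safety parameter exceeds 100%. */
        static boolean isOverUtilised(double cpuUtilisation, double safetyParameter) {
            return cpuUtilisation * safetyParameter > 1.0;
        }

        public static void main(String[] args) {
            double utilisation = 0.85;      // host is 85% utilised
            double safetyParameter = 1.2;   // the default buffer described above
            // 0.85 * 1.2 = 1.02 -> 102%, so the host is treated as over-utilised
            System.out.println(isOverUtilised(utilisation, safetyParameter));
        }
    }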
7.2 cloudsim components

CloudSim is an event driven application written in the Java programming language, containing over 200 classes and interfaces for the complete simulation of a cloud environment. The following section highlights the main and most important classes as described by Buyya et al. and Calheiros et al. [16] [18]. Figure 7.1 contains a CloudSim class design diagram.

CloudInformationServices: The CloudInformationServices (CIS) class represents an entity which provides the registration, indexing and modelling services for a data center created within a simulation. A host from a data center registers its details with the CIS, which in turn shares these details with the data center broker class, which can then directly provide workloads to a host.

DataCenter: This class, which extends SimEntity, instantiates a data center and assigns a set of allocation policies for bandwidth (BW), memory and storage, and deals with the handling of VMs. This class is extended within the CloudSim framework as PowerDatacenter and NetworkDatacenter to allow for customised research, such as power reduction or network related research.

DataCenterCharacteristics: The DataCenterCharacteristics class allocates the static properties of a data center such as OS, management policy, time and costs.

Host: The Host class represents a physical resource, such as a server, which hosts VMs. The class contains the internal policies for BW, processing power and memory for a single instance of a host.

Vm: The Vm class represents a VM which is contained within a host. The class allows for the processing of cloudlets submitted from the DataCenterBroker class in accordance with its ability, defined by its memory, processing power, storage size and VM provisioning policy. Similar to Datacenter, it is extended within the CloudSim framework as PowerVm or NetworkVm to allow for customised research such as power reduction or network related research.
Cloudlet: The Cloudlet class allows for the instantiation of a Cloudlet object, tracks Cloudlet movement and allows for the cancellation, pausing or removal of a cloudlet from the CloudletList(). A Cloudlet in CloudSim represents a workload assigned to a VM.

DataCenterBroker: This class represents a broker acting on behalf of a client/user. The broker queries the CIS and retrieves the host list containing information on available VMs and their respective specifications, allowing the broker to directly assign cloudlets to VMs with the necessary capability to achieve the customer's QOS demands.

SimEntity: The SimEntity class is an abstract class which, when extended, represents a single simulation entity. The startEntity() method is invoked to begin a simulation; once started, the processEvent() method is called repeatedly to process all events held in the deferredQue(). Finally, the shutdown() method is invoked just prior to the termination of a simulation, which allows for events such as printing to a log file. All simulations must invoke the SimEntity class.

RamProvisioner: This is an abstract class which provides the necessary methods for RAM provisioning policies for VMs inside a host. It must be extended by researchers to configure custom RAM policies, otherwise CloudSim will implement the RamProvisionerSimple class as default.

BwProvisioner: The BwProvisioner class is an abstract class which provides the basic methods necessary to allocate a bandwidth allocation policy. It must be extended by researchers to configure custom BW policies, otherwise CloudSim will implement the BwProvisionerSimple class as default.
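To make the roles of these classes concrete, the sketch below wires a minimal simulation together: one datacenter with a single host, a broker, one VM and one cloudlet. It is a rough sketch assuming the CloudSim 3.x API; the constructor arguments (MIPS, RAM, costs and so on) are arbitrary illustrative values, and exact signatures may differ between releases.

    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.LinkedList;
    import java.util.List;

    import org.cloudbus.cloudsim.*;
    import org.cloudbus.cloudsim.core.CloudSim;
    import org.cloudbus.cloudsim.provisioners.BwProvisionerSimple;
    import org.cloudbus.cloudsim.provisioners.PeProvisionerSimple;
    import org.cloudbus.cloudsim.provisioners.RamProvisionerSimple;

    public class MinimalCloudSimExample {
        public static void main(String[] args) throws Exception {
            // 1. Initialise the simulation library before creating any entity.
            CloudSim.init(1, Calendar.getInstance(), false);

            // 2. One host with a single processing element (PE).
            List<Pe> peList = new ArrayList<>();
            peList.add(new Pe(0, new PeProvisionerSimple(1000)));            // 1000 MIPS
            Host host = new Host(0, new RamProvisionerSimple(2048),
                    new BwProvisionerSimple(10000), 1000000, peList,
                    new VmSchedulerTimeShared(peList));
            List<Host> hostList = new ArrayList<>();
            hostList.add(host);

            // 3. Datacenter characteristics and the datacenter itself.
            DatacenterCharacteristics characteristics = new DatacenterCharacteristics(
                    "x86", "Linux", "Xen", hostList, 10.0, 3.0, 0.05, 0.001, 0.0);
            new Datacenter("Datacenter_0", characteristics,
                    new VmAllocationPolicySimple(hostList), new LinkedList<Storage>(), 0);

            // 4. Broker, one VM and one cloudlet.
            DatacenterBroker broker = new DatacenterBroker("Broker_0");
            Vm vm = new Vm(0, broker.getId(), 500, 1, 512, 1000, 10000, "Xen",
                    new CloudletSchedulerTimeShared());
            UtilizationModel full = new UtilizationModelFull();
            Cloudlet cloudlet = new Cloudlet(0, 400000, 1, 300, 300, full, full, full);
            cloudlet.setUserId(broker.getId());

            broker.submitVmList(List.of(vm));
            broker.submitCloudletList(List.of(cloudlet));

            // 5. Run the simulation and print the completed cloudlets.
            CloudSim.startSimulation();
            CloudSim.stopSimulation();
            for (Cloudlet c : broker.getCloudletReceivedList()) {
                System.out.println("Cloudlet " + c.getCloudletId() + " finished on VM " + c.getVmId());
            }
        }
    }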
7.3 energy aware simulations

7.3.1 Initialising an Energy Aware Policy

Initialising an energy aware policy is possible by accessing the org.cloudbus.cloudsim.power.planetlab package located in the examples folder. This package contains an array of power aware simulations including Lr-Mmt, Lr-Mc and Lr-Mu. In order to create a new policy one must locate the main class within this package, from which CloudSim instantiates a new PlanetLab runner, providing it with the necessary information.

7.3.2 Creating a Selection Policy

The creation of a new selection policy is possible by accessing the org.cloudbus.cloudsim.power package located in the source folder; this package contains all allocation and selection policies for VMs. It also contains the PowerVmAllocationPolicyMigrationAbstract class, which invokes the method getVmsToMigrateFromHosts(). This method calls for the selection of a VM from an overloaded host. It is from this point that the selection policy instantiated by the user is invoked, and this is the key point of interaction between new or existing selection policies and CloudSim.
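A new selection policy is typically written by extending CloudSim's abstract PowerVmSelectionPolicy class and overriding getVmToMigrate(). The sketch below is a minimal, hypothetical example that simply migrates the VM with the lowest current CPU demand; it is an illustration of the extension point rather than any policy used in this thesis, and it assumes the CloudSim 3.x class names shown.

    import java.util.List;

    import org.cloudbus.cloudsim.Vm;
    import org.cloudbus.cloudsim.core.CloudSim;
    import org.cloudbus.cloudsim.power.PowerHost;
    import org.cloudbus.cloudsim.power.PowerVm;
    import org.cloudbus.cloudsim.power.PowerVmSelectionPolicy;

    /**
     * Sketch of a custom selection policy: migrate the VM with the lowest
     * current CPU demand (in MIPS) from an over-utilised host.
     */
    public class PowerVmSelectionPolicyLowestMips extends PowerVmSelectionPolicy {

        @Override
        public Vm getVmToMigrate(PowerHost host) {
            List<PowerVm> migratableVms = getMigratableVms(host);
            if (migratableVms.isEmpty()) {
                return null;
            }
            Vm selected = null;
            double minMips = Double.MAX_VALUE;
            for (PowerVm vm : migratableVms) {
                double mips = vm.getTotalUtilizationOfCpuMips(CloudSim.clock());
                if (mips < minMips) {
                    minMips = mips;
                    selected = vm;
                }
            }
            return selected;
        }
    }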
Figure 7.1: CloudSim class structure [12]

7.4 hardware

A data center comprising 800 physical servers, consisting of 400 HP ProLiant ML110 G5 and 400 HP ProLiant ML110 G4 servers, is the default data center topology. Alterations can be made to the hardware setup via the Constants class located in the org.cloudbus.cloudsim.power package. This class also provides the option of altering other key constants in relation to VM types and sizes, scheduling intervals, bandwidth and storage.

7.5 workload

The workload comes from a real world IaaS environment. PlanetLab files within the CloudSim framework contain data from the CoMon project representing the CPU utilisation of over 100 VMs from servers located in 500 locations worldwide. In order to produce an accurate and reliable experiment the algorithms were deployed to represent a one month time period; to achieve this, the PlanetLab files were utilised through random selection to create a 30 day workload. Each PlanetLab file contains 288 values representative of CPU workloads. VMs are assigned these workloads on a random basis in order to best represent the stochastic characteristics of workload allocation and demand within an IaaS environment. Each VM corresponds to an Amazon EC2 instance type, except that all VMs are single core, reflecting the fact that the workload was retrieved from single core VMs. The 288 CPU values, when used with CloudSim's default monitoring interval, represent 24 hours of data center activity.

7.6 summary

This chapter introduced the simulation environment used in the remainder of the thesis, CloudSim, including details of its class structure, the alterations necessary to introduce a new energy aware simulation, and the default hardware and workloads provided by the simulator.
8 A L G O R I T H M D E V E L O P M E N T

In order to produce a RL selection algorithm for VMs, some additional information must be provided to register the RL policy and measure its effects, and the RL framework itself must be created. The following chapter outlines in detail the necessary additional classes and alterations.

8.1 registering a selection policy

In order to register a new selection policy, the method getVmSelectionPolicy() from the class RunnerAbstract located in org.cloudbus.cloudsim.examples.power must be altered to include the name of the new policy and to provide access to it on the instantiation of a simulation (a sketch of this alteration is shown below). This class also allows the user to alter the name of the output folder for the compilation of results if required.

8.2 recording of results

Certain key metrics are automatically compiled by CloudSim at the end of each simulation. However, these results are a combined metric of the overall performance, for example the overall energy consumed or the overall number of migrations. For accurate detailing of the effect a new policy has on the data center, it is important to measure key information such as energy, number of migrations or SLA violations on an ongoing basis at discrete intervals. It is possible to do so in the Helper class located in org.cloudbus.cloudsim.examples.power. It is from this class that key metrics are printed to file at the end of each simulation; by introducing new methods it is possible to measure these key metrics on a much more refined timescale.
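Returning to the registration step of Section 8.1, the following is a rough sketch of how the getVmSelectionPolicy() method in RunnerAbstract can be extended with a new branch. The exact method body differs between CloudSim releases, and the "rl" key and RlSelectionPolicy class are names introduced by this thesis' framework rather than part of CloudSim.

    // Sketch only: the surrounding class is RunnerAbstract in
    // org.cloudbus.cloudsim.examples.power; existing branches are abridged.
    protected PowerVmSelectionPolicy getVmSelectionPolicy(String vmSelectionPolicyName) {
        PowerVmSelectionPolicy vmSelectionPolicy = null;
        if (vmSelectionPolicyName.equals("mmt")) {
            vmSelectionPolicy = new PowerVmSelectionPolicyMinimumMigrationTime();
        } else if (vmSelectionPolicyName.equals("mu")) {
            vmSelectionPolicy = new PowerVmSelectionPolicyMinimumUtilization();
        } else if (vmSelectionPolicyName.equals("rl")) {
            // New branch: hand back the reinforcement learning selection policy.
            vmSelectionPolicy = new RlSelectionPolicy();
        } else {
            System.out.println("Unknown VM selection policy: " + vmSelectionPolicyName);
            System.exit(0);
        }
        return vmSelectionPolicy;
    }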
8.3 additional classes

The following section describes, at a high level, the additional classes required for the RL framework, a schematic of which can be seen in Fig. 8.1.

8.3.1 Lr-Rl

The LrRl class is located in the package org.cloudbus.cloudsim.examples.power.planetlab. It contains the main method and is where the simulation is instantiated. This class supplies the workload, the names of the selection and allocation policies and the safety parameter to PlanetLabRunner in order to begin the simulation.

8.3.2 RlSelectionPolicy

The RlSelectionPolicy class is located in the package org.cloudbus.cloudsim.power. It overrides the default VM selection policy within CloudSim and also acts as a controller class, conversing between CloudSim, the Environment class and the Agent when necessary.

8.3.3 Environment

The Environment class carries out all functions necessary to accumulate the information required for the Agent to make a decision; for example, the Environment retrieves the state, produces a list of all possible actions in the given state and calculates rewards, all of which are utilised by the Agent class.

8.3.4 Agent

The primary role of the Agent class is to choose a VM for migration by one of two possible methods, either following a softmax policy or an ε-greedy policy. The Agent also contains a "brain", in this case a matrix in which it stores, updates and reads Q-values as required.
8.3.5 Algorithm

The role of the Algorithm class is to implement the requested Q-value estimation learning strategy, in this case Watkins' Q-learning or Rummery and Niranjan's SARSA algorithm.

8.3.6 RlUtilities

The RlUtilities class contains all functions necessary for the accumulation and accurate measurement of the required metrics.

Figure 8.1: The reinforcement learning CloudSim architecture
8.4 summary

This chapter outlined the creation of a RL framework in CloudSim, including the necessary alterations to the existing simulator and the additional classes required to implement an agent based approach to VM selection.
9 I M P L E M E N T A T I O N

In order to develop a RL algorithm in any system, two key areas must be addressed which are specific to the environment in which the algorithm is deployed: the state-action space and the low level implementation of the learning strategy. This chapter addresses both of these issues in relation to an IaaS environment.

9.1 state-action space

RL techniques can suffer from a far-reaching state-action space, which limits the effectiveness and capabilities of a RL agent. Therefore, to incorporate a RL algorithm into an IaaS environment an appropriate state-action range must first be defined. The state space s is defined as the current host utilisation h_u returned as a percentage, which confines the state space to the range 0-100 and is obtained through the following equation, where the virtual machine utilisation vm_u is defined as a migratable VM's utilisation and n is the number of migratable VMs.

$$ s = \frac{\sum_{i=1}^{n} vm_u(i)}{h_u} \cdot 100 \qquad (8) $$

The action space a is represented as the vm_u of VM i relative to its assigned host h, returned as a percentage, which also allows the action space to range from 0-100.

$$ a = \frac{vm_u(i)}{h_u(h)} \cdot 100 \qquad (9) $$
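The following Java sketch shows one way of computing the percentile state and action of Eqs. (8) and (9). It is illustrative only: the inputs (per-VM utilised MIPS and the host figure used as the denominator) are assumptions, and the thesis' actual Environment class may derive them differently.

    // Illustrative only: shows how the percentile state and action of Eqs. (8) and (9)
    // can be derived; vmUtilisedMips and hostMips are hypothetical inputs.
    public class StateActionSpaceExample {

        /** State: total utilisation of the migratable VMs as a percentage of the host figure. */
        static int state(double[] vmUtilisedMips, double hostMips) {
            double sum = 0;
            for (double mips : vmUtilisedMips) {
                sum += mips;
            }
            return (int) Math.min(100, Math.round(sum / hostMips * 100));
        }

        /** Action: a single VM's utilisation as a percentage of its host's utilisation. */
        static int action(double vmUtilisedMips, double hostUtilisedMips) {
            return (int) Math.min(100, Math.round(vmUtilisedMips / hostUtilisedMips * 100));
        }

        public static void main(String[] args) {
            double[] vms = {300, 500, 200};
            System.out.println("state  = " + state(vms, 1000));   // 100
            System.out.println("action = " + action(300, 1000));  // 30
        }
    }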
9.2 q-learning implementation

The first implementation is a Q-learning algorithm, outlined as follows.

Q-learning virtual machine selection algorithm
foreach host in overUtilizedHosts do
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose VM from possibleActions using π
    migrate VM;
    observe hostUtilization at t+1, reward;
    calculate Q:
        Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]
    update Q-map
end

The algorithm is invoked when a host is determined to be overloaded by the LR host overload detection policy; this host is placed on a list of over-utilised hosts, which is forwarded to the VM selection policy, in this case RL. The first host is then selected from the list, its level of utilisation is taken as the state, and all migratable VMs are mapped as possible actions based on the percentage of their load in relation to their host. A VM is then chosen based on the RL selection policy, i.e. ε-greedy or softmax. This VM is placed on a migration list, the host's utilisation level is re-calculated, a scalar reward is attributed, and the Q-value is calculated and stored. If the current host is still deemed to be over-utilised, another VM is chosen in the same manner until the host is no longer overloaded. Once the host is no longer over-utilised, the next host on the over-utilised host list is chosen, until the list is empty.
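A compact, self-contained sketch of the tabular agent implied by this loop is given below. It is not the thesis' Agent class: the learning rate, discount factor and exploration rate are illustrative values, and the caller is assumed to supply the state, the set of possible actions and the reward exactly as described above.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // Minimal tabular Q-learning sketch for the 100x100 percentile state-action space.
    // States and actions are integers in [0, 100]; rewards are supplied by the caller.
    public class QLearningAgent {
        private final double[][] q = new double[101][101];
        private final double alpha = 0.1, gamma = 0.9, epsilon = 0.1;
        private final Random random = new Random();

        /** epsilon-greedy choice among the actions available in this state. */
        public int chooseAction(int state, List<Integer> possibleActions) {
            if (random.nextDouble() < epsilon) {
                return possibleActions.get(random.nextInt(possibleActions.size()));
            }
            return possibleActions.stream()
                    .max(Comparator.comparingDouble(a -> q[state][a]))
                    .orElseThrow();
        }

        /** Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) */
        public void update(int state, int action, double reward, int nextState,
                           List<Integer> nextActions) {
            double maxNext = nextActions.stream()
                    .mapToDouble(a -> q[nextState][a]).max().orElse(0.0);
            q[state][action] += alpha * (reward + gamma * maxNext - q[state][action]);
        }
    }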
9.3 sarsa implementation

As referred to in Section 5.2.2, SARSA requires a quintuple consisting of the values s_t, a_t, r_t, s_{t+1}, a_{t+1} in order to calculate its Q-value. This is where the design of the VM selection algorithms differs. Although both algorithms accept the same input, the order in which they process this input must be altered appropriately. This alteration is evident in the following SARSA algorithm: after observing the new state, i.e. the host utilisation at t+1, and the reward, it does not calculate the Q-value at that point. Instead it obtains a new list of possible actions, in the shape of migratable VMs, for the new state, and then selects the appropriate VM following π. Only now does the algorithm have the information required to calculate Q.

SARSA virtual machine selection algorithm
foreach host in overUtilizedHosts do
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose vmToMigrate from possibleActions using π
    migrate VM;
    observe hostUtilization at t+1, reward;
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose vmToMigrate from possibleActions using π
    calculate Q:
        Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
    update Q-map
end
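The only substantive difference from the Q-learning sketch above is the update target, which uses the Q-value of the next action actually chosen rather than the maximum. A minimal sketch of that update, under the same illustrative assumptions, is:

    // SARSA variant of the update: the next action a' is chosen first (on-policy),
    // and its Q-value, not the maximum, enters the target.
    public class SarsaAgent {
        private final double[][] q = new double[101][101];
        private final double alpha = 0.1, gamma = 0.9;

        /** Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)) */
        public void update(int state, int action, double reward, int nextState, int nextAction) {
            q[state][action] += alpha * (reward + gamma * q[nextState][nextAction] - q[state][action]);
        }
    }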
9.4 summary

This chapter outlined a percentile state-action space to be utilised by the agent. Reducing the space to a 100×100 area limits the state-action space the agent must traverse regardless of the number of nodes in the data center, thereby addressing the so-called "curse of dimensionality" and providing an adaptable and portable agent. The chapter concluded by outlining a low level implementation of two key RL strategies, Q-learning and SARSA.
Part IV

E X P E R I M E N T S
10 E X P E R I M E N T M E T R I C S

The following chapter outlines the key metrics for measuring the performance of the RL algorithm. These metrics were proposed by Beloglazov and are widely adopted in research as a standard measurement of data center performance [12].

10.1 energy consumption

The total energy consumed per day by the data center's computational resources, i.e. servers. Although other energy draws exist, such as cooling and infrastructural demands, these were deemed outside the scope of this research.

10.2 migrations

The total migrations of all VMs, on all servers, performed by the data center. As the agent is trained to carry out intelligent selection of VMs, each migration is important when analysing this research.

10.3 service level agreement metrics

Maintaining a high standard of QOS and SLAs is imperative for a cloud provider. Their importance is highlighted by the three stages of measurement used to accurately report SLA violations.
10.3.1 SLATAH, PDM & SLAV

Service level agreement violation time per active host (SLATAH) is calculated from the time T_{s_i} during which active host i has experienced 100% utilisation of its CPU; at full utilisation the host cannot supply any further processing capacity to the VMs it hosts should they request additional CPU, thus forcing violations. N represents the number of hosts and T_{a_i} the time host i is actively serving VMs.

$$ SLATAH = \frac{1}{N} \sum_{i=1}^{N} \frac{T_{s_i}}{T_{a_i}} \qquad (10) $$

Performance degradation due to migration (PDM) is an estimate of the degradation C_{s_v} caused by migrating VM v, with C_{a_v} representing the total CPU capacity requested by VM v over its lifespan and M the number of VMs.

$$ PDM = \frac{1}{M} \sum_{v=1}^{M} \frac{C_{s_v}}{C_{a_v}} \qquad (11) $$

Due to the equal importance of both SLATAH and PDM, a combined metric, service level agreement violation (SLAV), is used to capture both:

$$ SLAV = SLATAH \cdot PDM \qquad (12) $$

10.4 energy and sla violations

In order to ensure that the implementation of energy saving policies does not negatively affect SLAs, researchers and developers are required to measure the correlated effect. To measure this, a combined metric named Energy and SLA Violations (ESV) is calculated as follows; the lower the overall ESV, the better the performance of the data center.

$$ ESV = Energy \cdot SLAV \qquad (13) $$
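For clarity, the sketch below shows how Eqs. (10)-(13) translate into code. The per-host and per-VM arrays are hypothetical inputs representing measurements gathered during a simulation run; CloudSim's own helpers compute these metrics internally.

    // Sketch of the metric calculations in Eqs. (10)-(13); the input arrays are
    // hypothetical per-host and per-VM measurements gathered during a simulation.
    public class DataCenterMetrics {

        /** SLATAH: mean fraction of active time each host spent at 100% CPU. */
        static double slatah(double[] timeAtFullUtilisation, double[] timeActive) {
            double sum = 0;
            for (int i = 0; i < timeActive.length; i++) {
                sum += timeAtFullUtilisation[i] / timeActive[i];
            }
            return sum / timeActive.length;
        }

        /** PDM: mean fraction of each VM's requested CPU lost to migrations. */
        static double pdm(double[] degradationDueToMigration, double[] totalCpuRequested) {
            double sum = 0;
            for (int i = 0; i < totalCpuRequested.length; i++) {
                sum += degradationDueToMigration[i] / totalCpuRequested[i];
            }
            return sum / totalCpuRequested.length;
        }

        /** SLAV: combined SLA violation metric. */
        static double slav(double slatah, double pdm) { return slatah * pdm; }

        /** ESV: combined energy and SLA violation metric. */
        static double esv(double energyKwh, double slav) { return energyKwh * slav; }
    }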
11 S E L E C T I O N O F P O L I C Y

11.1 experiment details

Whether softmax or ε-greedy action selection is better depends on the task and the environment in which it is deployed, and an intrinsic link exists between the choice of action selection and the performance of the update function (Q-learning or SARSA) due to their mutual dependence on Q [73]. For this reason the following experiment was undertaken to analyse and identify the optimal combination of update and selection policy, as mentioned in Section 2.7. There are four possible update/selection combinations:

• Q-Learning / ε-greedy
• Q-Learning / Softmax
• SARSA / ε-greedy
• SARSA / Softmax

Each combination is analysed using both a 30 day stochastic workload, in order to measure adaptability, and a repeated single workload over 100 iterations, in order to measure convergence rates.
    11.2 results 11.2 results 11.2.1Energy The running of a single workload over multiple iterations not only allows for a perspec- tive on the ability of the agent to learn, but also allows for the identification of speed of convergence to a level of optimum performance for each possible update/selection policy. The optimum energy consumption was determined as the point when the energy con- sumptions passed beneath 140kWh, established from data gather from 100 iterations on all four possible update/selection policies. Fig. 11.1 displays the energy consumption of Q- learning with softmax and ε-greedy action policies. On iteration No.34 ε-greedy converges to the optimal energy barrier, while softmax fails to penetrate sub 140kWh until iteration No.64. Figure 11.1: Energy consumption Q-Learning 100 iterations 78
Fig. 11.2 displays the energy consumption of SARSA with softmax and ε-greedy action policies. On iteration 33 ε-greedy converges to the optimal energy barrier, while softmax again fails to break beneath 140kWh until it finally converges on iteration 85.

Figure 11.2: Energy consumption, SARSA, 100 iterations

The policies that use ε-greedy selection converge quickest to an optimal level of performance. Fig. 11.3 displays a comparison of these policies in relation to convergence time.

Figure 11.3: Q-Learning ε-greedy vs SARSA ε-greedy

As outlined in the previous section, both SARSA and Q-learning converge to an optimal level in quick succession of each other; however, Q-learning remains below 140kWh 22% more often than SARSA for the remainder of the 100 iterations. Past the 50th iteration there is minimal difference in performance: Q-learning produces a slightly lower average consumption per iteration of 140.32kWh against SARSA's 140.74kWh, in line with the deviation level after 50 iterations, with Q-learning at 0.2964 and SARSA at 0.2977.
While multiple iterations of a single workload may highlight the rate of convergence, they do not portray an agent in a real world stochastic cloud environment. For that reason one must also take into account performance when supplied with a disparate workload. Fig. 11.5 and Fig. 11.4 contain the daily average and overall energy consumption over a 30 day period. Again the policies using ε-greedy action selection perform best and, again, there is minimal difference between the performance of Q-learning and SARSA, mirroring the results from the iterative test. Q-learning ε-greedy shows a saving of 5, 16 and 25kWh over SARSA ε-greedy, Q-learning softmax and SARSA softmax respectively.

Figure 11.4: Overall energy consumption, 30 day workload
Figure 11.5: Average daily energy consumption, 30 day workload
11.2.2 Migrations

Each time a VM migrates, the draw on energy increases as the contents of the VM are copied from one server to another. Therefore, reducing the number of migrations reduces the associated energy cost.

Figure 11.6: SARSA migrations, 100 iterations

Following the analysis format of the previous section, each selection combination was given an iterative single workload. Fig. 11.6 displays the migrations selected from over-utilised hosts as chosen by the SARSA combinations, and Fig. 11.7 the migrations selected from over-utilised hosts as chosen by the Q-learning combinations.

Figure 11.7: Q-Learning migrations, 100 iterations

A sizeable difference is noticeable between the two update functions, with SARSA resulting in an average of between 5,713 and 5,864 migrations per iteration, while the Q-learning update function averages between 2,941 and 3,002 migrations per iteration.
The differential in migrations, although considerable, is not a design flaw; rather it is in line with Sutton & Barto's cliff walking example, Fig. 11.8 [73]. Fig. 11.9 shows SARSA accumulating greater reward, similar to cliff walking. This is the result of SARSA's on-policy nature, which takes the action selection policy into account, therefore not letting the agent fall off the cliff, or in this case move an unrewarding machine; rather it learns the safer, more consistent and more rewarding path. In contrast, Q-learning ignores the action selection policy and attempts to converge to the optimum policy, even though on occasion this can cause an agent to fall off the cliff, or move a machine of high cost, resulting in an extreme negative impact on rewards.

Figure 11.8: Accumulated rewards for the cliff walking task [73]
Figure 11.9: Accumulated rewards for migrations
The total average migrations per iteration, that is a combined metric of those selected from both over-utilised and under-utilised hosts, are contained in Fig. 11.10. Again Q-learning outperforms all other possible combinations and closely aligns with the results of the 30 day test shown in Fig. 11.11.

Figure 11.10: Average migrations, 100 iterations
Figure 11.11: Average migrations, 30 day workload
11.2.3 Service Level Agreement Violations

Broken service level agreements can result in a financial penalty for the service provider. Therefore, data center operators continuously strive to minimise violations and maximise performance, customer satisfaction and profit. Fig. 11.12 and Fig. 11.13 display the overall SLA violations for the 100 iteration and 30 day tests.

Figure 11.12: Overall SLA violations, 100 iterations

Once again Q-Learning outperforms all other possible combinations; however, this is the closest of all the simulations, with Q-Learning/ε-greedy outperforming the other combinations by just 0.4-1.4%.

Figure 11.13: Overall SLA violations, 30 days
11.2.4 ESV

The reduction of energy can have a correlated negative effect on SLAs if the method of reducing energy is not chosen carefully. To measure this effect we utilise the ESV metric outlined in Section 10.4. This could be considered the most important metric as it combines SLAV and energy to give a more inclusive view of data center performance; the lower the ESV, the more efficiently the data center is performing. Fig. 11.15 and Fig. 11.14 contain the overall ESV for the iterative and 30 day tests. As expected from the earlier analysis of the energy and SLAV data, Q-Learning/ε-greedy again outperforms the other combinations.

Figure 11.14: Overall ESV, 100 iterations
Figure 11.15: Overall ESV, 30 days
11.3 discussion

The ε-greedy based update/selection policies outperform the softmax based policies in relation to energy consumption and convergence time. The overall energy consumption for a 30 day workload shows a saving ranging from 21kWh to 25kWh. The ε-greedy policies also converge to the sub-140kWh optimum earlier than the softmax based combinations, with Q-Learning/Softmax, the closest rival, converging after a further 30 iterations.

Fig. 11.6 and Fig. 11.7 display the migrations for the SARSA and Q-Learning policies, with SARSA incurring a far greater number of migrations as a result of its on-policy evaluation and the resulting safe approach to VM selection.

As regards SLA violations, Q-Learning/ε-greedy incurs the fewest violations, albeit by a fractional margin of between 0.4-1.4%. However small, the improvement remains important, not only from a fiscal penalty viewpoint, but also because it highlights that the reduction in energy is not having a correlated negative effect on SLA violations.

This is further reinforced by examination of the ESV figures, a metric that, as previously mentioned, provides a more inclusive view of data center performance. Again Q-Learning/ε-greedy records the lowest ESV, outperforming its rivals by between 5-8%.

The Q-Learning/ε-greedy based model consistently outperforms the other selection/update policies in both the 30 day and 100 iteration tests; it is therefore deemed the best policy for this environment and has been chosen as the selection/update policy for the remaining experiments.
12 P O T E N T I A L B A S E D R E W A R D S H A P I N G

12.1 experiment details

Chapter 11 highlighted Q-learning/ε-greedy as the best performing update/selection policy. However, this does not imply that the policy is performing optimally. In general a RL agent learns through trial and error by visiting multiple states and carrying out multiple actions. Such an approach highlights RL's main limitation: its slowness to converge to optimum performance. This experiment introduces the advanced RL technique known as potential based reward shaping (PBRS), as outlined in Section 5.4, as a method of improving the current convergence rate. The PBRS algorithm is analysed against the standard Q-learning/ε-greedy agent developed in the previous section.

PBRS, formally outlined in Equation 14, is an additional reward calculated as the difference between the potential of the original state and that of the resultant state [24].

$$ F(s, s') = \gamma \phi(s') - \phi(s) \qquad (14) $$

This shaping term is then introduced into the standard Q-learning update function as follows:

$$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_t + F(s, s') + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \big] \qquad (15) $$
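A minimal sketch of the shaped update is shown below. The potential function phi() used here (higher potential for lower host utilisation) is an assumption made purely for illustration; the thesis' own potential function may differ, and the learning parameters are placeholder values.

    // Sketch of the shaped update in Eqs. (14)-(15). The potential function phi()
    // is an assumption for illustration: lower host utilisation is given higher potential.
    public class ShapedQLearningAgent {
        private final double[][] q = new double[101][101];
        private final double alpha = 0.1, gamma = 0.9;

        /** Potential of a state (host utilisation in percent); purely illustrative. */
        private double phi(int state) {
            return (100 - state) / 100.0;
        }

        /** F(s, s') = gamma * phi(s') - phi(s), added to the ordinary reward. */
        public void update(int state, int action, double reward, int nextState, double maxNextQ) {
            double shaping = gamma * phi(nextState) - phi(state);
            q[state][action] += alpha * (reward + shaping + gamma * maxNextQ - q[state][action]);
        }
    }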
    12.2 results 12.2 results 12.2.1Energy After a single iteration the standard Q-Learning algorithm consumes 146.14kWh and re- mains 6.14kWh above the optimum level of energy conservation as determined in Chapter 11, while the PBRS algorithm consumed 141.02kWh just 1.02kWh from the optimum level of consumption. It takes a further 10 iterations before the standard algorithm reaches a consumption level at which PBRS began. By this time PBRS has long since broken the sub 140kWh barrier on the 4th iteration. The standard agent continues to learn and it is not until post the 32th iteration before a consistent level of deviation of between 0-1kWh is maintained. Figure 12.1: PBRS vs Q-Learning energy consumption 88
12.2.2 Migrations

The effects of the PBRS based agent are not restricted to energy alone; they ripple through the other metrics, none more so than migrations. After a single iteration the standard agent posts a migration count of 22,243, with the PBRS based agent migrating only 18,021 VMs, 4,042 fewer. In line with the energy data, it is not until the 10th iteration that the standard agent reaches the migration rate at which PBRS began. The migration counts remain disparate until the 38th iteration, from which point the differential consistently remains below 1,000.

Figure 12.2: PBRS vs Q-Learning migrations

12.2.3 Service Level Agreement Violations

The effect of the PBRS based agent on SLA violations mirrors that seen in the previous two sections. The PBRS agent begins at a rate of SLA violations 1.28E-06 lower than that of the standard agent; only after 26 iterations does the standard agent surpass this level, and it is not until iteration 31 that it performs on a par with the PBRS agent.
Figure 12.3: PBRS vs Q-Learning SLAV

12.2.4 ESV

As expected, given the reduced rate of energy consumption and level of SLA violations, the PBRS agent's ESV rating begins 7% lower at 0.004921. On the 11th iteration the standard agent surpasses this level for the first time, and shortly after the 30th iteration it consistently performs on a par with the PBRS agent.

Figure 12.4: PBRS vs Q-Learning ESV
    12.3 discussion 12.3 discussion Theaddition of PBRS to the Q-learning agent has significantly decreased the convergence time and therefore the time spent by the standard agent learning the state-action space, overall the PBRS agent was tested on 10% of the overall workload and portrayed con- sistent results throughout. On all occasions the PBRS converged to a level deemed as optimal in less than 5 iterations, while the standard agent required on average in excess of 21 iterations. The effect was mirrored in relation to migrations with 22% less migrations after a single iteration, with the standard agent taking an average of 10 iterations to reach this level. Similar changes were noticeable in regards to the SLA violations with the standard agent again on average taking over 10 iterations to reach a level where the PBRS agent began. These improved Energy and SLA violations are reflected in the ESV data, with the PBRS agent after a single iteration running at a 7% lower rate than the standard. On average it takes the standard agent another 11 iterations to reach that level, from which the differential between both agents remain steady. 91
13 C O M P A R A T I V E V I E W O F L R - R L V S L R - M M T

13.1 experiment details

Following on from the experiments carried out in Chapters 11 and 12 and the determination that a PBRS Q-Learning/ε-greedy based agent provides optimum performance, this chapter evaluates the algorithm against the leading VM selection policy in the research literature.

Research has previously established that dynamic consolidation algorithms statistically outperform static allocation policies such as DVFS, and that heuristic based dynamic VM consolidation outperforms online deterministic algorithms [12]. The optimal combination of selection-allocation policies was shown to be Lr-Mmt, statistically outperforming multiple disparate algorithms [12]. For that reason Lr-Mmt has been designated as the preeminent algorithm against which to analyse the dynamic virtual machine selection algorithm Lr-Rl. A 30 day stochastic real world workload is provided to both algorithms, with each algorithm subject to analysis under four criteria: energy consumption, service level agreement violations, quantity of virtual machine migrations and ESV.
    13.2 results 13.2 results 13.2.1Energy Fig.13.1 contains the energy consumption data from the experiment, the paired t-test shows that there is a statistically significant difference in the consumption of energy when utilizing Lr-Rl and Lr-Mmt resulting in a P-value <0.0041 with a 95% confidence interval (-7.8685 , -39.8715). As a result over the 30 day period the Lr-Rl algorithm consumes more than 716kWh less energy overall or 23.87kWh less a day. Figure 13.1: Energy consumption for 30 day workload 13.2.2 Migrations The paired t-test shows that there is a significant statistical difference between Lr-Rl and Lr-Mmt resulting in a P-value <0.0001 with a 95% confidence interval (-8,389.133 , -13,620.86) in relation to migrations. The results of the migration data per day are dis- played in Fig.13.2. Through the use of Lr-Rl, migrations over the 30 day period decreases by 330,154 overall or by an average of 11,005 per day. 93
Figure 13.2: Migrations for 30 day workload

13.2.3 Service Level Agreement Violations

When lowering energy usage within a data center, it is imperative to monitor SLA violations, as lowering energy can have a parallel negative effect. For example, one can lower the number of active servers through the extreme consolidation of VMs onto a smaller number of servers; this, however, results in a greater probability of servers reaching 100% CPU utilisation, restricting the VMs' access to computational processing and resulting in violations. The SLA violations are displayed in Fig. 13.3. The results of a t-test show no statistical difference and thus no negative effect on SLAs, with a P-value of 0.2751 and a 95% confidence interval of (1.1365, -3.9669).

Figure 13.3: SLA violations for 30 day workload
13.2.4 ESV

The final evaluation is ESV, the results of which can be seen in Fig. 13.4; again the results reinforce the SLA violation and energy data previously gathered. On carrying out a t-test, Lr-Rl again showed a statistically significant improvement in performance, with a P-value of <0.0001 and a 95% confidence interval of (-0.0021, -0.0037).

Figure 13.4: ESV for 30 day workload
    13.3 discussion 13.3 discussion Inorder to take a closer look at the improved performance of Lr-Rl over Lr-Mmt it is necessary to take a closer look at a single day and the disparities that lie within. On day 21 a saving of 23.02 kWh of energy occurs , with 11,561 less migrations. The average number of migrations for that day in order to reduce an over utilized host to a safe workload for Mmt stood at 2.33 over twice that of RL at 1.06. On occasion Mmt required as much as 12 migrations from a single in order to reach a safe state with Lr never requiring more than 4 migrations for a single host. An explanation of the necessity for extra migrations associated with Mmt can be found in the data of the VMs chosen for migration. On average a VM chosen by Mmt accounts for as little as 3.60% of the host overall utilization and therefore requires multiple mi- grations in order to enter an under utilized state. On the other hand Rl chosen VMs on average accounts for 18.04% of the overall host utilization, therefore when migrated im- mediately moves the host to a under utilized state. The correlation between the reduced amount of migrations and energy reduction of Lr-Rl, measured at industry standard 5 minute intervals for day 21 is shown in Fig. 13.5. Figure 13.5: Energy & Migration Correlation Day 21 96
The difference in the selected VMs' level of utilisation of their host is a major factor in determining the overall number of migrations. Mmt, however, places a further large restriction on its selection of VMs: not only does it not take into account the VM's utilisation level, it also restricts selection to the VMs containing the smallest RAM. As a result, over 79% of the VMs selected for migration are those with a RAM size of 613 MB, regardless of how large or small their workload is. Rl, on the other hand, does not implement such restrictions; by taking into account the more holistic value of utilisation levels from both the host and the VMs, it allows the agent to select VMs across the full spectrum of RAM sizes, as seen in Fig. 13.6.

(a) Lr-Mmt (b) Lr-Rl
Figure 13.6: RAM sizes of virtual machines migrated

This results in Lr-Rl accounting for 716kWh or 15% less energy consumption and 330,154 or 41% fewer migrations, with a 38% reduction in ESV and no statistical difference in service level agreement violations.
Part V

C O N C L U S I O N
14 C O N C L U S I O N

14.1 contributions

Reinforcement learning techniques have been successfully applied to resource allocation for cloud systems prior to this research. However, these operated at server or node level; this research proposed a novel approach which incorporates RL at a lower infrastructural level, in the selection of VMs for migration. Due to its low level of abstraction, the algorithm can be incorporated into multiple cloud infrastructures, including stand alone private, federated and multi-cloud infrastructures.

The high level of CO2 emissions and associated negative environmental effects, along with the increasing cost of and demand for energy from data centers, formed the motivation for this research into the creation of a state of the art, low energy software policy for the selection of VMs for migration in IaaS environments. In order to produce such an algorithm the thesis evolved to answer the following questions.

• Is RL a viable policy for VM selection in the cloud?
• Can advanced RL techniques improve such a policy?
• Can an RL approach outperform the state of the art selection policy?
The experiments carried out in Chapter 11 aim not only to address the first research question but to further the thesis by providing an optimum update/selection policy for the selection of VMs in an IaaS environment. The results align with Sutton and Barto's view that whether softmax or ε-greedy action selection is better depends on the task and the environment in which it is deployed [73]. Fig. 11.3 presents evidence of an agent that consistently learns to reduce energy, and analysis of the results shows that a Q-Learning/ε-greedy based agent consistently outperforms the other update/selection policies across all four metrics.

In Chapter 12 the introduction of the advanced RL technique known as potential based reward shaping further improved the agent based algorithm, addressing one of RL's greatest difficulties, the time to convergence, often referred to as the learning period, and with it the second research question of this thesis. The introduction of PBRS significantly decreased the convergence time and resulted in a direct saving of over 32kWh across the 100 iterations due to the reduced convergence period. Fig. 12.2 highlights the reduction in convergence time: the PBRS agent converged to a level deemed optimal in fewer than 5 iterations, while the standard agent required on average in excess of 30 iterations. This improved performance was seen throughout the data center metrics, with a reduction in migrations and SLA violations and an improvement in the overall ESV. The importance of PBRS when addressing such a complex problem is underlined by Devlin et al.'s finding that its benefits are greatest in complex problem domains where reinforcement learning alone takes a long time to converge and where there is a large difference in performance between the initial policy and the final policy converged to [24]. The benefits of introducing a PBRS based agent are directly in line with the results of many academic papers, including [24] [50] [31] and [92]; however, no academic literature or otherwise could be found in which a PBRS based agent was introduced into a cloud environment as has been done in these experiments.
In Chapter 13 the third research question is addressed: Lr-Rl is compared to the Lr-Mmt selection algorithm. The algorithms are provided with a real world 30 day workload. Lr-Rl accounts for 716kWh or 15% less energy consumption and 330,154 or 41% fewer migrations, with a 38% reduction in ESV and no statistical difference in service level agreement violations. These results show a significant improvement on the work of Beloglazov and the Lr-Mmt algorithm [11].

Research carried out by Yuan, Voorsluys and Liu et al. [97] [85] [48] all highlights the potential savings and improved performance that result directly from the careful selection of VMs for migration and the overall lowering of migrations within a data center. The findings of this thesis add further proof of such a theory, with Fig. 13.5 highlighting the direct correlation between reduced migrations and reduced energy usage.

The RL selection policy is one of many elements in the overall process of data center management. However, to obtain up to a 15% energy reduction in just one specific area goes a long way towards addressing the research of Brown et al. and Koomey et al., who estimate savings of up to 25% through the introduction of energy aware software policies for the management of data centers [14] [42].

The results of RL as a selection policy also add to the possibility of improved performance for many other pieces of research, all of which have developed their own host detection algorithms but have used Mmt as a selection policy, including [28] [33] [53] [97] [27] to name a few.

Viewing the results of Chapter 13 from an environmental viewpoint, an average saving of 23.87kWh per day amounts to a saving of 8,715kWh per year. According to the EPA's calculations this equates to a saving of 5.9 metric tons of CO2, which would require 4.8 acres of mature forest per year to sequester [26].
14.2 future work

Arising from the work presented in this thesis, a number of possibilities exist for future work, such as:

• The extension of testing across a more dispersed cloud topology, such as a cross-data center migration scenario
• The extension of testing in a scaled up testbed
• Further development of the RL framework within CloudSim for optimisation purposes

Such additional research not only adds to the requirement for energy aware management policies highlighted by Koomey and Brown [42] [14], it also furthers the development of CloudSim as a research tool for academia and industry to utilise.
    B I BL I O G R A P H Y [1] David Abramson, Rajkumar Buyya, and Jonathan Giddy. A computational economy for grid computing and its implementation in the nimrod-g resource broker. Future Generation Computer Systems, 18(8):1061–1074, 2002. [2] Mohamed Almorsy, John Grundy, and Ingo M¨uller. An analysis of the cloud com- puting security problem. In Proceedings of APSEC 2010 Cloud Workshop, Sydney, Aus- tralia, 30th Nov, 2010. [3] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010. [4] Raphael M Bahati and Michael A Bauer. Towards adaptive policy-based manage- ment. In Network Operations and Management Symposium (NOMS), 2010 IEEE, pages 511–518. IEEE, 2010. [5] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. ACM SIGOPS Operating Systems Review, 37(5):164–177, 2003. [6] Enda Barrett, Enda Howley, and Jim Duggan. A learning architecture for scheduling workflow applications in the cloud. In Web Services (ECOWS), 2011 Ninth IEEE European Conference on, pages 83–90. IEEE, 2011. [7] Enda Barrett, Enda Howley, and Jim Duggan. Applying reinforcement learning to- wards automating resource allocation and application scalability in the cloud. Con- currency and Computation: Practice and Experience, 25(12):1656–1674, 2013. [8] Luiz Andr´e Barroso and Urs H¨olzle. The case for energy-proportional computing. IEEE computer, 40(12):33–37, 2007. [9] A Barto and RH Crites. Improving elevator performance using reinforcement learn- ing. Advances in neural information processing systems, 8:1017–1023, 1996. 103
    Bibliography [10] Anton Beloglazov,Jemal Abawajy, and Rajkumar Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Generation Computer Systems, 28(5):755–768, 2012. [11] Anton Beloglazov and Rajkumar Buyya. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurrency and Computation: Practice and Experience, 24(13):1397–1420, 2012. [12] Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, Albert Zomaya, et al. A taxonomy and survey of energy-efficient data centers and cloud computing systems. Advances in Computers, 82(2):47–111, 2011. [13] Luca Benini, Alessandro Bogliolo, and Giovanni De Micheli. A survey of design techniques for system-level dynamic power management. Very Large Scale Integra- tion (VLSI) Systems, IEEE Transactions on, 8(3):299–316, 2000. [14] Richard Brown et al. Report to congress on server and data center energy efficiency: Public law 109-431. Lawrence Berkeley National Laboratory, 2008. [15] Rajkumar Buyya, David Abramson, and Jonathan Giddy. A case for economy grid architecture for service oriented grid computing. In Parallel and Distributed Pro- cessing Symposium, International, volume 2, pages 20083a–20083a. IEEE Computer Society, 2001. [16] Rajkumar Buyya, Rajiv Ranjan, and Rodrigo N Calheiros. Modeling and simulation of scalable cloud computing environments and the cloudsim toolkit: Challenges and opportunities. In High Performance Computing & Simulation, 2009. HPCS’09. International Conference on, pages 1–11. IEEE, 2009. [17] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging it platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation computer systems, 25(6):599– 616, 2009. [18] Rodrigo N Calheiros, Rajiv Ranjan, Anton Beloglazov, C´esar AF De Rose, and Rajku- mar Buyya. Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and Experience, 41(1):23–50, 2011. 104
    Bibliography [19] Michael Cardosa,Madhukar R Korupolu, and Aameek Singh. Shares and utilities based power consolidation in virtualized server environments. In Integrated Network Management, 2009. IM’09. IFIP/IEEE International Symposium on, pages 327–334. IEEE, 2009. [20] V Chaudhary, Minsuk Cha, JP Walters, S Guercio, and Steve Gallo. A comparison of virtualization technologies for hpc. In Advanced Information Networking and Ap- plications, 2008. AINA 2008. 22nd International Conference on, pages 861–868. IEEE, 2008. [21] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Chris- tian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines. In Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation-Volume 2, pages 273–286. USENIX Association, 2005. [22] William S Cleveland. Robust locally weighted regression and smoothing scatter- plots. Journal of the American statistical association, 74(368):829–836, 1979. [23] Robert J. Creasy. The origin of the vm/370 time-sharing system. IBM Journal of Research and Development, 25(5):483–490, 1981. [24] Sam Devlin, Daniel Kudenko, and Marek Grze´s. An empirical study of potential- based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems, 14(02):251–278, 2011. [25] Tharam Dillon, Chen Wu, and Elizabeth Chang. Cloud computing: issues and challenges. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pages 27–33. IEEE, 2010. [26] Epa.gov. Calculations and references — clean energy — us epa, 2015. [27] Fahimeh Farahnakian, Pasi Liljeberg, and Juha Plosila. Energy-efficient virtual ma- chines consolidation in cloud data centers using reinforcement learning. In Paral- lel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on, pages 500–507. IEEE, 2014. [28] Fahimeh Farahnakian, Tapio Pahikkala, Pasi Liljeberg, and Juha Plosila. Energy aware consolidation algorithm based on k-nearest neighbor regression for cloud data centers. In Utility and Cloud Computing (UCC), 2013 IEEE/ACM 6th International Conference on, pages 256–259. IEEE, 2013. 105
    Bibliography [29] Ian Foster,Yong Zhao, Ioan Raicu, and Shiyong Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop, 2008. GCE’08, pages 1–10. Ieee, 2008. [30] Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen, and Zhenghu Gong. The char- acteristics of cloud computing. In Parallel Processing Workshops (ICPPW), 2010 39th International Conference on, pages 275–279. IEEE, 2010. [31] Marek Grzes and Daniel Kudenko. Plan-based reward shaping for reinforcement learning. In Intelligent Systems, 2008. IS’08. 4th International IEEE Conference, vol- ume 2, pages 10–22. IEEE, 2008. [32] Steven Hand, Tim Harris, Evangelos Kotsovinos, and Ian Pratt. Controlling the xenoserver open platform. In Open Architectures and Network Programming, 2003 IEEE Conference on, pages 3–11. IEEE, 2003. [33] Abbas Horri, Mohammad Sadegh Mozafari, and Gholamhossein Dastghaibyfard. Novel resource allocation algorithms to performance and energy efficiency in cloud computing. The Journal of Supercomputing, 69(3):1445–1461, 2014. [34] Jinhua Hu, Jianhua Gu, Guofei Sun, and Tianhai Zhao. A scheduling strategy on load balancing of virtual machine resources in cloud computing environment. In Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on, pages 89–96. IEEE, 2010. [35] Yashpalsinh Jadeja and Kirit Modi. Cloud computing-concepts, architecture and challenges. In Computing, Electronics and Electrical Technologies (ICCEET), 2012 Inter- national Conference on, pages 877–880. IEEE, 2012. [36] Gueyoung Jung, Kaustubh R Joshi, Matti A Hiltunen, Richard D Schlichting, and Calton Pu. Generating adaptation policies for multi-tier applications in consoli- dated server environments. In Autonomic Computing, 2008. ICAC’08. International Conference on, pages 23–32. IEEE, 2008. [37] Gueyoung Jung, Kaustubh R Joshi, Matti A Hiltunen, Richard D Schlichting, and Calton Pu. A cost-sensitive adaptation engine for server consolidation of multitier applications. In Middleware 2009, pages 163–183. Springer, 2009. [38] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A survey. Journal of artificial intelligence research, pages 237–285, 1996. 106
    Bibliography [39] Avi Kivity,Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. kvm: the linux virtual machine monitor. In Proceedings of the Linux Symposium, volume 1, pages 225–230, 2007. [40] Nadir Kiyanclar. A survey of virtualization techniques focusing on secure on- demand cluster computing. arXiv preprint cs/0511010, 2005. [41] Jonathan Koomey. Growth in data center electricity use 2005 to 2010. A report by Analytical Press, completed at the request of The New York Times, 2011. [42] Jonathan G Koomey. Estimating total power consumption by servers in the us and the world, 2007. Lawrence Berkeley National Laboratory, Berkeley, CA, available at: http://hightech. lbl. gov/documents/DATA CENTERS/svrpwrusecompletefinal. pdf, 2007. [43] Jonathan G Koomey, Christian Belady, Michael Patterson, Anthony Santos, and Klaus-Dieter Lange. Assessing trends over time in performance, costs, and energy use for servers. Lawrence Berkeley National Laboratory, Stanford University, Microsoft Corporation, and Intel Corporation, Tech. Rep, 2009. [44] Dara Kusic, Jeffrey O Kephart, James E Hanson, Nagarajan Kandasamy, and Guofei Jiang. Power and performance management of virtualized computing environments via lookahead control. Cluster computing, 12(1):1–15, 2009. [45] B. Ellison L. Minas. Energy efficiency for information technology: How to reduce power consumption in servers and data centers. Intel Press, 2009. [46] Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang, and Chia-Ying Tseng. A dynamic resource management with energy saving mechanism for supporting cloud com- puting. International Journal of Grid and Distributed Computing, 6(1):67–76, 2013. [47] Weiwei Lin, Chen Liang, James Z Wang, and Rajkumar Buyya. Bandwidth-aware divisible task scheduling for cloud computing. Software: Practice and Experience, 44(2):163–174, 2014. [48] Haikun Liu, Hai Jin, Cheng-Zhong Xu, and Xiaofei Liao. Performance and energy modeling for live migration of virtual machines. Cluster computing, 16(2):249–264, 2013. [49] Patrick Mannion, Jim Duggan, and Enda Howley. Parallel reinforcement learning with state action space partitioning. In JMLRWorkshop and Conference Proceedings 0:19, 2015 12th European Workshop on Reinforcement Learning. 107
    Bibliography [50] Patrick Mannion,Jim Duggan, and Enda Howley. Learning traffic signal control with advice. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS 2015), 2015. [51] Patrick Mannion, Jim Duggan, and Enda Howley. Parallel learning using heteroge- neous agents. In Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS 2015), 2015. [52] La¨etitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. Reward func- tion and initial values: better choices for accelerated goal-directed reinforcement learning. In Artificial Neural Networks–ICANN 2006, pages 840–849. Springer, 2006. [53] Khushbu Maurya and Richa Sinha. Energy conscious dynamic provisioning of virtual machines using adaptive migration thresholds in cloud data center. Interna- tional Journal of Computer Science and Mobile Computing, pages 74–82, 2013. [54] John McCarthy. Applications of circumscription to formalizing common-sense knowledge. Artificial Intelligence, 28(1):89–116, 1986. [55] Lijun Mei, Wing Kwong Chan, and TH Tse. A tale of clouds: Paradigm comparisons and some thoughts on research issues. In Asia-Pacific Services Computing Conference, 2008. APSCC’08. IEEE, pages 464–469. Ieee, 2008. [56] David Meisner, Brian T Gold, and Thomas F Wenisch. Powernap: eliminating server idle power. ACM SIGARCH Computer Architecture News, 37(1):205–216, 2009. [57] Peter Mell and Tim Grance. The nist definition of cloud computing. Computer Secu- rity Division, Information Technology Laboratory, National Institute of Standards and Technology, 2011. [58] Fereydoun Farrahi Moghaddam, Reza Farrahi Moghaddam, and Mohamed Cheriet. Carbon-aware distributed cloud: multi-level grouping genetic algorithm. Cluster Computing, pages 1–15, 2014. [59] Ripal Nathuji and Karsten Schwan. Virtualpower: coordinated power management in virtualized enterprise systems. In ACM SIGOPS Operating Systems Review, vol- ume 41, pages 265–278. ACM, 2007. [60] Andrew Y Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pages 278–287, 1999. 108
[61] Jason Nieh and Ozgur Can Leonard. Examining VMware. Dr. Dobb's Journal, 25(8):70, 2000.
[62] Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru, and Rajkumar Buyya. A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pages 400–407. IEEE, 2010.
[63] Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.
[64] Jette Randløv and Preben Alstrøm. Learning to drive a bicycle using reinforcement learning and shaping. In ICML, volume 98, pages 463–471, 1998.
[65] Mendel Rosenblum and Tal Garfinkel. Virtual machine monitors: Current technology and future trends. Computer, 38(5):39–47, 2005.
[66] Gavin A Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. University of Cambridge, Department of Engineering, 1994.
[67] Naidila Sadashiv and SM Dilip Kumar. Cluster, grid and cloud computing: A detailed comparison. In Computer Science & Education (ICCSE), 2011 6th International Conference on, pages 477–482. IEEE, 2011.
[68] Yuxiang Shi, Xiaohong Jiang, and Kejiang Ye. An energy-efficient scheme for cloud resource provisioning based on CloudSim. In Cluster Computing (CLUSTER), 2011 IEEE International Conference on, pages 595–599. IEEE, 2011.
[69] Reza Sookhtsaraei, Mirmorsal Madani, and Atena Kavian. A multi objective virtual machine placement method for reduce operational costs in cloud computing by genetic. International Journal of Computer Networks & Communications Security, 2(8), 2014.
[70] Richard S Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, 1988.
[71] Richard S Sutton. Introduction: The challenge of reinforcement learning. In Reinforcement Learning, pages 1–3. Springer, 1992.
[72] Richard S Sutton. Reinforcement learning: Past, present and future. In Simulated Evolution and Learning, pages 195–197. Springer, 1999.
[73] Richard S Sutton and Andrew G Barto. Introduction to Reinforcement Learning. MIT Press, 1998.
[74] Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.
[75] Gerald Tesauro, Nicholas K Jong, Rajarshi Das, and Mohamed N Bennani. On the use of hybrid reinforcement learning for autonomic resource allocation. Cluster Computing, 10(3):287–299, 2007.
[76] Michel Tokic and Günther Palm. Value-difference based exploration: adaptive control between epsilon-greedy and softmax. In KI 2011: Advances in Artificial Intelligence, pages 335–346. Springer, 2011.
[77] Wei-Tek Tsai, Xin Sun, and Janaka Balasooriya. Service-oriented cloud computing architecture. In Information Technology: New Generations (ITNG), 2010 Seventh International Conference on, pages 684–689. IEEE, 2010.
[78] Rich Uhlig, Gil Neiger, Dion Rodgers, Amy L Santoni, Fernando CM Martins, Andrew V Anderson, Steven M Bennett, Alain Kagi, Felix H Leung, and Larry Smith. Intel virtualization technology. Computer, 38(5):48–56, 2005.
[79] Seema Vahora and Ritesh Patel. CloudSim: a survey on VM management techniques. In International Journal of Advanced Research in Computer and Communication Engineering, pages 128–123, 2015.
[80] Vytautas Valancius, Nikolaos Laoutaris, Laurent Massoulié, Christophe Diot, and Pablo Rodriguez. Greening the internet with nano data centers. In Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies, pages 37–48. ACM, 2009.
[81] Akshat Verma, Puneet Ahuja, and Anindya Neogi. pMapper: power and migration cost aware application placement in virtualized systems. In Middleware 2008, pages 243–264. Springer, 2008.
[82] Akshat Verma, Gargi Dasgupta, Tapan Kumar Nayak, Pradipta De, and Ravi Kothari. Server workload analysis for power minimization using consolidation. In Proceedings of the 2009 USENIX Annual Technical Conference, pages 28–28. USENIX Association, 2009.
[83] vmware.com. Paravirtualization, 2014.
[84] vmware.com. Hypervisor performance, 2015.
[85] William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar Buyya. Cost of virtual machine live migration in clouds: A performance evaluation. In Cloud Computing, pages 254–265. Springer, 2009.
[86] Carl A Waldspurger. Memory resource management in VMware ESX Server. ACM SIGOPS Operating Systems Review, 36(SI):181–194, 2002.
[87] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
[88] Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge, England, 1989.
[89] Guiyi Wei, Athanasios V Vasilakos, Yao Zheng, and Naixue Xiong. A game-theoretic method of fair resource allocation for cloud computing services. The Journal of Supercomputing, 54(2):252–269, 2010.
[90] Shimon Whiteson and Peter Stone. Evolutionary function approximation for reinforcement learning. The Journal of Machine Learning Research, 7:877–917, 2006.
[91] Bhathiya Wickremasinghe, Rodrigo N Calheiros, and Rajkumar Buyya. CloudAnalyst: A CloudSim-based visual modeller for analysing cloud computing environments and applications. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pages 446–452. IEEE, 2010.
[92] Eric Wiewiora, Garrison Cottrell, and Charles Elkan. Principled methods for advising reinforcement learning agents. In ICML, pages 792–799, 2003.
[93] www.nskinc.com. cloud-computing-101, 2015.
[94] Xenproject.org. VS15: Video spotlight with Cavium's Larry Wikelius, 2015.
[95] Andrew J. Younge, Robert Henschel, James T. Brown, Gregor von Laszewski, Judy Qiu, and Geoffrey Fox. Analysis of virtualization technologies for high performance computing environments. In IEEE International Conference on Cloud Computing, CLOUD 2011, Washington, DC, USA, 4–9 July 2011, pages 9–16, 2011.
[96] Lamia Youseff, Rich Wolski, Brent Gorda, and Chandra Krintz. Paravirtualization for HPC systems. In Frontiers of High Performance Computing and Networking – ISPA 2006 Workshops, pages 474–486. Springer, 2006.
[97] Jingling Yuan, Xuyang Miao, Lin Li, and Xing Jiang. An online energy saving resource optimization methodology for data center. Journal of Software, 8(8):1875–1880, 2013.
[98] Qi Zhang, Lu Cheng, and Raouf Boutaba. Cloud computing: state-of-the-art and research challenges. Journal of Internet Services and Applications, 1(1):7–18, 2010.
[99] Xiaoyun Zhu, Don Young, Brian J Watson, Zhikui Wang, Jerry Rolia, Sharad Singhal, Bret McKee, Chris Hyser, Daniel Gmach, Rob Gardner, et al. 1000 islands: Integrated capacity and workload management for the next generation data center. In Autonomic Computing, 2008. ICAC'08. International Conference on, pages 172–181. IEEE, 2008.
[100] Dimitrios Zissis and Dimitrios Lekkas. Addressing cloud computing security issues. Future Generation Computer Systems, 28(3):583–592, 2012.