AI Optimisation Approach for Autonomic
Cloud Computing
Kieran Flesk
Submitted in accordance with the requirements for the degree of
Master of Science
Software Design & Development
College of Engineering & Informatics
National University of Ireland, Galway
Research Supervisor: Dr. Enda Howley
August 2015
ABSTRACT
Cloud computing has led to exponential growth in large scale data centers and warehouses, which form the paradigm's substratum layer, Infrastructure as a Service. These large scale server warehouses consume substantial energy, not only to power servers, but also for affiliated processes such as cooling. Dynamic consolidation of virtual machines using live migration, combined with switching idle nodes to sleep mode, allows cloud providers to optimize resource usage and reduce energy consumption. The following research proposes a novel reinforcement learning approach for the selection of virtual machines for migration. Due to its low level of abstraction, the proposed algorithm provides a decision support system which supports efficient and open application deployment, monitoring, and execution across different cloud service providers, and results in lower energy consumption without negatively affecting service level agreements.
ACKNOWLEDGEMENTS
Firstly, I would like to express my sincere gratitude to my supervisor Dr. Enda Howley for the continuous support of my master's study and related research, and for his patience, motivation, and immense knowledge. His guidance helped me immensely in the research and writing of this thesis. I could not have imagined having a better adviser and mentor for my master's.
I would like to thank my family, especially my parents for their unwavering support in my decision to return to education, and my brothers and sister for supporting me throughout the writing of this thesis.
Finally, I would like to thank my fellow researchers and friends, who have all contributed to the final product in one way or another.
DECLARATION
The candidate confirms that the work submitted is his own and that appropriate credit has been given where reference has been made to the work of others.
PUBLICATION
A Reinforcement Learning Decision Support System for the Selection of Virtual Machines
Kieran Flesk, Dr. Enda Howley
Springer Special Edition Journal of Internet Services and Applications
Under Review
CONTENTS
i introduction 15
1 introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1 Motivations and Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
ii literature review 19
2 cloud computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1 Origins of Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.1 Cluster Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.2 Grid Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.3 Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Characteristics of Cloud Computing . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Scalability of Infrastructure . . . . . . . . . . . . . . . . . . . . . 22
2.2.2 Autonomic Resource Control / Elasticity . . . . . . . . . . . . . 22
2.2.3 Service Centric Approach . . . . . . . . . . . . . . . . . . . . . . 23
2.2.4 Omnipresent Network Accessibility . . . . . . . . . . . . . . . . 23
2.2.5 Multi-Tenancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.6 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Cloud Deployment Models . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.1 Private Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.2 Community Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.3 Public Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.4 Hybrid Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Cloud Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 data centers and energy consumption . . . . . . . . . . . . . . . . 29
3.1 Areas of energy consumption . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 Server Consumption . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Power Management Techniques . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.1 Dynamic Component Deactivation . . . . . . . . . . . . . . . . . 31
3.2.2 Dynamic Performance Scaling . . . . . . . . . . . . . . . . . . . 31
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4 virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1 Modern Day Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Levels of Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.1 Full Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.2 Paravirtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.3 Hardware Assisted Virtualization . . . . . . . . . . . . . . . . . 36
4.3 Hypervisors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.1 Xen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2 KVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.3 VMware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1 Agent / Environment Interaction . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Learning Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2.1 Q-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2.2 SARSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.3 Action Selection Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3.1 ε-Greedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3.2 Softmax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4 Reward Shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.4.1 Potential Based Reward Shaping . . . . . . . . . . . . . . . . . . 47
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6 related research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1 Threshold and Non-Threshold Approach . . . . . . . . . . . . . . . . . 50
6.2 Artificial Intelligence Based Approach . . . . . . . . . . . . . . . . . . . 53
6.3 Reinforcement Learning Based Approach . . . . . . . . . . . . . . . . . 55
6.4 Virtual Machine Selection Policies . . . . . . . . . . . . . . . . . . . . . 56
6.4.1 Maximum Correlation . . . . . . . . . . . . . . . . . . . . . . . . 57
6.4.2 Minimum Utilization Policy . . . . . . . . . . . . . . . . . . . . 57
6.4.3 The Random Selection Policy . . . . . . . . . . . . . . . . . . . . 57
6.5 Research Group Context . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
iii methodology 60
7 cloudsim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.2 CloudSim Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.3 Energy Aware Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.3.1 Initialising an Energy Aware Policy . . . . . . . . . . . . . . . . 64
7.3.2 Creating a Selection Policy . . . . . . . . . . . . . . . . . . . . . 64
7.4 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.5 Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8 algorithm development . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.1 Registering a Selection Policy . . . . . . . . . . . . . . . . . . . . . . . . 66
8.2 Recording of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.3 Additional Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.1 Lr-RL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.2 RlSelectionPolicy . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.3 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.4 Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
8.3.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.3.6 RlUtilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
9 implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.1 State-Action Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.2 Q-Learning Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 71
9.3 SARSA Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
iv experiments 74
10 experiment metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10.1 Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10.3 Service Level Agreement Metrics . . . . . . . . . . . . . . . . . . . . . . 75
10.3.1 SLATAH, PDM & SLAV . . . . . . . . . . . . . . . . . . . . . . . 76
10.4 Energy and SLA Violations . . . . . . . . . . . . . . . . . . . . . . . . . 76
11 selection of policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
11.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
11.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
11.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
11.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
11.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 84
11.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
11.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
12 potential based reward shaping . . . . . . . . . . . . . . . . . . . . . 87
12.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
12.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
12.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
12.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
12.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 89
12.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
12.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
13 comparative view of lr-rl vs lr-mmt . . . . . . . . . . . . . . . . . 92
13.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
13.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
13.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
13.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
13.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 94
13.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
13.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
v conclusion 98
14 conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
LIST OF FIGURES
Figure 2.1 Private cloud [93] . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Figure 2.2 Public cloud [93] . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 2.3 Hybrid cloud[93] . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Figure 2.4 High level cloud architecture [15] . . . . . . . . . . . . . . . . . 27
Figure 5.2 PBRS effect [92] . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Figure 7.1 CloudSim class structure [12] . . . . . . . . . . . . . . . . . . . 65
Figure 8.1 The reinforcement learning CloudSim architecture . . . . . . . 68
Figure 11.1 Energy consumption Q-Learning 100 iterations . . . . . . . . 78
Figure 11.2 Energy consumption SARSA 100 iterations . . . . . . . . . . . 79
Figure 11.3 Q-Learning ε-Greedy vs SARSA ε-Greedy . . . . . . . . . . . . 79
Figure 11.4 Overall Energy Consumption 30 Day Workload . . . . . . . . 80
Figure 11.5 Average Daily Energy Consumption 30 Day Workload . . . . 80
Figure 11.6 SARSA migrations 100 Iterations . . . . . . . . . . . . . . . . . 81
Figure 11.7 Q-Learning migrations 100 Iterations . . . . . . . . . . . . . . 81
Figure 11.8 Accumulated rewards for cliff walking task [73] . . . . . . . . 82
Figure 11.9 Accumulated rewards for migrations . . . . . . . . . . . . . . 82
Figure 11.10 Average migrations 100 iterations . . . . . . . . . . . . . . . . 83
Figure 11.11 Average migrations 30 day workload . . . . . . . . . . . . . . 83
Figure 11.12 Overall SLA violations 100 iterations . . . . . . . . . . . . . . . 84
Figure 11.13 Overall SLA violations for 30 days . . . . . . . . . . . . . . . . 84
Figure 11.14 Overall ESV for 100 iterations . . . . . . . . . . . . . . . . . . . 85
Figure 11.15 Overall ESV for 30 days . . . . . . . . . . . . . . . . . . . . . . 85
Figure 12.1 PBRS vs Q-Learning energy consumption . . . . . . . . . . . . 88
Figure 12.2 PBRS vs Q-Learning migrations . . . . . . . . . . . . . . . . . . 89
Figure 12.3 PBRS vs Q-Learning SLAV . . . . . . . . . . . . . . . . . . . . . 90
Figure 12.4 PBRS vs Q-Learning ESV . . . . . . . . . . . . . . . . . . . . . . 90
Figure 13.1 Energy consumption for 30 day workload . . . . . . . . . . . . 93
Figure 13.2 Migrations for 30 day workload . . . . . . . . . . . . . . . . . . 94
Figure 13.3 SLA violations for 30 day workload . . . . . . . . . . . . . . . 94
Figure 13.4 ESV for 30 day workload . . . . . . . . . . . . . . . . . . . . . . 95
Figure 13.5 Energy & Migration Correlation Day 21 . . . . . . . . . . . . . 96
Figure 13.6 RAM sizes of virtual machines migrated . . . . . . . . . . . . . 97
ACRONYMS
SLA Service Level Agreement
API Application Programming Interface
OS Operating system
QOS Quality of service
IT Information technology
IaaS Infrastructure as a service
PaaS Platform as a service
SaaS Software as a service
UPS Uninterruptible power supply
PUE Power Usage Effectiveness
PDU Power distribution unit
DVFS Dynamic voltage frequency scaling
DRAM Dynamic random access memory
SPM Static power management
DPM Dynamic power management
DCD Dynamic component deactivation
DPS Dynamic performance scaling
CTSS Compatible time sharing systems
CP Control program
CMS Conversational monitor system
VMM Virtual machine monitor
ABI Application binary interface
KVM Kernel virtual machine
MMU Memory management unit
TLB Translation lookaside buffer
RL Reinforcement Learning
TD Temporal Difference
PBRS Potential Based Reward Shaping
AI Artificial Intelligence
MDP Markov Decision Process
PM-L Local power management
PM-G Global power management
LNQS Layered queuing network solver
GA Genetic algorithm
MLGGA Multi Layered Grouped genetic algorithm
GGA Grouped genetic algorithm
LR Local regression
MMT Minimum migration time
VM Virtual machine
CPU Central processing unit
MC Maximum Correlation
MU Minimum Utilization
RS Random Selection
MIPS Millions of instructions per second
PDM Performance Degradation Due to Migration
SLATAH Service Level Agreement violation Time per Active Host
SLAV Service Level Agreement Violation
ESV Energy and Service Level Agreement Violation
PC Personal Computer
Part I
INTRODUCTION
1 INTRODUCTION
Cloud computing refers to both the applications delivered as services over the Internet
and the hardware and software systems in the data centers that provide them [3]. Buyya
et al. define cloud computing as a type of parallel and distributed system, consisting
of a collection of interconnected and virtualized computers that are dynamically provi-
sioned and presented as one or more unified computing resources, based on service level
agreements (SLA) established through negotiation between the service provider and cus-
tomer [17]. Regardless of the ever growing heterogeneous nature of cloud platforms and
deployments, this definition still rings true.
Other key cornerstones also remain despite the ever changing landscape. One such cornerstone is the ability of cloud providers to virtualize the key constituents which form the lowest level of the cloud architecture, known as the infrastructure as a service (IaaS) layer, principally large scale data centers. The virtualization of the large scale congregations of nodes typical of modern day data centers into multiple independent virtual machines executing on a single node not only allows for the plasticity of services, but also plays a key role in the close adherence to SLAs and the maximum utilization of resources which underpin the foundations of cloud computing, while providing maximum return on investment for service providers. However, due to the dynamic nature of cloud services and their demands, the static or offline management of these virtual machines (VMs) places significant restrictions on the provision of an ideal cloud service.
The provision of such virtualized environments and services comes at a cost. Studies such as [8][41][42] highlight that the growth of data centers has directly resulted in:
• An increase in energy consumption in the range of billions of kWh since the beginning of the decade.
• An annual increase in CO2 emissions from 42.8 million metric tons in 2007 to 67.9 million metric tons in 2011.
Much of this energy is wasted in idle systems: in typical deployments, server utilization is below 30%, yet idle servers still consume 60% of their peak energy draw. In order to combat such wastage, advanced consolidation policies for under utilized and idle servers are required [56].
Two key findings from the Report to Congress titled Server and Data Center Energy Efficiency 2007 directly address this issue:
"Existing technologies and strategies could reduce typical server energy use by an estimated 25%." [14]
"Assuming state-of-the-art energy efficiency practices are implemented throughout U.S. data centers, this projected energy use can be reduced by up to 55% compared to current efficiency trends" [14]
The following thesis proposes one such state-of-the-art energy efficiency policy.
1.1 motivations and aims
Motivated by these facts and by the success of previous research applying reinforcement learning (RL) as an optimisation technique, the aim of this research is to design, develop, implement and evaluate an RL agent based approach for the selection of VMs for migration in a stochastic IaaS environment, in order to reduce energy consumption.
1.2 research questions
This thesis aims to answer the following research questions:
• Is reinforcement learning a viable approach for virtual machine selection in the cloud?
• Can advanced reinforcement learning techniques improve such a policy?
• Can a reinforcement learning approach outperform the state of the art selection policy?
1.3 thesis structure
The thesis is laid out as follows.
• Chapter 1 contains an introduction that provides an overview of the research topic
and introduces the research questions, motivations and aims.
• Chapters 2-6 contain a literature review covering:
– Cloud Computing
– Data Centers and Energy Consumption
– Virtualization and Hypervisors
– Reinforcement Learning
– Resource Allocation and Selection Methods
• Chapters 7-9 contain the methodology of the thesis, including:
– CloudSim Simulator
– Algorithm Development & Implementation
• Chapters 10-13 contain the experiments carried out:
– The Policy Selection
– Addition of Potential Based Reward Shaping
– Comparative View of Lr-Rl vs Lr-Mmt
• Chapter 14 contains the conclusions and possible areas of future work.
Part II
LITERATURE REVIEW
2 CLOUD COMPUTING
The following chapter contains an in depth review of the most pertinent academic re-
search available in relation to cloud computing, its characteristics, architecture, service
and deployment models.
2.1 origins of cloud computing
There has been a long-standing vision of providing computer services as a utility alongside water, gas, electricity and telephone. To achieve this, individuals and companies must be able to access the services they require on demand, with the scalability and flexibility they
progression towards such a scenario.
2.1.1 Cluster Computing
Originally, super computers led the way in large scale computational tasks in areas such as science, engineering and commerce; eventually, however, more extensive computational power was required to cater for such problems, and from this cluster computing was developed. A cluster is a collection of parallel or distributed computers which are interconnected using high speed networks, often in the form of local area networks [67]. Multiple computers and their resources are combined to function as a single virtual computer, allowing for greater computational power. Each node carries out the same task, and each cluster contains redundant nodes which provide a backup should a utilized node fail. Computers in a cluster can be described as homogeneous, as they use the same operating systems (OS) and hardware.
2.1.2 Grid Computing
Grid computing was originally developed to meet the high computational demands of scientific research. It is a distributed network which couples a wide variety of geographically distributed computational resources such as personal computers (PCs), workstations, clusters, storage systems, data sources, databases, computational kernels and special purpose scientific instruments, and presents them as a unified integrated resource [15]. These grids are commonly established, maintained and owned by large research groups with shared interests. Such an infrastructure requires a complex management system, having to manage multiple global locations, multiple owners, heterogeneous computer networks and hardware, as well as user policies and availability [1].
2.1.3 Cloud Computing
The most recent computing paradigm to progress towards the vision of providing computer services as a utility is cloud computing. A cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources, in order to provide a service underpinned by high levels of quality of service (QOS) and SLAs [17].
Cloud computing has been defined in many different ways; the following is just one of those definitions. Ian Foster et al. describe it as,
"A large scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms and services are delivered on demand to external customers over the Internet."
[29]
2.2 characteristics of cloud computing
The characteristics of cloud computing infrastructures and models are a recurring concept in the literature, highlighted no more so than by Gong et al.'s comprehensive research review [30]. The following section contains a brief explanation of these characteristics.
2.2.1 Scalability of Infrastructure
A key feature of any cloud computing architecture is its ability to scale in accordance with peaks and troughs in customer demand. Such scalability not only allows providers to maintain SLAs, but also allows for the strategic management of data center resources, thus reducing costs.
Scalability can be summarised into two separate categories. Horizontal scalability refers to the ability of a node to access extra processing resources from other nodes within a data center, i.e. multiple nodes working as a single logical node to perform a task, therefore maintaining SLAs and QOS. Vertical scalability refers to the ability to add additional resources to a single node where necessary, such as increasing bandwidth, memory or central processing unit (CPU) utilization [55].
2.2.2 Autonomic Resource Control / Elasticity
The ability of services to be extended or retracted autonomously depending on demand
is a key characteristic of cloud computing [100]. This is also referred to by Zhang et al.
as the ability of self-organization [98]. This elasticity is a key aspect which differentiates
cloud computing from the more rigid grid and cluster computing. It is also a key selling
point as customers are offered the ability to re-size their hardware needs in parallel with
their requirements without the expense of investing in physical resources which may lie
largely redundant for long periods of time.
2.2.3 Service Centric Approach
Cloud computing providers deliver an on-demand service model, delivering services
when and where they are needed. These services are provided in accordance with SLAs and QOS agreed between the consumer and provider prior to the provider obtaining control of the task [98].
2.2.4 Omnipresent Network Accessibility
Services can be accessed via an Internet connection from any location using a range of
heterogeneous devices at any given time [100].
2.2.5 Multi-Tenancy
Multi-tenancy refers to the sharing of cloud resources, including CPU, memory, networks
and applications [2]. Multi-tenancy in cloud computing plays a key role in the financial
viability of providing such a service. Although users share these resources, providers
place a layer of virtualization technology above the hardware layer which allows for customized and partitioned virtual applications. Multi-tenancy is seen as a major aspect of cloud security at all layers of the cloud infrastructure; through partitioning and isolation of virtual resources, providers strive to provide maximum security [2].
2.2.6 Virtualization
Virtualization allows service providers to create multiple instances of virtual machines on
a single server [11]. Each of these virtual machines can run different operating systems
(OS) independent of the underlying OS. The ability of a provider to provide multiple
instances of machines from a single server contributes greatly to the viability of cloud
computing by maximising return on investment. Virtualization is discussed in further
detail in Chapter 4.
2.3 cloud deployment models
There are four types of cloud deployment models commonly referred to in the literature: private, public, hybrid and community. The following section outlines the structure of each model.
2.3.1 Private Cloud
A private cloud is a cloud model that is devoted solely to one organisation. The cloud
infrastructure may be located in-house or elsewhere in a single or multiple data centers.
It may be managed solely by the organisation or by a third party [25]. A private cloud can
offer high security, performance and reliability, however the cost associated with private
clouds is often higher than that of other models [98].
Figure 2.1: Private cloud [93]
2.3.2 Community Cloud
A community cloud is a cloud infrastructure built and shared by multiple organisations
which share common policies, practices and regulations. The underlying infrastructure can be hosted by a third party or by an individual organisation within the community [35].
2.3.3 Public Cloud
A public cloud is where commercial entities offer cloud services to the general public
usually on a pay-per-use model. Public clouds offer the benefit of no upfront capital
expenditure on infrastructure; however, the refinement of services and security found in private clouds is not as extensively available [98]. Examples of such services are Amazon EC2 and Google Compute Engine.
Figure 2.2: Public cloud [93]
2.3.4 Hybrid Cloud
A hybrid cloud combines the facilities of two or more cloud models, typically a private cloud with public or community clouds. Such a model allows a private cloud to be held in-house while certain aspects of the information technology (IT) infrastructure are held on public clouds. Such an infrastructure supplies an organisation with the ability to retain high security and specific optimisation, while maintaining the elasticity provided by public clouds [98].
Figure 2.3: Hybrid cloud [93]
2.4 cloud architecture
Fig. 2.4 shows a high level view of a cloud computing architecture, an architecture which is tightly coupled with what are known as the cloud service models.
At the lowest level is the hardware layer: data centers that hold large volumes of physical servers and associated equipment. On top of the hardware lies the infrastructure layer, which virtualizes the servers held in the data centers on demand by creating multiple instances of virtual machines, virtualizing CPUs, memory, storage and so on. These first two layers combine the necessary elements to provide IaaS to the consumer.
Figure 2.4: High level cloud architecture [15]
The third layer, the platform layer, provides a development, modelling, testing and deployment environment for developers of applications hosted in the cloud [29]. The developers have little or no access to the underlying networks, servers, etc., except for some minor user configuration [57]. This allows for the provision of platform as a service (PaaS) as a cloud service model.
The top layer, known as the application layer, is the user interface of cloud computing, usually supplied via browsers on heterogeneous Internet enabled devices [77]. This layer allows access via a web browser or an application interface to software applications hosted on cloud servers. The consumer has no control of the underlying infrastructure of the cloud or the application's capabilities, except those controls provided by the creator. This cloud service model is referred to as software as a service (SaaS) [57].
2.5 summary
This chapter reviewed cloud computing from a high level viewpoint, covering the origins, characteristics, models and architecture of cloud computing. Key pieces of literature outlined the foundations of cloud computing in grid and cluster computing, and the importance of autonomic resource control, scalability of infrastructure and virtualization in providing a cost effective and adaptable cloud. The chapter concluded with a review of cloud models and architecture in order to convey their everyday real world use and applications.
3 DATA CENTERS AND ENERGY CONSUMPTION
From 2005 to 2010 the worldwide consumption of energy in data centers increased by 56%, and in 2010 data center energy consumption accounted for 1.3% of all worldwide energy consumption. Furthermore, the approximately 6,000 data centers present in America in 2006 incurred $4.5 billion in energy overheads [41]. These figures highlight the extensive consumption of energy in data centers and the necessity for all stakeholders to actively pursue methods by which to reduce consumption, both from an economic and an environmental viewpoint. This chapter examines the most current and relevant research in relation to energy consumption and preservation techniques deployed within large scale data centers.
3.1 areas of energy consumption
For a number of years, researchers and engineers have focused on improving the performance of data centers, and in doing so have improved systems year on year. However, although the performance per watt has increased, the total energy consumption has remained static and in some cases risen [43]. In order to combat excessive consumption of energy it is important to recognize the disparate elements which consume energy within a data center. Servers naturally consume a large proportion of the overall energy intake; however, the associated infrastructural demands are also a major factor when calculating overall costs. These costs are captured by the Power Usage Effectiveness (PUE) metric, which is defined as the ratio between the total energy consumed by a data center and the energy consumed by IT equipment such as servers, networking equipment and disk drives. The PUE factor ranges from as high as 2.0 in legacy data centers to as low as 1.2 in recent state of the art facilities [80]. At a PUE of 2.0, for every kilowatt utilised by IT components another kilowatt is consumed by infrastructure loads such as cooling, fans, pumps, uninterruptible power supplies (UPS) and power distribution units (PDU). In order to remain within the scope of this research, the author solely investigates energy usage in relation to IT components.
3.1.1 Server Consumption
Intel research has shown that the main source of energy consumption in a server remains the CPU; however, it no longer dominates energy consumption as it once did, due to the implementation of energy efficiency and energy saving techniques such as dynamic voltage and frequency scaling (DVFS) [45]. DVFS is a hardware based solution which dynamically adjusts the voltage and frequency of a CPU in accordance with workload demand. The purpose of applying DVFS is to reduce energy consumption by lowering the voltage and frequency levels; however, this can lead to degradation of execution speeds [46]. DVFS is important for energy management at server level as it allows a CPU to run at levels as low as 30%. However, the CPU is the only server component with the ability to perform such a task; disk drives, dynamic random-access memory (DRAM), fans, etc. can only cycle between on, off or idle states, which results in an idle server consuming in excess of 70% of its overall energy draw.
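To make this relationship concrete, the sketch below implements a simple linear server power model of the kind commonly used in energy-aware simulation: an idle host draws a fixed fraction of its peak power and consumption grows linearly with CPU utilisation. The class name, the 250 W peak and the 70% idle fraction are illustrative assumptions, not values taken from the studies cited above.

```java
/**
 * Illustrative linear server power model: an idle host draws a large fraction
 * of its peak power, and consumption grows linearly with CPU utilisation.
 * The 70% idle fraction and 250 W peak are assumed values for illustration.
 */
public class LinearPowerModel {

    private final double maxPowerWatts;   // power draw at 100% utilisation
    private final double idleFraction;    // share of peak power drawn when idle

    public LinearPowerModel(double maxPowerWatts, double idleFraction) {
        this.maxPowerWatts = maxPowerWatts;
        this.idleFraction = idleFraction;
    }

    /** Returns the estimated power draw (watts) for a CPU utilisation in [0, 1]. */
    public double getPower(double utilisation) {
        double idlePower = maxPowerWatts * idleFraction;
        return idlePower + (maxPowerWatts - idlePower) * utilisation;
    }

    public static void main(String[] args) {
        LinearPowerModel host = new LinearPowerModel(250.0, 0.7);
        System.out.printf("Idle: %.1f W, 30%% load: %.1f W, full load: %.1f W%n",
                host.getPower(0.0), host.getPower(0.3), host.getPower(1.0));
    }
}
```

Under these assumed values, a host running at 30% utilisation already draws roughly 80% of its peak power, which illustrates why consolidating load onto fewer hosts and switching the remainder off is attractive.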
3.2 power management techniques
Energy management techniques are incorporated in all aspects of system design. Beloglazov breaks these techniques into two subsections: static power management (SPM) and dynamic power management (DPM). SPM incorporates all design time power management methods, including complex gate and transistor design, power switching in circuits at a logical level and the incorporation of energy optimization techniques at the architecture level [12]. Dynamic power management (DPM) refers to the run-time adaptability of a system in correlation with resource demand. DPM technologies can be further subdivided into two sections: dynamic component deactivation (DCD) and dynamic performance scaling (DPS).
3.2.1 Dynamic Component Deactivation
DCD incorporates the switching of power states for components that do not support DPS techniques such as DVFS. Switching between power states, i.e. active-idle or idle-off, can result in significant energy consumption at the reinitialisation stage should the component be required at a later time; it is therefore necessary to ensure DCD occurs only when the energy saved through deactivation is greater than that accrued during reinitialization [12]. Benini et al. state that to apply DCD techniques, it must be possible to predict the workload [13]. This prediction, and its accuracy, is imperative to the performance of such techniques. Predictions are based on usage of the overall system to date and its possible use in the near future. An example given by Benini et al. is that of a timeout function on a laptop, where the laptop moves from active to idle after a period of time on the presumption that, having been idle for x minutes, it is likely to remain idle for an additional period of time [13]. Predictive policies rely on past data and its correlation to future events. Through the analysis of past performance and demands the system forms both predictive shut down and predictive wake up techniques.
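The timeout heuristic described above can be sketched as a simple decision rule: deactivate only when the component has been idle beyond a fixed timeout and the energy expected to be saved outweighs the reinitialisation cost. The class, method names and parameters below are illustrative assumptions rather than a policy taken from the cited work.

```java
/**
 * Sketch of a timeout-based dynamic component deactivation (DCD) decision:
 * switch a component off only when it has been idle beyond a fixed timeout
 * AND the energy expected to be saved while off outweighs the one-off cost
 * of reinitialising it later. All names and values are illustrative.
 */
public class TimeoutDcdPolicy {

    private final long   idleTimeoutMs;        // observed idle time before acting
    private final double idlePowerWatts;       // draw while idle but powered on
    private final double reinitEnergyJoules;   // one-off cost of waking up again

    public TimeoutDcdPolicy(long idleTimeoutMs, double idlePowerWatts,
                            double reinitEnergyJoules) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.idlePowerWatts = idlePowerWatts;
        this.reinitEnergyJoules = reinitEnergyJoules;
    }

    /**
     * @param idleSoFarMs     how long the component has already been idle
     * @param predictedIdleMs predicted further idle period (from past usage)
     * @return true if deactivating the component is expected to save energy
     */
    public boolean shouldDeactivate(long idleSoFarMs, long predictedIdleMs) {
        if (idleSoFarMs < idleTimeoutMs) {
            return false;                        // not idle long enough yet
        }
        double savedJoules = idlePowerWatts * (predictedIdleMs / 1000.0);
        return savedJoules > reinitEnergyJoules; // only act if saving beats wake-up cost
    }
}
```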
3.2.2 Dynamic Performance Scaling
DPS allows the application of energy saving techniques in hardware components with the ability to alter their frequency and clock speeds, mainly CPUs, when they are not fully utilized. This technique is known as DVFS. In order to save maximum energy, a system requires both frequency scaling, i.e. the ability to alter the clock speed, and voltage scaling. The implementation of such a technique is by no means straightforward: reducing the instruction processing capability reduces throughput and performance, which in turn increases a program's run-time and may not result in maximum energy savings. It is therefore necessary to balance the energy/performance ratio within a system through careful approximation.
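The reason combined voltage and frequency scaling is attractive can be seen from the standard first-order CMOS dynamic power relation, which is not quoted in the sources above but underlies DVFS. With C the switched capacitance, V the supply voltage and f the clock frequency,

P_dynamic ≈ C · V² · f

Since the sustainable clock frequency falls roughly in proportion to the supply voltage, lowering both reduces power roughly cubically while run-time grows only linearly, which is precisely why the energy/performance ratio must be balanced rather than the frequency simply minimised.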
In order to optimize this ratio, three common techniques are implemented. Interval based algorithms harness past system usage data and adjust voltage and frequency in line with predicted future use. Intertask algorithms distinguish the number of tasks in relation to the CPU in real-time systems and allocate resources appropriately; this can become complex in a system with unpredictable heterogeneous workloads. Intratask algorithms look at the data and individual components within a specific program and then provision resources appropriately [12].
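As a minimal illustration of the interval based approach, the sketch below picks a CPU frequency for the next interval from the utilisation observed over the previous one. The thresholds and the frequency table are assumed values for illustration only.

```java
/**
 * Sketch of an interval-based DVFS governor: the CPU utilisation measured
 * over the previous interval drives the frequency chosen for the next one.
 * Threshold values and the frequency table are illustrative assumptions.
 */
public class IntervalDvfsGovernor {

    private static final double[] FREQUENCIES_GHZ = {1.2, 1.8, 2.4, 3.0};

    /** Maps last-interval utilisation (0..1) to a frequency for the next interval. */
    public double nextFrequency(double lastIntervalUtilisation) {
        if (lastIntervalUtilisation < 0.3) {
            return FREQUENCIES_GHZ[0];   // lightly loaded: lowest frequency
        } else if (lastIntervalUtilisation < 0.6) {
            return FREQUENCIES_GHZ[1];
        } else if (lastIntervalUtilisation < 0.85) {
            return FREQUENCIES_GHZ[2];
        }
        return FREQUENCIES_GHZ[3];       // near saturation: full speed
    }
}
```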
3.3 summary
This chapter focused on the area of energy consumption in data centers. Following a brief introductory section highlighting energy consumption on a world scale, Section 3.1 focused on the specific areas of energy consumption within data centers, including tertiary elements such as PDUs and UPSs, and defined the PUE metric. Section 3.1.1 took a closer look at hardware specific consumption, particularly at server level. The chapter closed by reviewing the two key power management techniques, DCD and DPS.
4 VIRTUALIZATION
Although virtualization has seen increased prominence and usage since the early 1990's, it was originally developed as far back as 1964 by IBM as a method to increase the productivity of both the hardware and the user. In the 1960's many engineers, scientists and large scale research groups were using programs to carry out research; however, these programs were resource intensive, requiring the full use of the hardware system and the supervision of a researcher to run and record results.
This led to some pioneering work in areas such as the Compatible Time Sharing System (CTSS) at M.I.T. in the early 1960's [23]. CTSS allowed batch jobs to be run in parallel with users' requests to run programs. This in turn led to the creation of the control program and conversational monitor system (CP/CMS) in 1964, known as a second generation time sharing machine and built on the concepts of the earlier CTSS. CP provided separate computing environments, while CMS allowed for autonomy through sharing, allocation and protection policies [23], similar to the operations carried out by the virtualization layer in modern cloud environments.
4.1 modern day virtualization
In 1998 VMware conquered the task of virtualizing the x86 platform through a combination of binary translation and direct execution on the processor, allowing multiple guest OSs on a single host [83].
Virtualization is the faithful reproduction of an entire architecture in software, which provides the illusion of a real machine to all software running above it [40]. In an era of on-demand computing, the ability to virtualize a single server into multiple instances of virtual machines running separate guest OSs, with secure, reliable access to resources such as I/O devices, memory and storage, has proven imperative to the growth of cloud computing.
Virtualization of a single server into multiple VMs is achieved by placing an extra layer known as a hypervisor directly on top of the hardware and beneath the OS layer. This layer, also known as a virtual machine monitor (VMM), is responsible for providing total mediation between all VMs and the underlying hardware [65]. The VMM allows access to resources held in the infrastructure layer while ensuring isolation of VMs, which improves security and reliability. The VMM also spawns new VMs on demand, migrates VMs to existing or new instances when necessary and applies consolidation techniques by moving VMs away from underutilized hosts and powering these hosts down in order to conserve energy.
4.2 levels of virtualization
Virtualization can be applied using three different techniques: full virtualization, paravirtualization and/or hardware assisted virtualization. All three methods must deal, and have dealt, with the need to alter the privilege levels of an architecture, also referred to in the literature as ring aliasing or ring compression, to allow virtualization to take place. For example, in an x86 architecture there are four levels of privilege; the OS expects to occupy the most privileged level (ring 0) and therefore presumes it has direct access to the host it is placed on. By placing a virtualization layer underneath the OS, the levels of privilege are altered.
4.2.1 Full Virtualization
Full virtualization allows for the complete isolation of a guest OS from the underlying infrastructure. This allows an unmodified OS to run, using a hypervisor to trap and translate privileged instructions on the fly or through the use of binary translation [20]. Although full virtualization can carry high overheads due to the need to trap and translate privileged instructions, it does provide the most secure and isolated environment for VMs.
4.2.2 Paravirtualization
Paravirtualization continues to employ a hypervisor; however, this method requires the hypervisor to alter the kernel of the guest VM. The hypervisor alters the OS calls and replaces them with hypercalls, allowing for direct communication between the guest VM and the hypervisor without the need to process privileged instructions or to perform binary translation, thus decreasing overhead [83]. In doing this, paravirtualization reduces the need for binary translation and therefore significantly simplifies the process of virtualization [96].
4.2.3 Hardware Assisted Virtualization
Hardware assisted virtualization, also referred to in the literature as native virtualization [20], is an alternative method of virtualization that seeks to overcome the limitations of paravirtualization and the overheads that full virtualization incurs through binary translation. Both Intel and AMD support hardware assisted virtualization through the Intel VT and AMD-V virtualization extensions [20]. In order to address the problem of virtualization, and in particular the levels of privilege required for systems to run effectively and efficiently, Intel VT-x, which supports IA-32 processor virtualization, provides two separate forms of operation: the guest runs in VMX non-root operation and the hypervisor runs in VMX root operation, each of which provides four separate levels of privilege. This allows the guest OS to run at its expected ring 0 privilege level and provides the hypervisor with the ability to run at multiple privilege levels. In order to run this configuration, Intel added two extra transitions: from the guest to the hypervisor, known as a VM exit, and from the hypervisor to the guest, known as a VM entry. VM exits and entries are managed via a virtual machine control structure, which is subdivided into two sections, one dealing with the guest state and the other with the host state [78].
4.3 hypervisors
As highlighted in the above literature, in order to implement virtualization there is a need to deploy a hypervisor, commonly referred to as a VMM, on top of the hardware level within the system, also referred to in the literature as the bare-metal level. The hypervisor provides an intermediate layer between VMs and the underlying hardware. This layer allows for the total encapsulation of a VM, which provides stability, security and reliability against bugs or malicious attacks, and enables the mapping and remapping of existing and new VMs. The subsections below review the most commonly deployed hypervisors in data centers today.
4.3.1 Xen
Xen is an open source project whose hypervisor currently powers some of the largest cloud deployments today, such as Amazon Web Services, Google and Rackspace services [94]. Originating in the late 1990's at Cambridge University, the XenoServer computing infrastructure project proposed the creation of a simple hypervisor which allows users to run their own OS, with the added capability of running specifically designed applications directly on top of the hypervisor to improve performance and to support a substantial number of disparate guest OSs [32].
In 2002 Xen was released as an open source project and it has since seen four major updates. Xen put forward a paravirtualized architecture, citing the complexity of full virtualization as a major and unwelcome cost. The Xen developers believed that hiding virtualization from the guest OS risked correctness and performance, and that paravirtualization was necessary to obtain high performance, robustness and isolation [5]. In order to do this the hypervisor must cater for all standard application binary interfaces (ABI) and support a full range of OSs.
In 2005 Xen, in conjunction with Cambridge and Copenhagen Universities, introduced the design and implementation of live migration of VMs. This was a major step forward in hypervisor efficiency. Live migration could be completed with downtimes as low as 60 ms; it allowed for the decommissioning of the original VM once the transfer was complete, allowed media services and the like to be transferred without the need for users to reconnect, and allowed the VM to be transferred as a single unit, eliminating the need for the hypervisor to have knowledge of the individual applications within the VM. This further progressed the maintainability of data centers by further improving the ability to perform dynamic consolidation of VMs [21].
Today, Xen offers a large range of virtualization solutions for multiple architectures, including ARM and x86. It also provides the capability to virtualize a large range of OSs, including Linux, Solaris and Windows, through the use of full hardware assisted virtualization.
4.3.2 KVM
The kernel virtual machine (KVM) originated in 2006 as an open source project. KVM requires the Intel VT-x or AMD-V instruction sets to run, both of which were also made available in 2006. A KVM hypervisor allows for up to 16 virtual CPUs running full virtualization methods [95]. KVM leverages the hardware extensions provided by Intel and AMD to add a hypervisor to a Linux environment. Once this hypervisor is added to the environment, it also adds a /dev/kvm device node which allows users to create virtual machines, read and write to virtual CPUs, run a virtual CPU, inject interrupts and allocate memory via a memory management unit (MMU) for the translation of virtual addresses to physical addresses. This MMU consists of a page table which encodes the mapping of virtual addresses to physical addresses, a notification manager for page faults, and a translation lookaside buffer (TLB) and instruction set, all located on the chip to decrease table look-up time [39].
4.3.3 VMware
VMware is a hypervisor which is the result of research carried out at Stanford University [61]. In 1998 VMware built on this research and virtualized the x86 architecture through binary translation and direct processor execution [84]. The implementation of full binary translation allowed VMware to deploy full virtualization of its platform, as well as the ability of its guest VMs to host a range of OSs including Linux and Windows.
Originally VMware offered VMware Workstation, deployed as a hosted architecture which placed a virtualization layer as an application on the host OS. More recently, VMware ESX uses a hypervisor layer placed on bare metal, significantly increasing I/O performance [86].
Similar to Xen 3.0.1 and KVM, it utilises a data structure to track the translation of virtual pages to physical memory pages; shadow pages are kept in sequence with the pmap structure for the processor in order to minimise overheads. VMware DRS monitors VMs within a data center; by leveraging VMotion, which allows for live migration, and VM schedulers, it allocates and reallocates VMs as necessary. VMware HA monitors hosts for failures, allows for rapid redeployment of VMs from a failed host when necessary, and ensures that the storage required to facilitate this redeployment is available at all times within a cluster [86].
4.4 summary
This chapter reviewed the area of virtualization, beginning with the early stages initiated by IBM in the 1960's and continuing through to modern day virtualization. The second half of the chapter reviewed the different levels of and methods used in implementing virtualization, with the chapter coming to a close by examining the three most commonly deployed hypervisors today.
5 REINFORCEMENT LEARNING
Reinforcement Learning (RL) dates back to the early days of cybernetics and work in statistics, psychology, neuroscience and computer science [38]. From a purely computer science viewpoint, RL is a type of machine learning, where machine learning is viewed as the ability of computer programs to automatically improve through experience.
RL has been an area of research since the late 1950's, when Samuel first applied temporal difference (TD) methods in order to manage action values. Some years later, in 1961, Minsky is credited with developing the term RL [71]; however, it was the development of value functions and their mathematical characterization in the form of the Markov decision process (MDP) in the mid 1980's that helped propel its popularity as an artificial intelligence (AI) approach to problem solving [72]. The successful application of RL to disparate tasks, such as Tesauro's TD-Gammon or Barto's work on improving elevator performance through the use of RL and neural networks, has also elevated its appeal to researchers in recent times [74] [9].
This AI approach offers more flexibility than many of its counterparts, and this is a key part of what differentiates it from other forms of machine learning, including supervised and unsupervised learning. By this we mean that actions can be low-level non-critical decisions or high-level strategic methods; boundaries between an agent and its environment are not rigidly defined and can adapt to suit the given workspace or problem; and the time steps involved need not be chronological, but can be stage or task related to suit the problem domain.
5.1 agent / environment interaction
Within an RL framework the learner is commonly referred to as an agent, with everything outside of the agent referred to as its environment. Through a cyclical process of state-action-reward at discrete time steps, the agent learns an optimum policy.
As an agent progresses through the state space, its current action generally affects not only the immediate reward received, but also the probability of maximising future rewards. Therefore an "optimal action" must take into account not only the immediate reward but also the possible future reward when deciding which action to take, commonly referred to as delayed reward.
RL delayed reward problems are commonly modelled as MDPs. An MDP is a mathematical structure for the modelling of decisions under uncertainty, represented as a 4-tuple (S, A, T, R) [7]:

S - The state space, which in a reinforcement learning framework is referred to as the environment state.

A - The action space, representative of all possible actions available in a given state.

T - The transition function, giving the probability P that taking action a in state s will result in state s', defined as:

P^a_{ss'} = Pr{ s_{t+1} = s' | s_t = s, a_t = a }   (1)

R - The reward function: given any current state s and action a, together with any next state s', the expected value of the next reward is

R^a_{ss'} = E{ r_{t+1} | s_t = s, a_t = a, s_{t+1} = s' }   (2)
Therefore, we can view reinforcement learning as the ability to map states to actions in order to maximize a numerical reward. In order to achieve this, a recurring interaction at discrete time steps between the agent and environment is necessary, as laid out in Fig. 5.1. The agent receives a representation of the environment in the form of a state s_t. This allows the agent to select and return an action a_t, based on the agent's policy. At the beginning of the next time step the environment returns a new representation of the current state, s_{t+1}, and a numerical reward, r_{t+1}, based on the previous action a_t.
Figure 5.1: The agent-environment interaction in RL [73]
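The loop of Fig. 5.1 can be expressed directly in code. The sketch below is a generic agent-environment loop assuming integer state and action identifiers; the Environment and Agent interfaces here are illustrative only and are not the classes of the same names developed later in this thesis.

```java
/** Illustrative agent-environment interaction loop (cf. Fig. 5.1). */
interface Environment {
    int currentState();                 // returns the current state s_t
    double executeAction(int action);   // applies a_t, returns the reward r_{t+1}
    boolean isTerminal();
}

interface Agent {
    int selectAction(int state);        // chooses a_t according to the policy pi
    void observe(int state, int action, double reward, int nextState); // learning update
}

public class InteractionLoop {
    public static void run(Environment env, Agent agent) {
        while (!env.isTerminal()) {
            int state = env.currentState();            // environment presents s_t
            int action = agent.selectAction(state);    // agent chooses a_t via its policy
            double reward = env.executeAction(action); // environment returns r_{t+1}
            int nextState = env.currentState();        // and the new state s_{t+1}
            agent.observe(state, action, reward, nextState);
        }
    }
}
```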
5.2 learning strategies
5.2 learning strategies
Traditional learning strategies, commonly referred to as update functions, assign rewards at the end of a task via the relation of actual and predicted outcomes; however, such methods have proved to be resource intensive as regards memory, and can also be viewed as a static approach, unsuitable for more transformative problem domains. A more suitable learning strategy, known as temporal difference learning, provides a collection of methods for incremental learning specialized in the area of prediction problems [70]. Temporal difference methods do not require a full model of an environment in order to learn; rather, they update estimates based in part on previously learned estimates [87], without waiting for a final outcome, which is often referred to as bootstrapping [73].
A discount factor γ, which may range from 0 to 1, determines the importance of future rewards: a factor closer to 0 restricts the agent to considering only short-term rewards, while a value closer to 1 allows the agent to strive towards a greater long term reward. The learning rate α establishes the rate at which new information overrides old. A learning rate of 1 ensures that only the most recent information obtained is utilised, while a learning rate of 0 means no learning will take place.
5.2.1 Q-Learning
Q-learning is a form of model free TD learning proposed by Watkins [88]. Q-learning learns on an incremental basis, calculating a Q-value at each discrete time step as the estimated value of taking action a and thereafter following an optimal policy π. Q-learning maps these state-action transitions at each non-terminal discrete time step through the following update rule:

Q(s_t, a_t) ← Q(s_t, a_t) + α[ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]   (3)

A single iteration updates a single Q-value, combining the current reward r_{t+1} with the discounted estimate of future reward γ max_a Q(s_{t+1}, a), allowing progression towards an optimal policy π. The general form of the Q-learning algorithm is as follows:
Q-learning Algorithm
Initialize Q(s, a) arbitrarily; choose a policy π (e.g. ε-greedy)
Repeat (while s_t is not terminal):
    Observe s_t
    Select a_t using π
    Execute a_t
    Observe s_{t+1}, r_{t+1}
    Update Q(s_t, a_t) ← Q(s_t, a_t) + α[ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
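A compact tabular implementation of the update in Equation 3 might look as follows. The map-based Q-table, the key encoding and the parameter values (α = 0.1, γ = 0.9) are illustrative assumptions, not the Lr-RL implementation described later in this thesis.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal tabular Q-learning update (Equation 3). Parameter values are illustrative. */
public class QLearner {

    private final Map<String, Double> qTable = new HashMap<>(); // key: "state|action"
    private final double alpha = 0.1;   // learning rate
    private final double gamma = 0.9;   // discount factor

    private String key(int state, int action) {
        return state + "|" + action;
    }

    public double getQ(int state, int action) {
        return qTable.getOrDefault(key(state, action), 0.0);
    }

    /** Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_a' Q(s',a') - Q(s,a) ] */
    public void update(int state, int action, double reward, int nextState, int numActions) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < numActions; a++) {          // max over actions in s'
            maxNext = Math.max(maxNext, getQ(nextState, a));
        }
        double oldQ = getQ(state, action);
        double newQ = oldQ + alpha * (reward + gamma * maxNext - oldQ);
        qTable.put(key(state, action), newQ);
    }
}
```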
5.2.2 SARSA
The modified connectionist Q-learning algorithm, more commonly known as SARSA, was introduced by Rummery & Niranjan [66]. They question whether the use of γ max_a Q(s_{t+1}, a) provides an accurate estimate of a given state, particularly in large scale real world applications, and believe that for optimal performance γ must return to 0 for each non policy-derived action. To counteract this they proposed the following update function, now known as SARSA:
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \,]$ (4)
Rather than utilising γ max_a Q(s_{t+1}, a), they use a second state-action transition, Q(s_{t+1}, a_{t+1}), in the calculation of a given Q-value, thus negating the need to return γ to 0 for non policy-derived actions. SARSA, a name derived from the fact that it requires a quintuple of events (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}) in order to calculate its Q-values, is viewed as an on-policy method, as it takes into account the control policy by which the agent is moving and incorporates that into its update of action values. In comparison, Q-learning is viewed as an off-policy method, as it simply assumes that an optimal policy is being followed. The general form of the SARSA control algorithm is as follows:
SARSA Algorithm
Initialise the Q-map arbitrarily, set policy π
Repeat (while s is not terminal):
    Observe s_t
    Select a_t using π
    Execute a_t
    Observe s_{t+1}, r_{t+1}
    Select a_{t+1} using π
    Update Q: Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
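For comparison, the SARSA update replaces the max over next-state actions with the value of the action a_{t+1} actually chosen under π. A minimal sketch, reusing the hypothetical Q-value table from the previous listing:

// Equation (4): Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]
// Unlike Q-learning, the next action aNext is the one actually selected by the policy.
public static void sarsaUpdate(double[][] q, int s, int a, double reward,
                               int sNext, int aNext, double alpha, double gamma) {
    q[s][a] += alpha * (reward + gamma * q[sNext][aNext] - q[s][a]);
}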
5.3 action selection policy
One element of RL not shared with other machine learning techniques is that of exploration vs. exploitation. In order to learn a truly optimal policy an agent must explore all possible states and experience taking all possible actions, while on the other hand, in order to exploit an optimal policy and its associated rewards, an agent must follow the optimal policy. Commonly referred to as the dilemma of exploration and exploitation [73], it can have a great impact on an agent's ability to learn [76]. An agent that always exploits the best action of any given state, as predefined in a state model, is said to be following a greedy selection policy; however, such an implementation never explores, thus paying no regard to possibly more lucrative alternative actions.
5.3.1 ε-Greedy
An alternative selection policy is known as ε-greedy. This method introduces a parameter epsilon (ε) which controls the rate of exploration. Epsilon is set to a desired probability and at each time step is compared to a randomly drawn number; should the random number fall below ε, a random action is chosen, thereby providing an element of exploration. As an agent converges closer to an optimum policy, epsilon may be reduced to reflect the lowered need for exploration.
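A minimal sketch of ε-greedy selection over one row of Q-values; decaying ε as the agent converges is left to the caller, and the names are illustrative.

import java.util.Random;

public class EpsilonGreedy {
    private final Random rng = new Random();

    // Explore with probability epsilon, otherwise exploit the highest-valued action.
    public int select(double[] qRow, double epsilon) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qRow.length);       // exploration: uniform random action
        }
        int best = 0;
        for (int a = 1; a < qRow.length; a++) {    // exploitation: greedy action
            if (qRow[a] > qRow[best]) {
                best = a;
            }
        }
        return best;
    }
}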
5.3.2 Softmax
ε-greedy remains a popular method for providing an exploration allowance; however, a drawback is the equal probability of choosing the worst or the best action when exploring. An alternative which goes some way towards addressing this issue is known as the Softmax action selection policy. When used in an RL paradigm, an action's probability of selection is a function of its estimated value, increasing the probability of higher-value actions being chosen [90]. Softmax action probabilities are commonly obtained via the Gibbs distribution; however, estimates can be calculated in many different ways, often dependent on the underlying schema of the system in which an agent is deployed. Similarly, the benefit of Softmax over ε-greedy is not clear-cut, as it too largely depends on the environment in which they are applied [73].
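A sketch of softmax selection via the Gibbs (Boltzmann) distribution; the temperature parameter tau, which controls how strongly higher-valued actions are favoured, is an assumption of this example.

import java.util.Random;

public class SoftmaxSelection {
    private final Random rng = new Random();

    // P(a) = exp(Q(a) / tau) / sum_b exp(Q(b) / tau); sample an action from this distribution.
    public int select(double[] qRow, double tau) {
        double[] weights = new double[qRow.length];
        double total = 0.0;
        for (int a = 0; a < qRow.length; a++) {
            weights[a] = Math.exp(qRow[a] / tau);
            total += weights[a];
        }
        double roll = rng.nextDouble() * total;
        double cumulative = 0.0;
        for (int a = 0; a < qRow.length; a++) {
            cumulative += weights[a];
            if (roll <= cumulative) {
                return a;
            }
        }
        return qRow.length - 1;                    // fallback for floating point rounding
    }
}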
5.4 reward shaping
One of the main limitations of RL is its slowness in converging to an optimum policy [52]. In an RL framework, value functions, otherwise referred to as Q-values, are traditionally initialised with either pessimistic, optimistic or random values [24]. These methods tend to overlook the fact that in real world applications a developer may hold key domain expert knowledge that, if incorporated, can help an agent-based system converge to a level of optimum performance at a much quicker rate. The leveraging of such knowledge is known as knowledge-based reinforcement learning.
One such approach is known as reward shaping: the introduction of a domain-expert-designed reward in addition to the natural system reward. Due to the intrinsic relationship of rewards, states and actions, the accurate shaping of rewards is vital to the overall effectiveness of an agent. Poorly designed reward shaping can not only delay convergence to an optimal policy, but can in fact be detrimental to learning, as seen by Randlov and Alstrom, where an agent learning to ride a bike actively pursued a path away from the goal because the cumulative reward for correcting its orientation was greater than that for reaching the goal [64].
5.4.1 Potential Based Reward Shaping
Ng et al. [60] introduced potential based reward shaping (PBRS) in order to optimize the method of shaping rewards and, in turn, prevent the problems highlighted by the Randlov and Alstrom study [64]. The potential based reward is calculated as the difference in potential between the current state s and the next state s', and is formally defined as
$F(s, s') = \gamma\,\Phi(s') - \Phi(s)$ (5)
This research has proven that applying PBRS with a single RL-based agent, in both finite and infinite state spaces, does not alter the agent's optimal policy but does decrease the convergence time significantly.
This is best seen in Fig. 5.2, taken from Wiewiora et al., which illustrates the convergence of a PBRS-based agent against that of a non-PBRS-based agent on the well known RL problem known as mountain car [92]. It is clearly visible that the PBRS-based agent begins much closer to the optimal policy, greatly outperforming the standard agent-based model.
Figure 5.2: PBRS effect [92]
5.5 summary
This chapter focuses on the A.I approach known as reinforcement learning. Section 5.1
looks at the agent-environment interaction and the modelling of RL delayed rewards as
MDPs. In Section 5.2 temporal difference learning strategies including Q-learning and
SARSA are explored in detail. This is followed by the explanation and evaluation of the
action selection policies ε-greedy and softmax. The chapter concludes by surveying the
advanced RL technique known as PBRS.
6
R E L A T E D   R E S E A R C H
Cloud computing leverages the ability to virtualize the key constituents which form the lowest level of the cloud architecture, IaaS, principally large scale data centers. Virtualizing the large congregations of nodes typical of modern day data centers into multiple independent virtual machines executing on a single node not only allows for the elasticity of services, but plays a key role in the high-level adherence to SLAs and the maximum utilization of resources which underpin the foundations of cloud computing, while providing maximum return on investment for service providers. However, due to the dynamic nature of cloud services and their demands, the static or offline management of these VMs places significant restrictions on all of these key principles.
In recent years much research has been undertaken focusing on combining the areas
of energy efficiency and dynamic resource selection and allocation policies. This research
can be categorized into the following three sections.
• Threshold and Non-Threshold Approach
• Artificial Intelligence Based Approach
• Reinforcement Learning Approach
6.1 threshold and non-threshold approach
The main area of research concentrates at the machine or host level. For example, Nathuji and Schwan proposed a VirtualPower architecture which implements the Xen hypervisor with minimal alterations to the hypervisor [59]. Each host contains a local power management module (PM-L) residing locally as a controller in the driver domain, known as a Dom0 module. When a guest OS attempts to make power management decisions, these calls are trapped by the hypervisor due to their privilege levels; the VirtualPower package then passes these trapped calls to the PM-L, where decisions on power management can be made based on the VirtualPower management rules contained in the Dom0 controller. However, while this research addresses local policies, it fails to address global policies for their suggested global power management (PM-G) module.
Kuysic et al. introduced a proactive look-ahead control algorithm [44]. The algorithm, known as LLC, proposes to minimize CPU power usage and SLA violations while maximising providers' profits. It uses a Kalman filter, a quadratic estimation algorithm, to estimate workload arrivals and supply VMs accordingly. This approach requires a complex learning-based structure in order to predict incomes, which in turn increases computational overhead. The research conclusions highlight this complexity as a serious issue, especially when dealing with discrete input values with exponential increases in worst-case complexity, where the increase in control options accrues a large increase in the computational time required by the LLC controller: a data center with 15 hosts required 30 minutes of execution time, which would be unrealistic for implementation in large scale data centers.
Cordosa et al. proposed leveraging existing parameters within the Xen and VMware packages to alter the manner in which VMs contend for power regardless of workload priority [19]. Parameters provided by the Xen and VMware hypervisors, such as min, allow for the allocation of a minimum amount of resources to any given VM; the max parameter allows the maximum resources applied to any given VM to be set; while the shares parameter allows a developer to set the ratio of CPU allocation between high and low priority VMs. By allocating high levels of minimum resources to high priority VMs and limiting the allocation to low priority VMs, they hope to improve overall performance. The authors carried out their experiments using VMware ESX servers; however, the min, max and shares thresholds were designated prior to run-time, i.e. statically, with no alternative for dynamic adjustment during run-time, thus limiting the suitability of the research for implementation in a real world cloud data center due to the heterogeneous nature of evolving applications. The research assumes that pre-existing maps of SLA agreements exist and uses these as input parameters, but fails to outline the number of SLA violations that result from applying the approach outlined.
Verma et al. implemented a power-aware application placement framework called pMapper, designed to utilize power management capabilities such as CPU idling, DVFS and consolidation techniques that already exist in hypervisors such as Xen [81]. These techniques are leveraged via separate modules: the performance manager, which has a global overview of the system and receives information such as SLAs and QOS parameters; the migration manager, which deals directly with the VMs to implement live migration; the power manager, which communicates with the infrastructure layer to manage hardware energy policies; and finally the arbitrator, which decides on information supplied from the above mentioned policies for the optimal placement of VMs through a bin packing algorithm. At implementation stage pMapper was utilised to solve a cost minimization problem which considers power-migration cost and, similar to Cordosa et al., fails to address SLA violations [19].
In additional research, Verma et al. suggest that server consolidation can be viewed in three forms [82]. The first is static, where VMs or applications are placed on servers for an extended period such as months or years; the second is semi-static, for daily and weekly usage; and the third is dynamic, for VMs and applications with execution times ranging from minutes to hours. The authors highlight that tools currently exist to manage such structures, but are rarely used, and administrators often prefer to wait for offline migration to decide on placements. Although the paper highlights three forms of consolidation, it deals only with the static and semi-static forms and, much like the research of Cordosa et al., this limits the suitability of the research for implementation in a real world cloud data center due to the heterogeneous nature of evolving applications.
Jung et al. propose a hybrid system of online/offline collaboration, analysing data based on system behaviour and workloads online in order to feed a decision tree structure offline [36]. This approach allows for the modelling of large scale, complex configuration problems and reduces overheads by removing the decision model from the run-time environment. These models are used as the basis for creating online multi-tier queues in an attempt to
reach peak utilization. This research was furthered in 2009, when Jung et al. created a middleware for cost-sensitive adaptation and server consolidation, utilising the multi-tier queues developed in their earlier study and applying a best-first search graph algorithm with cost-to-go as the transition costs, together with a layered queuing network solver (LNQS) predictive modelling package [37]. However, this was modelled solely on a single web application and, similar to Cordosa and Verma, this limits the suitability of the research for implementation in a real world cloud data center.
Threshold-based approaches to the autonomic scaling of resources are more commonplace, with cloud providers such as Amazon EC2, through its Auto Scaling software, and RightScale implementing such policies. Threshold-based approaches are based on the premise of setting upper and lower bound thresholds that, when broken, trigger the allocation or consolidation of resources as necessary.
Research carried out in the area of threshold based approaches includes a proposed architecture known as "the 1000 island solution architecture" by Zhu et al. [99]. Similar to Verma, they consider three separate application categories based on time periods, and dedicate an individual controller to each category. The largest timescale is hours to days, the second is minutes and the final one is seconds. Each group is regarded as a pod and has a node controller managing the dynamic allocation of the node's resources; within the node controller lies a utilization controller, which computes resource consumption and estimates the future consumption required in order to meet the SLA. This information is passed to a global arbitrator module which decides the overall allocation of resources. The arbitrator module associates individual workloads with priority levels in order to schedule work appropriately, with high priority work getting first allocation of resources. The pod controller monitors node utilization levels, setting 85% CPU utilisation as an upper threshold and 50% as a lower threshold, and using this information it then migrates VMs as necessary. The pod set controller studies historic demands and estimates future demands using an optimization heuristic approach to formulate policies. Although the results of the experiments in this research are positive, the authors highlight the need to scale up the size of the test bed to realistically evaluate its strength in a real world application; to the best of our knowledge this has not yet been achieved.
6.2 artificial intelligence based approach
John McCarthy defines AI as the science and engineering of making intelligent machines,
especially intelligent computer programs [54]. This ability to make intelligent computer
programs forms the basis for the following concepts, in which researchers apply a range
of AI approaches as a tool for the optimization of resource allocation in a cloud environ-
ment.
One such example is that of Hu et al. who considers a genetic algorithm (GA) approach
to the scheduling of resources, in particular VMs [34]. Utilizing a GA in conjunction with
historic performance data, Hu attempts to predict the effect of multiple possible sched-
ules in advance of any deployment in order to apply the best load balance.
Wei et al. deploy a similar, game-theoretic approach to resource optimization, scheduling resources through a two-step cost-time optimization algorithm [89]. Each agent first solves its own problem optimally, independently of the others; an evolutionary optimization algorithm then takes this information, collates the data, estimates an approximate optimal solution and donates resources as necessary.
Particle swarm optimization, a concept first introduced by Kennedy [63], was deployed by Pandy in 2010 to optimise the mapping of workflow to resources in a cloud environment [62]. Each particle represents a mapping of resources to tasks in a five-dimensional space, i.e. each particle has five jobs; these particles are released into the search space mapping their best locations, in this case the best task-to-resource allocation, in order to determine the optimal combined workflow.
Moghaddam et al. in 2014 introduced the concept of a multi-level grouping genetic algorithm (MLGGA) [58]. The researchers highlight the fact that the problem of optimal VM placement is NP-hard and can be viewed as a bin packing problem. Due to this bin packing nature, they use a grouping genetic algorithm (GGA) as their base algorithm and introduce a multi-level grouping concept to optimize the placement and grouping of VMs and, in turn, reduce the carbon footprint. While the researchers' experiments are both substantial and rigorous, proving a lowering of the carbon footprint, the research fails to address some of the key aspects of VM placement in data centers
such as quality of experience, security, QOS and SLAs.
Sookhtsaraei et al., similar to Moghaddam, introduce a genetic algorithm solution as an approach to optimizing bin packing for VMs [69]. Using GGA as a base, they create an algorithm called CMPGGA, which considers bandwidth, CPU and memory along with hosts and VMs as input parameters, and outputs an optimized mapping of VMs to hosts. While CMPGGA can claim an improvement in reducing operational costs, the research fails to address QOS or SLA violations. Without considering these violations, which can result in monetary penalties for service providers, it is impossible to fully quantify the operational improvements.
6.3 reinforcement learning based approach
A more recent approach is the application of RL agents to optimize resource management in the cloud. Barret et al. propose a parallel RL framework for the optimisation of scaling resources in lieu of the threshold based approach [7]. The approach requires agents to approximate optimal policies and share their experiences with a global agent to improve overall performance, and has proven to perform exceptionally well, despite the removal of traditional rigid thresholds.
Bahati, meanwhile, proposes incorporating RL simply in order to manage the existing threshold based rules [4]. A primary controller applies these rules to a system in order to enforce its quality attributes, while a secondary controller monitors the effects of implementing these rules and adapts the thresholds accordingly.
Another approach adopted by Teasauro introduces a hybrid RL approach to optimising
server allocation in data centers through the training of a nonlinear function approxima-
tor in batch mode on a data set while an externally trained policy makes management
decisions within a given system [75].
Finally, Farahnakian et al. and Yuan et al. present dynamic RL techniques to optimize the number of active hosts in operation in a given time-frame [97] [27]. An RL agent learns an online host energy detection policy and dynamically consolidates machines in line with optimal requirements. Following the detection of over-utilized hosts, both studies employ Beloglazov's minimum migration time selection policy in order to identify VMs for migration [11].
All of the above RL approaches have proven a statistical advantage over threshold based approaches, and this forms the motivation for this research to implement and evaluate RL at a lower level of abstraction as a policy for the selection of VMs.
6.4 virtual machine selection policies
Beloglazov et al.'s study carried out in 2011 remains one of the most highly cited and accepted pieces of research in relation to the consolidation of VMs while maximizing performance and efficiency in cloud data centers [11]. Beloglazov examines the dynamic consolidation of VMs while considering multiple hosts and VMs in an IaaS environment. Unlike numerous other research papers, Beloglazov models SLAs as a key component of a solution to VM consolidation.
Beloglazov's proposed algorithm can be broken into three sections: overloading/underloading detection, VM selection and VM placement.
Overload detection: building on past research, Beloglazov suggests an adaptive selection policy known as Local Regression (LR) for determining when VMs require migration from a host in order not to violate SLAs [10]. Local regression, first proposed by Cleveland, allows for the analysis of a local subset of data, in this case hosts [22]. Given an over-utilization threshold along with a safety parameter, LR decides that a host is likely to become over-utilised if its current CPU utilization multiplied by the safety parameter is larger than the maximum possible utilization.
VM selection: virtual machines v are placed on a migration list based on the shortest period of time required to complete the migration, where the minimum time is taken as the utilized RAM divided by the spare bandwidth of the host h. The policy chooses the appropriate VM v through the following equation, where RAM_u(a) is the amount of RAM currently utilized by VM a, and NET_h is the spare network bandwidth available on host h.
$v \in V_h \;\Big|\; \forall a \in V_h,\ \frac{RAM_u(v)}{NET_h} \le \frac{RAM_u(a)}{NET_h}$ (6)
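In code, equation (6) amounts to choosing the migratable VM with the smallest ratio of utilised RAM to the host's spare bandwidth. The sketch below uses simplified stand-in types rather than CloudSim classes.

import java.util.List;

public class MinimumMigrationTimeSelection {

    // Simplified stand-in for a migratable VM on the over-utilised host.
    public record CandidateVm(String id, double usedRamMb) {}

    // Pick the VM whose RAM can be copied fastest over the host's spare bandwidth.
    public CandidateVm select(List<CandidateVm> migratable, double spareBandwidthMbPerSec) {
        CandidateVm best = null;
        double bestTime = Double.MAX_VALUE;
        for (CandidateVm vm : migratable) {
            double migrationTime = vm.usedRamMb() / spareBandwidthMbPerSec; // RAM_u(v) / NET_h
            if (migrationTime < bestTime) {
                bestTime = migrationTime;
                best = vm;
            }
        }
        return best;
    }
}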
Beloglazov's research proves that the dynamic VM consolidation algorithm Lr-Mmt significantly outperforms static policies such as DVFS and non power-aware approaches. It also outperforms the following dynamic policies.
6.4.1 Maximum Correlation
The maximum correlation policy is based on the premise that the stronger the inter-relationship of applications running on an over-utilized server, the higher the probability the server will overload, as highlighted by Verma et al. [81]. The maximum correlation policy finds a VM v that satisfies the following condition, where R² denotes the coefficient of multiple correlation of a VM's CPU utilization with that of the other VMs on the host.
$v \in V_h \;\Big|\; \forall a \in V_h,\ R^2_{x_v}(x_1, \ldots, x_{v-1}, x_{v+1}, \ldots, x_n) \ge R^2_{x_a}(x_1, \ldots, x_{a-1}, x_{a+1}, \ldots, x_n)$ (7)
6.4.2 Minimum Utilization Policy
The Minimum Utilization Policy is a simple method to select VMs from overloading hosts.
The policy chooses a VM based solely on the minimum utilization of a host, calculated
in Millions of instructions per second (MIPS). The policy is repeated until the host is no
longer considered as being overloaded [79].
6.4.3 The Random Selection Policy
The Random Selection Policy is another simple method to select VMs from overloading
hosts. The policy chooses a VM randomly to migrate. The policy is repeated until the
host is no longer considered as being overloaded [79].
6.5 research group context
This research was undertaken as part of a wider research group led by Dr. Enda How-
ley. The research group is known for research in the area of Multi-Agent Systems, Cloud,
Swarm, Smart Cities, Social Network Analysis & Simulation and Data Analytics. The fol-
lowing section reviews just a subset of research carried out by past and present members
of the group in the area of Cloud and RL.
Barret et al. present a novel approach to workflow scheduling in a cloud environment [6]. A workflow architecture estimates the average execution time and cost of a task, which are passed to multiple solver agents; these, through the use of GAs, produce various possible schedules. An MDP agent takes these possibilities and develops an optimal schedule for the workflow execution. Results show that the MDP agent can optimally choose a schedule despite an environment having varying loads and data sizes.
Further work by Barrett et al. includes the automation of resource allocation in the cloud through the use of an RL multi-agent approach [7]. Each agent addresses incoming workloads; on the basis of these requests an agent must approximate an optimal policy for resource allocation, and the agents share this information amongst each other before finally forwarding optimal scheduling policies to an instance manager which allocates VMs based on this advice. Results show that by parallelising the scheduling process the time taken to converge is greatly reduced and the framework can effectively select VMs of varying types for the required workload.
Mannion et al. present a parallel learning RL algorithm which utilizes heterogeneous agents [51]. Each of these heterogeneous agents learns in parallel on a partitioned subset of the overall problem. The knowledge and experience of these agents is then made available to a master agent, where the values are used for Q-value initialisation. This parallel approach has proven to outperform the standard Q-learning approach, resulting in increased learning speed and a lower step-to-goal ratio.
This work is advanced where Mannion et al. introduce this parallel learning of partitioned action spaces to a smart city environment and traffic signal control [49]. Results show significant improvement with the use of action space partitioning compared to a standard RL approach. Mannion also investigates the area of potential based reward
systems to improve performance in the learning of traffic signal control [50]. Comparing a potential based reward agent with a standard agent, Mannion shows that not only does learning speed increase, but queue and delay times are also reduced.
6.6 summary
This chapter reviews the key literature in the area of resource allocation, selection and scheduling in a cloud environment. Section 6.1 explores the traditional static, threshold and non-threshold approaches to resource management. Section 6.2 progresses to analyse more dynamic approaches to resource management through the application of various A.I approaches ranging from GAs and PSO to game theory. Section 6.3 focuses on RL as a specific method of resource scheduling, with work from Barret and Bahati providing key examples of resource scheduling via RL. Section 6.4 reviews pertinent literature from Beloglazov in the area of VM selection algorithms, including minimum migration time and maximum correlation. The chapter concludes by highlighting the role of this research in relation to the wider research group.
Part III
M E T H O D O L O G Y
7
C L O U D S I M
7.1 overview
The CloudSim toolkit was chosen as an appropriate simulation platform as it allows for the modelling of a virtualized IaaS environment and is the basis of much leading research in the area of cloud computing, particularly energy conservation and resource allocation [47] [91] [68] [16].
The CloudSim framework is a Java based simulator developed by the CLOUDS laboratory, University of Melbourne. It allows for the representation of an energy-aware data center with LAN-migration capabilities. In keeping with industry standards, 300 second (5 minute) intervals are used to establish whether a host is over-utilised and requires migration of VMs. The default ceiling threshold for utilization is 100% with an added safety parameter of 1.2. This safety parameter acts as an over-utilisation buffer: for example, a host determined to be 85% utilised is multiplied by the safety parameter 1.2, resulting in a utilisation of 102%, and is therefore deemed over-utilised.
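The over-utilisation check described above reduces to a single comparison; a minimal sketch, where the 1.2 safety parameter matches the default quoted in the text:

public final class OverUtilisationCheck {

    private static final double SAFETY_PARAMETER = 1.2;

    // A host is deemed over-utilised when utilisation x safety parameter exceeds 100%,
    // e.g. 0.85 * 1.2 = 1.02 (102%), so an 85% utilised host is treated as over-utilised.
    public static boolean isOverUtilised(double cpuUtilisation) {
        return cpuUtilisation * SAFETY_PARAMETER > 1.0;
    }
}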
7.2 cloudsim components
CloudSim is an event-driven application written in the Java programming language, containing over 200 classes and interfaces for the complete simulation of a cloud environment. The following section highlights the main and most important classes, as highlighted by Buyya et al. and Calheiros et al. [16] [18]. Fig. 7.1 contains a CloudSim class design diagram.
CloudInformationServices: The CloudInformationServices (CIS) class represents an entity
which provides the registration, indexing and modelling services for a data center cre-
ated within a simulation. A host from a data center registers its details with CIS, which
in-turn shares these details with the data center broker class which can then directly pro-
vide workloads to a host.
DataCenter: This class, which extends SimEntity, instantiates a data center, assigns a set of allocation policies for bandwidth (BW), memory and storage, and deals with the handling of VMs. This class is extended within the CloudSim framework as PowerDatacenter and NetworkDatacenter to allow for customised research, such as power reduction or network related research.
DataCenterCharacteristics: The DataCenterCharacteristics class allocates the static proper-
ties of a data center such as OS, management policy, time and costs.
Host: The Host class represents a physical resource such as a server which hosts VMs.
The class contains the internal policies for BW, processing power and memory for a single
instance of a host.
Vm: The Vm class represents a VM which is contained within a host. The class allows for the processing of cloudlets submitted from the DataCenterBroker class in accordance with its capacity, defined by its memory, processing power, storage size and provisioning policy. Similar to Datacenter, it is extended within the CloudSim framework as PowerVm or NetworkVm to allow for customised research such as power reduction or network related research.
Cloudlet: The Cloudlet class allows for the instantiation of a Cloudlet object, tracks
Cloudlet movement and allows for the cancellation, pausing or removal of a cloudlet
from the CloudletList(). A Cloudlet in CloudSim contains a workload assigned to a VM.
DataCenterBroker: This class represents a broker acting on behalf of a client/user. The
broker queries CIS and retrieves the hostList containing information on available VMs and
their respective specification allowing for the broker to directly assign cloudlets to VMs
with the necessary capability to achieve the customers QOS demands.
SimEntity: The SimEntity class is an abstract class which, when extended, represents a single simulation. The startEntity() method is invoked to begin a simulation; once started, the processEvent() method is called repeatedly to process all events held in the deferredQue(). Finally, the shutdown() method is invoked just prior to the termination of a simulation, which allows for events such as printing to a log file. All simulations must invoke the SimEntity class.
RamProvisioner: This is an abstract class which provides the necessary methods for RAM provisioning policies for VMs inside a host. It must be extended by researchers to configure custom RAM policies; otherwise CloudSim will use the RamProvisionerSimple class by default.
BwProvisioner: The BwProvisioner class is an abstract class which provides the basic methods necessary to allocate a bandwidth allocation policy. It must be extended by researchers to configure custom BW policies; otherwise CloudSim will use the BwProvisionerSimple class by default.
7.3 energy aware simulations
7.3.1 Initialising an Energy Aware Policy
Initialising an energy aware policy is possible by accessing the org.cloudbus.cloudsim.power.planetlab package located in the examples folder. This package contains an array of power aware simulations including Lr-Mmt, Lr-Mc and Lr-Mu. In order to create a new policy one must locate the main class within this package, from where CloudSim instantiates a new PlanetLab runner, providing it with the necessary information.
7.3.2 Creating a Selection Policy
The creation of a new selection policy is possible by accessing the org.cloudbus.cloudsim.power package located in the source folder; this package contains all allocation and selection policies for VMs. It also contains the PowerVmAllocationPolicyMigrationAbstract class, which invokes the method getVmsToMigrateFromHosts(). This method calls for the selection of a VM from an overloaded host. It is from this point that the selection policy instantiated by the user is invoked, and this is the key location of interaction between new or existing selection policies and CloudSim.
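As an illustration, a new selection policy can be sketched as a subclass of the PowerVmSelectionPolicy abstract class; the listing below assumes the CloudSim 3.x API (getVmToMigrate and getMigratableVms), so method signatures should be checked against the CloudSim version in use.

import java.util.List;

import org.cloudbus.cloudsim.Vm;
import org.cloudbus.cloudsim.power.PowerHost;
import org.cloudbus.cloudsim.power.PowerVm;
import org.cloudbus.cloudsim.power.PowerVmSelectionPolicy;

// Skeleton only: the placeholder decision below is where an RL agent would act.
public class PowerVmSelectionPolicyExample extends PowerVmSelectionPolicy {

    @Override
    public Vm getVmToMigrate(PowerHost host) {
        List<PowerVm> migratableVms = getMigratableVms(host); // helper from the abstract class
        if (migratableVms.isEmpty()) {
            return null;
        }
        // Placeholder: map the host's state to an action and return the chosen VM.
        return migratableVms.get(0);
    }
}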
7.4 hardware
A data center comprising 800 physical servers, consisting of 400 HP ProLiant ML110 G5 and 400 HP ProLiant ML110 G4 servers, is the default data center topology. Alterations can be made to the hardware setup via the Constants class located in the org.cloudbus.cloudsim.power package. This class also provides the option of altering other key constants in relation to VM types and sizes, scheduling intervals, bandwidth and storage.
7.5 workload
The workload comes from a real world IaaS environment. PlanetLab files within the CloudSim framework contain data from the CoMon project, representing the CPU utilization of over 100 VMs from servers located at 500 locations worldwide. In order to produce an
Figure 7.1: CloudSim class structure [12]
accurate and reliable experiment, the algorithms were deployed to represent a one month time period; to achieve this, the PlanetLab files were sampled at random to create a 30 day workload. Each PlanetLab file contains 288 values representative of CPU workloads. VMs are assigned these workloads on a random basis in order to best represent the stochastic characteristics of workload allocation and demand within an IaaS environment. Each VM corresponds to an Amazon EC2 instance type, except that each is single core, reflecting the fact that the workload was retrieved from single core VMs. The 288 CPU values, when used with CloudSim's default monitoring interval, represent 24 hours of data center capacity.
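As an illustration of the trace format described above, each PlanetLab file can be read as a simple list of utilisation samples; the assumption of one value per line and the class name are illustrative.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public final class PlanetLabTrace {

    // Reads a PlanetLab-style trace file: typically 288 CPU utilisation samples, one per line.
    public static double[] read(Path traceFile) throws IOException {
        List<String> lines = Files.readAllLines(traceFile);
        double[] cpuUtilisation = new double[lines.size()];
        for (int i = 0; i < lines.size(); i++) {
            cpuUtilisation[i] = Double.parseDouble(lines.get(i).trim());
        }
        return cpuUtilisation;
    }
}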
7.6 summary
This chapter introduces the simulation environment used in the remainder of the thesis
known as CloudSim, including details on its class structure, the necessary alterations
needed to introduce a new energy aware simulation and the default hardware and work-
loads provided by the simulator.
8
A L G O R I T H M D E V E L O P M E N T
In order to produce an RL selection algorithm for VMs, some additional information must be provided to register the RL policy and measure its effects, and the RL framework itself must be created. The following chapter outlines in detail the necessary additional classes and alterations.
8.1 registering a selection policy
In order to register a new selection policy, the method getVmSelectionPolicy() from the class RunnerAbstract, located in org.cloudbus.cloudsim.examples.power, must be altered to include the name of the new policy, which is provided on the instantiation of a simulation. This class also allows the user to alter the name of the output folder for the compilation of results if required.
8.2 recording of results
Certain key metrics are automatically compiled by CloudSim at the end of each simulation. However, these results are a combined metric of the overall performance, for example the overall energy consumed or the overall number of migrations. For the accurate detailing of the effect a new policy has on the data center, it is important to measure key information such as energy, number of migrations and SLA violations on an ongoing basis at discrete intervals. It is possible to do so in the Helper class located in org.cloudbus.cloudsim.examples.power. It is from this class that key metrics are printed to file at the end of each simulation; by introducing new methods it is possible to measure these key metrics on a much finer timescale.
8.3 additional classes
The following section describes, at a high level, the additional classes required for the RL framework, a schematic of which can be seen in Fig. 8.1.
8.3.1 Lr-RL
The LrRl class is located in the package org.cloudbus.cloudsim.examples.power.planetlab. It contains the main method and is where the simulation is instantiated. This class supplies the workload, the names of the selection and allocation policies and the safety parameter to PlanetLabRunner in order to begin the simulation.
8.3.2 RlSelectionPolicy
The RlSelectionPolicy class is located in the package org.cloudbus.cloudsim.power. It overrides the default VM selection policy within CloudSim and also acts as a controller class, conversing between CloudSim, the Environment class and the Agent when necessary.
8.3.3 Environment
The Environment class carries out all functions necessary to accumulate the information required for the Agent to make a decision; for example, the Environment retrieves the state, produces a list of all possible actions in the given state and calculates rewards, all of which are utilized by the Agent class.
8.3.4 Agent
The primary role of the Agent class is to choose a VM for migration by one of two possible methods: either following a SARSA policy or an ε-greedy policy. The Agent also contains a "brain", in this case a matrix in which it stores, updates and reads Q-values as required.
8.3.5 Algorithm
The role of the Algorithm class is to implement the requested Q-value estimation learning strategy: in this case, Watkins' Q-learning or Rummery and Niranjan's SARSA algorithm.
8.3.6 RlUtilities
RlUtilities class contains all functions necessary for the accumulation and accurate mea-
surement of the required metrics.
Figure 8.1: The reinforcement learning CloudSim architecture
8.4 summary
This chapter outlined the creation of an RL framework in CloudSim, including the necessary alterations to the existing simulator and the additional classes required to implement an agent based approach for VM selection.
9
I M P L E M E N T A T I O N
In order to develop an RL algorithm in any system, two key areas specific to the environment in which the algorithm is deployed must be addressed: the state-action space and the low level implementation of the learning strategy. This chapter addresses both of these issues in relation to an IaaS environment.
9.1 state-action space
RL techniques can suffer from a far-reaching state-action space, which limits the effectiveness and capabilities of an RL agent. Therefore, to incorporate an RL algorithm into an IaaS environment, an appropriate state-action range must first be defined. The state space s is defined as the current host utilization h_u, returned as a percentage, which confines the state space to a range of 0-100. It is obtained through the following equation, where virtual machine utilization vm_u is defined as a migratable VM's utilization and n is the number of migratable VMs.
$s = \sum_{i=1}^{n} \frac{vm_u(i)}{h_u} \cdot 100$ (8)
The action space a is represented as the vm_u of a VM relative to its assigned host h, returned as a percentage, which also allows the action space to range from 0-100.
$a = \frac{vm_u(i)}{h_u(h)} \cdot 100$ (9)
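A sketch of the percentile state-action mapping from equations (8) and (9); taking h_u as the host's capacity, values are rounded into integer buckets confined to the 0-100 range. The method and variable names are illustrative.

public final class StateActionSpace {

    // State (equation 8): total utilisation of the migratable VMs as a percentage of the host.
    public static int state(double[] migratableVmUtilisation, double hostCapacity) {
        double sum = 0.0;
        for (double vmU : migratableVmUtilisation) {
            sum += vmU;
        }
        return clamp((int) Math.round(sum / hostCapacity * 100.0));
    }

    // Action (equation 9): a single VM's utilisation as a percentage of its assigned host.
    public static int action(double vmUtilisation, double hostCapacity) {
        return clamp((int) Math.round(vmUtilisation / hostCapacity * 100.0));
    }

    private static int clamp(int percent) {
        return Math.max(0, Math.min(100, percent));  // confine to the 0-100 state-action range
    }
}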
9.2 q-learning implementation
The first implementation applies a Q-learning algorithm, as follows.
Q-learning virtual machine selection algorithm
foreach host in overUtilizedHosts do
    foreach vm in migratableVms do
        possibleActions ← vmSize
    end
    choose VM from possibleActions using π
    migrate VM
    observe hostUtilization_{t+1}, reward
    calculate Q: Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]
    update Q-map
end
The algorithm is invoked when a host is determined to be overloaded by the LR host overload detection policy; this host is placed on a list of over-utilized hosts and forwarded to the VM selection policy, in this case RL.
From this stage the first host is selected from the list, the host's level of utilization is taken as the state, and all migratable VMs are mapped as possible actions based on the percentage of their load in relation to their host.
A VM is then chosen based on the RL selection policy, i.e. ε-greedy or softmax. This VM is placed on a migration list, the host's utilization level is re-calculated, a scalar reward is attributed, and the Q-value is calculated and stored.
If the current host is still deemed to be over-utilised, another VM is chosen in the same manner until the host is no longer overloaded. Once the host is no longer over-utilized, the next host on the over-utilized host list is chosen, until the list is empty.
9.3 sarsa implementation
As referred to in Section 5.2.2, SARSA requires a quintuple consisting of the values s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1} in order to calculate its Q-value.
This is where the design of the two VM selection algorithms differs. Although both algorithms accept the same input, the order in which they process this input must be altered appropriately.
This alteration is evident in the following SARSA algorithm: after observing the new state, i.e. host utilization at t+1, and the reward, it does not calculate the Q-value at this stage.
Instead it obtains a new list of possible actions, in the shape of migratable VMs, for the new state, and then selects the appropriate VM following π. Only now does the algorithm have the information required to calculate Q.
SARSA virtual machine selection algorithm
foreach host in overUtilizedHosts do
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose VmToMigrate from possibleActions using π
    migrate VM
    observe hostUtilization_{t+1}, reward
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose VmToMigrate from possibleActions using π
    calculate Q: Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
    update Q-map
end
9.4 summary
This chapter outlined a percentile state-action space to be utilised by the agent; reducing the space to a 100×100 area limits the state-action space the agent must traverse regardless of the number of nodes in the data center, thereby addressing the so-called "curse of dimensionality" and providing an adaptable and portable agent. Finally, the chapter concluded by outlining a low level implementation of two key RL strategies, Q-learning and SARSA.
Part IV
E X P E R I M E N T S
10
E X P E R I M E N T M E T R I C S
The following chapter outlines the key metrics for measuring the performance of the RL algorithm. These metrics were proposed by Beloglazov and are widely adopted in research as a standard measurement of data center performance [12].
10.1 energy consumption
The total energy consumed by the data center per day in relation to computational re-
sources i.e servers. Although other energy draws exist, such as cooling and infrastructural
demands, this area was deemed outside the scope of this research.
10.2 migrations
The total number of migrations of all VMs, on all servers, performed by the data center. As the agent is trained to carry out intelligent selection of VMs, each migration is pertinent when analysing this research.
10.3 service level agreement metrics
Maintaining a high standard of QOS and SLAs is imperative for a cloud provider. Their importance is highlighted by the three measurements used to accurately report SLA violations.
10.3.1 SLATAH, PDM & SLAV
Service level agreement time per active host (SLATAH) is calculated from the time T_si during which an active host i has experienced 100% utilization of its CPU; during such periods the VMs on host i are denied any further processing capacity should they request additional CPU utilization, thus forcing violations. N represents the number of hosts and T_ai represents the time host i is actively serving VMs.
$SLATAH = \frac{1}{N} \sum_{i=1}^{N} \frac{T_{si}}{T_{ai}}$ (10)
Performance degradation due to migration (PDM) is established as an estimate of the degradation C_sv caused by migrations of VM v, with C_av representing the total CPU capacity requested by VM v over its lifespan and M the number of VMs.
$PDM = \frac{1}{M} \sum_{v=1}^{M} \frac{C_{sv}}{C_{av}}$ (11)
Due to the equal importance of both SLATAH and PDM, a combined metric, the service level agreement violation (SLAV), is used to capture both, as follows.
$SLAV = SLATAH \cdot PDM$ (12)
10.4 energy and sla violations
In order to ensure the implementation of energy saving policies does no negatively effect
SLA researchers and developers are required to measure the co related effect. To measure
this a combined metric named Energy and SLA Violations(ESV) is calculated as follows.
With the lower the overall ESV the better the performance of a data center.
ESV = ENERGY.SLAV (13)
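A sketch of how the four metrics combine, following equations (10) to (13); the input arrays are assumed to hold per-host and per-VM measurements collected during a simulation.

public final class SlaMetrics {

    // SLATAH (10): mean fraction of active time each host spent at 100% CPU utilisation.
    public static double slatah(double[] timeAtFullUtilisation, double[] timeActive) {
        double sum = 0.0;
        for (int i = 0; i < timeActive.length; i++) {
            sum += timeAtFullUtilisation[i] / timeActive[i];
        }
        return sum / timeActive.length;
    }

    // PDM (11): mean fraction of each VM's requested CPU capacity lost to migrations.
    public static double pdm(double[] degradationDueToMigration, double[] totalRequestedCpu) {
        double sum = 0.0;
        for (int v = 0; v < totalRequestedCpu.length; v++) {
            sum += degradationDueToMigration[v] / totalRequestedCpu[v];
        }
        return sum / totalRequestedCpu.length;
    }

    // SLAV (12) and ESV (13): combined violation and energy-violation metrics.
    public static double slav(double slatah, double pdm) {
        return slatah * pdm;
    }

    public static double esv(double energyKWh, double slav) {
        return energyKWh * slav;
    }
}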
11
S E L E C T I O N O F P O L I C Y
11.1 experiment details
Whether softmax or ε-greedy action selection is better depends on the task or the environment in which it is deployed, and an intrinsic link exists between the choice of action selection and the performance of the update function (Q-learning or SARSA) due to their shared dependence on Q [73].
For this reason the following experiment was undertaken to analyse and distinguish the optimal combination of update and selection policy, as mentioned in Section 2.7. There are four possible update/selection combinations:
• Q-Learning / ε-greedy
• Q-Learning / Softmax
• Sarsa / ε-greedy
• Sarsa / Softmax
Each combination of policy is analysed using both a 30 day stochastic workload in order
to measure adaptability and a repetitive single workload over 100 iterations in order to
measure convergence rates.
11.2 results
11.2.1 Energy
Running a single workload over multiple iterations not only provides a perspective on the ability of the agent to learn, but also allows for the identification of the speed of convergence to a level of optimum performance for each possible update/selection policy. The optimum energy consumption was determined as the point at which energy consumption passed beneath 140kWh, established from data gathered over 100 iterations of all four possible update/selection policies.
Fig. 11.1 displays the energy consumption of Q-learning with softmax and ε-greedy action policies. On iteration No.34 ε-greedy converges to the optimal energy barrier, while softmax fails to drop below 140kWh until iteration No.64.
Figure 11.1: Energy consumption Q-Learning 100 iterations
Fig. 11.2 displays the energy consumption of Sarsa with softmax and ε-greedy action
policies. On iteration No.33 ε-greedy converges to the optimal energy barrier, while soft-
max again fails to penetrate sub 140kWh until it finally converges on iteration No.85.
Figure 11.2: Energy consumption SARSA 100 iterations
The policies that contain epsilon appear to converge the quickest to an optimal level of
performance. Fig. 11.3 displays a comparison of these policies in relation to convergence
time.
Figure 11.3: Q-Learning ε-Greedy vs SARSA ε-Greedy
As outlined in the previous section, both SARSA and Q-learning converge to an optimal level in quick succession of each other; however, Q-learning remains below 140kWh 22% more often than SARSA for the remainder of the 100 iterations. Past the 50th iteration there is minimal difference in performance: Q-learning produces a slightly lower average consumption per iteration of 140.32kWh against SARSA's 140.74kWh, and this is in line with the deviation level past 50 iterations, with Q-learning running at 0.2964 and SARSA at 0.2977.
While multiple iterations of a single workload may highlight the rate of convergence, they do not portray an agent in a real world stochastic cloud environment. For that reason one must also take into account performance when supplied with a disparate workload. Fig. 11.5 and Fig. 11.4 contain the daily average and overall energy consumption over a 30 day period. Again, the policies using ε-greedy action selection perform best, and again there is minimal difference between the performance of Q-learning and SARSA, mirroring the results of the iterative test. Q-learning ε-greedy achieves a saving of 5, 16 and 25kWh over SARSA ε-greedy, Q-learning softmax and SARSA softmax respectively.
Figure 11.4: Overall Energy Consumption 30 Day Workload
Figure 11.5: Average Daily Energy Consumption 30 Day Workload
11.2.2 Migrations
Each time a VM migrates, the draw on energy increases as the contents of the VM are copied from one server to another. Therefore, a reduced level of migrations decreases the associated energy cost.
Figure 11.6: SARSA migrations 100 Iterations
Taking the analysis format from the previous section, each selection combination was
given an iterative single workload, Fig.11.6 displays the results of migrations selected
from over utilised hosts as chosen by Sarsa combinations with Fig.11.7 displaying migra-
tions selected from over utilised hosts as chosen by Q-learning combinations.
Figure 11.7: Q-Learning migrations 100 Iterations
A sizable difference is noticeable between the two update functions, with SARSA resulting in an average of between 5,713 and 5,864 migrations per iteration, while the Q-learning update function averages between 2,941 and 3,002 migrations per iteration.
The differential in migrations, although considerable, is not a design flaw; rather, it is in line with Sutton & Barto's cliff walking example, Fig. 11.8 [73]. Fig. 11.9 displays SARSA as accumulating greater reward, similar to cliff walking. This is the result of SARSA's on-policy nature, which considers the action selection policy, therefore not letting the agent fall off the cliff or, in this case, move an unrewarding machine; rather, it learns the safer, more consistent and more rewarding path. In contrast, Q-learning ignores the action selection policy and attempts to converge to the optimum policy, even though on occasion this can cause an agent to fall off the cliff, or move a machine of high cost, resulting in an extreme negative impact on rewards.
Figure 11.8: Accumulated rewards for cliff walking task [73]
Figure 11.9: Accumulated rewards for migrations
The total average migrations per iteration, a combined metric of those selected from both over-utilised hosts and under-utilised hosts, are contained in Fig. 11.10. Again Q-learning outperforms all other possible combinations and closely aligns with the results of the 30 day test shown in Fig. 11.11.
Figure 11.10: Average migrations 100 iterations
Figure 11.11: Average migrations 30 day workload
11.2.3 Service Level Agreement Violations
Breaching service level agreements can result in a financial penalty for the service provider. Therefore, data center operators continuously strive to minimize violations and maximise performance, customer satisfaction and profit. Fig. 11.12 and Fig. 11.13 display the overall SLA violations for the 100 iteration and 30 day tests respectively.
Figure 11.12: Overall SLA violations 100 iterations
Once again Q-learning outperforms all other possible combinations; however, this is the closest of all simulations, with Q-learning/ε-greedy outperforming the other possible combinations by between just 0.4% and 1.4%.
Figure 11.13: Overall SLA violations for 30 days
11.2.4 ESV
The reduction of energy can have a correlated negative effect on SLAs if the method of reducing energy is not chosen carefully. To measure this effect, we utilise the ESV metric outlined in Section 10.4. This could be considered the most important metric as it combines SLAV and energy to give a more inclusive view of data center performance; the lower the ESV, the more efficiently the data center is performing.
Fig. 11.15 and Fig. 11.14 contain the overall ESV for the iterative and 30 day tests; as expected from the earlier analysis of the energy and SLAV data, Q-learning/ε-greedy again outperforms the other combinations.
Figure 11.14: Overall ESV for 100 iterations
Figure 11.15: Overall ESV for 30 days
11.3 discussion
The ε-greedy based update/selection policies outperform the softmax based policies in relation to energy consumption and convergence time. The overall energy saving for a 30 day workload ranges from 21kWh to 25kWh. ε-greedy also converges below the optimum 140kWh threshold earlier than the softmax based combinations, with Q-learning/softmax, its closest rival, converging after a further 30 iterations.
Fig. 11.6 and Fig. 11.7 display the migrations for the SARSA and Q-learning policies, with SARSA incurring a far greater number of migrations as a result of its on-policy evaluation and the resulting safe approach to VM selection.
As regards SLA violations, Q-learning/ε-greedy incurs the fewest violations, albeit by a fractional margin of between 0.4% and 1.4%. However small the improvement, it remains important not only from a fiscal penalty viewpoint, but also because it highlights that the reduction of energy is not having a correlated negative effect on SLA violations.
This is further reinforced by examination of the ESV figures, a metric that, as previously mentioned, provides a more inclusive view of data center performance. Again Q-learning/ε-greedy records the lowest ESV, outperforming its rivals by between 5% and 8%.
The Q-learning/ε-greedy based model consistently outperforms the other selection/update policies in both the 30 day and the 100 iteration tests; it is therefore deemed the best policy for this environment and has been chosen as the selection/update policy to be used for the remaining experiments.
12
P O T E N T I A L   B A S E D   R E W A R D   S H A P I N G
12.1 experiment details
Chapter 11 highlighted Q-learning/ε-greedy as the best performing update/selection policy. However, this does not imply that the policy is performing optimally. In general an RL agent learns through trial and error by visiting multiple states and carrying out multiple actions. Such an approach highlights RL's main limitation, that is, its slowness to converge to optimum performance. This experiment introduces the advanced RL technique known as potential based reward shaping, as outlined in Section 5.4, as a method to improve current convergence rates. The PBRS algorithm is analysed against the standard Q-learning/ε-greedy agent developed in the previous section.
PBRS, formally outlined in equation (14), is an additional reward calculated as the difference between the potential of the resultant state and that of the original state [24].
$F(s, s') = \gamma\,\Phi(s') - \Phi(s)$ (14)
This term is then introduced into the standard Q-learning update function as follows:
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + F(s, s') + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \,]$ (15)
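A minimal sketch of equation (15): the shaping term F(s, s') = γΦ(s') − Φ(s) is simply added to the environment reward inside the standard Q-learning update. The potential function used here, which favours lower host utilisation, is a hypothetical example and not the one used in the experiments.

public class ShapedQLearner {

    private final double[][] q;
    private final double alpha;
    private final double gamma;

    public ShapedQLearner(int states, int actions, double alpha, double gamma) {
        this.q = new double[states][actions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    // Hypothetical potential: lower host utilisation (the state) is considered more desirable.
    private double potential(int state) {
        return (100 - state) / 100.0;
    }

    // Equation (15): the shaping term F(s, s') is added to the natural reward.
    public void update(int s, int a, double reward, int sNext) {
        double shaping = gamma * potential(sNext) - potential(s);
        double best = q[sNext][0];
        for (double v : q[sNext]) {                 // max over next-state actions
            best = Math.max(best, v);
        }
        q[s][a] += alpha * (reward + shaping + gamma * best - q[s][a]);
    }
}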
12.2 results
12.2.1 Energy
After a single iteration the standard Q-learning algorithm consumes 146.14kWh, remaining 6.14kWh above the optimum level of energy conservation as determined in Chapter 11, while the PBRS algorithm consumes 141.02kWh, just 1.02kWh from the optimum level of consumption. It takes a further 10 iterations before the standard algorithm reaches the consumption level at which PBRS began; by this time PBRS has long since broken the sub-140kWh barrier, doing so on the 4th iteration. The standard agent continues to learn, and it is not until after the 32nd iteration that a consistent deviation of between 0 and 1kWh is maintained.
Figure 12.1: PBRS vs Q-Learning energy consumption
12.2.2 Migrations
The effects of the PBRS based agent are not restricted to energy alone; they ripple through the metrics, not least migrations. After a single iteration the standard agent posts a migration count of 22,243, with the PBRS based agent migrating only 18,021 VMs, 4,222 fewer. In line with the energy data, it is not until the 10th iteration that the standard agent reaches the migration rate at which PBRS began. The migration counts remain disparate until the 38th iteration, after which the differential in the migration count consistently remains below 1,000.
Figure 12.2: PBRS vs Q-Learning migrations
12.2.3 Service Level Agreement Violations
The effect of the PBRS based agent on SLA violations mirrors what was seen in the previous two sections. The PBRS agent begins at a rate of SLA violations 1.28E-06 lower than that of the standard agent; only after 26 iterations does the standard agent surpass this level, and it is not until iteration 31 that it performs on par with the PBRS agent.
Figure 12.3: PBRS vs Q-Learning slav
12.2.4 ESV
As expected, given the reduced rate of energy consumption and level of SLA violations, the PBRS agent's ESV rating begins 7% lower, at 0.004921. On the 11th iteration the standard agent surpasses this level for the first time, and shortly after the 30th iteration it consistently performs on par with the PBRS agent.
Figure 12.4: PBRS vs Q-Learning ESV
12.3 discussion
The addition of PBRS to the Q-learning agent significantly decreased the convergence time
and therefore the time spent learning the state-action space. Overall the PBRS agent was
tested on 10% of the overall workload and produced consistent results throughout. On every
occasion the PBRS agent converged to a level deemed optimal in fewer than 5 iterations,
while the standard agent required on average in excess of 21 iterations.
The effect was mirrored in relation to migrations, with 22% fewer migrations after a single
iteration and the standard agent taking an average of 10 iterations to reach this level.
Similar changes were noticeable in the SLA violations, with the standard agent again taking
over 10 iterations on average to reach the level at which the PBRS agent began. The improved
energy and SLA violation figures are reflected in the ESV data, with the PBRS agent after a
single iteration running at a 7% lower rate than the standard agent. On average it takes the
standard agent another 11 iterations to reach that level, from which point the differential
between both agents remains steady.
13
C O M PA R AT I V E V I E W O F L R - R L V S L R - M M T
13.1 experiment details
Following on from the experiments carried out in Chapters 11 and 12 and the determination
that a PBRS Q-Learning/ε-Greedy based agent provides optimum performance, this chapter
evaluates the algorithm against the leading VM selection policy in the research literature.
Research has previously established that dynamic consolidation algorithms statistically
outperform static allocation policies such as DVFS, and that heuristic based dynamic VM
consolidation outperforms online deterministic algorithms [12].
The optimal combination of selection-allocation policies was shown to be Lr-Mmt, which
statistically outperformed multiple disparate algorithms [12].
For that reason Lr-Mmt has been designated as the preeminent algorithm against which to
analyse the dynamic virtual machine selection algorithm, Lr-Rl. A 30 day stochastic real
world workload is provided to both algorithms, with each algorithm analysed under four
criteria: energy consumption, service level agreement violations, quantity of virtual
machine migrations, and ESV.
13.2 results
13.2.1 Energy
Fig. 13.1 contains the energy consumption data from the experiment. The paired t-test shows
a statistically significant difference in energy consumption between Lr-Rl and Lr-Mmt, with
a P-value < 0.0041 and a 95% confidence interval of (-39.8715, -7.8685). As a result, over
the 30 day period the Lr-Rl algorithm consumes over 716 kWh less energy in total, or
23.87 kWh less per day.
Figure 13.1: Energy consumption for 30 day workload
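For reference, the per-day comparisons reported in this section can be reproduced with a standard paired t-test. The sketch below is a minimal illustration assuming the Apache Commons Math library is available; the thesis does not state which statistics tool was used, and the class and method names here are hypothetical.

```java
import org.apache.commons.math3.stat.inference.TTest;

/**
 * Sketch of the paired t-test used to compare the two policies on a
 * per-day basis. The caller supplies the 30 daily readings (kWh,
 * migrations, SLAV or ESV) produced by each selection policy.
 */
public class PairedComparison {

    /** Returns the two-sided p-value of a paired t-test on the daily readings. */
    public static double pairedPValue(double[] lrRlDaily, double[] lrMmtDaily) {
        TTest tTest = new TTest();
        return tTest.pairedTTest(lrRlDaily, lrMmtDaily);
    }
}
```

The same call applies unchanged to the daily migration, SLAV and ESV series reported in the following subsections.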
13.2.2 Migrations
The paired t-test shows a statistically significant difference between Lr-Rl and Lr-Mmt in
relation to migrations, with a P-value < 0.0001 and a 95% confidence interval of
(-13,620.86, -8,389.133). The migration data per day are displayed in Fig. 13.2. Through
the use of Lr-Rl, migrations over the 30 day period decrease by 330,154 overall, or by an
average of 11,005 per day.
Figure 13.2: Migrations for 30 day workload
13.2.3 Service Level Agreement Violations
When lowering energy usage within a data center it is imperative to monitor SLA violations,
as reducing energy can have a parallel negative effect. For example, one can lower the
number of active servers through the extreme consolidation of VMs onto fewer servers; this,
however, increases the likelihood of servers reaching 100% CPU utilization, restricting the
VMs' access to computational processing and resulting in violations. The SLA violations are
displayed in Fig. 13.3. The results of a t-test show no statistically significant difference,
and thus no negative effect on SLA, with a P-value of 0.2751 and a 95% confidence interval
of (-3.9669, 1.1365).
Figure 13.3: SLA violations for 30 day workload
13.2.4 ESV
The final evaluation is ESV, the results of which can be seen in Fig. 13.4; again the results
reinforce the SLA violation and energy data previously gathered. On carrying out a t-test,
Lr-Rl again shows a statistically significant improvement in performance, with a P-value
< 0.0001 and a 95% confidence interval of (-0.0037, -0.0021).
Figure 13.4: ESV for 30 day workload
13.3 discussion
In order to examine the improved performance of Lr-Rl over Lr-Mmt more closely, it is
necessary to look at a single day and the disparities that lie within it. On day 21 a saving
of 23.02 kWh of energy occurs, with 11,561 fewer migrations.
The average number of migrations required that day to reduce an over-utilized host to a safe
workload stood at 2.33 for Mmt, over twice that of RL at 1.06. On occasion Mmt required as
many as 12 migrations from a single host in order to reach a safe state, while the RL policy
never required more than 4 migrations for a single host.
An explanation of the extra migrations associated with Mmt can be found in the data on the
VMs chosen for migration. On average a VM chosen by Mmt accounts for as little as 3.60% of
the host's overall utilization and therefore multiple migrations are required before the
host enters an under-utilized state. RL-chosen VMs, on the other hand, account on average
for 18.04% of the overall host utilization, and therefore when migrated immediately move the
host to an under-utilized state. The correlation between the reduced number of migrations
and the energy reduction of Lr-Rl, measured at industry standard 5 minute intervals for
day 21, is shown in Fig. 13.5.
Figure 13.5: Energy & Migration Correlation Day 21
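The contrast described above can be illustrated with a simplified sketch of the two selection criteria. This is not the CloudSim implementation of either policy: the Vm value object, its fields and the method names are hypothetical, and Mmt is approximated here simply as the VM with the shortest RAM-to-bandwidth migration time.

```java
import java.util.Comparator;
import java.util.List;

/**
 * Simplified contrast between an Mmt-style choice and a utilization-aware
 * choice. The Vm class is a hypothetical value object used only for this
 * illustration.
 */
public class SelectionContrast {

    /** Hypothetical VM description. */
    public static class Vm {
        final int ramMb;                // RAM to be copied during live migration
        final double cpuShareOfHost;    // fraction of the host's utilization this VM carries
        final double bandwidthMbps;     // bandwidth available for the migration

        Vm(int ramMb, double cpuShareOfHost, double bandwidthMbps) {
            this.ramMb = ramMb;
            this.cpuShareOfHost = cpuShareOfHost;
            this.bandwidthMbps = bandwidthMbps;
        }
    }

    /** Mmt-style choice: shortest estimated migration time (RAM / bandwidth),
     *  irrespective of how much of the host's load the VM carries. */
    public static Vm minimumMigrationTime(List<Vm> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble((Vm vm) -> vm.ramMb / vm.bandwidthMbps))
                .orElseThrow(IllegalArgumentException::new);
    }

    /** Utilization-aware choice in the spirit of the learned RL behaviour:
     *  the VM whose CPU share does most to relieve the over-utilized host. */
    public static Vm largestCpuShare(List<Vm> candidates) {
        return candidates.stream()
                .max(Comparator.comparingDouble((Vm vm) -> vm.cpuShareOfHost))
                .orElseThrow(IllegalArgumentException::new);
    }
}
```

A VM chosen by the first criterion often frees only a small fraction of the host's load, so several migrations are needed; a VM chosen by the second typically relieves the host in a single migration, which is the behaviour the day 21 data reflects.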
The difference in the selected VM's share of its host's utilization plays a major part in
determining the overall number of migrations required. Mmt, however, places a further
restriction on its selection of VMs: not only does it not take into account the VM
utilization level, it also restricts the selection to the VM with the smallest RAM. As a
result, over 79% of VMs selected for migration are those with a RAM size of 613 MB,
regardless of how large or small their workload is. RL, on the other hand, implements no
such restriction; by taking into account the more holistic value of utilization levels from
both the host and the VMs, it allows the agent to select VMs across the full spectrum of
RAM sizes, as seen in Fig. 13.6.
(a) Lr-Mmt (b) Lr-Rl
Figure 13.6: Ram sizes of virtual machines migrated
Overall, this results in Lr-Rl accounting for 716 kWh or 15% less energy consumption and
339,154 or 41% fewer migrations, with a 38% reduction in the ESV level and no statistical
difference in service level agreement violations.
Part V
C O N C L U S I O N
14
C O N C L U S I O N
14.1 contributions
Reinforcement learning techniques have been successfully applied to resource allocation for
cloud systems prior to this research. However, these were applied at server or node level;
this research proposed a novel approach that incorporates RL at a lower infrastructural
level, in the selection of VMs for migration. Due to its low level of abstraction, the
algorithm can be incorporated into multiple cloud infrastructures, including stand-alone
private, federated and multi-cloud infrastructures.
The high level of CO2 emissions and the associated negative environmental effects, along
with the increasing cost of and demand for energy from data centers, formed the motivation
for this research into the creation of a state of the art, low energy software policy for
the selection of VMs for migration in IaaS environments. In order to produce such an
algorithm the thesis evolved to answer the following questions.
• Is RL a viable approach for VM selection in the cloud?
• Can advanced RL techniques improve such a policy?
• Can an RL approach outperform the state of the art selection policy?
The experiments carried out in Chapter 11 aimed not only to address our first research
question but to further the thesis by providing an optimum update/selection policy for the
selection of VMs in an IaaS environment. The results align with Sutton and Barto's view that
whether softmax or ε-greedy action selection is better depends on the task or the
environment in which it is deployed [73]. Fig. 11.3 presents evidence of an agent that
consistently learns to reduce energy, and analysis of the results shows that a
Q-Learning/ε-greedy based agent consistently outperforms the other update/selection policies
across all four metrics.
In Chapter 12 the introduction of the advanced RL technique known as potential based reward
shaping further improved the agent based algorithm, addressing both one of RL's greatest
difficulties, the convergence time often referred to as the learning period, and the second
research question of this thesis. The introduction of PBRS significantly decreased the
convergence time and resulted in a direct saving of over 32 kWh across the 100 iterations
due to the shorter convergence period. Fig 12.2 highlights the reduction in convergence
time: the PBRS agent converged to a level deemed optimal in fewer than 5 iterations, while
the standard agent required on average in excess of 30 iterations. This improved performance
was seen throughout the data center metrics, with a reduction in migrations and SLA
violations and an improvement in the overall ESV.
The importance of PBRS when addressing such a complex problem is outlined by Devlin et al.'s
finding that its benefits are greatest in complex problem domains where reinforcement
learning alone takes a long time to converge and there is a large difference in performance
between the initial policy and the final policy converged to [24]. The benefits of
introducing a PBRS based agent are directly in line with the results of many academic
papers, including [24], [50], [31] and [92]; however, no academic literature or otherwise
could be found in which a PBRS based agent has been introduced into a cloud environment as
has been done in these experiments.
In Chapter 13 the third research question is addressed: Lr-Rl is compared to the Lr-Mmt
selection algorithm. The algorithms are provided with a real world 30 day workload. Lr-Rl
accounts for 716 kWh or 15% less energy consumption and 339,154 or 41% fewer migrations,
with a 38% reduction in the ESV level and no statistical difference in service level
agreement violations. These results show a significant improvement on the work of
Beloglazov and the Lr-Mmt algorithm [11].
Research carried out by Yuan, Voorsluys and Liu et al. [97] [85] [48] all highlights the
potential savings and improved performance that result directly from the careful selection
of VMs for migration and the overall lowering of migrations within a data center. The
findings of this thesis add further proof of this, with Fig. 13.5 highlighting the direct
correlation between reduced migrations and reduced energy usage.
The RL selection policy is one of many elements in the overall process of data center
management. However, achieving up to a 15% energy reduction in just one specific area goes
a long way towards addressing the research of Brown et al. and Koomey et al., who estimate
savings of up to 25% through the introduction of energy aware software policies for the
management of data centers [14][42].
The results of RL as a selection policy also open the possibility of improved performance
for many other pieces of research whose authors have developed their own host detection
algorithms but have used Mmt as the selection policy, including [28] [33] [53] [97] and
[27], to name just a few.
Viewing the results of Chapter 13 from an environmental viewpoint, an average saving of
23.87 kWh per day amounts to a saving of 8,715 kWh per year. According to the EPA's
calculations that equates to a saving of 5.9 metric tons of CO2, which would require 4.8
acres of mature forest per year to sequester [26].
14.2 future work
Arising from the work presented in this thesis, a number of possibilities exist for future
work, such as:
• The extension of testing across a more dispersed cloud topology, such as a cross-data
center migration scenario
• The extension of testing in a scaled up testbed
• Further development of the RL framework within CloudSim for optimization purposes
Such additional research not only adds to the requirement for energy aware management
policies highlighted by Koomey and Brown [42][14], it also furthers the development of
CloudSim as a research tool for academia and industry to utilise.
B I B L I O G R A P H Y
[1] David Abramson, Rajkumar Buyya, and Jonathan Giddy. A computational economy
for grid computing and its implementation in the nimrod-g resource broker. Future
Generation Computer Systems, 18(8):1061–1074, 2002.
[2] Mohamed Almorsy, John Grundy, and Ingo Müller. An analysis of the cloud com-
puting security problem. In Proceedings of APSEC 2010 Cloud Workshop, Sydney, Aus-
tralia, 30th Nov, 2010.
[3] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz,
Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A
view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.
[4] Raphael M Bahati and Michael A Bauer. Towards adaptive policy-based manage-
ment. In Network Operations and Management Symposium (NOMS), 2010 IEEE, pages
511–518. IEEE, 2010.
[5] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf
Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization.
ACM SIGOPS Operating Systems Review, 37(5):164–177, 2003.
[6] Enda Barrett, Enda Howley, and Jim Duggan. A learning architecture for scheduling
workflow applications in the cloud. In Web Services (ECOWS), 2011 Ninth IEEE
European Conference on, pages 83–90. IEEE, 2011.
[7] Enda Barrett, Enda Howley, and Jim Duggan. Applying reinforcement learning to-
wards automating resource allocation and application scalability in the cloud. Con-
currency and Computation: Practice and Experience, 25(12):1656–1674, 2013.
[8] Luiz André Barroso and Urs Hölzle. The case for energy-proportional computing.
IEEE computer, 40(12):33–37, 2007.
[9] A Barto and RH Crites. Improving elevator performance using reinforcement learn-
ing. Advances in neural information processing systems, 8:1017–1023, 1996.
[10] Anton Beloglazov, Jemal Abawajy, and Rajkumar Buyya. Energy-aware resource
allocation heuristics for efficient management of data centers for cloud computing.
Future Generation Computer Systems, 28(5):755–768, 2012.
[11] Anton Beloglazov and Rajkumar Buyya. Optimal online deterministic algorithms
and adaptive heuristics for energy and performance efficient dynamic consolidation
of virtual machines in cloud data centers. Concurrency and Computation: Practice and
Experience, 24(13):1397–1420, 2012.
[12] Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, Albert Zomaya, et al. A
taxonomy and survey of energy-efficient data centers and cloud computing systems.
Advances in Computers, 82(2):47–111, 2011.
[13] Luca Benini, Alessandro Bogliolo, and Giovanni De Micheli. A survey of design
techniques for system-level dynamic power management. Very Large Scale Integra-
tion (VLSI) Systems, IEEE Transactions on, 8(3):299–316, 2000.
[14] Richard Brown et al. Report to congress on server and data center energy efficiency:
Public law 109-431. Lawrence Berkeley National Laboratory, 2008.
[15] Rajkumar Buyya, David Abramson, and Jonathan Giddy. A case for economy grid
architecture for service oriented grid computing. In Parallel and Distributed Pro-
cessing Symposium, International, volume 2, pages 20083a–20083a. IEEE Computer
Society, 2001.
[16] Rajkumar Buyya, Rajiv Ranjan, and Rodrigo N Calheiros. Modeling and simulation
of scalable cloud computing environments and the cloudsim toolkit: Challenges
and opportunities. In High Performance Computing & Simulation, 2009. HPCS’09.
International Conference on, pages 1–11. IEEE, 2009.
[17] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona
Brandic. Cloud computing and emerging it platforms: Vision, hype, and reality for
delivering computing as the 5th utility. Future Generation computer systems, 25(6):599–
616, 2009.
[18] Rodrigo N Calheiros, Rajiv Ranjan, Anton Beloglazov, César AF De Rose, and Rajku-
mar Buyya. Cloudsim: a toolkit for modeling and simulation of cloud computing
environments and evaluation of resource provisioning algorithms. Software: Practice
and Experience, 41(1):23–50, 2011.
[19] Michael Cardosa, Madhukar R Korupolu, and Aameek Singh. Shares and utilities
based power consolidation in virtualized server environments. In Integrated Network
Management, 2009. IM’09. IFIP/IEEE International Symposium on, pages 327–334. IEEE,
2009.
[20] V Chaudhary, Minsuk Cha, JP Walters, S Guercio, and Steve Gallo. A comparison
of virtualization technologies for hpc. In Advanced Information Networking and Ap-
plications, 2008. AINA 2008. 22nd International Conference on, pages 861–868. IEEE,
2008.
[21] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Chris-
tian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines.
In Proceedings of the 2nd conference on Symposium on Networked Systems Design &
Implementation-Volume 2, pages 273–286. USENIX Association, 2005.
[22] William S Cleveland. Robust locally weighted regression and smoothing scatter-
plots. Journal of the American statistical association, 74(368):829–836, 1979.
[23] Robert J. Creasy. The origin of the vm/370 time-sharing system. IBM Journal of
Research and Development, 25(5):483–490, 1981.
[24] Sam Devlin, Daniel Kudenko, and Marek Grześ. An empirical study of potential-
based reward shaping and advice in complex, multi-agent systems. Advances in
Complex Systems, 14(02):251–278, 2011.
[25] Tharam Dillon, Chen Wu, and Elizabeth Chang. Cloud computing: issues and
challenges. In Advanced Information Networking and Applications (AINA), 2010 24th
IEEE International Conference on, pages 27–33. IEEE, 2010.
[26] Epa.gov. Calculations and references — clean energy — us epa, 2015.
[27] Fahimeh Farahnakian, Pasi Liljeberg, and Juha Plosila. Energy-efficient virtual ma-
chines consolidation in cloud data centers using reinforcement learning. In Paral-
lel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International
Conference on, pages 500–507. IEEE, 2014.
[28] Fahimeh Farahnakian, Tapio Pahikkala, Pasi Liljeberg, and Juha Plosila. Energy
aware consolidation algorithm based on k-nearest neighbor regression for cloud
data centers. In Utility and Cloud Computing (UCC), 2013 IEEE/ACM 6th International
Conference on, pages 256–259. IEEE, 2013.
[29] Ian Foster, Yong Zhao, Ioan Raicu, and Shiyong Lu. Cloud computing and grid
computing 360-degree compared. In Grid Computing Environments Workshop, 2008.
GCE’08, pages 1–10. Ieee, 2008.
[30] Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen, and Zhenghu Gong. The char-
acteristics of cloud computing. In Parallel Processing Workshops (ICPPW), 2010 39th
International Conference on, pages 275–279. IEEE, 2010.
[31] Marek Grzes and Daniel Kudenko. Plan-based reward shaping for reinforcement
learning. In Intelligent Systems, 2008. IS’08. 4th International IEEE Conference, vol-
ume 2, pages 10–22. IEEE, 2008.
[32] Steven Hand, Tim Harris, Evangelos Kotsovinos, and Ian Pratt. Controlling the
xenoserver open platform. In Open Architectures and Network Programming, 2003
IEEE Conference on, pages 3–11. IEEE, 2003.
[33] Abbas Horri, Mohammad Sadegh Mozafari, and Gholamhossein Dastghaibyfard.
Novel resource allocation algorithms to performance and energy efficiency in cloud
computing. The Journal of Supercomputing, 69(3):1445–1461, 2014.
[34] Jinhua Hu, Jianhua Gu, Guofei Sun, and Tianhai Zhao. A scheduling strategy on
load balancing of virtual machine resources in cloud computing environment. In
Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International
Symposium on, pages 89–96. IEEE, 2010.
[35] Yashpalsinh Jadeja and Kirit Modi. Cloud computing-concepts, architecture and
challenges. In Computing, Electronics and Electrical Technologies (ICCEET), 2012 Inter-
national Conference on, pages 877–880. IEEE, 2012.
[36] Gueyoung Jung, Kaustubh R Joshi, Matti A Hiltunen, Richard D Schlichting, and
Calton Pu. Generating adaptation policies for multi-tier applications in consoli-
dated server environments. In Autonomic Computing, 2008. ICAC’08. International
Conference on, pages 23–32. IEEE, 2008.
[37] Gueyoung Jung, Kaustubh R Joshi, Matti A Hiltunen, Richard D Schlichting, and
Calton Pu. A cost-sensitive adaptation engine for server consolidation of multitier
applications. In Middleware 2009, pages 163–183. Springer, 2009.
[38] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement
learning: A survey. Journal of artificial intelligence research, pages 237–285, 1996.
[39] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. kvm: the
linux virtual machine monitor. In Proceedings of the Linux Symposium, volume 1,
pages 225–230, 2007.
[40] Nadir Kiyanclar. A survey of virtualization techniques focusing on secure on-
demand cluster computing. arXiv preprint cs/0511010, 2005.
[41] Jonathan Koomey. Growth in data center electricity use 2005 to 2010. A report by
Analytical Press, completed at the request of The New York Times, 2011.
[42] Jonathan G Koomey. Estimating total power consumption by servers in the us
and the world, 2007. Lawrence Berkeley National Laboratory, Berkeley, CA, available at:
http://hightech.lbl.gov/documents/DATA_CENTERS/svrpwrusecompletefinal.pdf, 2007.
[43] Jonathan G Koomey, Christian Belady, Michael Patterson, Anthony Santos, and
Klaus-Dieter Lange. Assessing trends over time in performance, costs, and energy
use for servers. Lawrence Berkeley National Laboratory, Stanford University, Microsoft
Corporation, and Intel Corporation, Tech. Rep, 2009.
[44] Dara Kusic, Jeffrey O Kephart, James E Hanson, Nagarajan Kandasamy, and Guofei
Jiang. Power and performance management of virtualized computing environments
via lookahead control. Cluster computing, 12(1):1–15, 2009.
[45] B. Ellison L. Minas. Energy efficiency for information technology: How to reduce
power consumption in servers and data centers. Intel Press, 2009.
[46] Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang, and Chia-Ying Tseng. A dynamic
resource management with energy saving mechanism for supporting cloud com-
puting. International Journal of Grid and Distributed Computing, 6(1):67–76, 2013.
[47] Weiwei Lin, Chen Liang, James Z Wang, and Rajkumar Buyya. Bandwidth-aware
divisible task scheduling for cloud computing. Software: Practice and Experience,
44(2):163–174, 2014.
[48] Haikun Liu, Hai Jin, Cheng-Zhong Xu, and Xiaofei Liao. Performance and energy
modeling for live migration of virtual machines. Cluster computing, 16(2):249–264,
2013.
[49] Patrick Mannion, Jim Duggan, and Enda Howley. Parallel reinforcement learning
with state action space partitioning. In JMLR Workshop and Conference Proceedings
0:19, 2015 12th European Workshop on Reinforcement Learning.
[50] Patrick Mannion, Jim Duggan, and Enda Howley. Learning traffic signal control
with advice. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS
2015), 2015.
[51] Patrick Mannion, Jim Duggan, and Enda Howley. Parallel learning using heteroge-
neous agents. In Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS
2015), 2015.
[52] Laëtitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. Reward func-
tion and initial values: better choices for accelerated goal-directed reinforcement
learning. In Artificial Neural Networks–ICANN 2006, pages 840–849. Springer, 2006.
[53] Khushbu Maurya and Richa Sinha. Energy conscious dynamic provisioning of
virtual machines using adaptive migration thresholds in cloud data center. Interna-
tional Journal of Computer Science and Mobile Computing, pages 74–82, 2013.
[54] John McCarthy. Applications of circumscription to formalizing common-sense
knowledge. Artificial Intelligence, 28(1):89–116, 1986.
[55] Lijun Mei, Wing Kwong Chan, and TH Tse. A tale of clouds: Paradigm comparisons
and some thoughts on research issues. In Asia-Pacific Services Computing Conference,
2008. APSCC’08. IEEE, pages 464–469. Ieee, 2008.
[56] David Meisner, Brian T Gold, and Thomas F Wenisch. Powernap: eliminating server
idle power. ACM SIGARCH Computer Architecture News, 37(1):205–216, 2009.
[57] Peter Mell and Tim Grance. The nist definition of cloud computing. Computer Secu-
rity Division, Information Technology Laboratory, National Institute of Standards
and Technology, 2011.
[58] Fereydoun Farrahi Moghaddam, Reza Farrahi Moghaddam, and Mohamed Cheriet.
Carbon-aware distributed cloud: multi-level grouping genetic algorithm. Cluster
Computing, pages 1–15, 2014.
[59] Ripal Nathuji and Karsten Schwan. Virtualpower: coordinated power management
in virtualized enterprise systems. In ACM SIGOPS Operating Systems Review, vol-
ume 41, pages 265–278. ACM, 2007.
[60] Andrew Y Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward
transformations: Theory and application to reward shaping. In ICML, volume 99,
pages 278–287, 1999.
[61] Jason Nieh and Ozgur Can Leonard. Examining vmware. Dr. Dobbs Journal, 25(8):70,
2000.
[62] Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru, and Rajkumar Buyya. A par-
ticle swarm optimization-based heuristic for scheduling workflow applications in
cloud computing environments. In Advanced Information Networking and Applications
(AINA), 2010 24th IEEE International Conference on, pages 400–407. IEEE, 2010.
[63] Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization.
Swarm intelligence, 1(1):33–57, 2007.
[64] Jette Randløv and Preben Alstrøm. Learning to drive a bicycle using reinforcement
learning and shaping. In ICML, volume 98, pages 463–471, 1998.
[65] Mendel Rosenblum and Tal Garfinkel. Virtual machine monitors: Current technol-
ogy and future trends. Computer, 38(5):39–47, 2005.
[66] Gavin A Rummery and Mahesan Niranjan. On-line q-learning using connectionist
systems. 1994. University of Cambridge, Department of Engineering.
[67] Naidila Sadashiv and SM Dilip Kumar. Cluster, grid and cloud computing: A
detailed comparison. In Computer Science & Education (ICCSE), 2011 6th International
Conference on, pages 477–482. IEEE, 2011.
[68] Yuxiang Shi, Xiaohong Jiang, and Kejiang Ye. An energy-efficient scheme for cloud
resource provisioning based on cloudsim. In Cluster Computing (CLUSTER), 2011
IEEE International Conference on, pages 595–599. IEEE, 2011.
[69] Reza Sookhtsaraei, Mirmorsal Madani, and Atena Kavian. A multi objective virtual
machine placement method for reduce operational costs in cloud computing by
genetic. International Journal of Computer Networks & Communications Security, 2(8),
2014.
[70] Richard S Sutton. Learning to predict by the methods of temporal differences. Ma-
chine learning, 3(1):9–44, 1988.
[71] Richard S Sutton. Introduction: The challenge of reinforcement learning. In Rein-
forcement Learning, pages 1–3. Springer, 1992.
[72] Richard S Sutton. Reinforcement learning: Past, present and future. In Simulated
Evolution and Learning, pages 195–197. Springer, 1999.
[73] Richard S Sutton and Andrew G Barto. Introduction to reinforcement learning. MIT
Press, 1998.
[74] Gerald Tesauro. Temporal difference learning and td-gammon. Communications of
the ACM, 38(3):58–68, 1995.
[75] Gerald Tesauro, Nicholas K Jong, Rajarshi Das, and Mohamed N Bennani. On the
use of hybrid reinforcement learning for autonomic resource allocation. Cluster
Computing, 10(3):287–299, 2007.
[76] Michel Tokic and Günther Palm. Value-difference based exploration: adaptive con-
trol between epsilon-greedy and softmax. In KI 2011: Advances in Artificial Intelli-
gence, pages 335–346. Springer, 2011.
[77] Wei-Tek Tsai, Xin Sun, and Janaka Balasooriya. Service-oriented cloud computing
architecture. In Information Technology: New Generations (ITNG), 2010 Seventh Inter-
national Conference on, pages 684–689. IEEE, 2010.
[78] Rich Uhlig, Gil Neiger, Dion Rodgers, Amy L Santoni, Fernando CM Martins, An-
drew V Anderson, Steven M Bennett, Alain Kagi, Felix H Leung, and Larry Smith.
Intel virtualization technology. Computer, 38(5):48–56, 2005.
[79] Seema Vahora and Ritesh Patel. Cloudsim a survey on vm management techniques.
In International Journal of Advanced Research in Computer and Communication Engineer-
ing, pages 128 – 123, 2015.
[80] Vytautas Valancius, Nikolaos Laoutaris, Laurent Massoulié, Christophe Diot, and
Pablo Rodriguez. Greening the internet with nano data centers. In Proceedings of the
5th international conference on Emerging networking experiments and technologies, pages
37–48. ACM, 2009.
[81] Akshat Verma, Puneet Ahuja, and Anindya Neogi. pmapper: power and migration
cost aware application placement in virtualized systems. In Middleware 2008, pages
243–264. Springer, 2008.
[82] Akshat Verma, Gargi Dasgupta, Tapan Kumar Nayak, Pradipta De, and Ravi
Kothari. Server workload analysis for power minimization using consolidation. In
Proceedings of the 2009 conference on USENIX Annual technical conference, pages 28–28.
USENIX Association, 2009.
[83] vmware.com. Paravirtualization, 2014.
[84] vmware.com. Hypervisor performance, 2015.
[85] William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar Buyya. Cost
of virtual machine live migration in clouds: A performance evaluation. In Cloud
Computing, pages 254–265. Springer, 2009.
[86] Carl A Waldspurger. Memory resource management in vmware esx server. ACM
SIGOPS Operating Systems Review, 36(SI):181–194, 2002.
[87] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–
292, 1992.
[88] Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. PhD
thesis, University of Cambridge England, 1989.
[89] Guiyi Wei, Athanasios V Vasilakos, Yao Zheng, and Naixue Xiong. A game-
theoretic method of fair resource allocation for cloud computing services. The Jour-
nal of Supercomputing, 54(2):252–269, 2010.
[90] Shimon Whiteson and Peter Stone. Evolutionary function approximation for rein-
forcement learning. The Journal of Machine Learning Research, 7:877–917, 2006.
[91] Bhathiya Wickremasinghe, Rodrigo N Calheiros, and Rajkumar Buyya. Cloudan-
alyst: A cloudsim-based visual modeller for analysing cloud computing environ-
ments and applications. In Advanced Information Networking and Applications (AINA),
2010 24th IEEE International Conference on, pages 446–452. IEEE, 2010.
[92] Eric Wiewiora, Garrison Cottrell, and Charles Elkan. Principled methods for advis-
ing reinforcement learning agents. In ICML, pages 792–799, 2003.
[93] www.nskinc.com. cloud-computing-101, 2015.
[94] Xenproject.org. Vs15: Video spotlight with cavium’s larry wikelius, 2015.
[95] Andrew J. Younge, Robert Henschel, James T. Brown, Gregor von Laszewski, Judy
Qiu, and Geoffrey Fox. Analysis of virtualization technologies for high perfor-
mance computing environments. In IEEE International Conference on Cloud Comput-
ing, CLOUD 2011, Washington, DC, USA, 4-9 July, 2011, pages 9–16, 2011.
[96] Lamia Youseff, Rich Wolski, Brent Gorda, and Chandra Krintz. Paravirtualization
for hpc systems. In Frontiers of High Performance Computing and Networking–ISPA
2006 Workshops, pages 474–486. Springer, 2006.
[97] Jingling Yuan, Xuyang Miao, Lin Li, and Xing Jiang. An online energy saving
resource optimization methodology for data center. Journal of Software, 8(8):1875–
1880, 2013.
[98] Qi Zhang, Lu Cheng, and Raouf Boutaba. Cloud computing: state-of-the-art and
research challenges. Journal of internet services and applications, 1(1):7–18, 2010.
[99] Xiaoyun Zhu, Don Young, Brian J Watson, Zhikui Wang, Jerry Rolia, Sharad Sing-
hal, Bret McKee, Chris Hyser, Daniel Gmach, Rob Gardner, et al. 1000 islands:
Integrated capacity and workload management for the next generation data center.
In Autonomic Computing, 2008. ICAC’08. International Conference on, pages 172–181.
IEEE, 2008.
[100] Dimitrios Zissis and Dimitrios Lekkas. Addressing cloud computing security issues.
Future Generation Computer Systems, 28(3):583–592, 2012.
112

Master_Thesis

  • 1.
    AI Optimisation Approachfor Autonomic Cloud Computing Kieran Flesk Submitted in accordance with the requirements for the degree of Masters of Science Software Design & Development College of Engineering & Informatics National University of Ireland, Galway Research Supervisor: Dr. Enda Howley August 2015
  • 2.
    A B ST R A C T Cloud computing has led to exponential growth in large scale data centers and ware- houses, which form the paradigms substratum layer, Infrastructure as a Service. These large scale server warehouses consume substantial energy, not only to power servers, but also affiliated processes such as cooling. Dynamic consolidation of virtual machines us- ing live migration and switching idle nodes to the sleep mode allows cloud providers to optimize resource usage and reduce energy consumption. The following research pro- poses a novel reinforcement learning approach for the selection of virtual machines for migration. Due to low level of abstraction, the proposed algorithm provides a decision support system which supports efficient and open application deployment, monitoring, and execution across different cloud service providers and results in lowering energy consumption without negatively effecting service level agreements. 2
  • 3.
    A C KN O W L E D G E M E N T S Firstly, I would like to express my sincere gratitude to my supervisor Dr. Enda Howley for the continuous support of my masters study and related research, for his patience, motivation, and immense knowledge. His guidance helped me immensely in the research and writing of this thesis. I could not have imagined having a better adviser and mentor for my masters. I would like to thank my family especially my parents and their unwavering support in my decision to return to education and to my brothers and sister for supporting me throughout the writing this thesis. Finally I would like to thank my fellow researchers and friends who all have con- tributed to the final product in one way or another. 3
  • 4.
    D E CL A R AT I O N The candidate confirms that the work submitted is his own and that appropriate credit has been given where reference has been made to the work of others
  • 5.
    P U BL I C AT I O N A Reinforcement Learning Decision Support System for the Selection of Virtual Machines Kieran Flesk, Dr. Enda Howley Springer Special Edition Journal of Internet Services and Applications Under Review
  • 6.
    C O NT E N T S i introduction 15 1 introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.1 Motivations and Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.2 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 1.3 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 ii literature review 19 2 cloud computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1 Origins of Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.1 Cluster Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.2 Grid Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.1.3 Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2 Characteristics of Cloud Computing . . . . . . . . . . . . . . . . . . . . 22 2.2.1 Scalability of Infrastructure . . . . . . . . . . . . . . . . . . . . . 22 2.2.2 Autonomic Resource Control / Elasticity . . . . . . . . . . . . . 22 2.2.3 Service Centric Approach . . . . . . . . . . . . . . . . . . . . . . 23 2.2.4 Omnipresent Network Accessibility . . . . . . . . . . . . . . . . 23 2.2.5 Multi-Tenancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.2.6 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.3 Cloud Deployment Models . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.1 Private Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.2 Community Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.3 Public Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.4 Hybrid Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.4 Cloud Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3 data centers and energy consumption . . . . . . . . . . . . . . . . 29 3.1 Areas of energy consumption . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1.1 Server Consumption . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.2 Power Management Techniques . . . . . . . . . . . . . . . . . . . . . . . 30 6
  • 7.
    CONTENTS 3.2.1 Dynamic ComponentDeactivation . . . . . . . . . . . . . . . . . 31 3.2.2 Dynamic Performance Scaling . . . . . . . . . . . . . . . . . . . 31 3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4 virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.1 Modern Day Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.2 Levels of Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2.1 Full Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2.2 Paravirtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2.3 Hardware Assisted Virtualization . . . . . . . . . . . . . . . . . 36 4.3 Hypervisors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.3.1 Xen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.3.2 KVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.3.3 VMware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5 reinforcement learning . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.1 Agent / Environment Interaction . . . . . . . . . . . . . . . . . . . . . . 41 5.2 Learning Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.2.1 Q-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2.2 SARSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.3 Action Selection Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.3.1 ε-Greedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.3.2 Softmax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.4 Reward Shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.4.1 Potential Based Reward Shaping . . . . . . . . . . . . . . . . . . 47 5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 6 related research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6.1 Threshold and Non-Threshold Approach . . . . . . . . . . . . . . . . . 50 6.2 Artificial Intelligence Based Approach . . . . . . . . . . . . . . . . . . . 53 6.3 Reinforcement Learning Based Approach . . . . . . . . . . . . . . . . . 55 6.4 Virtual Machine Selection Policies . . . . . . . . . . . . . . . . . . . . . 56 6.4.1 Maximum Correlation . . . . . . . . . . . . . . . . . . . . . . . . 57 6.4.2 Minimum Utilization Policy . . . . . . . . . . . . . . . . . . . . 57 6.4.3 The Random Selection Policy . . . . . . . . . . . . . . . . . . . . 57 6.5 Research Group Context . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 7
  • 8.
    CONTENTS iii methology 60 7cloudsim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 7.2 CloudSim Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 7.3 Energy Aware Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 64 7.3.1 Initialising an Energy Aware Policy . . . . . . . . . . . . . . . . 64 7.3.2 Creating a Selection Policy . . . . . . . . . . . . . . . . . . . . . 64 7.4 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 7.5 Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 8 algorithm development . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 8.1 Registering a Selection Policy . . . . . . . . . . . . . . . . . . . . . . . . 66 8.2 Recording of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 8.3 Additional Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.1 Lr-RL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.2 RlSelectionPolicy . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.3 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.4 Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 8.3.5 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 8.3.6 RlUtilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 9 implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 9.1 State-Action Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 9.2 Q-Learning Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 71 9.3 SARSA Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 iv experiments 74 10 experiment metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 10.1 Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 10.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 10.3 Service Level Agreement Metrics . . . . . . . . . . . . . . . . . . . . . . 75 10.3.1 SLATAH, PDM & SLAV . . . . . . . . . . . . . . . . . . . . . . . 76 10.4 Energy and SLA Violations . . . . . . . . . . . . . . . . . . . . . . . . . 76 11 selection of policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 8
  • 9.
    CONTENTS 11.1 Experiment Details. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 11.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 11.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 11.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 11.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 84 11.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 11.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 12 potential based reward shaping . . . . . . . . . . . . . . . . . . . . . 87 12.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 12.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 12.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 12.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 12.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 89 12.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 12.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 13 comparative view of lr-rl vs lr-mmt . . . . . . . . . . . . . . . . . 92 13.1 Experiment Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 13.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 13.2.1 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 13.2.2 Migrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 13.2.3 Service Level Agreement Violations . . . . . . . . . . . . . . . . 94 13.2.4 ESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 13.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 v conclusion 98 14 conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 14.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 14.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 9
  • 10.
    L I ST O F F I G U R E S Figure 2.1 Private cloud [93] . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Figure 2.2 Public cloud [93] . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Figure 2.3 Hybrid cloud[93] . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Figure 2.4 High level cloud architecture [15] . . . . . . . . . . . . . . . . . 27 Figure 5.2 PBRS effect [92] . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Figure 7.1 CloudSim class structure [12] . . . . . . . . . . . . . . . . . . . 65 Figure 8.1 The reinforcement learning CloudSim architecture . . . . . . . 68 Figure 11.1 Energy consumption Q-Learning 100 iterations . . . . . . . . 78 Figure 11.2 Energy consumption SARSA 100 iterations . . . . . . . . . . . 79 Figure 11.3 Q-Learning ε-Greedy vs SARSA ε-Greedy . . . . . . . . . . . . 79 Figure 11.4 Overall Energy Consumption 30 Day Workload . . . . . . . . 80 Figure 11.5 Average Daily Energy Consumption 30 Day Workload . . . . 80 Figure 11.6 SARSA migrations 100 Iterations . . . . . . . . . . . . . . . . . 81 Figure 11.7 Q-Learning migrations 100 Iterations . . . . . . . . . . . . . . 81 Figure 11.8 Accumulated rewards for cliff walking task [73] . . . . . . . . 82 Figure 11.9 Accumulated rewards for migrations . . . . . . . . . . . . . . 82 Figure 11.10 Average migrations 100 iterations . . . . . . . . . . . . . . . . 83 Figure 11.11 Average migrations 30 day workload . . . . . . . . . . . . . . 83 Figure 11.12 Overall SLA violations 100 iterations . . . . . . . . . . . . . . . 84 Figure 11.13 Overall SLA violations for 30 days . . . . . . . . . . . . . . . . 84 Figure 11.14 Overall ESV for 100 iterations . . . . . . . . . . . . . . . . . . . 85 Figure 11.15 Overall ESV for 30 days . . . . . . . . . . . . . . . . . . . . . . 85 Figure 12.1 PBRS vs Q-Learning energy consumption . . . . . . . . . . . . 88 Figure 12.2 PBRS vs Q-Learning migrations . . . . . . . . . . . . . . . . . . 89 Figure 12.3 PBRS vs Q-Learning slav . . . . . . . . . . . . . . . . . . . . . . 90 Figure 12.4 PBRS vs Q-Learning ESV . . . . . . . . . . . . . . . . . . . . . . 90 Figure 13.1 Energy consumption for 30 day workload . . . . . . . . . . . . 93 Figure 13.2 Migrations for 30 day workload . . . . . . . . . . . . . . . . . . 94 Figure 13.3 SLA violations for 30 day workload . . . . . . . . . . . . . . . 94 10
  • 11.
    LIST OF FIGURES Figure13.4 ESV for 30 day workload . . . . . . . . . . . . . . . . . . . . . . 95 Figure 13.5 Energy & Migration Correlation Day 21 . . . . . . . . . . . . . 96 Figure 13.6 Ram sizes of virtual machines migrated . . . . . . . . . . . . . 97 11
  • 12.
    A C RO N Y M S SLA Service Level Agreement API Application Programming Interface OS Operating system QOS Quality of service IT Information technology IaaS Infrastructure as a service PaaS Platform as a service SaaS Software as a service UPS Universal power supply PUE Power Usage Efficiency PDU Power distribution unit DVFS Dynamic voltage frequency scaling DRAM Direct Random Access Memory SPM Static power management DPM Dynamic power management DCD Dynamic component distribution DPS Dynamic performance scaling CTSS Compatible time sharing systems CP Control program CMS Conventional monitor system 12
  • 13.
    VMM Virtual machinemanagement ABI Application binary translation KVM Kernel virtual machine MMU Memory management unit TLB Table lookaside buffer RL Reinforcement Learning TD Temporal Difference PBRS Potential Based Reward System AI Artificial Intelligence MDP Markov Decision Process PM-L Local power management PM-G Global power management LNQS Layered queuing network solver GA Genetic algorithm MLGGA Multi Layered Grouped genetic algorithm GGA Grouped genetic algorithm LR Local regression MMT Minimum migration time VM Virtual machine RL Reinforcement Learning CPU Central processing unit LR Local Regression MC Maximum Correlation 13
  • 14.
    MC Maximum Correlation MUMinimum Utilization RS Random Selection MIPS Millions of instructions per second PDM Performance Degradation Due to Migration SLATAH Service Level Agreement Time Per Active SLAV Service Level Agreement Violation ESV Energy and Service Level Agreement Violation PC Personal Computer 14
  • 15.
    Part I I NT R O D U C T I O N
  • 16.
    1 I N TR O D U C T I O N Cloud computing refers to both the applications delivered as services over the Internet and the hardware and software systems in the data centers that provide them [3]. Buyya et al. defines cloud computing as a type of parallel and distributed system, consisting of a collection of interconnected and virtualized computers, that are dynamically provi- sioned and presented as one or more unified computing resources, based on service level agreements (SLA) established through negotiation between the service provider and cus- tomer [17]. Regardless of the ever growing heterogeneous nature of cloud platforms and deployments, this definition still rings true. Other key cornerstones also remain despite the ever changing landscape, one such cor- nerstone is the ability of cloud providers to virtualize the key constituents which form the lowest level of the cloud architecture known as infrastructure as a service layer (IaaS), principally large scale data centers. The virtualization of large scale congregations of nodes, typical of that found in modern day data centers into multiple virtual indepen- dent machines executing on a single node, not only allows for the plasticity of services, but plays a key role in the high level adherence of SLAs and maximum utilization of resources which underpin the foundations of cloud computing, while providing maxi- mum return on investment for service providers. However, due to the dynamic nature of cloud services and their demands, the static or offline management of these virtual ma- chines (VM) provides significant restrictions in the provision of an idealistic cloud service. 16
  • 17.
    1.1 motivations andaims The provision of such virtualized environments and services comes at a cost, studies such as [8][41][42], highlight the growth of data centers directly resulting in • An increase in energy consumption in the range of billions of kwh’s from the begin- ning of the decade. • An annual increase in the emission of Co2 from 42.8 million metric tons in 2007 to 67.9 million metric tons in 2011. Much of this energy is wasted in idle systems, in typical deployments, server utilization is below 30%, but idle servers still consume 60% of their peak energy draw. In order to combat such wastage, advanced consolidation policies for under utilized and idle servers are required to be deployed [56]. Two key findings from the Report to Congress titled Server and Data Center Energy Efficiency 2007 directly addresses this issue by stating that ; ”Existing technologies and strategies could reduce typical server energy use by an estimated 25%. ”[14] ”Assuming state-of-the-art energy efficiency practices are implemented throughout U.S. data cen- ters, this projected energy use can be reduced by up to 55% compared to current efficiency trends”[14] The following thesis purposes one such state-of-the-art energy efficiency policy. 1.1 motivations and aims Motivated by these facts and the success of previous research regarding reinforcement learning (RL) as an optimisation technique. The aim of this research is to design, develop, implement and evaluate a RL agent based approach for the selection of VMs for migration in a stochastic IaaS environment in order to reduce energy consumption. 17
  • 18.
    1.2 research questions 1.2research questions This thesis aims to answer the following research questions • Is reinforcement learning a viable approach for virtual machine selection in the cloud ? • Can advanced reinforcement learning techniques improve such a policy ? • Can a reinforcement learning approach outperform the state of the art selection policy ? 1.3 thesis structure The thesis is laid out as follows. • Chapter 1 contains an introduction that provides an overview of the research topic and introduces the research questions, motivations and aims. • Chapter 2-6 contains a literature review covering – Cloud Computing – Data Centers and Energy Consumption – Virtualization and Hypervisors – Reinforcement Learning – Resource Allocation and Selection Methods • Chapter 7-9 contains the methodology of the thesis including – CloudSim Simulator – Algorithm Development & Implementation • Chapter 10-13 contains the experiments carried out – The Policy Selection – Addition of Potential Based Reward Shaping – Comparative View of Lr-Rl vs Lr-Mmt • Chapter 14 contains the conclusions and possible areas of future work. 18
  • 19.
    Part II L IT E R AT U R E R E V I E W
  • 20.
    2 C L OU D C O M P U T I N G The following chapter contains an in depth review of the most pertinent academic re- search available in relation to cloud computing, its characteristics, architecture, service and deployment models. 2.1 origins of cloud computing There has been a long time vision of providing computer services as a utility alongside water, gas, electricity and telephone. To achieve this individuals and companies must be able to access the services they require on demand with the scalability and flexibility they require in a pay-per-use environment [17]. The following section outlines the historic progression towards such a scenario. 2.1.1 Cluster Computing Originally super computers led the way in large scale computational tasks in areas such as science, engineering and commerce, eventually however more extensive computational energy was required to cater for such problems and from this cluster computing was developed. A cluster is a collection of parallel or distributed computers, which are in- terconnected among themselves using high speed networks often, in the form of local area networks [67]. Multiple computers and their resources are combined to function as a virtual computer, allowing for greater computational energy. Each node carries out the same task and each cluster contains redundant nodes, which allows for a backup should a utilized node fail. Computers in a cluster can be described as homogeneous as they use the same operating systems (OS) and hardware. 20
  • 21.
    2.1 origins ofcloud computing 2.1.2 Grid Computing Grid computing, originally developed to meet the high computational demands of sci- entific research. Grid computing is a distributed network, which couples a wide variety of geographically distributed computational resources such as personal computers (PCs), workstations, clusters, storage systems, data sources, databases, computational kernels, special purpose scientific instruments and presents them as a unified integrated resource [15]. These grids are commonly established maintained and owned by large research groups with shared interest. Such an infrastructure requires a complex management system having to manage multiple global locations, multiple owners, heterogeneous com- puter networks and hardware as well as user policies and availability [1]. 2.1.3 Cloud Computing The most recent computing paradigm to progress towards the vision of providing com- puter services as a utility is cloud computing. A cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dy- namically provisioned and presented as one or more unified computing resources in order to provide a service underpinned by high levels of quality of service (QOS) and SLAs [17]. Cloud computing has been defined in many different ways the following is just one of those definitions Ian Foster et al. describes it as, ”A large scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing energy, storage, platforms and services are delivered on demand to external customers over the Internet.” [29] 21
2.2 characteristics of cloud computing

The characteristics of cloud computing infrastructures and models are a recurring concept in the literature, highlighted no more so than by Gong et al.'s comprehensive research review [30]. The following section contains a brief explanation of these characteristics.

2.2.1 Scalability of Infrastructure

A key feature of any cloud computing architecture is its ability to scale in accordance with peaks and troughs in customer demand. Such scalability not only allows providers to maintain SLAs but also allows for the strategic management of data center resources, thus reducing costs. Scalability can be summarised into two separate categories. Horizontal scalability refers to the ability of a node to access extra processing resources from other nodes within a data center, i.e. multiple nodes working as a single logical node to perform a task, therefore maintaining SLAs and QOS. Vertical scalability refers to the ability to add additional resources to a single node where necessary, such as increasing bandwidth, memory or central processing unit (CPU) utilization [55].

2.2.2 Autonomic Resource Control / Elasticity

The ability of services to be extended or retracted autonomously depending on demand is a key characteristic of cloud computing [100]. This is also referred to by Zhang et al. as the ability of self-organization [98]. This elasticity is a key aspect which differentiates cloud computing from the more rigid grid and cluster computing. It is also a key selling point, as customers are offered the ability to re-size their hardware needs in parallel with their requirements, without the expense of investing in physical resources which may lie largely redundant for long periods of time.
2.2.3 Service Centric Approach

Cloud computing providers deliver an on-demand service model, delivering services when and where they are needed. These services are provided in accordance with SLAs and QOS agreed between the consumer and provider prior to the provider obtaining control of the task [98].

2.2.4 Omnipresent Network Accessibility

Services can be accessed via an Internet connection from any location, using a range of heterogeneous devices, at any given time [100].

2.2.5 Multi-Tenancy

Multi-tenancy refers to the sharing of cloud resources, including CPU, memory, networks and applications [2]. Multi-tenancy in cloud computing plays a key role in the financial viability of providing such a service. Although users share these resources, providers place a layer of virtualization technology above the hardware layer which allows a customized and partitioned virtual application. Multi-tenancy is seen as a major aspect of cloud security at all layers of the cloud infrastructure; through partitioning and isolation of virtual resources, providers strive to provide maximum security [2].

2.2.6 Virtualization

Virtualization allows service providers to create multiple instances of virtual machines on a single server [11]. Each of these virtual machines can run a different operating system (OS), independent of the underlying OS. The ability of a provider to offer multiple instances of machines from a single server contributes greatly to the viability of cloud computing by maximising return on investment. Virtualization is discussed in further detail in Chapter 4.
2.3 cloud deployment models

There are four types of cloud deployment models that are commonly referred to in the literature: private, public, hybrid and community. The following section outlines the structure of each model.

2.3.1 Private Cloud

A private cloud is a cloud model that is devoted solely to one organisation. The cloud infrastructure may be located in-house or elsewhere, in a single or multiple data centers. It may be managed solely by the organisation or by a third party [25]. A private cloud can offer high security, performance and reliability; however, the cost associated with private clouds is often higher than that of other models [98].

Figure 2.1: Private cloud [93]
2.3.2 Community Cloud

A community cloud is a cloud infrastructure built and shared by multiple organisations which share common policies, practices and regulations. The underlying infrastructure can be hosted by a third party or by an individual organisation within the community [35].

2.3.3 Public Cloud

A public cloud is where commercial entities offer cloud services to the general public, usually on a pay-per-use model. Public clouds offer the benefit of no upfront capital expenditure on infrastructure; however, the refinement of services and security seen in private clouds is not as extensively available [98]. Examples of such services are Amazon EC2 or Google Compute Engine.

Figure 2.2: Public cloud [93]
2.3.4 Hybrid Cloud

A hybrid cloud combines the facilities of two or more cloud models: a private cloud and other public or community clouds. Such a model allows for a private cloud to be held in-house while certain aspects of the information technology (IT) infrastructure can be held on public clouds. Such an infrastructure supplies an organisation with the ability to retain high security and specific optimisation, while maintaining the elasticity provided by public clouds [98].

Figure 2.3: Hybrid cloud [93]
2.4 cloud architecture

Fig. 2.4 shows a high level view of a cloud computing architecture, an architecture which is tightly coupled with what are known as the cloud service models.

At the lowest level is the hardware layer; these are data centers that hold large volumes of physical servers and associated equipment. On top of the hardware lies the infrastructure layer; this layer virtualizes the servers held in the data centers on demand by creating multiple instances of virtual machines, which includes virtualizing CPUs, memory, storage etc. These first two layers combine the necessary elements to provide IaaS to the consumer.

Figure 2.4: High level cloud architecture [15]

The third layer, the platform layer, provides a development, modelling, testing and deployment environment for developers of applications hosted in the cloud [29]. The developers have little or no access to the underlying networks, servers etc., except for some minor user configuration [57]. This allows for the provision of platform as a service (PaaS) as a cloud service model.

The top layer, known as the application layer, is the user interface of cloud computing, usually supplied via browsers on heterogeneous Internet-enabled devices [77]. This layer allows access via a web browser or an application interface for software applications hosted on cloud servers. The consumer has no control of the underlying infrastructure of the cloud or the applications' capabilities, except those provided by the creator. This cloud service model is referred to as software as a service (SaaS) [57].
2.5 summary

This chapter reviewed cloud computing from a high level viewpoint, reviewing the origins, characteristics, models and architecture of cloud computing. Key pieces of literature outlined the foundations of cloud computing in grid and cluster computing, and the importance of autonomic resource control, scalability of infrastructure and virtualization in providing a cost effective and adaptable cloud. The chapter concludes with a review of cloud deployment models and architecture in order to convey their everyday real world use and applications.
3 D A T A C E N T E R S A N D E N E R G Y C O N S U M P T I O N

From 2005 to 2010 the worldwide consumption of energy in data centers increased by 56%. In 2010, data center energy consumption worldwide accounted for 1.3% of all energy consumption. Furthermore, the approximately 6000 data centers present in America in 2006 cost $4.5 billion in energy overheads [41]. These figures highlight the extensive consumption of energy in data centers and the necessity for all stakeholders to actively pursue methods by which to reduce consumption, from both an economic and an environmental viewpoint. This chapter examines the most current and relevant research in relation to energy consumption and preservation techniques deployed within large scale data centers.

3.1 areas of energy consumption

For a number of years, researchers and engineers have focused on improving the performance of data centers, and in doing so have improved systems year on year. However, although the performance per watt has increased, the total energy consumption has remained static and in some cases risen [43]. In order to combat excessive consumption of energy it is important to recognize the disparate elements which consume energy within a data center. Servers naturally consume a large proportion of the overall energy intake; however, the associated infrastructural demands are also a major factor when calculating overall costs. These costs are calculated via the Power Usage Effectiveness (PUE) metric, which is defined as the ratio between the total energy consumed by a data center and the energy consumed by IT equipment such as servers, networking equipment and disk drives. The PUE factor ranges from as high as 2.0 in legacy data centers to as low as 1.2 in recent state of the art facilities [80]. At a PUE rate of 2.0, for every kilowatt utilised by IT components another kilowatt is consumed by infrastructure loads such as cooling, fans, pumps, uninterruptible power supplies (UPS) and power distribution units (PDU). In order to remain within the scope of this research, the author will solely investigate energy usage in relation to IT components.
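To make the PUE relationship concrete, the short Java sketch below computes PUE from two hypothetical meter readings and, in the other direction, estimates the infrastructure overhead implied by a given PUE. The class, method names and figures are illustrative assumptions only and are not taken from the studies cited above.

import java.util.Locale;

/** Minimal sketch of the PUE calculation described above (illustrative figures only). */
public class PueExample {

    // PUE = total facility energy / energy consumed by IT equipment.
    static double pue(double totalFacilityKwh, double itEquipmentKwh) {
        return totalFacilityKwh / itEquipmentKwh;
    }

    // Infrastructure overhead (cooling, fans, pumps, UPS, PDU losses) implied by a PUE value.
    static double overheadKwh(double itEquipmentKwh, double pue) {
        return itEquipmentKwh * (pue - 1.0);
    }

    public static void main(String[] args) {
        double itLoad = 1000.0; // hypothetical IT load in kWh
        System.out.println(String.format(Locale.ROOT, "Legacy facility PUE: %.1f", pue(2000.0, itLoad)));
        System.out.println(String.format(Locale.ROOT, "Overhead at PUE 1.2: %.0f kWh", overheadKwh(itLoad, 1.2)));
    }
}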
3.1.1 Server Consumption

Intel research has shown that the main source of energy consumption in a server remains the CPU; however, it no longer maintains the dominance of energy consumption it once did, due to the implementation of energy efficiency and energy saving techniques such as dynamic voltage and frequency scaling (DVFS) [45]. DVFS is a hardware based solution which dynamically adjusts the voltage and frequency of a CPU in accordance with workload demand. The purpose of applying DVFS is to reduce energy consumption by lowering the voltage and frequency levels; however, this can lead to degradation of execution speeds [46]. DVFS is important for energy management at server level as it allows a CPU to run at levels as low as 30%. However, the CPU is the only server component with the ability to perform such a task; disk drives, dynamic random-access memory (DRAM), fans etc. can only cycle between states of on, off or idle, which results in an idle server consuming in excess of 70% of its overall energy.

3.2 power management techniques

Energy management techniques are incorporated in all aspects of system design. Beloglazov breaks these techniques into two subsections: static power management (SPM) and dynamic power management (DPM). SPM incorporates all design time power management methods, including complex gate and transistor design, energy switching in circuits at a logical level and the incorporation of energy optimization techniques at architecture level [12]. Dynamic power management (DPM) refers to the run-time adaptability of a system in correlation with resource demand. DPM technologies can be further subdivided into two sections: dynamic component deactivation (DCD) and dynamic performance scaling (DPS).
3.2.1 Dynamic Component Deactivation

DCD incorporates the switching of power states for a component that does not incorporate DPS techniques such as DVFS. Switching between power states, i.e. active-idle or idle-off, can result in significant energy consumption at the reinitialisation stage should the component be required at a later stage; therefore it is necessary to ensure DCD occurs only when the energy saved through deactivation is greater than that accrued during reinitialization [12]. Benini et al. state that to apply DCD techniques, a workload must be possible to predict [13]. This prediction, and the accuracy thereof, is imperative to the performance of such techniques. These predictions are based on usage of the overall system to date and possible use in the near future. An example given by Benini et al. is that of a timeout function on a laptop, where a laptop moves from active to idle after a period of time on the presumption that, having been idle for x minutes, it is likely to remain idle for an additional x amount of time [13]. Predictive policies rely on past data and its correlation to future events. Through the analysis of past performance and demands, the system forms both predictive shut down and predictive wake up techniques.

3.2.2 Dynamic Performance Scaling

DPS allows the utilization and application of energy saving techniques in hardware components with the ability to alter their frequency and clock speeds, mainly CPUs, when they are not fully utilized. This technique is known as DVFS. In order to save maximum energy, a system requires both frequency scaling, i.e. the ability to alter the clock speed, and voltage scaling. The implementation of such a technique is by no means straightforward: reducing the instruction processing capability reduces throughput and performance, which in turn increases a program's run-time and may not result in maximum energy savings; therefore it is necessary to balance the energy/performance ratios within a system through careful approximation. In order to optimize the ratio, three common techniques are implemented. Interval based algorithms harness past system usage data and adjust voltage and frequency in line with predicted future use. Intertask algorithms distinguish the number of tasks in relation to the CPU in real-time systems and allocate resources appropriately; this can become complex in a system with unpredictable heterogeneous workloads.
Intratask algorithms look at the data and individual components within a specific program and then provide resources appropriately [12].

3.3 summary

This chapter focused on the area of energy consumption in data centers. Following a brief introductory section highlighting energy consumption on a world scale, Section 3.1 focuses on the specific areas of energy consumption within data centers, including tertiary elements such as PDUs and UPS, and defines the PUE metric. Section 3.1.1 takes a closer look at hardware specific consumption, particularly at server level. The chapter closes by reviewing two key power management techniques, DCD and DVFS.
4 V I R T U A L I Z A T I O N

Although virtualization has seen increased prominence and usage since the early 1990s, it was originally developed as far back as 1964 by IBM as a method to increase the productivity levels of both the hardware and the user. In the 1960s many engineers, scientists and large scale research groups were using programs to carry out research; however, these programs were resource intensive, requiring the full use of the hardware system and the supervision of a researcher to run and record results.

This led to some pioneering work in areas such as the compatible time sharing system (CTSS) at M.I.T. in the early 1960s [23]. CTSS allowed batch jobs to be run in parallel with users' requests to run programs. This in turn led to the creation of the control program and Cambridge monitor system (CP/CMS) in 1964, known as a second generation time sharing machine, built on the concepts of the earlier CTSS. CP provided separate computing environments, while CMS allowed for autonomy through sharing, allocation and protection policies [23], similar to the operations carried out by the virtualization layer in modern cloud environments.
4.1 modern day virtualization

In 1998 VMware conquered the task of virtualizing the x86 platform through a combination of binary translation and direct execution on the processor, allowing multiple guest OSs on a single host [83]. Virtualization is the faithful reproduction of an entire architecture in software which provides the illusion of a real machine to all software running above it [40]. In an era of on-demand computing, the ability to virtualize a single server into multiple instances of virtual machines running separate guest OSs, with secure, reliable access to resources such as I/O devices, memory and storage, has proven imperative to the growth of cloud computing.

Virtualization of a single server into multiple VMs is achieved by placing an extra layer known as a hypervisor directly on top of the hardware and beneath the OS layer. This layer, also known as a virtual machine monitor (VMM), is responsible for providing total mediation between all VMs and the underlying hardware [65]. The VMM allows access to resources held in the infrastructure layer while ensuring isolation of VMs, which improves security levels and reliability. The VMM also spawns new VMs on demand, migrates VMs to existing or new instances when necessary, and applies consolidation techniques by moving VMs from underutilized hosts and powering down these hosts in order to conserve energy.
4.2 levels of virtualization

Virtualization can be applied using three different techniques: full virtualization, paravirtualization and/or hardware assisted virtualization. All three methods must deal, and have dealt, with the need to alter the privilege levels of an architecture, also referred to in the literature as ring aliasing or ring compression, to allow virtualization to take place. For example, in an x86 architecture there are four levels of privilege; the OS takes the lowest level and therefore presumes it has direct access to the host it is placed on. However, by placing a virtualization layer underneath the OS, the levels of privilege are altered.

4.2.1 Full Virtualization

Full virtualization allows for the complete isolation of a guest OS from the underlying infrastructure. This allows an unmodified OS to run using a hypervisor to trap and translate privileged instructions on the fly or through the use of binary translation [20]. Although full virtualization can carry high overheads due to the need to catch and translate privileged instructions, it does provide the most secure and isolated environment for VMs.

4.2.2 Paravirtualization

Paravirtualization continues to employ a hypervisor; however, this method of execution requires the hypervisor to alter the kernel of the guest VM. The hypervisor alters the OS calls and replaces these with hypercalls, allowing for direct communication between the guest VM and the hypervisor without the need to process privileged instructions or to create binary translations, thus decreasing overhead [83]. In doing this, paravirtualization reduces the need for binary translation and therefore significantly simplifies the process of virtualization [96].
4.2.3 Hardware Assisted Virtualization

Hardware assisted virtualization, also referred to in the literature as native virtualization [20], is an alternative method of virtualization that seeks to overcome the limitations of paravirtualization, which requires modification of the guest OS, and the overheads of full virtualization, which occur through binary translation. Both Intel and AMD support hardware assisted virtualization through the Intel VT and AMD-V virtualization extensions [20].

In order to address the problem of virtualization, and in particular the levels of privilege required for systems to run effectively and efficiently, Intel VT-x, which supports IA-32 processor virtualization, introduces two separate forms of operation: the guest runs in VMX non-root operation and the hypervisor runs in VMX root operation, each of which provides four separate levels of privilege. This allows the guest OS to run at its expected ring 0 privilege level and provides the hypervisor with the ability to run multiple privilege levels. In order to run this configuration Intel has applied two extra transitions: from the guest to the hypervisor, known as a VM exit, and from the hypervisor to the guest, known as a VM entry. The VM exit and entry are managed via a virtual machine control structure, which is subdivided into two sections, one dealing with the guest state while the other deals with the host state [78].
4.3 hypervisors

As highlighted in the above literature, in order to implement virtualization there is a need to deploy a hypervisor, commonly referred to as a VMM, on top of the hardware level within the system, also referred to in the literature as the bare-metal level. The hypervisor provides an intermediate layer between VMs and the underlying hardware. This layer allows for the total encapsulation of the VM, which provides stability, security and reliability against bugs or malicious attacks, and the mapping and remapping of existing and new VMs. The subsections below review the most commonly deployed hypervisors in data centers today.

4.3.1 Xen

Xen is an open source project whose hypervisor is currently powering some of the largest cloud deployments today, such as Amazon Web Services, Google and Rackspace services [94]. Originating in the late 1990s in Cambridge University, the XenoServer computing infrastructure project proposed the creation of a simple hypervisor which allows users to run their own OS, with the added capability to run specifically designed applications directly on top of the hypervisor to improve performance and allow a substantial number of disparate guest OSs [32]. In 2002 Xen was released as an open source project; it has since seen four major updates.

Xen put forward a paravirtualized architecture, citing the complexity of full virtualization as a major and unwelcome cost. The Xen team believed that hiding virtualization from the guest OS risked correctness and performance, and that paravirtualization was necessary to obtain high performance, robustness and isolation [5]. In order to do this the hypervisor must cater for all standard application binary interfaces (ABI) and support a full range of OSs.

In 2005 Xen, in conjunction with Cambridge and Copenhagen Universities, introduced the design and implementation of the live migration of VMs. This was a major step forward in hypervisor efficiency. Live migration could be completed with a downtime as low as 60ms; it allowed for the decommissioning of the original VM once the transfer was complete, it allowed for media services etc. to be transferred without the need for users to reconnect, and it allowed for the VM to be transferred as a single unit,
eliminating the need for the hypervisor to have knowledge of individual applications within the VM. This further progressed the maintainability of a data center by further improving the ability to perform dynamic consolidation of VMs [21].

Today, Xen offers a large range of virtualization solutions for multiple architectures, including ARM and x86; it also provides the capability to virtualize a large range of OSs, including Linux, Solaris and Windows, through the use of full hardware assisted virtualization.

4.3.2 KVM

The kernel-based virtual machine (KVM) originated in 2006 as an open source project. KVM requires the Intel VT-x or AMD-V instruction sets to run, both of which were also made available in 2006. A KVM hypervisor allows for up to 16 virtual CPUs running full virtualization methods [95]. KVM leverages the hardware extensions provided by Intel and AMD to add a hypervisor to a Linux environment. Once this hypervisor is added to the environment, it also adds a /dev/kvm device node which allows users to create virtual machines, read and write to virtual CPUs, run a virtual CPU and inject interrupts, and allocate memory via a memory management unit (MMU) for the translation of virtual addresses to physical addresses. This MMU consists of a page table which encodes the mapping of virtual addresses to physical addresses, a notification manager for page faults and a translation lookaside buffer (TLB) and instruction set, all located on the chip to decrease table look-up time [39].

4.3.3 VMware

VMware is a hypervisor which is the result of research carried out at Stanford University [61]. In 1998 VMware built on this research and virtualized the x86 architecture through binary translation and direct processor execution [84]. The implementation of full binary translation allowed VMware to deploy full virtualization of its platform, as well as the ability of its guest VMs to host a range of OSs including Linux and Windows.
Originally VMware offered VMware Workstation, deployed as a hosted architecture which placed a virtualization layer directly as an application on the host OS. In more recent times VMware ESX uses a hypervisor layer placed on bare metal, significantly increasing I/O performance [86].

Similar to Xen 3.0.1 and KVM, it utilises a data structure to track the translation of virtual pages to physical memory pages; shadow pages are kept in sequence with the pmap structure for the processor in order to minimise overheads. VMware DRS monitors VMs within a data center; by leveraging VMotion, which allows for live migration, and VM schedulers, it allocates and reallocates VMs as necessary. VMware HA monitors hosts for failures. It allows for rapid redeployment of VMs from a failed host when necessary, and it also ensures that the required storage to facilitate this redeployment is available at all times within a cluster [86].

4.4 summary

This chapter reviewed the area of virtualization, beginning by looking at the early stages initiated by IBM in the 1960s and continuing through to modern day virtualization. The second half of the chapter reviews the different layers and methods used in implementing virtualization, with the chapter coming to a close by examining the three most commonly deployed hypervisors today.
5 R E I N F O R C E M E N T L E A R N I N G

Reinforcement Learning (RL) dates back to the early days of cybernetics and work in statistics, psychology, neuroscience and computer science [38]. From a purely computer science viewpoint, RL is a type of machine learning, where machine learning is viewed as the ability of computer programs to automatically improve through experience.

RL has been an area of research since the late 1950s, when Samuel first applied temporal difference (TD) methods in order to manage action values. Some years later, in 1961, Minsky is attributed with developing the term RL [71]. However, it was the development of value functions and their mathematical characterization in the form of the Markov decision process (MDP) in the mid 1980s that helped propel its popularity as an artificial intelligence (AI) approach to problem solving [72]. The successful application of RL to disparate tasks, such as Tesauro's TD-Gammon or Barto's work on improving elevator performance through the use of RL and neural networks, has also elevated its appeal to researchers in recent times [74] [9].

This AI approach offers a more flexible approach than many of its counterparts, and this is a key part of what differentiates it from other forms of machine learning, including supervised and unsupervised learning. By this we mean that actions can be low-level non-critical decisions or high-level strategic methods, that the boundaries between an agent and its environment are not rigidly defined and can adapt to suit the given workspace or problem, and that the time steps involved need not be of chronological order; they can be stage or task related to suit the problem domain.
5.1 agent / environment interaction

Within an RL framework the learner is commonly referred to as an agent, with everything outside of the agent referred to as its environment. Through a cyclical process of state-action-reward at discrete time steps, the agent learns an optimum policy.

As an agent progresses through the state space, in the main, its current action affects not only the immediate reward received, but also the probability of maximising future rewards. Therefore an "optimal action" must take into account not only the immediate reward but also the possible future reward in deciding which action to take, commonly referred to as delayed reward. RL delayed reward problems are commonly modelled as MDPs. An MDP is a mathematical structure for the modelling of decisions under uncertainty. An MDP is represented as a 4-tuple $(S, A, T, R)$ [7], where:

$S$ - The state space; in a reinforcement learning framework this is referred to as the environment state.

$A$ - The action space, representative of all possible actions in a given state in a reinforcement learning framework.

$T$ - The transition function, the probability that action $a$ taken in state $s$ will result in state $s'$, defined as:

$P^{a}_{ss'} = \Pr\{\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,\}$ (1)

$R$ - The reward function; given any current state $s$ and action $a$, together with any next state $s'$, the expected value of the next reward is:

$R^{a}_{ss'} = E\{\, r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s' \,\}$ (2)
Therefore, we can view reinforcement learning as the ability to map states to actions in order to maximize a numerical reward. In order to achieve this, a recurring interaction at discrete time steps between the agent and the environment is necessary, as laid out in Fig. 5.1. The agent receives a representation of the environment in the form of a state $s_t$. This allows the agent to select and return an action $a_t$, based on the agent's policy. At the beginning of the next time step the environment returns a new representation of the current state $s_{t+1}$ and a numerical reward $r_{t+1}$ based on the previous action undertaken, $a_t$.

Figure 5.1: The agent-environment interaction in RL [73]
5.2 learning strategies

Traditional learning strategies, commonly referred to as update functions, assign rewards at the end of a task via the relation of actual and predicted outcome; however, such methods have proved to be resource intensive as regards memory, and can also be viewed as a static approach, unsuitable for more transformative problem domains. A more suitable learning strategy, known as temporal difference learning, provides a collection of methods for incremental learning specialized in the area of prediction problems [70]. Temporal difference methods do not require a full model of an environment in order to learn; rather, they update estimates based in part on previously learned estimates [87], without waiting for a final outcome, a process often referred to as bootstrapping [73].

A discount factor $\gamma$, which may range from 0 to 1, determines the importance of future rewards: a factor closer to 0 allows an agent to take a restricted view, considering only short-term rewards, while a value closer to 1 allows the agent to strive towards a greater long term reward. The learning rate $\alpha$ establishes the rate at which new information overrides old. A learning rate of 1 ensures that the most recent information obtained is utilised, while a learning rate of 0 infers no learning will take place.
5.2.1 Q-Learning

Q-learning is a form of model-free TD learning proposed by Watkins [88]. Q-learning learns on an incremental basis, calculating Q-values at each discrete time step as the estimated value of taking action $a$ and thereafter following an optimal policy $\pi$. Q-learning maps these state-action transitions at each non-terminal discrete time step through the following update rule:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \,]$ (3)

A single iteration results in a single Q-value, which combines the current reward with the discounted estimate of the best future value, $\gamma \max_{a} Q(s_{t+1}, a)$, allowing for progression towards an optimal policy $\pi$. The general form of the Q-learning algorithm is as follows:

Q-learning Algorithm
Initialize the Q-map arbitrarily, and the policy $\pi$
Repeat (while $s$ is not terminal):
    Observe $s_t$
    Select $a_t$ using $\pi$
    Execute $a_t$
    Observe $s_{t+1}$, $r_{t+1}$
    Update $Q$:
    $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \,]$
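To ground the update rule above, the following is a minimal, self-contained Java sketch of tabular Q-learning implementing Equation 3. The state and action encodings, parameter values and class name are illustrative assumptions and are not part of the selection algorithm developed later in this thesis.

import java.util.Random;

/** Minimal tabular Q-learning sketch (illustrative; states and actions are plain indices). */
public class QLearningSketch {
    private final double[][] q;          // Q(s, a) table, initialised arbitrarily (here, to zero)
    private final double alpha = 0.1;    // learning rate
    private final double gamma = 0.9;    // discount factor
    private final double epsilon = 0.1;  // exploration rate for an epsilon-greedy policy
    private final Random rng = new Random();

    public QLearningSketch(int numStates, int numActions) {
        q = new double[numStates][numActions];
    }

    /** Epsilon-greedy action selection over the current Q estimates. */
    public int selectAction(int state) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(q[state].length);        // explore
        }
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) best = a; // exploit the current best estimate
        }
        return best;
    }

    /** Equation 3: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    public void update(int s, int a, double reward, int sNext) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (double v : q[sNext]) maxNext = Math.max(maxNext, v);
        q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
    }
}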
5.2.2 SARSA

The modified connectionist Q-learning algorithm, more commonly known as SARSA, was introduced by Rummery & Niranjan [66]. They question whether the use of $\gamma \max_{a} Q(s_{t+1}, a)$ provides an accurate estimate of a given state, particularly in large scale real world applications, and believe that for optimal performance $\gamma$ must return to 0 for each non policy derived action. To counteract this they proposed the following update function, now known as SARSA:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \,]$ (4)

Rather than utilising $\gamma \max_{a} Q(s_{t+1}, a)$, they use the second state-action transition $Q(s_{t+1}, a_{t+1})$ for the calculation of a given Q-value, thus negating the need to return $\gamma$ to 0 for non policy derived actions. The name SARSA reflects the fact that a quintuple of events $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$ is required in order to calculate its Q-values. SARSA is viewed as an on-policy method, as it takes into account the control policy by which the agent is moving and incorporates that into its update of action values; in comparison, Q-learning is viewed as an off-policy method, as it simply assumes that an optimal policy is being followed. The general form of the SARSA control algorithm is as follows:

SARSA Algorithm
Initialize the Q-map arbitrarily, and the policy $\pi$
Repeat (while $s$ is not terminal):
    Observe $s_t$
    Select $a_t$ using $\pi$
    Execute $a_t$
    Observe $s_{t+1}$, $r_{t+1}$
    Select $a_{t+1}$ using $\pi$
    Update $Q$:
    $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\, r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \,]$
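For contrast with the Q-learning sketch above, a SARSA update differs only in using the action actually chosen in the next state rather than the greedy maximum. The method below is a sketch in the same illustrative style, with state, action and reward encodings assumed rather than prescribed.

/**
 * SARSA update (Equation 4), shown as a drop-in alternative to the Q-learning update above.
 * The caller selects aNext with the same policy it is following (on-policy learning).
 */
public void sarsaUpdate(double[][] q, int s, int a, double reward, int sNext, int aNext,
                        double alpha, double gamma) {
    q[s][a] += alpha * (reward + gamma * q[sNext][aNext] - q[s][a]);
}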
5.3 action selection policy

One element of RL not shared with other machine learning techniques is that of exploration vs. exploitation. In order to learn a truly optimal policy an agent must explore all possible states and experience taking all possible actions, while on the other hand, in order to exploit an optimal policy and its associated rewards an agent must follow the optimal policy. Commonly referred to as the dilemma of exploration and exploitation [73], it can have a great impact on an agent's ability to learn [76]. An agent that always exploits the best action of any given state, predefined in a state model, is said to be following a greedy selection policy; however, such an implementation never explores, thus paying no regard to possible alternative, more lucrative actions.

5.3.1 ε-Greedy

An alternative selection policy is known as ε-greedy. This method introduces a parameter epsilon, which controls the rate of exploration. Epsilon is set at a desired probability and at each time step is compared to a random number; should the random number fall below epsilon, a random action is chosen, therefore providing an element of exploration. As an agent converges closer to an optimum policy, epsilon may be reduced to reflect the lowered need for exploration.

5.3.2 Softmax

ε-greedy remains a popular method for providing an exploration allowance; however, a drawback is the equal probability of choosing the worst or the best action when exploring. An alternative which goes some way to addressing this issue is known as the Softmax action selection policy. When used in an RL paradigm, an action's probability of selection is a function of its estimated value, increasing the probability of the higher value action being chosen [90]. Softmax action probabilities are commonly obtained via the Gibbs (Boltzmann) distribution; however, estimates can be calculated in many different ways, often dependent on the underlying schema of the system in which an agent is deployed. Similarly, the benefit of Softmax over ε-greedy is undefined, as it too largely depends on the environment in which they are applied [73].
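The sketch below contrasts the two selection rules just described. It assumes a simple array of Q-value estimates for a single state; the temperature parameter for the softmax (Boltzmann) rule and the class name are illustrative choices.

import java.util.Random;

/** Illustrative epsilon-greedy and softmax (Boltzmann) action selection over one state's Q-values. */
public class ActionSelection {
    private static final Random RNG = new Random();

    /** With probability epsilon pick a random action, otherwise the current best estimate. */
    public static int epsilonGreedy(double[] qValues, double epsilon) {
        if (RNG.nextDouble() < epsilon) {
            return RNG.nextInt(qValues.length);
        }
        int best = 0;
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) best = a;
        }
        return best;
    }

    /** Sample an action with probability proportional to exp(Q(a) / temperature). */
    public static int softmax(double[] qValues, double temperature) {
        double[] weights = new double[qValues.length];
        double sum = 0.0;
        for (int a = 0; a < qValues.length; a++) {
            weights[a] = Math.exp(qValues[a] / temperature);
            sum += weights[a];
        }
        double r = RNG.nextDouble() * sum;
        for (int a = 0; a < qValues.length; a++) {
            r -= weights[a];
            if (r <= 0) return a;
        }
        return qValues.length - 1; // numerical fall-through
    }
}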
5.4 reward shaping

One of the main limitations of RL is the slowness of convergence to an optimum policy [52]. In an RL framework, value functions, otherwise referred to as Q-values, are traditionally initialised with either pessimistic, optimistic or random values [24]. These methods tend to overlook the fact that in real world applications a developer may hold key domain expert knowledge that, if incorporated, can help an agent based system converge to a level of optimum performance at a much quicker rate. The leveraging of such knowledge is known as knowledge based reinforcement learning. One such approach is known as reward shaping; this is the introduction of a domain expert designed reward in addition to the natural system reward. Due to the intrinsic relationship of rewards, states and actions, the accurate shaping of rewards is vital to the overall effectiveness of an agent. Poorly designed reward shaping can not only delay convergence to an optimal policy, but can in fact be detrimental to learning, as seen by Randlov and Alstrom, where an agent learning to ride a bike actively pursued a path away from the goal because the cumulative reward for correcting the orientation was greater than that for reaching the goal [64].

5.4.1 Potential Based Reward Shaping

Ng et al. [60] introduce potential based reward shaping (PBRS) in order to optimize the method of shaping rewards and in turn prevent the problems highlighted by Randlov and Alstrom's study [64]. The potential based reward is calculated as the difference in potential between the current state $s$ and the next state $s'$, and is formally defined as:

$F(s, s') = \gamma\,\phi(s') - \phi(s)$ (5)
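A minimal sketch of how a potential based shaping term can be folded into the Q-learning update from Section 5.2.1 is shown below. The potential function phi is an assumed, problem-specific heuristic supplied by the designer, not something prescribed by Ng et al.'s result, and the method name is hypothetical.

/**
 * Q-learning update augmented with a potential based shaping reward (Equation 5):
 * the agent receives r + F(s, s'), where F(s, s') = gamma * phi(s') - phi(s).
 * The potential array phi encodes assumed designer knowledge about how promising each state is.
 */
public void shapedUpdate(double[][] q, double[] phi, int s, int a, double reward, int sNext,
                         double alpha, double gamma) {
    double shaping = gamma * phi[sNext] - phi[s];   // F(s, s')
    double maxNext = Double.NEGATIVE_INFINITY;
    for (double v : q[sNext]) maxNext = Math.max(maxNext, v);
    q[s][a] += alpha * (reward + shaping + gamma * maxNext - q[s][a]);
}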
Research has proven that applying PBRS in both finite and infinite state spaces with a single RL based agent does not alter the optimal policy of the agent, but does decrease the convergence time significantly. This can best be seen in Fig. 5.2, taken from Wiewiora et al.; the diagram illustrates the convergence of a PBRS based agent against that of a non-PBRS based agent on a well known RL problem known as mountain car [92]. It is clearly visible that the PBRS based agent begins much closer to the optimal policy, greatly outperforming the standard agent based model.

Figure 5.2: PBRS effect [92]

5.5 summary

This chapter focuses on the AI approach known as reinforcement learning. Section 5.1 looks at the agent-environment interaction and the modelling of RL delayed rewards as MDPs. In Section 5.2, temporal difference learning strategies including Q-learning and SARSA are explored in detail. This is followed by the explanation and evaluation of the action selection policies ε-greedy and softmax. The chapter concludes by surveying the advanced RL technique known as PBRS.
6 R E L A T E D R E S E A R C H

Cloud computing leverages the ability to virtualize the key constituents which form the lowest level of the cloud architecture, known as IaaS, principally large scale data centers. The virtualization of large scale congregations of nodes, typical of those found in modern day data centers, into multiple virtual independent machines executing on a single node not only allows for the elasticity of services, but plays a key role in the high level adherence to SLAs and the maximum utilization of resources which underpin the foundations of cloud computing, while providing maximum return on investment for service providers. However, due to the dynamic nature of cloud services and their demands, the static or offline management of these VMs places significant restrictions on all of these key principles.

In recent years much research has been undertaken focusing on combining the areas of energy efficiency and dynamic resource selection and allocation policies. This research can be categorized into the following three sections.

• Threshold and Non-Threshold Approach
• Artificial Intelligence Based Approach
• Reinforcement Learning Approach
6.1 threshold and non-threshold approach

The main area of research concentrates at machine or host level. Nathuji and Schwan proposed a VirtualPower architecture which implements the Xen hypervisor with minimal alterations to the hypervisor [59]. Each host contains a local power management module (PM-L) residing locally as a controller in the driver domain, known as a Dom0 module. When a guest OS attempts to make power management decisions, these calls are trapped by the hypervisor due to their privilege levels; the VirtualPower package then passes these trapped calls to the PM-L, where decisions on power management can be made based on VirtualPower management rules contained in the Dom0 controller. However, while this research addresses local policies, it fails to address global policies for their suggested global power management (PM-G) module.

Kusic et al. introduced a proactive look-ahead control algorithm [44]. The algorithm, known as LLC, proposes to minimize CPU power usage and SLA violations while maximising providers' profits. It proposes the use of a quadratic estimation algorithm, the Kalman filter, to estimate workload arrivals and supply VMs accordingly. This approach requires a complex learning based structure in order to predict incomes, which in turn increases computational overhead. The research conclusions highlight this complexity as a serious issue, especially when dealing with discrete input values with exponential increases in worst case complexity, where the increase in control options accrues a large increase in the computational time required by the LLC controller: a data center with 15 hosts requires 30 minutes of execution time, which would be unrealistic for implementation in large scale data centers.

Cardosa et al. proposed leveraging existing parameters within the Xen and VMware packages to alter the method in which VMs contend for power, regardless of workload priority [19]. Parameters provided by the Xen and VMware hypervisors are used as follows: the min parameter allows for the allocation of the minimum amount of resources provided to any given VM, the max parameter allows the maximum resources applied to any given VM to be set, while the shares parameter allows a developer to set the ratio of CPU allocation between high and low priority VMs. By allocating high levels of minimum resources to high priority VMs and limiting the allocation to low priority VMs, they hope to improve overall performance. Using VMware ESX servers the authors carried out their experiments; however, the min, max and shares thresholds were designated prior to run-time, i.e. statically, with no
alternative for dynamic adjustment during run-time, thus limiting the suitability of the research for implementation in a real world cloud data center due to the heterogeneous nature of evolving applications. The research assumes that pre-existing maps of SLA agreements exist and uses these as input parameters, but fails to outline the number of SLA violations that result from applying the approach they outline.

Verma et al. implemented a power aware application placement framework called pMapper, designed to utilize power management applications such as CPU idling, DVFS and consolidation techniques that already exist in hypervisors such as Xen [81]. These techniques are leveraged via separate modules: the performance manager, which has a global overview of the system and receives information such as SLA and QOS parameters; the migration manager, which deals directly with the VMs to implement live migration; the power manager, which communicates with the infrastructure layer to manage hardware energy policies; and finally the arbitrator, which decides on the information supplied from the above mentioned policies for the optimal placement of VMs by utilising a bin packing algorithm. At the implementation stage pMapper was utilised to solve a cost minimization problem which considers power-migration cost and, similar to Cardosa et al., fails to address SLA violations [19].

In additional research, Verma et al. suggest that server consolidation can be viewed in three forms [82]. The first is static, where VMs or applications are placed on servers for an extended period such as months or years; the second is semi-static, for daily and weekly usage; and the third is dynamic, for VMs and applications with execution times ranging from minutes to hours. The authors highlight that tools currently exist to manage such structures, but are rarely used, and administrators often prefer to wait for offline migration to decide on placements. Although the paper highlights three forms of consolidation, it deals only with static and semi-static and, much like Cardosa et al.'s research, this limits the suitability of the research for implementation in a real world cloud data center due to the heterogeneous nature of evolving applications.

Jung et al. propose a hybrid system of on-line/offline collaboration, analysing data based on system behaviour and workloads on-line to feed a decision tree structure offline [36]. This approach allows for the modelling of large scale, complex configuration problems and reduces overheads by removing the decision model from the run-time environment. These models are used as the basis for creating on-line multi-tier queues in an attempt to
reach peak utilization. This research was furthered in 2009, when Jung et al. created a middleware for cost sensitive adaptation and server consolidation, utilising the multi-tier queues developed in their earlier study and applying a best first search graph algorithm, with cost-to-go as the transition costs, and a layered queuing network solver (LQNS) predictive modelling package [37]. However, this was modelled solely on a single web application and, similar to Cardosa and Verma, this limits the suitability of the research for implementation in a real world cloud data center.

Threshold based approaches for autonomic scaling of resources are more commonplace, with cloud providers such as Amazon EC2, through their Auto Scaling software, and RightScale implementing such policies. Threshold based approaches are based on the premise of setting an upper and a lower bound threshold that, when broken, trigger the allocation or consolidation of resources as necessary.

Research carried out in the area of threshold based approaches includes a proposed architecture known as "the 1000 islands solution architecture" by Zhu et al. [99]. Similar to Verma, they consider three separate application categories based on time periods and then assign an individual controller to each category. The largest timescale is hours to days, the second is minutes and the third is seconds. Each group is regarded as a pod and has a node controller managing dynamic allocation of the node's resources; as part of the node controller lies a utilization controller, which computes resource consumption and estimates the future consumption required in order to meet the SLA. This information is passed to a global arbitrator module which decides the overall allocation of resources. The arbitrator module associates individual workloads with priority levels in order to schedule work appropriately, with high priority work getting first allocation of resources. The pod controller monitors node utilization levels, setting 85% CPU utilisation as an upper threshold and 50% as a lower threshold; using this information it then migrates VMs as necessary. The pod set controller studies historic demands and estimates future demands using an optimization heuristic approach to formulate policies. Although the results of the experiments in this research are positive, the authors highlight the need to scale up the size of the test bed to realistically evaluate its strength in a real world application; to the best of our knowledge this has not yet been achieved.
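As a simple illustration of the threshold based premise described above, the sketch below checks a host's CPU utilisation against an upper and a lower bound and reports which action a controller would trigger. The 85% and 50% bounds follow the Zhu et al. example, while the class, enum and method names are hypothetical.

/** Illustrative threshold based trigger in the style described above (bounds follow the Zhu et al. example). */
public class ThresholdTrigger {

    public enum Action { MIGRATE_VMS_AWAY, CONSOLIDATE_AND_SLEEP, NO_ACTION }

    private final double upperBound; // e.g. 0.85 - host considered overloaded above this
    private final double lowerBound; // e.g. 0.50 - host considered underutilised below this

    public ThresholdTrigger(double upperBound, double lowerBound) {
        this.upperBound = upperBound;
        this.lowerBound = lowerBound;
    }

    /** Decide what a node controller would do for a given CPU utilisation in [0, 1]. */
    public Action decide(double cpuUtilisation) {
        if (cpuUtilisation > upperBound) return Action.MIGRATE_VMS_AWAY;      // relieve the overloaded host
        if (cpuUtilisation < lowerBound) return Action.CONSOLIDATE_AND_SLEEP; // empty and power down the host
        return Action.NO_ACTION;
    }
}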
6.2 artificial intelligence based approach

John McCarthy defines AI as the science and engineering of making intelligent machines, especially intelligent computer programs [54]. This ability to make intelligent computer programs forms the basis for the following concepts, in which researchers apply a range of AI approaches as tools for the optimization of resource allocation in a cloud environment.

One such example is that of Hu et al., who consider a genetic algorithm (GA) approach to the scheduling of resources, in particular VMs [34]. Utilizing a GA in conjunction with historic performance data, Hu attempts to predict the effect of multiple possible schedules in advance of any deployment in order to apply the best load balance.

Wei et al. deploy a similar approach to resource optimization through a game theoretic approach, scheduling resources through a two step cost-time optimization algorithm [89]. Each agent solves its problem optimally, independent of the others; at this stage an evolutionary optimization algorithm takes this information, collates the data, estimates an approximate optimal solution and donates resources as necessary.

Particle swarm optimization, a concept first introduced by Kennedy [63], was deployed by Pandey in 2010 to optimise the mapping of workflows to resources in a cloud environment [62]. Each particle represents a mapping of resources to tasks in a five dimensional space, i.e. each particle has five jobs; these particles are released into the search space mapping their best locations, in this case the best task to resource allocation, in order to determine the optimal combined workflow.

Moghaddam et al. in 2014 introduce the concept of a multi-level grouping genetic algorithm (MLGGA) [58]. The researchers highlight the fact that the problem of optimal VM placement is NP-hard and can be viewed as a bin packing problem. Due to its bin packing nature, they use a grouping genetic algorithm (GGA) as their base algorithm and attempt to introduce a multi-level grouping concept to optimize the placement and grouping of VMs and in turn reduce the carbon footprint. While the researchers' experiments are both substantial and strenuous, proving a lowering of the carbon footprint, the research fails to address some of the key aspects of VM placement in data centers,
such as quality of experience, security, QOS and SLAs.

Sookhtsaraei et al., similar to Moghaddam, introduce a genetic algorithm solution as an approach to optimizing bin packing for VMs [69]. Using GGA as a base for their algorithm, they create an algorithm called CMPGGA; the CMPGGA algorithm considers bandwidth, CPU and memory along with hosts and VMs as input parameters, with an output of an optimized mapping of VMs to hosts. While CMPGGA can claim an improvement in reducing operational costs, the research fails to address QOS or SLA violations. Without considering these violations, which can result in monetary penalties for the service providers, it is impossible to fully calculate operational improvements.
6.3 reinforcement learning based approach

A more recent approach is the application of RL agents to optimize resource management in the cloud. Barrett et al. propose a parallel RL framework for the optimisation of scaling resources in lieu of the threshold based approach [7]. The approach requires agents to approximate optimal policies and share their experiences with a global agent to improve overall performance, and has proven to perform exceptionally well despite the removal of traditional rigid thresholds.

Bahati proposes incorporating RL in order to manage the existing threshold based rules [4]. A primary controller applies these rules to a system in order to enforce its quality attributes. A secondary controller monitors the effects of implementing these rules and adapts the thresholds accordingly.

Another approach, adopted by Tesauro, introduces a hybrid RL approach to optimising server allocation in data centers through the training of a nonlinear function approximator in batch mode on a data set, while an externally trained policy makes management decisions within a given system [75].

Finally, Farahnakian et al. and Yuan et al. present dynamic RL techniques to optimize the number of active hosts in operation in a given time-frame [97] [27]. An RL agent learns an online host energy detection policy and dynamically consolidates machines in line with optimal requirements. After detection of over utilized hosts, both studies employ Beloglazov's minimum migration time selection policy in order to identify VMs for migration [11].

All of the above RL approaches have proven a statistical advantage over threshold based approaches, and this forms the motivation for this research: to implement and evaluate RL at a lower level of abstraction, as a policy for the selection of VMs.
6.4 virtual machine selection policies

Beloglazov et al.'s study carried out in 2011 remains one of the most highly cited and accepted pieces of research in relation to the consolidation of VMs while maximizing performance and efficiency in cloud data centers [11]. Beloglazov examines the dynamic consolidation of VMs while considering multiple hosts and VMs in an IaaS environment. Unlike numerous other research papers, Beloglazov models SLAs as a key component in a solution to VM consolidation. Beloglazov's proposed algorithm can be broken into three sections: overloading/underloading detection, VM selection and VM placement.

Overload detection: Building on past research, Beloglazov suggests an adaptive selection policy known as Local Regression (LR) for determining when VMs require migration from a host in order not to violate SLAs [10]. Local regression, first proposed by Cleveland, allows for the analysis of a local subset of data, in this case hosts [22]. Given an over utilization threshold along with a safety parameter, LR decides that a host is likely to become over utilised if its current CPU utilization multiplied by the safety parameter is larger than the maximum possible utilization.

VM selection: Virtual machines $v$ are placed on a migration list based on the shortest period of time to complete the migration; the minimum time is considered as the utilized RAM divided by the spare bandwidth of the host $h$. The policy chooses the appropriate VM $v$ through the following equation, where $RAM_u(a)$ is the amount of RAM currently utilized by the VM $a$, and $NET_h$ is the spare network bandwidth available on host $h$:

$v \in V_h \;\big\vert\; \forall a \in V_h,\ \dfrac{RAM_u(v)}{NET_h} \le \dfrac{RAM_u(a)}{NET_h}$ (6)

Beloglazov's research proves that the dynamic VM consolidation algorithm Lr-Mmt significantly outperforms static policies such as DVFS or non power aware approaches. It also outperforms the following dynamic policies.
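The minimum migration time rule in Equation 6 amounts to picking, from the migratable VMs on an overloaded host, the one whose RAM-to-spare-bandwidth ratio is smallest. The sketch below expresses that rule over a plain list of VM descriptors; the Vm record, its fields and the class name are illustrative stand-ins, not CloudSim classes or the policy proposed in this thesis.

import java.util.List;

/** Illustrative minimum migration time (MMT) selection over a host's migratable VMs (Equation 6). */
public class MinimumMigrationTimeSelection {

    /** Simplified VM descriptor: currently utilised RAM in MB (a stand-in, not a CloudSim class). */
    public record Vm(String id, double ramUtilisedMb) {}

    /**
     * Return the VM with the shortest estimated migration time, i.e. the smallest
     * RAM_u(v) / NET_h ratio, where NET_h is the host's spare bandwidth in MB/s.
     */
    public static Vm selectVmToMigrate(List<Vm> migratableVms, double spareBandwidthMbps) {
        Vm best = null;
        double bestTime = Double.MAX_VALUE;
        for (Vm vm : migratableVms) {
            double migrationTime = vm.ramUtilisedMb() / spareBandwidthMbps;
            if (migrationTime < bestTime) {
                bestTime = migrationTime;
                best = vm;
            }
        }
        return best; // null if the host has no migratable VMs
    }
}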
6.4.1 Maximum Correlation

The maximum correlation policy is based on the premise that the stronger the inter-relationship of applications running on an over utilized server, the higher the probability the server will overload, as highlighted by Verma et al. [81]. The maximum correlation policy finds a VM $v$ that satisfies the following condition, where $R^2$ denotes the multiple correlation coefficient of a VM's CPU utilization with that of the other VMs on the host:

$v \in V_h \;\big\vert\; \forall a \in V_h,\ R^{2}_{x_v}(x_1, \ldots, x_{v-1}, x_{v+1}, \ldots, x_n) \ge R^{2}_{x_a}(x_1, \ldots, x_{a-1}, x_{a+1}, \ldots, x_n)$ (7)

6.4.2 Minimum Utilization Policy

The minimum utilization policy is a simple method to select VMs from overloaded hosts. The policy chooses the VM with the minimum utilization on the host, calculated in millions of instructions per second (MIPS). The policy is repeated until the host is no longer considered to be overloaded [79].

6.4.3 The Random Selection Policy

The random selection policy is another simple method to select VMs from overloaded hosts. The policy chooses a VM randomly to migrate. The policy is repeated until the host is no longer considered to be overloaded [79].
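The two simple baselines just described translate almost directly into code. The sketch below uses the same illustrative VM descriptor style as the MMT example above, not CloudSim's own classes, and in practice either method would be re-invoked until the host is no longer considered overloaded.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;

/** Illustrative sketches of the minimum utilization and random selection policies described above. */
public class SimpleSelectionPolicies {

    /** Simplified VM descriptor (a stand-in, not a CloudSim class). */
    public record Vm(String id, double cpuUtilisationMips) {}

    private static final Random RNG = new Random();

    /** Minimum utilization: migrate the VM currently using the fewest MIPS. */
    public static Optional<Vm> minimumUtilization(List<Vm> migratableVms) {
        return migratableVms.stream().min(Comparator.comparingDouble(Vm::cpuUtilisationMips));
    }

    /** Random selection: migrate a uniformly random VM. */
    public static Optional<Vm> randomSelection(List<Vm> migratableVms) {
        if (migratableVms.isEmpty()) return Optional.empty();
        return Optional.of(migratableVms.get(RNG.nextInt(migratableVms.size())));
    }
}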
6.5 research group context

This research was undertaken as part of a wider research group led by Dr. Enda Howley. The research group is known for research in the areas of multi-agent systems, cloud, swarm, smart cities, social network analysis & simulation and data analytics. The following section reviews just a subset of the research carried out by past and present members of the group in the areas of cloud and RL.

Barrett et al. present a novel approach to workflow scheduling in a cloud environment [6]. A workflow architecture estimates the average execution time and cost of a task, which are passed to multiple solver agents, which, through the use of GAs, produce various possible schedules. An MDP agent takes these possibilities and develops an optimal schedule for the workflow execution. Results show that the MDP agent can optimally choose a schedule despite an environment having varying loads and data sizes.

Further work by Barrett et al. includes the automation of resource allocation in the cloud through the use of an RL multi-agent approach [7]. Each agent addresses incoming workloads; on the basis of these requests an agent must approximate an optimal policy for resource allocation, and each agent shares this information with the others before finally forwarding optimal scheduling policies to an instance manager which allocates VMs based on the advice. Results show that by parallelising the scheduling process the time taken to converge is reduced greatly and the framework can effectively select VMs of varying types for the required workload.

Mannion et al. present a parallel learning RL algorithm which utilizes heterogeneous agents [51]. Each of these heterogeneous agents learns in parallel on a partitioned subset of the overall problem. The knowledge/experience of these agents is then made available to a master agent, where the values are used for Q-value initialisation. This parallel approach has proven to outperform the standard Q-learning approach, resulting in increased learning speed and a lower step to goal ratio.

This work is advanced further where Mannion et al. introduce this parallel learning of partitioned action spaces to a smart city environment and traffic signal control [49]. Results show significant improvement with the use of action space partitioning compared to a standard RL approach. Mannion also investigates the area of potential based reward
Mannion also investigates the area of potential-based reward systems to improve performance in the learning of traffic signal control [50]. Comparing a potential-based reward agent with a standard agent, Mannion shows that not only does learning speed increase, but queue and delay times are also reduced.

6.6 summary

This chapter reviews the key literature in the area of resource allocation, selection and scheduling in a cloud environment. Section 6.1 explores the traditional static, threshold and non-threshold approaches to resource management. Section 6.2 progresses to analyse more dynamic approaches to resource management through the application of various A.I. techniques ranging from GAs and PSO to game theory. Section 6.3 focuses on RL as a specific method of resource scheduling, with work from Barrett and Bhatti providing key examples of resource scheduling via RL. Section 6.4 reviews pertinent literature from Beloglazov in the area of VM selection algorithms, including minimum migration time and maximum correlation. The chapter concludes by highlighting the role of this research in relation to the wider research group.
Part III

M E T H O D O L O G Y
7 C L O U D S I M

7.1 overview

The CloudSim toolkit was chosen as an appropriate simulation platform as it allows for the modelling of a virtualised IaaS environment and is the basis of much leading research into cloud computing capabilities, particularly energy conservation and resource allocation [47] [91] [68] [16]. The CloudSim framework is a Java based simulator developed by the CLOUDS Laboratory, University of Melbourne. It allows for the representation of an energy-aware data center with LAN-migration capabilities. In keeping with industry standards, 300 second / 5 minute intervals are used to establish whether a host is over-utilised and requires the migration of VMs. The default ceiling threshold for utilisation is 100% with an added safety parameter of 1.2. This safety parameter acts as an over-utilisation buffer: for example, a host determined to be 85% utilised is multiplied by the safety parameter 1.2, resulting in a utilisation of 102%, and is therefore deemed over-utilised.
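As a small illustration of the buffer described above, the following Java sketch applies the check exactly as worked through in the example. It mirrors the description rather than CloudSim's actual Local Regression detector, and the method and parameter names are chosen here purely for illustration.

    // Illustrative check only: mirrors the description above, not CloudSim's LR detector.
    public class OverUtilisationCheck {

        /** Returns true when utilisation scaled by the safety parameter exceeds 100%. */
        static boolean isOverUtilised(double cpuUtilisation, double safetyParameter) {
            return cpuUtilisation * safetyParameter > 1.0;
        }

        public static void main(String[] args) {
            double utilisation = 0.85;      // host is 85% utilised
            double safetyParameter = 1.2;   // the default buffer described above
            // 0.85 * 1.2 = 1.02 -> 102%, so the host is treated as over-utilised
            System.out.println(isOverUtilised(utilisation, safetyParameter));
        }
    }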
7.2 cloudsim components

CloudSim is an event driven application written in the Java programming language, containing over 200 classes and interfaces for the complete simulation of a cloud environment. The following section highlights the main and most important classes as described by Buyya et al. and Calheiros et al. [16] [18]. Figure 7.1 contains a CloudSim class design diagram.

CloudInformationServices: The CloudInformationServices (CIS) class represents an entity which provides the registration, indexing and modelling services for a data center created within a simulation. A host from a data center registers its details with the CIS, which in turn shares these details with the data center broker class, which can then directly provide workloads to a host.

DataCenter: This class, which extends SimEntity, instantiates a data center and assigns a set of allocation policies for bandwidth (BW), memory and storage, and deals with the handling of VMs. This class is extended within the CloudSim framework as PowerDatacenter and NetworkDatacenter to allow for customised research, such as power reduction or network related research.

DataCenterCharacteristics: The DataCenterCharacteristics class allocates the static properties of a data center such as OS, management policy, time and costs.

Host: The Host class represents a physical resource, such as a server, which hosts VMs. The class contains the internal policies for BW, processing power and memory for a single instance of a host.

Vm: The Vm class represents a VM which is contained within a host. The class allows for the processing of cloudlets submitted from the DataCenterBroker class in accordance with its ability, defined by its memory, processing power, storage size and VM provisioning policy. Similar to Datacenter, it is extended within the CloudSim framework as PowerVm or NetworkVm to allow for customised research such as power reduction or network related research.
Cloudlet: The Cloudlet class allows for the instantiation of a Cloudlet object, tracks Cloudlet movement and allows for the cancellation, pausing or removal of a cloudlet from the CloudletList(). A Cloudlet in CloudSim represents a workload assigned to a VM.

DataCenterBroker: This class represents a broker acting on behalf of a client/user. The broker queries the CIS and retrieves the host list containing information on available VMs and their respective specifications, allowing the broker to directly assign cloudlets to VMs with the necessary capability to achieve the customer's QOS demands.

SimEntity: The SimEntity class is an abstract class which, when extended, represents a single simulation entity. The startEntity() method is invoked to begin a simulation; once started, the processEvent() method is called repeatedly to process all events held in the deferredQue(). Finally, the shutdown() method is invoked just prior to the termination of a simulation, which allows for events such as printing to a log file. All simulations must invoke the SimEntity class.

RamProvisioner: This is an abstract class which provides the necessary methods for RAM provisioning policies for VMs inside a host. It must be extended by researchers to configure custom RAM policies, otherwise CloudSim will implement the RamProvisionerSimple class as default.

BwProvisioner: The BwProvisioner class is an abstract class which provides the basic methods necessary to allocate a bandwidth allocation policy. It must be extended by researchers to configure custom BW policies, otherwise CloudSim will implement the BwProvisionerSimple class as default.
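To make the roles of these classes concrete, the sketch below wires a minimal simulation together: one datacenter with a single host, a broker, one VM and one cloudlet. It is a rough sketch assuming the CloudSim 3.x API; the constructor arguments (MIPS, RAM, costs and so on) are arbitrary illustrative values, and exact signatures may differ between releases.

    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.LinkedList;
    import java.util.List;

    import org.cloudbus.cloudsim.*;
    import org.cloudbus.cloudsim.core.CloudSim;
    import org.cloudbus.cloudsim.provisioners.BwProvisionerSimple;
    import org.cloudbus.cloudsim.provisioners.PeProvisionerSimple;
    import org.cloudbus.cloudsim.provisioners.RamProvisionerSimple;

    public class MinimalCloudSimExample {
        public static void main(String[] args) throws Exception {
            // 1. Initialise the simulation library before creating any entity.
            CloudSim.init(1, Calendar.getInstance(), false);

            // 2. One host with a single processing element (PE).
            List<Pe> peList = new ArrayList<>();
            peList.add(new Pe(0, new PeProvisionerSimple(1000)));            // 1000 MIPS
            Host host = new Host(0, new RamProvisionerSimple(2048),
                    new BwProvisionerSimple(10000), 1000000, peList,
                    new VmSchedulerTimeShared(peList));
            List<Host> hostList = new ArrayList<>();
            hostList.add(host);

            // 3. Datacenter characteristics and the datacenter itself.
            DatacenterCharacteristics characteristics = new DatacenterCharacteristics(
                    "x86", "Linux", "Xen", hostList, 10.0, 3.0, 0.05, 0.001, 0.0);
            new Datacenter("Datacenter_0", characteristics,
                    new VmAllocationPolicySimple(hostList), new LinkedList<Storage>(), 0);

            // 4. Broker, one VM and one cloudlet.
            DatacenterBroker broker = new DatacenterBroker("Broker_0");
            Vm vm = new Vm(0, broker.getId(), 500, 1, 512, 1000, 10000, "Xen",
                    new CloudletSchedulerTimeShared());
            UtilizationModel full = new UtilizationModelFull();
            Cloudlet cloudlet = new Cloudlet(0, 400000, 1, 300, 300, full, full, full);
            cloudlet.setUserId(broker.getId());

            broker.submitVmList(List.of(vm));
            broker.submitCloudletList(List.of(cloudlet));

            // 5. Run the simulation and print the completed cloudlets.
            CloudSim.startSimulation();
            CloudSim.stopSimulation();
            for (Cloudlet c : broker.getCloudletReceivedList()) {
                System.out.println("Cloudlet " + c.getCloudletId() + " finished on VM " + c.getVmId());
            }
        }
    }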
7.3 energy aware simulations

7.3.1 Initialising an Energy Aware Policy

Initialising an energy aware policy is possible by accessing the org.cloudbus.cloudsim.power.planetlab package located in the examples folder. This package contains an array of power aware simulations including Lr-Mmt, Lr-Mc and Lr-Mu. In order to create a new policy one must locate the main class within this package, from which CloudSim instantiates a new PlanetLab runner, providing it with the necessary information.

7.3.2 Creating a Selection Policy

The creation of a new selection policy is possible by accessing the org.cloudbus.cloudsim.power package located in the source folder; this package contains all allocation and selection policies for VMs. It also contains the PowerVmAllocationPolicyMigrationAbstract class, which invokes the method getVmsToMigrateFromHosts(). This method calls for the selection of a VM from an overloaded host. It is from this point that the selection policy instantiated by the user is invoked, and this is the key point of interaction between new or existing selection policies and CloudSim.
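A new selection policy is typically written by extending CloudSim's abstract PowerVmSelectionPolicy class and overriding getVmToMigrate(). The sketch below is a minimal, hypothetical example that simply migrates the VM with the lowest current CPU demand; it is an illustration of the extension point rather than any policy used in this thesis, and it assumes the CloudSim 3.x class names shown.

    import java.util.List;

    import org.cloudbus.cloudsim.Vm;
    import org.cloudbus.cloudsim.core.CloudSim;
    import org.cloudbus.cloudsim.power.PowerHost;
    import org.cloudbus.cloudsim.power.PowerVm;
    import org.cloudbus.cloudsim.power.PowerVmSelectionPolicy;

    /**
     * Sketch of a custom selection policy: migrate the VM with the lowest
     * current CPU demand (in MIPS) from an over-utilised host.
     */
    public class PowerVmSelectionPolicyLowestMips extends PowerVmSelectionPolicy {

        @Override
        public Vm getVmToMigrate(PowerHost host) {
            List<PowerVm> migratableVms = getMigratableVms(host);
            if (migratableVms.isEmpty()) {
                return null;
            }
            Vm selected = null;
            double minMips = Double.MAX_VALUE;
            for (PowerVm vm : migratableVms) {
                double mips = vm.getTotalUtilizationOfCpuMips(CloudSim.clock());
                if (mips < minMips) {
                    minMips = mips;
                    selected = vm;
                }
            }
            return selected;
        }
    }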
Figure 7.1: CloudSim class structure [12]

7.4 hardware

A data center comprising 800 physical servers, consisting of 400 HP ProLiant ML110 G5 and 400 HP ProLiant ML110 G4 servers, is the default data center topology. Alterations can be made to the hardware setup via the Constants class located in the org.cloudbus.cloudsim.power package. This class also provides the option of altering other key constants in relation to VM types and sizes, scheduling intervals, bandwidth and storage.

7.5 workload

The workload comes from a real world IaaS environment. PlanetLab files within the CloudSim framework contain data from the CoMon project representing the CPU utilisation of over 100 VMs from servers located in 500 locations worldwide. In order to produce an accurate and reliable experiment the algorithms were deployed to represent a one month time period; to achieve this, the PlanetLab files were utilised through random selection to create a 30 day workload. Each PlanetLab file contains 288 values representative of CPU workloads. VMs are assigned these workloads on a random basis in order to best represent the stochastic characteristics of workload allocation and demand within an IaaS environment. Each VM corresponds to an Amazon EC2 instance type, except that all VMs are single core, reflecting the fact that the workload was retrieved from single core VMs. The 288 CPU values, when used with CloudSim's default monitoring interval, represent 24 hours of data center activity.

7.6 summary

This chapter introduced the simulation environment used in the remainder of the thesis, CloudSim, including details of its class structure, the alterations necessary to introduce a new energy aware simulation, and the default hardware and workloads provided by the simulator.
8 A L G O R I T H M D E V E L O P M E N T

In order to produce a RL selection algorithm for VMs, some additional information must be provided to register the RL policy and measure its effects, and the RL framework itself must be created. The following chapter outlines in detail the necessary additional classes and alterations.

8.1 registering a selection policy

In order to register a new selection policy, the method getVmSelectionPolicy() from the class RunnerAbstract located in org.cloudbus.cloudsim.examples.power must be altered to include the name of the new policy and to provide access to it on the instantiation of a simulation (a sketch of this alteration is shown below). This class also allows the user to alter the name of the output folder for the compilation of results if required.

8.2 recording of results

Certain key metrics are automatically compiled by CloudSim at the end of each simulation. However, these results are a combined metric of the overall performance, for example the overall energy consumed or the overall number of migrations. For accurate detailing of the effect a new policy has on the data center, it is important to measure key information such as energy, number of migrations or SLA violations on an ongoing basis at discrete intervals. It is possible to do so in the Helper class located in org.cloudbus.cloudsim.examples.power. It is from this class that key metrics are printed to file at the end of each simulation; by introducing new methods it is possible to measure these key metrics on a much more refined timescale.
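Returning to the registration step of Section 8.1, the following is a rough sketch of how the getVmSelectionPolicy() method in RunnerAbstract can be extended with a new branch. The exact method body differs between CloudSim releases, and the "rl" key and RlSelectionPolicy class are names introduced by this thesis' framework rather than part of CloudSim.

    // Sketch only: the surrounding class is RunnerAbstract in
    // org.cloudbus.cloudsim.examples.power; existing branches are abridged.
    protected PowerVmSelectionPolicy getVmSelectionPolicy(String vmSelectionPolicyName) {
        PowerVmSelectionPolicy vmSelectionPolicy = null;
        if (vmSelectionPolicyName.equals("mmt")) {
            vmSelectionPolicy = new PowerVmSelectionPolicyMinimumMigrationTime();
        } else if (vmSelectionPolicyName.equals("mu")) {
            vmSelectionPolicy = new PowerVmSelectionPolicyMinimumUtilization();
        } else if (vmSelectionPolicyName.equals("rl")) {
            // New branch: hand back the reinforcement learning selection policy.
            vmSelectionPolicy = new RlSelectionPolicy();
        } else {
            System.out.println("Unknown VM selection policy: " + vmSelectionPolicyName);
            System.exit(0);
        }
        return vmSelectionPolicy;
    }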
8.3 additional classes

The following section describes, at a high level, the additional classes required for the RL framework, a schematic of which can be seen in Fig. 8.1.

8.3.1 Lr-Rl

The LrRl class is located in the package org.cloudbus.cloudsim.examples.power.planetlab. It contains the main method and is where the simulation is instantiated. This class supplies the workload, the names of the selection and allocation policies and the safety parameter to PlanetLabRunner in order to begin the simulation.

8.3.2 RlSelectionPolicy

The RlSelectionPolicy class is located in the package org.cloudbus.cloudsim.power. It overrides the default VM selection policy within CloudSim and also acts as a controller class, conversing between CloudSim, the Environment class and the Agent when necessary.

8.3.3 Environment

The Environment class carries out all functions necessary to accumulate the information required for the Agent to make a decision; for example, the Environment retrieves the state, produces a list of all possible actions in the given state and calculates rewards, all of which are utilised by the Agent class.

8.3.4 Agent

The primary role of the Agent class is to choose a VM for migration by one of two possible methods, either following a softmax policy or an ε-greedy policy. The Agent also contains a "brain", in this case a matrix in which it stores, updates and reads Q-values as required.
8.3.5 Algorithm

The role of the Algorithm class is to implement the requested Q-value estimation learning strategy, in this case Watkins' Q-learning or Rummery and Niranjan's SARSA algorithm.

8.3.6 RlUtilities

The RlUtilities class contains all functions necessary for the accumulation and accurate measurement of the required metrics.

Figure 8.1: The reinforcement learning CloudSim architecture
8.4 summary

This chapter outlined the creation of a RL framework in CloudSim, including the necessary alterations to the existing simulator and the additional classes required to implement an agent based approach to VM selection.
9 I M P L E M E N T A T I O N

In order to develop a RL algorithm in any system, two key areas must be addressed which are specific to the environment in which the algorithm is deployed: the state-action space and the low level implementation of the learning strategy. This chapter addresses both of these issues in relation to an IaaS environment.

9.1 state-action space

RL techniques can suffer from a far-reaching state-action space, which limits the effectiveness and capabilities of a RL agent. Therefore, to incorporate a RL algorithm into an IaaS environment an appropriate state-action range must first be defined. The state space s is defined as the current host utilisation h_u returned as a percentage, which confines the state space to the range 0-100 and is obtained through the following equation, where the virtual machine utilisation vm_u is defined as a migratable VM's utilisation and n is the number of migratable VMs.

$$ s = \frac{\sum_{i=1}^{n} vm_u(i)}{h_u} \cdot 100 \qquad (8) $$

The action space a is represented as the vm_u of VM i relative to its assigned host h, returned as a percentage, which also allows the action space to range from 0-100.

$$ a = \frac{vm_u(i)}{h_u(h)} \cdot 100 \qquad (9) $$
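The following Java sketch shows one way of computing the percentile state and action of Eqs. (8) and (9). It is illustrative only: the inputs (per-VM utilised MIPS and the host figure used as the denominator) are assumptions, and the thesis' actual Environment class may derive them differently.

    // Illustrative only: shows how the percentile state and action of Eqs. (8) and (9)
    // can be derived; vmUtilisedMips and hostMips are hypothetical inputs.
    public class StateActionSpaceExample {

        /** State: total utilisation of the migratable VMs as a percentage of the host figure. */
        static int state(double[] vmUtilisedMips, double hostMips) {
            double sum = 0;
            for (double mips : vmUtilisedMips) {
                sum += mips;
            }
            return (int) Math.min(100, Math.round(sum / hostMips * 100));
        }

        /** Action: a single VM's utilisation as a percentage of its host's utilisation. */
        static int action(double vmUtilisedMips, double hostUtilisedMips) {
            return (int) Math.min(100, Math.round(vmUtilisedMips / hostUtilisedMips * 100));
        }

        public static void main(String[] args) {
            double[] vms = {300, 500, 200};
            System.out.println("state  = " + state(vms, 1000));   // 100
            System.out.println("action = " + action(300, 1000));  // 30
        }
    }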
9.2 q-learning implementation

The first implementation is a Q-learning algorithm, outlined as follows.

Q-learning virtual machine selection algorithm
foreach host in overUtilizedHosts do
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose VM from possibleActions using π
    migrate VM;
    observe hostUtilization at t+1, reward;
    calculate Q:
        Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]
    update Q-map
end

The algorithm is invoked when a host is determined to be overloaded by the LR host overload detection policy; this host is placed on a list of over-utilised hosts, which is forwarded to the VM selection policy, in this case RL. The first host is then selected from the list, its level of utilisation is taken as the state, and all migratable VMs are mapped as possible actions based on the percentage of their load in relation to their host. A VM is then chosen based on the RL selection policy, i.e. ε-greedy or softmax. This VM is placed on a migration list, the host's utilisation level is re-calculated, a scalar reward is attributed, and the Q-value is calculated and stored. If the current host is still deemed to be over-utilised, another VM is chosen in the same manner until the host is no longer overloaded. Once the host is no longer over-utilised, the next host on the over-utilised host list is chosen, until the list is empty.
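A compact, self-contained sketch of the tabular agent implied by this loop is given below. It is not the thesis' Agent class: the learning rate, discount factor and exploration rate are illustrative values, and the caller is assumed to supply the state, the set of possible actions and the reward exactly as described above.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // Minimal tabular Q-learning sketch for the 100x100 percentile state-action space.
    // States and actions are integers in [0, 100]; rewards are supplied by the caller.
    public class QLearningAgent {
        private final double[][] q = new double[101][101];
        private final double alpha = 0.1, gamma = 0.9, epsilon = 0.1;
        private final Random random = new Random();

        /** epsilon-greedy choice among the actions available in this state. */
        public int chooseAction(int state, List<Integer> possibleActions) {
            if (random.nextDouble() < epsilon) {
                return possibleActions.get(random.nextInt(possibleActions.size()));
            }
            return possibleActions.stream()
                    .max(Comparator.comparingDouble(a -> q[state][a]))
                    .orElseThrow();
        }

        /** Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) */
        public void update(int state, int action, double reward, int nextState,
                           List<Integer> nextActions) {
            double maxNext = nextActions.stream()
                    .mapToDouble(a -> q[nextState][a]).max().orElse(0.0);
            q[state][action] += alpha * (reward + gamma * maxNext - q[state][action]);
        }
    }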
9.3 sarsa implementation

As referred to in Section 5.2.2, SARSA requires a quintuple consisting of the values s_t, a_t, r_t, s_{t+1}, a_{t+1} in order to calculate its Q-value. This is where the design of the VM selection algorithms differs. Although both algorithms accept the same input, the order in which they process this input must be altered appropriately. This alteration is evident in the following SARSA algorithm: after observing the new state, i.e. the host utilisation at t+1, and the reward, it does not calculate the Q-value at that point. Instead it obtains a new list of possible actions, in the shape of migratable VMs, for the new state, and then selects the appropriate VM following π. Only now does the algorithm have the information required to calculate Q.

SARSA virtual machine selection algorithm
foreach host in overUtilizedHosts do
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose vmToMigrate from possibleActions using π
    migrate VM;
    observe hostUtilization at t+1, reward;
    foreach vm in migratableVms do
        possibleActions ← vmUtilization
    end
    choose vmToMigrate from possibleActions using π
    calculate Q:
        Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
    update Q-map
end
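The only substantive difference from the Q-learning sketch above is the update target, which uses the Q-value of the next action actually chosen rather than the maximum. A minimal sketch of that update, under the same illustrative assumptions, is:

    // SARSA variant of the update: the next action a' is chosen first (on-policy),
    // and its Q-value, not the maximum, enters the target.
    public class SarsaAgent {
        private final double[][] q = new double[101][101];
        private final double alpha = 0.1, gamma = 0.9;

        /** Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)) */
        public void update(int state, int action, double reward, int nextState, int nextAction) {
            q[state][action] += alpha * (reward + gamma * q[nextState][nextAction] - q[state][action]);
        }
    }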
9.4 summary

This chapter outlined a percentile state-action space to be utilised by the agent. Reducing the space to a 100×100 area limits the state-action space the agent must traverse regardless of the number of nodes in the data center, thereby addressing the so-called "curse of dimensionality" and providing an adaptable and portable agent. The chapter concluded by outlining a low level implementation of two key RL strategies, Q-learning and SARSA.
Part IV

E X P E R I M E N T S
10 E X P E R I M E N T M E T R I C S

The following chapter outlines the key metrics for measuring the performance of the RL algorithm. These metrics were proposed by Beloglazov and are widely adopted in research as a standard measurement of data center performance [12].

10.1 energy consumption

The total energy consumed per day by the data center's computational resources, i.e. servers. Although other energy draws exist, such as cooling and infrastructural demands, these were deemed outside the scope of this research.

10.2 migrations

The total migrations of all VMs, on all servers, performed by the data center. As the agent is trained to carry out intelligent selection of VMs, each migration is important when analysing this research.

10.3 service level agreement metrics

Maintaining a high standard of QOS and SLAs is imperative for a cloud provider. Their importance is highlighted by the three stages of measurement used to accurately report SLA violations.
10.3.1 SLATAH, PDM & SLAV

Service level agreement violation time per active host (SLATAH) is calculated from the time T_{s_i} during which active host i has experienced 100% utilisation of its CPU; at full utilisation the host cannot supply any further processing capacity to the VMs it hosts should they request additional CPU, thus forcing violations. N represents the number of hosts and T_{a_i} the time host i is actively serving VMs.

$$ SLATAH = \frac{1}{N} \sum_{i=1}^{N} \frac{T_{s_i}}{T_{a_i}} \qquad (10) $$

Performance degradation due to migration (PDM) is an estimate of the degradation C_{s_v} caused by migrating VM v, with C_{a_v} representing the total CPU capacity requested by VM v over its lifespan and M the number of VMs.

$$ PDM = \frac{1}{M} \sum_{v=1}^{M} \frac{C_{s_v}}{C_{a_v}} \qquad (11) $$

Due to the equal importance of both SLATAH and PDM, a combined metric, service level agreement violation (SLAV), is used to capture both:

$$ SLAV = SLATAH \cdot PDM \qquad (12) $$

10.4 energy and sla violations

In order to ensure that the implementation of energy saving policies does not negatively affect SLAs, researchers and developers are required to measure the correlated effect. To measure this, a combined metric named Energy and SLA Violations (ESV) is calculated as follows; the lower the overall ESV, the better the performance of the data center.

$$ ESV = Energy \cdot SLAV \qquad (13) $$
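For clarity, the sketch below shows how Eqs. (10)-(13) translate into code. The per-host and per-VM arrays are hypothetical inputs representing measurements gathered during a simulation run; CloudSim's own helpers compute these metrics internally.

    // Sketch of the metric calculations in Eqs. (10)-(13); the input arrays are
    // hypothetical per-host and per-VM measurements gathered during a simulation.
    public class DataCenterMetrics {

        /** SLATAH: mean fraction of active time each host spent at 100% CPU. */
        static double slatah(double[] timeAtFullUtilisation, double[] timeActive) {
            double sum = 0;
            for (int i = 0; i < timeActive.length; i++) {
                sum += timeAtFullUtilisation[i] / timeActive[i];
            }
            return sum / timeActive.length;
        }

        /** PDM: mean fraction of each VM's requested CPU lost to migrations. */
        static double pdm(double[] degradationDueToMigration, double[] totalCpuRequested) {
            double sum = 0;
            for (int i = 0; i < totalCpuRequested.length; i++) {
                sum += degradationDueToMigration[i] / totalCpuRequested[i];
            }
            return sum / totalCpuRequested.length;
        }

        /** SLAV: combined SLA violation metric. */
        static double slav(double slatah, double pdm) { return slatah * pdm; }

        /** ESV: combined energy and SLA violation metric. */
        static double esv(double energyKwh, double slav) { return energyKwh * slav; }
    }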
11 S E L E C T I O N O F P O L I C Y

11.1 experiment details

Whether softmax or ε-greedy action selection is better depends on the task and the environment in which it is deployed, and an intrinsic link exists between the choice of action selection and the performance of the update function (Q-learning or SARSA) due to their mutual dependence on Q [73]. For this reason the following experiment was undertaken to analyse and identify the optimal combination of update and selection policy, as mentioned in Section 2.7. There are four possible update/selection combinations:

• Q-Learning / ε-greedy
• Q-Learning / Softmax
• SARSA / ε-greedy
• SARSA / Softmax

Each combination is analysed using both a 30 day stochastic workload, in order to measure adaptability, and a repeated single workload over 100 iterations, in order to measure convergence rates.
    11.2 results 11.2 results 11.2.1Energy The running of a single workload over multiple iterations not only allows for a perspec- tive on the ability of the agent to learn, but also allows for the identification of speed of convergence to a level of optimum performance for each possible update/selection policy. The optimum energy consumption was determined as the point when the energy con- sumptions passed beneath 140kWh, established from data gather from 100 iterations on all four possible update/selection policies. Fig. 11.1 displays the energy consumption of Q- learning with softmax and ε-greedy action policies. On iteration No.34 ε-greedy converges to the optimal energy barrier, while softmax fails to penetrate sub 140kWh until iteration No.64. Figure 11.1: Energy consumption Q-Learning 100 iterations 78
Fig. 11.2 displays the energy consumption of SARSA with softmax and ε-greedy action policies. On iteration 33 ε-greedy converges to the optimal energy barrier, while softmax again fails to break beneath 140kWh until it finally converges on iteration 85.

Figure 11.2: Energy consumption, SARSA, 100 iterations

The policies that use ε-greedy selection converge quickest to an optimal level of performance. Fig. 11.3 displays a comparison of these policies in relation to convergence time.

Figure 11.3: Q-Learning ε-greedy vs SARSA ε-greedy

As outlined in the previous section, both SARSA and Q-learning converge to an optimal level in quick succession of each other; however, Q-learning remains below 140kWh 22% more often than SARSA for the remainder of the 100 iterations. Past the 50th iteration there is minimal difference in performance: Q-learning produces a slightly lower average consumption per iteration of 140.32kWh against SARSA's 140.74kWh, in line with the deviation level after 50 iterations, with Q-learning at 0.2964 and SARSA at 0.2977.
While multiple iterations of a single workload may highlight the rate of convergence, they do not portray an agent in a real world stochastic cloud environment. For that reason one must also take into account performance when supplied with a disparate workload. Fig. 11.5 and Fig. 11.4 contain the daily average and overall energy consumption over a 30 day period. Again the policies using ε-greedy action selection perform best and, again, there is minimal difference between the performance of Q-learning and SARSA, mirroring the results from the iterative test. Q-learning ε-greedy shows a saving of 5, 16 and 25kWh over SARSA ε-greedy, Q-learning softmax and SARSA softmax respectively.

Figure 11.4: Overall energy consumption, 30 day workload
Figure 11.5: Average daily energy consumption, 30 day workload
11.2.2 Migrations

Each time a VM migrates, the draw on energy increases as the contents of the VM are copied from one server to another. Therefore, reducing the number of migrations reduces the associated energy cost.

Figure 11.6: SARSA migrations, 100 iterations

Following the analysis format of the previous section, each selection combination was given an iterative single workload. Fig. 11.6 displays the migrations selected from over-utilised hosts as chosen by the SARSA combinations, and Fig. 11.7 the migrations selected from over-utilised hosts as chosen by the Q-learning combinations.

Figure 11.7: Q-Learning migrations, 100 iterations

A sizeable difference is noticeable between the two update functions, with SARSA resulting in an average of between 5,713 and 5,864 migrations per iteration, while the Q-learning update function averages between 2,941 and 3,002 migrations per iteration.
The differential in migrations, although considerable, is not a design flaw; rather it is in line with Sutton & Barto's cliff walking example, Fig. 11.8 [73]. Fig. 11.9 shows SARSA accumulating greater reward, similar to cliff walking. This is the result of SARSA's on-policy nature, which takes the action selection policy into account, therefore not letting the agent fall off the cliff, or in this case move an unrewarding machine; rather it learns the safer, more consistent and more rewarding path. In contrast, Q-learning ignores the action selection policy and attempts to converge to the optimum policy, even though on occasion this can cause an agent to fall off the cliff, or move a machine of high cost, resulting in an extreme negative impact on rewards.

Figure 11.8: Accumulated rewards for the cliff walking task [73]
Figure 11.9: Accumulated rewards for migrations
The total average migrations per iteration, that is a combined metric of those selected from both over-utilised and under-utilised hosts, are contained in Fig. 11.10. Again Q-learning outperforms all other possible combinations and closely aligns with the results of the 30 day test shown in Fig. 11.11.

Figure 11.10: Average migrations, 100 iterations
Figure 11.11: Average migrations, 30 day workload
11.2.3 Service Level Agreement Violations

Broken service level agreements can result in a financial penalty for the service provider. Therefore, data center operators continuously strive to minimise violations and maximise performance, customer satisfaction and profit. Fig. 11.12 and Fig. 11.13 display the overall SLA violations for the 100 iteration and 30 day tests.

Figure 11.12: Overall SLA violations, 100 iterations

Once again Q-Learning outperforms all other possible combinations; however, this is the closest of all the simulations, with Q-Learning/ε-greedy outperforming the other combinations by just 0.4-1.4%.

Figure 11.13: Overall SLA violations, 30 days
11.2.4 ESV

The reduction of energy can have a correlated negative effect on SLAs if the method of reducing energy is not chosen carefully. To measure this effect we utilise the ESV metric outlined in Section 10.4. This could be considered the most important metric as it combines SLAV and energy to give a more inclusive view of data center performance; the lower the ESV, the more efficiently the data center is performing. Fig. 11.15 and Fig. 11.14 contain the overall ESV for the iterative and 30 day tests. As expected from the earlier analysis of the energy and SLAV data, Q-Learning/ε-greedy again outperforms the other combinations.

Figure 11.14: Overall ESV, 100 iterations
Figure 11.15: Overall ESV, 30 days
11.3 discussion

The ε-greedy based update/selection policies outperform the softmax based policies in relation to energy consumption and convergence time. The overall energy consumption for a 30 day workload shows a saving ranging from 21kWh to 25kWh. The ε-greedy policies also converge to the sub-140kWh optimum earlier than the softmax based combinations, with Q-Learning/Softmax, the closest rival, converging after a further 30 iterations.

Fig. 11.6 and Fig. 11.7 display the migrations for the SARSA and Q-Learning policies, with SARSA incurring a far greater number of migrations as a result of its on-policy evaluation and the resulting safe approach to VM selection.

As regards SLA violations, Q-Learning/ε-greedy incurs the fewest violations, albeit by a fractional margin of between 0.4-1.4%. However small, the improvement remains important, not only from a fiscal penalty viewpoint, but also because it highlights that the reduction in energy is not having a correlated negative effect on SLA violations.

This is further reinforced by examination of the ESV figures, a metric that, as previously mentioned, provides a more inclusive view of data center performance. Again Q-Learning/ε-greedy records the lowest ESV, outperforming its rivals by between 5-8%.

The Q-Learning/ε-greedy based model consistently outperforms the other selection/update policies in both the 30 day and 100 iteration tests; it is therefore deemed the best policy for this environment and has been chosen as the selection/update policy for the remaining experiments.
12 P O T E N T I A L B A S E D R E W A R D S H A P I N G

12.1 experiment details

Chapter 11 highlighted Q-learning/ε-greedy as the best performing update/selection policy. However, this does not imply that the policy is performing optimally. In general a RL agent learns through trial and error by visiting multiple states and carrying out multiple actions. Such an approach highlights RL's main limitation: its slowness to converge to optimum performance. This experiment introduces the advanced RL technique known as potential based reward shaping (PBRS), as outlined in Section 5.4, as a method of improving the current convergence rate. The PBRS algorithm is analysed against the standard Q-learning/ε-greedy agent developed in the previous section.

PBRS, formally outlined in Equation 14, is an additional reward calculated as the difference between the potential of the original state and that of the resultant state [24].

$$ F(s, s') = \gamma \phi(s') - \phi(s) \qquad (14) $$

This shaping term is then introduced into the standard Q-learning update function as follows:

$$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_t + F(s, s') + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \big] \qquad (15) $$
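A minimal sketch of the shaped update is shown below. The potential function phi() used here (higher potential for lower host utilisation) is an assumption made purely for illustration; the thesis' own potential function may differ, and the learning parameters are placeholder values.

    // Sketch of the shaped update in Eqs. (14)-(15). The potential function phi()
    // is an assumption for illustration: lower host utilisation is given higher potential.
    public class ShapedQLearningAgent {
        private final double[][] q = new double[101][101];
        private final double alpha = 0.1, gamma = 0.9;

        /** Potential of a state (host utilisation in percent); purely illustrative. */
        private double phi(int state) {
            return (100 - state) / 100.0;
        }

        /** F(s, s') = gamma * phi(s') - phi(s), added to the ordinary reward. */
        public void update(int state, int action, double reward, int nextState, double maxNextQ) {
            double shaping = gamma * phi(nextState) - phi(state);
            q[state][action] += alpha * (reward + shaping + gamma * maxNextQ - q[state][action]);
        }
    }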
    12.2 results 12.2 results 12.2.1Energy After a single iteration the standard Q-Learning algorithm consumes 146.14kWh and re- mains 6.14kWh above the optimum level of energy conservation as determined in Chapter 11, while the PBRS algorithm consumed 141.02kWh just 1.02kWh from the optimum level of consumption. It takes a further 10 iterations before the standard algorithm reaches a consumption level at which PBRS began. By this time PBRS has long since broken the sub 140kWh barrier on the 4th iteration. The standard agent continues to learn and it is not until post the 32th iteration before a consistent level of deviation of between 0-1kWh is maintained. Figure 12.1: PBRS vs Q-Learning energy consumption 88
12.2.2 Migrations

The effects of the PBRS based agent are not restricted to energy alone; they ripple through the other metrics, none more so than migrations. After a single iteration the standard agent posts a migration count of 22,243, with the PBRS based agent migrating only 18,021 VMs, 4,042 fewer. In line with the energy data, it is not until the 10th iteration that the standard agent reaches the migration rate at which PBRS began. The migration counts remain disparate until the 38th iteration, from which point the differential consistently remains below 1,000.

Figure 12.2: PBRS vs Q-Learning migrations

12.2.3 Service Level Agreement Violations

The effect of the PBRS based agent on SLA violations mirrors that seen in the previous two sections. The PBRS agent begins at a rate of SLA violations 1.28E-06 lower than that of the standard agent; only after 26 iterations does the standard agent surpass this level, and it is not until iteration 31 that it performs on a par with the PBRS agent.
Figure 12.3: PBRS vs Q-Learning SLAV

12.2.4 ESV

As expected, given the reduced rate of energy consumption and level of SLA violations, the PBRS agent's ESV rating begins 7% lower at 0.004921. On the 11th iteration the standard agent surpasses this level for the first time, and shortly after the 30th iteration it consistently performs on a par with the PBRS agent.

Figure 12.4: PBRS vs Q-Learning ESV
    12.3 discussion 12.3 discussion Theaddition of PBRS to the Q-learning agent has significantly decreased the convergence time and therefore the time spent by the standard agent learning the state-action space, overall the PBRS agent was tested on 10% of the overall workload and portrayed con- sistent results throughout. On all occasions the PBRS converged to a level deemed as optimal in less than 5 iterations, while the standard agent required on average in excess of 21 iterations. The effect was mirrored in relation to migrations with 22% less migrations after a single iteration, with the standard agent taking an average of 10 iterations to reach this level. Similar changes were noticeable in regards to the SLA violations with the standard agent again on average taking over 10 iterations to reach a level where the PBRS agent began. These improved Energy and SLA violations are reflected in the ESV data, with the PBRS agent after a single iteration running at a 7% lower rate than the standard. On average it takes the standard agent another 11 iterations to reach that level, from which the differential between both agents remain steady. 91
13 C O M P A R A T I V E V I E W O F L R - R L V S L R - M M T

13.1 experiment details

Following on from the experiments carried out in Chapters 11 and 12 and the determination that a PBRS Q-Learning/ε-greedy based agent provides optimum performance, this chapter evaluates the algorithm against the leading VM selection policy in the research literature.

Research has previously established that dynamic consolidation algorithms statistically outperform static allocation policies such as DVFS, and that heuristic based dynamic VM consolidation outperforms online deterministic algorithms [12]. The optimal combination of selection-allocation policies was shown to be Lr-Mmt, statistically outperforming multiple disparate algorithms [12]. For that reason Lr-Mmt has been designated as the preeminent algorithm against which to analyse the dynamic virtual machine selection algorithm Lr-Rl. A 30 day stochastic real world workload is provided to both algorithms, with each algorithm subject to analysis under four criteria: energy consumption, service level agreement violations, quantity of virtual machine migrations and ESV.
    13.2 results 13.2 results 13.2.1Energy Fig.13.1 contains the energy consumption data from the experiment, the paired t-test shows that there is a statistically significant difference in the consumption of energy when utilizing Lr-Rl and Lr-Mmt resulting in a P-value <0.0041 with a 95% confidence interval (-7.8685 , -39.8715). As a result over the 30 day period the Lr-Rl algorithm consumes more than 716kWh less energy overall or 23.87kWh less a day. Figure 13.1: Energy consumption for 30 day workload 13.2.2 Migrations The paired t-test shows that there is a significant statistical difference between Lr-Rl and Lr-Mmt resulting in a P-value <0.0001 with a 95% confidence interval (-8,389.133 , -13,620.86) in relation to migrations. The results of the migration data per day are dis- played in Fig.13.2. Through the use of Lr-Rl, migrations over the 30 day period decreases by 330,154 overall or by an average of 11,005 per day. 93
Figure 13.2: Migrations for 30 day workload

13.2.3 Service Level Agreement Violations

When lowering energy usage within a data center, it is imperative to monitor SLA violations, as lowering energy can have a parallel negative effect. For example, one can lower the number of active servers through the extreme consolidation of VMs onto a smaller number of servers; this, however, results in a greater probability of servers reaching 100% CPU utilisation, restricting the VMs' access to computational processing and resulting in violations. The SLA violations are displayed in Fig. 13.3. The results of a t-test show no statistical difference and thus no negative effect on SLAs, with a P-value of 0.2751 and a 95% confidence interval of (1.1365, -3.9669).

Figure 13.3: SLA violations for 30 day workload
13.2.4 ESV

The final evaluation is ESV, the results of which can be seen in Fig. 13.4; again the results reinforce the SLA violation and energy data previously gathered. On carrying out a t-test, Lr-Rl again showed a statistically significant improvement in performance, with a P-value of <0.0001 and a 95% confidence interval of (-0.0021, -0.0037).

Figure 13.4: ESV for 30 day workload
    13.3 discussion 13.3 discussion Inorder to take a closer look at the improved performance of Lr-Rl over Lr-Mmt it is necessary to take a closer look at a single day and the disparities that lie within. On day 21 a saving of 23.02 kWh of energy occurs , with 11,561 less migrations. The average number of migrations for that day in order to reduce an over utilized host to a safe workload for Mmt stood at 2.33 over twice that of RL at 1.06. On occasion Mmt required as much as 12 migrations from a single in order to reach a safe state with Lr never requiring more than 4 migrations for a single host. An explanation of the necessity for extra migrations associated with Mmt can be found in the data of the VMs chosen for migration. On average a VM chosen by Mmt accounts for as little as 3.60% of the host overall utilization and therefore requires multiple mi- grations in order to enter an under utilized state. On the other hand Rl chosen VMs on average accounts for 18.04% of the overall host utilization, therefore when migrated im- mediately moves the host to a under utilized state. The correlation between the reduced amount of migrations and energy reduction of Lr-Rl, measured at industry standard 5 minute intervals for day 21 is shown in Fig. 13.5. Figure 13.5: Energy & Migration Correlation Day 21 96
The difference in the selected VMs' level of utilisation of their host is a major factor in determining the overall number of migrations. Mmt, however, places a further large restriction on its selection of VMs: not only does it not take into account the VM's utilisation level, it also restricts selection to the VMs containing the smallest RAM. As a result, over 79% of the VMs selected for migration are those with a RAM size of 613 MB, regardless of how large or small their workload is. Rl, on the other hand, does not implement such restrictions; by taking into account the more holistic value of utilisation levels from both the host and the VMs, it allows the agent to select VMs across the full spectrum of RAM sizes, as seen in Fig. 13.6.

(a) Lr-Mmt (b) Lr-Rl
Figure 13.6: RAM sizes of virtual machines migrated

This results in Lr-Rl accounting for 716kWh or 15% less energy consumption and 330,154 or 41% fewer migrations, with a 38% reduction in ESV and no statistical difference in service level agreement violations.
Part V

C O N C L U S I O N
14 C O N C L U S I O N

14.1 contributions

Reinforcement learning techniques have been successfully applied to resource allocation for cloud systems prior to this research. However, these operated at server or node level; this research proposed a novel approach which incorporates RL at a lower infrastructural level, in the selection of VMs for migration. Due to its low level of abstraction, the algorithm can be incorporated into multiple cloud infrastructures, including stand alone private, federated and multi-cloud infrastructures.

The high level of CO2 emissions and associated negative environmental effects, along with the increasing cost of and demand for energy from data centers, formed the motivation for this research into the creation of a state of the art, low energy software policy for the selection of VMs for migration in IaaS environments. In order to produce such an algorithm the thesis evolved to answer the following questions.

• Is RL a viable policy for VM selection in the cloud?
• Can advanced RL techniques improve such a policy?
• Can an RL approach outperform the state of the art selection policy?
The experiments carried out in Chapter 11 aim not only to address the first research question but to further the thesis by providing an optimum update/selection policy for the selection of VMs in an IaaS environment. The results align with Sutton and Barto's view that whether softmax or ε-greedy action selection is better depends on the task and the environment in which it is deployed [73]. Fig. 11.3 presents evidence of an agent that consistently learns to reduce energy, and analysis of the results shows that a Q-Learning/ε-greedy based agent consistently outperforms the other update/selection policies across all four metrics.

In Chapter 12 the introduction of the advanced RL technique known as potential based reward shaping further improved the agent based algorithm, addressing one of RL's greatest difficulties, the time to convergence, often referred to as the learning period, and with it the second research question of this thesis. The introduction of PBRS significantly decreased the convergence time and resulted in a direct saving of over 32kWh across the 100 iterations due to the reduced convergence period. Fig. 12.2 highlights the reduction in convergence time: the PBRS agent converged to a level deemed optimal in fewer than 5 iterations, while the standard agent required on average in excess of 30 iterations. This improved performance was seen throughout the data center metrics, with a reduction in migrations and SLA violations and an improvement in the overall ESV. The importance of PBRS when addressing such a complex problem is underlined by Devlin et al.'s finding that its benefits are greatest in complex problem domains where reinforcement learning alone takes a long time to converge and where there is a large difference in performance between the initial policy and the final policy converged to [24]. The benefits of introducing a PBRS based agent are directly in line with the results of many academic papers, including [24] [50] [31] and [92]; however, no academic literature or otherwise could be found in which a PBRS based agent was introduced into a cloud environment as has been done in these experiments.
In Chapter 13 the third research question is addressed: Lr-Rl is compared to the Lr-Mmt selection algorithm. The algorithms are provided with a real world 30 day workload. Lr-Rl accounts for 716kWh or 15% less energy consumption and 330,154 or 41% fewer migrations, with a 38% reduction in ESV and no statistical difference in service level agreement violations. These results show a significant improvement on the work of Beloglazov and the Lr-Mmt algorithm [11].

Research carried out by Yuan, Voorsluys and Liu et al. [97] [85] [48] all highlights the potential savings and improved performance that result directly from the careful selection of VMs for migration and the overall lowering of migrations within a data center. The findings of this thesis add further proof of such a theory, with Fig. 13.5 highlighting the direct correlation between reduced migrations and reduced energy usage.

The RL selection policy is one of many elements in the overall process of data center management. However, to obtain up to a 15% energy reduction in just one specific area goes a long way towards addressing the research of Brown et al. and Koomey et al., who estimate savings of up to 25% through the introduction of energy aware software policies for the management of data centers [14] [42].

The results of RL as a selection policy also add to the possibility of improved performance for many other pieces of research, all of which have developed their own host detection algorithms but have used Mmt as a selection policy, including [28] [33] [53] [97] [27] to name a few.

Viewing the results of Chapter 13 from an environmental viewpoint, an average saving of 23.87kWh per day amounts to a saving of 8,715kWh per year. According to the EPA's calculations this equates to a saving of 5.9 metric tons of CO2, which would require 4.8 acres of mature forest per year to sequester [26].
14.2 future work

Arising from the work presented in this thesis, a number of possibilities exist for future work, such as:

• The extension of testing across a more dispersed cloud topology, such as a cross-data center migration scenario
• The extension of testing in a scaled up testbed
• Further development of the RL framework within CloudSim for optimisation purposes

Such additional research not only adds to the requirement for energy aware management policies highlighted by Koomey and Brown [42] [14], it also furthers the development of CloudSim as a research tool for academia and industry to utilise.
    B I BL I O G R A P H Y [1] David Abramson, Rajkumar Buyya, and Jonathan Giddy. A computational economy for grid computing and its implementation in the nimrod-g resource broker. Future Generation Computer Systems, 18(8):1061–1074, 2002. [2] Mohamed Almorsy, John Grundy, and Ingo M¨uller. An analysis of the cloud com- puting security problem. In Proceedings of APSEC 2010 Cloud Workshop, Sydney, Aus- tralia, 30th Nov, 2010. [3] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010. [4] Raphael M Bahati and Michael A Bauer. Towards adaptive policy-based manage- ment. In Network Operations and Management Symposium (NOMS), 2010 IEEE, pages 511–518. IEEE, 2010. [5] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. ACM SIGOPS Operating Systems Review, 37(5):164–177, 2003. [6] Enda Barrett, Enda Howley, and Jim Duggan. A learning architecture for scheduling workflow applications in the cloud. In Web Services (ECOWS), 2011 Ninth IEEE European Conference on, pages 83–90. IEEE, 2011. [7] Enda Barrett, Enda Howley, and Jim Duggan. Applying reinforcement learning to- wards automating resource allocation and application scalability in the cloud. Con- currency and Computation: Practice and Experience, 25(12):1656–1674, 2013. [8] Luiz Andr´e Barroso and Urs H¨olzle. The case for energy-proportional computing. IEEE computer, 40(12):33–37, 2007. [9] A Barto and RH Crites. Improving elevator performance using reinforcement learn- ing. Advances in neural information processing systems, 8:1017–1023, 1996. 103
    Bibliography [10] Anton Beloglazov,Jemal Abawajy, and Rajkumar Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Generation Computer Systems, 28(5):755–768, 2012. [11] Anton Beloglazov and Rajkumar Buyya. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurrency and Computation: Practice and Experience, 24(13):1397–1420, 2012. [12] Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, Albert Zomaya, et al. A taxonomy and survey of energy-efficient data centers and cloud computing systems. Advances in Computers, 82(2):47–111, 2011. [13] Luca Benini, Alessandro Bogliolo, and Giovanni De Micheli. A survey of design techniques for system-level dynamic power management. Very Large Scale Integra- tion (VLSI) Systems, IEEE Transactions on, 8(3):299–316, 2000. [14] Richard Brown et al. Report to congress on server and data center energy efficiency: Public law 109-431. Lawrence Berkeley National Laboratory, 2008. [15] Rajkumar Buyya, David Abramson, and Jonathan Giddy. A case for economy grid architecture for service oriented grid computing. In Parallel and Distributed Pro- cessing Symposium, International, volume 2, pages 20083a–20083a. IEEE Computer Society, 2001. [16] Rajkumar Buyya, Rajiv Ranjan, and Rodrigo N Calheiros. Modeling and simulation of scalable cloud computing environments and the cloudsim toolkit: Challenges and opportunities. In High Performance Computing & Simulation, 2009. HPCS’09. International Conference on, pages 1–11. IEEE, 2009. [17] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging it platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation computer systems, 25(6):599– 616, 2009. [18] Rodrigo N Calheiros, Rajiv Ranjan, Anton Beloglazov, C´esar AF De Rose, and Rajku- mar Buyya. Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and Experience, 41(1):23–50, 2011. 104
    Bibliography [19] Michael Cardosa,Madhukar R Korupolu, and Aameek Singh. Shares and utilities based power consolidation in virtualized server environments. In Integrated Network Management, 2009. IM’09. IFIP/IEEE International Symposium on, pages 327–334. IEEE, 2009. [20] V Chaudhary, Minsuk Cha, JP Walters, S Guercio, and Steve Gallo. A comparison of virtualization technologies for hpc. In Advanced Information Networking and Ap- plications, 2008. AINA 2008. 22nd International Conference on, pages 861–868. IEEE, 2008. [21] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Chris- tian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines. In Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation-Volume 2, pages 273–286. USENIX Association, 2005. [22] William S Cleveland. Robust locally weighted regression and smoothing scatter- plots. Journal of the American statistical association, 74(368):829–836, 1979. [23] Robert J. Creasy. The origin of the vm/370 time-sharing system. IBM Journal of Research and Development, 25(5):483–490, 1981. [24] Sam Devlin, Daniel Kudenko, and Marek Grze´s. An empirical study of potential- based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems, 14(02):251–278, 2011. [25] Tharam Dillon, Chen Wu, and Elizabeth Chang. Cloud computing: issues and challenges. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pages 27–33. IEEE, 2010. [26] Epa.gov. Calculations and references — clean energy — us epa, 2015. [27] Fahimeh Farahnakian, Pasi Liljeberg, and Juha Plosila. Energy-efficient virtual ma- chines consolidation in cloud data centers using reinforcement learning. In Paral- lel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on, pages 500–507. IEEE, 2014. [28] Fahimeh Farahnakian, Tapio Pahikkala, Pasi Liljeberg, and Juha Plosila. Energy aware consolidation algorithm based on k-nearest neighbor regression for cloud data centers. In Utility and Cloud Computing (UCC), 2013 IEEE/ACM 6th International Conference on, pages 256–259. IEEE, 2013. 105
    Bibliography [29] Ian Foster,Yong Zhao, Ioan Raicu, and Shiyong Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop, 2008. GCE’08, pages 1–10. Ieee, 2008. [30] Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen, and Zhenghu Gong. The char- acteristics of cloud computing. In Parallel Processing Workshops (ICPPW), 2010 39th International Conference on, pages 275–279. IEEE, 2010. [31] Marek Grzes and Daniel Kudenko. Plan-based reward shaping for reinforcement learning. In Intelligent Systems, 2008. IS’08. 4th International IEEE Conference, vol- ume 2, pages 10–22. IEEE, 2008. [32] Steven Hand, Tim Harris, Evangelos Kotsovinos, and Ian Pratt. Controlling the xenoserver open platform. In Open Architectures and Network Programming, 2003 IEEE Conference on, pages 3–11. IEEE, 2003. [33] Abbas Horri, Mohammad Sadegh Mozafari, and Gholamhossein Dastghaibyfard. Novel resource allocation algorithms to performance and energy efficiency in cloud computing. The Journal of Supercomputing, 69(3):1445–1461, 2014. [34] Jinhua Hu, Jianhua Gu, Guofei Sun, and Tianhai Zhao. A scheduling strategy on load balancing of virtual machine resources in cloud computing environment. In Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on, pages 89–96. IEEE, 2010. [35] Yashpalsinh Jadeja and Kirit Modi. Cloud computing-concepts, architecture and challenges. In Computing, Electronics and Electrical Technologies (ICCEET), 2012 Inter- national Conference on, pages 877–880. IEEE, 2012. [36] Gueyoung Jung, Kaustubh R Joshi, Matti A Hiltunen, Richard D Schlichting, and Calton Pu. Generating adaptation policies for multi-tier applications in consoli- dated server environments. In Autonomic Computing, 2008. ICAC’08. International Conference on, pages 23–32. IEEE, 2008. [37] Gueyoung Jung, Kaustubh R Joshi, Matti A Hiltunen, Richard D Schlichting, and Calton Pu. A cost-sensitive adaptation engine for server consolidation of multitier applications. In Middleware 2009, pages 163–183. Springer, 2009. [38] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A survey. Journal of artificial intelligence research, pages 237–285, 1996. 106
    Bibliography [39] Avi Kivity,Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. kvm: the linux virtual machine monitor. In Proceedings of the Linux Symposium, volume 1, pages 225–230, 2007. [40] Nadir Kiyanclar. A survey of virtualization techniques focusing on secure on- demand cluster computing. arXiv preprint cs/0511010, 2005. [41] Jonathan Koomey. Growth in data center electricity use 2005 to 2010. A report by Analytical Press, completed at the request of The New York Times, 2011. [42] Jonathan G Koomey. Estimating total power consumption by servers in the us and the world, 2007. Lawrence Berkeley National Laboratory, Berkeley, CA, available at: http://hightech. lbl. gov/documents/DATA CENTERS/svrpwrusecompletefinal. pdf, 2007. [43] Jonathan G Koomey, Christian Belady, Michael Patterson, Anthony Santos, and Klaus-Dieter Lange. Assessing trends over time in performance, costs, and energy use for servers. Lawrence Berkeley National Laboratory, Stanford University, Microsoft Corporation, and Intel Corporation, Tech. Rep, 2009. [44] Dara Kusic, Jeffrey O Kephart, James E Hanson, Nagarajan Kandasamy, and Guofei Jiang. Power and performance management of virtualized computing environments via lookahead control. Cluster computing, 12(1):1–15, 2009. [45] B. Ellison L. Minas. Energy efficiency for information technology: How to reduce power consumption in servers and data centers. Intel Press, 2009. [46] Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang, and Chia-Ying Tseng. A dynamic resource management with energy saving mechanism for supporting cloud com- puting. International Journal of Grid and Distributed Computing, 6(1):67–76, 2013. [47] Weiwei Lin, Chen Liang, James Z Wang, and Rajkumar Buyya. Bandwidth-aware divisible task scheduling for cloud computing. Software: Practice and Experience, 44(2):163–174, 2014. [48] Haikun Liu, Hai Jin, Cheng-Zhong Xu, and Xiaofei Liao. Performance and energy modeling for live migration of virtual machines. Cluster computing, 16(2):249–264, 2013. [49] Patrick Mannion, Jim Duggan, and Enda Howley. Parallel reinforcement learning with state action space partitioning. In JMLRWorkshop and Conference Proceedings 0:19, 2015 12th European Workshop on Reinforcement Learning. 107
    Bibliography [50] Patrick Mannion,Jim Duggan, and Enda Howley. Learning traffic signal control with advice. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS 2015), 2015. [51] Patrick Mannion, Jim Duggan, and Enda Howley. Parallel learning using heteroge- neous agents. In Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS 2015), 2015. [52] La¨etitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. Reward func- tion and initial values: better choices for accelerated goal-directed reinforcement learning. In Artificial Neural Networks–ICANN 2006, pages 840–849. Springer, 2006. [53] Khushbu Maurya and Richa Sinha. Energy conscious dynamic provisioning of virtual machines using adaptive migration thresholds in cloud data center. Interna- tional Journal of Computer Science and Mobile Computing, pages 74–82, 2013. [54] John McCarthy. Applications of circumscription to formalizing common-sense knowledge. Artificial Intelligence, 28(1):89–116, 1986. [55] Lijun Mei, Wing Kwong Chan, and TH Tse. A tale of clouds: Paradigm comparisons and some thoughts on research issues. In Asia-Pacific Services Computing Conference, 2008. APSCC’08. IEEE, pages 464–469. Ieee, 2008. [56] David Meisner, Brian T Gold, and Thomas F Wenisch. Powernap: eliminating server idle power. ACM SIGARCH Computer Architecture News, 37(1):205–216, 2009. [57] Peter Mell and Tim Grance. The nist definition of cloud computing. Computer Secu- rity Division, Information Technology Laboratory, National Institute of Standards and Technology, 2011. [58] Fereydoun Farrahi Moghaddam, Reza Farrahi Moghaddam, and Mohamed Cheriet. Carbon-aware distributed cloud: multi-level grouping genetic algorithm. Cluster Computing, pages 1–15, 2014. [59] Ripal Nathuji and Karsten Schwan. Virtualpower: coordinated power management in virtualized enterprise systems. In ACM SIGOPS Operating Systems Review, vol- ume 41, pages 265–278. ACM, 2007. [60] Andrew Y Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pages 278–287, 1999. 108
[61] Jason Nieh and Ozgur Can Leonard. Examining VMware. Dr. Dobb's Journal, 25(8):70, 2000.
[62] Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru, and Rajkumar Buyya. A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pages 400–407. IEEE, 2010.
[63] Riccardo Poli, James Kennedy, and Tim Blackwell. Particle swarm optimization. Swarm Intelligence, 1(1):33–57, 2007.
[64] Jette Randløv and Preben Alstrøm. Learning to drive a bicycle using reinforcement learning and shaping. In ICML, volume 98, pages 463–471, 1998.
[65] Mendel Rosenblum and Tal Garfinkel. Virtual machine monitors: Current technology and future trends. Computer, 38(5):39–47, 2005.
[66] Gavin A Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. University of Cambridge, Department of Engineering, 1994.
[67] Naidila Sadashiv and SM Dilip Kumar. Cluster, grid and cloud computing: A detailed comparison. In Computer Science & Education (ICCSE), 2011 6th International Conference on, pages 477–482. IEEE, 2011.
[68] Yuxiang Shi, Xiaohong Jiang, and Kejiang Ye. An energy-efficient scheme for cloud resource provisioning based on CloudSim. In Cluster Computing (CLUSTER), 2011 IEEE International Conference on, pages 595–599. IEEE, 2011.
[69] Reza Sookhtsaraei, Mirmorsal Madani, and Atena Kavian. A multi objective virtual machine placement method for reduce operational costs in cloud computing by genetic. International Journal of Computer Networks & Communications Security, 2(8), 2014.
[70] Richard S Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, 1988.
[71] Richard S Sutton. Introduction: The challenge of reinforcement learning. In Reinforcement Learning, pages 1–3. Springer, 1992.
[72] Richard S Sutton. Reinforcement learning: Past, present and future. In Simulated Evolution and Learning, pages 195–197. Springer, 1999.
[73] Richard S Sutton and Andrew G Barto. Introduction to Reinforcement Learning. MIT Press, 1998.
[74] Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.
[75] Gerald Tesauro, Nicholas K Jong, Rajarshi Das, and Mohamed N Bennani. On the use of hybrid reinforcement learning for autonomic resource allocation. Cluster Computing, 10(3):287–299, 2007.
[76] Michel Tokic and Günther Palm. Value-difference based exploration: adaptive control between epsilon-greedy and softmax. In KI 2011: Advances in Artificial Intelligence, pages 335–346. Springer, 2011.
[77] Wei-Tek Tsai, Xin Sun, and Janaka Balasooriya. Service-oriented cloud computing architecture. In Information Technology: New Generations (ITNG), 2010 Seventh International Conference on, pages 684–689. IEEE, 2010.
[78] Rich Uhlig, Gil Neiger, Dion Rodgers, Amy L Santoni, Fernando CM Martins, Andrew V Anderson, Steven M Bennett, Alain Kagi, Felix H Leung, and Larry Smith. Intel virtualization technology. Computer, 38(5):48–56, 2005.
[79] Seema Vahora and Ritesh Patel. CloudSim: a survey on VM management techniques. In International Journal of Advanced Research in Computer and Communication Engineering, pages 128–123, 2015.
[80] Vytautas Valancius, Nikolaos Laoutaris, Laurent Massoulié, Christophe Diot, and Pablo Rodriguez. Greening the internet with nano data centers. In Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies, pages 37–48. ACM, 2009.
[81] Akshat Verma, Puneet Ahuja, and Anindya Neogi. pMapper: power and migration cost aware application placement in virtualized systems. In Middleware 2008, pages 243–264. Springer, 2008.
[82] Akshat Verma, Gargi Dasgupta, Tapan Kumar Nayak, Pradipta De, and Ravi Kothari. Server workload analysis for power minimization using consolidation. In Proceedings of the 2009 USENIX Annual Technical Conference, pages 28–28. USENIX Association, 2009.
[83] vmware.com. Paravirtualization, 2014.
[84] vmware.com. Hypervisor performance, 2015.
[85] William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar Buyya. Cost of virtual machine live migration in clouds: A performance evaluation. In Cloud Computing, pages 254–265. Springer, 2009.
[86] Carl A Waldspurger. Memory resource management in VMware ESX Server. ACM SIGOPS Operating Systems Review, 36(SI):181–194, 2002.
[87] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
[88] Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge, England, 1989.
[89] Guiyi Wei, Athanasios V Vasilakos, Yao Zheng, and Naixue Xiong. A game-theoretic method of fair resource allocation for cloud computing services. The Journal of Supercomputing, 54(2):252–269, 2010.
[90] Shimon Whiteson and Peter Stone. Evolutionary function approximation for reinforcement learning. The Journal of Machine Learning Research, 7:877–917, 2006.
[91] Bhathiya Wickremasinghe, Rodrigo N Calheiros, and Rajkumar Buyya. CloudAnalyst: A CloudSim-based visual modeller for analysing cloud computing environments and applications. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pages 446–452. IEEE, 2010.
[92] Eric Wiewiora, Garrison Cottrell, and Charles Elkan. Principled methods for advising reinforcement learning agents. In ICML, pages 792–799, 2003.
[93] www.nskinc.com. cloud-computing-101, 2015.
[94] Xenproject.org. VS15: Video spotlight with Cavium's Larry Wikelius, 2015.
[95] Andrew J. Younge, Robert Henschel, James T. Brown, Gregor von Laszewski, Judy Qiu, and Geoffrey Fox. Analysis of virtualization technologies for high performance computing environments. In IEEE International Conference on Cloud Computing, CLOUD 2011, Washington, DC, USA, 4–9 July 2011, pages 9–16, 2011.
[96] Lamia Youseff, Rich Wolski, Brent Gorda, and Chandra Krintz. Paravirtualization for HPC systems. In Frontiers of High Performance Computing and Networking – ISPA 2006 Workshops, pages 474–486. Springer, 2006.
[97] Jingling Yuan, Xuyang Miao, Lin Li, and Xing Jiang. An online energy saving resource optimization methodology for data center. Journal of Software, 8(8):1875–1880, 2013.
[98] Qi Zhang, Lu Cheng, and Raouf Boutaba. Cloud computing: state-of-the-art and research challenges. Journal of Internet Services and Applications, 1(1):7–18, 2010.
[99] Xiaoyun Zhu, Don Young, Brian J Watson, Zhikui Wang, Jerry Rolia, Sharad Singhal, Bret McKee, Chris Hyser, Daniel Gmach, Rob Gardner, et al. 1000 islands: Integrated capacity and workload management for the next generation data center. In Autonomic Computing, 2008. ICAC'08. International Conference on, pages 172–181. IEEE, 2008.
[100] Dimitrios Zissis and Dimitrios Lekkas. Addressing cloud computing security issues. Future Generation Computer Systems, 28(3):583–592, 2012.