Probabilistic Consolidation
of Virtual Machines
in Self-Organizing Cloud Data Centers
IEEE TRANSACTIONS ON CLOUD
COMPUTING, JULY-DECEMBER 2013
Speaker: Caroline
Outline
• Introduction
• Related Works
• Scenario & Performance Metrics
• ecoCloud: Assignment & Migration Procedures
• Mathematical Analysis & Experiment on Real DC
• Comparison ecoCloud & Best Fit Decreasing
• Results with Different Data Center Sizes
• Conclusion
2
Introduction
• Data Center & Power consumption
• Virtualization & Consolidation
• Paper Contribution
5
Trends in Information Technology
• Cloud computing, Big Data, IoT
• They all need large and powerful computing and storage infrastructures to support them.
6
Data Center (Server Farm)
7
Data Center (Server Farm)
• It generally includes
• Computation or storage resource
• Redundant or backup power supplies
• Redundant data communications connections
• Environmental controls
• Various security devices
8
Power Consumption
• In 2006
• The energy consumed by IT infrastructures was
about 61 billion kWh, corresponding to 1.5% of
all the electricity produced.
• 2% of the global carbon emissions, equal to the
aviation industry.
• These figures are expected to double every 5
years. [1]
9
Data Center
10
Power usage effectiveness (PUE)
• A measure of how efficiently a computer data
center uses energy: the ratio of total facility
power to the power delivered to IT equipment.
• In the past few years, typical values have
decreased from between 2 and 3 to lower than 1.1.
11
Utilization of each server
• Most of the time, servers operate at 10-50% of
their full capacity. [2], [3]
• This is caused by the variability of the
VMs' workload. [4], [5]
• The DC is planned to sustain the peaks of
load, while for long periods of time the load
is much lower.
12
Utilization of each server
• An active but idle server consumes 50-70% of
the power consumed when fully utilized. [6]
• Even when power is devoted to computing as
much as possible, the utilization rate of each
server is far from optimal.
13
Virtualization
• Many virtual machines (VMs) can be executed on
the same physical server to increase utilization.
14
Consolidation
• Allocate the maximum number of VMs to the
minimum number of physical machines [7].
• Allows unneeded servers to be
• put into a low-power state or switched off, or
• devoted to the execution of incremental workload.
15
The complexity of the problem
• The optimal assignment of VMs to PMs is analogous
to the NP-hard "Bin Packing Problem" [17], [1], [28]:
• Assigning a given set of items of variable size to the min
number of bins taken from a given set.
16
The complexity of the problem
• When the assignment must take into account
multiple server resources, it becomes a "multi-
dimensional bin packing problem."
• The VMs continuously modify their hardware
requirements.
17
In this paper
• Proposed ecoCloud, inspired by ant algorithms.
• Uses two types of probabilistic procedures:
Assignment and Migration.
• Key decisions are made by single servers,
• increasing the utilization of servers
• consolidating VMs dynamically and locally.
18
In this paper
• Extended to the multi-dimensional problem (CPU
and memory)
• Saves electricity costs while respecting the
Service Level Agreements.
19
Outline
• Introduction
• Related Works
• Scenario & Performance Metrics
• ecoCloud: Assignment & Migration Procedures
• Mathematical Analysis & Experiment on Real DC
• Comparison ecoCloud & Best Fit Decreasing
• Results with Different Data Center Sizes
• Conclusion
20
Forecast the load?
• [27] and [13] try to forecast the processing
load and aim at determining the minimum number of
servers that should be switched on to satisfy the
demand.
• How can the number of servers be set correctly?
• How can the processing load be predicted precisely?
• How should VMs be mapped to servers in a dynamic
environment?
21
Heuristic approaches?
• Optimally mapping VMs to PMs
• = Bin packing problem
• = NP-hard problem
• The heuristic approaches can only lead to
suboptimal solutions.
22
Heuristic approaches?
• The heuristic approaches presented use
• the Best Fit Decreasing algorithm [1]
• the First Fit Decreasing algorithm [28]
• the Constraint Programming paradigm [30]
• They use lower and upper utilization thresholds
to decide when to execute migrations. [29]
23
Heuristic approaches?
• These are deterministic and centralized algorithms,
• whose efficiency degrades as the size of the data
center grows.
• Mapping strategies may require the concurrent
migration of many VMs,
• causing considerable performance degradation
during the reassignment process.
24
P2P Model?
• The data center is modeled as a P2P network. [33]
• Servers explore the network to collect information
that can later be used to migrate VMs.
• The V-MAN system [34] uses a gossip protocol to
let servers communicate their state to each other.
• The complete absence of centralized control can be
seen as an obstacle by the data center administrator.
25
In the multi-resource problem
• [38] is based on the first-fit approximation.
• [39] uses an LP formulation.
• [41] performs dynamic consolidation based on
constraint programming.
• But they all rely on complex centralized
algorithms.
26
In this paper
• Adopts a probabilistic approach
• naturally scalable
• an asynchronous and smooth migration process
• Servers can autonomously decide whether to
migrate or accept a VM
• The final decisions are still left to the central
manager
27
Outline
• Introduction
• Related Works
• Scenario & Performance Metrics
• ecoCloud: Assignment & Migration Procedures
• Mathematical Analysis & Experiment on Real DC
• Comparison ecoCloud & Best Fit Decreasing
• Results with Different Data Center Sizes
• Conclusion
28
Scenario - a request arrives
29
• The data center manager selects a VM that is
appropriate for the application,
• based on the application's characteristics & the client's demand.
Scenario - assignment procedure
30
• Single servers decide whether they should
accept or reject a VM,
• using information available locally (CPU/RAM utilization).
Scenario - migration procedure
31
• A VM is migrated when its server is highly
underutilized or possibly causing overload situations.
• A migration is requested,
• and the server that will host the migrating VM is chosen.
Performance metrics
• Resource utilization
• Number of active servers
• Consumed power
• Frequency of migrations and server switches
• SLA violations
32
Outline
• Introduction
• Related Works
• Scenario & Performance Metrics
• ecoCloud: Assignment & Migration Procedures
• Mathematical Analysis & Experiment on Real DC
• Comparison ecoCloud & Best Fit Decreasing
• Results with Different Data Center Sizes
• Conclusion
33
Scenario - assignment procedure
34
• Performed when a client asks the data center to
execute a new application.
• The manager delegates the main part of the procedure
to single servers.
Scenario - assignment procedure
35
Reject?
Accept?
It depends on the server's utilization.
Scenario - assignment procedure
36
Overutilization might cause
overload situations.
With underutilization, the objective is
to put the server into a sleep mode
and save energy.
Scenario - assignment procedure
• The decision is taken by performing a Bernoulli trial.
• The success probability of this trial is equal to the
value of the overall assignment function.
37
Scenario - assignment procedure
• X (0-1): the relative utilization of a resource.
• T: the maximum allowed utilization.
• p: the shape parameter.
• Mp: the factor used to normalize the maximum
value to 1.
38
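The assignment function itself appears only as an image on the slide. A minimal Python sketch, assuming the form f(x) = x^p (T - x) / Mp used in the ecoCloud paper (with Mp chosen so the maximum, at x = pT/(p+1), is normalized to 1), makes the Bernoulli trial concrete:

```python
import random

def assignment_probability(x, p=3.0, T=0.9):
    """Single-resource assignment function f(x) = x^p * (T - x) / Mp.

    x: relative utilization in [0, 1]; T: maximum allowed utilization;
    p: shape parameter. Mp normalizes the maximum (at x = p*T/(p+1))
    to 1. The analytic form is assumed from the ecoCloud paper.
    """
    if x < 0 or x >= T:
        return 0.0
    Mp = (p ** p) * (T ** (p + 1)) / ((p + 1) ** (p + 1))
    return (x ** p) * (T - x) / Mp

def accepts_vm(cpu_util, p=3.0, T=0.9, rng=random.random):
    """Bernoulli trial: the server accepts the VM with probability f(x)."""
    return rng() < assignment_probability(cpu_util, p, T)
```

Note how the probability is low both near zero utilization (to let nearly empty servers drain) and near T (to avoid overload), peaking in between.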
Scenario - assignment procedure
39
Scenario - assignment procedure
• This figure shows the graph of the single-
resource assignment function
• for several values of the parameter p, with T = 0.9.
40
Scenario - assignment procedure
• us, ms: the current CPU and RAM utilization
at server s.
• pu, pm: the shape parameters.
• Tu, Tm: the respective maximum utilizations.
41
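The two-resource equation is likewise an image on the slide. Assuming, as in the paper, that the overall assignment function is the product of the CPU and RAM single-resource functions:

```python
def overall_assignment_probability(u_s, m_s,
                                   p_u=3.0, p_m=3.0, T_u=0.9, T_m=0.9):
    """Overall assignment function for server s: the product of the
    single-resource functions evaluated on CPU utilization u_s and
    RAM utilization m_s (the product form is assumed from the paper).
    """
    def f(x, p, T):
        # single-resource function, normalized so its maximum is 1
        if x < 0 or x >= T:
            return 0.0
        Mp = (p ** p) * (T ** (p + 1)) / ((p + 1) ** (p + 1))
        return (x ** p) * (T - x) / Mp
    return f(u_s, p_u, T_u) * f(m_s, p_m, T_m)
```

With the product form, a server refuses the VM whenever either resource is at risk, since one factor going to zero drives the overall probability to zero.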
Scenario - assignment procedure
• If the Bernoulli trial is successful,
• the server communicates its availability to the data
center manager.
• The manager selects one of the available servers and
assigns the new VM to it.
42
Yes
Scenario - assignment procedure
• If the Bernoulli trial fails everywhere,
• the current number of active servers is not sufficient.
• The manager wakes up an inactive server and
requests it to run the new VM.
43
No
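The two outcomes above can be sketched from the manager's side. The server representation and function names here are illustrative assumptions, not from the paper:

```python
def place_vm(vm, active_servers, inactive_servers, accepts):
    """Manager-side sketch of the assignment outcome.

    Each active server runs its Bernoulli trial via `accepts(server, vm)`;
    the manager assigns the VM to any server that declared itself
    available, otherwise it wakes an inactive server and requests it
    to run the new VM.
    """
    available = [s for s in active_servers if accepts(s, vm)]
    if available:
        host = available[0]            # pick any available server
    else:
        host = inactive_servers.pop()  # wake a sleeping server
        active_servers.append(host)
    host.setdefault('vms', []).append(vm)
    return host
```

This keeps the manager's role minimal: it never inspects per-server load itself, only collects yes/no answers.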
Scenario - migration procedure
44
• Application workload changes with time
• VMs terminate or reduce demand → underutilized
• VMs increase their requirements → overutilized
Scenario - migration procedure
• Each server monitors its CPU and RAM
utilization
• using the libraries provided by the virtualization
infrastructure (e.g., VMware or Hyper-V).
• Tl: the lower threshold
• Th: the upper threshold
45
Scenario - migration procedure
• Each server evaluates the corresponding
probability function, f_l^migrate or f_h^migrate.
• X: the utilization of the given resource.
46
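The migration functions are shown only graphically on the next slide. The sketch below uses power-law shapes that match the description qualitatively (zero at the threshold, rising toward 1 at the extreme) and the shape parameters α = β = 0.25 used later; treat the exact expressions as assumptions, not the paper's formulas:

```python
def migrate_low_prob(x, T_l=0.3, alpha=0.25):
    """Low-utilization migration function f_l^migrate:
    0 at x = T_l, rising to 1 as the server empties (x -> 0)."""
    if x >= T_l:
        return 0.0
    return (1.0 - x / T_l) ** alpha

def migrate_high_prob(x, T_h=0.8, beta=0.25):
    """High-utilization migration function f_h^migrate:
    0 at x = T_h, rising to 1 at full utilization (x -> 1)."""
    if x <= T_h:
        return 0.0
    return ((x - T_h) / (1.0 - T_h)) ** beta
```

A small exponent (0.25) makes the probability ramp up quickly just past the threshold, so migrations start promptly but remain probabilistic rather than synchronized.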
Scenario - migration procedure
• This figure shows the graph of the single-
resource migration function
• for several values of the parameters α and β, with Tl = 0.3 and Th = 0.8.
47
Scenario - migration procedure
• Whenever a Bernoulli trial is successful,
• the server will choose a VM whose
• resource utilization > current server's
utilization - Th.
48
• Current server's utilization: 0.9
• Th: 0.8
• VM1 utilization: 0.05
• VM2 utilization: 0.2
• VM3 utilization: 0.01
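The selection rule, with the slide's example numbers, can be sketched as:

```python
def pick_vm_to_migrate(server_util, vm_utils, T_h=0.8):
    """Select a VM whose utilization exceeds the server's excess over
    T_h, so that a single migration brings the server back under the
    threshold. Returning the smallest qualifying VM (to keep the
    migration cheap) is a tie-breaking assumption, not stated on the
    slide."""
    excess = server_util - T_h
    candidates = [u for u in vm_utils if u > excess]
    return min(candidates) if candidates else None
```

With the server at 0.9 and Th = 0.8, the excess to shed is 0.1; of the three VMs only VM2 (0.2) exceeds it, so VM2 is the one migrated.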
Scenario - migration procedure
• The choice of the new server is made by the assignment
procedure, with 2 differences:
• The threshold T of the assignment function is set to 0.9
times the resource utilization of the source server.
• This ensures migration to a less loaded server and
avoids multiple migrations of the same VM.
49
Scenario - migration procedure
• The second difference concerns migration
from a lightly loaded server.
• When no server is available to run a migrating
VM, it would not be acceptable to switch on a
new server.
50
Scenario - migration procedure
• This paper's approach ensures a gradual and
continuous migration process.
• The data center administrator can set
• threshold values & shape parameters
• to choose among different consolidation strategies
(e.g., conservative, intermediate, aggressive).
51
51
Outline
• Introduction
• Related Works
• Scenario & Performance Metrics
• ecoCloud: Assignment & Migration Procedures
• Mathematical Analysis & Experiment on Real DC
• Comparison ecoCloud & Best Fit Decreasing
• Results with Different Data Center Sizes
• Conclusion
52
Mathematical Analysis
• Ns: the number of servers in a data center
• Nc: the number of cores in each server
• Nv: the number of VMs that can be executed
on each core
53
Mathematical Analysis
• It is assumed that two types of VMs are
executed in the data center:
• CPU-bound (C-type)
• RAM-bound (M-type)
• C-type VMs demand more CPU than M-type ones, by a factor γC > 1.
• M-type VMs demand more RAM than C-type ones, by a factor γM > 1.
54
Power Consume
• As the CPU utilization increases, the consumed
power can be assumed to increase linearly. [13][14]
• In analytical and simulation experiments
presented in this study, the power consumed by
a single server is expressed as:
55
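The power expression itself is an image on the slide. A sketch assuming the standard linear model, consistent with the Pmax and Pidle values given on the next slide:

```python
def server_power(u, p_max=250.0, idle_frac=0.7):
    """Linear power model for an active server:
    P(u) = P_idle + (P_max - P_idle) * u, with CPU utilization u in
    [0, 1]. A common form in the cited literature; the slide's exact
    expression is not shown, so treat this as an assumption."""
    p_idle = idle_frac * p_max
    return p_idle + (p_max - p_idle) * u
```

With Pmax = 250 W and Pidle = 0.7 * Pmax = 175 W, an idle server already draws 70% of peak power, which is why consolidating VMs and switching idle servers off saves so much energy.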
Mathematical Analysis
• To analyze the behavior of the system, an
experiment with parameters as follow:
• Ns = 100 (servers)
• Nc = 6 (cores)
• CPU frequency = 2 GHz
• RAM = 4GB
• Each VM uses a CPU frequency of 500 MHz → Nv = 4
56
Mathematical Analysis
• Power consumption
• Pmax = 250 W
• Pidle = 0.7 * Pmax = 175 W
• The average CPU (memory) load of the DC is
defined as the ratio between
• Total amount of CPU (RAM) required by VMs
• Corresponding CPU (RAM) capacity of the DC
• Denoted as ρC (ρM)
57
Mathematical Analysis
• Initial CPU and RAM utilizations = 40% of the
server capacity; T = 0.9 and p = 3.
• Without ecoCloud:
• 100 active servers
• with CPU/RAM utilization around 40%
• With ecoCloud:
• 45 active servers
• nearly halving the consumed power
58
Mathematical Analysis
• Next we considered the values of γC and γM
• Different ratios between the CPU and RAM
demanded by the two types of VMs.
• In the test cases:
• 1.0 (the two kinds of applications coincide)
• 1.5 (C-type need 50% more CPU than M-type)
• 2.0 (C-type need 100% more CPU than M-type)
• 4.0 (C-type need 300% more CPU than M-type)
59
Mathematical Analysis
60
Mathematical Analysis
• Such an efficient consolidation is possible
• when the overall loads of CPU and RAM are
comparable (ρC = ρM = 0.4).
• In the next experiment:
• ρC = 0.4
• ρM = 0.2-0.6
• γC and γM are both set to 4.0
61
Mathematical Analysis
62
Mathematical Analysis
63
Experiment on Real Data Center
• The experiments were performed in May 2013 on a live DC
owned by a major telecommunications operator.
• The experiment was run on 28 servers virtualized
with the platform VMWare vSphere 4.0.
• 2 with CPU Xeon 32 cores and 256-GB RAM
• 8 with CPU Xeon 24 cores and 100-GB RAM
• 11 with CPU Xeon 16 cores and 64-GB RAM
• 7 with CPU Xeon 8 cores and 32-GB RAM.
64
Experiment on Real Data Center
• The servers hosted 447 VMs, assigned
• a number of virtual cores varying between 1 and 4
• an amount of RAM varying between 1 and 16 GB.
• M-type: 358 (80%) & C-type: 88 (20%)
• M-type VMs contributed
• 49.44% of the overall CPU load
• 92.15% of the overall memory load.
65
Experiment on Real Data Center
• Network adapters with bandwidth of 10 Gbps.
• Assignment procedure:
• T = 0.8 (imposed by the data center administrator)
• p = 3
• Migration procedure:
• Th = 0.95, Tl = 0.5
• Shape parameters α = β = 0.25
66
Experiment on Real Data Center
67
Experiment on Real Data Center
68
Experiment on Real Data Center
69
Experiment on Real Data Center
70
Experiment on Real Data Center
71
Outline
• Introduction
• Related Works
• Scenario & Performance Metrics
• ecoCloud: Assignment & Migration Procedures
• Mathematical Analysis & Experiment on Real DC
• Comparison ecoCloud & Best Fit Decreasing
• Results with Different Data Center Sizes
• Conclusion
72
Comparison Between ecoCloud & BFD
• Implement a variant of the classical Best Fit
Decreasing algorithm described and analyzed in [1]
• It was proved in [18] that BFD is the
polynomial-time algorithm that gives the best results in
terms of effectiveness.
73
Comparison Between ecoCloud & BFD
74
VMs of over-utilized and under-utilized
servers are collected.
Comparison Between ecoCloud & BFD
75
And then they are sorted in decreasing
order of CPU utilization.
100MHz 80MHz 60MHz 55MHz 40MHz
Comparison Between ecoCloud & BFD
• Each VM is allocated to the server that provides
the smallest increase of the power consumption.
76
100MHz 80MHz 60MHz 55MHz 40MHz
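The two BFD steps above (sort the collected VMs by decreasing CPU demand, then place each where power grows least) can be sketched as follows; the server representation is an illustrative assumption:

```python
def bfd_place(vm_demands_mhz, servers):
    """Power-aware Best Fit Decreasing sketch.

    vm_demands_mhz: CPU demands of the VMs to (re)place.
    servers: dicts with 'cap' and 'used' CPU in MHz (assumed shape).
    Each VM, largest first, goes to the feasible server whose power
    consumption increases least under a linear power model.
    """
    def power(used_mhz, cap_mhz, p_max=250.0, p_idle=175.0):
        # an off (empty) server draws no power; active servers follow
        # the linear model, so activating a new server is expensive
        if used_mhz == 0:
            return 0.0
        return p_idle + (p_max - p_idle) * (used_mhz / cap_mhz)

    for vm in sorted(vm_demands_mhz, reverse=True):
        feasible = [s for s in servers if s['used'] + vm <= s['cap']]
        best = min(feasible, key=lambda s: power(s['used'] + vm, s['cap'])
                                           - power(s['used'], s['cap']))
        best['used'] += vm
    return servers
```

Because turning on an empty server costs the full idle power, this criterion naturally packs VMs onto already-active servers, which is the consolidating behavior BFD is used for here.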
Comparison Between ecoCloud & BFD
• A key parameter is the interval of time between
two executions of the algorithm.
• Experiments were run with four different values of the
interval: 1, 5, 15, and 60 minutes.
• A home-made Java simulator, fed with the
logs of real VMs, is used to compare ecoCloud and BFD
in a data center with 400 servers.
77
Comparison Between ecoCloud & BFD
• The traces represent the CPU utilization of 6,000
VMs, monitored in March/April 2012 and
updated every 5 minutes.
• Since the CPU is the only resource considered in
[1], we also consider only this resource in the
experiments reported below.
78
Comparison Between ecoCloud & BFD
• The VMs were assigned to 400 servers, using the
ecoCloud and BFD algorithms for VM assignment
and migration.
• The servers are all equipped with 2-GHz cores:
• 1/3 with 4 cores, 1/3 with 6 cores, 1/3 with 8 cores.
• Ta = 0.90, Tl = 0.50, Th = 0.95
• α = 0.25, and β = 0.25.
79
Comparison Between ecoCloud & BFD
80
Comparison Between ecoCloud & BFD
81
Comparison Between ecoCloud & BFD
82
Comparison Between ecoCloud & BFD
83
Outline
• Introduction
• Related Works
• Scenario & Performance Metrics
• ecoCloud: Assignment & Migration Procedures
• Mathematical Analysis & Experiment on Real DC
• Comparison ecoCloud & Best Fit Decreasing
• Results with Different Data Center Sizes
• Conclusion
84
Results with Different Data Center Sizes
• In small systems, it can happen that all the
servers reject the VM.
• even when some of them have enough spare
CPU to accommodate the VM.
• The probability of this event becomes negligible
in large data centers,
• so a server is activated only when strictly needed.
85
Results with Different Data Center Sizes
• Simulations with data centers of different size
• 100, 200, 400, and 3,000 servers
• Using the VM traces described in the previous
section
86
Outline
• Introduction
• Related Works
• Scenario & Performance Metrics
• ecoCloud: Assignment & Migration Procedures
• Mathematical Analysis & Experiment on Real DC
• Comparison ecoCloud & Best Fit Decreasing
• Results with Different Data Center Sizes
• Conclusion
87
Conclusion
• This paper tackles the issue of energy-
related costs in data centers and Cloud
infrastructures.
• The aim is to consolidate the VMs on as
few PMs as possible
• to minimize power consumption and carbon
emissions,
• while ensuring a good level of the QoS experienced by users.
88
Conclusion
• Proposed a mapping of VMs based on
Bernoulli trials,
• where single servers decide using local
information.
• ecoCloud is particularly efficient in large data
centers.
89
Conclusion
• Mathematical analysis and experiments in a real DC
prove that ecoCloud can
• Reduce power consumption
• Avoid overload events that cause SLA violations
• Limit the number of VM migrations and server
switches
• Balance CPU-bound and RAM-bound applications.
90
 
Cloudsim & greencloud
Cloudsim & greencloud Cloudsim & greencloud
Cloudsim & greencloud
 
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
 
Deco1
Deco1Deco1
Deco1
 
Mod05lec24(resource mgmt i)
Mod05lec24(resource mgmt i)Mod05lec24(resource mgmt i)
Mod05lec24(resource mgmt i)
 

Recently uploaded

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 

Recently uploaded (20)

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 

Probabilistic consolidation of virtual machines in self organizing cloud data centers

  • 1. Probabilistic Consolidation of Virtual Machines in Self-Organizing Cloud Data Centers IEEE TRANSACTIONS ON CLOUD COMPUTING, JULY-DECEMBER 2013 Speaker: Caroline
  • 2. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 2
  • 3. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 3
  • 4. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 4
  • 5. Introduction • Data Center & Power consumption • Virtualization & Consolidation • Paper Contribution 5
  • 6. • Cloud computing, Big Data, IoT • They all need large and powerful computing and storage infrastructures. 6 Trends in Information Technology
  • 8. Data Center (Server Farm) • It generally includes • Computation or storage resources • Redundant or backup power supplies • Redundant data communications connections • Environmental controls • Various security devices 8
  • 9. Power Consumption • In 2006 • The energy consumed by IT infrastructures was about 61 billion kWh, corresponding to 1.5% of all the produced electricity. • 2% of the global carbon emissions, equal to the aviation industry. • These figures are expected to double every 5 years. [1] 9
  • 11. Power usage effectiveness (PUE) • A measure of how efficiently a computer data center uses energy. • In the past few years, typical values have decreased from between 2 and 3 to lower than 1.1. 11
  • 12. • Most of the time servers operate at 10-50% of their full capacity. [2], [3] • This is caused by the variability of the VMs’ workload. [4],[5] • The DC is provisioned to sustain the load peaks, while for long periods of time the load is much lower. 12 Utilization of each server
  • 13. • An active but idle server consumes 50-70% of the power consumed when fully utilized. [6] • Even when the power is devoted to computing as much as possible, the utilization of each individual server is far from optimal. 13 Utilization of each server
  • 14. Virtualization • Many virtual machines (VMs) can be executed on the same physical server to increase utilization. 14
  • 15. Consolidation • Allocate the max number of VMs on the min number of physical machines [7]. • Allows unneeded servers to be • Put into a low-power state or switched off • Devoted to the execution of incremental workload. 15
  • 16. The complexity of the problem • The optimal assignment of VMs to PMs is analogous to the NP-hard “Bin Packing Problem” [17], [1], [28] • Assigning a given set of items of variable size to the min number of bins taken from a given set. 16
  • 17. The complexity of the problem • Since the assignment should take into account multiple server resources, it becomes a “multi-dimensional bin packing problem” • The VMs continuously modify their hardware requirements. 17
  • 18. In this paper • Proposed ecoCloud, inspired by ant algorithms. • Uses two types of probabilistic procedures: Assignment and Migration. • Key decisions are made by single servers, • Increasing the utilization of servers • Consolidating VMs dynamically and locally. 18
  • 19. In this paper • Extended to the multi-dimension problem (CPU and memory) • Saves electricity costs while respecting the Service Level Agreements. 19
  • 20. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 20
  • 21. Forecast the load? • [27] and [13] try to forecast the processing load and aim at determining the min number of servers that should be switched on to satisfy the demand. • How to correctly set the number of servers? • How to predict the processing load precisely? • How do the VMs map to servers in a dynamic environment? 21
  • 22. Heuristic approaches? • Optimally mapping VMs to PMs • = Bin packing problem • = NP-hard problem • The heuristic approaches can only lead to suboptimal solutions. 22
  • 23. Heuristic approaches? • The heuristic approaches presented use • the Best Fit Decreasing algorithm. [1] • the First Fit Decreasing algorithm. [28] • the Constraint Programming paradigm. [30] • They use lower and upper utilization thresholds to decide when to execute migration. [29] 23
  • 24. Heuristic approaches? • Deterministic and centralized algorithms, • Efficiency degrades as the size of the data center grows. • Mapping strategies may require the concurrent migration of many VMs • Causing considerable performance degradation during the reassignment process. 24
  • 25. P2P Model? • The data center is modeled as a P2P network. [33] • Servers explore the network to collect information that can later be used to migrate VMs. • In the V-MAN system [34], servers use a gossip protocol to communicate their state to each other. • The complete absence of centralized control can be seen as an obstacle by the data center administrator. 25
  • 26. In the multi-resource problem • Based on the first-fit approximation. [38] • Using an LP formulation [39]. • Performing dynamic consolidation based on constraint programming. [41] • But they all rely on complex centralized algorithms. 26
  • 27. In this paper • Adopts a probabilistic approach • naturally scalable • an asynchronous and smooth migration process • Servers can autonomously decide whether or not to migrate or accept a VM • The final decisions are still left to the central manager 27
  • 28. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 28
  • 29. Scenario - request arrival 29 • The data center manager selects a VM that is appropriate for the application. • Application characteristics & client demand
  • 30. Scenario - assignment procedure 30 • Single servers decide whether they should accept or reject a VM • using information available locally (CPU/RAM utilization)
  • 31. Scenario - migration procedure 31 • A VM is migrated when its server is highly underutilized or possibly causing overload situations. • The server requests a VM migration • The manager chooses the server that will host the migrating VM
  • 32. Performance metrics • Resource utilization • Number of active servers • Consumed power • Frequency of migrations and server switches • SLA violations 32
  • 33. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 33
  • 34. Scenario - assignment procedure 34 • Performed when a client asks the data center to execute a new application. • The manager delegates the main part of the procedure to single servers
  • 35. Scenario - assignment procedure 35 Reject? Accept? Depends on the server’s utilization
  • 36. Scenario - assignment procedure 36 Overutilization might cause overload situations Underutilization: the objective is to put the server in a sleep mode and save energy
  • 37. Scenario - assignment procedure • The decision is taken by performing a Bernoulli trial. • The success probability for this trial is equal to the value of the overall assignment function. 37
  • 38. Scenario - assignment procedure • x (0-1): the relative utilization of a resource. • T: the maximum allowed utilization. • p: the shape parameter • Mp: the factor used to normalize the max value to 1 38
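The assignment function itself appears only as a figure in the slides; a minimal Python sketch, assuming the polynomial form used in the ecoCloud paper, f(x) = x^p (T - x) / Mp for 0 <= x < T and 0 elsewhere, shows how these parameters interact:

```python
import random

def assignment_probability(x, p=3.0, T=0.9):
    """Single-resource assignment function (assumed form x^p * (T - x) / Mp).
    It grows with utilization x, then drops to 0 as x approaches the
    threshold T; Mp normalizes the maximum value to 1."""
    if x < 0.0 or x >= T:
        return 0.0
    x_peak = p * T / (p + 1)          # where x^p * (T - x) is maximal
    Mp = (x_peak ** p) * (T - x_peak)
    return (x ** p) * (T - x) / Mp

def accept_vm(x, p=3.0, T=0.9):
    """Bernoulli trial: the server accepts the VM with probability f(x)."""
    return random.random() < assignment_probability(x, p, T)
```

With p = 3 and T = 0.9 the probability peaks at x = 0.675, so servers that are already well utilized, but not close to the threshold, are the most likely to accept new VMs, which matches the consolidating behavior described on the next slides.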
  • 39. Scenario - assignment procedure 39
  • 40. Scenario - assignment procedure • This figure shows the graph of the single-resource assignment function • for some values of the parameter p, and T=0.9. 40
  • 41. Scenario - assignment procedure • us, ms: the current CPU and RAM utilization at server s. • pu, pm: the shape parameters. • Tu, Tm: the respective maximum utilizations 41
  • 42. Scenario - assignment procedure • If the Bernoulli trial is successful • The server communicates its availability to the data center manager. • The manager selects one of the available servers, and assigns the new VM to it. 42 Yes
  • 43. Scenario - assignment procedure • If the Bernoulli trial is unsuccessful on all servers • The current number of active servers is not sufficient. • The manager wakes up an inactive server and requests it to run the new VM 43 No
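The manager-side flow of the two slides above can be sketched end to end; `accept_prob` and the server names are hypothetical stand-ins for each server's local assignment function, not identifiers from the paper:

```python
import random

def place_vm(active_servers, accept_prob, inactive_pool):
    """Sketch of the assignment procedure as seen by the manager:
    every active server runs an independent Bernoulli trial with its
    local acceptance probability; the manager assigns the VM to one
    available server, or wakes an inactive server if none accepted."""
    available = [s for s in active_servers if random.random() < accept_prob(s)]
    if available:
        return random.choice(available)   # manager picks an available server
    if inactive_pool:
        s = inactive_pool.pop()           # wake a sleeping server
        active_servers.append(s)
        return s
    raise RuntimeError("data center is out of capacity")
```

Because each server decides with only local information, the manager never needs a global view of the data center, which is the source of the scalability claimed later in the deck.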
  • 44. Scenario - migration procedure 44 • Application workload changes with time • VMs terminate or reduce demand → underutilized • VMs increase their requirements → overutilized
  • 45. Scenario - migration procedure • Each server monitors its CPU and RAM utilization • using the libraries provided by the virtualization infrastructure (e.g., VMWare or Hyper-V) • Tl: the lower threshold • Th: the upper threshold 45
  • 46. Scenario - migration procedure • Each server evaluates the corresponding probability function, fl migrate or fh migrate • x: the utilization of a given resource 46
  • 47. Scenario - migration procedure • This figure shows the graph of the single-resource migration function • for some values of the parameters α, β, Tl=0.3, Th=0.8 47
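The migration functions are shown only as a figure; a sketch assuming the shapes used in the ecoCloud paper, where the low function is maximal at x = 0 and vanishes at Tl, and the high function vanishes at Th and is maximal at x = 1:

```python
def migrate_low(x, Tl=0.3, alpha=0.25):
    """Probability that an underloaded server triggers a migration:
    1 at x = 0, decreasing to 0 at the lower threshold Tl."""
    return (1.0 - x / Tl) ** alpha if 0.0 <= x < Tl else 0.0

def migrate_high(x, Th=0.8, beta=0.25):
    """Probability that an overloaded server triggers a migration:
    0 at the upper threshold Th, increasing to 1 as x approaches 1."""
    return ((x - Th) / (1.0 - Th)) ** beta if Th < x <= 1.0 else 0.0
```

Small shape parameters (such as the 0.25 used in the experiments later in the deck) make both probabilities rise steeply as soon as the thresholds are crossed, giving a more aggressive consolidation strategy.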
  • 48. Scenario - migration procedure • Whenever a Bernoulli trial succeeds • The server will choose a VM whose • Utilization of the resource > Current server’s utilization - Th 48 • Current server’s utilization: 0.9 • Th: 0.8 • VM1 utilization: 0.05 • VM2 utilization: 0.2 • VM3 utilization: 0.01
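The selection rule and the worked example on this slide translate directly into code; the helper name is illustrative:

```python
def eligible_vms(server_util, vm_utils, Th=0.8):
    """A VM can be chosen for migration only if moving it away would
    bring the server back under the upper threshold, i.e.
    utilization(VM) > server_util - Th."""
    return [u for u in vm_utils if u > server_util - Th]

# Slide example: server at 0.9, Th = 0.8, VMs at 0.05, 0.2 and 0.01
# -> only the VM using 0.2 qualifies (0.2 > 0.1).
chosen = eligible_vms(0.9, [0.05, 0.2, 0.01])
```

Migrating either of the two small VMs would leave the server above Th, so only the 0.2 VM resolves the overload in one move.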
  • 49. Scenario - migration procedure • The choice of the new server is made by the assignment procedure, with 2 differences: • The threshold T of the assignment function is set to 0.9 times the resource utilization of the source server. • This ensures the VM migrates to a less loaded server, and avoids multiple migrations of the same VM. 49
  • 50. Scenario - migration procedure • The second difference concerns the migration from a lightly loaded server. • When no server is available to run a migrating VM, it would not be acceptable to switch on a new server. 50
  • 51. Scenario - migration procedure • This paper’s approach ensures a gradual and continuous migration process. • The data center administrator can set • threshold values & shape parameters • To choose different consolidation strategies (e.g. conservative, intermediate, aggressive) 51
  • 52. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 52
  • 53. Mathematical Analysis • Ns :the number of servers in a data center • Nc :the number of cores in each server • Nv :the number of VMs that can be executed in each core. 53
  • 54. Mathematical Analysis • It is assumed that two types of VMs are executed in the data center. • CPU-bound (C-type) • RAM-bound (M-type) • C-type’s CPU demand exceeds M-type’s by a factor γC > 1 • M-type’s RAM demand exceeds C-type’s by a factor γM > 1 54
  • 55. Power Consumption • As the CPU utilization increases, the consumed power can be assumed to increase linearly. [13][14] • In the analytical and simulation experiments presented in this study, the power consumed by a single server is expressed as: 55
  • 56. Mathematical Analysis • To analyze the behavior of the system, an experiment was run with parameters as follows: • Ns = 100 (servers) • Nc = 6 (cores) • CPU frequency = 2 GHz • RAM = 4GB • VMs’ CPU frequency use = 500 MHz → Nv = 4 56
  • 57. Mathematical Analysis • Power consumption • Pmax = 250 W • Pidle = 0.7 * Pmax = 175 W • The average CPU (memory) load of the DC is defined as the ratio between • Total amount of CPU (RAM) required by VMs • Corresponding CPU (RAM) capacity of the DC • Denoted as ρC (ρM) 57
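The power formula on slide 55 is an image that did not survive extraction; assuming the standard linear model of [13], [14], consistent with the Pmax and Pidle = 0.7 * Pmax values on this slide:

```python
def server_power(u, P_max=250.0, idle_frac=0.7):
    """Linear power model: an active but idle server already draws
    P_idle = idle_frac * P_max, and consumption grows linearly with
    the CPU utilization u in [0, 1] up to P_max."""
    P_idle = idle_frac * P_max
    return P_idle + (P_max - P_idle) * u
```

This model is why consolidation pays off: two servers at 40% utilization draw 2 x 205 W = 410 W, while a single server at 80% draws only 235 W, and the emptied server can be switched off entirely.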
  • 58. Mathematical Analysis • Initial CPU and RAM utilizations = 40% of the server capacity. T = 0.9, and p = 3. • Without ecoCloud • 100 active servers • with CPU/RAM utilization around 40% • With ecoCloud • 45 active servers • nearly halving the consumed power 58
  • 59. Mathematical Analysis • Next we considered the values of γC and γM • Different ratios between the CPU and RAM demanded by the two types of VMs. • In the test cases: • 1.0 (the two kinds of applications coincide) • 1.5 (C-type needs 50% more CPU than M-type) • 2.0 (C-type needs 100% more CPU than M-type) • 4.0 (C-type needs 300% more CPU than M-type) 59
  • 61. Mathematical Analysis • Such an efficient consolidation is possible • when the overall loads of CPU and RAM are comparable. (ρC =ρM =0.4) • In next experiment • ρC = 0.4 • ρM = 0.2-0.6 • γC and γM are set to 4.0 61
  • 64. Experiment on Real Data Center • The experiments were done in May 2013 on a live DC owned by a major telecommunications operator. • The experiment was run on 28 servers virtualized with the platform VMWare vSphere 4.0. • 2 with CPU Xeon 32 cores and 256-GB RAM • 8 with CPU Xeon 24 cores and 100-GB RAM • 11 with CPU Xeon 16 cores and 64-GB RAM • 7 with CPU Xeon 8 cores and 32-GB RAM. 64
  • 65. Experiment on Real Data Center • The servers hosted 447 VMs, each assigned • a number of virtual cores varying between 1 - 4 • an amount of RAM varying between 1 - 16 GB. • M-type: 358 (80%) & C-type: 88 (20%) • M-type VMs contributed • 49.44% of the overall CPU load • 92.15% of the overall memory load. 65
  • 66. Experiment on Real Data Center • Network adapters with bandwidth of 10 Gbps. • Assignment procedure • T = 0.8 (imposed by the data center administrator) • p = 3. • Migration procedure • Th = 0.95, Tl = 0.5 • Shape parameters α and β = 0.25 66
  • 67. Experiment on Real Data Center 67
  • 68. Experiment on Real Data Center 68
  • 69. Experiment on Real Data Center 69
  • 70. Experiment on Real Data Center 70
  • 71. Experiment on Real Data Center 71
  • 72. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 72
  • 73. Comparison Between ecoCloud & BFD • Implements a variant of the classical Best Fit Decreasing algorithm described and analyzed in [1] • It was proved in [18] that the BFD algorithm is the polynomial algorithm that gives the best results in terms of effectiveness. 73
  • 74. Comparison Between ecoCloud & BFD 74 VMs of over-utilized and under-utilized servers are collected.
  • 75. Comparison Between ecoCloud & BFD 75 And then they are sorted in decreasing order of CPU utilization. 100MHz 80MHz 60MHz 55MHz 40MHz
  • 76. Comparison Between ecoCloud & BFD • Each VM is allocated to the server that provides the smallest increase of the power consumption. 76 100MHz 80MHz 60MHz 55MHz 40MHz
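The three BFD steps above (collect, sort by decreasing CPU, place where power grows least) can be sketched as follows; the server capacities, the linear power model, and the activation cost are illustrative assumptions, not details given on the slides:

```python
def bfd_placement(vm_loads, capacities, P_max=250.0, P_idle=175.0):
    """Best Fit Decreasing sketch: sort VMs by decreasing CPU demand,
    then place each one on the feasible server whose power draw grows
    least. An inactive server jumps from 0 W to P_idle when first used,
    so BFD naturally prefers servers that are already on."""
    loads = [0.0] * len(capacities)      # current CPU load per server
    placement = []                       # chosen server index per sorted VM
    for vm in sorted(vm_loads, reverse=True):
        best, best_delta = None, float("inf")
        for s, cap in enumerate(capacities):
            if loads[s] + vm > cap:
                continue                  # not enough spare CPU
            delta = (P_max - P_idle) * vm / cap
            if loads[s] == 0.0:
                delta += P_idle           # cost of switching the server on
            if delta < best_delta:
                best, best_delta = s, delta
        if best is None:
            raise RuntimeError("no server can host this VM")
        loads[best] += vm
        placement.append(best)
    return placement
```

Unlike ecoCloud's local Bernoulli trials, this loop needs a global, synchronized view of every server, which is the scalability weakness the comparison in the following slides explores.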
  • 77. Comparison Between ecoCloud & BFD • A key parameter is the interval of time between two executions of the algorithm. • Experiments with four different values of the interval: 1, 5, 15, and 60 minutes. • Use a home-made Java simulator fed with the logs of real VMs to compare ecoCloud and BFD in a data center with 400 servers. 77
  • 78. Comparison Between ecoCloud & BFD • The traces represent the CPU utilization of 6,000 VMs, monitored in March/April 2012 and updated every 5 minutes. • Since the CPU is the only resource considered in [1], we also consider this resource only for the experiments reported below. 78
  • 79. Comparison Between ecoCloud & BFD • Assigned the VMs to 400 servers, using the ecoCloud and BFD algorithms for assignment and migration of VMs. • servers are all equipped with 2-GHz cores. • 1/3 (4 cores)、1/3 (6 cores)、1/3 (8 cores) • Ta = 0.90, Tl = 0.50, Th = 0.95 • α = 0.25, and β = 0.25. 79
  • 84. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 84
  • 85. Results with Different Data Center Sizes • In small systems, it can happen that all the servers reject the VM. • even when some of them have enough spare CPU to accommodate the VM. • The probability of this event becomes negligible in large data centers. • a server is activated only when strictly needed. 85
  • 86. Results with Different Data Center Sizes • Simulations with data centers of different size • 100, 200, 400, and 3,000 servers • Using the VM traces described in the previous section 86
  • 87. Outline • Introduction • Related Works • Scenario & Performance Metrics • ecoCloud: Assignment & Migration Procedures • Mathematical Analysis & Experiment on Real DC • Comparison ecoCloud & Best Fit Decreasing • Results with Different Data Center Sizes • Conclusion 87
  • 88. Conclusion • This paper tackles the issue of energy-related costs in data centers and Cloud infrastructures. • The aim is to consolidate the VMs on as few PMs as possible • Minimizing power consumption and carbon emissions. • While ensuring a good level of the QoS experienced. 88
  • 89. Conclusion • Proposed the mapping of VMs based on Bernoulli trials. • Single servers decide on the basis of local information. • ecoCloud is particularly efficient in large data centers. 89
  • 90. Conclusion • Mathematical analysis and experiments in a real DC prove that ecoCloud can • Reduce power consumption • Avoid overload events that cause SLA violations • Limit the number of VM migrations and server switches • Balance CPU-bound and RAM-bound applications. 90

Editor's Notes

  1. Hi everyone, I’m today’s first speaker, Caroline. The paper I want to present is Probabilistic Consolidation of Virtual Machines in Self-Organizing Cloud Data Centers, published in IEEE Transactions on Cloud Computing, July-December 2013.
  2. This is the outline: first the introduction, then the related work. Next is a general description of the scenario and the performance metrics. Then we define the two components of ecoCloud: the assignment and migration procedures.
  3. Next I will introduce the evaluation of ecoCloud’s performance: a mathematical analysis of the assignment procedure, and a real experiment performed in a data center. After that comes the comparison between ecoCloud and one of the best deterministic algorithms, Best Fit Decreasing. Finally we focus on the scalability properties of ecoCloud with different data center sizes, and then the conclusion.
  4. OK, let’s begin with the introduction.
  5. In the introduction I will go through these parts: first, the data center and the power consumption issue; next, how virtualization and consolidation can increase utilization; and then this paper’s contribution.
  6. All the main trends in information technology, for example Cloud Computing, Big Data, and the Internet of Things, need large and powerful computing infrastructures to support them. The increasing demand for computing resources has led companies and resource providers to build large warehouse-sized data centers.
  7. A data center, or server farm, like this one is a facility used to house computer systems and associated components and to provide computation or storage resources.
  8. It generally includes computation or storage resources, redundant or backup power supplies, redundant data communications connections, environmental controls (e.g., air conditioning, fire suppression), and various security devices.
  9. But data centers require a significant amount of power to operate and consume a lot of energy. In 2006, the energy consumed by IT infrastructures in the USA was about 61 billion kWh, corresponding to 1.5% of all the produced electricity and 2% of the global carbon emissions, which is equal to the aviation industry. These figures are expected to double every 5 years [1].
  10. In Google’s data centers, cooling towers keep the servers’ temperature under control and enhance the power utilization rate. In the past few years, the reduction of energy consumption has been achieved by improving the efficiency of the cooling and power supplying facilities in data centers.
  11. And they use Power Usage Effectiveness (PUE), a metric that measures how efficiently a data center uses energy. It is defined as the ratio of the overall power entering the data center to the power devoted to computing facilities. In the past few years, typical values have decreased from 2-3 to lower than 1.1. This means that when more power goes to the computing facilities, the power needed by the other facilities is smaller, making the power usage more effective.
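The metric just defined is a simple ratio; here is a minimal sketch (the sample numbers are illustrative, not from the talk):

```python
def pue(total_facility_power_kw: float, it_power_kw: float) -> float:
    """Power Usage Effectiveness: total power entering the data center
    divided by the power devoted to the computing facilities."""
    return total_facility_power_kw / it_power_kw

# A PUE of 2.0 means half the power is overhead (cooling, power supply);
# at 1.1, overhead is only about 9% of the total.
print(pue(2000, 1000))            # → 2.0
print(round(pue(1100, 1000), 2))  # → 1.1
```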
  12. However, much room remains for optimizing the computing facilities themselves. In this picture we can see that most of the time servers operate at 10-50% of their full capacity [2], [3]. This low utilization is caused by the variability of the VMs’ workload: the data center is planned to sustain the peaks of load, but for long periods of time the load is much lower, which wastes power. [4], [5]
  13. An active but idle server consumes between 50-70% of the power consumed when it is fully utilized [6], so a large amount of energy is used even at low utilization. Even if as much power as possible is devoted to computing, the utilization rate of each individual server is still far from optimal.
  14. So virtualization was proposed as a way to alleviate the problem: many Virtual Machines (VMs) can be executed on the same physical server to increase utilization. In this example, we don’t need 9 servers to run each application; we just run the applications on VMs and put several VMs on one physical server.
  15. This enables the consolidation of the workload. Consolidation means allocating the maximum number of VMs on the minimum number of physical machines [7]. It allows unneeded servers to be put into a low-power state, switched off, or devoted to the execution of incremental workload, increasing server utilization.
  16. Unfortunately, efficient VM consolidation is prevented by its complexity. The optimal assignment of VMs to the PMs of a data center is analogous to the NP-hard “Bin Packing Problem”: assigning a given set of items of variable size to the minimum number of bins.
  17. The problem is further complicated by two circumstances: 1) the assignment of VMs should take multiple server resources into account at the same time, for example CPU and memory, so it becomes a “multi-dimensional bin packing problem”, much more difficult than the single-dimension problem; 2) even when a good assignment has been achieved, the VMs continuously modify their hardware requirements.
  18. In this paper the authors propose ecoCloud, an approach that is partly inspired by ant algorithms. They use two types of probabilistic procedures, “assignment” and “migration”, and I will explain how they work later. The key decisions are made by single servers, which allows a complex problem to be solved by combining simple operations, not only increasing the utilization of servers but also consolidating VMs dynamically and locally.
  19. This approach is also extended to the multi-dimension problem (CPU and memory) in the following experiments. In this way, we can save electrical costs and ensure the quality of service with respect to the Service Level Agreements stipulated with users.
  20. Because consolidation is a powerful means to improve IT efficiency and reduce power consumption [7], [25], [26], there is a lot of related work. I will briefly introduce how some of it deals with consolidation, and what its drawbacks are.
  21. First, [27] and [13] try to forecast the processing load and aim at determining the minimum number of servers that should be switched on to satisfy the demand. However, in the authors’ opinion the open problems are: how to correctly set the number of servers, how to predict the processing load precisely, and how to map VMs to servers in a dynamic environment.
  22. Since the problem of optimally mapping VMs to PMs can be reduced to the bin packing problem, which is known to be NP-hard, heuristic approaches can only lead to suboptimal solutions.
  23. Many heuristic approaches have been presented to decide how to do the consolidation, like Best Fit Decreasing in [1] and First Fit Decreasing in [28]. [30] tackles the problem by exploiting the Constraint Programming paradigm, using constraints and a priority order to guide the solver to a good solution. These algorithms use lower and upper utilization thresholds to enact migration procedures [29]. All these approaches represent important steps ahead for the deployment of green-aware data centers, but they still share a couple of notable drawbacks.
  24. First, they use deterministic and centralized algorithms, whose efficiency degrades as the size of the data center grows. The second drawback is that the mapping strategies may require the concurrent migration of many VMs, which can cause considerable performance degradation during the reassignment process.
  25. There is another line of work with no central control. In [33], the data center is modeled as a P2P network, and servers explore the network to collect information that can later be used to migrate VMs. The V-MAN system, proposed in [34], uses a gossip protocol through which servers communicate their state to each other. But the complete absence of centralized control can be seen as an obstacle by the data center administrator.
  26. Another problem is considering more than one resource, which makes the problem more complicated. Several algorithms have been presented: [38] is based on the first-fit approximation for the bin packing problem; in [39] the problem is tackled with an LP formulation that gives higher priority to virtual machines with a more stable workload; [41] performs dynamic consolidation based on constraint programming, where constraints are defined both on CPU and on RAM utilization. The problem is that they all rely on complex centralized algorithms.
  27. Conversely, the approach presented here has two advantages: it adopts a probabilistic approach that is naturally scalable, and it uses an asynchronous and smooth migration process, which ensures that VMs are relocated gradually. With ecoCloud, despite the fact that servers can autonomously decide whether or not to migrate or accept a VM, the final decisions are still granted to the central manager of the data center, which ensures better control of the operations.
  28. Next I introduce the scenario and the performance metrics. The objective of ecoCloud is to dynamically map VMs to PMs so as to save electrical costs and respect the Service Level Agreements.
  29. In this scenario, when an application request is transmitted from a client to the data center manager, the manager selects a VM that is appropriate for the application, on the basis of application characteristics such as the amount of required resources (CPU, memory, storage space) and the type of operating system specified by the client.
  30. Then, the VM is assigned to one of the available servers through the assignment procedure. The main idea underlying the whole approach is that it is up to the single servers to decide whether they should accept or reject a VM. These decisions are based on information available locally, for example on the local CPU and RAM utilization. The data center manager has only a coordinating role, and it does not need to execute any complex centralized algorithm to optimize the mapping of VMs.
  31. The workload of each application is dynamic, so its demand for computational resources varies with time; the migration procedure monitors this workload. Migrating a VM can be advantageous either when resource utilization is too low, meaning that the server is highly underutilized, or when it is too high, possibly causing overload situations and violating the QoS. The migration procedure consists of two steps: first, a server requests the migration of one of its VMs; then the server that will host the migrating VM is chosen, with a technique similar to the assignment procedure.
  32. Several performance metrics can be used to measure the performance: resource utilization; the number of active servers (VMs should be clustered into as few servers as possible); consumed power; the frequency of migrations and server switches; and SLA violations, i.e., violations of the agreements stipulated with users.
  33. In the next section, we describe how the two main probabilistic procedures work.
  34. In the previous slide, we said that the assignment procedure is performed when a client asks the data center to execute a new application. Once the application is associated to a compatible VM, the data center manager will assign the VM to one of the servers for execution. Instead of taking the decision on its own, the manager delegates a main part of the procedure to single servers. Specifically, it sends an invitation to all the active servers, or to a subset of them, depending on the data center size and architecture, to check if they are available to accept the new VM.
  35. Whether the invitation is rejected or accepted depends on the server’s utilization. If the server is over-utilized or under-utilized on either of the two considered resources, the invitation should be rejected.
  36. Overutilization might cause overload situations and penalize the quality of service. In the case of underutilization, the objective is to put the server into a sleep mode and save energy, so the server should refuse new VMs and try to get rid of those that are currently running. So, a server with intermediate utilization should accept new VMs to foster consolidation.
  37. And how does a server know whether it is underutilized or overutilized? The server’s decision is taken by performing a Bernoulli trial. The success probability for this trial is equal to the value of the overall assignment function; with two resources, the assignment function is evaluated on each of them.
  38. Some parameters are used in the Bernoulli trial: x (valued between 0 and 1) is the relative utilization of a resource, CPU or RAM; T is the maximum allowed utilization (e.g., T = 0.8 means that the resource utilization cannot exceed 80 percent of the server capacity), and the assignment function is equal to zero when x > T; p is a shape parameter; Mp is used to normalize the maximum value to 1.
  39. This is the Bernoulli trial and the normalization factor. The assignment function is equal to zero when x > T.
  40. In this figure, the x-axis is the resource utilization and the y-axis is the assignment probability. The figure shows the graph of the single-resource assignment function for some values of the parameter p, with T = 0.9. The value of p can be used to modulate the shape of the function. The value of x at which the function reaches its maximum is the value at which assignment attempts succeed with the highest probability, and it increases and approaches T as the value of p increases. The value of the function is zero or very low when the resource is overutilized or underutilized.
  41. Let us and ms be, respectively, the current CPU and RAM utilization at server s, pu and pm the shape parameters defined for the two resources, and Tu and Tm the respective maximum utilizations. The overall assignment function for server s, denoted fs, is defined as the product of the two single-resource assignment functions. Considering both resources ensures that servers tend to respond positively when they have intermediate utilization values for both CPU and RAM; if one of the resources is under- or over-utilized, the success probability of the Bernoulli trial is low.
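The assignment function and the two-resource Bernoulli trial just described can be sketched in a few lines. This assumes the polynomial form f(x) = x^p (T - x) / Mp, reconstructed from the description (zero when idle or when x > T, normalized to a maximum of 1 by Mp); treat the exact shape as an assumption rather than the paper's verbatim formula:

```python
import random

def assignment_f(x, p, T):
    """Single-resource assignment probability: zero when the resource is
    idle (x = 0) or over the threshold (x > T), highest at intermediate
    utilization. Assumed form: x^p * (T - x), normalized by Mp."""
    if x <= 0 or x > T:
        return 0.0
    # Mp is the maximum of x^p * (T - x), reached at x = p*T/(p+1).
    Mp = (p ** p) * (T ** (p + 1)) / ((p + 1) ** (p + 1))
    return (x ** p) * (T - x) / Mp

def accept_vm(u, m, pu=3, pm=3, Tu=0.9, Tm=0.9):
    """Server-side Bernoulli trial: success probability is the product of
    the CPU and RAM assignment functions, so the server tends to accept
    only when BOTH resources sit at intermediate utilization."""
    return random.random() < assignment_f(u, pu, Tu) * assignment_f(m, pm, Tm)
```

With p = 3 and T = 0.9 this form peaks at x = pT/(p+1) = 0.675, and the peak moves toward T as p grows, matching the behavior described for the figure.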
  42. If the Bernoulli trial is successful, the server communicates its availability to the data center manager. Then, the manager selects one of the available servers, and assigns the new VM to it.
  43. If none of the contacted servers is available, that is, all the Bernoulli trials are unsuccessful, it means that in all the servers one of the two resources (CPU or RAM) is close to the utilization threshold. This usually happens when the overall workload is increasing, so that the current number of active servers cannot sustain the load. In such a case, the manager wakes up an inactive server and requests it to run the new VM. If there is no server to wake up, meaning all the servers are already active, it is a sign that the servers are unable to sustain the load even when consolidating the work, and the provider should consider buying new servers.
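The manager-side flow from the last few slides can be sketched as follows; the server objects and their methods (`bernoulli_trial`, `run`) are hypothetical simplifications of what the talk describes, since in the real system each server runs the trial locally:

```python
import random

def assign_vm(vm, active_servers, inactive_servers):
    """Invite the active servers; each runs its local Bernoulli trial.
    If none is available, wake an inactive server; if none exists either,
    the data center cannot sustain the load."""
    available = [s for s in active_servers if s.bernoulli_trial(vm)]
    if available:
        target = random.choice(available)   # manager picks one available server
    elif inactive_servers:
        target = inactive_servers.pop()     # wake a sleeping server
        active_servers.append(target)
    else:
        raise RuntimeError("all servers active and saturated: capacity exhausted")
    target.run(vm)
    return target
```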
  44. The assignment process efficiently consolidates the VMs, but application workload changes with time. When some VMs terminate or reduce their demand for server resources, the server may become underutilized, leading to lower energy efficiency. On the other hand, when the VMs increase their requirements, a server may be overloaded. In both these situations some VMs have to be profitably migrated to other servers.
  45. Like the assignment procedure, the migration procedure is defined as follows: each server monitors its CPU and RAM utilization using the libraries provided by the virtualization infrastructure (e.g., VMware or Hyper-V) and checks whether it is between two specified thresholds, the lower threshold Tl and the upper threshold Th.
  46. Each server evaluates the corresponding probability function when the utilization is below the threshold Tl or above the threshold Th, and performs a Bernoulli trial whose success probability is set to the value of this function. If the trial is successful, the server requests the migration of one of the local VMs. With x the utilization of a given resource, CPU or RAM, the migration probability functions are defined as follows. These two kinds of migrations are also called “low migrations” and “high migrations”.
  47. In this graph, the x-axis is the resource utilization and the y-axis is the migration probability. The figure shows the graph of the single-resource migration functions for some values of the parameters, with Tl set to 0.3 and Th set to 0.8. The shape of the functions can be modulated by tuning the parameters α and β, which can be used to foster or hinder migrations. The same functions are applied to CPU and RAM, and the parameters Tl, Th, α, and β can have different values for the two resources.
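The talk gives the thresholds Tl, Th and the shape parameters α, β but not the exact formulas, so the sketch below assumes simple power-law shapes for the two migration probabilities; the actual functions in the paper may differ in detail:

```python
def migrate_low_prob(x, Tl, alpha):
    """Assumed probability of requesting a 'low migration' when the
    utilization x is below Tl: it grows toward 1 as the server becomes
    more underutilized. The power-law shape (exponent alpha) is an
    assumption used only to illustrate the mechanism."""
    if x >= Tl:
        return 0.0
    return (1 - x / Tl) ** alpha

def migrate_high_prob(x, Th, beta):
    """Assumed probability of requesting a 'high migration' when the
    utilization x exceeds Th: it grows toward 1 as the server approaches
    saturation; beta modulates the shape."""
    if x <= Th:
        return 0.0
    return ((x - Th) / (1 - Th)) ** beta
```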
  48. Whenever a Bernoulli trial succeeds, the server must choose the VM to consider for migration. In the case of a high migration, the server focuses on the over-utilized resource (CPU or RAM) and chooses the VMs for which the utilization of that resource is larger than the difference between the current server utilization and the threshold Th. For example, if the current server utilization is 0.9 and the threshold is 0.8, the difference is 0.1 and VM2 will be chosen for transfer. If more than one VM qualifies, one of them is randomly selected for migration. In the low migration case, the choice of the VM to migrate is made randomly.
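The selection rule and the slide's worked example translate directly into code (the VM names and the `vms` dictionary are illustrative; it maps each VM to its utilization of the overloaded resource):

```python
import random

def pick_vm_for_high_migration(server_util, Th, vms):
    """Pick a VM whose utilization of the overloaded resource exceeds
    server_util - Th, so that migrating it brings the server back under
    the threshold. Ties are broken randomly, as described in the talk."""
    excess = server_util - Th
    candidates = [name for name, u in vms.items() if u > excess]
    return random.choice(candidates) if candidates else None

# Worked example from the slide: server at 0.9, Th = 0.8, excess = 0.1.
# Only VM2 (0.15 > 0.1) qualifies.
print(pick_vm_for_high_migration(0.9, 0.8, {"VM1": 0.05, "VM2": 0.15}))  # → VM2
```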
  49. The choice of the new server that will host the migrating VM is made using a variant of the assignment procedure, with two main differences. First, in the high migration case the threshold T of the assignment function is set to 0.9 times the resource utilization of the source server, and this value is sent to the servers along with the invitation. This ensures that the VM will migrate to a less loaded server, and helps to avoid multiple migrations of the same VM.
  50. The second difference concerns the low migration: when no server is available to run a migrating VM, it would not be acceptable to switch on a new server to accommodate it, because that would activate one server just to let another one go to sleep.
  51. This paper’s approach ensures a gradual and continuous migration process that does not require the simultaneous migration of many VMs. The data center administrator can set the threshold values and shape parameters to choose different consolidation strategies (e.g., conservative, intermediate, aggressive). The parameter values set in the following experiments are those corresponding to the intermediate strategy.
  52. The next section is devoted to a mathematical analysis of the ecoCloud assignment procedure, and to an experiment on a real data center.
  53. Some symbols are defined: let Ns be the number of servers in a data center, Nc the number of cores in each server, and Nv the number of VMs that can be executed on each core.
  54. It is assumed that two types of VMs are executed on the data center: CPU-bound and RAM-bound VMs, denoted as C-type and M-type. C-type VMs need more CPU than M-type VMs by a factor γC > 1; conversely, the amount of RAM needed by M-type VMs is larger than that of C-type VMs by a factor γM > 1.
  55. As mentioned before, an active but idle server consumes 50-70% of the power consumed when fully utilized [6]. As the CPU utilization increases, the consumed power can be assumed to increase linearly from the power corresponding to the idle state to the power corresponding to full utilization [13], [14], where Pmax is the power consumed at maximum CPU utilization (u = 1) and Pidle is the power consumed when the server is active but idle (u = 0). In the analytical and simulation experiments presented in this study, the power consumed by a single server is expressed by this linear model; in the experiments on real data centers, the consumed power is directly monitored and measured.
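The linear model just described is a one-liner in code, a direct transcription of P(u) = Pidle + (Pmax - Pidle) * u:

```python
def server_power(u, p_idle, p_max):
    """Linear power model: P(u) = Pidle + (Pmax - Pidle) * u,
    where u in [0, 1] is the CPU utilization of the server."""
    return p_idle + (p_max - p_idle) * u

# Example with the experiment's values (Pmax = 250 W, Pidle = 175 W):
print(server_power(0.4, 175.0, 250.0))  # → 205.0 W at 40% utilization
```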
  56. Before the experiment begins, some parameters must be set. To analyze the behavior of the system, an experiment was performed for a data center with 100 servers, each with 6 cores at a CPU frequency of 2 GHz and 4 GB of RAM. In the experiment, each VM uses 500 MHz of CPU, which means one core can support 4 VMs.
  57. The power consumed by each server at maximum utilization is set to 250 W, a typical value for the servers of a data center, and the power in the idle state is set to 70 percent of the maximum power. The average CPU (memory) load of the data center, denoted ρC (ρM), is defined as the ratio between the total amount required by the VMs and the corresponding capacity of the data center.
  58. For each server in this experiment, the initial CPU and RAM utilizations are set to 40% of the server capacity, with maximum utilization threshold T = 0.9 and p = 3. Under normal operation, without ecoCloud, the data center would tend to a steady condition in which all the servers remain active with CPU and RAM utilization around 40 percent. With ecoCloud, the workload consolidates onto only 45 servers, while 55 are switched off, which allows the data center to nearly halve the consumed power.
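The "nearly halved" figure can be checked with a back-of-the-envelope calculation, assuming the linear power model with Pmax = 250 W and Pidle = 0.7 × Pmax = 175 W from the previous slide (a sketch; the simulated numbers may differ slightly):

```python
P_MAX, P_IDLE = 250.0, 0.7 * 250.0   # watts, from the experiment's setup

def power(u):
    # Linear model: idle power plus a utilization-proportional term.
    return P_IDLE + (P_MAX - P_IDLE) * u

# Without ecoCloud: 100 servers, all active at ~40% utilization.
baseline_kw = 100 * power(0.40) / 1000

# With ecoCloud: the same total load (100 * 0.40 server-equivalents)
# consolidated onto 45 servers, each near 40/45 ≈ 89% utilization.
consolidated_kw = 45 * power(40 / 45) / 1000

print(round(baseline_kw, 1), round(consolidated_kw, 1))  # → 20.5 10.9
```

The large idle-power term is exactly why consolidation pays off: the 55 hibernated servers stop burning their 175 W of idle power each.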
  59. Next we considered the values of γC and γM, that is, the different ratios between the CPU and RAM demanded by the two types of VMs. The values of the two parameters were kept equal to one another and, in different tests, were set to 1.0 (the two kinds of applications coincide), 1.5 (C-type applications need 50 percent more CPU than M-type ones), 2.0, and 4.0 as the most extreme case. At the end of the consolidation process, the 45 active servers show nearly the same distribution of their hardware resources between the two types of applications.
  60. The distribution is shown in this figure for one of the active servers and for the above-mentioned values of γC and γM. The outcome of this experiment shows that the probabilistic assignment process balances the two kinds of VMs so that neither the CPU nor the RAM becomes a bottleneck. For example, in the most imbalanced scenario (γC and γM equal to 4.0), about 71 percent of the CPU is assigned to C-type VMs while about 18 percent is given to M-type VMs, and the opposite occurs for memory. Both CPU and RAM are utilized up to the permitted threshold (90 percent) and the workload is consolidated efficiently, which allows 55 servers to be hibernated and the consumed power to be almost halved.
  61. Such an efficient consolidation is possible when the relative overall loads of CPU and RAM are comparable (both equal to 40 percent in this case). If one of the two resources undergoes a heavier demand, that resource inevitably limits the consolidation degree. To study this, experiments were run in which the overall CPU load ρC is set to 40% of the total CPU capacity of the servers, while the overall RAM load ρM is varied between 20% and 60%. For this set of experiments, the values of γC and γM are set to 4.0.
  62. When the overall memory load is lower than 0.4 (cases ρM = 0.2 and ρM = 0.3) the CPU is the critical resource and is the one that drives the consolidation process. When the most critical resource is the memory, as happens in the cases ρM = 0.5 and ρM = 0.6, the consolidation process is driven by the allocation of RAM to the VMs.
  63. In this figure we can see that when the overall memory load is lower than 0.4, the number of active servers and the consumed power are the same as in the previous experiment. When the most critical resource is the memory, more active servers and more power are needed to satisfy the increased demand for memory: in the cases where the memory load is equal to 50 and 60 percent of the data center capacity, 56 and 67 servers are kept active, and the corresponding values of consumed power are about 13 kW and about 15 kW. Overall, it may be concluded that the approach is always able to consolidate the load as much as the most critical hardware resource allows.
  64. The previous section showed the effectiveness of ecoCloud in consolidating the load under various scenarios. However, the model relies on some necessary assumptions, so to validate it and prove that ecoCloud is effective in real scenarios, the authors ran experiments in May 2013 on a live data center owned by a major telecommunications operator. The experiment was run on 28 servers virtualized with the VMware vSphere 4.0 platform. These are the servers’ CPU and memory equipment.
  65. The servers hosted 447 VMs, which were assigned a number of virtual cores varying between 1 and 4 and an amount of RAM varying between 1 GB and 16 GB. According to their usage of the two resources, the VMs were categorized into CPU-bound (C-type) and memory-bound (M-type). In this data center, 80 percent of the VMs were memory-bound; the remaining ones were CPU-bound. The M-type VMs contributed 49.44 percent of the overall CPU load and 92.15 percent of the overall memory load.
  66. All the servers have network adapters with a bandwidth of 10 Gbps. In the real experiments, both the assignment and the migration procedures were activated. The parameters of the assignment function were set as follows: T = 0.8 (this value was imposed by the data center administrator) and p = 3. VMs are migrated when the CPU or memory load exceeds the high threshold Th, set to 0.95, or when the most utilized resource, the RAM in this case, goes below the low threshold Tl, set to 0.5. The shape parameters α and β were set to 0.25.
  67. This figure shows the number of active servers starting from the time at which ecoCloud is activated and for the following 7 days. Within the first day, 11 servers are hibernated. In the following days, the number of active servers stabilizes, but daily workload variations allow one or two servers to be hibernated during the night.
  68. This figure shows that the consumed power decreases, following the trend of the previous figure.
  69. This figure reports the number of high and low migrations performed during each hour of the analyzed period on the whole data center. In the first day, migrations mostly come from servers with low utilization, which are first unloaded and then hibernated. As the consolidation process proceeds, active servers tend to be well utilized, and some high migrations are needed to prevent overload events, while low migrations improve the consolidation during the night. After the first day, only a few migrations per day are performed.
  70. This figure offers a snapshot of the data center at the end of the seventh day of ecoCloud operation, when only 17 servers are active. It reports, for each of the 28 servers, the amount of CPU and RAM utilized by C-type and M-type VMs. Since in this scenario most VMs are memory-bound, the consolidation is driven by RAM; in all active servers the RAM utilization is about 70 percent.
  71. This figure reports the numbers of VMs of the two types that run on each server. With the exception of servers 2 and 3, in which no C-type VM is running, the proportion between the two types of VMs is comparable to the overall 80-20 proportion.
  72. Next, a set of experiments is performed to compare ecoCloud with one of the deterministic and centralized algorithms, BFD.
  73. Here a variant of the classical Best Fit Decreasing algorithm described and analyzed in [1] was implemented, referred to as BFD in the following. This choice was made because it was proved in [18] that BFD is the polynomial algorithm that gives the best results in terms of effectiveness.
  74. At each execution of BFD, VMs of over-utilized and under-utilized servers are collected
  75. And then they are sorted in decreasing order of CPU utilization.
  76. Respecting this order, each VM is allocated to the server that provides the smallest increase of the power consumption caused by the allocation.
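The three steps just described can be sketched as follows. This is a simplification: the actual variant in the paper also enforces capacity and threshold constraints, and `power_increase(server, vm)` is a hypothetical callback that should return the power growth caused by hosting the VM (or infinity if it does not fit):

```python
def bfd_reassign(migrating_vms, servers, power_increase):
    """Best Fit Decreasing variant: sort the VMs collected from over- and
    under-utilized servers by decreasing CPU utilization, then place each
    one on the server whose power consumption grows the least."""
    placement = {}
    for vm in sorted(migrating_vms, key=lambda v: v.cpu, reverse=True):
        best = min(servers, key=lambda s: power_increase(s, vm))
        placement[vm] = best
        best.host(vm)  # update the server's load before placing the next VM
    return placement
```

Note the centralized, batch nature of this loop: every execution reassigns all collected VMs at once, which is the source of the simultaneous migrations discussed in the comparison.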
  77. A key parameter of BFD is the interval of time between two successive executions of the algorithm; therefore, experiments were performed with four different values of the interval: 1, 5, 15, and 60 minutes. Because the authors could not install ecoCloud in real data centers having more than 100 servers, they used a home-made Java simulator fed with the logs of real VMs to compare ecoCloud and BFD in a data center with 400 servers.
  78. The traces represent the CPU utilization of 6,000 VMs, monitored in March/April 2012 and updated every 5 minutes. Since the CPU is the only resource considered in [1], we also consider this resource only for the experiments reported below.
  79. We assigned the VMs to 400 servers, using the ecoCloud and BFD algorithms for assignment and migration of VMs. These servers are all equipped with 2-GHz cores. One third of the servers have four cores, one third have six cores and the remaining third have eight cores. The parameters of the assignment and migration functions were set as follows.
  80. This figure reports the distribution of the average CPU utilization of the VMs, measured as a percentage of the total CPU capacity of the hosting physical machine. The graph shows that the average CPU utilization is under 20 percent for most VMs. It is clear that this kind of distribution leaves much room for consolidation algorithms.
  81. Fig. 16 reports the average number of active servers versus the overall load in ecoCloud and BFD. The curves are close to each other, and also close to the optimal value of the associated bin packing problem. ecoCloud requires a slightly larger number of active servers, mostly because the CPU utilization of servers is allowed to decrease by a certain amount before low migrations are triggered, to avoid migrations that are not strictly necessary. Similar observations can be done by analyzing the average consumed power of the two algorithms, shown in Fig. 17.
  82. BFD has a slightly better consolidation degree. However, it comes at a considerable cost in terms of the number of migrations and the probability of overload events. Fig. 18 shows that the number of migrations is much higher with BFD than with ecoCloud. For example, with load equal to 0.3, fewer than 400 migrations per hour are needed by ecoCloud, while about 10,000 migrations per hour are needed by BFD when the time interval is set to 1 minute. This corresponds to more than 150 simultaneous migrations to be performed at each algorithm execution. If the BFD time interval is enlarged, the frequency of migrations can be reduced, but the number of required simultaneous migrations increases, which might cause other problems, such as bandwidth saturation.
  83. This figure reports the percentage of time of CPU overload. The value of this index is remarkably lower with ecoCloud, due to its capacity to react immediately with high migrations each time the CPU utilization exceeds the upper threshold. The probability of overload with BFD comes as the combination of two contrasting phenomena: if the algorithm is executed frequently, the consolidation effort is stressed (cf. Fig. 16), which brings the servers closer to their CPU limits and increases the overload probability; when the time interval is larger, the consolidation effort is lower, but VM workload variations are left uncontrolled for a longer time, which can also cause overload events. Thus, overload events are present at any load condition. With ecoCloud the index is hardly affected by the value of the overall load.
  84. ecoCloud is scalable, a property inherited from the probabilistic, self-organizing, and partially distributed nature of the algorithm. This section focuses on the scalability properties of ecoCloud with different data center sizes.
  85. In small systems, it can happen that all the servers, after the execution of their Bernoulli trials, reject the VM, even when some of them have enough spare CPU to accommodate it. But the probability of this event becomes negligible in large data centers, where a server is activated only when strictly needed.
  86. To assess ecoCloud’s scalability, this paper performed simulations with data centers of different sizes (100, 200, 400, and 3,000 servers), using the VM traces described in the previous section. This figure reports the fraction of active servers versus the overall load and shows that this fraction is nearly independent of the system size. Tests were also performed for a data center with 3,000 servers in which invitations are forwarded to varying numbers of servers; they confirm that there is no advantage in sending invitations to more than about 100 servers.
  87. Finally, the conclusion.
  88. This paper tackles the issue of energy-related costs in data centers and Cloud infrastructures. The aim is to consolidate the VMs on as few PMs as possible, so as to minimize power consumption and carbon emissions while ensuring a good level of the QoS experienced by users.
  89. With ecoCloud, the approach proposed in the paper, the mapping of Virtual Machines is based on Bernoulli trials, and decisions are made by single servers on the basis of local information, which reduces the complexity. The self-organizing and probabilistic nature makes ecoCloud particularly efficient in large data centers: this is a notable advantage with respect to other, fully deterministic algorithms.
  90. The mathematical analysis and the experiments performed in a real data center prove that ecoCloud can reduce power consumption, avoid overload events that cause SLA violations, limit the number of VM migrations and server switches, and balance CPU-bound and RAM-bound applications.