Self learning cloud controllers

Self-Learning Cloud Controllers
Pooyan Jamshidi, Amir Sharifloo, Claus Pahl,
Andreas Metzger, Giovani Estrada
IC4, Dublin City University, Ireland
University of Duisburg-Essen, Germany
Intel, Ireland
pooyan.jamshidi@computing.dcu.ie
Invited presentation at UFC, Fortaleza, Brazil

~50% = wasted hardware
Actual traffic
Typical weekly traffic to Web-based applications (e.g., Amazon.com)

Problem 1: ~75% wasted capacity
Actual demand
Problem 2: customer lost
Traffic during an unexpected burst of requests (e.g., end-of-year traffic to Amazon.com)

Really like this??
Auto-scaling enables you to realize this ideal on-demand provisioning
[Figure: demand versus time]

Enacting changes to cloud resources does not happen in real time

Capacity we can provision with auto-scaling
A realistic picture of dynamic provisioning


These quantitative values must be determined by the user
⇒ requires deep knowledge of the application (CPU, memory, thresholds)
⇒ requires performance modeling expertise (when and how to scale)
⇒ requires a unified opinion among the user(s)
Amazon Auto Scaling
Microsoft Azure Watch
Microsoft Azure Auto-scaling Application Block
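
To make the point concrete, here is a minimal sketch of the kind of threshold rule that rule-based auto-scalers expect the user to supply. The class, the function, the choice of average CPU as the metric, and every number are hypothetical; none of them is taken from the products listed above.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """A user-defined scaling rule; every field has to be chosen by hand."""
    upper_cpu: float  # scale out above this average CPU utilisation (%)
    lower_cpu: float  # scale in below this average CPU utilisation (%)
    step: int         # number of instances to add or remove
    min_vms: int      # never go below this many instances
    max_vms: int      # never go above this many instances


def decide(rule: ThresholdRule, avg_cpu: float, current_vms: int) -> int:
    """Return the instance count to use for the next control interval."""
    if avg_cpu > rule.upper_cpu:
        target = current_vms + rule.step
    elif avg_cpu < rule.lower_cpu:
        target = current_vms - rule.step
    else:
        target = current_vms
    # Clamp to the allowed range.
    return max(rule.min_vms, min(rule.max_vms, target))


# Hypothetical values: picking them well is exactly the hard part.
rule = ThresholdRule(upper_cpu=80.0, lower_cpu=30.0, step=1, min_vms=1, max_vms=6)
print(decide(rule, avg_cpu=92.0, current_vms=3))
print(decide(rule, avg_cpu=12.0, current_vms=1))
```

Each of the five fields is a crisp number that someone has to justify, which is where the application knowledge and performance modeling expertise come in.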

Naeem Esfahani and Sam Malek, “Uncertainty in Self-Adaptive Software Systems”
Pooyan Jamshidi, Aakash Ahmad, Claus Pahl, Muhammad Ali Babar, “Sources of Uncertainty in Dynamic Management of Elastic Systems”, under review

Uncertainty related to enactment latency:
The same scaling action (adding or removing a VM of precisely the same size) took a different amount of time to be enacted on the cloud platform (here, Microsoft Azure) at different points in time, and the difference was significant (up to a couple of minutes).
The enactment latency would also differ between cloud platforms.

Ø Offline benchmarking
Ø Trial-and-error
Ø Expert knowledge
Costly and
not systematic
A. Gandhi, P. Dube, A. Karve, A. Kochut, L. Zhang, Adaptive,
“Model-driven Autoscaling for Cloud Applications”, ICAC’14
[Figure: 95th-percentile response time (ms) versus arrival rate (req/s); marked values: 400 ms and 60 req/s]

RobusT2Scale
[Figure: RobusT2Scale architecture. Labels: users; initial setting + elasticity rules + response-time SLA; environment monitoring; application monitoring; Prediction/Smoothing; Fuzzy Reasoning; scaling actions]

[Figure: possibility versus performance index (0 to 3) for a type-1 MF and a type-2 MF, showing a region of definite satisfaction, a region of definite dissatisfaction, and a region of uncertain satisfaction]
Words can mean different things to different people
Different users often recommend different elasticity policies

[Figure: membership functions for workload and response time (x-axis 0 to 100, y-axis membership grade u from 0 to 1); the type-2 membership function is annotated with UMF, LMF, an embedded MF, the FOU, and the mean and sd]
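
As a rough illustration of the UMF, LMF, and FOU labels, the sketch below builds an interval type-2 membership function from a Gaussian with a fixed mean and an uncertain standard deviation. This is one common textbook construction and is an assumption here; the slides do not say how the mean and sd are turned into the footprint of uncertainty. The function names and all numbers are hypothetical.

```python
import math


def gaussian(x: float, mean: float, sd: float) -> float:
    """Ordinary (type-1) Gaussian membership grade in [0, 1]."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def it2_membership(x: float, mean: float, sd_low: float, sd_high: float):
    """Return (lower, upper) membership grades for an interval type-2 set.

    Assumption: fixed mean, standard deviation anywhere in [sd_low, sd_high].
    The narrow Gaussian is the LMF, the wide one is the UMF, and the area
    between them is the footprint of uncertainty (FOU).
    """
    if not 0 < sd_low <= sd_high:
        raise ValueError("need 0 < sd_low <= sd_high")
    lower = gaussian(x, mean, sd_low)
    upper = gaussian(x, mean, sd_high)
    return lower, upper


# Hypothetical "Medium workload" set on a 0 to 100 scale.
lower, upper = it2_membership(x=60.0, mean=50.0, sd_low=8.0, sd_high=12.0)
print(f"membership interval at x=60: [{lower:.4f}, {upper:.4f}]")
```

A type-1 set would return a single grade for x=60; the interval is what lets the controller carry the disagreement between users through the inference.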

Rule (l)  Workload   Response time  Normal  Effort  Medium Effort  High Effort  Maximum Effort  c_avg^l
                                    (-2)    (-1)    (0)            (+1)         (+2)
1         Very low   Instantaneous  7       2       1              0            0               -1.6
2         Very low   Fast           5       4       1              0            0               -1.4
3         Very low   Medium         0       2       6              2            0               0
4         Very low   Slow           0       0       4              6            0               0.6
5         Very low   Very slow      0       0       0              6            4               1.4
6         Low        Instantaneous  5       3       2              0            0               -1.3
7         Low        Fast           2       7       1              0            0               -1.1
8         Low        Medium         0       1       5              3            1               0.4
9         Low        Slow           0       0       1              8            1               1
10        Low        Very slow      0       0       0              4            6               1.6
11        Medium     Instantaneous  6       4       0              0            0               -1.6
12        Medium     Fast           2       5       3              0            0               -0.9
13        Medium     Medium         0       0       5              4            1               0.6
14        Medium     Slow           0       0       1              7            2               1.1
15        Medium     Very slow      0       0       1              3            6               1.5
16        High       Instantaneous  8       2       0              0            0               -1.8
17        High       Fast           4       6       0              0            0               -1.4
18        High       Medium         0       1       5              3            1               0.4
19        High       Slow           0       0       1              7            2               1.1
20        High       Very slow      0       0       0              6            4               1.4
21        Very high  Instantaneous  9       1       0              0            0               -1.9
22        Very high  Fast           3       6       1              0            0               -1.2
23        Very high  Medium         0       1       4              4            1               0.5
24        Very high  Slow           0       0       1              8            1               1
25        Very high  Very slow      0       0       0              4            6               1.6

Rule (l)  Workload  Response time  -2  -1  0  +1  +2  c_avg^l
12        Medium    Fast           2   5   3  0   0   -0.9

10 experts’ responses
R^l: IF (the workload (x_1) is F_{i_1}, AND the response-time (x_2) is G_{i_2}), THEN (add/remove c_{avg}^l instances).
c_{avg}^l = \frac{\sum_{u=1}^{N_l} w_u^l \times C}{\sum_{u=1}^{N_l} w_u^l}
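
The averaged consequent can be checked by hand against rule 12. The sketch below reads the weights as the number of experts who picked each consequent and C as the value attached to that consequent (-2 to +2); that reading is an assumption, although it reproduces the tabulated value. The function and constant names are hypothetical.

```python
# Consequent values, in column order: Normal, Effort, Medium Effort,
# High Effort, Maximum Effort.
CONSEQUENT_VALUES = [-2, -1, 0, 1, 2]


def average_consequent(votes):
    """Weighted average of the consequent values.

    votes[k] is the number of experts who chose CONSEQUENT_VALUES[k].
    """
    total = sum(votes)
    if total == 0:
        raise ValueError("no expert responses for this rule")
    weighted = sum(w * c for w, c in zip(votes, CONSEQUENT_VALUES))
    return weighted / total


# Rule 12 (workload Medium, response time Fast): 2, 5, 3, 0, 0 votes.
rule_12_votes = [2, 5, 3, 0, 0]
print(round(average_consequent(rule_12_votes), 2))  # -0.9
```

The arithmetic is (2 × -2 + 5 × -1 + 3 × 0) / 10 = -0.9, so on average the experts recommend removing slightly less than one instance in that situation.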
Goal: pre-compute the costly calculations so that elasticity reasoning based on fuzzy inference is efficient at runtime

Liang, Q., Mendel, J. M. (2000). Interval type-2 fuzzy logic systems: theory and design. IEEE Transactions on Fuzzy Systems, 8(5), 535-550.
[Figure: fuzzy inference from monitoring data to scaling actions. The monitored workload and response time are evaluated against their membership functions (x-axis 0 to 100, membership grade u from 0 to 1), with annotated grades 0.5954 and 0.3797, 0.2212 and 0.0000, and 0.9568 and 0.9377. The output is a performance index interval y_l, y_r.]
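
To show where an interval such as y_l, y_r can come from, the sketch below type-reduces a set of fired rules by enumerating every switch point. The slides do not say which type-reduction method is used; exhaustive enumeration is swapped in here as a simple stand-in for the iterative Karnik-Mendel procedure. The function name, the firing intervals, and the choice of rule consequents are hypothetical.

```python
def type_reduce(centroids, lower, upper):
    """Return (y_l, y_r) for rules with crisp consequent centroids.

    lower[i] and upper[i] bound the firing strength of rule i.
    y_l puts the upper strengths on the smallest centroids, y_r puts
    them on the largest; every possible switch point is tried.
    """
    rules = sorted(zip(centroids, lower, upper))
    n = len(rules)
    left, right = [], []
    for k in range(n + 1):
        w_left = [u if i < k else l for i, (_, l, u) in enumerate(rules)]
        w_right = [l if i < k else u for i, (_, l, u) in enumerate(rules)]
        for weights, out in ((w_left, left), (w_right, right)):
            total = sum(weights)
            if total > 0:  # skip combinations where no rule fires
                out.append(
                    sum(w * c for w, (c, _, _) in zip(weights, rules)) / total
                )
    if not left or not right:
        raise ValueError("no rule fired")
    return min(left), max(right)


# Hypothetical firing intervals for three rules with consequents
# -0.9, 0.6, and 1.1 instances.
y_l, y_r = type_reduce([-0.9, 0.6, 1.1], [0.2, 0.3, 0.0], [0.5, 0.6, 0.2])
print(f"y_l={y_l:.3f}, y_r={y_r:.3f}, crisp output={(y_l + y_r) / 2:.3f}")
```

Taking the midpoint of the interval as the crisp output is the usual defuzzification step for interval type-2 systems; rounding it to a whole number of instances is left to the caller.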

[Figure: six workload patterns, each plotted over 0 to 100 on the x-axis: big spike, dual phase, large variations, quickly varying, slowly varying, steep tri phase]

[Figure: number of hits versus time (seconds), original data against the smoothed prediction; beta=0.10, gamma=0.94, RMSE=308.1565, RRSE=0.79703]
[Figure: root relative squared error of the prediction for each workload pattern: big spike, dual phase, large variations, quickly varying, slowly varying, steep tri phase]
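
The prediction/smoothing step and the two error measures reported in the figure can be sketched as follows. This assumes double exponential smoothing with beta as the level weight and gamma as the trend weight; the slides give only the parameter names and values, so that mapping is a guess. The function names and the sample trace are hypothetical.

```python
import math


def double_exponential_smoothing(series, beta, gamma):
    """One-step-ahead forecasts; forecasts[i] predicts series[i + 1]."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    forecasts = []
    for x in series[1:]:
        forecasts.append(level + trend)  # forecast made before seeing x
        new_level = beta * x + (1 - beta) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    return forecasts


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def rrse(actual, predicted):
    """Root relative squared error: error relative to predicting the mean."""
    mean = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean) ** 2 for a in actual)
    return math.sqrt(num / den)


# Hypothetical workload trace (hits per interval).
hits = [120, 135, 160, 150, 420, 900, 650, 300, 180, 140]
pred = double_exponential_smoothing(hits, beta=0.10, gamma=0.94)
print(f"rmse={rmse(hits[1:], pred):.2f}, rrse={rrse(hits[1:], pred):.3f}")
```

An RRSE below 1 means the forecast beats the trivial predictor that always outputs the mean of the trace.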

SUT / Criteria      Big spike  Dual phase  Large variations  Quickly varying  Slowly varying  Steep tri phase
RobusT2Scale        973 ms     537 ms      509 ms            451 ms           423 ms          498 ms
                    3.2        3.8         5.1               5.3              3.7             3.9
Overprovisioning    354 ms     411 ms      395 ms            446 ms           371 ms          491 ms
                    6          6           6                 6                6               6
Underprovisioning   1465 ms    1832 ms     1789 ms           1594 ms          1898 ms         2194 ms
                    2          2           2                 2                2               2
SLA: rt_95 ≤ 600 ms
For every 10 s control interval
• RobusT2Scale is superior to under-provisioning in terms of guaranteeing the SLA, and it does not require excessive resources
• RobusT2Scale is superior to over-provisioning in terms of the resources it requires, while still guaranteeing the SLA

[Figure: root mean square error (0 to 0.1) for alpha=0.1, alpha=0.5, alpha=0.9, and alpha=1.0; noise level: 10%]

Current Solution (my PhD thesis + IC4 work on auto-scaling)
Future Plan
Updating K in MAPE-K @ Runtime
Design-time Assistance
Multi-cloud

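[Figure: autonomic controller architecture. Fuzzy Logic Controller: Fuzzifier, Inference Engine, Defuzzifier, Rule base. Knowledge Learning: Fuzzy Q-learning. The controller connects to the Cloud Application on the Cloud Platform through Monitoring and Actuator. Signals shown: rt; w; w, rt, th, vm; sa. Annotations: system state, system goal.]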
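[Figure: ElasticBench. Components: RobusT2Scale, FQL, learned rules (.fis), Monitoring, Actuator, Cloud Platform, Load Generator, with nodes labelled L, W, W, and C. Interfaces: WCF, REST. Signals shown: w, rt; w, rt, th, vm; sa; system state. Learning parameters: γ, η, ε, r.]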
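
A single fuzzy Q-learning step can be sketched as below. It follows the standard FQL formulation and reads γ as the discount factor, η as the learning rate, ε as the exploration rate, and r as the reward; the slides list the symbols without defining them, so that reading is an assumption. The action set, function names, and numbers are hypothetical, and the firing strengths are assumed to be normalised to sum to 1.

```python
import random

ACTIONS = [-2, -1, 0, 1, 2]  # hypothetical: change in number of VMs


def choose_actions(q, epsilon, rng):
    """Pick one action index per rule, epsilon-greedy over that rule's row."""
    chosen = []
    for row in q:
        if rng.random() < epsilon:
            chosen.append(rng.randrange(len(row)))  # explore
        else:
            chosen.append(max(range(len(row)), key=row.__getitem__))  # exploit
    return chosen


def fql_update(q, firing, chosen, reward, next_firing, gamma, eta):
    """Apply one fuzzy Q-learning update in place; return the TD error."""
    # Q-value of the action just taken, weighted by rule firing strengths.
    q_sa = sum(a * q[i][chosen[i]] for i, a in enumerate(firing))
    # Value of the next state: best q-value of each rule, weighted.
    v_next = sum(a * max(q[i]) for i, a in enumerate(next_firing))
    delta = reward + gamma * v_next - q_sa
    for i, a in enumerate(firing):
        q[i][chosen[i]] += eta * delta * a
    return delta


# Two rules, five actions, all q-values initially zero.
q = [[0.0] * len(ACTIONS) for _ in range(2)]
rng = random.Random(0)
chosen = choose_actions(q, epsilon=0.2, rng=rng)
# The scaling action applied is the firing-weighted sum of the rule actions.
firing = [0.7, 0.3]
scaling_action = sum(a * ACTIONS[chosen[i]] for i, a in enumerate(firing))
delta = fql_update(q, firing, chosen, reward=1.0,
                   next_firing=[0.4, 0.6], gamma=0.8, eta=0.1)
print(scaling_action, delta)
```

Each rule keeps its own row of q-values, so a learned rule base can be read off by taking the best action in every row.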

[Figure: deployment architecture, split between Cloud Platform (PaaS) and On-Premise. Components: L: Web Role; P: Worker Role (three instances); M: Worker Role; Cache; Results: Storage; Blackboard: Storage; LG: Console; LB: Load Balancer; Queue; Auto-scaling Logic (controller); Policy Enforcer; Actuator; Monitoring. Numbered steps 10, 11, and 12 are marked.]

[Figure: five series, S1 to S5, plotted on a 0 to 1.2 scale against values from 0 to 320]

[Figure: four panels showing q(9,5), q(7,5), q(9,3), and q(3,3), each plotted over the range 0 to 350]


Challenge 1: ~75% wasted capacity
Actual demand
Challenge 2: customer lost

More details? => http://computing.dcu.ie/~pjamshidi/PDF/SEAMS2014.pdf
Slides? => http://www.slideshare.net/pooyanjamshidi/
Code? => https://github.com/pooyanjamshidi
Thank you!
Aakash Ahmad, Claus Pahl, Soodeh Farokhi, Amir Sharifloo, Armin Balalaie, Hamed Jamshidi, Nabor Mendonca, Brian Carroll, Reza Teimourzadegan
