http://www.cs.virginia.edu/~mm5bw/papers/WorkflowAutoScaling.pdf
The presentation for SC 2011
http://dl.acm.org/citation.cfm?id=2063449
www.mingmao.org
1. 1
Auto-Scaling to Minimize Cost and
Meet Application Deadlines in
Cloud Workflows
SC 11
(Nov 16, TCC 305)
Ming Mao, Marty Humphrey
CS Department, University of Virginia
2. Introduction
Resource provisioning questions are not trivial
Under-provisioning → hurt performance
Over-provisioning → pay more than necessary
How much resources?
What types of resources?
When to acquire or release?
How to use them?
A performance-resource mapping problem
3. Auto-Scaling
Schedule-based and rule-based auto-scaling
E.g. “run 10 instances between 8 AM and 6 PM every day and
2 instances at all other times.”
E.g. “add (remove) 2 instances when the average CPU
utilization is above 70% (below 20%) for 5 minutes.”
Simple and convenient, works well for simple applications
What if the relationship between performance and
resource utilization indicators is complex?
The resource utilization indicators are low-level and may
not be expressive enough
They do not account well for user budgets
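The rule-based example above can be sketched in a few lines. This is a minimal illustration, not any provider's API: the class name `CpuRule`, the one-sample-per-minute assumption, and the reading of "for 5 minutes" as "every sample in the last 5" are all assumptions made here.

```python
from collections import deque


class CpuRule:
    """Hypothetical rule: add/remove `step` instances when the average CPU
    stays above `high` / below `low` for `window` consecutive samples
    (assumed to be one sample per minute)."""

    def __init__(self, high=70.0, low=20.0, window=5, step=2, min_instances=1):
        self.high, self.low = high, low
        self.window, self.step = window, step
        self.min_instances = min_instances
        self.samples = deque(maxlen=window)

    def observe(self, avg_cpu, instances):
        """Record one CPU sample and return the new target instance count."""
        self.samples.append(avg_cpu)
        if len(self.samples) < self.window:
            return instances  # not enough history yet
        if all(s > self.high for s in self.samples):
            self.samples.clear()  # restart the observation window
            return instances + self.step
        if all(s < self.low for s in self.samples):
            self.samples.clear()
            return max(self.min_instances, instances - self.step)
        return instances


rule = CpuRule()
n = 4
for cpu in [85, 90, 78, 92, 81]:
    n = rule.observe(cpu, n)
print(n)  # 6
```

The rule only sees a low-level utilization indicator; it has no notion of job deadlines or budget, which is the limitation this slide points out.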
4. Auto-Scaling
Goals of auto-scaling mechanisms
Balance performance and cost
E.g. meet performance goals at minimum cost, or maximize
utility within a limited budget
Reflect different options for computing resources
E.g. VMs have different processing power and price
Be aware of practical considerations
E.g. a VM may take several minutes to be ready to use
Be aware of the cloud billing model
E.g. billed by instance-hours
Support specific application performance requirements
E.g. deadlines, the number of concurrent users, communication
latency
5. Cloud application model
[Figure: example cloud application workflow made of numbered service units (1)-(11). Labels: Non-Member Job, Silver Member Job, Gold Member Job; Auto-Scaling; Cloud VMs.]
App consists of service units
Job consists of tasks
Jobs are categorized into classes (deadline and processing flow)
Cloud offers multiple VM types (price and processing power)
The app has no advance knowledge of the workload
VMs take time to start up (VM acquisition delay) and are billed by the hour
8. Solution – Step 1
Task bundling
Idea – force tasks to run on the same instance to improve
performance and save data transfer cost
Example
Before: T6 on Server 1, T8 on Server 2
After: T6 and T8 both on Server 1, bundled as task T6'
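A minimal sketch of the bundling idea for a linear pipeline follows. It assumes (this is not stated on the slide) that adjacent tasks are bundled when they prefer the same VM type, and that a bundle's running time is the sum of its members' running times. The function name `bundle_pipeline`, the task tuples, and the runtimes are hypothetical.

```python
def bundle_pipeline(tasks):
    """tasks: list of (name, vm_type, runtime_minutes) in pipeline order.

    Merges each run of adjacent tasks that share a vm_type into one bundle,
    so the bundled tasks can run back to back on the same instance.
    """
    bundles = []
    for name, vm_type, runtime in tasks:
        if bundles and bundles[-1]["vm_type"] == vm_type:
            # Same preferred VM type as the previous task: extend the bundle.
            bundles[-1]["members"].append(name)
            bundles[-1]["runtime"] += runtime
        else:
            bundles.append({"members": [name], "vm_type": vm_type, "runtime": runtime})
    return bundles


pipeline = [
    ("T5", "micro", 5),
    ("T6", "standard", 10),
    ("T8", "standard", 20),
    ("T10", "high-cpu", 15),
]
for b in bundle_pipeline(pipeline):
    print(b["members"], b["vm_type"], b["runtime"])
```

Here T6 and T8 end up in one bundle, mirroring the T6' example on the slide.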
9. Solution – Step 2
Deadline assignment
Idea – to break task dependencies, assign each task a deadline
proportional to its running time (on its cost-efficient
machine)
Example
[Figure: workflow of tasks T1-T13. Before: a single job-level interval from 3:00PM to 4:30. After: individual task deadlines at 3:10, 3:20, 3:50, 4:00, 4:20, and 4:30, starting from 3:00.]
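Proportional deadline assignment can be sketched for the simplest case, a linear chain of tasks. The function name `assign_deadlines` and the runtimes below are hypothetical; the runtimes were chosen so that the result matches the deadlines shown in the example (3:10, 3:20, 3:50, 4:00, 4:20, 4:30). A real workflow with branches would need the same split applied along its paths, which this sketch does not attempt.

```python
def assign_deadlines(runtimes, start, deadline):
    """Split [start, deadline] among chained tasks in proportion to runtime.

    runtimes: estimated running times of the tasks, in chain order.
    start, deadline: job start and job deadline, in minutes.
    Returns one (interval_start, interval_end) pair per task.
    """
    total = sum(runtimes)
    span = deadline - start
    intervals = []
    t = start
    for r in runtimes:
        end = t + span * r / total
        intervals.append((t, end))
        t = end
    return intervals


# Job runs from 3:00 (minute 0) to 4:30 (minute 90).
intervals = assign_deadlines([5, 5, 15, 5, 10, 5], start=0, deadline=90)
print([end for _, end in intervals])  # [10.0, 20.0, 50.0, 60.0, 80.0, 90.0]
```

Once every task has its own interval, tasks can be provisioned and scheduled independently of one another.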
Task upgrading
$$rank = \frac{makespan_{before} - makespan_{after}}{cost_{after} - cost_{before}}$$
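The rank is the makespan reduction gained per extra dollar spent on an upgrade. A small sketch, with hypothetical numbers, and assuming that the candidate with the highest rank is upgraded first (the slide does not spell out the selection rule):

```python
def rank(makespan_before, makespan_after, cost_before, cost_after):
    """Makespan saved per unit of additional cost for one task upgrade."""
    extra_cost = cost_after - cost_before
    if extra_cost <= 0:
        # Assumption: an upgrade moves a task to a more expensive machine.
        raise ValueError("upgrade is assumed to increase cost")
    return (makespan_before - makespan_after) / extra_cost


# Hypothetical candidates: makespan in minutes, cost in dollars.
candidates = {
    "T3": rank(90, 80, 1.0, 1.5),  # saves 10 min for $0.50
    "T7": rank(90, 70, 1.0, 3.0),  # saves 20 min for $2.00
}
best = max(candidates, key=candidates.get)
print(best, candidates[best])  # T3 20.0
```

T7 saves more time in absolute terms, but T3 buys more time per dollar, so it ranks higher.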
10. Solution – Step 3
Determine the number of instances
From deadline assignment, we have
Task running time – $t_m$
Task execution interval – $[T_0, T_1]$
Load vector
$LV_m = [t_m / (T_1 - T_0)]$
# of instances = $\lceil LV_m \rceil$
Example
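[Figure: load vectors on VM1 over the interval 3:00 to 4:00 (ticks at 3:00, 3:15, 3:45, 4:00). T1 contributes a load of 0.25 and T2 a load of 0.5; the combined vector (All) is 0, 0, 0.25, 0.75, 0.25, 0, 0.]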
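The load-vector computation can be sketched as follows. Two things are assumptions here: the brackets around the instance count are read as a ceiling, and the slot layout (seven equal slots, T1 spread over three of them, T2 over one) is picked only so that the combined vector matches the "All" row of the example. The function names are hypothetical.

```python
import math


def load_vector(tasks, n_slots):
    """Sum per-slot load for the tasks assigned to one VM type.

    tasks: list of (runtime, start_slot, end_slot); runtime is measured in
    slot lengths and end_slot is exclusive. Each task adds
    runtime / (end_slot - start_slot) to every slot of its interval.
    """
    lv = [0.0] * n_slots
    for runtime, start, end in tasks:
        load = runtime / (end - start)
        for slot in range(start, end):
            lv[slot] += load
    return lv


def instances_needed(lv):
    """Round each slot's load up to a whole number of instances."""
    # round() guards against float noise such as 1.0000000001 -> 2.
    return [math.ceil(round(x, 9)) for x in lv]


lv = load_vector([(0.75, 2, 5), (0.5, 3, 4)], n_slots=7)
print(lv)                    # [0.0, 0.0, 0.25, 0.75, 0.25, 0.0, 0.0]
print(instances_needed(lv))  # [0, 0, 1, 1, 1, 0, 0]
```

A load below 1 in every slot means a single instance of that type suffices, as in the VM1 example.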
11. Solution – Step 5
Instance consolidation
Idea – put tasks on the same instance even if some
tasks do not run most cost-efficiently on that
machine
Example
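Before: T11 on a High-CPU instance and T12 on a Standard instance, each idle for the rest of the 3:00 PM to 4:00 PM hour
After: T11 and T12 both on the Standard instance, idle for the remainder of the 3:00 PM to 4:00 PM hour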
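Because instances are billed by the instance-hour, a partly idle hour costs as much as a full one, which is why consolidation can pay off. The sketch below compares the two placements using the Standard and High-CPU prices from the evaluation setup; the task running times (including T11 being slower on the Standard instance) are hypothetical, and `billed_cost` is an illustrative helper, not part of SCS.

```python
import math

# $/hour, taken from the VM price table in the evaluation setup.
PRICE = {"standard": 0.085, "high-cpu": 0.68}


def billed_cost(plan):
    """plan: {instance_id: (vm_type, busy_minutes)}.

    Every started hour of an instance is billed in full.
    """
    return sum(PRICE[vm_type] * math.ceil(minutes / 60)
               for vm_type, minutes in plan.values())


# Before: T11 (15 min) on High-CPU, T12 (20 min) on Standard.
before = {"vm1": ("high-cpu", 15), "vm2": ("standard", 20)}
# After: T11 is assumed to take 30 min on Standard; both fit in one hour.
after = {"vm2": ("standard", 30 + 20)}

print(round(billed_cost(before), 3))  # 0.765
print(round(billed_cost(after), 3))   # 0.085
```

The consolidated plan is only valid if both tasks still meet their deadlines on the slower machine, which this sketch does not check.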
12. Solution – Step 6
Scheduling – Earliest Deadline First
The dynamic scaling feature ensures that tasks at risk of
missing their deadlines are identified in time
$$\sum_i \frac{t_i}{T_{end_i} - T_{start_i}} < 1$$
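A minimal sketch of Earliest Deadline First on one instance, reading the condition above as: the sum over tasks of t_i / (T_end_i - T_start_i) must stay below 1. The task dictionaries and their values are hypothetical.

```python
def edf_order(tasks):
    """Order tasks by deadline, earliest first."""
    return sorted(tasks, key=lambda t: t["end"])


def utilization(tasks):
    """Sum of runtime / (end - start) over the tasks on one instance."""
    return sum(t["runtime"] / (t["end"] - t["start"]) for t in tasks)


# Times in minutes from now.
tasks = [
    {"name": "A", "runtime": 10, "start": 0, "end": 40},
    {"name": "B", "runtime": 15, "start": 0, "end": 30},
]
print([t["name"] for t in edf_order(tasks)])  # ['B', 'A']
print(utilization(tasks))                     # 0.75

# One more task pushes the instance over the limit, which would be the
# signal to acquire another instance.
tasks.append({"name": "C", "runtime": 20, "start": 0, "end": 40})
print(utilization(tasks) < 1)                 # False
```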
14. Evaluation
Workload patterns
Application models
VM Type       Price
Micro         $0.02/hour
Standard      $0.085/hour
High-CPU      $0.68/hour
High-Memory   $0.50/hour

Baseline       Time       Task execution       VM lag
Greedy, GAIN   72 hours   Randomly generated   8 min
15. Evaluation
SCS cost saving ranges from 6.8% to 40.4%
The performance difference is larger with longer deadlines
16. Evaluation – High volume vs. low volume
High workload (10X) vs. low workload (X)
Pipeline, 1-hour deadline
[Figure: "High Volume vs. Low Volume". Y-axis: cost ($), 0 to 120; x-axis: Stable, Growing, Cycle, OnOff; series: Greedy-High, GAIN-High, SCS-High, Greedy-Low, GAIN-Low, SCS-Low.]
17. Evaluation – Imprecise parameters
Imprecise Task Execution Estimation
20% variance in estimated execution time, 0.5-hour deadline
SCS finishes more than 90% of jobs before their deadlines, much better than Greedy (40%) and GAIN (50%)
[Figure: "Deadline (0.5 hour) Non-Miss Rate for Pipeline application". Y-axis: non-miss rate (%), 0% to 100%; x-axis: Stable, Growing, Cycle, OnOff; series: Greedy, GAIN, SCS.]
Imprecise Instance Acquisition Lag
20% variance in the estimated VM acquisition time, 1-hour deadline
SCS beats Greedy and GAIN
The performance is more affected by the VM acquisition time
[Figure: "Deadline (1 hour) Non-Miss Rate for Pipeline application". Y-axis: non-miss rate (%), 0% to 100%; x-axis: Stable, Growing, Cycle, OnOff; series: Greedy, GAIN, SCS.]
18. Related work
Dynamic resource provisioning in virtualized
environments
Multi-tier web applications, queuing theory, control theory
Workflow scheduling in Grid environments with
deadline and budget constraints
Single workflow instance
Resource pool is limited
Cloud economics
Cloud provider side vs. cloud user side
Current cloud auto-scaling mechanisms
E.g. AWS auto-scaling, RightScale, enStratus, Scalr, AzureScale
project, etc.
19. Conclusion and future work
Conclusions
SCS cost saving ranges from 6.8% to 40.4%
SCS can better handle different workload volumes and imprecise
parameters
Choosing proper VM types based on the workload saves cost
Instance consolidation can help save partial instance hours
VM acquisition time plays a very important role
Future work
Different scheduling approaches
Real scientific applications
Insufficient budget cases - maximize cloud user benefits/utilities
under budget constraints
Data-intensive applications