Cloud auto-scaling with deadline and budget constraints
Ming Mao, Jie Li, Marty Humphrey
eScience Group, CS Department, University of Virginia
Grid 2010, Oct 27, 2010
A fast-growing computing platform
IDC: cloud spending grows 27.4% a year, from $16.5 billion (2009) to $55.5 billion (about $56 billion) in 2014, compared with 5% a year for traditional IT
Source: Worldwide and Regional Public IT Cloud Service 2010-2014 Forecast
Two most-quoted benefits: scalable computing and storage; reduced cost
Concerns: security, availability, cost management, integration, interoperability, etc.
Q1. Cost: the most important factor in practice?
Rate the benefits commonly ascribed to the cloud on-demand model:
Pay only for what you use: 77.90%
Easy/fast to deploy to end users: 77.70%
Monthly payments: 75.30%
Encourages standard systems: 68.50%
Requires less in-house IT staff, costs: 67.00%
Always offers latest functionality: 64.60%
Sharing systems with partners simpler: 63.90%
Seems like the way of the future: 54.00%
How important is it that cloud service providers...
Offer competitive pricing: 91.60%
Offer Service Level Agreements: 88.60%
Option to move cloud offerings back on premise: 87.80%
Provide a complete solution: 86.00%
Understand my business and industry: 84.50%
Allow managing on-premise & cloud together: 82.10%
Support many of my IT needs: 81.00%
Offer both on-premise and public cloud services: 79.20%
Are a technology and business model innovator: 78.30%
Have local presence, can come to my offices: 72.90%
Source (both charts): IDC Enterprise Panel, 3Q09, n = 263, Sep 2009
Q2. Moving into the cloud == reduced cost?
Current auto-scaling: triggers based on resource-utilization information (e.g., AWS Auto Scaling, RightScale, enStratus, Scalr)
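For contrast, here is a minimal sketch of a utilization-threshold trigger of the kind these tools support. The 70%/30% thresholds, the one-instance step and the instance bounds are illustrative assumptions, not any product's defaults, and the function name is hypothetical.

```python
def threshold_trigger(avg_cpu_percent: float, instances: int,
                      upper: float = 70.0, lower: float = 30.0,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Return the new instance count after one evaluation period.

    Illustrative rule only: thresholds and bounds are assumed values.
    """
    if avg_cpu_percent > upper and instances < max_instances:
        return instances + 1  # scale out by one instance
    if avg_cpu_percent < lower and instances > min_instances:
        return instances - 1  # scale in by one instance
    return instances  # utilization inside the band: no change


print(threshold_trigger(85.0, 4))  # 5
print(threshold_trigger(20.0, 4))  # 3
print(threshold_trigger(50.0, 4))  # 4
```

Such a rule reacts only to utilization; it has no notion of job deadlines, budgets, instance types or hourly billing.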
Multiple instance types
Current billing models: full-hour billing
Non-negligible instance acquisition time (7-15 min in Windows Azure)
More specific performance goals
Budget awareness (e.g., dollars/month, dollars/job)
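Full-hour billing means any started hour is charged as a whole hour. A minimal sketch of that rounding, assuming the billing clock runs from instance start to release (when a provider actually starts the clock is an assumption here); the function names are hypothetical, and the 0.12$/hour price in the example is the Windows Azure rate quoted with the MODIS results.

```python
import math


def billed_hours(runtime_seconds: float) -> int:
    """Hours charged under full-hour billing: every started hour counts in full."""
    if runtime_seconds <= 0:
        return 0
    return math.ceil(runtime_seconds / 3600)


def instance_cost(runtime_seconds: float, price_per_hour: float) -> float:
    """Dollar cost of one instance kept for runtime_seconds."""
    return billed_hours(runtime_seconds) * price_per_hour


# 60 minutes is billed as 1 hour, 61 minutes as 2 hours.
print(billed_hours(3600), billed_hours(3660))  # 1 2
print(instance_cost(3660, 0.12))
```

One consequence: releasing an instance a few minutes into a new hour pays for that whole hour, so a cost-aware scaler has a reason to release instances near the end of an already-paid hour rather than just after one begins.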
[Figure: users submit jobs with a deadline (job finish time) to a cloud application, which runs them on cloud servers at some cost]
Problem statement: how can a cloud application use auto-scaling to finish all submitted jobs before a user-specified deadline while spending as little money as possible?
Workload: independent (non-dependent) jobs submitted to a job queue, served in FCFS order and fairly distributed
Different classes of jobs
All jobs share the same performance goal (e.g., a 1-hour deadline)
VM instances take time to start up
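Under these assumptions, whether a set of instances meets the deadline can be estimated by replaying the queue: each job, in FCFS order, goes to whichever instance becomes free first. A minimal sketch, assuming all jobs are already queued at time 0 and that a job's run time does not depend on which instance runs it (a simplification; the model itself uses per-type job times). The function name is hypothetical.

```python
import heapq
from typing import List


def fcfs_finish_time(job_runtimes: List[float],
                     instance_ready_times: List[float]) -> float:
    """Finish time of the last job when independent jobs are dequeued FCFS.

    instance_ready_times[i] is when instance i can start work: 0 for a
    running instance, its remaining startup delay for a pending one.
    """
    if not instance_ready_times:
        raise ValueError("need at least one instance")
    free_at = list(instance_ready_times)
    heapq.heapify(free_at)  # min-heap of the times each instance is next free
    last_finish = 0.0
    for runtime in job_runtimes:
        start = heapq.heappop(free_at)  # earliest-free instance takes the next job
        finish = start + runtime
        last_finish = max(last_finish, finish)
        heapq.heappush(free_at, finish)
    return last_finish


print(fcfs_finish_time([300] * 6, [0, 0]))    # 900
print(fcfs_finish_time([300] * 6, [0, 600]))  # 1200
```

The second call models one running instance plus one pending instance that becomes ready after 600 s (the General type's startup delay in the simulation parameters); the startup delay alone moves the finish time from 900 s to 1200 s.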
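Workload: $W = \sum_j (J_j, n_j)$, where $n_j$ is the number of submitted jobs of class $J_j$
Computing power of instance $I_i$ before deadline $D$: $P_i = \sum_j (J_j, n_j)$, with
Running instance: $n_j = \dfrac{D}{t_{j,\mathrm{type}(I_i)}}$
Pending instance: $n_j = \dfrac{D - (d_{\mathrm{type}(I_i)} - s_i)}{t_{j,\mathrm{type}(I_i)}}$
where $t_{j,\mathrm{type}(I_i)}$ is the run time of a $J_j$ job on the instance type of $I_i$, $d_{\mathrm{type}(I_i)}$ is that type's startup delay, and $s_i$ is the time since $I_i$ was requested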
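The two computing-power formulas differ only in how much of the deadline is usable. A minimal sketch that evaluates them per job class; the clamp to zero for an instance that would not be ready before the deadline is an added assumption, and the function and argument names are hypothetical.

```python
from typing import Dict


def computing_power(deadline: float,
                    job_times: Dict[str, float],
                    startup_delay: float = 0.0,
                    elapsed_startup: float = 0.0,
                    pending: bool = False) -> Dict[str, float]:
    """Per-class computing power n_j of one instance I_i before deadline D.

    Running instance:  n_j = D / t_j
    Pending instance:  n_j = (D - (d - s_i)) / t_j
    job_times maps each job class J_j to t_j on this instance's type;
    startup_delay is d for that type, elapsed_startup is s_i.
    """
    available = deadline
    if pending:
        # Assumption: clamp so a late instance contributes 0, never negative power.
        remaining_startup = max(startup_delay - elapsed_startup, 0.0)
        available = max(deadline - remaining_startup, 0.0)
    return {job_class: available / t for job_class, t in job_times.items()}


# High-CPU job times from the simulation parameters, 1-hour deadline.
high_cpu = {"Mix": 210.0, "Computing": 75.0, "IO": 300.0}
print(computing_power(3600, high_cpu)["Computing"])  # 48.0
print(computing_power(3600, high_cpu, startup_delay=720,
                      elapsed_startup=120, pending=True)["Computing"])  # 40.0
```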
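Scale up
Sufficient budget: $\min \sum_i c_{\mathrm{type}(I_i)}$ subject to $\sum_i P_i \ge W$
Insufficient budget: $\max \sum_i P_i$ subject to $\sum_i c_{\mathrm{type}(I_i)} \le C$
Scale down: release instance $I_s$ if $\sum_i P_i - P_s \ge W$
where $c_{\mathrm{type}(I_i)}$ is the price of an instance of $I_i$'s type and $C$ is the budget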
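Because scale-up is stated as two small integer optimization problems, a brute-force enumeration over instance counts is enough to illustrate them. This sketch rests on a simplifying assumption: a single job class, so each $P_i$ collapses to one number (jobs an instance of that type can finish before $D$). Exhaustive search stands in for whatever solver the authors used; the function names and the max_per_type bound are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional

Plan = Dict[str, int]  # instance type -> number of instances


def _plans(types: List[str], max_per_type: int) -> Iterator[Plan]:
    """Enumerate every combination of 0..max_per_type instances per type."""
    for counts in product(range(max_per_type + 1), repeat=len(types)):
        yield dict(zip(types, counts))


def _power(plan: Plan, capacity: Dict[str, float]) -> float:
    return sum(capacity[k] * n for k, n in plan.items())


def _cost(plan: Plan, price: Dict[str, float]) -> float:
    return sum(price[k] * n for k, n in plan.items())


def min_cost_plan(capacity, price, workload, max_per_type=10) -> Optional[Plan]:
    """Sufficient budget: minimize sum_i c_type(I_i) subject to sum_i P_i >= W."""
    best, best_cost = None, float("inf")
    for plan in _plans(sorted(capacity), max_per_type):
        cost = _cost(plan, price)
        if _power(plan, capacity) >= workload and cost < best_cost:
            best, best_cost = plan, cost
    return best  # None if no plan within the bound covers the workload


def max_power_plan(capacity, price, budget, max_per_type=10) -> Plan:
    """Insufficient budget: maximize sum_i P_i subject to sum_i c_type(I_i) <= C."""
    best, best_power = {}, -1.0
    for plan in _plans(sorted(capacity), max_per_type):
        power = _power(plan, capacity)
        if _cost(plan, price) <= budget and power > best_power:
            best, best_power = plan, power
    return best


def can_release(total_power: float, instance_power: float, workload: float) -> bool:
    """Scale down: instance I_s may be released if sum_i P_i - P_s >= W."""
    return total_power - instance_power >= workload


# Computing-intensive jobs, 1-hour deadline: capacity = 3600 s / job time per type.
capacity = {"General": 12.0, "High-CPU": 48.0, "High-IO": 12.0}
price = {"General": 0.085, "High-CPU": 0.17, "High-IO": 0.17}
print(min_cost_plan(capacity, price, workload=100))
# {'General': 1, 'High-CPU': 2, 'High-IO': 0}
print(max_power_plan(capacity, price, budget=0.30))
# {'General': 1, 'High-CPU': 1, 'High-IO': 0}
print(can_release(60.0, 12.0, 45.0))  # True
```

With 100 computing-intensive jobs due in one hour, two High-CPU instances plus one General instance (0.425$) beat the cheapest single-type choice (three High-CPU, 0.51$), in line with the cost comparison of fixed versus mixed instance types. Enumeration grows exponentially with the number of types, so it only suits small illustrations like this one.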
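[Figure: Cloud Cruise Control architecture. Components: Decider, Monitor, VM Manager, Repository (configuration, historical data), VM instances, users and admin. Labeled flows: users enqueue jobs and VM instances dequeue them; workload update and VM info update; the Decider ($\min \sum_i c_{\mathrm{type}(I_i)}$ subject to $\sum_i P_i \ge W$) uses the dynamic configuration, produces a VM plan and notifies the admin; the VM Manager adds (+) and removes (-) VM instances]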
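The Decider's output is a VM plan that the VM Manager carries out as "+" and "-" actions. A minimal sketch of that last step, assuming the plan is expressed as a target instance count per type; vm_plan is a hypothetical name, and choosing which particular instances to release (for example, by time left in their paid hour) is not modeled.

```python
from typing import Dict, Tuple


def vm_plan(current: Dict[str, int],
            target: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split the gap between current and target counts into '+' and '-' actions."""
    add: Dict[str, int] = {}
    remove: Dict[str, int] = {}
    for vm_type in sorted(set(current) | set(target)):
        diff = target.get(vm_type, 0) - current.get(vm_type, 0)
        if diff > 0:
            add[vm_type] = diff  # acquire new instances of this type
        elif diff < 0:
            remove[vm_type] = -diff  # release instances of this type
    return add, remove


add, remove = vm_plan({"General": 3, "High-CPU": 1}, {"General": 1, "High-CPU": 2})
print(add)     # {'High-CPU': 1}
print(remove)  # {'General': 2}
```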
Workload & VM simulation parameters
Job arrivals (Mix, Computing Intensive, IO Intensive): average 30 jobs/hour, STD 5 jobs/hour each
Job run time per VM type, average / STD:
VM type | Price | Startup delay | Mix | Computing Intensive | IO Intensive
General | 0.085$/hour | 600 s | 300 s / 50 s | 300 s / 50 s | 300 s / 50 s
High-CPU | 0.17$/hour | 720 s | 210 s / 25 s | 75 s / 15 s | 300 s / 50 s
High-IO | 0.17$/hour | 720 s | 210 s / 25 s | 300 s / 50 s | 75 s / 15 s
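The parameters give only an average and an STD for each quantity. A minimal sketch of a generator built from them, assuming Gaussian draws (the distribution is not stated) clamped so that counts and run times stay positive; the dictionary keys and function names are hypothetical.

```python
import random
from typing import Dict, Tuple

# (average, STD) job run time in seconds, per VM type and job class.
JOB_TIME: Dict[str, Dict[str, Tuple[float, float]]] = {
    "General":  {"Mix": (300, 50), "Computing": (300, 50), "IO": (300, 50)},
    "High-CPU": {"Mix": (210, 25), "Computing": (75, 15), "IO": (300, 50)},
    "High-IO":  {"Mix": (210, 25), "Computing": (300, 50), "IO": (75, 15)},
}
PRICE_PER_HOUR = {"General": 0.085, "High-CPU": 0.17, "High-IO": 0.17}
STARTUP_DELAY_S = {"General": 600, "High-CPU": 720, "High-IO": 720}


def hourly_arrivals(rng: random.Random, avg: float = 30.0, std: float = 5.0) -> int:
    """Jobs submitted in one hour (assumed Gaussian, rounded, never negative)."""
    return max(0, round(rng.gauss(avg, std)))


def job_runtime(rng: random.Random, vm_type: str, job_class: str,
                floor_s: float = 1.0) -> float:
    """One job's run time on vm_type (assumed Gaussian, floored at floor_s seconds)."""
    avg, std = JOB_TIME[vm_type][job_class]
    return max(floor_s, rng.gauss(avg, std))


rng = random.Random(42)  # fixed seed for a reproducible trace
arrivals = [hourly_arrivals(rng) for _ in range(3)]
runtimes = [job_runtime(rng, "High-CPU", "Computing") for _ in range(30)]
print(arrivals)
print(round(sum(runtimes) / len(runtimes), 1))
```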
Choice | VM types | Total cost | % more than optimal
Choice #1 | General | 98.52$ | 43%
Choice #2 | High-CPU | 128.86$ | 87%
Choice #3 | High-IO | 129.71$ | 88%
Choice #4 | General, High-CPU, High-IO | 78.62$ | 14%
Optimal | General, High-CPU, High-IO | 68.85$ | baseline
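The last column is each choice's total cost relative to the optimal configuration's 68.85$. This small check reproduces it from the table's own numbers; it is a check of the arithmetic, not part of the authors' method, and the names are hypothetical.

```python
OPTIMAL_COST = 68.85  # $ for the optimal mix of General, High-CPU and High-IO
choices = {
    "#1 General": 98.52,
    "#2 High-CPU": 128.86,
    "#3 High-IO": 129.71,
    "#4 General, High-CPU, High-IO": 78.62,
}


def percent_over_optimal(cost: float, optimal: float = OPTIMAL_COST) -> float:
    """How much more a choice costs than the optimum, in percent."""
    return 100.0 * (cost / optimal - 1.0)


for name, cost in choices.items():
    print(f"{name}: {percent_over_optimal(cost):.0f}%")
# #1 General: 43%
# #2 High-CPU: 87%
# #3 High-IO: 88%
# #4 General, High-CPU, High-IO: 14%
```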
Data set naming: MODIS200X = year; Terra & Aqua = satellite; (X-Y) = day X to day Y; 15 images per day
Moderate-scale test (up to 20 instances), 45 jobs per data set:
Data set | 1-hour deadline | 2-hour deadline | 3-hour deadline | Theoretical cost
Terra 2004 (10-12) | 18 min late, 9 C.H. or 1.08$ | 8 min early, 6 C.H. or 0.72$ | 20 min early, 5 C.H. or 0.6$ | 4 C.H. or 0.48$
Aqua 2008 (30-32) | 15 min late, 10 C.H. or 1.2$ | 20 min early, 7 C.H. or 0.84$ | 29 min early, 5 C.H. or 0.6$ | 4 C.H. or 0.48$
Large-scale test (up to 90 instances):
Data set | Total jobs | 2-hour deadline | 4-hour deadline | Theoretical cost
Terra & Aqua 2006 (1-75) | 1125 | 20 min late, 170 C.H. or 20.4$ | 6 min early, 132 C.H. or 15.84$ | 93 C.H. or 11.16$
Terra & Aqua 2006 (1-150) | 2250 | Admission denied | 22 min early, 243 C.H. or 29.16$ | 185 C.H. or 22.2$
C.H. = computing hour; 1 C.H. = 0.12$ in Windows Azure
Test: Terra & Aqua 2006 (1-75), 1125 jobs in total, 4-hour deadline, finished 6 min early
Theoretical cost: 93 C.H. or 11.16$; actual cost: 132 C.H. or 15.84$
[Figure: Instance Acquisition and Release; instance number vs. time (hours), showing released, acquiring and ready instances]
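The gap between the theoretical and actual cost of this run follows directly from the two numbers above: with 1 C.H. = 0.12$, the actual run costs about 42% more than the theoretical figure. A small sketch of that arithmetic; the function names are hypothetical.

```python
COST_PER_CH = 0.12  # $ per computing hour in Windows Azure, as stated with the results


def ch_to_dollars(computing_hours: float) -> float:
    """Convert computing hours (C.H.) to dollars."""
    return computing_hours * COST_PER_CH


def overhead_percent(actual_ch: float, theoretical_ch: float) -> float:
    """Extra cost of the actual run over the theoretical cost, in percent."""
    return 100.0 * (actual_ch / theoretical_ch - 1.0)


actual, theoretical = 132, 93  # Terra & Aqua 2006 (1-75)
print(f"{ch_to_dollars(actual):.2f}$ vs {ch_to_dollars(theoretical):.2f}$")  # 15.84$ vs 11.16$
print(f"overhead: {overhead_percent(actual, theoretical):.1f}%")  # overhead: 41.9%
```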
Conclusions
More cost-efficient than choosing a single, fixed instance type
VM startup delay can have a large impact in practice
Future work
A more general cloud application model
Multiple job classes
Other instance types (e.g., spot instances and reserved instances)
Data transfer performance and storage cost