Decision support for Amazon Spot Instance



Fei Dong
Duke University
December 6, 2011

Abstract

Infrastructure-as-a-Service (IaaS) provides an attractive computing paradigm for allocating cluster resources dynamically to enterprise customers and online businesses. Cloud providers offer biddable virtual machines on spare computing capacity, known as "Spot Instances", whose price is usually significantly lower than the fixed on-demand price. However, users must accept the risk of uncertain availability: when the bid price falls below the spot price, the cloud provider terminates the running instances. This report addresses the resulting optimization problem for cloud users. We first apply regression techniques to predict the spot price. Next, we propose a resource application mechanism that maximizes utility under deadline and budget constraints. Finally, we present experiments with real instance price traces.

1 Introduction

Infrastructure-as-a-service (IaaS) cloud platforms have brought unprecedented changes to the cloud leasing market. Amazon EC2 [1] is a popular cloud provider that addresses these challenges by offering standard on-demand instances, reserved instances, and Spot Instances (SI) [3]. SI allow users to bid for spare capacity and run instances as long as the bid price is above the current spot price. For some applications (e.g., web crawling, image processing, big data), SI can reduce computing costs by 50%-60%. Table 1 shows the features and renting costs of some representative EC2 node types. We can formulate a simple pricing model to compute the total cost of each workload execution:

$$\text{total\_cost} = \text{cost\_unit} \times \text{num\_nodes} \times \text{exec\_time} \tag{1}$$

Here, cost_unit is the unit price in Table 1, num_nodes is the number of nodes in the cluster, and exec_time is the execution time. Based on Figure 1, if the user wants to minimize cost subject to an execution time of under 1 hour, then it is best to choose six c1.xlarge EC2 nodes.

Figure 1: Performance vs. pay-as-you-go costs for a workload that runs on different EC2 cluster resource configurations.

More choices in the cloud raise new challenges for users: how many instances to rent, which type to choose (on-demand, spot, high-CPU, large disk), and what bid value to use for spot instances? In particular, renting on-demand instances risks high costs, while renting spot instances risks job interruption, and thus delayed completion, when the spot price exceeds the user's bid. This situation can be avoided by bidding slightly higher, which mitigates the uncertainty, or by using fault-tolerance techniques such as checkpointing [7].

To manage these tradeoffs and provide decision support on behalf of customers, we propose a scheme to optimize utility. The scheme relies on two components: (i) a price prediction model aimed at determining the lowest limit price to bid in order to achieve a given level of availability; and (ii) a strategy to apply spot instances given time and budget constraints. Two alternative strategies are proposed in this report. The goal of the first is to finish jobs as soon as possible. The second, instead, emphasizes the lowest monetary cost rather than execution time.

EC2 Node    CPU              Memory   Storage   I/O           Cost
Type        (# EC2 Units)    (GB)     (GB)      Performance   (U.S. $ per hour)
m1.small    1                1.7      160       moderate      0.085
m1.large    4                7.5      850       high          0.34
m1.xlarge   8                15       1,690     high          0.68
c1.medium   5                1.7      350       moderate      0.17
c1.xlarge   20               7        1,690     high          0.68

Table 1: Five representative EC2 node types, along with resources and costs

The rest of the report is organized as follows: Section 2 describes related work; Section 3 describes the proposed price prediction algorithm; Section 4 details the mechanisms that compose our bidding strategy; Section 5 presents experimental results and discussion; Section 6 concludes the report.
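The pricing model of Equation (1) can be sketched in a few lines of Python. The prices are the on-demand figures from Table 1; the function name `total_cost` and the dictionary `ON_DEMAND_PRICE` are illustrative names, and the sketch assumes execution time is given in hours with no rounding to whole billing hours.

```python
# Hourly on-demand prices in USD, copied from Table 1.
ON_DEMAND_PRICE = {
    "m1.small": 0.085,
    "m1.large": 0.34,
    "m1.xlarge": 0.68,
    "c1.medium": 0.17,
    "c1.xlarge": 0.68,
}


def total_cost(node_type: str, num_nodes: int, exec_time_hours: float) -> float:
    """Equation (1): cost_unit * num_nodes * exec_time.

    Assumption: exec_time_hours is not rounded up to whole billing hours.
    """
    return ON_DEMAND_PRICE[node_type] * num_nodes * exec_time_hours


if __name__ == "__main__":
    # Six c1.xlarge nodes for one hour, the configuration named in the text.
    print(f"{total_cost('c1.xlarge', 6, 1.0):.2f} USD")
```

For the six-node c1.xlarge cluster mentioned above, running for exactly one hour, this gives 0.68 × 6 × 1 = 4.08 USD.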
2 Related Work

2.1 Cloud-based Cluster Sizing Problem

Some work has proposed methods to predict job runtimes. Elastisizer [5] shows that the "one size fits all" notion does not apply to the estimation of job runtimes. In our scenario, runtime prediction aids the decision-making process in the following ways: (i) the user can use Elastisizer to estimate the running time on on-demand instances, which tells us whether the job can finish before the deadline; (ii) along with information about current prices, we can estimate the cost of running a job on a given instance type, thus increasing the chances of meeting monetary constraints.

2.2 Cloud Management in Industry

Amazon launched the Spot Instance project in 2009. Although AWS [4] provides users with a dashboard to manage their account and usage, some companies also help customers launch, monitor, and manage multi-server deployments. One of these is RightScale [6], which claims that its dynamic server configuration and automation allow users to dramatically reduce deployment and operational costs in the cloud.

3 Prediction Model

The first part of the proposal is a model to predict an optimal limit price for the customer to bid on the spot market. In a Vickrey auction such as that used in the Amazon spot market, bidders have an incentive to bid truthfully rather than over- or under-bidding. Our objective is to bid in such a way as to achieve a desired level of availability.

To achieve this goal, we first need to collect historical data. Fortunately, Amazon Web Services allows us to download the price history of the latest 3 months. Cloudexchange [2] also displays the spot price trace and provides the source data for download (see Figure 2). We report statistics of the spot prices over the period August 15, 2011 - November 15, 2011 (us-east-1 region), as well as the approximated values estimated from a regression distribution with the mean and variance over 1,000 samples.
According to Figure 3, the normal approximation fits no better than the exponential distribution, because the distribution of the spot prices is long-tailed: even though the first third of the values are a good match, the minimum and maximum values in the historical data differ substantially from the approximation.

Another approach is to relate spot prices at different times. It is based on the fact that historical data at a specific moment of the day share a similar pattern (e.g., prices late at night are cheaper than during the day), so we can infer the price at the next moment by referring to yesterday's price at the exact same moment. We suggest a formula (named the time series model) to predict the price:

$$\mathrm{Predict}(moment) = \sum_{i=1}^{n} p(1 - p)^{n-1} H(i, moment) \tag{2}$$

where p is a similarity factor and H(i, moment) denotes the historical price at that moment i days ago.

Figure 2: Price history for the (a) c1.medium and (b) m1.large Spot Instance types (in USD per hour; geographic zone us-east; operating system Linux/UNIX).

Figure 3: Regression on price history for the m1.small Spot Instance type: (a) normal distribution, (b) exponential distribution.

In light of the above, we propose the following algorithm:

  1. Collect the prices over a period of time, in order to estimate their mean and variance.
  2. Use the exponential approximation fitting, i.e., assume that spot prices follow the best-fitting distribution.
  3. Given the availability PR, calculate the inverse of the CDF, which yields a candidate price.
  4. Compare the candidate prices from the time series model, the normal distribution, and other fits, and pick the maximum value.
  5. If the bid price is smaller than the spot price, increase the bid by α for the next interval.

The algorithm does well, especially for large instances, when measured by the monetary saving compared with on-demand instances. The highest bid for spot instances ($/hour) is:

  • 0.229, achieving 98.11% uptime, compared to 0.34 for an on-demand m1.large instance

Notice that the "best" fit to bid with depends on the availability. If the user can accept a slightly lower availability, the bid price can be reduced significantly.

4 Bidding Engine

To simplify the approach, we assume that the instance type and bid price are fixed, and then focus on answering the last question.

Notation   Description
t          one spot instance type
n          number of instances of type t
s          spot instance price
A          user's application
B          budget
D          deadline
ET         estimated running time
EC         estimated cost
AT         available time (total time in-bid)
AR         availability rate, AT/ET
ST         start execution time (clock time)
FT         finish execution time (clock time)
BP_t       bid price on SI type t
PP         predicted price in a time window
M          real monetary cost
V          the overall value from executing all jobs
U          utility value when the user application completes

Table 2: Parameters and constraints

The customer's goal is to finish A, which consists of n jobs {J_1, J_2, ..., J_n}, by the deadline D. We employ a user model with hard deadline constraints: if A finishes at RT ≤ DT, then the utility is U; otherwise the utility is 0. In a model with soft deadline constraints, the utility would instead decrease according to a reward function r(U). Here we consider V(D) = U rather than the soft-deadline model.
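The five-step bid selection procedure from Section 3 can be sketched as follows. This is a minimal illustration rather than the report's actual implementation: all function names are hypothetical, the exponential fit is assumed to be a one-parameter exponential distribution whose mean equals the sample mean (no price floor or shift), and Equation (2) is implemented exactly as printed.

```python
import math
from typing import Sequence


def exponential_bid(prices: Sequence[float], availability: float) -> float:
    """Steps 1-3: fit an exponential distribution and invert its CDF.

    Assumption: a one-parameter exponential with mean m equal to the sample
    mean, so CDF(x) = 1 - exp(-x / m) and the inverse CDF at the target
    availability is -m * ln(1 - availability).
    """
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must lie strictly between 0 and 1")
    mean = sum(prices) / len(prices)
    return -mean * math.log(1.0 - availability)


def time_series_predict(history: Sequence[float], p: float) -> float:
    """Equation (2) as printed: sum_i p * (1 - p)^(n - 1) * H(i, moment).

    history[i - 1] holds the price at the same moment i days ago.
    """
    n = len(history)
    weight = p * (1.0 - p) ** (n - 1)
    return sum(weight * h for h in history)


def choose_bid(candidates: Sequence[float]) -> float:
    """Step 4: pick the maximum of the candidate prices."""
    return max(candidates)


def next_bid(bid: float, spot_price: float, alpha: float) -> float:
    """Step 5: raise the bid by alpha if it was below the spot price."""
    return bid + alpha if bid < spot_price else bid
```

A caller would pass the downloaded price history to `exponential_bid` and `time_series_predict`, feed both results to `choose_bid`, and then apply `next_bid` once per interval. Raising the target availability pushes the inverse-CDF candidate up, which matches the observation that accepting slightly lower availability reduces the bid.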
The utility function is described by the tuple {D, B, ET, PP[time_0, time_0 + D], t, n}. Here PP can be retrieved from the previous section.

$$U = F(D, B, ET, PP, t, n)$$

Our goal is to maximize the utility and find the best settings:

$$settings_{opt} = \arg\max_{c \in S} U$$

  1. Fastest execution

$$\text{minimize } FT \tag{3}$$

  2. Minimize the cost

$$\text{minimize } \sum_{i}^{D_j} (n \cdot s_i \cdot x_i) \tag{4}$$

$$\text{subject to } x_i \in [0, 1] \quad \forall i \in \{1 \ldots D\} \tag{5}$$

$$\sum_{i=1}^{D} x_i \ge ET \tag{6}$$

$$M \le B \tag{7}$$

Since we assume that the user sets the deadline within a small window (e.g., 12 hours or 1 day), the search space for the problem is limited. We can enumerate all feasible solutions and check the constraints by exhaustive search. Following this idea, we developed a Python program (Figure 4) that predicts the price and simulates the bidding engine. It returns the "optimal" solution, containing BP_t, EC, ST, FT, and t, in less than 1 second.

Figure 4: Program interface
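The exhaustive search over a small deadline window might look like the sketch below. It is not the program shown in Figure 4; the names `Plan`, `simulate`, and `search` are hypothetical. It assumes hourly granularity, treats each x_i as binary (the instance either runs for the whole hour or not at all), charges the predicted spot price s_i rather than the bid for each running hour, and ignores any restart overhead after an interruption.

```python
from typing import List, NamedTuple, Optional, Sequence


class Plan(NamedTuple):
    bid: float    # bid price used for the whole window
    start: int    # first hour index in which the job may run
    finish: int   # hour index just after the last running hour
    cost: float   # n * sum of spot prices over the running hours


def simulate(prices: Sequence[float], bid: float, start: int,
             n: int, et: int) -> Optional[Plan]:
    """Run from `start`; an hour counts only if the bid covers the price."""
    run_hours = 0
    cost = 0.0
    for i in range(start, len(prices)):
        if prices[i] <= bid:
            run_hours += 1
            cost += n * prices[i]
            if run_hours == et:
                return Plan(bid, start, i + 1, cost)
    return None  # the job cannot finish before the deadline


def search(prices: Sequence[float], bids: Sequence[float], n: int, et: int,
           budget: float, fastest: bool = False) -> Optional[Plan]:
    """Enumerate every (bid, start) pair and keep plans within the budget."""
    feasible: List[Plan] = []
    for bid in bids:
        for start in range(len(prices)):
            plan = simulate(prices, bid, start, n, et)
            if plan is not None and plan.cost <= budget:
                feasible.append(plan)
    if not feasible:
        return None
    if fastest:
        return min(feasible, key=lambda pl: (pl.finish, pl.cost))
    return min(feasible, key=lambda pl: (pl.cost, pl.finish))
```

Here `prices` holds the predicted price for each hour up to the deadline, so its length plays the role of D. The `fastest` flag switches between the two objectives: earliest finish time, or lowest cost. With a 24-hour window and a handful of candidate bids, the search visits only a few hundred combinations.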
5 Evaluation

The goal of the experimental evaluation is to study the ability of the bidding engine to provide useful suggestions for the spot market. The evaluation methodology is as follows:

  • We evaluate the predicted price for a specified cluster type.
  • We compare the running time and cost of spot instances with those of on-demand instances.
  • We evaluate the optimization capabilities of the bidding engine in finding a good time range and bid price to meet the user's requirements.

To evaluate the predicted prices, we first collect the price history of the m1.small instance from August 17, 2011 to November 17, 2011 as the training data. With the algorithm proposed in Section 3, we predict the prices for the 24 hours of November 18. After comparing them with the real prices on November 18, we find that the predictor captures the trend of the price and that the variance is 0.001786, which is small enough. We can see that the price between Hour 13 and Hour 19 is clearly higher than in other time ranges, which suggests that more users bid in the afternoon. When we tune some parameters carefully, we get an even more precise result, with a variance of 0.000769, which is marked as "Predict Adjust" in Figure 5.

To evaluate the relationship between running time and monetary cost, we run a comparison experiment between on-demand and spot instances. In Section 1, we showed the performance on on-demand types. Figure 6 shows the running time of the workload when run with the same configuration settings, across clusters each with a different type of node. It is interesting to note the complex interactions between execution times and monetary costs as we vary the node type used in the clusters. As expected, using spot instances increases the running time (by 1 to 5 times) relative to on-demand instances. However, the monetary cost is reduced by 1/2 to 2/3 in most cases.

To evaluate the optimization capabilities of the bidding engine, we take the m1.small instance as an example.
In Figure 7, we show the predicted price and the real price as the blue and green lines, respectively. Suppose a user wants to run an application with an ET of 8 hours, under a budget constraint of $20 and a deadline of 1 day. In economical mode, we set the bid price to $0.157/h, as the bidding engine suggests; the timeline shows that the job starts at Hour 9 and ends at Hour 21. The duration is 12 hours, M is $8.04 (10 nodes), and AR is 8 hours / 12 hours = 67%. When we switch to fast-run mode, the bid price is $0.315, and the job starts at Hour 2 and ends at Hour 12. EC is $14.44, M is $11.98, and AR is 8 hours / 10 hours = 80%. Note, however, that the on-demand price for m1.small is only $0.085/h, which is lower than these spot prices, so the best strategy for users is actually to choose on-demand instances.
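As a rough check of that conclusion, Equation (1) can be applied to the same job on on-demand m1.small nodes. The calculation assumes (this is an assumption, not a measured result) that the same 10 nodes run uninterrupted for exactly the estimated 8 hours:

$$\text{total\_cost} = 0.085 \times 10 \times 8 = 6.80 \text{ USD}$$

Under that assumption, the on-demand cost is below both M = $8.04 in economical mode and M = $11.98 in fast-run mode.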
Figure 5: Spot price prediction on m1.small, Europe time zone

Figure 6: On-demand instance vs. spot instance on running time and cost

Figure 7: Bid strategies on m1.small spot instance on November 18, 2011.
6 Conclusion

Market-based cloud systems with spot instances offer the flexibility of free-market economics and the possibility of low-cost utility computing. A major challenge is how to bid given the users' constraints, such as resource availability and the deadline for job completion. We propose an algorithm to predict spot instance prices. We then formulate a model that enables users to optimize monetary cost, performance, and availability as desired by tuning some parameters (cluster size, instance type). We evaluated the results by simulation, using real price traces of Amazon's Spot Instances and workloads of real applications.

Some specific recommendations and general implications of this model are as follows.

  • The approach is more cost-efficient than a fixed-size instance choice. It reduces cost by more than 50% in most cases.
  • Spot Instances do not always provide inexpensive resources for transient workloads; e.g., the m1.small spot price is even higher than the on-demand price.
  • A user can change several of the knobs in order to achieve a suitable balance between monetary cost and desired service levels, such as the deadline for job execution or availability.

For future work, we can study the optimization problem when instances may be mixed (on-demand and SI together). In this proposal the bid price is fixed entirely by the users, whereas we should also consider a dynamic environment in which customer requirements change frequently. In addition, the generalization of price prediction mechanisms and disaster recovery problems are issues to be addressed.

References

[1] Amazon Elastic MapReduce.
[2] Spot Instance History Price Trace.
[3] Amazon Spot Instance.
[4] Amazon Web Service.
[5] H. Herodotou, F. Dong, and S. Babu. No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics. In ACM Symposium on Cloud Computing, 2011.
[6] RightScale.
[7] S. Yi, D. Kondo, and A. Andrzejak. Reducing Costs of Spot Instances via Checkpointing in the Amazon Elastic Compute Cloud. In IEEE 3rd International Conference on Cloud Computing, 2010.