The Cloud computing paradigm emerged by establishing new resource provisioning and consumption models. Together with improved resource management techniques, these models have contributed to a growing number of application developers who strongly support partially or completely migrating their applications to a highly scalable, pay-per-use infrastructure. In this paper we derive a set of functional and non-functional requirements and propose a process-based approach to support the optimal distribution of an application in the Cloud in order to handle workloads that fluctuate over time. Using the TPC-H workload as the basis, and by means of empirical workload analysis and characterization, we evaluate the performance of the application's persistence layer under different deployment scenarios using generated workloads with particular behavioral characteristics.
University of Stuttgart
Universitätsstr. 38
70569 Stuttgart
Germany
Phone +49-711-685 88337
Fax +49-711-685 88472
Santiago Gómez Sáez, Vasilios Andrikopoulos, Frank Leymann, and Steve Strauch
Institute of Architecture of Application Systems
{gomez-saez, andrikopoulos, leymann, strauch}@iaas.uni-stuttgart.de
Towards Dynamic Application Distribution Support for Performance Optimization in the Cloud
IEEE CLOUD 2014
Make an animation here with the perspective and the background
- Empirical Cumulative Distribution Function: an empirical cumulative distribution function (CDF) is a non-parametric estimator of the underlying CDF of a random variable. It assigns a probability of 1/n to each datum, orders the data from smallest to largest in value, and calculates the sum of the assigned probabilities up to and including each datum. The result is a step function that increases by 1/n at each datum. The empirical CDF is usually denoted by F_n or F^_n, and is defined as F^_n(x) = (number of data points x_i <= x) / n, i.e. the fraction of the sample that is less than or equal to x (a short sketch of how to compute it follows this list).
- Derive the workload behavior model -> The performance of a system depends on the workload that it must serve. For example, if work is evenly distributed, the performance will be better than if it comes in unpredictable bursts that lead to congestion. Therefore, performance evaluations require the use of representative workloads in order to produce dependable results. This can be achieved by collecting data about real workloads and creating statistical models that capture their features.
- Characterization -> Exploratory Data Analysis: exploring data sets to summarize their main characteristics
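As a rough illustration of the empirical CDF mentioned above, a minimal Python sketch; numpy and the throughput sample are assumptions for illustration only, not part of the actual evaluation setup:

    import numpy as np

    def empirical_cdf(sample):
        # Sort the data and assign probability 1/n to each datum; the CDF
        # value at the i-th smallest datum is therefore i/n.
        x = np.sort(np.asarray(sample))
        n = x.size
        y = np.arange(1, n + 1) / n
        return x, y

    # Hypothetical per-query throughput measurements (tx/s):
    throughputs = [12.3, 4.7, 8.1, 15.0, 6.2]
    x, F_hat = empirical_cdf(throughputs)
    for xi, fi in zip(x, F_hat):
        print(f"F^({xi}) = {fi:.2f}")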
TPC Benchmark™ H is comprised of a set of business queries designed to exercise system functionalities in a manner representative of complex business analysis applications. These queries have been given a realistic context, portraying the activity of a wholesale supplier to help the reader relate intuitively to the components of the benchmark
Workload model: Two ways to use a measured workload to analyze and evaluate the system: 1) use traced workload to drive a simulation, or 2) create a model from the trace and use this model for analysis or simulation.
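To illustrate the two options, a hedged sketch: replaying the recorded trace as-is versus fitting a simple statistical model to it and sampling synthetic arrivals. The trace values and the exponential inter-arrival model are assumptions made only for this example:

    import random

    # Measured inter-arrival times (seconds) from a hypothetical trace.
    trace = [0.8, 1.2, 0.5, 2.0, 1.1, 0.9]

    # Option 1: trace-driven - replay the recorded inter-arrival times directly.
    def replay(trace):
        for gap in trace:
            yield gap

    # Option 2: model-driven - fit a simple model (here: exponential
    # inter-arrivals with the trace's mean rate, an assumption) and sample from it.
    def synthetic(trace, n):
        rate = len(trace) / sum(trace)
        for _ in range(n):
            yield random.expovariate(rate)

    print(list(replay(trace)))
    print([round(g, 2) for g in synthetic(trace, 6)])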
Real world: testing vs. production. E.g., if a new job is created, then the workload model has to be derived again. Large amounts of real data are needed to be able to optimize the emulation.
Degree of detail:
C -> classification rule for each query Qi:
- if AVG throughput of Qi < median -> H
- else if AVG throughput of Qi > mean and % logical evaluations w.r.t. the benchmark > average table access -> M
- else -> L
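A minimal sketch of this classification rule in Python, assuming the per-query metrics are already available; the field names and the example data structure are assumptions, not the actual tooling used in the evaluation:

    from statistics import mean, median

    def classify(queries):
        # queries: list of dicts with hypothetical fields 'name',
        # 'avg_throughput', 'logical_eval_pct' and 'avg_table_access_pct'.
        throughputs = [q["avg_throughput"] for q in queries]
        med, avg = median(throughputs), mean(throughputs)
        classes = {}
        for q in queries:
            if q["avg_throughput"] < med:
                classes[q["name"]] = "H"      # high computational load
            elif (q["avg_throughput"] > avg
                  and q["logical_eval_pct"] > q["avg_table_access_pct"]):
                classes[q["name"]] = "M"      # medium computational load
            else:
                classes[q["name"]] = "L"      # low computational load
        return classes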
- In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.
- Explain that not all queries were executed within an appropriate time frame. We incorporate into the workload analysis only the queries that executed successfully in a reasonable time frame; some queries need more than an hour to execute. This is another parameter to take into account when distributing the application, as some queries take too much time in some deployment scenarios but a relatively acceptable time in others.
Standard deviation: how much dispersion from the average exists. A low standard deviation indicates that most of the values are close to the mean.
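A short sketch contrasting the two dispersion measures mentioned above (standard deviation and MAD); the sample values are made up to show why MAD is the more robust of the two:

    import numpy as np

    sample = np.array([5.1, 5.3, 4.9, 5.0, 42.0])   # one outlier

    std = sample.std(ddof=1)                              # sample standard deviation
    mad = np.median(np.abs(sample - np.median(sample)))   # median absolute deviation

    # The outlier inflates the standard deviation, while the MAD stays small,
    # which is why the MAD is the more robust measure of variability.
    print(f"std = {std:.2f}, MAD = {mad:.2f}")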
The distribution of CL is shifted to the left and increases fast, as the probability of occurrence of queries with a CL throughput is greater than that of the other classes. The probability of finding a CL query is high, so the CDF rises rapidly up to the CL queries; within the CL queries the probability of each individual query is low, so the CDF then increases slowly.
The distribution of CH is shifted to the right, as the probability of occurrence of CL queries is really low, and most of the CH queries have an equal, individually low probability which in conjunction sums up to a high probability.
The distribution of CM is geared towards the initial load.
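A hedged sketch of how these three per-class distributions could be visualized as empirical CDFs; matplotlib and the per-class throughput samples are assumptions for illustration only:

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical per-class throughput samples (tx/s); CL queries are cheap,
    # CH queries are expensive, CM sits in between.
    classes = {
        "CL": [20, 22, 25, 26, 28, 30, 31, 35],
        "CM": [8, 9, 10, 12, 15, 18],
        "CH": [1, 2, 2, 3, 4, 5],
    }

    for label, sample in classes.items():
        x = np.sort(sample)
        y = np.arange(1, len(x) + 1) / len(x)
        plt.step(x, y, where="post", label=label)

    plt.xlabel("average throughput")
    plt.ylabel("empirical CDF")
    plt.legend()
    plt.show()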
- Cloud service broker: intermediary between cloud consumer and cloud services. Functionalities: discovery and purchase, contract negotiation. A Cloud broker is a software application that facilitates the distribution of work between different cloud service providers.