A survey on performance management for internet applications



CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2010; 22:68–106
Published online 19 August 2009 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.1470

Jordi Guitart1,2,∗,†, Jordi Torres1,2 and Eduard Ayguadé1,2

1 Barcelona Supercomputing Center (BSC), Barcelona, Spain
2 Computer Architecture Department, Technical University of Catalonia, Barcelona, Spain

SUMMARY

Internet applications have become indispensable for many business and personal processes, turning the performance of these applications into a key issue. For this reason, recent research has comprehensively explored mechanisms for managing the performance of these applications, with special focus on dealing with overload situations and providing QoS guarantees to clients. This paper surveys the different proposals in the literature for managing Internet applications' performance. We present a complete taxonomy that characterizes and classifies these proposals into several categories, including request scheduling, admission control, service differentiation, dynamic resource management, service degradation, control theoretic approaches, works using queuing models, observation-based approaches that use runtime measurements, and overall approaches combining several mechanisms. For each work, we provide a brief description in order to give the reader a global understanding of the research progress in this area. Copyright © 2009 John Wiley & Sons, Ltd.

Received 27 November 2008; Revised 30 April 2009; Accepted 7 June 2009

KEY WORDS: internet applications; performance management; QoS guarantees; overload control

∗ Correspondence to: Jordi Guitart, Jordi Girona 1-3, Campus Nord UPC, Mòdul C6, E-08034 Barcelona, Spain.
† E-mail: jguitart@ac.upc.edu
Contract/grant sponsor: Ministry of Science and Technology of Spain and the European Union (FEDER funds); contract/grant number: TIN2007-60625
Contract/grant sponsor: BSC (Barcelona Supercomputing Center)
1. INTRODUCTION

1.1. Motivation

Nowadays Internet applications are present everywhere, having become indispensable for people and especially for companies. This importance translates into strong requirements on the performance, availability, and reliability of these applications. Organizations relying on Internet applications to provide services to their customers may need to be able to guarantee some quality of service (QoS) to them. In addition, they may also want to differentiate the QoS delivered to preferred customers from that delivered to ordinary customers. This typically occurs in Internet hosting platforms (a.k.a. Internet data centers or Internet service providers), which rely on on-demand computing models (e.g. utility computing), allowing service providers to make computational resources available to customers when needed. In such a model, platform resources are shared across multiple applications in order to maximize the efficient use of the resources while minimizing the enterprise costs. Application owners pay for the actual use of platform resources, and in return, the application is provided with guarantees on resource availability and QoS, which can be expressed in the form of a service level agreement (SLA). Obviously, in such an environment, being able to fulfill the established QoS agreements with the customers while managing the resources in the most cost-effective way is mandatory.

However, providing these performance guarantees is not straightforward. First, because of the complexity achieved by today's Internet applications. Formerly, these applications consisted of simple static web content (HTML files and images) accessed using a web browser. Nowadays, Internet applications support sophisticated web content (e.g. dynamic web content) and security capabilities to protect users' information. In addition, new web-based computing models, in which services (a.k.a. Web Services) are remotely offered over the web to the applications, have arisen too. This has led to complex architectures including several tiers (e.g. web server, application server and database server) that must interact to provide the requested service. Second, because the workload of Internet applications varies dynamically over multiple time scales (often in an unpredictable manner), and this can lead the application servers that host the services to overload (i.e. the volume of requests for content at a site temporarily exceeds the capacity for serving them). During overload conditions, the response times may grow to unacceptable levels, and exhaustion of resources may cause the server to behave erratically or even crash, causing denial of service. In e-commerce applications, such server behavior could translate into sizable revenue losses. For this reason, overload prevention is a critical goal, so that a system can remain operational in the presence of overload even when the incoming request rate is several times greater than the system's capacity. At the same time, it should be able to serve the maximum number of requests during such overload, while maintaining the response times (i.e. the QoS) at acceptable levels.

Taking into account these considerations, recent research has comprehensively explored mechanisms for managing the performance of Internet applications, with special focus on dealing with overload situations and providing QoS guarantees to customers. Distinct schemes have been proposed, each of them tackling the problem from a different perspective or focusing on a specific scenario. In addition, proposed schemes have been evolving according to the progress of Internet applications. For these reasons, great effort is required to get a global understanding of the research progress in this area. Being able to classify the proposed schemes according to a taxonomy
characterizing the performance management of Internet applications would greatly improve this understanding. However, to the best of our knowledge, this effort has not been carried out in the literature yet.

1.2. Contributions

In this paper, we present a complete taxonomy that characterizes the existing schemes followed in the implementation of performance management solutions for Internet applications. Then, we survey the related literature, placing each work within the described categories and briefly describing it in order to provide the reader with a basic idea of the proposed work. In this way, the reader can grasp the current research developments in this area and identify the outstanding issues. Our aim is to help readers comprehend the field of performance management for Internet applications by providing referral to relevant research materials. Finally, we perform a comparative discussion of the different techniques, which states their strong points and limitations.

Note that many works are mentioned in several sections. This occurs when these works combine several techniques. However, the detailed description of each work is given only in the section dedicated to the technique that is most representative of the proposal.

The remainder of this paper is organized as follows: Section 2 introduces the taxonomy of techniques for the performance management of Internet applications. Section 3 presents the works using request scheduling. Section 4 introduces the proposals that use admission control and service differentiation. Section 5 describes the works focused on dynamic resource management. Section 6 describes the approaches using service degradation. Section 7 describes the control theoretic approaches. Section 8 introduces the works based on queuing models. Section 9 describes the approaches that combine control theory and queuing models. Section 10 presents the observation-based approaches using runtime measurements. Section 11 describes the overall works that combine several techniques. Section 12 discusses the different techniques described in this paper. Finally, Section 13 presents the conclusions of this paper and some future research directions.

2. PERFORMANCE MANAGEMENT TAXONOMY

Figure 1 shows a taxonomy for classifying the proposed techniques for dealing with the performance management of Internet applications. On one side, the techniques can be grouped depending on the actuation performed to manage the performance. Techniques in this category include request scheduling, admission control, service differentiation, dynamic resource management, service degradation, and almost any combination of them. Notice that these techniques basically cover all the actuations that a provider can undertake when an unexpected increase in an application's demand jeopardizes its performance guarantees, namely: allocate additional capacity to the application by assigning idle or under-used resources (i.e. dynamic resource management), degrade the performance of admitted requests (if further degradation is allowed by the agreed SLA) in order to temporarily increase the effective capacity (i.e. service degradation), or turn away excess requests (i.e. admission control), while preferentially admitting more important requests (i.e. service differentiation) and giving prior service to them (i.e. request scheduling).
[Figure 1. Taxonomy of techniques for the performance management of Internet applications.]

On the other side, techniques can also be grouped depending on the mechanism used to take the performance management decisions. Based on some measurable values representing the dynamics of the system (e.g. the rate of incoming requests, the output response time, etc.), the configuration parameters of the techniques described in the previous paragraph must be computed. For instance, this includes computing the resource distribution among the different applications in the hosting platform, or computing the maximum number of requests that can be admitted. Techniques in this category include control theoretic approaches, queuing model-based approaches, observation-based approaches that use runtime measurements to make the required computations, and combinations of them.

3. REQUEST SCHEDULING

Request scheduling refers to the order in which concurrent requests should be served, as shown in Figure 2. Typically, servers have left this ordering to the operating system, which usually leads to processing incoming requests in a FIFO manner. However, some proposals suggest adopting
[Figure 2. Request scheduling technique.]

non-traditional request-ordering policies. Their common idea is to distinguish classes of requests and schedule these classes in a given order, thus providing different QoS to each class. In fact, these proposals implement service differentiation by means of request scheduling.

For instance, several works [1–4] implement policies based on shortest remaining processing time first (SRPT) scheduling to prioritize the service of short static web content requests in front of long requests. These works provide results that demonstrate the effectiveness of SRPT scheduling in providing significantly better response time to short requests at relatively low cost to long requests. In particular, Crovella et al. [1] experiment with connection scheduling at user level, improving the mean response time by a factor of up to 4, but at the cost of a drop in throughput by a factor of almost 2. The problem is that application-level scheduling does not provide fine enough control over the order in which packets enter the network. The authors evaluate their proposals by generating synthetic workloads to the Apache web server using the SURGE tool. Harchol-Balter et al. [2] overcome the problem in [1] by implementing the connection scheduling at kernel level (controlling the order in which socket buffers are drained into the network). This eliminates the drop in throughput and offers much larger performance improvements than [1]. However, it also requires modifying the OS kernel in order to incorporate the scheduling policy. The evaluation is conducted by issuing requests to a modified Apache web server using the S-Client software. The requests are extracted from the World Cup Soccer'98 server logs. Schroeder and Harchol-Balter [3] demonstrate an additional benefit of performing SRPT scheduling at kernel level for static content web requests. They show that SRPT scheduling can be used to mitigate the response time effects of transient overload conditions. The authors evaluate their approach by generating a workload based on a one-day trace from the 1998 Soccer World Cup to a modified Apache web server. They use their own trace-based web-workload generator based on the libwww library. Finally, Rawat and Kshemkalyani [4] extend the work in [2] by proposing and implementing a scheduling algorithm that takes into account, in addition to the size of the request, the distance of the client from the server. They show that this new policy can improve the performance of large-sized files by 2.5–10%. In the evaluation, the authors use a modified version of the S-Client software to generate requests to an Apache web server. The web workload is the same as used in [2].

The size of the requests can also be used to build scheduling policies that are more sophisticated. For instance, Zhou et al. [5] propose a size-adaptive request-aware scheduling mechanism for busy dynamic Internet services that use the multi-threaded programming model. The authors propose managing resource usage at a request level instead of a thread or process level. In this way, the system can differentiate long requests and short requests during load peaks and prioritize resources for short requests. The scheduling algorithm, called SRQ, is implemented at kernel level and integrates size adaptiveness and deadline driven prioritization in a multiple queue scheduling framework with
dynamic feedback-guided queue management. In order to evaluate their proposal, the authors have implemented SRQ in the Linux kernel and have integrated it with the Neptune clustering middleware [6]. This evaluation, which is driven by two services from the Ask Jeeves search engine, demonstrates that the proposed scheduler can significantly outperform the standard Linux scheduler.

The previous works perform quite well when targeting static content web requests. However, they are not suitable for Internet applications based on dynamic web content or web services. These applications generally benefit from taking into account the business value of the requests when ordering them. According to this, some works [7–10] implement policies that schedule the requests depending on their priority. In these works, the priority of a request is assigned considering business parameters, such as the type of customer that has issued the request (as done in [7,8]) or the reward that the request is likely to generate (as done in [9,10]).

In particular, Almeida et al. [7] propose priority-based request scheduling as a mechanism for providing differentiated QoS. Request priorities are assigned based on the customer to whom the requested file pertains. The authors implement the priority-based scheduling at both user and kernel levels. In the user-level approach, the Apache web server is modified with the inclusion of a scheduler process responsible for deciding the order in which the requests should be handled. In the kernel-level approach, the Linux kernel is modified such that request priorities are mapped into priorities of the HTTP processes handling them. The evaluation shows up to 26% of improvement for higher-priority requests, with an accompanying 504% fall in the performance of lower-priority ones, for the user-level approach. For the kernel-level approach, the improvement is similar, while the slowdown is around 208%. The web workload used in the evaluation is generated using the WebStone benchmark.

Alonso et al. [8] propose a mechanism to provide differentiated QoS to the different client categories by assigning different priorities to the threads attending the connections, depending on the type of client. The authors propose to schedule threads using native operating system priorities, arguing that priorities at the JVM level are not considered for the scheduling of threads at the kernel level. The authors provide an implementation of the mechanism using Linux Real Time priorities. The evaluation of this proposal, which is conducted by using the httperf workload generator to establish SSL connections with a Tomcat server for requesting static web content, shows that higher-priority clients obtain better QoS, especially in high competition situations.

Menasce et al. [9] consider the problem of resource scheduling in e-commerce sites with the aim of maximizing the revenue. The authors describe customers' sessions with a customer behavior model graph (CBMG) and then propose a priority scheme for scheduling requests at the user level based on the states of the users. Priorities change dynamically as a function of the state a customer is in and of the amount of money that the customer has accumulated in the shopping cart. In order to evaluate the proposed policy, the authors have built an e-commerce site simulator, which is a hybrid between a trace-driven simulator and a discrete event simulator. SURGE is used to generate requests to start the sessions at the e-commerce site. Simulation results show that the suggested scheme can increase the revenue by as much as 29% over the no-priority policy for an overloaded site.
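State- and reward-based prioritization of this kind can be reduced to a small user-level sketch. The implementation below is hypothetical: the state weights and the priority formula are illustrative placeholders, not the CBMG-derived values of [9]. Requests carry their session state and accumulated cart value, and the highest-priority request is dequeued first.

```python
import heapq

# Illustrative state weights (hypothetical; not the CBMG-derived values of [9]).
STATE_WEIGHT = {"browse": 1.0, "search": 1.0, "add_to_cart": 3.0, "checkout": 10.0}

class RewardPriorityQueue:
    """Serve requests from sessions that are likely to generate revenue first."""

    def __init__(self):
        self._heap = []   # entries: (-priority, seq, request_id); heapq is a min-heap
        self._seq = 0     # tie-breaker preserving FIFO order among equal priorities

    def push(self, request_id, state, cart_value):
        # Priority grows with the session state and the accumulated cart value.
        priority = STATE_WEIGHT.get(state, 1.0) * (1.0 + cart_value)
        heapq.heappush(self._heap, (-priority, self._seq, request_id))
        self._seq += 1

    def pop(self):
        """Return the highest-priority request, or None if the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

With such a policy, a checkout request from a session with a full shopping cart is served before an anonymous browsing request, which is the behavior these reward-oriented schemes aim for under overload.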
Finally, Totok and Karamcheti [10] propose a user-level request scheduling mechanism, called reward-driven request prioritization (RDRP), which maximizes the reward attained by an Internet service by dynamically assigning higher execution priorities to the requests whose sessions are likely to bring more profit (or any other application-specific reward) to the service. The authors use a Bayesian inference analysis to statistically predict future session structure (in the form of a CBMG) by comparing the requests seen so far with aggregated information about client behavior. The predicted reward and execution cost values are used to compute each request's priority, which is used in scheduling bottleneck server resources. The authors have integrated their proposed methods in the open-source application server JBoss, and they have used the TPC-W transactional web e-commerce benchmark to conduct the evaluation, showing that RDRP techniques yield benefits in both underload and overload situations, for both smooth and bursty client behavior.

Table I. Characteristics of works including request scheduling in their solution.

work | policy                                          | level  | joint? | workload
[1]  | SRPT                                            | user   | no     | SURGE (static) on Apache
[2]  | SRPT                                            | kernel | no     | World Cup Soccer'98 logs (static) on Apache
[3]  | SRPT                                            | kernel | no     | World Cup Soccer'98 logs (static) on Apache
[4]  | SRPT                                            | kernel | no     | World Cup Soccer'98 logs (static) on Apache
[5]  | SRQ                                             | kernel | no     | Ask Jeeves search on Neptune
[7]  | priority (type of client)                       | both   | no     | WebStone (static) on Apache
[8]  | priority (type of client)                       | kernel | no     | static & SSL on Tomcat
[9]  | priority (reward)                               | user   | no     | simulation of e-commerce site (dynamic)
[10] | RDRP (priority, reward)                         | user   | no     | TPC-W (dynamic) on JBoss
[11] | priority (type of client)                       | user   | yes    | 8 kb static web pages on Apache
[12] | priority (reward) + GPS                         | user   | yes    | Matlab simulation of electronic store (dynamic)
[13] | DWFS (priority, session completion probability) | user   | yes    | WebStone (static) on Apache
[14] | SJF                                             | user   | yes    | TPC-W (dynamic) on Tomcat
[15] | weighted round robin for each traffic class     | user   | yes    | Trade 6 benchmark (dynamic) on IBM WebSphere
[6]  | priority (service yield)                        | user   | yes    | Ask Jeeves search on Neptune
[16] | SRJF                                            | user   | yes    | simulation of messaging service based on JMS
[17] | priority (type of client)                       | kernel | yes    | WebStone (static & CGI files) on Apache
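As a concrete illustration of the size-based policies discussed above, a minimal user-level sketch of SRPT ordering might look as follows. The cited works operate on connections and kernel socket buffers; the request representation here is an assumption made purely for illustration.

```python
import heapq

class SRPTQueue:
    """Shortest-remaining-processing-time-first request queue (user-level sketch)."""

    def __init__(self):
        self._heap = []   # entries: (remaining_bytes, seq, request_id)
        self._seq = 0     # tie-breaker keeps FIFO order for equal sizes

    def enqueue(self, request_id, remaining_bytes):
        # Remaining response size stands in for remaining processing time,
        # the usual proxy for static web content.
        heapq.heappush(self._heap, (remaining_bytes, self._seq, request_id))
        self._seq += 1

    def dequeue(self):
        """Return the request with the least remaining work, or None if idle."""
        if not self._heap:
            return None
        _, _, request_id = heapq.heappop(self._heap)
        return request_id

q = SRPTQueue()
q.enqueue("long", 500_000)   # large static file
q.enqueue("short", 2_000)    # small static file
q.enqueue("medium", 40_000)
# Short requests leave the queue first, at modest cost to long ones.
```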
Table I summarizes the main characteristics (concerning this technique) of the works using request scheduling in their solution. This includes the scheduling policy, the level where this policy is implemented (i.e. user, kernel or both), whether this work uses other techniques in addition to scheduling (further details on the techniques used can be found in Table VII), and its target workload.

However, although request scheduling can improve the response times, under extreme overloads other mechanisms become indispensable. In any case, better scheduling can always complement any other mechanism. In this sense, some works have incorporated request scheduling into their overload prevention solutions [6,11–17]. Detailed descriptions of these works can be found in the corresponding sections.

4. ADMISSION CONTROL AND SERVICE DIFFERENTIATION

Service differentiation is based on differentiating classes of customers and providing different QoS to each class, as shown in Figure 4. This allows, for instance, maintaining response times
[Figure 3. Admission control technique.]
[Figure 4. Service differentiation technique.]

of preferred clients in the presence of overload. As discussed in the previous section, service differentiation can be implemented by means of scheduling. Other approaches, such as refusing connections coming from a given customer class, allocating more resources to higher-priority services, or establishing different target delays for different service classes, are introduced in this section and in the following ones.

Admission control is based on reducing the amount of work the server accepts when it is faced with overload, by refusing a percentage of connections, as shown in Figure 3. Simpler admission control approaches refuse all the incoming connections when predefined thresholds in the system (e.g. the number of pending connections in the queue or the CPU utilization) are exceeded [18]. Nevertheless, admission control is generally approached as a special case of service differentiation, where a given customer class (typically the lower-priority class) is provided with a 'refused' level of service when the server is overloaded. For this reason, admission control and service differentiation have been combined in a large number of works [11,17,19–24] to prevent server overload and provide differentiated QoS to clients.

For instance, Bhatti and Friedrich [11] propose WebQoS, an architecture for web servers to provide QoS to differentiated clients. The authors use request classification, admission control, and request scheduling to support distinct performance levels for different classes of users and maintain predictable performance even when the server is overloaded. Requests are classified into priorities according to the customers' preference, and admission control of low-priority requests is triggered when thresholds on the number of requests queued and the number of premium requests queued are exceeded. Admitted requests are scheduled for execution depending on the selected policy. The authors suggest several potential scheduling policies at the user level, but evaluate only priority scheduling. Nevertheless, they do not experimentally demonstrate sustained throughput in the presence of overload. This work considers only static web content, providing an example implementation using Apache. The client workload is generated using the httperf tool.
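A threshold-triggered admission scheme in this spirit can be sketched as follows. The class names, threshold values, and accounting below are hypothetical placeholders for illustration, not the actual WebQoS implementation.

```python
class ThresholdAdmissionController:
    """Shed low-priority requests when queue thresholds are exceeded (sketch)."""

    def __init__(self, max_total_queued=100, max_premium_queued=40):
        self.max_total_queued = max_total_queued      # illustrative threshold
        self.max_premium_queued = max_premium_queued  # illustrative threshold
        self.total_queued = 0
        self.premium_queued = 0

    def admit(self, is_premium):
        # Premium requests are refused only when the premium queue itself is full.
        if is_premium:
            if self.premium_queued >= self.max_premium_queued:
                return False
            self.premium_queued += 1
            self.total_queued += 1
            return True
        # Low-priority requests are shed first, as soon as either threshold trips.
        if (self.total_queued >= self.max_total_queued
                or self.premium_queued >= self.max_premium_queued):
            return False
        self.total_queued += 1
        return True

    def complete(self, is_premium):
        """Release queue slots when a request finishes."""
        self.total_queued -= 1
        if is_premium:
            self.premium_queued -= 1
```

The asymmetry is the essential point: under pressure, only the low-priority class sees rejections, so preferred clients keep predictable service.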
Chen et al. [19] present an admission control algorithm, called admission control based on estimation of service times, that attempts to limit the number of admitted requests based on estimated service times. Admission of a request is decided by comparing the available computation power for a duration equivalent to the predetermined delay bound with the estimated service time of the request. A double-queue structure is used to handle the over/under admission problems due to bad estimation of service times. The authors also present an extension to give better service to higher-priority tasks. The available resources for lower-priority tasks are equal to the system capacity minus the predicted resource requirements of higher-priority tasks. In order to test the effectiveness of their proposal, the authors have developed an event-driven simulator and use real traces of web requests (extracted from the logs of the CS departmental web server at Michigan State University) as input.

Iyer et al. [18] propose three simple schemes for web server overload control. The first scheme uses thresholds on the connection queue length for selectively dropping incoming requests as they arrive at the server. The second scheme provides feedback to the proxy during overloads, which would cause it to restrict the traffic being forwarded to the server. The third scheme is simply a combination of the other two. The evaluation of this proposal is conducted by requesting a single ASP file from an IIS server. Although the evaluation results show that these schemes improve the server throughput under overload, the authors do not address how the thresholds may be determined online.

Jamjoom and Reumann [20] propose an adaptive mechanism called QGuard that uses traffic shaping to provide overload protection and differential service for Internet servers. QGuard drives the selection of incoming packet rates based on the observation of different system load indicators. However, it cannot take the potential resource consumption of requests into account, having to reduce the acceptance rate of all requests when one resource is over-utilized. For evaluating their mechanism, the authors have implemented a load-generating server application that can be configured to generate different types of load: CPU activity, memory usage, file accesses, and the generation of large response messages.

Kanodia and Knightly [21] present a queuing-based algorithm, called latency-targeted multiclass admission control, for admission control and service differentiation using request and service statistical envelopes (a modeling technique used to characterize the traffic of multiple flows over a shared link). The authors propose a server architecture having one request queue per service class. The admission controller attempts to meet latency bounds for the multiple classes using measured request and service rates of each class. The algorithm is only evaluated via trace-driven simulation using streams of tokenized requests collected from the logs of two real web servers: the server of the Computer Science Department at Rice University and the server for the 1998 World Cup Soccer.

Li and Jamin [22] describe a measurement-based admission control algorithm to provide proportional bandwidth allocation to clients independently of the number of requests generated by each one. The bandwidth percentage for each client is statically specified at server startup. Requests from a client are rejected when it has both received more than its allocated bandwidth and the server is fully utilized. This approach considers the introduction of controlled amounts of delay in the processing of certain requests during overloads to ensure that the different classes of requests receive the appropriate share of the bandwidth.
The authors evaluate their proposal on the Apache web server using the httperf measurement tool.
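The admission rule of [22] rejects a client's requests only when that client has exceeded its bandwidth share and the server is fully utilized; this reduces to a simple predicate. The sketch below is an illustrative reduction, where the byte accounting and the utilization input are assumptions, not the paper's implementation.

```python
class ProportionalShareController:
    """Reject a client only when it is over its bandwidth share AND the
    server is fully utilized (sketch of the admission rule in [22])."""

    def __init__(self, shares):
        # shares: client -> fraction of server bandwidth, fixed at startup.
        assert abs(sum(shares.values()) - 1.0) < 1e-9
        self.shares = shares
        self.consumed = {c: 0.0 for c in shares}  # bytes served this interval

    def admit(self, client, server_utilization):
        total = sum(self.consumed.values())
        over_share = (total > 0 and
                      self.consumed[client] / total > self.shares[client])
        # Both conditions must hold before a request is rejected.
        return not (over_share and server_utilization >= 1.0)

    def account(self, client, bytes_served):
        """Record bandwidth actually consumed by a client's responses."""
        self.consumed[client] += bytes_served
```

Note that an over-share client is still admitted while the server has spare capacity; rejection only starts once the server saturates, which is what makes the allocation work-conserving.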
Voigt et al. [17] present different kernel-based mechanisms for admission control and service differentiation in overloaded web servers. Their mechanisms include TCP SYN policing, prioritized listen queue, and HTTP header-based connection control. TCP SYN policing limits the number of incoming connection requests using a token bucket policy and prevents overload by enforcing a maximum acceptance rate for non-preferred clients. The prioritized listen queue reorders the listen queue based on predefined connection priorities (request URL and client IP address) in order to provide low delay and high throughput to clients with high priority. Finally, HTTP header-based connection control provides admission control and priority based on application-level information such as URLs and cookies. The authors have implemented the proposed kernel mechanisms in the AIX 5.0 system, and have evaluated the performance of a modified Apache web server using WebStone 2.5 and a modified version of S-Client as workload generators.

Finally, Voigt and Gunningberg [23] present an adaptive architecture that performs admission control based on the expected resource consumption of requests in order to avoid over-utilization of the critical resources. The authors identify resource-intensive requests and the resource they demand by the URL in the HTTP header, and adapt the rate of CPU- and bandwidth-intensive requests according to the utilization of the corresponding resource. The adaptation of the acceptance rates is done using feedback control loops. This proposal is evaluated on the Apache web server using the S-Client tool for generating a workload that is derived from the SURGE traffic generator.

4.1. Admission control for dynamic content sites and web services

The aforementioned works on admission control are based on web systems serving only static content, and for this reason, most of the proposed solutions are not directly applicable to multi-tiered sites based on dynamic web content or web services. The main challenge in these environments is to determine how the acceptance of a new request would affect the state of the system. Formerly, it was assumed that request cost was linear in proportion to the size of the response generated. While this is true for static web content, the times for serving dynamic content or web services have much greater variability, which bears no relation to the size of the response generated. Because of this, some works [14,16,25,26] have recently focused specifically on proposing admission control and service differentiation solutions for these systems.

For instance, Elnikety et al. [14] present a proxy called Gatekeeper that transparently performs admission control and user-level request scheduling for multi-tiered e-commerce web sites by externally observing the execution costs of requests. When a request arrives at the server, Gatekeeper determines whether admitting the request would exceed the capacity of the system by using an estimation of the resource usage of that request. In addition, Gatekeeper sorts the admission queue based on the requests' expected processing times (SJF scheduling). Since the proxy is external, no modification to the host OS, web server, application server or database is required. However, this proposal depends on offline profiling in order to determine the capacity of the system. The evaluation is conducted by using the TPC-W benchmark to generate an e-commerce workload to a Tomcat application server. Guitart et al.
[25] present a session-based admission control mechanism for secure environments, which is based on differentiating SSL connections depending on whether the SSL connection resumes an existing SSL session on the server or not. This mechanism prioritizes the acceptance of client connections that resume an existing SSL session, in order to increase the probability for a client
to complete a session, maximizing in this way the number of sessions successfully completed. This allows e-commerce sites based on SSL to increase the number of transactions completed, thus generating more revenue. In addition, the mechanism prevents application overload by limiting the acceptance of new SSL connections to the maximum number the application can handle without overloading, depending on the available computation power and the estimated service time of the incoming connections. This proposal is evaluated by using the httperf generator to issue secure requests to a Tomcat server with the RUBiS benchmark deployed.

Verma and Ghosal [16] propose a service-time-based online admission control methodology for maximizing the profits of a service provider. The authors derive an online version of the shortest remaining job first (SRJF) policy for taking the admission control decision for a request. They use an estimate of the request service time, its QoS bounds, predictions of the arrivals and service times of requests to come in the short-term future, the rewards associated with servicing a request within its QoS bounds, and the capacity availability (after accounting for the usage of already admitted requests). The admission control admits the subset of requests that maximizes the profit of the provider. The evaluation is carried out with a content-based publish/subscribe messaging middleware service based on the standard Java Messaging Service (JMS).

Finally, Welsh and Culler [26] describe an adaptive approach to overload control in the context of the SEDA web server [27]. SEDA decomposes services into a set of event-driven stages connected by request queues. Each stage can perform admission control based on monitoring the response time through the stage.
The admission rate is controlled using an additive-increase multiplicative-decrease (AIMD) algorithm that gradually increases the admission rate while performance is satisfactory and decreases it multiplicatively upon observing QoS violations. Their solution relies mainly on a heuristic approach to controller design (control parameters are set by hand after running tests against the system) instead of using control-theoretic techniques. The downside of this approach is that a request may be rejected late in the processing pipeline, after it has consumed significant resources in the previous stages. The evaluation includes dynamic content in the form of a web-based email service (Arashi). Emulated users access the service following a simple Markovian model of user behavior derived from traces of the UC Berkeley Computer Science Division's IMAP server.

4.2. Session-based admission control

In many web sites, especially in e-commerce, most of the applications are session-based. A session contains temporally and logically related request sequences from the same client. Session integrity is a critical metric in e-commerce. For an online retailer, the higher the number of sessions completed, the higher the amount of revenue that is likely to be generated. The same statement cannot be made about individual request completions. Sessions that are broken or delayed at some critical stage, like checkout or shipping, can mean a loss of revenue for the web site. Sessions have distinguishing features with respect to individual requests that complicate overload control. For this reason, admission control techniques that work on a per-request basis, such as limiting the number of threads in the server or suspending the listener thread during overload periods, may lead to a large number of broken or incomplete sessions when the system is overloaded (despite the fact that they can help prevent server overload).
Most of the aforesaid admission control works suffer from this limitation when dealing with session-based workloads. This has motivated some proposals [12,13,25,28,29] that prevent overload while considering the particularities of session-based applications.

For instance, Carlstrom and Rom [12] describe an architecture for user-level request scheduling and session-based admission control in web servers. The admission controller is rate-based: new sessions arriving after the maximum admitted session arrival rate has been reached are refused. If the initial request of a session is admitted, all subsequent requests within the same session will also be admitted. Once the session has been established, each request is classified with respect to the requested stage in the session, and enters a stage-specific FIFO queue before receiving service. When the critical resource is freed, queued requests are scheduled using a generalized processor sharing (GPS) algorithm that uses techniques for nonlinear optimization (based on steady-state queue behavior) to maximize an application-specific reward function. The evaluation is based on simulating an electronic store using Matlab.

Chen and Mohapatra [13] characterize a commercial web server log to derive session-based dependency relationships among HTTP requests. Based on this session-level traffic model, the authors propose a dynamic weighted fair sharing (DWFS) scheduling algorithm to control overload. This work combines an admission control mechanism (proposed by the authors in a previous work [19]) that admits as many sessions as possible as long as the server is not overloaded with the DWFS algorithm, which discriminates the scheduling of requests based on the probability of completion of the session the requests belong to. In this way, requests of sessions that have a higher probability of being completed are scheduled first. The authors evaluate their proposal over an Apache web server using a modified version of WebStone 2.5.
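The rate-based session admission scheme of Carlstrom and Rom [12], described above, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, parameters, and the sliding-window rate measurement are assumptions.

```python
import time

class SessionAdmissionController:
    """Rate-based session admission: new sessions are refused once the
    admitted-session arrival rate reaches max_rate; requests belonging
    to already-admitted sessions are always accepted."""

    def __init__(self, max_rate, window=1.0):
        self.max_rate = max_rate      # admitted new sessions per second
        self.window = window          # sliding-window length (seconds)
        self.admitted = set()         # ids of admitted sessions
        self.arrivals = []            # admission timestamps inside window

    def handle(self, session_id, now=None):
        now = time.time() if now is None else now
        if session_id in self.admitted:
            return True               # in-session request: always admitted
        # drop admission timestamps that fell out of the sliding window
        self.arrivals = [t for t in self.arrivals if now - t < self.window]
        if len(self.arrivals) / self.window >= self.max_rate:
            return False              # session arrival rate at the limit
        self.arrivals.append(now)
        self.admitted.add(session_id)
        return True
```

Note that, as in [12], the admission decision is taken only for the first request of a session; every later request of an admitted session bypasses the rate check.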
Cherkasova and Phaal [28] propose a session-based admission control scheme and demonstrate that considering session characteristics in admission control reduces the number of sessions rejected. This approach allows admitted sessions to run to completion even under overload, denying new sessions when the observed server utilization exceeds a predefined threshold. Once the observed utilization drops below the given threshold, the server begins to admit and process new sessions again. The authors evaluate their proposal by simulating a commercial site and using a simple model to characterize sessions.

Finally, Voigt and Gunningberg [29] extend the architecture proposed in their previous work [17] to provide kernel-based control of persistent connections. The goal is to allow the important sessions (understanding a session as the sequence of individual requests on the same persistent connection) to complete when the server becomes overloaded. In this approach, the importance of persistent connections is based on the cookies found in the HTTP header. When the CPU utilization is higher than a predefined threshold, they do not abort all the incoming connections, but only the ones that are considered less important (those not carrying a suitable cookie). However, this approach relies on the existence of unimportant persistent connections that can be aborted. If all the active persistent connections are important, the system will not abort any of them and will fail to recover. This proposal is evaluated on the Apache web server using a modified version of the S-Client tool. The workload consists of static and dynamic requests; the latter are minor modifications of WebStone CGI files.

Table II summarizes the main characteristics (concerning this technique) of the works using admission control and service differentiation in their solution.
This includes how the admission control mechanism is triggered, the actuation performed, whether the work supports sessions, whether it also uses other techniques, and its target workload.
Table II. Characteristics of works including admission control and service differentiation in their solution.

work | triggered by | actuation | sess? | joint? | workload
-----|--------------|-----------|-------|--------|---------
[11] | threshold in queue length (total and premium) | rejection of basic requests | no | yes | 8 kb static web pages on Apache
[19] | available computation power vs. estimated service time | rejection of requests (lowest priority first) | no | no | event-driven simulation of MSU CS server logs (static)
[18] | threshold in queue length | rejection of requests | no | no | single ASP file on IIS server
[20] | system load indicators | traffic shaping (rate) | no | no | own server that can be configured to generate different types of load
[21] | request & service rate per class | rejection of requests per class | no | no | trace-driven simulation of Rice server & World Cup Soccer'98 logs (static)
[22] | threshold in bandwidth per client | rejection of requests per client | no | no | 10 kb static web pages on Apache
[17] | token bucket policy defined by rate & burst | rejection of requests | no | yes | WebStone (static & CGI files) on Apache
[23] | CPU utilization & queue length | rate adaptation | no | no | Surge (static) & CGI files on Apache
[24] | threshold in stretch factor per class (response time/service demand) | rejection of requests (lowest priority class first) | no | yes | Internet search & online auction (dynamic)
[14] | estimated resource usage of requests vs. system capacity | rejection of request | no | yes | TPC-W (dynamic) on Tomcat
[25] | available computation power vs. estimated service time | rejection of new clients | yes | no | RUBiS (dynamic) on Tomcat
[16] | estimated service time & associated reward vs. capacity availability & predicted arrivals | rejection of request | no | yes | simulation of messaging service based on JMS
[26] | additive-increase multiplicative-decrease algorithm on 90th-percentile response time | rejection of requests per stage | no | no | Arashi web-based email service (dynamic) on SEDA
[12] | rate threshold based on expected reward | rejection of new session | yes | yes | Matlab simulation of electronic store (dynamic)
[13] | available computation power vs. estimated service time | rejection of requests (lowest priority first) | yes | yes | WebStone (static) on Apache
[28] | threshold in utilization | reject new sessions | yes | no | commercial site simulation
[29] | threshold in CPU utilization | reject connections without proper cookie | yes | no | 12–29 kb static files & CGI files on Apache
[30] | serving request before its maximum allowed response time unfeasible | selective request dropping | no | no | TPC-W & Teoma Internet Search Engine (dynamic) on Apache/Tomcat/Neptune
[31] | deviation from target settings in CPU & memory | update MaxClients & KeepAlive Timeout | no | no | WebStone (static) on Apache
[32] | deviation from target response time | adjust request acceptance probability | no | yes | TPC-W (dynamic) on Tomcat
[33] | deviation from target response time | adjust request acceptance probability | no | yes | TPC-W (dynamic) on Tomcat
[34] | available computation power vs. estimated service time | rejection of new clients | yes | yes | RUBiS (dynamic) on Tomcat
[35] | serving request before its maximum allowed response time unfeasible | rejection of requests within a batch | no | yes | 1–256 kb static files & PHP scripts on Apache
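Several of the works in Table II trigger admission control when an observed utilization crosses a predefined threshold (e.g. [28,29]). A minimal sketch of such a gate follows; the use of separate high/low watermarks to avoid oscillation around a single threshold is an assumption for illustration, not a detail taken from the cited works.

```python
class UtilizationGate:
    """Threshold-triggered admission gate: stop admitting new sessions when
    observed utilization exceeds `high`; resume once it drops below `low`.
    Threshold values are illustrative."""

    def __init__(self, high=0.9, low=0.7):
        self.high, self.low = high, low
        self.admitting = True

    def update(self, utilization):
        # hysteresis: switch off above `high`, back on only below `low`
        if self.admitting and utilization > self.high:
            self.admitting = False
        elif not self.admitting and utilization < self.low:
            self.admitting = True
        return self.admitting
```

Sessions already admitted are unaffected by the gate, matching the run-to-completion behavior of [28].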
While admission control and service differentiation can be effective for providing differentiated QoS to clients and preventing server overload, they must be used with caution. Using admission control means that a percentage of requests are refused, which can translate into unhappy clients. For this reason, dropping too many requests should be avoided, since this will also cause revenue loss. Accordingly, it is desirable to attempt less aggressive solutions first. In addition, session-based workloads require specialized admission control solutions to support performance guarantees. Furthermore, most of the proposed solutions trigger admission control when some given thresholds are surpassed. In general, these thresholds are static and must be determined beforehand. For this reason, these solutions are not directly applicable if we aim for an autonomic system.

5. DYNAMIC RESOURCE MANAGEMENT

As commented before, the hosting platform is responsible for providing its running applications with enough resources to guarantee the QoS agreed in the SLA. When the hosting platform fails to fulfill these QoS guarantees, it must compensate the customer by paying a penalty. In order to avoid these payments, hosting platforms tend to over-provision the applications with an amount of resources corresponding to their highest expected resource demand. However, as the resource demand of Internet applications can vary considerably over time, this can lead to resource under-utilization in the hosting platform most of the time. Of course, this is not desirable when a profitable hosting platform is envisioned. Notice that the unused resources could be temporarily allocated to another service, improving in this way the utilization and the profit for the provider.
Taking this into account, recent studies [36–38] have reported the considerable benefits of dynamically reallocating resources among hosted applications based on the variations in their workloads, instead of statically over-provisioning resources in a hosting platform. The goal is to meet the applications' requirements on demand and adapt to their changing resource needs, as shown in Figure 5. In this way, better resource utilization can be achieved and the system can react to unexpected workload increases. In addition, differentiated service can easily be provided to different customer classes by varying their allocated resources.

Figure 5. Dynamic resource management technique.

This premise has been used as the basis of several works, which dynamically provision resources to applications on demand in order to meet the agreed QoS while using the hosting platform resources in a cost-effective way. These works can be classified depending on the resource provisioning model they consider, which can be either a dedicated or a shared model (see [36]). In the dedicated model, some cluster nodes are dedicated to each application and the provisioning technique must determine how many nodes to allocate to the application. In the shared model, the node resources can be shared among multiple applications and the provisioning technique needs to determine how to partition the resources on each node among competing applications.

5.1. Resource management in dedicated hosting platforms

Some works [35,39–43] have addressed the resource provisioning problem using the dedicated model. For instance, Appleby et al. [39] present Oceano, a dedicated management infrastructure for server farms developed at IBM. In Oceano, the server farm is physically partitioned into clusters, each serving one customer. Servers can be moved dynamically across clusters depending on the customers' changing needs. The addition and removal of servers from clusters is triggered by SLA violations detected by monitoring agents. This approach suffers from a lack of responsiveness due to the latency of the server transfer operation from one cluster to another.

Bouchenak et al. [41] present Jade, a middleware for self-management of distributed software environments. The main principle is to wrap legacy software pieces in components in order to provide a uniform management interface, thus allowing the implementation of management applications. The authors use Jade to implement a self-optimizing version of an emulated electronic auction site deployed on a dedicated J2EE cluster. The self-optimization, which is implemented using a control loop based on thresholds, consists of dynamically increasing or decreasing the number of replicas (at any tier of the J2EE architecture) in order to accommodate load peaks. The proposed self-optimization is very simple, intended only to demonstrate the ability of Jade to implement specific autonomic components. The evaluation has been realized by deploying the RUBiS benchmark on the Tomcat application server. Finally, Ranjan et al.
[42] describe a framework for QoS-driven dynamic resource allocation in dedicated Internet data centers. The authors propose a simple adaptive algorithm to reduce the average number of servers used by an application while satisfying its QoS objectives. The algorithm assumes a G/G/N open queuing model with FCFS scheduling on each server in the cluster, and a response time linearly related to utilization. This proposal is evaluated with a trace-driven simulator based on YacSim, using traces from large-scale e-commerce and search-engine sites (e.g. Google).

Nevertheless, hosting platforms based on a dedicated model tend to be expensive in terms of space, power, cooling, and cost. For this reason, the shared model constitutes an attractive lower-cost choice for hosting environments when the dedicated model cannot be afforded.

5.2. Resource management in shared hosting platforms

5.2.1. Resource management in a single machine

In the context of resource management solutions using the shared model, some works [44,45] have developed mechanisms for predictable allocation of resources in single-machine environments. In particular, Banga et al. [44] provide resource containers, a new operating system abstraction that enables fine-grained allocation of resources and accurate accounting of resource consumption in web-server-like applications. When combined with proportional resource schedulers, resource containers enable the provision of differentiated services, where a predetermined fraction of the server resources is guaranteed to be available to each service. The authors have prototyped their proposal by modifying a Digital Unix kernel. The evaluation is conducted by issuing requests to a modified version of the thttpd server (changed in order to use resource containers) from a number of clients running the S-Client software.

Finally, Liu et al. [45] describe the design of online feedback control algorithms to dynamically adjust entitlement values for a resource container on a server shared by multiple applications. The goal is to determine the minimum level of entitlement for the container such that its hosted application achieves the desired performance levels. A proportional-integral (PI) controller is designed offline for a fixed workload, and a self-tuning adaptive controller is described to handle limited variations in the workload. These controllers are used to compute the proper level of CPU entitlement for the web server in order to maintain the measured mean response time around the desired response time. The controllers are implemented and evaluated on a testbed using HP-UX PRM as the resource container and the Apache web server as the hosted application in the container. Client requests are generated using the httperf tool.

5.2.2. Resource management in clustered hosting platforms

Whereas the mechanisms for resource management in a single machine are useful, they are insufficient in typical hosting platforms, which generally comprise clusters of nodes. These systems require coordinated resource management mechanisms that consider all the nodes in the cluster. Accordingly, addressing the resource provisioning problem for clustered architectures using the shared model has been the main goal of numerous works [6,24,46–52]. In particular, Aron et al. [46] present Cluster Reserves, a resource management facility for cluster-based web servers that affords effective performance isolation in the presence of multiple services that share a server cluster.
Cluster Reserves extends single-node resource management mechanisms (i.e. resource containers [44]) to a cluster environment. This work provides differentiated service to clients by providing fixed resource shares to services spanning multiple nodes and dynamically adjusting the shares on each individual server, based on local resource usage, to bound the aggregate usage of each hosted service. The authors employ a linear programming formulation for allocating resources, resulting in polynomial time complexity. The authors evaluate their proposal by running the Apache web server at the server nodes and generating requests with a client program based on the S-Client software.

Chase et al. [49] present the design and implementation of Muse, an architecture for resource management in a shared hosting center with an emphasis on energy as a driving resource management issue. Muse uses an economic model for dynamic provisioning of resources in which services bid for resources as a function of delivered performance. The system continuously monitors load and plans resource allotments by estimating their effect on service performance. A greedy resource allocation algorithm, called maximize service revenue and profit, adjusts resource prices to balance supply and demand. A major goal of this work is to maximize the number of unused servers so that they can be powered down to reduce energy consumption. The authors evaluate their approach on Apache web servers by generating combined workloads of synthetic traffic and real request traces with the SURGE workload generation tool.

Chase et al. [50] describe the architecture and implementation of cluster-on-demand, an automated framework to manage resources in a shared hosting platform, which allows partitioning a physical cluster on the fly into multiple independent virtual clusters. A virtual cluster (vcluster)
is a functionally isolated subset of cluster nodes configured for a common purpose, with associated user accounts and storage resources, a user-specified software environment, and a private IP address block and DNS naming domain. Nodes can be reallocated among these virtual clusters as needed (according to demand or resource availability), reconfiguring them with PXE network boots. The system resizes each dynamic vcluster in cooperation with the service hosted within it, which contains the logic for monitoring load and changing membership in the active server set. The evaluation uses Sun's GridEngine and traces from the Duke Computer Science server.

Shen et al. [6] present an integrated resource management framework (part of the Neptune system) that provides flexible service quality specification, efficient resource utilization, and service differentiation for cluster-based services. The authors introduce a unified quality-aware metric, called 'quality-aware service yield', which depends on the response time. The overall goal of the system is to maximize the aggregate service yield resulting from all requests. The resources are managed through a two-level request distribution and scheduling scheme. At the cluster level, requests for each service class are evenly distributed to all replicated service nodes without explicit partitioning. Inside each node, an adaptive scheduling policy adjusts to the runtime load condition and seeks a high aggregate service yield. This policy considers the relative deadline of the requests, their expected resource consumption (estimated using an exponentially weighted moving average (EWMA) filter), and their expected yield. The proposed framework is evaluated using traces of two service workloads. The first service is a Differentiated Search service based on an index search component from the Ask Jeeves search engine. The second service, called Micro-benchmark, is based on a CPU-spinning benchmark.
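The EWMA filter that Shen et al. [6] use to estimate expected per-request resource consumption can be sketched as follows; the class name and smoothing factor are illustrative, not taken from the paper.

```python
class EwmaEstimator:
    """Exponentially weighted moving average over observed per-request
    resource consumption, used to predict the demand of the next request
    of the same class: estimate = alpha * sample + (1 - alpha) * estimate."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha        # smoothing factor in (0, 1]
        self.estimate = None      # no prediction until the first sample

    def observe(self, sample):
        if self.estimate is None:
            self.estimate = float(sample)
        else:
            self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate
        return self.estimate
```

A larger `alpha` tracks load shifts faster; a smaller one smooths out transient spikes, which is the usual trade-off when such estimates feed a scheduler.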
Urgaonkar et al. [51] present techniques for provisioning CPU and network resources in shared hosting platforms running potentially antagonistic third-party applications. The authors first derive the resource usage profiles of services using kernel-based resource monitoring on a dedicated node running the service, and use these profiles to guide the placement of application components onto shared nodes. Then they propose techniques to overbook the cluster resources in a controlled manner such that the platform can maximize its revenue while providing probabilistic QoS guarantees to applications (resources allocated to services correspond to a high percentile of the resources needed to satisfy all requests). However, the proposed system does not describe any continuous monitoring facility or reaction mechanism (resource usages are only measured when deriving profiles). In addition, provisioning decisions are based on the application's steady state instead of considering rapidly changing demands. The proposed techniques are implemented in a Linux kernel and evaluated using an Apache web server with the SpecWeb99 benchmark, an MPEG streaming media server, a Quake game server, and a PostgreSQL database server with the pgbench benchmark.

Finally, Urgaonkar and Shenoy [52] present the design of Sharc, a system that enables resource sharing among applications running on a shared hosting platform. Sharc extends the benefits of single-node resource management mechanisms, such as reservations or shares, to clustered environments. The authors present techniques for managing CPU and network interface bandwidth on a cluster-wide basis, supporting resource reservations based on the applications' resource requests (made at their startup time), dynamic resource allocation based on applications' past resource usage, and performance isolation of applications.
This proposal is evaluated using two types of workloads: a commercial third-party hosting platform workload (an e-commerce application, a replicated Apache web server, a file download server, and a home-grown streaming media
server) and a research workgroup environment workload (a compute-intensive scientific application, an information retrieval application, a compute-intensive disk simulator, and an application build job).

5.2.3. Resource management in virtualized hosting platforms

Recently, the use of virtualization has been explored for cost reduction and easier management in shared hosting platforms. Virtualization allows the consolidation of services, multiplexing them onto physical resources while supporting their isolation from other services sharing the same physical resource, thereby reducing provider costs and maintaining the integrity of the underlying resources and the other services. Virtualization has other valuable features for hosting platforms. It offers the image of a dedicated and customized machine to each user, decoupling them from the system software of the underlying resource. In addition, virtualization allows agile and fine-grained dynamic resource provisioning by providing a mechanism for carefully controlling how and when resources are used, and for migrating a running machine from resource to resource if needed. Accordingly, several recent works [53–58] have proposed resource management solutions over virtualized platforms.

In particular, Govindan et al. [55] have developed a new communication-aware CPU scheduling algorithm and a CPU usage accounting mechanism to address the existing obstacles in the efficient operation of highly consolidated virtualization-based hosting platforms. The CPU scheduling algorithm incorporates the I/O behavior of the overlying VMs into its decision-making. The key idea is to introduce short-term unfairness in CPU allocations by preferentially scheduling communication-oriented applications over their CPU-intensive counterparts. The CPU overheads resulting from server virtualization are also incorporated into the CPU scheduling algorithm.
The evaluation is conducted in a Xen-based hosting platform using two applications: TPC-W-NYU, a three-tiered application based on the TPC-W benchmark for an online bookstore, deployed on a JBoss server, and a streaming media server written in Java.

Norris et al. [56] propose OnCall, a spike management system based on virtual machines (i.e. the VMware GSX server) for shared hosting clusters that serve dynamic content. The system relies on an economically efficient marketplace for computing resources, allowing applications to trade computing capacity on the market using automated market policies. In this way, OnCall can multiplex several (possibly competing) applications onto a single server cluster and provide enough aggregate capacity to handle temporary workload surges for a particular application while guaranteeing some capacity to each application in steady state. The proposed system is evaluated using simulation. Load is generated using the Apache JMeter testing tool.

Finally, Xu et al. [58] present a two-level autonomic resource management system that enables automatic and adaptive resource provisioning in accordance with SLAs specifying dynamic tradeoffs of service quality and cost. The authors implement local controllers at the virtual-container level and a global controller at the resource-pool level. A novelty of the controller designs is their use of fuzzy logic to characterize the relationship between application workload and resource demand. The evaluation is conducted on a data center virtualized with VMware ESX Server, using a combination of e-business applications (e.g. Java Pet Store) and traces collected from the '98 World Cup site. Java Pet Store requests are generated with the httperf tool, while the '98 World Cup workload is replayed using the Real-Time Web Log Replayer application.
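Several of the surveyed systems drive resource allocation with feedback controllers, such as the PI controller of Liu et al. [45] that adjusts a container's CPU entitlement to keep response time near a target. A minimal sketch of that control pattern follows; the gains, bounds, and update rule are illustrative assumptions, not values from any cited paper.

```python
class PIEntitlementController:
    """Proportional-integral controller that adjusts the CPU entitlement of
    a resource container to track a target mean response time. In a real
    deployment the gains would be tuned against the hosted application."""

    def __init__(self, target, kp=0.02, ki=0.005, lo=0.05, hi=1.0):
        self.target = target          # desired mean response time (seconds)
        self.kp, self.ki = kp, ki     # proportional and integral gains
        self.lo, self.hi = lo, hi     # entitlement bounds (fraction of CPU)
        self.integral = 0.0
        self.entitlement = 0.5        # initial CPU share

    def step(self, measured):
        error = measured - self.target        # positive: too slow, grow share
        self.integral += error
        self.entitlement += self.kp * error + self.ki * self.integral
        self.entitlement = min(self.hi, max(self.lo, self.entitlement))
        return self.entitlement
```

Each control interval, the platform measures the mean response time, calls `step`, and applies the returned share to the container; clamping keeps the actuation within feasible bounds.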
Table III summarizes the main characteristics (concerning this technique) of the works using dynamic resource management in their solution. This includes the mechanism that drives the resource allocation decisions, the resource provisioning model (shared: single machine (S), clustered platform (C), or virtualized platform (V); or dedicated), whether the work also uses other techniques, and its target workload.

Dynamic resource management has become an indispensable technique for providing performance guarantees to customers while fully exploiting the platform resources. With this in mind, shared platforms (and in particular those using virtualization) have consolidated as the best choice. However, to obtain better results, dynamic resource management should be combined with other techniques, in particular for dealing with situations in which it is not possible to allocate additional resources to fulfill the agreed QoS.

6. SERVICE DEGRADATION

Service degradation avoids refusing clients as a response to overload (as admission control does) by reducing the level of service offered to them under overload conditions, as shown in Figure 6. In general, service degradation can be thought of as a special case of service differentiation, where a given customer class is provided with a degraded level of service when the server is overloaded. In addition, rejecting a request through admission control can be seen as the lowest quality setting of a degraded service. The initial approach considered in previous works for implementing service degradation was content adaptation [63,64]. These works propose reducing the quality of static web content (e.g. lower resolution images) provided to clients during overload periods, thereby consuming fewer system resources. In particular, content adaptation aims to reduce the network bandwidth consumption on the server.
In particular, Abdelzaher and Bhatti [63] evaluate the potential for content adaptation on the web by considering three techniques: image degradation by lossy compression, reduction of embedded objects per page, and reduction in local links. In addition, they present a solution to the overload problem in web servers that relies on adapting the delivered content. This solution is based on an adaptation agent that monitors the web server load and triggers web content adaptation when the server response time increases beyond a pre-computed threshold. Adaptation is evaluated on an Apache web server using only HTTP 1.0 connections. Chandra et al. [64] propose using informed transcoding techniques to provide differentiated service and to dynamically allocate the available bandwidth among different client classes, while delivering a high degree of information content (quality factor) for all clients. The authors have modified the Apache web server to incorporate transcoding by serving proportionately lower quality variations of images if the average consumed bandwidth for the past half hour exceeds the target bandwidth. The server is evaluated using the http load test client. One disadvantage of content adaptation is that it is not applicable to many services due to their design. For example, an e-mail or chat service cannot practically degrade service in response to overload: 'lower-quality' e-mail and chat have no meaning. In fact, service degradation as described is not directly applicable to web services. In these cases, another of the described mechanisms must be used.
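The adaptation-agent idea common to [63,64] — monitor server load and switch to cheaper content variants when a threshold is crossed — can be sketched as follows. The class name, the quality tiers, and the utilization thresholds are illustrative assumptions, not details taken from either paper:

```python
# Minimal sketch of a content-adaptation agent in the spirit of [63,64].
# Tier names and thresholds are illustrative, not from the original works.

QUALITY_TIERS = ["full", "compressed", "text_only"]  # decreasing resource cost

class AdaptationAgent:
    def __init__(self, degrade_threshold=0.8, restore_threshold=0.5):
        # Thresholds on observed server utilization (0.0-1.0) are assumptions.
        self.degrade_threshold = degrade_threshold
        self.restore_threshold = restore_threshold
        self.tier = 0  # index into QUALITY_TIERS; 0 = full quality

    def observe(self, utilization):
        """Adjust the served content quality based on the measured load."""
        if utilization > self.degrade_threshold and self.tier < len(QUALITY_TIERS) - 1:
            self.tier += 1   # overload: serve cheaper content variants
        elif utilization < self.restore_threshold and self.tier > 0:
            self.tier -= 1   # load has subsided: restore quality
        return QUALITY_TIERS[self.tier]
```

The two separate thresholds provide hysteresis, so the agent does not oscillate between tiers when the load hovers around a single cut-off value.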
Table III. Characteristics of works including resource management in their solution.

| work | allocations driven by | model | joint? | workload |
|------|-----------------------|-------|--------|----------|
| [39] | monitoring agents issuing SLA violations whenever thresholds are exceeded | dedicated | no | — |
| [41] | control loop based on thresholds | dedicated | no | RUBiS (dynamic) on Tomcat |
| [42] | open queuing model | dedicated | no | trace simulation (based on YacSim) of large-scale e-commerce & search-engine traces (dynamic) |
| [44] | resource usage of containers | shared (S) | no | 1 kb static file & CGI script on thttpd |
| [45] | feedback control loop | shared (S) | no | 1 kb to 90 kb static files on Apache |
| [46] | linear programming formulation | shared (C) | no | static files & CGI scripts on Apache |
| [47] | open queuing model | shared (C) | no | simulation of World Cup Soccer'98 logs (static) |
| [48] | resource shares & QoS goals relation computed from runtime measurements | shared (C) | no | static SSL & CGI script on Apache |
| [49] | predictions on service performance based on runtime measurements | shared (C) | no | Surge (static) on Apache |
| [50] | applications' resource requests based on runtime measurements | shared (C) | no | traces from Duke Computer Science server on Sun's GridEngine |
| [6] | predictions from online measurements | shared (C) | yes | Ask Jeeves search (dynamic) on Neptune |
| [51] | kernel-based online profiling of the resource usage on a dedicated node | shared (C) | no | SpecWeb99 (static & dynamic) on Apache |
| [52] | applications' resource requests and past usage | shared (C) | no | commercial data center & research workgroup environment |
| [24] | multiclass open queuing model | shared (C) | yes | Internet search (WebGlimpse) & online auction (dynamic) |
| [55] | runtime measurements of I/O virtualization | shared (V) | no | TPC-W-NYU (dynamic) on JBoss & streaming media server |
| [56] | marketplace for computing resources | shared (V) | no | simulation with Apache JMeter |
| [58] | two-level controller based on fuzzy modeling | shared (V) | no | Java Pet Store (dynamic) & World Cup Soccer'98 logs (static) |
| [59] | feedback control loop | shared (C) | no | e-commerce (dynamic) |
| [60] | feedback control loop | dedicated | no | Surge (static) on Apache |
| [57] | feedback control loop | shared (V) | no | RUBiS/TPC-W (dynamic) on Apache |
| [54] | open queuing model | shared (V) | no | algorithm execution time |
| [53] | open queuing model | shared (V) | no | discrete event simulation of synthetic workload |
| [40] | multiclass queuing model | dedicated | no | simulation of transactional & batch workloads |
| [61] | closed queuing model | shared (C) | no | Surge (static) on Dash |
| [62] | multiclass open queuing model | dedicated | no | simulation of e-commerce (dynamic) |
| [43] | open queuing model | dedicated | no | RUBiS/RUBBoS (dynamic) on Apache/Tomcat |
| [34] | applications' resource requests based on runtime measurements | shared (C) | yes | RUBiS (dynamic) on Tomcat |
| [35] | multiclass open queuing model in conjunction with online measurements | dedicated | yes | 1 kb to 256 kb static files & PHP scripts on Apache |
| [15] | feedback control loop in conjunction with closed queuing model | shared (C) | yes | Trade 6 benchmark (dynamic) on IBM WebSphere |
Figure 6. Service degradation technique.

Table IV. Characteristics of works including service degradation in their solution.

| work | triggered by | actuation | joint? | workload |
|------|--------------|-----------|--------|----------|
| [63] | threshold in response time | degrade delivered content (images, embedded objects, local links) | no | static content on Apache |
| [64] | threshold in avg consumed bandwidth | deliver lower quality images | no | static content on Apache |
| [65] | threshold in resource utilization | degrade delivered content within SLA limits | yes | 64 kb static images on Apache |
| [35] | threshold in request drop rate | degrade performance within SLA limits | yes | 1–256 kb static files & PHP scripts on Apache |

For this reason, recent works [35,65] have reinterpreted the concept of service degradation. For instance, Urgaonkar and Shenoy [35] implement what they call 'QoS adaptive degradation'. This approach considers that during overload conditions (when the request drop rate exceeds a certain predefined value), the performance of admitted requests can be degraded within the limits established by the SLA. The same concept is applied in [65], but named 'QoS adaptation'. In this work, the authors use the mechanisms for content adaptation described in [63] to provide degraded service within the values indicated in the SLA when the resource utilization exceeds a predefined threshold. Other works do not directly implement service degradation mechanisms, but rather signal overload to applications in a way that allows them to degrade if possible. For example, in SEDA [26], stages can obtain the current 90th-percentile response time measurement as well as enable or disable the stage's admission control mechanism. This allows a service to implement degradation by periodically sampling the current response time and comparing it to the target.
If service degradation is ineffective because the load is too high to support even the lowest quality setting, the stage can re-enable admission control to cause requests to be rejected. Table IV summarizes the main characteristics (concerning this technique) of the works using service degradation in their solution. This includes how the service degradation mechanism is triggered, the actuation performed, whether the work also uses other techniques, and its target workload.
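The SEDA-style degradation loop described above — sample the 90th-percentile response time, degrade while possible, and fall back to admission control when even the lowest quality setting is insufficient — might be sketched as follows. The stage interface used here (`response_time_p90`, a `quality` attribute, `set_admission_control`) is hypothetical and only loosely modeled on SEDA's actual primitives:

```python
# Sketch of a SEDA-style degradation loop for one stage, after [26].
# The stage interface is a hypothetical stand-in; SEDA itself is Java.

def degradation_controller(stage, target_p90, min_quality=0.1, step=0.1):
    """One sampling period: compare the 90th-percentile response time
    to the target, then degrade, recover, or shed load accordingly."""
    p90 = stage.response_time_p90()
    if p90 > target_p90:
        if stage.quality - step >= min_quality:
            stage.quality -= step                 # degrade service first
        else:
            stage.set_admission_control(True)     # degradation exhausted: shed load
    elif p90 < 0.8 * target_p90:
        stage.quality = min(1.0, stage.quality + step)  # recover quality
        stage.set_admission_control(False)
```

The 0.8 recovery factor is an assumed hysteresis margin, so the stage does not flap between degrading and recovering when the response time sits near the target.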
Service degradation can be thought of as a complementary technique that can be used to enhance a given performance management solution. Used alone, it can help to delay overload, but it needs to be combined with other techniques to be fully effective. In addition, it is mainly focused on static web content, though it could be reinterpreted to fit also in web services environments.

7. CONTROL THEORETICAL APPROACHES

Control theory has been widely used to adjust the behavior of traditional dynamical systems. Internet systems also fit the definition of a dynamical system, and for this reason, a great number of works [15,30,31,45,57,59,60,65] have proposed the use of control theory for performance management in the context of Internet applications. Typically, these works propose the use of a closed-loop controller that adjusts the parameters of the actuation technique (generally admission control and/or resource provisioning) using feedback information from the system. A typical architecture of such a system is shown in Figure 7. Notice that in a system managed by a closed-loop controller (a.k.a. feedback controller), the output of the system y is fed back to the reference value r through a sensor measurement. The controller then takes the error e (i.e. the difference) between the reference and the output to change the inputs u to the system under control (the server) by means of an actuator. As noted, admission control and dynamic resource provisioning have typically been used as actuators in the related literature. In particular, Abdelzaher et al. [65] describe the use of classical feedback control theory for performance control of a web server, achieving in this way overload protection, performance guarantees, and service differentiation in the presence of load unpredictability.
The proposed architecture extends pure admission control schemes with the degradation of clients' QoS, which is accomplished using content adaptation. The authors design a utilization control loop that regulates the extent of degradation to satisfy a pre-specified utilization bound. The reference resource demands are derived from the served request rate and delivered byte bandwidth using a time-varying linear model of a static web server (i.e. the Apache web server). The workload for the experiments is generated using the httperf tool. Blanquer et al. [30] describe Quorum, a non-invasive software approach to QoS provisioning for large-scale Internet services, which ensures reliable QoS guarantees using traffic shaping and admission control at the entrance of the site, and response monitoring at the exit. Quorum treats the site as an unknown 'black-box' system, intercepts the request and response streams entering and leaving the site, and uses feedback-driven techniques (based on weighted fair queuing) to

Figure 7. Typical architecture of a control theoretical approach.
dynamically control the proportions in which requests of different classes must be forwarded to the server. In this system, a request is dropped when the sum of the time it has been queued and its observed compute time is higher than its maximum allowed response time. The evaluation, based on the Teoma Internet Search Engine and the TPC-W benchmark, includes a comparison with the Neptune system [6]. Diao et al. [31] describe a feedback control mechanism based on a 'black-box' model that meets given CPU and memory utilization targets by tuning Apache server parameters (the number of server processes (MaxClients) and the per-connection idle timeout (KeepAlive Timeout)). However, these parameters do not directly address metrics of interest to the web site, such as response time or throughput. In addition, reducing the number of server processes increases the likelihood of stalling incoming connections. Although this can effectively protect the server from overload, it results in poor client-perceived performance. The authors also limit their attention to static web content. This proposal is evaluated on the Apache web server using the httperf tool. The generated workload is extracted from the WebStone benchmark. Karlsson et al. [59] propose a non-intrusive approach that interposes a fair-queuing scheduler on the request path between a service and its clients to intercept incoming requests and enforce proportional sharing of the service resources among workloads. The authors have designed an adaptive optimal MIMO controller that dynamically sets the share of each workload based on the observed throughputs and response times. The experimental evaluation is conducted using two different systems: a 3-tier e-commerce system that mimics real-world user behavior and receives requests from clients running the httperf tool, and an NFS file server that is shared by multiple workloads. Lu et al.
[60] introduce a feedback control architecture for adapting resource allocation such that differentiated connection delays between different service classes are achieved in web servers under HTTP 1.1. Using the Root Locus method, the authors design a feedback controller that computes the process proportions for each class to maintain the desired connection delays (the time interval between the arrival of a TCP connection request and the time the connection is accepted by a server process). The controller assumes linear behavior of the system. A modified Apache web server is used for implementing the adaptive architecture and, for this reason, the evaluation only considers static web content (in the form of the SURGE benchmark). Finally, Padala et al. [57] address the problem of dynamically controlling resource allocations to individual components of complex, multi-tier enterprise applications in a shared hosting environment. The authors propose a two-layered controller architecture: a utilization controller that controls the resource allocation for a single application tier and an arbiter controller that controls the resource allocations across multiple application tiers and multiple application stacks sharing the same infrastructure. This controller dynamically adjusts the CPU share of each VM in order to meet application-level QoS goals while achieving high resource utilization in the data center. To accomplish this, the controller sets CPU shares that lead to an 80% utilization level for each VM. This proposal is evaluated on a virtualized platform (with the Xen hypervisor), using two two-tier applications: the RUBiS auction site benchmark and a Java implementation of TPC-W, both deployed on the Apache web server. Table V summarizes the main characteristics (concerning this technique) of the works using control theory in their solution.
This includes the reference signal of the system, the actuation performed, whether this work also uses other techniques, and its target workload.
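The generic closed loop of Figure 7 can be made concrete with a minimal integral controller that drives an admission-control actuator from the response-time error. The gains, units, and clamping bounds below are illustrative assumptions, not taken from any of the surveyed works:

```python
# Minimal sketch of the closed loop of Figure 7: sensor -> error -> actuator.
# Integral control of an admission-rate actuator; all constants are assumed.

class IntegralController:
    def __init__(self, reference, gain=0.05, u_min=0.0, u_max=1.0):
        self.r = reference        # target output y (e.g. response time in s)
        self.gain = gain          # integral gain (assumed)
        self.u = u_max            # actuator value: fraction of requests admitted
        self.u_min, self.u_max = u_min, u_max

    def step(self, y):
        """One control interval: measure y, compute the error, adjust u."""
        e = self.r - y                         # e = r - y (sensor feedback)
        self.u += self.gain * e                # integral action on the actuator
        self.u = max(self.u_min, min(self.u_max, self.u))  # actuator saturation
        return self.u
```

Each control interval the sensor reports the measured output y; when y exceeds the reference r, the error is negative and the admitted fraction of requests shrinks, relieving the server until the output converges back toward the reference.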