8 J Internet Serv Appl (2010) 1: 7–18

Easy access: Services hosted in the cloud are generally web-based. Therefore, they are easily accessible through a variety of devices with Internet connections. These devices include not only desktop and laptop computers, but also cell phones and PDAs.

Reducing business risks and maintenance expenses: By outsourcing the service infrastructure to the clouds, a service provider shifts its business risks (such as hardware failures) to infrastructure providers, who often have better expertise and are better equipped for managing these risks. In addition, a service provider can cut down the hardware maintenance and the staff training costs.

Although cloud computing has opened up considerable opportunities for the IT industry, it also brings many unique challenges that need to be carefully addressed. In this paper, we present a survey of cloud computing, highlighting its key concepts, architectural principles, state-of-the-art implementations as well as research challenges. Our aim is to provide a better understanding of the design challenges of cloud computing and to identify important research directions in this fascinating topic.

The remainder of this paper is organized as follows. In Sect. 2 we provide an overview of cloud computing and compare it with other related technologies. In Sect. 3, we describe the architecture of cloud computing and present its design principles. The key features and characteristics of cloud computing are detailed in Sect. 4. Section 5 surveys the commercial products as well as the current technologies used for cloud computing. In Sect. 6, we summarize the current research topics in cloud computing. Finally, the paper concludes in Sect. 7.

2 Overview of cloud computing

This section presents a general overview of cloud computing, including its definition and a comparison with related concepts.

2.1 Definitions

The main idea behind cloud computing is not a new one. John McCarthy in the 1960s already envisioned that computing facilities would be provided to the general public like a utility. The term “cloud” has also been used in various contexts such as describing large ATM networks in the 1990s. However, it was only after Google's CEO Eric Schmidt used the word in 2006 to describe the business model of providing services across the Internet that the term really started to gain popularity. Since then, the term cloud computing has been used mainly as a marketing term in a variety of contexts to represent many different ideas. Certainly, the lack of a standard definition of cloud computing has generated not only market hype, but also a fair amount of skepticism and confusion. For this reason, there has recently been work on standardizing the definition of cloud computing. As an example, one such study compared over 20 different definitions from a variety of sources to arrive at a standard definition. In this paper, we adopt the definition of cloud computing provided by the National Institute of Standards and Technology (NIST), as it covers, in our opinion, all the essential aspects of cloud computing:

NIST definition of cloud computing: Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

The main reason for the existence of different perceptions of cloud computing is that cloud computing, unlike other technical terms, is not a new technology, but rather a new operations model that brings together a set of existing technologies to run business in a different way. Indeed, most of the technologies used by cloud computing, such as virtualization and utility-based pricing, are not new. Instead, cloud computing leverages these existing technologies to meet the technological and economic requirements of today's demand for information technology.

2.2 Related technologies

Cloud computing is often compared to the following technologies, each of which shares certain aspects with cloud computing:

Grid Computing: Grid computing is a distributed computing paradigm that coordinates networked resources to achieve a common computational objective. The development of Grid computing was originally driven by scientific applications, which are usually computation-intensive. Cloud computing is similar to Grid computing in that it also employs distributed resources to achieve application-level objectives. However, cloud computing goes one step further by leveraging virtualization technologies at multiple levels (hardware and application platform) to realize resource sharing and dynamic resource provisioning.

Utility Computing: Utility computing represents the model of providing resources on-demand and charging customers based on usage rather than a flat rate. Cloud computing can be perceived as a realization of utility computing. It adopts a utility-based pricing scheme entirely for economic reasons. With on-demand resource provisioning and utility-based pricing, service providers can truly maximize resource utilization and minimize their operating costs.

Virtualization: Virtualization is a technology that abstracts away the details of physical hardware and provides
virtualized resources for high-level applications. A virtualized server is commonly called a virtual machine (VM). Virtualization forms the foundation of cloud computing, as it provides the capability of pooling computing resources from clusters of servers and dynamically assigning or reassigning virtual resources to applications on-demand.

Autonomic Computing: Originally coined by IBM in 2001, autonomic computing aims at building computing systems capable of self-management, i.e., reacting to internal and external observations without human intervention. The goal of autonomic computing is to overcome the management complexity of today's computer systems. Although cloud computing exhibits certain autonomic features such as automatic resource provisioning, its objective is to lower the resource cost rather than to reduce system complexity.

In summary, cloud computing leverages virtualization technology to achieve the goal of providing computing resources as a utility. It shares certain aspects with grid computing and autonomic computing but differs from them in other aspects. Therefore, it offers unique benefits and imposes distinctive challenges to meet its requirements.

3 Cloud computing architecture

This section describes the architectural, business and various operation models of cloud computing.

3.1 A layered model of cloud computing

Generally speaking, the architecture of a cloud computing environment can be divided into four layers: the hardware/datacenter layer, the infrastructure layer, the platform layer and the application layer, as shown in Fig. 1. We describe each of them in detail:

The hardware layer: This layer is responsible for managing the physical resources of the cloud, including physical servers, routers, switches, power and cooling systems. In practice, the hardware layer is typically implemented in data centers. A data center usually contains thousands of servers that are organized in racks and interconnected through switches, routers or other fabrics. Typical issues at the hardware layer include hardware configuration, fault tolerance, traffic management, and power and cooling resource management.

The infrastructure layer: Also known as the virtualization layer, the infrastructure layer creates a pool of storage and computing resources by partitioning the physical resources using virtualization technologies such as Xen, KVM and VMware. The infrastructure layer is an essential component of cloud computing, since many key features, such as dynamic resource assignment, are only made available through virtualization technologies.

The platform layer: Built on top of the infrastructure layer, the platform layer consists of operating systems and application frameworks. The purpose of the platform layer is to minimize the burden of deploying applications directly into VM containers. For example, Google App Engine operates at the platform layer to provide API support for implementing storage, database and business logic of typical web applications.

The application layer: At the highest level of the hierarchy, the application layer consists of the actual cloud applications. Unlike traditional applications, cloud applications can leverage the automatic-scaling feature to achieve better performance, availability and lower operating cost.

Fig. 1 Cloud computing architecture

Compared to traditional service hosting environments such as dedicated server farms, the architecture of cloud computing is more modular. Each layer is loosely coupled with the layers above and below, allowing each layer to evolve separately. This is similar to the design of the OSI
model for network protocols. The architectural modularity allows cloud computing to support a wide range of application requirements while reducing management and maintenance overhead.

3.2 Business model

Cloud computing employs a service-driven business model. In other words, hardware and platform-level resources are provided as services on an on-demand basis. Conceptually, every layer of the architecture described in the previous section can be implemented as a service to the layer above. Conversely, every layer can be perceived as a customer of the layer below. However, in practice, clouds offer services that can be grouped into three categories: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS).

1. Infrastructure as a Service: IaaS refers to on-demand provisioning of infrastructural resources, usually in terms of VMs. The cloud owner who offers IaaS is called an IaaS provider. Examples of IaaS providers include Amazon EC2, GoGrid and Flexiscale.
2. Platform as a Service: PaaS refers to providing platform layer resources, including operating system support and software development frameworks. Examples of PaaS providers include Google App Engine, Microsoft Windows Azure and Force.com.
3. Software as a Service: SaaS refers to providing on-demand applications over the Internet. Examples of SaaS providers include Salesforce.com, Rackspace and SAP Business ByDesign.

The business model of cloud computing is depicted in Fig. 2. According to the layered architecture of cloud computing, it is entirely possible that a PaaS provider runs its cloud on top of an IaaS provider's cloud. However, in current practice, IaaS and PaaS providers are often parts of the same organization (e.g., Google and Salesforce). This is why PaaS and IaaS providers are often called the infrastructure providers or cloud providers.

Fig. 2 Business model of cloud computing

3.3 Types of clouds

There are many issues to consider when moving an enterprise application to the cloud environment. For example, some service providers are mostly interested in lowering operational cost, while others may prefer high reliability and security. Accordingly, there are different types of clouds, each with its own benefits and drawbacks:

Public clouds: A cloud in which service providers offer their resources as services to the general public. Public clouds offer several key benefits to service providers, including no initial capital investment in infrastructure and the shifting of risks to infrastructure providers. However, public clouds lack fine-grained control over data, network and security settings, which hampers their effectiveness in many business scenarios.

Private clouds: Also known as internal clouds, private clouds are designed for exclusive use by a single organization. A private cloud may be built and managed by the organization or by external providers. A private cloud offers the highest degree of control over performance, reliability and security. However, private clouds are often criticized for being similar to traditional proprietary server farms and for not providing benefits such as the absence of up-front capital costs.

Hybrid clouds: A hybrid cloud is a combination of public and private cloud models that tries to address the limitations of each approach. In a hybrid cloud, part of the service infrastructure runs in private clouds while the remaining part runs in public clouds. Hybrid clouds offer more flexibility than both public and private clouds. Specifically, they provide tighter control and security over application data compared to public clouds, while still facilitating on-demand service expansion and contraction. On the downside, designing a hybrid cloud requires carefully determining the best split between public and private cloud components.

Virtual Private Cloud: An alternative solution to address the limitations of both public and private clouds is called Virtual Private Cloud (VPC). A VPC is essentially a platform running on top of public clouds. The main difference is that a VPC leverages virtual private network (VPN) technology that allows service providers to design their own topology and security settings such as firewall rules. VPC is a more holistic design since it virtualizes not only servers and applications, but also the underlying communication network. Additionally, for most companies, VPC provides a seamless transition from a proprietary service infrastructure to a cloud-based infrastructure, owing to the virtualized network layer.

For most service providers, selecting the right cloud model depends on the business scenario. For example, computation-intensive scientific applications are best deployed on public clouds for cost-effectiveness. Arguably, certain types of clouds will be more popular than others.
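The "best split between public and private cloud components" that a hybrid cloud requires can be made concrete with a small cost sketch. The example below assumes a simple cloud-bursting policy, which is our illustrative choice and not a policy prescribed by any provider: each hour's demand is served by private capacity first, and only the overflow is rented from a public cloud on a pay-per-use basis. The function names, the hourly demand figures and the per-instance-hour price are all hypothetical.

```python
# Hypothetical "cloud bursting" policy for a hybrid cloud: demand is served
# by private capacity first and only the overflow is rented from a public
# cloud on a pay-per-use basis. All names, figures and prices are made up.
from dataclasses import dataclass


@dataclass
class HourlyPlan:
    private_instances: int  # instances run on the organization's own cloud
    public_instances: int   # instances rented from a public cloud


def plan_hybrid(demand, private_capacity):
    """Split each hour's demand (in instances) between private and public clouds."""
    plans = []
    for needed in demand:
        private = min(needed, private_capacity)  # fill private capacity first
        plans.append(HourlyPlan(private, needed - private))  # overflow goes public
    return plans


def public_cost(plans, price_per_instance_hour):
    """Pay-per-use bill for the public share only (private hardware is already owned)."""
    return sum(p.public_instances for p in plans) * price_per_instance_hour


if __name__ == "__main__":
    hourly_demand = [4, 6, 12, 20, 9]  # instances needed in each of five hours
    plans = plan_hybrid(hourly_demand, private_capacity=10)
    for hour, p in enumerate(plans):
        print(f"hour {hour}: private={p.private_instances} public={p.public_instances}")
    print(f"public cost: {public_cost(plans, price_per_instance_hour=0.10):.2f}")
```

Varying `private_capacity` in this sketch exposes the trade-off described above: a larger private share keeps more data under the organization's control but leaves owned hardware idle in off-peak hours, while a smaller one moves more of the workload, and of the bill, to the public cloud.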
It has been predicted that hybrid clouds will be the dominant type for most organizations. However, virtual private clouds have started to gain more popularity since their inception in 2009.

4 Cloud computing characteristics

Cloud computing provides several salient features that are different from traditional service computing, which we summarize below:

Multi-tenancy: In a cloud environment, services owned by multiple providers are co-located in a single data center. The performance and management issues of these services are shared among service providers and the infrastructure provider. The layered architecture of cloud computing provides a natural division of responsibilities: the owner of each layer only needs to focus on the specific objectives associated with this layer. However, multi-tenancy also introduces difficulties in understanding and managing the interactions among various stakeholders.

Shared resource pooling: The infrastructure provider offers a pool of computing resources that can be dynamically assigned to multiple resource consumers. Such dynamic resource assignment capability provides much flexibility to infrastructure providers for managing their own resource usage and operating costs. For instance, an IaaS provider can leverage VM migration technology to attain a high degree of server consolidation, hence maximizing resource utilization while minimizing costs such as power consumption and cooling.

Geo-distribution and ubiquitous network access: Clouds are generally accessible through the Internet and use the Internet as a service delivery network. Hence any device with Internet connectivity, be it a mobile phone, a PDA or a laptop, is able to access cloud services. Additionally, to achieve high network performance and localization, many of today's clouds consist of data centers located around the globe. A service provider can easily leverage geo-diversity to achieve maximum service utility.

Service oriented: As mentioned previously, cloud computing adopts a service-driven operating model. Hence it places a strong emphasis on service management. In a cloud, each IaaS, PaaS and SaaS provider offers its service according to the Service Level Agreement (SLA) negotiated with its customers. SLA assurance is therefore a critical objective of every provider.

Dynamic resource provisioning: One of the key features of cloud computing is that computing resources can be obtained and released on the fly. Compared to the traditional model that provisions resources according to peak demand, dynamic resource provisioning allows service providers to acquire resources based on the current demand, which can considerably lower the operating cost.

Self-organizing: Since resources can be allocated or de-allocated on-demand, service providers are empowered to manage their resource consumption according to their own needs. Furthermore, the automated resource management feature yields high agility that enables service providers to respond quickly to rapid changes in service demand such as the flash crowd effect.

Utility-based pricing: Cloud computing employs a pay-per-use pricing model. The exact pricing scheme may vary from service to service. For example, a SaaS provider may rent a virtual machine from an IaaS provider on a per-hour basis. On the other hand, a SaaS provider that provides on-demand customer relationship management (CRM) may charge its customers based on the number of clients it serves (e.g., Salesforce). Utility-based pricing lowers service operating cost as it charges customers on a per-use basis. However, it also introduces complexities in controlling the operating cost. From this perspective, companies like VKernel provide software to help cloud customers understand, analyze and cut down unnecessary costs on resource consumption.

5 State-of-the-art

In this section, we present the state-of-the-art implementations of cloud computing. We first describe the key technologies currently used for cloud computing. Then, we survey the popular cloud computing products.

5.1 Cloud computing technologies

This section provides a review of technologies used in cloud computing environments.

5.1.1 Architectural design of data centers

A data center, which is home to the computation power and storage, is central to cloud computing and contains thousands of devices like servers, switches and routers. Proper planning of this network architecture is critical, as it will heavily influence application performance and throughput in such a distributed computing environment. Further, scalability and resiliency features need to be carefully considered.

Currently, a layered approach is the basic foundation of the network architecture design, which has been tested in some of the largest deployed data centers. The basic layers of a data center consist of the core, aggregation, and access layers, as shown in Fig. 3. The access layer is where the servers in racks physically connect to the network. There are typically 20 to 40 servers per rack, each connected to an access switch with a 1 Gbps link. Access switches usually connect to two aggregation switches for redundancy with
10 Gbps links. The aggregation layer usually provides important functions, such as domain service, location service, server load balancing, and more. The core layer provides connectivity to multiple aggregation switches and provides a resilient routed fabric with no single point of failure. The core routers manage traffic into and out of the data center.

Fig. 3 Basic layered design of data center network infrastructure

A popular practice is to leverage commodity Ethernet switches and routers to build the network infrastructure. In different business solutions, the layered network infrastructure can be elaborated to meet specific business challenges. Basically, the design of a data center network architecture should meet the following objectives [1, 21–23, 35]:

Uniform high capacity: The maximum rate of a server-to-server traffic flow should be limited only by the available capacity on the network-interface cards of the sending and receiving servers, and assigning servers to a service should be independent of the network topology. It should be possible for an arbitrary host in the data center to communicate with any other host in the network at the full bandwidth of its local network interface.

Free VM migration: Virtualization allows the entire VM state to be transmitted across the network to migrate a VM from one physical machine to another. A cloud computing hosting service may migrate VMs for statistical multiplexing or dynamically changing communication patterns to achieve high bandwidth for tightly coupled hosts or to achieve variable heat distribution and power availability in the data center. The communication topology should be designed so as to support rapid virtual machine migration.

Resiliency: Failures will be common at scale. The network infrastructure must be fault-tolerant against various types of server failures, link outages, or server-rack failures. Existing unicast and multicast communications should not be affected to the extent allowed by the underlying physical connectivity.

Scalability: The network infrastructure must be able to scale to a large number of servers and allow for incremental expansion.

Backward compatibility: The network infrastructure should be backward compatible with switches and routers running Ethernet and IP. Because existing data centers have commonly leveraged commodity Ethernet and IP based devices, they should also be used in the new architecture without major modifications.

Another area of rapid innovation in the industry is the design and deployment of shipping-container based, modular data centers (MDCs). In an MDC, normally up to a few thousand servers are interconnected via switches to form the network infrastructure. Highly interactive applications, which are sensitive to response time, are suitable for geo-diverse MDCs placed close to major population areas. MDCs also help with redundancy because not all areas are likely to lose power, experience an earthquake, or suffer riots at the same time. Rather than the three-layered approach discussed above, Guo et al. [22, 23] proposed server-centric, recursively defined network structures for MDCs.

5.1.2 Distributed file system over clouds

Google File System (GFS) is a proprietary distributed file system developed by Google and specially designed to provide efficient, reliable access to data using large clusters of commodity servers. Files are divided into chunks of 64 megabytes and are usually appended to or read; they are only extremely rarely overwritten or shrunk. Compared with traditional file systems, GFS is designed and optimized to run on data centers to provide extremely high data throughputs and low latency, and to survive individual server failures.

Inspired by GFS, the open source Hadoop Distributed File System (HDFS) stores large files across multiple machines. It achieves reliability by replicating the data across multiple servers. Similar to GFS, data is stored on multiple geo-diverse nodes. The file system is built from a cluster of data nodes, each of which serves blocks of data over the network using a block protocol specific to HDFS. Data is also provided over HTTP, allowing access to all content from a web browser or other types of clients. Data nodes can talk to each other to rebalance data distribution, to move copies around, and to keep the replication of data high.

5.1.3 Distributed application framework over clouds

HTTP-based applications usually conform to some web application framework such as Java EE. In modern data center environments, clusters of servers are also used for computation- and data-intensive jobs such as financial trend analysis or film animation.

MapReduce is a software framework introduced by Google to support distributed computing on large data sets
on clusters of computers. MapReduce consists of one Master, to which client applications submit MapReduce jobs. The Master pushes work out to available task nodes in the data center, striving to keep the tasks as close to the data as possible. The Master knows which node contains the data, and which other hosts are nearby. If the task cannot be hosted on the node where the data is stored, priority is given to nodes in the same rack. In this way, network traffic on the main backbone is reduced, which also helps to improve throughput, as the backbone is usually the bottleneck. If a task fails or times out, it is rescheduled. If the Master fails, all ongoing tasks are lost. The Master records what it is up to in the filesystem. When it starts up, it looks for any such data, so that it can restart work from where it left off.

The open source Hadoop MapReduce project is inspired by Google's work. Currently, many organizations are using Hadoop MapReduce to run large data-intensive computations.

5.2 Commercial products

In this section, we provide a survey of some of the dominant cloud computing products.

5.2.1 Amazon EC2

Amazon Web Services (AWS) is a set of cloud services, providing cloud-based computation, storage and other functionality that enable organizations and individuals to deploy applications and services on an on-demand basis and at commodity prices. Amazon Web Services' offerings are accessible over HTTP, using REST and SOAP protocols.

Amazon Elastic Compute Cloud (Amazon EC2) enables cloud users to launch and manage server instances in data centers using APIs or available tools and utilities. EC2 instances are virtual machines running on top of the Xen virtualization engine. After creating and starting an instance, users can upload software and make changes to it. When changes are finished, they can be bundled as a new machine image. An identical copy can then be launched at any time. Users have nearly full control of the entire software stack on the EC2 instances, which look like hardware to them. On the other hand, this feature makes it inherently difficult for Amazon to offer automatic scaling of resources.

EC2 provides the ability to place instances in multiple locations. EC2 locations are composed of Regions and Availability Zones. Regions consist of one or more Availability Zones and are geographically dispersed. Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low-latency network connectivity to other Availability Zones in the same Region.

EC2 machine images are stored in and retrieved from Amazon Simple Storage Service (Amazon S3). S3 stores data as “objects” that are grouped in “buckets”. Each object contains from 1 byte to 5 gigabytes of data. Object names are essentially URI pathnames. Buckets must be explicitly created before they can be used. A bucket can be stored in one of several Regions. Users can choose a Region to optimize latency, minimize costs, or address regulatory requirements.

Amazon Virtual Private Cloud (VPC) is a secure and seamless bridge between a company's existing IT infrastructure and the AWS cloud. Amazon VPC enables enterprises to connect their existing infrastructure to a set of isolated AWS compute resources via a Virtual Private Network (VPN) connection, and to extend their existing management capabilities such as security services, firewalls, and intrusion detection systems to include their AWS resources.

For cloud users, Amazon CloudWatch is a useful management tool which collects raw data from partnered AWS services such as Amazon EC2 and then processes the information into readable, near real-time metrics. The metrics about EC2 include, for example, CPU utilization, network in/out bytes, disk read/write operations, etc.

5.2.2 Microsoft Windows Azure platform

Microsoft's Windows Azure platform consists of three components, each of which provides a specific set of services to cloud users. Windows Azure provides a Windows-based environment for running applications and storing data on servers in data centers; SQL Azure provides data services in the cloud based on SQL Server; and .NET Services offer distributed infrastructure services to cloud-based and local applications. The Windows Azure platform can be used both by applications running in the cloud and by applications running on local systems.

Windows Azure also supports applications built on the .NET Framework and written in other languages supported on Windows systems, such as C#, Visual Basic, C++, and others. Windows Azure supports general-purpose programs, rather than a single class of computing. Developers can create web applications using technologies such as ASP.NET and Windows Communication Foundation (WCF), applications that run as independent background processes, or applications that combine the two. Windows Azure allows storing data in blobs, tables, and queues, all accessed in a RESTful style via HTTP or HTTPS.

SQL Azure components are SQL Azure Database and “Huron” Data Sync. SQL Azure Database is built on Microsoft SQL Server, providing a database management system (DBMS) in the cloud. The data can be accessed using ADO.NET and other Windows data access interfaces. Users can also use on-premises software to work with this cloud-based information. “Huron” Data Sync synchronizes relational data across various on-premises DBMSs.
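Windows Azure applications that "combine the two" styles described above, a web front end and an independent background process, need some way to hand work from one to the other, and a queue is a natural fit given that queues are part of the platform's storage. The sketch below illustrates that decoupling pattern under stated assumptions: it uses only Python's standard library, with `queue.Queue` standing in for a cloud queue and a plain dict standing in for table storage. It does not use any Windows Azure API, and `handle_web_request` and `background_worker` are hypothetical names.

```python
# Hypothetical web front end + background worker decoupled by a queue.
# queue.Queue stands in for a cloud message queue and a dict stands in
# for table storage; no Windows Azure API is used.
import queue
import threading

work_queue = queue.Queue()
results_table = {}


def handle_web_request(request_id, text):
    """Front end: enqueue the slow work and return to the caller immediately."""
    work_queue.put((request_id, text))
    return f"accepted {request_id}"


def background_worker():
    """Background process: drain the queue and write results to the 'table'."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel value: stop the worker
            work_queue.task_done()
            break
        request_id, text = item
        results_table[request_id] = text.upper()  # placeholder for real processing
        work_queue.task_done()


if __name__ == "__main__":
    worker = threading.Thread(target=background_worker)
    worker.start()
    for i, msg in enumerate(["hello", "cloud"]):
        print(handle_web_request(f"req-{i}", msg))
    work_queue.put(None)  # queued after the real work, so it is processed last
    worker.join()
    print(results_table)
```

Because the front end only enqueues work, it can respond immediately and the two parts can be scaled independently; in a real deployment the in-process queue and dict would be replaced by durable, network-accessible storage.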
The .NET Services facilitate the creation of distributed applications. The Access Control component provides a cloud-based implementation of single identity verification across applications and companies. The Service Bus helps an application expose web services endpoints that can be accessed by other applications, whether on-premises or in the cloud. Each exposed endpoint is assigned a URI, which clients can use to locate and access a service.

All of the physical resources, VMs and applications in the data center are monitored by software called the fabric controller. With each application, the users upload a configuration file that provides an XML-based description of what the application needs. Based on this file, the fabric controller decides where new applications should run, choosing physical servers to optimize hardware utilization.

5.2.3 Google App Engine

Google App Engine is a platform for traditional web applications in Google-managed data centers. Currently, the supported programming languages are Python and Java. Web frameworks that run on the Google App Engine include Django, CherryPy, Pylons, and web2py, as well as a custom Google-written web application framework similar to JSP or ASP.NET. Google handles deploying code to a cluster, monitoring, failover, and launching application instances as necessary. Current APIs support features such as storing and retrieving data from a BigTable non-relational database, making HTTP requests and caching. Developers have read-only access to the filesystem on App Engine.

Table 1 summarizes the three examples of popular cloud offerings in terms of the classes of utility computing, target types of application, and more importantly their models of computation, storage and auto-scaling. Clearly, these cloud offerings are based on different levels of abstraction and management of the resources. Users can choose one type or combinations of several types of cloud offerings to satisfy specific business requirements.

Table 1 A comparison of representative commercial products

| Cloud Provider | Amazon EC2 | Windows Azure | Google App Engine |
|---|---|---|---|
| Classes of Utility Computing | Infrastructure service | Platform service | Platform service |
| Target Applications | General-purpose applications | General-purpose Windows applications | Traditional web applications with supported framework |
| Computation | OS Level on a Xen Virtual Machine | Microsoft Common Language Runtime (CLR) VM; Predefined roles of app. instances | Predefined web application frameworks |
| Storage | Elastic Block Store; Amazon Simple Storage Service (S3); Amazon SimpleDB | Azure storage service and SQL Data Services | BigTable and MegaStore |
| Auto Scaling | Automatically changing the number of instances based on parameters that users specify | Automatic scaling based on application roles and a configuration file specified by users | Automatic Scaling which is transparent to users |

6 Research challenges

Although cloud computing has been widely adopted by the industry, the research on cloud computing is still at an early stage. Many existing issues have not been fully addressed, while new challenges keep emerging from industry applications. In this section, we summarize some of the challenging research issues in cloud computing.

6.1 Automated service provisioning

One of the key features of cloud computing is the capability of acquiring and releasing resources on-demand. The objective of a service provider in this case is to allocate and de-allocate resources from the cloud to satisfy its service level objectives (SLOs), while minimizing its operational cost. However, it is not obvious how a service provider can achieve this objective. In particular, it is not easy to determine how to map SLOs such as QoS requirements to low-level resource requirements such as CPU and memory requirements. Furthermore, to achieve high agility and respond to rapid demand fluctuations such as in the flash crowd effect, the resource provisioning decisions must be made online.

Automated service provisioning is not a new problem. Dynamic resource provisioning for Internet applications has been studied extensively in the past [47, 57]. These approaches typically involve: (1) Constructing an application performance model that predicts the number of application instances required to handle demand at each particular level,
in order to satisfy QoS requirements; (2) Periodically predicting future demand and determining resource requirements using the performance model; and (3) Automatically allocating resources using the predicted resource requirements. An application performance model can be constructed using various techniques, including Queuing theory, Control theory and Statistical Machine Learning.

Additionally, there is a distinction between proactive and reactive resource control. The proactive approach uses predicted demand to periodically allocate resources before they are needed. The reactive approach responds to immediate demand fluctuations before a periodic demand prediction is available. Both approaches are important and necessary for effective resource control in dynamic operating environments.

6.2 Virtual machine migration

Virtualization can provide significant benefits in cloud computing by enabling virtual machine migration to balance load across the data center. In addition, virtual machine migration enables robust and highly responsive provisioning in data centers.

Virtual machine migration has evolved from process migration techniques. More recently, Xen and VMWare have implemented "live" migration of VMs with extremely short downtimes, ranging from tens of milliseconds to a second. Clark et al. pointed out that migrating an entire OS and all of its applications as one unit avoids many of the difficulties faced by process-level migration approaches, and they analyzed the benefits of live migration of VMs.

The major benefit of VM migration is the ability to avoid hotspots; however, this is not straightforward. Currently, detecting workload hotspots and initiating a migration lacks the agility to respond to sudden workload changes. Moreover, the in-memory state should be transferred consistently and efficiently, with integrated consideration of resources for applications and physical servers.

6.3 Server consolidation

Server consolidation is an effective approach to maximize resource utilization while minimizing energy consumption in a cloud computing environment. Live VM migration technology is often used to consolidate VMs residing on multiple under-utilized servers onto a single server, so that the remaining servers can be set to an energy-saving state. The problem of optimally consolidating servers in a data center is often formulated as a variant of the vector bin-packing problem, which is an NP-hard optimization problem. Various heuristics have been proposed for this problem [33, 46]. Additionally, dependencies among VMs, such as communication requirements, have recently been considered as well.

However, server consolidation activities should not hurt application performance. It is known that the resource usage (also known as the footprint) of individual VMs may vary over time. For server resources that are shared among VMs, such as bandwidth, memory cache and disk I/O, maximally consolidating a server may result in resource congestion when a VM changes its footprint on the server. Hence, it is sometimes important to observe the fluctuations of VM footprints and use this information for effective server consolidation. Finally, the system must react quickly to resource congestion when it occurs.

6.4 Energy management

Improving energy efficiency is another major issue in cloud computing. It has been estimated that the cost of powering and cooling accounts for 53% of the total operational expenditure of data centers. In 2006, data centers in the US consumed more than 1.5% of the total energy generated in that year, and the percentage is projected to grow 18% annually. Hence infrastructure providers are under enormous pressure to reduce energy consumption. The goal is not only to cut down energy costs in data centers, but also to meet government regulations and environmental standards.

Designing energy-efficient data centers has recently received considerable attention. This problem can be approached from several directions. For example, energy-efficient hardware architecture that enables slowing down CPU speeds and turning off partial hardware components has become commonplace. Energy-aware job scheduling and server consolidation are two other ways to reduce power consumption by turning off unused machines. Recent research has also begun to study energy-efficient network protocols and infrastructures. A key challenge in all the above methods is to achieve a good trade-off between energy savings and application performance. In this respect, a few researchers have recently started to investigate coordinated solutions for performance and power management in a dynamic cloud environment.

6.5 Traffic management and analysis

Analysis of data traffic is important for today's data centers. For example, many web applications rely on analysis of traffic data to optimize customer experiences. Network operators also need to know how traffic flows through the network in order to make many of their management and planning decisions.

However, there are several challenges in extending existing traffic measurement and analysis methods from Internet Service Provider (ISP) and enterprise networks to data
centers. Firstly, the density of links is much higher than in ISP or enterprise networks, which represents the worst-case scenario for existing methods. Secondly, most existing methods can compute traffic matrices among a few hundred end hosts, but even a modular data center can have several thousand servers. Finally, existing methods usually assume flow patterns that are reasonable in Internet and enterprise networks, but the applications deployed in data centers, such as MapReduce jobs, significantly change the traffic pattern. Further, applications couple their use of network, computing, and storage resources more tightly than in other settings.

Currently, there is not much work on measurement and analysis of data center traffic. Greenberg et al. report data center traffic characteristics on flow sizes and concurrent flows, and use these to guide network infrastructure design. Benson et al. perform a complementary study of traffic at the edges of a data center by examining SNMP traces from routers.

6.6 Data security

Data security is another important research topic in cloud computing. Since service providers typically do not have access to the physical security system of data centers, they must rely on the infrastructure provider to achieve full data security. Even for a virtual private cloud, the service provider can only specify the security settings remotely, without knowing whether they are fully implemented. The infrastructure provider, in this context, must achieve the following objectives: (1) confidentiality, for secure data access and transfer, and (2) auditability, for attesting whether the security settings of applications have been tampered with. Confidentiality is usually achieved using cryptographic protocols, whereas auditability can be achieved using remote attestation techniques. Remote attestation typically requires a trusted platform module (TPM) to generate a non-forgeable system summary (i.e. the system state encrypted using the TPM's private key) as proof of system security. However, in a virtualized environment like the cloud, VMs can dynamically migrate from one location to another, so directly using remote attestation is not sufficient. In this case, it is critical to build trust mechanisms at every architectural layer of the cloud. Firstly, the hardware layer must be trusted using a hardware TPM. Secondly, the virtualization platform must be trusted using secure virtual machine monitors. VM migration should only be allowed if both the source and destination servers are trusted. Recent work has been devoted to designing efficient protocols for trust establishment and management [31, 43].

6.7 Software frameworks

Cloud computing provides a compelling platform for hosting large-scale data-intensive applications. Typically, these applications leverage MapReduce frameworks such as Hadoop for scalable and fault-tolerant data processing. Recent work has shown that the performance and resource consumption of a MapReduce job are highly dependent on the type of application [29, 42, 56]. For instance, Hadoop tasks such as sort are I/O intensive, whereas grep requires significant CPU resources. Furthermore, the VMs allocated to Hadoop nodes may have heterogeneous characteristics. For example, the bandwidth available to a VM depends on the other VMs collocated on the same server. Hence, it is possible to optimize the performance and cost of a MapReduce application by carefully selecting its configuration parameter values and designing more efficient scheduling algorithms [42, 56]. By mitigating bottleneck resources, the execution time of applications can be significantly improved. The key challenges include performance modeling of Hadoop jobs (either online or offline) and adaptive scheduling under dynamic conditions.

Another related approach argues for making MapReduce frameworks energy-aware. The essential idea of this approach is to put a Hadoop node into sleep mode when it has finished its job and is waiting for new assignments. To do so, both Hadoop and HDFS must be made energy-aware. Furthermore, there is often a trade-off between performance and energy-awareness. Finding a desirable trade-off point for a given objective is still an unexplored research topic.

6.8 Storage technologies and data management

Software frameworks such as MapReduce and its various implementations, such as Hadoop and Dryad, are designed for distributed processing of data-intensive tasks. As mentioned previously, these frameworks typically operate on Internet-scale file systems such as GFS and HDFS. These file systems differ from traditional distributed file systems in their storage structure, access pattern and application programming interface. In particular, they do not implement the standard POSIX interface, and therefore introduce compatibility issues with legacy file systems and applications. Several research efforts have studied this problem [4, 40]. For instance, one such effort proposed a method for supporting the MapReduce framework using cluster file systems such as IBM's GPFS. Patil et al. proposed new API primitives for scalable and concurrent data access.

6.9 Novel cloud architectures

Currently, most commercial clouds are implemented in large data centers and operated in a centralized fashion. Although this design achieves economy of scale and high manageability, it also comes with limitations such as high energy expense and high initial investment for constructing data centers. Recent work [12, 48] suggests that small-size data centers can be more advantageous than big data
centers in many cases: a small data center does not consume as much power, and hence does not require a powerful yet expensive cooling system; small data centers are also cheaper to build and better geographically distributed than large data centers. Geo-diversity is often desirable for response-time-critical services such as content delivery and interactive gaming. For example, Valancius et al. studied the feasibility of hosting video-streaming services using application gateways (a.k.a. nano-data centers).

Another related research trend is the use of voluntary resources (i.e. resources donated by end-users) for hosting cloud applications. Clouds built using voluntary resources, or a mixture of voluntary and dedicated resources, are much cheaper to operate and more suitable for non-profit applications such as scientific computing. However, this architecture also imposes challenges such as managing heterogeneous resources and frequent churn events. Also, devising incentive schemes for such architectures is an open research problem.

7 Conclusion

Cloud computing has recently emerged as a compelling paradigm for managing and delivering services over the Internet. The rise of cloud computing is rapidly changing the landscape of information technology, and ultimately turning the long-held promise of utility computing into a reality.

However, despite the significant benefits offered by cloud computing, the current technologies are not mature enough to realize its full potential. Many key challenges in this domain, including automatic resource provisioning, power management and security management, are only starting to receive attention from the research community. Therefore, we believe there is still tremendous opportunity for researchers to make groundbreaking contributions in this field and to have a significant impact on its development in the industry.

In this paper, we have surveyed the state of the art of cloud computing, covering its essential concepts, architectural designs, prominent characteristics, key technologies and research directions. As the development of cloud computing technology is still at an early stage, we hope our work will provide a better understanding of the design challenges of cloud computing and pave the way for further research in this area.

References

1. Al-Fares M et al (2008) A scalable, commodity data center network architecture. In: Proc of SIGCOMM
2. Amazon Elastic Computing Cloud, aws.amazon.com/ec2
3. Amazon Web Services, aws.amazon.com
4. Ananthanarayanan R, Gupta K et al (2009) Cloud analytics: do we really need to reinvent the storage stack? In: Proc of HotCloud
5. Armbrust M et al (2009) Above the clouds: a Berkeley view of cloud computing. UC Berkeley Technical Report
6. Berners-Lee T, Fielding R, Masinter L (2005) RFC 3986: uniform resource identifier (URI): generic syntax, January 2005
7. Bodik P et al (2009) Statistical machine learning makes automatic control practical for Internet datacenters. In: Proc of HotCloud
8. Brooks D et al (2000) Power-aware microarchitecture: design and modeling challenges for the next-generation microprocessors. IEEE Micro
9. Chandra A et al (2009) Nebulas: using distributed voluntary resources to build clouds. In: Proc of HotCloud
10. Chang F, Dean J et al (2006) Bigtable: a distributed storage system for structured data. In: Proc of OSDI
11. Chekuri C, Khanna S (2004) On multi-dimensional packing problems. SIAM J Comput 33(4):837–851
12. Church K et al (2008) On delivering embarrassingly distributed cloud services. In: Proc of HotNets
13. Clark C, Fraser K, Hand S, Hansen JG, Jul E, Limpach C, Pratt I, Warfield A (2005) Live migration of virtual machines. In: Proc of NSDI
14. Cloud Computing on Wikipedia, en.wikipedia.org/wiki/Cloudcomputing, 20 Dec 2009
15. Cloud Hosting, Cloud Computing and Hybrid Infrastructure from GoGrid, http://www.gogrid.com
16. Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: Proc of OSDI
17. Dedicated Server, Managed Hosting, Web Hosting by Rackspace Hosting, http://www.rackspace.com
18. FlexiScale Cloud Comp and Hosting, www.flexiscale.com
19. Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: Proc of SOSP, October 2003
20. Google App Engine, URL http://code.google.com/appengine
21. Greenberg A, Jain N et al (2009) VL2: a scalable and flexible data center network. In: Proc of SIGCOMM
22. Guo C et al (2008) DCell: a scalable and fault-tolerant network structure for data centers. In: Proc of SIGCOMM
23. Guo C, Lu G, Li D et al (2009) BCube: a high performance, server-centric network architecture for modular data centers. In: Proc of SIGCOMM
24. Hadoop Distributed File System, hadoop.apache.org/hdfs
25. Hadoop MapReduce, hadoop.apache.org/mapreduce
26. Hamilton J (2009) Cooperative expendable micro-slice servers (CEMS): low cost, low power servers for Internet-scale services. In: Proc of CIDR
27. IEEE P802.3az Energy Efficient Ethernet Task Force, www.ieee802.org/3/az
28. Kalyvianaki E et al (2009) Self-adaptive and self-configured CPU resource provisioning for virtualized servers using Kalman filters. In: Proc of international conference on autonomic computing
29. Kambatla K et al (2009) Towards optimizing Hadoop provisioning in the cloud. In: Proc of HotCloud
30. Kernel Based Virtual Machine, www.linux-kvm.org/page/MainPage
31. Krautheim FJ (2009) Private virtual infrastructure for cloud computing. In: Proc of HotCloud
32. Kumar S et al (2009) vManage: loosely coupled platform and virtualization management in data centers. In: Proc of international conference on cloud computing
33. Li B et al (2009) EnaCloud: an energy-saving application live placement approach for cloud computing environments. In: Proc of international conf on cloud computing
34. Meng X et al (2010) Improving the scalability of data center networks with traffic-aware virtual machine placement. In: Proc of INFOCOM
35. Mysore R et al (2009) PortLand: a scalable fault-tolerant layer 2 data center network fabric. In: Proc of SIGCOMM
36. NIST Definition of Cloud Computing v15, csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc
37. Osman S, Subhraveti D et al (2002) The design and implementation of zap: a system for migrating computing environments. In: Proc of OSDI
38. Padala P, Hou K-Y et al (2009) Automated control of multiple virtualized resources. In: Proc of EuroSys
39. Parkhill D (1966) The challenge of the computer utility. Addison-Wesley, Reading
40. Patil S et al (2009) In search of an API for scalable file systems: under the table or above it? In: Proc of HotCloud
41. Salesforce CRM, http://www.salesforce.com/platform
42. Sandholm T, Lai K (2009) MapReduce optimization using regulated dynamic prioritization. In: Proc of SIGMETRICS/Performance
43. Santos N, Gummadi K, Rodrigues R (2009) Towards trusted cloud computing. In: Proc of HotCloud
44. SAP Business ByDesign, www.sap.com/sme/solutions/businessmanagement/businessbydesign/index.epx
45. Sonnek J et al (2009) Virtual putty: reshaping the physical footprint of virtual machines. In: Proc of HotCloud
46. Srikantaiah S et al (2008) Energy aware consolidation for cloud computing. In: Proc of HotPower
47. Urgaonkar B et al (2005) Dynamic provisioning of multi-tier Internet applications. In: Proc of ICAC
48. Valancius V, Laoutaris N et al (2009) Greening the Internet with nano data centers. In: Proc of CoNext
49. Vaquero L, Rodero-Merino L, Caceres J, Lindner M (2009) A break in the clouds: towards a cloud definition. ACM SIGCOMM computer communications review
50. Vasic N et al (2009) Making cluster applications energy-aware. In: Proc of automated ctrl for datacenters and clouds
51. Virtualization Resource Chargeback, www.vkernel.com/products/EnterpriseChargebackVirtualAppliance
52. VMWare ESX Server, www.vmware.com/products/esx
53. Windows Azure, www.microsoft.com/azure
54. Wood T et al (2007) Black-box and gray-box strategies for virtual machine migration. In: Proc of NSDI
55. XenSource Inc, Xen, www.xensource.com
56. Zaharia M et al (2009) Improving MapReduce performance in heterogeneous environments. In: Proc of HotCloud
57. Zhang Q et al (2007) A regression-based analytic model for dynamic resource provisioning of multi-tier applications. In: Proc of ICAC