CLOUD COMPUTING

1. Abstract

This paper describes cloud computing, a computing platform for the next generation of the Internet. The paper defines clouds, explains the business benefits of cloud computing, and outlines cloud architecture and its major components. Readers will discover how a business can use cloud computing to foster innovation and reduce IT costs.

Enterprises strive to reduce computing costs. Many start by consolidating their IT operations and later introducing virtualization technologies. Cloud computing takes these steps to a new level and allows an organization to further reduce costs through improved utilization, reduced administration and infrastructure costs, and faster deployment cycles. The cloud is a next generation platform that provides dynamic resource pools, virtualization, and high availability.

Cloud computing describes both a platform and a type of application. A cloud computing platform dynamically provisions, configures, reconfigures, and deprovisions servers as needed. Cloud applications are applications that are extended to be accessible through the Internet. These applications use large data centers and powerful servers that host Web applications and Web services.

2. Introduction

Cloud computing infrastructure accelerates and fosters the adoption of innovations. Enterprises are increasingly making innovation their highest priority. They realize they need to seek new ideas and unlock new sources of value. Driven by the pressure to cut costs and grow at the same time, they realize that it is not possible to succeed simply by doing the same things better. They know they have to do new things that produce better results.

Cloud computing enables innovation. It alleviates the need of innovators to find resources to develop, test, and make their innovations available to the user community. Innovators are free to focus on the innovation rather than the logistics of finding and managing the resources that enable it. Cloud computing helps leverage innovation as early as possible to deliver business value to IBM and its customers.

Fostering innovation requires unprecedented flexibility and responsiveness. The enterprise should provide an ecosystem where innovators are not hindered by excessive processes, rules, and resource constraints. In this context, a cloud computing service is a necessity. It comprises an automated framework that can deliver standardized services quickly and cheaply.

Cloud computing is a term used to describe both a platform and a type of application. A cloud computing platform dynamically provisions, configures, reconfigures, and deprovisions servers as needed. Servers in the cloud can be physical machines or virtual machines. Advanced clouds typically include other computing resources such as storage area networks (SANs), network equipment, firewalls and other security devices.

Cloud computing also describes applications that are extended to be accessible through the Internet. These cloud applications use large data centers and powerful servers that host Web applications and Web services. Anyone with a suitable Internet connection and a standard browser can access a cloud application.

3. Definition

A cloud is a pool of virtualized computer resources. A cloud can:
• Host a variety of different workloads, including batch-style back-end jobs and interactive, user-facing applications
• Allow workloads to be deployed and scaled out quickly through the rapid provisioning of virtual machines or physical machines
• Support redundant, self-recovering, highly scalable programming models that allow workloads to recover from many unavoidable hardware/software failures
• Monitor resource use in real time to enable rebalancing of allocations when needed

Cloud computing environments support grid computing by quickly providing physical and virtual servers on which the grid applications can run. Cloud computing should not be confused with grid computing. Grid computing involves dividing a large task into many smaller tasks that run in parallel on separate servers. Grids require many computers, typically in the thousands, and commonly use servers, desktops, and laptops.

Clouds also support nongrid environments, such as a three-tier Web architecture running standard or Web 2.0 applications. A cloud is more than a collection of computer resources because a cloud provides a mechanism to manage those resources. Management includes provisioning, change requests, reimaging, workload rebalancing, deprovisioning, and monitoring.

Benefits

Cloud computing infrastructures can allow enterprises to achieve more efficient use of their IT hardware and software investments. They do this by breaking down the physical barriers inherent in isolated systems, and automating the management of the group of systems as a single entity. Cloud computing is an example of an ultimately virtualized system, and a natural evolution for data centers that employ automated systems management, workload balancing, and virtualization technologies.

A cloud infrastructure can be a cost efficient model for delivering information services, reducing IT management complexity, promoting innovation, and increasing responsiveness through real-time workload balancing.

The Cloud makes it possible to launch Web 2.0 applications quickly and to scale up applications as much as needed when needed. The platform supports traditional Java™ and Linux, Apache, MySQL, PHP (LAMP) stack-based applications as well as new architectures such as MapReduce and the Google File System, which provide a means to scale applications across thousands of servers instantly.

4. Enabling Technologies

4.1. Cloud Computing Application Architecture

This gives the basic architecture of a cloud computing application. We know that cloud computing is the shift of computing to a host of hardware infrastructure that is distributed in the cloud. The commodity hardware infrastructure consists of the various low cost data servers that are connected to the system and provide their storage, processing and other computing resources to the application. Cloud computing involves running applications on virtual servers that are allocated on this distributed hardware infrastructure available in the cloud. These virtual servers are
made in such a way that the different service level agreements and reliability issues are met. There may be multiple instances of the same virtual server accessing different parts of the hardware infrastructure available. This is to make sure that there are multiple copies of the applications which are ready to take over on another one's failure.

The virtual servers distribute the processing across the infrastructure; the computing is done and the result returned. There will be a workload distribution management system, also known as the grid engine, for managing the different requests coming to the virtual servers. This engine takes care of the creation of multiple copies and also the preservation of the integrity of the data that is stored in the infrastructure. It also adjusts itself so that even under heavier load the processing is completed as per the requirements. The different workload management systems are hidden from the users. For the user, the processing is done and the result is obtained; there is no question of where it was done or how it was done. The users are billed based on the usage of the system; as said before, the commodity is now cycles and bytes. The billing is usually on the basis of usage per CPU per hour or per GB of data transfer per hour.

4.2. Server Architecture

Cloud computing makes use of a large physical resource pool in the cloud. As said above, cloud computing services and applications make use of virtual server instances built upon this resource pool. There are two applications which help in managing the server instances, the resources, and also the management of the resources by these virtual server instances. One of these is the Xen hypervisor, which provides an abstraction layer between the hardware and the virtual OS so that the distribution of the resources and the processing is well managed. Another application that is widely used is the Enomalism server management system, which is used for management of the infrastructure platform.

When Xen is used for virtualization of the servers over the infrastructure, a thin software layer known as the Xen hypervisor is inserted between the server's hardware and the operating system. This provides an abstraction layer that allows each physical server to run one or more "virtual servers," effectively decoupling the operating system and its applications from the underlying physical server.

The Xen hypervisor is a unique open source technology, developed collaboratively by the Xen community and engineers at over 20 of the most innovative data center solution vendors, including AMD, Cisco, Dell, HP, IBM, Intel, Mellanox, Network Appliance, Novell, Red Hat, SGI, Sun, Unisys, Veritas, Voltaire, and Citrix. Xen is licensed under the GNU General Public License (GPL2) and is available at no charge in both source and object form.

The Xen hypervisor is also exceptionally lean: less than 50,000 lines of code. That translates to extremely low overhead and near-native performance for guests. Xen re-uses existing device drivers (both closed and open source) from Linux, making device management easy. Moreover, Xen is robust to device driver failure and protects both guests and the hypervisor from faulty or malicious drivers.

The Enomalism virtualized server management system is a complete virtual server infrastructure platform. Enomalism helps in an effective management of the resources. Enomalism can be used to tap into the cloud just as you would into a
remote server. It brings together all the features such as deployment planning, load balancing, resource monitoring, etc. Enomalism is an open source application. It has a very simple and easy to use web based user interface. It has a modular architecture which allows for the creation of additional system add-ons and plugins. It supports one-click deployment of distributed or replicated applications on a global basis. It supports the management of various virtual environments including KVM/Qemu, Amazon EC2, Xen, OpenVZ, Linux Containers and VirtualBox. It has fine grained user permissions and access privileges.

4.3. MapReduce

MapReduce is a software framework developed at Google in 2003 to support parallel computations over large (multiple petabyte) data sets on clusters of commodity computers. This framework is largely taken from the 'map' and 'reduce' functions commonly used in functional programming, although the actual semantics of the framework are not the same. It is a programming model and an associated implementation for processing and generating large data sets. Many real world tasks are expressible in this model. MapReduce implementations have been written in C++, Java and other languages.

Programs written in this functional style are automatically parallelized and executed on the cloud. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The user of the MapReduce library expresses the computation as two functions: Map and Reduce. Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.

The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's Reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory.

MapReduce achieves reliability by parceling out a number of operations on the set of data to each node in the network; each node is expected to report back periodically with completed work and status updates. If a node falls silent for longer than that interval, the master node records the node as dead, and sends out the node's assigned work to other nodes. Individual operations use atomic operations for naming
file outputs as a double check to ensure that there are not parallel conflicting threads running; when files are renamed, it is possible to also copy them to another name in addition to the name of the task (allowing for side-effects).

4.4. Google File System

Google File System (GFS) is a scalable distributed file system developed by Google for data intensive applications. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

Files are divided into chunks of 64 megabytes, which are only extremely rarely overwritten or shrunk; files are usually appended to or read. It is also designed and optimized to run on computing clusters, the nodes of which consist of cheap, "commodity" computers, which means precautions must be taken against the high failure rate of individual nodes and the subsequent data loss. Other design decisions select for high data throughputs, even when it comes at the cost of latency.

The nodes are divided into two types: one Master node and a large number of Chunkservers. Chunkservers store the data files, with each individual file broken up into fixed size chunks (hence the name) of about 64 megabytes, similar to clusters or sectors in regular file systems. Each chunk is assigned a unique 64-bit label, and logical mappings of files to constituent chunks are maintained. Each chunk is replicated several times throughout the network, with the minimum being three, but even more for files that have high demand or need more redundancy.

The Master server doesn't usually store the actual chunks, but rather all the metadata associated with the chunks, such as the tables mapping the 64-bit labels to chunk locations and the files they make up, the locations of the copies of the chunks, what processes are reading or writing to a particular chunk, or taking a "snapshot" of the chunk pursuant to replicating it (usually at the instigation of the Master server, when, due to node failures, the number of copies of a chunk has fallen beneath the set number). All this metadata is kept current by the Master server periodically receiving updates from each chunk server ("Heart-beat messages").

Permissions for modifications are handled by a system of time-limited, expiring "leases", where the Master server grants permission to a process for a finite period of time during which no other process will be granted permission by the Master server to modify the chunk. The modified chunkserver, which is always the primary chunk holder, then propagates the changes to the chunkservers with the backup copies. The changes are not saved until all chunkservers acknowledge, thus guaranteeing the completion and atomicity of the operation.

Programs access the chunks by first querying the Master server for the locations of the desired chunks; if the chunks are not being operated on (if there are no outstanding leases), the Master replies with the locations, and the program then contacts and receives the data from the chunkserver directly. As opposed to many file systems, it is not implemented in the kernel of an Operating System but accessed through a library to avoid overhead.

4.5. Hadoop

Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements the computation paradigm named MapReduce, which was explained above. The application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the
compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node failures are automatically handled by the framework. Hadoop has been implemented making use of Java.

In Hadoop, the combination of all the JAR files and classes needed to run a MapReduce program is called a job. All of these components are themselves collected into a JAR which is usually referred to as the job file. To execute a job, it is submitted to a jobTracker and then executed. Tasks in each phase are executed in a fault-tolerant manner. If node(s) fail in the middle of a computation, the tasks assigned to them are re-distributed among the remaining nodes. Since we are using MapReduce, having many map and reduce tasks enables good load balancing and allows failed tasks to be re-run with smaller runtime overhead.

The Hadoop MapReduce framework has a master/slave architecture. It has a single master server, or jobTracker, and several slave servers, or taskTrackers, one per node in the cluster. The jobTracker is the point of interaction between the users and the framework. Users submit jobs to the jobTracker, which puts them in a queue of pending jobs and executes them on a first-come first-served basis. The jobTracker manages the assignment of MapReduce jobs to the taskTrackers. The taskTrackers execute tasks upon instruction from the jobTracker and also handle data motion between the 'map' and 'reduce' phases of the MapReduce job.

Hadoop is a framework which has received wide industry adoption. Hadoop is used along with other cloud computing technologies like the Amazon services so as to make better use of the resources, and there are many instances where Hadoop has been used. Amazon makes use of Hadoop for processing millions of sessions which it uses for analytics; this is done on a cluster which has about 1 to 100 nodes. Facebook uses Hadoop to store copies of internal logs and dimension data sources and uses it as a source for reporting/analytics and machine learning. The New York Times made use of Hadoop for large scale image conversions. Yahoo uses Hadoop to support research for advertisement systems and web searching tools. They also use it to do scaling tests to support development of Hadoop.

5. Cloud Computing Services

Even though cloud computing is a fairly new technology, there are many companies offering cloud computing services. Different companies like Amazon, Google, Yahoo, IBM and Microsoft are all players in the cloud computing services industry. But Amazon is the pioneer in the cloud computing industry, with services like EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service) dominating the industry. Amazon has expertise in this industry and has a small advantage over the others because of this. Microsoft has good knowledge of the fundamentals of cloud science and is building massive data centers. IBM, the king of business computing and traditional supercomputers, teams up with Google to get a foothold in the clouds. Google is far and away the leader in cloud computing, with the company itself built from the ground up on hardware.

5.1. Amazon Web Services

'Amazon Web Services' is the set of cloud computing services offered by Amazon. It involves four different services: Elastic Compute Cloud (EC2), Simple Storage Service (S3), Simple Queue Service (SQS) and Simple Database Service (SDB).

1. Elastic Compute Cloud (EC2)

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. It provides on-demand processing power.
Amazon EC2's simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon's proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use. Amazon EC2 provides developers the tools to build failure resilient applications and isolate themselves from common failure scenarios.

Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to requisition machines for use, load them with your custom application environment, manage your network's access permissions, and run your image using as many or few systems as you desire.

To set up an Amazon EC2 node we have to create an EC2 node configuration which consists of all our applications, libraries, data and associated configuration settings. This configuration is then saved as an AMI (Amazon Machine Image). There are also several stock Amazon AMIs available which can be customized and used. We can then start, terminate and monitor as many instances of the AMI as needed.

Amazon EC2 enables you to increase or decrease capacity within minutes. You can commission one, hundreds or even thousands of server instances simultaneously. Thus applications can automatically scale themselves up and down depending on their needs. You have root access to each one, and you can interact with them as you would with any machine. You have the choice of several instance types, allowing you to select a configuration of memory, CPU, and instance storage that is optimal for your application.

Amazon EC2 offers a highly reliable environment where replacement instances can be rapidly and reliably commissioned. Amazon EC2 provides web service interfaces to configure firewall settings that control network access to and between groups of instances. You will be charged at the end of each month for your EC2 resources actually consumed, so charging will be based on the actual usage of the resources.

2. Simple Storage Service (S3)

S3, or Simple Storage Service, offers a cloud storage service. It offers services for storage of data in the cloud. It provides a high-availability large-store database. It provides a simple SQL-like language. It has been designed for interactive online use. S3 is storage for the Internet. It is designed to make web-scale computing easier for developers. S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.

Amazon S3 allows write, read and delete of objects containing from 1 byte to 5 gigabytes of data each. The number of objects that you can store is unlimited. Each object is stored in a bucket and retrieved via a unique developer-assigned key. A bucket can be located anywhere in Europe or the Americas but can be accessed from anywhere. Authentication mechanisms are provided to ensure that the data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users for particular objects. The S3 service also works with a pay-only-for-what-you-use method of payment.

3. Simple Queue Service (SQS)

Amazon Simple Queue Service (SQS) offers a reliable, highly
scalable, hosted queue for storing messages as they travel between computers. By using SQS, developers can simply move data between distributed components of their applications that perform different tasks, without losing messages or requiring each component to be always available. With SQS, developers can create an unlimited number of SQS queues, each of which can send and receive an unlimited number of messages. Messages can be retained in a queue for up to 4 days. It is simple, reliable, secure and scalable.

4. Simple Database Service (SDB)

Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon S3 and EC2, collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers. Traditionally, this type of functionality is accomplished with a clustered relational database, which requires a sizable upfront investment and often requires a DBA to maintain and administer it. Amazon SDB provides all this without the operational complexity. It requires no schema, automatically indexes your data and provides a simple API for storage and access. Developers gain access to the different functionalities from within Amazon's proven computing environment, are able to scale instantly, and need to pay only for what they use.

5.2. Google App Engine

Google App Engine lets you run your web applications on Google's infrastructure. App Engine applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage needs grow. You can serve your app using a free domain name on the appspot.com domain, or use Google Apps to serve it from your own domain. You can share your application with the world, or limit access to members of your organization.

App Engine costs nothing to get started. Sign up for a free account, and you can develop and publish your application at no charge and with no obligation. A free account can use up to 500MB of persistent storage and enough CPU and bandwidth for about 5 million page views a month.

Google App Engine makes it easy to build an application that runs reliably, even under heavy load and with large amounts of data. The environment includes the following features:
• dynamic web serving, with full support for common web technologies
• persistent storage with queries, sorting and transactions
• automatic scaling and load balancing
• APIs for authenticating users and sending email using Google Accounts
• a fully featured local development environment that simulates Google App Engine on your computer

Google App Engine applications are implemented using the Python programming language. The runtime environment includes the full Python language and most of the Python standard library. Applications run in a secure environment that provides limited access to the underlying operating system. These limitations allow App Engine to distribute web requests for the application across multiple servers, and start and stop servers to meet traffic demands.

App Engine includes a service API for integrating with Google Accounts. Your application can allow a user to sign in with a Google account, and access the
email address and displayable name associated with the account. Using Google Accounts lets the user start using your application faster, because the user may not need to create a new account. It also saves you the effort of implementing a user account system just for your application.

App Engine provides a variety of services that enable you to perform common operations when managing your application. The following APIs are provided to access these services: applications can access resources on the Internet, such as web services or other data, using App Engine's URL fetch service. Applications can send email messages using App Engine's mail service; the mail service uses Google infrastructure to send email messages. The Image service lets your application manipulate images: with this API, you can resize, crop, rotate and flip images in JPEG and PNG formats.

In theory, Google claims App Engine can scale nicely. But Google currently places a limit of 5 million hits per month on each application. This limit nullifies App Engine's scalability, because any small, dedicated server can match this performance. Google will eventually allow webmasters to go beyond this limit (if they pay).

6. Conclusion

Cloud computing is a powerful new abstraction for large scale data processing systems which is scalable, reliable and available. In cloud computing, there are large self-managed server pools available which reduce the overhead and eliminate management headaches. Cloud computing services can also grow and shrink according to need.
Cloud computing is particularly valuable to small and medium businesses, where effective and affordable IT tools are critical to helping them become more productive without spending lots of money on in-house resources and technical equipment. It is also a new emerging architecture needed to expand the Internet to become the computing platform of the future.