An Introduction to Cloud Computing
Robert Grossman
December 8, 2009

Part 1. Introduction

What is a Cloud?
  • Clouds provide elastic, on-demand resources or services over a network, often the Internet, with the scale and reliability of a data center.
  • The NIST definition has become standard.
  • Cloud architectures are not new.
  • What is new:
      - Scale
      - Ease of use
      - Pricing model

Scale is new.

Elastic, Usage-Based Pricing Is New
  • 1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
  • Elastic, usage-based pricing turns capex into opex.
  • Clouds can manage surges in computing needs.

Simplicity Offered by the Cloud Is New
  • + .. and you have a computer ready to work.
  • A new programmer can develop a program to process a container full of data with less than a day of training using MapReduce.

Two Types of Clouds
On-demand resources & services over a network at the scale of a data center
  • On-demand, elastic computing instances (IaaS)
      - IaaS: Amazon EC2, S3, etc.; Eucalyptus
      - Supports many Web 2.0 applications/users
  • Large data clouds (Large Data PaaS)
      - GFS/MapReduce/Bigtable, Hadoop, Sector, …
      - Manage and compute with large data (say 100+ TB)

Ease of use – With Google’s GFS & MapReduce, it is simple to compute with 10 terabytes of data over 100 nodes. With Amazon’s AMIs, it is simple to respond to a surge of 100 additional web servers.
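
To give a feel for why the MapReduce model is easy to pick up, here is a minimal single-process sketch of the idea in Python. It is not Google's or Hadoop's implementation: the helper names (map_reduce, word_mapper, count_reducer) and the sample input are assumptions made for illustration, and a real framework would run the map and reduce steps across many nodes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Toy, in-memory MapReduce: map each record, group by key, reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": collect values per key
    # Reduce phase: combine all values that share a key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word: str, counts: List[int]) -> int:
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


# Hypothetical input; in a data cloud this would be files spread over many nodes.
lines = ["clouds scale", "clouds are elastic"]
word_counts = map_reduce(lines, word_mapper, count_reducer)
print(word_counts)
```

The programmer writes only the mapper and the reducer; partitioning the data, scheduling the work and collecting the results are left to the framework, which is where the ease of use comes from.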

Cloud Architectures – How Do You Fill a Data Center?
[Diagram: two ways to fill a data center]
  • On-demand computing instances: many independent apps (App, App, …, App)
  • On-demand computing capacity: apps running on shared cloud services
      - Cloud Data Services (BigTable, etc.); quasi-relational data services
      - Cloud Compute Services (MapReduce & generalizations)
      - Cloud Storage Services

Varieties of Clouds
  • Architectural model: computing instances vs computing capacity
  • Economic model: elastic, usage-based pricing, lease/own, …
  • Management model: private vs public; single vs multiple tenant; …
  • Programming model: queue service, MPI, MapReduce, distributed UDF

  • Computing instances vs computing capacity
  • Private internal vs public external
  • Elastic, usage-based pricing or not
All combinations occur.
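
The statement that all combinations occur can be made concrete by enumerating them. The short Python sketch below treats the three distinctions listed above as independent two-valued choices; the constant names and the label "not usage-based" are assumptions chosen for illustration.

```python
from itertools import product

# The three distinctions from the slide, each modelled as a two-valued choice.
ARCHITECTURE = ("computing instances", "computing capacity")
MANAGEMENT = ("private internal", "public external")
PRICING = ("elastic, usage-based pricing", "not usage-based")

# Cartesian product: every way of picking one value per distinction.
combinations = list(product(ARCHITECTURE, MANAGEMENT, PRICING))

for architecture, management, pricing in combinations:
    print(f"{architecture} | {management} | {pricing}")

print(len(combinations), "combinations")
```

Two choices on each of three dimensions give 2 × 2 × 2 = 8 kinds of cloud.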

Payment Models
  • Buying racks, containers and data centers
  • Leasing racks, containers and data centers
  • Utility-based computing (pay as you go)
      - Moves capex to opex
      - Handles surge requirements (use 1000 servers for 1 hour vs 1 server for 1000 hours)
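
The arithmetic behind pay-as-you-go pricing can be sketched in a few lines of Python. The rate below is a hypothetical figure chosen only to make the numbers concrete, and the function name usage_cost_cents is an assumption; the sketch also assumes a flat price per server-hour, which real providers may not offer.

```python
def usage_cost_cents(servers: int, hours: int, cents_per_server_hour: int) -> int:
    """Cost under flat usage-based pricing: you pay for server-hours consumed."""
    return servers * hours * cents_per_server_hour


RATE_CENTS = 10  # hypothetical price: 10 cents per server-hour

# 1 server for 1000 hours vs 1000 servers for 1 hour.
steady = usage_cost_cents(servers=1, hours=1000, cents_per_server_hour=RATE_CENTS)
surge = usage_cost_cents(servers=1000, hours=1, cents_per_server_hour=RATE_CENTS)

# 1 computer for 120 hours vs 120 computers for 1 hour.
one_machine = usage_cost_cents(1, 120, RATE_CENTS)
many_machines = usage_cost_cents(120, 1, RATE_CENTS)

print(steady, surge)               # same bill, but the surge finishes in 1 hour
print(one_machine, many_machines)  # same bill, 120x less wall-clock time
```

Because the bill depends only on the product of servers and hours, a surge costs no more than a long steady run, and no hardware has to be purchased up front.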

Management Models
  • Public, private and hybrid models
  • Single tenant vs multiple tenant (shared vs non-shared hardware)
  • Owned vs leased
  • Manage yourself vs outsource management
  • All combinations are possible

Programming Model
[Diagram: programming models arranged between on-demand computing instances and on-demand computing capacity]
  • Amazon’s Simple Queue Service
  • MPI, sockets, FIFO
  • DryadLINQ
  • Azure services
  • MapReduce
  • Distributed UDF

[Diagram: layered cloud stack, from top to bottom]
  • Applications (Apps)
  • PaaS: Compute Services, Data Services, Metadata Services, Storage Services
  • IaaS: Identity Manager, Virtual Machine Manager, Virtual Network Manager
  • Network Transport

Instances, Services & Frameworks
[Diagram. Axis labels: instance (IaaS), service, framework (PaaS); operating system, single instance, many instances. Entries: Amazon’s EC2, S3, Amazon’s SQS, Azure Services, VMware VMotion…, Hadoop DFS & MapReduce, Google AppEngine, Microsoft Azure, Force.com]

Part 2. Cloud Computing Industry
“Cloud computing has become the center of investment and innovation.” Nicholas Carr, 2009 IDC Directions
Cloud computing is approaching the top of the Gartner hype cycle.

Cloud Computing Eco-System
  • No agreed-upon terminology
  • Vendors supporting data centers
  • Vendors providing cloud apps & services to end users
  • Vendors supporting the industry, i.e., those developing cloud applications and services for themselves or to sell to end users
  • Communities developing software, standards, benchmarks, etc.

Cloud Computing Ecosystem
[Diagram: layers of the industry]
  • Consumers of Software as a Service
  • Providers of Software as a Service
  • Data Centers
  • Consumers of Cloud Services
  • Providers of Cloud Services
The Berkeley RAD Report on cloud computing divides the industry into these layers.

Transition Taking Place
  • A handful of players are building multiple data centers a year and improving with each one.
  • This includes Google, Microsoft, Yahoo, …
  • A data center today costs $200 M – $400+ M.
  • The Berkeley RAD Report points out an analogy with the semiconductor industry: companies stopped building their own fabs and started leasing fabs from others as the cost of a fab approached $1B.

Data Center Operating Systems
[Diagram: a workstation hosting VM 1 … VM 5, contrasted with a data center operating system hosting VM 1 … VM 50,000]
Data center services include: VM management services, business continuity services, security services, power management services, etc.

Building Data Centers
  • Sun’s Modular Data Center (MD)
      - Formerly Project Blackbox
  • Containers used by Google, Microsoft & others
  • Data center consists of 10-60+ containers.

Mindmeister Map of Cloud Computing
  • Dupont’s Mindmeister map divides the industry: IaaS, PaaS, Management, Community
  • http://www.mindmeister.com/maps/show_public/15936058

Part 3. Virtualization

Virtualization
  • Virtualization separates logical infrastructure from the underlying physical resources to decrease the time to make changes, improve flexibility, improve utilization and reduce costs.
  • Example: server virtualization. Use one physical server to support multiple logical virtual machines (VMs), which are sometimes called logical partitions (LPARs).
  • Technology pioneered by IBM in the 1960s to better utilize mainframes

Idea Dates Back to the 1960s
[Diagram: three apps, each on its own guest OS (CMS, CMS, MVS), running on IBM VM/370 on an IBM mainframe]
  • Native (full) virtualization
  • Examples: VMware ESX

Two Types of Virtualization
  • Native (full) virtualization: apps run on unmodified guest OSes (Guest OS 1, Guest OS 2) on a hypervisor on physical hardware. Examples: VMware ESX
  • Para-virtualization: apps run on modified guest OSes (Guest OS 1, Guest OS 2) on a hypervisor on physical hardware. Examples: Xen
Using the hypervisor, each guest OS sees its own independent copy of the CPU, memory, IO, etc.

Four Key Properties
  • Partitioning: run multiple VMs on one physical server; one VM doesn’t know about the others
  • Isolation: security isolation is at the hardware level
  • Encapsulation: entire state of the machine can be copied to files and moved around
  • Hardware abstraction: provision and migrate VM to another server
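
The encapsulation property can be illustrated with a toy model in Python. This is only a sketch under strong simplifying assumptions: the VMState class, its fields and the JSON file format are hypothetical, whereas a real hypervisor captures memory, disk and device state in its own image formats.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class VMState:
    """Toy stand-in for the complete state of a virtual machine."""
    name: str
    vcpus: int
    memory_mb: int
    disk_blocks: List[int]


def save_state(vm: VMState, path: Path) -> None:
    """Encapsulate the VM: write its entire state to an ordinary file."""
    path.write_text(json.dumps(asdict(vm)))


def load_state(path: Path) -> VMState:
    """Recreate the VM from the file, e.g. on a different physical server."""
    return VMState(**json.loads(path.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "vm1.json"  # hypothetical image file
    original = VMState(name="vm1", vcpus=2, memory_mb=4096, disk_blocks=[7, 42, 99])
    save_state(original, image)     # the file can now be copied or moved around
    restored = load_state(image)

print(restored == original)
```

Once the state is just a file, provisioning a copy or migrating the VM reduces to copying that file to another server and loading it there, which is what the hardware abstraction property relies on.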

Managing Virtual Machines
  • Provision VM
  • Schedule VM
  • Monitor VM
  • Self-service portal for VM

Part 4. Technical differences between clouds for data intensive computing, databases and supercomputers

Supercomputer Center Model or Data Center Model

What Resource is Managed?
  • Supercomputer center model: scarce processors wait for data
      - Manage cycles
      - Wait for an opening in the queue
      - Scatter the data to the processors
      - Gather the results
  • Data center model: persistent data waits for queries
      - Manage data
      - Computation done locally
      - Results returned
[Diagram labels: Supercomputer Center Model (local); HPC Grid (distributed); Data Center 2.0 Model; Distributed 2.0 Data Centers]
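
The contrast between the two models can be sketched by counting how many values cross the network for the same computation. The Python below is a toy single-process model, not a description of any real scheduler: the node names, the sample data and the unit of "one value moved" are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def scatter_gather_sum(values: List[int], workers: int) -> Tuple[int, int]:
    """Supercomputer-style: scatter central data to processors, gather partial sums."""
    chunks = [values[i::workers] for i in range(workers)]
    moved = sum(len(chunk) for chunk in chunks)   # scatter: every value is shipped
    partials = [sum(chunk) for chunk in chunks]   # compute on the processors
    moved += len(partials)                        # gather: one result per worker
    return sum(partials), moved


def data_local_sum(partitions: Dict[str, List[int]]) -> Tuple[int, int]:
    """Data-center-style: the data stays put and the computation goes to it."""
    partials = [sum(values) for values in partitions.values()]  # computed locally
    moved = len(partials)                         # only the results are returned
    return sum(partials), moved


# Hypothetical data set: nine values, either held centrally or spread over 3 nodes.
central_store = [1, 2, 3, 4, 5, 6, 7, 8, 9]
persistent_partitions = {
    "node-a": [1, 2, 3, 4],
    "node-b": [5, 6],
    "node-c": [7, 8, 9],
}

hpc_total, hpc_moved = scatter_gather_sum(central_store, workers=3)
dc_total, dc_moved = data_local_sum(persistent_partitions)
print(hpc_total, hpc_moved)
print(dc_total, dc_moved)
```

Both approaches compute the same answer; the difference is that the data-local version moves only the results, which matters more and more as the data grows toward the 100+ TB scale mentioned earlier.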

Databases vs Data Clouds
  • Trading functionality for scalability.

Trading Functionality for Scalability

Not Everyone Agrees
  • David J. DeWitt and Michael Stonebraker, MapReduce: A Major Step Backwards, Database Column, January 17, 2008
