Exploring the Cloud
Chris Sosa, Dr. Andrew Grimshaw
sosa, grimshaw @ cs.virginia.edu
University of Virginia

Introduction - 1
  • Ever-increasing demand for computing resources
  • During non-peak times, computing resources sit idle
      • Still paying! Power, cooling, etc.
  • Total Cost of Ownership (TCO) is much more than the cost of hardware
      • Maintenance
      • Administration
      • Cooling
      • Etc.

Introduction - 2
  • Observation: load on the main ITC clusters exhibits a bimodal distribution
  • Can we only pay for what we use?

Enter Cloud Computing (field trip!)
  • What is it?
      • Infrastructure-related capabilities provided as a service
      • Also known as utility computing, and associated with very basic APIs
  • Lots of industry support
      • Amazon Infrastructure Services: EC2, S3, …
      • Google App Engine
      • Microsoft Azure
      • IBM-led initiatives

Cloud Computing Paradigms
  • Top-down: client only provides program and deployment information
      • Microsoft Azure
      • Google App Engine
  • Bottom-up: raw infrastructure provided (virtualized hardware)
      • Amazon
      • Nirvanix
      • Flexiscale
      • GoGrid

Advantages and Disadvantages of Using the Cloud
  • Advantages
      • Pay for what you use: the model is based on how long you use resources, and you can allocate and deallocate them on the fly
      • Hardware cost, set-up time, maintenance, and cooling all go down to zero
      • Can start developing immediately
  • Disadvantages
      • No control over physical resources. Do you trust Amazon?
      • SLAs may not be good enough. Is 99.95% availability good enough?
      • Some limitations in what you can run. Must stay within the API / framework given

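One way to make the "Is 99.95% availability good enough?" question concrete is to convert the percentage into permitted downtime. The sketch below is only an illustration: the function name and the choice of a 365-day year are assumptions, not something taken from the slides or from Amazon's SLA text.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of 8,760 hours


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Return the hours of downtime an availability guarantee still permits."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    # Work with the unavailable share directly: (100 - 99.95) % of the period.
    return period_hours * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    hours = allowed_downtime_hours(99.95)
    print(f"99.95% availability permits about {hours:.2f} hours of downtime per year")
```

Under these assumptions a 99.95% guarantee still allows roughly 4.4 hours of downtime per year, which a batch workload can usually absorb more easily than an interactive service can.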
Why Cloud Computing
  • Only have to pay for what we use
  • Disadvantages do not affect most users in a batch system

Amazon Leading the Push
  • Amazon has been the most successful player so far
      • Over 29 billion objects stored on S3
      • Using over 60% of their resources for Cloud services
      • EC2 just went out of Beta in October (new)
  • … rest of these slides will assume we use Amazon

Outline
  • Introduction
  • Overview of Amazon Cloud Services
  • Proposal of Hybrid Scheduler
  • Questions to be Answered
  • Conclusion

Amazon S3
  • Simple Storage for the Internet
  • Applications can interact with various mechanisms
      • REST
      • SOAP
      • BitTorrent
  • 250 Mb/second network link
  • Objects stored in buckets
      • Buckets have own namespace
      • Up to 100 buckets per account
      • Unlimited objects per bucket
      • 5 GB limit on size of objects
      • Objects are write-once
  • SLA guarantees 99.9% availability

S3 Pricing
  • Storage
      • $0.15 per GB-month of storage used
  • Data Transfer
      • $0.10 per GB - all data transfer in
      • $0.18 per GB - first 10 TB / month data transfer out
      • $0.16 per GB - next 40 TB / month data transfer out
      • $0.13 per GB - data transfer out / month over 50 TB
      • FREE to EC2
  • Requests
      • $0.01 per 1,000 PUT or LIST requests
      • $0.01 per 10,000 GET and all other requests*
  * No charge for delete requests

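The tiered transfer-out rates are easier to reason about in code. The following is a minimal sketch of a monthly S3 bill using only the rates listed on this slide. The function names are hypothetical, the 1 TB = 1024 GB conversion is an assumption (the slide does not say), and traffic to EC2 is simply left out of the inputs because it is free.

```python
GB_PER_TB = 1024  # assumption: the slide does not state 1000 vs 1024 GB per TB

STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
# (tier size in GB, price per GB); None marks the open-ended last tier.
TRANSFER_OUT_TIERS = [
    (10 * GB_PER_TB, 0.18),
    (40 * GB_PER_TB, 0.16),
    (None, 0.13),
]
PUT_LIST_PER_1000 = 0.01
GET_PER_10000 = 0.01


def transfer_out_cost(gb_out):
    """Price outbound traffic by walking the tiers from cheapest-to-reach upward."""
    cost = 0.0
    remaining = gb_out
    for tier_size, price in TRANSFER_OUT_TIERS:
        if remaining <= 0:
            break
        in_tier = remaining if tier_size is None else min(remaining, tier_size)
        cost += in_tier * price
        remaining -= in_tier
    return cost


def s3_monthly_cost(stored_gb, gb_in, gb_out, put_list_requests=0, get_requests=0):
    """Estimate one month of S3 charges; delete requests and EC2 traffic are free."""
    return (
        stored_gb * STORAGE_PER_GB_MONTH
        + gb_in * TRANSFER_IN_PER_GB
        + transfer_out_cost(gb_out)
        + put_list_requests / 1000 * PUT_LIST_PER_1000
        + get_requests / 10000 * GET_PER_10000
    )
```

For example, `s3_monthly_cost(100, 10, 20, put_list_requests=1000, get_requests=10000)` adds $15.00 for storage, $1.00 for inbound data, $3.60 for outbound data, and $0.01 for each request group.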
Amazon EC2
  • Provides virtual compute resources
  • Purchase CPUs on an hourly basis
  • Can use provided virtual machine images, or make your own
      • Virtual machines run atop Xen
  • Can do metadata operations with REST, SOAP, or command-line tools
  • Instances are assigned an IP address for SSH, remote desktop, etc.
  • SLA guarantees 99.95% availability

EC2 Pricing
  • Instances
      • $0.10 / hr - Small Instance - 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core - 1.7 GHz processor), 160 GB of instance storage, 32-bit platform (can buy in sets of 1, 4, 8)
      • $0.20 / hr - High-CPU Medium Instance - 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit platform (can buy in sets of 1 or 4)
  • Data Transfer
      • $0.10 per GB - data in
      • $0.18 per GB - first 10 TB out
      • FREE to S3

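The two instance types can be compared by price per EC2 Compute Unit as well as by price per hour. This sketch uses only the hourly rates and compute-unit counts from the slide. The dictionary keys and function names are made up for illustration, and rounding partial hours up to whole hours is an assumption about how "hourly basis" billing works.

```python
import math

# Rates and compute units as listed on the slide; the key names are hypothetical.
HOURLY_RATE = {"small": 0.10, "high_cpu_medium": 0.20}
COMPUTE_UNITS = {"small": 1, "high_cpu_medium": 5}


def ec2_instance_cost(instance_type, instances, hours_each):
    """Cost of running `instances` machines for `hours_each` hours apiece."""
    # Assumption: a partial hour is billed as a full hour.
    billed_hours = math.ceil(hours_each)
    return HOURLY_RATE[instance_type] * instances * billed_hours


def cost_per_compute_unit_hour(instance_type):
    """Hourly price divided by the EC2 Compute Units the instance provides."""
    return HOURLY_RATE[instance_type] / COMPUTE_UNITS[instance_type]
```

With these numbers a Small Instance works out to $0.10 per compute-unit hour and a High-CPU Medium Instance to $0.04, so CPU-bound batch jobs that can use both virtual cores may be cheaper on the larger instance despite its higher hourly rate.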
Main Idea
  • Reduce the number of resources we have active and improve peak performance
  • Modify local scheduler
      • When CPU usage is above threshold, allocate new machines from EC2 and schedule jobs
      • As usage decreases, deallocate resources and return to normal usage

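The threshold rule can be sketched as a small decision function. This is not the scheduler proposed in the slides, only one plausible reading of it: the `ClusterState` fields, the 80% default threshold, and the one-CPU-per-cloud-instance simplification are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    local_cpus: int       # CPUs available in the local cluster
    busy_cpus: int        # CPUs demanded by running plus queued jobs
    cloud_instances: int  # single-CPU cloud instances currently allocated


def cloud_instances_needed(state, threshold_percent=80):
    """Instances required so local usage stays at or below the threshold."""
    # Integer arithmetic avoids floating-point surprises at the boundary.
    local_capacity = state.local_cpus * threshold_percent // 100
    return max(0, state.busy_cpus - local_capacity)


def scaling_action(state, threshold_percent=80):
    """Return ("allocate" | "deallocate" | "hold", count) for the current load."""
    target = cloud_instances_needed(state, threshold_percent)
    delta = target - state.cloud_instances
    if delta > 0:
        return ("allocate", delta)
    if delta < 0:
        return ("deallocate", -delta)
    return ("hold", 0)
```

A call such as `scaling_action(ClusterState(local_cpus=100, busy_cpus=95, cloud_instances=0))` returns `("allocate", 15)`. A production version would likely add hysteresis so that load hovering near the threshold does not cause instances to be allocated and released repeatedly.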
Design
  [Figure: design diagram]

Research Setup
  • Instead of spending funds on running experiments using EC2 and S3, we will be using Eucalyptus to emulate EC2
      • Eucalyptus is an open-source implementation of the EC2 interface
      • Requires Xen be installed on host machines (need dedicated machines)
  • Create a centralized repository for data for our tests (S3)
      • NFS share
      • Other possibilities?

Task Bar
  • Decide on the software that will be installed on the virtual machines
      • PBS licensing is complicated and expensive
      • Several alternatives such as Genesis II, Hadoop, etc.
  • Create AMI image and register with Eucalyptus
  • Incorporate virtual machines from Eucalyptus into existing scheduler and create mechanism to do this on the fly
  • Modify scheduler to take into account a threshold
  • Build stubs to measure how much bandwidth, time, etc. is being used by the scheduler so that we can determine the price we would be charged by Amazon's EC2 and S3
  • Incorporate these costs, build economic model using actual workloads at UVa, differing thresholds, and various ways of passing jobs to the Cloud

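The measurement stubs could be as simple as a meter that the scheduler updates whenever it moves data or runs a cloud instance. The sketch below is one hypothetical shape for such a stub. The class and method names are invented, the default rates are the Small Instance and first-tier transfer rates from the EC2 pricing slide, and billing each run in whole hours is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class UsageMeter:
    """Accumulates what an emulated run would have consumed on EC2."""
    gb_in: float = 0.0       # data sent into the cloud
    gb_out: float = 0.0      # data pulled back out of the cloud
    instance_hours: int = 0  # billed instance hours across all runs

    def record_transfer(self, gb_to_cloud=0.0, gb_from_cloud=0.0):
        self.gb_in += gb_to_cloud
        self.gb_out += gb_from_cloud

    def record_instance_run(self, seconds):
        # Assumption: each run is rounded up to whole billed hours.
        self.instance_hours += math.ceil(seconds / 3600)

    def estimated_cost(self, hourly_rate=0.10, in_rate=0.10, out_rate=0.18):
        """Price the recorded usage with flat per-unit rates."""
        return (
            self.instance_hours * hourly_rate
            + self.gb_in * in_rate
            + self.gb_out * out_rate
        )
```

Running the same workload trace through the scheduler at several thresholds and comparing `estimated_cost()` for each run would give one data point per threshold for the economic model.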
Questions to be Answered
  • What is the cost model associated with working with Cloud computing?
  • What costs would be associated with common jobs being run at UVa?
  • What software will we have installed on the virtual machines in the Cloud?
  • How can we create a threshold such that we can decide on the fly when to start offloading jobs to Cloud resources?

Conclusions
  • Important to be concerned about reducing costs as well as getting bigger bang for your buck
  • Offloading job processing to Cloud computing infrastructures can save costs while improving peak throughput

Questions?

Editor's Notes

  • #3 Gartner Group and IBM estimate that hardware is about 17% of the overall cost of ownership
  • #5 Other big players: Nirvanix, Flexiscale, Rackspace, GoGrid
  • #6 Bottom-up: client builds applications to take advantage of limited API
  • #8 Due to the economic crisis, many organizations are moving to hosting their IT infrastructure on the Cloud as well